\section{Discussion} The experiment on \textsc{IntroClass} reveals some limitations of \psychic{}. First, the tool has trouble scaling with growing path spaces. While this is inherent to the path explosion problem, not all paths need to be explored. Therefore, more aggressive path pruning~\cite{homi,hydiff} and scheduling heuristics~\cite{directed,rsc,paradyse} could prove to be beneficial. The use of \textsc{DiffTest} on any (undifferentiated) state pair in algorithm~\ref{diffgen} further hurts performance. Next, while in theory, \klee{} can execute any LLVM bitcode, the programming interface sets the limit on logical interpretation. More specifically, to leave rooms for optimization, C and POSIX standards leave a lot of behaviors unspecified. For example, function \texttt{atoi(3)}, which converts string to integer, may return any number upon a parsing error~\cite{atoi}, lumping this implementation-defined number with error cases. Some others like \texttt{scanf(3)} and \texttt{printf(3)} are prohibitively expensive due to heavy pattern matching, and since they operate on input/output, their calls are likely dominators or post-dominators in the control-flow graph, requiring manual interventions. Failing to replace these calls could either significantly degrade efficiency or force the theorem prover to give up on \gls{smt} solving. In case of \textsc{IntroClass}, ignoring \texttt{printf(3)} calls disabled the use of output files capturing. On more positive notes, capturing local function return values and mutable arguments has proved its usability. Together with the use of eager concrete execution to skip symbolic test generation, promising preliminaries have been obtained. \subsection{Threats to Validity} The small scope of experiments on toy programs challenges the practical generalizibility of the tool. Since the \textsc{Prophet}-based \gls{apr} tool MSV mines for tokens within the codebase~\cite{prophet}, patches are not diverse in semantics and the overall result is not statistically strong. \subsection{Related Works} There exists multiple differential test generators for a pair of program revisions, such as \textsc{Nezha} which uses black- or gray-box fuzzing~\cite{nezha}, \textsc{Shadow} based on symbolic execution~\cite{shadow}, and \textsc{HyDiff} combining both~\cite{hydiff}. Some recent work targets the opposite problem of detecting semantic code clone~\cite{semclone}. For interpreting multiple program variants symbolically at the same time, alternative to \psychic{} keeping independent states, other approaches may choose shadow execution~\cite{shadow}, or to join~\cite{join-split} or merge~\cite{merge} states at higher-order branches.