\section{Experiments} To evaluate our approach, we perform experiments with our implementation \psychic{} on C meta programs to generate decision trees of patches. \subsection{Experiment setup} Experiments are run on a machine with AMD Ryzen 3700X at \SI{3.6}{\giga\hertz} and \SI{16}{\giga\byte} of memory. Software dependencies and tooling are provided by a declarative Nix environment~\cite{nix} for reproducibility. We evaluate \psychic{} on patches generated by MSV~\cite{msv}, a fork of the pattern-based \gls{apr} tool \textsc{Prophet}~\cite{prophet} that adds extra repair templates and generate meta programs. Patches are generated for \textsc{IntroClass}, a set of attempts for solving homework problems~\cite{introclass}. In addition to the test cases provided by \textsc{IntroClass}, which are non-exhaustive of the input domains, we use property tests~\cite{hypothesis} with the respective ground truth program\footnote{\url{https://github.com/McSinyx/IntroClass}} to select buggy programs. MSV then generate fixes in the form of meta programs, which are then post-processed into \psychic{}-compatible format. Due to the limitation of \klee{}'s POSIX runtime, \texttt{libc} calls such as \texttt{scanf} has to be replaced by more direct access of standard input or command-line arguments. Unless the logic is outside of the \texttt{main} function, an output value is manually annotated. A selection of 10 meta programs with at least two semantically different revisions (including the original buggy version) are then fed to \psychic{} to generate differential tests and decision trees with a timeout of 10 minutes. The vast majority of these bugs are either simple logical mistakes or missing initializations that trigger \glspl{ub}. \subsection{Results} Experiment results on \textsc{IntroClass} is summarized in table~\ref{result}. In case of \texttt{digits} \texttt{1b31@2} and \texttt{smallest} \texttt{d25c@1}, \psychic{} failed to generate differential tests. In other cases, except for the timeout in \texttt{digits} \texttt{0cdf@3}, the tool successfully differentiate all semantically different patches. \begin{table}[h] \begin{center} \begin{tabular}{l c l r r r r} \toprule Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time (s)\\ \midrule \texttt{checksum} & \texttt{3b23@3} & Arithmetic & 2 & 1 & 1 & 1.2\\ \texttt{checksum} & \texttt{cb24@3} & Logic & 3 & 1 & 2 & 1.7\\ \texttt{digits} & \texttt{0cdf@3} & Logic & 16 & 1 & 5 & 600.0\\ \texttt{digits} & \texttt{1b31@2} & Logic & 5 & 0 & 5 & 0.9\\ \texttt{grade} & \texttt{1b31@3} & \gls{ub} & 2 & 1 & 1 & 0.8\\ \texttt{grade} & \texttt{317a@3} & \gls{ub} & 8 & 3 & 5 & 1.0\\ \texttt{grade} & \texttt{b192@3} & \gls{ub} & 2 & 1 & 1 & 0.5\\ \texttt{median} & \texttt{97f6@3} & \gls{ub} & 7 & 1 & 6 & 1.6\\ \texttt{smallest} & \texttt{0704@2} & \gls{ub} & 5 & 1 & 3 & 87.0\\ \texttt{smallest} & \texttt{d25c@1} & \gls{ub} & 5 & 0 & 5 & 1.3\\ \bottomrule \end{tabular} \end{center} \caption{Decision tree quality on the patch pool generated by MSV for \textsc{IntroClass} programs. Respective to each meta program with $n$ revisions are the height of the patch decision tree and the maximum number of undistinguished revisions.}\label{result} \end{table} \psychic{} took orders of magnitude more time on \texttt{digits} \texttt{0cdf@3} and \texttt{smallest} \texttt{d25c@1}, which respectively include a loop and heavily-nested conditions. \begin{figure} \begin{tikzpicture}[mymatrix/.style={matrix of nodes, nodes=typetag,row sep=1ex}, typetag/.style={draw,anchor=west}, title/.style={draw=none,inner sep=0pt}, line width=0.2ex] \node[draw,diamond,aspect=4] (root) {Input: 6 3 2 1 0}; \node[draw,diamond,aspect=4,below left=6em of root] (left) {Input: 66 7 6 2 1}; \draw[-angle 45] (root) to node[above,sloped]{Output: \gls{ub}} (left); \node[draw,diamond,aspect=4,below right=6em of root] (right) {Input: 6 3 2 1 1}; \draw[-angle 45] (root) to node[above,sloped]{Output: F} (right); \node[draw,below left=6em of left] (i) {Original}; \node[draw,right=5em of i] (d) {Patches 108, 147, 187, 227, 267}; \node[draw,right=5em of d] (a) {Patch 10}; \node[draw,right=5em of a] (b) {Patch 307}; \draw[-angle 45] (left) to node[above,sloped]{Output: \gls{ub}} (i); \draw[-angle 45] (left) to node[above,sloped]{Output: F} (d); \draw[-angle 45] (right) to node[above,sloped]{Output: D} (a); \draw[-angle 45] (right) to node[above,sloped]{Output: F} (b); \end{tikzpicture} \caption{Decision tree generated from patches of \texttt{grade} \texttt{317a@3}, with outputs minimized.}\label{317a} \end{figure} To demonstrate the decision tree, we examine the result for \texttt{grade} \texttt{317a@3}. The task is to calculate the letter grade, given the threshold for each grade from A to D, followed by a percentage grade. This attempt is buggy because it misses the initialization the result grade variable, which is inserted in various locations by MSV. The decision tree generated from these patches is illustrated in figure~\ref{317a}, from which patch number 10 can be deduced to be the most correct one.