1 files changed, 55 insertions, 13 deletions
diff --git a/eval.tex b/eval.tex
index cfd76ab..63fb11b 100644
--- a/eval.tex
+++ b/eval.tex
@@ -20,33 +20,40 @@ ground truth program\footnote{\url{https://github.com/McSinyx/IntroClass}}
 to select buggy programs.  MSV then generate fixes in form of meta programs,
 which are then post-processed into \psychic{}-compatible format.
 
+Due to the limitation of \klee{}'s POSIX runtime, \texttt{libc} calls
+such as \texttt{scanf} has to be replaced by more direct access
+of standard input or command-line arguments.  Unless the logic is outside
+of the \texttt{main} function, an output value is manually annotated.
+
 A selection of 10 meta programs with at least two semantically different
 revisions (including the original buggy version) are then fed to \psychic{}
 to generate differential tests and decision trees with a timeout of 10 minutes.
+The vast majority of these bugs are either simple logical mistakes
+or missing initializations that trigger \glspl{ub}.
 
 \subsection{Results}
-Experiment results on \textsc{IntroClass} is shown in table~\ref{result}.
-In case of \texttt{digits-1b31@2} and \texttt{smallest-} \texttt{d25c@1},
+Experiment results on \textsc{IntroClass} is summarized in table~\ref{result}.
+In case of \texttt{digits} \texttt{1b31@2} and \texttt{smallest} \texttt{d25c@1},
 \psychic{} failed to generate differential tests.  In other cases,
-except for the timeout in \texttt{digits-} \texttt{0cdf@3},
+except for the timeout in \texttt{digits} \texttt{0cdf@3},
 the tool successfully differentiate all semantically different patches.
 
 \begin{table}[h]
   \begin{center}
     \begin{tabular}{l c l r r r r}
       \toprule
-      Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time\\
+      Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time (s)\\
       \midrule
       \texttt{checksum} & \texttt{3b23@3} & Arithmetic & 2 & 1 & 1 & 1.2\\
       \texttt{checksum} & \texttt{cb24@3} & Logic & 3 & 1 & 2 & 1.7\\
       \texttt{digits} & \texttt{0cdf@3} & Logic & 16 & 1 & 5 & 600.0\\
       \texttt{digits} & \texttt{1b31@2} & Logic & 5 & 0 & 5 & 0.9\\
-      \texttt{grade} & \texttt{1b31@3} & UB & 2 & 1 & 1 & 0.8\\
-      \texttt{grade} & \texttt{317a@3} & Logic & 8 & 2 & 5 & 1.0\\
-      \texttt{grade} & \texttt{b192@3} & UB & 2 & 1 & 1 & 0.5\\
-      \texttt{median} & \texttt{97f6@3} & UB & 7 & 1 & 6 & 1.6\\
-      \texttt{smallest} & \texttt{0704@2} & UB & 5 & 1 & 3 & 87.0\\
-      \texttt{smallest} & \texttt{d25c@1} & UB & 5 & 0 & 5 & 1.3\\
+      \texttt{grade} & \texttt{1b31@3} & \gls{ub} & 2 & 1 & 1 & 0.8\\
+      \texttt{grade} & \texttt{317a@3} & \gls{ub} & 8 & 3 & 5 & 1.0\\
+      \texttt{grade} & \texttt{b192@3} & \gls{ub} & 2 & 1 & 1 & 0.5\\
+      \texttt{median} & \texttt{97f6@3} & \gls{ub} & 7 & 1 & 6 & 1.6\\
+      \texttt{smallest} & \texttt{0704@2} & \gls{ub} & 5 & 1 & 3 & 87.0\\
+      \texttt{smallest} & \texttt{d25c@1} & \gls{ub} & 5 & 0 & 5 & 1.3\\
       \bottomrule
     \end{tabular}
   \end{center}
@@ -56,6 +63,41 @@ the tool successfully differentiate all semantically different patches.
     and the maximum number of undistinguished revisions.}\label{result}
 \end{table}
 
-The stark difference in execution time reveals an inherent weakness
-of our approach: heavily-nested conditions or loops increase
-the search space dramatically.
+\psychic{} took orders of magnitude more time on \texttt{digits} \texttt{0cdf@3}
+and \texttt{smallest} \texttt{d25c@1}, which respectively include a loop
+and heavily-nested conditions.
+
+\begin{figure}
+  \begin{tikzpicture}[mymatrix/.style={matrix of nodes,
+                                       nodes=typetag,row sep=1ex},
+                      typetag/.style={draw,anchor=west},
+                      title/.style={draw=none,inner sep=0pt},
+                      line width=0.2ex]
+    \node[draw,diamond,aspect=4] (root) {Input: 6 3 2 1 0}
+    \node[draw,diamond,aspect=4,below left=6em of root]
+      (left) {Input: 66 7 6 2 1}
+    \draw[-angle 45] (root) to node[above,sloped]{Output: \gls{ub}} (left);
+    \node[draw,diamond,aspect=4,below right=6em of root]
+      (right) {Input: 6 3 2 1 1}
+    \draw[-angle 45] (root) to node[above,sloped]{Output: F} (right);
+    \node[draw,below left=6em of left] (i) {Original}
+    \node[draw,right=5em of i] (d) {Patches 108, 147, 187, 227, 267}
+    \node[draw,right=5em of d] (a) {Patch 10}
+    \node[draw,right=5em of a] (b) {Patch 307}
+    \draw[-angle 45] (left) to node[above,sloped]{Output: \gls{ub}} (i);
+    \draw[-angle 45] (left) to node[above,sloped]{Output: F} (d);
+    \draw[-angle 45] (right) to node[above,sloped]{Output: D} (a);
+    \draw[-angle 45] (right) to node[above,sloped]{Output: F} (b);
+  \end{tikzpicture}
+  \caption{Decision tree generated from patches of \texttt{grade}
+    \texttt{317a@3}, with outputs minimized.}\label{317a}
+\end{figure}
+
+To demonstrate the decision tree, we examine the result for \texttt{grade}
+\texttt{317a@3}.  The task is to calculate the letter grade,
+given the threshold for each grade from A to D, followed by a percentage grade.
+This attempt is buggy because it misses the initialization the result
+grade variable, which is inserted in various locations by MSV.
+The decision tree generated from these patches is illustrated
+in figure~\ref{317a}, from which patch number 10 can be deduced to be
+the most correct one.