8 files changed, 166 insertions, 46 deletions
diff --git a/abstract.tex b/abstract.tex
index 4701ff7..23c50a2 100644
--- a/abstract.tex
+++ b/abstract.tex
@@ -1,5 +1,11 @@
 \renewcommand{\abstractname}{\Large Abstract}
 \begin{abstract}
-Your abstract should be here.
+In recent decades, \gls{apr} has been advancing consistently
+according to benchmarks.  However, its use in practice is still limited
+due to the difficulty in choosing a desired patch among the generated pool.
+This work introduces a method to logically differentiate between patches
+through symbolic execution.  The technique generates a tree of decisions
+for developers to reason between patches based on the program's inputs
+and semi-automatically captured outputs.
 \end{abstract}
 \thispagestyle{empty}
diff --git a/all.tex b/all.tex
index 519096f..a87213e 100644
--- a/all.tex
+++ b/all.tex
@@ -32,6 +32,7 @@
 \usetikzlibrary{matrix}
 \usetikzlibrary{positioning}
 \usetikzlibrary{shapes}
+\usetikzlibrary{shapes.geometric}
 
 \renewcommand\cftsecafterpnum{\vskip15pt}
 \renewcommand\cftsubsecafterpnum{\vskip15pt}
diff --git a/discuss.tex b/discuss.tex
index 682bd5e..12af349 100644
--- a/discuss.tex
+++ b/discuss.tex
@@ -1,3 +1,49 @@
 \section{Discussion}
+The experiment on \textsc{IntroClass} reveals some limitations
+of \psychic{}.  First, the tool has trouble scaling with growing path spaces.
+While this is inherent to the path explosion problem, not all paths
+need to be explored.  Therefore, more aggressive path pruning~\cite{homi,hydiff}
+and scheduling heuristics~\cite{directed,rsc,paradyse} could prove
+to be beneficial.  The use of \textsc{DiffTest} on any (undifferentiated)
+state pair in algorithm~\ref{diffgen} further hurts performance.
+
+Next, while in theory, \klee{} can execute any LLVM bitcode,
+the programming interface sets the limit on logical interpretation.
+More specifically, to leave rooms for optimization, C and POSIX standards
+leave a lot of behaviors unspecified.  For example, function \texttt{atoi(3)},
+which converts string to integer, may return any number upon
+a parsing error~\cite{atoi}, lumping this implementation-defined number
+with error cases.
+
+Some others like \texttt{scanf(3)} and \texttt{printf(3)}
+are prohibitively expensive due to heavy pattern matching, and since
+they operate on input/output, their calls are likely dominators
+or post-dominators in the control-flow graph, requiring manual interventions.
+Failing to replace these calls could either significantly degrade efficiency
+or force the theorem prover to give up on \gls{smt} solving.  In case
+of \textsc{IntroClass}, ignoring \texttt{printf(3)} calls disabled
+the use of output files capturing.
+
+On more positive notes, capturing local function return values
+and mutable arguments has proved its usability.  Together
+with the use of eager concrete execution to skip symbolic test generation,
+promising preliminaries have been obtained.
+
 \subsection{Threats to Validity}
-\subsection{Related Work}
+The small scope of experiments on toy programs challenges
+the practical generalizibility of the tool.  Since the \textsc{Prophet}-based
+\gls{apr} tool MSV mines for tokens within the codebase~\cite{prophet},
+patches are not diverse in semantics and the overall result
+is not statistically strong.
+
+\subsection{Related Works}
+There exists multiple differential test generators for a pair
+of program revisions, such as \textsc{Nezha} which uses black- or gray-box
+fuzzing~\cite{nezha}, \textsc{Shadow} based on symbolic execution~\cite{shadow},
+and \textsc{HyDiff} combining both~\cite{hydiff}.  Some recent work targets
+the opposite problem of detecting semantic code clone~\cite{semclone}.
+
+For interpreting multiple program variants symbolically at the same time,
+alternative to \psychic{} keeping independent states, other approaches
+may choose shadow execution~\cite{shadow}, or to join~\cite{join-split}
+or merge~\cite{merge} states at higher-order branches.
diff --git a/eval.tex b/eval.tex
index cfd76ab..63fb11b 100644
--- a/eval.tex
+++ b/eval.tex
@@ -20,33 +20,40 @@ ground truth program\footnote{\url{https://github.com/McSinyx/IntroClass}}
 to select buggy programs.  MSV then generate fixes in form of meta programs,
 which are then post-processed into \psychic{}-compatible format.
 
+Due to the limitation of \klee{}'s POSIX runtime, \texttt{libc} calls
+such as \texttt{scanf} has to be replaced by more direct access
+of standard input or command-line arguments.  Unless the logic is outside
+of the \texttt{main} function, an output value is manually annotated.
+
 A selection of 10 meta programs with at least two semantically different
 revisions (including the original buggy version) are then fed to \psychic{}
 to generate differential tests and decision trees with a timeout of 10 minutes.
+The vast majority of these bugs are either simple logical mistakes
+or missing initializations that trigger \glspl{ub}.
 
 \subsection{Results}
-Experiment results on \textsc{IntroClass} is shown in table~\ref{result}.
-In case of \texttt{digits-1b31@2} and \texttt{smallest-} \texttt{d25c@1},
+Experiment results on \textsc{IntroClass} is summarized in table~\ref{result}.
+In case of \texttt{digits} \texttt{1b31@2} and \texttt{smallest} \texttt{d25c@1},
 \psychic{} failed to generate differential tests.  In other cases,
-except for the timeout in \texttt{digits-} \texttt{0cdf@3},
+except for the timeout in \texttt{digits} \texttt{0cdf@3},
 the tool successfully differentiate all semantically different patches.
 
 \begin{table}[h]
   \begin{center}
     \begin{tabular}{l c l r r r r}
       \toprule
-      Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time\\
+      Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time (s)\\
       \midrule
       \texttt{checksum} & \texttt{3b23@3} & Arithmetic & 2 & 1 & 1 & 1.2\\
       \texttt{checksum} & \texttt{cb24@3} & Logic & 3 & 1 & 2 & 1.7\\
       \texttt{digits} & \texttt{0cdf@3} & Logic & 16 & 1 & 5 & 600.0\\
       \texttt{digits} & \texttt{1b31@2} & Logic & 5 & 0 & 5 & 0.9\\
-      \texttt{grade} & \texttt{1b31@3} & UB & 2 & 1 & 1 & 0.8\\
-      \texttt{grade} & \texttt{317a@3} & Logic & 8 & 2 & 5 & 1.0\\
-      \texttt{grade} & \texttt{b192@3} & UB & 2 & 1 & 1 & 0.5\\
-      \texttt{median} & \texttt{97f6@3} & UB & 7 & 1 & 6 & 1.6\\
-      \texttt{smallest} & \texttt{0704@2} & UB & 5 & 1 & 3 & 87.0\\
-      \texttt{smallest} & \texttt{d25c@1} & UB & 5 & 0 & 5 & 1.3\\
+      \texttt{grade} & \texttt{1b31@3} & \gls{ub} & 2 & 1 & 1 & 0.8\\
+      \texttt{grade} & \texttt{317a@3} & \gls{ub} & 8 & 3 & 5 & 1.0\\
+      \texttt{grade} & \texttt{b192@3} & \gls{ub} & 2 & 1 & 1 & 0.5\\
+      \texttt{median} & \texttt{97f6@3} & \gls{ub} & 7 & 1 & 6 & 1.6\\
+      \texttt{smallest} & \texttt{0704@2} & \gls{ub} & 5 & 1 & 3 & 87.0\\
+      \texttt{smallest} & \texttt{d25c@1} & \gls{ub} & 5 & 0 & 5 & 1.3\\
       \bottomrule
     \end{tabular}
   \end{center}
@@ -56,6 +63,41 @@ the tool successfully differentiate all semantically different patches.
     and the maximum number of undistinguished revisions.}\label{result}
 \end{table}
 
-The stark difference in execution time reveals an inherent weakness
-of our approach: heavily-nested conditions or loops increase
-the search space dramatically.
+\psychic{} took orders of magnitude more time on \texttt{digits} \texttt{0cdf@3}
+and \texttt{smallest} \texttt{d25c@1}, which respectively include a loop
+and heavily-nested conditions.
+
+\begin{figure}
+  \begin{tikzpicture}[mymatrix/.style={matrix of nodes,
+                                       nodes=typetag,row sep=1ex},
+                      typetag/.style={draw,anchor=west},
+                      title/.style={draw=none,inner sep=0pt},
+                      line width=0.2ex]
+    \node[draw,diamond,aspect=4] (root) {Input: 6 3 2 1 0}
+    \node[draw,diamond,aspect=4,below left=6em of root]
+      (left) {Input: 66 7 6 2 1}
+    \draw[-angle 45] (root) to node[above,sloped]{Output: \gls{ub}} (left);
+    \node[draw,diamond,aspect=4,below right=6em of root]
+      (right) {Input: 6 3 2 1 1}
+    \draw[-angle 45] (root) to node[above,sloped]{Output: F} (right);
+    \node[draw,below left=6em of left] (i) {Original}
+    \node[draw,right=5em of i] (d) {Patches 108, 147, 187, 227, 267}
+    \node[draw,right=5em of d] (a) {Patch 10}
+    \node[draw,right=5em of a] (b) {Patch 307}
+    \draw[-angle 45] (left) to node[above,sloped]{Output: \gls{ub}} (i);
+    \draw[-angle 45] (left) to node[above,sloped]{Output: F} (d);
+    \draw[-angle 45] (right) to node[above,sloped]{Output: D} (a);
+    \draw[-angle 45] (right) to node[above,sloped]{Output: F} (b);
+  \end{tikzpicture}
+  \caption{Decision tree generated from patches of \texttt{grade}
+    \texttt{317a@3}, with outputs minimized.}\label{317a}
+\end{figure}
+
+To demonstrate the decision tree, we examine the result for \texttt{grade}
+\texttt{317a@3}.  The task is to calculate the letter grade,
+given the threshold for each grade from A to D, followed by a percentage grade.
+This attempt is buggy because it misses the initialization the result
+grade variable, which is inserted in various locations by MSV.
+The decision tree generated from these patches is illustrated
+in figure~\ref{317a}, from which patch number 10 can be deduced to be
+the most correct one.
diff --git a/glossary.tex b/glossary.tex
index d8953ab..83fc900 100644
--- a/glossary.tex
+++ b/glossary.tex
@@ -18,6 +18,7 @@
 \newacronym{sat}{SAT}{satisfiability}
 \newacronym{smt}{SMT}{satisfiability modulo theories}
 \newacronym{ffi}{FFI}{foreign function interface}
+\newacronym{ub}{UB}{undefined behavior}
 \newglossaryentry{searcher}{name=searcher,
   description={Is what keeps track of states to be executed,
     along with the information for scheduling them to optimize resources
diff --git a/outro.tex b/outro.tex
index 5e87498..c0624e3 100644
--- a/outro.tex
+++ b/outro.tex
@@ -1 +1,15 @@
 \section{Conclusion}
+This work introduced an differential testing technique
+for multiple program variants at once, implemented as an extension
+named \psychic{} on top of the symbolic execution engine \klee{}.
+It took into account edge cases to lay out a groundwork to be easily used
+for a diverse set C programs to help developers with choosing
+automatically generated patches.  That being said, experiment results
+indicates a large room for improvements.
+
+In the future, other techniques need to be incorporated
+to improving the performance and generalizibility.
+Sophisticated scheduling and path pruning could help with efficiency,
+and further works on symbolic execution runtimes would also be beneficial.
+Finally, a more comprehensive evaluation is needed to prove
+the applicability of the approach.
diff --git a/ref.bib b/ref.bib
index 9519fca..26949fe 100644
--- a/ref.bib
+++ b/ref.bib
@@ -12,7 +12,7 @@
   author = {Christoph M. Hoffmann and Michael J. O'Donnell},
   title = {Pattern Matching in Trees},
   year = 1982,
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   volume = 29,
   number = 1,
   issn = {0004-5411},
@@ -70,7 +70,7 @@
   year = 2009,
   publisher = {Springer Berlin Heidelberg},
   pages = {76--92},
-  doi = {10.1007/978-3-642-04694-0_6}}
+  note = {\doi{10.1007/978-3-642-04694-0\_6}}}
 
 @inproceedings{smt,
   author = {Leonardo de Moura and Nikolaj Bj{\o}rner},
@@ -87,27 +87,34 @@
   title = {Path Exploration Based on Symbolic Output},
   year = 2011,
   isbn = {9781450304436},
-  publisher = {Association for Computing Machinery},
-  url = {https://doi.org/10.1145/2025113.2025152},
-  doi = {10.1145/2025113.2025152},
+  note = {\doi{10.1145/2025113.2025152}},
   booktitle = {Proc. ACM SIGSOFT Symp. and Eur. Conf. Found. Softw. Eng.},
   pages = {278--288},
   numpages = 11,
-  keywords = {relevant slice condition, symbolic execution, path exploration},
-  series = {ESEC/FSE '11}}
+  keywords = {relevant slice condition, symbolic execution, path exploration}}
+
+@inproceedings{directed,
+  author = {Ma, Kin-Keung and Phang, Khoo Yit
+            and Jeffrey S. Foster and Michael Hicks},
+  title = {Directed Symbolic Execution},
+  year = 2011,
+  isbn = {9783642237010},
+  booktitle = {Proc. Int. Conf. Static Analysis (SAS)},
+  pages = {95--111},
+  note = {\doi{10.5555/2041552.2041563}},
+  numpages = 17}
 
 @article{merge,
   author = {Volodymyr Kuznetsov and Johannes Kinder
             and Stefan Bucur and George Candea},
   title = {Efficient State Merging in Symbolic Execution},
   year = 2012,
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   address = {New York, NY, USA},
   volume = 47,
   number = 6,
   issn = {0362-1340},
-  url = {https://doi.org/10.1145/2345156.2254088},
-  doi = {10.1145/2345156.2254088},
+  note = {\doi{10.1145/2345156.2254088}},
   journal = {SIGPLAN Not.},
   month = jun,
   pages = {193--204},
@@ -121,7 +128,7 @@
            A Context-Guided Search Strategy in Concolic Testing},
   year = 2014,
   isbn = {9781450330565},
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   url = {https://doi.org/10.1145/2635868.2635872},
   doi = {10.1145/2635868.2635872},
   booktitle = {Proc. ACM SIGSOFT Int. Symp. Found. Softw. Eng. (FSE)},
@@ -151,9 +158,8 @@
            Overfitting in Automated Program Repair},
   year = 2015,
   isbn = {9781450336758},
-  publisher = {Association for Computing Machinery},
-  url = {https://doi.org/10.1145/2786805.2786825},
-  doi = {10.1145/2786805.2786825},
+  publisher = {ACM},
+  note = {\doi{10.1145/2786805.2786825}},
   booktitle = {Proc. Joint Meet. Found. Softw. Eng. (FSE)},
   pages = {532--543},
   numpages = 12,
@@ -218,7 +224,7 @@
             and Matias Martinez and Benoit Baudry and Lionel Seinturier},
   title = {Repairnator Patches Programs Automatically},
   year = 2019,
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   volume = 2019,
   url = {https://doi.org/10.1145/3349589},
   doi = {10.1145/3349589},
@@ -269,11 +275,18 @@
   pages = {609--620},
   note = {\doi{10.1109/ICSE.2017.62}}}
 
+@inbook{atoi,
+  title = {The Open Group Base Specifications},
+  year = 2018,
+  author = {{IEEE} and {The Open Group}},
+  chapter = {{atoi}},
+  url = {https://pubs.opengroup.org/onlinepubs/9699919799/functions/atoi.html}}
+
 @article{shadow,
   author = {Tomasz Kuchta and Hristina Palikareva and Cristian Cadar},
   title = {Shadow Symbolic Execution for Testing Software Patches},
   year = 2018,
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   volume = 27,
   number = 3,
   issn = {1049-331X},
@@ -290,9 +303,7 @@
            by Learning Aggressive State-Pruning Strategy},
   year = 2020,
   isbn = {9781450370431},
-  publisher = {Association for Computing Machinery},
-  url = {https://doi.org/10.1145/3368089.3409755},
-  doi = {10.1145/3368089.3409755},
+  note = {\doi{10.1145/3368089.3409755}},
   booktitle = {Proc. Joint Meet. Eur. Softw. Eng. Conf.
                and ACM Symp. Found. Softw. Eng. (ESEC/FSE)},
   pages = {147--158},
@@ -302,12 +313,10 @@
 @inproceedings{hydiff,
   author = {Yannic Noller and Corina S. P\u{a}s\u{a}reanu and Marcel B\"{o}hme
             and Sun, Youcheng and Nguyen, Hoang Lam and Lars Grunske},
-  title = {HyDiff: Hybrid Differential Software Analysis},
+  title = {{HyDiff}: Hybrid Differential Software Analysis},
   year = 2020,
   isbn = {9781450371216},
-  publisher = {Association for Computing Machinery},
-  url = {https://doi.org/10.1145/3377811.3380363},
-  doi = {10.1145/3377811.3380363},
+  note = {\doi{10.1145/3377811.3380363}},
   booktitle = {Proc. ACM/IEEE Int. Conf. Softw. Eng. (ICSE)},
   pages = {1273--1285},
   numpages = 13,
@@ -319,7 +328,7 @@
   title = {Abstracting Failure-Inducing Inputs},
   year = 2020,
   isbn = {9781450380089},
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   url = {https://doi.org/10.1145/3395363.3397349},
   doi = {10.1145/3395363.3397349},
   booktitle = {Proc. ACM SIGSOFT Int. Symp. Softw. Test. and Analysis (ISSTA)},
@@ -336,7 +345,7 @@
            for Java Programs},
   year = 2020,
   isbn = {9781450371216},
-  publisher = {Association for Computing Machinery},
+  publisher = {ACM},
   doi = {10.1145/3377811.3380338},
   booktitle = {Proc. ACM/IEEE Int. Conf. Softw. Eng. (ICSE)},
   pages = {615--627},
@@ -392,7 +401,7 @@
   pages = {118--122},
   publisher = {{KSI} Research Inc.},
   year = 2023,
-  note = {\doi{https://doi.org/10.18293/SEKE2023-070}}}
+  note = {\doi{10.18293/SEKE2023-070}}}
 
 @inbook{unidiff,
   author = {David MacKenzie and Paul Eggert and Richard Stallman},
diff --git a/technique.tex b/technique.tex
index 470449f..af1fa92 100644
--- a/technique.tex
+++ b/technique.tex
@@ -333,6 +333,9 @@ If the path constraints of both states are \gls{sat} with at least
 one difference in outputs, a \gls{difftest} can be generated
 from the \gls{smt} model.
 
+Halt states are reached naturally at the program's termination or an error.
+The latter category also includes sanitizer traps for \gls{ub}.
+
 \begin{algorithm}
   \caption{\Gls{difftest} generation}\label{diffgen}
   \begin{algorithmic}[1]
@@ -379,8 +382,6 @@ from the \gls{smt} model.
 \subsection{Decision tree construction}
 From each \gls{difftest}, a set of clusters of revisions with the same output
 is constructed.  We then superimpose these clustering sets to create
-hierarchical clusterings or a decision tree.  A minimal tree, i.e~a tree
-with the shortest height, is found through exhaustively search though
-all subsets of set of clustering sets.
-
-TODO: analyse complexity
+hierarchical clusterings or a decision tree.  A minimal tree, i.e.~a tree
+with the shortest height, is found by searching through the subsets
+of set of clustering sets.