From 210ddf92aab09acc796b2a7fe34ba3bd983cbd62 Mon Sep 17 00:00:00 2001 From: Nguyễn Gia Phong Date: Mon, 11 Dec 2023 13:34:41 +0900 Subject: Finish content --- abstract.tex | 8 ++++++- all.tex | 1 + discuss.tex | 48 ++++++++++++++++++++++++++++++++++++++++- eval.tex | 68 +++++++++++++++++++++++++++++++++++++++++++++++------------ glossary.tex | 1 + outro.tex | 14 ++++++++++++ ref.bib | 61 ++++++++++++++++++++++++++++++----------------------- technique.tex | 11 +++++----- 8 files changed, 166 insertions(+), 46 deletions(-) diff --git a/abstract.tex b/abstract.tex index 4701ff7..23c50a2 100644 --- a/abstract.tex +++ b/abstract.tex @@ -1,5 +1,11 @@ \renewcommand{\abstractname}{\Large Abstract} \begin{abstract} -Your abstract should be here. +In recent decades, \gls{apr} has been advancing consistently +according to benchmarks. However, its use in practice is still limited +due to the difficulty in choosing a desired patch among the generated pool. +This work introduces a method to logically differentiate between patches +through symbolic execution. The technique generates a tree of decisions +for developers to reason between patches based on the program's inputs +and semi-automatically captured outputs. \end{abstract} \thispagestyle{empty} diff --git a/all.tex b/all.tex index 519096f..a87213e 100644 --- a/all.tex +++ b/all.tex @@ -32,6 +32,7 @@ \usetikzlibrary{matrix} \usetikzlibrary{positioning} \usetikzlibrary{shapes} +\usetikzlibrary{shapes.geometric} \renewcommand\cftsecafterpnum{\vskip15pt} \renewcommand\cftsubsecafterpnum{\vskip15pt} diff --git a/discuss.tex b/discuss.tex index 682bd5e..12af349 100644 --- a/discuss.tex +++ b/discuss.tex @@ -1,3 +1,49 @@ \section{Discussion} +The experiment on \textsc{IntroClass} reveals some limitations +of \psychic{}. First, the tool has trouble scaling with growing path spaces. +While this is inherent to the path explosion problem, not all paths +need to be explored. Therefore, more aggressive path pruning~\cite{homi,hydiff} +and scheduling heuristics~\cite{directed,rsc,paradyse} could prove +to be beneficial. The use of \textsc{DiffTest} on any (undifferentiated) +state pair in algorithm~\ref{diffgen} further hurts performance. + +Next, while in theory, \klee{} can execute any LLVM bitcode, +the programming interface sets the limit on logical interpretation. +More specifically, to leave rooms for optimization, C and POSIX standards +leave a lot of behaviors unspecified. For example, function \texttt{atoi(3)}, +which converts string to integer, may return any number upon +a parsing error~\cite{atoi}, lumping this implementation-defined number +with error cases. + +Some others like \texttt{scanf(3)} and \texttt{printf(3)} +are prohibitively expensive due to heavy pattern matching, and since +they operate on input/output, their calls are likely dominators +or post-dominators in the control-flow graph, requiring manual interventions. +Failing to replace these calls could either significantly degrade efficiency +or force the theorem prover to give up on \gls{smt} solving. In case +of \textsc{IntroClass}, ignoring \texttt{printf(3)} calls disabled +the use of output files capturing. + +On more positive notes, capturing local function return values +and mutable arguments has proved its usability. Together +with the use of eager concrete execution to skip symbolic test generation, +promising preliminaries have been obtained. + \subsection{Threats to Validity} -\subsection{Related Work} +The small scope of experiments on toy programs challenges +the practical generalizibility of the tool. Since the \textsc{Prophet}-based +\gls{apr} tool MSV mines for tokens within the codebase~\cite{prophet}, +patches are not diverse in semantics and the overall result +is not statistically strong. + +\subsection{Related Works} +There exists multiple differential test generators for a pair +of program revisions, such as \textsc{Nezha} which uses black- or gray-box +fuzzing~\cite{nezha}, \textsc{Shadow} based on symbolic execution~\cite{shadow}, +and \textsc{HyDiff} combining both~\cite{hydiff}. Some recent work targets +the opposite problem of detecting semantic code clone~\cite{semclone}. + +For interpreting multiple program variants symbolically at the same time, +alternative to \psychic{} keeping independent states, other approaches +may choose shadow execution~\cite{shadow}, or to join~\cite{join-split} +or merge~\cite{merge} states at higher-order branches. diff --git a/eval.tex b/eval.tex index cfd76ab..63fb11b 100644 --- a/eval.tex +++ b/eval.tex @@ -20,33 +20,40 @@ ground truth program\footnote{\url{https://github.com/McSinyx/IntroClass}} to select buggy programs. MSV then generate fixes in form of meta programs, which are then post-processed into \psychic{}-compatible format. +Due to the limitation of \klee{}'s POSIX runtime, \texttt{libc} calls +such as \texttt{scanf} has to be replaced by more direct access +of standard input or command-line arguments. Unless the logic is outside +of the \texttt{main} function, an output value is manually annotated. + A selection of 10 meta programs with at least two semantically different revisions (including the original buggy version) are then fed to \psychic{} to generate differential tests and decision trees with a timeout of 10 minutes. +The vast majority of these bugs are either simple logical mistakes +or missing initializations that trigger \glspl{ub}. \subsection{Results} -Experiment results on \textsc{IntroClass} is shown in table~\ref{result}. -In case of \texttt{digits-1b31@2} and \texttt{smallest-} \texttt{d25c@1}, +Experiment results on \textsc{IntroClass} is summarized in table~\ref{result}. +In case of \texttt{digits} \texttt{1b31@2} and \texttt{smallest} \texttt{d25c@1}, \psychic{} failed to generate differential tests. In other cases, -except for the timeout in \texttt{digits-} \texttt{0cdf@3}, +except for the timeout in \texttt{digits} \texttt{0cdf@3}, the tool successfully differentiate all semantically different patches. \begin{table}[h] \begin{center} \begin{tabular}{l c l r r r r} \toprule - Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time\\ + Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time (s)\\ \midrule \texttt{checksum} & \texttt{3b23@3} & Arithmetic & 2 & 1 & 1 & 1.2\\ \texttt{checksum} & \texttt{cb24@3} & Logic & 3 & 1 & 2 & 1.7\\ \texttt{digits} & \texttt{0cdf@3} & Logic & 16 & 1 & 5 & 600.0\\ \texttt{digits} & \texttt{1b31@2} & Logic & 5 & 0 & 5 & 0.9\\ - \texttt{grade} & \texttt{1b31@3} & UB & 2 & 1 & 1 & 0.8\\ - \texttt{grade} & \texttt{317a@3} & Logic & 8 & 2 & 5 & 1.0\\ - \texttt{grade} & \texttt{b192@3} & UB & 2 & 1 & 1 & 0.5\\ - \texttt{median} & \texttt{97f6@3} & UB & 7 & 1 & 6 & 1.6\\ - \texttt{smallest} & \texttt{0704@2} & UB & 5 & 1 & 3 & 87.0\\ - \texttt{smallest} & \texttt{d25c@1} & UB & 5 & 0 & 5 & 1.3\\ + \texttt{grade} & \texttt{1b31@3} & \gls{ub} & 2 & 1 & 1 & 0.8\\ + \texttt{grade} & \texttt{317a@3} & \gls{ub} & 8 & 3 & 5 & 1.0\\ + \texttt{grade} & \texttt{b192@3} & \gls{ub} & 2 & 1 & 1 & 0.5\\ + \texttt{median} & \texttt{97f6@3} & \gls{ub} & 7 & 1 & 6 & 1.6\\ + \texttt{smallest} & \texttt{0704@2} & \gls{ub} & 5 & 1 & 3 & 87.0\\ + \texttt{smallest} & \texttt{d25c@1} & \gls{ub} & 5 & 0 & 5 & 1.3\\ \bottomrule \end{tabular} \end{center} @@ -56,6 +63,41 @@ the tool successfully differentiate all semantically different patches. and the maximum number of undistinguished revisions.}\label{result} \end{table} -The stark difference in execution time reveals an inherent weakness -of our approach: heavily-nested conditions or loops increase -the search space dramatically. +\psychic{} took orders of magnitude more time on \texttt{digits} \texttt{0cdf@3} +and \texttt{smallest} \texttt{d25c@1}, which respectively include a loop +and heavily-nested conditions. + +\begin{figure} + \begin{tikzpicture}[mymatrix/.style={matrix of nodes, + nodes=typetag,row sep=1ex}, + typetag/.style={draw,anchor=west}, + title/.style={draw=none,inner sep=0pt}, + line width=0.2ex] + \node[draw,diamond,aspect=4] (root) {Input: 6 3 2 1 0} + \node[draw,diamond,aspect=4,below left=6em of root] + (left) {Input: 66 7 6 2 1} + \draw[-angle 45] (root) to node[above,sloped]{Output: \gls{ub}} (left); + \node[draw,diamond,aspect=4,below right=6em of root] + (right) {Input: 6 3 2 1 1} + \draw[-angle 45] (root) to node[above,sloped]{Output: F} (right); + \node[draw,below left=6em of left] (i) {Original} + \node[draw,right=5em of i] (d) {Patches 108, 147, 187, 227, 267} + \node[draw,right=5em of d] (a) {Patch 10} + \node[draw,right=5em of a] (b) {Patch 307} + \draw[-angle 45] (left) to node[above,sloped]{Output: \gls{ub}} (i); + \draw[-angle 45] (left) to node[above,sloped]{Output: F} (d); + \draw[-angle 45] (right) to node[above,sloped]{Output: D} (a); + \draw[-angle 45] (right) to node[above,sloped]{Output: F} (b); + \end{tikzpicture} + \caption{Decision tree generated from patches of \texttt{grade} + \texttt{317a@3}, with outputs minimized.}\label{317a} +\end{figure} + +To demonstrate the decision tree, we examine the result for \texttt{grade} +\texttt{317a@3}. The task is to calculate the letter grade, +given the threshold for each grade from A to D, followed by a percentage grade. +This attempt is buggy because it misses the initialization the result +grade variable, which is inserted in various locations by MSV. +The decision tree generated from these patches is illustrated +in figure~\ref{317a}, from which patch number 10 can be deduced to be +the most correct one. diff --git a/glossary.tex b/glossary.tex index d8953ab..83fc900 100644 --- a/glossary.tex +++ b/glossary.tex @@ -18,6 +18,7 @@ \newacronym{sat}{SAT}{satisfiability} \newacronym{smt}{SMT}{satisfiability modulo theories} \newacronym{ffi}{FFI}{foreign function interface} +\newacronym{ub}{UB}{undefined behavior} \newglossaryentry{searcher}{name=searcher, description={Is what keeps track of states to be executed, along with the information for scheduling them to optimize resources diff --git a/outro.tex b/outro.tex index 5e87498..c0624e3 100644 --- a/outro.tex +++ b/outro.tex @@ -1 +1,15 @@ \section{Conclusion} +This work introduced an differential testing technique +for multiple program variants at once, implemented as an extension +named \psychic{} on top of the symbolic execution engine \klee{}. +It took into account edge cases to lay out a groundwork to be easily used +for a diverse set C programs to help developers with choosing +automatically generated patches. That being said, experiment results +indicates a large room for improvements. + +In the future, other techniques need to be incorporated +to improving the performance and generalizibility. +Sophisticated scheduling and path pruning could help with efficiency, +and further works on symbolic execution runtimes would also be beneficial. +Finally, a more comprehensive evaluation is needed to prove +the applicability of the approach. diff --git a/ref.bib b/ref.bib index 9519fca..26949fe 100644 --- a/ref.bib +++ b/ref.bib @@ -12,7 +12,7 @@ author = {Christoph M. Hoffmann and Michael J. O'Donnell}, title = {Pattern Matching in Trees}, year = 1982, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, volume = 29, number = 1, issn = {0004-5411}, @@ -70,7 +70,7 @@ year = 2009, publisher = {Springer Berlin Heidelberg}, pages = {76--92}, - doi = {10.1007/978-3-642-04694-0_6}} + note = {\doi{10.1007/978-3-642-04694-0\_6}}} @inproceedings{smt, author = {Leonardo de Moura and Nikolaj Bj{\o}rner}, @@ -87,27 +87,34 @@ title = {Path Exploration Based on Symbolic Output}, year = 2011, isbn = {9781450304436}, - publisher = {Association for Computing Machinery}, - url = {https://doi.org/10.1145/2025113.2025152}, - doi = {10.1145/2025113.2025152}, + note = {\doi{10.1145/2025113.2025152}}, booktitle = {Proc. ACM SIGSOFT Symp. and Eur. Conf. Found. Softw. Eng.}, pages = {278--288}, numpages = 11, - keywords = {relevant slice condition, symbolic execution, path exploration}, - series = {ESEC/FSE '11}} + keywords = {relevant slice condition, symbolic execution, path exploration}} + +@inproceedings{directed, + author = {Ma, Kin-Keung and Phang, Khoo Yit + and Jeffrey S. Foster and Michael Hicks}, + title = {Directed Symbolic Execution}, + year = 2011, + isbn = {9783642237010}, + booktitle = {Proc. Int. Conf. Static Analysis (SAS)}, + pages = {95--111}, + note = {\doi{10.5555/2041552.2041563}}, + numpages = 17} @article{merge, author = {Volodymyr Kuznetsov and Johannes Kinder and Stefan Bucur and George Candea}, title = {Efficient State Merging in Symbolic Execution}, year = 2012, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, address = {New York, NY, USA}, volume = 47, number = 6, issn = {0362-1340}, - url = {https://doi.org/10.1145/2345156.2254088}, - doi = {10.1145/2345156.2254088}, + note = {\doi{10.1145/2345156.2254088}}, journal = {SIGPLAN Not.}, month = jun, pages = {193--204}, @@ -121,7 +128,7 @@ A Context-Guided Search Strategy in Concolic Testing}, year = 2014, isbn = {9781450330565}, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, url = {https://doi.org/10.1145/2635868.2635872}, doi = {10.1145/2635868.2635872}, booktitle = {Proc. ACM SIGSOFT Int. Symp. Found. Softw. Eng. (FSE)}, @@ -151,9 +158,8 @@ Overfitting in Automated Program Repair}, year = 2015, isbn = {9781450336758}, - publisher = {Association for Computing Machinery}, - url = {https://doi.org/10.1145/2786805.2786825}, - doi = {10.1145/2786805.2786825}, + publisher = {ACM}, + note = {\doi{10.1145/2786805.2786825}}, booktitle = {Proc. Joint Meet. Found. Softw. Eng. (FSE)}, pages = {532--543}, numpages = 12, @@ -218,7 +224,7 @@ and Matias Martinez and Benoit Baudry and Lionel Seinturier}, title = {Repairnator Patches Programs Automatically}, year = 2019, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, volume = 2019, url = {https://doi.org/10.1145/3349589}, doi = {10.1145/3349589}, @@ -269,11 +275,18 @@ pages = {609--620}, note = {\doi{10.1109/ICSE.2017.62}}} +@inbook{atoi, + title = {The Open Group Base Specifications}, + year = 2018, + author = {{IEEE} and {The Open Group}}, + chapter = {{atoi}}, + url = {https://pubs.opengroup.org/onlinepubs/9699919799/functions/atoi.html}} + @article{shadow, author = {Tomasz Kuchta and Hristina Palikareva and Cristian Cadar}, title = {Shadow Symbolic Execution for Testing Software Patches}, year = 2018, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, volume = 27, number = 3, issn = {1049-331X}, @@ -290,9 +303,7 @@ by Learning Aggressive State-Pruning Strategy}, year = 2020, isbn = {9781450370431}, - publisher = {Association for Computing Machinery}, - url = {https://doi.org/10.1145/3368089.3409755}, - doi = {10.1145/3368089.3409755}, + note = {\doi{10.1145/3368089.3409755}}, booktitle = {Proc. Joint Meet. Eur. Softw. Eng. Conf. and ACM Symp. Found. Softw. Eng. (ESEC/FSE)}, pages = {147--158}, @@ -302,12 +313,10 @@ @inproceedings{hydiff, author = {Yannic Noller and Corina S. P\u{a}s\u{a}reanu and Marcel B\"{o}hme and Sun, Youcheng and Nguyen, Hoang Lam and Lars Grunske}, - title = {HyDiff: Hybrid Differential Software Analysis}, + title = {{HyDiff}: Hybrid Differential Software Analysis}, year = 2020, isbn = {9781450371216}, - publisher = {Association for Computing Machinery}, - url = {https://doi.org/10.1145/3377811.3380363}, - doi = {10.1145/3377811.3380363}, + note = {\doi{10.1145/3377811.3380363}}, booktitle = {Proc. ACM/IEEE Int. Conf. Softw. Eng. (ICSE)}, pages = {1273--1285}, numpages = 13, @@ -319,7 +328,7 @@ title = {Abstracting Failure-Inducing Inputs}, year = 2020, isbn = {9781450380089}, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, url = {https://doi.org/10.1145/3395363.3397349}, doi = {10.1145/3395363.3397349}, booktitle = {Proc. ACM SIGSOFT Int. Symp. Softw. Test. and Analysis (ISSTA)}, @@ -336,7 +345,7 @@ for Java Programs}, year = 2020, isbn = {9781450371216}, - publisher = {Association for Computing Machinery}, + publisher = {ACM}, doi = {10.1145/3377811.3380338}, booktitle = {Proc. ACM/IEEE Int. Conf. Softw. Eng. (ICSE)}, pages = {615--627}, @@ -392,7 +401,7 @@ pages = {118--122}, publisher = {{KSI} Research Inc.}, year = 2023, - note = {\doi{https://doi.org/10.18293/SEKE2023-070}}} + note = {\doi{10.18293/SEKE2023-070}}} @inbook{unidiff, author = {David MacKenzie and Paul Eggert and Richard Stallman}, diff --git a/technique.tex b/technique.tex index 470449f..af1fa92 100644 --- a/technique.tex +++ b/technique.tex @@ -333,6 +333,9 @@ If the path constraints of both states are \gls{sat} with at least one difference in outputs, a \gls{difftest} can be generated from the \gls{smt} model. +Halt states are reached naturally at the program's termination or an error. +The latter category also includes sanitizer traps for \gls{ub}. + \begin{algorithm} \caption{\Gls{difftest} generation}\label{diffgen} \begin{algorithmic}[1] @@ -379,8 +382,6 @@ from the \gls{smt} model. \subsection{Decision tree construction} From each \gls{difftest}, a set of clusters of revisions with the same output is constructed. We then superimpose these clustering sets to create -hierarchical clusterings or a decision tree. A minimal tree, i.e~a tree -with the shortest height, is found through exhaustively search though -all subsets of set of clustering sets. - -TODO: analyse complexity +hierarchical clusterings or a decision tree. A minimal tree, i.e.~a tree +with the shortest height, is found by searching through the subsets +of set of clustering sets. -- cgit 1.4.1