From 1540c49161fc4f9fc075d3fd978f682f8f041dac Mon Sep 17 00:00:00 2001 From: Nguyễn Gia Phong Date: Tue, 12 Dec 2023 11:56:49 +0900 Subject: Proof read for the last time --- bg.tex | 4 ++-- eval.tex | 4 ++-- glossary.tex | 2 +- intro.tex | 4 ++-- outro.tex | 2 +- technique.tex | 20 ++++++++++---------- 6 files changed, 18 insertions(+), 18 deletions(-) diff --git a/bg.tex b/bg.tex index b1845d4..f8b759f 100644 --- a/bg.tex +++ b/bg.tex @@ -9,7 +9,7 @@ At a high level, symbolic execution is an interpreter loop. Every iteration, it selects a symbolic process, also known as a \emph{\gls{state}}, in whose context an instruction is then executed~\cite{klee}. As shown in algorithm~\ref{symb-exec}, -each path with possible \gls{sat} of the given program is explore. +each path with possible \gls{sat} of the given program is explored. The number of paths grows at an exponential rate, so the interpreter loop also breaks at the exhaustion of the time budget~\cite{klee}, which is omitted from the algorithm for brevity. @@ -67,7 +67,7 @@ for files (including standard streams) and command-line arguments or via in-source annotations~\cite{klee-docs}. -As shown in algorithm~\ref{symb-exec}, the path constraints is only expanded +As shown in algorithm~\ref{symb-exec}, the path constraints are only expanded at branching instructions, so only values affecting the control flow are tracked, since \klee{} targets the exploration of execution paths. In~order to track other values of interest, i.e.~non-inputs, a condition diff --git a/eval.tex b/eval.tex index e77a4ac..98358b7 100644 --- a/eval.tex +++ b/eval.tex @@ -17,7 +17,7 @@ In addition to the test cases provided by \textsc{IntroClass}, which are non-exhaustive of the input domains, we use property tests~\cite{hypothesis} with the respective ground truth program\footnote{\url{https://github.com/McSinyx/IntroClass}} -to select buggy programs. MSV then generate fixes in form of meta programs, +to select buggy programs. MSV then generate fixes in the form of meta programs, which are then post-processed into \psychic{}-compatible format. Due to the limitation of \klee{}'s POSIX runtime, \texttt{libc} calls @@ -57,7 +57,7 @@ the tool successfully differentiate all semantically different patches. \bottomrule \end{tabular} \end{center} - \caption{Decision tree quality on patch pool generated by MSV + \caption{Decision tree quality on the patch pool generated by MSV for \textsc{IntroClass} programs. Respective to each meta program with $n$ revisions are the height of the patch decision tree and the maximum number of undistinguished revisions.}\label{result} diff --git a/glossary.tex b/glossary.tex index 83fc900..91ee422 100644 --- a/glossary.tex +++ b/glossary.tex @@ -3,7 +3,7 @@ description={is a software defect that results in undesired behaviors}} \newglossaryentry{difftest}{name={differential test}, description={is an input and a collection of outputs of two or more programs - of the same purpose, such that each output is pairwise different}} + of the same purpose, such that each output is pair-wise different}} \newglossaryentry{state}{name=state, description={or \emph{symbolic process}, includes a program counter, an execution environment (a register file, a stack frame, diff --git a/intro.tex b/intro.tex index 120b9b3..1f15e6e 100644 --- a/intro.tex +++ b/intro.tex @@ -24,7 +24,7 @@ of this process, we work on methods to highlight the semantic difference between patches, in the form of \glspl{difftest}. Existing automation techniques for differentiation such as black- or gray-box fuzzing~\cite{nezha} and symbolic execution~\cite{shadow} has shown promise on pairs -of program revisions. However, there is a lack in precedents in doing so +of program revisions. However, there is a lack of precedents in doing so at scale like for multiple \gls{apr}-generated patches. Symbolic execution is observed as one promising direction as it works directly @@ -38,7 +38,7 @@ of this work are: program revisions as a semi-automated pipeline, for communicating the reasoning behind each patch in form of a decision tree \item A tool named \psychic{} implementing this process for C programs - and handling platform specificities for the generalization of application + and handling platform specificities for applicative generalization \item Preliminary results of its performance on patches automatically generated for small programs \end{enumerate} diff --git a/outro.tex b/outro.tex index d8dc375..ac1203d 100644 --- a/outro.tex +++ b/outro.tex @@ -1,5 +1,5 @@ \section{Conclusion} -This work introduced an differential testing technique +This work introduced a differential testing technique for multiple program variants at once, implemented as an extension named \psychic{} on top of the symbolic execution engine \klee{}. It took into account edge cases to lay out a groundwork to be easily used diff --git a/technique.tex b/technique.tex index af1fa92..63b7ea1 100644 --- a/technique.tex +++ b/technique.tex @@ -13,7 +13,7 @@ to generate \glspl{difftest}. Third, the new conditions the combined are assertions on the distinction of corresponding values across revisions, so we offer multiple options to capture such \emph{\glspl{symout}}. -Next, we introduces concrete execution to reduce the theorem proving load, +Next, we introduce concrete execution to reduce the theorem proving load, and produce \glspl{difftest} for clustering the patches and finally produce a decision tree. One may observe that the clustering information can help with selecting more ``promising'' states, although such scheduling @@ -161,7 +161,7 @@ used for shadow symbolic execution. Unlike \textsc{Shadow} which tests only one patch~\cite{shadow} to reveal its regressions, though, the goal of \psychic{} is to find the difference in behavior between multiple patches. Therefore, the former's annotation -at level of expressions does not meet our expectation on expressivity. +at the level of expressions does not meet our expectation on expressivity. Instead, we implement higher-order branching. Annotating patches at block level simplifies the generation process and improves the readability of the meta program. @@ -195,7 +195,7 @@ and improves the readability of the meta program. For instance, given the two unified diffs~\cite{unidiff} of a bug fix from upstream \texttt{grep}, and a patch in \textsc{DBGBench}~\cite{dbgbench} -in figure~\ref{grep-8f08d8e2},\footnote{The snippets have been outdented +in figure~\ref{grep-8f08d8e2},\footnote{These snippets have been outdented to fit within the width of this page.} one can construct the relevant part of the meta program in figure~\ref{grep-8f08d8e2-meta}, where \verb|__klee_meta| parses the environment for the choice of revision when executed @@ -227,7 +227,7 @@ if (__klee_meta(3) == \textcolor{purple}{0}) \verb|{| One may notice that the meta program may have any level of granularity. \psychic{} utilizes the meta program to maximize instructions -and path constraints sharing, which effectively means to defer +and path constraints that are shared, which effectively means to defer the divergence point deep into the control flow. For its intended use on automatically generated patches whose hunks are commonly short, optimizing the meta program construction is not a major issue. @@ -237,7 +237,7 @@ higher-order branching patch identifier, via basic tree pattern matching% ~\cite{tree-pattern} and skip adding its condition to the children states. Instead, those states carry the revision number outside of their path constraints for constant-time access when pruning paths -going though multiple patches in algorithm~\ref{apply-meta}. +going through multiple patches in algorithm~\ref{apply-meta}. \begin{algorithm} \caption{Construction of possible states after a patch location} @@ -247,7 +247,7 @@ going though multiple patches in algorithm~\ref{apply-meta}. \State $n' \gets \Call{RevisionNumber}{\varphi}$ \State $s_a = (i_a, e, \varPhi, n)$ \Comment{assume the lack of meta branch nesting - and always follow alternative path} + and always follow alternative paths} \If{$n = 0$} \Return $\{(i_c, e, \varPhi, n'), s_a\}$ \Comment{$i_c$ enters patch $n'$ } @@ -288,7 +288,7 @@ or automatically~\cite{fl-eval}, output annotation has limited exhaustiveness. To balance between manual annotation and computational scalability, methods \ref{ret} and \ref{parg} are introduced to automatically capture output more precisely at the end of every interested function. Debugging information -forwarded from compiler frontend are kept in bitcodes, and they are used +forwarded from compiler frontend are kept in bitcodes. They are used to limit the subroutines to be monitored, for example in algorithm~\ref{fnout}, the file that is patched by an \gls{apr} tool. @@ -314,7 +314,7 @@ the file that is patched by an \gls{apr} tool. \end{algorithmic} \end{algorithm} -Each \gls{symout} then is given an unique name for later comparison. +Each \gls{symout} then is given a unique name for later comparison. For the sake of simplicity, we name them after the capturing method and associated identifier, i.e.~variable or subroutine. @@ -328,7 +328,7 @@ and the conjunction of their path conditions is satisfiable. Currently \psychic{} only compare halt states to minimize memory usage and \gls{smt} solving cost. Such \gls{smt} formula is constructed in algorithm~\ref{diffgen} by first append a suffix to each output name, -then for the common ones assert for possible distinction between two paths. +then for the common ones assert for a possible distinction between two paths. If the path constraints of both states are \gls{sat} with at least one difference in outputs, a \gls{difftest} can be generated from the \gls{smt} model. @@ -384,4 +384,4 @@ From each \gls{difftest}, a set of clusters of revisions with the same output is constructed. We then superimpose these clustering sets to create hierarchical clusterings or a decision tree. A minimal tree, i.e.~a tree with the shortest height, is found by searching through the subsets -of set of clustering sets. +of the set of clustering sets. -- cgit 1.4.1