diff options
author | Nguyễn Gia Phong <cnx@loang.net> | 2023-12-11 13:34:41 +0900 |
---|---|---|
committer | Nguyễn Gia Phong <cnx@loang.net> | 2023-12-11 13:34:41 +0900 |
commit | 210ddf92aab09acc796b2a7fe34ba3bd983cbd62 (patch) | |
tree | e14507055b594dddf22c5f44fc2f25b356902d6d /eval.tex | |
parent | 0f6b9207d5380eefb3a99d21a12751a32de63bfb (diff) | |
download | thesis-210ddf92aab09acc796b2a7fe34ba3bd983cbd62.tar.gz |
Finish content
Diffstat (limited to 'eval.tex')
-rw-r--r-- | eval.tex | 68 |
1 files changed, 55 insertions, 13 deletions
diff --git a/eval.tex b/eval.tex index cfd76ab..63fb11b 100644 --- a/eval.tex +++ b/eval.tex @@ -20,33 +20,40 @@ ground truth program\footnote{\url{https://github.com/McSinyx/IntroClass}} to select buggy programs. MSV then generate fixes in form of meta programs, which are then post-processed into \psychic{}-compatible format. +Due to the limitation of \klee{}'s POSIX runtime, \texttt{libc} calls +such as \texttt{scanf} has to be replaced by more direct access +of standard input or command-line arguments. Unless the logic is outside +of the \texttt{main} function, an output value is manually annotated. + A selection of 10 meta programs with at least two semantically different revisions (including the original buggy version) are then fed to \psychic{} to generate differential tests and decision trees with a timeout of 10 minutes. +The vast majority of these bugs are either simple logical mistakes +or missing initializations that trigger \glspl{ub}. \subsection{Results} -Experiment results on \textsc{IntroClass} is shown in table~\ref{result}. -In case of \texttt{digits-1b31@2} and \texttt{smallest-} \texttt{d25c@1}, +Experiment results on \textsc{IntroClass} is summarized in table~\ref{result}. +In case of \texttt{digits} \texttt{1b31@2} and \texttt{smallest} \texttt{d25c@1}, \psychic{} failed to generate differential tests. In other cases, -except for the timeout in \texttt{digits-} \texttt{0cdf@3}, +except for the timeout in \texttt{digits} \texttt{0cdf@3}, the tool successfully differentiate all semantically different patches. \begin{table}[h] \begin{center} \begin{tabular}{l c l r r r r} \toprule - Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time\\ + Task & Attempt & Bug type & $n$ & Tree height & Largest cluster & Time (s)\\ \midrule \texttt{checksum} & \texttt{3b23@3} & Arithmetic & 2 & 1 & 1 & 1.2\\ \texttt{checksum} & \texttt{cb24@3} & Logic & 3 & 1 & 2 & 1.7\\ \texttt{digits} & \texttt{0cdf@3} & Logic & 16 & 1 & 5 & 600.0\\ \texttt{digits} & \texttt{1b31@2} & Logic & 5 & 0 & 5 & 0.9\\ - \texttt{grade} & \texttt{1b31@3} & UB & 2 & 1 & 1 & 0.8\\ - \texttt{grade} & \texttt{317a@3} & Logic & 8 & 2 & 5 & 1.0\\ - \texttt{grade} & \texttt{b192@3} & UB & 2 & 1 & 1 & 0.5\\ - \texttt{median} & \texttt{97f6@3} & UB & 7 & 1 & 6 & 1.6\\ - \texttt{smallest} & \texttt{0704@2} & UB & 5 & 1 & 3 & 87.0\\ - \texttt{smallest} & \texttt{d25c@1} & UB & 5 & 0 & 5 & 1.3\\ + \texttt{grade} & \texttt{1b31@3} & \gls{ub} & 2 & 1 & 1 & 0.8\\ + \texttt{grade} & \texttt{317a@3} & \gls{ub} & 8 & 3 & 5 & 1.0\\ + \texttt{grade} & \texttt{b192@3} & \gls{ub} & 2 & 1 & 1 & 0.5\\ + \texttt{median} & \texttt{97f6@3} & \gls{ub} & 7 & 1 & 6 & 1.6\\ + \texttt{smallest} & \texttt{0704@2} & \gls{ub} & 5 & 1 & 3 & 87.0\\ + \texttt{smallest} & \texttt{d25c@1} & \gls{ub} & 5 & 0 & 5 & 1.3\\ \bottomrule \end{tabular} \end{center} @@ -56,6 +63,41 @@ the tool successfully differentiate all semantically different patches. and the maximum number of undistinguished revisions.}\label{result} \end{table} -The stark difference in execution time reveals an inherent weakness -of our approach: heavily-nested conditions or loops increase -the search space dramatically. +\psychic{} took orders of magnitude more time on \texttt{digits} \texttt{0cdf@3} +and \texttt{smallest} \texttt{d25c@1}, which respectively include a loop +and heavily-nested conditions. + +\begin{figure} + \begin{tikzpicture}[mymatrix/.style={matrix of nodes, + nodes=typetag,row sep=1ex}, + typetag/.style={draw,anchor=west}, + title/.style={draw=none,inner sep=0pt}, + line width=0.2ex] + \node[draw,diamond,aspect=4] (root) {Input: 6 3 2 1 0} + \node[draw,diamond,aspect=4,below left=6em of root] + (left) {Input: 66 7 6 2 1} + \draw[-angle 45] (root) to node[above,sloped]{Output: \gls{ub}} (left); + \node[draw,diamond,aspect=4,below right=6em of root] + (right) {Input: 6 3 2 1 1} + \draw[-angle 45] (root) to node[above,sloped]{Output: F} (right); + \node[draw,below left=6em of left] (i) {Original} + \node[draw,right=5em of i] (d) {Patches 108, 147, 187, 227, 267} + \node[draw,right=5em of d] (a) {Patch 10} + \node[draw,right=5em of a] (b) {Patch 307} + \draw[-angle 45] (left) to node[above,sloped]{Output: \gls{ub}} (i); + \draw[-angle 45] (left) to node[above,sloped]{Output: F} (d); + \draw[-angle 45] (right) to node[above,sloped]{Output: D} (a); + \draw[-angle 45] (right) to node[above,sloped]{Output: F} (b); + \end{tikzpicture} + \caption{Decision tree generated from patches of \texttt{grade} + \texttt{317a@3}, with outputs minimized.}\label{317a} +\end{figure} + +To demonstrate the decision tree, we examine the result for \texttt{grade} +\texttt{317a@3}. The task is to calculate the letter grade, +given the threshold for each grade from A to D, followed by a percentage grade. +This attempt is buggy because it misses the initialization the result +grade variable, which is inserted in various locations by MSV. +The decision tree generated from these patches is illustrated +in figure~\ref{317a}, from which patch number 10 can be deduced to be +the most correct one. |