\setcounter{page}{1}
\section{Introduction}
Modern software development is largely an iterative and incremental
process~\cite{incremental}.  Because specifications are often weak
in practice, and verification therefore incomplete, each refinement
may involve discovering, eliminating, and introducing software defects,
or \glspl{bug}.  As with other software engineering tasks, effort has
gone into automating this debugging process, including detecting and
locating software faults~\cite{fl-survey} and generating fixes~\cite{apr}.

Progress on \gls{apr} is also greatly hindered by weak
specifications~\cite{apr}.  Because the correctness criteria, typically
a test suite, are incomplete, generated patches may overfit them,
resulting in an incomplete fix and/or new regression
\glspl{bug}~\cite{overfit}.  For the same reason, more than one patch
fitting the specification can often be found.  Contemporary \gls{apr}
tools give no insight into the differences between these patches,
only a ranking of their potential correctness~\cite{apr}.

While this ranking is useful for evaluating the tools themselves, it
offers no certainty about correctness and is therefore an insufficient
guide for choosing \emph{the} desired patch, if any exists.  \Gls{apr}
users must then verify each patch to decide which, if any, to apply.
Recognizing the tedious nature of this process, we work on methods to
highlight the semantic differences between patches, in the form of
\glspl{difftest}.  Existing automated differencing techniques such as
black- or gray-box fuzzing~\cite{nezha} and symbolic
execution~\cite{shadow} have shown promise on pairs of program
revisions.  However, there is little precedent for applying them at
scale, e.g., to many \gls{apr}-generated patches.
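
As a concrete, purely illustrative example of such a \gls{difftest},
consider two hypothetical patches for an off-by-one bounds check; the
snippet below is constructed by us for exposition and is not taken from
any \gls{apr} tool's output.
\begin{verbatim}
#include <stdio.h>

/* Buggy original: the guard admits i == len and reads past the buffer. */
int get(const int *buf, int len, int i) {
    if (i > len) return -1;
    return buf[i];
}

/* Patch A: also reject i == len. */
int get_a(const int *buf, int len, int i) {
    if (i >= len) return -1;
    return buf[i];
}

/* Patch B: clamp the index to the last element instead. */
int get_b(const int *buf, int len, int i) {
    if (i >= len) i = len - 1;
    return buf[i];
}

int main(void) {
    int buf[3] = {10, 20, 30};
    /* Both patches agree with the original on in-bounds accesses,
       but the boundary input i == 3 differentiates them:
       A returns -1, B returns buf[2]. */
    printf("patch A: %d, patch B: %d\n", get_a(buf, 3, 3), get_b(buf, 3, 3));
    return 0;
}
\end{verbatim}
Both patches eliminate the out-of-bounds read and are indistinguishable
on in-bounds accesses, yet the boundary input differentiates them:
patch A rejects it while patch B silently returns the last element.
A \gls{difftest} makes exactly this kind of divergence explicit to
the developer.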

We consider symbolic execution a particularly promising direction, as it
works directly with path constraints, which can be combined and
manipulated to reveal semantic differences (a brief sketch of one such
formulation follows the contribution list below).  In this research, we
explore symbolic execution for mass differential testing, with the
ultimate goal of making it easier for developers to decide among
automatically generated patches.  The main contributions of this work
are:
\begin{enumerate}
  \item The introduction of semantic difference mining over multiple
    program revisions as a semi-automated pipeline, which communicates
    the reasoning behind each patch in the form of a decision tree
  \item A tool named \psychic{} implementing this process for C programs,
    handling platform-specific details for broader applicability
  \item Preliminary results on its performance on automatically
    generated patches for small programs
\end{enumerate}
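
To make the path-constraint view concrete, the following is a minimal,
standard formulation of differential symbolic execution over a pair of
patches, given here only as an illustration; the notation is ours and
the exact encoding used by \psychic{} may differ.  For two patches $A$
and $B$ of the same program, let $\mathit{pc}_A(x)$ and $\mathit{pc}_B(x)$
denote path constraints collected by symbolically executing each patch
on input $x$, and let $\mathit{out}_A(x)$ and $\mathit{out}_B(x)$ denote
the corresponding symbolic outputs.  A \gls{difftest} then corresponds
to a satisfying assignment of
\[
  \mathit{pc}_A(x) \;\land\; \mathit{pc}_B(x) \;\land\;
  \mathit{out}_A(x) \neq \mathit{out}_B(x),
\]
that is, an input that both patches accept along some pair of paths but
on which their observable behavior diverges.  Enumerating such inputs
across many patch pairs is one way to realize the semantic difference
mining described above.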