summary refs log tree commit diff homepage
path: root/bg.tex
blob: f8b759f00527e584592d42d3e32c65a52e908e20 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
\section{Background}
As mentioned earlier, our technique is based on the symbolic execution,
an engine named \klee{} in particular.  Their general principles
and some specifics are laid out here to make it less ambiguous to describe
our approach later on.

\subsection{Symbolic execution}
At a high level, symbolic execution is an interpreter loop.
Every iteration, it selects a symbolic process,
also known as a \emph{\gls{state}}, in whose context an instruction
is then executed~\cite{klee}.  As shown in algorithm~\ref{symb-exec},
each path with possible \gls{sat} of the given program is explored.
The number of paths grows at an exponential rate, so the interpreter loop
also breaks at the exhaustion of the time budget~\cite{klee},
which is omitted from the algorithm for brevity.

\begin{algorithm}
  \caption{Symbolic execution}\label{symb-exec}
  \begin{algorithmic}[1]
    \Require Program ($P$)
    \Ensure Set of test cases ($T$)
    \State $T \gets \varnothing$
    \State $S \gets \Call{InitialState}{P}$ \Comment{set of states}
    \While{$S \neq \varnothing$}
      \State $s \gets \Call{Select}{S}$
      \State $S \gets S \setminus \{s\}$
      \State $(i, e, \varPhi) \gets s$
        \Comment{instruction, environment, and path constraints}
      \State $(i', e') \gets \Call{Execute}{i, e}$
      \If{$\Call{IsHalt}{i}$}
        \State $t \gets \Call{GenerateTest}{\varPhi}$
        \If{$t \ne \bot$}
          $T \gets T \cup \{t\}$
        \EndIf
      \ElsIf{$\Call{IsBranch}{i}$}
        \State $(\varphi, i_c', i_a') \gets i'$
          \Comment{condition, consequent, and alternative}
        \If{$\Call{sat}{\varPhi \land \varphi}$}
          \State $s_c' \gets (i_c', e', \varPhi \land \varphi)$
          \State $S \gets S \cup \{s_c'\}$
        \EndIf
        \If{$\Call{sat}{\varPhi \land \lnot\varphi}$}
          \State $s_a' \gets i_a', e', \varPhi \land \lnot\varphi$
          \State $S \gets S \cup \{s_a'\}$
        \EndIf
      \Else
        \State $s' \gets (i', e', \varPhi)$
        \State $S \gets S \cup \{s'\}$
      \EndIf
    \EndWhile
  \end{algorithmic}
\end{algorithm}
\pagebreak

A \gls{sat} problem involving background theories like arithmetic
in computer programs is known as a~\gls{smt} problem, a generalization
of the Boolean \gls{sat} problem~\cite{smt}, which is already NP-complete.
Under the exponential time hypothesis, the cost of solving
an arbitrary \gls{sat} problem grows exponentially with the number
of variables~\cite{k-sat}.

\subsection{Usage of \klee{}}\label{klee-use}
The symbolic execution engine \klee{} analyses the LLVM \gls{ir} of a program,
also known as its bitcode~\cite{bc}, starting from values marked as symbolic.
Marking can be communicated either through command-line options
for files (including standard streams) and command-line arguments
or via in-source annotations~\cite{klee-docs}.


As shown in algorithm~\ref{symb-exec}, the path constraints are only expanded
at branching instructions, so only values affecting the control flow
are tracked, since \klee{} targets the exploration of execution paths.
In~order to track other values of interest, i.e.~non-inputs, a condition
involving such value must be artificially injected.  One common idiom for doing
this condition injection~\cite{angelix,cpr,trident} is shown
in figure~\ref{klee-output}.  There, a~temporary symbolic avatar \texttt{var}
is created for the sole purpose of being fixed to the value \texttt{val}
in the path constraints.  From here, this mechanism shall be referred to
in abstract descriptions as \textsc{SymbolicOutput(value, context)}.

\begin{figure}[h]\centering\hfill
  \begin{subfigure}{0.6\textwidth}
\begin{verbatim}
#define KLEE_OUTPUT_GEN(T)                  \
T __klee_output_##T(T val, ...)    { T var; \
  klee_make_symbolic(&var, sizeof(T), ...); \
  klee_assume(var == val);      return val; }
KLEE_OUTPUT_GEN(int) ...
#define KLEE_OUTPUT(T, val, ...) \
  __klee_output_##T((val), ...)
\end{verbatim}
    \subcaption{}\label{klee-output-def}
  \end{subfigure}\hfill
  \begin{subfigure}{0.3\textwidth}
\begin{alltt}
 int foo;
 ...
\textcolor{purple}{-bar(foo);}
\textcolor{teal}{+bar(KLEE_OUTPUT(foo));}\end{alltt}
    \subcaption{}\label{klee-output-use}
  \end{subfigure}\hfill
  \captionsetup{subrefformat=parens}
  \caption{Condition injection \subref{klee-output-def} implementation in C
           and \subref{klee-output-use} example annotation.}\label{klee-output}
\end{figure}