about summary refs log tree commit diff homepage
path: root/www/Tutorial-2.html
blob: 47187a454b58e3ae48f36d2d602860aa4291e65c (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
  <title>KLEE - Tutorial Two</title>
  <link type="text/css" rel="stylesheet" href="menu.css" />
  <link type="text/css" rel="stylesheet" href="content.css" />
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<h1>Tutorial Two: Testing a Simple Regular Expression Library</h1>

<p>This is an example of using KLEE to test a simple regular expression matching
function. You can find the basic example in the source tree
under <tt>examples/regexp</tt>.</p>

<p><tt>Regexp.c</tt> contains a simple regular expression matching function, and
the bare bones testing harness (in <tt>main</tt>) needed to explore this code
with klee. You can see a version of the source
code <a href="resources/Regexp.c.html">here</a>.</p>

<p>This example will show to build and run the example using KLEE, as well as
how to interpret the output, and some additional KLEE features that can be used
when writing a test driver by hand.</p>

<p>We'll start by showing how to build and run the example, and then explain how
the test harness works in more detail.</p>

<h2>Building the example</h2>

<p>The first step is to compile the source code using a compiler which can
generate object files in LLVM bitcode format. Here we use <tt>llvm-gcc</tt>,
but <a href="http://clang.llvm.org">Clang</a> works just as well!</p>

<p>From within the <tt>examples/regexp</tt> directory:
<div class="instr">
  <b>$ llvm-gcc -I ../../include -emit-llvm -c -g Regexp.c</b>
</div>
which should create a <tt>Regexp.o</tt> file in LLVM bitcode
format. The <tt>-I</tt> argument is used so that the compiler can
find <a href="http://t1.minormatter.com/~ddunbar/klee-doxygen/klee_8h-source.html">"klee/klee.h"</a>,
which contains definitions for the intrinsic functions used to interact with the
KLEE virtual machine. <tt>-c</tt> is used because we only want to compile the
code to an object file (not a native executable), and finally <tt>-g</tt> causes
additional debug information to be stored in the object file, which KLEE will
use to determine source line number information.</p>

<p>If you have the LLVM tools installed in your path, you can verify that this step
worked by running <tt>llvm-nm</tt> on the generated file:</p>
<div class="instr">
<pre>
  <b>$ llvm-nm Regexp.o</b>
         t matchstar
         t matchhere
         T match
         T main
         U klee_make_symbolic_name
         d LC
         d LC1
</pre>
</div>

<p>Normally before running this program we would need to link it to create a
native executable. However, KLEE runs directly on LLVM bitcode files -- since
this program only has a single file there is no need to link. For "real"
programs with multiple inputs,
the <a href="http://llvm.org/cmds/llvm-link.html"><tt>llvm-link</tt></a>
and <a href="http://llvm.org/cmds/llvm-ld.html"><tt>llvm-ld</tt></a> tools can
be used in place of the regular link step to merge multiple LLVM bitcode files
into a single module which can be executed by KLEE.</p>

<h2>Executing the code with KLEE</h2>

<!-- FIXME: Make only-output-states-covering-new default -->
<p>The next step is to execute the code with KLEE:</p>
<div class="instr">
<pre>
<b>$ klee --only-output-states-covering-new Regexp.o</b>
KLEE: output directory = "klee-out-1"
KLEE: ERROR: .../klee/examples/regexp/Regexp.c:23: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
KLEE: ERROR: .../klee/examples/regexp/Regexp.c:25: memory error: out of bound pointer
KLEE: NOTE: now ignoring this error at this location
KLEE: done: total instructions = 6334861
KLEE: done: completed paths = 7692
KLEE: done: generated tests = 22
</pre>
</div>

<p>On startup, KLEE prints the directory used to store output (in this
case <tt>klee-out-1</tt>). By default klee will use the first
free <tt>klee-out-<em>N</em></tt> directory and also create a <tt>klee-last</tt>
symlink which will point to the most recent created directory. You can specify a
directory to use for outputs using the <tt>-output-dir=<em>path</em></tt>
command line argument.</p>

<p>While KLEE is running, it will print status messages for "important" events,
for example when it finds an error in the program. In this case, KLEE detected
to invalid memory accesses on lines 23 and 25 of our test program. We'll look
more at this in a moment.</p>

<p>Finally, when KLEE finishes execution it prints out a few statistics about
the run. Here we see that KLEE executed a total of ~6 million instructions,
explored 7,692 paths, and generated 22 test cases.</p>

<p>Note that many realistic programs have an infinite (or extremely large)
number of paths through them, and it is common that KLEE will not terminate. By
default KLEE will run until the user presses Control-C (i.e. <tt>klee</tt> gets
a SIGINT), but there are additional options to limit KLEE's runtime and memory
usage:<p>
<ul>
  <li><tt>-max-time=<em>seconds</em></tt>: Halt execution after the given number
  of seconds.</li>
  <li><tt>-max-forks=<em>N</em></tt>: Stop forking after <em>N</em> symbolic
  branches, and run the remaining paths to termination.</li>
  <li><tt>-max-memory=<em>N</em></tt>: Try to limit memory consumption
  to <em>N</em> megabytes.</li>
</ul>
</p>

<h2>KLEE error reports</h2>

<p>When KLEE detects an error in the program being executed it will generate a
test case which exhibits the error, and write some additional information about
the error into a file <tt>test<i>N</i>.<i>TYPE</i>.err</tt>, where <i>N</i> is
the test case number, and <i>TYPE</i> identifies the kind of error. Some
types of errors KLEE detects include:</p>

<ul>
  <li><b>ptr</b>: Stores or loads of invalid memory locations.</li>

  <li><b>free</b>: Double or invalid <tt>free()</tt>.</li>

  <li><b>abort</b>: The program called <tt>abort()</tt>.</li>

  <li><b>assert</b>: An assertion failed.</li>

  <li><b>div</b>: A division or modulus by zero was detected.</li>

  <li><b>user</b>: There is a problem with the input (invalid <tt>klee</tt>
  intrinsic calls) or the way KLEE is being used.</li>

  <li><b>exec</b>: There was a problem which prevented KLEE from executing the
  program; for example an unknown instruction, a call to an invalid function
  pointer, or inline assembly.</li>

  <li><b>model</b>: KLEE was unable to keep full precision and is only exploring
  parts of the program state. For example, symbolic sizes to <tt>malloc</tt> are
  not currently supported, in such cases KLEE will concretize the argument.</li>
</ul>

<p>KLEE will print a message to the console when it detects an error, in the
test run above we can see that KLEE detected two memory errors. For all program
errors, KLEE will write a simple backtrace into the <tt>.err</tt> file. This is
what one of the errors above looks like:</p>
<div class="instr">
<pre>
Error: memory error: out of bound pointer
File: .../klee/examples/regexp/Regexp.c
Line: 23
Stack: 
	#0 00000146 in matchhere (re=14816471, text=14815301) at .../klee/examples/regexp/Regexp.c:23
	#1 00000074 in matchstar (c, re=14816471, text=14815301) at .../klee/examples/regexp/Regexp.c:16
	#2 00000172 in matchhere (re=14816469, text=14815301) at .../klee/examples/regexp/Regexp.c:26
	#3 00000074 in matchstar (c, re=14816469, text=14815301) at .../klee/examples/regexp/Regexp.c:16
	#4 00000172 in matchhere (re=14816467, text=14815301) at .../klee/examples/regexp/Regexp.c:26
	#5 00000074 in matchstar (c, re=14816467, text=14815301) at .../klee/examples/regexp/Regexp.c:16
	#6 00000172 in matchhere (re=14816465, text=14815301) at .../klee/examples/regexp/Regexp.c:26
	#7 00000231 in matchhere (re=14816464, text=14815300) at .../klee/examples/regexp/Regexp.c:30
	#8 00000280 in match (re=14816464, text=14815296) at .../klee/examples/regexp/Regexp.c:38
	#9 00000327 in main () at .../klee/examples/regexp/Regexp.c:59
Info: 
	address: 14816471
	next: object at 14816624 of size 4
	prev: object at 14816464 of size 7
</pre>
</div>

<p>Each line of the backtrace lists the frame number, the instruction line (this
is the line number in the <tt>assembly.ll</tt> file found along with the run
output), the function and arguments (including values for the concrete
parameters), and the source information.</p>

<p>Particular error reports may also include additional information. For memory
errors, KLEE will show the invalid address, and what objects are on the heap
both before and after that address. In this case, we can see that the address
happens to be exactly one byte past the end of the previous object.</p>

<h2>Changing the test harness</h2>

<p>The reason KLEE is finding memory errors in this program isn't because the
regular expression functions have a bug, rather it indicates a problem in our
test driver. The problem is that we are making the input regular expression
buffer completely symbolic, but the <tt>match</tt> function expects it to be a
null terminated string. Let's look at two ways we can fix this.</p>

<p>The simplest way to fix this problem is to store <tt>'\0'</tt> at the end of
the buffer, after making it symbolic. This makes our driver look like this:</p>
<div class="instr">
<pre>
int main() {
  // The input regular expression.
  char re[SIZE];
  
  // Make the input symbolic. 
  klee_make_symbolic(re, sizeof re, "re");
  re[SIZE - 1] = '\0';

  // Try to match against a constant string "hello".
  match(re, "hello");

  return 0;
}
</pre>
</div>
<p>Making a buffer symbolic just initializes the contents to refer to symbolic
variables, we are still free to modify the memory as we wish. If you recompile
and run <tt>klee</tt> on this test program, the memory errors should now be
gone.</p>

<p>Another way to accomplish the same effect is to use the <tt>klee_assume</tt>
intrinsic function. <tt>klee_assume</tt> takes a single argument (an unsigned
integer) which generally should some kind of conditional expression, and
"assumes" that expression to be true on the current path (if that can never
happen, i.e. the expression is provably false, KLEE will report an error).</pp>

<p>We can use <tt>klee_assume</tt> to cause KLEE to only explore states where
the string is null terminated by writing the driver like this:</p>
<div class="instr">
<pre>
int main() {
  // The input regular expression.
  char re[SIZE];
  
  // Make the input symbolic. 
  klee_make_symbolic(re, sizeof re, "re");
  klee_assume(re[SIZE - 1] == '\0');

  // Try to match against a constant string "hello".
  match(re, "hello");

  return 0;
}
</pre>
</div>
<p>In this particular example, both solutions work fine, but in
general <tt>klee_assume</tt> is more flexible:</p>
<ul>
  <li>By explicitly declaring the constraint, this will force test cases to have
  the <tt>'\0'</tt> in them. In the first example where we write the terminating
  null explicitly, it doesn't matter what the last byte of the symbolic input is
  and KLEE is free to generate any value. In some cases where you want to
  inspect the test cases by hand, it is more convenient for the test case to
  show all the values that matter.</li>

  <li><tt>klee_assume</tt> can be used to encode more complicated
  constraints. For example, we could use <tt>klee_assume(re[0] != '^')</tt> to
  cause KLEE to only explore states where the first byte is
  not <tt>'^'</tt>.</li>
</ul>

<p><b>NOTE</b>: One important caveat when using <tt>klee_assume</tt> with
multiple conditions; remember that boolean conditionals like '&&' and '||' may
be compiled into code which branches before computing the result of the
expression. In such situations KLEE will branch the process *before* it reaches
the call to <tt>klee_assume</tt>, which may result in exploring unnecessary
additional states. For this reason it is good to use as simple expressions as
possible to <tt>klee_assume</tt> (for example splitting a single call into
multiple ones), and to use the '&' and '|' operators instead of the
short-circuiting ones.</p>

<!--
<h2>Visualizing what KLEE explored<h2>

<p>For large or long running programs, it is frequently useful to know exactly
what code KLEE explored, where it spent its time, where branches were taken,
and so on.</p>
-->

</div>
</body>
</html>