From 0192b63f304c75f225d7ad13c1ee348597195c77 Mon Sep 17 00:00:00 2001 From: Daniel Dunbar Date: Wed, 27 May 2009 16:03:03 +0000 Subject: Move the regular expression example to "Tutorial Two". git-svn-id: https://llvm.org/svn/llvm-project/klee/trunk@72472 91177308-0d34-0410-b5e6-96231b3b80d8 --- www/Examples.html | 134 ------------------------ www/Tutorial-2.html | 287 ++++++++++++++++++++++++++++++++++++++++++++++++++++ www/menu.html.incl | 4 +- www/tutorial-1.html | 4 +- www/tutorials.html | 8 +- 5 files changed, 297 insertions(+), 140 deletions(-) delete mode 100644 www/Examples.html create mode 100644 www/Tutorial-2.html (limited to 'www') diff --git a/www/Examples.html b/www/Examples.html deleted file mode 100644 index 25ee6fb0..00000000 --- a/www/Examples.html +++ /dev/null @@ -1,134 +0,0 @@ - - - - - KLEE - Examples - - - - - - - -
- -

KLEE Examples

- -

Here is a basic example of using KLEE to test a simple regular expression matching function. You can find the basic example in the source tree under examples/regexp.

- -

Example: Regexp.c

- -

Regexp.c contains a simple regular expression matching function, and -the bare bones testing harness (in main) need to explore this code with -klee. You can see a version of the source -code here.

- -

This example will show to build and run the example using KLEE, as well as -how to interpret the output, and some additional KLEE features that can be used -when writing a test driver by hand.

- -

We'll start by showing how to build and run the example, and then explain how -the test harness works in more detail.

- -

Step 1: Building the example

- -

The first step is to compile the source code using a compiler which can -generate object files in LLVM bitcode format. Here we use llvm-gcc, -but Clang works just as well!

- -

From within the examples/regexp directory: -

- $ llvm-gcc -I ../../include -emit-llvm -c -g Regexp.c -
-which should create a Regexp.o file in LLVM bitcode -format. The -I argument is used so that the compiler can -find "klee/klee.h",which -contains definitions for the intrinsic functions used to interact with the KLEE -virtual machine. -c is used because we only want to compile the code to -an object file (not a native executable), and finally -g causes -additional debug information to be stored in the object file, which KLEE will -use to determine source line number information.

- -

If you have the LLVM tools installed in your path, you can verify that this step -worked by running llvm-nm on the generated file:

-
-
-  $ llvm-nm Regexp.o
-         t matchstar
-         t matchhere
-         T match
-         T main
-         U klee_make_symbolic_name
-         d LC
-         d LC1
-
-
- -

Normally before running this program we would need to link it to create a -native executable. However, KLEE runs directly on LLVM bitcode files -- since -this program only has a single file there is no need to link. For "real" -programs with multiple inputs, -the llvm-link -and llvm-ld tools can -be used in place of the regular link step to merge multiple LLVM bitcode files -into a single module which can be executed by KLEE.

- -

Step 2: Executing the code with KLEE

- - -

The next step is to execute the code with KLEE:

-
-
-$ klee --only-output-states-covering-new Regexp.o
-KLEE: output directory = "klee-out-1"
-KLEE: ERROR: .../klee/examples/regexp/Regexp.c:23: memory error: out of bound pointer
-KLEE: NOTE: now ignoring this error at this location
-KLEE: ERROR: .../klee/examples/regexp/Regexp.c:25: memory error: out of bound pointer
-KLEE: NOTE: now ignoring this error at this location
-KLEE: done: total instructions = 6334861
-KLEE: done: completed paths = 7692
-KLEE: done: generated tests = 22
-
-
- -

On startup, KLEE prints the directory used to store output (in this -case klee-out-1). By default klee will use the first -free klee-out-N directory and also create a klee-last -symlink which will point to the most recent created directory. You can specify a -directory to use for outputs using the -output-dir=path -command line argument.

- -

While KLEE is running, it will print status messages for "important" events, -for example when it finds an error in the program. In this case, KLEE detected -to invalid memory accesses on lines 23 and 25 of our test program. We'll look -more at this in a moment.

- -

Finally, when KLEE finishes execution it prints out a few statistics about -the run. Here we see that KLEE executed a total of ~6 million instructions, -explored 7,692 paths, and generated 22 test cases.

- -

Note that many realistic programs have an infinite (or extremely large) -number of paths through them, and it is common that KLEE will not terminate. By -default KLEE will run until the user presses Control-C (i.e. klee gets -a SIGINT), but there are additional options to limit KLEE's runtime and memory -usage:

-

-

- -

Step ?: Changing the test harness

- -

FIXME: Step through & explain the test harness, and how we can modify it.

- -

FIXME: Write: show the important klee.h functions.

- -
- - diff --git a/www/Tutorial-2.html b/www/Tutorial-2.html new file mode 100644 index 00000000..47187a45 --- /dev/null +++ b/www/Tutorial-2.html @@ -0,0 +1,287 @@ + + + + + KLEE - Tutorial Two + + + + + + + +
+ +

Tutorial Two: Testing a Simple Regular Expression Library

+ +

This is an example of using KLEE to test a simple regular expression matching +function. You can find the basic example in the source tree +under examples/regexp.

+ +

Regexp.c contains a simple regular expression matching function, and +the bare bones testing harness (in main) needed to explore this code +with klee. You can see a version of the source +code here.

+ +

This example will show to build and run the example using KLEE, as well as +how to interpret the output, and some additional KLEE features that can be used +when writing a test driver by hand.

+ +

We'll start by showing how to build and run the example, and then explain how +the test harness works in more detail.

+ +

Building the example

+ +

The first step is to compile the source code using a compiler which can +generate object files in LLVM bitcode format. Here we use llvm-gcc, +but Clang works just as well!

+ +

From within the examples/regexp directory: +

+ $ llvm-gcc -I ../../include -emit-llvm -c -g Regexp.c +
+which should create a Regexp.o file in LLVM bitcode +format. The -I argument is used so that the compiler can +find "klee/klee.h", +which contains definitions for the intrinsic functions used to interact with the +KLEE virtual machine. -c is used because we only want to compile the +code to an object file (not a native executable), and finally -g causes +additional debug information to be stored in the object file, which KLEE will +use to determine source line number information.

+ +

If you have the LLVM tools installed in your path, you can verify that this step +worked by running llvm-nm on the generated file:

+
+
+  $ llvm-nm Regexp.o
+         t matchstar
+         t matchhere
+         T match
+         T main
+         U klee_make_symbolic_name
+         d LC
+         d LC1
+
+
+ +

Normally before running this program we would need to link it to create a +native executable. However, KLEE runs directly on LLVM bitcode files -- since +this program only has a single file there is no need to link. For "real" +programs with multiple inputs, +the llvm-link +and llvm-ld tools can +be used in place of the regular link step to merge multiple LLVM bitcode files +into a single module which can be executed by KLEE.

+ +

Executing the code with KLEE

+ + +

The next step is to execute the code with KLEE:

+
+
+$ klee --only-output-states-covering-new Regexp.o
+KLEE: output directory = "klee-out-1"
+KLEE: ERROR: .../klee/examples/regexp/Regexp.c:23: memory error: out of bound pointer
+KLEE: NOTE: now ignoring this error at this location
+KLEE: ERROR: .../klee/examples/regexp/Regexp.c:25: memory error: out of bound pointer
+KLEE: NOTE: now ignoring this error at this location
+KLEE: done: total instructions = 6334861
+KLEE: done: completed paths = 7692
+KLEE: done: generated tests = 22
+
+
+ +

On startup, KLEE prints the directory used to store output (in this +case klee-out-1). By default klee will use the first +free klee-out-N directory and also create a klee-last +symlink which will point to the most recent created directory. You can specify a +directory to use for outputs using the -output-dir=path +command line argument.

+ +

While KLEE is running, it will print status messages for "important" events, +for example when it finds an error in the program. In this case, KLEE detected +to invalid memory accesses on lines 23 and 25 of our test program. We'll look +more at this in a moment.

+ +

Finally, when KLEE finishes execution it prints out a few statistics about +the run. Here we see that KLEE executed a total of ~6 million instructions, +explored 7,692 paths, and generated 22 test cases.

+ +

Note that many realistic programs have an infinite (or extremely large) +number of paths through them, and it is common that KLEE will not terminate. By +default KLEE will run until the user presses Control-C (i.e. klee gets +a SIGINT), but there are additional options to limit KLEE's runtime and memory +usage:

+

+

+ +

KLEE error reports

+ +

When KLEE detects an error in the program being executed it will generate a +test case which exhibits the error, and write some additional information about +the error into a file testN.TYPE.err, where N is +the test case number, and TYPE identifies the kind of error. Some +types of errors KLEE detects include:

+ + + +

KLEE will print a message to the console when it detects an error, in the +test run above we can see that KLEE detected two memory errors. For all program +errors, KLEE will write a simple backtrace into the .err file. This is +what one of the errors above looks like:

+
+
+Error: memory error: out of bound pointer
+File: .../klee/examples/regexp/Regexp.c
+Line: 23
+Stack: 
+	#0 00000146 in matchhere (re=14816471, text=14815301) at .../klee/examples/regexp/Regexp.c:23
+	#1 00000074 in matchstar (c, re=14816471, text=14815301) at .../klee/examples/regexp/Regexp.c:16
+	#2 00000172 in matchhere (re=14816469, text=14815301) at .../klee/examples/regexp/Regexp.c:26
+	#3 00000074 in matchstar (c, re=14816469, text=14815301) at .../klee/examples/regexp/Regexp.c:16
+	#4 00000172 in matchhere (re=14816467, text=14815301) at .../klee/examples/regexp/Regexp.c:26
+	#5 00000074 in matchstar (c, re=14816467, text=14815301) at .../klee/examples/regexp/Regexp.c:16
+	#6 00000172 in matchhere (re=14816465, text=14815301) at .../klee/examples/regexp/Regexp.c:26
+	#7 00000231 in matchhere (re=14816464, text=14815300) at .../klee/examples/regexp/Regexp.c:30
+	#8 00000280 in match (re=14816464, text=14815296) at .../klee/examples/regexp/Regexp.c:38
+	#9 00000327 in main () at .../klee/examples/regexp/Regexp.c:59
+Info: 
+	address: 14816471
+	next: object at 14816624 of size 4
+	prev: object at 14816464 of size 7
+
+
+ +

Each line of the backtrace lists the frame number, the instruction line (this +is the line number in the assembly.ll file found along with the run +output), the function and arguments (including values for the concrete +parameters), and the source information.

+ +

Particular error reports may also include additional information. For memory +errors, KLEE will show the invalid address, and what objects are on the heap +both before and after that address. In this case, we can see that the address +happens to be exactly one byte past the end of the previous object.

+ +

Changing the test harness

+ +

The reason KLEE is finding memory errors in this program isn't because the +regular expression functions have a bug, rather it indicates a problem in our +test driver. The problem is that we are making the input regular expression +buffer completely symbolic, but the match function expects it to be a +null terminated string. Let's look at two ways we can fix this.

+ +

The simplest way to fix this problem is to store '\0' at the end of +the buffer, after making it symbolic. This makes our driver look like this:

+
+
+int main() {
+  // The input regular expression.
+  char re[SIZE];
+  
+  // Make the input symbolic. 
+  klee_make_symbolic(re, sizeof re, "re");
+  re[SIZE - 1] = '\0';
+
+  // Try to match against a constant string "hello".
+  match(re, "hello");
+
+  return 0;
+}
+
+
+

Making a buffer symbolic just initializes the contents to refer to symbolic +variables, we are still free to modify the memory as we wish. If you recompile +and run klee on this test program, the memory errors should now be +gone.

+ +

Another way to accomplish the same effect is to use the klee_assume +intrinsic function. klee_assume takes a single argument (an unsigned +integer) which generally should some kind of conditional expression, and +"assumes" that expression to be true on the current path (if that can never +happen, i.e. the expression is provably false, KLEE will report an error). + +

We can use klee_assume to cause KLEE to only explore states where +the string is null terminated by writing the driver like this:

+
+
+int main() {
+  // The input regular expression.
+  char re[SIZE];
+  
+  // Make the input symbolic. 
+  klee_make_symbolic(re, sizeof re, "re");
+  klee_assume(re[SIZE - 1] == '\0');
+
+  // Try to match against a constant string "hello".
+  match(re, "hello");
+
+  return 0;
+}
+
+
+

In this particular example, both solutions work fine, but in +general klee_assume is more flexible:

+ + +

NOTE: One important caveat when using klee_assume with +multiple conditions; remember that boolean conditionals like '&&' and '||' may +be compiled into code which branches before computing the result of the +expression. In such situations KLEE will branch the process *before* it reaches +the call to klee_assume, which may result in exploring unnecessary +additional states. For this reason it is good to use as simple expressions as +possible to klee_assume (for example splitting a single call into +multiple ones), and to use the '&' and '|' operators instead of the +short-circuiting ones.

+ + + +
+ + diff --git a/www/menu.html.incl b/www/menu.html.incl index 58245138..1c15ab44 100644 --- a/www/menu.html.incl +++ b/www/menu.html.incl @@ -6,15 +6,15 @@