Coreutils Case Study

As a more detailed explanation of using KLEE, we will look at how we did our testing of GNU coreutils using KLEE.

These tests were done on a 32-bit Intel Linux machine, they aren't likely to work elsewhere. In addition, you will need to have configured and built KLEE with uclibc and POSIX runtime support.

Step 1: Build coreutils with gcov

First you will need to download and unpack the source for coreutils. In this example we use version 6.11, which is what was used for our OSDI paper.

Before we build with LLVM, let's build a version of coreutils with gcov support. We will use this later to get coverage information on the test cases produced by KLEE.

From inside the coreutils directory, we'll do the usual configure/make steps inside a subdirectory (obj-gcov). Here are the steps:

coreutils-6.11$ mkdir obj-gcov
coreutils-6.11$ cd obj-gcov
obj-gcov$ ../configure --disable-nls CFLAGS="-g -fprofile-arcs -ftest-coverage"
... verify that configure worked ...
obj-gcov$ make
... verify that make worked ... 

We build with --disable-nls because this adds a lot of extra initialization in the C library which we are not interested in testing. Even though these aren't the executables that KLEE will be running on, we want to use the same compiler flags so that the test cases KLEE generates are most likely to work correctly when run on the uninstrumented binaries.

You should now have a set of coreutils in the objc-gcov/src directory. For example:

obj-gcov$ cd src
src$ ls -l ls echo cat
-rwxr-xr-x 1 ddunbar ddunbar 164841 2009-07-25 20:58 cat
-rwxr-xr-x 1 ddunbar ddunbar 151051 2009-07-25 20:59 echo
-rwxr-xr-x 1 ddunbar ddunbar 439712 2009-07-25 20:58 ls
src$ ./cat --version
cat (GNU coreutils) 6.11
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund and Richard M. Stallman.

Step 2: Build coreutils with LLVM

One of the difficult parts of testing real software using KLEE is that it must be first compiled so that the final program is an LLVM bitcode file and not a linked executable. The software's build system may be set up to use tools such as 'ar', 'libtool', and 'ld', which do not in general understand LLVM bitcode files.

It depends on the actual project what the best way to do this is. For coreutils, we use a helper script klee-gcc, which acts like llvm-gcc but adds additional arguments to cause it to emit LLVM bitcode files and to call llvm-ld to link executables. This is not a general solution, and your mileage may vary.

As before, we will build in a separate directory so we can easily access both the native executables and the LLVM versions. Here are the steps:

coreutils-6.11$ mkdir obj-llvm
coreutils-6.11$ cd obj-llvm
obj-llvm$ ../configure --disable-nls CFLAGS="-g"
... verify that configure worked ...
obj-llvm$ make CC=/full/path/to/klee/scripts/klee-gcc
... verify that make worked ... 

Notice that we made two changes. First, we don't want to add gcov instrumentation in the binary we are going to test using KLEE, so we left of the -fprofile-arcs -ftest-coverage flags. Second, when running make, we set the CC variable to point to our klee-gcc wrapper script.

If all went well, you should now have LLVM bitcode versions of coreutils! For example:

obj-llvm$ cd src
src$ ls -l ls echo cat
-rwxr-xr-x 1 ddunbar ddunbar 65 2009-07-25 23:40 cat
-rwxr-xr-x 1 ddunbar ddunbar 66 2009-07-25 23:43 echo
-rwxr-xr-x 1 ddunbar ddunbar 94 2009-07-25 23:38 ls
src$ ./cat --version
cat (GNU coreutils) 6.11
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

LLVM ERROR: JIT does not support inline asm! 

You may notice some funny things going on. To start with, the files are way too small! Since we are actually producing LLVM bitcode files, the operating system can't run them directly. What llvm-ld does to make it so we can still run the resulting outputs is write a little shell script which uses the LLVM interpreter to run the binaries; the actual LLVM bitcode files have .bc appended. If we look a little closer:

src$ cat ls
#!/bin/sh
lli=${LLVMINTERP-lli}
exec $lli \
    -load=/usr/lib/librt.so \
    ls.bc ${1+"$@"}
src$ ls -l ls.bc
-rwxr-xr-x 1 ddunbar ddunbar 643640 2009-07-25 23:38 ls.bc 

The other funny thing is that the version message doesn't all print out, the LLVM interpreter emits a message about not supporting inline assembly. The problem here is that glibc occasionally implements certain operations using inline assembly, which the LLVM interpreter (lli) doesn't understand. KLEE works around this problem by specially recognizing certain common inline assembly sequences and turning them back into the appropriate LLVM instructions before executing the binary.

Step 3: Using KLEE as an interpreter

At its core, KLEE is just an interpreter for LLVM bitcode. For example, here is how to run the same cat command we did before, using KLEE. Note, this step requires that you configured and built KLEE with uclibc and POSIX runtime support (if you didn't, you'll need to go do that now).

src$ klee --libc=uclibc --posix-runtime ./cat.bc --version
KLEE: NOTE: Using model: /home/ddunbar/public/klee/Release/lib/libkleeRuntimePOSIX.bca
KLEE: output directory = "klee-out-3"
KLEE: WARNING: undefined reference to function: __signbitl
KLEE: WARNING: executable has module level assembly (ignoring)
KLEE: WARNING: calling external: syscall(54, 0, 21505, 177325672)
KLEE: WARNING: calling __user_main with extra arguments.
KLEE: WARNING: calling external: getpagesize()
KLEE: WARNING: calling external: vprintf(177640072, 183340048)
cat (GNU coreutils) 6.11

License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund and Richard M. Stallman.
KLEE: WARNING: calling close_stdout with extra arguments.
Copyright (C) 2008 Free Software Foundation, Inc.
KLEE: done: total instructions = 259357
KLEE: done: completed paths = 1
KLEE: done: generated tests = 1
  

We got a lot more output this time! Let's step through it, starting with the KLEE command itself. The general form of a KLEE command line is first the arguments for KLEE itself, then the LLVM bitcode file to execute (cat.bc), and then any arguments to pass to the application (--version in this case, as before).

If we were running a normal native application, it would have been linked with the C library, but in this case KLEE is running the LLVM bitcode file directly. In order for KLEE to work effectively, it needs to have definitions for all the external functions the program may call. We have modified the uClibc C library implementation for use with KLEE; the --libc=uclibc KLEE argument tells KLEE to load that library and link it with the application before it starts execution.

Similarly, a native application would be running on top of an operating system that provides lower level facilities like write(), which the C library uses in its implementation. As before, KLEE needs definitions for these functions in order to fully understand the program. We provide a POSIX runtime which is designed to work with KLEE and the uClibc library to provide the majority of operating system facilities used by command line applications -- the --posix-runtime argument tells KLEE to link this library in as well.

As before, cat prints out its version information (note that this time all the text is written out), but we now have a number of additional information output by KLEE. In this case, most of these warnings are innocuous, but for completeness here is what they mean:

In general, KLEE will only emit a given warning once. The warnings are also logged to warnings.txt in the KLEE output directory.

Step 4: Introducing symbolic data to an application

We've seen that KLEE can interpret a program normally, but the real purpose of KLEE is to explore programs more exhaustively by making parts of their input symbolic. For example, lets look at running KLEE on the echo application.

When using uClibc and the POSIX runtime, KLEE offers an additional argument --init-env which replaces the programs main() function with a special function (klee_init_env) provided inside the runtime library. This function alters the normal command line processing of the application, in particular to support construction of symbolic arguments. For example, passing --help yields:

src$ klee --libc=uclibc --posix-runtime --init-env ./echo.bc --help
...

usage: (klee_init_env) [options] [program arguments]
  -sym-arg               - Replace by a symbolic argument with length N
  -sym-args    - Replace by at least MIN arguments and at most
                              MAX arguments, each with maximum length N
  -sym-files        - Make stdin and up to NUM symbolic files, each
                              with maximum size N.
  -sym-stdout               - Make stdout symbolic.
  -max-fail              - Allow up to  injected failures
  -fd-fail                  - Shortcut for '-max-fail 1'
...
  

As an example, lets run echo with a symbolic argument of 3 characters.

src$ klee --libc=uclibc --posix-runtime --init-env ./echo.bc --sym-arg 3
KLEE: NOTE: Using model: /home/ddunbar/public/klee/Release/lib/libkleeRuntimePOSIX.bca
KLEE: output directory = "klee-out-16"
KLEE: WARNING: undefined reference to function: __signbitl
KLEE: WARNING: executable has module level assembly (ignoring)
KLEE: WARNING: calling external: syscall(54, 0, 21505, 189414856)
KLEE: WARNING: calling __user_main with extra arguments.
..
KLEE: WARNING: calling close_stdout with extra arguments.
...
KLEE: WARNING: calling external: printf(183664896, 183580400)
Usage: ./echo.bc [OPTION]... [STRING]...
Echo the STRING(s) to standard output.

  -n             do not output the trailing newline
  -e             enable interpretation of backslash escapes
  -E             disable interpretation of backslash escapes (default)
      --help     display this help and exit
      --version  output version information and exit

If -e is in effect, the following sequences are recognized:

  \0NNN   the character whose ASCII code is NNN (octal)
  \\     backslash
  \a     alert (BEL)
  \b     backspace
  \c     suppress trailing newline
  \f     form feed
  \n     new line
  \r     carriage return
  \t     horizontal tab
  \v     vertical tab

NOTE: your shell may have its own version of echo, which usually supersedes
the version described here.  Please refer to your shell's documentation
for details about the options it supports.

Report bugs to .
KLEE: WARNING: calling external: vprintf(183956664, 190534360)
echo (GNU coreutils) 6.11

License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by FIXME unknown.
...
...
...




..


.

.
..
...
Copyright (C) 2008 Free Software Foundation, Inc.
KLEE: done: total instructions = 300193
KLEE: done: completed paths = 25
KLEE: done: generated tests = 25
  

The results here are slightly more interesting, KLEE has explored 25 paths through the program. The output from all the paths is intermingled, but you can see that in addition to echoing various random characters, some blocks of text also were output. You may be suprised to learn that coreutils' echo takes some arguments, in this case the options --v (short for --version) and --h (short for --help) were explored. We can get a short summary of KLEE's internal statistics by running klee-stats on the output directory (remember, KLEE always makes a symlink called klee-last to the most recent output directory).

src$ klee-stats klee-last
-------------------------------------------------------------------------
| Path      | Instrs | Time(s) | ICov(%) | BCov(%) | ICount | Solver(%) |
-------------------------------------------------------------------------
| klee-last | 300673 |    1.47 |   28.18 |   17.37 |  28635 |      5.65 |
-------------------------------------------------------------------------

Here ICov is the percentage of LLVM instructions which were covered, and BCov is the percentage of branches that were covered. You may be wondering why the percentages are so low -- how much more code can echo have! The main reason is that these numbers are computed using all the instructions or branches in the bitcode files; that includes a lot of library code which may not even be executable. We can help with that problem (and others) by passing the --optimize option to KLEE. This will cause KLEE to run the LLVM optimization passes on the bitcode module before executing it; in particular they will remove any dead code. When working with non-trivial applications, it is almost always a good idea to use this flag. Here are the results from running again with --optimze enabled:

src$ klee --optimize --libc=uclibc --posix-runtime --init-env ./echo.bc --sym-arg 3
...
KLEE: done: total instructions = 123251
KLEE: done: completed paths = 25
KLEE: done: generated tests = 25
src$ klee-stats klee-last
-------------------------------------------------------------------------
| Path      | Instrs | Time(s) | ICov(%) | BCov(%) | ICount | Solver(%) |
-------------------------------------------------------------------------
| klee-last | 123251 |    0.32 |   38.02 |   25.43 |   9531 |     29.66 |
-------------------------------------------------------------------------

This time the instruction coverage went up by about ten percent, and you can see that KLEE also ran faster and executed less instructions. Most of the remaining code is still in library functions, just in places that the optimizers aren't smart enough to remove. We can verify this -- and look for uncovered code inside echo -- by using KCachegrind to visualize the results of a KLEE run.

Step 5: Visualizing KLEE's progress with kcachegrind

KCachegrind is an excellent profiling visualization tool, originally written for use with the callgrind plugin for valgrind. If you don't have it already, it is usually easily available on a modern Linux distribution via your platforms' software installation tool (e.g., apt-get or yum).

KLEE by default writes out a run.istats file into the test output directory which is actually a KCachegrind file. In this example, the run.istats is from a run without --optimize, so the results are easier to understand. Assuming you have KCachegrind installed, just run:

 src$ kcachegrind klee-last/run.istats 

After KCachegrind opens, you should see a window that looks something like the one below. You should make sure that the "Instructions" statistic is selected by choosing "View" > "Primary Event Type" > "Instructions" from the menu, and make sure the "Source Code" view is selected (the right hand pane in the screenshot below).

KCachegrind is a complex application in itself, and interested users should see the KCachegrind website for more information and documentation. However, the basics are that the one pane shows the "Flat Profile"; this is a list of which how many instructions were executed in each function. The "Self" column is the number of instructions which were executed in the function itself, and the "Incl" (inclusive) column is the number of instructions which were executed in the function, or any of the functions it called (or its callees called, and so on).

KLEE includes quite a few statistics about execution. The one we are interested in now is "Uncovered Instructions", which will show which functions have instructions which were never executed. If you select that statistic and resort the list of functions, you should see something like this:

Notice that most of the uncovered instructions are in library code as we would expect. However, if we select the __user_main function, we can look for code inside echo itself that was uncovered. In this case, most of the uncovered instructions are inside a large if statement guarded by the variable do_v9. If you look a bit more, you can see that this is a flag set to true when -e is passed. The reason that KLEE never explored this code is because we only passed one symbolic argument -- hitting this code requires a command line like $ echo -e \a.

One subtle thing to understand if you are trying to actually make sense of the KCachegrind numbers is that they include events accumulated across all states. For example, consider the following code:

Line 1:      a = 1;
Line 2:      if (...)
Line 3:        printf("hello\n");
Line 4:      b = c; 

In a normal application, if the statement on Line 1 was only executed once, then the statement on Line 4 could be (at most) executed once. When KLEE is running an application, however, it could fork and generate separate processes at Line 2. In that case, Line 4 may be executed more times than Line 1!

Another useful tidbit: KLEE actually writes the run.istats file periodically as the application is running. This provides one way to monitor the status of long running applications (another way is to use the klee-stats tool).

Step 6: Replaying KLEE generated test cases

To be written.

Step 7: Running KLEE on larger applications

To be written.