-rw-r--r-- | instrumentation/README.cmplog.md | 20
-rw-r--r-- | instrumentation/README.gcc_plugin.md | 152
-rw-r--r-- | instrumentation/README.instrument_list.md | 109
-rw-r--r-- | instrumentation/README.laf-intel.md | 66
-rw-r--r-- | instrumentation/README.lto.md | 322
-rw-r--r-- | instrumentation/README.persistent_mode.md | 2
6 files changed, 350 insertions, 321 deletions
diff --git a/instrumentation/README.cmplog.md b/instrumentation/README.cmplog.md
index a796c7a7..146b4620 100644
--- a/instrumentation/README.cmplog.md
+++ b/instrumentation/README.cmplog.md
@@ -1,11 +1,12 @@
 # CmpLog instrumentation
 
-The CmpLog instrumentation enables logging of comparison operands in a
-shared memory.
+The CmpLog instrumentation enables logging of comparison operands in a shared
+memory.
 
-These values can be used by various mutators built on top of it.
-At the moment we support the RedQueen mutator (input-2-state instructions only),
-for details see [the RedQueen paper](https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentlichungen/2018/12/17/NDSS19-Redqueen.pdf).
+These values can be used by various mutators built on top of it. At the moment,
+we support the RedQueen mutator (input-2-state instructions only), for details
+see
+[the RedQueen paper](https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentlichungen/2018/12/17/NDSS19-Redqueen.pdf).
 
 ## Build
 
@@ -14,7 +15,8 @@ program.
 
 The first version is built using the regular AFL++ instrumentation.
 
-The second one, the CmpLog binary, is built with setting AFL_LLVM_CMPLOG during the compilation.
+The second one, the CmpLog binary, is built with setting AFL_LLVM_CMPLOG during
+the compilation.
 
 For example:
 
@@ -32,8 +34,8 @@ unset AFL_LLVM_CMPLOG
 
 ## Use
 
-AFL++ has the new `-c` option that needs to be used to specify the CmpLog binary (the second
-build).
+AFL++ has the new `-c` option that needs to be used to specify the CmpLog binary
+(the second build).
 
 For example:
 
@@ -41,4 +43,4 @@ For example:
 afl-fuzz -i input -o output -c ./program.cmplog -m none -- ./program.afl @@
 ```
 
-Be sure to use `-m none` because CmpLog can map a lot of pages.
+Be sure to use `-m none` because CmpLog can map a lot of pages.
\ No newline at end of file diff --git a/instrumentation/README.gcc_plugin.md b/instrumentation/README.gcc_plugin.md index 230ceb73..33cf1c33 100644 --- a/instrumentation/README.gcc_plugin.md +++ b/instrumentation/README.gcc_plugin.md @@ -1,64 +1,68 @@ # GCC-based instrumentation for afl-fuzz -See [../README.md](../README.md) for the general instruction manual. -See [README.llvm.md](README.llvm.md) for the LLVM-based instrumentation. +For the general instruction manual, see [../README.md](../README.md). +For the LLVM-based instrumentation, see [README.llvm.md](README.llvm.md). This document describes how to build and use `afl-gcc-fast` and `afl-g++-fast`, which instrument the target with the help of gcc plugins. -TLDR: - * check the version of your gcc compiler: `gcc --version` - * `apt-get install gcc-VERSION-plugin-dev` or similar to install headers for gcc plugins - * `gcc` and `g++` must match the gcc-VERSION you installed headers for. You can set `AFL_CC`/`AFL_CXX` - to point to these! - * `make` - * just use `afl-gcc-fast`/`afl-g++-fast` normally like you would do with `afl-clang-fast` +TL;DR: +* Check the version of your gcc compiler: `gcc --version` +* `apt-get install gcc-VERSION-plugin-dev` or similar to install headers for gcc + plugins. +* `gcc` and `g++` must match the gcc-VERSION you installed headers for. You can + set `AFL_CC`/`AFL_CXX` to point to these! +* `make` +* Just use `afl-gcc-fast`/`afl-g++-fast` normally like you would do with + `afl-clang-fast`. ## 1) Introduction -The code in this directory allows to instrument programs for AFL using -true compiler-level instrumentation, instead of the more crude -assembly-level rewriting approach taken by afl-gcc and afl-clang. This has -several interesting properties: +The code in this directory allows to instrument programs for AFL++ using true +compiler-level instrumentation, instead of the more crude assembly-level +rewriting approach taken by afl-gcc and afl-clang. 
This has several interesting +properties: - - The compiler can make many optimizations that are hard to pull off when - manually inserting assembly. As a result, some slow, CPU-bound programs will - run up to around faster. +- The compiler can make many optimizations that are hard to pull off when + manually inserting assembly. As a result, some slow, CPU-bound programs will + run up to around faster. - The gains are less pronounced for fast binaries, where the speed is limited - chiefly by the cost of creating new processes. In such cases, the gain will - probably stay within 10%. + The gains are less pronounced for fast binaries, where the speed is limited + chiefly by the cost of creating new processes. In such cases, the gain will + probably stay within 10%. - - The instrumentation is CPU-independent. At least in principle, you should - be able to rely on it to fuzz programs on non-x86 architectures (after - building `afl-fuzz` with `AFL_NOX86=1`). +- The instrumentation is CPU-independent. At least in principle, you should be + able to rely on it to fuzz programs on non-x86 architectures (after building + `afl-fuzz` with `AFL_NOX86=1`). - - Because the feature relies on the internals of GCC, it is gcc-specific - and will *not* work with LLVM (see [README.llvm.md](README.llvm.md) for an alternative). +- Because the feature relies on the internals of GCC, it is gcc-specific and + will *not* work with LLVM (see [README.llvm.md](README.llvm.md) for an + alternative). Once this implementation is shown to be sufficiently robust and portable, it -will probably replace afl-gcc. For now, it can be built separately and -co-exists with the original code. +will probably replace afl-gcc. For now, it can be built separately and co-exists +with the original code. The idea and much of the implementation comes from Laszlo Szekeres. 
## 2) How to use -In order to leverage this mechanism, you need to have modern enough GCC -(>= version 4.5.0) and the plugin development headers installed on your system. That +In order to leverage this mechanism, you need to have modern enough GCC (>= +version 4.5.0) and the plugin development headers installed on your system. That should be all you need. On Debian machines, these headers can be acquired by installing the `gcc-VERSION-plugin-dev` packages. To build the instrumentation itself, type `make`. This will generate binaries -called `afl-gcc-fast` and `afl-g++-fast` in the parent directory. +called `afl-gcc-fast` and `afl-g++-fast` in the parent directory. -The gcc and g++ compiler links have to point to gcc-VERSION - or set these -by pointing the environment variables `AFL_CC`/`AFL_CXX` to them. -If the `CC`/`CXX` environment variables have been set, those compilers will be -preferred over those from the `AFL_CC`/`AFL_CXX` settings. +The gcc and g++ compiler links have to point to gcc-VERSION - or set these by +pointing the environment variables `AFL_CC`/`AFL_CXX` to them. If the `CC`/`CXX` +environment variables have been set, those compilers will be preferred over +those from the `AFL_CC`/`AFL_CXX` settings. Once this is done, you can instrument third-party code in a way similar to the -standard operating mode of AFL, e.g.: +standard operating mode of AFL++, e.g.: + ``` CC=/path/to/afl/afl-gcc-fast CXX=/path/to/afl/afl-g++-fast @@ -66,15 +70,15 @@ standard operating mode of AFL, e.g.: ./configure [...options...] make ``` + Note: We also used `CXX` to set the C++ compiler to `afl-g++-fast` for C++ code. The tool honors roughly the same environmental variables as `afl-gcc` (see -[env_variables.md](../docs/env_variables.md). This includes `AFL_INST_RATIO`, -`AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. +[docs/env_variables.md](../docs/env_variables.md). This includes +`AFL_INST_RATIO`, `AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. 
-Note: if you want the GCC plugin to be installed on your system for all -users, you need to build it before issuing 'make install' in the parent -directory. +Note: if you want the GCC plugin to be installed on your system for all users, +you need to build it before issuing 'make install' in the parent directory. ## 3) Gotchas, feedback, bugs @@ -83,41 +87,40 @@ reports to afl@aflplus.plus. ## 4) Bonus feature #1: deferred initialization -AFL tries to optimize performance by executing the targeted binary just once, -stopping it just before main(), and then cloning this "main" process to get -a steady supply of targets to fuzz. +AFL++ tries to optimize performance by executing the targeted binary just once, +stopping it just before `main()`, and then cloning this "main" process to get a +steady supply of targets to fuzz. -Although this approach eliminates much of the OS-, linker- and libc-level -costs of executing the program, it does not always help with binaries that -perform other time-consuming initialization steps - say, parsing a large config -file before getting to the fuzzed data. +Although this approach eliminates much of the OS-, linker- and libc-level costs +of executing the program, it does not always help with binaries that perform +other time-consuming initialization steps - say, parsing a large config file +before getting to the fuzzed data. In such cases, it's beneficial to initialize the forkserver a bit later, once most of the initialization work is already done, but before the binary attempts to read the fuzzed input and parse it; in some cases, this can offer a 10x+ performance gain. You can implement delayed initialization in GCC mode in a -fairly simple way. +fairly simple way: -First, locate a suitable location in the code where the delayed cloning can -take place. This needs to be done with *extreme* care to avoid breaking the -binary. 
In particular, the program will probably malfunction if you select -a location after: +First, locate a suitable location in the code where the delayed cloning can take +place. This needs to be done with *extreme* care to avoid breaking the binary. +In particular, the program will probably malfunction if you select a location +after: - - The creation of any vital threads or child processes - since the forkserver - can't clone them easily. +- The creation of any vital threads or child processes - since the forkserver + can't clone them easily. - - The initialization of timers via setitimer() or equivalent calls. +- The initialization of timers via `setitimer()` or equivalent calls. - - The creation of temporary files, network sockets, offset-sensitive file - descriptors, and similar shared-state resources - but only provided that - their state meaningfully influences the behavior of the program later on. +- The creation of temporary files, network sockets, offset-sensitive file + descriptors, and similar shared-state resources - but only provided that their + state meaningfully influences the behavior of the program later on. - - Any access to the fuzzed input, including reading the metadata about its - size. +- Any access to the fuzzed input, including reading the metadata about its size. With the location selected, add this code in the appropriate spot: -``` +```c #ifdef __AFL_HAVE_MANUAL_CONTROL __AFL_INIT(); #endif @@ -131,14 +134,14 @@ Finally, recompile the program with afl-gcc-fast (afl-gcc or afl-clang will ## 5) Bonus feature #2: persistent mode -Some libraries provide APIs that are stateless, or whose state can be reset in +Some libraries provide APIs that are stateless or whose state can be reset in between processing different input files. When such a reset is performed, a single long-lived process can be reused to try out multiple test cases, eliminating the need for repeated `fork()` calls and the associated OS overhead. 
The basic structure of the program that does this would be: -``` +```c while (__AFL_LOOP(1000)) { /* Read input data. */ @@ -147,22 +150,21 @@ The basic structure of the program that does this would be: } - /* Exit normally */ + /* Exit normally. */ ``` -The numerical value specified within the loop controls the maximum number -of iterations before AFL will restart the process from scratch. This minimizes +The numerical value specified within the loop controls the maximum number of +iterations before AFL++ will restart the process from scratch. This minimizes the impact of memory leaks and similar glitches; 1000 is a good starting point. -A more detailed template is shown in ../utils/persistent_mode/. -Similarly to the previous mode, the feature works only with afl-gcc-fast or -afl-clang-fast; #ifdef guards can be used to suppress it when using other -compilers. +A more detailed template is shown in ../utils/persistent_mode/. Similarly to the +previous mode, the feature works only with afl-gcc-fast or afl-clang-fast; +#ifdef guards can be used to suppress it when using other compilers. -Note that as with the previous mode, the feature is easy to misuse; if you -do not reset the critical state fully, you may end up with false positives or -waste a whole lot of CPU power doing nothing useful at all. Be particularly -wary of memory leaks and the state of file descriptors. +Note that as with the previous mode, the feature is easy to misuse; if you do +not reset the critical state fully, you may end up with false positives or waste +a whole lot of CPU power doing nothing useful at all. Be particularly wary of +memory leaks and the state of file descriptors. When running in this mode, the execution paths will inherently vary a bit depending on whether the input loop is being entered for the first time or @@ -171,5 +173,5 @@ executed again. 
To avoid spurious warnings, the feature implies ## 6) Bonus feature #3: selective instrumentation -It can be more effective to fuzzing to only instrument parts of the code. -For details see [README.instrument_list.md](README.instrument_list.md). +It can be more effective to fuzzing to only instrument parts of the code. For +details, see [README.instrument_list.md](README.instrument_list.md). \ No newline at end of file diff --git a/instrumentation/README.instrument_list.md b/instrumentation/README.instrument_list.md index 7db9c055..b412b600 100644 --- a/instrumentation/README.instrument_list.md +++ b/instrumentation/README.instrument_list.md @@ -1,80 +1,84 @@ # Using AFL++ with partial instrumentation - This file describes two different mechanisms to selectively instrument - only specific parts in the target. +This file describes two different mechanisms to selectively instrument only +specific parts in the target. - Both mechanisms work for LLVM and GCC_PLUGIN, but not for afl-clang/afl-gcc. +Both mechanisms work for LLVM and GCC_PLUGIN, but not for afl-clang/afl-gcc. ## 1) Description and purpose When building and testing complex programs where only a part of the program is -the fuzzing target, it often helps to only instrument the necessary parts of -the program, leaving the rest uninstrumented. This helps to focus the fuzzer -on the important parts of the program, avoiding undesired noise and -disturbance by uninteresting code being exercised. +the fuzzing target, it often helps to only instrument the necessary parts of the +program, leaving the rest uninstrumented. This helps to focus the fuzzer on the +important parts of the program, avoiding undesired noise and disturbance by +uninteresting code being exercised. For this purpose, "partial instrumentation" support is provided by AFL++ that allows to specify what should be instrumented and what not. -Both mechanisms can be used together. +Both mechanisms for partial instrumentation can be used together. 
## 2) Selective instrumentation with __AFL_COVERAGE_... directives -In this mechanism the selective instrumentation is done in the source code. +In this mechanism, the selective instrumentation is done in the source code. -After the includes a special define has to be made, eg.: +After the includes, a special define has to be made, e.g.: ``` #include <stdio.h> #include <stdint.h> // ... - + __AFL_COVERAGE(); // <- required for this feature to work ``` -If you want to disable the coverage at startup until you specify coverage -should be started, then add `__AFL_COVERAGE_START_OFF();` at that position. +If you want to disable the coverage at startup until you specify coverage should +be started, then add `__AFL_COVERAGE_START_OFF();` at that position. -From here on out you have the following macros available that you can use -in any function where you want: +From here on out, you have the following macros available that you can use in +any function where you want: - * `__AFL_COVERAGE_ON();` - enable coverage from this point onwards - * `__AFL_COVERAGE_OFF();` - disable coverage from this point onwards - * `__AFL_COVERAGE_DISCARD();` - reset all coverage gathered until this point - * `__AFL_COVERAGE_SKIP();` - mark this test case as unimportant. Whatever happens, afl-fuzz will ignore it. +* `__AFL_COVERAGE_ON();` - Enable coverage from this point onwards. +* `__AFL_COVERAGE_OFF();` - Disable coverage from this point onwards. +* `__AFL_COVERAGE_DISCARD();` - Reset all coverage gathered until this point. +* `__AFL_COVERAGE_SKIP();` - Mark this test case as unimportant. Whatever + happens, afl-fuzz will ignore it. -A special function is `__afl_coverage_interesting`. -To use this, you must define `void __afl_coverage_interesting(u8 val, u32 id);`. -Then you can use this function globally, where the `val` parameter can be set -by you, the `id` parameter is for afl-fuzz and will be overwritten. -Note that useful parameters for `val` are: 1, 2, 3, 4, 8, 16, 32, 64, 128. 
-A value of e.g. 33 will be seen as 32 for coverage purposes. +A special function is `__afl_coverage_interesting`. To use this, you must define +`void __afl_coverage_interesting(u8 val, u32 id);`. Then you can use this +function globally, where the `val` parameter can be set by you, the `id` +parameter is for afl-fuzz and will be overwritten. Note that useful parameters +for `val` are: 1, 2, 3, 4, 8, 16, 32, 64, 128. A value of, e.g., 33 will be seen +as 32 for coverage purposes. ## 3) Selective instrumentation with AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -This feature is equivalent to llvm 12 sancov feature and allows to specify -on a filename and/or function name level to instrument these or skip them. +This feature is equivalent to llvm 12 sancov feature and allows to specify on a +filename and/or function name level to instrument these or skip them. ### 3a) How to use the partial instrumentation mode In order to build with partial instrumentation, you need to build with -afl-clang-fast/afl-clang-fast++ or afl-clang-lto/afl-clang-lto++. -The only required change is that you need to set either the environment variable -AFL_LLVM_ALLOWLIST or AFL_LLVM_DENYLIST set with a filename. +afl-clang-fast/afl-clang-fast++ or afl-clang-lto/afl-clang-lto++. The only +required change is that you need to set either the environment variable +`AFL_LLVM_ALLOWLIST` or `AFL_LLVM_DENYLIST` set with a filename. That file should contain the file names or functions that are to be instrumented -(AFL_LLVM_ALLOWLIST) or are specifically NOT to be instrumented (AFL_LLVM_DENYLIST). +(`AFL_LLVM_ALLOWLIST`) or are specifically NOT to be instrumented +(`AFL_LLVM_DENYLIST`). + +GCC_PLUGIN: you can use either `AFL_LLVM_ALLOWLIST` or `AFL_GCC_ALLOWLIST` (or +the same for `_DENYLIST`), both work. -GCC_PLUGIN: you can use either AFL_LLVM_ALLOWLIST or AFL_GCC_ALLOWLIST (or the -same for _DENYLIST), both work. 
+For matching to succeed, the function/file name that is being compiled must end +in the function/file name entry contained in this instrument file list. That is +to avoid breaking the match when absolute paths are used during compilation. -For matching to succeed, the function/file name that is being compiled must end in the -function/file name entry contained in this instrument file list. That is to avoid -breaking the match when absolute paths are used during compilation. +**NOTE:** In builds with optimization enabled, functions might be inlined and +would not match! -**NOTE:** In builds with optimization enabled, functions might be inlined and would not match! +For example, if your source tree looks like this: -For example if your source tree looks like this: ``` project/ project/feature_a/a1.cpp @@ -83,36 +87,45 @@ project/feature_b/b1.cpp project/feature_b/b2.cpp ``` -and you only want to test feature_a, then create an "instrument file list" file containing: +And you only want to test feature_a, then create an "instrument file list" file +containing: + ``` feature_a/a1.cpp feature_a/a2.cpp ``` -However if the "instrument file list" file contains only this, it works as well: +However, if the "instrument file list" file contains only this, it works as +well: + ``` a1.cpp a2.cpp ``` -but it might lead to files being unwantedly instrumented if the same filename + +But it might lead to files being unwantedly instrumented if the same filename exists somewhere else in the project directories. -You can also specify function names. Note that for C++ the function names -must be mangled to match! `nm` can print these names. +You can also specify function names. Note that for C++ the function names must +be mangled to match! `nm` can print these names. + +AFL++ is able to identify whether an entry is a filename or a function. 
However, +if you want to be sure (and compliant to the sancov allow/blocklist format), you +can specify source file entries like this: -AFL++ is able to identify whether an entry is a filename or a function. -However if you want to be sure (and compliant to the sancov allow/blocklist -format), you can specify source file entries like this: ``` src: *malloc.c ``` -and function entries like this: + +And function entries like this: + ``` fun: MallocFoo ``` + Note that whitespace is ignored and comments (`# foo`) are supported. ### 3b) UNIX-style pattern matching You can add UNIX-style pattern matching in the "instrument file list" entries. -See `man fnmatch` for the syntax. We do not set any of the `fnmatch` flags. +See `man fnmatch` for the syntax. We do not set any of the `fnmatch` flags. \ No newline at end of file diff --git a/instrumentation/README.laf-intel.md b/instrumentation/README.laf-intel.md index 789055ed..3cde10c3 100644 --- a/instrumentation/README.laf-intel.md +++ b/instrumentation/README.laf-intel.md @@ -2,19 +2,17 @@ ## Introduction -This originally is the work of an individual nicknamed laf-intel. -His blog [Circumventing Fuzzing Roadblocks with Compiler Transformations](https://lafintel.wordpress.com/) -and gitlab repo [laf-llvm-pass](https://gitlab.com/laf-intel/laf-llvm-pass/) -describe some code transformations that -help AFL++ to enter conditional blocks, where conditions consist of -comparisons of large values. +This originally is the work of an individual nicknamed laf-intel. His blog +[Circumventing Fuzzing Roadblocks with Compiler Transformations](https://lafintel.wordpress.com/) +and GitLab repo [laf-llvm-pass](https://gitlab.com/laf-intel/laf-llvm-pass/) +describe some code transformations that help AFL++ to enter conditional blocks, +where conditions consist of comparisons of large values. ## Usage -By default these passes will not run when you compile programs using -afl-clang-fast. Hence, you can use AFL as usual. 
-To enable the passes you must set environment variables before you -compile the target project. +By default, these passes will not run when you compile programs using +afl-clang-fast. Hence, you can use AFL++ as usual. To enable the passes, you +must set environment variables before you compile the target project. The following options exist: @@ -24,32 +22,30 @@ Enables the split-switches pass. `export AFL_LLVM_LAF_TRANSFORM_COMPARES=1` -Enables the transform-compares pass (strcmp, memcmp, strncmp, -strcasecmp, strncasecmp). +Enables the transform-compares pass (strcmp, memcmp, strncmp, strcasecmp, +strncasecmp). `export AFL_LLVM_LAF_SPLIT_COMPARES=1` -Enables the split-compares pass. -By default it will +Enables the split-compares pass. By default, it will 1. simplify operators >= (and <=) into chains of > (<) and == comparisons -2. change signed integer comparisons to a chain of sign-only comparison -and unsigned integer comparisons -3. split all unsigned integer comparisons with bit widths of -64, 32 or 16 bits to chains of 8 bits comparisons. - -You can change the behaviour of the last step by setting -`export AFL_LLVM_LAF_SPLIT_COMPARES_BITW=<bit_width>`, where -bit_width may be 64, 32 or 16. For example, a bit_width of 16 -would split larger comparisons down to 16 bit comparisons. - -A new experimental feature is splitting floating point comparisons into a -series of sign, exponent and mantissa comparisons followed by splitting each -of them into 8 bit comparisons when necessary. -It is activated with the `AFL_LLVM_LAF_SPLIT_FLOATS` setting. -Please note that full IEEE 754 functionality is not preserved, that is -values of nan and infinity will probably behave differently. - -Note that setting this automatically activates `AFL_LLVM_LAF_SPLIT_COMPARES` - -You can also set `AFL_LLVM_LAF_ALL` and have all of the above enabled :-) - +2. change signed integer comparisons to a chain of sign-only comparison and + unsigned integer comparisons +3. 
split all unsigned integer comparisons with bit widths of 64, 32, or 16 bits + to chains of 8 bits comparisons. + +You can change the behavior of the last step by setting `export +AFL_LLVM_LAF_SPLIT_COMPARES_BITW=<bit_width>`, where bit_width may be 64, 32, or +16. For example, a bit_width of 16 would split larger comparisons down to 16 bit +comparisons. + +A new experimental feature is splitting floating point comparisons into a series +of sign, exponent and mantissa comparisons followed by splitting each of them +into 8 bit comparisons when necessary. It is activated with the +`AFL_LLVM_LAF_SPLIT_FLOATS` setting. Please note that full IEEE 754 +functionality is not preserved, that is values of nan and infinity will probably +behave differently. + +Note that setting this automatically activates `AFL_LLVM_LAF_SPLIT_COMPARES`. + +You can also set `AFL_LLVM_LAF_ALL` and have all of the above enabled. :-) \ No newline at end of file diff --git a/instrumentation/README.lto.md b/instrumentation/README.lto.md index 6174cdc0..a74425dc 100644 --- a/instrumentation/README.lto.md +++ b/instrumentation/README.lto.md @@ -1,55 +1,56 @@ # afl-clang-lto - collision free instrumentation at link time -## TLDR; +## TL;DR: -This version requires a current llvm 11+ compiled from the github master. +This version requires a current llvm 11+ compiled from the GitHub master. 1. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better - coverage than anything else that is out there in the AFL world + coverage than anything else that is out there in the AFL world. -2. You can use it together with llvm_mode: laf-intel and the instrument file listing - features and can be combined with cmplog/Redqueen +2. You can use it together with llvm_mode: laf-intel and the instrument file + listing features and can be combined with cmplog/Redqueen. -3. It only works with llvm 11+ +3. It only works with llvm 11+. -4. AUTODICTIONARY feature! see below +4. AUTODICTIONARY feature (see below)! 
-5. If any problems arise be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. - Some targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`. +5. If any problems arise, be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. Some + targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`. ## Introduction and problem description -A big issue with how AFL/AFL++ works is that the basic block IDs that are -set during compilation are random - and hence naturally the larger the number -of instrumented locations, the higher the number of edge collisions are in the -map. This can result in not discovering new paths and therefore degrade the +A big issue with how AFL++ works is that the basic block IDs that are set during +compilation are random - and hence naturally the larger the number of +instrumented locations, the higher the number of edge collisions are in the map. +This can result in not discovering new paths and therefore degrade the efficiency of the fuzzing process. -*This issue is underestimated in the fuzzing community!* -With a 2^16 = 64kb standard map at already 256 instrumented blocks there is -on average one collision. On average a target has 10.000 to 50.000 -instrumented blocks hence the real collisions are between 750-18.000! +*This issue is underestimated in the fuzzing community!* With a 2^16 = 64kb +standard map at already 256 instrumented blocks, there is on average one +collision. On average, a target has 10.000 to 50.000 instrumented blocks, hence +the real collisions are between 750-18.000! 
-To reach a solution that prevents any collisions took several approaches -and many dead ends until we got to this: +To reach a solution that prevents any collisions took several approaches and +many dead ends until we got to this: - * We instrument at link time when we have all files pre-compiled - * To instrument at link time we compile in LTO (link time optimization) mode - * Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the - correct LTO options and runs our own afl-ld linker instead of the system - linker - * The LLVM linker collects all LTO files to link and instruments them so that - we have non-colliding edge overage - * We use a new (for afl) edge coverage - which is the same as in llvm - -fsanitize=coverage edge coverage mode :) +* We instrument at link time when we have all files pre-compiled. +* To instrument at link time, we compile in LTO (link time optimization) mode. +* Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the correct + LTO options and runs our own afl-ld linker instead of the system linker. +* The LLVM linker collects all LTO files to link and instruments them so that we + have non-colliding edge overage. +* We use a new (for afl) edge coverage - which is the same as in llvm + -fsanitize=coverage edge coverage mode. :) The result: - * 10-25% speed gain compared to llvm_mode - * guaranteed non-colliding edge coverage :-) - * The compile time especially for binaries to an instrumented library can be - much longer + +* 10-25% speed gain compared to llvm_mode +* guaranteed non-colliding edge coverage :-) +* The compile time, especially for binaries to an instrumented library, can be + much longer. 
Example build output from a libtiff build: + ``` libtool: link: afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o ../libtiff/.libs/libtiff.a ../port/.libs/libport.a -llzma -ljbig -ljpeg -lz -lm afl-clang-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de> in mode LTO @@ -62,21 +63,24 @@ AUTODICTIONARY: 11 strings found ### Installing llvm version 11 or 12 -llvm 11 or even 12 should be available in all current Linux repositories. -If you use an outdated Linux distribution read the next section. +llvm 11 or even 12 should be available in all current Linux repositories. If you +use an outdated Linux distribution, read the next section. ### Installing llvm from the llvm repository (version 12+) Installing the llvm snapshot builds is easy and mostly painless: -In the follow line change `NAME` for your Debian or Ubuntu release name +In the following line, change `NAME` for your Debian or Ubuntu release name (e.g. buster, focal, eon, etc.): + ``` echo deb http://apt.llvm.org/NAME/ llvm-toolchain-NAME NAME >> /etc/apt/sources.list ``` -then add the pgp key of llvm and install the packages: + +Then add the pgp key of llvm and install the packages: + ``` -wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - +wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - apt-get update && apt-get upgrade -y apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \ libc++abi1-12 libc++abi-12-dev libclang1-12 libclang-12-dev \ @@ -87,7 +91,8 @@ apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \ ### Building llvm yourself (version 12+) -Building llvm from github takes quite some long time and is not painless: +Building llvm from GitHub takes quite some time and is not painless: + ```sh sudo apt install binutils-dev # this is *essential*! git clone --depth=1 https://github.com/llvm/llvm-project @@ -126,10 +131,12 @@ sudo make install Just use afl-clang-lto like you did with afl-clang-fast or afl-gcc. 
-Also the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -> [README.instrument_list.md](README.instrument_list.md)) and -laf-intel/compcov (AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work. +Also, the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -> +[README.instrument_list.md](README.instrument_list.md)) and laf-intel/compcov +(AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work. Example: + ``` CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar ./configure make @@ -143,51 +150,48 @@ NOTE: some targets also need to set the linker, try both `afl-clang-lto` and Note: this is highly discouraged! Try to compile to static libraries with afl-clang-lto instead of shared libraries! -To make instrumented shared libraries work with afl-clang-lto you have to do +To make instrumented shared libraries work with afl-clang-lto, you have to do quite some extra steps. -Every shared library you want to instrument has to be individually compiled. -The environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during -compilation. -Additionally the environment variable `AFL_LLVM_LTO_STARTID` has to be set to -the added edge count values of all previous compiled instrumented shared -libraries for that target. -E.g. for the first shared library this would be `AFL_LLVM_LTO_STARTID=0` and -afl-clang-lto will then report how many edges have been instrumented (let's say -it reported 1000 instrumented edges). -The second shared library then has to be set to that value +Every shared library you want to instrument has to be individually compiled. The +environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during +compilation. Additionally, the environment variable `AFL_LLVM_LTO_STARTID` has +to be set to the added edge count values of all previous compiled instrumented +shared libraries for that target. 
E.g., for the first shared library this would +be `AFL_LLVM_LTO_STARTID=0` and afl-clang-lto will then report how many edges +have been instrumented (let's say it reported 1000 instrumented edges). The +second shared library then has to be set to that value (`AFL_LLVM_LTO_STARTID=1000` in our example), for the third to all previous counts added, etc. -The final program compilation step then may *not* have `AFL_LLVM_LTO_DONTWRITEID` -set, and `AFL_LLVM_LTO_STARTID` must be set to all edge counts added of all shared -libraries it will be linked to. +The final program compilation step then may *not* have +`AFL_LLVM_LTO_DONTWRITEID` set, and `AFL_LLVM_LTO_STARTID` must be set to all +edge counts added of all shared libraries it will be linked to. -This is quite some hands-on work, so better stay away from instrumenting -shared libraries :-) +This is quite some hands-on work, so better stay away from instrumenting shared +libraries. :-) ## AUTODICTIONARY feature While compiling, a dictionary based on string comparisons is automatically -generated and put into the target binary. This dictionary is transfered to afl-fuzz -on start. This improves coverage statistically by 5-10% :) +generated and put into the target binary. This dictionary is transferred to +afl-fuzz on start. This improves coverage statistically by 5-10%. :) -Note that if for any reason you do not want to use the autodictionary feature +Note that if for any reason you do not want to use the autodictionary feature, then just set the environment variable `AFL_NO_AUTODICT` when starting afl-fuzz. ## Fixed memory map To speed up fuzzing a little bit more, it is possible to set a fixed shared -memory map. -Recommended is the value 0x10000. +memory map. Recommended is the value 0x10000. -In most cases this will work without any problems. However if a target uses -early constructors, ifuncs or a deferred forkserver this can crash the target. +In most cases, this will work without any problems. 
However, if a target uses +early constructors, ifuncs, or a deferred forkserver, this can crash the target. -Also on unusual operating systems/processors/kernels or weird libraries the +Also, on unusual operating systems/processors/kernels or weird libraries the recommended 0x10000 address might not work, so then change the fixed address. -To enable this feature set AFL_LLVM_MAP_ADDR with the address. +To enable this feature, set `AFL_LLVM_MAP_ADDR` with the address. ## Document edge IDs @@ -206,143 +210,155 @@ these. An example of a hard to solve target is ffmpeg. Here is how to successfully instrument it: -1. Get and extract the current ffmpeg and change to its directory +1. Get and extract the current ffmpeg and change to its directory. 2. Running configure with --cc=clang fails and various other items will fail when compiling, so we have to trick configure: -``` -./configure --enable-lto --disable-shared --disable-inline-asm -``` - -3. Now the configuration is done - and we edit the settings in `./ffbuild/config.mak` - (-: the original line, +: what to change it into): -``` --CC=gcc -+CC=afl-clang-lto --CXX=g++ -+CXX=afl-clang-lto++ --AS=gcc -+AS=llvm-as --LD=gcc -+LD=afl-clang-lto++ --DEPCC=gcc -+DEPCC=afl-clang-lto --DEPAS=gcc -+DEPAS=afl-clang-lto++ --AR=ar -+AR=llvm-ar --AR_CMD=ar -+AR_CMD=llvm-ar --NM_CMD=nm -g -+NM_CMD=llvm-nm -g --RANLIB=ranlib -D -+RANLIB=llvm-ranlib -D -``` - -4. Then type make, wait for a long time and you are done :) + ``` + ./configure --enable-lto --disable-shared --disable-inline-asm + ``` + +3. 
Now the configuration is done - and we edit the settings in + `./ffbuild/config.mak` (-: the original line, +: what to change it into): + + ``` + -CC=gcc + +CC=afl-clang-lto + -CXX=g++ + +CXX=afl-clang-lto++ + -AS=gcc + +AS=llvm-as + -LD=gcc + +LD=afl-clang-lto++ + -DEPCC=gcc + +DEPCC=afl-clang-lto + -DEPAS=gcc + +DEPAS=afl-clang-lto++ + -AR=ar + +AR=llvm-ar + -AR_CMD=ar + +AR_CMD=llvm-ar + -NM_CMD=nm -g + +NM_CMD=llvm-nm -g + -RANLIB=ranlib -D + +RANLIB=llvm-ranlib -D + ``` + +4. Then type make, wait for a long time, and you are done. :) ### Example: WebKit jsc Building jsc is difficult as the build script has bugs. -1. checkout Webkit: -``` -svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit -cd WebKit -``` +1. Checkout Webkit: + + ``` + svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit + cd WebKit + ``` 2. Fix the build environment: -``` -mkdir -p WebKitBuild/Release -cd WebKitBuild/Release -ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12 -ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12 -cd ../.. -``` -3. Build :) + ``` + mkdir -p WebKitBuild/Release + cd WebKitBuild/Release + ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12 + ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12 + cd ../.. + ``` -``` -Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON" -``` +3. Build. 
:) + + ``` + Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON" + ``` ## Potential issues -### compiling libraries fails +### Compiling libraries fails If you see this message: + ``` /bin/ld: libfoo.a: error adding symbols: archive has no index; run ranlib to add one ``` -This is because usually gnu gcc ranlib is being called which cannot deal with clang LTO files. -The solution is simple: when you ./configure you also have to set RANLIB=llvm-ranlib and AR=llvm-ar + +This is because usually gnu gcc ranlib is being called which cannot deal with +clang LTO files. The solution is simple: when you `./configure`, you also have +to set `RANLIB=llvm-ranlib` and `AR=llvm-ar`. Solution: + ``` AR=llvm-ar RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --disable-shared ``` -and on some targets you have to set AR=/RANLIB= even for make as the configure script does not save it. -Other targets ignore environment variables and need the parameters set via -`./configure --cc=... --cxx= --ranlib= ...` etc. (I am looking at you ffmpeg!). +And on some targets you have to set `AR=/RANLIB=` even for `make` as the +configure script does not save it. Other targets ignore environment variables +and need the parameters set via `./configure --cc=... --cxx= --ranlib= ...` etc. +(I am looking at you ffmpeg!) + +If you see this message: -If you see this message ``` assembler command failed ... ``` -then try setting `llvm-as` for configure: + +Then try setting `llvm-as` for configure: + ``` AS=llvm-as ... ``` -### compiling programs still fail +### Compiling programs still fail afl-clang-lto is still work in progress. 
Known issues: - * Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either - obviously - * Anything that does not compile with LTO, afl-clang-lto cannot compile either - obviously +* Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either - + obviously. +* Anything that does not compile with LTO, afl-clang-lto cannot compile either - + obviously. -Hence if building a target with afl-clang-lto fails try to build it with llvm12 -and LTO enabled (`CC=clang-12` `CXX=clang++-12` `CFLAGS=-flto=full` and -`CXXFLAGS=-flto=full`). +Hence, if building a target with afl-clang-lto fails, try to build it with +llvm12 and LTO enabled (`CC=clang-12`, `CXX=clang++-12`, `CFLAGS=-flto=full`, +and `CXXFLAGS=-flto=full`). -If this succeeeds then there is an issue with afl-clang-lto. Please report at -[https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226) +If this succeeds, then there is an issue with afl-clang-lto. Please report at +[https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226). Even some targets where clang-12 fails can be build if the fail is just in `./configure`, see `Solving difficult targets` above. ## History -This was originally envisioned by hexcoder- in Summer 2019, however we saw no -way to create a pass that is run at link time - although there is a option -for this in the PassManager: EP_FullLinkTimeOptimizationLast -("Fun" info - nobody knows what this is doing. And the developer who -implemented this didn't respond to emails.) - -In December then came the idea to implement this as a pass that is run via -the llvm "opt" program, which is performed via an own linker that afterwards -calls the real linker. -This was first implemented in January and work ... kinda. 
-The LTO time instrumentation worked, however "how" the basic blocks were -instrumented was a problem, as reducing duplicates turned out to be very, -very difficult with a program that has so many paths and therefore so many -dependencies. A lot of strategies were implemented - and failed. -And then sat solvers were tried, but with over 10.000 variables that turned -out to be a dead-end too. +This was originally envisioned by hexcoder- in Summer 2019. However, we saw no +way to create a pass that is run at link time - although there is a option for +this in the PassManager: EP_FullLinkTimeOptimizationLast. ("Fun" info - nobody +knows what this is doing. And the developer who implemented this didn't respond +to emails.) + +In December then came the idea to implement this as a pass that is run via the +llvm "opt" program, which is performed via an own linker that afterwards calls +the real linker. This was first implemented in January and work ... kinda. The +LTO time instrumentation worked, however, "how" the basic blocks were +instrumented was a problem, as reducing duplicates turned out to be very, very +difficult with a program that has so many paths and therefore so many +dependencies. A lot of strategies were implemented - and failed. And then sat +solvers were tried, but with over 10.000 variables that turned out to be a +dead-end too. The final idea to solve this came from domenukk who proposed to insert a block -into an edge and then just use incremental counters ... and this worked! -After some trials and errors to implement this vanhauser-thc found out that -there is actually an llvm function for this: SplitEdge() :-) +into an edge and then just use incremental counters ... and this worked! 
After +some trials and errors to implement this vanhauser-thc found out that there is +actually an llvm function for this: SplitEdge() :-) -Still more problems came up though as this only works without bugs from -llvm 9 onwards, and with high optimization the link optimization ruins -the instrumented control flow graph. +Still more problems came up though as this only works without bugs from llvm 9 +onwards, and with high optimization the link optimization ruins the instrumented +control flow graph. -This is all now fixed with llvm 11+. The llvm's own linker is now able to -load passes and this bypasses all problems we had. +This is all now fixed with llvm 11+. The llvm's own linker is now able to load +passes and this bypasses all problems we had. -Happy end :) +Happy end :) \ No newline at end of file diff --git a/instrumentation/README.persistent_mode.md b/instrumentation/README.persistent_mode.md index e9d2a523..d0ccba8c 100644 --- a/instrumentation/README.persistent_mode.md +++ b/instrumentation/README.persistent_mode.md @@ -132,7 +132,7 @@ and you should be all set! Some libraries provide APIs that are stateless, or whose state can be reset in between processing different input files. When such a reset is performed, a single long-lived process can be reused to try out multiple test cases, -eliminating the need for repeated fork() calls and the associated OS overhead. +eliminating the need for repeated `fork()` calls and the associated OS overhead. The basic structure of the program that does this would be: |