From 4dad895bbb187fcff0fa01666640e0fed7108905 Mon Sep 17 00:00:00 2001 From: llzmb <46303940+llzmb@users.noreply.github.com> Date: Tue, 23 Nov 2021 12:17:04 +0100 Subject: Edit "README.persistent_mode.md" --- instrumentation/README.persistent_mode.md | 107 ++++++++++++++++-------------- 1 file changed, 56 insertions(+), 51 deletions(-) diff --git a/instrumentation/README.persistent_mode.md b/instrumentation/README.persistent_mode.md index c6ba2103..efe64a37 100644 --- a/instrumentation/README.persistent_mode.md +++ b/instrumentation/README.persistent_mode.md @@ -3,23 +3,23 @@ ## 1) Introduction In persistent mode, AFL++ fuzzes a target multiple times in a single forked -process, instead of forking a new process for each fuzz execution. -This is the most effective way to fuzz, as the speed can easily be x10 or x20 -times faster without any disadvanges. -*All professional fuzzing uses this mode.* +process, instead of forking a new process for each fuzz execution. This is the +most effective way to fuzz, as the speed can easily be x10 or x20 times faster +without any disadvantages. *All professional fuzzing uses this mode.* Persistent mode requires that the target can be called in one or more functions, and that it's state can be completely reset so that multiple calls can be performed without resource leaks, and that earlier runs will have no impact on -future runs (an indicator for this is the `stability` value in the `afl-fuzz` -UI, if this decreases to lower values in persistent mode compared to -non-persistent mode, that the fuzz target keeps state). +future runs. An indicator for this is the `stability` value in the `afl-fuzz` +UI. If this decreases to lower values in persistent mode compared to +non-persistent mode, then the fuzz target keeps state. Examples can be found in [utils/persistent_mode](../utils/persistent_mode).
-## 2) TLDR; +## 2) TL;DR: Example `fuzz_target.c`: + ```c #include "what_you_need_for_your_target.h" @@ -27,7 +27,7 @@ __AFL_FUZZ_INIT(); main() { - // anything else here, eg. command line arguments, initialization, etc. + // anything else here, e.g. command line arguments, initialization, etc. #ifdef __AFL_HAVE_MANUAL_CONTROL __AFL_INIT(); @@ -54,14 +54,16 @@ main() { } ``` + And then compile: + ``` afl-clang-fast -o fuzz_target fuzz_target.c -lwhat_you_need_for_your_target ``` -And that is it! -The speed increase is usually x10 to x20. -If you want to be able to compile the target without afl-clang-fast/lto then +And that is it! The speed increase is usually x10 to x20. + +If you want to be able to compile the target without afl-clang-fast/lto, then add this just after the includes: ```c @@ -72,20 +74,20 @@ add this just after the includes: #define __AFL_FUZZ_TESTCASE_BUF fuzz_buf #define __AFL_FUZZ_INIT() void sync(void); #define __AFL_LOOP(x) ((fuzz_len = read(0, fuzz_buf, sizeof(fuzz_buf))) > 0 ? 1 : 0) - #define __AFL_INIT() sync() + #define __AFL_INIT() sync() #endif ``` ## 3) Deferred initialization -AFL tries to optimize performance by executing the targeted binary just once, -stopping it just before `main()`, and then cloning this "main" process to get -a steady supply of targets to fuzz. +AFL++ tries to optimize performance by executing the targeted binary just once, +stopping it just before `main()`, and then cloning this "main" process to get a +steady supply of targets to fuzz. -Although this approach eliminates much of the OS-, linker- and libc-level -costs of executing the program, it does not always help with binaries that -perform other time-consuming initialization steps - say, parsing a large config -file before getting to the fuzzed data. 
+Although this approach eliminates much of the OS-, linker- and libc-level costs +of executing the program, it does not always help with binaries that perform +other time-consuming initialization steps - say, parsing a large config file +before getting to the fuzzed data. In such cases, it's beneficial to initialize the forkserver a bit later, once most of the initialization work is already done, but before the binary attempts @@ -93,22 +95,21 @@ to read the fuzzed input and parse it; in some cases, this can offer a 10x+ performance gain. You can implement delayed initialization in LLVM mode in a fairly simple way. -First, find a suitable location in the code where the delayed cloning can -take place. This needs to be done with *extreme* care to avoid breaking the -binary. In particular, the program will probably malfunction if you select -a location after: +First, find a suitable location in the code where the delayed cloning can take +place. This needs to be done with *extreme* care to avoid breaking the binary. +In particular, the program will probably malfunction if you select a location +after: - - The creation of any vital threads or child processes - since the forkserver - can't clone them easily. +- The creation of any vital threads or child processes - since the forkserver + can't clone them easily. - - The initialization of timers via `setitimer()` or equivalent calls. +- The initialization of timers via `setitimer()` or equivalent calls. - - The creation of temporary files, network sockets, offset-sensitive file - descriptors, and similar shared-state resources - but only provided that - their state meaningfully influences the behavior of the program later on. +- The creation of temporary files, network sockets, offset-sensitive file + descriptors, and similar shared-state resources - but only provided that their + state meaningfully influences the behavior of the program later on. 
- - Any access to the fuzzed input, including reading the metadata about its - size. +- Any access to the fuzzed input, including reading the metadata about its size. With the location selected, add this code in the appropriate spot: @@ -126,7 +127,6 @@ Finally, recompile the program with afl-clang-fast/afl-clang-lto/afl-gcc-fast (afl-gcc or afl-clang will *not* generate a deferred-initialization binary) - and you should be all set! - ## 4) Persistent mode Some libraries provide APIs that are stateless, or whose state can be reset in @@ -145,23 +145,24 @@ The basic structure of the program that does this would be: } - /* Exit normally */ + /* Exit normally. */ ``` -The numerical value specified within the loop controls the maximum number -of iterations before AFL will restart the process from scratch. This minimizes +The numerical value specified within the loop controls the maximum number of +iterations before AFL++ will restart the process from scratch. This minimizes the impact of memory leaks and similar glitches; 1000 is a good starting point, -and going much higher increases the likelihood of hiccups without giving you -any real performance benefits. +and going much higher increases the likelihood of hiccups without giving you any +real performance benefits. -A more detailed template is shown in `../utils/persistent_mode/.` -Similarly to the previous mode, the feature works only with afl-clang-fast; -`#ifdef` guards can be used to suppress it when using other compilers. +A more detailed template is shown in +[utils/persistent_mode](../utils/persistent_mode). Similarly to the previous +mode, the feature works only with afl-clang-fast; `#ifdef` guards can be used to +suppress it when using other compilers. -Note that as with the previous mode, the feature is easy to misuse; if you -do not fully reset the critical state, you may end up with false positives or -waste a whole lot of CPU power doing nothing useful at all. 
Be particularly -wary of memory leaks and of the state of file descriptors. +Note that as with the previous mode, the feature is easy to misuse; if you do +not fully reset the critical state, you may end up with false positives or waste +a whole lot of CPU power doing nothing useful at all. Be particularly wary of +memory leaks and of the state of file descriptors. PS. Because there are task switches still involved, the mode isn't as fast as "pure" in-process fuzzing offered, say, by LLVM's LibFuzzer; but it is a lot @@ -170,9 +171,9 @@ should be a lot more robust. ## 5) Shared memory fuzzing -You can speed up the fuzzing process even more by receiving the fuzzing data -via shared memory instead of stdin or files. -This is a further speed multiplier of about 2x. +You can speed up the fuzzing process even more by receiving the fuzzing data via +shared memory instead of stdin or files. This is a further speed multiplier of +about 2x. Setting this up is very easy: @@ -181,14 +182,18 @@ After the includes set the following macro: ```c __AFL_FUZZ_INIT(); ``` -Directly at the start of main - or if you are using the deferred forkserver -with `__AFL_INIT()` then *after* `__AFL_INIT()` : + +Directly at the start of main - or if you are using the deferred forkserver with +`__AFL_INIT()`, then *after* `__AFL_INIT()`: + ```c unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF; ``` Then as first line after the `__AFL_LOOP` while loop: + ```c int len = __AFL_FUZZ_TESTCASE_LEN; ``` -and that is all! + +And that is all! 
\ No newline at end of file -- cgit 1.4.1 From d9ff3745d01e30f3addbb51e391b8b5d456d07a4 Mon Sep 17 00:00:00 2001 From: llzmb <46303940+llzmb@users.noreply.github.com> Date: Tue, 23 Nov 2021 18:58:36 +0100 Subject: Edit "README.persistent_mode.md" --- instrumentation/README.persistent_mode.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/instrumentation/README.persistent_mode.md b/instrumentation/README.persistent_mode.md index efe64a37..e9d2a523 100644 --- a/instrumentation/README.persistent_mode.md +++ b/instrumentation/README.persistent_mode.md @@ -155,14 +155,14 @@ and going much higher increases the likelihood of hiccups without giving you any real performance benefits. A more detailed template is shown in -[utils/persistent_mode](../utils/persistent_mode). Similarly to the previous -mode, the feature works only with afl-clang-fast; `#ifdef` guards can be used to -suppress it when using other compilers. - -Note that as with the previous mode, the feature is easy to misuse; if you do -not fully reset the critical state, you may end up with false positives or waste -a whole lot of CPU power doing nothing useful at all. Be particularly wary of -memory leaks and of the state of file descriptors. +[utils/persistent_mode](../utils/persistent_mode). Similarly to the deferred +initialization, the feature works only with afl-clang-fast; `#ifdef` guards can +be used to suppress it when using other compilers. + +Note that as with the deferred initialization, the feature is easy to misuse; if +you do not fully reset the critical state, you may end up with false positives +or waste a whole lot of CPU power doing nothing useful at all. Be particularly +wary of memory leaks and of the state of file descriptors. PS.
Because there are task switches still involved, the mode isn't as fast as "pure" in-process fuzzing offered, say, by LLVM's LibFuzzer; but it is a lot -- cgit 1.4.1 From 6cce577b907eb2ac58b0bc5ddacf373627b3480f Mon Sep 17 00:00:00 2001 From: llzmb <46303940+llzmb@users.noreply.github.com> Date: Tue, 23 Nov 2021 21:03:56 +0100 Subject: Edit instrumentation READMEs --- instrumentation/README.cmplog.md | 20 +- instrumentation/README.gcc_plugin.md | 152 +++++++------- instrumentation/README.instrument_list.md | 109 +++++----- instrumentation/README.laf-intel.md | 66 +++--- instrumentation/README.lto.md | 322 ++++++++++++++++-------------- instrumentation/README.persistent_mode.md | 2 +- 6 files changed, 350 insertions(+), 321 deletions(-) diff --git a/instrumentation/README.cmplog.md b/instrumentation/README.cmplog.md index a796c7a7..146b4620 100644 --- a/instrumentation/README.cmplog.md +++ b/instrumentation/README.cmplog.md @@ -1,11 +1,12 @@ # CmpLog instrumentation -The CmpLog instrumentation enables logging of comparison operands in a -shared memory. +The CmpLog instrumentation enables logging of comparison operands in shared +memory. -These values can be used by various mutators built on top of it. -At the moment we support the RedQueen mutator (input-2-state instructions only), -for details see [the RedQueen paper](https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentlichungen/2018/12/17/NDSS19-Redqueen.pdf). +These values can be used by various mutators built on top of it. At the moment, +we support the RedQueen mutator (input-2-state instructions only); for details, +see +[the RedQueen paper](https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentlichungen/2018/12/17/NDSS19-Redqueen.pdf). ## Build @@ -14,7 +15,8 @@ program. The first version is built using the regular AFL++ instrumentation. -The second one, the CmpLog binary, is built with setting AFL_LLVM_CMPLOG during the compilation.
+The second one, the CmpLog binary, is built by setting AFL_LLVM_CMPLOG during +the compilation. For example: @@ -32,8 +34,8 @@ unset AFL_LLVM_CMPLOG ## Use -AFL++ has the new `-c` option that needs to be used to specify the CmpLog binary (the second -build). +AFL++ has the new `-c` option that needs to be used to specify the CmpLog binary +(the second build). For example: @@ -41,4 +43,4 @@ For example: afl-fuzz -i input -o output -c ./program.cmplog -m none -- ./program.afl @@ ``` -Be sure to use `-m none` because CmpLog can map a lot of pages. +Be sure to use `-m none` because CmpLog can map a lot of pages. \ No newline at end of file diff --git a/instrumentation/README.gcc_plugin.md b/instrumentation/README.gcc_plugin.md index 230ceb73..33cf1c33 100644 --- a/instrumentation/README.gcc_plugin.md +++ b/instrumentation/README.gcc_plugin.md @@ -1,64 +1,68 @@ # GCC-based instrumentation for afl-fuzz -See [../README.md](../README.md) for the general instruction manual. -See [README.llvm.md](README.llvm.md) for the LLVM-based instrumentation. +For the general instruction manual, see [../README.md](../README.md). +For the LLVM-based instrumentation, see [README.llvm.md](README.llvm.md). This document describes how to build and use `afl-gcc-fast` and `afl-g++-fast`, which instrument the target with the help of gcc plugins. -TLDR: - * check the version of your gcc compiler: `gcc --version` - * `apt-get install gcc-VERSION-plugin-dev` or similar to install headers for gcc plugins - * `gcc` and `g++` must match the gcc-VERSION you installed headers for. You can set `AFL_CC`/`AFL_CXX` - to point to these! - * `make` - * just use `afl-gcc-fast`/`afl-g++-fast` normally like you would do with `afl-clang-fast` +TL;DR: +* Check the version of your gcc compiler: `gcc --version` +* `apt-get install gcc-VERSION-plugin-dev` or similar to install headers for gcc + plugins. +* `gcc` and `g++` must match the gcc-VERSION you installed headers for.
You can + set `AFL_CC`/`AFL_CXX` to point to these! +* `make` +* Just use `afl-gcc-fast`/`afl-g++-fast` normally like you would do with + `afl-clang-fast`. ## 1) Introduction -The code in this directory allows to instrument programs for AFL using -true compiler-level instrumentation, instead of the more crude -assembly-level rewriting approach taken by afl-gcc and afl-clang. This has -several interesting properties: +The code in this directory allows you to instrument programs for AFL++ using true +compiler-level instrumentation, instead of the more crude assembly-level +rewriting approach taken by afl-gcc and afl-clang. This has several interesting +properties: - - The compiler can make many optimizations that are hard to pull off when - manually inserting assembly. As a result, some slow, CPU-bound programs will - run up to around faster. +- The compiler can make many optimizations that are hard to pull off when + manually inserting assembly. As a result, some slow, CPU-bound programs will + run up to around faster. - The gains are less pronounced for fast binaries, where the speed is limited - chiefly by the cost of creating new processes. In such cases, the gain will - probably stay within 10%. + The gains are less pronounced for fast binaries, where the speed is limited + chiefly by the cost of creating new processes. In such cases, the gain will + probably stay within 10%. - - The instrumentation is CPU-independent. At least in principle, you should - be able to rely on it to fuzz programs on non-x86 architectures (after - building `afl-fuzz` with `AFL_NOX86=1`). +- The instrumentation is CPU-independent. At least in principle, you should be + able to rely on it to fuzz programs on non-x86 architectures (after building + `afl-fuzz` with `AFL_NOX86=1`). - - Because the feature relies on the internals of GCC, it is gcc-specific - and will *not* work with LLVM (see [README.llvm.md](README.llvm.md) for an alternative).
+- Because the feature relies on the internals of GCC, it is gcc-specific and + will *not* work with LLVM (see [README.llvm.md](README.llvm.md) for an + alternative). Once this implementation is shown to be sufficiently robust and portable, it -will probably replace afl-gcc. For now, it can be built separately and -co-exists with the original code. +will probably replace afl-gcc. For now, it can be built separately and co-exists +with the original code. The idea and much of the implementation comes from Laszlo Szekeres. ## 2) How to use -In order to leverage this mechanism, you need to have modern enough GCC -(>= version 4.5.0) and the plugin development headers installed on your system. That +In order to leverage this mechanism, you need to have modern enough GCC (>= +version 4.5.0) and the plugin development headers installed on your system. That should be all you need. On Debian machines, these headers can be acquired by installing the `gcc-VERSION-plugin-dev` packages. To build the instrumentation itself, type `make`. This will generate binaries -called `afl-gcc-fast` and `afl-g++-fast` in the parent directory. +called `afl-gcc-fast` and `afl-g++-fast` in the parent directory. -The gcc and g++ compiler links have to point to gcc-VERSION - or set these -by pointing the environment variables `AFL_CC`/`AFL_CXX` to them. -If the `CC`/`CXX` environment variables have been set, those compilers will be -preferred over those from the `AFL_CC`/`AFL_CXX` settings. +The gcc and g++ compiler links have to point to gcc-VERSION - or set these by +pointing the environment variables `AFL_CC`/`AFL_CXX` to them. If the `CC`/`CXX` +environment variables have been set, those compilers will be preferred over +those from the `AFL_CC`/`AFL_CXX` settings. 
Once this is done, you can instrument third-party code in a way similar to the -standard operating mode of AFL, e.g.: +standard operating mode of AFL++, e.g.: + ``` CC=/path/to/afl/afl-gcc-fast CXX=/path/to/afl/afl-g++-fast @@ -66,15 +70,15 @@ standard operating mode of AFL, e.g.: ./configure [...options...] make ``` + Note: We also used `CXX` to set the C++ compiler to `afl-g++-fast` for C++ code. The tool honors roughly the same environmental variables as `afl-gcc` (see -[env_variables.md](../docs/env_variables.md). This includes `AFL_INST_RATIO`, -`AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. +[docs/env_variables.md](../docs/env_variables.md)). This includes +`AFL_INST_RATIO`, `AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. -Note: if you want the GCC plugin to be installed on your system for all -users, you need to build it before issuing 'make install' in the parent -directory. +Note: if you want the GCC plugin to be installed on your system for all users, +you need to build it before issuing `make install` in the parent directory. ## 3) Gotchas, feedback, bugs @@ -83,41 +87,40 @@ reports to afl@aflplus.plus. ## 4) Bonus feature #1: deferred initialization -AFL tries to optimize performance by executing the targeted binary just once, -stopping it just before main(), and then cloning this "main" process to get -a steady supply of targets to fuzz. +AFL++ tries to optimize performance by executing the targeted binary just once, +stopping it just before `main()`, and then cloning this "main" process to get a +steady supply of targets to fuzz. -Although this approach eliminates much of the OS-, linker- and libc-level -costs of executing the program, it does not always help with binaries that -perform other time-consuming initialization steps - say, parsing a large config -file before getting to the fuzzed data.
+Although this approach eliminates much of the OS-, linker- and libc-level costs +of executing the program, it does not always help with binaries that perform +other time-consuming initialization steps - say, parsing a large config file +before getting to the fuzzed data. In such cases, it's beneficial to initialize the forkserver a bit later, once most of the initialization work is already done, but before the binary attempts to read the fuzzed input and parse it; in some cases, this can offer a 10x+ performance gain. You can implement delayed initialization in GCC mode in a -fairly simple way. +fairly simple way: -First, locate a suitable location in the code where the delayed cloning can -take place. This needs to be done with *extreme* care to avoid breaking the -binary. In particular, the program will probably malfunction if you select -a location after: +First, locate a suitable location in the code where the delayed cloning can take +place. This needs to be done with *extreme* care to avoid breaking the binary. +In particular, the program will probably malfunction if you select a location +after: - - The creation of any vital threads or child processes - since the forkserver - can't clone them easily. +- The creation of any vital threads or child processes - since the forkserver + can't clone them easily. - - The initialization of timers via setitimer() or equivalent calls. +- The initialization of timers via `setitimer()` or equivalent calls. - - The creation of temporary files, network sockets, offset-sensitive file - descriptors, and similar shared-state resources - but only provided that - their state meaningfully influences the behavior of the program later on. +- The creation of temporary files, network sockets, offset-sensitive file + descriptors, and similar shared-state resources - but only provided that their + state meaningfully influences the behavior of the program later on. 
- - Any access to the fuzzed input, including reading the metadata about its - size. +- Any access to the fuzzed input, including reading the metadata about its size. With the location selected, add this code in the appropriate spot: -``` +```c #ifdef __AFL_HAVE_MANUAL_CONTROL __AFL_INIT(); #endif @@ -131,14 +134,14 @@ Finally, recompile the program with afl-gcc-fast (afl-gcc or afl-clang will ## 5) Bonus feature #2: persistent mode -Some libraries provide APIs that are stateless, or whose state can be reset in +Some libraries provide APIs that are stateless or whose state can be reset in between processing different input files. When such a reset is performed, a single long-lived process can be reused to try out multiple test cases, eliminating the need for repeated `fork()` calls and the associated OS overhead. The basic structure of the program that does this would be: -``` +```c while (__AFL_LOOP(1000)) { /* Read input data. */ @@ -147,22 +150,21 @@ The basic structure of the program that does this would be: } - /* Exit normally */ + /* Exit normally. */ ``` -The numerical value specified within the loop controls the maximum number -of iterations before AFL will restart the process from scratch. This minimizes +The numerical value specified within the loop controls the maximum number of +iterations before AFL++ will restart the process from scratch. This minimizes the impact of memory leaks and similar glitches; 1000 is a good starting point. -A more detailed template is shown in ../utils/persistent_mode/. -Similarly to the previous mode, the feature works only with afl-gcc-fast or -afl-clang-fast; #ifdef guards can be used to suppress it when using other -compilers. +A more detailed template is shown in ../utils/persistent_mode/. Similarly to the +previous mode, the feature works only with afl-gcc-fast or afl-clang-fast; +#ifdef guards can be used to suppress it when using other compilers. 
-Note that as with the previous mode, the feature is easy to misuse; if you -do not reset the critical state fully, you may end up with false positives or -waste a whole lot of CPU power doing nothing useful at all. Be particularly -wary of memory leaks and the state of file descriptors. +Note that as with the previous mode, the feature is easy to misuse; if you do +not reset the critical state fully, you may end up with false positives or waste +a whole lot of CPU power doing nothing useful at all. Be particularly wary of +memory leaks and the state of file descriptors. When running in this mode, the execution paths will inherently vary a bit depending on whether the input loop is being entered for the first time or @@ -171,5 +173,5 @@ executed again. To avoid spurious warnings, the feature implies ## 6) Bonus feature #3: selective instrumentation -It can be more effective to fuzzing to only instrument parts of the code. -For details see [README.instrument_list.md](README.instrument_list.md). +It can be more effective for fuzzing to only instrument parts of the code. For +details, see [README.instrument_list.md](README.instrument_list.md). \ No newline at end of file diff --git a/instrumentation/README.instrument_list.md b/instrumentation/README.instrument_list.md index 7db9c055..b412b600 100644 --- a/instrumentation/README.instrument_list.md +++ b/instrumentation/README.instrument_list.md @@ -1,80 +1,84 @@ # Using AFL++ with partial instrumentation - This file describes two different mechanisms to selectively instrument - only specific parts in the target. +This file describes two different mechanisms to selectively instrument only +specific parts in the target. - Both mechanisms work for LLVM and GCC_PLUGIN, but not for afl-clang/afl-gcc. +Both mechanisms work for LLVM and GCC_PLUGIN, but not for afl-clang/afl-gcc.
## 1) Description and purpose When building and testing complex programs where only a part of the program is -the fuzzing target, it often helps to only instrument the necessary parts of -the program, leaving the rest uninstrumented. This helps to focus the fuzzer -on the important parts of the program, avoiding undesired noise and -disturbance by uninteresting code being exercised. +the fuzzing target, it often helps to only instrument the necessary parts of the +program, leaving the rest uninstrumented. This helps to focus the fuzzer on the +important parts of the program, avoiding undesired noise and disturbance by +uninteresting code being exercised. For this purpose, "partial instrumentation" support is provided by AFL++ that allows to specify what should be instrumented and what not. -Both mechanisms can be used together. +Both mechanisms for partial instrumentation can be used together. ## 2) Selective instrumentation with __AFL_COVERAGE_... directives -In this mechanism the selective instrumentation is done in the source code. +In this mechanism, the selective instrumentation is done in the source code. -After the includes a special define has to be made, eg.: +After the includes, a special define has to be made, e.g.: ``` #include #include // ... - + __AFL_COVERAGE(); // <- required for this feature to work ``` -If you want to disable the coverage at startup until you specify coverage -should be started, then add `__AFL_COVERAGE_START_OFF();` at that position. +If you want to disable the coverage at startup until you specify coverage should +be started, then add `__AFL_COVERAGE_START_OFF();` at that position. 
-From here on out you have the following macros available that you can use -in any function where you want: +From here on out, you have the following macros available that you can use in +any function where you want: - * `__AFL_COVERAGE_ON();` - enable coverage from this point onwards - * `__AFL_COVERAGE_OFF();` - disable coverage from this point onwards - * `__AFL_COVERAGE_DISCARD();` - reset all coverage gathered until this point - * `__AFL_COVERAGE_SKIP();` - mark this test case as unimportant. Whatever happens, afl-fuzz will ignore it. +* `__AFL_COVERAGE_ON();` - Enable coverage from this point onwards. +* `__AFL_COVERAGE_OFF();` - Disable coverage from this point onwards. +* `__AFL_COVERAGE_DISCARD();` - Reset all coverage gathered until this point. +* `__AFL_COVERAGE_SKIP();` - Mark this test case as unimportant. Whatever + happens, afl-fuzz will ignore it. -A special function is `__afl_coverage_interesting`. -To use this, you must define `void __afl_coverage_interesting(u8 val, u32 id);`. -Then you can use this function globally, where the `val` parameter can be set -by you, the `id` parameter is for afl-fuzz and will be overwritten. -Note that useful parameters for `val` are: 1, 2, 3, 4, 8, 16, 32, 64, 128. -A value of e.g. 33 will be seen as 32 for coverage purposes. +A special function is `__afl_coverage_interesting`. To use this, you must define +`void __afl_coverage_interesting(u8 val, u32 id);`. Then you can use this +function globally, where the `val` parameter can be set by you, the `id` +parameter is for afl-fuzz and will be overwritten. Note that useful parameters +for `val` are: 1, 2, 3, 4, 8, 16, 32, 64, 128. A value of, e.g., 33 will be seen +as 32 for coverage purposes. ## 3) Selective instrumentation with AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -This feature is equivalent to llvm 12 sancov feature and allows to specify -on a filename and/or function name level to instrument these or skip them. 
+This feature is equivalent to the LLVM 12 sancov feature and allows you to specify, +on a filename and/or function name level, what to instrument and what to skip. ### 3a) How to use the partial instrumentation mode In order to build with partial instrumentation, you need to build with -afl-clang-fast/afl-clang-fast++ or afl-clang-lto/afl-clang-lto++. -The only required change is that you need to set either the environment variable -AFL_LLVM_ALLOWLIST or AFL_LLVM_DENYLIST set with a filename. +afl-clang-fast/afl-clang-fast++ or afl-clang-lto/afl-clang-lto++. The only +required change is that you need to set either the environment variable +`AFL_LLVM_ALLOWLIST` or `AFL_LLVM_DENYLIST` to a filename. That file should contain the file names or functions that are to be instrumented -(AFL_LLVM_ALLOWLIST) or are specifically NOT to be instrumented (AFL_LLVM_DENYLIST). +(`AFL_LLVM_ALLOWLIST`) or are specifically NOT to be instrumented +(`AFL_LLVM_DENYLIST`). + +GCC_PLUGIN: you can use either `AFL_LLVM_ALLOWLIST` or `AFL_GCC_ALLOWLIST` (or +the same for `_DENYLIST`); both work. -GCC_PLUGIN: you can use either AFL_LLVM_ALLOWLIST or AFL_GCC_ALLOWLIST (or the -same for _DENYLIST), both work. +For matching to succeed, the function/file name that is being compiled must end +in the function/file name entry contained in this instrument file list. That is +to avoid breaking the match when absolute paths are used during compilation. -For matching to succeed, the function/file name that is being compiled must end in the -function/file name entry contained in this instrument file list. That is to avoid -breaking the match when absolute paths are used during compilation. +**NOTE:** In builds with optimization enabled, functions might be inlined and +would not match!
+For example, if your source tree looks like this: -For example if your source tree looks like this: ``` project/ project/feature_a/a1.cpp @@ -83,36 +87,45 @@ project/feature_b/b1.cpp project/feature_b/b2.cpp ``` -and you only want to test feature_a, then create an "instrument file list" file containing: +And you only want to test feature_a, then create an "instrument file list" file +containing: + ``` feature_a/a1.cpp feature_a/a2.cpp ``` -However if the "instrument file list" file contains only this, it works as well: +However, if the "instrument file list" file contains only this, it works as +well: + ``` a1.cpp a2.cpp ``` -but it might lead to files being unwantedly instrumented if the same filename + +But it might lead to files being unwantedly instrumented if the same filename exists somewhere else in the project directories. -You can also specify function names. Note that for C++ the function names -must be mangled to match! `nm` can print these names. +You can also specify function names. Note that for C++ the function names must +be mangled to match! `nm` can print these names. + +AFL++ is able to identify whether an entry is a filename or a function. However, +if you want to be sure (and compliant to the sancov allow/blocklist format), you +can specify source file entries like this: -AFL++ is able to identify whether an entry is a filename or a function. -However if you want to be sure (and compliant to the sancov allow/blocklist -format), you can specify source file entries like this: ``` src: *malloc.c ``` -and function entries like this: + +And function entries like this: + ``` fun: MallocFoo ``` + Note that whitespace is ignored and comments (`# foo`) are supported. ### 3b) UNIX-style pattern matching You can add UNIX-style pattern matching in the "instrument file list" entries. -See `man fnmatch` for the syntax. We do not set any of the `fnmatch` flags. +See `man fnmatch` for the syntax. We do not set any of the `fnmatch` flags. 
\ No newline at end of file diff --git a/instrumentation/README.laf-intel.md b/instrumentation/README.laf-intel.md index 789055ed..3cde10c3 100644 --- a/instrumentation/README.laf-intel.md +++ b/instrumentation/README.laf-intel.md @@ -2,19 +2,17 @@ ## Introduction -This originally is the work of an individual nicknamed laf-intel. -His blog [Circumventing Fuzzing Roadblocks with Compiler Transformations](https://lafintel.wordpress.com/) -and gitlab repo [laf-llvm-pass](https://gitlab.com/laf-intel/laf-llvm-pass/) -describe some code transformations that -help AFL++ to enter conditional blocks, where conditions consist of -comparisons of large values. +This is originally the work of an individual nicknamed laf-intel. His blog +[Circumventing Fuzzing Roadblocks with Compiler Transformations](https://lafintel.wordpress.com/) +and GitLab repo [laf-llvm-pass](https://gitlab.com/laf-intel/laf-llvm-pass/) +describe some code transformations that help AFL++ enter conditional blocks +whose conditions consist of comparisons of large values. ## Usage -By default these passes will not run when you compile programs using -afl-clang-fast. Hence, you can use AFL as usual. -To enable the passes you must set environment variables before you -compile the target project. +By default, these passes will not run when you compile programs using +afl-clang-fast. Hence, you can use AFL++ as usual. To enable the passes, you +must set environment variables before you compile the target project. The following options exist: @@ -24,32 +22,30 @@ Enables the split-switches pass. `export AFL_LLVM_LAF_TRANSFORM_COMPARES=1` -Enables the transform-compares pass (strcmp, memcmp, strncmp, -strcasecmp, strncasecmp). +Enables the transform-compares pass (strcmp, memcmp, strncmp, strcasecmp, +strncasecmp). `export AFL_LLVM_LAF_SPLIT_COMPARES=1` -Enables the split-compares pass. -By default it will +Enables the split-compares pass. By default, it will 1.
simplify operators >= (and <=) into chains of > (<) and == comparisons -2. change signed integer comparisons to a chain of sign-only comparison -and unsigned integer comparisons -3. split all unsigned integer comparisons with bit widths of -64, 32 or 16 bits to chains of 8 bits comparisons. - -You can change the behaviour of the last step by setting -`export AFL_LLVM_LAF_SPLIT_COMPARES_BITW=<bit_width>`, where -bit_width may be 64, 32 or 16. For example, a bit_width of 16 -would split larger comparisons down to 16 bit comparisons. - -A new experimental feature is splitting floating point comparisons into a -series of sign, exponent and mantissa comparisons followed by splitting each -of them into 8 bit comparisons when necessary. -It is activated with the `AFL_LLVM_LAF_SPLIT_FLOATS` setting. -Please note that full IEEE 754 functionality is not preserved, that is -values of nan and infinity will probably behave differently. - -Note that setting this automatically activates `AFL_LLVM_LAF_SPLIT_COMPARES` - -You can also set `AFL_LLVM_LAF_ALL` and have all of the above enabled :-) - +2. change signed integer comparisons to a chain of a sign-only comparison and + unsigned integer comparisons +3. split all unsigned integer comparisons with bit widths of 64, 32, or 16 bits + to chains of 8-bit comparisons. + +You can change the behavior of the last step by setting `export +AFL_LLVM_LAF_SPLIT_COMPARES_BITW=<bit_width>`, where bit_width may be 64, 32, or +16. For example, a bit_width of 16 would split larger comparisons down to 16-bit +comparisons. + +A new experimental feature is splitting floating point comparisons into a series +of sign, exponent, and mantissa comparisons followed by splitting each of them +into 8-bit comparisons when necessary. It is activated with the +`AFL_LLVM_LAF_SPLIT_FLOATS` setting. Please note that full IEEE 754 +functionality is not preserved, that is, NaN and infinity values will probably +behave differently.
+ +Note that setting this automatically activates `AFL_LLVM_LAF_SPLIT_COMPARES`. + +You can also set `AFL_LLVM_LAF_ALL` and have all of the above enabled. :-) \ No newline at end of file diff --git a/instrumentation/README.lto.md b/instrumentation/README.lto.md index 6174cdc0..a74425dc 100644 --- a/instrumentation/README.lto.md +++ b/instrumentation/README.lto.md @@ -1,55 +1,56 @@ # afl-clang-lto - collision free instrumentation at link time -## TLDR; +## TL;DR: -This version requires a current llvm 11+ compiled from the github master. +This version requires a current llvm 11+ compiled from the GitHub master. 1. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better - coverage than anything else that is out there in the AFL world + coverage than anything else that is out there in the AFL world. -2. You can use it together with llvm_mode: laf-intel and the instrument file listing - features and can be combined with cmplog/Redqueen +2. You can use it together with the llvm_mode features laf-intel and instrument + file listing, and it can be combined with cmplog/Redqueen. -3. It only works with llvm 11+ +3. It only works with llvm 11+. -4. AUTODICTIONARY feature! see below +4. AUTODICTIONARY feature (see below)! -5. If any problems arise be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. - Some targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`. +5. If any problems arise, be sure to set `AR=llvm-ar RANLIB=llvm-ranlib`. Some + targets might need `LD=afl-clang-lto` and others `LD=afl-ld-lto`. ## Introduction and problem description -A big issue with how AFL/AFL++ works is that the basic block IDs that are -set during compilation are random - and hence naturally the larger the number -of instrumented locations, the higher the number of edge collisions are in the -map.
This can result in not discovering new paths and therefore degrade the +A big issue with how AFL++ works is that the basic block IDs that are set during +compilation are random - and hence naturally the larger the number of +instrumented locations, the higher the number of edge collisions are in the map. +This can result in not discovering new paths and therefore degrade the efficiency of the fuzzing process. -*This issue is underestimated in the fuzzing community!* -With a 2^16 = 64kb standard map at already 256 instrumented blocks there is -on average one collision. On average a target has 10.000 to 50.000 -instrumented blocks hence the real collisions are between 750-18.000! +*This issue is underestimated in the fuzzing community!* With a 2^16 = 64kb +standard map at already 256 instrumented blocks, there is on average one +collision. On average, a target has 10.000 to 50.000 instrumented blocks, hence +the real collisions are between 750-18.000! -To reach a solution that prevents any collisions took several approaches -and many dead ends until we got to this: +To reach a solution that prevents any collisions took several approaches and +many dead ends until we got to this: - * We instrument at link time when we have all files pre-compiled - * To instrument at link time we compile in LTO (link time optimization) mode - * Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the - correct LTO options and runs our own afl-ld linker instead of the system - linker - * The LLVM linker collects all LTO files to link and instruments them so that - we have non-colliding edge overage - * We use a new (for afl) edge coverage - which is the same as in llvm - -fsanitize=coverage edge coverage mode :) +* We instrument at link time when we have all files pre-compiled. +* To instrument at link time, we compile in LTO (link time optimization) mode. 
+* Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the correct + LTO options and runs our own afl-ld linker instead of the system linker. +* The LLVM linker collects all LTO files to link and instruments them so that we + have non-colliding edge coverage. +* We use a new (for afl) edge coverage - which is the same as in llvm + -fsanitize=coverage edge coverage mode. :) The result: - * 10-25% speed gain compared to llvm_mode - * guaranteed non-colliding edge coverage :-) - * The compile time especially for binaries to an instrumented library can be - much longer + +* 10-25% speed gain compared to llvm_mode +* guaranteed non-colliding edge coverage :-) +* The compile time, especially for binaries linked against an instrumented library, can be + much longer. Example build output from a libtiff build: + ``` libtool: link: afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o ../libtiff/.libs/libtiff.a ../port/.libs/libport.a -llzma -ljbig -ljpeg -lz -lm afl-clang-lto++2.63d by Marc "vanHauser" Heuse in mode LTO @@ -62,21 +63,24 @@ AUTODICTIONARY: 11 strings found ### Installing llvm version 11 or 12 -llvm 11 or even 12 should be available in all current Linux repositories. -If you use an outdated Linux distribution read the next section. +llvm 11 or even 12 should be available in all current Linux repositories. If you +use an outdated Linux distribution, read the next section. ### Installing llvm from the llvm repository (version 12+) Installing the llvm snapshot builds is easy and mostly painless: -In the follow line change `NAME` for your Debian or Ubuntu release name +In the following line, replace `NAME` with your Debian or Ubuntu release name (e.g.
buster, focal, eon, etc.): + ``` echo deb http://apt.llvm.org/NAME/ llvm-toolchain-NAME NAME >> /etc/apt/sources.list ``` -then add the pgp key of llvm and install the packages: + +Then add the llvm PGP key and install the packages: + ``` -wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - +wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - apt-get update && apt-get upgrade -y apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \ libc++abi1-12 libc++abi-12-dev libclang1-12 libclang-12-dev \ @@ -87,7 +91,8 @@ apt-get install -y clang-12 clang-tools-12 libc++1-12 libc++-12-dev \ ### Building llvm yourself (version 12+) -Building llvm from github takes quite some long time and is not painless: +Building llvm from GitHub takes quite some time and is not painless: + ```sh sudo apt install binutils-dev # this is *essential*! git clone --depth=1 https://github.com/llvm/llvm-project @@ -126,10 +131,12 @@ sudo make install Just use afl-clang-lto like you did with afl-clang-fast or afl-gcc. -Also the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -> [README.instrument_list.md](README.instrument_list.md)) and -laf-intel/compcov (AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work. +Also, the instrument file listing (AFL_LLVM_ALLOWLIST/AFL_LLVM_DENYLIST -> +[README.instrument_list.md](README.instrument_list.md)) and laf-intel/compcov +(AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work. Example: + ``` CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib AR=llvm-ar ./configure make @@ -143,51 +150,48 @@ NOTE: some targets also need to set the linker, try both `afl-clang-lto` and Note: this is highly discouraged! Try to compile to static libraries with afl-clang-lto instead of shared libraries! -To make instrumented shared libraries work with afl-clang-lto you have to do +To make instrumented shared libraries work with afl-clang-lto, you have to do quite some extra steps.
-Every shared library you want to instrument has to be individually compiled. -The environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during -compilation. -Additionally the environment variable `AFL_LLVM_LTO_STARTID` has to be set to -the added edge count values of all previous compiled instrumented shared -libraries for that target. -E.g. for the first shared library this would be `AFL_LLVM_LTO_STARTID=0` and -afl-clang-lto will then report how many edges have been instrumented (let's say -it reported 1000 instrumented edges). -The second shared library then has to be set to that value +Every shared library you want to instrument has to be individually compiled. The +environment variable `AFL_LLVM_LTO_DONTWRITEID=1` has to be set during +compilation. Additionally, the environment variable `AFL_LLVM_LTO_STARTID` has +to be set to the sum of the edge counts of all previously compiled instrumented +shared libraries for that target. E.g., for the first shared library this would +be `AFL_LLVM_LTO_STARTID=0` and afl-clang-lto will then report how many edges +have been instrumented (let's say it reported 1000 instrumented edges). The +second shared library then has to be set to that value (`AFL_LLVM_LTO_STARTID=1000` in our example), for the third to all previous counts added, etc. -The final program compilation step then may *not* have `AFL_LLVM_LTO_DONTWRITEID` -set, and `AFL_LLVM_LTO_STARTID` must be set to all edge counts added of all shared -libraries it will be linked to. +The final program compilation step then must *not* have +`AFL_LLVM_LTO_DONTWRITEID` set, and `AFL_LLVM_LTO_STARTID` must be set to the +sum of the edge counts of all shared libraries it will be linked to. -This is quite some hands-on work, so better stay away from instrumenting -shared libraries :-) +This is quite a lot of hands-on work, so better stay away from instrumenting shared +libraries.
:-) ## AUTODICTIONARY feature While compiling, a dictionary based on string comparisons is automatically -generated and put into the target binary. This dictionary is transfered to afl-fuzz -on start. This improves coverage statistically by 5-10% :) +generated and put into the target binary. This dictionary is transferred to +afl-fuzz on start. This improves coverage by 5-10% on average. :) -Note that if for any reason you do not want to use the autodictionary feature +Note that if for any reason you do not want to use the autodictionary feature, then just set the environment variable `AFL_NO_AUTODICT` when starting afl-fuzz. ## Fixed memory map To speed up fuzzing a little bit more, it is possible to set a fixed shared -memory map. -Recommended is the value 0x10000. +memory map. The recommended value is 0x10000. -In most cases this will work without any problems. However if a target uses -early constructors, ifuncs or a deferred forkserver this can crash the target. +In most cases, this will work without any problems. However, if a target uses +early constructors, ifuncs, or a deferred forkserver, this can crash the target. -Also on unusual operating systems/processors/kernels or weird libraries the +Also, on unusual operating systems/processors/kernels or weird libraries the recommended 0x10000 address might not work, so then change the fixed address. -To enable this feature set AFL_LLVM_MAP_ADDR with the address. +To enable this feature, set `AFL_LLVM_MAP_ADDR` to the address. ## Document edge IDs @@ -206,143 +210,155 @@ these. An example of a hard to solve target is ffmpeg. Here is how to successfully instrument it: -1. Get and extract the current ffmpeg and change to its directory +1. Get and extract the current ffmpeg and change to its directory. 2. Running configure with --cc=clang fails and various other items will fail when compiling, so we have to trick configure: -``` -./configure --enable-lto --disable-shared --disable-inline-asm -``` - -3.
Now the configuration is done - and we edit the settings in `./ffbuild/config.mak` - (-: the original line, +: what to change it into): -``` --CC=gcc -+CC=afl-clang-lto --CXX=g++ -+CXX=afl-clang-lto++ --AS=gcc -+AS=llvm-as --LD=gcc -+LD=afl-clang-lto++ --DEPCC=gcc -+DEPCC=afl-clang-lto --DEPAS=gcc -+DEPAS=afl-clang-lto++ --AR=ar -+AR=llvm-ar --AR_CMD=ar -+AR_CMD=llvm-ar --NM_CMD=nm -g -+NM_CMD=llvm-nm -g --RANLIB=ranlib -D -+RANLIB=llvm-ranlib -D -``` - -4. Then type make, wait for a long time and you are done :) + ``` + ./configure --enable-lto --disable-shared --disable-inline-asm + ``` + +3. Now that the configuration is done, edit the settings in + `./ffbuild/config.mak` (-: the original line, +: what to change it into): + + ``` + -CC=gcc + +CC=afl-clang-lto + -CXX=g++ + +CXX=afl-clang-lto++ + -AS=gcc + +AS=llvm-as + -LD=gcc + +LD=afl-clang-lto++ + -DEPCC=gcc + +DEPCC=afl-clang-lto + -DEPAS=gcc + +DEPAS=afl-clang-lto++ + -AR=ar + +AR=llvm-ar + -AR_CMD=ar + +AR_CMD=llvm-ar + -NM_CMD=nm -g + +NM_CMD=llvm-nm -g + -RANLIB=ranlib -D + +RANLIB=llvm-ranlib -D + ``` + +4. Then run `make`, wait for a long time, and you are done. :) ### Example: WebKit jsc Building jsc is difficult as the build script has bugs. -1. checkout Webkit: -``` -svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit -cd WebKit -``` +1. Check out WebKit: + + ``` + svn checkout https://svn.webkit.org/repository/webkit/trunk WebKit + cd WebKit + ``` 2. Fix the build environment: -``` -mkdir -p WebKitBuild/Release -cd WebKitBuild/Release -ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12 -ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12 -cd ../.. -``` -3. Build :) + ``` + mkdir -p WebKitBuild/Release + cd WebKitBuild/Release + ln -s ../../../../../usr/bin/llvm-ar-12 llvm-ar-12 + ln -s ../../../../../usr/bin/llvm-ranlib-12 llvm-ranlib-12 + cd ../..
+ ``` -``` -Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON" -``` +3. Build. :) + + ``` + Tools/Scripts/build-jsc --jsc-only --cli --cmakeargs="-DCMAKE_AR='llvm-ar-12' -DCMAKE_RANLIB='llvm-ranlib-12' -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_CC_FLAGS='-O3 -lrt' -DCMAKE_CXX_FLAGS='-O3 -lrt' -DIMPORTED_LOCATION='/lib/x86_64-linux-gnu/' -DCMAKE_CC=afl-clang-lto -DCMAKE_CXX=afl-clang-lto++ -DENABLE_STATIC_JSC=ON" + ``` ## Potential issues -### compiling libraries fails +### Compiling libraries fails If you see this message: + ``` /bin/ld: libfoo.a: error adding symbols: archive has no index; run ranlib to add one ``` -This is because usually gnu gcc ranlib is being called which cannot deal with clang LTO files. -The solution is simple: when you ./configure you also have to set RANLIB=llvm-ranlib and AR=llvm-ar + +This is because usually GNU ranlib is called, which cannot deal with +clang LTO files. The solution is simple: when you `./configure`, you also have +to set `RANLIB=llvm-ranlib` and `AR=llvm-ar`. Solution: + ``` AR=llvm-ar RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure --disable-shared ``` -and on some targets you have to set AR=/RANLIB= even for make as the configure script does not save it. -Other targets ignore environment variables and need the parameters set via -`./configure --cc=... --cxx= --ranlib= ...` etc. (I am looking at you ffmpeg!). +On some targets, you have to set `AR=`/`RANLIB=` even for `make`, as the +configure script does not save it. Other targets ignore environment variables +and need the parameters set via `./configure --cc=... --cxx= --ranlib= ...` etc. +(I am looking at you, ffmpeg!)
+ +If you see this message: -If you see this message ``` assembler command failed ... ``` -then try setting `llvm-as` for configure: + +Then try setting `llvm-as` for configure: + ``` AS=llvm-as ... ``` -### compiling programs still fail +### Compiling programs still fails afl-clang-lto is still work in progress. Known issues: - * Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either - obviously - * Anything that does not compile with LTO, afl-clang-lto cannot compile either - obviously +* Anything that llvm 11+ cannot compile, afl-clang-lto cannot compile either - + obviously. +* Anything that does not compile with LTO, afl-clang-lto cannot compile either - + obviously. -Hence if building a target with afl-clang-lto fails try to build it with llvm12 -and LTO enabled (`CC=clang-12` `CXX=clang++-12` `CFLAGS=-flto=full` and -`CXXFLAGS=-flto=full`). +Hence, if building a target with afl-clang-lto fails, try to build it with +llvm12 and LTO enabled (`CC=clang-12`, `CXX=clang++-12`, `CFLAGS=-flto=full`, +and `CXXFLAGS=-flto=full`). -If this succeeeds then there is an issue with afl-clang-lto. Please report at -[https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226) +If this succeeds, then there is an issue with afl-clang-lto. Please report at +[https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226). Even some targets where clang-12 fails can be build if the fail is just in `./configure`, see `Solving difficult targets` above. ## History -This was originally envisioned by hexcoder- in Summer 2019, however we saw no -way to create a pass that is run at link time - although there is a option -for this in the PassManager: EP_FullLinkTimeOptimizationLast -("Fun" info - nobody knows what this is doing. And the developer who -implemented this didn't respond to emails.)
- -In December then came the idea to implement this as a pass that is run via -the llvm "opt" program, which is performed via an own linker that afterwards -calls the real linker. -This was first implemented in January and work ... kinda. -The LTO time instrumentation worked, however "how" the basic blocks were -instrumented was a problem, as reducing duplicates turned out to be very, -very difficult with a program that has so many paths and therefore so many -dependencies. A lot of strategies were implemented - and failed. -And then sat solvers were tried, but with over 10.000 variables that turned -out to be a dead-end too. +This was originally envisioned by hexcoder- in Summer 2019. However, we saw no +way to create a pass that is run at link time - although there is an option for +this in the PassManager: EP_FullLinkTimeOptimizationLast. ("Fun" info - nobody +knows what this is doing. And the developer who implemented this didn't respond +to emails.) + +In December then came the idea to implement this as a pass that is run via the +llvm "opt" program, which is performed via our own linker that afterwards calls +the real linker. This was first implemented in January and worked ... kinda. The +LTO time instrumentation worked; however, "how" the basic blocks were +instrumented was a problem, as reducing duplicates turned out to be very, very +difficult with a program that has so many paths and therefore so many +dependencies. A lot of strategies were implemented - and failed. And then SAT +solvers were tried, but with over 10,000 variables that turned out to be a +dead-end too. The final idea to solve this came from domenukk who proposed to insert a block -into an edge and then just use incremental counters ... and this worked!
After +some trial and error implementing this, vanhauser-thc found out that there is +actually an llvm function for this: SplitEdge() :-) -Still more problems came up though as this only works without bugs from -llvm 9 onwards, and with high optimization the link optimization ruins -the instrumented control flow graph. +Still more problems came up, though, as this only works without bugs from llvm 9 +onwards, and with high optimization the link optimization ruins the instrumented +control flow graph. -This is all now fixed with llvm 11+. The llvm's own linker is now able to -load passes and this bypasses all problems we had. +This is all now fixed with llvm 11+. LLVM's own linker is now able to load +passes and this bypasses all problems we had. -Happy end :) +Happy end :) \ No newline at end of file diff --git a/instrumentation/README.persistent_mode.md b/instrumentation/README.persistent_mode.md index e9d2a523..d0ccba8c 100644 --- a/instrumentation/README.persistent_mode.md +++ b/instrumentation/README.persistent_mode.md @@ -132,7 +132,7 @@ and you should be all set! Some libraries provide APIs that are stateless, or whose state can be reset in between processing different input files. When such a reset is performed, a single long-lived process can be reused to try out multiple test cases, -eliminating the need for repeated fork() calls and the associated OS overhead. +eliminating the need for repeated `fork()` calls and the associated OS overhead.
The basic structure of the program that does this would be: -- cgit 1.4.1 From 4f1310db5171ed52660d664551005f305f22a29d Mon Sep 17 00:00:00 2001 From: llzmb <46303940+llzmb@users.noreply.github.com> Date: Wed, 24 Nov 2021 13:30:00 +0100 Subject: Edit instrumentation READMEs --- instrumentation/README.gcc_plugin.md | 84 ++----------------------------- instrumentation/README.persistent_mode.md | 7 ++- 2 files changed, 7 insertions(+), 84 deletions(-) (limited to 'instrumentation/README.persistent_mode.md') diff --git a/instrumentation/README.gcc_plugin.md b/instrumentation/README.gcc_plugin.md index 33cf1c33..f251415b 100644 --- a/instrumentation/README.gcc_plugin.md +++ b/instrumentation/README.gcc_plugin.md @@ -87,89 +87,13 @@ reports to afl@aflplus.plus. ## 4) Bonus feature #1: deferred initialization -AFL++ tries to optimize performance by executing the targeted binary just once, -stopping it just before `main()`, and then cloning this "main" process to get a -steady supply of targets to fuzz. - -Although this approach eliminates much of the OS-, linker- and libc-level costs -of executing the program, it does not always help with binaries that perform -other time-consuming initialization steps - say, parsing a large config file -before getting to the fuzzed data. - -In such cases, it's beneficial to initialize the forkserver a bit later, once -most of the initialization work is already done, but before the binary attempts -to read the fuzzed input and parse it; in some cases, this can offer a 10x+ -performance gain. You can implement delayed initialization in GCC mode in a -fairly simple way: - -First, locate a suitable location in the code where the delayed cloning can take -place. This needs to be done with *extreme* care to avoid breaking the binary. -In particular, the program will probably malfunction if you select a location -after: - -- The creation of any vital threads or child processes - since the forkserver - can't clone them easily. 
- -- The initialization of timers via `setitimer()` or equivalent calls. - -- The creation of temporary files, network sockets, offset-sensitive file - descriptors, and similar shared-state resources - but only provided that their - state meaningfully influences the behavior of the program later on. - -- Any access to the fuzzed input, including reading the metadata about its size. - -With the location selected, add this code in the appropriate spot: - -```c -#ifdef __AFL_HAVE_MANUAL_CONTROL - __AFL_INIT(); -#endif -``` - -You don't need the #ifdef guards, but they will make the program still work as -usual when compiled with a compiler other than afl-gcc-fast/afl-clang-fast. - -Finally, recompile the program with afl-gcc-fast (afl-gcc or afl-clang will -*not* generate a deferred-initialization binary) - and you should be all set! +See +[README.persistent_mode.md#3) Deferred initialization](README.persistent_mode.md#3-deferred-initialization). ## 5) Bonus feature #2: persistent mode -Some libraries provide APIs that are stateless or whose state can be reset in -between processing different input files. When such a reset is performed, a -single long-lived process can be reused to try out multiple test cases, -eliminating the need for repeated `fork()` calls and the associated OS overhead. - -The basic structure of the program that does this would be: - -```c - while (__AFL_LOOP(1000)) { - - /* Read input data. */ - /* Call library code to be fuzzed. */ - /* Reset state. */ - - } - - /* Exit normally. */ -``` - -The numerical value specified within the loop controls the maximum number of -iterations before AFL++ will restart the process from scratch. This minimizes -the impact of memory leaks and similar glitches; 1000 is a good starting point. - -A more detailed template is shown in ../utils/persistent_mode/. 
Similarly to the -previous mode, the feature works only with afl-gcc-fast or afl-clang-fast; -#ifdef guards can be used to suppress it when using other compilers. - -Note that as with the previous mode, the feature is easy to misuse; if you do -not reset the critical state fully, you may end up with false positives or waste -a whole lot of CPU power doing nothing useful at all. Be particularly wary of -memory leaks and the state of file descriptors. - -When running in this mode, the execution paths will inherently vary a bit -depending on whether the input loop is being entered for the first time or -executed again. To avoid spurious warnings, the feature implies -`AFL_NO_VAR_CHECK` and hides the "variable path" warnings in the UI. +See +[README.persistent_mode.md#4) Persistent mode](README.persistent_mode.md#4-persistent-mode). ## 6) Bonus feature #3: selective instrumentation diff --git a/instrumentation/README.persistent_mode.md b/instrumentation/README.persistent_mode.md index d0ccba8c..14e59f4a 100644 --- a/instrumentation/README.persistent_mode.md +++ b/instrumentation/README.persistent_mode.md @@ -164,10 +164,9 @@ you do not fully reset the critical state, you may end up with false positives or waste a whole lot of CPU power doing nothing useful at all. Be particularly wary of memory leaks and of the state of file descriptors. -PS. Because there are task switches still involved, the mode isn't as fast as -"pure" in-process fuzzing offered, say, by LLVM's LibFuzzer; but it is a lot -faster than the normal `fork()` model, and compared to in-process fuzzing, -should be a lot more robust. +When running in this mode, the execution paths will inherently vary a bit +depending on whether the input loop is being entered for the first time or +executed again. ## 5) Shared memory fuzzing -- cgit 1.4.1