-rw-r--r-- | docs/env_variables.md | 7
-rw-r--r-- | docs/fuzzing_expert.md | 12
-rw-r--r-- | instrumentation/README.ctx.md | 38
-rw-r--r-- | instrumentation/README.llvm.md | 229
-rw-r--r-- | instrumentation/README.ngram.md | 28
5 files changed, 152 insertions, 162 deletions
diff --git a/docs/env_variables.md b/docs/env_variables.md index 65cca0dc..4386c5f8 100644 --- a/docs/env_variables.md +++ b/docs/env_variables.md @@ -171,7 +171,7 @@ config.h to at least 18 and maybe up to 20 for this as otherwise too many map collisions occur. For more information, see -[instrumentation/README.ctx.md](../instrumentation/README.ctx.md). +[instrumentation/README.llvm.md#6) AFL Context Sensitive Branch Coverage](../instrumentation/README.llvm.md#6-afl-context-sensitive-branch-coverage). #### INSTRUMENT LIST (selectively instrument files and functions) @@ -247,7 +247,7 @@ in config.h to at least 18 and maybe up to 20 for this as otherwise too many map collisions occur. For more information, see -[instrumentation/README.ngram.md](../instrumentation/README.ngram.md). +[instrumentation/README.llvm.md#7) AFL N-Gram Branch Coverage](../instrumentation/README.llvm.md#7-afl-n-gram-branch-coverage). #### NOT_ZERO @@ -261,9 +261,6 @@ For more information, see If the target performs only a few loops, then this will give a small performance boost. -For more information, see -[instrumentation/README.neverzero.md](../instrumentation/README.neverzero.md). - #### Thread safe instrumentation counters (in all modes) Setting `AFL_LLVM_THREADSAFE_INST` will inject code that implements thread safe diff --git a/docs/fuzzing_expert.md b/docs/fuzzing_expert.md index 876c5fbb..5945d114 100644 --- a/docs/fuzzing_expert.md +++ b/docs/fuzzing_expert.md @@ -112,12 +112,8 @@ are interested in: There are many more options and modes available however these are most of the time less effective. See: - * [instrumentation/README.ctx.md](../instrumentation/README.ctx.md) - * [instrumentation/README.ngram.md](../instrumentation/README.ngram.md) - -AFL++ performs "never zero" counting in its bitmap. 
You can read more about this -here: - * [instrumentation/README.neverzero.md](../instrumentation/README.neverzero.md) + * [instrumentation/README.llvm.md#6) AFL Context Sensitive Branch Coverage](../instrumentation/README.llvm.md#6-afl-context-sensitive-branch-coverage). + * [instrumentation/README.llvm.md#7) AFL N-Gram Branch Coverage](../instrumentation/README.llvm.md#7-afl-n-gram-branch-coverage) #### c) Sanitizers @@ -247,7 +243,7 @@ For meson you have to set the AFL++ compiler with the very first command! Sometimes cmake and configure do not pick up the AFL++ compiler, or the ranlib/ar that is needed - because this was just not foreseen by the developer -of the target. Or they have non-standard options. Figure out if there is a +of the target. Or they have non-standard options. Figure out if there is a non-standard way to set this, otherwise set up the build normally and edit the generated build environment afterwards manually to point it to the right compiler (and/or ranlib and ar). @@ -337,7 +333,7 @@ Note that this step is rather optional though. #### Done! -The INPUTS_UNIQUE/ directory from step b) - or even better the directory input/ +The INPUTS_UNIQUE/ directory from step b) - or even better the directory input/ if you minimized the corpus in step c) - is the resulting input corpus directory to be used in fuzzing! :-) diff --git a/instrumentation/README.ctx.md b/instrumentation/README.ctx.md deleted file mode 100644 index 335e9921..00000000 --- a/instrumentation/README.ctx.md +++ /dev/null @@ -1,38 +0,0 @@ -# AFL Context Sensitive Branch Coverage - -## What is this? - -This is an LLVM-based implementation of the context sensitive branch coverage. - -Basically every function gets its own ID and, every time when an edge is logged, -all the IDs in the callstack are hashed and combined with the edge transition -hash to augment the classic edge coverage with the information about the -calling context. 
- -So if both function A and function B call a function C, the coverage -collected in C will be different. - -In math the coverage is collected as follows: -`map[current_location_ID ^ previous_location_ID >> 1 ^ hash_callstack_IDs] += 1` - -The callstack hash is produced XOR-ing the function IDs to avoid explosion with -recursive functions. - -## Usage - -Set the `AFL_LLVM_INSTRUMENT=CTX` or `AFL_LLVM_CTX=1` environment variable. - -It is highly recommended to increase the MAP_SIZE_POW2 definition in -config.h to at least 18 and maybe up to 20 for this as otherwise too -many map collisions occur. - -## Caller Branch Coverage - -If the context sensitive coverage introduces too may collisions and becoming -detrimental, the user can choose to augment edge coverage with just the -called function ID, instead of the entire callstack hash. - -In math the coverage is collected as follows: -`map[current_location_ID ^ previous_location_ID >> 1 ^ previous_callee_ID] += 1` - -Set the `AFL_LLVM_INSTRUMENT=CALLER` or `AFL_LLVM_CALLER=1` environment variable. diff --git a/instrumentation/README.llvm.md b/instrumentation/README.llvm.md index dbb604f2..1671f385 100644 --- a/instrumentation/README.llvm.md +++ b/instrumentation/README.llvm.md @@ -1,72 +1,79 @@ # Fast LLVM-based instrumentation for afl-fuzz - (See [../README.md](../README.md) for the general instruction manual.) +For the general instruction manual, see [../README.md](../README.md). - (See [README.gcc_plugin.md](README.gcc_plugin.md) for the GCC-based instrumentation.) +For the GCC-based instrumentation, see +[README.gcc_plugin.md](README.gcc_plugin.md). ## 1) Introduction ! llvm_mode works with llvm versions 3.8 up to 13 ! -The code in this directory allows you to instrument programs for AFL using -true compiler-level instrumentation, instead of the more crude -assembly-level rewriting approach taken by afl-gcc and afl-clang. 
This has -several interesting properties: +The code in this directory allows you to instrument programs for AFL using true +compiler-level instrumentation, instead of the more crude assembly-level +rewriting approach taken by afl-gcc and afl-clang. This has several interesting +properties: - - The compiler can make many optimizations that are hard to pull off when - manually inserting assembly. As a result, some slow, CPU-bound programs will - run up to around 2x faster. +- The compiler can make many optimizations that are hard to pull off when + manually inserting assembly. As a result, some slow, CPU-bound programs will + run up to around 2x faster. - The gains are less pronounced for fast binaries, where the speed is limited - chiefly by the cost of creating new processes. In such cases, the gain will - probably stay within 10%. + The gains are less pronounced for fast binaries, where the speed is limited + chiefly by the cost of creating new processes. In such cases, the gain will + probably stay within 10%. - - The instrumentation is CPU-independent. At least in principle, you should - be able to rely on it to fuzz programs on non-x86 architectures (after - building afl-fuzz with AFL_NO_X86=1). +- The instrumentation is CPU-independent. At least in principle, you should be + able to rely on it to fuzz programs on non-x86 architectures (after building + afl-fuzz with AFL_NO_X86=1). - - The instrumentation can cope a bit better with multi-threaded targets. +- The instrumentation can cope a bit better with multi-threaded targets. - - Because the feature relies on the internals of LLVM, it is clang-specific - and will *not* work with GCC (see ../gcc_plugin/ for an alternative once - it is available). +- Because the feature relies on the internals of LLVM, it is clang-specific and + will *not* work with GCC (see ../gcc_plugin/ for an alternative once it is + available). 
Once this implementation is shown to be sufficiently robust and portable, it will probably replace afl-clang. For now, it can be built separately and co-exists with the original code. -The idea and much of the intial implementation came from Laszlo Szekeres. +The idea and much of the initial implementation came from Laszlo Szekeres. ## 2a) How to use this - short Set the `LLVM_CONFIG` variable to the clang version you want to use, e.g. + ``` LLVM_CONFIG=llvm-config-9 make ``` + In case you have your own compiled llvm version specify the full path: + ``` LLVM_CONFIG=~/llvm-project/build/bin/llvm-config make ``` + If you try to use a new llvm version on an old Linux this can fail because of old c++ libraries. In this case usually switching to gcc/g++ to compile llvm_mode will work: + ``` LLVM_CONFIG=llvm-config-7 REAL_CC=gcc REAL_CXX=g++ make ``` -It is highly recommended to use the newest clang version you can put your -hands on :) + +It is highly recommended to use the newest clang version you can put your hands +on :) Then look at [README.persistent_mode.md](README.persistent_mode.md). ## 2b) How to use this - long In order to leverage this mechanism, you need to have clang installed on your -system. You should also make sure that the llvm-config tool is in your path -(or pointed to via LLVM_CONFIG in the environment). +system. You should also make sure that the llvm-config tool is in your path (or +pointed to via LLVM_CONFIG in the environment). 
-Note that if you have several LLVM versions installed, pointing LLVM_CONFIG -to the version you want to use will switch compiling to this specific -version - if you installation is set up correctly :-) +Note that if you have several LLVM versions installed, pointing LLVM_CONFIG to +the version you want to use will switch compiling to this specific version - if +you installation is set up correctly :-) Unfortunately, some systems that do have clang come without llvm-config or the LLVM development headers; one example of this is FreeBSD. FreeBSD users will @@ -75,15 +82,15 @@ load modules (you'll see "Service unavailable" when loading afl-llvm-pass.so). To solve all your problems, you can grab pre-built binaries for your OS from: - https://llvm.org/releases/download.html +[https://llvm.org/releases/download.html](https://llvm.org/releases/download.html) ...and then put the bin/ directory from the tarball at the beginning of your $PATH when compiling the feature and building packages later on. You don't need to be root for that. -To build the instrumentation itself, type 'make'. This will generate binaries -called afl-clang-fast and afl-clang-fast++ in the parent directory. Once this -is done, you can instrument third-party code in a way similar to the standard +To build the instrumentation itself, type `make`. This will generate binaries +called afl-clang-fast and afl-clang-fast++ in the parent directory. Once this is +done, you can instrument third-party code in a way similar to the standard operating mode of AFL, e.g.: ``` @@ -93,81 +100,137 @@ operating mode of AFL, e.g.: Be sure to also include CXX set to afl-clang-fast++ for C++ code. -Note that afl-clang-fast/afl-clang-fast++ are just pointers to afl-cc. -You can also use afl-cc/afl-c++ and instead direct it to use LLVM -instrumentation by either setting `AFL_CC_COMPILER=LLVM` or pass the parameter -`--afl-llvm` via CFLAGS/CXXFLAGS/CPPFLAGS. 
+Note that afl-clang-fast/afl-clang-fast++ are just pointers to afl-cc. You can +also use afl-cc/afl-c++ and instead direct it to use LLVM instrumentation by +either setting `AFL_CC_COMPILER=LLVM` or pass the parameter `--afl-llvm` via +CFLAGS/CXXFLAGS/CPPFLAGS. The tool honors roughly the same environmental variables as afl-gcc (see [docs/env_variables.md](../docs/env_variables.md)). This includes AFL_USE_ASAN, -AFL_HARDEN, and AFL_DONT_OPTIMIZE. However AFL_INST_RATIO is not honored -as it does not serve a good purpose with the more effective PCGUARD analysis. +AFL_HARDEN, and AFL_DONT_OPTIMIZE. However AFL_INST_RATIO is not honored as it +does not serve a good purpose with the more effective PCGUARD analysis. ## 3) Options -Several options are present to make llvm_mode faster or help it rearrange -the code to make afl-fuzz path discovery easier. +Several options are present to make llvm_mode faster or help it rearrange the +code to make afl-fuzz path discovery easier. -If you need just to instrument specific parts of the code, you can the instrument file list -which C/C++ files to actually instrument. See [README.instrument_list.md](README.instrument_list.md) +If you need just to instrument specific parts of the code, you can the +instrument file list which C/C++ files to actually instrument. See +[README.instrument_list.md](README.instrument_list.md) -For splitting memcmp, strncmp, etc. please see [README.laf-intel.md](README.laf-intel.md) +For splitting memcmp, strncmp, etc. please see +[README.laf-intel.md](README.laf-intel.md) Then there are different ways of instrumenting the target: -1. An better instrumentation strategy uses LTO and link time -instrumentation. Note that not all targets can compile in this mode, however -if it works it is the best option you can use. -Simply use afl-clang-lto/afl-clang-lto++ to use this option. -See [README.lto.md](README.lto.md) +1. An better instrumentation strategy uses LTO and link time instrumentation. 
+ Note that not all targets can compile in this mode, however if it works it is + the best option you can use. Simply use afl-clang-lto/afl-clang-lto++ to use + this option. See [README.lto.md](README.lto.md). -2. Alternativly you can choose a completely different coverage method: +2. Alternatively you can choose a completely different coverage method: -2a. N-GRAM coverage - which combines the previous visited edges with the -current one. This explodes the map but on the other hand has proven to be -effective for fuzzing. -See [README.ngram.md](README.ngram.md) +2a. N-GRAM coverage - which combines the previous visited edges with the current + one. This explodes the map but on the other hand has proven to be effective + for fuzzing. See + [7) AFL N-Gram Branch Coverage](#7-afl-n-gram-branch-coverage). 2b. Context sensitive coverage - which combines the visited edges with an -individual caller ID (the function that called the current one) -[README.ctx.md](README.ctx.md) + individual caller ID (the function that called the current one). See + [6) AFL Context Sensitive Branch Coverage](#6-afl-context-sensitive-branch-coverage). -Then - additionally to one of the instrumentation options above - there is -a very effective new instrumentation option called CmpLog as an alternative to -laf-intel that allow AFL++ to apply mutations similar to Redqueen. -See [README.cmplog.md](README.cmplog.md) +Then - additionally to one of the instrumentation options above - there is a +very effective new instrumentation option called CmpLog as an alternative to +laf-intel that allow AFL++ to apply mutations similar to Redqueen. See +[README.cmplog.md](README.cmplog.md). -Finally if your llvm version is 8 or lower, you can activate a mode that -prevents that a counter overflow result in a 0 value. This is good for -path discovery, but the llvm implementation for x86 for this functionality -is not optimal and was only fixed in llvm 9. 
-You can set this with AFL_LLVM_NOT_ZERO=1 -See [README.neverzero.md](README.neverzero.md) +Finally, if your llvm version is 8 or lower, you can activate a mode that +prevents that a counter overflow result in a 0 value. This is good for path +discovery, but the llvm implementation for x86 for this functionality is not +optimal and was only fixed in llvm 9. You can set this with AFL_LLVM_NOT_ZERO=1. -Support for thread safe counters has been added for all modes. -Activate it with `AFL_LLVM_THREADSAFE_INST=1`. The tradeoff is better precision -in multi threaded apps for a slightly higher instrumentation overhead. -This also disables the nozero counter default for performance reasons. +Support for thread safe counters has been added for all modes. Activate it with +`AFL_LLVM_THREADSAFE_INST=1`. The tradeoff is better precision in multi threaded +apps for a slightly higher instrumentation overhead. This also disables the +nozero counter default for performance reasons. -## 4) Snapshot feature +## 4) deferred initialization, persistent mode, shared memory fuzzing -To speed up fuzzing you can use a linux loadable kernel module which enables -a snapshot feature. -See [README.snapshot.md](README.snapshot.md) +This is the most powerful and effective fuzzing you can do. Please see +[README.persistent_mode.md](README.persistent_mode.md) for a full explanation. -## 5) Gotchas, feedback, bugs +## 5) Bonus feature: 'dict2file' pass -This is an early-stage mechanism, so field reports are welcome. You can send bug -reports to <afl-users@googlegroups.com>. +Just specify `AFL_LLVM_DICT2FILE=/absolute/path/file.txt` and during compilation +all constant string compare parameters will be written to this file to be used +with afl-fuzz' `-x` option. -## 6) deferred initialization, persistent mode, shared memory fuzzing +## 6) AFL Context Sensitive Branch Coverage -This is the most powerful and effective fuzzing you can do. 
-Please see [README.persistent_mode.md](README.persistent_mode.md) for a -full explanation. +### What is this? -## 7) Bonus feature: 'dict2file' pass +This is an LLVM-based implementation of the context sensitive branch coverage. -Just specify `AFL_LLVM_DICT2FILE=/absolute/path/file.txt` and during compilation -all constant string compare parameters will be written to this file to be -used with afl-fuzz' `-x` option. +Basically every function gets its own ID and, every time when an edge is logged, +all the IDs in the callstack are hashed and combined with the edge transition +hash to augment the classic edge coverage with the information about the calling +context. + +So if both function A and function B call a function C, the coverage collected +in C will be different. + +In math the coverage is collected as follows: `map[current_location_ID ^ +previous_location_ID >> 1 ^ hash_callstack_IDs] += 1` + +The callstack hash is produced XOR-ing the function IDs to avoid explosion with +recursive functions. + +### Usage + +Set the `AFL_LLVM_INSTRUMENT=CTX` or `AFL_LLVM_CTX=1` environment variable. + +It is highly recommended to increase the MAP_SIZE_POW2 definition in config.h to +at least 18 and maybe up to 20 for this as otherwise too many map collisions +occur. + +### Caller Branch Coverage + +If the context sensitive coverage introduces too may collisions and becoming +detrimental, the user can choose to augment edge coverage with just the called +function ID, instead of the entire callstack hash. + +In math the coverage is collected as follows: `map[current_location_ID ^ +previous_location_ID >> 1 ^ previous_callee_ID] += 1` + +Set the `AFL_LLVM_INSTRUMENT=CALLER` or `AFL_LLVM_CALLER=1` environment +variable. 
+ +## 7) AFL N-Gram Branch Coverage + +### Source + +This is an LLVM-based implementation of the n-gram branch coverage proposed in +the paper +["Be Sensitive and Collaborative: Analyzing Impact of Coverage Metrics in Greybox Fuzzing"](https://www.usenix.org/system/files/raid2019-wang-jinghan.pdf) +by Jinghan Wang, et. al. + +Note that the original implementation (available +[here](https://github.com/bitsecurerlab/afl-sensitive)) is built on top of AFL's +qemu_mode. This is essentially a port that uses LLVM vectorized instructions +(available from llvm versions 4.0.1 and higher) to achieve the same results when +compiling source code. + +In math the branch coverage is performed as follows: `map[current_location ^ +prev_location[0] >> 1 ^ prev_location[1] >> 1 ^ ... up to n-1`] += 1` + +### Usage + +The size of `n` (i.e., the number of branches to remember) is an option that is +specified either in the `AFL_LLVM_INSTRUMENT=NGRAM-{value}` or the +`AFL_LLVM_NGRAM_SIZE` environment variable. Good values are 2, 4, or 8, valid +are 2-16. + +It is highly recommended to increase the MAP_SIZE_POW2 definition in config.h to +at least 18 and maybe up to 20 for this as otherwise too many map collisions +occur. \ No newline at end of file diff --git a/instrumentation/README.ngram.md b/instrumentation/README.ngram.md deleted file mode 100644 index da61ef32..00000000 --- a/instrumentation/README.ngram.md +++ /dev/null @@ -1,28 +0,0 @@ -# AFL N-Gram Branch Coverage - -## Source - -This is an LLVM-based implementation of the n-gram branch coverage proposed in -the paper ["Be Sensitive and Collaborative: Analzying Impact of Coverage Metrics -in Greybox Fuzzing"](https://www.usenix.org/system/files/raid2019-wang-jinghan.pdf), -by Jinghan Wang, et. al. - -Note that the original implementation (available -[here](https://github.com/bitsecurerlab/afl-sensitive)) -is built on top of AFL's QEMU mode. 
-This is essentially a port that uses LLVM vectorized instructions (available from
-llvm versions 4.0.1 and higher) to achieve the same results when compiling source code.
-
-In math the branch coverage is performed as follows:
-`map[current_location ^ prev_location[0] >> 1 ^ prev_location[1] >> 1 ^ ... up to n-1`] += 1`
-
-## Usage
-
-The size of `n` (i.e., the number of branches to remember) is an option
-that is specified either in the `AFL_LLVM_INSTRUMENT=NGRAM-{value}` or the
-`AFL_LLVM_NGRAM_SIZE` environment variable.
-Good values are 2, 4 or 8, valid are 2-16.
-
-It is highly recommended to increase the MAP_SIZE_POW2 definition in
-config.h to at least 18 and maybe up to 20 for this as otherwise too
-many map collisions occur.
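
Reviewer note: the three "In math" formulas that this patch moves into README.llvm.md (context sensitive, caller, and n-gram coverage) can be compared side by side with a small model. The following is a minimal Python sketch, not the code the LLVM pass emits. The function names, the `MAP_SIZE_POW2 = 16` value, and the final reduction modulo the map size are assumptions made for illustration; only the XOR and shift structure is taken from the README text.

```python
# Illustrative model of the index formulas quoted in the READMEs.
# All names below are hypothetical; this is not AFL++ source code.
from functools import reduce

MAP_SIZE_POW2 = 16            # assumed for this sketch; the README recommends 18 to 20
MAP_SIZE = 1 << MAP_SIZE_POW2


def callstack_hash(function_ids):
    """XOR every function ID on the call stack into a single value."""
    return reduce(lambda acc, fid: acc ^ fid, function_ids, 0)


def ctx_index(cur_loc, prev_loc, function_ids):
    """current_location_ID ^ previous_location_ID >> 1 ^ hash_callstack_IDs"""
    # The modulo is an assumption to keep the index inside the map.
    return (cur_loc ^ (prev_loc >> 1) ^ callstack_hash(function_ids)) % MAP_SIZE


def caller_index(cur_loc, prev_loc, callee_id):
    """current_location_ID ^ previous_location_ID >> 1 ^ previous_callee_ID"""
    return (cur_loc ^ (prev_loc >> 1) ^ callee_id) % MAP_SIZE


def ngram_index(cur_loc, prev_locs):
    """current_location ^ prev_location[0] >> 1 ^ prev_location[1] >> 1 ^ ..."""
    idx = cur_loc
    for prev in prev_locs:    # prev_locs holds the remembered previous locations
        idx ^= prev >> 1
    return idx % MAP_SIZE


if __name__ == "__main__":
    A, B, C = 0xA, 0xB, 0xC   # made-up function IDs
    cur, prev = 0x1234, 0x0F0F
    # Same edge inside C, reached from A or from B: different map slots.
    print(ctx_index(cur, prev, [A, C]) != ctx_index(cur, prev, [B, C]))
    # Recursion in C: XOR cancels pairs, so the hash does not keep growing.
    print(ctx_index(cur, prev, [A, C, C, C]) == ctx_index(cur, prev, [A, C]))
```

In this model, the same edge in C gets a different slot depending on whether A or B called it, which matches the README's description of calling context. Because XOR cancels repeated IDs, a recursive function contributes at most two distinct hash values regardless of depth, which is one way to read the README's remark about avoiding explosion with recursive functions.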