From 437efe795aa251d3beede18e2efd2f584884f060 Mon Sep 17 00:00:00 2001 From: Andrea Fioraldi Date: Sat, 1 Feb 2020 20:20:41 +0100 Subject: adjust a bit readmes --- docs/ChangeLog | 2 +- gcc_plugin/README.gcc.md | 160 ----------------------------------------- gcc_plugin/README.md | 158 ++++++++++++++++++++++++++++++++++++++++ libtokencap/README.md | 64 +++++++++++++++++ libtokencap/README.tokencap.md | 64 ----------------- testcases/README.md | 17 +++++ testcases/README.testcases | 19 ----- 7 files changed, 240 insertions(+), 244 deletions(-) delete mode 100644 gcc_plugin/README.gcc.md create mode 100644 gcc_plugin/README.md create mode 100644 libtokencap/README.md delete mode 100644 libtokencap/README.tokencap.md create mode 100644 testcases/README.md delete mode 100644 testcases/README.testcases diff --git a/docs/ChangeLog b/docs/ChangeLog index aef3ae7c..a559f6f2 100644 --- a/docs/ChangeLog +++ b/docs/ChangeLog @@ -34,7 +34,7 @@ Version ++2.60d (develop): the original script is still present as afl-cmin.bash - added blacklist and whitelisting function check in all modules of llvm_mode - added fix from Debian project to compile libdislocator and libtokencap - - libdislocator: AFL_ALIGNED_ALLOC to force size alignment to sizeof(void*) + - libdislocator: AFL_ALIGNED_ALLOC to force size alignment to max_align_t -------------------------- diff --git a/gcc_plugin/README.gcc.md b/gcc_plugin/README.gcc.md deleted file mode 100644 index 80fccfb6..00000000 --- a/gcc_plugin/README.gcc.md +++ /dev/null @@ -1,160 +0,0 @@ -=========================================== -GCC-based instrumentation for afl-fuzz -====================================== - - (See ../docs/README.md for the general instruction manual.) - (See ../llvm_mode/README.md for the LLVM-based instrumentation.) - -!!! TODO items are: -!!! => inline instrumentation has to work! -!!! - - -## 1) Introduction - -The code in this directory allows you to instrument programs for AFL using -true compiler-level instrumentation, instead of the more crude -assembly-level rewriting approach taken by afl-gcc and afl-clang. This has -several interesting properties: - - - The compiler can make many optimizations that are hard to pull off when - manually inserting assembly. As a result, some slow, CPU-bound programs will - run up to around faster. - - The gains are less pronounced for fast binaries, where the speed is limited - chiefly by the cost of creating new processes. In such cases, the gain will - probably stay within 10%. - - - The instrumentation is CPU-independent. At least in principle, you should - be able to rely on it to fuzz programs on non-x86 architectures (after - building afl-fuzz with AFL_NOX86=1). - - - Because the feature relies on the internals of GCC, it is gcc-specific - and will *not* work with LLVM (see ../llvm_mode for an alternative). - -Once this implementation is shown to be sufficiently robust and portable, it -will probably replace afl-gcc. For now, it can be built separately and -co-exists with the original code. - -The idea and much of the implementation comes from Laszlo Szekeres. - -## 2) How to use - -In order to leverage this mechanism, you need to have modern enough GCC -(>= version 4.5.0) and the plugin headers installed on your system. That -should be all you need. On Debian machines, these headers can be acquired by -installing the `gcc--plugin-dev` packages. - -To build the instrumentation itself, type 'make'. This will generate binaries -called afl-gcc-fast and afl-g++-fast in the parent directory. -If the CC/CXX have been overridden, those compilers will be used from -those wrappers without using AFL_CXX/AFL_CC settings. -Once this is done, you can instrument third-party code in a way similar to the -standard operating mode of AFL, e.g.: - - CC=/path/to/afl/afl-gcc-fast ./configure [...options...] - make - -Be sure to also include CXX set to afl-g++-fast for C++ code. - -The tool honors roughly the same environmental variables as afl-gcc (see -../docs/env_variables.txt). This includes AFL_INST_RATIO, AFL_USE_ASAN, -AFL_HARDEN, and AFL_DONT_OPTIMIZE. - -Note: if you want the GCC plugin to be installed on your system for all -users, you need to build it before issuing 'make install' in the parent -directory. - -## 3) Gotchas, feedback, bugs - -This is an early-stage mechanism, so field reports are welcome. You can send bug -reports to . - -## 4) Bonus feature #1: deferred initialization - -AFL tries to optimize performance by executing the targeted binary just once, -stopping it just before main(), and then cloning this "master" process to get -a steady supply of targets to fuzz. - -Although this approach eliminates much of the OS-, linker- and libc-level -costs of executing the program, it does not always help with binaries that -perform other time-consuming initialization steps - say, parsing a large config -file before getting to the fuzzed data. - -In such cases, it's beneficial to initialize the forkserver a bit later, once -most of the initialization work is already done, but before the binary attempts -to read the fuzzed input and parse it; in some cases, this can offer a 10x+ -performance gain. You can implement delayed initialization in LLVM mode in a -fairly simple way. - -First, locate a suitable location in the code where the delayed cloning can -take place. This needs to be done with *extreme* care to avoid breaking the -binary. In particular, the program will probably malfunction if you select -a location after: - - - The creation of any vital threads or child processes - since the forkserver - can't clone them easily. - - - The initialization of timers via setitimer() or equivalent calls. - - - The creation of temporary files, network sockets, offset-sensitive file - descriptors, and similar shared-state resources - but only provided that - their state meaningfully influences the behavior of the program later on. - - - Any access to the fuzzed input, including reading the metadata about its - size. - -With the location selected, add this code in the appropriate spot: - -``` -#ifdef __AFL_HAVE_MANUAL_CONTROL - __AFL_INIT(); -#endif -``` - -You don't need the #ifdef guards, but they will make the program still work as -usual when compiled with a tool other than afl-gcc-fast/afl-clang-fast. - -Finally, recompile the program with afl-gcc-fast (afl-gcc or afl-clang will -*not* generate a deferred-initialization binary) - and you should be all set! - -## 5) Bonus feature #2: persistent mode - -Some libraries provide APIs that are stateless, or whose state can be reset in -between processing different input files. When such a reset is performed, a -single long-lived process can be reused to try out multiple test cases, -eliminating the need for repeated fork() calls and the associated OS overhead. - -The basic structure of the program that does this would be: - -``` - while (__AFL_LOOP(1000)) { - - /* Read input data. */ - /* Call library code to be fuzzed. */ - /* Reset state. */ - - } - - /* Exit normally */ -``` - -The numerical value specified within the loop controls the maximum number -of iterations before AFL will restart the process from scratch. This minimizes -the impact of memory leaks and similar glitches; 1000 is a good starting point. - -A more detailed template is shown in ../experimental/persistent_demo/. -Similarly to the previous mode, the feature works only with afl-gcc-fast or -afl-clang-fast; #ifdef guards can be used to suppress it when using other -compilers. - -Note that as with the previous mode, the feature is easy to misuse; if you -do not reset the critical state fully, you may end up with false positives or -waste a whole lot of CPU power doing nothing useful at all. Be particularly -wary of memory leaks and the state of file descriptors. - -When running in this mode, the execution paths will inherently vary a bit -depending on whether the input loop is being entered for the first time or -executed again. To avoid spurious warnings, the feature implies -AFL_NO_VAR_CHECK and hides the "variable path" warnings in the UI. - diff --git a/gcc_plugin/README.md b/gcc_plugin/README.md new file mode 100644 index 00000000..8b944f1a --- /dev/null +++ b/gcc_plugin/README.md @@ -0,0 +1,158 @@ +# GCC-based instrumentation for afl-fuzz + + (See [../README.md](../README.md) for the general instruction manual.) + (See [../llvm_mode/README.md](../llvm_mode/README.md) for the LLVM-based instrumentation.) + +!!! TODO items are: +!!! => inline instrumentation has to work! +!!! + + +## 1) Introduction + +The code in this directory allows you to instrument programs for AFL using +true compiler-level instrumentation, instead of the more crude +assembly-level rewriting approach taken by afl-gcc and afl-clang. This has +several interesting properties: + + - The compiler can make many optimizations that are hard to pull off when + manually inserting assembly. As a result, some slow, CPU-bound programs will + run up to around faster. + + The gains are less pronounced for fast binaries, where the speed is limited + chiefly by the cost of creating new processes. In such cases, the gain will + probably stay within 10%. + + - The instrumentation is CPU-independent. At least in principle, you should + be able to rely on it to fuzz programs on non-x86 architectures (after + building afl-fuzz with AFL_NOX86=1). + + - Because the feature relies on the internals of GCC, it is gcc-specific + and will *not* work with LLVM (see ../llvm_mode for an alternative). + +Once this implementation is shown to be sufficiently robust and portable, it +will probably replace afl-gcc. For now, it can be built separately and +co-exists with the original code. + +The idea and much of the implementation comes from Laszlo Szekeres. + +## 2) How to use + +In order to leverage this mechanism, you need to have modern enough GCC +(>= version 4.5.0) and the plugin headers installed on your system. That +should be all you need. On Debian machines, these headers can be acquired by +installing the `gcc--plugin-dev` packages. + +To build the instrumentation itself, type 'make'. This will generate binaries +called afl-gcc-fast and afl-g++-fast in the parent directory. +If the CC/CXX have been overridden, those compilers will be used from +those wrappers without using AFL_CXX/AFL_CC settings. +Once this is done, you can instrument third-party code in a way similar to the +standard operating mode of AFL, e.g.: + + CC=/path/to/afl/afl-gcc-fast ./configure [...options...] + make + +Be sure to also include CXX set to afl-g++-fast for C++ code. + +The tool honors roughly the same environmental variables as afl-gcc (see +../docs/env_variables.txt). This includes AFL_INST_RATIO, AFL_USE_ASAN, +AFL_HARDEN, and AFL_DONT_OPTIMIZE. + +Note: if you want the GCC plugin to be installed on your system for all +users, you need to build it before issuing 'make install' in the parent +directory. + +## 3) Gotchas, feedback, bugs + +This is an early-stage mechanism, so field reports are welcome. You can send bug +reports to . + +## 4) Bonus feature #1: deferred initialization + +AFL tries to optimize performance by executing the targeted binary just once, +stopping it just before main(), and then cloning this "master" process to get +a steady supply of targets to fuzz. + +Although this approach eliminates much of the OS-, linker- and libc-level +costs of executing the program, it does not always help with binaries that +perform other time-consuming initialization steps - say, parsing a large config +file before getting to the fuzzed data. + +In such cases, it's beneficial to initialize the forkserver a bit later, once +most of the initialization work is already done, but before the binary attempts +to read the fuzzed input and parse it; in some cases, this can offer a 10x+ +performance gain. You can implement delayed initialization in LLVM mode in a +fairly simple way. + +First, locate a suitable location in the code where the delayed cloning can +take place. This needs to be done with *extreme* care to avoid breaking the +binary. In particular, the program will probably malfunction if you select +a location after: + + - The creation of any vital threads or child processes - since the forkserver + can't clone them easily. + + - The initialization of timers via setitimer() or equivalent calls. + + - The creation of temporary files, network sockets, offset-sensitive file + descriptors, and similar shared-state resources - but only provided that + their state meaningfully influences the behavior of the program later on. + + - Any access to the fuzzed input, including reading the metadata about its + size. + +With the location selected, add this code in the appropriate spot: + +``` +#ifdef __AFL_HAVE_MANUAL_CONTROL + __AFL_INIT(); +#endif +``` + +You don't need the #ifdef guards, but they will make the program still work as +usual when compiled with a tool other than afl-gcc-fast/afl-clang-fast. + +Finally, recompile the program with afl-gcc-fast (afl-gcc or afl-clang will +*not* generate a deferred-initialization binary) - and you should be all set! + +## 5) Bonus feature #2: persistent mode + +Some libraries provide APIs that are stateless, or whose state can be reset in +between processing different input files. When such a reset is performed, a +single long-lived process can be reused to try out multiple test cases, +eliminating the need for repeated fork() calls and the associated OS overhead. + +The basic structure of the program that does this would be: + +``` + while (__AFL_LOOP(1000)) { + + /* Read input data. */ + /* Call library code to be fuzzed. */ + /* Reset state. */ + + } + + /* Exit normally */ +``` + +The numerical value specified within the loop controls the maximum number +of iterations before AFL will restart the process from scratch. This minimizes +the impact of memory leaks and similar glitches; 1000 is a good starting point. + +A more detailed template is shown in ../experimental/persistent_demo/. +Similarly to the previous mode, the feature works only with afl-gcc-fast or +afl-clang-fast; #ifdef guards can be used to suppress it when using other +compilers. + +Note that as with the previous mode, the feature is easy to misuse; if you +do not reset the critical state fully, you may end up with false positives or +waste a whole lot of CPU power doing nothing useful at all. Be particularly +wary of memory leaks and the state of file descriptors. + +When running in this mode, the execution paths will inherently vary a bit +depending on whether the input loop is being entered for the first time or +executed again. To avoid spurious warnings, the feature implies +AFL_NO_VAR_CHECK and hides the "variable path" warnings in the UI. + diff --git a/libtokencap/README.md b/libtokencap/README.md new file mode 100644 index 00000000..8aae38bf --- /dev/null +++ b/libtokencap/README.md @@ -0,0 +1,64 @@ +# strcmp() / memcmp() token capture library + + (See ../docs/README for the general instruction manual.) + +This companion library allows you to instrument `strcmp()`, `memcmp()`, +and related functions to automatically extract syntax tokens passed to any of +these libcalls. The resulting list of tokens may be then given as a starting +dictionary to afl-fuzz (the -x option) to improve coverage on subsequent +fuzzing runs. + +This may help improving coverage in some targets, and do precisely nothing in +others. In some cases, it may even make things worse: if libtokencap picks up +syntax tokens that are not used to process the input data, but that are a part +of - say - parsing a config file... well, you're going to end up wasting a lot +of CPU time on trying them out in the input stream. In other words, use this +feature with care. Manually screening the resulting dictionary is almost +always a necessity. + +As for the actual operation: the library stores tokens, without any deduping, +by appending them to a file specified via AFL_TOKEN_FILE. If the variable is not +set, the tool uses stderr (which is probably not what you want). + +Similarly to afl-tmin, the library is not "proprietary" and can be used with +other fuzzers or testing tools without the need for any code tweaks. It does not +require AFL-instrumented binaries to work. + +To use the library, you *need* to make sure that your fuzzing target is compiled +with -fno-builtin and is linked dynamically. If you wish to automate the first +part without mucking with CFLAGS in Makefiles, you can set AFL_NO_BUILTIN=1 +when using afl-gcc. This setting specifically adds the following flags: + +``` + -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp + -fno-builtin-strcasencmp -fno-builtin-memcmp -fno-builtin-strstr + -fno-builtin-strcasestr +``` + +The next step is simply loading this library via LD_PRELOAD. The optimal usage +pattern is to allow afl-fuzz to fuzz normally for a while and build up a corpus, +and then fire off the target binary, with libtokencap.so loaded, on every file +found by AFL in that earlier run. This demonstrates the basic principle: + +``` + export AFL_TOKEN_FILE=$PWD/temp_output.txt + + for i in /queue/id*; do + LD_PRELOAD=/path/to/libtokencap.so \ + /path/to/target/program [...params, including $i...] + done + + sort -u temp_output.txt >afl_dictionary.txt +``` + +If you don't get any results, the target library is probably not using strcmp() +and memcmp() to parse input; or you haven't compiled it with -fno-builtin; or +the whole thing isn't dynamically linked, and LD_PRELOAD is having no effect. + +Portability hints: There is probably no particularly portable and non-invasive +way to distinguish between read-only and read-write memory mappings. +The `__tokencap_load_mappings()` function is the only thing that would +need to be changed for other OSes. + +Current supported OSes are: Linux, Darwin, FreeBSD (thanks to @devnexen) + diff --git a/libtokencap/README.tokencap.md b/libtokencap/README.tokencap.md deleted file mode 100644 index 8aae38bf..00000000 --- a/libtokencap/README.tokencap.md +++ /dev/null @@ -1,64 +0,0 @@ -# strcmp() / memcmp() token capture library - - (See ../docs/README for the general instruction manual.) - -This companion library allows you to instrument `strcmp()`, `memcmp()`, -and related functions to automatically extract syntax tokens passed to any of -these libcalls. The resulting list of tokens may be then given as a starting -dictionary to afl-fuzz (the -x option) to improve coverage on subsequent -fuzzing runs. - -This may help improving coverage in some targets, and do precisely nothing in -others. In some cases, it may even make things worse: if libtokencap picks up -syntax tokens that are not used to process the input data, but that are a part -of - say - parsing a config file... well, you're going to end up wasting a lot -of CPU time on trying them out in the input stream. In other words, use this -feature with care. Manually screening the resulting dictionary is almost -always a necessity. - -As for the actual operation: the library stores tokens, without any deduping, -by appending them to a file specified via AFL_TOKEN_FILE. If the variable is not -set, the tool uses stderr (which is probably not what you want). - -Similarly to afl-tmin, the library is not "proprietary" and can be used with -other fuzzers or testing tools without the need for any code tweaks. It does not -require AFL-instrumented binaries to work. - -To use the library, you *need* to make sure that your fuzzing target is compiled -with -fno-builtin and is linked dynamically. If you wish to automate the first -part without mucking with CFLAGS in Makefiles, you can set AFL_NO_BUILTIN=1 -when using afl-gcc. This setting specifically adds the following flags: - -``` - -fno-builtin-strcmp -fno-builtin-strncmp -fno-builtin-strcasecmp - -fno-builtin-strcasencmp -fno-builtin-memcmp -fno-builtin-strstr - -fno-builtin-strcasestr -``` - -The next step is simply loading this library via LD_PRELOAD. The optimal usage -pattern is to allow afl-fuzz to fuzz normally for a while and build up a corpus, -and then fire off the target binary, with libtokencap.so loaded, on every file -found by AFL in that earlier run. This demonstrates the basic principle: - -``` - export AFL_TOKEN_FILE=$PWD/temp_output.txt - - for i in /queue/id*; do - LD_PRELOAD=/path/to/libtokencap.so \ - /path/to/target/program [...params, including $i...] - done - - sort -u temp_output.txt >afl_dictionary.txt -``` - -If you don't get any results, the target library is probably not using strcmp() -and memcmp() to parse input; or you haven't compiled it with -fno-builtin; or -the whole thing isn't dynamically linked, and LD_PRELOAD is having no effect. - -Portability hints: There is probably no particularly portable and non-invasive -way to distinguish between read-only and read-write memory mappings. -The `__tokencap_load_mappings()` function is the only thing that would -need to be changed for other OSes. - -Current supported OSes are: Linux, Darwin, FreeBSD (thanks to @devnexen) - diff --git a/testcases/README.md b/testcases/README.md new file mode 100644 index 00000000..ef38d3c4 --- /dev/null +++ b/testcases/README.md @@ -0,0 +1,17 @@ +# AFL starting test cases + + (See [../README.md](../README.md) for the general instruction manual.) + +The archives/, images/, multimedia/, and others/ subdirectories contain small, +standalone files that can be used to seed afl-fuzz when testing parsers for a +variety of common data formats. + +There is probably not much to be said about these files, except that they were +optimized for size and stripped of any non-essential fluff. Some directories +contain several examples that exercise various features of the underlying format. +For example, there is a PNG file with and without a color profile. + +Additional test cases are always welcome. + +In addition to well-chosen starting files, many fuzzing jobs benefit from a +small and concise dictionary. See [../dictionaries/README.md](../dictionaries/README.md) for more. diff --git a/testcases/README.testcases b/testcases/README.testcases deleted file mode 100644 index 30110ba1..00000000 --- a/testcases/README.testcases +++ /dev/null @@ -1,19 +0,0 @@ -======================= -AFL starting test cases -======================= - - (See ../docs/README for the general instruction manual.) - -The archives/, images/, multimedia/, and others/ subdirectories contain small, -standalone files that can be used to seed afl-fuzz when testing parsers for a -variety of common data formats. - -There is probably not much to be said about these files, except that they were -optimized for size and stripped of any non-essential fluff. Some directories -contain several examples that exercise various features of the underlying format. -For example, there is a PNG file with and without a color profile. - -Additional test cases are always welcome. - -In addition to well-chosen starting files, many fuzzing jobs benefit from a -small and concise dictionary. See ../dictionaries/README.dictionaries for more. -- cgit 1.4.1