author: van Hauser <vh@thc.org>, 2021-12-07 15:18:32 +0100
committer: GitHub <noreply@github.com>, 2021-12-07 15:18:32 +0100
commit: 1f6c72ea1baea69b2dc5b3a68bfacbc00652bc66
tree: a5a7ed81710c1dec50f0aa661b53c0cd884a4da2 /docs/fuzzing_in_depth.md
parent: 5469112db90741cb06c0979313938d83e63f793f
parent: bb506de0b809f97a4221ee1b6e040dcb5f9ca56a
Merge pull request #1191 from llzmb/docs_quality_assurance
Docs content - quality assurance
Diffstat (limited to 'docs/fuzzing_in_depth.md')
-rw-r--r-- docs/fuzzing_in_depth.md 156
1 files changed, 82 insertions, 74 deletions
diff --git a/docs/fuzzing_in_depth.md b/docs/fuzzing_in_depth.md
index 251bbc1d..aaceb600 100644
--- a/docs/fuzzing_in_depth.md
+++ b/docs/fuzzing_in_depth.md
@@ -1,7 +1,7 @@
 # Fuzzing with AFL++
 
 The following describes how to fuzz with a target if source code is available.
-If you have a binary-only target, please go to
+If you have a binary-only target, go to
 [fuzzing_binary-only_targets.md](fuzzing_binary-only_targets.md).
 
 Fuzzing source code is a three-step process:
@@ -106,9 +106,9 @@ You can select the mode for the afl-cc compiler by:
 MODE can be one of: LTO (afl-clang-lto*), LLVM (afl-clang-fast*), GCC_PLUGIN
 (afl-g*-fast) or GCC (afl-gcc/afl-g++) or CLANG(afl-clang/afl-clang++).
 
-Because no AFL specific command-line options are accepted (beside the --afl-MODE
-command), the compile-time tools make fairly broad use of environment variables,
-which can be listed with `afl-cc -hh` or by reading
+Because no AFL++ specific command-line options are accepted (beside the
+--afl-MODE command), the compile-time tools make fairly broad use of environment
+variables, which can be listed with `afl-cc -hh` or by reading
 [env_variables.md](env_variables.md).
 
 ### b) Selecting instrumentation options
@@ -131,8 +131,8 @@ The following options are available when you instrument with LTO mode
   have to compile the target twice, once specifically with/for this mode by
   setting `AFL_LLVM_CMPLOG=1`, and pass this binary to afl-fuzz via the `-c`
   parameter. Note that you can compile also just a cmplog binary and use that
-  for both however there will be a performance penality. You can read more about
-  this in
+  for both, however, there will be a performance penalty. You can read more
+  about this in
   [instrumentation/README.cmplog.md](../instrumentation/README.cmplog.md).
 
 If you use LTO, LLVM or GCC_PLUGIN mode
@@ -151,14 +151,14 @@ only instrument parts of the target that you are interested in:
   inlined and then would not match! See
   [instrumentation/README.instrument_list.md](../instrumentation/README.instrument_list.md)
 
-There are many more options and modes available however these are most of the
+There are many more options and modes available, however, these are most of the
 time less effective. See:
-* [instrumentation/README.ctx.md](../instrumentation/README.ctx.md)
-* [instrumentation/README.ngram.md](../instrumentation/README.ngram.md)
+* [instrumentation/README.llvm.md#6) AFL++ Context Sensitive Branch Coverage](../instrumentation/README.llvm.md#6-afl-context-sensitive-branch-coverage)
+* [instrumentation/README.llvm.md#7) AFL++ N-Gram Branch Coverage](../instrumentation/README.llvm.md#7-afl-n-gram-branch-coverage)
 
 AFL++ performs "never zero" counting in its bitmap. You can read more about this
 here:
-* [instrumentation/README.neverzero.md](../instrumentation/README.neverzero.md)
+* [instrumentation/README.llvm.md#8-neverzero-counters](../instrumentation/README.llvm.md#8-neverzero-counters)
 
 ### c) Selecting sanitizers
 
@@ -167,7 +167,7 @@ allows you to find bugs that would not necessarily result in a crash.
 
 Note that sanitizers have a huge impact on CPU (= less executions per second)
 and RAM usage. Also you should only run one afl-fuzz instance per sanitizer
-type. This is enough because a use-after-free bug will be picked up, e.g. by
+type. This is enough because a use-after-free bug will be picked up, e.g., by
 ASAN (address sanitizer) anyway when syncing to other fuzzing instances, so not
 all fuzzing instances need to be instrumented with ASAN.
 
@@ -175,11 +175,11 @@ The following sanitizers have built-in support in AFL++:
 * ASAN = Address SANitizer, finds memory corruption vulnerabilities like
   use-after-free, NULL pointer dereference, buffer overruns, etc. Enabled with
   `export AFL_USE_ASAN=1` before compiling.
-* MSAN = Memory SANitizer, finds read access to uninitialized memory, eg. a
+* MSAN = Memory SANitizer, finds read access to uninitialized memory, e.g., a
   local variable that is defined and read before it is even set. Enabled with
   `export AFL_USE_MSAN=1` before compiling.
-* UBSAN = Undefined Behaviour SANitizer, finds instances where - by the C and
-  C++ standards - undefined behaviour happens, e.g. adding two signed integers
+* UBSAN = Undefined Behavior SANitizer, finds instances where - by the C and C++
+  standards - undefined behavior happens, e.g., adding two signed integers
   together where the result is larger than a signed integer can hold. Enabled
   with `export AFL_USE_UBSAN=1` before compiling.
 * CFISAN = Control Flow Integrity SANitizer, finds instances where the control
@@ -196,24 +196,24 @@ The following sanitizers have built-in support in AFL++:
   of the target source code where you find a leak check necessary! Enabled with
   `export AFL_USE_LSAN=1` before compiling.
 
-It is possible to further modify the behaviour of the sanitizers at run-time by
+It is possible to further modify the behavior of the sanitizers at run-time by
 setting `ASAN_OPTIONS=...`, `LSAN_OPTIONS` etc. - the available parameters can
 be looked up in the sanitizer documentation of llvm/clang. afl-fuzz, however,
 requires some specific parameters important for fuzzing to be set. If you want
 to set your own, it might bail and report what it is missing.
 
-Note that some sanitizers cannot be used together, e.g. ASAN and MSAN, and
-others often cannot work together because of target weirdness, e.g. ASAN and
+Note that some sanitizers cannot be used together, e.g., ASAN and MSAN, and
+others often cannot work together because of target weirdness, e.g., ASAN and
 CFISAN. You might need to experiment which sanitizers you can combine in a
 target (which means more instances can be run without a sanitized target, which
 is more effective).
 
 ### d) Modifying the target
 
-If the target has features that make fuzzing more difficult, e.g. checksums,
-HMAC, etc. then modify the source code so that checks for these values are
+If the target has features that make fuzzing more difficult, e.g., checksums,
+HMAC, etc., then modify the source code so that checks for these values are
 removed. This can even be done safely for source code used in operational
-products by eliminating these checks within these AFL specific blocks:
+products by eliminating these checks within these AFL++ specific blocks:
 
 ```
 #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
@@ -244,21 +244,22 @@ Then build the target. (Usually with `make`)
 
 **NOTES**
 
-1. sometimes configure and build systems are fickle and do not like stderr
+1. Sometimes configure and build systems are fickle and do not like stderr
    output (and think this means a test failure) - which is something AFL++ likes
    to do to show statistics. It is recommended to disable AFL++ instrumentation
    reporting via `export AFL_QUIET=1`.
 
-2. sometimes configure and build systems error on warnings - these should be
-   disabled (e.g. `--disable-werror` for some configure scripts).
+2. Sometimes configure and build systems error on warnings - these should be
+   disabled (e.g., `--disable-werror` for some configure scripts).
 
-3. in case the configure/build system complains about AFL++'s compiler and
-   aborts then set `export AFL_NOOPT=1` which will then just behave like the
+3. In case the configure/build system complains about AFL++'s compiler and
+   aborts, then set `export AFL_NOOPT=1` which will then just behave like the
    real compiler. This option has to be unset again before building the target!
 
 #### configure
 
 For `configure` build systems this is usually done by:
+
 `CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --disable-shared`
 
 Note that if you are using the (better) afl-clang-lto compiler you also have to
@@ -268,6 +269,7 @@ described in [instrumentation/README.lto.md](../instrumentation/README.lto.md).
 #### cmake
 
 For `cmake` build systems this is usually done by:
+
 `mkdir build; cd build; cmake -DCMAKE_C_COMPILER=afl-cc -DCMAKE_CXX_COMPILER=afl-c++ ..`
 
 Note that if you are using the (better) afl-clang-lto compiler you also have to
@@ -302,13 +304,13 @@ that you want to fuzz, plus a few specific AFL++ functions around it. See
 [instrumentation/README.persistent_mode.md](../instrumentation/README.persistent_mode.md)
 for details.
 
-Basically if you do not fuzz a target in persistent mode then you are just doing
-it for a hobby and not professionally :-).
+Basically if you do not fuzz a target in persistent mode, then you are just
+doing it for a hobby and not professionally :-).
 
 ### g) libfuzzer fuzzer harnesses with LLVMFuzzerTestOneInput()
 
-libfuzzer `LLVMFuzzerTestOneInput()` harnesses are the defacto standard
-for fuzzing, and they can be used with AFL++ (and honggfuzz) as well!
+libfuzzer `LLVMFuzzerTestOneInput()` harnesses are the defacto standard for
+fuzzing, and they can be used with AFL++ (and honggfuzz) as well!
 
 Compiling them is as simple as:
 
@@ -354,12 +356,15 @@ You can find many good examples of starting files in the
 Use the AFL++ tool `afl-cmin` to remove inputs from the corpus that do not
 produce a new path in the target.
 
-Put all files from step a) into one directory, e.g. INPUTS.
+Put all files from step a) into one directory, e.g., INPUTS.
 
 If the target program is to be called by fuzzing as `bin/target -d INPUTFILE`
 the run afl-cmin like this:
+
 `afl-cmin -i INPUTS -o INPUTS_UNIQUE -- bin/target -d @@`
-Note that the INPUTFILE argument that the target program would read from has to be set as `@@`.
+
+Note that the INPUTFILE argument that the target program would read from has to
+be set as `@@`.
 
 If the target reads from stdin instead, just omit the `@@` as this is the
 default.
@@ -369,8 +374,8 @@ This step is highly recommended!
 ### c) Minimizing all corpus files
 
 The shorter the input files that still traverse the same path within the target,
-the better the fuzzing will be. This minimization is done with `afl-tmin`
-however it is a long process as this has to be done for every file:
+the better the fuzzing will be. This minimization is done with `afl-tmin`,
+however, it is a long process as this has to be done for every file:
 
 ```
 mkdir input
@@ -380,8 +385,8 @@ for i in *; do
 done
 ```
 
-This step can also be parallelized, e.g. with `parallel`. Note that this step is
-rather optional though.
+This step can also be parallelized, e.g., with `parallel`. Note that this step
+is rather optional though.
 
 ### Done!
 
@@ -391,7 +396,7 @@ to be used in fuzzing! :-)
 
 ## 3. Fuzzing the target
 
-In this final step we fuzz the target. There are not that many important options
+In this final step, fuzz the target. There are not that many important options
 to run the target - unless you want to use many CPU cores/threads for the
 fuzzing, which will make the fuzzing much more useful.
 
@@ -413,35 +418,38 @@ Note that both scripts improve your fuzzing performance but also decrease your
 system protection against attacks! So set strong firewall rules and only expose
 SSH as a network service if you use these (which is highly recommended).
 
-If you have an input corpus from step 2 then specify this directory with the
-`-i` option. Otherwise create a new directory and create a file with any content
-as test data in there.
+If you have an input corpus from step 2, then specify this directory with the
+`-i` option. Otherwise, create a new directory and create a file with any
+content as test data in there.
 
 If you do not want anything special, the defaults are already usually best,
 hence all you need is to specify the seed input directory with the result of
 step [2a) Collect inputs](#a-collect-inputs):
+
 `afl-fuzz -i input -o output -- bin/target -d @@`
-Note that the directory specified with -o will be created if it does not exist.
+
+Note that the directory specified with `-o` will be created if it does not
+exist.
 
 It can be valuable to run afl-fuzz in a screen or tmux shell so you can log off,
 or afl-fuzz is not aborted if you are running it in a remote ssh session where
-the connection fails in between.
-Only do that though once you have verified that your fuzzing setup works!
-Simply run it like `screen -dmS afl-main -- afl-fuzz -M main-$HOSTNAME -i ...`
-and it will start away in a screen session. To enter this session simply type
-`screen -r afl-main`. You see - it makes sense to name the screen session
-same as the afl-fuzz -M/-S naming :-)
-For more information on screen or tmux please check their documentation.
+the connection fails in between. Only do that though once you have verified that
+your fuzzing setup works! Run it like `screen -dmS afl-main -- afl-fuzz -M
+main-$HOSTNAME -i ...` and it will start away in a screen session. To enter this
+session, type `screen -r afl-main`. You see - it makes sense to name the screen
+session same as the afl-fuzz -M/-S naming :-) For more information on screen or
+tmux, check their documentation.
 
 If you need to stop and re-start the fuzzing, use the same command line options
 (or even change them by selecting a different power schedule or another mutation
 mode!) and switch the input directory with a dash (`-`):
+
 `afl-fuzz -i - -o output -- bin/target -d @@`
 
 Adding a dictionary is helpful. See the directory
 [dictionaries/](../dictionaries/) if something is already included for your data
 format, and tell afl-fuzz to load that dictionary by adding `-x
-dictionaries/FORMAT.dict`. With afl-clang-lto you have an autodictionary
+dictionaries/FORMAT.dict`. With afl-clang-lto, you have an autodictionary
 generation for which you need to do nothing except to use afl-clang-lto as the
 compiler. You also have the option to generate a dictionary yourself, see
 [utils/libtokencap/README.md](../utils/libtokencap/README.md).
@@ -457,15 +465,17 @@ handling in the target. Play around with various -m values until you find one
 that safely works for all your input seeds (if you have good ones and then
 double or quadruple that.
 
-By default afl-fuzz never stops fuzzing. To terminate AFL++ simply press
-Control-C or send a signal SIGINT. You can limit the number of executions or
-approximate runtime in seconds with options also.
+By default, afl-fuzz never stops fuzzing. To terminate AFL++, press Control-C or
+send a signal SIGINT. You can limit the number of executions or approximate
+runtime in seconds with options also.
 
 When you start afl-fuzz you will see a user interface that shows what the status
 is:
+
 ![resources/screenshot.png](resources/screenshot.png)
 
-All labels are explained in [status_screen.md](status_screen.md).
+All labels are explained in
+[afl-fuzz_approach.md#understanding-the-status-screen](afl-fuzz_approach.md#understanding-the-status-screen).
 
 ### b) Keeping memory use and timeouts in check
 
@@ -488,8 +498,8 @@ and not waste CPU time.
 
 ### c) Using multiple cores
 
-If you want to seriously fuzz then use as many cores/threads as possible to fuzz
-your target.
+If you want to seriously fuzz, then use as many cores/threads as possible to
+fuzz your target.
 
 On the same machine - due to the design of how AFL++ works - there is a maximum
 number of CPU cores/threads that are useful, use more and the overall
@@ -503,7 +513,7 @@ can set the cache size (in MB) by setting the environment variable
 `AFL_TESTCACHE_SIZE`.
 
 There should be one main fuzzer (`-M main-$HOSTNAME` option) and as many
-secondary fuzzers (e.g. `-S variant1`) as you have cores that you use. Every
+secondary fuzzers (e.g., `-S variant1`) as you have cores that you use. Every
 -M/-S entry needs a unique name (that can be whatever), however, the same -o
 output directory location has to be used for all instances.
 
@@ -522,14 +532,14 @@ All other secondaries should be used like this:
 * a quarter to a third with the MOpt mutator enabled: `-L 0`
 * run with a different power schedule, recommended are:
   `fast (default), explore, coe, lin, quad, exploit and rare` which you can set
-  with e.g. `-p explore`
+  with, e.g., `-p explore`
 * a few instances should use the old queue cycling with `-Z`
 
 Also, it is recommended to set `export AFL_IMPORT_FIRST=1` to load test cases
 from other fuzzers in the campaign first.
 
-If you have a large corpus, a corpus from a previous run or are fuzzing in
-a CI, then also set `export AFL_CMPLOG_ONLY_NEW=1` and `export AFL_FAST_CAL=1`.
+If you have a large corpus, a corpus from a previous run or are fuzzing in a CI,
+then also set `export AFL_CMPLOG_ONLY_NEW=1` and `export AFL_FAST_CAL=1`.
 
 You can also use different fuzzers. If you are using AFL spinoffs or AFL
 conforming fuzzers, then just use the same -o directory and give it a unique
@@ -547,17 +557,16 @@ A long list can be found at
 
 However, you can also sync AFL++ with honggfuzz, libfuzzer with `-entropic=1`,
 etc. Just show the main fuzzer (-M) with the `-F` option where the queue/work
-directory of a different fuzzer is, e.g. `-F /src/target/honggfuzz`. Using
+directory of a different fuzzer is, e.g., `-F /src/target/honggfuzz`. Using
 honggfuzz (with `-n 1` or `-n 2`) and libfuzzer in parallel is highly
 recommended!
 
 ### d) Using multiple machines for fuzzing
 
-Maybe you have more than one machine you want to fuzz the same target on.
-Simply start the `afl-fuzz` (and perhaps libfuzzer, honggfuzz, ...)
-orchestra as you like, just ensure that your have one and only one `-M`
-instance per server, and that its name is unique, hence the recommendation
-for `-M main-$HOSTNAME`.
+Maybe you have more than one machine you want to fuzz the same target on. Start
+the `afl-fuzz` (and perhaps libfuzzer, honggfuzz, ...) orchestra as you like,
+just ensure that your have one and only one `-M` instance per server, and that
+its name is unique, hence the recommendation for `-M main-$HOSTNAME`.
 
 Now there are three strategies on how you can sync between the servers:
 * never: sounds weird, but this makes every server an island and has the chance
@@ -609,14 +618,14 @@ e.g., `afl-plot out/default /srv/www/htdocs/plot`.
 
 ### f) Stopping fuzzing, restarting fuzzing, adding new seeds
 
-To stop an afl-fuzz run, simply press Control-C.
+To stop an afl-fuzz run, press Control-C.
 
 To restart an afl-fuzz run, just reuse the same command line but replace the `-i
 directory` with `-i -` or set `AFL_AUTORESUME=1`.
 
 If you want to add new seeds to a fuzzing campaign you can run a temporary
-fuzzing instance, e.g. when your main fuzzer is using `-o out` and the new seeds
-are in `newseeds/` directory:
+fuzzing instance, e.g., when your main fuzzer is using `-o out` and the new
+seeds are in `newseeds/` directory:
 
 ```
 AFL_BENCH_JUST_ONE=1 AFL_FAST_CAL=1 afl-fuzz -i newseeds -o out -S newseeds -- ./target
@@ -649,7 +658,7 @@ An "easy" helper script for this is
 [https://github.com/vanhauser-thc/afl-cov](https://github.com/vanhauser-thc/afl-cov),
 just follow the README of that separate project.
 
-If you see that an important area or a feature has not been covered so far then
+If you see that an important area or a feature has not been covered so far, then
 try to find an input that is able to reach that and start a new secondary in
 that fuzzing campaign with that seed as input, let it run for a few minutes,
 then terminate it. The main node will pick it up and make it available to the
@@ -659,15 +668,15 @@ AFL_TRY_AFFINITY=1` if you have no free core.
 Note that in nearly all cases you can never reach full coverage. A lot of
 functionality is usually dependent on exclusive options that would need
 individual fuzzing campaigns each with one of these options set. E.g., if you
-fuzz a library to convert image formats and your target is the png to tiff API
+fuzz a library to convert image formats and your target is the png to tiff API,
 then you will not touch any of the other library APIs and features.
 
 ### h) How long to fuzz a target?
 
-This is a difficult question. Basically if no new path is found for a long time
-(e.g. for a day or a week) then you can expect that your fuzzing won't be
+This is a difficult question. Basically, if no new path is found for a long time
+(e.g., for a day or a week), then you can expect that your fuzzing won't be
 fruitful anymore. However, often this just means that you should switch out
-secondaries for others, e.g. custom mutator modules, sync to very different
+secondaries for others, e.g., custom mutator modules, sync to very different
 fuzzers, etc.
 
 Keep the queue/ directory (for future fuzzings of the same or similar targets)
@@ -722,7 +731,7 @@ just for AFL++).
 Here are some of the most important caveats for AFL++:
 
 - AFL++ detects faults by checking for the first spawned process dying due to a
-  signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for
+  signal (SIGSEGV, SIGABRT, etc.). Programs that install custom handlers for
   these signals may need to have the relevant code commented out. In the same
   vein, faults in child processes spawned by the fuzzed target may evade
   detection unless you manually add some code to catch that.
@@ -800,7 +809,6 @@ then color-codes the input based on which sections appear to be critical, and
 which are not; while not bulletproof, it can often offer quick insights into
 complex file formats.
 
-
 ## 5. CI fuzzing
 
 Some notes on CI fuzzing - this fuzzing is different to normal fuzzing campaigns