about summary refs log tree commit diff
path: root/docs
diff options
context:
space:
mode:
authorvanhauser-thc <vh@thc.org>2020-09-05 13:18:28 +0200
committervanhauser-thc <vh@thc.org>2020-09-05 13:18:28 +0200
commite30b2c6af6e369844c92c00a20ebdd53473a747c (patch)
treecc546af9c1fc321e32d5473f679e1c6a144cb62d /docs
parent4b3ad5f037ee9a36aa057bf55a69acca1f573922 (diff)
downloadafl++-e30b2c6af6e369844c92c00a20ebdd53473a747c.tar.gz
final changes for pre-3.0
Diffstat (limited to 'docs')
-rw-r--r--docs/Changelog.md14
-rw-r--r--docs/FAQ.md104
-rw-r--r--docs/INSTALL.md19
-rw-r--r--docs/env_variables.md121
-rw-r--r--docs/ideas.md57
-rw-r--r--docs/life_pro_tips.md4
-rw-r--r--docs/perf_tips.md8
-rw-r--r--docs/sister_projects.md4
-rw-r--r--docs/status_screen.md2
9 files changed, 145 insertions, 188 deletions
diff --git a/docs/Changelog.md b/docs/Changelog.md
index 6321aee4..9de03e78 100644
--- a/docs/Changelog.md
+++ b/docs/Changelog.md
@@ -9,6 +9,20 @@ Want to stay in the loop on major new features? Join our mailing list by
 sending a mail to <afl-users+subscribe@googlegroups.com>.
 
 
+### Version ++3.00a (develop)
+  - llvm_mode/ and gcc_plugin/ moved to instrumentation/
+  - all compilers combined to afl-cc which emulates the previous ones
+  - afl-llvm/gcc-rt.o merged into afl-compiler-rt.o
+  - afl-fuzz
+    - reading testcases from -i now descends into subdirectories
+    - allow up to 4 -x command line options
+    - loaded extras now have a duplicate protection
+  - instrumentation
+    - new llvm pass: dict2file via AFL_LLVM_DICT2FILE, create afl-fuzz
+      -x dictionary of string comparisons found during compilation
+    - not overriding -Ox or -fno-unroll-loops anymore
+
+
 ### Version ++2.68c (release)
   - added the GSoC excellent afl++ grammar mutator by Shengtuo to our
     custom_mutators/ (see custom_mutators/README.md) - or get it here:
diff --git a/docs/FAQ.md b/docs/FAQ.md
index 064638f4..24942492 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -4,11 +4,11 @@
 
   * [What is the difference between afl and afl++?](#what-is-the-difference-between-afl-and-afl)
   * [How to improve the fuzzing speed?](#how-to-improve-the-fuzzing-speed)
-  * [How do I fuzz a network service?](#how-do-i-fuzz-a-network-service)
-  * [How do I fuzz a GUI program?](#how-do-i-fuzz-a-gui-program)
+  * [How do I fuzz a network service?](#how-to-fuzz-a-network-service)
+  * [How do I fuzz a GUI program?](#how-to-fuzz-a-gui-program)
   * [What is an edge?](#what-is-an-edge)
   * [Why is my stability below 100%?](#why-is-my-stability-below-100)
-  * [How can I improve the stability value?](#how-can-i-improve-the-stability-value)
+  * [How can I improve the stability value](#how-can-i-improve-the-stability-value)
 
 If you find an interesting or important question missing, submit it via
 [https://github.com/AFLplusplus/AFLplusplus/issues](https://github.com/AFLplusplus/AFLplusplus/issues)
@@ -18,52 +18,51 @@ If you find an interesting or important question missing, submit it via
 American Fuzzy Lop (AFL) was developed by MichaƂ "lcamtuf" Zalewski starting in
 2013/2014, and when he left Google end of 2017 he stopped developing it.
 
-At the end of 2019 the Google fuzzing team took over maintenance of AFL, however
-it is only accepting PRs from the community and is not developing enhancements
+At the end of 2019 the Google fuzzing team took over maintance of AFL, however
+it is only accepting PR from the community and is not developing enhancements
 anymore.
 
-In the second quarter of 2019, 1 1/2 year later when no further development of
-AFL had happened and it became clear there would none be coming, afl++
-was born, where initially community patches were collected and applied
-for bug fixes and enhancements. Then from various AFL spin-offs - mostly academic
+In the second quarter of 2019, 1 1/2 years after no further development of
+AFL had happened and it became clear there would be none coming, afl++
+was born, where initially first community patches were collected and applied
+for bugs and enhancements. Then from various AFL spin-offs - mostly academic
 research - features were integrated. This already resulted in a much advanced
 AFL.
 
 Until the end of 2019 the afl++ team had grown to four active developers which
-then implemented their own research and features, making it now by far the most
+then implemented their own research and feature, making it now by far the most
 flexible and feature rich guided fuzzer available as open source.
 And in independent fuzzing benchmarks it is one of the best fuzzers available,
 e.g. [Fuzzbench Report](https://www.fuzzbench.com/reports/2020-08-03/index.html)
 
-## How to improve the fuzzing speed?
+## How to improve the fuzzing speed
 
-  1. Use [llvm_mode](docs/llvm_mode/README.md): afl-clang-lto (llvm >= 11) or afl-clang-fast (llvm >= 9 recommended)
-  2. Use [persistent mode](llvm_mode/README.persistent_mode.md) (x2-x20 speed increase)
+  1. use [instrumentation](docs/README.llvm.md): afl-clang-lto (llvm >= 11) or afl-clang-fast (llvm >= 9 recommended)
+  2. Use [persistent mode](instrumentation/README.persistent_mode.md) (x2-x20 speed increase)
   3. Use the [afl++ snapshot module](https://github.com/AFLplusplus/AFL-Snapshot-LKM) (x2 speed increase)
-  4. If you do not use shmem persistent mode, use `AFL_TMPDIR` to put the input file directory on a tempfs location, see [docs/env_variables.md](docs/env_variables.md)
-  5. Improve Linux kernel performance: modify `/etc/default/grub`, set `GRUB_CMDLINE_LINUX_DEFAULT="ibpb=off ibrs=off kpti=off l1tf=off mds=off mitigations=off no_stf_barrier noibpb noibrs nopcid nopti nospec_store_bypass_disable nospectre_v1 nospectre_v2 pcid=off pti=off spec_store_bypass_disable=off spectre_v2=off stf_barrier=off"`; then `update-grub` and `reboot` (warning: makes the system less secure)
+  4. If you do not use shmem persistent mode, use `AFL_TMPDIR` to point the input file on a tempfs location, see [docs/env_variables.md](docs/env_variables.md)
+  5. Improve kernel performance: modify `/etc/default/grub`, set `GRUB_CMDLINE_LINUX_DEFAULT="ibpb=off ibrs=off kpti=off l1tf=off mds=off mitigations=off no_stf_barrier noibpb noibrs nopcid nopti nospec_store_bypass_disable nospectre_v1 nospectre_v2 pcid=off pti=off spec_store_bypass_disable=off spectre_v2=off stf_barrier=off"`; then `update-grub` and `reboot` (warning: makes the system more insecure)
   6. Running on an `ext2` filesystem with `noatime` mount option will be a bit faster than on any other journaling filesystem
   7. Use your cores! [README.md:3.b) Using multiple cores/threads](../README.md#b-using-multiple-coresthreads)
 
 ## How do I fuzz a network service?
 
-The short answer is - you cannot, at least not "out of the box".
+The short answer is - you cannot, at least "out of the box".
 
-Using a network channel is inadequate for several reasons:
-- it has a slow-down of x10-20 on the fuzzing speed
-- it does not scale to fuzzing multiple instances easily,
-- instead of one initial data packet often a back-and-forth interplay of packets is needed for stateful protocols (which is totally unsupported by most coverage aware fuzzers).
+Using network has a slow-down of x10-20 on the fuzzing speed, does not scale,
+and finally usually it is more than one initial data packet but a back-and-forth
+which is totally unsupported by most coverage aware fuzzers.
 
 The established method to fuzz network services is to modify the source code
 to read from a file or stdin (fd 0) (or even faster via shared memory, combine
-this with persistent mode [llvm_mode/README.persistent_mode.md](llvm_mode/README.persistent_mode.md)
+this with persistent mode [instrumentation/README.persistent_mode.md](instrumentation/README.persistent_mode.md)
 and you have a performance gain of x10 instead of a performance loss of over
-x10 - that is a x100 difference!).
+x10 - that is a x100 difference!
 
 If modifying the source is not an option (e.g. because you only have a binary
 and perform binary fuzzing) you can also use a shared library with AFL_PRELOAD
-to emulate the network. This is also much faster than the real network would be.
-See [examples/socket_fuzzing/](../examples/socket_fuzzing/).
+to emulate the network. This is also much faster than network would be.
+See [examples/socket_fuzzing/](../examples/socket_fuzzing/)
 
 There is an outdated afl++ branch that implements networking if you are
 desperate though: [https://github.com/AFLplusplus/AFLplusplus/tree/networking](https://github.com/AFLplusplus/AFLplusplus/tree/networking) - 
@@ -74,7 +73,7 @@ which allows you to define network state with different type of data packets.
 
 If the GUI program can read the fuzz data from a file (via the command line,
 a fixed location or via an environment variable) without needing any user
-interaction then it would be suitable for fuzzing.
+interaction then then yes.
 
 Otherwise it is not possible without modifying the source code - which is a
 very good idea anyway as the GUI functionality is a huge CPU/time overhead
@@ -83,13 +82,13 @@ for the fuzzing.
 So create a new `main()` that just reads the test case and calls the
 functionality for processing the input that the GUI program is using.
 
-## What is an "edge"?
+## What is an "edge"
 
 A program contains `functions`, `functions` contain the compiled machine code.
 The compiled machine code in a `function` can be in a single or many `basic blocks`.
 A `basic block` is the largest possible number of subsequent machine code
-instructions that has exactly one entrypoint (which can be be entered by multiple other basic blocks)
-and runs linearly without branching or jumping to other addresses (except at the end).
+instructions that runs independent, meaning it does not split up to different
+locations nor is it jumped into it from a different location:
 ```
 function() {
   A:
@@ -99,7 +98,7 @@ function() {
     if (x) goto C; else goto D;
   C:
     some code
-    goto E
+    goto D
   D:
     some code
     goto B
@@ -109,7 +108,7 @@ function() {
 ```
 Every code block between two jump locations is a `basic block`.
 
-An `edge` is then the unique relationship between two directly connected `basic blocks` (from the
+An `edge` is then the unique relationship between two `basic blocks` (from the
 code example above):
 ```
               Block A
@@ -124,9 +123,8 @@ code example above):
               Block E
 ```
 Every line between two blocks is an `edge`.
-Note that a few basic block loop to itself, this too would be an edge.
 
-## Why is my stability below 100%?
+## Why is my stability below 100%
 
 Stability is measured by how many percent of the edges in the target are
 "stable". Sending the same input again and again should take the exact same
@@ -134,37 +132,37 @@ path through the target every time. If that is the case, the stability is 100%.
 
 If however randomness happens, e.g. a thread reading other external data,
 reaction to timing, etc. then in some of the re-executions with the same data
-the edge coverage result will be different accross runs.
+the result in the edge information will be different accross runs.
 Those edges that change are then flagged "unstable".
 
 The more "unstable" edges, the more difficult for afl++ to identify valid new
 paths.
 
 A value above 90% is usually fine and a value above 80% is also still ok, and
-even a value above 20% can still result in successful finds of bugs.
-However, it is recommended that for values below 90% or 80% you should take
-countermeasures to improve stability.
+even above 20% can still result in successful finds of bugs.
+However, it is recommended that below 90% or 80% you should take measures to
+improve the stability.
 
-## How can I improve the stability value?
+## How can I improve the stability value
 
-For fuzzing a 100% stable target that covers all edges is the best case.
+For fuzzing a 100% stable target that covers all edges is the best.
 A 90% stable target that covers all edges is however better than a 100% stable
 target that ignores 10% of the edges.
 
 With instability you basically have a partial coverage loss on an edge, with
-ignored functions you have a full loss on that edges.
+ignore you have a full loss on that edge.
 
 There are functions that are unstable, but also provide value to coverage, eg
 init functions that use fuzz data as input for example.
-If however a function that has nothing to do with the input data is the
-source of instability, e.g. checking jitter, or is a hash map function etc.
-then it should not be instrumented.
+If however it is a function that has nothing to do with the input data is the
+source, e.g. checking jitter, or is a hash map function etc. then it should
+not be instrumented.
 
-To be able to exclude these functions (based on AFL++'s measured stability)
-the following process will allow to identify functions with variable edges.
+To be able to make this decision the following process will allow you to
+identify the functions with variable edges so you can make this decision.
 
-Four steps are required to do this and it also requires quite some knowledge
-of coding and/or disassembly and is effectively possible only with
+Four steps are required to do this and requires quite some knowledge of
+coding and/or disassembly and it is only effectively possible with
 afl-clang-fast PCGUARD and afl-clang-lto LTO instrumentation.
 
   1. First step: Identify which edge ID numbers are unstable
@@ -173,7 +171,7 @@ afl-clang-fast PCGUARD and afl-clang-lto LTO instrumentation.
      The out/fuzzer_stats file will then show the edge IDs that were identified
      as unstable.
 
-  2. Second step: Find the responsible function(s).
+  2. Second step: Find the responsible function.
 
      a) For LTO instrumented binaries this can be documented during compile
         time, just set `export AFL_LLVM_DOCUMENT_IDS=/path/to/a/file`.
@@ -182,10 +180,10 @@ afl-clang-fast PCGUARD and afl-clang-lto LTO instrumentation.
 
      b) For PCGUARD instrumented binaries it is much more difficult. Here you
         can either modify the __sanitizer_cov_trace_pc_guard function in
-        llvm_mode/afl-llvm-rt.o.c to write a backtrace to a file if the ID in
+        instrumentation/afl-llvm-rt.o.c to write a backtrace to a file if the ID in
         __afl_area_ptr[*guard] is one of the unstable edge IDs.
         (Example code is already there).
-        Then recompile and reinstall llvm_mode and rebuild your target.
+        Then recompile and reinstall instrumentation and rebuild your target.
         Run the recompiled target with afl-fuzz for a while and then check the
         file that you wrote with the backtrace information.
         Alternatively you can use `gdb` to hook __sanitizer_cov_trace_pc_guard_init
@@ -193,20 +191,20 @@ afl-clang-fast PCGUARD and afl-clang-lto LTO instrumentation.
         and set a write breakpoint to that address (`watch 0x.....`).
 
      c) in all other instrumentation types this is not possible. So just
-        recompile with the two mentioned above. This is just for
+        recompile with the the two mentioned above. This is just for
         identifying the functions that have unstable edges.
 
   3. Third step: create a text file with the filenames/functions
 
      Identify which source code files contain the functions that you need to
      remove from instrumentation, or just specify the functions you want to
-     skip for instrumentation. Note that optimization might inline functions!
+     skip instrumenting. Note that optimization might inline functions!
 
-     Simply follow this document on how to do this: [llvm_mode/README.instrument_list.md](llvm_mode/README.instrument_list.md)
+     Simply follow this document on how to do this: [instrumentation/README.instrument_list.md](instrumentation/README.instrument_list.md)
      If PCGUARD is used, then you need to follow this guide (needs llvm 12+!):
      [http://clang.llvm.org/docs/SanitizerCoverage.html#partially-disabling-instrumentation](http://clang.llvm.org/docs/SanitizerCoverage.html#partially-disabling-instrumentation)
 
-     Only exclude those functions from instrumentation that provide no value
+     Only deny those functions from instrumentation that provide no value
      for coverage - that is if it does not process any fuzz data directly
      or indirectly (e.g. hash maps, thread management etc.).
      If however a function directly or indirectly handles fuzz data then you
diff --git a/docs/INSTALL.md b/docs/INSTALL.md
index 766f24d7..fb7b5642 100644
--- a/docs/INSTALL.md
+++ b/docs/INSTALL.md
@@ -24,7 +24,7 @@ There are no special dependencies to speak of; you will need GNU make and a
 working compiler (gcc or clang). Some of the optional scripts bundled with the
 program may depend on bash, gdb, and similar basic tools.
 
-If you are using clang, please review llvm_mode/README.md; the LLVM
+If you are using clang, please review README.llvm.md; the LLVM
 integration mode can offer substantial performance gains compared to the
 traditional approach.
 
@@ -52,10 +52,10 @@ sudo gmake install
 Keep in mind that if you are using csh as your shell, the syntax of some of the
 shell commands given in the README.md and other docs will be different.
 
-The `llvm_mode` requires a dynamically linked, fully-operational installation of
+The `llvm` requires a dynamically linked, fully-operational installation of
 clang. At least on FreeBSD, the clang binaries are static and do not include
 some of the essential tools, so if you want to make it work, you may need to
-follow the instructions in llvm_mode/README.md.
+follow the instructions in README.llvm.md.
 
 Beyond that, everything should work as advertised.
 
@@ -97,27 +97,24 @@ and definitely don't look POSIX-compliant. This means two things:
 User emulation mode of QEMU does not appear to be supported on MacOS X, so
 black-box instrumentation mode (`-Q`) will not work.
 
-The llvm_mode requires a fully-operational installation of clang. The one that
+The llvm instrumentation requires a fully-operational installation of clang. The one that
 comes with Xcode is missing some of the essential headers and helper tools.
-See llvm_mode/README.md for advice on how to build the compiler from scratch.
+See README.llvm.md for advice on how to build the compiler from scratch.
 
 ## 4. Linux or *BSD on non-x86 systems
 
 Standard build will fail on non-x86 systems, but you should be able to
 leverage two other options:
 
-  - The LLVM mode (see llvm_mode/README.md), which does not rely on
+  - The LLVM mode (see README.llvm.md), which does not rely on
     x86-specific assembly shims. It's fast and robust, but requires a
     complete installation of clang.
   - The QEMU mode (see qemu_mode/README.md), which can be also used for
     fuzzing cross-platform binaries. It's slower and more fragile, but
     can be used even when you don't have the source for the tested app.
 
-If you're not sure what you need, you need the LLVM mode. To get it, try:
-
-```bash
-AFL_NO_X86=1 gmake && gmake -C llvm_mode
-```
+If you're not sure what you need, you need the LLVM mode, which is built by
+default.
 
 ...and compile your target program with afl-clang-fast or afl-clang-fast++
 instead of the traditional afl-gcc or afl-clang wrappers.
diff --git a/docs/env_variables.md b/docs/env_variables.md
index c47d10e8..9d289f6d 100644
--- a/docs/env_variables.md
+++ b/docs/env_variables.md
@@ -5,13 +5,25 @@
   users or for some types of custom fuzzing setups. See README.md for the general
   instruction manual.
 
-## 1) Settings for afl-gcc, afl-clang, and afl-as - and gcc_plugin afl-gcc-fast
+## 1) Settings for all compilers
 
-Because they can't directly accept command-line options, the compile-time
-tools make fairly broad use of environmental variables:
+Starting with afl++ 3.0 there is only one compiler: afl-cc
+To select the different instrumentation modes this can be done by
+  1. passing --afl-MODE command line options to the compiler
+  2. use a symlink to afl-cc: afl-gcc, afl-g++, afl-clang, afl-clang++,
+     afl-clang-fast, afl-clang-fast++, afl-clang-lto, afl-clang-lto++,
+     afl-gcc-fast, afl-g++-fast
+  3. using the environment variable AFL_CC_COMPILER with MODE
 
-  - Most afl tools do not print any output if stdout/stderr are redirected.
-    If you want to save the output in a file then set the AFL_DEBUG
+MODE can one of LTO (afl-clang-lto*), LLVM (afl-clang-fast*), GCC_PLUGIN
+(afl-g*-fast) or GCC (afl-gcc/afl-g++).
+
+Because beside the --afl-MODE command no afl specific command-line options
+are accepted, the compile-time tools make fairly broad use of environmental
+variables:
+
+  - Most afl tools do not print any ouput if stout/stderr are redirected.
+    If you want to have the output into a file then set the AFL_DEBUG
     environment variable.
     This is sadly necessary for various build processes which fail otherwise.
 
@@ -24,6 +36,8 @@ tools make fairly broad use of environmental variables:
     will cause problems in programs built with -Werror, simply because -O3
     enables more thorough code analysis and can spew out additional warnings.
     To disable optimizations, set AFL_DONT_OPTIMIZE.
+    However if -O... and/or -fno-unroll-loops are set, these are not
+    overriden.
 
   - Setting AFL_USE_ASAN automatically enables ASAN, provided that your
     compiler supports that. Note that fuzzing with ASAN is mildly challenging
@@ -44,7 +58,7 @@ tools make fairly broad use of environmental variables:
     you instrument hand-written assembly when compiling clang code by plugging
     a normalizer into the chain. (There is no equivalent feature for GCC.)
 
-  - Setting AFL_INST_RATIO to a percentage between 0% and 100% controls the
+  - Setting AFL_INST_RATIO to a percentage between 0 and 100% controls the
     probability of instrumenting every branch. This is (very rarely) useful
     when dealing with exceptionally complex programs that saturate the output
     bitmap. Examples include v8, ffmpeg, and perl.
@@ -55,19 +69,16 @@ tools make fairly broad use of environmental variables:
     Setting AFL_INST_RATIO to 0 is a valid choice. This will instrument only
     the transitions between function entry points, but not individual branches.
 
+    Note that this is an outdated variable. A few instances (e.g. afl-gcc)
+    still support these, but state-of-the-art (e.g. LLVM LTO and LLVM PCGUARD)
+    do not need this.
+
   - AFL_NO_BUILTIN causes the compiler to generate code suitable for use with
     libtokencap.so (but perhaps running a bit slower than without the flag).
 
   - TMPDIR is used by afl-as for temporary files; if this variable is not set,
     the tool defaults to /tmp.
 
-  - Setting AFL_KEEP_ASSEMBLY prevents afl-as from deleting instrumented
-    assembly files. Useful for troubleshooting problems or understanding how
-    the tool works. To get them in a predictable place, try something like:
-
-    mkdir assembly_here
-    TMPDIR=$PWD/assembly_here AFL_KEEP_ASSEMBLY=1 make clean all
-
   - If you are a weird person that wants to compile and instrument asm
     text files then use the AFL_AS_FORCE_INSTRUMENT variable:
       AFL_AS_FORCE_INSTRUMENT=1 afl-gcc foo.s -o foo
@@ -78,19 +89,24 @@ tools make fairly broad use of environmental variables:
   - Setting AFL_CAL_FAST will speed up the initial calibration, if the
     application is very slow
 
-## 2) Settings for afl-clang-fast / afl-clang-fast++ / afl-gcc-fast / afl-g++-fast
+## 2) Settings for LLVM and LTO: afl-clang-fast / afl-clang-fast++ / afl-clang-lto / afl-clang-lto++
 
-The native instrumentation helpers (llvm_mode and gcc_plugin) accept a subset
+The native instrumentation helpers (instrumentation and gcc_plugin) accept a subset
 of the settings discussed in section #1, with the exception of:
 
+  - LLVM modes support `AFL_LLVM_DICT2FILE=/absolute/path/file.txt` which will
+    write all constant string comparisons  to this file to be used with
+    afl-fuzz' `-x` option.
+
   - AFL_AS, since this toolchain does not directly invoke GNU as.
 
   - TMPDIR and AFL_KEEP_ASSEMBLY, since no temporary assembly files are
     created.
 
-  - AFL_INST_RATIO, as we by default use collision free instrumentation.
+  - AFL_INST_RATIO, as we by default collision free instrumentation is used.
+    Not all passes support this option though as it is an outdated feature.
 
-Then there are a few specific features that are only available in llvm_mode:
+Then there are a few specific features that are only available in instrumentation:
 
 ### Select the instrumentation mode
 
@@ -121,7 +137,7 @@ Then there are a few specific features that are only available in llvm_mode:
 
     None of the following options are necessary to be used and are rather for
     manual use (which only ever the author of this LTO implementation will use).
-    These are used if several seperated instrumentations are performed which
+    These are used if several seperated instrumentation are performed which
     are then later combined.
 
    - AFL_LLVM_DOCUMENT_IDS=file will document to a file which edge ID was given
@@ -136,7 +152,7 @@ Then there are a few specific features that are only available in llvm_mode:
    - AFL_LLVM_LTO_DONTWRITEID prevents that the highest location ID written
      into the instrumentation is set in a global variable
 
-    See llvm_mode/README.LTO.md for more information.
+    See instrumentation/README.LTO.md for more information.
 
 ### INSTRIM
 
@@ -154,7 +170,7 @@ Then there are a few specific features that are only available in llvm_mode:
       afl-fuzz will only be able to see the path the loop took, but not how
       many times it was called (unless it is a complex loop).
 
-    See llvm_mode/README.instrim.md
+    See instrumentation/README.instrim.md
 
 ### NGRAM
 
@@ -165,7 +181,7 @@ Then there are a few specific features that are only available in llvm_mode:
       config.h to at least 18 and maybe up to 20 for this as otherwise too
       many map collisions occur.
 
-    See llvm_mode/README.ctx.md
+    See instrumentation/README.ctx.md
 
 ### CTX
 
@@ -176,7 +192,7 @@ Then there are a few specific features that are only available in llvm_mode:
       config.h to at least 18 and maybe up to 20 for this as otherwise too
       many map collisions occur.
 
-    See llvm_mode/README.ngram.md
+    See instrumentation/README.ngram.md
 
 ### LAF-INTEL
 
@@ -196,17 +212,17 @@ Then there are a few specific features that are only available in llvm_mode:
 
     - Setting AFL_LLVM_LAF_ALL sets all of the above
 
-    See llvm_mode/README.laf-intel.md for more information.
+    See instrumentation/README.laf-intel.md for more information.
 
 ### INSTRUMENT LIST (selectively instrument files and functions)
 
-    This feature allows selective instrumentation of the source
+    This feature allows selectively instrumentation of the source
 
     - Setting AFL_LLVM_ALLOWLIST or AFL_LLVM_DENYLIST with a filenames and/or
       function will only instrument (or skip) those files that match the names
       listed in the specified file.
 
-    See llvm_mode/README.instrument_list.md for more information.
+    See instrumentation/README.instrument_list.md for more information.
 
 ### NOT_ZERO
 
@@ -220,27 +236,34 @@ Then there are a few specific features that are only available in llvm_mode:
       test. If the target performs only few loops then this will give a
       small performance boost.
 
-    See llvm_mode/README.neverzero.md
+    See instrumentation/README.neverzero.md
 
 ### CMPLOG
 
     - Setting AFL_LLVM_CMPLOG=1 during compilation will tell afl-clang-fast to
-      produce a CmpLog binary. See llvm_mode/README.cmplog.md
+      produce a CmpLog binary. See instrumentation/README.cmplog.md
 
-    See llvm_mode/README.neverzero.md
+    See instrumentation/README.neverzero.md
 
-Then there are a few specific features that are only available in the gcc_plugin:
+## 3) Settings for GCC / GCC_PLUGIN modes
 
-### INSTRUMENT_FILE
+Then there are a few specific features that are only available in GCC and
+GCC_PLUGIN mode.
 
-    This feature allows selective instrumentation of the source
+  - Setting AFL_KEEP_ASSEMBLY prevents afl-as from deleting instrumented
+    assembly files. Useful for troubleshooting problems or understanding how
+    the tool works. (GCC mode only)
+    To get them in a predictable place, try something like:
 
-    - Setting AFL_GCC_INSTRUMENT_FILE with a filename will only instrument those
-      files that match the names listed in this file (one filename per line).
+    mkdir assembly_here
+    TMPDIR=$PWD/assembly_here AFL_KEEP_ASSEMBLY=1 make clean all
 
+  - Setting AFL_GCC_INSTRUMENT_FILE with a filename will only instrument those
+    files that match the names listed in this file (one filename per line).
     See gcc_plugin/README.instrument_list.md for more information.
+    (GCC_PLUGIN mode only)
 
-## 3) Settings for afl-fuzz
+## 4) Settings for afl-fuzz
 
 The main fuzzer binary accepts several options that disable a couple of sanity
 checks or alter some of the more exotic semantics of the tool:
@@ -278,14 +301,6 @@ checks or alter some of the more exotic semantics of the tool:
     don't want AFL to spend too much time classifying that stuff and just
     rapidly put all timeouts in that bin.
 
-  - Setting AFL_FORKSRV_INIT_TMOUT allows yout to specify a different timeout
-    to wait for the forkserver to spin up. The default is the `-t` value times
-    `FORK_WAIT_MULT` from `config.h` (usually 10), so for a `-t 100`, the
-    default would wait `1000` milis. Setting a different time here is useful
-    if the target has a very slow startup time, for example when doing
-    full-system fuzzing or emulation, but you don't want the actual runs
-    to wait too long for timeouts.
-
   - AFL_NO_ARITH causes AFL to skip most of the deterministic arithmetics.
     This can be useful to speed up the fuzzing of text-based file formats.
 
@@ -377,22 +392,12 @@ checks or alter some of the more exotic semantics of the tool:
     Note that this setting inhibits some of the user-friendly diagnostics
     normally done when starting up the forkserver and causes a pretty
     significant performance drop.
-  
-  - Setting AFL_MAX_DET_EXTRAS changes the count of dictionary entries/extras
-    (default 200), after which the entries will be used probabilistically.
-    So, if the dict/extras file (`-x`) contains more tokens than this threshold,
-    not all of the tokens will be used in each fuzzing step, every time.
-    Instead, there is a chance that the entry will be skipped during fuzzing.
-    This makes sure that the fuzzer doesn't spend all its time only inserting
-    the extras, but will still do other mutations. However, it decreases the
-    likelihood for each token to be inserted, before the next queue entry is fuzzed.
-    Either way, all tokens will be used eventually, in a longer fuzzing campaign.
 
   - Outdated environment variables that are that not supported anymore:
     AFL_DEFER_FORKSRV
     AFL_PERSISTENT
 
-## 4) Settings for afl-qemu-trace
+## 5) Settings for afl-qemu-trace
 
 The QEMU wrapper used to instrument binary-only code supports several settings:
 
@@ -446,7 +451,7 @@ The QEMU wrapper used to instrument binary-only code supports several settings:
     stack pointer in which QEMU can find the return address when `start addr` is
     hitted.
 
-## 5) Settings for afl-cmin
+## 6) Settings for afl-cmin
 
 The corpus minimization script offers very little customization:
 
@@ -472,12 +477,12 @@ to match when minimizing crashes. This will make minimization less useful, but
 may prevent the tool from "jumping" from one crashing condition to another in
 very buggy software. You probably want to combine it with the -e flag.
 
-## 7) Settings for afl-analyze
+## 8) Settings for afl-analyze
 
 You can set AFL_ANALYZE_HEX to get file offsets printed as hexadecimal instead
 of decimal.
 
-## 8) Settings for libdislocator
+## 9) Settings for libdislocator
 
 The library honors these environmental variables:
 
@@ -499,12 +504,12 @@ The library honors these environmental variables:
   - AFL_ALIGNED_ALLOC=1 will force the alignment of the allocation size to
     max_align_t to be compliant with the C standard.
 
-## 9) Settings for libtokencap
+## 10) Settings for libtokencap
 
 This library accepts AFL_TOKEN_FILE to indicate the location to which the
 discovered tokens should be written.
 
-## 10) Third-party variables set by afl-fuzz & other tools
+## 11) Third-party variables set by afl-fuzz & other tools
 
 Several variables are not directly interpreted by afl-fuzz, but are set to
 optimal values if not already present in the environment:
diff --git a/docs/ideas.md b/docs/ideas.md
index 65e2e8e6..a5d40963 100644
--- a/docs/ideas.md
+++ b/docs/ideas.md
@@ -3,49 +3,6 @@
 In the following, we describe a variety of ideas that could be implemented
 for future AFL++ versions.
 
-For GSOC2020 interested students please see
-[https://github.com/AFLplusplus/AFLplusplus/issues/208](https://github.com/AFLplusplus/AFLplusplus/issues/208)
-
-## Flexible Grammar Mutator (currently in development)
-
-Currently, AFL++'s mutation does not have deeper knowledge about the fuzzed
-binary, apart from feedback, even though the developer may have insights
-about the target.
-
-A developer may choose to provide dictionaries and implement own mutations
-in python or C, but an easy mutator that behaves according to a given grammar,
-does not exist.
-
-State-of-the-art research on grammar fuzzing has some problems in their
-implementations like code quality, scalability, or ease of use and other
-common issues of the academic code.
-
-We aim to develop a pluggable grammar mutator for afl++ that combines
-various results.
-
-Mentor: andreafioraldi 
-
-## perf-fuzz Linux Kernel Module
-
-Expand on [snapshot LKM](https://github.com/AFLplusplus/AFL-Snapshot-LKM)
-To make it thread safe, can snapshot several processes at once and increase
-overall performance.
-
-Mentor: any
-
-## QEMU 5-based Instrumentation
-
-First tests to use QEMU 4 for binary-only AFL++ showed that caching behavior
-changed, which vastly decreases fuzzing speeds.
-
-In this task test if QEMU 5 performs better and port the afl++ QEMU 3.1
-patches to QEMU 5.
-
-Understanding the current instrumentation and fixing the current caching
-issues will be needed.
-
-Mentor: andreafioraldi
-
 ## WASM Instrumentation
 
 Currently, AFL++ can be used for source code fuzzing and traditional binaries.
@@ -66,20 +23,6 @@ Either improve a single mutator thorugh learning of many different bugs
 
 Mentor: domenukk
 
-## Reengineer `afl-fuzz` as Thread Safe, Embeddable Library (currently in development)
-
-Right now, afl-fuzz is single threaded, cannot safely be embedded in tools,
-and not multi-threaded. It makes use of a large number of globals, must always
-be the parent process and exec child processes. 
-Instead, afl-fuzz could be refactored to contain no global state and globals.
-This allows for different use cases that could be implemented during this
-project.
-Note that in the mean time a lot has happened here already, but e.g. making
-it all work and implement multithreading in afl-fuzz ... there is still quite
-some work to do.
-
-Mentor: hexcoder- or vanhauser-thc
-
 ## Collision-free Binary-Only Maps
 
 AFL++ supports collison-free maps using an LTO (link-time-optimization) pass.
diff --git a/docs/life_pro_tips.md b/docs/life_pro_tips.md
index a5bd7286..0004c297 100644
--- a/docs/life_pro_tips.md
+++ b/docs/life_pro_tips.md
@@ -30,10 +30,10 @@ Check out the `fuzzer_stats` file in the AFL output dir or try `afl-whatsup`.
 It could be important - consult docs/status_screen.md right away!
 
 ## Know your target? Convert it to persistent mode for a huge performance gain!
-Consult section #5 in llvm_mode/README.md for tips.
+Consult section #5 in README.llvm.md for tips.
 
 ## Using clang? 
-Check out llvm_mode/ for a faster alternative to afl-gcc!
+Check out instrumentation/ for a faster alternative to afl-gcc!
 
 ## Did you know that AFL can fuzz closed-source or cross-platform binaries?
 Check out qemu_mode/README.md and unicorn_mode/README.md for more.
diff --git a/docs/perf_tips.md b/docs/perf_tips.md
index 731dc238..fbcb4d8d 100644
--- a/docs/perf_tips.md
+++ b/docs/perf_tips.md
@@ -51,7 +51,7 @@ a file.
 ## 3. Use LLVM instrumentation
 
 When fuzzing slow targets, you can gain 20-100% performance improvement by
-using the LLVM-based instrumentation mode described in [the llvm_mode README](../llvm_mode/README.md).
+using the LLVM-based instrumentation mode described in [the instrumentation README](../instrumentation/README.llvm.md).
 Note that this mode requires the use of clang and will not work with GCC.
 
 The LLVM mode also offers a "persistent", in-process fuzzing mode that can
@@ -62,12 +62,12 @@ modes require you to edit the source code of the fuzzed program, but the
 changes often amount to just strategically placing a single line or two.
 
 If there are important data comparisons performed (e.g. `strcmp(ptr, MAGIC_HDR)`)
-then using laf-intel (see llvm_mode/README.laf-intel.md) will help `afl-fuzz` a lot
+then using laf-intel (see instrumentation/README.laf-intel.md) will help `afl-fuzz` a lot
 to get to the important parts in the code.
 
 If you are only interested in specific parts of the code being fuzzed, you can
 instrument_files the files that are actually relevant. This improves the speed and
-accuracy of afl. See llvm_mode/README.instrument_list.md
+accuracy of afl. See instrumentation/README.instrument_list.md
 
 Also use the InsTrim mode on larger binaries, this improves performance and
 coverage a lot.
@@ -110,7 +110,7 @@ e.g.:
   https://launchpad.net/libeatmydata
 
 In programs that are slow due to unavoidable initialization overhead, you may
-want to try the LLVM deferred forkserver mode (see llvm_mode/README.md),
+want to try the LLVM deferred forkserver mode (see README.llvm.md),
 which can give you speed gains up to 10x, as mentioned above.
 
 Last but not least, if you are using ASAN and the performance is unacceptable,
diff --git a/docs/sister_projects.md b/docs/sister_projects.md
index a501ecbd..640e59f7 100644
--- a/docs/sister_projects.md
+++ b/docs/sister_projects.md
@@ -52,7 +52,7 @@ options.
 Provides an evolutionary instrumentation-guided fuzzing harness that allows
 some programs to be fuzzed without the fork / execve overhead. (Similar
 functionality is now available as the "persistent" feature described in
-[the llvm_mode readme](../llvm_mode/README.md))
+[the llvm_mode readme](../instrumentation/README.llvm.md))
 
 http://llvm.org/docs/LibFuzzer.html
 
@@ -245,7 +245,7 @@ https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters
 ### AFL JS (Han Choongwoo)
 
 One-off optimizations to speed up the fuzzing of JavaScriptCore (now likely
-superseded by LLVM deferred forkserver init - see llvm_mode/README.md).
+superseded by LLVM deferred forkserver init - see README.llvm.md).
 
 https://github.com/tunz/afl-fuzz-js
 
diff --git a/docs/status_screen.md b/docs/status_screen.md
index b89468ce..2eeb8f3f 100644
--- a/docs/status_screen.md
+++ b/docs/status_screen.md
@@ -324,7 +324,7 @@ there are several things to look at:
   - Multiple threads executing at once in semi-random order. This is harmless
     when the 'stability' metric stays over 90% or so, but can become an issue
     if not. Here's what to try:
-    * Use afl-clang-fast from [llvm_mode](../llvm_mode/) - it uses a thread-local tracking
+    * Use afl-clang-fast from [instrumentation](../instrumentation/) - it uses a thread-local tracking
       model that is less prone to concurrency issues,
     * See if the target can be compiled or run without threads. Common
       `./configure` options include `--without-threads`, `--disable-pthreads`, or