From 4e3fec2666c3d317db275f4af8875b56009621e1 Mon Sep 17 00:00:00 2001
From: Stefan Nagy <snagy2@vt.edu>
Date: Wed, 20 Oct 2021 17:09:18 -0400
Subject: Update binaryonly_fuzzing.md with zafl

---
 docs/binaryonly_fuzzing.md | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

(limited to 'docs/binaryonly_fuzzing.md')

diff --git a/docs/binaryonly_fuzzing.md b/docs/binaryonly_fuzzing.md
index 90ea3b66..903afb70 100644
--- a/docs/binaryonly_fuzzing.md
+++ b/docs/binaryonly_fuzzing.md
@@ -95,13 +95,28 @@
    utils/afl_untracer/, use afl-untracer.c as a template.
    It is slower than AFL FRIDA (see above).
 
+## ZAFL
+  ZAFL is a static rewriting platform for fast, space-efficient, and inlined 
+  binary fuzzing instrumentation. It currently supports x86-64 C and C++, 
+  stripped and unstripped, and PIE and non-PIE binaries of all sizes and complexity. 
+  
+  Beyond conventional instrumentation, ZAFL's API enables transformation passes 
+  for more effective/efficient fuzzing. Some built-in transformations include 
+  laf-Intel-style constraint unrolling, Angora-style context sensitivity, and 
+  InsTrim-style CFG optimizations.
+  
+  ZAFL's baseline instrumentation speed averages about 90-95% that of afl-clang-fast's 
+  conventional LLVM instrumentation (but is even faster when enabling CFG optimizations).
+
+  [https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
+
 
 ## DYNINST
 
   Dyninst is a binary instrumentation framework similar to Pintool and
   Dynamorio (see far below). However whereas Pintool and Dynamorio work at
   runtime, dyninst instruments the target at load time, and then let it run -
-  or save the  binary with the changes.
+  or save the binary with the changes.
   This is great for some things, e.g. fuzzing, and not so effective for others,
   e.g. malware analysis.
 
@@ -116,13 +131,10 @@
   The speed decrease is about 15-35%, depending on the optimization options
   used with afl-dyninst.
 
-  So if Dyninst works, it is the best option available. Otherwise it just
-  doesn't work well.
-
   [https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)
 
 
-## RETROWRITE, ZAFL, ... other binary rewriter
+## RETROWRITE
 
   If you have an x86/x86_64 binary that still has its symbols, is compiled
   with position independant code (PIC/PIE) and does not use most of the C++
@@ -131,7 +143,6 @@
 
   It is at about 80-85% performance.
 
-  [https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
   [https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)
 
 
-- 
cgit 1.4.1


From e637ca216e4559960feec6b7f887571efde4f0ba Mon Sep 17 00:00:00 2001
From: Stefan Nagy <snagy2@vt.edu>
Date: Thu, 21 Oct 2021 04:52:38 -0400
Subject: Tidy-up zafl info

---
 docs/binaryonly_fuzzing.md | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

(limited to 'docs/binaryonly_fuzzing.md')

diff --git a/docs/binaryonly_fuzzing.md b/docs/binaryonly_fuzzing.md
index 903afb70..de360543 100644
--- a/docs/binaryonly_fuzzing.md
+++ b/docs/binaryonly_fuzzing.md
@@ -95,18 +95,13 @@
    utils/afl_untracer/, use afl-untracer.c as a template.
    It is slower than AFL FRIDA (see above).
 
+
 ## ZAFL
-  ZAFL is a static rewriting platform for fast, space-efficient, and inlined 
-  binary fuzzing instrumentation. It currently supports x86-64 C and C++, 
-  stripped and unstripped, and PIE and non-PIE binaries of all sizes and complexity. 
-  
-  Beyond conventional instrumentation, ZAFL's API enables transformation passes 
-  for more effective/efficient fuzzing. Some built-in transformations include 
-  laf-Intel-style constraint unrolling, Angora-style context sensitivity, and 
-  InsTrim-style CFG optimizations.
-  
-  ZAFL's baseline instrumentation speed averages about 90-95% that of afl-clang-fast's 
-  conventional LLVM instrumentation (but is even faster when enabling CFG optimizations).
+  ZAFL is a static rewriting platform supporting x86-64 C/C++, stripped/unstripped, 
+  and PIE/non-PIE binaries. Beyond conventional instrumentation, ZAFL's API enables 
+  transformation passes (e.g., laf-Intel, context sensitivity, InsTrim, etc.).
+
+  Its baseline instrumentation speed typically averages 90-95% of afl-clang-fast's.
 
   [https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
 
-- 
cgit 1.4.1


From b659be15494011184694a35ce02927f743fe0518 Mon Sep 17 00:00:00 2001
From: vanhauser-thc <vh@thc.org>
Date: Tue, 16 Nov 2021 13:54:31 +0100
Subject: add coresight to docs

---
 docs/binaryonly_fuzzing.md | 10 +++-------
 docs/features.md           | 31 +++++++++++++++++--------------
 2 files changed, 20 insertions(+), 21 deletions(-)

(limited to 'docs/binaryonly_fuzzing.md')

diff --git a/docs/binaryonly_fuzzing.md b/docs/binaryonly_fuzzing.md
index de360543..2c0872cf 100644
--- a/docs/binaryonly_fuzzing.md
+++ b/docs/binaryonly_fuzzing.md
@@ -175,13 +175,9 @@
 ## CORESIGHT
 
   Coresight is ARM's answer to Intel's PT.
-  There is no implementation so far which handles coresight and getting
-  it working on an ARM Linux is very difficult due to custom kernel building
-  on embedded systems is difficult. And finding one that has coresight in
-  the ARM chip is difficult too.
-  My guess is that it is slower than Qemu, but faster than Intel PT.
-
-  If anyone finds any coresight implementation for AFL please ping me: vh@thc.org
+  With afl++ v3.15 there is a coresight tracer implementation available in
+  `coresight_mode/` which is faster than QEMU, however can not run in parallel.
+  Currently only one process can be traced, it is WIP.
 
 
 ## PIN & DYNAMORIO
diff --git a/docs/features.md b/docs/features.md
index c0956703..f44e32ff 100644
--- a/docs/features.md
+++ b/docs/features.md
@@ -4,20 +4,20 @@
   with laf-intel and redqueen, frida mode, unicorn mode, gcc plugin, full *BSD,
   Mac OS, Solaris and Android support and much, much, much more.
 
-  | Feature/Instrumentation  | afl-gcc | llvm      | gcc_plugin | frida_mode       | qemu_mode        |unicorn_mode      |
-  | -------------------------|:-------:|:---------:|:----------:|:----------------:|:----------------:|:----------------:|
-  | Threadsafe counters      |         |     x(3)  |            |                  |                  |                  |
-  | NeverZero                | x86[_64]|     x(1)  |     x      |         x        |         x        |         x        |
-  | Persistent Mode          |         |     x     |     x      | x86[_64]/arm64   | x86[_64]/arm[64] |         x        |
-  | LAF-Intel / CompCov      |         |     x     |            |                  | x86[_64]/arm[64] | x86[_64]/arm[64] |
-  | CmpLog                   |         |     x     |            | x86[_64]/arm64   | x86[_64]/arm[64] |                  |
-  | Selective Instrumentation|         |     x     |     x      |         x        |         x        |                  |
-  | Non-Colliding Coverage   |         |     x(4)  |            |                  |        (x)(5)    |                  |
-  | Ngram prev_loc Coverage  |         |     x(6)  |            |                  |                  |                  |
-  | Context Coverage         |         |     x(6)  |            |                  |                  |                  |
-  | Auto Dictionary          |         |     x(7)  |            |                  |                  |                  |
-  | Snapshot LKM Support     |         |    (x)(8) |    (x)(8)  |                  |        (x)(5)    |                  |
-  | Shared Memory Testcases  |         |     x     |     x      | x86[_64]/arm64   |         x        |         x        |
+  | Feature/Instrumentation  | afl-gcc | llvm      | gcc_plugin | frida_mode(9)    | qemu_mode(10)    |unicorn_mode(10)  |coresight_mode(11)|
+  | -------------------------|:-------:|:---------:|:----------:|:----------------:|:----------------:|:----------------:|:----------------:|
+  | Threadsafe counters      |         |     x(3)  |            |                  |                  |                  |                  |
+  | NeverZero                | x86[_64]|     x(1)  |     x      |         x        |         x        |         x        |                  |
+  | Persistent Mode          |         |     x     |     x      | x86[_64]/arm64   | x86[_64]/arm[64] |         x        |                  |
+  | LAF-Intel / CompCov      |         |     x     |            |                  | x86[_64]/arm[64] | x86[_64]/arm[64] |                  |
+  | CmpLog                   |         |     x     |            | x86[_64]/arm64   | x86[_64]/arm[64] |                  |                  |
+  | Selective Instrumentation|         |     x     |     x      |         x        |         x        |                  |                  |
+  | Non-Colliding Coverage   |         |     x(4)  |            |                  |        (x)(5)    |                  |                  |
+  | Ngram prev_loc Coverage  |         |     x(6)  |            |                  |                  |                  |                  |
+  | Context Coverage         |         |     x(6)  |            |                  |                  |                  |                  |
+  | Auto Dictionary          |         |     x(7)  |            |                  |                  |                  |                  |
+  | Snapshot LKM Support     |         |    (x)(8) |    (x)(8)  |                  |        (x)(5)    |                  |                  |
+  | Shared Memory Testcases  |         |     x     |     x      | x86[_64]/arm64   |         x        |         x        |                  |
 
   1. default for LLVM >= 9.0, env var for older version due an efficiency bug in previous llvm versions
   2. GCC creates non-performant code, hence it is disabled in gcc_plugin
@@ -27,6 +27,9 @@
   6. not compatible with LTO instrumentation and needs at least LLVM v4.1
   7. automatic in LTO mode with LLVM 11 and newer, an extra pass for all LLVM versions that write to a file to use with afl-fuzz' `-x`
   8. the snapshot LKM is currently unmaintained due to too many kernel changes coming too fast :-(
+  9. frida mode is supported on Linux and MacOS for Intel and ARM
+ 10. QEMU/Unicorn is only supported on Linux
+ 11. Coresight mode is only available on AARCH64 Linux with a CPU with Coresight extension
 
   Among others, the following features and patches have been integrated:
 
-- 
cgit 1.4.1


From 36514a2e4facfc9b9c1184259cb99a1c0d0ec2df Mon Sep 17 00:00:00 2001
From: llzmb <46303940+llzmb@users.noreply.github.com>
Date: Sun, 21 Nov 2021 15:42:46 +0100
Subject: Merge "binaryonly_fuzzing.md" into "fuzzing_binary-only_targets.md"

---
 docs/binaryonly_fuzzing.md          | 225 ----------------------------
 docs/fuzzing_binary-only_targets.md | 289 ++++++++++++++++++++++++++++++------
 2 files changed, 244 insertions(+), 270 deletions(-)
 delete mode 100644 docs/binaryonly_fuzzing.md

(limited to 'docs/binaryonly_fuzzing.md')

diff --git a/docs/binaryonly_fuzzing.md b/docs/binaryonly_fuzzing.md
deleted file mode 100644
index 2c0872cf..00000000
--- a/docs/binaryonly_fuzzing.md
+++ /dev/null
@@ -1,225 +0,0 @@
-# Fuzzing binary-only programs with AFL++
-
-  AFL++, libfuzzer and others are great if you have the source code, and
-  it allows for very fast and coverage guided fuzzing.
-
-  However, if there is only the binary program and no source code available,
-  then standard `afl-fuzz -n` (non-instrumented mode) is not effective.
-
-  The following is a description of how these binaries can be fuzzed with AFL++.
-
-
-## TL;DR:
-
-  qemu_mode in persistent mode is the fastest - if the stability is
-  high enough. Otherwise try retrowrite, afl-dyninst and if these
-  fail too then try standard qemu_mode with AFL_ENTRYPOINT to where you need it.
-
-  If your target is a library use utils/afl_frida/.
-
-  If your target is non-linux then use unicorn_mode/.
-
-
-## QEMU
-
-  Qemu is the "native" solution to the program.
-  It is available in the ./qemu_mode/ directory and once compiled it can
-  be accessed by the afl-fuzz -Q command line option.
-  It is the easiest to use alternative and even works for cross-platform binaries.
-
-  The speed decrease is at about 50%.
-  However various options exist to increase the speed:
-   - using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in
-     the binary (+5-10% speed)
-   - using persistent mode [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md)
-     this will result in 150-300% overall speed increase - so 3-8x the original
-     qemu_mode speed!
-   - using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
-
-  Note that there is also honggfuzz: [https://github.com/google/honggfuzz](https://github.com/google/honggfuzz)
-  which now has a qemu_mode, but its performance is just 1.5% ...
-
-  As it is included in AFL++ this needs no URL.
-
-  If you like to code a customized fuzzer without much work, we highly
-  recommend to check out our sister project libafl which will support QEMU
-  too:
-  [https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)
-
-
-## AFL FRIDA
-
-  In frida_mode you can fuzz binary-only targets easily like with QEMU,
-  with the advantage that frida_mode also works on MacOS (both intel and M1).
-
-  If you want to fuzz a binary-only library then you can fuzz it with
-  frida-gum via utils/afl_frida/, you will have to write a harness to
-  call the target function in the library, use afl-frida.c as a template.
-
-  Both come with AFL++ so this needs no URL.
-
-  You can also perform remote fuzzing with frida, e.g. if you want to fuzz
-  on iPhone or Android devices, for this you can use
-  [https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/)
-  as an intermediate that uses AFL++ for fuzzing.
-
-  If you like to code a customized fuzzer without much work, we highly
-  recommend to check out our sister project libafl which supports Frida too:
-  [https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)
-  Working examples already exist :-)
-
-
-## WINE+QEMU
-
-  Wine mode can run Win32 PE binaries with the QEMU instrumentation.
-  It needs Wine, python3 and the pefile python package installed.
-
-  As it is included in AFL++ this needs no URL.
-
-
-## UNICORN
-
-  Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar.
-  In contrast to QEMU, Unicorn does not offer a full system or even userland
-  emulation. Runtime environment and/or loaders have to be written from scratch,
-  if needed. On top, block chaining has been removed. This means the speed boost
-  introduced in  the patched QEMU Mode of AFL++ cannot simply be ported over to
-  Unicorn. For further information, check out [unicorn_mode/README.md](../unicorn_mode/README.md).
-
-  As it is included in AFL++ this needs no URL.
-
-
-## AFL UNTRACER
-
-   If you want to fuzz a binary-only shared library then you can fuzz it with
-   utils/afl_untracer/, use afl-untracer.c as a template.
-   It is slower than AFL FRIDA (see above).
-
-
-## ZAFL
-  ZAFL is a static rewriting platform supporting x86-64 C/C++, stripped/unstripped, 
-  and PIE/non-PIE binaries. Beyond conventional instrumentation, ZAFL's API enables 
-  transformation passes (e.g., laf-Intel, context sensitivity, InsTrim, etc.).
-
-  Its baseline instrumentation speed typically averages 90-95% of afl-clang-fast's.
-
-  [https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
-
-
-## DYNINST
-
-  Dyninst is a binary instrumentation framework similar to Pintool and
-  Dynamorio (see far below). However whereas Pintool and Dynamorio work at
-  runtime, dyninst instruments the target at load time, and then let it run -
-  or save the binary with the changes.
-  This is great for some things, e.g. fuzzing, and not so effective for others,
-  e.g. malware analysis.
-
-  So what we can do with dyninst is taking every basic block, and put afl's
-  instrumention code in there - and then save the binary.
-  Afterwards we can just fuzz the newly saved target binary with afl-fuzz.
-  Sounds great? It is. The issue though - it is a non-trivial problem to
-  insert instructions, which change addresses in the process space, so that
-  everything is still working afterwards. Hence more often than not binaries
-  crash when they are run.
-
-  The speed decrease is about 15-35%, depending on the optimization options
-  used with afl-dyninst.
-
-  [https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)
-
-
-## RETROWRITE
-
-  If you have an x86/x86_64 binary that still has its symbols, is compiled
-  with position independant code (PIC/PIE) and does not use most of the C++
-  features then the retrowrite solution might be for you.
-  It decompiles to ASM files which can then be instrumented with afl-gcc.
-
-  It is at about 80-85% performance.
-
-  [https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)
-
-
-## MCSEMA
-
-  Theoretically you can also decompile to llvm IR with mcsema, and then
-  use llvm_mode to instrument the binary.
-  Good luck with that.
-
-  [https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)
-
-
-## INTEL-PT
-
-  If you have a newer Intel CPU, you can make use of Intels processor trace.
-  The big issue with Intel's PT is the small buffer size and the complex
-  encoding of the debug information collected through PT.
-  This makes the decoding very CPU intensive and hence slow.
-  As a result, the overall speed decrease is about 70-90% (depending on
-  the implementation and other factors).
-
-  There are two AFL intel-pt implementations:
-
-  1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
-     => this needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.
-
-  2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
-     => this needs a 4.14 or 4.15 kernel. the "nopti" kernel boot option must
-        be used. This one is faster than the other.
-
-  Note that there is also honggfuzz: https://github.com/google/honggfuzz
-  But its IPT performance is just 6%!
-
-
-## CORESIGHT
-
-  Coresight is ARM's answer to Intel's PT.
-  With afl++ v3.15 there is a coresight tracer implementation available in
-  `coresight_mode/` which is faster than QEMU, however can not run in parallel.
-  Currently only one process can be traced, it is WIP.
-
-
-## PIN & DYNAMORIO
-
-  Pintool and Dynamorio are dynamic instrumentation engines, and they can be
-  used for getting basic block information at runtime.
-  Pintool is only available for Intel x32/x64 on Linux, Mac OS and Windows,
-  whereas Dynamorio is additionally available for ARM and AARCH64.
-  Dynamorio is also 10x faster than Pintool.
-
-  The big issue with Dynamorio (and therefore Pintool too) is speed.
-  Dynamorio has a speed decrease of 98-99%
-  Pintool has a speed decrease of 99.5%
-
-  Hence Dynamorio is the option to go for if everything else fails, and Pintool
-  only if Dynamorio fails too.
-
-  Dynamorio solutions:
-  * [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
-  * [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
-  * [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/) <= very good but windows only
-
-  Pintool solutions:
-  * [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
-  * [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
-  * [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode) <= only old Pintool version supported
-
-
-## Non-AFL solutions
-
-  There are many binary-only fuzzing frameworks.
-  Some are great for CTFs but don't work with large binaries, others are very
-  slow but have good path discovery, some are very hard to set-up ...
-
-  * QSYM: [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
-  * Manticore: [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
-  * S2E: [https://github.com/S2E](https://github.com/S2E)
-  * Tinyinst: [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst) (Mac/Windows only)
-  * Jackalope: [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
-  *  ... please send me any missing that are good
-
-
-## Closing words
-
-  That's it! News, corrections, updates? Send an email to vh@thc.org
diff --git a/docs/fuzzing_binary-only_targets.md b/docs/fuzzing_binary-only_targets.md
index ea262f6e..0b39042f 100644
--- a/docs/fuzzing_binary-only_targets.md
+++ b/docs/fuzzing_binary-only_targets.md
@@ -1,83 +1,282 @@
 # Fuzzing binary-only targets
 
-When source code is *NOT* available, AFL++ offers various support for fast,
-on-the-fly instrumentation of black-box binaries. 
+AFL++, libfuzzer, and other fuzzers are great if you have the source code of the
+target. This allows for very fast and coverage guided fuzzing.
 
-If you do not have to use Unicorn the following setup is recommended to use
-qemu_mode:
-  * run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
-  * run 1 afl-fuzz -Q instance with QASAN  (`AFL_USE_QASAN=1`)
-  * run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcmpcov.so` + `AFL_COMPCOV_LEVEL=2`)
-Alternatively you can use frida_mode, just switch `-Q` with `-O` and remove the
-LAF instance.
+However, if there is only the binary program and no source code available, then
+standard `afl-fuzz -n` (non-instrumented mode) is not effective.
 
-Then run as many instances as you have cores left with either -Q mode or - better -
-use a binary rewriter like afl-dyninst, retrowrite, zafl, etc.
+For fast, on-the-fly instrumentation of black-box binaries, AFL++ still offers
+various support. The following is a description of how these binaries can be
+fuzzed with AFL++.
 
-For Qemu and Frida mode, check out the persistent mode, it gives a huge speed
-improvement if it is possible to use.
+## TL;DR:
+
+Qemu_mode in persistent mode is the fastest - if the stability is high enough.
+Otherwise, try RetroWrite, Dyninst, and if these fail, too, then try standard
+qemu_mode with AFL_ENTRYPOINT to where you need it.
+
+If your target is a library, then use frida_mode.
+
+If your target is non-linux, then use unicorn_mode.
 
-### QEMU
+## Fuzzing binary-only targets with AFL++
+### Qemu_mode
 
-For linux programs and its libraries this is accomplished with a version of
-QEMU running in the lesser-known "user space emulation" mode.
-QEMU is a project separate from AFL, but you can conveniently build the
-feature by doing:
+Qemu_mode is the "native" solution to the program. It is available in the
+./qemu_mode/ directory and, once compiled, it can be accessed by the afl-fuzz -Q
+command line option. It is the easiest to use alternative and even works for
+cross-platform binaries.
+
+For linux programs and its libraries, this is accomplished with a version of
+QEMU running in the lesser-known "user space emulation" mode. QEMU is a project
+separate from AFL++, but you can conveniently build the feature by doing:
 
 ```shell
 cd qemu_mode
 ./build_qemu_support.sh
 ```
 
-For additional instructions and caveats, see [qemu_mode/README.md](../qemu_mode/README.md).
-If possible you should use the persistent mode, see [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md).
-The mode is approximately 2-5x slower than compile-time instrumentation, and is
-less conducive to parallelization.
+The following setup to use qemu_mode is recommended:
+* run 1 afl-fuzz -Q instance with CMPLOG (`-c 0` + `AFL_COMPCOV_LEVEL=2`)
+* run 1 afl-fuzz -Q instance with QASAN (`AFL_USE_QASAN=1`)
+* run 1 afl-fuzz -Q instance with LAF (`AFL_PRELOAD=libcmpcov.so` +
+  `AFL_COMPCOV_LEVEL=2`), alternatively you can use frida_mode, just switch `-Q`
+  with `-O` and remove the LAF instance
+
+Then run as many instances as you have cores left with either -Q mode or - even
+better - use a binary rewriter like Dyninst, RetroWrite, ZAFL, etc.
+
+If [afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst) works for your
+binary, then you can use afl-fuzz normally and it will have twice the speed
+compared to qemu_mode (but slower than qemu persistent mode). Note that several
+other binary rewriters exist, all with their advantages and caveats.
+
+The speed decrease of qemu_mode is at about 50%. However, various options exist
+to increase the speed:
+- using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in
+  the binary (+5-10% speed)
+- using persistent mode
+  [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md) this will
+  result in a 150-300% overall speed increase - so 3-8x the original qemu_mode
+  speed!
+- using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
+
+For additional instructions and caveats, see
+[qemu_mode/README.md](../qemu_mode/README.md). If possible, you should use the
+persistent mode, see
+[qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md). The mode is
+approximately 2-5x slower than compile-time instrumentation, and is less
+conducive to parallelization.
+
+Note that there is also honggfuzz:
+[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz) which
+now has a qemu_mode, but its performance is just 1.5% ...
+
+If you like to code a customized fuzzer without much work, we highly recommend
+to check out our sister project libafl which will support QEMU, too:
+[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL)
+
+### WINE+QEMU
+
+Wine mode can run Win32 PE binaries with the QEMU instrumentation. It needs
+Wine, python3, and the pefile python package installed.
+
+It is included in AFL++.
 
-If [afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst) works for
-your binary, then you can use afl-fuzz normally and it will have twice
-the speed compared to qemu_mode (but slower than qemu persistent mode).
-Note that several other binary rewriters exist, all with their advantages and
-caveats.
+### Frida_mode
 
-### Frida
+In frida_mode, you can fuzz binary-only targets as easily as with QEMU.
+Frida_mode is sometimes faster and sometimes slower than Qemu_mode. It is also
+newer, lacks COMPCOV, and has the advantage that it works on MacOS (both intel
+and M1).
 
-Frida mode is sometimes faster and sometimes slower than Qemu mode.
-It is also newer, lacks COMPCOV, but supports MacOS.
+To build frida_mode:
 
 ```shell
 cd frida_mode
 make
 ```
 
-For additional instructions and caveats, see [frida_mode/README.md](../frida_mode/README.md).
-If possible you should use the persistent mode, see [qemu_frida/README.md](../qemu_frida/README.md).
-The mode is approximately 2-5x slower than compile-time instrumentation, and is
-less conducive to parallelization.
+For additional instructions and caveats, see
+[frida_mode/README.md](../frida_mode/README.md). If possible, you should use the
+persistent mode, see [qemu_frida/README.md](../qemu_frida/README.md). The mode
+is approximately 2-5x slower than compile-time instrumentation, and is less
+conducive to parallelization. But for binary-only fuzzing, it gives a huge speed
+improvement if it is possible to use.
+
+If you want to fuzz a binary-only library, then you can fuzz it with frida-gum
+via frida_mode/. You will have to write a harness to call the target function in
+the library, use afl-frida.c as a template.
+
+You can also perform remote fuzzing with frida, e.g. if you want to fuzz on
+iPhone or Android devices, for this you can use
+[https://github.com/ttdennis/fpicker/](https://github.com/ttdennis/fpicker/) as
+an intermediate that uses AFL++ for fuzzing.
+
+If you like to code a customized fuzzer without much work, we highly recommend
+to check out our sister project libafl which supports Frida, too:
+[https://github.com/AFLplusplus/LibAFL](https://github.com/AFLplusplus/LibAFL).
+Working examples already exist :-)
 
 ### Unicorn
 
-For non-Linux binaries you can use AFL++'s unicorn mode which can emulate
+Unicorn is a fork of QEMU. The instrumentation is, therefore, very similar. In
+contrast to QEMU, Unicorn does not offer a full system or even userland
+emulation. Runtime environment and/or loaders have to be written from scratch,
+if needed. On top, block chaining has been removed. This means the speed boost
+introduced in the patched QEMU Mode of AFL++ cannot simply be ported over to
+Unicorn.
+
+For non-Linux binaries, you can use AFL++'s unicorn_mode which can emulate
 anything you want - for the price of speed and user written scripts.
-See [unicorn_mode/README.md](../unicorn_mode/README.md).
 
-It can be easily built by:
+To build unicorn_mode:
+
 ```shell
 cd unicorn_mode
 ./build_unicorn_support.sh
 ```
 
+For further information, check out
+[unicorn_mode/README.md](../unicorn_mode/README.md).
+
 ### Shared libraries
 
-If the goal is to fuzz a dynamic library then there are two options available.
-For both you need to write a small harness that loads and calls the library.
-Then you fuzz this with either frida_mode or qemu_mode, and either use
+If the goal is to fuzz a dynamic library, then there are two options available.
+For both, you need to write a small harness that loads and calls the library.
+Then you fuzz this with either frida_mode or qemu_mode and either use
 `AFL_INST_LIBS=1` or `AFL_QEMU/FRIDA_INST_RANGES`.
 
-Another, less precise and slower option is using ptrace with debugger interrupt
-instrumentation: [utils/afl_untracer/README.md](../utils/afl_untracer/README.md).
+Another, less precise and slower option is to fuzz it with utils/afl_untracer/
+and use afl-untracer.c as a template. It is slower than frida_mode.
+
+For more information, see
+[utils/afl_untracer/README.md](../utils/afl_untracer/README.md).
+
+## Binary rewriters
+
+### Coresight
+
+Coresight is ARM's answer to Intel's PT. With AFL++ v3.15, there is a coresight
+tracer implementation available in `coresight_mode/` which is faster than QEMU,
+however, cannot run in parallel. Currently, only one process can be traced, it
+is WIP.
+
+### Dyninst
+
+Dyninst is a binary instrumentation framework similar to Pintool and DynamoRIO.
+However, whereas Pintool and DynamoRIO work at runtime, Dyninst instruments the
+target at load time and then let it run - or save the binary with the changes.
+This is great for some things, e.g. fuzzing, and not so effective for others,
+e.g. malware analysis.
+
+So, what we can do with Dyninst is taking every basic block and put AFL++'s
+instrumentation code in there - and then save the binary. Afterwards, we can
+just fuzz the newly saved target binary with afl-fuzz. Sounds great? It is. The
+issue though - it is a non-trivial problem to insert instructions, which change
+addresses in the process space, so that everything is still working afterwards.
+Hence, more often than not binaries crash when they are run.
+
+The speed decrease is about 15-35%, depending on the optimization options used
+with afl-dyninst.
+
+[https://github.com/vanhauser-thc/afl-dyninst](https://github.com/vanhauser-thc/afl-dyninst)
+
+### Intel PT
+
+If you have a newer Intel CPU, you can make use of Intel's processor trace. The
+big issue with Intel's PT is the small buffer size and the complex encoding of
+the debug information collected through PT. This makes the decoding very CPU
+intensive and hence slow. As a result, the overall speed decrease is about
+70-90% (depending on the implementation and other factors).
+
+There are two AFL intel-pt implementations:
+
+1. [https://github.com/junxzm1990/afl-pt](https://github.com/junxzm1990/afl-pt)
+    => This needs Ubuntu 14.04.05 without any updates and the 4.4 kernel.
+
+2. [https://github.com/hunter-ht-2018/ptfuzzer](https://github.com/hunter-ht-2018/ptfuzzer)
+    => This needs a 4.14 or 4.15 kernel. The "nopti" kernel boot option must be
+    used. This one is faster than the other.
+
+Note that there is also honggfuzz:
+[https://github.com/google/honggfuzz](https://github.com/google/honggfuzz). But
+its IPT performance is just 6%!
+
+### Mcsema
+
+Theoretically, you can also decompile to llvm IR with mcsema, and then use
+llvm_mode to instrument the binary. Good luck with that.
+
+[https://github.com/lifting-bits/mcsema](https://github.com/lifting-bits/mcsema)
+
+### Pintool & DynamoRIO
+
+Pintool and DynamoRIO are dynamic instrumentation engines. They can be used for
+getting basic block information at runtime. Pintool is only available for Intel
+x32/x64 on Linux, Mac OS, and Windows, whereas DynamoRIO is additionally
+available for ARM and AARCH64. DynamoRIO is also 10x faster than Pintool.
+
+The big issue with DynamoRIO (and therefore Pintool, too) is speed. DynamoRIO
+has a speed decrease of 98-99%, Pintool has a speed decrease of 99.5%.
+
+Hence, DynamoRIO is the option to go for if everything else fails and Pintool
+only if DynamoRIO fails, too.
+
+DynamoRIO solutions:
+* [https://github.com/vanhauser-thc/afl-dynamorio](https://github.com/vanhauser-thc/afl-dynamorio)
+* [https://github.com/mxmssh/drAFL](https://github.com/mxmssh/drAFL)
+* [https://github.com/googleprojectzero/winafl/](https://github.com/googleprojectzero/winafl/)
+  <= very good but windows only
+
+Pintool solutions:
+* [https://github.com/vanhauser-thc/afl-pin](https://github.com/vanhauser-thc/afl-pin)
+* [https://github.com/mothran/aflpin](https://github.com/mothran/aflpin)
+* [https://github.com/spinpx/afl_pin_mode](https://github.com/spinpx/afl_pin_mode)
+  <= only old Pintool version supported
+
+### RetroWrite
+
+If you have an x86/x86_64 binary that still has its symbols, is compiled with
+position independent code (PIC/PIE), and does not use most of the C++ features,
+then the RetroWrite solution might be for you. It decompiles to ASM files which
+can then be instrumented with afl-gcc.
+
+It is at about 80-85% performance.
+
+[https://github.com/HexHive/retrowrite](https://github.com/HexHive/retrowrite)
+
+### ZAFL
+ZAFL is a static rewriting platform supporting x86-64 C/C++,
+stripped/unstripped, and PIE/non-PIE binaries. Beyond conventional
+instrumentation, ZAFL's API enables transformation passes (e.g., laf-Intel,
+context sensitivity, InsTrim, etc.).
+
+Its baseline instrumentation speed typically averages 90-95% of
+afl-clang-fast's.
+
+[https://git.zephyr-software.com/opensrc/zafl](https://git.zephyr-software.com/opensrc/zafl)
+
+## Non-AFL++ solutions
+
+There are many binary-only fuzzing frameworks. Some are great for CTFs but don't
+work with large binaries, others are very slow but have good path discovery,
+some are very hard to set-up...
+
+
+* Jackalope:
+  [https://github.com/googleprojectzero/Jackalope](https://github.com/googleprojectzero/Jackalope)
+* Manticore:
+  [https://github.com/trailofbits/manticore](https://github.com/trailofbits/manticore)
+* QSYM:
+  [https://github.com/sslab-gatech/qsym](https://github.com/sslab-gatech/qsym)
+* S2E: [https://github.com/S2E](https://github.com/S2E)
+* TinyInst:
+  [https://github.com/googleprojectzero/TinyInst](https://github.com/googleprojectzero/TinyInst)
+  (Mac/Windows only)
+*  ... please send me any missing that are good
 
-### More
+## Closing words
 
-A more comprehensive description of these and other options can be found in
-[binaryonly_fuzzing.md](binaryonly_fuzzing.md).
\ No newline at end of file
+That's it! News, corrections, updates? Send an email to vh@thc.org.
\ No newline at end of file
-- 
cgit 1.4.1