6 files changed, 199 insertions, 104 deletions
diff --git a/docs/Changelog.md b/docs/Changelog.md
index 5f404dba..6c0ad104 100644
--- a/docs/Changelog.md
+++ b/docs/Changelog.md
@@ -10,8 +10,8 @@ sending a mail to <afl-users+subscribe@googlegroups.com>.
 
 
 ### Version ++2.65d (dev)
-  - initial support for persistent mode shared memory testcase handover
-    (instead of via files/stdin)
+  - persistent mode shared memory testcase handover (instead of via
+    files/stdin) - x2 performance increase!
   - afl-fuzz:
      - -S slaves now only sync from the master to increase performance,
        the -M master still syncs from everyone. Added checks that ensure
diff --git a/examples/persistent_demo/Makefile b/examples/persistent_demo/Makefile
new file mode 100644
index 00000000..cbbb7239
--- /dev/null
+++ b/examples/persistent_demo/Makefile
@@ -0,0 +1,6 @@
+all:
+	afl-clang-fast -o persistent_demo persistent_demo.c
+	afl-clang-fast -o persistent_demo_new persistent_demo_new.c
+
+clean:
+	rm -f persistent_demo persistent_demo_new
diff --git a/llvm_mode/README.md b/llvm_mode/README.md
index 96b2762c..fa008cba 100644
--- a/llvm_mode/README.md
+++ b/llvm_mode/README.md
@@ -35,7 +35,7 @@ Once this implementation is shown to be sufficiently robust and portable, it
 will probably replace afl-clang. For now, it can be built separately and
 co-exists with the original code.
 
-The idea and much of the implementation comes from Laszlo Szekeres.
+The idea and much of the intial implementation came from Laszlo Szekeres.
 
 ## 2a) How to use this - short
 
@@ -56,6 +56,8 @@ LLVM_CONFIG=llvm-config-7 REAL_CC=gcc REAL_CXX=g++ make
 It is highly recommended to use the newest clang version you can put your
 hands on :)
 
+Then look at [README.persistent_mode.md](README.persistent_mode.md).
+
 ## 2b) How to use this - long
 
 In order to leverage this mechanism, you need to have clang installed on your
@@ -159,96 +161,13 @@ See [README.snapshot](README.snapshot.md)
 This is an early-stage mechanism, so field reports are welcome. You can send bug
 reports to <afl-users@googlegroups.com>.
 
-## 6) Bonus feature #1: deferred initialization
-
-AFL tries to optimize performance by executing the targeted binary just once,
-stopping it just before main(), and then cloning this "master" process to get
-a steady supply of targets to fuzz.
-
-Although this approach eliminates much of the OS-, linker- and libc-level
-costs of executing the program, it does not always help with binaries that
-perform other time-consuming initialization steps - say, parsing a large config
-file before getting to the fuzzed data.
-
-In such cases, it's beneficial to initialize the forkserver a bit later, once
-most of the initialization work is already done, but before the binary attempts
-to read the fuzzed input and parse it; in some cases, this can offer a 10x+
-performance gain. You can implement delayed initialization in LLVM mode in a
-fairly simple way.
-
-First, find a suitable location in the code where the delayed cloning can 
-take place. This needs to be done with *extreme* care to avoid breaking the
-binary. In particular, the program will probably malfunction if you select
-a location after:
-
-  - The creation of any vital threads or child processes - since the forkserver
-    can't clone them easily.
-
-  - The initialization of timers via setitimer() or equivalent calls.
-
-  - The creation of temporary files, network sockets, offset-sensitive file
-    descriptors, and similar shared-state resources - but only provided that
-    their state meaningfully influences the behavior of the program later on.
-
-  - Any access to the fuzzed input, including reading the metadata about its
-    size.
-
-With the location selected, add this code in the appropriate spot:
-
-```c
-#ifdef __AFL_HAVE_MANUAL_CONTROL
-  __AFL_INIT();
-#endif
-```
-
-You don't need the #ifdef guards, but including them ensures that the program
-will keep working normally when compiled with a tool other than afl-clang-fast.
-
-Finally, recompile the program with afl-clang-fast (afl-gcc or afl-clang will
-*not* generate a deferred-initialization binary) - and you should be all set!
-
-## 7) Bonus feature #2: persistent mode
-
-Some libraries provide APIs that are stateless, or whose state can be reset in
-between processing different input files. When such a reset is performed, a
-single long-lived process can be reused to try out multiple test cases,
-eliminating the need for repeated fork() calls and the associated OS overhead.
-
-The basic structure of the program that does this would be:
-
-```c
-  while (__AFL_LOOP(1000)) {
-
-    /* Read input data. */
-    /* Call library code to be fuzzed. */
-    /* Reset state. */
-
-  }
-
-  /* Exit normally */
-```
-
-The numerical value specified within the loop controls the maximum number
-of iterations before AFL will restart the process from scratch. This minimizes
-the impact of memory leaks and similar glitches; 1000 is a good starting point,
-and going much higher increases the likelihood of hiccups without giving you
-any real performance benefits.
-
-A more detailed template is shown in ../examples/persistent_demo/.
-Similarly to the previous mode, the feature works only with afl-clang-fast; #ifdef
-guards can be used to suppress it when using other compilers.
-
-Note that as with the previous mode, the feature is easy to misuse; if you
-do not fully reset the critical state, you may end up with false positives or
-waste a whole lot of CPU power doing nothing useful at all. Be particularly
-wary of memory leaks and of the state of file descriptors.
+## 6) deferred initialization, persistent mode, shared memory fuzzing
 
-PS. Because there are task switches still involved, the mode isn't as fast as
-"pure" in-process fuzzing offered, say, by LLVM's LibFuzzer; but it is a lot
-faster than the normal fork() model, and compared to in-process fuzzing,
-should be a lot more robust.
+This is the most powerful and effective fuzzing you can do.
+Please see [README.persistent_mode.md](README.persistent_mode.md) for a
+full explanation.
 
-## 8) Bonus feature #3: 'trace-pc-guard' mode
+## 7) Bonus feature: 'trace-pc-guard' mode
 
 LLVM is shipping with a built-in execution tracing feature
 that provides AFL with the necessary tracing data without the need to
@@ -260,11 +179,8 @@ If you have not an outdated compiler and want to give it a try, build
 targets this way:
 
 ```
- libtarget-1.0 $ AFL_LLVM_USE_TRACE_PC=1  make
+$ AFL_LLVM_INSTRUMENT=PCGUARD  make
 ```
 
-Note that this mode is about 20% slower than "vanilla" afl-clang-fast,
-and about 5-10% slower than afl-clang. This is likely because the
-instrumentation is not inlined, and instead involves a function call.
-On systems that support it, compiling your target with -flto can help
-a bit.
+Note that this us currently the default, as it is the best mode.
+If you have llvm 11 and compiled afl-clang-lto - this is the only better mode.
diff --git a/llvm_mode/README.persistent_mode.md b/llvm_mode/README.persistent_mode.md
new file mode 100644
index 00000000..b092de54
--- /dev/null
+++ b/llvm_mode/README.persistent_mode.md
@@ -0,0 +1,168 @@
+# llvm_mode persistent mode
+
+## 1) Introduction
+
+The most effective way is to fuzz in persistent mode, as the speed can easily
+be x10 or x20 times faster without any disadvanges.
+*All professionel fuzzing is using this mode.*
+
+This requires that the target can be called in a (or several) function(s),
+and that the state can be resetted so that multiple calls be be performed
+without memory leaking and former runs having no impact on following runs
+(this can be seen by the `stability` indicator in the `afl-fuzz` UI).
+
+Examples can be found in [examples/persistent_mode](../examples/persistent_mode).
+
+## 2) TLDR;
+
+Example `fuzz_target.c`:
+```
+#include "what_you_need_for_your_target.h"
+
+__AFL_FUZZ_INIT();
+
+main() {
+
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+  __AFL_INIT();
+#endif
+
+  unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;  // must be after __AFL_INIT
+
+  while (__AFL_LOOP(10000)) {
+
+    int len = __AFL_FUZZ_TESTCASE_LEN;
+
+    if (len < 8) return 0;  // check for a required/useful minimum input length
+
+    /* Setup function call, e.g. struct target *tmp = libtarget_init() */
+    /* Call function to be fuzzed, e.g.: */
+    target_function(buf, len);
+    /* Reset state. e.g. libtarget_free(tmp) */
+
+  }
+
+  return 0;
+
+}
+```
+And then compile:
+```
+afl-clang-fast -o fuzz_target fuzz_target.c -lwhat_you_need_for_your_target
+```
+And that is it!
+The speed increase is usually x10 to x20.
+
+## 3) deferred initialization
+
+AFL tries to optimize performance by executing the targeted binary just once,
+stopping it just before main(), and then cloning this "master" process to get
+a steady supply of targets to fuzz.
+
+Although this approach eliminates much of the OS-, linker- and libc-level
+costs of executing the program, it does not always help with binaries that
+perform other time-consuming initialization steps - say, parsing a large config
+file before getting to the fuzzed data.
+
+In such cases, it's beneficial to initialize the forkserver a bit later, once
+most of the initialization work is already done, but before the binary attempts
+to read the fuzzed input and parse it; in some cases, this can offer a 10x+
+performance gain. You can implement delayed initialization in LLVM mode in a
+fairly simple way.
+
+First, find a suitable location in the code where the delayed cloning can 
+take place. This needs to be done with *extreme* care to avoid breaking the
+binary. In particular, the program will probably malfunction if you select
+a location after:
+
+  - The creation of any vital threads or child processes - since the forkserver
+    can't clone them easily.
+
+  - The initialization of timers via setitimer() or equivalent calls.
+
+  - The creation of temporary files, network sockets, offset-sensitive file
+    descriptors, and similar shared-state resources - but only provided that
+    their state meaningfully influences the behavior of the program later on.
+
+  - Any access to the fuzzed input, including reading the metadata about its
+    size.
+
+With the location selected, add this code in the appropriate spot:
+
+```c
+#ifdef __AFL_HAVE_MANUAL_CONTROL
+  __AFL_INIT();
+#endif
+```
+
+You don't need the #ifdef guards, but including them ensures that the program
+will keep working normally when compiled with a tool other than afl-clang-fast.
+
+Finally, recompile the program with afl-clang-fast (afl-gcc or afl-clang will
+*not* generate a deferred-initialization binary) - and you should be all set!
+
+## 4) persistent mode
+
+Some libraries provide APIs that are stateless, or whose state can be reset in
+between processing different input files. When such a reset is performed, a
+single long-lived process can be reused to try out multiple test cases,
+eliminating the need for repeated fork() calls and the associated OS overhead.
+
+The basic structure of the program that does this would be:
+
+```c
+  while (__AFL_LOOP(1000)) {
+
+    /* Read input data. */
+    /* Call library code to be fuzzed. */
+    /* Reset state. */
+
+  }
+
+  /* Exit normally */
+```
+
+The numerical value specified within the loop controls the maximum number
+of iterations before AFL will restart the process from scratch. This minimizes
+the impact of memory leaks and similar glitches; 1000 is a good starting point,
+and going much higher increases the likelihood of hiccups without giving you
+any real performance benefits.
+
+A more detailed template is shown in ../examples/persistent_demo/.
+Similarly to the previous mode, the feature works only with afl-clang-fast; #ifdef
+guards can be used to suppress it when using other compilers.
+
+Note that as with the previous mode, the feature is easy to misuse; if you
+do not fully reset the critical state, you may end up with false positives or
+waste a whole lot of CPU power doing nothing useful at all. Be particularly
+wary of memory leaks and of the state of file descriptors.
+
+PS. Because there are task switches still involved, the mode isn't as fast as
+"pure" in-process fuzzing offered, say, by LLVM's LibFuzzer; but it is a lot
+faster than the normal fork() model, and compared to in-process fuzzing,
+should be a lot more robust.
+
+## 5) shared memory fuzzing
+
+You can speed up the fuzzing process even more by receiving the fuzzing data
+via shared memory instead of stdin or files.
+This is a further speed multiplier of about 2x.
+
+Setting this up is very easy:
+
+After the includes set the following macro:
+
+```
+__AFL_FUZZ_INIT();
+```
+Directly at the start of main - or if you are using the deferred forkserver
+with `__AFL_INIT()`  then *after* `__AFL_INIT? :
+```
+  unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;
+```
+
+Then as first line after the `__AFL_LOOP` while loop:
+```
+  int len = __AFL_FUZZ_TESTCASE_LEN;
+```
+and that is all!
diff --git a/llvm_mode/afl-clang-fast.c b/llvm_mode/afl-clang-fast.c
index fb072651..64231a4e 100644
--- a/llvm_mode/afl-clang-fast.c
+++ b/llvm_mode/afl-clang-fast.c
@@ -45,11 +45,11 @@ static u32  cc_par_cnt = 1;            /* Param count, including argv0      */
 static u8   llvm_fullpath[PATH_MAX];
 static u8  instrument_mode, instrument_opt_mode, ngram_size, lto_mode, cpp_mode;
 static u8 *lto_flag = AFL_CLANG_FLTO;
-static u8 *march_opt = CFLAGS_OPT;
 static u8  debug;
 static u8  cwd[4096];
 static u8  cmplog_mode;
 u8         use_stdin = 0;                                          /* dummy */
+// static u8 *march_opt = CFLAGS_OPT;
 
 enum {
 
@@ -335,7 +335,7 @@ static void edit_params(u32 argc, char **argv, char **envp) {
 
   }
 
-  //cc_params[cc_par_cnt++] = "-Qunused-arguments";
+  // cc_params[cc_par_cnt++] = "-Qunused-arguments";
 
   // in case LLVM is installed not via a package manager or "make install"
   // e.g. compiled download or compiled from github then it's ./lib directory
@@ -440,7 +440,7 @@ static void edit_params(u32 argc, char **argv, char **envp) {
     cc_params[cc_par_cnt++] = "-g";
     cc_params[cc_par_cnt++] = "-O3";
     cc_params[cc_par_cnt++] = "-funroll-loops";
-    //if (strlen(march_opt) > 1 && march_opt[0] == '-')
+    // if (strlen(march_opt) > 1 && march_opt[0] == '-')
     //  cc_params[cc_par_cnt++] = march_opt;
 
   }
@@ -493,9 +493,14 @@ static void edit_params(u32 argc, char **argv, char **envp) {
       "-D__AFL_FUZZ_INIT()="
       "int __afl_sharedmem_fuzzing = 1;"
       "extern unsigned int __afl_fuzz_len;"
-      "extern unsigned char *__afl_fuzz_ptr;";
-  cc_params[cc_par_cnt++] = "-D__AFL_FUZZ_TESTCASE_BUF=__afl_fuzz_ptr";
-  cc_params[cc_par_cnt++] = "-D__AFL_FUZZ_TESTCASE_LEN=__afl_fuzz_len";
+      "extern unsigned char *__afl_fuzz_ptr;"
+      "unsigned char *__afl_fuzz_alt_ptr;";
+  cc_params[cc_par_cnt++] =
+      "-D__AFL_FUZZ_TESTCASE_BUF=(__afl_fuzz_ptr ? __afl_fuzz_ptr : "
+      "(__afl_fuzz_alt_ptr = malloc(1 * 1024 * 1024)))";
+  cc_params[cc_par_cnt++] =
+      "-D__AFL_FUZZ_TESTCASE_LEN=(__afl_fuzz_ptr ? __afl_fuzz_len : read(0, "
+      "__afl_fuzz_alt_ptr, 1 * 1024 * 1024))";
 
   cc_params[cc_par_cnt++] =
       "-D__AFL_LOOP(_A)="
diff --git a/llvm_mode/afl-llvm-rt.o.c b/llvm_mode/afl-llvm-rt.o.c
index b151de8e..08733db4 100644
--- a/llvm_mode/afl-llvm-rt.o.c
+++ b/llvm_mode/afl-llvm-rt.o.c
@@ -788,7 +788,7 @@ void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
 
 void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
 
-  u32 inst_ratio = 100;
+  u32   inst_ratio = 100;
   char *x;
 
   if (start == stop || *start) return;