5 files changed, 116 insertions, 61 deletions
diff --git a/docs/Changelog.md b/docs/Changelog.md
index 8bbb4e19..3966464e 100644
--- a/docs/Changelog.md
+++ b/docs/Changelog.md
@@ -10,10 +10,18 @@ sending a mail to <afl-users+subscribe@googlegroups.com>.
 
 
 ### Version ++2.67d (develop)
+  - a few QOL changes for Apple and its outdated gmake
   - afl-fuzz:
     - Fix for auto dictionary entries found during fuzzing to not throw out
       a -x dictionary
     - added total execs done to plot file
+    - AFL_MAX_DET_EXTRAS env variable added to control the amount of deterministic
+      dict entries without recompiling.
+    - AFL_FORKSRV_INIT_TMOUT env variable added to control the time to wait for
+      the forkserver to come up without the need to increase the overall timeout.
+  - custom mutators:
+    - added afl_custom_fuzz_count/fuzz_count function to allow specifying the 
+      number of fuzz attempts for custom_fuzz
   - llvm_mode:
     - Ported SanCov to LTO, and made it the default for LTO. better
       instrumentation locations
@@ -409,7 +417,7 @@ sending a mail to <afl-users+subscribe@googlegroups.com>.
   - big code refactoring:
     * all includes are now in include/
     * all afl sources are now in src/ - see src/README.md
-    * afl-fuzz was splitted up in various individual files for including
+    * afl-fuzz was split up in various individual files for including
       functionality in other programs (e.g. forkserver, memory map, etc.)
       for better readability.
     * new code indention everywhere
diff --git a/docs/binaryonly_fuzzing.md b/docs/binaryonly_fuzzing.md
index a3d3330f..cb1288ef 100644
--- a/docs/binaryonly_fuzzing.md
+++ b/docs/binaryonly_fuzzing.md
@@ -6,14 +6,14 @@
   However, if there is only the binary program and no source code available,
   then standard `afl-fuzz -n` (non-instrumented mode) is not effective.
 
-  The following is a description of how these binaries can be fuzzed with afl++
+  The following is a description of how these binaries can be fuzzed with afl++.
 
 
 ## TL;DR:
 
   qemu_mode in persistent mode is the fastest - if the stability is
   high enough. Otherwise try retrowrite, afl-dyninst and if these
-  fail too then standard qemu_mode with AFL_ENTRYPOINT to where you need it.
+  fail too then try standard qemu_mode with AFL_ENTRYPOINT to where you need it.
 
   If your target is a library use examples/afl_frida/.
 
@@ -29,10 +29,10 @@
 
   The speed decrease is at about 50%.
   However various options exist to increase the speed:
-   - using AFL_ENTRYPOINT to move the forkserver to a later basic block in
+   - using AFL_ENTRYPOINT to move the forkserver entry to a later basic block in
      the binary (+5-10% speed)
    - using persistent mode [qemu_mode/README.persistent.md](../qemu_mode/README.persistent.md)
-     this will result in 150-300% overall speed - so 3-8x the original
+     this will result in 150-300% overall speed increase - so 3-8x the original
      qemu_mode speed!
    - using AFL_CODE_START/AFL_CODE_END to only instrument specific parts
 
@@ -104,7 +104,7 @@
 
 ## RETROWRITE
 
-  If you have an x86/x86_64 binary that still has it's symbols, is compiled
+  If you have an x86/x86_64 binary that still has its symbols, is compiled
   with position independant code (PIC/PIE) and does not use most of the C++
   features then the retrowrite solution might be for you.
   It decompiles to ASM files which can then be instrumented with afl-gcc.
@@ -148,7 +148,7 @@
 ## CORESIGHT
 
   Coresight is ARM's answer to Intel's PT.
-  There is no implementation so far which handle coresight and getting
+  There is no implementation so far which handles coresight and getting
   it working on an ARM Linux is very difficult due to custom kernel building
   on embedded systems is difficult. And finding one that has coresight in
   the ARM chip is difficult too.
diff --git a/docs/custom_mutators.md b/docs/custom_mutators.md
index a22c809b..a128f587 100644
--- a/docs/custom_mutators.md
+++ b/docs/custom_mutators.md
@@ -32,6 +32,7 @@ performed with the custom mutator.
 C/C++:
 ```c
 void *afl_custom_init(afl_t *afl, unsigned int seed);
+uint32_t afl_custom_fuzz_count(void *data, const u8 *buf, size_t buf_size);
 size_t afl_custom_fuzz(void *data, uint8_t *buf, size_t buf_size, u8 **out_buf, uint8_t *add_buf, size_t add_buf_size, size_t max_size);
 size_t afl_custom_post_process(void *data, uint8_t *buf, size_t buf_size, uint8_t **out_buf);
 int32_t afl_custom_init_trim(void *data, uint8_t *buf, size_t buf_size);
@@ -49,6 +50,9 @@ Python:
 def init(seed):
     pass
 
+def fuzz_count(buf, add_buf, max_size):
+    return cnt
+
 def fuzz(buf, add_buf, max_size):
     return mutated_out
 
@@ -88,6 +92,14 @@ def queue_new_entry(filename_new_queue, filename_orig_queue):
     This method determines whether the custom fuzzer should fuzz the current
     queue entry or not
 
+- `fuzz_count` (optional):
+
+    When a queue entry is selected to be fuzzed, afl-fuzz selects the number
+    of fuzzing attempts with this input based on a few factors.
+    If, however, the custom mutator wants to set for itself how often it is
+    called for a specific queue entry, use this function.
+    This function is mostly useful if `AFL_CUSTOM_MUTATOR_ONLY` is **not** used.
+
 - `fuzz` (optional):
 
     This method performs custom mutations on a given input. It also accepts an
diff --git a/docs/env_variables.md b/docs/env_variables.md
index 94c34400..c47d10e8 100644
--- a/docs/env_variables.md
+++ b/docs/env_variables.md
@@ -10,8 +10,8 @@
 Because they can't directly accept command-line options, the compile-time
 tools make fairly broad use of environmental variables:
 
-  - Most afl tools do not print any ouput if stout/stderr are redirected.
-    If you want to have the output into a file then set the AFL_DEBUG
+  - Most afl tools do not print any output if stdout/stderr are redirected.
+    If you want to save the output in a file then set the AFL_DEBUG
     environment variable.
     This is sadly necessary for various build processes which fail otherwise.
 
@@ -44,7 +44,7 @@ tools make fairly broad use of environmental variables:
     you instrument hand-written assembly when compiling clang code by plugging
     a normalizer into the chain. (There is no equivalent feature for GCC.)
 
-  - Setting AFL_INST_RATIO to a percentage between 0 and 100% controls the
+  - Setting AFL_INST_RATIO to a percentage between 0% and 100% controls the
     probability of instrumenting every branch. This is (very rarely) useful
     when dealing with exceptionally complex programs that saturate the output
     bitmap. Examples include v8, ffmpeg, and perl.
@@ -88,7 +88,7 @@ of the settings discussed in section #1, with the exception of:
   - TMPDIR and AFL_KEEP_ASSEMBLY, since no temporary assembly files are
     created.
 
-  - AFL_INST_RATIO, as we by default collision free instrumentation is used.
+  - AFL_INST_RATIO, as we by default use collision free instrumentation.
 
 Then there are a few specific features that are only available in llvm_mode:
 
@@ -121,7 +121,7 @@ Then there are a few specific features that are only available in llvm_mode:
 
     None of the following options are necessary to be used and are rather for
     manual use (which only ever the author of this LTO implementation will use).
-    These are used if several seperated instrumentation are performed which
+    These are used if several separate instrumentations are performed which
     are then later combined.
 
    - AFL_LLVM_DOCUMENT_IDS=file will document to a file which edge ID was given
@@ -200,7 +200,7 @@ Then there are a few specific features that are only available in llvm_mode:
 
 ### INSTRUMENT LIST (selectively instrument files and functions)
 
-    This feature allows selectively instrumentation of the source
+    This feature allows selective instrumentation of the source
 
     - Setting AFL_LLVM_ALLOWLIST or AFL_LLVM_DENYLIST with a filenames and/or
       function will only instrument (or skip) those files that match the names
@@ -278,6 +278,14 @@ checks or alter some of the more exotic semantics of the tool:
     don't want AFL to spend too much time classifying that stuff and just
     rapidly put all timeouts in that bin.
 
+  - Setting AFL_FORKSRV_INIT_TMOUT allows you to specify a different timeout
+    to wait for the forkserver to spin up. The default is the `-t` value times
+    `FORK_WAIT_MULT` from `config.h` (usually 10), so for a `-t 100`, the
+    default would wait `1000` milliseconds. Setting a different time here is
+    useful if the target has a very slow startup time, for example when doing
+    full-system fuzzing or emulation, but you don't want the actual runs
+    to wait too long for timeouts.
+
   - AFL_NO_ARITH causes AFL to skip most of the deterministic arithmetics.
     This can be useful to speed up the fuzzing of text-based file formats.
 
@@ -369,6 +377,16 @@ checks or alter some of the more exotic semantics of the tool:
     Note that this setting inhibits some of the user-friendly diagnostics
     normally done when starting up the forkserver and causes a pretty
     significant performance drop.
+  
+  - Setting AFL_MAX_DET_EXTRAS changes the count of dictionary entries/extras
+    (default 200), after which the entries will be used probabilistically.
+    So, if the dict/extras file (`-x`) contains more tokens than this threshold,
+    not all of the tokens will be used in each fuzzing step, every time.
+    Instead, there is a chance that the entry will be skipped during fuzzing.
+    This makes sure that the fuzzer doesn't spend all its time only inserting
+    the extras, but will still do other mutations. However, it decreases the
+    likelihood for each token to be inserted, before the next queue entry is fuzzed.
+    Either way, all tokens will be used eventually, in a longer fuzzing campaign.
 
   - Outdated environment variables that are that not supported anymore:
     AFL_DEFER_FORKSRV
diff --git a/docs/parallel_fuzzing.md b/docs/parallel_fuzzing.md
index 2ab1466c..bf57ace8 100644
--- a/docs/parallel_fuzzing.md
+++ b/docs/parallel_fuzzing.md
@@ -10,8 +10,8 @@ n-core system, you can almost always run around n concurrent fuzzing jobs with
 virtually no performance hit (you can use the afl-gotcpu tool to make sure).
 
 In fact, if you rely on just a single job on a multi-core system, you will
-be underutilizing the hardware. So, parallelization is usually the right
-way to go.
+be underutilizing the hardware. So, parallelization is always the right way to
+go.
 
 When targeting multiple unrelated binaries or using the tool in
 "non-instrumented" (-n) mode, it is perfectly fine to just start up several
@@ -65,22 +65,7 @@ still perform deterministic checks; while the secondary instances will
 proceed straight to random tweaks.
 
 Note that you must always have one -M main instance!
-
-Note that running multiple -M instances is wasteful, although there is an
-experimental support for parallelizing the deterministic checks. To leverage
-that, you need to create -M instances like so:
-
-```
-./afl-fuzz -i testcase_dir -o sync_dir -M mainA:1/3 [...]
-./afl-fuzz -i testcase_dir -o sync_dir -M mainB:2/3 [...]
-./afl-fuzz -i testcase_dir -o sync_dir -M mainC:3/3 [...]
-```
-
-...where the first value after ':' is the sequential ID of a particular main
-instance (starting at 1), and the second value is the total number of fuzzers to
-distribute the deterministic fuzzing across. Note that if you boot up fewer
-fuzzers than indicated by the second number passed to -M, you may end up with
-poor coverage.
+Running multiple -M instances is wasteful!
 
 You can also monitor the progress of your jobs from the command line with the
 provided afl-whatsup tool. When the instances are no longer finding new paths,
@@ -99,61 +84,88 @@ example may be:
 This is not a concern if you use @@ without -f and let afl-fuzz come up with the
 file name.
 
-## 3) Syncing with non-afl fuzzers or independant instances
+## 3) Multiple -M mains
+
+
+There is support for parallelizing the deterministic checks.
+This is only needed where
+ 
+ 1. many new paths are found fast over a long time and it looks unlikely that
+    the main node will ever catch up, and
+ 2. deterministic fuzzing is actively helping path discovery (you can see this
+    in the main node in the first four lines of the "fuzzing strategy yields"
+    section. If the ratio `found/attempts` is high, then it is effective. It
+    most commonly isn't.)
+
+Only if both are true is it beneficial to have more than one main.
+You can leverage this by creating -M instances like so:
+
+```
+./afl-fuzz -i testcase_dir -o sync_dir -M mainA:1/3 [...]
+./afl-fuzz -i testcase_dir -o sync_dir -M mainB:2/3 [...]
+./afl-fuzz -i testcase_dir -o sync_dir -M mainC:3/3 [...]
+```
+
+... where the first value after ':' is the sequential ID of a particular main
+instance (starting at 1), and the second value is the total number of fuzzers to
+distribute the deterministic fuzzing across. Note that if you boot up fewer
+fuzzers than indicated by the second number passed to -M, you may end up with
+poor coverage.
+
+## 4) Syncing with non-afl fuzzers or independent instances
 
 A -M main node can be told with the `-F other_fuzzer_queue_directory` option
 to sync results from other fuzzers, e.g. libfuzzer or honggfuzz.
 
 Only the specified directory will by synced into afl, not subdirectories.
-The specified directories do not need to exist yet at the start of afl.
+The specified directory does not need to exist yet at the start of afl.
 
-## 4) Multi-system parallelization
+The `-F` option can be passed to the main node several times.
+
+## 5) Multi-system parallelization
 
 The basic operating principle for multi-system parallelization is similar to
 the mechanism explained in section 2. The key difference is that you need to
 write a simple script that performs two actions:
 
   - Uses SSH with authorized_keys to connect to every machine and retrieve
-    a tar archive of the /path/to/sync_dir/<fuzzer_id>/queue/ directories for
-    every <fuzzer_id> local to the machine. It's best to use a naming scheme
-    that includes host name in the fuzzer ID, so that you can do something
-    like:
+    a tar archive of the /path/to/sync_dir/<main_node(s)> directory local to
+    the machine.
+    It is best to use a naming scheme that includes the host name and its
+    being a main node (e.g. main1, main2) in the fuzzer ID, so that you can do
+    something like:
 
     ```sh
-    for s in {1..10}; do
-      ssh user@host${s} "tar -czf - sync/host${s}_fuzzid*/[qf]*" >host${s}.tgz
+    for host in `cat HOSTLIST`; do
+      ssh user@$host "tar -czf - sync/${host}_main*/" > $host.tgz
     done
     ```
 
   - Distributes and unpacks these files on all the remaining machines, e.g.:
 
     ```sh
-    for s in {1..10}; do
-      for d in {1..10}; do
-        test "$s" = "$d" && continue
-        ssh user@host${d} 'tar -kxzf -' <host${s}.tgz
+    for srchost in `cat HOSTLIST`; do
+      for dsthost in `cat HOSTLIST`; do
+        test "$srchost" = "$dsthost" && continue
+        ssh user@$dsthost 'tar -kxzf -' < $srchost.tgz
       done
     done
     ```
 
-There is an example of such a script in examples/distributed_fuzzing/;
-you can also find a more featured, experimental tool developed by
-Martijn Bogaard at:
-
-  https://github.com/MartijnB/disfuzz-afl
-
-Another client-server implementation from Richo Healey is:
+There is an example of such a script in examples/distributed_fuzzing/.
 
-  https://github.com/richo/roving
+There are other (older) more featured, experimental tools:
+  * https://github.com/richo/roving
+  * https://github.com/MartijnB/disfuzz-afl
 
-Note that these third-party tools are unsafe to run on systems exposed to the
-Internet or to untrusted users.
+However, these do not support syncing just main nodes (yet).
 
 When developing custom test case sync code, there are several optimizations
 to keep in mind:
 
   - The synchronization does not have to happen very often; running the
-    task every 30 minutes or so may be perfectly fine.
+    task every 60 minutes or even less often at later fuzzing stages is
+    fine.
 
   - There is no need to synchronize crashes/ or hangs/; you only need to
     copy over queue/* (and ideally, also fuzzer_stats).
@@ -179,19 +191,24 @@ to keep in mind:
   - You do not want a "main" instance of afl-fuzz on every system; you should
     run them all with -S, and just designate a single process somewhere within
     the fleet to run with -M.
+    
+  - Syncing is only necessary for the main nodes on a system. It is possible
+    to run main-less with only secondaries. However, you then need to find out
+    which secondary took over the temporary role of main node. Look for
+    the `is_main_node` file in the fuzzer directories, e.g. `sync-dir/hostname-*/is_main_node`.
 
 It is *not* advisable to skip the synchronization script and run the fuzzers
 directly on a network filesystem; unexpected latency and unkillable processes
 in I/O wait state can mess things up.
 
-## 5) Remote monitoring and data collection
+## 6) Remote monitoring and data collection
 
 You can use screen, nohup, tmux, or something equivalent to run remote
 instances of afl-fuzz. If you redirect the program's output to a file, it will
 automatically switch from a fancy UI to more limited status reports. There is
-also basic machine-readable information always written to the fuzzer_stats file
-in the output directory. Locally, that information can be interpreted with
-afl-whatsup.
+also basic machine-readable information which is always written to the
+fuzzer_stats file in the output directory. Locally, that information can be
+interpreted with afl-whatsup.
 
 In principle, you can use the status screen of the main (-M) instance to
 monitor the overall fuzzing progress and decide when to stop. In this
@@ -208,7 +225,7 @@ Keep in mind that crashing inputs are *not* automatically propagated to the
 main instance, so you may still want to monitor for crashes fleet-wide
 from within your synchronization or health checking scripts (see afl-whatsup).
 
-## 6) Asymmetric setups
+## 7) Asymmetric setups
 
 It is perhaps worth noting that all of the following is permitted:
 
@@ -224,7 +241,7 @@ It is perhaps worth noting that all of the following is permitted:
     the discovered test cases can have synergistic effects and improve the
     overall coverage.
 
-    (In this case, running one -M instance per each binary is a good plan.)
+    (In this case, running one -M instance per target is necessary.)
 
   - Having some of the fuzzers invoke the binary in different ways.
     For example, 'djpeg' supports several DCT modes, configurable with