diff --git a/docs/parallel_fuzzing.md b/docs/parallel_fuzzing.md
deleted file mode 100644
index 130cb3ce..00000000
--- a/docs/parallel_fuzzing.md
+++ /dev/null
@@ -1,256 +0,0 @@
-# Tips for parallel fuzzing
-
-This document talks about synchronizing afl-fuzz jobs on a single machine or
-across a fleet of systems. See README.md for the general instruction manual.
-
-Note that this document is rather outdated. Please refer to the main document
-section on multiple core usage,
-[fuzzing_in_depth.md:b) Using multiple cores](fuzzing_in_depth.md#b-using-multiple-cores),
-for up-to-date strategies!
-
-## 1) Introduction
-
-Every copy of afl-fuzz will take up one CPU core. This means that on an n-core
-system, you can almost always run around n concurrent fuzzing jobs with
-virtually no performance hit (you can use the afl-gotcpu tool to make sure).
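-
-For example, before starting more jobs you can quickly check how many cores the
-box has and whether there is still idle CPU to claim (`nproc` is a standard
-coreutils tool, not part of AFL++):
-
-```
-nproc          # number of available cores
-./afl-gotcpu   # measures current utilization and reports spare capacity
-```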
-
-In fact, if you rely on just a single job on a multi-core system, you will be
-underutilizing the hardware. So, parallelization is always the right way to go.
-
-When targeting multiple unrelated binaries or using the tool in
-"non-instrumented" (-n) mode, it is perfectly fine to just start up several
-fully separate instances of afl-fuzz. The picture gets more complicated when you
-want to have multiple fuzzers hammering a common target: if a hard-to-hit but
-interesting test case is synthesized by one fuzzer, the remaining instances will
-not be able to use that input to guide their work.
-
-To help with this problem, afl-fuzz offers a simple way to synchronize test
-cases on the fly.
-
-It is a good idea to use different power schedules if you run several instances
-in parallel (`-p` option).
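-
-For example, the parallel invocations shown in section 2 below could each be
-given a different schedule (`fast`, `explore` and `exploit` are standard AFL++
-schedule names; see `afl-fuzz -h` for the full list in your version):
-
-```
-./afl-fuzz -i testcase_dir -o sync_dir -M fuzzer01 -p exploit [...other stuff...]
-./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer02 -p fast [...other stuff...]
-./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer03 -p explore [...other stuff...]
-```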
-
-Alternatively, running other AFL spinoffs in parallel can be of value, e.g.
-Angora (https://github.com/AngoraFuzzer/Angora/).
-
-## 2) Single-system parallelization
-
-If you wish to parallelize a single job across multiple cores on a local system,
-simply create a new, empty output directory ("sync dir") that will be shared by
-all the instances of afl-fuzz; and then come up with a naming scheme for every
-instance - say, "fuzzer01", "fuzzer02", etc.
-
-Run the first one ("main node", -M) like this:
-
-```
-./afl-fuzz -i testcase_dir -o sync_dir -M fuzzer01 [...other stuff...]
-```
-
-...and then, start up secondary (-S) instances like this:
-
-```
-./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer02 [...other stuff...]
-./afl-fuzz -i testcase_dir -o sync_dir -S fuzzer03 [...other stuff...]
-```
-
-Each fuzzer will keep its state in a separate subdirectory, like so:
-
-  /path/to/sync_dir/fuzzer01/
-
-Each instance will also periodically rescan the top-level sync directory for
-any test cases found by other fuzzers, and will incorporate them into its own
-fuzzing when they are deemed interesting enough. For performance reasons, only
-the -M main node syncs its queue with everyone; the -S secondary nodes will
-only sync from the main node.
-
-The difference between the -M and -S modes is that the main instance will
-still perform deterministic checks, while the secondary instances proceed
-straight to random tweaks.
-
-Note that you should always have exactly one -M main instance! Running
-multiple -M instances is wasteful, unless you are splitting the deterministic
-stage as described in section 3 below.
-
-You can also monitor the progress of your jobs from the command line with the
-provided afl-whatsup tool. When the instances are no longer finding new paths,
-it's probably time to stop.
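-
-For example (`-s` prints a condensed summary instead of per-instance details
-in current AFL++ versions):
-
-```
-./afl-whatsup sync_dir
-./afl-whatsup -s sync_dir
-```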
-
-WARNING: Exercise caution when explicitly specifying the -f option. Each fuzzer
-must use a separate temporary file; otherwise, things will go south. One safe
-example may be:
-
-```
-./afl-fuzz [...] -S fuzzer10 -f file10.txt ./fuzzed/binary @@
-./afl-fuzz [...] -S fuzzer11 -f file11.txt ./fuzzed/binary @@
-./afl-fuzz [...] -S fuzzer12 -f file12.txt ./fuzzed/binary @@
-```
-
-This is not a concern if you use @@ without -f and let afl-fuzz come up with the
-file name.
-
-## 3) Multiple -M mains
-
-There is support for parallelizing the deterministic checks. This is only
-needed where
-
- 1. many new paths are found fast over a long time and it looks unlikely that
-    the main node will ever catch up, and
- 2. deterministic fuzzing is actively helping path discovery (you can see this
-    in the main node in the first few lines of the "fuzzing strategy yields"
-    section. If the ratio `found/attempts` is high, then it is effective. It
-    most commonly isn't.)
-
-Only if both are true is it beneficial to have more than one main node. You
-can leverage this by creating -M instances like so:
-
-```
-./afl-fuzz -i testcase_dir -o sync_dir -M mainA:1/3 [...]
-./afl-fuzz -i testcase_dir -o sync_dir -M mainB:2/3 [...]
-./afl-fuzz -i testcase_dir -o sync_dir -M mainC:3/3 [...]
-```
-
-... where the first value after ':' is the sequential ID of a particular main
-instance (starting at 1), and the second value is the total number of fuzzers to
-distribute the deterministic fuzzing across. Note that if you boot up fewer
-fuzzers than indicated by the second number passed to -M, you may end up with
-poor coverage.
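-
-A minimal sketch for starting such a split, assuming you simply background
-each instance and log its output (screen or tmux, see section 6, work just as
-well):
-
-```sh
-for i in 1 2 3; do
-  # main$i handles slice $i of 3 of the deterministic stage
-  nohup ./afl-fuzz -i testcase_dir -o sync_dir -M main$i:$i/3 \
-    ./fuzzed/binary @@ > main$i.log 2>&1 &
-done
-```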
-
-## 4) Syncing with non-AFL fuzzers or independent instances
-
-A -M main node can be told with the `-F other_fuzzer_queue_directory` option to
-sync results from other fuzzers, e.g. libfuzzer or honggfuzz.
-
-Only the specified directory will be synced into afl, not its subdirectories.
-The specified directory does not need to exist yet when afl-fuzz is started.
-
-The `-F` option can be passed to the main node several times.
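-
-For example, a main node that also pulls in a libfuzzer and a honggfuzz corpus
-could be started like this (the two corpus paths are placeholders):
-
-```
-./afl-fuzz -i testcase_dir -o sync_dir -M fuzzer01 \
-  -F /path/to/libfuzzer_corpus -F /path/to/honggfuzz_corpus [...other stuff...]
-```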
-
-## 5) Multi-system parallelization
-
-The basic operating principle for multi-system parallelization is similar to the
-mechanism explained in section 2. The key difference is that you need to write a
-simple script that performs two actions:
-
-  - Uses SSH with authorized_keys to connect to every machine and retrieve a
-    tar archive of the /path/to/sync_dir/<main_node(s)> directory local to the
-    machine. It is best to use a naming scheme that includes the host name and
-    marks the main node(s) (e.g. main1, main2) in the fuzzer ID, so that you
-    can do something like:
-
-    ```sh
-    for host in `cat HOSTLIST`; do
-      ssh user@$host "tar -czf - sync/${host}_main*/" > $host.tgz
-    done
-    ```
-
-  - Distributes and unpacks these files on all the remaining machines, e.g.:
-
-    ```sh
-    for srchost in `cat HOSTLIST`; do
-      for dsthost in `cat HOSTLIST`; do
-        test "$srchost" = "$dsthost" && continue
-        ssh user@$dsthost 'tar -kxzf -' < $srchost.tgz
-      done
-    done
-    ```
-
-There is an example of such a script in utils/distributed_fuzzing/.
-
-There are other, older and more feature-rich but experimental tools:
-  * https://github.com/richo/roving
-  * https://github.com/MartijnB/disfuzz-afl
-
-However, these do not support syncing just the main nodes (yet).
-
-When developing custom test case sync code, there are several optimizations to
-keep in mind:
-
-  - The synchronization does not have to happen very often; running the task
-    every 60 minutes, or even less often at later fuzzing stages, is fine.
-
-  - There is no need to synchronize crashes/ or hangs/; you only need to copy
-    over queue/* (and ideally, also fuzzer_stats).
-
-  - It is not necessary (and not advisable!) to overwrite existing files; the -k
-    option in tar is a good way to avoid that.
-
-  - There is no need to fetch directories for fuzzers that are not running
-    locally on a particular machine, and were simply copied over onto that
-    system during earlier runs.
-
-  - For large fleets, you will want to consolidate tarballs for each host, as
-    this will let you use n SSH connections for sync, rather than n*(n-1).
-
-    You may also want to implement staged synchronization. For example, you
-    could have 10 groups of systems, with group 1 pushing test cases only to
-    group 2; group 2 pushing them only to group 3; and so on, with group 10
-    eventually feeding back to group 1.
-
-    This arrangement would allow interesting test cases to propagate across
-    the fleet without having to copy every fuzzer queue to every single host.
-
-  - You do not want a "main" instance of afl-fuzz on every system; you should
-    run them all with -S, and just designate a single process somewhere within
-    the fleet to run with -M.
-
-  - Syncing is only necessary for the main nodes on a system. It is possible
-    to run main-less with only secondaries. However, then you need to find out
-    which secondary temporarily took over the role of the main node. Look for
-    the `is_main_node` file in the fuzzer directories, e.g.
-    `sync-dir/hostname-*/is_main_node` (see the sketch after this list).
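-
-A minimal sketch for that last point, reusing the HOSTLIST/ssh convention from
-the scripts above (the `sync/` path is the same assumption as in those
-scripts):
-
-```sh
-# Print the is_main_node marker path of whichever secondary currently acts
-# as the main node on each machine, so the sync script knows what to pull.
-for host in `cat HOSTLIST`; do
-  echo "$host:"
-  ssh user@$host 'ls sync/*/is_main_node 2>/dev/null'
-done
-```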
-
-It is *not* advisable to skip the synchronization script and run the fuzzers
-directly on a network filesystem; unexpected latency and unkillable processes in
-I/O wait state can mess things up.
-
-## 6) Remote monitoring and data collection
-
-You can use screen, nohup, tmux, or something equivalent to run remote instances
-of afl-fuzz. If you redirect the program's output to a file, it will
-automatically switch from a fancy UI to more limited status reports. There is
-also basic machine-readable information which is always written to the
-fuzzer_stats file in the output directory. Locally, that information can be
-interpreted with afl-whatsup.
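-
-If you want to pull out a specific metric yourself instead of going through
-afl-whatsup, fuzzer_stats is a plain-text list of "key : value" lines; a rough
-sketch (the exact set of keys varies between versions, but `execs_per_sec` is
-commonly available):
-
-```sh
-for f in sync_dir/*/fuzzer_stats; do
-  echo "$f: $(grep execs_per_sec $f)"
-done
-```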
-
-In principle, you can use the status screen of the main (-M) instance to monitor
-the overall fuzzing progress and decide when to stop. In this mode, the most
-important signal is simply that no new paths have been found for a long while.
-If you do not have a main instance, just pick any single secondary instance to
-watch and go by that.
-
-You can also rely on that instance's output directory to collect the synthesized
-corpus that covers all the noteworthy paths discovered anywhere within the
-fleet. Secondary (-S) instances do not require any special monitoring, other
-than just making sure that they are up.
-
-Keep in mind that crashing inputs are *not* automatically propagated to the main
-instance, so you may still want to monitor for crashes fleet-wide from within
-your synchronization or health checking scripts (see afl-whatsup).
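-
-A rough sketch of such a check, again reusing the HOSTLIST convention from
-section 5 (crashing inputs are saved as id:... files in each instance's
-crashes/ directory):
-
-```sh
-for host in `cat HOSTLIST`; do
-  crashes=`ssh user@$host 'find sync/*/crashes -name "id:*" 2>/dev/null | wc -l'`
-  echo "$host: $crashes crash files"
-done
-```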
-
-## 7) Asymmetric setups
-
-It is perhaps worth noting that all of the following is permitted:
-
-  - Running afl-fuzz in conjunction with other guided tools that can extend
-    coverage (e.g., via concolic execution). Third-party tools simply need to
-    follow the protocol described above for pulling new test cases from
-    out_dir/<fuzzer_id>/queue/* and writing their own finds to sequentially
-    numbered id:nnnnnn files in out_dir/<ext_tool_id>/queue/* (a minimal
-    sketch follows this list).
-
-  - Running some of the synchronized fuzzers with different (but related) target
-    binaries. For example, simultaneously stress-testing several different JPEG
-    parsers (say, IJG jpeg and libjpeg-turbo) while sharing the discovered test
-    cases can have synergistic effects and improve the overall coverage.
-
-    (In this case, running one -M instance per target is necessary.)
-
-  - Having some of the fuzzers invoke the binary in different ways. For example,
-    'djpeg' supports several DCT modes, configurable with a command-line flag,
-    while 'dwebp' supports incremental and one-shot decoding. In some scenarios,
-    going after multiple distinct modes and then pooling test cases will improve
-    coverage.
-
-  - Much less convincingly, running the synchronized fuzzers with different
-    starting test cases (e.g., progressive and standard JPEG) or dictionaries.
-    The synchronization mechanism ensures that the test sets will get fairly
-    homogeneous over time, but it introduces some initial variability.
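-
-For the first point, a minimal sketch of the "export" half of that protocol;
-the corpus path and the `ext_tool` ID are placeholders, and a real
-implementation would want to track which files were already exported:
-
-```sh
-EXT_CORPUS=/path/to/external_tool_corpus
-QUEUE=out_dir/ext_tool/queue
-mkdir -p $QUEUE
-i=0
-for f in $EXT_CORPUS/*; do
-  test -f "$f" || continue
-  # sequentially numbered id:nnnnnn entries, as expected by afl-fuzz
-  cp "$f" $QUEUE/`printf 'id:%06d' $i`
-  i=$((i + 1))
-done
-```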