diff options
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 224 |
1 files changed, 181 insertions, 43 deletions
diff --git a/README.md b/README.md index ba612edb..2facedb6 100644 --- a/README.md +++ b/README.md @@ -38,7 +38,6 @@ With afl++ 3.10 we introduced the following changes from previous behaviours: With afl++ 3.00 we introduced changes that break some previous afl and afl++ behaviours and defaults: - * There are no llvm_mode and gcc_plugin subdirectories anymore and there is only one compiler: afl-cc. All previous compilers now symlink to this one. All instrumentation source code is now in the `instrumentation/` folder. @@ -50,8 +49,8 @@ behaviours and defaults: shared libraries, etc. Additionally QEMU 5.1 supports more CPU targets so this is really worth it. * When instrumenting targets, afl-cc will not supersede optimizations anymore - if any were given. This allows to fuzz targets as same as they are built - for debug or release. + if any were given. This allows to fuzz targets build regularly like those + for debug or release versions. * afl-fuzz: * if neither -M or -S is specified, `-S default` is assumed, so more fuzzers can easily be added later @@ -383,13 +382,62 @@ afl++ performs "never zero" counting in its bitmap. You can read more about this here: * [instrumentation/README.neverzero.md](instrumentation/README.neverzero.md) -#### c) Modify the target +#### c) Sanitizers + +It is possible to use sanitizers when instrumenting targets for fuzzing, +which allows you to find bugs that would not necessarily result in a crash. + +Note that sanitizers have a huge impact on CPU (= less executions per second) +and RAM usage. Also you should only run one afl-fuzz instance per sanitizer type. +This is enough because a use-after-free bug will be picked up, e.g. by +ASAN (address sanitizer) anyway when syncing to other fuzzing instances, +so not all fuzzing instances need to be instrumented with ASAN. + +The following sanitizers have built-in support in afl++: + * ASAN = Address SANitizer, finds memory corruption vulnerabilities like + use-after-free, NULL pointer dereference, buffer overruns, etc. + Enabled with `export AFL_USE_ASAN=1` before compiling. + * MSAN = Memory SANitizer, finds read access to uninitialized memory, eg. + a local variable that is defined and read before it is even set. + Enabled with `export AFL_USE_MSAN=1` before compiling. + * UBSAN = Undefined Behaviour SANitizer, finds instances where - by the + C and C++ standards - undefined behaviour happens, e.g. adding two + signed integers together where the result is larger than a signed integer + can hold. + Enabled with `export AFL_USE_UBSAN=1` before compiling. + * CFISAN = Control Flow Integrity SANitizer, finds instances where the + control flow is found to be illegal. Originally this was rather to + prevent return oriented programming exploit chains from functioning, + in fuzzing this is mostly reduced to detecting type confusion + vulnerabilities - which is however one of the most important and dangerous + C++ memory corruption classes! + Enabled with `export AFL_USE_CFISAN=1` before compiling. + * LSAN = Leak SANitizer, finds memory leaks in a program. This is not really + a security issue, but for developers this can be very valuable. + Note that unlike the other sanitizers above this needs + `__AFL_LEAK_CHECK();` added to all areas of the target source code where you + find a leak check necessary! + Enabled with `export AFL_USE_LSAN=1` before compiling. + +It is possible to further modify the behaviour of the sanitizers at run-time +by setting `ASAN_OPTIONS=...`, `LSAN_OPTIONS` etc. - the available parameters +can be looked up in the sanitizer documentation of llvm/clang. +afl-fuzz however requires some specific parameters important for fuzzing to be +set. If you want to set your own, it might bail and report what it is missing. + +Note that some sanitizers cannot be used together, e.g. ASAN and MSAN, and +others often cannot work together because of target weirdness, e.g. ASAN and +CFISAN. You might need to experiment which sanitizers you can combine in a +target (which means more instances can be run without a sanitized target, +which is more effective). + +#### d) Modify the target If the target has features that make fuzzing more difficult, e.g. -checksums, HMAC, etc. then modify the source code so that this is -removed. -This can even be done for operational source code by eliminating -these checks within this specific defines: +checksums, HMAC, etc. then modify the source code so that checks for these +values are removed. +This can even be done safely for source code used in operational products +by eliminating these checks within these AFL specific blocks: ``` #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION @@ -401,7 +449,7 @@ these checks within this specific defines: All afl++ compilers will set this preprocessor definition automatically. -#### d) Instrument the target +#### e) Instrument the target In this step the target source code is compiled so that it can be fuzzed. @@ -416,8 +464,8 @@ Then build the target. (Usually with `make`) 1. sometimes configure and build systems are fickle and do not like stderr output (and think this means a test failure) - which is something - afl++ likes to do to show statistics. It is recommended to disable them via - `export AFL_QUIET=1`. + afl++ likes to do to show statistics. It is recommended to disable afl++ + instrumentation reporting via `export AFL_QUIET=1`. 2. sometimes configure and build systems error on warnings - these should be disabled (e.g. `--disable-werror` for some configure scripts). @@ -458,20 +506,38 @@ non-standard way to set this, otherwise set up the build normally and edit the generated build environment afterwards manually to point it to the right compiler (and/or ranlib and ar). -#### d) Better instrumentation +#### f) Better instrumentation If you just fuzz a target program as-is you are wasting a great opportunity for much more fuzzing speed. -This requires the usage of afl-clang-lto or afl-clang-fast. +This variant requires the usage of afl-clang-lto, afl-clang-fast or afl-gcc-fast. -This is the so-called `persistent mode`, which is much, much faster but +It is the so-called `persistent mode`, which is much, much faster but requires that you code a source file that is specifically calling the target functions that you want to fuzz, plus a few specific afl++ functions around it. See [instrumentation/README.persistent_mode.md](instrumentation/README.persistent_mode.md) for details. Basically if you do not fuzz a target in persistent mode then you are just -doing it for a hobby and not professionally :-) +doing it for a hobby and not professionally :-). + +#### g) libfuzzer fuzzer harnesses with LLVMFuzzerTestOneInput() + +libfuzzer `LLVMFuzzerTestOneInput()` harnesses are the defacto standard +for fuzzing, and they can be used with afl++ (and honggfuzz) as well! +Compiling them is as simple as: +``` +afl-clang-fast++ -fsanitize=fuzzer -o harness harness.cpp targetlib.a +``` +You can even use advanced libfuzzer features like `FuzzedDataProvider`, +`LLVMFuzzerMutate()` etc. and they will work! + +The generated binary is fuzzed with afl-fuzz like any other fuzz target. + +Bonus: the target is already optimized for fuzzing due to persistent mode and +shared-memory testcases and hence gives you the fastest speed possible. + +For more information see [utils/aflpp_driver/README.md](utils/aflpp_driver/README.md) ### 2. Preparing the fuzzing campaign @@ -503,6 +569,8 @@ Note that the INPUTFILE argument that the target program would read from has to If the target reads from stdin instead, just omit the `@@` as this is the default. +This step is highly recommended! + #### c) Minimizing all corpus files The shorter the input files that still traverse the same path @@ -518,7 +586,8 @@ for i in *; do done ``` -This step can also be parallelized, e.g. with `parallel` +This step can also be parallelized, e.g. with `parallel`. +Note that this step is rather optional though. #### Done! @@ -554,6 +623,16 @@ step [2a. Collect inputs](#a-collect-inputs): `afl-fuzz -i input -o output -- bin/target -d @@` Note that the directory specified with -o will be created if it does not exist. +It can be valuable to run afl-fuzz in a screen or tmux shell so you can log off, +or afl-fuzz is not aborted if you are running it in a remote ssh session where +the connection fails in between. +Only do that though once you have verified that your fuzzing setup works! +Simply run it like `screen -dmS afl-main -- afl-fuzz -M main-$HOSTNAME -i ...` +and it will start away in a screen session. To enter this session simply type +`screen -r afl-main`. You see - it makes sense to name the screen session +same as the afl-fuzz -M/-S naming :-) +For more information on screen or tmux please check their documentation. + If you need to stop and re-start the fuzzing, use the same command line options (or even change them by selecting a different power schedule or another mutation mode!) and switch the input directory with a dash (`-`): @@ -572,8 +651,15 @@ to use afl-clang-lto as the compiler. You also have the option to generate a dictionary yourself, see [utils/libtokencap/README.md](utils/libtokencap/README.md). afl-fuzz has a variety of options that help to workaround target quirks like -specific locations for the input file (`-f`), not performing deterministic -fuzzing (`-d`) and many more. Check out `afl-fuzz -h`. +specific locations for the input file (`-f`), performing deterministic +fuzzing (`-D`) and many more. Check out `afl-fuzz -h`. + +We highly recommend that you set a memory limit for running the target with `-m` +which defines the maximum memory in MB. This prevents a potential +out-of-memory problem for your system plus helps you detect missing `malloc()` +failure handling in the target. +Play around with various -m values until you find one that safely works for all +your input seeds (if you have good ones and then double or quadrouple that. By default afl-fuzz never stops fuzzing. To terminate afl++ simply press Control-C or send a signal SIGINT. You can limit the number of executions or approximate runtime @@ -600,7 +686,7 @@ of the testcases. Depending on the average testcase size (and those found during fuzzing) and their number, a value between 50-500MB is recommended. You can set the cache size (in MB) by setting the environment variable `AFL_TESTCACHE_SIZE`. -There should be one main fuzzer (`-M main` option) and as many secondary +There should be one main fuzzer (`-M main-$HOSTNAME` option) and as many secondary fuzzers (eg `-S variant1`) as you have cores that you use. Every -M/-S entry needs a unique name (that can be whatever), however the same -o output directory location has to be used for all instances. @@ -614,23 +700,29 @@ For every secondary fuzzer there should be a variation, e.g.: * one to three fuzzers should fuzz a target compiled with laf-intel/COMPCOV (see above). Important note: If you run more than one laf-intel/COMPCOV fuzzer and you want them to share their intermediate results, the main - fuzzer (`-M`) must be one of the them! + fuzzer (`-M`) must be one of the them! (Although this is not really + recommended.) All other secondaries should be used like this: - * A third to a half with the MOpt mutator enabled: `-L 0` - * run with a different power schedule, available are: - `fast (default), explore, coe, lin, quad, exploit, mmopt, rare, seek` - which you can set with e.g. `-p seek` + * A quarter to a third with the MOpt mutator enabled: `-L 0` + * run with a different power schedule, recommended are: + `fast (default), explore, coe, lin, quad, exploit and rare` + which you can set with e.g. `-p explore` + * a few instances should use the old queue cycling with `-Z` Also it is recommended to set `export AFL_IMPORT_FIRST=1` to load testcases from other fuzzers in the campaign first. +If you have a large corpus, a corpus from a previous run or are fuzzing in +a CI, then also set `export AFL_CMPLOG_ONLY_NEW=1` and `export AFL_FAST_CAL=1`. + You can also use different fuzzers. If you are using afl spinoffs or afl conforming fuzzers, then just use the same -o directory and give it a unique `-S` name. Examples are: + * [Fuzzolic](https://github.com/season-lab/fuzzolic) + * [symcc](https://github.com/eurecom-s/symcc/) * [Eclipser](https://github.com/SoftSec-KAIST/Eclipser/) - * [Untracer](https://github.com/FoRTE-Research/UnTracer-AFL) * [AFLsmart](https://github.com/aflsmart/aflsmart) * [FairFuzz](https://github.com/carolemieux/afl-rb) * [Neuzz](https://github.com/Dongdongshe/neuzz) @@ -638,11 +730,52 @@ Examples are: A long list can be found at [https://github.com/Microsvuln/Awesome-AFL](https://github.com/Microsvuln/Awesome-AFL) -However you can also sync afl++ with honggfuzz, libfuzzer with -entropic, etc. +However you can also sync afl++ with honggfuzz, libfuzzer with `-entropic=1`, etc. Just show the main fuzzer (-M) with the `-F` option where the queue/work directory of a different fuzzer is, e.g. `-F /src/target/honggfuzz`. +Using honggfuzz (with `-n 1` or `-n 2`) and libfuzzer in parallel is highly +recommended! + +#### c) Using multiple machines for fuzzing + +Maybe you have more than one machine you want to fuzz the same target on. +Simply start the `afl-fuzz` (and perhaps libfuzzer, honggfuzz, ...) +orchestra as you like, just ensure that your have one and only one `-M` +instance per server, and that its name is unique, hence the recommendation +for `-M main-$HOSTNAME`. + +Now there are three strategies on how you can sync between the servers: + * never: sounds weird, but this makes every server an island and has the + chance the each follow different paths into the target. You can make + this even more interesting by even giving different seeds to each server. + * regularly (~4h): this ensures that all fuzzing campaigns on the servers + "see" the same thing. It is like fuzzing on a huge server. + * in intervals of 1/10th of the overall expected runtime of the fuzzing you + sync. This tries a bit to combine both. have some individuality of the + paths each campaign on a server explores, on the other hand if one + gets stuck where another found progress this is handed over making it + unstuck. + +The syncing process itself is very simple. +As the `-M main-$HOSTNAME` instance syncs to all `-S` secondaries as well +as to other fuzzers, you have to copy only this directory to the other +machines. + +Lets say all servers have the `-o out` directory in /target/foo/out, and +you created a file `servers.txt` which contains the hostnames of all +participating servers, plus you have an ssh key deployed to all of them, +then run: +```bash +for FROM in `cat servers.txt`; do + for TO in `cat servers.txt`; do + rsync -rlpogtz --rsh=ssh $FROM:/target/foo/out/main-$FROM $TO:target/foo/out/ + done +done +``` +You can run this manually, per cron job - as you need it. +There is a more complex and configurable script in `utils/distributed_fuzzing`. -#### c) The status of the fuzz campaign +#### d) The status of the fuzz campaign afl++ comes with the `afl-whatsup` script to show the status of the fuzzing campaign. @@ -651,11 +784,14 @@ Just supply the directory that afl-fuzz is given with the -o option and you will see a detailed status of every fuzzer in that campaign plus a summary. -To have only the summary use the `-s` switch e.g.: `afl-whatsup -s output/` +To have only the summary use the `-s` switch e.g.: `afl-whatsup -s out/` + +If you have multiple servers then use the command after a sync, or you have +to execute this script per server. -#### d) Checking the coverage of the fuzzing +#### e) Checking the coverage of the fuzzing -The `paths found` value is a bad indicator how good the coverage is. +The `paths found` value is a bad indicator for checking how good the coverage is. A better indicator - if you use default llvm instrumentation with at least version 9 - is to use `afl-showmap` with the collect coverage option `-C` on @@ -683,12 +819,13 @@ then terminate it. The main node will pick it up and make it available to the other secondary nodes over time. Set `export AFL_NO_AFFINITY=1` or `export AFL_TRY_AFFINITY=1` if you have no free core. -Note that you in nearly all cases can never reach full coverage. A lot of -functionality is usually behind options that were not activated or fuzz e.g. -if you fuzz a library to convert image formats and your target is the png to -tiff API then you will not touch any of the other library APIs and features. +Note that in nearly all cases you can never reach full coverage. A lot of +functionality is usually dependent on exclusive options that would need individual +fuzzing campaigns each with one of these options set. E.g. if you fuzz a library to +convert image formats and your target is the png to tiff API then you will not +touch any of the other library APIs and features. -#### e) How long to fuzz a target? +#### f) How long to fuzz a target? This is a difficult question. Basically if no new path is found for a long time (e.g. for a day or a week) @@ -700,7 +837,7 @@ Keep the queue/ directory (for future fuzzings of the same or similar targets) and use them to seed other good fuzzers like libfuzzer with the -entropic switch or honggfuzz. -#### f) Improve the speed! +#### g) Improve the speed! * Use [persistent mode](instrumentation/README.persistent_mode.md) (x2-x20 speed increase) * If you do not use shmem persistent mode, use `AFL_TMPDIR` to point the input file on a tempfs location, see [docs/env_variables.md](docs/env_variables.md) @@ -767,25 +904,26 @@ campaigns as these are much shorter runnings. corpus needs to be loaded. * `AFL_CMPLOG_ONLY_NEW` - only perform cmplog on new found paths, not the initial corpus as this very likely has been done for them already. - * Keep the generated corpus, use afl-cmin and reuse it everytime! + * Keep the generated corpus, use afl-cmin and reuse it every time! 2. Additionally randomize the afl++ compilation options, e.g. * 40% for `AFL_LLVM_CMPLOG` * 10% for `AFL_LLVM_LAF_ALL` 3. Also randomize the afl-fuzz runtime options, e.g. - * 60% for `AFL_DISABLE_TRIM` + * 65% for `AFL_DISABLE_TRIM` * 50% use a dictionary generated by `AFL_LLVM_DICT2FILE` - * 50% use MOpt (`-L 0`) + * 40% use MOpt (`-L 0`) * 40% for `AFL_EXPAND_HAVOC_NOW` - * 30% for old queue processing (`-Z`) + * 20% for old queue processing (`-Z`) * for CMPLOG targets, 60% for `-l 2`, 40% for `-l 3` 4. Do *not* run any `-M` modes, just running `-S` modes is better for CI fuzzing. - `-M` enables deterministic fuzzing, old queue handling etc. which is good for - a fuzzing campaign but not good for short CI runs. + `-M` enables old queue handling etc. which is good for a fuzzing campaign but + not good for short CI runs. -How this can look like can e.g. be seen at afl++'s setup in Google's [oss-fuzz](https://github.com/google/oss-fuzz/blob/4bb61df7905c6005000f5766e966e6fe30ab4559/infra/base-images/base-builder/compile_afl#L69). +How this can look like can e.g. be seen at afl++'s setup in Google's [oss-fuzz](https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/compile_afl) +and [clusterfuzz](https://github.com/google/clusterfuzz/blob/master/src/python/bot/fuzzers/afl/launcher.py). ## Fuzzing binary-only targets |