author van Hauser <vh@thc.org> 2024-02-01 15:13:07 +0100
committer GitHub <noreply@github.com> 2024-02-01 14:13:07 +0000
commit eda770fd32b804e3ebd6a43738c0002f6118a463 (patch)
tree ef1db3e42bd23e9627ad695f9a65f1e7c5b951b0 /utils
parent 0c054f520eda67b7bb15f95ca58c028e9b68131f (diff)
download afl++-eda770fd32b804e3ebd6a43738c0002f6118a463.tar.gz
push to stable (#1967)
* Output afl-clang-fast stuffs only if necessary (#1912)

* afl-cc header

* afl-cc common declarations

 - Add afl-cc-state.c
 - Strip includes, find_object, debug/be_quiet/have_*/callname setting from afl-cc.c
 - Use debugf_args in main
 - Modify execvp stuffs to fit new aflcc struct

* afl-cc show usage

* afl-cc mode selecting

1. compiler_mode by callname in argv[0]
2. compiler_mode by env "AFL_CC_COMPILER"
3. compiler_mode/instrument_mode by command line options "--afl-..."
4. instrument_mode/compiler_mode by various env vars including "AFL_LLVM_INSTRUMENT"
5. final checking steps
6. print "... - mode: %s-%s\n"
7. determine real argv[0] according to compiler_mode

* afl-cc macro defs

* afl-cc linking behaviors

* afl-cc fsanitize behaviors

* afl-cc misc

* afl-cc body update

* afl-cc all-in-one

formatted with custom-format.py

* nits

---------

Co-authored-by: vanhauser-thc <vh@thc.org>

* changelog

* update grammar mutator

* lto llvm 12+

* docs(custom_mutators): fix missing ':' (#1953)

* Fix broken LTO mode and response file support (#1948)

* Strip `-Wl,-no-undefined` during compilation (#1952)

Make the compiler wrapper strip `-Wl,-no-undefined` in addition to `-Wl,--no-undefined`.
Both versions of the flag are accepted by clang and are therefore used by build systems in the wild (e.g., samba will not build without this fix).

* Remove dead code in write_to_testcase (#1955)

The custom_mutators_count check inside the if branch duplicates the if condition itself.
In the else branch custom_mutators_count == 0, so neither the custom_mutator_list iteration nor the sent check is needed.

Signed-off-by: Xeonacid <h.dwwwwww@gmail.com>

* update qemuafl

* WIP: Add ability to generate drcov trace using QEMU backend (#1956)

* Document new drcov QEMU plugin

* Add link to lightkeeper for QEMU drcov file loading

---------

Co-authored-by: Jean-Romain Garnier <jean-romain.garnier@airbus.com>

* code format

* changelog

* sleep on uid != 0 afl-system-config

* fix segv about skip_next, warn on unsupported cases of linking options (#1958)

* todos

* ensure afl-cc only allows available compiler modes

* update grammar mutator

* disable aslr on apple

* fix for arm64

* help selective instrumentation

* typos

* macos

* add compiler test script

* apple fixes

* bump nyx submodules (#1963)

* fix docs

* update changelog

* update grammar mutator

* improve compiler test script

* gcc asan workaround (#1966)

* fix github merge fuckup

* fix

* Fix afl-cc (#1968)

- Check whether there are too many cmdline params each time before inserting a new param.
 - Check whether it is "-fsanitize=..." before doing anything.
 - Remove improper param_st transfer.

* Avoid adding llvmnative instrumentation when linking rust sanitizer runtime (#1969)

* Dynamic instrumentation filtering for LLVM native (#1971)

* Add two dynamic instrumentation filter methods to runtime

* Always use pc-table with native pcguard

* Add make_symbol_list.py and README

* changelog

* todos

* new forkserver check

* fix

* nyx test for CI

* improve nyx docs

* Fixes to afl-cc and documentation (#1974)

* Always compile with -ldl when building for CODE_COVERAGE

When building with CODE_COVERAGE, the afl runtime contains code that
calls `dladdr` which requires -ldl. Under most circumstances, clang
already adds this (e.g. when building with pc-table), but there are some
circumstances where it isn't added automatically.

* Add visibility declaration to __afl_connected

When building with hidden visibility, the use of __AFL_LOOP inside such
code can cause linker errors due to __afl_connected being declared
"hidden".

* Update docs to clarify that CODE_COVERAGE=1 is required for dynamic_covfilter

* nits

* nyx build script updates

* test error output

* debug ci

* debug ci

* Improve afl-cc (#1975)

* update response file support

 - full support of rsp file
 - fix some segv issues

* Improve afl-cc

 - remove dead code about allow/denylist options of sancov
 - missing `if (!aflcc->have_msan)`
 - add docs for each function
 - typo

* enable nyx

* debug ci

* debug ci

* debug ci

* debug ci

* debug ci

* debug ci

* debug ci

* debug ci

* fix ci

* clean test script

* NO_NYX

* NO_NYX

* fix ci

* debug ci

* fix ci

* finalize ci fix

---------

Signed-off-by: Xeonacid <h.dwwwwww@gmail.com>
Co-authored-by: Sonic <50692172+SonicStark@users.noreply.github.com>
Co-authored-by: Xeonacid <h.dwwwwww@gmail.com>
Co-authored-by: Nils Bars <nils.bars@rub.de>
Co-authored-by: Jean-Romain Garnier <7504819+JRomainG@users.noreply.github.com>
Co-authored-by: Jean-Romain Garnier <jean-romain.garnier@airbus.com>
Co-authored-by: Sergej Schumilo <sergej@schumilo.de>
Co-authored-by: Christian Holler (:decoder) <choller@mozilla.com>
Diffstat (limited to 'utils')
-rw-r--r-- utils/dynamic_covfilter/README.md 60
-rw-r--r-- utils/dynamic_covfilter/make_symbol_list.py 73
2 files changed, 133 insertions, 0 deletions
diff --git a/utils/dynamic_covfilter/README.md b/utils/dynamic_covfilter/README.md
new file mode 100644
index 00000000..381e0855
--- /dev/null
+++ b/utils/dynamic_covfilter/README.md
@@ -0,0 +1,60 @@
+# Dynamic Instrumentation Filter
+
+Sometimes it can be beneficial to limit the instrumentation feedback to
+specific code locations. It is possible to do so at compile-time by simply
+not instrumenting any undesired locations. However, there are situations
+where doing this dynamically without requiring a new build can be beneficial.
+Especially when dealing with larger builds, it is much more convenient to
+select the target code locations at runtime instead of doing so at build time.
+
+There are two ways of doing this in AFL++. Both approaches require a build of
+AFL++ with `CODE_COVERAGE=1`, so make sure to build AFL++ first by invoking
+
+`CODE_COVERAGE=1 make`
+
+Once you have built AFL++, you can choose between two approaches:
+
+## Simple Selection with `AFL_PC_FILTER`
+
+This approach requires a build with `AFL_INSTRUMENTATION=llvmnative` or
+`llvmcodecov` as well as an AddressSanitizer build with debug information.
+
+By setting the environment variable `AFL_PC_FILTER` to a string, the runtime
+symbolizer is enabled in the AFL++ runtime. At startup, the runtime will call
+the `__sanitizer_symbolize_pc` API to resolve every PC in every loaded module.
+The runtime then matches the result using `strstr` and disables the PC guard
+if the symbolized PC does not contain the specified string.
+
+This approach has the benefit of being very easy to use. The downside is that
+it causes significant startup delays with large binaries and that it requires
+an AddressSanitizer build.
+
+This method has no additional runtime overhead after startup.
+
+## Selection using pre-symbolized data file with `AFL_PC_FILTER_FILE`
+
+To avoid large startup time delays, a specific module can be pre-symbolized
+using the `make_symbol_list.py` script. This script outputs a sorted list of
+functions with their respective relative offsets and lengths in the target
+binary:
+
+`python3 make_symbol_list.py libxul.so > libxul.symbols.txt`
+
+The resulting list can be filtered, e.g. using grep:
+
+`grep -i "webgl" libxul.symbols.txt > libxul.webgl.symbols.txt`
+
+Finally, you can run with `AFL_PC_FILTER_FILE=libxul.webgl.symbols.txt` to
+restrict instrumentation feedback to the given locations. This approach only
+has a minimal startup time delay due to the implementation only using binary
+search on the given file per PC rather than reading debug information for every
+PC. It also works well with Nyx, where symbolizing is usually disabled for the
+target process to avoid delays with frequent crashes.
+
+Similar to the previous method, this approach requires a build with
+`AFL_INSTRUMENTATION=llvmnative` or `llvmcodecov` as well as debug information.
+However, it does not require the ASan runtime as it doesn't do the symbolizing
+in process. Due to the way it maps PCs to symbols, it is less accurate when it
+comes to includes and inlines (it assumes all PCs within a function belong to
+that function and originate from the same file). For most purposes, this should
+be a reasonable simplification to quickly process even the largest binaries.
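The per-PC lookup that the README describes for `AFL_PC_FILTER_FILE` can be illustrated with a small Python sketch. This is not the AFL++ runtime code, which is not part of this diff. It only assumes the tab-separated layout that `make_symbol_list.py` prints (address, length, module, file, function). The helper names `load_symbol_list` and `find_function` are hypothetical.

```python
import bisect


def load_symbol_list(lines):
    """Parse lines of: hex address, decimal length, module, file, function."""
    entries = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip anything that is not a full symbol record
        start = int(fields[0], 16)
        length = int(fields[1])
        entries.append((start, length, fields[4]))
    entries.sort()  # binary search below needs ascending start addresses
    starts = [entry[0] for entry in entries]
    return starts, entries


def find_function(starts, entries, pc):
    """Return the function whose [start, start + length) range holds pc."""
    idx = bisect.bisect_right(starts, pc) - 1  # last entry starting at or before pc
    if idx < 0:
        return None
    start, length, name = entries[idx]
    return name if pc < start + length else None


# Illustrative records only; real ones come from make_symbol_list.py.
starts, entries = load_symbol_list([
    "0x1000\t32\tlibfoo.so\tfoo.c\tfoo",
    "0x2000\t16\tlibfoo.so\tbar.c\tbar",
])
print(find_function(starts, entries, 0x1010))
```

A PC that falls into no listed range yields `None`, which corresponds to a location whose feedback would be filtered out. The sketch also shows the simplification the README mentions: every PC inside a function's range is attributed to that function, regardless of inlining.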
diff --git a/utils/dynamic_covfilter/make_symbol_list.py b/utils/dynamic_covfilter/make_symbol_list.py
new file mode 100644
index 00000000..d1dd6ab3
--- /dev/null
+++ b/utils/dynamic_covfilter/make_symbol_list.py
@@ -0,0 +1,73 @@
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#
+# Written by Christian Holler <decoder at mozilla dot com>
+
+import json
+import os
+import sys
+import subprocess
+
+if len(sys.argv) != 2:
+    print("Usage: %s binfile" % os.path.basename(sys.argv[0]))
+    sys.exit(1)
+
+binfile = sys.argv[1]
+
+addr2len = {}
+addrs = []
+
+output = subprocess.check_output(["objdump", "-t", binfile]).decode("utf-8")
+for line in output.splitlines():
+    line = line.replace("\t", " ")
+    components = [x for x in line.split(" ") if x]
+    if not components:
+        continue
+    try:
+        start_addr = int(components[0], 16)
+    except ValueError:
+        continue
+
+    # Length has variable position in objdump output
+    length = None
+    for comp in components[1:]:
+        if len(comp) == 16:
+            try:
+                length = int(comp, 16)
+                break
+            except ValueError:
+                continue
+
+    if length is None:
+        # Report on stderr so the symbol list on stdout stays clean.
+        print("ERROR: Couldn't determine function section length: %s" % line, file=sys.stderr)
+        continue
+
+    addrs.append(start_addr)
+    addr2len[str(hex(start_addr))] = str(length)
+
+# The search implementation in the AFL runtime expects everything sorted.
+addrs.sort()
+addrs = [str(hex(addr)) for addr in addrs]
+
+# We symbolize in one go to speed things up with large binaries.
+output = subprocess.check_output([
+    "llvm-addr2line",
+    "--output-style=JSON",
+    "-f", "-C", "-a", "-e",
+    binfile],
+    input="\n".join(addrs).encode("utf-8")).decode("utf-8")
+
+lines = output.strip().splitlines()
+for line in lines:
+    entry = json.loads(line)
+    if "Symbol" in entry and entry["Address"] in addr2len:
+        final_output = [
+            entry["Address"],
+            addr2len[entry["Address"]],
+            os.path.basename(entry["ModuleName"]),
+            entry["Symbol"][0]["FileName"],
+            entry["Symbol"][0]["FunctionName"]
+        ]
+        print("\t".join(final_output))
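The `objdump -t` parsing step in the script above can be tried in isolation. The sketch below pulls the same heuristic into a standalone function: the first column is the start address, and the first later column that is 16 characters of hex is taken as the length. The function name `parse_symbol_line` and the sample line are illustrative assumptions, and the 16-character check presumes 64-bit `objdump` output, as the script does.

```python
def parse_symbol_line(line):
    """Return (start_addr, length) for one `objdump -t` line, or None."""
    components = line.replace("\t", " ").split()
    if not components:
        return None
    try:
        start_addr = int(components[0], 16)
    except ValueError:
        return None  # header lines such as "SYMBOL TABLE:" end up here

    # The length column has no fixed position, so scan for a 16-digit hex field.
    for comp in components[1:]:
        if len(comp) == 16:
            try:
                return start_addr, int(comp, 16)
            except ValueError:
                continue
    return None


# Hypothetical symbol table line in the usual 64-bit ELF layout.
sample = "0000000000001139 g     F .text\t0000000000000016              main"
print(parse_symbol_line(sample))  # (4409, 22)
```

Returning `None` for lines without a recognizable length mirrors skipping them, so no record with a missing length reaches the symbol list.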