11 files changed, 977 insertions, 0 deletions
diff --git a/qemu_mode/README.qemu b/qemu_mode/README.qemu
new file mode 100644
index 00000000..cf29088b
--- /dev/null
+++ b/qemu_mode/README.qemu
@@ -0,0 +1,125 @@
+=========================================================
+High-performance binary-only instrumentation for afl-fuzz
+=========================================================
+
+  (See ../docs/README for the general instruction manual.)
+
+1) Introduction
+---------------
+
+The code in this directory allows you to build a standalone feature that
+leverages the QEMU "user emulation" mode and allows callers to obtain
+instrumentation output for black-box, closed-source binaries. This mechanism
+can be then used by afl-fuzz to stress-test targets that couldn't be built
+with afl-gcc.
+
+The usual performance cost is 2-5x, which is considerably better than
+seen so far in experiments with tools such as DynamoRIO and PIN.
+
+The idea and much of the implementation comes from Andrew Griffiths.
+
+2) How to use
+-------------
+
+The feature is implemented with a fairly simple patch to QEMU 2.10.0. The
+simplest way to build it is to run ./build_qemu_support.sh. The script will
+download, configure, and compile the QEMU binary for you.
+
+QEMU is a big project, so this will take a while, and you may have to
+resolve a couple of dependencies (most notably, you will definitely need
+libtool and glib2-devel).
+
+Once the binaries are compiled, you can leverage the QEMU tool by calling
+afl-fuzz and all the related utilities with -Q in the command line.
+
+Note that QEMU requires a generous memory limit to run; somewhere around
+200 MB is a good starting point, but considerably more may be needed for
+more complex programs. The default -m limit will be automatically bumped up
+to 200 MB when specifying -Q to afl-fuzz; be careful when overriding this.
+
+In principle, if you set CPU_TARGET before calling ./build_qemu_support.sh,
+you should get a build capable of running non-native binaries (say, you
+can try CPU_TARGET=arm). This is also necessary for running 32-bit binaries
+on a 64-bit system (CPU_TARGET=i386).
+
+Note: if you want the QEMU helper to be installed on your system for all
+users, you need to build it before issuing 'make install' in the parent
+directory.
+
+3) Notes on linking
+-------------------
+
+The feature is supported only on Linux. Supporting BSD may amount to porting
+the changes made to linux-user/elfload.c and applying them to
+bsd-user/elfload.c, but I have not looked into this yet.
+
+The instrumentation follows only the .text section of the first ELF binary
+encountered in the linking process. It does not trace shared libraries. In
+practice, this means two things:
+
+  - Any libraries you want to analyze *must* be linked statically into the
+    executed ELF file (this will usually be the case for closed-source
+    apps).
+
+  - Standard C libraries and other stuff that is wasteful to instrument
+    should be linked dynamically - otherwise, AFL will have no way to avoid
+    peeking into them.
+
+Setting AFL_INST_LIBS=1 can be used to circumvent the .text detection logic
+and instrument every basic block encountered.
+
+4) Benchmarking
+---------------
+
+If you want to compare the performance of the QEMU instrumentation with that of
+afl-gcc compiled code against the same target, you need to build the
+non-instrumented binary with the same optimization flags that are normally
+injected by afl-gcc, and make sure that the bits to be tested are statically
+linked into the binary. A common way to do this would be:
+
+$ CFLAGS="-O3 -funroll-loops" ./configure --disable-shared
+$ make clean all
+
+Comparative measurements of execution speed or instrumentation coverage will be
+fairly meaningless if the optimization levels or instrumentation scopes don't
+match.
+
+5) Gotchas, feedback, bugs
+--------------------------
+
+If you need to fix up checksums or do other cleanup on mutated test cases, see
+experimental/post_library/ for a viable solution.
+
+Do not mix QEMU mode with ASAN, MSAN, or the likes; QEMU doesn't appreciate
+the "shadow VM" trick employed by the sanitizers and will probably just
+run out of memory.
+
+Compared to fully-fledged virtualization, the user emulation mode is *NOT* a
+security boundary. The binaries can freely interact with the host OS. If you
+somehow need to fuzz an untrusted binary, put everything in a sandbox first.
+
+QEMU does not necessarily support all CPU or hardware features that your
+target program may be utilizing. In particular, it does not appear to have
+full support for AVX2 / FMA3. Using binaries for older CPUs, or recompiling them
+with -march=core2, can help.
+
+Beyond that, this is an early-stage mechanism, so fields reports are welcome.
+You can send them to <afl-users@googlegroups.com>.
+
+6) Alternatives: static rewriting
+---------------------------------
+
+Statically rewriting binaries just once, instead of attempting to translate
+them at run time, can be a faster alternative. That said, static rewriting is
+fraught with peril, because it depends on being able to properly and fully model
+program control flow without actually executing each and every code path.
+
+If you want to experiment with this mode of operation, there is a module
+contributed by Aleksandar Nikolich:
+
+  https://github.com/vrtadmin/moflow/tree/master/afl-dyninst
+  https://groups.google.com/forum/#!topic/afl-users/HlSQdbOTlpg
+
+At this point, the author reports the possibility of hiccups with stripped
+binaries. That said, if we can get it to be comparably reliable to QEMU, we may
+decide to switch to this mode, but I had no time to play with it yet.
diff --git a/qemu_mode/build_qemu_support.sh b/qemu_mode/build_qemu_support.sh
new file mode 100755
index 00000000..2c5203cc
--- /dev/null
+++ b/qemu_mode/build_qemu_support.sh
@@ -0,0 +1,204 @@
+#!/bin/sh
+#
+# american fuzzy lop - QEMU build script
+# --------------------------------------
+#
+# Written by Andrew Griffiths <agriffiths@google.com> and
+#            Michal Zalewski <lcamtuf@google.com>
+#
+# Copyright 2015, 2016, 2017 Google Inc. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# This script downloads, patches, and builds a version of QEMU with
+# minor tweaks to allow non-instrumented binaries to be run under
+# afl-fuzz. 
+#
+# The modifications reside in patches/*. The standalone QEMU binary
+# will be written to ../afl-qemu-trace.
+#
+
+
+VERSION="2.10.0"
+QEMU_URL="http://download.qemu-project.org/qemu-${VERSION}.tar.xz"
+QEMU_SHA384="68216c935487bc8c0596ac309e1e3ee75c2c4ce898aab796faa321db5740609ced365fedda025678d072d09ac8928105"
+
+echo "================================================="
+echo "AFL binary-only instrumentation QEMU build script"
+echo "================================================="
+echo
+
+echo "[*] Performing basic sanity checks..."
+
+if [ ! "`uname -s`" = "Linux" ]; then
+
+  echo "[-] Error: QEMU instrumentation is supported only on Linux."
+  exit 1
+
+fi
+
+if [ ! -f "patches/afl-qemu-cpu-inl.h" -o ! -f "../config.h" ]; then
+
+  echo "[-] Error: key files not found - wrong working directory?"
+  exit 1
+
+fi
+
+if [ ! -f "../afl-showmap" ]; then
+
+  echo "[-] Error: ../afl-showmap not found - compile AFL first!"
+  exit 1
+
+fi
+
+
+for i in libtool wget python automake autoconf sha384sum bison iconv; do
+
+  T=`which "$i" 2>/dev/null`
+
+  if [ "$T" = "" ]; then
+
+    echo "[-] Error: '$i' not found, please install first."
+    exit 1
+
+  fi
+
+done
+
+if [ ! -d "/usr/include/glib-2.0/" -a ! -d "/usr/local/include/glib-2.0/" ]; then
+
+  echo "[-] Error: devel version of 'glib2' not found, please install first."
+  exit 1
+
+fi
+
+if echo "$CC" | grep -qF /afl-; then
+
+  echo "[-] Error: do not use afl-gcc or afl-clang to compile this tool."
+  exit 1
+
+fi
+
+echo "[+] All checks passed!"
+
+ARCHIVE="`basename -- "$QEMU_URL"`"
+
+CKSUM=`sha384sum -- "$ARCHIVE" 2>/dev/null | cut -d' ' -f1`
+
+if [ ! "$CKSUM" = "$QEMU_SHA384" ]; then
+
+  echo "[*] Downloading QEMU ${VERSION} from the web..."
+  rm -f "$ARCHIVE"
+  wget -O "$ARCHIVE" -- "$QEMU_URL" || exit 1
+
+  CKSUM=`sha384sum -- "$ARCHIVE" 2>/dev/null | cut -d' ' -f1`
+
+fi
+
+if [ "$CKSUM" = "$QEMU_SHA384" ]; then
+
+  echo "[+] Cryptographic signature on $ARCHIVE checks out."
+
+else
+
+  echo "[-] Error: signature mismatch on $ARCHIVE (perhaps download error?)."
+  exit 1
+
+fi
+
+echo "[*] Uncompressing archive (this will take a while)..."
+
+rm -rf "qemu-${VERSION}" || exit 1
+tar xf "$ARCHIVE" || exit 1
+
+echo "[+] Unpacking successful."
+
+echo "[*] Configuring QEMU for $CPU_TARGET..."
+
+ORIG_CPU_TARGET="$CPU_TARGET"
+
+test "$CPU_TARGET" = "" && CPU_TARGET="`uname -m`"
+test "$CPU_TARGET" = "i686" && CPU_TARGET="i386"
+
+cd qemu-$VERSION || exit 1
+
+echo "[*] Applying patches..."
+
+patch -p1 <../patches/elfload.diff || exit 1
+patch -p1 <../patches/cpu-exec.diff || exit 1
+patch -p1 <../patches/syscall.diff || exit 1
+patch -p1 <../patches/configure.diff || exit 1
+patch -p1 <../patches/memfd.diff || exit 1
+patch -p1 <../patches/translate-all.diff || exit 1
+patch -p1 <../patches/elfload2.diff || exit 1
+
+echo "[+] Patching done."
+
+# --enable-pie seems to give a couple of exec's a second performance
+# improvement, much to my surprise. Not sure how universal this is..
+
+CFLAGS="-O3 -ggdb" ./configure --disable-system \
+  --enable-linux-user --disable-gtk --disable-sdl --disable-vnc \
+  --target-list="${CPU_TARGET}-linux-user" --enable-pie --enable-kvm || exit 1
+
+echo "[+] Configuration complete."
+
+echo "[*] Attempting to build QEMU (fingers crossed!)..."
+
+make || exit 1
+
+echo "[+] Build process successful!"
+
+echo "[*] Copying binary..."
+
+cp -f "${CPU_TARGET}-linux-user/qemu-${CPU_TARGET}" "../../afl-qemu-trace" || exit 1
+
+cd ..
+ls -l ../afl-qemu-trace || exit 1
+
+echo "[+] Successfully created '../afl-qemu-trace'."
+
+if [ "$ORIG_CPU_TARGET" = "" ]; then
+
+  echo "[*] Testing the build..."
+
+  cd ..
+
+  make >/dev/null || exit 1
+
+  gcc test-instr.c -o test-instr || exit 1
+
+  unset AFL_INST_RATIO
+
+  echo 0 | ./afl-showmap -m none -Q -q -o .test-instr0 ./test-instr || exit 1
+  echo 1 | ./afl-showmap -m none -Q -q -o .test-instr1 ./test-instr || exit 1
+
+  rm -f test-instr
+
+  cmp -s .test-instr0 .test-instr1
+  DR="$?"
+
+  rm -f .test-instr0 .test-instr1
+
+  if [ "$DR" = "0" ]; then
+
+    echo "[-] Error: afl-qemu-trace instrumentation doesn't seem to work!"
+    exit 1
+
+  fi
+
+  echo "[+] Instrumentation tests passed. "
+  echo "[+] All set, you can now use the -Q mode in afl-fuzz!"
+
+else
+
+  echo "[!] Note: can't test instrumentation when CPU_TARGET set."
+  echo "[+] All set, you can now (hopefully) use the -Q mode in afl-fuzz!"
+
+fi
+
+exit 0
diff --git a/qemu_mode/patches/afl-qemu-cpu-inl.h b/qemu_mode/patches/afl-qemu-cpu-inl.h
new file mode 100644
index 00000000..f7a32c4c
--- /dev/null
+++ b/qemu_mode/patches/afl-qemu-cpu-inl.h
@@ -0,0 +1,356 @@
+/*
+   american fuzzy lop - high-performance binary-only instrumentation
+   -----------------------------------------------------------------
+
+   Written by Andrew Griffiths <agriffiths@google.com> and
+              Michal Zalewski <lcamtuf@google.com>
+
+   Idea & design very much by Andrew Griffiths.
+
+   TCG instrumentation and block chaining support by Andrea Biondo
+                                      <andrea.biondo965@gmail.com>
+
+   Copyright 2015, 2016, 2017 Google Inc. All rights reserved.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at:
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+   This code is a shim patched into the separately-distributed source
+   code of QEMU 2.10.0. It leverages the built-in QEMU tracing functionality
+   to implement AFL-style instrumentation and to take care of the remaining
+   parts of the AFL fork server logic.
+
+   The resulting QEMU binary is essentially a standalone instrumentation
+   tool; for an example of how to leverage it for other purposes, you can
+   have a look at afl-showmap.c.
+
+ */
+
+#include <sys/shm.h>
+#include "../../config.h"
+
+/***************************
+ * VARIOUS AUXILIARY STUFF *
+ ***************************/
+
+/* This snippet kicks in when the instruction pointer is positioned at
+   _start and does the usual forkserver stuff, not very different from
+   regular instrumentation injected via afl-as.h. */
+
+#define AFL_QEMU_CPU_SNIPPET2 do { \
+    if(itb->pc == afl_entry_point) { \
+      afl_setup(); \
+      afl_forkserver(cpu); \
+    } \
+  } while (0)
+
+/* We use one additional file descriptor to relay "needs translation"
+   messages between the child and the fork server. */
+
+#define TSL_FD (FORKSRV_FD - 1)
+
+/* This is equivalent to afl-as.h: */
+
+static unsigned char dummy[65536];
+unsigned char *afl_area_ptr = dummy;
+
+/* Exported variables populated by the code patched into elfload.c: */
+
+abi_ulong afl_entry_point, /* ELF entry point (_start) */
+          afl_start_code,  /* .text start pointer      */
+          afl_end_code;    /* .text end pointer        */
+
+/* Set in the child process in forkserver mode: */
+
+static unsigned char afl_fork_child;
+unsigned int afl_forksrv_pid;
+
+/* Instrumentation ratio: */
+
+unsigned int afl_inst_rms = MAP_SIZE; /* Exported for afl_gen_trace */
+
+/* Function declarations. */
+
+static void afl_setup(void);
+static void afl_forkserver(CPUState*);
+
+static void afl_wait_tsl(CPUState*, int);
+static void afl_request_tsl(target_ulong, target_ulong, uint32_t, TranslationBlock*, int);
+
+/* Data structures passed around by the translate handlers: */
+
+struct afl_tb {
+  target_ulong pc;
+  target_ulong cs_base;
+  uint32_t flags;
+};
+
+struct afl_tsl {
+  struct afl_tb tb;
+  char is_chain;
+};
+
+struct afl_chain {
+  struct afl_tb last_tb;
+  int tb_exit;
+};
+
+/* Some forward decls: */
+
+TranslationBlock *tb_htable_lookup(CPUState*, target_ulong, target_ulong, uint32_t);
+static inline TranslationBlock *tb_find(CPUState*, TranslationBlock*, int);
+
+/*************************
+ * ACTUAL IMPLEMENTATION *
+ *************************/
+
+/* Set up SHM region and initialize other stuff. */
+
+static void afl_setup(void) {
+
+  char *id_str = getenv(SHM_ENV_VAR),
+       *inst_r = getenv("AFL_INST_RATIO");
+
+  int shm_id;
+
+  if (inst_r) {
+
+    unsigned int r;
+
+    r = atoi(inst_r);
+
+    if (r > 100) r = 100;
+    if (!r) r = 1;
+
+    afl_inst_rms = MAP_SIZE * r / 100;
+
+  }
+
+  if (id_str) {
+
+    shm_id = atoi(id_str);
+    afl_area_ptr = shmat(shm_id, NULL, 0);
+
+    if (afl_area_ptr == (void*)-1) exit(1);
+
+    /* With AFL_INST_RATIO set to a low value, we want to touch the bitmap
+       so that the parent doesn't give up on us. */
+
+    if (inst_r) afl_area_ptr[0] = 1;
+
+
+  }
+
+  if (getenv("AFL_INST_LIBS")) {
+
+    afl_start_code = 0;
+    afl_end_code   = (abi_ulong)-1;
+
+  }
+
+  /* pthread_atfork() seems somewhat broken in util/rcu.c, and I'm
+     not entirely sure what is the cause. This disables that
+     behaviour, and seems to work alright? */
+
+  rcu_disable_atfork();
+
+}
+
+
+/* Fork server logic, invoked once we hit _start. */
+static int forkserver_installed = 0;
+static void afl_forkserver(CPUState *cpu) {
+  if (forkserver_installed == 1)
+    return;
+  forkserver_installed = 1;
+
+  static unsigned char tmp[4];
+  //if (!afl_area_ptr) return;
+
+  /* Tell the parent that we're alive. If the parent doesn't want
+     to talk, assume that we're not running in forkserver mode. */
+
+  if (write(FORKSRV_FD + 1, tmp, 4) != 4) return;
+
+  afl_forksrv_pid = getpid();
+
+  /* All right, let's await orders... */
+
+  while (1) {
+
+    pid_t child_pid;
+    int status, t_fd[2];
+
+    /* Whoops, parent dead? */
+
+    if (read(FORKSRV_FD, tmp, 4) != 4) exit(2);
+
+    /* Establish a channel with child to grab translation commands. We'll
+       read from t_fd[0], child will write to TSL_FD. */
+
+    if (pipe(t_fd) || dup2(t_fd[1], TSL_FD) < 0) exit(3);
+    close(t_fd[1]);
+
+    child_pid = fork();
+    if (child_pid < 0) exit(4);
+
+    if (!child_pid) {
+
+      /* Child process. Close descriptors and run free. */
+
+      afl_fork_child = 1;
+      close(FORKSRV_FD);
+      close(FORKSRV_FD + 1);
+      close(t_fd[0]);
+      return;
+
+    }
+
+    /* Parent. */
+
+    close(TSL_FD);
+
+    if (write(FORKSRV_FD + 1, &child_pid, 4) != 4) exit(5);
+
+    /* Collect translation requests until child dies and closes the pipe. */
+
+    afl_wait_tsl(cpu, t_fd[0]);
+
+    /* Get and relay exit status to parent. */
+
+    if (waitpid(child_pid, &status, 0) < 0) exit(6);
+    if (write(FORKSRV_FD + 1, &status, 4) != 4) exit(7);
+
+  }
+
+}
+
+
+#if 0
+/* The equivalent of the tuple logging routine from afl-as.h. */
+
+static inline void afl_maybe_log(abi_ulong cur_loc) {
+
+  static __thread abi_ulong prev_loc;
+
+  /* Optimize for cur_loc > afl_end_code, which is the most likely case on
+     Linux systems. */
+
+  if (cur_loc > afl_end_code || cur_loc < afl_start_code /*|| !afl_area_ptr*/)
+    return;
+
+  /* Looks like QEMU always maps to fixed locations, so ASAN is not a
+     concern. Phew. But instruction addresses may be aligned. Let's mangle
+     the value to get something quasi-uniform. */
+
+  cur_loc  = (cur_loc >> 4) ^ (cur_loc << 8);
+  cur_loc &= MAP_SIZE - 1;
+
+  /* Implement probabilistic instrumentation by looking at scrambled block
+     address. This keeps the instrumented locations stable across runs. */
+
+  if (cur_loc >= afl_inst_rms) return;
+
+  afl_area_ptr[cur_loc ^ prev_loc]++;
+  prev_loc = cur_loc >> 1;
+
+}
+#endif
+
+/* This code is invoked whenever QEMU decides that it doesn't have a
+   translation of a particular block and needs to compute it. When this happens,
+   we tell the parent to mirror the operation, so that the next fork() has a
+   cached copy. */
+
+#if 0
+static void afl_request_tsl(target_ulong pc, target_ulong cb, uint64_t flags) {
+
+  struct afl_tsl t;
+
+  if (!afl_fork_child) return;
+
+  t.pc      = pc;
+  t.cs_base = cb;
+  t.flags   = flags;
+
+  if (write(TSL_FD, &t, sizeof(struct afl_tsl)) != sizeof(struct afl_tsl))
+    return;
+
+}
+#else
+static void afl_request_tsl(target_ulong pc, target_ulong cb, uint32_t flags,
+                            TranslationBlock *last_tb, int tb_exit) {
+  struct afl_tsl t;
+  struct afl_chain c;
+ 
+  if (!afl_fork_child) return;
+ 
+  t.tb.pc      = pc;
+  t.tb.cs_base = cb;
+  t.tb.flags   = flags;
+  t.is_chain   = (last_tb != NULL);
+ 
+  if (write(TSL_FD, &t, sizeof(struct afl_tsl)) != sizeof(struct afl_tsl))
+    return;
+ 
+  if (t.is_chain) {
+    c.last_tb.pc      = last_tb->pc;
+    c.last_tb.cs_base = last_tb->cs_base;
+    c.last_tb.flags   = last_tb->flags;
+    c.tb_exit         = tb_exit;
+
+    if (write(TSL_FD, &c, sizeof(struct afl_chain)) != sizeof(struct afl_chain))
+      return;
+  }
+ }
+#endif
+
+/* This is the other side of the same channel. Since timeouts are handled by
+   afl-fuzz simply killing the child, we can just wait until the pipe breaks. */
+
+static void afl_wait_tsl(CPUState *cpu, int fd) {
+
+  struct afl_tsl t;
+  struct afl_chain c;
+  TranslationBlock *tb, *last_tb;
+
+  while (1) {
+
+    /* Broken pipe means it's time to return to the fork server routine. */
+
+    if (read(fd, &t, sizeof(struct afl_tsl)) != sizeof(struct afl_tsl))
+      break;
+
+    tb = tb_htable_lookup(cpu, t.tb.pc, t.tb.cs_base, t.tb.flags);
+
+    if(!tb) {
+      mmap_lock();
+      tb_lock();
+      tb = tb_gen_code(cpu, t.tb.pc, t.tb.cs_base, t.tb.flags, 0);
+      mmap_unlock();
+      tb_unlock();
+    }
+
+    if (t.is_chain) {
+      if (read(fd, &c, sizeof(struct afl_chain)) != sizeof(struct afl_chain))
+        break;
+
+      last_tb = tb_htable_lookup(cpu, c.last_tb.pc, c.last_tb.cs_base,
+                                 c.last_tb.flags);
+      if (last_tb) {
+        tb_lock();
+        if (!tb->invalid) {
+          tb_add_jump(last_tb, c.tb_exit, tb);
+        }
+        tb_unlock();
+      }
+    }
+
+  }
+
+  close(fd);
+
+}
diff --git a/qemu_mode/patches/afl-qemu-translate-inl.h b/qemu_mode/patches/afl-qemu-translate-inl.h
new file mode 100644
index 00000000..9e778a83
--- /dev/null
+++ b/qemu_mode/patches/afl-qemu-translate-inl.h
@@ -0,0 +1,82 @@
+/*
+   american fuzzy lop - high-performance binary-only instrumentation
+   -----------------------------------------------------------------
+
+   Written by Andrew Griffiths <agriffiths@google.com> and
+              Michal Zalewski <lcamtuf@google.com>
+
+   Idea & design very much by Andrew Griffiths.
+
+   TCG instrumentation and block chaining support by Andrea Biondo
+                                      <andrea.biondo965@gmail.com>
+
+   Copyright 2015, 2016, 2017 Google Inc. All rights reserved.
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at:
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+   This code is a shim patched into the separately-distributed source
+   code of QEMU 2.10.0. It leverages the built-in QEMU tracing functionality
+   to implement AFL-style instrumentation and to take care of the remaining
+   parts of the AFL fork server logic.
+
+   The resulting QEMU binary is essentially a standalone instrumentation
+   tool; for an example of how to leverage it for other purposes, you can
+   have a look at afl-showmap.c.
+
+ */
+
+#include "../../config.h"
+#include "tcg-op.h"
+
+/* Declared in afl-qemu-cpu-inl.h */
+extern unsigned char *afl_area_ptr;
+extern unsigned int afl_inst_rms;
+extern abi_ulong afl_start_code, afl_end_code;
+
+/* Generates TCG code for AFL's tracing instrumentation. */
+static void afl_gen_trace(target_ulong cur_loc)
+{
+  static __thread target_ulong prev_loc;
+  TCGv index, count, new_prev_loc;
+  TCGv_ptr prev_loc_ptr, count_ptr;
+
+  /* Optimize for cur_loc > afl_end_code, which is the most likely case on
+     Linux systems. */
+
+  if (cur_loc > afl_end_code || cur_loc < afl_start_code || !afl_area_ptr)
+    return;
+
+  /* Looks like QEMU always maps to fixed locations, so ASAN is not a
+     concern. Phew. But instruction addresses may be aligned. Let's mangle
+     the value to get something quasi-uniform. */
+
+  cur_loc  = (cur_loc >> 4) ^ (cur_loc << 8);
+  cur_loc &= MAP_SIZE - 1;
+
+  /* Implement probabilistic instrumentation by looking at scrambled block
+     address. This keeps the instrumented locations stable across runs. */
+
+  if (cur_loc >= afl_inst_rms) return;
+
+  /* index = prev_loc ^ cur_loc */
+  prev_loc_ptr = tcg_const_ptr(&prev_loc);
+  index = tcg_temp_new();
+  tcg_gen_ld_tl(index, prev_loc_ptr, 0);
+  tcg_gen_xori_tl(index, index, cur_loc);
+
+  /* afl_area_ptr[index]++ */
+  count_ptr = tcg_const_ptr(afl_area_ptr);
+  tcg_gen_add_ptr(count_ptr, count_ptr, TCGV_NAT_TO_PTR(index));
+  count = tcg_temp_new();
+  tcg_gen_ld8u_tl(count, count_ptr, 0);
+  tcg_gen_addi_tl(count, count, 1);
+  tcg_gen_st8_tl(count, count_ptr, 0);
+
+  /* prev_loc = cur_loc >> 1 */
+  new_prev_loc = tcg_const_tl(cur_loc >> 1);
+  tcg_gen_st_tl(new_prev_loc, prev_loc_ptr, 0);
+}
diff --git a/qemu_mode/patches/configure.diff b/qemu_mode/patches/configure.diff
new file mode 100644
index 00000000..a9816f87
--- /dev/null
+++ b/qemu_mode/patches/configure.diff
@@ -0,0 +1,11 @@
+--- a/configure
++++ b/configure
+@@ -3855,7 +3855,7 @@ fi
+ # check if memfd is supported
+ memfd=no
+ cat > $TMPC << EOF
+-#include <sys/memfd.h>
++#include <sys/mman.h>
+ 
+ int main(void)
+ {
diff --git a/qemu_mode/patches/cpu-exec.diff b/qemu_mode/patches/cpu-exec.diff
new file mode 100644
index 00000000..754bf9ef
--- /dev/null
+++ b/qemu_mode/patches/cpu-exec.diff
@@ -0,0 +1,54 @@
+--- qemu-2.10.0-clean/accel/tcg/cpu-exec.c	2017-08-30 18:50:40.000000000 +0200
++++ qemu-2.10.0/accel/tcg/cpu-exec.c	2018-09-22 13:21:23.612068407 +0200
+@@ -36,6 +36,8 @@
+ #include "sysemu/cpus.h"
+ #include "sysemu/replay.h"
+ 
++#include "../patches/afl-qemu-cpu-inl.h"
++
+ /* -icount align implementation. */
+ 
+ typedef struct SyncClocks {
+@@ -144,6 +146,8 @@
+     int tb_exit;
+     uint8_t *tb_ptr = itb->tc_ptr;
+ 
++    AFL_QEMU_CPU_SNIPPET2;
++
+     qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
+                            "Trace %p [%d: " TARGET_FMT_lx "] %s\n",
+                            itb->tc_ptr, cpu->cpu_index, itb->pc,
+@@ -337,7 +341,7 @@
+     TranslationBlock *tb;
+     target_ulong cs_base, pc;
+     uint32_t flags;
+-    bool have_tb_lock = false;
++    bool have_tb_lock = false, was_translated = false, was_chained = false;
+ 
+     /* we record a subset of the CPU state. It will
+        always be the same before a given translated block
+@@ -365,6 +369,7 @@
+             if (!tb) {
+                 /* if no translated code available, then translate it now */
+                 tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
++                was_translated = true;
+             }
+ 
+             mmap_unlock();
+@@ -390,11 +395,16 @@
+         }
+         if (!tb->invalid) {
+             tb_add_jump(last_tb, tb_exit, tb);
++            was_chained = true;
+         }
+     }
+     if (have_tb_lock) {
+         tb_unlock();
+     }
++    if (was_translated || was_chained) {
++        afl_request_tsl(pc, cs_base, flags, was_chained ? last_tb : NULL,
++                        tb_exit);
++    }
+     return tb;
+ }
+ 
diff --git a/qemu_mode/patches/elfload.diff b/qemu_mode/patches/elfload.diff
new file mode 100644
index 00000000..34ec4847
--- /dev/null
+++ b/qemu_mode/patches/elfload.diff
@@ -0,0 +1,55 @@
+--- qemu-2.10.0.orig/linux-user/elfload.c	2017-08-30 18:50:41.000000000 +0200
++++ qemu-2.10.0/linux-user/elfload.c	2018-10-23 12:48:16.421879765 +0200
+@@ -20,6 +20,8 @@
+ 
+ #define ELF_OSABI   ELFOSABI_SYSV
+ 
++extern abi_ulong afl_entry_point, afl_start_code, afl_end_code;
++
+ /* from personality.h */
+ 
+ /*
+@@ -2085,6 +2087,8 @@
+     info->brk = 0;
+     info->elf_flags = ehdr->e_flags;
+ 
++    if (!afl_entry_point) afl_entry_point = info->entry;
++
+     for (i = 0; i < ehdr->e_phnum; i++) {
+         struct elf_phdr *eppnt = phdr + i;
+         if (eppnt->p_type == PT_LOAD) {
+@@ -2118,9 +2122,11 @@
+             if (elf_prot & PROT_EXEC) {
+                 if (vaddr < info->start_code) {
+                     info->start_code = vaddr;
++                    if (!afl_start_code) afl_start_code = vaddr;
+                 }
+                 if (vaddr_ef > info->end_code) {
+                     info->end_code = vaddr_ef;
++                    if (!afl_end_code) afl_end_code = vaddr_ef;
+                 }
+             }
+             if (elf_prot & PROT_WRITE) {
+@@ -2443,6 +2449,22 @@
+                                 info, (elf_interpreter ? &interp_info : NULL));
+     info->start_stack = bprm->p;
+ 
++#if defined(TARGET_PPC64) && !defined(TARGET_ABI32)
++    // On PowerPC64 the entry point is the _function descriptor_
++    // of the entry function. For AFL to properly initialize,
++    // afl_entry_point needs to be set to the actual first instruction
++    // as opposed executed by the target program. This as opposed to 
++    // where the function's descriptor sits in memory.
++    
++    // Shameless copy of PPC init_thread
++    info_report("Adjusting afl_entry_point");
++    if (afl_entry_point && (get_ppc64_abi(info) < 2)) {
++        uint64_t val;
++        get_user_u64(val, afl_entry_point);
++        afl_entry_point = val + info->load_bias;
++    }
++#endif
++
+     /* If we have an interpreter, set that as the program's entry point.
+        Copy the load_bias as well, to help PPC64 interpret the entry
+        point as a function descriptor.  Do this after creating elf tables
diff --git a/qemu_mode/patches/elfload2.diff b/qemu_mode/patches/elfload2.diff
new file mode 100644
index 00000000..e09d11c6
--- /dev/null
+++ b/qemu_mode/patches/elfload2.diff
@@ -0,0 +1,24 @@
+--- qemu-2.10.0/linux-user/elfload.c.after	2019-05-28 15:21:36.931618928 +0200
++++ qemu-2.10.0/linux-user/elfload.c	2019-05-28 15:22:23.939617556 +0200
+@@ -2087,7 +2087,20 @@
+     info->brk = 0;
+     info->elf_flags = ehdr->e_flags;
+ 
+-    if (!afl_entry_point) afl_entry_point = info->entry;
++    if (!afl_entry_point) {
++      char *ptr;
++      if ((ptr = getenv("AFL_ENTRYPOINT")) != NULL) {
++        afl_entry_point = strtoul(ptr, NULL, 16);
++      } else {
++        if (!afl_entry_point) afl_entry_point = info->entry;
++      }
++#ifdef TARGET_ARM
++      /* The least significant bit indicates Thumb mode. */
++      afl_entry_point = afl_entry_point & ~(target_ulong)1;
++#endif
++      if (getenv("AFL_DEBUG") != NULL)
++        fprintf(stderr, "AFL forkserver entrypoint: %p\n", (void*)afl_entry_point);
++    } while(0);
+ 
+     for (i = 0; i < ehdr->e_phnum; i++) {
+         struct elf_phdr *eppnt = phdr + i;
diff --git a/qemu_mode/patches/memfd.diff b/qemu_mode/patches/memfd.diff
new file mode 100644
index 00000000..7f68396c
--- /dev/null
+++ b/qemu_mode/patches/memfd.diff
@@ -0,0 +1,12 @@
+--- a/util/memfd.c
++++ b/util/memfd.c
+@@ -31,9 +31,7 @@
+ 
+ #include "qemu/memfd.h"
+ 
+-#ifdef CONFIG_MEMFD
+-#include <sys/memfd.h>
+-#elif defined CONFIG_LINUX
++#if defined CONFIG_LINUX && !defined CONFIG_MEMFD
+ #include <sys/syscall.h>
+ #include <asm/unistd.h>
diff --git a/qemu_mode/patches/syscall.diff b/qemu_mode/patches/syscall.diff
new file mode 100644
index 00000000..55b29140
--- /dev/null
+++ b/qemu_mode/patches/syscall.diff
@@ -0,0 +1,35 @@
+--- qemu-2.10.0-rc3-clean/linux-user/syscall.c	2017-08-15 11:39:41.000000000 -0700
++++ qemu-2.10.0-rc3/linux-user/syscall.c	2017-08-22 14:34:03.193088186 -0700
+@@ -116,6 +116,8 @@
+ 
+ #include "qemu.h"
+ 
++extern unsigned int afl_forksrv_pid;
++
+ #ifndef CLONE_IO
+ #define CLONE_IO                0x80000000      /* Clone io context */
+ #endif
+@@ -11688,8 +11690,21 @@
+         break;
+ 
+     case TARGET_NR_tgkill:
+-        ret = get_errno(safe_tgkill((int)arg1, (int)arg2,
+-                        target_to_host_signal(arg3)));
++
++        {
++          int pid  = (int)arg1,
++              tgid = (int)arg2,
++              sig  = (int)arg3;
++
++          /* Not entirely sure if the below is correct for all architectures. */
++
++          if(afl_forksrv_pid && afl_forksrv_pid == pid && sig == SIGABRT)
++              pid = tgid = getpid();
++
++          ret = get_errno(safe_tgkill(pid, tgid, target_to_host_signal(sig)));
++
++        }
++
+         break;
+ 
+ #ifdef TARGET_NR_set_robust_list
diff --git a/qemu_mode/patches/translate-all.diff b/qemu_mode/patches/translate-all.diff
new file mode 100644
index 00000000..853a66ad
--- /dev/null
+++ b/qemu_mode/patches/translate-all.diff
@@ -0,0 +1,19 @@
+--- a/accel/tcg/translate-all.c	2017-08-30 18:50:40.000000000 +0200
++++ b/accel/tcg/translate-all.c	2018-09-21 10:19:42.328766554 +0200
+@@ -60,6 +60,8 @@
+ #include "exec/log.h"
+ #include "sysemu/cpus.h"
+ 
++#include "../patches/afl-qemu-translate-inl.h"
++
+ /* #define DEBUG_TB_INVALIDATE */
+ /* #define DEBUG_TB_FLUSH */
+ /* make various TB consistency checks */
+@@ -1280,6 +1282,7 @@
+     tcg_func_start(&tcg_ctx);
+ 
+     tcg_ctx.cpu = ENV_GET_CPU(env);
++    afl_gen_trace(pc);
+     gen_intermediate_code(cpu, tb);
+     tcg_ctx.cpu = NULL;
+