about summary refs log tree commit diff
path: root/docs/historical_notes.md
diff options
context:
space:
mode:
Diffstat (limited to 'docs/historical_notes.md')
-rw-r--r--docs/historical_notes.md143
1 files changed, 0 insertions, 143 deletions
diff --git a/docs/historical_notes.md b/docs/historical_notes.md
deleted file mode 100644
index b5d3d157..00000000
--- a/docs/historical_notes.md
+++ /dev/null
@@ -1,143 +0,0 @@
-# Historical notes
-
-  This doc talks about the rationale of some of the high-level design decisions
-  for American Fuzzy Lop. It's adopted from a discussion with Rob Graham.
-  See README.md for the general instruction manual, and technical_details.md for
-  additional implementation-level insights.
-
-## 1) Influences
-
-In short, `afl-fuzz` is inspired chiefly by the work done by Tavis Ormandy back
-in 2007. Tavis did some very persuasive experiments using `gcov` block coverage
-to select optimal test cases out of a large corpus of data, and then using
-them as a starting point for traditional fuzzing workflows.
-
-(By "persuasive", I mean: netting a significant number of interesting
-vulnerabilities.)
-
-In parallel to this, both Tavis and I were interested in evolutionary fuzzing.
-Tavis had his experiments, and I was working on a tool called bunny-the-fuzzer,
-released somewhere in 2007.
-
-Bunny used a generational algorithm not much different from `afl-fuzz`, but
-also tried to reason about the relationship between various input bits and
-the internal state of the program, with hopes of deriving some additional value
-from that. The reasoning / correlation part was probably in part inspired by
-other projects done around the same time by Will Drewry and Chris Evans.
-
-The state correlation approach sounded very sexy on paper, but ultimately, made
-the fuzzer complicated, brittle, and cumbersome to use; every other target
-program would require a tweak or two. Because Bunny didn't fare a whole lot
-better than less sophisticated brute-force tools, I eventually decided to write
-it off. You can still find its original documentation at:
-
-  https://code.google.com/p/bunny-the-fuzzer/wiki/BunnyDoc
-
-There has been a fair amount of independent work, too. Most notably, a few
-weeks earlier that year, Jared DeMott had a Defcon presentation about a
-coverage-driven fuzzer that relied on coverage as a fitness function.
-
-Jared's approach was by no means identical to what afl-fuzz does, but it was in
-the same ballpark. His fuzzer tried to explicitly solve for the maximum coverage
-with a single input file; in comparison, afl simply selects for cases that do
-something new (which yields better results - see [technical_details.md](technical_details.md)).
-
-A few years later, Gabriel Campana released fuzzgrind, a tool that relied purely
-on Valgrind and a constraint solver to maximize coverage without any brute-force
-bits; and Microsoft Research folks talked extensively about their still
-non-public, solver-based SAGE framework.
-
-In the past six years or so, I've also seen a fair number of academic papers
-that dealt with smart fuzzing (focusing chiefly on symbolic execution) and a
-couple papers that discussed proof-of-concept applications of genetic
-algorithms with the same goals in mind. I'm unconvinced how practical most of
-these experiments were; I suspect that many of them suffer from the
-bunny-the-fuzzer's curse of being cool on paper and in carefully designed
-experiments, but failing the ultimate test of being able to find new,
-worthwhile security bugs in otherwise well-fuzzed, real-world software.
-
-In some ways, the baseline that the "cool" solutions have to compete against is
-a lot more impressive than it may seem, making it difficult for competitors to
-stand out. For a singular example, check out the work by Gynvael and Mateusz
-Jurczyk, applying "dumb" fuzzing to ffmpeg, a prominent and security-critical
-component of modern browsers and media players:
-
-  http://googleonlinesecurity.blogspot.com/2014/01/ffmpeg-and-thousand-fixes.html
-
-Effortlessly getting comparable results with state-of-the-art symbolic execution
-in equally complex software still seems fairly unlikely, and hasn't been
-demonstrated in practice so far.
-
-But I digress; ultimately, attribution is hard, and glorying the fundamental
-concepts behind AFL is probably a waste of time. The devil is very much in the
-often-overlooked details, which brings us to...
-
-## 2. Design goals for afl-fuzz
-
-In short, I believe that the current implementation of afl-fuzz takes care of
-several itches that seemed impossible to scratch with other tools:
-
-1) Speed. It's genuinely hard to compete with brute force when your "smart"
-   approach is resource-intensive. If your instrumentation makes it 10x more
-   likely to find a bug, but runs 100x slower, your users are getting a bad
-   deal.
-
-   To avoid starting with a handicap, `afl-fuzz` is meant to let you fuzz most of
-   the intended targets at roughly their native speed - so even if it doesn't
-   add value, you do not lose much.
-
-   On top of this, the tool leverages instrumentation to actually reduce the
-   amount of work in a couple of ways: for example, by carefully trimming the
-   corpus or skipping non-functional but non-trimmable regions in the input
-   files.
-
-2) Rock-solid reliability. It's hard to compete with brute force if your
-   approach is brittle and fails unexpectedly. Automated testing is attractive
-   because it's simple to use and scalable; anything that goes against these
-   principles is an unwelcome trade-off and means that your tool will be used
-   less often and with less consistent results.
-
-   Most of the approaches based on symbolic execution, taint tracking, or
-   complex syntax-aware instrumentation are currently fairly unreliable with
-   real-world targets. Perhaps more importantly, their failure modes can render
-   them strictly worse than "dumb" tools, and such degradation can be difficult
-   for less experienced users to notice and correct.
-
-   In contrast, `afl-fuzz` is designed to be rock solid, chiefly by keeping it
-   simple. In fact, at its core, it's designed to be just a very good
-   traditional fuzzer with a wide range of interesting, well-researched
-   strategies to go by. The fancy parts just help it focus the effort in
-   places where it matters the most.
-
-3) Simplicity. The author of a testing framework is probably the only person
-   who truly understands the impact of all the settings offered by the tool -
-   and who can dial them in just right. Yet, even the most rudimentary fuzzer
-   frameworks often come with countless knobs and fuzzing ratios that need to
-   be guessed by the operator ahead of the time. This can do more harm than 
-   good.
-
-   AFL is designed to avoid this as much as possible. The three knobs you
-   can play with are the output file, the memory limit, and the ability to
-   override the default, auto-calibrated timeout. The rest is just supposed to
-   work. When it doesn't, user-friendly error messages outline the probable
-   causes and workarounds, and get you back on track right away.
-
-4) Chainability. Most general-purpose fuzzers can't be easily employed
-   against resource-hungry or interaction-heavy tools, necessitating the
-   creation of custom in-process fuzzers or the investment of massive CPU
-   power (most of which is wasted on tasks not directly related to the code
-   we actually want to test).
-
-   AFL tries to scratch this itch by allowing users to use more lightweight
-   targets (e.g., standalone image parsing libraries) to create small
-   corpora of interesting test cases that can be fed into a manual testing
-   process or a UI harness later on.
-
-As mentioned in [technical_details.md](technical_details.md), AFL does all this not by systematically
-applying a single overarching CS concept, but by experimenting with a variety
-of small, complementary methods that were shown to reliably yields results
-better than chance. The use of instrumentation is a part of that toolkit, but is
-far from being the most important one.
-
-Ultimately, what matters is that `afl-fuzz` is designed to find cool bugs - and
-has a pretty robust track record of doing just that.