Diffstat (limited to 'llvm_mode/README.lto.md')
-rw-r--r-- | llvm_mode/README.lto.md | 215 |
1 files changed, 55 insertions, 160 deletions
diff --git a/llvm_mode/README.lto.md b/llvm_mode/README.lto.md index 28b3b045..9af9ffff 100644 --- a/llvm_mode/README.lto.md +++ b/llvm_mode/README.lto.md @@ -2,31 +2,32 @@ ## TLDR; -1. This compile mode is very frickle if it works it is amazing, if it fails - - well use afl-clang-fast +This version requires a current llvm 11 compiled from the github master. -2. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better +1. Use afl-clang-lto/afl-clang-lto++ because it is faster and gives better coverage than anything else that is out there in the AFL world -3. You can use it together with llvm_mode: laf-intel and whitelisting +2. You can use it together with llvm_mode: laf-intel and whitelisting features and can be combined with cmplog/Redqueen -4. It only works with llvm 9 (and likely 10+ but is not tested there yet) +3. It only works with llvm 11 (current github master state) + +4. AUTODICTIONARY feature! see below ## Introduction and problem description A big issue with how afl/afl++ works is that the basic block IDs that are -set during compilation are random - and hence natually the larger the number -of instrumented locations, the higher the number of edge collisions in the +set during compilation are random - and hence naturally the larger the number +of instrumented locations, the higher the number of edge collisions are in the map. This can result in not discovering new paths and therefore degrade the -efficiency of the fuzzing. +efficiency of the fuzzing process. -*This issue is understimated in the fuzzing community!* +*This issue is underestimated in the fuzzing community!* With a 2^16 = 64kb standard map at already 256 instrumented blocks there is on average one collision. On average a target has 10.000 to 50.000 instrumented blocks hence the real collisions are between 750-18.000! 
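The collision figures above can be approximated with a small calculation. This is a sketch under an assumption the README does not state: each instrumented location draws its ID uniformly at random from a map of `m` slots, so the expected number of locations that land on an already used slot is `n - m * (1 - (1 - 1/m)^n)`. The function name `expected_collisions` is hypothetical. For the location counts shown in the build output of this README (12071 and 15833) the estimate happens to reproduce the "on average ... collisions" numbers printed there, but whether the tool uses exactly this formula is not confirmed by the source.

```shell
#!/bin/sh
# Expected number of colliding IDs when n locations each pick a uniformly
# random ID out of m slots (assumed model, see the text above):
#   collisions = n - m * (1 - (1 - 1/m)^n)
expected_collisions() {
    n="$1"
    m="${2:-65536}"   # 2^16 slots, the standard map size named in the README
    awk -v n="$n" -v m="$m" 'BEGIN {
        # (1 - 1/m)^n computed via exp/log to stay within POSIX awk
        occupied = m * (1 - exp(n * log(1 - 1 / m)))
        printf "%d\n", n - occupied   # %d truncates to a whole number
    }'
}

expected_collisions 12071   # prints 1046
expected_collisions 15833   # prints 1767
```

A second argument overrides the map size, for example `expected_collisions 12071 262144`, which shows how quickly the estimate drops when the map grows.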
-To get to a solution that prevents any collision took several approaches +To reach a solution that prevents any collisions took several approaches and many dead ends until we got to this: * We instrument at link time when we have all files pre-compiled @@ -34,38 +35,48 @@ and many dead ends until we got to this: * Our compiler (afl-clang-lto/afl-clang-lto++) takes care of setting the correct LTO options and runs our own afl-ld linker instead of the system linker - * Our linker collects all LTO files to link and instruments them so that + * The LLVM linker collects all LTO files to link and instruments them so that we have non-colliding edge overage * We use a new (for afl) edge coverage - which is the same as in llvm -fsanitize=coverage edge coverage mode :) - * after inserting our instrumentation in all interesting edges we link - all parts of the program together to our executable The result: - * 10-15% speed gain compared to llvm_mode + * 10-20% speed gain compared to llvm_mode * guaranteed non-colliding edge coverage :-) * The compile time especially for libraries can be longer Example build output from a libtiff build: ``` -/bin/bash ../libtool --tag=CC --mode=link afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o ../libtiff/libtiff.la ../port/libport.la -llzma -ljbig -ljpeg -lz -lm libtool: link: afl-clang-lto -g -O2 -Wall -W -o thumbnail thumbnail.o ../libtiff/.libs/libtiff.a ../port/.libs/libport.a -llzma -ljbig -ljpeg -lz -lm -afl-clang-lto++2.62d by Marc "vanHauser" Heuse <mh@mh-sec.de> -afl-ld++2.62d by Marc "vanHauser" Heuse <mh@mh-sec.de> (level 0) -[+] Running ar unpacker on /prg/tests/lto/tiff-4.0.4/tools/../libtiff/.libs/libtiff.a into /tmp/.afl-3914343-1583339800.dir -[+] Running ar unpacker on /prg/tests/lto/tiff-4.0.4/tools/../port/.libs/libport.a into /tmp/.afl-3914343-1583339800.dir -[+] Running bitcode linker, creating /tmp/.afl-3914343-1583339800-1.ll -[+] Performing optimization via opt, creating /tmp/.afl-3914343-1583339800-2.bc -[+] 
Performing instrumentation via opt, creating /tmp/.afl-3914343-1583339800-3.bc -afl-llvm-lto++2.62d by Marc "vanHauser" Heuse <mh@mh-sec.de> -[+] Instrumented 15833 locations with no collisions (on average 1767 collisions would be in afl-gcc/afl-clang-fast) (non-hardened mode). -[+] Running real linker /bin/x86_64-linux-gnu-ld -[+] Linker was successful +afl-clang-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de> in mode LTO +afl-llvm-lto++2.63d by Marc "vanHauser" Heuse <mh@mh-sec.de> +AUTODICTIONARY: 11 strings found +[+] Instrumented 12071 locations with no collisions (on average 1046 collisions would be in afl-gcc/afl-clang-fast) (non-hardened mode). +``` + +## Building llvm 11 + +``` +$ sudo apt install binutils-dev # this is *essential*! +$ git clone https://github.com/llvm/llvm-project +$ cd llvm-project +$ mkdir build +$ cd build +$ cmake -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;compiler-rt;libclc;libcxx;libcxxabi;libunwind;lld' -DCMAKE_BUILD_TYPE=Release -DLLVM_BINUTILS_INCDIR=/usr/include/ ../llvm/ +$ make -j $(nproc) +$ export PATH=`pwd`/bin:$PATH +$ export LLVM_CONFIG=`pwd`/bin/llvm-config +$ cd /path/to/AFLplusplus/ +$ make +$ cd llvm_mode +$ make +$ cd .. +$ make install ``` ## How to use afl-clang-lto -Just use afl-clang-lto like you did afl-clang-fast or afl-gcc. +Just use afl-clang-lto like you did with afl-clang-fast or afl-gcc. Also whitelisting (AFL_LLVM_WHITELIST -> [README.whitelist.md](README.whitelist.md)) and laf-intel/compcov (AFL_LLVM_LAF_* -> [README.laf-intel.md](README.laf-intel.md)) work. @@ -77,6 +88,13 @@ CC=afl-clang-lto CXX=afl-clang-lto++ ./configure make ``` +## AUTODICTIONARY feature + +Setting `AFL_LLVM_LTO_AUTODICTIONARY` will generate a dictionary in the +target binary based on string compare and memory compare functions. +afl-fuzz will automatically get these transmitted when starting to fuzz. +This improves coverage on a lot of targets.
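A minimal sketch of a build with the AUTODICTIONARY feature switched on, reusing the `./configure` and `make` flow from the usage example in this README. Two things here are assumptions: that any non-empty value enables the feature (the text only says the variable has to be set, so `1` is an illustrative choice), and that the target ships a `configure` script. The guard around the build step is only there so the snippet fails with a readable message when run outside a target source tree.

```shell
#!/bin/sh
# Assumption: a non-empty value is enough to enable the feature; "1" is illustrative.
export AFL_LLVM_LTO_AUTODICTIONARY=1

# Compiler wrappers named in the README's usage section.
export CC=afl-clang-lto
export CXX=afl-clang-lto++

# Build only if the wrapper is installed and we are inside a target tree.
if command -v "$CC" >/dev/null 2>&1 && [ -x ./configure ]; then
    ./configure && make
else
    echo "afl-clang-lto or ./configure not found; run this from the target's source tree" >&2
fi
```

If the feature is active, the build output should contain an `AUTODICTIONARY: ... strings found` line like the one in the example build log of this README.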
+ ## Potential issues ### compiling libraries fails @@ -94,145 +112,16 @@ AR=llvm-ar RANLIB=llvm-ranlib CC=afl-clang-lto CXX=afl-clang-lto++ ./configure - ``` and on some target you have to to AR=/RANLIB= even for make as the configure script does not save it ... -### "linking globals named '...': symbol multiply defined" error - -The target program is using multiple global variables or functions with the -same name. This is a common error when compiling a project with LTO, and -the fix is `-Wl,--allow-multiple-definition` - however llvm-link which we -need to link all llvm IR LTO files does not support this - yet (hopefully). -Hence if you see this error either you have to remove the duplicate global -variable (think `#ifdef` ...) or you are out of luck. :-( - -### "expected top-level entity" + binary ouput error - -This happens if multiple .a archives are to be linked and they contain the -same object filenames, the first in LTO form, the other in ELF form. -This can not be fixed programmatically, but can be fixed by hand. -You can try to delete the file from either archive -(`llvm-ar d <archive>.a <file>.o`) or performing the llvm-linking, optimizing -and instrumentation by hand (see below). - -### "undefined reference to ..." - -This *can* be the opposite situation of the "expected top-level entity" error - -the library with the ELF file is before the LTO library. -However it can also be a bug in the program - try to compile it normally. If -fails then it is a bug in the program. -Solutions: You can try to delete the file from either archive, e.g. -(`llvm-ar d <archive>.a <file>.o`) or performing the llvm-linking, optimizing -and instrumentation by hand (see below). - -### "File format not recognized" - -This happens if the build system has fixed LDFLAGS, CPPFLAGS, CXXFLAGS and/or -CFLAGS. 
Ensure that they all contain the `-flto` flag that afl-clang-lto was -compiled with (you can see that by typing `afl-clang-lto -h` and inspecting -the last line of the help output) and add them otherwise - -### clang is hardcoded to /bin/ld - -Some clang packages have 'ld' hardcoded to /bin/ld. This is an issue as this -prevents "our" afl-ld being called. - --fuse-ld=/path/to/afl-ld should be set through makefile magic in llvm_mode - -if it is supported - however if this fails you can try: -``` -LDFLAGS=-fuse-ld=</path/to/afl-ld -``` - -As workaround attempt #2 you will have to switch /bin/ld: -``` - mv /bin/ld /bin/ld.orig - cp afl-ld /bin/ld -``` -This can result in two problems though: - - !1! - When compiling afl-ld, the build process looks at where the /bin/ld link - is going to. So when the workaround was applied and a recompiling afl-ld - is performed then the link is gone and the new afl-ld clueless where - the real ld is. - In this case set AFL_REAL_LD=/bin/ld.orig - - !2! - When you install an updated gcc/clang/... package, your OS might restore - the ld link. - -### Performing the steps by hand - -It is possible to perform all the steps afl-ld by hand to workaround issues -in the target. - -1. Recompile with AFL_DEBUG=1 and collect the afl-clang-lto command that fails - e.g.: `AFL_DEBUG=1 make 2>&1 | grep afl-clang-lto | tail -n 1` - -2. run this command prepended with AFL_DEBUG=1 and collect the afl-ld command - parameters, e.g. `AFL_DEBUG=1 afl-clang-lto[++] .... | grep /afl/ld` - -3. for every .a archive you want to instrument unpack it into a seperate - directory, e.g. - `mkdir archive1.dir ; cd archive1.dir ; llvm-link x ../<archive>.a` - -4. run `file archive*.dir/*.o` and make two lists, one containing all ELF files - and one containing all LLVM IR bitcode files. - You do the same for all .o files of the ../afl/ld command options - -5. Create a single bitcode file by using llvm-link, e.g. 
- `llvm-link -o all-bitcode.bc <list of all LLVM IR .o files>` - If this fails it is game over - or you modify the source code - -6. Run the optimizer on the new bitcode file: - `opt -O3 --polly -o all-optimized.bc all-bitcode.bc` - -7. Instrument the optimized bitcode file: - `opt --load=$AFL_PATH/afl-llvm-lto-instrumentation.so --disable-opt --afl-lto all-optimized.bc -o all-instrumented.bc - -8. If the parameter `--allow-multiple-definition` is not in the list, add it - as first command line option. - -9. Link everything together. - a) You use the afl-ld command and instead of e.g. `/usr/local/lib/afl/ld` - you replace that with `ld`, the real linker. - b) Every .a archive you instrumented files from you remove the <archive>.a - or -l<archive> from the command - c) If you have entries in your ELF files list (see step 4), you put them to - the command line - but them in the same order! - d) put the all-instrumented.bc before the first library or .o file - e) run the command and hope it compiles, if it doesn't you have to analyze - what the issue is and fix that in the approriate step above. - -Yes this is long and complicated. That is why there is afl-ld doing this and -that why this can easily fail and not all different ways how it *can* fail can -be implemented ... - ### compiling programs still fail afl-clang-lto is still work in progress. -Complex targets are still likely not to compile and this needs to be fixed. Please report issues at: [https://github.com/AFLplusplus/AFLplusplus/issues/226](https://github.com/AFLplusplus/AFLplusplus/issues/226) -Known issues: -* ffmpeg -* bogofilter -* libjpeg-turbo-1.3.1 - ## Upcoming Work -1. Currently the LTO whitelist feature does not allow to not instrument main, start and init functions -2. Modify the forkserver + afl-fuzz so that only the necessary map size is - loaded and used - and communicated to afl-fuzz too. 
- Result: faster fork in the target and faster map analysis in afl-fuzz - => more speed :-) - -## Tested and working targets - -* libpng-1.2.53 -* libxml2-2.9.2 -* tiff-4.0.4 -* unrar-nonfree-5.6.6 -* exiv 0.27 -* jpeg-6b +1. Currently the LTO whitelist feature does not allow to instrument main, + start and init functions ## History @@ -249,14 +138,20 @@ This was first implemented in January and work ... kinda. The LTO time instrumentation worked, however the "how" the basic blocks were instrumented was a problem, as reducing duplicates turned out to be very, very difficult with a program that has so many paths and therefore so many -dependencies. At lot of stratgies were implemented - and failed. +dependencies. A lot of strategies were implemented - and failed. And then sat solvers were tried, but with over 10.000 variables that turned out to be a dead-end too. + The final idea to solve this came from domenukk who proposed to insert a block into an edge and then just use incremental counters ... and this worked! After some trials and errors to implement this vanhauser-thc found out that there is actually an llvm function for this: SplitEdge() :-) + Still more problems came up though as this only works without bugs from llvm 9 onwards, and with high optimization the link optimization ruins the instrumented control flow graph. -As long as there are no larger changes in llvm this all should work well now ... + +This is all now fixed with llvm 11. The llvm's own linker is now able to +load passes and this bypasses all problems we had. + +Happy end :)