| 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
 | # Map density
## How coverage works
The coverage in AFL++ works by assigning each basic block of code a unique ID
and during execution when transitioning between blocks (e.g., by calls or jumps)
assigning each of these edges an ID based upon the source and destination block
ID.
For each individual execution of the target, a single dimensional byte array
indexed by the edge ID is used to count how many times each edge is traversed.
A single dimensional cumulative byte array is also constructed where each byte
again represents an individual edge ID, but this time, the value of the byte
represents a range of how many times that edge has been traversed.
`1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`
The theory is that a new path isn't particularly interesting if an edge has been
traversed `23` instead of `24` times for example, but is interesting if an edge
has been traversed for the very first time or the number of times fits within a
different bucket.
After each run, the count of times each edge is hit is compared to the values in
the cumulative map and if it is different, then the input is kept as a new seed
and the cumulative map is updated.
This mechanism is described in greater detail in the seminal
[paper](https://lcamtuf.coredump.cx/afl/technical_details.txt) on AFL by
[lcamtuf](https://github.com/lcamtuf).
## Collisions
In black-box fuzzing, we must assume that control may flow from any block to any
other block, since we don't know any better. Thus, for a target with `n` basic
blocks of code, there are `n * n` potential edges. As we can see, even with a
small number of edges, a very large map will be required so that we have space
to fit them all. Even if our target only had `1024` blocks, this would require a
map containing `1048576` entries (or 1Mb in size).
Whilst this may not seem like a lot of memory, it causes problems for two
reasons. Firstly, the processing step after each execution must now process much
more data, and secondly, a map this size is unlikely to fit within the L2 cache
of the processor. Since this is a very hot code path, we are likely to pay a
very heavy performance cost.
Therefore, we must accept that not all edges can have a unique ID and that
therefore there will be collisions. This means that if the fuzzer finds a new
path by uncovering an edge which was not previously found, but that the same
edge ID is used by another edge, then it may go completely unnoticed. This is
obviously undesirable, but equally if our map is too large, then we will not be
able to process as many potential inputs in the same time and hence not uncover
edges for that reason. Thus a careful trade-off of map size must be made.
## Block & edge numbering
Since the original AFL, blocks and edges have always been numbered in the same
way as we can see from the following C snippet from the whitepaper:
```c
cur_location = (block_address >> 4) ^ (block_address << 8);
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;
```
Each block ID is generated by performing a shift and XOR on its address. Then
the edge ID is calculated as `E = B ^ (B' >> 1)`. Here, we can make two
observations. In fact, the edge ID is also masked to ensure it is less than the
size of the map being used.
### Block IDs
Firstly, the block ID doesn't have very good entropy. If we consider the address
of the block, then whilst each block has a unique ID, it isn't necessarily very
evenly distributed.
We start with a large address and need to discard a large number of the bits to
generate a block ID which is within range. But how do we choose the unique bits
of the address versus those which are the same for every block? The high bits of
the address may be all `0s` or all `1s` to make the address canonical, the
middle portion of the address may be the same for all blocks (since if they are
all within the same binary, then they will all be adjacent in memory), and on
some systems, even the low bits may have poor entropy as some use fixed length
aligned instructions. Then we need to consider that a portion of each binary may
contain the `.data` or `.bss` sections and so may not contain any blocks of code
at all.
### Edge IDs
Secondly, we can observe that when we generate an edge ID from the source and
destination block IDs, we perform a right shift on the source block ID. Whilst
there are good reasons as set out in the whitepaper why such a transform is
applied, in doing so, we dispose of `1` bit of precious entropy in our source
block ID.
All together, this means that some edge IDs may be more popular than others.
This means that some portions of the map may be very densely populated with
large numbers of edges, whilst others may be very sparsely populated, or not
populated at all.
## Improvements
One of the main reasons why this algorithm selected, is performance. All of the
operations are very quick to perform and given we may be carrying this out for
every block of code we execute, performance is critical.
However, the design of the binary instrumentation modes of AFL++ has moved on.
Both QEMU and FRIDA modes use a two stage process when executing a target
application. Each block is first compiled or instrumented, and then it is
executed. The compiled blocks can be re-used each time the target executes them.
Since a blocks ID is based on its address, and this is known at compile time, we
only need to generate this ID once per block and so this ID generation no longer
needs to be as performant. We can therefore use a hash algorithm to generate
this ID and therefore ensure that the block IDs are more evenly distributed.
Edge IDs, however, can only be determined at run-time. Since we don't know which
blocks a given input will traverse until we run it. However, given our block IDs
are now evenly distributed, generating an evenly distributed edge ID becomes
simple. Here, the only change we make is to use a rotate operation rather than a
shift operation so we don't lose a bit of entropy from the source ID.
So our new algorithm becomes:
```c
cur_location = hash(block_address)
shared_mem[cur_location ^ prev_location]++;
prev_location = rotate(cur_location, 1);
```
Lastly, in the original design, the `cur_location` was always set to `0`, at the
beginning of a run, we instead set the value of `cur_location` to `hash(0)`.
## Parallel fuzzing
Another sub-optimal aspect of the original design is that no matter how many
instances of the fuzzer you ran in parallel, each instance numbered each block
and so each edge with the same ID. Each instance would therefore find the same
subset of edges collide with each other. In the event of a collision, all
instances will hit the same road block.
However, if we instead use a different seed for our hashing function for each
instance, then each will ascribe each block a different ID and hence each edge
will be given a different edge ID. This means that whilst one instance of the
fuzzer may find a given pair of edges collide, it is very unlikely that another
instance will find the same pair also collide.
Due to the collaborative nature of parallel fuzzing, this means that whilst one
instance may struggle to find a particular new path because the new edge
collides, another instance will likely not encounter the same collision and thus
be able to differentiate this new path and share it with the other instances.
If only a single new edge is found, and the new path is shared with an instance
for which that edge collides, that instance may disregard it as irrelevant. In
practice, however, the discovery of a single new edge, likely leads to several
more edges beneath it also being found and therefore the likelihood of all of
these being collisions is very slim.
 |