Age | Commit message (Collapse) | Author |
|
The mandel example uses SDL2 for graphics
output. When GCC is used to assemble the
resulting *.s file it shows linker's
errors about undefined symbols from the
library.
This behavior can be fixed by moving
the flags passed to the compiler after
the source file name.
|
|
|
|
In this case, the immediate is too large to use directly in the add/sub
instructions, so move it into a temporary register first.
Also, for clarity, rearrange the if-conditions so that they match the
constraints of the instructions that immediately follow.
|
|
|
|
The ldrb and ldrh instructions require a 32-bit register name for the
destination and will clear the upper 32-bits of that register.
|
|
|
|
|
|
The code used to see add 0, 10 as
a copy of 0.
|
|
|
|
|
|
The same functionality can be implemented
naturally in the cfg simplification pass.
|
|
Previously, each ret would lead to an
epilog. This caused bloat for large
functions with multiple return points.
|
|
.align N can either mean align to
the next multiple of N or align to
the next multiple of 1<<N.
Credit goes to Jorge Acereda MaciĆ”
for reporting this issue.
|
|
SCCP is currently the one and only
pass which seriously affects control
flow; so we must compute loop costs
afterwards.
|
|
When lowering pointer arithmetic, it is
natural for a C frontend to generate
those instructions.
|
|
The heuristic was bogus for at least
two reasons (see below), and, looking
at some generated code, it looks like
some other issues are more pressing.
1. A stack slot of 4 bytes could be
used for a temporary of 8 bytes.
2. Should 2 arguments of an operation
end up spilled, the same slot
could be allocated to both!
|
|
The value argument of store instructions was
handled incorrectly.
|
|
This fixes similar bugs than the ones fixed
in the previous commit.
In the folding code the invariant is that
when a result is 32 bits wide, the low 32
bits of 'x' are correct. The high bits
can be anything.
|
|
|
|
This prevents an FE_INVALID exception when comparing with NaN.
|
|
Thanks to Jorge Acereda MaciĆ” for catching
this.
|
|
This was generated by csmith and then compiled
to qbe il by Michael Forney's C compiler.
|
|
|
|
|
|
|
|
Slots are stored as `int` in Fn, so use the same type in Tmp.
Rearrange the fields in Tmp slightly so that sizeof(Tmp) stays the same
(at least on 64-bit systems).
|
|
I had forgotten that %rip can only be
used as base when there is no index.
I also added a test which stresses
addressing selection with and without
constants.
|
|
We now emit correct code when the user
refers to a specific constant address.
I also made some comments clearer in
the instruction selection pass and got
rid of some apparently useless code.
|
|
|
|
In this case, the potential truncations
flagged by gcc are only affecting debug
information.
|
|
Michael Forney needs this to run his
compiler on interesting programs.
|
|
There is no flavor of mov which can set 8 bytes
of memory to a constant not representable as an
int32. The solution is simply to emit two movs
of 4 bytes each.
|
|
Previously, we would skip ssa construction when
a temporary has a single definition. This is
only part of the ssa invariant: we must also
check that all uses are dominated by the single
definition. The new code does this.
In fact, qbe does not store all the dominators
for a block, so instead of walking the idom
linked list we use a rough heuristic and declare
conservatively that B0 dominates B1 when one of
the two conditions is true:
a. B0 is the start block
b. B0 is B1
Some measurements on a big file from Michael
Forney show that the code is still as fast as
before this patch.
|
|
If an instruction does not have a result, the
variable `s` is not set. This could lead to a
bogus slot assignment.
|
|
|
|
On test/spill1.ssa, the stack frame of
the function f() goes from 56 bytes to
40 bytes. That's a reduction of close
to 30%.
This patch also opens the door to
folding operations on spill slots.
For example
movl $15, %r15d
addl -X(%rbp), %r15d
movl %r15d, -X(%rbp)
should become
add $15, -X(%rbp)
when %r15d is not used afterwards.
|
|
They are now linear and can be
safely used with arguments that
have side-effects. This patch
also introduces an iscall()
macro and uses it to fix a
missing check for Ovacall in
liveness analysis.
|
|
|
|
|
|
In fact, after spilling, a phi
can be a temporary or a slot.
I am now pondering whether this
is a good idea or not because
it causes annoying mem->mem
movs after register allocation.
|
|
|
|
|
|
If a temporary is assigned exactly
once (most are), there is no need
to do any work to put it in ssa
form.
On an input file of ~35k loc, this
makes the processing time go from
2.9 secs to 1.2 secs.
|
|
|
|
Remarks from Quentin:
It is an important decision to use Bot
and not Top as the result of 'x / 0'.
By using Bot, we refuse to give a warrant
to the compiler that would allow meaningless
subsequent decisions. An example follows.
Clang, on my computer, will build a program
which prints "Ho" when fed the following C:
int main()
{ puts(1/0 ? "Hi" : "Ho"); }
On the other hand, a C compiler based on
QBE will build a program which crashes,
as one would expect.
See also
https://c9x.me/notes/2014-09-10.html
|
|
The sparse data-flow analysis used for
copy elimination before this patch
could sometimes diverge. The core
reason for this behavior is that the
visitphi() function was not monotonic
in the following copy-of lattice:
top (represented as the temp
/ | \ itself)
x y z ...
\ | /
bot (represented as R)
This monotonicity defect could be
fixed by reverting 2f41ff03, but
then the pass would end up missing
some redundant phis.
This patch re-implements the pass
from scratch using a different
approach. The new algorithm should
get rid of all redundant copies. On
the other hand, it can run slower
than the monotonic sparse data-flow
analysis because, in the worst case,
an instruction in a phi cluster can
be visited as many times as there
are phis in the input program.
Thanks to Michael Forney for reviewing
and testing the new pass.
|
|
Before, amatch() would prefer matching
"o + b" to "o + s*i" and "b + s*i".
|
|
The numberer made some arranging choices when
numbering arguments of an instruction, but these
decisions were ignored when matching. The fix
is to reconcile numbering and matching.
|
|
|
|
Since ce0ab53ed7, we skip over predecessors that spilled the
temporary. However, if all predecessors spilled, then we might not have
an entry in `rl`, triggering an assertion failure in the following loop.
|