Age | Commit message (Collapse) | Author |
|
The value argument of store instructions was
handled incorrectly.
|
|
This fixes similar bugs than the ones fixed
in the previous commit.
In the folding code the invariant is that
when a result is 32 bits wide, the low 32
bits of 'x' are correct. The high bits
can be anything.
|
|
|
|
This prevents an FE_INVALID exception when comparing with NaN.
|
|
Thanks to Jorge Acereda MaciĆ” for catching
this.
|
|
This was generated by csmith and then compiled
to qbe il by Michael Forney's C compiler.
|
|
|
|
|
|
|
|
Slots are stored as `int` in Fn, so use the same type in Tmp.
Rearrange the fields in Tmp slightly so that sizeof(Tmp) stays the same
(at least on 64-bit systems).
|
|
I had forgotten that %rip can only be
used as base when there is no index.
I also added a test which stresses
addressing selection with and without
constants.
|
|
We now emit correct code when the user
refers to a specific constant address.
I also made some comments clearer in
the instruction selection pass and got
rid of some apparently useless code.
|
|
|
|
In this case, the potential truncations
flagged by gcc are only affecting debug
information.
|
|
Michael Forney needs this to run his
compiler on interesting programs.
|
|
There is no flavor of mov which can set 8 bytes
of memory to a constant not representable as an
int32. The solution is simply to emit two movs
of 4 bytes each.
|
|
Previously, we would skip ssa construction when
a temporary has a single definition. This is
only part of the ssa invariant: we must also
check that all uses are dominated by the single
definition. The new code does this.
In fact, qbe does not store all the dominators
for a block, so instead of walking the idom
linked list we use a rough heuristic and declare
conservatively that B0 dominates B1 when one of
the two conditions is true:
a. B0 is the start block
b. B0 is B1
Some measurements on a big file from Michael
Forney show that the code is still as fast as
before this patch.
|
|
If an instruction does not have a result, the
variable `s` is not set. This could lead to a
bogus slot assignment.
|
|
|
|
On test/spill1.ssa, the stack frame of
the function f() goes from 56 bytes to
40 bytes. That's a reduction of close
to 30%.
This patch also opens the door to
folding operations on spill slots.
For example
movl $15, %r15d
addl -X(%rbp), %r15d
movl %r15d, -X(%rbp)
should become
add $15, -X(%rbp)
when %r15d is not used afterwards.
|
|
They are now linear and can be
safely used with arguments that
have side-effects. This patch
also introduces an iscall()
macro and uses it to fix a
missing check for Ovacall in
liveness analysis.
|
|
|
|
|
|
In fact, after spilling, a phi
can be a temporary or a slot.
I am now pondering whether this
is a good idea or not because
it causes annoying mem->mem
movs after register allocation.
|
|
|
|
|
|
If a temporary is assigned exactly
once (most are), there is no need
to do any work to put it in ssa
form.
On an input file of ~35k loc, this
makes the processing time go from
2.9 secs to 1.2 secs.
|
|
|
|
Remarks from Quentin:
It is an important decision to use Bot
and not Top as the result of 'x / 0'.
By using Bot, we refuse to give a warrant
to the compiler that would allow meaningless
subsequent decisions. An example follows.
Clang, on my computer, will build a program
which prints "Ho" when fed the following C:
int main()
{ puts(1/0 ? "Hi" : "Ho"); }
On the other hand, a C compiler based on
QBE will build a program which crashes,
as one would expect.
See also
https://c9x.me/notes/2014-09-10.html
|
|
The sparse data-flow analysis used for
copy elimination before this patch
could sometimes diverge. The core
reason for this behavior is that the
visitphi() function was not monotonic
in the following copy-of lattice:
top (represented as the temp
/ | \ itself)
x y z ...
\ | /
bot (represented as R)
This monotonicity defect could be
fixed by reverting 2f41ff03, but
then the pass would end up missing
some redundant phis.
This patch re-implements the pass
from scratch using a different
approach. The new algorithm should
get rid of all redundant copies. On
the other hand, it can run slower
than the monotonic sparse data-flow
analysis because, in the worst case,
an instruction in a phi cluster can
be visited as many times as there
are phis in the input program.
Thanks to Michael Forney for reviewing
and testing the new pass.
|
|
Before, amatch() would prefer matching
"o + b" to "o + s*i" and "b + s*i".
|
|
The numberer made some arranging choices when
numbering arguments of an instruction, but these
decisions were ignored when matching. The fix
is to reconcile numbering and matching.
|
|
|
|
Since ce0ab53ed7, we skip over predecessors that spilled the
temporary. However, if all predecessors spilled, then we might not have
an entry in `rl`, triggering an assertion failure in the following loop.
|
|
|
|
|
|
|
|
Thanks to Michael Forney for spotting this
oversight and providing the test case.
Note: because esc() leaves ABot unchanged,
the assertion "a->type == ABot" on line 122
remains valid.
|
|
Instead of systematically spilling any
temp live in an exit branch but not in
the part of the loop already processed,
only spill when it is already known to
have been spilled.
|
|
The worst one was that "part 3" of rega()
could break the critical invariant that
two interferring temporaries get assigned
different registers. This is fixed by
being careful when changing the register
of a temporary based on predecessor
blocks.
Thanks to Michael Forney for reporting
these bugs and helping with the analysis.
|
|
That was silly... I believe qbe still
managed to work because bitsets are only
used inside a basic block where rcopy()
is not used.
|
|
If a variable is spilled in a loop, the
spiller now tries to keep it spilled over
the whole loop.
Thanks to Michael Forney for sharing a test
case exhibiting a pathological reload.
|
|
|
|
Compiler warned about comparison between signed and unsigned values.
|
|
This commit adds missing quotation marks around the argument to the
function, and changes the value of `-x' option to `c` (lowercase) as per
GCC manual [1].
[1]: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Overall-Options.html
|
|
|
|
|
|
In presence of jump loops, the algorithm would
create cycles in the disjoint-set data structure.
This caused infinite recursion and stack overflows.
|
|
The arm64 might have the same problem but it
is currently unable to handle them even in
instruction selection.
Thanks to Jean Dao for reporting the bug.
|
|
The stashing of constants in gas.c was also
changed to support 16-bytes constants.
|