Age | Commit message (Collapse) | Author |
|
If an instruction does not have a result, the
variable `s` is not set. This could lead to a
bogus slot assignment.
|
|
|
|
On test/spill1.ssa, the stack frame of
the function f() goes from 56 bytes to
40 bytes. That's a reduction of close
to 30%.
This patch also opens the door to
folding operations on spill slots.
For example
movl $15, %r15d
addl -X(%rbp), %r15d
movl %r15d, -X(%rbp)
should become
add $15, -X(%rbp)
when %r15d is not used afterwards.
|
|
They are now linear and can be
safely used with arguments that
have side-effects. This patch
also introduces an iscall()
macro and uses it to fix a
missing check for Ovacall in
liveness analysis.
|
|
|
|
|
|
In fact, after spilling, a phi
can be a temporary or a slot.
I am now pondering whether this
is a good idea or not because
it causes annoying mem->mem
movs after register allocation.
|
|
|
|
|
|
If a temporary is assigned exactly
once (most are), there is no need
to do any work to put it in ssa
form.
On an input file of ~35k loc, this
makes the processing time go from
2.9 secs to 1.2 secs.
|
|
|
|
Remarks from Quentin:
It is an important decision to use Bot
and not Top as the result of 'x / 0'.
By using Bot, we refuse to give a warrant
to the compiler that would allow meaningless
subsequent decisions. An example follows.
Clang, on my computer, will build a program
which prints "Ho" when fed the following C:
int main()
{ puts(1/0 ? "Hi" : "Ho"); }
On the other hand, a C compiler based on
QBE will build a program which crashes,
as one would expect.
See also
https://c9x.me/notes/2014-09-10.html
|
|
The sparse data-flow analysis used for
copy elimination before this patch
could sometimes diverge. The core
reason for this behavior is that the
visitphi() function was not monotonic
in the following copy-of lattice:
top (represented as the temp
/ | \ itself)
x y z ...
\ | /
bot (represented as R)
This monotonicity defect could be
fixed by reverting 2f41ff03, but
then the pass would end up missing
some redundant phis.
This patch re-implements the pass
from scratch using a different
approach. The new algorithm should
get rid of all redundant copies. On
the other hand, it can run slower
than the monotonic sparse data-flow
analysis because, in the worst case,
an instruction in a phi cluster can
be visited as many times as there
are phis in the input program.
Thanks to Michael Forney for reviewing
and testing the new pass.
|
|
Before, amatch() would prefer matching
"o + b" to "o + s*i" and "b + s*i".
|
|
The numberer made some arranging choices when
numbering arguments of an instruction, but these
decisions were ignored when matching. The fix
is to reconcile numbering and matching.
|
|
|
|
Since ce0ab53ed7, we skip over predecessors that spilled the
temporary. However, if all predecessors spilled, then we might not have
an entry in `rl`, triggering an assertion failure in the following loop.
|
|
|
|
|
|
|
|
Thanks to Michael Forney for spotting this
oversight and providing the test case.
Note: because esc() leaves ABot unchanged,
the assertion "a->type == ABot" on line 122
remains valid.
|
|
Instead of systematically spilling any
temp live in an exit branch but not in
the part of the loop already processed,
only spill when it is already known to
have been spilled.
|
|
The worst one was that "part 3" of rega()
could break the critical invariant that
two interferring temporaries get assigned
different registers. This is fixed by
being careful when changing the register
of a temporary based on predecessor
blocks.
Thanks to Michael Forney for reporting
these bugs and helping with the analysis.
|
|
That was silly... I believe qbe still
managed to work because bitsets are only
used inside a basic block where rcopy()
is not used.
|
|
If a variable is spilled in a loop, the
spiller now tries to keep it spilled over
the whole loop.
Thanks to Michael Forney for sharing a test
case exhibiting a pathological reload.
|
|
|
|
Compiler warned about comparison between signed and unsigned values.
|
|
This commit adds missing quotation marks around the argument to the
function, and changes the value of `-x' option to `c` (lowercase) as per
GCC manual [1].
[1]: https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Overall-Options.html
|
|
|
|
|
|
In presence of jump loops, the algorithm would
create cycles in the disjoint-set data structure.
This caused infinite recursion and stack overflows.
|
|
The arm64 might have the same problem but it
is currently unable to handle them even in
instruction selection.
Thanks to Jean Dao for reporting the bug.
|
|
The stashing of constants in gas.c was also
changed to support 16-bytes constants.
|
|
|
|
It never worked until today.
|
|
|
|
Symbols in the source file are still limited in
length because the rest of the code assumes that
strings always fit in NString bytes.
Regardless, there is already a benefit because
comparing/copying symbol names does not require
using strcmp()/strcpy() anymore.
|
|
The previous heuristics were ad hoc and it was
hard to understand why they worked at all.
This patch can be summarized in three points:
1. When a register is freed (an instruction
assigns it), we try to find if a temporary
would like to be in it, and if we find one,
we move it in the newly freed register.
I call this an "eager move".
2. Temporaries now remember in what register
they were last allocated; this information
is stored in the field Tmp.visit, and
prevails on the field Tmp.hint when it is
set. (This makes having the same hint for
interfering temporaries not so disastrous.)
3. Blocks are now allocated in "onion" order,
from the innermost loop to the outermost.
This is the change I am the least sure
about; it should be evaluated thorougly.
|
|
I now take the view that a phi is "used" at the
end of all the predecessors. (Think that copies
are made to phis at the end of all predecessors.)
|
|
Before this commit, I tried to make sure that
two interfering temporaries never ended up in
the same phi class.
This was to make sure that their register hints
were not counterproductively stepping on each
other's toes. The idea is fine, but:
* the implementation is clumsy because it
mixes the orthogonal concepts of
(i) interference and (ii) phi classes;
* the hinting process in the register
allocator is hard to understand because
the disjoint-set data structure used for
phi classes is cut in arbitrary places.
After this commit, the phi classes *really* are
phi classes represented with a proper disjoint-set
data structure.
|
|
|
|
|
|
Since the environment can only be of type `l`, do
not require to write it down.
I also tightened the type information of the `pare`
and `arge` instructions.
|
|
|
|
|
|
|
|
Ori needs this. It should not cost much more memory at runtime,
only a minimal amount of address space.
|
|
The previous code was buggy. It would put a stack
pointer on the heap when handling "add $foo, 42".
The new code is more straightforward and hopefully
more correct. Only temporaries with a "stack"
alias class will have a slot pointer.
|
|
|
|
With the default toolchain, it looks like we have to
make sure all symbols are loaded using rip-relative
addressing.
|