Age | Commit message (Collapse) | Author |
|
|
|
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
|
|
|
|
|
|
|
|
We now treat thread-local
symbols in Mems properly.
|
|
|
|
Non-store/load instructions were
not lowered correctly for thread-
local symbols. This is an attempt
at a fix (cannot test for now).
|
|
|
|
|
|
Thanks to Lassi Pulkkinen for
flagging the issue and pointing
me to Ulrich Drepper's extensive
doc [1].
[1] https://people.redhat.com/drepper/tls.pdf
|
|
This is consistent with newtmp()
and newcon().
|
|
|
|
|
|
|
|
|
|
During coalescing, the resizing/
reordering of the sl[] array
invalidates the indices stored
in the 'visit' field of temps;
we need to reset it before we
can use it again.
|
|
To match x86
|
|
This is necessary because, post
fusion, dead stores may clobber
data. A new test case exposes
one such situation.
|
|
|
|
|
|
|
|
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
|
|
|
|
When checking if two slices represent
the same range of memory we must check
that offsets match.
The bug was revealed by a harec test.
|
|
|
|
When multiple stack slots are coalesced
one 'alloc' instruction is kept in the il
and the other ones are removed and have
their uses replaced by the result of the
selected one. To produce valid ssa, it
must be ensured that the uses that get
replaced are dominated by the selected
'alloc' instruction. This patch ensures
dominance by moving the selected alloc up
in the start block as necessary.
|
|
We may well treat all rets as
non-escaping since stack slots
are destroyed upon funcion
return.
|
|
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
|
|
|
|
|
|
Some noisy assemblers complain
when asked to do it themselves.
|
|
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
|
|
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
gobal memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
|
|
|
|
When we process one block, we
start by allocating registers
for all the temporaries live
at the exit of the block.
Before this patch we processed
temps first, then in doblk() we
would mark globally live registers
allocated. This meant that temps
could get wrongly assigned a live
register.
The fix is simple: we now process
registers first at block exits,
then allocate temps.
|
|
The copy elimination pass is not
complete. This patch improves
things a bit, but I think we still
have quite a bit of incompleteness.
We now consistently mark phis with
all arguments identical as copies.
Previously, they were inconsistently
eliminated by phisimpl(). An example
where they were not eliminated is
the following:
@blk2
%a = phi @blk0 %x, @blk1 %x
jnz ?, @blk3, @blk4
@blk3
%b = copy %x
@blk4
%c = phi @blk2 %a, @blk3 %b
In this example, neither %c nor %a
were marked as copies of %x because,
when phisimpl() is called, the copy
information for %b is not available.
The incompleteness is still present
and can be observed by modifying
the example above so that %a takes
a copy of %x through a back-edge.
Then, phisimpl()'s lack of copy
information about %b will prevent
optimization.
|
|
This pass limits stack usage when
many small aggregates are allocated
on the stack. A fast liveness
analysis figures out which slots
interfere and the pass then fuses
slots that do not interfere. The
pass also kills stack slots that
are only ever assigned.
On the hare stdlib test suite, this
fusion pass managed to reduce the
total eligible slot bytes count
by 84%.
The slots considered for fusion
must not escape and not exceed
64 bytes in size.
|
|
We will be using it in the new
coalesce() pass.
|
|
The asserts (a->type == ABot) made it
impossible to run fillalias() multiple
times. We now reset the Alias.type field
of all temps before starting.
Getting rid of the asserts would have
been another option.
|
|
Stack slots may have padding
bytes, and if we want to have
precise liveness information
it's important that we are able
to tell them apart.
This patch extends fillalias()
to remember for every slot
what bytes were ever assigned.
In case the slot address does
not escape we know that only
these bytes matter.
To save space, we only store
this information if the slot
size is less than or equal to
NBit.
The Alias struct was reworked
a bit to save some space. I am
still not very satisfied with
its layout though.
|
|
|
|
We had the invariant that it'd
always be a temporary.
|
|
|
|
It is quite similar to arm64_apple.
Probably, the call that needs to be
generated also provides extra
invariants on top of the regular
abi, but I have not checked that.
Clang generates code that is a bit
neater than qbe's because, on x86,
a load can be fused in a call
instruction! We do not bother with
supporting these since we expect
only sporadic use of the feature.
For reference, here is what clang
might output for a store to the
second entry of a thread-local
array of ints:
movq _x@TLVP(%rip), %rdi
callq *(%rdi)
movl %ecx, 4(%rax)
|
|
It is documented nowhere how this is
supposed to work. It is also quite easy
to have assertion failures pop in the
linker when generating asm slightly
different from clang's!
The best source of information is found
in LLVM's source code (AArch64ISelLowering.cpp).
I paste it here for future reference:
/// Darwin only has one TLS scheme which must be capable of dealing with the
/// fully general situation, in the worst case. This means:
/// + "extern __thread" declaration.
/// + Defined in a possibly unknown dynamic library.
///
/// The general system is that each __thread variable has a [3 x i64] descriptor
/// which contains information used by the runtime to calculate the address. The
/// only part of this the compiler needs to know about is the first xword, which
/// contains a function pointer that must be called with the address of the
/// entire descriptor in "x0".
///
/// Since this descriptor may be in a different unit, in general even the
/// descriptor must be accessed via an indirect load. The "ideal" code sequence
/// is:
/// adrp x0, _var@TLVPPAGE
/// ldr x0, [x0, _var@TLVPPAGEOFF] ; x0 now contains address of descriptor
/// ldr x1, [x0] ; x1 contains 1st entry of descriptor,
/// ; the function pointer
/// blr x1 ; Uses descriptor address in x0
/// ; Address of _var is now in x0.
///
/// If the address of _var's descriptor *is* known to the linker, then it can
/// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
/// a slight efficiency gain.
The call 'blr x1' above is actually
special in that it trashes less registers
than what the abi would normally permit.
In qbe, I don't take advantage of this
and lower the call like a regular call.
We can revise this later on. Again, the
source for this information is LLVM's
source code:
// TLS calls preserve all registers except those that absolutely must be
// trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
// silly).
|
|
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
|
|
When emitting data detected as zero
the comment appeared before the data
directives were output.
|
|
|