Some noisy assemblers complain
when asked to do it themselves.
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
Unlike llvm, we do not specify
reaching such a terminator as
undefined behavior.
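Here is a toy C sketch of the difference. Executing hlt has exactly one defined outcome, a trap, instead of being undefined. The names (Term, THlt, execterm) are hypothetical, and this is not how qbe models terminators internally. A real backend would emit a trapping instruction rather than return a status.

```c
#include <assert.h>

/* Hypothetical, simplified block terminators (not qbe's enum). */
enum Term { TRet, TJmp, THlt };

enum Status { SOk, STrap };

/* Toy interpreter step: reaching hlt has a defined effect
 * (the program traps); it is never undefined behavior. */
static enum Status
execterm(enum Term t)
{
	switch (t) {
	case THlt:
		return STrap; /* a backend would emit a trapping insn */
	default:
		return SOk;
	}
}
```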
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
global memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
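Below is a minimal C sketch of such a struct. It assumes the name is stored as a C string. The field layout and the helper symeq are illustrative assumptions, not qbe's actual definitions. The sketch shows why the kind travels with the name: two constants that name the same symbol with different kinds denote different addresses, so equality must compare both.

```c
#include <assert.h>
#include <string.h>

/* Sketch: a symbol name packaged with where it must be
 * interpreted. Field names are assumptions. */
typedef struct Sym Sym;
struct Sym {
	enum {
		SGlo, /* name refers to global memory */
		SThr, /* name refers to thread-local storage */
	} type;
	const char *name;
};

/* Same symbol only if both the kind and the name agree. */
static int
symeq(Sym a, Sym b)
{
	return a.type == b.type && strcmp(a.name, b.name) == 0;
}
```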
When we process a block, we
start by allocating registers
for all the temporaries live
at the exit of the block.
Before this patch we processed
temps first, then in doblk() we
would mark globally live registers
as allocated. This meant that temps
could wrongly be assigned a live
register.
The fix is simple: we now process
registers first at block exits,
then allocate temps.
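The fixed ordering can be sketched with a register bitmask. The sketch assumes at most 64 registers and a lowest-free-register policy. pickreg and allocexit are hypothetical helpers, not qbe functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bits; /* one bit per machine register */

/* Lowest register not in `used`, or -1 if all are taken. */
static int
pickreg(bits used, int nreg)
{
	for (int r = 0; r < nreg; r++)
		if (!(used & ((bits)1 << r)))
			return r;
	return -1;
}

/* At a block exit: first mark the live registers (including the
 * globally live ones), then give each of the `ntmp` live-out temps
 * a free register, stored in out[]. Returns the final used set. */
static bits
allocexit(bits liveregs, int ntmp, int nreg, int *out)
{
	bits used = liveregs; /* step 1: registers first */
	for (int t = 0; t < ntmp; t++) { /* step 2: then temps */
		int r = pickreg(used, nreg);
		out[t] = r;
		if (r >= 0)
			used |= (bits)1 << r;
	}
	return used;
}
```

Because live registers are in `used` before any temp is placed, a temp can never land on one of them.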
The copy elimination pass is not
complete. This patch improves
things a bit, but I think we still
have quite a bit of incompleteness.
We now consistently mark phis with
all arguments identical as copies.
Previously, they were inconsistently
eliminated by phisimpl(). An example
where they were not eliminated is
the following:
    @blk2
        %a = phi @blk0 %x, @blk1 %x
        jnz ?, @blk3, @blk4
    @blk3
        %b = copy %x
    @blk4
        %c = phi @blk2 %a, @blk3 %b
In this example, neither %c nor %a
was marked as a copy of %x because,
when phisimpl() is called, the copy
information for %b is not available.
The incompleteness is still present
and can be observed by modifying
the example above so that %a takes
a copy of %x through a back-edge.
Then, phisimpl()'s lack of copy
information about %b will prevent
optimization.
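Here is a minimal C sketch of the phi test. Values are integer ids, and a hypothetical copyof array maps each value to the value it copies, or to itself. It replays the example above: %a resolves to %x at once, but %c does so only once the copy information for %b exists. resolve and phicopy are illustrative names and assume every phi has at least one argument.

```c
#include <assert.h>

/* Follow known copies; copyof[v] == v means v is not a copy. */
static int
resolve(const int *copyof, int v)
{
	while (copyof[v] != v)
		v = copyof[v];
	return v;
}

/* A phi is a copy when every argument resolves to the same
 * value; returns that value, or -1 if the arguments differ. */
static int
phicopy(const int *copyof, const int *arg, int narg)
{
	int v = resolve(copyof, arg[0]);
	for (int i = 1; i < narg; i++)
		if (resolve(copyof, arg[i]) != v)
			return -1;
	return v;
}
```

Number %x, %a, %b and %c as 0 to 3. Then phicopy on %c's arguments returns -1 until copyof[2] is set to 0, which mirrors phisimpl() running before %b's copy information is known.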
This pass limits stack usage when
many small aggregates are allocated
on the stack. A fast liveness
analysis figures out which slots
interfere and the pass then fuses
slots that do not interfere. The
pass also kills stack slots that
are only ever assigned.
On the hare stdlib test suite, this
fusion pass reduced the total byte
count of eligible slots by 84%.
The slots considered for fusion
must not escape and must not
exceed 64 bytes in size.
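Here is a rough C sketch of the idea under simplifying assumptions. Liveness is given as a 64-bit set of program points per slot, and each slot joins the first group whose live set it does not intersect. The Slot struct and fuseslots are hypothetical. The actual pass may use a different liveness representation and fusion order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot record; bit i of `live` is set when the
 * slot is live at program point i. */
typedef struct {
	int size;      /* bytes */
	int escapes;   /* address leaks: not eligible */
	int loaded;    /* 0 if the slot is only ever assigned */
	uint64_t live;
	int fused;     /* out: representative slot, -1 if killed */
} Slot;

/* Greedily fuse eligible, non-interfering slots (n <= 64).
 * Returns the bytes needed by the fused groups. */
static int
fuseslots(Slot *s, int n)
{
	uint64_t glive[64]; /* union of live sets per group */
	int gsize[64], grep[64], ng = 0, total = 0;

	assert(n <= 64);
	for (int i = 0; i < n; i++) {
		if (s[i].escapes || s[i].size > 64) {
			s[i].fused = i; /* ineligible: left alone */
			continue;
		}
		if (!s[i].loaded) {
			s[i].fused = -1; /* only assigned: kill it */
			continue;
		}
		int g;
		for (g = 0; g < ng; g++)
			if ((glive[g] & s[i].live) == 0)
				break; /* no interference with group g */
		if (g == ng) { /* open a new group */
			glive[g] = 0;
			gsize[g] = 0;
			grep[g] = i;
			ng++;
		}
		glive[g] |= s[i].live;
		if (s[i].size > gsize[g])
			gsize[g] = s[i].size;
		s[i].fused = grep[g];
	}
	for (int g = 0; g < ng; g++)
		total += gsize[g];
	return total;
}
```

Consider four eligible slots of 16, 32, 8 and 8 bytes (64 in total). The first two have disjoint lifetimes and share one 32-byte group. The third overlaps both and gets its own group. The fourth is never loaded and is killed, leaving 40 bytes.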
We will be using it in the new
coalesce() pass.
The asserts (a->type == ABot) made it
impossible to run fillalias() multiple
times. We now reset the Alias.type field
of all temps before starting.
Getting rid of the asserts would have
been another option.
Stack slots may have padding
bytes, and if we want precise
liveness information it is
important to tell them apart
from bytes that hold data.
This patch extends fillalias()
to remember, for every slot,
which bytes were ever assigned.
When the slot address does not
escape, we know that only these
bytes matter.
To save space, we only store
this information if the slot
size is less than or equal to
NBit.
The Alias struct was reworked
a bit to save some space. I am
still not very satisfied with
its layout though.
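Here is a sketch of the per-slot byte mask. It assumes NBit is 64, one bit per byte in a uint64_t. SlotBytes and markstore are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bits;
#define NBit ((int)(8 * sizeof(bits))) /* 64 in this sketch */

/* Bit i of `mask` is set once byte i of the slot was stored. */
typedef struct {
	int size;
	bits mask;
} SlotBytes;

/* Record a store of `len` bytes at offset `off`. Slots larger
 * than NBit bytes are not tracked at all. */
static void
markstore(SlotBytes *s, int off, int len)
{
	if (s->size > NBit || off < 0)
		return;
	for (int i = off; i < off + len && i < s->size; i++)
		s->mask |= (bits)1 << i;
}
```

Take an 8-byte slot holding struct { char c; int i; } on a typical 64-bit target. Storing c and i leaves the mask at 0xF1, so padding bytes 1 to 3 are never marked.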
We had the invariant that it'd
always be a temporary.
It is quite similar to arm64_apple.
Probably, the call that needs to be
generated also provides extra
invariants on top of the regular
abi, but I have not checked that.
Clang generates code that is a bit
neater than qbe's because, on x86,
a load can be fused into a call
instruction! We do not bother
supporting this since we expect
only sporadic use of the feature.
For reference, here is what clang
might output for a store to the
second entry of a thread-local
array of ints:
    movq _x@TLVP(%rip), %rdi
    callq *(%rdi)
    movl %ecx, 4(%rax)
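For orientation, here is a plausible C source for that access; the names are illustrative. On amd64 macOS, clang would reach x through the TLV descriptor call above. Other platforms compile the same source to their own TLS access sequences.

```c
#include <assert.h>

/* A thread-local array of ints. */
static _Thread_local int x[4];

/* Store to the second entry (byte offset 4 for 4-byte ints). */
static void
setsecond(int v)
{
	x[1] = v;
}

static int
getsecond(void)
{
	return x[1];
}
```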
How this is supposed to work is not
documented anywhere. It is also quite
easy to have assertion failures pop up
in the linker when generating asm that
is slightly different from clang's!
The best source of information is
LLVM's source code (AArch64ISelLowering.cpp).
I paste it here for future reference:
/// Darwin only has one TLS scheme which must be capable of dealing with the
/// fully general situation, in the worst case. This means:
/// + "extern __thread" declaration.
/// + Defined in a possibly unknown dynamic library.
///
/// The general system is that each __thread variable has a [3 x i64] descriptor
/// which contains information used by the runtime to calculate the address. The
/// only part of this the compiler needs to know about is the first xword, which
/// contains a function pointer that must be called with the address of the
/// entire descriptor in "x0".
///
/// Since this descriptor may be in a different unit, in general even the
/// descriptor must be accessed via an indirect load. The "ideal" code sequence
/// is:
/// adrp x0, _var@TLVPPAGE
/// ldr x0, [x0, _var@TLVPPAGEOFF] ; x0 now contains address of descriptor
/// ldr x1, [x0] ; x1 contains 1st entry of descriptor,
/// ; the function pointer
/// blr x1 ; Uses descriptor address in x0
/// ; Address of _var is now in x0.
///
/// If the address of _var's descriptor *is* known to the linker, then it can
/// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
/// a slight efficiency gain.
The call 'blr x1' above is actually
special in that it trashes fewer registers
than the abi would normally permit.
In qbe, I don't take advantage of this
and lower it like a regular call.
We can revise this later on. Again, the
source for this information is LLVM's
source code:
// TLS calls preserve all registers except those that absolutely must be
// trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
// silly).
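Here is a C simulation of the descriptor protocol from the quoted comment. The descriptor has three pointer-sized words, and the first is a function called with the descriptor's own address. The names TLVDesc, fakethunk and tlvaddr, and the meaning given to the other two words, are assumptions for illustration. On Darwin the real thunk comes from the runtime.

```c
#include <assert.h>

/* Simulated descriptor: word 0 is the function pointer; `key`
 * and `offset` are assumed meanings for words 1 and 2. */
typedef struct TLVDesc TLVDesc;
struct TLVDesc {
	void *(*thunk)(TLVDesc *);
	unsigned long key;
	unsigned long offset;
};

/* Stand-in for the current thread's TLS block. */
static char tlsblock[64];

/* Fake runtime thunk: returns the variable's address. */
static void *
fakethunk(TLVDesc *d)
{
	return tlsblock + d->offset;
}

/* Mirrors `ldr x1, [x0]; blr x1`: load the first word of the
 * descriptor and call it with the descriptor's address. */
static void *
tlvaddr(TLVDesc *d)
{
	return d->thunk(d);
}
```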
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
When emitting data detected as zero,
the comment appeared before the data
directives were output.
The apple targets are not done yet.
Should make qbe work on Apple
ARM-based hardware.
The general idea is to give abis a
chance to talk before we've done all
the optimizations. Currently, all
targets eliminate {par,arg,ret}{sb,ub,...}
during this pass. The forthcoming
arm64_apple will, however, insert
proper extensions during abi0.
Going forward, abis can, for example,
lower the passing of small aggregates
there so that memory optimizations
interact better with function calls.
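What "proper extensions" compute can be sketched in C as follows. The enum mirrors the sb/ub/sh/uh suffixes, but extarg is a hypothetical helper. It assumes sub-word integers are extended to 32 bits. That is our understanding of the arm64_apple convention and is not stated in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Sub-word argument kinds, after the {sb,ub,sh,uh} suffixes. */
enum Ext { Esb, Eub, Esh, Euh };

/* Sign- or zero-extend the low bits of `raw` to 32 bits. */
static uint32_t
extarg(enum Ext k, uint32_t raw)
{
	switch (k) {
	case Esb: return (uint32_t)(int32_t)(int8_t)raw;
	case Eub: return (uint8_t)raw;
	case Esh: return (uint32_t)(int32_t)(int16_t)raw;
	case Euh: return (uint16_t)raw;
	}
	return raw;
}
```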
E.g., data $a = { w $b $c }
We have a uint alias that we use
everywhere else. I also added a
todo about unhandled large offsets
in arm64/emit.
This generates tidier code and is
PIC-friendly because it lets the linker
trampoline calls to dynlinked libs.
On arm64 machines, apple support is more
than assembly syntax, and apple syntax is
currently useless on every target except
amd64. Rather than having a -G option that
only makes sense with amd64, we add a new
target, amd64_apple.
- update the test generation script to
match some manual changes
- fix some variadic calls to printf
- add a test case where an odd number of
slots is used on the stack before varargs
When qbe is used together with other tools, it is hard to tell
which tool generated an error. Adding an identifier at the
beginning of each error line makes it much easier to identify
the tool generating the error.
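Here is a minimal C sketch of prefixed error formatting. It writes into a buffer rather than to stderr so the result can be checked. The prefix string "qbe:" and the name fmterr are assumptions, not qbe's actual err() implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format an error message that starts with a tool identifier. */
static void
fmterr(char *buf, size_t n, const char *fmt, ...)
{
	va_list ap;
	int k;

	k = snprintf(buf, n, "qbe:"); /* assumed prefix */
	if (k < 0 || (size_t)k >= n)
		return; /* buffer too small for the prefix */
	va_start(ap, fmt);
	vsnprintf(buf + k, n - (size_t)k, fmt, ap);
	va_end(ap);
}
```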
The POSIX specification states:
string1 = [string2]
...
Macro expansions in string1 of macro definition lines shall
be evaluated when read. Macro expansions in string2 of macro
definition lines shall be performed when the macro identified
by string1 is expanded in a rule or command.
This means that recursive macro expansion is not guaranteed to work in
a portable Make. Also, since make is a declarative language, it makes
more sense to declare targets directly as a primary concern instead of
deriving them from an informational macro like SRC that is only used
in a rule command.
On Gentoo, cc may be absent to make sure the right compiler is picked,
for example when clang is preferred or when cross-compiling.
Makefile now compatible with gmake, bmake, smake and pdpmake.
This may cause invalid assembly to be generated
and is not all that useful anyway after constant
folding has run.
We were redundantly checking cardinality in a
way that prevented fp regs from ever being
globally live. We now check that the live
regs after a return are exactly the globally
live ones.
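The new check amounts to set equality on register masks. The sketch uses a hypothetical numbering in which bits 16 and up are fp registers. retliveok and FPREG are illustrative names. An equality test treats fp registers exactly like integer ones, so it cannot exclude them.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t bits; /* one bit per register */

/* Hypothetical numbering: bits 0-15 int, 16 and up fp. */
#define FPREG(i) ((bits)1 << (16 + (i)))

/* After a return, the live registers must be exactly the
 * globally live ones, whatever their class. */
static int
retliveok(bits live, bits rglob)
{
	return live == rglob;
}
```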