Age | Commit message | Author |
|
Support "file" and "loc" directives. "file" takes a string (a file name),
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
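A minimal C sketch of the bookkeeping described above, not the
actual implementation. The names dirfile() and dirloc(), the fixed
table size, the reuse of a number for a file name seen before, and
the ".loc <file> <line>" output format are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MaxFiles = 64 };              /* hypothetical fixed capacity */

static const char *files[MaxFiles];  /* recorded names; number = index + 1 */
static int nfiles;
static int curfile;                  /* number of the current file, 0 if none */
static char locbuf[64];

/* "file": record the name (reusing its number if it was seen before),
 * make it the current file, and return its number. */
int
dirfile(const char *name)
{
	int i;

	for (i = 0; i < nfiles; i++)
		if (strcmp(files[i], name) == 0)
			return curfile = i + 1;
	assert(nfiles < MaxFiles);
	files[nfiles++] = name;
	return curfile = nfiles;
}

/* "loc": format location information for a line, referring to the
 * current file by number. The result lives in a static buffer. */
const char *
dirloc(int line)
{
	assert(curfile > 0);
	snprintf(locbuf, sizeof locbuf, ".loc %d %d", curfile, line);
	return locbuf;
}
```

With this sketch, dirfile("a.c") followed by dirloc(7) formats a
location that refers to file number 1.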
|
|
We now treat thread-local
symbols in Mems properly.
|
|
Non-store/load instructions were
not lowered correctly for thread-
local symbols. This is an attempt
at a fix (cannot test for now).
|
|
|
|
Thanks to Lassi Pulkkinen for
flagging the issue and pointing
me to Ulrich Drepper's extensive
doc [1].
[1] https://people.redhat.com/drepper/tls.pdf
|
|
This is consistent with newtmp()
and newcon().
|
|
|
|
|
|
|
|
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
|
|
|
|
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
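As an illustration only, here is how a signed value can be fetched
from a narrow unsigned bit-field. The 3-bit tag / 29-bit payload
layout, the RSlot tag value, and the names slot() and slotval() are
assumptions of this sketch.

```c
#include <assert.h>

typedef struct Ref Ref;
struct Ref {
	unsigned type:3;   /* assumed 3-bit tag */
	unsigned val:29;   /* assumed 29-bit payload */
};

enum { RSlot = 1 };        /* hypothetical tag value */

/* Build a slot reference from a signed offset (hypothetical helper). */
Ref
slot(int n)
{
	return (Ref){RSlot, (unsigned)n & 0x1fffffff};
}

/* Fetch the payload as a signed int by sign-extending from bit 28:
 * flipping the sign bit and subtracting its weight maps the unsigned
 * range [0, 2^29) onto [-2^28, 2^28). */
int
slotval(Ref r)
{
	assert(r.type == RSlot);
	return (int)(r.val ^ 0x10000000u) - 0x10000000;
}
```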
|
|
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
global memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
|
|
It is quite similar to arm64_apple.
Probably, the call that needs to be
generated also provides extra
invariants on top of the regular
abi, but I have not checked that.
Clang generates code that is a bit
neater than qbe's because, on x86,
a load can be fused in a call
instruction! We do not bother with
supporting these since we expect
only sporadic use of the feature.
For reference, here is what clang
might output for a store to the
second entry of a thread-local
array of ints:
movq _x@TLVP(%rip), %rdi
callq *(%rdi)
movl %ecx, 4(%rax)
|
|
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
|
|
|
|
The apple targets are not done yet.
|
|
|
|
The general idea is to give abis a
chance to talk before we've done all
the optimizations. Currently, all
targets eliminate {par,arg,ret}{sb,ub,...}
during this pass. The forthcoming
arm64_apple will, however, insert
proper extensions during abi0.
Moving forward, abis can, for example,
lower small-aggregate passing there
so that memory optimizations can
interact better with function calls.
|
|
apple support is more than assembly syntax
in case of arm64 machines, and apple syntax
is currently useless in all cases but amd64;
rather than having a -G option that only
makes sense with amd64, we add a new target
amd64_apple
|
|
This may cause invalid assembly to be generated
and is not all that useful anyway after constant
folding has run.
|
|
|
|
|
|
I also moved some isel logic
that would have been repeated
a third time in util.c.
|
|
The risc-v abi needs to know if a
type is defined as a union or not.
We cannot use nunion to obtain this
information because the risc-v abi
made the unfortunate decision of
treating
union { int i; }
differently from
int i;
So, instead, I introduce a single
bit flag 'isunion'.
|
|
|
|
|
|
|
|
amd64 lacks an instruction for this so it has to be implemented with
float -> signed casts. The approach is borrowed from llvm.
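Assuming "this" refers to float -> unsigned conversion, the idea can
be sketched in C as follows. This is a branching illustration for
doubles and 64-bit results; the name dtoul() is hypothetical and the
branch-free sequence an instruction selector would emit may look
different.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a double to uint64_t using only double -> int64_t casts.
 * Only inputs with 0 <= d < 2^64 are handled. */
uint64_t
dtoul(double d)
{
	const double two63 = 9223372036854775808.0;  /* 2^63 */

	assert(d >= 0 && d < 2 * two63);
	if (d < two63)
		return (uint64_t)(int64_t)d;  /* fits in a signed cast */
	/* Too large for a signed cast: shift the value down by 2^63
	 * (exact in this range), convert, then set the top bit again. */
	return (uint64_t)(int64_t)(d - two63) | (UINT64_C(1) << 63);
}
```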
|
|
amd64 lacks an instruction for this so it has to be implemented with
signed -> float casts:
- Word casting is done by zero-extending the word to a long and then doing
a regular signed cast.
- Long casting is done by dividing by two with correct rounding if the
highest bit is set and casting that to float, then adding
1 to the exponent with integer addition (which doubles the result).
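The two cases can be sketched in C for doubles. The names uwtof()
and ultof() are hypothetical, and the sketch assumes IEEE 754
binary64, where the exponent field starts at bit 52. The integer
addition of 1 << 52 bumps the exponent field by one, which doubles
the halved value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word case: zero-extend to a long, then do a regular signed cast. */
double
uwtof(uint32_t w)
{
	return (double)(int64_t)(uint64_t)w;
}

/* Long case: halve when the top bit is set, cast, then double. */
double
ultof(uint64_t l)
{
	uint64_t isbig, halved, bits;
	double d;

	isbig = l >> 63;                      /* 1 if the highest bit is set */
	halved = (l >> isbig) | (l & isbig);  /* keep the shifted-out bit sticky */
	assert(halved <= INT64_MAX);
	d = (double)(int64_t)halved;          /* now safe as a signed cast */
	memcpy(&bits, &d, sizeof bits);
	bits += isbig << 52;                  /* exponent + 1, i.e. d * 2 */
	memcpy(&d, &bits, sizeof d);
	return d;
}
```

OR-ing the shifted-out bit back in is what makes the rounding
correct: without it, an input such as 2^63 + 1025 would halve to an
exact tie and round the wrong way.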
|
|
Necessary for floating-point negation, because
`%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
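The sign problem is easy to reproduce in C under the default
round-to-nearest mode. The function names below are made up for the
demonstration.

```c
#include <math.h>

/* Negation written as a subtraction from zero, like `sub 0, %operand`. */
double
negsub(double x)
{
	return 0.0 - x;
}

/* True negation flips the sign bit, including for zeros. */
double
negflip(double x)
{
	return -x;
}
```

For x = 0.0, negsub() yields +0 while negflip() yields -0; for
nonzero finite inputs the two agree.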
|
|
parseref() has code to reuse address constants, but this is not
done in other passes such as fold or isel. Introduce a new function
newcon() which takes a Con and returns a Ref for that constant, and
use this whenever creating address constants.
This is necessary to fix folding of address constants when one
operand is already folded. For example, in
%a =l add $x, 1
%b =l add %a, 2
%c =w loadw %b
%a and %b were folded to $x+1 and $x+3 respectively, but then the
second add is visited again since it uses %a. This gets folded to
$x+3 as well, but as a new distinct constant. This results in %b
getting labeled as bottom instead of either constant, disabling the
replacement of %b by a constant in subsequent instructions (such
as the loadw).
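The effect of reusing constants can be sketched with a small
interning table in C. The Con layout, the fixed pool size, and the
names intern() and addoff() are assumptions of this sketch, not the
real newcon() code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MaxCon = 128 };   /* hypothetical fixed pool size */

typedef struct Con Con;
struct Con {
	char sym[32];    /* symbol name, e.g. "x" for $x */
	int64_t off;     /* offset added to the symbol's address */
};

static Con pool[MaxCon];
static int ncon;

/* Return the pool index for c, reusing an existing entry when an
 * identical constant was created earlier. */
int
intern(const Con *c)
{
	int i;

	for (i = 0; i < ncon; i++)
		if (pool[i].off == c->off
		&& strcmp(pool[i].sym, c->sym) == 0)
			return i;
	assert(ncon < MaxCon);
	pool[ncon] = *c;
	return ncon++;
}

/* Fold "add <constant>, k" into a constant, going through intern()
 * so that folding the same add twice gives the same index. */
int
addoff(int ref, int64_t k)
{
	Con c;

	assert(ref >= 0 && ref < ncon);
	c = pool[ref];
	c.off += k;
	return intern(&c);
}
```

Folding $x+1 plus 2 twice now yields the same index both times, so
a lattice that compares constants by identity sees one constant
instead of two.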
|
|
If the size of the struct is not a multiple of 8, the actual struct
size may be different from the size reserved on the stack.
This fixes the case where the struct is passed in memory, but we
still may over-read a struct passed in registers. A TODO is added
for now.
|
|
Some abis, like the riscv one, treat
arguments differently depending on
whether they are variadic or not.
To prepare for the upcoming riscv
target, we change the variadic call
syntax and give meaning to the
location of the '...' marker.
# new syntax
%ret =w call $f(w %regular, ..., w %variadic)
By nature of their abis, the change
is backwards compatible for existing
targets.
|
|
Env calls were dysfunctional from the
start. This fixes them on amd64, but
they remain to be done on arm64. A new
test shows how to use them.
|
|
|
|
On x86_64, ucomis[sd] sets ZF=1, PF=0, CF=0 for equal arguments.
However, if the arguments are unordered it sets ZF=1, PF=1, CF=1,
and there is no jump/flag instruction for ZF=1 & PF=0 or ZF=1 & CF=0.
So, in order to correctly implement ceq[sd] on x86_64, we need to
be a bit more creative. There are several options available, depending
on whether the result of ceq[sd] is used with jnz, or with other
instructions, or both.
If the result is used for a conditional jump, both gcc and clang
use a combination of jp and jnz:
ucomisd %xmm1, %xmm0
jp .Lfalse
jnz .Lfalse
...
.Lfalse:
If the result is used in other instructions or return, gcc does the
following for x == y:
ucomisd %xmm1, %xmm0
setnp %al
movzbl %al, %eax
movl $0, %edx
cmovne %edx, %eax
This sets EAX to 1 if PF=0, then uses cmovne to clear it if ZF=0. It
also takes care to avoid clobbering the flags register in case the
result is also used for a conditional jump. Implementing this
approach in QBE would require adding an architecture-specific
instruction for cmovne.
In contrast, clang does an additional compare, this time using
cmpeqsd instead of ucomisd:
cmpeqsd %xmm1, %xmm0
movq %xmm0, %rax
andl $1, %eax
The cmpeqsd instruction does a floating point equality test, setting
XMM0 to all 1s if they are equal and all 0s if they are not. However,
we need the result in a non-XMM register, so it moves the result
back then masks off all but the first bit.
Both of these approaches are a bit awkward to implement in QBE, so
instead, this commit does the following:
ucomisd %xmm1, %xmm0
setz %al
movzbl %al, %eax
setnp %cl
movzbl %cl, %ecx
andl %ecx, %eax
This sets the result by anding the two flags, but has a side effect
of clobbering the flags register. This was a problem in one of my
earlier patches to fix this issue[0], in addition to being more
complex than I'd hoped.
Instead, this commit always leaves the ceq[sd] instruction in the
block, even if the result is only used to control a jump, so that
the above instruction sequence is always used. Then, since we now
have ZF=!(ZF=1 & PF=0) for x == y, or ZF=!(ZF=0 | PF=1) for x != y,
we can use jnz for the jump instruction.
[0] https://git.sr.ht/~sircmpwn/qbe/commit/64833841b18c074a23b4a1254625315e05b86658
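The flag logic of the chosen sequence can be modeled in C. The
Flags struct and the function names are made up for this model;
ucomisd() here only mimics the flag settings listed in this log and
is not a real intrinsic.

```c
#include <math.h>

typedef struct Flags Flags;
struct Flags {
	int zf, pf, cf;
};

/* Flags produced when comparing x with y, per the cases above. */
Flags
ucomisd(double x, double y)
{
	if (isnan(x) || isnan(y))
		return (Flags){1, 1, 1};  /* unordered */
	if (x == y)
		return (Flags){1, 0, 0};
	if (x < y)
		return (Flags){0, 0, 1};
	return (Flags){0, 0, 0};          /* x > y */
}

/* The setz / setnp / andl sequence: equal only if ZF=1 and PF=0. */
int
ceqd(double x, double y)
{
	Flags f = ucomisd(x, y);
	int z = f.zf;    /* setz  */
	int np = !f.pf;  /* setnp */

	return z & np;   /* andl  */
}

/* Naive version that looks at ZF only; wrong for unordered operands. */
int
ceqd_zf(double x, double y)
{
	return ucomisd(x, y).zf;
}
```

In this model ceqd_zf(NAN, 1.0) reports equality while
ceqd(NAN, 1.0) does not.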
|
|
When the two operands are Unordered (for instance if one of them
is NaN), ucomisd sets ZF=1, PF=1, and CF=1. When the result is
LessThan, it sets ZF=0, PF=0, and CF=1.
However, jb[e]/setb[e] only checks that CF=1 [or ZF=1] which causes
the result to be true for unordered operands.
To fix this, change the operand swap condition for these two floating
point comparison types: always rewrite x < y as y > x, and never
rewrite x > y as y < x.
Add a test to check the result of cltd, cled, cgtd, cged, ceqd, and
cned with arguments that are LessThan, Equal, GreaterThan, and
Unordered. Additionally, check three different implementations for
equality testing: one that uses the result of ceqd directly, one
that uses the result to control a conditional jump, and one that
uses the result both as a value and for a conditional jump. For
now, unordered equality tests are still broken so they are disabled.
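A small C model of why the swap matters for x < y. The Flags
struct and function names are hypothetical; the flag values follow
the description above, and seta is modeled as CF=0 and ZF=0.

```c
#include <math.h>

typedef struct Flags Flags;
struct Flags {
	int zf, pf, cf;
};

/* Flags produced when comparing a with b. */
Flags
ucomisd(double a, double b)
{
	if (isnan(a) || isnan(b))
		return (Flags){1, 1, 1};  /* Unordered */
	if (a == b)
		return (Flags){1, 0, 0};  /* Equal */
	if (a < b)
		return (Flags){0, 0, 1};  /* LessThan */
	return (Flags){0, 0, 0};          /* GreaterThan */
}

/* x < y using setb on (x, y): only checks CF=1, so it is also
 * true for unordered operands. */
int
cltd_setb(double x, double y)
{
	return ucomisd(x, y).cf;
}

/* x < y rewritten as y > x, then seta: requires CF=0 and ZF=0,
 * which is false for unordered operands. */
int
cltd_seta(double x, double y)
{
	Flags f = ucomisd(y, x);

	return !f.cf && !f.zf;
}
```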
|
|
The negate trick is unnecessary and broken when the first arg is the
result.
|
|
Reported by Alessandro Mantovani.
These addresses are likely bogus, but
they triggered an unwarranted assertion
failure. We now raise a civilized error.
|
|
Reported by Alessandro Mantovani.
Although unlikely in real programs, it
was found that using the address of a
fast local in amd64 shifts triggers
assertion failures.
We now err when the shift count is
given by an address; but we allow
shifting an address.
|
|
Reported by Alessandro Mantovani.
Unlikely to be hit in practice
because we don't add addresses to
addresses.
type :biggie = { l, l, l }
function $repro(:biggie %p) {
@start
%x =l add %p, $a
storew 42, %x
ret
}
|
|
selcmp may potentially swap the arguments and return 1 indicating
that the opposite operation should be used. However, if the compare
result is used for a conditional jump as well as elsewhere, the
original compare op is used instead of the opposite.
To fix this, add a check to see whether the opposite compare should
be used, regardless of whether selcmp() is done now, or later on
during sel().
Bug report and test case from Charlie Stanton.
|
|
|
|
Otherwise, we may end up using an integer and floating class for the
same register, triggering an assertion failure:
qbe: rega.c:215: pmrec: Assertion `KBASE(pm[i].cls) == KBASE(*k)' failed.
Test case:
type :T = { s }
export
function $d(:T %.1, s %.2) {
@start
call $c(s %.2)
ret
}
|
|
|
|
|
|
The same functionality can be implemented
naturally in the cfg simplification pass.
|
|
Previously, each ret would lead to an
epilog. This caused bloat for large
functions with multiple return points.
|