If registers spill onto the stack, we may end up with SSA like
S320 =l copy 0
after rega(). Handle this case in arm64 emit().
|
|
Some abis, like the riscv one, treat
arguments differently depending on
whether they are variadic or not.
To prepare for the upcoming riscv
target, we change the variadic call
syntax and give meaning to the
location of the '...' marker.
# new syntax
%ret =w call $f(w %regular, ..., w %variadic)
By nature of their abis, the change
is backwards compatible for existing
targets.
|
|
Env calls were dysfunctional from the
start. This fixes them on amd64, but
they remain to be done on arm64. A new
test shows how to use them.
|
|
The documentation states that loadw is syntactic sugar for loadsw,
but it actually got parsed as Oload. If the result is an l temporary,
Oload behaves like Oloadl, not Oloadsw.
To fix this, parse Tloadw as Oloadsw explicitly.
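To illustrate why the distinction matters for an l result, here is a minimal C sketch. The helper names are hypothetical and are not QBE code; they only model what a 64-bit result would hold after a sign-extending 32-bit load versus a full 64-bit load from the same address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not QBE code: they model the value an `l`
 * (64-bit) temporary holds after each kind of load. */

/* Models loadsw: read 32 bits and sign-extend them to 64 bits. */
static int64_t model_loadsw(const void *p)
{
	int32_t w;

	memcpy(&w, p, sizeof w);
	return (int64_t)w;
}

/* Models loadl: read all 64 bits stored at the address. */
static int64_t model_loadl(const void *p)
{
	int64_t l;

	memcpy(&l, p, sizeof l);
	return l;
}
```

With a 32-bit -1 followed by a 32-bit 0 in memory, the first helper yields -1 while the second picks up the neighbouring word as well and yields a different value.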
|
|
Different architectures use different types for va_list:
x86_64 uses a 1-length array of struct type[0]:
typedef struct {
unsigned int gp_offset;
unsigned int fp_offset;
void *overflow_arg_area;
void *reg_save_area;
} va_list[1];
aarch64 uses a struct type[1]:
typedef struct {
void *__stack;
void *__gr_top;
void *__vr_top;
int __gr_offs;
int __vr_offs;
} va_list;
Consequently, C functions which take a va_list as an argument,
such as vprintf, may pass va_list in different ways depending on
the architecture.
On x86_64, va_list is an array type, so the parameter decays to a pointer
and passing the address of the va_list is correct.
On aarch64, the va_list struct is passed by value, but since it is
larger than 16 bytes, the parameter is replaced with a pointer to
caller-allocated memory. Thus, passing the address as an l argument
happens to work.
However, this pattern of passing the address of the va_list to
vprintf doesn't extend to other architectures. On riscv64, va_list
is defined as[2]
typedef void *va_list;
which is *not* passed by reference. This means that tests that call
vprintf using the address of a va_list (vararg1 and vararg2) will
not work on riscv.
To fix this while keeping the tests architecture-neutral, add a
small wrapper function to the driver which takes a va_list *, and
let the C compiler deal with the details of passing va_list by
value.
[0] https://c9x.me/compile/bib/abi-x64.pdf#figure.3.34
[1] https://c9x.me/compile/bib/abi-arm64.pdf#%5B%7B%22num%22%3A63%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C52%2C757%2C0%5D
[2] https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#va_list-va_start-and-va_arg
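A minimal sketch of such a wrapper, assuming hypothetical names (print_va and call_print_va are illustrative and not taken from the test driver). The second function only stands in for the variadic IL function under test, so that the sketch is self-contained.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: the caller passes the address of its va_list,
 * and the C compiler passes *ap to vprintf using whatever convention
 * the target ABI requires. */
int print_va(const char *fmt, va_list *ap)
{
	return vprintf(fmt, *ap);
}

/* Stand-in for the variadic function under test (assumption). */
int call_print_va(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = print_va(fmt, &ap);
	va_end(ap);
	return n;
}
```

Because only a pointer crosses the boundary, the caller never needs to know whether va_list is an array, a struct, or a plain pointer on the target.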
|
|
Though I am not aware of any architecture where this matters, it
is technically incorrect to call these stdio functions as if they
had no result.
The QBE documentation says
> Unless the called function does not return a value, a return
> temporary must be specified, even if it is never used afterwards.
so we should follow it in the tests as well.
|
|
On both amd64 & arm64, the jumps
making it to rega won't have any
argument.
|
|
On x86_64, ucomis[sd] sets ZF=1, PF=0, CF=0 for equal arguments.
However, if the arguments are unordered it sets ZF=1, PF=1, CF=1,
and there is no jump/flag instruction for ZF=1 & PF=0 or ZF=1 & CF=0.
So, in order to correctly implement ceq[sd] on x86_64, we need to
be a bit more creative. There are several options available, depending
on whether the result of ceq[sd] is used with jnz, or with other
instructions, or both.
If the result is used for a conditional jump, both gcc and clang
use a combination of jp and jnz:
ucomisd %xmm1, %xmm0
jp .Lfalse
jnz .Lfalse
...
.Lfalse:
If the result is used in other instructions or return, gcc does the
following for x == y:
ucomisd %xmm1, %xmm0
setnp %al
movzbl %al, %eax
movl $0, %edx
cmovne %edx, %eax
This sets EAX to !PF (1 when PF=0), then uses cmovne to clear it if ZF=0. It
also takes care to avoid clobbering the flags register in case the
result is also used for a conditional jump. Implementing this
approach in QBE would require adding an architecture-specific
instruction for cmovne.
In contrast, clang does an additional compare, this time using
cmpeqsd instead of ucomisd:
cmpeqsd %xmm1, %xmm0
movq %xmm0, %rax
andl $1, %eax
The cmpeqsd instruction does a floating-point equality test, setting
XMM0 to all 1s if the operands are equal and all 0s if they are not. However,
we need the result in a non-XMM register, so it moves the result
back then masks off all but the first bit.
Both of these approaches are a bit awkward to implement in QBE, so
instead, this commit does the following:
ucomisd %xmm1, %xmm0
setz %al
movzbl %al, %eax
setnp %cl
movzbl %cl, %ecx
andl %ecx, %eax
This sets the result by anding the two flags, but has a side effect
of clobbering the flags register. This was a problem in one of my
earlier patches to fix this issue[0], in addition to being more
complex than I'd hoped.
Instead, this commit always leaves the ceq[sd] instruction in the
block, even if the result is only used to control a jump, so that
the above instruction sequence is always used. Then, since we now
have ZF=!(ZF=1 & PF=0) for x == y, or ZF=!(ZF=0 | PF=1) for x != y,
we can use jnz for the jump instruction.
[0] https://git.sr.ht/~sircmpwn/qbe/commit/64833841b18c074a23b4a1254625315e05b86658
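The flag logic can be checked with a small C model. This is a sketch under stated assumptions: the struct and function names are hypothetical, and the model only reproduces the ZF/PF/CF combinations described in this message rather than executing ucomisd itself.

```c
#include <math.h>

/* Hypothetical model of the flags set by ucomisd; not QBE code. */
struct ucomi_flags { int zf, pf, cf; };

static struct ucomi_flags model_ucomisd(double x, double y)
{
	if (isnan(x) || isnan(y))
		return (struct ucomi_flags){1, 1, 1};   /* unordered */
	if (x == y)
		return (struct ucomi_flags){1, 0, 0};   /* equal */
	if (x < y)
		return (struct ucomi_flags){0, 0, 1};   /* less than */
	return (struct ucomi_flags){0, 0, 0};           /* greater than */
}

/* setz + setnp + and: true only when ZF=1 and PF=0. */
static int model_ceqd(double x, double y)
{
	struct ucomi_flags f = model_ucomisd(x, y);

	return f.zf & !f.pf;
}

/* What a bare setz would compute: also true for unordered inputs. */
static int model_setz_only(double x, double y)
{
	return model_ucomisd(x, y).zf;
}
```

For a NaN operand the bare setz model reports equality, while the combined model does not, which is the case the setz/setnp sequence is meant to handle.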
|
|
When the two operands are Unordered (for instance if one of them
is NaN), ucomisd sets ZF=1, PF=1, and CF=1. When the result is
LessThan, it sets ZF=0, PF=0, and CF=1.
However, jb[e]/setb[e] only check that CF=1 [or ZF=1], which causes
the result to be true for unordered operands.
To fix this, change the operand swap condition for these two floating
point comparison types: always rewrite x < y as y > x, and never
rewrite x > y as y < x.
Add a test to check the result of cltd, cled, cgtd, cged, ceqd, and
cned with arguments that are LessThan, Equal, GreaterThan, and
Unordered. Additionally, check three different implementations for
equality testing: one that uses the result of ceqd directly, one
that uses the result to control a conditional jump, and one that
uses the result both as a value and for a conditional jump. For
now, unordered equality tests are still broken so they are disabled.
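The effect of the operand swap can be sketched in C. The names below are hypothetical, and the model only encodes the CF/ZF values listed in this message; it assumes that seta tests CF=0 and ZF=0.

```c
#include <math.h>

/* Hypothetical model of the ZF and CF flags set by ucomisd x, y. */
struct cz_flags { int zf, cf; };

static struct cz_flags model_ucomisd_cz(double x, double y)
{
	if (isnan(x) || isnan(y))
		return (struct cz_flags){1, 1};   /* unordered */
	if (x == y)
		return (struct cz_flags){1, 0};   /* equal */
	if (x < y)
		return (struct cz_flags){0, 1};   /* less than */
	return (struct cz_flags){0, 0};           /* greater than */
}

/* x < y through setb (CF=1): also true for unordered operands. */
static int lt_via_setb(double x, double y)
{
	return model_ucomisd_cz(x, y).cf;
}

/* x < y rewritten as y > x through seta (CF=0 and ZF=0). */
static int lt_via_swapped_seta(double x, double y)
{
	struct cz_flags f = model_ucomisd_cz(y, x);

	return !f.cf && !f.zf;
}
```

Both variants agree on ordered operands; only the swapped form stays false when one operand is NaN.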
|
|
The negate trick is unnecessary and broken when the first arg is the
result.
|
|
This was intended, but was missing due to a typo in the test status
variable.
|
|
This was causing issues with aggregate types. A simple reproduction is:
type :type.1 = align 8 { 24 }
type :type.2 = align 8 { w 1, :type.1 1 }
The size of type.2 should be 32, adding only 4 bytes of padding between
the first and second field. Prior to this patch, 20 bytes of padding was
added instead, causing the type to have a size of 48.
Signed-off-by: Drew DeVault <sir@cmpwn.com>
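The expected size of 32 can be reproduced with a short layout calculation. This is a sketch, not the code from the patch: the struct field type and both function names are hypothetical, and alignments are assumed to be powers of two.

```c
#include <stddef.h>

/* Round off up to the next multiple of align (a power of two). */
static size_t align_up(size_t off, size_t align)
{
	return (off + align - 1) & ~(align - 1);
}

/* Hypothetical field description: size and alignment in bytes. */
struct field { size_t size, align; };

/* Size of an aggregate whose fields are laid out in order, each one
 * padded only up to its own alignment. */
static size_t aggregate_size(const struct field *f, size_t n, size_t align)
{
	size_t off = 0;

	for (size_t i = 0; i < n; i++)
		off = align_up(off, f[i].align) + f[i].size;
	return align_up(off, align);
}
```

For type.2, the w field ends at offset 4, type.1 starts at the next multiple of 8, and the 24 bytes of type.1 end at offset 32.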
|
|
udiv %x, 1 == %x, and for each of sub, or, xor, sar, shr, and shl,
<op> %x, 0 == %x.
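These identities can be spot-checked in C. The function name is hypothetical, and the sketch assumes the usual behavior of compilers such as gcc and clang, where converting to int64_t wraps and right-shifting a signed value is arithmetic.

```c
#include <stdint.h>

/* Returns nonzero if every listed identity holds for x. */
static int identities_hold(uint64_t x)
{
	return (x / 1) == x                              /* udiv */
	    && (x - 0) == x                              /* sub */
	    && (x | 0) == x                              /* or */
	    && (x ^ 0) == x                              /* xor */
	    && (uint64_t)((int64_t)x >> 0) == x          /* sar */
	    && (x >> 0) == x                             /* shr */
	    && (x << 0) == x;                            /* shl */
}
```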
|
|
In cases where stash was 0, gasemitfin exits immediately and the
GNU-stack note isn't added to the asm output. This would result in an
executable where GNU_STACK uses flags RWE instead of the desired RW.
|
|
Reported by Alessandro Mantovani.
These addresses are likely bogus, but
they triggered an unwarranted assertion
failure. We now raise a civilized error.
|
|
Previously, all casts but d->w, d->s, l->s, s->d, w->d were supported.
At least the first three can occur by storing to then loading from
a slot, currently triggering an assertion failure. Though the other
two might not be possible, they are easy enough to support as well.
Fixes hare#360.
|
|
Reported by Alessandro Mantovani.
Although unlikely in real programs, it
was found that using the address of a
fast local in amd64 shifts triggers
assertion failures.
We now err when the shift count is
given by an address; but we allow
shifting an address.
|
|
Reported by Alessandro Mantovani.
Overly long function names would
trigger out-of-bounds accesses.
|
|
Reported by Alessandro Mantovani.
Unlikely to be hit in practice
because we don't add addresses to
addresses.
type :biggie = { l, l, l }
function $repro(:biggie %p) {
@start
%x =l add %p, $a
storew 42, %x
ret
}
|
|
selcmp may potentially swap the arguments and return 1 indicating
that the opposite operation should be used. However, if the compare
result is used for a conditional jump as well as elsewhere, the
original compare op is used instead of the opposite.
To fix this, add a check to see whether the opposite compare should
be used, regardless of whether selcmp() is done now, or later on
during sel().
Bug report and test case from Charlie Stanton.
|
|
This reverts commit be3a67a7f5079f30b0ccc696d549fd03a2dbbad1.
qemu-system-aarch64 is a full system emulator and is not suitable
for running the qbe test suite (at least without a kernel and root
filesystem).
|
|
The no-op `copy R0` is necessary in order to trigger dopm in spill.c
and rega.c, which assume that a call is always followed by one or
more copies from registers. However, the arm64 ABI does not actually
return the caller-passed pointer as in x86_64. This causes an
assertion failure
qbe: aarch64: Assertion failed: r == T.rglob || b == fn->start (spill.c: spill: 470)
for the following test program
type :t = { l 3 }
function $f() {
@start.1
@start.2
%ret =:t call $g()
ret
}
The assertion failure only triggers when the block containing the
call is not the first block, because the check is skipped for the
first block (since some registers may have been used for arguments).
To fix this, set R0 in the call data so that spill/rega can see
that this dummy "return" register was generated by the call. This
matches qbe's existing behavior when the function returns void,
another case where no register is used for the function result.
|
|
This makes it easier to determine which flag to pass to show the
desired debug info.
|
|
This allows you to explicitly specify the section to emit the data
directive for, allowing for sections other than .data: for example, .bss
or .init_array.
|
|
GNU ld uses the presence of these notes to determine the flags of
the final GNU_STACK program header. If they are present in every
object, then the resulting executable's GNU_STACK uses flags RW
instead of RWE.
Reported by Érico Nogueira Rolim.
|
|
The immediate in the add instruction is only 12 bits. If the offset
does not fit, we must move it into a register first.
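The range check amounts to a comparison against 2^12. A minimal sketch with a hypothetical helper name; it only models the plain unsigned 12-bit immediate and deliberately ignores the shifted immediate form that the add instruction also offers.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if off fits the unsigned 12-bit
 * immediate of add, zero if it must go through a register first. */
static int fits_imm12(uint64_t off)
{
	return off < (UINT64_C(1) << 12);
}
```

An offset of 4095 can be encoded directly, while 4096 is the first value that needs the extra move.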
|
|
Otherwise, if a constant is stored as a float and retrieved as an
int, the padding bits are uninitialized. This can result in the
generation of invalid assembly:
Error: suffix or operands invalid for `cvtsi2ss'
Reported by Hiltjo Posthuma.
|
|
Thanks to Jakob for pointing this out.
|
|
Otherwise, we may end up using an integer and floating class for the
same register, triggering an assertion failure:
qbe: rega.c:215: pmrec: Assertion `KBASE(pm[i].cls) == KBASE(*k)' failed.
Test case:
type :T = { s }
export
function $d(:T %.1, s %.2) {
@start
call $c(s %.2)
ret
}
|
|
According to the ARMv8 overview document:
> However if SP is used as the base register then the value of the stack
> pointer prior to adding any offset must be quadword (16 byte) aligned,
> or else a stack alignment exception will be generated.
This manifests as a bus error on my system.
To resolve this, just save registers two at a time with stp.
|
|
This now only limits the number of arguments when parsing the input SSA,
which is usually a small fixed size (depending on the frontend).
|
|
C99 6.5.2.5p6:
> If the compound literal occurs outside the body of a function,
> the object has static storage duration; otherwise, it has automatic
> storage duration associated with the enclosing block.
So, we can't use the address of a compound literal here. Instead,
just set p to NULL, and make the loop conditional on p being non-NULL.
Remarks from Quentin:
I made a cosmetic change to Michael's
original patch and merely pushed the
literal at toplevel.
|