Age | Commit message (Collapse) | Author |
|
This is necessary because, post
fusion, dead stores may clobber
data. A new test case exposes
one such situation.
|
|
When checking if two slices represent
the same range of memory we must check
that offsets match.
The bug was revealed by a harec test.
|
|
|
|
It is quite similar to arm64_apple.
Probably, the call that needs to be
generated also provides extra
invariants on top of the regular
abi, but I have not checked that.
Clang generates code that is a bit
neater than qbe's because, on x86,
a load can be fused in a call
instruction! We do not bother with
supporting these since we expect
only sporadic use of the feature.
For reference, here is what clang
might output for a store to the
second entry of a thread-local
array of ints:
movq _x@TLVP(%rip), %rdi
callq *(%rdi)
movl %ecx, 4(%rax)
|
|
Should make qbe work on apple
arm-based hardware.
|
|
|
|
- update the test generation script to
match some manual changes
- fix some variadic calls to printf
- add a test case where an odd number of
slots is used on the stack before varargs
|
|
They are meant to exercise the
hardware floating-point calling
convention of the risc-v target.
|
|
|
|
It is mostly complete, but still has a few ABI bugs when passing
floats in structs, or when structs are passed partly in register,
and partly on stack.
|
|
|
|
|
|
amd64 lacks instruction for this so it has to be implemented with
float -> signed casts. The approach is borrowed from llvm.
|
|
amd64 lacks an instruction for this so it has to be implemented with
signed -> float casts:
- Word casting is done by zero-extending the word to a long and then doing
a regular signed cast.
- Long casting is done by dividing by two with correct rounding if the
highest bit is set and casting that to float, then adding
1 to mantissa with integer addition
|
|
|
|
|
|
Some abis, like the riscv one, treat
arguments differently depending on
whether they are variadic or not.
To prepare for the upcomming riscv
target, we change the variadic call
syntax and give meaning to the
location of the '...' marker.
# new syntax
%ret =w call $f(w %regular, ..., w %variadic)
By nature of their abis, the change
is backwards compatible for existing
targets.
|
|
Env calls were disfunctional from the
start. This fixes them on amd64, but
they remain to do on arm64. A new
test shows how to use them.
|
|
Different architectures use different types for va_list:
x86_64 uses an 1-length array of struct type[0]:
typedef struct {
unsigned int gp_offset;
unsigned int fp_offset;
void *overflow_arg_area;
void *reg_save_area;
} va_list[1];
aarch64 uses a struct type[1]
typedef struct {
void *__stack;
void *__gr_top;
void *__vr_top;
int __gr_offs;
int __vr_offs;
} va_list;
Consequently, C functions which takes a va_list as an argument,
such as vprintf, may pass va_list in different ways depending on
the architecture.
On x86_64, va_list is an array type, so parameter decays to a pointer
and passing the address of the va_list is correct.
On aarch64, the va_list struct is passed by value, but since it is
larger than 16 bytes, the parameter is replaced with a pointer to
caller-allocated memory. Thus, passing the address as an l argument
happens to work.
However, this pattern of passing the address of the va_list to
vprintf doesn't extend to other architectures. On riscv64, va_list
is defined as
typedef void *va_list;
which is *not* passed by reference. This means that tests that call
vprintf using the address of a va_list (vararg1 and vararg2) will
not work on riscv.
To fix this while keeping the tests architecture-neutral, add a
small wrapper function to the driver which takes a va_list *, and
let the C compiler deal with the details of passing va_list by
value.
[0] https://c9x.me/compile/bib/abi-x64.pdf#figure.3.34
[1] https://c9x.me/compile/bib/abi-arm64.pdf#%5B%7B%22num%22%3A63%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C52%2C757%2C0%5D
[2] https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#va_list-va_start-and-va_arg$
|
|
Though I am not aware of any architecture where this matters, it
is technically incorrect to call these stdio functions as if they
had no result.
The QBE documentation says
> Unless the called function does not return a value, a return
> temporary must be specified, even if it is never used afterwards.
so we should follow it in the tests as well.
|
|
On x86_64, ucomis[sd] sets ZF=1, PF=0, CF=0 for equal arguments.
However, if the arguments are unordered it sets ZF=1, PF=1, CF=1,
and there is no jump/flag instruction for ZF=1 & PF=0 or ZF=1 & CF=0.
So, in order to correctly implement ceq[sd] on x86_64, we need to
be a bit more creative. There are several options available, depending
on whether the result of ceq[sd] is used with jnz, or with other
instructions, or both.
If the result is used for a conditional jump, both gcc and clang
use a combination of jp and jnz:
ucomisd %xmm1, %xmm0
jp .Lfalse
jnz .Lfalse
...
.Lfalse:
If the result is used in other instructions or return, gcc does the
following for x == y:
ucomisd %xmm1, %xmm0
setnp %al
movzbl %al, %eax
movl $0, %edx
cmovne %edx, %eax
This sets EAX to PF=0, then uses cmovne to clear it if ZF=0. It
also takes care to avoid clobbering the flags register in case the
result is also used for a conditional jump. Implementing this
approach in QBE would require adding an architecture-specific
instruction for cmovne.
In contrast, clang does an additional compare, this time using
cmpeqsd instead of ucomisd:
cmpeqsd %xmm1, %xmm0
movq %xmm0, %rax
andl $1, %rax
The cmpeqsd instruction doas a floating point equality test, setting
XMM0 to all 1s if they are equal and all 0s if they are not. However,
we need the result in a non-XMM register, so it moves the result
back then masks off all but the first bit.
Both of these approaches are a bit awkward to implement in QBE, so
instead, this commit does the following:
ucomisd %xmm1, %xmm0
setz %al
movzbl %al, %eax
setnp %cl
movzbl %cl, %ecx
andl %ecx, %eax
This sets the result by anding the two flags, but has a side effect
of clobbering the flags register. This was a problem in one of my
earlier patches to fix this issue[0], in addition to being more
complex than I'd hoped.
Instead, this commit always leaves the ceq[sd] instruction in the
block, even if the result is only used to control a jump, so that
the above instruction sequence is always used. Then, since we now
have ZF=!(ZF=1 & PF=0) for x == y, or ZF=!(ZF=0 | PF=1) for x != y,
we can use jnz for the jump instruction.
[0] https://git.sr.ht/~sircmpwn/qbe/commit/64833841b18c074a23b4a1254625315e05b86658
|
|
When the two operands are Unordered (for instance if one of them
is NaN), ucomisd sets ZF=1, PF=1, and CF=1. When the result is
LessThan, it sets ZF=0, PF=0, and CF=1.
However, jb[e]/setb[e] only checks that CF=1 [or ZF=1] which causes
the result to be true for unordered operands.
To fix this, change the operand swap condition for these two floating
point comparison types: always rewrite x < y as y > x, and never
rewrite x > y as y < x.
Add a test to check the result of cltd, cled, cgtd, cged, ceqd, and
cned with arguments that are LessThan, Equal, GreaterThan, and
Unordered. Additionally, check three different implementations for
equality testing: one that uses the result of ceqd directly, one
that uses the result to control a conditional jump, and one that
uses the result both as a value and for a conditional jump. For
now, unordered equality tests are still broken so they are disabled.
|
|
This was causing issues with aggregate types. A simple reproduction is:
type :type.1 = align 8 { 24 }
type :type.2 = align 8 { w 1, :type.1 1 }
The size of type.2 should be 32, adding only 4 bytes of padding between
the first and second field. Prior to this patch, 20 bytes of padding was
added instead, causing the type to have a size of 48.
Signed-off-by: Drew DeVault <sir@cmpwn.com>
|
|
selcmp may potentially swap the arguments and return 1 indicating
that the opposite operation should be used. However, if the compare
result is used for a conditional jump as well as elsewhere, the
original compare op is used instead of the opposite.
To fix this, add a check to see whether the opposite compare should
be used, regardless of whether selcmp() is done now, or later on
during sel().
Bug report and test case from Charlie Stanton.
|
|
The value argument of store instructions was
handled incorrectly.
|
|
This fixes similar bugs than the ones fixed
in the previous commit.
In the folding code the invariant is that
when a result is 32 bits wide, the low 32
bits of 'x' are correct. The high bits
can be anything.
|
|
|
|
This was generated by csmith and then compiled
to qbe il by Michael Forney's C compiler.
|
|
|
|
I had forgotten that %rip can only be
used as base when there is no index.
I also added a test which stresses
addressing selection with and without
constants.
|
|
|
|
|
|
The numberer made some arranging choices when
numbering arguments of an instruction, but these
decisions were ignored when matching. The fix
is to reconcile numbering and matching.
|
|
Thanks to Michael Forney for spotting this
oversight and providing the test case.
Note: because esc() leaves ABot unchanged,
the assertion "a->type == ABot" on line 122
remains valid.
|
|
The worst one was that "part 3" of rega()
could break the critical invariant that
two interferring temporaries get assigned
different registers. This is fixed by
being careful when changing the register
of a temporary based on predecessor
blocks.
Thanks to Michael Forney for reporting
these bugs and helping with the analysis.
|
|
|
|
The arm64 might have the same problem but it
is currently unable to handle them even in
instruction selection.
Thanks to Jean Dao for reporting the bug.
|
|
|
|
The vararg tests had to be changed because
va_list is 32-bit wide on arm. The astute
reader will notice that the way we pass
va_list values is wrong, we should be using
the ':valist' type as defined below instead
of 'l'. But eh, that works for now, because
of the ABI.
type :valist = align 8 { 32 }
|
|
|
|
|
|
|
|
This change is backward compatible, calls to
"variadic" functions (like printf) must now be
annotated (with ...).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|