Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
amd64 lacks instruction for this so it has to be implemented with
float -> signed casts. The approach is borrowed from llvm.
|
|
amd64 lacks an instruction for this so it has to be implemented with
signed -> float casts:
- Word casting is done by zero-extending the word to a long and then doing
a regular signed cast.
- Long casting is done by dividing by two with correct rounding if the
highest bit is set and casting that to float, then adding
1 to mantissa with integer addition
|
|
|
|
|
|
|
|
Necessary for floating-point negation, because
`%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
|
|
When slots are used with a large offset,
the emitter generates invalid assembly
code. That is caught later on by the
assembler, but it prevents compilation
of programs with large stack frames.
When a slot offset is too large to be
expressed as a constant offset to x29
(the frame pointer), emitins() inserts
a late Oaddr instruction to x16 and
replaces the large slot reference with
x16.
This change also gave me the opportunity
to refactor the save/restore logic for
callee-save registers.
This fixes the following Hare issue:
https://todo.sr.ht/~sircmpwn/hare/387
|
|
parseref() has code to reuse address constants, but this is not
done in other passes such as fold or isel. Introduce a new function
newcon() which takes a Con and returns a Ref for that constant, and
use this whenever creating address constants.
This is necessary to fix folding of address constants when one
operand is already folded. For example, in
%a =l add $x, 1
%b =l add %a, 2
%c =w loadw %b
%a and %b were folded to $x+1 and $x+3 respectively, but then the
second add is visited again since it uses %a. This gets folded to
$x+3 as well, but as a new distinct constant. This results in %b
getting labeled as bottom instead of either constant, disabling the
replacement of %b by a constant in subsequent instructions (such
as the loadw).
|
|
|
|
|
|
This may happen in a branch QBE doesn't realize is unreachable,
for example (simplified from real code found in ncurses)
data $str = { b "abcdef", b 0 }
function l $f(w %x) {
@start
%.1 =w ceqw %x, 0
jnz %.1, @logic_join, @logic_right
@logic_right
%p =l call $strchr(l $str, w %x)
%.2 =w ceql %p, 0
@logic_join
%.3 =w phi @start %.1, @logic_right %.2
jnz %.3, @fail, @return
@fail
ret 0
@return
%.4 =l sub %p, $str
ret %.4
}
|
|
If the size of the struct is not a multiple of 8, the actual struct
size may be different from the size reserved on the stack.
This fixes the case where the struct is passed in memory, but we
still may over-read a struct passed in registers. A TODO is added
for now.
|
|
Michael found a bug where some copies
from registers to memory in the arm64
abi clobber the stack. The test case
is:
type :T = { w }
function w $f() {
@start
%p =:T call $g()
%x =w loadw %p
ret %x
}
qbe will write 4 bytes out of bounds
when pulling the result struct from
its register. The same bug can be
observed if :T's definition is {w 3};
in this case qbe writes 16 bytes in
a slot of 12 bytes.
This patch changes stkblob() to use
the rounded argument size if it is
going to be restored from registers.
Relatedly, mem->reg loads for structs
with size < 16 and != 8, are treated
a bit sloppily both in the arm64 and
in the sysv abis. That is much less
harmful than the present bug.
|
|
|
|
This make it easier to understand the differences.
|
|
|
|
Some arm64 abi tests have been failing
for some time now. This fixes them by
being a bit more careful with liveset
management in spill.c.
A late bsclr() call in spill.c may drop
legitimately live registers in e.g.,
R12 =w add R12, 1
While it hurts for regs, it does not
matter for ssa temps because those cannot
be both in the arguments & return (by the
ssa invariant). I added a check before
bsclr() to make sure we are clearing
only ssa temps.
One might be surprised that any ssa temp
may be live at this point. The reason why
this is the case is the special handling
of dead return values earlier in spill().
I think that it is the only case where
the return value can be (awkwardly) live
at the same time as the arguments, and I
think this never happens with registers
(i.e., we never have dead register-
assigning instructions). I added an
assert to check the latter invariant.
Finally, there was a simple bug in the
arm64 abi which I fixed: In case the return
happens via a pointer, x8 needs to be marked
live at the beginning of the function. This
was caught by test/abi4.ssa.
|
|
Tested-by: Thomas Bracht Laumann Jespersen <t@laumann.xyz>
Fixes: https://todo.sr.ht/~sircmpwn/hare/312
|
|
Fixes #467. It assumes that the stack won't need to grow beyond 2^32 bytes.
If that were to happen, we'd need another or at most two more `movk` instructions.
Signed-off-by: Sudipto Mallick <smlckz@disroot.org>
|
|
If registers spill onto the stack, we may end up with SSA like
S320 =l copy 0
after rega(). Handle this case in arm64 emit().
|
|
|
|
|
|
Some abis, like the riscv one, treat
arguments differently depending on
whether they are variadic or not.
To prepare for the upcomming riscv
target, we change the variadic call
syntax and give meaning to the
location of the '...' marker.
# new syntax
%ret =w call $f(w %regular, ..., w %variadic)
By nature of their abis, the change
is backwards compatible for existing
targets.
|
|
|
|
Env calls were disfunctional from the
start. This fixes them on amd64, but
they remain to do on arm64. A new
test shows how to use them.
|
|
|
|
|
|
|
|
|
|
The documentation states that loadw is syntactic sugar for loadsw,
but it actually got parsed as Oload. If the result is an l temporary,
Oload behaves like Oloadl, not Oloadsw.
To fix this, parse Tloadw as Oloadsw explicitly.
|
|
|
|
Different architectures use different types for va_list:
x86_64 uses an 1-length array of struct type[0]:
typedef struct {
unsigned int gp_offset;
unsigned int fp_offset;
void *overflow_arg_area;
void *reg_save_area;
} va_list[1];
aarch64 uses a struct type[1]
typedef struct {
void *__stack;
void *__gr_top;
void *__vr_top;
int __gr_offs;
int __vr_offs;
} va_list;
Consequently, C functions which takes a va_list as an argument,
such as vprintf, may pass va_list in different ways depending on
the architecture.
On x86_64, va_list is an array type, so parameter decays to a pointer
and passing the address of the va_list is correct.
On aarch64, the va_list struct is passed by value, but since it is
larger than 16 bytes, the parameter is replaced with a pointer to
caller-allocated memory. Thus, passing the address as an l argument
happens to work.
However, this pattern of passing the address of the va_list to
vprintf doesn't extend to other architectures. On riscv64, va_list
is defined as
typedef void *va_list;
which is *not* passed by reference. This means that tests that call
vprintf using the address of a va_list (vararg1 and vararg2) will
not work on riscv.
To fix this while keeping the tests architecture-neutral, add a
small wrapper function to the driver which takes a va_list *, and
let the C compiler deal with the details of passing va_list by
value.
[0] https://c9x.me/compile/bib/abi-x64.pdf#figure.3.34
[1] https://c9x.me/compile/bib/abi-arm64.pdf#%5B%7B%22num%22%3A63%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C52%2C757%2C0%5D
[2] https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#va_list-va_start-and-va_arg$
|
|
Though I am not aware of any architecture where this matters, it
is technically incorrect to call these stdio functions as if they
had no result.
The QBE documentation says
> Unless the called function does not return a value, a return
> temporary must be specified, even if it is never used afterwards.
so we should follow it in the tests as well.
|
|
On both amd64 & arm64, the jumps
making it to rega won't have any
argument.
|
|
On x86_64, ucomis[sd] sets ZF=1, PF=0, CF=0 for equal arguments.
However, if the arguments are unordered it sets ZF=1, PF=1, CF=1,
and there is no jump/flag instruction for ZF=1 & PF=0 or ZF=1 & CF=0.
So, in order to correctly implement ceq[sd] on x86_64, we need to
be a bit more creative. There are several options available, depending
on whether the result of ceq[sd] is used with jnz, or with other
instructions, or both.
If the result is used for a conditional jump, both gcc and clang
use a combination of jp and jnz:
ucomisd %xmm1, %xmm0
jp .Lfalse
jnz .Lfalse
...
.Lfalse:
If the result is used in other instructions or return, gcc does the
following for x == y:
ucomisd %xmm1, %xmm0
setnp %al
movzbl %al, %eax
movl $0, %edx
cmovne %edx, %eax
This sets EAX to PF=0, then uses cmovne to clear it if ZF=0. It
also takes care to avoid clobbering the flags register in case the
result is also used for a conditional jump. Implementing this
approach in QBE would require adding an architecture-specific
instruction for cmovne.
In contrast, clang does an additional compare, this time using
cmpeqsd instead of ucomisd:
cmpeqsd %xmm1, %xmm0
movq %xmm0, %rax
andl $1, %rax
The cmpeqsd instruction doas a floating point equality test, setting
XMM0 to all 1s if they are equal and all 0s if they are not. However,
we need the result in a non-XMM register, so it moves the result
back then masks off all but the first bit.
Both of these approaches are a bit awkward to implement in QBE, so
instead, this commit does the following:
ucomisd %xmm1, %xmm0
setz %al
movzbl %al, %eax
setnp %cl
movzbl %cl, %ecx
andl %ecx, %eax
This sets the result by anding the two flags, but has a side effect
of clobbering the flags register. This was a problem in one of my
earlier patches to fix this issue[0], in addition to being more
complex than I'd hoped.
Instead, this commit always leaves the ceq[sd] instruction in the
block, even if the result is only used to control a jump, so that
the above instruction sequence is always used. Then, since we now
have ZF=!(ZF=1 & PF=0) for x == y, or ZF=!(ZF=0 | PF=1) for x != y,
we can use jnz for the jump instruction.
[0] https://git.sr.ht/~sircmpwn/qbe/commit/64833841b18c074a23b4a1254625315e05b86658
|
|
When the two operands are Unordered (for instance if one of them
is NaN), ucomisd sets ZF=1, PF=1, and CF=1. When the result is
LessThan, it sets ZF=0, PF=0, and CF=1.
However, jb[e]/setb[e] only checks that CF=1 [or ZF=1] which causes
the result to be true for unordered operands.
To fix this, change the operand swap condition for these two floating
point comparison types: always rewrite x < y as y > x, and never
rewrite x > y as y < x.
Add a test to check the result of cltd, cled, cgtd, cged, ceqd, and
cned with arguments that are LessThan, Equal, GreaterThan, and
Unordered. Additionally, check three different implementations for
equality testing: one that uses the result of ceqd directly, one
that uses the result to control a conditional jump, and one that
uses the result both as a value and for a conditional jump. For
now, unordered equality tests are still broken so they are disabled.
|
|
The negate trick is unnecessary and broken when the first arg is the
result.
|
|
This was intended, but was missing due to a typo in the test status
variable.
|
|
This was causing issues with aggregate types. A simple reproduction is:
type :type.1 = align 8 { 24 }
type :type.2 = align 8 { w 1, :type.1 1 }
The size of type.2 should be 32, adding only 4 bytes of padding between
the first and second field. Prior to this patch, 20 bytes of padding was
added instead, causing the type to have a size of 48.
Signed-off-by: Drew DeVault <sir@cmpwn.com>
|
|
udiv %x, 1 == %x, and for each of sub, or, xor, sar, shr, and shl,
<op> %x, 0 == %x.
|
|
In cases where stash was 0, gasemitfin exits immediately and the
GNU-stack note isn't added to the asm output. This would result in an
executable where GNU_STACK uses flags RWE instead of the desired RW.
|
|
Reported by Alessandro Mantovani.
These addresses are likely bogus, but
they triggered an unwarranted assertion
failure. We now raise a civilized error.
|
|
Previously, all casts but d->w, d->s, l->s, s->d, w->d were supported.
At least the first three can occur by storing to then loading from
a slot, currently triggering an assertion failure. Though the other
two might not be possible, they are easy enough to support as well.
Fixes hare#360.
|
|
Reported by Alessandro Mantovani.
Although unlikely in real programs it
was found that using the address of a
fast local in amd64 shifts triggers
assertion failures.
We now err when the shift count is
given by an address; but we allow
shifting an address.
|
|
Reported by Alessandro Mantovani.
Overly long function names would
trigger out-of-bounds accesses.
|
|
Reported by Alessandro Mantovani.
Unlikely to be hit in practice
because we don't add addresses to
addresses.
type :biggie = { l, l, l }
function $repro(:biggie %p) {
@start
%x =l add %p, $a
storew 42, %x
ret
}
|
|
selcmp may potentially swap the arguments and return 1 indicating
that the opposite operation should be used. However, if the compare
result is used for a conditional jump as well as elsewhere, the
original compare op is used instead of the opposite.
To fix this, add a check to see whether the opposite compare should
be used, regardless of whether selcmp() is done now, or later on
during sel().
Bug report and test case from Charlie Stanton.
|