summary refs log tree commit diff
path: root/arm64
AgeCommit message (Collapse)Author
2023-03-15silence some warningsQuentin Carbonneaux
2023-03-11Emit .type and .size directives on RISC-V and ARMAlexey Yerin
To match x86
2022-12-14new blit instructionQuentin Carbonneaux
2022-12-12new rsval() helper for signed RefsQuentin Carbonneaux
The .val field is signed in RSlot. Add a new dedicated function to fetch it as a signed int.
2022-11-27new hlt block terminatorQuentin Carbonneaux
It is handy to express when the end of a block cannot be reached. If a hlt terminator is executed, it traps the program. We don't go the llvm way and specify execution semantics as undefined behavior.
2022-11-22use a new struct for symbolsQuentin Carbonneaux
Symbols are a useful abstraction that occurs in both Con and Alias. In this patch they get their own struct. This new struct packages a symbol name and a type; the type tells us where the symbol name must be interpreted (currently, in gobal memory or in thread-local storage). The refactor fixed a bug in addcon(), proving the value of packaging symbol names with their type.
2022-10-12thread-local storage for amd64_appleQuentin Carbonneaux
It is quite similar to arm64_apple. Probably, the call that needs to be generated also provides extra invariants on top of the regular abi, but I have not checked that. Clang generates code that is a bit neater than qbe's because, on x86, a load can be fused in a call instruction! We do not bother with supporting these since we expect only sporadic use of the feature. For reference, here is what clang might output for a store to the second entry of a thread-local array of ints: movq _x@TLVP(%rip), %rdi callq *(%rdi) movl %ecx, 4(%rax)
2022-10-12thread-local storage for arm64_appleQuentin Carbonneaux
It is documented nowhere how this is supposed to work. It is also quite easy to have assertion failures pop in the linker when generating asm slightly different from clang's! The best source of information is found in LLVM's source code (AArch64ISelLowering.cpp). I paste it here for future reference: /// Darwin only has one TLS scheme which must be capable of dealing with the /// fully general situation, in the worst case. This means: /// + "extern __thread" declaration. /// + Defined in a possibly unknown dynamic library. /// /// The general system is that each __thread variable has a [3 x i64] descriptor /// which contains information used by the runtime to calculate the address. The /// only part of this the compiler needs to know about is the first xword, which /// contains a function pointer that must be called with the address of the /// entire descriptor in "x0". /// /// Since this descriptor may be in a different unit, in general even the /// descriptor must be accessed via an indirect load. The "ideal" code sequence /// is: /// adrp x0, _var@TLVPPAGE /// ldr x0, [x0, _var@TLVPPAGEOFF] ; x0 now contains address of descriptor /// ldr x1, [x0] ; x1 contains 1st entry of descriptor, /// ; the function pointer /// blr x1 ; Uses descriptor address in x0 /// ; Address of _var is now in x0. /// /// If the address of _var's descriptor *is* known to the linker, then it can /// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for /// a slight efficiency gain. The call 'blr x1' above is actually special in that it trashes less registers than what the abi would normally permit. In qbe, I don't take advantage of this and lower the call like a regular call. We can revise this later on. Again, the source for this information is LLVM's source code: // TLS calls preserve all registers except those that absolutely must be // trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be // silly).
2022-10-08mark apple targets with a booleanQuentin Carbonneaux
It is more natural to branch on a flag than have different function pointers for high-level passes.
2022-10-08"rel" fields become "reloc"Quentin Carbonneaux
2022-10-08add support for thread-local storageQuentin Carbonneaux
The apple targets are not done yet.
2022-10-03fix case of Pool constantsQuentin Carbonneaux
2022-10-03new arm64_apple targetQuentin Carbonneaux
Should make qbe work on apple arm-based hardware.
2022-10-03add new target-specific abi0 passQuentin Carbonneaux
The general idea is to give abis a chance to talk before we've done all the optimizations. Currently, all targets eliminate {par,arg,ret}{sb,ub,...} during this pass. The forthcoming arm64_apple will, however, insert proper extensions during abi0. Moving forward abis can, for example, lower small-aggregates passing there so that memory optimizations can interact better with function calls.
2022-09-01remove two unsignedQuentin Carbonneaux
We have a uint alias that we use everywhere else. I also added a todo about unhandled large offsets in arm64/emit.
2022-09-01use direct bl calls on arm64Quentin Carbonneaux
This generates tidier code and is pic friendly because it lets the linker trampoline calls to dynlinked libs.
2022-08-31drop -G flag and add target amd64_appleQuentin Carbonneaux
apple support is more than assembly syntax in case of arm64 machines, and apple syntax is currently useless in all cases but amd64; rather than having a -G option that only makes sense with amd64, we add a new target amd64_apple
2022-05-10arm64: fix maximum immediate size for small loads/storesMichael Forney
The maximum immediate size for 1, 2, 4, and 8 byte loads/stores is 4095, 8190, 16380, and 32760 respectively[0][1][2]. [0] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRB--immediate- [1] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRH--immediate- [2] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDR--immediate-
2022-03-17fix return for big aggregatesQuentin Carbonneaux
The recent changes in arm and riscv typclass() set ngp to 1 when a struct is returned via a caller-provided buffer. This interacts bogusly with selret() that ends up declaring a gp register live when none is set in the returning sequence. The fix is simply to set cty to zero (all registers dead) in case a caller- provided buffer is used.
2022-03-15new -t? flag to print default targetQuentin Carbonneaux
2022-03-15support env calls on arm64Quentin Carbonneaux
The x9 register is used for the env parameter.
2022-03-14dynamic stack allocs for arm64Quentin Carbonneaux
I also moved some isel logic that would have been repeated a third time in util.c.
2022-03-14improve consistency in abisQuentin Carbonneaux
2022-03-14arm64/abi: fix big aggregates passed on the stackQuentin Carbonneaux
The riscv test abi8.ssa caught a bug in the arm backend. It turns out we were using the wrong class when loading pointers to aggregates from the stack. The fix is simple and mirrors what is done in the riscv abi.
2022-03-08flag types defined as unionsQuentin Carbonneaux
The risc-v abi needs to know if a type is defined as a union or not. We cannot use nunion to obtain this information because the risc-v abi made the unfortunate decision of treating union { int i; } differently from int i; So, instead, I introduce a single bit flag 'isunion'.
2022-03-08cosmeticsQuentin Carbonneaux
2022-02-25improve consistency in arm64 and rv64 abisQuentin Carbonneaux
2022-02-02shared linkage logic for func/dataQuentin Carbonneaux
2022-01-31arm64: handle large slots in OcopyQuentin Carbonneaux
2022-01-28implement float -> unsigned castsBor Grošelj Simić
amd64 lacks instruction for this so it has to be implemented with float -> signed casts. The approach is borrowed from llvm.
2022-01-28implement unsigned -> float castsBor Grošelj Simić
amd64 lacks an instruction for this so it has to be implemented with signed -> float casts: - Word casting is done by zero-extending the word to a long and then doing a regular signed cast. - Long casting is done by dividing by two with correct rounding if the highest bit is set and casting that to float, then adding 1 to mantissa with integer addition
2022-01-23Add a negation instructionEyal Sawady
Necessary for floating-point negation, because `%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
2021-12-05arm64: fix slots with offset >32kQuentin Carbonneaux
When slots are used with a large offset, the emitter generates invalid assembly code. That is caught later on by the assembler, but it prevents compilation of programs with large stack frames. When a slot offset is too large to be expressed as a constant offset to x29 (the frame pointer), emitins() inserts a late Oaddr instruction to x16 and replaces the large slot reference with x16. This change also gave me the opportunity to refactor the save/restore logic for callee-save registers. This fixes the following Hare issue: https://todo.sr.ht/~sircmpwn/hare/387
2021-11-08amd64: avoid reading past end of passed structMichael Forney
If the size of the struct is not a multiple of 8, the actual struct size may be different from the size reserved on the stack. This fixes the case where the struct is passed in memory, but we still may over-read a struct passed in registers. A TODO is added for now.
2021-11-08fix for sloppy reg->mem in arm64 abiQuentin Carbonneaux
Michael found a bug where some copies from registers to memory in the arm64 abi clobber the stack. The test case is: type :T = { w } function w $f() { @start %p =:T call $g() %x =w loadw %p ret %x } qbe will write 4 bytes out of bounds when pulling the result struct from its register. The same bug can be observed if :T's definition is {w 3}; in this case qbe writes 16 bytes in a slot of 12 bytes. This patch changes stkblob() to use the rounded argument size if it is going to be restored from registers. Relatedly, mem->reg loads for structs with size < 16 and != 8, are treated a bit sloppily both in the arm64 and in the sysv abis. That is much less harmful than the present bug.
2021-10-26spill: fix regs assertionsQuentin Carbonneaux
Some arm64 abi tests have been failing for some time now. This fixes them by being a bit more careful with liveset management in spill.c. A late bsclr() call in spill.c may drop legitimately live registers in e.g., R12 =w add R12, 1 While it hurts for regs, it does not matter for ssa temps because those cannot be both in the arguments & return (by the ssa invariant). I added a check before bsclr() to make sure we are clearing only ssa temps. One might be surprised that any ssa temp may be live at this point. The reason why this is the case is the special handling of dead return values earlier in spill(). I think that it is the only case where the return value can be (awkwardly) live at the same time as the arguments, and I think this never happens with registers (i.e., we never have dead register- assigning instructions). I added an assert to check the latter invariant. Finally, there was a simple bug in the arm64 abi which I fixed: In case the return happens via a pointer, x8 needs to be marked live at the beginning of the function. This was caught by test/abi4.ssa.
2021-10-26arm64: Add LR to list of registers to saveMichael Forney
Tested-by: Thomas Bracht Laumann Jespersen <t@laumann.xyz> Fixes: https://todo.sr.ht/~sircmpwn/hare/312
2021-10-25arm64/emit.c: fix move instructions with big immediate valuesSudipto Mallick
Fixes #467. It assumes that the stack won't need to grow beyond 2^32 bytes. If that were to happen, we'd need another or at most two more `movk` instructions. Signed-off-by: Sudipto Mallick <smlckz@disroot.org>
2021-10-25arm64: handle copy of constant to slotMichael Forney
If registers spill onto the stack, we may end up with SSA like S320 =l copy 0 after rega(). Handle this case in arm64 emit().
2021-10-25arm64: Handle slots in Ocopy operandsMichael Forney
2021-10-25arm64: handle slotsMichael Forney
2021-10-22make variadic args explicitQuentin Carbonneaux
Some abis, like the riscv one, treat arguments differently depending on whether they are variadic or not. To prepare for the upcomming riscv target, we change the variadic call syntax and give meaning to the location of the '...' marker. # new syntax %ret =w call $f(w %regular, ..., w %variadic) By nature of their abis, the change is backwards compatible for existing targets.
2021-03-18spill: use stronger assertion for registers in use at start of functionMichael Forney
2021-03-12arm64: fix selcall call data for return of aggregate in memoryMichael Forney
The no-op `copy R0` is necessary in order to trigger dopm in spill.c and rega.c, which assume that a call is always followed by one or more copies from registers. However, the arm64 ABI does not actually return the caller-passed pointer as in x86_64. This causes an assertion failure qbe: aarch64: Assertion failed: r == T.rglob || b == fn->start (spill.c: spill: 470) for the following test program type :t = { l 3 } function $f() { @start.1 @start.2 %ret =:t call $g() ret } The assertion failure only triggers when the block containing the call is not the first block, because the check is skipped for the first block (since some registers may have been used for arguments). To fix this, set R0 in the call data so that spill/rega can see that this dummy "return" register was generated by the call. This matches qbe's existing behavior when the function returns void, another case where no register is used for the function result.
2021-03-02arm64: handle stack offsets >=4096 in OaddrMichael Forney
The immediate in the add instruction is only 12 bits. If the offset does not fit, we must move it into a register first.
2020-08-06arm64: Make sure SP stays aligned by 16Michael Forney
According to the ARMv8 overview document However if SP is used as the base register then the value of the stack pointer prior to adding any offset must be quadword (16 byte) aligned, or else a stack alignment exception will be generated. This manifests as a bus error on my system. To resolve this, just save registers two at a time with stp.
2020-08-06Use a dynamic array for phi argumentsMichael Forney
2019-05-15arm64: Handle stack allocations larger than 4095 bytesMichael Forney
In this case, the immediate is too large to use directly in the add/sub instructions, so move it into a temporary register first. Also, for clarity, rearrange the if-conditions so that they match the constraints of the instructions that immediately follow.
2019-05-15arm64: Handle truncd instructionMichael Forney
2019-05-15arm64: Use 32-bit register name when loading 'b' or 'h' into 'l'Michael Forney
The ldrb and ldrh instructions require a 32-bit register name for the destination and will clear the upper 32-bits of that register.