~cnx/roux - Alternative QBE compiler

Age	Commit message	Author
2022-10-12	thread-local storage for arm64_apple	Quentin Carbonneaux
	It is documented nowhere how this is supposed to work. It is also quite easy to have assertion failures pop in the linker when generating asm slightly different from clang's!
	The best source of information is found in LLVM's source code (AArch64ISelLowering.cpp). I paste it here for future reference:
	    /// Darwin only has one TLS scheme which must be capable of dealing with the
	    /// fully general situation, in the worst case. This means:
	    ///     + "extern __thread" declaration.
	    ///     + Defined in a possibly unknown dynamic library.
	    ///
	    /// The general system is that each __thread variable has a [3 x i64] descriptor
	    /// which contains information used by the runtime to calculate the address. The
	    /// only part of this the compiler needs to know about is the first xword, which
	    /// contains a function pointer that must be called with the address of the
	    /// entire descriptor in "x0".
	    ///
	    /// Since this descriptor may be in a different unit, in general even the
	    /// descriptor must be accessed via an indirect load. The "ideal" code sequence
	    /// is:
	    ///     adrp x0, _var@TLVPPAGE
	    ///     ldr x0, [x0, _var@TLVPPAGEOFF]  ; x0 now contains address of descriptor
	    ///     ldr x1, [x0]                    ; x1 contains 1st entry of descriptor,
	    ///                                     ; the function pointer
	    ///     blr x1                          ; Uses descriptor address in x0
	    ///     ; Address of _var is now in x0.
	    ///
	    /// If the address of _var's descriptor is known to the linker, then it can
	    /// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
	    /// a slight efficiency gain.
	The call 'blr x1' above is actually special in that it trashes less registers than what the abi would normally permit. In qbe, I don't take advantage of this and lower the call like a regular call. We can revise this later on. Again, the source for this information is LLVM's source code:
	    // TLS calls preserve all registers except those that absolutely must be
	    // trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
	    // silly).
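	As an illustration of the sequence quoted in this message, here is a minimal C sketch that prints the four instructions for a given symbol. The function name emit_apple_tls and its buffer-based interface are hypothetical and are not taken from qbe's emitter; only the instruction sequence comes from the LLVM comment.

```c
#include <stdio.h>

/* Hypothetical helper, not qbe's emitter: write the Darwin arm64 TLS
 * access sequence for symbol `sym` into `buf`. Once the sequence has
 * run, x0 holds the address of the thread-local variable. Returns what
 * snprintf returns (the length the full text needs). */
int
emit_apple_tls(char *buf, size_t n, const char *sym)
{
	return snprintf(buf, n,
		"\tadrp\tx0, _%s@TLVPPAGE\n"
		"\tldr\tx0, [x0, _%s@TLVPPAGEOFF]\n" /* x0 = descriptor address */
		"\tldr\tx1, [x0]\n"                   /* x1 = function pointer in descriptor */
		"\tblr\tx1\n",                        /* call it; x0 = address of the variable */
		sym, sym);
}
```

	Because the message says the call is lowered like a regular call, a caller of such a helper would still have to treat all caller-save registers as clobbered.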
2022-10-08	mark apple targets with a boolean	Quentin Carbonneaux
	It is more natural to branch on a flag than have different function pointers for high-level passes.
2022-10-08	"rel" fields become "reloc"	Quentin Carbonneaux

2022-10-08	add support for thread-local storage	Quentin Carbonneaux
	The apple targets are not done yet.
2022-10-03	new arm64_apple target	Quentin Carbonneaux
	Should make qbe work on apple arm-based hardware.
2022-09-01	remove two unsigned	Quentin Carbonneaux
	We have a uint alias that we use everywhere else. I also added a todo about unhandled large offsets in arm64/emit.
2022-09-01	use direct bl calls on arm64	Quentin Carbonneaux
	This generates tidier code and is pic friendly because it lets the linker trampoline calls to dynlinked libs.
2022-08-31	drop -G flag and add target amd64_apple	Quentin Carbonneaux
	apple support is more than assembly syntax in case of arm64 machines, and apple syntax is currently useless in all cases but amd64; rather than having a -G option that only makes sense with amd64, we add a new target amd64_apple
2022-05-10	arm64: fix maximum immediate size for small loads/stores	Michael Forney
	The maximum immediate size for 1, 2, 4, and 8 byte loads/stores is 4095, 8190, 16380, and 32760 respectively[0][1][2].
	[0] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRB--immediate-
	[1] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRH--immediate-
	[2] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDR--immediate-
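	The limits above follow from the unsigned-offset form of these instructions, which stores the offset divided by the access size in a 12-bit field. A minimal sketch of the resulting check, with a hypothetical function name (fits_ldst_imm) that does not come from qbe:

```c
#include <stdint.h>

/* Hypothetical predicate: can `off` be encoded as the unsigned scaled
 * immediate of a load/store that accesses `size` bytes (1, 2, 4 or 8)?
 * The field holds off/size in 12 bits, so the offset has to be a
 * non-negative multiple of the access size and at most 4095*size. */
int
fits_ldst_imm(int64_t off, int size)
{
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return 0;
	if (off < 0 || off % size != 0)
		return 0;
	return off / size <= 4095;
}
```

	For example, fits_ldst_imm(32760, 8) holds while fits_ldst_imm(32768, 8) does not, which matches the 8-byte limit stated in the message.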
2022-03-14	dynamic stack allocs for arm64	Quentin Carbonneaux
	I also moved some isel logic that would have been repeated a third time in util.c.
2022-02-02	shared linkage logic for func/data	Quentin Carbonneaux

2022-01-31	arm64: handle large slots in Ocopy	Quentin Carbonneaux

2022-01-28	implement float -> unsigned casts	Bor Grošelj Simić
	amd64 lacks instruction for this so it has to be implemented with float -> signed casts. The approach is borrowed from llvm.
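	The message does not spell out the sequence. One common way to build this cast from a signed conversion is to split the input range at 2^63, sketched below in C for doubles. This is a branching illustration under the assumption 0 <= x < 2^64, not the instruction sequence from the commit, and the name dtoul is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: double -> uint64_t using only double -> int64_t
 * conversions. Assumes 0 <= x < 2^64; other inputs are not handled. */
uint64_t
dtoul(double x)
{
	const double two63 = 9223372036854775808.0; /* 2^63 */

	if (x < two63)
		return (uint64_t)(int64_t)x; /* already in signed range */
	/* Shift into signed range, convert, then put the top bit back. */
	return (uint64_t)(int64_t)(x - two63) ^ (UINT64_C(1) << 63);
}
```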
2022-01-28	implement unsigned -> float casts	Bor Grošelj Simić
	amd64 lacks an instruction for this so it has to be implemented with signed -> float casts:
	- Word casting is done by zero-extending the word to a long and then doing a regular signed cast.
	- Long casting is done by dividing by two with correct rounding if the highest bit is set and casting that to float, then adding 1 to mantissa with integer addition.
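	The two cases above can be sketched in C as follows, here for doubles. The function names are hypothetical. The final integer addition is implemented as adding the top bit, shifted to bit 52, to the double's bit pattern, which increments the exponent field and so doubles the value; that is my reading of the last step in the message, not something the message states in these terms.

```c
#include <stdint.h>
#include <string.h>

/* Word case: zero-extend to 64 bits; the signed conversion is then exact. */
double
uwtof(uint32_t w)
{
	return (double)(int64_t)(uint64_t)w;
}

/* Long case: if the top bit is set, halve the value while keeping the
 * dropped low bit as a sticky bit so rounding stays correct, convert as
 * signed, then double the result by bumping the exponent field. */
double
ultof(uint64_t x)
{
	uint64_t hi, half, bits;
	double d;

	hi = x >> 63;                   /* 1 if the highest bit is set */
	half = (x >> hi) | (x & hi);    /* x itself when hi is 0 */
	d = (double)(int64_t)half;
	memcpy(&bits, &d, sizeof bits); /* reinterpret as integer */
	bits += hi << 52;               /* exponent + 1, only when hi is 1 */
	memcpy(&d, &bits, sizeof d);
	return d;
}
```

	When the top bit is clear, both the shift and the addition are no-ops, so the function reduces to a plain signed conversion.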
2022-01-23	Add a negation instruction	Eyal Sawady
	Necessary for floating-point negation, because `%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
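	The sign problem is easy to reproduce in C, assuming IEEE 754 doubles, the default rounding mode and no fast-math style optimizations. The helper names below are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Return the IEEE 754 sign bit of d (1 for negative, including -0.0). */
int
sign_bit(double d)
{
	uint64_t b;

	memcpy(&b, &d, sizeof b);
	return (int)(b >> 63);
}

/* Counterpart of `sub 0, %operand`: yields +0.0 for an input of +0.0. */
double
sub_from_zero(double x)
{
	return 0.0 - x;
}

/* True negation flips the sign bit, so +0.0 becomes -0.0. */
double
negate(double x)
{
	return -x;
}
```

	With these definitions, sign_bit(negate(0.0)) is 1 while sign_bit(sub_from_zero(0.0)) is 0, even though both results compare equal to zero.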
2021-12-05	arm64: fix slots with offset >32k	Quentin Carbonneaux
	When slots are used with a large offset, the emitter generates invalid assembly code. That is caught later on by the assembler, but it prevents compilation of programs with large stack frames.
	When a slot offset is too large to be expressed as a constant offset to x29 (the frame pointer), emitins() inserts a late Oaddr instruction to x16 and replaces the large slot reference with x16.
	This change also gave me the opportunity to refactor the save/restore logic for callee-save registers.
	This fixes the following Hare issue: https://todo.sr.ht/~sircmpwn/hare/387
2021-10-25	arm64/emit.c: fix move instructions with big immediate values	Sudipto Mallick
	Fixes #467.
	It assumes that the stack won't need to grow beyond 2^32 bytes. If that were to happen, we'd need another or at most two more `movk` instructions.
	Signed-off-by: Sudipto Mallick <smlckz@disroot.org>
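	A minimal sketch of how a 32-bit immediate can be materialized with one `mov` and at most one `movk`, under the same 2^32 assumption as the message. The helper name emit_mov32 and its string-buffer interface are hypothetical and do not reflect the code in arm64/emit.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write instructions that load the 32-bit value
 * `v` into register `reg`. The low 16 bits go in with mov; the high
 * 16 bits, if any, are patched in with movk. Returns what snprintf
 * returns. */
int
emit_mov32(char *buf, size_t n, const char *reg, uint32_t v)
{
	unsigned lo = v & 0xffff;
	unsigned hi = v >> 16;

	if (hi == 0)
		return snprintf(buf, n, "\tmov\t%s, #%u\n", reg, lo);
	return snprintf(buf, n,
		"\tmov\t%s, #%u\n"
		"\tmovk\t%s, #0x%x, lsl #16\n",
		reg, lo, reg, hi);
}
```

	A full 64-bit value would need two further `movk` instructions with `lsl #32` and `lsl #48`, which is the extension the message alludes to.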
2021-10-25	arm64: handle copy of constant to slot	Michael Forney
	If registers spill onto the stack, we may end up with SSA like
	    S320 =l copy 0
	after rega(). Handle this case in arm64 emit().
2021-10-25	arm64: Handle slots in Ocopy operands	Michael Forney

2021-10-25	arm64: handle slots	Michael Forney

2021-03-02	arm64: handle stack offsets >=4096 in Oaddr	Michael Forney
	The immediate in the add instruction is only 12 bits. If the offset does not fit, we must move it into a register first.
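	The choice between the two forms can be sketched as below. Everything here is illustrative: the helper name emit_slot_addr is hypothetical, x29 is assumed as the base register, the destination register doubles as the scratch register, and the offset is assumed to fit in 16 bits so that a single `mov` suffices.

```c
#include <stdio.h>

/* Hypothetical helper: compute x29 + off into register `dst`.
 * Offsets below 4096 fit the 12-bit add immediate; larger ones are
 * first moved into `dst`. Assumes off <= 65535. Returns what snprintf
 * returns. */
int
emit_slot_addr(char *buf, size_t n, const char *dst, unsigned long off)
{
	if (off < 4096)
		return snprintf(buf, n, "\tadd\t%s, x29, #%lu\n", dst, off);
	return snprintf(buf, n,
		"\tmov\t%s, #%lu\n"
		"\tadd\t%s, x29, %s\n",
		dst, off, dst, dst);
}
```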
2020-08-06	arm64: Make sure SP stays aligned by 16	Michael Forney
	According to the ARMv8 overview document
	    However if SP is used as the base register then the value of the stack pointer prior to adding any offset must be quadword (16 byte) aligned, or else a stack alignment exception will be generated.
	This manifests as a bus error on my system.
	To resolve this, just save registers two at a time with stp.
2019-05-15	arm64: Handle stack allocations larger than 4095 bytes	Michael Forney
	In this case, the immediate is too large to use directly in the add/sub instructions, so move it into a temporary register first. Also, for clarity, rearrange the if-conditions so that they match the constraints of the instructions that immediately follow.
2019-05-15	arm64: Handle truncd instruction	Michael Forney

2019-05-15	arm64: Use 32-bit register name when loading 'b' or 'h' into 'l'	Michael Forney
	The ldrb and ldrh instructions require a 32-bit register name for the destination and will clear the upper 32-bits of that register.
2017-05-17	intern symbol names	Quentin Carbonneaux
	Symbols in the source file are still limited in length because the rest of the code assumes that strings always fit in NString bytes. Regardless, there is already a benefit because comparing/copying symbol names does not require using strcmp()/strcpy() anymore.
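	To illustrate why interning removes the need for strcmp()/strcpy() at use sites, here is a minimal sketch of an intern table that maps each distinct name to a small integer id. It uses a fixed-size array with linear search for brevity; the names intern and str, the table layout and the capacity are assumptions made for this example and say nothing about how qbe stores its symbols.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { MaxSyms = 1024 }; /* arbitrary capacity for the sketch */

static char *symtab[MaxSyms];
static uint32_t nsyms;

/* Return the id of `s`, adding it to the table on first sight.
 * Equal strings always get the same id. */
uint32_t
intern(const char *s)
{
	uint32_t i;
	size_t len;

	for (i = 0; i < nsyms; i++)
		if (strcmp(symtab[i], s) == 0)
			return i;
	if (nsyms == MaxSyms)
		abort();
	len = strlen(s) + 1;
	symtab[nsyms] = malloc(len);
	if (symtab[nsyms] == NULL)
		abort();
	memcpy(symtab[nsyms], s, len);
	return nsyms++;
}

/* Map an id back to its string. */
const char *
str(uint32_t id)
{
	return symtab[id];
}
```

	After interning, comparing two symbols is an integer comparison and copying one is an integer assignment; the string comparison happens only once, inside intern().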
2017-04-08	new arm64 backend, yeepee	Quentin Carbonneaux