~cnx/roux - Alternative QBE compiler

Age	Commit message	Author
2022-01-28	amd64/isel: nits	Quentin Carbonneaux

2022-01-28	fix test/fpcnv (wrong spacing)	Quentin Carbonneaux

2022-01-28	update token hash params	Quentin Carbonneaux

2022-01-28	implement float -> unsigned casts	Bor Grošelj Simić
	amd64 lacks an instruction for this, so it has to be implemented with float -> signed casts. The approach is borrowed from LLVM.
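	A rough C rendering of the idea, assuming inputs in [0, 2^64). Values below 2^63 go through the signed conversion directly. Larger values are first shifted down by 2^63, converted, and then get bit 63 set again. This branching form is only illustrative and is not necessarily the instruction sequence QBE emits. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: double -> uint64_t using only signed conversions.
 * Assumes 0 <= x < 2^64 (outside that range the C conversion is undefined). */
uint64_t dtoul_via_signed(double x)
{
	const double two63 = 9223372036854775808.0; /* 2^63 */

	if (x < two63)
		return (uint64_t)(int64_t)x; /* fits in a signed long */
	/* x - 2^63 is exact here and fits in a signed long; converting it and
	 * then setting bit 63 adds the 2^63 back. */
	return (uint64_t)(int64_t)(x - two63) | (UINT64_C(1) << 63);
}
```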
2022-01-28	implement unsigned -> float casts	Bor Grošelj Simić
	amd64 lacks an instruction for this, so it has to be implemented with signed -> float casts:
	- Word casting is done by zero-extending the word to a long and then doing a regular signed cast.
	- Long casting is done by dividing by two with correct rounding if the highest bit is set, casting that to float, and then adding 1 to the exponent with integer addition.
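	The two cases can be sketched in C for doubles, assuming IEEE 754 binary64. The halving keeps the shifted-out bit as a sticky low bit, which preserves correct rounding. The final doubling adds 1 << 52 to the bit pattern, which adds 1 to the exponent field. The function names are hypothetical, and this is a sketch of the technique rather than QBE's code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: uint64_t -> double using only a signed conversion. */
double ultod_via_signed(uint64_t x)
{
	uint64_t big = x >> 63;                 /* 1 if the top bit is set */
	/* Halve when big, ORing the dropped bit back in as a sticky bit so the
	 * rounding of the signed conversion below is still correct. */
	uint64_t halved = (x >> big) | (x & big);
	double d = (double)(int64_t)halved;     /* now fits in a signed long */
	uint64_t bits;

	/* Double the result by adding 1 to the exponent field (bit 52). */
	memcpy(&bits, &d, sizeof bits);
	bits += big << 52;
	memcpy(&d, &bits, sizeof d);
	return d;
}

/* Words need no trick: zero-extend to 64 bits, then convert as signed. */
double uwtod_via_signed(uint32_t x)
{
	return (double)(int64_t)(uint64_t)x;
}
```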
2022-01-23	increase token limit to 255	Bor Grošelj Simić

2022-01-23	bump copyright year	Quentin Carbonneaux

2022-01-23	check for fopen() errors for output file	Bor Grošelj Simić

2022-01-23	Add a negation instruction	Eyal Sawady
	Necessary for floating-point negation, because `%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
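	A small C check of the signed-zero problem, assuming IEEE 754 doubles and the default rounding mode. `0.0 - x` yields +0.0 for x = +0.0, whereas negation yields -0.0. The two results still compare equal with `==`, so only the sign bit reveals the difference. The helper names are made up for illustration.

```c
#include <math.h>

/* Negation written as `sub 0, x`: for x = +0.0 this gives +0.0. */
double neg_via_sub(double x) { return 0.0 - x; }

/* A real negation flips the sign bit, so -(+0.0) is -0.0. */
double neg_direct(double x) { return -x; }
```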
2021-12-05	arm64: fix slots with offset >32k	Quentin Carbonneaux
	When slots are used with a large offset, the emitter generates invalid assembly code. That is caught later on by the assembler, but it prevents compilation of programs with large stack frames. When a slot offset is too large to be expressed as a constant offset to x29 (the frame pointer), emitins() inserts a late Oaddr instruction to x16 and replaces the large slot reference with x16. This change also gave me the opportunity to refactor the save/restore logic for callee-save registers. This fixes the following Hare issue: https://todo.sr.ht/~sircmpwn/hare/387
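	A64 loads and stores with an unsigned immediate offset encode a 12-bit value scaled by the access size. For 8-byte accesses, the largest reachable offset from x29 is therefore 4095 * 8 = 32760 bytes, which matches the 32k figure in the title. The hypothetical helper below checks that condition. It ignores the unscaled LDUR/STUR forms and assumes `size` is a nonzero power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a slot at `off` bytes from x29 be addressed
 * directly by an LDR/STR with an unsigned, scaled 12-bit immediate for an
 * access of `size` bytes? If not, the address has to be materialized in
 * a scratch register first (x16 in the commit above). */
bool slot_fits_uimm12(uint64_t off, unsigned size)
{
	return off % size == 0 && off / size <= 4095;
}
```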
2021-11-22	reuse previous address constants in fold()	Michael Forney
	parseref() has code to reuse address constants, but this is not done in other passes such as fold or isel. Introduce a new function newcon() which takes a Con and returns a Ref for that constant, and use this whenever creating address constants.
	This is necessary to fix folding of address constants when one operand is already folded. For example, in
	    %a =l add $x, 1
	    %b =l add %a, 2
	    %c =w loadw %b
	%a and %b were folded to $x+1 and $x+3 respectively, but then the second add is visited again since it uses %a. This gets folded to $x+3 as well, but as a new distinct constant. This results in %b getting labeled as bottom instead of either constant, disabling the replacement of %b by a constant in subsequent instructions (such as the loadw).
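	The deduplication idea can be sketched as follows. The `Con` and `Fn` types here are simplified, hypothetical stand-ins (a symbol id plus an offset with a fixed-size table), not QBE's actual structures. Folding $x+3 twice returns the same index both times, so the fold lattice sees one constant rather than two.

```c
#include <stddef.h>

/* Hypothetical, simplified constant: a symbol plus an offset, e.g. $x+3. */
typedef struct {
	int sym;
	long off;
} Con;

typedef struct {
	Con con[64];
	size_t ncon;
} Fn;

/* Return the index of an existing identical constant, and only append a
 * new entry when none matches. Returns (size_t)-1 if the table is full. */
size_t newcon_sketch(Fn *fn, Con c)
{
	size_t i;

	for (i = 0; i < fn->ncon; i++)
		if (fn->con[i].sym == c.sym && fn->con[i].off == c.off)
			return i;
	if (fn->ncon == sizeof fn->con / sizeof fn->con[0])
		return (size_t)-1;
	fn->con[fn->ncon] = c;
	return fn->ncon++;
}
```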
2021-11-10	fold: Prevent error when address is used as operand	Michael Forney

2021-11-10	bump NString	Quentin Carbonneaux

2021-11-10	fold: Don't fold invalid addition/subtraction rather than failing	Michael Forney
	This may happen in a branch QBE doesn't realize is unreachable, for example (simplified from real code found in ncurses):
	    data $str = { b "abcdef", b 0 }
	    function l $f(w %x) {
	    @start
	        %.1 =w ceqw %x, 0
	        jnz %.1, @logic_join, @logic_right
	    @logic_right
	        %p =l call $strchr(l $str, w %x)
	        %.2 =w ceql %p, 0
	    @logic_join
	        %.3 =w phi @start %.1, @logic_right %.2
	        jnz %.3, @fail, @return
	    @fail
	        ret 0
	    @return
	        %.4 =l sub %p, $str
	        ret %.4
	    }
2021-11-08	amd64: avoid reading past end of passed struct	Michael Forney
	If the size of the struct is not a multiple of 8, the actual struct size may be different from the size reserved on the stack. This fixes the case where the struct is passed in memory, but we still may over-read a struct passed in registers. A TODO is added for now.
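	A sketch of copying a struct without over-reading. It assumes the copy is done in power-of-two moves of at most 8 bytes. The helper is hypothetical and only shows why a 12-byte struct should be copied as 8 + 4 bytes rather than as two 8-byte moves, which would read 4 bytes past its end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy exactly `size` bytes using 8-, 4-, 2- and
 * 1-byte moves, largest first, never touching bytes past `size`.
 * Returns the number of moves performed. */
int blit_exact(void *dst, const void *src, size_t size)
{
	static const size_t chunk[] = {8, 4, 2, 1};
	size_t off = 0;
	int moves = 0;

	for (int i = 0; i < 4; i++)
		for (; size - off >= chunk[i]; off += chunk[i]) {
			memcpy((char *)dst + off, (const char *)src + off, chunk[i]);
			moves++;
		}
	return moves;
}
```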
2021-11-08	fix for sloppy reg->mem in arm64 abi	Quentin Carbonneaux
	Michael found a bug where some copies from registers to memory in the arm64 abi clobber the stack. The test case is:
	    type :T = { w }
	    function w $f() {
	    @start
	        %p =:T call $g()
	        %x =w loadw %p
	        ret %x
	    }
	qbe will write 4 bytes out of bounds when pulling the result struct from its register. The same bug can be observed if :T's definition is { w 3 }; in this case qbe writes 16 bytes in a slot of 12 bytes.
	This patch changes stkblob() to use the rounded argument size if it is going to be restored from registers.
	Relatedly, mem->reg loads for structs with size < 16 and != 8 are treated a bit sloppily both in the arm64 and in the sysv abis. That is much less harmful than the present bug.
2021-10-28	new chacha20 test	Quentin Carbonneaux

2021-10-26	use unified diff format for test output	Michael Forney
	This makes it easier to understand the differences.
2021-10-26	remove trailing whitespace from test/abi7.ssa	Michael Forney

2021-10-26	spill: fix regs assertions	Quentin Carbonneaux
	Some arm64 abi tests have been failing for some time now. This fixes them by being a bit more careful with liveset management in spill.c.
	A late bsclr() call in spill.c may drop legitimately live registers in, e.g.,
	    R12 =w add R12, 1
	While it hurts for regs, it does not matter for ssa temps because those cannot be both in the arguments & return (by the ssa invariant). I added a check before bsclr() to make sure we are clearing only ssa temps.
	One might be surprised that any ssa temp may be live at this point. The reason why this is the case is the special handling of dead return values earlier in spill(). I think that it is the only case where the return value can be (awkwardly) live at the same time as the arguments, and I think this never happens with registers (i.e., we never have dead register-assigning instructions). I added an assert to check the latter invariant.
	Finally, there was a simple bug in the arm64 abi which I fixed: in case the return happens via a pointer, x8 needs to be marked live at the beginning of the function. This was caught by test/abi4.ssa.
2021-10-26	arm64: Add LR to list of registers to save	Michael Forney
	Tested-by: Thomas Bracht Laumann Jespersen <t@laumann.xyz> Fixes: https://todo.sr.ht/~sircmpwn/hare/312
2021-10-25	arm64/emit.c: fix move instructions with big immediate values	Sudipto Mallick
	Fixes #467. It assumes that the stack won't need to grow beyond 2^32 bytes. If that were to happen, we'd need one or at most two more `movk` instructions.
	Signed-off-by: Sudipto Mallick <smlckz@disroot.org>
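	Each `movz`/`movk` supplies one 16-bit chunk of an immediate. A value below 2^32 therefore needs at most a `movz` plus one `movk`, and a full 64-bit value needs up to three `movk`s. The hypothetical helper below builds such a sequence, skipping zero chunks because `movz` already clears them. It ignores `movn`, which can shorten sequences for values with many 1 bits.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a movz/movk sequence that loads `v` into
 * `reg` into out[0..n-1] and return n (1 to 4 instructions). */
int emit_mov_imm(char out[][48], const char *reg, uint64_t v)
{
	int n = 0;

	if (v == 0) {
		snprintf(out[n++], 48, "movz %s, #0", reg);
		return n;
	}
	for (int sh = 0; sh < 64; sh += 16) {
		unsigned chunk = (unsigned)(v >> sh) & 0xffff;

		if (chunk == 0)
			continue; /* movz already zeroed these bits */
		snprintf(out[n], 48, "%s %s, #%u, lsl #%d",
		         n == 0 ? "movz" : "movk", reg, chunk, sh);
		n++;
	}
	return n;
}
```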
2021-10-25	arm64: handle copy of constant to slot	Michael Forney
	If registers spill onto the stack, we may end up with SSA like S320 =l copy 0 after rega(). Handle this case in arm64 emit().
2021-10-25	arm64: Handle slots in Ocopy operands	Michael Forney

2021-10-25	arm64: handle slots	Michael Forney

2021-10-22	make variadic args explicit	Quentin Carbonneaux
	Some abis, like the riscv one, treat arguments differently depending on whether they are variadic or not. To prepare for the upcoming riscv target, we change the variadic call syntax and give meaning to the location of the '...' marker.
	    # new syntax
	    %ret =w call $f(w %regular, ..., w %variadic)
	By nature of their abis, the change is backwards compatible for existing targets.
2021-10-17	use -static when cross-compiling tests	Quentin Carbonneaux

2021-10-17	amd64/sysv: unbreak env calls	Quentin Carbonneaux
	Env calls were dysfunctional from the start. This fixes them on amd64, but they remain to be fixed on arm64. A new test shows how to use them.
2021-10-13	add size suffix to frame setup.	Andrew Chambers

2021-10-11	spill: add some comments describing functions	Michael Forney

2021-10-11	util: fix typo preventing 4-byte copy in blit()	Michael Forney

2021-10-11	avoid one last gcc truncation warning	Michael Forney

2021-09-20	parse: fix loadw when assigned to l temporary	Michael Forney
	The documentation states that loadw is syntactic sugar for loadsw, but it actually got parsed as Oload. If the result is an l temporary, Oload behaves like Oloadl, not Oloadsw. To fix this, parse Tloadw as Oloadsw explicitly.
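	In C terms, the difference between the intended `loadsw` and the `loadl`-like behaviour is a sign-extended 4-byte read versus a full 8-byte read. A sketch with hypothetical names, using `memcpy` to sidestep alignment and aliasing issues:

```c
#include <stdint.h>
#include <string.h>

/* What `%t =l loadw %p` should compute: read 4 bytes, sign-extend to 64. */
int64_t loadsw_sketch(const void *p)
{
	int32_t w;

	memcpy(&w, p, sizeof w);
	return w; /* implicit sign extension to int64_t */
}

/* What the bug produced instead: a full 8-byte load, as with loadl. */
int64_t loadl_sketch(const void *p)
{
	int64_t l;

	memcpy(&l, p, sizeof l);
	return l;
}
```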
2021-09-09	skip nx stack annotation on osx	Quentin Carbonneaux

2021-09-07	test: use architecture-neutral wrapper for calling vprintf	Michael Forney
	Different architectures use different types for va_list.
	x86_64 uses a 1-length array of struct type[0]:
	    typedef struct {
	        unsigned int gp_offset;
	        unsigned int fp_offset;
	        void *overflow_arg_area;
	        void *reg_save_area;
	    } va_list[1];
	aarch64 uses a struct type[1]:
	    typedef struct {
	        void *__stack;
	        void *__gr_top;
	        void *__vr_top;
	        int __gr_offs;
	        int __vr_offs;
	    } va_list;
	Consequently, C functions which take a va_list as an argument, such as vprintf, may pass va_list in different ways depending on the architecture.
	On x86_64, va_list is an array type, so the parameter decays to a pointer and passing the address of the va_list is correct.
	On aarch64, the va_list struct is passed by value, but since it is larger than 16 bytes, the parameter is replaced with a pointer to caller-allocated memory. Thus, passing the address as an l argument happens to work.
	However, this pattern of passing the address of the va_list to vprintf doesn't extend to other architectures. On riscv64, va_list is defined as[2]
	    typedef void *va_list;
	which is not passed by reference. This means that tests that call vprintf using the address of a va_list (vararg1 and vararg2) will not work on riscv.
	To fix this while keeping the tests architecture-neutral, add a small wrapper function to the driver which takes a va_list *, and let the C compiler deal with the details of passing va_list by value.
	[0] https://c9x.me/compile/bib/abi-x64.pdf#figure.3.34
	[1] https://c9x.me/compile/bib/abi-arm64.pdf#%5B%7B%22num%22%3A63%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C52%2C757%2C0%5D
	[2] https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#va_list-va_start-and-va_arg
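	Such a wrapper might look like the following. The names are hypothetical, not the ones used in the test driver. Because the caller passes a `va_list *` and the wrapper dereferences it, the C compiler decides how the `va_list` itself is handed to `vprintf` on each target. This works whether `va_list` is an array type, a struct, or a plain pointer.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: take the va_list by address, pass it on by value. */
int vprintf_wrap(const char *fmt, va_list *ap)
{
	return vprintf(fmt, *ap);
}

/* Example variadic caller, standing in for code generated by QBE. */
int print_via_wrapper(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vprintf_wrap(fmt, &ap);
	va_end(ap);
	return n;
}
```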
2021-09-07	test: assign result of print functions to temporary	Michael Forney
	Though I am not aware of any architecture where this matters, it is technically incorrect to call these stdio functions as if they had no result. The QBE documentation says
	> Unless the called function does not return a value, a return
	> temporary must be specified, even if it is never used afterwards.
	so we should follow it in the tests as well.
2021-08-30	skip jump arguments in rega	Quentin Carbonneaux
	On both amd64 & arm64, the jumps making it to rega won't have any arguments.
2021-08-29	amd64/isel: fix floating point == and != result with NaN	Michael Forney
	On x86_64, ucomis[sd] sets ZF=1, PF=0, CF=0 for equal arguments. However, if the arguments are unordered it sets ZF=1, PF=1, CF=1, and there is no jump/flag instruction for ZF=1 & PF=0 or ZF=1 & CF=0.
	So, in order to correctly implement ceq[sd] on x86_64, we need to be a bit more creative. There are several options available, depending on whether the result of ceq[sd] is used with jnz, or with other instructions, or both.
	If the result is used for a conditional jump, both gcc and clang use a combination of jp and jnz:
	    ucomisd %xmm1, %xmm0
	    jp .Lfalse
	    jnz .Lfalse
	    ...
	    .Lfalse:
	If the result is used in other instructions or return, gcc does the following for x == y:
	    ucomisd %xmm1, %xmm0
	    setnp %al
	    movzbl %al, %eax
	    movl $0, %edx
	    cmovne %edx, %eax
	This sets EAX to 1 if PF=0 (and to 0 otherwise), then uses cmovne to clear it if ZF=0. It also takes care to avoid clobbering the flags register in case the result is also used for a conditional jump. Implementing this approach in QBE would require adding an architecture-specific instruction for cmovne.
	In contrast, clang does an additional compare, this time using cmpeqsd instead of ucomisd:
	    cmpeqsd %xmm1, %xmm0
	    movq %xmm0, %rax
	    andl $1, %eax
	The cmpeqsd instruction does a floating point equality test, setting XMM0 to all 1s if they are equal and all 0s if they are not. However, we need the result in a non-XMM register, so it moves the result back, then masks off all but the lowest bit.
	Both of these approaches are a bit awkward to implement in QBE, so instead, this commit does the following:
	    ucomisd %xmm1, %xmm0
	    setz %al
	    movzbl %al, %eax
	    setnp %cl
	    movzbl %cl, %ecx
	    andl %ecx, %eax
	This sets the result by ANDing the two flags, but has the side effect of clobbering the flags register. This was a problem in one of my earlier patches to fix this issue[0], in addition to being more complex than I'd hoped.
	Instead, this commit always leaves the ceq[sd] instruction in the block, even if the result is only used to control a jump, so that the above instruction sequence is always used. Then, since we now have ZF=!(ZF=1 & PF=0) for x == y, or ZF=!(ZF=0 | PF=1) for x != y, we can use jnz for the jump instruction.
	[0] https://git.sr.ht/~sircmpwn/qbe/commit/64833841b18c074a23b4a1254625315e05b86658
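	The flag logic can be modelled in C to check it against NaN inputs. The `flags` struct and `ucomis()` function below are a simplified, hypothetical model of what `ucomisd` sets (OF, SF and AF are ignored), not QBE code. Testing ZF alone accepts NaN == NaN, while ZF=1 & PF=0 rejects it.

```c
#include <math.h>
#include <stdbool.h>

/* Simplified model of the flags set by `ucomisd b, a` (AT&T order),
 * which compares a against b. */
struct flags { bool zf, pf, cf; };

static struct flags ucomis(double a, double b)
{
	if (isnan(a) || isnan(b))
		return (struct flags){true, true, true};   /* unordered */
	if (a < b)
		return (struct flags){false, false, true}; /* less than */
	if (a == b)
		return (struct flags){true, false, false}; /* equal */
	return (struct flags){false, false, false};    /* greater than */
}

/* setz alone: wrongly reports NaN == NaN as true. */
int ceq_setz_only(double a, double b) { return ucomis(a, b).zf; }

/* setz; setnp; and: equal only if ZF=1 and PF=0. */
int ceq_fixed(double a, double b)
{
	struct flags f = ucomis(a, b);
	return f.zf && !f.pf;
}

/* The x != y counterpart: ZF=0 or PF=1. */
int cne_fixed(double a, double b)
{
	struct flags f = ucomis(a, b);
	return !f.zf || f.pf;
}
```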
2021-08-27	amd64/isel: fix floating < and <= result with NaN	Michael Forney
	When the two operands are Unordered (for instance if one of them is NaN), ucomisd sets ZF=1, PF=1, and CF=1. When the result is LessThan, it sets ZF=0, PF=0, and CF=1. However, jb[e]/setb[e] only check that CF=1 [or ZF=1], which causes the result to be true for unordered operands. To fix this, change the operand swap condition for these two floating point comparison types: always rewrite x < y as y > x, and never rewrite x > y as y < x. Add a test to check the result of cltd, cled, cgtd, cged, ceqd, and cned with arguments that are LessThan, Equal, GreaterThan, and Unordered. Additionally, check three different implementations for equality testing: one that uses the result of ceqd directly, one that uses the result to control a conditional jump, and one that uses the result both as a value and for a conditional jump. For now, unordered equality tests are still broken so they are disabled.
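	Using a simplified, hypothetical model of the `ucomisd` flags (not QBE code), the effect of the swap can be checked. `setb` only tests CF=1 and so accepts unordered inputs. Computing x < y as y > x tests CF=0 and ZF=0 instead, which rejects them.

```c
#include <math.h>
#include <stdbool.h>

/* Simplified model of the flags set by `ucomisd b, a` (AT&T order). */
struct flags { bool zf, pf, cf; };

static struct flags ucomis(double a, double b)
{
	if (isnan(a) || isnan(b))
		return (struct flags){true, true, true};   /* unordered */
	if (a < b)
		return (struct flags){false, false, true}; /* less than */
	if (a == b)
		return (struct flags){true, false, false}; /* equal */
	return (struct flags){false, false, false};    /* greater than */
}

/* x < y via setb: CF=1, which is also set for unordered operands. */
int clt_with_setb(double x, double y) { return ucomis(x, y).cf; }

/* x < y rewritten as y > x via seta: CF=0 and ZF=0. */
int clt_swapped(double x, double y)
{
	struct flags f = ucomis(y, x);
	return !f.cf && !f.zf;
}
```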
2021-08-23	amd64/emit.c: fix %x =k sub %x, %x	Eyal Sawady
	The negate trick is unnecessary and broken when the first arg is the result.
2021-08-23	test: include exit status in test failure reason	Michael Forney
	This was intended, but was missing due to a typo in the test status variable.
2021-08-23	parsefields: fix padding calculation	Drew DeVault
	This was causing issues with aggregate types. A simple reproduction is:
	    type :type.1 = align 8 { 24 }
	    type :type.2 = align 8 { w 1, :type.1 1 }
	The size of type.2 should be 32, adding only 4 bytes of padding between the first and second field. Prior to this patch, 20 bytes of padding was added instead, causing the type to have a size of 48.
	Signed-off-by: Drew DeVault <sir@cmpwn.com>
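	The expected size follows from the usual align-up rule. The helper below is a generic sketch, not QBE's parsefields(). It treats `:type.2` as a w field (size 4, alignment 4) followed by a `:type.1` field (size 24, alignment 8), inside an aggregate aligned to 8.

```c
#include <stddef.h>

/* Round `off` up to a multiple of `align` (a power of two). */
static size_t align_up(size_t off, size_t align)
{
	return (off + align - 1) & ~(align - 1);
}

/* Lay out n fields in order, padding each to its alignment, then pad the
 * total to the aggregate alignment. Returns the aggregate size. */
size_t aggregate_size(const size_t *size, const size_t *align, int n,
                      size_t agg_align)
{
	size_t off = 0;

	for (int i = 0; i < n; i++) {
		off = align_up(off, align[i]); /* padding before field i */
		off += size[i];
	}
	return align_up(off, agg_align);
}
```

	For `:type.2`, the w field ends at offset 4, 4 bytes of padding bring the next field to offset 8, and the 24-byte field ends at 32.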
2021-08-02	copy: consider identity element for more instructions	Michael Forney
	udiv %x, 1 == %x, and for each of sub, or, xor, sar, shr, and shl, <op> %x, 0 == %x.
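	A sketch of such an identity check, using a hypothetical opcode enum. Besides the instructions named above, it includes add (identity 0) and mul and div (identity 1) for completeness. Nothing here is meant to describe which cases QBE's copy pass handled before this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration. */
enum op { OP_ADD, OP_SUB, OP_OR, OP_XOR, OP_SAR, OP_SHR, OP_SHL,
          OP_MUL, OP_DIV, OP_UDIV };

/* Is `op %x, c` always equal to %x? */
bool is_right_identity(enum op op, int64_t c)
{
	switch (op) {
	case OP_ADD: case OP_SUB: case OP_OR: case OP_XOR:
	case OP_SAR: case OP_SHR: case OP_SHL:
		return c == 0;
	case OP_MUL: case OP_DIV: case OP_UDIV:
		return c == 1;
	}
	return false;
}
```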
2021-08-02	gas: always emit GNU-stack note	Érico Nogueira
	In cases where stash was 0, gasemitfin exits immediately and the GNU-stack note isn't added to the asm output. This would result in an executable where GNU_STACK uses flags RWE instead of the desired RW.
2021-07-30	err when an address contains a sum $a+$b (afl)	Quentin Carbonneaux
	Reported by Alessandro Mantovani. These addresses are likely bogus, but they triggered an unwarranted assertion failure. We now raise a civilized error.
2021-07-29	load: handle all cases in cast()	Michael Forney
	Previously, all casts but d->w, d->s, l->s, s->d, w->d were supported. At least the first three can occur by storing to then loading from a slot, currently triggering an assertion failure. Though the other two might not be possible, they are easy enough to support as well. Fixes hare#360.
2021-07-28	handle fast locals in amd64 shifts (afl)	Quentin Carbonneaux
	Reported by Alessandro Mantovani. Although unlikely in real programs, it was found that using the address of a fast local in amd64 shifts triggers assertion failures. We now err when the shift count is given by an address; but we allow shifting an address.
2021-07-28	fix buffer overflow in parser (afl)	Quentin Carbonneaux
	Reported by Alessandro Mantovani. Overly long function names would trigger out-of-bounds accesses.
2021-07-28	fix amd64 addressing selection bug (afl)	Quentin Carbonneaux
	Reported by Alessandro Mantovani. Unlikely to be hit in practice because we don't add addresses to addresses.
	    type :biggie = { l, l, l }
	    function $repro(:biggie %p) {
	    @start
	        %x =l add %p, $a
	        storew 42, %x
	        ret
	    }
2021-06-17	amd64: fix conditional jump when compare is swapped and used elsewhere	Michael Forney
	selcmp may potentially swap the arguments and return 1 indicating that the opposite operation should be used. However, if the compare result is used for a conditional jump as well as elsewhere, the original compare op is used instead of the opposite. To fix this, add a check to see whether the opposite compare should be used, regardless of whether selcmp() is done now, or later on during sel(). Bug report and test case from Charlie Stanton.