Age | Commit message | Author
2023-08-26 | Fix conversion from float/double to unsigned int | Michael Forney
signed int can't represent all the values of unsigned int, so we need to do the conversion to signed long, and use the lower 32 bits as the result.
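The idea can be sketched in C as follows. This is only an illustration of the technique described above, not the code generator's actual output; the function name `dtoui` and the use of `int64_t` for "signed long" are assumptions.

```c
#include <stdint.h>

/* Convert a double to a 32-bit unsigned value by converting to a
 * 64-bit signed integer first and keeping the lower 32 bits.
 * Assumes the truncated value of d fits in an int64_t; every value
 * in the unsigned 32-bit range [0, 2^32) does. */
uint32_t
dtoui(double d)
{
    int64_t wide = (int64_t)d;  /* truncates toward zero */
    return (uint32_t)wide;      /* reduces modulo 2^32 */
}
```

Going through a 32-bit signed int instead would fail for inputs such as 3000000000.0, which exceed INT32_MAX.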
2023-08-18 | test.sh fixes for devuan linux | Quentin Carbonneaux
2023-08-18 | file,loc become dbgfile,dbgloc | Quentin Carbonneaux
2023-06-07 | parseline() tweaks | Quentin Carbonneaux
2023-06-06 | implement line number info tracking | Thomas Bracht Laumann Jespersen
Support "file" and "loc" directives. "file" takes a string (a file name), assigns it a number, sets the current file to that number, and records the string for later. "loc" takes a single number and outputs location information with a reference to the current file.
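A minimal sketch of the bookkeeping these two directives imply is given below. All names (`setfile`, `fmtloc`, `MaxFiles`), the reuse of a number for a name seen before, and the output format are hypothetical and chosen for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MaxFiles = 16 };              /* hypothetical limit */

static const char *files[MaxFiles];  /* recorded names, indexed by number */
static int nfiles;
static int curfile = -1;

/* "file": assign the name a number (reusing it if the name was seen
 * before, an assumption), make it current, and keep the string. */
int
setfile(const char *name)
{
    int n;

    for (n = 0; n < nfiles; n++)
        if (strcmp(files[n], name) == 0)
            return curfile = n;
    assert(nfiles < MaxFiles);
    files[nfiles] = name;
    return curfile = nfiles++;
}

/* "loc": format a location that refers to the current file number.
 * The textual format here is illustrative only. */
int
fmtloc(char *buf, size_t len, int line)
{
    assert(curfile >= 0);
    return snprintf(buf, len, "loc file=%d line=%d", curfile, line);
}
```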
2023-05-31 | Bump NString | Alexey Yerin
2023-05-09 | fix sub-word returns on arm64_apple | Quentin Carbonneaux
2023-04-03 | Fix 1 C UB | Locria Cyber
2023-04-02 | amd64_apple: one more thread-local symbols fix | Quentin Carbonneaux
We now treat thread-local symbols in Mems properly.
2023-04-02 | tests for thread-local addresses | Quentin Carbonneaux
2023-04-02 | amd64_apple: support thread-local addresses | Quentin Carbonneaux
Non-store/load instructions were not lowered correctly for thread-local symbols. This is an attempt at a fix (cannot test for now).
2023-04-02 | amd64_sysv: fix offsets in thread-local Oaddr | Quentin Carbonneaux
2023-04-02 | print prefix for thread-local symbols | Quentin Carbonneaux
2023-04-02 | amd64_sysv: thread-local support in Oaddr | Quentin Carbonneaux
Thanks to Lassi Pulkkinen for flagging the issue and pointing me to Ulrich Drepper's extensive doc [1].
[1] https://people.redhat.com/drepper/tls.pdf
2023-03-22 | rename blknew() to newblk() | Quentin Carbonneaux
This is consistent with newtmp() and newcon().
2023-03-19 | naming nit | Quentin Carbonneaux
2023-03-16 | silence format warning more reliably | Quentin Carbonneaux
2023-03-15 | silence some warnings | Quentin Carbonneaux
2023-03-13 | fix memory leak | Quentin Carbonneaux
2023-03-13 | refresh stale Tmp.link before use | Quentin Carbonneaux
During coalescing, the resizing/reordering of the sl[] array invalidates the indices stored in the 'visit' field of temps; we need to reset it before we can use it again.
2023-03-11 | Emit .type and .size directives on RISC-V and ARM | Alexey Yerin
To match x86
2023-03-11 | kill dead stores when coalescing slots | Quentin Carbonneaux
This is necessary because, post fusion, dead stores may clobber data. A new test case exposes one such situation.
2023-01-09 | reorder some sections in doc v1.1 | Quentin Carbonneaux
2022-12-27 | ready for this jelly | Quentin Carbonneaux
2022-12-25 | link pthread in tests | Quentin Carbonneaux
2022-12-25 | new UNDEF Ref | Quentin Carbonneaux
Crashing loads of uninitialized memory proved to be a problem when implementing unions using qbe. This patch introduces a new UNDEF Ref to represent data that is known to be uninitialized. Optimization passes can make use of it to eliminate some code. In the last compilation stages, UNDEF is treated as the constant 0xdeaddead.
2022-12-16 | update documentation | Quentin Carbonneaux
2022-12-15 | bugfix in load elimination | Quentin Carbonneaux
When checking if two slices represent the same range of memory we must check that offsets match. The bug was revealed by a harec test.
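The comparison described above can be sketched as follows. The `Slice` struct and its fields are hypothetical stand-ins for illustration, not the types used by the load elimination pass.

```c
#include <stdbool.h>

/* Hypothetical description of a slice of memory. */
struct Slice {
    int base;   /* identifies the base address, e.g. a temporary */
    long off;   /* byte offset from the base */
    int size;   /* number of bytes covered */
};

/* Two slices name the same range only if base, offset, and size all
 * agree. Leaving out the offset comparison would wrongly equate,
 * say, bytes 0..3 and bytes 4..7 of the same object. */
bool
samerange(struct Slice a, struct Slice b)
{
    return a.base == b.base
        && a.off == b.off
        && a.size == b.size;
}
```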
2022-12-14 | new blit instruction | Quentin Carbonneaux
2022-12-14 | fix coalesce() to produce valid ssa | Quentin Carbonneaux
When multiple stack slots are coalesced one 'alloc' instruction is kept in the il and the other ones are removed and have their uses replaced by the result of the selected one. To produce valid ssa, it must be ensured that the uses that get replaced are dominated by the selected 'alloc' instruction. This patch ensures dominance by moving the selected alloc up in the start block as necessary.
2022-12-12 | treat retc as non-escaping | Quentin Carbonneaux
We may well treat all rets as non-escaping since stack slots are destroyed upon function return.
2022-12-12 | new rsval() helper for signed Refs | Quentin Carbonneaux
The .val field is signed in RSlot. Add a new dedicated function to fetch it as a signed int.
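Reading a signed value out of a narrow unsigned field amounts to sign extension. The sketch below shows one common way to do it; the 29-bit width and the name `sval` are assumptions made for the example, not details confirmed by the commit message.

```c
#include <stdint.h>

enum { ValBits = 29 };   /* assumed width of the value field */

/* Interpret the low ValBits bits of v as a two's complement number. */
int
sval(uint32_t v)
{
    uint32_t sign = (uint32_t)1 << (ValBits - 1);

    v &= (sign << 1) - 1;               /* keep only the field's bits */
    return (int)(v ^ sign) - (int)sign; /* flip the sign bit, then rebias */
}
```

The xor-and-subtract form avoids relying on implementation-defined behavior of right shifts on negative values.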
2022-12-12 | crash loads from uninitialized slots | Quentin Carbonneaux
2022-12-12 | renamings in coalesce() | Quentin Carbonneaux
2022-12-12 | zero msbs of 32-bit constants | Quentin Carbonneaux
Some noisy assemblers complain when asked to do it themselves.
2022-11-27 | new hlt block terminator | Quentin Carbonneaux
It is handy to express when the end of a block cannot be reached. If a hlt terminator is executed, it traps the program. We don't go the llvm way and specify execution semantics as undefined behavior.
2022-11-24 | cosmetics in mem.c | Quentin Carbonneaux
2022-11-22 | use a new struct for symbols | Quentin Carbonneaux
Symbols are a useful abstraction that occurs in both Con and Alias. In this patch they get their own struct. This new struct packages a symbol name and a type; the type tells us where the symbol name must be interpreted (currently, in global memory or in thread-local storage). The refactor fixed a bug in addcon(), proving the value of packaging symbol names with their type.
2022-11-22 | rename Tmp.ins to be more descriptive | Quentin Carbonneaux
2022-11-21 | fix allocation ordering bug in rega | Quentin Carbonneaux
When we process one block, we start by allocating registers for all the temporaries live at the exit of the block. Before this patch we processed temps first, then in doblk() we would mark globally live registers allocated. This meant that temps could get wrongly assigned a live register. The fix is simple: we now process registers first at block exits, then allocate temps.
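The ordering matters because whatever is reserved first constrains what comes after. The toy allocator below illustrates the fixed order under strong simplifications: registers are bits in a mask, and `exitalloc` and `NReg` are hypothetical names. It is not the rega pass itself.

```c
#include <assert.h>

enum { NReg = 8 };   /* hypothetical register count */

/* Assign a register to each of ntmp temporaries live at a block exit.
 * livereg is a bitset of registers that are themselves live there;
 * reserving them first keeps temporaries from landing on them. */
void
exitalloc(unsigned livereg, int ntmp, int *reg)
{
    unsigned taken;
    int t, r;

    taken = livereg;                /* registers first ... */
    for (t = 0; t < ntmp; t++) {    /* ... then temporaries */
        for (r = 0; r < NReg; r++)
            if (!(taken & (1u << r)))
                break;
        assert(r < NReg);           /* a real allocator would spill */
        taken |= 1u << r;
        reg[t] = r;
    }
}
```

Had the temporaries been processed before `taken` was seeded with `livereg`, the first temporary would have received register 0 even when register 0 is live.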
2022-11-21 | recognize some phis as copies | Quentin Carbonneaux
The copy elimination pass is not complete. This patch improves things a bit, but I think we still have quite a bit of incompleteness.

We now consistently mark phis with all arguments identical as copies. Previously, they were inconsistently eliminated by phisimpl(). An example where they were not eliminated is the following:

    @blk2
        %a = phi @blk0 %x, @blk1 %x
        jnz ?, @blk3, @blk4
    @blk3
        %b = copy %x
    @blk4
        %c = phi @blk2 %a, @blk3 %b

In this example, neither %c nor %a were marked as copies of %x because, when phisimpl() is called, the copy information for %b is not available.

The incompleteness is still present and can be observed by modifying the example above so that %a takes a copy of %x through a back-edge. Then, phisimpl()'s lack of copy information about %b will prevent optimization.
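The test "all arguments identical" and its dependence on already-known copy information can be illustrated with a simplified model. The `Phi` struct, the integer value ids, and the `copyof` table are assumptions for the sketch; the real pass works on QBE's own data structures.

```c
#include <assert.h>

/* Hypothetical, simplified phi node. Values are small integer ids. */
struct Phi {
    int to;       /* value defined by the phi */
    int narg;     /* number of arguments, at most 4 here */
    int arg[4];
};

/* copyof[v] is the value v is currently known to be a copy of
 * (v itself if nothing is known). Returns the common source if all
 * arguments resolve to the same value, or -1 otherwise. */
int
phicopy(const struct Phi *p, const int *copyof)
{
    int n, v;

    assert(p->narg > 0 && p->narg <= 4);
    v = copyof[p->arg[0]];
    for (n = 1; n < p->narg; n++)
        if (copyof[p->arg[n]] != v)
            return -1;
    return v;
}
```

With ids x=0, a=1, b=2, c=3, the phi for %a resolves to x immediately, while the phi for %c resolves to x only once `copyof[2]` has been updated for %b, which mirrors the ordering problem described in the message.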
2022-11-20 | new slot coalescing pass | Quentin Carbonneaux
This pass limits stack usage when many small aggregates are allocated on the stack. A fast liveness analysis figures out which slots interfere and the pass then fuses slots that do not interfere. The pass also kills stack slots that are only ever assigned. On the hare stdlib test suite, this fusion pass managed to reduce the total eligible slot bytes count by 84%. The slots considered for fusion must not escape and not exceed 64 bytes in size.
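As a rough illustration of fusing non-interfering slots, the sketch below uses a greedy strategy over live ranges given as intervals. The interval model, the greedy choice, and all names are assumptions; the actual pass derives interference from its own liveness analysis and may differ substantially. Only the 64-byte limit and the non-escaping requirement come from the message.

```c
#include <stdbool.h>

enum { MaxSize = 64 };   /* size limit quoted in the commit message */

/* Hypothetical summary of a stack slot. */
struct Slot {
    int size;       /* bytes */
    int lo, hi;     /* live range [lo, hi) */
    bool escapes;   /* address may leak, so the slot is not eligible */
    int home;       /* index of the slot this one is fused into */
};

static bool
overlap(const struct Slot *a, const struct Slot *b)
{
    return a->lo < b->hi && b->lo < a->hi;
}

static bool
eligible(const struct Slot *s)
{
    return !s->escapes && s->size <= MaxSize;
}

/* Fuse each eligible slot into the first earlier home whose members
 * it does not interfere with. Returns the total bytes needed after
 * fusion; a home is as large as its largest member. */
int
fuse(struct Slot *s, int n)
{
    int i, h, j, max, total;
    bool ok;

    for (i = 0; i < n; i++) {
        s[i].home = i;
        if (!eligible(&s[i]))
            continue;
        for (h = 0; h < i; h++) {
            if (s[h].home != h || !eligible(&s[h]))
                continue;
            ok = true;
            for (j = h; j < i; j++)
                if (s[j].home == h && overlap(&s[j], &s[i]))
                    ok = false;
            if (ok) {
                s[i].home = h;
                break;
            }
        }
    }
    total = 0;
    for (h = 0; h < n; h++) {
        if (s[h].home != h)
            continue;
        max = 0;
        for (j = h; j < n; j++)
            if (s[j].home == h && s[j].size > max)
                max = s[j].size;
        total += max;
    }
    return total;
}
```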
2022-11-20 | export getalias() | Quentin Carbonneaux
We will be using it in the new coalesce() pass.
2022-11-20 | make multiple calls to fillalias() possible | Quentin Carbonneaux
The asserts (a->type == ABot) made it impossible to run fillalias() multiple times. We now reset the Alias.type field of all temps before starting. Getting rid of the asserts would have been another option.
2022-11-20 | stored bytes in Alias information | Quentin Carbonneaux
Stack slots may have padding bytes, and if we want to have precise liveness information it's important that we are able to tell them apart. This patch extends fillalias() to remember for every slot what bytes were ever assigned. In case the slot address does not escape we know that only these bytes matter. To save space, we only store this information if the slot size is less than or equal to NBit. The Alias struct was reworked a bit to save some space. I am still not very satisfied with its layout though.
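Tracking which bytes of a small slot were ever stored to fits naturally in a bit mask, one bit per byte. The sketch below assumes NBit is 64 and uses hypothetical helper names; it is not the code added to fillalias().

```c
#include <assert.h>
#include <stdint.h>

enum { NBit = 64 };   /* assumed: number of bits in the mask */

/* Mask with the low n bits set, for 0 < n <= NBit. */
static uint64_t
lowbits(int n)
{
    return n == NBit ? ~(uint64_t)0 : ((uint64_t)1 << n) - 1;
}

/* Record that bytes [off, off+len) of a slot were stored to. */
uint64_t
markstore(uint64_t stored, int off, int len)
{
    assert(off >= 0 && len > 0 && off + len <= NBit);
    return stored | (lowbits(len) << off);
}

/* Bytes of a size-byte slot that were never stored to. */
uint64_t
unstored(uint64_t stored, int size)
{
    assert(size > 0 && size <= NBit);
    return lowbits(size) & ~stored;
}
```

For an 8-byte slot holding a char at offset 0 and an int at offset 4, the bytes at offsets 1 to 3 are never stored to and show up as padding.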
2022-11-20 | argc does not leak its address argument | Quentin Carbonneaux
2022-11-20 | make Alias.base an int | Quentin Carbonneaux
We had the invariant that it'd always be a temporary.
2022-11-20 | fill definition site in filluse() | Quentin Carbonneaux
2022-10-12 | thread-local storage for amd64_apple | Quentin Carbonneaux
It is quite similar to arm64_apple. Probably, the call that needs to be generated also provides extra invariants on top of the regular abi, but I have not checked that.

Clang generates code that is a bit neater than qbe's because, on x86, a load can be fused in a call instruction! We do not bother with supporting these since we expect only sporadic use of the feature.

For reference, here is what clang might output for a store to the second entry of a thread-local array of ints:

    movq _x@TLVP(%rip), %rdi
    callq *(%rdi)
    movl %ecx, 4(%rax)
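For orientation, C source of roughly the following shape would lead to such a store. The array length and the function name are assumptions; only the array name `x` and the element index are suggested by the message. It requires C11 for `_Thread_local`.

```c
/* Thread-local array of ints; x[1] is the "second entry", located
 * sizeof(int) bytes past the start of the array. */
_Thread_local int x[4];

void
store2(int v)
{
    x[1] = v;
}
```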
2022-10-12 | thread-local storage for arm64_apple | Quentin Carbonneaux
It is documented nowhere how this is supposed to work. It is also quite easy to have assertion failures pop in the linker when generating asm slightly different from clang's!

The best source of information is found in LLVM's source code (AArch64ISelLowering.cpp). I paste it here for future reference:

    /// Darwin only has one TLS scheme which must be capable of dealing with the
    /// fully general situation, in the worst case. This means:
    ///     + "extern __thread" declaration.
    ///     + Defined in a possibly unknown dynamic library.
    ///
    /// The general system is that each __thread variable has a [3 x i64] descriptor
    /// which contains information used by the runtime to calculate the address. The
    /// only part of this the compiler needs to know about is the first xword, which
    /// contains a function pointer that must be called with the address of the
    /// entire descriptor in "x0".
    ///
    /// Since this descriptor may be in a different unit, in general even the
    /// descriptor must be accessed via an indirect load. The "ideal" code sequence
    /// is:
    ///     adrp x0, _var@TLVPPAGE
    ///     ldr x0, [x0, _var@TLVPPAGEOFF]   ; x0 now contains address of descriptor
    ///     ldr x1, [x0]                     ; x1 contains 1st entry of descriptor,
    ///                                      ; the function pointer
    ///     blr x1                           ; Uses descriptor address in x0
    ///                                      ; Address of _var is now in x0.
    ///
    /// If the address of _var's descriptor *is* known to the linker, then it can
    /// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
    /// a slight efficiency gain.

The call 'blr x1' above is actually special in that it trashes fewer registers than what the abi would normally permit. In qbe, I don't take advantage of this and lower the call like a regular call. We can revise this later on. Again, the source for this information is LLVM's source code:

    // TLS calls preserve all registers except those that absolutely must be
    // trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
    // silly).