Age | Commit message (Collapse) | Author |
|
|
|
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
|
|
To match x86
|
|
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
|
|
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
gobal memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
|
|
It is documented nowhere how this is
supposed to work. It is also quite easy
to have assertion failures pop in the
linker when generating asm slightly
different from clang's!
The best source of information is found
in LLVM's source code (AArch64ISelLowering.cpp).
I paste it here for future reference:
/// Darwin only has one TLS scheme which must be capable of dealing with the
/// fully general situation, in the worst case. This means:
/// + "extern __thread" declaration.
/// + Defined in a possibly unknown dynamic library.
///
/// The general system is that each __thread variable has a [3 x i64] descriptor
/// which contains information used by the runtime to calculate the address. The
/// only part of this the compiler needs to know about is the first xword, which
/// contains a function pointer that must be called with the address of the
/// entire descriptor in "x0".
///
/// Since this descriptor may be in a different unit, in general even the
/// descriptor must be accessed via an indirect load. The "ideal" code sequence
/// is:
/// adrp x0, _var@TLVPPAGE
/// ldr x0, [x0, _var@TLVPPAGEOFF] ; x0 now contains address of descriptor
/// ldr x1, [x0] ; x1 contains 1st entry of descriptor,
/// ; the function pointer
/// blr x1 ; Uses descriptor address in x0
/// ; Address of _var is now in x0.
///
/// If the address of _var's descriptor *is* known to the linker, then it can
/// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
/// a slight efficiency gain.
The call 'blr x1' above is actually
special in that it trashes less registers
than what the abi would normally permit.
In qbe, I don't take advantage of this
and lower the call like a regular call.
We can revise this later on. Again, the
source for this information is LLVM's
source code:
// TLS calls preserve all registers except those that absolutely must be
// trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
// silly).
|
|
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
|
|
|
|
The apple targets are not done yet.
|
|
Should make qbe work on apple
arm-based hardware.
|
|
We have a uint alias that we use
everywhere else. I also added a
todo about unhandled large offsets
in arm64/emit.
|
|
This generates tidier code and is pic
friendly because it lets the linker
trampoline calls to dynlinked libs.
|
|
apple support is more than assembly syntax
in case of arm64 machines, and apple syntax
is currently useless in all cases but amd64;
rather than having a -G option that only
makes sense with amd64, we add a new target
amd64_apple
|
|
The maximum immediate size for 1, 2, 4, and 8 byte loads/stores is
4095, 8190, 16380, and 32760 respectively[0][1][2].
[0] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRB--immediate-
[1] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRH--immediate-
[2] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDR--immediate-
|
|
I also moved some isel logic
that would have been repeated
a third time in util.c.
|
|
|
|
|
|
amd64 lacks instruction for this so it has to be implemented with
float -> signed casts. The approach is borrowed from llvm.
|
|
amd64 lacks an instruction for this so it has to be implemented with
signed -> float casts:
- Word casting is done by zero-extending the word to a long and then doing
a regular signed cast.
- Long casting is done by dividing by two with correct rounding if the
highest bit is set and casting that to float, then adding
1 to mantissa with integer addition
|
|
Necessary for floating-point negation, because
`%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
|
|
When slots are used with a large offset,
the emitter generates invalid assembly
code. That is caught later on by the
assembler, but it prevents compilation
of programs with large stack frames.
When a slot offset is too large to be
expressed as a constant offset to x29
(the frame pointer), emitins() inserts
a late Oaddr instruction to x16 and
replaces the large slot reference with
x16.
This change also gave me the opportunity
to refactor the save/restore logic for
callee-save registers.
This fixes the following Hare issue:
https://todo.sr.ht/~sircmpwn/hare/387
|
|
Fixes #467. It assumes that the stack won't need to grow beyond 2^32 bytes.
If that were to happen, we'd need another or at most two more `movk` instructions.
Signed-off-by: Sudipto Mallick <smlckz@disroot.org>
|
|
If registers spill onto the stack, we may end up with SSA like
S320 =l copy 0
after rega(). Handle this case in arm64 emit().
|
|
|
|
|
|
The immediate in the add instruction is only 12 bits. If the offset
does not fit, we must move it into a register first.
|
|
According to the ARMv8 overview document
However if SP is used as the base register then the value of the stack
pointer prior to adding any offset must be quadword (16 byte) aligned,
or else a stack alignment exception will be generated.
This manifests as a bus error on my system.
To resolve this, just save registers two at a time with stp.
|
|
In this case, the immediate is too large to use directly in the add/sub
instructions, so move it into a temporary register first.
Also, for clarity, rearrange the if-conditions so that they match the
constraints of the instructions that immediately follow.
|
|
|
|
The ldrb and ldrh instructions require a 32-bit register name for the
destination and will clear the upper 32-bits of that register.
|
|
Symbols in the source file are still limited in
length because the rest of the code assumes that
strings always fit in NString bytes.
Regardless, there is already a benefit because
comparing/copying symbol names does not require
using strcmp()/strcpy() anymore.
|
|
|