System V ABI x64
================
This document describes concisely the subset of the x64
ABI as it is implemented in QBE. The subset correctly
handles arbitrary standard C-like structs containing
float and integer types. Structs that have unaligned
members are also supported through "dark" types; see
the IR description document for more information about
them.
:|: ABI Subset Implemented
Data classes of interest as defined by the ABI:
- INTEGER
- SSE
- MEMORY
Classification:
1. The size of each argument is rounded up to a whole
number of eightbytes. (This keeps the stack 8-byte
aligned at all times.)
2. _Bool, char, short, int, long, long long and pointers
are in the INTEGER class. In the context of QBE, it
means that 'l' and 'w' are in the INTEGER class.
3. float and double are in the SSE class. In the context
of QBE, it means that 's' and 'd' are in the SSE class.
4. If the size of an object is larger than two eightbytes
or if it contains unaligned fields, it has class MEMORY.
In the context of QBE, those are big aggregate types
and "dark" types.
5. Otherwise, recursively classify fields and determine
the class of each of the two eightbytes using the
classes of its components. If any component is INTEGER
the eightbyte is INTEGER, otherwise it is SSE.
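Rules 4 and 5 can be sketched in C as follows. This is a
minimal illustration, not QBE's actual code: the names
`struct field`, `classify` and the `CLS_*` constants are
invented for the example, and it assumes the aggregate has
already been flattened into scalar fields whose required
alignment equals their size.

```c
#include <assert.h>
#include <stddef.h>

enum cls { CLS_SSE, CLS_INTEGER };

/* Hypothetical flattened scalar field of an aggregate. */
struct field {
    size_t off;     /* byte offset inside the aggregate */
    size_t size;    /* 1, 2, 4 or 8 bytes */
    int is_float;   /* 1 for 's'/'d', 0 for 'w'/'l' and smaller */
};

/*
 * Returns 1 when the aggregate has class MEMORY.  Otherwise
 * returns 0 and stores the class of each eightbyte in out[];
 * only the first (size + 7) / 8 entries are meaningful.
 */
static int classify(const struct field *f, size_t n,
                    size_t size, enum cls out[2])
{
    size_t i;

    out[0] = out[1] = CLS_SSE;
    if (size > 16)
        return 1;               /* rule 4: more than two eightbytes */
    for (i = 0; i < n; i++) {
        assert(f[i].size > 0 && f[i].off + f[i].size <= size);
        if (f[i].off % f[i].size != 0)
            return 1;           /* rule 4: unaligned field */
        if (!f[i].is_float)
            out[f[i].off / 8] = CLS_INTEGER;    /* rule 5 */
    }
    return 0;
}
```

With this sketch, a struct holding a long followed by a
double classifies as INTEGER then SSE, while a struct holding
a float followed by an int fits in one INTEGER eightbyte.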
Passing:
- Classify arguments in order.
- INTEGER arguments use in order %rdi %rsi %rdx %rcx
%r8 %r9.
- SSE arguments use in order %xmm0 - %xmm7.
- MEMORY arguments are passed on the stack. They are
"pushed" in right-to-left order, so from the callee's
point of view, the left-most argument appears first
on the stack.
- When we run out of registers for an aggregate, revert
the assignment of its first eightbyte and pass the
whole aggregate on the stack.
- When all registers are taken, write arguments on the
stack from right to left.
- When calling a variadic function, %al stores the number
of vector registers used to pass arguments (it must be
an upper bound and does not have to be exact).
- Registers %rbx, %rbp, %r12 - %r15 are callee-save.
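The register assignment above, including the "revert" rule
for aggregates, can be sketched in C as below. This is only
an illustration under stated assumptions: `struct arg`,
`struct alloc` and `assign` are hypothetical names, and
arguments are assumed to be classified already.

```c
#include <assert.h>
#include <stddef.h>

enum cls { CLS_SSE, CLS_INTEGER, CLS_MEMORY };

/* Hypothetical description of one classified argument. */
struct arg {
    int neight;         /* size in eightbytes */
    enum cls cls[2];    /* cls[0] == CLS_MEMORY: always on the stack */
    int on_stack;       /* output: 1 if passed on the stack */
};

/* Registers and stack eightbytes consumed by a call. */
struct alloc { int nint, nsse, nstk; };

static struct alloc assign(struct arg *a, size_t n)
{
    struct alloc r = {0, 0, 0};
    size_t i;
    int e, ni, ns;

    for (i = 0; i < n; i++) {
        assert(a[i].neight >= 1);
        ni = ns = 0;
        if (a[i].cls[0] != CLS_MEMORY) {
            assert(a[i].neight <= 2);
            for (e = 0; e < a[i].neight; e++)
                if (a[i].cls[e] == CLS_INTEGER)
                    ni++;
                else
                    ns++;
        }
        /* Count first, commit only if every eightbyte fits:
         * this is the "revert" rule for aggregates. */
        a[i].on_stack = a[i].cls[0] == CLS_MEMORY
            || r.nint + ni > 6      /* %rdi ... %r9 */
            || r.nsse + ns > 8;     /* %xmm0 ... %xmm7 */
        if (a[i].on_stack) {
            r.nstk += a[i].neight;
        } else {
            r.nint += ni;
            r.nsse += ns;
        }
    }
    return r;
}
```

For a variadic call, the `nsse` count returned here is a
valid value for %al. Note that an argument coming after an
aggregate that was sent to the stack can still take a
remaining register.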
Returning:
- Classify the return type.
- Use %rax and %rdx in order for INTEGER return values.
- Use %xmm0 and %xmm1 in order for SSE return values.
- If the return value's class is MEMORY, the first
argument of the function, %rdi, is a pointer to an
area big enough to hold the return value. The function
writes the return value there and returns that address
(the one that was in %rdi) in %rax.
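For return values that are not MEMORY, the choice of
registers can be sketched as below. The function name
`retregs` and the `CLS_*` constants are hypothetical and
serve only this example; each class draws from its own
register sequence, in order.

```c
#include <assert.h>

enum cls { CLS_SSE, CLS_INTEGER };

/*
 * Store in reg[e] the name of the register holding eightbyte
 * e of a return value made of n eightbytes (n is 1 or 2).
 */
static void retregs(const enum cls *c, int n, const char *reg[2])
{
    static const char *const ireg[] = { "%rax", "%rdx" };
    static const char *const sreg[] = { "%xmm0", "%xmm1" };
    int e, ni = 0, ns = 0;

    assert(n >= 1 && n <= 2);
    for (e = 0; e < n; e++)
        reg[e] = c[e] == CLS_INTEGER ? ireg[ni++] : sreg[ns++];
}
```

A struct whose eightbytes are INTEGER then SSE comes back
in %rax and %xmm0; with the classes swapped, the same two
registers are used but %xmm0 holds the first eightbyte.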
:|: Alignment on the Stack
The ABI is unclear on the alignment requirement of the
stack. What must be ensured is that, right before
executing a 'call' instruction, the stack pointer %rsp
is aligned on 16 bytes. On entry to the called
function, the stack pointer is therefore 8 modulo 16,
because 'call' pushed the 8-byte return address. Since
most functions have a prelude that pushes %rbp, the
frame pointer is again aligned on 16 bytes (== 0 mod 16)
when the body of the function starts executing.
Here is a diagram of the stack layout after a call from
g() to f().
                 |             |
                 | g() locals  |
                 +-------------+
           ^     |             | \
           |     | stack arg 2 |  '
           |     |xxxxxxxxxxxxx|  | f()'s MEMORY
   growing |     +-------------+  | arguments
 addresses |     | stack arg 1 |  ,
           |     |xxxxxxxxxxxxx| /
           |     +-------------+ -> 0 mod 16
           |     |  ret addr   |
                 +-------------+ -> f()'s %rbp
                 | saved %rbp  |
                 +-------------+ -> 0 mod 16
                 | f() locals  |
                 |     ...     |
                       -> %rsp
Legend:
- xxxxx Optional padding.
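The alignment arithmetic of this section can be checked
with a few lines of C. The helper names `after_call` and
`after_prelude` are invented for this sketch, which assumes
the usual prelude 'push %rbp; mov %rsp, %rbp'.

```c
#include <assert.h>
#include <stdint.h>

/* %rsp on entry of the callee: 'call' pushed the return address. */
static uint64_t after_call(uint64_t rsp)
{
    assert(rsp % 16 == 0);  /* required right before 'call' */
    return rsp - 8;
}

/* %rbp after the prelude: 'push %rbp' moved %rsp down by 8. */
static uint64_t after_prelude(uint64_t rsp_on_entry)
{
    return rsp_on_entry - 8;
}
```

Starting from any 16-byte aligned stack pointer, the first
helper yields a value that is 8 modulo 16 and the second
one a value that is 0 modulo 16 again.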
:|: Remarks
- A struct can be returned in registers in one of three
ways. Either %rax, %rdx are used, or %xmm0, %xmm1,
or finally %rax, %xmm0. This should be clear from
the "Returning" section above.
- The size of the arguments area of the stack needs to
be computed first, then arguments are packed starting
from the bottom of the argument area, respecting
alignment constraints. The ABI mentions "pushing"
arguments in right-to-left order, but I think it's a
mistaken view because of the alignment constraints.
Example: If three 8-byte MEMORY arguments are passed
to the callee and the caller's stack pointer is 16-byte
aligned, the layout will be like this.
+-------------+
|xxxxxxxxxxxxx| padding
| stack arg 3 |
| stack arg 2 |
| stack arg 1 |
+-------------+ -> 0 mod 16
The padding must not be at the end of the stack area
(its lowest addresses, where the callee expects the
first stack argument). A "pushing" logic would put it
at the end.
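The "compute the size first, then pack from the bottom"
approach can be sketched as follows. The function `layout`
is a hypothetical helper, and the sketch assumes every
stack argument needs only 8-byte alignment and has a size
already rounded up to eightbytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Store in off[i] the offset of stack argument i from the
 * bottom of the argument area (%rsp right before 'call').
 * Returns the size of the area rounded up to 16 bytes, so
 * any padding lies above the last argument.
 */
static size_t layout(const size_t *size, size_t n, size_t *off)
{
    size_t i, total = 0;

    for (i = 0; i < n; i++) {
        assert(size[i] % 8 == 0);
        off[i] = total;     /* argument 1 sits at the bottom */
        total += size[i];
    }
    return (total + 15) & ~(size_t)15;
}
```

For the three 8-byte arguments of the example, the area is
32 bytes, the arguments sit at offsets 0, 8 and 16, and the
top 8 bytes are padding.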