1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
|
===========================
QBE Intermediate Language
===========================
- Table of Contents
-------------------
1. <@ Basic Concepts >
* <@ Input Files >
* <@ BNF Syntax >
* <@ Sigils >
2. <@ Types >
* <@ Simple Types >
* <@ Subtyping >
3. <@ Definitions >
* <@ Aggregate Types >
* <@ Data >
* <@ Functions >
4. <@ Control >
* <@ Blocks >
* <@ Instructions >
* <@ Jumps >
5. Immediate Constants
* Semantics
* Floating Sugar
6. Integer Instructions
* Arithmetic
* Memory
* Comparisons
7. Floating Point Instructions
* Arithmetic
* Memory
* Comparisons
8. Special Instructions
* Conversions and Extensions
* Casts
* Phi
- 1. Basic Concepts
-------------------
The intermediate language (IL) is a higher-level language
than the machine's assembly language. It smoothes most
of the irregularities of the underlying hardware and
allows an infinite number of temporaries to be used.
This higher abstraction level allows frontend programmers
to focus on language design issues.
~ Input Files
~~~~~~~~~~~~~
The intermediate language is provided to QBE as text files.
Usually, one file is generated per each compilation unit of
the frontend input language. An IL file is a sequence of
<@ Definitions > for data, functions, and types. Once
processed by QBE, the resulting file can be assembled and
linked using a standard toolchain (e.g. GNU binutils).
Here is a complete "Hello World" IL file, it defines a
function that prints to the screen. Since the string is
not a first class object (only the pointer is) it is
defined outside the function's body.
# Define the string constant.
data $str = { b "hello world", b 0 }
function w $main() {
@start
# Call the puts function with $str as argument.
%r =w call $puts(l $str)
ret 0
}
If you have read the LLVM language reference, you might
recognize the above example. In comparison, QBE makes a
much lighter use of types and the syntax is terser.
~ BNF Syntax
~~~~~~~~~~~~
The language syntax is vaporously described in the sections
below using BNF syntax. The different BNF constructs used
are described in the following list.
* Keywords are enclosed between quotes;
* `... | ...` expresses disjunctions;
* `[ ... ]` marks some syntax as optional;
* `( ... ),` designates a comma-separated list of the
enclosed syntax;
* `...*` and `...+` as used for arbitrary and
at-least-once repetition.
~ Sigils
~~~~~~~~
The intermediate language makes heavy use of sigils, all
user-defined names are prefixed with a sigil. This is
to avoid keyword conflicts, and also to quickly spot the
scope and kind of an identifier.
* `:` is for user-defined <@ Aggregate Types>
* `$` is for globals (represented by a pointer)
* `%` is for function-scope temporaries
* `@` is for function labels
In BNF syntax, we use `?IDENT` to designate an identifier
starting with the sigil `?`.
- 2. Types
----------
~ Simple Types
~~~~~~~~~~~~~~
`bnf
BASETY := 'w' | 'l' | 's' | 'd' # Base types
EXTTY := BASETY | 'h' | 'b' # Extended types
We makes very minimal use of types. The types used are only
what is necessary for unambiguous compilation to machine
code and C interfacing.
The four base types are `w` (word), `l` (long), `s` (single),
and `d` (double), they stand respectively for 32 bits and
64 bits integers, and 32 bits and 64 bits floating points.
Temporaries in the IL can only have a basic type.
Extended types contain base types and add `h` (half word)
and `b` (byte), respectively for 16 bits and 8 bits integers.
They are used in <@ Aggregate Types> and <@ Data> definitions.
The IL also provides user-defined aggregate types, these are
used for C interfacing. The syntax used to designate them is
`:foo`. Details about their definition are given in the
<@ Aggregate Types > section.
~ Subtyping
~~~~~~~~~~~
The IL has a minimal subtyping feature for integer types.
Any value of type `l` can be used in a `w` context. When that
happens only the 32 least significant bits of the word value
are used.
Note that a long value must not be used in word context.
The rationale is that the 32 high bits of the extended long
value could very well be zeroes or the result of a sign
extension of the word.
- 3. Definitions
----------------
Definitions are the essential components of an IL file.
They can define three types of objects: Aggregate types,
data, and functions. Aggregate types are never exported
and do not compile to any code. Data and function
definitions have file scope and are mutually recursive
(even across IL files). Their visibility can be controlled
using the `export` keyword.
~ Aggregate Types
~~~~~~~~~~~~~~~~~
`bnf
TYPEDEF :=
# Regular type
'type' :IDENT '=' [ 'align' NUMBER ]
'{'
( EXTTY [ NUMBER ] ),
'}'
| # Opaque type
'type' :IDENT '=' 'align' NUMBER '{' NUMBER '}'
Aggregate type definitions start with the `type` keyword.
They have file scope but types must be defined before their
first use. The inner structure of a type is expressed by a
comma separated list of <@ Simple Types> enclosed in curly
braces.
type :fourfloats = { s, s, d, d }
For ease of generation, a trailing comma is tolerated by
the parser. In case many items of the same type are
sequenced (like in a C array), the sorter array syntax
can be used.
type :abyteandmanywords = { b, w 100 }
By default, the alignment of an aggregate type is the
maximum alignment of its members. The alignment can be
explicitely specified by the programmer
Opaque types are used when the inner structure of an
aggregate cannot be specified, the alignment for opaque
types is mandatory. They are defined by simply enclosing
their size between curly braces.
type :opaque = align 16 { 32 }
~ Data
~~~~~~
`bnf
DATADEF :=
['export'] 'data' $IDENT '='
'{'
( EXTTY DATAITEM+
| 'z' NUMBER ),
'}'
DATAITEM :=
$IDENT [ '+' NUMBER ] # Symbol and offset
| '"' ... '"' # String
| IMMEDIATE # Immediate
Data definitions define objects that will be emitted in the
compiled file. They can be local to the file or exported
with global visibility to the whole program.
They define a global identifier (starting with the sigil
`$`), that will contain a pointer to the object specified
by the definition.
Objects are described by a sequence of fields that start with
a type letter. This letter can either be an extended type,
or the `z` letter. If the letter used is an extended type,
the data item following specifies the bits to be stored in
the field. When several data items follow a letter, they
initialize multiple fields of the same size.
The members of a struct will be packed. This means that
padding has to be emitted by the frontend when necessary.
Alignment of the whole data objects can be manually specified,
and when no alignment is provided, the maximum alignment of
the platform is used.
When the `z` letter is used the number following indicates
the size of the field, the contents of the field are zero
initialized. It can be used to add padding between fields
or zero-initialize big arrays.
Here are various examples of data definitions.
# Three 32 bits values 1, 2, and 3
# followed by a 0 byte.
data $a = { w 1 2 3, b 0 }
# A thousand bytes 0 initialized.
data $b = { z 1000 }
# An object containing two 64 bits
# fields, one with all bits sets and the
# other containing a pointer to the
# object itself.
data $c = { l -1, l $c }
~ Functions
~~~~~~~~~~~
`bnf
FUNCDEF :=
['export'] 'function' [BASETY | :IDENT] $IDENT PARAMS
'{'
BLOCK+
'}'
PARAMS := '(' ( (BASETY | :IDENT) %IDENT ), ')'
Function definitions contain the actual code to emit in
the compiled file. They define a global symbol that
contains a pointer to the function code. This pointer
can be used in call instructions or stored in memory.
Since global symbols are defined mutually recursive,
there is no need for function declarations: A function
can be referenced before its definition.
Similarly, functions from other modules can be used
without previous declarations. All the type information
is provided in the call instructions.
The syntax and semantics for the body of functions
is described in the <@ Control > section.
- Control
---------
The IL represents programs as textual transcriptions of
control flow graphs. A program is given as a sequence
of blocks of straight-line instructions ending with a
jump to other blocks or returning from the function.
Because QBE uses static single assignment, blocks
can start with a sequence of <@ Phi > instructions.
~ Blocks
~~~~~~~~
`bnf
BLOCK :=
@IDENT # Block label
PHI* # Phi instructions
INST* # Regular instructions
JUMP # Jump or return
~ Instructions
~~~~~~~~~~~~~~
~ Jumps
~~~~~~~
|