From 51c46ba6914741cbca54d3351f8cf8d2689fd3dc Mon Sep 17 00:00:00 2001
From: Quentin Rameau
Date: Wed, 19 Apr 2017 10:57:08 +0200
Subject: Small corrections in documentation

---
 doc/il.txt | 109 ++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 54 insertions(+), 55 deletions(-)

(limited to 'doc')

diff --git a/doc/il.txt b/doc/il.txt
index 7ebaf64..d816732 100644
--- a/doc/il.txt
+++ b/doc/il.txt
@@ -46,18 +46,18 @@ to focus on language design issues.
 ~ Input Files
 ~~~~~~~~~~~~~

-The intermediate language is provided to QBE as text files.
-Usually, one file is generated per each compilation unit of
+The intermediate language is provided to QBE as text.
+Usually, one file is generated per each compilation unit from
 the frontend input language. An IL file is a sequence of
 <@ Definitions > for data, functions, and types. Once
 processed by QBE, the resulting file can be assembled and
 linked using a standard toolchain (e.g., GNU binutils).

-Here is a complete "Hello World" IL file, it defines a
+Here is a complete "Hello World" IL file which defines a
 function that prints to the screen. Since the string is
 not a first class object (only the pointer is) it is
 defined outside the function's body. Comments start with
-a # character and run until the end of the line.
+a # character and finish with the end of the line.

     # Define the string constant.
     data $str = { b "hello world", b 0 }
@@ -70,7 +70,7 @@ a # character and run until the end of the line.
     }

 If you have read the LLVM language reference, you might
-recognize the above example. In comparison, QBE makes a
+recognize the example above. In comparison, QBE makes a
 much lighter use of types and the syntax is terser.

 ~ BNF Notation
@@ -86,7 +86,7 @@ are listed below.
   * `( ... ),` designates a comma-separated list of the
     enclosed syntax;
   * `...*` and `...+` are used for arbitrary and
-    at-least-once repetition.
+    at-least-once repetition respectively.

 ~ Sigils
 ~~~~~~~~
@@ -94,14 +94,14 @@ are listed below.
 The intermediate language makes heavy use of sigils, all
 user-defined names are prefixed with a sigil. This is
 to avoid keyword conflicts, and also to quickly spot the
-scope and kind of an identifier.
+scope and nature of identifiers.

   * `:` is for user-defined <@ Aggregate Types>
   * `$` is for globals (represented by a pointer)
   * `%` is for function-scope temporaries
   * `@` is for block labels

-In BNF syntax, we use `?IDENT` to designate an identifier
+In this BNF syntax, we use `?IDENT` to designate an identifier
 starting with the sigil `?`.

 - 2. Types
@@ -114,7 +114,7 @@ starting with the sigil `?`.
     BASETY := 'w' | 'l' | 's' | 'd' # Base types
     EXTTY := BASETY | 'b' | 'h' # Extended types

-The IL makes very minimal use of types. By design, the types
+The IL makes minimal use of types. By design, the types
 used are restricted to what is necessary for unambiguous
 compilation to machine code and C interfacing. Unlike LLVM,
 QBE is not using types as a means to safety; they are only
@@ -140,16 +140,16 @@ section.
 ~ Subtyping
 ~~~~~~~~~~~

-The IL has a minimal subtyping feature for integer types.
+The IL has a minimal subtyping feature, for integer types only.
 Any value of type `l` can be used in a `w` context. In that
 case, only the 32 least significant bits of the word value
 are used.

-Make note that it is the inverse of the usual subtyping on
+Make note that it is the opposite of the usual subtyping on
 integers (in C, we can safely use an `int` where a `long`
 is expected). A long value cannot be used in word context.
 The rationale is that a word can be signed or unsigned, so
-extending it to a long can be done in two ways, either
+extending it to a long could be done in two ways, either
 by zero-extension, or by sign-extension.

 - 3. Constants
@@ -184,9 +184,9 @@ operand of the subtraction is a word (32-bit) context.

 Because specifying floating-point constants by their bits
 makes the code less readable, syntactic sugar is provided
-to express them. Standard scientific notation is used with
-a prefix of `s_` for single and `d_` for double-precision
-numbers. Once again, the following example defines twice
+to express them. Standard scientific notation is prefixed
+with `s_` and `d_` for single and double precision numbers
+respectively. Once again, the following example defines twice
 the same double-precision constant.

     %x =d add d_0, d_-1
@@ -200,7 +200,7 @@ constants by the linker.
 ----------------

 Definitions are the essential components of an IL file.
-They can define three types of objects: Aggregate types,
+They can define three types of objects: aggregate types,
 data, and functions. Aggregate types are never exported
 and do not compile to any code. Data and function
 definitions have file scope and are mutually recursive
@@ -221,14 +221,14 @@ using the `export` keyword.
         'type' :IDENT '=' 'align' NUMBER '{' NUMBER '}'

 Aggregate type definitions start with the `type` keyword.
-They have file scope, but types must be defined before their
-first use. The inner structure of a type is expressed by a
+They have file scope, but types must be defined before being
+referenced. The inner structure of a type is expressed by a
 comma-separated list of <@ Simple Types> enclosed in curly
 braces.

     type :fourfloats = { s, s, d, d }

-For ease of generation, a trailing comma is tolerated by
+For ease of IL generation, a trailing comma is tolerated by
 the parser. In case many items of the same type are
 sequenced (like in a C array), the shorter array syntax
 can be used.
@@ -243,7 +243,7 @@ explicitly specified by the programmer.

 Opaque types are used when the inner structure of an
 aggregate cannot be specified; the alignment for opaque
-types is mandatory. They are defined by simply enclosing
+types is mandatory. They are defined simply by enclosing
 their size between curly braces.

     type :opaque = align 16 { 32 }
@@ -264,7 +264,7 @@ their size between curly braces.
       | '"' ... '"' # String
       | CONST # Constant

-Data definitions define objects that will be emitted in the
+Data definitions express objects that will be emitted in the
 compiled file. They can be local to the file or exported
 with global visibility to the whole program.

@@ -282,11 +282,11 @@ initialize multiple fields of the same size.
 The members of a struct will be packed. This means that
 padding has to be emitted by the frontend when necessary.
 Alignment of the whole data objects can be manually specified,
-and when no alignment is provided, the maximum alignment of
+and when no alignment is provided, the maximum alignment from
 the platform is used.

 When the `z` letter is used the number following indicates
-the size of the field, the contents of the field are zero
+the size of the field; the contents of the field are zero
 initialized. It can be used to add padding between fields
 or zero-initialize big arrays.

@@ -325,19 +325,18 @@ Here are various examples of data definitions.
 Function definitions contain the actual code to emit in
 the compiled file. They define a global symbol that
 contains a pointer to the function code. This pointer
-can be used in call instructions or stored in memory.
+can be used in `call` instructions or stored in memory.

 The type given right before the function name is the
 return type of the function. All return values of this
-function must have the return type. If the return
+function must have this return type. If the return
 type is missing, the function cannot return any value.

 The parameter list is a comma separated list of
 temporary names prefixed by types. The types are used
 to correctly implement C compatibility. When an argument
-has an aggregate type, is is set on entry of the
-function to a pointer to the aggregate passed by the
-caller. In the example below, we have to use a load
+has an aggregate type, a pointer to the aggregate is passed
+by the caller. In the example below, we have to use a load
 instruction to get the value of the first (and only)
 member of the struct.

@@ -350,7 +349,7 @@ member of the struct.
     }

 If the parameter list ends with `...`, the function is
-a variadic function: It can accept a variable number of
+a variadic function: it can accept a variable number of
 arguments. To access the extra arguments provided by
 the caller, use the `vastart` and `vaarg` instructions
 described in the <@ Variadic > section.
@@ -375,10 +374,10 @@ very good compatibility with C. The <@ Call > section
 explains how to pass an environment parameter.

 Since global symbols are defined mutually recursive,
-there is no need for function declarations: A function
+there is no need for function declarations: a function
 can be referenced before its definition.
 Similarly, functions from other modules can be used
-without previous declarations. All the type information
+without previous declaration. All the type information
 is provided in the call instructions.

 The syntax and semantics for the body of functions
@@ -389,8 +388,8 @@ are described in the <@ Control > section.
 The IL represents programs as textual transcriptions of
 control flow graphs. The control flow is serialized as
-a sequence of blocks of straight-line code and connected
-using jump instructions.
+a sequence of blocks of straight-line code which are
+connected using jump instructions.

 ~ Blocks
 ~~~~~~~~

@@ -406,12 +405,12 @@ All blocks have a name that is specified by a label at
 their beginning. Then follows a sequence of instructions
 that have "fall-through" flow. Finally one jump terminates
 the block. The jump can either transfer control to another
-block of the same function or return, they are described
+block of the same function or return; they are described
 further below.

 The first block in a function must not be the target of
-any jump in the program. If this need is encountered,
-the frontend can always insert an empty prelude block
+any jump in the program. If this is really needed,
+the frontend could insert an empty prelude block
 at the beginning of the function.

 When one block jumps to the next block in the IL file,
@@ -453,7 +452,7 @@ the following list.

     When its word argument is non-zero, it jumps to its
     first label argument; otherwise it jumps to the other
-    label. The argument must be of word type, because of
+    label. The argument must be of word type; because of
     subtyping a long argument can be passed, but only its
     least significant 32 bits will be compared to 0.

@@ -461,7 +460,7 @@ the following list.

     Terminates the execution of the current function,
     optionally returning a value to the caller. The value
-    returned must have the type given in the function
+    returned must be of the type given in the function
     prototype. If the function prototype does not specify
     a return type, no return value can be used.

@@ -498,12 +497,12 @@ This is made explicit by the instruction suffix.
 The types of instructions are described below using a short
 type string. A type string specifies all the valid return
 types an instruction can have, its arity, and the type of
-its arguments in function of its return type.
+its arguments depending on its return type.

 Type strings begin with acceptable return types, then
 follows, in parentheses, the possible types for the arguments.
-If the n-th return type of the type string is used for an
-instruction, the arguments must use the n-th type listed for
+If the N-th return type of the type string is used for an
+instruction, the arguments must use the N-th type listed for
 them in the type string. When an instruction does not have
 a return type, the type string only contains the types of
 the arguments.
@@ -513,7 +512,7 @@ The following abbreviations are used.
   * `T` stands for `wlsd`
   * `I` stands for `wl`
   * `F` stands for `sd`
-  * `m` stands for the type of pointers on the target, on
+  * `m` stands for the type of pointers on the target; on
     64-bit architectures it is the same as `l`

 For example, consider the type string `wl(F)`, it mentions
@@ -540,7 +539,7 @@ towards zero.
 The signed and unsigned remainder operations are available as
 `rem` and `urem`. The sign of the remainder is the same
 as the one of the dividend. Its magnitude is smaller than
-the divisor's. These two instructions and `udiv` are only
+the divisor one. These two instructions and `udiv` are only
 available with integer arguments and result.

 Bitwise OR, AND, and XOR operations are available for both
@@ -548,8 +547,8 @@ integer types. Logical operations of typical programming
 languages can be implemented using <@ Comparisons > and
 <@ Jumps >.

-Shift instructions `sar`, `shr`, and `shl` shift right or
-left their first operand by the amount in the second
+Shift instructions `sar`, `shr`, and `shl`, shift right or
+left their first operand by the amount from the second
 operand. The shifting amount is taken modulo the size of
 the result type. Shifting right can either preserve the
 sign of the value (using `sar`), or fill the newly freed
@@ -591,8 +590,8 @@ towards zero.
   * `loadsb`, `loadub` -- `I(mm)`

     For types smaller than long, two variants of the load
-    instruction is available: one will sign extend the value
-    loaded, while the other will zero extend it. Remark that
+    instruction are available: one will sign extend the loaded
+    value, while the other will zero extend it. Note that
     all loads smaller than long can load to either a long or
     a word.

@@ -635,9 +634,9 @@ instructions. Pointers are stored in long temporaries.
 ~~~~~~~~~~~~~

 Comparison instructions return an integer value (either a word
-or a long), and compare values of arbitrary types. The value
-returned is 1 if the two operands satisfy the comparison
-relation, and 0 otherwise. The names of comparisons respect
+or a long), and compare values of arbitrary types. The returned
+value is 1 if the two operands satisfy the comparison
+relation, or 0 otherwise. The names of comparisons respect
 a standard naming scheme in three parts.

   1. All comparisons start with the letter `c`.
@@ -676,7 +675,7 @@ a standard naming scheme in three parts.

 For example, `cod` (`I(dd,dd)`) compares two double-precision
 floating point numbers and returns 1 if the two floating points
-are not NaNs, and 0 otherwise. The `csltw` (`I(ww,ww)`)
+are not NaNs, or 0 otherwise. The `csltw` (`I(ww,ww)`)
 instruction compares two words representing signed numbers and
 returns 1 when the first argument is smaller than the second one.

@@ -727,7 +726,7 @@ instruction to lower the precision of an integer temporary.
 ~~~~~~~~~~~~~~~

 The `cast` and `copy` instructions return the bits of their
-argument verbatim. A `cast` will however change an integer
+argument verbatim. However a `cast` will change an integer
 into a floating point of the same width and vice versa.

   * `cast` -- `wlsd(sdwl)`
@@ -755,7 +754,7 @@ single-precision floating point number `%f` into `%rs`.

     ABITY := BASETY | :IDENT

-The call instruction is special in many ways. It is not
+The call instruction is special in several ways. It is not
 a three-address instruction and requires the type of all
 its arguments to be given. Also, the return type can be
 either a base type or an aggregate type. These specifics
@@ -801,7 +800,7 @@ is essentially effectful: calling it twice in a row will
 return two consecutive arguments from the argument list.

 Both instructions take a pointer to a variable argument
-list as only argument. The size and alignment of variable
+list as sole argument. The size and alignment of variable
 argument lists depend on the target used. However, it
 is possible to conservatively use the maximum size and
 alignment required by all the targets.
@@ -890,7 +889,7 @@ translate it in SSA form is to insert a phi instruction.

 Phi instructions return one of their arguments depending
 on where the control came from. In the example, `%y` is
-set to 1 if the `@ift` branch is taken, and it is set to
+set to 1 if the `@ift` branch is taken, or it is set to
 2 otherwise.

 An important remark about phi instructions is that QBE
-- 
cgit 1.4.1