Type checking - An Introduction to Compilers - Free Computer, Programming, Mathematics, Technic

Informally, the so-called static semantics of a programming language describes the properties of a program that can be verified without actually executing it. Typ- ically, for statically-types languages, this consists of verifying the “well-typed” property where a program is well-typed if all its constructs (statements, expressions, . . . ) obey the type rules of the language.

Thetype systemof a programming language consists of • A definition of the set of all possible types.

• Type rules that determine the type of a language construct.

The language from Appendix C, Minic, has two primitive types: integers and floats and a few type constructors, as shown by the following domain2_definition3_:

type :: int :: float

:: array(type) :: struct(tenv) :: type∗ →type tenv :: name →type

Note that the above definition is more general than is actually needed in Minic, which has no higher order functions. E.g. it is not possible to define a function with type

int→(int→int)

in Minic.

The type rules of Minic can be symbolically represented as follows (we usex:tto denote thatxhas typet). In the table below we use+to use any of the arithmetic operators+,×, /,−.

t x; ⇒ x:t // declaration ofx x:int ∧ y:int ⇒ (x+y) :int

x:float ∧ y:float ⇒ (x+y) :float

x:t ∧ y:t ⇒ (x==y) :int // support = test on any type

x:array(t) ∧ i:int ⇒ x[i] :t

x:struct(s) ∧ n ∈dom(s) ⇒ x.n:s(n)

x=e ∧ x:t ∧ e:t0 ∧ t6=t0 ⇒ error

∀1≤i≤n·xi :ti ∧ f :t1t2. . . tn →t ⇒ f(x1, . . . , xn) :t

returnx ∧ x:t0 ∧ f :t1t2. . . tn →t ∧ t0 6=t ⇒ error // within the body off

Here error is used to indicate that a certain construct is not well-typed, i.e. no type can be assigned to it without violating the type rules.

2_{See e.g. the “Foundations of Computer Science II” course.}

Type checking then consists of using the rules to assign a type to each expression, variable etc. If no type, or more than one type, can be assigned to a construct, or if the rules generateerror, the program is not well-typed.

It should be obvious that with the help of attributes, semantic actions and a symbol table, such type checking is possible. An example of a type-checking parser for Minic can be found in Appendix C. In the example implementation, types are represented by structures where each structure contains a tag corresponding to the type constructor. Furthermore, types are shared, which saves space and facilitates later checking for equality (only pointers must be compared).

Note that there are statically typed languages (e.g. ML or Haskell) that do not require, as Minic does, that each variable is declared to have a specific type. In- stead, the type of a variable is inferred from its use. In addition, such languages usually have a more sophisticated type system where types can have variables that may be instantiated. In this way, a language can allow for polymorphism where a function has several (possibly an infinite number of) types.

A similar feature is supported by C++ through the template facility. E.g. a function

1 template <class T>

2 int length(list<T>) {

3 // ...

4 }

has a typelist(α)→intwhereαis a type variable. The compiler will instantiate

αtointwhen processing the expressionlength(l)in

1 list<int> l;

Intermediate code generation

While it is possible to construct a compiler that translates the source text directly into the target language, especially if sophisticated optimization is not required, there are important advantages of using an intermediate representation:

• It facilitates retargeting. A compiler employing an intermediate representation can be viewed as consisting of a “front end” (everything up to intermediate code generation1) and a “back end” (optimization, code generation). Note that all dependencies on the source language are located in the front end while dependencies on the target language are confined to the back end. Thus, to retarget the compiler (i.e. to generate code for another processor), it suffices to adapt the back end only. Similarly, if the intermediate code is sufficiently general, the back end could be reused in a compiler for another language.

• Many machine-independent optimizations, especially those that require code movement etc., can be done more easily on intermediate code.

Several forms of intermediate code have been proposed (and used): • Postfix notation.

• Abstract syntax trees. • Three-address code.

1_{Any machine-independent optimization that can be performed on the intermediate code} should also be done in the front end.

5.1 Postfix notation

Postfix notation, also called “postfix Polish” after the originator J. Lukasiewicz, represents expressions and flow of control statements in postfix form, which is free of parentheses, as in the following example:

infix postfix

a+b ab+

(a+b)×(c−d) ab+cd− ×

In general, an expression consisting of ak-ary operatorθ, applied onk(sub)expressions

e1, . . . , ekis represented as

e0₁e0₂. . . e0_kθ

wheree0_i,1≤i≤k, is the postfix form ofei.

Evaluation of postfix expressions is trivial, provided one uses an operand stack, as shown below.

1 operand

2 eval_operator(OPERATOR o,item args[]); 3

4 operand

5 eval_postfix(item code[],int code_len) 6 {

7 operand stack[MAX_STACK];

8 int tos = 0; // top of stack

10 for (int i=0;(i<code_len);++i)

11 if (is_operand(code[i]))

12 stack[tos++] = code[i]; // push on stack

13 else

14 { // operator: pop args, eval, push result

15 k = arity(code[i]);

16 stack[tos-k] = eval_operator(code[i].oprtr, stack-k);

17 tos -= (k-1);

18 }

19 return stack[0]; 20 }

By introducing labels which refer to a position in the postfix expression, arbitrary control structures can be implemented using instructions like

label jump e1e2 label jumplt

In order to interpret labels and jump instructions, the above algorithm must of course be extended to use a program counter (an index in thecodearray), Postfix notation is attractive because it can be easily generated using syntax- directed translation. Below, we use an synthesized attribute code representing the postfix notation of the corresponding symbol occurrence. The statements associated with a rule define the attribute of the left hand side of the rule (denoted

$$), based on the attribute values of the symbols in its right hand side ($irefers to thei’th symbol in the rules right hand side).

1 exp : exp binop exp {

2 $$.code = concat($1.code, $3.code, operator($2));

3 }

4 exp : ( exp) {

5 $$.code = $1.code;

6 }

7 exp : var { $$.code = $1; /* probably a symtab ref */ }

This scheme has the so-calledsimple postfixproperty which means that for a pro- duction

A→A1A2. . . An

the generated code is such that2

code(A) = code(A1) +. . .+code(An) +tail

which implies that code can be emitted as soon as a reduction occurs.

Postfix code is often used for stack-oriented virtual machines without general pur- pose registers, such as the Java virtual machine (see e.g. the micro compiler in Appendix B. It is not particularly suitable for optimization (code movement is hard).

In document An Introduction to Compilers - Free Computer, Programming, Mathematics, Technical Books, Lecture Notes and Tutorials (Page 72-77)