
Compiler

This chapter presents a first application of programming language semantics: proving compiler correctness. To this end, we will define a small machine language based on a simple stack machine. Stack machines are common low-level intermediate languages; the Java Virtual Machine is one example. We then write a compiler from IMP to this language and prove that the compiled program has the same semantics as the source program. The compiler will perform a very simple standard optimization for boolean expressions, but is otherwise non-optimizing.

As in the other chapters, the emphasis here is on showing the structure and main setup of such a proof. Our compiler proof shows the core of the argument, but compared to real compilers we make drastic simplifications: our target language is comparatively high-level, we do not consider optimizations, we ignore the compiler front-end, and our source language does not contain any concepts that are particularly hard to translate into machine language.

8.1 Instructions and Stack Machine


We begin by defining the instruction set architecture and semantics of our stack machine. We have already seen a very simple stack machine language in Section 3.3. In this section, we extend this language with memory writes and jump instructions.

Working with proofs on the machine language, we will find it convenient for the program counter to admit negative values, i.e., to be of type int instead of the initially more intuitive nat. The effect of this choice is that various decomposition lemmas about machine executions have nicer algebraic properties and fewer preconditions than their nat counterparts. Such effects are usually discovered during the proof.

As in Section 3.3, our machine language models programs as lists of instructions. Our int program counter will need to index into these lists. Isabelle comes with a predefined list index operator nth, but it works on nat. Instead of constantly converting between int and nat and dealing with the arising side conditions in proofs, we define our own int version of nth, i.e., for i :: int:

(x # xs) !! i = (if i = 0 then x else xs !! (i − 1))

However, we still need the conversion int :: nat ⇒ int because the length of a list is of type nat. To reduce clutter we introduce the abbreviation

size xs ≡ int (length xs)
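To make the int-indexed lookup concrete, here is a minimal Python sketch, assuming Python lists stand in for Isabelle lists. The names nth_int and size are our own. The defining equation above says nothing about [] !! i, so the sketch raises an IndexError there; negative indices end up in the same case, because decrementing them never reaches 0.

```python
def nth_int(xs: list, i: int):
    """Int-indexed lookup: (x # xs) !! i = (if i = 0 then x else xs !! (i - 1))."""
    # Iterative unrolling of the recursive equation above.
    while xs:
        if i == 0:
            return xs[0]
        xs, i = xs[1:], i - 1
    # [] !! i is not specified by the equation; negative i also ends up here,
    # because decrementing a negative index never reaches 0.
    raise IndexError("!! applied to an index outside the list")


def size(xs: list) -> int:
    """size xs = int (length xs); Python's len already returns an int."""
    return len(xs)


print(nth_int(["a", "b", "c"], 2))  # c
print(size(["a", "b", "c"]))        # 3
```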

The !! operator distributes over @ in the expected way:

Lemma 8.1. If 0 ≤ i,

(xs @ ys) !! i = (if i < size xs then xs !! i else ys !! (i − size xs))

We are now ready to define the machine itself. To keep things simple, we directly reuse the concepts of values and variable names from the source language. In a more realistic setting, we would explicitly map variable names to memory locations instead of using strings as addresses. We skip this step here for clarity; adding it does not pose any fundamental difficulties.

The instructions in our machine are the following. The first three are familiar from Section 3.3:

datatype instr =
  LOADI int | LOAD vname | ADD | STORE vname | JMP int | JMPLESS int | JMPGE int

The instruction LOADI pushes an immediate value onto the stack, LOAD pushes the value of a variable, ADD replaces the two topmost stack values by their sum, STORE pops the top of the stack and stores it into memory, JMP jumps by an offset relative to the following instruction, JMPLESS pops and compares the two topmost stack elements and jumps if the second one (the element below the top) is less than the top one, and finally JMPGE pops and compares them and jumps if the second one is greater than or equal to the top one.
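The description above can be summarized by how many values each instruction pops and pushes. The sketch below is our own illustration, not part of the formalization; STACK_EFFECT and straight_line_height are hypothetical names. Note that JMPLESS and JMPGE pop both operands whether or not the jump is taken, which matches the tl2 stk in both branches of iexec.

```python
# Stack effect (values popped, values pushed) per instruction.
# Operands such as jump offsets or variable names do not change the
# stack effect, so only the instruction name is recorded.
STACK_EFFECT = {
    "LOADI": (0, 1),
    "LOAD": (0, 1),
    "ADD": (2, 1),
    "STORE": (1, 0),
    "JMP": (0, 0),
    "JMPLESS": (2, 0),  # pops both operands, jump taken or not
    "JMPGE": (2, 0),    # likewise
}


def straight_line_height(opcodes: list, start: int = 0) -> int:
    """Final stack height after executing opcodes in order, ignoring where
    jumps would go. Raises ValueError if an instruction would pop more
    values than the stack holds."""
    height = start
    for op in opcodes:
        pop, push = STACK_EFFECT[op]
        if height < pop:
            raise ValueError(f"{op} needs {pop} values, stack has {height}")
        height += push - pop
    return height


# x := y + 1 as straight-line code: LOAD y; LOADI 1; ADD; STORE x
print(straight_line_height(["LOAD", "LOADI", "ADD", "STORE"]))  # 0
```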

These few instructions are enough to compile IMP programs. A real machine would have significantly more arithmetic and comparison operators, different addressing modes that are useful for implementing procedure stacks and pointers, potentially a number of primitive data types that the machine understands, and a number of instructions to deal with hardware features such as the memory management subsystem that we ignore in this formalization.

As in the source language, we proceed by defining the state such programs operate on, followed by the definition of the semantics itself.

Program configurations consist of an int program counter, a memory state for which we reuse the type state from the source language, and a stack which we model as a list of values:

type_synonym stack = val list

type_synonym config = int × state × stack
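In Python, a configuration can be modelled as a triple, as in the following sketch. Two representation choices are our own assumptions: the state, a total function vname ⇒ val in the text, is approximated by a dict in which missing variables read as 0, and the top of the stack is stored at index 0 so that it plays the role of the list head. The helpers lookup and update, mirroring s x and s(x := v), are hypothetical names.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]              # memory: variable name -> value
Stack = List[int]                   # stk[0] is the top, like hd stk
Config = Tuple[int, State, Stack]   # (program counter, state, stack)


def lookup(s: State, x: str) -> int:
    """s x; assumption: unset variables read as 0."""
    return s.get(x, 0)


def update(s: State, x: str, v: int) -> State:
    """s(x := v): a new state; the argument s is left unchanged."""
    return {**s, x: v}


c: Config = (0, {"x": 3, "y": 4}, [])
pc, s, stk = c
t = update(s, "x", lookup(s, "y"))
print(s, t)  # {'x': 3, 'y': 4} {'x': 4, 'y': 4}
```

Returning a fresh dict from update keeps configurations immutable in spirit, which matches the functional reading of s(x := v) in the semantics below.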

We now define the semantics of machine execution. Similarly to the small-step semantics of the source language, we do so in multiple levels: first, we define what effect a single instruction has on a configuration, then we define how an instruction is selected from the program, and finally we take the reflexive transitive closure to get full machine program executions.

We encode the behaviour of single instructions in the function iexec. The program counter is i, usually incremented by 1, except for the jump instructions. Variables are loaded from and stored into the variable state s with function application and function update. For the stack stk, we use standard list constructs as well as hd2 xs ≡ hd (tl xs) and tl2 xs ≡ tl (tl xs) from Section 3.3.

fun iexec :: instr ⇒ config ⇒ config where
iexec (LOADI n) (i, s, stk) = (i + 1, s, n # stk)
iexec (LOAD x) (i, s, stk) = (i + 1, s, s x # stk)
iexec ADD (i, s, stk) = (i + 1, s, (hd2 stk + hd stk) # tl2 stk)
iexec (STORE x) (i, s, stk) = (i + 1, s(x := hd stk), tl stk)
iexec (JMP n) (i, s, stk) = (i + 1 + n, s, stk)
iexec (JMPLESS n) (i, s, stk) =
  (if hd2 stk < hd stk then i + 1 + n else i + 1, s, tl2 stk)
iexec (JMPGE n) (i, s, stk) =
  (if hd stk ≤ hd2 stk then i + 1 + n else i + 1, s, tl2 stk)
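For experimentation outside Isabelle, iexec can be transcribed almost line by line. This is a minimal Python sketch under representation assumptions of our own: instructions are tagged tuples such as ("LOADI", 5) or ("ADD",), the stack top is at index 0, and unset variables read as 0. Where Isabelle's hd is unspecified on an empty list, the sketch raises IndexError instead.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]
Config = Tuple[int, State, List[int]]  # (pc, state, stack); stack top at index 0


def iexec(ins: tuple, c: Config) -> Config:
    """Effect of a single instruction on a configuration.

    stk[0] plays the role of hd stk, stk[1] of hd2 stk, stk[2:] of tl2 stk.
    """
    i, s, stk = c
    op = ins[0]
    if op == "LOADI":
        return (i + 1, s, [ins[1]] + stk)
    if op == "LOAD":
        return (i + 1, s, [s.get(ins[1], 0)] + stk)  # assumption: unset vars are 0
    if op == "ADD":
        return (i + 1, s, [stk[1] + stk[0]] + stk[2:])
    if op == "STORE":
        return (i + 1, {**s, ins[1]: stk[0]}, stk[1:])  # s(x := hd stk), tl stk
    if op == "JMP":
        return (i + 1 + ins[1], s, stk)
    if op == "JMPLESS":
        return (i + 1 + ins[1] if stk[1] < stk[0] else i + 1, s, stk[2:])
    if op == "JMPGE":
        return (i + 1 + ins[1] if stk[0] <= stk[1] else i + 1, s, stk[2:])
    raise ValueError(f"unknown instruction {ins!r}")


# Stack [7, 5]: top is 7, second is 5; 5 < 7, so JMPLESS 3 jumps.
print(iexec(("JMPLESS", 3), (0, {}, [7, 5])))  # (4, {}, [])
```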

The next level up, a single execution step, selects the instruction the program counter (pc) points to and uses iexec to execute it. For execution to be well defined, we additionally check that the pc points to a valid location in the list. We call this predicate exec1 and give it the notation P ⊢ c → c′, read "program P executes from configuration c to configuration c′".

definition exec1 :: instr list ⇒ config ⇒ config ⇒ bool where
P ⊢ c → c′ =
  (∃ i s stk. c = (i, s, stk) ∧ c′ = iexec (P !! i) c ∧ 0 ≤ i < size P)

where x ≤ y < z ≡ x ≤ y ∧ y < z, as usual in mathematics.

The last level is the lifting from single step execution to multiple steps using the standard reflexive transitive closure definition that we already used for the small-step semantics of the source language, that is:

abbreviation P ⊢ c →∗ c′ ≡ star (exec1 P) c c′
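Because P !! i determines the next configuration uniquely, exec1 is deterministic, and the configurations related to c by its reflexive transitive closure can be listed by simply iterating a step function. The sketch below is generic over configurations; star_trace and its fuel bound are our own additions. The fuel keeps the loop finite for programs that do not terminate, in which case the result is only a prefix of the run.

```python
from typing import Callable, List, Optional, TypeVar

C = TypeVar("C")


def star_trace(step: Callable[[C], Optional[C]], c: C, fuel: int = 1000) -> List[C]:
    """Configurations c' with star r c c', in the order they are reached, where r
    is the graph of the deterministic partial function step (None means no
    step is possible). Complete only if the run stops within `fuel` steps."""
    trace = [c]                 # zero steps: star r c c (reflexivity)
    for _ in range(fuel):
        nxt = step(trace[-1])
        if nxt is None:         # no successor: the run is finished
            break
        trace.append(nxt)       # one more step, extending the run
    return trace


def countdown(n: int) -> Optional[int]:
    return n - 1 if n > 0 else None


print(star_trace(countdown, 3))  # [3, 2, 1, 0]
```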

This concludes our definition of the machine and its semantics. As usual in this book, the definitions are executable. This means we can try out a simple example. Let P = [LOAD ''y'', STORE ''x''], s ''x'' = 3, and s ''y'' = 4. Then

values {(i, map t [''x'', ''y''], stk) | i t stk. P ⊢ (0, s, []) →∗ (i, t, stk)}

will produce the following set of reachable configurations:

{(0, [3, 4], []), (1, [3, 4], [4]), (2, [4, 4], [])}
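The same experiment can be replayed outside Isabelle. The following Python sketch assumes the representation choices noted in its comments (tagged-tuple instructions, dict states, stack top at index 0). It implements iexec, exec1 as a partial function that returns None when the pc is out of range, and the closure by iteration; exec1_step and run are hypothetical names. It prints the three configurations listed above, in execution order.

```python
from typing import Dict, List, Optional, Tuple

State = Dict[str, int]
Config = Tuple[int, State, List[int]]  # (pc, state, stack); stack top at index 0


def iexec(ins: tuple, c: Config) -> Config:
    i, s, stk = c
    op = ins[0]
    if op == "LOADI":
        return (i + 1, s, [ins[1]] + stk)
    if op == "LOAD":
        return (i + 1, s, [s.get(ins[1], 0)] + stk)  # assumption: unset vars are 0
    if op == "ADD":
        return (i + 1, s, [stk[1] + stk[0]] + stk[2:])
    if op == "STORE":
        return (i + 1, {**s, ins[1]: stk[0]}, stk[1:])
    if op == "JMP":
        return (i + 1 + ins[1], s, stk)
    if op == "JMPLESS":
        return (i + 1 + ins[1] if stk[1] < stk[0] else i + 1, s, stk[2:])
    if op == "JMPGE":
        return (i + 1 + ins[1] if stk[0] <= stk[1] else i + 1, s, stk[2:])
    raise ValueError(f"unknown instruction {ins!r}")


def exec1_step(P: list, c: Config) -> Optional[Config]:
    """The unique c' with P ⊢ c → c', or None if the pc is not in 0 <= i < size P."""
    i = c[0]
    if 0 <= i < len(P):
        return iexec(P[i], c)  # for 0 <= i, P[i] agrees with P !! i
    return None


def run(P: list, c: Config, fuel: int = 1000) -> List[Config]:
    """Configurations c' with P ⊢ c →∗ c', in execution order (bounded by fuel)."""
    trace = [c]
    for _ in range(fuel):
        nxt = exec1_step(P, trace[-1])
        if nxt is None:
            break
        trace.append(nxt)
    return trace


P = [("LOAD", "y"), ("STORE", "x")]
s = {"x": 3, "y": 4}
for i, t, stk in run(P, (0, s, [])):
    print((i, [t["x"], t["y"]], stk))
# (0, [3, 4], [])
# (1, [3, 4], [4])
# (2, [4, 4], [])
```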