
Informal Compiler Algorithm Notation (ICAN)

In this chapter we discuss ICAN, the Informal Compiler Algorithm Notation we use in this text to express compiler algorithms. First we discuss the extended Backus-Naur form that is used to express the syntax of both ICAN and the intermediate languages discussed in the following chapter. Next we provide an introduction to the language and its relationship to common programming languages, an informal overview of the language, and then a formal description of the syntax of ICAN and an informal English description of its semantics. In general, the informal overview should be sufficient for the reader to understand ICAN programs; the full definition is provided for those instances where it is not.

2.1 Extended Backus-Naur Form Syntax Notation

To describe the syntax of programming languages we use a version of Backus-Naur Form that we call Extended Backus-Naur Form, or XBNF. In XBNF, terminals are written in typewriter font (e.g., “type” and “[”), and nonterminals are written in italic font with initial capital letters and other uppercase letters interspersed for readability (e.g., “ProgUnit”, not “Progunit”). A production consists of a nonterminal followed by a long right arrow (“⟶”) and a sequence of nonterminals, terminals, and operators. The symbol “ε” represents the empty string of characters.

The operators are listed in Table 2.1. The operators superscript “*”, superscript “+”, and “⋈” all have higher precedence than concatenation, which has higher precedence than alternation “|”. The curly braces “{” . . . “}” and square brackets “[” . . . “]” act as grouping operators; the brackets additionally indicate that what they contain is optional. Note that the XBNF operators are written in our ordinary text font. When the same symbols appear in typewriter font, they are terminal symbols in the language being defined. Thus, for example,


TABLE 2.1 Operators used in Extended Backus-Naur Form syntax descriptions.

Symbol      Meaning
|           Separates alternatives
{ and }     Grouping
[ and ]     Optional
*           Zero or more repetitions
+           One or more repetitions
⋈           One or more repetitions of the left operand separated by occurrences of the right operand

KnittingInst ⟶ {{knit | purl} Integer | castoff}+

describes a KnittingInst as a sequence of one or more of any of three possibilities, namely, knit followed by an integer, purl followed by an integer, or castoff; and

Wall ⟶ brick ⋈ mortar | cementblock ⋈ mortar

describes a Wall as a sequence of bricks separated (or, perhaps more appropriately, joined) by occurrences of mortar or as a sequence of cementblocks separated by occurrences of mortar.
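To make the repetition and separation operators concrete, the two productions above can be approximated with regular expressions. This is only a sketch under stated assumptions: the helper names (`one_or_more`, `separated_by`, `is_knitting_inst`, `is_wall`) are hypothetical, terminals are assumed to be separated by optional whitespace, and Integer is assumed to be a run of decimal digits. None of these details come from the XBNF definition itself.

```python
import re

def one_or_more(item: str) -> str:
    """Regex for `item` followed by superscript +: one or more repetitions."""
    return rf"(?:(?:{item})\s*)+"

def separated_by(item: str, sep: str) -> str:
    """Regex for `item ⋈ sep`: one or more items separated by sep."""
    return rf"(?:{item})(?:\s*(?:{sep})\s*(?:{item}))*"

# KnittingInst ⟶ {{knit | purl} Integer | castoff}+
KNITTING_INST = re.compile(one_or_more(r"(?:knit|purl)\s+\d+|castoff"))

# Wall ⟶ brick ⋈ mortar | cementblock ⋈ mortar
# Alternation binds loosest, so it joins the two complete ⋈ expressions.
WALL = re.compile(
    separated_by("brick", "mortar") + "|" + separated_by("cementblock", "mortar")
)

def is_knitting_inst(text: str) -> bool:
    return KNITTING_INST.fullmatch(text.strip()) is not None

def is_wall(text: str) -> bool:
    return WALL.fullmatch(text.strip()) is not None
```

Under these assumptions, `is_wall("brick mortar brick")` holds, while `is_wall("brick mortar")` does not, because ⋈ requires an item after every separator. Mixing bricks and cementblocks in one wall is also rejected, since each alternative is a separate ⋈ expression.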

As a more relevant example, consider

ArrayTypeExpr ⟶ array [ ArrayBounds ] of TypeExpr
ArrayBounds ⟶ {[Expr] ·· [Expr]} ⋈ ,

The first line describes an ArrayTypeExpr as the keyword array, followed by a left bracket “[”, followed by an occurrence of something that conforms to the syntax of ArrayBounds, followed by a right bracket “]”, followed by the keyword of, followed by an occurrence of something conforming to the syntax of TypeExpr. The second line describes ArrayBounds as a series of one or more triples of the form of an optional Expr, followed by “··”, followed by an optional Expr, with the triples separated by commas “,”. The following are examples of ArrayTypeExprs:

array [··] of integer
array [1··10] of real
array [1··2,1··2] of real
array [m··n+2] of boolean

2.2 Introduction to ICAN

Algorithms in this text are written in a relatively transparent, informal notation¹ called ICAN (Informal Compiler Algorithm Notation) that derives features from

1. One measure of the informality of ICAN is that many facets of the language that are considered


1   Struc: Node ⟶ set of Node
2
3   procedure Example_1(N,r)
4      N: in set of Node
5      r: in Node
6   begin
7      change := true: boolean
8      D, t: set of Node
9      n, p: Node
10     Struc(r) := {r}
11     for each n ∈ N (n ≠ r) do
12        Struc(n) := N
13     od
14     while change do
15        change := false
16        for each n ∈ N - {r} do
17           t := N
18           for each p ∈ Pred[n] do
19              t ∩= Struc(p)
20           od
21           D := {n} ∪ t
22           if D ≠ Struc(n) then
23              change := true; Struc(n) := D
24           fi
25        od
26     od
27  end     || Example_1

FIG. 2.1 A sample ICAN global declaration and procedure (the line numbers at left are not part of the code).

several programming languages, such as C, Pascal, and Modula-2, and that extends them with natural notations for objects such as sets, tuples, sequences, functions, arrays, and compiler-specific types. Figures 2.1 and 2.2 give examples of ICAN code.
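As a reading aid for readers more familiar with conventional languages, the procedure in Figure 2.1 might be transcribed into Python roughly as follows. This is a sketch rather than a definitive translation. The figure uses Pred without declaring it, so here it is assumed to be a mapping from each node to its set of predecessors and is passed in as the hypothetical parameter `pred`. Likewise, the global function Struc is modeled as a dictionary that the function returns.

```python
def example_1(nodes, r, pred):
    """Rough Python rendering of Example_1 from Figure 2.1.

    nodes: set of nodes (N in the figure)
    r:     a distinguished node in `nodes`
    pred:  assumed mapping node -> set of predecessor nodes (Pred in the figure)
    Returns a dict standing in for the global function Struc.
    """
    struc = {r: {r}}                      # Struc(r) := {r}
    for n in nodes:                       # for each n ∈ N (n ≠ r) do
        if n != r:
            struc[n] = set(nodes)         #    Struc(n) := N
    change = True
    while change:                         # iterate until nothing changes
        change = False
        for n in nodes - {r}:
            t = set(nodes)                # t := N
            for p in pred[n]:
                t &= struc[p]             # t ∩= Struc(p)
            d = {n} | t                   # D := {n} ∪ t
            if d != struc[n]:
                change = True
                struc[n] = d
    return struc
```

The ICAN set operators map directly onto Python's: `∩=` becomes `&=`, `∪` becomes `|`, and set difference `-` is unchanged. The parenthesized condition on the for loop in line 11 of the figure becomes an explicit `if` inside the loop.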

The syntax of ICAN is designed so that every variety of compound statement includes an ending delimiter, such as “fi” to end an if statement. As a result, separators are not needed between statements. However, as a convention to improve readability, when two or more statements are written on the same line we separate them with semicolons (Figure 2.1, line 23). Similarly, if a definition, declaration, or statement extends beyond a single line, the continuation lines are indented (Figure 2.2, lines 1 and 2 and lines 4 and 5).

A comment begins with the delimiter “||” and runs to the end of the line (Figure 2.1, line 27).

Lexically, an ICAN program is a sequence of ASCII characters. Tabs, comments, line ends, and sequences of one or more spaces are called “whitespace.” Each occurrence of whitespace may be turned into a single space without affecting the meaning of a program. Keywords are preceded and followed by whitespace, but operators need not be.
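The whitespace rule can be illustrated with a small normalizer that replaces every run of whitespace, as defined above, with a single space. This is a minimal sketch: the function name `normalize_whitespace` is hypothetical, and the code assumes that “||” never appears inside a string or character literal, which a real lexer would have to check.

```python
import re

def normalize_whitespace(source: str) -> str:
    """Collapse ICAN whitespace (tabs, comments, line ends, spaces) to single spaces."""
    # A comment starts at "||" and runs to the end of the line; treat it as whitespace.
    without_comments = re.sub(r"\|\|[^\n]*", " ", source)
    # Collapse each run of spaces, tabs, and line ends into one space.
    return re.sub(r"[ \t\r\n]+", " ", without_comments)
```

For example, applying it to the text of line 27 of Figure 2.1 followed by a newline and `x  :=  1` yields `end x := 1`, which by the rule above has the same meaning as the original text.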


1   webrecord = record {defs: set of Def,
2                        uses: set of Use}
3
4   procedure Example_2(nwebs,Symreg,nregs,
5         Edges) returns boolean
6      nwebs: inout integer
7      nregs: in integer
8      Symreg: out array [1··nwebs] of webrecord
9      Edges: out set of (integer × integer)
10  begin
11     s1, s2, r1, r2: integer
12     for r1 := 1 to nregs (odd(r1)) do
13        Symreg[nwebs+r1] := nil
14     od
15     Edges := ∅
16     for r1 := 1 to nregs do
17        for r2 := 1 to nregs do
18           if r1 ≠ r2 then
19              Edges ∪= {<nwebs+r1,nwebs+r2>}
20           fi
21        od
22     od
23     for s1 := 1 to nwebs do
24        repeat
25           case s1 of
26  1:          s2 := s1 * nregs
27  2:          s2 := s1 - 1
28  3:          s2 := nregs - nwebs
29              return false
30  default:    s2 := 0
31           esac
32        until s2 = 0
33        for r2 := 1 to nregs do
34           if Interfere(Symreg[s1],r2) then
35              goto L1
36           fi
37        od
38  L1: od
39     nwebs += nregs
40     return true
41  end     || Example_2

FIG. 2.2 A second example of ICAN code (the line numbers at left are not part of the code).

Lexical analysis proceeds left to right, and characters are accumulated to form tokens that are as long as they can be. Thus, for example, the code

for I37_6a := -12 by 1 to n17a do

consists of nine tokens, as follows:
