
Syntax Analysis and grammar


Syntax Analysis

Every programming language has precise rules that describe the syntactic structure of well-formed programs.

The syntax of programming language constructs can be specified by

(3)

Role of a parser

Main tasks

(4)

Parser

Top Down

Builds parse trees from top to bottom

Example: Recursive descent parsing, predictive parsing

Bottom Up

(5)

Errors in a program

• Lexical: misspellings of identifiers, keywords, or operators

if x<1 thenn y=5:

• Syntactic: for example, unbalanced parentheses or braces

if ((x<1)&(y>5)))…. {…….{….{……}}

• Semantic: type errors, undefined IDs, etc.

if (x+5) then ...

• Logical: bugs

if (i<9) ... should be <=, not <

(6)

Requirement

Detect all errors (except logical ones!).

Messages should be helpful.

It is difficult to produce clear messages!

Example: Syntax Error

Example:

(7)

Error Recovery Approaches: Panic Mode

Discard tokens until we see a “synchronizing” token.

Simple to implement

Commonly used

The key...

Good set of synchronizing tokens

Knowing what to do then
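Panic-mode recovery can be sketched in a few lines. The toy statement form `ID = NUM ;`, the synchronizing set `SYNC_TOKENS`, and the function names below are assumptions made for illustration; they are not taken from the slides.

```python
SYNC_TOKENS = {";"}  # assumed synchronizing set: the statement terminator


def parse_assignment(tokens, i):
    """Parse one `ID = NUM ;` statement starting at index i; return the next index."""
    def expect(pos, ok, what):
        if pos >= len(tokens) or not ok(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {what} at token {pos}, found {found!r}")

    expect(i, str.isidentifier, "identifier")
    expect(i + 1, lambda t: t == "=", "'='")
    expect(i + 2, str.isdigit, "number")
    expect(i + 3, lambda t: t == ";", "';'")
    return i + 4


def parse_statements(tokens):
    """Parse a statement list, recovering from errors in panic mode."""
    parsed, errors = [], []
    i = 0
    while i < len(tokens):
        start = i
        try:
            i = parse_assignment(tokens, i)
            parsed.append(tokens[start:i])
        except SyntaxError as exc:
            errors.append(str(exc))
            # Panic mode: discard tokens until a synchronizing token is seen,
            # then consume it and resume with the next statement.
            while i < len(tokens) and tokens[i] not in SYNC_TOKENS:
                i += 1
            i += 1
    return parsed, errors


parsed, errors = parse_statements("x = 1 ; y = = 2 ; z = 3 ;".split())
```

Here the bad statement `y = = 2 ;` is reported once and skipped, and parsing continues with `z = 3 ;`. What to do after the synchronizing token (here: start a new statement) is the design decision the slide calls "knowing what to do then".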

(8)

Error Recovery Approaches:

Phrase-Level Recovery

The compiler corrects the program by deleting or inserting tokens, so that it can proceed to parse from where it was.

The key

Don’t get into an infinite loop of constantly inserting tokens.

(9)

Error Recovery Approaches: Error Productions

Augment the CFG with “error productions”.

Now the CFG accepts anything!

If “error productions” are used, their actions report the error:

{ print(“Error...”) }

Used with...

(10)

Error Recovery Approaches: Global Correction

Theoretical Approach

Find the minimum change to the source to yield a valid program

(Insert tokens, delete tokens, swap adjacent tokens)

(11)

Grammars

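A context-free grammar is a 4-tuple

G = (N, T, P, S) where

T is a finite set of tokens (terminal symbols)

N is a finite set of nonterminals

P is a finite set of productions of the form

α → β

where α ∈ (N ∪ T)* N (N ∪ T)* and β ∈ (N ∪ T)*

(this is the general form of a grammar production; in a context-free grammar α is a single nonterminal, so every production is A → β with A ∈ N)

S ∈ N is the designated start symbol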

(12)

Notational Conventions Used

• Terminals

a, b, c, … ∈ T

specific terminals: 0, 1, id, +

• Nonterminals

A, B, C, … ∈ N

specific nonterminals: expr, term, stmt

• Grammar symbols

X, Y, Z ∈ (N ∪ T)

• Strings of terminals

u, v, w, x, y, z ∈ T*

• Strings of grammar symbols

(13)

Example

• Terminals

Keywords: else, also written “else”

Token classes: ID, INTEGER, REAL

Punctuation: ;, also written “;”

• Non-terminals

Any symbol appearing on the left hand side of any rule

• Start Symbol

Usually the non-terminal on the left hand side of the first rule

• Rules (or “Productions”)

(14)

S → 0S1

S → ε

The string 0011 is in the language generated. The derivation is:

S ⇒ 0S1 ⇒ 00S11 ⇒ 0011

For compactness, we write S → 0S1 | ε
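The derivation above can be reproduced mechanically. The following is a minimal sketch for the grammar S → 0S1 | ε only; the function names `derive` and `in_language` are chosen here for illustration.

```python
def derive(n):
    """Sentential forms of the derivation of 0^n 1^n from S -> 0S1 | ε."""
    forms = ["S"]
    for _ in range(n):
        # Apply S -> 0S1 to the single occurrence of S.
        forms.append(forms[-1].replace("S", "0S1"))
    # Finish with S -> ε.
    forms.append(forms[-1].replace("S", ""))
    return forms


def in_language(s):
    """Recognize the generated language by undoing S -> 0S1 from the outside in."""
    while s.startswith("0") and s.endswith("1"):
        s = s[1:-1]
    return s == ""


print(" => ".join(derive(2)))
```

Calling `derive(2)` yields the forms S, 0S1, 00S11, 0011, matching the derivation written out on the slide.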

(15)

Palindrome

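Let P be the language of palindromes with alphabet { a, b }. One can determine a CFG for P by finding a recursive decomposition.

If we peel the first and last symbols from a palindrome, what remains is a palindrome; and if we wrap a palindrome with the same symbol front and back, then it is still a palindrome.

CFG is

P → a P a | b P b | ε

As written, this generates only the palindromes of even length; adding the alternatives a and b (P → a P a | b P b | a | b | ε) covers the odd-length palindromes as well.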

(16)

Even 0’s

A CFG for all binary strings with an even number of 0’s.

Find the decomposition. If the first symbol is 1, then an even number of 0’s remains. If the first symbol is 0, then go to the next 0; after that, again an even number of 0’s remains.

(17)

Alternate CFG for Even 0’s

Here is another CFG for the same language.

Note that when first symbol is 0, what remains has odd number of 0’s.

(18)

Examples

A CFG for the regular language corresponding to the RE 00∗11∗.

The language is the concatenation of two languages: all strings of zeroes with all strings of ones.

(19)

Derivation

We derive strings in the language of a CFG by starting with the start symbol, and repeatedly replacing some variable A by the right side of one of its productions.

That is, the “productions for A” are those that have A on the left side of the arrow.

(20)

Derivations – Formalism

We say αAβ => αγβ if A -> γ is a production.

Example: S -> 01; S -> 0S1

(21)

Iterated Derivation

=>* means “zero or more derivation steps.”

Basis: α =>* α for any string α.

(22)

Example: Iterated Derivation

S -> 01; S -> 0S1.

S => 0S1 => 00S11 => 000111.

(23)

Sentential Forms

Any string of variables and/or terminals derived from the start symbol is called a sentential form.

(24)

Leftmost and Rightmost Derivations

Derivations allow us to replace any of the variables in a string. This leads to many different derivations of the same string.

By forcing the leftmost variable (or alternatively, the rightmost

(25)

Leftmost Derivations

• Say wAα =>lm wβα if w is a string of terminals only and A -> β is a production.

• Also, α =>*lm β if α becomes β by a sequence of zero or more =>lm steps.

(26)

Example: Leftmost Derivations

Balanced-parentheses grammar: S -> SS | (S) | ()

S =>lm SS =>lm (S)S =>lm (())S =>lm (())()

• Thus, S =>*lm (())()

• S => SS => S() => (S)() => (())() is a derivation, but not a leftmost derivation.
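A derivation is fully determined by which variable is replaced at each step and by which production. The sketch below (the function name `derive` and the string encoding of productions are assumptions for illustration) always replaces the leftmost variable, or the rightmost one when `rightmost=True`, using the balanced-parentheses grammar.

```python
GRAMMAR = {"S": ["SS", "(S)", "()"]}  # S -> SS | (S) | ()


def derive(start, choices, grammar, rightmost=False):
    """Apply the chosen right sides in order to the leftmost (or rightmost) variable."""
    forms = [start]
    for rhs in choices:
        form = forms[-1]
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:
            raise ValueError(f"no variable left to replace in {form!r}")
        pos = positions[-1] if rightmost else positions[0]
        if rhs not in grammar[form[pos]]:
            raise ValueError(f"{form[pos]} -> {rhs} is not a production")
        forms.append(form[:pos] + rhs + form[pos + 1:])
    return forms


leftmost = derive("S", ["SS", "(S)", "()", "()"], GRAMMAR)
rightmost = derive("S", ["SS", "()", "(S)", "()"], GRAMMAR, rightmost=True)
print(" => ".join(leftmost))
print(" => ".join(rightmost))
```

Both calls end in the same string (())() but pass through different sentential forms, which is the point of distinguishing leftmost from rightmost derivations.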

(27)

Rightmost Derivations

• Say αAw =>rm αβw if w is a string of terminals only and A -> β is a production.

(28)

Example: Rightmost Derivations

• Balanced-parentheses grammar: S -> SS | (S) | ()

• S =>rm SS =>rm S() =>rm (S)() =>rm (())()

• Thus, S =>*rm (())()

• S => SS => SSS => S()S => ()()S =>

(29)

Parse Trees

Parse trees are trees labeled by symbols of a particular CFG.

Leaves: labeled by a terminal or ε.

Interior nodes: labeled by a variable.

Children are labeled by the right side of a production for the parent.

(30)

Example: Parse Tree

S -> SS | (S) | ()

S
├─ S
│  ├─ (
│  ├─ S
│  │  ├─ (
│  │  └─ )
│  └─ )
└─ S
   ├─ (
   └─ )

(31)

Yield of a Parse Tree

The concatenation of the labels of the leaves in left-to-right order (that is, in the order of a preorder traversal) is called the yield of the parse tree.

Example: the yield of the parse tree in the previous example is (())()
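As a minimal sketch, a parse tree can be represented as labeled nodes with ordered children, and the yield computed by a left-to-right traversal of the leaves. The `Node` class and the function names are assumptions for illustration; the tree built here is the one for (())() under S -> SS | (S) | ().

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels from left to right; an ε leaf contributes nothing."""
    if not node.children:
        return "" if node.label == "ε" else node.label
    return "".join(tree_yield(child) for child in node.children)


def is_parse_tree(node, grammar):
    """Check that every interior node and its children match a production."""
    if not node.children:
        return node.label not in grammar  # a leaf must not be a variable
    body = "".join(child.label for child in node.children)
    return (
        node.label in grammar
        and body in grammar[node.label]
        and all(is_parse_tree(child, grammar) for child in node.children)
    )


GRAMMAR = {"S": ["SS", "(S)", "()"]}
inner = Node("S", [Node("("), Node(")")])
tree = Node("S", [
    Node("S", [Node("("), inner, Node(")")]),
    Node("S", [Node("("), Node(")")]),
])
assert is_parse_tree(tree, GRAMMAR)
assert tree_yield(tree) == "(())()"
```

Single-character symbols keep the production check to a string comparison; a grammar with multi-character symbols would compare lists of labels instead.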

(32)

Parse Trees, Left- and Rightmost Derivations

For every parse tree, there is a unique leftmost, and a unique rightmost derivation.

We’ll prove:

(33)

Parse Trees and Rightmost Derivations

The ideas are essentially the mirror image of the proof for leftmost derivations.

(34)

Parse Trees and Any Derivation

The proof that you can obtain a parse tree from a leftmost derivation doesn’t really depend on “leftmost.”

First step still has to be A => X1…Xn.

And w still can be divided so the first portion is

(35)

Ambiguity

• A grammar is ambiguous if it has more than one Parse-Tree for some string.

Equivalently, there is more than one right-most or left-most derivation for some string.

• Ambiguity is bad: it leaves the meaning of some programs ill-defined, since we cannot decide their syntactic structure uniquely.

• Ambiguity is a Property of Grammars, not Languages.

• Two alternative solutions:

1. Disambiguate the grammar

(36)

Ambiguity: Arithmetic Expressions

Consider the Grammar for arithmetic expressions:

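E → E + E | E ∗ E | (E) | −E | id

The sequence of Tokens id + id ∗ id has two Parse-Trees

First Parse-Tree:

E
├─ E
│  └─ id
├─ +
└─ E
   ├─ E
   │  └─ id
   ├─ ∗
   └─ E
      └─ id

Second Parse-Tree:

E
├─ E
│  ├─ E
│  │  └─ id
│  ├─ +
│  └─ E
│     └─ id
├─ ∗
└─ E
   └─ id

The first Parse-Tree reflects the usual assumption that ∗ takes precedence over +.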
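The number of parse trees can be counted directly. The sketch below restricts the grammar to E → E + E | E ∗ E | id (parentheses and unary minus are left out as a simplifying assumption) and writes ∗ as the ASCII `*`; the function name is hypothetical. For a token range, it tries each operator as the root and multiplies the counts of the two sides.

```python
from functools import lru_cache

OPERATORS = {"+", "*"}


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


assert count_parse_trees("id + id * id".split()) == 2
```

With three operators the count is already 5, so the ambiguity grows quickly with the length of the expression.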

(37)

Free University of Bolzano–Formal Languages and Compilers. Lecture V, 2014/2015 – A.Artale


Eliminating Ambiguity by Disambiguating the Grammar

• Sometimes it is possible to eliminate ambiguity by rewriting the Grammar.

Example. Let us rewrite the Grammar for arithmetic expressions:

E → E + T | T
T → T ∗ F | F
F → …

– Enforces precedence of ∗ over +;

– Enforces left-associativity of + and ∗

(38)


Eliminating Ambiguity: Example

The sequence of Tokens id + id ∗ id now has only one Parse-Tree

E
├─ E
│  └─ T
│     └─ F
│        └─ id
├─ +
└─ T
   ├─ T
   │  └─ F
   │     └─ id
   ├─ ∗
   └─ F
      └─ id

under the grammar

E → E + T | T
T → T ∗ F | F
F → …

(39)

Ambiguity: The Dangling Else

• Consider the Grammar for if-then-else statements:

Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | other

• This Grammar is ambiguous.

Example. Consider the statement:

(40)

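The Dangling Else: Example

The statement: if E1 then if E2 then S1 else S2, has two Parse-Trees

First Parse-Tree (the else belongs to the inner if):

Stmt
├─ if
├─ E1
├─ then
└─ Stmt
   ├─ if
   ├─ E2
   ├─ then
   ├─ S1
   ├─ else
   └─ S2

Second Parse-Tree (the else belongs to the outer if):

Stmt
├─ if
├─ E1
├─ then
├─ Stmt
│  ├─ if
│  ├─ E2
│  ├─ then
│  └─ S1
├─ else
└─ S2

• Typically, the first Parse-Tree is preferred.

Disambiguating Rule: Match each else with the closest unmatched then.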

(41)

Disambiguating Dangling Else

Disambiguating Rule: Match each else with the closest unmatched then.

• The rule can be incorporated into the Grammar if we distinguish between

matched and unmatched statements.

• A statement between a then-else must be matched.

Stmt → Matched_stmt | Unmatched_stmt

Matched_stmt → if Expr then Matched_stmt else Matched_stmt
             | Other-Stmt

Unmatched_stmt → if Expr then Stmt
               | if Expr then Matched_stmt else Unmatched_stmt

• This Grammar generates the same set of strings as the previous one but gives just one Parse-Tree for if-then-else statements.

(42)

Elimination of Left Recursion

• A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒⁺ Aα (one or more steps) for some string α.

• Top-down parsing methods cannot handle left-recursive grammars.

(43)

Elimination of left recursion

• If we have the production A → Aα | β, left recursion can be eliminated by rewriting it as

A → βA′

A′ → αA′ | ε

• For example, the grammar

E → E + T | T

T → T * F | F

F → (E) | id

becomes

E → T E′

E′ → + T E′ | ε

T → F T′

T′ → * F T′ | ε

F → (E) | id

(44)

Left Factoring

• Useful for producing grammar suitable for predictive or top down parsing

• When the choice between two alternative A-productions is not clear, we may be able to rewrite the productions to defer the decision.

(45)

Example-Left factoring

If we have the two productions

stmt → if expr then stmt else stmt | if expr then stmt

then on seeing the input token if, we cannot immediately tell which production to choose to expand stmt.

In general, if A → αβ1 | αβ2 are two A-productions with a common prefix α, the left-factored productions are

A → αA′

A′ → β1 | β2

(46)

Example-Left factoring

stmt → if expr then stmt else stmt | if expr then stmt

• After left factoring

stmt → if expr then stmt S′

S′ → else stmt | ε
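Left factoring can also be sketched as a small function. The encoding (alternatives as lists of symbols, the empty list for ε) and the names `common_prefix` and `left_factor` are assumptions for illustration. The sketch performs a single factoring pass and names the new nonterminal by priming the head (stmt' where the slide writes S′).

```python
def common_prefix(sequences):
    """Longest prefix shared by all symbol sequences."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, alternatives):
    """One pass of left factoring over the alternatives of `head`."""
    # Group the alternatives by their first symbol (None for ε).
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)

    result = {head: []}
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[head].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        new_head = head + "'" * len(result)  # stmt', stmt'', ...
        result[head].append(prefix + [new_head])
        result[new_head] = [alt[len(prefix):] for alt in group]
    return result


factored = left_factor("stmt", [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
])
for head, alternatives in factored.items():
    print(head, "->", " | ".join(" ".join(alt) or "ε" for alt in alternatives))
```

A full implementation would repeat the pass on the new nonterminals until no two alternatives share a prefix; that loop is omitted here.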
