Syntax Analysis

Academic year: 2020

(1)

Lecture 3: Syntax Analysis

(2)

The role of the parser

performs context-free syntax analysis

guides context-sensitive analysis

constructs an intermediate representation

produces meaningful error messages

attempts error correction

(3)

Syntax analysis

Grammars are often written in Backus-Naur form (BNF).

Example:

1. <goal> ::= <expr>
2. <expr> ::= <expr> <op> <expr>
3.          | num
4.          | id
5. <op>   ::= +
6.          | -
7.          | *

In a BNF for a grammar, we represent

1. non-terminals with <angle brackets> or CAPITAL LETTERS

2. terminals with typewriter font or underline

3. productions as in the example

(4)

BNF Example

Write a BNF grammar for the language of Pascal variable declarations.

var i : integer;

var b : boolean;

var myfloat : real;

mychar : char;

x, y, z : integer;

Solution:

<vardecl> ::= var <vardecllist> ;

<vardecllist> ::= <varandtype> { ; <varandtype> }

<varandtype> ::= <ident> { , <ident> } : <typespec>

<ident> ::= <letter> { <idchar> }

(5)

Notational Conventions Used

Terminals: a, b, c, … ∈ T
    specific terminals: 0, 1, id, +

Nonterminals: A, B, C, … ∈ N
    specific nonterminals: expr, term, stmt

Grammar symbols: X, Y, Z ∈ (N ∪ T)

Strings of terminals: u, v, w, x, y, z ∈ T*

Strings of grammar symbols: α, β, γ ∈ (N ∪ T)*

(6)

Scanning vs. parsing

Factoring out lexical analysis simplifies the compiler

term ::= [a-zA-Z] ( [a-zA-Z] | [0-9] )*
       | 0 | [1-9][0-9]*

op   ::= + | - | * | /

expr ::= (term op)* term

Where do we draw the line?

Regular expressions:

Normally used to classify identifiers, numbers, keywords …

Simpler and more concise for tokens than a grammar

More efficient scanners can be built from REs

CFGs are used to impose structure

Brackets:

(), begin … end, if … then … else
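The split between scanner and parser can be sketched in a few lines of Python. This is a minimal illustration, not a production scanner: the token class names `ID`, `NUM`, `OP` and `SKIP` and the helper names `tokenize` and `is_expr` are assumptions made for this example. The regular expressions are the ones from the `term` and `op` rules, and `is_expr` checks the flat rule `expr ::= (term op)* term` over the resulting token stream.

```python
import re

# Token classes built from the slide's regular expressions.
# The class names (ID, NUM, OP, SKIP) are labels chosen for this sketch.
TOKEN_SPEC = [
    ("ID",   r"[a-zA-Z][a-zA-Z0-9]*"),
    ("NUM",  r"0|[1-9][0-9]*"),
    ("OP",   r"[+\-*/]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, rejecting unknown characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # whitespace is dropped by the scanner
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def is_expr(tokens):
    """Check expr ::= (term op)* term: terms and operators must alternate."""
    kinds = [kind for kind, _ in tokens]
    if len(kinds) % 2 == 0:                # a valid expr has an odd token count
        return False
    return all(
        (k in ("ID", "NUM")) if i % 2 == 0 else (k == "OP")
        for i, k in enumerate(kinds)
    )


if __name__ == "__main__":
    print(tokenize("x + 2 * y"))
    print(is_expr(tokenize("x + 2 * y")))
```

The scanner only classifies characters into tokens; nesting such as brackets or `if … then … else` cannot be expressed this way and is left to the grammar.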

(7)

Hierarchy of grammar classes

© Oscar Nierstrasz, Parsing

LL(k): Left-to-right, Leftmost derivation, k tokens lookahead

LR(k): Left-to-right, Rightmost derivation, k tokens lookahead

SLR: Simple LR (uses “follow sets”)

LALR: LookAhead LR (uses “lookahead sets”)

(8)

Derivations

<goal> ⇒ <expr>
       ⇒ <expr> <op> <expr>
       ⇒ <expr> <op> <expr> <op> <expr>
       ⇒ <id,x> <op> <expr> <op> <expr>
       ⇒ <id,x> + <expr> <op> <expr>
       ⇒ <id,x> + <num,2> <op> <expr>
       ⇒ <id,x> + <num,2> * <expr>
       ⇒ <id,x> + <num,2> * <id,y>

We can view the productions of a CFG as rewriting rules.

We have derived the sentence: x + 2 * y

We denote this derivation (or parse) as: <goal> ⇒* id + num * id
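The view of productions as rewriting rules can be replayed mechanically. The sketch below assumes the seven productions of the first expression grammar (numbered as on that slide) and a hypothetical helper `derive_leftmost` that always rewrites the leftmost nonterminal. The sequence of production numbers passed in is the one used in the derivation above.

```python
# Productions of the expression grammar, numbered as on the slide.
PRODUCTIONS = {
    1: ("<goal>", ["<expr>"]),
    2: ("<expr>", ["<expr>", "<op>", "<expr>"]),
    3: ("<expr>", ["num"]),
    4: ("<expr>", ["id"]),
    5: ("<op>", ["+"]),
    6: ("<op>", ["-"]),
    7: ("<op>", ["*"]),
}


def is_nonterminal(symbol):
    """Nonterminals are written in angle brackets, terminals are bare."""
    return symbol.startswith("<")


def derive_leftmost(start, choices):
    """Apply each chosen production to the leftmost nonterminal.

    Returns the list of sentential forms, starting with `start`.
    """
    form = [start]
    steps = [" ".join(form)]
    for number in choices:
        lhs, rhs = PRODUCTIONS[number]
        index = next(i for i, s in enumerate(form) if is_nonterminal(s))
        if form[index] != lhs:
            raise ValueError(f"production {number} does not rewrite {form[index]}")
        form[index:index + 1] = rhs        # replace the nonterminal by the rhs
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    for line in derive_leftmost("<goal>", [1, 2, 2, 4, 5, 3, 7, 4]):
        print(line)
```

Each printed line is one sentential form; the last one contains only terminals and is therefore a sentence of the grammar.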

(9)

New grammar with some additions to force precedence

1. <goal>   ::= <expr>
2. <expr>   ::= <expr> + <term>
3.            | <expr> - <term>
4.            | <term>
5. <term>   ::= <term> * <factor>
6.            | <term> / <factor>
7.            | <factor>
8. <factor> ::= num
9.            | id

(10)

Forcing the desired precedence

Now, for the string: x + 2 * y

<goal> ⇒ <expr>
       ⇒ <expr> + <term>
       ⇒ <expr> + <term> * <factor>
       ⇒ <expr> + <term> * <id,y>
       ⇒ <expr> + <factor> * <id,y>
       ⇒ <expr> + <num,2> * <id,y>
       ⇒ <term> + <num,2> * <id,y>
       ⇒ <factor> + <num,2> * <id,y>
       ⇒ <id,x> + <num,2> * <id,y>
(11)

Role of the Parser

Not all sequences of tokens are programs.

The parser must distinguish between valid and invalid sequences of tokens.

(12)

Parsing: the big picture

(13)

Top-down versus bottom-up

Top-down parser:

    starts at the root of the derivation tree and fills in

    picks a production and tries to match the input

    may require backtracking

    some grammars are backtrack-free (predictive)

Bottom-up parser:

    starts at the leaves and fills in

    starts in a state valid for legal first tokens

    as input is consumed, changes state to encode possibilities (recognize valid prefixes)

(14)

Top-Down Parsing

LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing

Grammar:

E → T + T
T → ( E )
T → - E
T → id

Leftmost derivation:

E ⇒lm T + T
  ⇒lm id + T
  ⇒lm id + id

(15)

Left Recursion

Productions of the form

A → A α | β | γ

are left recursive.

When one of the productions in a grammar is left recursive, a predictive parser loops forever on certain inputs.

(16)

Left Recursive Grammar

A grammar is said to be left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ A α, for some string α.

Consider the grammar:

(i) A → A a | b

A top-down parser can go into an infinite loop on this grammar, because expanding A immediately asks for A again without consuming any input.

Corresponding grammar without left recursion:

A → b R
R → a R | ε

(17)

Elimination of left recursion

Rewrite every left-recursive production

A → A α | β | γ | A δ

into a right-recursive production:

A   → β A_R | γ A_R
A_R → α A_R | δ A_R | ε
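The rewrite rule can be sketched as a small Python function. This is a minimal sketch under stated assumptions: right-hand sides are tuples of symbol names, the empty tuple stands for ε, and the fresh nonterminal is named by appending `_R` (a naming choice for this example, not checked for clashes). It handles immediate left recursion only, not recursion that goes through other nonterminals.

```python
EPSILON = ()  # the empty right-hand side stands for ε in this sketch


def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Return productions for `nonterminal` without immediate left recursion.

    alternatives: list of right-hand sides, each a tuple of symbols.
    A → A α | β   becomes   A → β A_R   and   A_R → α A_R | ε.
    """
    # The α parts: what follows the leading A in each left-recursive alternative.
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nonterminal]
    # The β parts: alternatives that do not start with A.
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: list(alternatives)}   # nothing to rewrite
    rest = nonterminal + "_R"
    return {
        nonterminal: [rhs + (rest,) for rhs in others],
        rest: [tail + (rest,) for tail in recursive] + [EPSILON],
    }


if __name__ == "__main__":
    # E → E + T | T
    print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

For `E → E + T | T` this yields `E → T E_R` and `E_R → + T E_R | ε`.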

(18)

Eliminate Left recursion

1- E → E + T | T

2- Expr → Expr - Term | Expr - Term | Term
   Term → Term * Factor | Term / Factor | Factor
   Factor → ( Expr ) | Num | Identifier

3- A → B C | a
   B → C A | A b

(19)

Left Factoring

When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing.

Replace productions

A → α β1 | α β2 | … | α βn | γ

with

A   → α A_R | γ
A_R → β1 | β2 | … | βn
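A single left-factoring pass can be sketched as follows. The representation (tuples of symbols, empty tuple for ε) and the generated names `A_R1`, `A_R2`, … are assumptions of this example. The function groups alternatives by their first symbol and pulls out the longest prefix shared by each group; a full implementation would repeat the pass until no two alternatives share a first symbol.

```python
from collections import defaultdict


def common_prefix(sequences):
    """Longest prefix shared by all the given symbol tuples."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor_once(nonterminal, alternatives):
    """One left-factoring pass over the alternatives of `nonterminal`.

    A → α β1 | α β2 | γ   becomes   A → α A_R1 | γ   and   A_R1 → β1 | β2.
    """
    groups = defaultdict(list)
    for rhs in alternatives:
        groups[rhs[:1]].append(rhs)        # key: () for ε, else (first symbol,)
    result = {nonterminal: []}
    counter = 0
    for first_symbol, group in groups.items():
        if len(group) == 1 or first_symbol == ():
            result[nonterminal].extend(group)   # nothing shared, keep as is
            continue
        alpha = common_prefix(group)
        counter += 1
        rest = f"{nonterminal}_R{counter}"
        result[nonterminal].append(alpha + (rest,))
        result[rest] = [rhs[len(alpha):] for rhs in group]
    return result


if __name__ == "__main__":
    print(left_factor_once("S", [
        ("if", "E", "then", "S", "else", "S"),
        ("if", "E", "then", "S"),
    ]))
```

For the two `if` productions this gives `S → if E then S S_R1` and `S_R1 → else S | ε`.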
(20)

Remove Left Factoring

1- S → if E then S else S
   S → if E then S

(21)

Predictive Parsing

If a top-down parser picks the wrong production, it may need to backtrack.

The alternative is to look ahead in the input and use context to pick correctly.

Fortunately, large classes of CFGs can be parsed with limited lookahead.

Most programming language constructs fall into those subclasses.

(22)

Predictive Parsing

Eliminate left recursion from grammar

Left factor the grammar

Compute FIRST and FOLLOW

Two variants:

Recursive (recursive-descent parsing)

Non-recursive (table-driven parsing)
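The recursive variant can be illustrated with a small recognizer, one method per nonterminal. This is a sketch under assumptions: it uses the right-recursive expression grammar `E → T E_R`, `E_R → + T E_R | ε`, `T → F T_R`, `T_R → * F T_R | ε`, `F → ( E ) | id`, takes an already tokenized input (a list of strings such as `"id"` and `"+"`), and the class and function names are made up for this example.

```python
class Parser:
    """Recursive-descent recognizer with one token of lookahead.

    Grammar:  E   -> T E_R        E_R -> + T E_R | ε
              T   -> F T_R        T_R -> * F T_R | ε
              F   -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(
                f"expected {expected!r}, found {self.peek()!r} at token {self.pos}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")

    def E(self):
        self.T()
        self.E_R()

    def E_R(self):
        if self.peek() == "+":               # lookahead selects E_R -> + T E_R
            self.match("+")
            self.T()
            self.E_R()
        # otherwise E_R -> ε: consume nothing

    def T(self):
        self.F()
        self.T_R()

    def T_R(self):
        if self.peek() == "*":               # lookahead selects T_R -> * F T_R
            self.match("*")
            self.F()
            self.T_R()
        # otherwise T_R -> ε

    def F(self):
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """True if the token list is a sentence of the grammar."""
    try:
        Parser(tokens).parse()
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(accepts(["id", "+", "id", "*", "id"]))
```

No method ever backtracks: each decision is made by inspecting the next token only, which is possible because the grammar has no left recursion and has been left factored.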

(23)

Predictive Parsing

FIRST sets: For some rhs α ∈ G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α.

(24)

Predictive Parsing

That is, x ∈ FIRST(α) iff α ⇒* x γ for some γ.

(25)

FIRST Set

FIRST(α) = { the set of terminals that begin all strings derived from α }

FIRST(a) = { a }  if a ∈ T

FIRST(ε) = { ε }

FIRST(A) = ∪ FIRST(α)  for all A → α ∈ P

FIRST(X1 X2 … Xk) =

    if for all j = 1, …, i-1 : ε ∈ FIRST(Xj) then
        add non-ε in FIRST(Xi) to FIRST(X1 X2 … Xk)

    if for all j = 1, …, k : ε ∈ FIRST(Xj) then
        add ε to FIRST(X1 X2 … Xk)
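These rules can be turned into a fixed-point computation. The following Python sketch makes several assumptions: a grammar is a dictionary from nonterminal to a list of right-hand sides, each right-hand side is a tuple of symbols, the empty tuple stands for ε, and the names `compute_first` and `first_of_string` are chosen for this example. The sample grammar is the right-recursive expression grammar `E → T E_R`, `E_R → + T E_R | ε`, `T → F T_R`, `T_R → * F T_R | ε`, `F → ( E ) | id`.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST(X1 X2 … Xk), given FIRST sets for the individual symbols."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}      # add non-ε of FIRST(Xi)
        if EPS not in first[symbol]:
            return result                    # Xi cannot vanish: stop here
    result.add(EPS)                          # every Xj derives ε (or k = 0)
    return result


def compute_first(grammar, terminals):
    """Repeat the rules until no FIRST set changes."""
    first = {t: {t} for t in terminals}      # FIRST(a) = {a}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


GRAMMAR = {
    "E":   [("T", "E_R")],
    "E_R": [("+", "T", "E_R"), ()],
    "T":   [("F", "T_R")],
    "T_R": [("*", "F", "T_R"), ()],
    "F":   [("(", "E", ")"), ("id",)],
}
TERMINALS = {"+", "*", "(", ")", "id"}

if __name__ == "__main__":
    first = compute_first(GRAMMAR, TERMINALS)
    for nt in GRAMMAR:
        print(nt, sorted(first[nt]))
```

The iteration is needed because FIRST sets of nonterminals depend on each other; the loop stops once a full pass adds nothing new.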

(26)

Predictive Parsing

FOLLOW(α) is the set of all words in the grammar that can legally appear after an α.

(27)

FOLLOW

FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A }

FOLLOW(A) =

    for all (B → α A β) ∈ P do
        add FIRST(β) \ { ε } to FOLLOW(A)

    for all (B → α A β) ∈ P and ε ∈ FIRST(β) do
        add FOLLOW(B) to FOLLOW(A)

    for all (B → α A) ∈ P do
        add FOLLOW(B) to FOLLOW(A)

    if A is the start symbol S then
        add $ to FOLLOW(A)
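The FOLLOW rules can be sketched the same way, again as a fixed-point iteration. The block is self-contained, so it repeats a compact FIRST computation; the grammar representation (dictionary of tuples, empty tuple for ε) and all function names are assumptions of this example. The sample grammar is `E → T E_R`, `E_R → + T E_R | ε`, `T → F T_R`, `T_R → * F T_R | ε`, `F → ( E ) | id`.

```python
EPS, END = "ε", "$"


def first_of_string(symbols, first):
    """FIRST of a symbol string; contains ε if the whole string can vanish."""
    result = set()
    for symbol in symbols:
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)
    return result


def compute_first(grammar, terminals):
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, terminals, start):
    """Apply the FOLLOW rules until no set changes."""
    first = compute_first(grammar, terminals)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                       # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():    # production B → α A β
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in grammar:
                        continue                 # FOLLOW is for nonterminals only
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}     # FIRST(β) \ {ε}
                    if EPS in beta_first:        # β is empty or β ⇒* ε
                        new |= follow[lhs]       # add FOLLOW(B)
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


GRAMMAR = {
    "E":   [("T", "E_R")],
    "E_R": [("+", "T", "E_R"), ()],
    "T":   [("F", "T_R")],
    "T_R": [("*", "F", "T_R"), ()],
    "F":   [("(", "E", ")"), ("id",)],
}
TERMINALS = {"+", "*", "(", ")", "id"}

if __name__ == "__main__":
    follow = compute_follow(GRAMMAR, TERMINALS, "E")
    for nt in GRAMMAR:
        print(nt, sorted(follow[nt]))
```

Treating an empty β like a β that derives ε lets one code path cover both the second and the third rule.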

(28)

Find First and Follow sets

CFG

expr → expr + term | term
term → term * factor | factor
factor → number | ( expr )

Reformed

E → E + T | T
T → T * F | F
F → ( E ) | id

(29)

LL(1) Grammar

A grammar G is LL(1) if it is not left recursive and for each collection of productions

A → α1 | α2 | … | αn

for nonterminal A the following holds:

1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j

2. if αi ⇒* ε then

    2.a. αj ⇏* ε for all i ≠ j

    2.b. FIRST(αj) ∩ FOLLOW(A) = ∅ for all i ≠ j

(30)

Non-LL(1) Examples

Grammar              Not LL(1) because:

S → S a | a          Left recursive

S → a S | a          FIRST(a S) ∩ FIRST(a) ≠ ∅

S → a R | ε          For R: S ⇒* ε and ε ⇒* ε
R → S | ε

S → a R a

(31)

Example Table

E   → T E_R
E_R → + T E_R | ε
T   → F T_R
T_R → * F T_R | ε
F   → ( E ) | id

A → α              FIRST(α)    FOLLOW(A)
E   → T E_R        ( id        $ )
E_R → + T E_R      +           $ )
E_R → ε            ε           $ )
T   → F T_R        ( id        + $ )
T_R → * F T_R      *           + $ )
T_R → ε            ε           + $ )
F   → ( E )        (           * + $ )
F   → id           id          * + $ )

(32)

Example Table

E   → T E_R
E_R → + T E_R | ε
T   → F T_R
T_R → * F T_R | ε
F   → ( E ) | id

     id              +               *               (               )               $
E    E → T E_R                                       E → T E_R
E_R                  E_R → + T E_R                                   E_R → ε         E_R → ε
T    T → F T_R                                       T → F T_R
T_R                  T_R → ε         T_R → * F T_R                   T_R → ε         T_R → ε
F    F → id                                          F → ( E )
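A parsing table like this one drives the non-recursive variant of predictive parsing. The sketch below hard-codes the table as a Python dictionary keyed by (nonterminal, lookahead). It is an illustration under assumptions: the entries for F are filled in from the grammar rule `F → ( E ) | id`, the input is an already tokenized list of strings, and the function name `parse` is made up for this example.

```python
# LL(1) table for E → T E_R, E_R → + T E_R | ε, T → F T_R,
# T_R → * F T_R | ε, F → ( E ) | id.
# An empty list is an ε-production; a missing key is a syntax error.
# The F entries are derived from the grammar (assumption of this sketch).
TABLE = {
    ("E", "id"): ["T", "E_R"],       ("E", "("): ["T", "E_R"],
    ("E_R", "+"): ["+", "T", "E_R"], ("E_R", ")"): [], ("E_R", "$"): [],
    ("T", "id"): ["F", "T_R"],       ("T", "("): ["F", "T_R"],
    ("T_R", "+"): [],                ("T_R", "*"): ["*", "F", "T_R"],
    ("T_R", ")"): [],                ("T_R", "$"): [],
    ("F", "id"): ["id"],             ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E_R", "T", "T_R", "F"}


def parse(tokens, start="E"):
    """Table-driven predictive parse.

    Returns the productions used, in the order of a leftmost derivation,
    or raises SyntaxError.
    """
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                   # top of stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no rule for {top} on lookahead {look!r}")
            output.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))    # push rhs so its first symbol is on top
        elif top == look:
            pos += 1                       # terminal (or "$") matches: consume it
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return output


if __name__ == "__main__":
    for lhs, rhs in parse(["id", "+", "id", "*", "id"]):
        print(lhs, "→", " ".join(rhs) or "ε")
```

The explicit stack plays the role that the call stack plays in a recursive-descent parser, and the table lookup replaces the `if` on the lookahead token.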
