WEEK9-10

(1)

(2)

Role of a Parser

• Obtains a stream of tokens from the lexical analyzer.
• Verifies that the given token sequence can be generated according to the grammar.
• Reports any syntax error in an intelligible fashion.
• Recovers from commonly occurring errors and continues processing the remainder of the input.

[Figure: Source Program → Lexical Analyzer → Stream of Tokens → Syntax Analyzer]

(3)

Context Free Grammar

• A context-free grammar (CFG) is a formal grammar in which every production rule is of the form:

producer → production

where producer is a single non-terminal symbol, and production is a string of terminals and/or non-terminals (or can be empty).

Basic elements of a CFG:
• Terminals
• Non-terminals
• Start Symbol

(4)

Basic Elements of a CFG

• Terminals
  • Basically, they are tokens.
  • They cannot produce anything further, which is why we call them terminals.
  • In a parse/syntax tree, leaf nodes represent terminals.
  • While writing a CFG, they can be written in bold face, but on the white-board we will write them in CAPITAL letters.
• Non-Terminals
  • They are symbols other than terminals.
  • Non-leaf nodes of a parse tree represent them.
  • They can produce further sequences of terminals and/or non-terminals.
  • The left-hand side of every production is a non-terminal.
  • They can be written in italic face, but on the white-board we will write them in small letters.
• Start Symbol
  • The non-terminal from which the CFG starts.

(5)

Basic Elements of a CFG

• Set of Productions
  • A production can be written in the form:

producer → production

where,

producer is a non-terminal which can produce, or be derived to, a production;

production may contain a sequence of terminals and/or non-terminals, or it can simply be empty.

Example: (Grammar 1.1)

dec_stmt → data_type id ;

(6)


Derivations

• With the help of derivations we can check the syntax of a token stream with respect to a CFG.
• Derivations can be of two types:
  • Leftmost derivation
    • At each step, the leftmost non-terminal is replaced.
    • Can be represented by writing lm on the arrow.
  • Rightmost derivation
    • At each step, the rightmost non-terminal is replaced.
    • Can be represented by writing rm on the arrow.
• Example:
  • Check whether "int id ;" can be verified by Grammar 1.1.
    dec_stmt ⇒ data_type id ; ⇒ int id ;
    Verified. Hence there is no syntax error.
  • Check whether "char id, id;" can be verified by Grammar 1.1.
    dec_stmt ⇒ data_type id ; ⇒ char id ;
    Cannot be produced by the CFG, hence a syntax error.

(7)

Examples

1. DECLARATION STATEMENT

dec_stmt → data_type id_lst ;
data_type → int | float | char
id_lst → id id_lst | id

What will be the start symbol, terminals and non-terminals of the above grammar?

Is the above id_lst the same as the following?

id_lst → id_lst id | id

But what if we want a comma-delimited id list?

(8)

Examples

2. FUNCTION CALL STATEMENT

fcall_stmt → call id ( param_lst ) ;
param_lst → param , param_lst | param
param → id | literal | num

Why is call in bold face?

(9)

Examples

fcall_stmt → call id ( param_lst_opt ) ;
param_lst_opt → param_lst | ε

(10)

Examples

3. EXPRESSIONS

expr → expr op expr | ( expr ) | - expr | id
op → + | - | * | /

or we can write the above grammar as follows:

expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + | - | * | /

Can we derive the following from the above grammar?

(11)

Parse Tree

• A syntax tree or parse tree represents the syntactic structure of a string according to some formal grammar.
• In a parse tree, the interior nodes are labeled by non-terminals and the leaf nodes by terminals.

(12)

Syntax Error Handling

Common programming errors can occur at many different levels.

• Lexical
  • Examples include a lexeme for which no token can be generated, and unclosed literals.
• Syntactic
  • Examples include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error.
• Semantic
  • Include type mismatches between operators and operands.
  • An example is a return statement in a Java method with result type void.
  • A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.
• Logical
  • Can be anything from incorrect reasoning on the part of the programmer to the use

(13)

Error Handler

The error handler in a parser has goals that are simple to state but challenging to realize:

• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct

(14)

Error-Recovery Strategies

Once an error is detected, how should the parser recover?

• The simplest approach is for the parser to quit with an informative error message when it detects the first error. Additional errors are often uncovered if the parser can restore itself to a state where processing of the input can continue with reasonable hopes that the further processing will provide meaningful diagnostic

(15)

Panic-Mode Recovery

• With this method, on discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
• The synchronizing tokens are usually delimiters, such as semicolon or }, whose role in the source program is clear and unambiguous.
• The compiler designer must select the synchronizing tokens appropriate for the source language. While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the
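
Panic-mode recovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statement form `data_type id ;` is borrowed from Grammar 1.1, the synchronizing set `{";", "}"}` is one possible choice, and the names `skip_to_sync` and `check_declarations` are hypothetical helpers, not part of any real parser.

```python
# Assumed synchronizing tokens and data types (illustrative choices only).
SYNC_TOKENS = {";", "}"}
DATA_TYPES = {"int", "float", "char"}


def skip_to_sync(tokens, pos):
    """Discard tokens one at a time until a synchronizing token is found.

    Returns the position just past that token (or the end of the input).
    """
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return min(pos + 1, len(tokens))


def check_declarations(tokens):
    """Check a sequence of `data_type id ;` statements.

    Returns the start positions of the statements that contain an error.
    After each error the checker resynchronizes and keeps going, so later
    errors are still reported.
    """
    errors = []
    pos = 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + 3]
        ok = (len(stmt) == 3 and stmt[0] in DATA_TYPES
              and stmt[1] == "id" and stmt[2] == ";")
        if ok:
            pos += 3
        else:
            errors.append(pos)
            pos = skip_to_sync(tokens, pos)  # panic mode: skip to after ';'
    return errors


if __name__ == "__main__":
    source = "int id ; char id , id ; float id ;".split()
    print(check_declarations(source))
```

Because the checker resumes right after the next semicolon, the valid declaration that follows the faulty one is still checked normally.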

(16)

Phrase-Level Recovery

• On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue.
• A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.
• The choice of the local correction is left to the compiler designer.
• Its major drawback is the difficulty it has in situations

(17)

Error Productions

• By anticipating common errors that might be encountered, we can augment the grammar for the language at hand with productions that generate the erroneous constructs.
• The parser can then generate appropriate error

(18)

Global Correction

• Ideally, we would like a compiler to make as few changes as possible in processing an incorrect input string.
• There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction.
• Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.
• Unfortunately, these methods are in general too costly to

(19)

Context-free Grammar

Example: If statement:

stmt → if ( expr ) stmt else stmt

(20)

Context-free Grammar

Example: Arithmetic Expression

We may write the grammar in either of the following two ways:

expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /

expr → expr op expr | ( expr ) | - expr | id
op → + | - | * | /

We also use shorthand:

E → E A E | ( E ) | - E | id
A → + | - | * | /

(21)

Left-most and Right-most derivations

(22)

Building Parse Tree

Sentence: “- ( id + id )”

E
⇒ - E
⇒ - ( E )
⇒ - ( E + E )
⇒ - ( id + E )
⇒ - ( id + id )

(23)

Building Parse Tree

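[Figure: sequence of parse trees grown one derivation step at a time for the sentence "- ( id + id )": E; then E with children - E; then the inner E expanded to ( E ); then to E + E; then the left E replaced by id.]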

(24)

Ambiguity

• A grammar that produces more than one parse tree for a sentence under the same scheme of derivation is ambiguous.
• Ambiguous grammar is one that produces more than one

(25)

Ambiguity Example

Grammar:

E → E + E | E * E | ( E ) | - E | id

Sentence: "id + id * id"

Left-most derivation - I

E
⇒ E + E
⇒ id + E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id

Left-most derivation - II

E
⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
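
As a rough check of the ambiguity, the Python sketch below counts how many distinct parse trees the grammar E → E + E | E * E | ( E ) | - E | id gives a token sequence. The function name `count_parse_trees` and the span-based counting approach are illustrative assumptions, not part of the lecture material.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under
    E -> E + E | E * E | ( E ) | - E | id
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i < 1:
            return 0
        total = 0
        if j - i == 1 and tokens[i] == "id":          # E -> id
            total += 1
        if tokens[i] == "-":                          # E -> - E
            total += count(i + 1, j)
        if tokens[i] == "(" and tokens[j - 1] == ")":  # E -> ( E )
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                 # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees("id + id * id".split()))
```

A count greater than one for some sentence means the grammar is ambiguous; a parenthesized sentence such as "( id + id ) * id" has exactly one tree.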

(26)

Ambiguity

QUIZ#3

You are given the following grammar:

E → E + E | E * E | ( E ) | - E | id

Also the following sentence:

Prove that the grammar is ambiguous by using:

• Left-most derivations
• Drawing parse trees

(27)

Ambiguity

Grammar of an if statement with optional else:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Suppose E1, E2 are expressions and S1 and S2 are statements. What will be the parse tree for the following sentence?

(28)

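Two parse trees exist

Sentence: "if E1 then if E2 then S1 else S2"

Parse tree 1 (the else belongs to the inner if):

[stmt if [expr E1] then [stmt if [expr E2] then [stmt S1] else [stmt S2]]]

Parse tree 2 (the else belongs to the outer if):

[stmt if [expr E1] then [stmt if [expr E2] then [stmt S1]] else [stmt S2]]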

(29)

Left Factoring (Removing Ambiguity)

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

After left factoring:

stmt → if expr then stmt stmt' | other
stmt' → else stmt | ε
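
A minimal sketch of one level of left factoring in Python, assuming productions are stored as lists of symbols and the empty list stands for ε. The helper names `common_prefix` and `left_factor` are hypothetical, and the sketch only factors alternatives that share their first symbol.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(head, bodies):
    """Factor out common prefixes among the alternatives of `head`.

    Returns a dict mapping each non-terminal to its list of bodies;
    an empty body [] stands for epsilon.
    """
    groups = {}
    for body in bodies:
        key = body[0] if body else None
        groups.setdefault(key, []).append(body)

    result = {head: []}
    new_head = head
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)      # nothing to factor
            continue
        prefix = common_prefix(group)
        new_head += "'"                     # fresh non-terminal, e.g. stmt'
        result[head].append(prefix + [new_head])
        result[new_head] = [body[len(prefix):] for body in group]
    return result


if __name__ == "__main__":
    factored = left_factor("stmt", [
        ["if", "expr", "then", "stmt"],
        ["if", "expr", "then", "stmt", "else", "stmt"],
        ["other"],
    ])
    for nt, alternatives in factored.items():
        print(nt, "->", " | ".join(" ".join(b) or "ε" for b in alternatives))
```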

(30)

Left Recursion

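If we have a grammar in the form of:

A → A α | β

Then it can be rewritten without left recursion as:

A → β A'
A' → α A' | ε

Here α and β are strings of terminals and/or non-terminals, and β does not start with A.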

(31)

Left Recursion

Consider the following grammar:

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id

Here,

E is an expression.
T is a term, which can be added or subtracted.
F is a factor, which can be multiplied or divided.

The above grammar satisfies operator precedence:

• In the parse tree, * and / will be placed at lower levels compared to + or -, hence they will be evaluated earlier.

(32)

Removing Left Recursion

Consider the following grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

After removing left recursion:

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
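
The rule A → A α | β ⇒ A → β A', A' → α A' | ε can be applied mechanically. The Python sketch below handles immediate left recursion only; the function name and the list-of-symbols representation (with [] standing for ε) are assumptions made for illustration.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as
         A  -> b1 A' | ...
         A' -> a1 A' | ... | epsilon
    Each body is a list of symbols; [] stands for epsilon.
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: bodies}           # no immediate left recursion
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


if __name__ == "__main__":
    grammar = {
        "E": [["E", "+", "T"], ["T"]],
        "T": [["T", "*", "F"], ["F"]],
        "F": [["(", "E", ")"], ["id"]],
    }
    for head, bodies in grammar.items():
        for nt, alts in eliminate_immediate_left_recursion(head, bodies).items():
            print(nt, "->", " | ".join(" ".join(b) or "ε" for b in alts))
```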

(33)

Top down parser

• Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder, i.e. depth first.
• Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
• How should the grammar be?
  • Left factored

(34)

Bottom up Parser

• A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
• In most cases a rightmost derivation is used in bottom-up parsing.
• It basically constructs a derivation in reverse.

Reductions

• We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol of the grammar.
• At each reduction step, a specific substring matching the body of a

(35)

Example

Grammar:

E → E + T | T
T → T * F | F
F → ( E ) | id

Input Sentence: id * id

Reduction

id * id
F * id
T * id
T * F
T
E

Right-most derivation

E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

(36)

Parser classes

• LL parser
  • Scans input symbols from Left to right.
  • Constructs a Leftmost derivation of the sentence.
  • Top-down parser.
  • Grammar must NOT be left recursive, as a leftmost derivation is performed.
• LR parser
  • Scans input symbols from Left to right.
  • Constructs a Rightmost derivation of the sentence (in reverse).
  • Bottom-up parser.
  • Left-recursive grammars are handled directly. Right-recursive grammars are also accepted, but they make the parse stack grow with the length of the input.

(37)

Recursive Descent Parser

• It is a kind of top-down parser.
• Built from a set of mutually recursive procedures.
• Each procedure reflects a production rule, hence the resulting program reflects the CFG.
• The implementation is specific to a CFG, i.e. it is a non-generic implementation.
• It may involve backtracking.

(38)

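How does backtracking work?

• Consider input = id + id
  And the grammar:

E → T E' | T
E' → + T E' | ε
T → F T' | F
T' → * F T' | ε
F → ( E ) | id

[Figure: successive parse trees. E is expanded to T E', then T is expanded and F is tried as ( E ). Since "(" does not match the input id, the parser backtracks and tries F → id. And so on …]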
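
The trial-and-error behaviour can be sketched as a small backtracking recognizer in Python. This is an illustrative sketch, not the lecture's implementation: the grammar is stored as a dictionary, the generator `derive` (a hypothetical name) yields every input position reachable after matching a list of symbols, and trying the next alternative after a failed one is the backtracking step.

```python
# Grammar from this slide; [] stands for epsilon.
GRAMMAR = {
    "E":  [["T", "E'"], ["T"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"], ["F"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def derive(symbols, tokens, pos):
    """Yield each position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Try every alternative in order; moving on to the next
        # alternative after a dead end is the backtracking step.
        for body in GRAMMAR[head]:
            for middle in derive(body, tokens, pos):
                yield from derive(rest, tokens, middle)
    elif pos < len(tokens) and tokens[pos] == head:
        yield from derive(rest, tokens, pos + 1)


def accepts_with_backtracking(tokens):
    """True if the whole token list can be derived from E."""
    return any(end == len(tokens) for end in derive(["E"], tokens, 0))


if __name__ == "__main__":
    print(accepts_with_backtracking("id + id".split()))
```

The recursion terminates only because this grammar has no left recursion; a left-recursive rule would make `derive` call itself forever on the same position.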

(39)

Predictive Parser

• With the help of the next input symbol, this type of top-down parser can predict which production must be chosen to produce that input symbol.
• No backtracking is involved.
• The implementation may be recursive or non-recursive, hence there exists:
  • Recursive predictive parser.
  • Non-recursive predictive parser.

(40)

Recursive Predictive Parser

It is implemented with the help of mutually recursive procedures as in a recursive descent parser, but to avoid backtracking, at each step, we choose a recursive path
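
A minimal Python sketch of a recursive predictive parser, assuming the left-recursion-free expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. The class and method names are hypothetical. Each procedure looks only at the next input symbol to decide which production to follow, so nothing is ever undone.

```python
class ParseError(Exception):
    """Raised when the next input symbol fits no production."""


class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")

    def E(self):                 # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):           # E' -> + T E' | epsilon
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()

    def T(self):                 # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):           # T' -> * F T' | epsilon
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                 # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts_predictively(tokens):
    try:
        PredictiveParser(tokens).parse()
        return True
    except ParseError:
        return False


if __name__ == "__main__":
    print(accepts_predictively("id + id * id".split()))
```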

(41)

(42)

Non-Recursive Predictive Parser

Rather than being implemented with the help of mutually recursive procedures, it uses a parsing table for prediction.

Parsing Table

• It can predict which production of a non-terminal nt must be chosen to produce the next input symbol s.
• For example, consider a function which returns a production for a given pair of non-terminal and terminal:
  string PredictProduction(nt, s);
• To create a parsing table we must know about the:
  • CFG
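
A table-driven parser can be sketched as follows in Python. The table entries below are an assumption: they were written by hand for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id, and `predict_production` is a Python stand-in for the PredictProduction(nt, s) function mentioned above.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# (non-terminal, next input symbol) -> production body; [] stands for epsilon.
PARSING_TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}


def predict_production(nt, s):
    """Return the body to use for non-terminal nt on input symbol s, or None."""
    return PARSING_TABLE.get((nt, s))


def accepts_with_table(tokens):
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]             # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = predict_production(top, look)
            if body is None:
                return False       # empty table cell: syntax error
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == look:
            pos += 1               # terminal matched, advance the input
        else:
            return False
    return pos == len(tokens)


if __name__ == "__main__":
    print(accepts_with_table("id + id * id".split()))
```

The explicit stack replaces the call stack of the recursive version; the parsing decisions are identical.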

(43)

Non-Recursive Predictive Parser

First(x)

• Set of terminals with which a string derived from the non-terminal x can start.

Follow(x)

(44)

Non-Recursive Predictive Parser

CFG

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Parsing Table

SYMBOL    FIRST     FOLLOW
E         ( id      ) $
E'        + ε       ) $
T         ( id      + ) $
T'        * ε       + ) $
F         ( id      * + ) $
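
The FIRST and FOLLOW sets for this grammar can be computed with a simple fixed-point iteration. The Python sketch below is one possible way to do it; the function names and the dictionary representation are assumptions. FOLLOW(x) is taken here in its usual sense: the set of terminals that can appear immediately after x in some sentential form, with $ marking the end of input.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for epsilon
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                    # every symbol can vanish
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                     # repeat until nothing new is added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue       # FOLLOW is defined for non-terminals only
                    tail = first_of_string(body[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:    # the rest can vanish: inherit FOLLOW(head)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    first = compute_first(GRAMMAR)
    follow = compute_follow(GRAMMAR, first, "E")
    for nt in GRAMMAR:
        print(nt, sorted(first[nt]), sorted(follow[nt]))
```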

(45)

(46)

Shift Reduce Parser

• Bottom-up parser.
• LR parser.

(47)

Shift Reduce Parser

MOVES

• Shift j
  • Append the next input symbol to the symbols.
  • Push j to the stack.
• Reduce j
  • Reduce N symbols and push their reduced form w.r.t. production no. j of the CFG.
  • Also pop N elements from the stack.
• Accept
  • The parser accepts the input w.r.t. the action table.
• Reject

(48)

Shift Reduce Parser

Action-Goto Table

(49)

Moves of a Shift Reduce Parser

STACK                INPUT             ACTION
0                    id * id + id $    S5
0 id 5               * id + id $       R6: F->id
0 F 3                * id + id $       R4: T->F
0 T 2                * id + id $       S7
0 T 2 * 7            id + id $         S5
0 T 2 * 7 id 5       + id $            R6: F->id
0 T 2 * 7 F 10       + id $            R3: T->T*F
0 T 2                + id $            R2: E->T
0 E 1                + id $            S6
0 E 1 + 6            id $              S5
0 E 1 + 6 id 5       $                 R6: F->id
0 E 1 + 6 F 3        $                 R4: T->F
0 E 1 + 6 T 9        $                 R1: E->E+T
0 E 1                $                 Accept

(50)

Shift Reduce Parser Algorithm

1. Push 0 to STACK.
2. Append $ to INPUT.
3. Set X := GetAction(STACK, INPUT)
4. If X is shift action Sj Then:
   a) Move a symbol from INPUT to STACK.
   b) Push j to STACK.
   Else If X is reduce action Rj Then:
   a) Set N equal to the no. of symbols in the body of the jth production rule.
   b) Pop N×2 elements from STACK.
   c) Push the producer of the jth production on STACK.
   d) Set Y := GetGoto(STACK)
   e) PUSH(Y)
   Else If X is 'Accepted' Then:
   a) Write 'Accepted' and go to step 6.
   Else:
   a) Write 'Rejected' and go to step 6.
5. Go to step 3.
6. Exit.
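
The algorithm can be sketched in Python as shown below. The Action-Goto table is not reproduced in these notes, so the entries used here are an assumption: they are the usual SLR table for the grammar 1: E→E+T, 2: E→T, 3: T→T*F, 4: T→F, 5: F→(E), 6: F→id, chosen so that the state numbers agree with the moves shown for "id * id + id". The function and variable names are hypothetical.

```python
# production number -> (producer, number of symbols in the body)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}


def build_action_table():
    action = {}
    for state in (0, 4, 6, 7):
        action[(state, "id")] = "S5"
        action[(state, "(")] = "S4"
    action[(1, "+")] = "S6"
    action[(1, "$")] = "ACC"
    action[(8, "+")] = "S6"
    action[(8, ")")] = "S11"
    for state, rule in ((2, 2), (9, 1)):
        action[(state, "*")] = "S7"
        for terminal in "+)$":
            action[(state, terminal)] = f"R{rule}"
    for state, rule in ((3, 4), (5, 6), (10, 3), (11, 5)):
        for terminal in "+*)$":
            action[(state, terminal)] = f"R{rule}"
    return action


ACTION = build_action_table()
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def shift_reduce_parse(tokens):
    """Return (accepted, list of moves) for the token list."""
    stack = [0]                               # step 1
    remaining = list(tokens) + ["$"]          # step 2
    moves = []
    while True:
        x = ACTION.get((stack[-1], remaining[0]))   # step 3
        if x is None:
            moves.append("Rejected")
            return False, moves
        if x == "ACC":
            moves.append("Accepted")
            return True, moves
        moves.append(x)
        kind, j = x[0], int(x[1:])
        if kind == "S":                       # shift: symbol, then state j
            stack.append(remaining.pop(0))
            stack.append(j)
        else:                                 # reduce by production j
            producer, n = PRODUCTIONS[j]
            del stack[len(stack) - 2 * n:]    # pop N x 2 elements
            stack.append(producer)
            stack.append(GOTO[(stack[-2], producer)])


if __name__ == "__main__":
    accepted, moves = shift_reduce_parse("id * id + id".split())
    print(accepted, moves)
```

The stack interleaves grammar symbols and state numbers, which is why a reduction pops two elements for every symbol in the production body.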