WEEK9-10

Academic year: 2020

Role of a Parser

Obtains a stream of tokens from the lexical analyzer.

Verifies that the given token sequence can be generated according to the grammar.

Reports any syntax errors in an intelligible fashion.

Recovers from commonly occurring errors and continues processing the remainder of the input.

Source Program → Lexical Analyzer → Stream of Tokens → Syntax Analyzer

(3)

Context Free Grammar

A context-free grammar (CFG) is a formal grammar in which every production rule is of the form: producer → production

where producer is a single non-terminal symbol, and production is a string of terminals and/or non-terminals (or can be empty).

Basic elements of a CFG:

Terminals

Non-terminals

Start Symbol

Set of Productions

(4)

Basic Elements of a CFG

Terminals

Basically, they are tokens.

They cannot produce anything, which is why we call them terminals.

In a parse/syntax tree, leaf nodes represent terminals.

When writing a CFG, they can be written in bold face, but on the white-board we will write them in CAPITAL letters.

Non-Terminals

They are symbols other than terminals.

Non-leaf nodes of a parse tree represent them.

They can produce further sequences of terminals and/or non-terminals.

The left-hand side of every production in a CFG is a non-terminal.

They can be written in italic face, but on the white-board we will write them in small letters.

Start Symbol

Non-terminal from which CFG starts.

(5)

Basic Elements of a CFG

Set of Productions

A production can be written in the form:

producer → production

where,

producer is a non-terminal that can produce (be derived into) a production;

production may contain a sequence of terminals and/or non-terminals, or it can simply be empty.

Example: (Grammar 1.1)

dec_stmt → data_type id ;

(6)

Derivations

With the help of derivations we can check the syntax of a token stream with respect to a CFG.

Derivations can be of two types:

Left-most derivation

• At each step, replaces the left-most non-terminal, so the sentence is derived from left to right.

• Can be represented by writing lm on the arrow.

Right-most derivation

• At each step, replaces the right-most non-terminal, so the sentence is derived from right to left.

• Can be represented by writing rm on the arrow.

Example:

• Check whether “int id ;” can be verified by Grammar 1.1.

dec_stmt ⇒ data_type id ; ⇒ int id ;

Verified. Hence there is no syntax error.

• Check whether “char id, id;” can be verified by Grammar 1.1.

dec_stmt ⇒ data_type id ; ⇒ char id ;

The sentence “char id, id;” cannot be produced by the CFG, hence it is a syntax error.
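The check above can be sketched in Python. This is only an illustration: the function name derives and the dictionary encoding are hypothetical, and the data_type alternatives (int, float, char) are assumed from the declaration-statement example rather than from Grammar 1.1 itself. The sketch always expands the left-most symbol first and works only for grammars without left recursion.

```python
# Hypothetical encoding of Grammar 1.1; the data_type rule is an assumption.
GRAMMAR = {
    "dec_stmt": [["data_type", "id", ";"]],
    "data_type": [["int"], ["float"], ["char"]],
}

def derives(symbols, tokens):
    """Return True if the symbol list can derive exactly the token list,
    always expanding the left-most symbol first."""
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each of its productions in turn.
        return any(derives(body + rest, tokens) for body in GRAMMAR[head])
    # Terminal: it must match the next token.
    return bool(tokens) and tokens[0] == head and derives(rest, tokens[1:])

print(derives(["dec_stmt"], "int id ;".split()))        # True
print(derives(["dec_stmt"], "char id , id ;".split()))  # False
```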

(7)

Examples

1. DECLARATION STATEMENT

dec_stmt → data_type id_lst ;

data_type → int | float | char

id_lst → id id_lst | id

What will be the start symbol, terminals and non-terminals of the above grammar?

Is the above id_lst the same as the following?

id_lst → id_lst id | id

But what if we want a comma-delimited id list?

(8)

Examples

2. FUNCTION CALL STATEMENT

fcall_stmt → call id ( param_lst ) ;

param_lst → param , param_lst | param

param → id | literal | num

Why is call in bold face?

(9)

Examples

fcall_stmt → call id ( param_lst_opt ) ;

param_lst_opt → param_lst | Є

(10)

Examples

3. EXPRESSIONS

expr → expr op expr | ( expr ) | - expr | id

op → + | - | * | /

or we can write the above grammar as follows:

expr → expr op expr

expr → ( expr )

expr → - expr

expr → id

op → + | - | * | /

Can we derive the following from the above grammar?

(11)

Parse Tree

A syntax tree or parse tree represents the syntactic structure of a string according to some formal grammar.

In a parse tree, the interior nodes are labeled by non-terminals and the leaf nodes by terminals.

(12)

Syntax Error Handling

Common programming errors can occur at many different levels.

Lexical

Examples include a lexeme for which no token can be generated, and unclosed literals.

Syntactic

Examples include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error.

Semantic

Examples include type mismatches between operators and operands.

An example is a return statement that returns a value in a Java method with result type void.

A few semantic errors, such as type mismatches, can also be detected efficiently; however, accurate detection of semantic and logical errors at compile time is in general a difficult task.

Logical

Can be anything from incorrect reasoning on the part of the programmer to the use

(13)

Error Handler

The error handler in a parser has goals that are simple to state but challenging to realize:

Report the presence of errors clearly and accurately.

Recover from each error quickly enough to detect subsequent errors.

Add minimal overhead to the processing of correct programs.

(14)

Error-Recovery Strategies

Once an error is detected, how should the parser recover?

The simplest approach is for the parser to quit with an informative error message when it detects the first error.

Additional errors are often uncovered if the parser can restore itself to a state where processing of the input can continue with reasonable hopes that the further processing will provide meaningful diagnostic information.

(15)

Panic-Mode Recovery

With this method, on discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.

The synchronizing tokens are usually delimiters, such as semicolon or }, whose role in the source program is clear and unambiguous.

The compiler designer must select the synchronizing

tokens appropriate for the source language. While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the
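The skipping step of panic-mode recovery can be sketched as follows. The synchronizing set and the function name panic_mode_skip are assumptions chosen for illustration; a real parser would pick the set to suit its source language.

```python
SYNC_TOKENS = {";", "}"}  # assumed synchronizing tokens

def panic_mode_skip(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from index pos until a synchronizing token is found.
    Returns the index just past that token, or len(tokens) at end of input."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return min(pos + 1, len(tokens))

tokens = "int id id ; float id ;".split()
# Suppose the parser detects an error at index 2 (the unexpected second id).
resume = panic_mode_skip(tokens, 2)
print(tokens[resume:])  # ['float', 'id', ';']
```

Parsing would then resume at the returned index, here at the start of the next declaration.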

(16)

Phrase-Level Recovery

On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue.

A typical local correction is to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon.

The choice of the local correction is left to the compiler designer.

Its major drawback is the difficulty it has in situations

(17)

Error Productions

By anticipating common errors that might be encountered,

we can augment the grammar for the language at hand with productions that generate the erroneous constructs.

The parser can then generate appropriate error

(18)

Global Correction

Ideally, we would like a compiler to make as few changes as possible in processing an incorrect input string.

There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction.

Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible.

Unfortunately, these methods are in general too costly to

(19)

Context-free Grammar

Example: If statement:

stmt → if ( expr ) stmt else stmt

(20)

Context-free Grammar

Example: Arithmetic Expression

We may write the grammar in either of the following two ways:

expr → expr op expr

expr → ( expr )

expr → - expr

expr → id

op → +

op → -

op → *

op → /

or

expr → expr op expr | ( expr ) | - expr | id

op → + | - | * | /

We also use the shorthand:

E → E A E | ( E ) | - E | id

A → + | - | * | /

(21)

Left-most and Right-most derivations

(22)

Building Parse Tree

Sentence: “- ( id + id )”

E

⇒ - E

⇒ - ( E )

⇒ - ( E + E )

⇒ - ( id + E )

⇒ - ( id + id )

(23)

Building Parse Tree

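[Figure: the parse tree for “- ( id + id )” grown step by step, one tree per derivation step: the root E; then E with children - and E; then the inner E expanded to ( E ); then the E inside the parentheses expanded to E + E; then each remaining E replaced by id.]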

(24)

Ambiguity

A grammar is ambiguous if it produces more than one parse tree for some sentence using the same scheme of derivation (left-most or right-most).

Ambiguous grammar is one that produces more than one

(25)

Ambiguity Example

Grammar:

E → E + E | E * E | ( E ) | - E | id

Sentence: “id + id * id”

Left-most derivation - I

E

⇒ E + E

⇒ id + E

⇒ id + E * E

⇒ id + id * E

⇒ id + id * id

Left-most derivation - II

E

⇒ E * E

⇒ E + E * E

⇒ id + E * E

⇒ id + id * E

⇒ id + id * id
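One way to confirm the ambiguity is to count the distinct parse trees of a sentence. The sketch below is an illustration only, hard-coded for the grammar E → E + E | E * E | ( E ) | - E | id; the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E+E | E*E | (E) | -E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive toks[i:j].
        if i >= j:
            return 0
        n = 0
        if j - i == 1 and toks[i] == "id":        # E -> id
            n += 1
        if toks[i] == "-":                         # E -> - E
            n += count(i + 1, j)
        if toks[i] == "(" and toks[j - 1] == ")":  # E -> ( E )
            n += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):              # E -> E + E | E * E
            if toks[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))

print(count_parse_trees("id + id * id".split()))  # 2
```

A count greater than one for any sentence shows that the grammar is ambiguous.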

(26)

Ambiguity

QUIZ#3

You are given the following grammar:

E → E + E | E * E | ( E ) | - E | id

Also the following sentence:

Prove that the grammar is ambiguous by using:

• Left-most derivations

• Drawing parse trees

(27)

Ambiguity

Grammar of an if statement with optional else:

stmt → if expr then stmt

| if expr then stmt else stmt

| other

Suppose E1 and E2 are expressions and S1 and S2 are statements. What will be the parse tree for the following sentence?

(28)

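Two parse trees exist

Sentence: “if E1 then if E2 then S1 else S2”

[Figure: two parse trees for the sentence. In the first, the root is stmt → if expr then stmt with expr = E1, and the inner stmt is “if E2 then S1 else S2”, so the else belongs to the inner if. In the second, the root is stmt → if expr then stmt else stmt with expr = E1 and else branch S2, and the inner stmt is “if E2 then S1”, so the else belongs to the outer if.]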

(29)

Left Factoring (Removing Ambiguity)

stmt → if expr then stmt

| if expr then stmt else stmt

| other

After factoring out the common prefix:

stmt → if expr then stmt stmt’ | other

stmt’ → else stmt | Є

(30)

Left Recursion

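If we have a grammar of the form:

A → A α | β

then it can be rewritten without left recursion as:

A → β A’

A’ → α A’ | Є

Here α and β are strings of terminals and/or non-terminals, and β does not start with A.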
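The transformation can be sketched in Python for the immediate left-recursion case. The function name and the list-of-lists encoding are assumptions made for illustration; an empty list stands for Є.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | Є.
    bodies is a list of symbol lists; an empty list stands for Є."""
    new_head = head + "'"
    # Alternatives of the form A a contribute a; all others are the b's.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to do
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }

print(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

The result reads as E → T E’ and E’ → + T E’ | Є.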

(31)

Left Recursion

Consider the following grammar:

E → E + T | E - T | T

T → T * F | T / F | F

F → ( E ) | id

Here,

E is an expression.

T is a term, which can be added or subtracted.

F is a factor, which can be multiplied or divided.

The above grammar respects operator precedence:

In the parse tree, * and / are placed at lower levels than + and -, hence they are evaluated earlier.

(32)

Removing Left Recursion

Consider the following grammar:

E → E + T | T

T → T * F | F

F → ( E ) | id

After removing left recursion:

E → T E’

E’ → + T E’ | Є

T → F T’

T’ → * F T’ | Є

F → ( E ) | id

(33)

Top down parser

Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder, i.e. depth first.

Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

How should the grammar be?

Left factored

(34)

Bottom up Parser

A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

In most cases, a right-most derivation is used in bottom-up parsing.

In effect, the parser constructs this derivation in reverse.

Reductions

We can think of bottom-up parsing as the process of "reducing" a string w to the start symbol of the grammar.

At each reduction step, a specific substring matching the body of a

(35)

Example

Grammar:

E → E + T | T

T → T * F | F

F → ( E ) | id

Input Sentence: id * id

Reduction

id * id, F * id, T * id, T * F, T, E

Right-most derivation

E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

(36)

Parser classes

LL parser

Scans input symbols from Left to right.

Constructs a Left-most derivation of the sentence.

Top-down parser.

Grammar must NOT be left recursive, as a left-most derivation is performed.

LR parser

Scans input symbols from Left to right.

Constructs a Right-most derivation of the sentence, in reverse.

Bottom-up parser.

Grammar may be left recursive or right recursive; left recursion is generally preferred because it keeps the parser stack shallow.

(37)

Recursive Descent Parser

It is a kind of top-down parser.

Built from a set of mutually recursive procedures.

Each procedure reflects a production rule, hence the resulting program reflects the CFG.

The implementation is specific to a CFG, i.e. it is not a generic implementation.

In its general form, it may involve backtracking.

(38)

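How does backtracking work?

Consider input = id + id and the grammar:

E → T E’ | T

E’ → + T E’ | Є

T → F T’ | F

T’ → * F T’ | Є

F → ( E ) | id

[Figure: the parse tree grown top-down from E. E is expanded to T E’, an F is derived under T, and F is first tried with ( E ). Since the input starts with id, the parser backtracks and tries F → id, and so on.]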

(39)

Predictive Parser

With the help of the next input symbol, this type of top-down parser can predict which production must be chosen to produce that input symbol.

No backtracking is involved.

The implementation may be recursive or non-recursive, hence there exist:

Recursive predictive parser.

Non-recursive predictive parser.

(40)

Recursive Predictive Parser

It is implemented with the help of mutually recursive procedures as in recursive descent parser but to avoid backtracking, at each step, we chose a recursive path
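A recursive predictive parser can be sketched as one procedure per non-terminal, each choosing its production by looking only at the next input symbol. The sketch below assumes the grammar E → T E’, E’ → + T E’ | Є, T → F T’, T’ → * F T’ | Є, F → ( E ) | id; the class and method names are hypothetical.

```python
class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks end of input
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.look != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.look!r}")
        self.pos += 1

    def E(self):            # E -> T E'
        self.T()
        self.E_()

    def E_(self):           # E' -> + T E' | Є
        if self.look == "+":
            self.match("+")
            self.T()
            self.E_()
        # any other lookahead: choose E' -> Є and consume nothing

    def T(self):            # T -> F T'
        self.F()
        self.T_()

    def T_(self):           # T' -> * F T' | Є
        if self.look == "*":
            self.match("*")
            self.F()
            self.T_()

    def F(self):            # F -> ( E ) | id
        if self.look == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")

def accepts(tokens):
    parser = PredictiveParser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except SyntaxError:
        return False

print(accepts("id + id * id".split()))  # True
print(accepts("id + * id".split()))     # False
```

No procedure ever retries an alternative, so no backtracking takes place.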

(41)
(42)

Non-Recursive Predictive Parser

Rather than being implemented with mutually recursive procedures, it uses a parsing table for prediction.

Parsing Table

It can predict which production of a non-terminal nt must be chosen to produce the next input symbol s.

For example, consider a function which returns a production for a given pair of non-terminal and terminal:

string PredictProduction(nt, s);

To create a parsing table we must know about the:

CFG
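A table-driven predictive parser can be sketched with a dictionary playing the role of the parsing table and an explicit stack replacing the recursion. The table entries below are an assumption: they were worked out for the grammar E → T E’, E’ → + T E’ | Є, T → F T’, T’ → * F T’ | Є, F → ( E ) | id and are not taken from the slides. predict_production mirrors the PredictProduction(nt, s) function described above.

```python
# Assumed parsing table: (non-terminal, lookahead) -> production body.
# An empty list stands for Є; a missing entry is an error.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predict_production(nt, s):
    """Return the body to expand nt with on lookahead s, or None on error."""
    return TABLE.get((nt, s))

def ll1_parse(tokens, start="E"):
    tokens = list(tokens) + ["$"]
    stack = ["$", start]          # top of stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = predict_production(top, look)
            if body is None:
                return False      # no table entry: syntax error
            stack.extend(reversed(body))  # push body, left-most symbol on top
        elif top == look:
            pos += 1              # terminal (or $) matches the input
        else:
            return False
    return pos == len(tokens)

print(ll1_parse("id + id * id".split()))  # True
print(ll1_parse("id id".split()))         # False
```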

(43)

Non-Recursive Predictive Parser

First(x)

Set of terminals with which strings derived from a non-terminal x can start.

Follow(x)

(44)

Non-Recursive Predictive Parser

CFG

E → T E’

E’ → + T E’ | Є

T → F T’

T’ → * F T’ | Є

F → ( E ) | id

Parsing Table

SYMBOL   FIRST    FOLLOW
E        ( id     ) $
E'       + Є      ) $
T        ( id     + ) $
T'       * Є      + ) $
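The FIRST and FOLLOW sets in the table can be computed by repeating the standard rules until nothing changes. The sketch below is one possible fixed-point formulation for the grammar above; the names first_of and compute_first_follow are hypothetical.

```python
EPS = "Є"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] stands for Є
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_of(seq, first):
    """FIRST of a sequence of symbols, given the FIRST sets found so far."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol can vanish (or seq is empty)
    return out

def compute_first_follow(start="E"):
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:                  # repeat until no set grows
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of(body, first)
                changed |= len(first[head]) != before
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of(body[i + 1:], first)
                    before = len(follow[sym])
                    follow[sym] |= tail - {EPS}
                    if EPS in tail:  # what follows head also follows sym
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return first, follow

first, follow = compute_first_follow()
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```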

(45)
(46)

Shift Reduce Parser

Bottom-up parser.

LR parser.

(47)

Shift Reduce Parser

MOVES

Shift j

Move the next input symbol onto the stack.

Push state j onto the stack.

Reduce j

Replace the N symbols matching the body of production no. j of the CFG with its producer.

This pops the N symbols (and their states) from the stack and pushes the reduced form.

Accept

The parser accepts the input according to the action table.

Reject

(48)

Shift Reduce Parser

Action-Goto Table

(49)

Moves of a Shift Reduce Parser

STACK                INPUT             ACTION
0                    id * id + id $    S5
0 id 5               * id + id $       R6: F->id
0 F 3                * id + id $       R4: T->F
0 T 2                * id + id $       S7
0 T 2 * 7            id + id $         S5
0 T 2 * 7 id 5       + id $            R6: F->id
0 T 2 * 7 F 10       + id $            R3: T->T*F
0 T 2                + id $            R2: E->T
0 E 1                + id $            S6
0 E 1 + 6            id $              S5
0 E 1 + 6 id 5       $                 R6: F->id
0 E 1 + 6 F 3        $                 R4: T->F
0 E 1 + 6 T 9        $                 R1: E->E+T

(50)

Shift Reduce Parser Algorithm

1. Push 0 to STACK.

2. Append $ to INPUT.

3. Set X:=GetAction(STACK, INPUT)

4. If X is shift action Sj Then:

a) Move a symbol from INPUT to STACK.

b) Push j to STACK.

Else If X is reduce action Rj Then:

a) Set N equal to the no. of symbols in the body of the jth production rule.

b) Pop N×2 elements from STACK.

c) Push the producer of the jth production on STACK.

d) Set Y:=GetGoto(STACK)

e) PUSH(Y)

Else If X is ‘Accepted’ Then:

a) Write ‘Accepted’ and go to step 6.

Else:

a) Write ‘Rejected’ and go to step 6.

5. Go to step 3.

6. Exit.
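The algorithm can be sketched in Python as follows. The ACTION and GOTO entries are an assumption: they are the usual textbook SLR table for the grammar E → E + T | T, T → T * F | F, F → ( E ) | id, chosen because it agrees with the state numbers in the trace of moves; the table itself is not given in these slides. The function and helper names are hypothetical.

```python
# Production number -> (producer, number of symbols in the body).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

def _reduce_row(j, symbols):
    """Row fragment: reduce by production j on each listed lookahead."""
    return {s: ("r", j) for s in symbols}

_SHIFT_OPERAND = {"id": ("s", 5), "(": ("s", 4)}

ACTION = {  # assumed SLR action table
    0: dict(_SHIFT_OPERAND),
    1: {"+": ("s", 6), "$": ("acc", 0)},
    2: {**_reduce_row(2, "+)$"), "*": ("s", 7)},
    3: _reduce_row(4, "+*)$"),
    4: dict(_SHIFT_OPERAND),
    5: _reduce_row(6, "+*)$"),
    6: dict(_SHIFT_OPERAND),
    7: dict(_SHIFT_OPERAND),
    8: {"+": ("s", 6), ")": ("s", 11)},
    9: {**_reduce_row(1, "+)$"), "*": ("s", 7)},
    10: _reduce_row(3, "+*)$"),
    11: _reduce_row(5, "+*)$"),
}
GOTO = {  # assumed goto table
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}

def shift_reduce_parse(tokens):
    """Return (accepted, list of moves) for the token list."""
    tokens = list(tokens) + ["$"]   # step 2: append $ to INPUT
    stack = [0]                     # step 1: push 0 to STACK
    pos = 0
    moves = []
    while True:
        state, look = stack[-1], tokens[pos]
        kind, j = ACTION[state].get(look, ("rej", 0))  # step 3
        if kind == "s":             # shift: push symbol, then state j
            stack += [look, j]
            pos += 1
            moves.append(f"S{j}")
        elif kind == "r":           # reduce by production j
            head, n = PRODUCTIONS[j]
            del stack[len(stack) - 2 * n:]   # pop N x 2 elements
            goto_state = GOTO[(stack[-1], head)]
            stack += [head, goto_state]
            moves.append(f"R{j}")
        else:                       # accept or reject
            return kind == "acc", moves

ok, moves = shift_reduce_parse("id * id + id".split())
print(ok, " ".join(moves))
```

For the input id * id + id, the moves recorded by this sketch follow the same sequence as the trace of moves shown earlier, ending with R1 before the input is accepted.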
