Role of a Parser
• Obtains stream of tokens from the lexical analyzer.
• Verifies that the given token sequence can be generated
according to the grammar.
• Reports any syntax error in an intelligible fashion.
• Recovers from commonly occurring errors and continues
processing the remainder of the input.
[Figure: Source Program → Lexical Analyzer → Stream of Tokens → Syntax Analyzer]
Context Free Grammar
• A context-free grammar (CFG) is a formal grammar in
which every production rule has the form: producer → production
where producer is a single non-terminal symbol, and production is a string of terminals and/or non-terminals (or can be empty).
Basic elements of a CFG:
• Terminals
• Non-terminals
• Start Symbol
• Set of Productions
Basic Elements of a CFG
• Terminals
• They are basically tokens.
• They cannot produce anything further, which is why we call them terminals.
• In a parse/syntax tree, leaf nodes represent terminals.
• When writing a CFG they can be set in bold face, but on the white-board we
will write them in CAPITAL letters.
• Non-Terminals
• They are symbols other than terminals.
• Non-leaf nodes of a parse tree represent them.
• They can produce further sequences of terminals and/or non-terminals.
• The left-hand side of every production in a CFG is a non-terminal.
• They can be set in italic face, but on the white-board we will write them in small letters.
• Start Symbol
• The non-terminal from which every derivation of the CFG starts.
Basic Elements of a CFG
• Set of Productions
• A production is written in the form:
producer → production
where,
producer is a non-terminal that can produce (be derived into) the production;
production may contain a sequence of terminals and/or non-terminals, or it can simply be empty.
Example: (Grammar 1.1)
dec_stmt → data_type id ;
Derivations
• With the help of derivations we can check the syntax of a token stream with
respect to a CFG.
• Derivations can be of two types:
• Leftmost derivation
• Replaces the leftmost non-terminal at each step.
• Can be represented as lm on the arrow.
• Rightmost derivation
• Replaces the rightmost non-terminal at each step.
• Can be represented as rm on the arrow.
• Example:
• Check whether “int id ;” can be derived from Grammar 1.1:
dec_stmt ⇒ data_type id ; ⇒ int id ;
Verified. Hence there is no syntax error.
• Check whether “char id, id;” can be derived from Grammar 1.1:
dec_stmt ⇒ data_type id ; ⇒ char id ;
The sentence cannot be produced by the CFG, hence it is a syntax error.
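The two checks above can be sketched in Python. This is only an illustration: Grammar 1.1 does not list the alternatives of data_type, so the set DATA_TYPES (int, float, char) and the function name derive_dec_stmt are assumptions made for the example.

```python
# Assumed alternatives of data_type; Grammar 1.1 itself does not list them.
DATA_TYPES = ("int", "float", "char")


def derive_dec_stmt(tokens):
    """Return the derivation steps for dec_stmt -> data_type id ;
    or None when the token sequence cannot be derived (syntax error)."""
    matches = (
        len(tokens) == 3
        and tokens[0] in DATA_TYPES
        and tokens[1] == "id"
        and tokens[2] == ";"
    )
    if not matches:
        return None
    # Each entry is one sentential form of the derivation.
    return ["dec_stmt", "data_type id ;", f"{tokens[0]} id ;"]


steps_ok = derive_dec_stmt("int id ;".split())
steps_bad = derive_dec_stmt("char id , id ;".split())
```

Here steps_ok holds the three sentential forms of the derivation, while steps_bad is None because Grammar 1.1 has no way to produce a comma.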
Examples
1. DECLARATION STATEMENT
dec_stmt → data_type id_lst ;
data_type → int | float | char
id_lst → id id_lst | id
What are the start symbol, terminals and non-terminals of the above grammar?
Is the above id_lst the same as the following?
id_lst → id_lst id | id
But what if we want a comma-delimited id list?
Examples
2. FUNCTION CALL STATEMENT
fcall_stmt → call id ( param_lst ) ;
param_lst → param , param_lst | param
param → id | literal | num
Why is call in bold face?
Examples
fcall_stmt → call id ( param_lst_opt ) ;
param_lst_opt → param_lst | ε
Examples
3. EXPRESSIONS
expr → expr op expr | ( expr ) | - expr | id
op → + | - | * | /
or we can write the above grammar as follows:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → + | - | * | /
Can we derive the following from the above grammar?
Parse Tree
• A syntax tree or parse tree represents the syntactic
structure of a string according to some formal grammar.
• In a parse tree, the interior nodes are labeled by non-terminals and the leaf nodes by terminals.
Syntax Error Handling
Common programming errors can occur at many different levels.
• Lexical
• Examples include a lexeme for which no token can be generated, and unclosed literals.
• Syntactic
• Examples include misplaced semicolons or extra or missing braces, that is, "{" or "}".
• As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error.
• Semantic
• Include type mismatches between operators and operands.
• An example is a return statement in a Java method with result type void.
• A few semantic errors, such as type mismatches, can also be detected efficiently;
however, accurate detection of semantic and logical errors at compile time is in general a difficult task.
• Logical
• Can be anything from incorrect reasoning on the part of the programmer to the use
Error Handler
The error handler in a parser has goals that are simple to state but challenging to realize:
• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect
subsequent errors.
• Add minimal overhead to the processing of correct programs.
Error-Recovery Strategies
Once an error is detected, how should the parser recover?
• The simplest approach is for the parser to quit with an informative
error message when it detects the first error.
• Additional errors are often uncovered if the parser can restore itself to a state where processing of the input can continue with reasonable hopes that further processing will provide meaningful diagnostic information.
Panic-Mode Recovery
• With this method, on discovering an error, the parser
discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
• The synchronizing tokens are usually delimiters, such as
semicolon or }, whose role in the source program is clear and unambiguous.
• The compiler designer must select the synchronizing
tokens appropriate for the source language. While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the
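Panic-mode recovery can be sketched in Python as follows. This is a minimal illustration rather than a complete parser: it parses a sequence of declarations of the form data_type id ; and, on an error, skips input up to and including the next synchronizing token. The names parse_declarations, DATA_TYPES and SYNC_TOKENS are assumptions made for the example.

```python
DATA_TYPES = {"int", "float", "char"}   # assumed data types
SYNC_TOKENS = {";", "}"}                # designated synchronizing tokens


def parse_declarations(tokens):
    """Parse repeated 'data_type id ;' statements.
    Returns the list of token indexes at which an error was detected."""
    errors = []
    pos = 0
    while pos < len(tokens):
        for allowed in (DATA_TYPES, {"id"}, {";"}):
            if pos < len(tokens) and tokens[pos] in allowed:
                pos += 1
                continue
            # Error: report it, then discard symbols one at a time
            # until a synchronizing token is found.
            errors.append(pos)
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
            pos = min(pos + 1, len(tokens))  # also consume the sync token
            break
    return errors


found = parse_declarations("int id ; char id , id ; float id ;".split())
```

Only the second statement is reported: the parser detects the comma at index 5, skips to the semicolon, and then parses the third declaration normally.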
Phrase-Level Recovery
• On discovering an error, a parser may perform local
correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue.
• A typical local correction is to replace a comma by a
semicolon, delete an extraneous semicolon, or insert a missing semicolon.
• The choice of the local correction is left to the compiler
designer.
• Its major drawback is the difficulty it has in situations
Error Productions
• By anticipating common errors that might be encountered,
we can augment the grammar for the language at hand with productions that generate the erroneous constructs.
• The parser can then generate appropriate error
Global Correction
• Ideally, we would like a compiler to make as few changes
as possible in processing an incorrect input string.
• There are algorithms for choosing a minimal sequence of
changes to obtain a globally least-cost correction.
• Given an incorrect input string x and grammar G, these
algorithms will find a parse tree for a related string y, such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as
possible.
• Unfortunately, these methods are in general too costly to
Context-free Grammar
Example: If statement
stmt → if ( expr ) stmt else stmt
Context-free Grammar
Example: Arithmetic Expression
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
We may also write the same grammar as follows:
expr → expr op expr
| ( expr ) | - expr | id
op → + | - | * | /
We also use the shorthand:
E → E A E
| ( E ) | - E | id
where E stands for expr and A stands for op.
Left-most and Right-most derivations
Building Parse Tree
Sentence: “- ( id + id )”
E
⇒ - E
⇒ - ( E )
⇒ - ( E + E )
⇒ - ( id + E )
⇒ - ( id + id )
Building Parse Tree
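[Figure: the parse tree for “- ( id + id )” built step by step alongside the derivation: E alone; then E with children - and E; then the inner E with children ( E ); then that E with children E + E; finally each remaining E expands to id.]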
Ambiguity
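• A grammar is ambiguous if it produces more than one parse tree for some
sentence using the same scheme of derivation.
• Equivalently, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.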
Ambiguity Example
Grammar:
E → E + E
| E * E
| ( E ) | - E | id
Sentence: “id + id * id”
Left-most derivation - I
E
⇒ E + E
⇒ id + E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
Left-most derivation - II
E
⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
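The ambiguity of the grammar E → E + E | E * E | ( E ) | - E | id can also be checked mechanically by counting parse trees. The sketch below is specific to this one grammar, and the function name count_parse_trees is an assumption made for the example. It counts, for every span of the token list, in how many ways that span can be derived from E.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under
    E -> E + E | E * E | ( E ) | - E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i < 1:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":   # E -> ( E )
            total += count(i + 1, j - 1)
        if tokens[i] == "-":                            # E -> - E
            total += count(i + 1, j)
        for k in range(i + 1, j - 1):                   # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


trees = count_parse_trees("id + id * id".split())
```

For the sentence “id + id * id” the count is 2, one tree for each of the two leftmost derivations. Any count greater than 1 shows that the grammar is ambiguous.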
Ambiguity
QUIZ#3
You are given the following grammar:
E → E + E
| E * E
| ( E ) | - E | id
Also the following sentence:
Prove that the grammar is ambiguous by using:
• Left-most derivations
• Drawing parse trees
Ambiguity
Grammar of an if statement with optional else:
stmt → if expr then stmt
| if expr then stmt else stmt | other
Suppose E1, E2 are expressions and S1 and S2 are
statements. What will be the parse tree for the following sentence?
Sentence: “if E1 then if E2 then S1 else S2”
Two parse trees exist.
Parse tree 1 (the else belongs to the inner if):
stmt
  if expr(E1) then stmt
    if expr(E2) then stmt(S1) else stmt(S2)
Parse tree 2 (the else belongs to the outer if):
stmt
  if expr(E1) then stmt else stmt(S2)
    if expr(E2) then stmt(S1)
Left Factoring (Removing Ambiguity)
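stmt → if expr then stmt
| if expr then stmt else stmt | other
After left factoring:
stmt → if expr then stmt stmt' | other
stmt' → else stmt | ε
Note: left factoring only defers the choice between the two alternatives. The dangling else itself is usually resolved by matching each else with the nearest unmatched then.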
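Left factoring of the if-statement grammar can be sketched in Python as follows. This is a single-pass illustration under simple assumptions: alternatives are lists of symbols, the empty list stands for ε, and the names common_prefix and left_factor are made up for the example.

```python
def common_prefix(alternatives):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for symbols in zip(*alternatives):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor(head, alternatives):
    """One left-factoring pass over the alternatives of `head`.
    Returns a dict mapping each non-terminal to its alternatives."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None      # group by first symbol
        groups.setdefault(key, []).append(alt)

    result = {head: []}
    new_head = head + "'"
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)     # nothing to factor
            continue
        prefix = common_prefix(group)
        result[head].append(prefix + [new_head])
        # The new non-terminal derives whatever followed the prefix.
        result[new_head] = [alt[len(prefix):] for alt in group]
        new_head += "'"
    return result


factored = left_factor("stmt", [
    ["if", "expr", "then", "stmt"],
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["other"],
])
```

The result maps stmt to if expr then stmt stmt' | other, and stmt' to ε | else stmt.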
Left Recursion
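If we have a grammar of the form:
A → A α | β
then it can be rewritten without left recursion as:
A → β A'
A' → α A' | ε
Here α and β are strings of terminals and/or non-terminals, and β does not begin with A.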
Left Recursion
Consider the following grammar:
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Here,
E is an expression.
T is a term, which can be added or subtracted.
F is a factor, which can be multiplied or divided.
The above grammar respects operator precedence:
• In the parse tree, * and / are placed at lower levels than + and -, hence they are evaluated earlier.
Removing Left Recursion
Consider the following grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
After removing left recursion:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
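The rewrite of A → A α | β into A → β A' and A' → α A' | ε can be sketched in Python. The sketch handles only immediate left recursion of a single non-terminal; alternatives are lists of symbols, the empty list stands for ε, and the function name remove_left_recursion is an assumption made for the example.

```python
def remove_left_recursion(head, alternatives):
    """Eliminate immediate left recursion from the productions of `head`.
    Returns a dict mapping each non-terminal to its alternatives."""
    # The alphas: what follows the leading `head` in A -> A alpha.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # The betas: alternatives that do not start with `head`.
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}        # not left recursive
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


e_rules = remove_left_recursion("E", [["E", "+", "T"], ["T"]])
t_rules = remove_left_recursion("T", [["T", "*", "F"], ["F"]])
f_rules = remove_left_recursion("F", [["(", "E", ")"], ["id"]])
```

Applied to E and T this yields E → T E', E' → + T E' | ε, T → F T' and T' → * F T' | ε, while the productions of F are returned unchanged because they are not left recursive.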
Top down parser
• Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder, i.e. depth first.
• Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
• What should the grammar look like?
• Left factored
Bottom up Parser
• A bottom-up parse corresponds to the construction of a
parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
• In most cases a rightmost derivation is used in
bottom-up parsing.
• The parser basically constructs that derivation in reverse.
Reductions
• We can think of bottom-up parsing as the process of "reducing" a
string w to the start symbol of the grammar.
• At each reduction step, a specific substring matching the body of a
Example
Grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
Input sentence:
id * id
Reduction
id * id, F * id, T * id, T * F, T, E
Right-most derivation
E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id
Parser classes
• LL parser
• Scans input symbols from Left to right.
• Constructs a Leftmost derivation of the sentence.
• Top-down parser
• Grammar must NOT be left recursive, as a leftmost derivation is
performed.
• LR parser
• Scans input symbols from Left to right.
• Constructs a Rightmost derivation of the sentence (in reverse).
• Bottom-up parser
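• Unlike LL parsers, left-recursive grammars are allowed. Right-recursive grammars are also accepted, although they make the parser stack grow with the length of the input.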
Recursive Descent Parser
• It is a kind of top-down parser.
• It is built up from a set of mutually recursive procedures.
• Each procedure reflects a production rule, hence the resulting program reflects the CFG.
• The implementation is specific to a CFG, i.e. it is a non-generic implementation.
• In general it may involve backtracking.
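How does backtracking work?
• Consider input = id + id and the grammar:
E → T E' | T
E' → + T E' | ε
T → F T' | F
T' → * F T' | ε
F → ( E ) | id
[Figure: successive parse trees. The parser expands E → T E', then expands T down to F and tries F → ( E ). Since ( does not match the input symbol id, it backtracks and tries F → id, and so on.]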
Predictive Parser
• Using the next input symbol, this type of top-down
parser can predict which production must be chosen to produce that input symbol.
• No backtracking is involved.
• The implementation may be recursive or non-recursive, hence
there exist:
• Recursive predictive parser.
• Non-recursive predictive parser.
Recursive Predictive Parser
It is implemented with the help of mutually recursive procedures, as in a recursive descent parser, but to avoid backtracking, at each step we choose a recursive path based on the next input symbol.
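A recursive predictive parser can be sketched in Python for the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. There is one procedure per non-terminal, and each procedure picks its production by looking only at the next input symbol. The class name PredictiveParser and the helper accepts are assumptions made for the example.

```python
class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):                       # E -> T E'
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):                 # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.parse_T()
            self.parse_E_prime()
        # any other symbol: choose the epsilon production

    def parse_T(self):                       # T -> F T'
        self.parse_F()
        self.parse_T_prime()

    def parse_T_prime(self):                 # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.parse_F()
            self.parse_T_prime()

    def parse_F(self):                       # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    """True when the whole token list is a sentence of the grammar."""
    parser = PredictiveParser(tokens)
    try:
        parser.parse_E()
        parser.match("$")
        return True
    except SyntaxError:
        return False


ok = accepts("id + id * id".split())
bad = accepts("id + * id".split())
```

No procedure ever undoes a choice, so the input position only moves forward. This works because the grammar is neither left recursive nor in need of left factoring.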
Non-Recursive Predictive Parser
Rather than being implemented with mutually recursive procedures, it uses a parsing table for prediction.
Parsing Table
• It predicts which production of a non-terminal nt must be chosen
to produce the next input symbol s.
• For example, consider a function that returns a production for
a given pair of non-terminal and terminal:
string PredictProduction(nt, s);
• To create a parsing table we must know about the:
• CFG
Non-Recursive Predictive Parser
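First(x)
• Set of terminals with which the strings derived from a non-terminal x can start.
Follow(x)
• Set of terminals that can appear immediately after x in some sentential form.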
Non-Recursive Predictive Parser
CFG
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Parsing Table
SYMBOL   FIRST   FOLLOW
E        ( id    ) $
E'       + ε     ) $
T        ( id    + ) $
T'       * ε     + ) $
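The FIRST and FOLLOW sets of this grammar can be computed with a simple fixed-point iteration. The Python sketch below is one possible way to do it. The names GRAMMAR, EPS, first_of_sequence, compute_first and compute_follow are assumptions made for the example, and the empty list stands for the ε production.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)            # every symbol can vanish (or sequence is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:             # repeat until no set grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")     # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue               # only non-terminals have FOLLOW
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:            # the rest can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
```

The computed sets agree with the rows listed for E, E', T and T'. For F the same computation gives FIRST(F) = { (, id } and FOLLOW(F) = { *, +, ), $ }.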
Shift Reduce Parser
• Bottom-up parser
• LR parser.
Shift Reduce Parser
MOVES
• Shift j
• Append the next input symbol to the symbols.
• Push state j onto the stack.
• Reduce j
• Replace the top N symbols with their reduced form (the producer) according to
production no. j of the CFG, where N is the number of symbols in the body of that production.
• Also pop N states from the stack.
• Accept
• The parser accepts the input according to the action table.
• Reject
Shift Reduce Parser
Action-Goto Table
Moves of a Shift Reduce Parser
STACK              INPUT            ACTION
0                  id * id + id $   S5
0 id 5             * id + id $      R6: F->id
0 F 3              * id + id $      R4: T->F
0 T 2              * id + id $      S7
0 T 2 * 7          id + id $        S5
0 T 2 * 7 id 5     + id $           R6: F->id
0 T 2 * 7 F 10     + id $           R3: T->T*F
0 T 2              + id $           R2: E->T
0 E 1              + id $           S6
0 E 1 + 6          id $             S5
0 E 1 + 6 id 5     $                R6: F->id
0 E 1 + 6 F 3      $                R4: T->F
0 E 1 + 6 T 9      $                R1: E->E+T
0 E 1              $                Accept
Shift Reduce Parser Algorithm
1. Push 0 to STACK.
2. Append $ to INPUT.
3. Set X:=GetAction(STACK, INPUT)
4. If X is shift action Sj Then:
a) Move a symbol from INPUT to STACK.
b) Push j to STACK.
Else If X is reduce action Rj Then:
a) Set N equal to the no. of symbols in the body of the jth production rule.
b) Pop Nx2 elements from STACK.
c) Push the producer of the jth production on STACK.
d) Set Y:=GetGoto(STACK)
e) Push Y to STACK.
Else If X is ‘Accepted’ Then:
a) Write ‘Accepted’ and go to step 6.
Else:
a) Write ‘Rejected’ and go to step 6.
5. Go to step 3.
6. Exit.
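The algorithm can be sketched in Python as follows. The ACTION and GOTO tables are an assumption: they are the usual SLR tables for the grammar E → E + T | T, T → T * F | F, F → ( E ) | id, with production numbers 1 to 6 and state numbers chosen to agree with the moves listed earlier. The names PRODUCTIONS, ACTION, GOTO and shift_reduce_parse are made up for the example.

```python
# production number -> (producer, number of symbols in the body)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def shift_reduce_parse(tokens):
    """Return True if the tokens are accepted, False if rejected."""
    stack = [0]                              # step 1: push state 0
    inp = list(tokens) + ["$"]               # step 2: append $ to the input
    pos = 0
    while True:
        action = ACTION[stack[-1]].get(inp[pos])   # step 3
        if action is None:
            return False                     # rejected
        if action == "acc":
            return True                      # accepted
        if action[0] == "s":                 # shift Sj
            stack.append(inp[pos])           # move a symbol to the stack
            stack.append(int(action[1:]))    # push state j
            pos += 1
        else:                                # reduce Rj
            head, n = PRODUCTIONS[int(action[1:])]
            del stack[len(stack) - 2 * n:]   # pop N x 2 elements
            stack.append(head)               # push the producer
            stack.append(GOTO[stack[-2]][head])   # push the goto state


accepted = shift_reduce_parse("id * id + id".split())
rejected = shift_reduce_parse("id * +".split())
```

Symbols and states are interleaved on one stack, which is why a reduction pops N x 2 elements. After the pop, the state just below the newly pushed producer selects the row of the goto table.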