Compiler construction - Lecture 17 - 18 week 8 - 9 - Majid mumtaz department of computer science, ciit wah
A parser should try to determine that an error
has occurred as soon as possible.
• A suitable and comprehensive message should
be reported. “Missing semicolon on line 36” is
helpful, “unable to shift in state 425” is not.
• After an error has occurred, the parser must
pick a reasonable place to resume the parse.
31 trang |
Chia sẻ: nguyenlam99 | Lượt xem: 870 | Lượt tải: 0
Bạn đang xem trước 20 trang tài liệu Compiler construction - Lecture 17 - 18 week 8 - 9 - Majid mumtaz department of computer science, ciit wah, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
Compiler Construction
Lecture 17-18 week 8-9
Majid Mumtaz
1
Department of Computer Science, CIIT Wah
Recursive-Descent Parsing (RDP)
• A general form of Top-Down Parsing.
• It may require backtracking.
• Grammar should have eliminated left recursion
and left factored.
• In general form it can’t choose an A-production
easily.
• So we need to try all alternatives.
• If one failed the input pointer needs to be reset
and another alternative should be tried.
2
Left Recursion
• A grammar is left recursive if it has a
nonterminal A such that there is a derivation
A +A for some string .
• Top-down parsing methods cannot handle
left-recursive grammars.
• So, a transformation that eliminates left
recursion is needed.
• Fortunately, in many cases it is possible to
eliminate left recursion
3
Eliminating Left Recursion – Example 1
• Consider the following grammar which contains
immediate left recursion
A A | where neither nor starts with A
• could be replaced by the non-left recursive
productions
A A
A A |
Where A’ is a new non-terminal.
This accepts the same language but uses only right
recursion
4
Left Recursion – Example 2
• Consider the following grammar, contains immediate
left recursion
E E + T | E – T | T
T T * F | T / F | F
F number | (E) | id
• could be replaced by the non-left recursive productions
E TE
E +TE | -TE’ |
T FT
T *FT | /FT’ |
F number | (E) | id
5
Left Recursion - Not Immediate
• What do you say about the following
grammar, if the input string is Sda.
S Aa | b
A Ac | Sd |
6
Eliminating Left Recursion
7
Eliminating Left Recursion - Example
S Aa | b
A Ac | Sd |
8
Top-Down Parsing
• A Top-down parser tries to create a parse tree
from the root towards the leafs scanning input
from left to right
• It can also be viewed as finding a leftmost
derivation for an input string
• Example: id+id*id
9
E
lm
E
T E’
lm
E
T E’
F T’
lm
E
T E’
F T’
id
lm
E
T E’
F T’
id Ɛ
lm
E
T E’
F T’
id Ɛ
+ T E’
E TE
E +TE |
T FT
T *FT |
F (E) | id
Top-Down Parsing (contd.)
• Consider the following grammar:
S AB
A aA |
B b | bB
• The input string is aaab.
S
AB S AB
aAB A aA
aaAB A aA
aaaAB A aA
aaaεB A ε
aaab B b
10
Top-Down Parsing (contd.)
• In previous example, it was easy for us to see
which productions were appropriate because
we could see the entire string aaab.
• In a compiler’s parser, however, we don’t have
long-distance vision.
• We are usually limited to just one-symbol of
lookahead (is the next symbol coming up in
the input).
11
Backtracking
• Parsing would required to implement
backtracking.
• Based on the information the parser currently
has about the input, a decision is made to go
with one particular production.
• If this choice leads to a dead end, the parser
would have to backtrack to that decision
point, moving backwards through the input,
and start again making a different choice.
12
Backtracking - Example
• Consider the following grammar:
S bab | bA
A d | cA
• The input string is bcd.
S bcd Try S bab
bab bcd match b
ab cd dead-end, backtrack
S bcd Try S –> bA
bA bcd match b
A cd Try A –> d
d cd dead-end, backtrack
A cd Try A –> cA
cA cd match c
A d Try A –> d
d d match d
Success!
13
Drawbacks in Recursive Descent
Parser
• In recursive descent,
– At each step, many choices of productions
are available and each one need to use until
exact match
– Backtracking used to undo bad choices
14
Solution of Recursive descent parser
• Predictive parser that used LL(1) grammer
– In LL(1) At each step, only one choice of production can be used e.g.
S aBb and B c then according to predictive parser the solution is
S acb
There is an issue arises when common prefixes are used in production rule e.g.
Recall the grammar
E T + E | T
T int | int * T | ( E )
• Hard to predict because
– For T two productions start with int
– For E it is not clear how to predict
• We need to left-factor the grammar
15
Left factoring
• Left factoring is a grammar transformation
that is useful for producing a grammar
suitable for predictive or top-down parsing.
• Consider following grammar:
stmt -> if expr then stmt else stmt
| if expr then stmt
• On seeing input if it is not clear for the parser
which production to use
Left factoring (cont.)
• In general, if A 1 | 2 are two A-
productions, and the input begins with a
nonempty string derived from , we do not know
whether to expand A to 1 or to 2.
• However we may defer the decision by expanding
A to A.
• Then, after seeing the input derived from , we
expand A to 1 or to 2. That is left factoring.
• The original production will become:
A A
A 1 | 2
17
Left factoring (cont.)
• The following grammar abstracts the dangling-
else problem:
S iEtS | iEtSeS | a
E b
• and the left factored grammar version is:
S iEtSS | a
S eS |
E b
Predictive Parsing
• Basic idea:
– Given A | , the parser should be able to choose between or
• FIRST Sets
– Definition: For any string α of grammar symbols, we define FIRST(α) to
be the set of terminals that occur as the first symbol in a string derived
from α. So, if α⇒*xQ for x a terminal and Q a string, then x is in
FIRST(α). In addition if α⇒*ε, then ε is in FIRST(α).
• FOLLOW Sets
– Definition: For any non-terminal A, FOLLOW(A) is the set of terminals
x, that can appear immediately to the right of A in a sentential form.
Formally, it is the set of terminals x, such that S⇒*αAxβ. In addition, if
A can be the rightmost symbol in a sentential form, the endmarker $ is
in FOLLOW(A).
19
Predictive Parsing
• It uses LL(1) grammar
– parser scans the input Left-to-right, generates a Leftmost
derivation and uses 1 symbol of lookahead.
– LL(1) has no left-recursive productions and has been left-
factored.
• It doesn’t require backtracking.
• There exist many grammars that cannot be modified to
become LL(1).
• In such cases, another parsing technique must be
employed, or special rules must be embedded into the
predictive parser.
• How can we avoid backtracking? Preprocessing.
20
Aided for Predictive Parser
• FIRST and FOLLOW are used in construction of
a predictive parser.
• These functions used to find out the set of
terminals that help in parsing.
• These functions fills up the entries in
predictive parsing table whenever possible.
21
Rules for FIRST Sets Calculation
1. If X is a terminal then First(X) is just {X}
2. If there is a Production X → ε then add ε to
first(X)
3. If there is a Production X → Y1Y2..Yk then add
first(Y1Y2..Yk) to first(X)
4. First(Y1Y2..Yk) is either
– First(Y1) (if First(Y1) doesn't contain ε)
– OR (if First(Y1) does contain ε) then First (Y1Y2..Yk) is
everything in First(Y1) as well as
everything in First(Y2..Yk)
22
Computing FIRST – Example 1
• Consider the following grammar:
S AB
A Ca |
B cB
B aACB |
C b |
23
FIRST(C) = {b ε}
FIRST(B‘) = {a ε}
FIRST(B) = {c}
FIRST(A) = {b a }
FIRST(S) = {b a c}
Computing FIRST – Example 2
• Consider the following grammar:
E TE
E +TE |
T FT
T *FT |
F (E) | id
24
FIRST(E)
FIRST(T)
FIRST(F)
{( id}
FIRST(E)
{+ }
FIRST(T)
{* }
Rules for Computing FOLLOW Sets
• To compute FOLLOW(A) for all nonterminals A, apply
following rules until nothing can be added to any FOLLOW
set:
1. First put $ (the end of input marker) in Follow(S) (S is the start
symbol)
2. If there is a production A → aBβ, (where a can be a whole
string) then everything in FIRST(β) except for ε is placed in
FOLLOW(B).
3. If there is a production A → aB, then everything in FOLLOW(A)
is in FOLLOW(B)
4. If there is a production A → aBβ, where FIRST(β) contains
ε, then everything in FOLLOW(A) is in FOLLOW(B)
Note: are any strings of grammar symbols
25
Computing FOLLOW – Example 1
• Consider the following grammar:
S AB
A Ca |
B cB
B aACB |
C b |
26
FOLLOW(S) = {$}
FOLLOW(B) = {$}
FOLLOW(B) = {$}
FOLLOW(C) = {a $}
FOLLOW(A) = {c b a $}
Computing FOLLOW – Example 2
• Consider the following grammar:
E TE
E +TE |
T FT
T *FT |
F (E) | id
27
FOLLOW(E)
FOLLOW(E)
{) $}
FOLLOW(T)
FIRST(E)
{+ ) $}
FOLLOW(F)
{+ * ) $}
Predictive Parsing Table
• Predictive Parsing Table is a 2D array M[A,a] where A
is a nonterminal symbol and a is a terminal or $.
• It uses LL(1) grammar.
• One dimension for current non-terminal to expand.
• One dimension for next token.
• Table entry contains one production.
28
Non-
Terminals
INPUT SYMBOLS
a b e i t $
S
S
E
Algorithm to Construct Predictive
Parsing Table
• For each production A of the grammar,
do:
• For each terminal a in FIRST() add A to
M[A, a].
• If ∈ FIRST(), add A to M[A,b] for each
terminal b in FOLLOW(A).
• If ∈ FIRST(), and $ is in FOLLOW(A), add A
to M[A,$].
• Make each undefined entry of M be ERROR.
29
Parsing Table - Example
• Consider the following grammar:
E TE FIRST(E) = { ( id } FOLLOW(E) = { ) $ }
E +TE | FIRST(E) = { + } FOLLOW(E) = { ) $ }
T FT FIRST(T) = { ( id } FOLLOW(T) = { + ) $ }
T *FT | FIRST(T) = { * } FOLLOW(T) = { + ) $ }
F (E) | id FIRST(F) = { ( id } FOLLOW(F) = { + * ) $ }
Non-
Termi
nals
INPUT SYMBOLS
id + * ( ) $
E E TE E TE
E E +TE E E
T T FT T FT
T T T *FT T T
F F id F (E)
Error Detection and Recovery
• A parser should try to determine that an error
has occurred as soon as possible.
• A suitable and comprehensive message should
be reported. “Missing semicolon on line 36” is
helpful, “unable to shift in state 425” is not.
• After an error has occurred, the parser must
pick a reasonable place to resume the parse.
• A parser should avoid cascading errors.
31
Các file đính kèm theo tài liệu này:
- cc_week8_9_lec17_18_5413.pdf