Compiler construction - Lecture 17 - 18 week 8 - 9 - Majid mumtaz department of computer science, ciit wah

A parser should try to determine that an error has occurred as soon as possible. • A suitable and comprehensive message should be reported. “Missing semicolon on line 36” is helpful, “unable to shift in state 425” is not. • After an error has occurred, the parser must pick a reasonable place to resume the parse.

pdf31 trang | Chia sẻ: nguyenlam99 | Lượt xem: 906 | Lượt tải: 0download
Bạn đang xem trước 20 trang tài liệu Compiler construction - Lecture 17 - 18 week 8 - 9 - Majid mumtaz department of computer science, ciit wah, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên
Compiler Construction Lecture 17-18 week 8-9 Majid Mumtaz 1 Department of Computer Science, CIIT Wah Recursive-Descent Parsing (RDP) • A general form of Top-Down Parsing. • It may require backtracking. • Grammar should have eliminated left recursion and left factored. • In general form it can’t choose an A-production easily. • So we need to try all alternatives. • If one failed the input pointer needs to be reset and another alternative should be tried. 2 Left Recursion • A grammar is left recursive if it has a nonterminal A such that there is a derivation A +A for some string . • Top-down parsing methods cannot handle left-recursive grammars. • So, a transformation that eliminates left recursion is needed. • Fortunately, in many cases it is possible to eliminate left recursion 3  Eliminating Left Recursion – Example 1 • Consider the following grammar which contains immediate left recursion A  A |  where neither  nor  starts with A • could be replaced by the non-left recursive productions A A A  A |  Where A’ is a new non-terminal. This accepts the same language but uses only right recursion 4 Left Recursion – Example 2 • Consider the following grammar, contains immediate left recursion E  E + T | E – T | T T  T * F | T / F | F F  number | (E) | id • could be replaced by the non-left recursive productions E  TE E  +TE | -TE’ |  T  FT T  *FT | /FT’ |  F  number | (E) | id 5 Left Recursion - Not Immediate • What do you say about the following grammar, if the input string is Sda. S  Aa | b A  Ac | Sd |  6 Eliminating Left Recursion 7 Eliminating Left Recursion - Example S  Aa | b A  Ac | Sd |  8 Top-Down Parsing • A Top-down parser tries to create a parse tree from the root towards the leafs scanning input from left to right • It can also be viewed as finding a leftmost derivation for an input string • Example: id+id*id 9 E lm E T E’ lm E T E’ F T’ lm E T E’ F T’ id lm E T E’ F T’ id Ɛ lm E T E’ F T’ id Ɛ + T E’ E  TE E  +TE |  T  FT T  *FT |  F  (E) | id Top-Down Parsing (contd.) • Consider the following grammar: S  AB A  aA |  B  b | bB • The input string is aaab. S  AB S  AB  aAB A  aA  aaAB A  aA  aaaAB A  aA  aaaεB A  ε  aaab B  b 10 Top-Down Parsing (contd.) • In previous example, it was easy for us to see which productions were appropriate because we could see the entire string aaab. • In a compiler’s parser, however, we don’t have long-distance vision. • We are usually limited to just one-symbol of lookahead (is the next symbol coming up in the input). 11 Backtracking • Parsing would required to implement backtracking. • Based on the information the parser currently has about the input, a decision is made to go with one particular production. • If this choice leads to a dead end, the parser would have to backtrack to that decision point, moving backwards through the input, and start again making a different choice. 12 Backtracking - Example • Consider the following grammar: S  bab | bA A  d | cA • The input string is bcd. S bcd Try S  bab bab bcd match b ab cd dead-end, backtrack S bcd Try S –> bA bA bcd match b A cd Try A –> d d cd dead-end, backtrack A cd Try A –> cA cA cd match c A d Try A –> d d d match d Success! 13 Drawbacks in Recursive Descent Parser • In recursive descent, – At each step, many choices of productions are available and each one need to use until exact match – Backtracking used to undo bad choices 14 Solution of Recursive descent parser • Predictive parser that used LL(1) grammer – In LL(1) At each step, only one choice of production can be used e.g. S aBb and B  c then according to predictive parser the solution is S  acb There is an issue arises when common prefixes are used in production rule e.g. Recall the grammar E  T + E | T T  int | int * T | ( E ) • Hard to predict because – For T two productions start with int – For E it is not clear how to predict • We need to left-factor the grammar 15 Left factoring • Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing. • Consider following grammar: stmt -> if expr then stmt else stmt | if expr then stmt • On seeing input if it is not clear for the parser which production to use Left factoring (cont.) • In general, if A 1 | 2 are two A- productions, and the input begins with a nonempty string derived from , we do not know whether to expand A to 1 or to 2. • However we may defer the decision by expanding A to A. • Then, after seeing the input derived from , we expand A to 1 or to 2. That is left factoring. • The original production will become: A A A  1 | 2 17 Left factoring (cont.) • The following grammar abstracts the dangling- else problem: S  iEtS | iEtSeS | a E  b • and the left factored grammar version is: S  iEtSS | a S  eS |  E  b Predictive Parsing • Basic idea: – Given A  |  , the parser should be able to choose between  or  • FIRST Sets – Definition: For any string α of grammar symbols, we define FIRST(α) to be the set of terminals that occur as the first symbol in a string derived from α. So, if α⇒*xQ for x a terminal and Q a string, then x is in FIRST(α). In addition if α⇒*ε, then ε is in FIRST(α). • FOLLOW Sets – Definition: For any non-terminal A, FOLLOW(A) is the set of terminals x, that can appear immediately to the right of A in a sentential form. Formally, it is the set of terminals x, such that S⇒*αAxβ. In addition, if A can be the rightmost symbol in a sentential form, the endmarker $ is in FOLLOW(A). 19 Predictive Parsing • It uses LL(1) grammar – parser scans the input Left-to-right, generates a Leftmost derivation and uses 1 symbol of lookahead. – LL(1) has no left-recursive productions and has been left- factored. • It doesn’t require backtracking. • There exist many grammars that cannot be modified to become LL(1). • In such cases, another parsing technique must be employed, or special rules must be embedded into the predictive parser. • How can we avoid backtracking? Preprocessing. 20 Aided for Predictive Parser • FIRST and FOLLOW are used in construction of a predictive parser. • These functions used to find out the set of terminals that help in parsing. • These functions fills up the entries in predictive parsing table whenever possible. 21 Rules for FIRST Sets Calculation 1. If X is a terminal then First(X) is just {X} 2. If there is a Production X → ε then add ε to first(X) 3. If there is a Production X → Y1Y2..Yk then add first(Y1Y2..Yk) to first(X) 4. First(Y1Y2..Yk) is either – First(Y1) (if First(Y1) doesn't contain ε) – OR (if First(Y1) does contain ε) then First (Y1Y2..Yk) is everything in First(Y1) as well as everything in First(Y2..Yk) 22 Computing FIRST – Example 1 • Consider the following grammar: S  AB A  Ca |  B  cB B  aACB |  C  b |  23 FIRST(C) = {b ε} FIRST(B‘) = {a ε} FIRST(B) = {c} FIRST(A) = {b a } FIRST(S) = {b a c} Computing FIRST – Example 2 • Consider the following grammar: E  TE E  +TE |  T  FT T  *FT |  F  (E) | id 24 FIRST(E) FIRST(T) FIRST(F) {( id} FIRST(E) {+ } FIRST(T) {* } Rules for Computing FOLLOW Sets • To compute FOLLOW(A) for all nonterminals A, apply following rules until nothing can be added to any FOLLOW set: 1. First put $ (the end of input marker) in Follow(S) (S is the start symbol) 2. If there is a production A → aBβ, (where a can be a whole string) then everything in FIRST(β) except for ε is placed in FOLLOW(B). 3. If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B) 4. If there is a production A → aBβ, where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B) Note:  are any strings of grammar symbols 25 Computing FOLLOW – Example 1 • Consider the following grammar: S  AB A  Ca |  B  cB B  aACB |  C  b |  26 FOLLOW(S) = {$} FOLLOW(B) = {$} FOLLOW(B) = {$} FOLLOW(C) = {a $} FOLLOW(A) = {c b a $} Computing FOLLOW – Example 2 • Consider the following grammar: E  TE E  +TE |  T  FT T  *FT |  F  (E) | id 27 FOLLOW(E) FOLLOW(E) {) $} FOLLOW(T) FIRST(E) {+ ) $} FOLLOW(F) {+ * ) $} Predictive Parsing Table • Predictive Parsing Table is a 2D array M[A,a] where A is a nonterminal symbol and a is a terminal or $. • It uses LL(1) grammar. • One dimension for current non-terminal to expand. • One dimension for next token. • Table entry contains one production. 28 Non- Terminals INPUT SYMBOLS a b e i t $ S S E Algorithm to Construct Predictive Parsing Table • For each production A  of the grammar, do: • For each terminal a in FIRST() add A  to M[A, a]. • If  ∈ FIRST(), add A  to M[A,b] for each terminal b in FOLLOW(A). • If  ∈ FIRST(), and $ is in FOLLOW(A), add A  to M[A,$]. • Make each undefined entry of M be ERROR. 29 Parsing Table - Example • Consider the following grammar: E  TE FIRST(E) = { ( id } FOLLOW(E) = { ) $ } E  +TE |  FIRST(E) = { +  } FOLLOW(E) = { ) $ } T  FT FIRST(T) = { ( id } FOLLOW(T) = { + ) $ } T  *FT |  FIRST(T) = { *  } FOLLOW(T) = { + ) $ } F  (E) | id FIRST(F) = { ( id } FOLLOW(F) = { + * ) $ } Non- Termi nals INPUT SYMBOLS id + * ( ) $ E E  TE E  TE E E  +TE E   E   T T  FT T  FT T T   T  *FT T   T   F F  id F  (E) Error Detection and Recovery • A parser should try to determine that an error has occurred as soon as possible. • A suitable and comprehensive message should be reported. “Missing semicolon on line 36” is helpful, “unable to shift in state 425” is not. • After an error has occurred, the parser must pick a reasonable place to resume the parse. • A parser should avoid cascading errors. 31

Các file đính kèm theo tài liệu này:

  • pdfcc_week8_9_lec17_18_5413.pdf
Tài liệu liên quan