CSE411

CSE411: Introduction to Compilers

Predictive Parsing

Jooyong Yi (UNIST)

2024 Spring
CSE411

Recursive Descent Parsing

// Grammar G
E: E + E | E * E | (E) | int
  • Problem?
2024 Spring
CSE411

Ambiguous Grammar (Precedence)

E: E + E | E * E | (E) | int
  • The parse tree of int + int * int?
              E                        E
            / | \                    / | \
           E  *  E       OR         E  +  E
          /|\   |                   |    /|\
         E + E  int                int  E * E
         |   |                          |   |
        int  int                       int  int
2024 Spring
CSE411

Disambiguating Ambiguous Grammar

E: E + E | E * E | (E) | int

==>

E: E + E | T
T: T * T | F
F: (E) | int
              E                         E
            / | \                     / | \
           E  +  E       ==>         E  +  E
           |    /|\                        |
          int  E * E                       T
               |   |                      /|\    
              int  int                   T * T
                                         |   |
                                         F   F
                                         |   |
                                        int int
2024 Spring
CSE411

Ambiguous Grammar (Associativity)

E: E + E | T
T: T * T | F
F: (E) | int

Consider: int + int + int

         E                             E
     /   |  \                       /  |  \
    E    +   E        OR           E   +   E 
 /  |  \     |                     |     / | \
E   +   E   int                   int   E  +  E
|       |                               |     |
int    int                             int    int
2024 Spring
CSE411

Disambiguating Ambiguous Grammar

E: E + E | T           E: E + T | T
T: T * T | F   ==>     T: T * F | F
F: (E) | int           F: (E) | int
           E                         E               
       /   |  \                  /   |  \
      E    +   E     ==>        E    +   T
   /  |  \     |             /  |  \     |
  E   +   E   int           E   +  T     F
  |       |                 |      |     |
 int     int                T      F     int
                            |      |
                            F     int
                            |
                           int
2024 Spring
CSE411

Elimination of Immediate Left Recursion

  • is a nonterminal (i.e., ).
  • and do not start with .
2024 Spring
CSE411

Elimination of Immediate Left Recursion

E: E + T | T        E : T E'
T: T * F | F   ==>  E': + T E' | ε
F: (E) | int        T : F T'
                    T': * F T' | ε
                    F: (E) | int         

2024 Spring
CSE411

Recursive Descent Parsing

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
val tok = ref (getToken())
fun advance() = tok := getToken()
fun eat(t) = if t = !tok then tok := advance() else error()

fun E() = (T(); E'())
and E'() = case !tok (* lookahead token *)
           of PLUS => (eat(PLUS); T(); E'())
            | ? => () (* do nothing *)
and T() = (F(); T'())
and T'() = case !tok (* lookahead token *)
           of TIMES => (eat(TIMES); F(); T'())
            | ? => () (* do nothing *)
and F() = case !tok (* lookahead token *)
          of INT => eat(INT)
           | LPAREN => (eat(LPAREN); E(); eat(RPAREN))                                                             
  • Parse (int + int) * int
2024 Spring
CSE411

Without Lookahead

fun E() = (T(); E'())
and E'() = (eat(PLUS); T(); E'())
          | () (* do nothing *)
and T() = (F(); T'())
and T'() = (eat(TIMES); F(); T'())
           | () (* do nothing *)
and F() = eat(INT)
          | (eat(LPAREN); E(); eat(RPAREN))                                                             
  • Parse (int + int) * int
2024 Spring
CSE411

Illegal Input

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
val tok = ref (getToken())
fun advance() = tok := getToken()
fun eat(t) = if t = !tok then tok := advance() else error()

fun E() = (T(); E'())
and E'() = case !tok (* lookahead token *)
           of PLUS => (eat(PLUS); T(); E'())
            | ? => () (* do nothing *)
and T() = (F(); T'())
and T'() = case !tok (* lookahead token *)
           of TIMES => (eat(TIMES); F(); T'())
            | ? => () (* do nothing *)
and F() = case !tok (* lookahead token *)
          of INT => eat(INT)
           | LPAREN => (eat(LPAREN); E(); eat(RPAREN))                                                             
  • Parse (int +* int) * int
2024 Spring
CSE411

LL(1) Parsing Table

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |
T  |  F T' |        |       | F T'  |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |
F  |  int  |        |       |  (E)  |     |     |
-------------------------------------------------   
  • Parse (int +* int) * int
2024 Spring
CSE411

How to construct the LL(1) parsing table?

2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------     -------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |        |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------     -------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |     E  |       |        |       |       |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |     E' |       |        |       |       |     |     |
T  |  F T' |        |       | F T'  |     |     |     T  |       |        |       |       |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |     T' |       |        |       |       |     |     |
F  |  int  |        |       |  (E)  |     |     |     F  |       |        |       |       |     |     |
-------------------------------------------------     -------------------------------------------------   
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------     -------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |        |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------     -------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |     E  |       |        |       |       |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |     E' |       |        |       |       |     |     |
T  |  F T' |        |       | F T'  |     |     |     T  |       |        |       |       |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |     T' |       |        |       |       |     |     |
F  |  int  |        |       |  (E)  |     |     |     F  |       |        |       |       |     |     |
-------------------------------------------------     -------------------------------------------------   
  • What terminal can T E' start with?
2024 Spring
CSE411

FIRST(T E')

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
  • FIRST(T E') = FIRST(T) = FIRST(F T') = FIRST(F) = {int, (}
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------     -------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |        |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------     -------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |     E  |  T E' |        |       |  T E' |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |     E' |       |        |       |       |     |     |
T  |  F T' |        |       | F T'  |     |     |     T  |       |        |       |       |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |     T' |       |        |       |       |     |     |
F  |  int  |        |       |  (E)  |     |     |     F  |       |        |       |       |     |     |
-------------------------------------------------     -------------------------------------------------   
  • FIRST(T E') = FIRST(T) = FIRST(F T') = FIRST(F) = {int, (}
2024 Spring
CSE411

Computing FIRST

FIRST() for is defined as follows:

  • If (i.e., is a terminal), then
  • If is a production of the given grammar , then
  • Suppose (i.e., is a nonterminal) and production (where ) exists in the given grammar .
    • If , then
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------     -------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |        |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------     -------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |     E  |  T E' |        |       |  T E' |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |     E' |       | + T E' |       |       |     |     |
T  |  F T' |        |       | F T'  |     |     |     T  |  F T' |        |       |  F T' |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |     T' |       |        | * F T'|       |     |     |
F  |  int  |        |       |  (E)  |     |     |     F  |  int  |        |       |  (E)  |     |     |
-------------------------------------------------     -------------------------------------------------   
  • FIRST(+ T E') = {+}
  • FIRST(F T') = FIRST(F) = {int, (}
  • FIRST(* F T') = {*}
  • FIRST(int) = {int}, FIRST((E)) = {(}
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
Consider: int + int
              ↑

E => T E' => F T' E' => int T' E'

                   E
                /     \
              T        E'                                                                                                                
             / \     
            F  T'   
            |  
          int  
  • For T', which production should be used?
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
Consider: int + int
              ↑

E => T E' => F T' E' => int T' E'

                   E
                /     \
              T        E'                                                                                                                
             / \     
            F  T'   
            |  
          int  
  • For T', which production should be used? ε
  • How do we know that?
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
Consider: int + int
              ↑

E => T E' => F T' E' => int T' E'

                   E
                /     \
              T        E'                                                                                                                
             / \     
            F  T'   
            |  
          int  
  • For T', which production should be used? ε
  • How do we know that? T' is followed by +.
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------     -------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |        |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------     -------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |     E  |  T E' |        |       |  T E' |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |     E' |       | + T E' |       |       |     |     |
T  |  F T' |        |       | F T'  |     |     |     T  |  F T' |        |       |  F T' |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |     T' |       |  ε     | * F T'|       |     |     |
F  |  int  |        |       |  (E)  |     |     |     F  |  int  |        |       |  (E)  |     |     |
-------------------------------------------------     -------------------------------------------------   
  • T' is followed by +; in other words, + ∈ FOLLOW(T')
2024 Spring
CSE411

LL(1) Parsing Table Construction

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------     -------------------------------------------------
   |  int  |  +     |   *   |  (    |  )  |  $  |        |  int  |  +     |   *   |  (    |  )  |  $  |
-------------------------------------------------     -------------------------------------------------   
E  |  T E' |        |       | T E'  |     |     |     E  |  T E' |        |       |  T E' |     |     |
E' |       | + T E' |       |       |  ε  |  ε  |     E' |       | + T E' |       |       |  ε  |  ε  |
T  |  F T' |        |       | F T'  |     |     |     T  |  F T' |        |       |  F T' |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |     T' |       |  ε     | * F T'|       |  ε  |  ε  |
F  |  int  |        |       |  (E)  |     |     |     F  |  int  |        |       |  (E)  |     |     |
-------------------------------------------------     -------------------------------------------------   
  • FOLLOW(E') = {), $}
  • FOLLOW(T') = {+, ), $}
2024 Spring
CSE411

LL(1) Parsing Table Construction Algorithm

  • For each production in grammar do:
    • for each terminal do:
    • If , for each
2024 Spring
CSE411

Computing FOLLOW

for is defined as follows:

  • If is the start nonterminal, then .
  • If there exists a production , then _________
2024 Spring
CSE411

Computing FOLLOW

for is defined as follows:

  • If is the start nonterminal, then .
  • If there exists a production , then
2024 Spring
CSE411

Computing FOLLOW

for is defined as follows:

  • If is the start nonterminal, then .
  • If there exists a production , then
  • If there exists a production , then __________
2024 Spring
CSE411

Computing FOLLOW

for is defined as follows:

  • If is the start nonterminal, then .
  • If there exists a production , then
  • If there exists a production , then
2024 Spring
CSE411

Computing FOLLOW

for is defined as follows:

  • If is the start nonterminal, then .
  • If there exists a production , then
  • If there exists a production , then
  • If there exists a production and , then
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------
    |   FIRST    |    FOLLOW    
---------------------------------    
 E  |            |
 E' |            |
 T  |            |
 T' |            |
 F  |            |
---------------------------------                                                                            
  • If (i.e., is a terminal), then
  • If is a production of the given grammar , then
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------
    |   FIRST    |    FOLLOW    
---------------------------------    
 E  |            |
 E' | {+, ε}     |
 T  |            |
 T' | {*, ε}     |
 F  | {int, (}   |
---------------------------------                                                                            
  • If (i.e., is a terminal), then
  • If is a production of the given grammar , then
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------
    |   FIRST    |    FOLLOW    
---------------------------------    
 E  |            |
 E' | {+, ε}     |
 T  |            |
 T' | {*, ε}     |
 F  | {int, (}   |
---------------------------------                                                                            
  • Suppose (i.e., is a nonterminal) and production (where ) exists in the given grammar .

    • If , then
  • FIRST(F) FIRST(T) FIRST(E)

2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------
    |   FIRST    |    FOLLOW    
---------------------------------    
 E  | {int, (}   |
 E' | {+, ε}     |
 T  |            |
 T' | {*, ε}     |
 F  | {int, (}   |
---------------------------------                                                                            
  • Suppose (i.e., is a nonterminal) and production (where ) exists in the given grammar .

    • If , then
  • FIRST(F) FIRST(T) FIRST(E)

2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------
    |   FIRST    |    FOLLOW    
---------------------------------    
 E  | {int, (}   |
 E' | {+, ε}     |
 T  | {int, (}   |
 T' | {*, ε}     |
 F  | {int, (}   |
---------------------------------                                                                            
  • Suppose (i.e., is a nonterminal) and production (where ) exists in the given grammar .

    • If , then
  • FIRST(F) FIRST(T)

2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------
    |   FIRST    |    FOLLOW    
---------------------------------    
 E  | {int, (}   |  {$}
 E' | {+, ε}     |
 T  | {int, (}   |
 T' | {*, ε}     |
 F  | {int, (}   |
---------------------------------                                                                            
  • If is the start nonterminal, then .
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------            ---------------------------------
    |   FIRST    |    FOLLOW                     |   FIRST    |    FOLLOW    
---------------------------------            ---------------------------------    
 E  | {int, (}   |  {$}                       E  | {int, (}   |  {$, )}
 E' | {+, ε}     |                            E' | {+, ε}     | 
 T  | {int, (}   |  {+}               =>      T  | {int, (}   |  {+} 
 T' | {*, ε}     |                            T' | {*, ε}     | 
 F  | {int, (}   |  {*}                       F  | {int, (}   |  {*}
---------------------------------            ---------------------------------                                               
  • If there exists a production , then
    • FIRST(E') \ {ε} FOLLOW(T)
    • FIRST(T') \ {ε} FOLLOW(F)
    • FIRST()) \ {ε} FOLLOW(E)
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------            ---------------------------------
    |   FIRST    |    FOLLOW                     |   FIRST    |    FOLLOW    
---------------------------------            ---------------------------------    
 E  | {int, (}   |  {$, )}                    E  | {int, (}   |  {$, )}
 E' | {+, ε}     |                            E' | {+, ε}     |  {$, )} 
 T  | {int, (}   |  {+}               =>      T  | {int, (}   |  {+} 
 T' | {*, ε}     |                            T' | {*, ε}     |  {+}
 F  | {int, (}   |  {*}                       F  | {int, (}   |  {*}
---------------------------------            ---------------------------------                                               
  • If there exists a production , then
    • FOLLOW(E) FOLLOW(E')
    • FOLLOW(T) FOLLOW(T')
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------            ---------------------------------
    |   FIRST    |    FOLLOW                     |   FIRST    |    FOLLOW    
---------------------------------            ---------------------------------    
 E  | {int, (}   |  {$, )}                    E  | {int, (}   |  {$, )}
 E' | {+, ε}     |  {$, )}                    E' | {+, ε}     |  {$, )} 
 T  | {int, (}   |  {+}               =>      T  | {int, (}   |  {+, $, )}
 T' | {*, ε}     |  {+}                       T' | {*, ε}     |  {+}
 F  | {int, (}   |  {*}                       F  | {int, (}   |  {*, $, )}
---------------------------------            ---------------------------------                                               
  • If there exists a production and , then
    • FOLLOW(E) FOLLOW(T)
    • FOLLOW(T) FOLLOW(F)
2024 Spring
CSE411

FIRST and FOLLOW

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
---------------------------------            ---------------------------------
    |   FIRST    |    FOLLOW                     |   FIRST    |    FOLLOW    
---------------------------------            ---------------------------------    
 E  | {int, (}   |  {$, )}                    E  | {int, (}   |  {$, )}
 E' | {+, ε}     |  {$, )}                    E' | {+, ε}     |  {$, )} 
 T  | {int, (}   |  {+, $, )}         =>      T  | {int, (}   |  {+, $, )}
 T' | {*, ε}     |  {+}                       T' | {*, ε}     |  {+, $, )}
 F  | {int, (}   |  {*, $, )}                 F  | {int, (}   |  {*, $, )}
---------------------------------            ---------------------------------                                               
  • If there exists a production , then
    • FOLLOW(E) FOLLOW(E')
    • FOLLOW(T) FOLLOW(T')
2024 Spring
CSE411

LL(1) Parsing

  • The first L: scanning from left to right.
  • The second L: using the left-most derivation.
  • 1: using one lookahead token.
2024 Spring
CSE411

Reflection

2024 Spring
CSE411

CFG as a Parser Specification Language

                  CFG
                   ↓
            |-------------|    
            |    Parser   |        
            | Synthesizer |
            |-------------|
                   |
              synthesizes   
                   ↓
              |--------|            
    program → | Parser | → parse tree
              |--------|                          
2024 Spring
CSE411

CFG and Pushdown Automata (PDA)

Theory: For any context-free grammar , there exists an non-deterministic pushdown automata (NPDA) accepting .

2024 Spring
CSE411

Non-deterministic PDA (NPDA)

Input
□□□□□□□□
  ↓
|---------|    Stack
| Control | ⇔  |__|
|  Unit   |    |__|
|---------|    |__|
2024 Spring
CSE411

Non-deterministic PDA (NPDA)

  • NPDA
    • : a finite set of internal states of the control unit
    • : the input alphabet
    • : the stack alphabet
    • : the transition function
    • : the initial state of the control unit
    • : the stack start symbol
    • : the set of final states
2024 Spring
CSE411

Deterministic PDA (DPDA)

  • DPDA
    • : a finite set of internal states of the control unit
    • : the input alphabet
    • : the stack alphabet
    • : the transition function
      • If is defined, then should not be defined for every .
    • : the initial state of the control unit
    • : the stack start symbol
    • : the set of final states
2024 Spring
CSE411

CFG NPDA DPDA

  • This does not work. Why?

  • NPDA -X-> DPDA

    • The expressive power of NPDA and DPDA are not equivalent.
    • NPDA is more expressive.
2024 Spring
CSE411

Revisiting LL(1) Parsing Table

E : T E'
E': + T E' | ε
T : F T'
T': * F T' | ε
F: int | (E)                                                                                                                
-------------------------------------------------      
   |  int  |  +     |   *   |  (    |  )  |  $  |      
-------------------------------------------------      
E  |  T E' |        |       | T E'  |     |     |      
E' |       | + T E' |       |       |  ε  |  ε  |
T  |  F T' |        |       | F T'  |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |
F  |  int  |        |       |  (E)  |     |     |
-------------------------------------------------                                                                             
Parse `int + int`

(input, stack): 
(int, E$) => (int, T E'$) => (int, F T' E'$) => (int, int T' E'$) => (+, T' E'$) => (+, E'$) => (+, + T E'$) => (int, T E'$)
=> (int, F T' E'$) => (int, int T' E'$) => ($, T' E'$) => ($, E'$) => ($, $) => accept
2024 Spring
CSE411

Revisiting LL(1) Parsing Table

Input
□□□□□□□□
  ↓
|---------|    Stack
| Control | ⇔  |__|
|  Unit   |    |__|
|---------|    |__|                                                                                                        
-------------------------------------------------      
   |  int  |  +     |   *   |  (    |  )  |  $  |      
-------------------------------------------------      
E  |  T E' |        |       | T E'  |     |     |      
E' |       | + T E' |       |       |  ε  |  ε  |
T  |  F T' |        |       | F T'  |     |     |
T' |       |  ε     | * F T'|       |  ε  |  ε  |
F  |  int  |        |       |  (E)  |     |     |
-------------------------------------------------                                                                             
Parse `int + int`

(input, stack): 
(int, E$) => (int, T E'$) => (int, F T' E'$) => (int, int T' E'$) => (+, T' E'$) => (+, E'$) => (+, + T E'$) => (int, T E'$)
=> (int, F T' E'$) => (int, int T' E'$) => ($, T' E'$) => ($, E'$) => ($, $) => accept
2024 Spring