CSE411

CSE411: Introduction to Compilers

Intermediate Representation

Jooyong Yi (UNIST)

CSE411

Where are we?

                 |-----------|        |----------|
Source Program → | Front End | → IR → | Back End | → Target Program
                 |-----------|        |----------|
                                  ↑
         |-------|            |--------|           |----------|        |-----------|
Source → | Lexer | → tokens → | Parser | → parse → | Semantic |→ AST → |   IR      | → IR
Program  |       |            |        |   tree    | Analyzer |        | Generator |
         |-------|            |--------|           |----------|        |-----------|
  • IR: Intermediate Representation (or Intermediate Code)
CSE411

Example of IR

  • LLVM IR
int sum(int x, int y) { int z = x + y; return z; }
define i32 @sum(i32 noundef %0, i32 noundef %1) { ; noundef: no undefined value
  %3 = alloca i32, align 4 ; allocate the memory on the current call stack
  %4 = alloca i32, align 4
  %5 = alloca i32, align 4
  store i32 %0, ptr %3, align 4
  store i32 %1, ptr %4, align 4
  %6 = load i32, ptr %3, align 4
  %7 = load i32, ptr %4, align 4
  %8 = add nsw i32 %6, %7 ; nsw: no signed wrap
  store i32 %8, ptr %5, align 4
  %9 = load i32, ptr %5, align 4
  ret i32 %9
}
CSE411

Another Example of IR

  • Java Bytecode
static int sum(int x, int y) { int z = x + y; return z; }
  static int sum(int, int);
    descriptor: (II)I
    flags: (0x0008) ACC_STATIC
    Code:
      stack=2, locals=3, args_size=2
         0: iload_0
         1: iload_1
         2: iadd
         3: istore_2
         4: iload_2
         5: ireturn
      LineNumberTable:
        line 3: 0
        line 4: 4
CSE411

Why IR?

  • Why not directly translate AST to the target program?
         |-------|            |--------|           |----------|        |-----------|
Source → | Lexer | → tokens → | Parser | → parse → | Semantic |→ AST → |   Code    | → Target
Program  |       |            |        |   tree    | Analyzer |        | Generator |   Program
         |-------|            |--------|           |----------|        |-----------|
CSE411

Why IR? Portability

CSE411

Why IR? Portability and Maintenance

CSE411

Why IR? Portability and Maintenance

CSE411

Why IR? Portability and Maintenance

CSE411

Why IR? Portability and Maintenance

CSE411

Why IR? Reusability and Maintenance

  • Code optimization routines developed for bytecode can be reused for different languages.
CSE411

Why IR? Separation of Concerns

  • When translating AST to target code, you should consider both language semantics and machine specifics.
CSE411

Why IR? Separation of Concerns

  • How many variables can you declare in a function?
  • How many registers can you use in a machine?
CSE411

Which IR Format?

CSE411

Which IR Format? AST

                 |-----------|         |----------|
Source Program → | Front End | → AST → | Back End | → Target Program
                 |-----------|         |----------|
  • Portability
  • Separation of concerns
CSE411

Which IR Format? AST

                                            E                                       BinExp(+) 
                                        /   |   \                                  /         \
3 - 2 + 1  == syntax check ==>         E    +   T     == abstraction ==>      BinExp(-)      Int(1)
                                     / | \      |                               /   \
                                    E  -  T    int                          Int(3)  Int(2)
                                    |     |
                                    T    int
                                    |
                                   int
  • S-expression: (BinExp + (BinExp - (Int 3) (Int 2)) (Int 1))
CSE411

Which IR Format? AST

1: a = 0;                                                      StmtSeq
2: while (a < N) {                                          /          \
3:  b = a + 1;                                         AssignStmt       WhileStmt
4:  c = c + b;                                         /     \         /         \
5:  a = b * 2;                                       Id(a)  Int(0)   BinExp(<)     StmtSeq
6: }                                                                   /    \        / | \
7: return c;                                                         Id(a)  Id(N)   
  • At each line, which variables are live?
CSE411

Which IR Format? Control Flow Graph

1: a = 0;                                                       a = 0
2: while (a < N) {                                                |
3:  b = a + 1;                                           ----   a < N  <-------
4:  c = c + b;                                           |        |           |
5:  a = b * 2;                                           |      b = a + 1     |
6: }                                                     |        |           |
7: return c;                                             |      c = c + b     |
                                                         |        |           |
                                                         |      a = b * 2     |
                                                         ↓        |___________|
                                                      return c
CSE411

Which IR Format? Linear IR

1: a = 0;                                                       a = 0
2: while (a < N) {                                                |
3:  b = a + 1;                                           ----   a < N  <-------
4:  c = c + b;                                           |        |           |
5:  a = b * 2;                                           |      b = a + 1     |
6: }                                                     |        |           |
7: return c;                                             |      c = c + b     |
                                                         |        |           |
                                                         |      a = b * 2     |
                                                         ↓        |___________|
                                                      return c                                                       
entry:
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  %c = alloca i32, align 4
  store i32 0, i32* %a, align 4
  br label %loop.cond

loop.cond:  
  %a_val = load i32, ptr %a, align 4
  %cmp = icmp slt i32 %a_val, %N ; slt: signed less than
  br i1 %cmp, label %loop.body, label %loop.exit

loop.body:
  ...
loop.exit:                                                                                                 
  ...
CSE411

Java Bytecode

a = 0;                   0: iconst_0
while (a < N) {          1: istore_2
  b = a + 1;             2: iconst_0
  c = c + b;             3: istore_3
  a = b * 2;             4: iconst_0
}                        5: istore        4
return c;                7: iload_2
                         8: iload_1
                         9: if_icmpge     29
                         12: iload_2
                         13: iconst_1
                         14: iadd
                         15: istore_3
                         16: iload         4
                         18: iload_3
                         19: iadd
                         20: istore        4
                         22: iload_3
                         23: iconst_2
                         24: imul
                         25: istore_2
                         26: goto          7
                         29: iload         4
                         31: ireturn                                                                                                  
  • Language for a stack-based virtual machine.
  • More abstract than linear IR.
CSE411

How to Generate IR?

static void foo() {
    int x;
    x = 1;
}
    ICONST_1
    ISTORE 0
    RETURN
CSE411

How to Generate IR?

static void foo() { int x; x = 1; }

    ICONST_1
    ISTORE 0
    RETURN

\begin{tikzpicture}[node distance=1cm and 10cm] \node (c) { $C$ }; \node (cpp) [below of=c] { $C++$ }; \node (rust) [below of=cpp] { $Rust$ }; \node (Julia) [below of=rust] { $Julia$ }; \node (x86) [right=4cm of c] { $x86$ }; \node (arm) [below of=x86] { $ARM$ }; \node (mips) [below of=arm] { $MIPS$ }; \node (alpha) [below of=mips] { $Alpha$ }; \draw [arrow] (c) -- (x86); \draw [arrow] (c) -- (arm); \draw [arrow] (c) -- (mips); \draw [arrow] (c) -- (alpha); \draw [arrow] (cpp) -- (x86); \draw [arrow] (cpp) -- (arm); \draw [arrow] (cpp) -- (mips); \draw [arrow] (cpp) -- (alpha); \draw [arrow] (rust) -- (x86); \draw [arrow] (rust) -- (arm); \draw [arrow] (rust) -- (mips); \draw [arrow] (rust) -- (alpha); \draw [arrow] (Julia) -- (x86); \draw [arrow] (Julia) -- (arm); \draw [arrow] (Julia) -- (mips); \draw [arrow] (Julia) -- (alpha); \end{tikzpicture}

\begin{tikzpicture}[node distance=1cm and 10cm] \node (c) { $C$ }; \node (cpp) [below of=c] { $C++$ }; \node (rust) [below of=cpp] { $Rust$ }; \node (julia) [below of=rust] { $Julia$ }; \node (llvm) [right=2.5cm of c, below of=c] { LLVM IR }; \node (x86) [right=4.5cm of c] { $x86$ }; \node (arm) [below of=x86] { $ARM$ }; \node (mips) [below of=arm] { $MIPS$ }; \node (alpha) [below of=mips] { $Alpha$ }; \draw [arrow] (c) -- (llvm); \draw [arrow] (cpp) -- (llvm); \draw [arrow] (rust) -- (llvm); \draw [arrow] (julia) -- (llvm); \draw [arrow] (llvm) -- (x86); \draw [arrow] (llvm) -- (arm); \draw [arrow] (llvm) -- (mips); \draw [arrow] (llvm) -- (alpha); \end{tikzpicture}

\begin{tikzpicture}[node distance=1cm and 10cm] \node (java) { $Java$ }; \node (kotlin) [below of=java] { $Kotlin$ }; \node (scala) [below of=kotlin] { $Scala$ }; \node (closure) [below of=scala] { $Closure$ }; \node (x86) [right=4cm of java] { $x86$ }; \node (arm) [below of=x86] { $ARM$ }; \node (mips) [below of=arm] { $MIPS$ }; \node (alpha) [below of=mips] { $Alpha$ }; \draw [arrow] (java) -- (x86); \draw [arrow] (java) -- (arm); \draw [arrow] (java) -- (mips); \draw [arrow] (java) -- (alpha); \draw [arrow] (kotlin) -- (x86); \draw [arrow] (kotlin) -- (arm); \draw [arrow] (kotlin) -- (mips); \draw [arrow] (kotlin) -- (alpha); \draw [arrow] (scala) -- (x86); \draw [arrow] (scala) -- (arm); \draw [arrow] (scala) -- (mips); \draw [arrow] (scala) -- (alpha); \draw [arrow] (closure) -- (x86); \draw [arrow] (closure) -- (arm); \draw [arrow] (closure) -- (mips); \draw [arrow] (closure) -- (alpha); \end{tikzpicture}

\begin{tikzpicture}[node distance=1cm and 10cm] \node (c) { $Java$ }; \node (cpp) [below of=c] { $Kotlin$ }; \node (rust) [below of=cpp] { $Scala$ }; \node (julia) [below of=rust] { $Closure$ }; \node (llvm) [right=2.5cm of c, below of=c] { JVM Bytecode }; \node (x86) [right=4.5cm of c] { $x86$ }; \node (arm) [below of=x86] { $ARM$ }; \node (mips) [below of=arm] { $MIPS$ }; \node (alpha) [below of=mips] { $Alpha$ }; \draw [arrow] (c) -- (llvm); \draw [arrow] (cpp) -- (llvm); \draw [arrow] (rust) -- (llvm); \draw [arrow] (julia) -- (llvm); \draw [arrow] (llvm) -- (x86); \draw [arrow] (llvm) -- (arm); \draw [arrow] (llvm) -- (mips); \draw [arrow] (llvm) -- (alpha); \end{tikzpicture}