Intermediate Presentation

  • Layout:
    • Compilers and Static Analyzers
    • AST vs. IR
    • IR: Three-Address Code
    • 3AC in Real Static Analyzer: Soot
    • Static Single Assignment
    • Basic Blocks
    • Control Flow Graphs

Finishing this class, you need to know the answers to the following questions:

  • The relation between compilers and static analyzers
  • Understand 3AC and its common forms (in IR jimple)
  • How to build basic blocks on top of IR
  • How to construct control flow graphs on top of BBS.

Compiler

Source Code -> Scanner > Lexical Analysis(by regular expression) –> tokens -> Parser > Syntax Analysis(Context Free Grammar) –> AST -> Type Checker > Semantic Analysis(Attribute Grammar) –> Decorated AST -> Translator –> IR(Three address code) -> Code Generator –> machine code

graph TD;
	Source-Code-->|Lexical Analysis|Scanner;
	Scanner--Tokens-->Parser;
	Parser--AST-->Type-Checker;
	Type-Checker--Decorated-AST-->Translator;
	Translator-->IR;
	IR--3AC-->Code-Generator-->MachineCode;
	Source-Code--front-end-->IR;
	IR--back-end-->MachineCode

Before IR: front end After IR: back end

AST vs. IR

  • AST

    • High level and close to grammar structure
    • usually language dependent
    • suitable for fast type checking
    • lack of control flow information
  • IR

    • low-level and close to machine code
    • usually language independent
    • compact and uniform
    • contains control flow information
    • usually considered as the basis for static analysis

IR

Each 3AC contains at most 3 addressed

  • Address
    • Name
    • Constant
    • Compiler-generated temporary

Some common 3AC Forms:

  • x = y bop z
  • x = uop y
  • x = y
  • goto L
  • if x goto L
  • if x rop y goto L

Soot and Its IR: Jimple

Repo,Tutorials

A simple FOR example for Jimple:

package nju.sa.examples;
public class ForLoop3AC {
  public static void main(String[] args) {
    int x = 0;
    for(int i = 0; i<10; i++) {
      x = x + 1;
    }
  }
}

3AC(Jimple):

public static void main(java.lang.String[])
{
  java.lang.String[] r0;
  int i1;

  r0 := @parameter0: java.lang.String[];

  i1 = 0;

label1:
  if i1 >= 10 goto label2;

  i1 = i1 + 1;

  goto label1:

label2:
  return;
}

Do-while loop:

Java version:

package nju.sa.examples;
public class DoWhileLoop3AC {
  public static void main(String[] args) {
    int[] arr = new int[10];
    int i = 0;
    do {
      i = i + 1;
    } while(arr[i] < 10);
  }
}

3AC version:

public static void main(java.lang.String[])
{
  java.lang.String[] r0;
  int[] i1;
  int $i0, i1;

  r0 := @parameter0: java.lang.String[];

  r1 = newarray (int)[10];

  i1 = 0;

label1:
  i1 = i1 + 1

  $i0 = r1[i1];

  if $i0 < 10 goto label1

  return;
}

Method call

Java version:

package nju.sa.examples;
public class MethodCall3AC{
  
  string foo(String para1, String para2) {
    return para1 + " " + para2;
  }

  public static void main(String[] args) {
    MethodCall3AC mc = new MethodCall3AC();
    String result = mc.foo("Hello","World");
  }

}

3AC version:

java.lang.String foo(java.lang.String, java.lang.String)
{
  nju.sa.examples.MethodCall3AC r0;
  java.lang.String r1, r2, $r7;
  java.lang.StringBuilder $r3, $r4, $r5, $r6;

  r0 := @this: nju.sa.examples.MethodCall3AC;

  r1 := @parameter0: java.lang.String; 

  r2 := @parameter1: java.lang.String; 

  $r3 = new java.lang.StringBuilder;

  specialinvoke $r3.<java.lang.StringBuilder: void <init>()>();

  $r4 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(r1);

  $r5 = virtualinvoke $r4.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(" ");

  $r6 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(r2);

  $r7 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.String toString()>();

  return $r7;

}

public static void main(java.lang.String[])
{
  java.lang.String[] r0;
  nju.sa.examples.MethodCall3AC $r3;

  r0 := @parameter: java.lang.String[];

  $r3 = new nju.sa.examples.MethodCall3AC;

  specialinvoke $r3.<nju.sa.examples.MethodCall3AC: void <init>()>();

  virtualinvoke $r3.<nju.sa.examples.MethodCall3AC: java.lang.String foo(java.lang.String, java.lang.String)>("Hello","World");

}

JVM invoke instruction:

  • Invokespecial: call constructor, call superclass methods, call private method
  • Invokevirtual: instance methods call(virtual dispatch)
  • Invokeinterface: cannot optimize, checking interface implementation
  • Invokestatic: call static methods

Java 7: Invokedynamic -> Java static typing, dynamic language runs on JVM

Method Signature

  1. Class name
  2. Return type
  3. Method name
  4. Parameter types
<Class name: return type <method name>(Parameter type)>(Real parameter);

Class:

Java version:

package nju.sa.examples;
public class Class3AC {
	public static final double pi = 3.14;
    public static void main(String[] args) {
        
    }
}

3AC version

public class nju.sa.examples.Class3AC extends java.lang.Object
{
    public static final double pi;
    
    public void<init>()
    {
        nju.sa.examples.Class3AC r0;
        
        r0 := @this: nju.sa.examples.Class3AC;
        
        specialinvoke r0.<java.lang.Object: void<init>()>();
        
        return;
    }
    
    public static void main(java.lang.String[])
    {
        java.lang.String[] r0;
        
        r0 := @parameter0: java.lang.String[];
        
        return;
    }
    
    public static void <clinit>()
    {
        <nju.sa.examples.Class3AC: double pi> = 3.14;
        
        return;
    }

Static Single Assignment

All assignments in SSA are to variables with distinct names

  • Give each definition a fresh name
  • Propagate fresh name to subsequent uses
  • Every variable has exactly one definition
3AC SSA
p = a + b p1 = a + b
q = p - c q1 = p1 - c
p = q * d p2 = q1 * d
p = e - p p3 = e - p2
q = p + q q2 = p3 + q1

What if we encounter the following situation:

graph TD
	1[if e]
	2[/x0 = 0/]
	3[\x1 = 1\]
	4[y = x + 7]
	1-->2
	1-->3
	2-->4
	3-->4

We can use $$ x_2 = \phi(x_0,x_1) $$ to guarantee the requirements of SSA.

Why and Why not SSA?

  • Merits

    • Flow information in indirectly incorporated into the unique variable names.
      • May help deliver some simpler analyses, e,g., flow-insensitive analysis gains partial precision of flow-sensitive analysis via SSA.
    • Define-and-Use pairs are explicit.
      • Can be optimized better
  • Demerits

    • SSA may introduce too many variables and phi-functions
    • May introduce inefficiency problem when transitioning to machine code.

Control Flow Analysis

  • Usually refer to building Control Flow Graph

The node in CFG can be an individual 3-address instruction, or (usually) a Basic Block(BB).

Basic Block

Basic Blocks are maximal sequences of consecutive three-address instructions with the properties that:

  • It can be entered only at the beginning, i.e. ,the first instruction in the block.
  • It can be exited only at the end

Now we have the following example:

(1) x = input

(2) y = x - 1

(3) z = x * y

(4) if z < x goto (7)

(5) p = x / y

(6) q = p + y

(7) a = q

(8) b = x + a

(9) c = 2*a - b

(10) if p == q goto (12)

(11) goto (3)

(12) return

We can use goto statement to divide the whole sequence into Basic Blocks

BB(1): (1),(2)

BB(2): (3),(4)

BB(3): (5),(6)

BB(4): (7),(8),(9),(10)

BB(5): (11)

BB(6): (12)

The algorithm for this is quite simple, which is not listed in detail here.

And then we construct the CFG by adding edges to the nodes.

  • There is and edge from Block A to Block B if and only if
    • There is a conditional or unconditional jump from the end of A to the beginning of B.(Sequential Executing is also a conditional jump)

It is normal to replace the jumps to instructions labels by jumps to Basic Blocks.