Finishing this class, you need to know the answers to the following questions:
- The relation between compilers and static analyzers
- Understand 3AC and its common forms (in IR jimple)
- How to build basic blocks on top of IR
- How to construct control flow graphs on top of BBS.
Source Code -> Scanner > Lexical Analysis(by regular expression) –> tokens -> Parser > Syntax Analysis(Context Free Grammar) –> AST -> Type Checker > Semantic Analysis(Attribute Grammar) –> Decorated AST -> Translator –> IR(Three address code) -> Code Generator –> machine code
graph TD;
Source-Code-->|Lexical Analysis|Scanner;
Scanner--Tokens-->Parser;
Parser--AST-->Type-Checker;
Type-Checker--Decorated-AST-->Translator;
Translator-->IR;
IR--3AC-->Code-Generator-->MachineCode;
Source-Code--front-end-->IR;
IR--back-end-->MachineCode
Before IR: front end After IR: back end
AST
IR
Each 3AC contains at most 3 addressed
Some common 3AC Forms:
A simple FOR example for Jimple:
package nju.sa.examples;
public class ForLoop3AC {
public static void main(String[] args) {
int x = 0;
for(int i = 0; i<10; i++) {
x = x + 1;
}
}
}
3AC(Jimple):
public static void main(java.lang.String[])
{
java.lang.String[] r0;
int i1;
r0 := @parameter0: java.lang.String[];
i1 = 0;
label1:
if i1 >= 10 goto label2;
i1 = i1 + 1;
goto label1:
label2:
return;
}
Do-while loop:
Java version:
package nju.sa.examples;
public class DoWhileLoop3AC {
public static void main(String[] args) {
int[] arr = new int[10];
int i = 0;
do {
i = i + 1;
} while(arr[i] < 10);
}
}
3AC version:
public static void main(java.lang.String[])
{
java.lang.String[] r0;
int[] i1;
int $i0, i1;
r0 := @parameter0: java.lang.String[];
r1 = newarray (int)[10];
i1 = 0;
label1:
i1 = i1 + 1
$i0 = r1[i1];
if $i0 < 10 goto label1
return;
}
Method call
Java version:
package nju.sa.examples;
public class MethodCall3AC{
string foo(String para1, String para2) {
return para1 + " " + para2;
}
public static void main(String[] args) {
MethodCall3AC mc = new MethodCall3AC();
String result = mc.foo("Hello","World");
}
}
3AC version:
java.lang.String foo(java.lang.String, java.lang.String)
{
nju.sa.examples.MethodCall3AC r0;
java.lang.String r1, r2, $r7;
java.lang.StringBuilder $r3, $r4, $r5, $r6;
r0 := @this: nju.sa.examples.MethodCall3AC;
r1 := @parameter0: java.lang.String;
r2 := @parameter1: java.lang.String;
$r3 = new java.lang.StringBuilder;
specialinvoke $r3.<java.lang.StringBuilder: void <init>()>();
$r4 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(r1);
$r5 = virtualinvoke $r4.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(" ");
$r6 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(r2);
$r7 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.String toString()>();
return $r7;
}
public static void main(java.lang.String[])
{
java.lang.String[] r0;
nju.sa.examples.MethodCall3AC $r3;
r0 := @parameter: java.lang.String[];
$r3 = new nju.sa.examples.MethodCall3AC;
specialinvoke $r3.<nju.sa.examples.MethodCall3AC: void <init>()>();
virtualinvoke $r3.<nju.sa.examples.MethodCall3AC: java.lang.String foo(java.lang.String, java.lang.String)>("Hello","World");
}
Java 7: Invokedynamic -> Java static typing, dynamic language runs on JVM
<Class name: return type <method name>(Parameter type)>(Real parameter);
Class:
Java version:
package nju.sa.examples;
public class Class3AC {
public static final double pi = 3.14;
public static void main(String[] args) {
}
}
3AC version
public class nju.sa.examples.Class3AC extends java.lang.Object
{
public static final double pi;
public void<init>()
{
nju.sa.examples.Class3AC r0;
r0 := @this: nju.sa.examples.Class3AC;
specialinvoke r0.<java.lang.Object: void<init>()>();
return;
}
public static void main(java.lang.String[])
{
java.lang.String[] r0;
r0 := @parameter0: java.lang.String[];
return;
}
public static void <clinit>()
{
<nju.sa.examples.Class3AC: double pi> = 3.14;
return;
}
All assignments in SSA are to variables with distinct names
3AC | SSA |
---|---|
p = a + b | p1 = a + b |
q = p - c | q1 = p1 - c |
p = q * d | p2 = q1 * d |
p = e - p | p3 = e - p2 |
q = p + q | q2 = p3 + q1 |
What if we encounter the following situation:
graph TD
1[if e]
2[/x0 = 0/]
3[\x1 = 1\]
4[y = x + 7]
1-->2
1-->3
2-->4
3-->4
We can use $$ x_2 = \phi(x_0,x_1) $$ to guarantee the requirements of SSA.
Merits
Demerits
The node in CFG can be an individual 3-address instruction, or (usually) a Basic Block(BB).
Basic Blocks are maximal sequences of consecutive three-address instructions with the properties that:
Now we have the following example:
(1) x = input
(2) y = x - 1
(3) z = x * y
(4) if z < x goto (7)
(5) p = x / y
(6) q = p + y
(7) a = q
(8) b = x + a
(9) c = 2*a - b
(10) if p == q goto (12)
(11) goto (3)
(12) return
We can use goto statement to divide the whole sequence into Basic Blocks
BB(1): (1),(2)
BB(2): (3),(4)
BB(3): (5),(6)
BB(4): (7),(8),(9),(10)
BB(5): (11)
BB(6): (12)
The algorithm for this is quite simple, which is not listed in detail here.
And then we construct the CFG by adding edges to the nodes.
It is normal to replace the jumps to instructions labels by jumps to Basic Blocks.