Data Flow Analysis V : Foundations II

The first thing we need to deal with is relating the algorithm to the function we talked about in the last lecture.

In each iteration of our algorithm, we have a (L1, L2, … ,Lk), $L^k$ for short.

It is equivalent to think that we apply function F which consists of

  • transfer function $f_i$: L -> L for every node
  • join/meet function $\sqcup / \sqcap$: L * L -> L for control flow confluence

Now we only need to prove that F is monotonic.

We want to show that $\sqcup$ is monotonic: $$ \forall x,y,z \in L, x\subseteq y \Rightarrow x \sqcup z \sube y \sqcup z $$ This is quite obvious.


So now we still have a problem:

  • When will the algorithm reach the fixed point, or when can we get the solution ?

Def:

  • The height h of a lattice L is the length of the longest path from Top to Bottom in L.

Worst case: h * #(nodes in CFG)


May and Must Analyses, a Lattice View

May analysis and Must analysis both start from an unsafe result.

In May this result lies in the Bottom of the product lattice. (No definition can reach -> error free, but unsafe). And the top of the lattice means that all definitions may reach. It is safe but useless. So we can introduce the Truth node. Then we can divide the whole product lattice into Safe part and Unsafe Part. In our designing stage, we have considered safe approximation, which guarantee that we can get a safe result. And because we start from bottom, we can get to the Least Fixed Point. And precision decreases from bottom to top. Since the fixed point we got is the lowest, then we prove that this is the best we can achieve.

In Must analysis, we can get similar results.


How Precise is Our Solution ?

Meet-Over-All-Paths Solution (MOP)

We can define a Transfer Function $F_P$ for every path P. (Just a composition of transfer functions for all statements on that path) $$ P := entry \Rightarrow S_1 \Rightarrow S_2 … \Rightarrow S_i $$

$$ MOP[S_i] = \sqcup / \sqcap F_P(OUT[Entry]) $$ MOP computes the data-flow values at the end of each path and apply join/meet operator to these values to find their lub/glb.

Note that :

  • Some paths may be not executable => Not fully precise

  • Unbounded(we do not know when the loop break), and not enumerable => impractical

Ours(Iterative Algorithm) vs. MOP

graph TD;
	1[Entry]
	2[S1]
	3[S2]
	4[S3]
	5[S4]
	1 --> 2
	1 --> 3
	2 --> 4
	3 --> 4
	4 --> 5

$$ Ours = F(x \sqcup y) $$

$$ MOP = F(x) \sqcup F(y) $$

By definition of lub, we have $x\sube (x\sqcup y)$ and $y\sube (x\sqcup y)$

Because transfer function is monotonic, we have $ F(x)\sube F(x\sqcup y)$ and $F(y)\sube F(x\sqcup y)$

This means that $F(x)\sqcup F(y) \sube F(x\sqcup y)$

So $MOP \sube Ours$

So ours is less precise than MOP.

When F is distributive, we can just let Ours as precise as MOP.

Note that : Bit-vector or Gen/Kill problems are distributive. We may wonder what is not distributive ?

ABOUT Constant Propagation

Given a variable x at program point p, determine whether x is guaranteed to hold a constant value at p.

In a Lattice view:

  • Domain of the values V

    ​ UNDEF, all const nums, NAC

  • Meet operator (Neither union nor intersection)

  • Transfer function

We can show by example that this is not distributive.

Worklist Algorithm - an optimization of iterative algorithm

In our iterative algorithm, even if only a few nodes got updated, we still have to go through all the nodes in the next iteration.

Worklist Algorithm for forward analysis:

​OUT[entry] = empty set;

​for each basic block B\entry

​	OUT[B] = empty set;

​Worklist <- all basic blocks

​while Worklist is not empty

​	Pick a BB B from Worklist

​	old_OUT = OUT[B]

​	update IN[B] and OUT[B]

​	if old_OUT != OUT[B]

​		Add all successors of B to Worklist