Data flow Analysis IV - Foundations I

Layout of contents:

Iterative Algorithm, Another view
Partial Order
Upper and Lower Bounds
Lattice, Semilattice, Complete and Product Lattice
Data Flow Analysis Framework via Lattice
Monotonicity and Fixed Point Theorem
Relate Iterative Algorithm to Fixed Point Theorem
May/Must Analysis, A Lattice View
Distributivity and MOP
Constant Propagation
Worklist Algorithm

View Iterative Algorithm in Another Way

Given a CFG(a program) with k nodes, the iterative algorithm updates OUT[n] for every node in each iteration.
Assume the domain of the values in data flow analysis is V, then we can define a k-tuple:

$$ (OUT[n_1],OUT[n_2],…,OUT[n_k]) $$

as an element of set (V1 * V2 * … * Vk), denoted as V^k ,to hold the values of the analysis after each iteration.

Each iteration can be considered as taking an action to map an element of V^k to a new element of V^k, through applying the transfer functions and control-flow handling, abstracted as a function F: V^k -> V^k
When we reach a fixed point in F, we terminate in this algorithm.

The iterative algorithm is also called IN/OUT equation system

Properties of F

Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution ?
If so, is there only one fixed point? If more than one, which one is the best (more precise) ?
When will the algorithm reach the fixed point? (Worst-case time consumption)

We will have a look at these problems in the latter part of this lecture.

Partial Order

Poset(p, op): Reflexivity, Antisymmetry, Transitivity

Partial means that for a pair of set elements in P, they could be incomparable; in other words, not necessary that every pair of set elements must satisfy the ordering.

Upper and Lower Bounds

Given a poset (P, $\sube$) and its subset S that S is a subset of P, we say that:

u in P is an upper bound of S, if for every x in S, x $\sube$ u.
l in P is an lower bound of S, if for every x in S, l $\sube$ x

Then we can define the least upper bound $\sqcup$S and the greatest lower bound $\sqcap$S.

And if S contains only two elements a and b, then

$\sqcup$S can be written as a $\sqcup$ b

$\sqcap$S can be written as a $\sqcap$ b

Some properties

Not every poset has lub or glb.
But if a poset has lub or glb, it will be unique

Lattice

Given a poset (P, $\sube$), $\forall$a,b $\in$ P, if a $\sqcup$ b and a $\sqcap$ b exist, then this poset is called a lattice.

Semi lattice

Given a poset (P, $\sube$), $\forall$a,b $\in$ P,

if only a $\sqcup$ b then this poset is called a join semilattice,
else if only a $\sqcap$ b exist, this poset is called a meet semilattice.

Complete lattice

Given a poset (P, $\sube$), for arbitrary subset S of P, if $\sqcup$S and $\sqcap$S exist, then this poset is called a complete lattice.

Every complete lattice (P, $\sube$) has a greatest element $\top$ = $\sqcup$ P called top and a least element $\bot$ = $\sqcap$ P called bottom.

Every finite lattice (P is finite) is a complete lattice.

Product Lattice

Give lattices L1 = (P1, $\sube_1$), L2 = (P2, $\sube_2$), L3 = (P3, $\sube_3$) …. , Ln = (Pn, $\sube_n$), if for all i, (Pi, $\sube_i$) has $\sqcup_i$ and $\sqcap_i$, then we can have a product lattice $L^n$ = (P, $\sube$) that is defined by:

P = P1 * P2 * … * Pn

$(x_1, … , x_n) \sube (y_1,…,y_n) \iff (x_1 \sube y_1) \and (x_2 \sube y_2) \and … \and (x_n \sube y_n) $

$(x_1, … , x_n) \sqcup (y_1,…,y_n) = (x_1 \sqcup_1 y_1 , x_2 \sqcup_2 y_2 ,… , x_n \sqcup_n y_n) $

$(x_1, … , x_n) \sqcap (y_1,…,y_n) = (x_1 \sqcap_1 y_1 , x_2 \sqcap_2 y_2 ,… , x_n \sqcap_n y_n) $

Properties:

A product lattice is a lattice
If a product lattice L is a product of complete lattices, then L is complete lattice.

Data Flow Analysis Framework via Lattice

A data flow analysis framework (D, L, F) consists of:

D: Direction of the data flow
L: A lattice including domain of the values V and a meet $\sqcap$ or join $\sqcup$ operator.
F: a family of transfer functions from V to V

Data flow analysis can be seen as iteratively applying transfer functions and meet/join operations on the values of a lattice.

Back to the questions we mentioned before

Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution ?
If so, is there only one fixed point? If more than one, which one is the best (more precise) ?
When will the algorithm reach the fixed point? (Worst-case time consumption)

Monotonicity

Obviously:

A function f: L -> L (L is a lattice) is monotonic if $\forall$ x, y $\in$ L, $x\sube y \Rightarrow f(x)\sube f(y)$

Fixed-Point TH

Given a complete lattice (L, $\sube$), if

f: L -> L is monotonic, and
L is finite, then

the least fixed point of f can be found by iterating f($\bot$), f(f($\bot$)), …, $f^k(\bot)$ until a fixed point is reached.

the greatest fixed point of f can be found by iterating f($\top$), f(f($\top$)), …, $f^k(\top)$ until a fixed point is reached.

proof:

By the definition of $\bot$ and f: L->L, we have $\bot$ $\sube f(\bot)$

As f is monotonic, $f(\bot)$ $\sube f(f(\bot)) = f^2(\bot)$

Because of finiteness, the fixed point exists. (Partial Order can not be reversed)