Data flow Analysis IV - Foundations I

Layout of contents:

  • Iterative Algorithm, Another view
  • Partial Order
  • Upper and Lower Bounds
  • Lattice, Semilattice, Complete and Product Lattice
  • Data Flow Analysis Framework via Lattice
  • Monotonicity and Fixed Point Theorem
  • Relate Iterative Algorithm to Fixed Point Theorem
  • May/Must Analysis, A Lattice View
  • Distributivity and MOP
  • Constant Propagation
  • Worklist Algorithm

View Iterative Algorithm in Another Way

  • Given a CFG(a program) with k nodes, the iterative algorithm updates OUT[n] for every node in each iteration.
  • Assume the domain of the values in data flow analysis is V, then we can define a k-tuple:

$$ (OUT[n_1],OUT[n_2],…,OUT[n_k]) $$

as an element of set (V1 * V2 * … * Vk), denoted as V^k ,to hold the values of the analysis after each iteration.

  • Each iteration can be considered as taking an action to map an element of V^k to a new element of V^k, through applying the transfer functions and control-flow handling, abstracted as a function F: V^k -> V^k

  • When we reach a fixed point in F, we terminate in this algorithm.

The iterative algorithm is also called IN/OUT equation system

Properties of F

  • Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution ?
  • If so, is there only one fixed point? If more than one, which one is the best (more precise) ?
  • When will the algorithm reach the fixed point? (Worst-case time consumption)

We will have a look at these problems in the latter part of this lecture.

Partial Order

Poset(p, op): Reflexivity, Antisymmetry, Transitivity

Partial means that for a pair of set elements in P, they could be incomparable; in other words, not necessary that every pair of set elements must satisfy the ordering.

Upper and Lower Bounds

Given a poset (P, $\sube$) and its subset S that S is a subset of P, we say that:

  • u in P is an upper bound of S, if for every x in S, x $\sube$ u.
  • l in P is an lower bound of S, if for every x in S, l $\sube$ x

Then we can define the least upper bound $\sqcup$S and the greatest lower bound $\sqcap$S.

And if S contains only two elements a and b, then

​ $\sqcup$S can be written as a $\sqcup$ b

​ $\sqcap$S can be written as a $\sqcap$ b

Some properties

  • Not every poset has lub or glb.
  • But if a poset has lub or glb, it will be unique

Lattice

Given a poset (P, $\sube$), $\forall$a,b $\in$ P, if a $\sqcup$ b and a $\sqcap$ b exist, then this poset is called a lattice.

Semi lattice

Given a poset (P, $\sube$), $\forall$a,b $\in$ P,

  • if only a $\sqcup$ b then this poset is called a join semilattice,

  • else if only a $\sqcap$ b exist, this poset is called a meet semilattice.

Complete lattice

Given a poset (P, $\sube$), for arbitrary subset S of P, if $\sqcup$S and $\sqcap$S exist, then this poset is called a complete lattice.

Every complete lattice (P, $\sube$) has a greatest element $\top$ = $\sqcup$ P called top and a least element $\bot$ = $\sqcap$ P called bottom.

Every finite lattice (P is finite) is a complete lattice.

Product Lattice

Give lattices L1 = (P1, $\sube_1$), L2 = (P2, $\sube_2$), L3 = (P3, $\sube_3$) …. , Ln = (Pn, $\sube_n$), if for all i, (Pi, $\sube_i$) has $\sqcup_i$ and $\sqcap_i$, then we can have a product lattice $L^n$ = (P, $\sube$) that is defined by:

P = P1 * P2 * … * Pn

$(x_1, … , x_n) \sube (y_1,…,y_n) \iff (x_1 \sube y_1) \and (x_2 \sube y_2) \and … \and (x_n \sube y_n) $

$(x_1, … , x_n) \sqcup (y_1,…,y_n) = (x_1 \sqcup_1 y_1 , x_2 \sqcup_2 y_2 ,… , x_n \sqcup_n y_n) $

$(x_1, … , x_n) \sqcap (y_1,…,y_n) = (x_1 \sqcap_1 y_1 , x_2 \sqcap_2 y_2 ,… , x_n \sqcap_n y_n) $

Properties:

  • A product lattice is a lattice
  • If a product lattice L is a product of complete lattices, then L is complete lattice.

Data Flow Analysis Framework via Lattice

A data flow analysis framework (D, L, F) consists of:

  • D: Direction of the data flow

  • L: A lattice including domain of the values V and a meet $\sqcap$ or join $\sqcup$ operator.

  • F: a family of transfer functions from V to V

Data flow analysis can be seen as iteratively applying transfer functions and meet/join operations on the values of a lattice.

Back to the questions we mentioned before

  • Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution ?

  • If so, is there only one fixed point? If more than one, which one is the best (more precise) ?

  • When will the algorithm reach the fixed point? (Worst-case time consumption)

Monotonicity

Obviously:

  • A function f: L -> L (L is a lattice) is monotonic if $\forall$ x, y $\in$ L, $x\sube y \Rightarrow f(x)\sube f(y)$

Fixed-Point TH

Given a complete lattice (L, $\sube$), if

  • f: L -> L is monotonic, and
  • L is finite, then

the least fixed point of f can be found by iterating f($\bot$), f(f($\bot$)), …, $f^k(\bot)$ until a fixed point is reached.

the greatest fixed point of f can be found by iterating f($\top$), f(f($\top$)), …, $f^k(\top)$ until a fixed point is reached.

proof:

By the definition of $\bot$ and f: L->L, we have $\bot$ $\sube f(\bot)$

As f is monotonic, $f(\bot)$ $\sube f(f(\bot)) = f^2(\bot)$

Because of finiteness, the fixed point exists. (Partial Order can not be reversed)