Data Flow Analysis Applications I

Layout of these lectures:

  • Overview of Data Flow Analysis
  • Preliminaries of Data Flow Analysis
  • Reaching Definitions Analysis
  • Live Variable Analysis
  • Available Expression Analysis

Overview

Data Flow Analysis –> How Data flows on CFG? –> How application-specific Data Flows through the Nodes and Edges of CFG( a program ) ?

Recall 1st lecture:

  • We will do abstraction regarding the application-specific data (Like sign abstraction)
  • Over-approximation will occur in the flow process to guarantee soundness.(for most static analysis, which is called may analysis)
    • In contrast, must analysis refers to algorithms that output information that must be true(under-approximation)

Node(BBs/statements) –> Transfer function in nodes

Edges(control flows)–> Control-flow handling (merge/union)

Preliminaries of Data Flow Analysis

Input and output states

  • Each execution of an IR statement transforms an input state to a new output state.
  • The input (output) state is associated with the program point before (after) the statement.

(This is quite similar to the state machine abstraction mentioned in OS, which is taught by Yanyan Jiang.)

graph TD;
	1[S1]
	2[S2]
	3[S3]
	1-->3
	2-->3

$$ In[S_3] = OUT[S_1] \and OUT[S_2] $$

In each data-flow analysis application, we associate with every program point a data-flow value that represents an abstraction of the set of all possible program states that can be observed for that point.

Data-flow analysis is to find a solution to a set of safe-approximation-directed constraints on th IN[s]’s and OUT[s]’s, for all statements s.

  • Constraints based on semantics of statements (transfer function)
  • Constraints based on the flows of control

Forward analysis

$$ OUT[s] = f_s[IN[s]] $$

Backward analysis

$$ IN[s] = F_s(OUT[s]) $$

We can just convert the CFG to turn backward analysis to forward analysis. (Details not mentioned here)

Control flow within a BB

$$ IN[s_{i+1}] = OUT[s_{i}] $$

for all i = 1, 2, … , n-1

Control flow among BBs

$$ IN[B] = IN[s_1] $$

$$ OUT[B] = OUT[s_n] $$

Forward: $$ OUT[B] = f_B(IN[B]) $$ Where $$ f_B = f_{s_{n}} (f_{s_{n-1}}(…f_{s_{1}})) $$

$$ IN[B] = \and \space OUT[P] $$ where P is a predecessor of B

The backward version is similar to forward version, not listed here.

Reaching Definitions Analysis

Declaration:

  • Issues not covered here
    • Method Calls
      • Only intra-procedural CFG
    • Aliases
      • Variables have no aliases

A definition d at program point p reaches a point q if there is a path from p to q such that d is not ‘killed’ along the path.

  • A definition of a variable v is a statement that assigns a value to v
  • Killed : A new definition of v appears on the path
  • Can be used to detect possible undefined variables. (Introduce a dummy definition for every v at the entry point)
    • This is a may analysis

Abstraction

  • Data Flow Values/Facts
  • The definitions of all variables in a program
  • Can be represented by bit vectors

Safe approximation

  • Transfer function
  • Control flow

We can try to understand safe approximation with the following example:

This statement ‘generates’ a definition D of variable v and kills all the other definitions before that define v.

  • Transfer function $$ OUT[B] = gen_B \cup (IN[B] - kill_B) $$

  • Control flow $$ IN[B] = \cup \space OUT[P] $$ where P is a predecessor of B

We can note that in our definition of control flow, a definition reaches a program point as long as there exists at least one path along which the definition reaches.

Algorithm of Reaching Definitions Analysis

  • Input: CFG
  • Output: IN[B] and OUT[B] for each basic block B
  • Method:
    • OUT[entry] = empty set
    • for each basic block B\entry
      • OUT[B] = empty set
    • while changes to any OUT occur
      • for each basic block B\entry
        • update IN[B] and OUT[B]
      • End for
    • End while

For some analysis, OUT[entry] is not defined as empty set, in which situation we can easily change this template algorithm to satisfy those requirements.

The initialization of OUT[B] will be introduced in later lectures. Remember that we are doing may analysis here

Proof of correctness:

  • Partial correctness
    • Can be achieved with another iteration.
  • Termination
    • gens and kills remain unchanged
    • when more facts flow in IN[S], the more facts either
      • is killed, or
      • flows to OUT[S]
    • When a fact is added to OUT[S], through either gens, or survivors, it stays there forever
    • OUT[S] never shrinks