Prediction Graph - Architectural Analysis

Timing Analysis

3.2 Architectural Analysis

3.2.3 Prediction Graph

The abstract state collecting path semantics computes a safe approximation of the concrete execution behavior of a program. We can now combine the observed abstract state transitions into a prediction graph that describes every possible execution behavior. The prediction graph is our basis to detect instances of timing anomalies in the abstract hardware state space. For this purpose we define the feasible abstract successor to construct the prediction graph.

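Definition 3.11 (Feasible Abstract Successor)

Let $G = (V, v_s, E)$ be a CFG, $\hat{A} = (\hat{S}, \tau_{abs})$ an abstract state automaton, and $\theta \in 2^{\hat{S}}$ the set of initial abstract states that are about to execute the first instruction of the control-flow graph $G$, i.e.,

$$\forall \hat{s} \in \theta \,.\; \exists \hat{t} \in \hat{S} \ \text{s.t.}\ \hat{t} \ntriangleright v_s \;\wedge\; \hat{s} \in \tau_{abs}(\hat{t}) \;\wedge\; \hat{s} \triangleright v_s .$$

The abstract state $\hat{t}$ is a feasible abstract successor of the abstract state $\hat{s}$, written $\hat{s} \rightarrow_{\pi} \hat{t}$, iff there exists a path $\pi = \pi' \circ v \circ \pi''$ and $k \in \mathbb{N}$ such that:

$$\hat{s} \in f^{\triangleright\,k}_{\tau_{abs}}(v)\bigl([\![\pi']\!]_{\tau_{abs}}(\theta)\bigr) \;\wedge\; \hat{t} \in f^{\triangleright\,k+1}_{\tau_{abs}}(v)\bigl([\![\pi']\!]_{\tau_{abs}}(\theta)\bigr) \;\wedge\; \hat{t} \in \tau_{abs}(\hat{s})$$

Definition 3.12 (Prediction Graph)

Let $G = (V, v_s, E)$ be a CFG, $\hat{A} = (\hat{S}, \tau_{abs})$ an abstract state automaton, and $\theta \in 2^{\hat{S}}$ the set of initial abstract states. The corresponding prediction graph is $\hat{P}_G = (\hat{S}, \hat{E})$, where

$$\hat{E} = \{\, (\hat{s}, \hat{t}) \mid \hat{s} \rightarrow_{\pi} \hat{t} \;\wedge\; \pi \text{ is a path through } G \,\}.$$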

[Figure 3.5: graph over the abstract states $\hat{s}_0, \dots, \hat{s}_7$; the gray boxes are labeled $[\![(v_1)]\!]_{\tau_{abs}}(\{\hat{s}_0\})$ and $[\![(v_1, v_2)]\!]_{\tau_{abs}}(\{\hat{s}_0\})$.]

Figure 3.5: Prediction Graph: Evolution of abstract hardware states for the simulation of the instruction sequence $(v_1, v_2)$. The edges denote the single-cycle transitions in the abstract state space. The gray boxes span the set of states that execute under the same instruction. In total the program completes after four cycles.

Figure 3.5 depicts the evolution of abstract hardware states for the execution of a very simple program that consists of the path $\pi = (v_1, v_2)$. The length of a longest path through the prediction graph provides a safe upper bound for the number of processor cycles in which the input program finishes execution. In this example, the longest path comprises four transitions of abstract hardware states. Hence, the program takes at worst four cycles to complete.

Depending on the control-flow graph, it may not be feasible to construct the whole prediction graph. For now we restrict ourselves to programs that comprise a finite number of paths. Furthermore, we implicitly require the abstract hardware states to carry along the analysis context. We assume an infinite call string approach and all loops to be virtually unrolled [26], such that every loop iteration corresponds to a different analysis context. In this way the prediction graph becomes a directed acyclic graph (DAG), because back edges cannot occur. Chapter 4 on page 51 relaxes some restrictions to allow for arbitrary, possibly infinite programs.

We can now compute a longest path through the prediction graph. The length of that path denotes an upper bound (in terms of processor cycles) for the longest execution of the corresponding program. To do so we first sort the prediction graph nodes in topological order by means of depth-first search. Algorithm 3.1 provides a pseudo-code implementation [44]. The output of the algorithm is a vector of topologically sorted nodes.

Algorithm 3.1 Topological Sorting of the Prediction Graph

function TopologicalSorting
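As an illustration of the depth-first approach, a topological sort of a DAG can be sketched in Python as follows. This is a minimal sketch, not the thesis's Algorithm 3.1: the adjacency-dictionary encoding of the graph and the name `topological_sorting` are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Set

# Assumed encoding: each node maps to the list of its successor nodes.
Graph = Dict[Hashable, List[Hashable]]


def topological_sorting(graph: Graph) -> List[Hashable]:
    """Return the nodes of a DAG so that every edge points forward in the list."""
    visited: Set[Hashable] = set()
    post_order: List[Hashable] = []

    def visit(node: Hashable) -> None:
        visited.add(node)
        for succ in graph.get(node, []):
            if succ not in visited:
                visit(succ)
        # A node is finished only after all of its successors are finished.
        post_order.append(node)

    for node in graph:
        if node not in visited:
            visit(node)

    # Reversing the DFS finishing order yields a topological order of a DAG.
    return post_order[::-1]


example = {"s0": ["s1", "s2"], "s1": ["s3"], "s2": ["s3"], "s3": []}
order = topological_sorting(example)
```

The recursion keeps the sketch short; for very large prediction graphs an explicit stack would avoid Python's recursion limit. The function assumes its input is acyclic and does not check for back edges.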

Given the topological sorting of the prediction graph and the fact that the prediction graph is a DAG, we can now compute a longest path through the graph. Note that a longest path through the prediction graph is not necessarily unique: several paths may share the maximal length.

For each node we remember the predecessor node and the maximal length of the path to the node that has been seen so far. Initially every node is associated with itself as predecessor and zero length. In topological order of the nodes we then update this information stepwise. If the current node extends a path to one of its successors beyond the length recorded there so far, we update the successor's maximal path length and its predecessor node. After having visited all nodes, we construct a longest path.

First we choose the node with the maximum associated path length. To construct the longest path we follow the predecessors until we reach one of the initial hardware states. Algorithm 3.2 implements this computation in pseudo-code [22]. The algorithm returns a sequence of edges that describes a worst-case path through the prediction graph. For the prediction graph shown in Figure 3.5, the longest path algorithm computes the path $P = [\,(\hat{s}_0, \hat{s}_2), (\hat{s}_2, \hat{s}_5), (\hat{s}_5, \hat{s}_6), (\hat{s}_6, \hat{s}_7)\,]$.

Algorithm 3.2 Computation of a Longest Path

function LongestPath
    input   Prediction graph P̂_G = (Ŝ, Ê)
    output  Longest path P
begin
    Map Cost ← { ŝ → 0 | ŝ ∈ Ŝ }
    Map Predecessor ← { ŝ → ŝ | ŝ ∈ Ŝ }
    Vector T ← TopologicalSorting(P̂_G)
    foreach ŝ ∈ T
        c ← Cost(ŝ)
        foreach (ŝ, t̂) ∈ Ê
            c′ ← Cost(t̂)
            if c + 1 > c′ then
                Cost ← Cost[ t̂ → c + 1 ]
                Predecessor ← Predecessor[ t̂ → ŝ ]
            end if
        end foreach
    end foreach
    P ← []
    choose ŝ ∈ Ŝ where Cost(ŝ) ≥ Cost(t̂) for all t̂ ∈ Ŝ
    repeat
        t̂ = Predecessor(ŝ)
        Prepend(P, (t̂, ŝ))
        ŝ = t̂
    until ŝ ∈ θ
end
end function
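The longest-path computation can also be sketched in Python. The sketch below follows the cost and predecessor bookkeeping of Algorithm 3.2 under several assumptions: states are plain strings, the graph is passed as a node list and an edge list, and the edge set for Figure 3.5 is taken only from the edges that the text names in the paths $P$ and $P'$ (the figure may contain more). The function name `longest_path` is hypothetical, and the final loop is a `while` loop rather than repeat-until so that a graph without edges yields an empty path.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def longest_path(nodes: List[Hashable], edges: Iterable[Edge],
                 theta: Set[Hashable]) -> List[Edge]:
    """Return one longest path (as a list of edges) through a DAG."""
    succ: Dict[Hashable, List[Hashable]] = {s: [] for s in nodes}
    for s, t in edges:
        succ[s].append(t)

    # Topological order via depth-first search (reverse finishing order).
    visited: Set[Hashable] = set()
    post_order: List[Hashable] = []

    def visit(s: Hashable) -> None:
        visited.add(s)
        for t in succ[s]:
            if t not in visited:
                visit(t)
        post_order.append(s)

    for s in nodes:
        if s not in visited:
            visit(s)

    # Every node starts with cost 0 and itself as predecessor.
    cost = {s: 0 for s in nodes}
    predecessor = {s: s for s in nodes}
    for s in reversed(post_order):
        for t in succ[s]:
            if cost[s] + 1 > cost[t]:
                cost[t] = cost[s] + 1
                predecessor[t] = s

    # Start at a node of maximal cost and walk back to an initial state.
    s = max(nodes, key=cost.__getitem__)
    path: List[Edge] = []
    while s not in theta and predecessor[s] != s:
        path.insert(0, (predecessor[s], s))
        s = predecessor[s]
    return path


# Assumed edge set: only the edges named in the paths P and P' of the text.
NODES = [f"s{i}" for i in range(8)]
EDGES = [("s0", "s1"), ("s0", "s2"), ("s1", "s3"), ("s3", "s4"),
         ("s2", "s5"), ("s5", "s6"), ("s6", "s7")]
worst_case = longest_path(NODES, EDGES, theta={"s0"})
```

With this assumed edge set, `worst_case` contains the four edges of the path $P$ given in the text, so the bound is four cycles.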

Reconsider the prediction graph shown in Figure 3.5. Starting with the abstract hardware state $\hat{s}_0$, the abstract state transition distinguishes between the successor states $\hat{s}_1$ and $\hat{s}_2$. For example, the transition $(\hat{s}_0, \hat{s}_1)$ could represent an initial cache miss, i.e., the local worst-case (LWC), whereas the state transition $(\hat{s}_0, \hat{s}_2)$ corresponds to assuming an initial cache hit (i.e., non-LWC). Because the edge $(\hat{s}_0, \hat{s}_2)$ is part of the longest path $P$ and the opposing path $P' = [\,(\hat{s}_0, \hat{s}_1), (\hat{s}_1, \hat{s}_3), (\hat{s}_3, \hat{s}_4)\,]$ is shorter, the abstract simulation would reveal an instance of a timing anomaly.

To detect a timing anomaly we need to be able to identify whether an outcome of a split in the abstract state transition corresponds to the local worst-case. For example, if it is unknown whether a memory reference hits or misses the cache, we naturally identify the cache miss as the local worst-case. In many such situations we have an intuitive understanding about which decision is to be considered the local worst-case. For some others the local worst-case is not easily identified.
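The comparison made for Figure 3.5 can be sketched as a check on the prediction graph. The criterion used here is a simplification read off that example, not a formal definition: a split is flagged when some successor other than the local worst-case leads to a strictly longer remaining path. The labeling of the local worst-case successor is supplied by the caller, and the names `anomalous_splits` and `lwc`, as well as the edge set, are assumptions for illustration.

```python
from typing import Callable, Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # node -> successor nodes (a DAG)


def remaining_length(succ: Graph) -> Callable[[Hashable], int]:
    """Return a function giving the longest remaining path length from a node."""
    memo: Dict[Hashable, int] = {}

    def longest_from(node: Hashable) -> int:
        if node not in memo:
            memo[node] = max(
                (1 + longest_from(t) for t in succ.get(node, [])), default=0
            )
        return memo[node]

    return longest_from


def anomalous_splits(succ: Graph, lwc: Dict[Hashable, Hashable]) -> List[Hashable]:
    """List the split states whose non-LWC outcome is globally worse.

    `lwc` maps each split state to the successor regarded as its local
    worst-case (for example, the cache-miss successor).
    """
    longest_from = remaining_length(succ)
    flagged = []
    for state, worst in lwc.items():
        others = [t for t in succ[state] if t != worst]
        if any(longest_from(t) > longest_from(worst) for t in others):
            flagged.append(state)
    return flagged


# Assumed edge set for Figure 3.5, taken from the paths P and P' in the text.
FIGURE = {"s0": ["s1", "s2"], "s1": ["s3"], "s3": ["s4"], "s4": [],
          "s2": ["s5"], "s5": ["s6"], "s6": ["s7"], "s7": []}
flagged = anomalous_splits(FIGURE, lwc={"s0": "s1"})
```

Here the cache-miss successor `s1` has two remaining transitions while the cache-hit successor `s2` has three, so the split at `s0` is flagged. If `s2` were labeled as the local worst-case instead, nothing would be flagged.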

In document Static timing analysis tool validation in the presence of timing anomalies (Page 53-57)