
Timing Analysis

3.2 Architectural Analysis

3.2.5 Challenges for Static Analysis

The degree of non-determinism that is observed during abstract simulation strongly depends on the analyzed hardware. The complexity of a static timing analysis is directly related to the possible amount of non-determinism. Even if a hardware feature is perfectly well analyzable in isolation (e.g., a cache using the LRU replacement strategy), the combination of two such features might lead to a very costly or imprecise analysis. As a consequence, we observe an evolution of hardware states that would never occur during any real execution (i.e., analysis-induced non-determinism).

Take the Freescale MPC5554 hardware architecture as an example (see Section 6.1.5 on page 95). For the sake of simplicity we assume that the cache is updated using the LRU cache replacement policy.⁷ LRU caches are perfectly well analyzable and proven to be free of cache-related timing anomalies or domino effects [4], i.e., a cache hit cannot lead to a cache state in which more subsequent cache misses can occur for the same access sequence. Furthermore, the processor comprises an on-chip FLASH memory that offers a two-line read buffer to cache previously accessed FLASH pages. The size of a FLASH page corresponds to the size of a cache line. Both the FLASH memory read buffer and the LRU instruction cache are nicely analyzable in isolation. However, combining both analyses leads to a loss of information, because the structure of the combined analysis is unfavorable for the hierarchical layout of the MPC5554 hardware architecture.

⁷ The actual hardware uses a pseudo-round-robin replacement strategy with a global set counter.

A code fetch that targets the FLASH memory can either be cached or not. In case the requested instructions are already available in the instruction cache, the FLASH read buffer state will not be updated. Hence, we will lose any information about the read buffer state after a potential state join. After the state join the analysis is unable to tell whether an access to the same memory reference hits the FLASH read buffer. Figure 3.7 depicts this issue in detail.

[Figure 3.7 (state graph): the initial node (⊤, ⊤) has three outgoing transitions, each annotated with a time label t and labeled Cache Hit, Cache Miss / Buffer Hit, and Cache Miss / Buffer Miss. The successor nodes are ({a}, ⊤), ({a}, {a}), and ({a}, {a}); they are joined into the final node ({a}, ⊤).]

Figure 3.7: Non-determinism in static analysis: evolution of the abstract pipeline state for an access to the internal FLASH memory. Each node shows a pair comprising an abstract cache state (left-hand side) and an abstract FLASH read buffer state (right-hand side). Initially we have no information about the state of either the cache or the FLASH read buffer, i.e., (⊤, ⊤). After the access is complete, the analysis knows that a is definitely contained in the cache, but is unsure about the contents of the read buffer.
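The information loss at the join can be reproduced with a small sketch. It is not the analysis described in this work, only a toy model under stated assumptions: each component state is a set of lines that are guaranteed to be present, the unknown state ⊤ is modelled as the empty set, capacity and eviction are ignored, and the names `TOP`, `access`, and `join` are hypothetical.

```python
from functools import reduce

# Assumption: "nothing is guaranteed to be present" stands in for the unknown state.
TOP = frozenset()

def access(state, line):
    """Return the abstract successors of one fetch of `line`, keyed by case."""
    cache, buf = state
    return {
        # On a cache hit the FLASH read buffer is not touched.
        "cache hit": (cache | {line}, buf),
        # On a cache miss the line ends up in both the cache and the buffer.
        "cache miss, buffer hit": (cache | {line}, buf | {line}),
        "cache miss, buffer miss": (cache | {line}, buf | {line}),
    }

def join(s, t):
    """Component-wise join: keep only what both states guarantee."""
    return (s[0] & t[0], s[1] & t[1])

successors = access((TOP, TOP), "a")
joined = reduce(join, successors.values())
print(joined)
```

In this model the joined state still guarantees that `a` is in the cache, while the buffer component falls back to `TOP` because the cache-hit successor never updated it.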

This problem is common to static analyses for architectures comprising components where, under some of the assumptions the analysis has to consider, an update of one system component does not affect the state of the other involved components. As long as the state of one such component is not precisely known, any information about other components is inevitably lost at each join point. To avoid such loss of information, a static analysis would need to enumerate all possible states, which is usually infeasible in practice.
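To give a feeling for why enumeration does not scale, the following sketch tracks every successor separately instead of joining. It is a deliberately simplified, hypothetical model: component states are sets of lines guaranteed to be present, every fetch may hit or miss the cache, and buffer capacity and eviction are ignored.

```python
def step(states, line):
    """Apply one fetch of `line` to every tracked state, without joining."""
    out = set()
    for cache, buf in states:
        out.add((cache | {line}, buf))           # cache hit: buffer untouched
        out.add((cache | {line}, buf | {line}))  # cache miss: buffer refilled
    return out

# Start from a single state in which nothing is known about either component.
states = {(frozenset(), frozenset())}
sizes = []
for line in "abcd":
    states = step(states, line)
    sizes.append(len(states))

print(sizes)  # [2, 4, 8, 16]
```

Under these assumptions the number of tracked states doubles with every fetch of a new line, which is the cost the join is meant to avoid.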

Consider the abstract state automaton $\hat{A} = (\hat{S}, \tau_{abs})$ that provides an abstract model of a concrete processor. For each component of the real processor (functional units, caches, bus interconnects, etc.) there exists a corresponding counterpart in the abstract model encoded by $\hat{A}$. Hence, the abstract state space can be understood as a composition of abstract component states, i.e., $\hat{S} = \hat{S}_{C_1} \times \cdots \times \hat{S}_{C_n}$, where $\hat{S}_{C_i}$ is the abstract state space for the component $C_i$ with $1 \le i \le n$, and $n \in \mathbb{N}$ is the number of processor components.

An abstract state transition does not necessarily update every abstract component state. Which component states are affected strongly depends on the interaction between individual hardware components [42].

To discover the problem described above in the abstract state space, we first need to identify the modified components. For this purpose we define the state transition difference.

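Definition 3.15 (State Transition Difference)

Let $\hat{A} = (\hat{S}, \tau_{abs})$ be an abstract state automaton, where $\hat{S}$ is the composition of abstract component states, i.e., $\hat{S} = \hat{S}_{C_1} \times \cdots \times \hat{S}_{C_n}$ with $n \in \mathbb{N}$. The state transition difference $\tau_{diff} : \hat{S} \times \hat{S} \to \mathbb{B}^n$ is defined for $\hat{s} = (\hat{c}_1, \ldots, \hat{c}_n)$ and $\hat{t} = (\hat{c}'_1, \ldots, \hat{c}'_n) \in \tau_{abs}(\hat{s})$ as follows:

$$\tau_{diff}(\hat{s}, \hat{t}) := (b_1, \ldots, b_n) \quad \text{with} \quad b_i := \begin{cases} 1 & \text{if } \hat{c}_i \neq \hat{c}'_i \\ 0 & \text{otherwise} \end{cases}$$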
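The definition translates almost literally into code. The sketch below assumes that an abstract state is represented as a tuple of comparable component states; the function name `tau_diff` and the example states (taken from the Figure 3.7 scenario, with the unknown state modelled as an empty set) are illustrative choices, not part of the formal development.

```python
def tau_diff(s, t):
    """Return (b_1, ..., b_n) with b_i = 1 iff component i differs between s and t."""
    if len(s) != len(t):
        raise ValueError("both states must have the same number of components")
    return tuple(int(c != c_prime) for c, c_prime in zip(s, t))

# Hypothetical two-component states: (cache, FLASH read buffer).
s = (frozenset(), frozenset())
after_hit = (frozenset({"a"}), frozenset())
after_miss = (frozenset({"a"}), frozenset({"a"}))

print(tau_diff(s, after_hit))   # (1, 0)
print(tau_diff(s, after_miss))  # (1, 1)
```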

By means of the state transition difference we can observe which abstract components a state transition affects. If for a given abstract state $\hat{s} \in \hat{S}$ all state transitions $(\hat{s}, \hat{t})$ with $\hat{t} \in \tau_{abs}(\hat{s})$ affect the same hardware components, the state transitions originating at $\hat{s}$ are inclusive. Otherwise the state transitions originating at $\hat{s}$ are non-inclusive.

Definition 3.16 (Inclusive State Transition)

Let $\hat{A} = (\hat{S}, \tau_{abs})$ be an abstract state automaton, where $\hat{S}$ is the composition of abstract component states. Let $\hat{s} \in \hat{S}$ be an abstract state. The state transitions originating at $\hat{s}$ are inclusive iff for all $\hat{t}, \hat{u} \in \tau_{abs}(\hat{s})$ it holds that $\tau_{diff}(\hat{s}, \hat{t}) = \tau_{diff}(\hat{s}, \hat{u})$. Otherwise the state transitions originating at $\hat{s}$ are non-inclusive.
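A possible check for this property is sketched below. It assumes that abstract states are tuples of comparable component states and that the successors of a state are available as a finite collection; `tau_diff` and `is_inclusive` are hypothetical helper names, and the example states model the access from Figure 3.7 with the unknown state represented as an empty set.

```python
def tau_diff(s, t):
    """Bit vector marking the components that differ between s and t."""
    return tuple(int(c != c_prime) for c, c_prime in zip(s, t))

def is_inclusive(s, successors):
    """True iff all transitions leaving s modify the same set of components."""
    return len({tau_diff(s, t) for t in successors}) <= 1

# Hypothetical two-component states: (cache, FLASH read buffer).
s = (frozenset(), frozenset())
hit = (frozenset({"a"}), frozenset())
miss = (frozenset({"a"}), frozenset({"a"}))

print(is_inclusive(s, [hit, miss, miss]))  # False
print(is_inclusive(s, [miss, miss]))       # True
```

The first call mirrors the situation in the figure: the cache-hit transition leaves the buffer untouched while the cache-miss transitions update it, so the difference vectors disagree.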

Non-inclusive state transitions imply information loss in the abstract state space due to later abstract state joins. Reconsider the example in Figure 3.7. Here the state transitions originating at the initial hardware state are non-inclusive. Due to the state join at the final hardware state the analysis loses all information about the second hardware component (i.e., the FLASH read buffer).

This phenomenon is not uncommon in static timing analysis, especially in the analysis of complex hardware architectures. Non-inclusive state transitions cannot always be avoided, either due to limitations of the analysis tool chain or simply because of the hardware architecture. The following list comprises common cases of non-inclusive state transitions:

value analysis precision: The precision of the value analysis has a major impact on the performance of a static timing analysis. For instance, if the target of a memory access cannot be precisely identified, the analysis has to consider all possible memories as potential targets. If updating the memory state has an impact on the timing behavior of subsequent memory accesses, non-inclusive state transitions cannot be avoided. Processor-to-memory bus clock jitter and read buffers (see above) are just two examples.

virtual memory: A related problem occurs in the static analysis of virtual memory. Typically a translation lookaside buffer (TLB) is employed to cache the virtual-to-physical address translation attributes (i.e., page table entries). Upon a TLB miss an interrupt is triggered and the corresponding interrupt service routine (re)loads the requested page table entry. If the state of the TLB is initially unknown, the state of other (involved) hardware components remains (partially) unknown until the TLB contents are precisely known. Naturally, this greatly impairs the precision and resource consumption of a static timing analysis.

non-inclusive caches: This phenomenon has already been illustrated in Figure 3.7. Cache-related non-inclusive state transitions are unavoidable if assuming a cache hit does not affect the same system components that are updated upon a cache miss.

For example, consider a Freescale MPC755 processor that is connected to a DRAM controller. As long as the state of the processor’s L1 cache is unknown, a static analysis cannot gather any information about the status of the DRAM controller and is thus unable to predict any DRAM page hits.

This phenomenon cannot be avoided completely because either the value analysis precision cannot be further improved, or a modification of the hardware architecture is not possible at all, or, if possible, would incur an unacceptable degradation of performance. However, some hardware architectures, like the Freescale MPC7448, feature inclusive L1 and L2 caches. Any modification of the L1 cache automatically implies a state update of the (outer) L2 cache. For such architectures, the static analysis will not lose information about the L2 cache state after state joins due to an unknown state of the L1 cache.