Case studies - AUTOMATIC COMPOSITIONAL REASONING FOR SCAL-

CHAPTER 7 AUTOMATIC COMPOSITIONAL REASONING FOR SCAL-

7.6 Case studies

We present the following case studies in order to highlight the different aspects of our technique:

1. Ripple carry adder (Chapter 5) 2. MIMO Systems (Chapter 4) 3. FIR filter

4. OR1200 processor (Chapter 5)

With the exception of the ripple carry adder, the case studies that we consider are all sequential designs. For these sequential designs, we wish to compute the steady-state probabilities with respect to performance, i.e., we consider Type A properties described in Section 3.4.

The results presented in our tables can be interpreted as follows. We terminate ex- periments for which verification does not complete within 10 hours. In such cases, we verify the accuracy of our computed invariants by comparing them with high-confidence estimates that are obtained using a sufficiently large number of simulations. In the table for each case study, we list both the number of states (or input vectors) in the largest component model and the number of value-based cases (i.e., verification instances).

For sequential, we use our approach to trade off the complexity in space with that in time. We are now able to compute probabilistic invariants for models that were previously “uncheckable” since the model checking tool ran out of memory.

7.6.1 Ripple carry adder

We wish to compute the probability that the delay at the MSB of a ripple carry adder output is less than a specified timing constraint (Definition 14). We model this delay invariant approximately (Section 5.2) as a function of carry bits that ripple through the

design, denoted by ripples, which in turn can be expressed as a function of adder inputs. For a 32-bit adder, ripples can take a value between 0 and 31 depending on the inputs. The 32-bit adder will have 264 _{possible input vectors and is therefore, too large for efficient,} formal probabilistic analysis (Section 3.3). For a 64-bit adder, the number of possible input vectors can be as high as 2128 _{(approximately 10}40_).

Since we model delay as a function of ripples, we can compute the delay invariant by determining the probability distribution of ripples. We decompose the 32-bit adder into four 8-bit blocks. While verifying Block 4 (corresponding to the higher bits), we will have to consider all possible values of the lower 24 bits of the adder output. However, Block 4 only requires information about the probability distribution of ripples through the Blocks 1 to 3, i.e., the lower 24 bits. Since there are only 24 possible numeric values for the ripples through Blocks 1 to 3, only 24 cases need to be considered. We apply this approach recursively to compute the ripples through Blocks 1 and 2. At any point, probabilistic verification is performed only on an 8-bit adder block.

Table 7.1: Delay invariants at the MSB of a ripple carry adder. Without decomposition With decomposition

Input Time Invariant Split Input Cases Time Invariant

vectors (sec) vectors (sec)

16-bit 232 _2.141 _0.9990 2x8-bit 216 9 0.31 0.9990 4x4-bit 28 ₂₇ _0.93 _0.9990 32-bit 264 _>10hrs _- 2x16-bit 232 ₁₇ _36.40 _0.9688 4x8-bit 216 ₅₁ _1.63 _0.9688 8x4-bit 28 ₁₁₉ _3.81 _0.9688 64-bit 2128 _>10hrs _- 4x16-bit 232 ₉₉ _211.96 _0.8730 8x8-bit 216 ₂₃₁ _3.70 _0.8730 8x4-bit 28 ₄₉₅ _7.92 _0.8730

Table 7.1 shows the delay invariants that we compute using our technique for adders of different sizes. We consider several possible splits of the adder designs into smaller equally-sized blocks. We observe that there exists an optimum point for minimizing the total runtime while trading off between the size of the component model (i.e., number of input vectors) and the total number of verification instances.

7.6.2 FIR Filter

We consider a system where a bit x[n] transmitted over a channel [14] at time step n can be corrupted by fractions of the bits transmitted in the three previous time steps. Moreover, the channel can also introduce statistical variation, called noise, into the data.

Therefore the received data y[n], at time step n, can be modeled as y[n] = 3 X i=0 aix[n − i] + η[n] (7.14)

where ai represents the fractions of the previous bits that corrupt the data. The noise is represented by η[n] which is commonly modeled as a Gaussian random variable and y[n] is represented using a variable with N bits.

The received data is processed by a digital hardware block, called an equalizer [89] which is a finite impulse response (FIR) filter whose output is represented as

z[n] = T aps−1_X

i=0

ciy[n − i] (7.15)

where ci are called the tap coefficients and T aps is the total number of such coefficients. The FIR output z[n] can then be used to compute bx[n] which is an estimate of the transmitted bit.

We wish to compute the BER that is defined as the probability that the transmitted bit, x[n] is different from the decoded bit bx[n]. We construct an RTL DTMC (Section 3.4.1) with x[n − i] and y[n − i] as the state variables, denoted by xi and yi respectively. We are interested in computing the following invariant:

Γ = P [_{bx[n] 6= x[n]]}

= P [f (y0, y1, ...yT aps−1) 6= x0] (7.16) where f is the function defined in Equation 7.15. We observe that f is additively separable. Therefore, we can rewrite Equation 7.16 as

Γ = P [f (f1(y0, .., yT aps/2−1), f2(yT aps/2, .., yT aps−1)) 6= x0] (7.17) where f1 and f2 are given by

f1(y0, .., yT aps/2−1) = T aps/2−1_X i=0 ciy[n − i] (7.18) f2(yT aps/2, .., yT aps−1) = T aps−1_X i=T aps/2 ciy[n − i] (7.19)

We split the observables into two sets: {y0, .., yT aps/2−1} and {yT aps/2, .., yT aps−1}. We need to consider all 2N T aps/2 _{numeric values of the observables in {y}

T aps/2, .., yT aps−1}, where each yi is represented using N bits. However, we are interested only in the values of f2(yT aps/2, .., yT aps−1). We assume that the output of every addition in the filter is truncated to N bits. Therefore, the number of cases we need to consider is reduced to 2N_.

In Table 7.2, we show the BER invariants computed for FIR filters with different number of data bits (N) and taps (T aps). We observe that the number of cases depends only on the number of data bits and not on the number of taps.

Table 7.2: BER of an FIR Filter. We use O.o.M to denote Out of Memory. Without decomposition With decomposition

N Taps DTMC Time Invariant Split DTMC Cases Time Invariant

(bits) states states

6 6 > 1012 _>10hrs _- _2x3- tap 222 ₁₂₈ _{189.44s 5.02x10}−3 6 8 O.o.M. - - 2x4- tap 228 ₁₂₈ _{234.24s 9.67x10}−4 8 6 O.o.M. - - 2x3- tap 228 ₅₁₂ _{2713.6s 3.36x10}−3

7.6.3 MIMO systems

We wish to obtain the BER performance of the combined operation of a 1x4 MIMO detector and a Viterbi decoder (Chapter 4). Let x be the actual transmitted data bit. Let x′ and x′′ _{be the bits estimated by the MIMO detector and Viterbi decoder, respectively. We} express the BER invariant of the combined system as a function of x, x′ _{and x}′′ _{which can} in turn be written as a function of the state variables of the individual components. This function is fairly complex, and therefore we omit its description here. Direct application of our SHARPE methodology (Section 3.4) on the combined system is infeasible due to the large state space of the corresponding RTL DTMC model.

Table 7.3: Individual BER values for a MIMO detector and Viterbi decoder.

States States Time Invariant

(Original model) (Reduced model) (sec)

MIMO 1x4 524, 288 1, 320 53.27 1.08x10−5

We choose to perform case-splitting based on the numeric values of x and x′_{. In} other words, we model check the Viterbi decoder, by conditioning on whether the MIMO detector output is correct or not. We need to consider four cases of the form (x = i, x′ _{= j),} where i,j ∈ {0, 1}. However, the occurrence of bit errors at the MIMO detector output is independent on the values of the decoded and transmitted bits. Therefore, due to symmetry, we only need to consider two cases: (x = 0, x′ _{= 0) and (x = 0, x}′ _{= 1). In} Table 7.4, we show the BER invariant computed using our methodology.

Table 7.4: BER for a MIMO detector operating in conjunction with a Viterbi decoder. Without decomposition With decomposition

States Time Invariant States Cases Time Invariant (sec)

MIMO 1x4

O.o.M. - - 8, 505, 363 2 493.45 0.2437

+ Viterbi

7.6.4 OR1200 processor

In the control logic of processors, even single bit flips that occur due to soft errors [9] can be catastrophic. Therefore, it is desirable to have an estimate of the soft error rate (SER) which is defined to be the probability of occurrence of soft errors. In this case study, we consider an OR1200 processor, an open-source embedded RISC processor. We wish to compute the SER invariant for the RTL variable sel a that determines the address which one set of data values are read from or written to.

Through static analysis, we determine that sel a depends on whether the address stored in six bits of the fetched instruction if insn[21:16] is equal to the address stored in rf addrw. It is reasonable to assume that SER depends on the values of the RTL variables for stall (if stall), freeze (if f reeze) and pipeline flush (f lushpipe). Therefore, we define the SER invariant as a function f (if stall, if f reeze, f lushpipe, if insn[21:16], rf addrw).

Without decomposition, PRISM is unable to construct the required DTMC model. Therefore, we perform case-splitting based on the numeric values of if insn[21:16] and rf addrw. Each of these are 6-bit variables, and therefore we need to consider 212 cases. From static analysis we identify that the numeric values of if stall,if f reeze and f lushpipe depend only on whether rf addrw is equal to 0 or not. Therefore, we model rf addrw to have either value 0 or a value i (i 6= 0) that represents the remaining 63 non- zero possibilities. We are only interested in checking whether if insn[21:16]=rf addrw. Therefore, we consider only three values for if insn[21 : 16]: 0, i (i 6= 0) and j (j 6= 0,

j 6= i). We consider 3x2=6 combinations of numeric values for if insn[21:16] and rf addrw. A similar data abstraction technique has been used in [38]. Table 7.5 shows the SER computed using our decomposition methodology.

Table 7.5: SER for the sel a variable in the control logic of an OR1200 processor Without decomposition With decomposition

States Time Invariant States Cases Time Invariant (sec)

sel a O.o.M. - - 3, 145, 729 6 16.48 1.0 x 10−8

In general, control logic of processors can depend on datapath variables . Considering a case-splitting based on the values of such variables can potentially lead to a very large number of cases. However, data-dependant control logic typically is concerned with only a few possible data values for a particular case. Therefore, the number of cases to be verified is small in practice.

7.7 Chapter summary

In this chapter, we have presented an automatic decomposition technique to scale probabilistic verification for hardware designs. We have shown that our technique can be used to significantly improve the scalability of our SHARPE methodology by demonstrating it on several case studies.

In document Statistical guarantees of performance for RTL designs (Page 126-132)