CHAPTER 3 EFFICIENT VALIDATION INPUT GENERATION IN

3.8 Experimental Evaluation of Method II

Following the flow shown in Figure 3.7, we implemented explored symbolic state caching and all of the optimization strategies in STAR using C++. The implementation interacts with the VCS simulator through the Direct Programming Interface (DPI) and with the Yices [57] constraint solver through its C library interface. All of the following experiments were performed on a Linux machine with four 2.67 GHz Intel i5 processor cores and 16 GB of memory. We present experimental results for RTL models from ITC99 and OpenRISC1200 [1]. The OR1200-0/1/2 designs are the instruction cache controller, the data cache controller, and the Wishbone bus interface, respectively.¹

3.8.1 Comparison with Original STAR

The second experiment, whose results are shown in Table 3.3, compared the enhanced STAR with the original STAR algorithm. We set the maximum runtime to one hour.

¹ We refer to the work in this chapter, which augments STAR with explored symbolic state caching, bitmap encoding of taken branches, and the two optimizations, as the enhanced STAR in the experimental results.

Table 3.3: Comparison between the enhanced STAR introduced in this paper and the original STAR. All runtimes are in seconds. The length of each generated pattern is equal to the unrolled depth. The runtime limit is set to one hour. The original STAR is not scalable for most designs.

                   Original STAR                  Enhanced STAR
Benchmark  Cycles  Runtime   Num. of   Repeated   Runtime   Num. of   Repeated
                             patterns  branches             patterns  branches
b01        10      0.76s     512       563        0.07s     32        35
b06        10      175.77s   131072    205156     0.45s     362       566
b10        10      237.01s   70593     76134      0.80s     248       236
b11        50      >3600s    -         -          3.66s     367       1684
b11        100     >3600s    -         -          16.47s    745       6941
b14        10      >3600s    -         -          1020.46s  61836     15065
or1200-0   50      >3600s    -         -          20.28s    226       2035
or1200-0   100     >3600s    -         -          88.94s    375       7532
or1200-1   100     >3600s    -         -          11.22s    87        957
or1200-2   10      >3600s    -         -          815.40s   15596     67941

It can be observed that the original STAR suffers from path explosion for most of the designs, as shown by runtimes that exceed the one-hour limit. Because the runtime of the original STAR increases exponentially with the unrolled depth, only very small unrolled depths could be used for it in this experiment. The original STAR is therefore not scalable to large, practical designs. For the same circuits, however, the runtime of the enhanced STAR is much smaller and does not increase exponentially with the number of unrolled cycles. The average number of branches repeated by the generated test set is used as an approximate measure of how often the same design function is covered repeatedly. The number of repeated branches is very high for the original STAR, whereas for the enhanced STAR the average number of repetitively covered branches is less than 6% of that of the original STAR.
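As a rough check of the figures in Table 3.3, the following Python sketch recomputes the speedup and the repeated-branch ratio for the three benchmarks that the original STAR completed within the time limit (b01, b06 and b10). The variable names are illustrative, and the aggregation used for the overall ratio (totals over the three benchmarks) is an assumption, since the text does not state how the average is formed.

```python
# Rows of Table 3.3 for which the original STAR finished within one hour.
# Each entry: (original runtime [s], original repeated branches,
#              enhanced runtime [s], enhanced repeated branches)
table_3_3 = {
    "b01": (0.76, 563, 0.07, 35),
    "b06": (175.77, 205156, 0.45, 566),
    "b10": (237.01, 76134, 0.80, 236),
}

speedup = {}
for name, (t_orig, rep_orig, t_enh, rep_enh) in table_3_3.items():
    speedup[name] = t_orig / t_enh
    print(f"{name}: speedup {speedup[name]:.1f}x, "
          f"repeated-branch ratio {100 * rep_enh / rep_orig:.2f}%")

# Overall ratio (assumed aggregation: totals over the three benchmarks).
total_orig = sum(row[1] for row in table_3_3.values())
total_enh = sum(row[3] for row in table_3_3.values())
aggregate_ratio = total_enh / total_orig
print(f"aggregate repeated-branch ratio: {100 * aggregate_ratio:.2f}%")
```

Under this aggregation the ratio stays well below the 6% bound quoted above. Taken individually, b01 is the least favourable case, at slightly above 6%.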

3.8.2 Optimization Effectiveness of Enhanced STAR

The “Optimization detail” columns in Table 3.4 show the effectiveness of the two optimizations, UD chain slicing and local conflict resolution. The effect of explored symbolic state caching is shown in the “Caching” column. UD chain slicing reduces the number of constraints sent to the SMT solver, since the definitions of unused variables are excluded from the current constraints. Local conflict resolution reduces the number of calls to the SMT solver, since it uses static analysis to check for conflicts in the constraint stack instead of sending the constraints to the SMT solver.

Table 3.4: The “UD chain slicing” column gives the percentage by which the number of constraints is reduced. The “Local conflict resolution” column gives the number of conflicts detected during the mutation of constraints. The “Caching” column gives the number of symbolic states detected as already explored.

                   Optimization detail
Benchmark  Cycles  UD chain slicing  Local conflict resolution  Caching
b01        10      56.7%             31                         24
b10        15      43.4%             224                        515
b11        100     5.24%             1245                       720
or1200-0   100     0.2%              759                        370
or1200-1   100     0.22%             280                        86
or1200-2   10      23.4%             178                        11850

Table 3.5: The coverage, running time, number of patterns and repeated branches reported by the enhanced STAR. The tests generated by our enhanced STAR have high structural coverage as well as functional coverage. The enhanced STAR is also compared with the constraint-based random test generation method. The tests generated by the enhanced STAR have much higher coverage than the tests generated by the constraint-based random test generation method.

           STAR with explored symbolic state caching  (Constraint-based) random test generation
Benchmark  Cycles   Bran Cov  Path Cov  Assert Cov    Cycles      Bran Cov  Path Cov  Assert Cov
b01        10       94.44%    94.44%    95.00%        10(x1000)   94.44%    94.44%    95.00%
b06        10       94.12%    93.10%    100%          10(x1000)   94.12%    93.10%    100.00%
b10        10       87.10%    72.73%    5.85%         10(x1000)   83.87%    69.70%    1.34%
b10        50       96.77%    81.82%    96.15%        50(x1000)   93.55%    78.79%    7.17%
b11        50       91.30%    91.30%    12.31%        10(x1000)   82.61%    82.61%    1.01%
b11        100      91.30%    91.30%    25.88%        100(x1000)  82.61%    82.61%    1.01%
b11        150      91.30%    91.30%    35.93%        150(x1000)  82.61%    82.61%    1.51%
b14        10       83.50%    45%       100%          10(x1000)   44.17%    0.30%     66.67%
or1200-0   50       93.75%    74.07%    97.89%        20000       12.50%    7.41%     17.61%
or1200-0   100      93.75%    77.78%    100%          -           -         -         -
or1200-1   50       96.55%    76.74%    72.55%        20000       12.00%    5.56%     0%
or1200-1   100      96.55%    79.07%    92.16%        -           -         -         -
or1200-2   10       100%      100%      100%          20000       36.14%    50%       38.57%

UD chain slicing reduces the number of constraints by between 0.2% and 56.7%. As the “Local conflict resolution” column shows, this optimization finds local constraint conflicts and thereby avoids unnecessary calls to the SMT solver. For explored symbolic state caching, between 24 and 11850 symbolic states, depending on the design, were identified as having been explored before.
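The two optimizations discussed above can be illustrated with a minimal Python sketch. It is not the implementation used in the enhanced STAR (which is written in C++). The function names, the representation of definitions as (variable, used variables) pairs, and the representation of constraints as (condition text, polarity) literals are all assumptions made for illustration. In particular, the conflict check below only catches syntactically identical conditions with opposite polarity, which is far simpler than a real static analysis.

```python
def ud_chain_slice(definitions, target_vars):
    """Keep only the definitions that the target variables depend on.

    definitions: list of (defined_var, used_vars) pairs in program order.
    target_vars: variables appearing in the branch condition being solved.
    """
    needed = set(target_vars)
    kept = []
    # Walk backwards so that the latest definition of a variable is found first.
    for var, uses in reversed(definitions):
        if var in needed:
            kept.append((var, uses))
            needed.discard(var)   # this definition is the one that reaches the use
            needed |= set(uses)   # its operands must now be defined as well
    kept.reverse()
    return kept


def has_local_conflict(constraint_stack, candidate):
    """Return True if the candidate literal contradicts a literal on the stack.

    A literal is a (condition_text, polarity) pair. Only syntactically
    identical conditions with opposite polarity are detected.
    """
    condition, polarity = candidate
    return (condition, not polarity) in constraint_stack


# Hypothetical definitions collected along one path.
definitions = [
    ("a", {"in1"}),
    ("b", {"in2"}),
    ("c", {"a"}),
    ("d", {"b", "in3"}),
]
sliced = ud_chain_slice(definitions, {"c"})
reduction = 1 - len(sliced) / len(definitions)

# Hypothetical constraint stack and two mutated candidates.
stack = [("state == 2", True), ("c > 0", False)]
conflict = has_local_conflict(stack, ("c > 0", True))
no_conflict = has_local_conflict(stack, ("d > 0", True))
```

In this toy example only the definitions of `a` and `c` survive the slice, so half of the constraints would not be sent to the solver, and the first candidate is rejected without a solver call.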

3.8.3 Coverage Evaluation

Structural Coverage Evaluation

The first experiment in Table 3.5 shows the coverage achieved by the test patterns generated by the enhanced STAR. The enhanced STAR generates tests that achieve very high structural coverage, provided that the unrolled depth is sufficient.

The unrolled depth is an important parameter for improving coverage in STAR. It can be chosen by design engineers or determined from coverage feedback. Low coverage indicates that the uncovered parts are not reachable within the unrolled design, in which case the unrolled depth can be increased. The b10 and b11 circuits demonstrate this relationship between coverage and unrolled depth. When the unrolled depth increases from 10 to 30, all feasible branches are fully covered. For or1200-2, however, 10 cycles are enough to cover all branches. The running time demonstrates the applicability of the enhanced STAR to practical circuits. There are no memory bottlenecks, even though we store the explored symbolic states in the form of bitmaps.
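To illustrate why storing explored states as bitmaps is cheap, the following Python sketch packs the set of taken branches into a single integer and uses it as a cache key. The class name, the method names and the choice of key (the cycle number together with the branch bitmap) are assumptions for illustration only. The text does not specify exactly what identifies an explored symbolic state.

```python
class ExploredStateCache:
    """Remembers which (cycle, taken-branch bitmap) pairs were already explored."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def encode(taken_branch_ids):
        """Pack branch ids into an integer bitmap: bit i is set if branch i was taken."""
        bitmap = 0
        for branch_id in taken_branch_ids:
            bitmap |= 1 << branch_id
        return bitmap

    def seen_before(self, cycle, taken_branch_ids):
        """Return True if this state was cached earlier; otherwise cache it."""
        key = (cycle, self.encode(taken_branch_ids))
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


cache = ExploredStateCache()
first = cache.seen_before(cycle=3, taken_branch_ids=[0, 2, 5])
# Same set of branches in a different order maps to the same bitmap.
second = cache.seen_before(cycle=3, taken_branch_ids=[5, 0, 2])
# Same branches at a later cycle are treated as a different state.
third = cache.seen_before(cycle=4, taken_branch_ids=[0, 2, 5])
```

Each cached entry costs one bit per branch plus the cycle number, which is consistent with the observation above that caching does not become a memory bottleneck.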

It is worth mentioning that infeasible paths can be identified during path enumeration, which is highly valuable to the verification engineer. In addition, the enhanced STAR can be used to check properties on each path.

Functional Coverage Analysis

We also demonstrate that the input patterns generated by the enhanced STAR provide high functional coverage, even though the entire analysis is performed on the RTL source code. The RTL code structure reflects the design specification, and STAR follows this structure to generate patterns that are meaningful from the perspective of interesting functionality.

We used the assertion generation tool GoldMine [29] to generate assertions for several designs. The tests generated by the enhanced STAR were then applied to each design, and the assertion coverage was measured during concrete execution. Assertions are normally used to check whether the design correctly implements the specification, so the rate of triggered assertions reflects the quality of the simulation patterns. The assertion coverage achieved by the tests generated by the enhanced STAR is reported in the “Assert Cov” (assertion coverage) column of Table 3.5. From these results, we conclude that the enhanced STAR is able to generate tests that comprehensively capture the design specification.

In comparison with the random generation method, the enhanced STAR has the advantage of generating meaningful inputs that cover the possible design behaviors, instead of exhausting all possible input values. The generation process relies on hints from the RTL structure. Unlike the widely used constraint-based random generation method, our enhanced STAR does not require the constraints to be built manually.

3.8.4 Comparison with Constraint-based Random Test Generation Method

In this experiment, we compare the enhanced STAR with the constraint-based random test generation method. We use both methods to generate tests and compare the corresponding structural and functional coverage values. Constraint-based random test generation is widely used in simulation-based verification. It requires the user to express the legal (valid) input requirements as constraints in the verification testbench. The generator then employs a constraint satisfaction technique to solve these constraints and produce input stimuli that satisfy the input requirements. For the OpenRISC design, we adopt the constraint-based random test generation testbench from the OpenCores website [1]. For the ITC99 benchmarks, we generate tests by randomizing the input variables of the RTL designs, since we did not find a published testbench for these designs. We generate 1000 random patterns, with different numbers of cycles, for each ITC99 benchmark.
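The random baseline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input names and bit widths are hypothetical, the legality rule is made up, and illegal vectors are simply redrawn (rejection sampling), whereas an industrial constraint-based generator would use a constraint solver. With no legality predicate, the function reduces to the plain input randomization used for the ITC99 benchmarks.

```python
import random


def generate_random_patterns(num_patterns, cycles, input_widths, is_legal=None, seed=0):
    """Draw random input patterns; each pattern holds one input vector per cycle.

    input_widths: dict mapping input name -> bit width.
    is_legal: optional predicate on a single input vector; illegal vectors are redrawn.
    """
    rng = random.Random(seed)
    patterns = []
    for _ in range(num_patterns):
        pattern = []
        while len(pattern) < cycles:
            vector = {name: rng.getrandbits(width) for name, width in input_widths.items()}
            if is_legal is None or is_legal(vector):
                pattern.append(vector)
        patterns.append(pattern)
    return patterns


# Hypothetical design inputs; the real port names are not given in the text.
widths = {"start": 1, "data": 8}

# Unconstrained randomization: 1000 patterns of 10 cycles each.
plain = generate_random_patterns(1000, 10, widths)


def legal(vector):
    """Made-up legality rule: data must be even whenever start is asserted."""
    return vector["start"] == 0 or vector["data"] % 2 == 0


# Constrained variant: every vector satisfies the legality rule.
constrained = generate_random_patterns(20, 10, widths, is_legal=legal)
```

Because each vector is drawn independently of the design structure, nothing steers such a generator toward branches that are guarded by rare input combinations, which is consistent with the low coverage reported for control-heavy designs.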

We evaluate both the structural and the functional coverage of the two methods. The results in Table 3.5 show that the tests generated by the enhanced STAR achieve coverage at least as high as that of the randomly generated tests on every benchmark, and strictly higher coverage on all benchmarks except b01 and b06, where both methods reach the same values. In particular, for b14, which has many control branches, it is very difficult for random methods to generate high-coverage tests. We also found that the constraint-based random tests repeatedly trigger a small number of paths and fail to cover the whole design, whereas the enhanced STAR is able to generate high-coverage tests.

In document Harmonizing data mining and static analysis to tackle hardware and system level verification (Page 62-67)