A simple way to simulate a circuit is compiled-code simulation [WWX06, BA04]. In compiled-code simulation, the combinational netlist is translated into executable instructions for each node, with all signal states kept as variables. The circuit nodes are then evaluated by executing the assigned operations over their input variables. A node can only be evaluated once all of its input variables have been determined. Hence, the nodes are usually processed in topological order, which is obtained by a levelization pre-process that partitions the nodes into levels ordered by increasing topological distance (depth of the nodes) with respect to the circuit inputs [Wun91].
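The levelization step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited works: the netlist is assumed to be a dictionary `fanin` mapping every node to the list of its direct predecessors, and the function name `levelize` is hypothetical. Nodes without predecessors are treated as circuit inputs at level 0, and every other node is placed one level above its deepest predecessor.

```python
from collections import deque

def levelize(fanin):
    """Return {node: level}. Assumes every node is a key of `fanin`."""
    fanout = {n: [] for n in fanin}
    pending = {n: len(preds) for n, preds in fanin.items()}
    for n, preds in fanin.items():
        for p in preds:
            fanout[p].append(n)

    # Nodes without predecessors are the circuit inputs (level 0).
    level = {n: 0 for n, preds in fanin.items() if not preds}
    queue = deque(level)
    processed = 0
    while queue:
        n = queue.popleft()
        processed += 1
        for s in fanout[n]:
            # A successor lies one level above its deepest predecessor.
            level[s] = max(level.get(s, 0), level[n] + 1)
            pending[s] -= 1
            if pending[s] == 0:  # all predecessors have a final level
                queue.append(s)
    if processed != len(fanin):
        raise ValueError("netlist contains a combinational loop")
    return level

# Example netlist (assumed): y = (a AND b) XOR c
example_fanin = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "y": ["g1", "c"]}
example_levels = levelize(example_fanin)
# a, b, c are at level 0, g1 at level 1, y at level 2
```

Sorting the nodes by the returned level gives one valid topological order for the evaluation.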
A basic flow for the simulation of a test pattern tp ∈ B_2^{|I|} is shown in Algorithm 2.1. First, the input nodes I ⊂ V are assigned their corresponding values of the input pattern. The internal node signals are then computed in order until the outputs are reached. Owing to the levelization, all input dependencies of a node have been resolved by the time it is evaluated. This type of simulation is typically referred to as oblivious simulation, since every node of the circuit is evaluated whenever a new test pattern is applied [BBC94].
2.4 Circuit and Fault Simulation
Algorithm 2.1: Simulation flow of a plain oblivious logic simulation.
Input: netlist G = (V, E), test pattern tp with tp[i] the value of input i ∈ I
Output: values v_n for all n ∈ V
foreach node n ∈ V in topological order do
    if n ∈ I then
        Assign input value v_n := tp[n].
    else
        Fetch values v_1, v_2, ..., v_k of fanin(n) (direct predecessors of n).
        Compute v_n := φ_n(v_1, v_2, ..., v_k).
    end
end
return Stored values v_n of all nodes n ∈ V.
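Algorithm 2.1 maps almost directly onto executable code. The following Python sketch makes some assumptions that are not part of the original text: the netlist is a dictionary `fanin` of predecessor lists, the node functions φ_n are given as callables in a dictionary `func`, and the test pattern is a dictionary keyed by input name. The topological order is taken from the standard-library `graphlib` module instead of an explicit levelization.

```python
from graphlib import TopologicalSorter

def simulate_oblivious(fanin, func, pattern):
    """Evaluate every node once in topological order; return {node: value}."""
    values = {}
    # static_order() yields each node after all of its predecessors.
    for n in TopologicalSorter(fanin).static_order():
        if n in pattern:                       # n is an input node
            values[n] = pattern[n]
        else:                                  # internal node: apply phi_n
            values[n] = func[n](*(values[p] for p in fanin[n]))
    return values

# Example netlist (assumed): y = (a AND b) XOR c
sim_fanin = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "y": ["g1", "c"]}
sim_func = {"g1": lambda x, y: x & y, "y": lambda x, y: x ^ y}

sim_values = simulate_oblivious(sim_fanin, sim_func, {"a": 1, "b": 1, "c": 0})
# g1 evaluates to 1 and y evaluates to 1
```

Every call visits all nodes, regardless of how many inputs differ from the previous pattern, which is the property that gives oblivious simulation its name.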
2.4.1 Event-driven Simulation
Usually, when consecutive test patterns are applied to a circuit, not all of the primary or pseudo-primary inputs of the circuit change, and hence certain signals keep their state. With the oblivious simulation scheme, this causes many unnecessary node evaluations: these nodes do not require recomputation, yet all nodes are always evaluated regardless of the switching activity [BA04, WWX06]. To provide a more efficient evaluation in cases of little switching activity, event-driven simulation approaches have been proposed [Wun91]. In event-driven simulation, evaluations are restricted to nodes with active switching events at their inputs. The evaluation thus only follows the paths of events during simulation and avoids the evaluation of nodes with constant signals.
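The difference to the oblivious scheme can be illustrated with a small zero-delay sketch in Python. The data layout (`fanin`, `func`, `level`, `values`) and the use of a level-ordered priority queue are assumptions made for this illustration and are not taken from the cited simulators. The function updates the stored signal values for a new pattern and returns how many gates were actually re-evaluated.

```python
import heapq

def simulate_event_driven(fanin, func, level, values, new_pattern):
    """Update `values` in place for `new_pattern`; return the evaluation count."""
    fanout = {n: [] for n in fanin}
    for n, preds in fanin.items():
        for p in preds:
            fanout[p].append(n)

    heap, queued = [], set()

    def schedule(n):
        if n not in queued:                    # each gate is evaluated at most once
            queued.add(n)
            heapq.heappush(heap, (level[n], n))

    # Only inputs whose value changes create events.
    for i, v in new_pattern.items():
        if values[i] != v:
            values[i] = v
            for s in fanout[i]:
                schedule(s)

    evaluations = 0
    while heap:
        _, n = heapq.heappop(heap)             # lowest level first
        new = func[n](*(values[p] for p in fanin[n]))
        evaluations += 1
        if new != values[n]:                   # propagate only real changes
            values[n] = new
            for s in fanout[n]:
                schedule(s)
    return evaluations

# Example netlist (assumed): y = (a AND b) XOR (b OR c)
ev_fanin = {"a": [], "b": [], "c": [],
            "g1": ["a", "b"], "g2": ["b", "c"], "y": ["g1", "g2"]}
ev_func = {"g1": lambda x, y: x & y, "g2": lambda x, y: x | y,
           "y": lambda x, y: x ^ y}
ev_level = {"a": 0, "b": 0, "c": 0, "g1": 1, "g2": 1, "y": 2}

# Stored state after a previous pattern a=1, b=1, c=0.
ev_values = {"a": 1, "b": 1, "c": 0, "g1": 1, "g2": 1, "y": 0}

count_1 = simulate_event_driven(ev_fanin, ev_func, ev_level, ev_values,
                                {"a": 1, "b": 1, "c": 1})
count_2 = simulate_event_driven(ev_fanin, ev_func, ev_level, ev_values,
                                {"a": 0, "b": 1, "c": 1})
```

In the first call only `c` changes and the event dies out at `g2`, so a single gate is evaluated. In the second call the change at `a` propagates through `g1` to `y`, which costs two evaluations instead of the three that an oblivious pass would need.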
Traditional time simulators typically follow a synchronous event-driven time-wheel approach [Ulr69], which has proven effective for simulation at logic level. A different approach, asynchronous event-driven simulation, can be realized using the Chandy-Misra-Bryant (CMB) algorithm [CM79, Bry77]. As opposed to a global synchronous time schedule, the CMB algorithm assigns a time stamp to each event and uses message passing to distribute events from node outputs to the input FIFOs of successor nodes. The events at different nodes can be evaluated concurrently by individual processes, which allows parallelization to speed up the simulation of single circuit instances [BMC96].
In event-driven approaches, the handling of events can become quite complex, which quickly increases the runtime overhead when more detailed delay models are considered [Wun91]. For example, in inertial delay modeling, many events scheduled during processing might need to be reverted when later events are processed. Also, the algorithms usually only speed up the simulation of single circuit instances by exploiting parallelism from independent gates. They cannot benefit from pattern parallelism [BBC94], that is, from simultaneously evaluating multiple patterns in a data-parallel fashion, as they rely on sparse occurrences of events. Since gate-level parallelism can diminish at deeper levels, this strongly limits the simulation throughput and effectiveness of these approaches. Moreover, an implementation on GPUs demands highly complex control and dynamic memory management to process all the event lists in the circuit. The frequent memory operations of the scheduling are expensive and limit the effectiveness of an acceleration on GPUs [OHL+08].
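To make the notion of pattern parallelism concrete, the following Python sketch evaluates several patterns at once by packing them into the bits of one integer per signal, so that a single bitwise operation processes all patterns. This is a CPU-style illustration under assumed data structures (`order`, `fanin`, `func`, and the helper `pack` are hypothetical names); it is not the GPU scheme discussed in the text.

```python
def pack(bits):
    """Pack a list of 0/1 values into an integer; pattern k is stored in bit k."""
    word = 0
    for k, b in enumerate(bits):
        word |= (b & 1) << k
    return word

def simulate_parallel(order, fanin, func, packed_inputs, mask):
    """Evaluate all packed patterns in one pass over the topological order."""
    values = dict(packed_inputs)
    for n in order:
        if n not in values:
            # One bitwise operation evaluates the gate for every pattern.
            # The mask removes the sign bits produced by Python's `~`.
            values[n] = func[n](*(values[p] for p in fanin[n])) & mask
    return values

# Example netlist (assumed): y = NAND(a, b) XOR c, simulated for 4 patterns.
pp_order = ["a", "b", "c", "g1", "y"]
pp_fanin = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "y": ["g1", "c"]}
pp_func = {"g1": lambda x, y: ~(x & y), "y": lambda x, y: x ^ y}

pp_inputs = {"a": pack([0, 1, 0, 1]),
             "b": pack([0, 0, 1, 1]),
             "c": pack([1, 0, 0, 1])}
pp_values = simulate_parallel(pp_order, pp_fanin, pp_func, pp_inputs, mask=0b1111)
pp_y_bits = [(pp_values["y"] >> k) & 1 for k in range(4)]
# pp_y_bits holds the value of y for patterns 0..3: [0, 1, 1, 1]
```

The evaluation cost per gate is independent of the number of packed patterns up to the word width, which is why this style combines naturally with oblivious evaluation and poorly with sparse, per-pattern event lists.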
2.4.2 Fault Simulation
In fault simulation, a circuit is simulated under the behavior of a given defect to determine whether the circuit behavior is altered by the fault [Wun91]. A naïve approach to simulating faults is serial processing of the provided fault list. In serial fault simulation, the circuit is first simulated to obtain the fault-free good-value results for a test pattern. The resulting output responses v_o of all circuit (pseudo-)primary outputs o ∈ O are considered the golden reference, or expected values of the fault-free circuit, and are stored for comparison. The good-value simulation is usually followed by repeated simulations of the pattern for copies of the circuit into which different faults have been injected. A fault is injected by modifying the circuit description according to the abstracted fault behavior.
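The serial flow can be sketched as follows in Python. Several points are assumptions made only for this illustration: faults are modeled as stuck-at faults given as `(node, stuck_value)` pairs, injection is done by forcing the value of the faulty node during evaluation instead of rewriting the netlist, and the names `simulate` and `serial_fault_simulation` are hypothetical. A fault is reported when the response at any output differs from the stored good-value response.

```python
from graphlib import TopologicalSorter

def simulate(fanin, func, pattern, fault=None):
    """Oblivious simulation; `fault` optionally forces one node to a fixed value."""
    values = {}
    for n in TopologicalSorter(fanin).static_order():
        if n in pattern:
            values[n] = pattern[n]
        else:
            values[n] = func[n](*(values[p] for p in fanin[n]))
        if fault is not None and n == fault[0]:
            values[n] = fault[1]               # inject the stuck-at value
    return values

def serial_fault_simulation(fanin, func, outputs, pattern, faults):
    """Return the faults whose output response differs from the good response."""
    good = simulate(fanin, func, pattern)      # good-value simulation, run once
    detected = []
    for fault in faults:                       # one full simulation per fault
        faulty = simulate(fanin, func, pattern, fault)
        if any(good[o] != faulty[o] for o in outputs):
            detected.append(fault)
    return detected

# Example netlist (assumed): y = (a AND b) XOR c
fs_fanin = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "y": ["g1", "c"]}
fs_func = {"g1": lambda x, y: x & y, "y": lambda x, y: x ^ y}
fs_faults = [("g1", 0), ("g1", 1), ("a", 0), ("c", 0)]

fs_detected = serial_fault_simulation(fs_fanin, fs_func, ["y"],
                                      {"a": 1, "b": 1, "c": 0}, fs_faults)
# For this pattern, g1 stuck-at-0 and a stuck-at-0 change the output y.
```

The runtime grows with the product of the number of faults and the circuit size, since each fault triggers a complete re-simulation of the pattern.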
After simulation of a faulty circuit copy, the output responses v'_o of the faulty circuit are compared against the golden reference v_o to compute an output syndrome syn_o by
syn_o := v_o ⊕ v'_o, (2.3)
where ⊕ corresponds to the bitwise XOR-operation of the response bits of the outputs o ∈ O in good and faulty response. The output syndrome contains all the differences in the faulty and the fault-free output response and thus indicates the fault detection of the