63. Simulation Best Practices

Communication between simulation threads

Components of a simulation testbench, such as initial, fork-join, and always blocks, execute concurrently as separate threads or processes. A typical testbench has multiple threads performing different tasks. The threads need to be spawned, share data with each other, and be synchronized. Multiple threads waiting to execute at one simulation time point have to be scheduled in queues, also referred to as event queues. On a single-core CPU, a simulator can execute only one thread at a time. On a multi-core CPU, the simulator may execute multiple threads in parallel without a predetermined execution order. If a simulation testbench is not written correctly, this can create a race condition, meaning that the simulation results depend on the order in which the CPU and the simulator execute the threads.

Concurrent simulation threads require communication to control the execution sequence. There are three main methods to implement thread communication in Verilog.

Using events

With events, one thread waits for an event that is triggered by another thread. The following is a simple example of event-based communication.

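event DONE;

initial begin
  fork
    my_task1();   // waits for DONE
    my_task2();   // triggers DONE
  join
end // initial

task my_task1();
  @(DONE);
  $display("event received");
endtask

task my_task2();
  #1;             // let my_task1 reach @(DONE) before the trigger fires
  ->DONE;
  $display("event sent");
endtask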
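
With @(DONE), a trigger that fires before the waiting thread reaches the @ statement is missed. The following is a minimal sketch of one way to make the handshake independent of thread start order, using the SystemVerilog triggered property of an event, which stays set for the remainder of the time step in which the event fired. The module and task names are hypothetical and chosen only for illustration.

```systemverilog
module event_triggered_sketch;
  event DONE;

  // Waiter: wait(DONE.triggered) also unblocks if DONE has already
  // been triggered earlier in the current time step.
  task automatic wait_for_done();
    wait (DONE.triggered);
    $display("event received at time %0t", $time);
  endtask

  // Sender: triggers the event without any delay.
  task automatic send_done();
    ->DONE;
    $display("event sent at time %0t", $time);
  endtask

  initial begin
    fork
      send_done();      // listed first, so it may run before the waiter
      wait_for_done();
    join
    $finish;
  end
endmodule
```

The triggered property only covers the time step in which the event fired. If the sender and the waiter can be separated by simulation time, a semaphore or a mailbox is usually a better fit.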

Resource sharing

Concurrent threads can be synchronized using a shared resource. A thread tries to access a shared resource and waits if it’s unavailable. It resumes execution when another thread releases the resource. SystemVerilog provides the built-in semaphore class to implement resource sharing. The following is an example of using semaphores.

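semaphore sem = new(1);   // semaphore with a single key

initial begin
  sem.get(1);             // semaphore is taken
  fork
    begin: thread1
      sem.get(1);         // tries to get the semaphore; blocks until it is released
    end
  join_none               // do not wait for thread1 here, or put() is never reached
  #10;                    // thread1 starts and blocks during this delay
  sem.put(1);             // releases semaphore; thread1 is served
end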
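
As a further illustration, the sketch below uses a single-key semaphore as a mutual-exclusion lock around a shared resource. The names bus_lock and use_bus, the thread labels, and the hold times are assumptions made for this example only.

```systemverilog
module semaphore_sketch;
  // Hypothetical lock for a shared bus: one key means one owner at a time.
  semaphore bus_lock = new(1);

  task automatic use_bus(input string who, input int hold_time);
    bus_lock.get(1);                           // blocks until the key is free
    $display("%0t: %s owns the bus", $time, who);
    #hold_time;                                // simulated bus activity
    bus_lock.put(1);                           // return the key
  endtask

  initial begin
    fork
      use_bus("thread_a", 10);
      use_bus("thread_b", 5);
    join
    $finish;
  end
endmodule
```

Whichever thread obtains the key first holds the bus for its full hold time. The other thread resumes only after put() returns the key, so the two accesses never overlap.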

Data passing

SystemVerilog enables data passing between threads using mailbox functionality. A thread sends data by putting messages into a mailbox. Another thread retrieves the data from the mailbox. If no message is available, the receiving thread can either wait for one (get) or continue execution (try_get). The following is an example of using mailboxes.

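mailbox #(bit [15:0]) mbox = new(1);   // bounded mailbox that holds one message
bit [15:0] data = 16'h1234;
bit [15:0] rx_data;
int status;

initial begin
  mbox.put(data);                      // send the message
  fork
    begin: thread1
      // non-blocking attempt; returns 0 if the mailbox is empty
      status = mbox.try_get(rx_data);
      // blocking call; waits only if try_get found nothing
      if (status == 0) mbox.get(rx_data);
    end
  join
end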
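
A more typical arrangement is a producer thread and a consumer thread running concurrently. The following is a minimal sketch under assumed names (mailbox_sketch, producer, consumer) and an arbitrarily chosen mailbox depth of two.

```systemverilog
module mailbox_sketch;
  // Bounded mailbox: put() blocks while two messages are already queued.
  mailbox #(bit [15:0]) mbox = new(2);

  initial begin
    fork
      begin : producer
        bit [15:0] msg;
        for (int i = 0; i < 4; i++) begin
          msg = 16'h1000 + 16'(i);
          mbox.put(msg);               // blocks if the mailbox is full
          #1;
        end
      end
      begin : consumer
        bit [15:0] rx;
        repeat (4) begin
          mbox.get(rx);                // blocks until a message arrives
          $display("%0t: received 0x%h", $time, rx);
        end
      end
    join
    $finish;
  end
endmodule
```

Because get() blocks on an empty mailbox and put() blocks on a full one, neither thread needs to know how far the other has progressed.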

Simulation delta delays

At a given simulation time, a simulator executes multiple concurrent statements in the initial, fork-join, and always blocks and the continuous assignments of the testbench. Only when all the events scheduled for the current simulation time have been processed does the simulator advance the simulation time. Each round of event updates within the current simulation time is called a delta cycle. In the following code example, updating the clk3 signal takes two delta cycles from the time clk1 changes, even though both signals appear to change at the same simulation time.

always @(*) begin
  clk2 <= clk1;
  clk3 <= clk2;
end
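
One way to observe this without a delta-aware waveform viewer is to print a message whenever each signal changes. The sketch below wraps the code above in a hypothetical module, delta_sketch, with a single assumed clk1 transition at time 5.

```systemverilog
module delta_sketch;
  reg clk1 = 0;
  reg clk2, clk3;

  always @(*) begin
    clk2 <= clk1;
    clk3 <= clk2;
  end

  // Each monitor process wakes up in the delta cycle in which its signal changes.
  always @(clk1) $display("%0t: clk1 = %b", $time, clk1);
  always @(clk2) $display("%0t: clk2 = %b", $time, clk2);
  always @(clk3) $display("%0t: clk3 = %b", $time, clk3);

  initial begin
    #5 clk1 = 1;   // single transition to observe
    #5 $finish;
  end
endmodule
```

The messages caused by the clk1 transition all report the same value of $time, although clk2 and clk3 are updated in later delta cycles than clk1.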

Many commercial simulators, such as ModelSim and VCS, have an option to display simulation delta cycles.

The mechanics of simulation time, event queues, and the effect of delta cycles on simulation results are often not well understood, even by experienced FPGA designers. Adding #0 (an explicit zero delay) is a frequently used technique to fix delta-cycle issues in simulation testbenches. However, it is not a good design practice. There are books [1], papers, and simulator user guides that discuss simulation time, event queues, and delta cycles. The most reliable and comprehensive source of information on this subject remains the Verilog specification.

The following example illustrates how a delta cycle added to a clock signal can cause incorrect simulation results. The circuit, shown in the following figure, consists of two registers. The Clk3 signal is connected to the right register, and is unintentionally delayed with two zero-delay buffers. There is combinatorial logic between the two registers.

Figure 1: Clock signal is unintentionally delayed with two buffers

The following is a code example of the above circuit. The logic between the two registers is implemented using the d_delta1 and d_delta2 signals.

module tb;
  reg clk1;
  reg clk2, clk3;
  reg data;
  reg d_in, d_out;
  reg d_delta1, d_delta2;
  reg reset;

  initial begin
    reset = 1;
    data = 0;
    #15; reset = 0;
    #15;
    @(posedge clk1);
    data = 1;
    @(posedge clk1);
    data = 0;
    #50;
    $finish(2);
  end

  initial begin
    clk1 = 0;
    forever #5 clk1 = ~clk1;
  end

  always @(*) begin
    clk2 <= clk1;
    clk3 <= clk2;
  end

  always @(*) begin
    d_delta1 <= d_in;
    d_delta2 <= d_delta1;
  end

  always @(posedge clk1)
    if (reset) d_in <= 1'b0;
    else d_in <= data;

  always @(posedge clk2)
    if (reset) d_out <= 1'b0;
    else d_out <= d_in;
endmodule

The value of the d_out output depends on the amount of delay introduced by the combinatorial logic between registers. This creates a race condition between the arrival of clock edge and data to the register on the right.

The following waveform shows incorrect simulation results: a pulse on the d_in input appears on the output in the same clock cycle.

Figure 2: Incorrect d_out due to delta delays on the clock signal

After reducing the combinatorial logic delay, the simulation is correct.

Figure 3: Correct d_out when the combinatorial delay is reduced

There are several ways to fix this problem. One is to add a non-zero delay:

always @(posedge clk1)
  if (reset) d_in <= #1 1'b0;
  else d_in <= #1 data;

However, if the simulator time precision is too coarse to represent that delay, the delay will be rounded to 0ns and will become a delta delay, as in the original code. Another way is to use blocking assignments in the combinational logic:

always @(*) begin
  d_delta1 = d_in;
  d_delta2 = d_delta1;
end

The third way is to eliminate delta delays on the clock signals and ensure that clock edges line up to the exact delta at the destination registers. Changing clk2 to clk1 in the above example achieves that.

always @(posedge clk1)
  if (reset) d_out <= 1'b0;
  else d_out <= d_in;

Using `timescale directive

The Verilog `timescale directive specifies the time unit and time precision of the modules that follow it. It has the following syntax:

`timescale <time_unit>/<time_precision>

The time_unit is the unit of measurement for time and delay values. The time_precision specifies how delay values are rounded before being used in simulation. The following is a code example that illustrates the `timescale directive operation.

// delays are measured in 1ns units with 100ps precision
`timescale 1ns/100ps
module timescale_test;
  reg clk1, clk2;
  localparam HALF_PERIOD1 = 2.5, HALF_PERIOD2 = 3.333;

  initial begin
    clk1 = 0;
    forever #HALF_PERIOD1 clk1 = ~clk1;
  end

  // incorrect: 3.333 will be rounded down to 3.3
  initial begin
    clk2 = 0;
    forever #HALF_PERIOD2 clk2 = ~clk2;
  end
endmodule

The following figure is a simulation waveform showing that the clk2 period is rounded down to 6.600ns from the originally specified 6.666ns.

Figure 4: Using `timescale directive

It is important to understand that a `timescale directive is not scoped to the file or module in which it appears. It stays in effect until it is overridden by another `timescale directive, possibly one encountered in a different file during the compile process. Therefore, depending on the compile order of the files, simulation results might be different. Also, if the `timescale directive is not present in a Verilog file, the compiler might insert a default value. That situation will cause portability issues between different simulation tools.

To avoid these problems, it is recommended to specify the `timescale directive before each module in both the design and testbench files.
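
In SystemVerilog code, a related option is to declare the time unit and precision inside the module with the timeunit and timeprecision keywords, which apply only to that module and do not depend on the compile order. The following is a minimal sketch. The module name and clock period are assumptions, and the $printtimescale system task is used only to report the timescale in effect.

```systemverilog
module timeunit_sketch;
  // Local to this module; must precede any other items in the module.
  timeunit 1ns;
  timeprecision 100ps;

  reg clk = 0;
  always #2.5 clk = ~clk;   // 2.5ns half period, representable at 100ps precision

  initial begin
    $printtimescale;        // reports the time unit and precision of this module
    #20 $finish;
  end
endmodule
```

Calling $printtimescale in each module of a design is also a quick way to check which timescale a given compile order has actually produced.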

Simulation tools provide global options to control the simulation time unit and precision. ModelSim and VCS provide a -timescale option, which specifies the timescale for modules that don’t have an explicitly defined `timescale directive. VCS also provides an -override_timescale option, which overrides the time unit and precision of all the `timescale directives in the design.

Displaying state variables

Simulation tools provide several options for displaying state variables: hexadecimal, decimal, binary, symbolic, and ASCII string.

Displaying a state variable using a symbolic representation requires simulator-specific configuration. The following Verilog code example shows how to view a state variable as an ASCII string.

localparam STATE_INIT = 3'd0, STATE_ONE = 3'd1, STATE_TWO = 3'd2, STATE_THREE = 3'd3;

reg [2:0] state_cur, state_next;
reg [1:11*8] state_str; // current state shown as ASCII

always @(posedge clk) begin
  if (reset) begin
    state_cur <= STATE_INIT;
    state_outputs <= 4'b0;
  end else begin
    state_cur <= state_next;
    state_outputs[0] <= state_cur == STATE_INIT;
    state_outputs[1] <= state_cur == STATE_ONE;
    state_outputs[2] <= state_cur == STATE_TWO;
    state_outputs[3] <= state_cur == STATE_THREE;
  end
end

always @(*) begin
  state_next = state_cur;
  case (state_cur)
    STATE_INIT:  if (state_inputs[0]) state_next = STATE_ONE;
    STATE_ONE:   if (state_inputs[1]) state_next = STATE_TWO;
    STATE_TWO:   if (state_inputs[2]) state_next = STATE_THREE;
    STATE_THREE: if (state_inputs[3]) state_next = STATE_INIT;
  endcase
end

// state_str is an ASCII representation of state_cur
always @(*) begin
  state_str = "";
  case (state_cur)
    STATE_INIT:  state_str = "STATE_INIT";
    STATE_ONE:   state_str = "STATE_ONE";
    STATE_TWO:   state_str = "STATE_TWO";
    STATE_THREE: state_str = "STATE_THREE";
  endcase
end
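
If SystemVerilog is available, an enumerated type may remove the need for a separate string signal, because the built-in name() method returns the label of the current value. The sketch below reuses the state names from the example above; the module name and the state_t type name are assumptions.

```systemverilog
module enum_state_sketch;
  // Same encodings as the localparam version of the state machine.
  typedef enum logic [2:0] {
    STATE_INIT  = 3'd0,
    STATE_ONE   = 3'd1,
    STATE_TWO   = 3'd2,
    STATE_THREE = 3'd3
  } state_t;

  state_t state_cur = STATE_INIT;

  initial begin
    $display("state = %s", state_cur.name());   // label of the current value
    state_cur = STATE_TWO;
    $display("state = %s", state_cur.name());
    $finish;
  end
endmodule
```

Whether a waveform viewer shows enumerated values by name depends on the simulator, so the ASCII string approach remains the more portable choice for plain Verilog.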

Resources

[1] Janick Bergeron, “Writing Testbenches: Functional Verification of HDL Models”, Springer, 2003. ISBN 9781402074011.

In document 100 Power Tips for FPGA Designers - Stavinov, Evgeni (Page 141-145)