63. Simulation Best Practices
Communication between simulation threads
Components of a simulation testbench, such as initial, fork-join, and always blocks, execute concurrently as separate threads or processes. A typical testbench has multiple threads performing different tasks. The threads need to be spawned, share data with each other, and be synchronized. Multiple threads waiting to execute at one simulation time point have to be scheduled in queues, also referred to as event queues. On a single-core CPU, a simulator can execute only one thread at a time. On a multi-core CPU, the simulator may execute multiple threads in parallel without a predetermined execution order. If a simulation testbench is not written correctly, this can create a race condition, meaning that the simulation results depend on the order in which the simulator and the CPU happen to execute the threads.
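The following is a minimal sketch of such a race condition. The module and signal names are hypothetical and serve only as an illustration. Two threads access the same variable at time 0, so the value printed by the second thread depends on the order in which the simulator executes them. The third thread reads the variable after simulation time has advanced and is not affected.

```systemverilog
// Hypothetical example: module and signal names are illustrative only.
module race_example;
  reg flag;

  // Thread 1: writes flag at time 0.
  initial flag = 1'b1;

  // Thread 2: reads flag at time 0. Whether it sees 1 or x depends on
  // which initial block the simulator happens to execute first.
  initial $display("flag at time 0 = %b", flag);

  // Thread 3: reads flag after simulation time has advanced, so the
  // write has completed and the result does not depend on ordering.
  initial #1 $display("flag at time 1 = %b", flag);
endmodule
```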
Concurrent simulation threads require communication to control their execution sequence. There are three main methods to implement thread communication in Verilog and SystemVerilog.
Using events
In this case, one thread is waiting for an event to be triggered by another thread. The following is a simple example of event-based communication.
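event DONE;
task my_task1();
  @(DONE); // wait for the event
  $display("event received");
endtask
task my_task2();
  #1; // small delay so that my_task1 is already waiting when the event fires
  ->DONE; // trigger the event
  $display("event sent");
endtask
initial begin
  fork
    my_task1();
    my_task2();
  join
end // initial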
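With @(DONE), the waiting thread must already be blocked when the event is triggered; otherwise the trigger is missed. SystemVerilog events also have a triggered property that stays set until the end of the time step in which the event fired. The following sketch, with hypothetical module, event, and block names, uses wait on that property so that the receiver unblocks regardless of which thread runs first within the time step.

```systemverilog
// Hypothetical example: module, event, and block names are illustrative.
module event_triggered_example;
  event done;

  initial begin
    fork
      begin : sender
        -> done;                    // trigger the event
        $display("event sent");
      end
      begin : receiver
        // Unblocks even if the trigger happened earlier in this time step.
        wait (done.triggered);
        $display("event received");
      end
    join
    $display("both threads finished at time %0t", $time);
  end
endmodule
```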
Resource sharing
Concurrent threads can be synchronized using a shared resource. A thread tries to access a shared resource and waits if it’s unavailable. It resumes execution when another thread releases the resource. SystemVerilog provides a built-in semaphore class to implement resource sharing. The following is an example of using semaphores.
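semaphore sem = new(1); // semaphore with one key
initial begin
  sem.get(1); // semaphore is taken
  fork
    begin: thread1
      sem.get(1); // tries to get the semaphore; blocks until it is released
    end
  join_none // do not wait for thread1 here, otherwise put() is never reached
  sem.put(1); // releases semaphore; thread1 is served
end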
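As a slightly fuller sketch, a semaphore with a single key can serve as a mutual-exclusion lock around a shared resource. The module name, the use_bus task, and the delay values below are hypothetical and chosen only for illustration.

```systemverilog
// Hypothetical example: names and delays are illustrative only.
module semaphore_example;
  semaphore bus_lock = new(1);   // one key: at most one owner at a time

  // Models a thread that needs exclusive access to a shared bus.
  // The task is automatic so that concurrent calls keep separate arguments.
  task automatic use_bus(input string name, input int duration);
    bus_lock.get(1);             // blocks until the key is available
    $display("%0t: %s acquired the bus", $time, name);
    #(duration);
    $display("%0t: %s released the bus", $time, name);
    bus_lock.put(1);             // return the key; a waiting thread resumes
  endtask

  initial begin
    fork
      use_bus("thread1", 10);
      use_bus("thread2", 5);
    join
    $finish;
  end
endmodule
```

Whichever thread obtains the key first holds the bus for its full duration before the other thread can acquire it.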
Data passing
SystemVerilog enables data passing between threads using mailboxes. One thread sends data by putting a message into a mailbox, and another thread retrieves it. If no message is available, the receiving thread can either wait for one or continue execution. The following is an example of using mailboxes.
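mailbox #(bit [15:0]) mbox = new(1);
bit [15:0] data = 16'h1234;
bit [15:0] rx_data;
int status;
initial begin
  mbox.put(data);
  fork
    begin: thread1
      status = mbox.try_get(rx_data); // non-blocking: returns 0 if the mailbox is empty
      if (!status) mbox.get(rx_data); // blocking: waits until a message arrives
    end
  join
end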
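The following is a minimal producer and consumer sketch built on a bounded mailbox. The module name, block names, data values, and delays are hypothetical. Because the mailbox holds at most two messages, put() blocks the producer until the consumer has made room.

```systemverilog
// Hypothetical example: names, values, and delays are illustrative only.
module mailbox_example;
  mailbox #(bit [15:0]) mbox = new(2);  // bounded: holds at most 2 messages

  initial begin
    fork
      begin : producer
        bit [15:0] tx;
        for (int i = 0; i < 4; i++) begin
          tx = 16'h1000 + i;
          mbox.put(tx);                 // blocks while the mailbox is full
          $display("%0t: put %h", $time, tx);
        end
      end
      begin : consumer
        bit [15:0] rx;
        repeat (4) begin
          #10;
          mbox.get(rx);                 // blocks while the mailbox is empty
          $display("%0t: got %h", $time, rx);
        end
      end
    join
    $finish;
  end
endmodule
```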
Simulation delta delays
At a given simulation time, a simulator executes multiple concurrent statements in the initial, fork-join, and always blocks and in the continuous assignments of the testbench. When all the events scheduled for the current simulation time have been processed, the simulator advances the simulation time. Each round of event updates within the current simulation time is called a delta cycle. In the following code example, updating the clk3 signal takes two delta cycles from the time clk1 changes, even though both appear to change at the same simulation time.
always @(*) begin
  clk2 <= clk1;
  clk3 <= clk2;
end
Many commercial simulators, such as ModelSim and VCS, have an option to display simulation delta cycles.
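When a delta-cycle display is not at hand, the effect can also be observed with plain $display calls. The following sketch uses a hypothetical module name and stimulus. It prints a message whenever clk1, clk2, or clk3 changes. All three messages for one edge carry the same simulation time, even though the signals are updated in successive delta cycles.

```systemverilog
// Hypothetical example: module name and stimulus are illustrative only.
module delta_example;
  reg clk1, clk2, clk3;

  // Each nonblocking assignment adds one delta cycle of delay.
  always @(*) begin
    clk2 <= clk1;
    clk3 <= clk2;
  end

  // Report every change together with the current simulation time.
  always @(clk1) $display("%0t: clk1 = %b", $time, clk1);
  always @(clk2) $display("%0t: clk2 = %b", $time, clk2);
  always @(clk3) $display("%0t: clk3 = %b", $time, clk3);

  initial begin
    #1 clk1 = 0;   // start after time 0 to avoid a start-up race
    #5 clk1 = 1;
    #5 $finish;
  end
endmodule
```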
The mechanics of simulation time, event queues, and the effect of delta cycles on simulation results are often not well understood, even by experienced FPGA designers. Adding #0 (an explicit zero delay) is a frequently used technique to fix issues in simulation testbenches related to delta cycles. However, it is not a good design practice. There are books [1], papers, and simulator user guides that discuss simulation time, event queues, and delta cycles. The most reliable and comprehensive source of information on this subject remains the Verilog specification.
The following example illustrates how a delta cycle added to a clock signal can cause incorrect simulation results. The circuit, shown in the following figure, consists of two registers. The Clk3 signal is connected to the right register, and is unintentionally delayed with two zero-delay buffers. There is combinatorial logic between the two registers.
Figure 1: Clock signal is unintentionally delayed with two buffers
The following is a code example of the above circuit. The logic between the two registers is implemented using the d_delta1 and d_delta2 signals.
module tb;
  reg clk1;
  reg clk2, clk3;
  reg data;
  reg d_in, d_out;
  reg d_delta1, d_delta2;
  reg reset;
  initial begin
    reset = 1;
    data = 0;
    #15; reset = 0;
    #15;
    @(posedge clk1);
    data = 1;
    @(posedge clk1);
    data = 0;
    #50;
    $finish(2);
  end
  initial begin
    clk1 = 0;
    forever #5 clk1 = ~clk1;
  end
  always @(*) begin
    clk2 <= clk1;
    clk3 <= clk2;
  end
  always @(*) begin
    d_delta1 <= d_in;
    d_delta2 <= d_delta1;
  end
  always @(posedge clk1)
    if (reset) d_in <= 1'b0;
    else d_in <= data;
  always @(posedge clk2)
    if (reset) d_out <= 1'b0;
    else d_out <= d_in;
endmodule
The value of the d_out output depends on the amount of delay introduced by the combinatorial logic between the registers. This creates a race condition between the arrival of the clock edge and the arrival of the data at the register on the right.
The following waveform shows incorrect simulation results: a pulse on the d_in input appears on the output in the same clock cycle.
Figure 2: Incorrect d_out due to delta delays on the clock signal
After reducing the combinatorial logic delay, the simulation is correct.
Figure 3: Correct d_out when the combinatorial delay is reduced
There are several ways to fix this problem. One is to add a non-zero delay:
always @(posedge clk1)
  if (reset) d_in <= #1 1'b0;
  else d_in <= #1 data;
However, if the simulator time resolution is coarser than that delay, the delay may be rounded to 0ns and become a delta delay, as in the original code. Another way is to use blocking assignments in the combinational logic:
always @(*) begin
  d_delta1 = d_in;
  d_delta2 = d_delta1;
end
The third way is to eliminate delta delays on the clock signals and ensure that clock edges line up to the exact delta at the destination registers. Changing clk2 to clk1 in the above example achieves that.
always @(posedge clk1)
  if (reset) d_out <= 1'b0;
  else d_out <= d_in;
Using `timescale directive
The Verilog `timescale directive specifies the time unit and time precision of the modules that follow it. It has the following syntax:
`timescale <time_unit>/<time_precision>
The time_unit is the unit of measurement for time and delay values. The time_precision specifies how delay values are rounded before being used in simulation. The following is a code example that illustrates the `timescale directive operation.
// delays are measured in 1ns units with 100ps precision
`timescale 1ns/100ps
module timescale_test;
  reg clk1, clk2;
  localparam HALF_PERIOD1 = 2.5, HALF_PERIOD2 = 3.333;
  initial begin
    clk1 = 0;
    forever #HALF_PERIOD1 clk1 = ~clk1;
  end
  // incorrect, will be rounded down to 3.3
  initial begin
    clk2 = 0;
    forever #HALF_PERIOD2 clk2 = ~clk2;
  end
endmodule
The following figure is a simulation waveform showing that the clk2 period is rounded down to 6.600ns from the originally specified 6.666ns.
Figure 4: Using `timescale directive
It is important to understand that the `timescale directive is not scoped to a file or a module. It stays in effect until the compiler encounters another `timescale directive, possibly in a different file. Therefore, depending on the compile order of the files, simulation results might be different. Also, if the `timescale directive is not present in a Verilog file, the compiler might insert a default value. That situation can cause portability issues between different simulation tools.
To avoid these problems, it is recommended to specify the `timescale directive before each module in both the design and testbench files.
Simulation tools provide global options to control the simulation time unit and precision. ModelSim and VCS provide a -timescale option, which specifies the timescale for modules that don’t have explicitly defined `timescale directives. VCS also provides an -override_timescale option, which overrides the time unit and precision for all the `timescale directives in the design.
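One way to check which time unit and precision are actually in effect for a module is to print them from within the simulation. The following sketch, with a hypothetical module name, uses the standard $printtimescale and $timeformat system tasks. The delay of 3.333 time units is rounded to the 100ps precision before it is applied.

```systemverilog
// Hypothetical example: the module name is illustrative only.
`timescale 1ns/100ps
module timescale_check;
  initial begin
    // Print the time unit and precision in effect for this module.
    $printtimescale;
    // Format %t output in nanoseconds with three decimal places.
    $timeformat(-9, 3, " ns", 10);
    // The delay is rounded to the 100ps precision of this module.
    #3.333 $display("time after #3.333 = %t", $realtime);
    $finish;
  end
endmodule
```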
Displaying state variables
Simulation tools provide several options to display state variables: hexadecimal, decimal, binary, symbolic, and ASCII string.
Displaying a state variable using a symbolic representation requires simulator-specific configuration. The following Verilog code example shows how to view a state variable as an ASCII string.
localparam STATE_INIT = 3'd0, STATE_ONE = 3'd1, STATE_TWO = 3'd2, STATE_THREE = 3'd3;
reg [2:0] state_cur, state_next;
reg [1:11*8] state_str; // current state shown as ASCII
always @(posedge clk) begin
  if (reset) begin
    state_cur <= STATE_INIT;
    state_outputs <= 4'b0;
  end else begin
    state_cur <= state_next;
    state_outputs[0] <= state_cur == STATE_INIT;
    state_outputs[1] <= state_cur == STATE_ONE;
    state_outputs[2] <= state_cur == STATE_TWO;
    state_outputs[3] <= state_cur == STATE_THREE;
  end
end
always @(*) begin
  state_next = state_cur;
  case (state_cur)
    STATE_INIT: if (state_inputs[0]) state_next = STATE_ONE;
    STATE_ONE: if (state_inputs[1]) state_next = STATE_TWO;
    STATE_TWO: if (state_inputs[2]) state_next = STATE_THREE;
    STATE_THREE: if (state_inputs[3]) state_next = STATE_INIT;
  endcase
end
// state_str is an ASCII representation of state_cur
always @(*) begin
  state_str = "";
  case (state_cur)
    STATE_INIT: state_str = "STATE_INIT";
    STATE_ONE: state_str = "STATE_ONE";
    STATE_TWO: state_str = "STATE_TWO";
    STATE_THREE: state_str = "STATE_THREE";
  endcase
end
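In SystemVerilog, an enumerated type is a possible alternative to a separate ASCII string: the name() method returns the label of the current value as a string. The following is a minimal sketch under that assumption. The module and type names are hypothetical, and the state labels are reused from the example above.

```systemverilog
// Hypothetical example: module and type names are illustrative only.
module state_name_example;
  typedef enum logic [1:0] {STATE_INIT, STATE_ONE, STATE_TWO, STATE_THREE} state_t;
  state_t state_cur;

  initial begin
    state_cur = STATE_INIT;
    repeat (4) begin
      // name() returns the enumeration label as a string.
      $display("state_cur = %s (%0d)", state_cur.name(), state_cur);
      // next() steps to the following label and wraps after the last one.
      state_cur = state_cur.next();
    end
  end
endmodule
```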
Resources
[1] Janick Bergeron, “Writing Testbenches: Functional Verification of HDL Models”, Springer, 2003. ISBN 9781402074011