An Embedded Systems
Approach Using Verilog
Chapter 4
Sequential Basics
Sequential circuits
Outputs depend on current inputs and
previous inputs
Store state: an abstraction of the history of
inputs
Usually governed by a periodic clock
D-Flipflops
1-bit storage element
We will treat it as a basic component
Other kinds of flipflops
SR (set/reset), JK, T (toggle)
[Figure: D flip-flop symbol (D, clk, Q) and timing diagram of clk, D and Q]
Registers
Store a multi-bit encoded value
One D-flipflop per bit
Stores a new value on
each clock cycle
wire [n:0] d;
reg [n:0] q; ...
always @(posedge clk) q <= d;
Here @(posedge clk) is the event list and <= is a nonblocking assignment
[Figure: register built from one D flip-flop per bit, inputs d(0) ... d(n), outputs q(0) ... q(n), common clk]
Pipelines Using Registers
Without pipeline registers
Total delay = Delay1 + Delay2 + Delay3
Interval between outputs > Total delay
With a pipeline register after each stage
Clock period = max(Delay1, Delay2, Delay3)
Total delay = 3 × clock period
Interval between outputs = 1 clock period
[Figure: combinational circuits 1, 2 and 3 in series from d_in to d_out, shown without registers and with a clocked register after each circuit]
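As a worked illustration of the relations above, assume three hypothetical stage delays of 4 ns, 6 ns and 5 ns, and ignore register setup and clock-to-output delays. These numbers are assumptions chosen for the example, not values from the slides.

```latex
\begin{align*}
\text{Unpipelined total delay} &= 4 + 6 + 5 = 15\ \text{ns}
  \quad\Rightarrow\quad \text{interval between outputs} > 15\ \text{ns} \\
\text{Pipelined clock period} &= \max(4,\, 6,\, 5) = 6\ \text{ns} \\
\text{Pipelined total delay} &= 3 \times 6 = 18\ \text{ns} \\
\text{Pipelined interval between outputs} &= 6\ \text{ns}
\end{align*}
```

Under these assumptions the latency grows from 15 ns to 18 ns, but a new result is available every 6 ns instead of every 15 ns.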
Pipeline Example
Compute the average of corresponding
numbers in three input streams
New values arrive on each clock edge
module average_pipeline ( output reg signed [5:-8] avg,
                          input signed [5:-8] a, b, c,
                          input clk );
  wire signed [5:-8] a_plus_b, sum, sum_div_3;
  reg signed [5:-8] saved_a_plus_b, saved_c, saved_sum;
  ...
Pipeline Example
...
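  assign a_plus_b = a + b;
  always @(posedge clk) begin  // Pipeline register 1
    saved_a_plus_b <= a_plus_b;
    saved_c <= c;
  end
  assign sum = saved_a_plus_b + saved_c;
  always @(posedge clk)  // Pipeline register 2
    saved_sum <= sum;
  // Multiply by 1/3 (binary 0.01010101). The full 28-bit signed product has
  // 16 fraction bits; bits 21:8 realign it to the [5:-8] format.
  wire signed [27:0] scaled_sum = saved_sum * 14'sb00000001010101;
  assign sum_div_3 = scaled_sum[21:8];
  always @(posedge clk)  // Pipeline register 3
    avg <= sum_div_3;
endmodule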
D-Flipflop with Enable
Storage controlled by a clock-enable
stores only when CE = 1 on a rising edge
of the clock
CE is a synchronous control input
[Figure: D flip-flop with clock enable (D, CE, clk, Q) and timing diagram of clk, CE, D and Q]
Register with Enable
One flipflop per bit
clk and CE wired in common
wire [n:0] d;
wire ce;
reg [n:0] q; ...
always @(posedge clk) if (ce) q <= d;
Register with Synchronous Reset
Reset input forces stored value to 0
reset input must be stable around rising
edge of clk
always @(posedge clk)
  if (reset) q <= 0;
  else if (ce) q <= d;
[Figure: register with D, CE, clk and synchronous reset inputs, and timing diagram]
Register with Asynchronous Reset
Reset input forces stored value to 0
reset can become 1 at any time, and effect
is immediate
reset should return to 0 synchronously
[Figure: register with D, CE, clk and asynchronous reset inputs, and timing diagram]
Asynch Reset in Verilog
always @(posedge clk or posedge reset)
  if (reset) q <= 0;
  else if (ce) q <= d;
reset is an asynchronous control input here
include it in the event list so that the process responds to changes immediately
Example: Accumulator
Sum a sequence of signed numbers
A new number arrives when data_en = 1
Clear sum to 0 on synch reset
module accumulator
  ( output reg signed [7:-12] data_out,
    input signed [3:-12] data_in,
    input data_en, clk, reset );
  wire signed [7:-12] new_sum;
  assign new_sum = data_out + data_in;
  always @(posedge clk)
    if (reset) data_out <= 20'b0;
    else if (data_en) data_out <= new_sum;
endmodule
Flipflop and Register Variations
module flip_flop_n ( output reg Q,
                     output Q_n,
                     input pre_n, clr_n, D,
                     input clk_n, CE );
  always @( negedge clk_n or
            negedge pre_n or negedge clr_n ) begin
    if ( !pre_n && !clr_n)
      $display("Illegal inputs: pre_n and clr_n both 0");
    if (!pre_n) Q <= 1'b1;
    else if (!clr_n) Q <= 1'b0;
    else if (CE) Q <= D;
  end
  assign Q_n = ~Q;
endmodule
[Figure: flip-flop symbol with D, CE, negative-edge clk and active-low preset and clear inputs, and Q and inverted Q outputs]
Shift Registers
Performs shift operation on stored data
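Arithmetic scaling
Serial transfer of data
[Figure: shift register of D flip-flops with CE; D_in feeds stage n–1, and a multiplexer on each later stage selects between the previous stage's output and a parallel input D(n–1) ... D(0) under load_en; outputs Q(n–1) ... Q(0)]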
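A shift register with parallel load might be modeled along the following lines. This is a minimal sketch, not the slides' own code: the module name shift_register, the width parameter N, and the choice of shifting right with D_in entering at the most significant bit are assumptions read off the figure labels.

```verilog
// Sketch of a loadable shift register (names and width are illustrative).
module shift_register #(parameter N = 8)
  ( output reg [N-1:0] q,
    input [N-1:0] d,        // parallel load value
    input d_in,             // serial input, enters at the MSB
    input ce, load_en, clk );

  always @(posedge clk)
    if (ce) begin
      if (load_en) q <= d;                  // parallel load
      else         q <= {d_in, q[N-1:1]};   // shift right by one bit
    end
endmodule
```

With ce = 0 the stored value is held, so the same register can serve for serial transfer (shift one bit per enabled cycle) or for scaling by a power of 2.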
Example: Sequential Multiplier
16×16 multiply over 16 clock cycles, using
one adder
Shift register for multiplier bits
Shift register for lsb’s of accumulated product
[Figure: multiplier datapath with a 16-bit register for x, a 16-bit shift register with load for y, a 16-bit adder (inputs x and y, carry-in c0, sum s, carry-out c16), a 17-bit register with reset holding P(31...15), and a 15-bit shift register holding P(14...0); control inputs y_load_en, y_ce, x_ce, P_reset, P_ce and clk]
Latches
Level-sensitive storage
Data transmitted while enable is '1'
transparent latch
Data stored while enable is '0'
[Figure: latch symbol (D, LE, Q) and timing diagram of LE, D and Q]
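An intentional transparent latch can be sketched as below. The module name d_latch and its port names are illustrative assumptions; the incomplete if statement is what makes the model level-sensitive storage.

```verilog
// Sketch of a transparent latch (names are illustrative).
module d_latch ( output reg q, input d, le );
  always @*
    if (le) q = d;   // transparent while le = 1; q holds its value while le = 0
endmodule
```

Because there is no else branch, q must keep its old value when le is 0, which is exactly the storage behavior described above.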
Feedback Latches
Feedback in gate circuits produces
latching behavior
Example: reset/set (RS) latch
[Figure: RS latch built from cross-coupled gates, with inputs S and R and output Q]
Current RTL synthesis tools don’t accept
Verilog models with unclocked feedback
Latches in Verilog
Latching behavior is usually an error!
always @*
  if (~sel) begin
    z1 <= a1; z2 <= b1;
  end
  else begin
    z1 <= a2; z3 <= b2;  // Oops! Should be z2 <= ...
  end
Values must be stored
for z2 while sel = 1
for z3 while sel = 0
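For comparison, a version in which every output is assigned on every path infers no storage. This is a sketch: the module name two_way_select and the 1-bit port widths are assumptions, and blocking assignments are used here as is conventional for combinational logic.

```verilog
// Sketch of the latch-free version: z1 and z2 are assigned in both branches.
module two_way_select ( output reg z1, z2,
                        input sel, a1, a2, b1, b2 );
  always @*
    if (~sel) begin
      z1 = a1; z2 = b1;
    end
    else begin
      z1 = a2; z2 = b2;
    end
endmodule
```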
Counters
Stores an unsigned integer value
increments or decrements the value
Used to count occurrences of
events
repetitions of a processing step
Used as timers
count elapsed time intervals by counting cycles of a periodic clock
Free-Running Counter
Increments every rising edge of clk
up to $2^n - 1$, then wraps back to 0
i.e., counts modulo $2^n$
This counter is
synchronous
all outputs governed by clock edge
[Figure: register clocked by clk whose output Q feeds a +1 incrementer that drives its D input]
Example: Periodic Control Signal
Count modulo 16 clock cycles
Control output = 1 every 8th and 12th cycle
decode count values 0111 and 1011
[Figure: 4-bit counter (four D flip-flops, bits 0 to 3, with a +1 incrementer) whose outputs are decoded to produce ctrl]
Example: Periodic Control Signal
module decoded_counter ( output ctrl,
                         input clk );
  reg [3:0] count_value;
  always @(posedge clk)
    count_value <= count_value + 1;
  assign ctrl = count_value == 4'b0111 ||
                count_value == 4'b1011;
endmodule
Count Enable and Reset
Use a register with control inputs
Increments when CE = 1 on rising clock
edge
Reset: synch or asynch
[Figure: register with CE and reset inputs whose output Q feeds a +1 incrementer that drives its D input]
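A counter with count enable and synchronous reset could be written as in the sketch below. The module name counter_ce_reset and the width parameter N are assumptions for illustration; an asynchronous reset would add posedge reset to the event list.

```verilog
// Sketch of an N-bit up counter with count enable and synchronous reset.
module counter_ce_reset #(parameter N = 4)
  ( output reg [N-1:0] q,
    input ce, reset, clk );

  always @(posedge clk)
    if (reset)   q <= 0;        // reset takes priority over counting
    else if (ce) q <= q + 1;    // wraps from 2**N - 1 back to 0
endmodule
```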
Terminal Count
Status signal indicating final count value
TC is 1 for one cycle in every $2^n$ cycles
frequency = clock frequency / $2^n$
Called a clock divider
[Figure: counter clocked by clk with outputs Q0, Q1, …, Qn]
Divider Example
Alarm clock beep: 500Hz from 1MHz clock
[Figure: 10-bit counter whose terminal count, tone2, enables a toggling flip-flop that produces tone; timing diagram of clk, count, tone2 and tone]
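The divider could be sketched as follows. The signal names tone and tone2 follow the figure labels, while the module name tone_divider and the synchronous reset input are assumptions added so that a simulation starts from a known state.

```verilog
// Sketch of the tone divider: 10-bit counter plus a toggling flip-flop.
module tone_divider ( output reg tone,
                      input clk, reset );
  reg [9:0] count;
  wire tone2 = (count == 10'd1023);   // terminal count, 1 cycle in every 1024

  always @(posedge clk)
    if (reset) begin
      count <= 0;
      tone  <= 1'b0;
    end
    else begin
      count <= count + 1;             // free-running, wraps modulo 1024
      if (tone2) tone <= ~tone;       // toggle once per 1024 clock cycles
    end
endmodule
```

With a 1 MHz clock, tone2 pulses at 1 MHz / 1024, about 977 Hz, and tone is a square wave at half that rate, about 488 Hz, which is close to the 500 Hz target.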
Divide by k
Decode k–1 as terminal count and reset counter register
Counter increments modulo k
Example: decade counter
Terminal count = 9
[Figure: decade counter with outputs Q0 to Q3 decoded to drive the counter register's reset; timing diagram of clk and Q0 to Q3]
Decade Counter in Verilog
module decade_counter ( output reg [3:0] q,
                        input clk );
  always @(posedge clk)
    q <= q == 9 ? 0 : q + 1;
endmodule
Down Counter with Load
Load a starting value, then decrement
Terminal count = 0
Useful for interval timer
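[Figure: down counter in which a multiplexer controlled by load selects between the load data D and the current count – 1, with an =0? comparator on Q producing TC]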
Loadable Counter in Verilog
module interval_timer_rtl ( output tc,
                            input [9:0] data,
                            input load, clk );
  reg [9:0] count_value;
  always @(posedge clk)
    if (load) count_value <= data;
    else count_value <= count_value - 1;
  assign tc = count_value == 0;
endmodule
Reloading Counter in Verilog
module interval_timer_repetitive ( output tc,
                                   input [9:0] data,
                                   input load, clk );
  reg [9:0] load_value, count_value;
  always @(posedge clk)
    if (load) begin
      load_value <= data;
      count_value <= data;
    end
    else if (count_value == 0)
      count_value <= load_value;
    else
      count_value <= count_value - 1;
  assign tc = count_value == 0;
endmodule
Ripple Counter
Each bit toggles between 0 and 1
when previous bit changes from 1 to 0
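[Figure: chain of toggling D flip-flops, each clocked from the previous stage's output, with outputs Q0, Q1, Q2, …, Qn; timing diagram of clk, Q0, Q1 and Q2 showing each change rippling to the next bit]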
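A 4-bit ripple counter might be modeled as in this sketch. The module name ripple_counter, the 4-bit length, the asynchronous reset, and the choice of toggling bit 0 on the rising edge of clk are assumptions; each later bit toggles when the previous bit falls from 1 to 0, as stated above.

```verilog
// Sketch of a 4-bit ripple counter (simulation model; names are illustrative).
module ripple_counter ( output reg [3:0] q,
                        input clk, reset );
  always @(posedge clk or posedge reset)    // bit 0 toggles on each clock edge
    if (reset) q[0] <= 1'b0; else q[0] <= ~q[0];

  always @(negedge q[0] or posedge reset)   // bit 1 toggles when bit 0 falls
    if (reset) q[1] <= 1'b0; else q[1] <= ~q[1];

  always @(negedge q[1] or posedge reset)   // bit 2 toggles when bit 1 falls
    if (reset) q[2] <= 1'b0; else q[2] <= ~q[2];

  always @(negedge q[2] or posedge reset)   // bit 3 toggles when bit 2 falls
    if (reset) q[3] <= 1'b0; else q[3] <= ~q[3];
endmodule
```

Only bit 0 is clocked by clk, so in a real circuit the higher bits settle one flipflop delay after another, which is the source of the transient wrong values.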
Ripple or Synch Counter?
Ripple counter is ok if
length is short
clock period long relative to flipflop delay
transient wrong values can be tolerated
area must be minimal
E.g., alarm clock
Datapaths and Control
Digital systems perform sequences of
operations on encoded data
Datapath
Combinational circuits for operations
Registers for storing intermediate results
Control section: control sequencing
Generates control signals
Selecting operations to perform
Enabling registers at the right times
Example: Complex Multiplier
Cartesian form, fixed-point
operands: 4 pre-, 12 post-binary-point bits
result: 8 pre-, 24 post-binary-point bits
Subject to tight area constraints
$a = a_r + j a_i$, $b = b_r + j b_i$
$p = ab = p_r + j p_i = (a_r b_r - a_i b_i) + j (a_r b_i + a_i b_r)$
4 multiplies, 1 add, 1 subtract
Perform sequentially using 1 multiplier, 1 adder/subtracter
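A small worked example shows the four multiplies, one subtract and one add. The operand values are hypothetical, chosen only to fit within the 4 pre-binary-point bits.

```latex
\begin{align*}
a &= 1.5 + j\,0.5, \qquad b = 2 - j \\
p_r &= a_r b_r - a_i b_i = (1.5)(2) - (0.5)(-1) = 3 + 0.5 = 3.5 \\
p_i &= a_r b_i + a_i b_r = (1.5)(-1) + (0.5)(2) = -1.5 + 1 = -0.5 \\
p &= 3.5 - j\,0.5
\end{align*}
```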
Complex Multiplier Datapath
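[Figure: datapath with multiplexers selecting a_r or a_i (a_sel) and b_r or b_i (b_sel), a multiplier, two partial-product registers (pp1_ce, pp2_ce), an adder/subtracter (sub), and product registers p_r (p_r_ce) and p_i (p_i_ce), all clocked by clk]
Complex Multiplier in Verilog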
module multiplier
( output reg signed [7:-24] p_r, p_i,
input signed [3:-12] a_r, a_i, b_r, b_i,
input clk, reset, input_rdy );
reg a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce;
wire signed [3:-12] a_operand, b_operand;
wire signed [7:-24] pp, sum;
reg signed [7:-24] pp1, pp2;
Complex Multiplier in Verilog
assign a_operand = ~a_sel ? a_r : a_i;
assign b_operand = ~b_sel ? b_r : b_i;
// Both operands are signed, so they are sign-extended to the 32-bit width
// of pp and the product keeps all 24 fraction bits
assign pp = a_operand * b_operand;
always @(posedge clk) // Partial product 1 register
if (pp1_ce) pp1 <= pp;
always @(posedge clk) // Partial product 2 register
if (pp2_ce) pp2 <= pp;
assign sum = ~sub ? pp1 + pp2 : pp1 - pp2;
always @(posedge clk) // Product real-part register
if (p_r_ce) p_r <= sum;
always @(posedge clk) // Product imaginary-part register
if (p_i_ce) p_i <= sum; ...
Multiplier Control Sequence
Avoid resource conflict
First attempt
1. a_r * b_r → pp1_reg
2. a_i * b_i → pp2_reg
3. pp1 – pp2 → p_r_reg
4. a_r * b_i → pp1_reg
5. a_i * b_r → pp2_reg
6. pp1 + pp2 → p_i_reg
Multiplier Control Sequence
Merge steps where no resource conflict
Revised attempt
1. a_r * b_r → pp1_reg
2. a_i * b_i → pp2_reg
3. pp1 – pp2 → p_r_reg
   a_r * b_i → pp1_reg
4. a_i * b_r → pp2_reg
5. pp1 + pp2 → p_i_reg
Multiplier Control Signals
Step  a_sel  b_sel  pp1_ce  pp2_ce  sub  p_r_ce  p_i_ce
1     0      0      1       0       –    0       0
2     1      1      0       1       –    0       0
3     0      1      1       0       1    1       0
4     1      0      0       1       –    0       0
5     –      –      0       0       0    0       1
Finite-State Machines
Used to implement control sequencing
Based on mathematical automaton theory
An FSM is defined by
set of inputs: Σ
set of outputs: Γ
set of states: S
initial state: s0 ∈ S
transition function: δ: S × Σ → S
FSM in Hardware
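Mealy FSM: ω: S × Σ → Γ
Moore FSM: ω: S → Γ
[Figure: state register (D, Q, clk, reset) holding current_state; next-state logic driven by the inputs and current_state; output logic producing the outputs from current_state, with a path from the inputs to the output logic in a Mealy FSM only]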
FSM Example: Multiplier Control
One state per step
Separate idle state?
Wait for input_rdy = 1
Then proceed to steps 1, 2, ...
But this wastes a cycle!
Use step 1 as idle state
Repeat step 1 if input_rdy ≠ 1
Proceed to step 2 otherwise
Output function
Defined by table on slide 43
Moore or Mealy?
current_state  input_rdy  next_state
Step1          0          Step1
Step1          1          Step2
Step2          –          Step3
Step3          –          Step4
Step4          –          Step5
Step5          –          Step1
State Encoding
Encoded in binary
N states: use at least $\lceil \log_2 N \rceil$ bits
Encoded value used in circuits for transition
and output function
encoding affects circuit complexity
Optimal encoding is hard to find
CAD tools can do this well
One-hot works well in FPGAs
Often use 000...0 for idle state
FSMs in Verilog
Use parameters for state values
Synthesis tool can choose an alternative
encoding
parameter [2:0] step1 = 3'b000, step2 = 3'b001,
step3 = 3'b010, step4 = 3'b011, step5 = 3'b100;
reg [2:0] current_state, next_state ; ...
Multiplier Control in Verilog
always @(posedge clk or posedge reset)  // State register
  if (reset) current_state <= step1;
  else current_state <= next_state;
always @*  // Next-state logic
  case (current_state)
    step1: if (!input_rdy) next_state = step1;
           else next_state = step2;
    step2: next_state = step3;
    step3: next_state = step4;
    step4: next_state = step5;
    step5: next_state = step1;
    default: next_state = step1;  // unused encodings; avoids inferring a latch
  endcase
Multiplier Control in Verilog
always @* begin  // Output logic
  // Default values, overridden per state below
  a_sel = 1'b0; b_sel = 1'b0;
  pp1_ce = 1'b0; pp2_ce = 1'b0;
  sub = 1'b0;
  p_r_ce = 1'b0; p_i_ce = 1'b0;
  case (current_state)
    step1: begin
      pp1_ce = 1'b1;
    end
    step2: begin
      a_sel = 1'b1; b_sel = 1'b1; pp2_ce = 1'b1;
    end
    step3: begin
      b_sel = 1'b1; pp1_ce = 1'b1; sub = 1'b1; p_r_ce = 1'b1;
    end
    step4: begin
      a_sel = 1'b1; pp2_ce = 1'b1;
    end
    step5: begin
      p_i_ce = 1'b1;
    end
  endcase
end
endmodule
State Transition Diagrams
Bubbles to represent states
Arcs to represent transitions
Example
S = {s1, s2, s3}
Inputs (a1, a2):
Σ = {(0,0), (0,1), (1,0), (1,1)}
δ defined by diagram
[Figure: state transition diagram with bubbles s1, s2 and s3 and arcs labeled with input values (a1, a2)]
State Transition Diagrams
Annotate diagram to
define output function
Annotate states for
Moore-style outputs
Annotate arcs for
Mealy-style outputs
Example
x1, x2: Moore-style
y1, y2, y3: Mealy-style
[Figure: the same diagram for s1, s2 and s3, with each state annotated with its (x1, x2) values and each arc labeled "a1, a2 / y1, y2, y3"]
Multiplier Control Diagram
Input: input_rdy
Outputs
a_sel, b_sel, pp1_ce, pp2_ce, sub, p_r_ce, p_i_ce
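[Figure: state transition diagram; step1 loops to itself while input_rdy = 0 and goes to step2 when input_rdy = 1, then step2 → step3 → step4 → step5 → step1; each state is annotated with its outputs]
step1: 0, 0, 1, 0, –, 0, 0
step2: 1, 1, 0, 1, –, 0, 0
step3: 0, 1, 1, 0, 1, 1, 0
step4: 1, 0, 0, 1, –, 0, 0
step5: –, –, 0, 0, 0, 0, 1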
Bubble Diagrams or Verilog?
Many CAD tools provide editors for
bubble diagrams
Automatically generate Verilog for
simulation and synthesis
Diagrams are visually appealing
but can become unwieldy for complex
FSMs
Your choice...
Register Transfer Level
RTL — a level of abstraction
data stored in registers
transferred via circuits that operate on data
[Figure: datapath of registers and combinational circuits with a control section, showing inputs and outputs]
Clocked Synchronous Timing
Registers driven by a common clock
Combinational circuits operate during clock
cycles (between rising clock edges)
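$t_{co} + t_{pd} + t_{su} < t_c$
[Figure: path from register output Q1 through combinational logic with delay tpd to register input D2; timing diagram of clk, Q1 and D2 showing tco, tpd and tsu within the clock period tc]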
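As a numeric illustration of the register-to-register constraint, assume hypothetical values of 0.5 ns clock-to-output delay, 3.0 ns combinational delay and 0.2 ns setup time. These numbers are assumptions for the example, not data-sheet values.

```latex
\begin{align*}
t_{co} + t_{pd} + t_{su} &= 0.5 + 3.0 + 0.2 = 3.7\ \text{ns} < t_c \\
f_{\max} &< \frac{1}{3.7\ \text{ns}} \approx 270\ \text{MHz}
\end{align*}
```

Any clock period longer than 3.7 ns satisfies the constraint for this path; the path with the largest sum sets the limit for the whole system.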
Control Path Timing
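$t_{co} + t_{pd\text{-}s} + t_{pd\text{-}o} + t_{pd\text{-}c} + t_{su} < t_c$
$t_{co} + t_{pd\text{-}s} + t_{pd\text{-}ns} + t_{su} < t_c$
Ignore $t_{pd\text{-}s}$ for a Moore FSM
[Figure: control path annotated with the delays tco, tpd-s, tpd-ns, tpd-o, tpd-c and tsu]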
Timing Constraints
Inequalities must hold for all paths
If $t_{co}$ and $t_{su}$ are the same for all paths
Combinational delays make the difference
Critical path
The combinational path between registers with
the longest delay
Determines minimum clock period for the entire
system
Focus on it to improve performance
Interpretation of Constraints
1. Clock period depends on delays
System can operate at any frequency up
to a maximum
OK for systems where high performance
is not the main requirement
2. Delays must fit within a target clock period
Optimize critical paths to reduce delays if
necessary
Clock Skew
Need to ensure clock edges arrive at all
registers at the same time
Use CAD tools to insert clock buffers and
route clock signal paths
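[Figure: two registers clocked by clk1 and clk2, with a path from Q1 to D2; timing diagram of clk1, clk2, Q1 and D2 showing the effect of skew]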
Off-Chip Connections
Delays going off-chip and inter-chip
Input and output pad delays, wire delays
Same timing rules apply
Use input and output registers to avoid
adding external delay to critical path
Asynchronous Inputs
External inputs can change at any time
Might violate setup/hold time constraints
Can induce a metastable state in a flipflop
Unbounded time to recover
May violate setup/hold time
of subsequent flipflop
$$\mathrm{MTBF} = \frac{e^{k_2 t_f}}{k_1 f_f f_2}$$
Synchronizers
If input changes outside setup/hold window
Change is simply delayed by one cycle
If input changes during setup/hold window
First flipflop has a whole cycle to resolve
metastability
See data sheets for metastability parameters
[Figure: synchronizer of two D flip-flops in series, both clocked by clk]
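The two-flipflop synchronizer can be sketched as below. The module and signal names (synchronizer, async_in, stage1, sync_out) are illustrative assumptions.

```verilog
// Sketch of a two-flipflop synchronizer for an asynchronous input.
module synchronizer ( output reg sync_out,
                      input async_in, clk );
  reg stage1;   // first flipflop; may go metastable if async_in changes near the edge

  always @(posedge clk) begin
    stage1   <= async_in;   // sample the asynchronous input
    sync_out <= stage1;     // resample one cycle later, after stage1 has had time to settle
  end
endmodule
```

The rest of the design should use only sync_out, never async_in or stage1 directly.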
Switch Inputs and Debouncing
Switches and push-buttons suffer from
contact bounce
Takes up to 10ms to settle
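Need to debounce to avoid false triggering
Requires two inputs
and two resistors
Must use a
break-before-make double-throw switch
[Figure: double-throw switch and two resistors connected to +V, driving the R and S inputs of an RS latch with output Q]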
Switch Inputs and Debouncing
Alternative
Use a single-throw switch
Sample input at intervals longer than bounce time
Look for two successive samples with the same value
Assumption
Extra circuitry inside the chip
is cheaper than extra
components and connections outside
Debouncing in Verilog
module debouncer ( output reg pb_debounced,
                   input pb,
                   input clk, reset );
  reg [18:0] count500000;  // values are in the range 0 to 499999
  wire clk_100Hz;
  reg pb_sampled;
  always @(posedge clk or posedge reset)
    if (reset) count500000 <= 499999;
    else if (clk_100Hz) count500000 <= 499999;
    else count500000 <= count500000 - 1;
  assign clk_100Hz = count500000 == 0;
  always @(posedge clk)
    if (clk_100Hz) begin
      if (pb == pb_sampled) pb_debounced <= pb;
      pb_sampled <= pb;
    end
endmodule
Verifying Sequential Circuits
DUV may take multiple and varying number of
cycles to produce output
Checker needs to
synchronize with test generator
ensure DUV outputs occur when expected
ensure DUV outputs are correct
ensure no spurious outputs occur
[Figure: Apply Test Cases → Design Under Verification (DUV) → Checker]
Example: Multiplier Testbench
`timescale 1ns/1ns
module multiplier_testbench;
  parameter t_c = 50;
  reg clk, reset;
  reg input_rdy;
  wire signed [3:-12] a_r, a_i, b_r, b_i;
  wire signed [7:-24] p_r, p_i;
  real real_a_r, real_a_i, real_b_r, real_b_i,
       real_p_r, real_p_i, err_p_r, err_p_i;
  task apply_test ( input real a_r_test, a_i_test,
                               b_r_test, b_i_test );
    begin
      real_a_r = a_r_test; real_a_i = a_i_test;
      real_b_r = b_r_test; real_b_i = b_i_test;
      input_rdy = 1'b1;
      @(negedge clk) input_rdy = 1'b0;
      repeat (5) @(negedge clk);
    end
  endtask
Example: Multiplier Testbench
  multiplier duv ( .clk(clk), .reset(reset),
                   .input_rdy(input_rdy),
                   .a_r(a_r), .a_i(a_i),
                   .b_r(b_r), .b_i(b_i),
                   .p_r(p_r), .p_i(p_i) );
  always begin  // Clock generator
    #(t_c/2) clk = 1'b1;
    #(t_c/2) clk = 1'b0;
  end
  initial begin  // Reset generator
    reset <= 1'b1;
    #(2*t_c) reset = 1'b0;
  end
Example: Multiplier Testbench
  initial begin  // Apply test cases
    @(negedge reset)
    @(negedge clk)
    apply_test(0.0, 0.0, 1.0, 2.0);
    apply_test(1.0, 1.0, 1.0, 1.0);
    // further test cases ...
    $finish;
  end
  assign a_r = $rtoi(real_a_r * 2**12);
  assign a_i = $rtoi(real_a_i * 2**12);
  assign b_r = $rtoi(real_b_r * 2**12);
  assign b_i = $rtoi(real_b_i * 2**12);
Example: Multiplier Testbench
  always @(posedge clk)  // Check outputs
    if (input_rdy) begin
      real_p_r = real_a_r * real_b_r - real_a_i * real_b_i;
      real_p_i = real_a_r * real_b_i + real_a_i * real_b_r;
      repeat (5) @(negedge clk);
      // Scale the integer outputs by 2^-24 using real arithmetic
      // (the integer expression 2**(-24) would evaluate to 0)
      err_p_r = $itor(p_r) * 2.0**(-24) - real_p_r;
      err_p_i = $itor(p_i) * 2.0**(-24) - real_p_i;
      if (!( -(2.0**(-12)) < err_p_r && err_p_r < 2.0**(-12) &&
             -(2.0**(-12)) < err_p_i && err_p_i < 2.0**(-12) ))
        $display("Result precision requirement not met");
    end
endmodule
Asynchronous Timing
Clocked synchronous timing requires
global clock distribution with minimal skew
path delay between registers < clock period
Hard to achieve in complex multi-GHz systems
Globally asynchronous, locally synchronous (GALS) systems
Divide the system into local clock domains
Inter-domain signals treated as asynch inputs
Simplifies clock management and constraints
Delays inter-domain communication
Delay-insensitive asynchronous systems
Other Clock-Related Issues
Inter-chip clocking
Distributing high-speed clocks on PCBs is hard
Often use slower off-chip clock, with on-chip clock
a multiple of off-chip clock
Synchronize on-chip with phase-locked loop (PLL)
In multi-PCB systems
treat off-PCB signals as asynch inputs
Low power design
Continuous clocking wastes power
Summary
Registers for storing data
synchronous and asynchronous control
clock enable, reset, preset
Latches: level-sensitive
usually unintentional in Verilog
Counters
free-running dividers, terminal count,
Summary
RTL organization of digital systems
datapath and control section
Finite-State Machine (FSM)
states, inputs, transition/output functions
Moore and Mealy FSMs
bubble diagrams
Clocked synch timing and constraints
critical path and optimization
Asynch inputs, switch debouncing
Verification of sequential systems