Chapter 4. Approximate Timing-Error Correction by Path Delay Shaping
4.7 Comparison to Published Approaches
In order to put the proposed PDS technique into context, a qualitative
comparison with established approaches from the literature is presented. Table I
contrasts the key features of PDS along with generalisations of two approaches
introduced in Chapter 2, both of which have also been studied in numerous DSP
applications.
The most established technique is ANT, which is the only one to have been
realised in silicon to date [61]. ANT represents an elegant solution that requires only
standard logic gates and flip-flops. An implementation of the algorithm, known as
the main pipeline, is designed to produce timing-errors in the MSBs, such that they
are of large magnitude. This is achieved by explicit use of RCAs, which in our
comparisons were found to incur a delay increase of 69–72% in high performance
applications. An estimator block is then implemented to provide a low-overhead
prediction of the current output sample. An error is identified by the decision block
when the main pipeline output diverges from the output of the estimator block by
more than a given threshold. If an error is detected, the incorrect sample is replaced
Table 4.1: Qualitative comparison with published approaches.
Error Detection Error Mitigation Circuit Overhead ANT Main pipeline output
sample compared to estimator output
Replace erroneous output sample with estimator sample
RCAs in main pipeline limit performance; estimator block; decision
block SDC Analogue PVT sensor
and pre-calibrated look-up table
Errors forced into less significant coefficients
PVT sensor; minimal performance loss due to
the
algorithm/architecture modifications PDS Razor flip-flops Errors forced into less
significant bits of datapaths
Razor flops; modified carry-merge adders
The estimator block and the decision block introduce additional circuit overhead,
mainly dictated by the implementation of the predictor, for which there are a number
of options. The predictor block may also introduce additional limitations, e.g. the
linear prediction approach may limit signal bandwidth, while the RPR technique
may introduce excessive area overhead [65].
Significance-driven computation is more closely related to PDS, in that it
coerces timing-errors to occur first in a particular part of the circuit by affecting the
path delay, resulting in a timing guard-band. The key difference is that SDC ensures
that computations associated with less significant coefficients are most critical and
therefore fail first, while the most important coefficients are engineered with a slack
margin. This is achieved in practice through algorithm/architecture transformations
that maximise sharing of common sub-expressions for slow paths and minimise
sharing for fast paths. These transformations are most applicable when the algorithm
coefficients are fixed and know in advance. The DCT design in [65], for instance,
frequencies in the spatial domain. The result on image quality is a gradual loss of
fine detail as the three path groups are progressively compromised. A calibrated
analogue PVT sensor is used to detect delay variation in order to zero DCT
coefficients that are subject to timing-errors. This constitutes a canary approach and
a margin must be included to account for intra-die process variation.
PDS offers an approach derived from applying Razor DVS to DSP circuits.
The Razor concept is adapted to DSP accelerators using timing-error detecting flip-
flops on critical paths and adapting the error correction replay mechanism with the
error mitigation approach proposed. The circuit area, delay and power overheads are
low, and the design flow required is realistic and does not require algorithm
manipulations. It is emphasised that in this work, that no attempt is made to provide
resiliency at very high error rates, but instead are targeting a sufficient error rate
tolerance to afford a timing guard band to protect against catastrophic failure due to
fast moving delay variation effects, while reducing power consumption from voltage
scaling. A significant limitation of PDS is that there is a minimum suitable datapath
bit width that allows a large enough timing guard-band to be developed.
As a final remark, it is conceded that all the considered approaches are
susceptible to failure due to datapath metastability. While metastability is not
addressed directly in this work, it is remarked that metastability is arguably of less
concern in the case of a DSP accelerator, since there is little control logic and often
very little interaction between datapath and control logic. In addition to this, DSP
circuits typically have a given probability of failure anyway since they are often
4.8 Conclusions
In this chapter, a new, more generally applicable technique for circuit-level
timing-error mitigation was described, along with the design of an FIR filter and 2D
DCT transform. The technique is also broadly applicable to other fixed-point
datapaths. The proposed approach has a much smaller impact on minimum clock
period (16 – 20% reduction compared to conventional), as it does not necessitate the
use of ripple-carry adders and has the advantage of requiring minimal additional
design effort.
The proposed approach has been investigated in the context of a high-
performance programmable digital filter accelerator and a 2D DCT accelerator in a
32nm CMOS process technology. Simulation results show a reduction in power
consumption of 21 – 23% through DVS, with a negligible degradation in algorithmic
performance in both cases, as compared to a fully margined conventional
implementation at nominal supply voltage.
The PDS approach is shown to offer a number of attractive advantages when
compared to other published approaches to timing-error tolerance. Nonetheless, in
common with the approach of Chapter 3 and indeed many of the published
approaches discussed in Chapter 2, timing violations still result in some error, albeit
limited and easily modelled. In a commercial VLSI design context, this may still
restrict applicability, since many aspects of design verification at both design and
manufacturing test stages, require predictable, bit-accurate results. While design
verification and manufacturing test are beyond the scope of this work, the next
chapter explores an approach that potentially avoids logic errors altogether during