
Chapter 4. Approximate Timing-Error Correction by Path Delay Shaping

4.7 Comparison to Published Approaches

In order to put the proposed PDS technique into context, a qualitative

comparison with established approaches from the literature is presented. Table 4.1

contrasts the key features of PDS along with generalisations of two approaches

introduced in Chapter 2, both of which have also been studied in numerous DSP

applications.

The most established technique is ANT, which is the only one to have been

realised in silicon to date [61]. ANT represents an elegant solution that requires only

standard logic gates and flip-flops. An implementation of the algorithm, known as

the main pipeline, is designed to produce timing-errors in the MSBs, such that they

are of large magnitude. This is achieved by explicit use of RCAs, which in our

comparisons were found to incur a delay increase of 69–72% in high performance

applications. An estimator block is then implemented to provide a low-overhead

prediction of the current output sample. An error is identified by the decision block

when the main pipeline output diverges from the output of the estimator block by

more than a given threshold. If an error is detected, the incorrect sample is replaced with the estimator sample.

Table 4.1: Qualitative comparison with published approaches.

Approach | Error Detection | Error Mitigation | Circuit Overhead
ANT | Main pipeline output sample compared to estimator output | Replace erroneous output sample with estimator sample | RCAs in main pipeline limit performance; estimator block; decision block
SDC | Analogue PVT sensor and pre-calibrated look-up table | Errors forced into less significant coefficients | PVT sensor; minimal performance loss due to the algorithm/architecture modifications
PDS | Razor flip-flops | Errors forced into less significant bits of datapaths | Razor flops; modified carry-merge adders

The estimator block and the decision block introduce additional circuit overhead,

mainly dictated by the implementation of the predictor, for which there are a number

of options. The predictor block may also introduce additional limitations, e.g. the

linear prediction approach may limit signal bandwidth, while the RPR technique

may introduce excessive area overhead [65].
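
The decision step described above can be sketched in a few lines of Python. This is a minimal behavioural illustration only, not the circuit from [61]: the function names, the integer sample format and the threshold value are assumptions made for the example.

```python
def ant_decide(main_sample: int, estimator_sample: int, threshold: int) -> int:
    """Return the main pipeline sample unless it diverges from the estimate.

    An error is flagged only when the difference exceeds the threshold, so a
    large-magnitude (MSB) timing error is caught while the small, expected
    mismatch between the two blocks is tolerated.
    """
    if abs(main_sample - estimator_sample) > threshold:
        return estimator_sample  # replace the erroneous sample
    return main_sample


def ant_filter(main_samples, estimator_samples, threshold):
    """Apply the decision block sample by sample."""
    return [
        ant_decide(m, e, threshold)
        for m, e in zip(main_samples, estimator_samples)
    ]


# Hypothetical data: the third main sample carries an MSB timing error.
main = [100, 102, 101 + (1 << 12), 99]
est = [98, 103, 100, 101]
corrected = ant_filter(main, est, threshold=16)
print(corrected)  # [100, 102, 100, 99]
```

The sketch also shows why the main pipeline is designed to fail in the MSBs: an error smaller than the threshold would pass through the decision block undetected.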

Significance-driven computation is more closely related to PDS, in that it

coerces timing-errors to occur first in a particular part of the circuit by affecting the

path delay, resulting in a timing guard-band. The key difference is that SDC ensures

that computations associated with less significant coefficients are most critical and

therefore fail first, while the most important coefficients are engineered with a slack

margin. This is achieved in practice through algorithm/architecture transformations

that maximise sharing of common sub-expressions for slow paths and minimise

sharing for fast paths. These transformations are most applicable when the algorithm

coefficients are fixed and known in advance. The DCT design in [65], for instance,

frequencies in the spatial domain. The result on image quality is a gradual loss of

fine detail as the three path groups are progressively compromised. A calibrated

analogue PVT sensor is used to detect delay variation in order to zero DCT

coefficients that are subject to timing-errors. This constitutes a canary approach and

a margin must be included to account for intra-die process variation.
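
The canary-style mitigation can be illustrated with a small Python sketch. It is a behavioural model under stated assumptions, not the design from [65]: the look-up table entries, the picosecond units, the `margin_ps` parameter and the grouping of coefficients are all hypothetical.

```python
# Hypothetical pre-calibrated look-up table (values are illustrative only):
# (minimum sensed delay in ps, number of least significant groups to zero).
DELAY_LUT_PS = [(1300, 3), (1200, 2), (1100, 1)]


def groups_to_zero(sensed_delay_ps: int, margin_ps: int = 0) -> int:
    """Map a sensor reading, plus a safety margin, to a number of groups."""
    effective = sensed_delay_ps + margin_ps
    for min_delay_ps, n_groups in DELAY_LUT_PS:
        if effective >= min_delay_ps:
            return n_groups
    return 0


def mask_coefficients(coeff_groups, sensed_delay_ps, margin_ps=0):
    """Zero the trailing (least significant) groups; the input is not modified."""
    n_zero = min(groups_to_zero(sensed_delay_ps, margin_ps), len(coeff_groups))
    keep = len(coeff_groups) - n_zero
    return [
        list(g) if i < keep else [0] * len(g)
        for i, g in enumerate(coeff_groups)
    ]


groups = [[50, 40], [12, 9], [3, 2]]  # most significant group first
print(mask_coefficients(groups, 1000))                # nothing zeroed
print(mask_coefficients(groups, 1150))                # last group zeroed
print(mask_coefficients(groups, 1150, margin_ps=50))  # margin tips a second group
```

Because the sensor observes delay variation indirectly rather than on the datapath itself, the margin makes the masking more conservative: in the last call, the same reading causes a second group to be zeroed.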

PDS offers an approach derived from applying Razor DVS to DSP circuits.

The Razor concept is adapted to DSP accelerators using timing-error detecting flip-

flops on critical paths and adapting the error correction replay mechanism with the

error mitigation approach proposed. The circuit area, delay and power overheads are

low, and the design flow required is realistic and does not require algorithm

manipulations. It is emphasised that no attempt is made in this work to provide

resiliency at very high error rates; instead, the target is a sufficient error-rate

tolerance to afford a timing guard-band that protects against catastrophic failure due to

fast-moving delay variation effects, while reducing power consumption through voltage

scaling. A significant limitation of PDS is that there is a minimum suitable datapath

bit width that allows a large enough timing guard-band to be developed.
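
The benefit of forcing errors into the less significant bits can be made concrete with a short Python sketch. It models a timing error as a single bit-flip in a two's-complement word, which is a simplifying assumption (a real timing violation may corrupt several bits); the function names and the 16-bit width are chosen only for illustration.

```python
def flip_bit(value: int, bit: int, width: int) -> int:
    """Flip one bit of a two's-complement word and return the signed result."""
    raw = (value & ((1 << width) - 1)) ^ (1 << bit)
    # Reinterpret the raw word as signed if its sign bit is set.
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def error_magnitude(value: int, bit: int, width: int) -> int:
    """Absolute numerical error caused by flipping the given bit."""
    return abs(flip_bit(value, bit, width) - value)


width = 16
value = 1234
for bit in (0, 3, 14):
    # The error is 2**bit, so it grows exponentially with bit position.
    print(bit, error_magnitude(value, bit, width))
```

Under this model, an error in bit k has magnitude 2^k, so errors confined to the low-order bits stay small relative to full scale. It also hints at the limitation noted above: a narrow datapath has few low-order bits that can absorb errors.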

As a final remark, it is conceded that all the considered approaches are

susceptible to failure due to datapath metastability. While metastability is not

addressed directly in this work, it is remarked that metastability is arguably of less

concern in the case of a DSP accelerator, since there is little control logic and often

very little interaction between datapath and control logic. In addition to this, DSP

circuits typically have a given probability of failure anyway since they are often

4.8 Conclusions

In this chapter, a new, more generally applicable technique for circuit-level

timing-error mitigation was described, along with the design of an FIR filter and 2D

DCT. The technique is also broadly applicable to other fixed-point

datapaths. The proposed approach has a much smaller impact on minimum clock

period (16–20% reduction compared to conventional), as it does not necessitate the

use of ripple-carry adders and has the advantage of requiring minimal additional

design effort.

The proposed approach has been investigated in the context of a high-

performance programmable digital filter accelerator and a 2D DCT accelerator in a

32nm CMOS process technology. Simulation results show a reduction in power

consumption of 21–23% through DVS, with a negligible degradation in algorithmic

performance in both cases, as compared to a fully margined conventional

implementation at nominal supply voltage.

The PDS approach is shown to offer a number of attractive advantages when

compared to other published approaches to timing-error tolerance. Nonetheless, in

common with the approach of Chapter 3 and indeed many of the published

approaches discussed in Chapter 2, timing violations still result in some error, albeit

limited and easily modelled. In a commercial VLSI design context, this may still

restrict applicability, since many aspects of verification, at both the design and

manufacturing test stages, require predictable, bit-accurate results. While design

verification and manufacturing test are beyond the scope of this work, the next

chapter explores an approach that potentially avoids logic errors altogether during

Chapter 5. Timing-Error Correction by

In document Timing-Error Tolerance Techniques for Low-Power DSP: Filters and Transforms (Page 97-101)