Chapter 5 Research Implementation
5.3 Adaptive Error Compensation in Subsequent Addition
We have implemented a modification of the approach described in Algorithm 1, which we refer to as adaptive error compensation in subsequent addition (AECSA). In this design, we calculate the errors during summation and, where possible, apply the correction as well. If the correction cannot be applied because the corresponding error is not yet available, we accumulate the error instead. Once the set has been completely reduced, we add the final accumulated error term to the result. This approach is depicted in Figure 5.7.
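The behaviour described above can be modelled in software as a minimal sketch. The names `two_sum`, `aecsa_sum` and `can_compensate` are hypothetical and do not appear in the hardware design; the predicate `can_compensate` merely stands in for "the error of the previous addition is available in time", and Knuth's TwoSum is used here as an assumed stand-in for the error-generating adder.

```python
import math

def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and e such that a + b == s + e exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

def aecsa_sum(values, can_compensate=lambda i: True):
    """Software model (an assumption, not the RTL) of the AECSA policy."""
    total = 0.0    # running sum of the set
    pending = 0.0  # error produced by the previous addition
    err_acc = 0.0  # errors that could not be fed back in time
    for i, x in enumerate(values):
        if can_compensate(i):
            x = x + pending      # correction applied in the subsequent addition
        else:
            err_acc += pending   # error not usable now: accumulate it instead
        total, pending = two_sum(total, x)
    err_acc += pending           # the last error is always accumulated
    return total + err_acc       # final accumulated error added to the result

def naive_sum(values):
    """Plain left-to-right summation without any compensation."""
    total = 0.0
    for x in values:
        total += x
    return total

vals = [1.0] + [1e-16] * 10
print(naive_sum(vals), aecsa_sum(vals), math.fsum(vals))
```

In this model the small terms are lost entirely by the plain summation, while both the "always compensate" and the "never compensate, only accumulate" policies recover them; the hardware design sits between these two extremes.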
Figure 5.6 shows the pipeline required to implement one iteration of the compensated summation algorithm that compensates for the error in the subsequent addition. In this implementation, the error term e calculated in the previous iteration is required as an input to the first adder. Because of the depth of the adder network, this error cannot be available every cycle. The error will not be available for each addition even if we replace the two adders with one custom adder that generates the error. Values from different sets can be scheduled in an interleaved manner, but this requires a large amount of on-chip resources, complex control logic and upfront processing of the data.
Figure 5.6: Error Compensation in Subsequent Addition
Figure 5.8 shows an overview of AECSA. This design requires two reduction circuits but, unlike AEC, the application of rules in ERC depends on the availability of errors to VRC. Further, since error compensation may take place during the accumulation of values, VRC requires two adders connected sequentially. The first adder performs the error correction by adding the error to an input value, while the second adder adds the output of the first adder to the second input value. The second adder also generates the error term.
Figure 5.7: Adaptive Error Compensation in Subsequent Addition
the input values. If |a| > |b|, a and b are swapped. Thus, the overall adder pipeline depth in VRC is more than twice that in AEC and in the original reduction circuit. Here, we increase the size of the input FIFO but keep the number of buffers the same as in the original reduction circuit in order to keep the control logic simple.
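The two sequential adders in VRC can be illustrated with a short sketch. The function names `fast_two_sum` and `vrc_stage` are hypothetical, and the Fast2Sum scheme is an assumed model of the error-generating adder. Which operand ends up first after the swap is a convention; the sketch places the larger magnitude first, which is an assumption and may be the mirror image of the ordering used in the hardware.

```python
from fractions import Fraction

def fast_two_sum(a, b):
    """Fast2Sum: returns (s, e) with a + b == s + e, given |a| >= |b| after the swap."""
    if abs(a) < abs(b):
        a, b = b, a          # magnitude comparison and swap of the inputs
    s = a + b
    e = b - (s - a)          # rounding error of the addition
    return s, e

def vrc_stage(in1, in2, err_in=0.0):
    """Assumed model of one pass through the two VRC adders."""
    corrected = in1 + err_in            # first adder: plain error correction
    return fast_two_sum(corrected, in2) # second adder: sum plus error term

def is_exact(a, b, s, e):
    """Check a + b == s + e in exact rational arithmetic."""
    return Fraction(a) + Fraction(b) == Fraction(s) + Fraction(e)

s, e = vrc_stage(1e-16, 1.0, err_in=1e-16)
print(s, e, is_exact(1e-16 + 1e-16, 1.0, s, e))
```

Note that only the second adder reports its rounding error; any rounding in the first adder is not captured in this model.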
As discussed previously, because of the depth of the adder pipeline and simultaneous accumulations from the same set, a situation may arise in which the error cannot be compensated using the first adder. In such a case, we accumulate the error using ERC. Errors can also be supplied from ERC to VRC in order to maximize the chances of error compensation in VRC. After set reduction, the final error term from ERC is added to the sum to obtain the final result.
The major difference between AEC and AECSA is that in AEC the rules in ERC are independent of conditions in VRC, while in AECSA a check for a set ID match is performed and, if a match is found, that error value is supplied to VRC. In other words, if the set ID of an error term in ERC is equal to the set ID being input to the adder pipeline in VRC, the corresponding value from ERC is supplied to VRC for error compensation.
In ERC, if the conditions for a rule are satisfied but the set ID of the error source matches in VRC, then that rule is not applied. In such a case, some other rule is applied. Error accumulation in ERC is performed only when it is not possible to supply the errors to VRC as input. Thus, error input to VRC is prioritized. The rules in ERC have been modified in accordance with this policy.
Figure 5.8: Module for AECSA
There are four sources of the input error in VRC: the output of the custom floating-point adder in VRC, the FIFO output in ERC, the buffers in ERC and the output of the adder in ERC. When applying Rule 1, Rule 2, Rule 3 and Rule 4, the set ID of each error output is compared with the adder input set ID. If a match is found, the respective error term is supplied as input. If the error output from the custom adder in VRC serves as the input error, this error term is not supplied to ERC and the err_valid line is de-asserted. If the error term comes from ERC, it is invalidated in ERC and is not considered for accumulation. The rule to be applied in ERC depends on whether the corresponding error term has been invalidated or not. For example, in ERC, if the set ID of the adder output matches the set ID of one of the buffers (Rule 1), but at the same time, in VRC, the set ID is the same as the set ID of the adder output in ERC, then Rule 1 is not applied in ERC and the error term from the output of the adder pipeline in ERC is supplied to VRC. In such a case, some other rule is applied in ERC.
Algorithm 6 describes the control logic in VRC. The number of compare operations per cycle is significantly larger, and the application of a rule in ERC depends on the logic decision from VRC. This poses a timing challenge and adversely affects performance in terms of operating frequency. Also, the rules in ERC have been modified to accommodate the condition for a potential set ID match in VRC. The control logic for ERC is shown in Algorithm 7.
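The prioritized selection among the four error sources can be sketched as follows. The class `ErrTerm` and the function `select_error` are hypothetical names introduced only for illustration, and the explicit `valid` flag is an assumption about how invalidation might be represented; the sketch only mirrors the idea that the first matching source wins and is then excluded from accumulation in ERC.

```python
from dataclasses import dataclass

@dataclass
class ErrTerm:
    set_id: int         # set the error term belongs to
    value: float        # the error value itself
    valid: bool = True  # cleared once the term has been consumed by VRC

def select_error(adder_set_id, sources):
    """Return the error to feed into the first VRC adder.

    `sources` is listed in priority order, e.g. VRC adder error output,
    ERC FIFO output, ERC adder output, ERC buffers (assumed ordering).
    """
    for term in sources:
        if term is not None and term.valid and term.set_id == adder_set_id:
            term.valid = False   # invalidate: must not be accumulated in ERC too
            return term.value
    return 0.0                   # no match: nothing to compensate this cycle

sources = [ErrTerm(2, 0.5), ErrTerm(1, 0.25), ErrTerm(1, 0.125)]
print(select_error(1, sources))  # picks the first term of set 1
```

Because a consumed term is invalidated, a later ERC rule that would have used it has to fall through to another rule, which is the dependency between the two circuits described above.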
In the original reduction circuit, we have three counters: the first counts up when a new value arrives in the reduction circuit, the second counts down when two values are supplied to the adder, and the third counts down when a set has been reduced. Since ERC also supplies error terms back to VRC, the number of valid error terms decreases even when there is no reduction. To account for the error terms supplied to VRC from ERC and to keep track of the number of errors belonging to a particular set in ERC, we need a fourth counter. This counter counts down whenever an error term is supplied to VRC from ERC. Without this counter, the error reduction will not be correct. The example in Figure 5.9 depicts the working of the counters in ERC. The subscripts in the example represent the counter that is activated. Thus, when two error terms are added in ERC, counter 1 is activated. Similarly, if a value is supplied from ERC to VRC from the input FIFO, the buffers or the output of the adder, counter 4 is activated.
In ERC, as in AEC, the error values belonging to a particular set are not contiguous, hence we need to check the status of the set in VRC. This can be checked using the valid_out signal when an error term is supplied to ERC. If the valid_out signal is asserted, then the main set has been completely reduced and the last error value has been supplied to ERC. This signal is synchronized with the input FIFO in ERC and is stored in a dual-ported memory in ERC. In order to assert valid_out_err in ERC, this signal must be asserted and the sum of the outputs of the four counters must be 1.
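The need for the additional counter can be illustrated with a simplified per-set bookkeeping model. The class `ErcSetTracker` and its field names are hypothetical; the sketch tracks only arrivals, pair additions and terms forwarded to VRC, and deliberately leaves out the counter that fires when a set has been reduced, so it should not be read as the actual counter implementation.

```python
class ErcSetTracker:
    """Simplified, assumed model of the per-set bookkeeping in ERC."""

    def __init__(self):
        self.arrived = 0       # counts up for every error term entering ERC
        self.merged = 0        # counts down when two terms go to the ERC adder
        self.forwarded = 0     # counts down when a term is supplied to VRC
        self.vrc_done = False  # models valid_out: main set fully reduced

    def on_arrival(self):
        self.arrived += 1

    def on_pair_added(self):
        self.merged -= 1       # two terms become one

    def on_forward_to_vrc(self):
        self.forwarded -= 1    # the term leaves ERC without any reduction

    def live_terms(self):
        return self.arrived + self.merged + self.forwarded

    def error_valid(self):
        """Models valid_out_err: one term left and the main set is done."""
        return self.vrc_done and self.live_terms() == 1

t = ErcSetTracker()
for _ in range(4):
    t.on_arrival()
t.on_forward_to_vrc()
t.on_pair_added()
t.on_pair_added()
t.vrc_done = True
print(t.live_terms(), t.error_valid())
```

If the forwarded terms were not counted in this example, the tracker would report two live terms instead of one and the completion condition would never be met.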
Algorithm 6 AECSA VRC Rules
if ∃n : buf_n.set = adderOut.set then                ▷ Rule 1
    rule1 = 1
    addIn1 = adderOut
    addIn2 = buf_n
    if input.valid then
        buf_n = input
    end if
else if ∃i, j : buf_i.set = buf_j.set then           ▷ Rule 2
    rule2 = 1
    addIn1 = buf_i
    addIn2 = buf_j
    if input.valid then
        buf_i = input
    end if
    if numActive(adderOut.set) = 1 then
        result = adderOut
    else
        buf_i = adderOut
    end if
else if input.valid then                             ▷ Rule 3
    if input.set = adderOut.set then
        rule3 = 1
        addIn1 = input
        addIn2 = adderOut
    end if
else if input.valid then                             ▷ Rule 4
    if ∃n : buf_n.set = input.set then
        rule4 = 1
        addIn1 = input
        addIn2 = buf_n
        if numActive(adderOut.set) = 1 then
            result = adderOut
        else
            buf_n = adderOut
        end if
    end if
else if input.valid then                             ▷ Rule 5
    addIn1 = input
    addIn2 = 0
    if numActive(adderOut.set) = 1 then
        result = adderOut
    else
        if ∃n : buf_n.valid = 0 then
            buf_n = adderOut
        else
            ERROR
        end if
    end if
else                                                 ▷ Rule 6
    addIn1 = adderOut
    addIn2 = 0
end if
if rule1 OR rule2 OR rule3 OR rule4 then
    if not(adderOut.rule_5_or_6_out) then
        addErrIn = addererrOut
        errInput.disable = 1
    else if errInput.set = adderOut.set then
        addErrIn = errInput
        errInput.errEn = 1
    else if errAdderOut.set = adderOut.set then
        addErrIn = errAdderOut
        errAdderOut.errEn = 1
    else if ∃n : errBuf_n.set = adderOut.set then
        addErrIn = errBuf_n
        errBuf_n.errEn = 1
    else
        addErrIn = 0.0
    end if
else
    addErrIn = 0.0
end if
if count1 + count2 + count3 = 1 then
    redCktOut.valid = 1
    redCktOut.set = adderOut.set
    redCktOut = adderOut
end if
if not(errInput.disable) or not(adderOut.rule_5_or_6_out) then
    redCktOut.Err = adderOut.Err
    redCktOut.errValid = adderOut.valid
    redCktOut.errSet = adderOut.set
end if
Algorithm 7 AECSA ERC Rules
if ∃n : errBuf_n.set = errAdderOut.set and
        not(errBuf_n.errEn or errAdderOut.errEn) then        ▷ Rule 1
    rule1 = 1
    errAddIn1 = errAdderOut
    errAddIn2 = errBuf_n
    if errInput.valid then
        errBuf_n = errInput
    end if
else if ∃i, j : errBuf_i.set = errBuf_j.set and
        not(errBuf_i.errEn or errBuf_j.errEn) then           ▷ Rule 2
    rule2 = 1
    errAddIn1 = errBuf_i
    errAddIn2 = errBuf_j
    if errInput.valid then
        errBuf_i = errInput
    end if
    if numActive(errAdderOut.set) = 1 then
        result = errAdderOut
    else
        errBuf_i = errAdderOut
    end if
else if errInput.valid then                                  ▷ Rule 3
    if errInput.set = errAdderOut.set and
            not(errInput.errEn or errAdderOut.errEn) then
        rule3 = 1
        errAddIn1 = errInput
        errAddIn2 = errAdderOut
    end if
else if errInput.valid then                                  ▷ Rule 4
    if ∃n : errBuf_n.set = errInput.set and
            not(errInput.errEn or errBuf_n.errEn) then
        rule4 = 1
        errAddIn1 = errInput
        errAddIn2 = errBuf_n
        if numActive(errAdderOut.set) = 1 then
            result = errAdderOut
        else
            errBuf_n = errAdderOut
        end if
    end if
else if errInput.valid then                                  ▷ Rule 5
    errAddIn1 = errInput
    errAddIn2 = 0
    if numActive(errAdderOut.set) = 1 then
        result = errAdderOut
    else
        if ∃n : errBuf_n.valid = 0 then
            errBuf_n = errAdderOut
        else
            ERROR
        end if
    end if
else                                                         ▷ Rule 6
    errAddIn1 = errAdderOut
    errAddIn2 = 0
end if
if errCount1 + errCount2 + errCount3 + errCount4 = 1 and vrc.set.done = 1 then
    errOut.valid = 1
    errOut.set = errAdderOut.set
    errOut = errAdderOut
end if
Also, the sum of the main set is stored in a dual-ported memory. Once the error terms belonging to a set have been reduced, the final sum is calculated by adding the error term from the adder to the main set sum from the memory. This requires another floating-point adder.
In AECSA, the overall behavior of VRC does not change. Rules are applied in the original order of priority even when an error term is not available from any of the sources.