Optimized Architecture - Soft-Output Delayed Decision Feedback Sequence Estimation

2.4 Soft-Output Delayed Decision Feedback Sequence Estimation

2.4.4 Optimized Architecture

The critical path can be significantly reduced by sacrificing the minimum-number-of-operations property. The order of operations is changed in such a way that as few operations as possible are part of the first-order feedback loops. An overview of the optimized architecture with the operation reordering is shown in Fig. 2.32. The reordering comes at the cost of an increase in the total number of operations required.

A first delay optimization can be achieved by exploiting the fact that the calculation of the history feedback requires only hard decisions and not soft information. Hence, the complexity can be reduced, as the hard-decision register exchange has significantly lower complexity than the soft-decision version.

Figure 2.31 – Detailed overview of the DDFSE baseline architecture with the critical path marked in red.

The operations of (2.47) can now be reordered according to their data dependencies. For example, the reference signal e_b¯ only depends on the CIR h and is time-invariant during the processing of a frame. It can be subtracted without any feedback dependency and can therefore be moved to the beginning of the arithmetic operations. Another reordering can be performed for the common history feedback (CHF). In the common history, which is essentially a shift register, only the first register depends on the decision taken in the ACS. Hence, the feedback from all but the first register can be retimed and thereby removed from the critical path. Consequently, the CHF is split into two parts: a primary common history feedback, which consists only of the feedback of the first decision, and a secondary common history feedback for all remaining decisions. The secondary CHF can now be retimed and applied before all other operations.
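The split of the CHF can be illustrated with a minimal behavioural sketch. It assumes that the common history feedback is a linear combination of the stored decisions weighted by the corresponding CIR taps, and it uses real-valued symbols for brevity. The names `h_chf`, `history_prev`, and `newest_decision` are hypothetical and do not appear in the original design.

```python
def chf_full(h_chf, history):
    """Baseline ordering: the whole CHF is computed after the ACS decision."""
    return sum(h * d for h, d in zip(h_chf, history))


def chf_secondary(h_chf, history_prev):
    """Secondary CHF (assumed form): uses only decisions that were already in
    the shift register one cycle earlier, so it can be computed ahead of time.
    history_prev[i] moves to register position i + 1 in the next cycle."""
    return sum(h * d for h, d in zip(h_chf[1:], history_prev))


def chf_primary(h_chf, newest_decision):
    """Primary CHF: the only term that depends on the current ACS decision."""
    return h_chf[0] * newest_decision


if __name__ == "__main__":
    h_chf = [0.5, 0.25, -0.125]          # hypothetical CIR taps
    history_prev = [1, -1, 1]            # shift register content at time k - 1
    newest = -1                          # decision taken by the ACS at time k
    history_now = [newest] + history_prev[:-1]
    split = chf_primary(h_chf, newest) + chf_secondary(h_chf, history_prev)
    print(split, chf_full(h_chf, history_now))
```

In this sketch only `chf_primary` remains behind the ACS decision, while `chf_secondary` can be evaluated one cycle earlier, which mirrors the retiming described above.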

Speculative ISH Feedback Calculation

In a classical register exchange the decisions of the ACS unit are not only used to determine the transitions between states, but also become the most recent value of the respective state. As a consequence, all possible history vectors for a state at time k can be predicted using the histories of its source states at time k − 1. Interestingly, the decision value, and therefore the complete predicted history vector, only depends on the source state. As a result, there is only one speculative feedback value per state, which is identical for all its destination states. This knowledge can be used to calculate a speculative feedback value for each state in preparation for the case in which this state is chosen as source state. This makes it possible to break the critical path of the baseline architecture. Due to the speculation, the arithmetic operations of (2.47) do not depend on the decision of the previous cycle. Instead, potential branch metric values are calculated for all possible feedback values from a specific state before the decision is taken.

The BM unit is split into two parts: a preBM unit, which speculatively calculates all possible branch metric values, and a postBM unit, which selects the applicable value and calculates the norm. By inserting a pipeline stage between the preBM and postBM units, the calculation of the norm and the operations in the ACS unit are removed from the critical path. In practice, this halves the complexity of the critical path and allows a significant increase of the clock speed.
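The division of work between the preBM and postBM units can be sketched as follows. Since (2.47) is not reproduced here, the sketch assumes a squared Euclidean branch metric of the form |r − e − f|², where r is the received sample, e the reference value, and f the feedback. The function names and the indexing of candidates by `decision_index` are hypothetical.

```python
def pre_bm(r, ref, fb_candidates):
    """Speculative part: form the error term for every possible feedback
    value before the ACS decision of the previous cycle is known."""
    return [r - ref - fb for fb in fb_candidates]


def post_bm(error_candidates, decision_index):
    """Selection part: pick the candidate that matches the actual decision
    and compute the squared norm (assumed metric form)."""
    return abs(error_candidates[decision_index]) ** 2


def baseline_bm(r, ref, fb_candidates, decision_index):
    """Reference: non-speculative computation with the decision known first."""
    return abs(r - ref - fb_candidates[decision_index]) ** 2


if __name__ == "__main__":
    r, ref = 1 + 2j, 0.5 - 0.5j
    fb_candidates = [0.25 + 0j, -0.25 + 0j]
    speculative = pre_bm(r, ref, fb_candidates)   # can be registered (pipeline stage)
    for idx in range(len(fb_candidates)):
        print(post_bm(speculative, idx), baseline_bm(r, ref, fb_candidates, idx))
```

Both orderings give the same metric. The difference is that the subtraction in `pre_bm` no longer waits for the decision, so only the selection and the norm remain after the pipeline register.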

Figure 2.32 – Detailed overview of the optimized DDFSE architecture with a reduced critical path.

Figure 2.33 – LUT architecture with serial write and parallel read scheme.

Pre-calculated Look-Up Tables

The calculation of the feedback and the reference values e_b¯ is performed using look-up tables (LUTs). All possible output values are pre-calculated at the beginning of a frame and stored in a memory. The decision vector input of the feedback unit is used as address for the feedback value storage.

The register-based LUT is implemented in a serial-in, parallel-out fashion. The possible feedback words are calculated serially by convolving the relevant part of the CIR with the output of a counter. Afterwards, the feedback value is written to the last entry of the LUT. During the write phase the registers are configured as a shift register and the values are clocked into their respective positions. After all values have been clocked into the LUT, the clock of the registers can be gated during the actual detection (between the first and the last data symbol of the frame) to reduce the power consumption.
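A behavioural model of the LUT pre-calculation might look as follows. It is only a sketch under stated assumptions: the counter value is decoded least-significant digit first into a decision vector, the symbol alphabet `symbols` is a hypothetical real-valued mapping, and a Python list stands in for the shift register, so the physical write order is not modelled.

```python
def build_feedback_lut(h_part, symbols=(1, -1)):
    """Pre-calculate one feedback word per counter value.
    h_part: relevant part of the CIR; symbols: assumed symbol alphabet."""
    n, m = len(h_part), len(symbols)
    lut = []
    for count in range(m ** n):
        # Decode the counter value into one symbol index per tap (LSD first).
        digits, c = [], count
        for _ in range(n):
            digits.append(c % m)
            c //= m
        # Feedback word: CIR taps weighted with the addressed symbols.
        lut.append(sum(h * symbols[d] for h, d in zip(h_part, digits)))
    return lut


def lookup(lut, decision_digits, m=2):
    """Use the decision vector as address (same digit order as the counter)."""
    address = sum(d * m ** i for i, d in enumerate(decision_digits))
    return lut[address]


if __name__ == "__main__":
    lut = build_feedback_lut([0.5, 0.25])
    print(lut)
    print(lookup(lut, [1, 0]))
```

During detection only `lookup` is exercised, which corresponds to the phase in which the LUT registers hold their values and their clock can be gated.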

The parallel output of the LUT registers is connected to an address decoder multiplexer. The address decoder selects the correct LUT value based on the input decision vector. For large feedback word widths, routing of the address decoder can be problematic during the backend design due to congestion. To address this problem, the LUT can be split into multiple slices, each of which stores only some of the bits of the final feedback value. The number of bits per slice is variable to match the requirements of the physical implementation.
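The slicing idea can be sketched with unsigned fixed-point words. The representation of the feedback words as non-negative integers, the slice order (least-significant slice first), and the names `split_into_slices` and `read_sliced` are assumptions made for illustration only.

```python
def split_into_slices(lut_words, slice_widths):
    """Split each word into bit slices; every slice becomes its own small LUT
    that shares the address input with the others."""
    slices, shift = [], 0
    for width in slice_widths:
        mask = (1 << width) - 1
        slices.append([(w >> shift) & mask for w in lut_words])
        shift += width
    return slices


def read_sliced(slices, slice_widths, address):
    """Read all slices at the same address and concatenate their bits."""
    word, shift = 0, 0
    for table, width in zip(slices, slice_widths):
        word |= table[address] << shift
        shift += width
    return word


if __name__ == "__main__":
    words = [0b101101, 0b000111, 0b111000, 0b010101]   # hypothetical 6-bit entries
    widths = [2, 4]                                    # variable bits per slice
    slices = split_into_slices(words, widths)
    print([read_sliced(slices, widths, a) for a in range(len(words))])
```

Each slice needs its own, narrower multiplexer tree, which is the property that can ease routing congestion in the physical design.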

The same basic LUT design is used for the generation of the ISHF, the CHF, and the reference values e_b¯. The difference between the LUTs is in the number of address inputs and generated feedback output ports. While the primary and secondary CHFs generate only a single feedback value each, the individual state history feedback (ISHF) generates one output per state and the LUT for the reference values has each entry hard-wired as a separate output.

The basic feedback unit allows parts of the address input to be deactivated if the corresponding channel coefficient is zero. A zero channel coefficient implies that the associated decision does not affect the feedback. Nevertheless, without deactivation, any input change still applies a different address vector to the LUT and causes activity in the address decoder logic. As the main power consumption of the LUT-based feedback calculation comes from switching in the logic tree of the address decoder, such a receiver cannot profit from zero channel coefficients and therefore does not exhibit energy-proportional behaviour. Deactivating the corresponding address inputs allows the power consumption of the feedback unit to scale down with the number of zero channel coefficients (energy-proportional operation).
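The effect of deactivating address inputs can be illustrated with a small sketch. It assumes binary decisions with the hypothetical mapping bit 0 → +1 and bit 1 → −1, and it models deactivation as forcing the address bit to a constant 0. The helper names are illustrative only.

```python
def address_mask(h_part):
    """One enable bit per address input: 0 where the channel coefficient is zero."""
    return [0 if h == 0 else 1 for h in h_part]


def gated_address(decision_bits, mask):
    """Masked address bits stay constant, so they cause no decoder switching."""
    return [d & m for d, m in zip(decision_bits, mask)]


def feedback(h_part, bits):
    """Reference feedback value with the assumed mapping bit 0 -> +1, bit 1 -> -1."""
    return sum(h * (1 - 2 * b) for h, b in zip(h_part, bits))


if __name__ == "__main__":
    h_part = [0.5, 0.0, 0.25]            # hypothetical CIR part with one zero tap
    mask = address_mask(h_part)
    a, b = [1, 0, 1], [1, 1, 1]          # decisions differ only at the zero tap
    print(gated_address(a, mask), gated_address(b, mask))
    print(feedback(h_part, b), feedback(h_part, gated_address(b, mask)))
```

The two decision vectors map to the same gated address, so the decoder input does not toggle, while the feedback value is unchanged because the masked tap contributes nothing.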

mSOVA Unit

The mSOVA register exchange (RE) unit calculates the initial LLR values based on a sorted list of PMs and the associated decisions from the ACS. Unlike a traditional register exchange, the mSOVA RE not only keeps track of the decision of the most likely path, but also uses the sorted PMs to evaluate the reliability of each decision. Because the reliability is calculated separately for the bits of each symbol, the mSOVA RE always evaluates the decision reliability relative to the next most likely transition that would have led to a different bit decision. In a QPSK scenario, if the first and the second path lead to the same bit decision, the decision associated with the third path must differ in this bit. Hence, the LLRgen unit shown in Fig. 2.34 always calculates the difference between the smallest PM and the two following PMs. Based on the decisions associated with the two winning PMs, the correct value for the reliability is selected. If the first two path metrics lead to different bit decisions, their difference becomes the reliability; otherwise, the PM difference of the first and the third path becomes the reliability of the LLR. The decision of the winning path metric provides the sign of the LLR. The LLR values inside the mSOVA unit are stored and processed in a sign-magnitude format and only converted into a two's complement representation at the output.
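The selection rule of the LLRgen unit can be written down as a short sketch for a single bit position. The function name `llr_gen`, the use of the bit decision itself as sign bit, and the returned (sign, magnitude) tuple as a stand-in for the sign-magnitude format are assumptions.

```python
def llr_gen(sorted_pms, bit_decisions):
    """Initial LLR for one bit position.
    sorted_pms: the three smallest path metrics in ascending order.
    bit_decisions: the bit decided by each of these three paths."""
    pm0, pm1, pm2 = sorted_pms[:3]
    diff_second = pm1 - pm0      # both differences are always computed
    diff_third = pm2 - pm0
    if bit_decisions[0] != bit_decisions[1]:
        magnitude = diff_second  # second path already disagrees in this bit
    else:
        magnitude = diff_third   # otherwise the third path is used
    sign = bit_decisions[0]      # winning path provides the sign (assumed convention)
    return sign, magnitude


if __name__ == "__main__":
    print(llr_gen([1.0, 1.5, 3.0], [0, 1, 1]))
    print(llr_gen([1.0, 1.5, 3.0], [1, 1, 0]))
```

For QPSK the function would be evaluated once per bit of the symbol, each time with the bit decisions of the three best paths for that bit position.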

After the initial calculation, the LLRs enter a soft-information register exchange. While a hard-decision register exchange swaps only decision histories between states, the soft-decision extension additionally calculates, for each transition, an upper limit of the LLR magnitudes based on the reliability of this decision. The minLLR unit shown in Fig. 2.35 updates the reliability of the LLR. Similar to the calculation of the initial LLR values, the minLLR unit takes the reliability of the decision and the LLR reliability values of the two most likely paths into account. If the bit decisions of the two paths differ, the reliability of the LLR is upper-bounded by the reliability of the decision of the swap. If the two bit decisions do not differ, the LLR is updated using the minimum of the reliability value of the winning path and the sum of the reliability values of the decision and the second path.
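The update rule of the minLLR unit, as described above, can be sketched for a single stored bit. The function and parameter names are hypothetical, and the upper bound in the differing-bit case is modelled as a minimum with the current reliability of the winning path, which is an interpretation of the text rather than a confirmed detail of the design.

```python
def min_llr(rel_decision, rel_win, rel_second, bit_win, bit_second):
    """Update the reliability of one stored bit during the register exchange.
    rel_decision: reliability of the current ACS decision (the swap).
    rel_win, rel_second: stored reliabilities of the two most likely paths.
    bit_win, bit_second: stored bit decisions of these two paths."""
    if bit_win != bit_second:
        # Paths disagree: the decision reliability bounds the LLR magnitude.
        return min(rel_win, rel_decision)
    # Paths agree: compare with the decision reliability plus that of the second path.
    return min(rel_win, rel_decision + rel_second)


if __name__ == "__main__":
    print(min_llr(0.5, 2.0, 1.0, 0, 1))
    print(min_llr(0.5, 2.0, 1.0, 0, 0))
    print(min_llr(0.5, 1.0, 1.0, 0, 0))
```

In both branches the reliability can only stay the same or decrease, which is consistent with the description of the update as an upper limit on the LLR magnitude.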

Figure 2.34 – Initial LLR generation unit for modified soft-output register exchange.

Figure 2.36 – BER performance comparison of single-tap phase rotator with decision feedback, 8-tap MMSE TD-LE and DDFSE receiver in 11ad CB near location scenario.

In document Modulation, Coding, and Receiver Design for Gigabit mmWave Communication (Page 70-77)