
6.3 Semi-Static Scheduling

6.3.1 Selective Processing

The tasks of MIMO detection and turbo decoding are to update the extrinsic LLRs that are exchanged between them over iterations, see Fig. 5.3. Given the input a-priori LLRs and the previously calculated extrinsic LLRs, some re-calculations are avoidable. By bypassing the corresponding computations, the overall computational energy consumed by the MIMO detector and the turbo decoder can be reduced. Such savings should, of course, not come at the expense of the decoding performance. In this part, we introduce a simple way to inform the MIMO detector and the turbo decoder to selectively update the extrinsic LLRs.

Informed Message Update (IMU) at the MIMO Detection Unit

The task of MIMO detection is to update the extrinsic LLRs of the code bits according to

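\[
\begin{aligned}
\lambda^{[c]}_{\alpha,i}
&= \ln \frac{\sum_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i = 1} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p(c_{i'}; \lambda_{\beta,i'})}
            {\sum_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i = 0} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p(c_{i'}; \lambda_{\beta,i'})}
   - \lambda_{\beta,i} \\
&\overset{(a)}{\approx} \ln \frac{\max_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i = 1} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p(c_{i'}; \lambda_{\beta,i'})}
            {\max_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i = 0} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p(c_{i'}; \lambda_{\beta,i'})}
   - \lambda_{\beta,i}
\end{aligned}
\tag{6.11}
\]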

where the max-log approximation at (a) is commonly adopted due to complexity constraints. Let us denote the extrinsic LLRs generated at iteration $l$ as $\{\lambda^{[c],[l]}_{\alpha,i}\}$. Instead of calculating $\lambda^{[c],[l]}_{\alpha,i}$ according to (6.11), we can also reuse its old value

\[
\lambda^{[c],[l]}_{\alpha,i} = \lambda^{[c],[l-1]}_{\alpha,i},
\tag{6.12}
\]

which is clearly much simpler than (6.11). To identify which part of $\{\lambda^{[c],[l]}_{\alpha,i}\}$ does not need re-calculation, the selection rule is given as follows:

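• If $|\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}| > \eta^{[l]}$ and $\mathrm{sgn}(\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}) = \mathrm{sgn}(\lambda^{[c],[l-1]}_{\alpha,i})$, the extrinsic LLR $\lambda^{[c],[l]}_{\alpha,i}$ is generated according to (6.12);

• Otherwise, $\lambda^{[c],[l]}_{\alpha,i}$ is calculated according to (6.11).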

In the above, $\eta^{[l]}$ is a pre-defined threshold, which can vary over iterations. Some remarks on the selection rule are in order. Note that the a-priori LLR $\lambda^{[l]}_{\beta,i}$ input to the MIMO detection unit at iteration $l$ is also the extrinsic LLR generated at the turbo decoder, meaning that the sum $\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}$ is effectively the up-to-date a-posteriori LLR of the code bit $c_i$ generated at the turbo decoding unit. In other words, the sign and magnitude of $\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}$ reflect the up-to-date decoding decision on $c_i$ and the reliability of that decision, respectively. When the magnitude is larger than the threshold $\eta^{[l]}$, we consider the sign information reliable. If the equality $\mathrm{sgn}(\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}) = \mathrm{sgn}(\lambda^{[c],[l-1]}_{\alpha,i})$ holds as well, we infer that $\lambda^{[c],[l-1]}_{\alpha,i}$ is not yet outdated and can therefore be reused.
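The selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sgn`, `can_reuse` and `informed_update` are hypothetical, a single threshold `eta` is used for the current iteration, and `recompute` is a placeholder callback standing in for the full calculation in (6.11).

```python
def sgn(x):
    """Sign function returning -1, 0 or +1."""
    return (x > 0) - (x < 0)


def can_reuse(lam_alpha_prev, lam_beta, eta):
    """Selection rule: True if the previous extrinsic LLR may be reused."""
    a_posteriori = lam_alpha_prev + lam_beta               # up-to-date a-posteriori LLR
    reliable = abs(a_posteriori) > eta                     # magnitude check
    consistent = sgn(a_posteriori) == sgn(lam_alpha_prev)  # sign check
    return reliable and consistent


def informed_update(lam_alpha_prev, lam_beta, eta, recompute):
    """Return the new extrinsic LLRs and the indices that were re-calculated.

    `recompute(i)` is a hypothetical callback standing in for (6.11).
    """
    new_llrs, recalculated = [], []
    for i, (a, b) in enumerate(zip(lam_alpha_prev, lam_beta)):
        if can_reuse(a, b, eta):
            new_llrs.append(a)               # reuse the old value
        else:
            new_llrs.append(recompute(i))    # full re-calculation
            recalculated.append(i)
    return new_llrs, recalculated
```

The list of re-calculated indices is what a detector would need in order to restrict its search to the bits that actually require an update.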

After introducing the selection rule, let us proceed to realize such selective processing at the MIMO detection unit. In the standard SD algorithm, the extrinsic LLRs for the whole bit vector are computed via a single tree search. Therefore, some modifications of the SD algorithm are necessary to enable selective LLR updates per bit vector. According to the short introduction of the SD algorithm in Section 5.3.2, the computation of the LLRs relies on finding $x_{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metrics associated with each bit in the bit vector. During the tree traversal, they are recursively updated. To limit the tree search space, whenever we reach a node, a step-down to the subtree expanded from that node is made only if visiting the leaf nodes of the subtree may result in an update of any of $x_{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metrics. If no update is possible, the subtree is pruned. Now, assume we are only interested in the extrinsic LLR associated with a specific bit. Then, we only need $x_{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metric associated with that bit. In short, only a subset of the counter-hypothesis metrics is of interest when LLR re-calculation is not needed for all bits in one bit vector. Given this, we can reduce the search space of SD by simply tightening the tree pruning criterion such that a step-down to a subtree is made only if it can update any of $x_{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metrics that are of interest. This change is simple, as it has no impact on the depth-first tree traversal, the enumeration and other complexity reduction techniques, e.g., LLR clipping. With the faster tree pruning, the number of VNs, i.e., $N_{\mathrm{vn}}$, decreases, and so does the computational energy consumed by SD.
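The tightened pruning criterion can be illustrated with a toy depth-first search. The sketch below is not the SD implementation referred to in Section 5.3.2. It assumes a real-valued model with symbols in {-1, +1} (one bit per symbol, +1 mapped to bit 1), an upper-triangular matrix `R` and a rotated receive vector `y` as obtained after a QR decomposition, no a-priori term and no LLR clipping. The name `sphere_llr` and its arguments are hypothetical.

```python
def sphere_llr(R, y, bits_of_interest):
    """Depth-first max-log search over x in {-1,+1}^n for the metric ||y - R x||^2.

    Only the counter-hypothesis metrics of `bits_of_interest` are tracked.
    Returns ({bit: LLR}, number of visited nodes).
    """
    n = len(y)
    inf = float("inf")
    best = {"d": inf, "x": None}                  # best metric and best leaf so far
    counter = {j: inf for j in bits_of_interest}  # counter-hypothesis metrics
    visited = [0]

    def search(level, x, dist):
        children = []
        for s in (-1, 1):
            x[level] = s
            r = y[level] - sum(R[level][k] * x[k] for k in range(level, n))
            children.append((dist + r * r, s))
        for d, s in sorted(children):             # closest child first
            # Tightened pruning: step down only if the best metric or a
            # counter-hypothesis metric of interest can still be improved.
            if d >= max([best["d"], *counter.values()]):
                continue
            visited[0] += 1
            x[level] = s
            if level > 0:
                search(level - 1, x, d)
            elif d < best["d"]:                   # new best leaf
                for j in counter:
                    if best["x"] is not None and x[j] != best["x"][j]:
                        counter[j] = best["d"]    # old best becomes counter-hypothesis
                best["d"], best["x"] = d, list(x)
            else:                                 # leaf can only improve counter-hypotheses
                for j in counter:
                    if x[j] != best["x"][j]:
                        counter[j] = min(counter[j], d)

    search(n - 1, [0] * n, 0.0)
    llr = {j: best["x"][j] * (counter[j] - best["d"]) for j in counter}
    return llr, visited[0]
```

Calling `sphere_llr(R, y, range(n))` tracks all bits, while passing a subset such as `[1]` lowers the pruning threshold and can therefore only shrink the set of visited nodes for the same traversal order.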

IMU at the Turbo Decoding Unit

Analogously, the above-described selection rule adopted in the MIMO detection unit can be straightforwardly extended to inform the turbo decoder to selectively update $\{\lambda^{[m]}_{\alpha_1,i}, \lambda^{[m]}_{\alpha_2,i}, \lambda_{\beta,i}\}$. In the following, we focus on how to save the computational energy.

At the turbo decoding unit, the extrinsic LLRs $\{\lambda^{[m]}_{\alpha_1,i}, \lambda^{[m]}_{\alpha_2,i}, \lambda_{\beta,i}\}$ are each generated by following the BCJR algorithm. In the BCJR algorithm, the trellis diagram of the CC needs to be traversed twice, cf. Section 2.1. In the first pass, the beta metrics $\{\beta_{FB,i}(S_{i+1})\}$ for $i = N_m, N_m - 1, \ldots, 1$ are recursively calculated and saved for the next step. In the second pass through the trellis, the metrics $\{\beta_{FB,i}(S_{i+1})\}$ are sequentially read from the memory and used for computing the LLRs. Suppose the re-calculations of the LLRs associated with stage $i'$ of the trellis diagram are skipped. Then, the metric $\beta_{FB,i'}(S_{i'+1})$ is only needed for computing the metric $\beta_{FB,i'-1}(S_{i'})$ of the previous stage. This implies that $\beta_{FB,i'}(S_{i'+1})$ only needs to be calculated, used and then immediately overwritten. By avoiding the writing/reading operations for $\beta_{FB,i'}(S_{i'+1})$, the IMU at the turbo decoding unit reduces the number of writing/reading operations for $\{\beta_{FB,i}(S_{i+1})\}$, i.e., $N_{\mathrm{wr}}$. As the writing/reading operations for $\{\beta_{FB,i}(S_{i+1})\}$ consume more computational energy than the other arithmetic operations involved in the BCJR algorithm [97], employing the IMU at the turbo decoding unit can be an efficient way of saving computational energy.
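The memory saving can be made concrete with a small backward recursion. The sketch below is an assumption-laden illustration rather than the decoder of this work: it uses the max-log form of the recursion, zero-based stage indices, equally likely end states, and a hypothetical function `backward_pass` with a dense branch-metric array `gamma`.

```python
def backward_pass(gamma, update):
    """Max-log backward recursion that stores only the beta metrics needed later.

    gamma[i][s][t]: branch metric of stage i from state s to state t
                    (use float("-inf") where no branch exists).
    update[i]:      True if the LLR of stage i has to be re-calculated.
    Returns (stored, n_wr): the stored beta vectors keyed by stage and the
    number of beta values written to (and later read from) memory.
    """
    n_states = len(gamma[0])
    beta = [0.0] * n_states            # assumed: all end states equally likely
    stored = {}
    for i in reversed(range(len(gamma))):
        if update[i]:
            stored[i] = beta           # kept for the second trellis pass
        # Beta metrics of the previous stage; the current vector is simply
        # dropped afterwards if it was not stored.
        beta = [max(gamma[i][s][t] + beta[t] for t in range(n_states))
                for s in range(n_states)]
    return stored, len(stored) * n_states
```

Every stage whose LLR update is skipped saves one vector of beta metrics from being written and read back, while the recursion itself still runs over all stages.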

Complexity Analysis

To support selective processing, some additional hardware costs are incurred. First, to identify the extrinsic LLRs to be re-calculated, the above-described selection rule requires the following additional computational effort. Each magnitude check for an a-posteriori LLR requires one real-valued addition and one real-valued comparison, while the sign check only requires one logic comparison. In general, these operations are much simpler than those required for VNs and for writing/reading the beta metrics.

Another hardware requirement concerns the memory used for keeping the extrinsic LLRs. Take the extrinsic LLRs that are exchanged within the turbo decoding unit, i.e., $\{\lambda^{[m]}_{\alpha_1,i}\}$ and $\{\lambda^{[m]}_{\alpha_2,i}\}$, as an example. Within a sequential processing architecture, they can share the same memory. Namely, the extrinsic LLRs $\{\lambda^{[m]}_{\alpha_1,i}\}$ newly calculated by one convolutional decoder can overwrite $\{\lambda^{[m]}_{\alpha_2,i}\}$ that were generated by the other one at the previous step. However, to support selective processing, $\{\lambda^{[m]}_{\alpha_2,i}\}$ cannot be overwritten, since some of them will be reused. This implies that $\{\lambda^{[m]}_{\alpha_1,i}\}$ and $\{\lambda^{[m]}_{\alpha_2,i}\}$ have to be stored separately. Separate memory for them is also needed in a parallel processing architecture. Compared to a sequential processing architecture, a parallel one has the advantage of low latency and has attracted considerable research interest, e.g., [139].