6.3 Semi-Static Scheduling
6.3.1 Selective Processing
The tasks of MIMO detection and turbo decoding are to update the extrinsic LLRs, which are exchanged between them over iterations, see Fig. 5.3. Given the input a-priori LLRs and the previously calculated extrinsic LLRs, some of these re-calculations can be avoided. By bypassing the corresponding computations, the overall computational energy consumed by the MIMO detector and the turbo decoder can be reduced. Such energy savings should, of course, not come at the expense of the decoding performance. In this part, we introduce a simple way to inform the MIMO detector and the turbo decoder so that they update the extrinsic LLRs selectively.
Informed Message Update (IMU) at the MIMO Detection Unit
The task of MIMO detection is to update the extrinsic LLRs of the code bits according to
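\[
\begin{aligned}
\lambda^{[c]}_{\alpha,i} &= \ln \frac{\sum_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i=1} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p\big(c_{i'}; \lambda_{\beta,i'}\big)}{\sum_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i=0} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p\big(c_{i'}; \lambda_{\beta,i'}\big)} - \lambda_{\beta,i} \\
&\overset{(a)}{\approx} \ln \frac{\max_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i=1} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p\big(c_{i'}; \lambda_{\beta,i'}\big)}{\max_{[c]_{\mathcal{I}_{s,k(i)}}:\, c_i=0} p\big(y_{k(i)} \mid H_{k(i)}, [c]_{\mathcal{I}_{s,k(i)}}\big) \prod_{i' \in \mathcal{I}_{s,k(i)}} p\big(c_{i'}; \lambda_{\beta,i'}\big)} - \lambda_{\beta,i}
\end{aligned}
\tag{6.11}
\]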
where the max-log approximation at (a) is commonly adopted due to complexity constraints. Let us denote the extrinsic LLRs generated at iteration $l$ as $\{\lambda^{[c],[l]}_{\alpha,i}\}$. Instead of calculating $\lambda^{[c],[l]}_{\alpha,i}$ according to (6.11), we can also reuse its old value, i.e.,
\[
\lambda^{[c],[l]}_{\alpha,i} = \lambda^{[c],[l-1]}_{\alpha,i},
\tag{6.12}
\]
which is clearly much simpler than (6.11). To identify which part of $\{\lambda^{[c],[l]}_{\alpha,i}\}$ does not need to be re-calculated, the selection rule is given as follows:
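• If $|\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}| > \eta^{[l]}$ and $\operatorname{sgn}(\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}) = \operatorname{sgn}(\lambda^{[c],[l-1]}_{\alpha,i})$, the extrinsic LLR $\lambda^{[c],[l]}_{\alpha,i}$ is generated according to (6.12);
• Otherwise, $\lambda^{[c],[l]}_{\alpha,i}$ is calculated according to (6.11).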
In the above, $\eta^{[l]}$ is a pre-defined threshold, which can vary over iterations. Some remarks on the selection rule are in order. The a-priori LLR $\lambda^{[l]}_{\beta,i}$ input to the MIMO detection unit at iteration $l$ is also the extrinsic LLR generated at the turbo decoder. The sum $\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}$ is therefore effectively the up-to-date a-posteriori LLR of the code bit $c_i$ generated at the turbo decoding unit. In other words, the sign and the magnitude of this sum reflect the up-to-date decoding decision on $c_i$ and the reliability of that decision, respectively. When the magnitude is larger than the threshold $\eta^{[l]}$, we consider the sign information reliable. If, in addition, the equality $\operatorname{sgn}(\lambda^{[c],[l-1]}_{\alpha,i} + \lambda^{[l]}_{\beta,i}) = \operatorname{sgn}(\lambda^{[c],[l-1]}_{\alpha,i})$ holds, we infer that $\lambda^{[c],[l-1]}_{\alpha,i}$ is not yet outdated and can therefore be reused.
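The selection rule can be illustrated with a minimal Python sketch. The function name `reuse_mask` and the array-based interface are assumptions made for illustration only; they do not come from the text. The function returns, per code bit, whether the old extrinsic LLR may be reused as in (6.12) or must be re-calculated via (6.11).

```python
import numpy as np

def reuse_mask(lam_alpha_prev, lam_beta, eta):
    """Return a boolean array: True where the old extrinsic LLR is reused (6.12),
    False where it has to be re-calculated via (6.11).

    lam_alpha_prev : extrinsic LLRs of the detector from iteration l-1
    lam_beta       : a-priori LLRs fed back by the turbo decoder at iteration l
    eta            : threshold eta^[l] (scalar, may change per iteration)
    """
    lam_alpha_prev = np.asarray(lam_alpha_prev, dtype=float)
    lam_beta = np.asarray(lam_beta, dtype=float)
    app = lam_alpha_prev + lam_beta            # up-to-date a-posteriori LLR
    reliable = np.abs(app) > eta               # magnitude check
    same_sign = np.sign(app) == np.sign(lam_alpha_prev)  # sign check
    return reliable & same_sign

# Hypothetical values for four code bits
mask = reuse_mask([2.0, -1.0, 0.5, 3.0], [4.0, 5.0, 0.2, -1.0], eta=3.0)
```

With these example values, only the first bit passes both checks. The second fails the sign check, and the last two fail the magnitude check.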
Having introduced the selection rule, let us proceed to realize such selective processing at the MIMO detection unit. In the standard SD algorithm, the extrinsic LLRs for the whole bit vector are computed via a single tree search. Therefore, some modifications of the SD algorithm are necessary to enable selective LLR updates per bit vector. According to the short introduction of the SD algorithm in Section 5.3.2, the computation of the LLRs relies on finding $x^{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metrics associated with each bit in the bit vector. During the tree traversal, they are recursively updated. To limit the tree search space, whenever we reach a node, a step-down to the subtree expanded from that node is made only if visiting the leaf nodes belonging to the subtree may result in an update of any of $x^{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metrics. If no update is possible, the subtree is pruned. Now, assume we are only interested in the extrinsic LLR associated with a specific bit. Then, we only need $x^{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metric associated with that bit. In short, only a subset of the counter-hypothesis metrics is of interest when LLR re-calculation is not needed for all bits in one bit vector. Given this observation, we can reduce the search space of SD by simply tightening the tree pruning criterion such that a step-down to a subtree is made only if it can update any of $x^{\min}$, $d^{\min}_{\mathrm{det}}$ and the counter-hypothesis metrics that are of interest. This change is simple, as it has no impact on the depth-first tree traversal, the enumeration and other techniques for complexity reduction, e.g., LLR clipping. With the tightened pruning criterion, the number of VNs, i.e., $N_{\mathrm{vn}}$, is reduced, and so is the computational energy consumed by SD.
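The tightened pruning test described above might be sketched as follows. This is an illustration under stated assumptions, not the rule of Section 5.3.2. It assumes that the partial metric of a node never decreases along a path, so that it lower-bounds every leaf metric in the subtree. It also assumes that a counter-hypothesis metric for bit j can only be updated by a leaf whose bit j differs from the current best hypothesis. All names (`may_update`, `partial_bits`, `bits_of_interest`) are hypothetical.

```python
def may_update(partial_metric, partial_bits, x_min, d_min, counter, bits_of_interest):
    """Return True if stepping down below this node can still update d_min/x_min
    or a counter-hypothesis metric of a bit we care about; False means prune.

    partial_metric   : accumulated metric of the node (assumed non-decreasing)
    partial_bits     : dict {bit index: value} for bits already fixed on the path
    x_min, d_min     : bits and metric of the best leaf found so far
    counter          : counter-hypothesis metric per bit
    bits_of_interest : indices of bits whose LLRs must be re-calculated
    """
    # A better leaf than the current best may still exist in the subtree.
    if partial_metric < d_min:
        return True
    for j in bits_of_interest:
        undecided = j not in partial_bits
        # Only leaves with bit j different from x_min[j] can improve counter[j].
        if (undecided or partial_bits[j] != x_min[j]) and partial_metric < counter[j]:
            return True
    return False

# Hypothetical search state for a 4-bit vector
x_min, d_min = [0, 1, 1, 0], 1.0
counter = [2.0, 5.0, 3.0, 4.0]
node_bits = {0: 0, 1: 0}   # bits 0 and 1 fixed, bits 2 and 3 still open
all_bits = may_update(2.5, node_bits, x_min, d_min, counter, [0, 1, 2, 3])
only_bit0 = may_update(2.5, node_bits, x_min, d_min, counter, [0])
```

In this example the node is expanded when all four LLRs are needed, but it is pruned when only bit 0 is of interest. This is how restricting the set of bits reduces the number of visited nodes.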
IMU at the Turbo Decoding Unit
Analogously, the selection rule adopted in the MIMO detection unit can be straightforwardly extended to inform the turbo decoder to selectively update $\{\lambda^{[m]}_{\alpha_1,i}, \lambda^{[m]}_{\alpha_2,i}, \lambda_{\beta,i}\}$. In the following, we focus on how to save the computational energy.
At the turbo decoding unit, the extrinsic LLRs $\{\lambda^{[m]}_{\alpha_1,i}, \lambda^{[m]}_{\alpha_2,i}, \lambda_{\beta,i}\}$ are generated by following the BCJR algorithm. In the BCJR algorithm, the trellis diagram of the CC needs to be traversed twice, cf. Section 2.1. In the first pass, the beta metrics $\{\beta_{\mathrm{FB},i}(S_{i+1})\}$ for $i = N_m, N_m - 1, \ldots, 1$ are recursively calculated and saved for the next step. In the second pass through the trellis, the metrics $\{\beta_{\mathrm{FB},i}(S_{i+1})\}$ are sequentially read from the memory and used for computing the LLRs. Suppose the re-calculations of the LLRs associated with stage $i'$ of the trellis diagram are skipped. Then, the metric $\beta_{\mathrm{FB},i'}(S_{i'+1})$ is only needed for computing the metric $\beta_{\mathrm{FB},i'-1}(S_{i'})$ of the previous stage. This implies that $\beta_{\mathrm{FB},i'}(S_{i'+1})$ only needs to be calculated, used and then immediately overwritten. By avoiding the write/read operation for $\beta_{\mathrm{FB},i'}(S_{i'+1})$, the IMU at the turbo decoding unit reduces the number of write/read operations for $\{\beta_{\mathrm{FB},i}(S_{i+1})\}$, i.e., $N_{\mathrm{wr}}$. As the write/read operations for $\{\beta_{\mathrm{FB},i}(S_{i+1})\}$ consume more computational energy than the arithmetic operations involved in the BCJR algorithm [97], employing the IMU at the turbo decoding unit can be an efficient way of saving computational energy.
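The saving in write/read operations can be made concrete with a small sketch of the backward pass. It assumes a max-log form of the recursion, a generic branch-metric array `gamma`, and an all-zero initialization of the final beta metrics. None of these details are specified in the text, and the function name `backward_pass` is hypothetical. The point is only that beta metrics of skipped stages are passed on to the previous stage without being stored.

```python
import numpy as np

def backward_pass(gamma, needs_llr):
    """Backward (beta) recursion that stores beta_i only for stages whose LLR
    is re-calculated.

    gamma     : array (N, S, S); gamma[i-1][s, s2] is the log-domain branch
                metric of stage i from state s to state s2
    needs_llr : sequence of N booleans, False for stages skipped by the IMU
    Returns (stored, n_writes) with stored = {stage i: beta_i over S_{i+1}}.
    """
    n_stages, n_states, _ = gamma.shape
    beta = np.zeros(n_states)          # assumed initialization at the last stage
    stored = {}
    for i in range(n_stages, 0, -1):   # i = N, N-1, ..., 1
        if needs_llr[i - 1]:
            stored[i] = beta.copy()    # written now, read back in the second pass
        # beta_{i-1}(s) = max over s2 of gamma_i(s, s2) + beta_i(s2)
        beta = np.max(gamma[i - 1] + beta[None, :], axis=1)
    return stored, len(stored)

rng = np.random.default_rng(0)
gamma = rng.normal(size=(5, 4, 4))     # hypothetical 5-stage, 4-state trellis
full, n_full = backward_pass(gamma, [True] * 5)
sel, n_sel = backward_pass(gamma, [True, False, True, False, False])
```

Here the selective run stores beta metrics for two stages instead of five, and the stored values are identical to those of the full run because the recursion itself is unchanged.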
Complexity Analysis
Supporting selective processing incurs some additional hardware costs. First, identifying the extrinsic LLRs to be re-calculated with the above-described selection rule requires additional computational effort. Each magnitude check for an a-posteriori LLR requires one real-valued addition and one real-valued comparison, while the sign check only requires one logic comparison. In general, these operations are much simpler than those required for VNs and for writing/reading beta metrics.
Another hardware requirement is related to the memory used for keeping the extrinsic LLRs. Take the extrinsic LLRs that are exchanged within the turbo decoding unit, i.e., $\{\lambda^{[m]}_{\alpha_1,i}\}$ and $\{\lambda^{[m]}_{\alpha_2,i}\}$, as an example. Within a sequential processing architecture, they can share the same memory. Namely, the extrinsic LLRs $\{\lambda^{[m]}_{\alpha_1,i}\}$ newly calculated by one convolutional decoder can overwrite $\{\lambda^{[m]}_{\alpha_2,i}\}$, which were generated by the other one at the previous step. However, to support selective processing, $\{\lambda^{[m]}_{\alpha_2,i}\}$ cannot be overwritten, since some of them will be reused. This implies that $\{\lambda^{[m]}_{\alpha_1,i}\}$ and $\{\lambda^{[m]}_{\alpha_2,i}\}$ have to be stored separately. Separate memory for them is also needed in a parallel processing architecture. Compared to a sequential processing architecture, a parallel one has the advantage of low latency and has attracted considerable research interest, e.g., [139].
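The memory requirement can be illustrated with a short sketch in which each convolutional decoder keeps its own buffer and writes only the re-calculated entries. The function name `selective_write` and the example values are hypothetical. The sketch only shows why the old values must survive between iterations.

```python
import numpy as np

def selective_write(own_buffer, recomputed, recompute_mask):
    """Write re-calculated extrinsic LLRs in place. Entries with a False mask
    keep the value this decoder produced in the previous iteration."""
    own_buffer[recompute_mask] = recomputed[recompute_mask]
    return own_buffer

# Hypothetical buffer of the second convolutional decoder
lam_alpha2 = np.array([1.0, -2.0, 3.0])       # values from the previous iteration
new_values = np.array([9.0, 9.0, 9.0])        # only meaningful where mask is True
mask = np.array([False, True, False])         # True = re-calculate
lam_alpha2 = selective_write(lam_alpha2, new_values, mask)
```

If the other decoder had overwritten this buffer in the meantime, as in the shared-memory scheme, the entries kept at the skipped positions would no longer be this decoder's own extrinsic LLRs.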