Blind Multiuser Detection
2.3 Blind Multiuser Detection: Direct Methods
2.3.3 QR-RLS Algorithm
The RLS approach discussed in the previous subsection, which is based on the matrix inversion lemma for recursively updating $C_r[i]^{-1}$, has $O(N^2)$ complexity per update. Although fast RLS algorithms of $O(N)$ complexity exist [62, 79, 113, 121], all of them exploit the shifting property of the input data. In this application, however, successive input data vectors do not have the shifting relationship; in fact, $r[i]$ and $r[i-1]$ do not overlap at all. Therefore, these standard fast RLS algorithms cannot be applied here.
The RLS implementation of the blind linear MMSE detector suffers from two major problems. The first problem is numerical. Recursive estimation of $C_r[i]^{-1}$ is poorly conditioned because it involves inversion of a data correlation matrix. The condition number of a data correlation matrix is the square of the condition number of the corresponding data matrix; hence twice the dynamic range is required in the numerical computation [155]. The second problem is that the form of the recursive update of $C_r[i]^{-1}$ severely limits the parallelism and pipelining that can effectively be applied in implementation.
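The squaring of the condition number can be checked numerically. The following is a minimal sketch using NumPy; the data matrix `X`, its size, and the scaling of one column are illustrative assumptions and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix: 200 snapshots (rows) of a length-8 received vector.
X = rng.standard_normal((200, 8)) + 1j * rng.standard_normal((200, 8))
X[:, 0] *= 1e3  # widen the dynamic range of one coordinate

corr = X.conj().T @ X            # (unnormalized) data correlation matrix
cond_data = np.linalg.cond(X)    # 2-norm condition number of the data matrix
cond_corr = np.linalg.cond(corr) # 2-norm condition number of the correlation matrix

print(f"cond(X)     = {cond_data:.4e}")
print(f"cond(X)^2   = {cond_data**2:.4e}")
print(f"cond(X^H X) = {cond_corr:.4e}")
```

The last two printed values should agree to within rounding, which is why an algorithm that works on a factor of the correlation matrix needs roughly half the dynamic range (in bits) of one that works on the correlation matrix itself.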
A well-known approach for overcoming these difficulties associated with the RLS algorithms is the rotation-based QR-RLS algorithm [105, 381, 580]. The QR decomposition transforms the original RLS problem into a problem that uses only transformed data values, by Cholesky factorization of the original least-squares data matrix. This causes the numerical dynamic range of the transformed computational problem to be halved, and enables more accurate computation, compared with the RLS algorithms that operate directly on $C_r[i]^{-1}$. Another important benefit of the rotation-based QR approaches is that the computation can be easily mapped onto systolic array structures for parallel implementations. We next describe the QR-RLS blind linear MMSE detector, which was first developed in [381].
QR-RLS Blind Linear MMSE Detector
Assume that $C_r[i]$ is positive definite. Let

$$C_r[i] = C[i]^H C[i] \tag{2.54}$$

be the Cholesky decomposition, i.e., $C[i]$ is the unique upper triangular Cholesky factor with positive diagonal elements. Define the following quantities:

$$u[i] = C[i]^{-H} s_1, \tag{2.55}$$

$$v[i] = C[i]^{-H} r[i], \tag{2.56}$$

$$\text{and} \quad \alpha[i] = s_1^H C_r[i]^{-1} s_1 = u[i]^H u[i]. \tag{2.57}$$
At time $i$, the a posteriori least-squares (LS) estimate is given by

$$z[i] = m_1[i]^H r[i] = \frac{s_1^H C_r[i]^{-1} r[i]}{s_1^H C_r[i]^{-1} s_1} \tag{2.58}$$

$$= \frac{u[i]^H v[i]}{\alpha[i]}. \tag{2.59}$$
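As a numerical sanity check of (2.54)-(2.59), the sketch below computes $u[i]$, $v[i]$, $\alpha[i]$ and $z[i]$ from a Cholesky factor and compares them with the values obtained directly from $C_r[i]^{-1}$. The correlation matrix, signature and received vector are randomly generated stand-ins, assumed here only for illustration. Note that `numpy.linalg.cholesky` returns the lower triangular factor $L$ with $C_r = L L^H$, so the upper factor of (2.54) is $C = L^H$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
s1 = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)   # hypothetical unit-norm signature
snapshots = rng.standard_normal((N, 50)) + 1j * rng.standard_normal((N, 50))
Cr = snapshots @ snapshots.conj().T                  # stand-in for C_r[i], positive definite
r = snapshots[:, -1]                                 # most recent received vector r[i]

# numpy returns the lower factor L with Cr = L L^H, so C = L^H in (2.54).
L = np.linalg.cholesky(Cr)
u = np.linalg.solve(L, s1)      # u = C^{-H} s1 = L^{-1} s1   (2.55)
v = np.linalg.solve(L, r)       # v = C^{-H} r  = L^{-1} r    (2.56)
alpha = np.vdot(u, u).real      # alpha = u^H u               (2.57)
z = np.vdot(u, v) / alpha       # z = u^H v / alpha           (2.59)

# Reference values computed directly from C_r^{-1}, as in (2.57) and (2.58).
Cr_inv_s1 = np.linalg.solve(Cr, s1)
alpha_direct = np.vdot(s1, Cr_inv_s1).real
z_direct = np.vdot(Cr_inv_s1, r) / alpha_direct

print(abs(alpha - alpha_direct), abs(z - z_direct))
```

Both printed differences should be at the level of floating-point rounding.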
The a priori LS estimate at time $i$ is given by

$$\xi[i] = m_1[i-1]^H r[i]. \tag{2.60}$$

It can be shown that $\xi[i]$ and $z[i]$ are related by [381]

$$\xi[i] = \frac{z[i]}{1 - \|v[i]\|^2 + \alpha[i]\,|z[i]|^2}. \tag{2.61}$$

Suppose that $C[i-1]$ and $u[i-1]$ are available from the previous recursion. At time $i$, the new observation $r[i]$ becomes available. We construct a block matrix consisting of $C[i-1]$, $u[i-1]$ and $r[i]$, and apply an orthogonal transformation as follows:
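$$
Q[i]
\begin{bmatrix}
\sqrt{\lambda}\, C[i-1] & u[i-1]/\sqrt{\lambda} & 0 \\
r[i]^H & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
C[i] & u[i] & v[i] \\
0^H & \eta[i] & \gamma[i]
\end{bmatrix}.
\tag{2.62}
$$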
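In (2.62) the matrix $Q[i]$, which zeros the first $N$ elements in the last row of the partitioned matrix appearing on the left-hand side of (2.62), is an orthonormal matrix consisting of $N$ Givens rotations,

$$Q[i] = Q_N[i] \cdots Q_2[i]\, Q_1[i], \tag{2.63}$$

where $Q_n[i]$ zeros the $n$th element in the last row by rotating it with the $n$th row. An individual rotation is specified by two scalars, $c_n$ and $s_n$ (which can be regarded as the cosine and sine, respectively, of a rotation angle $\phi_n$), and affects only the last row and the $n$th row. The effects on these two rows are

$$
\begin{bmatrix} c_n & s_n \\ -s_n^* & c_n \end{bmatrix}
\begin{bmatrix}
0 & \cdots & 0 & y_n & y_{n+1} & \cdots \\
0 & \cdots & 0 & r_n & r_{n+1} & \cdots
\end{bmatrix}
=
\begin{bmatrix}
0 & \cdots & 0 & y_n' & y_{n+1}' & \cdots \\
0 & \cdots & 0 & 0 & r_{n+1}' & \cdots
\end{bmatrix}
\begin{matrix} \leftarrow n\text{th row} \\ \leftarrow \text{last row} \end{matrix}
\tag{2.64}
$$

where primes denote the updated entries, and the rotation factors are defined by

$$c_n = \frac{y_n^*}{\sqrt{|y_n|^2 + |r_n|^2}}, \tag{2.65}$$

$$\text{and} \quad s_n = \frac{r_n^*}{\sqrt{|y_n|^2 + |r_n|^2}}. \tag{2.66}$$

Since $y_n$ is a diagonal element of the triangular factor, it is real and positive, so $c_n$ is real.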
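The rotation sequence (2.63)-(2.66) applied to the block matrix of (2.62) can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; the function name `qr_rls_update`, the test dimensions and the random inputs are assumptions. The sketch assumes that the triangular factor passed in has a real, positive diagonal, as required of the Cholesky factor in (2.54).

```python
import numpy as np

def qr_rls_update(C, u, r, lam):
    """Apply (2.62) via N Givens rotations; return C[i], u[i], v[i], eta[i], gamma[i]."""
    N = C.shape[0]
    # Block matrix on the left-hand side of (2.62), of size (N+1) x (N+2).
    A = np.zeros((N + 1, N + 2), dtype=complex)
    A[:N, :N] = np.sqrt(lam) * C
    A[:N, N] = u / np.sqrt(lam)
    A[N, :N] = r.conj()
    A[N, N + 1] = 1.0
    for n in range(N):
        y, x = A[n, n], A[N, n]            # y_n (diagonal entry) and r_n (last row)
        rho = np.hypot(abs(y), abs(x))
        c, s = np.conj(y) / rho, np.conj(x) / rho   # (2.65), (2.66)
        top, bottom = A[n].copy(), A[N].copy()
        A[n] = c * top + s * bottom                  # first row of (2.64)
        A[N] = -np.conj(s) * top + c * bottom        # second row of (2.64)
        A[n, n], A[N, n] = rho, 0.0                  # exact values, removes rounding residue
    return A[:N, :N].copy(), A[:N, N].copy(), A[:N, N + 1].copy(), A[N, N], A[N, N + 1]

# Illustrative check on random data (all values below are assumptions).
rng = np.random.default_rng(2)
N, lam = 4, 0.995
C = np.triu(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)), 1)
C = C + np.diag(rng.uniform(1.0, 2.0, N))       # upper triangular, positive diagonal
s1 = np.ones(N) / np.sqrt(N)
u = np.linalg.solve(C.conj().T, s1)             # u[i-1] = C[i-1]^{-H} s1
r = rng.standard_normal(N) + 1j * rng.standard_normal(N)

C_new, u_new, v, eta, gamma = qr_rls_update(C, u, r, lam)
Cr_new = lam * C.conj().T @ C + np.outer(r, r.conj())   # exponentially weighted update
print(np.allclose(C_new.conj().T @ C_new, Cr_new))      # C[i]^H C[i] = C_r[i]
print(np.allclose(C_new.conj().T @ u_new, s1))          # u[i] = C[i]^{-H} s1
print(np.allclose(C_new.conj().T @ v, r))               # v[i] = C[i]^{-H} r[i]
```

Each rotation touches only two rows, which is what makes the mapping onto a systolic array possible: row $n$ of the array can process its rotation as soon as the last row has passed through row $n-1$.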
The correctness of (2.62) is shown in the Appendix (Section 2.8.1). It is seen from (2.62) that the computed quantities appearing on the right-hand side are $C[i]$, $u[i]$ and $v[i]$ at time $i$. It is also shown in the Appendix (Section 2.8.1) that the quantities $\alpha[i]$, $z[i]$ and $\xi[i]$ can be updated according to the following equations:

$$\alpha[i] = \alpha[i-1]/\lambda - |\eta[i]|^2, \tag{2.67}$$

$$z[i] = -\eta[i]^* \gamma[i]/\alpha[i], \tag{2.68}$$

$$\text{and} \quad \xi[i] = \frac{z[i]}{|\gamma[i]|^2 + \alpha[i]\,|z[i]|^2}. \tag{2.69}$$

Note that $\gamma[i]$ in (2.62) is the last diagonal element of $Q[i]$. A direct calculation shows that $\gamma[i] = \prod_{n=1}^{N} c_n$ [105, 311].
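The update equations can be motivated by the fact that the orthonormal matrix $Q[i]$ preserves inner products between columns. The following is a brief sketch of that argument only; the complete proof is the one in the Appendix. The symbols $a_2$ and $a_3$ are introduced here for illustration and denote the last two columns of the block matrix on the left-hand side of (2.62).

```latex
\begin{aligned}
\|a_2\|^2 &:\quad \frac{\|u[i-1]\|^2}{\lambda} = \|u[i]\|^2 + |\eta[i]|^2
  &&\Longrightarrow\quad \alpha[i] = \alpha[i-1]/\lambda - |\eta[i]|^2, \\
a_2^H a_3 &:\quad 0 = u[i]^H v[i] + \eta[i]^* \gamma[i]
  &&\Longrightarrow\quad z[i] = \frac{u[i]^H v[i]}{\alpha[i]} = -\frac{\eta[i]^* \gamma[i]}{\alpha[i]}, \\
\|a_3\|^2 &:\quad 1 = \|v[i]\|^2 + |\gamma[i]|^2
  &&\Longrightarrow\quad 1 - \|v[i]\|^2 = |\gamma[i]|^2 .
\end{aligned}
```

Substituting the last identity into the relation between $\xi[i]$ and $z[i]$ in (2.61) gives (2.69).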
The initialization of the QR-RLS blind adaptive algorithm is given by $C[-1] = \sqrt{\delta}\, I_N$, $u[-1] = s_1/\sqrt{\delta}$ and $\alpha[-1] = u[-1]^H u[-1] = 1/\delta$ (for a unit-norm signature $s_1$), where $\delta$ is a small number. This corresponds to the initial condition $C_r[-1] = \delta I_N$ and $m_1[-1] = s_1$, i.e., the adaptation starts with the matched filter. At each time $i$, the algorithm proceeds as follows.
Algorithm 2.4 [QR-RLS blind linear MMSE detector - synchronous CDMA]
• Update the detector: Apply the orthonormal transformation (2.62).
• Compute the detector output and perform differential detection:
$$z_1[i] = \eta[i]^* \gamma[i], \tag{2.70}$$

$$\hat{\beta}_1[i] = \operatorname{sign}\left\{ \Re\left( z_1[i]\, z_1[i-1]^* \right) \right\}. \tag{2.71}$$
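Algorithm 2.4 can be exercised end to end on simulated data. The sketch below is illustrative only: the function name `qr_rls_step`, the random signatures, the amplitudes and phases, the noise level, the forgetting factor, the value of $\delta$ and the random seed are all assumptions chosen for the demonstration, not values from the text. The information bits are differentially encoded so that the detector of (2.71) can recover them without knowing the carrier phase.

```python
import numpy as np

def qr_rls_step(C, u, r, lam):
    """One application of (2.62) by Givens rotations; returns C[i], u[i], eta[i], gamma[i]."""
    N = C.shape[0]
    A = np.zeros((N + 1, N + 2), dtype=complex)
    A[:N, :N] = np.sqrt(lam) * C
    A[:N, N] = u / np.sqrt(lam)
    A[N, :N] = r.conj()
    A[N, N + 1] = 1.0
    for n in range(N):
        rho = np.hypot(abs(A[n, n]), abs(A[N, n]))
        c, s = np.conj(A[n, n]) / rho, np.conj(A[N, n]) / rho
        top, bottom = A[n].copy(), A[N].copy()
        A[n] = c * top + s * bottom
        A[N] = -np.conj(s) * top + c * bottom
        A[n, n], A[N, n] = rho, 0.0
    return A[:N, :N].copy(), A[:N, N].copy(), A[N, N], A[N, N + 1]

rng = np.random.default_rng(3)
N, K, T = 16, 3, 600                    # spreading gain, users, symbols (assumed)
lam, delta, sigma = 0.995, 1e-2, 0.1    # forgetting factor, initialization, noise std (assumed)

S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)        # unit-norm signatures
amp = np.array([1.0, 2.0, 2.0]) * np.exp(1j * rng.uniform(0, 2 * np.pi, K))
beta = rng.choice([-1, 1], size=(K, T))                      # information bits
b = np.cumprod(beta, axis=1)                                 # b[i] = b[i-1] * beta[i]
noise = sigma * (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
Rx = S @ (amp[:, None] * b) + noise                          # column i is r[i]

# Initialization: C[-1] = sqrt(delta) I, u[-1] = s1 / sqrt(delta).
C = np.sqrt(delta) * np.eye(N, dtype=complex)
u = S[:, 0] / np.sqrt(delta)
z_prev = 0.0
decisions = np.zeros(T)
for i in range(T):
    C, u, eta, gamma = qr_rls_step(C, u, Rx[:, i], lam)      # update the detector
    z = np.conj(eta) * gamma                                 # detector output (2.70)
    decisions[i] = np.sign((z * np.conj(z_prev)).real)       # differential detection (2.71)
    z_prev = z

ber = np.mean(decisions[T // 2:] != beta[0, T // 2:])
print(f"bit error rate over the second half of the run: {ber:.4f}")
```

Only the signature of the desired user enters the recursion (through the initial value of `u`); the interferers' signatures and all amplitudes are unknown to the detector. The scale factor $-1/\alpha[i]$ in (2.68) is omitted in (2.70) because $\alpha[i] > 0$ and the common sign cancels in the product of (2.71).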
The orthonormal transformation (2.62) on the block matrix can be mapped onto a triangular systolic array for highly efficient parallel implementation, which is discussed next.
Figure 2.1: Systematic illustration of the systolic array implementation of the QR-RLS blind adaptive algorithm (N = 4, K = 2), and the operations at each cell.
Parallel Implementation on Systolic Arrays
The QR-RLS blind adaptive algorithm derived above has good numerical properties and is well suited for parallel implementation. Fig. 2.1 shows systematically a systolic array implementation of this algorithm, using a triangular array first proposed by McWhirter [311].
It consists of three sections: the basic upper triangular array, which stores and updates $C[i]$; the right-hand column of cells, which stores and updates $u[i]$; and the final processing cell, which computes the demodulated data bit. The system is initialized as $C[-1] = \sqrt{\delta}\, I_N$ and $u[-1] = s_1/\sqrt{\delta}$. The received data $r[i]$ are fed from the top and propagate to the bottom of the array. The rotation angles $\phi_n$ are calculated in the left boundary cells and propagate from left to right. The internal cells update their elements by Givens rotations using the angles received from the left. The factor $\gamma[i]$ is calculated along the left boundary cells, where the dot "•" represents an extra delay. The final cell extracts the signs of $\eta[i]$ and $\gamma[i]$, and produces the demodulated differential data bit according to (2.71). The computation at each cell is also outlined in Fig. 2.1. The QR-RLS algorithm may also be carried out using the square-root-free Givens rotation algorithm to reduce the computational complexity at each cell [155, 311]. For more details on the systolic array implementations, see [105, 311].
The systolic array in Fig. 2.1 operates in a highly pipelined manner. The computational wavefront propagates at the received data symbol rate. The demodulated data bits are also output at the received data symbol rate. Note that the demodulated data bit produced on a given clock corresponds to the received vector entered 2N clock cycles earlier.
If multiple synchronous user data streams need to be demodulated, then we can simply add more column arrays on the right-hand side, and initialize each of them with the signature vector of the corresponding user. By using the same triangular array, multiple users' data can thus be demodulated simultaneously. This is also illustrated in Fig. 2.1 for the case of two users. (Multiple paths of the same signal can also be handled by adding an appropriate linear array to Fig. 2.1.)
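The multiuser extension amounts to carrying one extra column per user through the same transformation. The sketch below illustrates this for a single update step. Two points are assumptions of the sketch rather than the text: the function name `qr_rls_step_multi` is hypothetical, and `numpy.linalg.qr` is used in place of the explicit Givens sequence. The triangular factor returned by `numpy.linalg.qr` may differ from the Cholesky factor by unit-modulus row factors, but each product $\eta_k[i]^* \gamma[i]$ is unaffected because both terms lie in the same row.

```python
import numpy as np

def qr_rls_step_multi(C, U, r, lam):
    """Update of (2.62) with K right-hand columns, one per user.

    Uses numpy.linalg.qr rather than an explicit Givens rotation sequence.
    Returns the new triangular factor, the new N x K column block,
    the K values eta_k[i], and gamma[i].
    """
    N, K = U.shape
    A = np.zeros((N + 1, N + K + 1), dtype=complex)
    A[:N, :N] = np.sqrt(lam) * C
    A[:N, N:N + K] = U / np.sqrt(lam)      # one column per user
    A[N, :N] = r.conj()
    A[N, N + K] = 1.0
    R = np.linalg.qr(A, mode="r")          # upper trapezoidal, shape (N+1, N+K+1)
    return R[:N, :N], R[:N, N:N + K], R[N, N:N + K], R[N, N + K]

# Illustrative data (all values are assumptions).
rng = np.random.default_rng(4)
N, K, lam = 8, 2, 0.995
C = np.triu(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)), 1)
C = C + np.diag(rng.uniform(1.0, 2.0, N))              # upper triangular, positive diagonal
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)  # column k is the signature of user k
U = np.linalg.solve(C.conj().T, S)                     # column k is u_k[i-1] = C^{-H} s_k
r = rng.standard_normal(N) + 1j * rng.standard_normal(N)

C_new, U_new, eta, gamma = qr_rls_step_multi(C, U, r, lam)
z = np.conj(eta) * gamma                               # one detector output per user, as in (2.70)

# Reference: eta_k^* gamma = -s_k^H C_r[i]^{-1} r[i], with C_r[i] = lam C^H C + r r^H.
Cr_new = lam * C.conj().T @ C + np.outer(r, r.conj())
z_ref = -(S.conj().T @ np.linalg.solve(Cr_new, r))
print(np.allclose(z, z_ref))
```

The triangular part of the computation is shared by all users; only the right-hand columns, and hence only $O(N)$ additional work per user and per symbol, are user specific.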