QR-RLS Algorithm - Blind Multiuser Detection: Direct Methods

Blind Multiuser Detection

2.3 Blind Multiuser Detection: Direct Methods

2.3.3 QR-RLS Algorithm

The RLS approach discussed in the previous subsection, which is based on the matrix inversion lemma for recursively updating C_r[i]⁻¹, has O(N²) complexity per update. Although fast RLS algorithms of O(N) complexity exist [62, 79, 113, 121], all of them exploit the shifting property of the input data. In this application, however, successive input data vectors do not have the shifting relationship; in fact, r[i] and r[i−1] do not overlap at all. Therefore, these standard fast RLS algorithms cannot be applied here.

The RLS implementation of the blind linear MMSE detector suffers from two major problems. The first problem is numerical. Recursive estimation of C_r[i]⁻¹ is poorly conditioned because it involves inversion of a data correlation matrix. The condition number of a data correlation matrix is the square of the condition number of the corresponding data matrix; hence twice the dynamic range is required in the numerical computation [155]. The second problem is that the form of the recursive update of C_r[i]⁻¹ severely limits the parallelism and pipelining that can effectively be applied in implementation.
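The squaring of the condition number follows from the singular value decomposition. The short derivation below is a standard linear-algebra fact; the symbol X for a generic full-column-rank data matrix and κ₂ for the 2-norm condition number are notational assumptions introduced here, not symbols from the text.

```latex
\[
X = U \Sigma V^H, \qquad
\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_N), \qquad
\sigma_1 \ge \cdots \ge \sigma_N > 0,
\]
\[
X^H X = V \Sigma^2 V^H
\;\Longrightarrow\;
\kappa_2(X^H X) = \frac{\sigma_1^2}{\sigma_N^2} = \kappa_2(X)^2 .
\]
```

For instance, κ₂(X) = 10³ gives κ₂(X^H X) = 10⁶, so on a logarithmic scale the correlation matrix spans twice the dynamic range of the data matrix.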

A well-known approach for overcoming these difficulties associated with the RLS algorithms is the rotation-based QR-RLS algorithm [105, 381, 580]. The QR decomposition transforms the original RLS problem into a problem that uses only transformed data values, by Cholesky factorization of the original least-squares data matrix. This causes the numerical dynamic range of the transformed computational problem to be halved, and enables more accurate computation, compared with the RLS algorithms that operate directly on C_r[i]⁻¹. Another important benefit of the rotation-based QR approaches is that the computation can be easily mapped onto systolic array structures for parallel implementations. We next describe the QR-RLS blind linear MMSE detector, which was first developed in [381].

QR-RLS Blind Linear MMSE Detector

Assume that C_r[i] is positive definite. Let

$$ C_r[i] = C[i]^H C[i] \tag{2.54} $$

be the Cholesky decomposition, i.e., C[i] is the unique upper triangular Cholesky factor with positive diagonal elements. Define the following quantities:

$$ u[i] = C[i]^{-H} s_1, \tag{2.55} $$

$$ v[i] = C[i]^{-H} r[i], \tag{2.56} $$

and

$$ \alpha[i] = s_1^H C_r[i]^{-1} s_1 = u[i]^H u[i]. \tag{2.57} $$
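The definitions (2.54)-(2.57) can be checked numerically on a small example. The following is a minimal sketch, assuming real-valued data (so that the Hermitian transpose reduces to the ordinary transpose) and a hand-picked 2×2 correlation matrix; the helper names `cholesky_upper` and `solve_transposed` and all numerical values are hypothetical, chosen only for illustration.

```python
import math

def cholesky_upper(A):
    """Upper-triangular C with positive diagonal such that A = C^T C (real SPD A)."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j + 1):
            acc = A[i][j] - sum(C[k][i] * C[k][j] for k in range(i))
            C[i][j] = math.sqrt(acc) if i == j else acc / C[i][i]
    return C

def solve_transposed(C, b):
    """Solve C^T x = b by forward substitution (C^T is lower triangular)."""
    x = []
    for i in range(len(b)):
        acc = b[i] - sum(C[k][i] * x[k] for k in range(i))
        x.append(acc / C[i][i])
    return x

# Illustrative (assumed) correlation matrix, signature and received vector.
C_r = [[4.0, 2.0], [2.0, 3.0]]
s1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
r = [1.0, 2.0]

C = cholesky_upper(C_r)            # (2.54)
u = solve_transposed(C, s1)        # (2.55): u = C^{-H} s1
v = solve_transposed(C, r)         # (2.56): v = C^{-H} r
alpha = sum(x * x for x in u)      # (2.57): alpha = u^H u

# Cross-check against the explicit 2x2 inverse of C_r.
det = C_r[0][0] * C_r[1][1] - C_r[0][1] * C_r[1][0]
inv = [[C_r[1][1] / det, -C_r[0][1] / det],
       [-C_r[1][0] / det, C_r[0][0] / det]]
alpha_direct = sum(s1[a] * inv[a][b] * s1[b] for a in range(2) for b in range(2))
uv = sum(a * b for a, b in zip(u, v))
uv_direct = sum(s1[a] * inv[a][b] * r[b] for a in range(2) for b in range(2))
```

Both quadratic forms are obtained from triangular solves only; the explicit inverse appears solely as a cross-check.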

At time i, the a posteriori least-squares (LS) estimate is given by

$$ z[i] = m_1[i]^H r[i] = \frac{s_1^H C_r[i]^{-1} r[i]}{s_1^H C_r[i]^{-1} s_1} \tag{2.58} $$

$$ = \frac{u[i]^H v[i]}{\alpha[i]}. \tag{2.59} $$

The a priori LS estimate at time i is given by

$$ \xi[i] = m_1[i-1]^H r[i]. \tag{2.60} $$

It can be shown that ξ[i] and z[i] are related by [381]

$$ \xi[i] = \frac{z[i]}{1 - \|v[i]\|^2 + \alpha[i]\,|z[i]|^2}. \tag{2.61} $$

Suppose that C[i−1] and u[i−1] are available from the previous recursion. At time i, the new observation r[i] becomes available. We construct a block matrix consisting of C[i−1], u[i−1] and r[i], and apply an orthogonal transformation as follows:

$$ Q[i] \begin{bmatrix} \sqrt{\lambda}\, C[i-1] & u[i-1]/\sqrt{\lambda} & 0 \\ r[i]^H & 0 & 1 \end{bmatrix} = \begin{bmatrix} C[i] & u[i] & v[i] \\ 0^H & \eta[i] & \gamma[i] \end{bmatrix}. \tag{2.62} $$

In (2.62) the matrix Q[i], which zeros the first N elements on the last row of the partitioned matrix appearing on the left-hand side of (2.62), is an orthonormal matrix consisting of N Givens rotations,

$$ Q[i] = Q_N[i] \cdots Q_2[i]\, Q_1[i], \tag{2.63} $$

where Q_n[i] zeros the n-th element in the last row by rotating it with the (n + 1)th row. An individual rotation is specified by two scalars, c_n and s_n (which can be regarded as the cosine and sine, respectively, of a rotation angle φ_n), and affects only the last row and the (n + 1)th row. The effects on these two rows are

$$ \begin{bmatrix} c_n & s_n \\ -s_n^* & c_n \end{bmatrix} \begin{bmatrix} 0 \cdots 0 & y_n & y_{n+1} & \cdots \\ 0 \cdots 0 & r_n & r_{n+1} & \cdots \end{bmatrix} = \begin{bmatrix} 0 \cdots 0 & y'_n & y'_{n+1} & \cdots \\ 0 \cdots 0 & 0 & r'_{n+1} & \cdots \end{bmatrix} \begin{matrix} \leftarrow (n+1)\text{th row} \\ \leftarrow \text{last row} \end{matrix} \tag{2.64} $$

where the rotation factors are defined by

$$ c_n = \frac{y_n^*}{\sqrt{|y_n|^2 + |r_n|^2}}, \tag{2.65} $$

and

$$ s_n = \frac{r_n^*}{\sqrt{|y_n|^2 + |r_n|^2}}. \tag{2.66} $$
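A single rotation of the form (2.64) with the factors (2.65)-(2.66) can be sketched as follows. This is an illustrative example rather than a reference implementation: the function names `givens_factors` and `rotate_rows` and the sample row values are hypothetical. It assumes that the pivot y_n is real and positive, as holds for the diagonal of a Cholesky factor, which is what makes the leading element of the last row vanish.

```python
import math

def givens_factors(y_n, r_n):
    """Rotation factors of (2.65)-(2.66); y_n is assumed real and positive."""
    norm = math.sqrt(abs(y_n) ** 2 + abs(r_n) ** 2)
    c = complex(y_n).conjugate() / norm
    s = complex(r_n).conjugate() / norm
    return c, s

def rotate_rows(c, s, y_row, r_row):
    """Apply [[c, s], [-conj(s), c]] to the stacked pair of rows (y_row; r_row)."""
    new_y = [c * y + s * r for y, r in zip(y_row, r_row)]
    new_r = [-s.conjugate() * y + c * r for y, r in zip(y_row, r_row)]
    return new_y, new_r

# Assumed sample rows: the pivot y_row[0] is real and positive.
y_row = [2.0, 1.0 + 1.0j, 0.5]
r_row = [1.0 - 1.0j, 3.0, -2.0j]

c, s = givens_factors(y_row[0], r_row[0])
new_y, new_r = rotate_rows(c, s, y_row, r_row)

# Energy of each column before and after the rotation (it should be preserved).
energy_before = [abs(y) ** 2 + abs(r) ** 2 for y, r in zip(y_row, r_row)]
energy_after = [abs(y) ** 2 + abs(r) ** 2 for y, r in zip(new_y, new_r)]
```

After the rotation the leading element of the last row is zero, the new pivot equals the square root of |y_n|² + |r_n|², and every column keeps its energy because the 2×2 rotation is unitary.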

The correctness of (2.62) is shown in the Appendix (Section 2.8.1). It is seen from (2.62) that the computed quantities appearing on the right-hand side are C[i], u[i] and v[i] at time i. It is also shown in the Appendix (Section 2.8.1) that the quantities α[i], z[i] and ξ[i] can be updated according to the following equations:

$$ \alpha[i] = \alpha[i-1]/\lambda - |\eta[i]|^2, \tag{2.67} $$

$$ z[i] = -\eta[i]^* \gamma[i] / \alpha[i], \tag{2.68} $$

and

$$ \xi[i] = \frac{z[i]}{|\gamma[i]|^2 + \alpha[i]\,|z[i]|^2}. \tag{2.69} $$

Note that γ[i] in (2.62) is the last diagonal element of Q[i]. A direct calculation shows that $\gamma[i] = \prod_{n=1}^{N} c_n$ [105, 311].
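The update equations (2.67)-(2.69) and the product form of γ[i] can be checked numerically with one application of the transformation (2.62). The sketch below is illustrative only: the function name `qr_rls_update`, the two-dimensional complex test vectors and the values of δ and λ are assumptions. It starts from C[i−1] = √δ I and a unit-norm s₁, so that the previous detector is the matched filter and the a priori estimate is simply s₁^H r[i].

```python
import math

def qr_rls_update(C, u, r, lam):
    """Apply the N Givens rotations of (2.62).

    Returns C[i], u[i], v[i], eta[i], gamma[i] and the list of factors c_n.
    """
    n = len(r)
    sq = math.sqrt(lam)
    rows = [[sq * C[k][j] for j in range(n)] + [u[k] / sq, 0.0] for k in range(n)]
    last = [complex(x).conjugate() for x in r] + [0.0, 1.0]   # row [r^H, 0, 1]
    cs = []
    for k in range(n):
        y, x = complex(rows[k][k]), complex(last[k])
        norm = math.sqrt(abs(y) ** 2 + abs(x) ** 2)
        c, s = y.conjugate() / norm, x.conjugate() / norm     # (2.65), (2.66)
        top = rows[k]
        rows[k] = [c * a + s * b for a, b in zip(top, last)]
        last = [-s.conjugate() * a + c * b for a, b in zip(top, last)]
        cs.append(c)
    return ([row[:n] for row in rows], [row[n] for row in rows],
            [row[n + 1] for row in rows], last[n], last[n + 1], cs)

# Assumed illustrative values.
delta, lam = 0.01, 0.995
s1 = [0.6, 0.8j]                       # unit-norm signature
r = [1.0 + 0.5j, -0.3 + 2.0j]          # new observation
C_prev = [[math.sqrt(delta), 0.0], [0.0, math.sqrt(delta)]]
u_prev = [x / math.sqrt(delta) for x in s1]
alpha_prev = sum(abs(x) ** 2 for x in u_prev)

C_new, u_new, v, eta, gamma, cs = qr_rls_update(C_prev, u_prev, r, lam)

alpha_direct = sum(abs(x) ** 2 for x in u_new)                 # (2.57)
alpha_rec = alpha_prev / lam - abs(eta) ** 2                   # (2.67)
z_direct = sum(a.conjugate() * b for a, b in zip(u_new, v)) / alpha_direct  # (2.59)
z_rec = -eta.conjugate() * gamma / alpha_rec                   # (2.68)
xi = z_rec / (abs(gamma) ** 2 + alpha_rec * abs(z_rec) ** 2)   # (2.69)
xi_direct = sum(complex(a).conjugate() * b for a, b in zip(s1, r))  # s1^H r

gamma_prod = 1.0
for c in cs:
    gamma_prod *= c
```

In this example the recursive and direct values agree to rounding error, and ‖v[i]‖² + |γ[i]|² = 1, which is why the denominators of (2.61) and (2.69) coincide.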

The initialization of the QR-RLS blind adaptive algorithm is given by C[−1] = √δ I_N, u[−1] = s₁/√δ and α[−1] = u[−1]^H u[−1] = ‖s₁‖²/δ (that is, 1/δ for a unit-norm signature), where δ is a small number. This corresponds to the initial condition C_r[−1] = δ I_N and m₁[−1] = s₁, i.e., the adaptation starts with the matched filter. At each time i, the algorithm proceeds as follows.

Algorithm 2.4 [QR-RLS blind linear MMSE detector - synchronous CDMA]

• Update the detector: Apply the orthonormal transformation (2.62).

• Compute the detector output and perform differential detection:

$$ z_1[i] = \eta[i]^* \gamma[i], \tag{2.70} $$

$$ \hat{\beta}_1[i] = \mathrm{sign}\left\{ \Re\left( z_1[i]\, z_1[i-1]^* \right) \right\}. \tag{2.71} $$
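The two steps of Algorithm 2.4 can be put together in a short simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses real-valued signals, a hypothetical two-user synchronous channel with unit-norm signatures of cross-correlation 0.5, differentially encoded bits, and arbitrarily chosen values of λ, δ and the noise level. The function name `qr_rls_step` is likewise hypothetical.

```python
import math
import random

def qr_rls_step(C, u, r, lam):
    """One application of (2.62); returns C[i], u[i], eta[i], gamma[i]."""
    n = len(r)
    sq = math.sqrt(lam)
    rows = [[sq * C[k][j] for j in range(n)] + [u[k] / sq, 0.0] for k in range(n)]
    last = [x.conjugate() for x in r] + [0.0, 1.0]
    for k in range(n):
        y, x = rows[k][k], last[k]
        norm = math.sqrt(abs(y) ** 2 + abs(x) ** 2)
        c, s = y.conjugate() / norm, x.conjugate() / norm
        top = rows[k]
        rows[k] = [c * a + s * b for a, b in zip(top, last)]
        last = [-s.conjugate() * a + c * b for a, b in zip(top, last)]
    return [row[:n] for row in rows], [row[n] for row in rows], last[n], last[n + 1]

random.seed(1)
N, lam, delta, sigma, num = 4, 0.995, 0.01, 0.1, 400   # assumed parameters
s1 = [0.5, 0.5, 0.5, 0.5]       # desired user's signature (unit norm)
s2 = [0.5, 0.5, 0.5, -0.5]      # interferer's signature, correlation 0.5 with s1

# Initialization: C[-1] = sqrt(delta) I, u[-1] = s1 / sqrt(delta).
C = [[math.sqrt(delta) if j == k else 0.0 for j in range(N)] for k in range(N)]
u = [x / math.sqrt(delta) for x in s1]

b1_prev, z_prev, errors_late = 1.0, None, 0
for i in range(num):
    beta = random.choice([-1.0, 1.0])        # information bit of user 1
    b1 = beta * b1_prev                      # differential encoding
    b2 = random.choice([-1.0, 1.0])
    r = [b1 * s1[k] + b2 * s2[k] + random.gauss(0.0, sigma) for k in range(N)]

    C, u, eta, gamma = qr_rls_step(C, u, r, lam)       # update the detector
    z1 = eta.conjugate() * gamma                       # (2.70)
    if z_prev is not None:
        beta_hat = 1.0 if (z1 * z_prev.conjugate()).real >= 0 else -1.0  # (2.71)
        if i >= num // 2 and beta_hat != beta:
            errors_late += 1
    z_prev, b1_prev = z1, b1
```

Note that z₁[i] in (2.70) differs from z[i] in (2.68) by the factor −α[i]. Since α[i] is real and positive, the product z₁[i] z₁[i−1]^* is a positive multiple of z[i] z[i−1]^*, so the differential decision is unaffected and α[i] need not be tracked in this sketch.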

The orthonormal transformation (2.62) on the block matrix can be mapped onto a triangular systolic array for highly efficient parallel implementation, which is discussed next.

Figure 2.1: Systematic illustration of the systolic array implementation of the QR-RLS blind adaptive algorithm (N = 4, K = 2), and the operations at each cell.

Parallel Implementation on Systolic Arrays

The QR-RLS blind adaptive algorithm derived above has good numerical properties and is well suited for parallel implementation. Fig. 2.1 shows systematically a systolic array implementation of this algorithm, using a triangular array ﬁrst proposed by McWhirter [311].

It consists of three sections: the basic upper triangular array, which stores and updates C[i]; the right-hand column of cells, which stores and updates u[i]; and the final processing cell, which computes the demodulated data bit. The system is initialized as C[−1] = √δ I_N and u[−1] = s₁/√δ. The received data r[i] are fed from the top and propagate to the bottom of the array. The rotation angles φ_n are calculated in the left boundary cells and propagate from left to right. The internal cells update their elements by Givens rotations using the angles received from the left. The factor γ[i] is calculated along the left boundary cells, where the dot “•” represents an extra delay. The final cell extracts the signs of η[i] and γ[i], and produces the demodulated differential data bit according to (2.71). The computation at each cell is also outlined in Fig. 2.1. The QR-RLS algorithm may also be carried out using the square-root free Givens rotation algorithm to reduce the computational complexity at each cell [155, 311]. For more details on the systolic array implementations, see [105, 311].

The systolic array in Fig. 2.1 operates in a highly pipelined manner. The computational wavefront propagates at the received data symbol rate. The demodulated data bits are also output at the received data symbol rate. Note that the demodulated data bit produced on a given clock corresponds to the received vector entered 2N clock cycles earlier.

If multiple synchronous user data streams need to be demodulated, then we can simply add more column arrays on the right-hand side, and initialize each of them with the signature vector of the corresponding user. It is clear that by using the same triangular array, multiple users’ data can be demodulated simultaneously. This is also illustrated in Fig. 2.1 for the case of two users. (Multiple paths of the same signal can also be handled by adding an appropriate linear array to Fig. 2.1.)
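In software, adding column arrays amounts to appending one extra column per user to the block matrix in (2.62), so that all users share the same rotations. The sketch below illustrates this under the same assumptions as a simple real-valued two-user simulation (hypothetical signatures with cross-correlation 0.5, differentially encoded bits, arbitrarily chosen λ, δ and noise level); the function name `qr_rls_step_multi` is hypothetical.

```python
import math
import random

def qr_rls_step_multi(C, U, r, lam):
    """Rotations of (2.62) applied to a block matrix with one u-column per user."""
    n, K = len(r), len(U[0])
    sq = math.sqrt(lam)
    rows = [[sq * C[k][j] for j in range(n)] + [U[k][m] / sq for m in range(K)] + [0.0]
            for k in range(n)]
    last = [x.conjugate() for x in r] + [0.0] * K + [1.0]
    for k in range(n):
        y, x = rows[k][k], last[k]
        norm = math.sqrt(abs(y) ** 2 + abs(x) ** 2)
        c, s = y.conjugate() / norm, x.conjugate() / norm
        top = rows[k]
        rows[k] = [c * a + s * b for a, b in zip(top, last)]
        last = [-s.conjugate() * a + c * b for a, b in zip(top, last)]
    C_new = [row[:n] for row in rows]
    U_new = [row[n:n + K] for row in rows]
    return C_new, U_new, last[n:n + K], last[n + K]   # C[i], U[i], etas, gamma

random.seed(2)
N, lam, delta, sigma, num = 4, 0.995, 0.01, 0.1, 400   # assumed parameters
S = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, -0.5]]      # one signature per user
K = len(S)

C = [[math.sqrt(delta) if j == k else 0.0 for j in range(N)] for k in range(N)]
U = [[S[m][k] / math.sqrt(delta) for m in range(K)] for k in range(N)]  # column m = s_m / sqrt(delta)

b_prev, z_prev, late_errors = [1.0] * K, None, [0] * K
for i in range(num):
    beta = [random.choice([-1.0, 1.0]) for _ in range(K)]
    b = [beta[m] * b_prev[m] for m in range(K)]          # differential encoding
    r = [sum(b[m] * S[m][k] for m in range(K)) + random.gauss(0.0, sigma)
         for k in range(N)]
    C, U, etas, gamma = qr_rls_step_multi(C, U, r, lam)
    z = [eta.conjugate() * gamma for eta in etas]        # (2.70), one output per user
    if z_prev is not None and i >= num // 2:
        for m in range(K):
            beta_hat = 1.0 if (z[m] * z_prev[m].conjugate()).real >= 0 else -1.0
            late_errors[m] += beta_hat != beta[m]
    z_prev, b_prev = z, b
```

The triangular factor and the rotation factors are computed once per symbol; each additional user costs only one more column to rotate, which mirrors the extra column array in Fig. 2.1.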

In document 20471280 Wireless Communication Systems (Page 55-60)