An inverse QRD-RLS algorithm for linearly constrained minimum variance adaptive filtering

(1)

An inverse QRD-RLS algorithm for linearly constrained

minimum variance adaptive ﬁltering

$

Ce´sar A. Medina S.

n

, Raimundo Sampaio-Neto

Center of Telecommunications Studies (CETUC), Pontiﬁcal Catholic University of Rio de Janeiro (PUC-Rio), Brazil

a r t i c l e

i n f o

Article history: Received 30 May 2012 Received in revised form 24 September 2012 Accepted 2 November 2012 Available online 9 November 2012 Keywords:

Linearly constrained minimum variance IQRD-RLS ﬁltering

Constrained RLS CDMA

Blind receiver estimation

a b s t r a c t

In this paper an inverse QR decomposition based recursive least-squares algorithm for linearly constrained minimum variance ﬁltering is proposed. The proposed algorithm is numerically stable in ﬁnite precision environments and is suitable for implementation in systolic arrays or DSP vector architectures. Its performance is illustrated by simulations of a blind receiver for a multicarrier CDMA communication system and compared with previously proposed inverse QR decomposition recursive least-squares algorithms.

&_{2012 Elsevier B.V.}

1. Introduction

QR decomposition based algorithms have gained an important role in wireless communication systems because their numerical stability in limited precision environments

[1], reduced dynamic range, plentiful data and task-level parallelism [2] and efﬁcient implementation in systolic arrays[3]or DSP vector architectures[4]. Such character-istics have previously been exploited to develop algorithms for a range of applications such as adaptive receivers, adaptive equalizers and adaptive beamforming, among others[5–12].

Constrained QR decomposition based RLS algorithms have been proposed in [13–15], nevertheless, they are only solutions to the minimum variance distortion-less response (MVDR) ﬁlter, i.e., only one constraint is assumed.

In[7], Chern et al. present a constrained inverse QRD-RLS (CIQRD-RLS) solution for the linearly constrained minimum variance (LCMV) filter formed by two stages. The first stage uses the unconstrained IQRD-RLS algorithm[16]to compute the inverse of the autocorrelation matrix of the input signal, while the second stage applies the matrix-inversion-lemma to the results of the first stage to compute the LCMV filter output.

In this paper we propose a CIQRD-RLS algorithm to compute the coefficients of the LCMV filter that, different from previously proposed algorithms, avoids the use of matrix-inversion-lemma to derive the solution and thus, not only one but two unconstrained IQR-RLS decomposi-tion stages are used to update the LCMV filter coefficients. The proposed algorithm results in better numerical accuracy when compared with previously proposed algorithms, as indicated by computer simulations.

This paper is structured as follows.Section 2describes the linearly constrained minimum variance least-squares ﬁltering problem, while inSection 3the proposed algo-rithm is derived. Some simulation experiments are pre-sented inSection 4, whileSection 5gives the conclusions. Notation: In what follows, Ikrepresents a k k identity

matrix, 0mn, an m n null matrix, diagðxÞ is a diagonal

Contents lists available atSciVerse ScienceDirect

journal homepage:www.elsevier.com/locate/sigpro

Signal Processing

0165-1684 & 2012 Elsevier B.V.

http://dx.doi.org/10.1016/j.sigpro.2012.11.002

$

This work was supported by FAPERJ/CAPES, Fundac- ~ao de Amparo a Pesquisa do Estado do Rio de Janeiro/Coordenac- ~ao de Aperfeic-oamento de Pessoal de Nı´vel Superior, under the program PAPD-2009, process number E-26/102.018/2009.

n

Corresponding author. Tel.: þ 55 21 35272151.

E-mail addresses: [email protected] (C.A. Medina S.), [email protected] (R. Sampaio-Neto).

Open access under CC BY-NC-ND license.

(2)

matrix with the components of x as its nonzero elements, ðÞT, ðÞH and ðÞn

denote transpose, Hermitian transpose and complex conjugated, respectively.

2. The linearly constrained minimum variance least-squares ﬁlter

A linearly constrained minimum variance least-squares ﬁlter is shown inFig. 1, where is supposed that the samples of the input vector, rðiÞ, are drawn from the complex ﬁeld and, in order to deal with multidimensional input signals, they do not necessarily present a time shift relationship.

The set of ﬁlter coefﬁcients wðiÞ ¼ ½w0ðiÞ w1ðiÞ

wM1ðiÞTis chosen in order to minimize the output error

in the weighted least-squares sense, subject to a set of L linear constraints, i.e.,

wðiÞ ¼ arg min

w

Xi j ¼ 1

l

ij9wH_rðjÞ92

subject to CHwðiÞ ¼ p ð1Þ where 0 5

l

o1 is the so-called forgetting factor, C is the M L constraint matrix and p is the L-element response vector.

Using Lagrange multipliers, the optimum ﬁlter coefﬁ-cients, are obtained as[17]

wðiÞ ¼ R1ðiÞCðCHR1ðiÞCÞ1_p _ð2Þ

where RðiÞ ¼ X

i

j ¼ 1

l

ijrðjÞrH_{ðjÞ ¼}

_l

_{Rði1Þ þrðiÞr}H_ðiÞ _ð3Þ

is the so-called deterministic correlation matrix of the input signal[3]. Note that(3)is a rank-one update process characterized by the plus sign in front of the rank-one matrix rðiÞrH_ðiÞ.

3. A constrained inverse QRD-RLS algorithm

The direct implementation of (2) is computationally intensive because involves, among other matrix opera-tions, two matrix inversions. An implementation based on the matrix-inversion-lemma is proposed in [17], leading to the constrained fast recursive least-squares (LCMV FRLS) algorithm. Nevertheless, it is known that such approach leads to a numerically unstable algorithm for ﬁnite precision environments. In[7]a CIQR-RLS algorithm

was developed in two stages, the ﬁrst stage uses the (unconstrained) IQRD-RLS algorithm to compute recur-sively the Cholesky factors of R1ðiÞ, and then, in the second stage, the results of the matrix-inversion-lemma are used to compute recursively ðCHR1ðiÞCÞ1_.

In this paper we propose a CIQR-RLS algorithm imple-mented in two stages. The ﬁrst stage updates the Cholesky factors of R1ðiÞ as in[7]. However, differently from [7], the second stage updates ðCHR1ðiÞCÞ1by downdating its Cholesky factors using QR decomposition based algo-rithms, and thus, leading to a new constrained IQRD-RLS (CIQRD-RLS) algorithm. The proposed algorithm is derived in the following.

3.1. The ﬁrst stage: updating the M M matrix R1_ðiÞ

The (unconstrained) IQRD-RLS algorithm [16] was originally proposed to compute the weight vector of the unconstrained RLS adaptive ﬁlter, avoiding recursive matrix inversion or back-substitution as in the conven-tional RLS algorithm or the QRD-RLS algorithm, respec-tively. In the ﬁrst stage of the algorithm proposed here, we update the Cholesky factors of R1ðiÞ using the unconstrained IQRD-RLS algorithm, which we summarize here for completeness of the paper. Note that this stage is also implemented by the CIQRD-RLS algorithm of[7].

Let RðiÞ be a rank-one updated process as in (3)and UðiÞ be its the Cholesky factor, such that RðiÞ ¼ UH_ðiÞUðiÞ,

then, it is well known that UðiÞ can be updated from Uði1Þ as[16] 0TM1 UðiÞ " # ¼Q ðiÞ r H_ðiÞ

l

1=2Uði1Þ " # ð4Þ where Q ðiÞ ¼ QM1ðiÞQM2ðiÞ Q0ðiÞ is an orthogonal

matrix constructed by a sequence of Givens rotations matrices that annihilates the elements of rH_{ðiÞ over the}

diagonal elements of

l

1=2Uði1Þ. Let express Q ðiÞ as Q ðiÞ ¼

g

ðiÞ g

H_ðiÞ

f ðiÞ EðiÞ " #

ð5Þ then, by inverting and conjugate-transposing both sides of the square matrix

0TM1

g

ðiÞ UðiÞ f ðiÞ " # ¼Q ðiÞ r H_ðiÞ ₁

l

1=2Uði1Þ 0M1 " # ð6Þ

(3)

we get zH_ðiÞ ₁₌

_g

_ðiÞ UHðiÞ 0M1 " # ¼Q ðiÞ 0 T M1 1

l

1=2UHði1Þ aðiÞ " # ð7Þ where

zðiÞ ¼ U1ðiÞf ðiÞ=

g

ðiÞ ð8Þ and aðiÞ ¼

l

1=2UH_{ði1ÞrðiÞ.}

Then, to update U1ðiÞ from U1ði1Þ we ﬁrst compute aðiÞ as before, then compute the sequence of Givens rotations matrices, Q ðiÞ ¼ QM1ðiÞQM2ðiÞ Q0ðiÞ, such

that 1=

g

ðiÞ 0M1 " # ¼Q ðiÞ 1 aðiÞ " # ð9Þ and ﬁnally, we apply to

zH_ðiÞ UHðiÞ " # ¼Q ðiÞ 0 T M1

l

1=2UHði1Þ " # ð10Þ such that UHðiÞ is updated from UHði1Þ.

3.2. The second stage: downdating the L L matrix ðCHR1_ðiÞCÞ1

First note that as matrix Q ðiÞ is orthogonal and since R1_{ðiÞ ¼ U}1_ðiÞUH_{ðiÞ, we have from}₍₁₀₎_that

U1_ðiÞUH_ðiÞ |fflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflffl} R1_ðiÞ þzðiÞzH_{ðiÞ ¼}

_l

1 U1_ði1ÞUH_ði1Þ |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} R1_ði1Þ

R1ðiÞ ¼

l

1R1ði1ÞzðiÞzH_ðiÞ _ð11Þ

now, pre-multiplying(11)by CH_{and pos-multiplying by C}

we get that

X

ðiÞ ¼ CHR1ðiÞC can be recursively computed as

X

ðiÞ ¼

l

1

X

ði1Þ

a

ðiÞ

a

H

ðiÞ ð12Þ

where

a

ðiÞ ¼ CH_zðiÞ _ð13Þ

The key strategy in the proposed algorithm, which allows us to devise a QR decomposition based solution, is to note that(12), different from(3), implements a rank-one downdate process, characterized by the minus sign in front of

a

ðiÞ

a

H_{ðiÞ, and thus, well known downdating}

algorithms based on QR decomposition can be used. There are in the literature several approaches for rank-one downdating process using QR decomposition algo-rithms, such as the ones proposed in [18–20] that use Givens rotations, [4,21] that use hyperbolic plane rota-tions and [22] that uses hyperbolic Householder trans-forms. In all the aforementioned algorithms the forgetting factor is equal to one,

l

¼1.

In this paper we use an algorithm based on[4,16] to downdate the inverse of the Cholesky factor of

X

ðiÞ, for

l

a1. Although its derivation is rather straightforward, the authors did not ﬁnd this more general version in the literature.

In the following, we say that a matrix PðiÞ is pseudo-orthogonal if PHðiÞ

U

PðiÞ ¼

U

for some signature matrix

U

¼diagð71Þ[23].

Now, let us deﬁne DðiÞ as the Cholesky factor of

X

ðiÞ such that

X

ðiÞ ¼ DHðiÞDðiÞ. The key observation here is that a sequence of hyperbolic Householder transforms can be found, so that[22] DðiÞ 0TL1ðiÞ " # ¼PðiÞ

l

1=2 Dði1Þ

a

H_ðiÞ " # ð14Þ where PðiÞ ¼ PL1ðiÞPL2ðiÞ P0ðiÞ is a pseudo-orthogonal

matrix with respect to

U

¼diagðIL,1Þ, such that

PHðiÞ

U

PðiÞ ¼

U

, and the downdated Cholesky factor DðiÞ satisﬁes (12). The matrices PjðiÞ (j ¼ L1,L2, . . . ,0) are

also pseudo-orthogonal and each one annihilates one element of

a

ðiÞ over the diagonal elements of

l

1=2Dði1Þ.

Now, lets partition matrix PðiÞ as PðiÞ ¼ AðiÞ vðiÞ

tH_ðiÞ _qðiÞ

" #

ð15Þ then, by inverting and conjugate-transposing both sides of the square matrix:

DðiÞ vðiÞ 0TL1 qðiÞ " # ¼PðiÞ

l

1=2 Dði1Þ 0L1

a

H_ðiÞ ₁ " # ð16Þ we get DHðiÞ 0L1 mH_ðiÞ _1=qn ðiÞ " # ¼PðiÞ

l

1=2 DH_ði1Þ

b

ðiÞ 0TL1 1 " # ð17Þ where

b

ðiÞ ¼

l

1=2DHði1Þ

a

ðiÞ and

mðiÞ ¼ D1ðiÞvðiÞ=qðiÞ ð18Þ So, in order to downdate DH_{ðiÞ from D}H_{ði1Þ we ﬁrst}

compute the matrix PðiÞ ¼ PL1ðiÞPL2ðiÞ P0ðiÞ that

implements an hyperbolic transform such that 0L1 1=qn ðiÞ " # ¼PðiÞ

b

ðiÞ 1 ð19Þ and then, using PðiÞ, we compute

DHðiÞ mH_ðiÞ " # ¼PðiÞ

l

1=2 DHði1Þ 0TL1 " # ð20Þ For details on construction of PðiÞ refer to Appendix.

Finally, the receiver ﬁlter for the desired user can be computed as

wðiÞ ¼ U1ðiÞUHðiÞCD1ðiÞDHðiÞpðiÞ ð21Þ Since in some applications such as blind user detection and beamforming the main interest is in the receiver ﬁlter output, yðiÞ ¼ wH_{ðiÞrðiÞ, and not the receiver ﬁlter itself,}

some operations can be avoided. To do this, ﬁrst note from(12)that

DH_{ðiÞDðiÞ ¼}

_l

1

DH_{ði1ÞDði1Þ}

_a

_ðiÞ

_a

H_ðiÞ _ð22Þ

then, replacing(15)in(14)we get DðiÞ ¼

l

1=2AðiÞDði1Þ þ vðiÞ

a

H

ðiÞ ð23Þ

Pre-multiplying(23)by DH_{ðiÞ and then equating to}₍₂₂₎_,

we get

(4)

so that mðiÞ in(18)is given by

mðiÞ ¼ D1ðiÞDHðiÞ

a

ðiÞ=qðiÞ ¼

X

1ðiÞ

a

ðiÞ=qðiÞ ð25Þ Now, by using a procedure similar to the one that led to(24)from(22)and(23), replacing(5)in(4)and after some algebraic manipulations, we get

f ðiÞ ¼ UHðiÞrðiÞ ð26Þ

that, when replaced in(8), yields

zðiÞ ¼ U1_ðiÞUH_ðiÞrðiÞ=

_g

_{ðiÞ ¼ R}1_ðiÞrðiÞ=

_g

_ðiÞ _ð27Þ

which is known as the Kalman gain.

Using(21), (27), (13) and (25), we have that the output signal can be computed directly as

yðiÞ ¼ wHðiÞrðiÞ ¼pH_ðiÞD1

ðiÞDHðiÞCHU1ðiÞUHðiÞrðiÞ ¼

g

ðiÞpHðiÞD1ðiÞDHðiÞCHzðiÞ ¼

g

ðiÞpH_ðiÞD1

ðiÞDHðiÞ

a

ðiÞ

¼

g

ðiÞqðiÞpH_ðiÞmðiÞ _ð28Þ

that are outputs of the CIQRD-RLS algorithm. The pro-posed CIQRD-RLS algorithm for linearly constrained mini-mum variance ﬁltering is summarized inTable 1.

4. Simulation results

For simulation purposes we implement a minimum variance least-squares receiver for a downlink BPSK synchronous multicarrier (MC) CDMA transmission system with K users, as depicted inFig. 2. For user k, the transmitted symbols, bk(i), drawn from a complex signal

constellation with zero mean and unit average symbol energy, are ﬁrst spread by a PN code ck of M chips per

symbol, M¼32 in our simulations. The chips are grouped in blocks and transmitted in multicarrier fashion by a M M matrix FH, where F implements the normalized discrete Fourier transform.

In order to allow interblock interference (IBI) suppres-sion at the receiver, a length Lgcyclic preﬁx (CP) insertion

is performed before transmission by a P M matrix T, detailed below, where 0mn represents an m n null

matrix and P ¼ M þLg; Lgmust be at least the channel

order to avoid IBI: T ¼ 0LgðMLgÞ9ILg

IM

" #

Each block of chips is then serially transmitted through a multipath channel, modeled here as a FIR ﬁlter with L taps whose gains are samples, taken at the chip rate, of the channel impulse response complex envelope. It is assumed that during the i-th block duration the channel impulse response, hðiÞ ¼ ½h0ðiÞ . . . hL1ðiÞT, remains

constant. In our simulations we set L¼4 and Lg¼3. Finally,

through the use of a matrix G ¼ ½0MLg9 IM, the receiver

removes the cyclic preﬁx from the received signal to eliminate IBI.

After some algebraic manipulations, the M dimen-sional observation vector, rðiÞ, can be represented by[24]

rðiÞ ¼ X K k ¼ 1 ffiffiffiffi

r

p

kCkhðiÞbkðiÞ þnðiÞ ð29Þ

where

r

kis the average power of the transmitted symbol

for user k, nðiÞ ¼ ½n0ðiÞ . . . nM1ðiÞT is a complex white

Gaussian noise vector with covariance matrix E½nðiÞnH_{ðiÞ ¼}

s

2_I

M and Ck is an M L code related circulant matrix for

user k, whose columns are circularly shifted versions of the k-th user transformed spreading sequence, FHck.

Considering the observation vector in(29), the linearly constrained minimum variance least-squares receiver for user k is the ﬁlter wkðiÞ obtained as[24]

wkðiÞ ¼ arg min w Xi j ¼ 1

l

ij9wH_rðjÞ92 subject to CHkwðiÞ ¼ p ð30Þ whose solution is wkðiÞ ¼ R1ðiÞCkðCHkR 1_ðiÞC kÞ1p ð31Þ Table 1

Proposed CIQRD-RLS algorithm.

Uð0Þ the Cholesky factor of Rð0Þ ¼dI,da small constant Dð0Þ the Cholesky factor of CH_R1_ð0ÞC

(1) Form the matrix–vector product: aðiÞ ¼l1=2_UH_ði1ÞrðiÞ

(2) Compute Q ðiÞ such that[16]

1=gðiÞ 0M1 " # ¼Q ðiÞ 1 aðiÞ " #

(3) Apply Q ðiÞ to ½0M1l1=2U1ði1ÞHforming

zH_ðiÞ UH_ðiÞ " # ¼Q ðiÞ 0 T M1 l1=2_UH_ði1Þ " #

(4) Form the matrix–vector product:

aðiÞ ¼ CHzðiÞ

bðiÞ ¼l1=2_DH_ði1Þ_a_ðiÞ

(5) Compute PðiÞ such that (see AppendixTable 2) 0L1 1=qn_ðiÞ " # ¼PðiÞ bðiÞ 1

(6) Apply PðiÞ to ½l1=2_D1_{ði1Þ 0}

L1Hforming DH_ðiÞ mH_ðiÞ " # ¼PðiÞ l 1=2_DH_ði1Þ 0TL1 " #

(7) Compute the receiver ﬁlter output

yðiÞ ¼ gðiÞqðiÞpH_ðiÞmðiÞ

(8) Alternatively compute

(5)

where p is an estimate of the channel impulse response. In all the experiments vector p was made equal to the complex channel impulse response, hðiÞ.

The performance of the blind receiver was measured in terms of the signal-to-interference plus noise ratio (SINR) and bit error rate (BER). We compare the proposed algorithm with the linearly constrained minimum variance fast recursive least-squares (LCMV FRLS) algo-rithm in[17], the linearly constrained minimum variance IQRD-RLS (LCMV IQRD-RLS) algorithm in [7] and the linearly constrained constant modulus IQRD-RLS receiver (LCCM IQRD-RLS) in[6].

In the ﬁrst experiment we set a system with K ¼10 users, the power level distribution amongst the inter-ferers follow a log-normal distribution with associated standard deviation of 10 dB. The path gains of the trans-mission channel are drawn from a zero-mean and unit variance complex Gaussian random variable and kept ﬁxed throughout each simulation run. The forgetting factor was set to

l

¼0:999 and the results are an average of 100 runs. The obtained SINR performance results are depicted in Fig. 3 for Eb=N0¼0 dB, Eb=N0¼10 dB and

Eb=N0¼20 dB with respect to the desired user power

level (Ebis the energy per bit of the desired user).

Results indicate that the LCMV FRLS and the LCMV IQRD-RLS algorithms exhibit numerical instability as they diverge after a number of transmitted symbols, while the LCCM IQRD-RLS and the proposed algorithm are stable and presents similar steady-state performance for low signal-to-noise ratio (up to Eb=N0¼10 dB), with the proposed

algorithm outperforming the LCCM IQRD-RLS in terms of rate of convergence. InFig. 4the obtained BER vs. Eb=N0for

the same scenario is depicted. Results are an average of 100 independent runs of 2000 transmitted symbols.

Note that for Eb=N0410 dB the performance of the

LCMV FRLS deteriorates because the ill-conditioned matrices involved in the algorithm. Results show that LCMV IQRD-RLS algorithm is not suitable for the imple-mentation of a MC CDMA blind receiver, in fact, it was originally proposed for beamforming. Although the

purpose of this paper is not to compare the bit error rate performance of a constrained constant modulus receiver (LCCM IQRD-RLS) with a constrained minimum variance receiver (Proposed) but their numerical stability, the results showed a superiority of the proposed algorithm over LCCM IQRD-RLS. This is also due to the fact that error counting includes the sequence of transmitted symbols were the proposed algorithm performs better than LCCM IQRD-RLS algorithm in terms of SINR (the transient symbols).

In the second experiment we considered the BER performance in a dynamic scenario in which the system has initially seven users, the power level distribution amongst the interferers follow a log-normal distribution with associated standard deviation of 3 dB. After 1000 symbols, seven additional users enter the system and the power level distribution amongst interferes is loosen with associated standard deviation being increased to 10 dB. The path gains of the transmission channel are randomly drawn from a zero-mean and unit variance complex Gaussian random variable and kept ﬁxed throughout each simulation run. The forgetting factor was set to

l

¼0:999. The result, which is an average of 100 runs, is shown inFig. 5for Eb=N0¼20 dB. Note that

the LCMV FRLS algorithm diverges after additional users enter to the system.

Finally, to evaluate the behavior of the algorithm for different values of forgetting factor, a MC CDMA transmis-sion system in a time-variant random channel was simu-lated. The sequence of channel coefﬁcients, hlðiÞ ¼ pl

a

lðiÞ

(l ¼ 0,1,2, . . . ,L1) is obtained with Clarke’s model [25]. This procedure corresponds to the generation of L indepen-dent sequences of correlated unit power complex Gaussian random variables (E½9

a

2

lðiÞ9 ¼ 1) with the path weights pl

normalized so that PLp l ¼ 19pl9

2

¼1. We assume that the average power of each path decays exponentially, such that 9pj,l9

2

¼

s

2

0expl, l ¼ 0,1,2,3, and

s

20¼1exp1=ð1eLÞ [26]. The results, depicted inFig. 6, are shown in terms of the normalized Doppler frequency ðfdTÞ, where fd is the Fig. 2. MC-CDMA downlink transmission system.

(6)

Doppler frequency and T is the inverse of the transmission rate. In the simulations two values of fdT were assumed.

For fdT¼0.0001 a value of

l

¼0:997 was used, while for

fdT¼0.0005,

l

¼0:996. As expected, the BER of the two

algorithms increase for a faster varying channel (fdT¼

0.0005), but the proposed algorithm exhibits better bit error

rate than the LCCM IQRD-RLS algorithm. Both algorithms behave numerically stable.

InTable 3the computational complexity of the algo-rithms is given in terms of complex multiplications, complex sums and total number of operations for an hypothetic case of M¼ 32 and L¼ 4 (it was assumed that the computational

0 500 1000 1500 −30 −25 −20 −15 −10 −5 0 5 10 15 20 25 SINR (dB) 24000 24200 24400 24600 24800 25000 Eb/N0=20dB E_b/N₀=10dB Eb/N0=0 dB LCMV FRLS (diverges around this point)

Number of Transmitted Symbols Proposed

LCCM IQRD−RLS

LCMV IQRD−RLS (diverges arround

sample 15000)

Fig. 3. SINR vs. number of transmitted symbols for a time-invariant random channel under different signal-to-noise ratios.

0 5 10 15 20 10−4 10−3 10−2 10−1 100 Eb/N0 (dB) BER LCMV FRLS LCMV IQRD−RLS LCCM IQRD−RLS Proposed

(7)

cost of a complex multiplication and a complex sum is similar). The evaluation of computational complexity does not consider possible optimizations in the implementation. Results inTable 3are valid for serial architectures. However, as mentioned in Introduction, QR decomposition based adaptive algorithms are suitable for systolic arrays or DSP

vector architectures since they posse desirable properties for VLSI implementations such as regularity and good ﬁnite word-length behavior. Also they can be mapped to CORDIC arithmetic-based processors and can be designed efﬁciently for high-speed/low-power applications through pipelining and parallel processing. In serial implementations the system

0 500 1000 1500 2000 10−3 10−2 10−1 100 BER

Number of Transmitted Symbols

LCMV FRLS LCCM IQRD−RLS Proposed

Fig. 5. BER vs. number of transmitted symbols for a dynamic scenario (Eb=N0¼20 dB).

0 2 4 6 8 10 12 14 16 18 20 10−4 10−3 10−2 10−1 100 Eb/N0 (dB) BER LCCM IQRD−RLS Proposed f_dT=5e−4 fdT=1e −4

(8)

clock must be several times faster than the sample rate due to the implementation of several matrix–vector multiplica-tions, among other operations (see LCMV-FRLS algorithm

[17] for example). On the other hand, inverse QR based algorithms can be implemented in parallel, such that the speed of the algorithm increases, at the expense of more CORDIC units (hardware complexity). In the proposed algo-rithm, two unconstrained IQR-RLS decomposition stages are used to update the LCMV filter coefficients, which can be efficiently implemented on CORDIC arithmetic-based proces-sors. Between these two stages only one matrix–vector multiplication is found, see(13). However,(13)can also be efficiently computed in parallel, incurring only in an aug-mented hardware complexity and an increment of order M in the latency. Other algorithms include several matrix–vector multiplication which forces a serial or pipelined implementa-tion, and thus, reducing the speed of the algorithm.

In terms of stability, a ﬂoating-point implementation the LCMV FRLS algorithm, which uses the matrix inver-sion lemma to derive the algorithm to update R1_ðiÞ,

produce errors between iterations and, in practice, the algorithm become unstable [3]. The other algorithms presented in the paper avoid the instability of R1_{ðiÞ by}

updating its Cholesky factor using the (unconstrained) IQRD-RLS algorithm, which is known to be stable. The source of instability of LCMV RLS and LCCM IQRD-RLS algorithms is the second stage, where recursions for

X

1ðiÞ are implicit computed based on the matrix inversion lemma.

In the proposed algorithm, we avoid the use of such recursion by downdating the Cholesky factor of

X

1ðiÞ using hyperbolic transforms. Hyperbolic plane rotations for Cho-lesky downdating is known to be forward (weakly) stable, in the sense that acceptable results can always be expected when downdating problem is not too ill-conditioned [4]. In [20] is shown that the downdating problem is ill-conditioned if any singular value is reduced signiﬁcantly

(not necessarily becoming small). This happens in the presence of outliers, i.e., an erroneous and large element. However, in the proposed algorithm, the input for the downdating stage is the vector

a

ðiÞ ¼ CHR1_ðiÞrðiÞ=

_g

_ðiÞ

which is a ﬁltered version of the input vector, so that we expect to not contain any outlier. Simulations results show good numerical properties of the proposed algo-rithm that corroborate the arguments above.

5. Conclusions

In this paper was proposed an IQRD-RLS algorithm for linearly constrained minimum variance ﬁlters. The proposed algorithm is suitable for implementations in systolic arrays or DSP vector architectures and, as shown through computer simulations, has better numerical behavior in ﬁnite precision environments when compared with other constrained IQRD-RLS algorithms for blind detection in MC-CDMA systems. This numerical robustness results in better performance in terms of BER and SINR.

Appendix A. Construction of matrix PðiÞ

The structure of the matrix PðiÞ is somewhat similar to the structure of the matrix Q ðiÞ, and depends on the type of triangularization of the matrix DðiÞ. For example, if we consider that DðiÞ is an upper triangular matrix, then the structure of PjðiÞ (j ¼ 0, ,L1) is PjðiÞ ¼ pj,00 0T pj,01 0T 0 Ij 0 0 pj,10 0 T pj,11 0 T 0 0 0 ILj 2 6 6 6 6 6 4 3 7 7 7 7 7 5 ðA:1Þ

where the coefﬁcients pj,ln, l,n ¼0,1 are computed by an

hyperbolic transform algorithm. In this paper we use the hyperbolic Householder transform, which proceed as follows. First deﬁne

U

¼ ½1₀ ₁0, then, if we want to compute the matrix:

~ PjðiÞ ¼ pj,00 pj,01 pj,10 pj,11 " # ðA:2Þ such that 0 eðiÞ " # ¼ ~PjðiÞ

d

ðiÞ

k

ðiÞ " # ðA:3Þ where 9eðiÞ92¼ 9

k

ðiÞ929

d

ðiÞ92 and _P~H

jðiÞ

U

P~jðiÞ ¼

U

, we

apply the algorithm inTable 2. More robust algorithms for hyperbolic Householder transforms can be found in the

Table 2

Hyperbolic householder transform[30].

DeﬁneU¼ 1 0 0 1

and a ¼ ½0 pffiffiffiffiffiffiffi1T. (1) Set

gðiÞ ¼ ½dðiÞ kðiÞT

(2) Set xðiÞ ¼gðiÞ þpffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffigH_ðiÞ_U_g_ðiÞ_a

(4) SetnðiÞ ¼ 2= x H_ðiÞ_U_xðiÞ

(5) Compute PjðiÞ as

PjðiÞ ¼ I2nðiÞxðiÞxHðiÞU

Table 3

Computational complexity of RLS algorithms.

Algorithm Complex sums Complex multiplications Tot. Op. (M ¼32, L¼ 4)

LCMV FRLS M2_þMð2L2_þ3L1ÞL _2M2_þMð2L2_{þ4Lþ 1Þþ 3L þ 1} ₆₀₂₅

LCMV IQRD-RLS M2_{þMð4L þ 2ÞL2} _M2_{þMð3L þ 1Þ þ L}3_þ2L2_{þL þ 1} ₃₁₃₅

LCMM IQRD-RLS M2_þMðL2_{þ5L1Þþ 2L}2_{L þ 2} _M2_þMð2L2_{þ5L þ 4Þ þ L}2_þL ₅₀₁₀

(9)

literature[27,28], as well systolic implementations of the algorithms[29].

References

[1] J.A. Apolina´rio (Ed.), QRD-RLS Adaptive Filtering, Springer, 2009. [2] L. Ma, K. Dickson, J. McAllister, J. McCanny, QR

decomposition-based matrix inversion for high performance embedded MIMO receivers, IEEE Transactions on Signal Processing 59 (4) (2011) 1858–1867.

[3] P.S.R. Diniz, Adaptive Filtering: Algorithms and Practical Imple-mentation, 3rd ed. Springer, 2008.

[4] C. Pan, R.J. Plemmons, Least squares modiﬁcations with inverse factorizations: parallel implications, Journal of Computation and Applied Mathematics 27 (1–2) (1989) 109–127.

[5] C.A. Medina, R. Sampaio-Neto, A constrained IQRD-RLS blind detection algorithm for CDMA transmission systems in multipath channels, in: Proceedings The Seventh International Symposium on Wireless Communication Systems, York, United Kingdom (ISWCS 2010).

[6] S.-J. Chern, C.-Y. Chang, Adaptive MC-CDMA receiver with con-strained constant modulus IQRD-RLS algorithm for MAI suppres-sion, Signal Processing 83 (10) (2003) 2209–2226.

[7] S.-J. Chern, C.-Y. Chang, Adaptive linearly constrained inverse QRD-RLS beamforming algorithm for moving jammers suppression, IEEE Transactions on Antennas and Propagation 50 (8) (2002) 1138–1150.

[8] M.C. Tsakiris, C.G. Lopes, V.H. Nascimento, An array recursive least-squares algorithm with generic nonfading regularization matrix, IEEE Signal Processing Letters 17 (12) (2010) 1001–1004. [9] D. Cescato, H. B ¨olcskei, Algorithms for interpolation-based QR

decomposition in MIMO-OFDM systems, IEEE Transactions on Signal Processing 59 (4) (2011) 1719–1733.

[10] K. Nagatomi, H. Kawai, Kenichi Higuchi1, Complexity-Reduced MLD Based on QR Decomposition in OFDM MIMO Multiplexing with Frequency Domain Spreading and Code Multiplexing, EURASIP Journal on Advances in Signal Processing 2011, 2011:525829 http://dx.doi.org/10.1155/2011/525829.

[11] S. Aubert, F. Nouvel, A. Nafkha, Complexity gain of QR decomposi-tion based sphere decoder in LTE receivers, in: Proceedings IEEE of Vehicle Technology Conference, Fall, United States, 2009. [12] K.-H. Lin, R. C. Chang, C.-L. Huang, F.-C. Chen, S.-C. Lin,

Implemen-tation of QR decomposition for MIMO OFDM detection systems, in: Proceedings 15th IEEE International Conference on Electronics, Circuits and Systems, 2008, pp. 57–60.

[13] H.V. Poor, X. Wang, Code-aided interference suppression in DS/ CDMA communications: II Parallel blind adaptive implementations, IEEE Transactions on Communications 45 (9) (1997) 1112–1122.

[14] M. Moonen, I.K. Proudler, MVDR beamforming and generalized sidelobe cancellation based on inverse updating with residual extraction, IEEE Transactions on Circuits and Systems 47 (2) (2000) 352–358.

[15] C.-F.T. Tang, Adaptive Array Systems Using QR-Based RLS and CRLS Techniques with Systolic Array Architectures, Ph.D. Thesis, University of Maryland, 1991.

[16] A. Ghirnikar, S. T. Alexander, Stable recursive least squares ﬁltering using inverse QR decomposition, in: Proceedings International Conference on Speech, Acoustics and Signal Processing (ICASSP), vol. 3, 1990, pp. 1623–1626.

[17] L.S. Resende, J.M. Romano, M. Bellanger, A fast least-squares algorithm for linearly constrained adaptive ﬁltering, IEEE Transac-tions on Signal Processing 44 (5) (1996) 1168–1174.

[18] J.M. Chambers, Regression updating, Journal of the American Statistical Association 66 (336) (1971) 744–748.

[19] M.A. Saunders, Large Scale Linear Programming Using Cholesky Factorizations, Ph.D. Thesis, Stanford University, Computer Science Department, 1972.

[20] G.W. Stewart, The effects of rounding error on an algorithm for downdating a Cholesky factorization, Journal of the Institute of Mathematics and Its Applications 23 (1979) 203–213.

[21] G.H. Golub, Matrix Decomposition and Statistical Calculations, Statistical Computations, Academic Press, New York, 1969. [22] C.M. Rader, A.O. Steinhardt, Hyperbolic householder transforms,

SIAM Journal on Matrix Analysis and Applications 9 (2) (1988) 269–290.

[23] G. Golub, C.F.V. Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996.

[24] C.A. Medina, T.T.V. Vinhoza, R. Sampaio-Neto, Performance com-parison of minimum variance single carrier and multicarrier CDMA receivers, VIII IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC-2007), Helsinki, Finlˆandia. [25] T.S. Rappaport, Wireless Communications: Principles and Practice,

Prentice Hall PTR, 1996.

[26] T. Kaitz, Channel and interference model for 802.16b Physical Layer, IEEE 802.16 Broadband Wireless Access Working Group, January 2001.

[27] A.W. Bojanczyk, A.O. Steinhardt, Stabilized hyperbolic householder transformations, IEEE Transactions on Acoustics, Speech, and Signal Processing 37 (8) (1989) 1286–1288.

[28] A.W. Bojanczyk, A.O. Steinhardt, S.H. So, Robust hyperbolic square root updating in RLS estimation, in: Proceedings Fourth Annual ASSP Workshop on Spectrum Estimation and Modeling, 1988, pp. 301–306.

[29] K.R. Liu, S.-F. Hsieh, K. Yao, Systolic block householder transforma-tion for RLS algorithm with two level pipelined implementatransforma-tion, IEEE Transactions on Signal Processing 40 (4) (1992) 946–958. [30] A.O. Steinhardt, Householder transforms in signal processing, IEEE