RLS Adaptive Filtering Algorithms Based on Parallel Computations

(1)

RLS Adaptive Filtering Algorithms

Based on Parallel Computations

Victor I. DJIGAN

VLSI Development Laboratory, ELVEES R&D Center of Microelectronics, POB 19, Centralny Prospect, Zelenograd, Moscow, Russia 124460

[email protected] Abstract. The paper presents a family of the sliding

win-dow RLS adaptive filtering algorithms with the regulari-zation of adaptive filter correlation matrix. The algorithms are developed in forms, fitted to the implementation by means of parallel computations. The family includes RLS and fast RLS algorithms based on generalized matrix in-version lemma, fast RLS algorithms based on square root free inverse QR decomposition and linearly constrained RLS algorithms. The considered algorithms are mathe-matically identical to the appropriate algorithms with sequential computations. The computation procedures of the developed algorithms are presented. The results of the algorithm simulation are presented as well.

Keywords

Adaptive filtering, RLS, fast RLS, QR decomposi-tion, linear constraints, parallel computations.

1. Introduction

Adaptive signal processing [1] is an essential part of modern digital signal processing. Theoretical results, ob-tained in the field, are widely used in adaptive filters de-sign. Communication channels equalization, echo cancel-lation, suppression of spatially separated noise sources – these are only a few examples of the practical use of such filters [2]. The efficiency of the applications depends on the algorithms that the adaptive filters are based on.

In the applications the simplest gradient adaptive fil-tering algorithms are basically used because the algorithms have small arithmetic complexity. However, the filters, which use such algorithms, are not good enough for the processing of non-stationary signals. The Recursive Least Squares (RLS) algorithms [3] are more appropriate for the processing of the signals, but they require more computing resources. The algorithms complexity is not a problem in modern Digital Signal Processors (DSP) anymore, as the devices have enough resources for the implementation of such algorithms [4]. Besides, as a few DSPs can be inte-grated in a chip, such chips can be used for compact

im-plementation of signal processing algorithms based on parallel computations. Due to the opportunity, the de-velopment of parallel adaptive filtering algorithms be-comes an important task.

2. Problem Formulation and Solution

The given paper presents a method of RLS algorithms description, which allows the development of the algo-rithms in forms, fitted to the implementation by means of parallel computations. The Sliding Window (SW) RLS algorithms with the regularization of correlation matrix for multichannel adaptive filters with unequal number of com-plex-valued weights in channels are considered. The algo-rithms diversity includes unconstrained and constrained RLS algorithms, based on generalized Matrix Inversion Lemma (MIL) and square root free inverse QR decomposi-tion (QRD).

Most of RLS algorithms are based on the use of MIL for the recursive inversion of adaptive filter correlation matrix. To provide the tracking properties of the filters, when non-stationary signals are processed, exponential weighting of the signals or (and) SW is used. In SW case, MIL is applied twice per sample. Due to the limited num-ber of samples involved in the estimation of correlation matrix, SW RLS algorithms can be unstable sometimes. Dynamic regularization of the matrix can be used to stabi-lize RLS algorithms [5]. The SW and regularization are the reasons of the increased arithmetic complexity in com-parison with growing window (Prewindowed, PW) RLS algorithms or the absence of the regularization.

The computation load of the complex adaptive fil-tering algorithms implementation can be decreased by means of parallel computations. To get the parallel RLS algorithms, methods [6] can be used. Based on the meth-ods, parallel algorithms fitted to the implementation by means of two or four processors were developed [6-8].

This paper considers another simple method of the RLS algorithms description, which allows the developing of the regularized PW RLS, SW RLS and regularized SW RLS algorithms of adaptive filtering, including Linearly Constrained (LC) versions of the algorithms, [9-12] as the

(2)

sequence of the same parallel computations, similar to those of the same named PW RLS algorithms.

A block-diagram of M-channel adaptive filter is shown in Fig. 1. The filter can have unequal number of weights in channels. ) ( ) 1 ( 1 1 k N k H N x h − − ) ( ) 1 ( 2 2 k N k H N x h − − ) ( ) 1 (k k m m N H N x h − − ) ( ) 1 ( 1 1 k M k M N H N − − − −h x ) ( ) 1 (k k M M N H N x h − − Adaptive Algorithm ) (k d ) ( 1 k x ) ( 2 k x ) ( 1 k x_M₋ ) (k x_M ) (k x_m ∑

α

N χ, (k) M M M M M M M M

Fig. 1. Multichannel adaptive filter.

The objective of the LC SW least square filtering is to minimize the energy of the error between the desired signal

d(k) and the adaptive filter output:

[

∑

+ − = − ₋ = k L k i N H N i k N k d i k i E 1 2 ) ( ) ( ) ( ) (

λ

h χ

]

, (1)

where the error is measured over an observation window L samples long. The minimization is carried out under the

condition H _N _J. Here, NJh k f C ,χ( )= (k)

[

1(k), H N H N h h =

]

) ( ), ( 2 k k H N h _K ) (k m , ), ( , k H N H Nm h M h _K N h is a vector of M-channel adaptive filter weights;

is a vector of weights in m-th channel of the filter;

[

h = ₀_,_m,h₁_,_m,

]

T m Nm h ₁_, , ₋ K

[

, T (k)

]

NM x K = ) (k T Nm x ), (k , ) (k T N T N x m χ = ₁(k), T₂(k), N T N x x _K is a

vector of input signals in the adaptive filter;

[

x_m(k−1),_K,x_m(k 1)

]

∑

= = M m m N 1 ), ( − = x_m k N N + m is a vector of signals in m-th channel; CNJ and fJ are matrix and vector of

J linear constraints; Nm is a number of weights in m-th channel; is a total number of adaptive filter weights; k is a sample number and λ is a forgetting factor. Superscripts H and T denote Hermitian transpose and transposition of a vector or a matrix; one subscript N, J, or

F denotes the dimension of vectors and square matrices,

two subscripts NJ or NF denote the dimension of rectan-gular (non-transposed) matrices.

The solution of the problem (1) is the vector of

adap-tive filter weights [13]:

[

( )

] [

( ) ( ) ) ( ) ( ) ( ) ( 1 1 1 1 1 k k k k k k k N N H NJ J NJ N H NJ NJ N N N N r R C f C R C C R r R h − − − − − − × × + =

]

(2)

If SW and dynamic regularization are used, the adap-tive filter correlation matrix is defined as

[

]

). ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 1 ( ) ( ) ( ) ( ) ( ) ( 2 2 1 2 L k L k k k L k L k k k k i i i i k T N N T N N H N N H N N N T N k L k i N H N N i k N − − − + − − − − + − = × × + =

∑

+ − = − ρ ρ ρ ρ χ χ χ χ R ρ ρ χ χ R

µξ

ξ

µ

λ

ξ

λ

(3)

The cross correlation of χN(k) and d(k) is defined as

). ( ) ( ) ( ) ( ) 1 ( ) ( ) ( ) ( 1 L k d L k k d k k i d i k N N N k L k i N i k N − − − + + − = = ∗ ∗ ∗ + − = −

∑

χ χ r χ r

µ

λ

(4)

In (3) and (4), ()*_{means complex conjugate, µ=λ}L_and

ξ2_{is a small value of a dynamic regularization parameter}

[5]. Parameter ξ2_{and parameter δ}2_{for the initial}

regulariza-tion of correlaregulariza-tion matrix are selected as ξ2_{, δ}2_{≥0.01 σ}2_,

where σ2_{is the variance of adaptive filter input signals. The}

dynamic regularization vector ρN(k) is defined as

[

( ), ( ), , ( ), , ( )

]

) (k ₁ k ₂ k k T k N T N T N T N T N p p p m p M ρ = K K (5) where ( )=

[

m( ), m( −1), , m( − m +1)

]

T Nm k p k p k K p k N p 0 ) ( . In (5), pm k = , if 1+nmodNm ≠1, and pm(k)=1 1 , if 1+nmodNm = .

The first item in equation (2) is the solution of the problem (1) without constraints. The second item is de-termined by linear constraints. As a result, the RLS algo-rithms, which compute weight vector (2), also consist of two computational procedures: unconstrained and LC ones. MIL [1], which is used in sequential RLS algorithms, is expressed as R-1_=B-1_-B-1_cA-1_dB-1_{, where A=dB}-1_{c+1 and}

R=B+cd. Here c and d are vectors. It allows the using of the lemma sequentially to invert matrix (3).

The application of the MIL [6] allows the getting of parallel forms for regularized and SW RLS algorithms. The MIL [6] is very cumbersome and the resulting mathemati-cal descriptions of parallel algorithms [6-8] are cumber-some as well.

In general case, see [14], MIL is expressed as

1 1 1 1 1 − − − − − ₌_B ₋_B _CA _DB R (6)

where A=DB-1_{C+S, C and D are matrices. The equation}

(6) is a key tool for the development of the parallel SW regularized RLS algorithms, considered in this paper. To use the equation (6), the matrices C=[y,x,y,v]=[µ0.5_χ

N

(3)

have to be created. The columns in matrix XNF(k) cause the diversity of RLS algorithms: PW, regularized PW, SW and regularized SW. 0) ..., NF M NF N m b N N m f N m f N N N NF NF N N N N N N N N M m E L d d L L O G 0 h 0 h 0 h O X 0 ρ 0 ρ 0 χ 0 χ : Init. = = = = = = = = + − = = + − = = + − = ) 1 ( , , , 2 , 1 , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , 0 ) 1 0 ( , 0 ) 0 ( , ) 1 0 ( ,..., ) 0 ( , ) 1 0 ( ,..., ) 0 ( ) ( ) ( ) ( 2 ) ( K

δ

3. Parallel RLS

Algorithms

The using of the equation (6) allows the getting of a RLS algorithm of adaptive filtering in parallel form:

K k=_{1 K},2, , For 0) ) , , , 1 , , , , , 1 ( ) 0 ( , ) 0 ( , ) 0 ( , 0 ) 1 0 ( ,..., 0 ) 0 ( , ) 1 0 ( ,..., ) 0 ( , ) 1 0 ( ,..., ) 0 ( 1 1 2 1 1− − − − = = = = = + − = = + − = = + − = M N N N N N N N NF NF N N N N N N N N diag L d d L L

λ

δ

K K K Λ 0 h Λ R O X 0 ρ 0 ρ 0 χ 0 χ : Init. 1 , , 1 , − _K =M M m For 1) ( )(_k) ( )(_k) ( ) (_k 1) (m)(_k) NF H m f N m F m f F x h X α = − − 2) ) ( ) 1 ( ) ( ) ( ) 1 ( ) ( ) ( ) ( k k N k k m NF H m b N m m F m b F − × × − − − = X h x α K k=_{1 K},2, , For 3) ( )(_k) ( )(_k 1) ( )(_k) f(m)H(_k) F m NF m f N m f N h G α h = − + 1) ) ( ) 1 ( ) ( ) ( ) 1 ( ) ( ₁ 1 k k k k k k NF N H NF F NF N NF X R X S X R − + − = − ₋

λ

G 4)

[

m

]

_F NF H m f N m F m f F k x k h k X k S e ( )( )₌ ( )( )₋ ( ) ( ) ( )( ) 2)

[

]

) 1 ( ) ( ) ( ) 1 ( ) ( 1 1 1 1 − × × − − = − − − − k k k k k N H NF NF N N R X G R R

λ

5) ) ( ) ( ) 1 ( ) ( ) ( ) ( ) ( ) ( k k k E k E H m f F m f F m f N m f N α e × × + − =λ 6)

[

]

[

_m_H

]

H NF T F m f N m f F H H m f N m F N k k E k k k ) ( , ) ( ) ( ) ( , 1 ) ( ) ( ) ( ) ( ) ( ) ( ) 1 ( G 0 e h G + × × − = + 3) α (k) (k) (k 1) NF(k) H N F F =d −h − X 4) h (k) (k 1) (k) H(k) F NF N N =h − +G α 7)

{

}

_      = + + + ) ( ) ( ) ( ₍ ₎ ) ( ) ( ) 1 ( ) ( 1 ) ( 1 k k k _m F m NF m F N T m N m N q Q G T S k for End

The procedure uses a number of matrix and vector computations that are consequently the vector and scalar ones in the proper sequential RLS algorithms. The matrix of Kalman gains GNF(k) contains F columns. The columns are computed independently each other. So, the matrix can be computed by means of F processors, i.e. in parallel. In the algorithm the vector dF(k) is defined as [µ0.5

d(k-L),d(k),0,0], and error signal at the output of adaptive filter,

see Fig. 1, is defined as αN,χ(k)=d(k)-hHN(k-1)χN(k)=

αF(2)(k), where αF(2)(k) means the second element of the error vector αF(k). Vectors dF(k) and αF(k) are row-vectors.

8)

[

]

[

₍ ₎ ₍ ₎

]

1 ) ( ) ( ) ( ) 1 ( ) ( ) ( ) ( ) 1 ( ) ( ~ ) ( − − − × × − + = k k k k k k m F H m b F F m F m b N m NF m NF q α I q h Q G 9) ( )(_k) ( )(_k 1) ( 1)(_k) b(m)H(_k) F m NF m b N m b N h G α h ₌ ₋ ₊ − m for End 10) α (k) (k) (k 1) NF(k) H N F F =d −h − X 11) h (_k) (_k 1) (0)(_k) H(_k) F NF N N =h − +G α

There is a distinction of the algorithm from the se-quential RLS algorithms. Denominator in the equation (2) is a scalar variable in a sequential algorithm, while it is a matrix with FxF elements in the parallel algorithm. The matrix ensures the mathematic identity of the sequential and parallel RLS algorithms. Because F≤4, the matrix inversion does not effect the algorithm complexity if

N>>F. So, the complexity of the parallel RLS algorithm is O(N2_{F) arithmetic operations per iteration. Similarly, fast}

(computationally efficient, O(NF) complexity) parallel RLS algorithms can be developed on basis of use of the least squares linear prediction theory [15]. A parallel ver-sion of the multichannel Fast Kalman (FK) algorithm is shown below. 12) G( )(_k 1) (0)(_k) NF M NF + =G k for End

In the algorithm, the vectors and

are defined as ) ( ) ( k m F x ) ( ) ( m m F k−N x

[

0.5_x (_k), m m

µ

(k−L),x and

]

) ( ),

ξρ

_m k ( 5 . 0 _k _L m

ξρ

µ

−

[

0.5_x _L), m −

µ

( ),ξρ_m k ) ( ) (m _k F ( NF X (k−N_m

]

) m N − ) ( ) (m _k f F q ) ( ) _k m . The vectors , , , and

are row-vectors. The matrices are com-( 5 . 0 m k− ξρ ) k f(m)( F e ), ( _m m k N x − µ ( ) (m f F α ) ( ) (m _k F q m L N − ) k _αb

(4)

posed as ( )₍_k₎=

[

µ

0.5 ( )₍_k−_L_), (m)₍_k_),

µ

0.5

ξ

× N m N m NF χ χ X

]

) ( ), (m) _k N ρ

ξ

) ( ) (m _k N χ

[

]

[

]

[

]

, ) 1 ( ), 1 ( , ), 1 ( ), 1 ( , ) ( , ), ), 1 ( , ), 1 ( ), 1 ( , ) ( , ), ( , ), ( ), 1 ( ), ( 2 1 2 1 2 1 T T N T N T N T T N T N T N T N T T N T N T N T N N k k k k k k k k k k k k k m M m M m − − − − = − − − = − = = x x x x x x x x x x x χ K K K K K ) (k−L

[

]

[

]

[

]

T T N T N T N T T N T N T N T N T T N T N T N N L k L L k L k L L k L k L L k L k L L k L L k L k L L k L M M m M ) 1 ( , ), 1 , ), 1 ( ), 1 ( ) , ) ( , ), ( ), 1 , ), 1 ( ), 1 ( ) , ) ( , ), , ), ( ), 1 ( ) ), ( ) 2 1 1 2 1 2 1 − − − − − − − − = − − − − − − − − − = − − − − − − = − − = − + x x x x x x x x x x χ K K K K K K ) (k

[

]

[

]

[

]

, ) 1 ( ), 1 ( , ), 1 ( ), 1 ( , ) ( , ), ), 1 ( , ), 1 ( ), 1 ( , ) ( , ), ( , ), ( ), 1 ( ), ( 2 1 2 1 2 1 T T N T N T N T T N T N T N T N T T N T N T N T N N k k k k k k k k k k k k k m M m M m − − − − = − − − = − = = p p p p p p p p p p p p K K K K K ) ( ) (m _k _L N −

. The columns of the matrix are composed from the vectors. Vectors are composed as ( ) (m _k _L N ρ − × , ) ( ( ) ( ) ( ) ( 1 ) ( ) ( ) 1 ( ) 0 ( T N M N T N m N N N k k k k k M m+ x χ x χ χ χ K M M ) (m N T N M N T N m N T N N N k k k k k k k m m m ( ( ( ( ( ( ( ) ( ) ( ) 1 ( ) 0 ( x χ x χ x χ χ M M ) (m N , ) ( ( ) ( ) ( ) ( 1 ) ( ) ( ) 1 ( ) 0 ( T N M N T N m N N N k k k k k M m+ p ρ p ρ ρ ρ K M M

[

]

[

]

[

]

. ) 1 ( , ), 1 ( , ), 1 ( ), 1 ( ) ( , ) ( , ), ( ), 1 ( , ), 1 ( ), 1 ( ) ( , ) ( , ), ( , ), ( ), 1 ( ) ( ), ( ) ( 2 1 1 2 1 2 1 ) ( ) ( ) 1 ( ) 0 ( T T N T N T N T N M N T T N T N T N T N T N m N T T N T N T N T N N N N L k L k L k L k L k L k L k L k L k L k L k L k L k L k L k L k L k L k M m M m m M m − − − − − − − − = − − − − − − − − − = − − − − − − = − − = − + p p p p ρ p p p p p ρ p p p p ρ ρ ρ K K M K K M K K

Permutation matrices and T enable the building of the multichannel fast RLS algorithms with unequal number of weights in channels [16]. The using of the matrices does not require the additional arithmetic operations. The matrices just rearrange Kalman gains. To establish the rearranging rules for the given M and N

) ( 1 m N+ S mT N ) ( 1 + m, the matrices product can be calculated in advance. Here, the matrices are for the case M=3:

vectors χ are composed as

              = + 3 2 3 1 3 3 3 2 2 1 2 2 3 1 2 1 1 1 3 2 1 1 ) 1 ( 1 N N N N N N N N N N N N N N N N N N T N T N T N T N I Ο Ο 0 Ο I Ο 0 Ο Ο I 0 0 0 0 T ,               = + 3 2 3 1 3 3 3 2 2 1 2 2 3 2 1 3 1 2 1 1 1 1 ) 2 ( 1 N N N N N N N N N N N N T N T N T N N N N N N N T N I Ο Ο 0 Ο I Ο 0 0 0 0 Ο Ο I 0 T ,               = + 3 2 3 1 3 3 3 2 1 3 2 2 1 2 2 3 1 2 1 1 1 1 ) 3 ( 1 N N N N N N T N T N T N N N N N N N N N N N N N T N I Ο Ο 0 0 0 0 Ο I Ο 0 Ο Ο I 0 T ,

vectors ρ are composed as

              = + T N T N T N N N N N N N N N N N N N N N N N N N N 3 2 1 3 2 3 3 1 3 3 2 2 2 1 2 3 1 2 1 1 1 1 ) 1 ( 1 0 0 0 I Ο 0 Ο Ο I 0 Ο Ο Ο 0 I S ,               = + T N T N T N N N N N N N N N N N N N N N N N N N N 3 2 1 3 2 3 3 1 3 3 2 2 2 1 2 3 1 2 1 1 1 1 ) 1 ( 1 0 0 0 I Ο 0 Ο Ο I 0 Ο Ο Ο 0 I S ,

(5)

6) ( )(_k) ( )(_k 1) 1~( )(_k) f(m)H(_k) F m NF m f N m f N h T e h ₌ ₋ ₊

_λ

−               = + 1 3 2 1 3 3 2 3 1 3 2 3 2 2 1 2 1 3 1 2 1 1 ) 3 ( 1 T N T N T N N N N N N N N N N N N N N N N N N N N 0 0 0 0 I Ο Ο 0 Ο I Ο 0 Ο Ο I S . 7) ( )( )₌ ( )( ) ( )( ₋1) k E k k bm N m F m b F q α 8) ~( 1)(_k) ( )(_k) ( )(_k 1) (m)(_k) F m b N m NF m NF Q h q T − = + −

The matrices for other values of M can be created

simi-larly. 9)       − = ) ( ) ( ) ( ) ( ) ( ₍ ₎ ) ( ) ( ) ( ) ( k E k k k k _f _m N m f F H m f F F m F m F e α I Φ Φ

A parallel form of Fast Transversal Filter (FTF) is based on the recursive updating of the matrices

11)

[

]

~ ( ) ) ( ) ( ) ( ) ( ) ( 1 ) ( ) ( ) ( 1 ) 1 ( k k k k k m F m b F H m F m F F m F Φ α q Φ I Φ − − − × × − =

λ

      − = ) ( ) ( ) ( ) ( ) ( ₍ ₎ ) ( ) ( ) ( ) ( k E k k k k _f _m N m f F H m f F F m F m F e α I Φ Φ 10) E ) ( ) ( ) 1 ( ) ( ) ( ) ( ) ( ) ( k k k E k H m f F m f F m f N m f N α e × × + − =λ and

[

]

~ ( ). ) ( ) ( ) ( ) ( ) ( 1 ) ( ) ( ) ( 1 ) 1 ( k k k k k m F m b F H m F m F F m F Φ α q Φ I Φ − − − × × − =

λ

12) e ( )( ) ( )( ) ( 1)( ) k k k m F m b F m b F − =α Φ 13) E ( )(_k) _E ( )(_k 1) ( )(_k) b(m)H(_k) F m b F m b N m b N =

λ

− +e α

The matrices allow the calculations of Kalman gains

14) h ) ( ) ( ~) ( 1) ( ) ( ) 1 ( 1 ) ( ) ( k k k k H m b F m NF m b N m b N e T h − − + + − =

λ

) ( ) ( ~ ) ( ) (_k (0) _k 1 (0) _k (0) _k F NF NF NF G T Φ G = =

λ

− ,

and the vectors and e , which are used in the updating of prediction error energies

) ( ) (m _k f F e b(m)(_k) F End for m 15) α (k) (k) (k 1) NF(k) H N F F =d −h − X ) ( ) ( ) 1 ( ) ( ( ) ( ) ( ) ) ( _k _E _k _k _k E f mH F m f F m f N m f N =

λ

− +e α , 16) e (_k) (_k) (0)(_k) F F F =α Φ and 17) (_k) (_k 1) 1~( 1)(_k) H(_k) F m NF N N h T e − − + − =

λ

h ) ( ) ( ) 1 ( ) ( ( ) ( ) ( ) ) ( _k _E _k _k _k E bmH F m b F m b N m b N =

λ

− +e α . 18) ~( )(_k 1) ~(0)(_k), ( )(_k 1) (0)(_k) F M F NF M NF + =T Φ + =Φ T

The parallel FTF algorithm is described as:

0) F M F NF M NF N m b N N m f N N m b N m f N N N NF NF N N N N N N N N M m E E L d d L L m S Φ O T 0 h 0 h 0 h O X 0 ρ 0 ρ 0 χ 0 χ : Init. = = = = = = = = = = + − = = + − = = + − = − ) 1 ( , ) 1 ( ~ (0) , (0) , 1,2, , , , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , 0 ) 1 0 ( ,..., 0 ) 0 ( , ) 1 0 ( ,..., ) 0 ( , ) 1 0 ( ,..., ) 0 ( ) ( ) ( ) ( ) ( 2 ) ( 2 ) ( K

λ

δ

k for End

A parallel form of Fast a Posteriori Error Sequential Technique (FAEST) algorithm is distinguished from the parallel FTF algorithm by the recursive update of the in-verse matrices

[

( )(_k)

] [

1 ( )(_k)

]

1 1 ( ) (_k) (m)(_k) F H m f F m F m F Φ α η Φ − ₌ − ₊

_λ

− _, and K k=_{1 K},2, , For

[

( 1)(_k)

] [

1 ( )(_k)

]

1 1 ( ) (_k)~(m)(_k) F H m b F m F m F Φ α q Φ − − ₌ − ₋

_λ

− _. 1 , , 1 , − K =M M m For

Similarly, a parallel form of a stabilized FAEST algo-rithm can be developed. A multichannel parallel version of the algorithm is shown below. The details of the single channel prototype of the algorithm can be found in [17]. 1) α ( )( ) ( )( ) ( ) ( 1) ( )( ) k k k k m NF H m f N m F m f F =x −h − X 2) e ( )(_k) ( )(_k) (m)(_k) F m f F m f F =α Φ 0) F M F NF M NF N m b N N m f N N m b N m f N N N NF NF N N N N N N N N M m E E L d d L L m S Φ O T 0 h 0 h 0 h O X 0 ρ 0 ρ 0 χ 0 χ : Init. = = = = = = = = = = + − = = + − = = + − = − ) 1 ( , ) 1 ( ~ (0) , (0) , 1,2, , , , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , 0 ) 1 0 ( , 0 ) 0 ( , ) 1 0 ( ,..., ) 0 ( , ) 1 0 ( ,..., ) 0 ( ) ( ) ( ) ( ) ( 2 ) ( 2 ) ( K

λ

δ

..., 3) ( )(_k)₌ ( )(_k) _Ef(m)(_k ₋1) N m f F m F α η 4) T

[

]

[

_m

]

H NF T F m F H H m f N m F N k k k k ) ( ~ , ) ( ) ( , 1 ) ( ~ ) ( ) ( ) ( ) ( ) 1 ( T 0 η h + + − = + 5)

{

}

_      = + + + ) ( ) ( ) ( ~ ) ( ) ( ) ( ) 1 ( ) ( 1 ) ( 1 k k k _m F m NF m F N T m N m N q Q T T S

(6)

K k=_{1 K},2, , For 24) α (k) (k) H(k 1) _NF(k) N F F =d −h − X 1 , , 1 , − K =M M m For 25) e (_k) (_k) (0)(_k) F F F =α Φ 25) (_k) (_k 1) 1~(0)(_k) H(_k) F NF N N =h − +

λ

−T e h 1) α ( )(_k) ( )(_k) ( ) (_k 1) (m)(_k) NF H m f N m F m f F =x −h − X 25) ~( )(_k 1) ~(0)(_k), ( )(_k 1) (0)(_k) F M F NF M NF + =T Φ + =Φ T 2) e ( )( ) ( )( ) ( )( ) k k k m F m f F m f F =α Φ 3) ( )(_k)₌ ( )(_k) _Ef(m)(_k₋1) N m f F m F α η _End _for _k

It is well known, that RLS and fast RLS adaptive fil-tering algorithms can be also developed by the using QRD. If the adaptive filter weights are required, the inverse QRD has to be used. Usually, QRD RLS algorithms have the square root operations. The operations can be excludes by means the scaling of the variables involved in calculations [18]. Using the duality between the fast RLS algorithms and the fast QRD based least squares algorithms [19], parallel multichannel version of the square root free in-verse QRD fast RLS algorithm [20] can be developed. The algorithm is presented below.

4) _      +       − − = + ) ( ~ ) ( ) 1 ( 1 ) ( ~ ) ( ) ( ) ( ) ( ) 1 ( k k k k _m NF T F m F m f N m F N T 0 η h T 5)

{

}

_      = + + + ) ( ~ ( ) ~ ) ( ~ ) ( ) ( ) ( ) 1 ( ) ( 1 ) ( 1 k k k _m F m NF m F N T m N m N q Q T T S 6) ( )(_k) ( )(_k 1) 1~( )(_k) f(m)H(_k) F m NF m f N m f N =h − +

λ

−T e h 7) _E ( )(_k) _E ( )(_k 1) ( )(_k) f(m)H(_k) F m f F m f N m f N =

λ

− +e α 8)

[

( )(_k)

] [

1 ( )(_k)

]

1 1 ( ) (_k) (m)(_k) F H m f F m F m F Φ α η Φ − = − +

λ

− 0) d F M B F NF M NF N m b N N m f N N m b N m f N N N NF NF N N N N N N N N M m E E L d L L m S K O G 0 h 0 h 0 h O X 0 ρ 0 ρ 0 χ 0 χ : Init. = = = = = = = = = = + − = = + − = = + − = − ) 1 ( , ) 1 ( , , , 2 , 1 , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , ) 0 ( , 0 ) 1 0 ( ,..., 0 ) 0 ( , ) 1 0 ( ,..., ) 0 ( , ) 1 0 ( ,..., ) 0 ( ) ( ) ( ) ( ) ( 2 ) ( 2 ) ( K

λ

δ

9) ( )(_k) ( )(_k _N ) ( ) (_k 1) (m1)(_k) NF H m b N m m F m b F =x − −h − X − α 10) ( )( )₌ ( )( ) ( )( ₋1) k E k k bm N m b F m F α q 11) ~ ( )(_k)₌~( )(_k)_Eb(m)(_k₋1) N m F m b F q α 12) ( ) ( ) (1 )~ ( )( ) 1 ) ( 1 ) )( 1 ( _k _K _k _K bm _k F m b F m b F α α α = + − K k=_{1 K},2, , For 13) ( ) ( ) (1 )~ ( )( ) 2 ) ( 2 ) )( 2 ( _k _K _k _K bm _k F m b F m b F α α α = + − 1 , , 1 , − K =M M m For 14) ( ) ( ) (1 )~ ( )( ) 5 ) ( 5 ) )( 5 ( _k _K _k _K bm _k F m b F m b F α α α = + − 1) α ( )(_k) ( )(_k) ( ) (_k 1) (m)(_k) NF H m f N m F m f F =x −h − X 15) ( ) ( ) (1 )~( )( ) 4 ) ( 4 ) ( _k _K _k _K m _k F m F m F q q t = + − 2) ( ) ( )

[

( )

]

1 ) ( ) ( ) (k = k Bm k − F m f F m f F α K e 16) ~( 1)(_k) ~( )(_k) ( )(_k 1) (m)(_k) F m b N m NF m NF Q h t T − = + − 17)

[

]

m _F NF H m NF m F k T k X k S Φ_ˆ( −1)₍ ₎−1=

_λ

−1~( −1) ₍ ₎ ( −1)₍ ₎+ ₃₎ ) ( ) ( ) 1 ( ) ( ) ( ) ( ) ( ) ( 1 1 ) ( ) ( k k k E k k m f F H m f F m f N m B F m B F α α K × × − + =

_λ

− − K 18)

[

]

[

]

) ( ~ ) ( ) ( ) ( ~ ) ( ) )( 5 ( 1 1 ) ( 1 ) 1 ( k k k k m F H m b F m F m F q α Φ Φ − − − − − − = λ 4)

[

]

1 ) ( ) ( ) ( ₍_k₎₌ ₍_k₎ Bm ₍_k₎− F m B F m f F K K C 5) ( )(_k) 1_E 1 ( )(_k 1)

[

( )(_k)

]

1 f(m)H(_k) F m B F m f N m f F K α − − − ₋ =

λ

s 19)

[

]

[

]

[

₍ ₁₎

]

1 3 1 ) 1 ( 3 1 ) 1 ( ) ( ~ ) 1 ( ) ( ˆ ) ( − − − − − − − + + = k K k K k m F m F m F Φ Φ Φ 6) Q or 8) ) ( ) 1 ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( k k k k k H m f F m f N m f F m NF m f NF s h C G − − − = 20) (1)( )(_k) (1)( )(_k)~(m1)(_k) F m b F m b F =α Φ − e 7) h ( )(_k) ( )(_k 1) ( )(_k) f(m)H(_k) F m NF m f N m f N =h − +G α 21) (2)( )(_k) (2)( )(_k)~(m1)(_k) F m b F m b F =α Φ − e 8) Q ( )( ) ( )( ) ( )( ) ( ) ( ) or 6) k k k k f mH F m f N m NF m f NF =G −h s 22) _E ( )(_k) _E ( )(_k 1) (2)( )(_k) b(2)(m)H(_k) F m b F m b N m b N =

λ

− +e α 9) q ( )(_k) f(m)H(_k) F m f F =s 23) ( )(_k) ( )(_k 1) 1~( 1)(_k) b(1)(m)H(_k) F m NF m b N m b N h T e h ₌ ₋ ₊

_λ

− − 10) S _      =       + + ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( 1 ) ( 1 k k k k m b F m b NF m f NF m f F T m N m N q Q Q q T m for End

(7)

11) _E ( )(_k) _E ( )(_k 1) ( )(_k) f(m)H(_k) F m f F m f N m f N =

λ

− +e α 12) α ( )(_k) ( )(_k _N ) ( ) (_k 1) (m1)(_k) NF H m b N m m F m b F − − − − =x h X 13) ) ( ) ( ) 1 ( ) ( ) ( ) ( ) ( ) ( 1 1 ) ( ) 1 ( k k k E k k m b F H m b F m b N m B F m B F α α K K − = −

_λ

− − − × × 14) ( )₍_k₎₌ ( −1)₍_k₎

[

B(m)₍_k₎

]

−1 F m B F m b F K K C 15) ( )(_k) 1_E 1( )(_k 1)

[

( )(_k)

]

1 b(m)H(_k) F m B F m b N m b F K α − − − ₋ =

λ

s 16)

[

]

[

₍ ₎

]

1 ) ( ) ( ) ( ) 1 ( ) ( ) ( ) 1 ( ) ( ) ( − − × × − + = k k k k k m b F H m b F m b N m b NF m NF C s h Q G 17) e ( ) ( )

[

( 1)

]

1 ) ( ) ( ) (k = k Bm− k − F m b F m b F α K 18) _E ( )(_k) _E ( )(_k 1) ( )(_k) b(m)H(_k) F m b F m b N m b N =

λ

− +e α 19) h ( )( ) ( )( 1) ( 1)( ) ( ) ( ) k k k k bmH F m NF m b N m b N h G α − + − = m for End 20) α (k) (k) (k 1) NF(k) H N F F =d −h − X 21) h (_k) (_k 1) (0)(_k) H(_k) F NF N N =h − +G α 22) G( )(_k 1) (0)(_k), ( )(_k 1) B(0)(_k) F M B F NF M NF + =G K + =K k for End

Similarly, the stabilized form of the parallel inverse QRD-based algorithm can be developed as well.

The identity of the parallel and the same named se-quential fast RLS algorithms is ensured by means of the

square matrices

[

and K which have

F×F elements. The matrices disappear in sequential SW

regularized fast RLS algorithms. They become the scalar variables, known in adaptive filter theory as the inverse of likehood ratios.

]

1 ) (m ₍_k₎− F Φ B(m)(_k) F

The above considered parallel algorithms can be used in adaptive filters without linear constraints and for the calculation of Kalman gain in LC RLS algorithms. In the parallel LC RLS algorithms, MIL (6) is also used for the

calculation of the matrices Γ ,

and Q ,

caused by linear constrains in (2). There are three forms of the LC RLS algorithms [13]. The parallel forms of the algorithms are developed by the using of steps similar to those considered in [13]. NJ N NJ(k) R (k)C 1 − = ) ( ) ( ) (_k _k 1 _k J NJ − =Γ Ψ

[

]

1 1₍ ₎ ₍ ₎− − _k ₌ _k NJ H NJ J C Γ Ψ NJ

The below is LC RLS algorithm, based on the matrix QNJ(k) calculation. The algorithms is parallel as the compu-tation of the matrices with F columns can be accomplished by means of F parallel processors, because the computa-tions are independent each other and depend on the

inde-pendent streams of adaptive filters input data, i.e. the col-umns of matrix XNF(k). 0)

[

]

) , , , 1 , , , , , 1 ( ) 0 ( ) 0 ( , ) 0 ( ) 0 ( ) 0 ( , ) 0 ( ) 0 ( , ) 0 ( , ) 0 ( , 0 ) 1 0 ( ,..., 0 ) 0 ( , ) 1 0 ( ,..., ) 0 ( , ) 1 0 ( ,..., ) 0 ( 1 1 1 1 2 1 1− − − − − − = = = = = = = + − = = + − = = + − = M N N N J NJ N NJ H NJ NJ NJ NJ N NJ N N NF NF N N N N N N N N diag L d d L L

λ

δ

K K K Λ f Q h Γ C Γ Q C R Γ Λ R O X 0 ρ 0 ρ 0 χ 0 χ : Init. K k=_{1 K},2, , For 1) Calculation of G_NF(k) 2) (k) NF(k) H NJ JF C G V = 3) (k)= (k) NJ(k−1) H NF H JF X Q N 4)

[

]

      − + × × − − = ′ ) ( ) ( ) ( ) ( ) ( ) ( ) 1 ( ) ( k k k k k k k k JF H JF F H JF JF J H JF NF NJ NJ V N I N V I N G Q Q 5)

(

)

[

( )

]

) ( ) ( 1 k k k NJ H NJ J NJ H NJ NJ NJ NJ Q C I C C C Q Q ′ − × × + ′ = − 6) (k) (k) (k 1) NF(k) H N F F d h X α = − − 7) (k) (k 1) (k) H(k) F NF N N h G α h′ = − + 8) (k) (k) (k)

[

N(k)

]

H NJ J NJ N N h Q f C h h = ′ + − ′ k for End

4. Simulation

Simulations, which confirm the considered parallel algorithms efficiency, are presented in the section in Fig. 2 – Fig. 5. Multichannel versions of PW LC RLS and un-regularized SW LC RLS algorithms are compared in a three channel linear filter identification problem and non-stationary (speech) input signals. Filter performance is observed during 50000 iterations, corresponding to the same number of signal samples. In Fig. 2 and Fig. 4, the vertical arrows indicate the mentioned frequencies and horizontal dashed line indicates the level of constraints.

Fig. 2 demonstrates that two linear constraints 0 dB of adaptive filter transfer function |H(f)|, applied at f=1 kHz and f=2 kHz selected frequency points, are provided by the both algorithms. The Echo Return Loss Enhancement (ERLE) is frequently used in the evaluation of the quality of an identification problem solution. The ERLE is the ratio of the energy of the echo signal d(k) and the energy of

(8)

the suppressed echo αN,χ(k), measured at each iteration of adaptive filtering algorithm as follows:

      =

∑

+ − = + − = k B k i N k B k i i i d k ERLE 1 2 , 1 2 10 ( ) ( ) log 10 ) (

α

χ

Here, the length B is approximately selected as 30 ms that corresponds to a stationarity interval of speech. From Fig. 3, it can be seen that the performance of the SW algorithm in terms of ERLE is better, meaning less error at the adap-tive filter output in comparison with PW ones.

It is interesting to examine the behavior of the algo-rithms as presented in Fig. 3. At the beginning of the adap-tation process, both algorithms provide approximately similar performance. However, the PW algorithm does not properly track the changing processed signals. At the same time, the SW algorithm demonstrates better performance (approximately 20 dB higher ERLE in the considered case) due to its ability to track the change in the input signals. In the examples, L=256 samples or approximately 30 ms. The window length for ERLE calculation is also 256 samples.

Fig. 4. Simulations results, transfer function: 1 - SW LC RLS

algorithm with regularization, 2 - SW LC RLS algorithm without regularization.

Fig. 5. Simulations results, ERLE: 1 - SW LC RLS algorithm

with regularization, 2 - SW LC RLS algorithm without regularization.

The improvement of the SW RLS algorithms is achieved by means of dynamic regularization of correlation matrix, see Fig. 4 and Fig. 5. The regularized algorithm also pro-vides the mentioned constraints of adaptive filter transfer function. Besides, average ERLE in the case is compared with unregularized algorithm and even higher. It means more stable performance of the regularized algorithms in comparing with unregularized ones.

Fig. 2. Simulations results, transfer function: 1 - PW LC RLS

algorithm without regularization, 2 - SW LC RLS algo-rithm without regularization.

So, the above results confirm the operation of the pro-posed algorithms. Better efficiency of constrained SW RLS algorithms was also demonstrated in comparison with PW algorithms, in situations where the input signals of the adaptive filter are nonstationary. Furthermore, algorithm improvement is achieved due to the regularization of the correlation matrix inversion.

5. Conclusion

Fig. 3. Simulations results, ERLE: 1 - PW LC RLS algorithm

without regularization, 2 - SW LC RLS algorithm

(9)

mul-possible modifications of correlation matrix, was presented in the paper. The parallel algorithms are mathematically identical to the same named sequential algorithms. Identity means the same performance if adaptive filters have the same parameters and process the same signals. Total num-ber of arithmetic operations of the parallel algorithms and the same named sequential algorithms is approximately the same. However, if F processors are used, the computa-tional load per processor is decreased F times in the paral-lel algorithms. These algorithms can be used in all tradi-tional applications of adaptive filters.

References

[1] SAYED, A. H. Fundamentals of Adaptive Filtering. Hoboken, NJ: John Wiley and Sons, Inc., 2003.

[2] BENESTY, J., HUANG, Y. (Eds.). Adaptive Signal Processing:

Applications to Real-World Problems. New York: Springer-Verlag,

2003.

[3] DJIGAN, V. I. Multichannel RLS and fast RLS adaptive filtering algorithms. Successes of Modern Radioelectronics. 2004, no. 11, p. 48 - 77. (in Russian).

[4] PETRICHKOVICH Y. Y., SOLOKHINA, T. V. MULTICORE signal controllers - new Russian family of SoC. In Proceedings of

6-th International Conference on Digital Signal Processing and its Applications. Moscow (Russia), 2004, vol.1, p. 8 - 15. (in Russian).

[5] GAY, S. L. Dynamically regularized fast RLS with application to echo cancellation. In Proceedings of the International Conference on

Acoustic Speech and Signal Processing. Atlanta (USA), 1996, p. 957

- 960.

[6] PAPAODYSSEYS, C. A robust, parallelizable, O(m), a posteriori recursive least squares algorithm for efficient adaptive filtering.

IEEE Trans. Signal Processing, 1999, vol. 47, no. 9, p. 2552 - 2558.

[7] DJIGAN, V. I. Parallelizable sliding window regularized fast RLS algorithm for multichannel linearly constrained adaptive filtering. In

Proceedings of the 10-th International Conference on Radars, Navi-gation and Communication (RLNC-2004). Voronezh (Russia), 2004,

vol. 1, p. 132 - 142. (in Russian).

[8] DJIGAN, V. I. Multichannel fast RLS adaptive filtering algorithm for parallel implementation by means of four processors. In

Proceed-ings of Bauman’s Moscow State Technical University, 2005, no. 1, p.

83 - 99. (in Russian).

[9] DJIGAN, V. I. On parallel implementation of adaptive filtering algorithms. In Proceedings of the 8–th International Conference on

Pattern Recognition and Information Processing (PRIP–2005).

Minsk (Belarus), 2005, vol. 1, p. 101 - 104.

[10] DJIGAN, V. I. Fast RLS with parallel computations. In Proceedings

of the IEEE 7–th Emerging Technologies Workshop: “Circuits and Systems for 4G Mobile Wireless Communications”. S. Petersburg

(Russia), 2005, p. 42 - 45.

[11] DJIGAN, V. I. Parallel multichannel fast RLS adaptive filtering algorithm based on inverse QR decomposition. In Proceedings of the

Second IASTED International Multi-Conference on ACIT.

Novosi-birsk (Russia), 2005, vol. SIP, p. 170 – 175.

[12] DJIGAN, V. I. Parallel linearly–constrained recursive least squares for mulitchannel adaptive filtering. In Proceedings of St. Petersburg

IEEE Chapters: International Conference “Radio – That Connects Time. 110 Years of Radio Invention”. S. Petersburg (Russia), 2005,

vol. 2, p. 134 -139.

[13] RESENDE, L. S., ROMANO, J. M. T., BELLANGER, M.G. A fast least-squares algorithm for linearly constrained adaptive filtering.

IEEE Trans. Signal Processing, 1996, vol. 44, no. 5, p. 1168 - 1174.

[14] GIORDANO, A. A., HSU, F. M. Least Square Estimation with

Application to Digital Signal Processing. Toronto: John Wiley and

Sons, Inc., 1985.

[15] ZELNIKER, G., TAYLOR, F. J. Advanced Digital Signal

Process-ing: Theory and Applications. New York: Marcel Dekker, Inc., 1994.

[16] GLENTIS, G.-O. A., KALOUPTSIDIS N. Fast adaptive algorithms for multichannel filtering and system identification. IEEE Trans.

Sig-nal Processing, 1992, vol. 40, no. 10, p. 2433 - 2458.

[17] SLOCK, D.T.M., KAILATH, T. Numerically stable fast transversal filters for recursive least squares adaptive filtering. IEEE Trans.

Sig-nal Processing. 1991, vol. 39, no. 1, p. 92 - 114.

[18] HSIEH, S. F., LIU, K. J. R. A unified square-root-free approach for QRD based recursive least squares estimation. IEEE Trans. Signal

Processing, 1993, vol. 41, no. 3, p. 1405 - 1409.

[19] GLENTIS, G.-O. A. On the duality between the fast transversal and the fast QRD adaptive least squares algorithms. IEEE Trans. Signal

Processing. 1999, vol. 47, no. 8, p. 2317 - 2321.

[20] PROUDER, I. K. Fast time-series adaptive-filtering algorithm based on the QRD inverse-updates method. IEE Proceedings: Vision,

Im-age and Signal Processing, 1994, vol. 141, no. 5, p. 325 - 333.

About Author...

Victor I. DJIGAN was born in 1958 in Ukraine. He re-ceived a B.S. degree with honors from the Vinnitsa Tech-nical College of Electronic Devices, Ukraine, in 1978; M.S. degree with honors from the Moscow Institute of Electronic Engineering (MIEE) - Technical University, Russia, in 1984, and Ph.D. degree from MIEE in 1990, all in radio engineering. From 1984 to the present he has been holding different R&D positions in MIEE, in a number of Russian companies and in such international companies as Samsung Electronics, LG and PMC-Sierra outside Russia, where he worked in the fields of digital signal processing; communication; adaptive antenna arrays; acoustic echo and noise cancellation; transmitting line analysis, simulation and reflectometry. Presently, he is a Principal Researcher in ELVEES R&D Center, Moscow, Russia, and a Scien-tific Secretary of the Company. His scienScien-tific interests include adaptive signal processing, speech processing and digital communication. He is the author of about 100 pa-pers in the above-mentioned fields, Member of the Russian A.S. Popov Society for Radioengineering, Electronics & Communications, and IEEE Senior Member (M’96, SM’2004).