
Lecture 2

During this lecture you will learn about

• The Least Mean Squares (LMS) algorithm

• Convergence analysis of the LMS

• Channel equalizer

Adaptive Signal Processing 2011 Lecture 2

Background

The method of steepest descent (SD), which was studied in the last lecture, is a recursive algorithm for calculating the Wiener filter when the statistics of the signals are known (knowledge of $\mathbf{R}$ and $\mathbf{p}$). The problem is that this information is often unknown!

LMS is a method based on the same principles as the method of steepest descent, but where the statistics are estimated continuously.

Since the statistics are estimated continuously, the LMS algorithm can adapt to changes in the signal statistics. The LMS algorithm is thus an adaptive filter.

Because the statistics are estimated, the gradient becomes noisy. The LMS algorithm belongs to a group of methods referred to as stochastic gradient methods, while the method of steepest descent belongs to the group of deterministic gradient methods.

The Least Mean Square (LMS) algorithm

We want to create an algorithm that minimizes $E\{|e(n)|^2\}$, just like the SD, but based on unknown statistics.

A strategy that can then be used is to use estimates of the autocorrelation matrix $\mathbf{R}$ and the cross-correlation vector $\mathbf{p}$. If instantaneous estimates are chosen,

$\hat{\mathbf{R}}(n) = \mathbf{u}(n)\mathbf{u}^H(n)$

$\hat{\mathbf{p}}(n) = \mathbf{u}(n)d^*(n)$

the resulting method is the Least Mean Squares algorithm.

The Least Mean Square (LMS) algorithm

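For the SD, the update of the filter weights is given by

$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{1}{2}\mu[-\nabla J(n)]$

where $\nabla J(n) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)$.

In the LMS we use the estimates $\hat{\mathbf{R}}$ and $\hat{\mathbf{p}}$ to calculate $\hat{\nabla} J(n)$. Thus, the updated filter vector also becomes an estimate. It is therefore denoted $\hat{\mathbf{w}}(n)$:

$\hat{\nabla} J(n) = -2\hat{\mathbf{p}}(n) + 2\hat{\mathbf{R}}(n)\hat{\mathbf{w}}(n)$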

The Least Mean Square (LMS) algorithm

For the LMS algorithm, the update equation of the filter weights becomes

$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)e^*(n)$

where $e^*(n) = d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)$.

Compare with the corresponding equation for the SD:

$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,E\{\mathbf{u}(n)e^*(n)\}$

where $e^*(n) = d^*(n) - \mathbf{u}^H(n)\mathbf{w}(n)$.

The difference is thus that $E\{\mathbf{u}(n)e^*(n)\}$ is estimated as $\mathbf{u}(n)e^*(n)$. This leads to gradient noise.
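The update above can be sketched as a single function in plain Python with complex numbers. The function name `lms_step` and the example values are illustrative assumptions and do not come from the lecture.

```python
def lms_step(w, u, d, mu):
    """One LMS iteration for complex-valued signals.

    w  : current weight estimates w_hat(n), list of complex
    u  : tap-input vector u(n), list of the same length as w
    d  : desired sample d(n)
    mu : step size
    Returns (w_next, e) with e = d(n) - w_hat^H(n) u(n).
    """
    # Filter output y(n) = w_hat^H(n) u(n)
    y = sum(wk.conjugate() * uk for wk, uk in zip(w, u))
    e = d - y
    # w_hat(n+1) = w_hat(n) + mu * u(n) * e*(n)
    w_next = [wk + mu * uk * e.conjugate() for wk, uk in zip(w, u)]
    return w_next, e


# Hypothetical example values: two taps, zero start weights
w, e = lms_step([0j, 0j], [1 + 0j, 0.5j], 1 + 1j, 0.1)
```

Note that no expectation is computed anywhere: the product `uk * e.conjugate()` is the instantaneous estimate that replaces $E\{\mathbf{u}(n)e^*(n)\}$.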

The Least Mean Square (LMS) algorithm

Illustration of gradient noise, with $\mathbf{R}$ and $\mathbf{p}$ as in the example from Lecture 1 and $\mu = 0.1$.

[Figure: two contour plots of the cost function over $(w_0, w_1)$, both axes from $-2$ to $2$, showing the weight trajectory for the SD (one panel) and for the LMS (other panel).]

Because of the noise in the gradient, the LMS never reaches $J_{min}$, which the SD does.

Convergence analysis of the LMS, overview

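Consider the update equation

$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)e^*(n)$.

We want to know how fast, and towards what, the algorithm converges in terms of $J(n)$ and $\hat{\mathbf{w}}(n)$.

Strategy:

1. Introduce the filter weight error vector $\boldsymbol{\epsilon}(n) = \hat{\mathbf{w}}(n) - \mathbf{w}_o$.

2. Express the update equation in $\boldsymbol{\epsilon}(n)$.

3. Express $J(n)$ in $\boldsymbol{\epsilon}(n)$. This leads to $\mathbf{K}(n) = E\{\boldsymbol{\epsilon}(n)\boldsymbol{\epsilon}^H(n)\}$.

4. Calculate the difference equation in $\mathbf{K}(n)$, which controls the convergence.

5. Make a variable change from $\mathbf{K}(n)$ to $\mathbf{X}(n)$, which converges equally fast.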

1. The filter weight error vector

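The LMS contains feedback, and just like for the SD there is a risk that the algorithm diverges if the step size $\mu$ is not chosen appropriately. In order to investigate the stability, the filter weight error vector $\boldsymbol{\epsilon}(n)$ is introduced:

$\boldsymbol{\epsilon}(n) = \hat{\mathbf{w}}(n) - \mathbf{w}_o$ (with $\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}$ from the Wiener-Hopf equations)

Note that $\boldsymbol{\epsilon}(n)$ corresponds to $\mathbf{c}(n) = \mathbf{w}(n) - \mathbf{w}_o$ for the Steepest descent.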

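2. Update equation in $\boldsymbol{\epsilon}(n)$

The update equation for $\boldsymbol{\epsilon}(n)$ at time $n+1$ can be expressed recursively as

$\boldsymbol{\epsilon}(n+1) = [\mathbf{I} - \mu\hat{\mathbf{R}}(n)]\boldsymbol{\epsilon}(n) + \mu[\hat{\mathbf{p}}(n) - \hat{\mathbf{R}}(n)\mathbf{w}_o]$

$= [\mathbf{I} - \mu\,\mathbf{u}(n)\mathbf{u}^H(n)]\boldsymbol{\epsilon}(n) + \mu\,\mathbf{u}(n)e_o^*(n)$

where $e_o(n) = d(n) - \mathbf{w}_o^H\mathbf{u}(n)$.

Compare to that of the SD:

$\mathbf{c}(n+1) = (\mathbf{I} - \mu\mathbf{R})\mathbf{c}(n)$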
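The recursion for the weight error can be checked by subtracting $\mathbf{w}_o$ from both sides of the LMS update. The following worked derivation is a sketch that uses only the definitions already given, with $e_o^*(n) = d^*(n) - \mathbf{u}^H(n)\mathbf{w}_o$:

```latex
\begin{align*}
\boldsymbol{\epsilon}(n+1)
 &= \hat{\mathbf{w}}(n+1) - \mathbf{w}_o \\
 &= \hat{\mathbf{w}}(n) - \mathbf{w}_o
    + \mu\,\mathbf{u}(n)\,[d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)] \\
 &= \boldsymbol{\epsilon}(n)
    + \mu\,\mathbf{u}(n)\,[d^*(n) - \mathbf{u}^H(n)\mathbf{w}_o
    - \mathbf{u}^H(n)\boldsymbol{\epsilon}(n)] \\
 &= [\mathbf{I} - \mu\,\mathbf{u}(n)\mathbf{u}^H(n)]\,\boldsymbol{\epsilon}(n)
    + \mu\,\mathbf{u}(n)\,e_o^*(n)
\end{align*}
```

Unlike the SD recursion, the matrix multiplying $\boldsymbol{\epsilon}(n)$ is random and there is a driving term $\mu\,\mathbf{u}(n)e_o^*(n)$, which is why the weight error does not decay to zero.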

3. Express $J(n)$ in $\boldsymbol{\epsilon}(n)$

The convergence analysis of the LMS is considerably more complicated than that for the SD. Therefore, two tools (approximations) are needed. These are

• Independence theory

• Direct-averaging method

3. Express $J(n)$ in $\boldsymbol{\epsilon}(n)$, cont.

Independence theory

1. The input vectors at different time instants, $\mathbf{u}(1), \mathbf{u}(2), \ldots, \mathbf{u}(n)$, are independent of each other (and thereby uncorrelated).

2. The input signal vector at time $n$ is independent of the desired signal at all earlier time instants: $\mathbf{u}(n)$ is independent of $d(1), d(2), \ldots, d(n-1)$.

3. The desired signal at time $n$, $d(n)$, is independent of all earlier desired signals, $d(1), d(2), \ldots, d(n-1)$, but dependent on the input signal vector at time $n$, $\mathbf{u}(n)$.

4. The signals $d(n)$ and $\mathbf{u}(n)$ at the same time instant $n$ are jointly normally distributed.

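3. Express $J(n)$ in $\boldsymbol{\epsilon}(n)$, cont.

Direct-averaging

Direct-averaging means that in the update equation for $\boldsymbol{\epsilon}$,

$\boldsymbol{\epsilon}(n+1) = [\mathbf{I} - \mu\,\mathbf{u}(n)\mathbf{u}^H(n)]\boldsymbol{\epsilon}(n) + \mu\,\mathbf{u}(n)e_o^*(n)$,

the instantaneous estimate $\hat{\mathbf{R}}(n) = \mathbf{u}(n)\mathbf{u}^H(n)$ is replaced by the expectation over the ensemble, $\mathbf{R} = E\{\mathbf{u}(n)\mathbf{u}^H(n)\}$, which results in

$\boldsymbol{\epsilon}(n+1) = (\mathbf{I} - \mu\mathbf{R})\boldsymbol{\epsilon}(n) + \mu\,\mathbf{u}(n)e_o^*(n)$.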

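3. Express $J(n)$ in $\boldsymbol{\epsilon}(n)$, cont.

For the SD we could show that

$J(n) = J_{min} + [\mathbf{w}(n) - \mathbf{w}_o]^H\mathbf{R}[\mathbf{w}(n) - \mathbf{w}_o]$

and that the eigenvalues of $\mathbf{R}$ control the convergence speed. For the LMS, with $\boldsymbol{\epsilon}(n) = \hat{\mathbf{w}}(n) - \mathbf{w}_o$,

$J(n) = E\{|e(n)|^2\} = E\{|d(n) - \hat{\mathbf{w}}^H(n)\mathbf{u}(n)|^2\}$

$= E\{|e_o(n) - \boldsymbol{\epsilon}^H(n)\mathbf{u}(n)|^2\}$

$\overset{(i)}{=} \underbrace{E\{|e_o(n)|^2\}}_{J_{min}} + E\{\boldsymbol{\epsilon}^H(n)\underbrace{\mathbf{u}(n)\mathbf{u}^H(n)}_{\hat{\mathbf{R}}(n)}\boldsymbol{\epsilon}(n)\}$

Here, (i) indicates that the independence assumption is used. Since $\boldsymbol{\epsilon}$ and $\hat{\mathbf{R}}$ are estimates, and in addition not independent, the expectation operator cannot simply be “moved into” $\hat{\mathbf{R}}$.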

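3. Express $J(n)$ in $\boldsymbol{\epsilon}(n)$, cont.

The behavior of

$J_{ex}(n) = J(n) - J_{min} = E\{\boldsymbol{\epsilon}^H(n)\hat{\mathbf{R}}(n)\boldsymbol{\epsilon}(n)\}$

determines the convergence.

The following holds for the trace of a matrix:

1. $\mathrm{tr}(\text{scalar}) = \text{scalar}$

2. $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$

3. $E\{\mathrm{tr}(\mathbf{A})\} = \mathrm{tr}(E\{\mathbf{A}\})$

Therefore $J_{ex}(n)$ can be written as

$J_{ex}(n) \overset{1,2}{=} E\{\mathrm{tr}[\hat{\mathbf{R}}(n)\boldsymbol{\epsilon}(n)\boldsymbol{\epsilon}^H(n)]\}$

$\overset{3}{=} \mathrm{tr}[E\{\hat{\mathbf{R}}(n)\boldsymbol{\epsilon}(n)\boldsymbol{\epsilon}^H(n)\}]$

$\overset{(i)}{=} \mathrm{tr}[E\{\hat{\mathbf{R}}(n)\}E\{\boldsymbol{\epsilon}(n)\boldsymbol{\epsilon}^H(n)\}] = \mathrm{tr}[\mathbf{R}\mathbf{K}(n)]$

The convergence thus depends on $\mathbf{K}(n) = E\{\boldsymbol{\epsilon}(n)\boldsymbol{\epsilon}^H(n)\}$.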

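4. Difference equation of $\mathbf{K}(n)$

The update equation for the filter weight error is, as shown previously,

$\boldsymbol{\epsilon}(n+1) = [\mathbf{I} - \mu\,\mathbf{u}(n)\mathbf{u}^H(n)]\boldsymbol{\epsilon}(n) + \mu\,\mathbf{u}(n)e_o^*(n)$

For small $\mu$, the solution to this stochastic difference equation is close to the solution of

$\boldsymbol{\epsilon}(n+1) = (\mathbf{I} - \mu\mathbf{R})\boldsymbol{\epsilon}(n) + \mu\,\mathbf{u}(n)e_o^*(n)$,

which results from direct-averaging.

This equation is still difficult to solve, but now the behavior of $\mathbf{K}(n)$ can be described by the following difference equation:

$\mathbf{K}(n+1) = (\mathbf{I} - \mu\mathbf{R})\mathbf{K}(n)(\mathbf{I} - \mu\mathbf{R}) + \mu^2 J_{min}\mathbf{R}$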
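A sketch of where this difference equation comes from, starting from the direct-averaged recursion. It assumes that $\boldsymbol{\epsilon}(n)$ is independent of $\mathbf{u}(n)$ and $e_o(n)$ (independence theory), that $E\{\mathbf{u}(n)e_o^*(n)\} = \mathbf{0}$ (orthogonality principle for the Wiener solution), and that $e_o(n)$ and $\mathbf{u}(n)$ are independent, which follows from being uncorrelated under the joint normality assumption:

```latex
\begin{align*}
\mathbf{K}(n+1)
 &= E\{\boldsymbol{\epsilon}(n+1)\boldsymbol{\epsilon}^H(n+1)\} \\
 &= (\mathbf{I}-\mu\mathbf{R})\,E\{\boldsymbol{\epsilon}(n)\boldsymbol{\epsilon}^H(n)\}\,(\mathbf{I}-\mu\mathbf{R})
    + \mu^2 E\{|e_o(n)|^2\,\mathbf{u}(n)\mathbf{u}^H(n)\} \\
 &\quad + \mu(\mathbf{I}-\mu\mathbf{R})\,E\{\boldsymbol{\epsilon}(n)\,e_o(n)\mathbf{u}^H(n)\}
    + \mu\,E\{\mathbf{u}(n)e_o^*(n)\,\boldsymbol{\epsilon}^H(n)\}\,(\mathbf{I}-\mu\mathbf{R}) \\
 &= (\mathbf{I}-\mu\mathbf{R})\,\mathbf{K}(n)\,(\mathbf{I}-\mu\mathbf{R}) + \mu^2 J_{min}\mathbf{R}
\end{align*}
```

The two cross terms vanish because they factor into $E\{\boldsymbol{\epsilon}(n)\}$ times a zero correlation, and the remaining noise term factors into $E\{|e_o(n)|^2\}\,E\{\mathbf{u}(n)\mathbf{u}^H(n)\} = J_{min}\mathbf{R}$.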

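5. Changing of variables from $\mathbf{K}(n)$ to $\mathbf{X}(n)$

The variable changes

$\mathbf{Q}^H\mathbf{R}\mathbf{Q} = \boldsymbol{\Lambda}$

$\mathbf{Q}^H\mathbf{K}(n)\mathbf{Q} = \mathbf{X}(n)$

where $\boldsymbol{\Lambda}$ is the diagonal eigenvalue matrix of $\mathbf{R}$, and where $\mathbf{X}(n)$ normally is not diagonal, yield

$J_{ex}(n) = \mathrm{tr}(\mathbf{R}\mathbf{K}(n)) = \mathrm{tr}(\boldsymbol{\Lambda}\mathbf{X}(n))$

The convergence can be described by

$\mathbf{X}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\mathbf{X}(n)(\mathbf{I} - \mu\boldsymbol{\Lambda}) + \mu^2 J_{min}\boldsymbol{\Lambda}$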

5. Changing of variables from $\mathbf{K}(n)$ to $\mathbf{X}(n)$, cont.

$J_{ex}(n)$ depends only on the diagonal elements of $\mathbf{X}(n)$ (because of the trace $\mathrm{tr}[\boldsymbol{\Lambda}\mathbf{X}(n)]$). We can therefore write

$J_{ex}(n) = \sum_{i=1}^{M} \lambda_i x_i(n)$.

The update of the diagonal elements is controlled by

$x_i(n+1) = (1 - \mu\lambda_i)^2 x_i(n) + \mu^2 J_{min}\lambda_i$

This is an inhomogeneous difference equation. This means that the solution will contain a transient part and a stationary part that depends on $J_{min}$ and $\mu$. The cost function can therefore be written

$J(n) = J_{min} + J_{ex}(\infty) + J_{trans}(n)$

where $J_{min} + J_{ex}(\infty)$ and $J_{trans}(n)$ are the stationary and transient solutions, respectively.

At best, the LMS can thus reach $J_{min} + J_{ex}(\infty)$.
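The scalar recursion for $x_i(n)$ is easy to iterate numerically. The sketch below does so and compares the end value with the stationary solution $x_i(\infty) = \mu J_{min}/(2 - \mu\lambda_i)$, obtained by setting $x_i(n+1) = x_i(n)$. The eigenvalues, $J_{min}$, start values and the function name `excess_mse_curve` are hypothetical values chosen only for illustration.

```python
def excess_mse_curve(eigs, mu, j_min, x0, n_iter):
    """Iterate x_i(n+1) = (1 - mu*lam_i)^2 * x_i(n) + mu^2 * J_min * lam_i
    and return J_ex(n) = sum_i lam_i * x_i(n) for n = 0, ..., n_iter."""
    x = list(x0)
    curve = [sum(lam * xi for lam, xi in zip(eigs, x))]
    for _ in range(n_iter):
        x = [(1 - mu * lam) ** 2 * xi + mu ** 2 * j_min * lam
             for lam, xi in zip(eigs, x)]
        curve.append(sum(lam * xi for lam, xi in zip(eigs, x)))
    return curve


eigs = [0.2, 1.8]        # hypothetical eigenvalues of R
mu, j_min = 0.1, 0.01    # hypothetical step size and minimum MSE
curve = excess_mse_curve(eigs, mu, j_min, [1.0, 1.0], 2000)

# Stationary excess MSE: J_ex(inf) = J_min * sum_i mu*lam_i / (2 - mu*lam_i)
j_ex_inf = j_min * sum(mu * lam / (2 - mu * lam) for lam in eigs)
```

After the transient has died out, `curve[-1]` agrees with `j_ex_inf`, which illustrates that the excess error settles at a nonzero level set by $\mu$ and $J_{min}$.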

Properties of the LMS

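• Convergence for the same interval as the SD: $0 < \mu < \frac{2}{\lambda_{max}}$

• $J_{ex}(\infty) = J_{min}\sum_{i=1}^{M}\frac{\mu\lambda_i}{2 - \mu\lambda_i}$

• The misadjustment $\mathcal{M} = \frac{J_{ex}(\infty)}{J_{min}}$ is a measure of how close to the optimal solution the LMS gets (in the mean-square sense).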

Rules of thumb LMS

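The eigenvalues of $\mathbf{R}$ are rarely known. Therefore, a set of rules of thumb is commonly used, based on the tap-input power,

$\sum_{k=0}^{M-1} E\{|u(n-k)|^2\} = M r(0) = \mathrm{tr}(\mathbf{R})$.

• The step size: $0 < \mu < \frac{2}{\sum_{k=0}^{M-1} E\{|u(n-k)|^2\}}$

• Misadjustment: $\mathcal{M} \approx \frac{\mu}{2}\sum_{k=0}^{M-1} E\{|u(n-k)|^2\}$

• Average time constant: $\tau_{mse,av} \approx \frac{1}{2\mu\lambda_{av}}$, where $\lambda_{av} = \frac{1}{M}\sum_{i=1}^{M}\lambda_i$.

When $\lambda_i$ is unknown, the following relationship is useful.

• Approximate relationship between $\mathcal{M}$ and $\tau_{mse,av}$: $\mathcal{M} \approx \frac{M}{4\tau_{mse,av}}$

Here, $\tau_{mse,av}$ denotes the time it takes for $J(n)$ to decay by a factor of $e^{-1}$.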
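The rules of thumb can be collected in a small helper. This is a sketch only: the function name `lms_rules_of_thumb` and the numbers in the example (11 taps, unit input power, $\mu = 0.01$) are assumptions made for illustration.

```python
def lms_rules_of_thumb(tap_power, m_taps, mu):
    """Rule-of-thumb quantities for the LMS.

    tap_power : sum_k E{|u(n-k)|^2} = M * r(0) = tr(R)
    m_taps    : filter length M
    mu        : step size
    Returns (mu_max, misadjustment, tau_av).
    """
    mu_max = 2.0 / tap_power                 # upper bound on the step size
    misadjustment = mu / 2.0 * tap_power     # misadjustment, approx. (mu/2) * tr(R)
    lam_av = tap_power / m_taps              # average eigenvalue, tr(R) / M
    tau_av = 1.0 / (2.0 * mu * lam_av)       # average time constant of the MSE
    return mu_max, misadjustment, tau_av


# Hypothetical example: M = 11 taps, r(0) = 1, mu = 0.01
m_taps = 11
mu_max, misadj, tau_av = lms_rules_of_thumb(tap_power=11.0, m_taps=m_taps, mu=0.01)
```

With these values the misadjustment and the time constant satisfy the approximate relationship `misadj == m_taps / (4 * tau_av)`, which shows the trade-off: halving $\mu$ halves the misadjustment but doubles the time constant.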

Summary LMS

Summary of the iterations:

1. Set start values for the filter coefficients, $\hat{\mathbf{w}}(0)$

2. Calculate the error $e(n) = d(n) - \hat{\mathbf{w}}^H(n)\mathbf{u}(n)$

3. Update the filter coefficients

$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)[d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)]$

4. Repeat steps 2 and 3
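The four steps can be sketched as a loop in plain Python. The function name `lms_filter`, the zero start values, and the two-tap system used in the usage example are assumptions for illustration; the lecture does not prescribe them.

```python
import random


def lms_filter(u, d, m_taps, mu):
    """Run the LMS over the input sequence u and the desired sequence d.

    The tap vector is u(n) = [u(n), u(n-1), ..., u(n-M+1)], with zeros
    before the first sample. Returns (w, errors): the final weights and
    the errors e(n) computed before each update.
    """
    w = [0j] * m_taps                 # step 1: start values w_hat(0) = 0
    taps = [0j] * m_taps
    errors = []
    for u_n, d_n in zip(u, d):
        taps = [u_n] + taps[:-1]      # shift the newest sample into u(n)
        y = sum(wk.conjugate() * uk for wk, uk in zip(w, taps))
        e = d_n - y                   # step 2: e(n) = d(n) - w_hat^H(n) u(n)
        # step 3: w_hat(n+1) = w_hat(n) + mu * u(n) * e*(n)
        w = [wk + mu * uk * e.conjugate() for wk, uk in zip(w, taps)]
        errors.append(e)              # step 4: the loop repeats steps 2 and 3
    return w, errors


# Usage: identify a hypothetical noise-free system d(n) = 0.5 u(n) - 0.3 u(n-1)
rng = random.Random(0)
u = [rng.choice((-1.0, 1.0)) for _ in range(500)]
d = [0.5 * u[n] - 0.3 * (u[n - 1] if n > 0 else 0.0) for n in range(500)]
w, errors = lms_filter(u, d, m_taps=2, mu=0.1)
```

Because this example has no measurement noise, $J_{min} = 0$ and the weights approach the true coefficients; with noise they would fluctuate around the Wiener solution.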


Example: Channel equalizer in training phase

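[Figure: block diagram of the equalizer in the training phase. The pn-sequence $p(n) = \ldots, 1, 1, -1, 1, -1, \ldots$ passes through the channel $h(n) = \{0, 0.2798, 1.000, 0.2798, 0\}$. White Gaussian noise $v(n)$ with $\sigma_v^2 = 0.001$ is added, giving $u(n)$, the input to the LMS-adapted FIR filter with $M = 11$, whose output is $\hat{d}(n)$. The desired signal $d(n)$ is $p(n)$ passed through the delay $z^{-7}$, and $e(n)$ is the difference between $d(n)$ and $\hat{d}(n)$.]

Sent information in the form of a pn-sequence is affected by a channel ($h(n)$). An adaptive inverse filter is used to compensate for the channel such that the total effect can be approximated by a delay. White Gaussian noise (WGN) is added during the transmission.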
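A minimal simulation sketch of this training setup, for real-valued signals and a single realization. The channel, noise variance, filter length and delay are taken from the example; the random $\pm 1$ sequence used in place of a true pn-sequence, the zero start weights, the seed, and the function name `equalizer_training` are assumptions.

```python
import random


def equalizer_training(mu, n_iter=1500, m_taps=11, delay=7, seed=0):
    """Train an LMS equalizer and return (weights, squared errors)."""
    rng = random.Random(seed)
    h = [0.0, 0.2798, 1.0, 0.2798, 0.0]   # channel h(n) from the example
    sigma_v = 0.001 ** 0.5                # noise std, variance 0.001
    # Assumption: random +/-1 symbols stand in for the pn-sequence p(n)
    p = [rng.choice((-1.0, 1.0)) for _ in range(n_iter)]
    w = [0.0] * m_taps
    taps = [0.0] * m_taps
    sq_err = []
    for n in range(n_iter):
        # Channel output plus white Gaussian noise gives u(n)
        u_n = sum(h[k] * p[n - k] for k in range(len(h)) if n - k >= 0)
        u_n += rng.gauss(0.0, sigma_v)
        taps = [u_n] + taps[:-1]
        # Desired signal: the sent symbol delayed by 7 samples
        d_n = p[n - delay] if n >= delay else 0.0
        e = d_n - sum(wk * uk for wk, uk in zip(w, taps))
        w = [wk + mu * uk * e for wk, uk in zip(w, taps)]
        sq_err.append(e * e)
    return w, sq_err


w, sq_err = equalizer_training(mu=0.025)
early = sum(sq_err[:100]) / 100    # mean squared error, first 100 iterations
late = sum(sq_err[-100:]) / 100    # mean squared error, last 100 iterations
```

Averaging `sq_err` over many independent realizations (different seeds) instead of over time would give an estimate of the learning curve $J(n)$.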

Example, cont: Learning Curve

[Figure: learning curves, average of the MSE $J(n)$ on a logarithmic axis ($10^{-2}$ to $10^{0}$) versus iteration $n$ (0 to 1500), for $\mu = 0.0075$, $\mu = 0.025$ and $\mu = 0.075$.]

Learning curves based on 200 realizations.

The decay is related to $\tau_{mse,av}$. A large $\mu$ gives fast convergence, but also a large $J_{ex}(\infty)$.

The curves illustrate learning curves for different values of $\mu$.

What to read?


• Least Mean Squares, Haykin chap.5.1–5.3, 5.4 (see lecture), 5.5–5.7

Exercises: 4.1, 4.2, 1.2, 4.3, 4.5, (4.4)

Computer exercise, theme: Implement the LMS algorithm in Matlab, and generate the results in Haykin chap.5.7