Lecture 2
1 During this lecture you will learn about• TheLeast Mean Squaresalgorithm (LMS)
• Convergence analysis of the LMS
• Equalizer (Kanalutj¨amnare)
Adaptive Signal Processing 2011 Lecture 2
Background
2The method of theSteepest descentthat was studies at the last lecture is a recursive algorithm for calculation of the Wiener filter when the statistics of the signals are known (knowledge aboutRochp). The problem is that this information is oftenly unknown!
LMS is a method that is based on the same principles as the met-hod of the Steepest descent, but where the statistics is estimated continuously.
Since the statistics is estimated continuously, the LMS algorithm can adapt to changes in the signal statistics; The LMS algorithm is thus an adaptive filter.
Because of estimated statistics the gradient becomes noisy. The LMS algorithm belongs to a group of methods referred to as stochastic gradientmethods, while the method of the Steepest descent belongs to the groupdeterministic gradientmethods.
Adaptive Signal Processing 2011 Lecture 2
The Least Mean Square (LMS) algorithm
3We want to create an algorithm that minimizesE{|e(n)|2}, just like the SD, but based on unkown statistics.
A strategy that then can be used is to uses estimates of the autocorre-lation matrixRand the cross correlationen vectorp. If instantaneous estimates are chosen,
b
R(n) =u(n)uH(n) b
p(n) =u(n)d∗(n)
the resulting method is theLeast Mean Squaresalgorithm.
The Least Mean Square (LMS) algorithm
4For the SD, the update of the filter weights is given by
w(n+1)=w(n) +1
2µ[−∇J(n)] where∇J(n) =−2p+ 2Rw(n).
In the LMS we use the estimates Rb och bp to calculate ∇Jb (n). Thus, also the updated filter vector becomes an estimate. It is therefore denotedwb(n);
b
The Least Mean Square (LMS) algorithm
5 For the LMS algorithm, the update equation of the filter weights becomesb
w(n+ 1) =wb(n) +µu(n)e∗(n) wheree∗(n) =d∗(n)−uH(n)wb(n).
Compare with the corresponding equation for the SD: w(n+ 1) =w(n) +µE{u(n)e∗(n)}
wheree∗(n) =d∗(n)−uH(n)w(n).
The difference is thus thatE{u(n)e∗(n)}is estimated asu(n)e∗(n). This leads togradient noise.
Adaptive Signal Processing 2011 Lecture 2
The Least Mean Square (LMS) algorithm
6Illustration ofgradient noise.Rochpin the example from Lecture 1. µ= 0.1. −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 1.06 1.96 2.86 3.76 4.66 5.56 6.46 7.36 7.36 7.36 8.26 8.26 8.26 9.16 9.16 10.06 10.06 10.96 10.96 11.86 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 1.06 1.96 2.86 3.76 4.66 5.56 6.46 7.36 7.36 7.36 8.26 8.26 8.26 9.16 9.16 10.06 10.06 10.96 10.96 11.86 w0 w0 w1 w1 SD LMS
Because of the noise in the gradientm the LMS never reachesJmin,
which the SD does.
Adaptive Signal Processing 2011 Lecture 2
Convergence analysis of the LMS, overview
7Consider the update equation b
w(n+ 1) =bw(n)+µu(n)e∗(n).
We want to know how fast and towards what the algorithm converges,in terms ofJ(n)andwb(n).
Strategy:
1. Introduce the filter weight error vectorǫ(n) =wb(n)−wo.
2. Express the update equation inǫ(n).
3. ExpressJ(n)inǫ(n). This leads toK(n) =E{ǫ(n)ǫH(n)}. 4. Calculate the difference equation in K(n), which controls the
convergence.
5. Make a variable change fromK(n) to X(n)which converges equally fast.
1. The filter weight error vector
8The LMS contains feedback, and just like for the SD there is a risk that the algorithm diverges, if the stepsizeµis not chosen appropriately. In order to investigate the stability, the filter weight error vector is introducedǫ(n):
ǫ(n) =wb(n)−wo (wo=R−1p Wiener-Hopf)
Note thatǫ(n)corresponds toc(n) =w(n)−wofor theSteepest
2. Update equation in
ǫ(
n
)
9 The update equation forǫ(n)at timen+1can be expressed recursively asǫ(n+1) = [I−µRb(n)]ǫ(n) +µ[bp(n)−Rb(n)wo]
= [I−µu(n)uH(n)]ǫ(n) +µu(n)e∗o(n) wheree∗o(n) =d(n)−wHou(n).
Compare to that of the SD:
c(n+ 1)= (I−µR)c(n)
Adaptive Signal Processing 2011 Lecture 2
3. Express
J
(
n
)
in
ǫ(
n
)
10The convergence analysis of the LMS is considerably more complicated than that for the SD. Therefore, two tools are needed (approximations). These are
• Independence theory
• Direct-averaging method
Adaptive Signal Processing 2011 Lecture 2
3. Express
J
(
n
)
i
ǫ(
n
)
, cont.
11Independence Theory
1. The input vectors of different time instantsn, u(1),u(2), . . . ,u(n) ,
is independet of each other (and thereby uncorrelated).
2. The input signal vector at timenis independent of the desired signal
at all earlier time instants,
u(n) idependent of d(1), d(2), . . . , d(n−1) . 3. The desired signal at timen,d(n),
is independent of all earlier desired signals,d(1), d(2), . . . , d(n−
1), but dependent of the input signal vector at timen,u(n). 4. The signalsd(n)andu(n)at the same time instantnis mutually
normally distributed.
3. Express
J
(
n
)
in
ǫ(
n
)
, cont.
12Direct-averaging
Direct-averagingmeans that in the update equation forǫ,
ǫ(n+ 1) = [I−µu(n)uH(n)]ǫ(n) +µu(n)e∗o(n), the instantaneous estimateRb(n) =u(n)uH(n)is substituted with the expectation over the ensemble,R=E{u(n)uH(n)}, such that
ǫ(n+ 1) = (I−µR)ǫ(n) +µu(n)e∗o(n) results.
3. Express
J
(
n
)
i
ǫ(
n
)
, cont.
13 For the SD we could show thatJ(n) =Jmin+ [w(n)−wo]HR[w(n)−wo]
and that the eigenvalues ofRcontrols the convergence speed. For the LMS,ǫ(n) =wb(n)−wo, J(n) =E{|e(n)|2}=E{|d(n)−wbH(n)u(n)|2} =E{|eo(n)−ǫH(n)u(n)|2}= i =E|{|eo{z(n)|2}} Jmin +E{ǫH(n)|u(n)u{zH(n)} b R(n) ǫ(n)}
Here, (i) indicates that the indpendence assupmtion is used. Sinceǫ
andRbare estimates, and in addition not independent, the expectation operator cannot just be “moved intoR.b
Adaptive Signal Processing 2011 Lecture 2
3. Express
J
(
n
)
i
ǫ(
n
)
, cont.
14The behavior of Jex(n) = J(n)−Jmin = E{ǫH(n)Rbǫ(n)}
determines the convergence.
Since the following holds for the trace of a matrix, 1. tr(scalar) =scalar 2. tr(AB) = tr(BA) 3. E{tr(A)}= tr(E{A}) Jex(n)can be written as Jex(n)1=,2E{tr[Rb(n)ǫ(n)ǫH(n)]} 3 = tr[E{Rb(n)ǫ(n)ǫH(n)}] i = tr[E{Rb(n)}E{ǫ(n)ǫH(n)}] = tr[RK(n)]
The convergence thus depends onK(n) =E{ǫ(n)ǫH(n)}. Adaptive Signal Processing 2011 Lecture 2
4. Difference equation of
K
(
n
)
15The update equation for the filter weight error is, as shown previously,
ǫ(n+ 1) = [I−µu(n)uH(n)]ǫ(n) +µu(n)e∗o(n) The solution to this stochastic difference equation is for smallµclose to the solution for
ǫ(n+ 1) = (I−µR)ǫ(n) +µu(n)e∗o(n),
which is resulting fromDirect-averaging.
This equation is still difficult to solve, but now the behavior ofK(n) can be described by the following difference equation
K(n+ 1) = (I−µR)K(n)(I−µR) +µ2JminR
5. Changing of variables from
K
(
n
)
to
X
(
n
)
16 The variable changesQHRQ=Λ QHK(n)Q=X(n)
where Λis the diagonal eigenvalue matrix ofR, and where X(n) normally is not diagonal, yields
Jex(n) = tr(RK(n)) = tr(ΛX(n))
The convergence can be discribed by
5. Changing of variables from
K
(
n
)
to
X
(
n
)
,
cont.
17Jex(n)depends only on the diagonal elements ofX(n)(because of
the tracetr[ΛX(n)]). We can therefore writeJex(n) = M
X
i=1
λixi(n).
The update of the diagonal elements is controlled by xi(n+ 1) = (1−µλi)2xi(n) +µ2Jminλi
This is an inhomogenous difference equation.This means that the solu-tion will contain a transient part, and a stasolu-tionary part that depends on Jminandµ. The cost function can therefore be written
J(n) =Jmin+Jex(∞) +Jtrans(n)
whereJmin+Jex(∞)ochJtrans(n)is the stationary and transient
solutions, respectively.
At the best, the LMS can thus reachJmin+Jex(∞).
Adaptive Signal Processing 2011 Lecture 2
Properties of the LMS
18• Convergence for the same interval as the SD: 0< µ < 2 λmax • Jex(∞) =Jmin M X i=1 µλi 2−µλi • MisadjustmentM=Jex(∞)
Jmin is a measure of how close the the
optimal solution the LMS (in mean-square sense).
Adaptive Signal Processing 2011 Lecture 2
Rules of thumb LMS
19The eigenvalues ofRare rarely known. Therefore, a set of rules of thumbs is commonly used, that is based on the tap-input power, PM−1 k=0 E{|u(n−k)|2}=Mr(0) = tr(R). • The stepsize0< µ <P 2 M−1 k=0 E{|u(n−k)|2} • MisadjustmentM ≈µ2PMk=0−1E{|u(n−k)|2}
• Average time constantτmse,av≈2µλav1 d¨ar
λav=M1 PMi=1λi.
Whenλiis unknown, the following relationship is useful.
• Approximate relationshipMandτmse,av M ≈4τmse,avM
Here,τmse,avdenotes the time it takes forJ(n)to decay a factor of
e−1.
Summary LMS
20Summary of the iterations:
1. Set start values for the filter coefficients,wˆ(0) 2. Calculate the errore(n) =d(n)−wˆH(n)u(n) 3. Update the filter coefficients
ˆ
w(n+ 1) = ˆw(n) +µu(n)[d∗(n)−uH(n) ˆw(n)] 4. Repeat steps 2 and 3
Example: Channel equalizer in training phase
21 LMS FIR M=11 Channel h(n) + + Σ z−7 + – Σ 6 -? - p(n) p(n) =. . . ,1,1,−1,1,−1, . . . d(n) F¨ordr¨ojning b d(n) e(n) u(n) WGNv(n) σ2v=0.001 h(n) ={0,0.2798,1.000,0.2798,0}Sent information in the form of apn-sequence is affected by a channel (h(n)). An adaptive inverse filter is used to compensate for the channel such that the total effect can be approximated by a delay. White Gaussian noise is added during the transmission (WGN).
Adaptive Signal Processing 2011 Lecture 2
Example, cont: Learning Curve
22100 10−1 10−2 0 500 1000 1500 µ=0.0075 µ=0.075 µ=0.025
Learning curves based on 200 realiza-tions.
The decay is related toτmse,av. A largeµgives fast convergence, but also a largeJex(∞).
A ve ra ge of M SE , J ( n ) Iterationn
The curves illustrates learning curves for two differentµ.
Adaptive Signal Processing 2011 Lecture 2
What to read?
23• Least Mean Squares, Haykin chap.5.1–5.3, 5.4 (see lecture), 5.5–5.7
Exercises: 4.1, 4.2, 1.2, 4.3, 4.5, (4.4)
Computer exercise, theme: Implement the LMS algorithm in Matlab, and generate the results in Haykin chap.5.7