
5 The Wiener Filter

5.2 The LS Technique

The principle of least squares (LS) is used in this chapter for signals of one variable, although the concept applies equally well to signals of more than one variable. Furthermore, we study discrete signals, which can always be derived from continuous signals if the sampling frequency is high enough that aliasing does not take place.

Let us, for example, have a discrete signal that is represented in its vector form as $\mathbf{x} = [x(1)\ x(2)\ \cdots\ x(N)]$. We would like next to approximate the function with a polynomial of the form:

$$\hat{x}(nT) = c_1 + c_2 (nT) + c_3 (nT)^2 + \cdots + c_M (nT)^{M-1}, \qquad T = \text{sampling time} \tag{5.1}$$

Since the approximating polynomial is a linear function of the unknown ci's, we are dealing with linear LS approximation. The difference between the exact and the approximate function is the error. We define a cost function J equal to the sum of the squared errors, the total square error. Hence, we write

$$J = \sum_{n=1}^{N} \left[x(n) - \hat{x}(n)\right]^2 \tag{5.2}$$

Example 5.2.1

Let $x(n) = 2\sin(0.05\pi n) + \text{randn}(1,20)$, with T = 1, be the signal to be approximated. Let the signal $\hat{x}(n) = c_1 + c_2 n + c_3 n^2$, with sampling time T = 1, approximate the original signal. Find the unknown constants ci's and plot the results.

Solution: The cost function for this case is

$$J = \sum_{n=1}^{20} \left[x(n) - \hat{x}(n)\right]^2 = \sum_{n=1}^{20} \left[x(n) - c_1 - c_2 n - c_3 n^2\right]^2$$

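Setting the partial derivatives of J with respect to the ci's equal to zero,

$$\frac{\partial J}{\partial c_i} = -2\sum_{n=1}^{20}\left[x(n) - c_1 - c_2 n - c_3 n^2\right] n^{i-1} = 0, \qquad i = 1, 2, 3$$

and using MATLAB to evaluate the sums (all sums run over n = 1, ..., 20), we obtain the following system:

$$\begin{bmatrix} 20 & 210 & 2870 \\ 210 & 2870 & 44100 \\ 2870 & 44100 & 722666 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} \sum x(n) \\ \sum n\,x(n) \\ \sum n^2 x(n) \end{bmatrix} = \begin{bmatrix} 27.7347 \\ 296.1023 \\ \sum n^2 x(n) \end{bmatrix}$$

The ci's are found by using the expression:

$$\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 20 & 210 & 2870 \\ 210 & 2870 & 44100 \\ 2870 & 44100 & 722666 \end{bmatrix}^{-1} \begin{bmatrix} \sum x(n) \\ \sum n\,x(n) \\ \sum n^2 x(n) \end{bmatrix}$$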



Therefore, the estimate curve is given by

x n( )= −1 2457 0 7042. + . n−0 0332. n²

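In matrix form, the cost function (5.2) can be written as

$$J = (\mathbf{x} - \mathbf{H}\mathbf{c})^T(\mathbf{x} - \mathbf{H}\mathbf{c}) \tag{5.3}$$

where $\mathbf{x} = [x(1)\ x(2)\ \cdots\ x(N)]^T$, $\mathbf{c} = [c_1\ c_2\ \cdots\ c_M]^T$, and $\mathbf{H}$ is the $N \times M$ matrix whose nth row is $[1\ \ nT\ \ (nT)^2\ \cdots\ (nT)^{M-1}]$, so that $\mathbf{H}\mathbf{c}$ contains the values of the polynomial (5.1).

Next, to verify (5.3), we proceed as follows:

$$J = (\mathbf{x}^T - \mathbf{c}^T\mathbf{H}^T)(\mathbf{x} - \mathbf{H}\mathbf{c}) = \sum_{n=1}^{N}\Big[x(n) - \sum_{i=1}^{M} c_i (nT)^{i-1}\Big]^2$$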

The last two expressions are identical. The reader should observe that the following matrix relationship was used:

$$(\mathbf{x} - \mathbf{H}\mathbf{c})^T = \mathbf{x}^T - \mathbf{c}^T\mathbf{H}^T \tag{5.4}$$

where the exponent T stands for the transpose of a matrix.

To find the unknowns ci, differentiate the cost function (5.3) with respect to each ci and then set the resulting equations equal to zero. In matrix form this gives $-2\mathbf{H}^T(\mathbf{x} - \mathbf{H}\mathbf{c}) = \mathbf{0}$, that is, $\mathbf{H}^T\mathbf{H}\mathbf{c} = \mathbf{H}^T\mathbf{x}$: a system of M linear equations whose solution gives the unknowns ci. The following example elucidates the procedure.

Example 5.2.2

Let a signal be a constant, s(n) = A, and let the received signal be given by x(n) = 5 + randn(1,10). Find A.

Solution: According to the LS approach, we estimate the constant A by minimizing the cost function $J(A) = \sum_{n=1}^{10}[x(n) - A]^2$. Taking the derivative of J with respect to A and setting the result equal to zero, we obtain

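$$\frac{\partial J(A)}{\partial A} = -2\sum_{n=1}^{10}\left[x(n) - A\right] = 0 \quad\Rightarrow\quad \hat{A} = \frac{1}{10}\sum_{n=1}^{10} x(n)$$

and for this case the minimum value of the cost function is

$$J(\hat{A}) = \sum_{n=1}^{10}\left[x(n) - \hat{A}\right]^2$$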
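A minimal Python sketch of Example 5.2.2 (the book uses MATLAB): the LS estimate is the sample mean, and moving Â slightly in either direction can only increase J. The noise values below are hypothetical stand-ins for randn(1,10), not the book's data.

```python
def estimate_constant(x):
    """LS estimate of A in x(n) = A + noise: dJ/dA = 0 gives the sample mean."""
    return sum(x) / len(x)


def cost(x, A):
    """J(A) = sum over n of [x(n) - A]^2."""
    return sum((xn - A) ** 2 for xn in x)


# Hypothetical noise values standing in for randn(1,10)
noise = [0.3, -0.5, 0.1, 0.8, -0.2, -0.6, 0.4, 0.0, -0.1, 0.2]
x = [5 + v for v in noise]
A_hat = estimate_constant(x)
print(A_hat, cost(x, A_hat))
```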

Example 5.2.3

Find the amplitude constants of the signal

$$s(n) = A\sin(0.1\pi n) + B\sin(0.4\pi n), \qquad n = 0, 1, \ldots, N-1$$

if the received signal is x(n) = s(n) + randn(1,N) and their exact values are A = 0.2 and B = 5.2.

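Solution: For this case, with n = 0, 1, 2, 3 and N = 4, the cost function is $J(A,B) = \sum_{n=0}^{3}[x(n) - A\sin(0.1\pi n) - B\sin(0.4\pi n)]^2$. Differentiating J with respect to A and B and equating the results to zero, we obtain

$$\begin{bmatrix} \sum h_1^2(n) & \sum h_1(n)h_2(n) \\ \sum h_1(n)h_2(n) & \sum h_2^2(n) \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} \sum x(n)h_1(n) \\ \sum x(n)h_2(n) \end{bmatrix}$$

where $h_1(n) = \sin(0.1\pi n)$ and $h_2(n) = \sin(0.4\pi n)$. The following MATLAB commands solve this system: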

>>n = 0:3;
>>x = 0.2*sin(0.1*pi*n) + 5.2*sin(0.4*pi*n) + 0.5*randn(1,4);
>>h1 = sin(0.1*pi*n); h2 = sin(0.4*pi*n);
>>% solve the 2 x 2 normal equations; ab(1) estimates A and ab(2) estimates B
>>ab = [sum(h1.^2) sum(h1.*h2); sum(h1.*h2) sum(h2.^2)] \ [sum(x.*h1); sum(x.*h2)];

◼
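The MATLAB session above can be mirrored in Python as a cross-check. This sketch (the function name `estimate_amplitudes` is illustrative) forms the 2 × 2 normal equations from h1(n) and h2(n) and solves them by Cramer's rule. With the noise removed it should recover A = 0.2 and B = 5.2 up to rounding; the seeded Gaussian noise is only a stand-in for 0.5*randn(1,4) and will not reproduce MATLAB's random numbers.

```python
import math
import random


def estimate_amplitudes(x, n):
    """LS estimates of A and B in x(n) = A sin(0.1 pi n) + B sin(0.4 pi n) + noise."""
    h1 = [math.sin(0.1 * math.pi * k) for k in n]
    h2 = [math.sin(0.4 * math.pi * k) for k in n]
    r11 = sum(a * a for a in h1)
    r12 = sum(a * b for a, b in zip(h1, h2))
    r22 = sum(b * b for b in h2)
    p1 = sum(xi * a for xi, a in zip(x, h1))
    p2 = sum(xi * b for xi, b in zip(x, h2))
    det = r11 * r22 - r12 * r12
    # Cramer's rule for [r11 r12; r12 r22][A; B] = [p1; p2]
    return (r22 * p1 - r12 * p2) / det, (r11 * p2 - r12 * p1) / det


n = range(4)  # n = 0, 1, 2, 3 (N = 4), as in the MATLAB session
x_clean = [0.2 * math.sin(0.1 * math.pi * k) + 5.2 * math.sin(0.4 * math.pi * k) for k in n]
print(estimate_amplitudes(x_clean, n))  # recovers (0.2, 5.2) up to rounding

rng = random.Random(3)  # seeded stand-in for 0.5*randn(1,4)
x_noisy = [xc + 0.5 * rng.gauss(0.0, 1.0) for xc in x_clean]
print(estimate_amplitudes(x_noisy, n))
```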

5.2.2 LS Formulation

We consider a linear adaptive filter with coefficients at time n, $\mathbf{w}(n) = [w_1(n)\ w_2(n)\ \cdots\ w_M(n)]^T$, a measured real-valued input vector $\mathbf{x}(n) = [x_1(n)\ x_2(n)\ \cdots\ x_M(n)]^T$, and a measured desired response d(n). Note that no structure has been specified for the input vector x(n); therefore, it can be considered as the successive samples of a particular process or as a snapshot of M detectors, as shown in Figure 5.2. Hence, the problem is to estimate the desired response d(n) using the linear combination:

$$y(n) = \mathbf{w}^T(n)\mathbf{x}(n) = \sum_{k=1}^{M} w_k(n)\,x_k(n), \qquad n = 1, 2, \ldots, N \tag{5.5}$$

The above equation can be represented by a linear combiner as shown in Figure 5.3.

The combiner error is defined by the relation:

$$e(n) = d(n) - y(n) = d(n) - \mathbf{w}^T(n)\mathbf{x}(n) \tag{5.6}$$

The coefficients of the adaptive filter are found by minimizing the sum of the squares of the error (LS):

$$J \equiv E_e = \sum_{n=1}^{N} g(n)\,e^2(n) \tag{5.7}$$

where g(n) is a weighting function.

Therefore, in the LS method, the filter coefficients are optimized by using all the observations from the time the filter begins until the present time and minimizing the sum of the squared values of the error samples, each of which equals the difference between the measured desired signal and the output signal of the filter. The minimization is valid when the filter coefficient vector w(n) is kept constant, w, over the measurement time interval 1 ≤ n ≤ N. In statistics, LS estimation is known as regression, the e(n) are known as residuals, and w is the regression vector.

[Figure 5.2: Input data, panels (a) and (b): successive samples x(1), x(2), ..., x(N) of a single process along the time axis, and snapshots x1(n), x2(n), ..., xM(n), n = 1, ..., N, of M detectors.]

[Figure 5.3: Linear combiner: the inputs x1(n), x2(n), ..., xM(n) are multiplied by the coefficients w1(n), w2(n), ..., wM(n) and summed to produce y(n).]

We next define the matrix of the observed input samples as follows:

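$$\mathbf{X}^T = [\mathbf{x}(1)\ \mathbf{x}(2)\ \cdots\ \mathbf{x}(N)] = \begin{bmatrix} x_1(1) & x_1(2) & \cdots & x_1(N) \\ x_2(1) & x_2(2) & \cdots & x_2(N) \\ \vdots & \vdots & & \vdots \\ x_M(1) & x_M(2) & \cdots & x_M(N) \end{bmatrix} \quad (M \times N)$$

where we assume that N > M. This defines an overdetermined LS problem.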

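For the case in which we have a one-dimensional input signal, as shown in Figure 5.2b, the input vector holds successive samples of the same signal, one for each of the M filter coefficients, and the data matrix takes the form:

$$\mathbf{X}^T = [\mathbf{x}(1)\ \mathbf{x}(2)\ \cdots\ \mathbf{x}(N)], \qquad \mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-M+1)]^T \tag{5.17}$$

In general, X can also be written column by column as

$$\mathbf{X} = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_M] \equiv \text{matrix}\ (N \times M) \tag{5.18}$$

where $\mathbf{x}_k = [x_k(1)\ x_k(2)\ \cdots\ x_k(N)]^T$ is the kth column of X.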

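In addition, with g(n) = 1 for all n, (5.7) takes the form:

$$\begin{aligned} J = E_e &= \mathbf{e}^T\mathbf{e} = (\mathbf{d} - \mathbf{y})^T(\mathbf{d} - \mathbf{y}) = (\mathbf{d} - \mathbf{X}\mathbf{w})^T(\mathbf{d} - \mathbf{X}\mathbf{w}) \\ &= \mathbf{d}^T\mathbf{d} - \mathbf{w}^T\mathbf{X}^T\mathbf{d} - \mathbf{d}^T\mathbf{X}\mathbf{w} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w} \\ &= E_d - \mathbf{p}^T\mathbf{w} - \mathbf{w}^T\mathbf{p} + \mathbf{w}^T\mathbf{R}\mathbf{w} = E_d - 2\mathbf{p}^T\mathbf{w} + \mathbf{w}^T\mathbf{R}\mathbf{w} \end{aligned} \tag{5.19}$$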

where:

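$$E_d = \mathbf{d}^T\mathbf{d} = \sum_{n=1}^{N} d(n)\,d(n) \tag{5.20}$$

$$\mathbf{R} = \mathbf{X}^T\mathbf{X} = \sum_{n=1}^{N} \mathbf{x}(n)\,\mathbf{x}^T(n) \quad (M \times M) \tag{5.21}$$

$$\mathbf{p} = \mathbf{X}^T\mathbf{d} = \sum_{n=1}^{N} \mathbf{x}(n)\,d(n) \quad (M \times 1) \tag{5.22}$$

$$\mathbf{y} = \mathbf{X}\mathbf{w} = \sum_{k=1}^{M} w_k\,\mathbf{x}_k \quad (N \times 1) \tag{5.23}$$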
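The identity (5.19) can be checked numerically. The Python sketch below, with hypothetical data and illustrative helper names, computes E_d, R and p from (5.20) to (5.22) and compares E_d − 2p^T w + w^T R w with e^T e evaluated directly for an arbitrary coefficient vector w.

```python
def ls_quantities(X, d):
    """E_d = d^T d, R = X^T X, p = X^T d; X is a list of N rows x^T(n), each of length M."""
    M = len(X[0])
    Ed = sum(dn * dn for dn in d)
    R = [[sum(row[i] * row[j] for row in X) for j in range(M)] for i in range(M)]
    p = [sum(row[i] * dn for row, dn in zip(X, d)) for i in range(M)]
    return Ed, R, p


def cost_direct(X, d, w):
    """J = e^T e with e(n) = d(n) - w^T x(n)."""
    return sum((dn - sum(xk * wk for xk, wk in zip(row, w))) ** 2 for row, dn in zip(X, d))


def cost_quadratic(Ed, R, p, w):
    """J = E_d - 2 p^T w + w^T R w, as in (5.19)."""
    M = len(w)
    pw = sum(p[i] * w[i] for i in range(M))
    wRw = sum(w[i] * R[i][j] * w[j] for i in range(M) for j in range(M))
    return Ed - 2 * pw + wRw


# Hypothetical data: N = 3 observations of M = 2 inputs and an arbitrary trial w
X = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
d = [1.0, 0.0, 2.0]
w = [0.3, -0.7]
Ed, R, p = ls_quantities(X, d)
print(cost_direct(X, d, w), cost_quadratic(Ed, R, p, w))  # the two values agree
```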

When divided by N, the matrix R becomes a time average; in statistics, this scaled form of R is known as the sample correlation matrix.

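Setting the gradient of J with respect to the coefficient vector w equal to zero, $\partial J/\partial \mathbf{w} = -2\mathbf{p} + 2\mathbf{R}\mathbf{w} = \mathbf{0}$, we obtain

$$\mathbf{R}\hat{\mathbf{w}} = \mathbf{p}, \qquad \mathbf{p}^T = \hat{\mathbf{w}}^T\mathbf{R}^T = \hat{\mathbf{w}}^T\mathbf{R} \quad (\mathbf{R}\ \text{is symmetric}) \tag{5.24}$$

or

$$\hat{\mathbf{w}} = \mathbf{R}^{-1}\mathbf{p} \tag{5.25}$$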

Therefore, the minimum sum of squared errors is given by

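$$J_{\min} = E_d - 2\mathbf{p}^T\mathbf{R}^{-1}\mathbf{p} + \mathbf{p}^T\mathbf{R}^{-1}\mathbf{R}\mathbf{R}^{-1}\mathbf{p} = E_d - \mathbf{p}^T\mathbf{R}^{-1}\mathbf{p} = E_d - \mathbf{p}^T\hat{\mathbf{w}} \tag{5.26}$$

since R is symmetric.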

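Example 5.2.4

Let the desired response be d = [1 1 1 1]^T, and the two measured input vectors be x1 = [0.7 1.4 0.4 1.3]^T and x2 = [1.2 0.6 0.5 1.1]^T. Then, with X = [x1 x2], we obtain

$$\mathbf{R} = \mathbf{X}^T\mathbf{X} = \begin{bmatrix} 4.30 & 3.31 \\ 3.31 & 3.26 \end{bmatrix}$$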
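The remaining steps of Example 5.2.4 follow from (5.25) and (5.26). The Python sketch below (a translation for illustration, not the book's MATLAB) forms R, p and E_d from the given data and solves the 2 × 2 system with the closed-form inverse; it gives approximately ŵ = [0.3704 0.6669]^T and J_min ≈ 0.3252.

```python
# Example 5.2.4 data: X = [x1 x2] (N = 4, M = 2) and desired response d
x1 = [0.7, 1.4, 0.4, 1.3]
x2 = [1.2, 0.6, 0.5, 1.1]
d = [1.0, 1.0, 1.0, 1.0]

# R = X^T X (5.21), p = X^T d (5.22), E_d = d^T d (5.20)
r11 = sum(a * a for a in x1)
r12 = sum(a * b for a, b in zip(x1, x2))
r22 = sum(b * b for b in x2)
p1 = sum(a * dn for a, dn in zip(x1, d))
p2 = sum(b * dn for b, dn in zip(x2, d))
Ed = sum(dn * dn for dn in d)

# w_hat = R^-1 p (5.25), using the closed-form inverse of a 2 x 2 matrix
det = r11 * r22 - r12 * r12
w1 = (r22 * p1 - r12 * p2) / det
w2 = (r11 * p2 - r12 * p1) / det

# J_min = E_d - p^T w_hat (5.26)
J_min = Ed - (p1 * w1 + p2 * w2)
print(round(r11, 2), round(r12, 2), round(r22, 2))  # 4.3 3.31 3.26
print(round(w1, 4), round(w2, 4), round(J_min, 4))  # 0.3704 0.6669 0.3252
```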

R X X= = The LS technique is a mathematical procedure that enables us to achieve a best fit of a model to experimental data. In the sense of the M-parameter linear system, shown in Figure 5.4, (5.5) is written in the form:

$$y(n) = w_1 x_1(n) + w_2 x_2(n) + \cdots + w_M x_M(n), \qquad n = 1, 2, \ldots, N \tag{5.27}$$

The above equation takes the following matrix form:

$$\mathbf{y} = \mathbf{X}\mathbf{w} \tag{5.28}$$

To estimate the M parameters wi, it is necessary that N ≥ M. If N = M, then we can uniquely solve for w to find

$$\hat{\mathbf{w}} = \mathbf{X}^{-1}\mathbf{y} \tag{5.29}$$

provided that X^{-1} exists. Here $\hat{\mathbf{w}}$ is the estimate of w. Using the least error squares, we can determine $\hat{\mathbf{w}}$, provided that N > M.

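Let us define an error vector $\mathbf{e} = [e_1\ e_2\ \cdots\ e_N]^T$ as follows:

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{w} \tag{5.30}$$

Next, we choose the estimate $\hat{\mathbf{w}}$ as the value of w that minimizes the criterion $J = \sum_{i=1}^{N} e_i^2$. To proceed, we write

$$J = \mathbf{e}^T\mathbf{e} = (\mathbf{y} - \mathbf{X}\mathbf{w})^T(\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{y}^T\mathbf{y} - 2\mathbf{w}^T\mathbf{X}^T\mathbf{y} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w}$$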

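Differentiating J with respect to w and equating the result to zero, we determine the conditions on the estimate $\hat{\mathbf{w}}$ that minimizes J. Hence,

$$\frac{\partial J}{\partial \mathbf{w}}\bigg|_{\mathbf{w} = \hat{\mathbf{w}}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\mathbf{w}} = \mathbf{0}$$

from which we obtain

$$\mathbf{X}^T\mathbf{X}\hat{\mathbf{w}} = \mathbf{X}^T\mathbf{y} \tag{5.34}$$

$$\hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \tag{5.35}$$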

The above equation is known as the LS estimator (LSE) of w. Equation 5.34 is known as the normal equation.
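To illustrate (5.29), (5.34) and (5.35), this Python sketch with hypothetical data solves the normal equations for M = 2 (the helper names are illustrative). When N = M and X is invertible, the LSE coincides with X^{-1}y and the error vanishes; when N > M, the error is minimized but generally not zero.

```python
def lse2(X, y):
    """w_hat = (X^T X)^-1 X^T y (5.35) for an N x 2 matrix X, from the normal equations (5.34)."""
    r11 = sum(r[0] * r[0] for r in X)
    r12 = sum(r[0] * r[1] for r in X)
    r22 = sum(r[1] * r[1] for r in X)
    q1 = sum(r[0] * yn for r, yn in zip(X, y))
    q2 = sum(r[1] * yn for r, yn in zip(X, y))
    det = r11 * r22 - r12 * r12
    return [(r22 * q1 - r12 * q2) / det, (r11 * q2 - r12 * q1) / det]


def residual(X, y, w):
    """e = y - X w, eq. (5.30)."""
    return [yn - (r[0] * w[0] + r[1] * w[1]) for r, yn in zip(X, y)]


# N = M = 2 (hypothetical): X is square and invertible, so the LSE equals X^-1 y (5.29)
Xs, ys = [[2.0, 1.0], [1.0, 3.0]], [4.0, 7.0]
ws = lse2(Xs, ys)
print(ws, residual(Xs, ys, ws))  # w = [1, 2] with zero error

# N = 4 > M = 2 (hypothetical): overdetermined, the error is minimized but not zero
Xo, yo = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]], [0.1, 0.9, 2.1, 2.9]
wo = lse2(Xo, yo)
print(wo, residual(Xo, yo, wo))
```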

5.2.3 Statistical Properties of LSEs

We rewrite (5.30) in the form (X = deterministic matrix):

$$\mathbf{y} = \mathbf{X}\mathbf{w} + \mathbf{e} \tag{5.36}$$

and assume that e is a stationary random vector with zero mean value, E{e} = 0. Furthermore, e is assumed to be uncorrelated with y and X. Given these statistical properties of e, we wish to know how good, or how accurate, the estimates of the parameters are.

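Substituting (5.36) in (5.35) gives $\hat{\mathbf{w}} = \mathbf{w} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{e}$; taking the ensemble average, we obtain

$$E\{\hat{\mathbf{w}}\} = E\{\mathbf{w} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{e}\} = \mathbf{w} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T E\{\mathbf{e}\} = \mathbf{w} \qquad (E\{\mathbf{e}\} = \mathbf{0}) \tag{5.37}$$

which indicates that $\hat{\mathbf{w}}$ is unbiased.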

The covariance matrix corresponding to the estimate error w w − is

(5.38)

where:

Re is the error correlation matrix

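If the noise samples e(i), i = 1, 2, 3, ..., are independent and normally distributed with zero mean and variance σ² [e ~ N(0, σ²I)], then

$$\mathbf{R}_e = E\{\mathbf{e}\mathbf{e}^T\} = \sigma^2\mathbf{I} \tag{5.40}$$

and hence,

$$\mathbf{C}_w = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \tag{5.41}$$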
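The results (5.37) and (5.41) can be checked by simulation. This Python sketch assumes a hypothetical model with rows x^T(n) = [1, n], n = 0, ..., 9, true w = [1.5, 0.035]^T and σ = 1; it draws independent e ~ N(0, σ²I), computes ŵ from (5.35) for each draw, and compares the sample mean and covariance of ŵ with w and σ²(X^TX)^{-1}. The function names and the seed are illustrative choices.

```python
import random


def lse2(X, y):
    """w_hat = (X^T X)^-1 X^T y (5.35) for an N x 2 data matrix X, via the 2 x 2 inverse."""
    r11 = sum(r[0] * r[0] for r in X)
    r12 = sum(r[0] * r[1] for r in X)
    r22 = sum(r[1] * r[1] for r in X)
    q1 = sum(r[0] * yn for r, yn in zip(X, y))
    q2 = sum(r[1] * yn for r, yn in zip(X, y))
    det = r11 * r22 - r12 * r12
    return [(r22 * q1 - r12 * q2) / det, (r11 * q2 - r12 * q1) / det]


def theoretical_cov(X, sigma):
    """C_w = sigma^2 (X^T X)^-1, eq. (5.41)."""
    r11 = sum(r[0] * r[0] for r in X)
    r12 = sum(r[0] * r[1] for r in X)
    r22 = sum(r[1] * r[1] for r in X)
    det = r11 * r22 - r12 * r12
    s2 = sigma ** 2
    return [[s2 * r22 / det, -s2 * r12 / det], [-s2 * r12 / det, s2 * r11 / det]]


def monte_carlo(X, w, sigma, trials=20000, seed=1):
    """Sample mean and covariance of w_hat over independent draws of e ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    ests = []
    for _ in range(trials):
        y = [r[0] * w[0] + r[1] * w[1] + rng.gauss(0.0, sigma) for r in X]  # y = X w + e (5.36)
        ests.append(lse2(X, y))
    mean = [sum(e[i] for e in ests) / trials for i in range(2)]
    cov = [[sum((e[i] - mean[i]) * (e[j] - mean[j]) for e in ests) / (trials - 1)
            for j in range(2)] for i in range(2)]
    return mean, cov


X = [[1.0, float(n)] for n in range(10)]
w_true = [1.5, 0.035]
mean, cov = monte_carlo(X, w_true, sigma=1.0)
C = theoretical_cov(X, 1.0)
print(mean)     # close to w_true: the LSE is unbiased (5.37)
print(cov, C)   # sample covariance close to sigma^2 (X^T X)^-1 (5.41)
```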

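Using (5.36) and taking into consideration that e is a Gaussian random vector, the natural logarithm of its probability density is given by

$$\ln p(\mathbf{e};\mathbf{w}) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\mathbf{w})^T(\mathbf{y} - \mathbf{X}\mathbf{w})$$

Next, we take the derivative of ln p(e; w) with respect to the parameter w. Hence, we find

$$\frac{\partial \ln p(\mathbf{e};\mathbf{w})}{\partial \mathbf{w}} = -\frac{1}{2\sigma^2}\frac{\partial}{\partial \mathbf{w}}\left[\mathbf{y}^T\mathbf{y} - 2\mathbf{y}^T\mathbf{X}\mathbf{w} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\mathbf{w}\right] \tag{5.43}$$

Equation 5.43 becomes

$$\frac{\partial \ln p(\mathbf{e};\mathbf{w})}{\partial \mathbf{w}} = \frac{1}{\sigma^2}\left[\mathbf{X}^T\mathbf{y} - \mathbf{X}^T\mathbf{X}\mathbf{w}\right]$$

Assuming that $\mathbf{X}^T\mathbf{X}$ is invertible, then

$$\frac{\partial \ln p(\mathbf{e};\mathbf{w})}{\partial \mathbf{w}} = \frac{\mathbf{X}^T\mathbf{X}}{\sigma^2}\left[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} - \mathbf{w}\right] \tag{5.46}$$

From the Cramer–Rao lower bound (CRLB) theorem, $\hat{\mathbf{w}}$ is the minimum variance unbiased (MVU) estimator, since we have found that

$$\hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = g(\mathbf{y}) \tag{5.47}$$

and (5.46) becomes

$$\frac{\partial \ln p(\mathbf{e};\mathbf{w})}{\partial \mathbf{w}} = \frac{\mathbf{X}^T\mathbf{X}}{\sigma^2}(\hat{\mathbf{w}} - \mathbf{w})$$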

The matrix

$$\mathbf{I}(\mathbf{w}) = \frac{\mathbf{X}^T\mathbf{X}}{\sigma^2} \tag{5.49}$$

is known as the Fisher information matrix. In the CRLB theorem, the Fisher matrix is defined by the relation

$$\left[\mathbf{I}(\mathbf{w})\right]_{ij} = -E\left\{\frac{\partial^2 \ln p(\mathbf{e};\mathbf{w})}{\partial w_i\,\partial w_j}\right\} \tag{5.50}$$

in which the dependence on the parameters is shown explicitly. Comparing (5.41) and (5.49), the MVU estimator of w is given by (5.47) and its covariance matrix is

$$\mathbf{C}_w = \mathbf{I}^{-1}(\mathbf{w}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \tag{5.51}$$

The MVU estimator of the linear model (5.36) is efficient, since it attains the CRLB; in other words, its covariance matrix is equal to the inverse of the Fisher information matrix.

Let us rewrite the error covariance matrix in the form:

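$$\mathbf{C}_w = \mathbf{I}^{-1}(\mathbf{w}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} = \frac{\sigma^2}{N}\left[\frac{1}{N}\mathbf{X}^T\mathbf{X}\right]^{-1} \tag{5.52}$$

where N is the number of equations in the vector equation (5.36). Let $\lim_{N\to\infty}\left[(1/N)\mathbf{X}^T\mathbf{X}\right]^{-1} = \mathbf{A}$, where A is an M × M constant matrix. Then

$$\lim_{N\to\infty}\mathbf{C}_w = \lim_{N\to\infty}\frac{\sigma^2}{N}\mathbf{A} = \mathbf{0} \tag{5.53}$$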

Since the covariance becomes zero as N goes to infinity, $\hat{\mathbf{w}}$ converges to w (in the mean-square sense). This convergence property defines $\hat{\mathbf{w}}$ as a consistent estimator.

The above development shows that, if a system is modeled as linear in the presence of white Gaussian noise, the LSE approach provides estimators that are unbiased and consistent.
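The consistency argument of (5.52) and (5.53) can be seen numerically. For the hypothetical regressor x2(n) = cos(0.2πn), n = 0, ..., N − 1 (chosen so that (1/N)X^TX converges, which a regressor growing with n would not do), the Python sketch below evaluates C_w = σ²(X^TX)^{-1} for increasing N. For N a multiple of 10, (1/N)X^TX = diag(1, 1/2), so N·C_w stays at σ²A with A = diag(1, 2) while C_w itself shrinks to zero.

```python
import math


def cov_w(N, sigma=1.0):
    """C_w = sigma^2 (X^T X)^-1 (5.41) for hypothetical rows x^T(n) = [1, cos(0.2 pi n)], n = 0..N-1."""
    X = [[1.0, math.cos(0.2 * math.pi * n)] for n in range(N)]
    r11 = sum(r[0] * r[0] for r in X)
    r12 = sum(r[0] * r[1] for r in X)
    r22 = sum(r[1] * r[1] for r in X)
    det = r11 * r22 - r12 * r12
    s2 = sigma ** 2
    return [[s2 * r22 / det, -s2 * r12 / det], [-s2 * r12 / det, s2 * r11 / det]]


sizes = [10, 100, 1000, 10000]
traces = []
for N in sizes:
    C = cov_w(N)
    traces.append(C[0][0] + C[1][1])
    # N * C_w stays near sigma^2 A with A = diag(1, 2), so C_w itself -> 0 as in (5.53)
    print(N, C[0][0] + C[1][1], N * C[0][0], N * C[1][1])
```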

5.2.4 The LS Approach

Using the LS approach, we try to minimize the squared difference between the given (or desired) data d(n) and the output signal y(n) of a linear time-invariant (LTI) system. The signal y(n) is generated by some system, which in turn depends on its unknown parameters wi. The LSE of the wi's chooses the values that make the y(n)'s closest to the given data, where closeness is measured by the LS criterion [see also (5.19)].

For the one-coefficient system model, we have

$$J(w) = \sum_{n=1}^{N}\left[d(n) - y(n)\right]^2$$

and the value of w that minimizes the function J(w) is the LSE. It is apparent that the performance of the LSE depends on the statistical properties of the noise corrupting the signal, as well as on any system modeling error.

Example 5.2.5

Let us assume that the signal is $y(n) = a\cos(\omega_0 n)$, where ω0 is known and the amplitude a must be determined. Hence, the LSE minimizes the cost function

$$J(a) = \sum_{n=1}^{N}\left[d(n) - a\cos(\omega_0 n)\right]^2$$

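Therefore, we obtain

$$\frac{\partial J(a)}{\partial a} = -2\sum_{n=1}^{N}\left[d(n) - a\cos(\omega_0 n)\right]\cos(\omega_0 n) = 0 \quad\Rightarrow\quad \hat{a} = \frac{\sum_{n=1}^{N} d(n)\cos(\omega_0 n)}{\sum_{n=1}^{N}\cos^2(\omega_0 n)}$$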
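A minimal Python sketch of Example 5.2.5, with hypothetical values a = 2.5 and ω0 = 0.3 and a small deterministic perturbation in place of noise (the book uses MATLAB): the closed-form â is compared against small changes of a to confirm that it minimizes J(a).

```python
import math


def amplitude_lse(d, w0):
    """a_hat = sum d(n) cos(w0 n) / sum cos^2(w0 n), n = 1..N, with w0 known."""
    c = [math.cos(w0 * n) for n in range(1, len(d) + 1)]
    return sum(dn * cn for dn, cn in zip(d, c)) / sum(cn * cn for cn in c)


def cost(d, a, w0):
    """J(a) = sum over n of [d(n) - a cos(w0 n)]^2."""
    return sum((dn - a * math.cos(w0 * n)) ** 2 for n, dn in enumerate(d, start=1))


w0 = 0.3
d = [2.5 * math.cos(w0 * n) + 0.1 * (-1) ** n for n in range(1, 51)]  # hypothetical data
a_hat = amplitude_lse(d, w0)
print(a_hat, cost(d, a_hat, w0))
```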

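Let us assume that the output of a system is linear, and it is given by the relation y(n) = x(n)w, where x(n) is a known sequence. Hence, the LSE criterion becomes

$$J(w) = \sum_{n=1}^{N}\left[d(n) - x(n)w\right]^2$$

The estimate value of w is

$$\hat{w} = \frac{\sum_{n=1}^{N} d(n)\,x(n)}{\sum_{n=1}^{N} x^2(n)}$$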

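and the minimum LS error is given by (see Problem 5.2.4)

$$J_{\min} = J(\hat{w}) = \sum_{n=1}^{N} d^2(n) - \hat{w}\sum_{n=1}^{N} d(n)\,x(n) = \sum_{n=1}^{N} d^2(n) - \frac{\left[\sum_{n=1}^{N} d(n)\,x(n)\right]^2}{\sum_{n=1}^{N} x^2(n)}$$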
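The closed forms for ŵ and J_min above can be cross-checked. This Python sketch with hypothetical data (the function name is illustrative) computes both from the sums and then evaluates J(ŵ) directly from its definition.

```python
def one_coefficient_lse(d, x):
    """LSE for y(n) = x(n) w: w_hat = sum d x / sum x^2, J_min = sum d^2 - (sum d x)^2 / sum x^2."""
    sxx = sum(v * v for v in x)
    sdx = sum(a * b for a, b in zip(d, x))
    w_hat = sdx / sxx
    J_min = sum(a * a for a in d) - sdx ** 2 / sxx
    return w_hat, J_min


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0]
d = [1.0, 2.1, 2.9, 4.2]
w_hat, J_min = one_coefficient_lse(d, x)
J_direct = sum((a - w_hat * b) ** 2 for a, b in zip(d, x))  # J(w_hat) from the definition
print(w_hat, J_min, J_direct)  # J_min and J_direct agree
```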

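Example 5.2.6

Consider the experimental data shown in Figure 5.5. It is recommended that the linear model y(n) = a + bn be used for the data. Using the LSE approach, we find the cost function

$$J(\mathbf{w}) = \sum_{n=1}^{N}\left[d(n) - a - bn\right]^2 = (\mathbf{d} - \mathbf{X}\mathbf{w})^T(\mathbf{d} - \mathbf{X}\mathbf{w}), \qquad \mathbf{w} = [a\ \ b]^T$$

where the nth row of X is [1 n].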

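From (5.35), the estimate of w is

$$\hat{\mathbf{w}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{d} \tag{5.59}$$

and it is evaluated using the data shown in Figure 5.5. The resulting straight line was also plotted to verify the LSE procedure. The data were produced using the equation d(n) = 1.5 + 0.035n + v(n), n = 1, 2, ..., 100, where v(n) is zero-mean, unit-variance Gaussian noise generated with randn.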

◼
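Because the noise realization behind Figure 5.5 is not available, this Python sketch generates synthetic data from the same equation with a seeded Gaussian generator (so its numbers will not match the book's) and solves the 2 × 2 normal equations behind (5.59). The function name `line_fit` is illustrative.

```python
import random


def line_fit(d):
    """LSE (5.59) for the model y(n) = a + b n, n = 1..N, via the 2 x 2 normal equations."""
    N = len(d)
    n = range(1, N + 1)
    sn = sum(n)
    snn = sum(k * k for k in n)
    sd = sum(d)
    snd = sum(k * dk for k, dk in zip(n, d))
    det = N * snn - sn * sn           # determinant of [N sn; sn snn]
    a = (snn * sd - sn * snd) / det
    b = (N * snd - sn * sd) / det
    return a, b


rng = random.Random(0)  # seeded noise; not the data of Figure 5.5
d = [1.5 + 0.035 * k + rng.gauss(0.0, 1.0) for k in range(1, 101)]
a_hat, b_hat = line_fit(d)
print(a_hat, b_hat)  # near a = 1.5 and b = 0.035
```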

5.2.5 Orthogonality Principle

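To obtain the orthogonality principle for the LS problem, we follow the procedure developed for the Wiener filter. Therefore, using the unweighted sum of the squares of the error, we obtain

$$\frac{\partial J(\mathbf{w})}{\partial w_k} = 2\sum_{m=1}^{N} e(m)\,\frac{\partial e(m)}{\partial w_k}, \qquad k = 1, 2, \ldots, M \tag{5.60}$$

But from (5.10) and (5.11), with w having M coefficients, $e(m) = d(m) - \sum_{k=1}^{M} w_k x_k(m)$; therefore, taking the derivative of e(m) with respect to w_k, which equals $-x_k(m)$, introducing the result into (5.60), and setting it equal to zero, we obtain

$$\sum_{m=1}^{N} x_k(m)\,\hat{e}(m) = 0, \qquad k = 1, 2, \ldots, M$$

where the estimate error ê(m) is optimum in the LS sense. The above result is known as the principle of orthogonality.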

5.2.6 Corollary

In document Adaptive Filtering (Page 142-156)