• No results found

• minimum mean-square estimation (MMSE)

N/A
N/A
Protected

Academic year: 2021

Share "• minimum mean-square estimation (MMSE)"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

EE363 Winter 2005-06

Lecture 6 Estimation

• Gaussian random vectors

• minimum mean-square estimation (MMSE)

• MMSE with linear measurements

• relation to least-squares, pseudo-inverse

(2)

Gaussian random vectors

random vector x ∈ R

n

is Gaussian if it has density p

x

(v) = (2π)

−n/2

(det Σ)

−1/2

exp



− 1

2 (v − ¯x)

T

Σ

−1

(v − ¯x)

 ,

for some Σ = Σ

T

> 0, ¯ x ∈ R

n

• denoted x ∼ N (¯x, Σ)

• ¯x ∈ R

n

is the mean or expected value of x, i.e.,

¯

x = E x = Z

vp

x

(v)dv

• Σ = Σ

T

> 0 is the covariance matrix of x, i.e.,

Σ = E(x − ¯x)(x − ¯x)

T

(3)

= E xx

T

− ¯x¯x

T

= Z

(v − ¯x)(v − ¯x)

T

p

x

(v)dv

density for x ∼ N (0, 1):

0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45

PSfrag replacements

p

x

(v ) =

1 2π

e

v2 /2

(4)

• mean and variance of scalar random variable x

i

are E x

i

= ¯ x

i

, E (x

i

− ¯x

i

)

2

= Σ

ii

hence standard deviation of x

i

is √

Σ

ii

• covariance between x

i

and x

j

is E(x

i

− ¯x

i

)(x

j

− ¯x

j

) = Σ

ij

• correlation coefficient between x

i

and x

j

is ρ

ij

= Σ

ij

ii

Σ

jj

• mean (norm) square deviation of x from ¯x is

E kx − ¯xk

2

= E Tr(x − ¯x)(x − ¯x)

T

= Tr Σ =

n

X

i=1

Σ

ii

(using Tr AB = Tr BA)

example: x ∼ N (0, I) means x

i

are independent identically distributed

(IID) N (0, 1) random variables

(5)

Confidence ellipsoids

p

x

(v) is constant for (v − ¯x)

T

Σ

−1

(v − ¯x) = α, i.e., on the surface of ellipsoid

E

α

= {v | (v − ¯x)

T

Σ

−1

(v − ¯x) ≤ α}

thus ¯ x and Σ determine shape of density

can interpret E

α

as confidence ellipsoid for x:

the nonnegative random variable (x − ¯x)

T

Σ

−1

(x − ¯x) has a χ

2n

distribution, so Prob(x ∈ E

α

) = F

χ2

n

(α) where F

χ2

n

is the CDF some good approximations:

• E gives about 50% probability

(6)

geometrically:

• mean ¯x gives center of ellipsoid

• semiaxes are √

αλ

i

u

i

, where u

i

are (orthonormal) eigenvectors of Σ

with eigenvalues λ

i

(7)

example: x ∼ N (¯x, Σ) with ¯x =

 2 1



, Σ =

 2 1 1 1



• x

1

has mean 2, std. dev. √ 2

• x

2

has mean 1, std. dev. 1

• correlation coefficient between x

1

and x

2

is ρ = 1/ √ 2

• E kx − ¯xk

2

= 3

90% confidence ellipsoid corresponds to α = 4.6:

−4

−2 0 2 4 6 8

PSfrag replacements

x

2

(8)

Affine transformation

suppose x ∼ N (¯x, Σ

x

)

consider affine transformation of x:

z = Ax + b, where A ∈ R

m×n

, b ∈ R

m

then z is Gaussian, with mean

E z = E(Ax + b) = A E x + b = A¯ x + b and covariance

Σ

z

= E(z − ¯z)(z − ¯z)

T

= E A(x − ¯x)(x − ¯x)

T

A

T

= AΣ

x

A

T

(9)

examples:

• if w ∼ N (0, I) then x = Σ

1/2

w + ¯ x is N (¯x, Σ)

useful for simulating vectors with given mean and covariance

• conversely, if x ∼ N (¯x, Σ) then z = Σ

−1/2

(x − ¯x) is N (0, I)

(normalizes & decorrelates)

(10)

suppose x ∼ N (¯x, Σ) and c ∈ R

n

scalar c

T

x has mean c

T

x and variance c ¯

T

Σc

thus (unit length) direction of minimum variability for x is u, where Σu = λ

min

u, kuk = 1

standard deviation of u

Tn

x is √

λ

min

(similarly for maximum variability)

(11)

Degenerate Gaussian vectors

it is convenient to allow Σ to be singular (but still Σ = Σ

T

≥ 0) (in this case density formula obviously does not hold)

meaning: in some directions x is not random at all write Σ as

Σ = [Q

+

Q

0

]

 Σ

+

0

0 0



[Q

+

Q

0

]

T

where Q = [Q

+

Q

0

] is orthogonal, Σ

+

> 0

• columns of Q

0

are orthonormal basis for N (Σ)

• columns of Q

+

are orthonormal basis for range(Σ)

(12)

then Q

T

x = [z

T

w

T

]

T

, where

• z ∼ N (Q

T+

x, Σ ¯

+

) is (nondegenerate) Gaussian (hence, density formula holds)

• w = Q

T0

x ¯ ∈ R

n

is not random

(Q

T0

x is called deterministic component of x)

(13)

Linear measurements

linear measurements with noise:

y = Ax + v

• x ∈ R

n

is what we want to measure or estimate

• y ∈ R

m

is measurement

• A ∈ R

m×n

characterizes sensors or measurements

• v is sensor noise

(14)

common assumptions:

• x ∼ N (¯x, Σ

x

)

• v ∼ N (¯v, Σ

v

)

• x and v are independent

• N (¯x, Σ

x

) is the prior distribution of x (describes initial uncertainty about x)

• ¯v is noise bias or offset (and is usually 0)

• Σ

v

is noise covariance

(15)

thus  x v



∼ N

 x ¯

¯ v

 ,

 Σ

x

0 0 Σ

v



using

 x y



=

 I 0 A I

  x v



we can write

E

 x y



=

 x ¯ A¯ x + ¯ v

 and

E

 x − ¯x y − ¯y

  x − ¯x y − ¯y



T

=

 I 0 A I

  Σ

x

0 0 Σ

v

  I 0 A I



T

 Σ

x

Σ

x

A

T



(16)

covariance of measurement y is AΣ

x

A

T

+ Σ

v

• AΣ

x

A

T

is ‘signal covariance’

• Σ

v

is ‘noise covariance’

(17)

Minimum mean-square estimation

suppose x ∈ R

n

and y ∈ R

m

are random vectors (not necessarily Gaussian) we seek to estimate x given y

thus we seek a function φ : R

m

→ R

n

such that ˆ x = φ(y) is near x one common measure of nearness: mean-square error,

E kφ(y) − xk

2

minimum mean-square estimator (MMSE) φ

mmse

minimizes this quantity

general solution: φ

mmse

(y) = E(x |y), i.e., the conditional expectation of x

(18)

MMSE for Gaussian vectors

now suppose x ∈ R

n

and y ∈ R

m

are jointly Gaussian:

 x y



∼ N

 

¯ x

¯ y

 ,

 Σ

x

Σ

xy

Σ

Txy

Σ

y

 

(after alot of algebra) the conditional density is p

x|y

(v |y) = (2π)

−n/2

(det Λ)

−1/2

exp



− 1

2 (v − w)

T

Λ

−1

(v − w)

 , where

Λ = Σ

x

− Σ

xy

Σ

−1y

Σ

Txy

, w = ¯ x + Σ

xy

Σ

−1y

(y − ¯y) hence MMSE estimator (i.e., conditional expectation) is

ˆ

x = φ

mmse

(y) = E(x |y) = ¯x + Σ

xy

Σ

−1y

(y − ¯y)

(19)

φ

mmse

is an affine function

MMSE estimation error, ˆ x − x, is a Gaussian random vector ˆ

x − x ∼ N (0, Σ

x

− Σ

xy

Σ

−1y

Σ

Txy

)

note that

Σ

x

− Σ

xy

Σ

−1y

Σ

Txy

≤ Σ

x

i.e., covariance of estimation error is always less than prior covariance of x

(20)

Best linear unbiased estimator

estimator

ˆ

x = φ

blu

(y) = ¯ x + Σ

xy

Σ

−1y

(y − ¯y) makes sense when x, y aren’t jointly Gaussian

this estimator

• is unbiased, i.e., E ˆx = E x

• often works well

• is widely used

• has minimum mean square error among all affine estimators

sometimes called best linear unbiased estimator

(21)

MMSE with linear measurements

consider specific case

y = Ax + v, x ∼ N (¯x, Σ

x

), v ∼ N (¯v, Σ

v

), x, v independent

MMSE of x given y is affine function ˆ

x = ¯ x + B(y − ¯y) where B = Σ

x

A

T

(AΣ

x

A

T

+ Σ

v

)

−1

, ¯ y = A¯ x + ¯ v intepretation:

• ¯x is our best prior guess of x (before measurement)

(22)

• estimator modifies prior guess by B times this discrepancy

• estimator blends prior information with measurement

• B gives gain from observed discrepancy to estimate

• B is small if noise term Σ

v

in ‘denominator’ is large

(23)

MMSE error with linear measurements

MMSE estimation error, ˜ x = ˆ x − x, is Gaussian with zero mean and covariance

Σ

est

= Σ

x

− Σ

x

A

T

(AΣ

x

A

T

+ Σ

v

)

−1

x

• Σ

est

≤ Σ

x

, i.e., measurement always decreases uncertainty about x

• difference Σ

x

− Σ

est

gives value of measurement y in estimating x

• e.g., (Σ

est ii

x ii

)

1/2

gives fractional decrease in uncertainty of x

i

due to measurement

note: error covariance Σ

est

can be determined before measurement y is

made!

(24)

to evaluate Σ

est

, only need to know

• A (which characterizes sensors)

• prior covariance of x (i.e., Σ

x

)

• noise covariance (i.e., Σ

v

)

you do not need to know the measurement y (or the means ¯ x, ¯ v)

useful for experiment design or sensor selection

(25)

Information matrix formulas

we can write estimator gain matrix as

B = Σ

x

A

T

(AΣ

x

A

T

+ Σ

v

)

−1

= A

T

Σ

−1v

A + Σ

−1x



−1

A

T

Σ

−1v

• n × n inverse instead of m × m

• Σ

−1x

, Σ

−1v

sometimes called information matrices corresponding formula for estimator error covariance:

Σ = Σ − Σ A

T

(AΣ A

T

+ Σ )

−1

(26)

can interpret Σ

−1est

= Σ

−1x

+ A

T

Σ

−1v

A as:

posterior information matrix (Σ

−1est

)

= prior information matrix (Σ

−1x

)

+ information added by measurement (A

T

Σ

−1v

A)

(27)

proof: multiply

Σ

x

A

T

(AΣ

x

A

T

+ Σ

v

)

−1 ?

= A

T

Σ

−1v

A + Σ

−1x



−1

A

T

Σ

−1v

on left by (A

T

Σ

−1v

A + Σ

−1x

) and on right by (AΣ

x

A

T

+ Σ

v

) to get

(A

T

Σ

−1v

A + Σ

−1x

x

A

T ?

= A

T

Σ

−1v

(AΣ

x

A

T

+ Σ

v

)

which is true

(28)

Relation to regularized least-squares

suppose ¯ x = 0, ¯ v = 0, Σ

x

= α

2

I, Σ

v

= β

2

I estimator is ˆ x = By where

B = A

T

Σ

−1v

A + Σ

−1x



−1

A

T

Σ

−1v

= (A

T

A + (β/α)

2

I)

−1

A

T

. . . which corresponds to regularized least-squares MMSE estimate ˆ x minimizes

kAz − yk

2

+ (β/α)

2

kzk

2

over z

(29)

Example

navigation using range measurements to distant beacons

y = Ax + v

• x ∈ R

2

is location

• y

i

is range measurement to ith beacon

• v

i

is range measurement error, IID N (0, 1)

• ith row of A is unit vector in direction of ith beacon prior distribution:

x ∼ N (¯x, Σ ), x = ¯

 1 

, Σ =

 2

2

0 

(30)

90% confidence ellipsoid for prior distribution { x | (x − ¯x)

T

Σ

−1x

(x − ¯x) ≤ 4.6 }:

−6 −4 −2 0 2 4 6

−5

−4

−3

−2

−1 0 1 2 3 4 5

PSfrag replacements

x

1

x

2

(31)

Case 1: one measurement, with beacon at angle 30

fewer measurements than variables, so combining prior information with measurement is critical

resulting estimation error covariance:

Σ

est

=

 1.046 −0.107

−0.107 0.246



(32)

90% confidence ellipsoid for estimate ˆ x: (and 90% confidence ellipsoid for x)

−6 −4 −2 0 2 4 6

−5

−4

−3

−2

−1 0 1 2 3 4 5

PSfrag replacements

x

1

x

2

interpretation: measurement

• yields essentially no reduction in uncertainty in x

2

• reduces uncertainty in x

1

by a factor about two

(33)

Case 2: 4 measurements, with beacon angles 80

, 85

, 90

, 95

resulting estimation error covariance:

Σ

est

=

 3.429 −0.074

−0.074 0.127



(34)

90% confidence ellipsoid for estimate ˆ x: (and 90% confidence ellipsoid for x)

−6 −4 −2 0 2 4 6

−5

−4

−3

−2

−1 0 1 2 3 4 5

PSfrag replacements

x

1

x

2

interpretation: measurement yields

• little reduction in uncertainty in x

1

• small reduction in uncertainty in x

2

References

Related documents

The objectives of this work were to test the operability and energy efficiency of the two-stage grinding process to produce fine wood particles and to determine the

STONE Nephrolithometry scoring is simple and effective bed side tool in determining the chance of achieving stone free status by a single session of PCNL.Total score,

CE at the nominal operating condition of the powertrain (CISPR current-probe method): (a) Comparison of a test setup with proper grounding of cable shields, and a setup lacking

Depressed Whole Blood Serotonin Levels Associated With Behavioral Abnormalities in the de

In the resistance state, the information of the experience of successful resisting competitor persuasion and the connection knowledge with target product or brand

We found improvements of sirolimus in the following pa- rameters in patients after beginning sirolimus treatment: the FEV 1 in the first year; the FVC in the first and sec- ond

The City of West Allis Department of Development, Housing Division will be responsible for the provision of information and counseling of all Home Rental Rehabilitation/Home

With the regulation screw in the minimum pressure position (nut level, 12mm screw inside), the automatic drop pressure will be adjusted to approximately 33psi.. To increase