EE363 Winter 2005-06
Lecture 6 Estimation
• Gaussian random vectors
• minimum mean-square estimation (MMSE)
• MMSE with linear measurements
• relation to least-squares, pseudo-inverse
Gaussian random vectors
random vector x ∈ R
nis Gaussian if it has density p
x(v) = (2π)
−n/2(det Σ)
−1/2exp
− 1
2 (v − ¯x)
TΣ
−1(v − ¯x)
,
for some Σ = Σ
T> 0, ¯ x ∈ R
n• denoted x ∼ N (¯x, Σ)
• ¯x ∈ R
nis the mean or expected value of x, i.e.,
¯
x = E x = Z
vp
x(v)dv
• Σ = Σ
T> 0 is the covariance matrix of x, i.e.,
Σ = E(x − ¯x)(x − ¯x)
T= E xx
T− ¯x¯x
T= Z
(v − ¯x)(v − ¯x)
Tp
x(v)dv
density for x ∼ N (0, 1):
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45
PSfrag replacements
p
x(v ) =
1 √ 2πe
−v2 /2• mean and variance of scalar random variable x
iare E x
i= ¯ x
i, E (x
i− ¯x
i)
2= Σ
iihence standard deviation of x
iis √
Σ
ii• covariance between x
iand x
jis E(x
i− ¯x
i)(x
j− ¯x
j) = Σ
ij• correlation coefficient between x
iand x
jis ρ
ij= Σ
ijpΣ
iiΣ
jj• mean (norm) square deviation of x from ¯x is
E kx − ¯xk
2= E Tr(x − ¯x)(x − ¯x)
T= Tr Σ =
n
X
i=1
Σ
ii(using Tr AB = Tr BA)
example: x ∼ N (0, I) means x
iare independent identically distributed
(IID) N (0, 1) random variables
Confidence ellipsoids
p
x(v) is constant for (v − ¯x)
TΣ
−1(v − ¯x) = α, i.e., on the surface of ellipsoid
E
α= {v | (v − ¯x)
TΣ
−1(v − ¯x) ≤ α}
thus ¯ x and Σ determine shape of density
can interpret E
αas confidence ellipsoid for x:
the nonnegative random variable (x − ¯x)
TΣ
−1(x − ¯x) has a χ
2ndistribution, so Prob(x ∈ E
α) = F
χ2n
(α) where F
χ2n
is the CDF some good approximations:
• E gives about 50% probability
geometrically:
• mean ¯x gives center of ellipsoid
• semiaxes are √
αλ
iu
i, where u
iare (orthonormal) eigenvectors of Σ
with eigenvalues λ
iexample: x ∼ N (¯x, Σ) with ¯x =
2 1
, Σ =
2 1 1 1
• x
1has mean 2, std. dev. √ 2
• x
2has mean 1, std. dev. 1
• correlation coefficient between x
1and x
2is ρ = 1/ √ 2
• E kx − ¯xk
2= 3
90% confidence ellipsoid corresponds to α = 4.6:
−4
−2 0 2 4 6 8
PSfrag replacements
x
2Affine transformation
suppose x ∼ N (¯x, Σ
x)
consider affine transformation of x:
z = Ax + b, where A ∈ R
m×n, b ∈ R
mthen z is Gaussian, with mean
E z = E(Ax + b) = A E x + b = A¯ x + b and covariance
Σ
z= E(z − ¯z)(z − ¯z)
T= E A(x − ¯x)(x − ¯x)
TA
T= AΣ
xA
Texamples:
• if w ∼ N (0, I) then x = Σ
1/2w + ¯ x is N (¯x, Σ)
useful for simulating vectors with given mean and covariance
• conversely, if x ∼ N (¯x, Σ) then z = Σ
−1/2(x − ¯x) is N (0, I)
(normalizes & decorrelates)
suppose x ∼ N (¯x, Σ) and c ∈ R
nscalar c
Tx has mean c
Tx and variance c ¯
TΣc
thus (unit length) direction of minimum variability for x is u, where Σu = λ
minu, kuk = 1
standard deviation of u
Tnx is √
λ
min(similarly for maximum variability)
Degenerate Gaussian vectors
it is convenient to allow Σ to be singular (but still Σ = Σ
T≥ 0) (in this case density formula obviously does not hold)
meaning: in some directions x is not random at all write Σ as
Σ = [Q
+Q
0]
Σ
+0
0 0
[Q
+Q
0]
Twhere Q = [Q
+Q
0] is orthogonal, Σ
+> 0
• columns of Q
0are orthonormal basis for N (Σ)
• columns of Q
+are orthonormal basis for range(Σ)
then Q
Tx = [z
Tw
T]
T, where
• z ∼ N (Q
T+x, Σ ¯
+) is (nondegenerate) Gaussian (hence, density formula holds)
• w = Q
T0x ¯ ∈ R
nis not random
(Q
T0x is called deterministic component of x)
Linear measurements
linear measurements with noise:
y = Ax + v
• x ∈ R
nis what we want to measure or estimate
• y ∈ R
mis measurement
• A ∈ R
m×ncharacterizes sensors or measurements
• v is sensor noise
common assumptions:
• x ∼ N (¯x, Σ
x)
• v ∼ N (¯v, Σ
v)
• x and v are independent
• N (¯x, Σ
x) is the prior distribution of x (describes initial uncertainty about x)
• ¯v is noise bias or offset (and is usually 0)
• Σ
vis noise covariance
thus x v
∼ N
x ¯
¯ v
,
Σ
x0 0 Σ
vusing
x y
=
I 0 A I
x v
we can write
E
x y
=
x ¯ A¯ x + ¯ v
and
E
x − ¯x y − ¯y
x − ¯x y − ¯y
T=
I 0 A I
Σ
x0 0 Σ
vI 0 A I
TΣ
xΣ
xA
Tcovariance of measurement y is AΣ
xA
T+ Σ
v• AΣ
xA
Tis ‘signal covariance’
• Σ
vis ‘noise covariance’
Minimum mean-square estimation
suppose x ∈ R
nand y ∈ R
mare random vectors (not necessarily Gaussian) we seek to estimate x given y
thus we seek a function φ : R
m→ R
nsuch that ˆ x = φ(y) is near x one common measure of nearness: mean-square error,
E kφ(y) − xk
2minimum mean-square estimator (MMSE) φ
mmseminimizes this quantity
general solution: φ
mmse(y) = E(x |y), i.e., the conditional expectation of x
MMSE for Gaussian vectors
now suppose x ∈ R
nand y ∈ R
mare jointly Gaussian:
x y
∼ N
¯ x
¯ y
,
Σ
xΣ
xyΣ
TxyΣ
y(after alot of algebra) the conditional density is p
x|y(v |y) = (2π)
−n/2(det Λ)
−1/2exp
− 1
2 (v − w)
TΛ
−1(v − w)
, where
Λ = Σ
x− Σ
xyΣ
−1yΣ
Txy, w = ¯ x + Σ
xyΣ
−1y(y − ¯y) hence MMSE estimator (i.e., conditional expectation) is
ˆ
x = φ
mmse(y) = E(x |y) = ¯x + Σ
xyΣ
−1y(y − ¯y)
φ
mmseis an affine function
MMSE estimation error, ˆ x − x, is a Gaussian random vector ˆ
x − x ∼ N (0, Σ
x− Σ
xyΣ
−1yΣ
Txy)
note that
Σ
x− Σ
xyΣ
−1yΣ
Txy≤ Σ
xi.e., covariance of estimation error is always less than prior covariance of x
Best linear unbiased estimator
estimator
ˆ
x = φ
blu(y) = ¯ x + Σ
xyΣ
−1y(y − ¯y) makes sense when x, y aren’t jointly Gaussian
this estimator
• is unbiased, i.e., E ˆx = E x
• often works well
• is widely used
• has minimum mean square error among all affine estimators
sometimes called best linear unbiased estimator
MMSE with linear measurements
consider specific case
y = Ax + v, x ∼ N (¯x, Σ
x), v ∼ N (¯v, Σ
v), x, v independent
MMSE of x given y is affine function ˆ
x = ¯ x + B(y − ¯y) where B = Σ
xA
T(AΣ
xA
T+ Σ
v)
−1, ¯ y = A¯ x + ¯ v intepretation:
• ¯x is our best prior guess of x (before measurement)
• estimator modifies prior guess by B times this discrepancy
• estimator blends prior information with measurement
• B gives gain from observed discrepancy to estimate
• B is small if noise term Σ
vin ‘denominator’ is large
MMSE error with linear measurements
MMSE estimation error, ˜ x = ˆ x − x, is Gaussian with zero mean and covariance
Σ
est= Σ
x− Σ
xA
T(AΣ
xA
T+ Σ
v)
−1AΣ
x• Σ
est≤ Σ
x, i.e., measurement always decreases uncertainty about x
• difference Σ
x− Σ
estgives value of measurement y in estimating x
• e.g., (Σ
est ii/Σ
x ii)
1/2gives fractional decrease in uncertainty of x
idue to measurement
note: error covariance Σ
estcan be determined before measurement y is
made!
to evaluate Σ
est, only need to know
• A (which characterizes sensors)
• prior covariance of x (i.e., Σ
x)
• noise covariance (i.e., Σ
v)
you do not need to know the measurement y (or the means ¯ x, ¯ v)
useful for experiment design or sensor selection
Information matrix formulas
we can write estimator gain matrix as
B = Σ
xA
T(AΣ
xA
T+ Σ
v)
−1= A
TΣ
−1vA + Σ
−1x −1A
TΣ
−1v• n × n inverse instead of m × m
• Σ
−1x, Σ
−1vsometimes called information matrices corresponding formula for estimator error covariance:
Σ = Σ − Σ A
T(AΣ A
T+ Σ )
−1AΣ
can interpret Σ
−1est= Σ
−1x+ A
TΣ
−1vA as:
posterior information matrix (Σ
−1est)
= prior information matrix (Σ
−1x)
+ information added by measurement (A
TΣ
−1vA)
proof: multiply
Σ
xA
T(AΣ
xA
T+ Σ
v)
−1 ?= A
TΣ
−1vA + Σ
−1x −1A
TΣ
−1von left by (A
TΣ
−1vA + Σ
−1x) and on right by (AΣ
xA
T+ Σ
v) to get
(A
TΣ
−1vA + Σ
−1x)Σ
xA
T ?= A
TΣ
−1v(AΣ
xA
T+ Σ
v)
which is true
Relation to regularized least-squares
suppose ¯ x = 0, ¯ v = 0, Σ
x= α
2I, Σ
v= β
2I estimator is ˆ x = By where
B = A
TΣ
−1vA + Σ
−1x −1A
TΣ
−1v= (A
TA + (β/α)
2I)
−1A
T. . . which corresponds to regularized least-squares MMSE estimate ˆ x minimizes
kAz − yk
2+ (β/α)
2kzk
2over z
Example
navigation using range measurements to distant beacons
y = Ax + v
• x ∈ R
2is location
• y
iis range measurement to ith beacon
• v
iis range measurement error, IID N (0, 1)
• ith row of A is unit vector in direction of ith beacon prior distribution:
x ∼ N (¯x, Σ ), x = ¯
1
, Σ =
2
20
90% confidence ellipsoid for prior distribution { x | (x − ¯x)
TΣ
−1x(x − ¯x) ≤ 4.6 }:
−6 −4 −2 0 2 4 6
−5
−4
−3
−2
−1 0 1 2 3 4 5
PSfrag replacements
x
1x
2Case 1: one measurement, with beacon at angle 30
◦fewer measurements than variables, so combining prior information with measurement is critical
resulting estimation error covariance:
Σ
est=
1.046 −0.107
−0.107 0.246
90% confidence ellipsoid for estimate ˆ x: (and 90% confidence ellipsoid for x)
−6 −4 −2 0 2 4 6
−5
−4
−3
−2
−1 0 1 2 3 4 5
PSfrag replacements
x
1x
2interpretation: measurement
• yields essentially no reduction in uncertainty in x
2• reduces uncertainty in x
1by a factor about two
Case 2: 4 measurements, with beacon angles 80
◦, 85
◦, 90
◦, 95
◦resulting estimation error covariance:
Σ
est=
3.429 −0.074
−0.074 0.127
90% confidence ellipsoid for estimate ˆ x: (and 90% confidence ellipsoid for x)
−6 −4 −2 0 2 4 6
−5
−4
−3
−2
−1 0 1 2 3 4 5