Conditional PDF’s for Gaussian random vectors

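Next consider the conditional probability density $f_{X|Y}(x|y)$ for two zero-mean jointly-Gaussian random variables $X$ and $Y$ with a non-singular covariance matrix. From (3.23),

$$f_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\,\exp\left[\frac{-(x/\sigma_X)^2 + 2\rho(x/\sigma_X)(y/\sigma_Y) - (y/\sigma_Y)^2}{2(1-\rho^2)}\right],$$

where $\rho = E[XY]/(\sigma_X\sigma_Y)$. Since $f_Y(y) = (2\pi\sigma_Y^2)^{-1/2}\exp(-y^2/2\sigma_Y^2)$, we have

$$f_{X|Y}(x|y) = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\,\exp\left[\frac{-(x/\sigma_X)^2 + 2\rho(x/\sigma_X)(y/\sigma_Y) - \rho^2(y/\sigma_Y)^2}{2(1-\rho^2)}\right].$$

The numerator of the exponent is the negative of the square $(x/\sigma_X - \rho y/\sigma_Y)^2$. Thus

$$f_{X|Y}(x|y) = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\,\exp\left[\frac{-\left[x - \rho(\sigma_X/\sigma_Y)\,y\right]^2}{2\sigma_X^2(1-\rho^2)}\right]. \tag{3.37}$$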

This says that, given any particular sample value $y$ for the rv $Y$, the conditional density of $X$ is Gaussian with variance $\sigma_X^2(1-\rho^2)$ and mean $\rho(\sigma_X/\sigma_Y)\,y$. Given $Y=y$, we can view $X$ as a random variable in the restricted sample space where $Y=y$. In that restricted sample space, $X$ is $\mathcal{N}\big(\rho(\sigma_X/\sigma_Y)\,y,\ \sigma_X^2(1-\rho^2)\big)$.
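As a quick numerical check of (3.37), the following sketch compares the ratio $f_{X,Y}(x,y)/f_Y(y)$ with the closed-form Gaussian density at a few points. The function names and the parameter values (`sx`, `sy`, `rho`) are illustrative choices, not taken from the text.

```python
import math

def joint_pdf(x, y, sx, sy, rho):
    """Zero-mean bivariate Gaussian density with std devs sx, sy and correlation rho."""
    q = (x / sx) ** 2 - 2 * rho * (x / sx) * (y / sy) + (y / sy) ** 2
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2)
    return math.exp(-q / (2 * (1 - rho ** 2))) / norm

def normal_pdf(x, mean, var):
    """Scalar Gaussian density N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def conditional_pdf_by_ratio(x, y, sx, sy, rho):
    """f_{X|Y}(x|y) computed directly as f_{X,Y}(x,y) / f_Y(y)."""
    return joint_pdf(x, y, sx, sy, rho) / normal_pdf(y, 0.0, sy ** 2)

def conditional_pdf_closed_form(x, y, sx, sy, rho):
    """f_{X|Y}(x|y) from (3.37): mean rho*(sx/sy)*y, variance sx^2*(1 - rho^2)."""
    return normal_pdf(x, rho * (sx / sy) * y, sx ** 2 * (1 - rho ** 2))

# Illustrative (assumed) parameter values.
sx, sy, rho = 2.0, 0.5, 0.8
for x, y in [(0.0, 0.0), (1.3, -0.4), (-2.0, 0.7)]:
    a = conditional_pdf_by_ratio(x, y, sx, sy, rho)
    b = conditional_pdf_closed_form(x, y, sx, sy, rho)
    print(f"x={x:+.1f}  y={y:+.1f}  ratio={a:.8f}  closed form={b:.8f}")
```

The two columns should agree to within floating-point rounding for any choice of parameters with $|\rho| < 1$.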

We see that the variance of $X$, given $Y=y$, has been reduced by a factor of $1-\rho^2$ from the variance before the observation. It is not surprising that this reduction is large when $|\rho|$ is close to 1 and negligible when $\rho$ is close to 0. It is surprising that this conditional variance is the same for all values of $y$. It is also surprising that the conditional mean of $X$ is linear in $y$ and that the conditional distribution is Gaussian with a variance constant in $y$.

Another way to interpret this conditional distribution of $X$ conditional on $Y$ is to use the above observation that the conditional fluctuation of $X$, conditional on $Y=y$, does not depend on $y$. This fluctuation can then be denoted as a rv $V$ that is independent of $Y$. Thus we can represent $X$ as $X = \rho(\sigma_X/\sigma_Y)\,Y + V$ where $V \sim \mathcal{N}(0, (1-\rho^2)\sigma_X^2)$ and $V$ is independent of $Y$.
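The representation $X = \rho(\sigma_X/\sigma_Y)Y + V$ can be illustrated by simulation. The sketch below generates $Y$ and an independent $V$, forms $X$, and estimates the variance of $X$ and the correlation coefficient. The parameter values, sample size, and seed are arbitrary assumptions made for the illustration.

```python
import random

def simulate_pair(sigma_x, sigma_y, rho, n_samples, seed=0):
    """Draw samples of (X, Y) using X = rho*(sigma_x/sigma_y)*Y + V, with V independent of Y."""
    rng = random.Random(seed)
    gain = rho * sigma_x / sigma_y
    sigma_v = sigma_x * (1.0 - rho ** 2) ** 0.5   # std dev of the fluctuation V
    ys = [rng.gauss(0.0, sigma_y) for _ in range(n_samples)]
    xs = [gain * y + rng.gauss(0.0, sigma_v) for y in ys]
    return xs, ys

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

# Illustrative (assumed) parameter values.
sigma_x, sigma_y, rho = 2.0, 0.5, 0.8
xs, ys = simulate_pair(sigma_x, sigma_y, rho, n_samples=200_000)

# X and Y are zero mean, so second moments estimate the variances and covariance.
var_x = mean(x * x for x in xs)
var_y = mean(y * y for y in ys)
rho_hat = mean(x * y for x, y in zip(xs, ys)) / (var_x * var_y) ** 0.5

print(f"estimated var(X) = {var_x:.3f}  (target {sigma_x ** 2})")
print(f"estimated rho    = {rho_hat:.3f}  (target {rho})")
```

The estimates should be close to $\sigma_X^2$ and $\rho$, up to sampling error.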

As will be seen in Chapter 10, this simple form for the conditional distribution leads to important simplifications in estimating X from Y . We now go on to show that this same kind of simplification occurs when we study the conditional density of one Gaussian random vector conditional on another Gaussian random vector, assuming that all the variables are jointly Gaussian.

Let $X = (X_1, \ldots, X_n)^T$ and $Y = (Y_1, \ldots, Y_m)^T$ be zero-mean jointly Gaussian rv’s of length $n$ and $m$ (i.e., $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ are jointly Gaussian). Let their covariance matrices be $[K_X]$ and $[K_Y]$ respectively. Let $[K]$ be the covariance matrix of the $(n+m)$-rv $(X_1, \ldots, X_n, Y_1, \ldots, Y_m)^T$.

The $(n+m) \times (n+m)$ covariance matrix $[K]$ can be partitioned into $n$ rows on top and $m$ rows on bottom, and then further partitioned into $n$ and $m$ columns, yielding:

$$[K] = \begin{bmatrix} [K_X] & [K_{X\cdot Y}] \\ [K_{X\cdot Y}^T] & [K_Y] \end{bmatrix}. \tag{3.38}$$

Here $[K_X] = E[XX^T]$, $[K_{X\cdot Y}] = E[XY^T]$, and $[K_Y] = E[YY^T]$. Note that if $X$ and $Y$ have non-zero means, then $[K_X] = E[(X-\overline{X})(X-\overline{X})^T]$, $[K_{X\cdot Y}] = E[(X-\overline{X})(Y-\overline{Y})^T]$, etc.

In what follows, assume that $[K]$ is non-singular. We then say that $X$ and $Y$ are jointly non-singular, which implies that none of the rv’s $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ can be expressed as a linear combination of the others. The inverse of $[K]$ then exists and can be denoted in block form as

$$[K^{-1}] = \begin{bmatrix} [B] & [C] \\ [C^T] & [D] \end{bmatrix}. \tag{3.39}$$

The blocks $[B]$, $[C]$, $[D]$ can be calculated directly from $[KK^{-1}] = [I]$ (see Exercise 3.16), but for now we simply use them to find $f_{X|Y}(x|y)$.

We shall find that for any given $y$, $f_{X|Y}(x|y)$ is a jointly-Gaussian density with a conditional covariance matrix equal to $[B^{-1}]$ (Exercise 3.11 shows that $[B]$ is non-singular). As in (3.37), where $X$ and $Y$ are one-dimensional, this covariance does not depend on $y$. Also, the conditional mean of $X$, given $Y=y$, will turn out to be $-[B^{-1}C]\,y$. More precisely, we have the following theorem:

Theorem 3.5.1. Let $X$ and $Y$ be zero-mean, jointly Gaussian, jointly non-singular rv’s. Then $X$, conditional on $Y=y$, is $\mathcal{N}\big(-[B^{-1}C]\,y,\ [B^{-1}]\big)$, i.e.,

$$f_{X|Y}(x|y) = \frac{\exp\left\{-\frac{1}{2}\left(x + [B^{-1}C]\,y\right)^T [B] \left(x + [B^{-1}C]\,y\right)\right\}}{(2\pi)^{n/2}\sqrt{\det[B^{-1}]}}. \tag{3.40}$$

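Proof: Express $f_{X|Y}(x|y)$ as $f_{XY}(x,y)/f_Y(y)$. From (3.22),

$$\begin{aligned} f_{XY}(x,y) &= \frac{\exp\left\{-\frac{1}{2}(x^T, y^T)[K^{-1}](x^T, y^T)^T\right\}}{(2\pi)^{(n+m)/2}\sqrt{\det[K]}} \\ &= \frac{\exp\left\{-\frac{1}{2}\left(x^T[B]x + x^T[C]y + y^T[C^T]x + y^T[D]y\right)\right\}}{(2\pi)^{(n+m)/2}\sqrt{\det[K]}}. \end{aligned}$$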

Note that $x$ appears only in the first three terms of the exponent above, and that $x$ does not appear at all in $f_Y(y)$. Thus we can express the dependence on $x$ in $f_{X|Y}(x|y)$ by

$$f_{X|Y}(x|y) = \phi(y)\,\exp\left\{-\frac{1}{2}\left[x^T[B]x + x^T[C]y + y^T[C^T]x\right]\right\}, \tag{3.41}$$

where $\phi(y)$ is some function of $y$. We now complete the square around $[B]$ in the exponent above, getting

$$f_{X|Y}(x|y) = \phi(y)\,\exp\left\{-\frac{1}{2}\left[\left(x + [B^{-1}C]\,y\right)^T [B] \left(x + [B^{-1}C]\,y\right) - y^T[C^T B^{-1} C]\,y\right]\right\}.$$

Since the last term in the exponent does not depend on $x$, we can absorb it into $\phi(y)$. The remaining expression has the form of the density of a Gaussian $n$-rv with non-zero mean as given in (3.24). Comparison with (3.24) also shows that $\phi(y)$ must be $(2\pi)^{-n/2}(\det[B^{-1}])^{-1/2}$. With this substituted for $\phi(y)$, we have (3.40).
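The theorem can be checked numerically at a sample point. The sketch below uses NumPy, with an arbitrary randomly generated positive definite $[K]$ and arbitrary dimensions (assumptions for illustration only), and compares $f_{XY}(x,y)/f_Y(y)$ with the Gaussian density of mean $-[B^{-1}C]\,y$ and covariance $[B^{-1}]$. The helper `gaussian_pdf` is a hypothetical name introduced here.

```python
import numpy as np

def gaussian_pdf(z, mean, cov):
    """Density of N(mean, cov) at the point z; cov must be non-singular."""
    d = z - mean
    quad = d @ np.linalg.solve(cov, d)            # d^T cov^{-1} d
    norm = np.sqrt((2 * np.pi) ** len(z) * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

rng = np.random.default_rng(0)                    # arbitrary seed
n, m = 2, 3                                       # illustrative dimensions
A = rng.standard_normal((n + m, n + m))
K = A @ A.T + 0.5 * np.eye(n + m)                 # symmetric, positive definite

K_inv = np.linalg.inv(K)
B, C = K_inv[:n, :n], K_inv[:n, n:]               # upper blocks of the inverse
K_Y = K[n:, n:]

x = rng.standard_normal(n)
y = rng.standard_normal(m)

# Left side: joint density divided by the marginal density of Y.
joint = gaussian_pdf(np.concatenate([x, y]), np.zeros(n + m), K)
ratio = joint / gaussian_pdf(y, np.zeros(m), K_Y)

# Right side: N(-[B^{-1}C] y, [B^{-1}]) evaluated at x.
cond_mean = -np.linalg.solve(B, C @ y)
cond_cov = np.linalg.inv(B)
closed_form = gaussian_pdf(x, cond_mean, cond_cov)

print(ratio, closed_form)
```

The two printed values should agree up to rounding error.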

To interpret (3.40), note that for any sample value $y$ for $Y$, the conditional distribution of $X$ has a mean given by $-[B^{-1}C]\,y$ and a Gaussian fluctuation around the mean of covariance $[B^{-1}]$. This fluctuation has the same distribution for all $y$ and thus can be represented as a rv $V$ that is independent of $Y$. Thus we can represent $X$ as

$$X = [G]\,Y + V; \qquad Y,\ V \text{ independent}, \tag{3.42}$$

where

$$[G] = -[B^{-1}C] \quad \text{and} \quad V \sim \mathcal{N}(0, [B^{-1}]). \tag{3.43}$$

We often call $V$ an innovation, because it is the part of $X$ that is independent of $Y$. It is also called a noise term for the same reason. We will call $[K_V] = [B^{-1}]$ the conditional covariance of $X$ given a sample value $y$ for $Y$. In summary, the unconditional covariance, $[K_X]$, of $X$ is given by the upper left block of $[K]$ in (3.38), while the conditional covariance $[K_V]$ is the inverse of the upper left block, $[B]$, of the inverse of $[K]$.

The following theorem expresses (3.42) and (3.43) directly in terms of the covariances of X and Y .

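Theorem 3.5.2. Let $X$ and $Y$ be zero-mean, jointly Gaussian, and jointly non-singular. Then $X$ can be expressed as $X = [G]\,Y + V$ where $V$ is statistically independent of $Y$ and

$$[G] = [K_{X\cdot Y} K_Y^{-1}], \tag{3.44}$$

$$[K_V] = [K_X] - [K_{X\cdot Y} K_Y^{-1} K_{X\cdot Y}^T]. \tag{3.45}$$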

Proof: From (3.42), we know that $X$ can be represented as $[G]\,Y + V$ with $Y$ and $V$ independent, so we simply have to evaluate $[G]$ and $[K_V]$. Using (3.42), the covariance of $X$ and $Y$ is given by

$$[K_{X\cdot Y}] = E[XY^T] = E\big[[G]\,YY^T + VY^T\big] = [GK_Y],$$

where we used the fact that $V$ and $Y$ are independent. Post-multiplying both sides by $[K_Y^{-1}]$ yields (3.44). To verify (3.45), we use (3.42) to express $[K_X]$ as

$$[K_X] = E[XX^T] = E\big[([G]\,Y + V)([G]\,Y + V)^T\big] = [GK_YG^T] + [K_V],$$

so

$$[K_V] = [K_X] - [GK_YG^T].$$

This yields (3.45) when (3.44) is used for $[G]$.
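The expressions for $[G]$ and $[K_V]$ derived in this proof can be compared numerically with the inverse-block forms in (3.43). The following NumPy sketch does this for an arbitrary positive definite $[K]$; the function name `innovation_form`, the dimensions, and the seed are assumptions made for illustration.

```python
import numpy as np

def innovation_form(K, n):
    """Return ([G], [K_V]) computed from the blocks of a partitioned covariance K."""
    K_X, K_XY, K_Y = K[:n, :n], K[:n, n:], K[n:, n:]
    G = np.linalg.solve(K_Y, K_XY.T).T            # K_XY K_Y^{-1}; K_Y is symmetric
    K_V = K_X - G @ K_Y @ G.T
    return G, K_V

rng = np.random.default_rng(1)                    # arbitrary seed
n, m = 3, 2                                       # illustrative dimensions
A = rng.standard_normal((n + m, n + m))
K = A @ A.T + 0.5 * np.eye(n + m)                 # symmetric, positive definite

G, K_V = innovation_form(K, n)

# Cross-covariance of V = X - [G]Y with Y is [K_XY] - [G K_Y]; it should vanish.
cross_cov_VY = K[:n, n:] - G @ K[n:, n:]

# Inverse-block forms: [G] = -[B^{-1} C] and [K_V] = [B^{-1}].
K_inv = np.linalg.inv(K)
B, C = K_inv[:n, :n], K_inv[:n, n:]

print("V uncorrelated with Y:", np.allclose(cross_cov_VY, 0.0))
print("G matches -B^{-1}C:   ", np.allclose(G, -np.linalg.solve(B, C)))
print("K_V matches B^{-1}:   ", np.allclose(K_V, np.linalg.inv(B)))
```

Since $V$ and $Y$ are jointly Gaussian, a zero cross-covariance is equivalent to independence, which is why the first check is meaningful.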

We have seen that $[K_V]$ is the covariance of $X$ conditional on $Y=y$ for each sample value $y$. The expression in (3.45) provides some insight into how this covariance is reduced from $[K_X]$. More particularly, since $[K_X] - [K_V] = [GK_YG^T]$ is nonnegative definite, for any $n$-vector $b$,

$$b^T[K_X]\,b \ge b^T[K_V]\,b,$$

i.e., the unconditional variance of $b^TX$ is always greater than or equal to the variance of $b^TX$ conditional on $Y=y$.
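This inequality can be spot-checked numerically. The NumPy sketch below, with an arbitrary positive definite $[K]$ (an assumption for illustration), verifies that $[K_X] - [K_V]$ has no negative eigenvalues beyond rounding error and prints the two quadratic forms for a few random vectors $b$.

```python
import numpy as np

rng = np.random.default_rng(2)                    # arbitrary seed
n, m = 3, 2                                       # illustrative dimensions
A = rng.standard_normal((n + m, n + m))
K = A @ A.T + 0.5 * np.eye(n + m)                 # symmetric, positive definite
K_X, K_XY, K_Y = K[:n, :n], K[:n, n:], K[n:, n:]

# Conditional covariance: K_X - K_XY K_Y^{-1} K_XY^T.
K_V = K_X - K_XY @ np.linalg.solve(K_Y, K_XY.T)

# [K_X] - [K_V] should be nonnegative definite (eigenvalues >= 0 up to rounding).
gap = K_X - K_V
min_eig = np.linalg.eigvalsh((gap + gap.T) / 2).min()
print("smallest eigenvalue of K_X - K_V:", min_eig)

# Spot-check the quadratic-form inequality for a few arbitrary vectors b.
checks = []
for _ in range(3):
    b = rng.standard_normal(n)
    uncond, cond = b @ K_X @ b, b @ K_V @ b
    checks.append(uncond >= cond - 1e-9)
    print(f"b^T K_X b = {uncond:.4f}   b^T K_V b = {cond:.4f}")
```

With $n > m$, as chosen here, the difference $[K_X] - [K_V]$ has rank at most $m$, so its smallest eigenvalue is zero up to rounding; the inequality then holds with equality for some directions $b$.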

In the process of deriving these results, we have also implicitly evaluated the matrices $[C]$ and $[B]$ in the inverse of $[K]$ in (3.39). Combining the second part of (3.43) with (3.45),

$$[B] = \left([K_X] - [K_{X\cdot Y} K_Y^{-1} K_{X\cdot Y}^T]\right)^{-1}. \tag{3.46}$$

Combining the first part of (3.43) with (3.44), we get

$$[C] = -[B K_{X\cdot Y} K_Y^{-1}]. \tag{3.47}$$

Finally, reversing the roles of $X$ and $Y$, we can express $[D]$ as

$$[D] = \left([K_Y] - [K_{Y\cdot X} K_X^{-1} K_{Y\cdot X}^T]\right)^{-1}. \tag{3.48}$$

Reversing the roles of $X$ and $Y$ is even more important in another way, since Theorem 3.5.2 then also says that $X$ and $Y$ are related by

$$Y = [H]\,X + Z, \quad \text{where } X \text{ and } Z \text{ are independent and} \tag{3.49}$$

$$[H] = [K_{Y\cdot X} K_X^{-1}], \tag{3.50}$$

$$[K_Z] = [K_Y] - [K_{Y\cdot X} K_X^{-1} K_{Y\cdot X}^T]. \tag{3.51}$$
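The block expressions (3.46) to (3.48) can be assembled into a candidate inverse and compared with a direct numerical inverse of $[K]$. The following NumPy sketch does so; the function name `inverse_from_blocks`, the dimensions, and the randomly generated $[K]$ are assumptions for illustration.

```python
import numpy as np

def inverse_from_blocks(K, n):
    """Assemble [K^{-1}] from the block formulas (3.46)-(3.48)."""
    K_X, K_XY, K_Y = K[:n, :n], K[:n, n:], K[n:, n:]
    K_YX = K_XY.T
    B = np.linalg.inv(K_X - K_XY @ np.linalg.solve(K_Y, K_YX))   # (3.46)
    C = -B @ K_XY @ np.linalg.inv(K_Y)                           # (3.47)
    D = np.linalg.inv(K_Y - K_YX @ np.linalg.solve(K_X, K_XY))   # (3.48)
    return np.block([[B, C], [C.T, D]])

rng = np.random.default_rng(3)                    # arbitrary seed
n, m = 2, 4                                       # illustrative dimensions
A = rng.standard_normal((n + m, n + m))
K = A @ A.T + 0.5 * np.eye(n + m)                 # symmetric, positive definite

K_inv_blocks = inverse_from_blocks(K, n)
K_inv_direct = np.linalg.inv(K)

print("block formulas match direct inverse:", np.allclose(K_inv_blocks, K_inv_direct))
print("K times assembled inverse is I:     ", np.allclose(K @ K_inv_blocks, np.eye(n + m)))
```

In practice a direct inverse (or, better, a linear solve) is usually preferable numerically; the block form is mainly useful for the interpretation it gives of each block.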

We thus have three ways of representing any pair $X$, $Y$ of zero-mean jointly Gaussian rv’s whose combined covariance is non-singular. First, they can be represented simply as an overall rv, $(X_1, \ldots, X_n, Y_1, \ldots, Y_m)^T$; second, as $X = [G]\,Y + V$ where $Y$ and $V$ are independent; and third, as $Y = [H]\,X + Z$ where $X$ and $Z$ are independent.

Each of these formulations essentially implies the existence of the other two. If we start with formulation 3, for example, Exercise 3.17 shows simply that if $X$ and $Z$ are each zero-mean Gaussian rv’s, the independence between them assures that they are jointly Gaussian, and thus that $X$ and $Y$ are also jointly Gaussian. Similarly, if $[K_X]$ and $[K_Z]$ are non-singular, the overall $[K]$ for $(X_1, \ldots, X_n, Y_1, \ldots, Y_m)^T$ must be non-singular. In Chapter 10, we will find that this provides a very simple and elegant solution to jointly Gaussian estimation problems.
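As an illustration of moving between formulations, the sketch below starts from formulation 3 with arbitrarily chosen $[K_X]$, $[H]$, and $[K_Z]$ (all assumed values), builds the overall $[K]$, derives $[G]$ and $[K_V]$ for formulation 2, and then recovers $[H]$ and $[K_Z]$ again from the blocks of $[K]$. The helper `random_spd` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(4)                    # arbitrary seed
n, m = 2, 3                                       # illustrative dimensions

def random_spd(k):
    """Return an arbitrary k x k symmetric positive definite matrix."""
    A = rng.standard_normal((k, k))
    return A @ A.T + 0.5 * np.eye(k)

# Formulation 3: Y = [H]X + Z with X and Z independent.
K_X = random_spd(n)
K_Z = random_spd(m)
H = rng.standard_normal((m, n))

# Formulation 1: the overall covariance of (X, Y).
K_YX = H @ K_X                                    # E[Y X^T], since Z is independent of X
K_Y = H @ K_X @ H.T + K_Z
K = np.block([[K_X, K_YX.T], [K_YX, K_Y]])
min_eig = np.linalg.eigvalsh(K).min()             # positive if K is non-singular

# Formulation 2: X = [G]Y + V.
G = K_YX.T @ np.linalg.inv(K_Y)
K_V = K_X - G @ K_Y @ G.T

# Round trip: recover [H] and [K_Z] from the blocks of K.
H_back = K_YX @ np.linalg.inv(K_X)
K_Z_back = K_Y - H_back @ K_X @ H_back.T

print("smallest eigenvalue of K:", min_eig)
print("H recovered:  ", np.allclose(H_back, H))
print("K_Z recovered:", np.allclose(K_Z_back, K_Z))
```

The positive smallest eigenvalue is consistent with the statement that non-singular $[K_X]$ and $[K_Z]$ give a non-singular overall $[K]$.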

In document Stochastic Processes: Theory for Applications by Robert G. Gallager (pages 138-142)