Random Vectors: Independence and Dependence
5.5 Dependence: Conditional Expectation
If in addition the $X_i$ are independent, then
$$E\Big(\prod_{i=1}^{n} X_i\Big) = \prod_{i=1}^{n} E(X_i). \qquad (18)$$
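As a quick sanity check of (18), the following Python sketch computes both sides exactly for three independent discrete random variables. The marginal mass functions are hypothetical, chosen only for illustration. Independence is built in by multiplying the marginals to form the joint mass function.

```python
from fractions import Fraction
from itertools import product
from math import prod

def expectation(pmf):
    """E(X) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def expectation_of_product(pmfs):
    """E(X_1 * ... * X_n) for independent X_i: joint pmf = product of the marginals."""
    total = Fraction(0)
    for outcome in product(*(pmf.items() for pmf in pmfs)):
        values = [x for x, _ in outcome]
        probs = [p for _, p in outcome]
        total += prod(values) * prod(probs)
    return total

# Hypothetical marginals, for illustration only.
pmfs = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {1: Fraction(1, 3), 2: Fraction(2, 3)},
    {-1: Fraction(1, 4), 3: Fraction(3, 4)},
]
lhs = expectation_of_product(pmfs)
rhs = prod(expectation(pmf) for pmf in pmfs)
print(lhs, rhs)  # prints: 5/3 5/3
```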
Many inequalities for probabilities and moments are useful when dealing with random vectors. Most of them are beyond our scope at this stage, but we give a few simple examples with some applications.
(19) Basic Inequality If $X \le Y$ with probability one, then $E(X) \le E(Y)$.
Proof This follows immediately from Theorem 4.3.6.
Corollary (a) For $1 < r < s$,
$$E(|X|^r) \le E(|X|^s) + 1. \qquad (20)$$
(b) For $r \ge 1$,
$$E(|X + Y|^r) \le 2^r\big(E(|X|^r) + E(|Y|^r)\big). \qquad (21)$$
Proof (a) If $|x| \le 1$, then $|x|^r \le 1$; and if $|x| > 1$ and $r < s$, then $|x|^r \le |x|^s$. Hence, in any case, when $r \le s$, $|x|^r \le |x|^s + 1$. Thus,
$$E(|X|^r) = \sum_x |x|^r f(x) \le \sum_x |x|^s f(x) + 1 = E(|X|^s) + 1.$$
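(b) For any real numbers $x$ and $y$, if $0 \le k \le r$, then $|x|^k|y|^{r-k} \le |x|^r + |y|^r$: if $|x| \le |y|$ the left side is at most $|y|^r$, and if $|x| > |y|$ it is less than $|x|^r$. Hence, when $r$ is an integer (so that the binomial theorem applies),
$$|x + y|^r \le (|x| + |y|)^r = \sum_{k=0}^{r} \binom{r}{k} |x|^k |y|^{r-k} \le \sum_{k=0}^{r} \binom{r}{k} \big(|x|^r + |y|^r\big) = 2^r\big(|x|^r + |y|^r\big),$$
and (21) follows on taking expectations.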
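The following Python sketch checks (20) and (21) numerically for one hypothetical joint mass function of (X, Y), using exact rational arithmetic. It only illustrates the inequalities and is not a proof. The pmf and the exponents tried are arbitrary choices.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), chosen only for illustration.
joint = {
    (-2, 1): Fraction(1, 8),
    (0, -3): Fraction(1, 4),
    (1, 1): Fraction(3, 8),
    (3, -1): Fraction(1, 4),
}

def expect(g):
    """E(g(X, Y)) computed exactly from the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def holds_20(r, s):
    """Inequality (20): E|X|^r <= E|X|^s + 1 for r < s."""
    return expect(lambda x, y: abs(x) ** r) <= expect(lambda x, y: abs(x) ** s) + 1

def holds_21(r):
    """Inequality (21): E|X + Y|^r <= 2^r (E|X|^r + E|Y|^r) for integer r >= 1."""
    lhs = expect(lambda x, y: abs(x + y) ** r)
    rhs = 2 ** r * (expect(lambda x, y: abs(x) ** r) + expect(lambda x, y: abs(y) ** r))
    return lhs <= rhs

print([holds_20(r, s) for r, s in [(2, 3), (2, 5), (3, 4)]])
print([holds_21(r) for r in range(1, 5)])
```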
(22) Corollary These inequalities show that:
(a) If $E(|X|^s) < \infty$, then $E(|X|^r) < \infty$ for all $1 \le r \le s$.
(b) If $E(|X|^r)$ and $E(|Y|^r)$ are finite, then $E(|X + Y|^r) < \infty$.
5.5 Dependence: Conditional Expectation
Let X and Y be jointly distributed random variables. We may be given the value of Y, either in fact or as a supposition. What is the effect on the distribution of X?
(1) Definition If X and Y have joint probability mass function $f(x, y)$, then given $Y = y$, the random variable X has a conditional probability mass function given by
$$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}$$
for all y such that $f_Y(y) > 0$.
Example Let X and Y be independent geometric random variables, each having mass function $f(x) = (1 - \lambda)\lambda^x$, $x \ge 0$, $0 < \lambda < 1$. Let $Z = X + Y$. Show that for $0 \le x \le z$, $f_{X|Z}(x|z) = 1/(z + 1)$.
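Solution From Example 5.4.13, we know that Z has mass function $f_Z(z) = (z + 1)(1 - \lambda)^2\lambda^z$, and so
$$f_{X|Z}(x|z) = \frac{P(X = x, Z = z)}{(z + 1)(1 - \lambda)^2\lambda^z} = \frac{(1 - \lambda)^2\lambda^x\lambda^{z-x}}{(1 - \lambda)^2\lambda^z(z + 1)} = (z + 1)^{-1}.$$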
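A small exact check of this example, as a Python sketch. It builds the joint mass function of (X, Z) from two independent geometric variables with a hypothetical value λ = 2/5. It then confirms that $f_Z$ agrees with Example 5.4.13 and that X given Z = z is uniform on {0, …, z}.

```python
from fractions import Fraction

def geometric_pmf(x, lam):
    """f(x) = (1 - lam) * lam**x, x = 0, 1, 2, ..."""
    return (1 - lam) * lam ** x

def conditional_x_given_z(z, lam):
    """Return ({x: f_{X|Z}(x|z)}, f_Z(z)) for Z = X + Y, X and Y independent geometric."""
    joint = {x: geometric_pmf(x, lam) * geometric_pmf(z - x, lam) for x in range(z + 1)}
    f_z = sum(joint.values())  # marginal f_Z(z): sum the joint pmf over x
    return {x: p / f_z for x, p in joint.items()}, f_z

lam = Fraction(2, 5)  # hypothetical value of lambda
for z in range(8):
    cond, f_z = conditional_x_given_z(z, lam)
    assert f_z == (z + 1) * (1 - lam) ** 2 * lam ** z  # Example 5.4.13
    assert all(p == Fraction(1, z + 1) for p in cond.values())
```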
(2) Example 5.1.8 Revisited: Cutting for the Deal Find the conditional mass function of the loser's card conditional on $W = w$; find also $f_{W|V}(w|v)$.
Solution According to Example 5.1.8,
$$f(v, w) = \frac{1}{78}, \quad 2 \le v < w \le 14, \qquad \text{and} \qquad f_W(w) = \frac{w - 2}{78}, \quad 3 \le w \le 14.$$
Hence, using Definition 1,
$$f_{V|W}(v|w) = \frac{1}{w - 2}, \quad 2 \le v < w.$$
The loser's score is uniformly distributed given W. Likewise,
$$f_{W|V}(w|v) = \frac{1/78}{(14 - v)/78} = \frac{1}{14 - v}, \quad v < w \le 14,$$
also a uniform distribution.
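The same conclusions can be checked by brute-force enumeration. This Python sketch assumes, as in Example 5.1.8, that each of the 78 pairs 2 ≤ v < w ≤ 14 is equally likely. It computes the marginals by summation and then applies Definition 1.

```python
from fractions import Fraction
from itertools import combinations

# Joint pmf of (V, W) = (loser's card, winner's card): each of the
# C(13, 2) = 78 pairs 2 <= v < w <= 14 has probability 1/78.
joint = {(v, w): Fraction(1, 78) for v, w in combinations(range(2, 15), 2)}

def marginal(index):
    """Marginal pmf of V (index 0) or W (index 1), by summing the joint pmf."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0) + p
    return out

f_V, f_W = marginal(0), marginal(1)

def cond_V_given_W(w):
    """f_{V|W}(v|w) for 2 <= v < w (Definition 1)."""
    return {v: joint[(v, w)] / f_W[w] for v in range(2, w)}

def cond_W_given_V(v):
    """f_{W|V}(w|v) for v < w <= 14 (Definition 1)."""
    return {w: joint[(v, w)] / f_V[v] for w in range(v + 1, 15)}

assert len(joint) == 78
assert all(f_W[w] == Fraction(w - 2, 78) for w in range(3, 15))
assert all(p == Fraction(1, w - 2) for w in range(3, 15) for p in cond_V_given_W(w).values())
assert all(p == Fraction(1, 14 - v) for v in range(2, 14) for p in cond_W_given_V(v).values())
```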
(3) Example Let X and Y be independent. Show that the conditional mass function of X given Y is $f_X(x)$, the marginal mass function.
Solution Because X and Y are independent, $f(x, y) = f_X(x)f_Y(y)$. Hence, applying Definition 1,
$$f_{X|Y}(x|y) = f(x, y)/f_Y(y) = f_X(x).$$
(4) Theorem $f_{X|Y}(x|y)$ is a probability mass function, which is to say that (i) $f_{X|Y}(x|y) \ge 0$ and
(ii) $\sum_x f_{X|Y}(x|y) = 1$.
Proof Part (i) is trivial. Part (ii) follows immediately from (5.1.5).
Recall that two events A and B are said to be conditionally independent given C if $P(A \cap B|C) = P(A|C)P(B|C)$. Likewise, it is possible for two random variables X and Y to be conditionally independent given Z, if
$$f_{X,Y|Z} = f_{X|Z}\, f_{Y|Z}.$$
Example: Cutting for the Deal (Example 5.1.8 Revisited) Suppose three players cut for the deal (with ties not allowed, as usual). Let X be the lowest card, Y the highest card, and Z the intermediate card. Clearly, X and Y are dependent. However, conditional on $Z = z$, X and Y are independent: given $Z = z$, each of the $(z - 2)(14 - z)$ possible pairs $(X, Y)$ is equally likely. The mass function $f_{X|Z}$ is uniform on $\{2, \ldots, z - 1\}$ and $f_{Y|Z}$ is uniform on $\{z + 1, \ldots, 14\}$.
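A Python sketch of this example by enumeration. It assumes (the text does not spell this out) that the three distinct ranks form a uniformly random 3-subset of {2, …, 14}, sorted as X < Z < Y. It checks the factorization $f_{X,Y|Z} = f_{X|Z} f_{Y|Z}$ and the two uniform conditional marginals. It also exhibits the unconditional dependence: X = 12 and Y = 13 are each possible, but not together.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Assumption for illustration: the three distinct ranks form a uniformly random
# 3-subset of {2, ..., 14}; sorted, they are (X, Z, Y) = (lowest, middle, highest).
triples = list(combinations(range(2, 15), 3))
p = Fraction(1, len(triples))  # 1 / C(13, 3)

f_Z, f_XZ, f_ZY = defaultdict(Fraction), defaultdict(Fraction), defaultdict(Fraction)
f_X, f_Y, f_XY = defaultdict(Fraction), defaultdict(Fraction), defaultdict(Fraction)
for x, z, y in triples:
    f_Z[z] += p
    f_XZ[x, z] += p
    f_ZY[z, y] += p
    f_X[x] += p
    f_Y[y] += p
    f_XY[x, y] += p

def conditionally_independent():
    """Check f_{X,Y|Z} = f_{X|Z} f_{Y|Z} on the support, plus the uniform marginals."""
    for z in range(3, 14):
        for x in range(2, z):
            for y in range(z + 1, 15):
                f_x_given_z = f_XZ[x, z] / f_Z[z]
                f_y_given_z = f_ZY[z, y] / f_Z[z]
                if p / f_Z[z] != f_x_given_z * f_y_given_z:
                    return False
                if f_x_given_z != Fraction(1, z - 2) or f_y_given_z != Fraction(1, 14 - z):
                    return False
    return True

# Unconditional dependence: X = 12, Y = 13 leaves no room for Z between them.
print(conditionally_independent(), f_XY[12, 13], f_X[12] * f_Y[13])
```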
Being a mass function, $f_{X|Y}$ may have an expectation; it has a special name and importance.
(5) Definition The conditional expectation of X, given that $Y = y$ where $f_Y(y) > 0$, is
$$E(X|\{Y = y\}) = \sum_x x\, f(x, y)/f_Y(y),$$
when the sum is absolutely convergent.
As y varies over the possible values of Y, this defines a function of Y, denoted by $E(X|Y)$.
Because it is a function of Y, it is itself a random variable, which may have an expectation.
(6) Theorem If both sides exist,
$$E(E(X|Y)) = E(X). \qquad (7)$$
Proof Assuming the sums are absolutely convergent we have, by Theorem 4.3.4,
$$\begin{aligned}
E(E(X|Y)) &= \sum_y E(X|\{Y = y\})\, f_Y(y) = \sum_y \sum_x x\, \frac{f(x, y)}{f_Y(y)}\, f_Y(y) && \text{by Definition 5}\\
&= \sum_x x\, f_X(x) && \text{by (5.1.4)}\\
&= E(X).
\end{aligned}$$
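Theorem 6 is easy to check numerically. The Python sketch below computes $E(X|\{Y = y\})$ from a joint mass function via Definition 5, averages it against $f_Y$, and compares the result with $E(X)$. The helper names are hypothetical. The joint pmf used is the cutting-for-the-deal distribution of Example 2, with X = V and Y = W.

```python
from collections import defaultdict
from fractions import Fraction

def conditional_expectation(joint):
    """Return ({y: E(X | Y = y)}, {y: f_Y(y)}) from a joint pmf {(x, y): p} (Definition 5)."""
    f_Y, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        f_Y[y] += p
        weighted[y] += x * p
    return {y: weighted[y] / f_Y[y] for y in f_Y}, dict(f_Y)

def iterated_expectation(joint):
    """E(E(X | Y)) = sum over y of E(X | Y = y) f_Y(y)."""
    cond, f_Y = conditional_expectation(joint)
    return sum(cond[y] * f_Y[y] for y in cond)

def expectation_x(joint):
    """E(X) directly from the joint pmf."""
    return sum(x * p for (x, _), p in joint.items())

# Cutting for the deal (Example 2): X = V (loser), Y = W (winner), f(v, w) = 1/78.
deal = {(v, w): Fraction(1, 78) for v in range(2, 15) for w in range(v + 1, 15)}
print(iterated_expectation(deal), expectation_x(deal))  # prints: 17/3 17/3
```

Here $E(V|W = w) = (w + 1)/2$, the mean of the uniform distribution on $\{2, \ldots, w - 1\}$, consistent with Example 2.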
This is an exceptionally important and useful result. Judicious use of Theorem 6 can greatly simplify many calculations; we give some examples.
(8) Example: Eggs A hen lays X eggs, where X is Poisson with parameter λ. Each egg hatches with probability p, independently of the others, yielding Y chicks. Show that $\rho(X, Y) = \sqrt{p}$.
Solution Conditional on $X = k$, the number of chicks is binomial $B(k, p)$, with mean $kp$. Hence, …
… identically distributed random variables, and let Y be an integer-valued random variable independent of all the $X_i$. Let $S_Y = \sum_{i=1}^{Y} X_i$ …
(10) Example 4.12 Revisited: Gamblers' Ruin Two gamblers, A and B, have n coins.
They divide this hoard by tossing each coin; A gets those that show heads, X say, and B gets the rest, totalling $n - X$.
They then play a series of independent fair games; each time A wins he gets a coin from B, and each time he loses he gives a coin to B. They stop when one or the other has all the coins.
Let $D_X$ be the number of games played. Find $E(D_X)$, and show that, when the coins are fair, $\rho(X, D_X) = 0$.
Solution Conditional on $X = k$, as in Example 4.12, the expected number of games $D_k = E(D_X|X = k)$ satisfies
$$D_k = \tfrac{1}{2}D_{k+1} + \tfrac{1}{2}D_{k-1} + 1, \quad 0 < k < n, \qquad D_0 = D_n = 0,$$
with solution $D_k = k(n - k)$. Hence, observing that X is $B(n, p)$ (where p is the chance of a head), we have
$$E(D_X) = E(E(D_X|X)) = E(X(n - X)) = n(n - 1)p(1 - p).$$
Finally,
$$\operatorname{cov}(X, D_X) = E(X^2(n - X)) - E(X)E(D_X) = n(n - 1)p(p - 1)(2p - 1),$$
whence $\rho = 0$ when $p = \tfrac{1}{2}$.
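An exact check of the gamblers' ruin calculations, as a Python sketch. It first verifies that $D_k = k(n - k)$ satisfies the recurrence with $D_0 = D_n = 0$. It then computes $E(D_X)$ and $\operatorname{cov}(X, D_X)$ for X binomial $B(n, p)$, using $E(D_X|X) = X(n - X)$. The values n = 10 and p = 1/3 are arbitrary.

```python
from fractions import Fraction
from math import comb

def duration(k, n):
    """Expected number of fair games when A starts with k of the n coins: D_k = k(n - k)."""
    return k * (n - k)

def check_recurrence(n):
    """D_k = D_{k+1}/2 + D_{k-1}/2 + 1 for 0 < k < n, with D_0 = D_n = 0."""
    inner = all(
        Fraction(duration(k + 1, n) + duration(k - 1, n), 2) + 1 == duration(k, n)
        for k in range(1, n)
    )
    return inner and duration(0, n) == 0 and duration(n, n) == 0

def moments(n, p):
    """Return (E(D_X), cov(X, D_X)) for X ~ B(n, p), using E(D_X | X) = X(n - X)."""
    pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}
    e_x = sum(k * q for k, q in pmf.items())
    e_d = sum(duration(k, n) * q for k, q in pmf.items())       # E(E(D_X | X))
    e_xd = sum(k * duration(k, n) * q for k, q in pmf.items())  # E(X E(D_X | X))
    return e_d, e_xd - e_x * e_d

n, p = 10, Fraction(1, 3)
e_d, cov = moments(n, p)
assert check_recurrence(n)
assert e_d == n * (n - 1) * p * (1 - p)
assert cov == n * (n - 1) * p * (p - 1) * (2 * p - 1)
assert moments(n, Fraction(1, 2))[1] == 0  # fair coins: zero covariance
```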
(11) Example Partition Rule: Show that if X and Y are jointly distributed, then
$$f_X(x) = \sum_y f_Y(y)\, f_{X|Y}(x|y).$$
Solution This is just Theorem 6 applied with $I_x$, the indicator of the event $\{X = x\}$, in place of X. Then $E(I_x) = f_X(x)$ and $E(I_x|Y = y) = f_{X|Y}(x|y)$, so the result follows from (7). Alternatively, you can substitute from Definition 1.
Essentially this is the Partition Rule applied to discrete random variables.
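The Partition Rule in code: a minimal Python sketch with a hypothetical joint mass function. It recovers $f_X$ from $f_Y$ and the conditional mass functions $f_{X|Y}$ of Definition 1.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), for illustration only.
joint = {(0, "a"): Fraction(1, 6), (1, "a"): Fraction(1, 3),
         (1, "b"): Fraction(1, 4), (2, "b"): Fraction(1, 4)}

f_X, f_Y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    f_X[x] += p
    f_Y[y] += p

def f_X_given_Y(x, y):
    """Definition 1: f_{X|Y}(x|y) = f(x, y) / f_Y(y)."""
    return joint.get((x, y), Fraction(0)) / f_Y[y]

def partition_rule(x):
    """Right-hand side of Example 11: sum over y of f_Y(y) f_{X|Y}(x|y)."""
    return sum(f_Y[y] * f_X_given_Y(x, y) for y in f_Y)

print({x: partition_rule(x) for x in (0, 1, 2)})
```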
Recall that we have already defined $E(X|B)$ for any event B in Chapter 4. It is occasionally convenient to consider quantities such as $E(X|Y; B)$. This is defined to be the expected value of the conditional distribution
$$P(X = x|\{Y = y\} \cap B) = \frac{P(\{X = x\} \cap \{Y = y\} \cap B)}{P(\{Y = y\} \cap B)} \qquad (12)$$
for any value y of Y such that $P(\{Y = y\} \cap B) > 0$.
We give some of the more important properties of conditional expectation.
(13) Theorem Let a and b be constants, $g(\cdot)$ an arbitrary function, and suppose that X, Y, and Z are jointly distributed. Then (assuming all the expectations exist),
(i) $E(a|Y) = a$
(ii) $E(aX + bZ|Y) = aE(X|Y) + bE(Z|Y)$
(iii) $E(X|Y) \ge 0$ if $X \ge 0$
(iv) $E(X|Y) = E(X)$ if X and Y are independent
(v) $E(Xg(Y)|Y) = g(Y)E(X|Y)$
(vi) $E(X|Y; g(Y)) = E(X|Y)$
(vii) $E(E(X|Y; Z)|Y) = E(X|Y)$.
Property (v) is called the pull-through property, for obvious reasons.
Property (vii) is called the tower property. It enables us to consider multiple conditioning by taking the random variables in any convenient order.
Proof We prove the odd-numbered parts of Theorem 13; the even-numbered parts are left as exercises for you.
(i) $f(a, y) = f_Y(y)$, so
$$E(a|Y) = a f_Y(y)/f_Y(y) = a.$$
(iii) If $X \ge 0$, then every term in the sum in Definition 5 is nonnegative. The result follows.
(v)
$$E(Xg(Y)|Y = y) = \sum_x x\, g(y) f(x, y)/f_Y(y) = g(y) \sum_x x\, f(x, y)/f_Y(y) = g(y)E(X|Y = y).$$
(vii) For arbitrary values $Y = y$ and $Z = z$ of Y and Z, we have $E(X|Y; Z) = \sum_x x\, f(x, y, z)/f_{Y,Z}(y, z)$. Hence, by definition,
$$\begin{aligned}
E(E(X|Y; Z)|Y) &= \sum_z E(X|Y; Z)\, f_{Y,Z}(y, z)/f_Y(y) = \sum_z \sum_x x\, f(x, y, z)/f_Y(y) \qquad (14)\\
&= \sum_x x\, f(x, y)/f_Y(y) = E(X|Y).
\end{aligned}$$
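Properties (v) and (vii) can be checked exactly on a small example. The Python sketch below draws a hypothetical joint mass function for (X, Y, Z) on a small grid with random integer weights (seeded, so the run is reproducible). It computes conditional expectations by Definition 5 and compares both sides of the tower and pull-through identities. The choice $g(y) = y^2 + 1$ is arbitrary.

```python
import random
from collections import defaultdict
from fractions import Fraction

random.seed(1)
# Hypothetical joint pmf f(x, y, z) on a small grid, built from random integer weights.
support = [(x, y, z) for x in range(3) for y in range(2) for z in range(3)]
weights = {t: random.randint(1, 9) for t in support}
total = sum(weights.values())
f = {t: Fraction(w, total) for t, w in weights.items()}

def cond_exp(h, given):
    """Return {k: E(h(X, Y, Z) | key = k)}, where key = given(x, y, z), as in Definition 5."""
    mass, weighted = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y, z), p in f.items():
        k = given(x, y, z)
        mass[k] += p
        weighted[k] += h(x, y, z) * p
    return {k: weighted[k] / mass[k] for k in mass}

def g(y):
    """An arbitrary function of Y for the pull-through check."""
    return y ** 2 + 1

e_x_given_yz = cond_exp(lambda x, y, z: x, lambda x, y, z: (y, z))
e_x_given_y = cond_exp(lambda x, y, z: x, lambda x, y, z: y)

# Tower property (vii): averaging E(X | Y; Z) over Z, given Y, recovers E(X | Y).
tower = cond_exp(lambda x, y, z: e_x_given_yz[(y, z)], lambda x, y, z: y)

# Pull-through property (v): E(X g(Y) | Y) = g(Y) E(X | Y).
pulled = cond_exp(lambda x, y, z: x * g(y), lambda x, y, z: y)

print(tower == e_x_given_y, all(pulled[y] == g(y) * e_x_given_y[y] for y in pulled))
```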
(15) Example Three children (Aelhyde, Beowulf, and Canute) roll a die in the order A, B, C, A, …, until one of them rolls a six and wins. Find the expected number of rolls, given that Canute wins.
Solution We use a form of the tower property, (14). Let C be the event that Canute wins, let X be the duration of the game, and let Y denote the first roll to show a six among the first three rolls, with $Y = 0$ if there is no six. Then,
$$E(X|C) = E(E(X|Y; C)|C). \qquad (16)$$
Now if $Y = 0$, then with the fourth roll the game stands just as it did initially, except that three rolls have been made. So
$$E(X|Y = 0; C) = 3 + E(X|C).$$
Obviously, given C, Y is otherwise 3 (Canute rolls third), and
$$E(X|Y = 3; C) = 3.$$
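Therefore, substituting in (16), we have
$$E(X|C) = \big(3 + E(X|C)\big)P(Y = 0|C) + 3P(Y = 3|C) = 3 + E(X|C)P(Y = 0|C),$$
and so, since $P(Y = 0|C) = (5/6)^3$,
$$E(X|C) = \frac{3}{1 - \left(\frac{5}{6}\right)^3} = \frac{648}{91}.$$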
Of course, there are other ways of doing this.
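One of the other ways, sketched in Python: Canute can only win on rolls 3, 6, 9, …, and he wins on roll 3m with probability $(5/6)^{3m-1}(1/6)$. Summing these series numerically (truncated, in floating point) should reproduce $648/91 \approx 7.1209$ and $P(C) = 25/91$. The truncation point of 500 rounds is an arbitrary choice; the omitted tail is far below floating-point precision.

```python
def expected_rolls_given_canute(max_rounds=500):
    """Return (E(X | C), P(C)) by summing the series, truncated after max_rounds rounds.

    Canute rolls third in each round, so he can win only on rolls 3, 6, 9, ...;
    he wins on roll 3m when the first 3m - 1 rolls show no six and roll 3m does.
    """
    q = 5 / 6
    p_c = 0.0       # running total of P(C)
    weighted = 0.0  # running total of the sum of 3m * P(X = 3m, C)
    for m in range(1, max_rounds + 1):
        p = q ** (3 * m - 1) / 6
        p_c += p
        weighted += 3 * m * p
    return weighted / p_c, p_c

e_x_given_c, p_c = expected_rolls_given_canute()
print(e_x_given_c, 648 / 91, p_c, 25 / 91)
```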
Finally, we remark that conditional expectation arises in another way.
(17) Theorem Let $h(Y)$ be any function of Y such that $E(h(Y)^2) < \infty$. Then,
$$E((X - h(Y))^2) \ge E((X - E(X|Y))^2). \qquad (18)$$
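Theorem 17 says that, in mean square, $E(X|Y)$ is the best predictor of X among functions of Y. The Python sketch below illustrates this (it is not a proof) for a hypothetical joint mass function and a handful of arbitrary competing functions $h(Y)$. For the shifted competitor $E(X|Y) + c$, the mean squared error is exactly the minimum plus $c^2$, because $E(X - E(X|Y)) = 0$.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf of (X, Y), for illustration only.
joint = {(0, 0): Fraction(1, 5), (2, 0): Fraction(1, 5), (1, 1): Fraction(1, 10),
         (3, 1): Fraction(3, 10), (5, 1): Fraction(1, 5)}

f_Y, x_mass = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    f_Y[y] += p
    x_mass[y] += x * p
cond_mean = {y: x_mass[y] / f_Y[y] for y in f_Y}  # E(X | Y = y)

def mse(h):
    """E((X - h(Y))^2), computed exactly from the joint pmf."""
    return sum((x - h(y)) ** 2 * p for (x, y), p in joint.items())

best = mse(lambda y: cond_mean[y])
# A few arbitrary competitors h(Y); by Theorem 17 none can do better than E(X | Y).
competitors = [
    lambda y: 0,
    lambda y: 2 * y + 1,
    lambda y: Fraction(5, 2),
    lambda y: cond_mean[y] + Fraction(1, 10),
]
print(best, [mse(h) for h in competitors])
```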