1.5 Conditional Probability
1.5.2 Some Properties of Conditional Expectations
Although the definition above may appear rather abstract, it is not too difficult to work with, and it yields the properties of conditional expectation that we have come to expect based on the limited definitions of elementary probability.
For example, we have the simple relationship with the unconditional expectation:
E(E(X|𝒜)) = E(X). (1.236)
Also, if the individual conditional expectations exist, the conditional expectation is a linear operator:
∀a ∈ IR, E(aX + Y | 𝒜) = a E(X|𝒜) + E(Y|𝒜) a.s. (1.237)
This fact follows immediately from the definition. For any set A ∈ 𝒜,
∫_A E(aX + Y | 𝒜) dP = ∫_A (aX + Y) dP = a ∫_A X dP + ∫_A Y dP = ∫_A (a E(X|𝒜) + E(Y|𝒜)) dP,
and because a E(X|𝒜) + E(Y|𝒜) is 𝒜-measurable, it is a version of E(aX + Y | 𝒜).
As with unconditional expectations, we have immediately from the definition:
X ≤ Y a.s. ⇒ E(X|𝒜) ≤ E(Y|𝒜) a.s. (1.238)
We can establish conditional versions of the three theorems stated on page 89 that relate to the interchange of an integration operation and a limit operation (monotone convergence, Fatou’s lemma, and dominated convergence). These extensions are fairly straightforward.
• monotone convergence: for 0 ≤ X_1 ≤ X_2 ≤ · · · a.s.,
X_n → X a.s. ⇒ E(X_n|𝒜) → E(X|𝒜) a.s. (1.239)
• Fatou’s lemma:
0 ≤ X_n ∀n ⇒ E(lim inf_n X_n | 𝒜) ≤ lim inf_n E(X_n|𝒜) a.s. (1.240)
• dominated convergence: given a fixed Y with E(Y|𝒜) < ∞,
|X_n| ≤ Y ∀n and X_n → X a.s. ⇒ E(X_n|𝒜) → E(X|𝒜) a.s. (1.241)
Another useful fact is that if Y is 𝒜-measurable and |XY| and |X| are integrable (notice that the latter is stronger than what is required to define E(X|𝒜)), then
E(XY|𝒜) = Y E(X|𝒜) a.s. (1.242)
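The identities (1.236), (1.237), and (1.242) can be checked exactly on a small finite probability space, where conditioning on a σ-field generated by a partition reduces to averaging over each block. The following is only a sketch under assumptions of our own choosing: the six-point sample space, the partition, the variables X and Y, the constant a, and the block-constant variable W (which plays the role of the measurable factor in (1.242)) are all hypothetical and do not come from the text.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely outcomes.
OMEGA = range(6)
P = {w: Fraction(1, 6) for w in OMEGA}

# The sub-sigma-field is generated by this partition (an illustrative choice).
PARTITION = [{0, 1}, {2, 3, 4}, {5}]


def expect(X):
    """Unconditional expectation of X (a dict mapping outcome -> value)."""
    return sum(X[w] * P[w] for w in OMEGA)


def cond_expect(X):
    """E(X | sigma(PARTITION)): on each block, the P-weighted average of X."""
    result = {}
    for block in PARTITION:
        mass = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / mass
        for w in block:
            result[w] = average
    return result


# Illustrative random variables and constant (assumptions, not from the text).
X = {w: Fraction(w * w) for w in OMEGA}
Y = {w: Fraction(3 - w) for w in OMEGA}
a = Fraction(5, 2)

# (1.236): averaging the conditional expectation recovers E(X).
tower_holds = expect(cond_expect(X)) == expect(X)

# (1.237): linearity, checked outcome by outcome.
EX, EY = cond_expect(X), cond_expect(Y)
E_lin = cond_expect({w: a * X[w] + Y[w] for w in OMEGA})
linear_holds = all(E_lin[w] == a * EX[w] + EY[w] for w in OMEGA)

# (1.242): W is constant on each block, hence measurable with respect to
# the generated sigma-field, so it factors out of the conditional expectation.
W = {0: 2, 1: 2, 2: -1, 3: -1, 4: -1, 5: 7}
E_XW = cond_expect({w: X[w] * W[w] for w in OMEGA})
pullout_holds = all(E_XW[w] == W[w] * EX[w] for w in OMEGA)

print(tower_holds, linear_holds, pullout_holds)
```

Exact rational arithmetic is used so that the equalities are tested as identities rather than up to rounding error.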
Some Useful Conditional Expectations
There are some conditional expectations that arise often, and which we should immediately recognize. The simplest one is
E(E(Y|X)) = E(Y). (1.243)
Note that the expectation operator is based on a probability distribution, and so anytime we see “E”, we need to ask “with respect to what probability distribution?” In notation such as that above, the distributions are implicit and all relate to the same probability space. The inner expectation on the left is with respect to the conditional distribution of Y given X, and so is a function of X. The outer expectation is with respect to the marginal distribution of X.
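To make the roles of the two distributions concrete, the sketch below computes the inner expectation as a function of x from the conditional distribution of Y given X = x, and then averages that function over the marginal distribution of X. The joint probability mass function is a made-up example, not one from the text.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the probabilities sum to 1.
joint = {
    (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (1, 3): Fraction(1, 4),
    (2, 2): Fraction(1, 4),
}


def marginal_x(joint):
    """Marginal pmf of X."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0) + p
    return px


def cond_mean_y_given_x(joint):
    """The function x -> E(Y | X = x), from the conditional pmf of Y given X."""
    px = marginal_x(joint)
    weighted = {}
    for (x, y), p in joint.items():
        weighted[x] = weighted.get(x, 0) + y * p
    return {x: weighted[x] / px[x] for x in px}


px = marginal_x(joint)
inner = cond_mean_y_given_x(joint)           # a function of x
outer = sum(inner[x] * px[x] for x in px)    # average over the marginal of X
direct = sum(y * p for (_, y), p in joint.items())  # E(Y) from the joint pmf

print(inner, outer, direct)
```

For this pmf the inner expectation takes the values 3/2, 2, and 2 at x = 0, 1, 2, and both the iterated and the direct computation give 15/8.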
Approaching this slightly differently, we consider a random variable Z that is a function of the random variables X and Y:
Z = f(X, Y).
We have
E(f(X, Y)) = E_Y(E_{X|Y}(f(X, Y)|Y)) = E_X(E_{Y|X}(f(X, Y)|X)). (1.244)
Another useful conditional expectation relates adjusted variances to “total” variances:
V(Y) = V(E(Y|X)) + E(V(Y|X)). (1.245)
This is intuitive, although you should be able to prove it formally. The intuitive explanation is: the total variation in Y is the sum of the variation of its mean given X and its average variation about that conditional mean (that is, its variation given X). (Think of SST = SSR + SSE in regression analysis.)
This equality implies the Rao-Blackwell inequality: because E(V(Y|X)) ≥ 0, dropping the second term on the right gives V(E(Y|X)) ≤ V(Y).
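The variance decomposition can be verified exactly for a small discrete distribution. The sketch below uses a hypothetical joint pmf of our own choosing and computes the three quantities in (1.245) separately; the names `total`, `between`, and `within` are ours and are meant only to echo the SST = SSR + SSE analogy.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the probabilities sum to 1.
joint = {
    (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 8),
    (1, 1): Fraction(1, 4), (1, 3): Fraction(1, 4),
    (2, 2): Fraction(1, 4),
}


def mean(pmf):
    """Mean of a pmf given as a dict value -> probability."""
    return sum(v * p for v, p in pmf.items())


def variance(pmf):
    """Variance of a pmf given as a dict value -> probability."""
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())


# Marginal pmfs of X and Y.
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

# Conditional pmf of Y given X = x, for each x.
cond = {x: {} for x in px}
for (x, y), p in joint.items():
    cond[x][y] = cond[x].get(y, 0) + p / px[x]

cond_mean = {x: mean(cond[x]) for x in px}
cond_var = {x: variance(cond[x]) for x in px}

total = variance(py)                                   # V(Y)
m = sum(cond_mean[x] * px[x] for x in px)              # E(E(Y|X))
between = sum((cond_mean[x] - m) ** 2 * px[x] for x in px)  # V(E(Y|X))
within = sum(cond_var[x] * px[x] for x in px)          # E(V(Y|X))

print(total, between, within)
```

For this pmf the three values are 39/64, 3/64, and 9/16, so the decomposition holds and the first term alone does not exceed the total variance.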
Exchangeability, Conditioning, and Independence
De Finetti’s representation theorem (Theorem 1.30 on page 75) requires an infinite sequence, and does not hold for finite sequences. For example, consider an urn containing one red ball and one blue ball from which we draw the balls without replacement. Let R_i = 1 if a red ball is drawn on the ith draw and R_i = 0 otherwise. (This is Polya’s urn of Example 1.6 on page 24 with r = b = 1 and c = −1.) Clearly, the sequence R_1, R_2 is exchangeable. Because
Pr(R_1 = 1, R_2 = 1) = 0,
if there were a measure µ as in de Finetti’s representation theorem, then we would have
0 = ∫_0^1 π² dµ(π),
which means that µ must put mass 1 at the point 0. But also
Pr(R_1 = 0, R_2 = 0) = 0,
which would mean that
0 = ∫_0^1 (1 − π)² dµ(π),
and this would force µ to put mass 1 at the point 1. That is not possible if µ satisfies the previous requirement. There are, however, finite versions of de Finetti’s theorem; see, for example, Diaconis (1977) or Schervish (1995).
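The urn argument can be checked by direct enumeration. The sketch below builds the joint distribution of (R_1, R_2), confirms that it is symmetric in its two coordinates, and confirms that both (1, 1) and (0, 0) have probability zero. It then illustrates, on a grid, an elementary bound that restates the contradiction: since π² + (1 − π)² ≥ 1/2 for every π in [0, 1], any mixture of iid Bernoulli(π) pairs would assign total probability at least 1/2 to those two outcomes. The grid resolution and the variable names are our own choices.

```python
from fractions import Fraction
from itertools import permutations

# Draw both balls without replacement: the two orders are equally likely.
orders = list(permutations(["red", "blue"]))
pmf = {}
for order in orders:
    r = tuple(1 if ball == "red" else 0 for ball in order)
    pmf[r] = pmf.get(r, 0) + Fraction(1, len(orders))

# Exchangeability of (R_1, R_2): the pmf is unchanged when coordinates swap.
exchangeable = all(
    pmf.get((i, j), 0) == pmf.get((j, i), 0) for i in (0, 1) for j in (0, 1)
)

p11 = pmf.get((1, 1), 0)  # both draws red: impossible
p00 = pmf.get((0, 0), 0)  # both draws blue: impossible

# For a conditionally iid pair with parameter pi,
# Pr(1,1) + Pr(0,0) = pi^2 + (1 - pi)^2, which is at least 1/2.
grid = [Fraction(i, 1000) for i in range(1001)]
mixture_lower_bound = min(p * p + (1 - p) * (1 - p) for p in grid)

print(exchangeable, p11 + p00, mixture_lower_bound)
```

The urn assigns probability 0 where every mixture must assign at least 1/2, which is why no mixing measure µ can exist for this two-term sequence.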
An alternate statement of de Finetti’s theorem identifies a random variable with the distribution P, and in that way provides a more direct connection to its use in statistical inference.
Theorem 1.61 (de Finetti’s representation theorem (alternate))
The sequence {X_i}_{i=1}^∞ of binary random variables is exchangeable iff there is a random variable Π such that, conditional on Π = π, the {X_i}_{i=1}^∞ are iid Bernoulli random variables with parameter π. Furthermore, if {X_i}_{i=1}^∞ is exchangeable, then the distribution of Π is unique and X̄_n = Σ_{i=1}^n X_i / n converges to Π almost surely.
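A small simulation can illustrate the last statement of the theorem, although it proves nothing. In the sketch below we first draw a value π for Π and then generate conditionally iid Bernoulli(π) variables; the sample mean should be close to the drawn π when n is large. The uniform mixing distribution, the seed, the sample size, and the function name `simulate_sample_mean` are all assumptions made for illustration.

```python
import random


def simulate_sample_mean(n, rng):
    """Draw pi, then n conditionally iid Bernoulli(pi) values; return (pi, mean)."""
    pi = rng.random()  # assumed mixing distribution: uniform on [0, 1)
    successes = 0
    for _ in range(n):
        if rng.random() < pi:
            successes += 1
    return pi, successes / n


rng = random.Random(12345)  # fixed seed so the run is reproducible
pi, xbar = simulate_sample_mean(50_000, rng)
print(f"drawn pi = {pi:.4f}, sample mean = {xbar:.4f}")
```

Conditional on the drawn π, the standard deviation of the sample mean is at most 0.5/√n, about 0.0022 for n = 50,000, so the two printed values should agree to roughly two decimal places.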
Example 1.30 exchangeable Bernoulli random variables that are conditionally iid Bernoullis (Schervish, 1995)
Suppose {X_n}_{n=1}^∞ are exchangeable Bernoulli random variables such that for each n and for k = 0, 1, . . . , n,
Pr(Σ_{i=1}^n X_i = k) = 1/(n + 1).
Now X̄_n → Π a.s., where Π is as in Theorem 1.61, and so X̄_n → Π in distribution. To determine the distribution of Π, we write the CDF of X̄_n as
F_n(t) = (⌊nt⌋ + 1)/(n + 1), 0 ≤ t ≤ 1;
hence, lim_{n→∞} F_n(t) = t, which is the CDF of Π. Therefore, Π has a U(0, 1) distribution. The X_i are conditionally iid Bernoulli(π) given Π = π.
The distributions in this example will be used in Examples 4.2 and 4.6 in Chapter 4 to illustrate methods in Bayesian data analysis.
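As a consistency check on the example, we can run the computation in the reverse direction: if Π is uniform on (0, 1) and the X_i are conditionally iid Bernoulli(π), then the probability that the sum equals k is the integral of C(n, k) π^k (1 − π)^(n−k) over (0, 1), which by the standard beta integral equals C(n, k) k! (n − k)! / (n + 1)!. The sketch below evaluates this exactly and also evaluates the CDF of the sample mean; the function names are ours.

```python
from fractions import Fraction
from math import comb, factorial, floor


def prob_sum_equals(n, k):
    """Pr(sum of X_i = k) when Pi ~ U(0,1), via the beta integral
    int_0^1 pi^k (1 - pi)^(n - k) d pi = k! (n - k)! / (n + 1)!."""
    return Fraction(comb(n, k) * factorial(k) * factorial(n - k), factorial(n + 1))


def cdf_sample_mean(n, t):
    """F_n(t) = (floor(n t) + 1) / (n + 1) for 0 <= t <= 1."""
    return Fraction(floor(n * t) + 1, n + 1)


# Every value k = 0, ..., n has probability 1/(n + 1).
uniform_on_counts = all(
    prob_sum_equals(n, k) == Fraction(1, n + 1)
    for n in range(1, 13)
    for k in range(n + 1)
)

# F_n(t) approaches t; the gap is at most 1/(n + 1).
t = Fraction(3, 10)
gaps = {n: abs(cdf_sample_mean(n, t) - t) for n in (10, 100, 1000)}

print(uniform_on_counts, gaps)
```

The bound on the gap follows from nt − 1 < ⌊nt⌋ ≤ nt, which places F_n(t) − t between −t/(n + 1) and (1 − t)/(n + 1).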
Conditional expectations also are important in approximations of one random variable by another random variable, and in “predicting” one random variable using another, as we see in the next section.