Expectations and Independence
2. Conditional Expectations
Let Y be a discrete random variable taking values in ℝ, and let A be an event with P(A) > 0. Then the conditional probability that Y = b given the event A is (see (1.3.1))
\[ P\{Y = b \mid A\} = \frac{P(\{Y = b\} \cap A)}{P(A)}. \tag{2.1} \]
As b varies, this is called the conditional distribution of Y given the event A. We define the conditional expectation of Y given the event A as
\[ E[Y \mid A] = \sum_{b} b \, P\{Y = b \mid A\}. \tag{2.2} \]
In particular, when A = {X = a} for a discrete random variable X taking values in a set E,
\[ E[Y \mid X = a] = \sum_{b} b \, P\{Y = b \mid X = a\} \tag{2.3} \]
is called the conditional expectation of Y given that X = a. As a varies, (2.3) defines a function f on E by
\[ f(a) = E[Y \mid X = a], \qquad a \in E. \tag{2.4} \]
By the conditional expectation of Y given X, written E[Y | X], we mean the random variable f(X); that is,
\[ E[Y \mid X] = f(X), \tag{2.5} \]
where f is as defined by (2.4). The following definition is the generalized version of this.
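(2.6) DEFINITION. Let X1, . . ., Xn be discrete random variables taking values in E, and let Y be a discrete random variable with values in ℝ. Then the conditional expectation of Y given X1, . . ., Xn is
\[ E[Y \mid X_1, \ldots, X_n] = f(X_1, \ldots, X_n), \]
where for any n-tuple (a1, . . ., an) with ai ∈ E,
\[ f(a_1, \ldots, a_n) = E[Y \mid X_1 = a_1, \ldots, X_n = a_n] = \sum_{b} b \, P\{Y = b \mid X_1 = a_1, \ldots, X_n = a_n\}. \]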
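If Y is not discrete, then a similar definition is given in terms of its conditional distribution P{Y ≤ t | X1 = a1, . . ., Xn = an}. For example, if Y is non-negative,
\[ E[Y \mid X_1, \ldots, X_n] = f(X_1, \ldots, X_n), \tag{2.7} \]
where
\[ f(a_1, \ldots, a_n) = \int_0^\infty P\{Y > t \mid X_1 = a_1, \ldots, X_n = a_n\} \, dt \tag{2.8} \]
for all a1, . . ., an ∈ E.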
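For the discrete case with a single conditioning variable, the definition can be sketched in a few lines of Python. This is only an illustration: the joint table `joint` and the helper `cond_expectation` are hypothetical names and values, not taken from the text. The sketch computes f(a) = E[Y | X = a] for each a, so that E[Y | X] is the random variable taking the value f(a) wherever X = a.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P{X = a, Y = b}; values chosen for illustration only.
joint = {
    (1, 0): F(1, 8), (1, 1): F(3, 8),
    (2, 0): F(1, 4), (2, 1): F(1, 4),
}

def cond_expectation(joint, a):
    """Return f(a) = E[Y | X = a] = sum_b b * P{X = a, Y = b} / P{X = a}."""
    p_a = sum(p for (x, _), p in joint.items() if x == a)
    if p_a == 0:
        raise ValueError("conditioning event has probability zero")
    return sum(b * p for (x, b), p in joint.items() if x == a) / p_a

# The function f on E = {1, 2}; E[Y | X] equals f[a] on the event {X = a}.
f = {a: cond_expectation(joint, a) for a in sorted({x for x, _ in joint})}
print(f)
```

Exact fractions are used so that the conditional probabilities are not blurred by rounding.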
For any event A, its indicator function IA (which is such that IA(ω) = 1 or 0 according as ω ∈ A or not) is a random variable. We define the conditional probability of A given X1, . . ., Xn as
\[ P\{A \mid X_1, \ldots, X_n\} = E[I_A \mid X_1, \ldots, X_n]. \tag{2.9} \]
The following are some easy properties of conditional expectations. These are analogous to Propositions (1.22), (1.23), (1.24), and (1.25). We omit the proofs.
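(2.10) PROPOSITION. Let Y be a discrete random variable with values in E and g a function from E into ℝ. Then
\[ E[g(Y) \mid X_1, \ldots, X_n] = \sum_{b \in E} g(b) \, P\{Y = b \mid X_1, \ldots, X_n\}. \]
(2.11) PROPOSITION. Let Y1, . . ., Ym be discrete random variables with values in E, and let g be a function from E^m into ℝ. Then
\[ E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n] = \sum_{(b_1, \ldots, b_m) \in E^m} g(b_1, \ldots, b_m) \, P\{Y_1 = b_1, \ldots, Y_m = b_m \mid X_1, \ldots, X_n\}. \]
(2.12) COROLLARY. If Y1, . . ., Ym take values in ℝ and c1, . . ., cm are constants, then
\[ E[c_1 Y_1 + \cdots + c_m Y_m \mid X_1, \ldots, X_n] = c_1 E[Y_1 \mid X_1, \ldots, X_n] + \cdots + c_m E[Y_m \mid X_1, \ldots, X_n]. \]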
(2.13) EXAMPLE. Let X and Y be two random variables with
Let f(b) = E[Y | X = b], b = 1, 2. Then
Thus,
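(2.14) EXAMPLE. Consider three random variables X, Y, and Z with joint distribution
\[ P\{X = k, Y = m, Z = n\} = p^3 q^{n-3} \]
for k = 1, . . ., m – 1; m = 2, . . ., n – 1; n = 3, 4, . . ., where 0 < p < 1, p + q = 1.
Then for k = 1, . . ., m – 1; m = 2, 3, . . .;
\[ P\{X = k, Y = m\} = \sum_{n=m+1}^{\infty} p^3 q^{n-3} = p^2 q^{m-2}. \]
Thus, for k = 1, . . ., m – 1 and m = 2, . . ., n – 1, we have
\[ P\{Z = n \mid X = k, Y = m\} = \frac{p^3 q^{n-3}}{p^2 q^{m-2}} = p q^{n-m-1}. \]
Hence, for k = 1, . . ., m – 1 and m = 2, 3, . . .,
\[ E[Z \mid X = k, Y = m] = \sum_{n=m+1}^{\infty} n \, p q^{n-m-1} = m + \frac{1}{p}. \]
Thus,
\[ E[Z \mid X, Y] = Y + \frac{1}{p}. \]
We note that for any bounded function g,
\[ E[g(Z) \mid X = k, Y = m] = \sum_{n=m+1}^{\infty} g(n) \, p q^{n-m-1}, \]
so that
\[ E[g(Z) \mid X, Y] = \sum_{n=Y+1}^{\infty} g(n) \, p q^{n-Y-1}. \]
In particular, if g(b) = α^b for some α ∈ [0, 1], then
\[ E[\alpha^Z \mid X, Y] = \sum_{n=Y+1}^{\infty} \alpha^n p q^{n-Y-1} = \alpha^{Y} \, \frac{\alpha p}{1 - \alpha q}. \]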
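The computations of this example can be checked numerically. The sketch below rests on an assumption: it takes the joint distribution to be P{X = k, Y = m, Z = n} = p³q^(n–3) for 1 ≤ k < m < n, which is the distribution of the times of the first three successes in Bernoulli trials. The values of p, α, k, m and the truncation point `N_MAX` are arbitrary illustrative choices.

```python
# ASSUMPTION: joint pmf P{X=k, Y=m, Z=n} = p**3 * q**(n-3) for 1 <= k < m < n.
p, q = 0.3, 0.7
N_MAX = 400  # truncation point for the infinite sums; the neglected tail is negligible

def joint(k, m, n):
    """Assumed joint probability of (X, Y, Z) = (k, m, n)."""
    return p**3 * q**(n - 3) if 1 <= k < m < n else 0.0

def cond_exp_g_of_z(g, k, m):
    """E[g(Z) | X = k, Y = m] by direct (truncated) summation over n."""
    ns = range(m + 1, N_MAX + 1)
    denom = sum(joint(k, m, n) for n in ns)
    return sum(g(n) * joint(k, m, n) for n in ns) / denom

k, m, alpha = 2, 5, 0.8
mean_z = cond_exp_g_of_z(lambda n: n, k, m)          # compare with m + 1/p
gen_z = cond_exp_g_of_z(lambda n: alpha**n, k, m)    # compare with alpha**m * alpha*p / (1 - alpha*q)

print(mean_z, m + 1 / p)
print(gen_z, alpha**m * alpha * p / (1 - alpha * q))
```

Under the stated assumption, changing k (with k < m) leaves both values unchanged, which is the numerical counterpart of the conditional expectation depending on Y only.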
The next proposition states that if the knowledge of X1, . . ., Xn determines Y completely, then the conditional expectation of Y given X1, . . ., Xn is equal to Y itself. The proof is very easy and we omit it.
(2.15) PROPOSITION. If Y can be written as
\[ Y = f(X_1, \ldots, X_n) \]
for some function f, then
\[ E[Y \mid X_1, \ldots, X_n] = Y. \]
Next is a very useful result used in situations where E[Y | X1, . . ., Xn] is easy to obtain or known somehow. Since E[Y | X1, . . ., Xn] is a random variable taking real values, we can talk about its expected value. That expected value is the same as the expectation of Y. In words, the expected value of any conditional expectation of Y is equal to the expected value of Y.
(2.16) PROPOSITION. E[E[Y | X1, . . ., Xn]] = E[Y].
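Proof for discrete Y, X1, . . ., Xn. Let
\[ f(a_1, \ldots, a_n) = E[Y \mid X_1 = a_1, \ldots, X_n = a_n]; \]
then
\[ E[E[Y \mid X_1, \ldots, X_n]] = E[f(X_1, \ldots, X_n)] = \sum_{a_1, \ldots, a_n} f(a_1, \ldots, a_n) \, P\{X_1 = a_1, \ldots, X_n = a_n\}. \tag{2.17} \]
On the other hand,
\[ f(a_1, \ldots, a_n) = \sum_{b} b \, P\{Y = b \mid X_1 = a_1, \ldots, X_n = a_n\}. \tag{2.18} \]
Putting (2.18) into (2.17) and changing the order of summation, noting Definition (1.3.1) of conditional probabilities, we obtain
\[ E[E[Y \mid X_1, \ldots, X_n]] = \sum_{b} b \sum_{a_1, \ldots, a_n} P\{Y = b, X_1 = a_1, \ldots, X_n = a_n\} = \sum_{b} b \, P\{Y = b\} = E[Y]. \]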
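Proposition (2.16) can be verified by direct computation on a small example. The joint table below is hypothetical and chosen only for illustration. The sketch computes f(a) = E[Y | X = a], then averages f(X) over the distribution of X and compares the result with E[Y].

```python
from fractions import Fraction as F
from collections import defaultdict

# Hypothetical joint distribution P{X = a, Y = b}; the probabilities sum to 1.
joint = {(0, 2): F(1, 6), (0, 5): F(1, 3), (1, 2): F(1, 4), (1, 5): F(1, 4)}

# Marginal distribution of X and the sums  sum_b b * P{X = a, Y = b}.
p_x = defaultdict(F)
weighted = defaultdict(F)
for (a, b), p in joint.items():
    p_x[a] += p
    weighted[a] += b * p

# f(a) = E[Y | X = a]
f = {a: weighted[a] / p_x[a] for a in p_x}

# E[E[Y | X]] = sum_a f(a) P{X = a}, to be compared with E[Y] computed directly.
expectation_of_cond = sum(f[a] * p_x[a] for a in p_x)
expectation_of_y = sum(b * p for (_, b), p in joint.items())

print(f, expectation_of_cond, expectation_of_y)
```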
The next result is very important in the theory of stochastic processes. It shows how to obtain the conditional expectation of Y given X1, . . ., Xn when it is easy to obtain the same given X1, . . ., Xn plus some extra information contained in Xn+1, . . ., Xn+m.
(2.19) THEOREM. For any n, m ≥ 1,
\[ E[\, E[Y \mid X_1, \ldots, X_{n+m}] \mid X_1, \ldots, X_n] = E[Y \mid X_1, \ldots, X_n]. \]
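Proof for n = 2, m = 1, X1, X2, X3, Y discrete. Let Z = f(X1, X2, X3) = E[Y | X1, X2, X3]. We need to show that
\[ E[Z \mid X_1, X_2] = E[Y \mid X_1, X_2]. \tag{2.20} \]
We have
\[ E[Z \mid X_1 = a_1, X_2 = a_2] = \sum_{a_3} f(a_1, a_2, a_3) \, P\{X_3 = a_3 \mid X_1 = a_1, X_2 = a_2\} \]
and
\[ f(a_1, a_2, a_3) = \sum_{b} b \, P\{Y = b \mid X_1 = a_1, X_2 = a_2, X_3 = a_3\}. \]
Putting the two computations together, we get
\[ E[Z \mid X_1 = a_1, X_2 = a_2] = \sum_{b} b \sum_{a_3} P\{Y = b, X_3 = a_3 \mid X_1 = a_1, X_2 = a_2\} = \sum_{b} b \, P\{Y = b \mid X_1 = a_1, X_2 = a_2\} = E[Y \mid X_1 = a_1, X_2 = a_2], \tag{2.21} \]
since
\[ P\{Y = b \mid X_1 = a_1, X_2 = a_2, X_3 = a_3\} \, P\{X_3 = a_3 \mid X_1 = a_1, X_2 = a_2\} = P\{Y = b, X_3 = a_3 \mid X_1 = a_1, X_2 = a_2\} \]
by (1.3.1) and (1.3.2). Noting that (2.21) is the same as (2.20) completes the proof.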
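The case n = 2, m = 1 treated in the proof can also be checked by exact enumeration. In the sketch below the joint distribution of (X1, X2, X3, Y) is hypothetical (the weights are arbitrary positive integers), and `cond_exp` is an illustrative helper, not notation from the text. It first computes Z = E[Y | X1, X2, X3] and then compares E[Z | X1, X2] with E[Y | X1, X2].

```python
from fractions import Fraction as F
from itertools import product
from collections import defaultdict

# Hypothetical joint pmf on outcomes (x1, x2, x3, y); weights are arbitrary positive integers.
weights = {(x1, x2, x3, y): 1 + x1 + 2 * x2 * y + x3 * (y + 1)
           for x1, x2, x3 in product((0, 1), repeat=3) for y in (0, 1, 2)}
total = sum(weights.values())
pmf = {outcome: F(w, total) for outcome, w in weights.items()}

def cond_exp(pmf, value, given):
    """Return {c: E[value(outcome) | given(outcome) = c]} for every attained c."""
    num, den = defaultdict(F), defaultdict(F)
    for outcome, p in pmf.items():
        c = given(outcome)
        num[c] += value(outcome) * p
        den[c] += p
    return {c: num[c] / den[c] for c in den}

# f(a1, a2, a3) = E[Y | X1 = a1, X2 = a2, X3 = a3], so that Z = f(X1, X2, X3).
f = cond_exp(pmf, lambda o: o[3], lambda o: o[:3])

lhs = cond_exp(pmf, lambda o: f[o[:3]], lambda o: o[:2])  # E[Z | X1, X2]
rhs = cond_exp(pmf, lambda o: o[3], lambda o: o[:2])      # E[Y | X1, X2]
print(lhs == rhs)
```

Because the arithmetic is carried out in exact fractions, the two dictionaries agree exactly rather than up to rounding.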
(2.22) COROLLARY. If
then
Proof. This follows from Theorem (2.19) and Proposition (2.15).
Sometimes two collections {Y1, . . ., Ym} and {X1, . . ., Xn} are such that knowing the values of one set determines the values of the other. This is especially the case when Y1 = g1(X1, . . ., Xn), . . ., Ym = gm(X1, . . ., Xn) and conversely X1 = f1(Y1, . . ., Ym), . . ., Xn = fn(Y1, . . ., Ym). Then for any random variable Y, the conditional expectation of Y given X1, . . ., Xn is the same as the conditional expectation of Y given Y1, . . ., Ym. This is so since {X1, . . ., Xn} carries the same information as {Y1, . . ., Ym}. The proof is easy and will be omitted.
(2.23) THEOREM. Suppose the collections {X1, . . ., Xn} and {Y1, . . ., Ym} are such that the knowledge of the random variables in one collection determines the values of the random variables in the other. Then for any Y,
\[ E[Y \mid X_1, \ldots, X_n] = E[Y \mid Y_1, \ldots, Y_m]. \]
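A small computation illustrates the theorem. The joint distribution of (X1, X2, Y) below is hypothetical, and the choice Y1 = X1 + X2, Y2 = X1 – X2 is just one convenient pair of functions from which X1 and X2 can be recovered. The helper `cond_exp_rv` (an illustrative name) returns a conditional expectation as a random variable, that is, as a value attached to each outcome.

```python
from fractions import Fraction as F
from collections import defaultdict

# Hypothetical joint pmf on outcomes (x1, x2, y); the probabilities sum to 1.
pmf = {
    (0, 0, 1): F(1, 10), (0, 0, 4): F(1, 10),
    (0, 1, 1): F(1, 5),  (0, 1, 4): F(1, 10),
    (1, 0, 1): F(1, 10), (1, 0, 4): F(1, 5),
    (1, 1, 1): F(1, 10), (1, 1, 4): F(1, 10),
}

def cond_exp_rv(pmf, value, given):
    """Map each outcome o to E[value | given = given(o)]."""
    num, den = defaultdict(F), defaultdict(F)
    for o, p in pmf.items():
        num[given(o)] += value(o) * p
        den[given(o)] += p
    return {o: num[given(o)] / den[given(o)] for o in pmf}

y = lambda o: o[2]
given_x = cond_exp_rv(pmf, y, lambda o: (o[0], o[1]))                 # E[Y | X1, X2]
given_y = cond_exp_rv(pmf, y, lambda o: (o[0] + o[1], o[0] - o[1]))   # E[Y | Y1, Y2]
given_sum = cond_exp_rv(pmf, y, lambda o: o[0] + o[1])                # E[Y | X1 + X2] only

print(given_x == given_y, given_x == given_sum)
```

Conditioning on X1 + X2 alone loses information (the outcomes (0, 1) and (1, 0) are merged), so in this table it gives a different random variable.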
We close this section by giving an extension of the concept of independence.
(2.24) DEFINITION. The set of random variables {Y1, . . ., Ym} is said to be independent of {X1, . . ., Xn} if
\[ E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n] = E[g(Y_1, \ldots, Y_m)] \]
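for all non-negative functions g. Two stochastic processes {Yt; t ∈ T1} and {Xt; t ∈ T2} are said to be independent of each other if any finite collection {Y_{t_1}, . . ., Y_{t_m}} from the first is independent of any finite collection {X_{s_1}, . . ., X_{s_n}} from the second. Independence of a collection of random variables in the earlier sense is equivalent to the independence, in the sense of (2.24), of any two subcollections. As such, we will not distinguish between the two.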
Next is a new concept, that of conditional independence.
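(2.25) DEFINITION. {Y1, . . ., Ym} is said to be conditionally independent of {Z1, . . ., Zk} given {X1, . . ., Xn} provided that
\[ E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n, Z_1, \ldots, Z_k] = E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n] \]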
for all non-negative functions g. The collection {Yt; t ∈ T1} is said to be conditionally independent of the collection {Zt; t ∈ T2} given the collection {Xt; t ∈ T3} provided that for any finite collection {Y_{t_1}, . . ., Y_{t_m}} from the first and any finite collection {Z_{s_1}, . . ., Z_{s_k}} from the second, the former is conditionally independent of the latter given {Xt; t ∈ T3}.
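(2.26) EXAMPLE. Consider the random variables X, Y, Z of Example (2.14). We showed there that, for any bounded function g,
\[ E[g(Z) \mid X, Y] = \sum_{n=Y+1}^{\infty} g(n) \, p q^{n-Y-1}. \]
Since the right-hand side does not depend on X, we see that Z is conditionally independent of X given Y.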
We also note that Z is not independent of X.
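(2.27) EXAMPLE. Let X1, X2, . . . be a sequence of random variables with E[Xi] = μ for all i. Let N be a non-negative integer-valued random variable independent of X1, X2, . . . with E[N] = λ. For each ω ∈ Ω, let
\[ Y(\omega) = \sum_{i=1}^{N(\omega)} X_i(\omega), \]
with Y(ω) = 0 when N(ω) = 0.
We would like to compute E[Y]. We may think of X1, X2, . . . as the amounts spent by customers 1, 2, . . . and of N as the number of arrivals within the first hour. Then Y is the total revenue within that hour.
By Proposition (2.16),
\[ E[Y] = E[E[Y \mid N]]. \tag{2.28} \]
On the other hand, since N is independent of X1, X2, . . ., for n ≥ 1,
\[ E[Y \mid N = n] = E[X_1 + \cdots + X_n \mid N = n] = E[X_1 + \cdots + X_n] = n\mu. \]
Hence
\[ E[Y \mid N] = \mu N, \]
and by (2.28)
\[ E[Y] = E[\mu N] = \mu\lambda. \]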
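The conclusion of this example, E[Y] = μλ, can be confirmed by brute-force enumeration in a small case. The distributions `x_pmf` (amount spent by one customer) and `n_pmf` (number of arrivals) are hypothetical, and the sketch additionally assumes that X1, X2, . . . are independent and identically distributed, which is stronger than the example requires. The function enumerates every possible value of (N, X1, . . ., XN) rather than using the formula nμ, so the comparison is a genuine check.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions, for illustration only.
x_pmf = {2: F(1, 2), 5: F(1, 3), 10: F(1, 6)}              # amount spent by one customer
n_pmf = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}   # number of arrivals

mu = sum(x * p for x, p in x_pmf.items())    # E[X_i]
lam = sum(n * p for n, p in n_pmf.items())   # E[N]

def expected_total(x_pmf, n_pmf):
    """E[Y] for Y = X_1 + ... + X_N, with N independent of the i.i.d. X_i."""
    total = F(0)
    for n, p_n in n_pmf.items():
        # For n = 0 the product yields one empty tuple, contributing a sum of 0.
        for xs in product(x_pmf, repeat=n):
            p_xs = F(1)
            for x in xs:
                p_xs *= x_pmf[x]
            total += p_n * p_xs * sum(xs)
    return total

print(expected_total(x_pmf, n_pmf), mu * lam)
```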
3. Exercises
(3.1) Find the expected value of the random variable X taking the values –5, 1, 4, 8, 10 with probabilities 0.3, 0.2, 0.2, 0.1, 0.2 respectively.
(3.2) Consider the random variable X taking the values –2, 0, 2 with probabilities 0.4, 0.3, 0.3 respectively. Compute the expected values of X, X², 3X² + 5.
(3.3) Compute the variance and the generating function of the random variable in Example (1.20).
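(3.4) A random variable X is said to have the uniform distribution over [a, b] if
\[ P\{X \le t\} = \begin{cases} 0, & t < a, \\ \dfrac{t - a}{b - a}, & a \le t \le b, \\ 1, & t > b. \end{cases} \]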
(a) Compute E[X], Var(X), E[(X – a)/(b – a)].
(b) Find the distribution of Y = (X – a)/(b – a).
(3.5) Compute the variance and the Laplace transform of the lifetime in Example (1.21).
(3.6) Compute the variance of the intensity of light in Example (1.19).
(3.7) The headway X between two vehicles at a fixed instant is a random variable with
Find the expected value and the variance of the headway.
(3.8) Show that for any constants a and b,
for any random variable X.
(3.9) Show that for any two independent random variables X and Y,
(3.10) The lifetime X of a device has the distribution
(a) Show that E[X] = 1/c.
(b) Show that (see also Example (1.3.8))
(3.11) Let X, Y be as defined in Example (1.2.23).
(a) Compute E[X], E[Y – X], E[Y].
(b) Find E[Y – X | X], E[Y | X].
(c) Show by direct computation that E[E[Y | X]] = E[Y].
(3.12) Suppose X1, X2, . . . are independent and identically distributed non-negative random variables with
Let N be a non-negative integer-valued random variable which is independent of {X1, X2, . . .}, and let
Let S0 = 0, S1 = X1, S2 = X1 + X2, . . ., and let Y = SN.
(a) Compute E[Y | N], E[Y² | N].
(b) Compute E[Y], E[Y²], Var(Y).
(c) Show that for any ,
where
The chapter just finished completes our account of the preliminaries necessary for studying stochastic processes.
Especially in view of the monstrous looks of the last section, it seems all too advisable to inquire how the reader’s patience is holding out and to assure him that he will in time come to appreciate the true friendliness of these concepts.
For a deeper treatment and for the proofs which we have omitted, we refer the reader to CHUNG [2].
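†Note that, for x ∈ (0, 1),
\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}. \]
If we differentiate both sides we get
\[ \sum_{n=1}^{\infty} n x^{n-1} = \frac{1}{(1 - x)^2} \]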
and .