Conditional Expectations - Expectations and Independence

2. Conditional Expectations

Let Y be a discrete random variable taking values in ℝ, and let A be an event with P(A) > 0. Then the conditional probability that Y = b given the event A is (see (1.3.1))

$$P\{Y = b \mid A\} = \frac{P(\{Y = b\} \cap A)}{P(A)}. \tag{2.1}$$

As b varies, this is called the conditional distribution of Y given the event A. We define the conditional expectation of Y given the event A as

$$E[Y \mid A] = \sum_b b \, P\{Y = b \mid A\}. \tag{2.2}$$

In particular, when A = {X = a} for a discrete random variable X taking values in a set E,

$$E[Y \mid X = a] = \sum_b b \, P\{Y = b \mid X = a\} \tag{2.3}$$

is called the conditional expectation of Y given that X = a. As a varies, (2.3) defines a function f on E by

$$f(a) = E[Y \mid X = a], \qquad a \in E. \tag{2.4}$$

By the conditional expectation of Y given X, written E[Y | X], we mean the random variable f(X); that is,

$$E[Y \mid X] = f(X), \tag{2.5}$$

where f is as defined by (2.4). The following definition generalizes this.

(2.6) DEFINITION. Let X₁, . . ., X_n be discrete random variables taking values in E, and let Y be a discrete random variable with values in ℝ. Then the conditional expectation of Y given X₁, . . ., X_n is

$$E[Y \mid X_1, \ldots, X_n] = f(X_1, \ldots, X_n),$$

where for any n-tuple (a₁, . . ., a_n) with a_i ∈ E,

$$f(a_1, \ldots, a_n) = E[Y \mid X_1 = a_1, \ldots, X_n = a_n].$$

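If Y is not discrete, then a similar definition is given in terms of its conditional distribution P{Y ≤ t | X₁ = a₁, . . ., X_n = a_n}. For example, if Y is non-negative,

$$E[Y \mid X_1, \ldots, X_n] = f(X_1, \ldots, X_n),$$

where

$$f(a_1, \ldots, a_n) = \int_0^\infty \left(1 - P\{Y \le t \mid X_1 = a_1, \ldots, X_n = a_n\}\right) dt$$

for all a₁, . . ., a_n ∈ E.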
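For the discrete case with a single conditioning variable, the construction of f and of E[Y | X] = f(X) can be sketched in a few lines. The joint distribution below is hypothetical and chosen only for illustration; exact fractions are used so that the values can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf P{X = a, Y = b}; not taken from the text.
joint = {
    (1, 0): Fraction(1, 8),
    (1, 2): Fraction(3, 8),
    (2, 0): Fraction(1, 4),
    (2, 4): Fraction(1, 4),
}


def conditional_expectation(joint):
    """Return f as a dict with f[a] = E[Y | X = a] = sum_b b * P{Y = b | X = a}."""
    p_x = defaultdict(Fraction)       # marginal P{X = a}
    weighted = defaultdict(Fraction)  # sum_b b * P{X = a, Y = b}
    for (a, b), p in joint.items():
        p_x[a] += p
        weighted[a] += b * p
    return {a: weighted[a] / p_x[a] for a in p_x}


f = conditional_expectation(joint)
for a in sorted(f):
    print(f"E[Y | X = {a}] = {f[a]}")
```

Here E[Y | X] is the random variable that takes the value f[1] when X = 1 and f[2] when X = 2; it is a function of X, not a number.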

For any event A, its indicator function I_A (which is such that I_A(ω) = 1 or 0 according as ω ∈ A or not) is a random variable. Then we define the conditional probability of A given X₁, . . ., X_n as

$$P\{A \mid X_1, \ldots, X_n\} = E[I_A \mid X_1, \ldots, X_n].$$

The following are some easy properties of conditional expectations. These are analogous to Propositions (1.22), (1.23), (1.24), and (1.25). We omit the proofs.

(2.10) PROPOSITION. Let Y be a discrete random variable with values in E and g a function from E into ℝ. Then

(2.11) PROPOSITION. Let Y₁, . . ., Y_m be discrete random variables with values in E, and let g be a function from E^m into ℝ. Then

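(2.12) COROLLARY. If Y₁, . . ., Y_m take values in ℝ and c₁, . . ., c_m are constants, then

$$E[c_1 Y_1 + \cdots + c_m Y_m \mid X_1, \ldots, X_n] = c_1 E[Y_1 \mid X_1, \ldots, X_n] + \cdots + c_m E[Y_m \mid X_1, \ldots, X_n].$$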

(2.13) EXAMPLE. Let X and Y be two random variables with

Let f(b) = E[Y | X = b], b = 1, 2. Then

Thus,

(2.14) EXAMPLE. Consider three random variables X, Y, and Z with joint distribution

for k = 1, . . ., m – 1; m = 2, . . ., n – 1; n = 3, 4, . . ., where 0 < p < 1, p + q = 1.

Then for k = 1, . . ., m – 1; m = 2, 3, . . .;

Thus, for k = 1, . . ., m – 1 and m = 2, . . ., n – 1, we have

Hence, for k = 1, . . ., m – 1 and m = 2, 3, . . .,

Thus,

We note that for any bounded function g,

so that

In particular, if g(b) = α^b for some α ∈ [0, 1], then

The next proposition states that if the knowledge of X₁, . . ., X_n determines Y completely, then the conditional expectation of Y given X₁, . . ., X_n is equal to Y itself. The proof is very easy and we omit it.

(2.15) PROPOSITION. If Y can be written as

$$Y = f(X_1, \ldots, X_n)$$

for some function f, then

$$E[Y \mid X_1, \ldots, X_n] = Y.$$

Next is a very useful result used in situations where E[Y | X₁, . . ., X_n] is easy to obtain or known somehow. Since E[Y | X₁, . . ., X_n] is a random variable taking real values, we can talk about its expected value. That expected value is the same as the expectation of Y. In words, the expected value of any conditional expectation of Y is equal to the expected value of Y.

(2.16) PROPOSITION. E[E[Y | X₁, . . ., X_n]] = E[Y].
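As a quick sanity check of the statement in the case n = 1, the sketch below computes both sides exactly for a hypothetical joint distribution of (X, Y). The distribution and the variable names are assumptions made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf P{X = a, Y = b}; not taken from the text.
joint = {
    (0, 1): Fraction(1, 10),
    (0, 3): Fraction(3, 10),
    (1, 1): Fraction(2, 5),
    (1, 5): Fraction(1, 5),
}

# Marginal of X and the function f(a) = E[Y | X = a].
p_x = defaultdict(Fraction)
weighted = defaultdict(Fraction)
for (a, b), p in joint.items():
    p_x[a] += p
    weighted[a] += b * p
f = {a: weighted[a] / p_x[a] for a in p_x}

# Left side: E[E[Y | X]] = E[f(X)] = sum_a f(a) P{X = a}.
e_f_x = sum(f[a] * p_x[a] for a in p_x)

# Right side: E[Y] computed directly from the joint pmf.
e_y = sum(b * p for (_, b), p in joint.items())

print(e_f_x, e_y)
```

Both quantities are computed with exact fractions, so agreement here is an identity rather than a numerical coincidence.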

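Proof for discrete Y, X₁, . . ., X_n. Let

$$f(a_1, \ldots, a_n) = E[Y \mid X_1 = a_1, \ldots, X_n = a_n];$$

then

$$E[E[Y \mid X_1, \ldots, X_n]] = E[f(X_1, \ldots, X_n)] = \sum_{a_1, \ldots, a_n} f(a_1, \ldots, a_n) \, P\{X_1 = a_1, \ldots, X_n = a_n\}. \tag{2.17}$$

On the other hand,

$$f(a_1, \ldots, a_n) = \sum_b b \, P\{Y = b \mid X_1 = a_1, \ldots, X_n = a_n\}. \tag{2.18}$$

Putting (2.18) into (2.17) and changing the order of summation, noting Definition (1.3.1) of conditional probabilities, we obtain

$$E[E[Y \mid X_1, \ldots, X_n]] = \sum_b b \sum_{a_1, \ldots, a_n} P\{Y = b, X_1 = a_1, \ldots, X_n = a_n\} = \sum_b b \, P\{Y = b\} = E[Y].$$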

The next result is very important in the theory of stochastic processes. It shows how to obtain the conditional expectation of Y given X₁, . . ., X_n when it is easy to obtain the same given X₁, . . ., X_n plus some extra information contained in X_{n+1}, . . ., X_{n+m}.

(2.19) THEOREM. For any n, m ≥ 1,

$$E[E[Y \mid X_1, \ldots, X_{n+m}] \mid X_1, \ldots, X_n] = E[Y \mid X_1, \ldots, X_n].$$

Proof for n = 2, m = 1, X₁, X₂, X₃, Y discrete. Let Z = f(X₁, X₂, X₃) = E[Y | X₁, X₂, X₃]. We need to show that

We have

and

Putting the two computations together, we get

since

by (1.3.1) and (1.3.2). Noting that (2.21) is the same as (2.20) completes the proof.
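The theorem can also be checked numerically in a small case. The sketch below takes n = 1 and m = 1 (rather than the n = 2, m = 1 of the proof) and verifies that conditioning E[Y | X₁, X₂] on X₁ alone gives E[Y | X₁]. The joint distribution and the helper `cond_exp` are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf of (X1, X2, Y) on {0,1} x {0,1} x {0,1,2};
# the weights are arbitrary positive integers, normalized to sum to 1.
outcomes = list(product([0, 1], [0, 1], [0, 1, 2]))
weights = [1, 2, 3, 2, 2, 1, 4, 1, 1, 3, 1, 3]
total = sum(weights)
joint = {o: Fraction(w, total) for o, w in zip(outcomes, weights)}


def cond_exp(values, joint, keep):
    """Conditional expectation of V given the coordinates listed in `keep`.

    `values[o]` is the value of V at outcome o. The result maps each
    combination of kept coordinates to the conditional expectation.
    """
    mass = defaultdict(Fraction)
    weighted = defaultdict(Fraction)
    for o, p in joint.items():
        key = tuple(o[i] for i in keep)
        mass[key] += p
        weighted[key] += values[o] * p
    return {k: weighted[k] / mass[k] for k in mass}


y = {o: Fraction(o[2]) for o in joint}        # Y as a function of the outcome
inner = cond_exp(y, joint, keep=(0, 1))       # E[Y | X1, X2]
z = {o: inner[(o[0], o[1])] for o in joint}   # Z = E[Y | X1, X2] as a random variable
lhs = cond_exp(z, joint, keep=(0,))           # E[Z | X1]
rhs = cond_exp(y, joint, keep=(0,))           # E[Y | X1]

print(lhs == rhs)
```

Treating Z as a function on the outcome space mirrors the proof, where Z = f(X₁, X₂, X₃) is itself a random variable whose conditional expectation is then taken.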

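(2.22) COROLLARY. If

$$E[Y \mid X_1, \ldots, X_{n+m}] = f(X_1, \ldots, X_n)$$

for some function f, then

$$E[Y \mid X_1, \ldots, X_n] = f(X_1, \ldots, X_n).$$

Proof. By Theorem (2.19) and Proposition (2.15),

$$E[Y \mid X_1, \ldots, X_n] = E[E[Y \mid X_1, \ldots, X_{n+m}] \mid X_1, \ldots, X_n] = E[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_n] = f(X_1, \ldots, X_n).$$

Sometimes two collections {Y₁, . . ., Y_m} and {X₁, . . ., X_n} are such that knowing the values of one set determines the values of the other. This is especially the case when Y₁ = g₁(X₁, . . ., X_n), . . ., Y_m = g_m(X₁, . . ., X_n) and conversely X₁ = f₁(Y₁, . . ., Y_m), . . ., X_n = f_n(Y₁, . . ., Y_m). Then for any random variable Y, the conditional expectation of Y given X₁, . . ., X_n is the same as the conditional expectation of Y given Y₁, . . ., Y_m. This is so since {X₁, . . ., X_n} carries the same information as {Y₁, . . ., Y_m}. The proof is easy and will be omitted.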

(2.23) THEOREM. Suppose the collections {X₁, . . ., X_n} and {Y₁, . . ., Y_m} are such that the knowledge of the random variables in one collection determines the values of the random variables in the other. Then for any Y,

$$E[Y \mid X_1, \ldots, X_n] = E[Y \mid Y_1, \ldots, Y_m].$$

We close this section by giving an extension of the concept of independence.

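(2.24) DEFINITION. The set of random variables {Y₁, . . ., Y_m} is said to be independent of {X₁, . . ., X_n} if

$$E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n] = E[g(Y_1, \ldots, Y_m)]$$

for all non-negative functions g. Two stochastic processes {Y_t; t ∈ T₁} and {X_t; t ∈ T₂} are said to be independent of each other if any finite collection {Y_{t₁}, . . ., Y_{t_m}} from the first is independent of any finite collection {X_{s₁}, . . ., X_{s_n}} from the second.

The independence of a collection of random variables, as defined earlier, is equivalent to the independence, in the sense of (2.24), of any two subcollections. As such, we will not distinguish between the two.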

Next is a new concept, that of conditional independence.

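(2.25) DEFINITION. {Y₁, . . ., Y_m} is said to be conditionally independent of {Z₁, . . ., Z_k} given {X₁, . . ., X_n} provided that

$$E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n, Z_1, \ldots, Z_k] = E[g(Y_1, \ldots, Y_m) \mid X_1, \ldots, X_n]$$

for all non-negative functions g. The collection {Y_t; t ∈ T₁} is said to be conditionally independent of the collection {Z_t; t ∈ T₂} given the collection {X_t; t ∈ T₃} provided that for any finite collection {Y_{t₁}, . . ., Y_{t_m}} from the first and any finite collection {Z_{s₁}, . . ., Z_{s_k}} from the second, the former is conditionally independent of the latter given {X_t; t ∈ T₃}.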

(2.26) EXAMPLE. Consider the random variables X, Y, Z of Example (2.14). We had shown that

The right-hand side being independent of X, we see that Z is conditionally independent of X given Y.

We also note that Z is not independent of X.
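The same pattern, conditionally independent given the middle variable yet not independent outright, can be seen in a much smaller model. The sketch below does not use the distribution of Example (2.14); it assumes a hypothetical model with X, U, V independent and uniform on two points, Y = X + U and Z = Y + V. Since Z takes finitely many values, it is enough to test the indicator functions g = 1{Z = z}.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical model (not Example (2.14)): X, U, V independent,
# X and U uniform on {0, 1}, V uniform on {1, 2}; Y = X + U, Z = Y + V.
joint = defaultdict(Fraction)  # pmf of the triple (X, Y, Z)
for x, u, v in product([0, 1], [0, 1], [1, 2]):
    joint[(x, x + u, x + u + v)] += Fraction(1, 8)


def cond_exp(g, keep):
    """E[g(Z) | coordinates in `keep`], keyed by the values of those coordinates."""
    mass = defaultdict(Fraction)
    weighted = defaultdict(Fraction)
    for o, p in joint.items():
        key = tuple(o[i] for i in keep)
        mass[key] += p
        weighted[key] += g(o[2]) * p
    return {k: weighted[k] / mass[k] for k in mass}


# Conditional independence of Z and X given Y:
# E[g(Z) | X, Y] must agree with E[g(Z) | Y] for every indicator g.
cond_indep = True
for z0 in sorted({o[2] for o in joint}):
    g = lambda z, z0=z0: Fraction(int(z == z0))
    given_xy = cond_exp(g, keep=(0, 1))
    given_y = cond_exp(g, keep=(1,))
    cond_indep &= all(given_xy[(x, y)] == given_y[(y,)] for (x, y) in given_xy)

# Z is not independent of X: E[Z | X = x] changes with x.
given_x = cond_exp(lambda z: Fraction(z), keep=(0,))
e_z = sum(o[2] * p for o, p in joint.items())
independent_of_x = all(value == e_z for value in given_x.values())

print(cond_indep, independent_of_x)
```

One function g with E[g(Z) | X] ≠ E[g(Z)] is enough to rule out independence, which is why the second check uses only g(z) = z.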

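(2.27) EXAMPLE. Let X₁, X₂, . . . be a sequence of random variables with E[X_i] = μ for all i. Let N be a non-negative integer-valued random variable independent of X₁, X₂, . . . with E[N] = λ. For each ω ∈ Ω, let

$$Y(\omega) = \begin{cases} 0 & \text{if } N(\omega) = 0, \\ X_1(\omega) + \cdots + X_{N(\omega)}(\omega) & \text{if } N(\omega) \ge 1. \end{cases}$$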

We would like to compute E[Y]. We may think of X₁, X₂, . . . as the amounts spent by customers 1, 2, . . . and of N as the number of arrivals within the first hour. Then Y is the total revenue within that hour.

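By Proposition (2.16),

$$E[Y] = E[E[Y \mid N]]. \tag{2.28}$$

On the other hand, since N is independent of X₁, X₂, . . ., for n ≥ 1,

$$E[Y \mid N = n] = E[X_1 + \cdots + X_n \mid N = n] = E[X_1 + \cdots + X_n] = n\mu.$$

Hence

$$E[Y \mid N] = \mu N,$$

and by (2.28)

$$E[Y] = E[\mu N] = \mu\lambda.$$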
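For a random sum Y = X₁ + · · · + X_N with N independent of the X_i, conditioning on N gives E[Y | N] = μN and hence E[Y] = μλ. The sketch below checks this by brute-force enumeration. It assumes, purely for illustration, that the X_i are independent with a common two-point distribution and that N takes values in {0, 1, 2, 3}; none of these numbers comes from the text.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical distributions, chosen only for illustration.
x_pmf = {1: Fraction(3, 4), 3: Fraction(1, 4)}  # amount spent by one customer
n_pmf = {0: Fraction(1, 8), 1: Fraction(3, 8),
         2: Fraction(3, 8), 3: Fraction(1, 8)}   # number of arrivals

mu = sum(x * p for x, p in x_pmf.items())   # E[X_i]
lam = sum(n * p for n, p in n_pmf.items())  # E[N]


def expected_sum_given_n(n):
    """E[X_1 + ... + X_n], by enumerating all n-tuples of customer amounts."""
    total = Fraction(0)
    for xs in product(x_pmf, repeat=n):
        total += sum(xs) * prod(x_pmf[x] for x in xs)
    return total


# E[Y] = sum_n P{N = n} E[Y | N = n], using the independence of N and the X_i.
e_y = sum(p_n * expected_sum_given_n(n) for n, p_n in n_pmf.items())

print(e_y, mu * lam)
```

The enumeration never uses the shortcut E[Y | N = n] = nμ, so agreement with μλ is an independent check of the computation above.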

3. Exercises

(3.1) Find the expected value of the random variable X taking the values –5, 1, 4, 8, 10 with probabilities 0.3, 0.2, 0.2, 0.1, 0.2 respectively.

(3.2) Consider the random variable X taking the values –2, 0, 2 with probabilities 0.4, 0.3, 0.3 respectively. Compute the expected values of X, X², 3X² + 5.

(3.3) Compute the variance and the generating function of the random variable in Example (1.20).

(3.4) A random variable X is said to have the uniform distribution over [a, b] if

(a) Compute E[X], Var(X), E[(X – a)/(b – a)].

(b) Find the distribution of Y = (X – a)/(b – a).

(3.5) Compute the variance and the Laplace transform of the lifetime in Example (1.21).

(3.6) Compute the variance of the intensity of light in Example (1.19).

(3.7) The headway X between two vehicles at a fixed instant is a random variable with

Find the expected value and the variance of the headway.

(3.8) Show that for any constants a and b,

for any random variable X.

(3.9) Show that for any two independent random variables X and Y,

(3.10) The lifetime X of a device has the distribution

(a) Show that E[X] = 1/c.

(b) Show that (see also Example (1.3.8))

(3.11) Let X, Y be as defined in Example (1.2.23).

(a) Compute E[X], E[Y – X], E[Y].

(b) Find E[Y – X | X], E[Y | X].

(3.12) Suppose X₁, X₂, . . . are independent and identically distributed non-negative random variables with

Let N be a non-negative integer-valued random variable which is independent of {X₁, X₂, . . .}, and let

Let S₀ = 0, S₁ = X₁, S₂ = X₁ + X₂, . . ., and let Y = S_N.

(a) Compute E[Y | N], E[Y² | N].

(b) Compute E[Y], E[Y²], Var(Y).

where

The chapter just finished completes our account of the preliminaries necessary for studying stochastic processes.

Especially in view of the monstrous looks of the last section, it seems all too advisable to inquire how the reader’s patience is holding out and to assure him that he will in time come to appreciate the true friendliness of these concepts.

For a deeper treatment and for the proofs which we have omitted, we refer the reader to CHUNG [2].

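†Note that, for x ∈ (0, 1),

$$\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}.$$

If we differentiate both sides we get

$$\sum_{n=1}^{\infty} n x^{n-1} = \frac{1}{(1 - x)^2}$$

and, differentiating once more,

$$\sum_{n=2}^{\infty} n(n-1) x^{n-2} = \frac{2}{(1 - x)^3}.$$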

CHAPTER 3

Bernoulli Processes and Sums of

In document Erhan Cinlar Introduction to Stochastic Processes (Page 42-53)