Expectations and Independence

1. Expected Value

Let Ω be a sample space, P a probability measure, and X a discrete random variable defined on Ω.

Let the values X takes be b0, b1, . . . (all non-negative), and put Bn = {ω: X(ω) = bn}. Then B0, B1, . . . are disjoint and their union is Ω. The function X is equal to bn on the set Bn, whose measure is P(Bn). So the integral of the function X with respect to the measure P is

$$\sum_{n} b_n \, P(B_n) \tag{1.1}$$

(we allow it to be +∞). (See Figure 2.1.1.) Note that the right-hand side, divided by 1 = P(Ω), can also be viewed as the weighted average of the function X with respect to the weight distribution given by P. Replacing P(Bn) by P{X = bn} in (1.1), we now make the following

Figure 2.1.1 Expected value of a discrete random variable is the sum of its values weighted by the corresponding probabilities.

(1.2) DEFINITION. The expected value of a discrete random variable X taking values in the set E ⊂ [0, ∞) is

$$E[X] = \sum_{a \in E} a \, P\{X = a\}.$$

The preceding defines the expected value of X when it is a discrete non-negative random variable.
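Definition (1.2) can be sketched in a few lines of Python. The function name `expected_value` and the fair-die distribution are illustrative assumptions, not part of the text; exact fractions are used so that the weighted sum has no rounding error.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return the sum of a * P{X = a} over the support, as in Definition (1.2).

    `pmf` maps each value a to the probability P{X = a}.
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(a * p for a, p in pmf.items())

# Hypothetical example: the face shown by a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_die = expected_value(die)
print(mean_die)
```

For a finite support the sum always converges; for a countably infinite support the same formula applies, but the sum may be +∞.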

We extend this first to arbitrary non-negative random variables and then to arbitrary random variables.

Suppose X is a non-negative real-valued random variable. Then it is possible to find discrete random variables X1, X2, . . . such that

$$X_1(\omega) \le X_2(\omega) \le \cdots \tag{1.3}$$

and

$$\lim_{n \to \infty} X_n(\omega) = X(\omega) \tag{1.4}$$

for all ω. Since each Xn is discrete, its expected value E[Xn] is well defined by (1.2). By our interpretation of E[Xn] as an integral, it is easy to see that

$$E[X_1] \le E[X_2] \le \cdots,$$

and it seems reasonable to make the following

(1.5) DEFINITION. Let X be a non-negative random variable, and let X1, X2, . . . be discrete random variables satisfying (1.3) and (1.4). Then we define the expected value of X to be

$$E[X] = \lim_{n \to \infty} E[X_n].$$

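Finally, if X is an arbitrary real-valued random variable (not necessarily non-negative), and if we define

$$Y(\omega) = \max\{X(\omega),\, 0\} \tag{1.6}$$

and

$$Z(\omega) = \max\{-X(\omega),\, 0\} \tag{1.7}$$

for all ω ∈ Ω, then both Y and Z are non-negative random variables, and

$$X = Y - Z.$$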

Definition (1.5) provides the meanings for the expected values of Y and Z, and we now make the following

(1.8) DEFINITION. Let X be an arbitrary random variable with values in ℝ = (−∞, +∞), and let Y and Z be defined by (1.6) and (1.7). Then

$$E[X] = E[Y] - E[Z],$$

provided that at least one of the numbers E[Y] and E[Z] is finite. If E[Y] = E[Z] = +∞, then X is said to have no expected value.

Definitions (1.2) and (1.8) are quite workable, but (1.5) is not. In fact, we have not even settled the matter of nonambiguity. If {Xn} is a sequence of discrete random variables increasing to X, and if {Yn} is another sequence of discrete random variables also increasing to X, then Definition (1.5) would put E[X] = limn E[Xn] and E[X] = limn E[Yn]. How do we know that these two numbers are the same? Indeed, they are the same, as the proof of the next theorem shows. As a by-product we obtain a nice computational formula.

(1.9) THEOREM. For any non-negative random variable X,

$$E[X] = \int_0^\infty P\{X > t\} \, dt. \tag{1.10}$$

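Proof. First, suppose X is discrete with values in E. Then using Definition (1.2) and changing the order of summation and integration, we get (see Figure 2.1.2)

$$E[X] = \sum_{a \in E} a \, P\{X = a\} = \sum_{a \in E} \left( \int_0^a dt \right) P\{X = a\} = \int_0^\infty \sum_{a \in E:\, a > t} P\{X = a\} \, dt = \int_0^\infty P\{X > t\} \, dt. \tag{1.11}$$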

This establishes (1.10) for X discrete.

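Figure 2.1.2 For non-negative discrete X, E[X] is the shaded area no matter how that is sliced.

Let X be an arbitrary non-negative random variable, and let X1, X2, . . . be discrete random variables increasing to X. Then (1.11) applies to each Xn, and we have

$$E[X_n] = \int_0^\infty P\{X_n > t\} \, dt. \tag{1.12}$$

On the other hand, since the Xn increase to X, for any t ≥ 0,

$$\{X_1 > t\} \subset \{X_2 > t\} \subset \cdots \quad \text{and} \quad \bigcup_{n} \{X_n > t\} = \{X > t\}.$$

Thus, Proposition (1.1.11) applies to give

$$\lim_{n \to \infty} P\{X_n > t\} = P\{X > t\}. \tag{1.13}$$

It follows from (1.12) and (1.13) and the monotonicity of the convergence that, by Definition (1.5), we have

$$E[X] = \lim_{n \to \infty} E[X_n] = \lim_{n \to \infty} \int_0^\infty P\{X_n > t\} \, dt = \int_0^\infty P\{X > t\} \, dt.$$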

This completes the proof.
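For a discrete X with finitely many values, both ways of slicing the area in Figure 2.1.2 can be computed exactly. The following is a minimal sketch; the function names and the three-point distribution are hypothetical. Since t ↦ P{X > t} is a step function here, its integral is a finite sum of rectangle areas.

```python
from fractions import Fraction

def mean_by_values(pmf):
    """Sum of a * P{X = a}: the area sliced by values, as in Definition (1.2)."""
    return sum(a * p for a, p in pmf.items())

def mean_by_tail_area(pmf):
    """Integral of P{X > t} over t >= 0 for a non-negative discrete X."""
    area = Fraction(0)
    previous = Fraction(0)
    tail = Fraction(1)  # P{X > t} for t between `previous` and the next value
    for a in sorted(pmf):
        area += (a - previous) * tail  # rectangle of height P{X > t}
        tail -= pmf[a]                 # the mass at a no longer exceeds t
        previous = a
    return area

# Hypothetical distribution on the values 0, 1 and 3.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}
print(mean_by_values(pmf), mean_by_tail_area(pmf))
```

Both functions return the same number, which is the content of (1.11) in this special case.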

We note that, in Definition (1.5), the particular sequence chosen to approximate X does not affect the value E[X]. Formula (1.10) is in general easy to use if the distribution of X is known. (See Figure 2.1.3.) In the case of discrete random variables taking integer values 0, 1, 2, . . ., it reduces further to a simpler sum:

(1.14) COROLLARY. If X is a random variable taking values in ℕ = {0, 1, 2, . . .}, then

$$E[X] = \sum_{n=0}^{\infty} P\{X > n\}.$$

Figure 2.1.3 Expected value of a non-negative random variable is the shaded area lying above its distribution function.

Figure 2.1.4. Expected value of the random variable X is the difference E[Y] – E[Z] of the two shaded areas.

In the case of arbitrary random variables, using Theorem (1.9) to compute E[Y] and E[Z] in Definition (1.8), we obtain (see Figure 2.1.4)

(1.15) COROLLARY. For any real-valued random variable X,

$$E[X] = \int_0^\infty P\{X > t\} \, dt - \int_{-\infty}^0 P\{X \le t\} \, dt,$$

provided that at least one term on the right is finite.

In the formula given by (1.15), if we integrate by parts we obtain

(1.16) COROLLARY. For any real-valued random variable X with distribution function φ,

$$E[X] = \int_{-\infty}^{\infty} t \, d\varphi(t),$$

provided the integral converges absolutely.

In computing a particular expectation, the choice of one formula over another is largely a matter of convenience. If there is a closed form expression for P{X > t}, in general, it is easier to use Theorem (1.9) and Corollary (1.15). Otherwise, it is easier to use Corollary (1.16) or its discrete equivalent, Definition (1.2).
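The two routes can be compared numerically. The sketch below assumes, purely for illustration, a lifetime with P{X > t} = e^(−ct) for a hypothetical rate c = 0.5, so that dφ(t) = c e^(−ct) dt; the helper `trapezoid` and the truncation point `UPPER` are likewise assumptions of this example.

```python
import math

RATE = 0.5    # hypothetical rate c; the exact mean is 1 / RATE = 2
UPPER = 60.0  # truncation point; the neglected tail is of order e**(-30)

def survival(t):
    """P{X > t} for the assumed exponential lifetime."""
    return math.exp(-RATE * t)

def density(t):
    """Derivative of the distribution function, so that d(phi)(t) = density(t) dt."""
    return RATE * math.exp(-RATE * t)

def trapezoid(g, lower, upper, steps):
    """Composite trapezoidal rule for the integral of g over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (g(lower) + g(upper))
    for i in range(1, steps):
        total += g(lower + i * h)
    return total * h

# Theorem (1.9): integrate the tail probability.
mean_from_tail = trapezoid(survival, 0.0, UPPER, 60_000)
# Corollary (1.16): integrate t against the distribution.
mean_from_density = trapezoid(lambda t: t * density(t), 0.0, UPPER, 60_000)
print(mean_from_tail, mean_from_density)
```

Both integrals approximate the same mean; the tail integral needs only the closed form of P{X > t}, which is why it tends to be the more convenient route when that form is available.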

(1.17) EXAMPLE. The number of arrivals into a store during a specified time interval is a random variable X with

Then, using Definition (1.2),

(1.18) EXAMPLE. The lifetime X of an item has the distribution

This is a non-negative random variable; it is easier to compute E[X] by using Theorem (1.9). We obtain

(1.19) EXAMPLE. The intensity X of light falling on a certain surface has a distribution φ given by

This distribution is called “the normal distribution with mean α and variance β².” Using Corollary (1.16),

(1.20) EXAMPLE. A discrete random variable X has the distribution

where p, q > 0, p + q = 1. If we use Definition (1.2),†

On the other hand, if we choose to use Corollary (1.14), we first compute

for all n = 0, 1, 2, . . .; then

(1.21) EXAMPLE. A piece of equipment has two components whose life-times X and Y are independent random variables with distributions

The equipment fails if either one of the two components does, namely, the lifetime of the equipment is Z = min(X, Y). To compute E[Z] we use Theorem (1.9). Now, Z > t if and only if both X > t and Y > t.

So

$$P\{Z > t\} = P\{X > t,\ Y > t\} = P\{X > t\} \, P\{Y > t\},$$

where the second equality follows from the independence of X and Y (see Definition (1.2.21)). So

If X is a random variable taking values in E, and if f is a function from E into ℝ, then f(X) is a random variable with values in ℝ. Given the distribution of X, one can obtain the distribution of Y = f(X) and, using that, compute the expected value of Y by using the formulae of the preceding propositions. However, it is much easier to think of E[Y] as the integral of the function Y with respect to P and obtain it as in Definition (1.2):

(1.22) PROPOSITION. Let X be a discrete random variable taking values in E, and let f be a function from E into ℝ. Then

$$E[f(X)] = \sum_{a \in E} f(a) \, P\{X = a\},$$

provided the sum is absolutely convergent.

Proof. The random variable Y = f(X) takes the value f(a) on the set {X = a}, whose measure is P{X = a}. The integral of Y therefore is Σ f(a) P{X = a}, where the summation is over all a ∈ E.
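The shortcut in Proposition (1.22) can be compared with the longer route through the distribution of Y = f(X). In this sketch the names `expect_function` and `pushforward`, the four-point distribution, and the choice f(a) = a² are all illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

def expect_function(pmf, f):
    """Sum of f(a) * P{X = a} over the support, as in Proposition (1.22)."""
    return sum(f(a) * p for a, p in pmf.items())

def pushforward(pmf, f):
    """Distribution of Y = f(X): collect the mass of every a with f(a) = b."""
    result = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for a, p in pmf.items():
        result[f(a)] += p
    return dict(result)

def square(a):
    return a * a

# Hypothetical X, uniform on the four values -2, -1, 1, 2.
pmf_x = {a: Fraction(1, 4) for a in (-2, -1, 1, 2)}

direct = expect_function(pmf_x, square)
pmf_y = pushforward(pmf_x, square)
via_distribution = sum(b * p for b, p in pmf_y.items())
print(direct, via_distribution)
```

The two answers agree, but the direct sum never needs the distribution of Y, which is the practical point of the proposition.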

In the case of arbitrary (instead of discrete) random variables, the same reasoning gives the following

(1.23) PROPOSITION. Let X be a random variable with values in E, and let f be a function from E into ℝ. Then

$$E[f(X)] = \int_E f(a) \, d\varphi(a),$$

where φ is the distribution function of X, provided that the integral is absolutely convergent.

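Proof is omitted. If instead we had a function of more than one random variable, the preceding two propositions become as follows. Again, we omit the proof.

(1.24) THEOREM. Let X and Y be discrete random variables taking values in E, and let f be a function from E × E into ℝ. Then

$$E[f(X, Y)] = \sum_{a \in E} \sum_{b \in E} f(a, b) \, P\{X = a,\ Y = b\},$$

provided the sum is absolutely convergent.

(1.25) COROLLARY. Let X and Y be random variables with finite expectations, and let c and d be real numbers. Then

(a) E[cX] = cE[X];

(b) E[X + Y] = E[X] + E[Y];

(c) E[cX + dY] = cE[X] + dE[Y].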

Proof of (a) is immediate from Proposition (1.23), where we take f(a) = ca and then use Corollary (1.16). Proof of (b) follows from (1.24) by taking f(a, b) = a + b, and (c) is immediate from (a) and (b).
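The linearity statements can be checked on a small joint distribution. The sketch below deliberately uses a hypothetical pair in which Y is a function of X, so the two variables are as far from independent as possible; the names `expect_joint`, `c`, and `d` are assumptions of this example.

```python
from fractions import Fraction

def expect_joint(joint_pmf, f):
    """Sum of f(a, b) * P{X = a, Y = b} over the joint support."""
    return sum(f(a, b) * p for (a, b), p in joint_pmf.items())

# Hypothetical joint distribution: X is uniform on {-1, 0, 1} and Y = X * X,
# so Y is completely determined by X.
joint_pmf = {
    (-1, 1): Fraction(1, 3),
    (0, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 3),
}

mean_x = expect_joint(joint_pmf, lambda a, b: a)
mean_y = expect_joint(joint_pmf, lambda a, b: b)

c, d = 3, 2
mean_combination = expect_joint(joint_pmf, lambda a, b: c * a + d * b)
print(mean_combination, c * mean_x + d * mean_y)
```

The expectation of the combination equals the same combination of the expectations, even though X and Y are dependent.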

We note that in the preceding corollary, we made no assumption of independence: whether or not the random variables are independent, the expected value of any linear combination of them is equal to the same linear combination of their expectations. The following is the analog for the case of multiplication.

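(1.26) PROPOSITION. Let X and Y be two independent random variables taking values in E, and let g and h be two functions from E into ℝ. Then

$$E[g(X) h(Y)] = E[g(X)] \, E[h(Y)].$$

Proof for discrete X, Y. Put f(a, b) = g(a)h(b) in Theorem (1.24). Then

$$E[g(X) h(Y)] = \sum_{a \in E} \sum_{b \in E} g(a) h(b) \, P\{X = a,\ Y = b\}.$$

But the independence of X and Y implies that

$$P\{X = a,\ Y = b\} = P\{X = a\} \, P\{Y = b\}$$

for any a and b. So

$$E[g(X) h(Y)] = \sum_{a \in E} g(a) \, P\{X = a\} \sum_{b \in E} h(b) \, P\{Y = b\} = E[g(X)] \, E[h(Y)].$$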

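The assumption of independence in this proposition is crucial: if X and Y are not independent, then E[g(X)h(Y)] might differ from E[g(X)]E[h(Y)].

Expectations of certain functions of X have special names. The variance of X is E[(X − E[X])²]. If X takes values in {0, 1, 2, . . .}, and if f(b) = α^b for some α ∈ [0, 1], then E[f(X)] is a number between 0 and 1. Considered as a function of α, G(α) = E[α^X] is called the generating function of X. If X is a non-negative random variable, and if f(b) = e^(−αb) for some α ≥ 0, then E[f(X)] is again a number between 0 and 1. Considered as a function of α, F(α) = E[e^(−αX)] is called the Laplace transform of X.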

The expected value of X is a rough guide to the value X is likely to be near. The variance of X measures the deviation of X from this likely value E[X]. If the variance is small, then X is more likely to be near E[X]. The following is an estimate that can be used when the distribution of X is not known. It is called Chebyshev’s inequality.

(1.27) PROPOSITION. Let X be a random variable with expectation a and variance b². Then for any ε > 0,

$$P\{|X - a| > \varepsilon\} \le \frac{b^2}{\varepsilon^2}.$$

Proof. Consider the expectation of the non-negative random variable Y = (X − a)²; it is E[Y] = b². Now, E[Y] is the integral of Y over all of Ω, and as such it is at least as large as the integral of Y over the set {Y > ε²}. The measure of that set is P{Y > ε²}, and Y > ε² on that set. So the integral is at least ε²P{Y > ε²}. That is,

$$b^2 = E[Y] \ge \varepsilon^2 \, P\{Y > \varepsilon^2\} = \varepsilon^2 \, P\{|X - a| > \varepsilon\},$$

from which the proposition follows.
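Chebyshev's inequality can be checked exactly on a small example. The sketch below uses a fair six-sided die as a hypothetical distribution; the function names and the choice ε = 2 are assumptions of this illustration.

```python
from fractions import Fraction

def mean(pmf):
    return sum(a * p for a, p in pmf.items())

def variance(pmf):
    """E[(X - a)^2], where a is the expectation of X."""
    m = mean(pmf)
    return sum((a - m) ** 2 * p for a, p in pmf.items())

def deviation_probability(pmf, eps):
    """P{|X - a| > eps}, where a is the expectation of X."""
    m = mean(pmf)
    return sum(p for a, p in pmf.items() if abs(a - m) > eps)

# Hypothetical example: a fair six-sided die, with eps = 2.
die = {face: Fraction(1, 6) for face in range(1, 7)}
eps = 2
exact = deviation_probability(die, eps)  # only the faces 1 and 6 qualify
bound = variance(die) / eps ** 2         # Chebyshev's upper bound
print(exact, bound)
```

The bound holds but is far from tight here, which is typical: the inequality uses nothing about the distribution beyond its mean and variance.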

In computing the variance it is usually worth noting that

$$E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2. \tag{1.28}$$

Following are some examples of these computations.

(1.29) EXAMPLE. Consider the random variable X of Example (1.17). We had already computed E[X] = 8. Now, to obtain the variance we use the formula (1.28). To compute E[X²], note that it is easier to compute E[X(X − 1)] first and then use E[X²] = E[X(X − 1)] + E[X]. Now,

Hence,

and

Next we compute its generating function. We have

for any α ∈ [0, 1]. Note that the derivative of G at α = 1 is

whereas the second derivative of G at α = 1 is

and the third derivative is

(1.30) EXAMPLE. Consider the lifetime X of the item discussed in Example (1.18). Its expectation was E[X] = 50. Now,

and hence

Computing the Laplace transform of X, we find

We note that the derivative of F at α = 0 is

and that the second derivative at α = 0 is

The results concerning the derivatives of the generating function in Example (1.29) hold in general: we have, for the generating function G of a non-negative integer-valued random variable X,

$$G^{(k)}(1) = E[X(X - 1) \cdots (X - k + 1)], \qquad k = 1, 2, \ldots,$$

where G^(k) is the kth derivative of G. Similarly, the results in Example (1.30) concerning the Laplace transform also generalize. For any non-negative random variable X with Laplace transform F,

$$F^{(k)}(0) = (-1)^k \, E[X^k], \qquad k = 1, 2, \ldots.$$

It is also worth mentioning that a generating function determines the probability distribution associated with it; this is true because

$$G(\alpha) = E[\alpha^X] = \sum_{n=0}^{\infty} \alpha^n \, P\{X = n\},$$

which means that P{X = n} is the coefficient of α^n in the power-series expansion of G(α). Similarly, the Laplace transform determines the associated distribution function.
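When X has finitely many values, G is a polynomial whose coefficients are the probabilities, so these facts can be checked with exact arithmetic. The sketch below assumes a hypothetical distribution (the number of heads in three fair coin tosses); the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of the derivative of the polynomial sum of c_n * alpha**n."""
    return [n * c for n, c in enumerate(coeffs)][1:]

def evaluate(coeffs, alpha):
    return sum(c * alpha ** n for n, c in enumerate(coeffs))

def kth_derivative_at(coeffs, k, alpha):
    """Value of the kth derivative of the polynomial at alpha."""
    for _ in range(k):
        coeffs = derivative(coeffs)
    return evaluate(coeffs, alpha)

# Hypothetical distribution: pmf[n] = P{X = n} for the number of heads in
# three fair coin tosses, so G(alpha) is the sum of pmf[n] * alpha**n.
pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

first_moment = kth_derivative_at(pmf, 1, 1)      # G'(1)  = E[X]
second_factorial = kth_derivative_at(pmf, 2, 1)  # G''(1) = E[X(X - 1)]
var_x = second_factorial + first_moment - first_moment ** 2

# Recover each probability as the coefficient of alpha**n, via G^(n)(0) / n!.
recovered = [kth_derivative_at(pmf, n, 0) / factorial(n) for n in range(len(pmf))]
print(first_moment, var_x, recovered)
```

Derivatives at α = 1 give the factorial moments, while derivatives at α = 0 give back the probabilities themselves, which is the sense in which G determines the distribution.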

We close this section with two theorems on the expected value of the limit of a sequence of random variables. The first is called the monotone convergence theorem and the second the bounded convergence theorem. The proof of the first is the same as that of (1.9), and we will not repeat it; we also omit the proof of the second.

(1.34) THEOREM. If X1, X2, . . . is a sequence of non-negative random variables increasing to the random variable X, then the expectations E[X1], E[X2], . . . increase to E[X].
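The monotone convergence theorem, and the construction behind Definition (1.5), can be illustrated with dyadic approximations. The sketch assumes, purely as an example, that X is uniformly distributed on [0, 1) and takes Xn = ⌊2ⁿX⌋/2ⁿ, a discrete random variable that increases to X as n grows; the function name is hypothetical.

```python
from fractions import Fraction

def dyadic_mean_uniform(n):
    """E[X_n] for X_n = floor(2**n * X) / 2**n, with X uniform on [0, 1).

    X_n takes each value k / 2**n, k = 0, ..., 2**n - 1, with probability
    1 / 2**n, so Definition (1.2) applies directly.
    """
    cells = 2 ** n
    return sum(Fraction(k, cells) * Fraction(1, cells) for k in range(cells))

means = [dyadic_mean_uniform(n) for n in range(1, 9)]
print([str(m) for m in means])
```

The computed expectations increase with n and approach 1/2, the expected value of X, in line with Theorem (1.34).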

(1.35) THEOREM. Let X1, X2, . . . be a sequence of random variables which are bounded in absolute value by a random variable Y such that E[Y] < ∞. If

$$\lim_{n \to \infty} X_n(\omega) = X(\omega)$$

for almost all ω ∈ Ω, then

$$\lim_{n \to \infty} E[X_n] = E[X].$$