More about discrete random variables

3.5 Conditional expectation

Just as we developed expectation for discrete random variables in Section 2.4, including the law of the unconscious statistician, we can develop conditional expectation in the same way. This leads to the formula

$$E[g(Y) \mid X = x_i] = \sum_j g(y_j)\, p_{Y|X}(y_j \mid x_i). \tag{3.21}$$

Example 3.19. The random number Y of alpha particles emitted by a radioactive sample is conditionally Poisson(k) given that the sample size X = k. Find E[Y|X = k].

Solution. We must compute

$$E[Y \mid X = k] = \sum_n n\, P(Y = n \mid X = k),$$

where (cf. Example 3.15)

$$P(Y = n \mid X = k) = \frac{k^n e^{-k}}{n!}, \quad n = 0, 1, \ldots.$$

Hence,

$$E[Y \mid X = k] = \sum_{n=0}^{\infty} n\, \frac{k^n e^{-k}}{n!}.$$

Now observe that the right-hand side is exactly the ordinary expectation of a Poisson random variable with parameter k (cf. the calculation in Example 2.22). Therefore, E[Y|X = k] = k.
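As a numerical sanity check on Example 3.19, the following minimal sketch evaluates the sum over n of n k^n e^{-k}/n! with a truncated series. The function name and the cutoff `n_max` are assumptions made for illustration only; they are not part of the text.

```python
import math

def conditional_mean_poisson(k, n_max=200):
    """Truncated sum of n * k**n * exp(-k) / n! for n = 1..n_max.

    n_max is an assumed cutoff; the tail is negligible for moderate k.
    """
    total = 0.0
    term = math.exp(-k)          # P(Y = 0 | X = k)
    for n in range(1, n_max + 1):
        term *= k / n            # P(Y = n | X = k) from P(Y = n-1 | X = k)
        total += n * term
    return total

for k in (1, 4, 10):
    print(k, conditional_mean_poisson(k))
```

For each k the printed value should agree with k up to rounding error, in line with E[Y|X = k] = k.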

Example 3.20. Let Z be the output of the Poisson channel of Example 3.18, and let X be the transmitted signal. Compute $E[X \mid Z = j]$ using the conditional pmf $p_{X|Z}(i \mid j)$ found in Example 3.18.

Solution. We must compute

$$E[X \mid Z = j] = \sum_{i=0}^{j} i\, P(X = i \mid Z = j),$$

where, letting $p := \lambda/(\lambda+\mu)$,

$$P(X = i \mid Z = j) = \binom{j}{i} p^i (1-p)^{j-i}.$$

Hence,

$$E[X \mid Z = j] = \sum_{i=0}^{j} i \binom{j}{i} p^i (1-p)^{j-i}.$$

Now observe that the right-hand side is exactly the ordinary expectation of a binomial(j, p) random variable. It is shown in Problem 8 that the mean of such a random variable is jp. Therefore, E[X|Z = j] = jp = jλ/(λ+µ).
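The finite sum in Example 3.20 can be evaluated directly. The sketch below does so for assumed values of j, λ, and µ (chosen only for illustration) and compares the result with jλ/(λ+µ).

```python
from math import comb

def conditional_mean_binomial(j, lam, mu):
    """Sum over i = 0..j of i * C(j, i) * p**i * (1-p)**(j-i), with p = lam/(lam+mu)."""
    p = lam / (lam + mu)
    return sum(i * comb(j, i) * p**i * (1 - p)**(j - i) for i in range(j + 1))

# Assumed illustrative parameters: j = 7, lambda = 2, mu = 3.
j, lam, mu = 7, 2.0, 3.0
print(conditional_mean_binomial(j, lam, mu), j * lam / (lam + mu))
```

The two printed numbers should agree up to rounding error.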

Substitution law for conditional expectation

For functions of two variables, we have the following conditional law of the unconscious statistician,

$$E[g(X,Y) \mid X = x_i] = \sum_k \sum_j g(x_k, y_j)\, p_{XY|X}(x_k, y_j \mid x_i).$$

However,

$$p_{XY|X}(x_k, y_j \mid x_i) = P(X = x_k, Y = y_j \mid X = x_i) = \frac{P(X = x_k, Y = y_j, X = x_i)}{P(X = x_i)}.$$

Now, when $k \ne i$, the intersection

$$\{X = x_k\} \cap \{Y = y_j\} \cap \{X = x_i\}$$

is empty and has zero probability. Hence, the numerator above is zero for $k \ne i$. When $k = i$, the intersection reduces to $\{X = x_i\} \cap \{Y = y_j\}$, and so

$$p_{XY|X}(x_k, y_j \mid x_i) = p_{Y|X}(y_j \mid x_i), \quad \text{for } k = i.$$

It now follows that

$$\begin{aligned}
E[g(X,Y) \mid X = x_i] &= \sum_j g(x_i, y_j)\, p_{Y|X}(y_j \mid x_i) \\
&= E[g(x_i, Y) \mid X = x_i].
\end{aligned}$$

We call

$$E[g(X,Y) \mid X = x_i] = E[g(x_i, Y) \mid X = x_i] \tag{3.22}$$

the substitution law for conditional expectation. Note that if g in (3.22) is a function of Y only, then (3.22) reduces to (3.21). Also, if g is of product form, say g(x,y) = h(x)k(y), then

$$E[h(X)k(Y) \mid X = x_i] = h(x_i)\, E[k(Y) \mid X = x_i].$$
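The substitution law can be checked numerically on a small example. The sketch below uses a hypothetical joint pmf (the numbers are assumptions chosen for illustration). It computes E[g(X,Y)|X = xi] from the double sum, with the triple-intersection probability in the numerator, and compares it with the single sum for E[g(xi,Y)|X = xi].

```python
# Hypothetical joint pmf p_XY(x, y); the values are illustrative assumptions.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.05,
    (2, 0): 0.15, (2, 1): 0.25,
}

def prob(event):
    """P(event), where event is a predicate on an outcome (x, y)."""
    return sum(p for (x, y), p in joint.items() if event(x, y))

def cond_exp_definition(g, xi):
    """E[g(X,Y)|X=xi] as the double sum of g(xk, yj) * p_{XY|X}(xk, yj|xi)."""
    p_xi = prob(lambda x, y: x == xi)
    total = 0.0
    for (xk, yj) in joint:
        # P(X = xk, Y = yj, X = xi); this is zero unless xk == xi.
        p_triple = prob(lambda x, y: x == xk and y == yj and x == xi)
        total += g(xk, yj) * p_triple / p_xi
    return total

def cond_exp_substituted(g, xi):
    """E[g(xi,Y)|X=xi] as the single sum of g(xi, yj) * p_{Y|X}(yj|xi)."""
    p_xi = prob(lambda x, y: x == xi)
    ys = sorted({y for (_, y) in joint})
    return sum(g(xi, yj) * joint[(xi, yj)] / p_xi for yj in ys)

def g(x, y):
    # Product form h(x) k(y) with h(x) = (x+1)**2 and k(y) = 3y + 1.
    return (x + 1) ** 2 * (3 * y + 1)

for xi in (0, 1, 2):
    print(xi, cond_exp_definition(g, xi), cond_exp_substituted(g, xi))
```

Because this g has product form, each printed value should also equal h(xi) times E[k(Y)|X = xi].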

Law of total probability for expectation

In Section 3.4 we discussed the law of total probability, which shows how to compute probabilities in terms of conditional probabilities. We now derive the analogous formula for expectation. Write

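$$\begin{aligned}
\sum_i E[g(X,Y) \mid X = x_i]\, p_X(x_i) &= \sum_i \Big[ \sum_j g(x_i, y_j)\, p_{Y|X}(y_j \mid x_i) \Big] p_X(x_i) \\
&= \sum_i \sum_j g(x_i, y_j)\, p_{XY}(x_i, y_j) \\
&= E[g(X,Y)].
\end{aligned}$$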

Hence, the law of total probability for expectation is

$$E[g(X,Y)] = \sum_i E[g(X,Y) \mid X = x_i]\, p_X(x_i). \tag{3.23}$$

In particular, if g is a function of Y only, then

$$E[g(Y)] = \sum_i E[g(Y) \mid X = x_i]\, p_X(x_i).$$
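A short numerical illustration of the law of total probability for expectation: the sketch below computes E[g(X,Y)] once directly from a joint pmf and once by conditioning on X. The joint pmf and the function g are hypothetical choices made for illustration.

```python
# Hypothetical joint pmf p_XY(x, y); the values are illustrative assumptions.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.05,
    (2, 0): 0.15, (2, 1): 0.25,
}
xs = sorted({x for (x, _) in joint})
ys = sorted({y for (_, y) in joint})

def marginal_x(xi):
    """p_X(xi), obtained by summing the joint pmf over y."""
    return sum(joint[(xi, y)] for y in ys)

def cond_exp(g, xi):
    """E[g(X,Y)|X=xi], computed as the sum of g(xi, yj) * p_{Y|X}(yj|xi)."""
    return sum(g(xi, y) * joint[(xi, y)] / marginal_x(xi) for y in ys)

def expectation_direct(g):
    """E[g(X,Y)] as the double sum of g(x, y) * p_XY(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def expectation_by_conditioning(g):
    """E[g(X,Y)] as the sum over xi of E[g(X,Y)|X=xi] * p_X(xi)."""
    return sum(cond_exp(g, xi) * marginal_x(xi) for xi in xs)

def g(x, y):
    return x * y + y ** 2

print(expectation_direct(g), expectation_by_conditioning(g))
```

The two printed values should agree up to rounding error.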

Example 3.21. Light of intensity λ is directed at a photomultiplier that generates X ∼ Poisson(λ) primaries. The photomultiplier then generates Y secondaries, where given X = n, Y is conditionally geometric$_1\big((n+2)^{-1}\big)$. Find the expected number of secondaries and the correlation between the primaries and the secondaries.

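Solution. The law of total probability for expectations says that

$$E[Y] = \sum_{n=0}^{\infty} E[Y \mid X = n]\, p_X(n),$$

where the range of summation follows because X is Poisson(λ). The next step is to compute the conditional expectation. The conditional pmf of Y is geometric$_1(p)$, where, in this case, $p = (n+2)^{-1}$, and the mean of such a pmf is, by Problem 4, $1/(1-p)$. Hence,

$$E[Y] = \sum_{n=0}^{\infty} \Big[ 1 + \frac{1}{n+1} \Big] p_X(n) = E\Big[ 1 + \frac{1}{X+1} \Big].$$

An easy calculation (Problem 34 in Chapter 2) shows that for X ∼ Poisson(λ), $E[1/(X+1)] = [1 - e^{-\lambda}]/\lambda$, and so $E[Y] = 1 + [1 - e^{-\lambda}]/\lambda$.

The correlation between X and Y is

$$\begin{aligned}
E[XY] &= \sum_{n=0}^{\infty} E[XY \mid X = n]\, p_X(n) \\
&= \sum_{n=0}^{\infty} n\, E[Y \mid X = n]\, p_X(n) \\
&= \sum_{n=0}^{\infty} n \Big[ 1 + \frac{1}{n+1} \Big] p_X(n) \\
&= E\Big[ X \Big( 1 + \frac{1}{X+1} \Big) \Big].
\end{aligned}$$

Since $X\big(1 + 1/(X+1)\big) = X + 1 - 1/(X+1)$, it follows that $E[XY] = \lambda + 1 - [1 - e^{-\lambda}]/\lambda$.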
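The two quantities in Example 3.21 can be approximated by truncating the sums over n. In the sketch below, the conditional mean of Y given X = n is taken as 1/(1 − p) = (n+2)/(n+1), as stated in the solution. The closed forms used for comparison, 1 + (1 − e^{−λ})/λ for E[Y] and λ + 1 − (1 − e^{−λ})/λ for E[XY], follow from E[1/(X+1)] = (1 − e^{−λ})/λ for X ∼ Poisson(λ). The function names and the cutoff `n_max` are assumptions for illustration.

```python
import math

def expected_secondaries(lam, n_max=100):
    """Truncated sum over n of E[Y|X=n] * p_X(n), with E[Y|X=n] = (n+2)/(n+1)."""
    total = 0.0
    pmf = math.exp(-lam)                 # p_X(0)
    for n in range(n_max + 1):
        total += (n + 2) / (n + 1) * pmf
        pmf *= lam / (n + 1)             # p_X(n+1) from p_X(n)
    return total

def correlation(lam, n_max=100):
    """Truncated sum over n of n * E[Y|X=n] * p_X(n), which approximates E[XY]."""
    total = 0.0
    pmf = math.exp(-lam)
    for n in range(n_max + 1):
        total += n * (n + 2) / (n + 1) * pmf
        pmf *= lam / (n + 1)
    return total

lam = 2.0                                # assumed illustrative intensity
tail = (1 - math.exp(-lam)) / lam        # E[1/(X+1)] for X ~ Poisson(lam)
print(expected_secondaries(lam), 1 + tail)
print(correlation(lam), lam + 1 - tail)
```

Each printed pair should agree up to rounding and truncation error.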

Note 1. When z is complex,

$$E[z^X] := E[\mathrm{Re}(z^X)] + j\, E[\mathrm{Im}(z^X)].$$

By writing

$$z^n = r^n e^{jn\theta} = r^n [\cos(n\theta) + j \sin(n\theta)],$$

it is easy to check that for |z| ≤ 1, the above expectations are finite (cf. (3.3)) and that

$$E[z^X] = \sum_{n=0}^{\infty} z^n P(X = n).$$

Note 2. Although $G_X(z)$ is well defined for |z| ≤ 1, the existence of its derivatives is only guaranteed for |z| < 1. Hence, $G_X^{(k)}(1)$ may have to be understood as $\lim_{z \uparrow 1} G_X^{(k)}(z)$. By Abel’s theorem [32, pp. 64–65], this limit is equal to the kth factorial moment on the right-hand side of (3.5), even if it is infinite.

3.4: Conditional probability

Note 3. Here is an alternative derivation of the fact that the sum of independent Bernoulli random variables is a binomial random variable. Let $X_1, X_2, \ldots$ be independent Bernoulli(p) random variables. Put

$$Y_n := \sum_{i=1}^{n} X_i.$$

We need to show that $Y_n \sim$ binomial(n, p). The case n = 1 is trivial. Suppose the result is true for some n ≥ 1. We show that it must be true for n + 1. Use the law of total probability to write

$$P(Y_{n+1} = k) = \sum_{i=0}^{n} P(Y_{n+1} = k \mid Y_n = i)\, P(Y_n = i). \tag{3.24}$$

To compute the conditional probability, we first observe that $Y_{n+1} = Y_n + X_{n+1}$. Also, since the $X_i$ are independent, and since $Y_n$ depends only on $X_1, \ldots, X_n$, we see that $Y_n$ and $X_{n+1}$ are independent. Keeping this in mind, we apply the substitution law and write

$$\begin{aligned}
P(Y_{n+1} = k \mid Y_n = i) &= P(Y_n + X_{n+1} = k \mid Y_n = i) \\
&= P(i + X_{n+1} = k \mid Y_n = i) \\
&= P(X_{n+1} = k - i \mid Y_n = i) \\
&= P(X_{n+1} = k - i).
\end{aligned}$$

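Since $X_{n+1}$ takes only the values zero and one, this last probability is zero unless i = k or i = k − 1. Returning to (3.24), we can write$^c$

$$P(Y_{n+1} = k) = \sum_{i=k-1}^{k} P(X_{n+1} = k - i)\, P(Y_n = i).$$

Assuming that $Y_n \sim$ binomial(n, p), this becomes

$$P(Y_{n+1} = k) = p \binom{n}{k-1} p^{k-1} (1-p)^{n-(k-1)} + (1-p) \binom{n}{k} p^k (1-p)^{n-k}.$$

Using the easily verified identity,

$$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k},$$

we see that $Y_{n+1} \sim$ binomial(n + 1, p).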
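The induction step of Note 3 can be mirrored numerically. Starting from the Bernoulli(p) pmf, the sketch below applies the two-term sum P(Y_{n+1} = k) = P(X_{n+1} = 1) P(Y_n = k − 1) + P(X_{n+1} = 0) P(Y_n = k) repeatedly and compares the result with the binomial(n, p) pmf. The function names and the value of p are assumptions for illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """List of binomial(n, p) probabilities for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def add_bernoulli(pmf_n, p):
    """pmf of Y_{n+1} = Y_n + X_{n+1}, given the pmf of Y_n as a list."""
    n = len(pmf_n) - 1
    out = []
    for k in range(n + 2):
        from_one = p * pmf_n[k - 1] if k >= 1 else 0.0      # i = k-1, X_{n+1} = 1
        from_zero = (1 - p) * pmf_n[k] if k <= n else 0.0   # i = k,   X_{n+1} = 0
        out.append(from_one + from_zero)
    return out

def binomial_by_induction(n, p):
    """Build the pmf of Y_n from the n = 1 case by repeated induction steps."""
    pmf = [1 - p, p]                     # Y_1 = X_1 ~ Bernoulli(p)
    for _ in range(n - 1):
        pmf = add_bernoulli(pmf, p)
    return pmf

p = 0.3                                  # assumed illustrative parameter
for n in (1, 2, 5):
    gap = max(abs(a - b) for a, b in zip(binomial_by_induction(n, p), binomial_pmf(n, p)))
    print(n, gap)
```

The printed gaps should be at the level of floating-point rounding. The boundary cases k = 0 and k = n + 1 are handled by dropping the term whose index falls outside 0, ..., n.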

$^c$ When k = 0 or k = n + 1, this sum actually has only one term, since $P(Y_n = -1) = P(Y_n = n + 1) = 0$.

Note 4. We show that the MAP rule is optimal for minimizing the probability of a decision error. Consider a communication system whose input X takes values 1,...,M with given probabilities $p_X(i) = P(X = i)$. The channel output is an integer-valued random variable Y. Assume that the conditional probability mass function $p_{Y|X}(j \mid i) = P(Y = j \mid X = i)$ is also known. The receiver decision rule is ψ(Y) = i if $Y \in D_i$, where $D_1, \ldots, D_M$ is a partition of $\mathbb{R}$. The problem is to characterize the choice for the partition sets $D_i$ that minimizes the probability of a decision error, or, equivalently, maximizes the probability of a correct decision. Use the laws of total probability and substitution to write the probability of a correct decision as

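$$\begin{aligned}
P(\psi(Y) = X) &= \sum_{i=1}^{M} P(\psi(Y) = X \mid X = i)\, P(X = i) \\
&= \sum_{i=1}^{M} P(\psi(Y) = i \mid X = i)\, p_X(i) \\
&= \sum_{i=1}^{M} P(Y \in D_i \mid X = i)\, p_X(i) \\
&= \sum_{i=1}^{M} \Big[ \sum_j I_{D_i}(j)\, p_{Y|X}(j \mid i) \Big] p_X(i) \\
&= \sum_j \Big[ \sum_{i=1}^{M} I_{D_i}(j)\, p_{Y|X}(j \mid i)\, p_X(i) \Big].
\end{aligned}$$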

For fixed j, consider the inner sum. Since the $D_i$ form a partition, the only term that is not zero is the one for which $j \in D_i$. To maximize this value, we should put $j \in D_i$ if and only if the weight $p_{Y|X}(j \mid i)\, p_X(i)$ is greater than or equal to $p_{Y|X}(j \mid i')\, p_X(i')$ for all $i' \ne i$. This is exactly the MAP rule (cf. (3.19)).
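The argument of Note 4 can be illustrated on a toy channel with finitely many outputs. In the sketch below, the prior and the channel matrix are hypothetical numbers, and inputs are indexed from 0 rather than 1. The MAP rule assigns each output j to the input i with the largest weight, and a brute-force search over all decision rules confirms that none achieves a higher probability of a correct decision.

```python
from itertools import product

# Hypothetical prior p_X(i) and channel p_{Y|X}(j|i); illustrative assumptions only.
prior = [0.5, 0.3, 0.2]
channel = [                      # channel[i][j] = P(Y = j | X = i), j = 0..3
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
n_inputs = len(prior)
n_outputs = len(channel[0])

def prob_correct(rule):
    """P(psi(Y) = X) for a rule with rule[j] = decided input when Y = j.

    For each j only the term with j in D_i survives, i.e. i = rule[j].
    """
    return sum(channel[rule[j]][j] * prior[rule[j]] for j in range(n_outputs))

def map_rule():
    """For each output j, pick the input i maximizing p_{Y|X}(j|i) * p_X(i)."""
    return [
        max(range(n_inputs), key=lambda i: channel[i][j] * prior[i])
        for j in range(n_outputs)
    ]

# Exhaustive search over all n_inputs**n_outputs decision rules.
best = max(product(range(n_inputs), repeat=n_outputs), key=prob_correct)

print(map_rule(), prob_correct(map_rule()))
print(list(best), prob_correct(best))
```

With these numbers there are no ties among the weights, so the exhaustive search and the MAP rule should select the same partition.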

Problems

3.1: Probability generating functions

1. Find var(X) if X has probability generating function $G_X(z) = \frac{1}{6} + \frac{1}{6} z + \frac{2}{3} z^2$.

2. If $G_X(z)$ is as in the preceding problem, find the probability mass function of X.

3. Find var(X) if X has probability generating function

$$G_X(z) = \Big( \frac{2 + z}{3} \Big)^5.$$

4. Evaluate $G_X(z)$ for the cases X ∼ geometric$_0(p)$ and X ∼ geometric$_1(p)$. Use your results to find the mean and variance of X in each case.

5. For i = 1,...,n, let $X_i \sim$ Poisson($\lambda_i$). Put

$$Y := \sum_{i=1}^{n} X_i.$$

Find P(Y = 2) if the $X_i$ are independent.

6. Let $a_0, \ldots, a_n$ be nonnegative and not all zero. Let m be any positive integer. Find a constant D such that

$$G_X(z) := (a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n)^m / D$$

is a valid probability generating function.

7. Let $X_1, X_2, \ldots, X_n$ be i.i.d. geometric$_1(p)$ random variables, and put $Y := X_1 + \cdots + X_n$. Find E[Y], var(Y), and $E[Y^2]$. Also find the probability generating function of Y.

Remark. We say that Y is a negative binomial or Pascal random variable with parameters n and p.

3.2: The binomial random variable

8. Use the probability generating function of Y ∼ binomial(n, p) to find the mean and variance of Y.

9. Show that the binomial(n, p) probabilities sum to one. Hint: Use the fact that for any nonnegative integer-valued random variable, $G_Y(z)|_{z=1} = 1$.