Conditional probability - Probability and Random Processes for Electrical and Computer Engineers

A computer maker buys the same chips from two different suppliers, $S_1$ and $S_2$, in order to reduce the risk of supply interruption. However, the computer maker now wants to find out whether one of the suppliers provides more reliable devices than the other. To make this determination, the computer maker examines a collection of $n$ chips. For each one, there are four possible outcomes, depending on whether the chip comes from supplier $S_1$ or supplier $S_2$ and on whether the chip works ($w$) or is defective ($d$). We denote these outcomes by $O_{w,S_1}$, $O_{d,S_1}$, $O_{w,S_2}$, and $O_{d,S_2}$. The numbers of each outcome can be arranged in the matrix

$$\begin{bmatrix} N(O_{w,S_1}) & N(O_{w,S_2}) \\ N(O_{d,S_1}) & N(O_{d,S_2}) \end{bmatrix}. \tag{1.18}$$

The sum of the first column is the number of chips from supplier $S_1$, which we denote by $N(O_{S_1})$. The sum of the second column is the number of chips from supplier $S_2$, which we denote by $N(O_{S_2})$.

The relative frequency of working chips from supplier $S_1$ is $N(O_{w,S_1})/N(O_{S_1})$. Similarly, the relative frequency of working chips from supplier $S_2$ is $N(O_{w,S_2})/N(O_{S_2})$. If $N(O_{w,S_1})/N(O_{S_1})$ is substantially greater than $N(O_{w,S_2})/N(O_{S_2})$, this would suggest that supplier $S_1$ might be providing more reliable chips than supplier $S_2$.

Example 1.19. Suppose that (1.18) is equal to

$$\begin{bmatrix} 754 & 499 \\ 221 & 214 \end{bmatrix}.$$

Determine which supplier provides more reliable chips.

Solution. The number of chips from supplier $S_1$ is the sum of the first column, $N(O_{S_1}) = 754 + 221 = 975$. The number of chips from supplier $S_2$ is the sum of the second column, $N(O_{S_2}) = 499 + 214 = 713$. Hence, the relative frequency of working chips from supplier $S_1$ is $754/975 \approx 0.77$, and the relative frequency of working chips from supplier $S_2$ is $499/713 \approx 0.70$. We conclude that supplier $S_1$ provides more reliable chips. You can run your own simulations using the MATLAB script in Problem 51.
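The arithmetic in Example 1.19 is easy to reproduce. The following Python sketch (used here in place of the MATLAB of Problem 51; the dictionary layout and the function name `working_fraction` are illustrative choices, not notation from the text) computes each column sum of (1.18) and the corresponding relative frequency of working chips, using exact fractions so nothing is lost to rounding.

```python
from fractions import Fraction

# Counts from Example 1.19 laid out as in (1.18):
# first column = supplier S1, second column = supplier S2;
# first row = working chips, second row = defective chips.
counts = {
    "S1": {"working": 754, "defective": 221},
    "S2": {"working": 499, "defective": 214},
}

def working_fraction(column):
    """Relative frequency N(O_{w,S}) / N(O_S) for one supplier's column."""
    n_supplier = column["working"] + column["defective"]  # column sum N(O_S)
    return Fraction(column["working"], n_supplier)

freq = {s: working_fraction(col) for s, col in counts.items()}
for s, f in freq.items():
    print(s, f"{float(f):.2f}")  # S1 0.77, then S2 0.70
```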

Notice that the relative frequency of working chips from supplier $S_1$ can also be written as the quotient of relative frequencies,

$$\frac{N(O_{w,S_1})}{N(O_{S_1})} = \frac{N(O_{w,S_1})/n}{N(O_{S_1})/n}. \tag{1.19}$$

This suggests the following definition of conditional probability. Let $\Omega$ be a sample space. Let the event $S_1$ model a chip’s being from supplier $S_1$, and let the event $W$ model a chip’s working. In our model, the conditional probability that a chip works given that the chip comes from supplier $S_1$ is defined by

$$P(W|S_1) := \frac{P(W\cap S_1)}{P(S_1)},$$

where the probabilities model the relative frequencies on the right-hand side of (1.19). This definition makes sense only if $P(S_1) > 0$. If $P(S_1) = 0$, $P(W|S_1)$ is not defined.
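To see that the definition reproduces the ratio in (1.19), one can take as an assumed probability model the relative frequencies of Example 1.19 themselves, giving each of the four outcomes probability $N(\cdot)/n$. The sketch below is a minimal illustration on a finite sample space; the helper names `prob` and `cond_prob` are hypothetical, and each outcome is represented as a (status, supplier) pair.

```python
from fractions import Fraction

def prob(event, P):
    """P(event) on a finite sample space; P maps each outcome to its probability."""
    return sum((P[outcome] for outcome in event), Fraction(0))

def cond_prob(A, B, P):
    """P(A|B) := P(A ∩ B) / P(B), which is undefined when P(B) = 0."""
    p_B = prob(B, P)
    if p_B == 0:
        raise ValueError("P(B) = 0, so P(A|B) is not defined")
    return prob(A & B, P) / p_B

# Assumed model: P(outcome) = N(outcome) / n, using the counts of Example 1.19.
counts = {("w", "S1"): 754, ("d", "S1"): 221, ("w", "S2"): 499, ("d", "S2"): 214}
n = sum(counts.values())
P = {outcome: Fraction(k, n) for outcome, k in counts.items()}

W = {o for o in P if o[0] == "w"}    # the chip works
S1 = {o for o in P if o[1] == "S1"}  # the chip comes from supplier S1

print(cond_prob(W, S1, P))  # prints 58/75, i.e. 754/975
```

Under this model $P(W|S_1)$ equals $754/975$, the relative frequency found in Example 1.19, because the common factor $1/n$ cancels exactly as in (1.19).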

Given any two events A and B of positive probability,

$$P(A|B) = \frac{P(A\cap B)}{P(B)} \tag{1.20}$$

and

$$P(B|A) = \frac{P(A\cap B)}{P(A)}.$$

From (1.20), we see that

$$P(A\cap B) = P(A|B)\,P(B). \tag{1.21}$$

Substituting this into the numerator above yields

$$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}. \tag{1.22}$$

We next turn to the problem of computing the denominator $P(A)$.

The law of total probability and Bayes’ rule

The law of total probability is a formula for computing the probability of an event that can occur in different ways. For example, the probability that a cell-phone call goes through depends on which tower handles the call. The probability of Internet packets being dropped depends on which route they take through the network.

When an event A can occur in two ways, the law of total probability is derived as follows (the general case is derived later in the section). We begin with the identity

$$A = (A\cap B)\cup(A\cap B^c)$$

(recall Figure 1.13(a)). Since this is a disjoint union,

$$P(A) = P(A\cap B) + P(A\cap B^c).$$

In terms of Figure 1.13(a), this formula says that the area of the disk A is the sum of the areas of the two shaded regions. Using (1.21), we have

$$P(A) = P(A|B)P(B) + P(A|B^c)P(B^c). \tag{1.23}$$

This formula is the simplest version of the law of total probability.

Example 1.20. Due to an Internet configuration error, packets sent from New York to Los Angeles are routed through El Paso, Texas, with probability 3/4. Given that a packet is routed through El Paso, suppose it has conditional probability 1/3 of being dropped. Given that a packet is not routed through El Paso, suppose it has conditional probability 1/4 of being dropped. Find the probability that a packet is dropped.

Solution. To solve this problem, we use the notation (see footnote f)

$$E = \{\text{routed through El Paso}\} \quad\text{and}\quad D = \{\text{packet is dropped}\}.$$

With this notation, it is easy to interpret the problem as telling us that

$$P(D|E) = 1/3, \quad P(D|E^c) = 1/4, \quad\text{and}\quad P(E) = 3/4. \tag{1.24}$$

We must now compute $P(D)$. By the law of total probability,

$$\begin{aligned} P(D) &= P(D|E)P(E) + P(D|E^c)P(E^c) \\ &= (1/3)(3/4) + (1/4)(1 - 3/4) \\ &= 1/4 + 1/16 \\ &= 5/16. \end{aligned} \tag{1.25}$$
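The computation in (1.25) can be checked mechanically. The sketch below assumes the cases are given as a dictionary of prior probabilities (here $E$ and $E^c$) with matching conditional probabilities of a drop; `total_probability` is a hypothetical helper name, not notation from the text.

```python
from fractions import Fraction

def total_probability(cond, prior):
    """Law of total probability (1.23): sum over cases of P(D|case) * P(case)."""
    return sum((cond[k] * prior[k] for k in prior), Fraction(0))

# Numbers from (1.24); "E" = routed through El Paso, "Ec" = its complement.
prior = {"E": Fraction(3, 4), "Ec": 1 - Fraction(3, 4)}
p_drop_given = {"E": Fraction(1, 3), "Ec": Fraction(1, 4)}

p_drop = total_probability(p_drop_given, prior)
print(p_drop)  # 5/16
```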

To derive the simplest form of Bayes’ rule, substitute (1.23) into (1.22) to get

$$P(B|A) = \frac{P(A|B)P(B)}{P(A|B)P(B) + P(A|B^c)P(B^c)}. \tag{1.26}$$

As illustrated in the following example, it is not necessary to remember Bayes’ rule as long as you know the definition of conditional probability and the law of total probability.

Example 1.21 (continuation of Internet Example 1.20). Find the conditional probability that a packet is routed through El Paso given that it is not dropped.

Solution. With the notation of the previous example, we are being asked to find $P(E|D^c)$. Write

$$P(E|D^c) = \frac{P(E\cap D^c)}{P(D^c)} = \frac{P(D^c|E)P(E)}{P(D^c)}.$$

From (1.24) we have $P(E) = 3/4$ and $P(D^c|E) = 1 - P(D|E) = 1 - 1/3$. From (1.25), $P(D^c) = 1 - P(D) = 1 - 5/16$. Hence,

$$P(E|D^c) = \frac{(2/3)(3/4)}{11/16} = \frac{8}{11}.$$

Footnote f. In working this example, we follow common practice and do not explicitly specify the sample space $\Omega$ or the probability measure $P$. Hence, the expression “let $E = \{\text{routed through El Paso}\}$” is shorthand for “let $E$ be the subset of $\Omega$ that models being routed through El Paso.” The curious reader may find one possible choice for $\Omega$ and $P$, along with precise mathematical definitions of the events $E$ and $D$, in Note 5.

If we had not already computed $P(D)$ in the previous example, we would have computed $P(D^c)$ directly using the law of total probability.
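As a cross-check on Examples 1.20 and 1.21, the following sketch computes $P(D^c)$ directly by the law of total probability, forms $P(E|D^c)$ from the definition of conditional probability, and compares the result with a Monte Carlo estimate. The simulation assumes packets are routed and dropped independently of one another, which the example does not state explicitly; the function name `simulate` and the seed value are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact values from (1.24).
P_E = Fraction(3, 4)
P_D_given_E, P_D_given_Ec = Fraction(1, 3), Fraction(1, 4)

# P(D^c) directly by the law of total probability, then P(E|D^c) by definition.
P_Dc_given_E, P_Dc_given_Ec = 1 - P_D_given_E, 1 - P_D_given_Ec
P_Dc = P_Dc_given_E * P_E + P_Dc_given_Ec * (1 - P_E)
posterior_E = P_Dc_given_E * P_E / P_Dc
print(P_Dc, posterior_E)  # 11/16 8/11

def simulate(num_packets, seed=0):
    """Monte Carlo estimate of P(E|D^c), assuming packets behave independently."""
    rng = random.Random(seed)
    kept = kept_via_el_paso = 0
    for _ in range(num_packets):
        via_el_paso = rng.random() < 0.75         # event E
        p_drop = 1 / 3 if via_el_paso else 1 / 4  # P(D|E) or P(D|E^c)
        if rng.random() >= p_drop:                # event D^c: not dropped
            kept += 1
            kept_via_el_paso += via_el_paso
    return kept_via_el_paso / kept  # relative frequency of E among kept packets

print(simulate(200_000))  # an estimate near 8/11 ≈ 0.727
```

The value printed by `simulate` is only an estimate; with a different seed or fewer packets it will differ somewhat from $8/11$.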

We now generalize the law of total probability. Let $B_n$ be a sequence of pairwise disjoint events such that $\sum_n P(B_n) = 1$. Then for any event $A$,

$$P(A) = \sum_n P(A|B_n)P(B_n).$$

To derive this result, put $B := \bigcup_n B_n$, and observe that (see footnote g)

$$P(B) = \sum_n P(B_n) = 1.$$

It follows that $P(B^c) = 1 - P(B) = 0$. Next, for any event $A$, $A\cap B^c \subset B^c$, and so

$$0 \le P(A\cap B^c) \le P(B^c) = 0.$$

Hence, $P(A\cap B^c) = 0$. Writing (recall Figure 1.13(a))

$$A = (A\cap B)\cup(A\cap B^c),$$

it follows that

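$$\begin{aligned} P(A) &= P(A\cap B) + P(A\cap B^c) \\ &= P(A\cap B) \\ &= P\Big(A\cap \bigcup_n B_n\Big) \\ &= P\Big(\bigcup_n [A\cap B_n]\Big) \\ &= \sum_n P(A\cap B_n). \end{aligned} \tag{1.27}$$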

This formula is illustrated in Figure 1.10(b), where the area of the disk is the sum of the areas of the different shaded parts.

To compute $P(B_k|A)$, write

$$P(B_k|A) = \frac{P(A\cap B_k)}{P(A)} = \frac{P(A|B_k)P(B_k)}{P(A)}.$$

In terms of Figure 1.10(b), this formula says that $P(B_k|A)$ is the ratio of the area of the $k$th shaded part to the area of the whole disk. Applying the law of total probability to $P(A)$ in the denominator yields the general form of Bayes’ rule,

$$P(B_k|A) = \frac{P(A|B_k)P(B_k)}{\sum_n P(A|B_n)P(B_n)}.$$

Footnote g. Notice that since we do not require $\bigcup_n B_n = \Omega$, the $B_n$ do not, strictly speaking, form a partition. However, since $P(B) = 1$ (that is, $B$ is an almost sure event), the remainder set (cf. (1.8)), which in this case is $B^c$, has probability zero.

In formulas like this, $A$ is an event that we observe, while the $B_n$ are events that we cannot observe but would like to make some inference about. Before making any observations, we know the prior probabilities $P(B_n)$, and we know the conditional probabilities $P(A|B_n)$. After we observe $A$, we compute the posterior probabilities $P(B_k|A)$ for each $k$.
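For finitely many $B_n$, the general form of Bayes’ rule takes only a few lines of code: multiply each prior $P(B_n)$ by the conditional probability $P(A|B_n)$, sum the products to get $P(A)$ by the law of total probability, and normalize. This is a sketch under the assumption that the $B_n$ are pairwise disjoint with probabilities summing to one; the function name `posterior` is hypothetical. Applied to the packet example with $A = D^c$, it reproduces the posterior found in Example 1.21.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """General Bayes' rule for finitely many events B_1, ..., B_N.

    prior[n] = P(B_n), assumed pairwise disjoint with total probability 1;
    likelihood[n] = P(A | B_n). Returns the list of posteriors P(B_k | A).
    Exact fractions are used so the sum-to-one check is not spoiled by rounding.
    """
    if sum(prior) != 1:
        raise ValueError("prior probabilities must sum to 1")
    joint = [lik * pr for lik, pr in zip(likelihood, prior)]  # P(A|B_n) P(B_n)
    p_A = sum(joint)  # law of total probability
    if p_A == 0:
        raise ValueError("P(A) = 0, so P(B_k | A) is not defined")
    return [j / p_A for j in joint]

# Packet example: B_1 = E, B_2 = E^c, and the observed event is A = D^c.
prior = [Fraction(3, 4), Fraction(1, 4)]
likelihood = [1 - Fraction(1, 3), 1 - Fraction(1, 4)]  # P(D^c|E), P(D^c|E^c)
post = posterior(prior, likelihood)
print(post)  # [Fraction(8, 11), Fraction(3, 11)]
```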

Example 1.22. In Example 1.21, before we learn any information about a packet, that packet’s prior probability of being routed through El Paso is $P(E) = 3/4 = 0.75$. After we observe that the packet is not dropped, the posterior probability that the packet was routed through El Paso is $P(E|D^c) = 8/11 \approx 0.73$, which is different from the prior probability.

1.6 Independence

In the previous section, we discussed how a computer maker might determine if one of its suppliers provides more reliable devices than the other. We said that if the relative frequency of working chips from supplier $S_1$ is substantially different from the relative frequency of working chips from supplier $S_2$, we would conclude that one supplier is better than the other. On the other hand, if the relative frequencies of working chips from both suppliers are about the same, we would say that whether or not a chip works does not depend on the supplier.

In probability theory, if events $A$ and $B$ satisfy $P(A|B) = P(A|B^c)$, we say $A$ does not depend on $B$. This condition says that

$$\frac{P(A\cap B)}{P(B)} = \frac{P(A\cap B^c)}{P(B^c)}. \tag{1.28}$$

Applying the formulas $P(B^c) = 1 - P(B)$ and

$$P(A) = P(A\cap B) + P(A\cap B^c)$$

to the right-hand side yields

$$\frac{P(A\cap B)}{P(B)} = \frac{P(A) - P(A\cap B)}{1 - P(B)}.$$

Cross multiplying to eliminate the denominators gives

$$P(A\cap B)[1 - P(B)] = P(B)[P(A) - P(A\cap B)].$$

Subtracting common terms from both sides shows that $P(A\cap B) = P(A)P(B)$. Since this sequence of calculations is reversible, and since the condition $P(A\cap B) = P(A)P(B)$ is symmetric in $A$ and $B$, it follows that $A$ does not depend on $B$ if and only if $B$ does not depend on $A$.

When events $A$ and $B$ satisfy

$$P(A\cap B) = P(A)P(B), \tag{1.29}$$

we say they are statistically independent, or just independent.

Caution. The reader is warned to make sure he or she understands the difference between disjoint sets and independent events. Recall that $A$ and $B$ are disjoint if $A\cap B = \varnothing$. This concept does not involve $P$ in any way; to determine if $A$ and $B$ are disjoint requires only knowledge of $A$ and $B$ themselves. On the other hand, (1.29) implies that independence does depend on $P$ and not just on $A$ and $B$. To determine if $A$ and $B$ are independent requires not only knowledge of $A$ and $B$, but also knowledge of $P$. See Problem 61.
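The point of the Caution can be seen in a small made-up example (the sample space $\{1,2,3,4\}$, the sets, and both probability measures below are our own choices, not those of Problem 61). The sets $A = \{1,2\}$ and $B = \{1,3\}$ stay fixed, yet they are independent under the uniform measure and dependent under a skewed one.

```python
from fractions import Fraction

def prob(event, P):
    """P(event) on a finite sample space given outcome probabilities P."""
    return sum((P[o] for o in event), Fraction(0))

def independent(A, B, P):
    """Definition (1.29): P(A ∩ B) == P(A) P(B), checked with exact fractions."""
    return prob(A & B, P) == prob(A, P) * prob(B, P)

omega = {1, 2, 3, 4}
A, B = {1, 2}, {1, 3}  # the sets never change; A ∩ B = {1}, so they are not disjoint

uniform = {o: Fraction(1, 4) for o in omega}
skewed = {1: Fraction(1, 2), 2: Fraction(1, 6), 3: Fraction(1, 6), 4: Fraction(1, 6)}

print(independent(A, B, uniform))  # True:  1/4 == (1/2)(1/2)
print(independent(A, B, skewed))   # False: 1/2 != (2/3)(2/3)
```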

In arriving at (1.29) as the definition of independent events, we noted that (1.29) is equivalent to (1.28). Hence, if $A$ and $B$ are independent, $P(A|B) = P(A|B^c)$. What is this common value? Write

$$P(A|B) = \frac{P(A\cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$

We now make some further observations about independence. First, it is a simple exercise to show that if $A$ and $B$ are independent events, then so are $A$ and $B^c$, $A^c$ and $B$, and $A^c$ and $B^c$. For example, writing

$$\begin{aligned} P(A) &= P(A\cap B) + P(A\cap B^c) \\ &= P(A)P(B) + P(A\cap B^c), \end{aligned}$$

we have

$$\begin{aligned} P(A\cap B^c) &= P(A) - P(A)P(B) \\ &= P(A)[1 - P(B)] = P(A)P(B^c). \end{aligned}$$

By interchanging the roles of $A$ and $A^c$ and/or $B$ and $B^c$, it follows that if any one of the four pairs is independent, then so are the other three.

Example 1.23. An Internet packet travels from its source to router 1, from router 1 to router 2, and from router 2 to its destination. If routers drop packets independently with probability $p$, what is the probability that a packet is successfully transmitted from its source to its destination?

Solution. A packet is successfully transmitted if and only if neither router drops it. To put this into the language of events, for $i = 1, 2$, let $D_i$ denote the event that the packet is dropped by router $i$. Let $S$ denote the event that the packet is successfully transmitted. Then $S$ occurs if and only if the packet is not dropped by router 1 and it is not dropped by router 2. We can write this symbolically as

$$S = D_1^c\cap D_2^c.$$

Since the problem tells us that $D_1$ and $D_2$ are independent events, so are $D_1^c$ and $D_2^c$. Hence,

$$P(S) = P(D_1^c\cap D_2^c) = P(D_1^c)P(D_2^c) = [1 - P(D_1)][1 - P(D_2)] = (1-p)^2.$$
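A quick simulation supports the answer $(1-p)^2$. The sketch below assumes a hypothetical value $p = 0.1$ and models each router's decision as an independent random draw; for simplicity it draws router 2's decision even when router 1 has already dropped the packet, which does not affect whether the event $S$ occurs. The name `success_rate` is illustrative only.

```python
import random

def success_rate(p, num_packets, seed=0):
    """Monte Carlo estimate of P(S), assuming each router drops a packet
    independently with probability p."""
    rng = random.Random(seed)
    delivered = 0
    for _ in range(num_packets):
        dropped_by_1 = rng.random() < p  # event D_1
        dropped_by_2 = rng.random() < p  # event D_2, drawn independently of D_1
        if not dropped_by_1 and not dropped_by_2:  # event S = D_1^c ∩ D_2^c
            delivered += 1
    return delivered / num_packets

p = 0.1  # hypothetical drop probability
print(success_rate(p, 100_000), (1 - p) ** 2)  # estimate vs. exact (1 - p)^2
```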

Now suppose that $A$ and $B$ are any two events. If $P(B) = 0$, then we claim that $A$ and $B$ are independent. We must show that

$$P(A\cap B) = P(A)P(B) = 0.$$

To show that the left-hand side is zero, observe that since probabilities are nonnegative, and since A∩ B ⊂ B,

$$0 \le P(A\cap B) \le P(B) = 0. \tag{1.30}$$

We now show that if $P(B) = 1$, then $A$ and $B$ are independent. Since $P(B) = 1$, $P(B^c) = 1 - P(B) = 0$, and it follows that $A$ and $B^c$ are independent. But then so are $A$ and $B$.

Independence for more than two events

Suppose that for $j = 1, 2, \ldots$, $A_j$ is an event. When we say that the $A_j$ are independent, we certainly want that for any $i \ne j$,

$$P(A_i\cap A_j) = P(A_i)P(A_j).$$

And for any distinct $i$, $j$, $k$, we want

$$P(A_i\cap A_j\cap A_k) = P(A_i)P(A_j)P(A_k).$$

We want analogous equations to hold for any four events, five events, and so on. In general, we want that for every finite subset $J$ containing two or more positive integers,

$$P\Big(\bigcap_{j\in J} A_j\Big) = \prod_{j\in J} P(A_j).$$

In other words, we want the probability of every intersection involving finitely many of the $A_j$ to be equal to the product of the probabilities of the individual events. If the above equation holds for all finite subsets of two or more positive integers, then we say that the $A_j$ are mutually independent, or just independent. If the above equation holds for all subsets $J$ containing exactly two positive integers but not necessarily for all finite subsets of 3 or more positive integers, we say that the $A_j$ are pairwise independent.

Example 1.24. Given three events, say $A$, $B$, and $C$, they are mutually independent if and only if the following equations all hold:

$$\begin{aligned} P(A\cap B\cap C) &= P(A)P(B)P(C) \\ P(A\cap B) &= P(A)P(B) \\ P(A\cap C) &= P(A)P(C) \\ P(B\cap C) &= P(B)P(C). \end{aligned}$$

It is possible to construct events $A$, $B$, and $C$ such that the last three equations hold (pairwise independence), but the first one does not (see Note 6). It is also possible for the first equation to hold while the last three fail (see Note 7).
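The distinction between pairwise and mutual independence can be tested by brute force on a finite sample space. The following sketch checks the product rule for every subset of two or more events, as in the definition above. The example, two fair coin tosses with $C$ the event that the tosses agree, is one standard construction of events that are pairwise but not mutually independent; it is not necessarily the construction given in Note 6, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def prob(event, P):
    """P(event) on a finite sample space given outcome probabilities P."""
    return sum((P[o] for o in event), Fraction(0))

def product_rule_holds(events, P):
    """True if P(intersection of the events) equals the product of their probabilities."""
    inter = set.intersection(*events)
    rhs = Fraction(1)
    for E in events:
        rhs *= prob(E, P)
    return prob(inter, P) == rhs

def pairwise_independent(events, P):
    return all(product_rule_holds(list(pair), P) for pair in combinations(events, 2))

def mutually_independent(events, P):
    # Every subset J with two or more members, as in the definition above.
    return all(
        product_rule_holds(list(sub), P)
        for r in range(2, len(events) + 1)
        for sub in combinations(events, r)
    )

# Two fair coin tosses, all four outcomes equally likely.
omega = set(product("HT", repeat=2))
P = {o: Fraction(1, 4) for o in omega}
A = {o for o in omega if o[0] == "H"}   # first toss is heads
B = {o for o in omega if o[1] == "H"}   # second toss is heads
C = {o for o in omega if o[0] == o[1]}  # the two tosses agree

print(pairwise_independent([A, B, C], P))  # True
print(mutually_independent([A, B, C], P))  # False: P(A∩B∩C) = 1/4, not 1/8
```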

Example 1.25. Three bits are transmitted across a noisy channel and the number of correct receptions is noted. Find the probability that the number of correctly received bits is two, assuming bit errors are mutually independent and that on each bit transmission the probability of correct reception is $\lambda$ for some fixed $0 \le \lambda \le 1$.

Solution. When the problem talks about the event that two bits are correctly received, we interpret this as meaning exactly two bits are received correctly; i.e., the other bit is received in error. Hence, there are three ways this can happen: the single error can be in the first bit, the second bit, or the third bit. To put this into the language of events, let $C_i$ denote the event that the $i$th bit is received correctly (so $P(C_i) = \lambda$), and let $S_2$ denote the event that two of the three bits sent are correctly received (see footnote h). Then

$$S_2 = (C_1^c\cap C_2\cap C_3)\cup(C_1\cap C_2^c\cap C_3)\cup(C_1\cap C_2\cap C_3^c).$$

This is a disjoint union, and so $P(S_2)$ is equal to

$$P(C_1^c\cap C_2\cap C_3) + P(C_1\cap C_2^c\cap C_3) + P(C_1\cap C_2\cap C_3^c). \tag{1.31}$$

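Next, since $C_1$, $C_2$, and $C_3$ are mutually independent, so are $C_1$ and $(C_2\cap C_3)$, because $P(C_1\cap C_2\cap C_3) = P(C_1)P(C_2)P(C_3) = P(C_1)P(C_2\cap C_3)$. Hence, $C_1^c$ and $(C_2\cap C_3)$ are also independent. Thus,

$$\begin{aligned} P(C_1^c\cap C_2\cap C_3) &= P(C_1^c)P(C_2\cap C_3) \\ &= P(C_1^c)P(C_2)P(C_3) \\ &= (1-\lambda)\lambda^2. \end{aligned}$$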

Treating the last two terms in (1.31) similarly, we have $P(S_2) = 3(1-\lambda)\lambda^2$. If bits are as likely to be received correctly as incorrectly, i.e., $\lambda = 1/2$, then $P(S_2) = 3/8$.
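The formula $P(S_2) = 3(1-\lambda)\lambda^2$ can be confirmed by listing all $2^3$ correct/incorrect patterns and, using mutual independence, multiplying per-bit factors $\lambda$ or $1-\lambda$. The function name `prob_exactly_two_correct` is a hypothetical choice for this sketch.

```python
from fractions import Fraction
from itertools import product

def prob_exactly_two_correct(lam):
    """Sum P(outcome) over the 2^3 outcomes with exactly two correct bits.

    Each outcome is a tuple of booleans (True = bit i received correctly); by
    mutual independence, its probability is the product of the per-bit factors.
    """
    total = Fraction(0)
    for outcome in product([True, False], repeat=3):
        if sum(outcome) == 2:
            p = Fraction(1)
            for correct in outcome:
                p *= lam if correct else 1 - lam
            total += p
    return total

lam = Fraction(1, 2)
print(prob_exactly_two_correct(lam))  # 3/8
print(3 * (1 - lam) * lam**2)         # 3/8, from the formula 3(1 - λ)λ^2
```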

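Example 1.26. If $A_1, A_2, \ldots$ are mutually independent, show that

$$P\Big(\bigcap_{n=1}^\infty A_n\Big) = \prod_{n=1}^\infty P(A_n).$$

Solution. Write

$$\begin{aligned} P\Big(\bigcap_{n=1}^\infty A_n\Big) &= \lim_{N\to\infty} P\Big(\bigcap_{n=1}^N A_n\Big), && \text{by limit property (1.14)}, \\ &= \lim_{N\to\infty} \prod_{n=1}^N P(A_n), && \text{by independence}, \\ &= \prod_{n=1}^\infty P(A_n), \end{aligned}$$

where the last step is just the definition of the infinite product.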

Footnote h. In working this example, we again do not explicitly specify the sample space $\Omega$ or the probability measure $P$.

Example 1.27. Consider the transmission of an unending sequence of bits over a noisy channel. Suppose that a bit is received in error with probability $0 < p < 1$. Assuming errors occur independently, what is the probability that every bit is received in error? What is the probability of ever having a bit received in error?

Solution. We use the result of the preceding example as follows. Let $\Omega$ be a sample space equipped with a probability measure $P$ and events $A_n$, $n = 1, 2, \ldots$, with $P(A_n) = p$, where the $A_n$ are mutually independent (see Note 9). Thus, $A_n$ corresponds to, or models, the event that the $n$th bit is received in error. The event that all bits are received in error corresponds to $\bigcap_{n=1}^\infty A_n$, and its probability is

$$P\Big(\bigcap_{n=1}^\infty A_n\Big) = \lim_{N\to\infty}\prod_{n=1}^N P(A_n) = \lim_{N\to\infty} p^N = 0.$$

The event of ever having a bit received in error corresponds to $A := \bigcup_{n=1}^\infty A_n$. Since $P(A) = 1 - P(A^c)$, it suffices to compute the probability of $A^c = \bigcap_{n=1}^\infty A_n^c$. Arguing exactly as above, we have

$$P\Big(\bigcap_{n=1}^\infty A_n^c\Big) = \lim_{N\to\infty}\prod_{n=1}^N P(A_n^c) = \lim_{N\to\infty}(1-p)^N = 0.$$

Thus, $P(A) = 1 - 0 = 1$.
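The finite-$N$ quantities inside the limits of Example 1.27 show how quickly these probabilities approach their limits. The sketch assumes a hypothetical per-bit error probability $p = 0.01$ and prints $p^N$, $(1-p)^N$, and $1 - (1-p)^N$ (the probability of at least one error among the first $N$ bits) for a few values of $N$. The function name `first_n_probabilities` is illustrative only.

```python
def first_n_probabilities(p, N):
    """For independent bit errors with probability p, return
    (P(first N bits all in error), P(first N bits all correct)) = (p**N, (1 - p)**N)."""
    return p ** N, (1 - p) ** N

p = 0.01  # hypothetical per-bit error probability, for illustration only
for N in (10, 100, 1000):
    all_err, all_ok = first_n_probabilities(p, N)
    # 1 - all_ok is the probability of at least one error among the first N bits.
    # For large N, p**N underflows to 0.0 in double-precision arithmetic.
    print(N, all_err, all_ok, 1 - all_ok)
```

Even with $p = 0.01$, the probability of at least one error among the first 1000 bits already exceeds 0.9999, consistent with $P(A) = 1$.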
