Conditional Probability and Independence

Now and then there is a person born who is so unlucky that he runs into accidents which started to happen to somebody else.

Don Marquis

2.1 Conditional Probability

Suppose you have a well-shuffled conventional pack of cards. Obviously (by symmetry), the probability P(T) of the event T that the top card is an ace is

$$P(T) = \frac{4}{52} = \frac{1}{13}.$$

However, suppose you notice that the bottom card is the ace of spades S_A. What now is the probability that the top card is an ace? There are 51 possibilities and three of them are aces, so by symmetry again the required probability is 3/51. To distinguish this from the original probability, we denote it by P(T|S_A) and call it the conditional probability of T given that the bottom card is the ace of spades.
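The symmetry argument can be checked by brute-force counting. The sketch below assumes that only the top and bottom cards matter and that, in a well-shuffled pack, every ordered pair of distinct cards is equally likely to occupy those two positions; the card encoding and the helper `prob` are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import permutations

RANKS = "A23456789TJQK"
SUITS = "SHDC"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards, e.g. "AS"


def prob(event, given=lambda top, bottom: True):
    """Return P(event | given) over equally likely (top, bottom) pairs."""
    pairs = [pair for pair in permutations(DECK, 2) if given(*pair)]
    hits = sum(1 for pair in pairs if event(*pair))
    return Fraction(hits, len(pairs))


def top_is_ace(top, bottom):
    return top[0] == "A"


# Unconditional probability that the top card is an ace.
p_T = prob(top_is_ace)

# Conditional probability given that the bottom card is the ace of spades.
p_T_given_SA = prob(top_is_ace, given=lambda top, bottom: bottom == "AS")

print(p_T, p_T_given_SA)
```

Exact fractions are used so that the counts can be compared directly with 4/52 and 3/51.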

Similarly, had you observed that the bottom card was the king of spades S_K, you would conclude that the probability that the top card is an ace is

$$P(T \mid S_K) = \frac{4}{51}.$$

Here is a less trivial example.

Example: Poker [Note: In this example the symbol $\binom{n}{r}$ denotes the number of ways of choosing r cards from n cards. If you are unfamiliar with this notation, omit this example at a first reading.]

Suppose you are playing poker. As the hand is dealt, you calculate the chance of being dealt a royal flush R, assuming that all hands of five cards are equally likely. (A royal flush comprises 10, J, Q, K, A in a single suit.) Just as you get the answer

$$P(R) = 4\binom{52}{5}^{-1} = \frac{1}{649740},$$

the dealer deals your last card face up. It is the ace of spades, S_A. If you accept the card, what now is your chance of picking up a royal flush?

Intuitively, it seems unlikely still to be P(R) above, as the conditions for getting one have changed. Now you need your first four cards to be the ten to king of spades precisely.

(Also, had your last card been the two of spades, S_2, your chance of a royal flush would definitely be zero.) As above, to distinguish this new probability, we call it the conditional probability of R given S_A and denote it by P(R|S_A).

Is it larger or smaller than P(R)? At least you do have an ace, which is a start, so it might be greater. But you cannot now get a flush in any suit but spades, so it might be smaller.

To resolve the uncertainty, you assume that any set of four cards from the remaining 51 cards is equally likely to complete your hand and calculate that

$$P(R \mid S_A) = \binom{51}{4}^{-1} = \frac{13}{5}P(R).$$

Your chances of a royal flush have more than doubled.
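The arithmetic in this example is easy to confirm. The following sketch simply evaluates the two binomial expressions with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Four royal flushes (one per suit) among all five-card hands.
p_R = Fraction(4, comb(52, 5))

# Given the ace of spades, exactly one set of four cards out of the
# remaining 51 completes the royal flush.
p_R_given_SA = Fraction(1, comb(51, 4))

ratio = p_R_given_SA / p_R
print(p_R, p_R_given_SA, ratio)
```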

Let us investigate these ideas in a more general setting. As usual we are given a sample space Ω, an event space F, and a probability function P(·). We suppose that some event B ∈ F definitely occurs, and denote the conditional probability of any event A, given B, by P(A|B). As we did for P(·), we observe that P(·|B) is a function defined on F, which takes values in [0, 1]. But what function is it?

Clearly, P(A) and P(A|B) are not equal in general, because even when P(B^c) ≠ 0 we always have

P(B^c|B) = 0.

Second, we note that given the occurrence of B, the event A can occur if and only if A∩ B occurs. This makes it natural to require that

P( A|B) ∝ P(A ∩ B).

Finally, and trivially,

P(B|B) = 1.

After a moment’s thought about these three observations, it appears that an attractive candidate to play the role of P(A|B) is P(A ∩ B)/P(B). We make these intuitive reflections formal as follows.

Definition Let A and B be events with P(B) > 0. Given that B occurs, the conditional probability that A occurs is denoted by P(A|B) and defined by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{1}$$

When P(B) = 0, the conditional probability P(A|B) is not defined by (1). However, to avoid an endless stream of tiresome reservations about special cases, it is convenient to adopt the convention that, even when P(B) = 0, we may still write P(A ∩ B) = P(A|B)P(B), both sides having the value zero. Thus, whether P(B) > 0 or not, it is true that

P(A ∩ B) = P(A|B)P(B).

Likewise, P(A ∩ B^c) = P(A|B^c)P(B^c) and hence, for any events A and B, we have proved the following partition rule:

(2) Theorem

$$P(A) = P(A \cap B) + P(A \cap B^c) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c).$$

The reader will come to realize the crucial importance of (1) and (2) as he or she discovers more about probability. We begin with a trivial example.

Example: Poker Revisited Let us check that Definition 1 is consistent with our informal discussion earlier in this section. By (1),

$$P(R \mid S_A) = \frac{P(R \cap S_A)}{P(S_A)} = \frac{1}{\binom{52}{5}} \bigg/ \frac{\binom{51}{4}}{\binom{52}{5}} = \binom{51}{4}^{-1}.$$

Here is a more complicated example.

Example: Lemons An industrial conglomerate manufactures a certain type of car in three towns called Farad, Gilbert, and Henry. Of 1000 made in Farad, 20% are defective; of 2000 made in Gilbert, 10% are defective; and of 3000 made in Henry, 5% are defective.

You buy a car from a distant dealer. Let D be the event that it is defective, F the event that it was made in Farad and so on. Find: (a) P(F|H^c); (b) P(D|H^c); (c) P(D); (d) P(F|D).

Assume that you are equally likely to have bought any one of the 6000 cars produced.

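Solution

(a)
$$\begin{aligned}
P(F \mid H^c) &= \frac{P(F \cap H^c)}{P(H^c)} && \text{by (1),} \\
&= \frac{P(F)}{P(H^c)} && \text{because } F \subseteq H^c, \\
&= \frac{1000/6000}{3000/6000} = \frac{1}{3}.
\end{aligned}$$

(b)
$$\begin{aligned}
P(D \mid H^c) &= \frac{P(D \cap H^c)}{P(H^c)} && \text{by (1),} \\
&= \frac{P(D \cap (F \cup G))}{P(H^c)} && \text{because } H^c = F \cup G, \\
&= \frac{P(D \cap F) + P(D \cap G)}{P(H^c)} && \text{because } F \cap G = \emptyset, \\
&= \frac{P(D \mid F)P(F) + P(D \mid G)P(G)}{P(H^c)} && \text{by (1),} \\
&= \frac{\frac{1}{5}\cdot\frac{1}{6} + \frac{1}{10}\cdot\frac{1}{3}}{\frac{1}{2}} && \text{on using the data in the question,} \\
&= \frac{2}{15}.
\end{aligned}$$

(c)
$$\begin{aligned}
P(D) &= P(D \mid H)P(H) + P(D \mid H^c)P(H^c) && \text{by (2),} \\
&= \frac{1}{20}\cdot\frac{1}{2} + \frac{2}{15}\cdot\frac{1}{2} && \text{on using the data and (b),} \\
&= \frac{11}{120}.
\end{aligned}$$

(d)
$$\begin{aligned}
P(F \mid D) &= \frac{P(F \cap D)}{P(D)} && \text{by (1),} \\
&= \frac{P(D \mid F)P(F)}{P(D)} && \text{by (1),} \\
&= \frac{\frac{1}{5}\cdot\frac{1}{6}}{\frac{11}{120}} && \text{on using the data and (c),} \\
&= \frac{4}{11}.
\end{aligned}$$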
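The four answers can be recomputed directly from the production figures. This is a minimal sketch using exact fractions; the dictionary layout and variable names are illustrative assumptions, and the only inputs are the numbers stated in the example.

```python
from fractions import Fraction

# Cars made and defect rates per town (F = Farad, G = Gilbert, H = Henry).
made = {"F": 1000, "G": 2000, "H": 3000}
defect_rate = {"F": Fraction(1, 5), "G": Fraction(1, 10), "H": Fraction(1, 20)}

total = sum(made.values())
p_town = {town: Fraction(count, total) for town, count in made.items()}
p_not_H = 1 - p_town["H"]

# (a) F is contained in H^c, so P(F and H^c) = P(F).
p_F_given_notH = p_town["F"] / p_not_H

# (b) Split D within H^c over the disjoint events F and G.
p_D_given_notH = sum(defect_rate[t] * p_town[t] for t in ("F", "G")) / p_not_H

# (c) Partition rule over all three towns.
p_D = sum(defect_rate[t] * p_town[t] for t in made)

# (d) Conditional probability of Farad given a defective car.
p_F_given_D = defect_rate["F"] * p_town["F"] / p_D

print(p_F_given_notH, p_D_given_notH, p_D, p_F_given_D)
```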

We often have occasion to use the following elementary generalization of Theorem 2.

(3) Theorem We have

$$P(A) = \sum_i P(A \mid B_i)P(B_i)$$

whenever A ⊆ ∪_i B_i and B_i ∩ B_j = ∅ for i ≠ j; the extended partition rule.

Proof This is immediate from (1.4.3) and (1).

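For example, with the notation of (3), we may write

$$P(B_j \mid A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(A \mid B_j)P(B_j)}{P(A)},$$

and expanding the denominator using (3), we have proved the following celebrated result, also known as Bayes’s Rule:

(4) Bayes’s Theorem If A ⊆ ∪_i B_i and B_i ∩ B_j = ∅ for i ≠ j, then

$$P(B_j \mid A) = \frac{P(A \mid B_j)P(B_j)}{\sum_i P(A \mid B_i)P(B_i)}, \qquad P(A) > 0.$$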

The following is a typical example of how (4) is applied in practice.

Example: False Positives You have a blood test for some rare disease that occurs by chance in 1 in every 100 000 people. The test is fairly reliable; if you have the disease, it will correctly say so with probability 0.95; if you do not have the disease, the test will wrongly say you do with probability 0.005. If the test says you do have the disease, what is the probability that this is a correct diagnosis?

Solution Let D be the event that you have the disease and T the event that the test says you do. Then, we require P(D|T ), which is given by

$$\begin{aligned}
P(D \mid T) &= \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid D^c)P(D^c)} && \text{by (4)} \\
&= \frac{(0.95)(0.00001)}{(0.95)(0.00001) + (0.99999)(0.005)} \approx 0.002.
\end{aligned}$$

Despite appearing to be a pretty good test, for a disease as rare as this the test is almost useless.
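The calculation is a two-hypothesis instance of Bayes’s rule, and it is instructive to evaluate it exactly. In the sketch below the function name `posterior` and its parameter names are illustrative, not notation from the text.

```python
from fractions import Fraction


def posterior(prior, sensitivity, false_positive_rate):
    """Return P(D | T) for a test with the given error rates."""
    numerator = sensitivity * prior                              # P(T|D) P(D)
    evidence = numerator + false_positive_rate * (1 - prior)     # P(T)
    return numerator / evidence


p_D_given_T = posterior(
    prior=Fraction(1, 100000),
    sensitivity=Fraction(95, 100),
    false_positive_rate=Fraction(5, 1000),
)
print(p_D_given_T, float(p_D_given_T))
```

Rerunning `posterior` with a larger prior shows how strongly the answer depends on the rarity of the disease rather than on the accuracy of the test.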

It is important to note that conditional probability is a probability function in the sense defined in Section 1.4. Thus, P(Ω|B) = 1 and, if A_i ∩ A_j = ∅ for i ≠ j, we have

$$P\Big(\bigcup_i A_i \,\Big|\, B\Big) = \sum_i P(A_i \mid B). \tag{5}$$

From these, we may deduce various useful identities (as we did in Section 1.4); for example:

$$P(A \cap B \cap C) = P(A \mid B \cap C)P(B \mid C)P(C), \tag{6}$$

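(10) Example Let us prove (5), (6), (7), (8), and (9). First,

$$\begin{aligned}
P\Big(\bigcup_i A_i \,\Big|\, B\Big) &= \frac{P\big(\big(\bigcup_i A_i\big) \cap B\big)}{P(B)} && \text{by (1),} \\
&= \frac{P\big(\bigcup_i (A_i \cap B)\big)}{P(B)} \\
&= \sum_i \frac{P(A_i \cap B)}{P(B)} && \text{by (1.4.3), because the } A_i \text{ are disjoint,} \\
&= \sum_i P(A_i \mid B) && \text{by (1) again,}
\end{aligned}$$

and we have proved (5). Second, by repeated use of (1),

$$P(A \mid B \cap C)P(B \mid C)P(C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A \cap B \cap C),$$

if the denominator is not zero. If the denominator is zero, then (6) still holds by convention, both sides taking the value zero.

The relation (7) follows by induction using (6); and (8) and (9) are trivial consequences of (5).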

(11) Example: Repellent and Attractive Events The event A is said to be attracted to B if P(A|B) > P(A).

If P(A|B) < P(A), then A is repelled by B, and A is indifferent to B if

P(A|B) = P(A). (12)

(a) Show that if B attracts A, then A attracts B, and B^c repels A.

(b) A flimsy slip of paper is in one of n bulging box files. The event that it is in the jth box file is B_j, where P(B_j) = b_j > 0. The event that a cursory search of the jth box file fails to discover the slip is F_j, where P(F_j|B_j) = φ_j < 1. Show that B_j and F_j are mutually repellent, but F_j attracts B_i, for i ≠ j.

Solution (a) Because B attracts A, by (1), P(A ∩ B) > P(A)P(B), whence, on dividing by P(A), we have P(B|A) > P(B). Furthermore, by Theorem 2,

P(A|B^c)P(B^c) = P(A) − P(A|B)P(B) < P(A)(1 − P(B)), because B attracts A,

= P(A)P(B^c).

So B^c repels A (on dividing through by P(B^c) ≠ 0).

(b) By Bayes’ theorem (4),

$$P(B_j \mid F_j) = \frac{P(F_j \mid B_j)P(B_j)}{\sum_{i=1}^{n} P(F_j \mid B_i)P(B_i)} = \frac{\varphi_j b_j}{1 - b_j + \varphi_j b_j},$$

because, obviously, for i ≠ j, P(F_j|B_i) = 1. Hence,

$$P(B_j) - P(B_j \mid F_j) = \frac{b_j(1 - b_j)(1 - \varphi_j)}{1 - b_j + \varphi_j b_j} > 0.$$

Therefore, B_j is repelled by F_j. Also, for i ≠ j,

$$P(B_i \mid F_j) - P(B_i) = \frac{b_i}{1 - b_j + \varphi_j b_j} - b_i = \frac{b_i b_j(1 - \varphi_j)}{1 - b_j + \varphi_j b_j} > 0,$$

so F_j attracts B_i, for i ≠ j.

Notice that this agrees with our intuition. We believe quite strongly that if we look in a file for a slip and fail to find it, then it is more likely (than before the search) to be elsewhere.

(Try to think about the consequences if the opposite were true.) This conclusion of Example 11 was not incorporated in our axioms, but follows from them. It therefore lends a small but valuable boost to their credibility.
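A small numerical instance of Example 11(b) makes the conclusion concrete. The sketch below assumes three box files with made-up values of b_j and φ_j (chosen only for illustration) and applies Bayes’s rule after a failed search of the first file; the function name is likewise hypothetical.

```python
from fractions import Fraction


def posterior_after_failed_search(b, phi, j):
    """Return [P(B_i | F_j)] given priors b and miss probabilities phi."""
    # P(F_j | B_i) is phi[j] when the slip is in file j, and 1 otherwise.
    likelihood = [phi[j] if i == j else Fraction(1) for i in range(len(b))]
    evidence = sum(l * b_i for l, b_i in zip(likelihood, b))  # P(F_j)
    return [l * b_i / evidence for l, b_i in zip(likelihood, b)]


# Illustrative values only: prior locations and miss probabilities.
b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
phi = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

post = posterior_after_failed_search(b, phi, j=0)
for prior, updated in zip(b, post):
    print(prior, "->", updated)
```

With these numbers the probability for the searched file falls below its prior, while every other file becomes more likely, as the general argument predicts.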

Finally, we consider sequences of conditional probabilities. Because conditional probability is a probability function [see (5)], we expect it to be continuous in the sense of Section 1.5. Thus if (as n → ∞) A_n → A and B_n → B, then by Theorem 1.5.2 we have

$$\lim_{n \to \infty} P(A_n \mid B) = P(A \mid B) \quad \text{and} \quad \lim_{n \to \infty} P(A \mid B_n) = P(A \mid B). \tag{13}$$

2.2 Independence

It may happen that the conditional probability P(A|B) is the same as the unconditional probability P(A), so that

$$P(A) = P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

This idea leads to the following:

(1) Definition (a) Events A and B are independent when P(A ∩ B) = P(A)P(B).

(b) A collection of events (A_i; i ≥ 1) is independent when

$$P\Big(\bigcap_{i \in F} A_i\Big) = \prod_{i \in F} P(A_i)$$

for any finite set F of indices.

(c) Events A and B are conditionally independent, given C, when P(A ∩ B|C) = P(A|C)P(B|C).

This does not imply independence unless C = Ω.

(d) A collection of events (A_i; i ≥ 1) is pairwise independent if P(A_i ∩ A_j) = P(A_i)P(A_j) for i ≠ j.

This does not imply independence in general.
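The gap between pairwise independence and independence can be seen in a standard small example, which is an illustration added here rather than one taken from the text: toss a fair coin twice, and let A be "first toss is a head", B be "second toss is a head", and C be "the two tosses agree". The sketch enumerates the four equally likely outcomes.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two tosses of a fair coin, all outcomes equally likely.
OMEGA = list(product("HT", repeat=2))


def P(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def A(w):
    return w[0] == "H"      # first toss is a head


def B(w):
    return w[1] == "H"      # second toss is a head


def C(w):
    return w[0] == w[1]     # the two tosses agree


# Every pair satisfies the product rule ...
pairwise = all(
    P(lambda w: X(w) and Y(w)) == P(X) * P(Y)
    for X, Y in combinations([A, B, C], 2)
)

# ... but the triple does not.
triple = P(lambda w: A(w) and B(w) and C(w)) == P(A) * P(B) * P(C)

print(pairwise, triple)
```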

It is easy to see that independence is equivalent to the idea of indifference defined in (2.1.12), but the term “indifference” is not in general use. It is usually, but not always, clear when two events are independent, as the next two examples illustrate.

(2) Example: Sport Prior to a game of football, you toss a coin for the kick-off. Let C be the event that you win the toss, and let M be the event that you win the match.

(a) Show that the outcome of the match is independent of whether you win the toss if and only if, for some p and p′, with 0 < p, p′ < 1,

P(C ∩ M) = pp′, P(C ∩ M^c) = p(1 − p′), P(C^c ∩ M) = (1 − p)p′, and

P(C^c ∩ M^c) = (1 − p)(1 − p′).

(b) Let B be the event that you win both or lose both, so B = {(C ∩ M) ∪ (C^c ∩ M^c)}.

Suppose that C and M are indeed independent. Show that C and B are independent if and only if p′ = 1/2.

Solution (a) If C and M are independent, and P(C) = p and P(M) = p′, then by definition P(C ∩ M) = pp′ and so on.

Conversely, for the given probabilities

P(C) = P(C ∩ M) + P(C ∩ M^c) = pp′ + p(1 − p′) = p

and similarly we have P(M) = p′. Hence,

P(C)P(M) = pp′ = P(C ∩ M).

This, together with three similar identities (exercises for you), demonstrates the independence.

(b) Trivially, P(C ∩ B) = P(C ∩ M). Hence, C and B are independent if

pp′ = P(C ∩ M) = P(C)P(B) = p(pp′ + (1 − p)(1 − p′)).

That is, if (1 − p)(1 − 2p′) = 0. Because p ≠ 1, it follows that p′ = 1/2. The converse is trivial.
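Part (b) can be spot-checked numerically. In this sketch p stands for P(C) and p_prime for P(M); the function name and the test values are illustrative assumptions, and C and M are taken to be independent as in the example.

```python
from fractions import Fraction


def c_and_b_independent(p, p_prime):
    """Check P(C and B) == P(C) P(B) when C and M are independent."""
    p_C = p
    # B = win both or lose both.
    p_B = p * p_prime + (1 - p) * (1 - p_prime)
    # C and B together force M, so P(C and B) = P(C and M).
    p_C_and_B = p * p_prime
    return p_C_and_B == p_C * p_B


print(c_and_b_independent(Fraction(1, 3), Fraction(1, 2)))  # P(M) = 1/2
print(c_and_b_independent(Fraction(1, 3), Fraction(2, 3)))  # P(M) != 1/2
```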

(3) Example: Flowers A plant gets two independent genes for flower colour, one from each parent plant. If the genes are identical, then the flowers are uniformly of that colour; if they are different, then the flowers are striped in those two colours. The genes for the colours pink, crimson, and red occur in the population in the proportions p : q : r, where p + q + r = 1. A given plant’s parents are selected at random; let A be the event that its flowers are at least partly pink, and let B be the event that its flowers are striped.

(a) Find P(A) and P(B).

(b) Show that A and B are independent if p = 2/3 and r = q = 1/6.

(c) Are these the only values of p, q, and r such that A and B are independent?

Solution (a) With an obvious notation (P for pink, C for crimson, and R for red), we have

P(PP) = P(P)P(P), by the parents’ independence,

= p²,

because P occurs with probability p. Likewise,

P(PR) = P(R)P(P) = rp = P(RP).

Hence,

P(A) = P(PP ∪ PR ∪ RP ∪ PC ∪ CP)

= p² + 2pr + 2pq by (1.4.3),

= 1 − (1 − p)²,

because p + q + r = 1. (Can you see how to get this last expression directly?) Similarly,

P(B) = P(PC ∪ PR ∪ RC) = 2(pq + qr + rp).

(b) The events A and B are independent if and only if

P(A)P(B) = P(A ∩ B) = P(PC ∪ PR) = 2(pq + pr).

From part (a), this is equivalent to

(1 − (1 − p)²)(pq + qr + pr) = p(q + r), (4)

and this is satisfied by the given values of p, q, and r.

(c) No. Rearranging (4), we see that A and B are independent for any values of q and r lying on the curve rq = 2rq(q + r) + r³ + q³, in the r−q plane. You may care to amuse yourself by showing that this is a loop from the origin. Outside the loop, A and B are attractive; inside the loop, A and B are repellent.
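Both the given values and the answer to (c) can be checked with exact arithmetic. The sketch below uses the expressions for P(A), P(B), and P(A ∩ B) from the solution; the second triple (p, q, r) = (5/7, 2/21, 4/21) is not from the text but was obtained here by putting r = 2q in the curve equation, and the function names are illustrative.

```python
from fractions import Fraction


def flower_probs(p, q, r):
    """Return (P(A), P(B), P(A and B)) for gene proportions p, q, r."""
    assert p + q + r == 1
    p_A = 1 - (1 - p) ** 2              # at least partly pink
    p_B = 2 * (p * q + q * r + r * p)   # striped
    p_AB = 2 * (p * q + p * r)          # striped with pink
    return p_A, p_B, p_AB


def on_curve(q, r):
    """Test whether (q, r) lies on the curve rq = 2rq(q + r) + r^3 + q^3."""
    return r * q == 2 * r * q * (q + r) + r ** 3 + q ** 3


given = (Fraction(2, 3), Fraction(1, 6), Fraction(1, 6))
other = (Fraction(5, 7), Fraction(2, 21), Fraction(4, 21))  # from r = 2q

for p, q, r in (given, other):
    p_A, p_B, p_AB = flower_probs(p, q, r)
    print(p_A * p_B == p_AB, on_curve(q, r))
```

Both triples satisfy the independence condition and lie on the curve, so the values in part (b) are not the only ones.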

(5) Example 1.13 Revisited: Craps Let us reconsider this game using conditional probability and independence. Recall that A_k is the event that you win by rolling a pair with sum k. Let S_k be the event that any given roll yields sum k. Now, for example, A_4 occurs only if S_4 occurs at the first roll and S_4 occurs before S_7 in later rolls. However, all the rolls after the first until the first occurrence of S_4 or S_7 are irrelevant, and rolls are independent.

Hence,

P( A4)= P(S4)P(S4|S4∪ S7)= (P(S4))² P(S4∪ S7) =

3 36
