Conditional Probability and Independence
Now and then there is a person born who is so unlucky that he runs into accidents which started to happen to somebody else.
Don Marquis
2.1 Conditional Probability
Suppose you have a well-shuffled conventional pack of cards. Obviously (by symmetry), the probability P(T) of the event T that the top card is an ace is
\[ P(T) = \frac{4}{52} = \frac{1}{13}. \]
However, suppose you notice that the bottom card is the ace of spades $S_A$. What now is the probability that the top card is an ace? There are 51 possibilities and three of them are aces, so by symmetry again the required probability is $\frac{3}{51}$. To distinguish this from the original probability, we denote it by $P(T \mid S_A)$ and call it the conditional probability of T given that the bottom card is the ace of spades.
Similarly, had you observed that the bottom card was the king of spades $S_K$, you would conclude that the probability that the top card is an ace is
\[ P(T \mid S_K) = \frac{4}{51}. \]
Here is a less trivial example.
Example: Poker [Note: In this example the symbol $\binom{n}{r}$ denotes the number of ways of choosing r cards from n cards. If you are unfamiliar with this notation, omit this example at a first reading.]
Suppose you are playing poker. As the hand is dealt, you calculate the chance of being dealt a royal flush R, assuming that all hands of five cards are equally likely. (A royal flush comprises 10, J, Q, K, A in a single suit.) Just as you get the answer
\[ P(R) = 4\binom{52}{5}^{-1} = \frac{1}{649740}, \]
the dealer deals your last card face up. It is the ace of spades, $S_A$. If you accept the card, what now is your chance of picking up a royal flush?
Intuitively, it seems unlikely still to be P(R) above, as the conditions for getting one have changed. Now you need your first four cards to be the ten to king of spades precisely.
(Also, had your last card been the two of spades, $S_2$, your chance of a royal flush would definitely be zero.) As above, to distinguish this new probability, we call it the conditional probability of R given $S_A$ and denote it by $P(R \mid S_A)$.
Is it larger or smaller than P(R)? At least you do have an ace, which is a start, so it might be greater. But you cannot now get a flush in any suit but spades, so it might be smaller.
To resolve the uncertainty, you assume that any set of four cards from the remaining 51 cards is equally likely to complete your hand and calculate that
\[ P(R \mid S_A) = \binom{51}{4}^{-1} = \frac{13}{5} P(R). \]
Your chances of a royal flush have more than doubled.
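The two poker probabilities are easy to check numerically. The following is a minimal sketch using exact rational arithmetic; the variable names are chosen here for illustration and do not appear in the text.

```python
from fractions import Fraction
from math import comb

# All 5-card hands are assumed equally likely; exactly 4 are royal flushes.
p_royal = Fraction(4, comb(52, 5))

# Given the ace of spades, exactly one set of four cards (10, J, Q, K of
# spades) out of the comb(51, 4) equally likely completions gives a royal flush.
p_royal_given_ace = Fraction(1, comb(51, 4))

ratio = p_royal_given_ace / p_royal

print(p_royal)            # 1/649740
print(p_royal_given_ace)  # 1/249900
print(ratio)              # 13/5
```

The ratio 13/5 = 2.6 confirms that the chance has more than doubled.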
Let us investigate these ideas in a more general setting. As usual we are given a sample space $\Omega$, an event space F, and a probability function P(.). We suppose that some event B ∈ F definitely occurs, and denote the conditional probability of any event A, given B, by P(A|B). As we did for P(.), we observe that P(.|B) is a function defined on F, which takes values in [0, 1]. But what function is it?
Clearly, P(A) and P(A|B) are not equal in general, because even when $P(B^c) \neq 0$ we always have
\[ P(B^c \mid B) = 0. \]
Second, we note that given the occurrence of B, the event A can occur if and only if A∩ B occurs. This makes it natural to require that
P( A|B) ∝ P(A ∩ B).
Finally, and trivially,
P(B|B) = 1.
After a moment’s thought about these three observations, it appears that an attractive candidate to play the role of P( A|B) is P(A ∩ B)/P(B). We make these intuitive reflections formal as follows.
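The "moment's thought" can be spelled out in a few lines. The sketch below uses only the proportionality requirement and the observation P(B|B) = 1, and it assumes P(B) > 0; the constant c is a symbol introduced here for illustration.

```latex
\begin{align*}
P(A \mid B) &= c\,P(A \cap B)
  && \text{for some constant } c \text{ not depending on } A,\\
1 = P(B \mid B) &= c\,P(B \cap B) = c\,P(B)
  && \text{on taking } A = B,\\
\text{so}\quad c &= \frac{1}{P(B)}
  \quad\text{and}\quad
  P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\end{align*}
```

With this choice the first observation also holds automatically, because $P(B^c \cap B) = 0$.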
Definition  Let A and B be events with P(B) > 0. Given that B occurs, the conditional probability that A occurs is denoted by P(A|B) and defined by
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \tag{1} \]
When P(B) = 0, the conditional probability P(A|B) is not defined by (1). However, to avoid an endless stream of tiresome reservations about special cases, it is convenient to adopt the convention that, even when P(B) = 0, we may still write $P(A \cap B) = P(A \mid B)P(B)$, both sides having the value zero. Thus, whether P(B) > 0 or not, it is true that
\[ P(A \cap B) = P(A \mid B)P(B). \]
Likewise, $P(A \cap B^c) = P(A \mid B^c)P(B^c)$ and hence, for any events A and B, we have proved the following partition rule:
(2) Theorem
\[ P(A) = P(A \cap B) + P(A \cap B^c) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c). \]
The reader will come to realize the crucial importance of (1) and (2) as he or she discovers more about probability. We begin with a trivial example.
Example: Poker Revisited  Let us check that Definition 1 is consistent with our informal discussion earlier in this section. By (1),
\[ P(R \mid S_A) = \frac{P(R \cap S_A)}{P(S_A)} = \frac{1 \big/ \binom{52}{5}}{\binom{51}{4} \big/ \binom{52}{5}} = \binom{51}{4}^{-1}. \]
Here is a more complicated example.
Example: Lemons  An industrial conglomerate manufactures a certain type of car in three towns called Farad, Gilbert, and Henry. Of 1000 made in Farad, 20% are defective; of 2000 made in Gilbert, 10% are defective; and of 3000 made in Henry, 5% are defective.
You buy a car from a distant dealer. Let D be the event that it is defective, F the event that it was made in Farad and so on. Find: (a) $P(F \mid H^c)$; (b) $P(D \mid H^c)$; (c) $P(D)$; (d) $P(F \mid D)$.
Assume that you are equally likely to have bought any one of the 6000 cars produced.
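Solution
(a)
\begin{align*}
P(F \mid H^c) &= \frac{P(F \cap H^c)}{P(H^c)} && \text{by (1),}\\
&= \frac{P(F)}{P(H^c)} && \text{because } F \subseteq H^c,\\
&= \frac{1000/6000}{3000/6000} = \frac{1}{3}.
\end{align*}
(b)
\begin{align*}
P(D \mid H^c) &= \frac{P(D \cap H^c)}{P(H^c)} && \text{by (1),}\\
&= \frac{P(D \cap (F \cup G))}{P(H^c)} && \text{because } H^c = F \cup G,\\
&= \frac{P(D \cap F) + P(D \cap G)}{P(H^c)} && \text{because } F \cap G = \emptyset,\\
&= \frac{P(D \mid F)P(F) + P(D \mid G)P(G)}{P(H^c)} && \text{by (1),}\\
&= \frac{\frac{1}{5} \cdot \frac{1}{6} + \frac{1}{10} \cdot \frac{1}{3}}{\frac{1}{2}} && \text{on using the data in the question,}\\
&= \frac{2}{15}.
\end{align*}
(c)
\begin{align*}
P(D) &= P(D \mid H)P(H) + P(D \mid H^c)P(H^c) && \text{by (2),}\\
&= \frac{1}{20} \cdot \frac{1}{2} + \frac{2}{15} \cdot \frac{1}{2} && \text{on using the data and (b),}\\
&= \frac{11}{120}.
\end{align*}
(d)
\begin{align*}
P(F \mid D) &= \frac{P(F \cap D)}{P(D)} && \text{by (1),}\\
&= \frac{P(D \mid F)P(F)}{P(D)} && \text{by (1),}\\
&= \frac{\frac{1}{5} \cdot \frac{1}{6}}{\frac{11}{120}} && \text{on using the data and (c),}\\
&= \frac{4}{11}.
\end{align*}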
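The four answers can be reproduced directly from the production figures. This is a minimal sketch with exact fractions; the dictionary layout and variable names are assumptions made here for illustration.

```python
from fractions import Fraction

# Production counts and defect rates from the example.
made = {"F": 1000, "G": 2000, "H": 3000}
defect_rate = {"F": Fraction(1, 5), "G": Fraction(1, 10), "H": Fraction(1, 20)}

total = sum(made.values())
p_town = {t: Fraction(n, total) for t, n in made.items()}    # P(F), P(G), P(H)
p_def_and = {t: defect_rate[t] * p_town[t] for t in made}    # P(D and town), by (1)

p_not_h = 1 - p_town["H"]                                      # P(H^c)
p_f_given_not_h = p_town["F"] / p_not_h                        # (a)
p_d_given_not_h = (p_def_and["F"] + p_def_and["G"]) / p_not_h  # (b)
p_d = sum(p_def_and.values())                                  # (c), partition rule
p_f_given_d = p_def_and["F"] / p_d                             # (d)

print(p_f_given_not_h, p_d_given_not_h, p_d, p_f_given_d)
# 1/3 2/15 11/120 4/11
```

Here P(D) is obtained by summing over all three towns, which is the extended form of the partition rule rather than the two-event form used in part (c); the result is the same.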
We often have occasion to use the following elementary generalization of Theorem 2.
(3) Theorem  We have
\[ P(A) = \sum_i P(A \mid B_i)P(B_i) \]
whenever $A \subseteq \bigcup_i B_i$ and $B_i \cap B_j = \emptyset$ for $i \neq j$; the extended partition rule.
Proof This is immediate from (1.4.3) and (1).
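For example, with the notation of (3), we may write
\[ P(B_j \mid A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(A \mid B_j)P(B_j)}{P(A)}, \]
and expanding the denominator using (3), we have proved the following celebrated result, also known as Bayes's Rule:
(4) Bayes's Theorem  If $A \subseteq \bigcup_i B_i$, $B_i \cap B_j = \emptyset$ for $i \neq j$, and $P(A) > 0$, then
\[ P(B_j \mid A) = \frac{P(A \mid B_j)P(B_j)}{\sum_i P(A \mid B_i)P(B_i)}. \]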
The following is a typical example of how (4) is applied in practice.
Example: False Positives You have a blood test for some rare disease that occurs by chance in 1 in every 100 000 people. The test is fairly reliable; if you have the disease, it will correctly say so with probability 0.95; if you do not have the disease, the test will wrongly say you do with probability 0.005. If the test says you do have the disease, what is the probability that this is a correct diagnosis?
Solution  Let D be the event that you have the disease and T the event that the test says you do. Then, we require P(D|T), which is given by
\begin{align*}
P(D \mid T) &= \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid D^c)P(D^c)} && \text{by (4)}\\
&= \frac{(0.95)(0.00001)}{(0.95)(0.00001) + (0.99999)(0.005)} \simeq 0.002.
\end{align*}
Despite appearing to be a pretty good test, for a disease as rare as this the test is almost useless.
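The arithmetic of this example is worth repeating with other numbers, so here is a small sketch of Bayes's rule for the two-event partition {D, D^c}. The function name `posterior` and its parameter names are hypothetical, chosen here for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(D | T) by Bayes's rule for the partition {D, D^c}.

    prior               = P(D)
    sensitivity         = P(T | D)
    false_positive_rate = P(T | D^c)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

# The numbers of the example: a disease affecting 1 person in 100 000.
p = posterior(prior=1e-5, sensitivity=0.95, false_positive_rate=0.005)
print(round(p, 4))  # 0.0019
```

Trying a larger prior, such as 0.01, shows how strongly the answer depends on the rarity of the disease rather than on the quality of the test alone.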
It is important to note that conditional probability is a probability function in the sense defined in Section 1.4. Thus, $P(\Omega \mid B) = 1$ and, if $A_i \cap A_j = \emptyset$ for $i \neq j$, we have
\[ P\Bigl(\bigcup_i A_i \Bigm| B\Bigr) = \sum_i P(A_i \mid B). \tag{5} \]
From these, we may deduce various useful identities (as we did in Section 1.4); for example:
\[ P(A \cap B \cap C) = P(A \mid B \cap C)P(B \mid C)P(C), \tag{6} \]
(10) Example  Let us prove (5), (6), (7), (8), and (9). First,
\begin{align*}
P\Bigl(\bigcup_i A_i \Bigm| B\Bigr) &= \frac{P\bigl((\bigcup_i A_i) \cap B\bigr)}{P(B)} && \text{by (1),}\\
&= \frac{P\bigl(\bigcup_i (A_i \cap B)\bigr)}{P(B)}\\
&= \sum_i \frac{P(A_i \cap B)}{P(B)} && \text{by (1.4.3), because the } A_i \text{ are disjoint,}\\
&= \sum_i P(A_i \mid B) && \text{by (1) again,}
\end{align*}
and we have proved (5). Second, by repeated use of (1),
\[ P(A \mid B \cap C)P(B \mid C)P(C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \cdot \frac{P(B \cap C)}{P(C)} \cdot P(C) = P(A \cap B \cap C), \]
if the denominator is not zero. If the denominator is zero, then (6) still holds by convention, both sides taking the value zero.
The relation (7) follows by induction using (6); and (8) and (9) are trivial consequences of (5).
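Identities such as (5) and (6) can also be checked by brute force on a small finite sample space. The sketch below assumes two fair dice with 36 equally likely outcomes; the events A, B, C, A1, and A2 are chosen here purely for illustration and are not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, all equally likely.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

def cond(a, b):
    """P(a | b) from definition (1); requires P(b) > 0."""
    return prob(a & b) / prob(b)

A = {w for w in omega if w[0] + w[1] >= 9}   # sum is at least 9
B = {w for w in omega if w[0] >= 4}          # first die shows 4, 5, or 6
C = {w for w in omega if w[1] % 2 == 0}      # second die is even

# Identity (6): P(A and B and C) = P(A | B and C) P(B | C) P(C).
lhs = prob(A & B & C)
rhs = cond(A, B & C) * cond(B, C) * prob(C)
print(lhs == rhs)  # True

# Additivity (5) for two disjoint events, conditional on B.
A1 = {w for w in omega if w[0] + w[1] == 7}
A2 = {w for w in omega if w[0] + w[1] == 11}
print(cond(A1 | A2, B) == cond(A1, B) + cond(A2, B))  # True
```

A check of this kind proves nothing in general, but it is a quick way to catch a misremembered identity.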
(11) Example: Repellent and Attractive Events  The event A is said to be attracted to B if P(A|B) > P(A).
If P(A|B) < P(A), then A is repelled by B, and A is indifferent to B if
\[ P(A \mid B) = P(A). \tag{12} \]
(a) Show that if B attracts A, then A attracts B, and $B^c$ repels A.
(b) A flimsy slip of paper is in one of n bulging box files. The event that it is in the jth box file is $B_j$, where $P(B_j) = b_j > 0$. The event that a cursory search of the jth box file fails to discover the slip is $F_j$, where $P(F_j \mid B_j) = \phi_j < 1$. Show that $B_j$ and $F_j$ are mutually repellent, but $F_j$ attracts $B_i$, for $i \neq j$.
Solution (a) Because B attracts A, by (1), $P(A \cap B) > P(A)P(B)$, whence, on dividing by P(A), we have $P(B \mid A) > P(B)$. Furthermore, by Theorem 2,
\begin{align*}
P(A \mid B^c)P(B^c) &= P(A) - P(A \mid B)P(B) < P(A)(1 - P(B)) && \text{because } B \text{ attracts } A,\\
&= P(A)P(B^c).
\end{align*}
So $B^c$ repels A (on dividing through by $P(B^c) \neq 0$).
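(b) By Bayes' theorem (4),
\[ P(B_j \mid F_j) = \frac{P(F_j \mid B_j)P(B_j)}{\sum_{i=1}^{n} P(F_j \mid B_i)P(B_i)} = \frac{\phi_j b_j}{1 - b_j + \phi_j b_j} \]
because, obviously, for $i \neq j$, $P(F_j \mid B_i) = 1$. Hence,
\[ P(B_j) - P(B_j \mid F_j) = \frac{b_j(1 - b_j)(1 - \phi_j)}{1 - b_j + \phi_j b_j} > 0. \]
Therefore, $B_j$ is repelled by $F_j$. Also, for $i \neq j$,
\[ P(B_i \mid F_j) - P(B_i) = \frac{b_i}{1 - b_j + \phi_j b_j} - b_i = \frac{b_i b_j(1 - \phi_j)}{1 - b_j + \phi_j b_j} > 0, \]
so $F_j$ attracts $B_i$, for $i \neq j$.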
Notice that this agrees with our intuition. We believe quite strongly that if we look in a file for a slip and fail to find it, then it is more likely (than before the search) to be elsewhere.
(Try to think about the consequences if the opposite were true.) This conclusion of Example 11 was not incorporated in our axioms, but follows from them. It therefore lends a small but valuable boost to their credibility.
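To see the box-file result in numbers, the following sketch updates the probabilities after a failed search of one file. The values of b and phi are hypothetical, chosen here only for illustration, and the function name is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical numbers for three box files.
b = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]    # b_j = P(B_j), summing to 1
phi = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]   # phi_j = P(F_j | B_j)

def posterior_after_failed_search(i, j):
    """P(B_i | F_j): the slip is in file i, given that a search of file j failed."""
    p_fail = 1 - b[j] + phi[j] * b[j]        # P(F_j), by the partition rule
    likelihood = phi[j] if i == j else 1     # P(F_j | B_i)
    return likelihood * b[i] / p_fail

j = 0  # the first file is searched without success
post = [posterior_after_failed_search(i, j) for i in range(len(b))]
print([str(x) for x in post])  # ['1/5', '12/25', '8/25']
```

The probability for the searched file falls from 1/2 to 1/5, while those for the other two files rise from 3/10 to 12/25 and from 1/5 to 8/25, as the example predicts.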
Finally, we consider sequences of conditional probabilities. Because conditional probability is a probability function [see (5)], we expect it to be continuous in the sense of Section 1.5. Thus if (as n → ∞) $A_n \to A$ and $B_n \to B$, then by Theorem 1.5.2 we have
\[ \lim_{n \to \infty} P(A_n \mid B) = P(A \mid B) \]
and
\[ \lim_{n \to \infty} P(A \mid B_n) = P(A \mid B). \tag{13} \]
2.2 Independence
It may happen that the conditional probability P(A|B) is the same as the unconditional probability P(A), so that
\[ P(A) = P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \]
This idea leads to the following:
(1) Definition
(a) Events A and B are independent when $P(A \cap B) = P(A)P(B)$.
(b) A collection of events $(A_i;\ i \geq 1)$ is independent when
\[ P\Bigl(\bigcap_{i \in F} A_i\Bigr) = \prod_{i \in F} P(A_i) \]
for any finite set F of indices.
(c) Events A and B are conditionally independent, given C, when
\[ P(A \cap B \mid C) = P(A \mid C)P(B \mid C). \]
This does not imply independence unless $C = \Omega$.
(d) A collection of events $(A_i;\ i \geq 1)$ is pairwise independent if $P(A_i \cap A_j) = P(A_i)P(A_j)$ for $i \neq j$.
This does not imply independence in general.
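A standard small example shows why pairwise independence is weaker than independence. The sketch below assumes two tosses of a fair coin; the three events are chosen here for illustration and are not taken from the text.

```python
from fractions import Fraction
from itertools import combinations, product

# Two fair coin tosses; the four outcomes are equally likely.
omega = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}    # first toss is a head
B = {w for w in omega if w[1] == "H"}    # second toss is a head
C = {w for w in omega if w[0] == w[1]}   # the two tosses agree

# Part (d) of the definition: every pair satisfies the product rule.
pairwise = all(prob(x & y) == prob(x) * prob(y)
               for x, y in combinations([A, B, C], 2))

# Part (b) with all three indices: the product rule fails.
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise)  # True
print(triple)    # False
```

Each event has probability 1/2 and each pair intersects in the single outcome HH, but so does the triple, giving 1/4 rather than 1/8.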
It is easy to see that independence is equivalent to the idea of indifference defined in (2.1.12), but the term “indifference” is not in general use. It is usually, but not always, clear when two events are independent, as the next two examples illustrate.
(2) Example: Sport  Prior to a game of football, you toss a coin for the kick-off. Let C be the event that you win the toss, and let M be the event that you win the match.
(a) Show that the outcome of the match is independent of whether you win the toss if and only if, for some $p$ and $p'$, with $0 < p, p' < 1$,
\[ P(C \cap M) = pp', \quad P(C \cap M^c) = p(1 - p'), \quad P(C^c \cap M) = (1 - p)p', \quad\text{and}\quad P(C^c \cap M^c) = (1 - p)(1 - p'). \]
(b) Let B be the event that you win both or lose both, so $B = \{(C \cap M) \cup (C^c \cap M^c)\}$.
Suppose that C and M are indeed independent. Show that C and B are independent if and only if $p' = \frac{1}{2}$.
Solution (a) If C and M are independent, and $P(C) = p$ and $P(M) = p'$, then by definition $P(C \cap M) = pp'$ and so on.
Conversely, for the given probabilities
\[ P(C) = P(C \cap M) + P(C \cap M^c) = pp' + p(1 - p') = p \]
and similarly we have $P(M) = p'$. Hence,
\[ P(C)P(M) = pp' = P(C \cap M). \]
This, together with three similar identities (exercises for you), demonstrates the independence.
(b) Trivially, $P(C \cap B) = P(C \cap M)$. Hence, C and B are independent if
\[ pp' = P(C \cap M) = P(C)P(B) = p\bigl(pp' + (1 - p)(1 - p')\bigr). \]
That is, if $(1 - p)(1 - 2p') = 0$. Because $p \neq 1$, it follows that $p' = \frac{1}{2}$. The converse is trivial.
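Part (b) can be explored numerically. In this sketch, P(C) is written p and P(M) is written p_prime, and C and M are assumed independent; the function name and the trial values are chosen here for illustration.

```python
from fractions import Fraction

def c_and_b_independent(p, p_prime):
    """Check P(C and B) == P(C) P(B), where B is 'win both or lose both',
    assuming C and M are independent with P(C) = p and P(M) = p_prime."""
    p_c_and_b = p * p_prime                        # C and B together is C and M
    p_b = p * p_prime + (1 - p) * (1 - p_prime)    # win both or lose both
    return p_c_and_b == p * p_b

print(c_and_b_independent(Fraction(1, 3), Fraction(1, 2)))  # True
print(c_and_b_independent(Fraction(1, 3), Fraction(2, 3)))  # False
```

With the match probability equal to 1/2 the check succeeds whatever the value of p; any other match probability makes it fail when 0 < p < 1.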
(3) Example: Flowers  A plant gets two independent genes for flower colour, one from each parent plant. If the genes are identical, then the flowers are uniformly of that colour; if they are different, then the flowers are striped in those two colours. The genes for the colours pink, crimson, and red occur in the population in the proportions p : q : r, where p + q + r = 1. A given plant’s parents are selected at random; let A be the event that its flowers are at least partly pink, and let B be the event that its flowers are striped.
(a) Find P(A) and P(B).
(b) Show that A and B are independent if $p = \frac{2}{3}$ and $r = q = \frac{1}{6}$.
(c) Are these the only values of p, q, and r such that A and B are independent?
Solution (a) With an obvious notation (P for pink, C for crimson, and R for red), we have
\[ P(PP) = P(P)P(P) = p^2, \]
by the independence of the parents and because P occurs with probability p. Likewise,
\[ P(PR) = P(R)P(P) = rp = P(RP). \]
Hence,
\begin{align*}
P(A) &= P(PP \cup PR \cup RP \cup PC \cup CP)\\
&= p^2 + 2pr + 2pq && \text{by (1.4.3),}\\
&= 1 - (1 - p)^2,
\end{align*}
because p + q + r = 1. (Can you see how to get this last expression directly?) Similarly,
\[ P(B) = P(PC \cup PR \cup RC) = 2(pq + qr + rp). \]
(b) The events A and B are independent if and only if
\[ P(A)P(B) = P(A \cap B) = P(PC \cup PR) = 2(pq + pr). \]
From part (a), this is equivalent to
\[ \bigl(1 - (1 - p)^2\bigr)(pq + qr + pr) = p(q + r), \tag{4} \]
and this is satisfied by the given values of p, q, and r.
(c) No. Rearranging (4), we see that A and B are independent for any values of q and r lying on the curve $rq = 2rq(q + r) + r^3 + q^3$, in the r-q plane. You may care to amuse yourself by showing that this is a loop from the origin. Outside the loop, A and B are attractive; inside the loop, A and B are repellent.
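Parts (b) and (c) can be checked by enumerating the nine ordered gene pairs. The sketch below is illustrative only: the function name is an assumption, and the second set of proportions (p = 5/7, q = 2/21, r = 4/21) is not from the text but was found here by putting r = 2q in the curve of part (c).

```python
from fractions import Fraction
from itertools import product

def flower_probabilities(p, q, r):
    """Return (P(A), P(B), P(A and B)) by enumerating ordered gene pairs."""
    freq = {"P": p, "C": q, "R": r}
    pa = pb = pab = Fraction(0)
    for g1, g2 in product(freq, repeat=2):
        weight = freq[g1] * freq[g2]     # the two genes are independent
        pink = "P" in (g1, g2)           # A: flowers at least partly pink
        striped = g1 != g2               # B: two different colours
        pa += weight * pink
        pb += weight * striped
        pab += weight * (pink and striped)
    return pa, pb, pab

# Part (b): the values given in the example.
pa, pb, pab = flower_probabilities(Fraction(2, 3), Fraction(1, 6), Fraction(1, 6))
print(pa, pb, pab)      # 8/9 1/2 4/9
print(pa * pb == pab)   # True

# Part (c): another point on the curve, so the given values are not unique.
pa2, pb2, pab2 = flower_probabilities(Fraction(5, 7), Fraction(2, 21), Fraction(4, 21))
print(pa2 * pb2 == pab2)  # True
```

Enumerating ordered pairs avoids having to remember the factor 2 for the striped combinations.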
(5) Example 1.13 Revisited: Craps  Let us reconsider this game using conditional probability and independence. Recall that $A_k$ is the event that you win by rolling a pair with sum k. Let $S_k$ be the event that any given roll yields sum k. Now, for example, $A_4$ occurs only if $S_4$ occurs at the first roll and $S_4$ occurs before $S_7$ in later rolls. However, all the rolls after the first until the first occurrence of $S_4$ or $S_7$ are irrelevant, and rolls are independent.
Hence,
$P(A_4) = P(S_4)\,P(S_4 \mid S_4 \cup S_7) = \dfrac{(P(S_4))^2}{P(S_4 \cup S_7)} =$
3 36