Academic year: 2022

Week 3 Conditional probabilities, Bayes formula, Expected value of a random variable

We recall our discussion of 5 card poker hands.

Example 13 : a) What is the probability of the event A that a 5 card poker hand contains one or more aces?

This is the probability of the complement of the event that the hand contains no aces.

$P(\text{no aces}) = \binom{48}{5} \big/ \binom{52}{5}$. Thus

$$P(A) = P(\text{1 or more aces}) = 1 - P(\text{no aces}) = \frac{\binom{52}{5} - \binom{48}{5}}{\binom{52}{5}},$$

where the numerator $\binom{52}{5} - \binom{48}{5}$ is the number of hands having 1 or more aces (i.e. not having zero aces). We could also have computed this by noting that the event 1 or more aces is the union of the 4 events : exactly 1 ace (hence 4 non-aces chosen from 48 non-aces), exactly 2 aces (hence 3 non-aces), exactly 3 aces, or exactly 4 aces. By our basic counting principles we then must have

$$\binom{52}{5} - \binom{48}{5} = \binom{4}{1}\binom{48}{4} + \binom{4}{2}\binom{48}{3} + \binom{4}{3}\binom{48}{2} + \binom{4}{4}\binom{48}{1}.$$
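As a quick numerical check of the counting identity in part a), the following sketch counts the hands with at least one ace both ways using Python's standard `math.comb`. The variable names are illustrative and not part of the notes.

```python
from math import comb
from fractions import Fraction

total_hands = comb(52, 5)      # all 5 card hands
no_ace_hands = comb(48, 5)     # hands drawn only from the 48 non-aces

# Count hands with at least one ace by complement ...
by_complement = total_hands - no_ace_hands
# ... and by summing over hands with exactly j aces, j = 1..4
by_cases = sum(comb(4, j) * comb(48, 5 - j) for j in range(1, 5))

p_at_least_one_ace = Fraction(by_complement, total_hands)

print(by_complement, by_cases)       # the two counts should agree
print(float(p_at_least_one_ace))
```

Using `Fraction` keeps the probability exact, so the two counting arguments can be compared without rounding.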

b) What is the probability of the event B that we are dealt a full house consisting of 3 aces and 2 kings? The probability is the number of ways to choose 3 aces from the 4 aces in the deck times the number of ways to choose 2 kings from the 4 kings, over the total number of 5 card poker hands, or $\binom{4}{3}\binom{4}{2} \big/ \binom{52}{5}$.

c) What is the probability of any full house (3 of one kind, two of another)? This is the same as in b) above except there are 13 ways to choose the kind that we have 3 of and 12 ways to choose the kind we have 2 of (since it can't be the kind we already picked). So the probability is $13 \cdot 12 \cdot \binom{4}{3}\binom{4}{2} \big/ \binom{52}{5}$.

d) What is the conditional probability $P(B \mid A)$ of the full house in b) if we are told by the dealer, before he gives us our 5 cards, that our hand has at least 1 ace in it? This is a conditional probability problem conditioned on the event A that our hand has at least one ace in it. We saw in part a) that the number of hands with one or more aces is the total number of hands minus the number having no aces, or $\binom{52}{5} - \binom{48}{5}$ hands. These hands constitute the reduced sample space A for the conditional problem (i.e. the event A that our hand has at least one ace in it in this example). We can assume that our hand is then randomly selected from these hands with each such hand being equally likely. This yields the number of full houses aces over kings found in the numerator of b) above over the size of the reduced sample space, or

$$P(B \mid A) = \frac{\binom{4}{3}\binom{4}{2}}{\binom{52}{5} - \binom{48}{5}} = \frac{P(B \cap A)}{P(A)}.$$

To see the last equality, note that dividing both the numerator and denominator above by the total number $\binom{52}{5}$ of poker hands gives the answer in b) divided by the answer in a), i.e. $P(B)/P(A)$. Since the event B (the full house containing 3 aces) is contained in the event A that at least one ace occurs, we also have $B = B \cap A$. This is how we will define conditional probability in general. I.e. we have motivated the following definition.
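The equality of the two expressions for the conditional probability in part d) can be checked with exact arithmetic. This is only a sketch; the variable names are ours.

```python
from math import comb
from fractions import Fraction

total_hands = comb(52, 5)
hands_with_ace = total_hands - comb(48, 5)    # reduced sample space A
aces_over_kings = comb(4, 3) * comb(4, 2)     # full houses with 3 aces, 2 kings

# Direct count inside the reduced sample space
p_direct = Fraction(aces_over_kings, hands_with_ace)

# Ratio of unconditional probabilities P(B and A) / P(A), using B = B and A
p_b = Fraction(aces_over_kings, total_hands)
p_a = Fraction(hands_with_ace, total_hands)
p_ratio = p_b / p_a

print(p_direct, p_ratio)
```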

Definition of conditional probability : for any two events A and B with P(A) non-zero we define

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)}.$$

If we let $p(B) = P(B \mid A)$, one checks that for fixed A the set function $p(\cdot)$ satisfies the 3 axioms needed for it to be a probability measure. Namely it satisfies

1) $0 \le p(B) \le 1$ for every event B

2) $p(\Omega) = p(A) = \dfrac{P(A)}{P(A)} = 1$ (i.e. since $\Omega \cap A = A$ the sample space $\Omega$ has measure 1)

3) countable additivity for $p(\cdot)$ follows directly from the same property of $P(\cdot)$.

Thus a conditional probability measure really is a probability measure.

The above definition can be re-written as the multiplication rule for conditional probabilities :

$$P(A \cap B) = P(A) \cdot P(B \mid A).$$

For three events (and similarly for more than three) this takes the form

$$P(A \cap B \cap C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B),$$

which can be verified by going back to the definition of conditional probabilities. If we think of events which are ordered in time, this says that the probability of a sequence of events occurring may be written as a product where at each step of the product we condition the next event on the previous events that are assumed to have already occurred. For the three events above first A occurs and then B occurs (given A already has) and then C occurs (given A and B have already occurred).

Example 1 : To calculate the probability of selecting 3 aces in a row from a randomly shuffled 52 card deck, we can use a direct counting argument as in earlier problems. Namely the number of ways to do this is the number of ways to choose 3 aces from the 4 in the deck, and to get the probability we divide by the number of ways to select 3 cards from 52. I.e.

$$P(\text{select three aces in a row from a randomly shuffled 52 card deck}) = \binom{4}{3} \Big/ \binom{52}{3} = \frac{4 \cdot 3 \cdot 2}{52 \cdot 51 \cdot 50}.$$

We could also obtain the same result using the multiplication rule for conditional probabilities. Namely, letting $A_i$ be the event that the $i$th card selected is an ace, we have

$$P(A_1 A_2 A_3) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 A_2) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50}.$$

That is, for the first ace there are 4 ways (aces) to choose an ace out of 52 equally likely cards to select , but having chosen an ace there are now 3 ways to choose an ace from the remaining 3 aces left in the 51 equally likely cards remaining, and finally 2 ways to choose the third ace out of the remaining 2 aces in the 50 equally likely cards remaining given that an ace was obtained in each of the previous two selections.
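Both routes to the three-aces probability can be compared numerically. The sketch below (names are ours) builds the product one draw at a time, conditioning each draw on the aces already removed.

```python
from math import comb
from fractions import Fraction

# Counting argument: choose 3 of the 4 aces, out of all 3 card selections
p_counting = Fraction(comb(4, 3), comb(52, 3))

# Multiplication rule: condition each draw on the aces already drawn
p_chain = Fraction(1)
aces_left, cards_left = 4, 52
for _ in range(3):
    p_chain *= Fraction(aces_left, cards_left)
    aces_left -= 1      # one fewer ace remains
    cards_left -= 1     # one fewer card remains

print(p_counting, p_chain)
```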

Example 2 : problem 3.70 b) from the text : during the month of May in a certain town the probability of the event $R_{k+1}$ that day k+1 is rainy, given the event $R_k$ that day k was rainy, is .80. That is, assuming days are either rainy or sunny (not rainy), $P(R_{k+1} \mid R_k) = .80$. This implies that $P(S_{k+1} \mid R_k) = .20$ is the probability of a sunny day given the previous day was rainy. We are also told that $P(R_{k+1} \mid S_k) = .60$ is the probability that the next day will be rainy given the previous was sunny.

What then is the probability that for some 5 consecutive days in May in this town a rainy day is followed by two more rainy days, then a sunny day and then a rainy day? We are not told the probability that a particular day is rainy, so we must interpret this as the probability, given that the first day is rainy, that the second and third are too, the fourth is sunny and the fifth is rainy. By the multiplication rule for conditional probabilities we have

$$P(R_5 S_4 R_3 R_2 \mid R_1) = P(R_5 \mid S_4 R_3 R_2 R_1) \cdot P(S_4 \mid R_3 R_2 R_1) \cdot P(R_3 \mid R_2 R_1) \cdot P(R_2 \mid R_1).$$

Now we use the property (the so-called Markov property) that the next event depends only on the previous one and nothing earlier (one says that the Markov chain has a "memory of one") to get

$$P(R_5 S_4 R_3 R_2 \mid R_1) = P(R_5 \mid S_4) \cdot P(S_4 \mid R_3) \cdot P(R_3 \mid R_2) \cdot P(R_2 \mid R_1) = .6 \cdot .2 \cdot .8 \cdot .8 = .0768$$

for our desired conditional probability.
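The product of one-step conditional probabilities in Example 2 can be sketched as a small function. The dictionary layout and the function name are our own choices, and the entry P(sunny | sunny) = .4 is not stated in the problem; it is inferred here from the complement rule.

```python
# Transition probabilities P(next | current) for states R (rainy), S (sunny)
transition = {
    ("R", "R"): 0.8, ("R", "S"): 0.2,   # given today is rainy
    ("S", "R"): 0.6, ("S", "S"): 0.4,   # given today is sunny (.4 by complement)
}

def path_probability(path):
    """Probability of the rest of `path` given its first state (Markov property)."""
    prob = 1.0
    for today, tomorrow in zip(path, path[1:]):
        prob *= transition[(today, tomorrow)]
    return prob

p = path_probability("RRRSR")   # days 1..5: R, R, R, S, R
print(round(p, 4))
```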

Remark : A Markov chain generalizes this example to the case where there may be more than the two states rainy or sunny. For example, in the finite state space case for a 5 state chain, the chain is described by a 5 by 5 transition matrix of conditional probabilities. For example the row 1 column 3 entry of the matrix gives the conditional probability of going to state 3 given that the previous state was state 1. If we want to know the probability of going from state 2 to state 5 in 4 time steps we look at the (2,5) entry of the fourth power of the transition matrix.

If one wants to remember the last two states (memory of two) there is a trick one uses. This situation can still be described by a Markov chain where now we enlarge the state space from 5 states to all 25 pairs of states (which we could label as states 1 through 25). Now the transition matrix is a 25 by 25 matrix of 625 transition probabilities. Similarly if we want to remember the previous three states our enlarged state space would then consist of all 125 triples of states one through 5 (so 5 cubed or 125 states) etc. To learn more consider taking Math 632 Introduction to Stochastic (random) Processes.
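To illustrate reading multi-step probabilities off a power of the transition matrix, here is a minimal sketch using the two state rainy/sunny chain of Example 2 rather than a 5 state chain. The helper functions are hypothetical, written in plain Python so no external library is assumed, and the .4 entry is again inferred by complement.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(p, steps):
    """The matrix p raised to the power `steps` (steps >= 1)."""
    result = p
    for _ in range(steps - 1):
        result = mat_mul(result, p)
    return result

# Rows = current state, columns = next state, in the order (rainy, sunny)
P = [[0.8, 0.2],
     [0.6, 0.4]]

P4 = mat_power(P, 4)
print(P4[0][1])   # probability that it is sunny 4 days after a rainy day
```

Each row of the power is still a probability distribution, so its entries sum to 1.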

Independent events : Intuitively, what we mean when we say two events A and B are independent, such as two consecutive flips of a fair coin, is that being told that event B has occurred (the first flip yielded a head, say) should not influence the probability of A occurring (the second flip is a tail, say), or in symbols $P(A \mid B) = P(A)$.

Equivalently, by the definition of conditional probability, this is true if and only if

$$P(A \cap B) = P(A) P(B) \qquad \text{(pairwise independence)}$$

i.e. the probability of the intersection of the two events factors as the product of the probabilities of the individual events. More generally, we say a collection of events is independent if the probability of the intersection of the events in any sub-collection of two or more of the events factors as the product of the probabilities of the individual events: $P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k}) = P(A_{i_1}) \times P(A_{i_2}) \times \dots \times P(A_{i_k})$. It is possible for three events to fail to be independent (not independent = dependent) even when any two of the events are pairwise independent.

Example 3 : Three events which are dependent (not independent) but which are pairwise independent : For a simple example of this consider flipping a fair coin twice. Let

A = the first flip yields a head = {HH, HT},
B = the second flip yields a head = {TH, HH},
C = exactly one head occurs in the two flips = {TH, HT}.

Note that the intersection of the three events is the empty set, which has probability 0, so we have $P(A \cap B \cap C) = P(\emptyset) = 0 \ne P(A) \cdot P(B) \cdot P(C) = 1/8$, since the individual events each have probability $P(A) = P(B) = P(C) = 1/2$ so their product is 1/8. Since 0 is not equal to 1/8, A, B, and C are not independent events. But we claim the probability of the intersection of any two of these events is $P(A \cap B) = P(B \cap C) = P(A \cap C) = 1/4$, so these are pairwise independent events since clearly $P(A \cap B) = P(A) \cdot P(B)$ etc.
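The claims in Example 3 can be checked by brute force over the four equally likely outcomes. This is a sketch with our own helper `prob`; events are represented as Python sets.

```python
from fractions import Fraction
from itertools import combinations

sample_space = {"HH", "HT", "TH", "TT"}   # equally likely outcomes
A = {"HH", "HT"}                          # first flip is a head
B = {"TH", "HH"}                          # second flip is a head
C = {"TH", "HT"}                          # exactly one head

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# Every pair factors: P(X and Y) == P(X) * P(Y)
pairwise = all(prob(x & y) == prob(x) * prob(y)
               for x, y in combinations([A, B, C], 2))

# The triple does not: P(A and B and C) = 0 but the product is 1/8
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, triple)
```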

Example 4 : Consider 5 rolls of a fair (six-sided) die. The probability of rolling a 3 for any particular roll is 1/6 while the probability of not rolling a 3 is 5/6 by the probability of the complement. Find the probability of rolling exactly two 3's in 5 rolls. Using independence, the probability of any particular sequence of 5 rolls, which we view as 5 independent events, two of which involve rolling a 3 and the other three of which involve rolling anything else, is the product of the individual probabilities: 1/6 times 1/6 times 5/6 times 5/6 times 5/6. But there are 5 choose 2 ways that we could have selected the particular two rolls in which the 3 occurred. Thus by the sum rule for probabilities of disjoint events :

$$P(\text{two threes in 5 rolls}) = \binom{5}{2} (1/6)^2 (5/6)^3.$$

This is an example of a binomial random variable where the probability of success (a 3 is rolled) for each trial (roll) is p = 1/6 and the probability of failure is (1 − p) = 5/6. Similar reasoning gives that the probability of exactly k successes in n independent trials each having success probability p is the binomial probability

$$b(k; n, p) = P(\text{exactly } k \text{ successes in } n \text{ independent trials each having success probability } p) = \binom{n}{k} p^k (1-p)^{n-k}.$$

Example 5 : Consider a system where parts 1 and 2 operate independently of one another, each with failure probability .1, but are running in parallel so that the system diagram looks like

             _____
      ______|  1  |______
     /      |_____|      \
 ---/                     \__________
    \        _____        /
     \______|  2  |______/
            |_____|

Letting

A = {component 1 operates successfully}, B = {component 2 operates successfully},

then

$$P(\text{system succeeds}) = P(\text{either 1 or 2 operates}) = P(A \cup B) = P(A) + P(B) - P(A) \cdot P(B) = .9 + .9 - .9 \cdot .9 = .99$$

so the combined system is operational 99% of the time.

A slightly more complicated example involves a similar system like

             _____       _____
      ______|  1  |_____|  3  |__
     /      |_____|     |_____|  \
 ---/                             \_________
    \        _____                /
     \______|  2  |______________/
            |_____|

where now with event C = {component 3 operates successfully}, with C independent of A and B,

$$P(\text{system succeeds}) = P((A \cap C) \cup B) = P(A \cap C) + P(B) - P(A \cap B \cap C) = (.9)(.9) + .9 - (.9)(.9)(.9) = .981$$

using independence of the three events and assuming component 3 also fails with probability .1.
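The two system reliabilities above can be double-checked by enumerating every working/failed pattern of the components and adding up the probabilities of the patterns in which the system works. This is a sketch; `system_probability` is a hypothetical helper, not something from the text.

```python
from itertools import product

p_work = 0.9   # each component works with probability .9 (fails with .1)

# Closed forms from inclusion-exclusion and independence
parallel = p_work + p_work - p_work * p_work
series_parallel = p_work * p_work + p_work - p_work ** 3

def system_probability(works, n_components):
    """Sum the probabilities of all component patterns for which works(...) is True."""
    total = 0.0
    for state in product([True, False], repeat=n_components):
        weight = 1.0
        for up in state:
            weight *= p_work if up else 1 - p_work
        if works(*state):
            total += weight
    return total

check_parallel = system_probability(lambda a, b: a or b, 2)
check_series_parallel = system_probability(lambda a, b, c: (a and c) or b, 3)

print(round(parallel, 3), round(series_parallel, 3))
```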

The success probability is slightly lower than before since now both components 1 and 3 must work properly for the top series to work.

Example 6 : Problem 3.75 : A tree diagram for conditional probabilities is a useful device. Figure 3.16 of the text, used for exercise 3.75, gives a tree with branch probabilities .4 (to B), .30 (from B to A) and .2 (from $\bar{B}$ to $\bar{A}$), which can be filled in as

                  .30
            B ---------- A
      .4  /   \   .70
         /     \-------- A'
     ---<
         \        .8
      .6  \ B' --------- A
               \   .2
                \------- A'

(here A', B' stand for the complements $\bar{A}$, $\bar{B}$).

The interpretation is that $P(B) = .4$, $P(A \mid B) = .30$, $P(\bar{A} \mid \bar{B}) = .20$, from which we infer $P(\bar{B}) = .6$, $P(\bar{A} \mid B) = .7$, $P(A \mid \bar{B}) = .8$ (using the law of the probability for the complement).

a) $P(A) = P(B) P(A \mid B) + P(\bar{B}) P(A \mid \bar{B}) = (.4)(.3) + (.6)(.8) = .60$

b) $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{P(B) P(A \mid B)}{P(A)} = \dfrac{(.4)(.3)}{.60} = \dfrac{1}{5} = .20$ by part a), using the multiplication rule for conditional probabilities.

Note that the original diagram gave us $P(A \mid B)$ and what was wanted in part b) was to reverse the order of conditioning, that is to find $P(B \mid A)$. This is the situation where Bayes' theorem applies. One has a collection of mutually exclusive (disjoint) events which exhaust the sample space. In this case the sample space is a disjoint union of $B$ and $\bar{B}$.

c) Similarly $P(B \mid \bar{A}) = \dfrac{P(B) P(\bar{A} \mid B)}{P(\bar{A})} = \dfrac{(.4)(.7)}{.4} = .70$
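Parts a) through c) of Example 6 amount to a few lines of arithmetic. The sketch below uses exact fractions and variable names of our own choosing.

```python
from fractions import Fraction

p_b = Fraction(4, 10)                # P(B)
p_a_given_b = Fraction(3, 10)        # P(A | B)
p_a_given_not_b = Fraction(8, 10)    # P(A | not B) = 1 - .2

# a) rule of total probability
p_a = p_b * p_a_given_b + (1 - p_b) * p_a_given_not_b

# b) Bayes: reverse the order of conditioning
p_b_given_a = p_b * p_a_given_b / p_a

# c) the same idea with the complement of A
p_b_given_not_a = p_b * (1 - p_a_given_b) / (1 - p_a)

print(p_a, p_b_given_a, p_b_given_not_a)
```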

Part a) is referred to as the rule of total probability (or rule of elimination). It is used to get the denominator in part b). Parts b) and c) are known as Bayes' theorem. To get the rule of total probability we note that a disjoint partition of the sample space such as $B \cup \bar{B} = \Omega$ also partitions any set $A = (A \cap B) \cup (A \cap \bar{B})$ as a disjoint union. Since the union is disjoint, the third axiom of probability gives $P(A) = P(A \cap B) + P(A \cap \bar{B})$. We then re-write each probability on the right via the multiplication rule for conditional probabilities (essentially the definition of conditional probability). This gives the rule of total probability which gives the denominator in Bayes' theorem.

More generally if $E_1 \cup E_2 \cup \dots \cup E_k = \Omega$ is a disjoint union which exhausts the sample space $\Omega$ then Bayes' theorem says

$$P(E_l \mid D) = \frac{P(E_l) P(D \mid E_l)}{\sum_{j=1}^{k} P(E_j) P(D \mid E_j)}.$$

The probabilities $P(E_l)$, $l = 1, \dots, k$, are called the priors (which reflect our best knowledge before the experiment D). Then we collect some new data and update these to get the posterior probabilities $P(E_l \mid D)$. Certain types of probability distributions have the property that the prior and posterior both have a similar form except that certain real number parameters characteristic of the distribution change in ways which are easy to compute. Then if we get some new data in we can regard the old posterior as the new prior and use the new data to update to get the new posterior and so on.

Example 7 : An Ace Electronics dealer sells 3 brands of televisions : 50% are the first brand, which are Italian made, 30% are the second brand, which are French made, and 20% are the third brand, which are Swiss made. Each TV comes with a 1 year warranty. We know that 25% of the Italian TVs will require repair work under warranty, 20% of the French will require repairs under warranty, and 10% of the Swiss TVs will need repair under warranty.

a) What is the probability that a randomly selected customer has bought an Italian TV that will need repairs under warranty ?

b) What is the probability that a randomly selected customer has a TV that will need repairs under warranty ?

c) If the customer returns to the store with a TV that needs repairs under warranty, what is the probability that it is an Italian TV? French ? Swiss ?

Let A be the event that a randomly selected TV is Italian, B that it is French, C that it is Swiss, and R that it will need repairs under warranty.

We are given that $P(A) = .50$, $P(B) = .30$, $P(C) = .20$, $P(R \mid A) = .25$, $P(R \mid B) = .20$, $P(R \mid C) = .10$.

Part a) asks for $P(A \cap R) = P(A) \cdot P(R \mid A) = .50 \cdot .25 = .125 = 1/8$ by the product rule for conditional probabilities.

Part b) wants P(R), which uses the rule of total probability

$$P(R) = P(R \cap A) + P(R \cap B) + P(R \cap C) = P(A) P(R \mid A) + P(B) P(R \mid B) + P(C) P(R \mid C) = .125 + .060 + .020 = .205.$$

Part c) asks for $P(A \mid R) = \dfrac{P(A) P(R \mid A)}{P(R)} = \dfrac{.125}{.205} = .61$ using Bayes' theorem, and again for $P(B \mid R) = \dfrac{.060}{.205} = .29$ and $P(C \mid R) = \dfrac{.020}{.205} = .10$, which could also be obtained as $1 - P(A \mid R) - P(B \mid R) = 1 - .61 - .29 = .10$.
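The general form of Bayes' theorem over a partition can be sketched as one small function and applied to the TV data of Example 7. The function name `posterior` and the dictionary keys are our own; the numbers are those given in the example.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over a partition.

    priors[E] = P(E), likelihoods[E] = P(D | E).
    Returns (dict of P(E | D), P(D)).
    """
    joint = {e: priors[e] * likelihoods[e] for e in priors}   # P(E and D)
    evidence = sum(joint.values())                            # P(D), total probability
    return {e: joint[e] / evidence for e in joint}, evidence

priors = {"Italian": Fraction(50, 100), "French": Fraction(30, 100),
          "Swiss": Fraction(20, 100)}
repair_rates = {"Italian": Fraction(25, 100), "French": Fraction(20, 100),
                "Swiss": Fraction(10, 100)}

post, p_repair = posterior(priors, repair_rates)
print(float(p_repair))
for brand, p in post.items():
    print(brand, round(float(p), 2))
```

Because the posteriors are computed from the same denominator, they automatically sum to 1, which is the shortcut used for the Swiss case in part c).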

Example 8 : Suppose a certain diagnostic test is 98% accurate both on those who do and those who don't have a disease. If .3% = .003 of the population has cancer, find the probability that a randomly selected tested person has cancer (C) given that the test is positive (+). The given information says $P(C) = .003$, $P(+ \mid C) = .98 = P(- \mid \bar{C})$, from which we deduce that $P(\bar{C}) = .997$, $P(+ \mid \bar{C}) = .02 = P(- \mid C)$. Then by Bayes' theorem,

$$P(C \mid +) = \frac{P(+ \mid C) P(C)}{P(+ \mid C) P(C) + P(+ \mid \bar{C}) P(\bar{C})} = \frac{.98 \times .003}{.98 \times .003 + .02 \times .997} \approx \frac{3}{23}.$$

This is about 13% (more precisely, about .1285). Thus the posterior probability of having the disease (given a positive test result) is roughly 130 chances out of 1000, up from the original prior probability of .003 (3 chances in 1000) prior to testing.

A positive test result only gives about a 13% chance of having the disease, due to the fact that the chance of having it in the population as a whole is so small.
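The calculation in Example 8 can be wrapped in a small function to see how strongly the answer depends on the prior. The parameter names `prevalence`, `sensitivity` and `specificity` are standard terms that we introduce here; the notes simply call the test "98% accurate" in both directions.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prevalence                 # P(+ and C)
    false_pos = (1 - specificity) * (1 - prevalence)    # P(+ and not C)
    return true_pos / (true_pos + false_pos)

p = positive_predictive_value(prevalence=0.003, sensitivity=0.98, specificity=0.98)
print(round(p, 4))
```

Calling the function with a larger `prevalence` shows the posterior rising quickly, which is the point made in the last sentence of the example.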

Expectation of a random variable (also called its expected value or mean value)

Consider the following game: we flip a fair coin 5 times. If a head occurs for the first time on the $j$th flip, the game pays us winnings of $W = W(\omega) = a_j = 2^j$ dollars for $1 \le j \le 5$, where $\omega$ is a particular outcome of the experiment, that is, a sequence of 5 heads or tails such that the first head occurs on the $j$th flip. Using independence of flips this happens with probability

$$p_j = P(\{\omega : W(\omega) = a_j\}) = \frac{1}{2^j} \qquad (= P(W = a_j) \text{ for short}).$$

If no heads occur in 5 flips we'll take $j = 6$ so that the game pays us the grand prize of $W = 2^6 = 64$ dollars, and this occurs with probability 1/32. Now the 64 dollar question is : How much are you willing to pay to play this game so that on average you will break even?

If we play the game a large number of times n then by the relative frequency interpretation of probabilities, the number of times $s_j$ we won amount $a_j$ dollars (successes $s_j$ in winning the $j$th amount) is approximately $n p_j$ (equivalently the probability $p_j \approx s_j / n$ is approximately the relative fraction of times we won amount $a_j$). Thus if we play the game independently many times n, winning amount $W_k$ on the $k$th time that we played, our long term average or sample mean winnings satisfy

$$\bar{W} = \frac{1}{n} \sum_{k=1}^{n} W_k \approx \frac{1}{n} \sum_j a_j \, n p_j,$$

or canceling the n's and denoting our average winnings by E[W] (the expected value of the random variable W) we find for the definition of the expectation of any discrete random variable W taking possible values $a_j$ (from a countable set of values) with probabilities $p_j$ :

$$E[W] = \sum_j a_j p_j = \sum_j a_j P(W = a_j).$$

In our particular example the sum over j runs over $1 \le j \le 6$, which gives

$$E[W] = 2(1/2) + 4(1/4) + 8(1/8) + 16(1/16) + 32(1/32) + 64(1/32) = 1 + 1 + 1 + 1 + 1 + 2 = 7 \text{ dollars}.$$

Thus we should be willing to pay $7 each time we play the game, assuming we play more than 32 times, long enough to win the grand prize which we expect to happen around 1/32 of the time.

The situation is a little different if instead of stopping at the 5th flip we flip the coin 15 times with grand prize $2^{16}$ dollars. It is easy enough to calculate the expectation in this case using the above formula. But realistically not all of us would want to wait around for on average $2^{15}$ plays, which could take several years, before we see the rare event of winning the grand prize, which occurs with probability $1/2^{15}$. Economists speak of a utility function which describes how much playing such a game is worth to us personally and which may vary from person to person depending on our tastes in gambling and how much we are willing to risk.

In one of the homework problems, problem 3.90, involving expected values, a company pays some per unit cost of C dollars and sells the item at a per unit sales price $S = S_1$. If a fixed number k of items are stocked for the day then the cost of the k units is a fixed amount $k \cdot C$. The demand (how many units customers desire to purchase that day) is assumed to be a random variable where $p_j$ gives the probability that the demand equals j units that day, and the sales price resulting from a given demand j is then a random variable X given by

$$X = S \cdot j \ \text{ if } j \le k - 1, \quad \text{else } X = S \cdot k \ \text{ if } j \ge k.$$

(I.e. the actual sales cannot exceed the number in stock.) Then the expected profit is the expected sales price minus the fixed cost, or

$$E[P] = E[X] - C \cdot k = \sum_{j=0}^{k-1} p_j \, S \cdot j + \sum_{j \ge k} p_j \cdot S \cdot k - C \cdot k.$$

That is, for demand j less than the number k in stock our sales price for the j units sold is $S \cdot j$ and this occurs with probability $p_j$, while if the demand is greater than or equal to the number k in stock we sell k items at a price $S \cdot k$ and this occurs with probability $\sum_{j \ge k} p_j$. Note we could have re-written the fixed cost as $k \cdot C = k \cdot C \cdot \sum_j p_j$ (since the sum of the probabilities equals 1), and then the above is equivalent to the expected profit where the profit is the random variable $X - C \cdot k$, which is the sales price random variable minus the fixed cost for k items in stock.
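The expected profit formula can be sketched in code as follows. The demand distribution, price and cost below are hypothetical numbers chosen only for illustration; they are not the data of problem 3.90, and the function name is ours.

```python
from fractions import Fraction

def expected_profit(demand_probs, stock, price, cost):
    """E[X] - cost * stock, where X = price * min(demand, stock).

    demand_probs[j] = P(demand equals j units).
    """
    expected_sales = sum(p * price * min(j, stock)
                         for j, p in enumerate(demand_probs))
    return expected_sales - cost * stock

# Hypothetical data: demand takes values 0..4 with these probabilities
demand_probs = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10),
                Fraction(2, 10), Fraction(1, 10)]
price, cost = 5, 3   # assumed per unit sales price S and per unit cost C

for k in range(5):
    print(k, expected_profit(demand_probs, k, price, cost))
```

Writing the sales as `price * min(j, stock)` covers both cases of the formula at once: demand below the stock level sells j units, and demand at or above it sells all k units.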
