MCB 432
Basic Probability:
Probability
The probability of an event is usually represented as P(event).
For tossing a fair coin:
P(head) = 1/2
P(tail) = 1/2
When throwing a fair 6-sided die, the probability of any specific number is 1/6. So,
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
Combining Alternatives
If there are multiple mutually exclusive ways to arrive at a goal, the combined probability is the sum of the individual probabilities. That is,
P(A or B) = P(A) + P(B)
Because these outcomes are mutually exclusive (you cannot get a 1 and a 2 on the same throw), the probability of a die throw yielding a 1 or a 2 is:
P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 2/6 = 1/3
Similarly, the probability of an even number is:
P(2 or 4 or 6) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
Something Happens
A useful rule is that the sum of the probabilities of all possible outcomes is 1 (something will happen).
This is why the probability of a given number on a die is 1/6 (it is trivial in this case, but it is very useful in other circumstances). It comes from:
P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1
By the definition of a fair die, all outcomes are equally likely:
P(1) = P(2) = P(3) = P(4) = P(5) = P(6)
So we can replace P(2) etc. with P(1):
P(1) + P(1) + P(1) + P(1) + P(1) + P(1) = 1
6 x P(1) = 1
P(1) = 1/6 [= P(2) = P(3) = P(4) = P(5) = P(6)]
Combinations of Events
To get the probability that both of two separate (independent) things will happen, we take the product of the probabilities. That is,
P(A and B) = P(A) x P(B)
For example, when throwing a die two times, the probability of getting 1's both times is the probability of a 1 followed by a 1, which is:
P(1,1) = P(1) x P(1) = 1/6 x 1/6 = 1/36
The same concept applies to the probability of a 1 followed by a 2:
P(1,2) = P(1) x P(2) = 1/6 x 1/6 = 1/36
Alternative Combinations of Events
The previous case specified a 1 followed by a 2. What if all we want is that the total of the dice is 3? There are two ways to get a total of 3: 1 then 2, or 2 then 1.
P(total 3) = P(1,2) + P(2,1)
= 1/6 x 1/6 + 1/6 x 1/6 = 1/36 + 1/36 = 2/36 = 1/18
If order does not matter, then we need to consider all possible orders. All alternatives that arrive at the goal must be considered. When throwing two dice, there are six ways for the dice to total 7: 1 & 6, 2 & 5, 3 & 4, 4 & 3, 5 & 2 or 6 & 1. Thus the probability of the total of two throws being 7 is:
P(total 7) = P(1,6) + P(2,5) + P(3,4) + P(4,3) + P(5,2) + P(6,1)
= 6 x (1/6 x 1/6) = 6 x 1/36 = 1/6
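The two totals above can be checked by listing all 36 ordered outcomes of two throws and counting the ones that reach the goal. This is only a minimal sketch; the function name p_total is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def p_total(target, sides=6):
    """Exact probability that two fair dice total `target` (hypothetical helper)."""
    # Every ordered pair (first throw, second throw) is equally likely.
    outcomes = list(product(range(1, sides + 1), repeat=2))
    hits = [o for o in outcomes if sum(o) == target]
    return Fraction(len(hits), len(outcomes))

print(p_total(3))  # two ordered ways: (1,2) and (2,1)
print(p_total(7))  # six ordered ways
```

Counting ordered pairs is the same as summing the 1/36 probabilities of the mutually exclusive alternatives.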
Russian Roulette
Russian Roulette is "played" with a 6-shot revolver with 5 empty chambers and a live cartridge in the sixth chamber. The cylinder is spun at the beginning of the turn, so there is a 1/6 chance that the gun is ready to fire, and a 5/6 chance that the cylinder has stopped at an empty chamber. The player points it at his or her head and pulls the trigger.
If a player has played 3 times, what is the probability that he or she is still alive?
View 1 (the textbook answer):
There is a 5/6 chance of being alive after a turn. The turns are independent, so the probability of being alive after 3 turns is
P(alive after 3 turns) = 5/6 x 5/6 x 5/6 = 125/216 ≈ 0.579
View 2 (the logical, from my perspective, answer):
In order to play the third time, the player survived the first 2 rounds. Therefore the probability is that of surviving the third round alone:
P(alive after 3 turns, given alive after 2) = 5/6 ≈ 0.833
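The two views differ only in what is taken as given. A short sketch, assuming the cylinder is re-spun before every turn as described above (the names below are made up for illustration):

```python
from fractions import Fraction

P_SURVIVE_TURN = Fraction(5, 6)  # assumption: independent spin before each turn

def p_alive_after(turns):
    """Probability of surviving `turns` consecutive independent turns."""
    return P_SURVIVE_TURN ** turns

# View 1: nothing is known in advance; survive all three turns.
view1 = p_alive_after(3)

# View 2: condition on having survived the first two turns.
view2 = p_alive_after(3) / p_alive_after(2)

print(view1, float(view1))
print(view2, float(view2))
```

View 2 is a conditional probability: P(alive after 3) divided by P(alive after 2).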
Poker
Poker is played with a 52 card deck, with 13 different card types in each of 4 suits. We will consider hands of 5 cards, with no replacements. The strength of a hand is determined by special combinations of cards with the following rankings:
1 pair (two-of-a-kind)
2 pairs
3-of-a-kind
straight (five consecutive cards from sequence 2-3-4-5-6-7-8-9-10-J-Q-K-A)
flush (5 cards in same suit)
full house (3-of-a-kind and a pair)
four-of-a-kind
Poker Probability of a Pair
With 5 random cards, a pair can be arrived at by several different paths. It could be the first 2 cards drawn, the first and third, first and fourth, etc. There are 10 alternatives of this sort (the 10 ways to choose which 2 of the 5 draws form the pair). The second factor is that of getting 2 that are the same, and 3 other cards that are all different. Let's work this out for the case of drawing the pair in the first 2 cards.
For the first card, any card will do: P = 52/52
The second card must be one of the 3 remaining that match the first card:
P = 3/51
The third card must be one of the 48 that do not match the pair: P = 48/50
The fourth card must be one of the 44 that do not match the previous (2 of 49 match the pair, 3 of 49 match the 3rd card):
P = 44/49
The fifth card must be one of the 40 that do not match the previous: P = 40/48
The overall probability for this sequence is:
P = 52/52 x 3/51 x 48/50 x 44/49 x 40/48 = (3 x 44 x 40)/(51 x 50 x 49)
= (44 x 4)/(17 x 5 x 49) = 176/4165
What if we were to draw it in a different sequence, say getting the pair on the second and fourth cards?
For the first card, any card will do: P = 52/52
The second card (the first of the pair) must be different: P = 48/51
The third card must be different than both the previous: P = 44/50
The fourth card must match the second: P = 3/49
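The fifth card must not match the previous: P = 40/48
The overall probability has the same factors in a different order:
P = 52/52 x 48/51 x 44/50 x 3/49 x 40/48 = 176/4165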
There are 10 different ways to arrive at exactly one pair. If A, B, C and D are 4 different card types, we can get:
AABCD, ABACD, ABCAD, ABCDA, BAACD,
BACAD, BACDA, BCAAD, BCADA, or BCDAA
Each specific order has a probability of 176/4165, so the overall frequency of drawing a pair (but no better) is:
P = 10 x 176/4165 = 1760/4165 ≈ 0.423
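The card-by-card products above can be reproduced with exact fractions. The sketch below multiplies the per-draw probabilities for two of the orders and cross-checks the total against a direct count of 5-card hands. The helper name draw_product and the counting formula are additions for illustration, not part of the derivation above.

```python
from fractions import Fraction
from math import comb

def draw_product(factors):
    """Multiply per-draw probabilities given as (favourable, remaining) pairs."""
    p = Fraction(1)
    for favourable, remaining in factors:
        p *= Fraction(favourable, remaining)
    return p

# Pair in the first two cards (order AABCD).
p_aabcd = draw_product([(52, 52), (3, 51), (48, 50), (44, 49), (40, 48)])
# Pair on the second and fourth cards (order BACAD).
p_bacad = draw_product([(52, 52), (48, 51), (44, 50), (3, 49), (40, 48)])

orders = comb(5, 2)            # which 2 of the 5 draws hold the pair
p_one_pair = orders * p_aabcd

# Cross-check by counting hands: rank of the pair, its 2 suits,
# 3 other ranks, and one suit for each of those 3 cards.
p_by_counting = Fraction(13 * comb(4, 2) * comb(12, 3) * 4 ** 3, comb(52, 5))

print(p_aabcd, p_bacad, p_one_pair, float(p_one_pair))
```

Both routes give the same fraction, which Python reduces to lowest terms when printing.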
Poker Probability of 3-of-a-Kind
3-of-a-kind is 3 cards of one type, and two cards that do not match the 3, or each other. Let's work this out for the case of completing it in the first 3 cards.
For the first card, any card will do: P = 52/52
The second card must match the first card: P = 3/51
The third card must match the first card: P = 2/50
The fourth card must be different: P = 48/49
The fifth card must be different: P = 44/48
The overall probability for this sequence is:
P = 52/52 x 3/51 x 2/50 x 48/49 x 44/48 = (3 x 2 x 44)/(51 x 50 x 49)
= 44/(17 x 25 x 49) = 44/20825
≈ 0.00211
There are 10 alternative orders of completing it:
AAABC, AABAC, AABCA, ABAAC, ABACA,
ABCAA, BAAAC, BAACA, BACAA, and BCAAA
The probability for arriving at 3-of-a-kind in any order is:
P = 10 x 44/20825 = 440/20825
Poker Probability of a Full House
A full house is 3-of-a-kind and a pair. Let's work this out for the case of drawing the 3-of-a-kind first, then the pair.
For the first card, any card will do: P = 52/52
The second card must match the first card: P = 3/51
The third card must match the first card: P = 2/50
The fourth card must be different: P = 48/49
The fifth card must match the fourth: P = 3/48
The overall probability for this sequence is:
P = 52/52 x 3/51 x 2/50 x 48/49 x 3/48 = (3 x 2 x 3)/(51 x 50 x 49)
= 3/(17 x 25 x 49) [exactly 3/44 the 3-of-a-kind value] = 3/20825
≈ 0.000144
There are 10 alternative orders of completing it:
AAABB, AABAB, AABBA, ABAAB, ABABA,
ABBAA, BAAAB, BAABA, BABAA, and BBAAA
The probability for arriving at a full house in any order is:
P = 10 x 3/20825 = 30/20825
Poker Probability of 4-of-a-Kind
Four-of-a-kind is 4 of the same type of cards. Let's work this out for the case of drawing the 4 of the same type first.
For the first card, any card will do: P = 52/52
The second card must match the first card: P = 3/51
The third card must match the first card: P = 2/50
The fourth card must match the first card: P = 1/49
The fifth card will be different (all of the kind are gone): P = 48/48
The overall probability for this sequence is:
P = 52/52 x 3/51 x 2/50 x 1/49 x 48/48 = (3 x 2 x 1)/(51 x 50 x 49)
= 1/(17 x 25 x 49) [exactly 1/3 the full house value] = 1/20825 ≈ 0.0000480
There are 5 alternative orders of completing it:
AAAAB, AAABA, AABAA, ABAAA and BAAAA
The probability for arriving at 4-of-a-kind in any order is:
P = 5 x 1/20825 = 5/20825 ≈ 0.000240
The low probabilities of exciting hands are one reason why televised poker is most commonly based on 7-card hands!
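The 5-card results for 3-of-a-kind, full house and 4-of-a-kind can be cross-checked by counting unordered hands instead of multiplying draw by draw. The counting formulas below are a standard alternative route, not the method used above, and the dictionary names are made up for illustration.

```python
from fractions import Fraction
from math import comb

TOTAL_HANDS = comb(52, 5)  # number of unordered 5-card hands

hand_counts = {
    # rank of the triple, its 3 suits, 2 other distinct ranks, a suit for each
    "3-of-a-kind": 13 * comb(4, 3) * comb(12, 2) * 4 ** 2,
    # rank and suits of the triple, then rank and suits of the pair
    "full house": 13 * comb(4, 3) * 12 * comb(4, 2),
    # rank of the four matching cards, then any one of the 48 other cards
    "4-of-a-kind": 13 * 48,
}

probabilities = {name: Fraction(n, TOTAL_HANDS) for name, n in hand_counts.items()}

for name, p in probabilities.items():
    print(f"{name}: {p} (about {float(p):.6f})")
```

The printed fractions are in lowest terms, so 440/20825 appears as 88/4165, for example.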
Poker Probability of 4-of-a-Kind in 7 Cards
So, what is the probability of 4-of-a-kind in 7 cards? Let's work this out for the case of drawing the 4-of-a-kind first.
The first 5 cards are as above:
P = 52/52 x 3/51 x 2/50 x 1/49 x 48/48 = 1/20825
The sixth and seventh cards can be anything (you can only use 5 cards, so getting another pair does not change the hand):
P = 47/47 x 46/46
The overall probability for this sequence remains:
P = 52/52 x 3/51 x 2/50 x 1/49 x 48/48 x 47/47 x 46/46 = 1/20825 ≈ 0.0000480
The higher probability comes from the fact that there are now 35 possible orders in which the 4 matching cards can be drawn (not just 5 orders). Thus,
P = 35 x 1/20825 = 35/20825 = 1/595 ≈ 0.00168
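The count of 35 orders is the number of ways to choose which 4 of the 7 draws hold the matching cards. A small sketch comparing the order-based result with a direct count of 7-card hands; the counting route is an added cross-check, not part of the argument above.

```python
from fractions import Fraction
from math import comb

orders = comb(7, 4)                        # placements of the 4 matching cards
p_by_orders = orders * Fraction(1, 20825)

# Cross-check: choose the rank, then any 3 of the other 48 cards.
p_by_counting = Fraction(13 * comb(48, 3), comb(52, 7))

print(orders, p_by_orders, float(p_by_orders))
```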
Poker Probability of a Full House in 7 Cards
Let's work this out for drawing the 3-of-a-kind, then the pair.
The first 5 cards are as above:
P = 52/52 x 3/51 x 2/50 x 48/49 x 3/48 = 3/20825
Now it gets uglier. There are 210 permutations for having both of the remaining cards different from both the 3-of-a-kind, and from the pair, with
P(full house) = 3/20825 x 44/47 x 43/46
But there are another 210 ways in which one of the remaining cards could match the pair (which does not improve the hand):
P(2 3-of-a-kind) = 3/20825 x 2/47 x 44/46
Putting everything together:
P = 210 x 3/20825 x ( 44/47 x 43/46 + 2/47 x 44/46 )
= 210 x 3/20825 x ( 44/47 x 45/46 )
The Monty Hall Problem
You are offered three doors and asked to choose one. One of the two that you did not choose is opened, but this is never the Grand Prize. You are offered the chance to keep your original door, or switch to the other unopened door.
What is the optimal strategy: keep the door you originally chose, switch to the other unopened door, or does it not matter?
Using the optimal strategy, what is the probability of winning the Grand Prize?
Initially,
P(car0) = 1/3, P(goat0) = 2/3
Keep the original door strategy:
P(car) = P(car0) x P(car0→car) + P(goat0) x P(goat0→car) = 1/3 x 1 + 2/3 x 0 = 1/3
P(car) = 1/3
P(goat) = P(car0) x P(car0→goat) + P(goat0) x P(goat0→goat) = 1/3 x 0 + 2/3 x 1 = 2/3
Initially,
P(car0) = 1/3, P(goat0) = 2/3
Switch door strategy:
P(car) = P(car0) x P(car0→car) + P(goat0) x P(goat0→car) = 1/3 x 0 + 2/3 x 1 = 2/3
P(car) = 2/3
P(goat) = P(car0) x P(car0→goat) + P(goat0) x P(goat0→goat) = 1/3 x 1 + 2/3 x 0 = 1/3
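Both strategies can be checked by enumerating every equally likely combination of car position and initial pick. This is a minimal sketch, assuming the host always opens a goat door among the two you did not choose; p_win is a made-up name.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def p_win(switch):
    """Exact probability of winning the car for the keep or switch strategy."""
    cases = list(product(DOORS, DOORS))  # (car door, initial pick)
    wins = 0
    for car, pick in cases:
        if switch:
            # The host removes a goat from the other two doors, so switching
            # wins exactly when the initial pick was a goat.
            wins += pick != car
        else:
            # Keeping wins exactly when the initial pick was the car.
            wins += pick == car
    return Fraction(wins, len(cases))

print("keep:  ", p_win(switch=False))
print("switch:", p_win(switch=True))
```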
Probabilities of Nucleotide Sequences
DNA sequences have a 4-letter alphabet: A, C, G and T.
RNA sequences have a 4-letter alphabet: A, C, G and U.
The probability that a given six nucleotide DNA sequence is GAATTC (the EcoRI endonuclease recognition sequence) is
P(GAATTC) = P(G) x P(A) x P(A) x P(T) x P(T) x P(C)
If each of the 4 nucleotides is equally likely, then
P(A) = P(C) = P(G) = P(T) = 1/4
so
P(GAATTC) = 1/4 x 1/4 x 1/4 x 1/4 x 1/4 x 1/4 = 1/4096
What if the probabilities of the nucleotides are not equal? What if
P(A) = P(T) = 1/6
P(C) = P(G) = 1/3
then
P(GAATTC) = 1/3 x 1/6 x 1/6 x 1/6 x 1/6 x 1/3 = 1/11664 ≈ 0.000086
So, the base composition of the DNA matters in restriction site frequencies.
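The effect of base composition is easy to explore with a small function that multiplies the per-base probabilities along a sequence. A sketch, assuming each position is independent of its neighbours as in the products above; the names site_probability, uniform and at_poor are made up for illustration.

```python
from fractions import Fraction

def site_probability(seq, base_probs):
    """Probability of `seq` when each base is drawn independently from base_probs."""
    p = Fraction(1)
    for base in seq:
        p *= base_probs[base]
    return p

uniform = {base: Fraction(1, 4) for base in "ACGT"}
at_poor = {"A": Fraction(1, 6), "T": Fraction(1, 6),
           "C": Fraction(1, 3), "G": Fraction(1, 3)}

print(site_probability("GAATTC", uniform))
print(site_probability("GAATTC", at_poor))
```

Any other recognition sequence or base composition can be substituted, provided the four probabilities sum to 1.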
If we have a 10,240 basepair circular plasmid with equal frequencies of each of the 4 nucleotides, what is the probability that the plasmid is cleaved (one or more times) by EcoRI?
We approach this most easily by computing the probability that it is not cleaved:
P(cleaved one or more times) + P(not cleaved anywhere) = 1
So,
P(cleaved one or more times) = 1 – P(not cleaved anywhere)
For the plasmid to not be cleaved anywhere, it is necessary that it is not cleaved at position 1, and not cleaved at position 2, ... and not cleaved at position 10,240. The probability that a given site is not cleaved is 1 minus the probability that it is cleaved at the site:
P(not cleaved at position 1) = 1 – 1/4096 = 4095/4096
Or more generally, for any position i:
P(not cleaved at position i) = 1 – 1/4096 = 4095/4096
The probability that it is not cleaved at any of the 10,240 positions is the product of the probabilities for not being cut at each individual position:
P(plasmid not cleaved anywhere) = (4095/4096)^10240 ≈ 0.082
So,
P(cleaved one or more times) = 1 – 0.082 ≈ 0.918
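The arithmetic for the plasmid can be reproduced in a few lines. A sketch, assuming (as the product above does) that the positions are treated as independent; the variable names are made up for illustration.

```python
p_site = 1 / 4 ** 6        # chance that a given position starts GAATTC
n_positions = 10_240       # circular plasmid: every position can start a site

p_not_cleaved = (1 - p_site) ** n_positions
p_cleaved = 1 - p_not_cleaved

print(f"P(not cleaved anywhere)       = {p_not_cleaved:.4f}")
print(f"P(cleaved one or more times)  = {p_cleaved:.4f}")
```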
What if we had approached the above question as:
P(cleaved somewhere) = number_of_sites x P(cleaved at site i)
This formulation is
P(cleaved at site 1) + P(cleaved at site 2) + ... + P(cleaved at last site)
This would only make sense as a way of combining probabilities if the events were mutually exclusive solutions to the problem, but, in fact, more than one site can be cleaved.
If we were to use this formula we would get:
P = 10240 x 1/4096 = 2.5
This cannot be a probability; a probability cannot be greater than 1 (or less than 0).
If the above expression is not the probability of a cleavage, what is it? It is the expected number of cleavages:
E = 10240 x 1/4096 = 2.5
That is, if we had a large number of plasmids of this size and base composition, the number of cleavages per plasmid, averaged over all of the plasmids, would be 2.5.
Any individual plasmid would be cut a specific number of times (0, 1, 2, 3, ...), but the average need not be an integer.
The expected number of events is fundamental to the Poisson distribution, where it is usually called µ (i.e., the Greek letter mu). By the way, when the expected number is very small (<<1), it is a good approximation of the probability of one or more events occurring; but assuming that this is always true will get you into trouble.
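To see where the approximation holds and where it fails, compare the expected number µ with the probability of one or more events. The sketch below uses the standard Poisson result P(0 events) = e^(-µ), which is brought in here as an outside fact rather than derived above; p_at_least_one is a made-up name.

```python
from math import exp

def p_at_least_one(mu):
    """Poisson probability of one or more events when the expected number is mu."""
    return 1 - exp(-mu)

for mu in (0.001, 0.01, 0.1, 1.0, 2.5):
    print(f"mu = {mu:<6} P(one or more) = {p_at_least_one(mu):.5f}")
```

For small µ the two columns nearly agree. At µ = 2.5 the probability is about 0.918, close to the plasmid result and far from 2.5.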