Probability & Probability Distributions
Carolyn J. Anderson EdPsych 580
Probability & Probability Distributions
• Elementary Probability Theory
  • Definitions
  • Rules
  • Bayes Theorem
• Probability Distributions
  • Discrete & continuous variables.
  • Characteristics of distributions.
  • Expectations
Elementary Probability Theory
or
How Likely are the results?
Probabilities arise when sampling individuals
from a population and in experimental situations, because different “trials” or replications of the
same experiment usually result in different outcomes.
Statistical Experiment
A (simple) statistical experiment is some well-defined act or process (including sampling) that leads to one well-defined outcome.
• It’s repeatable (in principle).
• There is uncertainty about the results.
• Uncertainty is modeled by assigning probabilities to the outcomes.
Examples of Statistical Experiments
Well defined, repeatable, uncertainty, model by probabilities?
• Flip a coin 5 times & record the number of heads.
• Count the number of blue M&M’s in a 9 oz. package.
• Roll two dice & record the total number of spots.
• Ask people who they intend to vote for in the
Statistical Experiments
A statistical experiment may be
• Real (it can actually be done).
Definition: Probability
The probability of an event is the proportion of
times that the event occurs in a large number of trials of the experiment.
Example
• Experiment: Draw a card from a standard
deck of 52.
• Sample space: The set of all possible distinct
outcomes, S (e.g., 52 cards).
• Elementary event or sample point: a member of the sample space (e.g., the ace of hearts).
• Event (or event class): any set of elementary
events. e.g., Suit (Hearts), Color (Red), or Number (Ace).
Example (continued)
$$P(\text{Ace}) = \frac{\text{number of aces}}{\text{number of cards}} = \frac{4}{52} = .0769$$
Notes:
• Elementary events are equally likely
• Denote events by roman letters (e.g., A, B,
etc)
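The card example can be checked two ways: by counting equally likely elementary events, and by the long-run proportion from the definition of probability. The sketch below is illustrative only; the names `DECK`, `exact_probability`, and `relative_frequency` are made up for this example, and the simulation uses a fixed seed so that it is repeatable.

```python
from fractions import Fraction
import random

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
# Sample space S: the 52 equally likely elementary events.
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def exact_probability(event):
    """Count the elementary events in the event and divide by |S|."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def relative_frequency(event, trials, seed=0):
    """Proportion of simulated draws (with replacement) in which the event occurs."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if event(rng.choice(DECK)))
    return hits / trials


def is_ace(card):
    return card[0] == "A"


p_exact = exact_probability(is_ace)          # Fraction(1, 13), i.e. 4/52
p_sim = relative_frequency(is_ace, 100_000)  # should be close to .0769

print(p_exact, round(float(p_exact), 4))
print(p_sim)
```

With a large number of trials the simulated proportion should settle near the counted value, which is the sense in which probability is a long-run proportion.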
More Definitions
• Joint event: two (or more) events considered at the same time, e.g., A = heads on penny, B = heads on quarter, and the joint event is heads on both coins.
• Intersection: (A ∩ B) = A and B occur at the
same time.
• Union: (A ∪ B) = A or B occurs, i.e., one of the following:
  • Only A occurs.
  • Only B occurs.
  • A and B occur.
More Definitions
• Complement of an event is that the event did not occur: Ā ≡ not A. e.g., if A = red card, then Ā is a black card (not a red card).
• Mutually exclusive events are events that
cannot occur at the same time. Events have no elementary events in common. e.g., A = heart and B = club.
• Mutually exclusive and exhaustive events are a complete partition of the sample space. e.g.,
  • Suits (hearts, diamonds, clubs, spades)
Formal Definition of Probability
Probability is a number assigned to each and every member in the sample space. Denote by
P (·).
A probability function is a rule of correspondence that associates with each event A in the sample space S a number P (A) such that
• 0 ≤ P (A) ≤ 1, for any event A.
• The probabilities of all distinct elementary events sum to 1.
• If A and B are mutually exclusive events, then P (A ∪ B) = P (A) + P (B).
Example
Let A = number card (i.e., 2–10), B = face card (i.e., J, Q, K), and C = Ace.
• Probabilities of events:
  P (A) = 9(4)/52 = 36/52 = .6923
  P (B) = 3(4)/52 = 12/52 = .2308
  P (C) = 1(4)/52 = 4/52 = .0769
• P (A) + P (B) + P (C) = 1
• P (A ∪ B) = P (A) + P (B) = .6923 + .2308 = .9231 = 48/52.
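As a rough check of this example, the three events can be written as sets of cards, so that "mutually exclusive" means an empty intersection and "exhaustive" means the union is the whole deck. This is a sketch; the helper `P` and the set names are illustrative choices.

```python
from fractions import Fraction

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = {(rank, suit) for rank in RANKS for suit in SUITS}

# Events are sets of elementary events (cards).
A = {card for card in DECK if card[0] in {"2", "3", "4", "5", "6", "7", "8", "9", "10"}}
B = {card for card in DECK if card[0] in {"J", "Q", "K"}}
C = {card for card in DECK if card[0] == "A"}


def P(event):
    """Probability of an event when all cards are equally likely."""
    return Fraction(len(event), len(DECK))


# Mutually exclusive: no elementary events in common.
mutually_exclusive = not (A & B) and not (A & C) and not (B & C)
# Exhaustive: together the events cover the sample space.
exhaustive = (A | B | C) == DECK

print(round(float(P(A)), 4), round(float(P(B)), 4), round(float(P(C)), 4))
print(P(A) + P(B) + P(C))      # 1
print(P(A | B), P(A) + P(B))   # both 12/13, i.e. 48/52
```

Because A and B share no cards, the probability of the union equals the sum of the two probabilities.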
Another Example
• Experiment: Randomly select a third grade
student from a Unit 4 public school in Champaign county.
• Sample Space: All 3rd grade students at Unit
4 public schools.
• Elementary Event: A characteristic of the
child. e.g., brown hair, age (in months),
weight, gender, the response “very much” to question “How much do you like school?”
Venn Diagram
[Figure: Venn diagram of events A, B, and C in sample space S, with the intersections A ∩ B and A ∩ C marked.]
Addition Rules
• Rule 1: If 2 events, B & C, are mutually exclusive (i.e., no overlap), then the probability that one or both occur is
P (B or C) = P (B ∪ C) = P (B) + P (C)
• Rule 2: For any 2 events, A & B, the probability that one or both occur is
P (A or B) = P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Example: Teachers by Region
The population consists of all elementary and secondary teachers in US in 1969.
                          Level
Region           Elementary   Secondary       Total
Northeast           273,687     244,013     517,700
North Central       314,614     265,848     580,462
South               240,028     183,180     423,208
West                279,445     213,021     492,466
Total             1,107,774     906,062   2,013,836
Example: Teachers by Region
• Elementary event (or “sample point”) is a
teacher.
• Event is any set of teachers. (e.g., region,
level, or combination).
• Simple Experiment: Select 1 teacher at
random,
$$P(\text{elementary}) = \frac{1{,}107{,}774}{2{,}013{,}836} = .55$$
Example: Addition Rules
Rule 1: Events are an elementary teacher from the South & an elementary teacher from the West,
$$P(\text{elementary in S or W}) = P(\text{elementary, South}) + P(\text{elementary, West}) = \frac{240{,}028}{2{,}013{,}836} + \frac{279{,}445}{2{,}013{,}836} = .258$$
Example: Addition Rules (continued)
Rule 2: Events are an elementary teacher and a teacher from the South,
$$P(\text{elementary or from South}) = P(\text{elementary}) + P(\text{South}) - P(\text{elementary and South}) = \frac{1{,}107{,}774}{2{,}013{,}836} + \frac{423{,}208}{2{,}013{,}836} - \frac{240{,}028}{2{,}013{,}836} = .64$$
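Both addition-rule examples can be reproduced from the counts in the teacher table. The sketch below uses only the elementary counts and the regional totals as given; the dictionary and variable names are illustrative.

```python
from fractions import Fraction

# Counts from the teachers-by-region table.
ELEMENTARY = {
    "Northeast": 273_687,
    "North Central": 314_614,
    "South": 240_028,
    "West": 279_445,
}
REGION_TOTAL = {
    "Northeast": 517_700,
    "North Central": 580_462,
    "South": 423_208,
    "West": 492_466,
}
N = sum(REGION_TOTAL.values())  # population size, 2,013,836

p_elementary = Fraction(sum(ELEMENTARY.values()), N)
p_south = Fraction(REGION_TOTAL["South"], N)
p_elem_and_south = Fraction(ELEMENTARY["South"], N)
p_elem_and_west = Fraction(ELEMENTARY["West"], N)

# Rule 1: the two events cannot both occur for one teacher, so simply add.
rule1 = p_elem_and_south + p_elem_and_west

# Rule 2: the events overlap, so subtract the intersection once.
rule2 = p_elementary + p_south - p_elem_and_south

print(round(float(p_elementary), 2))  # .55
print(round(float(rule1), 3))         # .258
print(round(float(rule2), 2))         # .64
```

Subtracting the intersection in Rule 2 keeps elementary teachers from the South from being counted twice.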
Conditional Probability
• Conditional Probability equals the probability
of an event A given that we know that event B has occurred.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A, B)}{P(B)}$$
• Example: What is the probability that a
teacher is from the South given that he/she is an elementary school teacher?
Example: Answer
$$P(\text{South} \mid \text{elementary}) = \frac{P(\text{elementary and South})}{P(\text{elementary})} = \frac{240{,}028/2{,}013{,}836}{1{,}107{,}774/2{,}013{,}836} = \frac{240{,}028}{1{,}107{,}774} = .217$$
Example (continued)
• Note that
$$P(\text{South}) = \frac{423{,}208}{2{,}013{,}836} = .210$$
• Knowing that a teacher is an elementary school teacher changes the chance that the teacher is also from the South,
$$P(\text{South} \mid \text{elementary}) \neq P(\text{South}), \qquad .217 \neq .210$$
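The conditional probability in this example, and its comparison with the unconditional probability, can be sketched as follows. The counts are the ones used in the example; the function name `conditional` is an illustrative choice.

```python
from fractions import Fraction

N = 2_013_836                     # all teachers
N_ELEMENTARY = 1_107_774          # elementary teachers
N_SOUTH = 423_208                 # teachers from the South
N_ELEMENTARY_AND_SOUTH = 240_028  # elementary teachers from the South


def conditional(p_joint, p_given):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if p_given == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_joint / p_given


p_south = Fraction(N_SOUTH, N)
p_south_given_elem = conditional(
    Fraction(N_ELEMENTARY_AND_SOUTH, N),
    Fraction(N_ELEMENTARY, N),
)

print(round(float(p_south_given_elem), 3))  # .217
print(round(float(p_south), 3))             # .210
print(p_south_given_elem == p_south)        # False: the two probabilities differ
```

The population size cancels in the ratio, which is why the answer reduces to 240,028 / 1,107,774.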
Bayes Theorem
• P (A ∩ B) = P (A, B) = P (A|B)P (B)
• P (A ∩ B) = P (A, B) = P (B|A)P (A)
• Bayes Theorem:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
• Example: Monty Hall problem.
Monty Hall Problem
• Start of game: the probability that the big prize (e.g., a car) is behind each door is
$$P(A) = \tfrac{1}{3}, \qquad P(B) = \tfrac{1}{3}, \qquad P(C) = \tfrac{1}{3}$$
• You pick door A.
• Monty opens door B and gives you the chance to switch from door A to door C. What should you do?
Monty Hall Problem (continued)
• Choose the door that has the larger conditional probability, i.e., P (A|Monty opened B) or P (C|Monty opened B).
• Use Bayes Theorem. . . so we need
  • Conditional probabilities that Monty opens door B given the car is behind A, behind B, and behind C.
  • Joint probabilities that Monty chooses door B and the car is behind door A, door B, and door C.
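Monty Hall Problem (continued)
Conditional probabilities that Monty opens door B (written $B_{Monty}$):
$$P(\text{Monty opens B} \mid \text{car behind A}) = P(B_{Monty} \mid A) = \tfrac{1}{2}$$
$$P(\text{Monty opens B} \mid \text{car behind B}) = P(B_{Monty} \mid B) = 0$$
$$P(\text{Monty opens B} \mid \text{car behind C}) = P(B_{Monty} \mid C) = 1$$
Joint probabilities:
$$P(B_{Monty}, A) = P(B_{Monty} \mid A)\,P(A) = \tfrac{1}{2} \times \tfrac{1}{3} = \tfrac{1}{6}$$
$$P(B_{Monty}, B) = P(B_{Monty} \mid B)\,P(B) = 0 \times \tfrac{1}{3} = 0$$
$$P(B_{Monty}, C) = P(B_{Monty} \mid C)\,P(C) = 1 \times \tfrac{1}{3} = \tfrac{1}{3}$$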
Monty Hall Problem (continued)
(Unconditional) probability that Monty opens door B:
$$P(B_{Monty}) = P(B_{Monty}, A) + P(B_{Monty}, B) + P(B_{Monty}, C) = \tfrac{1}{6} + 0 + \tfrac{1}{3} = \tfrac{1}{2}$$
Apply Bayes Theorem. . .
$$P(A \mid B_{Monty}) = \frac{P(A)}{P(B_{Monty})}\,P(B_{Monty} \mid A) = \frac{1/3}{1/2} \times \frac{1}{2} = \frac{1}{3}$$
$$P(C \mid B_{Monty}) = \frac{P(C)}{P(B_{Monty})}\,P(B_{Monty} \mid C) = \frac{1/3}{1/2} \times 1 = \frac{2}{3}$$
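The Bayes Theorem calculation for the Monty Hall problem can be traced step by step with exact fractions. This is a sketch of the same arithmetic, assuming (as in the example) that you picked door A and Monty opened door B; the dictionary names are illustrative.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Likelihood: P(Monty opens B | car behind door), given that you picked door A.
# Monty never opens your door or the door hiding the car.
likelihood = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}

# Joint probabilities: P(Monty opens B, car behind door).
joint = {door: likelihood[door] * prior[door] for door in prior}

# Unconditional probability that Monty opens door B.
evidence = sum(joint.values())

# Bayes Theorem: P(car behind door | Monty opens B).
posterior = {door: joint[door] / evidence for door in prior}

print(evidence)        # 1/2
print(posterior["A"])  # 1/3 (stay)
print(posterior["C"])  # 2/3 (switch)
```

Under these assumptions, switching to door C doubles the chance of winning compared with staying at door A.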
Monty Hall Problem (continued)
• I got this example from: Gill, J. (2002). Bayesian Methods for the Social and Behavioral Sciences. Chapman & Hall.
• Other sources on The Monty Hall Problem.
• History.
Independence
• If the conditional and unconditional probabilities are identical, then the two events are independent.
• For independent events,
  • P (A|B) = P (A)
  • P (B|A) = P (B)
  • P (A and B) = P (A ∩ B) = P (A)P (B) ⟹
Conditional Independence (continued)
• Conditional probabilities and conditional independence: two very important concepts.
  • Conditional probability and regression.
  • Conditional independence: explaining dependency (e.g., classic example: Cal graduate admissions)
• Demonstration: Toss penny and quarter and
Are Events Conditionally Independent?
• Physical considerations: physically unrelated events.
Independent: Physical Considerations
Examples:
• Toss a penny & a quarter:
P (penny = head & quarter = head) =
P (penny = head)P (quarter = head) = (.5)(.5) = .25
• Roll two dice:
P (die1 = 5 & die2 = 6) =
Independent: Physical Considerations
Examples:
• Administer a test that measures attitude
toward gun control to 2 randomly drawn adults in the US population.
• P (Score1 = 50 and Score2 = 55) = P (Score1 = 50)P (Score2 = 55)
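For the two-dice example, independence can be checked by enumerating the 36 equally likely outcomes and comparing the joint probability with the product of the two separate probabilities. This is an illustrative sketch that assumes fair dice; the second comparison (first die versus the total) is added only as a contrast and does not come from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (die1, die2), assumed equally likely.
OUTCOMES = list(product(range(1, 7), repeat=2))


def P(event):
    """Probability of an event given as a predicate on (die1, die2)."""
    favorable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favorable, len(OUTCOMES))


p_die1_is_5 = P(lambda o: o[0] == 5)
p_die2_is_6 = P(lambda o: o[1] == 6)
p_both = P(lambda o: o[0] == 5 and o[1] == 6)

# Independent: joint probability equals the product of the two probabilities.
dice_independent = p_both == p_die1_is_5 * p_die2_is_6

# Contrast: the first die and the total are physically related.
p_total_is_11 = P(lambda o: sum(o) == 11)
p_die1_5_and_total_11 = P(lambda o: o[0] == 5 and sum(o) == 11)
total_independent = p_die1_5_and_total_11 == p_die1_is_5 * p_total_is_11

print(p_both, p_die1_is_5 * p_die2_is_6)  # 1/36 1/36
print(dice_independent, total_independent)  # True False
```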
Independence: Deduction
Whether events are independent can sometimes be deduced from observations, e.g., Mendel’s experiments.
• Mendel postulated the existence of genes that are recessive and dominant.
• Experiment: Bred pure strains of yellow peas
& green peas.
• 1st generation: Cross the yellow and green
peas.
Mendel’s Experiments (continued)
• Results: About 75% yellow and about 25% green.
• Results were very regular and replicable (with
other traits and plants).
• Part of explanation involves assumption of
Mendel’s Experiments: Explanation
• There exist genes which, when paired up, control seed color according to rules:
  y/g → yellow    g/y → yellow
  y/y → yellow    g/g → green
• 1st generation: Pure yellow strain (y/y) could
only give a y gene and pure green strain (g/g) could only give a g gene.
• 2nd generation: About 1/2 of parent plants contribute a y, about 1/2 contribute a g, and pairing is random (independent).
Mendel’s Experiments: Explanation
                      Maternal
                   y          g
Paternal   y      y/y        y/g       (.50)
                 (.25)      (.25)
           g      g/y        g/g       (.50)
                 (.25)      (.25)
                 (.50)      (.50)     (1.00)
• Probability of each cell = (.50)(.50) = .25; this is the independence part of the theory.
• Probability of phenotype:
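The phenotype probabilities implied by the table can be tallied by multiplying the two marginal probabilities for each cell (the independence assumption) and then adding the cells that give the same seed color. A minimal sketch, with illustrative names:

```python
from fractions import Fraction
from itertools import product

# Each parent contributes y or g with probability 1/2.
GAMETE_PROB = {"y": Fraction(1, 2), "g": Fraction(1, 2)}


def phenotype(paternal, maternal):
    """Yellow is dominant: only the g/g pairing gives a green seed."""
    return "green" if (paternal, maternal) == ("g", "g") else "yellow"


phenotype_prob = {"yellow": Fraction(0), "green": Fraction(0)}
for paternal, maternal in product(GAMETE_PROB, repeat=2):
    # Independence: cell probability is the product of the marginals.
    cell = GAMETE_PROB[paternal] * GAMETE_PROB[maternal]
    phenotype_prob[phenotype(paternal, maternal)] += cell

print(phenotype_prob["yellow"])  # 3/4
print(phenotype_prob["green"])   # 1/4
```

These values match the observed results of about 75% yellow and about 25% green.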
Mendel’s Experiments: Explanation
• Mendel’s theory is an example where an abstract probability theory is applied to observed data.
• The postulated probability distribution of seed
Basic Logic
• Assume some things to be true (e.g., Mendel’s theory).
• Make deductions about what should be true
in the long-run (e.g., 2nd generation: 75% yellow and 25% green).
• It’s physically impossible to do all possible
experiments, so we do some (“sample”).
• By chance the results will differ from what
Probability Distributions
From Hays:
• “Any statement of a function associating each
of a set of mutually exclusive and exhaustive events with its probability is a probability
distribution”
• “Let X represent a function that associates a
Real number with each and every elementary event in some sample space S. Then X is
called a random variable on the sample space S.”
Random Variables
• If a random variable can only equal a finite number of values, it is a discrete random variable. Its probability distribution is known as a “probability mass function”.
• If a random variable can equal an infinite (or really, really large) number of values, then it is a continuous random variable. Its probability distribution is known as a “probability density function”.
Discrete Random Variables
From Mendel’s theory, assign each event to a real number (arbitrary):
$$Y = \begin{cases} 1 & \text{if yellow} \\ 0 & \text{if green} \end{cases}$$
Probability Mass Function:
Lottery Spinner
Color       Y     P (Y )
Yellow   −100      .10
Blue       −5      .20
Red         0      .50
Green      10      .10
Tan       100      .10
Continuous Random Variables
• When a numerical variable is continuous, its probability distribution is represented by a curve known as a “probability density function” or just p.d.f.
• Denote a p.d.f by f (y).
• P (x1 ≤ Y ≤ x2) = area under the curve between x1 and x2.
Probability = area under curve.
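To make "probability = area under curve" concrete, the sketch below approximates the area under a p.d.f. between two points with the trapezoid rule. The exponential density and its rate are hypothetical choices made only for illustration (the slides do not specify a density); it was picked because its exact area is known and can be compared with the numerical answer.

```python
import math

RATE = 0.5  # hypothetical rate parameter, chosen only for this illustration


def pdf(y):
    """Exponential density f(y) = RATE * exp(-RATE * y) for y >= 0."""
    return RATE * math.exp(-RATE * y) if y >= 0 else 0.0


def area_under_curve(f, x1, x2, steps=10_000):
    """Trapezoid-rule approximation of the area under f between x1 and x2."""
    width = (x2 - x1) / steps
    total = 0.5 * (f(x1) + f(x2))
    for i in range(1, steps):
        total += f(x1 + i * width)
    return total * width


# P(1 <= Y <= 3) as an area, compared with the closed-form answer.
p_numeric = area_under_curve(pdf, 1.0, 3.0)
p_exact = math.exp(-RATE * 1.0) - math.exp(-RATE * 3.0)

# The total area under a p.d.f. is 1 (approximated here on a long interval).
total_area = area_under_curve(pdf, 0.0, 60.0)

print(round(p_numeric, 4), round(p_exact, 4))
print(round(total_area, 3))
```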
Continuous Random Variable
The event is how many miles a randomly
selected graduate student attending UIUC is from “home.”
Continuous Random Variable
Probability that a graduate student attending UIUC is 2,000 or more miles from “home”
Continuous Random Variables
The event is temperature outside the education building on January 27th.
Examples of p.d.f.’s
Characteristics of Distributions
• Discrete or continuous
• Shape
• Central tendency
Expected Value
If you played this game, what would you expect to win or lose?
Color       Y     P (Y )
Yellow   −100      .10
Blue       −5      .20
Red         0      .50
Green      10      .10
Tan       100      .10
µY = E(Y )
Expectations are Means
• For a discrete random variable,
$$E[Y] = \mu_y \equiv \sum_{i=1}^{n} y_i P(y_i)$$
• For continuous variables,
$$E[Y] = \mu_y \equiv \int y\, f(y)\, dy$$
• Variance is the mean squared deviation,
$$\sigma_y^2 = E[(y - \mu_y)^2] = E[y^2 - 2y\mu_y + \mu_y^2]$$
Expectations are Means (continued)
Example: The variance of the lottery spinner:
$$\sigma^2 = E[(y - \mu_y)^2] = \sum_{i=1}^{5} (y_i - \mu)^2 P(y_i) = .1(-100 - 0)^2 + .2(-5 - 0)^2 + .5(0 - 0)^2 + .1(10 - 0)^2 + .1(100 - 0)^2 = 2{,}015$$
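The mean and variance of the lottery spinner can be computed directly from its probability mass function. The sketch below uses exact fractions so that the arithmetic matches the hand calculation; the names `SPINNER`, `mean`, and `variance` are illustrative.

```python
from fractions import Fraction

# Lottery spinner: (color, payoff y, probability P(y)).
SPINNER = [
    ("Yellow", -100, Fraction(10, 100)),
    ("Blue", -5, Fraction(20, 100)),
    ("Red", 0, Fraction(50, 100)),
    ("Green", 10, Fraction(10, 100)),
    ("Tan", 100, Fraction(10, 100)),
]


def mean(pmf):
    """E(Y): sum of each value times its probability."""
    return sum(y * p for _, y, p in pmf)


def variance(pmf):
    """E[(Y - mu)^2]: probability-weighted mean squared deviation."""
    mu = mean(pmf)
    return sum((y - mu) ** 2 * p for _, y, p in pmf)


total_probability = sum(p for _, _, p in SPINNER)
mu_y = mean(SPINNER)
sigma2_y = variance(SPINNER)

print(total_probability)  # 1
print(mu_y)               # 0
print(sigma2_y)           # 2015
```

An expected value of 0 means that, in the long run, a player of this game neither wins nor loses on average.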
The Algebra of Expectations
Why? We don’t have to deal with calculus & it’s used a lot in statistics. From Hays Appendix B,
• Rule 1: If a is a constant, then E(a) = a
• Rule 2: If a is a constant real number and Y is a random variable with expectation E(Y ), then
E(aY ) = aE(Y )
The Algebra of Expectations
• Rule 3: If a is a constant real number and Y is
a random variable with expectation E(Y ), then
E(Y + a) = E(Y ) + a
• Rule 4: If X and Y are random variables with expectations E(X) and E(Y ), respectively, then
E(X + Y ) = E(X) + E(Y )
The Algebra of Expectations
• Rule 5: Given a finite number of random
variables, the expectation of the sum of those variables is the sum of their individual
expectations, e.g.
E(X + Y + Z) = E(X) + E(Y ) + E(Z)
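The expectation rules can be checked numerically on a small joint distribution. In the sketch below the joint probabilities and the constant are hypothetical values made up for illustration, and X and Y are deliberately dependent, to show that the sum rule does not require independence. The scaling identity E(aY) = aE(Y) is included as a standard fact.

```python
from fractions import Fraction

# Hypothetical joint distribution of (X, Y); X and Y are NOT independent here.
JOINT = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(4, 10),
}
A = 3  # an arbitrary constant


def expect(g):
    """E[g(X, Y)]: sum of g(x, y) times the joint probability."""
    return sum(g(x, y) * p for (x, y), p in JOINT.items())


e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)

e_constant = expect(lambda x, y: A)     # expectation of a constant
e_scaled = expect(lambda x, y: A * y)   # E(aY)
e_shifted = expect(lambda x, y: y + A)  # E(Y + a)
e_sum = expect(lambda x, y: x + y)      # E(X + Y)

print(e_constant == A)
print(e_scaled == A * e_y)
print(e_shifted == e_y + A)
print(e_sum == e_x + e_y)
```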
Variances:
• Rule 6: If a is a constant and if Y is a random variable with variance $\sigma_y^2$, then the random variable (Y + a) has variance $\sigma_y^2$.
The Algebra of Expectations
• Rule 7: If a is a constant and if Y is a random variable with variance $\sigma_y^2$, then the random variable (aY ) has variance $a^2\sigma_y^2$.
• Rule 8: If X and Y are independent random variables with variances $\sigma_x^2$ and $\sigma_y^2$, then the variance of X + Y is
$$\sigma_{(x+y)}^2 = \sigma_x^2 + \sigma_y^2$$
• What about the variance of (X − Y )?
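Rules 7 and 8, and the question about X − Y, can be explored with two small independent random variables. The two probability mass functions and the constant below are hypothetical, chosen only so that the arithmetic is easy to follow; independence is built in by multiplying the marginal probabilities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal distributions of two independent random variables.
PMF_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
PMF_Y = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
A = 3  # an arbitrary constant


def variance(pmf):
    """Mean squared deviation of a discrete distribution {value: probability}."""
    mu = sum(value * p for value, p in pmf.items())
    return sum((value - mu) ** 2 * p for value, p in pmf.items())


def distribution_of(g, pmf_x, pmf_y):
    """Distribution of g(X, Y) when X and Y are independent."""
    result = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        value = g(x, y)
        result[value] = result.get(value, Fraction(0)) + px * py
    return result


var_x = variance(PMF_X)
var_y = variance(PMF_Y)

var_scaled = variance({A * y: p for y, p in PMF_Y.items()})               # Rule 7
var_sum = variance(distribution_of(lambda x, y: x + y, PMF_X, PMF_Y))     # Rule 8
var_diff = variance(distribution_of(lambda x, y: x - y, PMF_X, PMF_Y))    # X - Y

print(var_scaled == A ** 2 * var_y)
print(var_sum == var_x + var_y)
print(var_diff == var_x + var_y)
```

In this example the variance of the difference equals the sum of the variances, not their difference, which is the usual result for independent variables.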
The Algebra of Expectations
Independence
• Rule 9a: Given random variables X and Y with expectations E(X) and E(Y ), respectively, if X and Y are independent, then
E(XY ) = E(X)E(Y )
• Rule 9b: If E(XY ) ≠ E(X)E(Y ), the