Statistics For Social Sciences – MATH1208
Unit 6 – Probability Distribution
Random Variable
A random variable, X, is a quantitative
variable that has values that vary according to the rules of probability. A random
variable is also known as a chance variable.
In the experiment of rolling a single die and recording the number of spots on the top face, the sample space is S = {1, 2, 3, 4, 5, 6}. The number of spots on the top face of the die is a random variable.
In the experiment of tossing two coins, the sample space is S = {HH, HT, TH, TT}. This experiment does not directly involve a random variable because the possible outcomes are not numerical. The outcomes can be described numerically by defining a random variable for the experiment. We could let X = the number of heads that appear on the two coins; then the possible values of X are 0, 1 and 2.
Probability Distribution of a Discrete Random Variable
The probability distribution of a random
variable, X, written as p(x), gives the
probability that the random variable will
take on each of its possible values. p(x) =
P(X = x) for all possible values of X.
A discrete probability distribution for a variable X is defined if X can assume a
discrete set of values X1, X2, X3, …, Xk with
respective probabilities p1, p2, p3, …, pk,
where p1 + p2 + p3 + … + pk = 1. The
function p(X), which has the respective values p1, p2, p3, …, pk for X = X1, X2, X3, …, Xk,
is called the probability function, or frequency function, of X. Since X can assume certain values with given
probabilities, it is often called a discrete random variable.
A discrete probability distribution for a
variable X is such that \(\sum p(x) = 1\).
Probability Notation Summary
Find the probability that X takes on a value that is …    Notation
at least x                                                P(X ≥ x)
more than x                                               P(X > x)
at most x                                                 P(X ≤ x)
less than x                                               P(X < x)
between x1 and x2 inclusive                               P(x1 ≤ X ≤ x2)
Example
In the experiment of tossing two coins, X = the number of heads is a random variable that can take on three values: 0, 1 and 2.
a. Find the probability distribution of X,
which is the number of heads in tossing two coins.
The sample space is S = {HH, TH, HT, TT}
The event that no head appears is TT, that is, P(X = 0) = p(0) = 1/4.
The event that one head appears is HT or
TH, that is, P(X = 1) = p(1) = 2/4 = 1/2.
The event that two heads appear is HH, that
is, P(X = 2) = p(2) = 1/4.
b. Construct a table to show the probability distribution of your results in part a.
x       0     1     2
p(x)    1/4   1/2   1/4
Note: p(0) + p(1) + p(2) = 1/4 + 1/2 + 1/4 = 1
c. Which is the most frequent number of
heads? Answer: 1 head
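The distribution in this example can also be checked by listing the sample space and counting heads. The sketch below is only an illustration: it assumes fair coins, and the function name `heads_distribution` is made up for this note rather than taken from the course material.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=2):
    """Return {x: P(X = x)}, where X is the number of heads in n_coins fair tosses."""
    # Sample space: every ordered sequence of H and T, e.g. ('H', 'T').
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many outcomes give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    # Each outcome is equally likely, so p(x) = (outcomes with x heads) / (all outcomes).
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


dist = heads_distribution(2)
for x, p in dist.items():
    print(x, p)
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on rounding.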
Expectation
If p is the probability that a person will
receive a sum of money S, the mathematical expectation (or expectation) is defined as p × S = pS. For example, if the probability that a
man wins a $10 prize is 1/5, then his
expectation is 1/5 × $10 = $2.
If X denotes a discrete random variable that
can assume the values X1, X2, …, Xk with
respective probabilities p1, p2, …, pk, where
p1 + p2 + … + pk = 1, the mathematical
expectation of X, denoted by E(X), is defined as E(X) = p1X1 + p2X2 + … + pkXk.
NOTE: The mean of a distribution, μ, is the expectation, E(X), of the distribution. E(X) = p1X1 + p2X2 + … + pkXk = \(\sum pX\).
The variance of the
distribution, VAR(X) = E[(X − μ)²] = \(\sum p(X - \mu)^2\)
OR
VAR(X) = E[(X − μ)²] = E(X²) – [E(X)]²,
where E(X²) = \(\sum pX^2\) and
E(X) = \(\sum pX\).
STD(X) = \(\sqrt{\mathrm{VAR}(X)}\). That is, STD(X) = \(\sqrt{E[(X - \mu)^2]}\)
= \(\sqrt{\sum p(X - \mu)^2}\)
OR STD(X) = \(\sqrt{E(X^2) - [E(X)]^2}\)
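The expectation, variance and standard deviation formulas can be sketched in a few lines of code. This is a minimal illustration, not part of the course text: the function names are invented here, and the distribution used is the two-coin example (X = number of heads) from earlier in the unit.

```python
from math import sqrt


def expectation(dist):
    """E(X): the sum of p * x over all values x."""
    return sum(p * x for x, p in dist.items())


def variance(dist):
    """VAR(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(p * x**2 for x, p in dist.items())
    return e_x2 - expectation(dist) ** 2


def std_dev(dist):
    """STD(X): the square root of VAR(X)."""
    return sqrt(variance(dist))


# Number of heads when two fair coins are tossed: {x: p(x)}
coins = {0: 0.25, 1: 0.5, 2: 0.25}
print(expectation(coins), variance(coins), std_dev(coins))
```

The same three functions can be applied to any table of x and p(x) values, provided the probabilities sum to 1.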
Exercise
1. The table below shows the weekly accidents at a factory.
a. Copy and complete the table below.
Number of accidents per week (x)    0    1    2    3    4    Total
Number of weeks (f)                 10   18   15   6    1
xf
Probability (p)
px
b. Calculate the:
i. mean number of accidents. Ans: 1.4
ii. expected number of accidents per week. Ans: 1.4
c. What is the most frequent number of accidents per week?
2. On a particular day a farmer expects the sales of cucumbers to follow the trend below:
Sales 0 100 200 300 Total
Probability (P) 0.1 0.4 0.3 0.2
Profit (p) – 50 5 60 60
Pp
a. Copy and complete the table above.
b. What is the profit from the sale of
c. Calculate the expected profit for the sale
of cucumbers. Ans: $27
d. What is the most frequent number of sales?
Exercise
Answer the following.
1. a) Construct a table to show the
probability distribution of tossing a pair of fair dice, where X denotes the sum of the
points obtained. [Schaum’s Outline, Page 130]
Answer:
+    1   2   3   4   5   6
1    2   3   4   5   6   7
2    3   4   5   6   7   8
3    4   5   6   7   8   9
4    5   6   7   8   9   10
5    6   7   8   9   10  11
6    7   8   9   10  11  12
X      2     3     4     5     6     7     8     9     10    11    12
P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
b) What is the most frequent sum of the points obtained?
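The table of sums for question 1 can be generated by counting the 36 equally likely outcomes. The sketch below assumes fair six-sided dice; the name `dice_sum_distribution` is illustrative only.

```python
from collections import Counter
from fractions import Fraction


def dice_sum_distribution():
    """Return {s: P(X = s)}, where X is the sum of the points on two fair dice."""
    # Count how many of the 36 (first die, second die) pairs give each sum.
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}


dice = dice_sum_distribution()
most_frequent = max(dice, key=dice.get)  # the sum with the largest probability
for s, p in dice.items():
    print(s, p)
print("Most frequent sum:", most_frequent)
```

Note that `Fraction` prints values in lowest terms, so 6/36 appears as 1/6.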
2. The number of ‘STARS’ sold daily is a random variable X that has the following probability distribution: [Elementary
X 0 1 2 3 4 5
P(X = x) 0.10 0.12 0.25 0.30 0.20 0.03
a. Find the probability that the following copies of the ‘STARS’ will be sold.
i. two copies Ans: 0.25
ii. one or two copies Ans: 0.37
iii. more than three copies Ans: 0.23
iv. at most two copies Ans: 0.47
v. between one and five copies Ans: 0.75
b. Which is the most frequent number of ‘STARS’ sold?
c. Find: i) P(2 < X ≤ 4) Ans: 0.5
ii) P( X < 1) Ans: 0.1
iv) P( X = 2 or X = 5) Ans: 0.28
v. P(X > 2) Ans: 0.53
NOTE: P(X > 2) = 1 – P(X ≤ 2) , P(X < 1) = 1 – P(X ≥ 1) and P(X ≤ 3) = 1 – P(X ≥ 4)
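d. Find:
i) the expectation or the mean of the distribution, E(X)
NB. E(X) = \(\sum pX\)
Ans: E(X) = 2.47
ii) E(X²) Ans: 7.77
NB. E(X²) = \(\sum pX^2\).
iii) the variance of the distribution, VAR(X)
NB. VAR(X) = E[(X − μ)²] = \(\sum p(X - \mu)^2\)
OR
VAR(X) = E(X²) – [E(X)]² where E(X²) =
\(\sum pX^2\). Ans: VAR(X) = 1.67
iv) the standard deviation of the
distribution, STD(X) Ans: STD(X) = 1.29
NB. STD(X) = \(\sqrt{\mathrm{VAR}(X)}\). That is,
STD(X) = \(\sqrt{E[(X - \mu)^2]}\) = \(\sqrt{\sum p(X - \mu)^2}\)
OR STD(X) = \(\sqrt{E(X^2) - [E(X)]^2}\)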
3. The number of busy signals a person gets when calling the service number for a Cable Company is a random variable with the following
probability distribution:
x      0     1     2     3     4
p(x)   0.23  0.34  0.17  0.15  2m
Find:
a. i) m Ans: 0.055 ii) p(4) Ans: 0.11
b. the probability that a person does not get a busy signal.
c. the probability that a person gets at least
one busy signal. Ans: 0.77
d. the probability that a person receives
more than two busy signals. Ans: 0.26
e. Find:
i) the expectation or the mean of the
distribution, E(X) Ans: 1.57
ii) E(X²) Ans: 4.13
iii) the variance of the distribution, VAR(X)
Ans: 1.67
iv) the standard deviation of the
distribution, STD(X) Ans: 1.29
f. What is the most frequent number of busy signals?
4. A company that sells ballpoint pens in bulk packages knows that the number of defective pens in a package is a random variable with the probability distribution:
x      0     1     2     3     4     5     6
p(x)   0.30  0.21  3w    0.10  0.10  0.09  0.08
Find:
a.i) w Ans: 0.04 ii) p(2) Ans: 0.12
b. What is the most frequent number of defective pens in a package?
c. the probability that a package of pens will contain at least three defective pens.
Ans: 0.37
d. the probability that the package will contain between two and five defective
pens. Ans: 0.2
f. P(X = 0 or X = 2 or X = 6) Ans: 0.5
g. P(1 ≤ X ≤ 3) Ans: 0.43
h. P(X > 4) Ans: 0.17
I. Find:
i) the expectation or the mean of the
distribution, E(X) Ans: 2.08
ii) E(X2) Ans: 8.32
iii) the variance of the distribution, VAR(X) Ans: 3.99
iv) the standard deviation of the
distribution, STD(X) Ans: 2.0
Given a binomial distribution with n = number of trials and p = probability of success at each trial, then:
Mean = np, Variance = np(1 – p) = npq where q = 1 – p, and
the standard deviation = \(\sqrt{npq}\).
Exercise
1. Find the mean, variance and standard deviation of the binomial distribution when:
a. n =12 and p = 0.4
Answer: mean = 4.8, variance = 2.88
standard deviation = 1.7
b. n = 3 and p = 0.35
Answer: mean = 1.05, variance = 0.6825
standard deviation = 0.83
Binomial Distribution
If p is the probability that an event will happen in any single trial (called the
probability of a success) and q = 1 – p is the probability that it will fail to happen in any single trial (called the probability of a
failure), then the number of successes, X, in n trials has a binomial distribution.
The probability distribution of X is
determined by the formula
p(x) = \(\binom{n}{x} p^x q^{n-x}\) or
p(x) = \(\frac{n!}{x!(n-x)!} p^x q^{n-x}\) for x = 0, 1, 2, 3, …, n.
Note: n! = n(n – 1)(n – 2)(n – 3) … (1) and 0! = 1.
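The binomial formula can be written directly in code. The sketch below is an illustration only: the function name `binomial_pmf` is invented for this note, and n = 6, p = 0.2 are example values chosen to show the calculation.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)


n, p = 6, 0.2  # illustrative values: 6 trials, probability of success 0.2
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# The probabilities over x = 0, 1, ..., n add up to 1,
# and the mean of the distribution equals np.
total = sum(probs)
mean = sum(x * px for x, px in enumerate(probs))
print([round(px, 3) for px in probs], total, mean)
```

`math.comb(n, x)` evaluates n!/(x!(n – x)!) without computing the factorials separately.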
In a binomial distribution the variable takes on only two outcomes. That is, each trial of an experiment is defined in
terms of the two states ‘success’ and ‘failure’. Identical trials are repeated a number of
times, yielding a number of successes. In a binomial distribution the probability of
success is the same for every trial.
Note: X ~ B(n, p) where n is the number of trials (the sample size) and p is the probability of success.
Exercise
In a binomial distribution with n =5 and p = 0.35, what is the variance?
Ans: variance = 1.138
Which of the following is NOT a condition for a binomial experiment?
A. Each trial has only two outcomes
B. There are only two trials
C. p is the probability of success, q is the probability of failure and p + q = 1
D. The trials are independent
Ans: B
Scenario
Suppose a salesman who makes one call per day and considers the call successful if he sells goods worth over $200. In a five day working week, he can make 0, 1, 2, 3, 4 or 5 successful calls. The frequency distribution table below shows the number of successful calls per week over a 48 – week year.
Number of successful calls (x)    0   1   2   3   4   5   Total
This is known as a binomial frequency distribution. The particular characteristic that makes it binomial is that it describes the number of ‘successes’ obtained when a
number of identical ‘trials’ of an experiment are performed.
Trial = making a single call on one day
Trial success = a sale of goods worth over $200
Number of trials = 5 (i.e. 5 calls in a week)
Exercise
1. In a company the probability that a
machine needs correcting during a day’s production run is 0.2. If there are 6 of these machines running on a
particular day, find the probability that:
[P. 464 Example 2 – Francis, A.]
a. no machine need correcting;
Answer: 0.262
b. just one machine needs correcting;
Answer: 0.393
c. exactly two machines need correcting;
Answer: 0.246
d. more than two machines need correcting.
Answer: 0.099
0.4. Assuming a Binomial probability
distribution, calculate the probability that for a given week the:
a. bus is early exactly 3 times
Answer: 0.2903
b. bus is early at least 5 times
Answer: 0.0962
c. bus is early at most 2 times
Answer: 0.4199
3. In a certain game in which winning and losing are the only outcomes, the probability
of a player winning is 1/3. Given that the
game is played 6 times, calculate:
a. The probability that the player has exactly
b. The probability that the player has at least
2 wins. Answer: 0.6488
c. The expected number of wins.
Answer: 2
4. A multiple choice quiz has 10 questions. Each question has 5 possible answers of
which only one is correct.
a. What is the probability of selecting a
correct answer? Answer: 0.2
b. Determine the probability that if a student guesses he/she will get:
i. Exactly two (2) correct answers?
Answer: 0.302 (3 dec. pl.)
ii. At least three (3) correct answers?
5. A manufacturer sets up the following sampling scheme for accepting or rejecting large crates of identical items of raw
material received. He takes a random sample of 20 items from the crate. If he finds more than two defective items in the sample, he rejects the crate; otherwise he accepts it. It is known that approximately 5% of the items of this type received are defective. Calculate
the: [P. 465 Example 3 – Francis, A.]
a. mean and the variance of the number of
defectives in the sample of 20;
Ans.: 1 & 0.95
b. proportion of crates that will be rejected.
Answer: 0.076
Poisson Distribution
A Poisson situation (or process) is one in which events occur at random. A Poisson
frequency distribution is obtained by
observing the number of random events that occur in repeated intervals.
Given a Poisson situation with m = mean number of events per interval, the
Poisson probability distribution of x events
occurring is given by p(x) = \(\frac{e^{-m} m^x}{x!}\) for x =
0, 1, 2, 3, …. A Poisson interval can be adjusted provided that the mean is adjusted accordingly. For a Poisson probability
distribution, the mean = variance.
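The Poisson formula, and the idea of adjusting the mean when the interval changes, can be sketched as follows. The function name `poisson_pmf` is illustrative; the mean of 8 per hour is borrowed from the doctor's-visits question in the exercise that follows.

```python
from math import exp, factorial


def poisson_pmf(x, m):
    """P(X = x) for a Poisson distribution with mean m: e^(-m) * m^x / x!."""
    return exp(-m) * m**x / factorial(x)


mean_per_hour = 8

# Exactly 5 events in one hour.
p5 = poisson_pmf(5, mean_per_hour)

# Adjusting the interval: for half an hour the mean is halved to 4.
p3_half = poisson_pmf(3, mean_per_hour / 2)

print(round(p5, 4), round(p3_half, 4))
```

Halving the interval from one hour to half an hour halves the mean from 8 to 4; the formula itself is unchanged.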
In a binomial situation, the Poisson
distribution can be used as an approximation to the binomial distribution when:
1. n is large (greater than 30);
2. the probability (p) is small (less than 0.01).
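To see how close the approximation is, the exact binomial probability can be compared with the Poisson probability that uses mean m = np. This is a rough sketch with invented function names; the values n = 200 and p = 0.01 are borrowed from the boxed-items question in the exercise below.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, m):
    """Poisson probability P(X = x) with mean m."""
    return exp(-m) * m**x / factorial(x)


n, p = 200, 0.01  # large n, small p
m = n * p         # mean used for the Poisson approximation

# P(2 or more) = 1 - P(0) - P(1), computed both ways.
exact = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)
approx = 1 - poisson_pmf(0, m) - poisson_pmf(1, m)
print(exact, approx)
```

For these values the two results agree to about two decimal places, which is usually enough when answers are read from tables.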
Exercise
Answer: \(4.5e^{-3}\) or 0.224
2. On average, 8 persons visit the doctor in a given hour. Assuming a Poisson
probability distribution, calculate the:
a. probability that exactly 5 persons will visit the doctor in a randomly selected hour.
Answer: 0.0916
b. probability that at least 10 persons visit the doctor in a given hour.
Answer: 0.2196
(estimated from 3 values, P(10) + P(11) + P(12); the exact value, 1 – P(X ≤ 9), is 0.2834)
c. probability that for a randomly selected half-hour, exactly 3 persons will visit the
doctor. Answer: 0.1954
assembly line is 3 per week. Find the probability that:
a. A particular week will be accident free.
Answer: 0.0498
b. At least three (3) accidents will occur in
a week. Answer: 1 – \(8.5e^{-3}\) or 0.5768
c. Exactly five (5) accidents will occur in
two weeks. Answer: \(64.8e^{-6}\) or 0.1606
4. Customers arrive randomly at a
department store at an average rate of 3.4 per minute. Assuming the customer arrivals form a Poisson distribution, calculate the probability that:
a. no customer arrives in any particular minute; Answer: 0.0334
b. exactly one customer arrives in any
particular minute; Answer: 0.1135
c. two or more customers arrive in any
particular minute; Answer: 0.8532
d. one or more customers arrive in any 30 –
second period. Answer: 0.8173
5. Items produced from a machine are
known to be 1% defective. If the items are boxed into cases of 200, what is the
probability of finding that a single box has 2 or more defectives? [P. 469, Example 6 –
6. JIIC receives on average 3.5 claims per week. Assuming that the number of claims follows a poisson distribution, find the
probability that it will receive:
a. exactly three claims in a given week;
b. no more than three claims in a week.
Standard Normal Variable (z – value)
lengths of uniform manufactured products and times.
It is not possible to find the probability of a precise value because the Normal
distribution is continuous. However, we can evaluate the probability that a value lies in a certain range. The information needed to
evaluate probabilities for ranges of values for a Normal distribution is the values of the:
1. mean (μ - the Greek letter mu) and
2. standard deviation (σ - the Greek letter sigma) of the distribution.
The procedure used in calculating
probabilities for a Normal distribution requires a knowledge of the use of:
1. Z-scores and
2. Z (or Standard Normal) tables.
Z-scores
The Z-score for any value x of a normal
distribution, having mean μ and standard
deviation σ, is defined by: \(z = \frac{x - \mu}{\sigma}\). This
process is sometimes known as
standardizing the x-value.
The z-score measures the number of
standard deviations that a data value is from the mean. A standard normal random
variable, Z, has a mean of 0 and a standard deviation of 1. That is, Z ~ N(0, 1). Thus, Z ~ N(μ, σ²) with μ = 0 and σ² = 1.
Note: x ~ N(95, 16), that is x ~ N(μ, σ²), means that μ = 95 and σ² = 16, so σ = 4.
Standard Normal (Z) tables
A standard normal table is a table of probabilities for a Z random variable.
The Normal probability is calculated by first standardizing the Normal value given in the question to obtain a z-score, then using the standard Normal tables to find the
probability.
Note: X ~ N(mean, variance) or
X ~ N(μ, σ²).
Summary of procedure for calculating Normal probabilities
The procedure for calculating any Normal probability is:
STEP 1 – Standardize the given value to obtain a z-score.
STEP 2 – Use the standard Normal table to find the probability of Z.
STEP 3 – If necessary, manipulate the probability obtained.
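The steps of the procedure can be sketched in code, with a computed value standing in for the printed Z table. This is an illustration only: `NormalDist` comes from Python's standard `statistics` module (Python 3.8 or later), the function name is invented, and the numbers are those of the steel pins example that follows (mean 20 cm, standard deviation 0.1 cm).

```python
from statistics import NormalDist


def normal_prob_less_than(x, mean, sd):
    """Return P(X < x) for a Normal variable X with the given mean and standard deviation."""
    z = (x - mean) / sd          # standardize the value to get the z-score
    return NormalDist().cdf(z)   # P(Z < z); replaces the table look-up


# Steel pins: mean 20 cm, standard deviation 0.1 cm.
p_less = normal_prob_less_than(20.1, 20, 0.1)  # P(length < 20.1)
p_more = 1 - p_less                            # manipulate: P(length > 20.1)
print(round(p_less, 4), round(p_more, 4))
```

`NormalDist()` with no arguments is the standard Normal distribution N(0, 1), so its `cdf` plays the role of the Z table.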
Example – Page 475, A. Francis
1. The lengths of steel pins produced by a machine are distributed Normally with mean 20 cm and standard deviation 0.1 cm. Find the:
a. z-score for a pin of length 20.1 cm
Solution: z = (20.1 – 20)/0.1 = 1
b. probability that a randomly selected pin is less than 20.1 cm in length
Solution: P(length of pin < 20.1) = P(Z < 1)
Using the standard normal table P(Z < 1) = 0.8413
c. probability that a randomly selected pin is greater than 20.1 cm in length.
Solution: P(length of pin > 20.1) = P(Z > 1)
P(Z > 1) = 1 – 0.8413 = 0.1587
d. probability that a randomly selected pin is greater than 19.9 cm in length.
Solution:
P(length of pin > 19.9) = P(Z > – 1)
Using the standard normal table, P(Z > – 1) = 0.8413
[Figure: Normal distribution of steel pins (horizontal axis: length of pin), with area 0.8413 below 20.1 and area 1 – 0.8413 = 0.1587 above 20.1]
[Figure: Standard Normal distribution of steel pins, with area 0.8413 below z = 1 and area 1 – 0.8413 = 0.1587 above z = 1]
P(Z < 1) = P(Z > – 1) = 0.8413 because the Normal distribution is symmetric.
Draw a diagram for each of the following.
Note:
P(Z < – 1) = P(Z > 1)
P(Z > 1) = 1 – P(Z < 1)
P(Z < – 1) = 1 – P(Z < 1)
P(Z < 1) = 1 – P(Z > 1)
P(Z < 1) = P(Z > – 1)
P(– 1 < Z < 1) = P(Z < 1) – P(Z < – 1)
P(– 1 < Z < 1) = P(Z > – 1) – P(Z > 1)
[Figure: Standard Normal distribution of steel pins, divided at z = – 1, 0 and 1 into regions labelled A to D; Area of (B+C+D) = Area of (A+B+C)]
2. The weights of bags of potatoes are Normally distributed with mean 5 lbs and
standard deviation 0.2 lb. The potatoes are delivered to a supermarket, 200 bags at a time.
a. What is the probability that a random bag will weigh more than 5.5 lbs?
Solution: P(bag > 5.5 lbs) = P(Z > 2.5), since
z = (5.5 – 5)/0.2 = 2.5
Using the table P(Z < 2.5) = 0.9938, so P(Z > 2.5) = 1 – 0.9938 = 0.0062
b. How many bags, from a single delivery, would be expected to weigh more than 5.5 lbs?
Solution: Expected number = 200 × 0.0062 = 1.24. In practical terms 1 bag.
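The two parts of this example can be checked with a short calculation. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of the printed table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Weights of bags: Normal with mean 5 lbs and standard deviation 0.2 lb.
weights = NormalDist(mu=5, sigma=0.2)

# Part a: P(bag > 5.5 lbs) = 1 - P(bag < 5.5 lbs)
p_heavy = 1 - weights.cdf(5.5)

# Part b: expected number of such bags in a delivery of 200
expected_bags = 200 * p_heavy

print(round(p_heavy, 4), round(expected_bags, 2))
```

Passing the mean and standard deviation to `NormalDist` lets `cdf` do the standardizing internally, so no separate z-score step is needed here.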
Example 3 – Page 478, A. Francis
The time taken to complete jobs of a
particular type is known to be Normally distributed with mean 6.4 hours and
standard deviation 1.2 hours. What is the probability that a randomly selected job of this type takes:
a) less than 7 hours
Ans: P(X < 7) is P(Z < 0.5) = 0.6915
b) less than 6 hours
Ans: P(X < 6 hours) is
P(Z < – 0.33) = 1 – P(Z < 0.33) = 1 – 0.6293 = 0.3707
c) between 6 and 7 hours
Ans: P(6 < X < 7) is P(– 0.33 < Z < 0.5) =
P(Z < 0.5) – P(Z < (– 0.33)) =
0.6915 – 0.3707 = 0.3208
Example 4 – Page 479, A. Francis
Company records show the weekly distance travelled by their salesmen is approximately normally distributed with mean 800 miles and standard deviation 90 miles. The sales manager considers that salesmen who travel less than 600 miles in one week are
performing poorly.
If the company employs 200 salesmen, how many would be expected to perform poorly in a particular week?
Ans: Standardising 600 gives z = (600 – 800)/90 = – 2.22
P(Z < – 2.22) = 1 – P(Z < 2.22)
= 1 – 0.9868 = 0.0132
The expectation = np = 200(0.0132) = 2.64, so approximately 3 salesmen would be expected to perform poorly in any one week.
Exercise
1. The random variable X is normally
distributed with mean 12 and variance 0.25. What is the probability that X < 12?
2. Let X be a random variable with a mean of 50 and a variance of 64. Find:
a. P(X < 52) Answer:0.5987
[2013 Past Paper – question 4b]
b. P(X > 52) Answer: 0.4013
c. P(35 < X < 64) Answer: 0.9298
3. A random variable X follows a Normal probability distribution and has a mean
value of 25 and a variance of 9. Calculate: [2014 – question 4b]
a. P(X < 21) Answer: 0.0918
b. P(X > 26) Answer: 0.3707
4. The height of students at a certain school are normally distributed with mean 169 cm and standard deviation 10 cm.
a. What is the z-score of a student who
is 167 cm tall? Answer: z = – 0.2
b. What is the probability that a student selected at random is:
i) less than 167 cm tall?
Ans: P(X < 167) = P(Z < – 0.2) = 0.4207
ii) between 167 and 172 cm?
Ans: P(167 < x < 172) = 0.1972
5. Weekly demand at a grocery store for a brand of canned sausages is normally
cans. What is the probability that the weekly demand is:
a. 959 cans or less; Ans: P(X ≤ 959) = 0.983
b. between 660 and 735 cans of sausages?
Ans: P(660 < x < 735) = 0.16141
Approximating Probabilities to the
Binomial Distribution Using the Standard Normal Distribution
The Normal distribution can be used as an approximation to the binomial distribution when:
1. n is large;
2. the probability (p) is not too small or large (the closer to 0.5 the better).
When used as an approximation in this way, the Normal distribution has:
mean = np and
standard deviation = \(\sqrt{np(1 - p)}\).
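The quality of the approximation can be checked by comparing an exact binomial probability with the Normal one. The sketch below is an illustration under stated assumptions: n = 100, p = 0.5 and the cut-off of 55 are made-up values, the function name is invented, and the half-unit continuity correction (using 55.5 rather than 55) is a standard refinement that these notes do not cover.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), summing the binomial formula."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


n, p, k = 100, 0.5, 55  # hypothetical values: n large, p close to 0.5

mean = n * p
sd = sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)
# Normal approximation with mean np and standard deviation sqrt(np(1 - p));
# k + 0.5 is the continuity correction (an assumption, not from the notes).
approx = NormalDist(mu=mean, sigma=sd).cdf(k + 0.5)
print(mean, sd, exact, approx)
```

With p = 0.5 the binomial distribution is symmetric, which is why the two results are so close; the agreement weakens as p moves towards 0 or 1.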