Chapter 5 – Probability Distributions
5.1 Introduction
A probability distribution is a list of all the possible outcomes of a random variable and their associated probabilities of occurrence.
Chapter 4 showed that probabilities can be derived empirically through data collection and the construction of frequency distributions. The frequencies associated with each outcome of the random variable under study are then used to calculate the probabilities of specific events occurring. A probability distribution is therefore similar to a frequency distribution.
In contrast, this chapter will show how to use mathematical functions to find probabilities.
There are numerous problem situations in practice where the outcomes of a specific random variable follow known probability patterns. If the behaviour of a random variable under study can be matched to one of these known probability patterns, then probabilities for outcomes associated with the random variable can be found directly by applying an appropriate theoretical probability distribution function. This avoids the need for empirical data capture and summary analysis to derive these probabilities.
5.2 Types of Probability Distribution
Probability distribution functions can be classified as discrete or continuous. The choice of a particular probability distribution function in practice therefore depends on the data type of the random variable (i.e. discrete or continuous) under study.
This chapter will describe two discrete probability distribution functions (the binomial and the Poisson distributions) and one continuous probability distribution function (the normal distribution).
5.3 Discrete Probability Distributions
Discrete probability distributions assume that the outcomes of a random variable under study can take on only specific (usually integer) values.
Examples include the following:
A maths class can have 1, 2, 3, 4, 5 (or any integer) number of students.
A bookshop has 0, 1, 2, 3 (or any integer) number of copies of a title in stock.
A machine can produce 0, 1, 2, 3, 4 (or any integer) defective items in a shift.
A company can have 0, 1, 2, 3 (or any integer) employees absent on a day.
In discrete probability distributions, a non-zero probability exists for each possible outcome of the random variable (within the sample space). The probability is zero for values of the random variable that are not in the sample space.
Two common discrete probability distribution functions are the binomial probability distribution and the Poisson probability distribution.
For a discrete random variable to follow either a binomial or a Poisson process, it must possess a number of specific characteristics. These features will be identified for each of these probability distribution functions in the following sections.
Applied Business Statistics
5.4 Binomial Probability Distribution
A discrete random variable follows the binomial distribution if it satisfies the following four conditions:
1. The random variable is observed n times (this is equivalent to drawing a sample of n objects and observing the random variable on each one).
2. There are only two mutually exclusive and collectively exhaustive outcomes associated with the random variable on each object in the sample. These two outcomes are labelled success and failure (e.g. a product is defective or not defective; an employee is absent or not absent from work; a consumer prefers brand A or not brand A).
3. Each outcome has an associated probability:
– The probability of the success outcome is denoted by p.
– The probability of the failure outcome is denoted by 1 − p.
4. The objects are assumed to be independent of each other, i.e. the outcome on any object is not influenced by the outcome on any other object. This means that p is the same (constant) for each of the n objects.
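The four conditions can be made concrete with a short simulation. The Python sketch below is an illustration only (the function name simulate_binomial_counts and the fixed seed are assumptions, not part of the text): it treats each of n objects as an independent trial that is a success with the same probability p, counts the successes in each sample, and repeats the sampling many times. Using n = 5 and p = 0.25 (the values in Example 5.1), the relative frequencies should settle close to the binomial probabilities, much as an empirical frequency distribution approximates a probability distribution.

```python
import random
from collections import Counter

def simulate_binomial_counts(n, p, samples, seed=1):
    """Draw `samples` random samples of n independent objects and
    return how often each number of successes x occurred."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter()
    for _ in range(samples):
        # Condition 1: observe n objects.
        # Conditions 2-4: each object is either a success (with the same,
        # independent probability p) or a failure (probability 1 - p).
        x = sum(1 for _ in range(n) if rng.random() < p)
        counts[x] += 1
    return counts

counts = simulate_binomial_counts(n=5, p=0.25, samples=100_000)
for x in range(6):
    # Relative frequency of x successes in a sample of 5
    print(x, counts[x] / 100_000)
```

The printed relative frequencies vary slightly with the seed; they approximate, but do not exactly equal, the theoretical probabilities.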
If these four conditions are satisfied, then the following binomial question can be addressed.
The Binomial Question
‘What is the probability that x successes will occur in a randomly drawn sample of n objects?’
This probability can be calculated using the binomial probability distribution formula:
$$P(x) = {}^{n}C_{x}\, p^{x}(1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, 3, \ldots, n \qquad (5.1)$$
Where:
n = the sample size, i.e. the number of independent trials (observations)
x = the number of success outcomes in the n independently drawn objects
p = the probability of a success outcome on a single independent object
(1 − p) = the probability of a failure outcome on a single independent object
nCx = n!/[x!(n − x)!], the number of ways in which the x successes can be arranged among the n objects
The x values represent the number of success outcomes that can be observed in a sample of n objects; together they are called the domain. Since the number of success outcomes cannot exceed the number of trials, the domain of the binomial probability distribution is limited to the integer values from zero up to the sample size, n.
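Formula 5.1 translates directly into code. The following Python sketch is illustrative (binomial_pmf is a hypothetical name; math.comb, used for nCx, requires Python 3.8 or later). It evaluates P(x) and returns zero for values outside the domain 0, 1, …, n, in line with the earlier point that discrete probabilities are zero outside the sample space.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = nCx * p**x * (1 - p)**(n - x)  (formula 5.1).

    Returns 0 for any x outside the domain 0, 1, ..., n, since a sample
    of n objects cannot contain a negative number of successes or more
    than n successes.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    if not isinstance(x, int) or x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# The probabilities over the whole domain sum to 1.
total = sum(binomial_pmf(x, 10, 0.2) for x in range(11))
print(round(total, 10))  # 1.0
```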
The rationale of the binomial process is illustrated by the following problem:
Example 5.1 Car Hire Request Study
The Zeplin car hire company has a fleet of rental cars that includes the make Opel.
Experience has shown that one in four clients requests to hire an Opel.
If five reservations are randomly selected from today’s bookings, what is the probability (or likelihood) that two clients will have requested an Opel?
Solution
The random variable (i.e. the number of hire requests for an Opel) is discrete, since 0, 1, 2, 3, 4, etc. Opels can be requested for hire on a given day. For this discrete random variable to follow the binomial process, it must satisfy the four conditions defined above.
Condition 1 is satisfied, since the random variable is observed five times (i.e. a sample of five hiring requests was studied). Hence n = 5. Each of the five reservation requests is a single trial (or object) in the study of car hire request patterns.
Condition 2 is satisfied, since there are only two possible outcomes on each client request:
A client requests the hire of an Opel (success outcome).
A client requests the hire of another make of car, i.e. not an Opel (failure outcome).
Condition 3 is satisfied, since the probability of the success outcome is constant and is derived from the statement that ‘experience has shown that one in four clients requests to hire an Opel’.
Thus p (= probability of a client requesting to hire an Opel) = 0.25
(1 − p) (= probability of a client requesting another make of car) = 0.75
Condition 4 is satisfied, since the trials are independent. Each client’s car preference request is independent of every other client’s car preference request. This implies that p will not change from one client request to another.
Since all the conditions for the binomial process have been satisfied, the binomial question can be addressed: ‘What is the probability that two out of five clients will request to hire an Opel?’
Find: P(x = 2) when n = 5 and p = 0.25.
Then:
$$P(x = 2) = {}^{5}C_{2}(0.25)^{2}(1 - 0.25)^{5-2} = (10)(0.0625)(0.4219) = 0.264$$
Thus there is a 26.4% chance that two out of five randomly selected clients will request an Opel.
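The arithmetic in this solution can be checked term by term. In this Python sketch the variable names are illustrative; it also shows that the rounded factor 0.4219 is 0.75³ = 0.421875 in full.

```python
from math import comb

n, p, x = 5, 0.25, 2            # Example 5.1: five reservations, p = 0.25, two Opel requests

ways = comb(n, x)               # 5C2 = 10
p_success = p ** x              # 0.25^2 = 0.0625
p_failure = (1 - p) ** (n - x)  # 0.75^3 = 0.421875

prob = ways * p_success * p_failure
print(ways, p_success, p_failure, round(prob, 3))  # 10 0.0625 0.421875 0.264
```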
How to Select p
The success outcome is always associated with the probability, p. The outcome that must be labelled as the success outcome is identified from the binomial question.
To illustrate, in Example 5.1, the binomial question relates to a client requesting the hire of an Opel, thus the success outcome is ‘receiving a hire request for an Opel’. The failure outcome is ‘receiving a hire request for another make of car’.
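The essential requirement is that x and p refer to the same outcome. As an illustrative check (not part of the original text), the Python sketch below computes the Example 5.1 probability twice: once counting Opel requests with p = 0.25, and once counting requests for other makes (three of five) with p = 0.75. Both labellings give the same answer, whereas pairing x = 2 Opel requests with the other-make probability gives a different, incorrect result.

```python
from math import comb

def binomial_pmf(x, n, p):
    # Formula 5.1: nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 5
# Success = Opel request: x = 2 Opel requests, p = 0.25
opel = binomial_pmf(2, n, 0.25)
# Success = other make: x = 3 other-make requests, p = 0.75
other = binomial_pmf(3, n, 0.75)
# Mismatched: x counts Opels, but p is the probability of another make
wrong = binomial_pmf(2, n, 0.75)

print(round(opel, 3), round(other, 3), round(wrong, 3))  # 0.264 0.264 0.088
```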
Example 5.2 Life Assurance Policy Surrender Study
Global Insurance has found that 20% (one in five) of all insurance policies are surrendered (cashed in) before their maturity date. Assume that 10 policies are randomly selected from the company’s policy database.
(a) What is the probability that four of these 10 insurance policies will have been surrendered before their maturity date?
(b) What is the probability that no more than three of these 10 insurance policies will have been surrendered before their maturity date?
(c) What is the probability that at least two out of the 10 randomly selected policies will be surrendered before their maturity date?
Solution
(a) The random variable ‘number of policies surrendered’ is discrete, since there can be 0, 1, 2, 3, …, 9, 10 surrendered policies in the randomly selected sample of 10 policies.
This random variable ‘fits’ the binomial probability distribution for the following reasons:
The random variable is observed 10 times (10 policies were randomly sampled).
Each policy is an observation of the random variable (i.e. policy surrender status). Hence n = 10.
There are only two possible outcomes for each policy, namely:
– a policy is surrendered before maturity (the success outcome)
– a policy is not surrendered before maturity (the failure outcome).
A probability can be assigned to each outcome for a policy, namely:
– p (= probability of a policy being surrendered) = 0.20
– (1 − p) (= probability of a policy not being surrendered) = 0.80
Note that the success outcome refers to surrendering a policy since the binomial question seeks probabilities for surrendered policies.
The trials are independent. Each policy’s status (surrendered or not) is independent of every other policy’s status. Thus p = 0.20 is constant for each policy.
Since all the conditions for the binomial process have been satisfied, the binomial question can be addressed: ‘What is the probability that four of these 10 insurance policies will have been surrendered before their maturity date?’
Find: P(x = 4) when n = 10 and p = 0.2.
Then:
$$P(x = 4) = {}^{10}C_{4}(0.20)^{4}(1 - 0.20)^{10-4} = (210)(0.0016)(0.2621) = 0.088$$
Thus there is an 8.8% chance that four out of 10 randomly selected policies will have been surrendered before maturity.
(b) The binomial approach still applies. In terms of the binomial question, ‘no more than 3’ implies that either 0 or 1 or 2 or 3 of the sampled policies will be surrendered before maturity.
Thus find P(x ≤ 3). (This is a cumulative probability.)
Using the addition rule of probability for mutually exclusive events, the cumulative probability is:
P(x ≤ 3) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3)
The four binomial probabilities must now be calculated separately and summed (they are shown to four decimal places to limit rounding error):
$$P(x = 0) = {}^{10}C_{0}(0.20)^{0}(1 - 0.20)^{10-0} = 0.1074$$
$$P(x = 1) = {}^{10}C_{1}(0.20)^{1}(1 - 0.20)^{10-1} = 0.2684$$
$$P(x = 2) = {}^{10}C_{2}(0.20)^{2}(1 - 0.20)^{10-2} = 0.3020$$
$$P(x = 3) = {}^{10}C_{3}(0.20)^{3}(1 - 0.20)^{10-3} = 0.2013$$
Then P(x ≤ 3) = 0.1074 + 0.2684 + 0.3020 + 0.2013 = 0.8791 ≈ 0.879
There is an 87.9% chance that no more than three of the 10 randomly selected policies will be surrendered before their maturity date.
(c) This question translates into finding the cumulative probability P(x ≥ 2), i.e. P(x ≥ 2) = P(x = 2) + P(x = 3) + P(x = 4) + … + P(x = 10).
This requires nine binomial calculations. To avoid these onerous calculations, the complementary law of probability can be used instead, as follows:
P(x ≥ 2) = 1 − P(x ≤ 1)
= 1 − [P(x = 0) + P(x = 1)]
= 1 − [0.1074 + 0.2684] (from (b) above)
= 1 − 0.3758 = 0.6242 ≈ 0.624
Thus there is a 62.4% chance that at least two of the 10 randomly selected policies will be surrendered before their maturity date.
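All three parts of Example 5.2 can be checked with a few lines of Python. In this sketch the function names are illustrative: binomial_cdf adds the mutually exclusive outcomes 0 to k, and part (c) is computed both with the complementary law and with the direct nine-term sum to confirm that the two agree.

```python
from math import comb

def binomial_pmf(x, n, p):
    # Formula 5.1: nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(k, n, p):
    # Cumulative probability P(x <= k): add the mutually exclusive outcomes 0..k
    return sum(binomial_pmf(x, n, p) for x in range(0, k + 1))

n, p = 10, 0.20   # Example 5.2: 10 policies, 20% surrendered

part_a = binomial_pmf(4, n, p)        # P(x = 4)
part_b = binomial_cdf(3, n, p)        # P(x <= 3) = P(0) + P(1) + P(2) + P(3)
part_c = 1 - binomial_cdf(1, n, p)    # P(x >= 2) via the complementary law

# Direct sum of the nine terms P(2) + ... + P(10), for comparison
part_c_direct = sum(binomial_pmf(x, n, p) for x in range(2, n + 1))

print(round(part_a, 3), round(part_b, 3), round(part_c, 3))  # 0.088 0.879 0.624
```

Summing unrounded probabilities in this way avoids the rounding drift that can appear when rounded individual terms are added by hand.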
Useful Pointers on Calculating Probabilities
Key words such as at least, no more than, at most, no less than, smaller than, larger than, greater than and no greater than always imply cumulative probabilities (i.e. the summing of individual marginal probabilities).
The complementary rule should be considered whenever practical to reduce the number of probability calculations.
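The key-word pointers can be expressed as a handful of cumulative expressions. The helper names below (at_most, at_least, smaller_than, larger_than) are hypothetical, chosen only to mirror the key words. For a discrete count, ‘smaller than k’ means x ≤ k − 1 and ‘larger than k’ means x ≥ k + 1, which is why the boundary shifts by one; the ‘at least’ and ‘larger than’ helpers use the complementary rule.

```python
from math import comb

def pmf(x, n, p):
    # Binomial probability, formula 5.1
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def cdf(k, n, p):
    """P(x <= k); an empty sum (k < 0) gives 0."""
    return sum(pmf(x, n, p) for x in range(0, min(k, n) + 1))

def at_most(k, n, p):       # "at most k", "no more than k", "no greater than k"
    return cdf(k, n, p)

def at_least(k, n, p):      # "at least k", "no less than k": 1 - P(x <= k - 1)
    return 1 - cdf(k - 1, n, p)

def smaller_than(k, n, p):  # "smaller than k": P(x <= k - 1)
    return cdf(k - 1, n, p)

def larger_than(k, n, p):   # "larger than k", "greater than k": 1 - P(x <= k)
    return 1 - cdf(k, n, p)

# Example 5.2, parts (b) and (c)
print(round(at_most(3, 10, 0.2), 3), round(at_least(2, 10, 0.2), 3))  # 0.879 0.624
```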