Geometric distribution

You perform Bernoulli trials (flipping a coin, throwing a die) repeatedly till you get your first success (for example -- getting a first head or first 6), then you stop and count the total number of trials you had (including the final trial that brings you the only success).

The total number of trials you counted is a geometric random variable.

If we let X= # of independent trials just enough for one success, and p is the constant probability of success in any single trial, then Xis geometric random variable with parameter p.

Please note that one nasty thing about a geometric (and a negative binomial) random variable is that there are always two ways to define the random variable – to define it as the number of trials or the number of failures. Though the variance formula is identical either way, the mean formulas are different (the two means differ by one).

Because exam questions can use either way to define a geometric variable, you need to determine, according to the context of the problem, which mean and probability mass formulas to use.

Here is a second way to define a geometric random variable:

If we let Y= # of independent failures just enough for one success, p is the constant probability of success in any single trial, then Yis geometric random variable with parameter p.

Let’s look at the probability mass function of Xand Y:

S=success, F=failure, p=the probability of success in a single trial. Then

Page 120 of 425

©Yufeng Guo, Deeper Understanding: Exam P X= # of trials

before 1^st success

1 2 3 4 … n

Y = # of failures before

1^st success

0 1 2 3 k

(k=n 1) Outcome in

terms of trials

S FS FFS FFFS …

...

FF F S Outcome in

terms of failures

S FS FFS FFFS … ...

FF F S

Probability

( )

P X =n

(

¹ ^{p p}

) (

¹ ^p

)

² ^p

(

¹ ^p

)

³ ^p ^…

(

¹ ^p

)

ⁿ ¹ ^p

Probability

( )

P Y=k

(

¹ ^{p p}

) (

¹ ^p

)

² ^p

(

¹ ^p

)

³ ^p ^…

(

¹ ^p

)

^k ^p

Page 121 of 425

©Yufeng Guo, Deeper Understanding: Exam P Key formulas for geometric random variable

Formula Explanation

To have 1st success at n-th trial, you must fail the first

(n-1) trials and succeed at n-th trial. Notice X=Y+1.

( )

one success, you must have zero success in the first n 1trials.

The continuous counterpart of geometric is exponential.

Exponential CDF is

( )

¹ ^x

FX x = e . Notice that CDF for geometric and

exponential is one minus something.

You can derive it using MGF (next page). Intuitive formula.

For example, if p=0.2 1 5= , then on average every 5 trials bring in one success. So

( )

Not an intuitive formula. Just have to memorize the first one.

To get the 2^nd formula, use past. After making initial

btrials with no success, your chance of succeeding after at least amore trials is the same as if you would reset your counter to zero after the initial btrials and start counting trials afresh.

Why so? If an event is truly random (such as coin tossing, your past success or failure should have no bearing on your future success or failure.

Page 122 of 425

©Yufeng Guo, Deeper Understanding: Exam P

How to derive the mean and variance formula using MGF:

( )

(

) (

)

Sample Problems and Solutions

Problem 1

A candidate is taking a multiple-choice exam. For each tested problem in the exam, there are 5 possible choices --- A, B, C, D, and E. Because the candidate has zero knowledge of the subject, he relies on pure guesswork to answer each question, independent of how he answers any previous questions.

Find:

(1) The probability that the candidate answers three questions wrong in a row before he finally answers the fourth question correctly.

(2) Let random variable X= # of problems the candidate answers wrong in a row before he finally guesses a correct answer. Find ^{E X}

( )

^and^{Var X .}

( )

Page 123 of 425

©Yufeng Guo, Deeper Understanding: Exam P

(3) When the candidate’s exam is being graded, the grader finds that the candidate has answered the first five exam questions all wrong. Can the grader conclude that the candidate is more or less likely to guess a problem right in other problems?

(4) When the candidate’s exam is being graded, the grader finds that the candidate has answered the first two exam questions all correctly. Can the grader conclude that the candidate is more or less likely to guess a problem right in other problems?

Solution

(1) The probability of having three wrongs but the fourth right.

If you use the number of failures X as the geometric random variable, then you have

( ) (

)

pX x = p p , x =0,1,2,3,… and p=0.2

( )

(

)

³ ^{0.2 1 0.2}

( )

pX = p p = =10.24%

If you use the number of trials Y as the geometric random variable, then you have:

( ) (

)

^y ¹

pY y = p p , y=1,2,3,… and p=0.2

( )

⁴

(

)

^{4 1} ^{0.2 1 0.2}

( )

pY = p p = =10.24%

(2) Find ^{E X}

( )

^and^{Var X}

( )

If you use the number of failures X as the geometric random variable, then you have

( )

¹ ¹ ¹ ¹ ⁴

If you use the number of trials Y as the geometric random variable, then you have

( )

¹ ¹ ⁵

(3) Geometric distribution lacks memory. The past failures have no use in predicting future successes or failures.

If the candidate really has zero knowledge of the subject and all exam problems are unrelated, then the candidate answering the first 5 questions all wrong is purely by chance. This information is useless for predicting the candidate’s performance on other exam questions.

(4) Once again, geometric distribution lacks memory. The past successes have no bearing on future successes or failures.

Page 124 of 425

©Yufeng Guo, Deeper Understanding: Exam P Problem 2

You throw a die repeatedly until you get a 6. What’s the probability that you need to throw more than 20 times to get 6?

Solution

If you use the number of trials X as the geometric random variable, then you have

( ) (

)

ⁿ ¹

P X n = p , p=1 6

(

²⁰

) (

²¹

) (

)

^{21 1}

(

^{1 1 6}

)

^{21 1}

P X > =P X = p = = 2.6%

If you use the number of failures Y as the geometric random variable, then you have:

( ) (

)

P Y k = p

Throwing a die at least 21 times to get a 6 is the same as having at least 20 failures before a success. ^{P Y}

(

²⁰

) (

⁼ ¹ ^p

)

²⁰ ⁼

(

^{1 1 6}

)

²⁰^=2.6%

Problem 3

The annual number of losses incurred by a policyholder of an auto insurance policy, N, is geometrically distributed with parameter p=0.8. If losses do occur, the amount of losses is either $1,000 with probability of 0.4 or $5,000 with probability of 0.6. Assume the number of losses and amounts of losses are independent.

Let S=annual aggregate loss incurred by the policyholder.

Find ^{E S}

( )

^and^{Var S}

( )

Solution

Let X=amount of a single loss (set K= $1,000 to make our calculation easier):

1 with probability 0.4 5 with probability 0.6 X K

= K

We wrote 1K and 5K so we won’t forget that the unit is $1,000.

S =NX

Because N and Xare independent, we have

Page 125 of 425

©Yufeng Guo, Deeper Understanding: Exam P

( ) ( ) ( )

E S =E N E X (intuitive)

( ) ( ) ( ) ( )

( ) ( )

Var S =Var NX =Var N E X +E N Var X (not intuitive ) You should memorize the above formulas.

Note in the second formula above, the first item on the right hand side is ^{Var N E}

( ) ( )

² ^{X ,}

not Var N E X . To see why, notice that

( ) ( )

^{Var NX}

( )

^is$ (variance of a dollar amount ² is dollar squared); ^{E N Var X}

( ) ( )

is dollar squared.^{Var N E}

( ) ( )

² ^X is also dollar

squared. If you use ^{Var N E X}

( ) ( )

as opposed to the correct term ^{Var N E}

( ) ( )

² ^{X , then}

( ) ( )

Var N E X is a dollar amount, which will not make the equation hold.

To calculate ^{E X}

( )

^andVar X , we simply plug in the following data (remember to

( )

scale up the probability) into BA II Plus (by now this should be your second nature):

1 with probability 0.4 5 with probability 0.6 X K

= K

Set X01=1, Y01=4; X02=5, Y02=6. Using BA II Plus 1-V Statistics Worksheet, you should get:

Next, we need to find the mean and variance of N. N has a geometric distribution with parameter p=0.8. Should we treat N as the number of trials or the number of failures?

Because N is the number of losses occurred in a year and a policyholder may have zero losses in a year, Nstarts from zero, not from one. We should treat N as the number of failures and use the mean formula that produces the lower mean:

( )

¹ ¹ ¹ ¹ ^0.25

( ) ( ) ( )

( ) ( )

^{0.3125 3.4}

( )

² ^{0.25 3.84}

(

)

Var S =Var N E X +E N Var X = K + K

( ) ( )

2 2 2

4.5725K 4.5725 1, 000 4, 572, 500 $

= = =

Problem 4

The annual number of losses incurred by a policyholder of an auto insurance policy, N, is geometrically distributed with parameter p=0.4.

Find ^{E N}

(

² ^N ^>²

)

^, ^{Var N}

(

² ^N ^>²

)

^, ^{E N N}

(

^>²

)

^,^{Var N N}

(

^>²

)

Solution

Geometric random variable N doesn’t have any memory of its past. After seeing k losses (k is a non-negative integer), if we set the counter to zero and start counting the number of losses incurred from this point on (that is N k ), the count N k will have the identical geometric distribution with the same parameter p.

In this problem, after seeing two losses, if we reset the counter to zero, the number of losses we will see in the future is N 2, where N is the original count of losses before the counter is reset to zero. N 2 has a geometric distribution with the identical

parameterp=0.4. In other words, the conditional random variable N 2 N > is also 2 geometrically distributed with p=0.4.

(

² ²

) ^{( )}

¹ ¹ ¹ ^{1 1.5}

E N N E N 0.4

> = = p = =

( ) ( )

2 2

1 1 1 1

2 2 3.75

0.4 0.4

Var N N Var N

p p

> = = = =

(

) (

² ²

)

E N N > =E N N > + =1.5+2=3.5

(

) (

² ²

)

^3.75

Var N N > =Var N N > =

Homework for you: rework all the problems in this chapter.

Page 127 of 425

In document Guo EXAM P (Page 120-128)