
Example 1: Dear Abby. Stat Camp for the Full-time MBA Program


Academic year: 2021


Stat Camp for the Full-time MBA Program

Daniel Solow

Lecture 4

The Normal Distribution and the Central Limit Theorem

Example 1: Dear Abby

You wrote that a woman is pregnant for 266 days. Who said so? I carried my baby for ten months and five days, and there is no doubt about it because I know the exact date my baby was conceived. My husband is in the Navy and it couldn’t possibly have been any other time because I saw him only once for an hour, and I didn’t see him again until the day before the baby was born.

I don’t drink or run around, and there is no way this baby isn’t his, so please print a retraction about the 266-day carrying time because otherwise I am in a lot of trouble.

San Diego Reader

Dear Abby

Step 1: Identify an appropriate random variable.

Y = number of days of pregnancy

What are the possible values for Y? About 230 – 290?

What is the density function for Y? ???

[Figure: probability density plotted against days, with axis ticks from 255 to 275]

Idea: Approximate the density of Y with a normal!

Dear Abby

Question: If you are going to use a normal approximation, what information do you need?

Answer: The mean and standard deviation.

Fact: According to the collective experience of generations of pediatricians, pregnancies have a mean of 266 days and a standard deviation of 16 days, so Y ~ N(μ = 266, σ = 16).

Question: What are the possible values for Y? –∞ to +∞.

Question: How can the number of days of pregnancy be < 230?

Answer: Using the normal distribution, you have that P(Y < 230) = NORMDIST(230, 266, 16, TRUE) ≈ 0.01. Thus, when using the normal approximation, there is only about a 1% chance that a pregnancy lasts less than 230 days.

Dear Abby

Step 2: State what you are looking for as a probability question in terms of the rv.

You want to find P(Y ≥ 10 mo. and 5 days) = P(Y ≥ 310).

Step 3: Use the probability distribution of the rv to answer the probability question.

P(Y ≥ 310) = 1 – P(Y < 310) = 1 – NORMDIST(310, 266, 16, TRUE) = 0.00298

Was she telling the truth? Possibly, but highly unlikely.
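The NORMDIST calculations for the pregnancy example can be cross-checked outside a spreadsheet. The following is a minimal sketch in Python, assuming the standard-library `statistics.NormalDist` class as a stand-in for Excel's NORMDIST; the variable names are illustrative and do not come from the slides.

```python
from statistics import NormalDist

# Pregnancy length in days, modeled as Y ~ N(mu = 266, sigma = 16).
pregnancy = NormalDist(mu=266, sigma=16)

# P(Y < 230): counterpart of NORMDIST(230, 266, 16, TRUE).
p_below_230 = pregnancy.cdf(230)

# P(Y >= 310) = 1 - P(Y < 310): counterpart of 1 - NORMDIST(310, 266, 16, TRUE).
p_at_least_310 = 1 - pregnancy.cdf(310)

print(f"P(Y < 230)  = {p_below_230:.4f}")
print(f"P(Y >= 310) = {p_at_least_310:.5f}")
```

Because the normal model is continuous, P(Y ≥ 310) and P(Y > 310) are the same number, which is why the complement of the cumulative probability can be used directly.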

Example 2: Problem of GoodTire

GoodTire has a new tire for which, in order to be competitive, they want to offer a warranty of 30,000 miles. Before doing so, the company wants to know what fraction of tires they can expect to be returned under the warranty.

The Problem of GoodTire

Step 1: Identify an appropriate random variable.

For GoodTire, let X = number of miles such a tire will last.

What are the possible values for X? 0 – 90000?

What is the density function for X? ???

From statistical analysis of a random sample, GoodTire believes the mileage follows approximately a normal distribution with a mean of 40,000 miles and a standard deviation of 10,000 miles, so assume that X ~ N(μ = 40000, σ = 10000) with possible values: –∞ to +∞.

The Problem of GoodTire

Step 2: State what you are looking for in terms of a probability question pertaining to the random variable.

• GoodTire wants to know the

Fraction of tires returned = Likelihood a tire fails = P{X ≤ 30000} = ?

The Problem of GoodTire

Step 3: Use the probability distribution of the random variable to answer the probability question.

• For GoodTire, with X ~ N(40000, 10000), you have

P{X ≤ 30000} = NORMDIST(30000, 40000, 10000, TRUE) = 0.1587

The Problem of GoodTire

Question: The CEO finds that a 16% return rate is too high. What warranty mileage s should they offer to get a 5% return rate?

Step 2: Probability Question: What should s be so that P{X ≤ s} = 0.05?

Step 3: s = NORMINV(0.05, 40000, 10000) = 23551.47

Fact: While you cannot control the value of a rv, you can control the likelihood of certain events occurring with that rv.
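Both GoodTire questions (the return rate for a given warranty, and the warranty for a given return rate) can be reproduced in a few lines. This is a sketch only, assuming Python's standard-library `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORMDIST and NORMINV; the names are illustrative, and results may differ from the Excel figures in the last digit.

```python
from statistics import NormalDist

# Tire life in miles, modeled as X ~ N(mu = 40000, sigma = 10000).
tire_life = NormalDist(mu=40_000, sigma=10_000)

# Fraction of tires expected back under a 30,000-mile warranty: P(X <= 30000).
return_rate = tire_life.cdf(30_000)

# Warranty mileage s with P(X <= s) = 0.05: counterpart of NORMINV(0.05, 40000, 10000).
warranty_miles = tire_life.inv_cdf(0.05)

print(f"Return rate at 30,000 miles: {return_rate:.4f}")
print(f"Warranty for a 5% return rate: {warranty_miles:.2f} miles")
```

The two calls are inverses of each other: `cdf` maps a mileage to a probability, and `inv_cdf` maps a target probability back to a mileage.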

Example 3: Marketing Projections

• From historical data over a number of years, a firm knows that its annual sales average $25 million. For planning purposes, the CEO wants to know the likelihood that sales next year will:

– Exceed $30 million.

– Be within $1.5 million of the average.

• The CEO is willing to issue bonuses if sales are “sufficiently” high. What level should be set so that bonuses are given at most 20% of the time?

Marketing Projections

Step 1: Identify an appropriate random variable.

• Let Y = next year’s sales in $ millions.

What are the possible values for Y? 0 – 50?

What is the density function for Y? ???

From statistical analysis over a number of years, they believe that annual sales follow approximately a normal distribution with a mean of $25 mil. and a standard deviation of $3 mil., so assume that Y ~ N(μ = 25, σ = 3).

Marketing Projections

Step 2: State what you are looking for in terms of a probability question pertaining to the random variable.

• You want to know:

• P(sales exceed $30 mil.) = P(Y ≥ 30).

• P(sales are within $1.5 mil. of $25 mil.) = P(23.5 ≤ Y ≤ 26.5).

• What should be the value of sales (s) so that P(giving a bonus) = 0.20, that is, P(Y ≥ s) = 0.20?

Marketing Projections

Step 3: Use the probability distribution of the random variable to answer the probability question.

• From Excel, using μ = 25 and σ = 3:

• P(Y ≥ 30) = 1 – NORMDIST(30, 25, 3, TRUE) = 0.048.

• P(23.5 ≤ Y ≤ 26.5) = NORMDIST(26.5, 25, 3, TRUE) – NORMDIST(23.5, 25, 3, TRUE) = 0.383.

• s = NORMINV(0.8, 25, 3) = 27.524.
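The three sales questions map onto three kinds of normal calculation: an upper tail, an interval, and an inverse lookup. A hedged sketch in Python follows, assuming the standard-library `statistics.NormalDist` in place of the Excel functions; the variable names are illustrative, and the last digit may differ from spreadsheet output.

```python
from statistics import NormalDist

# Next year's sales in $ millions, modeled as Y ~ N(mu = 25, sigma = 3).
sales = NormalDist(mu=25, sigma=3)

# Upper tail: P(Y >= 30) = 1 - P(Y < 30).
p_exceed_30 = 1 - sales.cdf(30)

# Interval: P(23.5 <= Y <= 26.5) is a difference of two cumulative probabilities.
p_within_1_5 = sales.cdf(26.5) - sales.cdf(23.5)

# Inverse lookup: the level s with P(Y >= s) = 0.20, i.e. P(Y < s) = 0.80.
bonus_level = sales.inv_cdf(0.80)

print(f"P(Y >= 30)           = {p_exceed_30:.3f}")
print(f"P(23.5 <= Y <= 26.5) = {p_within_1_5:.3f}")
print(f"Bonus threshold s    = {bonus_level:.3f}")
```

Note that the bonus question specifies an upper-tail probability of 0.20, so the inverse function is called with the complementary value 0.80.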

Example 4: DUI Test

• In many states, a driver is legally drunk if the blood alcohol concentration, as determined by a breath analyzer, is 0.10% or higher.

• Suppose that a driver has a true blood alcohol concentration of 0.095%. With the breath analyzer test, what is the probability that the person will be (incorrectly) booked on a DUI charge?

Step 1: Identify an appropriate random variable.

Let Y = the measurement of the analyzer as a %.

Question: What are the possible values for Y? 0 – 0.3?

DUI Test

Step 1 (continued).

Question: What is the density function for Y?

Answer: We do not know, but experience indicates that Y follows approximately a normal distribution with mean equal to the person’s true alcohol level and standard deviation equal to 0.004%, so Y ~ N(μ, σ = 0.004), where

μ = the person’s true blood alcohol level (%)

DUI Test

Step 2: State what you are looking for in terms of a probability question pertaining to the random variable.

• You want to know the probability that a person with μ = 0.095 will be (incorrectly) booked on a DUI charge:

P(being booked on a DUI) = P(Y ≥ 0.10)

DUI Test

Step 3: Use the probability distribution of the random variable to answer the probability question.

• From Excel (using μ = 0.095 and σ = 0.004):

P(Y ≥ 0.10) = 1 – NORMDIST(0.10, 0.095, 0.004, TRUE) = 0.1056.

• There is about a 10% chance that such a person will be incorrectly charged with a DUI.
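Since the mean of the analyzer reading is the driver's true level, the same calculation can be wrapped in a function and evaluated for any true level. The sketch below is an illustration under that assumption, using Python's standard-library `statistics.NormalDist`; the function name `probability_booked` and its default arguments are hypothetical, not part of the slides.

```python
from statistics import NormalDist

def probability_booked(true_level, sigma=0.004, legal_limit=0.10):
    """P(Y >= legal_limit) when the reading Y ~ N(true_level, sigma), all in %."""
    reading = NormalDist(mu=true_level, sigma=sigma)
    return 1 - reading.cdf(legal_limit)

# The case from the slides: true level 0.095%, limit 0.10%.
p_booked = probability_booked(0.095)
print(f"P(booked | true level 0.095%) = {p_booked:.4f}")

# A driver exactly at the limit is booked half the time under this model.
p_at_limit = probability_booked(0.10)
print(f"P(booked | true level 0.100%) = {p_at_limit:.4f}")
```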

An Insurance Problem

GoodHands is considering insuring employees of GoodTire. What annual premium should the company charge to be sure that there is a likelihood of no more than 1% of losing money on each customer?

This is an example of decision making under uncertainty: you have to make a decision today (how much should the annual premium be) facing an uncertain future.

Question: Why is the future uncertain?

Solving the Insurance Problem

Step 1: Identify an appropriate random variable.

Let X = the $ claimed by a customer in one year.

What are the possible values for X? [0, 100000 (?)]

Is X continuous or discrete? Discrete.

What is the density function for X? It is unknown, so borrow one.

From statistical analysis, the annual claim for these people follows approximately a normal distribution with a mean of $2500 and a standard deviation of $1000, so: X ~ N(μ = 2500, σ = 1000)

Note: It can be OK to approximate a discrete RV with a continuous distribution.

An Insurance Problem

Step 2: State what you are looking for in terms of a probability question pertaining to the RV.

For GoodHands, what should the premium s be so that the likelihood of losing money is no more than 1%?

Question: When do you lose money on a customer? When the claim exceeds the premium, that is, when X ≥ s.

Probability Question: What should the premium s be so that P(X ≥ s) = 0.01, where X ~ N(2500, 1000)?

An Insurance Problem

Step 3: Use the probability distribution of the random variable to answer the probability question.

With X ~ N(2500, 1000) and P{X ≥ s} = 0.01:

s = NORMINV(0.99, 2500, 1000) = $4826.35

Fact: While you cannot control the value of a rv (such as the claim of a person), you can control the likelihood of certain events occurring with that rv (such as the likelihood of such a claim exceeding the premium).
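The single-customer premium is an inverse-normal lookup at the 99th percentile. A minimal sketch in Python, assuming the standard-library `statistics.NormalDist` as the counterpart of NORMINV; the variable names are illustrative.

```python
from statistics import NormalDist

# Annual claim in $ for one customer, modeled as X ~ N(mu = 2500, sigma = 1000).
claim = NormalDist(mu=2500, sigma=1000)

# Premium s with P(X >= s) = 0.01, i.e. P(X < s) = 0.99.
premium = claim.inv_cdf(0.99)

# Sanity check: the chance that a claim exceeds this premium.
p_loss = 1 - claim.cdf(premium)

print(f"Premium per customer: ${premium:.2f}")
print(f"P(claim exceeds premium) = {p_loss:.4f}")
```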

The Insurance Problem (cont.)

Question: GoodTire wants to insure all 100 of its employees through GoodHands. What premium should GoodHands charge per employee so that the likelihood of losing money on the average of all these claims is 1%?

Step 1: Identify appropriate random variables.

For GoodHands, let

Xi = the $ annual claim of customer i (i = 1, …, 100), with Xi ~ N(μ = 2500, σ = 1000), and

X̄ = (X1 + … + X100) / 100.

Question: What is the distribution of the random variable X̄?

Answer: You do not know. However, because X̄ is the AVERAGE of other rvs, try…

The Central Limit Theorem

The Central Limit Theorem provides an approximate density function when the rv you are interested in is the average of n other rvs, say, X1, X2, …, Xn, that are:

(1) Independent (knowing the value of one rv tells you nothing about the values of the other rvs).

(2) Identically distributed (have the same density function with mean μ and standard deviation σ).

Then, for “large” n,

X̄ = (X1 + … + Xn) / n ~ N(μ, σ/√n) (approx.)

The Insurance Problem (cont.)

For the insurance problem, you have

Xi = annual $ claimed by person i (i = 1, …, 100) ~ N(μ = 2500, σ = 1000).

(1) Are X1, X2, …, X100 independent random variables? Yes, because the amount claimed by one person has no effect on the amount claimed by another person.

(2) Are X1, X2, …, X100 identically distributed? Yes, because each Xi ~ N(μ = 2500, σ = 1000).

Therefore, by the CLT, X̄ is approximately Normal with

X̄ = (X1 + … + X100) / 100 ~ N(2500, 1000/√100) = N(2500, 100).

An Insurance Problem

Step 2: State what you are looking for in terms of a probability question pertaining to the random variable.

For GoodHands, what should the premium s be so that the probability that the average of the 100 claims exceeds s is 0.01?

Probability Question: What should s be so that

P(X̄ ≥ s) = P((X1 + … + X100) / 100 ≥ s) = 0.01, where X̄ ~ N(2500, 100)?

An Insurance Problem (cont.)

Probability Question: What should the premium s be so that P{X̄ ≥ s} = 0.01?

Step 3: Use the probability distribution of the random variable to answer the probability question.

s = NORMINV(0.99, 2500, 100) = $2732.64
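The effect of averaging over many customers can be seen by computing the premium for different group sizes. The following is a sketch under the CLT assumptions stated in the slides (independent, identically distributed claims), using Python's standard-library `statistics.NormalDist`; the helper name `premium_for_group` is hypothetical, and the last digit may differ from the Excel figure.

```python
from math import sqrt
from statistics import NormalDist

def premium_for_group(n, mu=2500.0, sigma=1000.0, loss_probability=0.01):
    """Premium s with P(average of n claims >= s) = loss_probability.

    Uses the CLT approximation: the average is N(mu, sigma / sqrt(n)).
    """
    average_claim = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return average_claim.inv_cdf(1 - loss_probability)

premium_single = premium_for_group(1)
premium_100 = premium_for_group(100)

print(f"Premium for 1 customer:    ${premium_single:.2f}")
print(f"Premium for 100 customers: ${premium_100:.2f}")
```

Pooling 100 customers shrinks the standard deviation of the average from $1000 to $100, so the premium needed for the same 1% risk sits much closer to the mean claim of $2500.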

Another Example of the CLT

• In modeling the performance of a team with 5 people, consider the following five rvs:

Pi = performance contribution of person i, for i = 1, …, 5

Possible values: [0, 1] (continuous)

Density function: U[0, 1]

E[Pi] = μ = 0.5, STDEV[Pi] = σ = 1/√12 = 0.29

However, what is of interest is the team performance.

Another Example of the CLT

T = performance of the whole team = (P1 + P2 + P3 + P4 + P5) / 5

Possible values: [0, 1] (continuous)

Density function: ???

You cannot find the true density function, so borrow one.

Because the rv T is the average of other rvs, think of using the Central Limit Theorem to approximate the density function of T.

The Team Problem

For the team problem, you have

Pi = performance of person i (i = 1, 2, 3, 4, 5) ~ U[0, 1] with mean μ = 0.5 and std. dev. σ = 0.29.

(1) Are P1, P2, P3, P4, P5 independent random variables? Yes, assuming that the performance of a person says nothing about the performance of another person.

(2) Are P1, P2, P3, P4, P5 identically distributed? Yes, because each Pi ~ U[0, 1].

Therefore, by the CLT, T is approximately Normal with

T = (P1 + P2 + P3 + P4 + P5) / 5 ~ N(0.5, 0.29/√5) = N(0.5, 0.13).

The Team Problem

Question: What is the probability that the team performance is at least 0.75?

With T ~ N(0.5, 0.13):

P(T ≥ 0.75) = 1 – NORMDIST(0.75, 0.5, 0.13, TRUE) = 0.027

The Average of a Sample

Suppose you are going to record the numbers X1, X2, …, Xn taken from a sample of size n from a population and then compute:

X̄ = (X1 + … + Xn) / n

Is X̄ a rv? The answer depends on “timing”. If you have already taken the sample, then X̄ is NOT a rv. If you have not yet taken the sample, then X̄ IS a rv.

All possible values: The (finite) list of averages of every group of size n in the population.

Groups of size n:    X̄ for the group:
G1                   A1
G2                   A2
G3                   A3

There is no practical way to list the possible values, so…

Discrete, but… YOU CANNOT WRITE THE DENSITY FUNCTION.

The Average of a Sample

X̄ = (X1 + … + Xn) / n

The rvs X1, X2, …, Xn are iid from the same population with mean = μ and std. dev. = σ.

Solution: Because X̄ is the average of rvs, think of using the CLT which, if applicable, results in the following density function for X̄:

X̄ ~ N(μ, σ/√n), with possible values (–∞, +∞).

Now you can use the Normal Distribution to answer your probability question about X̄.

A Final Example of the CLT

• Historical data collected at a paper mill show that 40% of sheet breaks are due to water drops, resulting from the condensation of steam.

• Suppose that the causes of the next 100 sheet breaks are monitored and that the sheet breaks are independent of one another.

• Find the expected value and the standard deviation of the number of sheet breaks that will be caused by water drops.

• What is the probability that at least 35 of the breaks will be due to water drops?

Exact Answer

• Success = break due to water drops

• P(success) = p = 0.4

• X = number of breaks due to water drops

• X is Binomial with n = 100 and p = 0.4

• E(X) = np = (100)(0.4) = 40

• SD(X) = √(np(1 – p)) = √((100)(0.4)(0.6)) = √24 = 4.9

• From Excel: P(X ≥ 35) = 1 – P(X < 35) = 1 – P(X ≤ 34) = 1 – BINOMDIST(34, 100, 0.4, TRUE) = 0.8697
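The exact binomial calculation can be reproduced without a spreadsheet by summing the probability mass function directly. A minimal sketch in Python, assuming only the standard library; the helper `binomial_cdf` is a hypothetical stand-in for BINOMDIST with its cumulative flag set to TRUE.

```python
from math import comb, sqrt

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

n, p = 100, 0.4

expected_breaks = n * p                # E(X) = np
sd_breaks = sqrt(n * p * (1 - p))      # SD(X) = sqrt(np(1 - p))

# P(X >= 35) = 1 - P(X <= 34), because X takes whole-number values only.
p_at_least_35 = 1 - binomial_cdf(34, n, p)

print(f"E(X)  = {expected_breaks:.1f}")
print(f"SD(X) = {sd_breaks:.2f}")
print(f"P(X >= 35) = {p_at_least_35:.4f}")
```

The step from "at least 35" to "at most 34" matters for a discrete rv: using 35 in the cumulative sum would wrongly exclude the outcome X = 35.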

Normal Approx. to Binomial

For this problem, let p = P(success) = 0.4, and

Xi = 1 if a success on trial i, and Xi = 0 if a failure on trial i, for i = 1, …, 100.

In this problem, you are interested in the rv

X = number of successes in 100 trials = X1 + X2 + … + X100

To find P(X ≥ 35) = P(X/100 ≥ 35/100), you need to know the probability distribution of X/100, which, by the CLT, is approximately normal, so…

Normal Approx. to Binomial

Each Xi ~ Binomial(1, p = 0.4), so

E[Xi] = μ = p = 0.4

SD[Xi] = σ = √(p(1 – p)) = 0.49

Assuming that

• The Xi are independent and

• n = 100 is large enough (np > 5 and n(1 – p) > 5),

then by the CLT, the random variable

X/100 = (X1 + … + X100) / 100 ~ N(μ, σ/√n) = N(0.4, 0.049).

Normal Approx. to Binomial

Then, for X = X1 + … + X100,

P(X ≥ 35) = P(X/100 ≥ 35/100) = P(X/100 ≥ 0.35) = 1 – NORMDIST(0.35, 0.4, 0.049, TRUE) = 0.85.

(The exact answer was 0.87.)
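The normal approximation to the binomial can be set up directly from p and n, which also makes the "large enough" check explicit. This is a sketch under the slide's assumptions, using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.4

# Rule of thumb from the slides: np > 5 and n(1 - p) > 5.
large_enough = n * p > 5 and n * (1 - p) > 5

# Each trial has mean p and standard deviation sqrt(p(1 - p)), so by the CLT
# the sample proportion X/n is approximately N(p, sqrt(p(1 - p)) / sqrt(n)).
sigma_trial = sqrt(p * (1 - p))
proportion = NormalDist(mu=p, sigma=sigma_trial / sqrt(n))

# P(X >= 35) = P(X/100 >= 0.35).
p_approx = 1 - proportion.cdf(35 / n)

print(f"Rule of thumb satisfied: {large_enough}")
print(f"Normal approximation of P(X >= 35): {p_approx:.3f}")
```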

Review of Basic Math

A function y = f(x) describes a relationship between the two quantitative variables x and y.

• y = f(x) = –x + 2 (a linear relationship)

• y = f(x) = x^2 – 2x + 1 (a nonlinear relationship)

You can represent a function visually as follows:

[Figure: graphs of y against x for the two functions]

Review of Functions

You can also think of a function f as transforming an input x into an output y, as follows:

x → f → f(x) = y

Note: A function f can have many input values, instead of just one.

Review of Linear Equations

A linear equation y = mx + b provides a relationship between the two variables, x and y, in which:

• b = the y-intercept = the value of y when x = 0.

• m = the slope of the line = the change in y per unit of increase in x.

m > 0: as x increases, y increases.

m = 0: as x increases, y remains the same.

m < 0: as x increases, y decreases.

[Figure: the line y = mx + b, showing the intercept b and the rise m over a one-unit step from x to x + 1, with sketches of lines for m > 0, m = 0, and m < 0]

An Example of a Line

If

y = the thousands of bushels of wheat

x = the number of inches of rain

then, for the line y = 80x + 71,

b = 71 means that there are 71,000 bushels of wheat when there is no rain.

m = 80 means that each extra inch of rain results in 80,000 more bushels of wheat.

A Different Equation for a Line

Sometimes a line is written in the form:

a1 x1 + a2 x2 = c

Assuming that a2 ≠ 0, you can solve for x2:

x2 = –(a1 / a2) x1 + (c / a2)

which has the form y = m x + b.

How Large is Large Enough?

• For symmetric but outlier-prone data, n = 15 samples should be enough to use the normal approximation.

• For mild skewness, n = 30 should generally be sufficient to make the normal approximation appropriate.

• For severe skewness, n should be at least 100 to use the normal approximation.

• Generally speaking, the larger n is, the better the normal approximation is.

Graphing a Line

To draw the graph of the line a1 x1 + a2 x2 = b:

• Find two different points on the line (usually by setting x1 = 0 and finding x2 and then setting x2 = 0 and finding x1).

• Plot these two points on a graph.

• Draw the straight line through those two points.

Example of Graphing a Line

The line: 2x1 + x2 = 230 When x1= 0, x2= 230 When x2= 0, x1= 115 300 x2 233 2 , 1

Note: Any point on the line gives a value for x1

and a value for x2that

satisfies 2x1 + x2 = 230. x1 300 200 100 100 200

Solving Two Linear Equations

Objective: Solve the following two equations for x1 and x2:

2x1 + x2 = 230 (a)

x1 + 2x2 = 250 (b)

Solution Procedure:

– Solve (a) for x2: x2 = 230 – 2x1 (c)

– Substitute x2 = 230 – 2x1 in (b): x1 + 2(230 – 2x1) = –3x1 + 460 = 250 (d)

– Solve (d) for x1: x1 = 70

– Substitute x1 = 70 in (c): x2 = 230 – 2x1 = 90.

Another Approach

Objective: Solve the following for x1 and x2:

(a) 2x1 + x2 = 230

(b) x1 + 2x2 = 250

Alternative Procedure:

– Multiply (a) through by 2.

(c) 4x1 + 2x2 = 460

– Subtract (b) from (c).

(c) 4x1 + 2x2 = 460

–[(b) x1 + 2x2 = 250]

(d) 3x1 = 210

– Solve (d) for x1: x1 = 70

– Substitute x1 = 70 in (a) and solve for x2: x2 = 230 – 2x1 = 90

Note: There are computer packages for solving n linear equations in n unknowns.
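For a system of two equations, the elimination steps can be condensed into a closed-form rule. The sketch below uses Cramer's rule, which is a different technique from the substitution and elimination procedures in the slides but yields the same solution; the function name `solve_2x2` and its argument order are hypothetical.

```python
def solve_2x2(a11, a12, c1, a21, a22, c2):
    """Solve a11*x1 + a12*x2 = c1 and a21*x1 + a22*x2 = c2 by Cramer's rule."""
    determinant = a11 * a22 - a12 * a21
    if determinant == 0:
        raise ValueError("The two lines are parallel or identical: no unique solution.")
    x1 = (c1 * a22 - a12 * c2) / determinant
    x2 = (a11 * c2 - c1 * a21) / determinant
    return x1, x2

# Equations (a) 2x1 + x2 = 230 and (b) x1 + 2x2 = 250 from the slides.
x1, x2 = solve_2x2(2, 1, 230, 1, 2, 250)
print(f"x1 = {x1}, x2 = {x2}")

# Check the solution by substituting back into both equations.
residual_a = 2 * x1 + x2 - 230
residual_b = x1 + 2 * x2 - 250
```

For larger systems, a numerical linear algebra package would normally be used instead of a hand-written formula.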

Exponentials

• An exponent is the power to which a number (called the base) is raised.

Example: 2^5 (base = 2; exponent = 5)

Question: How much will $1000 be worth after 5 years at 6% compound interest?

            Year 1      Year 2      Year 3      Year 4      Year 5
Principal   $1,000.00   $1,060.00   $1,123.60   $1,191.02   $1,262.48
Interest    $60.00      $63.60      $67.42      $71.46      $75.75
Total       $1,060.00   $1,123.60   $1,191.02   $1,262.48   $1,338.23

Answer: Total = f(P, r, n) = P(1 + r)^n = 1000 (1 + 0.06)^5 = 1338.23
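The year-by-year table and the closed-form formula P(1 + r)^n describe the same process, and a short loop makes the connection visible. This is an illustrative sketch; the function name `compound_schedule` and the row layout are assumptions rather than anything defined in the slides.

```python
def compound_schedule(principal, rate, years):
    """Return one (year, opening principal, interest, closing total) row per year."""
    rows = []
    balance = principal
    for year in range(1, years + 1):
        interest = balance * rate
        rows.append((year, balance, interest, balance + interest))
        balance += interest  # this year's total is next year's principal
    return rows

schedule = compound_schedule(1000, 0.06, 5)
for year, opening, interest, total in schedule:
    print(f"Year {year}: principal {opening:9.2f}  interest {interest:6.2f}  total {total:9.2f}")

final_total = schedule[-1][3]
closed_form = 1000 * (1 + 0.06) ** 5  # P(1 + r)^n
```

Each pass through the loop multiplies the balance by (1 + r), so after n passes the balance is the starting principal times (1 + r)^n.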

Properties of Exponents

• Laws of Exponents:

– x^(a + b) = x^(b + a) = x^a x^b (example: 2^(3 + 2) = 2^3 2^2)

– (x^a)^b = (x^b)^a = x^(ab) (example: (2^3)^2 = 2^6)

– x^(–a) = 1 / x^a (example: 2^(–3) = 1 / 2^3 = 1 / 8)

– x^0 = 1

• Exponential Functions Increase and Decrease Rapidly:

[Figure: two plots for x from 0 to 25, y = 2^x (rising past 1,000,000) and y = 2^(-x) (falling toward 0)]

Scientific Notation

Scientific Notation: a × 10^b (also written as a E ±b) means move the decimal point of a:

• b positions to the right, if b > 0.

• –b positions to the left, if b < 0.

Example: 4.000 × 10^3 = 4.000 E+3 = 4000.

Example: 4 × 10^(–3) = 4 E–3 = 0.004.

Logarithms

• The log base b of x [written log_b(x)] is the power to which you must raise b to get x.

Examples: log_10(100) = 2, log_2(32) = 5

• Logs are only defined for positive numbers.

• If the base is omitted, the default is 10.

• The base e = 2.718… is used in some financial applications (such as continuous compounding), in which case, log_e(x) is written as ln(x) (the natural log).

Laws of Logarithms

• Logs convert products to sums, that is, log_b(xy) = log_b(x) + log_b(y).

– Ex: log_2(64) = log_2(4 × 16) = log_2(4) + log_2(16) = 2 + 4 = 6

• log_b(x / y) = log_b(x) – log_b(y)

– Ex: log_10(1000 / 100) = log_10(1000) – log_10(100) = 3 – 2 = 1

• Logs bring down exponents, that is, log_b(x^y) = y log_b(x).

– Example: log_2(4^5) = 5 log_2(4) = 5(2) = 10

• Logs undo exponentiation, that is, log_b(b^y) = y log_b(b) = y.

– Example: log_2(2^5) = 5

• log_a(x) = k log_b(x), where k = log_a(b)

– Example: log_2(x) = 3.322 log_10(x)
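Each of the laws of logarithms can be spot-checked numerically. The sketch below does so with Python's standard `math` module, using `isclose` because floating-point logs are not always exact; the test value `x` is an arbitrary assumption.

```python
from math import isclose, log2, log10

# Products become sums: log_2(4 * 16) = log_2(4) + log_2(16).
product_rule = isclose(log2(4 * 16), log2(4) + log2(16))

# Quotients become differences: log_10(1000 / 100) = log_10(1000) - log_10(100).
quotient_rule = isclose(log10(1000 / 100), log10(1000) - log10(100))

# Exponents come down: log_2(4^5) = 5 * log_2(4).
power_rule = isclose(log2(4 ** 5), 5 * log2(4))

# Change of base: log_2(x) = k * log_10(x) with k = log_2(10).
k = log2(10)
x = 7.5  # arbitrary positive test value (an assumption)
change_of_base = isclose(log2(x), k * log10(x))

print(product_rule, quotient_rule, power_rule, change_of_base)
print(f"k = log_2(10) = {k:.3f}")
```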

Problem Solving with Logs

Question: How many years will it take to double an investment at i% interest compounded annually?

Answer: Let

P = the initial investment

r = interest rate as a fraction = i / 100

n = the number of years of compounding

Then, after n years, you will have P(1 + r)^n.

Problem Solving with Logs

Answer (continued): Thus, you want to find n so that

P(1 + r)^n = 2P, that is, (1 + r)^n = 2 (a)

To solve (a) for n, take the log of both sides to bring the exponent n down:

log[(1 + r)^n] = log(2)

n log(1 + r) = log(2)

n = log(2) / log(1 + r)

• Example: At 6% (r = 0.06), it will take n = log(2) / log(1.06) = 0.3010 / 0.0253 = 11.9 years.

Qn: Log base what?

Ans: Log base 10 (but any base will work).
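The doubling-time formula, and the remark that any log base works, can both be checked numerically. A minimal sketch in Python, assuming only the standard `math` module; the function name `years_to_double` is hypothetical.

```python
from math import log, log10

def years_to_double(rate):
    """Years n with (1 + rate)^n = 2, using base-10 logs: n = log(2) / log(1 + rate)."""
    return log10(2) / log10(1 + rate)

n_base10 = years_to_double(0.06)

# Same formula with natural logs; the base cancels in the ratio.
n_natural = log(2) / log(1.06)

# Verify by plugging n back into the growth formula.
growth_factor = 1.06 ** n_base10

print(f"Doubling time at 6% (base 10): {n_base10:.1f} years")
print(f"Doubling time at 6% (base e):  {n_natural:.1f} years")
print(f"(1.06)^n = {growth_factor:.4f}")
```

The result is a fractional number of years; with annual compounding the balance first exceeds twice the principal at the end of year 12.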
