Normal distribution


CHAPTER 6

Normal distribution

Are the diameters of the rods exactly equal?


In your earlier studies of statistics, frequency distributions were considered and it was stated

that, although they are sometimes of importance in themselves, they are mainly important

in providing information about the population from which the sample is drawn.

[Figure 6-1: Histogram. Figure 6-2: Frequency curve. In each figure the area between the values a and b is shaded.]

Consider a relative frequency distribution of a continuous variable, represented by the histogram (Figure 6-1). It is not difficult to imagine that, if the sample size is increased indefinitely, it will approach the population size and, if the class intervals are made as small as possible, the histogram will approach a well-defined smooth curve (Figure 6-2). Such a curve is called a frequency curve or probability density curve, and it provides us with information about a population in much the same way as the histogram does about the sample. The shaded area in the histogram represents the relative frequency with which the variable lies between the values a and b in the sample, and corresponds to the shaded area in the frequency curve, which represents the relative frequency or probability with which the variable lies between the values a and b in the population. If the probability density curve is drawn so that the total area under the curve is unity and if the equation of the curve y = f(x) is known, then the probability that X lies in the interval a to b can be calculated by the process of integration:

$$\Pr(a < X < b) = \int_a^b f(x)\,dx.$$

The mean and standard deviation of a population are denoted by the Greek letters µ and σ respectively and are called parameters, to distinguish them from the mean and standard deviation of a sample, which are denoted by the Roman letters x̄ and s respectively and are called statistics.

6.1 The normal distribution


One of the most important examples of a probability distribution of a continuous variable is the normal distribution. If a random variable, X, is normally distributed, its frequency curve has a typical symmetrical bell shape as shown in Figure 6-3. This curve has been found to give an adequate fit to a great variety of frequency distributions. The heights of children of a certain age, the diameters of metal cylinders, the number of burning hours of electric light globes manufactured by a particular firm, and the Intelligence Quotient of children in a certain area are a few examples of distributions which are approximately normal.

The normal distribution is defined by the equation of its frequency curve:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

where:

µ = the mean value of X in the population

and:

σ = the standard deviation of X in the population.

How the equation of this curve is derived is beyond the scope of this book, but the following properties should be noted.

(i) The curve extends, theoretically at least, to infinity in either direction and so X can assume all values.

(ii) The curve is symmetrical about the ordinate x = µ, and so the mean, mode and median of the normal distribution coincide.

(iii) Practically all of the population (about 99.7 per cent) lies in the interval µ ± 3σ; about 95 per cent lies in the interval µ ± 2σ; about 2/3 of the population lies in the interval µ ± σ.

(iv) The total area under the curve is unity.

(v) The probability that X lies in the interval a to b is equal to the area under the curve bounded by the ordinates x = a and x = b.
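The proportions quoted in property (iii) can be checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` class; the function name `proportion_within` and the default values of `mu` and `sigma` are illustrative choices, not part of the text.

```python
from statistics import NormalDist


def proportion_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Proportion of a normal population lying in the interval mu +/- k*sigma."""
    dist = NormalDist(mu=mu, sigma=sigma)
    # Area under the curve between the ordinates x = mu - k*sigma and x = mu + k*sigma.
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.4f}")
```

Running the sketch with other values of `mu` and `sigma` should give the same three proportions, which is why they can be quoted as properties of every normal distribution.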

6.2 Standard normal curve

Normal distribution curves will differ in location and degree of spread according to the values of the parameters µ and σ respectively.

[Figure 6-4 and Figure 6-5: pairs of normal curves plotted against x.]

In Figure 6-4, the curves differ in location but have the same degree of spread, i.e. σ is the same for both, but µ is different.


It would appear then, that the area for any interval would require special determination for each particular curve. However, this is not the case, since all normal curves can be transformed into a standard normal curve by putting µ = 0 and σ = 1. The equation of the standard curve is:

$$Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$

To transform the normal equation to the standard normal equation put:

$$z = \frac{x-\mu}{\sigma} \quad\text{and}\quad Y = \sigma y.$$

When z = 0, $Y = \frac{1}{\sqrt{2\pi}}e^{0} = \frac{1}{\sqrt{2\pi}} = 0.40$

When z = ±1, $Y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}} = 0.24$

When z = ±2, $Y = \frac{1}{\sqrt{2\pi}}e^{-2} = 0.05$

When z = ±3, $Y = \frac{1}{\sqrt{2\pi}}e^{-4.5} = 0.004$ (using a calculator)

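For example, to evaluate $\frac{1}{\sqrt{2\pi}}e^{-4.5}$, we can proceed as follows: [calculator key sequence illegible in the source; the displayed result is 0.0044]

[Figure 6-6: Standard normal curve, with z from −3 to 3 on the horizontal axis and Y from 0 to 0.4 on the vertical axis.]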
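The ordinates of the standard normal curve quoted above can also be evaluated in a few lines of code. This is a minimal sketch using only Python's `math` module; the function name `standard_normal_ordinate` is an illustrative choice.

```python
import math


def standard_normal_ordinate(z: float) -> float:
    """Height Y of the standard normal curve at z: exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# The curve is symmetrical, so the ordinate at -z equals the ordinate at +z.
for z in (0, 1, 2, 3):
    print(f"z = +/-{z}: Y = {standard_normal_ordinate(z):.4f}")
```

To two decimal places the first three values agree with 0.40, 0.24 and 0.05, and the last agrees with 0.0044.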

The total area under this curve is unity, and the area from −∞ to any positive value of z is given in the normal probability tables provided (see page 178).

To find the area under any normal curve between the ordinates x₁ and x₂, we find the corresponding values of z₁ and z₂ from the transformation formulae:

$$z_1 = \frac{x_1-\mu}{\sigma} \quad\text{and}\quad z_2 = \frac{x_2-\mu}{\sigma}$$

Example 1

The heights of VCE students in Victoria may be considered to be normally distributed with

a mean of 170 cm and a standard deviation of 5 cm.

a What is the probability that a student, selected at random, has a height between 174 cm

and 178 cm?

b Out of a group of 150 VCE students, how many would be expected to have a height less

than 164 cm?

c What proportion of students would be expected to have heights deviating from the mean

by more than two standard deviations?

[Figure 6-7: Normal curve of heights x from 155 to 185 cm, with the corresponding z scale from −3 to 3; the values z = −1.2, 0.8 and 1.6 are marked and the areas A and B are shaded.]

a The shaded area, A, in Figure 6-7 measures the required probability.

When x = 174: $z = \frac{x-\mu}{\sigma} = \frac{174-170}{5} = 0.8$

Pr(X < 174) = Pr(z < 0.8) = 0.7881 (from tables)

When x = 178: $z = \frac{x-\mu}{\sigma} = \frac{178-170}{5} = 1.6$

Pr(X < 178) = Pr(z < 1.6) = 0.9452 (from tables)

Pr(174 < X < 178) = 0.9452 − 0.7881
= 0.1571

b The shaded area, B, in Figure 6-7 measures the probability of a student having a height less than 164 cm.

When x = 164: $z = \frac{x-\mu}{\sigma} = \frac{164-170}{5} = -1.2$

Pr(X < 164) = Pr(z < −1.2)
= Pr(z > 1.2) (from symmetry of curve)
= 1 − Pr(z < 1.2)
= 1 − 0.8849
= 0.1151

Expected number = 150 × 0.1151 ≈ 17

[Figure 6-8: Normal curve of heights x from 155 to 185 cm, with the corresponding z scale from −3 to 3; the two tail areas C and D are shaded.]

c The required proportion is the sum of the areas C and D in Figure 6-8. By symmetry, these areas are equal. For two standard deviations above or below the mean, z = ±2.

Pr(X > µ + 2σ) = Pr(z > 2)
= 1 − Pr(z < 2)
= 1 − 0.9772 (from tables)
= 0.0228

Pr(X > µ + 2σ or X < µ − 2σ) = 2 × 0.0228 (from symmetry)
= 0.0456

This means that about 5% of the students have heights deviating from the mean by more than two standard deviations, or that about 95% of students have heights within two standard deviations of the mean. This is one of the characteristics of the normal distribution. Verify that about 2/3 of the population lies within one standard deviation of the mean and that practically all of the population (about 99.7%) lies within three standard deviations of the mean.
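The three parts of Example 1 can be checked without tables. The sketch below assumes Python's standard `statistics.NormalDist`; the variable names are illustrative, and small differences from the worked answers are due only to the four-figure rounding of the tables.

```python
from statistics import NormalDist

# Heights of students: mean 170 cm, standard deviation 5 cm (from Example 1).
heights = NormalDist(mu=170, sigma=5)

# a: probability of a height between 174 cm and 178 cm.
p_a = heights.cdf(178) - heights.cdf(174)

# b: expected number, out of 150 students, with height less than 164 cm.
expected_b = 150 * heights.cdf(164)

# c: proportion more than two standard deviations from the mean (both tails).
p_c = 2 * (1 - heights.cdf(170 + 2 * 5))

print(f"a: {p_a:.4f}")
print(f"b: {expected_b:.1f} students")
print(f"c: {p_c:.4f}")
```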

Example 2

A lathe turns out brass cylinders with a mean diameter of 2.00 cm and a standard deviation

of 0.04 cm. Assuming that the distribution of diameters is normal, find the limits to the

acceptable diameters if, on checking, it is found that five per cent in the long run are rejected

because they are oversize and five per cent are rejected because they are undersize.

Each of the shaded areas in Figure 6-9 is five per cent of the total area. This is an

inverse type of problem in which the proportions are given and the x-values of the

inside ends of the shaded portions are to be found. Using the inverse normal

distribution tables on page 178, we find the z value such that the area from −∞ to

this z value is 0.95. Its value is 1.6449. By symmetry, the other z value is −1.6449.

[Figure 6-9: Normal curve of diameters x (1.92, 1.96, 2.00, ... cm) with the corresponding z scale; the two tails beyond z = −1.6449 and z = 1.6449, each of area 0.05, are shaded.]

$$z = \frac{x-\mu}{\sigma}$$

$$\pm 1.6449 = \frac{x-2.00}{0.04}$$

x = 2.00 ± 1.6449 × 0.04
= 2.00 ± 0.07
= 1.93 or 2.07 (cm)

These are the acceptable limits.
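An inverse problem of this kind can be sketched in code with the `inv_cdf` method of Python's `statistics.NormalDist`, which plays the role of the inverse normal tables. The variable names below are illustrative.

```python
from statistics import NormalDist

# Diameters of the cylinders: mean 2.00 cm, standard deviation 0.04 cm (Example 2).
diameters = NormalDist(mu=2.00, sigma=0.04)

# 5% rejected as undersize: the lower limit has 0.05 of the area to its left.
lower = diameters.inv_cdf(0.05)
# 5% rejected as oversize: the upper limit has 0.95 of the area to its left.
upper = diameters.inv_cdf(0.95)

print(f"acceptable diameters: {lower:.2f} cm to {upper:.2f} cm")
```

Rounded to two decimal places, the limits agree with the values 1.93 cm and 2.07 cm found from the tables.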

Example 3

The mean life of a certain type of television tube is 10 000 hours with a standard deviation

of 1000 hours. Assuming the distribution of lifetimes, X, is normal, find the probability that

X is less than any specified value x. Using integral multiples of the standard deviation, plot the cumulative probability curve.

[Figure 6-10: Normal curve of lifetimes from 7 to 13 thousand hours, with the corresponding z scale from −3 to 3; the area to the left of x, representing Pr(X < x), is shaded.]

The shaded area of Figure 6-10 shows the probability that X is less than x, where the values of the variable lie almost certainly between 7000 and 13 000.

When x = 7000: z = −3 and Pr(X < 7000) = 0.0013 (from tables)

When x = 8000: z = −2 and Pr(X < 8000) = 0.0228

Similarly, for x = 9000, 10 000, ..., 13 000 as shown in the following table:

x ('000)     7       8       9       10      11      12      13
Pr(X < x)    0.0013  0.0228  0.1587  0.5     0.8413  0.9772  0.9987

This table gives a cumulative probability distribution of a normal variable, X. It

is similar to a cumulative relative frequency distribution. From the table we can see,

for example, that 84.13 per cent of tubes have a lifetime less than 11 000 hours.

Figure 6-11 shows the cumulative probability curve from which the different

quantiles and relative frequencies can be calculated approximately. For example,

about 28 per cent of tubes have a lifetime less than about 9400 hours. The 0.6

quantile is approximately 10 250.
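The cumulative table and the two graph readings in Example 3 can be reproduced approximately in code. This is a minimal sketch assuming Python's `statistics.NormalDist`; the names `lifetimes`, `table`, `p_9400` and `q60` are illustrative.

```python
from statistics import NormalDist

# Tube lifetimes: mean 10 000 hours, standard deviation 1000 hours (Example 3).
lifetimes = NormalDist(mu=10_000, sigma=1_000)

# Cumulative probabilities at integral multiples of the standard deviation.
table = {}
for k in range(-3, 4):
    x = 10_000 + k * 1_000
    table[x] = lifetimes.cdf(x)
    print(f"x = {x:>6}: Pr(X < x) = {table[x]:.4f}")

# Readings that the text takes from the cumulative curve.
p_9400 = lifetimes.cdf(9_400)   # proportion with lifetime below 9400 hours
q60 = lifetimes.inv_cdf(0.6)    # the 0.6 quantile

print(f"Pr(X < 9400) = {p_9400:.3f}")
print(f"0.6 quantile = {q60:.0f} hours")
```

The computed values are close to the graph readings of about 28 per cent and about 10 250 hours, which is as much agreement as a reading from a hand-drawn curve can be expected to give.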

[Figure 6-11: Cumulative probability curve, with Pr(X < x) from 0 to 1.0 on the vertical axis and lifetime from 7 to 13 thousand hours on the horizontal axis; the readings 0.28 at 9.4 and 0.6 at 10.25 are marked.]

Example 4

A machine makes metal rods with a mean length of 50 cm and a standard deviation of 1 cm.

Assume that the distribution of lengths is normal.

a What proportion of rods whose length is greater than 49 cm will have a length in excess

of 50 cm?

b If five rods are selected at random, what is the probability that not more than one of

these rods will have a length greater than 49 cm?

[Figure 6-12: Normal curve of rod lengths x (49 to 53 cm) with the corresponding z scale.]

a Pr(X > 49) = Pr(z > −1)
= Pr(z < 1)
= 0.8413

Pr(X > 50) = 0.5

This question involves conditional probability. Using the formula:

$$\Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)}$$

we get:

$$\Pr(X > 50 \mid X > 49) = \frac{\Pr(X > 50)}{\Pr(X > 49)} \quad \text{(Why?)}$$

$$= \frac{0.5}{0.8413} = 0.5943$$
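The conditional probability in part a can be checked with a short calculation. The sketch below assumes Python's `statistics.NormalDist`; the variable names are illustrative. It relies on the fact that every rod longer than 50 cm is also longer than 49 cm, so the event "X > 50 and X > 49" is simply "X > 50".

```python
from statistics import NormalDist

# Rod lengths: mean 50 cm, standard deviation 1 cm (Example 4).
lengths = NormalDist(mu=50, sigma=1)

p_gt_49 = 1 - lengths.cdf(49)   # Pr(X > 49)
p_gt_50 = 1 - lengths.cdf(50)   # Pr(X > 50), which is also Pr(X > 50 and X > 49)

# Conditional probability Pr(X > 50 | X > 49).
p_cond = p_gt_50 / p_gt_49

print(f"Pr(X > 49) = {p_gt_49:.4f}")
print(f"Pr(X > 50 | X > 49) = {p_cond:.4f}")
```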

Or, simply using the geometry of the situation as shown in Figure 6-12:


b Since each rod has the same probability, 0.8413, of having a length greater than 49 cm, we are dealing with a binomial variable, Y, with p = 0.8413, q = 0.1587 and n = 5.

Pr(Y ≤ 1) = Pr(Y = 0) + Pr(Y = 1)

$= (0.1587)^5 + \binom{5}{1}(0.1587)^4(0.8413)$

= 0.0001 + 0.0027

= 0.0028

Exercises 6a

1 Plot the curve of the normal distribution:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

when µ = 20 and σ = 5 and, by counting squares, verify that the area under the curve is approximately one square unit.

2 A normal variable has mean µ and standard deviation σ. What is the probability that any value of the variable, randomly selected, lies between µ + 0.4σ and µ + 2.6σ?

3 A manufacturer of electric light globes finds that these articles have an average life of

1200 burning hours with a standard deviation of 200 hours. Assuming that the

distribution of life-times is normal:

a what is the probability of a globe selected at random having a life between 1240 and

1320 hours?

b out of a batch of 200 globes, how many would be expected to fail in the first 880

burning hours?

c what proportion of globes manufactured would be expected to have a life less than

1100 hours or more than 1460 hours?

4 Tests on breaking strengths of two different kinds of fibre, one being silk and the other

a silk-rayon mixture, yielded the following data:

Silk: mean 10 kg wt; standard deviation 2.5 kg wt.

Silk-rayon: mean 15 kg wt; standard deviation 5 kg wt.

Calculate:

a the probability that a piece of silk, selected at random, will be at least as strong as

the mean for the silk-rayon mixture.

b the probability that a piece of silk-rayon selected at random will be no stronger than

the mean of the silk.

5 A machine makes electrical resistors which have a mean resistance of 50 ohms with a

standard deviation of 2 ohms.

a Assuming the distribution to be normal, find the proportion of resistors made which

have resistance of less than 47.5 ohms.

b Calculate the limits a and b, equally spaced on either side of the mean, so that the

manufacturer can correctly claim that, in the long run, no more than one resistor

in 500 lies outside these limits.

6 The local authorities in a certain city install 1000 electric lamps in the streets of the city.

a If the lamps have an average life of 2000 burning hours with a standard deviation

of 400 hours, and the life of the lamps is normally distributed, what number of lamps

might be expected to fail in the first 1500 burning hours?


7 Steel rods are manufactured to be 5 cm in diameter, but they are acceptable if they are between 4.95 and 5.05 cm. The manufacturer finds that, in the long run, about four per cent are rejected as oversize and four per cent as undersize. If the diameters are normally distributed, find the distribution's standard deviation.

8 Speedometers of cars are not accurate. Suppose that, when the speedometer of a randomly chosen car registers 60 km/h, the actual speed of the car is a variable having a normal distribution with mean µ = 62 and standard deviation σ = 2. What proportion of cars are exceeding 60 km/h when their speedometers register 60 km/h?

9 The 'threshold', or smallest amount, of a certain poison which is sure to kill a rat, is known to vary from rat to rat, following a normal distribution with mean 25.0 mg and standard deviation 2.5 mg.

a Find the proportion of rats that would be killed by a dose of 27.0 mg.

b Plot a graph (using integral multiples of the standard deviation) showing how this proportion changes as the dose is increased or decreased.

c Find the smallest dose that would kill 90 per cent of rats.

d What changes would you expect in the graph if the poison were diluted by adding two parts of inert bait to one part of the original poison?

10

Butter, marketed in 250 g packages, has a weight which is normally distributed with its

advertised weight of 250 g as the mean. It may be regarded as appreciably underweight

if the actual weight is less than 225 g. Find the maximum allowable value of the

standard deviation if, in the long run, not more than one package in 100 is rejected as

being underweight.

11

The mean annual income for a sample of 200 persons selected at random from a certain

industry was $20 800 with a standard deviation of $4160. Of these, eight earned less

than $272 per week and 24 earned more than $480 per week. Does this sample tend to

confirm or refute the claim that incomes of the population from which this sample was

selected were normally distributed? Give reasons for your answer.

12

A firm producing brass washers to a specified thickness of 0.5 cm has found that the

thickness varies normally about a mean of 0.5 cm with a standard deviation of

0.005 cm. All washers with a thickness between 0.49 cm and 0.51 cm are regarded as

satisfactory. In a batch of 2000 washers, how many would you expect to be rejected?

13

The average height of male students at a certain university is 170 cm with variance 25.

What proportion of these students whose height is greater than 160 cm will have a

height in excess of 170 cm assuming the heights are normally distributed?

14 If X is a normally distributed random variable with mean 10, and the probability that X is greater than 12 is 0.1056, find:

a the standard deviation of X

b Pr(X > 8)

c Pr(X > 8 | X < 12)

d the value of x for which Pr(X > x) = 0.85

15 X is a normally distributed random variable and the variance of X is 4. Given that Pr(X > 16) = 0.95, find µ, the mean value of X, and determine:

a Pr(X < 16 | X < µ),

b Pr(X < µ | X < 20).


17 A certain population of plants has a distribution of heights, measured in centimetres, which is normal with mean 30 cm and standard deviation 2 cm.

a Calculate the probability that a randomly selected plant will be less than 27 cm in height.

b If five plants are selected at random, what is the probability that, at most, one is less than 27 cm in height?

18 The average life of a certain type of light globe is 1200 h with a standard deviation of 240 h.

a Assuming that the lengths of life of this type of globe are normally distributed, complete the following proportionate frequency distribution.

Length of life (h)     <480   <720   <960   <1200   <1440   <1680   <1920
Proportion of tubes

b Use the table above to draw a cumulative proportion curve and, from the graph, find:

(i) the proportion of tubes which have a life less than 700 h.

(ii) the 0.9 quantile, stating what it represents.

19 The wingspan of birds of a particular species has a normal distribution with mean 50 cm and standard deviation 5 cm.

a Find the probability that a randomly selected bird has a wingspan greater than 60 cm.

b If the wingspan is measured to the nearest centimetre, find the probability that a randomly selected bird has a wingspan measured as 50 cm.

20 The length of a certain species of fish has a normal distribution with mean 30 cm and standard deviation 2.5 cm.

a Find the probability that a randomly selected fish has a length greater than 36 cm.

b If the lengths of the fish are measured correct to the nearest centimetre, show that the probability of a randomly selected fish having a length which is measured as 30 cm is about 0.16.

c If five fish are randomly selected, find the probability that exactly two will have their lengths measured as 30 cm.

21 Suppose that the strengths of mass-produced items are normally distributed with mean µ and standard deviation 0.5. The value of µ can be controlled by a machine setting. If the strength of an item is less than 5, it is classified as defective. Revenue from sales of non-defective items is $20 per item, while revenue from defective items is $2 per item. The cost of production of items with mean µ is $2 per item. Find the expected profit per item if µ = 6.

22 A machine makes electrical resistors which have a mean resistance of 50 ohms with a standard deviation of 2 ohms.

a Assuming the distribution of resistances to be normal, find the proportion of resistors made which have resistance less than 47.5 ohms.

b If 10 resistors are selected randomly, what is the probability that no more than one will have a resistance less than 47.5 ohms?

23 A chain is made of five links which are selected at random from a population of links. The strengths of the links are assumed to be normally distributed with a mean of 500 units and a standard deviation of 10 units. Find the probability that:

a a randomly selected link has a strength of at least 490 units

b a chain has a strength of at least 490 units

c at least two links in a chain have strength of at least 490 units.

Area under standard normal curve

giving area as function of x, x ≥ 0

                                                                              Mean differences
x    0      1      2      3      4      5      6      7      8      9        1  2  3  4  5  6  7  8  9

0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359   4  8 12 16 20 24 28 32 36
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753   4  8 12 16 20 24 28 32 36
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141   4  8 12 15 19 23 27 31 35
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517   4  8 11 15 19 22 26 30 34
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879   4  7 11 14 18 22 25 29 32

0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224   3  7 10 14 17 21 24 27 31
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549   3  6 10 13 16 19 23 26 29
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852   3  6  9 12 15 18 21 24 27
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133   3  6  8 11 14 17 19 22 25
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389   3  5  8 10 13 15 18 20 23

1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621   2  5  7  9 12 14 16 18 21
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830   2  4  6  8 10 12 14 16 19
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015   2  4  5  7  9 11 13 15 16
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177   2  3  5  6  8 10 11 13 14
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319   1  3  4  6  7  8 10 11 13

1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441   1  2  4  5  6  7  8 10 11
1.6  0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545   1  2  3  4  5  6  7  8  9
1.7  0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633   1  2  3  3  4  5  6  7  8
1.8  0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706   1  1  2  3  4  4  5  6  6
1.9  0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767   1  1  2  2  3  4  4  5  5

2.0  0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817   0  1  1  2  2  3  3  4  4
2.1  0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857   0  1  1  2  2  2  3  3  4
2.2  0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890   0  1  1  1  2  2  2  3  3
2.3  0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916   0  1  1  1  1  2  2  2  2
2.4  0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936   0  0  1  1  1  1  1  2  2

2.5  0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952   0  0  0  1  1  1  1  1  1
2.6  0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964   0  0  0  0  1  1  1  1  1
2.7  0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974   0  0  0  0  0  1  1  1  1
2.8  0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981   0  0  0  0  0  0  0  0  1
2.9  0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986   0  0  0  0  0  0  0  0  1

3.0  0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990   0  0  0  0  0  0  0  0  0
3.1  0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993   0  0  0  0  0  0  0  0  0
3.2  0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995   0  0  0  0  0  0  0  0  0
3.3  0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997   0  0  0  0  0  0  0  0  0
3.4  0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998   0  0  0  0  0  0  0  0  0

3.6  0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999   0  0  0  0  0  0  0  0  0
3.8  0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999   0  0  0  0  0  0  0  0  0
3.9  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000    0  0  0  0  0  0  0  0  0

Inverse normal tables

Area   z        Area   z        Area   z        Area   z        Area    z        Area    z

0.50   0.0000   0.60   0.2533   0.70   0.5244   0.80   0.8416   0.90    1.2816   0.990   2.3263
0.51   0.0251   0.61   0.2793   0.71   0.5534   0.81   0.8779   0.91    1.3408   0.991   2.3656
0.52   0.0502   0.62   0.3055   0.72   0.5828   0.82   0.9154   0.92    1.4051   0.992   2.4089
0.53   0.0753   0.63   0.3319   0.73   0.6128   0.83   0.9542   0.93    1.4758   0.993   2.4573
0.54   0.1004   0.64   0.3585   0.74   0.6433   0.84   0.9945   0.94    1.5548   0.994   2.5121
0.55   0.1257   0.65   0.3853   0.75   0.6745   0.85   1.0364   0.95    1.6449   0.995   2.5758
0.56   0.1510   0.66   0.4125   0.76   0.7063   0.86   1.0803   0.96    1.7507   0.996   2.6521
0.57   0.1764   0.67   0.4399   0.77   0.7388   0.87   1.1264   0.97    1.8808   0.997   2.7478
0.58   0.2019   0.68   0.4677   0.78   0.7722   0.88   1.1750   0.975   1.9600   0.998   2.8782
0.59   0.2275   0.69   0.4959   0.79   0.8064   0.89   1.2265   0.98    2.0537

6.3 Normal approximation to binomial distribution

When n is fairly large (30 or more), and p is neither too small (not less than about 0.1) nor too large (not greater than about 0.9), the normal distribution with mean µ = np and standard deviation $\sigma = \sqrt{npq}$ can be used as an approximation for the binomial distribution.

The histograms in Figures 6-13 to 6-16 have been drawn for p = 0.8 and for values of n = 5, 10, 15 and 20. Observe that, as n increases, the histograms become less skewed, leading then to the idea that the curve drawn through the midpoints of the top of each rectangle of the histogram has the characteristic symmetrical shape of the normal distribution curve.

[Figures 6-13 to 6-16: Histograms of the binomial distribution for p = 0.8, with Pr on the vertical axis and the number of successes on the horizontal axis. Figure 6-13: n = 5, µ = 4. The second histogram (its caption is missing in the source) shows 0 to 10 successes. Figure 6-15: n = 15, µ = 12. Figure 6-16: n = 20, µ = 16.]

The binomial variable is discrete and can assume only integral values 0, 1, 2, ..., n. How then can we represent its probability distribution by means of a histogram which can be drawn for a continuous variable only? This can be justified by the fact that the areas of the rectangles are proportional to the probabilities and, since the width of the base of each rectangle is one unit, the height of each rectangle represents the probability of the midpoint of the base. The sum of the areas of all the rectangles is 1 unit of area, corresponding to the fact that the sum of the probabilities is 1.
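The heights of the rectangles in these histograms, and their comparison with a normal curve, can be sketched in code. The following is a minimal illustration assuming Python's `math.comb` and `statistics.NormalDist`; the function names `binomial_pmf` and `normal_ordinate` are illustrative and not taken from the text. The normal curve used has mean np and standard deviation √(npq), as described at the start of this section.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials (height of one rectangle)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_ordinate(k: int, n: int, p: float) -> float:
    """Ordinate at x = k of the normal curve with mean np and s.d. sqrt(npq)."""
    curve = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return curve.pdf(k)


p = 0.8
for n in (5, 10, 15, 20):
    heights = [binomial_pmf(k, n, p) for k in range(n + 1)]
    # Each rectangle has unit width, so the total area equals the sum of the heights.
    total_area = sum(heights)
    # Largest gap between a rectangle height and the normal curve at its midpoint.
    largest_gap = max(
        abs(heights[k] - normal_ordinate(k, n, p)) for k in range(n + 1)
    )
    print(f"n = {n:2d}: total area = {total_area:.4f}, largest gap = {largest_gap:.4f}")
```

Printing the largest gap for each n gives one simple way of seeing how closely the normal curve follows the tops of the rectangles as n increases.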

Example 5

A Gallup Poll establishes that 80 per cent of people interviewed are in favour of a certain

proposal. If 20 people are interviewed, find, using the normal approximation to the

binomial distribution, the probability that:


µ = np = 20 × 0.8 = 16

$\sigma = \sqrt{npq} = \sqrt{16 \times 0.2} = 1.789$

A normal distribution with mean 16 and standard deviation $\sqrt{3.2}$ can be used as an approximation in this case.

a With reference to Figure 6-16 it will be seen that, to find the probability that X = 14, it will be necessary to find the probability that 13.5 < X* < 14.5. It should be remembered that the normal variable is continuous, whereas the binomial variable is discrete.

Note: If X is a binomial variable whose distribution is approximated by a normal variable X*, then:

$$\Pr(X = a) \approx \Pr(a - 0.5 < X^* < a + 0.5)$$

and:

$$\Pr(a < X < b) \approx \Pr(a + 0.5 < X^* < b - 0.5)$$

for integer values a and b such that a < b.

When X* = 13.5: $z = \frac{13.5 - 16}{\sqrt{3.2}} \approx -1.40$

When X* = 14.5: $z = \frac{14.5 - 16}{\sqrt{3.2}} \approx -0.84$

Pr(X* ≤ 13.5) = Pr(z ≤ −1.40) = 0.0808

Pr(X* < 14.5) = Pr(z < −0.84) = 0.2005

Pr(13.5 < X* < 14.5) = 0.2005 − 0.0808 = 0.1197

Check whether this is a good approximation by evaluating $\binom{20}{14}(0.2)^6(0.8)^{14}$.

b It will be necessary to find Pr(14.5 < X* < 17.5).

When X* = 14.5: $z = \frac{14.5 - 16}{\sqrt{3.2}} \approx -0.84$

When X* = 17.5: $z = \frac{17.5 - 16}{\sqrt{3.2}} \approx 0.84$

Pr(X* ≤ 14.5) = Pr(z ≤ −0.84) = 0.2005

Pr(X* < 17.5) = Pr(z < 0.84) = 0.7995

Pr(14.5 < X* < 17.5) = 0.7995 − 0.2005 = 0.5990
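The quality of the approximation in Example 5 can be examined by computing the exact binomial probabilities alongside the normal ones. The sketch below assumes Python's `math.comb` and `statistics.NormalDist`; all names are illustrative. Because the statement of parts a and b is not reproduced above, the events are inferred from the working: part a is taken to be X = 14, and part b is taken to be X = 15, 16 or 17, which is what the interval 14.5 < X* < 17.5 corresponds to.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.8
q = 1 - p

# Approximating normal variable X*: mean np, standard deviation sqrt(npq).
x_star = NormalDist(mu=n * p, sigma=sqrt(n * p * q))


def exact(k: int) -> float:
    """Exact binomial probability Pr(X = k)."""
    return comb(n, k) * p**k * q ** (n - k)


# Part a (assumed event): X = 14, approximated by 13.5 < X* < 14.5.
approx_a = x_star.cdf(14.5) - x_star.cdf(13.5)
exact_a = exact(14)

# Part b (assumed event): X = 15, 16 or 17, approximated by 14.5 < X* < 17.5.
approx_b = x_star.cdf(17.5) - x_star.cdf(14.5)
exact_b = sum(exact(k) for k in (15, 16, 17))

print(f"a: normal approximation {approx_a:.4f}, exact binomial {exact_a:.4f}")
print(f"b: normal approximation {approx_b:.4f}, exact binomial {exact_b:.4f}")
```

With n as small as 20 the two answers for a single value differ in the second decimal place, while the answers for the wider interval agree much more closely.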

Compare this result with the arithmetical drudgery involved in calculating:


Exercises 6b

(In each of the following questions, use the normal approximation to the binomial

distribution where applicable.)

1 A dental inspector finds that about 20 per cent of children of a certain area have tooth

decay. If a group of 400 children is randomly selected:

a

how many would be expected to have tooth decay?

b

what is the probability of exactly this number?

c

what is the probability that the number lies within one standard deviation of the

expected number?

2

A fair coin is tossed 100 times.

a How many heads do we expect to turn up?

b

What is the probability of this number?

c What is the probability that the number of heads is greater than 45 but less than 55?

3

A fair coin is tossed 500 times. Find the probability that the number of heads uppermost

will not differ from 250 by more than 10.

4

A fair die is thrown 180 times. What is the probability that:

a

a six will turn up exactly 40 times?

b an odd number will turn up at least 100 times?

5

A targetshooter finds that a bull's-eye is scored on 20 per cent of occasions. What is

the probability that at least 24 bull's-eyes will be scored out of 100 attempts?

6

a

Assuming that the length of life of a certain type of television tube is normally

distributed with a mean of 1000 hours and a standard deviation of 250 hours, what

proportion of tubes would be expected to have a life not exceeding 780 hours?

b

If 100 such tubes are randomly selected, how many would be expected to have a life

not exceeding 780 hours and what is the probability that the number exceeds 21?

7

A manufacturer of metal pistons finds, that on average, 10 per cent of the pistons are

rejected because they are either oversize or undersize. What is the probability that a

batch of 900 pistons will contain:

a

no more than 100 rejects?

b

at least 80 rejects?

8

Hospital records show that of patients suffering from a certain complaint, 75 per cent

recover. What is the probability that, of 48 randomly selected patients, at least 40

recover?

9

In packets of flower seeds, 40 per cent are known to produce pink flowers. If 250 seeds

are planted and they all flower:

a

how many pink flowers would we expect?

b

what is the probability of this number?

c within what limits would the number of pink flowers very probably lie?

6.4 Probability limits for a single value of the normal variable

It has been stated that one of the characteristic properties of a normal distribution is that:

(i) about 2/3 of the population lies in the interval µ ± σ (Figure 6-17).

[Figure 6-17: 68.27% of the area lies between µ − σ and µ + σ. Figure 6-18: 95.45% of the area lies between µ − 2σ and µ + 2σ. Figure 6-19: 99.74% of the area lies between µ − 3σ and µ + 3σ.]

From this we infer that a single value, x, of the variable will almost certainly lie within 3 standard deviations of the mean:

i.e. almost certainly, µ − 3σ ≤ x ≤ µ + 3σ.

Very probably (probability of about 0.95) it will lie within 2 standard deviations of the mean:

i.e. very probably, µ − 2σ ≤ x ≤ µ + 2σ.

These limits for the value of a variable are called the 3 sigma and 2 sigma probability limits respectively. If a value of the variable lies beyond either of these limits, it is said to differ significantly from the mean at that particular level of significance.

Example 6

A manufacturer of electric light globes finds that the globes have an average life of 2000 burning hours with a standard deviation of 200 hours. Assuming that the distribution of lifetimes is normal, within what interval will the lifetimes almost certainly lie?

Their lifetimes will almost certainly lie in the interval µ ± 3σ,
i.e. in the interval 2000 ± 600 hours,
i.e. between 1400 and 2600 hours.

If a globe, randomly selected, had a lifetime of only 1200 hours, what conclusion could we draw?

We could conclude that this is most unlikely to have occurred by chance, so perhaps there is some factor which needs to be taken into consideration, such as a fault in the manufacturing process.

Further sampling would be necessary to discover whether the lifetime of globes was consistently lower than expected.

Example 7

In the long run, 64 per cent of patients treated for a particular disease with drug

X

are

cured. If 100 patients, not specially selected, are treated with this drug and 75 are cured,

determine whether this number is significantly higher than the expected numbe� of cures.

Expected number of cures:

µ = np = 100 × 0.64 = 64

σ = √(npq) = √(100 × 0.64 × 0.36) = 4.8

µ + 2σ = 64 + 2 × 4.8 = 73.6

µ + 3σ = 64 + 3 × 4.8 = 78.4

So 75 cures are significantly more than the expected number at the 2 sigma limit

but not at the 3 sigma limit.
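As a rough check of the arithmetic in this example, the binomial mean np and standard deviation √(npq) can be computed directly. The helper name binomial_sigma_limits is an assumption made for illustration only.

```python
import math


def binomial_sigma_limits(n, p, k):
    """Return (mean, sd, lower, upper) using mean = np and sd = sqrt(npq)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd, mean - k * sd, mean + k * sd


# 100 patients, long-run cure rate 0.64, 75 observed cures.
mean, sd, _, upper2 = binomial_sigma_limits(100, 0.64, 2)
_, _, _, upper3 = binomial_sigma_limits(100, 0.64, 3)
cures = 75

# Compare the observed count with the upper 2 sigma and 3 sigma limits.
print(round(mean, 1), round(sd, 2), round(upper2, 1), round(upper3, 1))
print(cures > upper2, cures > upper3)
```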

The mean, µ, is a characteristic parameter of a Poisson distribution and the variance and standard deviation are µ and √µ respectively, the derivation of which is beyond the scope of this book. When µ is large, the normal distribution gives a satisfactory approximation to the Poisson distribution, with practically all values of the Poisson variable lying in the range µ ± 3√µ, since σ = √µ, and very probably (probability of about 0.95) lying in the range µ ± 2√µ.

Example 8

The number of demands for a certain item of equipment varies randomly from week to

week, following a Poisson distribution with mean 20. What is the smallest number of items

a firm must have in stock each week to be almost certain of not having to refuse a demand

for this item?

µ = 20

σ = √µ = √20 = 4.472.

It is almost certain that the demand will not be more than µ + 3√µ, i.e. not more than 20 + 3 × 4.472 = 33.416.

The firm should have at least 34 items in stock.

Very probably the demand will lie in the interval µ ± 2√µ, i.e. in the interval 20 ± 2 × 4.472. The demand will very probably be between 11 and 29.
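The stock calculation can be sketched as follows, assuming (as the text does) that the normal approximation to the Poisson distribution is adequate. The function name poisson_stock_level is hypothetical.

```python
import math


def poisson_stock_level(mean, k=3):
    """Smallest whole number of items at or above mean + k * sqrt(mean)."""
    return math.ceil(mean + k * math.sqrt(mean))


# Weekly demand follows a Poisson distribution with mean 20.
stock = poisson_stock_level(20)
print(stock)
```

Rounding up rather than to the nearest whole number reflects the requirement that the stock must not fall below the 3 sigma limit.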

Exercises 6c

1 Assuming that the length of life of a certain type of television tube is normally

distributed with a mean of 1000 hours and standard deviation of 250 hours, after how

many hours is a tube almost certain to fail?

3 A dental inspector finds that about 20 per cent of children of a certain age have tooth

decay. In a certain area the inspector finds that 25 out of 200 children examined have

tooth decay. Use 3 sigma limits to determine whether this number is significantly less

than the expected number.

4 A Gallup Poll establishes that 60 per cent of people are in favour of a certain proposal.

If a sample of 120 were interviewed, use 2 sigma limits to estimate the number in favour

of the proposal.

5 Electricity power failures occur according to a Poisson law with an average of three

failures every twenty weeks. If, over a period of 40 weeks, there were actually 9 failures,

use 3 sigma limits to determine whether this is significantly more than the expected

number.

6 Variables X and Y are known to be connected by the formula:

Y = 10 + bX.

X can be measured accurately but the measurements of Y are subject to a random error which is normally distributed with mean of 0 and standard deviation of 0.20. An observation Y = 15 is obtained when X = 2. Determine limits within which b almost certainly lies.

7 The cost, $C per article, of manufacturing an article is related to the weight, w g, by the equation:

C = 2w + 25

The weight of the articles is normally distributed with mean 5 g and standard deviation 0.1 g. Give limits within which the cost of the article will almost certainly lie.

8 On the average, one student in every ten wears glasses.

a From a group of 90 students, how many would be expected to wear glasses? Give

limits between which this number very probably lies.

b How large would a group of such students need to be for us to be almost certain

that the number of students in the group wearing glasses is at least 63?

9 A fair coin is tossed 100 times.

a What are the mean and standard deviation of the number of heads appearing uppermost?

b Give limits between which the number of heads:

(i) very probably will lie

(ii) almost certainly will lie.

10 A manufacturer of metal pistons markets the product in batches of 10 and finds that 15 per cent of batches contain at least one defective piston. In 1000 batches, estimate the mean and standard deviation of the number of defective pistons and give limits between which this number will almost certainly lie, assuming the Poisson law.

11 The number of demands for a certain item of equipment varies randomly from week to week, following a Poisson distribution with mean 4. If X denotes the number of demands per week, find Pr(X ≤ µ − σ).

12 A retailer keeps a record of sales and finds that, on 82 out of 1000 days, there was no demand for a particular item of clothing. On one particular day there was a demand for 13 such items. Assuming the Poisson law, determine whether this is significantly more than the expected number.

13 Electricity power failures occur according to a Poisson law, with an average of three failures every twenty weeks. If, over a period of 40 weeks, there were actually nine failures, use three sigma limits to determine whether this is significantly more than the expected number.

14 Cans of peas are tested for infection by certain organisms by storing them for a period

of time before they leave the factory. Cans which contain one or more organisms burst

open on account of fermentation.

a If the number of organisms in a can is a Poisson variate, and on the average 76 in every 10 000 cans burst open, find the mean number of organisms per can.

b In batches of 5000 cans, find approximately the mean and standard deviation of the

number of cans which burst open, and give limits between which this number will

almost certainly lie.

15 In samples of milk taken from a bulk transportation vehicle, 40 per cent proved to have

no bacterial spores.

a Assuming the Poisson law, estimate the mean number of bacterial spores per sample and determine the proportion of samples which would contain two bacterial spores.

b Out of 1000 samples, how many would be expected to have only one spore each? Give limits between which this number very probably lies.

6.5 Probability limits for the sample mean of n values of the variable

If a random sample of n observations is drawn from a normally distributed population with mean µ and standard deviation σ, it can be shown that the mean, x̄, of the sample:

(i) more likely than not (probability about 2/3) lies in the interval µ ± σ/√n (Figure 6-20).

(ii) very probably (probability of about 0.95) lies in the interval µ ± 2σ/√n (Figure 6-21),

i.e. very probably, µ − 2σ/√n ≤ x̄ ≤ µ + 2σ/√n

(iii) almost certainly lies in the interval µ ± 3σ/√n (Figure 6-22),

i.e. almost certainly, µ − 3σ/√n ≤ x̄ ≤ µ + 3σ/√n

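[Figure 6-20: 68.27% of sample means lie between µ − σ/√n and µ + σ/√n]

[Figure 6-21: 95.45% of sample means lie between µ − 2σ/√n and µ + 2σ/√n]

[Figure 6-22: 99.74% of sample means lie between µ − 3σ/√n and µ + 3σ/√n]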

The quantity σ/√n is known as the standard error of the mean. So, in Example 6, the mean life of 100 randomly selected globes would almost certainly lie in the interval 2000 ± 3 × 200/√100, i.e. in the interval 2000 ± 60 h, and very probably would lie in the interval 2000 ± 2 × 200/√100, i.e. in the interval 2000 ± 40 h.

The standard error, σ/√n, gets smaller and smaller as n gets larger and larger, and σ/√n → 0 as n → ∞.

If n = 400, the mean life of 400 randomly selected globes would almost certainly lie in the interval 2000 ± 3 × 200/√400, i.e. in the interval 2000 ± 30.

As n increases, x̄ should give us a more reliable estimate of µ. This is what we would expect.
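The shrinking of the standard error with sample size can be tabulated with a short sketch. The function name standard_error is an illustrative choice; the figures used are those of the globe example (σ = 200 hours).

```python
import math


def standard_error(sd, n):
    """Standard error of the mean of n observations: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Half-width of the 3 sigma interval for the mean life of n globes.
for n in (1, 100, 400):
    half_width = 3 * standard_error(200, n)
    print(n, half_width)
```

Quadrupling the sample size from 100 to 400 only halves the half-width, which is one reason very large samples are often poor value.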

It seems feasible to suggest, then, that in order to get a true estimate of the population mean, we should take as large a sample as possible. However, this is not practical in many situations. Certainly, in the case of the globes it would be wasteful and expensive. Why?

We use general phrases such as 'more likely than not', 'very probably', 'almost certainly' when referring to probability limits and levels of significance such as the 2 sigma level and the 3 sigma level. The probability that x̄ lies in the interval µ ± 1.96 σ/√n is 0.95.
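The probabilities behind these phrases can be checked numerically. The sketch below uses the standard identity Pr(|Z| ≤ k) = erf(k/√2) for a standard normal variable Z; the function name prob_within is an assumption for illustration, and values read from printed tables may differ slightly in the last digit.

```python
import math


def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


# Multipliers for 'more likely than not', the exact 0.95 level,
# 'very probably' and 'almost certainly'.
for k in (1, 1.96, 2, 3):
    print(k, round(prob_within(k), 4))
```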

Example 9

The mean weekly wage in a certain industry is $500 with a standard deviation of $30. A random sample of 25 employees in this industry has a mean wage of $475. Is this significantly less, at the 3 sigma level, than the mean wage of the population?

At the 3 sigma level, µ ± 3σ/√n = 500 ± 3 × 30/√25

= 500 ± 18

Since 475 is not in this interval, it is significantly less at the 3 sigma level.
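The same test can be written as a small sketch. The helper mean_sigma_limits is hypothetical; it evaluates µ ± kσ/√n and the observed sample mean is then compared with the lower limit.

```python
import math


def mean_sigma_limits(mu, sd, n, k):
    """k sigma probability limits for the mean of n observations."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return mu - k * se, mu + k * se


# Population mean $500, standard deviation $30, sample of 25 with mean $475.
lower, upper = mean_sigma_limits(500, 30, 25, 3)
sample_mean = 475
significantly_less = sample_mean < lower
print(lower, upper, significantly_less)
```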

Exercises 6d

1 The mean weight of boys of a certain age is 50 kilograms with standard deviation of 5

kilograms. Within what limits would the mean weight of a random sample of 64 boys

of this age very probably be?

2 At a certain school, the mean IQ (Intelligence Quotient) of the students is 100, with a

standard deviation of 15. The mean IQ of a sample of 25 students was 112. Is this

significantly higher than would be expected?

3 A sample, 55, 63,--69;-'B, is drawn from a population whose standard deviation is 4 and

whose mean is thought to be 60. Do you think the population mean has been wrongly

given?

4 A machine makes electrical resistors which have a mean resistance of 50 ohms with a

standard deviation of 2 ohms.

a Within what limits would we expect the mean of 25 randomly selected resistors to lie with a probability of 0.95?

5 Butter is marketed to retailers in cartons containing 16 packages drawn from a population normally distributed with a mean weight of 0.5 kg and a standard deviation of 0.02 kg. The butter in a particular carton weighed 8.15 kg. Is this significantly more than the expected weight at the 2 sigma level of significance?

6 The mean height of a sample of 25 students is 150 cm. Can we infer that this sample is

drawn from a population of students of mean height 160 cm and standard deviation

10 cm?

7 The length of a certain species of fish has a normal distribution with mean 30 cm and standard deviation 2.5 cm. An angler caught nine such fish whose average length was 27 cm. Is this significantly less than the expected value at the 3σ level?

6.6 Confidence limits

a Population mean

Confidence limits are limits for the value of a parameter estimated from a particular value of a statistic. Discussion will be confined in this section to estimating a population mean from a single value of the variable or from the mean of a set of n observations.

If x is a particular value of a variable, it was stated in Section 6.4 that almost certainly:

µ − 3σ ≤ x ≤ µ + 3σ ... (1)

Transposing (1) gives:

µ ≤ x + 3σ and x − 3σ ≤ µ

i.e. x − 3σ ≤ µ ≤ x + 3σ ... (2)

The population mean, µ, therefore almost certainly lies in the interval x ± 3σ. These limits for the value of µ are called the 3 sigma confidence limits. Very probably, µ lies between x ± 2σ, these limits being called the 2 sigma confidence limits.

If we have a random sample of n observations with mean x̄, we would expect x̄ to give a better estimate of µ than a single value of the variable.

It can be shown that µ almost certainly lies in the interval x̄ ± 3σ/√n. The standard error, σ/√n, will decrease as n increases.

We have assumed that the standard deviation, σ, of the population is known. If it is not known, the sample standard deviation, s, may be used as an estimate of it if the sample is large.

The approximate 95% confidence interval for µ would be given by:

x̄ − 2σ/√n ≤ µ ≤ x̄ + 2σ/√n

Example 10

The IQ (Intelligence Quotient) of a sample of 100 VCE students had a mean of 108 with

a standard deviation of 15. Find a 95% confidence interval for the mean IQ of the

population of VCE students.

n = 100

x̄ = 108

s = 15 = estimate of σ

σ_x̄ = σ/√n = 15/10 = 1.5

The approximate 95% confidence interval would be given by:

x̄ − 2σ/√n ≤ µ ≤ x̄ + 2σ/√n

i.e. 108 − 2 × 1.5 ≤ µ ≤ 108 + 2 × 1.5

105 ≤ µ ≤ 111

We can be about 95% sure that the mean IQ of the population lies in the interval 105 to 111. The confidence limits are 105 and 111.
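A minimal sketch of this calculation follows, assuming, as in the example, that the sample standard deviation is used in place of σ. The function name confidence_interval is hypothetical, and k = 2 corresponds to the approximate 95% level used in the text.

```python
import math


def confidence_interval(sample_mean, sd, n, k=2):
    """k sigma confidence limits for the population mean."""
    half_width = k * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Sample of 100 students: mean IQ 108, standard deviation 15.
low, high = confidence_interval(108, 15, 100)
print(low, high)
```

Passing k=3 instead gives the 3 sigma confidence limits with the same function.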

Example 11

A random sample of 25 employees has a mean weekly wage of $320. Could this sample have

been taken from a population of employees whose weekly wage is normally distributed with

mean of $290 and standard deviation of $40?

n = 25

x̄ = 320

σ = 40

σ_x̄ = σ/√n = 40/5 = 8

We can be almost certain that the mean wage of the population would be in the interval x̄ ± 3σ/√n,

i.e. in the closed interval 320 ± 3 × 8, i.e. [296, 344]

Since 290 does not lie in this interval, we can reject the hypothesis that the sample

is taken from a population whose mean weekly wage is $290.

b Proportions

Example 12

A random sample of 400 manufactured articles contains 80 defectives. Give 2 sigma

confidence limits for the number of defectives in samples of 400 articles and give the

proportion of defectives in the whole output of all samples of 400 articles.

Using this sample to estimate the probability of defectives:

n = 400

p̂ = 80/400 = 0.2, where p̂ denotes the sample proportion.

q̂ = 0.8

Standard deviation = √(npq) = √(400 × 0.2 × 0.8) = 8.

The 2 sigma confidence limits for the number of defectives in samples of 400 articles are 80 ± 2 × 8. At this confidence level, the number of defectives lies between 64 and 96. Therefore the proportion of defectives lies between 64/400 and 96/400, i.e. between 0.16 and 0.24.

Alternatively:

Using p̂ as the sample proportion:

p̂ = x/n = 80/400 = 0.2

q̂ = 1 − p̂ = 0.8

The 95% confidence limits for p are given by:

p̂ − 2√(p̂(1 − p̂)/n) ≤ p ≤ p̂ + 2√(p̂(1 − p̂)/n)

i.e. 0.2 − 2√(0.2 × 0.8/400) ≤ p ≤ 0.2 + 2√(0.2 × 0.8/400)

i.e. 0.16 ≤ p ≤ 0.24 as before
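Both routes to the interval for the proportion can be checked with a short sketch. The function name proportion_interval is an illustrative assumption; it estimates p by the sample proportion and applies p̂ ± k√(p̂(1 − p̂)/n).

```python
import math


def proportion_interval(successes, n, k=2):
    """k sigma confidence limits for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard deviation of the sample proportion
    return p_hat - k * se, p_hat + k * se


# 80 defectives in a sample of 400 articles.
low, high = proportion_interval(80, 400)
print(round(low, 2), round(high, 2))

# Multiplying by n recovers the limits for the number of defectives.
print(round(low * 400), round(high * 400))
```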

Formulae for the mean and standard deviation of a binomial distribution

Random variable                  Mean           Standard deviation
Number of occurrences (np̂)       µ_np̂ = np      σ_np̂ = √(npq)
Proportion of occurrences (p̂)    µ_p̂ = p        σ_p̂ = √(npq)/n = √(pq/n)

Example 13

A coin is tossed 500 times, and a head appears uppermost 320 times. Give 95% confidence limits for the proportion of heads and state whether you consider the coin is biased.

n = 500, p̂ = 320/500 = 0.64

The 95% confidence limits for p are given by:

p̂ ± 2√(p̂(1 − p̂)/n) = 0.64 ± 2√(0.64 × 0.36/500)

= 0.64 ± 0.04


References