CHAPTER 6
Normal distribution
Are the diameters of the rods exactly equal?
In your earlier studies of statistics, frequency distributions were considered and it was stated
that, although they are sometimes of importance in themselves, they are mainly important
in providing information about the population from which the sample is drawn.
[Figure 6-1: Histogram, shaded between a and b. Figure 6-2: Frequency curve, shaded between a and b.]
Consider a relative frequency distribution of a continuous variable, represented by the
histogram (Figure 6-1). It is not difficult to imagine that, if the sample size is increased
indefinitely, it will approach the population size and, if the class intervals are made as small
as possible, the histogram will approach a well defined smooth curve (Figure 6-2). Such a
curve is called a frequency curve or probability density curve and it provides us with
information about a population in much the same way as the histogram does about the
sample. The shaded area in the histogram represents the relative frequency with which the
variable lies between the values a and b in the sample, and corresponds to the shaded area
in the frequency curve, which represents the relative frequency or probability with which the
variable lies between the values a and b in the population. If the probability density curve
is drawn so that the total area under the curve is unity, and if the equation of the curve,
\(y = f(x)\), is known, then the probability that \(X\) lies in the interval \(a\) to \(b\) can be calculated
by the process of integration:

\[ \Pr(a < X < b) = \int_a^b f(x)\,dx. \]
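As a minimal sketch of carrying out this integration numerically, the Python below applies composite Simpson's rule to a hypothetical density \(f(x) = 2x\) on \(0 \le x \le 1\), chosen only because its areas are easy to check by hand. The names `simpson` and `density` are illustrative and do not come from the text.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f from a to b by composite Simpson's rule."""
    if n % 2:          # Simpson's rule needs an even number of strips
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def density(x):
    """Hypothetical density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


total_area = simpson(density, 0, 1)  # total area under the curve: 1
prob = simpson(density, 0.2, 0.5)    # Pr(0.2 < X < 0.5) = 0.5**2 - 0.2**2 = 0.21
print(round(total_area, 6), round(prob, 6))
```

Simpson's rule is exact for this straight-line density, so both results agree with the hand calculation apart from floating-point rounding.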
The mean and standard deviation of a population are denoted by the Greek letters \(\mu\) and \(\sigma\)
respectively and are called parameters, to distinguish them from the mean and standard
deviation of a sample, which are denoted by the Roman letters \(\bar{x}\) and \(s\) respectively and are called
statistics.
6.1 The normal distribution
One of the most important examples of a probability distribution of a continuous variable
is the normal distribution. If a random variable, \(X\), is normally distributed, its frequency
curve has a typical symmetrical bell shape, as shown in Figure 6-3. This curve has been
found to give an adequate fit to a great variety of frequency distributions. The heights of
children of a certain age, the diameters of metal cylinders, the number of burning hours
of electric light globes manufactured by a particular firm, the Intelligence Quotient of
children in a certain area, are a few examples of distributions which are approximately
normal.
The normal distribution is defined by the equation of its frequency curve:

\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \]

where:
\(\mu\) = the mean value of \(X\) in the population
and:
\(\sigma\) = the standard deviation of \(X\) in the population.
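A direct transcription of this equation into Python may help when plotting the curve. The function name `normal_pdf` is an assumption for illustration, and the parameters \(\mu = 20\), \(\sigma = 5\) are those of Exercise 1 in Exercises 6a. The summation is a rough check that the area under the curve is unity, taken over \(\mu \pm 10\sigma\), beyond which the curve is negligible.

```python
import math


def normal_pdf(x, mu, sigma):
    """Ordinate y of the normal curve with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Illustrative parameters; the peak ordinate is 1 / (sigma * sqrt(2 * pi)).
mu, sigma = 20, 5
peak = normal_pdf(mu, mu, sigma)

# Rough check of unit area: sum ordinates times strip width over mu +/- 10 sigma.
n = 20_000
width = 20 * sigma / n
area = sum(normal_pdf(mu - 10 * sigma + i * width, mu, sigma) for i in range(n + 1)) * width
print(round(peak, 4), round(area, 6))
```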
How the equation of this curve is derived is beyond the scope of this book, but the following
properties should be noted.
(i) The curve extends, theoretically at least, to infinity in either direction and so \(X\) can
assume all values.
(ii) The curve is symmetrical about the ordinate \(x = \mu\), and so the mean, mode and median
of the normal distribution coincide.
(iii) Practically all of the population (about 99.7 per cent) lies in the interval \(\mu \pm 3\sigma\);
about 95 per cent lies in the interval \(\mu \pm 2\sigma\); about \(\frac{2}{3}\) of the population lies in the
interval \(\mu \pm \sigma\).
(iv) The total area under the curve is unity.
(v) The probability that \(X\) lies in the interval \(a\) to \(b\) is equal to the area under the curve
bounded by the ordinates \(x = a\) and \(x = b\).
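The percentages in property (iii) can be checked with the error function. For any normal variable, \(\Pr(\mu - k\sigma < X < \mu + k\sigma) = \operatorname{erf}(k/\sqrt{2})\). Here is a short sketch using Python's standard `math.erf`, where `prob_within` is an illustrative name:

```python
import math


def prob_within(k):
    """Pr(mu - k*sigma < X < mu + k*sigma) for any normal variable X."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # 0.6827, 0.9545, 0.9973
```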
6.2 Standard normal curve
Normal distribution curves will differ in location and degree of spread according to the
values of the parameters \(\mu\) and \(\sigma\) respectively.
[Figure 6-4 and Figure 6-5: normal curves, y against x]
In Figure 6-4, the curves differ in location but have the same degree of spread, i.e. \(\sigma\) is the
same for both, but \(\mu\) is different.
It would appear, then, that the area for any interval would require special determination for
each particular curve. However, this is not the case, since all normal curves can be
transformed into a standard normal curve by putting \(\mu = 0\) and \(\sigma = 1\). The equation of
the standard curve is:

\[ Y = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}} \]
To transform the normal equation to the standard normal equation, put:

\[ z = \frac{x-\mu}{\sigma} \quad\text{and}\quad Y = \sigma y. \]
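The substitution can be checked numerically: multiplying an ordinate of any normal curve by \(\sigma\) gives the ordinate of the standard curve at the corresponding \(z\). The sketch below assumes \(\mu = 170\) and \(\sigma = 5\) purely for illustration; the helper names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def standard_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


mu, sigma = 170, 5  # illustrative parameters
for x in (160, 170, 178):
    z = (x - mu) / sigma
    # Y = sigma * y should equal the standard ordinate at z.
    print(x, z, round(sigma * normal_pdf(x, mu, sigma), 6), round(standard_pdf(z), 6))
```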
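When \(z = 0\): \(Y = \frac{1}{\sqrt{2\pi}}e^{0} = \frac{1}{\sqrt{2\pi}} = 0.40\)
When \(z = \pm 1\): \(Y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}} = 0.24\)
When \(z = \pm 2\): \(Y = \frac{1}{\sqrt{2\pi}}e^{-2} = 0.05\)
When \(z = \pm 3\): \(Y = \frac{1}{\sqrt{2\pi}}e^{-4.5} = 0.004\) (using a calculator)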
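For example, to evaluate \(\frac{1}{\sqrt{2\pi}}e^{-4.5}\), first find \(e^{-4.5} \approx 0.0111\) and then divide by \(\sqrt{2\pi} \approx 2.5066\), giving 0.0044.

[Figure 6-6: Standard normal curve, Y against z from −3 to 3, with maximum ordinate about 0.4 at z = 0 and ordinate 0.0044 at z = ±3]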
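The ordinates above, including the value 0.0044 at \(z = \pm 3\), can be reproduced with a few lines of Python. The name `standard_pdf` is illustrative.

```python
import math


def standard_pdf(z):
    """Ordinate Y of the standard normal curve."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


ordinates = {z: round(standard_pdf(z), 4) for z in range(4)}
print(ordinates)  # {0: 0.3989, 1: 0.242, 2: 0.054, 3: 0.0044}
```

Because the curve is symmetrical, the same ordinates hold at \(z = -1, -2, -3\).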
The total area under this curve is unity, and the area from \(-\infty\) to any positive value of \(z\)
is given in the normal probability tables provided (see page 178).
To find the area under any normal curve between the ordinates \(x_1\) and \(x_2\), we find the
corresponding values of \(z_1\) and \(z_2\) from the transformation formulae:

\[ z_1 = \frac{x_1-\mu}{\sigma} \quad\text{and}\quad z_2 = \frac{x_2-\mu}{\sigma} \]
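In code, the table lookup can be replaced by the identity \(\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]\). Here \(\Phi(z)\) is notation introduced for this sketch: it denotes the area from \(-\infty\) to \(z\), which is exactly what the tables give. The required area is then \(\Phi(z_2) - \Phi(z_1)\). The helper names `phi` and `normal_area` are hypothetical.

```python
import math


def phi(z):
    """Area under the standard normal curve from -infinity to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_area(x1, x2, mu, sigma):
    """Area under a normal curve between the ordinates x1 and x2 (x1 < x2)."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return phi(z2) - phi(z1)


print(round(phi(0.8), 4), round(phi(1.6), 4))   # 0.7881 0.9452, as in the tables
print(round(normal_area(48, 52, 50, 2), 4))      # one sigma either side: 0.6827
```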
Example 1
The heights of VCE students in Victoria may be considered to be normally distributed with
a mean of 170 cm and a standard deviation of
5
cm.
a What is the probability that a student, selected at random, has a height between 174 cm
and 178 cm?
b Out of a group of 150 VCE students, how many would be expected to have a height less
than 164 cm?
c What proportion of students would be expected to have heights deviating from the mean
by more than two standard deviations?
[Figure 6-7: Heights, x from 155 to 185 cm (z from −3 to 3); shaded area A between x = 174 and x = 178 (z = 0.8 and 1.6), shaded area B below x = 164 (z = −1.2)]
a The shaded area, A, in Figure 6-7 measures the required probability.
When \(x = 174\): \(z = \frac{x-\mu}{\sigma} = \frac{174-170}{5} = 0.8\)
\(\Pr(X < 174) = \Pr(z < 0.8) = 0.7881\) (from tables)
When \(x = 178\): \(z = \frac{x-\mu}{\sigma} = \frac{178-170}{5} = 1.6\)
\(\Pr(X < 178) = \Pr(z < 1.6) = 0.9452\) (from tables)
\(\Pr(174 < X < 178) = 0.9452 - 0.7881 = 0.1571\)
b The shaded area, B, in Figure 6-7 measures the probability of a student having
a height less than 164 cm.
When \(x = 164\): \(z = \frac{x-\mu}{\sigma} = \frac{164-170}{5} = -1.2\)
\(\Pr(X < 164) = \Pr(z < -1.2)\)
\(= \Pr(z > 1.2)\) (from symmetry of curve)
\(= 1 - \Pr(z < 1.2)\)
\(= 1 - 0.8849\)
\(= 0.1151\)
Expected number \(= 150 \times 0.1151 \approx 17\)
[Figure 6-8: Heights, x from 155 to 185 cm (z from −3 to 3), with shaded tail areas C and D beyond z = ±2]
c The required proportion is the sum of the areas C and D in Figure 6-8. By
symmetry, these areas are equal. For two standard deviations above or below the
mean, \(z = \pm 2\).
\(\Pr(X > \mu + 2\sigma) = \Pr(z > 2)\)
\(= 1 - \Pr(z < 2)\)
\(= 1 - 0.9772\) (from tables)
\(= 0.0228\)
\(\Pr(X > \mu + 2\sigma \text{ or } X < \mu - 2\sigma) = 2 \times 0.0228\) (from symmetry)
\(= 0.0456\)
This means that about 5% of the students have heights deviating from the mean
by more than two standard deviations, or that about 95% of students have
heights within two standard deviations of the mean. This is one of the
characteristics of the normal distribution. Verify that about \(\frac{2}{3}\) of the population
lies within one standard deviation of the mean and that practically all of the
population (about 99.7%) lies within three standard deviations of the mean.
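The three answers to Example 1 can be cross-checked with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). Its `cdf` method plays the role of the normal tables. This is a sketch, and small differences in the fourth decimal place come from rounding in the tables.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=5)

# a  Pr(174 < X < 178)
p_a = heights.cdf(178) - heights.cdf(174)
# b  expected number, out of 150 students, shorter than 164 cm
expected_b = 150 * heights.cdf(164)
# c  Pr(X deviates from the mean by more than 2 sigma), using symmetry
p_c = 2 * (1 - heights.cdf(170 + 2 * 5))

print(round(p_a, 4), round(expected_b, 1), round(p_c, 4))
```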
Example 2
A lathe turns out brass cylinders with a mean diameter of 2.00 cm and a standard deviation
of 0.04 cm. Assuming that the distribution of diameters is normal, find the limits to the
acceptable diameters if, on checking, it is found that five per cent in the long run are rejected
because they are oversize and five per cent are rejected because they are undersize.
Each of the shaded areas in Figure 6-9 is five per cent of the total area. This is an
inverse type of problem in which the proportions are given and the x-values of the
inside ends of the shaded portions are to be found. Using the inverse normal
distribution tables on page 178, we find the z value such that the area from \(-\infty\) to
this z value is 0.95. Its value is 1.6449. By symmetry, the other z value is −1.6449.
[Figure 6-9: Diameters centred on x = 2.00 cm (z = 0), with five per cent of the area shaded in each tail beyond z = ±1.6449]

\(z = \frac{x-\mu}{\sigma}\), so \(\pm 1.6449 = \frac{x - 2.00}{0.04}\)
\(x = 2.00 \pm 1.6449 \times 0.04\)
\(= 2.00 \pm 0.07\)
\(= 1.93\) or \(2.07\) (cm)
These are the acceptable limits.
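For the inverse problem, `NormalDist.inv_cdf` returns the value below which a given proportion of the area lies, so it stands in for the inverse normal tables. Here is a minimal sketch for Example 2:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf(0.95)        # about 1.6449, as read from the inverse tables
diameters = NormalDist(mu=2.00, sigma=0.04)
lower = diameters.inv_cdf(0.05)       # 5% of cylinders are smaller than this
upper = diameters.inv_cdf(0.95)       # 5% of cylinders are larger than this
print(round(z, 4), round(lower, 2), round(upper, 2))  # 1.6449 1.93 2.07
```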
Example 3
The mean life of a certain type of television tube is 10 000 hours with a standard deviation
of 1000 hours. Assuming the distribution of lifetimes, \(X\), is normal, find the probability that
\(X\) is less than any specified value \(x\). Using integral multiples of the standard deviation, plot
the cumulative probability curve.
[Figure 6-10: Lifetimes, x from 7000 to 13 000 hours (z from −3 to 3), with the area Pr(X < x) shaded]
The shaded area of Figure 6-10 shows the probability that \(X\) is less than \(x\), where
the values of the variable lie almost certainly between 7000 and 13 000.
When \(x = 7000\): \(z = -3\) and \(\Pr(X < 7000) = 0.0013\) (from tables)
When \(x = 8000\): \(z = -2\) and \(\Pr(X < 8000) = 0.0228\)
Similarly, for \(x = 9000, 10\,000, \ldots, 13\,000\), as shown in the following table:

x ('000 hours)   7        8        9        10      11       12       13
Pr(X < x)        0.0013   0.0228   0.1587   0.5     0.8413   0.9772   0.9987
This table gives a cumulative probability distribution of a normal variable, X. It
is similar to a cumulative relative frequency distribution. From the table we can see,
for example, that 84.13 per cent of tubes have a lifetime less than 11 000 hours.
Figure 6-11 shows the cumulative probability curve from which the different
quantiles and relative frequencies can be calculated approximately. For example,
about 28 per cent of tubes have a lifetime less than about 9400 hours. The 0.6
quantile is approximately 10 250.
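The cumulative table for Example 3, and the 0.6 quantile read from the graph, can be reproduced as follows. This is again a sketch using `statistics.NormalDist`.

```python
from statistics import NormalDist

life = NormalDist(mu=10_000, sigma=1_000)
table = {x: round(life.cdf(x), 4) for x in range(7_000, 14_000, 1_000)}
print(table)
print(round(life.inv_cdf(0.6)))  # 0.6 quantile, about 10 253 hours
```

The computed quantile agrees with the graphical estimate of about 10 250 hours.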
[Figure 6-11: Cumulative probability curve, Pr(X < x) against x (hours, '000), with the readings 0.28 at x = 9.4 and 0.6 at x = 10.25 marked]
Example 4
A machine makes metal rods with a mean length of 50 cm and a standard deviation of 1 cm.
Assume that the distribution of lengths is normal.
a What proportion of rods whose length is greater than 49 cm will have a length in excess
of 50 cm?
b If five rods are selected at random, what is the probability that not more than one of
these rods will have a length greater than 49 cm?
[Figure 6-12: Rod lengths, x marked from 49 to 53 cm (z from −1 to 3)]
a \(\Pr(X > 49) = \Pr(z > -1) = \Pr(z < 1) = 0.8413\)
\(\Pr(X > 50) = 0.5\)
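This question involves conditional probability.
Using the formula:

\[ \Pr(B \mid A) = \frac{\Pr(A \cap B)}{\Pr(A)} \]

we get:

\[ \Pr(X > 50 \mid X > 49) = \frac{\Pr(X > 50)}{\Pr(X > 49)} \quad \text{Why?} \]
\[ = \frac{0.5}{0.8413} = 0.5943 \]

Or, simply using the geometry of the situation as shown in Figure 6-12: the required proportion is the area to the right of \(x = 50\) divided by the area to the right of \(x = 49\).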
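Here is a sketch checking both parts of Example 4 in Python: part a as a ratio of tail areas, and part b as a binomial sum with \(p = \Pr(X > 49)\). With \(\binom{5}{1} = 5\), the sum for part b comes to about 0.0028.

```python
from math import comb
from statistics import NormalDist

rods = NormalDist(mu=50, sigma=1)

# a  Pr(X > 50 | X > 49) = Pr(X > 50) / Pr(X > 49)
p_over_49 = 1 - rods.cdf(49)
p_over_50 = 1 - rods.cdf(50)
p_a = p_over_50 / p_over_49

# b  Y ~ binomial(n = 5, p = Pr(X > 49)); Pr(Y <= 1) = Pr(Y = 0) + Pr(Y = 1)
n, p = 5, p_over_49
q = 1 - p
p_b = sum(comb(n, k) * p**k * q ** (n - k) for k in (0, 1))

print(round(p_a, 4), round(p_b, 4))
```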
NORMAL DISTRIBUTION 175
b Since each rod has the same probability 0.8413 of having length greater than
49 cm, we are dealing with a binomial variable, \(Y\), with \(p = 0.8413\), \(q = 0.1587\) and \(n = 5\).
\(\Pr(Y \le 1) = \Pr(Y = 0) + \Pr(Y = 1)\)
\(= (0.1587)^5 + \binom{5}{1}(0.1587)^4(0.8413)\)
\(= 0.0001 + 0.0027\)
\(= 0.0028\)

Exercises 6a
1 Plot the curve of the normal distribution:
\[ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \]
when \(\mu = 20\) and \(\sigma = 5\) and, by counting squares, verify that the area under the curve
is approximately one square unit.
2 A normal variable has mean \(\mu\) and standard deviation \(\sigma\). What is the probability that
any value of the variable, randomly selected, lies between \(\mu + 0.4\sigma\) and \(\mu + 2.6\sigma\)?
3 A manufacturer of electric light globes finds that these articles have an average life of
1200 burning hours with a standard deviation of 200 hours. Assuming that the
distribution of life-times is normal:
a what is the probability of a globe selected at random having a life between 1240 and
1320 hours?
b out of a batch of 200 globes, how many would be expected to fail in the first 880
burning hours?
c what proportion of globes manufactured would be expected to have a life less than
1100 hours or more than 1460 hours?
4 Tests on breaking strengths of two different kinds of fibre, one being silk and the other
a silk-rayon mixture, yielded the following data:
Silk: mean 10 kg wt; standard deviation 2.5 kg wt.
Silk-rayon: mean 15 kg wt; standard deviation 5 kg wt.
Calculate:
a the probability that a piece of silk, selected at random, will be at least as strong as
the mean for the silk-rayon mixture.
b the probability that a piece of silk-rayon selected at random will be no stronger than
the mean of the silk.
5 A machine makes electrical resistors which have a mean resistance of 50 ohms with a
standard deviation of 2 ohms.
a Assuming the distribution to be normal, find the proportion of resistors made which
have resistance of less than 47.5 ohms.
b Calculate the limits \(a\) and \(b\), equally spaced on either side of the mean, so that the
manufacturer can correctly claim that, in the long run, no more than one resistor
in 500 lies outside these limits.
6 The local authorities in a certain city install 1000 electric lamps in the streets of the city.
a If the lamps have an average life of 2000 burning hours with a standard deviation
of 400 hours, and the life of the lamps is normally distributed, what number of lamps
might be expected to fail in the first 1500 burning hours?
7 Steel rods are manufactured to be 5 cm in diameter, but they are acceptable if they are
between 4.95 and 5.05 cm. The manufacturer finds that, in the long run, about four per
cent are rejected as oversize and four per cent as undersize. If the diameters are
normally distributed, find the distribution's standard deviation.
8 Speedometers of cars are not accurate. Suppose that, when the speedometer of a
randomly chosen car registers 60 km/h, the actual speed of the car is a variable having
a normal distribution with mean \(\mu = 62\) and standard deviation \(\sigma = 2\).
What proportion of cars are exceeding 60 km/h when their speedometers register
60 km/h?
9 The 'threshold', or smallest amount, of a certain poison which is sure to kill a rat, is
known to vary from rat to rat, following a normal distribution with mean 25.0 mg and
standard deviation 2.5 mg.
a Find the proportion of rats that would be killed by a dose of 27.0 mg.
b Plot a graph (using integral multiples of the standard deviation) showing how this
proportion changes as the dose is increased or decreased.
c Find the smallest dose that would kill 90 per cent of rats.
d What changes would you expect in the graph if the poison were diluted by adding
two parts of inert bait to one part of the original poison?
10 Butter, marketed in 250 g packages, has a weight which is normally distributed with its
advertised weight of 250 g as the mean. It may be regarded as appreciably underweight
if the actual weight is less than 225 g. Find the maximum allowable value of the
standard deviation if, in the long run, not more than one package in 100 is rejected as
being underweight.
11 The mean annual income for a sample of 200 persons selected at random from a certain
industry was $20 800 with a standard deviation of $4160. Of these, eight earned less
than $272 per week and 24 earned more than $480 per week. Does this sample tend to
confirm or refute the claim that incomes of the population from which this sample was
selected were normally distributed? Give reasons for your answer.
12 A firm producing brass washers to a specified thickness of 0.5 cm has found that the
thickness varies normally about a mean of 0.5 cm with a standard deviation of
0.005 cm. All washers with a thickness between 0.49 cm and 0.51 cm are regarded as
satisfactory. In a batch of 2000 washers, how many would you expect to be rejected?
13 The average height of male students at a certain university is 170 cm with variance 25.
What proportion of these students whose height is greater than 160 cm will have a
height in excess of 170 cm, assuming the heights are normally distributed?
14 If \(X\) is a normally distributed random variable with mean 10, and the probability that
\(X\) is greater than 12 is 0.1056, find:
a the standard deviation of \(X\)
b \(\Pr(X > 8)\)
c \(\Pr(X > 8 \mid X < 12)\)
d the value of \(x\) for which \(\Pr(X > x) = 0.85\)
15 \(X\) is a normally distributed random variable and the variance of \(X\) is 4. Given that
\(\Pr(X > 16) = 0.95\), find \(\mu\), the mean value of \(X\), and determine:
a \(\Pr(X < 16 \mid X < \mu)\)
b \(\Pr(X < \mu \mid X < 20)\).
17 A certain population of plants has a distribution of heights, measured in centimetres,
which is normal with mean 30 cm and standard deviation 2 cm.
a Calculate the probability that a randomly selected plant will be less than 27 cm in
height.
b If five plants are selected at random, what is the probability that, at most, one is less
than 27 cm in height?
18 The average life of a certain type of light globe is 1200 h with a standard deviation of
240 h.
a Assuming that the lengths of life of this type of globe are normally distributed,
complete the following proportionate frequency distribution.

Length of life (h)      < 480   < 720   < 960   < 1200   < 1440   < 1680   < 1920
Proportion of globes

b Use the table above to draw a cumulative proportion curve and, from the graph, find:
(i) the proportion of globes which have a life less than 700 h.
(ii) the 0.9 quantile, stating what it represents.
19 The wingspan of birds of a particular species has a normal distribution with mean 50 cm
and standard deviation 5 cm.
a Find the probability that a randomly selected bird has a wingspan greater than 60 cm.
b If the wingspan is measured to the nearest centimetre, find the probability that a
randomly selected bird has a wingspan measured as 50 cm.
20 The length of a certain species of fish has a normal distribution with mean 30 cm and
standard deviation 2.5 cm.
a Find the probability that a randomly selected fish has a length greater than 36 cm.
b If the lengths of the fish are measured correct to the nearest centimetre, show that
the probability of a randomly selected fish having a length which is measured as
30 cm is about 0.16.
c If five fish are randomly selected, find the probability that exactly two will have their
lengths measured as 30 cm.
21 Suppose that the strengths of mass-produced items are normally distributed with mean
\(\mu\) and standard deviation 0.5. The value of \(\mu\) can be controlled by a machine setting.
If the strength of an item is less than 5, it is classified as defective. Revenue from sales
of non-defective items is $20 per item, while revenue from defective items is $2 per item.
The cost of production of items with mean \(\mu\) is $2 per item. Find the expected profit
per item if \(\mu = 6\).
22 A machine makes electrical resistors which have a mean resistance of 50 ohms with a
standard deviation of 2 ohms.
a Assuming the distribution of resistances to be normal, find the proportion of
resistors made which have resistance less than 47.5 ohms.
b If 10 resistors are selected randomly, what is the probability that no more than one
will have a resistance less than 47.5 ohms?
23 A chain is made of five links which are selected at random from a population of links.
The strengths of the links are assumed to be normally distributed with mean of 500 units
and a standard deviation of 10 units. Find the probability that:
a a randomly selected link has a strength of at least 490 units
b a chain has a strength of at least 490 units
c at least two links in a chain have strength of at least 490 units.
Area under standard normal curve
giving area (from \(-\infty\) to \(z\)) as a function of \(z\), \(z \ge 0\)
(columns give the second decimal place of \(z\))

z      0       1       2       3       4       5       6       7       8       9
0.0  0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1  0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2  0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3  0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4  0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5  0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6  0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7  0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8  0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9  0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0  0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1  0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2  0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3  0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4  0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5  0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6  0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7  0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8  0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9  0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0  0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1  0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2  0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3  0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4  0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5  0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6  0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7  0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8  0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9  0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0  0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1  0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2  0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3  0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
3.6  0.9998  0.9998  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.8  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999
3.9  1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000

Mean differences
(to be added, in units of the fourth decimal place, for the third decimal place of \(z\))

z     1   2   3   4   5   6   7   8   9
0.0   4   8  12  16  20  24  28  32  36
0.1   4   8  12  16  20  24  28  32  36
0.2   4   8  12  15  19  23  27  31  35
0.3   4   8  11  15  19  22  26  30  34
0.4   4   7  11  14  18  22  25  29  32
0.5   3   7  10  14  17  21  24  27  31
0.6   3   6  10  13  16  19  23  26  29
0.7   3   6   9  12  15  18  21  24  27
0.8   3   6   8  11  14  17  19  22  25
0.9   3   5   8  10  13  15  18  20  23
1.0   2   5   7   9  12  14  16  18  21
1.1   2   4   6   8  10  12  14  16  19
1.2   2   4   5   7   9  11  13  15  16
1.3   2   3   5   6   8  10  11  13  14
1.4   1   3   4   6   7   8  10  11  13
1.5   1   2   4   5   6   7   8  10  11
1.6   1   2   3   4   5   6   7   8   9
1.7   1   2   3   3   4   5   6   7   8
1.8   1   1   2   3   4   4   5   6   6
1.9   1   1   2   2   3   4   4   5   5
2.0   0   1   1   2   2   3   3   4   4
2.1   0   1   1   2   2   2   3   3   4
2.2   0   1   1   1   2   2   2   3   3
2.3   0   1   1   1   1   2   2   2   2
2.4   0   0   1   1   1   1   1   2   2
2.5   0   0   0   1   1   1   1   1   1
2.6   0   0   0   0   1   1   1   1   1
2.7   0   0   0   0   0   1   1   1   1
2.8 to 3.9: not legible in source

Inverse normal tables
(z such that the area from \(-\infty\) to z is p)

p     z        p     z        p     z        p     z        p      z        p      z
0.50  0.0000   0.60  0.2533   0.70  0.5244   0.80  0.8416   0.90   1.2816   0.990  2.3263
0.51  0.0251   0.61  0.2793   0.71  0.5534   0.81  0.8779   0.91   1.3408   0.991  2.3656
0.52  0.0502   0.62  0.3055   0.72  0.5828   0.82  0.9154   0.92   1.4051   0.992  2.4089
0.53  0.0753   0.63  0.3319   0.73  0.6128   0.83  0.9542   0.93   1.4758   0.993  2.4573
0.54  0.1004   0.64  0.3585   0.74  0.6433   0.84  0.9945   0.94   1.5548   0.994  2.5121
0.55  0.1257   0.65  0.3853   0.75  0.6745   0.85  1.0364   0.95   1.6449   0.995  2.5758
0.56  0.1510   0.66  0.4125   0.76  0.7063   0.86  1.0803   0.96   1.7507   0.996  2.6521
0.57  0.1764   0.67  0.4399   0.77  0.7388   0.87  1.1264   0.97   1.8808   0.997  2.7478
0.58  0.2019   0.68  0.4677   0.78  0.7722   0.88  1.1750   0.975  1.9600   0.998  2.8782
0.59  0.2275   0.69  0.4959   0.79  0.8064   0.89  1.2265   0.98   2.0537

6.3 Normal approximation to binomial distribution
When \(n\) is fairly large (30 or more), and \(p\) is neither too small (not less than about 0.1) nor
too large (not greater than about 0.9), the normal distribution with mean \(\mu = np\) and standard
deviation \(\sigma = \sqrt{npq}\) can be used as an approximation for the binomial distribution.
The histograms below and on the next page have been drawn for \(p = 0.8\) and for values
of \(n = 5, 10, 15\) and \(20\). Observe that, as \(n\) increases, the histograms become less skewed,
leading to the idea that the curve drawn through the midpoints of the tops of the
rectangles of the histogram has the characteristic symmetrical shape of the normal
distribution curve.
[Figure 6-13: p = 0.8, n = 5, \(\mu = 4\). Figure 6-14: p = 0.8, n = 10, \(\mu = 8\). Figure 6-15: p = 0.8, n = 15, \(\mu = 12\). Figure 6-16: p = 0.8, n = 20, \(\mu = 16\). Each histogram shows Pr against number of successes.]
The binomial variable is discrete and can assume only the integral values \(0, 1, 2, \ldots, n\). How
then can we represent its probability distribution by means of a histogram which can be
drawn for a continuous variable only? This can be justified by the fact that the areas of
the rectangles are proportional to the probabilities and, since the width of the base of each
rectangle is one unit, the height of each rectangle represents the probability of the midpoint
of the base. The sum of the areas of all the rectangles is 1 unit of area, corresponding to
the fact that the sum of the probabilities is 1.
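The claim that the histograms become less skewed as \(n\) increases can be quantified. The coefficient of skewness of a binomial variable is \((q - p)/\sqrt{npq}\), which shrinks towards 0 as \(n\) grows. The sketch below, in which `binomial_pmf` is an illustrative helper name, computes the probabilities for \(p = 0.8\) and \(n = 5, 10, 15, 20\). It confirms that they sum to 1 (the total area of the rectangles) and that their mean is \(np\), and it prints the skewness.

```python
from math import comb, sqrt


def binomial_pmf(n, p):
    """Pr(X = k) for k = 0, 1, ..., n when X is binomial with parameters n and p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


p = 0.8
for n in (5, 10, 15, 20):
    probs = binomial_pmf(n, p)
    mean = sum(k * pk for k, pk in enumerate(probs))
    skewness = (1 - 2 * p) / sqrt(n * p * (1 - p))  # (q - p) / sqrt(npq)
    print(n, round(sum(probs), 10), round(mean, 6), round(skewness, 3))
```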
Example 5
A Gallup Poll establishes that 80 per cent of people interviewed are in favour of a certain
proposal. If 20 people are interviewed, find, using the normal approximation to the
binomial distribution, the probability that:
\(\mu = np = 20 \times 0.8 = 16\)
\(\sigma = \sqrt{npq} = \sqrt{16 \times 0.2} = 1.789\)
A normal distribution with mean 16 and standard deviation \(\sqrt{3.2}\) can be used as
an approximation in this case.
a With reference to Figure 6-16 it will be seen that, to find the probability that \(X = 14\), it
will be necessary to find the probability that \(13.5 < X^* < 14.5\). It
should be remembered that the normal variable is continuous, whereas the
binomial variable is discrete.
Note: If \(X\) is a binomial variable whose distribution is approximated by a normal
variable \(X^*\), then:
\[ \Pr(X = a) \approx \Pr(a - 0.5 < X^* < a + 0.5) \]
and:
\[ \Pr(a < X < b) \approx \Pr(a + 0.5 < X^* < b - 0.5) \]
for integer values \(a\) and \(b\) such that \(a < b\).
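The two approximations in the note translate directly into code. In this sketch, `approx_equal` and `approx_between` are hypothetical helper names. Each builds the approximating normal variable \(X^*\) with mean \(np\) and standard deviation \(\sqrt{npq}\) using `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def approx_equal(a, n, p):
    """Pr(X = a) ~ Pr(a - 0.5 < X* < a + 0.5) for a binomial X."""
    x_star = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return x_star.cdf(a + 0.5) - x_star.cdf(a - 0.5)


def approx_between(a, b, n, p):
    """Pr(a < X < b) ~ Pr(a + 0.5 < X* < b - 0.5) for integers a < b."""
    x_star = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return x_star.cdf(b - 0.5) - x_star.cdf(a + 0.5)


print(round(approx_equal(14, 20, 0.8), 4), round(approx_between(14, 18, 20, 0.8), 4))
```

With \(n = 20\) and \(p = 0.8\), these give about 0.1197 and 0.5983. That matches the hand calculation of Example 5 to within the rounding of \(z\) to two decimal places.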
When \(X^* = 13.5\): \(z = \frac{13.5 - 16}{\sqrt{3.2}} \approx -1.40\)
When \(X^* = 14.5\): \(z = \frac{14.5 - 16}{\sqrt{3.2}} \approx -0.84\)
\(\Pr(X^* < 13.5) = \Pr(z < -1.40) = 0.0808\)
\(\Pr(X^* < 14.5) = \Pr(z < -0.84) = 0.2005\)
\(\Pr(13.5 < X^* < 14.5) = 0.2005 - 0.0808 = 0.1197\)
Check whether this is a good approximation by evaluating \(\binom{20}{14}(0.2)^6(0.8)^{14}\).
b It will be necessary to find \(\Pr(14.5 < X^* < 17.5)\).
When \(X^* = 14.5\): \(z = \frac{14.5 - 16}{\sqrt{3.2}} \approx -0.84\)
When \(X^* = 17.5\): \(z = \frac{17.5 - 16}{\sqrt{3.2}} \approx 0.84\)
\(\Pr(X^* < 14.5) = \Pr(z < -0.84) = 0.2005\)
\(\Pr(X^* < 17.5) = \Pr(z < 0.84) = 0.7995\)
\(\Pr(14.5 < X^* < 17.5) = 0.7995 - 0.2005 = 0.5990\)
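Compare this result with the arithmetical drudgery involved in calculating:
\[ \binom{20}{15}(0.2)^5(0.8)^{15} + \binom{20}{16}(0.2)^4(0.8)^{16} + \binom{20}{17}(0.2)^3(0.8)^{17} \]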
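A computer can take over this drudgery. The sketch below evaluates the exact binomial probabilities for parts a and b of Example 5, so they can be compared with the normal approximations 0.1197 and 0.5990. The exact values come to about 0.1091 and 0.5981. The helper name `binomial_prob` is illustrative.

```python
from math import comb


def binomial_prob(k, n, p):
    """Exact Pr(X = k) for a binomial variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.8
exact_a = binomial_prob(14, n, p)                            # Pr(X = 14)
exact_b = sum(binomial_prob(k, n, p) for k in (15, 16, 17))  # Pr(14 < X < 18)
print(round(exact_a, 4), round(exact_b, 4))
```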
Exercises 6b
(In each of the following questions, use the normal approximation to the binomial
distribution where applicable.)
1 A dental inspector finds that about 20 per cent of children of a certain area have tooth
decay. If a group of 400 children is randomly selected:
a how many would be expected to have tooth decay?
b what is the probability of exactly this number?
c what is the probability that the number lies within one standard deviation of the
expected number?
2 A fair coin is tossed 100 times.
a How many heads do we expect to turn up?
b What is the probability of this number?
c What is the probability that the number of heads is greater than 45 but less than 55?
3 A fair coin is tossed 500 times. Find the probability that the number of heads uppermost
will not differ from 250 by more than 10.
4 A fair die is thrown 180 times. What is the probability that:
a a six will turn up exactly 40 times?
b an odd number will turn up at least 100 times?
5 A target shooter finds that a bull's-eye is scored on 20 per cent of occasions. What is
the probability that at least 24 bull's-eyes will be scored out of 100 attempts?
6 a Assuming that the length of life of a certain type of television tube is normally
distributed with a mean of 1000 hours and a standard deviation of 250 hours, what
proportion of tubes would be expected to have a life not exceeding 780 hours?
b If 100 such tubes are randomly selected, how many would be expected to have a life
not exceeding 780 hours, and what is the probability that the number exceeds 21?
7 A manufacturer of metal pistons finds that, on average, 10 per cent of the pistons are
rejected because they are either oversize or undersize. What is the probability that a
batch of 900 pistons will contain:
a no more than 100 rejects?
b at least 80 rejects?
8 Hospital records show that of patients suffering from a certain complaint, 75 per cent
recover. What is the probability that, of 48 randomly selected patients, at least 40
recover?
9 In packets of flower seeds, 40 per cent are known to produce pink flowers. If 250 seeds
are planted and they all flower:
a how many pink flowers would we expect?
b what is the probability of this number?
c within what limits would the number of pink flowers very probably lie?
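6.4 Probability limits for a single value of the normal variable
It has been stated that one of the characteristic properties of a normal distribution is that:
(i) about \(\frac{2}{3}\) of the population lies in the interval \(\mu \pm \sigma\) (Figure 6-17)
(ii) about 95 per cent lies in the interval \(\mu \pm 2\sigma\) (Figure 6-18)
(iii) practically all of the population lies in the interval \(\mu \pm 3\sigma\) (Figure 6-19).

[Figure 6-17: 68.27% of the area lies between \(\mu - \sigma\) and \(\mu + \sigma\). Figure 6-18: 95.45% lies between \(\mu - 2\sigma\) and \(\mu + 2\sigma\). Figure 6-19: 99.74% lies between \(\mu - 3\sigma\) and \(\mu + 3\sigma\).]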
From this we infer that a single value, \(x\), of the variable will almost certainly lie within 3
standard deviations of the mean:
i.e. almost certainly, \(\mu - 3\sigma \le x \le \mu + 3\sigma\).
Very probably (probability of about 0.95) it will lie within 2 standard deviations of the
mean:
i.e. very probably, \(\mu - 2\sigma \le x \le \mu + 2\sigma\).
These limits for the value of a variable are called the 3 sigma and 2 sigma probability limits
respectively. If a value of the variable lies beyond either of these limits, it is said to differ
significantly from the mean at that particular level of significance.
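The 2 sigma and 3 sigma limits can be expressed as a small helper. The names `probability_limits` and `differs_significantly` below are hypothetical. The usage lines apply them to the figures of Example 6 (mean 2000 hours, standard deviation 200 hours).

```python
def probability_limits(mu, sigma, k):
    """The k sigma probability limits: (mu - k*sigma, mu + k*sigma)."""
    return mu - k * sigma, mu + k * sigma


def differs_significantly(x, mu, sigma, k):
    """True if a single value x lies beyond the k sigma limits."""
    low, high = probability_limits(mu, sigma, k)
    return x < low or x > high


print(probability_limits(2000, 200, 3))            # (1400, 2600)
print(differs_significantly(1200, 2000, 200, 3))   # True
```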
Example 6
A manufacturer of electric light globes finds that the globes have an average life of 2000
burning hours with a standard deviation of 200 hours. Assuming that the distribution of
lifetimes is normal, within what interval will the lifetimes almost certainly lie?
Their lifetimes will almost certainly lie in the interval \(\mu \pm 3\sigma\),
i.e. in the interval 2000 ± 600 hours,
i.e. between 1400 and 2600 hours.
If a globe, randomly selected, had a lifetime of only 1200 hours, what conclusion could we
draw?
We could conclude that this is most unlikely to have occurred by chance, so perhaps there
is some factor which needs to be taken into consideration, such as a fault in the
manufacturing process.
Further sampling would be necessary to discover whether the lifetime of globes was
consistently lower than expected.
Example 7
In the long run, 64 per cent of patients treated for a particular disease with drug X are
cured. If 100 patients, not specially selected, are treated with this drug and 75 are cured,
determine whether this number is significantly higher than the expected number of cures.
Expected number of cures: \(\mu = np = 100 \times 0.64 = 64\)
\(\sigma = \sqrt{npq} = \sqrt{100 \times 0.64 \times 0.36} = 4.8\)
\(\mu + 2\sigma = 64 + 2 \times 4.8 = 73.6\)
\(\mu + 3\sigma = 64 + 3 \times 4.8 = 78.4\)
So 75 cures are significantly more than the expected number at the 2 sigma limit
but not at the 3 sigma limit.
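Here is a sketch of the Example 7 calculation. It treats the number of cures as binomial with \(n = 100\) and \(p = 0.64\) and compares 75 with the upper 2 sigma and 3 sigma limits.

```python
from math import sqrt

n, p = 100, 0.64
mu = n * p                       # expected number of cures
sigma = sqrt(n * p * (1 - p))    # binomial standard deviation, sqrt(npq)
cures = 75
for k in (2, 3):
    upper = mu + k * sigma
    print(f"{k} sigma upper limit {upper:.1f}: significant = {cures > upper}")
```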
The mean, \(\mu\), is a characteristic parameter of a Poisson distribution, and the variance and
standard deviation are \(\mu\) and \(\sqrt{\mu}\) respectively, the derivation of which is beyond the scope
of this book. When \(\mu\) is large, the normal distribution gives a satisfactory approximation
to the Poisson distribution, with practically all values of the Poisson variable lying in the
range \(\mu \pm 3\sqrt{\mu}\), since \(\sigma = \sqrt{\mu}\), and very probably (probability of about 0.95) lying in the
range \(\mu \pm 2\sqrt{\mu}\).
Example 8
The number of demands for a certain item of equipment varies randomly from week to
week, following a Poisson distribution with mean 20. What is the smallest number of items
a firm must have in stock each week to be almost certain of not having to refuse a demand
for this item?
\(\mu = 20\)
\(\sigma = \sqrt{\mu} = \sqrt{20} = 4.472\)
It is almost certain that the demand will not be more than \(\mu + 3\sqrt{\mu}\), i.e. not more
than \(20 + 3 \times 4.472 = 33.416\).
The firm should have at least 34 items in stock.
Very probably the demand will lie in the interval \(\mu \pm 2\sqrt{\mu}\), i.e. in the interval
\(20 \pm 2 \times 4.472\). The demand will very probably be between 11 and 29.
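For Example 8, the same limits with \(\sigma = \sqrt{\mu}\) give the stock level directly. Rounding up with `math.ceil` gives the smallest whole number of items at or above the 3 sigma limit. Here is a minimal sketch:

```python
import math

mu = 20
sigma = math.sqrt(mu)                  # Poisson: variance equals the mean
upper_3 = mu + 3 * sigma               # demand almost certainly does not exceed this
stock = math.ceil(upper_3)             # smallest whole number of items covering it
low_2, high_2 = mu - 2 * sigma, mu + 2 * sigma
print(round(sigma, 3), round(upper_3, 3), stock, round(low_2, 1), round(high_2, 1))
```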
Exercises 6c
1 Assuming that the length of life of a certain type of television tube is normally
distributed with a mean of 1000 hours and standard deviation of 250 hours, after how
many hours is a tube almost certain to fail?
3 A dental inspector finds that about 20 per cent of children of a certain age have tooth
decay. In a certain area the inspector finds that 25 out of 200 children examined have
tooth decay. Use 3 sigma limits to determine whether this number is significantly less
than the expected number.
4 A Gallup Poll establishes that 60 per cent of people are in favour of a certain proposal.
If a sample of 120 were interviewed, use 2 sigma limits to estimate the number in favour
of the proposal.
5 Electricity power failures occur according to a Poisson law with an average of three
failures every twenty weeks. If, over a period of 40 weeks, there were actually 9 failures,
use 3 sigma limits to determine whether this is significantly more than the expected
number.
6 Variables \(X\) and \(Y\) are known to be connected by the formula \(Y = 10 + bX\).
\(X\) can be measured accurately but the measurements of \(Y\) are subject to a random error
which is normally distributed with mean of 0 and standard deviation of 0.20. An
observation \(Y = 15\) is obtained when \(X = 2\). Determine limits within which \(b\) almost
certainly lies.
7 The cost, \(C\) dollars per article, of manufacturing an article is related to the weight,
\(w\) g, by the equation:
\[ C = 2w + 25 \]
The weight of the articles is normally distributed with mean 5 g and standard deviation
0.1 g. Give limits within which the cost of the article will almost certainly lie.
8 On the average, one student in every ten wears glasses.
a From a group of 90 students, how many would be expected to wear glasses? Give
limits between which this number very probably lies.
b How large would a group of such students need to be for us to be almost certain
that the number of students in the group wearing glasses is at least 63?
9 A fair coin is tossed 100 times.
a What are the mean and standard deviation of the number of heads appearing
uppermost?
b Give limits between which the number of heads:
(i) very probably will lie
(ii) almost certainly will lie.
10 A manufacturer of metal pistons markets the product in batches of 10 and finds that
15 per cent of batches contain at least one defective piston. In 1000 batches, estimate
the mean and standard deviation of the number of defective pistons and give limits
between which this number will almost certainly lie, assuming the Poisson law.
11 The number of demands for a certain item of equipment varies randomly from week
to week, following a Poisson distribution with mean 4. If \(X\) denotes the number of
demands per week, find \(\Pr(X \le \mu - \sigma)\).
12 A retailer keeps a record of sales and finds that, on 82 out of 1000 days, there was no
demand for a particular item of clothing. On one particular day there was a demand
for 13 such items. Assuming the Poisson law, determine whether this is significantly
more than the expected number.
13 Electricity power failures occur according to a Poisson law, with an average of three
failures every twenty weeks. If, over a period of 40 weeks, there were actually nine
failures, use three sigma limits to determine whether this is significantly more than the
expected number.
14 Cans of peas are tested for infection by certain organisms by storing them for a period
of time before they leave the factory. Cans which contain one or more organisms burst
open on account of fermentation.
a If the number of organisms in a can is a Poisson variate, and on the average 76 in
every 10 000 cans burst open, find the mean number of organisms per can.
b In batches of 5000 cans, find approximately the mean and standard deviation of the
number of cans which burst open, and give limits between which this number will
almost certainly lie.
15 In samples of milk taken from a bulk transportation vehicle, 40 per cent proved to have
no bacterial spores.
a Assuming the Poisson law, estimate the mean number of bacterial spores per sample
and determine the proportion of samples which would contain two bacterial spores.
b Out of 1000 samples, how many would be expected to have only one spore each?
Give limits between which this number very probably lies.
6.5 Probability limits for the sample mean of n values of the variable
If a random sample of \(n\) observations is drawn from a normally distributed population with
mean \(\mu\) and standard deviation \(\sigma\), it can be shown that the mean, \(\bar{x}\), of the sample:
(i) more likely than not (probability about \(\tfrac{2}{3}\)) lies in the interval \(\mu \pm \dfrac{\sigma}{\sqrt{n}}\)
(Figure 6-20).
(ii) very probably (probability of about 0.95) lies in the interval \(\mu \pm \dfrac{2\sigma}{\sqrt{n}}\)
(Figure 6-21), i.e. very probably
\[ \mu - \frac{2\sigma}{\sqrt{n}} \le \bar{x} \le \mu + \frac{2\sigma}{\sqrt{n}} \]
(iii) almost certainly lies in the interval \(\mu \pm \dfrac{3\sigma}{\sqrt{n}}\) (Figure 6-22), i.e. almost certainly
\[ \mu - \frac{3\sigma}{\sqrt{n}} \le \bar{x} \le \mu + \frac{3\sigma}{\sqrt{n}} \]
Figure 6-20: 68.27% within \(\mu \pm \sigma/\sqrt{n}\)   Figure 6-21: 95.45% within \(\mu \pm 2\sigma/\sqrt{n}\)   Figure 6-22: 99.74% within \(\mu \pm 3\sigma/\sqrt{n}\)
The quantity \(\dfrac{\sigma}{\sqrt{n}}\) is known as the standard error of the mean. So, in Example 6, the mean
life of 100 randomly selected globes would almost certainly lie in the interval
\[ 2000 \pm 3 \times \frac{200}{\sqrt{100}}, \]
i.e. in the interval 2000 ± 60 h, and very probably would lie in the interval
\[ 2000 \pm 2 \times \frac{200}{\sqrt{100}}, \]
i.e. in the interval 2000 ± 40 h.
The standard error, \(\sigma/\sqrt{n}\), gets smaller and smaller as \(n\) gets larger and larger, and
\(\sigma/\sqrt{n} \to 0\) as \(n \to \infty\).
If \(n = 400\), the mean life of 400 randomly selected globes would almost certainly lie in the
interval \(2000 \pm 3 \times \dfrac{200}{\sqrt{400}}\), i.e. in the interval 2000 ± 30 h.
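The shrinking of the standard error can be tabulated with a few lines of Python. This sketch assumes \(\mu = 2000\) h and \(\sigma = 200\) h, the values implied by the 2000 ± 60 h limits quoted for Example 6; the sample size 1600 and the name `standard_error` are added here purely for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma = 2000, 200   # globe lifetimes in hours (values assumed from the limits above)
for n in (100, 400, 1600):
    se = standard_error(sigma, n)
    print(f"n={n:5d}: SE={se:5.1f} h, almost certainly {mu} +/- {3 * se:.0f} h")
```

Each fourfold increase in \(n\) only halves the standard error, which is one reason very large samples are rarely worth their cost.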
As \(n\) increases, \(\bar{x}\) should give us a more reliable estimate of \(\mu\). This is what we would
expect. It seems reasonable to suggest, then, that to get an accurate estimate of the population
mean we should take as large a sample as possible. However, this is not practical in many
situations. Certainly, in the case of the globes it would be wasteful and expensive. Why?
When referring to probability limits we use general phrases such as 'more likely than not',
'very probably' and 'almost certainly', and we speak of levels of significance such as the
2 sigma level and the 3 sigma level. The probability that \(\bar{x}\) lies in the interval
\(\mu \pm 1.96\dfrac{\sigma}{\sqrt{n}}\) is 0.95.
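The probabilities attached to these phrases can be recovered from the standard normal distribution. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (available from Python 3.8); the function name `coverage` is our own.

```python
from statistics import NormalDist

def coverage(z):
    """Probability that a normal variable lies within z standard deviations of its mean."""
    std = NormalDist()   # standard normal: mean 0, standard deviation 1
    return std.cdf(z) - std.cdf(-z)

for z in (1, 1.96, 2, 3):
    print(f"mu +/- {z} SE: {coverage(z):.4f}")
```

The printed values are about 0.6827, 0.9500, 0.9545 and 0.9973. The last differs only in the final rounded digit from the 99.74% quoted with the figures.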
Example 9
The mean weekly wage in a certain industry is $500 with a standard deviation of $30. A
random sample of 25 employees in this industry has a mean wage of $475. Is this
significantly less, at the 3 sigma level, than the mean wage of the population?
At the 3 sigma level,
\[ \mu \pm 3\frac{\sigma}{\sqrt{n}} = 500 \pm 3 \times \frac{30}{\sqrt{25}} = 500 \pm 18 \]
Since 475 is not in this interval, it is significantly less at the 3 sigma level.
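Example 9 amounts to comparing the sample mean with the 3 sigma probability limits for \(\bar{x}\). A minimal Python sketch, with the hypothetical helper name `sample_mean_limits`:

```python
import math

def sample_mean_limits(mu, sigma, n, k):
    """k-sigma probability limits for the mean of a random sample of size n."""
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

lower, upper = sample_mean_limits(500, 30, 25, 3)
sample_mean = 475
print(f"3 sigma limits: [{lower:.0f}, {upper:.0f}]")   # [482, 518]
print("significantly less" if sample_mean < lower else "not significant")
```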
Exercises 6d
1 The mean weight of boys of a certain age is 50 kilograms with standard deviation of 5
kilograms. Within what limits would the mean weight of a random sample of 64 boys
of this age very probably be?
2 At a certain school, the mean IQ (Intelligence Quotient) of the students is 100, with a
standard deviation of 15. The mean IQ of a sample of 25 students was 112. Is this
significantly higher than would be expected?
3 A sample, 55, 63,--69;-'B, is drawn from a population whose standard deviation is 4 and
whose mean is thought to be 60. Do you think the population mean has been wrongly
given?
4 A machine makes electrical resistors which have a mean resistance of 50 ohms with a
standard deviation of 2 ohms.
a Within what limits would we expect the mean of 25 randomly selected resistors to
lie with a probability of 0.95?
5 Butter is marketed to retailers in cartons containing 16 packages drawn from a
population normally distributed with a mean weight of 0.5 kg and a standard deviation
of 0.02 kg. The butter in a particular carton weighed 8.15 kg. Is this significantly more
than the expected weight at the 2 sigma level of significance?
6 The mean height of a sample of 25 students is 150 cm. Can we infer that this sample is
drawn from a population of students of mean height 160 cm and standard deviation
10 cm?
7 The length of a certain species of fish has a normal distribution with mean 30 cm and
standard deviation 2.5 cm. An angler caught nine such fish whose average length was
27 cm. Is this significantly less than the expected value at the \(3\sigma\) level?
6.6 Confidence limits
a Population mean
Confidence limits are limits for the value of a parameter estimated from a particular value
of a statistic. Discussion will be confined in this section to estimating a population mean
from a single value of the variable or from the mean of a set of \(n\) observations.
If \(x\) is a particular value of a variable, it was stated in Section 6.4 that almost certainly:
\[ \mu - 3\sigma \le x \le \mu + 3\sigma \qquad (1) \]
Transposing (1) gives:
\[ \mu \le x + 3\sigma \quad\text{and}\quad x - 3\sigma \le \mu \]
i.e.
\[ x - 3\sigma \le \mu \le x + 3\sigma \qquad (2) \]
The population mean, \(\mu\), therefore almost certainly lies in the interval \(x \pm 3\sigma\).
These limits for the value of \(\mu\) are called the 3 sigma confidence limits. Very probably, \(\mu\)
lies in the interval \(x \pm 2\sigma\), these limits being called the 2 sigma confidence limits.
If we have a random sample of \(n\) observations with mean \(\bar{x}\), we would expect \(\bar{x}\) to give a
better estimate of \(\mu\) than a single value of the variable.
It can be shown that \(\mu\) almost certainly lies in the interval \(\bar{x} \pm \dfrac{3\sigma}{\sqrt{n}}\). The standard error,
\(\dfrac{\sigma}{\sqrt{n}}\), will decrease as \(n\) increases.
We have assumed that the standard deviation, \(\sigma\), of the population is known. If it is not
known, the sample standard deviation, \(s\), may be used as an estimate of it if the sample
is large.
The approximate 95% confidence interval for \(\mu\) would be given by:
\[ \bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}} \]
Example 10
The IQ (Intelligence Quotient) of a sample of 100 VCE students had a mean of 108 with
a standard deviation of 15. Find a 95% confidence interval for the mean IQ of the
population of VCE students.
\[ n = 100, \qquad \bar{x} = 108, \qquad s = 15 = \text{estimate of } \sigma \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = 1.5 \]
The approximate 95% confidence interval would be given by:
\[ \bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}} \]
i.e.
\[ 108 - 2 \times 1.5 \le \mu \le 108 + 2 \times 1.5 \]
\[ 105 \le \mu \le 111 \]
We can be about 95% sure that the mean IQ of the population lies in the interval
105 to 111. The confidence limits are 105 and 111.
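The interval of Example 10 can be computed with a small Python function. As in the text, it is a large-sample approximation that substitutes \(s\) for \(\sigma\) and uses the multiplier 2; the name `mean_confidence_interval` is introduced here.

```python
import math

def mean_confidence_interval(xbar, s, n, k=2):
    """Approximate interval xbar +/- k*s/sqrt(n), using s in place of sigma (large n)."""
    se = s / math.sqrt(n)
    return xbar - k * se, xbar + k * se

low, high = mean_confidence_interval(108, 15, 100)
print(f"approx. 95% interval for mu: {low:.1f} to {high:.1f}")   # 105.0 to 111.0
```

Passing `k=1.96` instead of the default gives the slightly narrower interval that corresponds to a probability of exactly 0.95.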
Example 11
A random sample of 25 employees has a mean weekly wage of $320. Could this sample have
been taken from a population of employees whose weekly wage is normally distributed with
mean of $290 and standard deviation of $40?
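\[ n = 25, \qquad \bar{x} = 320, \qquad \sigma = 40 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{25}} = 8 \]
We can be almost certain that the mean wage of the population would be in the
interval \(\bar{x} \pm \dfrac{3\sigma}{\sqrt{n}}\), i.e. in the closed interval \(320 \pm 3 \times 8\), i.e. [296, 344].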
Since 290 does not lie in this interval, we can reject the hypothesis that the sample
is taken from a population whose mean weekly wage is $290.
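Example 11 reverses the question: instead of fixing \(\mu\) and checking \(\bar{x}\), it builds the 3 sigma confidence interval around \(\bar{x}\) and asks whether the hypothesised mean falls inside. A minimal sketch, with the hypothetical name `contains_hypothesised_mean`:

```python
import math

def contains_hypothesised_mean(xbar, sigma, n, mu0, k=3):
    """Return (interval, inside) for the k-sigma confidence interval xbar +/- k*sigma/sqrt(n)."""
    se = sigma / math.sqrt(n)
    interval = (xbar - k * se, xbar + k * se)
    return interval, interval[0] <= mu0 <= interval[1]

interval, inside = contains_hypothesised_mean(320, 40, 25, 290)
print(interval, "hypothesis retained" if inside else "hypothesis rejected")
```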
b Proportions
Example 12
A random sample of 400 manufactured articles contains 80 defectives. Give 2 sigma
confidence limits for the number of defectives in samples of 400 articles and give the
proportion of defectives in the whole output of all samples of 400 articles.
Using this sample to estimate the probability of defectives:
\[ n = 400, \qquad \hat{p} = \frac{80}{400} = 0.2, \qquad \hat{q} = 0.8 \]
where \(\hat{p}\) denotes the sample proportion.
\[ \text{Standard deviation} = \sqrt{n\hat{p}\hat{q}} = \sqrt{400 \times 0.2 \times 0.8} = \sqrt{64} = 8 \]
The 2 sigma confidence limits for the number of defectives in samples of 400 articles
are \(80 \pm 2 \times 8\). At this confidence level, the number of defectives lies between 64
and 96. Therefore the proportion of defectives lies between \(\dfrac{64}{400} = 0.16\) and \(\dfrac{96}{400} = 0.24\).
Alternatively, using \(\hat{p}\) as the sample proportion:
\[ \hat{p} = \frac{X}{n} = \frac{80}{400} = 0.2, \qquad \hat{q} = 1 - \hat{p} = 0.8 \]
The 95% confidence interval for \(p\) is given by:
\[ \hat{p} - 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
i.e.
\[ 0.2 - 2\sqrt{\frac{0.2 \times 0.8}{400}} \le p \le 0.2 + 2\sqrt{\frac{0.2 \times 0.8}{400}} \]
i.e.
\[ 0.16 \le p \le 0.24 \quad \text{as before} \]
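Both routes in Example 12, limits for the count and then division by \(n\), can be sketched in one Python function. It estimates \(p\) by the sample proportion, as the text does; `proportion_limits` is a name introduced for this illustration.

```python
import math

def proportion_limits(successes, n, k=2):
    """k-sigma limits for the count and the proportion, estimating p by the sample proportion."""
    p_hat = successes / n
    sd_count = math.sqrt(n * p_hat * (1 - p_hat))   # sqrt(n p q) for the count
    count_limits = (successes - k * sd_count, successes + k * sd_count)
    prop_limits = (count_limits[0] / n, count_limits[1] / n)   # same as p_hat +/- k*sqrt(pq/n)
    return count_limits, prop_limits

counts, props = proportion_limits(80, 400)
print(counts, props)
```

Dividing the count limits by \(n\) gives exactly \(\hat{p} \pm 2\sqrt{\hat{p}\hat{q}/n}\), which is why the two methods agree.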
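Formulae for the mean and standard deviation of a binomial distribution
\[
\begin{array}{lll}
\text{Random variable} & \text{Mean} & \text{Standard deviation} \\
\text{Number of occurrences } (n\hat{p}) & \mu_{n\hat{p}} = np & \sigma_{n\hat{p}} = \sqrt{npq} \\
\text{Proportion of occurrences } (\hat{p}) & \mu_{\hat{p}} = p & \sigma_{\hat{p}} = \dfrac{\sqrt{npq}}{n} = \sqrt{\dfrac{pq}{n}}
\end{array}
\]
Example 13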
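A coin is tossed 500 times, and a head appears uppermost 320 times. Give 95% confidence
limits for the proportion of heads and state whether you consider the coin biased.
\[ n = 500, \qquad \hat{p} = \frac{320}{500} = 0.64 \]
The 95% confidence limits for \(p\) are given by:
\[ \hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.64 \pm 2\sqrt{\frac{0.64 \times 0.36}{500}} = 0.64 \pm 0.04 \]
i.e. the proportion of heads lies between 0.60 and 0.68. Since 0.5 does not lie in this
interval, we consider the coin to be biased.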
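Example 13 can be checked with the same large-sample formula, \(\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}\). The function name `proportion_confidence_interval` is our own, and the fair-coin value 0.5 is the hypothesised proportion being tested.

```python
import math

def proportion_confidence_interval(successes, n, k=2):
    """Approximate k-sigma interval p_hat +/- k*sqrt(p_hat*(1-p_hat)/n)."""
    p_hat = successes / n
    half_width = k * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = proportion_confidence_interval(320, 500)
print(f"95% limits: {low:.3f} to {high:.3f}")   # 0.597 to 0.683
print("coin appears biased" if not (low <= 0.5 <= high) else "no evidence of bias")
```

Unrounded, the half-width is about 0.043; the text rounds it to 0.04, and either way 0.5 lies well outside the interval.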
c Differences between population means
Example 14
A commercial traveller travels regularly between two towns, A and B, and has a choice of
either of two routes, 1 and 2. To determine whether there is any significant difference in
times taken, the traveller records the mean and standard deviation of times taken on a
number of occasions on each route as shown:
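\[
\begin{array}{cccc}
\text{Route} & \text{Mean} & \text{Standard deviation} & \text{Sample size} \\
1 & \bar{x}_1 = 52 \text{ min} & s_1 = 8 \text{ min} & 80 \\
2 & \bar{x}_2 = 50 \text{ min} & s_2 = 6 \text{ min} & 100
\end{array}
\]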
Different sample sizes have been assumed to illustrate that they need not necessarily be the
same.
Let the mean and standard deviation of the population of times taken for routes 1 and 2
be \(\mu_1\) and \(\mu_2\) and \(\sigma_1\) and \(\sigma_2\) respectively. These are the population parameters corresponding
to the sample statistics \(\bar{x}_1\) and \(\bar{x}_2\) and \(s_1\) and \(s_2\) respectively.
If \(\bar{X}_1\) and \(\bar{X}_2\) are independent random variables, it can be shown that:
a the mean of the difference \(\bar{X}_1 - \bar{X}_2\) is:
\[ \mu_{\bar{x}_1-\bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2 \qquad (1) \]
i.e. the mean of the difference = the difference of the means
b the variance of the difference \(\bar{X}_1 - \bar{X}_2\) is:
\[ \sigma^2_{\bar{x}_1-\bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} \qquad (2) \]
i.e. the variance of the difference = the sum of the variances.
Since:
\[ \sigma^2_{\bar{x}_1} = \frac{\sigma_1^2}{n_1} \quad\text{and}\quad \sigma^2_{\bar{x}_2} = \frac{\sigma_2^2}{n_2}, \]
equation (2) becomes:
\[ \sigma^2_{\bar{x}_1-\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \qquad (3) \]
and, so, the standard deviation of the difference \(\bar{x}_1 - \bar{x}_2\) is given by:
\[ \sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (4) \]
If, then, \(\bar{X}_1\) and \(\bar{X}_2\) are the means of two large independent samples from populations with
means \(\mu_1\) and \(\mu_2\) and standard deviations \(\sigma_1\) and \(\sigma_2\) respectively, and if we make the
assumption (hypothesis) that \(\mu_1\) and \(\mu_2\) are equal, then the sampling distribution of \(\bar{X}_1 - \bar{X}_2\)
may be considered as a normal distribution with mean \(\mu_{\bar{x}_1-\bar{x}_2} = 0\) and standard deviation:
\[ \sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
The quantity \(\sigma_{\bar{x}_1-\bar{x}_2}\) in equation (4) is referred to as the standard error of the difference
between two means.
For large samples, we can use the sample standard deviations, \(s_1\) and \(s_2\), in place of
\(\sigma_1\) and \(\sigma_2\), so equation (4) can be approximated by:
\[ s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
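The two results, mean of the difference = difference of the means and variance of the difference = sum of the variances, can be checked empirically. The sketch below is a Monte Carlo check rather than part of the text's argument: it assumes normal populations whose parameters are borrowed, hypothetically, from the route data of Example 14, and `simulate_difference_of_means` and `theoretical_se` are names chosen for this illustration.

```python
import math
import random
import statistics

def simulate_difference_of_means(mu1, sigma1, n1, mu2, sigma2, n2, trials=5000, seed=1):
    """Repeatedly draw two independent normal samples; return mean and sd of xbar1 - xbar2."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    diffs = []
    for _ in range(trials):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.stdev(diffs)

def theoretical_se(sigma1, n1, sigma2, n2):
    """Equation (4): sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical populations: mu1=52, sigma1=8, n1=80 and mu2=50, sigma2=6, n2=100.
sim_mean, sim_sd = simulate_difference_of_means(52, 8, 80, 50, 6, 100)
print(f"simulated: mean {sim_mean:.3f}, sd {sim_sd:.3f}")
print(f"theory:    mean {52 - 50}, sd {theoretical_se(8, 80, 6, 100):.3f}")
```

With 5000 trials the simulated standard deviation typically agrees with equation (4) to within about one per cent.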
Therefore, using the data of the example, \(\bar{x}_1 = 52\), \(s_1 = 8\) and \(n_1 = 80\) for route 1, and
\(\bar{x}_2 = 50\), \(s_2 = 6\) and \(n_2 = 100\) for route 2, equation (4), with \(s_1\) and \(s_2\) in place of
\(\sigma_1\) and \(\sigma_2\), gives:
\[ \sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{8^2}{80} + \frac{6^2}{100}} = 1.08 \]
and:
\[ \bar{x}_1 - \bar{x}_2 = 52 - 50 = 2 \]
At the 95% confidence limits, the expected value \(\mu_{\bar{x}_1-\bar{x}_2} = 0\) should lie in the interval
\(2 \pm 2 \times 1.08\), i.e. in the interval [-0.16, 4.16]. Since it does, we conclude that there is
no significant difference in times taken at the 2 sigma confidence level.
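The calculation of Example 14 can be sketched in Python. It uses the large-sample standard error \(\sqrt{s_1^2/n_1 + s_2^2/n_2}\) from the text; `diff_means_interval` is a hypothetical helper name.

```python
import math

def diff_means_interval(x1, s1, n1, x2, s2, n2, k=2):
    """k-sigma interval for mu1 - mu2 using the large-sample SE sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return se, (diff - k * se, diff + k * se)

se, (low, high) = diff_means_interval(52, 8, 80, 50, 6, 100)
print(f"SE = {se:.3f}, interval = [{low:.2f}, {high:.2f}]")
print("no significant difference" if low <= 0 <= high else "significant difference")
```

Without rounding the standard error to 1.08 first, the interval is about [-0.15, 4.15]; the conclusion is unchanged because 0 still lies inside.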
d Differences between population proportions
Example 15
An opinion poll found that 75 out of 100 males and 180 out of 200 females interviewed
were in favour of a certain proposal. Is there any significant difference in the overall
proportion of males and females favouring the proposal?
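Using the subscripts 1 and 2 for male and female respectively, \(\hat{p}_1\) and \(\hat{p}_2\) refer
to the sample proportions of males and females in favour of the proposal.
\[ \hat{p}_1 = \frac{75}{100} = \frac{3}{4}, \qquad \hat{p}_2 = \frac{180}{200} = \frac{9}{10} \]
\[ \hat{q}_1 = \frac{25}{100} = \frac{1}{4}, \qquad \hat{q}_2 = \frac{20}{200} = \frac{1}{10} \]
\[ n_1 = 100, \qquad n_2 = 200 \]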
If \(p_1\) and \(p_2\) denote the population proportions and there is no difference in the overall
proportion in favour, then \(p_1 - p_2 = 0\).
Since \(\hat{p}_1\) and \(\hat{p}_2\) are the sample proportions drawn from populations with
parameters \(p_1\) and \(p_2\) respectively, the sampling distribution of \(\hat{p}_1 - \hat{p}_2\) has
mean:
\[ \mu_{\hat{p}_1-\hat{p}_2} = p_1 - p_2 \qquad (1) \]
and standard deviation:
\[ \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2}} \qquad (2) \]
where \(\sigma^2_{\hat{p}_1}\) and \(\sigma^2_{\hat{p}_2}\) are the variances of the sampling distributions of \(\hat{p}_1\) and \(\hat{p}_2\).
Compare (1) and (2) with (1) and (2) in the section on the differences between
population means (part c of this section).
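Assuming a binomial distribution with mean \(= np\) and variance \(= npq\), then:
\[ \sigma^2_{\hat{p}_1} = \frac{n_1p_1q_1}{n_1^2} = \frac{p_1q_1}{n_1} \]
and:
\[ \sigma^2_{\hat{p}_2} = \frac{n_2p_2q_2}{n_2^2} = \frac{p_2q_2}{n_2} \]
\(\therefore\) Equation (2) becomes:
\[ \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}} \qquad (3) \]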
Compare equation (3) with equation (4) in the section on differences between
population means (part c of this section). The standard deviation in equation (3),
\(\sigma_{\hat{p}_1-\hat{p}_2}\), is referred to as the standard error of the difference between two proportions.
Using \(\hat{p}_1\) and \(\hat{p}_2\) as estimates of \(p_1\) and \(p_2\), and the data from the example, equation
(3) becomes:
\[ \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{0.75 \times 0.25}{100} + \frac{0.9 \times 0.1}{200}} = 0.048 \]
and
\[ \hat{p}_1 - \hat{p}_2 = 0.75 - 0.9 = -0.15 \]
At the 95% confidence limits, the expected value \(p_1 - p_2 = 0\) should lie in the
interval \(-0.15 \pm 2 \times 0.048\), i.e. in the interval [-0.246, -0.054]. Since it does
not, we conclude that there is a significant difference in the proportions at the 2
sigma confidence limits.
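Example 15 can be sketched the same way, using the unpooled standard error of equation (3) with the sample proportions as estimates. The name `diff_proportions_interval` is introduced for illustration only.

```python
import math

def diff_proportions_interval(x1, n1, x2, n2, k=2):
    """k-sigma interval for p1 - p2 using the unpooled standard error of equation (3)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return se, (diff - k * se, diff + k * se)

se, (low, high) = diff_proportions_interval(75, 100, 180, 200)
print(f"SE = {se:.4f}, interval = [{low:.3f}, {high:.3f}]")   # SE = 0.0482, interval = [-0.246, -0.054]
print("significant difference" if not (low <= 0 <= high) else "no significant difference")
```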
Note: If we assume no difference in the proportions in the two populations, then we can use what
is called a 'pooled estimator' \(\hat{p}_0\), where:
\[ \hat{p}_0 = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{75 + 180}{100 + 200} = 0.85 \]
which replaces \(\hat{p}_1\) and \(\hat{p}_2\). So
\[ \sigma_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{0.85 \times 0.15}{100} + \frac{0.85 \times 0.15}{200}} = 0.044 \]
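The pooled calculation in the note can be sketched as follows. Since \(n_1\hat{p}_1 + n_2\hat{p}_2\) is just the total number in favour, the sketch pools the raw counts and factors \(\hat{p}_0\hat{q}_0\) out of the two terms, which is algebraically the same as the expression above; `pooled_standard_error` is a hypothetical name.

```python
import math

def pooled_standard_error(x1, n1, x2, n2):
    """Pooled p0 = (x1 + x2)/(n1 + n2) and SE sqrt(p0*q0*(1/n1 + 1/n2))."""
    p0 = (x1 + x2) / (n1 + n2)
    return p0, math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

p0, se_pooled = pooled_standard_error(75, 100, 180, 200)
print(f"p0 = {p0:.2f}, pooled SE = {se_pooled:.3f}")   # p0 = 0.85, pooled SE = 0.044
```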
Exercises 6e
1 The weighing error of a certain type of balance has a standard deviation of 0.002. Give
3 sigma confidence limits for the true weight of a specimen weighed as 12.366 g.
2 The coefficient of variation of the error of a certain measuring instrument is 3.0 per
cent. An observation taken by the instrument is 12.52. Obtain 3 sigma confidence limits
for the true value. (The coefficient of variation is the ratio of the standard deviation
to the mean.)
3 Within what limits would the mean height of male university students very probably
be if the mean height of a random sample of 100 students is 170 cm with a standard
deviation of 5.5 cm?
4 Variables \(V\) and \(T\) are known to be connected by the formula \(V = 7.62 + bT\).
\(T\) can be measured accurately but the measurements of \(V\) are subject to a random error
which is normally distributed with mean 0 and standard deviation 0.12. An observation
\(V = 11.23\) is obtained when \(T = 5\). Determine limits within which \(b\) almost certainly
lies.
5 Observations of weight made with a certain balance are equal, on the average, to the
true weight and have a coefficient of variation from it of 0.5 per cent.
a If a large number of observations were made of a specimen having a true weight of
6.000 g, what standard deviation should they have?
b If a single observation were made, within what limits is it almost certain to lie?
c The mean of 25 observations of the weight of another specimen is 3.000 g. Determine
the limits between which its true weight will almost certainly lie.
6 Within what limits would the mean length of a certain species of fish very probably lie
if the mean length of a random sample of 25 such fish was 30 cm and standard deviation
4 cm, assuming lengths are normally distributed?
7 A random sample of 200 patients was treated with a certain drug and 150 were cured.
Give 2 sigma confidence limits for the number of cures, and also the proportion of
cures.
8 A random sample of 150 voters in an electorate indicated that 60 per cent of them were
in favour of a certain candidate at the forthcoming elections. Give 2 sigma confidence
limits for the proportion of all voters in the electorate in favour of this candidate.
9 A dental inspector finds that, out of a group of 204 children, 45 have tooth decay. Give
a 95% confidence interval for the proportion of children with tooth decay.
10 A Gallup Poll establishes that out of a group of 150 people interviewed 72 were in
favour of a certain proposal. Give two sigma confidence limits for the number, in
groups of 150 people, in favour of the proposal and, therefore, the overall proportion
in favour.
11 Out of a group of 100 patients suffering from a particular disease, 75 were cured by
drug X. Give a 95% confidence interval for the proportion of patients cured by this
drug.
12 A target shooter fires 200 rounds at the target and scores 120 bull's-eyes. Give two sigma
confidence limits for the number of bull's-eyes in rounds of 200.
13 In a random sample of 100 rods produced in a manufacturing process, 25 were rejected
as faulty. Give a 95% confidence interval for the overall proportion of rods rejected.
14 A gardener planted 80 seeds of which 64 germinated. Is this number consistent with the
gardener's claim that 90% of seeds planted usually germinate? Use two sigma
confidence limits.
15 Hospital records show that 75% of patients suffering from a certain complaint recover.
A new drug cured 165 out of 200 patients on whom it was tried. Use 2 sigma confidence
limits to determine whether this new drug is more effective.
16 The heights of a random sample of 100 VCE students have a mean of 175 cm and a
standard deviation of 5 cm. Give a 95% confidence interval for the mean height of
VCE students.
17 In a particular electorate, 260 voters out of a random sample of 400 expressed their
intention of voting for a particular party. Give 2 sigma confidence limits for the number
of voters who would vote for this party in other samples of 400 in the electorate. Hence
give 95% confidence limits for the proportion of voters in the electorate in favour of
this party.
18 An agency conducted a survey of a random sample of 400 viewers of a TV show and
found that 175 of them were children. Is this consistent with the agency's claim that
50% of viewers of this show are children? Use a 95% confidence interval of
proportions.
19 A machine makes electrical resistors which have a mean resistance of \(\mu\) ohms with
20 If \(X\) is a normally distributed r