9. Continuous Probability Distributions 161

9.5 Normal Distribution

Physical Setup:

A random variable X defined on (−∞, ∞) has a normal distribution if it has probability density function of the form

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty,$$

where −∞ < μ < ∞ and σ > 0 are parameters. It turns out (and is shown below) that E(X) = μ and Var(X) = σ² for this distribution; that is why its p.d.f. is written using the symbols μ and σ. We write

X ∼ N(μ, σ²)

to denote that X has a normal distribution with mean μ and variance σ² (standard deviation σ).

The normal distribution is the most widely used distribution in probability and statistics. Physical processes leading to the normal distribution exist but are a little complicated to describe. (For example, it arises in physics via statistical mechanics and maximum entropy arguments.) It is used for many processes where X represents a physical dimension of some kind, but also in many other settings.

We’ll see other applications of it below. The shape of the p.d.f. f(x) above is what is often termed a “bell shape” or “bell curve”, symmetric about μ, as illustrated in Figure 9.4. (You should be able to verify the shape without graphing the function.)

Illustrations:

(1) Heights or weights of males (or of females) in large populations tend to follow normal distributions.

(2) The logarithms of stock prices are often assumed to be normally distributed.

[Figure 9.5: The standard normal c.d.f. Φ(x), plotted against x.]

The cumulative distribution function: The c.d.f. of the normal distribution N(μ, σ²) is

$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\, dy$$

as shown in Figure 9.5. This integral cannot be given a simple mathematical expression so numerical methods are used to compute its value for given values of x, μ and σ. This function is included in many software packages and some calculators.

In the statistical packages R and S-Plus we get F (x) above using the function pnorm(x, μ, σ).
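For readers working outside R, the same c.d.f. value can be sketched in Python. This is a minimal illustration, assuming Python 3.8 or later, where the standard library provides `statistics.NormalDist`. The helper names `normal_cdf` and `normal_cdf_erf` and the parameter values are hypothetical, chosen only for the demonstration. The second helper uses the equivalent error-function form F(x) = ½[1 + erf((x − μ)/(σ√2))].

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu, sigma):
    """Return F(x) = P(X <= x) for X ~ N(mu, sigma^2); sigma is the standard deviation."""
    return NormalDist(mu, sigma).cdf(x)

def normal_cdf_erf(x, mu, sigma):
    """The same quantity written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative (hypothetical) values: X ~ N(10, 4), so sigma = 2.
p_lib = normal_cdf(12.0, mu=10.0, sigma=2.0)
p_erf = normal_cdf_erf(12.0, mu=10.0, sigma=2.0)
print(p_lib, p_erf)  # the two values agree up to rounding error
```

As with `pnorm`, the third argument here is the standard deviation σ, not the variance σ².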

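Before computers, people needed to produce tables of probabilities F(x) by numerical integration, using mechanical calculators. Fortunately it is necessary to do this only for a single normal distribution: the one with μ = 0 and σ = 1. This is called the “standard” normal distribution and denoted N(0, 1).

It is easy to see that if X ∼ N(μ, σ²) then the “new” random variable Z = (X − μ)/σ is distributed as Z ∼ N(0, 1). (Just use the change of variables methods in Section 9.1.) We’ll use this to compute F(x) and probabilities for X below, but first we show that f(x) integrates to 1 and that E(X) = μ and Var(X) = σ². For the first result, note that

$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \qquad \left(z = \frac{x-\mu}{\sigma}\right) \\
&= 2\int_{0}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz \\
&= \frac{1}{\sqrt{\pi}} \int_{0}^{\infty} y^{-1/2} e^{-y}\, dy \qquad \left(y = \frac{z^2}{2}\right) \\
&= \frac{1}{\sqrt{\pi}}\, \Gamma\!\left(\tfrac{1}{2}\right) = 1,
\end{aligned}$$

since Γ(1/2) = √π.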

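Mean, Variance, Moment generating function: Recall that an odd function, f(x), has the property that f(−x) = −f(x). If f(x) is an odd function then $\int_{-\infty}^{\infty} f(x)\,dx = 0$, provided the integral exists.

Consider

$$E(X) = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx.$$

Let y = x − μ. Then

$$E(X) = \int_{-\infty}^{\infty} (y+\mu)\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{y^2}{2\sigma^2}}\, dy
= \int_{-\infty}^{\infty} y\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{y^2}{2\sigma^2}}\, dy + \mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{y^2}{2\sigma^2}}\, dy = 0 + \mu,$$

since the first integrand is an odd function of y and the second integral is that of a N(0, σ²) p.d.f., which equals 1, and so μ is the mean. To obtain the variance,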

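$$\begin{aligned}
\mathrm{Var}(X) = E\left[(X-\mu)^2\right]
&= \int_{-\infty}^{\infty} (x-\mu)^2\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= 2\int_{\mu}^{\infty} (x-\mu)^2\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx
\end{aligned}$$

(since the function is symmetric about μ). We can obtain a gamma function by letting y = (x − μ)²/(2σ²), so that x − μ = σ√(2y) and dx = σ dy/√(2y). Then

$$\begin{aligned}
\mathrm{Var}(X) &= 2\int_{0}^{\infty} 2\sigma^2 y\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y}\, \frac{\sigma}{\sqrt{2y}}\, dy
= \frac{2\sigma^2}{\sqrt{\pi}} \int_{0}^{\infty} y^{1/2} e^{-y}\, dy \\
&= \frac{2\sigma^2}{\sqrt{\pi}}\, \Gamma\!\left(\tfrac{3}{2}\right)
= \frac{2\sigma^2}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2} = \sigma^2,
\end{aligned}$$

and so σ² is the variance. We now find the moment generating function of the N(μ, σ²) distribution.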

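If X has the N(μ, σ²) distribution, then

$$\begin{aligned}
M_X(t) = E\left(e^{tX}\right)
&= \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx \\
&= e^{\mu t + \sigma^2 t^2/2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\left\{-\frac{1}{2\sigma^2}\left[x - (\mu + t\sigma^2)\right]^2\right\} dx \\
&= e^{\mu t + \sigma^2 t^2/2},
\end{aligned}$$

where the last step follows since

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\left\{-\frac{1}{2\sigma^2}\left[x - (\mu + t\sigma^2)\right]^2\right\} dx$$

is just the integral of a N(μ + tσ², σ²) probability density function and is therefore equal to one. This confirms the values we already obtained for the mean and the variance of the normal distribution:

$$M_X'(0) = \left. e^{\mu t + \sigma^2 t^2/2}\,(\mu + t\sigma^2)\right|_{t=0} = \mu, \qquad M_X''(0) = \mu^2 + \sigma^2 = E(X^2),$$

from which we obtain

Var(X) = E(X²) − μ² = σ².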
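The results of this subsection (total probability 1, mean μ, variance σ², and moment generating function e^{μt + σ²t²/2}) can be sanity-checked numerically. The following is a rough sketch using a plain midpoint rule. The helper names `normal_pdf` and `integrate`, the values of μ, σ and t, and the integration range of μ ± 12σ are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) density; sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mu, sigma, t = 1.5, 2.0, 0.3             # illustrative values only
a, b = mu - 12 * sigma, mu + 12 * sigma  # tails beyond this range are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), a, b)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), a, b)
var = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), a, b)
mgf = integrate(lambda x: math.exp(t * x) * normal_pdf(x, mu, sigma), a, b)
mgf_formula = math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

print(total, mean, var)   # close to 1, mu and sigma^2
print(mgf, mgf_formula)   # close to each other
```

A numerical check of this kind does not replace the derivation; it only helps catch algebra slips.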

Finding Normal Probabilities Via N(0, 1) Tables

As noted above, F(x) does not have an explicit closed form so numerical computation is needed. The following result shows that if we can compute the c.d.f. for the standard normal distribution N(0, 1), then we can compute it for any other normal distribution N(μ, σ²) as well.

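Theorem 35 Let X ∼ N(μ, σ²) and define Z = (X − μ)/σ. Then Z ∼ N(0, 1) and

$$F_X(x) = P(X \le x) = F_Z\!\left(\frac{x-\mu}{\sigma}\right).$$

Proof: The fact that Z ∼ N(0, 1), with p.d.f.

$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \qquad -\infty < z < \infty,$$

follows immediately by change of variables. Alternatively, we can just note that

$$\begin{aligned}
F_X(x) &= \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}\, dy \\
&= \int_{-\infty}^{(x-\mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz \qquad \left(z = \frac{y-\mu}{\sigma}\right) \\
&= F_Z\!\left(\frac{x-\mu}{\sigma}\right). \qquad \square
\end{aligned}$$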

A table of probabilities FZ(z) = P(Z ≤ z) is given on the last page of these notes. A space-saving feature is that only the values for z > 0 are shown; for negative values we use the fact that the N(0, 1) p.d.f. is symmetric about 0. The following examples illustrate how to get probabilities for Z using the tables.

Examples: Find the following probabilities, where Z ∼ N(0, 1).

(a) P(Z ≤ 2.11)

(b) P(Z ≤ 3.40)

(c) P(Z > 1.06)

(d) P(Z < −1.06)

(e) P(−1.06 < Z < 2.11)

Solution:

a) Look up 2.11 in the table by going down the left column to 2.1 then across to the heading .01.

We find the number .9826. Then P (Z≤ 2.11) = F (2.11) = .9826. See Figure 9.6.

[Figure 9.6: The standard normal p.d.f. f(z) plotted against z, with the area 0.9826 to the left of z = 2.11 marked.]

b) P(Z ≤ 3.40) = F(3.40) = .9996631

c) P(Z > 1.06) = 1 − P(Z ≤ 1.06) = 1 − F(1.06) = 1 − .8554 = .1446

d) Now we have to use symmetry:

P(Z < −1.06) = P(Z > 1.06) = 1 − P(Z ≤ 1.06) = 1 − F(1.06) = .1446

See Figure 9.7.

[Figure 9.7: Two plots of the standard normal p.d.f. f(z) against z, one marking z = −1.06 and the other marking z = 1.06.]

e) P (−1.06 < Z < 2.11) = F (2.11) − F (−1.06)

= F (2.11) − P (Z ≤ −1.06) = F (2.11) − [1 − F (1.06)]

= .9826 − (1 − .8554) = .8380
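The table lookups in solutions a) to e) can be cross-checked in software. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library; the variable names are arbitrary. Small differences from the table values are expected because the table is rounded to four decimal places.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # the standard normal distribution N(0, 1)

p_a = Z.cdf(2.11)                    # P(Z <= 2.11)
p_b = Z.cdf(3.40)                    # P(Z <= 3.40), the value used in solution b)
p_c = 1.0 - Z.cdf(1.06)              # P(Z > 1.06)
p_d = Z.cdf(-1.06)                   # P(Z < -1.06); equals p_c by symmetry
p_e = Z.cdf(2.11) - Z.cdf(-1.06)     # P(-1.06 < Z < 2.11)

for label, p in zip("abcde", (p_a, p_b, p_c, p_d, p_e)):
    print(label, round(p, 4))
```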

In addition to using the tables to find the probabilities for given numbers, we sometimes are given the probabilities and asked to find the number. With R or S-Plus software, the function qnorm(p, μ, σ) gives the 100p-th percentile (where 0 < p < 1). We can also use tables to find desired values.

Examples:

a) Find a number c such that P(Z < c) = .85

b) Find a number d such that P(Z > d) = .90

c) Find a number b such that P(−b < Z < b) = .95

Solutions:

a) We can look in the body of the table to get an entry close to .8500. This occurs for z between 1.03 and 1.04; z = 1.04 gives the closest value to .85. For greater accuracy, the table at the bottom of the last page is designed for finding numbers, given the probability. Looking beside the entry .85 we find z = 1.0364.

b) Since P (Z > d) = .90 we have F (d) = P (Z ≤ d) = 1 − P (Z > d) = .10. There is no entry for which F (z) = .10 so we again have to use symmetry, since d will be negative.

P(Z ≤ d) = P(Z ≥ |d|) = 1 − F(|d|) = .10

Therefore F(|d|) = .90

Therefore |d| = 1.2816

Therefore d = −1.2816

[Figure: The standard normal p.d.f. f(z) plotted against z, with d and |d| marked and an area of 0.1 in each tail.]

The key to this solution lies in recognizing that d will be negative. If you can picture the situation it will probably be easier to handle the question than if you rely on algebraic manipulations.

Exercise: Will a be positive or negative if P (Z > a) = .05? What if P (Z < a) = .99?

c) If P (−b < Z < b) = .95 we again use symmetry.

[Figure 9.8: The standard normal p.d.f. f(z) plotted against z, with −b and b marked; the area between them is 0.95 and each tail has area 0.025.]

The probability outside the interval (−b, b) must be .05, and this is evenly split between the area above b and the area below −b.

Therefore P(Z < −b) = P(Z > b) = .025 and P(Z ≤ b) = .975

Looking in the table, b = 1.96.
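The three inverse lookups above (finding c, d and b from given probabilities) correspond to evaluating the inverse c.d.f., which is what `qnorm` does in R. A minimal sketch in Python, assuming `statistics.NormalDist.inv_cdf` from the standard library (Python 3.8 or later):

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

c = Z.inv_cdf(0.85)    # P(Z < c) = .85
d = Z.inv_cdf(0.10)    # P(Z > d) = .90 is the same as P(Z <= d) = .10
b = Z.inv_cdf(0.975)   # P(-b < Z < b) = .95 leaves .025 in each tail

print(round(c, 4), round(d, 4), round(b, 2))
```

Note that software handles the negative value d directly, so the symmetry step is only needed when working from a table that lists z > 0.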

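To find N(μ, σ²) probabilities in general, we use the theorem given earlier, which implies that if X ∼ N(μ, σ²) then

$$P(a \le X \le b) = P\!\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = F_Z\!\left(\frac{b-\mu}{\sigma}\right) - F_Z\!\left(\frac{a-\mu}{\sigma}\right),$$

where Z ∼ N(0, 1).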

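Example: Let X ∼ N(3, 25).

a) Find P(X < 2)

b) Find a number c such that P(X > c) = .95.

Solution:

a) Here σ = 5, so

$$P(X < 2) = P\!\left(\frac{X-3}{5} < \frac{2-3}{5}\right) = P(Z < -.20) = 1 - P(Z < .20) = 1 - .5793 = .4207$$

b)

$$P(X > c) = P\!\left(Z > \frac{c-3}{5}\right) = .95$$

Therefore (c − 3)/5 = −1.6449 and c = −5.2245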
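As a numerical cross-check of this example, the sketch below works with X ∼ N(3, 25) both directly and through the standardization Z = (X − 3)/5. It assumes Python 3.8 or later with `statistics.NormalDist`; note that the second constructor argument is the standard deviation 5, not the variance 25.

```python
from statistics import NormalDist

X = NormalDist(3.0, 5.0)   # mean 3, standard deviation 5 (variance 25)
Z = NormalDist(0.0, 1.0)

# a) P(X < 2), computed directly and via standardization
p_direct = X.cdf(2.0)
p_standardized = Z.cdf((2.0 - 3.0) / 5.0)

# b) P(X > c) = .95 means P(X <= c) = .05
c = X.inv_cdf(0.05)

print(round(p_direct, 4), round(p_standardized, 4), round(c, 4))
```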

Gaussian Distribution: The normal distribution is also known as the Gaussian[11] distribution. The notation X ∼ G(μ, σ) means that X has Gaussian (normal) distribution with mean μ and standard deviation σ. So, for example, if X ∼ N(1, 4) then we could also write X ∼ G(1, 2).

[11] After Johann Carl Friedrich Gauss (1777-1855), a German mathematician, physicist and astronomer. As a student he independently rediscovered Bode’s Law and the Binomial Theorem, showed that a regular 17-gon can be constructed with ruler and compass, and conjectured the prime number theorem. He used least squares (what is called statistical regression in most statistics courses) to predict the position of Ceres.

Example: The heights of adult males in Canada are close to normally distributed, with a mean of 69.0 inches and a standard deviation of 2.4 inches. Find the 10th and 90th percentiles of the height distribution. (Recall that the a-th percentile is such that a% of the population has height less than this value.)

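Solution: We are being told that if X is the height of a randomly selected Canadian adult male, then X ∼ G(69.0, 2.4), or equivalently X ∼ N(69.0, 5.76). To find the 90th percentile c, we use

$$P(X \le c) = P\!\left(\frac{X-69.0}{2.4} \le \frac{c-69.0}{2.4}\right) = P\!\left(Z \le \frac{c-69.0}{2.4}\right) = .90.$$

From the table, P(Z ≤ 1.2816) = .90, so (c − 69.0)/2.4 = 1.2816, or c = 72.08 inches. Similarly, P(Z ≤ −1.2816) = .10 gives (c − 69.0)/2.4 = −1.2816, or c = 65.92 inches, as the 10th percentile.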
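The percentile calculation for the height example can be reproduced as follows. This is a sketch assuming Python 3.8 or later; the function name `percentile` is a hypothetical helper, and it mirrors the table method by converting a standard normal percentile z into μ + σz.

```python
from statistics import NormalDist

def percentile(p, mu, sigma):
    """Return c with P(X <= c) = p for X ~ G(mu, sigma), computed as mu + sigma * z_p."""
    z_p = NormalDist(0.0, 1.0).inv_cdf(p)
    return mu + sigma * z_p

p10 = percentile(0.10, mu=69.0, sigma=2.4)
p90 = percentile(0.90, mu=69.0, sigma=2.4)
print(round(p10, 2), round(p90, 2))
```

By symmetry the two percentiles are equidistant from the mean of 69.0 inches.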

Linear Combinations of Independent Normal Random Variables

Linear combinations of normal r.v.’s are important in many applications. Since we have not covered continuous multivariate distributions, we can only quote the second and third of the following results without proof. The first result follows easily from the change of variables method.

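1. Let X ∼ N(μ, σ²) and Y = aX + b, where a and b are constant real numbers. Then Y ∼ N(aμ + b, a²σ²).

2. Let X ∼ N(μ₁, σ₁²) and Y ∼ N(μ₂, σ₂²) be independent, and let a and b be constants. Then aX + bY ∼ N(aμ₁ + bμ₂, a²σ₁² + b²σ₂²). In general if Xᵢ ∼ N(μᵢ, σᵢ²) are independent and aᵢ are constants, then

$$\sum_i a_i X_i \sim N\!\left(\sum_i a_i \mu_i,\ \sum_i a_i^2 \sigma_i^2\right).$$

3. Let X₁, X₂, . . . , Xₙ be independent N(μ, σ²) random variables. Then

$$\sum_{i=1}^{n} X_i \sim N(n\mu,\ n\sigma^2) \qquad \text{and} \qquad \bar{X} \sim N\!\left(\mu,\ \sigma^2/n\right).$$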

Actually, the only new result here is that the distributions are normal. The means and variances of linear combinations of r.v.’s were previously obtained in section 8.3.

Example: Let X ∼ N(3, 5) and Y ∼ N(6, 14) be independent. Find P (X > Y ).

Solution: Whenever we have variables on both sides of the inequality we should collect them on one side, leaving us with a linear combination.

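P(X > Y) = P(X − Y > 0)

Since X − Y ∼ N(3 − 6, 5 + 14) = N(−3, 19),

$$P(X - Y > 0) = P\!\left(Z > \frac{0-(-3)}{\sqrt{19}}\right) = P(Z > .69) = 1 - F(.69) = 1 - .7549 = .2451.$$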
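The rule for linear combinations of independent normal variables translates directly into code. The sketch below is an illustration under the assumption of independence; `linear_combination` is a hypothetical helper that takes variances (not standard deviations) to match the N(μ, σ²) notation, and it is applied to X − Y with X ∼ N(3, 5) and Y ∼ N(6, 14).

```python
import math
from statistics import NormalDist

def linear_combination(coeffs, means, variances):
    """Distribution of sum(a_i * X_i) for independent X_i ~ N(mean_i, variance_i)."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    variance = sum(a * a * v for a, v in zip(coeffs, variances))
    return NormalDist(mean, math.sqrt(variance))

# X - Y has coefficients (1, -1); the variances add because (-1)^2 = 1.
D = linear_combination([1, -1], means=[3, 6], variances=[5, 14])
p = 1.0 - D.cdf(0.0)   # P(X - Y > 0) = P(X > Y)
print(D.mean, round(D.variance, 6), round(p, 4))
```

A table-based answer rounds the z-value to two decimals, so it can differ from the software value in the third or fourth decimal place.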

Example: Three cylindrical parts are joined end to end to make up a shaft in a machine; 2 type A parts and 1 type B. The lengths of the parts vary a little, and have the distributions: A ∼ N(6, .4) and B ∼ N(35.2, .6). The overall length of the assembled shaft must lie between 46.8 and 47.5 or else the shaft has to be scrapped. Assume the lengths of different parts are independent. What percent of assembled shafts have to be scrapped?

Exercise: Why would it be wrong to represent the length of the shaft as 2A + B? How would this length differ from the solution given below?

Solution: Let L, the length of the shaft, be L = A₁ + A₂ + B. Then

L ∼ N(6 + 6 + 35.2, .4 + .4 + .6) = N(47.2, 1.4)

and so, dividing by the standard deviation √1.4,

$$P(46.8 < L < 47.5) = P\!\left(\frac{46.8-47.2}{\sqrt{1.4}} < Z < \frac{47.5-47.2}{\sqrt{1.4}}\right) = P(-.34 < Z < .25) = .2318.$$

i.e. 23.18% are acceptable and 76.82% must be scrapped. Obviously we have to find a way to reduce the variability in the lengths of the parts. This is a common problem in manufacturing.
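The shaft calculation can be checked numerically. In this sketch (Python 3.8 or later, standard library only; variable names are arbitrary) the length is modelled as the sum of three independent parts, as in the solution, so the variances .4, .4 and .6 are added and the standard deviation is the square root of their sum.

```python
import math
from statistics import NormalDist

mean_length = 6 + 6 + 35.2          # means of A1, A2 and B
var_length = 0.4 + 0.4 + 0.6        # variances add for independent parts
L = NormalDist(mean_length, math.sqrt(var_length))

p_ok = L.cdf(47.5) - L.cdf(46.8)    # P(46.8 < L < 47.5)
p_scrap = 1.0 - p_ok

print(round(100 * p_ok, 1), round(100 * p_scrap, 1))  # percentages
```

The software value differs slightly from .2318 because the table method rounds the z-values to −.34 and .25 before the lookup.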

Exercise: How could we reduce the percent of shafts being scrapped? (What if we reduced the variance of A and B parts each by 50%?)

Example: The heights of adult females in a large population are well represented by a normal distribution with mean 64 in. and variance 6.2 in².

(a) Find the proportion of females whose height is between 63 and 65 inches.

(b) Suppose 10 women are randomly selected, and let X̄ be their average height (i.e. $\bar{X} = \sum_{i=1}^{10} X_i/10$, where X₁, . . . , X₁₀ are the heights of the 10 women). Find P(63 ≤ X̄ ≤ 65).

(c) How large must n be so that a random sample of n women gives an average height X̄ so that P(|X̄ − μ| ≤ 1) ≥ .95?

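Solution:

(a) X ∼ N(64, 6.2) so for the height X of a random woman,

$$P(63 \le X \le 65) = P\!\left(\frac{63-64}{\sqrt{6.2}} \le Z \le \frac{65-64}{\sqrt{6.2}}\right) = P(-0.402 \le Z \le 0.402) = 0.312.$$

(b) X̄ ∼ N(64, 6.2/10) = N(64, .62), so

$$P(63 \le \bar{X} \le 65) = P\!\left(\frac{63-64}{\sqrt{.62}} \le Z \le \frac{65-64}{\sqrt{.62}}\right) = P(-1.27 \le Z \le 1.27) = 0.796.$$

(c) If X̄ ∼ N(64, 6.2/n) then

$$P(|\bar{X} - 64| \le 1) = P\!\left(|Z| \le \frac{\sqrt{n}}{\sqrt{6.2}}\right) \ge .95$$

requires √n/√6.2 ≥ 1.96, since P(|Z| ≤ 1.96) = .95. That is, n ≥ (1.96)²(6.2) = 23.82, so we need n = 24.

Remark: This shows that if we were to select a random sample of n = 24 persons, then their average height X̄ would be within 1 inch of the average height μ of the whole population of women with probability at least .95. So if we did not know μ then we could estimate it to within ±1 inch (with probability .95) by taking a sample this small.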

Exercise: Find how large n would have to be to make P(|X̄ − μ| ≤ .5) ≥ .95.

These ideas form the basis of statistical sampling and estimation of unknown parameter values in populations and processes. If X ∼ N(μ, σ²) and we know roughly what σ is, but don’t know μ, then we can use the fact that X̄ ∼ N(μ, σ²/n) to find the probability that the mean X̄ from a sample of size n will be within a given distance of μ.
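The use of X̄ ∼ N(μ, σ²/n) can be sketched in code for the female heights example (mean 64, variance 6.2). The helpers `prob_within` and `min_sample_size` are hypothetical names introduced here. They assume σ² is known and that the observations are independent and normally distributed.

```python
import math
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)

def prob_within(distance, variance, n=1):
    """P(|Xbar - mu| <= distance) when Xbar ~ N(mu, variance / n)."""
    z = distance / math.sqrt(variance / n)
    return Z.cdf(z) - Z.cdf(-z)

def min_sample_size(distance, variance, prob):
    """Smallest n with P(|Xbar - mu| <= distance) >= prob."""
    z = Z.inv_cdf((1.0 + prob) / 2.0)       # e.g. about 1.96 for prob = .95
    return math.ceil(z * z * variance / distance ** 2)

p_single = prob_within(1.0, 6.2)            # one woman: P(63 <= X <= 65)
p_mean10 = prob_within(1.0, 6.2, n=10)      # average of 10: P(63 <= Xbar <= 65)
n_needed = min_sample_size(1.0, 6.2, 0.95)  # sample size for a margin of 1 inch

print(round(p_single, 3), round(p_mean10, 3), n_needed)
```

The required n grows with the variance and with the inverse square of the allowed distance, which is why narrowing the margin is costly in sample size.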

Problems:

9.5.1 Let X ∼ N(10, 4) and Y ∼ N(3, 100) be independent. Find the probability that

a) 8.4 < X < 12.2

b) 2Y > X

c) Ȳ < 0, where Ȳ is the sample mean of 25 independent observations on Y.

9.5.2 Let X have a normal distribution. What percent of the time does X lie within one standard deviation of the mean? Two standard deviations? Three standard deviations?

9.5.3 Let X ∼ N(5, 4). An independent variable Y is also normally distributed with mean 7 and standard deviation 3. Find:

(a) The probability 2X differs from Y by more than 4.

(b) The minimum number, n, of independent observations needed on X so that P(|X̄ − 5| < 0.1) ≥ .98. ($\bar{X} = \sum_{i=1}^{n} X_i/n$ is the sample mean.)