6-1: Normal Distributions

Random variables can be either discrete or continuous. Continuous variables, such as the heights of adults or the body temperatures of rats, can assume all values between any two given values of the variable. Although no variable fits a normal distribution perfectly, the normal distribution can be used to describe many variables because the deviations from a normal distribution are small.

Objective 1. Identify the Properties of a Normal Distribution.

Definition: Normal Distribution

If a random variable has a probability distribution whose graph is continuous, bell-shaped, and symmetric, it is called a normal distribution. The graph is called a normal distribution curve.

Mathematical Equation for a Normal Distribution

$$y = \frac{e^{-(x-\mu)^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}$$

where $e \approx 2.718$; $\pi \approx 3.14$; $\mu$ = population mean; $\sigma$ = population standard deviation.

The shape and position of the normal distribution curve depend on the parameters population mean and population standard deviation.
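The equation can be evaluated directly to see how the two parameters shape the curve. The following is a minimal Python sketch, not part of the original notes; the function name `normal_pdf` and the chosen values of the mean and standard deviation are illustrative assumptions.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height y of the normal curve at x for mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Same mean, different standard deviations: the larger sigma gives a lower peak.
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=1.0)
peak_wide = normal_pdf(0.0, mu=0.0, sigma=2.0)
print(peak_narrow, peak_wide)

# Cross-check against the standard library's own density function.
library_value = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)
print(library_value)
```

Because the total area under every normal curve is the same, a lower peak means the curve is more spread out.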

When the population means are the same, but the population standard deviations are different, the curve with the larger population standard deviation is more spread out. When the population standard deviations are the same, but the means are different, the curves have the same shape but are centered at different points along the x-axis.

Summary of the Properties of the Theoretical Normal Distribution

A normal distribution curve is bell-shaped.

The mean, median, and mode are located at the center of the distribution. The normal distribution curve has exactly one mode.

The normal distribution curve is symmetric about the mean.

The normal distribution curve is continuous, having no gaps or holes, for all values of the independent variable.

The curve approaches, but never meets, the x-axis: it gets increasingly close without ever touching it.

The total area under a normal distribution curve is equal to one.

The area under the part of the normal distribution that lies within one standard deviation of the mean is approximately 68%; within two standard deviations of the mean is approximately 95%; and within three standard deviations, approximately 99.7%. This is also known as the empirical rule.
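The empirical rule percentages can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The computed areas are slightly more precise than the rounded 68%, 95%, and 99.7% quoted in the rule.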

Objective 2. Identify Distributions as Symmetric or Skewed

The normal distribution is symmetric about the mean, that is, the data is evenly distributed about the mean. When the majority of the data values fall either to the left or right of the mean, the distribution is skewed. When the majority of the data is to the right of the mean, the distribution is negatively-skewed or left-skewed. The mean is less than or to the left of the median. When the majority of the data is to the left of the mean, the distribution is positively-skewed or right-skewed. The mean is greater than or to the right of the median. The “tail” of the curve indicates the direction of the skewness (right is positive and left is negative).

Objective 3. Find the Area Under the Standard Normal Distribution, Given Various z Values.

Definition: Standard Normal Distribution

The Standard Normal Distribution is a normal distribution with a mean of zero and a standard deviation of one.

The formula for the standard normal distribution is

$$y = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$$

All normally distributed variables can be transformed into the standard normally distributed variable by using the formula for the standard score:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad\text{or}\quad z = \frac{X - \mu}{\sigma}$$

The standard score, or z-score, is the number of standard deviations that a particular X value is from the mean. Table E in Appendix A gives the area (to four decimal places) under the standard normal curve for z values from -3.49 to 3.49. Technology can also be used to find the area under the standard normal distribution curve.

Finding Areas Under the Standard Normal Distribution Curve

Step 1: Draw the normal distribution curve and shade the area under consideration.

Step 2: Choose the appropriate type of problem and either use Table E in Appendix A or technology to find the area.

The three types of problems, each with an example:

1. To find the area under the standard normal distribution curve to the left of a z value, find the z value in Table E (Appendix A) and use the area given.

Example: Find the area to the left of z = 1.32. The area to the left of 1.32 in Table E is 0.9066.

2. To find the area under the standard normal distribution curve to the right of a z value, find the z value in Table E (Appendix A) and subtract the area given from 1.

Example: Find the area to the right of z = −0.67. The area to the left of −0.67 in Table E is 0.2514. The area to the right of −0.67 is 1 – 0.2514 = 0.7486.

3. To find the area under the standard normal distribution curve between two z values, find both z values in Table E (Appendix A) and subtract the corresponding areas.

Example: Find the area between z = 1.45 and z = 2.32. The area to the left of 1.45 is 0.9265. The area to the left of 2.32 is 0.9898. The area between 1.45 and 2.32 is 0.9898 – 0.9265 = 0.0633.
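Where technology is used instead of Table E, the three cases reduce to one cumulative-area function. The sketch below uses Python's `statistics.NormalDist`; the helper names are assumptions chosen for illustration. Results can differ from table-based answers in the fourth decimal place, because the table rounds each area before subtracting.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_left(z: float) -> float:
    """Area to the left of z (what Table E reports)."""
    return std_normal.cdf(z)


def area_right(z: float) -> float:
    """Area to the right of z: subtract the left area from 1."""
    return 1.0 - std_normal.cdf(z)


def area_between(z_low: float, z_high: float) -> float:
    """Area between two z values: subtract the two left areas."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


left = area_left(1.32)
right = area_right(-0.67)
between = area_between(1.45, 2.32)
print(f"{left:.4f} {right:.4f} {between:.4f}")
```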

For the following examples, use Table E or technology.

Example 6 – 1. Find Areas Between Two Given Z Values

Find the area under the standard normal distribution curve:

a. Between z = 0 and z = −0.32. Solution: 0.5000 – 0.3745 = 0.1255

b. To the right of z = 1.23. Solution: 1 – 0.8907 = 0.1093

c. To the left of z = −1.03. Solution: 0.1515

d. Between z = −2.15 and z = 0.97. Solution: 0.8340 – 0.0158 = 0.8182

Example 6 – 2. Find Probabilities for Regions Between Two Z Values

Find the probabilities using the standard normal distribution curve:

a. P(0 < z < 0.92). Solution: 0.8212 – 0.5000 = 0.3212

b. P(z < 2.77). Solution: 0.9972

c. P(z > 1.12). Solution: 1 – 0.8686 = 0.1314

d. P(−1.20 < z < 1.56). Solution: 0.9406 – 0.1151 = 0.8255

Example 6 – 3. Find the Z Values for Specific Areas in the Tails

Find two z values, one positive and one negative that are equidistant from the mean so that the area in the two tails totals 5%; 2%; and 1%.

Solutions:

The two z values have 95% between them and, thus, 5% in the tails, meaning each tail contains 2.5% or 0.0250. The z value with 0.0250 to the left is −1.96. The z value for the area of 0.0250 to the right, so 0.9750 on the left, is 1.96.

The two z values have 98% between them and, thus, 2% in the tails, meaning each tail contains 1% or 0.0100. The z value with 0.0099 (the closest value in the table to 0.0100) to the left is −2.33. The z value for the area of 0.0099 to the right, so 0.9901 on the left, is 2.33.

The two z values have 99% between them and, thus, 1% in the tails, meaning each tail contains 0.5% or 0.0050. The z value with an area of 0.0050 to the left is halfway between −2.57 and −2.58. In this case, choose the z value with the greater absolute value. Thus the z value would be −2.58. The z value for the area of 0.0050 to the right, so 0.9950 on the left, is 2.58.
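Going from an area back to a z value is the inverse of a table lookup. A minimal sketch with `NormalDist.inv_cdf` from the Python standard library follows; the function name `symmetric_z` is an assumption. Software returns unrounded values (about 1.960, 2.326, and 2.576), which round to the table answers above.

```python
from statistics import NormalDist

std_normal = NormalDist()


def symmetric_z(total_tail_area: float) -> float:
    """Positive z such that the two tails beyond -z and +z together hold total_tail_area."""
    return std_normal.inv_cdf(1.0 - total_tail_area / 2.0)


z_values = {tail: symmetric_z(tail) for tail in (0.05, 0.02, 0.01)}
for tail, z in z_values.items():
    print(f"tails total {tail:.0%}: z = -{z:.2f} and z = +{z:.2f}")
```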

6-2: Applications of the Normal Distribution

Objective 4. Find Probabilities for a Normally Distributed Variable by Transforming it Into a Standard Normal Variable.

The standard normal distribution curve can be used to solve many practical problems. Assume the variables presented here are all normally or approximately normally distributed.

First, we transform the original variable to a standard normal distribution variable using the formula

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad\text{or}\quad z = \frac{X - \mu}{\sigma}$$

rounding the result to two decimal places.

To Find the Area Under Any Normal Curve

Step 1 Draw a normal curve and shade the desired area.

Step 2 Convert the value of X to z using the formula $z = \frac{X - \mu}{\sigma}$.

Step 3 Find the corresponding area using a table, calculator, or software.

Example 6 – 4. Liters of Blood

An adult has on average 5.2 liters of blood. Assume the variable is normally distributed and has a standard deviation of 0.3. Find the percentage of people who have more than 5.4 liters of blood in their system.

Solution.

Step 1: Draw a normal curve and shade the area to the right of 5.4.

Step 2: Use the formula to find the z value corresponding to 5.4.

$$z = \frac{X - \mu}{\sigma} = \frac{5.4 - 5.2}{0.3} = \frac{0.2}{0.3} \approx 0.67$$

Step 3: Use Table E, a calculator, or computer software to determine the area to the right of z = 0.67. (Notice this value is rounded.) Table E gives the area under the standard normal curve to the left of the z score, so to find the area to the right, subtract the table value from 1.

1 – 0.7486 = 0.2514

Therefore, 25.14% of adults have greater than 5.4 liters of blood in their system.
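The same calculation can be sketched in Python with `statistics.NormalDist`. The variable names are illustrative assumptions. The sketch computes the answer twice: once with z rounded to two decimal places, as a table lookup requires, and once without rounding, which gives a slightly different percentage (about 25.2%).

```python
from statistics import NormalDist

blood = NormalDist(mu=5.2, sigma=0.3)

# Step 2: convert X = 5.4 to a z value.
z_exact = (5.4 - blood.mean) / blood.stdev
z_rounded = round(z_exact, 2)

# Step 3: area to the right = 1 - area to the left.
p_table_style = 1.0 - NormalDist().cdf(z_rounded)  # uses z = 0.67, as the table does
p_exact = 1.0 - blood.cdf(5.4)                     # no rounding of z

print(f"z = {z_rounded}, table-style area = {p_table_style:.4f}, unrounded area = {p_exact:.4f}")
```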

Example 6 – 5. Monthly Mortgage Payments

The average monthly mortgage payment including principal and interest is $982 in the United States. If the standard deviation is approximately $180 and the mortgage payments are approximately normally distributed, find the probability that a randomly selected monthly payment is a) more than $1000; b) more than $1475; and c) between $800 and $1150.

Solution:

Step 1: Draw a normal curve for each part and shade the desired area:

a) monthly payments of more than $1000;

b) monthly payments of more than $1475;

c) monthly payments of between $800 and $1150.

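Step 2: Find the z value for each boundary.

a) P(X > 1000): $z = \frac{1000 - 982}{180} = \frac{18}{180} = 0.10$

b) P(X > 1475): $z = \frac{1475 - 982}{180} = \frac{493}{180} \approx 2.74$

c) P(800 < X < 1150): $z = \frac{800 - 982}{180} = \frac{-182}{180} \approx -1.01$ and $z = \frac{1150 - 982}{180} = \frac{168}{180} \approx 0.93$

Step 3: Find the appropriate areas.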

a) The area to the left of z = 0.10 is 0.5398.

P(X > 1000) = 1 – 0.5398 = 0.4602

b) The area to the left of z = 2.74, from the cumulative standard normal distribution, is 0.9969.

P(X > 1475) = 1 – 0.9969 = 0.0031

c) The area to the left of z = −1.01 is 0.1562, and the area to the left of z = 0.93 is 0.8238.

P(800 < X < 1150) = 0.8238 – 0.1562 = 0.6676
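With software, the conversion to z is handled internally. The sketch below, using `statistics.NormalDist`, works with the payment amounts directly; the variable names are assumptions. Because z is not rounded to two decimal places, part c) comes out near 0.669 rather than the table-based 0.6676.

```python
from statistics import NormalDist

payments = NormalDist(mu=982, sigma=180)

p_more_than_1000 = 1.0 - payments.cdf(1000)           # part a
p_more_than_1475 = 1.0 - payments.cdf(1475)           # part b
p_between = payments.cdf(1150) - payments.cdf(800)    # part c

print(f"a) {p_more_than_1000:.4f}  b) {p_more_than_1475:.4f}  c) {p_between:.4f}")
```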

Objective 5. Find Specific Data Values for Given Percentages, Using the Standard Normal Distribution.

Using the formula $z = \frac{X - \mu}{\sigma}$, solve for X and use this new equation to find the specific data value for a given percentage.

Solve for X:

$$z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad z \cdot \sigma = X - \mu \quad\Rightarrow\quad z \cdot \sigma + \mu = X$$

Formula for Finding the Value of a Normal Variable X

$$X = z \cdot \sigma + \mu$$

Finding Data Values for Specific Probabilities

Step 1 Draw a normal curve and shade the desired area that represents the probability, proportion, or percentile.

Step 2 Find the z value corresponding to the desired area from the table, calculator, or computer software.

Step 3 Calculate the X value using the formula $X = z \cdot \sigma + \mu$.

Example 6 – 6. Police Academy Qualifications

To qualify for a police academy, candidates must score in the top 10% on a general abilities test. Assume the test scores are normally distributed and the test has a mean of 200 and a standard deviation of 20. Find the lowest possible score to qualify.

Solution:

Step 1. Draw a normal distribution curve and shade the right hand 10% of the area.

Step 2. The area to the left of the 0.1 shaded on the right is 1 – 0.1 = 0.9. Find the z value from Table E (or calculator or computer software) corresponding to 0.9 area to the left.

z = 1.28

Step 3. Use this value in the formula $X = z \cdot \sigma + \mu$ to find X.

$$X = 1.28 \cdot 20 + 200 = 225.6$$

Conclusion: A score of 225.6 should be used for the cutoff. Anyone scoring 225.6 or higher qualifies for the academy.
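The three steps can be sketched in Python using `NormalDist.inv_cdf`, which returns the value with a given area to its left. The variable names are assumptions made for illustration. The unrounded z value is about 1.2816, so the computed cutoff (about 225.63) differs slightly from the table-based 225.6.

```python
from statistics import NormalDist

scores = NormalDist(mu=200, sigma=20)
top_fraction = 0.10

# Step 2: z value with 1 - 0.10 = 0.90 of the area to its left.
z_cutoff = NormalDist().inv_cdf(1.0 - top_fraction)

# Step 3: X = z * sigma + mu.
cutoff_from_formula = z_cutoff * scores.stdev + scores.mean

# The same value obtained directly from the distribution of scores.
cutoff_direct = scores.inv_cdf(1.0 - top_fraction)

print(f"z = {z_cutoff:.2f}, cutoff = {cutoff_from_formula:.2f}")
```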

Example 6 – 7. Qualifying Test Scores

In order to qualify for a medical study, an applicant must have a systolic blood pressure in the middle 50% of the range. If the systolic blood pressure is normally distributed with a mean of 120 and a standard deviation of 4, find the upper and lower limits of blood pressure a person must have in order to qualify for the study.

Solution:

Step 1. Draw. Mark the middle 50%. This leaves 25% to the left of the middle and 25% to the right of the middle.

Step 2. Use the table or technology to find the z values that have areas of 0.25 and 0.75 to their left.

Look up the z values. z = −0.67 and z = 0.67

Step 3. Use the formula to calculate the two values of X, using each of the z values, 120 as the mean and 4 as the standard deviation.

$$X = -0.67 \cdot 4 + 120 = 117.32$$

$$X = 0.67 \cdot 4 + 120 = 122.68$$
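A central interval such as this one can be computed by splitting the leftover area equally between the two tails. The following is a sketch under that assumption; `central_interval` is a hypothetical helper name, not something defined in these notes.

```python
from statistics import NormalDist

pressure = NormalDist(mu=120, sigma=4)


def central_interval(dist: NormalDist, middle_fraction: float) -> tuple[float, float]:
    """Lower and upper limits that enclose the middle fraction of the distribution."""
    tail = (1.0 - middle_fraction) / 2.0  # area left in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


lower, upper = central_interval(pressure, 0.50)
print(f"qualifying range: {lower:.2f} to {upper:.2f}")
```

The limits agree with the table-based answers to within rounding of z.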

Determine Normality

There are a variety of ways to determine if a distribution of values is normally or approximately normally distributed.

The easiest way to check for normality is to draw a histogram for the data and check its shape.

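Another is to find the Pearson coefficient (PC) of skewness, also called Pearson's index of skewness:

$$PC = \frac{3(\bar{X} - \text{median})}{s}$$

where $\bar{X}$ is the sample mean and $s$ is the sample standard deviation.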

If the index is greater than or equal to +1 or less than or equal to −1, it can be concluded that the data are significantly skewed.

Check the data for outliers by finding the outlier boundaries, defined by

Lower outlier boundary = $Q_1 - 1.5(\text{IQR})$

Upper outlier boundary = $Q_3 + 1.5(\text{IQR})$, where $\text{IQR} = Q_3 - Q_1$.

Example 6 – 8. Technology Innovations

A survey of 18 high-tech firms showed the number of days’ inventory they had on hand. Determine if the data are approximately normally distributed.

5 29 34 44 45 63 68 74 74

81 88 91 97 98 113 118 151 158

Solution:

Construct a frequency distribution.

Class        Frequency
5 – 29       2
30 – 54      3
55 – 79      4
80 – 104     5
105 – 129    2
130 – 154    1
155 – 179    1

Draw the histogram.

Use the formula for the Pearson Coefficient of Skewness. Determine if there is an indication that the distribution is significantly skewed.

$$PC = \frac{3(\bar{X} - \text{median})}{s} = \frac{3(79.5 - 77.5)}{40.454} \approx 0.148$$

This does not indicate significant skewness.

Check for outliers using the formulas for the lower and upper outlier boundaries, where IQR = 98 − 45 = 53.

Lower outlier boundary = $Q_1 - 1.5(\text{IQR}) = 45 - 1.5(53) = -34.5$

Upper outlier boundary = $Q_3 + 1.5(\text{IQR}) = 98 + 1.5(53) = 177.5$

There are no outliers.

Using these three indicators, the distribution is approximately normally distributed.

Example 6 – 9. Number of Runs Made

The data represent the number of runs made each year during Bill Mazeroski’s career. Check for normality.

30 59 69 50 58 71 55 43 66

52 56 62 36 13 29 17 3

Solution:

Construct a frequency distribution and draw a histogram.

Class      Frequency
0 – 9      1
10 – 19    2
20 – 29    1
30 – 39    2
40 – 49    1
50 – 59    6
60 – 69    3
70 – 79    1

Use the formula for the Pearson Coefficient of Skewness. Determine if there is an indication that the distribution is significantly skewed.

$$PC = \frac{3(\bar{X} - \text{median})}{s} = \frac{3(45.235 - 52)}{20.584} \approx -0.986$$

The value is close to -1, so the coefficient indicates there could be significant skewness. Check for outliers using the formulas for the lower and upper outlier boundaries.

Lower outlier boundary = $Q_1 - 1.5(\text{IQR}) = 29.5 - 1.5(31) = -17$

Upper outlier boundary = $Q_3 + 1.5(\text{IQR}) = 60.5 + 1.5(31) = 107$

where IQR = 60.5 − 29.5 = 31. There are no outliers.

Using these three indicators, the distribution does not appear to be normally distributed.
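The numerical checks in Examples 6 – 8 and 6 – 9 can be reproduced with a short Python sketch. The function names are illustrative assumptions. One further assumption matters: quartiles are computed here as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), which reproduces the quartiles used above; other software may use a different quartile convention and give slightly different boundaries.

```python
from statistics import mean, median, stdev


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2          # the middle value is excluded when n is odd
    return median(ordered[:half]), median(ordered[-half:])


def normality_summary(values):
    """Pearson coefficient of skewness, outlier boundaries, and any outliers found."""
    pc = 3 * (mean(values) - median(values)) / stdev(values)  # stdev = sample s
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"pc": pc, "low": low, "high": high, "outliers": outliers}


inventory_days = [5, 29, 34, 44, 45, 63, 68, 74, 74,
                  81, 88, 91, 97, 98, 113, 118, 151, 158]
runs = [30, 59, 69, 50, 58, 71, 55, 43, 66,
        52, 56, 62, 36, 13, 29, 17, 3]

inventory = normality_summary(inventory_days)
mazeroski = normality_summary(runs)
print(inventory)
print(mazeroski)
```

The histogram check is still done by eye; this sketch covers only the skewness coefficient and the outlier boundaries.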

6-3: The Central Limit Theorem

Objective 6. Use the Central Limit Theorem to Solve Problems Involving Sample Means for Large Samples.

Suppose you select a sample of 30 adult males and find the mean triglyceride level for the sample to be 187 mg/dl. You select a second sample and find its mean to be 192 mg/dl. Continue selecting until you have 100 samples and 100 sample means. If the sample mean is treated as a random variable, the sample means form a sampling distribution of sample means.

Definition: Sampling Distribution of Sample Means

A sampling distribution of sample means is a distribution using the means computed from all possible random samples of a specific size taken from a population.

The sample means of samples randomly selected with replacement will be somewhat different from the population mean. The differences are caused by sampling error.

Definition: Sampling Error

Sampling error is the difference between the sample measure and the corresponding population measure due to the fact that the sample is not a perfect representation of the population.

Properties of the Distribution of Sample Means

1. The mean of the sample means will be the same as the population mean.

2. The standard deviation of the sample means will be smaller than the standard deviation of the population, and it will be equal to the population standard deviation divided by the square root of the sample size.

Example 6 – 10. The Central Limit Theorem.

This example illustrates the three properties of the Central Limit Theorem.

Suppose an 8-point quiz was given to four students and the results were 2, 4, 6, and 8. The population mean is 5 and the population standard deviation is 2.236.

Find all possible samples of size 2 taken with replacement and find the mean of each sample. Then using the frequency distribution of the sample means, find the mean of the sample means and the standard deviation of the sample means, also known as the standard error of the mean.

Sample    Mean      Sample    Mean
2,2       2         6,2       4
2,4       3         6,4       5
2,6       4         6,6       6
2,8       5         6,8       7
4,2       3         8,2       5
4,4       4         8,4       6
4,6       5         8,6       7
4,8       6         8,8       8

Frequency distribution of the sample means:

Sample mean    Frequency
2              1
3              2
4              3
5              4
6              3
7              2
8              1

The mean of the sample means, denoted by $\mu_{\bar{x}}$, is 5, which is the mean of the population. Thus, $\mu_{\bar{x}} = \mu$.

The standard deviation of the sample means, denoted by $\sigma_{\bar{x}}$, is $\frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} \approx 1.581$, which is the standard deviation of the population divided by the square root of the sample size. Thus, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

Draw the histogram of the distribution of sample means.
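The enumeration in this example is small enough to verify by brute force. The sketch below lists every sample of size 2 drawn with replacement, then checks both properties; the variable names are assumptions made for illustration. Population (not sample) standard deviations are used, via `statistics.pstdev`, because the four scores are treated as the whole population.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

scores = [2, 4, 6, 8]   # the whole population of quiz results
n = 2                   # sample size

# All 16 ordered samples of size 2 taken with replacement, and their means.
sample_means = [mean(sample) for sample in product(scores, repeat=n)]
frequency = Counter(sample_means)

mean_of_means = mean(sample_means)       # should equal the population mean
std_error = pstdev(sample_means)         # standard deviation of the sample means
predicted = pstdev(scores) / n ** 0.5    # sigma / sqrt(n)

print(sorted(frequency.items()))
print(mean_of_means, round(std_error, 3), round(predicted, 3))
```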

The Central Limit Theorem

As the sample size n increases without limit, the shape of the distribution of the sample means taken with replacement from a population with mean $\mu$ and standard deviation $\sigma$ will approach a normal distribution with a mean of $\mu$ and a standard deviation of $\frac{\sigma}{\sqrt{n}}$.

For samples with sufficient size, the central limit theorem can be used to answer questions about sample means in the same manner that a normal distribution can be used to answer questions about individual values. The difference is that the formula for z becomes

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

1. When the original variable is normally distributed, the distribution of the sample means will be normally distributed for any sample size n.

2. When the distribution of the original variable is not normal, a sample size of 30 or more is needed to use a normal distribution to approximate the distribution of the sample means.

Example 6 – 11. Working Weekends

The average time spent by construction workers who work on weekends is 7.93 hours (over a period of 2 days). Assume the distribution is approximately normal and has a standard deviation of 0.8 hours.

a. Find the probability that an individual who works at that trade works fewer than 8 hours on the weekend.

b. If a sample of 40 construction workers is randomly selected, find the probability that the mean of the sample will be less than 8 hours.

Solution:

a. Find the probability that an individual who works at that trade works fewer than 8 hours on the weekend.

Step 1: Draw a normal curve and shade the desired area for an individual worker.

Step 2: Use the formula to find the z value corresponding to an individual who works fewer than 8 hours.

$$z = \frac{X - \mu}{\sigma} = \frac{8 - 7.93}{0.8} = 0.0875 \approx 0.09$$

Step 3: Use Table E, a calculator, or computer software to determine the area to the left of z = 0.09.

Therefore, the probability that an individual who works at that trade works fewer than 8 hours on the weekend is 0.5359.

b. If a sample of 40 construction workers is randomly selected, find the probability that the mean of the sample will be less than 8 hours.

Step 1: Draw a normal curve and shade the desired area for the mean of the sample of 40 construction workers.

Step 2: Use the formula to find the z value corresponding to a sample mean of 8 hours for 40 construction workers.

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{8 - 7.93}{0.8/\sqrt{40}} \approx 0.55$$

Step 3: Use Table E, a calculator, or computer software to determine the area to the left of z = 0.55.

Therefore, the probability that a sample of 40 construction workers has a mean of fewer than 8 hours on the weekend is 0.7088.
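The contrast between parts a and b comes entirely from the standard deviation used. A minimal Python sketch follows, with assumed variable names. It does not round z, so it gives about 0.535 and 0.710 instead of the table-based 0.5359 and 0.7088.

```python
import math
from statistics import NormalDist

mu, sigma, n = 7.93, 0.8, 40

# Part a: one worker, so the standard deviation is sigma itself.
individual = NormalDist(mu, sigma)

# Part b: mean of 40 workers, so the standard deviation is sigma / sqrt(n).
sample_mean = NormalDist(mu, sigma / math.sqrt(n))

p_individual = individual.cdf(8)
p_sample_mean = sample_mean.cdf(8)

print(f"individual: {p_individual:.4f}, mean of {n}: {p_sample_mean:.4f}")
```

The probability is larger for the sample mean because sample means vary less than individual values, so more of their distribution lies below 8 hours.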

Finite Population Correction Factor

The formula for the standard error of the mean, $\frac{\sigma}{\sqrt{n}}$, is accurate when samples are drawn with replacement or drawn without replacement from a very large or infinite population. When sampling from a finite population without replacement, a correction factor of $\sqrt{\frac{N - n}{N - 1}}$ is needed, where N is the population size and n is the sample size. That is, we use

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$

So the formula for z values becomes

$$z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \cdot \sqrt{\dfrac{N - n}{N - 1}}}$$

If the population is large, but the sample is small, the correction factor is close to 1, so it is not used.
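The effect of the correction factor can be seen with a few numbers. In the sketch below, the function name `standard_error` and the values of sigma, n, and N are hypothetical and chosen only for illustration; they do not come from an example in these notes.

```python
import math


def standard_error(sigma, n, population_size=None):
    """sigma / sqrt(n), times the finite population correction when a size N is given."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Hypothetical values for illustration only.
se_infinite = standard_error(sigma=15.0, n=50)
se_small_pop = standard_error(sigma=15.0, n=50, population_size=500)
se_large_pop = standard_error(sigma=15.0, n=50, population_size=1_000_000)

print(se_infinite, se_small_pop, se_large_pop)
```

When the sample is a tenth of the population the standard error shrinks noticeably; when the population is a million, the corrected and uncorrected values are nearly identical.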

6-4: The Normal Approximation to the Binomial Distribution

Objective 7. Use the Normal Approximation to Compute Probabilities for a Binomial Variable.

Recall that a binomial distribution has the following characteristics:

1. There must be a fixed number of trials.

2. The outcome of each trial must be independent.

3. Each experiment can have only two outcomes, or outcomes that can be reduced to two outcomes.

4. The probability of a success must remain the same for each trial.

Also,

$$\mu = n \cdot p \quad\text{and}\quad \sigma = \sqrt{n \cdot p \cdot (1 - p)}$$

When n is large, that is, 100 or more, the normal distribution can sometimes be used to approximate the binomial distribution. When the probability of success is close to 0.5, the shape of the binomial distribution is approximately normal. Statisticians agree, as a rule of thumb, that the normal distribution should be used only when $n \cdot p \ge 5$ and $n \cdot q \ge 5$, where $q = 1 - p$. In addition, a continuity correction should be used when employing the normal distribution to represent a discrete distribution. The continuity correction means that for any specific value of X, say 8, the boundaries of X in the binomial distribution (in this case, 7.5 to 8.5) must be used.

Summary of the Normal Approximation to the Binomial Distribution

Binomial               Normal

When finding:          Use:

1. P(X = a)            P(a − 0.5 < X < a + 0.5)
2. P(X ≥ a)            P(X > a − 0.5)
3. P(X > a)            P(X > a + 0.5)
4. P(X ≤ a)            P(X < a + 0.5)
5. P(X < a)            P(X < a − 0.5)

Example 6 –12. Population of College Cities

College students often make up a substantial portion of the population of college cities and towns. State College, Pennsylvania, ranks first with 71.1% of its population made up of college students. What is the probability that in a random sample of 150 people from State College, more than 50 are not college students?

Solution:

Step 1. Check if the normal approximation can be used.

p represents the probability that a person in State College is not a college student. Thus, if the probability that a citizen is a college student is 0.711, the probability that a citizen is not a college student is 0.289.

n = 150; n ≥ 100

p = 0.289; np = 150(0.289) = 43.35 ≥ 5; nq = 150 − 43.35 = 106.65 ≥ 5

The normal distribution can be used.

Step 2. Write the probability notation that models the problem.

What is the probability that in a random sample of 150 people from State College, more than 50 are not college students?

P(more than 50 are not college students) = P(X > 50)

Step 3. Find the mean and standard deviation of the distribution.

$$\mu = n \cdot p = 150(0.289) = 43.35$$

$$\sigma = \sqrt{n \cdot p \cdot (1 - p)} = \sqrt{150 \cdot 0.289 \cdot 0.711} = \sqrt{30.8218} \approx 5.552$$

Step 4. Rewrite the problem using the continuity correction factor.

P(X > 50) = P(X > 50 + 0.5) = P(X > 50.5)

Procedure for the Normal Approximation to the Binomial Distribution

Step 1 Check to see whether the normal approximation can be used.

Step 2 Find the mean $\mu$ and the standard deviation $\sigma$.

Step 3 Write the problem in probability notation, using X.

Step 4 Rewrite the problem by using the continuity correction factor, and show the corresponding area under the normal distribution.

Step 5 Find the corresponding z values.

Step 6 Find the solution.

Step 5. Find the corresponding z values.

$$z = \frac{50.5 - 43.35}{5.552} \approx 1.288, \text{ or } 1.29 \text{ rounded to two decimal places}$$

Step 6. Find the probability.

Use the table or technology to find the probability. P(X > 50.5) = 1 – 0.9015 = 0.0985, based on the rounded value of z.
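The whole procedure can be sketched in Python and compared with the exact binomial probability. The variable names are assumptions made for illustration, and the exact tail is computed by summing binomial terms with `math.comb`, which is a check added here rather than a step in the procedure above. Since z is not rounded, the normal approximation comes out near 0.099 rather than 0.0985.

```python
import math
from statistics import NormalDist

n, p = 150, 0.289
q = 1 - p

# Step 1: rule-of-thumb check.
can_approximate = n * p >= 5 and n * q >= 5

# Mean and standard deviation of the binomial distribution.
mu = n * p
sigma = math.sqrt(n * p * q)

# Continuity correction: P(X > 50) for the binomial becomes P(X > 50.5) for the normal.
p_normal = 1.0 - NormalDist(mu, sigma).cdf(50 + 0.5)

# Exact binomial tail for comparison: sum of P(X = k) for k = 51, ..., 150.
p_exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(51, n + 1))

print(f"normal approximation: {p_normal:.4f}, exact binomial: {p_exact:.4f}")
```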