Final Exam Practice Problem Answers
The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows:
Brand: The brand name of the cereal
Calories: The number of calories per serving
Protein: The number of grams of protein per serving
Fat: The number of grams of fat per serving
Fiber: The number of grams of fiber per serving
Sodium: The number of milligrams (mg) of sodium per serving
Carbo: The number of grams of carbohydrates per serving
Sugars: The number of grams of sugars per serving
Vitamins: The percentage of the recommended daily allowance (RDA) of vitamins per serving
Shelf: 1 indicates that the cereal appears on the lowest shelf in the store; 0 indicates that it does not
rating: An overall healthiness rating for the cereal. The higher the rating, the healthier the cereal.
Some observations from the data set follow:
name calories protein fat sodium fiber carbo sugars vitamins Shelf rating
Product_19 100 3 0 320 1 20 3 100 0 41.504
Cheerios 110 6 2 290 2 17 1 25 0 50.765
Corn_Flakes 100 2 0 290 1 21 2 25 0 45.863
Rice_Krispies 110 2 0 290 0 22 3 25 0 40.560
Corn_Chex 110 2 0 280 0 22 3 25 0 41.445
The Excel output below gives information about the sodium content in the 77 cereals. Use this to answer the following questions.
sodium
Mean 159.6753
Standard Error 9.553577
Median 180
Mode 0
Standard Deviation 83.8323
Sample Variance 7027.854
Kurtosis -0.34524
Skewness -0.57571
Range 320
Minimum 0
Maximum 320
Sum 12295
Count 77
Confidence Level(90.0%) 15.90814
sodium
Min 0
Q1 130
Median 180
Q3 210
Max 320
Outliers 0 0 0 0 0 0
1. Describe the shape of the distribution of sodium contents in the 77 breakfast cereals.
The distribution is slightly skewed to the left and contains 9 outliers. These outliers all appear as one point on the boxplot because each of the 9 outlying cereals contains 0 mg of sodium per serving.
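The outlier claim can be checked against the usual boxplot convention. The sketch below assumes the 1.5 × IQR rule, which the output does not state explicitly; the variable names are illustrative only.

```python
# Quartiles taken from the boxplot summary above.
q1, q3 = 130, 210

# Assumption: the boxplot flags points more than 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(lower_fence, upper_fence)    # fences used to flag outliers
print(0 < lower_fence)             # True means a 0 mg cereal is flagged as low
print(320 > upper_fence)           # False means the maximum is not flagged
```

Under that assumption, only values below the lower fence are flagged, which agrees with all outliers sitting at 0 mg.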
2. What is the median sodium content in the cereals? What does this value represent?
The median sodium content in the cereals is 180 mg. This means that about half of the cereals in the sample have at most 180 mg of sodium per serving, and about half have at least 180 mg of sodium per serving.
3. The 25% of the cereals that contain the most sodium contain at least how much sodium per serving?
This value is the 75th percentile, or the 3rd quartile. The 25% of the cereals with the most sodium contain at least 210 mg per serving.
4. What is the standard deviation of the sodium contents? What does this value represent?
The standard deviation of the sodium contents is 83.83 mg. This is a measure of variability in the sample. Specifically, it measures the typical spread of the observations around the sample mean.
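As a quick arithmetic check on the Excel output, the standard deviation is the square root of the sample variance, and the reported standard error is the standard deviation divided by the square root of the sample size. A minimal sketch using only values printed in the output:

```python
import math

sample_variance = 7027.854   # "Sample Variance" from the output
n = 77                       # "Count" from the output

std_dev = math.sqrt(sample_variance)   # spread of observations around the mean
std_error = std_dev / math.sqrt(n)     # spread of the sample mean itself

print(round(std_dev, 4), round(std_error, 4))
```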
5. Assume that this represents a random sample of 77 cereals from the population of all breakfast cereals. Conduct a hypothesis test to determine if the mean sodium content in all cereals is greater than 140 mg per serving. State the null and alternative hypothesis, the test statistic, p-value or an approximate p-value, and the decision and conclusion. Use α = 0.01.
Ho: µ = 140 Ha: µ > 140
Test statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{159.6753 - 140}{83.8323 / \sqrt{77}} = 2.06$
Degrees of freedom: n-1 = 76
p-value: use approximate degrees of freedom of 80 on the t-table. Note that the computed test statistic falls between the critical values of 1.990 and 2.088 on the t-table. This implies that the p-value falls in the range 0.02 < p-value < 0.025.
Decision: Since the p-value is greater than α, we will not reject the null hypothesis. There is not sufficient evidence at the 1% level of significance to conclude that the mean sodium content in all cereals is greater than 140 mg per serving.
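The table lookup can be cross-checked numerically. The sketch below recomputes the test statistic from the summary statistics and approximates the upper-tail p-value by integrating the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and the exact 76 degrees of freedom are used rather than the table's 80, so the result is only expected to land in the same range as the table-based answer.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3          # the density is symmetric about 0

mean, s, n, mu0, alpha = 159.6753, 83.8323, 77, 140, 0.01

t_stat = (mean - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)

print(round(t_stat, 2), round(p_value, 4))
print("reject Ho" if p_value < alpha else "do not reject Ho")
```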
6. What is the IQR of the sample? What does this value represent?
The IQR gives the range of the middle 50% of the sample. It is the difference between the third and first quartiles and is given by Q3-Q1 = 210-130 = 80.
The following Excel output gives information about the healthiness ratings of cereals that appear on the low shelf in the store compared to the ratings of cereals that do not appear on the low shelf in the store. The output was generated using α = 0.05. Use this output to answer the following questions. Assume that the data represent random samples from the populations of all cereals on the low shelf and those not on the low shelf in the store.
7. What is the sample variance of the healthiness rating of cereals that do not appear on the low shelf?
s² = 170.805
8. Suppose you wish to conduct a hypothesis test to determine if cereals on the low shelf have a lower average healthiness rating than those appearing on higher shelves. State the null and alternative hypothesis to test this claim.
H0: µlow = µhi
Ha: µlow < µhi
9. State the test statistic, p-value, decision, and conclusion to the hypothesis test in the previous question. Use α = 0.05.
Test statistic: -3.014 p-value: 0.002
Decision: Since the p-value is less than α, reject H0. There is sufficient evidence to conclude that cereals on the low shelf have lower average healthiness ratings than those that do not appear on the low shelf.
10. Compute and interpret a 95% confidence interval to estimate the difference in the population mean healthiness ratings between cereals that appear on the lower shelf and those on higher shelves.
$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -10.578 \pm 2.032 \sqrt{\frac{194.685}{21} + \frac{170.805}{56}}$
= -10.578 ± 2.032(3.510)
= -10.578 ± 7.132
We are 95% confident that, on average, cereals on the low shelf in the grocery store have a rating between 3.45 and 17.71 points lower than cereals on higher shelves.
11. What is the margin of error for the confidence interval computed in the previous question?
The margin of error for the interval computed above is 7.132.
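The two-sample results in questions 9 through 11 can be reproduced from the summary statistics. This is a sketch only, taking the sample variances as 194.685 (n = 21, low shelf) and 170.805 (n = 56, higher shelves) and using the critical value 2.032 from the answer rather than deriving it.

```python
import math

diff = -10.578               # difference in sample means, low minus higher shelves
var1, n1 = 194.685, 21       # low shelf
var2, n2 = 170.805, 56       # higher shelves
t_star = 2.032               # critical value used in the answer (not derived here)

std_error = math.sqrt(var1 / n1 + var2 / n2)   # unpooled standard error
margin = t_star * std_error                    # margin of error
lower, upper = diff - margin, diff + margin
t_stat = diff / std_error                      # two-sample t statistic

print(round(std_error, 3), round(margin, 3))
print(round(lower, 2), round(upper, 2), round(t_stat, 3))
```

The margin computed without intermediate rounding differs from 7.132 only in the last digit, because the answer rounds the standard error to 3.510 before multiplying.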
Suppose that the 77 cereals represent a random sample of all breakfast cereals. 21 of the cereals contain more than 10 grams of sugar per serving. Use this information to answer the following questions.
12. Compute a 99% confidence interval to estimate the true proportion of breakfast cereals that contain more than 10 grams of sugar per serving. Interpret the interval.
$\tilde{p} = \frac{x + 2}{n + 4} = \frac{21 + 2}{77 + 4} = 0.2840$
$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} = 0.284 \pm 2.576 \sqrt{\frac{0.284(1 - 0.284)}{77 + 4}}$
$= 0.284 \pm 2.576(0.0501)$
$= 0.284 \pm 0.1291$
$= (0.155, 0.413)$
We are 99% confident that the true population proportion of all breakfast cereals that contain more than 10 grams of sugar per serving is between 15.5% and 41.3%.
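The plus-four interval can be recomputed with the Python standard library. This is a minimal sketch; the critical value comes from `statistics.NormalDist` instead of a z-table, so it carries a few more digits than 2.576.

```python
import math
from statistics import NormalDist

x, n = 21, 77          # cereals with more than 10 g of sugar, sample size
confidence = 0.99

# Plus-four estimate: add 2 successes and 2 failures before estimating.
p_tilde = (x + 2) / (n + 4)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

std_error = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
margin = z_star * std_error
lower, upper = p_tilde - margin, p_tilde + margin

print(round(p_tilde, 4), round(z_star, 3))
print(round(lower, 3), round(upper, 3))
```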
13. A consumer health advocacy group states that more than one quarter of all breakfast cereals contain more than 10 grams of sugar per serving. State the null and alternative hypothesis to test this claim.
Ho: p = 0.25 Ha: p > 0.25
14. For the test in the previous question, state the test statistic, p-value, decision and conclusion. Use α = 0.01.
$\hat{p} = \frac{x}{n} = \frac{21}{77} = 0.2727$
Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.2727 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{77}}} = \frac{0.0227}{0.04935} = 0.46$
p-value: 0.3228
Decision: Since the p-value is greater than α, do not reject Ho. There is not enough evidence at the 1% level of significance to conclude that more than one quarter of all breakfast cereals contain more than 10 grams of sugar.
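The one-proportion z test can be sketched the same way. Note that the standard error uses the hypothesized proportion 0.25 rather than the sample proportion. Because the code does not round z to 0.46 before the lookup, its p-value can differ from the table value in the fourth decimal place.

```python
import math
from statistics import NormalDist

x, n = 21, 77
p0 = 0.25                  # hypothesized proportion under Ho
alpha = 0.01

p_hat = x / n
std_error = math.sqrt(p0 * (1 - p0) / n)   # computed under Ho, so it uses p0
z = (p_hat - p0) / std_error
p_value = 1 - NormalDist().cdf(z)          # upper tail, since Ha: p > 0.25

print(round(p_hat, 4), round(z, 2), round(p_value, 4))
print("reject Ho" if p_value < alpha else "do not reject Ho")
```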
The following table gives a breakdown of the shelf on which the cereal appears (shelf = 1 indicates the low shelf, shelf = 0 indicates a higher shelf), and the manufacturer of the cereal.
Shelf = 1 Shelf = 0 Row totals
General Mills 7 15 22
Kellogg 7 16 23
Nabisco 2 4 6
Quaker 3 5 8
Other 2 16 18
Column totals 21 56 77
15. Use this table information to test for the independence between the two categorical variables, shelf and manufacturer. State the null and alternative hypothesis, compute the test statistic, and give an approximate p-value for the test. State your decision and conclusion based on α = 0.05.
Ho: The shelf on which a cereal appears is independent of the manufacturer.
Ha: The shelf on which a cereal appears depends on the manufacturer.
Table of expected cell counts:
Shelf = 1 Shelf = 0 Row totals
General Mills 6 16 22
Kellogg 6.27 16.73 23
Nabisco 1.64 4.36 6
Quaker 2.18 5.82 8
Other 4.91 13.09 18
Column totals 21 56 77
Table of $(\text{actual} - \text{expected})^2 / \text{expected}$:
Shelf = 1 Shelf = 0
General Mills 0.166667 0.0625
Kellogg 0.084321 0.031621
Nabisco 0.080808 0.030303
Quaker 0.306818 0.115057
Other 1.723906 0.646465
Sum of all cells 3.2484652
Test statistic: 3.248
Degrees of freedom: (5-1)(2-1) = 4
p-value: The closest critical value on the chi-square table with 4 degrees of freedom is 5.39, which has an upper tail probability of 0.25. Our computed test statistic of 3.248 is smaller than 5.39, so its upper tail probability is larger than 0.25. Thus, our p-value is larger than 0.25.
Decision: Since p-value > α, we do not reject Ho. There is not enough evidence at the 5% level of significance to conclude that the shelf on which a cereal appears is dependent upon the manufacturer.
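The expected counts and the chi-square statistic can be rebuilt directly from the observed table. The sketch below uses plain Python. The p-value line relies on a closed form for the chi-square upper tail that holds only for even degrees of freedom, so it is a convenience for this table and not a general routine.

```python
import math

# Observed counts: manufacturer -> (shelf = 1, shelf = 0).
observed = {
    "General Mills": (7, 15),
    "Kellogg": (7, 16),
    "Nabisco": (2, 4),
    "Quaker": (3, 5),
    "Other": (2, 16),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

chi_sq = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, count in enumerate(row):
        # Expected count under independence: row total * column total / n.
        expected = row_total * col_totals[j] / grand_total
        chi_sq += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# Upper-tail probability; this closed form is valid only when df is even.
half = chi_sq / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(round(chi_sq, 3), df, round(p_value, 3))
```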
16. Of those cereals on the low shelf, what percentage is made by Nabisco?
2/21 = 0.095 = 9.5%
Use the multiple regression output below to answer the following questions. The output reflects the regression of the healthiness rating (Y) on the number of calories, fat, and fiber grams per serving as well as the shelf on which the cereal appears.
SUMMARY OUTPUT: Regression using PredInt.xls
Regression Statistics
Multiple R 0.8284
R Square 0.6863
Adjusted R Square 0.6689
Standard Error 8.0834
Observations 77
ANOVA
df SS MS F Significance (p-value) for F
Regression 4 10292.23 2573.058 39.3788 0.0000
Residual 72 4704.567 65.34121
Total 76 14996.8 197.3263
Dependent (Criterion) Variable: rating
Coefficients Standard Error t Stat P-value (2-tails) Lower 95% Upper 95% X Values for Prediction
Intercept 77.760 6.263 12.416 0.000 65.276 90.245
calories -0.337 0.059 -5.753 0.000 -0.454 -0.220 120
fat -2.571 1.084 -2.372 0.020 -4.732 -0.410 1
fiber 2.324 0.436 5.328 0.000 1.455 3.194 5
Shelf -5.414 2.185 -2.477 0.016 -9.771 -1.058 0
Confidence Level 0.95
Prediction Interval for a Single Observation of rating, with the X Values that you enter in the yellow boxes.
Predicted 46.376
Standard Error 8.299
Lower 95% 29.833
Upper 95% 62.919
Confidence Interval for Expected rating while holding X constant at the values that you enter in the yellow boxes.
Fit 46.376
Standard Error 1.878
Lower 95% 42.632
Upper 95% 50.120
17. What is R²? What does this value mean?
0.6863. This means that 68.63% of the observed variation in the healthiness ratings can be explained by the calories, fat, and fiber per serving in addition to the shelf on which the cereal appears.
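R² is the regression sum of squares divided by the total sum of squares, so it can be recovered from the ANOVA table. The sketch below takes the regression, residual, and total sums of squares as 10292.23, 4704.567, and 14996.8 with 4, 72, and 76 degrees of freedom. It also recomputes the adjusted R² and the F statistic as a consistency check.

```python
# Sums of squares and degrees of freedom as read from the ANOVA table.
ss_regression, df_regression = 10292.23, 4
ss_residual, df_residual = 4704.567, 72
ss_total, df_total = 14996.8, 76

r_squared = ss_regression / ss_total

# Adjusted R-squared compares mean squares instead of raw sums of squares.
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_stat, 2))
```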
18. Estimate the healthiness rating of a cereal with 100 calories, 2 grams of fat, 0 grams of fiber per serving that appears on the low shelf.
$\hat{y} = 77.76 - 0.337(100) - 2.571(2) + 2.324(0) - 5.414(1) = 33.504$
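The prediction is the fitted equation evaluated at the given values. A small sketch with the coefficients from the regression output follows; the function name `predict_rating` is made up for illustration.

```python
# Fitted coefficients from the regression output.
COEF = {"intercept": 77.760, "calories": -0.337, "fat": -2.571,
        "fiber": 2.324, "shelf": -5.414}

def predict_rating(calories, fat, fiber, shelf):
    """Predicted rating; shelf is 1 for the low shelf and 0 otherwise."""
    return (COEF["intercept"] + COEF["calories"] * calories
            + COEF["fat"] * fat + COEF["fiber"] * fiber
            + COEF["shelf"] * shelf)

question_18 = predict_rating(calories=100, fat=2, fiber=0, shelf=1)
print(round(question_18, 3))
```

Evaluating the same function at the "X Values for Prediction" in the output (120 calories, 1 g fat, 5 g fiber, shelf 0) gives a value close to the reported 46.376. The small gap comes from the coefficients being rounded to three decimals.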
19. Test to determine if the number of fat grams per serving is a significant linear predictor of the healthiness rating. State the null and alternative hypothesis, test statistic, p-value, decision and conclusion. Use α = 0.05.
Ho: βfat = 0 Ha: βfat ≠ 0, where βfat is the population slope coefficient for fat
Test statistic: -2.372 p-value: 0.020
Decision: Since p-value < α, reject Ho. There is enough evidence at the 5% level of significance to conclude that the number of fat grams is a significant linear predictor of the healthiness rating of breakfast cereals.
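The t statistic in the output is the coefficient divided by its standard error, and the reported 95% limits are the coefficient plus or minus a critical value times that standard error. The sketch below checks both relationships for the fat row. The critical value is backed out of the reported interval rather than taken from a t-table.

```python
coef_fat, se_fat = -2.571, 1.084        # fat row of the coefficient table
lower_95, upper_95 = -4.732, -0.410     # reported 95% limits for fat

t_stat = coef_fat / se_fat

# Critical value implied by the half-width of the reported interval.
implied_t_star = (upper_95 - lower_95) / 2 / se_fat

# A two-sided test at the 5% level rejects Ho exactly when 0 is outside the interval.
zero_in_interval = lower_95 <= 0 <= upper_95

print(round(t_stat, 3), round(implied_t_star, 3), zero_in_interval)
```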
20. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable fiber.
The 95% confidence interval is given by (1.455, 3.194). We are 95% confident that a one gram increase in fiber per serving gives an increase in the population average cereal rating of between 1.455 and 3.194 points when comparing cereals with the same number of calories and fat grams per serving that appear on the same shelf.
21. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable shelf.
The 95% confidence interval is given by (-9.771, -1.058). When comparing cereals with the same number of calories, fat, and fiber per serving, cereals on the low shelf have a population average rating of between 1.058 and 9.771 points lower than cereals on higher shelves.
22. Interpret the slope coefficient for the variable calories.
For each additional calorie per serving contained in a breakfast cereal, the predicted average rating decreases by 0.337 points when comparing cereals with the same amount of fat and fiber per serving that appear on the same shelf in the grocery store.