Final Exam Practice Problem Answers

The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows:

Brand: The brand name of the cereal

Calories: The number of calories per serving

Protein: The number of grams of protein per serving

Fat: The number of grams of fat per serving

Fiber: The number of grams of fiber per serving

Sodium: The number of milligrams (mg) of sodium per serving

Carbo: The number of grams of carbohydrates per serving

Sugars: The number of grams of sugars per serving

Vitamins: The percentage of the recommended daily allowance (RDA) of vitamins per serving

Shelf: 1 indicates that the cereal appears on the lowest shelf in the store; 0 indicates that the cereal does not appear on the lowest shelf in the store

Rating: An overall healthiness rating for the cereal. The higher the rating, the healthier the cereal.

Some observations from the data set follow:

name           calories  protein  fat  sodium  fiber  carbo  sugars  vitamins  Shelf  rating
Product_19          100        3    0     320      1     20       3       100      0  41.504
Cheerios            110        6    2     290      2     17       1        25      0  50.765
Corn_Flakes         100        2    0     290      1     21       2        25      0  45.863
Rice_Krispies       110        2    0     290      0     22       3        25      0  40.560
Corn_Chex           110        2    0     280      0     22       3        25      0  41.445

The Excel output below gives information about the sodium content (in mg per serving) of the 77 cereals. Use this output to answer the following questions.

sodium

Mean 159.6753

Standard Error 9.553577

Median 180

Mode 0

Standard Deviation 83.8323

Sample Variance 7027.854

Kurtosis -0.34524

Skewness -0.57571

Range 320

Minimum 0

Maximum 320

Sum 12295

Count 77

Confidence Level(90.0%) 15.90814

sodium

Min 0

Q1 130

Median 180

Q3 210

Max 320

Outliers 0 0 0 0 0 0

1. Describe the shape of the distribution of sodium contents in the 77 breakfast cereals.

The distribution is slightly skewed to the left and contains 9 outliers. These outliers all appear as one point on the boxplot because each of the 9 outlying cereals contains 0 mg of sodium per serving.
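The following is a minimal Python sketch of why the zero-sodium cereals are flagged as outliers. It assumes the boxplot uses the common 1.5 × IQR rule, which the output does not state explicitly; the variable names are illustrative only.

```python
# Quartiles and extremes taken from the five-number summary in the output.
q1, q3 = 130, 210
minimum, maximum = 0, 320

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr     # values above this are flagged as outliers

# Compare the extremes with the fences to see on which side outliers can occur.
low_outliers_exist = minimum < lower_fence
high_outliers_exist = maximum > upper_fence

print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"outliers below: {low_outliers_exist}, outliers above: {high_outliers_exist}")
```

Under this assumed rule, any cereal with less than 10 mg of sodium is an outlier, which is consistent with the cereals at 0 mg being the only ones flagged.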

2. What is the median sodium content in the cereals? What does this value represent?

The median sodium content in the cereals is 180 mg. This means that about 50% of the cereals in the sample have less than 180 mg of sodium per serving and about 50% have more than 180 mg of sodium per serving.

3. The 25% of the cereals that contain the most sodium contain at least how much sodium per serving?

This value is the 75th percentile, or the 3rd quartile. The 25% of the cereals with the most sodium contain at least 210 mg per serving.

4. What is the standard deviation of the sodium contents? What does this value represent?

The standard deviation of the sodium contents is 83.83 mg. This is a measure of variability in the sample. Specifically, it measures the spread of the observations around the sample mean.
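Several entries in the Excel output are tied together by simple formulas. The sketch below recomputes the mean, sample variance, and standard error from the reported sum, count, and standard deviation. It treats the "Confidence Level(90.0%)" entry as the margin of error of a 90% confidence interval for the mean, which is how Excel's descriptive statistics tool reports it; the variable names are illustrative.

```python
import math

# Values copied from the Excel descriptive statistics output.
n = 77                 # Count
total = 12295          # Sum
s = 83.8323            # Standard Deviation
margin_90 = 15.90814   # Confidence Level(90.0%), read as a margin of error

mean = total / n                      # sample mean
variance = s ** 2                     # sample variance is the squared standard deviation
standard_error = s / math.sqrt(n)     # standard error of the mean

# 90% confidence interval for the population mean sodium content.
ci_90 = (mean - margin_90, mean + margin_90)

print(f"mean = {mean:.4f}, variance = {variance:.3f}, SE = {standard_error:.4f}")
print(f"90% CI for the mean: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
```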

5. Assume that this represents a random sample of 77 cereals from the population of all breakfast cereals. Conduct a hypothesis test to determine if the mean sodium content in all cereals is greater than 140 mg per serving. State the null and alternative hypotheses, the test statistic, the p-value or an approximate p-value, and the decision and conclusion. Use α = 0.01.

Ho: µ = 140

Ha: µ > 140

Test statistic:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{159.6753 - 140}{83.8323/\sqrt{77}} = 2.06$$

Degrees of freedom: n-1 = 76

p-value: use approximate degrees of freedom of 80 on the t-table. Note that the computed test statistic falls between the critical values of 1.990 and 2.088 on the t-table. This implies that the p-value falls in the range 0.02 < p-value < 0.025.

Decision: Since the p-value is greater than α, we will not reject the null hypothesis. There is not sufficient evidence at the 1% level of significance to conclude that the mean sodium content in all cereals is greater than 140 mg per serving.
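The test in question 5 can be retraced with a short Python sketch. It uses only the numbers quoted in the answer, including the two t-table critical values at 80 degrees of freedom, and brackets the p-value rather than computing it exactly. The variable names are illustrative.

```python
import math

# Sample summary from the Excel output and the hypothesized mean.
x_bar, mu_0 = 159.6753, 140
s, n = 83.8323, 77
alpha = 0.01

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))   # one-sample t statistic
df = n - 1

# Upper-tail critical values quoted from the t-table at 80 degrees of freedom.
t_crit_025 = 1.990   # upper-tail probability 0.025
t_crit_020 = 2.088   # upper-tail probability 0.02

# If t lies between the two critical values, the p-value lies between 0.02 and 0.025.
p_between_02_and_025 = t_crit_025 < t_stat < t_crit_020
p_lower_bound = 0.02

# The p-value exceeds alpha whenever even its lower bound exceeds alpha.
fail_to_reject = p_between_02_and_025 and p_lower_bound > alpha

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print(f"0.02 < p-value < 0.025: {p_between_02_and_025}; fail to reject Ho: {fail_to_reject}")
```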

6. What is the IQR of the sample? What does this value represent?

The IQR gives the range of the middle 50% of the sample. It is the difference between the third and first quartiles and is given by Q3 − Q1 = 210 − 130 = 80 mg.

The following Excel output gives information about the healthiness ratings of cereals that appear on the low shelf in the store compared to the ratings of cereals that do not appear on the low shelf in the store. The output was generated using α = 0.05. Use this output to answer the following questions. Assume that the data represent random samples from the populations of all cereals on the low shelf and those not on the low shelf in the store.

7. What is the sample variance of the healthiness rating of cereals that do not appear on the low shelf?

s² = 170.805

8. Suppose you wish to conduct a hypothesis test to determine if cereals on the low shelf have a lower average healthiness rating than those appearing on higher shelves. State the null and alternative hypothesis to test this claim.

H0: µlow = µhi

Ha: µlow < µhi

9. State the test statistic, p-value, decision, and conclusion to the hypothesis test in the previous question. Use α = 0.05

Test statistic: -3.014

p-value: 0.002

Decision: Since the p-value is less than α, reject H0. There is sufficient evidence to conclude that cereals on the low shelf have lower average healthiness ratings than those that do not appear on the low shelf.

10. Compute and interpret a 95% confidence interval to estimate the difference in the population mean healthiness ratings between cereals that appear on the lower shelf and those on higher shelves.

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -10.578 \pm 2.032 \sqrt{\frac{194.685}{21} + \frac{170.805}{56}}$$

= -10.578 ± 2.032(3.510)

= -10.578 ± 7.132

We are 95% confident that the mean rating of cereals on the low shelf in the grocery store is between 3.45 and 17.71 points lower than the mean rating of cereals on higher shelves.
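The interval arithmetic in question 10 can be checked with a minimal Python sketch. The inputs are the values that appear in the worked formula (the difference in sample means, the two sample variances and sample sizes, and the critical value t* = 2.032); the variable names are illustrative.

```python
import math

# Values taken from the worked confidence interval formula.
diff = -10.578                 # x-bar(low shelf) minus x-bar(higher shelves)
s1_sq, n1 = 194.685, 21        # low-shelf sample variance and size
s2_sq, n2 = 170.805, 56        # higher-shelf sample variance and size
t_star = 2.032                 # critical value used in the answer

# Unpooled standard error of the difference in sample means.
se = math.sqrt(s1_sq / n1 + s2_sq / n2)
margin = t_star * se
ci = (diff - margin, diff + margin)

# The same standard error gives the test statistic from question 9.
t_stat = diff / se

print(f"SE = {se:.3f}, margin of error = {margin:.3f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), t = {t_stat:.3f}")
```

Because the whole interval lies below zero, it agrees with the one-sided test in question 9, which concluded that low-shelf cereals have lower average ratings.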

11. What is the margin of error for the confidence interval computed in the previous question?

The margin of error for the interval computed above is 7.132 rating points.

Suppose that the 77 cereals represent a random sample of all breakfast cereals. 21 of the cereals contain more than 10 grams of sugar per serving. Use this information to answer the following questions.

12. Compute a 99% confidence interval to estimate the true proportion of breakfast cereals that contain more than 10 grams of sugar per serving. Interpret the interval.

$$\tilde{p} = \frac{x + 2}{n + 4} = \frac{21 + 2}{77 + 4} = 0.2840$$

$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} = 0.284 \pm 2.576 \sqrt{\frac{0.284(1 - 0.284)}{77 + 4}}$$

$$= 0.284 \pm 2.576(0.0501) = 0.284 \pm 0.1291 = (0.155, 0.413)$$

We are 99% confident that the true population proportion of all breakfast cereals that contain more than 10 grams of sugar per serving is between 16% and 41%.
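The answer to question 12 uses the "plus four" adjustment: two successes and four observations are added before computing the interval. A minimal Python sketch of that calculation follows; the variable names are illustrative, and z* = 2.576 is the critical value used in the answer.

```python
import math

x, n = 21, 77        # cereals with more than 10 g of sugar, and sample size
z_star = 2.576       # critical value for 99% confidence, as used in the answer

# Plus-four estimate: add 2 successes and 4 observations.
p_tilde = (x + 2) / (n + 4)
se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
margin = z_star * se
ci = (p_tilde - margin, p_tilde + margin)

print(f"plus-four estimate = {p_tilde:.4f}, SE = {se:.4f}")
print(f"99% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```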

13. A consumer health advocacy group states that more than one quarter of all breakfast cereals contain more than 10 grams of sugar per serving. State the null and alternative hypothesis to test this claim.

Ho: p = 0.25

Ha: p > 0.25

14. For the test in the previous question, state the test statistic, p-value, decision and conclusion. Use α = 0.01

$$\hat{p} = \frac{x}{n} = \frac{21}{77} = 0.2727$$

Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.2727 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{77}}} = \frac{0.0227}{0.04935} = 0.46$$

p-value: 0.3228

Decision: Since the p-value is greater than α, do not reject Ho. There is not enough evidence at the 1% level of significance to conclude that more than one quarter of all breakfast cereals contain more than 10 grams of sugar.
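The one-proportion z test in question 14 can be reproduced with Python's standard library. This is a sketch with illustrative variable names; it carries full precision for z, so the p-value differs slightly from the table value of 0.3228, which was looked up at z rounded to 0.46.

```python
import math
from statistics import NormalDist

x, n = 21, 77      # cereals with more than 10 g of sugar, and sample size
p0 = 0.25          # proportion under the null hypothesis
alpha = 0.01

p_hat = x / n
se0 = math.sqrt(p0 * (1 - p0) / n)    # standard error computed under Ho
z = (p_hat - p0) / se0

# Upper-tail p-value, because the alternative is p > 0.25.
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"reject Ho at alpha = {alpha}: {reject_h0}")
```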

The following table gives a breakdown of the shelf on which the cereal appears (shelf = 1 indicates the low shelf, shelf = 0 indicates a higher shelf), and the manufacturer of the cereal.

               Shelf = 1  Shelf = 0  Row totals
General Mills          7         15          22
Kellogg                7         16          23
Nabisco                2          4           6
Quaker                 3          5           8
Other                  2         16          18
Column totals         21         56          77

15. Use the information in this table to test for independence between the two categorical variables, shelf and manufacturer. State the null and alternative hypotheses, compute the test statistic, and give an approximate p-value for the test. State your decision and conclusion based on α = 0.05.

Ho: The shelf on which a cereal appears is independent of the manufacturer.

Ha: The shelf on which a cereal appears depends on the manufacturer.

Table of expected cell counts:

               Shelf = 1  Shelf = 0  Row totals
General Mills       6         16             22
Kellogg             6.27      16.73          23
Nabisco             1.64       4.36           6
Quaker              2.18       5.82           8
Other               4.91      13.09          18
Column totals      21         56             77

Table of (actual − expected)² / expected:

               Shelf = 1  Shelf = 0  Row totals
General Mills   0.166667   0.0625
Kellogg         0.084321   0.031621
Nabisco         0.080808   0.030303
Quaker          0.306818   0.115057
Other           1.723906   0.646465
Column totals                         3.2484652

Test statistic: 3.248

Degrees of freedom: (5-1)(2-1) = 4

p-value: The closest critical value on the chi-square table with 4 degrees of freedom is 5.39, which has an upper tail probability of 0.25. Our computed test statistic of 3.248 is smaller than 5.39, so its upper tail probability is larger than 0.25. Thus, our p-value is larger than 0.25.

Decision: Since p-value > α, we do not reject Ho. There is not enough evidence at the 5% level of significance to conclude that the shelf on which a cereal appears is dependent upon the manufacturer.
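The expected counts and the chi-square statistic in question 15 can be recomputed directly from the observed table. The following Python sketch does so with the standard library only; the dictionary layout and variable names are illustrative, and the p-value is bracketed with the critical value quoted in the answer rather than computed exactly.

```python
# Observed counts from the table: (shelf = 1, shelf = 0) for each manufacturer.
observed = {
    "General Mills": (7, 15),
    "Kellogg": (7, 16),
    "Nabisco": (2, 4),
    "Quaker": (3, 5),
    "Other": (2, 16),
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    name: tuple(row_totals[name] * col_totals[j] / grand_total for j in range(2))
    for name in observed
}

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (observed[name][j] - expected[name][j]) ** 2 / expected[name][j]
    for name in observed
    for j in range(2)
)

df = (len(observed) - 1) * (len(col_totals) - 1)

# Critical value quoted in the answer: upper-tail probability 0.25 at 4 df.
critical_025_tail = 5.39
p_value_exceeds_025 = chi_square < critical_025_tail

print(f"chi-square = {chi_square:.3f}, df = {df}")
print(f"p-value > 0.25: {p_value_exceeds_025}")
```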

16. Of those cereals on the low shelf, what percentage is made by Nabisco?

2/21 = 0.095 = 9.5%

Use the multiple regression output below to answer the following questions. The output reflects the regression of the healthiness rating (Y) on the number of calories, fat, and fiber grams per serving as well as the shelf on which the cereal appears.

SUMMARY OUTPUT: Regression using PredInt.xls

Regression Statistics

Multiple R 0.8284

R Square 0.6863

Adjusted R Square 0.6689

Standard Error 8.0834

Observations 77

ANOVA

            df  SS        MS        F        Significance (p-value) for F
Regression   4  10292.23  2573.058  39.3788  0.0000
Residual    72  4704.567  65.34121
Total       76  14996.8   197.3263

Dependent (Criterion) Variable: rating

           Coefficients  Standard Error  t Stat  P-value (2-tails)  Lower 95%  Upper 95%  X Values for Prediction
Intercept        77.760           6.263  12.416              0.000     65.276     90.245
calories         -0.337           0.059  -5.753              0.000     -0.454     -0.220                      120
fat              -2.571           1.084  -2.372              0.020     -4.732     -0.410                        1
fiber             2.324           0.436   5.328              0.000      1.455      3.194                        5
Shelf            -5.414           2.185  -2.477              0.016     -9.771     -1.058                        0

Prediction Interval for a Single Observation of rating, with the X Values that you enter in the yellow boxes.

Confidence Level  0.95
Predicted         46.376
Standard Error    8.299
Lower 95%         29.833
Upper 95%         62.919

Confidence Interval for Expected rating while holding X constant at the values that you enter in the yellow boxes.

Fit               46.376
Standard Error    1.878
Lower 95%         42.632
Upper 95%         50.120

17. What is R²? What does this value mean?

0.6863. This means that 68.63% of the observed variation in the healthiness ratings can be explained by the calories, fat, and fiber per serving in addition to the shelf on which the cereal appears.
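The regression statistics are linked to the ANOVA table by standard formulas. The sketch below takes the ANOVA sums of squares as SS(regression) = 10292.23, SS(residual) = 4704.567, and SS(total) = 14996.8, with 4 predictors and 77 observations, and recomputes R², adjusted R², F, and the standard error of the regression. The variable names are illustrative.

```python
import math

# Sums of squares read from the ANOVA table.
ss_regression = 10292.23
ss_residual = 4704.567
ss_total = 14996.8

n = 77   # observations
k = 4    # predictors: calories, fat, fiber, shelf

df_regression = k
df_residual = n - k - 1

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Standard error of the regression is the square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adjusted_r_squared:.4f}")
print(f"F = {f_stat:.4f}, standard error = {standard_error:.4f}")
```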

18. Estimate the healthiness rating of a cereal with 100 calories, 2 grams of fat, 0 grams of fiber per serving that appears on the low shelf.

$$\hat{y} = 77.76 - 0.337(100) - 2.571(2) + 2.324(0) - 5.414(1) = 33.504$$
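The prediction in question 18 is the fitted regression equation evaluated at the given values. A minimal Python sketch follows; `predict_rating` is a hypothetical helper name, and the coefficients are the rounded values from the output, so results can differ from Excel's in the second decimal place.

```python
# Rounded coefficients from the regression output.
coefficients = {
    "intercept": 77.760,
    "calories": -0.337,
    "fat": -2.571,
    "fiber": 2.324,
    "shelf": -5.414,
}


def predict_rating(calories, fat, fiber, shelf):
    """Return the predicted healthiness rating from the fitted equation."""
    return (
        coefficients["intercept"]
        + coefficients["calories"] * calories
        + coefficients["fat"] * fat
        + coefficients["fiber"] * fiber
        + coefficients["shelf"] * shelf
    )


# Question 18: 100 calories, 2 g fat, 0 g fiber, on the low shelf (shelf = 1).
question_18 = predict_rating(calories=100, fat=2, fiber=0, shelf=1)

# The X values entered in the Excel output: 120 calories, 1 g fat, 5 g fiber, shelf = 0.
excel_case = predict_rating(calories=120, fat=1, fiber=5, shelf=0)

print(f"question 18 prediction: {question_18:.3f}")
print(f"prediction at the Excel X values: {excel_case:.3f}")
```

The second call lands within about 0.01 of the 46.376 reported by Excel; the small gap comes from using coefficients rounded to three decimals.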

19. Test to determine if the number of fat grams per serving is a significant linear predictor of the healthiness rating. State the null and alternative hypothesis, test statistic, p-value, decision and conclusion. Use α = 0.05.

Ho: βfat = 0

Ha: βfat ≠ 0

Test statistic: -2.372

p-value: 0.020

Decision: Since p-value < α, reject Ho. There is enough evidence at the 5% level of significance to conclude that the number of fat grams is a significant linear predictor of the healthiness rating of breakfast cereals.
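Each t statistic in the coefficient table is the coefficient divided by its standard error. The Python sketch below recomputes them from the rounded table values and applies the decision rule at α = 0.05 using the reported two-tailed p-values. The names are illustrative, and because the inputs are rounded, some recomputed t values differ slightly from Excel's.

```python
alpha = 0.05

# (coefficient, standard error, reported two-tailed p-value) from the output.
table = {
    "calories": (-0.337, 0.059, 0.000),
    "fat": (-2.571, 1.084, 0.020),
    "fiber": (2.324, 0.436, 0.000),
    "shelf": (-5.414, 2.185, 0.016),
}

# t statistic for Ho: beta = 0 is the coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se, _) in table.items()}

# A predictor is significant at level alpha when its p-value is below alpha.
significant = {name: p < alpha for name, (_, _, p) in table.items()}

for name in table:
    print(f"{name}: t = {t_stats[name]:.3f}, significant = {significant[name]}")
```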

20. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable fiber.

The 95% confidence interval is given by (1.455, 3.194). We are 95% confident that a one gram increase in fiber per serving gives an increase in the population average cereal rating of between 1.455 and 3.194 points when comparing cereals with the same number of calories and fat grams per serving that appear on the same shelf.

21. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable shelf.

The 95% confidence interval is given by (-9.771, -1.058). We are 95% confident that, when comparing cereals with the same number of calories, fat grams, and fiber grams per serving, cereals on the low shelf have a population average rating of between 1.058 and 9.771 points lower than cereals on higher shelves.
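The slope intervals in questions 20 and 21 have the form coefficient ± t* × standard error. The sketch below rebuilds them from the rounded table values. The critical value t* = 1.993 for 72 residual degrees of freedom is an assumption, since the output does not print it; the names are illustrative.

```python
# Assumed critical value: roughly the 97.5th percentile of t with 72 df.
t_star = 1.993

# (coefficient, standard error) from the regression output.
slopes = {
    "fiber": (2.324, 0.436),
    "shelf": (-5.414, 2.185),
}

# 95% confidence interval for each slope: coefficient +/- t* times standard error.
intervals = {
    name: (coef - t_star * se, coef + t_star * se)
    for name, (coef, se) in slopes.items()
}

for name, (low, high) in intervals.items():
    print(f"{name}: ({low:.3f}, {high:.3f})")
```

Neither interval contains zero, which matches the two-tailed p-values below 0.05 reported for both variables.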

22. Interpret the slope coefficient for the variable calories.

For each additional calorie per serving contained in a breakfast cereal, the predicted average rating decreases by 0.337 points when comparing cereals with the same amount of fat and fiber per serving that appear on the same shelf in the grocery store.
