Week 12: Relationships and Associations Between Variables
Problem 1. Nationally, about 66% of high school graduates enroll in higher education. This past year, out of the 200 graduating seniors at the high school in your hometown, 145 enrolled in higher education and 55 did not. You want to perform a statistical inference test to determine whether the distribution at your school is the same as the national distribution. Test this at the 10% level.
a. What kind of hypothesis test should be conducted in this scenario?
b. What are the hypotheses? c. What is the significance level? d. What are the expected counts?
e. What is the value of the test statistic? f. What is the p-value?
g. What is the correct decision?
h. What is the appropriate conclusion/interpretation?
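The expected counts and test statistic for Problem 1 can be checked with a short Python sketch. It assumes the expected proportions come straight from the national figure (0.66 enroll, 0.34 do not) and uses only the standard library: with 1 degree of freedom a chi-square variable is the square of a standard normal, so its upper-tail area is erfc(sqrt(x/2)). The helper name `chi_square_gof` is illustrative only.

```python
import math


def chi_square_gof(observed, proportions):
    """Return (statistic, expected counts, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected, len(observed) - 1


observed = [145, 55]        # enrolled, did not enroll (from the problem)
proportions = [0.66, 0.34]  # national distribution under H0

stat, expected, df = chi_square_gof(observed, proportions)
# For df = 1, X^2 = Z^2, so P(X^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

print("expected counts:", expected)
print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
```

If SciPy is installed, `scipy.stats.chisquare(observed, f_exp=expected)` should return the same statistic and p-value.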
Problem 2. A researcher is interested in learning about cell phone ownership at a local high school, specifically among freshmen and seniors. They randomly select 500 students and ask them their class year (freshman or senior) and whether they own a cell phone (yes or no). The results are shown in the table below. The researcher wants to perform a statistical inference test to determine whether there is an association between class year and cell phone ownership. Test this at the 5% level.
Class Year   Owns a Cell Phone   Doesn’t Own a Cell Phone
Freshman     100                 150
Senior       200                 50
a. What kind of hypothesis test should be conducted in this scenario? b. What are the hypotheses?
c. What is the significance level? d. What are the expected counts?
e. What is the value of the test statistic? f. What is the p-value?
g. What is the correct decision?
h. What is the appropriate conclusion/interpretation?
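A minimal sketch of the test of independence for Problem 2, using only the Python standard library. Each expected count is (row total x column total) / grand total, and the degrees of freedom are (rows - 1)(columns - 1). The helper `chi_square_independence` is a hypothetical name; because this table has df = 1, the same erfc shortcut for the p-value applies. No continuity correction is used here, whereas `scipy.stats.chi2_contingency` applies Yates' correction by default when df = 1 (pass `correction=False` to compare).

```python
import math


def chi_square_independence(table):
    """Return (statistic, expected table, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = (row total * column total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, expected, df


# Rows: freshman, senior; columns: owns a cell phone, doesn't own one
table = [[100, 150],
         [200, 50]]

stat, expected, df = chi_square_independence(table)
p_value = math.erfc(math.sqrt(stat / 2))  # shortcut valid only because df == 1

print("expected counts:", expected)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.3g}")
```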
Problem 3. A 2013 poll in the state of California surveyed people about taxing sugar-sweetened beverages. The results, classified by race/ethnicity and survey response, are presented in the table below. Does there appear to be a relationship between race/ethnicity and survey response? Test this at the 1% level.
Race/Ethnicity        Favor   Oppose   No Opinion
White/Non-Hispanic    234     433      43
Hispanic              147     106      19
African American      24      41       6
Asian American        54      48       16
a. What kind of hypothesis test should be conducted in this scenario? b. What are the hypotheses?
c. What is the significance level? d. What are the expected counts?
e. What is the value of the test statistic? f. What is the p-value?
g. What is the correct decision?
h. What is the appropriate conclusion/interpretation?
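For Problem 3 the table is 4 x 3, so df = (4 - 1)(3 - 1) = 6. The sketch below reuses the same expected-count logic and, since the df is even, computes the upper-tail chi-square probability with the closed form P(X^2 > x) = e^(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)^i / i!. Both helper names are hypothetical; with SciPy installed, `scipy.stats.chi2.sf(stat, df)` gives the tail probability for any df. The smallest expected count is printed too, since the chi-square approximation is usually taken to require expected counts of at least 5.

```python
import math


def chi_square_independence(table):
    """Return (statistic, expected table, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, expected, df


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Rows: White/Non-Hispanic, Hispanic, African American, Asian American
# Columns: Favor, Oppose, No Opinion
table = [[234, 433, 43],
         [147, 106, 19],
         [24, 41, 6],
         [54, 48, 16]]

stat, expected, df = chi_square_independence(table)
p_value = chi2_sf_even_df(stat, df)

for row in expected:
    print([round(e, 2) for e in row])
print("smallest expected count:", round(min(min(row) for row in expected), 2))
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3g}")
```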
Problem 4. You are conducting a study of three types of feed supplements for cattle to test their effectiveness in producing weight gain among calves whose feed includes one of the supplements. You have four groups of 30 calves (one is a control group receiving the usual feed but no supplement). You want to conduct a statistical inference test to see if there is a relationship between feed supplement and weight gain. Test this at the 5% level.
a. What are the hypotheses? b. What is the significance level?
c. The ANOVA table below is partially filled in. Complete the missing entries.
Source              DF    Sum of Squares    Mean Square    F Value    Prob
Feed (Groups)                                                         < 0.00001
Error (Residuals)         374.5
Total                     621.4
d. What is the value of the test statistic? e. What is the p-value?
f. What is the correct decision?
g. What is the appropriate conclusion/interpretation?
h. If there had been 35 calves in each group instead of 30, would the F-statistic be larger or smaller? Assume the sums of squares remain the same.
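The missing entries in Problem 4's table follow from a few identities: SS(Groups) = SS(Total) - SS(Error), df(Groups) = k - 1, df(Error) = N - k, df(Total) = N - 1, each mean square is SS/df, and F = MS(Groups)/MS(Error). The sketch below assumes that 374.5 and 621.4 are the error and total sums of squares, with k = 4 groups of equal size; `complete_anova` is a hypothetical helper. Recomputing with 35 calves per group (sums of squares unchanged) bears on part h. The p-value is not recomputed, since the Python standard library has no F distribution.

```python
def complete_anova(ss_error, ss_total, k, n_per_group):
    """Fill in a one-way ANOVA table for k equal-sized groups."""
    n = k * n_per_group
    ss_groups = ss_total - ss_error  # SS partition: SST = SSG + SSE
    df_groups, df_error, df_total = k - 1, n - k, n - 1
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "df": (df_groups, df_error, df_total),
        "ss": (ss_groups, ss_error, ss_total),
        "ms": (ms_groups, ms_error),
        "F": ms_groups / ms_error,
    }


# Assumes 374.5 = SS(Error) and 621.4 = SS(Total); 4 groups (3 supplements + control)
table_30 = complete_anova(ss_error=374.5, ss_total=621.4, k=4, n_per_group=30)
table_35 = complete_anova(ss_error=374.5, ss_total=621.4, k=4, n_per_group=35)

print(table_30)
print(f"F with 30 per group: {table_30['F']:.2f}")
print(f"F with 35 per group: {table_35['F']:.2f}")
```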
Problem 5. Four sororities took a random sample of sisters and asked them their GPA for the past semester. The results are shown in the table below. You want to conduct a statistical inference test to see if there is a relationship between sorority and GPA. Test this at the 5% level.
Sorority 1 Sorority 2 Sorority 3 Sorority 4
2.17 2.63 2.63 3.79
1.85 1.77 3.78 3.45
2.83 3.25 4.00 3.08
a. What are the hypotheses? b. What is the significance level?
c. Complete the blank ANOVA table below.
Source              DF    Sum of Squares    Mean Square    F Value    Prob
Sorority (Groups)
Error (Residuals)
Total
d. What is the value of the test statistic? e. What is the p-value?
f. What is the correct decision?
g. What is the appropriate conclusion/interpretation?
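For Problem 5 the sums of squares have to be computed from the raw GPAs. This standard-library sketch (with a hypothetical `one_way_anova` helper) computes the group means, SS(Groups) = sum of n_i * (group mean - grand mean)^2, SS(Error) = sum of (value - its group mean)^2, and the F statistic. It stops short of the p-value; if SciPy is available, `scipy.stats.f.sf(f_stat, df_groups, df_error)` gives it, and `scipy.stats.f_oneway(*sororities)` returns both F and the p-value as a cross-check.

```python
def one_way_anova(groups):
    """Return (SS groups, SS error, df groups, df error, F) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_groups, df_error = k - 1, n - k
    f_stat = (ss_groups / df_groups) / (ss_error / df_error)
    return ss_groups, ss_error, df_groups, df_error, f_stat


sororities = [
    [2.17, 1.85, 2.83],  # Sorority 1
    [2.63, 1.77, 3.25],  # Sorority 2
    [2.63, 3.78, 4.00],  # Sorority 3
    [3.79, 3.45, 3.08],  # Sorority 4
]

ss_groups, ss_error, df_groups, df_error, f_stat = one_way_anova(sororities)
print(f"SS(Groups) = {ss_groups:.4f}, df = {df_groups}, MS = {ss_groups / df_groups:.4f}")
print(f"SS(Error)  = {ss_error:.4f}, df = {df_error}, MS = {ss_error / df_error:.4f}")
print(f"SS(Total)  = {ss_groups + ss_error:.4f}, df = {df_groups + df_error}")
print(f"F = {f_stat:.3f}")
```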
Problem 6. The scatterplots below show the relationship between height, diameter, and volume of timber in 31 felled black cherry trees. The diameter of the tree is measured 4.5 feet above the ground.
a. Describe the relationship between volume and height of these trees. b. Describe the relationship between volume and diameter of these trees.
c. Suppose you have height and diameter measurements for another black cherry tree. Which of these variables would be preferable to use to predict the volume of timber in this tree using a simple linear regression model? Explain your reasoning.
Problem 7. The Coast Starlight Amtrak train runs from Seattle to Los Angeles. The scatterplot below displays the distance between each stop (in miles) and the amount of time it takes to travel from one stop to another (in minutes).
a. Describe the relationship between distance and travel time.
b. How would the relationship change if travel time were instead measured in hours and distance were instead measured in kilometers?
c. The correlation between travel time (in minutes) and distance (in miles) is r = 0.636. What is the correlation between travel time (in hours) and distance (in kilometers)?
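Parts b and c of Problem 7 hinge on the fact that Pearson's r is unchanged when either variable is multiplied by a positive constant (or has a constant added). The sketch below checks this numerically. The five data points are made up for illustration and are NOT the Coast Starlight data, and `pearson_r` is a hypothetical helper written out so the block needs only the standard library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical stop-to-stop values for illustration only (NOT the Amtrak data)
distance_miles = [30, 55, 80, 120, 160]
time_minutes = [40, 70, 90, 160, 170]

KM_PER_MILE = 1.609344  # exact by definition of the international mile
distance_km = [d * KM_PER_MILE for d in distance_miles]
time_hours = [t / 60 for t in time_minutes]

r_original = pearson_r(distance_miles, time_minutes)
r_converted = pearson_r(distance_km, time_hours)
print(f"r (miles, minutes) = {r_original:.6f}")
print(f"r (km, hours)      = {r_converted:.6f}")
```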
Problem 8. What would be the correlation between the annual salaries of males and females at a company if for a certain type of position men always made:
a. $5,000 more than women? b. twice as much as women?
c. 25% less than women?
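The same invariance bears on Problem 8: if men's salary is an exact increasing linear function of women's salary, every point falls on a line with positive slope. The women's salaries below are hypothetical values chosen only to illustrate the three rules, and `pearson_r` is again a hypothetical standard-library helper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical women's salaries, for illustration only
women = [42000, 48500, 51000, 60000, 75500]

rules = {
    "a. $5,000 more": [w + 5000 for w in women],
    "b. twice as much": [2 * w for w in women],
    "c. 25% less": [0.75 * w for w in women],
}
results = {label: pearson_r(women, men) for label, men in rules.items()}
for label, r in results.items():
    print(f"{label}: r = {r:.6f}")
```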
Problem 9. Determine whether each of the following statements is true or false, and explain your reasoning.
a. A correlation coefficient of -0.90 indicates a stronger linear relationship than a correlation of 0.5.
b. Correlation is a measure of the association between any two variables.
Problem 10. Match each correlation to the corresponding scatterplot.
a. r = -0.72 b. r = 0.07
c. r = 0.86 d. r = 0.99
Problem 11. Among male college freshmen, there appears to be a linear relationship between height (in inches) and weight (in pounds). In a sample from this population, the average height is 68.4 inches, with a standard deviation of 4.0 inches, and the average weight is 141.6 pounds, with a standard deviation of 9.6 pounds. The correlation between height and weight in the sample is 0.73.
a. What is the formula for the linear model? b. Interpret the intercept.
c. Interpret the slope. d. What is r-squared? e. Interpret r-squared.
f. James is a male college freshman who is 68 inches tall. What is his predicted weight? g. James actually weighs 152 pounds. What is his residual?
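The least-squares line can be built directly from the summary statistics in Problem 11 using the standard formulas b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar; R^2 is r^2, and a residual is observed minus predicted. A short Python sketch of the arithmetic (variable names are illustrative):

```python
# Summary statistics from Problem 11
x_bar, s_x = 68.4, 4.0    # height: mean and SD (inches)
y_bar, s_y = 141.6, 9.6   # weight: mean and SD (pounds)
r = 0.73                  # sample correlation

slope = r * s_y / s_x              # b1 = r * s_y / s_x
intercept = y_bar - slope * x_bar  # b0 = y_bar - b1 * x_bar
r_squared = r ** 2

height = 68            # James's height (inches)
observed_weight = 152  # James's actual weight (pounds)
predicted = intercept + slope * height
residual = observed_weight - predicted  # residual = observed - predicted

print(f"weight_hat = {intercept:.4f} + {slope:.4f} * height")
print(f"R^2 = {r_squared:.4f}")
print(f"predicted = {predicted:.2f} lb, residual = {residual:.2f} lb")
```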