3.4.8 Problems
3.23 Try to establish the relationship that twice around the thumb is once around the wrist. Measure some volunteers’ thumbs and wrists and fit a regression line. What should the slope be? While you are at it, try to find relationships between the thumb and neck size, or thumb and waist. What do you think: Did Gulliver’s shirt fit well?
3.24 The data set fat (UsingR) contains ten body circumference measurements. Fit a linear model modeling the circumference of the abdomen by the circumference of the wrist. A 17-cm wrist size has what predicted abdomen size?
3.25 The data set wtloss (MASS) contains measurements of a patient’s weight in kilograms during a weight-rehabilitation program. Make a scatterplot showing how the variable Weight decays as a function of Days.
1. What is the Pearson correlation coefficient of the two variables?
2. Does the data appear appropriate for a linear model? (A linear model says that for two comparable time periods the same amount of weight is expected to be lost.)
3. Fit a linear model. Store the results in res. Add the regression line to your scatterplot. Does the regression line fit the data well?
4. Make a plot of the residuals, residuals(res), against the Days variable. Comment on the shape of the points.
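The steps in this problem follow a common pattern: scatterplot, correlation, fit, regression line, residual plot. A minimal sketch of that pattern is given here, assuming the MASS package is installed and that the columns are named Days and Weight as described above. The name r is introduced only for illustration.

```r
library(MASS)                               # wtloss ships with MASS

# Scatterplot of Weight against Days
plot(Weight ~ Days, data = wtloss)

# Pearson correlation (the default method of cor())
r <- cor(wtloss$Days, wtloss$Weight)

# Simple linear regression, stored in res as the problem asks
res <- lm(Weight ~ Days, data = wtloss)
abline(res)                                 # add the fitted line to the scatterplot

# Residuals against Days; a dashed reference line at zero helps judge the shape
plot(wtloss$Days, residuals(res))
abline(h = 0, lty = 2)
```

A clear pattern in the residual plot, rather than a random scatter about zero, suggests that a straight line is not capturing the trend.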
3.26 The data frame x77 contains data from each of the fifty United States. First coerce the state.x77 variable into a data frame with
> x77 = data.frame(state.x77)
Then fit each of the following models:
1. The model of illiteracy rate (Illiteracy) modeled by high school graduation rate (HS.Grad).
2. The model of life expectancy (Life.Exp) modeled by the murder rate (Murder).
3. The model of income (Income) modeled by the illiteracy rate (Illiteracy).
Write a sentence or two describing any relationship. In particular, do you find it as expected or is it surprising?
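As a sketch of how one of these models might be examined, the following fits the first one and draws it. It relies on data.frame() converting the matrix column names "HS Grad" and "Life Exp" into the syntactic names HS.Grad and Life.Exp. The name fit is introduced only for illustration.

```r
# state.x77 is a matrix in base R; coerce it to a data frame first
x77 <- data.frame(state.x77)

# Illiteracy modeled by high school graduation rate
fit <- lm(Illiteracy ~ HS.Grad, data = x77)

plot(Illiteracy ~ HS.Grad, data = x77)   # scatterplot
abline(fit)                              # regression line
coef(fit)                                # intercept and slope
```

The other two models follow the same pattern with the formula changed.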
3.27 The data set batting (UsingR) contains baseball statistics for the year 2002. Fit a linear model to runs batted in (RBI) modeled by number of home runs (HR). Make a scatterplot and add a regression line. In 2002, Mike Piazza had 33 home runs and 98 runs batted in. What is his predicted number of RBIs based on his number of home runs? What is his residual?
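A predicted value and a residual can be computed from a fitted model as sketched below. Since batting requires the UsingR package, this illustration uses the built-in cars data set instead; the observed value of 48 is a hypothetical observation chosen only to show the arithmetic.

```r
# Illustration on the built-in cars data (stopping distance modeled by speed)
fit <- lm(dist ~ speed, data = cars)

# Predicted response at a given value of the predictor
pred <- predict(fit, newdata = data.frame(speed = 20))

# Residual = observed - predicted (48 is a hypothetical observed distance)
observed <- 48
resid_obs <- observed - unname(pred)
```

The same two steps apply to the batting data with RBI as the response and HR as the predictor.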
3.28 In the American culture, it is not considered unusual or inappropriate for a man to date a younger woman. But it is viewed as inappropriate for a man to date a much younger woman. Just what is too young? Some say anything less than half the man’s age plus seven. This is tested with a survey of ten people, each indicating what the cutoff is for various ages. The results are in the data set too.young (UsingR). Fit the regression model and compare it with the rule of thumb by also plotting the line y=7+(1/2)x. How do they compare?
3.29 The data set diamond (UsingR) contains data about the price of 48 diamond rings. The variable price records the price in Singapore dollars and the variable carat records the size of the diamond. Make a scatterplot of carat versus price. Use pch=5 to plot with diamonds. Add the regression line and predict the amount a one-third carat diamond ring would cost.
3.30 The data set Animals (MASS) contains the body weight and brain weight of several different animals. A simple scatterplot will not suggest the true relationship, but a log-transform of both variables will. Do this transform and then find the slope of the regression line.
Compare this slope to that found from a robust regression model using lqs(). Comment on any differences.
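One way to set up this comparison is sketched below, assuming the MASS package is installed and that the columns are named body and brain. Because lqs() uses random resampling, the seed is fixed so the fit is repeatable; the seed value and the names ls_fit and rob_fit are arbitrary choices for illustration.

```r
library(MASS)                        # Animals and lqs() both come from MASS

# Least-squares fit on the log-log scale
ls_fit <- lm(log(brain) ~ log(body), data = Animals)

# Resistant fit; lqs() resamples, so fix the seed for repeatability
set.seed(1)
rob_fit <- lqs(log(brain) ~ log(body), data = Animals)

plot(log(brain) ~ log(body), data = Animals)
abline(ls_fit)                       # solid: least squares
abline(rob_fit, lty = 2)             # dashed: lqs

coef(ls_fit)
coef(rob_fit)
```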
3.31 To gain an understanding of the variability present in a measurement, a researcher may repeat or replicate a measurement several times. The data set breakdown (UsingR) includes measurements in minutes of the time it takes an insulating fluid to break down as a function of an applied voltage. The relationship calls for a log-transform. Plot the voltage against the logarithm of time. Find the coefficients for simple linear regression and discuss the amount of variance for each level of the voltage.
3.32 The motors (MASS) data set contains measurements on how long, in hours, it takes a motor to fail. For a range of temperatures, in degrees Celsius, a number of motors were run in an accelerated manner until they failed, or until time was cut off. (When time is cut off the data is said to have been censored.) The data shows a relationship between increased temperature and shortened life span.
The commands
> data(motors, package="MASS")
> plot(time ~ temp, pch=cens, data=motors)
produce a scatterplot of the variable time modeled by temp. The pch=cens argument marks points that were censored with a square; otherwise a circle is used. Make the scatterplot and answer the following:
1. How many different temperatures were used in the experiment?
2. Does the data look to be a candidate for a linear model? (You might want to consider why the data point (150,8000) is marked with a square.)
3. Fit a linear model. What are the coefficients?
4. Use the linear model to make a prediction for the accelerated lifetime of a motor run at a temperature of 210°C.
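Fitting the model and predicting at a new temperature can be sketched as follows, assuming the MASS package is installed. The names fit and pred are introduced only for illustration, and the sketch treats censored and uncensored points alike, which is a simplification.

```r
data(motors, package = "MASS")

# Scatterplot; cens is 0 for censored points (square) and 1 otherwise (circle)
plot(time ~ temp, pch = cens, data = motors)

# Simple linear regression of failure time on temperature
fit <- lm(time ~ temp, data = motors)
abline(fit)
coef(fit)

# Prediction at a temperature of 210 degrees Celsius
pred <- predict(fit, newdata = data.frame(temp = 210))
```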
3.33 The data set mw.ages (UsingR) contains census 2000 data on the age distribution of residents of Maplewood, New Jersey. The data is broken down by male and female.
Attach the data set and make a plot of the Male and Female variables added together. Connect the dots using the argument type="l", for example with the command plot(1:103,Male + Female,type="l").
Next, layer on top two trend lines, one for male and one for female, using the supsmu() function. What age group is missing from this town?
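The supsmu() function returns a list with components x and y, which lines() can draw directly on an existing plot. A minimal sketch of this layering is shown here on simulated counts, since mw.ages requires the UsingR package; the vectors age and counts are hypothetical stand-ins, not census data.

```r
set.seed(42)

# Hypothetical stand-in data: one noisy count per age index
age <- 1:103
counts <- round(200 + 80 * sin(age / 15) + rnorm(length(age), sd = 20))

# Raw series, with points connected by line segments
plot(age, counts, type = "l")

# Friedman's super smoother; the result has components x and y
sm <- supsmu(age, counts)
lines(sm, lty = 2)                   # layer the trend line on top
```

With the real data, one call of the form lines(supsmu(1:103, Male)) and another for Female would add the two trend lines.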