
Academic year: 2021


Quadratic Models

We extended the additive model in two variables to the interaction model by adding a third term to the equation.

Similarly, we can extend the linear model in one variable to the quadratic model by adding a second term to the equation:

$$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2.$$

This is a special case of the two-variable model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

with $x_1 = x$ and $x_2 = x^2$.
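The "special case" claim can be checked numerically. In the minimal sketch below, a quadratic fit written with I(x^2) in the formula gives the same estimates as an ordinary two-variable fit on the constructed columns x1 = x and x2 = x^2. The simulated data and the names x, y, x1 and x2 are hypothetical and do not come from the course files.

```r
# Hypothetical simulated data (not from the course files): concave trend plus noise
set.seed(1)
x <- seq(1, 10, length.out = 30)
y <- 5 + 3 * x - 0.2 * x^2 + rnorm(30, sd = 0.5)

# Quadratic model written directly in the formula; I() makes ^ mean arithmetic squaring
fitQuad <- lm(y ~ x + I(x^2))

# The same model as a two-variable linear model with x1 = x and x2 = x^2
x1 <- x
x2 <- x^2
fitTwoVar <- lm(y ~ x1 + x2)

# The estimated coefficients are identical; only the names differ
cbind(quadratic = coef(fitQuad), twoVariable = coef(fitTwoVar))
```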

1 / 16 Multiple Linear Regression Quadratic Models

(2)

Example: immune system and exercise

x = maximal oxygen uptake (VO2 max, mL/(kg · min));

y = immunoglobulin level (IgG, mg/dL);

data for 30 subjects (AEROBIC.txt).

Get the data and plot them:

aerobic <- read.table("Text/Exercises&Examples/AEROBIC.txt", header = TRUE)

plot(aerobic[, c("MAXOXY", "IGG")])

The slight curvature in the scatterplot suggests that a linear model may not fit well.

Check the linear model:

plot(lm(IGG ~ MAXOXY, aerobic))

The plot of residuals against fitted values shows definite curvature.

Fit and summarize the quadratic model:

aerobicLm <- lm(IGG ~ MAXOXY + I(MAXOXY^2), aerobic)
summary(aerobicLm)

Output

Call:
lm(formula = IGG ~ MAXOXY + I(MAXOXY^2), data = aerobic)

Residuals:
     Min       1Q   Median       3Q      Max
-185.375  -82.129    1.047   66.007  227.377

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1464.4042   411.4012  -3.560  0.00140 **
MAXOXY         88.3071    16.4735   5.361 1.16e-05 ***
I(MAXOXY^2)    -0.5362     0.1582  -3.390  0.00217 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 106.4 on 27 degrees of freedom
Multiple R-squared:  0.9377,  Adjusted R-squared:  0.9331
F-statistic: 203.2 on 2 and 27 DF,  p-value: < 2.2e-16

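The quadratic term I(MAXOXY^2) is significant (t = -3.390, p = 0.00217), so we reject $H_0: \beta_2 = 0$, that is, the hypothesis that the linear model is adequate.

The estimated coefficient of the quadratic term is negative (-0.5362), which is consistent with the concave shape of the curve.

The other two t-ratios (for the intercept and for MAXOXY) test hypotheses that are of no interest here. Once the quadratic term is in the model, whether $\beta_0 = 0$ or $\beta_1 = 0$ says nothing about whether the relationship is linear or curved.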
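The t-test on the single quadratic term is equivalent to a nested-model F-test that compares the straight-line fit with the quadratic fit. When one term is added, F equals t squared. The minimal sketch below checks this with anova(). It uses simulated data, and the names x, y, fitLin and fitQuad are hypothetical and unrelated to AEROBIC.txt. With the course data, the analogous comparison would be anova(lm(IGG ~ MAXOXY, aerobic), aerobicLm).

```r
# Hypothetical simulated data with a concave trend (not the AEROBIC data)
set.seed(2)
x <- runif(40, min = 20, max = 70)
y <- -1000 + 80 * x - 0.5 * x^2 + rnorm(40, sd = 100)

fitLin  <- lm(y ~ x)            # null model: straight line
fitQuad <- lm(y ~ x + I(x^2))   # alternative: adds the quadratic term

# t statistic for the quadratic coefficient, taken from the summary table
tQuad <- summary(fitQuad)$coefficients["I(x^2)", "t value"]

# Nested-model F statistic from comparing the two fits
Fnested <- anova(fitLin, fitQuad)$F[2]

c(tSquared = tQuad^2, F = Fnested)   # the two numbers agree
```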

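Extrapolation: because $\hat\beta_2 < 0$, the fitted curve $\hat y = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$ has a maximum where its slope $\hat\beta_1 + 2\hat\beta_2 x$ is zero:

$$\text{MAXOXY} = -\frac{\hat\beta_1}{2\hat\beta_2} = \frac{88.3071}{2 \times 0.5362} \approx 82.$$

The fitted curve then declines for higher MAXOXY, which seems unlikely to represent the real relationship.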
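The location of the maximum follows from the printed estimates alone. The short R sketch below does the arithmetic with the two coefficients typed in by hand from the summary(aerobicLm) output.

```r
# Estimates copied from the summary(aerobicLm) output above
b1 <- 88.3071    # coefficient of MAXOXY
b2 <- -0.5362    # coefficient of I(MAXOXY^2)

# The fitted slope b1 + 2 * b2 * x is zero at the vertex of the parabola
xMax <- -b1 / (2 * b2)
xMax
```

When the fitted object is available, `-coef(aerobicLm)[2] / (2 * coef(aerobicLm)[3])` gives the same quantity from the unrounded estimates.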

An alternative analysis

The graph of IGG against log(MAXOXY) is more linear:

with(aerobic, plot(log(MAXOXY), IGG))

aerobicLm2 <- lm(IGG ~ log(MAXOXY), aerobic)
summary(aerobicLm2)

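# Data on the original scale, with both fitted curves:
# quadratic model (aerobicLm) in blue, logarithmic model (aerobicLm2) in red
with(aerobic, plot(MAXOXY, IGG))
with(aerobic, lines(sort(MAXOXY), fitted(aerobicLm)[order(MAXOXY)], col = "blue"))
with(aerobic, lines(sort(MAXOXY), fitted(aerobicLm2)[order(MAXOXY)], col = "red"))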

The fitted logarithmic curve (red) continues to increase indefinitely, but with diminishing slope.
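The contrast between the two curves is clearest in their slopes. The quadratic fit has slope $\hat\beta_1 + 2\hat\beta_2 x$, which falls linearly and eventually turns negative. The log fit has slope $\hat\beta_1 / x$, which stays positive but shrinks. The sketch below evaluates both slopes using coefficients copied from the two summary() outputs. The MAXOXY values are hypothetical points chosen for illustration and have not been checked against the range of the data.

```r
# Estimates copied from the two summary() outputs in this section
b1Quad <- 88.3071    # MAXOXY in the quadratic model (aerobicLm)
b2Quad <- -0.5362    # I(MAXOXY^2) in the quadratic model
b1Log  <- 1653.38    # log(MAXOXY) in the log model (aerobicLm2)

# Illustrative (hypothetical) MAXOXY values
x <- c(40, 60, 80, 100)

# Slopes dE(IGG)/dMAXOXY under each model
quadSlope <- b1Quad + 2 * b2Quad * x   # linear in x; negative beyond the vertex
logSlope  <- b1Log / x                 # always positive, shrinking toward 0

round(cbind(x, quadSlope, logSlope), 2)
```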

Output

Call:
lm(formula = IGG ~ log(MAXOXY), data = aerobic)

Residuals:
     Min       1Q   Median       3Q      Max
-165.455  -88.651   -2.395   55.756  218.934

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -4885.71     324.33  -15.06 5.87e-15 ***
log(MAXOXY)  1653.38      83.07   19.90  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 107.6 on 28 degrees of freedom
Multiple R-squared:  0.934,  Adjusted R-squared:  0.9316
F-statistic: 396.1 on 1 and 28 DF,  p-value: < 2.2e-16
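The two fits are hard to tell apart on these summaries. Adjusted R-squared is 0.9331 for the quadratic model and 0.9316 for the log model, and the residual standard errors are 106.4 and 107.6. The log model uses one fewer coefficient. Adjusted R-squared penalizes extra predictors through $1 - (1 - R^2)(n - 1)/(n - p - 1)$. The sketch below recomputes both printed values from the multiple R-squared, n = 30 subjects and the number of predictors p. The helper name adjR2 is hypothetical.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size n, predictors p
adjR2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 30                               # 30 subjects in AEROBIC.txt
adjQuad <- adjR2(0.9377, n, p = 2)    # IGG ~ MAXOXY + I(MAXOXY^2)
adjLog  <- adjR2(0.934,  n, p = 1)    # IGG ~ log(MAXOXY)

round(c(quadratic = adjQuad, log = adjLog), 4)
```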

More Complex Models

Complete second-order model

When the first-order model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

is inadequate, the interaction model

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

may be better, but sometimes a complete second-order model is needed:

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$$

Example: cost of shipping packages

Get the data and plot them:

express <- read.table("Text/Exercises&Examples/EXPRESS.txt", header = TRUE)

pairs(express)

Fit the complete second-order model and summarize it:

expressLm <- lm(Cost ~ Weight * Distance +
                I(Weight^2) + I(Distance^2), express)
summary(expressLm)

plot(expressLm)
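In the R formula, Weight * Distance is shorthand for Weight + Distance + Weight:Distance. I() makes ^ mean arithmetic squaring rather than formula crossing. As a result, the right-hand side supplies exactly the five terms $x_1, x_2, x_1 x_2, x_1^2, x_2^2$ of the complete second-order model. The sketch below lists the columns that model.matrix() builds for the same right-hand side. It uses a toy data frame, and the names d, x1 and x2 are hypothetical. R places the interaction column last, which matches the order of the coefficients in the expressLm output.

```r
# Hypothetical toy data frame; only the design matrix is inspected, nothing is fitted
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3))

# Same right-hand side as the expressLm formula, with x1 and x2 standing in for Weight and Distance
X <- model.matrix(~ x1 * x2 + I(x1^2) + I(x2^2), data = d)

colnames(X)      # intercept plus one column per term of the complete second-order model
X[, "x1:x2"]     # the interaction column is the elementwise product x1 * x2
```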


Output

Call:
lm(formula = Cost ~ Weight * Distance + I(Weight^2) + I(Distance^2), data = express)

Residuals:
     Min       1Q   Median       3Q      Max
-0.86027 -0.19898 -0.00885  0.16531  0.94396

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      8.270e-01  7.023e-01   1.178 0.258588
Weight          -6.091e-01  1.799e-01  -3.386 0.004436 **
Distance         4.021e-03  7.998e-03   0.503 0.622999
I(Weight^2)      8.975e-02  2.021e-02   4.442 0.000558 ***
I(Distance^2)    1.507e-05  2.243e-05   0.672 0.512657
Weight:Distance  7.327e-03  6.374e-04  11.495 1.62e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4428 on 14 degrees of freedom
Multiple R-squared:  0.9939,  Adjusted R-squared:  0.9918
F-statistic: 458.4 on 5 and 14 DF,  p-value: 5.371e-15
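Because the complete second-order model contains both $x_1^2$ and $x_1 x_2$, the estimated change in mean cost per extra unit of Weight is not a single coefficient. With $x_1$ = Weight and $x_2$ = Distance, differentiating gives $\hat\beta_1 + 2\hat\beta_4 x_1 + \hat\beta_3 x_2$, so the effect of Weight depends on both Weight and Distance. The sketch below evaluates this expression from the printed estimates. The evaluation points and the helper name weightSlope are hypothetical, and the points have not been checked against the range of the data.

```r
# Estimates copied from the summary(expressLm) output above
# (x1 = Weight, x2 = Distance in the complete second-order model)
b1 <- -6.091e-01   # Weight
b3 <-  7.327e-03   # Weight:Distance
b4 <-  8.975e-02   # I(Weight^2)

# Hypothetical helper: partial derivative of fitted mean cost with respect to Weight
weightSlope <- function(weight, distance) b1 + 2 * b4 * weight + b3 * distance

# Hypothetical evaluation points, chosen only for illustration
weightSlope(weight = 5, distance = 100)
weightSlope(weight = 5, distance = 200)
```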

Qualitative Variables

A qualitative variable (or factor) is one that indicates membership in one of several categories.

E.g., a person’s gender = male or female: a qualitative variable with two levels, indicating membership of one of two categories.

E.g., package type = Fragile, Semifragile, or Durable: three levels, corresponding to three categories.
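In R, a qualitative variable is stored as a factor. The small sketch below uses hypothetical package types, not the CARGO data. It shows that R sorts factor levels alphabetically by default, so Durable comes first. Under R's default treatment coding the first level becomes the base level, which is consistent with Durable serving as the reference category in the CARGO fit.

```r
# Hypothetical package types for five items (not the CARGO data)
type <- factor(c("Fragile", "Durable", "Semifragile", "Durable", "Fragile"))

levels(type)    # levels are sorted alphabetically: Durable, Fragile, Semifragile
nlevels(type)   # number of levels, i.e. number of categories
```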

We code a qualitative variable using indicator (dummy) variables:

Choose one level to use as a base or reference level, say male or Durable.

For each other level, create a variable

$$x_j = \begin{cases} 1 & \text{if the item is in this category,} \\ 0 & \text{otherwise.} \end{cases}$$

For gender, there is only one other category, so the only indicator variable is

$$x = \begin{cases} 1 & \text{for a female,} \\ 0 & \text{for a male.} \end{cases}$$

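For packages, there are two other categories, so the indicator variables are

$$x_{\text{Fragile}} = \begin{cases} 1 & \text{for a Fragile package,} \\ 0 & \text{otherwise,} \end{cases}
\qquad
x_{\text{Semifragile}} = \begin{cases} 1 & \text{for a Semifragile package,} \\ 0 & \text{otherwise.} \end{cases}$$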

For any item, at most one of the indicator variables is non-zero, indicating a non-base category;

if they are all zero, the item belongs to the base category.
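The indicator coding described above is what R's model.matrix() produces for a factor under its default treatment contrasts. The sketch below uses hypothetical package types and lists Durable first, so that it becomes the base level. relevel(type, ref = "Durable") is another way to choose the base.

```r
# Hypothetical package types; Durable is listed first so it is the base level
type <- factor(c("Fragile", "Durable", "Semifragile", "Durable"),
               levels = c("Durable", "Fragile", "Semifragile"))

# Treatment (indicator) coding: an intercept plus one 0/1 column per non-base level
X <- model.matrix(~ type)
X
```

Each row has at most one 1 among the indicator columns, and the Durable rows are all zero, exactly as described for the base category.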


Example: shipment cost of packages, by type.

Get the data and plot them:

cargo <- read.table("Text/Exercises&Examples/CARGO.txt", header = TRUE)

plot(COST ~ CARGO, cargo)

Fit and summarize the model:

cargoLm <- lm(COST ~ CARGO, cargo)
summary(cargoLm)

Output

Call:
lm(formula = COST ~ CARGO, data = cargo)

Residuals:
   Min     1Q Median     3Q    Max
 -2.20  -1.80  -1.00   1.05   4.24

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      3.260      1.075   3.032   0.0104 *
CARGOFragile     9.740      1.521   6.405 3.38e-05 ***
CARGOSemiFrag    5.440      1.521   3.577   0.0038 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.404 on 12 degrees of freedom
Multiple R-squared:  0.7745,  Adjusted R-squared:  0.7369
F-statistic: 20.61 on 2 and 12 DF,  p-value: 0.0001315

Note that the intercept is the fitted value for CARGOFragile = 0 and CARGOSemiFrag = 0; that is, for Durable packages.

The coefficients of CARGOFragile and CARGOSemiFrag measure the differences in mean cost between those categories and Durable.
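When the factor is the only predictor, each fitted value is a category mean. The group means can therefore be read off the coefficients by adding each indicator coefficient to the intercept. The arithmetic below uses the printed estimates. In R, tapply(cargo$COST, cargo$CARGO, mean) should reproduce the same three numbers directly from the data.

```r
# Estimates copied from the summary(cargoLm) output above
b0        <- 3.260   # (Intercept): mean cost for Durable, the base level
bFragile  <- 9.740   # CARGOFragile: Fragile mean minus Durable mean
bSemiFrag <- 5.440   # CARGOSemiFrag: SemiFrag mean minus Durable mean

# Fitted mean cost for each package type: intercept plus that type's indicator coefficient
groupMeans <- c(Durable  = b0,
                Fragile  = b0 + bFragile,
                SemiFrag = b0 + bSemiFrag)
groupMeans
```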

The overall model F-test is the same as the one-way analysis of variance F-test:

cargoAov <- aov(COST ~ CARGO, cargo)
summary(cargoAov)

Output

            Df Sum Sq Mean Sq F value   Pr(>F)
CARGO        2 238.25  119.13   20.61 0.000132 ***
Residuals   12  69.37    5.78
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
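The entries of the ANOVA table are linked by simple arithmetic. The same numbers reproduce the F-statistic, residual standard error and R-squared printed by summary(cargoLm). The sketch below starts from the rounded sums of squares in the table, so its results match the printed values only up to rounding.

```r
# Sums of squares and degrees of freedom copied from summary(cargoAov)
ssModel <- 238.25; dfModel <- 2
ssResid <- 69.37;  dfResid <- 12

msModel <- ssModel / dfModel            # mean square for CARGO
msResid <- ssResid / dfResid            # mean square error
Fstat   <- msModel / msResid            # same F as the overall lm() F-test
pValue  <- pf(Fstat, dfModel, dfResid, lower.tail = FALSE)
sigma   <- sqrt(msResid)                # residual standard error
R2      <- ssModel / (ssModel + ssResid)  # proportion of variation explained

c(F = Fstat, p = pValue, sigma = sigma, R2 = R2)
```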
