5.2.5 Regression modelling
5.2.5.3 Multiple variables and their interactions
Crucially, regression models can take more than one independent variable. These multiple regression models estimate a parameter for each independent variable (much in the same way as the simple regression models discussed above) in the presence of the other independent variables. The parameter of one independent variable is the change in the dependent variable that is associated with an increase of 1 in that independent variable, keeping all others constant. Multiple regression also allows for interaction effects, which model the additional change in the dependent variable that occurs when two independent variables change at the same time. Interactions typically mean deviations from the overall tendencies of either variable. For example, a model of general health might show an interaction between quality of diet and amount of exercise: a better diet leads to better health, more exercise leads to better health, but a better diet and more exercise at the same time mean an even greater increase in health than the simple sum of the two individual effects. Interactions can also reverse the trend of the individual effects: a model of election data may show an effect of candidate age (with younger, less experienced, less known candidates winning fewer votes) and an effect of the demographics of the district (with districts with a younger, less engaged population voting less overall), but also an interaction effect of the two: young candidates running in 'younger' districts may win more votes (because their platform or public profile engages the voters more).
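In symbols, a two-variable model with an interaction can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the surrounding text: \hat{y} is the fitted value of the dependent variable, x_1 and x_2 are the two independent variables, and the \beta terms are the estimated parameters.

```latex
\[
  \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2
\]
\[
  \frac{\partial \hat{y}}{\partial x_1} = \beta_1 + \beta_3 x_2
\]
```

The second line shows why an interaction is a deviation from the overall tendency: the change associated with x_1 is no longer the constant \beta_1 but depends on the value of x_2. If \beta_3 has the opposite sign to \beta_1 and is large enough, the trend can reverse, as in the election example.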
Continuing with examples from the etymology data, it is conceivable that the effect of family size interacts with the effect of regularity: for example, the irregular English verb see is probably more frequent than the regular believe, even though they have similar family sizes. A regression model that uses these two effects side by side, without an interaction, would not reveal this effect. Interaction effects allow the model to account for changes in two independent variables together having an effect on the dependent variable. Fig. 5.37 shows the frequency of the 285 Dutch verbs in a corpus of writing by their family size (just as in Fig. 5.34), colored by whether they have a regular inflectional paradigm (blue dots) or an irregular one (orange dots). The regression model using family size, regularity of inflection, and their interaction as predictors for frequency is summarized in Table 5.7.
The first three rows of that table can be interpreted much as above: the intercept is now the fitted frequency value for when all variables are 0 or at their reference level. This means that this model predicts a frequency value of 6.1 for regular verbs with no other words derived from the same base. The effects for irregular inflection and family size by themselves (middle rows of Table 5.7) can be added to this to calculate the predicted values for the cases where only one of those variables changes: adding the effect of irregular inflection (−1.61) to the intercept value gives 4.49 as the predicted value for an irregular verb with family size 0; adding the effect of family size (multiplied by the family size) gives the predicted value for a regular verb with that family size, such as 6.1 + (3 × 0.53) = 7.69 for a verb with family size 3. The standard error, Wald test statistic, and p-values in Table 5.7 are calculated as above; with 0.05 as the significance level for p, all effects are significant.
When both variables change from the intercept/baseline, however, the interaction effect ("irregular : family size", the bottom row of Table 5.7) comes into play as well: we add both effects and the interaction effect to the intercept value, with the family size effect and the interaction effect each multiplied by the family size. The model thus predicts that an irregular verb with family size 3 has a frequency value of 6.1 + (−1.61) + (3 × 0.53) + (3 × 0.58) = 7.82. As both the main effect parameter for family size and the interaction parameter are multiplied by the same value (a verb's family size), they can be summed for simplicity: 0.53 + 0.58 = 1.11. This is how much the frequency value changes with a change of 1 in the family size if the verb is irregular; in other words, it is the slope of the regression line for irregular verbs. The interaction effect does not affect the prediction for regular verbs, so the change and slope there are simply the parameter for family size, 0.53. The two lines in Fig. 5.37 show the model fits for regular (blue line) and irregular (orange line) verbs. It is apparent from these values and slopes that the model suggests the effect of family size on frequency is stronger for irregular verbs than it is for regular verbs.
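The arithmetic in the two paragraphs above can be reproduced with a minimal sketch. The coefficient values are copied from Table 5.7; the function and constant names are hypothetical and chosen only for this illustration, and the model itself is not refitted here.

```python
# Coefficients copied from Table 5.7 (rounded to two decimals there).
INTERCEPT = 6.10
B_IRREGULAR = -1.61
B_FAMILY_SIZE = 0.53
B_INTERACTION = 0.58


def predict_frequency(irregular: bool, family_size: float) -> float:
    """Fitted frequency value for one verb under the Table 5.7 model."""
    is_irregular = 1.0 if irregular else 0.0  # regular is the reference level
    return (
        INTERCEPT
        + B_IRREGULAR * is_irregular
        + B_FAMILY_SIZE * family_size
        + B_INTERACTION * is_irregular * family_size
    )


def family_size_slope(irregular: bool) -> float:
    """Change in the fitted value per unit increase in family size."""
    return B_FAMILY_SIZE + (B_INTERACTION if irregular else 0.0)


if __name__ == "__main__":
    for irregular in (False, True):
        label = "irregular" if irregular else "regular"
        for family_size in (0, 3):
            value = predict_frequency(irregular, family_size)
            print(f"{label:9s} family size {family_size}: {value:.2f}")
        print(f"{label:9s} slope: {family_size_slope(irregular):.2f}")
```

Coding regularity as a 0/1 indicator makes the interaction term vanish for regular verbs, which is why their slope stays at the family size parameter alone.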
variable                   parameter estimate   standard error   t       p
(Intercept)                 6.10                0.36             17.14   < 0.01
irregular inflection       -1.61                0.57             -2.85   < 0.01
family size                 0.53                0.11              5.05   < 0.01
irregular : family size     0.58                0.17              3.49   < 0.01
Table 5.7: Coefficients of regression model for writing corpus frequency on regularity of inflectional paradigm and family size
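As a rough check on the p column of Table 5.7, the sketch below converts the reported test statistics into two-sided p-values. It assumes the statistic is treated as approximately standard normal; the function name is hypothetical, and the statistics are taken from the table as reported, because dividing the rounded estimates by the rounded standard errors gives slightly different ratios.

```python
import math


def two_sided_p_normal(statistic: float) -> float:
    """Two-sided p-value, treating the statistic as standard normal."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(statistic) / math.sqrt(2.0))


# Test statistics as reported in the "t" column of Table 5.7.
REPORTED_STATISTICS = {
    "(Intercept)": 17.14,
    "irregular inflection": -2.85,
    "family size": 5.05,
    "irregular : family size": 3.49,
}

if __name__ == "__main__":
    for name, statistic in REPORTED_STATISTICS.items():
        p = two_sided_p_normal(statistic)
        print(f"{name:24s} p = {p:.2g}  significant at 0.05: {p < 0.05}")
```

Under this approximation every row falls below 0.01, in line with the table.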
This concludes the illustration of regression modelling in this section. In practice, models with many more independent variables can be fit to test which (if any) of them have a significant effect on the dependent variable in the presence of all the others. This is the approach I will take in this thesis: all (independent) variables that are present in the data and that I have a reason to expect to have an effect on the dependent variable in question will be included in a given model to test whether they do have an effect. In other words, no model reduction or selection will occur. The p-values of their parameter estimates under the normality assumption for the Wald statistic will be used as the indicator for whether an independent variable has a significant effect on that dependent variable.