1. Introduction: Proximity, Politics, and the Study of Mass Public Opinion
1.4 The Specification, Testing and Probing of Interaction Effects
1.4.4 Interaction Effects in Linear versus Generalized Linear Models
In the past decade, the testing of interactions in models with dichotomous and limited dependent variables (such as logit models) has received substantial attention in econometrics and political methodology. This literature has been far from unanimous, and has at times offered competing prescriptions to applied researchers. Given that logit models (both binary and ordinal) figure prominently in the articles of the thesis, it is worth explicating at some length the two contrasting theoretical perspectives on them: one that conceives of logit models as non-linear probability models, and one that sees them as a case of the generalized linear model, and thus within the same framework as linear models. The choice between these two perspectives on the same underlying statistical model has implications for how one tests interaction effects in such models. To state my perspective at the outset, I adopt the generalized linear model approach to logit models over the non-linear probability model approach.
A widely cited perspective originating in econometrics emphasizes the differences between linear and logit models, especially as they relate to testing interaction effects. While the argument is technically complex, the key claim of Norton and colleagues is that the logic of interaction from linear models does not carry over to logit models, and that the statistical significance of an interaction effect cannot be assessed using the coefficient and standard error of a product term. The reason is that, in a “nonlinear model” where the predicted probability of the outcome is constrained to the interval [0, 1], the value of the interaction effect depends on the values of all of the independent variables in the model, not simply the values of the variables involved in the specified interaction (Ai and Norton 2003; Norton, Wang, and Ai 2004). This is because the values of the other independent variables contribute to the predicted probability of the outcome, which can approach 0 or 1, reducing the effect of the interaction term. This phenomenon is known as “compression” (Berry, DeMeritt, and Esarey 2010; Berry, Golder, and Milton 2012; Tsai and Gill 2013).
Further, compression is not accounted for in the coefficient or standard error of the product term typically reported by statistical software. The solution, from this perspective, is to calculate cross-derivatives to estimate the interaction and its statistical significance using specialized statistical procedures (Ai and Norton 2003; Norton, Wang, and Ai 2004).
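The dependence of the probability-scale interaction on the other covariates can be illustrated with a minimal sketch. It assumes a binary logit with two continuous variables, x1 and x2, their product term, and one further covariate z; all coefficient values and names (`coefs`, `cross_derivative`, and so on) are hypothetical and chosen only for illustration, not taken from any model in the thesis. The function evaluates the cross-partial derivative of P(Y = 1) with respect to x1 and x2, the quantity on which this perspective focuses.

```python
import math

def logistic(eta):
    """Logistic CDF: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(x1, x2, z, b):
    """eta = b0 + b1*x1 + b2*x2 + b12*x1*x2 + bz*z (hypothetical model)."""
    return b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b12"] * x1 * x2 + b["bz"] * z

def cross_derivative(x1, x2, z, b):
    """Analytic cross-partial derivative of P(Y=1) with respect to x1 and x2."""
    p = logistic(linear_predictor(x1, x2, z, b))
    d1 = p * (1.0 - p)                    # first derivative of the logistic CDF
    d2 = p * (1.0 - p) * (1.0 - 2.0 * p)  # second derivative of the logistic CDF
    return b["b12"] * d1 + (b["b1"] + b["b12"] * x2) * (b["b2"] + b["b12"] * x1) * d2

# Illustrative (made-up) coefficients, not estimates from any data set.
coefs = {"b0": -0.5, "b1": 0.8, "b2": 0.6, "b12": 0.4, "bz": 1.0}

# Hold x1 and x2 fixed and vary only the third covariate z.
for z in (-3.0, 0.0, 3.0):
    p = logistic(linear_predictor(1.0, 1.0, z, coefs))
    effect = cross_derivative(1.0, 1.0, z, coefs)
    print(f"z = {z:+.1f}  P(Y=1) = {p:.3f}  cross-derivative = {effect:+.4f}")
```

Under these assumed values the cross-derivative changes as z changes, even though the product-term coefficient is held fixed, and it is generally nonzero even when that coefficient is set to zero. This is the sense in which the product term alone is said not to capture interaction on the probability scale.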
An alternative perspective, more at home within social statistics and political methodology, emphasizes the underlying continuity between linear and logit models, and argues for extending the logic of linear models (including their treatment of interactions) to logit models. This approach follows the seminal work of McCullagh and Nelder, who argue that there is continuity between linear models and other regression models, such as logit, probit, and Poisson models, because they “share a number of properties such as linearity and have a common method for computing parameter estimates. These common properties enable us to study generalized linear models as a single class rather than as an unrelated collection of special topics” (1983, 1). McCullagh and Nelder explain that generalized linear models do not differ from linear models in their “systematic component” – the linear predictor (denoted by η, or eta). They differ only in their “random component” – the distribution assumed for the dependent variable – and in the link between the random and systematic components. That is, they employ different link functions: the identity link (for linear models) versus the logit, probit, log, and other link functions (for other generalized linear models) (McCullagh and Nelder 1983, 18–24).
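The structure described above can be written out compactly. The following is a sketch in standard generalized linear model notation rather than a reproduction of McCullagh and Nelder's own equations; the subscripts, the coefficient names, and the inclusion of a product term in the linear predictor are assumptions made here for illustration.

```latex
\begin{align*}
% Systematic component: the linear predictor, here with a product term
\eta_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12}\, x_{i1} x_{i2} \\
% The link function g connects the mean of the random component to eta
g(\mu_i) &= \eta_i, \qquad \mu_i = \mathrm{E}(Y_i) \\
% Common choices of link
\text{identity (linear model):} \quad g(\mu_i) &= \mu_i \\
\text{logit:} \quad g(\mu_i) &= \log\frac{\mu_i}{1 - \mu_i} \\
\text{probit:} \quad g(\mu_i) &= \Phi^{-1}(\mu_i) \\
\text{log (conventional for Poisson models):} \quad g(\mu_i) &= \log \mu_i
\end{align*}
```

In every case the right-hand side, η, has the same linear form; only the function g applied to the mean changes.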
Following the logic of McCullagh and Nelder’s argument, Jaccard’s widely cited treatment of interaction effects in logit models emphasizes the log-odds interpretation of logistic regression over the non-linear probability interpretation.
He argues that focusing on log-odds “permits us to stay in the familiar terrain of the general linear model with the traditional interpretation of slopes and intercepts. For interaction models, it permits us to take the same general principles for analyzing interactions in traditional [that is, linear] regression analysis and apply them directly to log odds based models” (Jaccard 2001, 10–11). Also following directly from McCullagh and Nelder, Fox (2008) presents the latent-variable formulation of the binary logit model. This formulation posits an unobserved continuous variable denoted by ξ (or xi), where Y = 0 when ξ is at or below a threshold given by α, and Y = 1 when ξ is above α. So while the observed outcome Y is dichotomous, the unobserved outcome ξ, which is the variable of theoretical interest, is continuous.
The latent variable ξ can thus be assumed to be a linear function of the independent variables (Fox 2008, 343). This unobserved-variable formulation extends to the case of multiple ordered categories. This latent variable approach to ordinal logit models similarly posits an unobserved continuous variable ξ and expands the model to include multiple thresholds, given by α₁, α₂, …, αₘ₋₁, where m is the number of observed categories of Y. In this formulation, Y = 1 when ξ ≤ α₁, Y = 2 when α₁ < ξ ≤ α₂, …, and Y = m when ξ > αₘ₋₁. Importantly, ξ remains a linear function of the independent variables (Fox 2008, 363–366).
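The threshold mechanism can be made concrete with a short simulation. The sketch below assumes a single independent variable, a standard logistic error term (the assumption that yields a logit model), and made-up values for the slope and the cut points; the function names are hypothetical. It generates a continuous latent ξ that is linear in x and then records only the ordered category into which ξ falls.

```python
import bisect
import math
import random

def observed_category(xi, thresholds):
    """Map a latent value xi to an ordered category 1..m.

    thresholds holds the m - 1 cut points in increasing order. bisect_left
    counts the thresholds strictly below xi, so a value exactly equal to a
    threshold falls in the lower category (Y = 1 when xi <= alpha_1).
    """
    return bisect.bisect_left(thresholds, xi) + 1

def draw_logistic(rng):
    """Draw a standard logistic error by inverting the logistic CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulate(n, beta, thresholds, seed=0):
    """Simulate (x, xi, y): xi is linear in x plus logistic noise; y is its coarse measurement."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xi = beta * x + draw_logistic(rng)
        rows.append((x, xi, observed_category(xi, thresholds)))
    return rows

# Hypothetical slope and cut points for a four-category item (m = 4).
alphas = [-1.0, 0.0, 1.5]
data = simulate(1000, beta=0.7, thresholds=alphas)
counts = {k: sum(1 for _, _, y in data if y == k) for k in range(1, len(alphas) + 2)}
print(counts)
```

With a single threshold the same function reproduces the binary case, except that the two categories are labelled 1 and 2 rather than 0 and 1. In real data only x and the category are observed; ξ itself is never seen.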
Stepping back from the statistical theory and mathematical notation, it is worth drawing attention to the difference in terminology between the two competing perspectives, namely the characterization of logit models as non-linear models versus generalized linear models. Part of the reason for this difference is that the two perspectives focus on different quantities of interest despite their use of a common statistical model: the probability of an outcome, P(Y), versus the value of a hypothetical continuous latent variable, ξ, given an observed categorical measurement. As Rainey (2016) notes, the existing statistical literature has often not been clear about which is the quantity of interest, even though the appropriate methods for testing interaction depend on the choice to focus on P(Y) or ξ. Under the generalized linear model (log-odds or latent variable) approach, the specification and testing of interactions in binary and ordinal logit models are no different from the case of linear models. Berry, DeMeritt, and Esarey (2010, 250) are clear on this point: “If one’s hypotheses are about the effects of X1 and X2 on the unbounded latent dependent variable, the situation is fully analogous to the case of a continuous dependent variable model estimated with [linear] regression, and a [significant] nonzero product term coefficient is necessary for interaction.” Tsai and Gill (2013, 97–98) are similarly clear in stating that where the researcher’s interest is centred on an unbounded latent variable and not the probability of an event, then a model specification including a product term, following directly from conventional linear models, is both appropriate and sufficient for testing interaction.
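A brief sketch may help show why the product term suffices when ξ (or, equivalently, the log-odds) is the quantity of interest. It assumes the same kind of specification as a linear model with a product term plus one further covariate z; the coefficients and the standard error are invented for illustration, and `wald_test` is a hypothetical helper that applies the usual two-sided z test to a coefficient and its standard error.

```python
import math

def log_odds(x1, x2, z, b):
    """Linear predictor (log-odds) with a product term; z is a further covariate."""
    return b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b12"] * x1 * x2 + b["bz"] * z

def second_difference(z, b):
    """Change in the effect of x1 on the log-odds as x2 moves from 0 to 1."""
    effect_at_x2_1 = log_odds(1, 1, z, b) - log_odds(0, 1, z, b)
    effect_at_x2_0 = log_odds(1, 0, z, b) - log_odds(0, 0, z, b)
    return effect_at_x2_1 - effect_at_x2_0

def wald_test(coef, se):
    """Two-sided Wald z test of H0: coef = 0, using the standard normal."""
    z_stat = coef / se
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value

# Made-up coefficients and standard error, for illustration only.
b = {"b0": -0.5, "b1": 0.8, "b2": 0.6, "b12": 0.4, "bz": 1.0}
se_b12 = 0.15

# The interaction on the log-odds scale does not depend on z.
for z in (-3.0, 0.0, 3.0):
    print(f"z = {z:+.1f}  second difference in log-odds = {second_difference(z, b):.3f}")

z_stat, p_value = wald_test(b["b12"], se_b12)
print(f"Wald z = {z_stat:.2f}, two-sided p = {p_value:.4f}")
```

On this scale the second difference equals the product-term coefficient whatever the value of z, so the test of that coefficient is the test of interaction, just as in a linear model.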
Further, I contend that the latter perspective – focused on ξ, not P(Y) – is more appropriate given my substantive focus on mass public opinion and the attitudinal survey data I rely upon. To be sure, political data provide a number of examples of outcomes that are naturally conceived and measured as dichotomous – e.g., voting (voted versus did not vote), policy adoption (adopted or not), and conflict data (hostilities exist or do not). These are examples where modelling the probability of the occurrence of an event makes most sense. Focusing on P(Y) in such instances accords with the presentation of limited dependent variable models in standard econometrics texts such as Wooldridge (2010).3 In many cases, though – attitudinal survey data being a case in point – the dichotomous or ordered categorical coding of the dependent variable merely reflects the coarseness of survey-based measures of what remain best conceived as underlying continuous latent constructs: for example, preferences for closer (or more distant) Canada–US relations, support for (or opposition to) the Keystone XL pipeline, and preferences for more accommodating (or more restrictionist) immigration and naturalization policies. It is these unobserved, underlying constructs that remain the focus of theoretical interest, and not the observed categorical measures. For these reasons, I identify with the generalized linear model perspective represented by McCullagh and Nelder (1983), Jaccard (2001), and Fox (2008), and not the competing nonlinear probability model approach of Norton and colleagues (Ai and Norton 2003; Norton, Wang, and Ai 2004). My approach to specifying and testing interaction effects in logit models thus follows from the former approach, not the latter.