Analytical strategy - The methodological approach of the doctoral thesis

Chapter 1: Migrants’ Group Identities and Attitudinal Integration into Politics

1.5 The methodological approach of the doctoral thesis

1.5.2 Analytical strategy

In line with the thinking on causal inference by Smith (2014) and Holland (1986), the present doctoral thesis seeks to identify the ‘effects of causes’ (EoC) instead of the ‘causes of effects’ (CoE). The CoE approach has traditionally been applied within the social sciences and leads to an ever-growing list of causes. According to Sobel (2000), researchers needed to realise ‘that they are merely adding more and more variables to a predictive conditioning set, [that] one wonders what will take the place of the thousands of purported (causal) effects that currently fill the journals’ (Sobel 2000, 650). In contrast, under the EoC approach the main statistical aim is not to decompose the variance of an outcome variable as far as possible, that is, to identify as many causes of the outcome as possible, but to identify the causal effect of a specific variable, such as national identification, on an outcome variable, such as political attitudes.

According to the counterfactual approach to causality (Rubin’s model) (Rubin 1974), the causal effect of a treatment (T) is defined as the difference between an individual’s outcome with treatment and the same individual’s outcome without treatment. Yet an individual can never be observed in both states simultaneously, which is known as the fundamental problem of causal inference (Holland 1986). Cross-sectional designs therefore compare different individuals instead. This comparison yields the causal effect only if the assumption of unit homogeneity (no unobserved heterogeneity) holds. In non-experimental survey data (without randomisation), this is generally not the case, and cross-sectional comparisons consequently suffer from self-selection based on unobserved heterogeneity (also called omitted variable bias).

Longitudinal data (i.e. repeated observations on individuals over time) and the respective regression models allow researchers to deal with the problem of selection on observable and unobservable variables (for panel regression models, see Wooldridge 2010; Allison 2009; Mundlak 1978; see also Schunck 2013; Brüderl 2010). Panel data and the respective regression methods exploit variation in characteristics between persons as well as within persons over time. The general formulation of the error-component model is:

$$y_{it} = \beta_1 x_{it} + \alpha_i + \varepsilon_{it}$$

In this model, the error term is divided into two components. While $\alpha_i$ denotes a person-specific, time-constant error term that captures the unobserved characteristics that do not change over time, $\varepsilon_{it}$ represents the idiosyncratic error term that captures all unobserved characteristics of a person that vary over time. This error-component model is the basis of random effects (RE) as well as fixed effects (FE) regression models. The two models differ in their exogeneity assumptions, that is, in whether the x-variables must be uncorrelated with $\varepsilon_{it}$ only or with $\alpha_i$ as well. Within the FE model, the unobserved $\alpha_i$ is removed prior to estimation by time-demeaning the data. Hence, fixed-effects estimates remain consistent even if $\mathrm{Cov}(x_{it}, \alpha_i) \neq 0$. Phrased differently, FE controls for all time-invariant characteristics of individuals, such as sex, country of birth, or personality traits that are rather stable (e.g. intelligence), even if they have not been observed or measured. By the same logic, FE also controls for the part of attrition bias in longitudinal data that is due to time-constant variables. The estimator based on the time-demeaned variables is called the fixed effects estimator or the within estimator and is based entirely on within-person changes over time. It rests only on the further assumption of strict exogeneity between the independent variables and the idiosyncratic error (unobserved time-variant variables), $\mathrm{Cov}(x_{it}, \varepsilon_{it}) = 0$.

In contrast, the random effects model assumes that the person-specific error term $\alpha_i$ is not correlated with the predictors, $\mathrm{Cov}(x_{it}, \alpha_i) = 0$, which allows time-invariant variables to enter the regression models as explanatory variables. Yet this assumption is easily violated in non-experimental research due to unobserved heterogeneity, which leads to biased and inconsistent RE estimates, while FE still provides consistent estimates. If the assumption $\mathrm{Cov}(x_{it}, \alpha_i) = 0$ holds, however, RE is more efficient (smaller standard errors) than FE, because it draws on within- as well as between-person information to estimate the effect. Thus, there is a trade-off between efficiency and bias within panel regression modelling. In terms of identifying EoC, avoiding bias is the more important concern.
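The consequence of $\mathrm{Cov}(x_{it}, \alpha_i) \neq 0$ can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not the estimation code of the thesis: the sample sizes, the true effect of 0.5, and the strength of the selection are assumptions chosen for illustration, and pooled OLS stands in for an estimator that uses between-person variation (no RE GLS estimator is implemented here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_waves, beta = 2000, 5, 0.5   # hypothetical sizes and true effect

# Person-specific, time-constant unobservable alpha_i
alpha = rng.normal(size=n_persons)
# x_it depends on alpha_i, so Cov(x_it, alpha_i) != 0 (self-selection)
x = 0.8 * alpha[:, None] + rng.normal(size=(n_persons, n_waves))
eps = rng.normal(size=(n_persons, n_waves))   # idiosyncratic error
y = beta * x + alpha[:, None] + eps

def slope(x_flat, y_flat):
    """OLS slope of y on x with an intercept."""
    xc = x_flat - x_flat.mean()
    return float(xc @ (y_flat - y_flat.mean()) / (xc @ xc))

# Pooled OLS ignores alpha_i and mixes between- and within-person variation
b_pooled = slope(x.ravel(), y.ravel())

# Within (FE) estimator: time-demean x and y per person, which removes alpha_i
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
b_fe = slope(x_within.ravel(), y_within.ravel())

print(f"true beta: {beta}, pooled OLS: {b_pooled:.3f}, within (FE): {b_fe:.3f}")
```

In this setup the pooled estimate is pulled away from the true effect by the omitted $\alpha_i$, while the within estimate stays close to it, because time-demeaning removes everything that is constant within a person.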

The empirical studies within Chapters 2-5 draw on these advantages of panel regression analysis. While Chapter 2 on the conditions of national identification applies random effects models, all other chapters (3, 4, and 5) use panel regression models that estimate within-effects in random-effects models, thus combining the advantages of FE and RE regression (Allison 2009; Wooldridge 2010; Mundlak 1978; Rabe-Hesketh and Skrondal 2008; for an overview, see also Schunck 2013). These are called hybrid (Allison 2009) or correlated random effects models (Mundlak 1978). They rest on the idea of decomposing between and within variation and estimating both effects within a single model. Even though the models are not new, they have received increasing attention within studies on panel data. The hybrid model according to Allison (2009), used within Chapters 2 and 3 on migrants’ political interest, decomposes the time-varying variables into a within- and a between-cluster component:

$$y_{it} = (x_{it} - \bar{x}_i)'\beta_1 + \bar{x}_i\gamma + z_i'\delta + \alpha_i + \varepsilon_{it}$$

Thus, $\beta_1$ gives the within or fixed-effects estimate, which is unbiased by the level 2 error $\alpha_i$. As in fixed-effects models, $\beta_1$ is not biased by time-constant unobserved variables. $\delta$ provides the coefficients for the time-invariant variables $z_i$, for which the random-effects assumption $E(\alpha_i \mid x_{it}, z_i) = 0$ still needs to hold. Yet, by including the cluster means $\bar{x}_i$ of the level 1 variables, the model ensures that effect estimates of the level 2 variables are corrected for between-cluster differences in $x_{it}$. In sum, this hybrid model provides the most efficient and unbiased estimates for the time-variant as well as the time-invariant indicators of national identification, political interest, and ethnicity (i.e. country of origin).
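The decomposition into a within and a between component can be sketched on synthetic data. This is a minimal illustration under stated assumptions: all data, sizes, and coefficient values are invented, the time-invariant variable `z` is generated independently of $\alpha_i$, and the point estimates are obtained by pooled OLS rather than by a random-effects estimator, so no standard errors are reported.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 1500, 4   # hypothetical number of persons and waves

alpha = rng.normal(size=n)                        # person-specific error
z = rng.integers(0, 2, size=n).astype(float)      # time-invariant, independent of alpha
x = 0.8 * alpha[:, None] + rng.normal(size=(n, t))  # correlated with alpha
y = 0.5 * x + 0.3 * z[:, None] + alpha[:, None] + rng.normal(size=(n, t))

x_mean = x.mean(axis=1, keepdims=True)   # cluster (person) means
x_dev = x - x_mean                       # within-person deviations

# Design matrix: intercept, within component, between component, z_i
X = np.column_stack([
    np.ones(n * t),
    x_dev.ravel(),
    np.broadcast_to(x_mean, (n, t)).ravel(),
    np.repeat(z, t),
])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
_, b_within, b_between, delta = coef

# Benchmark: the fixed-effects (within) estimator on time-demeaned data
y_dev = y - y.mean(axis=1, keepdims=True)
b_fe = float((x_dev.ravel() @ y_dev.ravel()) / (x_dev.ravel() @ x_dev.ravel()))

print(f"within: {b_within:.3f} (FE: {b_fe:.3f}), "
      f"between: {b_between:.3f}, delta: {delta:.3f}")
```

The coefficient on the within component coincides with the fixed-effects estimate, while the coefficient on the cluster mean absorbs the correlation between $x_{it}$ and $\alpha_i$ and therefore differs from it.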

Similar to the hybrid model is the correlated random effects (CRE) model (Mundlak 1978), applied within Chapter 5 on migrants’ satisfaction with democracy. In contrast to the hybrid model, it enters the level 1 variables uncentred and adds their cluster means, as an alternative to cluster mean centring (Halaby 2003, 519):

$$y_{it} = x_{it}'\beta_1 + \bar{x}_i\gamma + z_i'\delta + \alpha_i + \varepsilon_{it}$$

The cluster mean picks up any correlation between the person-specific error and the level 1 variables. While $\beta_1$ still provides the same fixed-effects estimate as in the hybrid model, the coefficient $\gamma$ on $\bar{x}_i$ differs: within the hybrid model it is the between effect, while within the CRE model it is the difference between the between and the within effect.
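The relation between the two parameterisations can be made explicit by rearranging the hybrid equation. The sketch below assumes a single level 1 variable (so that $x_{it}$ is a scalar and transposes can be dropped); the superscripts $H$ and $CRE$ are labels introduced here for illustration only.

```latex
\begin{align*}
y_{it} &= (x_{it} - \bar{x}_i)\,\beta_1 + \bar{x}_i\,\gamma^{H} + z_i'\delta + \alpha_i + \varepsilon_{it} \\
       &= x_{it}\,\beta_1 + \bar{x}_i\,(\gamma^{H} - \beta_1) + z_i'\delta + \alpha_i + \varepsilon_{it},
\end{align*}
so that $\gamma^{CRE} = \gamma^{H} - \beta_1$.
```

Both models therefore describe the same fitted relationship; only the interpretation of the coefficient on the cluster mean changes, and $\gamma^{CRE} = 0$ corresponds to equal within and between effects.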

In addition to the main strategy of accounting for selection on time-constant unobservables via panel regression models, the empirical regression models in Chapters 2-5 of my thesis also account for selection on time-varying observables to assess the causal effect of social identification; as for fixed effects, the assumption $\mathrm{Cov}(x_{it}, \varepsilon_{it}) = 0$ still needs to hold. I build on Morgan and Winship (2007) as well as Pearl’s (2010) framework of directed acyclic graphs (DAGs). Pearl elaborates three different approaches to identifying causal effects, one of which is conditioning on variables that block all back-door paths from the causal variable to the outcome variable. In more traditional terms, this means identifying observed variables that simultaneously affect X and Y. Such a variable confounds the relationship between X and Y and needs to be conditioned on to assess the causal effect of X. In Figure 1.4, C is a confounder of the relationship between X and Y or, in Pearl’s language, lies on a back-door path. The path X←C→Y is a back-door path because it includes a directed edge pointing to X.

For the relationships of interest within Chapters 3, 4, and 5 between migrants’ time-varying social identification and political attitudes, the approach suggests that other time-dependent integration processes, such as social and cultural adaptation, must be conditioned on to identify the causal effect of changes in psychological group memberships. In summary, my empirical analyses, in the form of panel regression models of political attitudes on ethnic identities, do not primarily seek to account for all causes of political interest, but focus on common causes that affect ethnic identities and political attitudes simultaneously. Finding the respective variables is, in a first step, a theoretical task.

Figure 1-6. A causal diagram in which the effect of X on Y is confounded by C
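The logic of blocking the back-door path X←C→Y can be illustrated with a short simulation. This is a minimal sketch on synthetic cross-sectional data; the effect sizes and sample size are assumptions made for illustration and do not correspond to any variable or result of the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
n, effect = 50_000, 0.4   # hypothetical sample size and true effect of X on Y

c = rng.normal(size=n)                          # common cause C
x = 0.7 * c + rng.normal(size=n)                # C -> X
y = effect * x + 0.9 * c + rng.normal(size=n)   # X -> Y and C -> Y

def ols(columns, outcome):
    """OLS coefficients (intercept first) of outcome on the given columns."""
    design = np.column_stack([np.ones(len(outcome)), *columns])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

b_naive = ols([x], y)[1]        # back-door path X <- C -> Y left open
b_adjusted = ols([x, c], y)[1]  # conditioning on C blocks the back-door path

print(f"true effect: {effect}, naive: {b_naive:.3f}, adjusted: {b_adjusted:.3f}")
```

Without conditioning on C, the estimate mixes the effect of X with the association that runs through the common cause; once C is included, the estimate is close to the true effect. Only common causes of X and Y need to be conditioned on for this purpose, not every cause of Y.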

In document Social Identities of Immigrants – Bridges or Barriers for their Attitudinal Integration into Politics in Germany? (Page 58-61)