Empirical framework - Essays on Genetics and the Social Sciences

To estimate the average impact of the Swedish comprehensive schooling reform on educational attainment, cognitive performance, and income, we follow Meghir and Palme (2005) and employ differences-in-differences. Our identification strategy leverages the fact that the reform was implemented at different times in different municipalities; it is based on the comparison of cohorts that were schooled before and after the reform within municipalities, and on the comparison of pre- and post-reform municipalities within cohorts.
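The logic of the two comparisons can be illustrated with a minimal two-group, two-period sketch. The numbers below are invented purely for illustration, and the function name `did_estimate` and the group labels are hypothetical; the analysis described in this section uses a regression with staggered reform timing rather than a single 2x2 table.

```python
from statistics import mean

# Hypothetical years of schooling; all values are invented for illustration.
# Keys: (municipality group, cohort group) -> individual outcomes.
outcomes = {
    ("reform", "early_cohort"): [9.0, 10.0, 11.0],
    ("reform", "late_cohort"): [10.5, 11.5, 12.5],
    ("comparison", "early_cohort"): [9.5, 10.5, 11.5],
    ("comparison", "late_cohort"): [10.0, 11.0, 12.0],
}


def did_estimate(data):
    """Difference-in-differences from four group means."""
    # Change across cohorts within municipalities that implemented the reform.
    change_reform = mean(data[("reform", "late_cohort")]) - mean(
        data[("reform", "early_cohort")]
    )
    # Change across the same cohorts within municipalities that had not yet done so.
    change_comparison = mean(data[("comparison", "late_cohort")]) - mean(
        data[("comparison", "early_cohort")]
    )
    # The second difference removes the common cohort trend.
    return change_reform - change_comparison


estimate = did_estimate(outcomes)
print(estimate)
```

In this toy table the cohort change is 1.5 years in the reform group and 0.5 years in the comparison group, so the sketch attributes the remaining 1.0 year to the reform, provided the two groups would otherwise have followed the same trend.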

The point of departure for our empirical framework is the following specification:

(5.1) $y_{icm} = \beta_0 + \beta_R R_{cm} + \beta_S S_{icm} + \mathbf{C}_{icm}\boldsymbol{\beta}_C + \varepsilon_{icm}$,

where $y_{icm}$ is some outcome of interest (educational attainment, income or cognitive performance), for individual $i$ in birth cohort $c$ and who went to school in municipality $m$; $R_{cm}$ is a dummy indicator of reform status for cohort $c$ in municipality $m$; $S_{icm}$ is the polygenic score of educational attainment for individual $i$; and $\mathbf{C}_{icm}$ is a vector of control variables for individual $i$.

The control variables include fixed effects for each birth cohort (i.e., birth year) and for a set of municipality clusters. We defined the municipality clusters by grouping together municipalities having the same first birth cohort affected by the reform.[25] The control variables also include the top ten principal components of the genetic-relatedness matrix among the regressors (to control for population stratification, as explained above). We run all regressions separately for males and females, so we do not need to control for gender.
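The construction of the municipality clusters and the reform dummy can be sketched as follows. The municipality names and cohort years are hypothetical, as are the function names. The sketch also assumes that every cohort born in or after the first affected cohort counts as treated, which the text implies but does not state explicitly.

```python
# Hypothetical data: municipality -> first birth cohort affected by the reform.
first_reform_cohort = {
    "A": 1945,
    "B": 1945,
    "C": 1948,
    "D": 1950,
}


def municipality_cluster(municipality):
    """Cluster label: the first affected birth cohort of the municipality.

    Municipalities sharing the same first affected cohort share a cluster.
    """
    return first_reform_cohort[municipality]


def reform_dummy(birth_cohort, municipality):
    """R_cm: 1 if the cohort was schooled under the reform (assumed rule)."""
    return int(birth_cohort >= first_reform_cohort[municipality])


# Group municipalities by cluster label.
clusters = {}
for name in first_reform_cohort:
    clusters.setdefault(municipality_cluster(name), []).append(name)

print(clusters)
print(reform_dummy(1947, "A"), reform_dummy(1947, "C"))
```

Because the cluster label is a function of reform timing only, the number of cluster fixed effects equals the number of distinct first affected cohorts rather than the number of municipalities.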

[25] This approach is similar to the setup used in Pekkarinen, Uusitalo, & Kerr (2009) in their study of the effects of a Finnish school reform on intergenerational income mobility. Including a fixed effect for each of the 1,000+ municipalities would substantially reduce the degrees of freedom of our regressions. Furthermore, in the interaction specification introduced below, the fixed effects for the municipality clusters and the cluster-specific birth year trends are interacted with both the reform dummy and the polygenic score; using fixed effects for each municipality instead of for each municipality cluster would leave us with no degrees of freedom at all. As we show in Section 5.6.4, our main estimates are very similar and remain significant when using municipality fixed effects instead of fixed effects for the municipality clusters (while still using cluster-specific birth year trends).

Because the municipalities could, to a certain extent, self-select into the evaluation program, it is possible that the timing of the reform was related to municipality-specific characteristics that also influenced the outcomes of interest. More precisely, our identification strategy relies on the assumption that changes in municipality-specific factors are not correlated with the exact timing of the reform, conditional on the control variables.[26] To minimize the chances that this assumption fails, we also include in our regressions separate birth year trends for each municipality cluster as well as a set of time-varying municipality-level covariates intended to measure demographic and socioeconomic changes.[27] The detailed analyses presented in Hjalmarsson et al. (2015) and Lindgren, Oskarsson and Dawes (n.d.) further corroborate the view that we can treat reform participation as exogenous in our sample.

To estimate possible interaction effects between the polygenic score and the effects of the schooling reform, we augment the above model with terms for the interaction between the reform dummy and the score as well as for interactions between the control variables and the score and between the control variables and the reform dummy:[28]

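(5.2) $y_{icm} = \beta_0 + \beta_R R_{cm} + \beta_S S_{icm} + \beta_{RS}(R_{cm} \times S_{icm}) + \mathbf{C}_{icm}\boldsymbol{\beta}_C + (\mathbf{C}_{icm} \times S_{icm})\boldsymbol{\beta}_{CS} + (\mathbf{C}_{icm} \times R_{cm})\boldsymbol{\beta}_{CR} + \varepsilon_{icm}$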
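To see what the interaction coefficient captures, it can help to difference specification (5.2) across reform status for a fixed individual. The expression below is a direct algebraic consequence of (5.2) under the added assumption that the error term has conditional mean zero; it is offered as a reading aid, not as an additional result from the source.

```latex
\begin{align*}
& E[y_{icm} \mid R_{cm}=1,\, S_{icm},\, \mathbf{C}_{icm}]
  - E[y_{icm} \mid R_{cm}=0,\, S_{icm},\, \mathbf{C}_{icm}] \\
& \qquad = \beta_R + \beta_{RS}\, S_{icm} + \mathbf{C}_{icm}\boldsymbol{\beta}_{CR}
\end{align*}
```

Under this reading, $\beta_{RS}$ is the change in the reform effect associated with a one-unit increase in the polygenic score, while $\beta_R$ is the reform effect for an individual with $S_{icm} = 0$ and $\mathbf{C}_{icm} = \mathbf{0}$, so its interpretation depends on how the score and the controls are scaled and centered.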

For the continuous outcomes (years of schooling, cognitive performance, and income), we estimated the above regression by ordinary least squares, clustering at the municipality level. For the binary outcomes (dummies indicating highest degree completed), we estimated linear probability models,[29] also clustering at the municipality level.
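The text does not spell out the variance estimator. A common choice when clustering at the municipality level is the cluster-robust sandwich estimator, sketched here under that assumption and without any finite-sample correction. The notation is introduced for illustration only: $\mathbf{X}$ stacks all regressors, and $\mathbf{X}_m$ and $\hat{\boldsymbol{\varepsilon}}_m$ are the regressors and OLS residuals for the observations in municipality $m$, with $M$ municipalities in total.

```latex
\[
\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}})
  = (\mathbf{X}'\mathbf{X})^{-1}
    \left( \sum_{m=1}^{M} \mathbf{X}_m' \hat{\boldsymbol{\varepsilon}}_m
           \hat{\boldsymbol{\varepsilon}}_m' \mathbf{X}_m \right)
    (\mathbf{X}'\mathbf{X})^{-1}
\]
```

This form allows arbitrary correlation of the errors among individuals within the same municipality while treating different municipalities as independent.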

[26] Since our regressions include fixed effects for the municipality clusters, time-invariant differences between early and late reformers will not compromise our identification strategy.

[27] The time-varying municipality-level covariates include the following variables (each measured the year the individual turned eleven): municipal-level voter turnout, vote shares for the largest parties, and size of the electorate. We use political indicators since year-by-year indicators of socioeconomic development at the municipal level are only available for more recent time periods. However, previous research has shown that aggregate-level turnout and party vote shares in Sweden are highly correlated with more direct measures of socioeconomic development (Elinder, 2010). To create the year-by-year indicators, we interpolated turnout, vote shares, and electorate size between the election years (1948, 1952, 1956, 1958, 1960, 1964, and 1968).

[28] A common concern in GxE studies is that interaction effects may be driven by confounders; for this reason, it is important to control for interactions between the control variables and the two interacted covariates of interest. For example, suppose that the average polygenic score is higher in wealthier cities and that the reform had a smaller effect in those cities. Under such a scenario, an estimate of the interaction between reform status and the score may be confounded unless we control for the interaction between reform status and municipality wealth (or municipality fixed effects).

[29] As Ai and Norton (2003) show, coefficients on interaction terms are not easy to interpret in probit and logit models. Because we are primarily interested in the coefficient on the interaction between score and reform, we use a linear probability model instead of a logit or probit model for the binary outcomes.

For income, we estimated a panel model by ordinary least squares, clustering at the municipality level, in which we also controlled for a third-degree polynomial of age (in addition to the above control variables). For the baseline specification, we included all individual-year observations for which an individual was between 25 and 55 years old when his or her income was measured (as mentioned above, income was measured every five years). We also ran specifications including all individual-year observations for which an individual was 23 to 32 years old (“early career”); all individual-year observations for which an individual was 33 to 42 years old (“mid career”); and all individual-year observations for which an individual was 43 to 52 years old (“late career”). (Because the income data were measured every five years, most individuals had two individual-year observations in each of the early, mid, and late career specifications.)
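The age windows for the income specifications can be sketched as follows. The age bounds come from the text; the function names and the example individual (observed every five years starting at age 21) are hypothetical and serve only to show why a ten-year window typically contains two observations.

```python
# Inclusive age windows for each income specification, as described in the text.
AGE_WINDOWS = {
    "baseline": (25, 55),
    "early_career": (23, 32),
    "mid_career": (33, 42),
    "late_career": (43, 52),
}


def specifications_for(age):
    """Return the specifications in which an observation at this age is used."""
    return [name for name, (low, high) in AGE_WINDOWS.items() if low <= age <= high]


def age_polynomial(age):
    """Terms of a third-degree polynomial of age: (age, age^2, age^3)."""
    return (age, age**2, age**3)


# Hypothetical individual whose income is observed every five years from age 21.
observed_ages = list(range(21, 60, 5))

# Number of individual-year observations contributed to each specification.
counts = {
    name: sum(name in specifications_for(age) for age in observed_ages)
    for name in AGE_WINDOWS
}
print(counts)
```

Each career window spans ten consecutive ages, so an income series sampled every five years falls inside it exactly twice when no observation is missing, which matches the remark that most individuals contribute two observations per window.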

In each municipality, we exclude the birth cohort preceding the first cohort affected by the school reform. We do so because previous studies have shown that the youngest pre-reform cohort was significantly affected by the reform, possibly because a substantial share of the pupils born late in a given year started school a year later than they were supposed to (Fredriksson & Öckert, 2014; Hjalmarsson et al., 2015).
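This exclusion rule amounts to a simple filter, sketched below. The function name and the example years are hypothetical.

```python
def keep_observation(birth_cohort, first_affected_cohort):
    """Drop only the cohort born the year before the first affected cohort."""
    return birth_cohort != first_affected_cohort - 1


# Hypothetical municipality whose first affected cohort was born in 1946.
cohorts = range(1943, 1949)
kept = [c for c in cohorts if keep_observation(c, 1946)]
print(kept)
```

Earlier pre-reform cohorts stay in the sample as the comparison group; only the single cohort adjacent to the reform is removed.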

Results

Effect of the schooling reform on educational attainment and
