To estimate the average impact of the Swedish comprehensive schooling reform on educational attainment, cognitive performance, and income, we follow Meghir and Palme (2005) and employ a differences-in-differences design. Our identification strategy leverages the fact that the reform was implemented at different times in different municipalities; it is based on the comparison of cohorts that were schooled before and after the reform within municipalities, and on the comparison of pre- and post-reform municipalities within cohorts.
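The two comparisons described above can be illustrated with a minimal two-municipality, two-cohort sketch. The data, the municipality labels, and the helper names (`cell_mean`, `did_estimate`) are hypothetical and chosen only for illustration; the actual estimates come from the regression framework, not from raw cell means.

```python
from statistics import mean

# Hypothetical toy data: (municipality, birth_cohort, years_of_schooling).
# Assumption: municipality "A" adopts the reform starting with the 1950 cohort,
# while municipality "B" has not yet adopted it for either cohort shown.
records = [
    ("A", 1948, 9.0), ("A", 1948, 10.0),
    ("A", 1950, 10.5), ("A", 1950, 11.5),
    ("B", 1948, 9.5), ("B", 1948, 10.5),
    ("B", 1950, 10.0), ("B", 1950, 11.0),
]

def cell_mean(municipality, cohort):
    """Mean outcome for one municipality-by-cohort cell."""
    return mean(y for m, c, y in records if m == municipality and c == cohort)

def did_estimate(treated, control, pre, post):
    """Within-municipality cohort change in the treated unit minus the same change in the control unit."""
    change_treated = cell_mean(treated, post) - cell_mean(treated, pre)
    change_control = cell_mean(control, post) - cell_mean(control, pre)
    return change_treated - change_control

estimate = did_estimate("A", "B", pre=1948, post=1950)
print(estimate)
```

The control municipality's cohort change stands in for the cohort trend that the treated municipality would have followed without the reform, which is the assumption the rest of this section tries to make plausible.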
The point of departure for our empirical framework is the following specification:

(5.1) $y_{icm} = \beta_0 + \beta_R R_{cm} + \beta_S S_{icm} + \mathbf{X}_{icm}\boldsymbol{\Gamma} + \varepsilon_{icm},$

where $y_{icm}$ is some outcome of interest (educational attainment, income, or cognitive performance) for individual $i$ in birth cohort $c$ who went to school in municipality $m$; $R_{cm}$ is a dummy indicator of reform status for cohort $c$ in municipality $m$; $S_{icm}$ is the polygenic score of educational attainment for individual $i$; and $\mathbf{X}_{icm}$ is a vector of control variables for individual $i$.
The control variables include fixed effects for each birth cohort (i.e., birth year) and for a set of municipality clusters. We defined the municipality clusters by grouping together municipalities having the same first birth cohort affected by the reform.25 The controls also include the top ten principal components of the genetic-relatedness matrix among the regressors (to control for population stratification, as explained above). We run all regressions separately for males and females, so we do not need to control for gender.
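As a rough illustration of how the municipality clusters and the reform dummy could be constructed, consider the sketch below. The municipality names and cohort years are hypothetical, and the rule that every cohort from the first affected one onward counts as reformed is an assumption made for the example, not a statement about the actual coding.

```python
from collections import defaultdict

# Hypothetical mapping: municipality -> first birth cohort affected by the reform.
first_reform_cohort = {"Alpha": 1945, "Beta": 1948, "Gamma": 1945, "Delta": 1950}

def build_clusters(first_cohort_by_municipality):
    """Group municipalities that share the same first affected birth cohort."""
    clusters = defaultdict(list)
    for municipality, cohort in sorted(first_cohort_by_municipality.items()):
        clusters[cohort].append(municipality)
    return dict(clusters)

def reform_dummy(birth_cohort, municipality):
    """Reform status for a cohort-municipality pair.

    Assumption: once a municipality adopts the reform, all later cohorts are treated.
    """
    return int(birth_cohort >= first_reform_cohort[municipality])

clusters = build_clusters(first_reform_cohort)
print(clusters)
print(reform_dummy(1947, "Beta"), reform_dummy(1948, "Beta"))
```

Each key of `clusters` would then receive one fixed effect, so the number of cluster dummies equals the number of distinct first affected cohorts rather than the number of municipalities.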
25 This approach is similar to the setup used by Pekkarinen, Uusitalo, & Kerr (2009) in their study of the effects of a Finnish school reform on intergenerational income mobility. Including a fixed effect for each of the 1,000+ municipalities would substantially reduce the degrees of freedom of our regressions. Furthermore, in the interaction specification introduced below, the fixed effects for the municipality clusters and the cluster-specific birth year trends are interacted with both the reform dummy and the polygenic score; using fixed effects for each municipality instead of for each municipality cluster would leave us with no degrees of freedom at all. As we show in Section 5.6.4, our main estimates are very similar and remain significant when using municipality fixed effects instead of fixed effects for the municipality clusters (while still using cluster-specific birth year trends).

Because the municipalities could, to a certain extent, self-select into the evaluation program, it is possible that the timing of the reform was related to municipality-specific characteristics that also influenced the outcomes of interest. More precisely, our identification strategy relies on the assumption that changes in municipality-specific factors are not correlated with the exact timing of the reform, conditional on the control variables.26 To minimize the chances that this assumption fails, we also include in our regressions separate birth year trends for each municipality cluster as well as a set of time-varying municipality-level covariates intended to measure demographic and socioeconomic changes.27 The detailed analyses presented in Hjalmarsson et al. (2015) and Lindgren, Oskarsson and Dawes (n.d.) further corroborate the view that we can treat reform participation as exogenous in our sample.

To estimate possible interaction effects between the polygenic score and the effects of the schooling reform, we augment the above model with terms for the interaction between the reform dummy and the score, as well as for interactions between the control variables and the score and between the control variables and the reform dummy28:
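(5.2) $y_{icm} = \beta_0 + \beta_R R_{cm} + \beta_S S_{icm} + \beta_{RS}(R_{cm} \times S_{icm}) + \mathbf{X}_{icm}\boldsymbol{\Gamma} + (\mathbf{X}_{icm} \times S_{icm})\boldsymbol{\Gamma}_S + (\mathbf{X}_{icm} \times R_{cm})\boldsymbol{\Gamma}_R + \varepsilon_{icm}$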
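To make the role of the interaction coefficient concrete, the following short derivation may help. It writes $R$ for the reform dummy, $S$ for the polygenic score, $\mathbf{X}$ for the controls, $\beta_{RS}$ for the coefficient on the reform-by-score interaction, and $\boldsymbol{\Gamma}_R$ for the coefficients on the controls-by-reform interactions. It assumes the error term has conditional mean zero given the regressors; this is a sketch of the standard interpretation, not an additional result.

```latex
\begin{aligned}
\mathbb{E}[y_{icm} \mid R_{cm}=1, S_{icm}, \mathbf{X}_{icm}]
  - \mathbb{E}[y_{icm} \mid R_{cm}=0, S_{icm}, \mathbf{X}_{icm}]
  &= \beta_R + \beta_{RS} S_{icm} + \mathbf{X}_{icm}\boldsymbol{\Gamma}_R, \\
\frac{\partial}{\partial S_{icm}}
  \bigl(\beta_R + \beta_{RS} S_{icm} + \mathbf{X}_{icm}\boldsymbol{\Gamma}_R\bigr)
  &= \beta_{RS}.
\end{aligned}
```

Under these assumptions, $\beta_{RS}$ is the change in the reform's effect associated with a one-unit increase in the score, while $\beta_R$ alone is the reform's effect only for an individual with $S_{icm} = 0$ and $\mathbf{X}_{icm} = \mathbf{0}$.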
For the continuous outcomes (years of schooling, cognitive performance, and income), we estimated the above regression by ordinary least squares, clustering standard errors at the municipality level. For the binary outcomes (dummies indicating highest degree completed), we estimated linear probability models,29 also clustering standard errors at the municipality level.
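For reference, the textbook form of the cluster-robust (sandwich) variance estimator for ordinary least squares is shown below. Here $\mathbf{Z}$ denotes the full regressor matrix, $\mathbf{Z}_m$ the rows belonging to municipality $m$, $\hat{\mathbf{u}}_m$ the corresponding residuals, and $M$ the number of municipalities. This notation is introduced only for this illustration, and any finite-sample correction applied in the actual estimation is not stated in the text and is therefore omitted.

```latex
\widehat{V}(\hat{\boldsymbol{\theta}})
  = (\mathbf{Z}'\mathbf{Z})^{-1}
    \Bigl(\sum_{m=1}^{M} \mathbf{Z}_m' \hat{\mathbf{u}}_m \hat{\mathbf{u}}_m' \mathbf{Z}_m\Bigr)
    (\mathbf{Z}'\mathbf{Z})^{-1}
```

Because this estimator allows arbitrary error correlation and heteroskedasticity within a municipality, it also accommodates the heteroskedasticity that a linear probability model implies by construction.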
26 Since our regressions include fixed effects for the municipality clusters, time-invariant differences between early and late reformers will not compromise our identification strategy.

27 The time-varying municipality-level covariates include the following variables (each measured in the year the individual turned eleven): municipal-level voter turnout, vote shares for the largest parties, and size of the electorate. We use political indicators since year-by-year indicators of socioeconomic development at the municipal level are only available for more recent time periods. However, previous research has shown that aggregate-level turnout and party vote shares in Sweden are highly correlated with more direct measures of socioeconomic development (Elinder, 2010). To create the year-by-year indicators, we interpolated turnout, vote shares, and electorate size between the election years (1948, 1952, 1956, 1958, 1960, 1964, and 1968).

28 A common concern in GxE studies is that interaction effects may be driven by confounders; for this reason, it is important to control for interactions between the control variables and the two interacted covariates of interest. For example, suppose that the average polygenic score is higher in wealthier cities and that the reform had a smaller effect in those cities. Under such a scenario, an estimate of the interaction between reform status and the score may be confounded unless we control for the interaction between reform status and municipality wealth (or municipality fixed effects).

29 As Ai and Norton (2003) show, coefficients on interaction terms are not easy to interpret in probit and logit models. Because we are primarily interested in the coefficient on the interaction between score and reform, we use a linear probability model instead of a logit or probit model for the binary outcomes.

For income, we estimated a panel model in which we also controlled for a third-degree polynomial of age (in addition to the above control variables), by ordinary least squares and clustering at the municipality level. For the baseline specification, we included all individual-year observations for which an individual was between 25 and 55 years old when his or her income was measured (as mentioned above, income was measured every five years). We also ran specifications including all individual-year observations for which an individual was 23 to 32 years old ("early career"); all individual-year observations for which an individual was 33 to 42 years old ("mid career"); and all individual-year observations for which an individual was 43 to 52 years old ("late career"). (Because the income data were measured every five years, most individuals had two individual-year observations in each of the early, mid, and late career specifications.)
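The age windows for the income specifications can be sketched as a simple filter over individual-year observations. The birth year, the measurement years, the inclusive treatment of the bounds, and the definition of age as measurement year minus birth year are all assumptions made for this illustration.

```python
# Age windows as described in the text; bounds are assumed to be inclusive.
AGE_WINDOWS = {
    "baseline": (25, 55),
    "early career": (23, 32),
    "mid career": (33, 42),
    "late career": (43, 52),
}

def observations_in_window(birth_year, income_years, window):
    """Return the income measurement years at which the individual's age falls in the window.

    Assumption: age is approximated as measurement year minus birth year.
    """
    low, high = AGE_WINDOWS[window]
    return [year for year in income_years if low <= year - birth_year <= high]

# Hypothetical individual born in 1945 with income measured every five years.
income_years = list(range(1970, 2006, 5))  # 1970, 1975, ..., 2005

for name in AGE_WINDOWS:
    print(name, observations_in_window(1945, income_years, name))
```

With a five-year measurement interval and ten-year windows, this hypothetical individual contributes two observations to each of the early, mid, and late career samples, matching the pattern noted in the text.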
In each municipality, we exclude the birth cohort preceding the first cohort affected by the school reform. The reason for doing so is that previous studies have shown that the youngest pre-reform cohort was significantly affected by the reform, possibly because a substantial share of the pupils born late in a given year started school a year later than they were supposed to (Fredriksson & Öckert, 2014; Hjalmarsson et al., 2015).
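This exclusion rule can be expressed as a one-line filter, sketched below. The municipality names and cohort years are hypothetical, and the rule assumes that birth cohorts are indexed by calendar birth year, so that the preceding cohort is the first affected cohort minus one.

```python
def keep_observation(birth_cohort, first_reform_cohort):
    """Drop only the cohort immediately preceding the first reformed cohort."""
    return birth_cohort != first_reform_cohort - 1

# Hypothetical first affected cohorts and (municipality, birth cohort) observations.
first_affected = {"Beta": 1948, "Delta": 1950}
sample = [("Beta", 1946), ("Beta", 1947), ("Beta", 1948), ("Delta", 1947), ("Delta", 1949)]

kept = [(m, c) for m, c in sample if keep_observation(c, first_affected[m])]
print(kept)
```

Note that the same birth year can be dropped in one municipality and kept in another (1947 here), because the excluded cohort is defined relative to each municipality's own reform timing.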