3. HOW DOES INEQUALITY OF SCHOOL ACHIEVEMENT AFFECT
3.3. Data and Methods
3.3.5. Data issues
There are many gaps in the dataset, especially in the data from WDI and UIS-UNESCO, which are collected annually. I dealt with this by using the last available value within the periods 1998-2000, 2001-2003, 2004-2006, and 2007-2009 to match PISA variables for 2000, 2003, 2006 and 2009, respectively. In other words, I have a collapsed dataset covering four three-year periods.
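The collapsing rule described above can be sketched as follows. This is only an illustration under assumed names: the function `collapse_last_available`, the `PERIODS` mapping and the example series are hypothetical, not the code actually used to build the dataset.

```python
# Hypothetical sketch of the "last available value" rule; names and data are illustrative.
PERIODS = {
    2000: (1998, 2000),
    2003: (2001, 2003),
    2006: (2004, 2006),
    2009: (2007, 2009),
}


def collapse_last_available(annual):
    """Map {year: value or None} to {PISA year: last observed value in its period, or None}."""
    collapsed = {}
    for pisa_year, (start, end) in PERIODS.items():
        value = None
        # Walk backwards from the end of the period and keep the first observed value.
        for year in range(end, start - 1, -1):
            if annual.get(year) is not None:
                value = annual[year]
                break
        collapsed[pisa_year] = value
    return collapsed


# Made-up annual series for one country and one indicator.
example = {1998: 10.0, 1999: 11.0, 2002: 12.5, 2003: None, 2007: 14.0, 2009: 15.0}
print(collapse_last_available(example))
```

A period with no observed value at all (2004-2006 in the made-up series) stays missing and is left to the imputation step.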
The final dataset contains the following variables: PISA reading scores, percentage of scores variance found within schools, ESCS, R-squared from ESCS, GDP PPP, percentage of the labour force with HE, participation of HE attainers in total unemployment, GER secondary education, expenditure in secondary education as a percentage of GDP per capita, and expenditure in HE as a percentage of GDP per capita. Every variable is reported (when possible) for the last year of the periods mentioned above. There are data for 63 countries, which corresponds to countries participating in PISA 2009.
Nonetheless, one of the main concerns, especially when working with comparative data, is incomplete information for some variables. I considered imputing predictor values by using two criteria. Firstly, I replaced missing values with fitted values for each country given by:
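[3.1] $x'_{it} = \mathbf{X}_{it}\boldsymbol{\theta} + \lambda\,t + \mathbf{D}_{i}\boldsymbol{\mu}$
Where $x'_{it}$ is the predicted value of an independent variable $x$, given by a linear combination of the valid Xs ($\mathbf{X}_{it}$), the period ($t$) and a vector of country dummies ($\mathbf{D}_{i}$).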
This is equivalent to fitting a fixed-effects model and replacing the missing values with the fitted values obtained from [3.1].
The second criterion consisted of computing a within-country mean for each country. For each country i and time t, missing values were replaced by the following expression:
[3.2] $x_{it} = \bar{x}_{i}$
Where $\bar{x}_{i}$ is the observed mean value of the independent variable for country i throughout the whole period of study.
With either strategy, imputations were made only if at least two observed values were present for the variable. If an independent variable had only one observation for a given country, no imputation was made for that variable/country pair.
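The two imputation criteria and the two-observation rule can be sketched as below. This is a simplified illustration, not the procedure actually run: the function names and the toy panel are hypothetical, and the fixed-effects version uses only the period and the country dummies as predictors (the other valid Xs are left out to keep the example short).

```python
def impute_within_mean(series):
    """Mean criterion: replace None with the country's observed mean (needs >= 2 observed values)."""
    observed = [v for v in series if v is not None]
    if len(observed) < 2:
        return list(series)  # too few observations: leave the gaps untouched
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def impute_fixed_effects(panel, periods):
    """Fitted-value criterion, simplified to x_it = a_i + g * t (country intercepts, common trend)."""
    num = den = 0.0
    means = {}
    for country, series in panel.items():
        obs = [(t, v) for t, v in zip(periods, series) if v is not None]
        if len(obs) < 2:
            continue  # one observation adds nothing to the within-country slope
        t_bar = sum(t for t, _ in obs) / len(obs)
        x_bar = sum(v for _, v in obs) / len(obs)
        means[country] = (t_bar, x_bar)
        num += sum((t - t_bar) * (v - x_bar) for t, v in obs)
        den += sum((t - t_bar) ** 2 for t, _ in obs)
    slope = num / den if den else 0.0  # OLS slope on period once country dummies are included
    imputed = {}
    for country, series in panel.items():
        if country not in means:
            imputed[country] = list(series)
            continue
        t_bar, x_bar = means[country]
        intercept = x_bar - slope * t_bar  # country-specific intercept
        imputed[country] = [
            intercept + slope * t if v is None else v for t, v in zip(periods, series)
        ]
    return imputed


# Toy panel: four periods, three hypothetical countries.
periods = [1, 2, 3, 4]
panel = {
    "A": [1.0, 2.0, None, 4.0],
    "B": [10.0, None, None, None],  # only one observed value: never imputed
    "C": [5.0, 6.0, 7.0, None],
}
print(impute_fixed_effects(panel, periods))
print({c: impute_within_mean(s) for c, s in panel.items()})
```

In the toy panel the fitted-value version follows the common trend when filling a gap, whereas the mean version pulls the imputed value towards the country average, which is why the latter reduces variance more.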
There are, of course, more imputation alternatives (e.g., more complex regression-based techniques such as chained equations). However, it is difficult to find variables with no or few missing values in the dataset that may be used as predictors: with the exception of identification variables, they face the same missingness issues as the variables to impute, so there is little chance of building a good imputation model. Even though the data allow multiple imputation, estimates would not differ substantially from those imputed using [3.1] due to the lack of auxiliary variables. Moreover, this is not an appropriate setting for multiple imputation techniques, as the missing at random (MAR) assumption may not hold: missingness is not related to non-response patterns but to countries’ strengths and weaknesses affecting data collection carried out by international agencies, as well as countries’ policies regarding the generation, production, and dissemination of indicators³. For instance, the first round of PISA included OECD countries and a few middle-income and developing countries, whereas students from 63 countries were measured in 2009. This means that participating in PISA is not a random mechanism but conditional on countries’ income and development. In addition, some countries that participated in PISA 2000 deliberately did not participate in the following round, either to allow policies to crystallise or for political reasons. Multiple imputation estimates using chained equations are nevertheless shown when assessing robustness, and in more detail in Appendix A. However, I caution that, for the reasons given, these imputation models might not be particularly robust.
All the models reported in the chapter include missing dummies. In section 3.4, I report estimates for models using the [3.1] and [3.2] techniques and lagged predictors, as well as models considering only complete-case data. I am aware that imputation using averages may reduce the variance critically, but options are limited.
3.3.6. Model specification
The model measures the effect of the socioeconomic distribution of school achievement on access to HE. I start by setting access to HE as the dependent variable, expressed as a function of a series of variables as shown below.
[3.3] $Y = f(SCH, WEC, INV)$
In [3.3], $Y$ is the outcome variable, namely HE GER; $SCH$ is a series of variables informing school education characteristics such as equity of school achievement, performance, and social inclusiveness/segmentation; $WEC$ are variables giving account of the wider socio-economic context such as GDP, school attainment of the adult population, and educational profile of the labour force; $INV$ are variables accounting for investment in education at secondary and higher levels as a fraction of the GDP.
³ For instance, the outcome variable may have many gaps as in some countries enrolment rates are not necessarily calculated by using administrative records but household surveys applied with very different periodicity.
Firstly, I use an OLS estimation. Let Char be a series of m country-level variables other than inequality of school achievement as in [3.4]. Formally, the OLS model is:
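[3.4] $Y_{i} = \alpha + \delta\,Ineq_{i} + \sum_{j=1}^{m} \gamma_{j}\,Char_{ji} + \varepsilon_{i}$
For the i-th country: $Y_{i}$ is the HE GER, $\alpha$ is the intercept, $Ineq_{i}$ is inequality of school achievement, $Char_{ji}$ are m variables to be used as controls, and $\varepsilon_{i}$ is the residual. The main coefficient of interest is $\delta$, though some of the $\gamma_{j}$ might be of interest too, especially those regarding PISA-related variables.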
Given the constraints and the nature of the data, I do not intend to establish causal relationships, but associations between variables. Moreover, there are issues like reverse causation and endogeneity that might bias the results. The main question here is whether the coefficient on inequality of school achievement in [3.4] is actually measuring the returns to inequality of school achievement or the returns to other unobserved confounders.
Next, I use all available data in order to deal with observed confounders, but I take a different approach to deal, in part, with omitted variable bias. One way of specifying the model is to add as many dummy variables as countries to [3.4] above. As the time dimension is included, the subscript t is added to the dependent variable, predictors, and residuals, meaning that estimates are now for the i-th country at time t.
[3.5] $Y_{it} = \alpha + \delta\,Ineq_{it} + \sum_{j=1}^{m} \gamma_{j}\,Char_{jit} + \mathbf{D}_{i}\boldsymbol{\eta} + \varepsilon_{it}$
In [3.5], $\mathbf{D}_{i}$ is the vector of country dummies and the remaining terms are those of [3.4] with the subscript t added. All the effects of unobserved time-invariant characteristics are absorbed by the dummies. Another specification is the ‘within estimator’, which measures the deviations from the mean within each entity (in this case, countries), so that deviations of time-invariant unobserved characteristics equal 0. I specify the model including dummies instead of the demeaned formulation because it makes explicit that controlling for the dummies captures country fixed effects.
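A small sketch may help to see why the demeaned (‘within’) formulation removes time-invariant country characteristics. The data below are made up and noise-free, with a single predictor whose true slope is 2 and country effects that are correlated with it. The function names are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of a bivariate OLS regression of ys on xs (with intercept)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def within_slope(panel):
    """Within estimator: demean x and y inside each country, then pool the deviations."""
    dx, dy = [], []
    for xs, ys in panel.values():
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        dx += [x - x_bar for x in xs]
        dy += [y - y_bar for y in ys]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Made-up panel: y_it = effect_i + 2 * x_it, with larger effects where x is larger.
effects = {"A": 0.0, "B": 10.0, "C": 20.0}
xs_by_country = {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]}
panel = {
    c: (xs, [effects[c] + 2.0 * x for x in xs]) for c, xs in xs_by_country.items()
}

pooled_x = [x for xs, _ in panel.values() for x in xs]
pooled_y = [y for _, ys in panel.values() for y in ys]
print("pooled OLS slope:", ols_slope(pooled_x, pooled_y))
print("within slope:", within_slope(panel))
```

The pooled slope absorbs the country effects and overstates the relationship, whereas the within slope recovers the value used to generate the data. With a single predictor, the within slope is the same estimate that the country-dummy specification would give.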
The fixed-effects model still has its own caveats. Firstly, omitted variable bias remains, as the fixed-effects model can only control for unobservable confounders that are constant across time. Secondly, since the between-subject variance is taken out, variability is reduced, meaning a loss of ‘signal’ at the same time that more ‘noise’ is likely. With a short panel such as the one I use, low variability might be a critical issue because in the fixed-effects model all the variation comes from within subjects. Moreover, given this low-variability scenario, problems associated with measurement error may distort the analysis and the estimates. This is especially relevant for some variables in the dataset, for example those reporting SES based on ESCS. However, the measurement error in this dataset is unlikely to be systematic. Finally, there are many imputed values, so it is also necessary to check robustness by comparing estimates obtained with different imputation techniques as well as with complete case analysis (see relevant tables in Appendix A for more detail on missingness).
I shall start specifying the estimation models by establishing a base model containing inequality of school achievement, PISA reading scores and ESCS. Additional controls will be added to the base specification in further steps: first the within/between school variance composition, then indicators measuring the wider socioeconomic context, and finally variables on educational expenditure. The reason behind this sequence is that I mostly deal with the ‘interaction’ between ESCS and reading scores, so any variables directly involved in the measure of socioeconomic inequality of school achievement are shown from the outset. I expect the coefficient on the inequality measure to attenuate as control variables are introduced.
3.3.7. Power and sample size
The small sample size in this study (228 observations for 61 countries) is also problematic as the power of the estimates is affected. In this context, ‘power’ means the probability of detecting a true non-null effect, and it depends on sample size and effect size. In other words, a small sample size, as it reduces power, undermines the chances of a statistically significant result being a true effect. Low power may also lead to overestimating the magnitude of effects unless effect sizes are truly large. Button et al. (2013) illustrate the consequences of low power on the validity of the estimates; namely, low power means low replicability. They highlight the importance of a sound discussion of sample size and statistical power as an essential scientific practice.
This is a threat to the validity of the estimates, so I shall also discuss power estimates in the results section. Power is formally defined as the probability of not making a Type II error. The probability of a Type II error is often referred to as $\beta$, so power is commonly expressed as $1 - \beta$. I use a formula that corresponds to the inverse of that used by Dupont and Plummer (1998:599) to assess the power of the estimates:
[3.6] $t_{\beta,\nu} = \frac{|b|}{SE_{b}} - t_{\alpha/2,\nu}$
Where $t_{\beta,\nu}$ corresponds to the value of t (two sided) in the t distribution; the first term is the ratio between $|b|$, which is the absolute value of the regression coefficient, and $SE_{b}$, which stands for the standard error of the coefficient. The standard error contains the sample size as it is the ratio between the standard deviation $\sigma$ and $\sqrt{n}$, where $n$ stands for the sample size. The expression $t_{\alpha/2,\nu}$ denotes the value of t at a given value of $\alpha/2$, with $\nu$ degrees of freedom. This way, it is straightforward that
[3.7] $SE_{b} = \frac{\sigma}{\sqrt{n}}$
so, replacing the terms, I obtain:
[3.8] $t_{\beta,\nu} = \frac{|b|\sqrt{n}}{\sigma} - t_{\alpha/2,\nu}$
It is clear from [3.7] that increasing the sample size leads to smaller standard errors, so $t_{\beta,\nu}$ is bigger.
With over 100 observations, the normal and t distributions are similar, so one could argue that $t_{\beta,\nu} \cong z_{\beta}$. At $\alpha = .05$, the z value (two sided) is 1.96; substituting in the equation:
[3.9] $z_{\beta} = \frac{|b|}{SE_{b}} - 1.96$
If the confidence level increases, statistical power drops.
To obtain the probability of finding a true effect, I use the normal cumulative distribution function $\Phi$, thus:
[3.10] $1 - \beta = \Phi(z_{\beta})$
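As a worked illustration of the normal approximation above, the following sketch computes power from a coefficient and its standard error. The function name `approx_power` and the numbers passed to it are hypothetical; the critical value is obtained from the standard normal distribution rather than hard-coded as 1.96, which is an implementation choice made here and not a change to the method.

```python
from statistics import NormalDist


def approx_power(coef, std_err, alpha=0.05):
    """Normal-approximation power: Phi(|b| / SE - z_{alpha/2})."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_beta = abs(coef) / std_err - z_crit
    return std_normal.cdf(z_beta)


# Hypothetical coefficient that is exactly twice its standard error.
print(round(approx_power(0.5, 0.25), 3))
# Same coefficient under a stricter significance level: power drops.
print(round(approx_power(0.5, 0.25, alpha=0.01), 3))
```

A coefficient that is only just significant at the 5% level therefore has power of roughly one half, and a ratio of about 2.8 between the coefficient and its standard error is needed to reach the conventional 0.80.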
A different perspective is that proposed by Cohen (1988), which is the one implemented in the G*Power software, to study the relationship between power and effect size. In a context of multiple regression, Cohen suggests using $f^{2}$, which is the ratio between the variance explained by the model and the residual variance, as a measure of effect size:
[3.11] $f^{2} = \frac{R^{2}}{1 - R^{2}}$
The effect size of a given coefficient can be measured as the change in $R^{2}$ when comparing the model including the variable with another not including it. In bivariate regression, $R^{2}$ is simply the square of the standardised regression coefficient r, which is Pearson’s correlation coefficient, whereas in multiple regression the contribution of a specific variable to the model is measured through partial correlations, so the variable’s contribution is a partial $R^{2}$, which is defined as:
[3.12] $R^{2}_{YX \cdot X'} = \frac{t^{2}}{t^{2} + (n - k - 1)}$,
where Y stands for the dependent variable, X is the relevant predictor, X’ is a set of predictors other than X, $t$ stands for the t-statistic corresponding to the coefficient, and $n - k - 1$ is the residual degrees of freedom for a sample of size $n$ and a model with $k$ predictors. $R^{2}_{YX \cdot X'}$ already controls for the effect of any X’ variables. Hence, by substituting in [3.11], the calculation of $f^{2}$ becomes:
[3.13] $f^{2} = \frac{R^{2}_{YX \cdot X'}}{1 - R^{2}_{YX \cdot X'}} = \frac{\frac{t^{2}}{t^{2} + (n - k - 1)}}{1 - \frac{t^{2}}{t^{2} + (n - k - 1)}}$,
Multiplying both the numerator and the denominator by $t^{2} + (n - k - 1)$, the expression becomes:
[3.14] $f^{2} = \frac{t^{2}}{n - k - 1}$
which is the measure of the effect size I use. The parameter $f^{2}$ follows the F distribution so, for a particular F-value, given the sample size and the number of parameters the model estimates, the statistical power will be the area under the F curve up to the F-value corresponding to the effect size and the relevant degrees of freedom (for a further explanation, see Nakagawa and Cuthill 2007:598).
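The algebra linking the t-statistic to the effect size can be checked numerically. The sketch below assumes that the degrees of freedom in the partial $R^{2}$ are the residual degrees of freedom of the regression; the function names and the example values (t = 2.5 with 200 residual degrees of freedom) are hypothetical.

```python
def partial_r2_from_t(t_stat, df_resid):
    """Partial R-squared of one predictor from its t-statistic and residual df."""
    return t_stat ** 2 / (t_stat ** 2 + df_resid)


def cohen_f2(r2):
    """Cohen's f-squared as explained over unexplained variance."""
    return r2 / (1 - r2)


def cohen_f2_from_t(t_stat, df_resid):
    """Shortcut obtained after simplifying: f-squared equals t squared over residual df."""
    return t_stat ** 2 / df_resid


# Hypothetical coefficient with t = 2.5 and 200 residual degrees of freedom.
t_stat, df_resid = 2.5, 200
long_way = cohen_f2(partial_r2_from_t(t_stat, df_resid))
short_way = cohen_f2_from_t(t_stat, df_resid)
print(long_way, short_way)
```

Both routes give the same value up to floating-point error, which is the point of the simplification: the effect size of a coefficient can be read directly from the regression output without refitting the model without the variable.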
Another common problem with small samples is that the complexity of the model may exceed what the data are able to deliver. For instance, for a small sample with too many parameters, it is likely that the model is overfit. In other words, an overfit model is likely to be tailored to a particular sample, so results are hardly replicated in out-of-sample estimations. This is also described as the model parameters being ‘idiosyncratic’ to a specific sample. There exist several approaches to test model overfit, but I prefer parsimony as a general methodological principle.
In consequence, I shall report Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC, respectively) in the estimates tables as an indirect way of assessing overfit. The advantage of using either information criterion, although neither provides a formal test, is that they penalise the inclusion of additional parameters while at the same time assessing goodness of fit. Beyond their limitations, the use of information criteria is intuitive and researchers use them as conventions to select models (Kass and Raftery 1995).
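The way both criteria trade goodness of fit against the number of parameters can be sketched as follows. This assumes a linear model with Gaussian errors and drops the additive constants of the log-likelihood, so the values are comparable across models fitted to the same sample but will not match the figures printed by statistical software. The residual sums of squares below are made up.

```python
import math


def aic_gaussian(rss, n_obs, n_params):
    """AIC for a Gaussian linear model, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def bic_gaussian(rss, n_obs, n_params):
    """BIC for a Gaussian linear model, up to the same additive constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Hypothetical comparison on 228 observations: four extra parameters buy a tiny gain in fit.
small = {"rss": 1000.0, "n_params": 4}
large = {"rss": 995.0, "n_params": 8}
for name, model in (("small", small), ("large", large)):
    print(
        name,
        round(aic_gaussian(model["rss"], 228, model["n_params"]), 2),
        round(bic_gaussian(model["rss"], 228, model["n_params"]), 2),
    )
```

Lower values are preferred. In this made-up case both criteria favour the smaller model, and BIC penalises the extra parameters more heavily than AIC because log(228) is greater than 2.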
3.4. Data Analysis and Results