
5.2 The two-sided matching model

5.2.1 Specification of the pupil utility model

The pupil utility model must take into account the factors potentially influencing school choice that were discussed earlier – peer preferences and moderated preferences for academic performance. It must also take account of proximity, as parents have obvious incentives to prefer closer schools over schools further away, all things being equal. The NPD data contains the postcodes of both schools and pupils as well as the Ordnance Survey northings and eastings of the postcode centroids. The OS co-ordinates are used to calculate the linear distance between each pupil’s postcode and each school in kilometres. Since postcode is the lowest-level geographical identifier, the measurement error in using postcode centroid to approximate location is considered negligible and is not accounted for in the model.
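The distance calculation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes eastings and northings are held in metres (as on the OS National Grid), and the function name and the coordinates are hypothetical.

```python
import numpy as np

def distance_km(pupil_en, school_en):
    """Straight-line distance in km between every pupil and every school.

    pupil_en:  (n_pupils, 2) eastings and northings of pupil postcode centroids, in metres
    school_en: (n_schools, 2) eastings and northings of school postcode centroids, in metres
    """
    pupil_en = np.asarray(pupil_en, dtype=float)
    school_en = np.asarray(school_en, dtype=float)
    # Broadcast to (n_pupils, n_schools, 2) coordinate differences
    diff = pupil_en[:, None, :] - school_en[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)) / 1000.0

# Hypothetical coordinates, for illustration only
pupils = [[354000, 429000], [368000, 428000]]
schools = [[354000, 432000], [357000, 433000]]
dist = distance_km(pupils, schools)  # rows are pupils, columns are schools
```

The result is a pupil-by-school matrix, which is the natural shape for a model in which every pupil is compared with every school.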

Previous discrete choice studies of school choice have assumed that utility is linear in distance from school. Abdulkadiroğlu et al. (2015) stated that this assumption allowed distance to be used as a “numeraire” to set a scale for utility comparisons in the absence of money. We wish to accommodate the possibility that utility for proximity is non-linear. However, a polynomial specification for distance, such as a quadratic or cubic, is too constraining. A quadratic specification, for example, may be influenced too much by the extremes of distance where choice probabilities are very low. To allow for non-linear preferences for proximity we have specified a model in which utility for proximity is piecewise linear, with the “knots” chosen so that they fall roughly at the quartiles of allocated distances across all samples; the knots (including boundary knots) are at {0, 2.5, 4, 6, 22} kilometres. This allows the gradient of the proximity utility function to change at each quartile. The piecewise linear terms were parametrised as a B-spline to minimise correlation between terms for computational reasons. The R functions splineDesign and bs were used to create the B-spline design matrices (R Core Team, 2016). In addition, a dummy for the closest school to the pupil is included to allow for the possibility that families give additional weight to their nearest school, which is not accounted for by proximity alone.
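To make the piecewise linear specification concrete, the sketch below builds a degree-one B-spline basis ("hat" functions) on the knots quoted above, together with a closest-school dummy. It is written in Python with NumPy rather than with the R functions named in the text, so it illustrates the idea and does not reproduce the study's design matrices; the function names are hypothetical.

```python
import numpy as np

KNOTS = np.array([0.0, 2.5, 4.0, 6.0, 22.0])  # km, as quoted in the text

def linear_bspline_basis(distance, knots=KNOTS):
    """Degree-one B-spline basis: column i is 1 at knot i and 0 at every other knot."""
    distance = np.atleast_1d(np.asarray(distance, dtype=float))
    identity = np.eye(len(knots))
    # np.interp interpolates linearly between knots and holds the end values
    # constant outside the boundary knots
    return np.column_stack(
        [np.interp(distance, knots, identity[i]) for i in range(len(knots))]
    )

def piecewise_linear_utility(distance, coefs):
    """Utility from proximity: a linear combination of the basis columns."""
    return linear_bspline_basis(distance) @ np.asarray(coefs, dtype=float)

def closest_school_dummy(dist):
    """1 for each pupil's nearest school, 0 otherwise (dist is pupils x schools)."""
    dist = np.asarray(dist, dtype=float)
    dummy = np.zeros_like(dist, dtype=int)
    dummy[np.arange(dist.shape[0]), dist.argmin(axis=1)] = 1
    return dummy

basis = linear_bspline_basis([0.0, 2.5, 3.25, 22.0])
```

With this basis each coefficient is the utility at one knot, and utility is interpolated linearly in between, so the gradient can change at every knot. The five columns sum to one for every distance, so in practice one column would have to be dropped or constrained for identification; the text does not say how this was handled.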

To operationalise peer effects, pupil ethnicities and data on the ethnic mix at schools are used. In the urban samples of pupils the predominant ethnic groups are white British, Indian, Pakistani and Bangladeshi, with much smaller proportions of other ethnic groups. Because the estimation of peer preferences requires reasonably large group sizes, ethnicity is re-categorised into three groups: white British; Asian (including Indian, Pakistani and Bangladeshi but excluding ‘Asian other’); and Other (including ‘white other’ and ‘Asian other’). In the context of the three urban school markets, this categorisation is appropriate as South Asian pupils tend to form the main minority ethnic group in all three markets.

School ethnic mix is represented by two-year lagged mean proportions of White and Asian categories at each school (whole-school population). Proportions are calculated as the mean of proportions at y − 2 and y − 1, where y is the year of entry. Lagged mean percentages have been divided by 10 so that they take values between zero and 10 and parameter estimates represent the change in utility per 10-point change in ethnic composition. The lagged means are designed to represent parents’ qualitative recognition of the relative ethnic compositions of schools, gleaned from direct observation and social networks, as opposed to a precise knowledge of ethnic proportions, which are not published. Lagged mean proportions are interacted with pupil ethnicity dummies, such that only White pupils have a term for the White composition of schools, and likewise for Asian pupils. In this way homophily (preference for one’s in-group) is modelled explicitly, but xenophobia (dis-preference for another group) is not modelled. Linear and quadratic terms on these interactions are included in the pupil utility function to allow for non-linear peer preferences. Peer preferences for the Other category are not modelled as the group size is too small.
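A minimal sketch of how these peer-preference terms could be constructed is given below. The function names and the example percentages are hypothetical, and the code illustrates the description above rather than the study's own implementation.

```python
import numpy as np

def lagged_mean_tenths(pct_y_minus_2, pct_y_minus_1):
    """Mean of the two lagged percentages, rescaled so one unit is 10 percentage points."""
    a = np.asarray(pct_y_minus_2, dtype=float)
    b = np.asarray(pct_y_minus_1, dtype=float)
    return (a + b) / 2.0 / 10.0

def homophily_terms(pupil_in_group, school_mix_tenths):
    """Linear and quadratic own-group terms for every pupil-school pair.

    pupil_in_group:    (n_pupils,) 0/1 dummy for membership of the group
    school_mix_tenths: (n_schools,) lagged mean share of that group, in tenths
    """
    pupil = np.asarray(pupil_in_group, dtype=float)[:, None]   # (n_pupils, 1)
    mix = np.asarray(school_mix_tenths, dtype=float)[None, :]  # (1, n_schools)
    linear = pupil * mix  # zero for pupils outside the group
    return linear, linear ** 2

# Hypothetical data: % Asian at two schools in years y - 2 and y - 1
asian_mix = lagged_mean_tenths([10.0, 60.0], [20.0, 70.0])
asian_linear, asian_quadratic = homophily_terms([1, 0], asian_mix)
```

Because the pupil dummy is zero for pupils outside the group, those pupils contribute nothing to either term, which is how the specification models homophily without modelling preferences about other groups.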

Academic performance is operationalised as the percentage of children in each year achieving five or more A*–C at GCSE (5AC). During the sampled period this measure was the main raw performance measure used in performance tables published by the government and in league tables. This raw performance measure was used rather than a value-added measure such as Contextualised Value-Added (CVA) for two reasons. First, there is evidence from empirical studies that parents focus on raw performance when choosing schools, rather than value-added measures (Wilson et al., 2006). Second, value-added measures used in performance tables changed during the study period, whereas the 5AC measure remained unaltered. Using a measure that has not been altered during the sampled period facilitated the combination of estimates across years. As with ethnic preferences, the two-year lagged mean of 5AC was used to represent parents’ weighing several information sources (including out-of-date word of mouth as well as up-to-date statistics). The measure was standardised in each sample, so that parameter estimates could be interpreted per standard deviation change in 5AC.

Measuring socio-economic status (SES) using data from the NPD is problematic. The two variables that are widely used as proxies for SES are Free School Meals eligibility (FSM) and the Income Deprivation Affecting Children Index (IDACI). Each has been criticised for different reasons. FSM indicators denote that a child is, or has been at some point in the past six years, eligible for a free school meal because the family receives certain state benefits. However, this is not an indicator of poverty per se and it has been criticised on the grounds that it contains an element of arbitrariness and fails to measure the working poor (Hobbs and Vignoles, 2007). IDACI, on the other hand, is a geographically-imputed variable – that is, a composite of the prevalence of certain indicators of deprivation in census tracts, that has been assigned to each pupil based on the pupil’s postcode. This means that, rather than measuring the pupil’s income deprivation, it assumes that the pupil’s circumstances are explained by the pupil’s home location. As a geographically-imputed variable, IDACI is only valid as a proxy for pupil SES to the extent that residential sorting on SES exists.

Each of these proxies for SES is problematic. Crawford and Greaves (2013) compared several proxies for SES and concluded that at a school level, the proportion receiving FSM was a better predictor of “educational disadvantage” than IDACI. However, we require a proxy for SES that has good properties at the level of an individual pupil, and for this purpose there are several reasons to prefer IDACI. First, FSM codifies a single administrative rule based on somewhat arbitrary criteria, whereas IDACI at least averages over several rule-based classifications, and many census respondents, leading to an estimate of deprivation that should be expected to be more stable and less sensitive to arbitrary rules. Second, IDACI allows more finely-grained classification into five (or any number of) levels, whereas FSM only allows binary classification. A single ‘outlier’ may be more influential when using FSM than IDACI quintiles, since the influence of a single case on the estimate for one quintile will not affect neighbouring quintiles.

[…] income deprivation. These were transformed into a national rank, and thence into national quintiles. The reason for this transformation was that transforming IDACI into an ordered factor variable avoids having to assume linearity in the effect of IDACI on utilities. Finally, to allow for the moderation of preferences for academic performance by SES, the standardised 5AC measure was interacted with IDACI quintiles. In one model IDACI quintiles were replaced by an indicator of FSM eligibility.
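The transformation from national rank to quintile, and the interaction with standardised performance, might look like the following sketch. The number of ranked areas, the direction of the rank, the use of the population standard deviation and all function names are assumptions made for illustration; none is taken from the source.

```python
import numpy as np

def standardise(x):
    """Centre and scale to unit (population) standard deviation within a sample."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def national_quintile(rank, n_ranked):
    """Map a national rank in 1..n_ranked to a quintile in 1..5."""
    rank = np.asarray(rank)
    return np.ceil(5 * rank / n_ranked).astype(int)

def performance_by_quintile(z_5ac, quintile):
    """Interaction terms 5AC_s x IDACI.Q_aj, shape (n_pupils, n_schools, 5)."""
    z = np.asarray(z_5ac, dtype=float)
    q = np.asarray(quintile)
    # One dummy column per quintile, shape (n_pupils, 5)
    dummies = (q[:, None] == np.arange(1, 6)[None, :]).astype(float)
    return dummies[:, None, :] * z[None, :, None]

# Hypothetical inputs: lagged 5AC at three schools, ranks for four pupils
z = standardise([40.0, 50.0, 60.0])
quintiles = national_quintile([1, 200, 201, 1000], n_ranked=1000)
x_perf = performance_by_quintile(z, quintiles)
```

Each pupil has a non-zero entry only in the slice for their own quintile, so each quintile receives its own coefficient on school performance and no linear effect of deprivation is imposed.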

The final part of the pupil utility model was a term interacting the Christian religious denomination of the school with the religious denomination of the pupil’s primary school. Separate terms are estimated for Church of England (CofE) and Roman Catholic (RC) schools. The CofE term takes a value of one if the secondary school has a CofE denomination and the pupil’s primary school also has a CofE denomination, zero otherwise. The RC term is defined similarly. The substantive interpretation of this measure is the extent to which families who send their children to faith primary schools prefer faith secondary schools, or if primary school denomination is viewed as a proxy for faith, the extent to which people of faith prefer schools of that faith. The practical reason for the inclusion of the terms is that they appear in the school utility model; we therefore avoid adding an additional exclusion restriction to the set of assumptions. By including this term in both the pupil and the school models we allow for the possibility that sorting based on religious denomination is by choice as well as constraint.
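The denomination terms are simple matching indicators. A minimal sketch is shown below; the string codes "CofE", "RC" and "none" and the function name are hypothetical.

```python
import numpy as np

def denomination_match(primary_denom, school_denom, denom):
    """1 where the pupil's primary school and the secondary school both have `denom`."""
    primary = (np.asarray(primary_denom) == denom)[:, None]  # (n_pupils, 1)
    school = (np.asarray(school_denom) == denom)[None, :]    # (1, n_schools)
    return (primary & school).astype(int)

# Hypothetical data: three pupils' primary schools and three secondary schools
primary = ["CofE", "RC", "none"]
secondary = ["CofE", "none", "RC"]
cofe_term = denomination_match(primary, secondary, "CofE")
rc_term = denomination_match(primary, secondary, "RC")
```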

The existence of all-boys schools and all-girls schools means that for both boys and girls there are some schools that are not available, and should not be considered to be part of the choice set. To account for this, cases corresponding to a girl’s probability of forming a blocking pair with a boys’ school, or a boy with a girls’ school, are simply excluded from the log-likelihood function.
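One way to picture this exclusion is as a feasibility mask applied to pair-level log-likelihood contributions. The sketch assumes that the log-likelihood can be written as a sum over pupil-school pairs, which is a simplification for illustration; the codes for pupil sex and school intake and the function names are hypothetical.

```python
import numpy as np

def feasible_pairs(pupil_sex, school_intake):
    """True where the school admits pupils of that sex.

    pupil_sex:     (n_pupils,) values "F" or "M"
    school_intake: (n_schools,) values "girls", "boys" or "mixed"
    """
    sex = np.asarray(pupil_sex)[:, None]
    intake = np.asarray(school_intake)[None, :]
    return (
        (intake == "mixed")
        | ((intake == "girls") & (sex == "F"))
        | ((intake == "boys") & (sex == "M"))
    )

def masked_log_likelihood(pair_loglik, feasible):
    """Sum pair-level contributions, skipping infeasible pupil-school pairs."""
    # np.where drops infeasible entries entirely, even if they are NaN or -inf
    return float(np.where(feasible, pair_loglik, 0.0).sum())

mask = feasible_pairs(["F", "M"], ["girls", "boys", "mixed"])
pair_ll = np.array([[-0.1, np.nan, -0.3],
                    [np.nan, -0.2, -0.4]])
total = masked_log_likelihood(pair_ll, mask)
```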

The following pupil utility functions have been estimated. The first, model A, specifies a linear function for distance and ethnic mix, and uses IDACI:

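$$
\begin{aligned}
U_{as} ={}& \beta_1\,\text{distance}_{as} + \beta_2\,\text{closest}_{as} + \beta_3(\text{Asian}_a \times \%\text{Asian}_s) + \beta_4(\text{WhiteBr}_a \times \%\text{WhiteBr}_s) \\
&+ \sum_{j=1}^{5} \beta_{5j}(\text{5AC}_s \times \text{IDACI.Q}_{aj}) + \beta_6(\text{CofE}_s \times \text{CofEPrimary}_a) + \beta_7(\text{RC}_s \times \text{RCPrimary}_a) + \epsilon_{as},
\end{aligned}
$$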

[…] belongs to. Model B allows quadratic peer preferences:

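$$
\begin{aligned}
U_{as} ={}& \beta_1\,\text{distance}_{as} + \beta_2\,\text{closest}_{as} + \beta_3(\text{Asian}_a \times \%\text{Asian}_s) + \beta_8(\text{Asian}_a \times \%\text{Asian}_s)^2 \\
&+ \beta_4(\text{WhiteBr}_a \times \%\text{WhiteBr}_s) + \beta_9(\text{WhiteBr}_a \times \%\text{WhiteBr}_s)^2 \\
&+ \sum_{j=1}^{5} \beta_{5j}(\text{5AC}_s \times \text{IDACI.Q}_{aj}) + \beta_6(\text{CofE}_s \times \text{CofEPrimary}_a) + \beta_7(\text{RC}_s \times \text{RCPrimary}_a) + \epsilon_{as}.
\end{aligned}
$$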

Model C adds a piecewise linear specification of distance:

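$$
\begin{aligned}
U_{as} ={}& \sum_{i=1}^{I} \beta_{1i} B_i(\text{distance}_{as}) + \beta_2\,\text{closest}_{as} + \beta_3(\text{Asian}_a \times \%\text{Asian}_s) + \beta_8(\text{Asian}_a \times \%\text{Asian}_s)^2 \\
&+ \beta_4(\text{WhiteBr}_a \times \%\text{WhiteBr}_s) + \beta_9(\text{WhiteBr}_a \times \%\text{WhiteBr}_s)^2 \\
&+ \sum_{j=1}^{5} \beta_{5j}(\text{5AC}_s \times \text{IDACI.Q}_{aj}) + \beta_6(\text{CofE}_s \times \text{CofEPrimary}_a) + \beta_7(\text{RC}_s \times \text{RCPrimary}_a) + \epsilon_{as},
\end{aligned}
$$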

where $B_i(\cdot)$ is a linear B-spline basis function. The final specification, model D, features linear distance and quadratic peer effects, and uses FSM in place of IDACI:

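$$
\begin{aligned}
U_{as} ={}& \beta_1\,\text{distance}_{as} + \beta_2\,\text{closest}_{as} + \beta_3(\text{Asian}_a \times \%\text{Asian}_s) + \beta_8(\text{Asian}_a \times \%\text{Asian}_s)^2 \\
&+ \beta_4(\text{WhiteBr}_a \times \%\text{WhiteBr}_s) + \beta_9(\text{WhiteBr}_a \times \%\text{WhiteBr}_s)^2 \\
&+ \sum_{j=1}^{2} \beta_{5j}(\text{5AC}_s \times \text{FSM}_{aj}) + \beta_6(\text{CofE}_s \times \text{CofEPrimary}_a) + \beta_7(\text{RC}_s \times \text{RCPrimary}_a) + \epsilon_{as}.
\end{aligned}
$$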
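All four specifications are linear in their parameters, so the systematic (non-random) part of utility is a weighted sum of covariate arrays. As an illustration, the sketch below evaluates the systematic part of model B for every pupil-school pair. The dictionary keys and the coefficient values are hypothetical and are not estimates from the study.

```python
import numpy as np

def systematic_utility_b(x, beta):
    """Model B utility without the error term, for every pupil-school pair.

    x:    dict of covariate arrays, each (n_pupils, n_schools), except
          "perf_by_quintile", which is (n_pupils, n_schools, 5)
    beta: dict of coefficients; beta["b5"] holds one value per IDACI quintile
    """
    asian, white = x["asian_mix"], x["white_mix"]
    return (
        beta["b1"] * x["distance"]
        + beta["b2"] * x["closest"]
        + beta["b3"] * asian + beta["b8"] * asian ** 2
        + beta["b4"] * white + beta["b9"] * white ** 2
        + x["perf_by_quintile"] @ np.asarray(beta["b5"], dtype=float)
        + beta["b6"] * x["cofe_match"]
        + beta["b7"] * x["rc_match"]
    )

# One Asian pupil in IDACI quintile 3, two schools; all values hypothetical
perf = np.zeros((1, 2, 5))
perf[0, :, 2] = [1.0, -1.0]  # standardised 5AC enters the quintile-3 slice only
x = {
    "distance": np.array([[1.0, 2.0]]),
    "closest": np.array([[1.0, 0.0]]),
    "asian_mix": np.array([[2.0, 5.0]]),
    "white_mix": np.array([[0.0, 0.0]]),
    "perf_by_quintile": perf,
    "cofe_match": np.array([[0.0, 1.0]]),
    "rc_match": np.array([[0.0, 0.0]]),
}
beta = {"b1": -1.0, "b2": 0.5, "b3": 0.2, "b8": -0.01, "b4": 0.1, "b9": 0.0,
        "b5": [0.1, 0.2, 0.3, 0.4, 0.5], "b6": 1.0, "b7": 2.0}
v = systematic_utility_b(x, beta)
```

Models A, C and D change only which terms appear: A drops the quadratic terms, C replaces the single distance term with the B-spline columns, and D replaces the quintile interactions with FSM interactions.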

In addition to the specifications above, we also undertake a sensitivity analysis using model B (quadratic ethnic preferences) whereby all Islamic schools are removed and the model is re-estimated. Islamic schools, of which there is one in Preston and two in Blackburn, tend to have large proportions of Asian students. They also have faith-based admissions criteria for at least part of their intake. The sample does not contain any information about feeder schools, regular worship, or other criteria that constitute the admissions criteria, so it is possible that religious preferences, and religious constraints, may be confounded with ethnic preferences. For this reason we exclude Islamic schools to test the influence of these schools on results.

One aspect that is missing from the pupil utility model, that is often included in discrete choice models, is a school fixed effect (a dummy variable for each school) whose purpose would be to mop up any unobserved systematic variation in school popularity, not explained by school-level covariates. Although theoretically identifiable in many-to-one two-sided structural models, we found that in practice adding fixed effects to the model led to correlation and local optima in the likelihood function, hindering stable estimation. This appears to be due to the fact that the two-sided method already includes a latent variable for each school that mops up some unobserved variation in the popularity of schools, since less popular schools tend to have lower threshold utilities $V$. Fixed effects interact with these latent variables to produce local optima in the likelihood function, which makes estimation problematic. Excluding fixed effects should not affect the unbiased estimation of other pupil utility parameters, but is relevant to the accuracy of aggregate demand estimation for policy analysis.

Assuming that the $\epsilon_{as}$ are uncorrelated across schools implies that the model does not allow for substitution in the sense defined by Train (2009) – that is, non-proportional substitution in the individual choice probabilities, conditioning on observables. In other words, the model does not allow for the possibility that there is some unobservable characteristic shared by schools $s$ and $s'$ such that people who like school $s$ would be more likely to like school $s'$, and vice versa. However, a more important question for policy analysis is whether the model allows for realistic patterns of substitution in aggregate demand for schools, averaging over observables. This is indeed the case; by specifying a model that richly interacts school characteristics with pupil characteristics (location, ethnicity, SES and primary school denomination), rich patterns of substitution are possible. The presence of non-proportional substitution can be confirmed by analysing cross-elasticities of demand for a school with all other schools, and checking that they are not all equal.
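The distinction between individual-level and aggregate substitution can be illustrated with a toy example. The sketch uses a plain multinomial logit, which is an assumption made purely for illustration (the model in this chapter is a two-sided matching model, not a logit), and the utilities are invented. It removes one school instead of computing elasticities, but the logic is the same: within each pupil type the ratio of choice probabilities for the remaining schools is unchanged, while the ratio of aggregate shares changes because the types differ in their observables.

```python
import numpy as np

def logit_probs(v, available):
    """Multinomial logit choice probabilities; rows are pupil types, columns are schools."""
    ev = np.exp(v) * available  # unavailable schools get zero weight
    return ev / ev.sum(axis=1, keepdims=True)

# Hypothetical systematic utilities: type 1 favours schools 1 and 3, type 2 favours school 2
v = np.array([[2.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])

p_before = logit_probs(v, np.array([1.0, 1.0, 1.0]))
p_after = logit_probs(v, np.array([1.0, 1.0, 0.0]))  # school 3 removed

# Individual level: the ratio P(school 1) / P(school 2) is unchanged for each type
ratio_before = p_before[:, 0] / p_before[:, 1]
ratio_after = p_after[:, 0] / p_after[:, 1]

# Aggregate level (types equally weighted): the ratio of shares changes
share_before = p_before.mean(axis=0)
share_after = p_after.mean(axis=0)
agg_ratio_before = share_before[0] / share_before[1]
agg_ratio_after = share_after[0] / share_after[1]
```

Here school 3 draws most of its demand from type 1, so removing it shifts aggregate demand disproportionately towards school 1, even though substitution is proportional for each pupil type taken separately.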

In document School choice, competition and ethnic segregation in Lancashire:evidence from structural models of two sided matching (Page 87-92)