
4.4 Statistical modelling approach

4.4.3 Propensity score matching

An important limitation of longitudinal datasets is that they provide observational data, which means that they are not based on experiments. A statistical approach that addresses this limitation is propensity score matching (PSM). PSM attempts to approximate a randomised experiment using observational data in cases where it is not feasible to conduct randomised trials and where it is therefore challenging to draw causal inferences. PSM is employed in the current thesis. The overall process of propensity score matching analysis will be described in this section. Details of the statistical theory, the modelling principles and a step-by-step analysis will follow in Chapter 7.

The debate on neighbourhood effects on young people’s outcomes centres on the question of whether differences observed in the outcome (NEET status at 18-19) between young people who live in high and low Crime Score areas are caused by neighbourhood characteristics or by the characteristics of the people who live in those areas. In other words, if NEET status is attributable to neighbourhood characteristics, this finding suggests that living in an area with a high Crime Score negatively influences young people’s educational and employment outcomes. If, however, differences in young people’s outcomes are attributable to the people who live in specific areas, the findings would suggest that specific people would become NEET regardless of whether they live in high or low Crime Score areas. The best approach to studying such a question would be a randomised experiment. Because observational data, such as the LSYPE study employed in the current analysis, do not offer the possibility of assigning individuals to treatment and control groups at random, a statistical procedure is needed to balance the data and to create two comparable groups before treatment effects are assessed. In the current study, propensity score analysis is employed to assess the impact of Crime Score on young people becoming NEET.

The aim of propensity score analysis is to balance data when treatment assignment is non-ignorable, to evaluate treatment effects in a non-randomised setting and to reduce multidimensional covariates to a one-dimensional score called the propensity score (Rosenbaum and Rubin [154], 1983). The analysis starts by identifying the conditioning variables, or covariates, that are considered to affect the outcome and to cause an imbalance between the treatment and control groups. Once the vector of covariates is defined, the modelling begins by estimating the conditional probability of receiving the treatment given the vector of observed covariates. This conditional probability, i.e. the propensity score, is estimated with a logistic regression model in which treatment status is regressed on the vector of covariates. The propensity score is a balancing score. It is estimated for both the treated and the control groups from the values of the same covariates, and it is used to match treated participants to control participants. Matching on propensity scores balances the observed covariates and controls for selection bias arising from those covariates.

Balancing the data can be carried out with one of three conventional methods: ordinary least squares regression, matching and stratification. Matching, which is selected in the current analysis, pairs each treated participant with a non-treated participant on the basis of the propensity score estimated from the vector of matching covariates. The goal of matching is to create two groups of participants with similar propensity scores, and therefore comparable on the observed covariates. Various matching algorithms exist, such as greedy matching, Mahalanobis metric matching and optimal matching. They differ mainly in how they handle the loss of participants for whom no suitable match on the propensity score can be found. Greedy matching will be employed in the current analysis, more specifically nearest-neighbour matching without replacement and without a caliper, which will be presented in detail in Chapter 7. After the two comparable groups are created, post-matching analysis will be performed on the matched samples. This analysis initially estimates the Average Treatment Effect.
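The estimation and matching steps described above can be illustrated with a minimal Python sketch. It assumes NumPy and uses simulated data: two hypothetical covariates, a treatment indicator that depends on them (standing in for living in a high Crime Score area) and a binary outcome (standing in for NEET status). The helper names `fit_logistic`, `propensity_scores`, `greedy_nn_match` and `std_mean_diff` are illustrative only; this is not the analysis code of the thesis and it does not use LSYPE data. The logistic regression is fitted by Newton-Raphson, each treated unit is paired with the nearest still-unmatched control on the propensity score (without replacement and without a caliper), and covariate balance is checked with standardised mean differences before and after matching.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Fit a logistic regression of treatment t on covariates X by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)                        # IRLS weights
        grad = Xd.T @ (t - p)                    # score vector
        hess = Xd.T @ (Xd * w[:, None])          # X' W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment given the covariates."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


def greedy_nn_match(ps, t):
    """1:1 greedy nearest-neighbour matching, without replacement, no caliper.

    Treated units are processed in data order; each takes the closest
    still-unmatched control, which is then removed from the pool.
    """
    treated = np.flatnonzero(t == 1)
    pool = list(np.flatnonzero(t == 0))
    if len(pool) < len(treated):
        raise ValueError("need at least as many controls as treated units")
    pairs = []
    for i in treated:
        j = int(np.argmin(np.abs(ps[pool] - ps[i])))
        pairs.append((int(i), int(pool.pop(j))))
    return pairs


def std_mean_diff(x_t, x_c):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return (x_t.mean() - x_c.mean()) / pooled_sd


# Simulated (hypothetical) data: treatment and outcome both depend on X.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1]))))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.7 * X[:, 0] + 0.4 * X[:, 1] + 0.5 * t))))

beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)
pairs = greedy_nn_match(ps, t)
ti = np.array([a for a, _ in pairs])
ci = np.array([b for _, b in pairs])

for k in range(X.shape[1]):
    before = std_mean_diff(X[t == 1, k], X[t == 0, k])
    after = std_mean_diff(X[ti, k], X[ci, k])
    print(f"covariate {k}: SMD before {before:.3f}, after {after:.3f}")

naive = y[t == 1].mean() - y[t == 0].mean()
matched = y[ti].mean() - y[ci].mean()
print(f"naive difference {naive:.3f}, matched difference {matched:.3f}")
```

Two design points are worth noting. Greedy matching processes treated units in a fixed order (here, data order) and never revisits earlier choices, so the pairs can depend on that order; optimal matching instead minimises the total distance over all pairs. In addition, because every treated unit is matched to one control, the matched difference in means most directly estimates the average effect among the treated units.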
Subsequently, multivariate analysis is performed on the matched samples, as it would be on a sample obtained from a randomised experiment. Propensity score methods create a summary measure of the probability of receiving a treatment. The advantage of propensity score matching is that it approximates a randomised trial and that, once units are matched, the unmatched comparison units are discarded and consequently are not used in estimating the treatment impact (Dehejia and Wahba [46], 2002). However, bias may still arise in propensity score matching, because the apparent difference between the two compared groups of units may result from characteristics that are not captured by the observed covariates and that affected whether or not a unit received the treatment, rather than from the effect of the treatment per se. As Rosenbaum [153] (2002) noted, what remains unknown when propensity score matching is employed is the extent to which matching adequately controls for bias and yields robust estimates of treatment effects. The most important form of bias that can arise in propensity score analysis is hidden bias, which is created by the omission of unobserved characteristics that might affect the outcome of the analysis. A statistical approach to assessing how vulnerable the findings are to hidden bias is sensitivity analysis.
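Sensitivity analysis for hidden bias can take several forms, and the specific procedure used in the thesis is not stated in this section. As one common choice, the sketch below applies Rosenbaum's bounds to McNemar's test for matched pairs with a binary outcome, in plain Python. The sensitivity parameter Γ (gamma) bounds how far the odds of treatment of two matched units with identical observed covariates may differ because of an unobserved characteristic; Γ = 1 corresponds to no hidden bias. Only discordant pairs (one member NEET, the other not) carry information for this test. The counts used (120 discordant pairs, 75 of them with the treated member NEET) are hypothetical, and the function names are illustrative.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rosenbaum_bounds_mcnemar(t_count, discordant, gamma):
    """Lower and upper bounds on the one-sided McNemar p-value.

    t_count: discordant pairs in which the treated member has the outcome.
    discordant: total number of discordant matched pairs.
    gamma: sensitivity parameter (>= 1); gamma = 1 means no hidden bias.
    """
    p_low = 1.0 / (1.0 + gamma)
    p_high = gamma / (1.0 + gamma)
    return (binom_upper_tail(t_count, discordant, p_low),
            binom_upper_tail(t_count, discordant, p_high))


def critical_gamma(t_count, discordant, alpha=0.05, step=0.01, max_gamma=10.0):
    """Smallest gamma on a grid at which the upper-bound p-value exceeds alpha."""
    gamma = 1.0
    while gamma <= max_gamma:
        _, upper = rosenbaum_bounds_mcnemar(t_count, discordant, gamma)
        if upper > alpha:
            return gamma
        gamma = round(gamma + step, 10)
    return None  # the result stays significant over the whole grid


# Hypothetical counts of discordant matched pairs.
T, D = 75, 120
for g in (1.0, 1.25, 1.5, 2.0):
    lo, hi = rosenbaum_bounds_mcnemar(T, D, g)
    print(f"Gamma = {g:.2f}: p-value between {lo:.4f} and {hi:.4f}")
print("critical Gamma:", critical_gamma(T, D))
```

The value returned by `critical_gamma` is the smallest Γ on the grid at which the upper-bound p-value exceeds α. The larger it is, the stronger an unobserved characteristic would need to be to explain away the association found in the matched pairs.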