
Chapter 3: Methodology

3.5 Data Analysis Plan

3.5.3 Propensity Score Matching

Propensity scores that effectively remove imbalances on potential confounders satisfy the conditional independence assumption (Apel & Sweeten, 2010a). Two methods can be used to assess whether propensity score matching (PSM) has achieved covariate balance. First, the sample can be divided into bins (strata) of the propensity score, and the distributions of scores and covariates for the treatment and comparison groups can be compared within each bin to determine whether there is common support across levels of the covariates (x). Nonsignificant differences on most or all covariates within each bin provide supporting evidence that the model has achieved covariate balance (Apel & Sweeten, 2010a).

Second, estimates of the standardized bias before and after matching can be used to compare the results derived from various matching methods. Standardized bias estimates that exceed 20 (i.e., 20% of a standard deviation) indicate covariate imbalance (Apel & Sweeten, 2010a; Rosenbaum & Rubin, 1985). To meet the overlap assumption, the propensity score distributions for the treatment conditions should overlap, with few off-support cases in the tails of the distribution and, ideally, common support across all values between 0 and 1 (Apel & Sweeten, 2010a; Heckman et al., 1997).
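The standardized bias for a single covariate is 100 times the difference in group means divided by the square root of the average of the two group variances (Rosenbaum & Rubin, 1985). The Python sketch below is illustrative only: the function names and the default threshold of 20 (taken from the text above) are assumptions of this example, and it is not the pstest implementation.

```python
import math
from statistics import mean, variance

def standardized_bias(treated, control, ref_treated=None, ref_control=None):
    """Standardized bias (%) for one covariate:
    100 * (mean_treated - mean_control) / sqrt((var_treated + var_control) / 2).

    By default the variances come from the samples being compared. Passing the
    unmatched samples as ref_treated/ref_control keeps the denominator fixed,
    so before- and after-matching values are on a common scale.
    """
    vt = variance(treated if ref_treated is None else ref_treated)
    vc = variance(control if ref_control is None else ref_control)
    return 100 * (mean(treated) - mean(control)) / math.sqrt((vt + vc) / 2)

def flag_imbalance(bias_by_covariate, threshold=20.0):
    """Names of covariates whose absolute standardized bias exceeds the threshold."""
    return [name for name, b in bias_by_covariate.items() if abs(b) > threshold]
```

For example, with hypothetical lists `age_t`, `age_c_all`, and `age_c_matched`, `standardized_bias(age_t, age_c_all)` gives the before-matching value and `standardized_bias(age_t, age_c_matched, age_t, age_c_all)` gives the after-matching value on the same scale.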

Matching Estimation Techniques

Nearest neighbor matching is the simplest method; it permits each treated case to be matched to one or multiple untreated cases, with or without replacement. Matching to multiple nonparticipants reduces variance but can increase bias, because the second- and later-closest matches are less similar to the treated case. Matching without replacement works well when untreated and treated cases are distributed along the whole propensity score distribution. Matching with replacement permits the same untreated case to be matched to more than one treated case; this can improve the quality of the matches, although it reduces the effective sample size. This loss of efficiency is preferable to the potential increase in bias that can occur when matching without replacement (Apel & Sweeten, 2010a).
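A minimal sketch of 1:k nearest neighbor matching on the propensity score, assuming plain Python lists of scores. `nearest_neighbor_match` is a hypothetical helper, not psmatch2. With `replace=True`, every treated case searches the full comparison pool, so one untreated case can serve several treated cases; with `replace=False`, matched untreated cases are removed from the pool.

```python
def nearest_neighbor_match(treated_ps, control_ps, k=1, replace=True):
    """Return {treated index: [matched control indices]} using the k closest
    untreated cases by absolute propensity score distance."""
    available = set(range(len(control_ps)))
    matches = {}
    for i, p in enumerate(treated_ps):
        # With replacement, search all controls; otherwise only unused ones.
        pool = range(len(control_ps)) if replace else sorted(available)
        # Rank candidates by distance (ties broken by index for determinism).
        ranked = sorted(pool, key=lambda j: (abs(control_ps[j] - p), j))
        chosen = ranked[:k]
        if not replace:
            available.difference_update(chosen)
        matches[i] = chosen
    return matches
```

Without replacement, this greedy procedure gives results that depend on the order in which treated cases are processed.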

Caliper matching adds a parameter to nearest neighbor matching: the caliper sets the maximum distance in propensity scores within which untreated cases can be matched to treated cases. This helps ensure that matched pairs have similar propensity scores (Apel & Sweeten, 2010a). The initial caliper width proposed for the matching model was 0.2 standard deviations of the logit of the propensity score (Rosenbaum & Rubin, 1985).
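The caliper rule can be sketched as follows. It assumes a width of 0.2 standard deviations of the logit of the propensity score, as stated above. Computing that standard deviation over the pooled treated and untreated logits is an assumption of this sketch. Matching is with replacement, and a treated case with no untreated case inside the caliper is left unmatched; this is how the caliper trades sample size for match quality.

```python
import math
from statistics import stdev

def logit(p):
    return math.log(p / (1 - p))

def caliper_match(treated_ps, control_ps, caliper_sd=0.2):
    """Nearest neighbor matching (with replacement) on the logit of the
    propensity score, keeping only pairs within caliper_sd pooled SDs."""
    pooled = [logit(p) for p in list(treated_ps) + list(control_ps)]
    width = caliper_sd * stdev(pooled)
    control_logits = [logit(p) for p in control_ps]
    matches = {}
    for i, p in enumerate(treated_ps):
        lt = logit(p)
        # Closest untreated case on the logit scale.
        j = min(range(len(control_logits)), key=lambda c: abs(control_logits[c] - lt))
        if abs(control_logits[j] - lt) <= width:
            matches[i] = j
        # Otherwise the treated case falls outside the caliper and is dropped.
    return matches, width
```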

Kernel matching techniques may be preferred when untreated and treated cases are less evenly distributed. Kernel matching compares each treated case with a weighted average of untreated cases, where the kernel (a probability density function) weights each untreated case by its distance in propensity score from the treated case. If a uniform kernel is used, this method can increase the number of untreated cases to which treated cases are matched, because it matches each treated case to all untreated cases within a given radius. The Epanechnikov kernel matches treated cases to all untreated cases located within a pre-specified bandwidth, giving greater weight to closer cases (Apel & Sweeten, 2010a).
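Kernel matching can be sketched as a kernel-weighted average of untreated outcomes for each treated case, using the uniform and Epanechnikov kernels described above. This is an illustrative Python sketch. The helper names, the bandwidth argument, and averaging the treated-minus-counterfactual differences (an estimate of the effect on the treated) are assumptions of the example, not the psmatch2 implementation.

```python
def epanechnikov(u):
    # Weight declines with distance and is zero outside the bandwidth.
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def uniform(u):
    # Equal weight for every untreated case inside the bandwidth.
    return 0.5 if abs(u) <= 1 else 0.0

def kernel_counterfactual(p_treated, control_ps, control_outcomes, bandwidth, kernel=epanechnikov):
    """Kernel-weighted mean of untreated outcomes for one treated case;
    None when no untreated case lies within the bandwidth (off support)."""
    weights = [kernel((pc - p_treated) / bandwidth) for pc in control_ps]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * y for w, y in zip(weights, control_outcomes)) / total

def kernel_att(treated_ps, treated_outcomes, control_ps, control_outcomes, bandwidth, kernel=epanechnikov):
    """Average of treated outcome minus kernel counterfactual over on-support treated cases."""
    effects = []
    for p, y in zip(treated_ps, treated_outcomes):
        y0 = kernel_counterfactual(p, control_ps, control_outcomes, bandwidth, kernel)
        if y0 is not None:
            effects.append(y - y0)
    return sum(effects) / len(effects) if effects else None
```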

Propensity Score Estimation Methods

The Stata user-written psmatch2 command was used to implement the various matching estimators, and matches were restricted to the region of common support. Prior to matching, the Stata user-written pscore command was used to divide the sample into five bins of equal size. The program showed that there were no significant differences between participants and nonparticipants within the same bin. The pstest command was used to assess standardized biases for key predictors before and after matching.
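The within-bin check can be illustrated with a short sketch. It sorts cases by propensity score, splits them into five bins of (nearly) equal size, and compares a covariate's mean for participants and nonparticipants in each bin. It only approximates the idea: the Welch t statistic, the large-sample cutoff of 1.96, and the helper names are assumptions of this example, and pscore's own blocking and testing procedure differs.

```python
import math
from statistics import mean, variance

def equal_size_bins(scores, n_bins=5):
    """Bin index (0..n_bins-1) for each case, with bins of (nearly) equal size by score rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(scores)
    return bins

def welch_t(a, b):
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def within_bin_balance(scores, treated, covariate, n_bins=5, critical=1.96):
    """Bins in which the covariate differs between participants and
    nonparticipants (|t| > critical, a large-sample approximation)."""
    bins = equal_size_bins(scores, n_bins)
    flagged = []
    for b in range(n_bins):
        t_vals = [covariate[i] for i in range(len(scores)) if bins[i] == b and treated[i]]
        c_vals = [covariate[i] for i in range(len(scores)) if bins[i] == b and not treated[i]]
        if len(t_vals) < 2 or len(c_vals) < 2:
            continue  # too few cases in this bin to compute a t statistic
        if abs(welch_t(t_vals, c_vals)) > critical:
            flagged.append(b)
    return flagged
```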

Results from multiple matching estimators were evaluated to determine which estimator produced optimal matches (Apel & Sweeten, 2010b). Nearest neighbor matching with a caliper, permitting various numbers of matches, was used as the primary matching technique. To supplement the results derived from nearest neighbor matching, this study implemented a second set of models that used kernel matching. Various kernel matching estimators (Epanechnikov, Gaussian, tricube, and uniform kernels) were considered, but these alternatives resulted in greater loss of cases without any corresponding improvement in balance.

Implementing Propensity Score Matching

The dissertation proposal initially specified that matches would be restricted to individuals within the same trajectory group (Haviland & Nagin, 2005; Haviland et al., 2007; Haviland et al., 2008). When matching was restricted to the same trajectory group, participants and nonparticipants in the high-rate trajectory groups (Groups 1 and 3) showed adequate balance under most matching techniques. Nevertheless, mean standardized biases exceeded 5% for all matching methods, and standardized biases for key predictors (e.g., age at release, SVORI prison term) exceeded 10%. Matching estimators failed to achieve balance when matching individuals within the low-rate offending group (Group 2). Consequently, the matching process was modified to permit matching across trajectory groups.
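The difference between the proposed and final matching designs can be sketched as a switch. The switch either restricts each treated case's comparison pool to its own trajectory group or opens it to all groups. The tuple layout `(case_id, group, pscore)` and the function name are hypothetical, and nearest neighbor matching with replacement is assumed for simplicity.

```python
def match_within_groups(treated, controls, restrict=True):
    """treated, controls: lists of (case_id, group, pscore) tuples (hypothetical layout).
    Nearest neighbor matching with replacement on the propensity score; when
    restrict is True, a treated case may only match controls in its own group."""
    matches = {}
    for tid, tgroup, tp in treated:
        pool = [c for c in controls if c[1] == tgroup] if restrict else controls
        if not pool:
            matches[tid] = None  # no comparison case available in this group
            continue
        best = min(pool, key=lambda c: abs(c[2] - tp))
        matches[tid] = best[0]
    return matches
```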

3.5.4 Duration Models

Duration models estimate the effect of employment programs on rearrest during the first 3 years after release (Sedgley et al., 2010). The repeated-events model estimates the time to each new arrest that occurred within the first 3 years after release. For men with multiple recorded arrests, the time to each subsequent arrest was measured as the time elapsed since the preceding arrest. In the single-event duration models, respondents remain in the sample until they experience the event, at which point they are removed from the sample as failures (Zweig et al., 2011).
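A minimal sketch of how arrest records might be converted into the two data layouts described above. It assumes arrest dates are stored as days since release and that the 3-year follow-up equals 1,095 days; both are assumptions of this example. In the repeated-events layout, each spell runs from the previous arrest (or release) to the next arrest, and the last spell is censored at the end of follow-up. The single-event layout keeps only the time to the first arrest.

```python
def gap_times(arrest_days, follow_up=1095):
    """Repeated-events (gap-time) spells as (start, stop, duration, event) tuples.
    Arrests after follow_up are ignored; the final spell is right-censored (event = 0)."""
    spells = []
    start = 0
    for day in sorted(d for d in arrest_days if d <= follow_up):
        spells.append((start, day, day - start, 1))  # time since the preceding arrest
        start = day
    if start < follow_up:
        spells.append((start, follow_up, follow_up - start, 0))
    return spells

def first_event(arrest_days, follow_up=1095):
    """Single-event record: (time to first arrest, 1), or (follow_up, 0) if censored."""
    within = [d for d in arrest_days if d <= follow_up]
    return (min(within), 1) if within else (follow_up, 0)
```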

The duration analyses use parametric regression models of time to arrest in five specifications: time to arrest within the first 3 years with repeated events permitted, time to first arrest for any offense, and time to first arrest for each of three offense subtypes (drug, property, and violent offenses). The Gompertz distribution provided the best fit to the data in the repeated-events model. To account for the dependence among repeated observations of the same individual, standard errors in this model use robust estimation with clustering at the individual level. For the four models estimating time to first arrest, the Weibull distribution provided the best fit to the data. In these models, standard errors use robust estimation with clustering at the state level.
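For reference, the log-likelihood contributions of the two distributions can be written out in their proportional-hazards forms, as documented for Stata's streg. The Weibull hazard is h(t) = p t^(p-1) exp(xb), and the Gompertz hazard is h(t) = exp(xb) exp(gamma t). The sketch below is illustrative. It shows what a single observation contributes when it ends in an arrest (d = 1) or is censored (d = 0). It does not reproduce the clustered robust standard errors used in the models.

```python
import math

def weibull_loglik(t, d, xb, p):
    """Weibull PH: h(t) = p * t**(p-1) * exp(xb), S(t) = exp(-exp(xb) * t**p).
    Contribution = d * log h(t) + log S(t); requires t > 0."""
    log_h = math.log(p) + (p - 1) * math.log(t) + xb
    log_s = -math.exp(xb) * t ** p
    return d * log_h + log_s

def gompertz_loglik(t, d, xb, gamma):
    """Gompertz PH: h(t) = exp(xb) * exp(gamma * t),
    S(t) = exp(-exp(xb) * (exp(gamma * t) - 1) / gamma)."""
    log_h = xb + gamma * t
    if gamma == 0:
        cum_h = math.exp(xb) * t  # limiting case: exponential model
    else:
        cum_h = math.exp(xb) * math.expm1(gamma * t) / gamma
    return d * log_h - cum_h
```

Summing these contributions over all spells and maximizing over the coefficients and the shape parameter (p or gamma) gives the maximum likelihood estimates. With p = 1 or gamma = 0, both models reduce to the exponential model.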

3.5.5 Structural Equation Modeling

Structural equation modeling (SEM) provides more flexibility than traditional regression-based approaches in modeling measurement error, time-specific parameter estimates, and cross-lagged effects (Bollen & Brand, 2010). SEM can explicitly model the measurement error that results from random noise (Bollen & Brand, 2010; Krohn et al., 2011). The confirmatory factor analysis (CFA) submodel depicts the relationships between each latent factor and the indicators used to measure it (Bollen & Noble, 2011). The CFA measurement submodel permits the use of multiple indicators of the same construct to enhance measurement accuracy (Bollen & Bauldry, 2011). It also permits the error terms of indicators to be correlated when there is a theoretical or methodological reason for them to correlate (Bollen & Noble, 2011).

The structural submodel specifies the interrelationships among the latent factors and the observed variables in the model (Bollen & Brand, 2010; Krohn et al., 2011). Figure 2.2 presents the original proposed structural model, and Figure 3.5 presents the initial longitudinal structural equation model (LSEM). Figure 4.10 presents the CFA results, and Figure 4.11 presents the results of the LSEM. In the figures, ellipses represent latent constructs that were retained from the CFA, and squares and rectangles represent observed variables.
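The two submodels can be summarized in the LISREL-style notation that is standard in the SEM literature. The symbols are generic and are not the labels used in this study's figures. Here y and x are vectors of observed indicators of the endogenous (η) and exogenous (ξ) latent factors, Λ_y and Λ_x hold the factor loadings, B and Γ hold the structural coefficients, and ζ holds the structural disturbances. The first two equations form the CFA measurement submodel. Nonzero off-diagonal elements of Θ_ε or Θ_δ correspond to correlated indicator error terms. The third equation is the structural submodel.

```latex
\begin{aligned}
\mathbf{y} &= \boldsymbol{\Lambda}_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}, &\quad \operatorname{Cov}(\boldsymbol{\varepsilon}) &= \boldsymbol{\Theta}_{\varepsilon} \\
\mathbf{x} &= \boldsymbol{\Lambda}_x \boldsymbol{\xi} + \boldsymbol{\delta}, &\quad \operatorname{Cov}(\boldsymbol{\delta}) &= \boldsymbol{\Theta}_{\delta} \\
\boldsymbol{\eta} &= \mathbf{B} \boldsymbol{\eta} + \boldsymbol{\Gamma} \boldsymbol{\xi} + \boldsymbol{\zeta}
\end{aligned}
```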

Assessing Factorial Invariance over Time

Tests were conducted to assess whether the indicators exhibited factorial invariance over time. Indicators that exhibit consistent factor loadings, intercepts, and variances over time can be said to measure the same construct at each time point, so that differences over time reflect changes in the underlying construct. To test this assumption, a series of nested models was estimated, each imposing more stringent constraints on the model parameters than the last. The fit statistics for each nested model were compared with those of the preceding model to assess whether the items met the assumption of invariance at each stage. Meade, Johnson, and Braddy (2008) recommend treating a decrease in the Comparative Fit Index (CFI) of more than .002 as the threshold for rejecting this assumption.
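The comparison of nested models can be sketched in two steps. First, compute the CFI for each model from its chi-square and degrees of freedom and those of the baseline (independence) model, using the usual formula. Then move from the configural to the weak to the strong model, stopping when the CFI drops by more than .002 relative to the preceding model. The helper names and any CFI values used with them are hypothetical; in practice the CFI values are read from the SEM software output.

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index:
    1 - max(chi2_M - df_M, 0) / max(chi2_B - df_B, chi2_M - df_M, 0)."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0 else 1.0 - d_model / d_null

def highest_invariance_level(cfis, threshold=0.002):
    """cfis: list of (label, CFI) ordered from least to most constrained model.
    Returns the most constrained model whose CFI did not drop by more than
    the threshold relative to the preceding model."""
    retained = cfis[0][0]
    for (_, prev_cfi), (label, value) in zip(cfis, cfis[1:]):
        if prev_cfi - value > threshold:
            break  # invariance at this level is rejected
        retained = label
    return retained
```

For example, with hypothetical CFI values of .960, .959, and .950 for the configural, weak, and strong models, `highest_invariance_level` returns `"weak"`, because the step to the strong model lowers the CFI by .009.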

Figure 3.2 Parameter Labels for Four Time Points: The Configural Invariance Model.

The configural invariance model estimated all parameters freely, with restrictions only on the pattern of loadings on factors (Figure 3.2). The weak invariance model constrained factor loadings to be equal at each time point, while intercepts and variances remained freely estimated (Figure 3.3). The strong invariance model constrained both the factor loadings and the intercepts to be equal at each time point (Figure 3.4). If the model satisfies strong factorial invariance, it can be used to examine changes in latent means over time.

Figure 3.3 Parameter Labels for Four Time Points: The Weak Invariance Model.

Structural Equation Path Model

This study adapts the path model depicted in Thornberry and Christenson (1984). Prior criminal activity is predicted to influence men’s current labor force participation and job conditions through changes in men’s stock of human and social capital (Heckman et al., 2006; Sickles & Williams, 2008; Thornberry & Christenson, 1984). The path model is estimated in Mplus version 7.3 using maximum likelihood estimation with robust standard errors, because the employment, crime, and recidivism measures are binary indicators.

Figure 3.4 Parameter Labels for Four Time Points: The Strong Invariance Model.

Latent and observed endogenous variables from the final analysis period, along with work and crime outcomes at each follow-up wave, were regressed on the control variables. AIC values were used to compare competing non-nested models and determine which provided the best fit to the data, with lower values indicating better fit. The final sample size for the general structural equation model was 1,243 cases.
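The AIC is 2k - 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. Among competing models, the one with the lowest AIC is preferred. The short sketch below compares models; the model names and log-likelihoods in the usage note are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(models):
    """models: dict of name -> (log_likelihood, n_params).
    Returns the name with the lowest AIC and all AIC values."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    return min(scores, key=scores.get), scores
```

With log-likelihoods of -100.0 (5 parameters) and -98.0 (8 parameters), the AIC values are 210 and 212. The first model is therefore preferred despite its lower likelihood, because the penalty for its three fewer parameters outweighs the difference in fit.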

In document Labor Force Participation and Crime among Serious and Violent Former Prisoners (Page 60-69)