C.1 Regression Model for Estimating Overall Impacts of Project Gate

Our estimates of the impacts of Project GATE are based on a comparison of applicants randomly assigned to the program group with applicants randomly assigned to the control group. To compute impacts, we estimated a statistical model that predicts the outcome of interest as a function of program/control status, site, and a set of background characteristics detailed below. The basic form of the model is:

$$ y_i = \sum_{S=1}^{5} \lambda_S S_i + \sum_{S=1}^{5} \beta_S S_i P_i + \delta' X_i + \varepsilon_i \tag{C.1} $$

where

y_i is the outcome of interest

S_i equals 1 if applicant i was in site S and 0 if not

P_i equals 1 if applicant i was in the program group, 0 if the applicant was in the control group

X_i is a vector of baseline characteristics of customer i

ε_i is a random, mean-zero error term that captures the impacts of unobserved factors that influence the outcome

Evaluation of Project GATE Appendix C-168 May 2008

λ, β, and δ are parameters (or vectors of parameters) to be estimated.

The regression models were estimated using weights to account for survey nonresponse and for business partnerships that were necessarily excluded (Appendix B).
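As an illustration of how a model of the form in Equation C.1 can be fit, the following is a minimal sketch using weighted least squares in NumPy. It is not the evaluation's actual code: the function name `fit_site_impacts`, the simulated data, the two stand-in covariates, and the uniform random weights are all assumptions made for the example.

```python
import numpy as np

def fit_site_impacts(y, site, program, X, weights, n_sites=5):
    """Weighted least squares for a model shaped like Equation C.1.

    Columns: n_sites site indicators, n_sites site-by-program
    interactions, then the baseline covariates X. No separate
    intercept is added because the site indicators sum to 1.
    """
    S = np.eye(n_sites)[site]            # (n, n_sites) site indicators
    SP = S * program[:, None]            # site x program interactions
    D = np.hstack([S, SP, X])            # full design matrix
    sw = np.sqrt(weights)                # WLS by rescaling rows
    coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    lam = coef[:n_sites]                 # site intercepts (lambda_S)
    beta = coef[n_sites:2 * n_sites]     # site-specific impacts (beta_S)
    delta = coef[2 * n_sites:]           # covariate coefficients (delta)
    return lam, beta, delta

# Simulated, purely illustrative data (not GATE data).
rng = np.random.default_rng(0)
n = 5000
site = rng.integers(0, 5, size=n)                    # site index 0..4
program = rng.integers(0, 2, size=n).astype(float)   # random assignment
X = rng.normal(size=(n, 2))                          # two stand-in covariates
true_beta = np.array([0.5, 1.0, 1.5, 2.0, 2.5])      # hypothetical impacts
y = (1.0 + site + true_beta[site] * program
     + X @ np.array([0.3, -0.2]) + rng.normal(size=n))
weights = rng.uniform(0.5, 1.5, size=n)              # hypothetical weights

lam, beta, delta = fit_site_impacts(y, site, program, X, weights)
```

Here `beta` holds one estimated impact per site, which on the simulated data should land close to `true_beta`.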

C.1.1 Estimation of Impacts

The parameters of greatest interest are the β_S for each site, because they represent the impact on applicants of being assigned to the program group rather than the control group in site S. To obtain the average impact across all sites, we computed a weighted average of the impacts in each site, β_Pool, where the weight is denoted by W_S:

$$ \beta_{Pool} = \sum_{S=1}^{5} W_S \beta_S $$

The site weight, W_S, used in the above formulas is the proportion of all respondents that are from site S. As a sensitivity check, Appendix D compares the results from our main specification to an alternative where the five sites are each given equal weight in the regression, that is, W_S = 1/5.
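The pooled impact can be sketched in a few lines. The site impacts and respondent counts below are hypothetical values chosen for the example, and `pooled_impact` is an illustrative helper, not part of the evaluation's code.

```python
import numpy as np

def pooled_impact(beta, site, n_sites=5, equal_weights=False):
    """Weighted average of site impacts.

    By default W_S is the share of respondents in site S; with
    equal_weights=True every site gets W_S = 1 / n_sites.
    """
    if equal_weights:
        w = np.full(n_sites, 1.0 / n_sites)
    else:
        counts = np.bincount(site, minlength=n_sites)
        w = counts / counts.sum()
    return float(w @ beta), w

# Hypothetical site-level impacts and respondent counts.
beta = np.array([0.02, 0.05, -0.01, 0.04, 0.03])
site = np.array([0] * 100 + [1] * 300 + [2] * 200 + [3] * 250 + [4] * 150)

pooled, w = pooled_impact(beta, site)
pooled_eq, w_eq = pooled_impact(beta, site, equal_weights=True)
```

With these made-up inputs the respondent-share weights are 0.10, 0.30, 0.20, 0.25, and 0.15, so larger sites pull the pooled estimate toward their own impacts, while the equal-weight version treats each site identically.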

C.1.2 Choice of Linear Regression

For all outcomes we estimated the parameters in Equation C.1 using ordinary least squares, which models the outcome as a linear function of the predictors. An alternative would have been to use logistic regression (or probit models) for binary outcomes such as employment status. Logistic regression models the “log odds of success” as a linear function of the predictors:

$$ g(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta X_i + e_i, \quad \text{where } \pi_i = E(y_i). $$
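To make the log-odds scale concrete, here is a small sketch of the logit transform and its inverse. The baseline probabilities and the size of the shift are arbitrary values chosen for illustration.

```python
import numpy as np

def logit(p):
    """Log odds of success for a probability p in (0, 1)."""
    return np.log(p / (1.0 - p))

def inv_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary baseline probabilities and an arbitrary log-odds shift.
p = np.array([0.1, 0.5, 0.9])
shift = 0.5

# The same shift on the log-odds scale changes the probability by
# different amounts depending on the baseline.
dp = inv_logit(logit(p) + shift) - p
```

A constant effect on the log-odds scale therefore does not translate into a constant effect on the probability scale, which is one reason impacts from a logistic model need an extra step before they can be reported as percentage-point differences.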

We chose to use linear regression rather than logistic regression for all outcomes for a few reasons. The first reason was simplicity, both of analysis and presentation. There is not a standard way of estimating or presenting standard error estimates for pooled impacts estimated using logistic regression, whereas the calculation and presentation are very straightforward using linear regression.
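One way to see why the linear case is straightforward: the pooled impact is a linear combination of the site coefficients, so if the site weights are treated as fixed, its variance is the quadratic form W'VW, where V is the estimated covariance matrix of the β_S. The sketch below assumes fixed weights and, purely for illustration, a diagonal covariance matrix built from hypothetical site standard errors.

```python
import numpy as np

def pooled_se(w, cov_beta):
    """Standard error of sum_S W_S * beta_S for fixed weights W_S."""
    w = np.asarray(w, dtype=float)
    return float(np.sqrt(w @ cov_beta @ w))

# Hypothetical weights and site-level standard errors.
w = np.array([0.1, 0.3, 0.2, 0.25, 0.15])
se_site = np.array([0.04, 0.02, 0.03, 0.025, 0.035])

# Assumption for the example only: no covariance between site estimates.
# A full covariance matrix from the fitted regression can be passed instead.
cov_beta = np.diag(se_site ** 2)

se_pool = pooled_se(w, cov_beta)
```

With these made-up numbers the pooled standard error is smaller than any single site's standard error, reflecting the gain in precision from combining sites.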

Second, in previous research conducted by two of the authors of this study (McConnell et al. 2006), a series of sensitivity analyses indicated that the linear and logistic regressions led to very similar results. In particular, results from linear regression were compared with a bootstrap approach for estimating standard errors in logistic regression. The bootstrap approach yields correct standard errors, but is computationally intensive and was not feasible for this study because of its very large number of outcome measures. The authors of the earlier study generated impact estimates for a set of key binary outcomes (with a range of mean values, from 0.1 to 0.9) using both approaches and compared the results. The bootstrap and linear regression led to remarkably similar results; the impact estimates were generally identical and the standard errors (and associated p-values) were very similar as well. There were very few instances where the methods would lead to different conclusions regarding the significance of an estimated impact. We thus chose to use linear regression for all outcomes, as was done in several other large-scale evaluations, including Kling (2006), McConnell et al. (2006), and Trenholm et al. (2007).
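The general idea of checking an analytic standard error against a bootstrap can be sketched on simulated data. This is not the procedure used in McConnell et al. (2006): it is a simplified case with no covariates, in which a linear regression on the program indicator and a logistic regression both reproduce the two group means, so the impact estimate is the same difference in proportions. The sample size, outcome rates, and number of bootstrap replicates are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
program = rng.integers(0, 2, size=n)
# Simulated binary outcome: 30% baseline rate, +5 points in the program group.
y = (rng.random(n) < 0.30 + 0.05 * program).astype(float)

def impact(y, program):
    """Difference in mean outcome, program group minus control group."""
    return y[program == 1].mean() - y[program == 0].mean()

def analytic_se(y, program):
    """Unequal-variance standard error for a difference in means."""
    t, c = y[program == 1], y[program == 0]
    return np.sqrt(t.var(ddof=1) / t.size + c.var(ddof=1) / c.size)

def bootstrap_se(y, program, n_boot=500, rng=rng):
    """Resample applicants with replacement and recompute the impact."""
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, y.size, size=y.size)
        draws[b] = impact(y[idx], program[idx])
    return draws.std(ddof=1)

est = impact(y, program)
se_a = analytic_se(y, program)
se_b = bootstrap_se(y, program)
```

In a setup like this the two standard errors should be close, but each bootstrap replicate requires re-estimating the impact, which is why the approach becomes costly when repeated over a very large number of outcome measures.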

C.1.3 Regression Predictors

The predictors included in the regression model (the X variables in Equation C.1) were: age, sex, race/ethnicity, whether disabled, marital status, household size, education level, born in the United States, whether receiving UI benefits at application, weeks of UI benefits received over the previous year, employment at the time of random assignment, prior self-employment experience (either oneself or a relative), prior managerial experience, family support for pursuing self-employment, another family member employed, household income, credit problems, relevant skills developed in a job or hobby, and outside health insurance coverage. Data to define these predictors were obtained from the GATE application package.

C.1.4 Estimating Subgroup Impacts

A slight simplification to the model was used when estimating impacts for subgroups of applicants. In particular, to allow efficient estimation of the parameters of key interest for subgroups—the overall impact across all sites for each subgroup—we do not include separate program indicators for each site when estimating subgroup impacts. Including the site interactions with the subgroup indicator would greatly increase the number of parameters in the model and may result in less precise estimation of the overall subgroup impacts. The model used for subgroups is thus:

