Chapter 8 Findings of the Quantitative Study

8.2 Sample description

8.3.2 Multiple imputation

Multiple imputation replaces missing data with values estimated from other information in the dataset, allowing the analysis to retain all cases. Earlier methods of imputation increased “noise” in the dataset, so multiple imputation was developed to impute and analyse several plausible datasets and average the results across them (Rubin, 1987). The goal is not to “guess” what value individual respondents might have given but rather to generate valid estimates of the population parameters, preserving the integrity of the dataset as a whole (Graham et al., 2003). Multiple imputation in SPSS therefore generates between five and 20 datasets, each imputing different values for the missing data, analyses each dataset, and then combines the parameter estimates and standard errors into a single pooled result.
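The combining step described above is usually carried out with Rubin's (1987) rules. The following is a minimal sketch, assuming one regression coefficient has already been estimated in each of five imputed datasets. The function name `pool_rubin` and the numbers are illustrative only and are not taken from the study.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists with at least two imputations")
    # Pooled estimate: simple average of the m estimates.
    pooled_estimate = statistics.fmean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within_var = statistics.fmean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the m estimates.
    between_var = statistics.variance(estimates)
    # Total variance adds a (1 + 1/m) correction for the finite number of imputations.
    total_var = within_var + (1 + 1 / m) * between_var
    return pooled_estimate, math.sqrt(total_var)


# Hypothetical coefficients and standard errors from five imputed datasets.
coefficients = [0.40, 0.50, 0.45, 0.55, 0.60]
standard_errors = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled, pooled_se = pool_rubin(coefficients, standard_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

The pooled standard error is larger than any single-dataset standard error because it also reflects how much the estimates vary from one imputed dataset to the next.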

Previous studies in top management journals have utilised multiple imputation for missing data. For instance, in the Academy of Management Journal, Mishina et al. (2010: 709) impute values when firm data were missing for several years. They note:

Scholars have suggested that when some data are missing, multiple imputation of the missing data can be reliably employed to estimate values for the missing cases. Multiple imputation injects the appropriate amount of uncertainty when computing standard errors and confidence intervals … by deriving multiple predicted values for each missing case and using these predicted values to generate a range of possible parameter estimates. It then combines these estimates, approximating the error associated with sampling a variable assuming the reasons for nonresponse are known (i.e., measurement error) as well as the uncertainty associated with the reasons the data may be missing, thereby producing an average parameter estimate and appropriate standard error.

Similarly, also in the Academy of Management Journal, Jensen and Roy (2008: 504) use multiple imputation to handle missing data, conducting analyses on the pooled imputed dataset while providing back-up analysis using listwise deletion of cases with missing data. The authors comment:

Although multiple imputation is the preferred method for handling missing data when it cannot be assumed that the data are missing completely at random, we also report the results using listwise deletion of observations with missing data for comparison.

In the current study, as a first step towards multiple imputation, patterns of missing values were analysed to determine where and how much data were missing, the possible causes of missingness (e.g., questions that are difficult, sensitive or time-consuming to answer; potential linkages in which lack of response on one item reduces response on another), and whether data were missing at random. In particular, the patterns were checked to see whether non-response on one key variable was related to non-response on others, including the revenue growth, psychological capital and social capital variables. No extreme patterns emerged: the most frequent pattern was full data, and the next most frequent patterns lacked only one of the variables. As a further check on whether data were missing at random, independent samples t-tests were conducted (Chen et al., 1998). These compared the mean scores on several other main variables for which full data were available between respondents who answered the questions on revenue growth, psychological capital, and social networks and those who did not. For each of the four items (revenue growth, psychological capital, business advice network, and emotional support network), a dummy variable was created for “provided data” (0=no, 1=yes). The tests found no significant difference in means on individual and firm-level demographics and other variables between respondents and non-respondents (see Table 8.3). This suggests the data were missing completely at random and that multiple imputation is therefore an appropriate procedure.
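The two checks described in this paragraph, tabulating missing-value patterns and comparing respondents with non-respondents, can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS procedure used in the study: the variable names and toy data are hypothetical, and Welch's unequal-variance form of the t-test is assumed because the text does not say which form was reported. Only the t statistic and degrees of freedom are computed; converting them to a significance level requires a t distribution, which the Python standard library does not provide.

```python
import math
import statistics
from collections import Counter


def missing_patterns(rows, variables):
    """Count how often each combination of missing variables occurs."""
    patterns = Counter(
        tuple(v for v in variables if row.get(v) is None) for row in rows
    )
    return patterns.most_common()


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    na, nb = len(group_a), len(group_b)
    # Squared standard error of each group mean.
    va = statistics.variance(group_a) / na
    vb = statistics.variance(group_b) / nb
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def compare_by_response(rows, key_variable, fully_observed):
    """Split cases on a provided-data dummy (answered vs. missing) and test one variable."""
    answered = [r[fully_observed] for r in rows if r[key_variable] is not None]
    skipped = [r[fully_observed] for r in rows if r[key_variable] is None]
    return welch_t(answered, skipped)


# Hypothetical toy data: None marks a missing answer.
rows = [
    {"age": 30, "revenue_growth": 0.10},
    {"age": 40, "revenue_growth": None},
    {"age": 50, "revenue_growth": 0.20},
    {"age": 35, "revenue_growth": None},
    {"age": 45, "revenue_growth": 0.05},
    {"age": 45, "revenue_growth": None},
]
patterns = missing_patterns(rows, ["age", "revenue_growth"])
t_stat, dof = compare_by_response(rows, "revenue_growth", "age")
print(patterns, round(t_stat, 2), round(dof, 2))
```

A t statistic close to zero, as in this toy case, means that respondents and non-respondents look similar on the fully observed variable, which is the outcome the paragraph reports for the study data.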

Finally, multiple imputation was conducted to replace the missing values. Data were imputed only for continuous quantitative variables, as full data were available for all categorical variables such as gender, education, industry, and dichotomous start-up and industry experience (IBM SPSS, n.d.). Given the small size of the dataset, the low end of the acceptable range for the number of imputations (5–20) was chosen. The two alternative dependent variables were included in the imputation, as it is appropriate to include other highly correlated variables to reduce the standard error of the estimated parameters (Graham et al., 2003). Minimal constraints were placed on the data during imputation to ensure that impossible values were not generated.[42] The resulting dataset comprises 164 complete cases with 16 variables (two dependent variables, six controls, seven predictors, and one moderator). Analyses were then run listwise on the original dataset and on the post-MI dataset. To reduce common method variance, hypothesis testing also took place on the alternative dependent variable, lender rating, as seen in the supplementary analysis.
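To illustrate how a minimum constraint can keep impossible values out of the imputed datasets, the sketch below produces five completed copies of a single variable. It is deliberately simplified and is not the procedure SPSS applies: it draws from a normal distribution fitted to the observed values, ignores the other variables and the parameter uncertainty that a full imputation model would use, and redraws any value that falls below the minimum. The function name, the `max_draws` limit, and the data are hypothetical.

```python
import random
import statistics


def impute_with_minimum(values, minimum, n_datasets=5, seed=0, max_draws=100):
    """Return n_datasets completed copies of values, with imputed values >= minimum."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    datasets = []
    for _ in range(n_datasets):
        completed = []
        for v in values:
            if v is None:
                draw = rng.gauss(mean, sd)
                draws = 1
                # Redraw while the value violates the constraint.
                while draw < minimum and draws < max_draws:
                    draw = rng.gauss(mean, sd)
                    draws += 1
                # Fall back to the bound itself if every draw failed.
                v = max(draw, minimum)
            completed.append(v)
        datasets.append(completed)
    return datasets


# Hypothetical firm sizes; None marks a missing answer.
firm_size = [3, None, 1, 8, None, 2]
datasets = impute_with_minimum(firm_size, minimum=0)
for completed in datasets:
    print([round(v, 2) for v in completed])
```

Observed values are carried over unchanged into every copy, and only the missing entries differ between copies. That between-copy variation is what the pooling step later turns into additional uncertainty.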

Table 8.3 Independent Samples t-Test for Non-response Bias on Revenue Growth, PsyCap, and Social Capital Versus Demographic Variables

| Variable | Sig. level: Response to Revenue Growth | Sig. level: Response to PsyCap | Sig. level: Response to Business Advice Network | Sig. level: Response to Emotional Support Network |
|---|---|---|---|---|
| Respondent age | .28 | .19 | .42 | .40 |
| Education | .24 | .45 | .95 | .37 |
| Firm age | .29 | .84 | .55 | .14 |
| Firm size | .11 | .37 | .36 | .45 |
| Loan size | .71 | .82 | .71 | .36 |
| Psychological capital | .38 | - | .59 | .92 |
| Network size | .62 | .35 | - | - |
| Revenue growth | - | .44 | .48 | .26 |

[42] Constraints were: minimum values of 0 for firm size, revenue growth, and loan size; minimum–

8.4 Measurement and validity

Prior to hypothesis testing, constructs were examined for reliability and validity.

In document The role of human capital, social capital, and psychological capital in micro-entrepreneurship in China (Page 179-182)