Chapter 5 Stage Two: Testing and modifying the IAT to measure stereotypes of empathy in scientists
5.2.6. Data analysis
To preserve the credibility of the results, rigorous data-analysis procedures were followed in the present study. Potential sources of bias in the statistical tests (e.g., missing data and outliers) were handled using techniques justified with reference to the methodological literature, and the assumptions of parametric tests (e.g., normality and homogeneity of variance) were checked before any such test (e.g., t-test and correlation) was applied to the data (Field, 2013).
5.2.6.1. Missing data
First, Missing Value Analysis (MVA) was performed on the dataset in SPSS. The results showed that 7% of cases had missing values for academic identification, meaning that 7% of participants did not report whether they identified with science or the liberal arts. Independent t-tests were therefore conducted to check whether participants who reported their academic identification differed from those who did not in their SE-IAT, SE-explicit and ISSOS performance. No significant difference was found on any of the three measures, indicating that whether academic identification was missing was unrelated to participants' scores on these measures. The missing values could therefore be treated as missing at random (Little & Rubin, 2002). For the other variables (e.g., age, gender and the individual items of the three measures), the MVA results showed that fewer than 5% of cases had missing values, and these were likewise missing at random.
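The missing-data check described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the field names (`identification`, `se_iat`, `se_explicit`, `issos`), the helper names and the toy records are all hypothetical. Each outcome is compared between cases that reported academic identification and cases that did not, using a pooled-variance independent t-test. For brevity, the p-value uses a normal approximation to the t distribution, which is reasonable only when the degrees of freedom are large, as they are with a sample of several hundred.

```python
import math
import statistics

def pooled_t_test(group_a, group_b):
    """Independent-samples t-test assuming equal variances.

    Returns (t, df, p). The two-sided p-value uses the standard normal
    approximation to the t distribution (adequate only for large df).
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p

def compare_by_missingness(records, grouping_field, outcome_fields):
    """For each outcome, compare cases that reported `grouping_field`
    (value is not None) with cases that left it missing (None)."""
    results = {}
    for outcome in outcome_fields:
        reported = [r[outcome] for r in records
                    if r[grouping_field] is not None and r[outcome] is not None]
        missing = [r[outcome] for r in records
                   if r[grouping_field] is None and r[outcome] is not None]
        results[outcome] = pooled_t_test(reported, missing)
    return results

# Hypothetical toy records (invented for illustration); None marks a missing answer.
records = [
    {"identification": "science", "se_iat": 0.40, "se_explicit": 1.2, "issos": 3.1},
    {"identification": "liberal arts", "se_iat": 0.10, "se_explicit": 0.8, "issos": 3.6},
    {"identification": "science", "se_iat": 0.55, "se_explicit": 1.5, "issos": 2.9},
    {"identification": None, "se_iat": 0.30, "se_explicit": 1.0, "issos": 3.3},
    {"identification": None, "se_iat": 0.25, "se_explicit": 1.1, "issos": 3.0},
]
results = compare_by_missingness(records, "identification",
                                 ["se_iat", "se_explicit", "issos"])
for outcome, (t, df, p) in results.items():
    print(f"{outcome}: t({df}) = {t:.2f}, approx. p = {p:.3f}")
```

A non-significant difference on every outcome is what the text reports; with only five toy records, the printed p-values here are not meaningful.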
According to Field (2013), there are several methods for dealing with missing data under this condition, such as case deletion, mean imputation and regression imputation, each with its own advantages and disadvantages. Case deletion, especially listwise deletion, is one of the most commonly used methods, and it avoids introducing artificially generated values, along with any bias they may carry, into the data. However, it inevitably reduces the sample size, which in turn reduces statistical power (Field, 2013). In contrast, imputation methods retain the original sample size by replacing missing data with values generated in different ways. For example, mean imputation replaces each missing value with the mean of that variable across all other cases. However, because every imputed case receives the same value, this tends to attenuate correlations between variables. Given that the present study examines implicit-explicit correlations, this method was deemed inappropriate. Regression imputation, on the other hand, addresses this weakness of mean imputation by predicting the missing values from a regression model fitted to the observed data. Yet, because the imputed values fall exactly on the regression line, this approach may mask the uncertainty of the imputed values and overstate the relationships between variables (Field, 2013). Given these trade-offs, the present study adopted the simpler and more straightforward approach of deleting cases with missing values. As the sample size of the present study is relatively large (N = 485) and the proportion of cases with missing values is small (no more than 7%), the resulting loss of sample size was considered acceptable. Listwise deletion was therefore adopted as the technique for handling missing data.
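To illustrate the trade-off described above, the sketch below applies listwise deletion and mean imputation to the same toy data, which are invented and not taken from the study; the helper names are likewise hypothetical. Because the single imputed value sits at the mean of y rather than on the line relating x and y, mean imputation pulls the correlation down, whereas listwise deletion keeps the relationship intact at the cost of one case.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def listwise_delete(rows):
    """Keep only rows (tuples) with no missing (None) values."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in values]

# Toy data: y is perfectly linear in x, but the last y value is missing.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, None]

complete = listwise_delete(list(zip(x, y)))
x_lw, y_lw = zip(*complete)
r_listwise = pearson_r(x_lw, y_lw)         # computed on the 5 complete cases
r_imputed = pearson_r(x, mean_impute(y))   # computed on all 6 cases
print(len(complete), round(r_listwise, 3), round(r_imputed, 3))  # 5 1.0 0.756
```

Applied to the study's figures, deleting about 7% of 485 cases removes roughly 34 participants for academic identification alone, which is the loss of power the text weighs against the attenuation shown here.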
5.2.6.2. Outliers
Outliers are a common cause of non-normality. An outlier is a score very different from the rest of the data (Field, 2013). To identify outliers, z-scores were calculated for each measure, and cases with z-scores greater than 3.29 in absolute value were regarded as potential outliers (Field, 2013). No outliers were found for the SE-IAT and SE-explicit results, and only three were found for the ISSOS results. Outliers can be dealt with either by deleting the case or by transforming the data values. Given the very small number of outliers identified in the present study, these cases were removed from the dataset on the grounds that they might have represented rather idiosyncratic situations.
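The z-score screening rule can be sketched as below. This is a minimal illustration under stated assumptions: z-scores use the sample standard deviation (n - 1 denominator), the helper names and scores are hypothetical, and the sketch handles one measure at a time, whereas in the study a flagged case was removed from the dataset as a whole. Under a normal distribution, the cutoff of 3.29 corresponds to a two-tailed p of about .001.

```python
import statistics

Z_CUTOFF = 3.29  # |z| > 3.29 corresponds to roughly two-tailed p < .001

def z_scores(values):
    """Standardise scores using the sample mean and sample SD."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator
    return [(v - m) / sd for v in values]

def flag_outliers(values, cutoff=Z_CUTOFF):
    """Indices of scores whose absolute z-score exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]

def remove_outliers(values, cutoff=Z_CUTOFF):
    """Return the scores with flagged outliers dropped."""
    flagged = set(flag_outliers(values, cutoff))
    return [v for i, v in enumerate(values) if i not in flagged]

# Hypothetical scores for one measure: 30 typical values and one extreme value.
scores = [48, 49, 50, 51, 52] * 6 + [120]
print(flag_outliers(scores))         # [30]
print(len(remove_outliers(scores)))  # 30
```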
5.2.6.3. Normality
Before any parametric test (e.g., t-test and correlation) was applied to the data, the assumption of normality was checked by examining the skew and kurtosis of the data. The Kolmogorov-Smirnov and Shapiro-Wilk tests were not used because, in a sample as large as the present one (n = 485), these tests can be significant even when the scores are only slightly different from a normal distribution (Field, 2013). Instead, histograms, P-P plots and the values of skew and kurtosis were examined. Data for the SE-IAT, SE-explicit and ISSOS were all approximately normally distributed, with skew and kurtosis values very close to 0.
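The skew and kurtosis values referred to above can be computed as in the sketch below. It assumes the bias-adjusted estimators G1 (skewness) and G2 (excess kurtosis), which we believe correspond to the values SPSS reports; the function name and example values are hypothetical. Under these definitions a normal distribution has skew and kurtosis of 0, which is the benchmark used in the text.

```python
import math

def skew_kurtosis(values):
    """Return (G1, G2): bias-adjusted sample skewness and excess kurtosis.

    Both are 0 for a normal distribution. Requires at least 4 values.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    # Central moments with an n denominator.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    # Small-sample adjustments.
    big_g1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    big_g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return big_g1, big_g2

print(skew_kurtosis([1, 2, 3, 4, 5]))   # symmetric: skew 0, kurtosis about -1.2
print(skew_kurtosis([1, 1, 1, 2, 10]))  # long right tail: positive skew
```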
5.2.6.4. Homogeneity of variance
Another important assumption for parametric tests is homogeneity of variance, because unequal variances can bias the estimate of the standard error associated with the parameter estimates in linear models and make it inconsistent (Hayes & Cai, 2007). Statisticians used to recommend testing for homogeneity of variance using Levene's test and, if the assumption was violated, applying an adjustment to correct for it (Field, 2013). However, like the significance tests of normality discussed above, Levene's test can be significant even for small and unimportant differences in large samples. As Field (2013) suggests, in a large sample such as the present one, the assumption of homogeneity of variance is largely irrelevant and can be ignored.
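For reference, the Levene statistic mentioned above can be sketched as a one-way ANOVA on the absolute deviations of each score from its group mean (the mean-centred version; a median-centred variant also exists). The function name and data are hypothetical, and the sketch returns only the statistic W and its degrees of freedom; a p-value would come from the F distribution, for example via `scipy.stats.levene(..., center='mean')` in SciPy. Repeating the same data ten times leaves each group's absolute deviations unchanged but increases W, which illustrates why the test becomes significant for trivial differences in large samples.

```python
import statistics

def levene_w(*groups):
    """Mean-centred Levene statistic W; returns (W, df_between, df_within)."""
    k = len(groups)
    # Absolute deviation of each score from its own group mean.
    z = []
    for g in groups:
        m = statistics.mean(g)
        z.append([abs(v - m) for v in g])
    n_total = sum(len(g) for g in z)
    z_group_means = [statistics.mean(g) for g in z]
    z_grand_mean = sum(sum(g) for g in z) / n_total
    # One-way ANOVA F computed on the absolute deviations.
    between = sum(len(g) * (m - z_grand_mean) ** 2
                  for g, m in zip(z, z_group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_group_means) for v in g)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k

# Hypothetical scores: two groups with different spreads.
narrow, wide = [1, 2, 3], [0, 5, 10]
small = levene_w(narrow, wide)
large = levene_w(narrow * 10, wide * 10)  # same spreads, ten times the cases
print(small, large)
```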
5.2.6.5. Criteria for evaluating the SE-IAT
As discussed in the literature review, mixed evidence has been found for the psychometric properties of different IATs. Before applying the newly developed SE-IAT to investigate implicit stereotypes about empathy in scientists, it is therefore important to assess whether the SE-IAT meets relevant psychometric criteria. The criteria, hypotheses and statistical analyses selected to evaluate the reliability and validity of the SE-IAT are summarised in Table 5.9 below.
Table 5.9 Summary of the criteria, hypotheses and statistical analyses to evaluate the SE-IAT
Criterion | Hypotheses | Statistical analysis
Replicating the IAT effect | H1: Both reaction times (H1a) and errors (H1b) should be greater for the incompatible task than for the compatible task, because the incompatible task is thought to require more cognitive capacity. | Paired-samples t-test
Internal consistency | H2: The SE-IAT is expected to show internal consistency similar to that of other IATs, which typically ranges from .70 to .90. | Split-half reliability
Relationship with explicit measures | H3: The SE-IAT is expected to show little or no correlation with SE-explicit (H3a) or ISSOS (H3b). However, if correlated, the correlation is expected to be stronger for SE-explicit than for ISSOS, because SE-explicit has a better structural fit with the SE-IAT than ISSOS does (H3c). | Pearson's correlation
Ability to capture individual differences | H4: Both gender and major are expected to influence SE-IAT performance. Women are expected to show a weaker SE-IAT effect than men because of their advantage in social sensitivity (H4a). Students who identify with the sciences are expected to show a weaker SE-IAT effect than students who identify with the liberal arts because of ingroup favouritism (H4b). If gender and major interact, individuals with unconventional identities (women in science and men in liberal arts) are expected to show a weaker SE-IAT effect than those with conventional identities (women in liberal arts and men in science), as predicted by role incongruity theory (H4c). | Two-way ANOVA
Resistance to order effect | H5: No difference is expected in SE-IAT results between participants who completed the incompatible task first and the compatible task second and those who completed the tasks in the reverse order. | Independent t-test
Resistance to prior IAT experience effect | H6: No difference is expected in SE-IAT results between participants who had completed an IAT before and those with no prior experience of the IAT. |
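Two of the analyses in Table 5.9 can be sketched as below, as minimal illustrations rather than the study's scoring pipeline. `paired_t` compares each participant's compatible and incompatible scores (H1), and `split_half_reliability` correlates two half-test scores and applies the Spearman-Brown correction, r_SB = 2r / (1 + r), to estimate full-length reliability (H2). How the halves are formed (e.g., odd versus even trials) and how each half is scored are assumptions, as are all function names and data.

```python
import math
import statistics

def paired_t(condition_a, condition_b):
    """Paired-samples t for the differences condition_b - condition_a; returns (t, df)."""
    if len(condition_a) != len(condition_b):
        raise ValueError("conditions must have one score per participant")
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def pearson_r(x, y):
    """Pearson correlation from deviation scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(half_a, half_b):
    """Correlate the two halves, then apply the Spearman-Brown correction."""
    return spearman_brown(pearson_r(half_a, half_b))

# Hypothetical mean latencies (ms) for four participants.
compatible = [600, 650, 700, 620]
incompatible = [700, 720, 760, 690]
t, df = paired_t(compatible, incompatible)
print(f"t({df}) = {t:.2f}")  # t(3) = 8.66

# Hypothetical half-test scores (e.g., odd vs even trials) for five participants.
half_a = [0.20, 0.50, 0.10, 0.70, 0.40]
half_b = [0.30, 0.40, 0.20, 0.60, 0.50]
print(round(split_half_reliability(half_a, half_b), 3))
```

The Spearman-Brown step is needed because each half contains only part of the trials, so the raw correlation between halves underestimates the reliability of the full-length test.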
5.3. Results