Data cleaning and assumptions for parametric analysis

The data set was checked against the assumptions of parametric analysis. This process involved inspecting the data and conducting preliminary analyses on all continuous variables: age, TAS-20 total score and subscale scores, MSPSS total score and subscale scores, VEPR total score and subscale scores, and FER total score and subscale scores.

Missing data

All participants were invited to complete all measures. The only measure with missing data was the demographic questionnaire: one participant did not report accommodation status and nine participants did not report therapeutic status (see Table 3.1). Furthermore, two YPwO reported that they had committed an offence but did not disclose its nature, so offence severity and offence type (violent or non-violent) could not be determined.

Error analysis and outliers

Data were checked for obvious input errors by inspecting the minimum and maximum values of each variable and confirming that these fell within the possible range; no input errors were found. SPSS outlier analysis, excluding cases pairwise, was conducted on the continuous variables to identify outliers and extreme values. Inspection of the frequency distributions and corresponding box plots identified several outliers. An outlier labelling technique (Hoaglin & Iglewicz, 1987) was then applied, which confirmed the majority of these as true outliers. Each outlying data point was checked against commonly reported causes of outliers (Osborne, 2013), including data entry error (all outliers were checked against the raw data) and intentional misreporting (no pattern was found of particular participants accounting for the outlying data).
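The outlier labelling rule can be illustrated with a minimal sketch. It assumes the multiplier g = 2.2 recommended by Hoaglin and Iglewicz (1987); the multiplier actually used in this study is not stated above. The quartile method and the example scores are also assumptions for illustration, and SPSS may compute quartiles slightly differently.

```python
import statistics


def outlier_fences(values, g=2.2):
    """Return (lower, upper) fences: Q1 - g*(Q3 - Q1) and Q3 + g*(Q3 - Q1).

    g = 2.2 is an assumed multiplier; the "inclusive" quartile method is
    one of several conventions and may not match SPSS output exactly.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = g * (q3 - q1)
    return q1 - spread, q3 + spread


def label_outliers(values, g=2.2):
    """Return the values that fall outside the labelling fences."""
    lower, upper = outlier_fences(values, g)
    return [v for v in values if v < lower or v > upper]


# Hypothetical questionnaire totals, not data from the study.
scores = [41, 44, 45, 47, 48, 50, 52, 53, 55, 95]
flagged = label_outliers(scores)
```

Values flagged in this way would still need to be checked against the raw data before being treated as legitimate outliers, as described above.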

First, initial t-tests were conducted with outliers included and removed to establish whether removing outliers would change statistical significance (see Appendix H). Second, guidance on handling extreme values was sought through supervisor consultation and a review of the relevant literature, leading to the decision not to remove outlying data points, for the following reasons. Including outliers can bias subsequent analyses and increase the risk of Type I errors (Field, 2013), but removing outliers and continuing with parametric analysis can affect the estimation of standard error (Bakker & Wicherts, 2014). Although transformations can be applied to non-normal data as an alternative to removing outliers (Field, 2013; Pallant, 2013), some critics argue that this method does not always produce normally distributed data, and that it can reduce power and alter the nature of the data, which in turn affects interpretation (Osborne, 2013). Another option is to use non-parametric data analysis (Bakker & Wicherts, 2014), although this is less powerful than parametric analysis and can still be affected by outliers (Osborne, 2013). Bootstrapping methods (Efron & Tibshirani, 1993) are the most recently recommended approach to statistical analysis when legitimate outliers lead to a non-normal distribution (Bakker & Wicherts, 2014; Wilcox, 2012).

Check for normality

Parametric analysis assumes that the data are normally distributed in the sample. In the current study this was assessed by visual inspection of histograms, normal Q-Q plots and box plots, and by calculating z-scores for skewness and kurtosis (each statistic divided by its standard error), with an absolute z-score greater than 1.96 indicating an unsatisfactory level (Field, 2013). This process indicated that many variables were not normally distributed (z-score > 1.96 and/or p < .05; see Appendix I). This is not uncommon in social science measures (Pallant, 2013), whether completed with clinical or non-clinical populations (Wright et al. 2011). Wright et al. (2011) note that parametric tests ‘often make unrealistic assumptions about variables’ distributions…in data derived from clinical samples, or when looking at groups responding at the extreme end of clinical constructs’ (p. 252). Furthermore, psychometric factors, such as the number of scoring options or the measurement of an underlying trait that does not fit the study sample, may also lead to non-normal data (Bakker & Wicherts, 2014). A psychometric factor of note in the current study is that FER emotion intensity scores are likely to be unequally distributed, as higher intensity emotions are naturally more likely to be accurately recognised than lower intensity emotions (Bowen et al. 2013). This skewness and kurtosis are likely to be more pronounced for emotions that the literature suggests are easier to recognise (such as happiness). As several variables violated the assumptions required for parametric data analysis, bootstrapping methods (Efron & Tibshirani, 1993) were used as a robust approach to inferential statistical analyses, based on a review of the available evidence (see the next section, 3.2.2.4).
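The z-score check for skewness and kurtosis can be sketched as follows. This is an illustration rather than the procedure run in SPSS: it assumes the bias-adjusted sample skewness and excess kurtosis formulas, with their usual standard errors, that SPSS is generally documented as reporting, and the function name and example scores are hypothetical.

```python
import math
import statistics


def skew_kurtosis_z(values):
    """Return (z_skewness, z_kurtosis): each statistic over its standard error."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    z = [(v - mean) / sd for v in values]

    # Bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * sum(t ** 4 for t in z)
    kurt -= 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

    # Standard errors depend only on the sample size.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


# Hypothetical right-skewed scores, not data from the study.
z_skew, z_kurt = skew_kurtosis_z([1, 1, 1, 2, 2, 3, 20])
non_normal = abs(z_skew) > 1.96 or abs(z_kurt) > 1.96
```

The absolute value is taken before comparing with 1.96, so that strongly negative skew or kurtosis is flagged as well.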

Inferential statistical analysis: bootstrapping

In light of the presence of legitimate outliers and non-normal data distribution, with the current sample being representative of the target population (Aguinis et al. 2013; Bakker & Wicherts, 2014; Wilcox, 2012), bootstrapping methods (Efron & Tibshirani, 1993), available in SPSS version 20, were considered the best approach for conducting the planned inferential statistical analyses (see section 2.7).

Bootstrapping methods can be used to find standard errors and confidence intervals for almost any statistic (Field, 2013). Bootstrapping estimates the properties of the sampling distribution by repeatedly drawing samples, with replacement, from the observed data and calculating the statistic of interest (for example, the mean) in each bootstrap sample. The values between which 95% of the bootstrap sample estimates fall form the bootstrap confidence interval (Field, 2013). Bias corrected and accelerated (BCa) confidence intervals were used, as these are considered slightly more accurate than the standard percentile 95% confidence interval because they adjust for bias in the bootstrap estimates (Efron & Tibshirani, 1993; Field, 2013). The confidence limits generated were used to test the null hypothesis for each hypothesis, with the null hypothesis retained if the BCa confidence interval included zero. Bootstrapping with 2000 samples was used for the t-test and ANOVA analyses, allowing inferences to be made on both normally and non-normally distributed data (Field et al. 2013; Wright et al. 2011).
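To make the resampling logic concrete, the sketch below computes a BCa confidence interval for a difference between two group means using 2000 bootstrap samples. It is an illustration under stated assumptions, not the SPSS procedure used in the study: the function names, the seed, the pooled leave-one-out (jackknife) estimate of the acceleration and the example scores are all assumptions, and SPSS may differ in its details.

```python
import random
import statistics
from statistics import NormalDist


def mean_diff(a, b):
    return statistics.fmean(a) - statistics.fmean(b)


def bca_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """BCa confidence interval for mean(a) - mean(b); a and b are lists."""
    rng = random.Random(seed)
    nd = NormalDist()
    observed = mean_diff(a, b)

    # Resample each group with replacement, at its original size.
    boots = sorted(
        mean_diff(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )

    # Bias correction: share of bootstrap estimates below the observed value,
    # clamped so that the inverse normal CDF stays finite.
    prop = sum(est < observed for est in boots) / n_boot
    prop = min(max(prop, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(prop)

    # Acceleration from leave-one-out estimates pooled over both groups
    # (an assumed convention for the two-sample case).
    jack = [mean_diff(a[:i] + a[i + 1:], b) for i in range(len(a))]
    jack += [mean_diff(a, b[:i] + b[i + 1:]) for i in range(len(b))]
    centre = statistics.fmean(jack)
    num = sum((centre - est) ** 3 for est in jack)
    den = 6 * sum((centre - est) ** 2 for est in jack) ** 1.5
    accel = num / den if den else 0.0

    def limit(p):
        # Shift the nominal percentile by the bias and acceleration terms.
        z = nd.inv_cdf(p)
        adjusted = nd.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        index = min(max(int(adjusted * n_boot), 0), n_boot - 1)
        return boots[index]

    return limit(alpha / 2), limit(1 - alpha / 2)


# Hypothetical scores for two groups, not data from the study.
group_a = [12.0, 14.0, 15.0, 15.5, 16.0, 17.0, 18.0, 19.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
lower, upper = bca_mean_diff_ci(group_a, group_b)
reject_null = not (lower <= 0 <= upper)
```

Under the decision rule described above, the null hypothesis of no group difference is retained when the interval includes zero and rejected otherwise.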

Sample characteristics
