2.7.1 Data preparation. The raw data were entered into a Predictive Analytics
Software Version 18 (PASW v.18) spreadsheet for analysis. Firstly, the data were checked for missing values.
A full dataset for demographic information was obtained, including FSIQ scores for all of the participants. Regarding the SRM-SF, only one question was left unanswered, by a single participant. Several responses given by participants did not meet the scoring criteria, and these were coded as ‘999’ in the database. However, every participant provided enough scorable responses (≥ 7 answers out of 11) to enable a total SRM-SF score and global stage score to be calculated.
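The scorability check described above can be illustrated with a minimal Python sketch. It is not the procedure used in PASW; the function names and the example ratings are hypothetical, and only the ‘999’ code and the threshold of 7 scorable answers out of 11 come from the text.

```python
MISSING = 999       # code used in the database for unscorable or unanswered items
MIN_SCORABLE = 7    # minimum number of scorable answers out of 11


def scorable_items(responses):
    """Return the item ratings that are not coded as missing."""
    return [r for r in responses if r != MISSING]


def has_enough_responses(responses, minimum=MIN_SCORABLE):
    """True if enough items are scorable for a total score to be calculated."""
    return len(scorable_items(responses)) >= minimum


# Hypothetical participant: 11 item ratings, two of them coded as missing.
example = [2.5, 3.0, 999, 2.0, 2.5, 3.0, 3.0, 999, 2.5, 2.0, 3.0]
print(len(scorable_items(example)), has_enough_responses(example))
```

Keeping the missing-data code as a named constant makes it harder for ‘999’ to be averaged into a score by accident.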
Regarding the EPS, a full dataset for the EPS-SRI measure was obtained.
returned. These missing data were coded as ‘999’ in the database. Of these missing questionnaires, seven were from the male offender group, six were from the female non-offender group and 11 were from the male non-offender group. The overall response rate for the EPS-BRS was 65%.
2.7.2 Interrater reliability. Thirty percent of the questionnaires in this study (N = 20) were second-rated by an expert rater. PASW was used to randomly select five participant numbers from each of the four groups, so that each group was equally represented. The second rater was blind to participant group and sex. Rating scores for the 20 selected participants were entered into a separate database by both the researcher and the second rater. Interrater reliability was computed for the first 10 of the randomly selected questionnaires, giving r = .80 (p < .001). This only marginally met Gibbs et al.’s (1992) recommendation of an interrater reliability of r ≥ .80. Therefore, the researcher and expert rater discussed at length the moral reasoning stage of each individual question on the 10 questionnaires, and looked for inconsistencies in scoring. Responses containing three words in particular, ‘upset’, ‘hurt’ and ‘feel’, had been scored by the researcher at too low a level. These were typically scored at stage 1/2 or stage 2, rather than stage 2/3, which was a more accurate representation of the stage score. These inconsistencies were corrected on the first 10 questionnaires, and interrater reliability was recalculated at r = .99 (p < .001), using an intra-class correlation.
The remaining 58 questionnaires were then re-rated by the researcher to correct these inconsistencies, looking in particular for the words ‘upset’, ‘hurt’ and ‘feel’ in the participant responses, to ensure these were scored correctly. The second 10 questionnaires were then second-rated by the expert rater. Interrater reliability was then computed for these 10 randomly selected questionnaires, giving r = .99 (p < .001).
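As an illustration of how agreement between two raters can be quantified, the sketch below computes a Pearson correlation between two sets of ratings. This is an assumption made for illustration: the text reports r values and also refers to an intra-class correlation, which is a different statistic that additionally penalises systematic differences in level between raters. The function name is hypothetical and the example ratings are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rating lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores from the researcher and the second rater.
researcher = [250, 262, 275, 240, 288, 301, 255, 270, 266, 295]
second_rater = [255, 260, 280, 245, 290, 300, 250, 275, 270, 298]
print(round(pearson_r(researcher, second_rater), 2))
```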
2.7.3 Data analysis. Several methods of analysis were used. Demographic data were explored using descriptive statistics, and tests of normality were conducted on the raw data.
2.7.3.1. Tests of normality. Firstly, histograms were inspected visually to assess whether scores were normally distributed. Following this, the Kolmogorov-Smirnov test (K-S test) was used to explore whether the distributions of scores differed significantly from a normal distribution. Several of the variables were not normally distributed. FSIQ of participants was significantly non-normal, D (68) = 0.16, p < .001. A histogram illustrating the distribution of FSIQ is presented in Appendix U.
The total SRM-SF score was normally distributed, D (68) = 0.07, p > .05. However, none of the individual construct scores were normally distributed: Contract, D (52) = 0.17, p < .001; Truth, D (52) = 0.17, p < .001; Affiliation, D (52) = 0.25, p < .001; Life, D (52) = 0.22, p < .001; Property, D (52) = 0.18, p < .001; Law, D (52) = 0.25, p < .001; and Legal Justice, D (52) = 0.16, p < .01.
In terms of the EPS-SRI, the Total Pathology score data were normally distributed, D (68) = 0.08, p > .05, along with the subscales positive impression, D (68) = 0.10, p > .05, and anxiety, D (68) = 0.10, p > .05. The remaining four subscales, however, were significantly non-normal: low self-esteem, D (68) = 0.12, p < .05; depression, D (68) = 0.13, p < .01; thought/behaviour disorder, D (68) = 0.15, p < .001; and impulse control, D (68) = 0.13, p < .01. Finally, regarding the EPS-BRS, the Externalising Behaviour Problem score data were normally distributed, D (44) = 0.10, p > .05, whereas the Internalising Behaviour Problem score data were significantly non-normal, D (44) = 0.15, p < .05.
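The D statistic reported above is the largest vertical distance between the empirical distribution of the scores and a normal distribution. The sketch below shows one way to compute it in Python, assuming the normal distribution is fitted using the sample mean and standard deviation. It computes D only; the p values produced by statistical packages depend on a correction for the estimated parameters and are not reproduced here. The function name and both example samples are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(scores):
    """K-S distance D between the sample and a normal fitted to its mean and SD."""
    xs = sorted(scores)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical step just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical samples: one roughly symmetric, one heavily skewed.
symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
skewed = [0, 0, 0, 0, 0, 0, 0, 0, 50]
print(round(ks_statistic_normal(symmetric), 2), round(ks_statistic_normal(skewed), 2))
```

A larger D indicates a greater departure from normality, so the skewed sample yields the larger value.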
2.7.3.2 Analysis. As not all of the data were normally distributed, bootstrapping
intervals, which were less susceptible to errors. Bootstrapping treats the sample as a population: cases are drawn from it with replacement to form a bootstrap sample, and the statistic of interest (e.g. the mean) is recorded for that sample. In this study this procedure was repeated 5000 times, providing a histogram of bootstrapped mean scores. From these, standard errors, confidence intervals and tests of significance can then be computed (Field, 2009). Bootstrapping is regarded as a robust alternative method when parametric assumptions are in doubt, particularly if the sample is not overly large (Preacher, Rucker & Hayes, 2007), hence its selection for use in this research study.
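The resampling procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the PASW implementation: it bootstraps a single mean, and it uses a simple percentile interval rather than the bias corrected and accelerated (BCa) interval used in this study. The function names, the seed and the example scores are hypothetical; only the figure of 5000 resamples comes from the text.

```python
import random
from statistics import mean, stdev


def bootstrap_means(scores, n_resamples=5000, seed=0):
    """Draw resamples with replacement and record the mean of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    return [mean(rng.choices(scores, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Percentile confidence interval (a simpler alternative to BCa)."""
    ordered = sorted(values)
    alpha = (1 - level) / 2
    lower = ordered[round(alpha * len(ordered))]
    upper = ordered[round((1 - alpha) * len(ordered)) - 1]
    return lower, upper


# Hypothetical scores for one group.
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
boot = bootstrap_means(sample)
standard_error = stdev(boot)          # SD of the bootstrapped means
low, high = percentile_interval(boot)
print(len(boot), low < mean(sample) < high)
```

The standard deviation of the bootstrapped means serves as the bootstrap standard error, and the interval is read directly from the sorted resampled means.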
ANCOVA was then used to address the first two research questions, to test for differences between the calculated means of the SRM-SF and individual constructs. Main effects and interactions were examined. Bootstrap parameter estimates were determined, with bias-corrected and accelerated (BCa) confidence intervals, which adjust for bias and skewness in the distribution. The F statistics presented were calculated using the original dataset, whereas the significance levels and the 95% BCa confidence intervals were calculated through the bootstrapping procedure. When the confidence interval did not include zero, the finding was deemed significant at p < .05.
The latter two research questions, exploring the relationship between moral reasoning and offence severity, and exploring the relationship between moral reasoning and emotional and behavioural problems, were addressed using Spearman’s correlation coefficient, as the data were not normally distributed. ANOVA was also used to partially address Question 4, comparing the emotional and behavioural problems of offenders and non-offenders.
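Spearman’s coefficient is the Pearson correlation computed on the ranks of the scores rather than on the scores themselves, which is why it does not require normally distributed data. A minimal Python sketch follows, assuming tied scores receive the average of the ranks they span. The function names and the example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical moral reasoning scores and offence severity ratings.
reasoning = [250, 262, 275, 240, 288, 301]
severity = [3, 3, 2, 4, 1, 1]
print(round(spearman_rho(reasoning, severity), 2))
```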
2.7.3.3 Homogeneity of variance. To test homogeneity of variance of the regression slopes, Levene’s Test of Equality of Error Variances was used. Variances were equal across the four groups, F (3, 64) = 1.41, p > .05; therefore, homogeneity of variance was assumed. There was no significant effect of sex on SRM-SF total score, after controlling for FSIQ, F (1, 60) = 1.26, p > .05 (BCa 95% CI = -5.50 to 1.65). There was no significant effect of offence history on SRM-SF total score, after controlling for FSIQ, F (1, 60) = 1.32, p > .05 (BCa 95% CI = -2.81 to 11.64). In addition, there was no significant interaction effect between sex and offence history on SRM-SF total score, after controlling for FSIQ, F (1, 60) = 0.00, p > .05 (BCa 95% CI = -7.96 to 4.73). These were all desirable effects.
Regarding the EPS, the variances of the total pathology score (EPS-SRI) were equal for the four participant groups, F (3, 64) = 1.28, p > .05; therefore, homogeneity of variance was assumed. For the externalising behaviours score (EPS-BRS), the variances were also equal for the four groups, F (3, 64) = 1.26, p > .05; therefore, homogeneity of variance was assumed. Finally, for the internalising behaviour score (EPS-BRS), the variances were equal across the four groups, F (3, 64) = 0.44, p > .05, so homogeneity of variance was assumed.
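Levene’s test compares groups on the absolute deviations of each score from its group centre; it is in effect a one-way ANOVA on those deviations. The Python sketch below computes the test statistic and its degrees of freedom, assuming deviations are taken from the group means. It does not compute a p value, which requires the F distribution. The function name and the example groups are hypothetical.

```python
def levene_statistic(groups):
    """Return Levene's W and its degrees of freedom (k - 1, N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean.
    deviations = []
    for g in groups:
        group_mean = sum(g) / len(g)
        deviations.append([abs(v - group_mean) for v in g])
    dev_means = [sum(d) / len(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, dev_means))
    within = sum((v - m) ** 2
                 for d, m in zip(deviations, dev_means) for v in d)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Hypothetical groups: the second has a much larger spread than the first.
narrow = [1, 2, 3]
wide = [0, 10, 20]
w, df = levene_statistic([narrow, wide])
print(round(w, 2), df)
```

With four groups and 68 participants the degrees of freedom are (3, 64), the values reported for the tests above.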
Chapter Three- Results
3.1 Overview of Chapter
This chapter presents the analysis and results from this study. It begins by exploring demographic information, and making comparisons between the groups. The study hypotheses are then addressed in turn. Moral reasoning scores are inspected to see whether significant differences exist between the four participant groups, exploring the effect of sex, offence history, and the interaction between the two. In a similar manner, individual constructs from the SRM-SF are then inspected. The chapter moves on to explore the relationship between total moral reasoning score and offence severity. It then explores the relationship between moral reasoning and the presence of emotional or behavioural problems. The chapter ends with a summary of the findings.