5.4. Approach to Statistical Analysis
5.4.2. Multiple Regression
The final level of investigation in this thesis involved applying procedures that would best capture the relative importance of the many variables under consideration in predicting relevant parameters. As Goodwill, Alison and Beech (2009) have emphasised, in the area of offender profiling generally, there are two main ways in which this may be achieved. One way is to attempt to cluster a range of variables into themes or typologies on the basis of their statistical relationships with each other (through, for example, cluster analysis, principal components analysis, factor analysis, or Smallest Space Analysis) and then use these to predict offending. An alternative approach is to sidestep the thematic or ‘clustering’ phase, and simply apply a standard multivariate regression approach to the variables under
consideration (see, for example, Edwards & Grace, 2014; Goodwill et al., 2009).
Significantly, using a sample of 85 stranger rapists, Goodwill et al. (2009) compared the three established typological/thematic models with a simple multivariate logistic regression
approach to assess their ability to predict an offender’s previous convictions from crime scene information. Their conclusion was that “predictive analyses based on a multivariate approach using a mixture of crime scene behaviours, as opposed to the grouping of
behaviours into themes or types as in the three models, far exceeded the predictive ability of the three models” (p.507). Consequently, simple correlational and multiple regression
approaches were adopted in the present thesis to assess relationships between, and predictions from, the variables as a whole.
Given the nature of the data in the present thesis, the most appropriate multiple regression approach was that used by Goodwill et al. (2009); that is, binary logistic regression (BLR), with the dependent variable coded dichotomously. Consequently, this was the primary regression tool used. Furthermore, it can be noted that when using logistic regression, the standard measure of effect size for a binary independent variable is the odds
ratio (OR), technically known as the exponentiation of beta, or Exp(B). When the beta sign is positive, the interpretation of the odds ratio is straightforward: it gives the ratio of the odds between the two categories of the independent variable. So, for example, an OR of 4.30 would tell us that the odds of appearing in category 1 of the dependent variable are over four times as high for those in category 1 of the independent variable as for those in category 0 of the independent variable. However, when the beta sign is negative, the value of Exp(B) falls below 1, with a minimum possible value of 0. This makes ORs for positive and negative beta signs difficult to compare, as they are not on a simple linear scale: positive betas give values between 1 and infinity, whereas negative betas give values compressed between 0 and 1. In such cases, therefore, when citing comparative ORs in the text, the convention is to convert the Exp(B) values to those they would have been had the sign been positive, by using the reciprocal value; i.e. 1/Exp(B). This convention was, therefore, followed in this thesis.
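The reciprocal convention described above can be illustrated with a minimal sketch. The function names and the coefficients are hypothetical and chosen only to reproduce the OR of 4.30 used in the example; they are not values from the analyses in this thesis.

```python
import math


def odds_ratio(beta: float) -> float:
    """Exp(B): the odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def comparable_odds_ratio(beta: float) -> float:
    """Odds ratio expressed on the 'greater than or equal to 1' scale.

    For a negative coefficient, Exp(B) lies between 0 and 1, so the
    reciprocal 1/Exp(B) is returned instead (equivalent to exp(|beta|)).
    """
    value = odds_ratio(beta)
    return value if value >= 1.0 else 1.0 / value


# Hypothetical coefficients, for illustration only.
positive_beta = math.log(4.30)    # Exp(B) = 4.30
negative_beta = -math.log(4.30)   # Exp(B) = 1/4.30, roughly 0.23

print(round(odds_ratio(positive_beta), 2))
print(round(odds_ratio(negative_beta), 2))
print(round(comparable_odds_ratio(negative_beta), 2))
```

Under this convention, the two hypothetical coefficients describe effects of the same strength in opposite directions, which is what makes the converted values directly comparable.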
In addition to the standard logistic regression table, and as recommended by Goodwill et al. (2009), and used by Edwards and Grace (2014) and Rice and Harris (1995), ROC (Receiver Operating Characteristic) analyses were also conducted. ROC analysis enables the calculation of an area under the curve (AUC) statistic to assess both the hit probability (pH) and the false alarm probability (pFA) simultaneously. For example, if we use previous/no previous convictions for arson as the outcome measure, then the hit probability (pH), or sensitivity, assesses the probability that the regression model has correctly predicted the presence of a previous record of arson, whereas the false alarm probability (pFA), or 1-specificity, assesses the probability that the model has incorrectly predicted the presence of a previous record of arson. The ROC analysis then plots the relationship between pH and pFA, producing an ROC curve, and the area under the curve (AUC) is used as a measure of the diagnostic accuracy of the prediction of previous record of arson. Hosmer and Lemeshow (2005) provide the following guide for classifying accuracy using AUC statistics, which is applied here: .90-1.00 = Outstanding, >.80-<.90 =
Excellent, >.70-.80 = Acceptable/Satisfactory, >.60-.70 = Poor, >.50-.60 = Fail. The AUC is a useful measure as it provides a general measure of the predictive value of the regression model that can be compared directly with other models (Edwards & Grace, 2014; Goodwill et al., 2009; Rice & Harris, 1995). As such, by calculating AUC statistics it was possible to compare the relative efficacy of variables as predictors in each case; each AUC can be used as a kind of effect size (Goodwill et al., 2009).
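As an illustration of how pH, pFA and the AUC relate, the following is a minimal sketch that builds the ROC points from predicted probabilities and integrates them with the trapezoidal rule. The data are invented for illustration, the function names are hypothetical, and the label returned for values at or below .50 is an assumption, since the guide quoted above does not cover that range. Statistical packages compute the AUC with their own routines.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (pFA, pH) pairs, one per decision threshold, starting at (0, 0).

    A case is predicted 'present' when its score is at or above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        hits = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        false_alarms = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((false_alarms / n_neg, hits / n_pos))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC curve defined by (pFA, pH) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def classify_auc(value: float) -> str:
    """Accuracy bands as quoted in the text above."""
    if value >= 0.90:
        return "Outstanding"
    if value > 0.80:
        return "Excellent"
    if value > 0.70:
        return "Acceptable/Satisfactory"
    if value > 0.60:
        return "Poor"
    if value > 0.50:
        return "Fail"
    return "At or below chance"  # assumption: not part of the quoted guide


# Invented example: predicted probabilities and observed outcomes
# (1 = previous record present, 0 = absent).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

auc = area_under_curve(roc_points(scores, labels))
print(auc, classify_auc(auc))
```

With these invented data, 13 of the 16 possible present/absent pairs are ranked correctly, so the AUC is 13/16 = .8125.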
In further support of the above approach, the use of a series of binary logistic regressions (BLR) and ROC analyses is now considered ‘common practice’ in the criminal profiling literature (see Markson, Woodhams & Bond, 2010, p.98; and for further examples, Bennell, 2005; Bennell & Canter, 2002; Bennell & Jones, 2005; Fujita et al., 2012; Tonkin et al., 2008; Woodhams & Toye, 2007).
One of the most important advantages of logistic regression over standard multiple linear regression is that the former has far fewer assumptions. Consequently, a standard forced entry procedure, which included all the main variables, was used in the following analyses (Field, 2009; Studenmund & Cassidy, 1987). The fewer assumptions are important because, unlike ANOVA and the t-test, multiple linear regression is not robust to violations of most of its many assumptions (Field, 2009; Osborne & Waters, 2002; Tabachnick & Fidell, 2001). However, sample size remains an issue with BLR, so in the following analyses, sample size always exceeded the fairly conservative rule of thumb for BLR of 10 observations per independent variable (Vittinghoff & McCulloch, 2007). Also, the rule of independence of observations was followed in all cases. The assumption of linearity of the logit was not tested directly. However, this would not have been an issue in most of the analyses, as the majority of the variables were binary; moreover, problems with the linearity of the logit can usually be identified by poor and non-significant pseudo-R² statistics (Nagelkerke R²), and indicators of model fit such as the AUC, which were calculated for all regressions (NRCM, 2011).
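The sample size rule of thumb mentioned above amounts to simple arithmetic, sketched below. The function names are hypothetical, the rule is implemented as worded in this section (observations per independent variable), and the predictor counts passed to `meets_rule` are invented; only the sample size of 355 is taken from the text.

```python
def max_predictors(n_observations: int, per_variable: int = 10) -> int:
    """Largest number of independent variables the rule of thumb allows."""
    return n_observations // per_variable


def meets_rule(n_observations: int, n_predictors: int, per_variable: int = 10) -> bool:
    """True when there are at least `per_variable` observations per predictor."""
    return n_observations >= per_variable * n_predictors


print(max_predictors(355))   # ceiling on predictors for a sample of 355
print(meets_rule(355, 12))   # hypothetical model with 12 predictors
print(meets_rule(355, 40))   # hypothetical model with 40 predictors
```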
Another issue concerns the method of entry used in the analysis; i.e. whether to enter all of the variables considered relevant (‘forced entry’), and base the final model on this, or use a progressive stepwise method that results in a model containing only those independent variables that contribute significantly to the model fit. For many years there has been a growing, widely held view amongst statisticians that stepwise regression approaches are invalid and inappropriate within the framework of the scientific method. Among the many problems are that stepwise procedures capitalise on chance factors in the data, greatly inflate Type I error rates, and very often produce unreliable models (see, for example, Harrell, 2015; Henderson & Velleman, 1981; Judd & McClelland, 2008; Mundry & Nunn, 2009; Knapp & Sawilowsky, 2001; Studenmund & Cassidy, 1987; Tabachnick & Fidell, 2001; Thompson, 1989, 2001; Whittingham et al., 2006; Wittink, 1988). Consequently, in the present thesis, as recommended by authorities such as Studenmund and Cassidy (1987), and Mundry and Nunn (2009), a standard, full model, forced entry procedure was adopted for all multiple regression analyses.
A final issue regarding multiple regression concerns validation of the model. One approach that has been advocated by some is data splitting; that is, splitting the data into two independent samples, and attempting to apply the regression model from one sample to another. However, this approach has been subject to a number of criticisms, major issues being the inevitable loss of statistical power, the limitations on statistical analysis imposed by decreasing the sample size, and the questionable rationale behind attempting to validate a model using a sample drawn from essentially the same population (Dallal, 2012). In the present studies, the main source of information was a sample of 355; though relatively large by some standards, given the number of variables under consideration, there was,
nevertheless, little leeway to reduce it further and still be able to make meaningful statistical inferences. A data splitting procedure was not, therefore, adopted in the present studies.
Rather, it was assumed that any results should be treated with appropriate caution until replicated, preferably by different researchers using an entirely different sample (see Thayer, 2002).