Multivariate methods - Analytic strategy

3: Data and Methods

3.2 Analytic strategy

3.2.1 Multivariate methods

To tackle the research questions, a number of different procedures were

required. The choice of methods was based on the research question and the nature of the data. Along with three multivariate methods, two bivariate methods were used, namely Pearson’s Chi-square test of significance and Spearman’s rho. Pearson’s Chi- square is a non-parametric test that compares the observed frequencies to the expected frequencies to determine if two groups are associated with one another (McCall, 2001). Pearson’s Spearman’s Rho is a common measure of directional association between two ordinal variables or an ordinal and an interval level variable (McCall, 2001).

Binary Logistic Regression

Binary logistic regression is a form of regression that is used when the

dependent variable is a dichotomy; independent variables can be either continuous or categorical. Logistic regression is used to predict the explained variance of the

dependent variable based on the independent variables, determine the relative importance of the independent variables in terms of the dependent variable and to

understand the impact of each predictor variable net of the other variables in the model (Garson, 2009). Logistic regression applies maximum likelihood estimation after

transforming the dependent into a logit variable (the natural log of the odds of the

dependent occurring or not). Maximum likelihood estimation finds estimates of the model parameters that will most likely resemble the observations in the sample (Pampel, 2000). In this way, logistic regression estimates the odds of a certain event occurring, in this case the odds of arrest (yes/no) and odds of detection (yes/no). Binary logistic regression is therefore the most appropriate method to measure cost avoidance because in this study, the dependent variables (arrest and detection) are binary.

Ordinal Regression

To optimally develop explanatory models for illegal earnings, and stay faithful to the ordinal nature of the outcome, ordinal regression was chosen. By choosing ordinal regression, the ranking of the dependent variable, earnings from cannabis cultivation, is respected and information regarding the direction of the predictor variables on earnings is illustrated. The nature of the variable did not allow for Ordinary Least Squares

Regression, which requires continuous data and a normal distribution (Tabachnick & Fiddell, 2007). Dichotomizing the dependent variable and using logistic regression would lose important ranking of the earnings categories.

The ordinal regression model is based on the assumption that there is a latent continuous outcome variable and that the observed ordinal variable comes from

categorizing the underlying continuum. Ordinal regression is an extension of the binary logistic model, which uses the cumulative odds model, rather than the logit model, to predict the odds of being at or below a particular category. The model mimics a binary logistic regression for each of the splits in dependent variable yet it offers advantages over logistic regression and multinomial regression in that it provides a parsimonious

model and accounts for the rank ordering of the categories of the dependent variable, thereby respecting an important aspect of the data (O’Connell, 2006_{). There are few} assumptions of ordinal regression but an important one is that there are proportional or parallel odds. That is, the independent variables have the same effect for each of the categories in the dependent variable, which allows one model to be sufficient to explain the relationship between the dependent variable and a set of predictors (O’Connell, 2006; Hosmer & Lemeshow, 2000). Each logit has its own threshold values (intercept) but the same co-efficient, which means that each predictor variable is the same for the different logit functions. Rejecting parallelism implies that there may be an interaction with the predictor variables and the splits in the dependent variable. For the earnings models, tests for parallel lines indicates that parallelism was achieved in all three models (p>.05).

In logistic regression, the link function transforms the dependent variable into a logit variable (natural log of the odds). In ordinal regression, different link functions can be applied to transform the ordinal response, depending on the probability that cases fall into each of the categories. In the case for my earnings variable, a negative log-log link was applied because the probability of cases falling into the lower categories is more likely. That is, the probability that the respondents would earn less money is greater than earning more money.

Chi-square Automatic Interaction Detector (CHAID)

The Chi-square Automatic Interaction Detector (CHAID) is a method is designed for uncovering sometimes complex interactions, which would be more difficult to detect using traditional techniques. It is a criterion-based method of segmentation. That is, the segmentation is based on a criterion variable (the dependent variable), whereas a non- criterion method of segmentation is not dependent on any single variable (e.g. cluster

analysis). CHAID essentially involves creating many crosstabs and determining the most significant relationships to the dependent variable (based on the Chi-square value) to structure a tree, which is at the heart of a CHAID analysis (Hoare, 2004; Kass, 1980; Magidson, 1994). The sample is then divided further according to the values of that predictor and the selection procedure is repeated for each partition, until the most significant predictor is found. The tree stops developing once no further partitioning can be accomplished within the specified level of significance. The outcome of the

partitioning process is the creation of more or less homogeneous groups with regard to the dependent variable (Silver & Chow-Martin, 2002). One of the advantages of CHAID is its flexibility. That is, it is a nonparametric method; it thus performs well with non linear, highly skewed variables, or those with an ordinal structure (Lewis, 2000), such as the ones considered here.

CHAID analysis has two important features that complement regression analysis. First, it can uncover interaction effects and profiles that may be more difficult with

regression models. Interaction terms can also be added to regression models by manually interacting the factors or covariates. However, this manual method should be theoretically purposive because it would be labour intensive to interact all the factors in the model as an exploratory technique. In the CHAID however, all the variables of interest can be entered simultaneously and the algorithm distinguishes the combination of predictors that are important to the outcome variable.

For the all the CHAID analyses the p≤.10 critical value was used rather than the p≤.05 level for vari_{able selection because of the reduced statistical power that}

accompanies smaller sample sizes. Parent nodes could not be split if they contained less than 25 cases and child nodes could only be included of it contains a minimum number of 10 cases. Tree depth was not specified.

In document Youth criminal networks: resources in criminal achievement (Page 57-61)