Model Testing Results and Discussion
7.3 Regression analyses and assumption checking
The theoretical model and the associated interaction hypotheses were tested through five sets of moderated hierarchical regression analyses, one set for each criterion variable.
In moderated hierarchical regression analysis, interaction terms are represented by product terms. These were created by multiplying the scores of each pair of predictor variables in the proposed theoretical model (Figure 4.1) that were thought to influence the five criterion variables. To counter potentially high correlations between the product terms and their components, the predictor variable scores were centred before the product terms were computed. Centring the predictor variables, by subtracting the sample mean from each observed value, has frequently been recommended as a solution to problems caused by multicollinearity in moderated multiple regression analysis (Jaccard, Turrisi & Wan, 1990; Tabachnick & Fidell, 1996).
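The effect of centring on the correlation between a predictor and its product term can be illustrated with a short sketch. The scores below are invented for illustration only, and `pearson_r` and `centre` are hypothetical helper names, not part of SPSS or any package used in this research.

```python
from math import sqrt
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)


def centre(values):
    """Subtract the sample mean from each observed value."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical scores for a predictor (x) and a moderator (z).
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [2, 3, 1, 5, 4, 6, 8, 7]

# Product term built from raw scores versus from centred scores.
raw_product = [xi * zi for xi, zi in zip(x, z)]
xc, zc = centre(x), centre(z)
centred_product = [xi * zi for xi, zi in zip(xc, zc)]

r_raw = pearson_r(x, raw_product)
r_centred = pearson_r(xc, centred_product)
print(f"r(x, x*z) from raw scores:     {r_raw:.2f}")
print(f"r(x, x*z) from centred scores: {r_centred:.2f}")
```

For these made-up scores the correlation between the predictor and its product term falls from about .95 with raw scores to about .11 after centring.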
Hierarchical regression analysis is based on the assumptions of (a) multivariate normality, (b) homoscedasticity, (c) linearity and (d) independence of cases. SPSS allows these assumptions, as well as multicollinearity, to be checked when running regression analyses.
The normality assumption in regression analysis requires that the residuals (the differences between the observed and the predicted criterion scores) are normally distributed. In this research, multivariate normality was evaluated graphically by plotting the residuals of each of the five regression analyses in a histogram with a superimposed normal curve. All five histograms showed satisfactorily normal distributions.
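The histogram check can be mirrored numerically by comparing the observed count in each bin with the count a normal curve would place there. This is a minimal sketch, assuming invented residuals and a hypothetical helper `histogram_with_normal`; it uses the standard-library `statistics.NormalDist` (Python 3.8+) and is not the SPSS procedure used in this research.

```python
from statistics import NormalDist, mean, stdev


def histogram_with_normal(residuals, n_bins=6):
    """Return (lower, upper, observed, expected) for each histogram bin.

    'expected' is the count that a normal curve with the residuals' own mean
    and standard deviation would place in the bin (the superimposed curve).
    """
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_bins
    curve = NormalDist(mean(residuals), stdev(residuals))
    n = len(residuals)
    rows = []
    for i in range(n_bins):
        lower = lo + i * width
        last = i == n_bins - 1
        upper = hi if last else lo + (i + 1) * width
        if last:
            # Close the last bin on the right so the largest residual is counted.
            observed = sum(lower <= r <= upper for r in residuals)
        else:
            observed = sum(lower <= r < upper for r in residuals)
        expected = n * (curve.cdf(upper) - curve.cdf(lower))
        rows.append((lower, upper, observed, expected))
    return rows


# Hypothetical residuals from one regression analysis.
residuals = [-1.8, -1.1, -0.7, -0.4, -0.2, 0.0, 0.1, 0.3, 0.5, 0.9, 1.2, 1.6]
rows = histogram_with_normal(residuals)
for lower, upper, observed, expected in rows:
    print(f"{lower:6.2f} to {upper:5.2f}  {'#' * observed:<6} normal curve: {expected:.1f}")
```

Large, systematic gaps between the bars and the normal-curve counts would point to a departure from normality; the judgement remains a visual one, as in the thesis.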
The focus of this data analysis is on the regression coefficients. Therefore, the minimum sample size was calculated following the recommendation of Tabachnick and Fidell (1996). The authors suggest that, to detect a medium-sized beta with alpha set at .05 and the Type II error rate (beta) set at .20, with reliable measures and normally distributed criterion variables, a minimum sample size of 104 plus the number of predictor variables is required. Although these conditions were met in this research project, two caveats apply. First, Tabachnick and Fidell’s (1996) recommendation is based on standard (one-step) multiple regression analysis, and more complicated models require more cases; however, according to the authors the exact number is difficult to calculate and most analysts refrain from doing so. Second, the minimum sample size assumes medium-sized betas (at least .2); should the betas turn out to be smaller, more cases are needed. For this project, the required minimum sample size was 130 cases. After listwise deletion, the sample sizes for the five moderated hierarchical regression analyses testing the theoretical model exceeded this minimum (N = 224 to N = 227), and thus provided additional cases to allow for small betas and for the use of a two-step hierarchical regression model.
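The rule of thumb reduces to simple arithmetic, sketched below with hypothetical function names. The figure of 26 predictors is an assumption inferred from the reported minimum of 130 cases (130 − 104); the text does not state it directly.

```python
def tabachnick_fidell_min_n(n_predictors, base=104):
    """Minimum N for testing individual predictors: 104 + m
    (medium effect, alpha = .05, beta = .20, i.e. power = .80)."""
    return base + n_predictors


def meets_minimum(n_cases, n_predictors):
    """True if the available sample reaches the recommended minimum."""
    return n_cases >= tabachnick_fidell_min_n(n_predictors)


# Assumed: 26 predictors, inferred from the reported minimum of 130 cases.
m = 26
print(tabachnick_fidell_min_n(m))  # 130
for n in (224, 227):  # smallest and largest N after listwise deletion
    print(n, meets_minimum(n, m))
```

Because the rule was derived for one-step regression with medium effects, clearing it by a wide margin, as here, is the safer position for a two-step moderated model.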
The assumption of homoscedasticity requires a uniform band of residual scores around the regression surface. In this research, homoscedasticity was assessed by examining scatter plots of the standardised residuals against the standardised predicted values; these showed no clear indications of heteroscedasticity.
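As a rough numeric companion to the visual check, one can compare the spread of the residuals at low and at high predicted values. This sketch is loosely in the spirit of the Goldfeld-Quandt idea and is a swapped-in technique, not what this research did; the data and helper names are hypothetical. The z-scores are only analogous to the standardised values SPSS plots, whose exact scaling of residuals differs slightly.

```python
from statistics import mean, pstdev, pvariance


def standardise(values):
    """Convert values to z-scores (mean 0, SD 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def spread_ratio(predicted, residuals):
    """Residual variance among the highest predicted values divided by that
    among the lowest. A ratio near 1 is consistent with homoscedasticity;
    a ratio far from 1 suggests a fan-shaped scatter plot."""
    pairs = sorted(zip(predicted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)


# Hypothetical predicted values and two hypothetical residual patterns.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
even = [1, -1, 1, -1, 1, -1, 1, -1]
fanning = [0.1, -0.1, 0.2, -0.2, 1, -1, 2, -2]

zpred = standardise(predicted)
ratio_even = spread_ratio(zpred, standardise(even))
ratio_fanning = spread_ratio(zpred, standardise(fanning))
print(f"even band: {ratio_even:.2f}, fan shape: {ratio_fanning:.2f}")
```

Standardising the residuals rescales both halves by the same factor, so the ratio is unaffected; standardisation matters only for making the scatter plot comparable across analyses.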
The linearity assumption requires that the residuals have no systematic relationship with the predicted scores of the criterion variable. The scatter plot of residuals against predicted scores was therefore examined for patterns suggesting a non-linear relationship that the regression model had omitted.
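Least-squares residuals are uncorrelated with the predicted values by construction, so a straight correlation cannot reveal non-linearity; a curved pattern, however, shows up as a correlation between the residuals and the squared deviations of the predicted values. The sketch below is an illustrative numeric check with invented, deliberately curved data and hypothetical helper names; it is not the procedure used in this research, which relied on visual inspection.

```python
from math import sqrt
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - slope * mx, slope


# Hypothetical data with a curved (quadratic) relationship.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]
a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# A U-shaped residual pattern correlates with squared predicted deviations.
mp = mean(predicted)
curvature = [(p - mp) ** 2 for p in predicted]
r_curve = pearson_r(residuals, curvature)
print(f"r(residual, (predicted - mean)^2) = {r_curve:.2f}")
```

For these curved data the residuals form a U shape across the predicted values and the correlation is essentially 1; with a well-specified linear model it should hover near zero.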
Multicollinearity concerns the magnitude of the relationships among the predictor variables and is checked with tolerance statistics. A variable’s tolerance equals 1 minus the R² obtained by regressing that predictor on all of the other predictor variables, ignoring the criterion variable. A tolerance approaching zero indicates that most of that variable’s variance is explained by the other predictor variables in the regression model (Conner, 2002). None of the tolerance scores in the current research’s regression analyses approached zero, indicating that multicollinearity was not a problem.
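Tolerance can be computed without running a separate regression for each predictor, because the j-th diagonal element of the inverted predictor correlation matrix equals the variance inflation factor (VIF), and tolerance = 1 − R² = 1 / VIF. The sketch below relies on that identity; the data and helper names are hypothetical.

```python
from math import sqrt
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlations among the predictor columns."""
    def r(a, b):
        ma, mb = mean(a), mean(b)
        num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
        return num / den
    return [[r(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tolerances(columns):
    """Tolerance_j = 1 - R_j^2 = 1 / VIF_j, with VIF_j the j-th diagonal
    element of the inverted predictor correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [1.0 / inv[j][j] for j in range(len(columns))]


# Hypothetical predictors correlated at r = .80.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
tol = tolerances([x1, x2])
print([round(t, 2) for t in tol])
```

With two predictors correlated at .80, each tolerance is 1 − .80² = .36; values close to zero would flag the kind of redundancy described above.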
Finally, regression models assume that the errors are uncorrelated. This independence assumption was assessed with the Durbin-Watson statistic, where values below .80 tend to indicate autocorrelation. In the present study, the Durbin-Watson statistics for the five moderated hierarchical regressions ranged from 1.66 to 2.16, indicating that the independence assumption was not violated.
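The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the residual sum of squares. It ranges from 0 to 4, with values near 2 indicating no first-order autocorrelation. A minimal sketch with a hypothetical function name:

```python
def durbin_watson(residuals):
    """Sum of squared successive residual differences over the residual sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


print(durbin_watson([1, -1, 1, -1]))  # 3.0: residuals alternate in sign
print(durbin_watson([1, 1, 1, 1]))    # 0.0: residuals move together
```

Because the statistic depends on the order of the residuals, it is computed in the order in which cases appear in the data file.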
After assumption checking, the main effects and interaction effects of the predictor variables on the criterion variables were tested with t-tests of the regression coefficients. In each of the five sets of regression analyses, the predictor and moderator variables were entered in the first step of the hierarchical regression, estimating the main effects while controlling for the correlations among them. The interaction terms were then entered in the second step, testing for interaction effects whilst controlling for the main effects and for the correlations among the interactions. The R² statistics of the full models indicate how much of the variance in each criterion variable is explained by all investigated effects together. Because the large number of interaction terms reduced statistical power, R² changes were not used to detect interactions. A main or interaction effect was considered to make a significant, unique contribution to explaining the variance in a criterion variable at p < .05.
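The two-step procedure can be sketched with an ordinary least-squares fit: main effects first, then the centred product term, with a t value for each coefficient. This is a plain-Python illustration on invented data with hypothetical helper names, not the SPSS output used in the thesis; p-values would additionally require the t distribution (for example from SciPy), so the sketch reports t values only.

```python
from math import sqrt
from statistics import mean


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """OLS with an intercept; returns (coefficients, t values, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    my = mean(y)
    sst = sum((yi - my) ** 2 for yi in y)
    mse = sse / (n - k)  # residual variance on n - k degrees of freedom
    se = [sqrt(mse * inv[a][a]) for a in range(k)]
    t = [c / s if s > 0 else float("inf") for c, s in zip(coefs, se)]
    return coefs, t, 1 - sse / sst


def centre(values):
    m = mean(values)
    return [v - m for v in values]


# Hypothetical predictor x, moderator z and criterion y with a built-in interaction.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
z = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
xc, zc = centre(x), centre(z)
xz = [a * b for a, b in zip(xc, zc)]  # product term from centred scores
noise = [0.01, -0.01] * 5
y = [1 + 2 * a + 3 * b + 0.5 * p + e for a, b, p, e in zip(xc, zc, xz, noise)]

# Step 1: main effects only. Step 2: add the interaction term.
b1, t1, r2_step1 = ols([xc, zc], y)
b2, t2, r2_step2 = ols([xc, zc, xz], y)
for name, b, t in zip(["const", "x", "z", "x*z"], b2, t2):
    print(f"{name:>5}: b = {b:6.3f}, t = {t:8.1f}")
print(f"R^2 step 1 = {r2_step1:.3f}, R^2 full model = {r2_step2:.3f}")
```

Each |t| would be compared with the two-tailed critical t for n − k degrees of freedom (about 2.45 for the 6 degrees of freedom in this toy example at α = .05).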
To examine the nature of any significant interactions more closely, the Modgraph software (http://www.vuw.ac.nz/psych/staff/paul-jose/files/helpcentre/help1_intro.php, accessed 1.8.2007) was used to create graphical representations of the significant interactions and to calculate the associated simple slope statistics. For these calculations, trimmed regression models were used that included only the significant main and interaction effects explaining the variance in each criterion variable. Trimmed models were used because dropping predictor variables that made no significant, unique contribution to explaining the variance of the criterion variable allowed the sample size to increase (fewer cases were lost through listwise deletion), which generated more statistical power for testing the simple slopes. The Modgraph output consists of graphs showing the regression lines for three levels of the moderator: medium (the mean), high (one standard deviation above the mean) and low (one standard deviation below the mean), following Aiken and West (1991).
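The simple slopes behind such graphs follow directly from the regression coefficients. With centred predictors and the model Y = b0 + b1·X + b2·Z + b3·X·Z, the line for X at moderator value Z has intercept b0 + b2·Z and slope b1 + b3·Z, and the slope's standard error is √(var(b1) + 2Z·cov(b1, b3) + Z²·var(b3)) (Aiken & West, 1991). The sketch below applies this at −1 SD, the mean and +1 SD of Z; it is not Modgraph’s own code, and the coefficient values and function name are hypothetical.

```python
from math import sqrt


def simple_slopes(b0, b1, b2, b3, sd_z, var_b1=None, var_b3=None, cov_b1_b3=0.0):
    """Regression lines of Y on X at low (-1 SD), medium (mean) and high (+1 SD) Z.

    Assumes X and Z were mean-centred, so the mean of Z is 0.
    Returns {label: (intercept, slope, standard error of slope or None)}.
    """
    lines = {}
    for label, z in (("low", -sd_z), ("medium", 0.0), ("high", sd_z)):
        intercept = b0 + b2 * z
        slope = b1 + b3 * z
        se = None
        if var_b1 is not None and var_b3 is not None:
            se = sqrt(var_b1 + 2 * z * cov_b1_b3 + z ** 2 * var_b3)
        lines[label] = (intercept, slope, se)
    return lines


# Hypothetical trimmed-model estimates and coefficient (co)variances.
lines = simple_slopes(b0=3.0, b1=2.0, b2=1.0, b3=0.5, sd_z=2.0,
                      var_b1=0.04, var_b3=0.01)
for label, (intercept, slope, se) in lines.items():
    print(f"{label:>6}: Y = {intercept:.2f} + {slope:.2f}X, t = {slope / se:.2f}")
```

Each simple slope's t value is the slope divided by its standard error, tested on the residual degrees of freedom of the trimmed model.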