
4.2 Statistical Analyses

4.2.1 Data Screening and Assumption Testing

The generation of sample frequencies and descriptive statistics allowed the data to be screened. No out-of-range values were noted, and the means and standard deviations were plausible. Missing data were found only for four participants, on the social support questionnaire. Because the omitted items were not going to be used in the main analysis, and because a relatively large amount of data had been omitted, it was decided that the missing data would not be substituted; through pairwise deletion, these cases were excluded from the descriptive analyses involving the missing items.
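Pairwise deletion differs from listwise deletion in that each pair of variables uses every case with values on both, rather than dropping a case from all analyses. A minimal standard-library sketch of a Pearson correlation under pairwise deletion is shown below. The variable names and scores are hypothetical, and missing responses are assumed to be coded as `None`.

```python
from math import sqrt

def pearson_pairwise(x, y):
    """Pearson r using only cases where both x and y are present.

    Returns (r, n_used) so the effective sample size of each pair is visible.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy), n

# Hypothetical scores; None marks an omitted social support item.
bonding = [12, 15, 14, 18, 20, 17, 16]
support = [30, None, 33, 40, None, 38, 35]
anxiety = [8, 6, 7, 4, 3, 5, 5]

r_bs, n_bs = pearson_pairwise(bonding, support)  # uses the 5 complete pairs
r_ba, n_ba = pearson_pairwise(bonding, anxiety)  # keeps all 7 cases
```

Under listwise deletion, both correlations would instead be computed on the same five cases.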

Assumptions of univariate normality, linearity and homoscedasticity were tested for the variables to be used in the correlational analysis. Inspection of histograms showed that many of the variables were not normally distributed. Some skewness was to be expected given the nature of the data, with variables tending to be skewed toward the functional end of the scale. Skewness statistics were obtained, and three of the seven measures (antenatal bonding, postnatal bonding at six weeks and received social support in the antenatal period) had skewness values more negative than -1.00 (that is, an absolute skewness greater than 1.00). The most heavily skewed variable was received social support, with a skewness of -1.32. Scatter plots showed no signs of non-linearity, but the assumption of homoscedasticity was somewhat violated for received social support. This was not surprising, given the skewness already noted in the same variable. Nevertheless, these findings indicated that its inclusion in both the correlational and the regression analyses needed to be reconsidered.
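The skewness screen can be sketched with the sample-size-adjusted Fisher-Pearson coefficient (G1). This is assumed here to correspond to the statistic reported by standard packages such as SPSS. Scales whose absolute skewness exceeds 1.00 are flagged, matching the threshold used above. The scale names and scores in the test are hypothetical.

```python
from math import sqrt

def adjusted_skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (corrected for sample size)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1

def flag_skewed(scales, cutoff=1.0):
    """Names of scales whose absolute skewness exceeds the cutoff."""
    return [name for name, vals in scales.items()
            if abs(adjusted_skewness(vals)) > cutoff]
```

A negative value indicates that scores pile up at the high end of the scale with a tail toward low scores.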

Multiple regression assumes normality of the dependent variable, an adequate ratio of cases to independent variables, that outliers and multicollinearity have been dealt with, and normality, linearity, homoscedasticity and independence of the residuals.

Some skewness had been noted above for the dependent variable, postnatal bonding at six weeks. However, a one-sample Kolmogorov-Smirnov test indicated that the normality assumption was satisfied: the observed p-value (.608) exceeded the .05 significance level, so the null hypothesis of normality was retained, allowing the data to be fitted by a multiple regression model.
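A one-sample Kolmogorov-Smirnov test compares the empirical cumulative distribution of the scores with a normal distribution whose mean and standard deviation are estimated from the same scores. The standard-library sketch below computes the statistic D and an approximate p-value from the asymptotic Kolmogorov distribution, using Stephens' small-sample adjustment. It illustrates the technique under these assumptions and does not reproduce the analysis reported here. Packages differ in how they compute the p-value, and when the parameters are estimated from the data the uncorrected p-value tends to be conservative; the Lilliefors correction addresses this.

```python
from math import exp, sqrt
from statistics import NormalDist, mean, stdev

def kolmogorov_q(lam, terms=100):
    """Asymptotic P(D > d): 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series is essentially 1 here and converges poorly
    total = sum(2 * (-1) ** (k - 1) * exp(-2 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(max(total, 0.0), 1.0)

def ks_normal(values):
    """One-sample K-S test against a normal with mean and SD estimated from the data."""
    xs = sorted(values)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # largest gap on either side of each step of the empirical CDF
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d  # Stephens' adjustment
    return d, kolmogorov_q(lam)
```

A p-value above .05 means only that normality is not rejected. It does not prove that the scores are normally distributed.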

The ratio of cases to independent variables is a matter of research design and was discussed in the power analysis section (Chapter 3, page 80), where the sample size was determined so as to ensure adequate power for the anticipated effect size. Because participant attrition over time was higher than originally estimated, the sample was somewhat smaller than proposed: 52 participants took part at both the first and second time points, compared with the 60 proposed by the sample size determination in the power analysis. Consequently, if the model were tested as proposed, the observed power might not be sufficient, particularly if the effect size was small. While it was still considered feasible to conduct the analysis with this number of participants, the results should be treated as preliminary. In an attempt to improve the observed power, it was decided that the regression model would also be tested without the independent variable of received social support, thereby reducing the number of independent variables in the model. The regression was run both ways, and the inclusion of the social support variable did not improve the model fit in any way. For this reason, the social support variable was not included in the multiple regression analysis reported below.

An analysis of both univariate and multivariate outliers was carried out, and extreme cases were noted. Because the sample was relatively small, the outliers were not removed immediately; instead, the analyses were carried out both with and without these cases. Removing univariate outliers did not significantly improve either the Pearson r correlations or the fit of the hypothesised multiple regression model. Diagnostic tools, including studentized deleted residuals, Cook’s distances and leverage values, were used to check for multivariate outliers and influential points. Removing the three cases identified as multivariate outliers did not significantly improve the fit of the model, although it somewhat increased the value of the regression coefficient.
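Studentized deleted residuals, Cook's distances and leverage values can all be derived from a single least-squares fit. The standard-library sketch below returns these three quantities for each case in a model with an intercept. The cut-offs in the hypothetical helper `flag_cases` are common rules of thumb assumed for illustration: leverage above 2p/n, Cook's distance above 4/n, and an absolute studentized deleted residual above 3. The cut-offs actually used in this analysis are not stated.

```python
from math import sqrt

def _invert(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]

def regression_diagnostics(predictors, y):
    """Leverage, Cook's D and studentized deleted residual per case (OLS with intercept).

    predictors: one list of predictor values per case.
    """
    X = [[1.0] + list(row) for row in predictors]
    n, p = len(X), len(X[0])  # p counts the intercept
    xtx_inv = _invert([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sse = sum(e * e for e in resid)
    mse = sse / (n - p)
    out = []
    for r, e in zip(X, resid):
        # leverage: diagonal element of the hat matrix, x_i' (X'X)^-1 x_i
        h = sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p))
        cook = e * e * h / (p * mse * (1 - h) ** 2)
        t_del = e * sqrt((n - p - 1) / (sse * (1 - h) - e * e))
        out.append({"leverage": h, "cooks_d": cook, "stud_del_resid": t_del})
    return out

def flag_cases(diags, n_params):
    """Indices of cases exceeding any of the illustrative cut-offs."""
    n = len(diags)
    return [i for i, d in enumerate(diags)
            if d["leverage"] > 2 * n_params / n
            or d["cooks_d"] > 4 / n
            or abs(d["stud_del_resid"]) > 3]
```

Consistent with the procedure described above, flagged cases are best treated as a prompt to rerun the analysis with and without them, not as a rule for automatic deletion.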

No issues of multicollinearity were noted. Although antenatal anxiety and antenatal depression were highly correlated (r = 0.76, p < 0.001), this correlation was still below the recommended threshold of r = 0.80 used to assess the independence of variables. As a further check, collinearity statistics were also computed, and the tolerance values gave no evidence of multicollinearity.
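For reference, the tolerance of predictor j is one minus the squared multiple correlation obtained when that predictor is regressed on all the other predictors. The variance inflation factor (VIF) is the reciprocal of the tolerance. As an illustrative simplification, assume a model whose only predictors are antenatal anxiety and antenatal depression. The reported correlation then gives:

```latex
\begin{aligned}
\mathrm{Tol}_j &= 1 - R_j^2, \qquad \mathrm{VIF}_j = \frac{1}{\mathrm{Tol}_j} \\
\mathrm{Tol} &= 1 - 0.76^2 = 1 - 0.5776 = 0.4224, \qquad \mathrm{VIF} = \frac{1}{0.4224} \approx 2.37
\end{aligned}
```

Both values are well away from the commonly cited warning levels of tolerance below .10 (VIF above 10). Adding further predictors can only keep R_j^2 the same or raise it, so in a larger model the tolerance could be lower than this two-predictor figure.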

After the three cases showing evidence of being outliers had been removed, the scatter plot of standardized residuals against standardized predicted values indicated that the assumptions of multivariate homoscedasticity and linearity had been met. The normal probability plot indicated that the assumption of multivariate normality had not been violated.
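In regression software such as SPSS, a normal probability plot of residuals is typically drawn as a P-P plot. Each ordered standardized residual is paired with the cumulative probability that the standard normal distribution predicts for it, and points lying close to the diagonal support the normality assumption. The standard-library sketch below builds those coordinate pairs. Two simplifying assumptions apply: the residuals are z-scored, and the plotting position (i - 0.5)/n is used. Packages may handle both details slightly differently.

```python
from statistics import NormalDist, mean, stdev

def standardize(residuals):
    """z-score the residuals (a simplifying assumption, see lead-in)."""
    m, s = mean(residuals), stdev(residuals)
    return [(r - m) / s for r in residuals]

def pp_points(residuals):
    """(observed cumulative proportion, expected normal probability) per residual."""
    z = sorted(standardize(residuals))
    n = len(z)
    std_normal = NormalDist()
    return [((i - 0.5) / n, std_normal.cdf(v)) for i, v in enumerate(z, start=1)]

def max_departure(points):
    """Largest vertical distance of any point from the 45-degree line."""
    return max(abs(observed - expected) for observed, expected in points)
```

A small maximum departure corresponds to points hugging the diagonal. In practice the plot itself, rather than any single summary number, is what is inspected.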

In document ANTENATAL PREDICTORS OF MATERNAL BONDING FOR ADOLESCENT MOTHERS (Page 109-113)