Study One: A Survey of Physical Activity and Quality of Life in People with Psychosis
4.2 Aims and objectives
4.4.6 Data screening
The data was screened for the distribution of scores, outliers and missing data.
4.4.6.1 Missing data
Missing data was limited, as can be seen in tables 4.4.3c, 4.4.4 and 4.4.5. The data that was missing was checked to investigate whether a pattern existed across variables or participants. No pattern was found, so the missing values were replaced with a group mean. Tabachnick & Fidell (2001) recommend this method because it is not as liberal as using prior knowledge (i.e. a well-educated guess) and not as conservative as inserting overall mean values. The group means were based upon a person’s diagnosis, as this was deemed the most important determining factor. The three missing values for the PA data were replaced with the mean PA score for that participant’s diagnosis: 1137.13 for the participant with bipolar disorder and 1222.43 for those with schizophrenia. The same method was used for the missing values on the other variables to keep the treatment of missing values consistent.
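The group-mean replacement described above can be illustrated with a short Python sketch. It is not the procedure actually run for this study; the column names, the helper name impute_group_mean and the data values are hypothetical and chosen only to show the idea of filling a missing score with the mean of the participant's diagnostic group.

```python
import pandas as pd

# Hypothetical data: column names and values are illustrative only.
df = pd.DataFrame({
    "diagnosis": ["schizophrenia", "schizophrenia", "schizophrenia",
                  "bipolar", "bipolar", "bipolar"],
    "pa_total_met": [1000.0, 1400.0, None, 900.0, None, 1300.0],
})


def impute_group_mean(data, value_col, group_col):
    """Return value_col with missing entries replaced by the mean of the row's group."""
    # transform("mean") gives every row the mean of its own group (NaN is ignored).
    group_means = data.groupby(group_col)[value_col].transform("mean")
    return data[value_col].fillna(group_means)


df["pa_total_met_imputed"] = impute_group_mean(df, "pa_total_met", "diagnosis")
print(df)
```

Observed scores are left unchanged; only the missing entries take the mean of their own diagnostic group.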
4.4.6.2 Distributions
Most statistical tests assume that the distribution of variables is normal, especially multivariate analyses such as MANOVA and multiple regression. The data was tested for normality by investigating the skewness and kurtosis values and the Kolmogorov-Smirnov and Shapiro-Wilk values, and also by inspecting a histogram with a normal distribution curve, a normal probability plot and a detrended normal Q-Q plot. The results for the normality tests of the continuous variables are in table 4.4.6.2.
Table 4.4.6.2 Distributions of continuous variables
(K-S = Kolmogorov-Smirnov; S-W = Shapiro-Wilk)

Variable                Skewness  Kurtosis  K-S Stat   K-S Sig  S-W Stat   S-W Sig
Age                        0.020    -1.182     0.092     0.183     0.957     0.012
PH                        -0.654     0.122     0.101     0.046     0.964     0.025
MH                        -0.003    -0.742     0.062     0.200     0.982     0.311
General Health            -0.217    -0.867     0.215     0.000     0.905     0.000
Physical Functioning      -0.876    -0.319     0.244     0.000     0.823     0.000
Role Physical             -0.300    -0.824     0.137     0.001     0.923     0.000
Role Emotional           -0.0099    -0.935     0.186     0.000     0.909     0.000
Bodily Pain               -0.679    -1.032     0.256     0.000     0.800     0.000
Vitality                  -0.234    -0.536     0.144     0.000     0.953     0.005
Mental Health              0.430    -0.404     0.195     0.000     0.915     0.000
Social Functioning        -0.639    -0.422     0.207     0.000     0.856     0.000
Autonomy                  -0.135    -0.405     0.062     0.200     0.983     0.359
Competence                -0.092    -0.243     0.050     0.200     0.989     0.727
Relatedness               -0.043    -0.802     0.074     0.200     0.970     0.063
Depression                 0.768     0.060     0.108     0.024     0.936     0.001
PA total MET               2.798    10.834     0.214     0.000     0.707     0.000
PA walking MET             1.883     2.797     0.227     0.000     0.722     0.000
PA moderate MET            3.165    11.826     0.317     0.000     0.546     0.000
PA vigorous MET            3.399    13.254     0.348     0.000     0.846     0.000
PA total time               3.10     11.42      0.25      0.00      0.63      0.00
PA walking time             2.87      8.62      0.27      0.00      0.60      0.00
PA moderate time            3.17     11.83      0.32      0.00      0.55      0.00
PA vigorous time            3.40     13.25      0.35      0.00      0.35      0.00

If a distribution is perfectly normal, its skewness and kurtosis values are 0. For a distribution to be treated as normal, the Kolmogorov-Smirnov and Shapiro-Wilk significance values need to be greater than 0.05. As table 4.4.6.2 shows, the only variables that appear to be normally distributed on these tests are autonomy, competence, relatedness and MH. After inspecting the histogram, normal probability plot and detrended normal Q-Q plot, it was decided to treat age and PH as normally distributed as well, because their histograms were approximately normal in shape and the points on their normal probability plots lay close to a straight line.
Transformation was attempted for all of the non-normally distributed variables; see section 4.4.6.4 for a description of this.
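As a rough illustration of the numerical checks used in this section, the following Python sketch computes skewness, kurtosis and the two normality tests for a single variable. The function name normality_summary and the simulated data are hypothetical. One assumption matters for interpretation: SciPy's plain Kolmogorov-Smirnov test, used here with the sample mean and SD, does not apply the Lilliefors correction that some statistical packages use, so its significance values would not match a corrected test exactly.

```python
import numpy as np
from scipy import stats


def normality_summary(x):
    """Return skewness, excess kurtosis and two normality tests for one variable."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values

    # Bias-corrected sample skewness and excess kurtosis (both 0 for a normal curve).
    skewness = stats.skew(x, bias=False)
    kurtosis = stats.kurtosis(x, fisher=True, bias=False)

    # Shapiro-Wilk test of normality.
    sw_stat, sw_p = stats.shapiro(x)

    # Kolmogorov-Smirnov test against a normal with the sample mean and SD.
    # Assumption: no Lilliefors correction, so this p-value is only approximate.
    ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

    return {
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
        "ks_stat": float(ks_stat),
        "ks_p": float(ks_p),
        "sw_stat": float(sw_stat),
        "sw_p": float(sw_p),
    }


# Hypothetical positively skewed scores, loosely resembling an activity measure.
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=1000.0, size=120)
result = normality_summary(skewed)
print(result)
print("treat as normal:", result["ks_p"] > 0.05 and result["sw_p"] > 0.05)
```

The numerical tests would still be read alongside the histogram and probability plots, as was done for age and PH above.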
4.4.6.3 Outliers
Univariate and multivariate outliers were inspected to investigate whether they could account for the non-normally distributed data.
Mahalanobis distance was employed to search for multivariate outliers, and it showed that there were none. The distance for the variables used in this study was 15.954, which is below 24.32, the critical chi-square value for 7 variables (df = 7, p = .001). These 7 variables are the two subscales of QoL (PH & MH), total volume of PA, depression, autonomy, competence and relatedness.
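The comparison above can be sketched in Python as follows. This is a minimal illustration, not the analysis run for the study: the function name mahalanobis_outliers and the simulated data are hypothetical, and the alpha level of .001 is the conventional level that corresponds to the critical value of 24.32 for 7 variables.

```python
import numpy as np
from scipy import stats


def mahalanobis_outliers(X, alpha=0.001):
    """Squared Mahalanobis distance of each case from the centroid, with outlier flags."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Quadratic form diff_i' * inv_cov * diff_i for every case i.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    # Critical chi-square value with df equal to the number of variables.
    critical = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return d2, critical, d2 > critical


# Hypothetical data: 100 cases on 7 variables, with one planted extreme case.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 7))
X[0] = 10.0

d2, critical, flags = mahalanobis_outliers(X)
print(round(critical, 2))       # critical value for 7 variables at p = .001
print(np.flatnonzero(flags))    # indices of cases flagged as multivariate outliers
```

A data set has no multivariate outliers under this rule when the largest squared distance is below the critical value.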
Boxplots were inspected to assess if univariate outliers existed on each of the scales and subscales. Numerous outliers were evident on all of the PA variables, one outlier was evident on PH, and there were no other outliers on the non-normally distributed variables.
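The boxplot inspection can be approximated numerically. The sketch below assumes the common convention that a boxplot marks as outliers any scores lying more than 1.5 interquartile ranges beyond the quartiles; the function name boxplot_outliers and the scores are hypothetical, and quartile definitions differ slightly between software packages.

```python
import numpy as np


def boxplot_outliers(x, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]


# Hypothetical scores with one extreme value.
scores = [10, 12, 11, 13, 12, 11, 50]
print(boxplot_outliers(scores))
```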
Multivariate statistics are extremely sensitive to outliers; it is therefore imperative that they are considered. Initially, the outliers were checked to verify that they had been entered correctly. According to Tabachnick & Fidell (2001) there are a number of strategies to reduce the influence of outliers. Outliers could be deleted; however, as multivariate statistics are also sensitive to a small sample size, this was decided against because a number of values would have needed to be deleted from the PA variables. In addition, the outliers were seen as accurate and representative of the population.
Different strategies were employed for treating the outliers on the PA variables and on PH. As PH was treated as normally distributed and had only one outlier, this case was changed to be only one raw score below the next most extreme case in the distribution; the score was changed from 12.30 to 19.92. Tabachnick & Fidell (2001) suggest that this method is attractive because the measurement of variables is often arbitrary, as is the case with QoL measures and the SF-12.
The outliers on the PA variables were kept because they were deemed representative of the population. An attempt was therefore made to transform these variables alongside the other non-normally distributed variables (depression and the QoL subscales).
4.4.6.4 Transformation of variables
Transformations were attempted on all of the non-normally distributed variables.
Transformation of variables is not always recommended, as interpretation of the variables can become difficult. Tabachnick & Fidell (2001) suggest that if the scale on which a variable is measured is meaningful, transformation can hinder interpretation; if the scale is arbitrary, interpretation should not be any more difficult. METs are a representative measure of PA energy expenditure and could be described as meaningful. However, the interpretation required here is of correlations, which should not be hindered by transforming the variable. The measure of depression, the BDI-II, is an arbitrary measure, as are the 8 subscales of the QoL measure.
Transformations can improve the analysis, reduce the influence of outliers and help the variables meet the assumptions of the statistical analysis (Tabachnick & Fidell, 2001). The normal distribution curves and histograms were inspected to decide which method of transformation was required.
Following numerous attempts to transform the variables using various methods, only the total volume of PA score and the depression score could be transformed into a normal distribution.
A logarithmic transformation was applied to the total PA score and a square root transformation to the depression data. After transformation both variables were normally distributed with no outliers; see table 4.4.6.4.
Table 4.4.6.4 Distributions of total volume of PA and depression after transformation
(K-S = Kolmogorov-Smirnov; S-W = Shapiro-Wilk)

Variable              Mean    SD      Range  Skewness  Kurtosis  K-S Stat  K-S Sig  S-W Stat  S-W Sig
Total volume of PA    2.93  0.52  1.52–4.07    -0.274    -0.179      0.08     0.20      0.99     0.77
Depression            3.72  1.86     0–7.48    -0.264    -0.501      0.07     0.20      0.98     0.14
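The two transformations can be sketched in Python as below. This is an illustration under stated assumptions rather than the study's own procedure: the base of the logarithm is not stated in the text, so base 10 is assumed here, and the function names and simulated scores are hypothetical.

```python
import numpy as np
from scipy import stats


def log_transform(x):
    """Base-10 logarithm (assumed base); defined only for positive scores."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log transform requires strictly positive values")
    return np.log10(x)


def sqrt_transform(x):
    """Square root; defined for zero and positive scores."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("square root transform requires non-negative values")
    return np.sqrt(x)


# Hypothetical, strongly positively skewed scores.
rng = np.random.default_rng(2)
raw = rng.lognormal(mean=7.0, sigma=1.0, size=500)

skew_before = stats.skew(raw, bias=False)
skew_after = stats.skew(log_transform(raw), bias=False)
print(skew_before, skew_after)  # skewness should move towards 0 after the log
```

A logarithm compresses the upper tail more strongly than a square root, so it suits the more severely skewed variable, while the square root remains defined for scores of zero.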