CENTRE OF PRESSURE IN THE GOLF SWING: GROUP-BASED ANALYSIS
5.2.1 Parameters
5.2.2.1 Relationship between weight transfer and performance
5.2.2.1.2 MULTIPLE REGRESSION ANALYSIS
5.2.2.1.2.1 Cluster analysis to reduce the number of CP parameters
Because N in this study was lower than planned, and was reduced further by the presence of styles in the data, the number of parameters used in multiple regression needed to be reduced. To achieve the parameter:case ratio (number of parameters compared with the number of cases, or golfers, in this study) of 1:5 recommended by Tabachnick and Fidell (1996) as a minimum requirement, cluster analysis was performed (clustering parameters rather than golfers as in study 1).
Cluster analysis was performed using the same process as described in study 1 but with four differences in this process:
1. Clustering grouped similar parameters together rather than similar golfers.
2. The Pearson’s correlation measure was used, rather than the squared Euclidean distance measure, as it operates independently of measurement scales (different scales were used for the different parameters, e.g. CPy%: 1-100, CPy Velocity: approximately -5 to 5 m.s-1).
3. Validation of the solution included Point Biserial Correlation as well as cross correlations. Correlations between parameters within and between the cluster groups were visually inspected. A valid solution returned high correlations between parameters within a cluster and low correlations between parameters in different clusters. As the aim of the analysis was to reduce the number of parameters rather than to identify a result that was generalisable outside of this study, no replication was performed.
4. The maximum number of clusters was predetermined by the N in each group (i.e. seven for the Front Foot group and three for the Reverse group so the 1:5 ratio was achieved). The optimal solution was chosen as the strongest Point Biserial Correlation and C-Index within these constraints.
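The scale argument in point 2 can be illustrated with a minimal sketch. The values below are hypothetical (they are not data from this study) and the function names are illustrative only. Re-expressing CPy Velocity in different units leaves the Pearson correlation unchanged, whereas the squared Euclidean distance between the same two parameters changes substantially.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical values for six golfers (assumption, for illustration only).
cpy_percent = [62.0, 71.0, 55.0, 80.0, 68.0, 74.0]    # percentage scale
cpy_velocity = [0.8, 1.9, 0.2, 2.6, 1.1, 2.0]         # m.s-1
cpy_velocity_cm = [100.0 * v for v in cpy_velocity]   # same parameter in cm.s-1

r_m = pearson(cpy_percent, cpy_velocity)
r_cm = pearson(cpy_percent, cpy_velocity_cm)
d_m = squared_euclidean(cpy_percent, cpy_velocity)
d_cm = squared_euclidean(cpy_percent, cpy_velocity_cm)

print(f"Pearson r: {r_m:.3f} (m.s-1) vs {r_cm:.3f} (cm.s-1)")
print(f"Squared Euclidean: {d_m:.1f} (m.s-1) vs {d_cm:.1f} (cm.s-1)")
```

Only the similarity measure is shown here; the clustering procedure itself follows study 1 and is not reproduced.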
Once the clusters were formed, one CP parameter was chosen from each cluster for use in regression analysis. This was based on two levels of decision-making:
1. Strongest partial correlation with Club Velocity (controlling for Age).
2. Indicated as significant in previous research (if correlations were similar between parameters and Club Velocity).
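The first decision level can be sketched as follows. This is a minimal illustration using the standard first-order partial correlation formula. The data and the names `cp_param_a` and `cp_param_b` are hypothetical, and reading "strongest" as the largest absolute value is an assumption.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical values for eight golfers (assumption, not study data).
age = [23, 35, 41, 52, 29, 60, 47, 33]
club_velocity = [48.1, 45.0, 44.2, 41.0, 47.3, 38.5, 42.8, 45.9]

# Hypothetical cluster containing two CP parameters.
cluster = {
    "cp_param_a": [0.8, 1.9, 0.2, 2.6, 1.1, 2.0, 1.4, 0.9],
    "cp_param_b": [62.0, 71.0, 55.0, 80.0, 68.0, 74.0, 59.0, 66.0],
}

# Partial correlation of each parameter with Club Velocity, controlling for Age.
partials = {name: partial_corr(values, club_velocity, age)
            for name, values in cluster.items()}

# Assumption: "strongest" means largest absolute partial correlation.
chosen = max(partials, key=lambda name: abs(partials[name]))
print(partials, chosen)
```

The second decision level (support from previous research when correlations are similar) is a judgement call and is not encoded here.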
Two other methods for parameter reduction were considered along with cluster analysis but were discounted on statistical and theoretical grounds. First, factor analysis was applied to the data but failed diagnostic tests (Kaiser-Meyer-Olkin and Bartlett's test of sphericity < 0.6), indicating it was inappropriate for use with these data. Second, limiting CP parameters to those that were significantly correlated with Club Velocity was also considered and performed in pilot work. However, it was felt that it would be more useful to include parameters that did not necessarily correlate significantly with Club Velocity on their own but might contribute significantly, in combination with other parameters, to the regression predicting Club Velocity. Given that this was the first time styles had been examined and that regression analyses have not previously been performed on weight transfer data, this represented exploratory work. As such, increasing rather than decreasing the number of parameters, within the statistical recommendations for parameter:case ratios, was appropriate.
5.2.2.1.2.2 Best subsets regression
Once the parameters for use in regression analysis were chosen, a best subsets regression was applied to the data (Minitab 11). Best subsets regression uses a combination of Mallows' Cp (total square error) and Best Multiple R2 assessment, as recommended by Daniel and Wood (1980), to determine the ‘best’ regression equation for the data. Briefly, a regression equation was calculated for all possible combinations of independent variables (Age was included in all), along with the total square error (Cp) for each regression. The combination of parameters chosen to represent the best subset regression was based on the largest R2 value for the smallest Cp (error) value. Tabachnick and Fidell (1996) recommend this approach if the parameter:case ratio is low (discussed in section 5.2.2.1.2.1). This process has been used in previous sports biomechanics applications (e.g. Ball et al., 2003a; Ball et al., 2003b) and is explained in more detail in Results section 5.3.1.2.
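The procedure can be sketched in a few lines. This is not the Minitab implementation; it is a minimal illustration on simulated (hypothetical) data, using the usual definition Cp = SSE_p / MSE_full - (n - 2p), where p counts the intercept plus predictors. Ranking by smallest Cp and then largest R2 is one reading of the selection rule described above, and the names `cp_a`, `cp_b` and `cp_c` are placeholders.

```python
import random
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def sse(columns, y):
    """Residual sum of squares for y regressed on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

# Simulated, hypothetical data (assumption): 20 golfers, Age and three CP parameters.
random.seed(1)
n = 20
age = [random.uniform(20, 60) for _ in range(n)]
cp_params = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("cp_a", "cp_b", "cp_c")}
club_velocity = [50 - 0.2 * a + 1.5 * x + random.gauss(0, 1)
                 for a, x in zip(age, cp_params["cp_a"])]

mean_y = sum(club_velocity) / n
sst = sum((v - mean_y) ** 2 for v in club_velocity)
p_full = len(cp_params) + 2   # intercept + Age + all CP parameters
mse_full = sse([age] + list(cp_params.values()), club_velocity) / (n - p_full)

results = []
for size in range(len(cp_params) + 1):
    for combo in combinations(cp_params, size):
        p = size + 2          # intercept + Age (always included) + chosen CP parameters
        s = sse([age] + [cp_params[c] for c in combo], club_velocity)
        results.append({"params": combo, "r2": 1 - s / sst,
                        "cp": s / mse_full - (n - 2 * p)})

# One reading of the selection rule: smallest Cp first, largest R2 as tie-breaker.
best = min(results, key=lambda r: (r["cp"], -r["r2"]))
print(best)
```

By construction the full model always has Cp equal to its number of coefficients, so Cp is informative mainly for comparing the smaller subsets.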
5.2.2.1.2.3 Full regression
Once the best subset regression was chosen, a full standard linear regression analysis was performed in SPSS 10. Both Best Subsets and Full Regression were needed because of the limited output from the Best Subsets regression. While Best Subsets identifies the best regression for a given set of data and outputs overall R2 and error values, it does not output information such as the change in R2 for individual parameters in the regression. As such, the full regression was not a separate analysis from Best Subsets. Rather, it was a repetition of the same analysis using software that would output more information.
As Age was strongly correlated with Club Velocity, it was included in the first block in each regression calculation. The first block refers to the process where Age is entered into the regression first and without any other parameter (to eliminate the effects of Age on the analysis prior to examining the relationship between CPy and Club Velocity). CP parameters were included in the next block. To assess the importance of individual CPy parameters, change in R2 and p-values for each CPy parameter were examined. The effects of Age on this analysis are discussed in section 5.4.1.4.1.
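The blockwise entry can be illustrated with a minimal sketch on simulated (hypothetical) data. It is not the SPSS procedure. The per-parameter change in R2 is computed here as the drop in R2 when that parameter is removed from the full model, which is an assumption about how the individual contributions were obtained, and p-values are omitted.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R2 for y regressed on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

# Simulated, hypothetical data (assumption): 20 golfers, Age and two CP parameters.
random.seed(2)
n = 20
age = [random.uniform(20, 60) for _ in range(n)]
cp = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("cp_a", "cp_b")}
club_velocity = [50 - 0.2 * a + 1.5 * x + random.gauss(0, 1)
                 for a, x in zip(age, cp["cp_a"])]

r2_block1 = r_squared([age], club_velocity)                      # Block 1: Age only
r2_block2 = r_squared([age] + list(cp.values()), club_velocity)  # Block 2: Age + CP parameters
r2_change_block = r2_block2 - r2_block1

# Assumed per-parameter measure: loss in R2 when the parameter is dropped from the full model.
r2_change_param = {}
for name in cp:
    others = [values for key, values in cp.items() if key != name]
    r2_change_param[name] = r2_block2 - r_squared([age] + others, club_velocity)

print(r2_block1, r2_change_block, r2_change_param)
```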
To examine the robustness of the regression, a subset analysis was performed, where a randomly drawn sub-sample of two-thirds of the original sample (i.e. N = 28 for the Front Foot group and N = 12 for the Reverse group) was re-analysed. This was an adaptation of the method suggested by Tabachnick and Fidell (1996) where the sample is halved and the regression analysis repeated on both halves. Due to low N, analysis of a two-thirds subset, as used by Hodge and Petlichkoff (2000) for cluster analysis and used in study 1, was considered more appropriate for this study. For a parameter (and regression) to be considered robust it should be significant in the subset analysis as well as the original analysis.
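Drawing the two-thirds sub-sample can be sketched as below. The group sizes of 42 and 18 are inferred from the stated subset sizes (28 and 12) and are not quoted from the text. The seed and the function name are illustrative assumptions.

```python
import random

def two_thirds_subsample(case_ids, rng):
    """Randomly draw two-thirds of the cases without replacement."""
    k = (2 * len(case_ids)) // 3
    return sorted(rng.sample(case_ids, k))

rng = random.Random(0)             # fixed seed so the draw is repeatable (assumption)
front_foot_ids = list(range(42))   # inferred group size: 28 is two-thirds of 42
reverse_ids = list(range(18))      # inferred group size: 12 is two-thirds of 18

front_subset = two_thirds_subsample(front_foot_ids, rng)
reverse_subset = two_thirds_subsample(reverse_ids, rng)
print(len(front_subset), len(reverse_subset))
```

The regression would then be repeated on each sub-sample, with a parameter treated as robust only if it is significant in both the original and the subset analysis.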
5.2.2.1.2.4 Outliers
The data were screened for outliers and influential cases throughout the regression analysis. This was performed on three levels – univariate (z-scores), bivariate (scatterplots) and multivariate (residual analysis and Difference in fit – DFit).
Prior to best subsets analysis, univariate and bivariate outliers were examined. To identify univariate outliers, z-scores were examined within each parameter. A case with an absolute z-score greater than 3.29 (p < 0.001; recommended by Tabachnick and Fidell, 1996) was considered an outlier. Bivariate outliers were assessed subjectively from visual inspection of scatterplots (as performed for correlation analysis, section 5.2.2.1.1).
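The univariate screen can be sketched as follows. The values are hypothetical, the sample standard deviation (n - 1 denominator) is assumed, and the cut-off is applied to the absolute z-score. The sketch also shows that a cut-off of 3.29 corresponds to a two-tailed probability of about 0.001 under the normal distribution.

```python
from statistics import NormalDist, mean, stdev

Z_CUTOFF = 3.29

# Two-tailed probability of |z| exceeding the cut-off under a standard normal.
two_tailed_p = 2 * (1 - NormalDist().cdf(Z_CUTOFF))

def univariate_outliers(values, cutoff=Z_CUTOFF):
    """Return the indices of cases whose absolute z-score exceeds the cut-off."""
    m, s = mean(values), stdev(values)   # sample SD (assumption)
    return [i for i, v in enumerate(values) if abs((v - m) / s) > cutoff]

# Hypothetical parameter values for 20 golfers; the last case is deliberately extreme.
values = [50, 52, 48, 51, 49, 53, 47, 50, 52, 48,
          51, 49, 50, 53, 47, 51, 49, 50, 52, 95]

flagged = univariate_outliers(values)
print(f"two-tailed p at z = {Z_CUTOFF}: {two_tailed_p:.4f}")
print("flagged case indices:", flagged)
```

Note that with the sample standard deviation the largest attainable z-score is (n - 1)/sqrt(n), so in very small groups no case can exceed 3.29 regardless of how extreme it is.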
After the best subsets regression and prior to the full standard regression analysis, the selected parameters were included in screening for multivariate outliers using Mahalanobis distance. A level of p < 0.001 was set as the cut-off for detecting multivariate outliers as recommended by Tabachnick and Fidell (1996). As this cut-off is different for analyses with different numbers of independent variables, the exact cut-off value is reported in the relevant Results section.
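A minimal sketch of this screen is given below, using simulated (hypothetical) data. Squared Mahalanobis distances are compared with the chi-square critical value at p = 0.001, with degrees of freedom equal to the number of independent variables. The critical values are taken from standard chi-square tables; the variable names are placeholders.

```python
import random

# Chi-square critical values at alpha = 0.001, keyed by degrees of freedom
# (standard table values; df = number of independent variables).
CHI2_CRIT_001 = {1: 10.828, 2: 13.816, 3: 16.266, 4: 18.467, 5: 20.515}

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each case from the centroid of all cases."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    dev = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(p)] for i in range(p)]
    # d' S^-1 d, computed by solving S x = d rather than inverting S explicitly.
    return [sum(a * b for a, b in zip(d, solve(cov, d))) for d in dev]

# Simulated, hypothetical data (assumption): 20 golfers, Age plus two CP parameters.
random.seed(3)
n = 20
rows = [[random.uniform(20, 60), random.gauss(0, 1), random.gauss(60, 8)] for _ in range(n)]

d2 = mahalanobis_sq(rows)
cutoff = CHI2_CRIT_001[len(rows[0])]
multivariate_outliers = [i for i, d in enumerate(d2) if d > cutoff]
print(f"cut-off = {cutoff}, flagged cases = {multivariate_outliers}")
```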
At the completion of the full regression, influential observations (i.e. observations with a notable influence on the R2 value) were assessed. This was performed using two diagnostic tests. First, residuals were examined for each case, with a residual value greater than two considered an outlier (Pedhazur, 1997). Second, DFit, the standardized difference in predicted value with that case removed, was examined. Cases with a larger DFit influence R2 values more substantially and need to be examined further (DFit > 1 was considered influential; Pedhazur, 1997). This assessment identified cases that either increased or decreased the R2 result substantially.
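Both diagnostics can be sketched on simulated (hypothetical) data. Two assumptions are made: the residual criterion is applied to standardized residuals (residual divided by the standard error of estimate), and DFit is computed with the common standardized form, often written DFFITS, which uses the leverage and the error estimate with the case deleted. The exact SPSS output may differ in detail.

```python
import random
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

# Simulated, hypothetical data (assumption): Club Velocity on Age and one CP parameter.
random.seed(4)
n = 20
age = [random.uniform(20, 60) for _ in range(n)]
cp_a = [random.gauss(0, 1) for _ in range(n)]
y = [50 - 0.2 * a + 1.5 * x + random.gauss(0, 1) for a, x in zip(age, cp_a)]
y[0] += 6.0   # deliberately perturb one case so the screen has something to find

rows = [[1.0, a, x] for a, x in zip(age, cp_a)]   # intercept, Age, CP parameter
p = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
beta = solve(xtx, xty)

resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
sse = sum(e * e for e in resid)
s = sqrt(sse / (n - p))                           # standard error of estimate
leverage = [sum(a * b for a, b in zip(r, solve(xtx, r))) for r in rows]

std_resid = [e / s for e in resid]
dffits = []
for e, h in zip(resid, leverage):
    s_deleted = sqrt((sse - e * e / (1 - h)) / (n - p - 1))   # error SD without this case
    t = e / (s_deleted * sqrt(1 - h))                         # studentized deleted residual
    dffits.append(t * sqrt(h / (1 - h)))

residual_outliers = [i for i, z in enumerate(std_resid) if abs(z) > 2]
influential = [i for i, d in enumerate(dffits) if abs(d) > 1]
print("residual outliers:", residual_outliers, "influential cases:", influential)
```

Cases flagged by either criterion would then be removed and the regression repeated to gauge their effect on R2.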
Where outliers or influential cases existed, the analysis was performed with these cases removed to examine their influence on the result.
Examples of the use of these screening methods are presented in Results section 5.3.1.2.