IDENTIFICATION OF WEIGHT TRANSFER STYLES

4.2.6 Statistical analysis

4.2.6.1.2 NON-HIERARCHICAL CLUSTER ANALYSIS

Non-hierarchical analysis was performed using the k-means cluster method in SPSS 10. Each cluster solution below the cut-off identified in the hierarchical process was analysed. For each solution, the seeds (the cluster group means obtained from the hierarchical analysis) provided the starting point, and each golfer was assigned to the cluster with the nearest seed.
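The seeded k-means step can be sketched as follows. This is a minimal illustration in Python, not the SPSS 10 implementation: it assumes the standard iterative procedure (assign each case to the nearest centre by Euclidean distance, recompute each centre as the mean of its members, repeat until no allocation changes), and SPSS may differ in details such as how centres are updated and when iteration stops. The function names `euclidean` and `seeded_kmeans` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def seeded_kmeans(data, seeds, max_iter=100):
    """k-means clustering started from given seeds (e.g. hierarchical group means).

    Each case is assigned to the nearest centre, centres are recomputed as the
    mean of their members, and this repeats until no assignment changes or
    max_iter passes have been made. Returns (labels, centres).
    """
    centres = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda k: euclidean(x, centres[k]))
            for x in data
        ]
        if new_labels == labels:
            break  # converged: allocations are stable
        labels = new_labels
        for k in range(len(centres)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

labels, centres = seeded_kmeans([[0, 0], [0, 1], [5, 5], [6, 5]], [[0, 0], [5, 5]])
print(labels, centres)
```

For these four toy cases the sketch returns the labels `[0, 0, 1, 1]` and the centres `[[0.0, 0.5], [5.5, 5.0]]`; the data are invented for illustration.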

Note: SPSS 10 offers only the Euclidean distance measure for this analysis, whereas the hierarchical cluster analysis used the squared Euclidean distance measure. This researcher was concerned that this may have introduced an inconsistency in the analysis. However, 100% of golfers clustered into the same groups when the analysis was repeated using the squared Euclidean distance measure in a custom-developed Microsoft Excel spreadsheet. As such, the different distance measures did not affect the analysis. This is to be expected: squaring preserves the ordering of non-negative distances, so the seed nearest to a golfer is the same under either measure.
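The equivalence of the two measures for nearest-seed allocation can be checked directly. The sketch below uses synthetic random data (not the golf data) and hypothetical function names; it illustrates the ordering argument and is not a reproduction of the spreadsheet check.

```python
import math
import random

def sq_euclidean(a, b):
    """Squared Euclidean distance (as used in the hierarchical analysis)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance (the measure used by the k-means step)."""
    return math.sqrt(sq_euclidean(a, b))

def nearest(point, seeds, dist):
    """Index of the seed closest to `point` under the distance function `dist`."""
    return min(range(len(seeds)), key=lambda k: dist(point, seeds[k]))

def same_allocations(data, seeds):
    """True if every case is allocated to the same seed under both measures."""
    return all(nearest(p, seeds, euclidean) == nearest(p, seeds, sq_euclidean)
               for p in data)

# Synthetic data only: 50 cases with 3 parameters and 4 seeds
random.seed(1)
data = [[random.gauss(0, 1) for _ in range(3)] for _ in range(50)]
seeds = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
print(same_allocations(data, seeds))
```

Because the square root is monotonic, the minimum over seeds is reached at the same seed for either function.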

4.2.6.1.2.1 Number of clusters

As there is no widely accepted method for deciding on the number of clusters in an analysis (e.g. Hair et al., 1995), a number of techniques were used to substantiate the analysis as recommended by Milligan and Cooper (1985).

4.2.6.1.2.1.1 Statistical methods

Milligan (1996) recommended the use of two or more statistical methods for choosing the number of clusters in a dataset. Accordingly, all non-hierarchical solutions were analysed using two stopping rules. Both compare the distances between cases within a cluster with the distances between cases in different clusters, but they use different key parameters.

1. Point Biserial Correlation

A larger correlation coefficient indicates that distances between cases in the same cluster are small relative to distances between cases in different clusters. The optimal cluster solution was the one that returned the highest Point Biserial Correlation coefficient. It is calculated using the following formula (equation 4.9).

$$ \text{Point Biserial Correlation} = \frac{\left[\text{Mean}(\text{outside}) - \text{Mean}(\text{within})\right]\sqrt{\text{Proportion}(\text{within}) \times \text{Proportion}(\text{outside})}}{\text{Overall SD}} \qquad \text{(Equation 4.9)} $$

Where Mean(within) = mean of the distances between pairs of cases in the same cluster, and Mean(outside) = mean of the distances between pairs of cases in different clusters.

Proportion(within) = number of distances between cases in the same cluster divided by the total number of distances; Proportion(outside) = number of distances between cases in different clusters divided by the total number of distances.

Overall SD = standard deviation of all distances between all cases.
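Equation 4.9 can be sketched in Python as below, assuming a symmetric matrix `D` of pairwise distances and a list of cluster labels (hypothetical inputs; the function name is also hypothetical). The Overall SD is taken here as the population standard deviation (dividing by the number of distances), which makes the result equal to the Pearson correlation between the distances and a 0/1 indicator of whether each pair lies in different clusters; whether the original analysis used this or the sample SD is an assumption.

```python
import math
from itertools import combinations

def point_biserial(D, labels):
    """Point Biserial Correlation (Equation 4.9).

    D: symmetric matrix of pairwise distances, D[i][j]
    labels: cluster label of each case
    """
    within, outside = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(D[i][j])   # both cases in the same cluster
        else:
            outside.append(D[i][j])  # cases in different clusters
    all_d = within + outside
    n = len(all_d)
    mean_all = sum(all_d) / n
    # Assumption: population SD (divide by n) of all pairwise distances
    overall_sd = math.sqrt(sum((d - mean_all) ** 2 for d in all_d) / n)
    mean_within = sum(within) / len(within)
    mean_outside = sum(outside) / len(outside)
    p_within, p_outside = len(within) / n, len(outside) / n
    return (mean_outside - mean_within) * math.sqrt(p_within * p_outside) / overall_sd

# Toy example: four cases on a line at positions 0, 1, 10 and 11
positions = [0, 1, 10, 11]
D = [[abs(a - b) for b in positions] for a in positions]
print(point_biserial(D, [0, 0, 1, 1]))
```

For the toy clustering `[0, 0, 1, 1]` the value is 3√(6/55) ≈ 0.991, while the poor clustering `[0, 1, 0, 1]` gives a negative coefficient.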

2. C Index

The lowest C-Index value indicated the optimal solution (equation 4.10).

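$$ \text{C-Index} = \frac{\text{Sum}(D_{\text{within}}) - \text{Minimum}(D)}{\text{Maximum}(D) - \text{Minimum}(D)} \qquad \text{(Equation 4.10)} $$

Where D = distance between two cases; Sum(D within) = sum of the distances between all pairs of cases in the same cluster, across all clusters; Minimum(D) and Maximum(D) = sums of the n_w smallest and n_w largest distances among all pairs of cases, where n_w is the number of pairs of cases in the same cluster.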
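Equation 4.10 can be sketched in the same way, again assuming a symmetric distance matrix `D` and a list of cluster labels as hypothetical inputs. The function name is hypothetical, and the guards for undefined cases are additions for the sketch.

```python
from itertools import combinations

def c_index(D, labels):
    """C-Index (Equation 4.10); 0 is the best possible value, 1 the worst."""
    within, all_d = [], []
    for i, j in combinations(range(len(labels)), 2):
        all_d.append(D[i][j])
        if labels[i] == labels[j]:
            within.append(D[i][j])
    n_w = len(within)  # number of within-cluster pairs
    if n_w == 0:
        raise ValueError("C-Index is undefined when no cluster has two or more cases")
    all_d.sort()
    minimum = sum(all_d[:n_w])   # sum of the n_w smallest distances
    maximum = sum(all_d[-n_w:])  # sum of the n_w largest distances
    if maximum == minimum:
        raise ValueError("C-Index is undefined (e.g. a single cluster or equal distances)")
    return (sum(within) - minimum) / (maximum - minimum)

# Toy example: four cases on a line at positions 0, 1, 10 and 11
positions = [0, 1, 10, 11]
D = [[abs(a - b) for b in positions] for a in positions]
print(c_index(D, [0, 0, 1, 1]), c_index(D, [0, 1, 0, 1]))
```

The natural clustering `[0, 0, 1, 1]` reaches the ideal value 0.0, whereas `[0, 1, 0, 1]` gives 18/19 ≈ 0.947.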

These methods were chosen because Milligan and Cooper (1985) found them to be among the most accurate methods for determining the number of clusters in a data set. Both formulas are taken from Milligan and Cooper (1985).

The cluster solution was chosen if both methods indicated it was optimal. If there was no agreement between the stopping rules, the largest cluster solution (i.e. the one with the largest number of clusters) was chosen, as suggested by Milligan (1996).
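The decision rule can be written as a small function. Reading "the largest cluster solution" as the larger of the two solutions indicated by the stopping rules is an assumption about the wording above. The function name and the coefficient values in the example are hypothetical and are not results from this study.

```python
def choose_solution(pb_by_k, c_by_k):
    """Choose the number of clusters from two stopping rules.

    pb_by_k: {number of clusters: Point Biserial Correlation}; highest is optimal
    c_by_k: {number of clusters: C-Index}; lowest is optimal
    Returns (number of clusters, whether the two rules agreed).
    """
    k_pb = max(pb_by_k, key=pb_by_k.get)
    k_c = min(c_by_k, key=c_by_k.get)
    if k_pb == k_c:
        return k_pb, True
    # Rules disagree: take the solution with more clusters (assumed reading)
    return max(k_pb, k_c), False

# Made-up values for illustration only
print(choose_solution({2: 0.50, 3: 0.60, 4: 0.55}, {2: 0.20, 3: 0.10, 4: 0.15}))  # (3, True)
```

When the rules disagree, for example a Point Biserial optimum at two clusters and a C-Index optimum at three, the sketch returns `(3, False)`.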

4.2.6.1.2.2 Cluster validation

As with the decision on the number of clusters, there is no widely agreed method for validating clusters (e.g. Hair et al., 1995). Once again, several methods were used to validate the clusters in this study. These were:

1. Point Biserial Correlation.

This was reported by Milligan (1981) to be one of the strongest methods of internal validation of cluster analysis. The use of this method in validation differs from its use as a stopping rule. As a stopping rule, the largest coefficient across all cluster solutions analysed indicated the optimal solution without regard for the strength of the relationship. For the validation of a cluster, the strength and significance level of the correlation are examined.

2. Replication.

In this procedure, the clustering process was repeated on three randomly drawn subsets of N = 41 golfers (two thirds of the data). This procedure examines the stability, or robustness, of the clusters. The percentage of golfers who are reclassified into the same clusters as in the original analysis is assessed, with a higher percentage indicating a more robust cluster (Hodge and Petlichkoff, 2000). The similarity of the group mean patterns was also assessed qualitatively.
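The reclassification percentage for one replicate can be sketched as follows. Cluster numbers from a separate run are arbitrary, so the sketch matches replicate clusters to original clusters using the relabelling that gives the greatest agreement; this matching step is an assumption, as the text does not state how clusters were matched. Inputs are the original and replicate labels of the same golfers in the same order, and the function name is hypothetical.

```python
from itertools import permutations

def percent_reclassified(original, replicate):
    """Percentage of cases placed in the same cluster in both analyses.

    original: labels from the full analysis, for the golfers in the subset
    replicate: labels from re-clustering the subset, in the same order
    Labels are matched by brute force over all relabellings (fine for a
    handful of clusters).
    """
    labels = sorted(set(original) | set(replicate))
    best = 0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))  # replicate label -> original label
        hits = sum(o == mapping[r] for o, r in zip(original, replicate))
        best = max(best, hits)
    return 100.0 * best / len(original)

print(percent_reclassified([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

Here the replicate differs only in cluster numbering, so the sketch reports 100.0%.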

3. Leave-one-out reclassification.

This technique removes a golfer from the analysis, recalculates the cluster group means without that golfer and then re-clusters the golfer using the nearest neighbour method. Successful reclassification (i.e. the removed golfer is allocated to the same cluster) indicates robustness of the solution. An unstable cluster will be influenced by single golfers and will perform poorly in reclassification.
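The leave-one-out procedure can be sketched as below. It assumes that "nearest neighbour" means allocating the removed golfer to the cluster whose recalculated mean is nearest in Euclidean distance; the function names are hypothetical. A cluster containing only the removed golfer has no mean left, so in this sketch that golfer counts as a failed reclassification.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leave_one_out_rate(data, labels):
    """Percentage of cases reallocated to their own cluster when each case is
    removed, the cluster means are recalculated without it, and it is then
    allocated to the nearest recalculated mean."""
    clusters = sorted(set(labels))
    hits = 0
    for i, case in enumerate(data):
        means = {}
        for k in clusters:
            members = [x for j, (x, lab) in enumerate(zip(data, labels))
                       if lab == k and j != i]
            if members:  # a singleton cluster vanishes when its only member is removed
                means[k] = [sum(col) / len(members) for col in zip(*members)]
        nearest = min(means, key=lambda k: euclidean(case, means[k]))
        hits += nearest == labels[i]
    return 100.0 * hits / len(data)

print(leave_one_out_rate([[0], [1], [10], [11]], [0, 0, 1, 1]))
```

With this toy data every case returns to its own cluster (100.0%); the deliberately poor labelling `[0, 1, 0, 1]` scores 0.0%.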

4. One way ANOVA.

Cluster groups were compared to identify significant differences in the parameters used in clustering (internal validation) as well as in parameters not used in clustering (external validation).
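For a single parameter, the one-way ANOVA F statistic across cluster groups can be sketched as below (hypothetical function name, made-up example values). Obtaining a significance level additionally requires the F distribution with (k − 1, n − k) degrees of freedom; for example, `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA comparing the means of several groups."""
    all_values = [v for g in groups for v in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up values for three cluster groups
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

For these illustrative groups the between-group mean square is 27 and the within-group mean square is 1, giving F = 27.0.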

4.2.6.1.2.3 Theoretical considerations

The final level of decision-making regarding the number of clusters, the validity of the cluster groups and the overall philosophy of the analysis was based on finding the minimum number of meaningful clusters in the data (similar to the guiding
In document WEIGHT TRANSFER STYLES IN THE GOLF SWING: INDIVIDUAL AND GROUP ANALYSIS (Page 118-123)