ANALYSIS OF NOTATION DATA: RELIABILITY Mike Hughes
9.5 VISUAL INTERPRETATION OF THE DATA (A MODIFIED BLAND AND ALTMAN PLOT)
9.5.1 Sample data
Figure 9.4 A Bland and Altman plot of the differences in rally length plotted against the mean of the rally length from the two tests
Consider an example from rugby union. Five analysts were to analyse the recent World Cup, and all five notated the same match twice, so that data were available for both intra- and inter-operator reliability studies. The frequencies of the simple action variables tackle, pass, kick, ruck, scrum and line are shown in Table 9.3. An accepted way of testing these operators would be to use χ2 and percentage differences for the intra-operator tests, and Kruskal–Wallis and percentage differences for the inter-operator tests. The χ2 and Kruskal–Wallis tests reflect the shape of the data sets rather than the actual differences, so there is a need for a second, simple difference test. However, care must be taken with the percentage difference test, in both its definition and its application.
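To illustrate why a shape-based test needs a companion difference test, the following Python sketch compares two hypothetical sets of counts (not taken from the study) in which the second test records exactly double the first. It assumes the χ2 test is applied as a test of homogeneity on the 2 × k table of counts, which is one common form and may not be the exact form intended here; the function names are illustrative only.

```python
def chi_square_homogeneity(row1, row2):
    """Chi-square statistic for a 2 x k table of counts (test of homogeneity)."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    grand_total = sum(col_totals)
    statistic = 0.0
    for row in (row1, row2):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def overall_percentage_difference(v1, v2):
    """Sum of absolute differences as a percentage of the summed means."""
    abs_diff = sum(abs(a - b) for a, b in zip(v1, v2))
    mean_total = sum((a + b) / 2 for a, b in zip(v1, v2))
    return 100.0 * abs_diff / mean_total


# Hypothetical counts: the second profile has the same shape but twice the size.
test_1 = [50, 100, 40, 50]
test_2 = [100, 200, 80, 100]

chi2 = chi_square_homogeneity(test_1, test_2)
pct = overall_percentage_difference(test_1, test_2)
print(f"chi-square statistic: {chi2:.3f}")      # 0.000, the shapes are identical
print(f"percentage difference: {pct:.1f}%")     # 66.7%
```

In this deliberately extreme case the χ2 statistic is zero because the two profiles are proportional, while the percentage difference shows that the two sets of counts disagree substantially.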
The overall percentage differences in Table 9.3 show a satisfactory analysis, with all of the operators scoring 5 per cent or less. They are presented in Figure 9.5 as a plot of (Σ (mod (V1−V2)) / Vmean) × 100 per cent for each operator. As the expected limits of agreement in this study were 5 per cent, all the data should fall below this line. Presenting the data in this way is similar to the Bland and Altman plot: the visual power of the chart is its ability to identify immediately those measurements that are in danger of transgressing the limits of agreement.
Figure 9.5 The overall data from the reliability study, the intra-operator test, presented as a function of the accuracy of each operator
Table 9.3 Data from a rugby match notated twice by five different operators and presented as an intra-operator reliability analysis
Operator               L           S           I           G           O
                     V1   V2     V1   V2     V1   V2     V1   V2     V1   V2
Tackle               51   53     53   55     54   49     53   56     53   55
Pass                102  108     97   99     94   97     98   99     99   97
Kick                 39   40     38   39     39   38     39   41     37   39
Ruck                 49   49     50   51     49   46     51   49     44   49
Scrum                 6    6      6    6      6    6      6    6      6    6
Line                 14   15     14   14     14   15     14   15     14   15
Σ (mod (V1−V2))       10           6          13           9          12
Σ (V1+V2)/2          264.5        261        258.5       263.5        260
% Error overall       3.8         2.3         5.0         3.4         4.6
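The overall percentage error can be sketched in Python as follows. This is a minimal illustration rather than the author's own procedure: it assumes that Vmean in the overall calculation is the sum of the per-variable means, as the Σ (V1+V2)/2 row of Table 9.3 suggests, and the function and variable names are illustrative. The counts are those listed for operators S and G in Table 9.3.

```python
def overall_percentage_error(v1, v2):
    """(sum of |V1 - V2|) / (sum of per-variable means) x 100."""
    abs_diff = sum(abs(a - b) for a, b in zip(v1, v2))
    mean_total = sum((a + b) / 2 for a, b in zip(v1, v2))
    return 100.0 * abs_diff / mean_total


# Counts in the order tackle, pass, kick, ruck, scrum, line (Table 9.3).
operator_s = ([53, 97, 38, 50, 6, 14], [55, 99, 39, 51, 6, 14])
operator_g = ([53, 98, 39, 51, 6, 14], [56, 99, 41, 49, 6, 15])

LIMIT = 5.0  # per cent; the expected limit of agreement in this study

for name, (v1, v2) in {"S": operator_s, "G": operator_g}.items():
    error = overall_percentage_error(v1, v2)
    status = "within" if error <= LIMIT else "outside"
    print(f"Operator {name}: {error:.1f}% ({status} the {LIMIT:.0f}% limit)")
# Operator S: 2.3% (within the 5% limit)
# Operator G: 3.4% (within the 5% limit)
```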
The processed data in Table 9.4 are the intra-operator reliability test for each of the separate variables, with the difference between tests 1 and 2 shown as a percentage of the respective operator's mean for that variable. As would be expected, the error percentages for the individual variables vary much more, depending upon how difficult the defined action is to recognise. This variation may depend upon the accuracy of the operational definition of that action, the amount of training the operators have had, or accepted difficulties in observing that particular variable.
Some observations are more difficult to make than others: for example, deciding when a maul becomes a ruck in rugby union or, using the previous example, identifying position in a squash match. It is logical, then, to have different levels of expected accuracy for different variables. Some research papers have argued for different levels of accuracy to be acceptable because of the nature of the data they were measuring (Hughes and Franks 1991; Wilson and Barnes 1998).
These data were then plotted as percentages against each of the actions in Figure 9.6a. They can also be plotted against each of the operators (Figure 9.6b) to test which of the operators are more, or less, reliable and by how much. These charts are very useful, highlighting which variables contribute most to violations of the expected levels of reliability.
Table 9.4 Data from a rugby match notated twice by five different operators and the differences for each operator expressed as a percentage of the respective mean
Operator                 L       S       I       G       O
Tackle                 3.8     3.7     9.7     5.5     3.7
Pass                   5.7     2.0     3.1     1.0     2.0
Kick                   2.5     2.6     2.6     2.5     5.3
Ruck                     0     2.0     4.0     4.0    10.8
Scrum                    0       0       0       0       0
Line                   6.9       0     6.9     6.9     6.9
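The per-variable percentages in Table 9.4 can be sketched in the same way, here for operator O using the counts from Table 9.3. The sketch is illustrative only: the names are assumptions, a variable with a zero mean is treated as 0 per cent, and the 5 per cent limit is the one used for the overall data, whereas the text argues that different variables may warrant different limits.

```python
LIMIT = 5.0  # per cent; assumed here for every variable, for illustration only


def percentage_errors(test1, test2):
    """Per-variable |V1 - V2| as a percentage of that variable's mean."""
    errors = {}
    for action, v1 in test1.items():
        v2 = test2[action]
        mean = (v1 + v2) / 2
        # Guard against a zero mean (action never recorded in either test).
        errors[action] = 100.0 * abs(v1 - v2) / mean if mean else 0.0
    return errors


# Operator O, tests 1 and 2 (Table 9.3).
operator_o_v1 = {"tackle": 53, "pass": 99, "kick": 37, "ruck": 44, "scrum": 6, "line": 14}
operator_o_v2 = {"tackle": 55, "pass": 97, "kick": 39, "ruck": 49, "scrum": 6, "line": 15}

errors = percentage_errors(operator_o_v1, operator_o_v2)
flagged = [action for action, error in errors.items() if error > LIMIT]

for action, error in errors.items():
    marker = "  <-- exceeds limit" if action in flagged else ""
    print(f"{action:<7}{error:5.1f}%{marker}")
```

For operator O this flags kick, ruck and line, which is the kind of information the per-variable charts are intended to make visible at a glance.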
Figure 9.6 The data from the reliability study, the intra-operator test, presented as a function of (a) the action variables and (b) the operators
SUMMARY
It was found that many research papers in performance analysis present no reliability tests whatsoever and that, when they do, they apply inappropriate statistical processes both to these tests and to the subsequent data processing. Many research papers have used parametric tests in the past; these were found to be slightly less sensitive than the non-parametric tests, and they did not respond to large differences within the data. Further, the generally accepted tests for comparing sets of non-parametric data, χ2 analysis and Kruskal–Wallis, were found to be insensitive to relatively large changes within the data. It would seem that a simple percentage calculation gives the best indicator of reliability, but it was demonstrated that these tests can also lead to errors and confusion. The following conditions should be applied.
The data should initially retain its sequentiality and be cross-checked item against item.
Any data processing should be carefully examined, as these processes can mask original observation errors.
The reliability test should be examined to the same depth of analysis as the subsequent data processing, rather than being performed on just some of the summary data.
Careful definition of the variables involved in the percentage calculation is necessary to avoid confusion in the mind of the reader, and also to prevent any compromise of the reliability study.
It is recommended that a calculation based upon:
(Σ (mod (V1−V2)) / Vmean) × 100%,
where V1 and V2 are the values of a variable from the first and second tests, Vmean is their mean, mod is the modulus and Σ means ‘sum of’, is used to calculate the percentage error for each variable involved in the observation system, and that these are plotted against each variable and each operator. This will give a powerful and immediate visual image of the reliability tests.
It is recommended that further work examine the problems of sufficiency of data, first to ensure that the data for reliability is significant, and also to confirm that the data present in a ‘performance profile’ have reached stable means.
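The first condition above, that the data retain their sequentiality and be cross-checked item against item, can be illustrated with a short Python sketch. The two event sequences are hypothetical and are assumed to be already aligned event for event; real notation data may need alignment first. The function names are illustrative.

```python
from collections import Counter


def frequency_differences(seq1, seq2):
    """Absolute difference in the count of each action between two notations."""
    c1, c2 = Counter(seq1), Counter(seq2)
    return {action: abs(c1[action] - c2[action]) for action in sorted(set(c1) | set(c2))}


def item_mismatches(seq1, seq2):
    """Indices at which two aligned notations record different actions."""
    return [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


# Hypothetical notations of the same passage of play by one operator.
notation_1 = ["pass", "tackle", "ruck", "pass", "kick", "pass", "tackle", "ruck"]
notation_2 = ["pass", "ruck", "tackle", "pass", "kick", "tackle", "pass", "ruck"]

print(frequency_differences(notation_1, notation_2))
# {'kick': 0, 'pass': 0, 'ruck': 0, 'tackle': 0}
print(item_mismatches(notation_1, notation_2))
# [1, 2, 5, 6]
```

The summed frequencies agree exactly, yet half of the individual observations differ, which is how processing the data into totals can mask original observation errors.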