reliability were the same, uniform classification criteria were applied across all validity and reliability components, for all items within the DHAQ. Classification involved several steps.

Step one involved the classification of PA. Items were classified as displaying very good, good, moderate, or poor agreement according to the criteria outlined in Table 2. In a study of sport expertise development in triathletes, Baker and colleagues (2005) described PA values of 70% as ‘reasonable agreement’ and values above 80% as ‘high agreement’ for retrospective recall of hours spent in sport-specific training activities. Similarly, in an investigation of the reliability of a semi-structured interview procedure relating to predictors of physical activity in older adults, MacDonald et al. (2009) considered a 20% change score for retrospective recall of hours of physical activity in a sample of master athletes to be acceptable, while percent agreements of 63% and 72% for recall of several categorical variables relating to physical activity were deemed ‘less reliable’. In the absence of established criteria for acceptable levels of PA, and given the relatively limited use of PA statistics in studies of sport expertise development, these examples were used as the basis for the classification criteria adopted in the current study.
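Percent agreement itself is straightforward to compute. The sketch below assumes that PA is the percentage of paired responses that match exactly between the first and second administration. The function name `percent_agreement` is hypothetical, and the classification bands from Table 2 are deliberately not reproduced here.

```python
from collections.abc import Hashable, Sequence


def percent_agreement(test: Sequence[Hashable], retest: Sequence[Hashable]) -> float:
    """Return the percentage of paired responses that are identical.

    Assumes exact-match agreement between the first (test) and second
    (retest) administration. Other definitions, such as agreement within
    a tolerance band, would need a different comparison.
    """
    if len(test) != len(retest):
        raise ValueError("test and retest must contain the same number of responses")
    if not test:
        raise ValueError("at least one paired response is required")
    matches = sum(1 for a, b in zip(test, retest) if a == b)
    return 100.0 * matches / len(test)


# Hypothetical answers from four participants to one categorical item.
print(percent_agreement(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"]))  # 75.0
```

For continuous recall data, such as hours of training, an exact-match rule is usually too strict. In that case a tolerance-based comparison would be a more natural choice.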

Step two involved the classification of the ICC. Two factors were considered in this process: a) the strength of the correlation (i.e. the value of the correlation coefficient); and b) the significance of the correlation (i.e. the p value associated with the correlation coefficient). ICCs were considered very strong if the coefficient was equal to or exceeded .80, strong if it ranged from .65 to .79, reasonable if it ranged from .50 to .64, and weak if it was .49 or below. Interpretation of the magnitude of ICCs is inconsistent, and there are no apparent guidelines for interpreting the coefficients in relation to analytical research goals (Atkinson & Neville, 1998). Kottner and Dassen (2008) suggested that comparison of ICCs between studies is limited for two reasons. First, the value of the coefficient is influenced by the characteristics of the study participants. Second, multiple models are available for calculating the coefficient. Additionally, Costa-Santos et al. (2011) observed that the medical clinicians and biostatisticians they sampled interpreted ICCs inconsistently, despite being presented with the same set of results.
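Because the value of the coefficient depends on the model used, a small worked sketch helps to show what the strength bands above do and do not capture. The Python below computes three single-measure ICC forms, in Shrout and Fleiss's notation ICC(1,1), ICC(2,1) and ICC(3,1), from the same two-occasion data. It then applies the strength bands. Several details are illustrative assumptions: the data set, the function names `icc_models` and `classify_icc_strength`, and the decision to close the gaps between bands (so that, for example, .795 counts as strong). This section does not state which ICC model the DHAQ analysis used.

```python
from statistics import fmean


def icc_models(scores: list[list[float]]) -> dict[str, float]:
    """Single-measure ICCs for an n-subjects x k-occasions table of scores.

    Uses the two-way ANOVA mean squares in Shrout and Fleiss's notation.
    Assumes n >= 2, k >= 2, complete data and some between-subject variation.
    """
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]                       # per subject
    col_means = [fmean(row[j] for row in scores) for j in range(k)]  # per occasion

    # Sums of squares: between subjects, between occasions, within subjects, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = sum((x - row_means[i]) ** 2 for i, row in enumerate(scores) for x in row)
    ss_error = sum(
        (x - row_means[i] - col_means[j] + grand) ** 2
        for i, row in enumerate(scores)
        for j, x in enumerate(row)
    )

    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-occasions mean square
    wms = ss_within / (n * (k - 1))       # within-subject mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return {
        # One-way random effects.
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        # Two-way random effects, absolute agreement.
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        # Two-way mixed effects, consistency.
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


def classify_icc_strength(icc: float) -> str:
    """Strength bands from this section; gaps between bands are closed (assumption)."""
    if icc >= 0.80:
        return "very strong"
    if icc >= 0.65:
        return "strong"
    if icc >= 0.50:
        return "reasonable"
    return "weak"


# Hypothetical hours recalled by five participants on two occasions;
# every retest value is exactly three hours higher than the first.
data = [[10, 13], [12, 15], [15, 18], [9, 12], [14, 17]]
for model, value in icc_models(data).items():
    print(model, round(value, 3), classify_icc_strength(value))
# ICC(1,1) 0.486 weak
# ICC(2,1) 0.591 reasonable
# ICC(3,1) 1.0 very strong
```

A constant shift between occasions affects the three forms differently. The consistency form ignores the shift and reports perfect reliability, while the absolute-agreement and one-way forms penalise it. As a result, the same responses fall into three different strength bands, which illustrates the model dependence that limits comparison of ICCs between studies.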

A wide range of ICCs have been reported as ‘high’ or ‘good’ in reliability investigations across a variety of fields. In the medical investigation described above, correlation coefficients as low as .65 were rated as ‘good’ (Costa-Santos et al., 2011), while a value of .90 has been recommended as the cut-off criterion for ‘high’ correlations in tests of physiological capacities (Lemmink, Elferink-Gemser, & Visscher, 2004). In investigations of the reliability of retrospective recall of recreational and occupational physical activity, values above .65 tend to be considered ‘high’ (Ropponen et al., 2001), while values of .35 to .55 have been considered ‘moderate’ (Ainsworth, Richardson, Jacobs Jr., Leon, & Sternfeld, 1999; Reis, Dubose, Ainsworth, Macera, & Yore, 2005). The classification criteria established for the strength of the ICCs obtained in this study were based upon these studies of physical activity recall. They also drew on a suggestion that ICCs can be interpreted according to the well-established criteria for the interpretation of the kappa statistic (Garson, 2012).

In a similar fashion, ICCs were considered highly significant if the p value was less than or equal to .01, significant if the p value ranged from .02 to .05, approaching significance if the p value was in the range of .06 to .10, and non-significant if the p value exceeded .10. The p value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. Comparing it with a chosen significance level controls the chance of rejecting the null hypothesis when it is actually true (Tabachnick & Fidell, 2007), that is, of reporting a significant finding when no true effect exists. In scientific research, a significance level of .05 is generally accepted as a suitable criterion, as it limits the chance of incorrectly reporting a significant result to 5% (Tabachnick & Fidell, 2007). A stricter level of .01 is also commonly used, limiting the chance of incorrectly rejecting the null hypothesis to 1% (Hopkins, 2000a). The criteria adopted in this study for classifying an ICC as significant or highly significant were based upon these conventions. Although p values above .05 are often considered non-significant, the significance of a correlation coefficient is highly dependent upon the number of participants involved in the investigation (Haggard, 1958; Hopkins, 2000b; Morrow Jr. & Jackson, 1993). Because of the small sample size in this study, p values between .06 and .10 were classified as approaching significance. This allowed a small buffer before an item was removed from the DHAQ for displaying a non-significant ICC. ICCs for which the p value exceeded .10 were classified as non-significant, as items that did not reach significance even at the .10 level were deemed too uncertain to warrant further investigation.
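The significance bands can be expressed in the same way. This is a minimal sketch with a hypothetical function name. It assumes that p values are compared directly against the cut-offs given above. Values falling between the two-decimal bands, such as .015 or .055, are assigned to the less significant band; the text does not specify how to treat them.

```python
def classify_significance(p: float) -> str:
    """Map an ICC p value onto the significance bands used in this study.

    The text gives the bands as <= .01, .02-.05, .06-.10 and > .10. Values
    between those two-decimal bands (e.g. .015) fall into the less
    significant band here, which is an assumption.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p <= 0.01:
        return "highly significant"
    if p <= 0.05:
        return "significant"
    if p <= 0.10:
        return "approaching significance"
    return "non-significant"


# Hypothetical p values for four DHAQ items.
for p in (0.004, 0.03, 0.08, 0.25):
    print(p, classify_significance(p))
```

The cut-offs are checked from the strictest band outward, so each p value lands in exactly one band without needing explicit lower bounds.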

Following classification of the strength and the significance of the ICC for each item within the DHAQ, correlations were given an overall rating of very good, good, moderate, or poor, according to the criteria outlined in Table 2. Once again, as the significance of a correlation coefficient is highly dependent upon sample size (Haggard, 1958; Hopkins, 2000b; Morrow Jr. & Jackson, 1993), the strength of the correlation was weighted more heavily than its significance in the overall classification of the ICC. Accordingly, items with a very strong, strong, or reasonable ICC were classified as having a very good, good, or moderate correlation overall, respectively, provided that the coefficient was highly significant, significant, or approaching significance. All items displaying a non-significant ICC were classified as poor overall, regardless of the strength of the correlation. Similarly, all items rated as having a weak


In document The Developmental History of Athletes Questionnaire: Towards a comprehensive understanding of the development of sport expertise (Page 53-55)