1.2 Bone age and manual methods of assessment
1.2.3 Methods of bone age assessment
1.2.3.4 Accuracy and differences between the three systems
The main difficulty in assessing the accuracy of any bone age assessment system is the lack of a gold standard. The best that can be achieved is to assess bone age in a large group of normal, healthy children at a single point in time and compare the bone ages with the children's chronological ages. This assumes that, across a large group of children, the mean difference between bone age and chronological age will be zero. This is a cross-sectional comparison because it captures each child at a single point in time. This approach has been used in many studies comparing the different methods of bone age assessment, although comparative studies also tend to compare bone age measurements against each other, not just against chronological age. When chronological age is not used, accuracy cannot be established, only the consistency between methods. The accuracy of a method of bone age assessment depends on the reference population used to develop it and on how that population relates to the patient population being measured. It also depends on how the assessment technique is implemented. If someone is poorly trained in a method, their assessments are unlikely to match those used to analyse the reference population, and their bone age results will therefore be inaccurate. Part of the problem with accuracy arises from subjectivity in interpreting the skeletal maturity indicators on the radiograph. All manual methods suffer from this, but the TW and Fels methods are thought to be more objective than the GP method because the bones are considered systematically and in isolation. This is thought to be one of the reasons why the TW method is slightly more reliable than the GP method [Roch88, p36].
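The cross-sectional accuracy check and the between-method consistency check described above both reduce to averages of paired differences. The following is a minimal Python sketch, assuming bone ages and chronological ages are given in decimal years for the same children. The function names and the five-child data set are hypothetical illustrations, not values from any of the cited studies.

```python
from statistics import mean


def mean_offset(bone_ages, chronological_ages):
    """Mean of (bone age - chronological age) in years.

    Under the cross-sectional assumption, a method whose reference
    population matches these children should give a value close to zero.
    """
    if len(bone_ages) != len(chronological_ages):
        raise ValueError("inputs must have the same length")
    return mean(ba - ca for ba, ca in zip(bone_ages, chronological_ages))


def mean_between_methods(method_a, method_b):
    """Mean of (method A - method B) for the same radiographs, in years.

    This measures consistency between methods only: with no gold
    standard, it cannot say which method is more accurate.
    """
    if len(method_a) != len(method_b):
        raise ValueError("inputs must have the same length")
    return mean(a - b for a, b in zip(method_a, method_b))


# Hypothetical data for five healthy children (years); not real results.
chronological = [8.0, 9.5, 10.0, 11.25, 12.0]
method_gp = [8.5, 9.0, 10.5, 11.0, 12.5]   # GP-style ratings (made up)
method_tw = [9.0, 10.0, 11.0, 11.5, 13.0]  # TW-style ratings (made up)

print(f"GP accuracy offset: {mean_offset(method_gp, chronological):.2f}")
print(f"TW accuracy offset: {mean_offset(method_tw, chronological):.2f}")
print(f"TW - GP consistency: {mean_between_methods(method_tw, method_gp):.2f}")
```

With these made-up values, the GP-style ratings average 0.15 years above chronological age and the TW-style ratings 0.75 years above. The 0.6-year mean difference between the two methods shows only that they disagree consistently. On its own, it cannot show which method is closer to the truth.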
When compared with each other, the three methods produce different results. Some of this relates to the differences in the reference populations used to develop the methods, some is due to subjectivity, and some is probably due to the skeletal maturity indicators chosen for an assessment. Differences in reference populations explain many of the reported differences between the methods. For example, in a study of adult height prediction in 23 boys during puberty, bone ages were assessed using the GP, TW2 and Fels methods [Roem97]. What made this study unusual was that the assessments were performed by three experts in the field, including Tanner (TW2) and Roche (Fels), and that the GP method was applied bone by bone. Although the sample size was small, this was a longitudinal study that followed the boys for at least 6 years, with bone age assessments every 8 months. The mean bone age results showed that for boys aged 9-15 years, the TW2 bone ages were consistently more advanced than the GP and Fels bone ages.
A comparison of the Fels method with both the GP and TW2 methods was performed using participants from the Fels longitudinal study [Roch88, p265]. On average, the Fels bone ages were greater than the GP bone ages in younger children, but in young teenagers of both sexes the Fels bone ages were smaller than the GP bone ages. The opposite was found when comparing the Fels and TW2 methods. In the very young the two methods gave comparable results, but by the age range 6.5-9 years the TW2 bone ages were up to 2 years more advanced than the Fels bone ages in girls, and up to 1.7 years more advanced in boys. In both sexes the differences between the two methods decreased as the children approached puberty. Given the similarities between the Fels and TW2 methods, the most likely explanation is the influence of different reference populations.
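Comparisons like this report between-method differences separately for ranges of chronological age. The following is a minimal Python sketch of that grouping. It assumes each record holds a child's chronological age and two methods' bone ages in years. The function name, band edges and records are hypothetical and do not reproduce the Fels data.

```python
from collections import defaultdict
from statistics import mean


def mean_difference_by_band(records, band_edges):
    """Mean (method A - method B) bone age per chronological-age band.

    records: iterable of (chronological_age, bone_age_a, bone_age_b).
    band_edges: sorted band boundaries in years; a record falls in band
    (lo, hi) when lo <= age < hi. Records outside every band are ignored.
    """
    diffs = defaultdict(list)
    for age, ba_a, ba_b in records:
        for lo, hi in zip(band_edges, band_edges[1:]):
            if lo <= age < hi:
                diffs[(lo, hi)].append(ba_a - ba_b)
                break
    return {band: mean(values) for band, values in sorted(diffs.items())}


# Hypothetical (age, TW2-style, Fels-style) bone ages in years; made up.
records = [
    (4.0, 4.0, 4.0),
    (5.0, 5.5, 5.0),
    (7.0, 8.5, 7.0),
    (8.0, 9.5, 8.0),
    (11.0, 11.5, 11.0),
    (12.0, 12.0, 12.0),
]
edges = [3.0, 6.5, 9.0, 13.0]

for (lo, hi), diff in mean_difference_by_band(records, edges).items():
    print(f"{lo}-{hi} years: {diff:+.2f}")
```

Reporting by age band, rather than as a single overall mean, keeps differences that change sign or shrink with age visible. A single average over all ages could hide them.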
In addition to differences in reference populations, there are other inherent differences between the methods. One of these is the effect of variability in the maturity of different bones within a single radiograph. This is a major limitation of the GP method [De L99], whereas the TW and Fels methods partly overcome the problem because they assess each bone in isolation and then combine the scores at the end. With the GP method each bone is meant to be considered in turn, but, as mentioned in Section 1.2.3, there is no formal method of combining the individual assessments. As a result, when there is dissociation within the radiograph, such as carpals that are advanced relative to the long bones, it can be difficult to assign a bone age using the GP method.
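The "assess each bone in isolation, then combine" approach can be illustrated in the style of the TW method, where each bone's maturity stage maps to a score, the scores are summed, and the total is converted to a bone age through a table. The Fels method combines its indicators differently and is not shown. The following is a minimal Python sketch under stated assumptions: the three bones, every score, the conversion table and the use of linear interpolation are all hypothetical and are not real TW values.

```python
from bisect import bisect_right

# HYPOTHETICAL stage-to-score tables (not real TW values). Real TW tables
# are method- and sex-specific and cover more bones and stages.
STAGE_SCORES = {
    "radius":     {"D": 40, "E": 60, "F": 80, "G": 100},
    "metacarpal": {"D": 30, "E": 45, "F": 60, "G": 75},
    "capitate":   {"D": 45, "E": 65, "F": 85, "G": 105},
}

# HYPOTHETICAL conversion table of (total score, bone age in years),
# sorted by score.
SCORE_TO_AGE = [(100, 6.0), (160, 8.0), (220, 10.0), (280, 12.0)]


def total_score(stages):
    """Score each bone in isolation, then combine by summing."""
    return sum(STAGE_SCORES[bone][stage] for bone, stage in stages.items())


def score_to_bone_age(score):
    """Interpolate bone age linearly between table entries, clamping
    scores outside the table to its end points (an assumption)."""
    scores = [s for s, _ in SCORE_TO_AGE]
    if score <= scores[0]:
        return SCORE_TO_AGE[0][1]
    if score >= scores[-1]:
        return SCORE_TO_AGE[-1][1]
    i = bisect_right(scores, score)  # scores[i - 1] <= score < scores[i]
    (s0, a0), (s1, a1) = SCORE_TO_AGE[i - 1], SCORE_TO_AGE[i]
    return a0 + (a1 - a0) * (score - s0) / (s1 - s0)


typical = {"radius": "E", "metacarpal": "E", "capitate": "F"}
advanced_carpal = {"radius": "E", "metacarpal": "E", "capitate": "G"}
for name, stages in [("typical", typical), ("advanced carpal", advanced_carpal)]:
    score = total_score(stages)
    print(f"{name}: score {score}, bone age {score_to_bone_age(score):.2f}")
```

In this sketch an advanced carpal bone simply raises the summed score by its own increment. The assessor never has to choose between whole-hand standards that each match only part of the radiograph.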
There is still no agreement on which method of bone age assessment is best, although it has been suggested that "some are better in specific disease states" [Alba95]. In one survey, 182 paediatricians in England and Wales indicated that 76% of bone age assessments were performed using the GP method, 20% using the TW method and 4% using other methods [Buck83]. The GP method is the most widely used in the United States [Zeri91], whereas the TW method is preferred in Europe [Gree01, p178]. Some assessors consider the TW2 method too laborious for routine use, yet the GP method requires just as much care to achieve accurate bone age estimates if it is performed as described in the GP atlas [Mars77]. Part of the popularity of the GP method lies in the ease and speed with which an assessment can be completed. This is because many institutions and individuals base their assessment on an overall impression of a child's radiograph in comparison with the standard radiographs [Bull99]. This is essentially the procedure used for the preliminary selection of a standard, and amounts to an abbreviated GP method that simplifies the assessment and reduces the time it takes. Typical assessment times are 1.4 minutes for the GP method and 7.9 minutes for the TW2 method [King94].
Finally, the maximum bone age differs between the three methods of assessment: 16.5 years for the TW3 method [Tann01], 18 years for the GP method [Greu59] and 18 years for the Fels method [Roch88]. The significance of these differences is not known.