
2.1.2 User-assisted computerised assessment

User-assisted computerised assessment refers to systems in which the user must provide input to assist the software. This approach recognises that some aspects of bone age assessment may require the skill and judgement of a human observer to achieve good results. User assistance may be needed because certain aspects of the assessment are costly or difficult to automate, or because current automatic methods produce less reliable results than those obtained with input from a user.

2.1.2.1 The CASAS system and use of a continuous scale: Tanner et al.

Early in the development of the TW method, Tanner and Cameron proposed that the process of allocating TW bone age stages was suitable for a computer to perform [Tann01]. With the assistance of Tanner, Discerning Systems Inc. developed a computer-aided skeletal age scoring system (CASAS) based on the TW2 RUS method. With user assistance, this system digitised the x-ray bone by bone using a light box and a monochrome video camera, and analysed the digitised information using methods of classification statistics. (This system was later replaced with a digital version designed to process digital radiographs, although the processing remained essentially the same.) Each bone was positioned and zoomed in the camera view with the aid of an overlay template. If required, the captured image was histogram equalised and filtered to remove “extraneous details and radiographic imperfections” [Tann94b].

The computer performed a template matching assessment using a two-dimensional fast Fourier transform. The bone was matched by finding the average template that minimised the root-mean-square error between the Fourier transform coefficients of the bone and those of the bone template [Tann94b]. This produced a discrete maturity stage for the bone, but Tanner et al. took it a step further and also used the root-mean-square errors for the two stages above and the two stages below the best match. They fitted a Gaussian function to the resulting five root-mean-square error values and used the mean of this fit as the bone maturity score. This produced a continuous bone stage between 0 and 9.0. The stage for each bone was then used to calculate an overall estimate of the bone age using the TW2 system. Personal communication and internet searches indicate that this product is no longer supplied by Discerning Systems Inc. (Jan 2009).
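The matching and refinement steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the CASAS implementation: the function names and input format are hypothetical, the RMS error is taken over the complex FFT coefficients, and because the source does not say exactly how the Gaussian was fitted to error values (which have a minimum rather than a peak), the sketch assumes an inverted Gaussian, e(x) = b − a·exp(−(x − μ)²/2σ²), whose centre μ is returned as the continuous stage.

```python
import numpy as np


def build_templates(images_by_stage):
    """Average the 2-D FFT coefficients of the training images for each stage.

    images_by_stage maps a stage index to a list of equally sized 2-D arrays
    (a hypothetical input format chosen for illustration).
    """
    return {
        stage: np.mean([np.fft.fft2(img) for img in imgs], axis=0)
        for stage, imgs in images_by_stage.items()
    }


def rms_errors(bone, templates):
    """Root-mean-square error between the bone's FFT coefficients and each template."""
    coeffs = np.fft.fft2(bone)
    return {
        stage: float(np.sqrt(np.mean(np.abs(coeffs - tpl) ** 2)))
        for stage, tpl in templates.items()
    }


def continuous_stage(stages, errors, mu_step=0.01, sigmas=np.arange(0.2, 3.01, 0.05)):
    """Fit e(x) = b - a * exp(-(x - mu)^2 / (2 sigma^2)) and return mu.

    Assumption: an inverted Gaussian (a dip). For fixed (mu, sigma) the model is
    linear in (b, a), so those are solved by ordinary least squares; mu and sigma
    are chosen by a simple grid search.
    """
    x = np.asarray(stages, dtype=float)
    y = np.asarray(errors, dtype=float)
    mus = np.arange(x.min(), x.max() + mu_step / 2, mu_step)
    best_sse, best_mu = np.inf, x[np.argmin(y)]
    yc = y - y.mean()
    for sigma in sigmas:
        g = np.exp(-((x[None, :] - mus[:, None]) ** 2) / (2 * sigma**2))  # (n_mu, n_stages)
        gc = g - g.mean(axis=1, keepdims=True)
        var = (gc**2).sum(axis=1)
        cov = (gc * yc).sum(axis=1)
        slope = np.divide(cov, var, out=np.zeros_like(var), where=var > 1e-12)  # equals -a
        intercept = y.mean() - slope * g.mean(axis=1)  # equals b
        resid = y[None, :] - intercept[:, None] - slope[:, None] * g
        sse = (resid**2).sum(axis=1)
        sse[slope >= 0] = np.inf  # keep only fits that form a dip (a > 0)
        i = int(np.argmin(sse))
        if sse[i] < best_sse:
            best_sse, best_mu = sse[i], mus[i]
    return float(best_mu)


def assess_bone(bone, templates):
    """Return (best discrete stage, continuous stage from up to two stages either side)."""
    errs = rms_errors(bone, templates)
    stages = sorted(errs)
    k = stages.index(min(errs, key=errs.get))
    window = stages[max(0, k - 2): k + 3]
    return stages[k], continuous_stage(window, [errs[s] for s in window])
```

Separating the linear parameters (b, a) from the non-linear ones (μ, σ) keeps the sketch dependent on NumPy alone; a general non-linear least-squares routine would be an equally valid choice.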

An important factor in the performance of this system was how well the bone templates represented the TW bone maturity stages. The templates were generated by averaging the Fourier transform coefficients from at least 10 radiographs of each bone stage; for some of the middle stages, as many as 30 radiographs were available and were used to generate the template. The radiographs were from the TW ‘Basic Series’, a series of radiographs from a London longitudinal study in which bone age had been assessed and reassessed over many years [Tann94b]. However, according to Tanner et al. [Tann83, p4], although the radiographs from this series were used in developing the standards for TW skeletal maturity, they were not used in developing the bone scoring system itself, so templates built from them could be flawed. The TW method places a large emphasis upon the verbal criteria that “describe shape and density markings of each bone” [Tann83, p4]. Because the Fourier transform is linear, Fourier transforming the images and averaging the coefficients is equivalent to taking the Fourier transform of the average image. Minor misregistration and scaling differences between the source images would therefore reduce the high spatial frequency components of the average image (the fine detail would be ‘blurred’ by the averaging). It is difficult to see how the Fourier transform coefficients measure the shapes and densities of the bones, as required by the verbal criteria of the TW method. If the method relies on matching the examined radiograph with the template image, it is unlikely to be doing so on the basis of morphology. The template images are therefore critical, and the choice of the source images used to create them becomes important.
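The linearity argument and the blurring effect of misregistration can be checked numerically. The sketch below is illustrative only: it uses a random 32 × 32 array as a stand-in for a bone image, simulates misregistration with a one-pixel circular shift (`np.roll`), and measures "high spatial frequency" energy above an arbitrarily chosen cutoff of 0.25 cycles/pixel.

```python
import numpy as np


def high_freq_energy(img, cutoff=0.25):
    """Energy of FFT coefficients whose radial frequency exceeds `cutoff` (cycles/pixel)."""
    f = np.fft.fft2(img)
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    mask = np.hypot(fx, fy) > cutoff
    return float(np.sum(np.abs(f[mask]) ** 2))


rng = np.random.default_rng(1)
base = rng.random((32, 32))  # stand-in for a bone image
# Simulated misregistration: the same image with and without a 1-pixel shift.
shifted = [np.roll(base, shift, axis=1) for shift in (0, 1)]
avg_image = np.mean(shifted, axis=0)

mean_of_ffts = np.mean([np.fft.fft2(im) for im in shifted], axis=0)
fft_of_mean = np.fft.fft2(avg_image)

print(np.allclose(mean_of_ffts, fft_of_mean))  # True: the DFT is linear
print(high_freq_energy(avg_image) < high_freq_energy(base))  # True: averaging blurs
```

For this shift, each coefficient of the averaged image is the original multiplied by a factor with magnitude |cos(π f_x)|, which is below 1 for every non-zero horizontal frequency, so the attenuation grows towards the highest frequencies.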

2.1 A review of computerised assessment of bone age 35

Of all the automated bone age assessment methods to date, the CASAS system has been tested the most extensively. This testing included evaluation of the system against radiographs used in the development of the TW method, as well as radiographs of normal children and of children with proven pathological conditions. A number of studies have investigated differences between CASAS and manual TW bone age assessments, including how often user intervention was required, how well the manual and CASAS methods agreed on individual bone stages, and the overall bone age results (refer to Appendix A). They showed that on average one manual insertion was required per radiograph, that the repeatability of the CASAS system is better than that of the manual TW method, and that there is reasonable agreement between the bone stages obtained using the CASAS and manual methods.

It has been claimed that the use of a continuous scale in the CASAS system leads to a smoother progression in bone age over time, with the standard deviations of the CASAS results being 50% of those for the manual TW method [Tann94b]. This is one of the reasons why the system has been recommended for longitudinal assessment of children [Teun96], and it has even been suggested that when patients undergo further assessment, their previous radiographs should be reassessed using the CASAS system [Tann94a]. In a series of 6-monthly serial radiographs of children, the stage appeared to reverse in approximately 4% of bones when the manual method was used, but there were no reversals of one stage or more with the CASAS system [Tann94a]. In contrast, another group found that in a limited sample of five patients with Turner’s syndrome, three showed a reversal in bone age when assessed using the CASAS system, but none showed such a reversal when assessed using the manual TW2 and GP methods [Fris96].
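The notion of a "reversal of one stage or more" in serial assessments can be made concrete with a short sketch. It assumes that a reversal is counted per bone whenever the stage falls by at least one stage between any two consecutive assessments; the function name, input format and bone names are hypothetical, and the cited studies may have counted reversals differently.

```python
def reversal_fraction(serial_stages, min_drop=1.0):
    """Fraction of bones whose stage falls by at least `min_drop` between any
    two consecutive assessments.

    serial_stages maps a bone name to its stages in chronological order
    (hypothetical format); stages may be discrete (manual) or continuous (CASAS).
    """
    if not serial_stages:
        return 0.0
    reversed_bones = sum(
        any(earlier - later >= min_drop for earlier, later in zip(s, s[1:]))
        for s in serial_stages.values()
    )
    return reversed_bones / len(serial_stages)


# Discrete manual stages: the radius drops from 6 to 5, so 1 of 2 bones reverses.
print(reversal_fraction({"radius": [5, 6, 5, 6], "ulna": [4, 4, 5, 5]}))  # 0.5
# Continuous stages: a drop of 0.2 is not a reversal of one stage or more.
print(reversal_fraction({"radius": [5.2, 5.6, 5.4, 6.1]}))  # 0.0
```

With a continuous scale, small backward steps below the threshold are not counted, which is one way a continuous score can show fewer whole-stage reversals than discrete manual staging.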

Opinions on the CASAS system varied, although the general conclusion was that it was probably adequate for bone age assessment in children with normal bone morphology. Reported drawbacks of the CASAS system included that it worked best with the high-resolution radiographs typical of those taken before 1990, and that non-standard hand positioning and unusually shaped bones caused poor matching to bone stages [Tann01, p24] [Fris96] [Teun96].

Overall, research on the user-assisted CASAS system has shown that it is possible to increase repeatability, and that relatively simple image processing techniques can be used to perform the TW method of bone age assessment with reasonable accuracy. However, the system appears to fail when assessing bone age in children with some pathological conditions, most likely because of the bone deformations associated with these conditions. The system can also require a large number of manual interventions, which reduces the objectivity of the assessment.

2.1.2.2 Analysis of the middle phalanx of the third finger using an active shape model: Niemeijer et al.

Niemeijer et al. [Niem02] performed a limited implementation of the TW2 system, in which they classified the stage of the middle phalanx of the third finger using an active shape model. An active shape model consists of a mean object shape together with the eigenvectors of the shape covariance matrix that describe the most significant modes of shape variation. It is essentially an iterative, deformable model whose shape is constrained by statistics derived from a training set of contours. In this system, the user drew a box around the middle phalanx of the third finger, and the computer automatically segmented the enclosed bone using the active shape model. The bone stage was classified by finding the highest correlation between the pixel values of a physeal region of interest and those of a set of mean images representing the limited range of TW2 stages. Compared with a trained observer, their system had an accuracy of 73%, which compared favourably with the 80% accuracy of a second human observer [Niem02, p62]. However, this staging was only useful over TW2 stages E to I, corresponding to an age range of approximately 9 to 17 years. Furthermore, using only one bone may give poor precision because, for example, the interval between two successive stages in the TW method can be as long as four years in males [Tann01, p2].
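The two statistical ingredients described above, a shape model built from the mean and covariance eigenvectors of training contours, and stage classification by highest correlation with mean images, can be sketched as follows. This is an illustrative reconstruction, not the implementation of Niemeijer et al.: the function names, the 98% retained-variance threshold and the ±3√λ clamp (a common convention for active shape models) are assumptions, and the iterative image search that moves landmarks towards bone edges is not shown.

```python
import numpy as np


def point_distribution_model(shapes, var_kept=0.98):
    """Mean shape plus the principal modes of variation of aligned landmark sets.

    shapes: (n_shapes, 2 * n_landmarks) array of already-aligned contours.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0, None)
    eigvecs = eigvecs[:, order]
    frac = np.cumsum(eigvals) / eigvals.sum()
    t = min(int(np.searchsorted(frac, var_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :t], eigvals[:t]


def constrain_shape(shape, mean, modes, eigvals, k=3.0):
    """Project a candidate contour onto the model, clamping each mode to +/- k*sqrt(lambda)."""
    b = modes.T @ (shape - mean)
    limit = k * np.sqrt(eigvals)
    return mean + modes @ np.clip(b, -limit, limit)


def classify_by_correlation(roi, mean_images):
    """Return the stage whose mean image has the highest Pearson correlation with the ROI."""
    scores = {s: np.corrcoef(roi.ravel(), m.ravel())[0, 1] for s, m in mean_images.items()}
    return max(scores, key=scores.get)
```

Clamping the mode coefficients is what keeps the deformable contour within the range of shapes seen in the training set.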

2.1.2.3 Neural network-based system using linear distance measures: Gross et al.

The alternative to implementing a morphology-based system such as the TW method is to have the user directly measure features on the radiographs and to use a decision system to calculate the bone age. Gross et al. had a user position cursors on digitised hand-wrist radiographs to measure distances. The ratios of these distances were fed into a neural network that had been trained to calculate the bone age [Gros95]. They started with ten measurements, and a linear regression analysis allowed them to choose seven measurements whose ratios gave the best correlation coefficients against chronological age for a group of male children. From these, they chose the three ratios with the highest correlation coefficients. The drawback of this technique was that it ignored the morphological information used in the GP, TW, and Fels methods. This may be one reason why all of the correlation coefficients for the ratios were less than 0.67. Nevertheless, they found no significant difference between the bone age derived using their neural network and that assessed by a single radiologist using the GP method. They also found no significant difference between the mean neural network bone age and the bone ages of 14 images from the actual GP atlas. These results demonstrate the ability of their neural network to find relationships between variables that each correlate only weakly with bone age.
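The ratio-selection step can be sketched as ranking every pairwise distance ratio by the magnitude of its Pearson correlation with chronological age and keeping the strongest few. This is a simplified, hypothetical reading of the procedure: it replaces the linear regression analysis described above with a direct correlation ranking, considers each unordered pair once as measurement i divided by measurement j, and does not show the neural network itself.

```python
import itertools

import numpy as np


def select_ratios(distances, ages, n_keep=3):
    """Rank pairwise distance ratios by |Pearson r| with chronological age.

    distances: (n_children, n_measurements) array of user-measured distances.
    Returns the n_keep index pairs (i, j) for the ratios distances[:, i] / distances[:, j].
    """
    scores = {}
    for i, j in itertools.combinations(range(distances.shape[1]), 2):
        ratio = distances[:, i] / distances[:, j]
        scores[(i, j)] = abs(np.corrcoef(ratio, ages)[0, 1])
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]
```

The selected ratios would then form the input vector to a trained regressor; any weakly correlated ratio can still contribute when combined non-linearly, which is the property the neural network exploited.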
