Table 4.5: Clinical studies summary
4.8. Use of image simulation for evaluation of detector performance
The use of phantoms in studies is important for developing our understanding of the processes of cancer detection in mammography but, as discussed, phantoms have limitations. It is therefore vital that studies are undertaken using radiologists in a realistic clinical scenario with clinical images. However, it is difficult to obtain a controlled set of images for a clinical study from a screening program. It is of interest to obtain images with a range of image qualities associated with different detectors, but it may not be feasible or even ethical to image women multiple times. One solution is to adapt existing images.
The adaptation of images has a long history, and different methods have been used. Van Metter et al (1986) simulated images using knowledge of the characteristics of screen/film imaging receptors without reference to an original image. They added Poisson noise and blurred the created image by convolution with a point spread function. If only unblurred Poisson noise is added to images, the variance of the noise can be correct, but the NPS of the simulated images will differ from that of real images (Veldkamp et al 2009). A more accurate method is to use a frequency-based measure of noise (such as the NPS) to ensure that the correct correlation of noise is produced (Saito et al 2012, Saunders, Jr. and Samei 2003, Svalkvist and Bath 2010, Treiber et al 2003, Workman 2005, Yip et al 2010). To simulate an image fully from an original image, the sharpness of the image also needs to be adapted. Saunders and Samei (2003) showed a method for adapting noise and sharpness from a high quality, highly sampled image. Generally, these methods have been used successfully to simulate dose reduction for a given detector, and any validation was undertaken using the NPS. However, the NPS does not fully quantify image quality, although it may be sufficient where only noise is added to simulate a lower dose. A more complete validation of the image quality of simulated images can be achieved using contrast detail test objects (Smans et al 2010, Yip et al 2010).
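The difference between adding white noise and adding noise with the correct frequency content can be illustrated with a minimal sketch. It is not the method of any of the cited authors: the Gaussian-shaped NPS, the function names and the parameter `f0` are assumptions chosen only for illustration. White Gaussian noise is filtered in the frequency domain by the square root of a target NPS, so that the result has both the required pixel variance and the required correlation.

```python
import numpy as np

def radial_frequency(shape):
    """Radial spatial frequency (cycles/pixel) in NumPy FFT layout."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.hypot(fy, fx)

def make_nps(shape, variance, f0=0.2):
    """Hypothetical low-pass NPS, scaled so that its mean equals the pixel variance."""
    profile = np.exp(-(radial_frequency(shape) / f0) ** 2)
    return variance * profile / profile.mean()

def correlated_noise(nps, rng):
    """Filter unit-variance white noise so its expected periodogram follows `nps`."""
    white = rng.standard_normal(nps.shape)
    shaped = np.fft.fft2(white) * np.sqrt(nps)
    return np.fft.ifft2(shaped).real

def periodogram(noise):
    """Single-realisation NPS estimate; its mean over all frequencies is the pixel power."""
    return np.abs(np.fft.fft2(noise)) ** 2 / noise.size

rng = np.random.default_rng(0)
shape = (256, 256)
target_variance = 4.0  # assumed value, arbitrary units

nps = make_nps(shape, target_variance)
correlated = correlated_noise(nps, rng)
# White noise with the same variance but a flat power spectrum
white = rng.normal(0.0, np.sqrt(target_variance), shape)

print("variance (correlated, white):", correlated.var(), white.var())
```

Both noise fields have approximately the same variance, but only the filtered field has a power spectrum that falls with frequency, which is the discrepancy that a variance-only check would not reveal. In practice the target NPS would be measured for the detector and dose being simulated rather than assumed.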
Studies with real clinical images with inserted calcification clusters and masses have been undertaken by various authors (Ruschin et al 2007, Samei et al 2007, Timberg et al 2006). In these studies, all of the images were effectively reduced in dose by the addition of noise. The results demonstrated that detection of the calcification clusters was sensitive to dose but that detection of the masses was not. Samei et al (2007) showed that mass detection remained the same for a dose reduction to a quarter, but that the discrimination between malignant and benign lesions was significantly reduced. Saunders et al (2007) changed not only the dose but also the sharpness of the images and found that noise had the larger effect on diagnostic performance. One weakness of two of the studies (Samei et al 2007, Saunders, Jr. et al 2007) is that they used the location-known paradigm and so missed the search element of the clinical task. It is necessary to include search for a full evaluation of the ability of a reader to detect and correctly interpret lesions in images (Chakraborty 2011).
Currently, the literature contains descriptions of methods for adding noise to simulate images at a reduced dose, or for modifying highly sampled images to simulate differences in sharpness properties (Saunders, Jr. and Samei 2003, Smans et al 2010, Svalkvist and Bath 2010, Treiber et al 2003, Van Metter et al 1986, Veldkamp et al 2009, Workman 2005, Yip et al 2010). However, there is no method to change acquired images to appear with a range of image qualities.
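As an illustration of what a sharpness modification involves, the following sketch filters an image by the ratio of a target MTF to a source MTF. It is a simplified example under stated assumptions, not the method of the cited papers or of this thesis: both MTFs are taken to be Gaussian, and `gaussian_mtf`, `apply_filter`, `adapt_sharpness` and the blur widths are hypothetical names and values.

```python
import numpy as np

def gaussian_mtf(shape, sigma_px):
    """Assumed MTF of a Gaussian blur with standard deviation sigma_px (pixels)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))

def apply_filter(image, transfer):
    """Multiply the image spectrum by a real, symmetric transfer function."""
    return np.fft.ifft2(np.fft.fft2(image) * transfer).real

def adapt_sharpness(image, mtf_source, mtf_target):
    """Change the sharpness from the source system to the (less sharp) target system."""
    return apply_filter(image, mtf_target / mtf_source)

shape = (128, 128)
mtf_source = gaussian_mtf(shape, 0.5)  # sharper system (assumed)
mtf_target = gaussian_mtf(shape, 1.5)  # less sharp system (assumed)

# A point object imaged by the source system
delta = np.zeros(shape)
delta[64, 64] = 1.0
source = apply_filter(delta, mtf_source)

adapted = adapt_sharpness(source, mtf_source, mtf_target)
direct = apply_filter(delta, mtf_target)  # same object imaged by the target system
print("adapted matches direct:", np.allclose(adapted, direct))
```

The ratio filter is only well behaved when the target system is less sharp than the source, since otherwise high frequencies are amplified. It also filters any noise already present in the source image, so the noise has to be treated separately to obtain the correct NPS.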
4.9. Discussion
4.9.1. Differences found between technologies
There are clear differences in the detection of calcification clusters between detector types. The detection of lesions without calcifications is less sensitive to image quality, and only a few of the above studies found differences for these lesions. This matches some of the studies using test phantoms (Huda et al 2006, Kotre 1998, Saunders, Jr. et al 2007). However, there must be a detector effect on the detection of non-calcification lesions, as a few studies did show differences for invasive cancers.
The results are clear that the cancer detection rate for powder phosphor CR is inferior to that for DR. To mitigate this effect, CR systems need to operate at much higher doses than DR to obtain equivalent detection rates. CR NIP has been measured to have a better image quality than powder phosphor CR (Young et al 2009), but no clinical studies have investigated the use of CR NIP. Yaffe et al (2013) discussed the need for a clinical study on this technology. Using current techniques, it will take a number of years either to organise a prospective trial or to accumulate sufficient cancers in one geographic region to obtain significant results.
There have been many studies comparing SFM with one or two digital technologies. In reality, these types of studies are becoming less frequent, because the number of suitable sites decreases as sites decide to convert to digital technologies. Studies comparing digital technologies are therefore of interest. Significant differences have been shown between good quality DR and powder phosphor CR, but it may be challenging to distinguish real differences between some digital technologies using retrospective studies.
4.9.2. Ideal study
It is clear that clinical studies are required to provide information on cancer detection. However, screening studies require very large populations and a timescale of years. The challenges faced in these types of studies are the confounding factors, such as the images being of different women and the effect of compression on the appearance of lesions. For studies of the effect of detector type on cancer detection, the grid type and radiographic factors used with different systems further increase the number of variables. In summary, an ideal study would have the following features:
• Paired study (same women in each study arm)
• Same compression
• Same radiologists read all sets of images
• Sampled from same populations
• Known dose
• Realistic cancer prevalence
The study presented in chapter 8 has all of these features apart from realistic cancer prevalence. It is not practical to run this study with realistic cancer prevalence; however, owing to the reduction in variables, the statistical power of this study is improved compared to published work.
4.10. Conclusions
It is necessary to speed up the process of judging whether a technology is appropriate for breast screening. Retrospective and prospective screening studies have a large number of confounding factors. Phantom studies can be undertaken relatively quickly, but they lack a real link to clinical effectiveness; indeed, most such studies focus on the possibility of dose reduction rather than on clinical effectiveness. Studies that involve real clinical images may solve many of the issues of realism, though the methods currently in the literature only change the dose of the images, and there is no method for changing the sharpness or type of noise with a view to simulating particular detectors.
This thesis shows a method for the conversion of clinical images (chapter 7) and then applies it to a clinical study for four different mammography detectors (chapter 8). The aim of this work is to extend the simulation methodology by developing and testing a method to convert an image acquired using a standard digital mammography imaging system to appear as if acquired on a system with a different detector. The advantage is that clinical images acquired during routine screening can be converted to a wide range of image qualities for observer studies. This method will allow studies that meet most of the requirements for an ideal study, and speed up the clinical evaluation of new systems.