Table 8.13: Difference in AUC (p-values) between study arms for

Comparison of arms    Difference in AUC
1-2                   0.013 (0.17)
1-3                   0.043 (0.0001*)
1-4                   0.041 (0.0001*)
2-3                   0.029 (0.0043*)
2-4                   0.028 (0.0063*)
3-4                   -0.0014 (0.88)

* Significant difference (p-value < 0.05)
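The entries of table 8.13 can be checked mechanically. The sketch below is illustrative only: it transcribes the printed values, lists the pairs below the 0.05 threshold, and tests whether the pairwise differences are mutually consistent (for example, that the 1-3 difference is close to the sum of the 1-2 and 2-3 differences). The function and variable names are hypothetical and are not part of the original analysis.

```python
# Values transcribed from table 8.13: (difference in AUC, p-value) per arm pair.
table_8_13 = {
    (1, 2): (0.013, 0.17),
    (1, 3): (0.043, 0.0001),
    (1, 4): (0.041, 0.0001),
    (2, 3): (0.029, 0.0043),
    (2, 4): (0.028, 0.0063),
    (3, 4): (-0.0014, 0.88),
}

ALPHA = 0.05  # significance threshold stated under the table


def significant_pairs(table, alpha=ALPHA):
    """Return the arm pairs whose p-value is below alpha, in sorted order."""
    return sorted(pair for pair, (_, p) in table.items() if p < alpha)


def additivity_gap(table, a, b, c):
    """Direct a-c difference minus the sum of the a-b and b-c differences.

    If every entry derives from a single AUC per arm, the gap should be
    near zero, apart from rounding in the printed values.
    """
    return table[(a, c)][0] - (table[(a, b)][0] + table[(b, c)][0])


if __name__ == "__main__":
    print("significant:", significant_pairs(table_8_13))
    for triple in [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]:
        print(triple, round(additivity_gap(table_8_13, *triple), 4))
```

Under these assumptions the four pairs that involve one arm from {1, 2} and one from {3, 4} are flagged, and the additivity gaps stay within the rounding of the printed values.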

8.9.8. Learning and memory effects

In these types of studies learning and memory effects may influence the results:

Learning effect: The observer may improve their score for later images as they become more familiar with the task. Therefore, the observers participated in a pilot study before the main study to improve their familiarity with the task. Also, the observers were asked to undertake a few practice cases before each session.

Memory effect: The observer may be able to remember some of the images and therefore the existence or even the location of lesions. A minimum interval of 14 days between seeing the same image was implemented to reduce this effect. Each observer was interviewed at the end of the study, and some of the observers stated they could remember some of the more memorable cases.

To reduce the effects of learning and memory on the results of the study, the observers viewed the images in a different order from each other. It is of interest to test whether the learning and memory effects can be measured in the data. The FROC data were reorganised such that the first set contains the results from the cases that each observer saw first, and so on to the 4th set, which contains the last viewing of the cases. Each of these new sets therefore contains a mixture of image qualities and lesion types. The AUC for each set, by order of viewing the cases, is given in table 8.14. There is no obvious pattern to the results.

Order of viewing    Calcifications    Non-calcifications
1st                 0.696 (0.025)     0.659 (0.028)
2nd                 0.705 (0.025)     0.650 (0.027)
3rd                 0.695 (0.024)     0.653 (0.029)
4th                 0.717 (0.025)     0.658 (0.028)
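The reorganisation by order of viewing can be sketched as follows. This is a minimal illustration under assumed data structures, not the procedure used in the study: it assumes each reading is recorded as a hypothetical (observer, case_id, arm) tuple, listed in the chronological order in which each observer read the images, and it assigns every reading to a set according to how many times that observer had already seen that case.

```python
from collections import defaultdict


def split_by_viewing_order(viewings):
    """Group readings by the viewing number of the case for that observer.

    `viewings` is an iterable of (observer, case_id, arm) tuples in the
    chronological order of reading. The result maps the viewing number
    (1 = first time the observer saw the case) to the readings in that set.
    """
    seen = defaultdict(int)    # (observer, case_id) -> number of viewings so far
    sets = defaultdict(list)   # viewing number -> readings in that set
    for observer, case_id, arm in viewings:
        seen[(observer, case_id)] += 1
        sets[seen[(observer, case_id)]].append((observer, case_id, arm))
    return dict(sets)


if __name__ == "__main__":
    # Hypothetical readings: two observers, two cases, arms in different orders.
    example = [
        ("A", 1, 3), ("A", 2, 1), ("A", 1, 1), ("A", 2, 4),
        ("B", 1, 2), ("B", 1, 4),
    ]
    for order, readings in sorted(split_by_viewing_order(example).items()):
        print(order, readings)
```

Because the observers saw the arms in different orders, each resulting set mixes image qualities, which is why a trend across the sets would point to learning or memory rather than to the detector.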

The F-test (table 8.15) showed that there are no significant differences between these sets. Therefore, if there were learning and memory effects on the results, they were too small to be measured with these data.

Study                 F-value    p-value
Calcifications        0.74       0.54
Non-calcifications    0.3        0.82
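As a rough plausibility check on table 8.15, the p-values can be approximated from the F-values. The degrees of freedom are not stated in the text, so the sketch below makes two assumptions: the numerator degrees of freedom are 3 (four viewing-order sets), and the denominator degrees of freedom are large, in which case 3F approximately follows a chi-square distribution with 3 degrees of freedom. The function names are hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Survival function of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


def f_test_p_large_denominator(f_value, numerator_df=3):
    """Approximate p-value of an F statistic for a large denominator df.

    As the denominator df grows, numerator_df * F tends to a chi-square
    distribution with numerator_df degrees of freedom. Only the assumed
    case numerator_df = 3 is implemented in this sketch.
    """
    if numerator_df != 3:
        raise ValueError("this sketch only covers numerator_df = 3")
    return chi2_sf_3df(numerator_df * f_value)


if __name__ == "__main__":
    for study, f_value in [("Calcifications", 0.74), ("Non-calcifications", 0.3)]:
        print(study, round(f_test_p_large_denominator(f_value), 2))
```

Under these assumptions the approximation gives roughly 0.53 and 0.83, close to the tabulated 0.54 and 0.82. A finite denominator would shift the values slightly, so this is only a consistency check and not a reproduction of the original test.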

8.10. How would detector type affect the cancer detection rate in breast screening?

This section estimates the effect detector technology could have on cancer detection in a breast screening programme. The results cannot be used to estimate the cancer detection rate in a screening programme but can be used to estimate the differences in cancer detection rate for different detector types.

8.10.1. Differences between this study and reporting for breast screening

Clearly, there are differences between a screening programme and this study. The study was undertaken with only one view rather than the standard two-view mammography. The main reason for this was that the calcification template was two-dimensional and so could not be accurately inserted into two views of the same breast. Mammography reporting using two views is the standard in breast screening in the NHS. The observers did state that they missed having a second view and that this resulted in a number of lesions being marked that probably would not have been if the second view had been available. Also, during screening reporting, the reader has access to any previous images to check for changes to the breast; no previous images were available here, and so this study can be considered equivalent to the prevalent round of screening. Due to time and financial constraints, it was necessary with this type of study to have a high prevalence of cancers compared to the standard screening process. It has been shown that, for these types of study, both the detection rate and the false recall rate are higher than if the images had been seen during screening (Evans et al2Q\2>, Hendrick et al 2008). This has been attributed to the prevalence effect, i.e. the observer is expecting to mark a lot of cancers and so marks cancers that they would not have recalled during screening. To aid the analysis and statistics, the observers were asked to make marks on any suspicious area even if they believed that the lesion was unlikely to be malignant. This may also have increased the cancer detection rate in the study.

8.10.2. Published comparisons of breast cancer detection for DR and CR detectors

Studies reviewing screening data from breast screening programmes are discussed in detail in chapter 4. There are three papers (Bosmans et al 2013, Chiarelli et al 2013, Séradour et al 2014) that compare DR detectors (equivalent to a combination of arms 1 and 2) and powder phosphor CR detectors (equivalent to arm 4) in large screening programmes. Table 8.16 shows a summary of the difference in cancer detection for DCIS and invasive cancers, where DCIS are mostly seen as calcification clusters and invasive cancers are mostly seen as non-calcification lesions.

The results show that CR had a much lower cancer detection rate than DR, but the difference can be reduced by using higher doses for CR (Bosmans et al 2013). Warren et al (2012) have shown that the detection of calcification clusters is sensitive not only to the detector but also to the dose. The relative doses for DR and CR in the Ontario and BdR studies are therefore important. The Ontario study was undertaken over 2008 and 2009. In 2009, a dose survey showed that the mean of the average glandular doses was only 17% higher for CR than DR (Yaffe et al 2013). The BdR study did not state the dose of the systems. However, all of the systems were tested using the EC guidelines (EC 2006), and it might be expected that the CR would have been set up with a higher dose.

In document The Effect of Image Quality On Cancer Detection in Mammography. (Page 158-161)