Semi-quantification - (I123)FP-CIT reporting: Machine Learning, Effectiveness and Clinical Integration

1 Introduction

1.2 Semi-quantification

Semi-quantification enables an objective assessment of an image to be performed, which is designed to help clinicians better and more consistently assess nigrostriatal dopaminergic function. Numerous commercial software solutions are available, including DaTQUANT (GE Healthcare) and BRASS (Hermes Medical).

Semi-quantification involves measurement of tracer uptake within regions of interest, placed over structures that are key to differential diagnosis (i.e. the striatum or subsections of the striatum such as the putamen and caudate, see Figure 1-2 for a typical example). The average voxel intensity (and hence tracer uptake concentration) within these regions is usually compared to that of another region of the brain with low uptake, which represents non-specific uptake of the tracer. The ratio of the two values gives the specific to non-specific uptake ratio, or striatal binding ratio (SBR). In this thesis SBR is calculated according to:

$$\mathrm{SBR} = \frac{C_S - C_B}{C_B} \qquad \text{(Eq 1.1)}$$

where C_S refers to the mean count level within a striatal region (or sub-region), which may be defined on a full 3D volume or on summed 2D slices, and C_B refers to the mean count level within a background region, such as the occipital lobe. In addition, other ratios are often calculated as part of semi-quantitative analysis, such as left to right asymmetry ratios and caudate to putamen ratios. The regions of interest used to define the boundaries of striatal uptake are often small and are typically defined on a chosen template image. Each test image is then usually registered to the template so that the regions of interest can be applied automatically. Alternative methods have also been proposed. For example, the Southampton method (39) applies a wide region of interest around each individual striatum, using manual placement. Background, non-specific uptake is estimated from the remainder of the brain.
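As an illustration of the ratios described above, the following is a minimal sketch in Python. It assumes the commonly used form SBR = (C_S - C_B) / C_B and assumes that the voxel counts for each region have already been extracted (for example after registration to a template). The function names and the simple left/right and caudate/putamen quotients are illustrative assumptions, not the definitions used by any particular commercial package.

```python
import numpy as np


def striatal_binding_ratio(striatal_counts, background_counts):
    """Return (C_S - C_B) / C_B from the voxel counts of two regions.

    Assumed form of the SBR; C_S and C_B are the mean counts in the
    striatal and background regions respectively.
    """
    c_s = float(np.mean(striatal_counts))
    c_b = float(np.mean(background_counts))
    if c_b <= 0:
        raise ValueError("background mean count must be positive")
    return (c_s - c_b) / c_b


def simple_ratio(numerator_sbr, denominator_sbr):
    """Quotient of two SBRs, e.g. left/right or caudate/putamen.

    Hypothetical helper: other asymmetry definitions are also in use.
    """
    if denominator_sbr == 0:
        raise ValueError("denominator SBR must be non-zero")
    return numerator_sbr / denominator_sbr


# Illustrative (made-up) voxel counts, not patient data.
left_putamen = np.array([30.0, 34.0, 32.0])
right_putamen = np.array([20.0, 22.0, 24.0])
occipital = np.array([8.0, 10.0, 12.0])

left_sbr = striatal_binding_ratio(left_putamen, occipital)
right_sbr = striatal_binding_ratio(right_putamen, occipital)
left_right_ratio = simple_ratio(left_sbr, right_sbr)
```

Because both SBRs share the same background region, any error in the background estimate propagates to every ratio derived from it.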

Figure 1-2 Example of the regions of interest used in the calculation of SBR. Caudate regions are shown in white, putamen regions in yellow and the region covering the occipital lobe in green

Whichever particular method is used to define and place regions of interest, the calculated SBRs (and other ratios) are usually provided to the clinician alongside data on expected values for 'normal' (and possibly 'abnormal') patients, where 'normal' refers to either healthy controls or patients without dopaminergic deficit and 'abnormal' covers any patients with pre-synaptic dopaminergic deficit. This gives some context to the SBR figures.

One of the major reasons why interpretation of (I123)FP-CIT images can be difficult through visual analysis alone, and why semi-quantification is recommended, is that normal striatal tracer uptake is known to decline naturally with increasing patient age (40). It is difficult for a human to visualise precisely how image appearances should change with patient age, and so it can be challenging to appreciate how the tolerances on normal appearances should be adjusted for each patient. For this reason normal ranges reported with SBR results are often age-matched, for example by only considering SBRs from reference patients whose age is within +/- 5 years of that of the test patient.
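The age-matching described above can be sketched as follows. This is a minimal illustration only: the +/- 5 year window comes from the example in the text, while the use of mean +/- 2 standard deviations as the 'normal range', the function name and the reference values are assumptions made for the sketch.

```python
import statistics


def age_matched_normal_range(reference, patient_age, window=5.0, n_sd=2.0):
    """Return (lower, upper) SBR limits from age-matched normal subjects.

    reference: iterable of (age_in_years, sbr) pairs for normal subjects.
    window:    half-width of the age window in years.
    n_sd:      number of standard deviations either side of the mean
               (an assumed convention, not prescribed by the text).
    """
    matched = [sbr for age, sbr in reference if abs(age - patient_age) <= window]
    if len(matched) < 2:
        raise ValueError("too few age-matched reference subjects")
    mean = statistics.mean(matched)
    sd = statistics.stdev(matched)  # sample standard deviation
    return mean - n_sd * sd, mean + n_sd * sd


# Made-up reference database of (age, SBR) pairs.
normals = [(60, 3.0), (62, 2.8), (64, 3.2), (80, 2.0), (40, 4.0)]
lower, upper = age_matched_normal_range(normals, patient_age=62)
```

A narrow window keeps the comparison age-appropriate but leaves fewer reference subjects, so the estimated limits become less stable; the choice of window is a trade-off rather than a fixed rule.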

Another justification often presented for the use of semi-quantification software in clinic is that in a minority of cases nigrostriatal deficit can manifest as balanced loss of DaT throughout the striatum, as mentioned previously, maintaining comma-shaped striatal appearances on reconstructed images even at advanced stages of disease. In these cases reporters must examine the contrast between voxel intensities within striatal structures, as compared to non-specific uptake in the rest of the brain, in order to identify that disease is present. Appreciating the exact intensity threshold (and hence display colour) of background tissues that indicates abnormality can be difficult. The fact that striatal tissues maintain a classic normal shape could be sufficient to distract the reporter from making the correct interpretation. Semi-quantification is easily able to highlight these 'balanced loss' cases, as SBRs are simply a ratio of counts within striatal regions as compared to non-specific uptake regions.

1.2.1 Impact on clinical performance

A number of studies have previously sought to estimate the added value that semi-quantification brings. These data give a useful indication as to the level of performance gain that may be possible with image analysis tools, and may provide some justification for pursuit of more sophisticated machine learning solutions.

Albert and colleagues (41) examined 62 historical patient datasets, where SPECT imaging had originally been reported as inconclusive. Reference diagnosis was established from clinical follow-up. Following re-reconstruction with different parameters, each image was reported visually by 2 reporters and then semi-quantification was performed using BRASS. Any study where SBR figures were more than 2 standard deviations below the mean of an age-matched normal comparison set was considered abnormal. The accuracy of visual analysis alone was found to be 89%, in line with many of the studies highlighted in section 1.1.5. Accuracy from semi-quantification alone was 85%. Where semi-quantification and visual analysis were in concordance the accuracy was 94%, evidence that, if in agreement, semi-quantification may add confidence to visual analysis.
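The two ideas in this study design, a standard-deviation cut-off against a normal comparison set and accuracy measured only in the cases where both methods agree, can be sketched as below. The threshold interpretation (abnormal if the SBR lies more than 2 standard deviations below the normal mean), the function names and the example reads are assumptions for illustration and do not reproduce the data of the cited study.

```python
def is_abnormal(sbr, normal_mean, normal_sd, n_sd=2.0):
    """Flag an SBR lying more than n_sd standard deviations below the mean."""
    return sbr < normal_mean - n_sd * normal_sd


def concordant_accuracy(visual, semiquant, reference):
    """Accuracy restricted to cases where visual and semi-quant reads agree.

    All three arguments are equal-length sequences of booleans
    (True = abnormal). Returns (accuracy, number_of_concordant_cases).
    """
    agreed = [(v, r) for v, s, r in zip(visual, semiquant, reference) if v == s]
    if not agreed:
        raise ValueError("no concordant cases")
    correct = sum(v == r for v, r in agreed)
    return correct / len(agreed), len(agreed)


# Made-up reads for five cases (True = abnormal).
visual_reads = [True, True, False, False, True]
semiquant_reads = [True, False, False, False, True]
reference_dx = [True, True, False, True, True]

accuracy, n_concordant = concordant_accuracy(
    visual_reads, semiquant_reads, reference_dx
)
```

Note that concordant accuracy says nothing about the discordant cases, which are excluded from the denominator and still need to be resolved by the reporter.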

Along similar lines, Ueda and colleagues (42) and Suarez-pinera and colleagues (43) examined retrospective clinical data to compare the performance of semi-quantitative software with that of visual analysis alone, and then examined results from the two approaches combined. Ueda found that visual analysis had a higher sensitivity but equal specificity to semi-quantification, and that a combined approach (where results agreed) gave an even higher sensitivity (96.7%) than either in isolation (42). Suarez-pinera found no significant difference between semi-quantification and visual analysis, and found no added performance benefit from combining the two approaches (43). However, the dataset used in this case was small (32 cases), limiting the chances of measuring significant differences between approaches. In both of these studies, the optimum cut-off for the semi-quantification classification was defined from the same data to which it was applied to measure classification performance. Therefore, performance figures are likely to represent an overestimate.
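The reason such figures tend to be optimistic can be shown with a small sketch: a cut-off tuned to maximise accuracy on one dataset fits that dataset's quirks, so accuracy on independent cases is typically lower. The exhaustive midpoint search, the function names and the toy values below are assumptions chosen purely for illustration.

```python
def accuracy(sbrs, abnormal, cutoff):
    """Fraction of cases correctly classified when SBR < cutoff means abnormal."""
    predictions = [s < cutoff for s in sbrs]
    return sum(p == a for p, a in zip(predictions, abnormal)) / len(sbrs)


def best_cutoff(sbrs, abnormal):
    """Pick the cut-off (midpoint between sorted SBRs) with highest accuracy."""
    values = sorted(set(sbrs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(midpoints, key=lambda c: accuracy(sbrs, abnormal, c))


# Made-up data used to tune the cut-off.
tuning_sbrs = [1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
tuning_abnormal = [True, True, True, False, False, False]

# Made-up independent cases, not seen during tuning.
independent_sbrs = [1.2, 2.1, 1.9, 2.5]
independent_abnormal = [True, True, False, False]

cutoff = best_cutoff(tuning_sbrs, tuning_abnormal)
apparent_accuracy = accuracy(tuning_sbrs, tuning_abnormal, cutoff)
independent_accuracy = accuracy(independent_sbrs, independent_abnormal, cutoff)
```

In this toy example the apparent accuracy on the tuning data is perfect, while the same cut-off misclassifies half of the independent cases; real datasets would show a smaller but analogous gap.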

Focusing on studies where reporters were exposed to semi-quantitative output, there is again a collection of relatively small scale investigations in the literature. The largest such study included 304 cases from previous clinical trials, using clinical diagnosis as the reference standard. Each case was read by 5 reporters with limited clinical experience, first using visual analysis alone and then again with semi-quantification results available (44). It was found that sensitivity was almost identical between the two approaches and that the introduction of semi-quantification increased mean specificity slightly (from 87.9% to 89.9%). Interestingly, the mean confidence score of the reporters increased significantly when the semi-quantification results were available as compared to when performing visual analysis alone, suggesting that an advantage of semi-quantification may be in decreasing diagnostic uncertainty.

Two other studies of semi-quantification performance were carried out based on similar assumptions. Soderlund and colleagues (45) and Pencharz and colleagues (46) examined the variability in reporting both with and without the assistance of semi-quantification software. Soderlund, using a dataset of 54 historical cases, found that mean inter-reporter agreement was kappa = 0.8 for visual analysis alone. This is similar to the variability results found in section 1.1.5. When reporters were given access to SBR results kappa increased to 0.86. When both SBR results and caudate-to-putamen ratios were provided to reporters, the variability between them reduced further (kappa = 0.95) (45). Pencharz, using 109 historical patient cases, found that there was no difference in accuracy between visual analysis alone and visual analysis combined with semi-quantification. However, they also found that the mean number of cases per reporter that were reported as equivocal reduced from 10.6 to 3.6 after introduction of semi-quantification results (46).
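For readers unfamiliar with the kappa statistic quoted above, the following minimal sketch computes Cohen's kappa for two reporters. Using the two-reporter Cohen form is an assumption made for illustration; the cited studies involve several reporters and may have used a different variant or an average over pairs. The example reads are made up.

```python
from collections import Counter


def cohens_kappa(reads_a, reads_b):
    """Cohen's kappa: chance-corrected agreement between two reporters.

    reads_a, reads_b: equal-length sequences of category labels,
    one entry per case.
    """
    if len(reads_a) != len(reads_b) or len(reads_a) == 0:
        raise ValueError("both reporters must read the same, non-empty case list")
    n = len(reads_a)
    observed = sum(a == b for a, b in zip(reads_a, reads_b)) / n
    freq_a = Counter(reads_a)
    freq_b = Counter(reads_b)
    # Agreement expected by chance from each reporter's category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reporters used a single, identical category
    return (observed - expected) / (1.0 - expected)


# Made-up reads for ten cases (1 = abnormal, 0 = normal).
reporter_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
reporter_2 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(reporter_1, reporter_2)
```

Here the two reporters disagree on one case in ten (90% raw agreement), but because half of that agreement would be expected by chance, kappa is lower than the raw figure.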

These results, taken together, indicate that semi-quantification offers some benefit in clinical practice (its usefulness in clinical trials is not considered). There is no compelling evidence of a significant increase in sensitivity or specificity as a result of introducing semi-quantification to the reporting process. However, it does appear that when semi-quantification and visual analysis agree, the diagnostic accuracy of the combined results is likely to be very high. When used by image reporters, semi-quantification seems to increase confidence in image reports and there is evidence that inter-observer variability reduces as a result. These findings may partly explain why semi-quantification continues to be in routine clinical use, particularly in Europe. Conversely, the relatively modest gains achievable with semi-quantification may explain why SNM guidelines suggest that semi-quantification is not an absolute necessity (32).

Semi-quantification is an imperfect tool for assisted image reporting. Firstly, due to the small, tight regions of interest that are often used, results usually rely on accurate registration of the test image to a template. Small errors in registration can cause large differences in the quantities measured. Secondly, semi-quantification results are usually provided to clinicians in the form of multiple SBR results (and possibly other ratio figures), each with an associated normal range or suggested normal / abnormal cut-off value. The clinician must interpret each of the SBR scores in light of normal ranges to come to an overall decision on patient diagnosis. Therefore, there is still a significant amount of interpretation required by the reporter after image analysis. Thirdly, semi-quantification is a relatively crude classification tool. It takes no account of the shape of striatal uptake or the distribution of voxel values, or any other image features which could be affected by disease processes. Finally, it is well known that semi-quantification is highly sensitive to differences in gamma camera equipment, scanning protocols and reconstruction methods (47–50). This sensitivity is likely to be more pronounced than the corresponding effects on visual analysis (as humans are less likely to be distracted by a slight difference in noise, for example). As a consequence, individual hospitals may need to define their own normal ranges for SBR figures.

For these reasons there are benefits to be obtained from improved (I123)FP-CIT reporting software. Machine learning algorithms may be able to overcome some or all of the limitations associated with conventional semi-quantification methods and, given the level of research activity in this area, it is hypothesised that established machine learning technology is already sufficiently mature to offer improved performance in clinic. This work focuses on selection, implementation and evaluation of machine learning software to establish whether such systems offer effective diagnostic support to reporters. To this end, the following sections give an overview of machine learning algorithms along with a summary of the techniques applied to (I123)FP-CIT SPECT imaging in the recent literature, before setting out the aims of this work. Although the focus of much of the following section is on machine learning, there is no aspiration to develop a completely new algorithm; the main goal is to critically evaluate existing techniques in a clinical reporting scenario.
