Receiver Operating Characteristic (ROC) analysis

3. Extended Results

3.9 Receiver Operating Characteristic (ROC) analysis

When considering the predictive accuracy of a test, statistics derived from ROC analysis are the preferred indices of predictive accuracy and effect size (Swets, Dawes, & Monahan, 2000; Harris, 2003; Linden, 2006; Craig, Browne, Stringer, & Hogue, 2008). ROC curves are valuable tools for assessing the accuracy of a test by comparing it with a definitive 'gold standard' (reference standard) test (Obuchowski & McClish, 1997). The 'gold standard' test specifies a cut-off point that distinguishes normal values (negative cases) from abnormal values (positive cases), thereby indicating the absence or presence of disease. If a less extreme cut-off is used, more patients are indicated as positive cases, thus improving the sensitivity of the test (i.e. the probability of correctly concluding that the disease is present in diseased patients), but at the expense of deteriorating specificity (i.e. the probability of correctly concluding that the disease is absent in healthy patients). ROC curves are used to describe the possible combinations of sensitivity and specificity, depending on the cut-off point that is chosen (Hout, 2003).
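The trade-off described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a case is called positive when its score is at or above the cut-off; the function name `sens_spec` and the data are hypothetical and are not taken from the study.

```python
def sens_spec(scores, is_positive, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is called positive.

    Assumption for illustration: higher scores indicate the positive state.
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_positive):
        predicted = score >= cutoff
        if truth and predicted:
            tp += 1          # diseased, correctly flagged
        elif truth and not predicted:
            fn += 1          # diseased, missed
        elif not truth and predicted:
            fp += 1          # healthy, wrongly flagged
        else:
            tn += 1          # healthy, correctly cleared
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and reference-standard status (True = positive case)
scores = [2, 3, 4, 5, 6, 7, 8, 9]
is_positive = [False, False, False, True, False, True, True, True]

# Moving to a less extreme cut-off raises sensitivity and lowers specificity
for cutoff in (8, 6, 4):
    sens, spec = sens_spec(scores, is_positive, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

For a test on which lower scores indicate the positive state, the comparison inside the function would be reversed.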

In this study the 'gold standard' test was the MMSE. The accuracy of the alternative cognitive tests (index tests) was compared with the MMSE in terms of identifying those who are eligible for treatment with AchI and those who are not.

3.9.1 Classifier performance

Generally, both the sensitivity and specificity of a test need to be known in order to assess its usefulness for a diagnosis (Fawcett, 2006). When selecting a cut-off point, the trade-off between sensitivity and specificity was considered, as follows:

• If the threshold for identifying those eligible for AchI from the index test is lowered, the number of false positives increases (the percentage of participants who were not eligible for AchI but were incorrectly classified as eligible).

• If the threshold for identifying those eligible for AchI from the cognitive tests is raised, the number of false negatives, or misses, increases (the percentage of participants who were eligible for AchI but were incorrectly classified as not eligible).

A perfect measure would have 100% sensitivity and 100% specificity, thereby correctly identifying everyone and never misclassifying anyone (Bewick, Cheek, & Ball, 2004). In reality few measures are that accurate (Linden, 2006). The cut-off identified from each of the index tests needed to balance high specificity (>80%) with the least acceptable rate of false positives (>60%). However, there is no clear standard for what percentage of sensitivity and specificity is acceptable. The standards used in this research have also been used in previous studies (Barr, 1997; Blake, McKinney, Treece, Lee & Lincoln, 2002; Bewick et al., 2004; Lepeleire, Heyrman, Baro & Buntinx, 2005; Linden, 2006; Mitchell, 2009).

A limitation of ROC analysis is that the predictive values of the cognitive tests are highly sensitive to the prevalence of the observed outcome in the population being evaluated (Altman & Bland, 1994; Linden, 2006). When the sample has a high prevalence of the outcome, the Positive Predictive Value (PPV) increases but the Negative Predictive Value (NPV) decreases. Conversely, when the prevalence of positive cases (those who are eligible for AchI) in the sample is low, the PPV decreases and the NPV increases. The prevalence of participants who were eligible for AchI in this sample was 40%; therefore, the predictive accuracy of all the measures evaluated in this study was lower, due to the unequal prevalence rates. In addition, because of missing data, the prevalence of those eligible for AchI varied between ROC analyses according to the index test being evaluated. The prevalence of those eligible for AchI was calculated for each measure using the following metric: (A + C) / (A + B + C + D) x 100%, where A + C is the total number of reference-standard positives in Table 4.
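The dependence of the predictive values on prevalence follows from Bayes' theorem and can be sketched as below. The function name `predictive_values` and the sensitivity, specificity and prevalence figures are hypothetical illustrations, not results from this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence.

    All inputs are proportions between 0 and 1.
    """
    tp = sensitivity * prevalence                # true positives per person tested
    fn = (1 - sensitivity) * prevalence          # false negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with 80% sensitivity and 80% specificity
for prevalence in (0.2, 0.4, 0.6):
    ppv, npv = predictive_values(0.8, 0.8, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as prevalence increases, which is the pattern described in the paragraph above.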

Cut-offs were identified for all the measures that correlated with the MMSE, and these are presented individually below. Once an appropriate cut-off had been identified, based on a balance between sensitivity and specificity, the variables were transformed using these cut-offs and cross-tabulated within SPSS. A kappa analysis was conducted to establish the rate of true negatives and false positives identified using these cut-offs. For each measure, there are four possible outcomes. If an instance is positive and it is classified as positive, it is counted as a true positive; if it is classified as negative, it is counted as a false negative. If an instance is negative and it is classified as negative, it is counted as a true negative; if it is classified as positive, it is counted as a false positive. Given each measure and the test set, a two-by-two contingency table can be constructed representing the dispositions of the set of instances (Fawcett, 2006); see Table 4. Once this table has been completed, it is possible to calculate the false positive rate, the true positive rate, the sensitivity, the specificity, and the overall positive predictive value of the cut-offs on the cognitive tests, using the metrics presented in Table 5.

Table 4: Contingency table

                       Reference standard test
Index test    Positive               Negative               Total
Positive      True positives (A)     False positives (B)    A + B
Negative      False negatives (C)    True negatives (D)     C + D
Total         A + C                  B + D                  A + B + C + D
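As a minimal sketch of how the four cells of Table 4 might be counted from paired classifications, and how a kappa statistic can be derived from them, consider the following. The study itself used SPSS; the function names and data here are hypothetical, and the kappa shown is the standard Cohen's kappa for a two-by-two table.

```python
def contingency_counts(index_positive, reference_positive):
    """Count cells A, B, C, D of Table 4 from paired True/False classifications."""
    a = b = c = d = 0
    for idx, ref in zip(index_positive, reference_positive):
        if idx and ref:
            a += 1      # true positive
        elif idx and not ref:
            b += 1      # false positive
        elif not idx and ref:
            c += 1      # false negative
        else:
            d += 1      # true negative
    return a, b, c, d


def cohens_kappa(a, b, c, d):
    """Cohen's kappa: agreement between index and reference test beyond chance."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical classifications for ten participants
index_positive = [True, True, True, False, False, False, False, True, False, False]
reference_positive = [True, True, False, True, False, False, False, True, False, False]

a, b, c, d = contingency_counts(index_positive, reference_positive)
print("A, B, C, D =", (a, b, c, d))
print("kappa =", round(cohens_kappa(a, b, c, d), 3))
```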


Table 5: Metrics

Metric                             Formula
False positive rate                B / Total negatives
True positive rate                 A / Total positives
Sensitivity                        A / (A + C) x 100
Specificity                        D / (B + D) x 100
Positive Predictive Value (PPV)    A / (A + B)
Negative Predictive Value (NPV)    D / (C + D)
Discriminant ability               (Sensitivity + Specificity) / 2
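The metrics in Table 5 can be computed directly from the four cell counts of Table 4. The sketch below assumes that "Total negatives" is B + D and "Total positives" is A + C, as in Table 4; the function name `table5_metrics` and the counts are hypothetical.

```python
def table5_metrics(a, b, c, d):
    """Compute the Table 5 metrics from cells A, B, C, D of Table 4."""
    sensitivity = 100 * a / (a + c)      # percentage
    specificity = 100 * d / (b + d)      # percentage
    return {
        "false_positive_rate": b / (b + d),
        "true_positive_rate": a / (a + c),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "discriminant_ability": (sensitivity + specificity) / 2,
    }


# Hypothetical counts: A = 30, B = 10, C = 10, D = 50
metrics = table5_metrics(30, 10, 10, 50)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are returned as percentages to match the "x 100" in Table 5, so the discriminant ability is also a percentage.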

3.9.2 Area under the curve (AUC)

A ROC analysis plots the test's true positive rate (sensitivity) against its false positive rate (1 - specificity) and is constructed by estimating the sensitivity and specificity of the test at each observed participant test score. This produces a line of data points across a graph, making up the “curve”. The graph is a technique for visualising, organising and selecting classifiers based on their performance (Fawcett, 2006). The AUC is a popular summary measure of the accuracy of a test. It serves as an index of the discriminatory property of a test, so one does not have to rely solely on visual inspection to determine how well the test performs (Bewick et al., 2004; Linden, 2006). An AUC of 0.5 represents a random (non-informative) test, an AUC between 0.5 and 0.7 represents low accuracy, between 0.7 and 0.9 moderate accuracy, between 0.9 and 1 high accuracy, and an AUC of 1 would represent the ideal test (Fischer, Bachmann & Jaeschke, 2003). However, the full AUC has been criticised because it is a function of both sensitivity and specificity; the AUC therefore represents the entire range of error rates and gives equal weight to all false positive rates. When the outcome has three categories, a volume under the ROC surface of 1/6 corresponds to a test without discriminatory power, and a value of 1 indicates a perfect test.
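A minimal sketch of how an empirical ROC curve and its AUC can be obtained is given below. It assumes that a score at or above the cut-off is called positive and uses the trapezoidal rule for the area; the function names and data are hypothetical, and statistical packages may use different conventions.

```python
def roc_points(scores, is_positive):
    """Return (false positive rate, true positive rate) pairs, one per cut-off.

    Assumption for illustration: score >= cut-off is called positive.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]   # cut-off above every score: nobody is called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, is_positive) if s >= cutoff and t)
        fp = sum(1 for s, t in zip(scores, is_positive) if s >= cutoff and not t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores and reference-standard status (True = positive case)
scores = [2, 3, 4, 5, 6, 7, 8, 9]
is_positive = [False, False, False, True, False, True, True, True]

points = roc_points(scores, is_positive)
print("AUC =", auc_trapezoid(points))
```

The resulting area equals the proportion of positive-negative pairs in which the positive case has the higher score, with ties counted as one half.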

The AUC values of two or more tests can be used to compare their predictive accuracy. If one test has a higher AUC than another, this suggests that it has better predictive value and can be selected. However, caution must be taken when comparing two AUC values, because it is not possible to establish whether there is a statistically significant difference between the AUCs of two or more measures without appropriate computer software (Stephen, Wesseling, Schink & Jung, 2003; Chi & Zhou, 2008; Erkel & Pattynama, 2008).
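One way in which software can assess the difference between two AUCs measured on the same participants is a paired bootstrap percentile interval. The sketch below is only an illustration of that general idea and is not the procedure used in this study or in the cited papers; the function names, the number of resamples and the data are hypothetical.

```python
import random


def auc_rank(scores, is_positive):
    """AUC as the proportion of positive-negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, t in zip(scores, is_positive) if t]
    neg = [s for s, t in zip(scores, is_positive) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(scores_a, scores_b, is_positive, n_boot=2000, seed=0):
    """Approximate 95% percentile interval for AUC(test A) - AUC(test B).

    Participants are resampled with replacement, keeping each participant's
    two test scores and reference status together.
    """
    rng = random.Random(seed)
    n = len(is_positive)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [is_positive[i] for i in idx]
        if all(truth) or not any(truth):
            continue    # resample lacks one class, so the AUC is undefined
        auc_a = auc_rank([scores_a[i] for i in idx], truth)
        auc_b = auc_rank([scores_b[i] for i in idx], truth)
        diffs.append(auc_a - auc_b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Hypothetical scores from two index tests on the same eight participants
is_positive = [False, False, False, True, False, True, True, True]
test_a = [2, 3, 4, 5, 6, 7, 8, 9]
test_b = [5, 2, 6, 4, 3, 7, 9, 8]

lower, upper = bootstrap_auc_difference(test_a, test_b, is_positive)
print("95% interval for AUC difference:", (lower, upper))
```

An interval that excludes zero would suggest a difference between the two tests, although with a sample as small as this example the interval will be very wide.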

3.9.3 Positive and Negative Predictive Values

The positive predictive value (PPV) of a test is the probability that a patient has a positive outcome given that they have a positive test result. This is in contrast to sensitivity, which is the probability that a patient has a positive test result given that they have a positive outcome. Similarly, the negative predictive value (NPV) is the probability that a patient has a negative outcome given that they have a negative test result, in contrast to specificity, which is the probability that a patient has a negative test result given that they have a negative outcome. The PPV and NPV were calculated for each measure and the metrics are presented in Table 5.

3.10 ROC analysis for Rey Complex Figure Test – Visual

In document The assessment of dementia severity using non-verbal cognitive tests (Page 102-107)