
Chapter 3. The diagnostic test evaluation and application of Bayes' theorem to

3.2 Diagnostic tests

This section briefly describes the characteristics of diagnostic tests. First, the principles of diagnostic test evaluation and the basic notation are introduced. This is followed by a discussion of test performance, the distribution of test results, the effect of moving the cut-off point, and test accuracy. Finally, the trade-off between maximising sensitivity and specificity is briefly discussed.

3.2.1 Characteristics of diagnostic tests

Screening tests are a medical strategy used to detect a disease in individuals in the general population who have no clinical signs or symptoms of that disease but are at sufficient risk of it. Diagnostic tests are medical procedures used to confirm positive screening test results and to distinguish healthy individuals from people who have the disease (Public Health Action Support Team, 2010). Some of the key differences, such as purpose, target population and the meaning of a positive result, are shown in Table 3.1. The five main purposes of screening and diagnostic tests are: to verify a diagnosis in symptomatic patients; to screen for disease in asymptomatic patients; to provide prognostic information on patients established to have the disease; to monitor the benefits and side effects of therapy; and to confirm the absence of disease (Jyoti and Richard, 2009; Edgar et al., 1999). In this way, early intervention and management might reduce complications and mortality from a disease. However, although screening may result in an earlier diagnosis, not all screening tests have been shown to benefit the person being screened.

Over-diagnosis, misdiagnosis, and the creation of a false sense of security are some potential adverse effects of screening and diagnosis (Laking et al., 2006).

Table 3.1 Differences between screening and diagnostic tests

Aspect | Screening tests | Diagnostic tests
Purpose | To detect potential disease indicators | To verify the presence or absence of disease
Target population | Asymptomatic individuals in the general population who are at sufficient risk of the disease | Symptomatic individuals, or individuals with a positive screening test
Test method | Simple, acceptable to patients and staff | Invasive, expensive but justifiable as necessary to verify the diagnosis
Positive result | Essentially indicates suspicion of disease (often used in combination with other risk factors) that warrants confirmation | Provides a definite diagnosis
Cost | Cheap; the benefit should justify the cost, since large numbers of people will need to be screened to identify a small number of potential cases | Higher costs associated with the diagnostic test may be justified to establish the diagnosis

Source: based on Table 3.3.1 in Diagnosis and screening: differences between screening and diagnostic tests, case finding (Ruf and Morgan, 2008).

Test results may help physicians to make a diagnosis in a symptomatic patient (diagnostic testing) or to identify disease in an asymptomatic patient (screening) (Jyoti and Richard, 2009). The most common tests provide results on a continuous or quantitative scale (e.g., blood glucose level, white blood cell count). Clinicians often use these values to diagnose a condition by classifying each result as positive or negative, that is, disease present or absent, according to a criterion or cut-off point (Laking et al., 2006). Diagnostic techniques allow clinicians to allocate the right treatment to the right patient. However, diagnostic test errors, or misdiagnosis, may lead to useless and possibly harmful treatment, or may prevent or delay access to beneficial treatment (Laking et al., 2006).

3.2.2 Diagnostic test performance

The fundamental element in describing the performance of a screening or diagnostic test is the test result, which is compared with the true disease status to evaluate the accuracy of the test.

Screening and diagnostic test results are evaluated by classifying patients into two groups: one population with the disease (or condition) and the other without it (Khamis, 1987).

The clinical performance of a laboratory test can be described in terms of its diagnostic accuracy. There are four possible outcomes, depending on whether the test result is positive (T+) or negative (T-) and whether the disease is truly present (D+) or absent (D-). Table 3.2 shows the disease status of the person tested in the columns and the test result in the rows. When the disease is present and the test is positive, the outcome is classified as a true positive (TP). When the disease is absent but the test indicates that it is present, the outcome is a false positive (FP). When the disease is absent and the test confirms its absence, the outcome is a true negative (TN). When the patient has the disease but the test indicates that they do not, the outcome is a false negative (FN). Information on the accuracy of screening and diagnostic tests can therefore be summarised in a two-way table; a two-by-two table is the simplest and clearest way to calculate and summarise all the information about a diagnostic test.

Table 3.2 Two-way classification of results according to test result and disease status

Test | Disease present (D+), n | Disease absent (D-), n | Total, n
Positive (T+) | True positive (TP): a | False positive (FP): b | All positive: a + b
Negative (T-) | False negative (FN): c | True negative (TN): d | All negative: c + d
Total | All diseased persons: a + c | All non-diseased persons: b + d | All tested persons: a + b + c + d
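As an illustration of how the four cells of Table 3.2 arise, the following minimal Python sketch counts a, b, c and d from paired test results and true disease status. The function name two_by_two and the six example people are hypothetical, chosen only for illustration.

```python
from collections import Counter


def two_by_two(test_positive, disease_present):
    """Count the cells of a 2x2 table from paired booleans (hypothetical helper).

    test_positive[i]   -- True if person i tested positive (T+)
    disease_present[i] -- True if person i truly has the disease (D+)
    Returns a dict with keys a (TP), b (FP), c (FN), d (TN).
    """
    if len(test_positive) != len(disease_present):
        raise ValueError("both lists must describe the same people")
    counts = Counter(zip(test_positive, disease_present))
    return {
        "a": counts[(True, True)],    # true positive: T+ and D+
        "b": counts[(True, False)],   # false positive: T+ and D-
        "c": counts[(False, True)],   # false negative: T- and D+
        "d": counts[(False, False)],  # true negative: T- and D-
    }


# Hypothetical example: six people
tests = [True, True, False, False, True, False]
disease = [True, False, True, False, True, False]
cells = two_by_two(tests, disease)
print(cells)  # {'a': 2, 'b': 1, 'c': 1, 'd': 2}
```

The row and column totals of Table 3.2 then follow directly, for example a + b positive results and a + c diseased persons.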

3.2.3 Distributions of test results

The test results can be used to allocate people in the population to two groups: those suspected of having the disease and those thought to be free of it. Typically, test results are quantitative (e.g., white blood cell count in cases of suspected infection) and follow some type of distribution. Test results are random variables and therefore have a distribution, which may be, but is not necessarily, a normal distribution. The distributions of test results for patients with and without the disease have different means. The variation in results for patients with the disease is quite large, but a high proportion of results lie close to the mean value. A very similar pattern exists for patients without the disease, although around a different mean.

The test results of the two patient groups overlap to some extent, as can be seen in Figure 3.1. Patients with the disease are represented by the distribution on the right, and patients without the disease by the distribution on the left. Patients with the disease whose test values lie to the right of the cut-off point are TP, and those with values below the cut-off are FN; that is, they are wrongly identified as free from the disease. Similarly, patients without the disease whose test values lie below the cut-off point are TN, and those with values above it are FP; that is, they are wrongly identified as having the disease. The overlap between the two distributions is a property of the test, but how the overlapping results are divided into false positives and false negatives, and hence the false positive and false negative rates, changes when the cut-off point changes.

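Figure 3.1 Distribution of the biomarker among diseased and disease-free individuals in the population
[Figure: overlapping distributions for disease-free (left) and diseased (right) individuals; a cut-off point separates test negative from test positive results, with regions labelled true negative (TN), true positive (TP), false negative (FN) and false positive (FP).]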
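The classification rule illustrated in Figure 3.1 can be sketched in Python as follows. This assumes that values strictly above the cut-off count as test positive (how a value exactly equal to the cut-off is treated is a convention not stated in the text); the function classify and the biomarker values are hypothetical.

```python
from collections import Counter


def classify(value, diseased, cutoff):
    """Label one person's result as TP, FN, FP or TN (hypothetical helper).

    Assumption: a value strictly above the cut-off is a positive test.
    """
    positive = value > cutoff
    if diseased:
        return "TP" if positive else "FN"  # diseased: right of cut-off is TP
    return "FP" if positive else "TN"      # disease free: left of cut-off is TN


# Hypothetical biomarker values paired with true disease status
people = [(4.0, False), (6.5, False), (8.2, False),
          (7.1, True), (9.5, True), (12.0, True)]
cutoff = 8.0
outcomes = [classify(value, diseased, cutoff) for value, diseased in people]
print(outcomes)
print(Counter(outcomes))  # tally of the four outcome types
```

Here the disease-free person at 8.2 falls in the overlap and becomes a false positive, while the diseased person at 7.1 becomes a false negative; raising the cut-off above 8.2 would turn that false positive into a true negative without changing the false negative.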

3.2.4 Cut-off point

A diagnostic evaluation produces two types of data: qualitative data from clinical signs and symptoms, and quantitative data from diagnostic tests. The qualitative data identify patients as having or not having the disease according to the presence or absence of clinical signs or symptoms. The quantitative results classify patients as diseased or disease free according to whether their values fall above or below the cut-off point, so the cut-off point determines how many subjects are considered to have the disease. For continuous and ordinal tests, when the test result is above the cut-off level, the patient is assumed to have the disease; when the result falls below the cut-off point, the patient is assumed to be free of the disease (McMaster University Health and Sciences Centre, 1981; Edgar et al., 1999; Peter, 2007). Ideally, the test results of people with and without the disease would not overlap. Such a test would have perfect predictive accuracy, with no false positives or false negatives, that is, 100% sensitivity and 100% specificity (Edgar et al., 1999).

In reality, however, most tests, including the one considered here, do not meet this standard: their results overlap between patients with and without the disease. These relationships are illustrated in Figure 3.2.

Each cut-off point is associated with a specific proportion of true positive and false positive results. Cut-off point "A" gives high sensitivity (about 90%) but low specificity (about 60%). All values that fall to the left of "A" are negative; those to the right are positive. This criterion decreases the number of false negatives (increased sensitivity) but also increases the number of false positives (decreased specificity). Tests with high sensitivity are often used to screen for disease, whereas tests with low sensitivity fail to identify many patients with the disease. Screening tests tend to cast a wide net in order to pick up all cases of disease and miss no one, at the cost of some positive results in people who do not actually have the disease. Moving the cut-off point therefore changes the sensitivity and specificity of the test. Cut-off point "B" is intermediate between "A" and "C" (sensitivity 80%, specificity 80%): it gives reasonably high sensitivity and specificity, but still produces false positives and false negatives. Cut-off point "C" gives hypothetical test results with high specificity (about 95%) but limited sensitivity (about 60%). All values that fall to the right of "C" are considered positive; those to the left are negative. This criterion increases the number of FN (decreased sensitivity) but decreases the number of FP (increased specificity). Tests with high specificity are appropriate for confirming a suspected diagnosis, but cannot be used to exclude the presence of disease because of their low sensitivity.

Consequently, patients who test positive on a very sensitive screening test may test negative on a specific confirmatory test. If a test is designed to confirm a disease in a population, a cut-off point with high specificity (at the cost of lower sensitivity) is selected. On the other hand, if a test is designed for screening a general population without clinical signs or symptoms of the disease, a cut-off point with high sensitivity (at the cost of lower specificity) is selected.

Figure 3.2 Distribution of biomarker results in disease-free and diseased individuals with different cut-off points (A, B and C)
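To make the trade-off in Figure 3.2 concrete, the sketch below computes sensitivity and specificity for three cut-offs applied to a small, hypothetical set of biomarker values. The low, middle and high cut-offs play the roles of "A", "B" and "C", but the data are invented and the resulting percentages do not reproduce those quoted for the figure. The function sens_spec is a hypothetical helper, and values strictly above the cut-off are assumed to be positive.

```python
def sens_spec(values_diseased, values_free, cutoff):
    """Sensitivity and specificity at one cut-off (values above it are positive)."""
    tp = sum(v > cutoff for v in values_diseased)   # diseased, test positive
    fn = len(values_diseased) - tp                  # diseased, test negative
    tn = sum(v <= cutoff for v in values_free)      # disease free, test negative
    fp = len(values_free) - tn                      # disease free, test positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical, overlapping biomarker values
diseased = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
free = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Low ("A"-like), middle ("B"-like) and high ("C"-like) cut-offs
for cutoff in (5.5, 8.5, 11.5):
    sens, spec = sens_spec(diseased, free, cutoff)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these invented data, sensitivity falls from 100% to 70% to 40% as the cut-off rises, while specificity rises from 50% to 80% to 100%, which is the pattern described for "A", "B" and "C".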

3.2.5 Test accuracy

Diagnostic accuracy relates to the ability of a test to discriminate between people with and without the disease. This section discusses sensitivity, specificity, predictive values and likelihood ratios. Different measures of diagnostic accuracy relate to different aspects of the diagnostic procedure: some assess the discriminatory power of the test, others estimate its predictive ability, and some depend on the characteristics of the population to which the test is applied. Sensitivity, specificity, positive predictive value and negative predictive value can all be expressed as probabilities. A summary of the terms, calculation methods and definitions related to test accuracy is given in Table 3.4, using the same notation as Table 3.2.

Sensitivity and specificity

Sensitivity and specificity are two key measures of the accuracy of diagnostic tests (Altman and Bland, 1994b; NCSSM Statistics Leadership Institute, 1999). The sensitivity of a test is the probability that the test gives a positive result when the subject does in fact have the disease, and specificity is the probability that the test result is negative given that the individual tested is free from the disease. Sensitivity and specificity relate only to the characteristics of the test and are unaffected by population characteristics such as the prevalence of the disease, so they can be applied to a variety of populations (Šimundić, 2008; Altman and Bland, 1994b).

Sensitivity and specificity are of considerable importance to clinicians. Ideally, a test should have both high sensitivity and high specificity, but in practice a screening or diagnostic test with very high sensitivity may have low specificity. Clinicians can try to keep both sensitivity and specificity high, but the test will still produce some FP and FN results; in a large population, FP and FN results cannot be avoided. FP results are especially undesirable when screening for a serious disease, because clinicians do not want to tell patients that they have a serious disease when they do not actually have it (NCSSM Statistics Leadership Institute, 1999).

Sensitivity and specificity are closely related to the concepts of type I and type II errors, as shown in Table 3.3, where the null hypothesis is that the person tested does not have the disease. A perfect test would achieve 100% sensitivity, classifying everyone in the diseased group as diseased, and 100% specificity, classifying no one in the healthy group as diseased. The upper left cell of Table 3.3 (TP) corresponds to the correct decision to reject the null hypothesis when the alternative is true, and the lower right cell (TN) corresponds to the correct decision not to reject the null hypothesis when it is true. The upper right cell (FP) corresponds to rejecting the null hypothesis when it is actually true: a type I error is therefore a FP result, and a test with high specificity has a low type I error rate. The lower left cell (FN) corresponds to failing to reject the null hypothesis when it is false: a type II error is therefore a FN result, and a test with high sensitivity has a low type II error rate.

Table 3.3 Relationship of type I and type II errors to diagnostic test results

Test | Disease present (D+) | Disease absent (D-)
Positive (T+) | True positive (TP) | False positive (FP) (type I error)
Negative (T-) | False negative (FN) (type II error) | True negative (TN)
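Using the cell labels a, b, c and d of Table 3.2, and taking the null hypothesis to be that the patient is disease free, the two error rates in Table 3.3 can be written as below. The symbols α and β are the conventional names for these rates and are introduced here only for illustration.

```latex
\begin{aligned}
\alpha &= P(T{+} \mid D{-}) = \frac{b}{b+d} = 1 - \text{specificity} && \text{(type I error rate)}\\
\beta  &= P(T{-} \mid D{+}) = \frac{c}{a+c} = 1 - \text{sensitivity} && \text{(type II error rate)}
\end{aligned}
```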

Predictive values

Two further statistical relationships between test results and disease status are the positive predictive value and the negative predictive value. The positive predictive value (PPV) is the probability that a patient has the disease given a positive test result, and the negative predictive value (NPV) is the probability that a patient does not have the disease given a negative test result (Parmigiani, 2002). Unlike sensitivity and specificity, predictive values depend strongly on the prevalence of the disease in the population, so the PPV and NPV from a study should only be used in populations with the same prevalence. The higher the prevalence, the higher the PPV, while the NPV moves in the opposite direction. For example, if a clinician uses a diagnostic test in a high-prevalence population, a positive test result is more likely to be a true positive than in a low-prevalence population. Therefore, neither predictive value from one study should be applied to another setting in which the prevalence differs (Šimundić, 2008; Altman and Bland, 1994a). To overcome the problem of populations with different prevalences, the positive and negative likelihood ratios should be reported instead of the PPV and NPV, because likelihood ratios do not depend on prevalence.
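Because this chapter applies Bayes' theorem, it may help to see how prevalence enters the predictive values. The sketch below assumes a test with sensitivity and specificity of 90% (hypothetical values) and computes PPV = P(D+|T+) and NPV = P(D-|T-) from Bayes' theorem at three prevalences. The function predictive_values is illustrative, not taken from the source.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from Bayes' theorem (hypothetical helper).

    Each term is a joint probability, e.g. P(T+ and D+) = P(T+|D+) * P(D+).
    """
    tp = sensitivity * prevalence              # P(T+ and D+)
    fp = (1 - specificity) * (1 - prevalence)  # P(T+ and D-)
    fn = (1 - sensitivity) * prevalence        # P(T- and D+)
    tn = specificity * (1 - prevalence)        # P(T- and D-)
    ppv = tp / (tp + fp)                       # P(D+|T+)
    npv = tn / (tn + fn)                       # P(D-|T-)
    return ppv, npv


# Same (assumed) test, three different prevalences
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these assumed values the PPV rises from about 8% at 1% prevalence to 50% at 10% and 90% at 50%, while the NPV falls from about 99.9% to 90%, even though sensitivity and specificity are unchanged.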

Likelihood ratio

The likelihood ratio (LR) is an important measure of diagnostic accuracy. The likelihood ratio for a positive test result (LR+) tells a clinician how much more likely a positive test result is in subjects with the disease than in subjects without it. It is usually greater than 1, because a positive test result is more likely to occur in subjects with the disease than in disease-free subjects.

The likelihood ratio for a negative test result (LR-) tells a clinician how likely a negative result is in subjects with the disease relative to subjects without it. It is usually less than 1, because a negative test result is less likely to occur in subjects with the disease than in subjects without it (Greenhalgh, 1997; Šimundić, 2008). Both sensitivity and specificity are used to calculate the likelihood ratios, and LR+ and LR- are unaffected by the prevalence of the disease. Therefore, the likelihood ratios from one study can be applied in other settings, as long as the definition of the disease is not changed (Šimundić, 2008; Jyoti and Richard, 2009).
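Likelihood ratios can be combined with a pre-test probability using the odds form of Bayes' theorem (post-test odds = pre-test odds × LR). The sketch below assumes a test with sensitivity 90% and specificity 80% and a pre-test probability (prevalence) of 10%; all of these values, and the function names, are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)   # TPR / FPR
    lr_negative = (1 - sensitivity) / specificity   # FNR / TNR
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Odds form of Bayes' theorem: post-test odds = pre-test odds * LR."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)    # convert odds back to probability


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
after_positive = post_test_probability(0.10, lr_pos)
after_negative = post_test_probability(0.10, lr_neg)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"P(D+) after a positive result: {after_positive:.1%}")
print(f"P(D+) after a negative result: {after_negative:.1%}")
```

Under these assumptions LR+ is 4.5 and LR- is 0.125, so a positive result raises the probability of disease from 10% to about 33%, and a negative result lowers it to about 1.4%. Because the same LRs can be reused with any pre-test probability, this is one way to carry a test's performance over to settings with a different prevalence.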

Prevalence

The prevalence of disease in the population can be denoted by P(D+), the prior probability of a randomly selected individual from the population having the disease.

Table 3.4 Definition of terms related to test accuracy (notation as in Table 3.2)

Sensitivity or true positive rate (TPR): P(T+|D+) = a / (a + c). The proportion of patients with the disease who have a positive test result, that is, true positives divided by the sum of true positives and false negatives.

False positive rate (FPR): P(T+|D-) = b / (b + d) = 1 - specificity. The proportion of patients without the disease who have a positive test result, that is, false positives divided by the sum of false positives and true negatives.

Specificity or true negative rate (TNR): P(T-|D-) = d / (b + d). The proportion of patients without the disease who have a negative test result, that is, true negatives divided by the sum of true negatives and false positives.

False negative rate (FNR): P(T-|D+) = c / (a + c) = 1 - sensitivity. The proportion of patients with the disease who have a negative test result, that is, false negatives divided by the sum of true positives and false negatives.

Positive predictive value (PPV): P(D+|T+) = a / (a + b). The proportion of patients with a positive test result who have the disease.

Negative predictive value (NPV): P(D-|T-) = d / (c + d). The proportion of patients with a negative test result who do not have the disease.

Positive likelihood ratio (LR+): sensitivity / (1 - specificity) = TPR / FPR. The factor by which the odds of having the disease increase after a positive test result.

Negative likelihood ratio (LR-): (1 - sensitivity) / specificity = FNR / TNR. The factor by which the odds of having the disease decrease after a negative test result.

Accuracy: (a + d) / (a + b + c + d). The proportion of all tested persons who receive a correct result.

Prevalence: P(D+) = (a + c) / (a + b + c + d). The proportion of patients who have the disease (McMaster University Health and Sciences Centre, 1981).
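The calculations in Table 3.4 can be collected into one function. In this sketch the counts a, b, c and d follow Table 3.2, every row and column total (and the specificity) is assumed to be non-zero, and the example counts are invented for illustration; accuracy_measures is a hypothetical name.

```python
def accuracy_measures(a, b, c, d):
    """Table 3.4 measures from 2x2 counts a (TP), b (FP), c (FN), d (TN).

    Assumes a + c, b + d, a + b and c + d are non-zero and specificity > 0.
    """
    n = a + b + c + d
    sensitivity = a / (a + c)   # P(T+|D+)
    specificity = d / (b + d)   # P(T-|D-)
    return {
        "sensitivity": sensitivity,
        "false_positive_rate": b / (b + d),   # P(T+|D-) = 1 - specificity
        "specificity": specificity,
        "false_negative_rate": c / (a + c),   # P(T-|D+) = 1 - sensitivity
        "ppv": a / (a + b),                   # P(D+|T+)
        "npv": d / (c + d),                   # P(D-|T-)
        # LR+ is unbounded when specificity is exactly 1 (no false positives)
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity,
        "accuracy": (a + d) / n,              # correct results / all tested
        "prevalence": (a + c) / n,            # P(D+)
    }


# Hypothetical study of 1000 people
measures = accuracy_measures(a=90, b=80, c=10, d=820)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this invented example, sensitivity is 0.9 and specificity about 0.911, yet the PPV is only about 0.53 because the prevalence is 10%.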

The ideal test has very high sensitivity and specificity, so that most true disease cases are identified and most non-disease cases are excluded. In practice, a test result can be positive in patients who do not actually have the disease; the proportion of such results among disease-free patients is the false positive rate (1 - specificity). A test result can also be negative in patients who do have the disease; the proportion of such results among diseased patients is the false negative rate (1 - sensitivity). Because tests generally do not have a 100% TPR and a 100% TNR, there is a trade-off between maximising sensitivity and maximising specificity: as discussed in sections 3.2.3 and 3.2.4, the two change in opposite directions when the cut-off point changes. For example, as the cut-off point for positivity is raised, specificity increases and sensitivity decreases. Diabetes is diagnosed on the basis of a fasting blood sugar ≥126 mg/dl; if a clinician moved the cut-off point to 170 mg/dl, it would be more difficult to detect positive cases. This would make the test less sensitive (some people with diabetes do not have such high blood sugar levels) and more specific (people without diabetes may at times have blood sugar levels above 126 mg/dl, but are unlikely to reach 170 mg/dl). Conversely, lowering the cut-off point makes the test more sensitive but less specific. The cut-off point is chosen by examining sensitivity (true positive rate) against 1 - specificity (false positive rate) across a series of cut-off points. The plot of sensitivity against the false positive rate is known as the receiver operating characteristic (ROC) curve, which highlights the covariation between the two (Warner, 2004). The best diagnostic tests are those that maximise both sensitivity and specificity.
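The diabetes example can be explored with two assumed distributions of fasting blood glucose, in the spirit of Figure 3.1. The means and standard deviations below are hypothetical and are not clinical estimates; the sketch only shows that raising the cut-off from 126 to 170 mg/dl lowers sensitivity and raises specificity, and how the points of an ROC curve (false positive rate, true positive rate) are obtained by sweeping the cut-off. It uses NormalDist from Python's standard statistics module; sens_spec_at is a hypothetical helper.

```python
from statistics import NormalDist

# Hypothetical fasting blood glucose distributions (mg/dl); NOT real data
disease_free = NormalDist(mu=95, sigma=12)
diabetic = NormalDist(mu=160, sigma=35)


def sens_spec_at(cutoff):
    """Sensitivity and specificity when values above the cut-off are positive."""
    sensitivity = 1 - diabetic.cdf(cutoff)   # share of diabetics above the cut-off
    specificity = disease_free.cdf(cutoff)   # share of disease-free at or below it
    return sensitivity, specificity


for cutoff in (126, 170):
    sens, spec = sens_spec_at(cutoff)
    print(f"cut-off {cutoff} mg/dl: sensitivity {sens:.1%}, specificity {spec:.2%}")

# ROC points: (false positive rate, true positive rate) over a range of cut-offs
roc_points = []
for cutoff in range(60, 301, 10):
    sens, spec = sens_spec_at(cutoff)
    roc_points.append((1 - spec, sens))
```

With these assumed distributions, sensitivity drops from roughly 83% at 126 mg/dl to roughly 39% at 170 mg/dl, while specificity rises from about 99.5% to almost 100%. Plotting roc_points traces the ROC curve described above.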

If the detection or diagnosis of a disease involves more than one diagnostic or screening test, evaluating the diagnostic strategy requires assessing a combination of two or more tests. Decision trees are an ideal

In document The economics of diagnostic test: the cost-effectiveness of screening test for gestational diabetes mellitus in Scotland (Page 72-82)