To test or not to test: using the evidence base to calculate how useful a test is

Like everything else, diagnosis is never definitive. At best, the clinician considers the level of certainty to which they judge a condition to exist. Thus, words such as possibly, probably, likely and unlikely are often used to describe this level of certainty (Gilbert et al, 2001b). Although these words are often used as casual adjectives during clinical conversation, they do have a more specific, mathematical meaning which can assist in judging the usefulness of a test. It is with this mathematical concept in mind that the clinical utility of the VBI positional tests can be analyzed.

Straus et al (2005) define a model which can be used to assess the clinical usefulness of a diagnostic test by considering relevant published data on the accuracy of that test. This model of diagnostic utility is used extensively in medicine to assist the clinician in evidence-based decision-making, and there is a wealth of available information demonstrating the pragmatic value of such an approach (e.g. Altman & Bland, 1994a, 1994b; Deeks & Altman, 2004; Elstein & Schwarz, 2002a, 2002b; Gilbert et al, 2001a, 2001b; Johnson et al, 2001; Knottnerus et al, 2002; Round, 2001; Sackett & Haynes, 2002). Despite being open to criticism as too quantitative for healthcare decisions (Downing & Hunter, 2003), such data-driven modelling is encouraged and supported in a number of ‘human’ activities (Chae et al, 2003; Fang et al, 2003a, 2003b; Kopelman et al, 1999; Niskanen, 1999; Tien, 2003). A number of physiotherapy-specific papers have applied this model to, for example, shoulder labral tests (Stratford, 2001) and the VBI rotational test (Gross et al, 2005; Ritcher & Reinking, 2005). Information provided by the latter two papers, plus other published data, can now be used to enhance the clinical appreciation of both the clinical utility of the VBI test and the concept of probability analysis in relation to clinical practice.

As mentioned above, there are certain key words which need to be considered when assessing the diagnostic role of a test. Their conceptual mathematical meaning needs to be appreciated in order to understand how the assessment of a diagnostic test works. Davidson (2002) provides a thorough overview of these concepts, so the detail presented here will be brief and relevant to the VBI test. Table 3.2 provides simple definitions of the terms that need to be understood when determining the diagnostic utility of a test.

Table 3.2 Summary of key terms associated with diagnostic utility analysis

Probability: A number within the range 0 to 1.0 (0 to 100%) which reflects the clinician’s estimate of how likely it is that the condition exists. Pre-test probability is this estimate before a diagnostic test is undertaken; post-test probability is the estimate after the test has been performed.

Sensitivity: The ‘true positive rate’. The sensitivity of a test is the proportion of people who actually have the condition who are correctly identified by the test as having it. A negative result from a highly sensitive test will confidently rule the condition out.

Specificity: The ‘true negative rate’. The specificity of a test is the proportion of people who do not have the condition who are correctly identified by the test as not having it. A positive result from a highly specific test will confidently rule the condition in. Sensitivity and specificity are expressed numerically as a percentage between 0 and 100.

Prevalence: The proportion of people within a given sample or population who actually have the condition. This is again expressed as a percentage and is synonymous with the pre-test probability.

Likelihood ratio (LR): The ratio of the probability of a given test result in people who have the condition to the probability of that same result in people who do not (for a positive result, the true positive rate divided by the false positive rate). The LR is expressed as a number between 0.001 and 1000. The further the LR is from 1 (above 1 for a positive LR; below 1 for a negative LR), the better the test is at separating those with the condition from those without it. The LR represents the value of a test for increasing certainty about a diagnosis.

The presence of a condition in a patient can also be expressed in terms of odds, e.g. ‘have the odds of this condition actually existing changed since the test was performed?’. Odds are different from, but intrinsically related to, probability: the odds are the probability divided by (1 minus the probability).
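As an illustration only (the original text gives no code; Python and the function names below are our own assumptions), the relationship between probability and odds can be sketched as a pair of conversion functions:

```python
def probability_to_odds(p: float) -> float:
    """Odds = probability / (1 - probability); p must lie in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Inverse conversion: probability = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds cannot be negative")
    return odds / (1.0 + odds)


# A 50% probability corresponds to even odds (1 to 1); 90% corresponds to 9 to 1.
print(probability_to_odds(0.5))            # 1.0
print(round(probability_to_odds(0.9), 2))  # 9.0
print(odds_to_probability(1.0))            # 0.5
```

A probability of 0.9 (90%) thus corresponds to odds of 9 to 1; probability and odds carry the same information on different scales.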

All the above information can be calculated using data from available studies that have investigated the diagnostic accuracy of a test. For the VBI test, this means studies that have used high-calibre, ‘Gold Standard’ or ‘near Gold Standard’ blood flow analysis (i.e. either magnetic resonance angiography (MRA) or Doppler ultrasound insonation) to examine the effect of the position used in the VBI test (e.g. cervical rotation) on blood flow in the vertebrobasilar system, since the underlying principle of the VBI test is that blood flow is affected by this position. Clinically, a positive test is defined as one where the patient reports the onset of symptoms during this sustained position. Valid studies would, therefore, be those which report both changes in blood flow and the onset of symptoms during examination of the subjects. For the test to be useful, these two factors would need to correlate, i.e. those people who demonstrate reduced blood flow also report the onset of symptoms at the time of this change and, equally, those people who do not demonstrate a reduction in blood flow report no onset of symptoms.

So, we can extract data from a number of studies to input into our utility calculation. For the purpose of this illustration, we can use the data reported in the two existing VBI test utility studies (i.e. Gross et al, 2005; Ritcher & Reinking, 2005). These data (each number being a count of subjects) can then be separated into four categories:

A – Those with a positive flow result and positive VBI test

B – Those with a negative flow result and positive VBI test

C – Those with a positive flow result and negative VBI test

D – Those with a negative flow result and negative VBI test

Common sense would tell us that for a test to be good, most subjects should fall into categories A or D, but we can use some simple calculations to judge the utility of the test more robustly and accurately.1 The basis for such calculations (e.g. sensitivity, specificity, likelihood ratios) is a ‘two-by-two table’. For our example, the results are presented in Table 3.3.

The results of this analysis show that most subjects fall into categories B and C, which is the opposite of what we would hope for! The subsequent calculations result in poor sensitivity, moderate specificity and very poor likelihood ratios.1

Table 3.3 Two-by-two table for the VBI test. Data from two sources (Gross et al, 2005; Ritcher & Reinking, 2005) are collated in cells A to D. Each number represents a number of individual subjects.

                                 MRA/Doppler US
                                 Positive     Negative
Rotation test    Positive        A 20         B 112
                 Negative        C 173        D 89

Sensitivity                            a/(a+c)                        10%
Specificity                            d/(b+d)                        44%
Pre-test probability (‘prevalence’)    (a+c)/(a+b+c+d)                49%
Likelihood ratio+                      sensitivity/(1-specificity)    0.19

We shall consider the ‘prevalence’ result later. In clinical terms, these results mean that the VBI test will not be very useful for telling us that the condition is absent if it is negative, might be informative if it is positive, but overall is unlikely to change our clinical decision.
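The Table 3.3 calculations can be reproduced with a short script. This is a minimal Python sketch; the class and property names are hypothetical. It also computes the negative likelihood ratio using the standard formula LR– = (1 − sensitivity)/specificity, which is not shown in Table 3.3 but gives the value of 2.02 used later in Table 3.4.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a two-by-two table: reference standard (columns) vs clinical test (rows)."""
    a: int  # reference positive, test positive (true positives)
    b: int  # reference negative, test positive (false positives)
    c: int  # reference positive, test negative (false negatives)
    d: int  # reference negative, test negative (true negatives)

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    @property
    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1.0 - self.sensitivity) / self.specificity


# Pooled VBI rotation-test counts from Table 3.3
vbi = TwoByTwo(a=20, b=112, c=173, d=89)
print(f"Sensitivity {vbi.sensitivity:.0%}")  # Sensitivity 10%
print(f"Specificity {vbi.specificity:.0%}")  # Specificity 44%
print(f"Prevalence  {vbi.prevalence:.0%}")   # Prevalence  49%
print(f"LR+ {vbi.lr_positive:.2f}")          # LR+ 0.19
print(f"LR- {vbi.lr_negative:.2f}")          # LR- 2.02
```

Both likelihood ratios sit on the ‘wrong’ side of 1 (LR+ below 1, LR– above 1), which is the source of the aberrant results discussed below.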

So, did the clinician in the flowchart in Figure 3.7 make the correct decision? There is now some conflict between the moderate specificity, which might suggest that a positive test indicates the condition is present, and the very poor likelihood ratios, which suggest that whatever the result of the test, our clinical decision will not be altered. Mathematically, likelihood ratios are better indicators of a test’s value than either sensitivity or specificity in isolation (Deeks & Altman, 2004; Gilbert et al, 2001b). Therefore, we could say that we will go with the likelihood ratios and not let the result of the test change our clinical decision. On this basis, we could argue a case for not incorporating this test in our clinical investigation. However, we have not yet considered the complete picture. To help us further, we need to continue our exploration of probability.

It is implied above that there is an estimate of how likely it is that VBI exists in this patient before the test is performed. In basic clinical terms, there must be some suspicion of the condition being present, otherwise the test would not have been performed. It is important to establish this estimate, known as the pre-test probability, as it influences the impact a test will have on the clinical decision, i.e. the interpretation of a test result relies not only on the calculations in Table 3.3 but also on the pre-test probability. In an ideal world, we would have used better quality data in our utility calculation. This is not a criticism of the study reports from which our data were taken, but rather a comment on the lack of specific utility studies in this field. The calculation is therefore skewed by spectrum bias (Pewsner et al, 2004): in the absence of studies on patients, we have relied on studies of normal subjects with chance reporting of symptom reproduction.

This raises the possibility that the subjects who reported symptoms during the rotation test and also had positive flow results did not actually have pathological VBI; it may be that blood flow reduces in normal individuals anyway (see Ch. 6). It has also resulted in an unexpectedly high ‘prevalence’, or ‘pre-test probability’, figure of 49%. Ordinarily, this figure would be used, literally, as the sole basis of our pre-test probability estimate, i.e. the 38-year-old male with left-sided neck pain and headache would have a 49% (almost 1 in 2) chance of having VBI. This is highly incongruous with other information we have about VBI. More formally, because the studies used in the calculation were not carried out on 38-year-old males with unilateral neck and head pain, we cannot make this assumption. We must therefore use other information to inform our pre-test probability.

Firstly, let us briefly clarify how the pre-test probability estimate can affect the impact a test has on post-test clinical judgement. The simplest way to demonstrate this is with a likelihood ratio nomogram (Fig. 3.8). The nomogram works by drawing a straight line joining the known values of pre-test probability (left-hand scale) and the likelihood ratio of the test (middle scale), then continuing it to the right until it intersects the last line, marked ‘post-test probability’. The nomogram immediately illustrates that there is a direct relationship between pre-test and post-test probability, moderated in the middle by the test’s likelihood ratio.
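Mathematically, the nomogram is a graphical way of applying Bayes’ theorem in its odds form: post-test odds = pre-test odds × likelihood ratio. A minimal Python sketch of that calculation (the function name is hypothetical) is:

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Return the post-test probability for a given pre-test probability and LR.

    Applies Bayes' theorem in odds form, which the nomogram solves graphically:
        post-test odds = pre-test odds * likelihood ratio
    """
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# An LR of 1 leaves the probability unchanged; an LR of 10 raises 50% to about 91%.
print(round(post_test_probability(0.5, 1.0), 2))   # 0.5
print(round(post_test_probability(0.5, 10.0), 2))  # 0.91
```

The same pre-test probability can therefore end up in very different places depending on the LR, and the same LR moves a low and a high pre-test probability by very different amounts.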

Using the nomogram we can calculate the post-test probabilities of the condition existing in the event of a positive or negative test. The results are presented in Table 3.4 which uses two examples of different pre-test probability judgements: scenario A, a low pre-test probability judgement (10%) of the condition existing, and scenario B, a high pre-test probability judgement (90%) of the condition existing.

As stated earlier, the further above 1 an LR+ is, the better the test is at ruling the condition in, whilst the further below 1 an LR– is, the better the test is at ruling the condition out. The VBI test actually has an LR+ less than 1 and an LR– greater than 1! We are therefore getting aberrant results, e.g. scenario B suggests that with a 90% pre-test probability, a positive test result actually reduces the post-test probability! This, of course, is clinically illogical. The same pattern occurs in all the above scenarios. Because of the obvious poor utility of this test, mathematically we could argue the case for not using it to directly inform our clinical decision.
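Applying the same odds-form calculation directly to the VBI likelihood ratios makes the aberrant direction explicit (a Python sketch; names are hypothetical). Note that the exact arithmetic gives somewhat different figures from the nomogram read-offs in Table 3.4 (about 2% rather than 4%, 63% rather than 50%, and 18% rather than 35%), presumably because values read by eye from a printed nomogram are approximate; the direction of change, which is the point made above, is the same. The 90%/negative-test figure is a direct calculation and is not taken from Table 3.4.

```python
def post_test_probability(pre: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    post_odds = pre / (1.0 - pre) * lr
    return post_odds / (1.0 + post_odds)


LR_POSITIVE = 0.19  # VBI rotation test, Table 3.3
LR_NEGATIVE = 2.02  # VBI rotation test, (1 - sensitivity) / specificity

for label, pre in (("Scenario A", 0.10), ("Scenario B", 0.90)):
    after_pos = post_test_probability(pre, LR_POSITIVE)
    after_neg = post_test_probability(pre, LR_NEGATIVE)
    print(f"{label}: pre-test {pre:.0%}, "
          f"after positive test {after_pos:.0%}, after negative test {after_neg:.0%}")
# Scenario A: pre-test 10%, after positive test 2%, after negative test 18%
# Scenario B: pre-test 90%, after positive test 63%, after negative test 95%
```

In both scenarios a positive result lowers the probability and a negative result raises it, the reverse of what a useful test should do.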

The mathematics of these calculations will not be presented here; readers are referred to Straus et al (2005) for further information. If you do this, you will see that it is possible (and easier) to use an online programme to perform all the necessary calculations for establishing the diagnostic utility of a test.

Now let us imagine we have another test which is actually very good (LR+ 15; LR– 0.5). The results for this test are presented in Table 3.5.

The results of this analysis demonstrate that if the pre-test probability is low, a positive result from a good test will significantly affect the clinical decision (scenario A: probability changed from a 10% to a 60% chance of the condition existing). However, if the pre-test probability is high, the judgement is not significantly altered (scenario B: 90% to 99% chance of the condition existing); in other words, it was already quite likely that the condition existed, and the positive result from the (good) test simply supports that suspicion but does not alter the judgement.

With a good LR– test, if the pre-test probability is low and the test is negative, the judgement that the condition is unlikely to exist will be supported. If the pre-test probability is high and the test is negative, the probability will change (in the case of scenario D, by 30%), but there is still a 60% chance that the condition exists! In summary, with the exception of a very good LR+ test being used in the presence of low pre-test probability (which is an unusual scenario), even good tests have limited capacity to shift the pre-test probability.
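The same sketch applied to the hypothetical good test (LR+ 15; LR– 0.5) reproduces the positive-test figures above (10% rising to about 60%, 90% rising to about 99%). For a negative result at 90% pre-test probability, direct calculation gives about 82% rather than the 60% read from the nomogram, which, if anything, reinforces the point that a negative result does little to dislodge a high pre-test probability. As before, Python and the names used are illustrative assumptions.

```python
def post_test_probability(pre: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    post_odds = pre / (1.0 - pre) * lr
    return post_odds / (1.0 + post_odds)


GOOD_LR_POSITIVE = 15.0  # the 'very good' test imagined in the text
GOOD_LR_NEGATIVE = 0.5

for pre in (0.10, 0.90):
    pos = post_test_probability(pre, GOOD_LR_POSITIVE)
    neg = post_test_probability(pre, GOOD_LR_NEGATIVE)
    print(f"pre-test {pre:.0%}: positive -> {pos:.1%}, negative -> {neg:.1%}")
# pre-test 10%: positive -> 62.5%, negative -> 5.3%
# pre-test 90%: positive -> 99.3%, negative -> 81.8%
```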

Clinical relevance

In light of the above, it is the responsibility of the therapist to do their utmost to make the best judgement of the probability of a problem existing prior to performing any physical tests. In other words, the physical examination should only be entered into with a good estimation of what is wrong; this is good clinical reasoning!

Figure 3.8 Likelihood ratio nomogram (scales, left to right: pre-test probability (%), likelihood ratio, post-test probability (%)). If the pre-test probability and likelihood ratio are known, a straight line can be drawn to read off the post-test probability. (Source: http://www.cebm.net/likelihood_ratios.asp)

Table 3.4 Results of nomogram calculations for the existing VBI test, showing post-test probabilities for (A) a positive likelihood ratio and (B) a negative likelihood ratio

A               Pre-test probability (%)    Likelihood ratio+    Post-test probability (%)
Scenario A      10                          0.19                 4
Scenario B      90                          0.19                 50

B               Pre-test probability (%)    Likelihood ratio–    Post-test probability (%)
Scenario A      10                          2.02                 35

Source: Combined Movement Theory, 2010 (pp. 48–52)