Intercoder reliability testing - Evaluating nuanced practices for initiating decision-making in

A total of 30 cases were randomly selected from the remaining sample (i.e. those cases that had not been part of the development of the coding scheme) and independently coded. Each of the three coders coded 20 cases, 10 with each of the other two coders. Formal intercoder reliability testing was conducted to quantify the extent to which the process could be described as reliable.
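The double-coding design described above (three coders, 30 cases, each pair sharing 10 cases) can be sketched as follows. This is a minimal illustration only: the report does not say how cases were allocated to coder pairs, so the random allocation, the `assign_cases` helper, the seed and the case identifiers 1 to 30 are all assumptions.

```python
import random
from itertools import combinations


def assign_cases(case_ids, coders=("A", "B", "C"), per_pair=10, seed=0):
    """Split cases across coder pairs so each pair double-codes `per_pair` cases.

    Hypothetical helper: random allocation to pairs is an assumption,
    not something stated in the report.
    """
    pairs = list(combinations(coders, 2))  # (A, B), (A, C), (B, C)
    case_ids = list(case_ids)
    if len(case_ids) != per_pair * len(pairs):
        raise ValueError("number of cases must equal per_pair * number of pairs")
    rng = random.Random(seed)
    rng.shuffle(case_ids)
    return {
        pair: case_ids[i * per_pair:(i + 1) * per_pair]
        for i, pair in enumerate(pairs)
    }


assignment = assign_cases(range(1, 31))

# Each coder appears in two of the three pairs, so codes 2 * 10 = 20 cases.
workload = {
    coder: sum(len(cases) for pair, cases in assignment.items() if coder in pair)
    for coder in ("A", "B", "C")
}
print(workload)
```

With three coders there are three possible pairs, which is why 30 cases at 10 per pair gives every coder exactly 20 cases.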

Intercoder reliability testing focused on the first decision point of each decision. Two aspects of reliability were assessed: first, the extent to which coders were able to identify the same parts within the consultations as initiation points; and, second, the extent to which coders were able to agree on the classifications applied to the decision points they had identified (the type of decision that was being made, the patient’s response, etc.). Coder A (Paul Chappell) identified 32 initiation points within 20 consultations. A total of 24 (75%) of these initiation points were also identified by the second coder (half of the time the second coder was B and half of the time the second coder was C). Coder B (Merran Toerien) identified 37 initiation points within 20 consultations. A total of 30 (81%) initiation points were also identified by the second coder. Coder C (Clare Jackson) identified 37 initiation points within 20 consultations. A total of 24 (65%) of these initiation points were also identified by the second coder. This gives a total agreement rate of 74%. Although this indicates that there is some disagreement over which interactional turns should be coded as relevant to our study, both coders agreed on a large majority of first decision points.

Consultation
    Decision 1 (e.g. whether or not to start disease-modifying therapy for MS)
        First decision point (e.g. list of treatment options)
        Patient’s response (e.g. seeks further information)
        Second decision point (e.g. seeks patient’s view after giving extra information)
        Patient’s response (e.g. patient selects a treatment from the list)
        Course of action going to happen? (yes/no/decision deferred)
    Decision 2 (e.g. which treatment to try for MS symptoms of fatigue)
        First decision point (e.g. recommends treatment X)
        Patient’s response (e.g. accepts)
        Course of action going to happen? (yes/no/decision deferred)

FIGURE 1 Example of how a consultation might be coded for more than one decision and one or more decision point per decision.
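The identification-agreement figures reported for coders A, B and C can be checked with a few lines of arithmetic. The sketch below assumes the overall 74% was obtained by pooling matched points over all identified points (78 of 106); the report does not state the exact calculation, and averaging the three percentages happens to round to the same value.

```python
# Per coder: (initiation points identified, of which also identified by the second coder).
# Counts are taken from the text; the pooling step is an assumption.
identified = {"A": (32, 24), "B": (37, 30), "C": (37, 24)}


def pct(matched, total):
    """Percentage of a coder's initiation points that the second coder also found."""
    return 100 * matched / total


per_coder = {coder: round(pct(m, t)) for coder, (t, m) in identified.items()}

total_points = sum(t for t, _ in identified.values())    # 32 + 37 + 37
total_matched = sum(m for _, m in identified.values())   # 24 + 30 + 24
overall = round(pct(total_matched, total_points))

print(per_coder)   # {'A': 75, 'B': 81, 'C': 65}
print(overall)     # 74
```

Note that the 78 matched identifications count each shared point once per coder, which corresponds to 39 distinct decision points identified by both coders.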

In the second part of the intercoder reliability analysis we used cross-tabulations and Cohen’s kappa to explore intercoder agreement in the classification of the 39 decision points that coders had both identified across the 30 consultations (although it is worth noting that not all of the consultations contained initiation points). Table 2 shows the results for intercoder agreement on decision-level questions related to those decision points. Separate percentage agreement and kappa scores have been estimated for each of the different variables. Landis and Koch⁸⁷ suggest that kappa scores ranging from 0.40 to 0.59 show moderate agreement, 0.60 to 0.79 substantial agreement and ≥ 0.80 outstanding agreement.
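For readers unfamiliar with Cohen’s kappa, the following is a minimal sketch of the standard two-rater calculation: observed agreement corrected for the agreement expected by chance from each coder’s category frequencies. The function name `cohens_kappa` and the ten yes/no codes are hypothetical illustrations, not data or software from the study.

```python
from collections import Counter


def cohens_kappa(codes_1, codes_2):
    """Cohen's kappa for two coders who classified the same items."""
    if len(codes_1) != len(codes_2) or not codes_1:
        raise ValueError("both coders must code the same non-empty set of items")
    n = len(codes_1)
    # Observed proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(codes_1, codes_2)) / n
    # Chance agreement from the product of each coder's marginal frequencies.
    freq_1, freq_2 = Counter(codes_1), Counter(codes_2)
    expected = sum(freq_1[c] * freq_2[c] for c in freq_1) / n**2
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten decision points (not study data).
coder_1 = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes", "no", "no"]
coder_2 = ["yes", "yes", "no", "no", "no", "yes", "no", "yes", "yes", "no"]

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.6
```

In this example the coders agree on 8 of 10 items (80%), but half of that agreement would be expected by chance, so kappa is lower than the raw percentage. The same pattern explains why a variable can show high percentage agreement alongside a modest kappa.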

Some level of disagreement between coders is to be expected when applying quantitative classifications to complex interactional data. However, the results of our reliability analysis are encouraging. In percentage terms, coders agreed with each other most of the time for all of the different variables. Kappa scores indicate that there is a substantial or outstanding level of agreement for all variables, with the exception of the two questions regarding whether or not the end outcome is one that the neurologist and the patient appear to prefer. These questions required coders to interpret, to some extent, the motivations of the participants, so it is perhaps not surprising that it was hard to achieve high intercoder reliability for these variables. Nevertheless, these results indicate that the coding scheme is reliable for all but these two variables. It is also worth noting that these results may, in some cases, overestimate the final differences between coders because some of the categories were combined in later analyses. For example, we collapsed pronouncements and recommendations together. This means that all cases in the reliability analysis that were coded as a recommendation by one coder but as a pronouncement by another would have been counted as ‘disagreement’ under this testing process. However, if the categories had been collapsed together at the point of testing, all these cases would have counted as ‘agreement’.
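The effect of collapsing categories on measured agreement can be illustrated with a small sketch. The six codes below are invented for illustration and are not study data; only the merging of recommendations and pronouncements is taken from the text, and the ‘option list’ label and the helper names are assumptions.

```python
def percent_agreement(codes_1, codes_2):
    """Percentage of items given the same code by both coders."""
    return 100 * sum(a == b for a, b in zip(codes_1, codes_2)) / len(codes_1)


# Categories merged in later analyses, as described in the text.
MERGE = {
    "recommendation": "recommendation/pronouncement",
    "pronouncement": "recommendation/pronouncement",
}


def collapse(codes):
    """Map each code to its merged category, leaving other codes unchanged."""
    return [MERGE.get(code, code) for code in codes]


# Hypothetical codes for six decision points (not study data).
coder_1 = ["recommendation", "pronouncement", "option list",
           "recommendation", "pronouncement", "option list"]
coder_2 = ["pronouncement", "pronouncement", "option list",
           "recommendation", "recommendation", "option list"]

before = percent_agreement(coder_1, coder_2)
after = percent_agreement(collapse(coder_1), collapse(coder_2))
print(round(before, 1), after)  # 66.7 100.0
```

Both disagreements in this example are recommendation versus pronouncement, so they disappear once the two categories are merged.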

After the 30 cases had been coded, the coders discussed their differences and decided on agreed codings for any disagreements, and agreed how similar issues would be dealt with in future. After intercoder reliability testing, CJ and PC coded the remainder of the consultations, independently coding a randomly selected half of the remaining cases each. Where coders found problematic cases, with decision points that were hard to identify or classify, these were discussed by all three coders and agreed with reference to the rules outlined in the codebook.

TABLE 2 Percentage agreement and kappa scores for variables derived from coding process

| Variable | Agreement (%) | κ |
|---|---|---|
| Type | 92.3 | 0.87 |
| Turn design | 79.4 | 0.70 |
| Who responds? | 84.6 | 0.60 |
| Response | 87.1 | 0.83 |
| Is the outcome in line with what the neurologist thinks best? | 87.1 | 0.46 |
| Is the outcome in line with what the patient appears to prefer? | 76.9 | 0.59 |
| Is the course of action going to happen in principle? | 97.4 | 0.92 |
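As a reading aid, the kappa values in Table 2 can be mapped onto the interpretation bands quoted in the text (0.40 to 0.59 moderate, 0.60 to 0.79 substantial, ≥ 0.80 outstanding). The sketch below is illustrative only: the `band` helper is hypothetical, and the label for values below 0.40 is a placeholder because the text does not name a band for that range.

```python
# Kappa values copied from Table 2.
TABLE_2_KAPPA = {
    "Type": 0.87,
    "Turn design": 0.70,
    "Who responds?": 0.60,
    "Response": 0.83,
    "Is the outcome in line with what the neurologist thinks best?": 0.46,
    "Is the outcome in line with what the patient appears to prefer?": 0.59,
    "Is the course of action going to happen in principle?": 0.92,
}


def band(kappa):
    """Apply the thresholds quoted in the text to a kappa score."""
    if kappa >= 0.80:
        return "outstanding"
    if kappa >= 0.60:
        return "substantial"
    if kappa >= 0.40:
        return "moderate"
    return "below the bands quoted in the text"  # placeholder label (assumption)


bands = {variable: band(kappa) for variable, kappa in TABLE_2_KAPPA.items()}
for variable, label in bands.items():
    print(f"{variable}: {label}")
```

Only the two outcome-preference questions fall in the moderate band, which matches the exception noted in the discussion of the results.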

DOI: 10.3310/hsdr06340 HEALTH SERVICES AND DELIVERY RESEARCH 2018 VOL. 6 NO. 34

© Queen’s Printer and Controller of HMSO 2018. This work was produced by Reuber et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.
