5.4.6 Critique on pilot study methodology

Grading of the cine MR images

The method chosen, to select 52 patients at random and then select a subset of high quality data from this cohort, is justified for the aim of this pilot study. Selecting patients at random in the first instance provided a representative sample in terms of image quality. This allowed a judgement to be made on the proportion of data which may be considered ‘high quality’, i.e. adequate for processing. In this cohort 50% of dynamic sagittal slices were judged to be high quality. Selecting and processing only high quality data permitted a good test of the technique under real, but suitable, conditions. At this stage of development, it is first necessary to test the technique on suitably high quality data, free from any aberrations introduced by the challenging processing and interpretation of low quality data. Testing the technique on poor quality data at this stage would be a distraction and would complicate the assessment of whether the technique has potential to help identify adhesions, possibly making it difficult to draw conclusions. However, the fact that this analysis has been performed on data considered to be of high quality should be kept in mind when interpreting results. Protocol modifications to improve the fraction of suitable images are a priority, but subsequent investigations should also clarify the limitations of the technique on lower quality images.

The methodology used for the scoring procedure (scoring each scan twice and reviewing the scans in a different order) recognises that human judgement often drifts as more images are viewed and more experience is gained. The aim of viewing the images in a different order in each scoring session was to reduce ‘drift’ in the usability scores across the dataset. Although subjective, this scoring procedure proved consistent. After both passes of quality scoring, the scores for 253 of the 341 sagittal dynamic image sequences agreed without the need for a third review. In all 88 cases where the two passes disagreed, the score difference was 1. Moreover, in most of these disagreements (59/88) the difference may be considered less than 1: for example, a score of “score 3 (possibly 4)” was given on the first pass and “score 4” on the second.
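The agreement figures quoted above can be restated as percentages. The following is a minimal sketch that uses only the counts given in the text; the variable and function names are illustrative and are not taken from the study's own analysis.

```python
# Counts quoted in the text for the two scoring passes.
total_sequences = 341   # sagittal dynamic image sequences scored twice
agreed = 253            # identical score on both passes
disagreed = total_sequences - agreed
borderline = 59         # disagreements where one pass noted the adjacent score as possible


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


agreement_rate = percent(agreed, total_sequences)
borderline_rate = percent(borderline, disagreed)

print(f"Disagreements between passes: {disagreed}")
print(f"Exact agreement between passes: {agreement_rate}%")
print(f"Disagreements that were borderline: {borderline_rate}%")
```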

An alternative to the approach taken would be to mathematically quantify the amount of motion occurring in the abdomen. One consideration was to register every frame and calculate the average displacement across all nodes (or pixels) within the abdominal cavity. However, the images would still require a review by eye to assess for artefacts and respiratory motion (as opposed to pelvic thrust, for example) and, given the reasonable consistency shown by the expert eye, such an implementation was not considered necessary.
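As an illustration of the alternative considered above, the sketch below averages the displacement magnitude over all pixels inside the abdominal cavity and over all frames. It was not implemented in this work. The input layout is an assumption: `displacement_fields` is a hypothetical list of per-frame fields of `(dy, dx)` vectors from a registration step, and `cavity_mask` is a hypothetical boolean mask of the cavity.

```python
import math


def mean_displacement(displacement_fields, cavity_mask):
    """Average displacement magnitude (in pixels) inside the mask, over all frames.

    displacement_fields: list of frames; each frame is a list of rows of (dy, dx) tuples
                         (assumed output of a registration step).
    cavity_mask: list of rows of booleans, True for pixels inside the abdominal cavity.
    """
    total = 0.0
    count = 0
    for field in displacement_fields:
        for row_vectors, row_mask in zip(field, cavity_mask):
            for (dy, dx), inside in zip(row_vectors, row_mask):
                if inside:
                    total += math.hypot(dy, dx)
                    count += 1
    if count == 0:
        raise ValueError("cavity mask selects no pixels")
    return total / count


# Toy example: two 2x2 frames, one moving by (3, 4) everywhere and one static.
moving = [[(3.0, 4.0), (3.0, 4.0)], [(3.0, 4.0), (3.0, 4.0)]]
static = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
mask = [[True, False], [False, True]]

toy_mean = mean_displacement([moving, static], mask)
print(toy_mean)
```

A single number of this kind cannot distinguish respiratory motion from artefact or pelvic thrust, which is why a review by eye would still be needed.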

Reporter training

The reporters’ correct interpretation of the sheargram is paramount if the sheargram’s ability to help detect adhesions is to be accurately assessed. The training dataset of 10 sheargrams (with accompanying cine-MRs) reviewed by both reporters prior to commencing the pilot study addressed this requirement. The cases were specially selected to cover a wide range of possible scenarios, priming the reporters to recognise the effects of MRI artefacts (see Figure 5.3), out-of-plane motion and large, fast abdominal wall excursions. Additional sheargram patterns reflecting the presence or absence of adhesions were also included. The 10 examples for training were thought to be adequate without becoming too time consuming, and the scenarios presented covered the vast majority of situations encountered within the pilot study.

The technical expert was the developer of the visceral slide measurement technique and therefore has a deep understanding of the process used to generate the sheargram. The radiologist was provided with a simplified description of the sheargram generation procedure and was only exposed to the training dataset of 10 patients. The disparity of the reporters’ knowledge and experience of examining sheargrams permits deeper insights into the difficulty of sheargram interpretation. Although the requirement for greater objectivity is acknowledged, the two reporters’ sheargram interpretation only disagreed in 2 cases. This implies that only a small amount of training was required to understand and interpret the sheargram without requiring a deep understanding of the technique.

Reporting and analysis method

The reporting procedure was intended to reflect how the sheargram might be used clinically, i.e. in conjunction with the cine-MRI to draw attention to suspicious areas. Using the sheargram in this way in the pilot study makes the conclusions more relevant to its clinical implementation.

The classification system for deciding whether an adhesion was present (in both the sheargram and final decision) would ideally be a binary choice – ‘yes’, an adhesion is indicated or ‘no’, an adhesion is not indicated. However, in reality a single diagnostic procedure is often unable to offer a clear-cut diagnosis and for this reason the third ‘equivocal’ grouping was offered to reporters. The results and analysis are less ‘clean’ but it allowed the views of the reporters to be more accurately categorised rather than forcing a decision which would not normally be possible with the information available. Ultimately, classification of the sheargram result and clinical decision permitted the quantitative comparison and analysis shown in Table 5.1. It was this which formed the basis for most of the quantified data relating to the sheargram’s correlation to the final clinical decision.

The analysis method has treated each slice as an independent entity; however, an argument could be made for slices to be grouped or clustered, as some belong to the same patient. Slices within a patient could potentially be more strongly correlated with one another and so bias the statistics. For example, if the technique works in every slice of one patient with 8 slices but fails in every slice of another patient with 4 slices, the per-slice agreement would be 8/12 (67%) but the per-patient agreement only 50%. An in-depth look at the agreement distribution within each patient reveals that 63% of patients had agreement between sheargram and cine-MRI in all their slices, 34% had one disagreement and 3% had two disagreements. Almost all patients included in the study therefore had 0 or 1 disagreement: the disagreements were spread evenly rather than being concentrated in certain patients, indicating little influence from the ‘clusters’. This implies that two slices from the same patient are no more likely to both show a disagreement than two slices from different patients. Consequently, while there was potential for clustering to be present within the data, the patient origin of the slice does not appear to be a contributory factor for agreement and it was appropriate for the analysis method to treat each slice independently.
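The difference between per-slice and per-patient agreement in the two-patient example above can be made concrete with a short sketch. The data are the hypothetical example from the text (one patient with 8 agreeing slices, one with 4 disagreeing slices), not the pilot study results, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical example from the text: (patient_id, sheargram agrees with cine-MRI?)
slices = [("A", True)] * 8 + [("B", False)] * 4


def per_slice_agreement(records):
    """Fraction of slices in agreement, treating every slice as independent."""
    return sum(agrees for _, agrees in records) / len(records)


def disagreements_per_patient(records):
    """Map each patient to the number of slices in disagreement."""
    counts = {}
    for patient, agrees in records:
        counts[patient] = counts.get(patient, 0) + (0 if agrees else 1)
    return counts


def per_patient_agreement(records):
    """Fraction of patients whose slices all agree."""
    counts = disagreements_per_patient(records)
    return sum(1 for n in counts.values() if n == 0) / len(counts)


slice_rate = per_slice_agreement(slices)
patient_rate = per_patient_agreement(slices)
# How many patients had 0, 1, 2, ... disagreements.
distribution = Counter(disagreements_per_patient(slices).values())

print(f"Per-slice agreement: {slice_rate:.0%}")
print(f"Per-patient agreement: {patient_rate:.0%}")
print(dict(distribution))
```

When disagreements are concentrated in a few patients, as in this toy case, the two rates diverge; when they are spread thinly (0 or 1 per patient), treating slices independently loses little.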

The results principally revolve around comparing the sheargram to clinical interpretation of cine-MRI scans. It should be recognised that surgical confirmation is considered the true gold standard for diagnosis of adhesions. However, in the absence of surgical confirmation in the vast majority of patients scanned, clinically judged cine-MRI serves as a practical alternative. Use of cine-MRI as a diagnostic comparison tool is also supported by a recent in-house study conducted in Nijmegen. The study found high correlation between cine-MRI findings and surgically confirmed adhesions (further details were not available for inclusion in this thesis as the results are under review for publication).

5.5 Summary

This chapter has demonstrated and discussed the potential usefulness of the visceral slide quantification technique in a retrospective clinical pilot study. The pilot study included a cohort of 52 randomly selected patients who had been referred for abdominal pain with suspected adhesions. The following steps summarise the methodology:

1. The 281 unique sagittal cine-MRI slices were filtered so only those suitable for processing were selected, leaving 141 images (50%).

2. Processing produced a sheargram for all 141 sagittal slices.

3. An expert radiologist and technical expert reviewed all sheargrams and cine-MR slices. The original report of the cine-MRI was also available for comparison.

4. A judgement was made as to whether an adhesion was indicated on the sheargram and whether an adhesion to the abdominal wall was present in the cine-MRI.

5. Data analysis compared the degree of correlation between sheargram interpretation and the clinical decision on the cine-MRI made by the radiologist.

Each of the metrics proposed for an effective diagnostic aid has been addressed. The principal metric, accuracy, has been evidenced through the sheargram’s strong correlation to expert clinical opinion:

- The results indicate good agreement between the sheargram and clinical decision as displayed in Figure 5.5.

- 84% of scans were considered to correlate with clinical opinion for both reporters (compared with 79% sheargram correlation with the original report).

- 96% and 93% of positively identified adhesions correctly correlated with sheargram interpretation for the radiologist and technical expert respectively.

- 81% of healthy sagittal slices were correctly identified by both reporters.

Robustness is indicated as there were only two cases where a positive adhesion was not visible on the sheargram. The technique exhibited a high sensitivity but a lower specificity. However, given that the aim of the technique is to be a diagnostic aid, wrongly drawing the attention of the radiologist to healthy regions is preferable to missing adhesions.
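For reference, sensitivity and specificity follow from the counts of a two-by-two comparison between sheargram and clinical decision. The sketch below assumes binary calls (equivocal readings would need to be assigned or excluded first, which is an assumption here), and the counts are invented purely for illustration; they are not the pilot study data.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of clinically positive slices that the sheargram also flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of clinically healthy slices that the sheargram also reports as healthy."""
    return true_negatives / (true_negatives + false_positives)


# Invented counts for illustration only (NOT results from the pilot study).
example_sensitivity = sensitivity(true_positives=9, false_negatives=1)
example_specificity = specificity(true_negatives=6, false_positives=4)

print(f"Sensitivity: {example_sensitivity:.0%}")
print(f"Specificity: {example_specificity:.0%}")
```

For a diagnostic aid, a false positive costs the radiologist a second look at a healthy region, whereas a false negative risks a missed adhesion, which is why a higher sensitivity is favoured over specificity.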

The influence of the sheargram has been evidenced through the radiologist altering his decision on the presence of an adhesion on 12 occasions relative to the original report. In 10 of these 12 cases the sheargram agreed with the change made, suggesting that it influenced the decision-making process. As a result, 7 additional adhesions were identified in the pilot study relative to the original reports.

Despite encouraging results, several limitations have been highlighted. Differences in judgement between reporters were observed, highlighting the subjective nature of sheargram interpretation. Differences in correlating the sheargram to clinical opinion could mostly be attributed to the relative lack of experience of the technical expert resulting in a tendency to report more equivocally. The inability to analyse adhesions away from the abdominal wall is a fundamental limitation of the 2D sheargram technique and a move to 3D imaging is proposed.

The pilot study has not effectively quantified the effects on reporting efficiency but a reduction in reporting time is a likely outcome and this should be investigated more thoroughly in future studies.

The evidence raised by the pilot study indicates that the visceral slide measurement technique is well placed to become a future diagnostic aid for cine-MRI interpretation of the abdominal wall. The results build confidence in the technique and signify that further investigation is deserved. However, the fundamental limitations of imaging in 2D cannot be ignored, and the need for a 3D analysis is acknowledged and explored in the next chapter.

Chapter 6

Discussion

This thesis has described the application of image segmentation and registration to measure sliding of organs around the perimeter of the abdominal cavity for detection of adhesive pathology. This analysis technique represents a shift from the approach of previous work and the reasons for this have been discussed and justified. Validation of the implemented technique has been sought through tests investigating certain characteristics and its clinical potential has been ascertained through a pilot study.

This chapter brings together the considerations of previous chapters but also offers some more general, fundamental concepts and explores alternative image processing options. It is organised under the following discussion points:

1. Summary of previous discussions

2. Is the sheargram necessary?

a. Comparison with AbsCAT

3. Clinical potential of the technique

a. Inter-operator variability

4. MRI acquisition

5. Alternative approaches to segmentation

6. Shear as an analogue for sliding

7. Presentation of the sheargram

8. Feasibility of 3D analysis

In document Towards a non-invasive diagnostic aid for abdominal adhesions using dynamic MRI and image processing (Page 172-177)