Lung Function Resources - An Expert System for Lung Function Interpretation

Chapter 3 An Expert System for Lung Function Interpretation

3.2 Methodology

3.2.1 Lung Function Resources

The availability of both data and experts had a large influence on the direction of this study. The study was prompted in part by the availability of large numbers of archived lung function reports, a resource with the potential for expanding current knowledge. Whilst a lack of expert availability also shaped the course of the study to some extent, this resource deficiency highlights the potential benefits of research in this area.

3.2.1.1 Data

The data that was used to acquire the domain knowledge consisted of an amalgamation of lung function case reports from three sources: 1568 reports from Austin Health in Melbourne, Australia [2]; 1390 reports from the 2004 round of the Tasmanian Longitudinal Health Study (TAHS) [3]; and 5 reports from the Royal Hobart Hospital in Hobart, Australia [4]. Each report was considered to be a single case in the dataset, with the source added as a further attribute. In the implementation of the knowledge acquisition system, each of these sources was presented as a distinct dataset, with the option to view them all as a single dataset, as it is common for experts to be interested only in a single source of data for knowledge discovery purposes. The data was considered similar enough that experts would be unlikely to want to define a rule for only one dataset, but the inclusion of the source as an attribute allowed this if necessary.

[2] http://www.austin.org.au/

[3] http://www.epi.unimelb.edu.au/research/major/tahs

[4] http://www.dhhs.tas.gov.au/hospital/royal-hobart-hospital

All cases had any identifying data removed for privacy reasons and were identified within the online system by an ID number and Source pair; for example, "case 38 from the TAHS study", or "38 TAHS". The Source was used as an identifier to allow for the possibility that cases may be linked back to the archived stores, which may have additional information for future analysis.
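The identification scheme and the per-source views described above can be illustrated with a minimal sketch. This is not the system's actual implementation. The `Case` class, the `view` helper, the attribute name `FEV1`, its value, and the source labels `Austin` and `RHH` are all hypothetical names chosen for illustration; only the `TAHS` label and the "38 TAHS" form come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Case:
    """One de-identified lung function report (hypothetical structure)."""
    case_id: int
    source: str  # the source is kept as an ordinary attribute of the case
    # Test results keyed by attribute name; None marks a missing value.
    results: Dict[str, Optional[float]] = field(default_factory=dict)

    @property
    def label(self) -> str:
        # The ID number alone is not unique; the (ID, Source) pair is.
        return f"{self.case_id} {self.source}"


def view(cases: List[Case], source: Optional[str] = None) -> List[Case]:
    """Return one source's cases, or every case when no source is given."""
    if source is None:
        return list(cases)
    return [c for c in cases if c.source == source]


# Illustrative data only: the attribute name and value are made up.
cases = [
    Case(38, "TAHS", {"FEV1": 3.1}),
    Case(38, "Austin", {"FEV1": None}),
    Case(1, "RHH"),
]
```

Calling `view(cases, "TAHS")` gives the single-source view, while `view(cases)` gives the combined dataset. Because the source is stored on each case, a rule could still refer to it if an expert wanted a source-specific rule.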

Importantly, all cases in the dataset were entirely unclassified – they had no information such as eventual interpretations or diagnoses, nor any information on the future discovered effects for each patient. This precluded automated machine learning approaches from consideration in developing a knowledge base for the data. Each case constituted a single set of test results from a single patient, independent of history, future tests or information, or any form of information other than the recorded test results.

When presented to the users, all reports were displayed in a format similar to the printed formats that are used by most medical institutions, so as to be recognisable and familiar. Figure 3-2 shows an example lung function report as it appears in the online system.

As Figure 3-2 shows, not all cases had values for all attributes, and many were missing values for different attributes. The exact measurements taken may have depended on the facility where the tests were performed, the reason the tests were being performed, who was performing the tests, practical restrictions due to other medical problems, or even broken equipment – the reasons for their omission were not recorded with the cases. These missing values had little impact on the knowledge acquisition process, as most cases contained sufficient information for classifications to be made; and if any single case did not, the definition for classifications could be derived from other cases.

Figure 3-2: Sample lung function report

Reference Equations

As described in section 2.5.2, lung function reports are typically presented with the predicted values for each attribute, as determined by a set of reference equations. In this study, the report data initially contained values derived from the Knudson et al. 1976 equations (Knudson, et al., 1976) for spirometry, gas transfer equations from Cotes and Leathart (Cotes & Leathart, 1993), and Goldman and Becklake's 1959 lung volumes equations (H. Goldman & Becklake, 1959). During the development of this study, however, a number of lung function experts rejected these as somewhat outdated, and new equations were introduced: the NHANES III equations for spirometry (Hankinson, et al., 1999), the Roca et al. equations for lung volumes (Roca, et al., 1990), and the Quanjer et al. equations for gas transfer (Quanjer, et al., 1993). The previous equations were kept as an option to allow experts to compare results between reference equations, and to allow them to use whichever they felt was most appropriate.
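The choice between the two sets of reference equations can be pictured as a simple lookup. The following is a minimal sketch under stated assumptions, not the system's actual design. The names `REFERENCE_SETS` and `reference_for`, and the set labels `current` and `previous`, are hypothetical. The pairings of test type to equation source follow the paragraph above, and the equations themselves are deliberately not reproduced.

```python
from typing import Dict

# Hypothetical registry: which published equations supply the predicted
# values for each test type, under each selectable equation set.
REFERENCE_SETS: Dict[str, Dict[str, str]] = {
    "current": {
        "spirometry": "NHANES III (Hankinson et al., 1999)",
        "lung volumes": "Roca et al. (1990)",
        "gas transfer": "Quanjer et al. (1993)",
    },
    "previous": {
        "spirometry": "Knudson et al. (1976)",
        "lung volumes": "Goldman and Becklake (1959)",
        "gas transfer": "Cotes and Leathart (1993)",
    },
}


def reference_for(test_type: str, equation_set: str = "current") -> str:
    """Name the equation source used for a test type in the chosen set."""
    return REFERENCE_SETS[equation_set][test_type]
```

Keeping both sets behind one lookup is one way an expert could switch sets and compare the predicted values for the same case.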

Flow-Volume Loops and Data Visualisations

There were, however, a few significant differences between the reports that experts typically see in their professional work and the reports that were presented by the system. Most notably, most cases in the system did not have Flow-Volume Loop (FVL) diagrams, volume graphs, or any associated visual representation of the test results. FVL diagrams visually describe the airflow during the inhalation and exhalation measured in spirometry. The FVL is generally considered to be a vital component in the interpretation of a lung function report, as the visual cues provided by its shape give respiratory experts an immediate impression of what to look for and how to proceed, and often an initial diagnosis. It is considered critical both by less experienced experts, who are not as aware of the significance of all of the attributes, and by experienced experts, who can infer a great deal from an initial glance. While the FVL generally does not provide any information that the test results do not, it has become such an effective shortcut to interpreting results that it is expected on reports, and some experts have come to rely upon it for their interpretations.

In fact, when experts were initially approached to take part in this study, many were uncomfortable working without FVLs and declined to take part (this appeared to be based entirely on personal preference, with neither the type nor the experience of an expert indicating whether they would refuse). Because FVLs can play such a critical role, reports with the FVL and volume graphs attached were added to the dataset from the Royal Hobart Hospital, and twenty more FVLs were created by a leading respiratory scientist to match a set of cases chosen to be representative of the range of cases in the dataset, to allow as many experts as possible the opportunity to participate in the study.

3.2.1.2 The Experts

In an effort to ensure the best possible resultant knowledge base, multiple experts were used to perform the knowledge acquisition. These experts had a range of experience and knowledge in the lung function field, in working with patients and performing respiratory research.

Three experts were used to acquire the knowledge for the main knowledge base in this study. Primarily the knowledge came from a single leading respiratory scientist in Australia, with additional input from another highly regarded clinical specialist, and some minor additions by another respiratory researcher. Further input, ranging from system design and testing to explaining complexities of the lung function domain, was taken from 15 more available experts in Australia.

Initially, in order to organise his thoughts and establish some fundamental classifications, the leading expert created a document detailing definitions for a set of common classifications. This document was circulated to, and confirmed by, another small group of respiratory experts, including the two other experts involved in developing the knowledge base. Once the definitions were confirmed, the administrator of the system added them to the system as a basic set of initial rules, in the manner of a Vazey CARD approach (Vazey, 2006). The secondary expert then contributed to this knowledge base, along with the tertiary expert. As the experts were not concurrently available, and in order to allow knowledge comparisons, the first expert also developed his own knowledge base independently of the collaborative knowledge base. Finally, the two knowledge bases were compared, inconsistencies were resolved where necessary, and the knowledge bases were consolidated into a single final knowledge base. The development of, and contributions towards, each knowledge base are summarised in Figure 3-3. The methods for acquiring and consolidating the experts' knowledge are discussed in the following section.

[Figure 3-3 diagram. Elements shown: Primary Expert, Secondary Expert, Tertiary Expert, Initial Definitions Document, Independent Knowledge Base, Collaborative Knowledge Base, Consolidated Knowledge Base]

Figure 3-3: Contributors to each knowledge base

In document A method for knowledge discovery and development with health data (Page 105-110)