
Discussion

4.7 Are KDD techniques viable in the identification of those with CRC?

Despite best efforts in early detection, colorectal cancer remains difficult to diagnose based on clinical symptoms alone. This is likely attributable to numerous factors, such as the stage of disease, the location of the tumour and the patients themselves, to name a few. Efforts are ongoing within the UK to increase the detection rate of colorectal cancer at an earlier stage, in the form of the national screening programme and FOB screening [148, 151]. Whilst screening is very sensitive, compliance in the screening population is variable [151, 153], likely due to the method by which the individual provides the samples. In addition, the use of flexible sigmoidoscopy in a mobile setting is being evaluated to optimise detection rates of colorectal cancer within the general population [286][287].

Notwithstanding the above, colonoscopy remains the gold standard method of diagnosis for colorectal cancer [13, 288, 289]. It is not within the bounds of this study to compare KDD methods with colonoscopy, nor was it the aim to compare these techniques with screening tools. The aim was to optimise the referral pattern for those who attended their primary care physician with symptoms and were referred on to secondary care for further assessment, by attempting to identify those who needed more urgent assessment and thereby potentially assisting in a more appropriate distribution of resources.

In this study KDD methods varied in their ability to predict patients with colorectal

cancer. These ranged from the best model accurately predicting 95% of those within


In studies assessing prediction it is important to ensure that the sample studied is sufficiently large to safeguard reliability. As such, the ratio of outcomes to input variables should be at least 10:1 [217], as failure to achieve this level has resulted in unstable models being created. In this study, all models explored had an appropriate ratio.
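The 10:1 rule of thumb described above comes down to simple arithmetic. The following is a minimal sketch, assuming the rule is read as requiring at least ten outcomes for every input variable. The function names and the counts are hypothetical illustrations and are not figures from this study.

```python
def outcomes_per_variable(n_outcomes: int, n_input_variables: int) -> float:
    """Return the number of outcomes available per input variable."""
    if n_input_variables <= 0:
        raise ValueError("n_input_variables must be a positive integer")
    return n_outcomes / n_input_variables


def meets_ratio(n_outcomes: int, n_input_variables: int, minimum: float = 10.0) -> bool:
    """Check whether a model design satisfies the assumed 10:1 rule of thumb."""
    return outcomes_per_variable(n_outcomes, n_input_variables) >= minimum


# Hypothetical counts, chosen only to illustrate the check.
adequate = meets_ratio(n_outcomes=120, n_input_variables=12)
inadequate = meets_ratio(n_outcomes=60, n_input_variables=12)

print(outcomes_per_variable(120, 12), adequate)
print(outcomes_per_variable(60, 12), inadequate)
```

Under this reading, a model with twelve input variables would need at least 120 outcomes before it could be regarded as adequately supported.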

KDD has been applied across a broad spectrum of medical fields. The most commonly used method to date has been the ANN, with studies showing comparability with, if not some degree of superiority to, traditional techniques [290][291][236][292][237]. The nonlinearity of ANNs and their ability to learn have made their use attractive when trying to stratify and predict outcomes in the field of medicine. Studies assessing outcomes of mortality and morbidity following cardiac surgery have been undertaken with positive results [242].

Alternative KDD methods used specifically in the field of medicine include fuzzy logic classification systems such as PROAFTN [293], which has been applied to assist in diagnosing bladder tumours and acute leukaemia. Fuzzy KNN classifiers have also been used and have been shown to produce a more robust model of prognostic markers than logistic regression and MLPs [294]. Fuzzy rule generation in conjunction with breast cancer datasets has been used with accuracy rates of 97%


The clinical environment in which a predictive system is used is the primary determinant of the model and its classification cut-off point. In a clinical setting such as that of this study, the optimal system is one that has a small number of false positives and no false negatives. This results in preference being given to model sensitivity at the cost of specificity. Whilst there is no theoretical guidance as to how the ideal cut-off point in an ANN is chosen, it may be possible to alter the number of cut-off points in the ROC curve, but studies looking at this have shown minimal gain
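The preference for sensitivity over specificity described above can be illustrated with a small worked example. This is a minimal sketch, assuming a model that outputs a score between 0 and 1 for each patient and classifies as positive any score at or above the cut-off. The function names, scores and labels are hypothetical and are not taken from this study; this is one simple way of picking a cut-off, not the method used here.

```python
def confusion_counts(scores, labels, cutoff):
    """Return (tp, fp, tn, fn) when scores >= cutoff are classed as positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_first_cutoff(scores, labels):
    """Return the highest cut-off that still produces no false negatives.

    With a ">=" decision rule this is the lowest score among true positives,
    so false positives are as few as possible once false negatives are zero.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not positive_scores:
        raise ValueError("at least one positive case is required")
    return min(positive_scores)


# Hypothetical model outputs (1 = disease present, 0 = disease absent).
scores = [0.05, 0.20, 0.35, 0.40, 0.70, 0.90]
labels = [0, 0, 1, 0, 1, 1]

cutoff = sensitivity_first_cutoff(scores, labels)
tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(cutoff, (tp, fp, tn, fn), sensitivity, round(specificity, 2))
```

In this toy data set, moving the cut-off down to the lowest-scoring true case removes every false negative at the price of one false positive, which is the trade-off the paragraph describes.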

4.8 Limitations

The use of KDD is reliant upon the quality of the information entered into the database for analysis. Whilst the data entered in this study directly reflected the answers given by patients regarding their symptoms, it is feasible that the questionnaire may have been too complex. The initial questionnaire had been validated within a cohort of patients within the department; however, some additions were made prior to its distribution for use within this study in an attempt to increase the amount of data received. It is feasible that the addition of extra questions may have misled or confused those completing the questionnaire, thus reducing its reproducibility. It is accepted that, once any changes had been made, the questionnaire should once again have been tested and validated on an independent cohort of patients, both prior to and on attendance at a clinic, to ensure that the answers were reproducible. Whilst this technique in itself may, due to human nature, result in some anomalies, it would allow the rigorous testing of the

4.9 Conclusion

The complexity of medical diagnosis remains challenging for both physicians and computational models. Risk prediction remains central to a clinician’s ability to successfully perform their duties, be it in a primary, secondary or tertiary care setting. An array of tests and tools is at the disposal of those in a hospital setting, allowing the investigation of those deemed to be at increased risk of a condition. Clinicians use clinical evidence in conjunction with experience to initiate further investigations; however, there is variation in experience depending on the specialisation of the clinician.

This study has shown that the use of KDD tools as an adjunct to clinical acumen can prove beneficial in identifying patients with lower GI pathology and can therefore expedite their diagnosis and treatment. While it would be ill-conceived to suggest that such computer models can replace physician-patient interaction, further work assessing the feasibility of models such as the ones in this study directing patients ‘straight to test’ is worthy of consideration for both 2ww pathway patients and



In document The use of knowledge discovery databases in the identification of patients with colorectal cancer (Page 170-175)