Discussion
4.7 Are KDD techniques viable in the identification of those with CRC?
Despite best efforts in early detection, colorectal cancer remains difficult to
diagnose based on clinical symptoms alone. This is likely attributable to numerous
factors, such as the stage of disease, the location of the tumour and the patients
themselves, to name a few. Efforts are ongoing within the UK to increase the
detection rate of colorectal cancer at an earlier stage, in the form of the national
screening programme and FOB screening [148, 151]. Whilst screening is very sensitive,
compliance in the screening population is variable [151, 153], likely due to the
method by which the individual provides the samples. In addition, the use of flexible
sigmoidoscopy in a mobile setting is being evaluated to optimise detection rates of
colorectal cancer within the general population [286][287].
Notwithstanding the above, colonoscopy remains the gold standard method of diagnosis
[13, 288, 289] for colorectal cancer, and it was not within the bounds of this study
to compare KDD methods with colonoscopy, nor was it the aim to compare these
techniques with screening tools. The aim was to optimise the referral pattern in those
who attended their primary care physician with symptoms and were referred on to
secondary care for further assessment, by attempting to classify those who needed
more urgent assessment and, as such, potentially to assist in the more appropriate
distribution of resources.
In this study, KDD methods varied in their ability to predict patients with colorectal
cancer. These ranged from the best model accurately predicting 95% of those within
In studies assessing prediction, it is important to ensure that the sample studied is
sufficiently large to safeguard reliability. As such, the ratio of input variables to
outcomes should be 10:1 [217], as failure to achieve this level has resulted in
unstable models being created. In this study, all models explored had an appropriate
ratio of input to output variables.
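The 10:1 rule can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the rule is read as at least ten outcome events per input variable, and the function names and counts are hypothetical, not taken from this study.

```python
def events_per_variable(n_events: int, n_inputs: int) -> float:
    """Return the number of outcome events available per input variable."""
    if n_inputs <= 0:
        raise ValueError("n_inputs must be a positive integer")
    return n_events / n_inputs


def meets_ratio(n_events: int, n_inputs: int, minimum: float = 10.0) -> bool:
    """Check the assumed rule of thumb: at least `minimum` events per input."""
    return events_per_variable(n_events, n_inputs) >= minimum


# Hypothetical counts, for illustration only (not data from this study).
print(meets_ratio(n_events=120, n_inputs=12))  # 120 / 12 = 10 events per input
print(meets_ratio(n_events=50, n_inputs=12))   # about 4.2 events per input
```

Under this reading, a model with 12 input variables would need at least 120 outcome events in the sample.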
KDD has been applied across a broad spectrum of medical fields. The most
commonly used method to date has been ANN, with studies showing
comparability with, if not some degree of superiority to, traditional techniques
[290][291][236][292][237]. The nonlinearity of ANN and their ability to learn have
made their use attractive when trying to stratify and predict outcomes in the field
of medicine. Studies assessing outcomes of mortality and morbidity following cardiac
surgery have been undertaken with positive results [242].
Alternative KDD methods used specifically in the field of medicine include fuzzy
logic classification systems such as PROAFTN [293], which has been applied to
assist in diagnosing bladder tumours and acute leukaemia. Fuzzy KNN classifiers
have also been used and have been shown to produce a more robust model of
prognostic markers than logistic regression and MLPs [294]. Fuzzy rule generation in
conjunction with breast cancer datasets has been used with accuracy rates of 97%
The clinical environment in which a predictive system is used is the primary
determinant of the model and its classification cut-off point. In a clinical setting
such as that of this study, the optimal system is one that has a small number of false
positives and no false negatives. This results in preference being given to model
sensitivity at the cost of specificity. Whilst there is no theoretical guidance as to
how the ideal cut-off point in an ANN is chosen, it may be possible to alter the number
of cut-off points in the ROC curve, but studies looking at this have shown minimal gain
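The trade-off between sensitivity and specificity at a given cut-off point can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the function names, scores and labels are hypothetical, and a case is assumed to be classed as positive when its model score is at or above the cut-off.

```python
def confusion_counts(scores, labels, cutoff):
    """Count TP, FP, TN, FN when scores >= cutoff are classed as positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def pick_cutoff(scores, labels, min_sensitivity=1.0):
    """Return (cutoff, sensitivity, specificity) for the highest cut-off that
    reaches the required sensitivity. Lowering the cut-off can only raise
    sensitivity and lower specificity, so the first cut-off that qualifies
    gives the best specificity available at that sensitivity."""
    if 1 not in labels or 0 not in labels:
        raise ValueError("labels must contain both positive and negative cases")
    for cutoff in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if sensitivity >= min_sensitivity:
            return cutoff, sensitivity, specificity
    raise ValueError("no cut-off reaches the requested sensitivity")


# Hypothetical model outputs and true outcomes (1 = cancer, 0 = no cancer).
scores = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(pick_cutoff(scores, labels))  # cut-off giving no false negatives
```

With these illustrative values, requiring no false negatives forces the cut-off down to 0.55, which admits one false positive; relaxing `min_sensitivity` would allow a higher cut-off and fewer false positives.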
4.8 Limitations
The use of KDD is reliant upon the quality of the information entered into the
database for analysis. Whilst the data entered within this study directly
reflected the answers given by patients regarding their symptoms, it is
feasible that the questionnaire may have been too complex. The initial
questionnaire had been validated within a cohort of patients within the
department; however, some additions were made prior to the distribution of the
questionnaire for use within this study to try to increase the amount of data
received. It is feasible that the addition of extra questions may have misled or
confused those completing the questionnaire, thus reducing its reproducibility.
It is accepted that, once any changes had been made to the questionnaire, it
should once again have been tested and validated on an independent cohort of
patients, both prior to and on attendance at a clinic, to ensure that the answers
were reproducible. Whilst this technique in itself may, due to human nature,
result in some anomalies, it would allow the rigorous testing of the
4.9 Conclusion
Medical diagnosis remains a complex challenge for both the physician and
computational models. Risk prediction remains central to a clinician’s ability to
successfully perform their duties, be it in a primary, secondary or tertiary care
setting. An array of tests and tools is at the disposal of those in a hospital setting,
allowing the investigation of those deemed to be at increased risk of a condition.
Clinicians use clinical evidence in conjunction with experience to initiate further
investigations; however, there is variation in experience depending on the
specialisation of the clinician.
This study has shown that the use of KDD tools as an adjunct to clinical acumen
can prove beneficial in identifying patients with lower GI pathology and can therefore
expedite their diagnosis and treatment. While it would be ill-conceived to suggest
that such computer models can replace physician-patient interaction, further work
assessing the feasibility of models such as the ones in this study directing patients
‘straight to test’ is worthy of consideration for both 2ww pathway patients and