Chapter 2: Microarray technologies and data-driven modelling for bladder cancer
2.6 Machine learning models for microarray Cancer Classification
2.6.1 Computational Intelligence Modelling for Cancer Classification
Computational Intelligence (CI) can be defined as “the study of adaptive mechanisms to enable or facilitate intelligent behaviour in complex problems” [65]. CI algorithms have proven popular in the analysis of microarray data because they can detect complex nonlinear associations between variables and offer substantial benefits in terms of tolerance to imprecision and system interpretability. CI includes techniques such as Neural Networks (NN), Fuzzy Logic (FL), Neural-Fuzzy Logic, Support Vector Machines (SVM) and Bayesian Networks (BN). A review of several CI techniques applied to bioinformatics is presented in [66].
In [7, 17, 18], the authors compared different CI approaches for cancer classification and stated that traditional analytic methods fail to give accurate results in microarray data applications because these methods assume biological linearity and use correlation or dependence to find the relationship between a gene and its class. Within CI there is an area of study called Soft Computing. Soft Computing can be seen as a collection of methods that solve real problems in a way similar to how humans solve them [67]. This is one of the most important reasons for using Soft Computing: it applies human-like reasoning to a problem and yields a human-understandable explanation of the model.
Soft Computing includes techniques such as Neural Networks, Fuzzy Logic, Neural-Fuzzy Logic, and Support Vector Machine.
i. Fuzzy Logic
Fuzzy Logic is a linguistic method based on a number of rules that describe the system. The transparency of FL and the ease with which its results can be interpreted make it an attractive and effective method for the analysis of gene expression data [68-71].
When reducing the number of rules, it is important not only to reduce them but, above all, to show that the reduction does not affect the accuracy of the model. The goal is to have the minimum number of rules with the best prediction accuracy, not one rule per input.
The same applies to the number of genes; there is a debate about the effectiveness of using a large or a small number of genes. As stated previously in this chapter, microarray data comprise thousands of genes, so the main purpose is to find the genes that lead to the best prediction.
Recent research has shown that a small number of genes is enough for accurate prediction of most cancers; nevertheless, the number of genes varies between diseases [72]. A large set of gene expression values can decrease the classification accuracy due to the curse of dimensionality [73]. In this phenomenon, the classification accuracy decreases as the dimensionality increases.
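The idea of keeping only a small number of informative genes can be illustrated with a minimal sketch. The scoring rule used here (a signal-to-noise ratio between two classes) is an assumption chosen for illustration and is not necessarily the selection method used in the cited works; the gene names, expression values and the helper names `signal_to_noise` and `top_genes` are hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """Score one gene: |difference of class means| / sum of class std devs."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = stdev(a) + stdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0

def top_genes(expression, labels, k):
    """Return the names of the k highest-scoring genes."""
    scores = {gene: signal_to_noise(vals, labels)
              for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: 3 genes measured on 6 samples (class 0 or class 1).
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],  # separates the classes clearly
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # same mean in both classes
    "gene_c": [0.5, 1.5, 1.0, 1.8, 2.6, 2.2],  # weakly informative
}

selected = top_genes(expression, labels, k=2)
print(selected)
```

With real microarray data the same ranking would be applied to thousands of genes, and the value of `k` would have to be chosen by validation rather than fixed in advance.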
The advantages of the Fuzzy Logic method include:
Transparency, because of the linguistic rules.
Easy interpretation of the output, because of the Low, Low Medium, Medium, Medium High and High states.
Rules that explain the model, making it easier for clinicians to understand.
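The linguistic states listed above can be sketched as fuzzy sets. The example below is only an illustration under stated assumptions: triangular membership functions, a normalised expression scale from 0 to 1, evenly spaced states, and the minimum operator for AND are all choices made here, not details taken from the source. The names `triangular`, `fuzzify` and `rule_strength` are hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Five assumed linguistic states on a normalised [0, 1] expression scale.
STATES = {
    "Low":         (-0.25, 0.00, 0.25),
    "Low Medium":  (0.00, 0.25, 0.50),
    "Medium":      (0.25, 0.50, 0.75),
    "Medium High": (0.50, 0.75, 1.00),
    "High":        (0.75, 1.00, 1.25),
}

def fuzzify(x):
    """Degree of membership of x in each linguistic state."""
    return {name: triangular(x, *abc) for name, abc in STATES.items()}

def rule_strength(gene_1, gene_2):
    """Firing strength of: IF gene_1 is High AND gene_2 is Low THEN ...
    The AND is taken as the minimum of the two memberships."""
    return min(fuzzify(gene_1)["High"], fuzzify(gene_2)["Low"])

memberships = fuzzify(0.6)          # partly Medium, partly Medium High
strength = rule_strength(0.9, 0.1)  # how strongly the rule applies
print(memberships, strength)
```

A value of 0.6 belongs partly to Medium and partly to Medium High, which is how a fuzzy model tolerates imprecision while still producing a rule that a clinician can read.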
Due to the characteristics of microarray data (high dimension and low sample size), Fuzzy Logic models (like many other methodologies) struggle to make an accurate classification [68].
ii. Neural Networks
Neural Networks are inspired by how the human brain learns and processes information, and they have the capability to solve complex tasks [74]. The concept simulates the behaviour of a biological neural network [74]. While in humans learning is done by adjusting the synaptic connections between neurons, in NNs learning is done by adjusting the weights between the processing elements of the network [74].
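Learning by weight adjustment can be sketched with a single processing element. This is a minimal illustration, assuming one sigmoid neuron trained by gradient descent on the cross-entropy loss; the data, learning rate and number of epochs are hypothetical, and a practical network would have many such elements arranged in layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, targets, epochs=2000, rate=0.5):
    """Train one sigmoid neuron by repeatedly adjusting its weights."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = y - t  # gradient of the loss w.r.t. the weighted sum
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias

def predict(weights, bias, x):
    """Class 1 if the neuron output is at least 0.5, else class 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Hypothetical toy data: two normalised expression values per sample.
samples = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
targets = [0, 0, 1, 1]

weights, bias = train_neuron(samples, targets)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

The learned weights are just numbers, which gives a small-scale view of why a trained network is often described as a black box.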
Neural networks can obtain a good performance with higher learning speed in many applications. However, a high complexity of the network (large number of hidden nodes) translates into a slower response of the trained network [75].
A possible disadvantage of neural networks, especially with microarray data, is overtraining. In overtraining, a model can learn a local solution for each example as opposed to finding a global solution [76].
Neural Networks have been successfully applied to the prediction of cancer [77, 78], but one of the reported disadvantages is that the elicited network is hidden within a ‘black box’, which hinders any insight into the underlying process and any clinical interpretation [7].
iii. Neural-Fuzzy
The characteristics of Fuzzy Logic and Neural Networks have been discussed in this chapter; these two methodologies can be combined to form a hybrid Neural-Fuzzy (NF) model. Neural-Fuzzy models combine the learning ability of Neural Networks and the interpretability of Fuzzy systems [72]. The fuzzy logic rules of this type of model can be translated into linguistic statements to allow understanding and interrogation of the model.
Neural-Fuzzy systems are a popular approach for addressing tolerance to imprecision and system simplicity (interpretability); they are widely used in the literature [79-82] and have more recently also been used for the prediction of bladder cancer [7, 16-18]. Neural-Fuzzy systems take advantage of the simplicity and tolerance to imprecision of Fuzzy Logic structures and the adaptive learning ability of NNs, while the inclusion of knowledge in the model is still possible. In general, fuzzy set theory [83] has been extensively applied to pattern classification, and FL systems have been proven to perform well on uncertain information [84-86]. In terms of their simplicity and interpretability, Neural-Fuzzy models allow model knowledge to be represented in the form of just a few simple linguistic rules, thus rendering such modelling structures appropriate for systems oriented towards human reasoning (human-centric systems), e.g. clinical decision support systems [87-89].
iv. Support Vector Machines
The Support Vector Machine (SVM) was initially created to solve classification problems and has been successfully applied to a number of real-world problems. SVMs have exhibited outstanding performance in classification tasks. An SVM searches for the hyperplane that separates the two classes of data with the largest margin. SVMs have been shown to be good classifiers for microarray data [90].
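The notion of margin can be made concrete with a small sketch. It does not train an SVM; it only computes the margin of two hand-picked candidate hyperplanes on hypothetical two-dimensional data and keeps the wider one, whereas a real SVM finds the widest separating hyperplane by solving a quadratic programme. The function name `geometric_margin` and all values are assumptions for illustration.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points.
    The result is positive only if the hyperplane separates the classes."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data with class labels -1 and +1.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]

# Two hand-picked separating hyperplanes, each given as (w, b).
candidates = {
    "vertical split x1 = 3": ((1.0, 0.0), -3.0),
    "diagonal split x1 + x2 = 5": ((1.0, 1.0), -5.0),
}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both candidates classify every point correctly, but the diagonal one leaves more room between the classes, which is the property an SVM optimises.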
The Support Vector Machine is a popular method in microarray analysis because it can deal with data that have a large number of features and a small number of samples [91]. One of the drawbacks of this method is the high algorithmic complexity and the extensive computing requirements of large-scale quadratic programming tasks. A second problem often mentioned is its poor interpretability compared to other methods [92, 93].
v. Bayesian Networks
Bayesian networks (BNs) reflect the random nature of gene expression and use Bayes’ rule [94]. They are also known as probabilistic networks or probabilistic graphical models. The hypothesis in BN is that gene expression values can be defined by random variables that follow probability distributions [94].
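Bayes' rule, on which these models rest, can be shown with a short worked sketch. It covers a single observation and two classes only, not a full network; the class names, the prior and the likelihood values are hypothetical numbers chosen for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of classes.

    prior[c]      = P(c)
    likelihood[c] = P(observation | c)
    returns         P(c | observation) for every class c
    """
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # P(observation)
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers: probability of each class before seeing the data,
# and probability that a given gene is highly expressed in each class.
prior = {"tumour": 0.3, "normal": 0.7}
likelihood_high = {"tumour": 0.8, "normal": 0.1}

post = posterior(prior, likelihood_high)
print(post)
```

Observing high expression raises the probability of the tumour class from the assumed prior of 0.3 to about 0.77, since 0.24 / (0.24 + 0.07) = 0.24 / 0.31.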
Bayesian networks provide a flexible framework for incorporating expert knowledge into the modelling process [95, 96]. An additional advantage of BNs is that they are well suited to modelling the randomness and noise associated with microarray data [97]. Bayesian networks deal with probabilities, but the ‘causality’, i.e. the factors that generated the solution, is also important for the network [97].
Bayesian Networks have also been applied to cancer prediction [98-101], in particular in the form of Bayesian Neural Networks (BNN). Bayesian Networks are modelling structures for expressing multidimensional joint probability distributions. The main challenge in using a BNN is the need to estimate its topology from observations, which is not a trivial problem due to the large amount of uncertainty and the high computational complexity, even for networks of moderate size [98, 102, 103].
2.6.2 Machine learning models specific to microarray bladder cancer Stage, Grade