In recent years, the deoxyribonucleic acid (DNA) microarray technique has had a great impact on identifying the informative genes that cause cancer [115, 116]. A persistent pitfall of microarray data is the "curse of dimensionality" problem [117]. This problem obscures the useful information in the data set and leads to computational instability. Therefore, the selection/extraction of relevant features (genes) remains imperative in the analysis of cancer microarray data.
A good number of feature (gene) selection/extraction techniques and classifiers based on machine learning techniques have been proposed by various researchers and practitioners [3–7].
Statistical tests, which can be categorized as either parametric or non-parametric, can be used for feature selection by testing hypotheses [118]. Depending on which hypothesis (null or alternative) is supported, each feature is either selected or rejected. The data are then classified into their respective classes.
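As an illustration of test-based feature selection, the following minimal sketch ranks genes by the absolute value of a two-sample t-statistic and keeps the top-ranked ones. The unequal-variance (Welch) form, the function names `welch_t` and `rank_genes`, the `top_k` cut-off, and the toy data are assumptions made for this example; they are not taken from the implementation described in this chapter.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances).

    Assumes each sample has at least two values and non-zero variance.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_genes(X, y, top_k):
    """Return the indices of the top_k genes (columns) with the largest |t|.

    X: list of samples, each a list of gene expression values.
    y: list of 0/1 class labels, one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(a, b)))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]

# Hypothetical toy data: 6 samples, 3 genes; only gene 1 separates the classes.
X = [[1.0, 5.0, 2.0],
     [1.2, 5.5, 2.1],
     [0.9, 4.8, 1.9],
     [1.1, 9.0, 2.0],
     [1.0, 9.4, 2.2],
     [1.3, 8.9, 1.8]]
y = [0, 0, 0, 1, 1, 1]
selected = rank_genes(X, y, top_k=1)
```

In practice the cut-off would more likely be set from a significance level on the test's p-value than from a fixed `top_k`.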
Extreme Learning Machine (ELM) is a variant of the artificial neural network (ANN), specifically of single-hidden-layer feedforward networks, which has recently gained popularity because it learns faster than conventional machine learning techniques [119].
Relevance Vector Machine (RVM) is a machine learning technique that the research community regards as having an edge over SVM [120, 121]. The RVM workflow is based on a Bayesian formulation of a linear model with an appropriate prior assumption that results in a sparse representation. As a result, it generalizes well and provides inferences at low computational cost. RVM has the same functional form as SVM, but it uses a Bayesian probabilistic model for learning and prediction.
Fuzzy logic provides a means to arrive at a definite conclusion from vague, ambiguous, imprecise, noisy, or missing input information. The nature of the data set is quite fuzzy, i.e., not predictable, so the same data can lead to different inferences, and the relationship between the data and the inference is unknown. The fuzzy concept is used in this work to study the behavior of the data (capturing the human way of thinking) and to represent and describe the data mathematically. Further, a fuzzy system has been considered because only a limited number of rules need to be learnt in the present system. The number of free parameters to be learnt is reduced considerably, leading to efficient computation. In general, if the number of features is larger than 100, machine learning techniques are more suitable than statistical approaches.
If an ANN alone is applied to the same problem, designing the model is far more challenging because of the large number of cases. Coupling an ANN with fuzzy logic makes the model easier to handle, because the rule base of the fuzzy system can be inferred. Neuro-fuzzy networks have been applied successfully in various areas of analytics. Two typical types of neuro-fuzzy networks are the Mamdani type [122] and the TSK type [123]. In Mamdani-type neuro-fuzzy networks, the minimum fuzzy implication is used in fuzzy reasoning. In TSK-type neuro-fuzzy networks, the consequent of each rule is a function of the input variables; the function generally adopted is a linear combination of the input variables plus a constant term. Several researchers and practitioners have reported that TSK-type neuro-fuzzy networks achieve better network size and learning accuracy than Mamdani-type neuro-fuzzy networks [124]. In the classic TSK-type neuro-fuzzy network, whose rule consequents are linear polynomials of the input variables, the system output is approximated locally by the rule hyperplanes.
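To make the TSK-type rule structure concrete, the following minimal sketch evaluates a first-order TSK system: each rule fires with a strength given by the product of its membership values, each consequent is a linear combination of the inputs plus a constant, and the output is the firing-strength-weighted average of the consequents. The Gaussian membership functions, the product t-norm, the dictionary layout of a rule, and all numeric values are assumptions made for this example, not details of the system used in this chapter.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x (membership shape is an assumption)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def tsk_output(x, rules):
    """Output of a first-order TSK system for one input vector x.

    rules: list of dicts with keys
      'centers', 'sigmas' : antecedent parameters, one per input
      'coeffs', 'bias'    : linear consequent f(x) = bias + sum(coeffs * x)
    Assumes at least one rule has non-zero firing strength.
    """
    num = 0.0
    den = 0.0
    for r in rules:
        # Firing strength: product of membership values over all inputs.
        w = 1.0
        for xk, c, s in zip(x, r["centers"], r["sigmas"]):
            w *= gaussian_mf(xk, c, s)
        # Linear consequent of the rule (a local hyperplane).
        f = r["bias"] + sum(a * xk for a, xk in zip(r["coeffs"], x))
        num += w * f
        den += w
    return num / den

# Hypothetical two-rule system on two inputs.
rules = [
    {"centers": [0.0, 0.0], "sigmas": [1.0, 1.0], "coeffs": [1.0, 0.0], "bias": 0.0},
    {"centers": [2.0, 2.0], "sigmas": [1.0, 1.0], "coeffs": [0.0, 1.0], "bias": 2.0},
]
y_mid = tsk_output([1.0, 1.0], rules)  # both rules fire equally at the midpoint
```

At the midpoint between the two rule centers the firing strengths are equal, so the output is the plain average of the two consequents.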
However, a linear subspace cannot describe the non-linear variations of microarray genes. A kernel feature space, by contrast, can reflect the non-linear information in the genes. With the kernel trick, the data points are implicitly mapped into a higher-dimensional (possibly infinite-dimensional) space [125]. The kernel trick is a mathematical technique that can be applied to any algorithm based on dot products: wherever a dot product between two vectors is encountered, it is replaced by a kernel function. This turns candidate linear algorithms into non-linear algorithms, sometimes with little effort or reformulation. The resulting algorithm is non-linear in the original feature space but equivalent to the linear algorithm operating in the kernel-induced feature space.
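The equivalence behind the kernel trick can be checked numerically on a small case. The sketch below uses a homogeneous quadratic kernel on two-dimensional inputs, chosen here only because its feature map is short enough to write out; the function names and sample vectors are hypothetical. It compares the kernel value computed in the input space with the dot product computed after an explicit mapping.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map of the 2-D homogeneous quadratic kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def quad_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated without ever forming phi."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
explicit = dot(phi(x), phi(z))   # dot product in the mapped 3-D space
implicit = quad_kernel(x, z)     # same quantity from the 2-D inputs
```

Both values agree up to floating-point rounding, which is why a dot-product-based algorithm can work in the mapped space while only ever evaluating the kernel on the original inputs.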
In this chapter, the following types of kernels have been used to map the data into a high-dimensional space.
• Linear: $K(x_i, x_j) = \gamma\, x_i^T x_j$.
• Polynomial: $K(x_i, x_j) = (x_i^T x_j + b)^{\gamma}$, $\gamma > 0$.
• Radial Basis Function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$.
• Tansigmoid (Tansig): $K(x_i, x_j) = \tanh(\gamma\, x_i^T x_j + b)$, $\gamma > 0$.
where $\gamma$ and $b$ are kernel parameters.
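The four kernels listed above can be written directly as functions of two vectors. The following is a minimal sketch in plain Python; the function names and the default values of `gamma` and `b` are assumptions chosen for illustration, not the parameter settings used in the experiments.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def linear_kernel(xi, xj, gamma=1.0):
    # K = gamma * xi^T xj
    return gamma * dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=2.0, b=1.0):
    # K = (xi^T xj + b)^gamma; gamma acts as the exponent, as in the list.
    # A non-integer gamma with a negative base would give a complex value.
    return (dot(xi, xj) + b) ** gamma

def rbf_kernel(xi, xj, gamma=0.5):
    # K = exp(-gamma * ||xi - xj||^2)
    sq_dist = sum((p - q) ** 2 for p, q in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def tansig_kernel(xi, xj, gamma=0.1, b=0.0):
    # K = tanh(gamma * xi^T xj + b)
    return math.tanh(gamma * dot(xi, xj) + b)

# Hypothetical sample vectors.
xi, xj = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(xi, xj)
k_poly = polynomial_kernel(xi, xj)
k_rbf_same = rbf_kernel(xi, xi)
k_tan = tansig_kernel(xi, xj)
```

Keeping every kernel behind the same two-vector signature lets a classifier swap one kernel for another without any other change.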
The choice of a kernel function depends on the problem at hand, because it depends on what is being modeled. For instance, a polynomial kernel models feature conjunctions up to the order of the polynomial. A radial basis function kernel can pick out circles (or hyperspheres), whereas a linear kernel can pick out only lines (or hyperplanes). The choice of a particular kernel can therefore be intuitive and straightforward, depending on what kind of information is to be extracted from the data.
Hence, along with feature selection using the t-statistic, ELM, RVM, and a non-linear version of FIS called the kernel fuzzy inference system (KFIS), which is a kernelized version of the neuro-fuzzy system with different kernel functions, are used as classifiers with 10-fold cross validation (CV). The simulation of existing state-of-the-art methods was presented in the earlier chapter, where SVM (with RBF kernel) performed very well, achieving better accuracy in less time. Therefore, the motivation of this chapter is:
• To analyze the microarray data using a classifier that achieves better accuracy in minimal processing time.
• Idea: Various classifiers with different kernels are applied to analyze the microarray dataset.
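The 10-fold cross validation applied to the classifiers can be sketched as a split of sample indices into ten disjoint folds, each used once for testing while the remaining nine are used for training. The function name `k_fold_indices`, the shuffling seed, and the sample count of 25 are assumptions made for this example; stratification by class label, which is often preferable for small microarray data sets, is not shown.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    whose sizes differ by at most one.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Hypothetical data set of 25 samples.
splits = list(k_fold_indices(25, k=10))
```

A classifier would be trained on `train` and scored on `test` for each split, and the ten scores averaged to estimate accuracy.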
The rest of the chapter is organized as follows: Section 3.2 presents the procedure for classifying the microarray data using the proposed classifiers. Section 3.3 presents the implementation details of the proposed approach. Section 3.5 highlights the results obtained and the interpretations drawn from them. Section 3.6 summarizes the chapter and presents the scope for future work.