Optimization and Classification Techniques in Microarray Medical Data for Gene Selection: A Survey


Vol. 28, No. 12, (2019), pp. 68-78


Vadipina Amarnadh, Md Asdaque Hussain

Dept. of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Guntur (Andhra Pradesh), India

Abstract

The aim of this paper is to analyze the various techniques involved in the classification of microarray medical data. Much research has concentrated on gene selection, a feature selection step that has a strong impact on the classification result: selecting the disease-related genes from the enormous number of measured genes helps to classify the disease more accurately. Several optimization techniques have been applied to gene selection; one well-known method, modified Particle Swarm Optimization (PSO), is explained in this paper. Researchers have used different types of classifiers to test the performance of these methods. Using minimum redundancy and maximum relevance (mRMR) features with a Support Vector Machine (SVM), an accuracy of up to 98% was achieved with 8 genes.

Keywords: Microarray medical data, Minimum redundancy and maximum relevance features, Particle Swarm Optimization, Support Vector Machine.

1. Introduction

Scientists have developed novel technologies to meet the requirements for reliable transcription profiling, based on the relationship between genotype and phenotype. Numerous methods exist for transcriptome analysis, which aims to understand the expression of transcripts; the two most widely used techniques for measuring transcript expression are microarrays and RNA-Seq [1-3]. Standard karyotype examination has recently been replaced by chromosomal microarray analysis testing because of its superior detection of clinically significant findings [4-6]. A cell contains a huge number of genes, and their expression reflects the unique features of different cell types. DNA microarray technology measures the gene expression of cells, giving the simultaneous expression of more than a thousand genes. It is widely used in the medical field to distinguish between normal and cancerous tissue [7-9]. Gene selection is a form of feature selection: the process of extracting the relevant genes for a classification or prediction problem. Selecting relevant genes is a crucial task, and selecting a subset of informative genes from a high-dimensional microarray dataset is an NP-hard problem [10-11].

Two further issues arise in gene classification: (1) the available gene expression datasets are complex and noisy, and (2) they typically contain fewer than one hundred instances, each of which quantifies the expression levels of several thousand genes [12-13]. High-density DNA microarrays measure the activity of many genes in parallel, and by improving the accuracy with which cancer types are diagnosed, this method helps to provide better therapy for cancer patients. Early detection allows cancer to be treated in its initial stage and increases the chance of survival [14]. This paper presents an analysis of numerous studies on the classification of microarray medical data for various diseases and tests. Each method uses its own feature selection technique to extract the genes relevant to classification, and the methods achieve different classification accuracies and other performance measures depending on the feature selection and classification techniques used. For example, an ant colony optimization technique for feature selection reduced the SVM error rate to 23.63% [50], and a fuzzy rough quick reduct algorithm achieved a classification accuracy of 97.22% on the Leukemia dataset [46].

2. Outcomes from parallel research

This section reviews recent papers on microarray medical data classification to understand the significance and limitations of their work. Each paper proposes its own technique for improving classification accuracy or other performance measures. In this context, feature selection corresponds to selecting genes from the data: a cell contains an abundance of genes, and extracting the relevant features from such large data is an important part of any classification method. Researchers have used many feature extraction techniques and tested various classifiers to improve performance. The proposed techniques and their classifiers are discussed in this section, together with the publicly available gene expression datasets. An overview of microarray medical data classification [15] is shown in Fig. 1.

Figure 1. Overview of the microarray data classification method

2.1 Gene Expression Dataset

Several gene expression datasets are available; those most used by researchers include the Yeast cell-cycle dataset, Small Round Blue Cell Tumor (SRBCT), Acute lymphoblastic leukemia, Mixed-lineage leukemia, and Affymetrix oligonucleotide Human datasets.

The publicly available Yeast cell-cycle dataset was used in the gene classification research of [16]. It contains more than 6000 yeast genes measured over two complete cell cycles at 17 conditions, of which 2884 genes were considered in the experiments of [16]. GOTermFinder, a tool available in the Saccharomyces Genome Database (SGD), searches for the characteristic GO terms shared by a group of genes and so helps to identify genes that have characteristics in common; it was used in [16-17].

The research of [18] used three datasets covering both binary and multi-class problems: SRBCT, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), and mixed-lineage leukemia (MLL). The SRBCT data has four cancer classes, namely neuroblastoma (NB), non-Hodgkin lymphoma (NHL), Ewing sarcoma (EWS), and rhabdomyosarcoma (RMS); it contains 2308 genes, and 88 samples were used in [18]. The ALL_AML dataset has two cancer classes, ALL and AML, and a total of 7129 genes. The MLL dataset has three classes, namely MLL, ALL, and AML.

2.2 Feature Selection

Feature selection is an important process here because, as the dataset descriptions above show, each dataset contains thousands of genes. Selecting the genes that are important and relevant to the disease is the crucial task. Most researchers have focused on classifying cancers such as colon cancer, acute leukemia, prostate cancer, lung cancer II, and high-grade glioma. Some techniques cluster the data using gene features. Many features have been extracted in research on microarray medical data classification; those used in the latest research are briefly described in this section.

The research of [19] used self-training, one of the most commonly used semi-supervised algorithms. Self-training acts as a wrapper around a supervised classifier and can be used to improve that classifier. To apply self-training, the classifier must be able to output, for each instance x* to be classified, not only the predicted class label but also a score estimating how likely it is that this prediction is correct.

Aziz et al. [20] proposed a new hybrid feature extraction method for gene data that combines Fuzzy Backward Feature Elimination (FBFE) with Independent Component Analysis (ICA). ICA is a projection method that decomposes the input dataset into components that are as statistically independent of each other as possible, and it has proven useful in many applications. It can be seen as an extension of PCA, which projects the data into a new space spanned by the principal components; ICA instead seeks a linear representation of non-Gaussian data in which the components are independent. The fuzzy feature selection method uses the ICA vectors to find the best gene subset for classification. A major limitation is that ICA extracts as many components as there are observed variables m, so again $2^m$ candidate gene subsets exist.

Maximum relevance–minimum redundancy is employed for feature extraction in [21]. A modified cat swarm optimization applied to dimensionality reduction of microarray data performs effectively [22]. Cao et al. [23] developed a multi-objective feature selection algorithm with an adaptation strategy that considers the classification error, the number of features, and feature redundancy. Momenzadeh et al. [24] used a Hidden Markov Model (HMM) to combine multiple feature selection criteria from five feature-ranking methods.
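A minimal sketch of the greedy maximum relevance–minimum redundancy (mRMR) criterion mentioned above [21] follows. It assumes that expression values have already been discretized and uses mutual information as both the relevance measure (gene vs. class) and the redundancy measure (gene vs. already selected genes), in the "difference" form: relevance minus mean redundancy. The function names and toy data are hypothetical, and this is not necessarily the exact variant used in [21].

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * log(c * n / (count_a[x] * count_b[y]))
    return mi


def mrmr_select(genes, labels, k):
    """Greedy mRMR (difference form): return the indices of k selected genes.

    genes: list of columns, one per gene, holding discretized expression values
    for every sample; labels: the class label of every sample.
    """
    relevance = [mutual_information(g, labels) for g in genes]
    selected = [max(range(len(genes)), key=lambda j: relevance[j])]
    while len(selected) < k:
        best_j, best_score = None, float("-inf")
        for j in range(len(genes)):
            if j in selected:
                continue
            redundancy = sum(mutual_information(genes[j], genes[s])
                             for s in selected) / len(selected)
            score = relevance[j] - redundancy  # max relevance, min redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy data: 4 classes encoded by two independent binary genes; gene 1 copies gene 0.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
gene_a = [0, 0, 0, 0, 1, 1, 1, 1]
gene_b = [0, 0, 1, 1, 0, 0, 1, 1]
print(mrmr_select([gene_a, list(gene_a), gene_b], labels, k=2))  # -> [0, 2]
```

All three toy genes are equally relevant, but after gene 0 is chosen its copy (gene 1) is fully redundant, so mRMR prefers the independent gene 2.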

PSO is one of the important feature selection techniques for gene selection; it and its modifications used in the research are described below.

2.3 Particle Swarm Optimization

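PSO is a population-based technique in which a population of N particles moves through a D-dimensional search space [25]. During the search, each particle adjusts its location using two factors: the best position found by the particle itself, $p_{best_i}$, and the best position found by the whole population, $g_{best}$. Assuming the fitness function $f(\cdot)$ is to be minimized, these are given at iteration t as follows.

$$p_{best_i}^{t} = x_i^{*} \;\big|\; f(x_i^{*}) = \min_{k = 1,\dots,t} f(x_i^{k}) \tag{1}$$

$$g_{best}^{t} = x_{*}^{t} \;\big|\; f(x_{*}^{t}) = \min_{i = 1,\dots,N;\; k = 1,\dots,t} f(x_i^{k}) \tag{2}$$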

where $f(\cdot)$ is the fitness function, defined by the problem to be solved. The position $x_i$ and velocity $v_i$ of each particle in the next iteration are obtained from equations (3) and (4):

$$v_i^{t+1} = W\,v_i^{t} + c_1 r_1 \left(p_{best_i}^{t} - x_i^{t}\right) + c_2 r_2 \left(g_{best}^{t} - x_i^{t}\right) \tag{3}$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{4}$$

where $c_1$ and $c_2$ are acceleration parameters, $r_1$ and $r_2$ are random numbers between 0 and 1, and $W$ is the inertia weight, which changes with each iteration. Algorithm 1 gives the general pseudocode for PSO.
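As a hypothetical one-dimensional illustration of equations (3) and (4), take $W = 0.7$, $c_1 = c_2 = 1.5$, random draws $r_1 = 0.2$ and $r_2 = 0.6$, and a particle with $x_i^t = 1.0$, $v_i^t = 0.5$, $p_{best_i}^t = 2.0$, and $g_{best}^t = 3.0$ (all values chosen only for illustration):

```latex
\begin{aligned}
v_i^{t+1} &= 0.7(0.5) + 1.5(0.2)(2.0 - 1.0) + 1.5(0.6)(3.0 - 1.0) \\
          &= 0.35 + 0.30 + 1.80 = 2.45, \\
x_i^{t+1} &= 1.0 + 2.45 = 3.45.
\end{aligned}
```

With these draws the social term (pull toward $g_{best}$) dominates the cognitive term (pull toward $p_{best_i}$), and the particle overshoots $g_{best}$; later updates can pull it back.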

Algorithm 1: General pseudocode for PSO

for each particle do
    initialize position
    initialize velocity
end for
repeat
    for each particle do
        evaluate the fitness function for each particle
        evaluate personal best (pbest) and global best (gbest)
        for each dimension do
            calculate new velocity
        end for
        calculate new position
    end for
until some convergence criterion is satisfied
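One minimal way to turn Algorithm 1 into running code is sketched below, assuming a continuous search space and a toy `sphere` fitness that stands in for a classifier's error rate. The linearly decreasing inertia weight is a common schedule chosen here as an assumption (the text says only that W changes with each iteration); the parameter values and the velocity clamp are illustrative choices, not taken from [25]. For gene selection, a binary or thresholded position encoding would typically be needed, which is not shown.

```python
import random


def sphere(position):
    """Toy fitness to minimize (sum of squares, optimum at the origin)."""
    return sum(x * x for x in position)


def pso(fitness, dim, n_particles=30, iters=200, c1=1.5, c2=1.5,
        w_start=0.9, w_end=0.4, bounds=(-5.0, 5.0), seed=0):
    """Minimal continuous PSO that minimizes `fitness` (Algorithm 1, eqs. (3)-(4))."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = hi - lo  # velocity clamp: an added safeguard, not part of Algorithm 1
    # Initialize position and velocity of every particle.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(iters):  # a fixed iteration budget is the convergence criterion
        # Inertia weight W decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * t / max(iters - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))  # eq. (3)
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] += vel[i][d]  # eq. (4)
            val = fitness(pos[i])
            if val < pbest_val[i]:  # personal best, eq. (1)
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:  # global best, eq. (2)
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best, best_val = pso(sphere, dim=2)
print(best, best_val)
```

This version updates gbest as soon as any particle improves on it (asynchronous updating), a common variant; updating it once per iteration also matches Algorithm 1.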

The algorithm combining PSO with the discriminant independent component analysis (dICA) method [26] is shown below:

Algorithm 2: PSO-dICA algorithm

Step 1. Center the data: $x \leftarrow x - E\{x\}$.
Step 2. Whiten the centered data to obtain orthonormal features.
Step 3. Initialize the particles (position and velocity of particles) randomly.
Step 4. Evaluate the fitness of each particle.
Step 5. Calculate the local best position.
Step 6. Calculate the global best position.
Step 7. Update the velocity and position.
Step 8. If the maximum number of iterations has not been reached, go to Step 5; otherwise … is obtained.
Step 9. Obtain the features.
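Steps 1 and 2 of Algorithm 2 (centering and whitening) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the implementation of [26]: it uses Cholesky whitening ($z = L^{-1}x$ with $\Sigma = LL^\top$), which yields an identity covariance just like the eigendecomposition-based (PCA) whitening more common in ICA pipelines. The dICA projection and the PSO search of Steps 3 to 9 are not shown, and the function names and toy samples are hypothetical.

```python
import math


def center(data):
    """Step 1: subtract each feature's mean; data is a list of samples (rows)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(data):
    """Covariance (divisor n) of data that is already centered."""
    n, d = len(data), len(data[0])
    return [[sum(r[a] * r[b] for r in data) / n for b in range(d)] for a in range(d)]


def cholesky(m):
    """Lower-triangular L with m = L L^T; m must be symmetric positive definite."""
    d = len(m)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def whiten(centered):
    """Step 2: map each centered sample x to z = L^{-1} x, so that cov(z) = I."""
    L = cholesky(covariance(centered))
    d = len(L)
    out = []
    for x in centered:
        z = [0.0] * d
        for i in range(d):  # forward substitution solves L z = x
            z[i] = (x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        out.append(z)
    return out


samples = [[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0]]  # 4 samples, 2 genes
z = whiten(center(samples))
print(covariance(z))  # approximately the 2x2 identity matrix
```

Real microarray data has far more genes than samples, so the covariance matrix is singular and a dimensionality reduction (for example keeping only the leading principal components) would be needed before whitening.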

2.4 Classifiers

After the features most relevant to the disease have been extracted, the classifiers have to be trained. Many classifiers are available for training and testing on disease-related genes; a trained classifier labels tissues as normal or affected, and its output is evaluated to measure the classification accuracy and other performance measures. The gene-based medical data classification method of [15] used three classifiers: a multilayer perceptron (MLP), a radial basis function (RBF) neural network, and a support vector machine (SVM). These three classifiers are widely used for solving many different problems.

When the network has only one hidden layer, training is faster for an RBF network than for an MLP, whereas an MLP is faster when it uses several layers. The two networks also differ in their learning process, which is the most difficult part of building an ANN. Other methods apply clustering to classification; for example, Chen et al. [27] used a kernel-based clustering method for gene selection.

Ensemble learning was used in [28], while [29] and [30] used two classifiers, KNN and SVM. SVM, KNN, and NB classifiers were used in [31], [32], and [33]. Several classifiers were used in [34], and [35] used Naive Bayes (NB) and SVM. Maniruzzaman et al. [36] used multiple classifiers, including SVM, ANN, and NB, for gene classification and reported higher efficiency in gene analysis. Klein et al. [37] applied different classification methods to microarray classification and showed that deep learning achieved higher performance in gene classification.

Microarray data are high-dimensional, while the sample size is small. Ten-fold cross-validation is therefore commonly used to generate independent training and test sets, so that an objective estimate of classification accuracy is obtained [38]. The data are split into ten folds; each fold in turn serves as the test set, and the remaining nine folds form the training set. The training set alone is used to rank all the genes and select the m top-ranked features. A classifier is then constructed on the training set restricted to the selected features, and the test set, restricted to the same features, is input to this classifier.
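The protocol above can be sketched as follows. The key point is that the gene ranking is recomputed inside every fold from the training folds only; ranking on the full dataset would leak test information and inflate the accuracy estimate. The mean-difference ranking filter, the nearest-centroid classifier, the binary 0/1 labels, and the synthetic data are all hypothetical stand-ins for whatever filter and classifier a given study uses.

```python
import random


def rank_genes(X, y, m):
    """Hypothetical filter: rank genes by |class-1 mean - class-0 mean|, keep top m."""
    scores = []
    for j in range(len(X[0])):
        ones = [row[j] for row, lab in zip(X, y) if lab == 1]
        zeros = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]


def fit_centroids(X, y):
    """Nearest-centroid classifier: the mean expression profile of each class."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def cross_validate(X, y, m, k=10, seed=0):
    """k-fold CV in which genes are ranked on the training folds only."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        y_train = [y[i] for i in train]
        genes = rank_genes([X[i] for i in train], y_train, m)  # no test data used
        model = fit_centroids([[X[i][j] for j in genes] for i in train], y_train)
        for i in test:
            correct += predict(model, [X[i][j] for j in genes]) == y[i]
    return correct / len(X)


# Synthetic data: 20 samples x 50 genes; only genes 0 and 1 differ between classes.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(3.0 * lab if j < 2 else 0.0, 1.0) for j in range(50)] for lab in y]
print(cross_validate(X, y, m=2))
```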

A deep learning classifier was used in [39] to build a deep neural network (DNN) with a Softmax activation function, Xavier weight initialization, and a bias initialization of 1.0, with the normal distribution as the distribution function. Deep learning learns a mapping y = f(x) from input to output that captures the relationships between the attributes x and the labels y in the dataset, and a deep network is regarded as a universal approximator. Neural network design is inspired by how the human brain performs pattern recognition. DNNs and convolutional neural networks can have different depths; both contain several hidden layers in addition to the input and output layers. Panda [39] combined the Elephant Search Optimization technique with a DNN for gene selection and evaluated it on different datasets. Liao et al. [40] proposed a multi-task deep learning technique to solve the data sparsity problem and provide better representations for rare cancers. Deep neural networks are also known as stacked neural networks. The pseudocode of the DNN [39] is given as follows.

Algorithm 3: Deep Neural Network

Step 1: Input: … denotes the data matrix of samples, … their corresponding output labels, and … the maximum number of selected attributes.
Step 2: Initialize: … and …
Step 3: While … do
Step 4: Assign …
Step 5: Update the weights of the hidden layers as well as the input weights.
Step 6: Apply dropout multiple times and then obtain the average.
Step 7: Calculate …
Step 8: Update the learning rates using AdaDelta.
Step 9: Initialize with Xavier initialization.
Step 10: Perform … and …
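Three ingredients named in the description of [39] and in Algorithm 3 (Xavier initialization, a softmax output, and AdaDelta updates) can be sketched in plain Python as below. This is not the network of [39]: the layer sizes, the input vector, the AdaDelta hyperparameters (rho = 0.95, eps = 1e-6), and the quadratic objective used to exercise the optimizer are illustrative assumptions. The AdaDelta step follows Zeiler's (2012) update rule, which adapts a per-parameter step size from running averages of squared gradients and squared updates instead of using a hand-tuned global learning rate.

```python
import math
import random


def xavier_normal(fan_in, fan_out, rng):
    """Xavier/Glorot normal init: weights drawn from N(0, 2 / (fan_in + fan_out))."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def softmax(logits):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adadelta_step(params, grads, state, rho=0.95, eps=1e-6):
    """One AdaDelta update, applied in place to a flat list of parameters.

    state = (E[g^2], E[dx^2]): running averages kept per parameter."""
    eg2, edx2 = state
    for i, g in enumerate(grads):
        eg2[i] = rho * eg2[i] + (1 - rho) * g * g
        dx = -math.sqrt(edx2[i] + eps) / math.sqrt(eg2[i] + eps) * g
        edx2[i] = rho * edx2[i] + (1 - rho) * dx * dx
        params[i] += dx


# Usage: an output layer mapping 4 selected genes to 3 classes.
rng = random.Random(0)
W = xavier_normal(fan_in=4, fan_out=3, rng=rng)
b = [1.0] * 3  # bias initialized to 1.0, as described for [39]
x = [0.2, -1.3, 0.5, 2.0]
logits = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
print(softmax(logits))

# Usage: AdaDelta on the quadratic (p0 - 3)^2 + (p1 + 1)^2, starting from (0, 0).
p = [0.0, 0.0]
state = ([0.0, 0.0], [0.0, 0.0])
for _ in range(500):
    adadelta_step(p, [2 * (p[0] - 3.0), 2 * (p[1] + 1.0)], state)
print(p)
```

AdaDelta's first steps are small (about sqrt(eps) in size) and grow as the running average of squared updates builds up, so it moves toward the minimum gradually rather than in one jump.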

3. Performance comparison

The experimental results of the latest research on the classification of microarray medical data are compared in Table 1, which lists the method used in each study together with its advantages and limitations.

Table 1. Comparative analysis of various methods (for each study: classifier employed, optimization algorithm, advantages, limitations, and performance evaluation)

Jia Lv, et al. [41]
Classifier: SVM classifier
Optimization algorithm: A multi-objective heuristic algorithm (MOEDA)
Advantages: Achieves higher accuracy and also saves computation time.
Limitations: The algorithm was developed to attain higher output on the two objectives, but a higher convergence speed was not found in this research.
Performance evaluation: Accuracy: Leukemia = 1.00/0.97, Wang breast cancer = 0.77/0.69. CPU time (seconds): Leukemia = 364/460, Wang breast cancer = 1094/1401.

Elyasigomari, et al. [42]
Classifier: SVM
Optimization algorithm: Minimum redundancy and maximum relevance
Advantages: The extracted features were investigated, and it was concluded that they are more relevant to each type of cancer.
Limitations: The accuracy with 8 genes was 97%; increasing the number of genes decreases the classification accuracy to 90-92%.
Performance evaluation: Genetic algorithm accuracy: Leukemia = 100%, Prostate = 98.04%, Lymphoma = 100%.

Dechun Yan and Jiajun Wang [43]
Classifier: Biclustering operation applied only to relevant genes
Optimization algorithm: K-means algorithm
Advantages: Helps to estimate missing data and performs well in biclustering.
Limitations: The proposed technique gives higher output than the other existing methods but does not perform effectively.
Performance evaluation: Average mean-square residue = 181.8, average volume = 1956.2.

Rasmita Dash [44]
Classifier: Artificial neural network, k-nearest neighbor, naive Bayesian classifier, support vector machine
Optimization algorithm: Hybridized harmony search and Pareto optimization
Advantages: Hybridization gives high performance in classification and in the prediction of the feature subset.
Limitations: The selected features contain both significant and some insignificant genes, so bi-objective parameter optimization is required.
Performance evaluation: Accuracy: Leukemia database = 0.96, Colon database = 0.74.

Shun Guo, et al. [45]
Classifier: SVM
Optimization algorithm: L1-regularized optimization problem with a newly defined linear discriminant analysis parameter
Advantages: More effective than the state-of-the-art methods.
Limitations: Of the ten databases used for evaluation, the proposed method outperformed the state-of-the-art methods on only two.
Performance evaluation: Average classification accuracy: Breast = 60.39 (7.34), Colon = 84.69 (8.56).

Arunkumar Chinnaswamy and Ramakrishnan Srinivasan [46]
Classifier: Random forest classifier
Optimization algorithm: Fuzzy rough quick reduct algorithm
Advantages: High performance on various measures, including classification accuracy, precision, recall, and F-measure.
Limitations: The computation time of the proposed method was not measured.
Performance evaluation: Classification accuracy: Leukemia gene = 97.22, Ovarian cancer = 97.23.

Indrajit Saha, et al. [47]
Classifier: SVM
Optimization algorithm: Fuzzy C-means, variable-length genetic algorithm-based fuzzy clustering, and cluster validity indices
Advantages: Experimental results show that the method achieves higher accuracy independent of the clustering method chosen.
Limitations: The technique needs to be tested with different parameter settings to analyze its behavior.
Performance evaluation: P-values: Fungal-type cell wall = 8.49E-07, Preribosome = 9.41E-27.

Mohapatra, et al. [14]
Classifier: Ridge regression, online sequential ridge regression, kernel ridge regression, support vector machine, and random forest
Optimization algorithm: Modified cat swarm optimization
Advantages: The proposed feature selection, in two variations, gives the best performance compared with existing methods.
Limitations: Training accuracy is higher than testing accuracy; evaluation with less training data and more testing data is still required.
Performance evaluation: Testing accuracy for colon tumor: ridge regression = 90.00, random forest = 79.16, support vector machine with radial basis function kernel = 0.70.

Hanaa Salem, et al. [49]
Classifier: Genetic Programming classifier
Optimization algorithm: Combination of Information Gain (IG) and Standard Genetic Algorithm (SGA)
Advantages: Clearly shows that the genetic algorithm improves classification performance compared with other existing methods.
Limitations: The computation time is very high compared with other methods.
Performance evaluation: Classification accuracy: Leukemia dataset = 97.06%, Colon Tumor = 85.48%.

Sina Tabakhi, et al. [50]
Classifier: NB and SVM
Optimization algorithm: Microarray gene selection based on ant colony optimization
Advantages: Simulation results show that the method selects a subset of genes with minimum redundancy and maximum relevance.
Limitations: Another metric function needs to be developed to enhance the performance of the method.
Performance evaluation: Error rate: NB classifier = 35.45%, SVM classifier = 23.63%.

Table 1 compares the various methods developed to classify diseases based on genes. The gene extraction method plays a vital role in classifying the disease. Several parameters are compared in the table: classification accuracy is the major parameter used by the researchers, and some other parameters are also compared.

4. Conclusion

Over the last two decades, DNA microarray technology has provided opportunities to analyze the genes related to disease. Several gene expression databases are available, each containing a huge number of genes, and feature extraction helps to identify the genes relevant to the disease. Several recent papers were considered in this study to investigate the performance of their respective methods in gene classification. Classifiers such as SVM, KNN, and random forest were used in several methods for evaluation. A Genetic Programming classifier with features selected by a combination of Information Gain (IG) and a Standard Genetic Algorithm (SGA) [49] achieved a classification accuracy of up to 97.06% on the Leukemia dataset, but it has the limitation of very high computation time. New techniques are therefore required to reduce the computation time.

References

[1] Chen, L., Sun, F., Yang, X., Jin, Y., Shi, M., Wang, L., Shi, Y., Zhan, C. and Wang, Q., 2017. Correlation between RNA-Seq and microarrays results using TCGA data. Gene, 628, pp.200-204.

[2] Mohamed, N.S., Zainudin, S. and Othman, Z.A., 2017. Metaheuristic approach for an enhanced mRMR filter method for classification using drug response microarray data. Expert Systems with Applications, 90, pp.224-231.

[3] Apolloni, J., Leguizamón, G. and Alba, E., 2016. Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments. Applied Soft Computing, 38, pp.922-932.

[4] Singer, A., Maya, I., Koifman, A., Samra, N.N., Baris, H.N., Falik-Zaccai, T., Shachar, S.B. and Sagi-Dain, L., 2018. Microarray analysis in pregnancies with isolated echogenic bowel. Early human development, 119, pp.25-28.

[5] Duan, H.L., Zhu, X.Y., Zhu, Y.J., Wu, X., Zhao, G.F., Wang, W.J. and Li, J., 2019. The application of chromosomal microarray analysis to the prenatal diagnosis of isolated mild ventriculomegaly. Taiwanese Journal of Obstetrics and Gynecology, 58(2), pp.251-254.

[6] Wang, R., Lei, T., Fu, F., Li, R., Jing, X., Yang, X., Liu, J., Li, D. and Liao, C., 2019. Application of chromosome microarray analysis in patients with unexplained developmental delay/intellectual disability in South China. Pediatrics & Neonatology, 60(1), pp.35-42.

[7] Lotfi, E. and Keshavarz, A., 2014. Gene expression microarray classification using PCA–BEL. Computers in biology and medicine, 54, pp.180-187.

[8] Sun, M., Liu, K., Wu, Q., Hong, Q., Wang, B. and Zhang, H., 2019. A novel ECOC algorithm for multiclass microarray data classification based on data complexity analysis. Pattern Recognition, 90, pp.346-362.

[9] Ghosh, M., Begum, S., Sarkar, R., Chakraborty, D. and Maulik, U., 2019. Recursive memetic algorithm for gene selection in microarray data. Expert Systems with Applications, 116, pp.172-185.

[10] Dashtban, M., Balafar, M. and Suravajhala, P., 2018. Gene selection for tumor classification using a novel bio-inspired multi-objective approach. Genomics, 110(1), pp.10-17.

[11] Dadaneh, B.Z., Markid, H.Y. and Zakerolhosseini, A., 2016. Unsupervised probabilistic feature selection using ant colony optimization. Expert Systems with Applications, 53, pp.27-42.

[12] Alshamlan, H.M., Badr, G.H. and Alohali, Y.A., 2015. Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification. Computational biology and chemistry, 56, pp.49-60.

[13] Elyasigomari, V., Lee, D.A., Screen, H.R. and Shaheed, M.H., 2017. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification. Journal of biomedical informatics, 67, pp.11-20.

[14] Mohapatra, P., Chakravarty, S. and Dash, P.K., 2016. Microarray medical data classification using kernel ridge regression and modified cat swarm optimization based gene selection system. Swarm and Evolutionary Computation, 28, pp.144-160.

[15] Garro, B.A., Rodríguez, K. and Vázquez, R.A., 2016. Classification of DNA microarrays using artificial neural networks and ABC algorithm. Applied Soft Computing, 38, pp.548-560.

[16] Ayadi, W., Elloumi, M. and Hao, J.K., 2012. BiMine+: an efficient algorithm for discovering relevant biclusters of DNA microarray data. Knowledge-Based Systems, 35, pp.224-234.

[17] Ayadi, W. and Hao, J.K., 2014. A memetic algorithm for discovering negative correlation biclusters of DNA microarray data. Neurocomputing, 145, pp.14-22.

[18] Kar, S., Sharma, K.D. and Maitra, M., 2015. Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique. Expert Systems with Applications, 42(1), pp.612-627.

[19] Buza, K., 2016. Classification of gene expression data: a hubness-aware semi-supervised approach. Computer methods and programs in biomedicine, 127, pp.105-113.

[20] Aziz, R., Verma, C.K. and Srivastava, N., 2016. A fuzzy based feature selection from independent component subspace for machine learning classification of microarray data. Genomics data, 8, pp.4-15.

[21] Arevalillo, J.M. and Navarro, H., 2013. Exploring correlations in gene expression microarray data for maximum predictive–minimum redundancy biomarker selection and classification. Computers in biology and medicine, 43(10), pp.1437-1443.

[23] Cao, B., Zhao, J., Yang, P., Yang, P., Liu, X., Qi, J., Simpson, A., Elhoseny, M., Mehmood, I. and Muhammad, K., 2019. Multiobjective feature selection for microarray data via distributed parallel algorithms. Future Generation Computer Systems.

[24] Momenzadeh, M., Sehhati, M. and Rabbani, H., 2019. A Novel Feature Selection Method for Microarray Data Classification Based on Hidden Markov Model. Journal of biomedical informatics, p.103213.

[25] Pashaei, E. and Aydin, N., 2017. Binary black hole algorithm for feature selection and classification on biological data. Applied Soft Computing, 56, pp.94-106.

[26] Mollaee, M. and Moattar, M.H., 2016. A novel feature extraction approach based on ensemble feature selection and modified discriminant independent component analysis for microarray data classification. Biocybernetics and Biomedical Engineering, 36(3), pp.521-529.

[27] Chen, H., Zhang, Y. and Gutman, I., 2016. A kernel-based clustering method for gene selection with gene expression data. Journal of biomedical informatics, 62, pp.12-20.

[28] Liu, Z., Tang, D., Cai, Y., Wang, R. and Chen, F., 2017. A hybrid method based on ensemble WELM for handling multi class imbalance in cancer microarray data. Neurocomputing, 266, pp.641-650.

[29] Liu, K.H., Zeng, Z.H. and Ng, V.T.Y., 2016. A Hierarchical Ensemble of ECOC for cancer classification based on multi-class microarray data. Information Sciences, 349, pp.102-118.

[30] Brahim, A.B. and Limam, M., 2016. A hybrid feature selection method based on instance learning and cooperative subset search. Pattern Recognition Letters, 69, pp.28-34.

[31] Sharbaf, F.V., Mosafer, S. and Moattar, M.H., 2016. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization. Genomics, 107(6), pp.231-238.

[32] Bolón-Canedo, V., Sánchez-Maroño, N. and Alonso-Betanzos, A., 2015. Distributed feature selection: An application to microarray data classification. Applied soft computing, 30, pp.136-150.

[33] Dashtban, M. and Balafar, M., 2017. Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics, 109(2), pp.91-107.

[34] Jain, I., Jain, V.K. and Jain, R., 2018. Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification. Applied Soft Computing, 62, pp.203-215.

[35] Al-Rajab, M., Lu, J. and Xu, Q., 2017. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. Computer methods and programs in biomedicine, 146, pp.11-24.

[36] Maniruzzaman, M., Rahman, M.J., Ahammed, B., Abedin, M.M., Suri, H.S., Biswas, M., El-Baz, A., Bangeas, P., Tsoulfas, G. and Suri, J.S., 2019. Statistical Characterization and Classification of Colon Microarray Gene Expression Data using Multiple Machine Learning Paradigms. Computer Methods and Programs in Biomedicine.

[37] Klein, O., Kanter, F., Kulbe, H., Jank, P., Denkert, C., Nebrich, G., Schmitt, W.D., Wu, Z., Kunze, C.A., Sehouli, J. and Darb‐Esfahani, S., 2019. MALDI‐Imaging for Classification of Epithelial Ovarian Cancer Histotypes from a Tissue Microarray Using Machine Learning Methods. PROTEOMICS–Clinical Applications, 13(1), p.1700181.

[38] Wang, A., An, N., Chen, G., Li, L. and Alterovitz, G., 2015. Improving PLS–RFE based gene selection for microarray data classification. Computers in biology and medicine, 62, pp.14-24.

[39] Panda, M., 2017. Elephant search optimization combined with deep neural network for microarray data analysis. Journal of King Saud University-Computer and Information Sciences.

[40] Liao, Q., Ding, Y., Jiang, Z.L., Wang, X., Zhang, C. and Zhang, Q., 2019. Multi-task deep convolutional neural network for cancer diagnosis. Neurocomputing, 348, pp.66-73.

[41] Lv, J., Peng, Q., Chen, X. and Sun, Z., 2016. A multi-objective heuristic algorithm for gene expression microarray data classification. Expert Systems With Applications, 59, pp.13-19.

[42] Elyasigomari, V., Lee, D.A., Screen, H.R. and Shaheed, M.H., 2017. Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification. Journal of biomedical informatics, 67, pp.11-20.

[43] Yan, D. and Wang, J., 2013. Biclustering of gene expression data based on related genes and conditions extraction. Pattern Recognition, 46(4), pp.1170-1182.

[44] Dash, R., 2018. An Adaptive Harmony Search Approach for Gene Selection and Classification of High Dimensional Medical Data. Journal of King Saud University-Computer and Information Sciences.

[45] Guo, S., Guo, D., Chen, L. and Jiang, Q., 2016. A centroid-based gene selection method for microarray data classification. Journal of theoretical biology, 400, pp.32-41.

[46] Chinnaswamy, A. and Srinivasan, R., 2018. Attribute Selection using fuzzy roughset based customized similarity measure for lung cancer microarray gene expression data. Future Computing and Informatics Journal.

[47] Saha, I., Maulik, U., Bandyopadhyay, S. and Plewczynski, D., 2011. Improvement of new automatic differential fuzzy clustering using SVM classifier for microarray analysis. Expert Systems with Applications, 38(12), pp.15122-15133.

[49] Salem, H., Attiya, G. and El-Fishawy, N., 2017. Classification of human cancer diseases by gene expression profiles. Applied Soft Computing, 50, pp.124-134.

[50] Tabakhi, S., Najafi, A., Ranjbar, R. and Moradi, P., 2015. Gene selection for microarray data classification using a novel ant colony optimization. Neurocomputing, 168, pp.1024-1036.