International Journal of Emerging Technology and Advanced Engineering (www.ijetae.com, ISSN 2250-2459), Volume 4, Issue 1, January 2014


Feature Selection Techniques and Microarray Data: A Survey

Sonawane Shraddha 1, Nawathe Anuradha 2, Sonawane Swapnil 3

1 ME Student, 2 Assistant Professor, A.V.C.O.E. Sangamner.
3 B.Tech Student, V.I.T. Pune, India.

Abstract

Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques.

One of the objectives of designing feature selection learning algorithms is to obtain classifiers that rely on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously.

In this article, we make the interested reader aware of the possibilities of feature selection, provide a basic taxonomy of traditional feature selection techniques, and discuss conjunctions of decision stumps in the Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small set of attributes that can be used to perform reliable classification tasks.

This work presents a brief survey of feature selection techniques.

Keywords: Feature selection, microarray data, decision stump, gene identification.

I. INTRODUCTION

Because pattern recognition techniques were not designed to deal with large numbers of features, combining them with feature selection techniques has become a necessity in many applications [5]. Feature selection basically addresses the following objectives: (a) to minimize overfitting and increase performance, (b) to produce faster and more cost-effective models, and (c) to gain a deeper insight into the processes that generated the data. When dealing with classification, feature selection techniques can be categorized into three classes, depending on how they combine the feature selection search with the construction of the classification model: filter, wrapper and embedded methods; the latter two incorporate this search within the combined space of feature subsets and model selection.

II. TRADITIONAL FEATURE SELECTION TECHNIQUES

The filter model relies on the general characteristics of the data and evaluates features without involving any learning algorithm.

The wrapper model, in contrast, requires a predetermined learning algorithm and uses its performance as the evaluation criterion to select features. Algorithms with the embedded model, e.g., C4.5 [25] and LARS [14], incorporate variable selection as part of the training process, and feature relevance is obtained analytically from the objective of the learning model. Feature selection algorithms with the filter and embedded models may return either a subset of selected features or the weights of all features; according to the type of output, they can be divided into feature weighting and subset selection algorithms.

Algorithms with the wrapper model typically return a feature subset. To our knowledge, most current feature selection algorithms are designed to handle learning tasks with a single data source, although the capability of using auxiliary data sources in multi-source feature selection might greatly enhance learning performance [21, 29]. Below, we revisit the key ideas of relevance and redundancy for feature selection, as well as the necessary components of a feature selection process. Table I summarizes representative filter methods. Filter techniques assess the relevance of features by looking only at the intrinsic properties of the data. In most cases a feature relevance score is calculated, and low-scoring features are removed. Afterwards, this subset of features is presented as input to the classification algorithm. Advantages of filter techniques are that they easily scale to very high-dimensional datasets, and that they are computationally simple, fast, and independent of the classification algorithm. As a result, feature selection needs to be performed only once, after which different classifiers can be evaluated. A common drawback of filter methods is that they ignore the interaction with the classifier and that most proposed techniques are univariate.
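As an illustration of this filter pipeline, the following sketch ranks genes with a simple univariate score (an absolute Welch t-statistic between two classes), keeps the top-scoring ones, and hands the reduced matrix to whatever classifier is used afterwards. It is a minimal sketch, not code from any of the surveyed works; the expression matrix, labels, and the cut-off of 50 genes are synthetic placeholders.

import numpy as np

def t_scores(X, y):
    """Absolute two-sample (Welch) t-statistic per gene.

    X: (n_samples, n_genes) expression matrix; y: boolean class labels.
    A higher score means the gene separates the two classes better.
    """
    X1, X0 = X[y], X[~y]
    mean_diff = X1.mean(axis=0) - X0.mean(axis=0)
    std_err = np.sqrt(X1.var(axis=0, ddof=1) / len(X1) +
                      X0.var(axis=0, ddof=1) / len(X0) + 1e-12)
    return np.abs(mean_diff) / std_err

# Synthetic placeholder data: 40 samples, 2000 genes, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2000))
y = rng.integers(0, 2, size=40).astype(bool)

# Score every gene once, keep the 50 highest-scoring ones, and pass the
# reduced matrix to any downstream classifier (the filter is classifier-independent).
scores = t_scores(X, y)
top_genes = np.argsort(scores)[::-1][:50]
X_reduced = X[:, top_genes]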


In order to overcome the drawback of ignoring feature dependencies, a number of multivariate filter techniques have been introduced, aiming at incorporating feature dependencies to some degree. Whereas filter techniques treat the problem of finding a good feature subset independently of the model selection step, wrapper methods embed the model hypothesis search within the feature subset search. In this setup, a search procedure in the space of possible feature subsets is defined, and various subsets of features are generated and evaluated. The evaluation of a specific subset of features is obtained by training and testing a specific classification model, rendering this approach tailored to a specific classification algorithm. To search the space of all feature subsets, a search algorithm is then 'wrapped' around the classification model. However, as the space of feature subsets grows exponentially with the number of features, heuristic search methods are used to guide the search for an optimal subset.

Recently, quite a few surveys have been published to serve this purpose. For example, two comprehensive surveys of feature selection in the machine learning and data mining domains can be found in [15, 19]. In [26], the authors provided a good review of applying feature selection techniques in bioinformatics. In [16], the authors surveyed the filter and the wrapper models for feature selection. In [23], the authors explored representative feature selection approaches based on sparse regularization, which is a branch of the embedded model. Representative feature selection algorithms are also empirically compared and evaluated in [17, 18, 20, 22, 24, 27, 28] under different problem settings and from different perspectives.

These search methods can be divided into two classes: deterministic and randomized search algorithms. Advantages of wrapper approaches include the interaction between feature subset search and model selection, and the ability to take feature dependencies into account. A common drawback of these techniques is that they have a higher risk of overfitting than filter techniques and are very computationally intensive, particularly if building the classifier has a high computational cost. In a third class of feature selection techniques, termed embedded techniques, the search for an optimal subset of features is built into the classifier construction, and can be seen as a search in the combined space of feature subsets and hypotheses. Just like wrapper approaches, embedded approaches are therefore specific to a given learning algorithm.

Embedded methods have the advantage that they include the interaction with the classification model, while at the same time being far less computationally intensive than wrapper methods.

TABLE I: FILTERS

Filter             Supervised  Unsupervised  Univariate  Multivariate  Feature weighting  Feature subset
Laplacian Score    No          Yes           Yes         No            Yes                No
SPEC               No          Yes           Yes         No            Yes                No
Fisher Score       Yes         No            Yes         No            Yes                No
ReliefF            Yes         No            Yes         No            Yes                No
t-score            Yes         No            Yes         No            Yes                No
F-score            Yes         No            Yes         No            Yes                No
FCBF               Yes         No            No          Yes           Yes                No
MBF [4]            Yes         No            No          Yes           Yes                No
Chi-square Score   Yes         No            Yes         No            Yes                No
Kruskal-Wallis     Yes         No            Yes         No            Yes                No
Gini Index         Yes         No            Yes         No            Yes                No
Information Gain   Yes         No            Yes         No            Yes                No
FCBF [5]           Yes         No            No          Yes           No                 Yes
CFS [3]            Yes         No            No          Yes           No                 Yes
mRMR               Yes         No            No          Yes           No                 Yes
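To make one entry of Table I concrete, the sketch below computes the Fisher score, a supervised, univariate, feature-weighting filter. This is a generic textbook formulation, not code from any of the surveyed works, and the data in the example are synthetic placeholders.

import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter.

    F_j = sum_k n_k * (mu_kj - mu_j)^2 / sum_k n_k * var_kj,
    where k ranges over classes and mu_j is the overall mean of feature j.
    """
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)            # higher score = more discriminative feature

# Synthetic placeholder data: 30 samples, 500 genes, 3 tissue classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 500))
y = rng.integers(0, 3, size=30)
ranking = np.argsort(fisher_scores(X, y))[::-1]   # genes sorted by relevance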


III. MICROARRAY DATA AND FEATURE SELECTION

A microarray is typically a glass slide onto which DNA molecules are fixed in an orderly manner at specific locations referred to as spots (or features). A microarray may contain thousands of spots, and each spot may contain a few million copies of identical DNA molecules that uniquely correspond to a gene. The DNA in a spot may either be genomic DNA or a short stretch of oligonucleotide strands that correspond to a gene. The spots are printed onto the glass slide by a robot or are synthesized by the process of photolithography.

Feature subset selection can be seen as a search through the space of feature subsets. Four questions need to be answered in terms of the search process [2, 31]:

• Where to start the search in the feature space?

The starting point decides the direction of the search. The search can begin with an empty set and successively add useful features to the current set; this is known as forward selection. An alternative is to start with a full set and successively remove useless features; this is known as backward elimination. Beginning the search from somewhere in the middle of the feature space is also possible, in which case the search can proceed by either adding useful features or removing useless ones.

• How to evaluate subsets of features?

There exist two general strategies, namely filters and wrappers. Most filter approaches evaluate features by giving them a score according to general characteristics of the training set. By setting a threshold, they then remove irrelevant features: if the score of a gene is above the threshold, the gene is selected. There are also some filter approaches, such as CFS, that assign a score to subsets of features. Wrapper approaches, in contrast, take the biases of machine learning algorithms into consideration when selecting features. They apply a machine learning algorithm to feature subsets and use cross-validation to compute a score for them.

• How to search?

An exhaustive search of the whole feature subspace is impractical even with the current standard of computational power. A typical microarray cancer data set contains several thousand genes as features. Heuristic search methods such as greedy hill climbing and best-first search are usually applied.

Greedy hill climbing considers only local changes to a feature set. It evaluates all possible local changes to the current feature set, such as adding one feature to the set or removing one. It chooses the best (or simply the first) change that improves the score of the feature set. Once a change is made to a feature set, it is never reconsidered. Best-first search is similar to greedy hill climbing, with the difference that it can return to a more promising previous subset if it finds that the current set is not worth exploring further.

• When to stop searching?

The addition or removal of features should be stopped when none of the alternatives improves the score of the current feature set. Another criterion would be to keep revising the feature set as long as the score does not degrade, or to continue generating feature subsets until reaching the other end of the feature space and then select the best one. A minimal sketch combining these search and stopping strategies is given below.
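The following sketch ties together the answers to the four questions above in a wrapper setting: it starts from the empty set (forward selection), evaluates candidate subsets by cross-validating a classifier, searches greedily (hill climbing), and stops when no single addition improves the score. It is a hypothetical illustration assuming scikit-learn is available; logistic regression merely stands in for whichever classifier the wrapper is built around, and on real microarray data this search would be far too slow without a preliminary filter.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, max_features=10, cv=5):
    """Greedy hill-climbing forward selection (wrapper-style sketch)."""
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Evaluate every local change: adding one of the remaining features.
        candidates = []
        for j in remaining:
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, selected + [j]], y, cv=cv).mean()
            candidates.append((score, j))
        score, j = max(candidates)
        if score <= best_score:        # stopping criterion: no alternative improves the score
            break
        best_score = score             # once a change is made, it is never reconsidered
        selected.append(j)
        remaining.remove(j)
    return selected, best_score

# Synthetic placeholder data: 60 samples, 100 candidate features.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 100))
y = (X[:, 3] + X[:, 7] > 0).astype(int)   # labels depend on two hidden features
print(greedy_forward_selection(X, y))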

IV. METHOD REVIEW: UNCONVENTIONAL APPROACH TOWARDS FEATURE SELECTION FOR MICROARRAY DATA

Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data [1]

In this work, the decision stump is the key building block used along with feature selection. A decision stump is a machine learning model consisting of a one-level decision tree: a decision tree with a single internal node (the root) that is immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just one input feature. Decision stumps are also commonly referred to as 1-rules.
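As a concrete picture of such a 1-rule, the sketch below fits a decision stump by exhaustively trying every (gene, threshold, direction) triple and keeping the one with the lowest training error. This is a generic illustration of the model class, not the learning procedure of [1], and the data are synthetic placeholders.

import numpy as np

def fit_stump(X, y):
    """Exhaustively fit a one-level decision tree (decision stump).

    Returns (feature, threshold, direction): direction +1 predicts class 1
    when the feature value is above the threshold, direction -1 when below.
    """
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            above = X[:, j] > thr
            for direction, pred in ((+1, above), (-1, ~above)):
                err = np.mean(pred != y)
                if err < best_err:
                    best_err, best = err, (j, thr, direction)
    return best

def stump_predict(X, stump):
    """Predict with a single stump: the decision depends on one feature only."""
    j, thr, direction = stump
    above = X[:, j] > thr
    return above if direction == +1 else ~above

# Synthetic placeholder data: the label is determined by gene 5 alone.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 20))
y = X[:, 5] > 0.2
stump = fit_stump(X, y)
print(stump, np.mean(stump_predict(X, stump) == y))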

A. An Occam's Razor Approach [8]

Algorithm Presentation:

• The first approach toward learning the conjunction (or disjunction) of decision stumps is the Occam's Razor approach. Basically, we would like to obtain a hypothesis that can be encoded using the smallest number of bits. An Occam's Razor risk bound is first proposed, which ultimately guides the learning algorithm.
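The following is a schematic, hypothetical sketch of how a sparse conjunction of decision stumps can be built greedily: each new stump is chosen to rule out as many remaining negative examples as possible while sacrificing few positives, and the classifier predicts positive only if every stump agrees. It conveys the flavour of learning small conjunctions, but it is not the bound-guided algorithm proposed in [1]; the penalty parameter and the data are placeholders.

import numpy as np

def stump_output(X, stump):
    """Boolean output of one stump given as (feature, threshold, direction)."""
    j, thr, direction = stump
    above = X[:, j] > thr
    return above if direction == +1 else ~above

def greedy_conjunction(X, y, max_stumps=3, penalty=2.0):
    """Greedily build a conjunction of stumps (positive iff all stumps say positive).

    y is boolean. Each stump is scored by how many still-uncovered negatives it
    rules out, minus a penalty for positives it would misclassify; a small
    max_stumps keeps the classifier sparse (few genes used).
    """
    stumps, uncovered_neg = [], ~y
    for _ in range(max_stumps):
        best, best_util = None, -np.inf
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for direction in (+1, -1):
                    out = stump_output(X, (j, thr, direction))
                    utility = np.sum(uncovered_neg & ~out) - penalty * np.sum(y & ~out)
                    if utility > best_util:
                        best_util, best = utility, (j, thr, direction)
        stumps.append(best)
        uncovered_neg &= stump_output(X, best)   # negatives the conjunction still calls positive
        if not uncovered_neg.any():
            break
    return stumps

def predict_conjunction(X, stumps):
    pred = np.ones(X.shape[0], dtype=bool)
    for s in stumps:
        pred &= stump_output(X, s)
    return pred

# Synthetic placeholder data: positives need two genes to be high simultaneously.
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 15))
y = (X[:, 2] > 0) & (X[:, 9] > 0)
stumps = greedy_conjunction(X, y)
print(stumps, np.mean(predict_conjunction(X, stumps) == y))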

New Contribution(s):

• The Occam's Razor approach is able to find sparse classifiers (with very few genes).


Drawback:

• The Occam's Razor approach is not able to obtain good classification accuracy.

B. Sample Compression Approach [10]

Algorithm Presentation:

• The basic idea of the sample compression framework is to obtain learning algorithms with the property that the generated classifier can typically be reconstructed from a very small subset of the training examples.

New Contribution(s):

• It aims at obtaining sparse classifiers with a minimum number of stumps. This sparsity is enforced by choosing the classifiers with the shortest encoding of the message strings.

• The sample compression approach finds a predictor that depends directly on the margin.

Drawback:

• The sample compression approach is not able to obtain good classification accuracy.

C. PAC-Bayes Approach [7, 9, 11, 12, 13]

Algorithm Presentation:

This approach examines whether sacrificing some of this sparsity in exchange for a larger separating margin around the decision boundary can lead to classifiers with smaller generalization error. The learning algorithm relies on the PAC-Bayes approach, which aims at providing Probably Approximately Correct guarantees to "Bayesian" learning algorithms expressed in terms of a prior distribution P (defined before the observation of the data) and a data-dependent posterior distribution Q over a space of classifiers.
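For reference, one standard form of such a PAC-Bayes bound (see, e.g., [9], [11], [13]; the exact bound used in [1] may differ in its constants) states that, for any prior P fixed before seeing the data and any confidence parameter delta in (0, 1], with probability at least 1 - delta over the random draw of a sample S of m examples, simultaneously for every posterior Q:

\mathrm{kl}\big(\hat{R}_S(G_Q) \,\|\, R(G_Q)\big) \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{m}

where R(G_Q) and \hat{R}_S(G_Q) are the true and empirical risks of the Gibbs classifier G_Q (which classifies by drawing a classifier at random from Q), \mathrm{KL}(Q \| P) is the Kullback-Leibler divergence between the posterior and the prior, and \mathrm{kl}(q \| p) = q \ln(q/p) + (1-q)\ln\frac{1-q}{1-p} is the binary relative entropy.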

New Contribution(s):

• It performs a meaningful margin-sparsity trade-off.

• It is competitive with the best-performing classifiers.

• It provides the added advantage of using very few genes.

Drawback:

The risk bound is a factor that should be taken into consideration.

V. CONCLUSION

In this paper we reviewed the existing techniques for feature selection. We discussed a variety of feature selection techniques and provided an explanation of each. From this analysis, a number of shortcomings and limitations of these techniques were highlighted.

Further work includes the development of an efficient algorithm that exploits the utility of the risk bound while also providing good classification accuracy.

REFERENCES

[1] M. Shah, M. Marchand, and J. Corbeil, "Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 1, January 2012.
[2] Y. Saeys, I. Inza, and P. Larranaga, "A review of feature selection techniques in bioinformatics," Bioinformatics, vol. 23, no. 19, pp. 2507-2517, 2007.
[3] M. Hall, "Correlation-based feature selection for machine learning," PhD thesis, Department of Computer Science, Waikato University, New Zealand, 1999.
[4] D. Koller and M. Sahami, "Toward optimal feature selection," in Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, pp. 284-292, 1996.
[5] L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and redundancy," Journal of Machine Learning Research, vol. 5, pp. 1205-1224, 2004.
[6] M. Eisen and P. Brown, "DNA Arrays for Analysis of Gene Expression," Methods in Enzymology, vol. 303, pp. 179-205, 1999.
[7] M. Marchand and M. Shah, "PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data," in Proc. Advances in Neural Information Processing Systems, pp. 881-888, 2005.
[8] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, "Occam's Razor," Information Processing Letters, vol. 24, pp. 377-380, 1987.
[9] D. McAllester, "Some PAC-Bayesian Theorems," Machine Learning, vol. 37, pp. 355-363, 1999.
[10] D. Kuzmin and M. K. Warmuth, "Unlabeled Compression Schemes for Maximum Classes," Journal of Machine Learning Research, vol. 8, pp. 2047-2081, 2007.
[11] D. McAllester, "Some PAC-Bayesian Theorems," Machine Learning, vol. 37, pp. 355-363, 1999.
[13] M. Seeger, "PAC-Bayesian Generalization Bounds for Gaussian Processes," Journal of Machine Learning Research, vol. 3, pp. 233-269, 2002.
[14] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics, vol. 32, pp. 407-499, 2004.
[15] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157-1182, 2003.
[16] I. Inza, P. Larranaga, R. Blanco, and A. J. Cerrolaza, "Filter versus wrapper gene selection approaches in DNA microarray domains," Artificial Intelligence in Medicine, vol. 31, pp. 91-103, 2004.
[17] C. Lai, M. J. T. Reinders, L. J. van't Veer, and L. F. A. Wessels, "A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets," BMC Bioinformatics, vol. 7, p. 235, 2006.
[18] T. Li, C. Zhang, and M. Ogihara, "A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression," Bioinformatics, vol. 20, no. 15, pp. 2429-2437, 2004.
[19] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 491-502, 2005.
[20] H. Liu, J. Li, and L. Wong, "A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns," Genome Informatics, vol. 13, pp. 51-60, 2002.
[21] J. Lu, G. Getz, E. A. Miska, E. Alvarez-Saavedra, J. Lamb, D. Peck, A. Sweet-Cordero, B. L. Ebert, R. H. Mak, A. Ferrando, J. R. Downing, T. Jacks, H. R. Horvitz, and T. R. Golub, "MicroRNA expression profiles classify human cancers," Nature, vol. 435, pp. 834-838, 2005.
[22] S. Ma, "Empirical study of supervised gene screening," BMC Bioinformatics, vol. 7, p. 537, 2006.
[23] S. Ma and J. Huang, "Penalized feature selection and classification in bioinformatics," Briefings in Bioinformatics, vol. 9, no. 5, pp. 392-403, 2008.
[24] C. Murie, O. Woody, A. Lee, and R. Nadon, "Comparison of small n statistical tests of differential expression applied to microarrays," BMC Bioinformatics, vol. 10, no. 1, p. 45, 2009.
[25] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[26] Y. Saeys, I. Inza, and P. Larranaga, "A review of feature selection techniques in bioinformatics," Bioinformatics, vol. 23, no. 19, pp. 2507-2517, 2007.
[27] Y. Sun, C. F. Babbs, and E. J. Delp, "A comparison of feature selection methods for the detection of breast cancers in mammograms: adaptive sequential floating search vs. genetic algorithm," in Conf. Proc. IEEE Eng. Med. Biol. Soc., vol. 6, pp. 6532-6535, 2005.
[28] M. D. Swartz, R. K. Yu, and S. Shete, "Finding factors influencing risk: comparing Bayesian stochastic search and standard variable selection methods applied to logistic regression models of cases and controls," Statistics in Medicine, vol. 27, no. 29, pp. 6158-6174, 2008.
[29] Z. Zhao, J. Wang, H. Liu, J. Ye, and Y. Chang, "Identifying biologically relevant genes via multiple heterogeneous data sources," in Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008), 2008.
[30] Z. Zhao, F. Morstatter, S. Sharma, S. Alelyani, A. Anand, and H. Liu, "Advancing Feature Selection Research," ASU Feature Selection Repository.
