Predictive Analysis of Gene Expression Data from Human SAGE Libraries


Alexessander Alves¹, Nikolay Zagoruiko², Oleg Okun³, Olga Kutnenko², and Irina Borisova²

¹ Laboratory of Artificial Intelligence and Computer Science, University of Porto, Rua do Campo Alegre, 823, 4150 Porto, PORTUGAL

alves@ieee.org

² Sobolev Institute of Mathematics, Russian Academy of Sciences, Koptyug avenue 4, Novosibirsk, 630090, RUSSIA

zag@math.nsc.ru, olga@math.nsc.ru, biamia@mail.ru

³ Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, P.O. Box 4500, 90014 University of Oulu, FINLAND

oleg@ee.oulu.fi

Abstract. We study the impact of dimensionality reduction methodologies on the performance of classification methods. Typically, genes that are not differentially expressed and genes presenting small variation (especially those expressed at lower levels) are considered unimportant for discrimination between different classes, and they are removed from further analysis by filtering techniques. We are interested in studying their relevance for classification, which has been left unexplored. We compare results obtained using filtering techniques with those from feature selection approaches. Based on experiments, we demonstrate that applying typical filtering approaches negatively impacts the predictive accuracy of the induced classifiers.

1 Introduction

Predictive analysis of gene expression data consists of classifying samples of gene expression data into specific groups. The goal is to predict phenotypes using information from the gene expression profile. Applications include predicting patient response to a drug, disease prognosis, and diagnosis. A typical example is selecting a set of gene markers that collectively may predict with acceptable accuracy the response of a tumour to a particular chemotherapeutic agent. Another subset may identify tumour subclasses. This is particularly important when standard clinical indicators or histopathology cannot help.

Since a reliable and precise classification of tumours is essential for successful treatment of cancer, predictive analysis has a decisive role in clinical applications. In this context, Serial Analysis of Gene Expression (SAGE) is particularly important for cancer research. The SAGE method allows collecting the complete gene expression profile of a cell or tissue simultaneously, without requiring prior knowledge of the complete set of mRNAs to be profiled. This aspect is particularly important in cancer gene profiling because the sequence and role of most mRNAs are still unknown. Another advantage of SAGE is that gene expression data prepared with it do not need normalisation, i.e., SAGE libraries coming from different organ/tissue types can be directly compared with each other.

Since it is very expensive to produce data using the SAGE method, few SAGE datasets are publicly available. This, together with the extremely high dimensionality of the data, makes the prediction task even harder. In such cases, the effect referred to as the curse of dimensionality occurs, which negatively influences clustering and classification of a given data set. Dimensionality reduction is typically used to mitigate this effect.

In this paper we analyse two methodologies for dimensionality reduction: Feature Selection and Filtering. Filtering selects genes independently. A commonly cited reason for the superiority of feature selection methods is the preservation of interactions and interdependency among groups of genes.

Our contribution is to assess the impact of typical filtering techniques on the performance of the induced classifiers. In particular, we study the removal of genes that are not differentially expressed or genes presenting small variation, especially those expressed at lower levels. We also uncover the importance of interactions and interdependency in the results obtained with feature selection methods.

We found empirical evidence that filtering techniques have a negative effect on the predictive accuracy of the induced classifiers. Moreover, the performance of the classifiers induced on the data pre-processed with the chosen feature selection algorithms is significantly better than the performance of the classifiers induced on all features or on the features selected by the filtering approach. We also found that some attributes used to build the best-performing classifier are not differentially expressed.

2 Related Work

Though the Discovery Challenge Workshop has taken place regularly at ECML/PKDD since 1999, gene expression data were only introduced last year; hence, there are very few articles based on them [1–3].

Gandrillon [1] described the molecular biology background of gene expression data as well as two datasets (small and large) prepared according to the SAGE method [4]. These datasets are gene expression matrices, where biological situations (or SAGE libraries) are rows and genes are columns (the number of columns far exceeds the number of rows). As remarked by Rioult [3], these datasets have not been studied a lot.

Gamberoni and Storari [2] investigated classification and hierarchical clustering of the large dataset. Two classifications were considered: normal/cancer and by organ type (brain, breast, colon, etc.). Leave-one-out cross-validation (loo-cv) was used to assess the performance of two classifiers: support vector machine (SVM) [5] and decision tree (C4.5) [6]. C4.5 discriminated all instances based on only 7 genes, but under loo-cv almost half of the normal samples were classified as cancerous, leading to 74.4% accuracy, whereas for organ classification the accuracy was only 48.9%. With SVM, these figures were 82.2% and 71.1%, respectively. As with C4.5, almost half of the normal samples were misclassified. Hierarchical clustering, generating a tree-like representation (dendrogram) from which clusters emerge, resulted in many clusters containing instances of the same organ, though no quantitative cluster validation was given. SVM applied to the clustered data again outperformed C4.5.

Rioult [3] aimed at finding strongly emerging patterns (associations of features whose frequency increases significantly between two classes) by transposing the expression matrix and applying the Galois connection to a boolean matrix. Strongly emerging patterns have been proven useful in building accurate classifiers. Both the large and small datasets were used in experiments.

Since gene expression datasets have many thousands of genes and fewer than a hundred biological situations, gene selection is typically applied before classification or clustering. Gene selection is viewed as feature selection; hence it can be done either with a filter or with a wrapper approach. The filter approach means that gene importance or relevance is based on intrinsic characteristics of the data, while the wrapper approach involves a predictor to judge gene relevance. The former approach is simpler and faster than the latter, an important consideration given the thousands of features (genes) in microarrays. That is why the filter approach dominates gene selection research. However, it does not take a predictor into account, i.e., the selected genes may not compose the optimal subset from the predictor's point of view.

Regardless of the approach to gene selection, it is common to select a single gene at a time, i.e., each gene is considered individually and independently of other genes (see, e.g., [7], where Naïve Bayes is used, which is based on the conditional independence of the predictive features given the class). Nevertheless, scientists have come to the understanding that there are interdependencies and interactions between genes in small groups, as in gene regulatory networks [8]. Regarding gene selection for cancer classification, treating each gene separately can miss subsets of genes that together allow good separation between normal and tumour tissues. However, to the best of our knowledge, only pairwise interactions between genes have been taken into account [9], i.e., we are unaware of any attempts to select more than two genes at a time.

We were able to find only one work [9] related to cancer classification (normal/tumour tissues) and pairwise gene selection. It describes a filter method ranking pairs of genes based on a t-score. Two variants of this method are proposed. The all-pairs variant considers all pairs of genes. Given the sorted list of pair scores for all possible pairs, top-ranked disjoint pairs are sought in a greedy manner. First, the pair with the highest pair score is selected; then all pairs containing either gene in that pair are removed from the list. Then the highest-scoring pair among the remaining pairs is chosen, and so on. The rationale behind this algorithm is that the removal of selected genes from all pairs in the list guarantees that a high-scoring gene will only attract its best companion to join it in a pair, while pairs involving a high-scoring gene and 'bad' companion genes will be eliminated from selection.
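To make the all-pairs procedure concrete, the following Python sketch implements the greedy selection of disjoint top-scoring pairs. The pair-scoring function here is only a simple stand-in (the actual score in [9] is computed on a discriminant projection of the pair), and all names are our own illustrative choices.

import numpy as np
from itertools import combinations
from scipy import stats

def pair_score(X, y, i, j):
    # Stand-in pairwise score: two-sample t-statistic on the difference
    # of the standardised profiles of genes i and j (the score in [9]
    # projects the pair onto a discriminant axis instead).
    zi = (X[:, i] - X[:, i].mean()) / (X[:, i].std() + 1e-12)
    zj = (X[:, j] - X[:, j].mean()) / (X[:, j].std() + 1e-12)
    d = zi - zj
    t, _ = stats.ttest_ind(d[y == 1], d[y == 0])
    return abs(t)

def greedy_disjoint_pairs(X, y, n_pairs):
    # Score every pair, then greedily keep the best pairs whose genes
    # have not been used by a previously selected pair.
    scored = sorted(((pair_score(X, y, i, j), i, j)
                     for i, j in combinations(range(X.shape[1]), 2)),
                    reverse=True)
    used, chosen = set(), []
    for s, i, j in scored:
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
            if len(chosen) == n_pairs:
                break
    return chosen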

Another (faster) variant in [9] evaluates only a fraction of all gene pairs. Initially, it ranks all genes based on their individual t-scores. Then it selects the best gene and another gene that together maximise the pairwise t-score. These two genes are then removed from the ranked list, and the procedure is repeated for the remaining genes. Though this variant is faster than the all-pairs one, it may miss some high-scoring pairs because exhaustive selection is avoided.

Inspired by [9], the authors in [10] invented a data transformation procedure from single gene expressions to pairwise gene expression ratios. First they compute the correlation coefficient for each gene and select the set of cancer-related genes by simple thresholding, i.e., only those genes having an absolute value of the correlation coefficient larger than 0.4 are retained for further analysis. Next, an M × M matrix (M = C^2_{#genes}) including all possible combinations of gene pairs is constructed. Each row/column corresponds to a specific gene, and the entry at the intersection of row i and column j corresponds to the ratio of the expression levels of genes i and j; hence the pairwise ratio. The pairwise ratios are used as features in order to discriminate between normal and tumour tissues. As a concluding remark, despite some dimensionality reduction techniques allowing the pairwise selection of genes, none is capable of coping with more than 2 genes at a time. To fill this gap, in this paper we introduce GRAD: a methodology capable of selecting m genes at a time and coping with interactions and interdependencies between more than 2 genes simultaneously.
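A minimal sketch of this transformation, under our reading of [10]: genes are kept when the absolute correlation of their expression with the (0/1) class label exceeds 0.4, and the expression ratio of every retained pair becomes a new feature. All names, and the small epsilon guarding division, are our own.

import numpy as np
from itertools import combinations

def pairwise_ratio_features(X, y, corr_threshold=0.4, eps=1e-9):
    # Pearson correlation of each gene's expression with the class label.
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + eps)
    keep = np.flatnonzero(np.abs(corr) > corr_threshold)
    # Ratio of expression levels for every pair of retained genes.
    pairs = list(combinations(keep, 2))
    R = np.column_stack([X[:, i] / (X[:, j] + eps) for i, j in pairs])
    return R, pairs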

3 GRAD Algorithm

This algorithm is based on a combination of forward sequential search (FSS) and backward sequential search (BSS), which will be called AdDel (Addition-Deletion). AdDel consists in the following. FSS (Ad) selects n1 features, followed by BSS (Del) removing n2 (n2 < n1) features from those selected by FSS. After that, the next n1 features are selected by FSS, so that the cardinality of the feature set becomes 2n1 − n2. Out of these, n2 features are again deleted by BSS. This alternation of FSS and BSS continues until the targeted number of features is obtained.
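As an illustration, a minimal sketch of the AdDel alternation. It assumes a generic set-evaluation function score(subset), e.g. the cross-validated accuracy of a classifier restricted to that subset; that function, and all names here, are our own.

def ad_del(items, score, target, n1=6, n2=3):
    """AdDel sketch: alternate forward selection (Ad: add n1 items, one
    at a time, each maximising the score of the growing set) with
    backward elimination (Del: drop the n2 items whose removal hurts
    the score least), until the target size is reached."""
    selected = []
    while len(selected) < target:
        rest = [i for i in items if i not in selected]
        for _ in range(min(n1, len(rest))):       # Ad step
            best = max(rest, key=lambda i: score(selected + [i]))
            selected.append(best)
            rest.remove(best)
        if len(selected) >= target or not rest:
            break
        for _ in range(n2):                       # Del step
            worst = max(selected,
                        key=lambda i: score([j for j in selected if j != i]))
            selected.remove(worst)
    return selected[:target]                      # trim any overshoot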

Thus AdDel adds/deletes one feature at a time. Therefore its extension, called GRAD (GRanular AdDel), has been introduced [11], where subsets (granules) of n features are added or deleted each time. The advantages of GRAD over AdDel are clear, since several features (not all of which have to be relevant) can compose a good subset because of interactions and interdependence.

Though weakly relevant features can be included in the granule leading to the optimal performance, this set is nevertheless mostly composed of strongly relevant features (Statement 1). Thus, the relevance of a single feature cannot reliably rank it as a potential candidate for inclusion in the granule if that feature is considered in isolation from other features. As a result, exhaustive search for good granules should be employed whenever computer resources allow it, which implies a small number of features participating in this process (Statement 2).

GRAD operates as follows:

1. Selection of granules consisting of n features, where n varies from 1 to nmax;

2. Running AdDel on granules selected at the previous step.

In GRAD the AdDel algorithm works with a set of the most relevant granules, each consisting of n features selected by exhaustive search. In order to limit the search space (and therefore computational demands), and based on the Occam's Razor principle that a relevant granule includes a few rather than many features, we set nmax to 3. This means granules of size 1, 2, and 3.

Let N be the number of original features. The N features are first sorted in descending order of individual relevance (judged by the error rate achieved after removing a certain feature from the original set of features), forming a list P0. If N is large, the number of combinations of 2 or 3 features is extremely large, rendering exhaustive search impractical. Because of this, m1 (m1 < N) features from the top of P0 are picked (according to Statement 1) and compose a new list P1, from which lists P2 and P3 are generated by means of exhaustive search. P2 contains m2 2-feature granules, whereas P3 contains m3 3-feature granules. If N is of O(10^3) or O(10^4), m1 = m2 = m3 = 100 is a good choice. Thus, according to Statement 2, exhaustive selection is not employed indiscriminately, and the total number of granules is equal to 300 for the given values of m1, m2, and m3. After relevant granules of size 1, 2, and 3 have been selected, AdDel is applied as described above, with granules utilised in place of individual features.
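A sketch of this granule-generation step under the stated parameters. Here score is the same assumed set-evaluation function used in the AdDel sketch above (taken to accept any feature collection), and genes_ranked is the list P0 sorted by individual relevance:

from itertools import combinations

def build_granules(genes_ranked, score, m1=100, m2=100, m3=100):
    # P1: the m1 individually most relevant genes from the top of P0.
    P1 = genes_ranked[:m1]
    # P2, P3: exhaustively score all pairs and triples within P1 and
    # keep the m2 and m3 best (about C(100,2) = 4950 and
    # C(100,3) = 161700 evaluations, respectively).
    P2 = sorted(combinations(P1, 2), key=score, reverse=True)[:m2]
    P3 = sorted(combinations(P1, 3), key=score, reverse=True)[:m3]
    # 100 + 100 + 100 = 300 granules, fed to AdDel in place of features.
    return [(g,) for g in P1] + list(P2) + list(P3)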

As in AdDel, the problem of the optimal ratio of n1 to n2 is still open. Based on experiments, it was found that the optimal ratio is n1/n2 = 2/1, with n1 = 6 and n2 = 3.

GRAD can function in two modes, depending on whether a granule, once included in the optimal feature set, is excluded from further consideration or not. When a granule is allowed to be added to the optimal feature set several times, its weight increases so that it becomes more relevant.

4 Experiments

The experiments involve estimating the predictive accuracy achieved by classifiers on the original dataset and on versions derived from it using dimensionality reduction techniques.

The experiments performed have two goals: on the one hand, to study the effect of dimensionality reduction techniques on predictive accuracy and, on the other hand, to search for an intelligible predictive model that improves on previous results.

As a secondary task of studying dimensionality reduction techniques, an exploratory analysis of the results is performed in order to reveal the occurrence of interactions and interdependency between the genes selected by the classifier. This occurrence would illustrate the importance of these effects on the classification results.

Because the dataset we use lacks normal libraries for some organs while others do not have cancer libraries, a stratified predictive analysis of all organs is impossible (see Fig. 1). Hence, this analysis will consist only of classifying examples into cancer and normal categories using all libraries of the dataset. Ideally, the goal of this analysis would be to discover a common subset of genes capable of discriminating between normal and cancer conditions for any tissue or cell.

4.1 Dataset Description

We have chosen the small dataset consisting of the expressions of 822 genes in 74 biological situations, comprising a matrix with 74 rows and 823 columns, where the last column is the class of the biological situation. The class is represented as a binary attribute, in which 1 represents the normal condition and 0 the cancer condition. As shown in Fig. 1, the distribution of normal and cancerous samples is imbalanced, with a bias toward the latter: 24 samples (32.43%) are normal while 50 samples (67.57%) are cancerous. Thus it is not surprising that many normal samples would be misclassified, as obtained in [2].

In Fig. 1(b), the cumulative density function of the gene expression levels is presented.


Fig. 1. (a) Distribution of biological situations for each organ; notice that four organs do not have cancerous samples in the dataset. (b) Cumulative density function of the expression level of all genes in the dataset.

4.2 Experimental Design

The design involves the predictive analysis of a set of versions of the original dataset using different classifiers.


We used the original dataset and datasets pre-processed with dimensionality reduction techniques. The dimensionality reduction techniques considered are: GRAD, Wrapper Approach with C4.5 and Filtering.

For each dataset version we induced the following classification models: decision trees (C4.5), support vector machines (SVM), radial basis function (RBF) networks, and multi-layer perceptron neural networks (NN). The classifiers' parameters were fairly optimised.

Classifier precision is assessed through ten-fold cross-validation. The results are then subjected to one-way analysis of variance (ANOVA) and multiple hypothesis tests in order to draw statistically driven conclusions about all classification techniques.

The one-way ANOVA test is first used to find out whether the mean error rates of all four techniques are the same (null hypothesis H0: µ1 = µ2 = ··· = µ4) or not. If the returned p-value is smaller than α = 0.05, the null hypothesis is rejected, which implies that the mean error rates are not all the same. The next step is to determine which pairs of means are significantly different, and what the critical values are, by means of multiple hypothesis testing (after applying Scheffé's S procedure adjustment).
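For concreteness, a sketch of this testing procedure in Python. The per-fold error rates of the four techniques are the assumed input, and the Scheffé critical value is computed from the textbook formula rather than a library routine:

import numpy as np
from scipy import stats

def anova_scheffe(groups, alpha=0.05):
    # One-way ANOVA on the per-fold error rates of each technique.
    F, p = stats.f_oneway(*groups)
    if p >= alpha:
        return p, []                      # cannot reject H0: equal means
    k = len(groups)
    N = sum(len(g) for g in groups)
    # Pooled within-group mean square (MSE).
    mse = sum((len(g) - 1) * np.var(g, ddof=1) for g in groups) / (N - k)
    f_crit = stats.f.ppf(1 - alpha, k - 1, N - k)
    different = []
    for i in range(k):
        for j in range(i + 1, k):
            diff = abs(np.mean(groups[i]) - np.mean(groups[j]))
            # Scheffe critical value for the pairwise contrast (i, j).
            crit = np.sqrt((k - 1) * f_crit * mse *
                           (1 / len(groups[i]) + 1 / len(groups[j])))
            if diff > crit:
                different.append((i, j, diff, crit))
    return p, different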

Subsequently, we pick the classifier with the best prediction accuracy on each dataset and apply this procedure again. All results are collected and presented in plots and tables in Section 4.3.

Following that, we collect the genes selected by the best classifiers induced on the datasets pre-processed with GRAD and the Wrapper approach. Then, an exploratory analysis is performed, using the genes selected with the Filtering technique as a baseline for comparison. This procedure has the goal of revealing genes selected by GRAD and the Wrapper approach that are not differentially expressed.

Finally, two scatter plots are obtained with the genes selected by the best classifier: one scatter plot shows two differentially expressed genes and the other shows two genes that are not differentially expressed. These scatter plots aim at uncovering the type of relationship (interdependency and interactions) that may exist between these two kinds of variables and at showing their discriminatory power to separate cancer from normal libraries.

4.3 Results

Table 1 presents the predictive accuracy estimated by 10-fold cross-validation on the original dataset and on the versions derived using dimensionality reduction techniques. It also presents results obtained in previous research (see [2]). Multiple hypothesis tests were applied to the best classifier from each dimensionality reduction technique. Critical values were calculated using Scheffé's S procedure. The classifier induced on the dataset preprocessed with GRAD has significantly better accuracy than the one induced on the dataset preprocessed with the Filtering technique.

Fig. 2 presents the decision trees induced with C4.5 on the datasets preprocessed with GRAD and the Wrapper approach using C4.5.


Table 1. Predictive accuracy for all pre-processing techniques and each classifier

               C4.5     SVM      RBF      NN
GRAD           86.178   67.858   76.553   72.268
Wrapper        82.088   67.858   69.696   69.875
Original       64.517   79.339   70.965   77.410
Filtering      71.054   78.410   67.304   76.392
Prev. Results  74.4     82.2     -        -

Fig. 2. (a) C4.5 tree induced on the dataset pre-processed with GRAD. (b) C4.5 tree induced on the dataset pre-processed with the Wrapper approach using C4.5.

Fig. 3 presents a scatter plot between attributes 18 and 242, which are not differentially expressed, and one between attributes 138 and 800, which are.


Fig. 3. Relationship between genes selected by C4.5 on the dataset pre-processed with GRAD. (a) Relationship between two genes expressed at low levels. (b) Relationship between two genes with significant differential expression levels among class conditions (cancer vs normal).


Table 2 presents descriptive statistics characterising the gene expression levels of the genes selected by the decision trees induced on the datasets preprocessed with GRAD and the Wrapper approach. The group of rows with the identifier 'tests' presents the results of statistical tests detecting significant differences between normal and cancer conditions in the mean, median, and distribution of gene expression levels. The identifier H0 denotes the condition in which the null hypothesis cannot be rejected, that is, the statistic (mean, median, or distribution) is equal across both normal and cancer conditions. H1 represents the condition in which we have to reject the null hypothesis at a significance level of 0.05.

Table 2. Descriptive statistics of the genes selected by the decision trees induced on the datasets preprocessed with GRAD and the Wrapper approach (C4.5)

                       GRAD                           Wrapper Approach C4.5
Statistics      Gene 18   138     242     800        69       106      112
median          2.5       4       3       2          12       42       54
mean            4.55      4.78    4.27    4.39       27.47    54.91    65.77
stdev           5.35      4.54    4.36    5.89       35.25    46.83    39.92
range           23        28      23      32         167      199      227
kurtosis        4.53      11.43   6.49    10.07      6.60     4.22     7.86
skewness        1.47      2.41    1.59    2.51       1.99     1.46     1.84
median normal   2         2       4       2          11       28       47
median cancer   3         4       3       3.5        13       46       57
mean   normal   2.88      2.79    5.33    2.17       17.13    46.42    48.83
mean   cancer   5.36      5.74    3.76    5.46       32.44    58.98    73.90
stdev  normal   2.94      2.13    5.76    2.70       19.36    49.93    27.81
stdev  cancer   6.04      5.07    3.45    6.68       39.97    45.21    42.46
range  normal   9         6       23      10         85       163      102
range  cancer   23        28      16      32         167      188      210
tests  mean     H0        H1      H0      H1         H0       H0       H1
tests  median   H0        H1      H0      H1         H0       H1       H1
tests  distrib. H0        H1      H0      H1         H0       H1       H1

4.4 Discussion

It can be observed that the accuracy obtained on the dataset pre-processed with GRAD is better than the one obtained with the Wrapper approach. C4.5 is the best classifier for both preprocessing techniques. These classifiers are significantly better than the classifiers induced on the original dataset and on the one preprocessed using the Filtering technique. Interestingly, the classifier induced on the original dataset has better performance than the one induced on the dataset preprocessed with the Filtering technique. These results reinforce the notion that filtering may not preserve important information regarding the classification of examples. Hence its application in predictive analysis may result in the induction of classifiers with poor performance.


From the results presented in Table 2 we may infer that both the trees induced with GRAD and the Wrapper approach use attributes associated with genes that are not differentially expressed. Because these classifiers presented the best performance, this may confirm the notion that an important amount of the information regarding the characterisation of normal and cancer conditions may be encoded in groups of genes that are not differentially expressed.

From the results presented in Fig. 1(b), the cumulative density function of gene expression levels, we may infer that 80% of the data is below the mean (10.0). Thus, an important amount of the data in this dataset is present at lower levels. We may also conclude from the results presented in Table 2 that the genes selected by GRAD have substantially lower average gene expression and variability than those selected by the Wrapper approach. Since the GRAD approach produced the best results, we may conclude that small variations and low-expressed genes have an important role in the induction of predictive models from gene expression data. To uncover the relationship between differentially expressed genes and low-expressed genes, Fig. 3 presents a scatter plot between attributes 18 and 242, which are not differentially expressed, and one between attributes 138 and 800, which are. In Fig. 3(a), we may find normal and cancer examples along the whole range of the x and y axes; it is the interrelation between x and y that discriminates positive from negative cases (although not a perfect dichotomy). Hence, the interaction and interdependency between genes has an important role in the induction and performance of the best classifier.

In Fig. 3(b), normal examples are expressed at low levels and concentrated around the origin of the plot, while cancer samples are scattered across all higher values of the gene expression level. Thus the examples contained within the subdomain defined by the constraints x ≤ 6 ∧ y ≤ 3 are considered normal, while the remaining are considered cancerous. This dichotomy separates the lower-expressed from the higher-expressed genes independently and performs better than a line representing a relationship between the two variables, which reveals the reduced degree of association between these genes.

We have also studied the impact of removing differentially expressed and non-differentially expressed genes from the set of genes selected by GRAD. However, the results were inconclusive: in both cases, removing some differentially expressed genes increased accuracy while removing others decreased it, and the same happened with genes that are not differentially expressed.

4.5 Methods

The following paragraphs describe the preparation details of each dataset according to the requirements of each technique.

GRAD found 7 genes, with indices 18, 138, 242, 253, 341, 435, and 800, that enable us to classify all 74 biological situations correctly, using the following decision function: f = 1 − 2r1/(r1 + r2), where r1 and r2 are the distances of a pattern (biological situation) x to the nearest neighbour belonging to the normal class and to the cancer class, respectively. Therefore, if f ≥ 0, x is normal; otherwise it is cancer. Based on the selected genes gk (k = 18, 138, 242, 253, 341, 435, 800), the following distance function r, measuring the similarity of biological situations a and b, was used:

r(a, b) = \sqrt{\sum_{k=1}^{7} w_k (g_k^a - g_k^b)^2},

where the weight vector is w = {2, 5, 1, 1, 1, 1, 4}, and each weight reflects how many times the corresponding gene was selected and placed in the optimal feature set.
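This decision rule translates directly into code. The sketch below assumes the expression matrices have already been restricted to the seven selected gene columns (in the order listed above) and that the paper's gene indices follow the dataset's own numbering:

import numpy as np

WEIGHTS = np.array([2, 5, 1, 1, 1, 1, 4], dtype=float)

def r(a, b):
    # Weighted Euclidean distance over the seven selected genes.
    return np.sqrt(np.sum(WEIGHTS * (a - b) ** 2))

def classify(x, normals, cancers):
    # r1 (r2): distance from x to its nearest normal (cancer) library;
    # f = 1 - 2*r1/(r1 + r2) >= 0 exactly when r1 <= r2, i.e. x is
    # closer to a normal library than to a cancer one.
    r1 = min(r(x, n) for n in normals)
    r2 = min(r(x, c) for c in cancers)
    f = 1 - 2 * r1 / (r1 + r2)
    return "normal" if f >= 0 else "cancer"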

The Wrapper approach using C4.5 found 6 genes with indices 10, 56, 69, 106, 112, 125. For selecting these genes, ten runs were done with the wrapper algorithm. In each run the dataset was randomly split into a training and a testing set, and the attributes selected by the wrapper were collected. The overall frequency of each attribute across all runs was computed, and the genes were sorted according to this criterion. Attributes with a frequency below the 10th percentile were removed.
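A sketch of this selection protocol, treating the wrapper itself as a black box wrapper_select(X_train, y_train) returning a list of gene indices (e.g. a search wrapped around C4.5); the 70/30 split ratio is our assumption, as the paper does not state it:

import numpy as np
from collections import Counter

def frequent_wrapper_genes(X, y, wrapper_select, n_runs=10, pct=10, seed=0):
    # Run the wrapper on n_runs random train/test splits and count how
    # often each gene is selected.
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_runs):
        idx = rng.permutation(len(y))
        train = idx[: int(0.7 * len(y))]          # assumed 70/30 split
        counts.update(wrapper_select(X[train], y[train]))
    # Drop genes whose selection frequency is below the pct-th percentile.
    cutoff = np.percentile(list(counts.values()), pct)
    return sorted(g for g, c in counts.items() if c >= cutoff)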

The Filtering approach found 54 differentially expressed genes, with indices 12, 56, 105, 106, 111, 112, 113, 125, 131, 138, 144, 146, 165, 176, 227, 253, 269, 277, 334, 343, 347, 356, 357, 365, 366, 374, 394, 396, 398, 403, 406, 409, 412, 420, 424, 433, 440, 454, 456, 488, 523, 532, 549, 589, 593, 629, 675, 701, 721, 726, 757, 769, 800, 803. Several pre-processing techniques were applied for selecting these genes. The scheme involves removing genes whose absolute values, variance, and entropy are all below the 10th percentile, as well as genes with no significant differential expression profiles between class conditions (normal and cancer). The selection of differentially expressed genes is done with the Wilcoxon test at a significance level of 0.05. The Wilcoxon test makes no assumption concerning the distribution of the two populations, is often not severely affected by changes in a small portion of the data, and can be successfully applied even to very small samples.
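A sketch of this filtering scheme; the histogram binning used to estimate each gene's entropy is our own choice, and scipy's rank-sum implementation stands in for the Wilcoxon test described above:

import numpy as np
from scipy import stats

def filter_genes(X, y, pct=10, alpha=0.05):
    # Per-gene summaries: maximum absolute value, variance, and the
    # entropy of a 10-bin histogram of expression levels (+1 smoothing).
    max_abs = np.abs(X).max(axis=0)
    var = X.var(axis=0)
    ent = np.array([stats.entropy(np.histogram(X[:, j], bins=10)[0] + 1)
                    for j in range(X.shape[1])])
    # Remove genes whose summaries all fall below the pct-th percentile.
    low = ((max_abs < np.percentile(max_abs, pct)) &
           (var < np.percentile(var, pct)) &
           (ent < np.percentile(ent, pct)))
    keep = []
    for j in np.flatnonzero(~low):
        # Rank-sum test between normal (y == 1) and cancer (y == 0).
        _, p = stats.ranksums(X[y == 1, j], X[y == 0, j])
        if p < alpha:                     # differentially expressed
            keep.append(j)
    return keep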

5 Conclusion

In this paper, two methodologies for dimensionality reduction were considered for gene pre-selection: filtering and feature selection. These methodologies were applied to SAGE data. The results obtained reiterate the importance of carefully selecting and applying suitable dimensionality reduction techniques before fitting a classifier to data.

Experimental results also support the notion that filtering may not preserve the interactions and interdependency between attributes. Hence its application in predictive analysis may result in the induction of classifiers with poor performance.


The results obtained cast a preference for GRAD: a dimensionality reduction technique that considers interactions among more than 2 variables.

The existence of interdependency between attributes of the best classifier on this dataset was demonstrated graphically. This classifier relied on genes that are not differentially expressed, with relatively low expression and low variation levels, to build its predictions. Despite the risk of over-fitting and learning data artifacts when dealing with low expression levels, the classifier presented considerable stability, expressed by its low error variance across the folds of the cross-validation. Hence, our contribution is to highlight the importance of genes that are not differentially expressed, as well as of small variations in the expression level, particularly of genes expressed at lower levels, which are commonly removed by filtering pre-processing techniques. These genes may have a significant impact on the performance of the induced classifiers in the predictive analysis of gene expression data from human SAGE libraries.

References

1. Gandrillon, O.: Guide to the gene expression data. In: Proceedings of the ECML/PKDD Discovery Challenge Workshop, Pisa, Italy (2004) 116–120

2. Gamberoni, G., Storari, S.: Supervised and unsupervised learning techniques for profiling SAGE results. In: Proceedings of the ECML/PKDD Discovery Challenge Workshop, Pisa, Italy (2004) 121–126

3. Rioult, F.: Mining strong emerging patterns in wide SAGE data. In: Proceedings of the ECML/PKDD Discovery Challenge Workshop, Pisa, Italy (2004) 127–138

4. Velculescu, V., Zhang, L., Vogelstein, B., Kinzler, K.: Serial analysis of gene expression. Science 270 (1995) 484–487

5. Vapnik, V.: The Nature of Statistical Learning Theory. Springer-Verlag, Berlin (1995)

6. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)

7. Blanco, R., Larrañaga, P., Inza, I., Sierra, B.: Gene selection for cancer classification using wrapper approaches. International Journal of Pattern Recognition and Artificial Intelligence 8 (2004) 1373–1390

8. Brock, G., Beavis, W., Salter, L.: Fuzzy logic and alternate methods for detecting gene regulatory networks

9. Bø, T., Jonassen, I.: New feature subset selection procedures for classification of expression profiles. Genome Biology 3 (2002)

10. Yap, Y., Zhang, X., Ling, M., Wang, X., Wong, Y., Danchin, A.: Classification between normal and tumor tissues based on the pair-wise gene expression ratio. BMC Cancer 4 (2004)

11. Zagoruiko, N., Borisova, I., Kiselev, A., Kutnenko, O.: Algorithm GRAD for selection of informative genetic characteristics. In: Proceedings of the 2nd Moscow Conference on Computational Molecular Biology, Moscow, Russia (2005)
