A wavelet-based data pre-processing analysis approach in mass spectrometry


www.intl.elsevierhealth.com/journals/cobm


Xiaoli Li∗, Jin Li, Xin Yao

Cercia, School of Computer Science, University of Birmingham B15 2TT, UK

Abstract

Recently, mass spectrometry analysis has become an effective and rapid approach for detecting early-stage cancer. To identify proteomic patterns in serum that discriminate cancer patients from normal individuals, machine-learning methods, such as feature selection and classification, have already been applied to the analysis of mass spectrometry (MS) data with some success. However, the performance of existing machine learning methods for MS data analysis still needs improving. This paper proposes a wavelet-based pre-processing approach to MS data analysis. The approach applies wavelet-based transforms to MS data with the aim of de-noising data that are potentially contaminated during acquisition. The effects of the choice of wavelet function and decomposition level on the de-noising performance are also investigated. Our comparative experimental results demonstrate that the proposed de-noising pre-processing approach has the potential to remove noise embedded in MS data, which can lead to improved performance for existing machine learning methods in cancer detection.

Keywords: Cancer detection; Mass spectrometry; Wavelet transforms; De-noising; Linear discriminant analysis; Principal component analysis; Probabilistic classification

1. Introduction

Microarray technology is a powerful tool in understanding biological systems, discovering the mechanism of diseases and developing new drugs, etc. However, microarray technology is primarily used to analyze mRNA rather than actual biological effectors. Levels of the mRNA expression are not well correlated to the actual protein because of differential rates of mRNA translation and varying protein half-lives. Therefore, microarray technology cannot provide direct information about the function of proteins for cancer diagnosis.

Mass spectrometry (MS) has recently been developed to uncover the protein content of a tissue [1] such as serum. MS enables one to analyze proteins directly to understand biological systems by revealing the masses of biomolecules as well as biomolecular fragments. MS has been found to have great potential to differentiate amongst various types of tissue samples (e.g., normal or tumor) using protein expression profiles [2].

One of the biggest advantages of MS is that the whole process of mass spectrometry analysis, including sample preparation, data acquisition and data analysis, can usually be completed in approximately 3–4 h. This has made MS one of the fastest cancer detection methods so far. One of the biggest challenges in MS data analysis is that the data usually contain many transients that are difficult to handle. Moreover, the real protein “signals” are almost always contaminated with substantial noise during the acquisition of MS data. This is largely due to the complexity of biological specimens and interfering biochemical/physical processes [3]. The noise hidden in MS data can cause the mass/charge ratios (m/z) to be misaligned across spectra within a single sample (resolution ∼0.1%), and the intensities to be uncalibrated across multiple samples [4]. A number of machine learning methods have been used to process MS data. While most of the methods concentrate on major components in machine learning, such as feature selection and classification [5,11,19,20], few studies have addressed an unavoidable and important step in machine learning, i.e., MS data pre-processing.

∗Corresponding author.

E-mail addresses: [email protected] (X. Li), [email protected] (J. Li), [email protected] (X. Yao).

doi:10.1016/j.compbiomed.2006.08.009

Suppose that the obtained MS data consist of an actual informative protein signal plus some noise. It is desirable to separate the actual protein data from the background noise in protein intensity measurements. The complex structure of MS data and the complex measurement procedures make the de-noising process difficult. Our tests show that traditional de-noising methods such as Fourier transform-based filters are not effective in removing noise embedded in MS data.

In this paper, we propose a novel data pre-processing approach to de-noising based on wavelet techniques. The hope is that the performance of existing machine learning methods in cancer detection can be improved if our proposed approach is applied in advance. The effectiveness of our approach is verified by comparing cancer detection performance before and after incorporating our de-noising method into an existing machine learning classifier.

2. MS data

The most popular and widely used proteomic technology is to characterize changes in protein expression between two different samples: normal and diseased. In recent years, surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), in combination with advanced data mining algorithms, has been used to detect protein patterns associated with diseases [11–15]. As a kind of MS-based protein chip technology, SELDI-TOF-MS has been successfully used to detect several disease-associated proteins in complex biological specimens such as serum [16–18]. A diagram of SELDI-TOF-MS is shown in Fig. 1. The most intense peak in the spectrum is termed the base peak and all the others are expressed relative to its intensity. The peaks themselves are typically very sharp, and are often simply represented as vertical lines. The position of an individual protein in the spectrum corresponds to its “time of flight” because small proteins fly faster whilst large proteins fly more slowly [10].
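The link between mass and flight time can be made explicit with a short worked relation. This is only a sketch for an idealized linear time-of-flight analyser, not a description of the instrument used here: it assumes an ion of mass m and charge ze starting at rest, accelerated through a potential U and then drifting over a field-free length L. The symbols e, U, L and v are introduced for illustration and do not appear in the original text.

```latex
% Energy gained in the acceleration region equals the kinetic energy:
zeU = \tfrac{1}{2} m v^{2}
\quad\Longrightarrow\quad
v = \sqrt{\frac{2zeU}{m}}
% Flight time over the drift length L:
t = \frac{L}{v} = L\sqrt{\frac{m}{2zeU}}
\quad\Longrightarrow\quad
t \propto \sqrt{m/z}
```

Under these assumptions the flight time grows with the square root of m/z, which is why lighter proteins arrive earlier in the spectrum than heavier ones.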

All datasets analyzed in this study are from Dr. Emanuel Petricoin III and Dr. George Wright Jr. at Eastern Virginia Medical School, Virginia Prostate Center (2002). The MS data of Petricoin’s group are obtained from the NIH and FDA Clinical Proteomics Program Databank [19]. The datasets contain 100 healthy samples and 100 disease samples.

Fig. 1. Cancer disease diagnosis using SELDI-TOF-MS. Serum protein is collected from a blood sample, followed by adsorption, partition, electrostatic interaction or affinity chromatography on a stationary phase immobilized in an array format on a protein chip surface. Finally, a mass spectrometry series is obtained, which plots relative intensity as a function of the mass-to-charge ratio (m/z).

3. Method

Our preliminary experiments have shown that non-parametric techniques can remove the noise in MS data very well. The advantage of non-parametric techniques is that they do not require a model to be built before removing the noise. Considering the nature of MS data, which are derived from complicated biological systems, we suggest that wavelet shrinkage and wavelet thresholding estimators are potential candidates for pre-processing MS data.

The first wavelet-based de-noising method was proposed by Donoho and Johnstone [6] and is carried out by thresholding wavelet coefficients. Given a measured signal x(t) contaminated by Gaussian white noise n(t), the signal s(t) can be recovered by the following formula:

$$s(t) = x(t) - n(t), \qquad t = 1, \ldots, N. \qquad (1)$$

The wavelet-based de-noising method is composed of three steps [8]: (i) the wavelet transform of the signal x(t); (ii) thresholding of the wavelet coefficients; (iii) the inverse wavelet transform of the thresholded wavelet coefficients to obtain the de-noised signal. The choice of threshold is central to the success of the wavelet-based de-noising method.

A universal threshold T was proposed by Donoho and Johnstone [7] to remove white noise. It is given by

$$T = \sigma\sqrt{2\log(N)}, \qquad \sigma = \mathrm{MAD}/0.6745, \qquad (2)$$

where N is the length of the signal x(t), σ is the noise level, and MAD is the median absolute deviation estimated at the first scale.

In the case of the wavelet-packet transform, the threshold can be written as

$$T = \sigma\sqrt{2\log(N\log_2 N)}. \qquad (3)$$
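The two thresholds in Eqs. (2) and (3) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the logarithm is taken to be the natural logarithm, and the MAD is computed as the median of the absolute detail coefficients (deviation about zero), which is a common convention for wavelet detail coefficients but is an assumption here.

```python
import math
from statistics import median

def noise_level(finest_detail_coeffs):
    """Estimate sigma = MAD / 0.6745 from the first-scale detail coefficients.

    Assumption: MAD is taken as median(|d|), i.e. deviation about zero.
    """
    mad = median(abs(d) for d in finest_detail_coeffs)
    return mad / 0.6745

def universal_threshold(sigma, n):
    """Eq. (2): T = sigma * sqrt(2 log N), natural logarithm assumed."""
    return sigma * math.sqrt(2.0 * math.log(n))

def packet_threshold(sigma, n):
    """Eq. (3): T = sigma * sqrt(2 log(N log2 N)) for wavelet packets."""
    return sigma * math.sqrt(2.0 * math.log(n * math.log2(n)))

# Made-up detail coefficients and signal length, for illustration only.
sigma = noise_level([-2.0, -1.0, 0.0, 1.0, 2.0])
t_dwt = universal_threshold(sigma, 1024)
t_wp = packet_threshold(sigma, 1024)
```

For any N greater than 2 the wavelet-packet threshold is larger than the universal one, since N log2 N exceeds N.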


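Thereby, the significant wavelet coefficients can be derived by thresholding. There are hard and soft thresholding rules.

Hard thresholding:

$$\delta^{HT}(\hat{d}_{jk}) = \begin{cases} 0 & \text{if } |\hat{d}_{jk}| \le T, \\ \hat{d}_{jk} & \text{if } |\hat{d}_{jk}| > T, \end{cases} \qquad (4)$$

Soft thresholding:

$$\delta^{ST}(\hat{d}_{jk}) = \begin{cases} 0 & \text{if } |\hat{d}_{jk}| \le T, \\ \hat{d}_{jk} - T & \text{if } \hat{d}_{jk} > T, \\ \hat{d}_{jk} + T & \text{if } \hat{d}_{jk} < -T. \end{cases} \qquad (5)$$

The procedure of wavelet-based de-noising is given as

$$x \xrightarrow{\text{DWT}} \{\hat{c}_{j_0 k}, \hat{d}_{jk}\} \xrightarrow{\text{Thresholding}} \{\hat{c}_{j_0 k}, \delta(\hat{d}_{jk})\} \xrightarrow{\text{IDWT}} \hat{x}.$$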
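The hard and soft rules of Eqs. (4) and (5) translate directly into code. The sketch below applies both to a handful of made-up detail coefficients with a made-up threshold; the function names and the numbers are illustrative only.

```python
def hard_threshold(d, t):
    """Eq. (4): keep the coefficient if |d| > t, otherwise set it to zero."""
    return d if abs(d) > t else 0.0

def soft_threshold(d, t):
    """Eq. (5): zero small coefficients and shrink the rest toward zero by t."""
    if d > t:
        return d - t
    if d < -t:
        return d + t
    return 0.0

coeffs = [-3.0, -0.4, 0.2, 1.5, 5.0]   # made-up detail coefficients
T = 1.0                                # made-up threshold
hard = [hard_threshold(d, T) for d in coeffs]
soft = [soft_threshold(d, T) for d in coeffs]
```

Hard thresholding leaves the surviving coefficients untouched, whereas soft thresholding also reduces their magnitude by T, which gives a continuous shrinkage function.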

During wavelet-based de-noising, we need to consider two issues: wavelet basis function selection and setting of thresholds.

In the application of wavelet-based de-noising, the selection of a suitable wavelet function is an important problem. The criteria for choosing a basis function are to maximize the correlation between the basis function and the information signal, and to minimize the correlation between the noisy signal and the basis function. An arbitrary choice of wavelet function is not desirable. Several factors, namely orthogonality, wavelet shape and wavelet width, should be considered. Orthogonal wavelets can give the most compact representation of a signal, so this study concentrates on orthogonal wavelet functions. The Daubechies wavelet has good localizing properties in both the temporal and frequency domains, whereas the bi-orthogonal wavelet offers symmetry and simplicity. If a signal consists of many sharp jumps or steps, a boxcar-like function such as the Haar wavelet is reasonable, while for smoothly varying time series a smooth function such as the Symlets wavelet or the discrete approximation of the Meyer wavelet is better. In this paper, we resort to trial and error to determine which wavelet is most appropriate for removing the noise in MS data.

Fig. 2. The new classification procedure of MS data by combining wavelet-based de-noising with Q5.

On the other hand, several algorithms have been proposed to estimate the threshold for wavelet coefficients to remove noise, such as data-adaptive wavelet thresholding estimators, block thresholding estimators, Bayesian approaches, and so on [6]. Previous work has shown that Bayesian wavelet shrinkage and thresholding estimators outperform the classical data-adaptive wavelet thresholding estimators in terms of mean squared error in situations with finite samples. In the Bayesian approach, a prior distribution that captures the sparseness of wavelet expansions is first placed on the wavelet coefficients. A suitable Bayes rule is then used to estimate the posterior distribution of the wavelet coefficients. In this study, we use a Bayesian block thresholding estimator to design thresholds for de-noising [7,9]. The details of this algorithm can be found in Ref. [7].

After de-noising the MS data, principal component analysis (PCA) and linear discriminant analysis (LDA) are used to classify the healthy and cancer samples. The reason for using the combination of PCA and LDA for classification is two-fold. Firstly, the combined approach is one of the most widely used methods in the literature for handling classification of MS data [10,21–23]. Secondly, using the same approach enables us to judge easily whether our wavelet-based pre-processing approach adds value to an existing publicly available machine learning method. More specifically, we adopt the same system, called Q5, used in [10]. Comparison is made in terms of major performance criteria such as sensitivity, specificity and positive predictive value. The new procedure of classification on MS data, combining the proposed wavelet-based de-noising with Q5, is illustrated in Fig. 2.
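The three named criteria follow from the counts of a two-class confusion matrix. The sketch below uses the standard definitions, with disease as the positive class; the function name and the counts are made up for illustration and are not results from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, PPV) as percentages.

    tp/fn: disease samples classified as disease/healthy.
    tn/fp: healthy samples classified as healthy/disease.
    """
    sensitivity = 100.0 * tp / (tp + fn)   # fraction of disease samples detected
    specificity = 100.0 * tn / (tn + fp)   # fraction of healthy samples cleared
    ppv = 100.0 * tp / (tp + fp)           # fraction of disease calls that are right
    return sensitivity, specificity, ppv

# Made-up counts for 50 disease and 50 healthy test samples.
sens, spec, ppv = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
```

With these made-up counts the sensitivity is 90%, the specificity is 80%, and the positive predictive value is 45/55, roughly 81.8%.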

4. Results

The procedure of discrete wavelet transform de-noising is shown in Fig. 3, which is illustrated with a 4-level wavelet decomposition. It is worth noting that a 4-level decomposition may not be sufficient to remove noise in some cases. At each level of decomposition, the signal or the wavelet approximation coefficient is decomposed into the high-frequency component cDi and the low-frequency component cAi. After thresholding each high-frequency component cDi, the thresholded detail coefficients are denoted cD_i. Finally, the fourth-level low-frequency component cA4 and these thresholded detail coefficients cD_i are reconstructed to generate an output signal, which is regarded as the de-noised MS data. The noise is assumed to be removed from the original MS data by the wavelet de-noising process. It is worth pointing out that the de-noised signal can still retain important details such as spikes and transients. To refine the decomposition of MS data, wavelet-packet transform de-noising is carried out using the same procedure as for the discrete wavelet transform.

Fig. 3. The procedure of wavelet transform de-noising of MS data. The top signal is the original MS data. At each decomposition level, cAi represents the low-frequency components, and cDi represents the high-frequency components of the approximation. After de-noising, cD_i represents the de-noised high-frequency components. Taking the inverse wavelet transform, the de-noised signal and the noise can be obtained.
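The decompose, threshold and reconstruct loop described above can be sketched with a pure-Python Haar transform. This is a simplified illustration, not the implementation used in the study: it assumes the Haar wavelet, soft thresholding, the universal threshold of Eq. (2) with MAD taken as the median of the absolute first-scale coefficients, and a signal length that is a multiple of 2 to the power of the number of levels. All function names and the test signal are hypothetical.

```python
import math
from statistics import median

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (cA, cD)."""
    cA = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    cD = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return cA, cD

def haar_inverse_step(cA, cD):
    """Invert haar_step by interleaving (a + d)/sqrt(2) and (a - d)/sqrt(2)."""
    x = []
    for a, d in zip(cA, cD):
        x.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return x

def soft(d, t):
    """Soft thresholding rule of Eq. (5)."""
    if d > t:
        return d - t
    if d < -t:
        return d + t
    return 0.0

def denoise(x, levels=4):
    """Decompose x, soft-threshold every detail band, and reconstruct."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    cA, details = list(x), []
    for _ in range(levels):
        cA, cD = haar_step(cA)
        details.append(cD)              # details[0] is cD1, the finest scale
    # Noise level from the first scale, then the universal threshold of Eq. (2).
    sigma = median(abs(d) for d in details[0]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(x)))
    for cD in reversed(details):        # start from the coarsest level
        cA = haar_inverse_step(cA, [soft(d, t) for d in cD])
    return cA

# Made-up signal: a step from 0 to 10 with a small alternating disturbance.
noisy = [(0.0 if i < 8 else 10.0) + (0.1 if i % 2 == 0 else -0.1)
         for i in range(16)]
clean = denoise(noisy, levels=4)
```

In this toy case the alternating disturbance lives entirely in the first-scale detail band and is removed, while the step survives with a slightly reduced height because soft thresholding also shrinks the large coarse-scale coefficient.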

For comparison, we apply the original MS data set (without wavelet de-noising) to the Q5 algorithm [10], and the result serves as a yardstick against which the improvement achieved by wavelet-based de-noising is measured. For discrete wavelet de-noising, the comparison results indicate that 10-level decomposition is better for “db4” (Daubechies wavelet), “sym4” (Symlets wavelet), “haar” (Haar wavelet), “dmey” (discrete approximation of Meyer wavelet) and “bior3.5” (biorthogonal wavelet). For wavelet-packet de-noising, 6-level decomposition with the “db4”, “sym4” and “haar” wavelets is better. The wavelet functions were selected according to how well they remove the noise hidden in MS data. Test results show that the selected wavelet functions are better at removing noise than other wavelets of the same kind; for example, db4 is better than db1, db2 and other wavelets of the same kind.

Table 1

| De-noising method | T% | PCT | Corr. % | Classif. % | PPV % | Sens. % | Spec. % |
|---|---|---|---|---|---|---|---|
| Without de-noising | 50 | 0.5 | 88.99(3.14) | 98.11(1.38) | 90.34(3.88) | 87.37(5.21) | 90.60(4.18) |
| | | 0.63 | 92.58(2.72) | 85.50(3.35) | 93.61(3.31) | 91.26(4.66) | 93.83(3.34) |
| | | 0.75 | 95.02(2.39) | 72.28(4.41) | 95.85(2.98) | 93.86(4.18) | 96.08(2.84) |
| Discrete wavelet db4 | 50 | 0.5 | 91.29(2.63) | 98.00(1.63) | 92.19(3.45) | 90.28(4.63) | 92.28(3.73) |
| | | 0.63 | 94.60(2.37) | 86.91(3.38) | 95.27(3.22) | 93.77(4.17) | 95.38(3.26) |
| | | 0.75 | 96.54(2.23) | 74.54(3.99) | 97.16(3.01) | 95.76(3.78) | 97.27(2.94) |
| Discrete wavelet sym4 | 50 | 0.5 | 91.81(2.76) | 98.03(1.55) | 92.76(3.43) | 90.78(4.68) | 92.85(3.62) |
| | | 0.63 | 94.87(2.51) | 86.95(3.34) | 95.59(2.98) | 93.99(4.21) | 95.67(3.04) |
| | | 0.75 | 96.97(2.15) | 74.74(4.29) | 97.43(2.66) | 96.33(3.46) | 97.52(2.58) |
| Discrete wavelet haar | 50 | 0.5 | 91.21(2.79) | 98.02(1.60) | 92.31(3.54) | 90.01(4.56) | 92.41(3.78) |
| | | 0.63 | 94.35(2.45) | 86.83(3.27) | 95.24(3.06) | 93.37(4.18) | 95.27(3.20) |
| | | 0.75 | 96.43(2.31) | 74.35(3.98) | 97.17(2.80) | 95.57(3.77) | 97.23(2.75) |
| Discrete wavelet dmey | 50 | 0.5 | 91.66(2.51) | 98.10(1.42) | 92.09(3.47) | 91.21(4.48) | 92.11(3.14) |
| | | 0.63 | 94.86(2.46) | 87.11(3.28) | 95.16(3.32) | 94.46(4.03) | 95.21(3.44) |
| | | 0.75 | 96.98(2.05) | 74.42(4.17) | 97.36(2.67) | 96.50(3.59) | 97.39(2.70) |
| Discrete wavelet bior3.5 | 50 | 0.5 | 91.52(2.92) | 98.04(1.49) | 92.23(3.43) | 90.75(4.96) | 92.29(3.68) |
| | | 0.63 | 94.67(2.67) | 86.88(3.15) | 95.30(3.20) | 93.93(4.45) | 95.37(3.34) |
| | | 0.75 | 96.66(2.21) | 74.79(4.00) | 97.18(2.80) | 95.94(3.70) | 97.28(2.80) |

T%, training percent; PCT, probability classification threshold; Corr. %, percent correctly classified; Classif. %, percent classified; PPV, positive predictive value; Sens., sensitivity; Spec., specificity.

The results for discrete wavelet transform de-noising are reported in Table 1 and depicted in Fig. 4 for all five wavelets. The results for wavelet-packet transform de-noising are given in Table 2 and shown in Fig. 5 for all three wavelets. In Fig. 4, panel A depicts the results reported in [10] for Q5 without any de-noising, and the remaining panels depict the performance of the five discrete wavelet de-noising methods in terms of the five criteria. All five discrete wavelet de-noising methods improve the classification accuracy compared with the same classification method without wavelet de-noising. On average, a 3% improvement in classification accuracy is obtained, and the results of the de-noising methods are quite close to each other.

In Fig. 5, panel A is the same graph as in Fig. 4, and the three remaining panels depict the performance of the three wavelet-packet de-noising methods in terms of the five criteria. Again, all three methods achieve better performance on all five criteria than Q5 without de-noising. It is notable that increasing the probability classification threshold increases the percent correctly classified, positive predictive value, sensitivity and specificity, but decreases the percent classified.
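The trade-off between percent classified and percent correctly classified can be illustrated with a small sketch. It rests on an assumption about how the probability classification threshold (PCT) acts: a sample is given a label only when the classifier's probability for the more likely class reaches the PCT, and is otherwise left unclassified. The function names, probabilities and labels below are made up and are not taken from Q5 or from the data in this study.

```python
def classify_with_threshold(prob_disease, pct):
    """Return 'disease', 'healthy', or None when confidence is below pct."""
    confidence = max(prob_disease, 1.0 - prob_disease)
    if confidence < pct:
        return None                      # sample is left unclassified
    return "disease" if prob_disease >= 0.5 else "healthy"

def summarize(probs, labels, pct):
    """Return (percent classified, percent correct among classified)."""
    decisions = [classify_with_threshold(p, pct) for p in probs]
    classified = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    pct_classified = 100.0 * len(classified) / len(labels)
    if not classified:
        return pct_classified, float("nan")
    correct = sum(d == y for d, y in classified)
    return pct_classified, 100.0 * correct / len(classified)

# Made-up posterior probabilities of disease and true labels.
probs = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10, 0.52]
labels = ["disease", "disease", "healthy", "disease",
          "disease", "healthy", "healthy", "healthy"]
low = summarize(probs, labels, pct=0.5)
high = summarize(probs, labels, pct=0.75)
```

In this toy example, raising the threshold from 0.5 to 0.75 drops the borderline samples, so fewer samples are classified but a larger share of the classified ones is correct, which mirrors the trend reported in the tables.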

Compared with the results in Fig. 4 for the discrete wavelet methods, wavelet-packet de-noising shows better performance than discrete wavelet de-noising for all three wavelet-packet methods considered here. Wavelet-packet de-noising with “haar” is the best, with a 9% improvement.

5. Discussions and conclusions

The main objective of this study is to investigate whether a wavelet-based pre-processing approach is able to remove noise embedded in MS data with the aim of improving cancer detection performance. This study has applied a wavelet-based pre-processing approach to MS data. We use two different wavelet transform methods: one uses the discrete wavelet transform and the other uses the wavelet-packet transform. Both methods achieve better performance in terms of the five criteria mentioned above. Although we did not compare the performance of wavelet-based de-noising with other de-noising methods, we can still draw some useful conclusions as follows:

(i) Pre-processing of MS data is a necessary and quite important step.

(ii) Both discrete wavelet and wavelet-packet de-noising could improve the classiﬁcation performance on MS data.

(iii) The wavelet packet-based de-noising performs much better than the discrete wavelet. While there is 3% average improvement using discrete wavelet de-noising, there is 8.5% using wavelet packet de-noising.

(iv) The “haar” wavelet function performs well in both discrete wavelet de-noising and wavelet-packet de-noising. This is partly due to the fact that the shape of the “haar” wavelet can represent transients in MS data more precisely.
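The size of the gains quoted in (iii) can be checked against the tabulated values. The text does not state how the averages were formed, so the sketch below takes one possible reading as an assumption: the mean gain in percent correctly classified (Corr. %) at PCT = 0.5 relative to Q5 without de-noising, using the values from Tables 1 and 2. The variable and function names are illustrative.

```python
# Corr. % at PCT = 0.5, copied from Tables 1 and 2.
baseline = 88.99
discrete = {"db4": 91.29, "sym4": 91.81, "haar": 91.21,
            "dmey": 91.66, "bior3.5": 91.52}
packet = {"db4": 97.15, "sym4": 97.68, "haar": 98.01}

def mean_gain(results, reference):
    """Average improvement over the reference, in percentage points."""
    return sum(v - reference for v in results.values()) / len(results)

discrete_gain = mean_gain(discrete, baseline)
packet_gain = mean_gain(packet, baseline)
best_packet = max(packet, key=packet.get)
best_gain = packet[best_packet] - baseline
```

Under this reading the mean gains come out at about 2.5 points for the discrete wavelets and about 8.6 points for the wavelet packets, with “haar” giving the largest wavelet-packet gain of about 9 points, which is broadly in line with the rounded figures quoted in the text.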


Fig. 4. The probability classification threshold versus percent classified (Classif), percent correctly classified (Correct), positive predictive value (PPV), sensitivity (Sens), and specificity (Spec) for five different discrete wavelet de-noising pre-processing MS datasets. All are with 50% of the samples used in training. (A) No de-noising pre-processing, (B) “db4”, (C) “sym4”, (D) “haar”, (E) “dmey”, (F) “bior3.5” wavelet de-noising.

Table 2

| De-noising method | T% | PCT | Corr. % | Classif. % | PPV % | Sens. % | Spec. % |
|---|---|---|---|---|---|---|---|
| Without de-noising | 0.5 | 0.5 | 88.99(3.14) | 98.11(1.38) | 90.34(3.88) | 87.37(5.21) | 90.60(4.18) |
| | | 0.63 | 92.58(2.72) | 85.50(3.35) | 93.61(3.31) | 91.26(4.66) | 93.83(3.34) |
| | | 0.75 | 95.02(2.39) | 72.28(4.41) | 95.85(2.98) | 93.86(4.18) | 96.08(2.84) |
| Wavelet-packet db4 | 0.5 | 0.5 | 97.15(2.17) | 99.58(0.64) | 98.18(2.34) | 96.13(3.38) | 98.18(2.41) |
| | | 0.63 | 98.93(1.39) | 92.70(2.66) | 99.45(1.36) | 98.35(2.52) | 99.45(1.47) |
| | | 0.75 | 99.63(0.84) | 83.17(4.10) | 99.78(0.94) | 99.43(1.54) | 99.79(0.97) |
| Wavelet-packet sym4 | 0.5 | 0.5 | 97.68(1.65) | 99.51(0.70) | 98.80(1.84) | 96.56(2.72) | 98.80(1.88) |
| | | 0.63 | 99.28(1.01) | 93.77(2.61) | 99.69(0.95) | 98.84(1.76) | 99.69(0.97) |
| | | 0.75 | 99.73(0.63) | 84.81(3.94) | 99.90(0.65) | 99.55(1.16) | 99.90(0.64) |
| Wavelet-packet haar | 0.5 | 0.5 | 98.01(1.46) | 99.88(0.40) | 99.16(1.57) | 96.86(2.32) | 99.16(1.58) |
| | | 0.63 | 99.16(0.96) | 95.23(2.54) | 99.76(0.81) | 98.54(1.69) | 99.77(0.79) |
| | | 0.75 | 99.71(0.64) | 87.35(3.79) | 99.95(0.36) | 99.45(1.23) | 99.95(0.33) |

T%, training percent; PCT, probability classification threshold; Corr. %, percent correctly classified; Classif. %, percent classified; PPV, positive predictive value; Sens., sensitivity; Spec., specificity.


Fig. 5. The probability classification threshold versus percent classified (Classif), percent correctly classified (Correct), positive predictive value (PPV), sensitivity (Sens), and specificity (Spec) for three different wavelet-packet de-noising pre-processing MS datasets. All are with 50% of the samples used in training. (A) Without de-noising pre-processing, (B) “db4”, (C) “sym4”, (D) “haar” wavelet-packet de-noising.

Based on our experimental results, we conclude that carrying out a pre-processing de-noising analysis on MS data is a value-added step for improving the performance of cancer detection, given that noise is usually mixed with the real signal in MS data.

Acknowledgments

The authors are grateful to Mr. T. Qiu for testing the program. The constructive comments from reviewers are appreciated. The work is supported by Cercia, The Centre of Excellence for Research in Computational Intelligence and Applications, in the School of Computer Science at the University of Birmingham, UK.

References

[1] R. Aebersold, M. Mann, Mass spectrometry based proteomics, Nature 422 (6928) (2003) 198–207.

[2] R.F. Service, Proteomics, Science 294 (2001) 2074–2083.

[3] E. Phizicky, P.I. Bastiaens, H. Zhu, M. Snyder, S. Fields, Protein analysis on a proteomic scale, Nature 422 (6928) (2003) 208–215.

[4] Y. Yasui, D. McLerran, B. Adam, M. Winget, M. Thornquist, Z. Feng, An automated peak identification/calibration procedure for high-dimensional protein measures from mass spectrometers, J. Biomed. Biotechnol. 4 (2003) 242–248.

[5] B. Wu, T. Abbott, D. Fishman, W. Mcmurray, G. Mor, K. Stone, D. Ward, K. Williams, H. Zhao, Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data, Bioinformatics 19 (13) (2003) 1636–1643.

[6] D.L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inform. Theory 41 (3) (1995) 613–627.

[7] I.M. Johnstone, B.W. Silverman, Empirical Bayes selection of wavelet thresholds, Ann. Statist. 33 (4) (2005) 1700–1752.

[8] D.L. Donoho, I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (3) (1994) 425–455.

[9] A. Antoniadis, J. Bigot, T. Sapatinas, Wavelet estimators in nonparametric regression: a comparative simulation study, J. Statist. Software 6 (6) (2001) 1–83.

[10] R.H. Lilien, H. Farid, B.R. Donald, Probabilistic disease classification of expression dependent proteomic data from mass spectrometry of human serum, J. Comput. Biol. 10 (6) (2003) 925–946.

[11] E. Petricoin III, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn, L.A. Liotta, Use of proteomic patterns in serum to identify ovarian cancer, The Lancet 359 (2002) 572–577.

[12] J.M. Sorace, M. Zhan, A data review and re-assessment of ovarian cancer serum proteomic profiling, BMC Bioinform. 4 (2003) 1–13.

[13] C.M. Michener, A.M. Ardekani, E.F. Petricoin III, L.A. Liotta, E.C. Kohn, Genomics and proteomics: application of novel technology to early detection and prevention of cancer, Cancer Detect Prev. 26 (2002) 249–255.

[14] E.F. Petricoin, K.C. Zoon, E.C. Kohn, J.C. Barrett, L.A. Liotta, Clinical proteomics: translating benchside promise into bedside reality, Nat. Rev. Drug. Discov. 1 (9) (2002) 683–695.

[15] P.R. Srinivas, M. Verma, Y. Zhao, S. Srivastava, Proteomics for cancer biomarker discovery, Clin. Chem. 48 (2002) 1160–1169.

[16] P.C. Herrmann, L.A. Liotta, E.F. Petricoin III, Cancer proteomics: the state of the art, Dis. Markers 17 (2001) 49–57.

[17] G.L. Wright Jr., L.H. Cazares, S. Leung, Proteinchip surface enhanced laser desorption/ionization (SELDI) mass spectrometry: a novel protein biochip technology for detection of prostate cancer biomarkers in complex protein mixtures, Prostate Cancer Prostatic Dis. 2 (1999) 264–276.

[18] A. Vlahou, P.F. Schellhammer, S. Mendrinos, Development of a novel proteomic approach for the detection of transitional cell carcinoma of the bladder in urine, Am. J. Pathol. 158 (4) (2001) 1491–1520.

[19] NIH and FDA Clinical Proteomics Program Databank. http://clinicalproteomics.steem.com, 2002.

[20] B.L. Adam, Y. Qu, J.W. Davis, Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men, Cancer Res. 62 (13) (2002) 3609–3614.

[21] J.P. Antignac, B. Le Bizec, F. Monteau, F. Andre, Differentiation of betamethasone and dexamethasone using liquid chromatography/positive electrospray tandem mass spectrometry and multivariate statistical analysis, J. Mass Spectrometry 37 (2002) 69–75.

[22] P. Miketova, C. Abbas-Hawka, K. Voorhees, T. Hadfield, Microorganism Gram-type differentiation of whole cells based on pyrolysis high-resolution mass spectrometry data, J. Anal. Appl. Pyrolysis 67 (2003) 109–122.

[23] M. Wagner, B. Tyler, D. Castner, Interpretation of static time-of-flight secondary ion mass spectra of adsorbed protein films by multivariate pattern recognition, Anal. Chem. 74 (2002) 1824–1835.

Xiaoli Li received his B.S.E. and M.S.E. degrees from the Kunming University of Science and Technology, and Ph.D. degree from the Harbin Institute of Technology, China, in 1992, 1995, and 1997, respectively, all in mechanical engineering. From April 1998 to October 2003, he was a Research Fellow of the Department of Manufacturing Engineering, City University of Hong Kong, of the Alexander von Humboldt Foundation at the Institute for Production Engineering and Machine Tools, Hannover University, Germany, and a postdoctoral fellow at the Department of Automation & Computer-Aided Engineering, Chinese University of Hong Kong. In 2002, he was appointed as Professor at the Electrical Engineering School, Yanshan University, China. Currently he also works in Cercia, School of Computer Science, The University of Birmingham, UK. His main areas of research are bio-signal analysis, computational intelligence, monitoring, and manufacturing systems.

Jin Li is a Research Fellow at CERCIA specialising in data mining and evolutionary computation. He received his M.Sc. in Computer Science in 1992 and his Ph.D. in financial forecasting using genetic programming in 2000, from the Hefei University of Technology, China, and the University of Essex, respectively. He has worked in both academia and business, gaining considerable experience as a software engineer and a project lead in commercial software development projects. His academic interests include evolutionary computation and data mining in modeling and forecasting.

Xin Yao (M’91–SM’96–F’03) received his B.Sc. degree from the University of Science and Technology of China (USTC), Hefei, in 1982, M.Sc. degree from the North China Institute of Computing Technology, Beijing, in 1985, and Ph.D. degree from USTC in 1990.

He was an Associate Lecturer and Lecturer from 1985 to 1990 at USTC, while working towards his Ph.D. He took up a Postdoctoral Fellowship in the Computer Sciences Laboratory, Australian National University (ANU), Canberra, in 1990, and continued his work on simulated annealing and evolutionary algorithms. He joined the Knowledge-Based Systems Group, CSIRO Division of Building, Construction and Engineering, Melbourne, in 1991, working primarily on an industrial project on automatic inspection of sewage pipes. He returned to Canberra in 1992 to take up a lectureship in the School of Computer Science, University College, University of New South Wales (UNSW), Australian Defence Force Academy (ADFA), where he was later promoted to a Senior Lecturer and Associate Professor. Attracted by the English weather, he moved to the University of Birmingham, Birmingham, UK, as a Professor of Computer Science in 1999. Currently, he is the Director of the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), a Distinguished Visiting Professor of the University of Science and Technology of China, Hefei, and a visiting professor of three other universities. He has more than 200 publications. He is an Associate Editor or Editorial Board Member of several journals. He is the Editor of the World Scientific Book Series on Advances in Natural Computation. He has given more than 35 invited keynote and plenary speeches at conferences and workshops worldwide. His major research interests include evolutionary artificial neural networks, automatic modularization of machine learning systems, evolutionary optimization, constraint handling techniques, computational time complexity of evolutionary algorithms, coevolution, iterated prisoner’s dilemma, data mining, and real-world applications. Dr. Yao was awarded the President’s Award for Outstanding Thesis by the Chinese Academy of Sciences for his Ph.D. work on simulated annealing and evolutionary algorithms. He won the 2001 IEEE Donald G. Fink Prize Paper Award for his work on evolutionary artificial neural networks. He is the Editor-in-Chief of the IEEE Transactions on Evolutionary Computation.