• No results found

Serum cancer biomarker discovery through analysis of gene expression data sets across multiple tumor and normal tissues

N/A
N/A
Protected

Academic year: 2021

Share "Serum cancer biomarker discovery through analysis of gene expression data sets across multiple tumor and normal tissues"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

Serum cancer biomarker discovery through analysis of gene expression data sets

across multiple tumor and normal tissues

Hoon Jin

a,1

, Han-Chul Lee

a,1

, Sung Sup Park

b

, Yong-Su Jeong

e

, Seon-Young Kim

a,c,d,⇑ a

Medical Genomics Research Center, KRIBB, Daejeon 305-806, Republic of Korea b

Aging Research Center, KRIBB, Daejeon 305-806, Republic of Korea c

Korean Bioinformation Center, KRIBB, Daejeon 305-806, Republic of Korea d

Department of Functional Genomics, University of Science and Technology, KRIBB, Daejeon 305-806, Republic of Korea

eDepartment of Genetic Engineering, College of Life Science and Graduate School of Biotechnology, Kyung Hee University, Yongin-si, Gyeonggi-do 446-701, Republic of Korea

a r t i c l e

i n f o

Article history:

Received 10 November 2010 Accepted 9 August 2011 Available online 22 August 2011

Keywords:

Cancer serum biomarker Tissue specificity analysis Gene expression

Multiple normal tissues corrected differential analysis

a b s t r a c t

The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of top priorities in cancer research community. Although numerous biomarker candi-dates have been generated by applying high-throughput technologies such as transcriptomics, proteomics, and metabolomics, few of them have been successfully validated in the clinic. Better strategies to mine omics data for successful biomarker discovery are needed. Using a data set of 22,794 tumor and normal samples across 23 tissues, we systematically analyzed current problems and challenges of serum bio-marker discovery from gene expression data. We first performed tissue specificity analysis to identify genes that are both tissue-specific and up-regulated in tumors compared to controls, but identified few novel can-didates. Then, we designed a novel computation method, the multiple normal tissues corrected differential analysis (MNTDA), to identify genes that are expected to be significantly up-regulated even after their expressions in other normal tissues are considered, and, in a simulation study, showed that the multiple normal tissues corrected differential analysis outperformed the single tissue differential analysis combined with tissue specificity analysis. By applying the multiple normal tissues corrected differential analysis, we identified some genes as novel biomarker candidates. However, the number of potential candidates was disappointingly small, exemplifying the difficulty of finding serum cancer biomarkers. We discussed a few important points that should be considered during biomarker discovery from omics data.

Ó2011 Elsevier Inc. All rights reserved.

1. Introduction

Developing simple non-invasive tests that allow early cancer detection is one of top priorities in cancer research field. Such tests will impact patient care and outcome through disease screening and early detection[18]. Recent high-throughput omics tools such as transcriptomics and proteomics have produced many candidate cancer biomarkers, raising our expectation that many simple non-invasive biomarkers will be developed soon. However, few biomark-ers identified from omics data analysis have been validated and suc-cessfully translated into clinical tests[1]. Mostly, large discrepancies exist between findings from omics analysis of tissue samples and re-sults in serum analysis. Steady increases in the publications on

potential biomarkers have not been accompanied by con-current in-creases in the FDA-approved biomarkers[31].

Serum proteins in the circulatory system reflect complex phys-iological and pathological conditions in the body with the abun-dance range of known serum proteins encompassing nine orders of magnitude[18]. Here, tumor-tissue derived proteins in the cir-culation are likely to be at lower end of this range, particularly dur-ing the early stages of tumor development[18]. For this reason, analysis of tumor tissues has been pursued to identify differentially expressed proteins. However, most potential biomarkers identified from tissue analysis were not successfully validated in serum anal-ysis[18]. One obvious reason is most tumor-tissue derived pro-teins are diluted in the circulation as blood contains molecules derived from all tissues. One potential way is to focus on tissue-specific genes in which the pattern of biomarker expression in tis-sue is expected to be similarly represented in the serum[25,26]. Prostate-specific antigen (PSA), one of the most successful serum biomarkers, is indeed expressed only in normal prostate tissue, and is frequently up-regulated in prostate tumors compared to normal prostate tissues[18].

1532-0464/$ - see front matterÓ2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jbi.2011.08.010

Abbreviations:MNTDA, multiple normal tissues corrected differential analysis; STDA, single tissue differential analysis.

⇑Corresponding author at: Medical Genomics Research Center, KRIBB, Daejeon 306-806, Republic of Korea. Fax: +82 42 879 8119.

E-mail address:kimsy@kribb.re.kr(S.-Y. Kim).

1 These authors contributed equally to this work.

Contents lists available atSciVerse ScienceDirect

Journal of Biomedical Informatics

j o u r n a l h o m e p a g e : w w w . e l s e v i e r . c o m / l o c a t e / y j b i n

(2)

Numerous gene expression data is publicly available in dat-abases such as gene expression omnibus (GEO) and ArrayExpress allowing researchers search, download, and extract useful informa-tion from them. We have systematically collected, processed, and analyzed human tissue gene expression data sets produced using Affymetrix platforms. Until now, we have collected more than 400 data sets encompassing diverse cancer and normal samples from most tissue types and constructed a data-rich cancer gene expression database comprising more than 22,000 samples[42]. Then, we developed a novel computational approach, the multiple normal tissues corrected differential analysis (MNTDA), to identify genes that are expected to be significantly up-regulated even after their expressions in other normal tissues are considered. In a sim-ulation study, we showed that the MNTDA outperformed the sin-gle-tissue level differential analysis (STDA) combined with tissue specificity analysis. However, contrary to our expectation, we iden-tified only a few new biomarker candidates from our analysis of gene expression data across 23 tissues. We discussed several po-tential problems in current omics approaches to cancer biomarker discovery based on our analyses.

2. Materials and methods 2.1. Data collection, processing

Gene expression data sets were downloaded from Gene Expres-sion Omnibus and ArrayExpress website. Human tissue gene expression data sets generated using Affymetrix U133A and U133Plus2 platform were collected. Detailed information on data sets is given inTable S1. We preprocessed each data set from the raw CEL files whenever available using MAS5 algorithm available in affy package[13]. Data sets without raw CEL files were also in-cluded in our study if they were processed using MAS5 algorithm. Each sample was then normalized to a target mean density of 500 and then combined into two combined data sets (U133A and U133Plus2, respectively) of approximately 10,000 human tissue samples. The U133A chip contains 22,215 probe sets while U133Plus2 contains 54,615 probe sets. The database is available

athttp://medical-genome.kribb.re.kr/GENT/with further

informa-tion[42].

2.2. Tissue specificity analysis

We used Shannon entropy (H) to measure the overall specificity of a gene[40]. Given expression levels of a gene (g= 1, 2,. . ., G) inN

tissues (t= 1, 2,. . .,N), the relative expression of a genegin a tissue

twas defined as

ptjg¼wg;t= X

16t6N wg;t;

wherewg,tis the expression level of the gene in the tissue. The

en-tropy of a gene’s expression distribution is calculated as

Hg¼ X

16t6N

ptjglog2ðptjgÞ

A low entropy value represents a highly specific gene expression pattern. However, while Shannon entropy provides a single metric for assessing the complete gene expression profile, it doesn’t pro-vide any information on the tissues in which a gene may be specif-ically expressed. A statistic, Qg|t, is introduced to measure the

degree of specificity of gene expression in each tissue type[40]:

Qgjt¼Hglog2ðptjgÞ

A lowQg|t represents a highly specific expression pattern for

that tissue.

We also used another method, ROKU, in a simulation study

[23,24]. Briefly, a statisticUfor identifying outliers was defined as

U¼nlog

r

þpffiffiffi2slogn! n

wherenandsdenote the numbers of non-outlier and outlier can-didates and

r

denotes the standard deviation of the observations of thennon-outlier calculation. Minimum U value and the number of outliers (s) were detected for various combinations of outlier candidates. Here, the number of outliers (s) was varied from 1 to 11. The best approximating combination is one that achieves the lowest value forUand is termed the minimum AIC (Akaike Infor-mation Criterion) estimate [23]. For details, please see Kadota et al.[23]

2.3. Single tissue differential analysis (STDA) and multiple normal tissues corrected differential analysis (MNTDA)

At first, we tried to evaluate cancer biomarker candidates by conventional single tissue differential analysis (STDA) which com-pares gene expression values between each cancer patients and controls[40]. Then, we designed a novel multiple normal tissues corrected differential analysis (MNTDA) to identify potential serum biomarkers for cancer patients. A flowchart of our computational approach is schematically shown inFig. 1b.

For each tissue type, we categorized each sample into normal or tumor tissues. To calculate expression level of each gene in entire normal human tissues, we first calculated a median value of each gene for each tissue type. We used a median rather than a mean value which is significantly affected by a few outliers. Then for each tissue, we computed multiple normal tissues corrected gene expression value for cancer patients and normal controls by adding the median values of all the other normal tissues to the gene expression value of cancer patients and nor-mal controls of a tissue of interest (Fig. 1b). Finally we appliedt -test to the multiple normal tissues corrected gene expression values to identify potential serum biomarkers for each cancer types [22]. Formal description of MNTDA is described as followings.

We assume that there arettissue types (t= 1, 2,. . .,N),nnormal samples (j= 1, 2,. . .,n) andmcancer samples (j= 1, 2,. . .,m), andG

genes (i= 1, 2,. . .,G). Of course, each tissue (t) has different number of normal (n) and cancer (m) samples. Conventional differential analysis (here, named as STDA) is performed by comparing two group means log2FCgit¼ X 16j6m log2ðgijtÞ=m X 16j6n log2ðgijtÞ=n

for each geneiand each tissuet.

In contrast, MNTDA is done as followings. First, for each tissuet

(t =1, 2, 3,. . .,N), a median value of each genei,medit=median(gijt)

is computed from gene expression values gijt of normal samples

j= 1, 2,. . .,n.

Then, for each tissuet= 1, 2,. . .,N, a corrected gene expression valueg’

ijtis computed for each normal and tumor samples by add-ing median gene expression valuesmeditfrom other tissues except

thektissue: G0 ijt¼gijtþ X 16t<k meditþ X kþ16t6N medit

(j= 1, 2,. . .,n(orm) , for normal (or cancer) samples.

Finally, MNTDA is performed by comparing two group means of modified gene expression values

log2FCg 0 it¼ X 16j6m log2ðg0ijtÞ=m X 16j6n log2ðg0ijtÞ=n

(3)

We applied false discovery rate (FDR) to control false positives on multiple hypothesis testing. FDR < 0.05 was considered as sig-nificant. All computations and statistical tests were performed using R statistical language (http://cran.r-project.org/), Python pro-gramming language (http://www.python.org/) and Microsoft Excel software.

2.4. Simulation study

We performed a simulation study to compare MNTDA with con-ventional STDA combined with tissue-specificity analysis. First, we generated a series of simulated data sets as illustrated inFig. 2a. We divided 23 tissues into x specific (or selective) tissues and the remaining (23-x) non-specific tissues. Gene expression data for non-specific tissues were generated by randomly sampling 100 samples from normal distribution with parameters (mean = 100 and standard deviation = 30). Gene expression data for specific (or selective) tissues were generated by randomly sam-pling 100 samples from normal distribution with parameters (mean =y and standard deviation =y0.3; y= 100, 200, 500, 1000, 2000, 5000, 10,000). The numberxof specific (or selective) tissues was varied from 1 to 5, so up to five tissues are specifically (or selectively) expressed in the simulation. Finally, we generated gene expression data for a tumor tissue by randomly sampling 100 samples from normal distribution with parameters (mean =

-yfcand standard deviation =yfc0.3;fc= 1, 2, 4, and 8). Thus, a tissue-specific gene was allowed to be up-regulated up to eight-fold compared to normal control.

With a simulated data set of 100 normal controls and 100 tumor patients, we performed following analyses: MNTDA, conventional single tissue differential analysis (STDA) combined with tissue specificity analysis, and ROC analysis as a true answer. MNTDA and STDA were performed as described above. We performed two kinds of tissue specificity analysis, Shannon entropy and ROKU. For ROC analysis, an expected serum level value for normal

controls was calculated as the sum of 23 normal tissues while an expected serum level value for cancer patients was calculated by the sum of 22 normal tissues and one tumor tissue (Fig. 2b). The AUC value was calculated for each simulated data set of 100 nor-mal and 100 tumor samples by using ROCR package [43]. The AUC was 1 for a simulated data set in which all of 100 tumor sam-ples had higher expression value than all of 100 normal samsam-ples. Different cut-offs for AUC were evaluated (Table C inFig. 2). Then, we evaluated if each method, that is MNTDA and STDA, could iden-tify the marker in a simulated data set successfully, and summa-rized the results as sensitivity and specificity. A python code for the simulation study is given as aSupplementaryfile ( Supplemen-tary code 1).

2.5. Jackknife analysis

To demonstrate that our results are not affected by potentially noisy data sets, we performed jackknife analysis. Briefly, we re-moved one data set from the combined data set, and repeated the same analyses. We performed ten jackknife re-samplings and calculated coefficients of variations as the ratio of the standard deviation (

r

) to the mean (

l

) of the ten re-sampled data sets.

Cv¼

rl

We evaluated six parameters: Shannon entropy, sum of median values across normal tissues, normal and tumor mean values in the STDA, and normal and tumor mean values in the MNTDA in a breast.

3. Results 3.1. Data sets

We collected 22,794 samples across 23 different tissues profiled using Affymetrix U133A or U133Plus2 platforms (Table 1).

Fig. 1.Two strategies for serum cancer biomarker discovery. (a) Tissue-specificity analysis. (b) Multiple normal tissues corrected differential analysis to identify potential prostate cancer screening markers.

(4)

Currently, 12,895 samples were profiled using the U133A platform and 9899 samples were profiled using the U133Plus2 platform.

One concern on using global normalization for generating a combined data set is that study-specific biases might affect results of meta-analysis. We addressed this concern by performing jack-knife re-sampling analysis on the original data sets. We randomly deleted one data set at a time, and performed the same analyses with the remaining data sets. We generated 10 jackknifed data sets and calculated coefficients of variations for each gene on two parameters: Shannon entropy value and sum of median expression across normal tissues. Also, in breast data sets, we performed the same 10 jackknife re-samplings and calculated the following four parameters: single tissue normal mean values in a breast, single tissue cancer mean values in a breast, multiple normal tissues cor-rected normal mean values in a breast, and multiple normal tissues corrected cancer mean values in a breast. We found little variation among jackknifed data sets as shown by small coefficients of vari-ation (Fig. S1). We presume that two factors, collecting many data sets, and using a median instead of a mean during the MNTDA, contribute to the robustness of the results.

Recently, we performed another analysis to assess the impact of laboratory effects on publicly available data sets by following Lukk et al’s method[33]. With a combined data set of 5089 samples of 92 biological groups produced in 93 laboratories, we found that biological effects were stronger than the laboratory effects (data available at http://medical-genome.kribb.re.kr/GENT/overview-4.php). For details, please see Shin et al.[42].

3.2. Expression pattern of good serum biomarkers across tissues

Several serum biomarkers are used in the clinic for screening, prognosis, or monitoring of recurrence after surgical treatment

[31,34]. To figure out important patterns of gene expression of

those clinical serum biomarkers, we first examined their expres-sion patterns across diverse normal and tumor tissues. Two of them, PSA (prostate specific antigen) and AFP (alpha-feto protein), are shown as representatives of serum biomarkers (Figs. S2 and

Fig. 2.A simulation study to compare performances of multiple normal tissues corrected differential analysis (MNTDA) with single tissue differential analysis (STDA) combined with tissue specificity analysis. (a) A total of 14,000 simulated data sets representing 140 different conditions (100 data sets for each condition) were generated by different combinations ofx,y, andfcvalues (x: the number of specific tissues,y: the degree of specificity compared to non-specific tissues, andfc: fold increase relative to normal counterpart). Each data set was generated by randomly sampling 2400 values (23 normal tissues + one tumor tissue; 100 values for each tissue) from normal distribution with given parameters. (b) Evaluation of performances of multiple normal tissues corrected differential analysis with single tissue differential analysis combined with tissue specificity analysis based on ROC analysis of each simulated data set. (C) Cut-offs for each analysis.

Table 1

Description of data sets used in study.

Tissue U133A U133Plus2 Total

Cancer Normal Cancer Normal Cancer Normal Bladder 87 15 39 14 126 29 Blood 2907 1081 2079 283 4986 1364 Brain 592 1609 635 238 1227 1857 Breast 2589 103 1313 235 3902 338 Cervix 64 34 74 12 138 46 Colon 319 26 820 189 1139 215 Endometrium 0 9 72 61 72 70 Esophagus 24 28 13 9 37 37 Head_neck 21 2 202 14 223 16 Kidney 380 72 529 89 909 161 Liver 175 54 182 25 358 79 Lung 712 307 546 149 1258 456 Muscle 0 331 0 246 0 577 Ovary 341 9 613 8 954 17 Pancreas 13 7 132 54 145 61 Prostate 244 83 199 32 443 115 Skin 243 47 194 18 437 65 Small intestine 0 7 13 6 13 13 Spleen 0 16 4 5 4 21 Stomach 46 2 249 43 295 45 Testis 162 19 4 6 166 25 Thyroid 68 25 62 25 130 50 Uterus 0 22 152 12 152 34 Total 8987 3908 8126 1773 17,113 5681

(5)

S3). PSA is exclusively expressed in a prostate tissue, and is up-reg-ulated in prostate cancer patients compared to controls (Fig. S2). AFP is not expressed in normal adults, but is abnormally over-ex-pressed in a subset of liver and germ cell cancer patients

(Fig. S3). Thus, we figured out two important characteristics for a

successful serum biomarker. First, the gene should be expressed in a tissue-specific manner or not expressed at all in normal tis-sues. Second, the gene should be up-regulated in cancer patients compared to normal controls.

3.3. Tissue specificity analysis

Our first strategy to identify novel serum biomarkers for cancer patients was to identify genes that are both expressed in a tissue-specific manner and up-regulated in cancer patients compared to controls (Fig. 1a). We performed tissue specificity analysis by using Shannon entropy (H) and Q-statistics as explained in the Section2. Previous studies suggested that approximately two to five percent of genes are expressed in a tissue specific manner[23,38]. For the U133A and U133P2 data sets, we calculatedHandQvalues and the percentage of tissue specific genes at each cut-off (Fig. S4). We found thatH< 3 was a reasonable choice as 3.6% (U133P2) and 3.9% of genes (U133A) were selected as tissue specific. For theQ va-lue cut-off selection, we found that 0.156% (Q< 4), 0.289% (Q< 5), 0.557% (Q< 6), 1.34% (Q< 7) of genes were selected at each cutoff in the U133P2 dataset, while 0.139% (Q< 4), 0.282% (Q< 5), 0.671% (Q< 6), and 1.965% (Q< 7) of genes were selected in the U133A data-sets.Q< 4 orQ< 5 would be a good choice for selecting tissue-spe-cific genes strictly (please note that only 1 out 23 tissues will have a lowQvalue while the others have highQvalues). However, we chose a rather loose cutoff ofQ< 6 since we wished to select tissue selective genes (that is expressed in a few tissues) as well.

For each tissue, we performed fold-change analysis andt-test to identify differentially expressed genes between cancer patients and normal controls[41]. Although several to tens of tissue-spe-cific genes were identified for each tissue, few of them were up-regulated in cancer patients compared to controls (Fig. 3). The only tissue in which significant proportions of tissue-specific genes were up-regulated was a prostate (Fig. 3). KLK3 (PSA), KLK2 (PSA), and FOLH1 (PSMA) are examples of genes that are pros-tate-specific and up-regulated in prostate cancer patients (Table 2)

[12,40,54]. KLK3 (PSA) is the most famous prostate-specific gene

used in the clinic. FOLH1 (folate hydrolase 1, formerly known as prostate-specific membrane antigen) is a type II transmembrane glycoprotein significantly up-regulated in prostate tumors com-pared to benign prostate hyperplasia tissues. CCL20 is a chemokine suggested as a novel serum marker for nasopharyngeal carcinoma detection and prediction of treatment outcomes, but its potential as a serum marker for colon cancer is not yet tested [6]. Other genes in Table 2 are located in the cytoplasm (KRT6A, KRT6B, and PDZK1) or nucleus (WT1), so they are not suitable for serum biomarkers.

In conclusion, few novel serum biomarker candidates were identified by tissue-specificity analysis. To overcome this limita-tion, we designed a novel approach, the multiple normal tissues corrected differential analysis (MNTDA).

3.4. Multiple normal tissues corrected differential analysis (MNTDA)

Conventional single tissue DEG (differentially expressed gene) analysis, which compares cancer and normal tissues, has produced many gene lists as potential serum biomarkers for diagnosis and screening of cancer patients. However, few of them were success-fully validated in serum even though most of them are significantly up-regulated in tissues of cancer patients compared to normal con-trols. We reasoned that secretion of the gene product from other

normal tissues into circulation would dilute the degree of up-reg-ulation of the gene product in a tumor tissue[18]. We show this point by an example (Fig. S5). REG1A (regenerating islet-derived 1A) is one of genes significantly up-regulated in colon cancer pa-tients compared to normal controls (log2-fc = 3.49; FDR < 9.4e-35). Moreover, as it encodes a secreted protein, it seems to be a good serum biomarker for colon cancer screening. However it is highly expressed in normal pancreas as well as a few other tissues such as small intestine and stomach (Fig. S5). The amount of expression of REG1A in normal pancreas (60,000) is well above the amount of expression in colon cancers (median2000 with a few samples over 10,000;Fig. S5). Thus, we can expect that the 8 to 10-fold increase of REG1A in colon cancer patients will not be easily detected in the serum of the patients, as the increase is di-luted by the overwhelming amounts of REG1A secreted by normal pancreas, small intestine, and stomach tissues.

To simulate the environment in circulation, we designed a novel method, the multiple normal tissues corrected differential analysis (MNTDA), to identify genes that are expected to be significantly up-regulated even after their expressions in other normal tissues are considered. The approach is shown inFig. 2b and is explained in the Section2.

Typically, the expression of most up-regulated genes in STDA is increased up to 30–50-fold in most tissues (Fig. 4). However, the significant up-regulation of most genes is almost abolished in MNTDA (Fig. 4). For example, top 50 up-regulated genes in ovarian cancer patients versus normal controls in STDA were increased be-tween 30 and 500-fold, but the same 50 genes were increased only 1.1–2.3-fold in MNTDA (Fig. 4). This data again explains why few markers were successfully validated in the clinic even if many po-tential serum biomarkers have been identified through STDA.

3.5. Simulation study

One may argue that MNTDA is a simple integration of two steps: STDA followed by tissue specificity analysis. To compare MNTDA with STDA combined with tissue specificity analysis, we generated a series of simulated data sets as described inFig. 2, and evaluated their performances in diverse simulated conditions. For tissue specificity analysis, we evaluated two widely used meth-ods: Shannon entropy and ROKU. By varying three parameters (x: the number of specific tissues,y: the degree of specificity in com-parison to non-specific tissues, and fc: fold increase relative to a corresponding normal tissue), we generated a total of 14,000 data sets representing 140 different conditions. Then, we set cut-offs for each method as followings: log2(fc)P1 for MNTDA, log2(fc)P1 andE-value < 3.0 or log2(fc)P1 and ROKU-value6100 for STDA

(Table C inFig. 2).

In summary, MNTDA outperformed STDA with tissue specificity analysis (either Entropy or ROKU) at all levels of AUC cut-offs (

Ta-ble 3). For example, when a true marker was defined as a case with

AUC = 1, the sensitivity/specificity of MNTDA was 0.661/0.844 while those of STDA with Entropy or STDA with ROKU were 0.350/0.712 and 0.374/0.720, respectively (Table 3). In 32% (1555/4934) cases, MNTDA could identify a true biomarker (that is, AUC = 1) missed by STDA with ROKU method. Those conditions included (y= 500 and fc= 8), (y= 1000 and fc= 8), (y= 2000 and

fc= 4), and (y= 2000 andfc= 8) (Fig. 5andTable 4). In contrast, MNTDA missed a true biomarker identified by STDA with ROKU method in only 2.8% (137/1677) cases. Those conditions included (y= 5000 and fc= 2 or 4) and (y= 10,000 and fc= 2 or 4)

(Table S2). Finally, 31% of true biomarkers were missed by both

MNTDA and STDA with tissue specificity analysis. True conditions missed by both MNTDA and STDA with tissue specificity analysis were (y= 100 andfc= 8), (y= 200 and fc= 4, or 8), and (y= 400 andfc= 4) (Table S2).

(6)

3.6. Potential serum biomarkers under evaluation

Through the MNTDA, we have identified some tens of genes that are significantly up-regulated in cancer patients compared to normal controls (see Tables5 and6). Among them are well-known serum biomarkers such as KLK3 (=PSA) and MUC16 (=CA125) that are currently being used in the clinic[4,7,12]. MSLN was shown to have adequate performance for ovarian cancer screening to warrant further testing[9,37]. KLK2 and OR51E2 are two promising biomarkers for prostate cancer diagnosis, prognosis, and treatment[10,51]. Metastatic melanoma patients had higher

Fig. 3.Most tissue-specific genes are down-regulated in cancer patients. Tissue-specific genes were selected based on cut-offs (H< 3 andQ< 6) and their expression patterns in each tissue were compared between tumors and normal controls. (a) U133A data set. (b) U133Plus2 data set.

Table 2

Up-regulated tissue-specific genes common to both U133Plus2 and U133A data sets.

id Symbol Tissue H(P2) Q(P2) log2fc(P2) FDR (P2)* H(A) Q(A) Log2fc(A) FDR (A)*

205476_at CCL20 Colon 2.96 5.2 1.45 0.0021 2.47 3.82 0.92 7.3E08 209125_at KRT6A Head_neck 2.27 4.91 0.68 0.723 2.42 5.03 0.01 0.983 213680_at KRT6B Head_neck 2.62 5.85 2.08 0.289 2.53 5.06 0.35 0.579 205380_at PDZK1 Liver 2.95 5.91 0.67 0.002 2.09 5.37 0.25 0.661 206067_s_at WT1 Ovary 2.88 5.77 0.69 0.570 2.75 5.69 0.48 0.733 204583_x_at KLK3 Prostate 2.16 2.72 1.74 7.6E-13 0.39 0.44 0.53 0.436 209854_s_at KLK2 Prostate 0.5 0.57 0.43 0.023 0.4 0.46 0.5 0.476 205860_x_at FOLH1 Prostate 1.64 4.61 0.71 0.003 2.9 4.13 1.34 0.002

*FDR: false discovery rate; in this table, FDR < 0.05 was not applied since too few genes are selected.

Fig. 4.Most highly up-regulated genes in tumors compared to normal controls are not good serum biomarkers. Each of top-50 genes identified in single tissue differential analysis are minimally up-regulated in multiple normal tissues corrected differential analysis. (a) U133A data set. (b) U133Plus2 data set.

Table 3

Summary of simulation study.

AUC MNTDA STDA + entropy STDA + ROKU 1.00 *0.661/0.844 0.350/0.712 0.374/0.720 0.95 0.452/0.628 0.316/0.553 0.335/0.561 0.90 0.393/0.525 0.302/0.476 0.321/0.485 0.85 0.358/0.448 0.302/0.423 0.317/0.430 0.80 0.338/0.399 0.294/0.384 0.306/0.388 0.75 0.330/0.376 0.287/0.361 0.299/0.365 0.70 0.323/0.356 0.281/0.342 0.292/0.345

(7)

serum MIA levels than patients with primary tumor[47]. The up-regulation of MMPs and CXCL cytokines in diverse types of cancers has been extensively investigated[5,8,51,53]. We listed relevant references for other genes not discussed here (Tables 5 and 6). However, we also found that many genes listed in Tables5and6

are located in the nucleus or cytoplasm, so they are not good can-didates for serum biomarkers[5].

4. Discussion

The development of convenient serum bioassays for cancer screening, diagnosis, prognosis, and monitoring of treatment is one of top priorities among cancer research community [39]. The widespread use of various high-throughput technologies has raised expectations that many useful serum bioassays will soon be developed from voluminous data generated from omics

technologies [25]. However, contrary to our expectation, the number of approved serum biomarkers has not increased

[26,31]. Although thousands of potential cancer biomarkers have

been generated from gene expression profiling and proteomics approaches, few of them have been successfully validated in the clinical samples.

Using a data set of 22,794 tumor and normal samples across 23 tissues, we analyzed current problems and challenges of serum biomarker discovery from gene expression data and found a few important points. First, studying only a specific tissue of interest is not enough for discovering serum biomarkers for the tissue. For example, comparing only tumor and normal colon tissues is not enough when one wants to discover novel serum biomarkers for colon cancer. InFig. S5, we showed that REG1A, significantly up-regulated in colon cancer patients compared to normal con-trols, is unlikely to be a successful serum biomarker as it is highly expressed in normal pancreas, small intestine, and stomach.

Fig. 5.Examples of simulated data sets in which a true biomarker was identified exclusively by multiple normal tissues corrected differential analysis (MNTDA).

Table 4

Conditions where true biomarkers were identified by MNTDA but missed by STDA combined with tissue specificity analysis in a simulation study.

y fc x(ts num) AUC Log2 (MNTFC) Log2 (STFC) E* Q* ROKU value ROKUtsnum

500 8 1 1 1.23 3.08 4.32 6.76 65.91 1 500 8 2 1 1.23 3.09 4.20 6.85 66.17 2 500 8 3 1 1.12 3.08 4.13 6.95 63.79 3 500 8 4 1 1.02 3.08 4.09 7.07 59.15 4 1000 8 1 1 1.71 3.08 3.96 5.64 83.40 1 1000 8 2 1 1.71 3.07 3.74 5.77 83.12 2 1000 8 3 1 1.46 3.07 3.65 5.96 79.89 3 1000 8 4 1 1.28 3.08 3.63 6.20 75.21 4 1000 8 5 1 1.14 3.08 3.65 6.40 69.26 5 2000 4 1 1 1.32 2.08 3.33 4.41 99.57 1 2000 4 2 1 1.31 2.08 3.10 4.70 99.19 2 2000 4 3 1 1.01 2.08 3.08 5.08 95.24 3 2000 8 1 1 2.16 3.08 3.33 4.40 100.13 1 2000 8 2 1 2.16 3.09 3.10 4.71 99.04 2 2000 8 3 1 1.74 3.08 3.08 5.08 94.78 3 2000 8 4 1 1.47 3.08 3.14 5.45 89.43 4 2000 8 5 1 1.28 3.07 3.22 5.77 82.84 5

(8)

Indeed, tissue specificity analysis indicated that REG1A was selec-tively expressed in pancreas (Shannon entropy 1.46 andQ-value 2.28), small intestine (Q-value 3.29), and stomach (Q-value 4.27), but not in colon (Q-value 13.0). After tissue specificity analysis, one would not select REG1A as a potential serum biomarker for co-lon cancer. We suggest that for most biomarkers identified from gene expression profiling, their pattern of expression in other nor-mal tissues should be analyzed (i.e. by one of tissue specificity analyses) before costly large-scale validation studies are initiated on those markers. We provide a web database named GENT (Gene Expression database of Normal and Tumor tissues; available at

http://medical-genome.kribb.re.kir/GENT/) to help users check

the pattern of expression of their interested genes across normal and tumor tissues[42].

To overcome the above-mentioned problem, we adopted two strategies – tissue specificity analysis and MNTDA – in our computa-tional approach to identify potential serum biomarkers from gene expression data. The importance of tissue-specificity analysis in ser-um biomarker discovery is well recognized among some research-ers. Klee et al. rightly pointed out that tissue-specificity analysis will help to prioritize serum biomarker candidates[26]. At first, we expected that many novel potential biomarkers would be discov-ered throughout tissues from our analysis. However, to our disap-pointment, few novel biomarkers were identified (Table 2). While some tens to hundreds of tissue-specific genes were identified from each tissue, most of them were down-regulated in cancer patients compared to controls (Fig. 3). This result is not surprising consider-ing that carcinogenesis is a kind of de-differentiation process in

Table 5

Top 25 genes with high fold change values on MNTDA in U133Plus2 data set.

Affymetrix id Symbol Log2(fc) FDR E Q Tissue Cellular location Refs. 241550_at DPPA5 3.37 0.043 4.49 9.63 Testis Soluble, non-secreted – 204475_at MMP1 2.07 2.7e13 3 9.73 Head_neck Secreted [45]

220184_at NANOG 2.07 0.045 4.42 7.81 Testis Nucleus [19–21]

206427_s_at MLANA 1.86 2.5e05 3.34 5.43 Skin Nucleus [44]

1555505_a_at TYR 1.78 1.8e05 4.14 6.74 Skin Membrane [2,16,44]

231474_at C6orf221 1.61 0.046 4.49 7.74 Testis – 205959_at MMP13 1.58 0.0002 4.54 9.28 Head_neck Secreted [8]

206426_at MLANA 1.45 0.0001 3.83 6.05 Skin Nucleus [44]

205828_at MMP3 1.40 2.5e07 4.46 8.72 Head_neck Extracellular

236121_at OR51E2 1.39 4.8e08 4.06 6.31 Prostate Membrane [48,50]

209848_s_at SILV 1.32 0.0001 3.78 5.96 Skin Melanosome [29,46]

205680_at MMP10 1.31 5.14e05 4.54 8.9 Head_neck Secreted [53]

217428_s_at COL10A1 1.30 0 4.59 8.89 Breast Cytoplasm 204914_s_at SOX1 1.26 0 4.17 6.75 Brain Nucleus 37892_at COL11A1 1.26 0 3.83 8.54 Breast Cytoplasm

206224_at CST1 1.24 0.0416 4.28 9.65 Esophagus Secreted –

212909_at LYPD1 1.24 0.0086 3.78 7.49 Ovary –

214612_x_at MAGE6 1.22 7.37e05 2.67 8.84 Skin

210265_x_at POU5F1P3 1.20 0.0369 4.35 8.2 Testis – 206630_at TYR 1.17 3.1e06 4.27 6.96 Skin Membrane [2,16,44]

220196_at MUC16 1.15 0.0021 2.31 10.14 Ovary Soluble, non-secreted [7,35,37]

221266_s_at TM7SF4 1.14 0.0003 4.34 7.59 Thyroid Membrane [28]

1553830_s_at MAGEA2 1.13 0.0009 3.92 8.85 Skin Secreted 240366_at GPR143 1.09 0.00023 4.27 9.26 Skin Membrane 217428_s_at COL10A1 1.08 5.87e08 4.59 8.3 Pancreas Cytoplasm

Table 6

Top 25 genes with high fold change values on MNTDA in U133A data set.

Affymetrix id Symbol Log2(fc) FDR E Q Tissue Cellular location Refs. 206427_s_at MLANA 1.542 5.8e19 4.08 6.27 Skin Nucleus [44]

204885_s_at MSLN 1.36 1.74e5 2.54 8.91 Ovary Secreted [9,37]

204475_at MMP1 1.35 1.45e5 3.57 10.81 Esophagus Secreted [17]

209848_s_at SILV 1.25 2.63e18 4.2 7.41 Skin Melanosome [29,46]

204583_x_at KLK3 1.24 8.77e13 2.16 2.72 Prostate Secreted [12]

37892_at COL11A1 1.23 0.0028 4.14 10.25 Pancreas Cytoplasm – 220184_at NANOG 1.17 0.0005 4.04 8.62 Testis Nucleus [19–21]

37892_at COL11A1 1.13 1.8e40 4.14 8.69 Breast Cytoplasm [11]

204914_s_at SOX11 1.05 4.5e17 3.71 8.38 Kidney Nucleus – 206426_at MLANA 1.02 3.5e14 4.1 6.81 Skin Nucleus [44]

206696_at GPR143 1.01 3.8e14 3.84 8.67 Skin Membrane

206630_at TYR 0.97 4.1e17 4.17 7.74 Skin Membrane [2,16,44]

210339_s_at KLK2 0.97 1.3e15 3.77 5.3 Prostate Secreted protein [12]

206418_at NOX1 0.92 0.0001 2.51 3.21 Colon Multipass membrane [15,27]

217404_s_at COL2A1 0.89 1.5e14 3.69 7.33 Kidney Cytoplasm – 212236_x_at KRT17 0.87 0.0194 3.63 7.59 Head_neck Soluble, non-secreted [52]

209395_at CHI3L1 0.87 0 3.56 7.05 Brain

220196_at MUC16 0.86 0.0012 2.5 9.64 Ovary Soluble, non-secreted [4,37]

206286_s_at TDGF1 0.82 9.13e5 4.06 9.17 Testis Membrane [3]

206560_s_at MIA 0.82 1.28e10 3.84 7.68 Skin Extracellular [32]

204913_s_at SOX11 0.78 1.5e20 4.23 7.68 Kidney Nucleus – 214974_x_at CXCL5 0.78 0.0025 4.24 9.71 Pancreas Extracellular [14]

204915_s_at SOX11 0.80 9.2e18 4.33 9.48 Kidney Nucleus – 204475_at MMP1 0.76 9.1e5 3.57 9.31 Colon Secreted – 217428_s_at COL10A1 0.75 8.2e43 3.81 8.51 Breast Cytoplasm –

(9)

which tissue-specificity of a cell is gradually lost. Thus, our second point is that novel serum biomarker is unlikely to be discovered through tissue-specificity analysis.

Then to overcome the limit of tissue-specificity analysis, we de-vised the MNTDA that might be useful for identifying potential ser-um biomarkers. The basic rationale of the MNTDA is to artificially simulate a circulation environment in which proteins are secreted from all tissues in the body. Through a simulation study, we found that, 32% of true biomarkers were identified by MNTDA but missed by STDA with tissue specificity analysis (Fig. 5, Table 3, and

Table S2). Those cases are characterized by moderate

tissue-spe-cific expression where the gene expression in a spetissue-spe-cific tissue is not overwhelmingly higher than gene expression in non-specific tissues. Indeed, many potential serum biomarkers in Tables5and

6 are cases where genes are moderately tissue-specific but are highly over-expressed in cancer patients compared to controls. In contrast, only 2.8% of true biomarkers were identified by STDA with tissue specificity analysis but not by MNTDA. Those cases are characterized by highly tissue-specific expression with modest over-expression in cancer patients compared to normal controls. Finally, 31% of true biomarkers were missed by both MNTDA and STDA with tissue specificity analysis. Those conditions are charac-terized by low expression across all normal tissues followed by high over-expression in cancer patients compared to normal con-trols; Alpha-feto protein (AFP) can be a representative example.

Of course, our MNTDA is naïve in several points. First, differ-ences in tissue sizes are ignored when we reproduce the circulation environment from gene expression data of normal tissues. Obvi-ously, tissues such as stomach, colon, and lung would secrete more proteins in the circulation than smaller tissues such as pancreas and bladder when a gene is expressed at the same level. Second, only 23 out of more than 50 normal tissues were included in our analysis. Third, the fact that tumors constitute only a minor portion of a diseased tissue in most cancer patients is ignored in our anal-ysis. Despite above-mentioned limits, our approach allows better selection of likely serum biomarkers from gene expression data. When we compared gene lists from STDA with those from MNTDA, we found that many significantly up-regulated genes (sometimes up to 50-fold increase) in STDA were minimally up-regulated in MNTDA since those genes were highly expressed in other normal tissues, too (Fig. 4). We insist that many omics-derived candidates that failed to pass clinical validation can be explained by the above-mentioned rationale of MNTDA. Thus, our method has a po-tential to improve the specificity of the process of identifying puta-tive biomarkers by filtering out many unpromising candidates.

Our MNTDA successfully identified two serum biomarkers in the clinic (KLK3 and MUC16) and several potential biomarkers cur-rently being evaluated (MSLN, KLK2, and OR51E2). Many other genes in Tables5and6were identified as potentially useful bio-markers in diverse cancer types, too. However, the number of potentially useful serum biomarkers is disappointingly small. In U133A data set, only 13 genes were over-expressed more than twofold in MNTDA, while in U133Plus2 data set, only 62 genes were over-expressed more than twofold. Because only about a half of them encode secreted or membrane proteins, the number of candidate serum biomarkers is again reduced. Our result can partly explain why there have been so few new serum biomarkers in re-cent years despite extensive omics studies[31]. Considering that there are only a handful of potential serum biomarkers quantita-tively different between cancer patients and normal controls, it might be better to identify novel serum biomarkers with qualita-tive differences such as post-translational modifications.

Although our study has focused only on gene expression data, our conclusions are equally applicable to other types of molecular markers. For example, because of the low abundance of potential biomarker proteins in the serum, many researchers have applied

two-step approaches in their efforts to identify novel serum pro-tein biomarkers [18]. They first identified plausible candidate markers by comparing tissues from cancer patients and normal controls, and then tried to confirm the candidate markers in the serum with few successes[18]. We suggest that the pattern of expression of those candidate markers in other tissues should be examined before validating those markers in the serum. As tis-sue-wide proteome expression databases are few, one may instead refer to gene-expression database to examine if candidate markers are abundantly expressed in other tissues.

Recently, DNA methylation markers have been extensively examined as a new source of cancer screening serum biomarkers. One of them, SEPT9, is being developed for blood-based colorectal cancer screening[30]. Again, most researchers first compare tumor and normal tissues to identify candidate DNA biomarkers and then try to validate their results in sera of patients. For the above-men-tioned reason, we think that many potential DNA biomarkers iden-tified from tissue analysis would not be useful when applied to serum samples. As a DNA methylation map across different human tissues is currently unavailable, examining the DNA methylation pattern of candidate markers across other tissues would not be easy. However, with next-generation sequencing machines such as Illumina’s Genome Analyzer and Applied Biosystems’ SOLiD, comparing DNA methylation patterns in sera of cancer patients and normal controls has become a routine work. Therefore, com-paring DNA methylation status in sera of cancer patients and con-trols by next-generation sequencers would be a promising approach to identify blood-based DNA methylation biomarkers for cancer screening.

Our work has another implication. As most molecular biomark-ers are diluted in the circulation, sampling molecular markbiomark-ers near the diseased tissue is better strategy for molecular diagnosis. For example, fecal DNA is better than serum DNA for colorectal cancer screening[36], sputum DNA is better than serum DNA for lung cancer screening[49], and urine specimen is normally used for bladder cancer screening. For tumor types in which local sampling of body fluids is available, applying omics technologies such as proteomics and next-generation sequencing to the body fluids would be better strategy to identify novel biomarkers.

Acknowledgments

This work was supported by the Basic Science Research gram (NRF2010-0008143) and Converging Research Center Pro-gram (2009-0081904) through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MOEST) and KRIBB research initiative grant. We thank anonymous reviewers for their constructive comments that help improve our manuscript significantly.

Authors’ contribution:

SYK designed the study. H.J., H.C.L., Y.S.J., and S.S.P. performed bioinformatics analyses. J.H., H.C.L., S.S.P. and S.Y.K. wrote the man-uscript. All authors read and approved the manman-uscript.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, atdoi:10.1016/j.jbi.2011.08.010.

References

[1] U.D.o.H.H.S.U.F.a.D. Administration., challenge and opportunity on the critical path of new medical products; 2004. <http://www.fda.gov/oc/initiatives/ criticalpath/whitepaper.html>.

[2] Andres R, Mayordomo JI, Visus C, Isla D, Godino J, Escudero P, et al. Prognostic significance and diagnostic value of protein S-100 and tyrosinase in patients with malignant melanoma. Am J Clin Oncol 2008;31:335–9.

(10)

[3] Baldassarre G, Romano A, Armenante F, Rambaldi M, Paoletti I, Sandomenico C, et al. Expression of teratocarcinoma-derived growth factor-1 (TDGF-1) in testis germ cell tumors and its effects on growth and differentiation of embryonal carcinoma cell line NTERA2/D1. Oncogene 1997;15:927–36.

[4] Bast Jr RC, Badgwell D, Lu Z, Marquez R, Rosen D, et al. New tumor markers: CA125 and beyond. Int J Gynecol Cancer 2005;15(Suppl 3):274–81. [5] Chang HJ, Yang MJ, Yang YH, Hou MF, Hsueh EJ, Lin SR. MMP13 is potentially

a new tumor marker for breast cancer diagnosis. Oncol Rep 2009;22:1119–27. [6] Chang KP, Hao SP, Chang JH, Wu CC, Tsang NM, Lee YS, et al. Macrophage inflammatory protein-3alpha is a novel serum marker for nasopharyngeal carcinoma detection and prediction of treatment outcomes. Clin Cancer Res 2008;14:6979–87.

[7] Chauhan SC, Singh AP, Ruiz F, Johansson SL, Jain M, Smith LM, et al. Aberrant expression of MUC4 in ovarian carcinoma: diagnostic significance alone and in combination with MUC1 and MUC16 (CA125). Mod Pathol 2006;19:1386–94.

[8] Culhaci N, Metin K, Copcu E, Dikicioglu E. Elevated expression of MMP-13 and TIMP-1 in head and neck squamous cell carcinomas may reflect increased tumor invasiveness. BMC Cancer 2004;4:42.

[9] Dainty LA, Risinger JI, Morrison C, Chandramouli GV, Bidus MA, Zahn C, et al. Overexpression of folate binding protein and mesothelin are associated with uterine serous carcinoma. Gynecol Oncol 2007;105:563–70.

[10] Edwards S, Campbell C, Flohr P, Shipley J, Giddings I, Te-Poele R, et al. Expression analysis onto microarrays of randomly selected cDNA clones highlights HOXB13 as a marker of human prostate cancer. Br J Cancer 2005;92:376–81.

[11] Ellsworth RE, Seebach J, Field LA, Heckman C, Kane J, Hooke JA, et al. A gene expression signature that defines breast cancer metastases. Clin Exp Metastasis 2009;26:205–13.

[12] Emami N, Diamandis EP. Utility of kallikrein-related peptidases (KLKs) as cancer biomarkers. Clin Chem 2008;54:1600–7.

[13] Eschrich SA, Hoerter AM. Libaffy: software for processing Affymetrix GeneChip data. Bioinformatics 2007;23:1562–4.

[14] Frick VO, Rubie C, Wagner M, Graeber S, Grimm H, Kopp B, et al. Enhanced ENA-78 and IL-8 expression in patients with malignant pancreatic diseases. Pancreatology 2008;8:488–97.

[15] Fukuyama M, Rokutan K, Sano T, Miyake H, Shimada M, Tashiro S, et al. Overexpression of a novel superoxide-producing enzyme, NADPH oxidase 1, in adenoma and well differentiated adenocarcinoma of the human colon. Cancer Lett 2005;221:97–104.

[16] Glumac N, Snoj M, Hocevar M, Novakovic S. Prognostic significance of tyrosinase mRNA detected by nested RT–PCR in patients with malignant melanoma. Neoplasma 2006;53:9–14.

[17] Gu ZD, Li JY, Li M, Gu J, Shi XT, Ke Y, et al. Matrix metalloproteinases expression correlates with survival in patients with esophageal squamous cell carcinoma. Am J Gastroenterol 2005;100:1835–43.

[18] Hanash SM, Pitteri SJ, Faca VM. Mining the plasma proteome for cancer biomarkers. Nature 2008;452:571–9.

[19] Hart AH, Hartley L, Parker K, Ibrahim M, Looijenga LH, Pauchnik M, et al. The pluripotency homeobox gene NANOG is expressed in human germ cell tumors. Cancer 2005;104:2092–8.

[20] Hoei-Hansen CE. Application of stem cell markers in search for neoplastic germ cells in dysgenetic gonads, extragonadal tumours, and in semen of infertile men. Cancer Treat Rev 2008;34:348–67.

[21] Hoei-Hansen CE, Carlsen E, Jorgensen N, Leffers H, Skakkebaek NE, Rajpert-De Meyts E. Towards a non-invasive method for early detection of testicular neoplasia in semen samples by identification of fetal germ cell-specific markers. Hum Reprod 2007;22:167–73.

[22] Hsiao LL, Dangond F, Yoshida T, Hong R, Jensen RV, Misra J, et al. A compendium of gene expression in normal human tissues. Physiol Genomics 2001;7:97–104.

[23] Kadota K, Nishimura S, Bono H, Nakamura S, Hayashizaki Y, Okazaki Y, et al. Detection of genes with tissue-specific expression patterns using Akaike’s information criterion procedure. Physiol Genomics 2003;12:251–9. [24] Kadota K, Ye J, Nakai Y, Terada T, Shimizu K. ROKU: a novel method for

identification of tissue-specific genes. BMC Bioinform 2006;7:294.

[25] Klee EW. Data mining for biomarker development: a review of tissue specificity analysis. Clin Lab Med 2008;28:127–43 (vii).

[26] Klee EW, Finlay JA, McDonald C, Attewell JR, Hebrink D, Dyer R, et al. Bioinformatics methods for prioritizing serum biomarker candidates. Clin Chem 2006;52:2162–4.

[27] Laurent E, McCoy III JW, Macina RA, Liu W, Cheng G, et al. Nox1 is over-expressed in human colon cancers and correlates with activating mutations in K-Ras. Int J Cancer 2008;123:100–7.

[28] Lee KY, Huang SM, Li S, Kim JM. Identification of differentially expressed genes in papillary thyroid cancers. Yonsei Med J 2009;50:60–7.

[29] Lewis TB, Robison JE, Bastien R, Milash B, Boucher K, Samlowski WE, et al. Molecular classification of melanoma using real-time quantitative reverse transcriptase-polymerase chain reaction. Cancer 2005;104:1678–86. [30] Lofton-Day C, Model F, Devos T, Tetzner R, Distler J, Schuster M, et al. DNA

methylation biomarkers for blood-based colorectal cancer screening. Clin Chem 2008;54:414–23.

[31] Ludwig JA, Weinstein JN. Biomarkers in cancer staging, prognosis and treatment selection. Nat Rev Cancer 2005;5:845–56.

[32] Lugovic L, Situm M, Buljan M, Poduje S, Sebetic K. Results of the determination of serum markers in patients with malignant melanoma. Coll Antropol 2007;31(Suppl. 1):7–11.

[33] Lukk M, Kapushesky M, Nikkila J, Parkinson H, Goncalves A, Huber W, et al. A global map of human gene expression. Nat Biotechnol 2010;28:322–4. [34] Manne U, Srivastava RG, Srivastava S. Recent advances in biomarkers for

cancer diagnosis and treatment. Drug Discov Today 2005;10:965–76. [35] McLemore MR, Aouizerat B. Introducing the MUC16 gene: implications for

prevention and early detection in epithelial ovarian cancer. Biol Res Nurs 2005;6:262–7.

[36] Nagasaka T, Tanaka N, Cullings HM, Sun DS, Sasamoto H, Uchida T, et al. Analysis of fecal DNA methylation to detect gastrointestinal neoplasia. J Natl Cancer Inst 2009;101:1244–58.

[37] Palmer C, Duan X, Hawley S, Scholler N, Thorpe JD, Sahota RA, et al. Systematic evaluation of candidate blood markers for detecting ovarian cancer. PLoS One 2008;3:e2633.

[38] Saito-Hisaminato A, Katagiri T, Kakiuchi S, Nakamura T, Tsunoda T, Nakamura Y. Genome-wide profiling of gene expression in 29 normal human tissues with a cDNA microarray. DNA Res 2002;9:35–45.

[39] Sawyers CL. The cancer biomarker problem. Nature 2008;452:548–52. [40] Schug J, Schuller WP, Kappen C, Salbaum JM, Bucan M, Stoeckert Jr CJ, et al.

Promoter features related to tissue specificity as measured by Shannon entropy. Genome Biol 2005;6:R33.

[41] Shi L, Tong W, Fang H, Scherf U, Han J, Puri RK, et al. Cross-platform comparability of microarray technology: intra-platform consistency and appropriate data analysis procedures are essential. BMC Bioinform 2005;6(Suppl. 2):S12.

[42] Shin G, Kang TW, Yang S, Baek SJ, Jeong YS, Kim SY. GENT: gene expression database of normal and tumor tissues. Cancer Inform 2011;10:149–57. [43] Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier

performance in R. Bioinformatics 2005;21:3940–1.

[44] Soikkeli J, Lukk M, Nummela P, Virolainen S, Jahkola T, Katainen R, et al. Systematic search for the best gene expression markers for melanoma micrometastasis detection. J Pathol 2007;213:180–9.

[45] Vachani A, Nebozhyn M, Singhal S, Alila L, Wakeam E, Muschel R, et al. A 10-gene classifier for distinguishing head and neck squamous cell carcinoma and lung squamous cell carcinoma. Clin Cancer Res 2007;13:2905–15.

[46] Virador V, Matsunaga N, Matsunaga J, Valencia J, Oldham RJ, Kameyama K, et al. Production of melanocyte-specific antibodies to human melanosomal proteins: expression patterns in normal human skin and in cutaneous pigmented lesions. Pigment Cell Res 2001;14:289–97.

[47] Vucetic B, Rogan SA, Hrabac P, Hudorovic N, Cupic H, Lukinac L, et al. Biological value of melanoma inhibitory activity serum concentration in patients with primary skin melanoma. Melanoma Res 2008;18:201–7.

[48] Wang J, Weng J, Cai Y, Penland R, Liu M, Ittmann M. The prostate-specific G-protein coupled receptors PSGR and PSGR2 are prostate cancer biomarkers that are complementary to alpha-methylacyl-CoA racemase. Prostate 2006;66:847–57.

[49] Wang YC, Hsu HS, Chen TP, Chen JT. Molecular diagnostic markers for lung cancer in sputum and plasma. Ann NY Acad Sci 2006;1075:179–84. [50] Xu LL, Sun C, Petrovics G, Makarem M, Furusato B, Zhang W, et al. Quantitative

expression profile of PSGR in prostate cancer. Prostate Cancer Prostatic Dis 2006;9:56–61.

[51] Yang Y, Pospisil P, Iyer LK, Adelstein SJ, Kassis AI. Integrative genomic data mining for discovery of potential blood-borne biomarkers for early diagnosis of cancer. PLoS One 2008;3:e3661.

[52] Ye H, Yu T, Temam S, Ziober BL, Wang J, Schwartz JL, et al. Transcriptomic dissection of tongue squamous cell carcinoma. BMC Genomics 2008;9:69. [53] Yen CY, Chen CH, Chang CH, Tseng HF, Liu SY, Chuang LY, et al. Matrix

metalloproteinases (MMP) 1 and MMP10 but not MMP12 are potential oral cancer markers. Biomarkers 2009;14:244–9.

[54] Zhang Z, Bast Jr RC, Yu Y, Li J, Sokoll LJ, et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004;64:5882–90.

Figure

Fig. 1. Two strategies for serum cancer biomarker discovery. (a) Tissue-specificity analysis
Fig. 4. Most highly up-regulated genes in tumors compared to normal controls are not good serum biomarkers
Fig. 5. Examples of simulated data sets in which a true biomarker was identified exclusively by multiple normal tissues corrected differential analysis (MNTDA).

References

Related documents

of functions on the lattice are transparently expressed in terms of the operator algebra, and, applied to the information measures, can be used to derive a

The Organizationally variables causing stress includes various sub factors under the road headings of Organizational Characteristics (OC), Work Characteristics (WC), stress

Other individual lifestyle factors including no current smoking, no current drinking, no overweight or obesity and regular physical activ- ity were associated with a lower risk of RAR

Cone argued that his situation fell within the second Crrc exception to Strid&amp;mzl In support of this argument, Cone pointed to his attorney's failure to

Before 1982 sovereign debtors regularly defaulted on their debts. Since the debt crisis that commenced in that year, sovereign defaults have been rare and usually

In this way, WTO panels would recognize that not all regulatory mea- sures are protectionist and would impose some criteria for assessing the discriminatory effects

The high density of station 3 was affected by location of this station near oil palm plantation and had many small rivers that coming from oil palm

Please cite this article as: Norton, Tomas, Piette, Deborah, Exadaktylos, Vasileios, Berckmans, Daniel, Automated real-time stress monitoring of police horses using