
Part I Bayesian integrative genomics

3.1 Piecewise constant estimation: the mBPCR method

3.1.9 Application to real data

In this subsection, we show how mBPCR performed on real data compared with other piecewise constant estimation methods. We used samples from three mantle cell lymphoma cell lines (JEKO-1, GRANTA-519, REC-1) that we had previously analyzed with the Affymetrix GeneChip Mapping 10K Array (Affymetrix, Santa Clara, CA) [69]. We also used data obtained on JEKO-1 with the higher-density Affymetrix GeneChip Mapping 250K Nsp Array (unpublished). We considered eight gene regions of recurrent aberration in lymphoma plus two other gene regions (BIRC3 and LAMP1), and we compared the copy numbers estimated by the piecewise constant methods with those obtained by the FISH technique in [69]. Lastly, we compare the estimated profiles of chromosome 11 of JEKO-1.

Gene copy number estimation

Knowledge of the true underlying profile is required to evaluate the methods properly. In general, large chromosomal aberrations can be detected with conventional karyotype analysis or with FISH, and one could use this information for the evaluation. However, these aberrations are so wide that all the methods detect them well, which makes the comparison uninformative. For this reason, we compared the piecewise constant methods only on the copy numbers of genes.

In the comparison, as previously published [69], when two FISH copy numbers were assigned to one gene, the first number corresponds to the copy number detected in the majority of the cells. We assigned two estimated copy numbers to one gene when the gene lies between two SNPs and the method assigned different values to these two SNPs.
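The assignment rule above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `gene_copy_numbers`, the representation of a gene by a single position, and the toy positions and values are all assumptions made here.

```python
from bisect import bisect_left


def gene_copy_numbers(gene_pos, snp_pos, snp_cn):
    """Return the estimated copy number(s) assigned to a gene.

    gene_pos: position of the gene (assumption: a single coordinate).
    snp_pos:  sorted SNP positions on the same chromosome.
    snp_cn:   piecewise constant copy number estimate at each SNP.
    """
    i = bisect_left(snp_pos, gene_pos)
    # The gene coincides with a SNP: use the estimate at that SNP.
    if i < len(snp_pos) and snp_pos[i] == gene_pos:
        return (snp_cn[i],)
    # The gene lies before the first or after the last SNP.
    if i == 0:
        return (snp_cn[0],)
    if i == len(snp_pos):
        return (snp_cn[-1],)
    # The gene lies between two SNPs: report two values only if they differ,
    # i.e. if the method placed a breakpoint between the flanking SNPs.
    left, right = snp_cn[i - 1], snp_cn[i]
    return (left,) if left == right else (left, right)


# Toy example (hypothetical positions and estimates).
positions = [100, 200, 300]
estimates = [2.0, 2.0, 1.5]
print(gene_copy_numbers(150, positions, estimates))  # one value: no breakpoint
print(gene_copy_numbers(250, positions, estimates))  # two values: breakpoint
```

A gene reported with two values, such as an entry of the form a/b in the tables, then signals that the method estimated a breakpoint between the two flanking SNPs.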

The results on REC-1 (Table 3.11) did not show any significant difference among the methods. In contrast, those on GRANTA-519 (Table 3.12) showed that GLAD was unable to detect the true copy number in five cases, while HMM, BioHMM and Rendersome detected an amplification on MALT1 greater than that detected by FISH analysis. No method detected the true copy number of ATM, probably because the SNPs around ATM are far from the corresponding FISH region (about 1 Mb) and the deletion affects only this region. Only mBPCR with $\hat{\rho}_1^2$ and HMM detected a breakpoint between the two SNPs around the ATM region, indicating a copy number change.

Table 3.11 Copy number estimation results obtained on sample REC-1. Globally, all methods behaved equally well on this data. Only Rendersome was unable to detect the correct copy number of D13S319. [Reprinted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

| Gene region | FISH CN | mBPCR ($\hat{\rho}^2$) | mBPCR ($\hat{\rho}_1^2$) | CBS | CGHseg | HMM | GLAD | BioHMM | Rendersome |
|---|---|---|---|---|---|---|---|---|---|
| BCL6 | 2/3 | 2.89 | 2.85 | 2.89 | 2.90 | 2.05 | 2.79 | 2.85 | 2.86 |
| C-MYC | 2 | 2.02 | 1.99 | 2.06 | 2.12 | 2.05 | 2.07 | 2.02 | 2.02 |
| CCND1 | 2 | 2.01 | 1.92 | 2.01 | 2.05 | 2.05 | 2.07 | 2.02 | 2.02 |
| BIRC3 | 2/3 | 2.01 | 2.22 | 2.01 | 2.05 | 2.05 | 2.07 | 2.02 | 2.02 |
| ATM | 2 | 2.01 | 1.80 | 2.01 | 2.05 | 2.05 | 2.07 | 2.02 | 2.02 |
| D13S319 | 2 | 2.03 | 2.40 | 1.98 | 2.03 | 2.05 | 2.01 | 2.02 | 2.89 |
| LAMP1 | 2 | 1.82 | 1.76 | 1.98 | 1.98 | 2.05 | 2.01 | 2.02 | 2.02 |
| TP53 | 1/2 | 1.11 | 1.17 | 1.11 | 1.11 | 1.10 | 1.55 | 1.10 | 1.20 |
| MALT1 | 2/3 | 2.12 | 2.25 | 2.02 | 2.12 | 2.05 | 2.09 | 2.02 | 2.02 |
| BCL2 | 2 | 2.12 | 2.25 | 2.02 | 2.12 | 2.05 | 2.09 | 2.02 | 2.02 |

Table 3.12 Copy number estimation results obtained on sample GRANTA-519. On these data, the GLAD method often did not detect the correct gene copy number. The method mBPCR with $\hat{\rho}_1^2$ always estimated the gene copy number well, apart from ATM, whose copy number was estimated differently from the FISH copy number by all methods. [Reprinted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

| Gene region | FISH CN | mBPCR ($\hat{\rho}^2$) | mBPCR ($\hat{\rho}_1^2$) | CBS | CGHseg | HMM | GLAD | BioHMM | Rendersome |
|---|---|---|---|---|---|---|---|---|---|
| BCL6 | 2 | 2.11 | 2.10 | 2.11 | 2.07 | 2.11 | 1.85 | 2.12 | 2.04 |
| C-MYC | 2 | 2.07 | 2.00 | 2.08 | 2.11 | 1.99 | 6.22/1.37 | 2.12 | 2.04 |
| CCND1 | 2 | 2.06 | 2.03 | 2.06 | 2.34 | 2.20 | 2.4 | 2.12 | 2.04 |
| BIRC3 | 2/3 | 2.06 | 1.76 | 2.06 | 1.14/2.34 | 2.11 | 2.4 | 2.12 | 2.04 |
| ATM | 1 | 2.06 | 2.01/1.61 | 2.06 | 2.34 | 2.11/1.14 | 2.4 | 2.12 | 2.04 |
| D13S319 | 2 | 2.01 | 2.00 | 2.03 | 2.05 | 2.07 | 2.26 | 2.12 | 2.04 |
| LAMP1 | 2 | 2.01 | 2.00 | 2.03 | 2.05 | 1.98 | 1.58 | 2.12 | 2.04 |
| TP53 | 1 | 1.10 | 1.16 | 1.13 | 1.01 | 1.13 | 1.85 | 1.36 | 1.08 |
| MALT1 | 3 | 3.36 | 3.05 | 3.17 | 3.04 | 5.30 | 2.16 | 4.78 | 4.28 |
| BCL2 | ampl | 5.46 | 5.10 | 5.52 | 6.12 | 5.30 | 2.16 | 4.78 | 7.22 |


Table 3.13 Copy number estimation results obtained on 250K Array data of sample JEKO-1. All methods behaved equally well. The HMM method had a problem in determining the right position of one breakpoint around the C-MYC amplification. All methods estimated the copy number of CCND1 differently from the FISH technique. [Reprinted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

| Gene region | FISH CN | mBPCR ($\hat{\rho}^2$) | mBPCR ($\hat{\rho}_1^2$) | CBS | CGHseg | HMM | GLAD | BioHMM | Rendersome |
|---|---|---|---|---|---|---|---|---|---|
| BCL6 | 3/2 | 3.06 | 3.06 | 3.02 | 3.06 | 2.96 | 2.98 | 3.04 | 2.98 |
| C-MYC | ampl | 7.12 | 7.10 | 7.14 | 6.87 | 6.70/2.63 | 7.28 | 6.51 | 7.72 |
| CCND1 | 2 | 3.51 | 3.51 | 3.51 | 3.51 | 3.44 | 3.51 | 3.52 | 3.60 |
| BIRC3 | 4/5 | 4.20 | 4.19 | 4.20 | 4.24 | 4.24 | 4.23 | 4.26 | 3.60 |
| ATM | 4 | 4.20 | 4.19 | 4.20 | 4.24 | 4.24 | 4.23 | 4.26 | 3.60 |
| D13S319 | 4 | 3.72 | 3.72 | 3.81 | 3.82 | 3.72 | 3.81 | 3.73 | 3.64 |
| LAMP1 | 4 | 3.67 | 3.67 | 3.82 | 3.67 | 3.72 | 3.69 | 3.73 | 3.64 |
| TP53 | 2/3 | 2.57 | 2.69 | 2.22 | 2.76 | 2.34 | 2.76 | 2.83 | 2.90 |
| MALT1 | 4 | 3.52 | 3.52 | 3.59 | 3.59 | 3.50 | 3.55 | 3.52 | 3.50 |
| BCL2 | 4 | 3.52 | 3.52 | 3.59 | 3.59 | 3.50 | 3.55 | 3.52 | 3.50 |

Regarding the JEKO-1 data, since the cell line is triploid, we centered the estimated $\log_2$ratio around $\log_2 3$ to obtain more realistic copy number values. With the denser 250K Array data, all methods behaved equally well. Only HMM had a problem in detecting the breakpoint corresponding to the C-MYC amplification (see Table 3.13). On both arrays, all methods identified a gain (copy number 3 or 4) at the CCND1 position, while the copy number detected by FISH was 2. This discrepancy cannot be explained as it was for ATM, because this region is well covered by SNPs. On the JEKO-1 10K Array data (Table 3.14), the noisiest among all samples, there are several cases in which CBS, HMM and GLAD did not detect the gene copy number correctly (for example, BCL2 and MALT1). This occurred more frequently with BioHMM and Rendersome, and only once with CGHseg (LAMP1). The method mBPCR with $\hat{\rho}_1^2$ correctly estimated the gene copy numbers, apart from CCND1.
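The centering of the JEKO-1 $\log_2$ratio values around $\log_2 3$ can be illustrated with a short sketch. The text does not give the exact conversion formula, so the following is an assumption: the estimated $\log_2$ratio is taken to be zero at the baseline ploidy, it is shifted by $\log_2$ of that ploidy, and the copy number is obtained as 2 raised to the shifted value. The function name and the `ploidy` parameter are hypothetical.

```python
import math


def log2ratio_to_copy_number(log2ratio, ploidy=3):
    """Convert an estimated log2ratio into a copy number.

    Assumption: a log2ratio of 0 corresponds to the baseline ploidy
    (3 for a triploid cell line such as JEKO-1, 2 for a diploid one).
    """
    # Center the log2ratio around log2(ploidy), then go back to linear scale.
    centered = log2ratio + math.log2(ploidy)
    return 2.0 ** centered


# A segment at the baseline level maps to the ploidy itself (about 3),
# and a segment one third above the baseline maps to about 4 copies.
print(log2ratio_to_copy_number(0.0))
print(log2ratio_to_copy_number(math.log2(4 / 3)))
```

Under the same assumption, setting `ploidy=2` gives the usual diploid conversion, in which a $\log_2$ratio of 0 corresponds to two copies.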

Table 3.14 Copy number estimation results obtained on 10K Array data of sample JEKO-1. On these noisy data, BioHMM and Rendersome often estimated the gene copy number wrongly, while this occurred only sometimes with CBS, HMM and GLAD. The method mBPCR with $\hat{\rho}_1^2$ correctly estimated the gene copy numbers, apart from CCND1, whose copy number was estimated by all methods differently from the FISH technique. [Adapted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

| Gene region | FISH CN | mBPCR ($\hat{\rho}^2$) | mBPCR ($\hat{\rho}_1^2$) | CBS | CGHseg | HMM | GLAD | BioHMM | Rendersome |
|---|---|---|---|---|---|---|---|---|---|
| BCL6 | 3/2 | 2.97 | 2.99 | 2.97 | 2.90 | 2.92 | 2.92 | 3.14 | 2.92 |
| C-MYC | ampl | 12.11 | 9.35 | 10.27 | 10.27 | 13.95 | 9.82 | 8.26 | 13.10/3.11 |
| CCND1 | 2 | 4.08 | 3.77 | 4.08 | 4.08 | 3.84 | 3.79 | 3.14 | 3.50 |
| BIRC3 | 4/5 | 4.08 | 4.29 | 4.08 | 4.08 | 3.84 | 3.79 | 3.14 | 3.50 |
| ATM | 4 | 4.08 | 4.29 | 4.08 | 4.08 | 3.84 | 3.79 | 3.14 | 3.50/2.39 |
| D13S319 | 4 | 3.72 | 3.59 | 3.57 | 3.72 | 3.62 | 3.58 | 3.14 | 3.43 |
| LAMP1 | 4 | 3.41 | 3.82 | 3.41 | 3.41 | 3.62 | 2.49 | 3.14 | 3.43 |
| TP53 | 2/3 | 2.81 | 3.00 | 2.83 | 2.50 | 3.52 | 2.93 | 3.14 | 2.93 |
| MALT1 | 4 | 3.63 | 3.62 | 3.48 | 3.64 | 3.42 | 3.42 | 3.14 | 3.42 |
| BCL2 | 4 | 3.63 | 3.62 | 3.48 | 3.64 | 3.42 | 3.42 | 3.14 | 3.42 |

Profile estimation

To compare the profile estimations, we chose the sample JEKO-1 because, using the results obtained on both types of array, we could at least understand which regions were more realistically estimated. To date, validated whole-chromosome profiles are not available. Among all chromosomes, we chose chromosome 11 since three of the previous genes lie on it: CCND1 (around 69.17 Mb), BIRC3 (around 101.7 Mb) and ATM (around 107.6 Mb).

From the graphs in Figure 3.14 we can observe that, among all the piecewise constant methods, only mBPCR with $\hat{\rho}_1^2$ was able to detect the high amplification after position 110 Mb on the 10K Array data, while it was recognized by all methods (apart from BioHMM) on the 250K Array data. Moreover, on the 10K Array data, almost all methods detected a false deletion around position 3 Mb, due to the presence of a sequence of
