

Part I Bayesian integrative genomics

3.2 Estimation with a continuous curve: the mBRC and BRCAk methods

3.2.3 Comparison with other smoothing methods

[Figure 3.17: four panels of RMSE per probe against position (0 to 200), titled by the parameter pairs (1.2, 0.5), (0.5, 0.1), (0.5, 0.05) and (0.3, 0.05). Legend: Regr.CurveAk with three different choices of $\rho^2$.]

Fig. 3.17 RMSE per probe of BRCAk using different estimators of $\rho^2$. The corresponding true profiles are in Figure 3.2. The graphs do not show clearly which estimator of $\rho^2$ is better with respect to this error measure. As in Figure 3.16, the error obtained with the estimated $\rho^2$ is sometimes lower than with the true value of $\rho^2$, probably because of overfitting.

We compared the several versions of the Bayesian regression curves with methods which estimate the copy number as a continuous curve: lowess, wavelet [28], quantreg [18] and smoothseg [29]. Lowess stands for "locally weighted scatterplot smoothing" (implemented in the stats package of R) and is one of the methods considered in the comparison performed in [41]. As we saw previously, both the mBRC and the BRCAk perform well, so we tested both versions with both estimators of $\rho^2$.
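To make the idea behind lowess concrete, the following is a minimal Python sketch of locally weighted linear smoothing with a tricube kernel. It is not the R `lowess` routine used in the comparison: it omits the robustness iterations, and the function names, the `frac` parameter and the synthetic profile are assumptions introduced here for illustration only.

```python
import random


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, zero elsewhere."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_sketch(x, y, frac=0.3):
    """Fit a weighted straight line around every x[i] and return the fitted values.

    Simplified sketch: a single pass, no robustness iterations.
    """
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours defining the bandwidth
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 itself included).
        dists = sorted(abs(xj - x0) for xj in x)
        h = dists[k - 1] or 1.0
        w = [tricube((xj - x0) / h) for xj in x]
        # Weighted least squares for the line a + b * (x - x0).
        sw = sum(w)
        swx = sum(wi * (xj - x0) for wi, xj in zip(w, x))
        swy = sum(wi * yj for wi, yj in zip(w, y))
        swxx = sum(wi * (xj - x0) ** 2 for wi, xj in zip(w, x))
        swxy = sum(wi * (xj - x0) * yj for wi, xj, yj in zip(w, x, y))
        det = sw * swxx - swx ** 2
        if abs(det) < 1e-12:
            fitted.append(swy / sw)  # degenerate case: fall back to the weighted mean
        else:
            fitted.append((swxx * swy - swx * swxy) / det)  # intercept a = fit at x0
    return fitted


# Hypothetical example: a noisy profile with one gain between probes 80 and 119.
random.seed(0)
positions = list(range(200))
truth = [1.0 if 80 <= p < 120 else 0.0 for p in positions]
noisy = [t + random.gauss(0.0, 0.3) for t in truth]
smoothed = lowess_sketch(positions, noisy, frac=0.1)
```

Because every local fit is a straight line, the sketch reproduces exactly linear data; on a profile with jumps it rounds off the breakpoints, which is one reason why continuous smoothers can misestimate the levels near the borders of an aberration.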

[Figure 3.18: four panels of RMSE per probe against position (0 to 200), titled by the parameter pairs (1.2, 0.5), (0.5, 0.1), (0.5, 0.05) and (0.3, 0.05). Legend: Regr.Curve with $\hat{\rho}_1^2$ and $K_{01}$, $K_1$ or $K_2$; Regr.CurveAk with $\hat{\rho}_1^2$.]

Fig. 3.18 RMSE per probe of the several Bayesian regression curves, using $\hat{\rho}_1^2$ as the estimator of $\rho^2$, on four datasets with replicates. The corresponding true profiles are in Figure 3.2. Using the estimator $\hat{\rho}_1^2$, all the regression curves gave a similar RMSE per probe curve. [Reprinted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

To assess the performance of the methods, we considered collections of artificial datasets already used in the comparison of the piecewise constant methods (see Subsection 3.1.7): Cases and Four aberrations (datasets SNR = 3 and SNR = 1). As before, the error measures considered were the RMSE (for both collections) and the ROC curve (for the latter). However, since some of the error measures of [84] assume that the estimated profile is piecewise constant, we did not apply this group of methods to the dataset Simulated Chromosomes. Figure 3.20 shows the profiles of three examples of data in the collection Cases estimated by these smoothing methods (the true profiles are in Figure 3.1). Figure 3.21 displays examples of estimated profiles of data in the collection Four aberrations.
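The two error measures can be sketched in a few lines of Python. The exact definitions are not restated in this subsection, so the sketch rests on assumptions: the RMSE per probe is taken as the root of the squared error averaged, at each probe position, over the samples of a collection, and a ROC point is obtained by calling aberrant every probe whose estimated value exceeds a threshold (aberrations are treated as gains). All function names are hypothetical.

```python
import math


def rmse_per_probe(estimates, truths):
    """RMSE at each probe position, averaged over the samples of a collection.

    `estimates` and `truths` are lists of profiles of equal length,
    one estimated and one true profile per sample.
    """
    n_probes = len(truths[0])
    result = []
    for i in range(n_probes):
        squared = [(est[i] - tru[i]) ** 2 for est, tru in zip(estimates, truths)]
        result.append(math.sqrt(sum(squared) / len(squared)))
    return result


def roc_point(estimate, in_aberration, threshold):
    """Return (FPR, TPR) for one threshold on the estimated profile."""
    inside = [e for e, a in zip(estimate, in_aberration) if a]
    outside = [e for e, a in zip(estimate, in_aberration) if not a]
    tpr = sum(e > threshold for e in inside) / len(inside)
    fpr = sum(e > threshold for e in outside) / len(outside)
    return fpr, tpr


def roc_curve(estimate, in_aberration, thresholds):
    """ROC curve as a list of (FPR, TPR) points, one per threshold."""
    return [roc_point(estimate, in_aberration, t) for t in thresholds]
```

Sweeping the threshold from high to low values traces the curve from (0, 0) towards (1, 1).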

[Figure 3.19: four panels of RMSE per probe against position (0 to 200), titled by the parameter pairs (1.2, 0.5), (0.5, 0.1), (0.5, 0.05) and (0.3, 0.05). Legend: Regr.Curve with $\hat{\rho}^2$ and $K_{01}$, $K_1$ or $K_2$; Regr.CurveAk with $\hat{\rho}^2$.]

Fig. 3.19 RMSE per probe of the several Bayesian regression curves, using $\hat{\rho}^2$ as the estimator of $\rho^2$, on four datasets with replicates. The corresponding true profiles are in Figure 3.2. The graphs show that, using $\hat{\rho}^2$, BRCAk always had the lowest RMSE per probe and thus performed better than the other BRCs. [Reprinted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

Results

In general, we found that all methods detected the regions of aberration quite well (see, for example, Figures 3.23 and 3.25). The wavelet method showed a higher error in the level estimation of the aberrations in the datasets SNR = 3 and SNR = 1 (Figures 3.23 and 3.25). The lowess and quantreg methods had the highest RMSE on the collection Cases, while on the datasets with SNR = 3 and SNR = 1 their error did not differ significantly between the inside and the outside of the aberrations. Consequently, on these latter datasets their error was low inside the aberrations and high outside them, in comparison with the other methods. The smoothseg method behaved similarly, but with a lower error.

[Figure 3.20: three panels titled "easy case", "medium case" and "difficult case", plotting the log2 ratio against position (0 to 200). Legend: original BRC, wavelet, BRC with $K_2$ and $\hat{\rho}^2$ ($\hat{\rho}_1^2$ in the difficult case), quantreg, lowess, smoothseg.]

Fig. 3.20 Estimated profiles of the data shown in Figure 3.1, obtained by applying some smoothing methods. In each plot, the grey segments represent the true profile and the dots are the raw data points. [Adapted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

[Figure 3.21: four panels plotting the log2 ratio against position (0 to 400), two for SNR = 3 and two for SNR = 1. Legend: original BRC, lowess, quantreg, wavelet, smoothseg, BRC with $K_2$ and $\hat{\rho}^2$ (SNR = 3), BRC with $K_2$ and $\hat{\rho}_1^2$ and BRCAk with $\hat{\rho}_1^2$ (SNR = 1).]

Fig. 3.21 The plots show the differences in level estimation among the smoothing methods on samples with SNR = 3 and SNR = 1: some oscillate more in the regions outside the aberrations. With high noise, the more the profiles oscillate, the harder it is to identify which regions correspond to the aberrations. In each graph, the grey segments represent the true profile. [Reprinted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

Regarding the BRCs, all of them gave a fairly good estimate when applied to the datasets of the collection Cases. On dataset SNR = 3, the ROC curves (see Figure 3.22) of the BRCs which use the estimator $\hat{\rho}_1^2$ were slightly better than the others and, in general, all the modified versions were better than the original one. However, the mBRC with $\hat{\rho}^2$ and the BRCAk with $\hat{\rho}^2$ obtained the best RMSE. All BRCs gave a similar ROC curve on the dataset SNR = 1; the corresponding RMSE (Figure 3.25) showed that the BRCs which use the estimator $\hat{\rho}_1^2$ had an error more stable than the

[Figure 3.22: two zoomed panels of TPR against FPR for SNR = 3. Left panel: original BRC, BRC with $K_2$ and $\hat{\rho}^2$ or $\hat{\rho}_1^2$, BRCAk with $\hat{\rho}^2$ or $\hat{\rho}_1^2$. Right panel: original BRC, BRC with $K_2$ and $\hat{\rho}^2$, lowess, wavelet, quantreg, smoothseg.]

Fig. 3.22 Zoomed ROC curves of several smoothing methods applied to the dataset with SNR = 3. The ROC curves intersect because the methods differ in their level estimation outside the aberrations. The more the estimated curves oscillated in these regions, the closer the corresponding ROC curves were to the top of the graph. In our case, an oscillating estimated profile is very different from the true one. [Adapted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

[Figure 3.23: two panels of RMSE per probe against position (0 to 400) for SNR = 3. Left panel: original BRC, BRC with $K_2$ and $\hat{\rho}^2$ or $\hat{\rho}_1^2$, BRCAk with $\hat{\rho}^2$ or $\hat{\rho}_1^2$. Right panel: original BRC, BRC with $K_2$ and $\hat{\rho}^2$, lowess, wavelet, quantreg, smoothseg.]

Fig. 3.23 RMSE of several smoothing methods applied to the dataset with SNR = 3. The black segments on the horizontal axis correspond to the regions of aberration. On this dataset, both the original BRC and the version of BRC with $\hat{K}_2$ and $\hat{\rho}^2$ had the lowest error everywhere. [Adapted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]


width. On these data, the mBRC with $\hat{\rho}_1^2$ seemed to perform slightly better than the BRCAk with $\hat{\rho}_1^2$.

[Figure 3.24: two panels of TPR against FPR (both from 0 to 1) for SNR = 1. Left panel: original BRC, BRC with $K_2$ and $\hat{\rho}^2$ or $\hat{\rho}_1^2$, BRCAk with $\hat{\rho}^2$ or $\hat{\rho}_1^2$. Right panel: original BRC, BRC with $K_2$ and $\hat{\rho}_1^2$, BRCAk with $\hat{\rho}_1^2$, lowess, wavelet, quantreg, smoothseg.]

Fig. 3.24 ROC curves of several smoothing methods applied to the dataset with SNR = 1. On these very noisy data, the methods smoothseg and lowess seemed to be the best ones, since their ROC curves were the highest at the top left corner of the plot. The third best method was BRC with $\hat{K}_2$ and $\hat{\rho}_1^2$. [Adapted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

We also found that the ROC measure was affected by oscillations in the estimated curve, which led to ROC curves that intersect and are difficult to interpret (Figure 3.22). This complex behavior arises because lowess, wavelet, quantreg and smoothseg yielded oscillating curves with positive and negative values outside the aberrations, while the BRCs estimated the true profile with an almost flat line close to zero (see the examples in Figure 3.21). Hence, when the threshold used for computing the ROC curve is negative, the proportion of probes outside the aberrations that lie above the threshold (the FPR) is greater for the BRCs than for the other methods. At the same time, the TPR of wavelet and lowess increases, because the wrongly estimated levels of the probes inside the aberrations are above the threshold.
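A toy calculation illustrates this argument. The two profiles below are invented for illustration (a flat, BRC-like estimate and an oscillating one around a single gain), and `fpr_tpr` is a hypothetical helper that calls aberrant every probe whose estimated value exceeds the threshold.

```python
def fpr_tpr(estimate, in_aberration, threshold):
    """Return (FPR, TPR) when probes above `threshold` are called aberrant."""
    inside = [e for e, a in zip(estimate, in_aberration) if a]
    outside = [e for e, a in zip(estimate, in_aberration) if not a]
    tpr = sum(e > threshold for e in inside) / len(inside)
    fpr = sum(e > threshold for e in outside) / len(outside)
    return fpr, tpr


# Eight probes outside the aberration, followed by four probes inside a gain.
in_aberration = [False] * 8 + [True] * 4

# Flat estimate: exactly zero outside the aberration, slightly low inside it.
flat = [0.0] * 8 + [0.9] * 4

# Oscillating estimate: alternates around the true level everywhere.
oscillating = [0.3, -0.3] * 4 + [1.2, 0.6] * 2

for threshold in (-0.1, 0.1):
    print(threshold,
          "flat:", fpr_tpr(flat, in_aberration, threshold),
          "oscillating:", fpr_tpr(oscillating, in_aberration, threshold))
```

With the slightly negative threshold, every outside probe of the flat profile lies above it (FPR = 1.0), whereas only half of the oscillating profile does (FPR = 0.5). With the slightly positive threshold the ordering is reversed (0.0 against 0.5), while the TPR is 1.0 in all four cases. The two ROC curves therefore cross, even though the flat profile is much closer to the truth outside the aberration.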

[Figure 3.25: two panels of RMSE per probe against position (0 to 400) for SNR = 1. Left panel: original BRC, BRC with $K_2$ and $\hat{\rho}^2$ or $\hat{\rho}_1^2$, BRCAk with $\hat{\rho}^2$ or $\hat{\rho}_1^2$. Right panel: original BRC, wavelet, BRC with $K_2$ and $\hat{\rho}_1^2$, quantreg, BRCAk with $\hat{\rho}_1^2$, smoothseg, lowess.]

Fig. 3.25 RMSE of several smoothing methods applied to the dataset with SNR = 1. The black segments on the horizontal axis correspond to the regions of aberration. The graphs show that the methods lowess, quantreg and smoothseg had more or less the same error inside and outside the aberrations. In contrast, the BRC version with $\hat{K}_2$ and $\hat{\rho}_1^2$ and BRCAk with $\hat{\rho}_1^2$ had a very low error outside the aberrations and not the highest error inside them; thus, globally, they performed better than the other algorithms with respect to the RMSE measure. [Adapted from BioMed Central Ltd: BMC Bioinformatics [65], copyright (2009), available under Creative Commons Attribution 2.0 Generic]

In conclusion, mBRC and BRCAk gave in general a better estimation than the other BRCs and the other smoothing methods considered. Regarding the estimation of $\rho^2$, we found that it is better to use $\hat{\rho}^2$ if $\sigma^2 < \rho^2$, and $\hat{\rho}_1^2$ if $\sigma^2 > \rho^2$.
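This rule can be written as a small helper, sketched here in Python. The function name and the returned labels are hypothetical, and the case $\sigma^2 = \rho^2$, which the rule does not cover, is deliberately left undecided.

```python
def choose_rho2_estimator(sigma2, rho2):
    """Pick the estimator of rho^2 suggested by the comparison above.

    Returns a hypothetical label: "rho2_hat" for the estimator written
    rho-hat^2, "rho2_hat_1" for rho-hat_1^2, or None when the two
    variances are equal (a case the rule does not address).
    """
    if sigma2 < rho2:
        return "rho2_hat"
    if sigma2 > rho2:
        return "rho2_hat_1"
    return None
```

For example, `choose_rho2_estimator(0.5, 0.1)` returns `"rho2_hat_1"`, because the first argument is larger than the second.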
