Denoising method for capillary electrophoresis signal via learned tight frame


IET Signal Processing

Research Article


ISSN 1751-9675 Received on 23rd July 2019 Revised 20th November 2019 Accepted on 7th February 2020 E-First on 16th April 2020 doi: 10.1049/iet-spr.2019.0242 www.ietdl.org

Yixiang Lu¹, Zhenya Wang¹, Qingwei Gao¹, Dong Sun¹, Hua Bao¹

¹Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Electrical Engineering and Automation, Anhui University, Hefei, People's Republic of China

E-mail: [email protected]

Abstract: Capillary electrophoresis (CE) signals are always contaminated by random noise, which degrades the accuracy of detection and analysis, so the noise must be removed before the CE signals are used further. In this study, a tight frame learned from the data itself is applied to the removal of noise from CE signals. To achieve an effective decomposition of the CE signal, a one-dimensional discrete tight frame tailored to the input signal is first constructed by introducing a tight frame constraint into the popular dictionary learning model. Then, because each subband contains different noise information, an adaptive threshold is computed to shrink the detail coefficients instead of using a global threshold. Finally, the denoised CE signal is reconstructed from the thresholded coefficients by using the inverse transform of the tight frame. To evaluate the denoising efficiency, the proposed method is applied to simulated and real CE signals. Experimental results indicate that, compared with other denoising methods, the proposed method preserves the shape of the peaks better and achieves a higher signal-to-noise ratio.

1 Introduction

As a powerful separation and identification technique that can serve as a sample preprocessing step in the analysis of ionic analytes, capillary electrophoresis (CE) [1] has been well developed and is widely used in chemistry, biomedicine and related fields [2]. Although CE has many advantages, such as high throughput, small sample volumes, high sensitivity and reduced cost [3], the detected CE signals are usually contaminated by random noise. In general, the noise mainly comes from the instruments (e.g. the detector) and the electronic components (e.g. the analogue-to-digital converter) [4]; it may decrease the resolution of the signal and lead to wrong results. Therefore, to achieve a sensitive and high-accuracy analysis of the analyte, noise removal should be carried out before further data processing.

To suppress noise, two different types of strategies are employed. The first, which aims to increase the intensity of the detected signal without changing the noise, is to adopt highly sensitive detection methods (such as laser-induced fluorescence and chemiluminescence detection) and state-of-the-art preconcentration schemes (e.g. electrostacking and sweeping). The other is to remove the noise without destroying the underlying signal, that is, to apply signal processing methods to the acquired CE signal. In the early stage of CE, the first strategy was commonly employed during separation. However, it not only increases the cost of the analysis but also requires the operator to have relevant professional knowledge and skills. In contrast, signal processing techniques for noise removal are accessible to a wider range of researchers who lack a professional background in CE. Moreover, denoising can be carried out flexibly and is not confined to the site of the separation experiment. These advantages make signal processing a promising technique for noise reduction and other processing of CE signals, especially with the development of digital signal processing. Thus, here, we are concerned only with signal denoising methods for CE.

The difficulty in denoising CE signals comes from the characteristics of the detected signal, which contains various very sharp peaks. These peaks cause the CE signal to span wide ranges in both the time and frequency domains, and its frequency band overlaps with that of the noise in the high-frequency region [5, 6]. To remove the noise, numerous denoising methods for CE signals have been proposed in the past few years. In the early days, the finite impulse response filter, which denoises by performing local averaging to attenuate the rapidly fluctuating components of the input signal, was employed. A typical and frequently used example is the Savitzky–Golay (SG) filter [7] and its variations [8], which realise denoising by least-squares polynomial fitting. The low-pass filter [9] based on the Fourier transform is another filter of this type, in which the noise is removed by pre-setting an appropriate cut-off frequency. From the principle of filtering, it can be seen that these filters tend to flatten the peaks in the resultant signal while filtering out the noise.

A more sophisticated time–frequency representation, the windowed Fourier transform [or short-time Fourier transform (STFT)] [10], was introduced to reduce the noise while preserving the peaks, which carry important information about the analytes. Although the STFT brings some improvement, proper denoising and shape preservation of the peaks cannot be achieved simultaneously [11]: since the window used in the STFT is fixed, it cannot accurately represent local and global changes in the signal at the same time.

Since the wavelet transform (WT) [12] was introduced in the early 1990s, significant breakthroughs have been made in CE signal processing, including denoising [11], peak detection [13] and baseline correction [14]. Compared with the STFT, wavelets can capture local features in the time and frequency domains simultaneously by exploiting a more flexible window that varies with the features of the signal. One of the simplest WT-based denoising strategies for CE signals is soft or hard thresholding. The success of thresholding relies on the fact that the underlying signal has a sparse representation, that is, it is approximated by a small number of relatively large-amplitude coefficients, whereas the noise has its energy spread across a large number of small-amplitude coefficients [15]. However, since a wavelet filter has a predefined structure that cannot match the characteristics of every signal, a wavelet that achieves a good result for one signal may not achieve satisfactory denoising performance for another.

In this paper, we present a novel denoising method for CE signals based on the construction of a tight frame [16] learned from the input signal. Since the decomposition of the CE signal under the learned tight frame filters is sparse, an adaptive threshold that accounts for the correlation between adjacent coefficients is computed to remove the noise. The tight frame is learned directly from the given input signal and has no predefined structure. It therefore adapts to the characteristics of the underlying signal, and the resulting coefficients are sparser than wavelet coefficients and better suited to signal processing (including CE denoising). When computing the threshold, we divide the coefficients into two disjoint subsets to obtain a more accurate threshold for each subband. The results show that the proposed method outperforms wavelet-based thresholding and sparse-representation-based approaches on simulated and real CE signals.

2 Related work

Signal denoising methods in the transform domain fall into two categories: algorithms using orthogonal bases with predefined structures, and algorithms using redundant dictionaries with either predefined or learned structures.

In the first group, the wavelet is the most commonly used basis. In [11], Perrin et al. proposed a wavelet-based denoising method for CE signals. In that study, a hard threshold was used to shrink the detail wavelet coefficients of the CE signal, and the denoised signal was then obtained by the inverse WT. The influences of the type of noise, the wavelet used and the number of decomposition levels were also discussed. A similar work on denoising DNA CE signals using an adaptive threshold was further developed in [17]. An improved CE signal denoising method that exploits the characteristics of the coefficients of the underlying signal and of the noise was proposed in [6]. That work multiplies coefficients at adjacent scales to amplify the significant features of the signal and dilute the noise, and then removes the noise by applying a multi-scale product threshold. Compared with the traditional filters, the wavelet-based methods provide better results in terms of noise removal and peak preservation. However, under the WT the sharp peaks also generate small-amplitude coefficients similar to those of the noise, and the threshold cannot distinguish them completely. Thus, some visual artefacts (e.g. the pseudo-Gibbs effect [18, 19]) still exist in the resultant signals.

For the algorithms using redundant dictionaries, sparse representation [20–23] is the representative approach. Aharon et al. [22] used the K-singular value decomposition (K-SVD) algorithm to train an adaptive redundant dictionary and applied it to image denoising. Later, Han et al. [24] realised spectrum denoising and baseline correction by designing a structured dictionary according to the features of the spectrum. Recently, in [20], Rencker et al. proposed a unified framework to deal with non-linear measurements by using sparse coding and dictionary learning algorithms. However, the sparse recovery requires some knowledge of the non-linear measurement.

3 Tight frame learned from data

In this section, the construction of a one-dimensional (1D) tight frame directly from the data is presented step by step. First, the design of overcomplete dictionaries is briefly reviewed. Then, the mathematical formulation of tight frame training is presented in detail. Finally, two alternating steps, sparse coding and dictionary update, are employed to update the tight frame in the optimisation problem.

3.1 Overcomplete dictionary learning theory

In the field of sparse representation, an overcomplete dictionary, which provides a sparse representation for the signal to be analysed, can either be chosen as a prespecified set of functions or be designed by adapting its content to fit a given set of signal examples [22]. With a prespecified set of functions, a sparse representation of the signal is obtained easily by convolving the signal with each function of the dictionary. The traditional wavelets, curvelets, directionlets [25] and other X-lets all belong to this category. Although a dictionary with a predefined structure is simple and leads to simple and fast algorithms in many applications, the sparsity it achieves depends on how well its atoms match the features of the input signal.

The learned overcomplete dictionary can achieve a sparser representation of a signal than dictionaries with predefined structures, because it is obtained directly from the input signal itself and its atoms represent the signal more efficiently. The existing dictionary training algorithms find the overcomplete dictionary either by maximising probabilities (e.g. the maximum likelihood method and the maximum a-posteriori probability method [26]) or by minimising the error between the data and the linear combination of atoms (e.g. the method of optimal directions [27], the unions of orthonormal bases [28] and the K-SVD method [22, 29]). In general, they can essentially be interpreted as generalisations of the K-means algorithm, whose process involves two steps in each iteration: (i) given a set of descriptive vectors $\{d_i\}_{i=1}^{M}$, assign the training examples to their nearest neighbour and (ii) given that assignment, update $\{d_i\}_{i=1}^{M}$ to better fit the examples. The dictionary training algorithms differ in two aspects: the method used to calculate the sparse coefficients and the method used to update the dictionary. Here, we review only the main idea of the K-SVD training algorithm, which is the basis of our work.

For a signal model

$$y = g + \nu, \quad g = Dx, \tag{1}$$

where $y \in \mathbb{R}^N$ is the observed signal, $g$ is the signal of interest, which has a sparse representation vector $x$ under a dictionary $D$, and $\nu$ is Gaussian noise with zero mean and variance $\sigma^2$, the K-SVD method seeks the dictionary by solving the following optimisation problem:

$$\min_{D, X} \| Y - DX \|_F^2 \quad \text{s.t.} \quad \forall i,\ \| x_i \|_0 \le K_0, \tag{2}$$

where $Y$ is a sample matrix whose columns are the vectors $\{y_i\}_{i=1}^{M}$, $X$ is a coefficient matrix whose columns $\{x_i\}_{i=1}^{M}$ correspond to $\{y_i\}_{i=1}^{M}$, $K_0$ is a small positive integer that limits the number of non-zero entries in $x_i$, $\| \cdot \|_F$ denotes the Frobenius norm and $\| \cdot \|_0$ stands for the $\ell_0$-norm. Then, the coefficient matrix $X$ and the dictionary $D$ are iteratively updated by the matching pursuit algorithm and the SVD, respectively.
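The sparse coding half of (2) can be illustrated with a small sketch. It uses orthogonal matching pursuit, a common variant of matching pursuit, which is an assumption here since the text does not say which variant is used; the function name `omp` and the toy dictionary are hypothetical.

```python
import numpy as np

def omp(D, y, k0):
    """Greedy approximation of  min ||y - D x||_2  s.t.  ||x||_0 <= k0."""
    x = np.zeros(D.shape[1])
    support = []
    coef = np.zeros(0)
    residual = np.asarray(y, dtype=float).copy()
    for _ in range(k0):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Least-squares fit of y on all atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

# Toy overcomplete dictionary (hypothetical): identity atoms plus scaled Hadamard atoms.
H = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 1, -1, -1],
              [1, -1, -1, 1]], dtype=float) / 2.0
D = np.hstack([np.eye(4), H])        # 4 x 8, unit-norm columns
x_true = np.zeros(8)
x_true[0], x_true[5] = 2.0, -3.0     # K0 = 2 non-zero entries
y = D @ x_true
x_hat = omp(D, y, k0=2)
```

In K-SVD this coding step alternates with an SVD-based update of the atoms; only the coding step is sketched here.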

3.2 Design of tight frame

Although dictionary learning methods can obtain an adaptive overcomplete dictionary that sparsely represents the given signal, the optimisation often becomes a severely under-constrained, ill-posed problem owing to the redundancy of the overcomplete dictionary, especially when the training matrix $Y$ contains too many similar samples. Moreover, the redundancy of the dictionary also results in a poor reconstruction property. To address these problems, Cai et al. [30] proposed an adaptively learned tight frame, which incorporates the tight frame constraint into the dictionary learning scheme, to represent images using a 2D tight frame. Here, we exploit the input CE signal to train a 1D tight frame and use it to decompose the CE signal. A tight frame is often viewed as a generalisation of an orthonormal basis and can be defined as follows: a sequence $\{x_n\} \subset H$ ($H$ is a Hilbert space) is a tight frame for $H$ if $\| x \|^2 = \sum_n |\langle x, x_n \rangle|^2$ for any $x \in H$.

The 1D tight frame is learned from the input signal $y$ by solving

$$\min_{\{f_n\}_{n=1}^{r},\, x} \| x - W(f_1(-\cdot), f_2(-\cdot), \ldots, f_r(-\cdot))\, y \|_2^2 + \lambda \| x \|_0 \quad \text{s.t.} \quad W^T W = I, \tag{3}$$

where $W$ [or $W(f_1(-\cdot), f_2(-\cdot), \ldots, f_r(-\cdot))$] is the analysis operator associated with the tight frame, which is generated by a set of filters $\{f_i\}_{i=1}^{r}$, and $x$ is a sparse coefficient vector that approximates the canonical tight frame coefficients $Wy$. In the objective function, the first term is the sparsification error, which requires the sparse coefficient vector $x$ to be close to the coefficients $Wy$; the second term $\| x \|_0$ forces the coefficients to be as sparse as possible; and the constraint $W^T W = I$ guarantees that $W$ is a tight frame. Since (3) has two unknown variables, $W$ and $x$, the common practice is alternating optimisation, i.e. keeping one fixed and optimising the other, then exchanging the variables and repeating. Thus, two optimisation sub-problems are solved in each iteration.

3.2.1 Sparse coding: When the dictionary is known in advance, the first sub-problem is a typical sparse coding problem, which can be represented as

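$$\min_{x} \| x - Wy \|_2^2 + \lambda \| x \|_0 . \tag{4}$$

For a given signal $y$, the optimisation problem (4) has the closed-form solution

$$\hat{x}(n) = \begin{cases} (Wy)(n), & \text{if } |(Wy)(n)| > \lambda \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

This is simply the hard thresholding operator, namely

$$\hat{x} = \mathrm{hard}(Wy, \lambda) . \tag{6}$$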
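As a minimal sketch of the operator in (5) and (6), assuming the coefficients $Wy$ are already available as a NumPy array (the function name `hard_threshold` is hypothetical):

```python
import numpy as np

def hard_threshold(coeffs, lam):
    """Keep coefficients whose magnitude exceeds lam; set the rest to zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(coeffs) > lam, coeffs, 0.0)

x_hat = hard_threshold([-3.0, 0.5, 2.0, -1.0], lam=1.0)
```

Unlike soft thresholding, the surviving coefficients are left unchanged, which helps to retain the amplitude of large coefficients such as those produced by peaks.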

3.2.2 Tight frame update: Once the sparse vector $x$ is calculated, the next step is to update the tight frame $\{f_n\}_{n=1}^{r}$; this is done by solving another optimisation sub-problem with $x$ fixed:

$$\min_{\{f_n\}_{n=1}^{r}} \| x - Wy \|_2^2 \quad \text{s.t.} \quad W^T W = I . \tag{7}$$

In general, solving (7) is very time consuming owing to the orthogonality constraint $W^T W = I$. However, according to [30], the above sub-problem has a closed-form solution once the constraint $W^T W = I$ is converted into its equivalent condition, which can be written as

$$\langle f_i, f_j \rangle = \frac{1}{r}\, \delta_{i-j}, \quad \forall\, 1 \le i, j \le r . \tag{8}$$

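To update the filters using the input signal, the coefficient vector $x$ is sequentially partitioned into $r$ vectors denoted by $x_i \in \mathbb{R}^N$ ($i = 1, 2, \ldots, r$). Recall that the learning problem reads

$$\min_{\{f_n\}_{n=1}^{r},\, x} \| x - W(f_1(-\cdot), f_2(-\cdot), \ldots, f_r(-\cdot))\, y \|_2^2 + \lambda \| x \|_0 \quad \text{s.t.} \quad W^T W = I . \tag{9}$$

Then, the objective function in (7) becomes

$$\| x - Wy \|_2^2 = \sum_{n=1}^{N} \sum_{i=1}^{r} \left( x_i(n) - [W_{f_i(-\cdot)}\, y](n) \right)^2 = \sum_{n=1}^{N} \sum_{i=1}^{r} \left( x_i(n) - y_n^T f_i \right)^2 , \tag{10}$$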

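where $y_n = y(n : n + r - 1)$ is an $r \times 1$ vector segmented from the input signal $y$. If we let $i$ and $n$ be the column variable and row variable, respectively, and rearrange the coefficients, input signal and filters, i.e.

$$Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{r \times N}, \quad X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{r \times N}, \quad F = [f_1, f_2, \ldots, f_r] \in \mathbb{R}^{r \times r}, \tag{11}$$

where the $n$th column of $X$ collects the $r$ coefficients $x_1(n), \ldots, x_r(n)$, then we have

$$\| x - Wy \|_2^2 = \| X^T - Y^T F \|_F^2 = \| X \|_F^2 - 2\, \mathrm{tr}(F X Y^T) + \frac{1}{r} \| Y^T \|_F^2 , \tag{12}$$

where $\mathrm{tr}(\cdot)$ stands for the trace of a matrix. Since the first and last terms do not depend on $F$, the optimisation sub-problem formulated in (7) can finally be converted to

$$F = \arg\max_{F} \mathrm{tr}(F X Y^T) \quad \text{s.t.} \quad F^T F = \frac{1}{r} I_r . \tag{13}$$

Using the SVD, the above maximisation problem has an explicit solution, which can be expressed as

$$F = \frac{1}{\sqrt{r}} V U^T , \tag{14}$$

where $U$ and $V$ are obtained from the SVD of $XY^T$, that is $XY^T = U \Delta V^T$. For a more detailed discussion on the solution of (13), see Zou et al. [31].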
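The alternation between sparse coding and tight frame update described in Section 3.2 can be sketched as follows. This is only an illustration under stated assumptions: the signal is extended periodically when the segments $y_n$ are formed, the frame is initialised with DCT-II atoms, the filter update is normalised so that $F^T F = I_r / r$ as required by (13), and the names `dct_filters`, `segments` and `learn_tight_frame` as well as the test signal are hypothetical.

```python
import numpy as np

def dct_filters(r):
    """DCT-II atoms as columns, scaled so that F.T @ F = I / r (condition (8))."""
    k = np.arange(r)[:, None]
    n = np.arange(r)[None, :]
    C = np.sqrt(2.0 / r) * np.cos(np.pi * (2 * n + 1) * k / (2 * r))
    C[0, :] /= np.sqrt(2.0)              # rows of C are now orthonormal
    return C.T / np.sqrt(r)

def segments(y, r):
    """Matrix Y of (11): column n is y_n = y(n : n + r - 1), periodic extension assumed."""
    N = len(y)
    idx = (np.arange(r)[:, None] + np.arange(N)[None, :]) % N
    return y[idx]

def learn_tight_frame(y, r, lam, n_iter):
    Y = segments(y, r)
    F = dct_filters(r)
    for _ in range(n_iter):
        C = F.T @ Y                               # canonical coefficients, r x N
        X = np.where(np.abs(C) > lam, C, 0.0)     # sparse coding, (5) and (6)
        U, _, Vt = np.linalg.svd(X @ Y.T)         # X Y^T = U Delta V^T
        F = Vt.T @ U.T / np.sqrt(r)               # closed-form filter update
    return F

# Hypothetical noisy two-peak signal used only to exercise the sketch.
rng = np.random.default_rng(0)
t = np.arange(512)
clean = np.exp(-4.0 * ((t - 200) / 30.0) ** 2) + 0.6 * np.exp(-4.0 * ((t - 330) / 20.0) ** 2)
sigma = 0.05
y = clean + sigma * rng.standard_normal(t.size)
F = learn_tight_frame(y, r=32, lam=4.4 * sigma, n_iter=35)
```

Because every update is of the form $V U^T / \sqrt{r}$, the learned filters satisfy the tight frame condition after each iteration, so the energy of the coefficients always equals the energy of the signal.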

4 Signal model and denoising algorithm

4.1 Signal model

In CE, the signal is usually modelled as a superposition of a number of components under the assumption of system linearity [15]. A typical signal model for CE is expressed as

$$y(t) = \sum_{i=1}^{K} p_i(t) + B(t) + n(t), \tag{15}$$

where $p_i(t)$ is the peak corresponding to the $i$th analyte, and $B(t)$ and $n(t)$ correspond to the baseline drift and the noise, respectively.

A peak, in CE, is a waveform in which the dependent variable rises and then falls with time. According to this definition, various peak models have been proposed in the past few years. The most widely accepted models are the Gaussian-based peaks and their variants. For example, a Gaussian-like model proposed in [4] is

$$p_i(t) = A_i \exp\left( -4 \left( \frac{T_s t - \mu_i}{W_i} \right)^2 \right), \tag{16}$$

where $A_i$ is the peak's amplitude, $\mu_i$ is the migration time, $W_i$ is the width of the peak and $T_s$ is the sampling interval. In addition, the triangle model [32], the resonance model [33] and other models were also proposed to describe the peaks present in the CE signal.
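A short sketch of the signal model (15) with the Gaussian-like peak (16) is given below. All numerical values (amplitudes, migration times, widths, baseline slope, noise level) are hypothetical and chosen only for illustration; `t` is taken to be the sample index.

```python
import numpy as np

def gaussian_like_peak(t, amplitude, mu, width, ts=1.0):
    """Peak model (16): A * exp(-4 * ((Ts * t - mu) / W)^2)."""
    return amplitude * np.exp(-4.0 * ((ts * t - mu) / width) ** 2)

def ce_signal(t, peaks, baseline, noise):
    """Signal model (15): sum of peaks plus baseline drift plus noise."""
    total = np.zeros(len(t))
    for amplitude, mu, width in peaks:
        total += gaussian_like_peak(t, amplitude, mu, width)
    return total + baseline + noise

t = np.arange(1000)
peaks = [(2.0, 300.0, 40.0), (1.2, 600.0, 25.0)]     # (A_i, mu_i, W_i), hypothetical
baseline = 1e-4 * t                                  # linear drift, hypothetical slope
rng = np.random.default_rng(1)
noise = 0.02 * rng.standard_normal(t.size)           # zero-mean Gaussian noise
y = ce_signal(t, peaks, baseline, noise)
p = gaussian_like_peak(t, 2.0, 300.0, 40.0)
```

With this parameterisation the peak falls to $A_i e^{-1}$ at a distance of $W_i / 2$ from the migration time.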

The noise component $n(t)$ is usually assumed to be pure white noise, i.e. independent and identically distributed samples following a zero-mean Gaussian distribution, or uniform random noise. In analytical chemistry, the most common baselines include exponential drift [24], sinusoidal drift and linear drift [4]. Apart from treating $B(t)$ and $n(t)$ as two separate components, a unified noise model that represents the characteristics of both the baseline drift $B(t)$ and the noise $n(t)$ has also been proposed [11]. In this paper, we treat the noise $n(t)$ as a separate component and do not consider the correction of baseline drift.

4.2 Denoising algorithm

Having discussed the construction of the 1D tight frame and the signal model, we summarise the denoising algorithm based on the learned tight frame in Algorithm 1 (see Fig. 1). Moreover, the amplitude–frequency characteristics of some of the tight frame filters used in our experiment are shown in Fig. 2. As these responses show, the learned filters do not have predefined structures.
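Since Algorithm 1 is given only as a figure, the decomposition, thresholding and reconstruction chain can be sketched as below. This is not the authors' listing: hard thresholding with one threshold per subband, periodic signal extension and a random orthogonal stand-in for the learned filters are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def analysis(y, F):
    """Undecimated analysis: column i of the result is the subband of filter f_i."""
    r, N = F.shape[0], len(y)
    # Row n of y[idx] is the segment y_n; periodic extension is an assumption.
    idx = (np.arange(N)[:, None] + np.arange(r)[None, :]) % N
    return y[idx] @ F, idx

def synthesis(coeffs, F, idx):
    """Adjoint of `analysis`; it inverts the analysis when F @ F.T = I / r."""
    y = np.zeros(coeffs.shape[0])
    np.add.at(y, idx, coeffs @ F.T)      # accumulate overlapping segments
    return y

def denoise(y, F, thresholds):
    """Threshold each subband with its own T_i, then reconstruct."""
    coeffs, idx = analysis(y, F)
    # Hard thresholding is assumed; a threshold of 0 leaves a subband untouched.
    kept = np.where(np.abs(coeffs) > thresholds[None, :], coeffs, 0.0)
    return synthesis(kept, F, idx)

rng = np.random.default_rng(0)
r = 8
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))
F = Q / np.sqrt(r)                       # stand-in frame with F.T @ F = I / r
y = rng.standard_normal(64)
y_same = denoise(y, F, np.zeros(r))      # no shrinkage
y_shrunk = denoise(y, F, np.full(r, 0.5))
```

With all thresholds at zero the output equals the input, which is the perfect reconstruction property guaranteed by $W^T W = I$.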

4.3 Threshold estimation

In the previous subsection, the noise is removed by using a threshold $T_i$ in part (II) of Algorithm 1 (Fig. 1). Although the universal threshold proposed by Donoho and Johnstone [34] is simple and effective and yields an estimate that is asymptotically optimal in the minimax sense, it is not a good choice in the tight frame setting because the tight frame coefficients are correlated. To achieve a better denoising result, an adaptive threshold [35] based on the subband coefficients is adopted in our algorithm. When the tight frame is constructed from the input signal, the downsampling operation used in the discrete orthogonal wavelet decomposition is not performed; the representation of the signal under the tight frame is therefore redundant, i.e. the adjacent coefficients in each subband are correlated. Chang et al. [35] pointed out that if we separate the coefficients into two disjoint sets, namely $\{[W_{f_i(-\cdot)}\, y](2n)\}$ and $\{[W_{f_i(-\cdot)}\, y](2n + 1)\}$, the coefficients in each set will be uncorrelated. Then, the adaptive threshold can be computed in each set by the following rules.

Filter both sides of (1) with a filter $f_i(-\cdot)$ from the learned tight frame (except the low-pass filter), and let $W_y(n)$, $W_g(n)$ and $W_\nu(n)$ denote $[W_{f_i(-\cdot)}\, y](n)$, $[W_{f_i(-\cdot)}\, g](n)$ and $[W_{f_i(-\cdot)}\, \nu](n)$, respectively. The coefficients satisfy

$$W_y(n) = W_g(n) + W_\nu(n) . \tag{17}$$

After dividing $W_y(n)$ into two sets, $W_g(n)$ and $W_\nu(n)$ are assumed to be independent and identically distributed variables that obey a generalised Gaussian distribution and a Gaussian distribution $N(0, \sigma^2)$, respectively, in each set. Then, the adaptive threshold is defined as

$$T_i = \frac{\hat{\sigma}_{W_\nu}^2}{\hat{\sigma}_{W_g}}, \tag{18}$$

where $\hat{\sigma}_{W_\nu}^2$ and $\hat{\sigma}_{W_g}^2$ are the variances of $W_\nu(n)$ and $W_g(n)$, respectively.

Recall that $W_g(n)$ and $W_\nu(n)$ are independent of each other, hence

$$\hat{\sigma}_{W_y}^2 = \hat{\sigma}_{W_g}^2 + \hat{\sigma}_{W_\nu}^2 . \tag{19}$$

The noise variance $\hat{\sigma}_{W_\nu}^2$ can be estimated with a robust median estimator [34]

$$\hat{\sigma}_{W_\nu} = \frac{\mathrm{median}(|W_y(n)|)}{0.6745}, \tag{20}$$

where $W_y(n)$ is the coefficient of the finest detail subband of WT.

Since $W_y(n)$ is assumed to have zero mean, $\hat{\sigma}_{W_y}^2$ can be estimated empirically by $\hat{\sigma}_{W_y}^2 = (1/L) \sum_{n=1}^{L} W_y^2(n)$, where $L$ is the number of coefficients in each set. Then, $\hat{\sigma}_{W_g}^2$ can be computed directly from (19). Considering that a variance is non-negative, $\hat{\sigma}_{W_g}^2$ is defined as

$$\hat{\sigma}_{W_g}^2 = \max\left( \hat{\sigma}_{W_y}^2 - \hat{\sigma}_{W_\nu}^2,\ 0 \right) . \tag{21}$$

It should be noted that if $\hat{\sigma}_{W_y}^2 < \hat{\sigma}_{W_\nu}^2$, $\hat{\sigma}_{W_g}^2$ will be 0 and the threshold $T_i$ will be $\infty$. Since an infinite threshold cannot be used in practice, we set $T_i = \max(|W_y(n)|)$ in this case, which corresponds to setting all coefficients to 0.
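The threshold rules (18) to (21) can be sketched as follows, assuming one detail subband is given as a NumPy array and the noise standard deviation has already been estimated with (20). The function names and the toy subband are hypothetical.

```python
import numpy as np

def noise_sigma_mad(finest_detail):
    """Robust median estimator (20) of the noise standard deviation."""
    return np.median(np.abs(finest_detail)) / 0.6745

def adaptive_thresholds(w, sigma_noise):
    """Thresholds for the even- and odd-indexed sets of one detail subband."""
    thresholds = []
    for subset in (w[0::2], w[1::2]):
        var_y = np.mean(subset ** 2)                    # zero-mean assumption
        var_g = max(var_y - sigma_noise ** 2, 0.0)      # (21)
        if var_g > 0.0:
            thresholds.append(sigma_noise ** 2 / np.sqrt(var_g))   # (18)
        else:
            # Noise-dominated set: the threshold equals the largest magnitude.
            thresholds.append(np.max(np.abs(subset)))
    return tuple(thresholds)

# Toy subband: even-indexed samples are signal-dominated, odd-indexed ones are not.
w = np.array([3.0, 0.1, -3.0, 0.1, 3.0, 0.1, -3.0, 0.1])
t_even, t_odd = adaptive_thresholds(w, sigma_noise=1.0)
sigma_hat = noise_sigma_mad(np.array([-1.0, 2.0, -3.0]))
```

The threshold is small where the signal variance dominates, so that signal coefficients are retained, and grows as the set becomes noise-dominated.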

5 Experimental results

In this section, one simulated signal and two real signals are used to evaluate the performance of the proposed denoising algorithm. The results obtained with our method are also compared with those of three other methods, namely the sparse representation-based method [20], the wavelet-based method [11] and the multi-scale products method [6]. For convenience, we refer to these three methods as sparse representation (SR), wavelet threshold method (WTM) and multi-scale product (MSP).

Fig. 1 Algorithm 1: Denoising algorithm using learned tight frame

Fig. 2 Amplitude–frequency characteristics of some of the tight frame filters

Recall that a parameter (threshold) $\lambda$ must be assigned a value in the sparse coding step of the tight frame construction. The value of $\lambda$ depends on both the noise level ($\sigma$) and the desired degree of sparsity of the input signal. In the experiments, $\lambda$ was set to $4.4\sigma$, which was hand-optimised for the simulated CE signals. Since the first sub-problem modelled in (4) has a global optimal solution, the initial tight frame $\{f_n^{(0)}\}_{n=1}^{r}$ can be any tight frame satisfying the constraint $W^T W = I$. The tight frame was initialised with a discrete cosine transform (DCT) tight frame of size 32, and the size of each training data segment $y_i$ was also set to 32, with a maximal overlap of 31. Moreover, the maximal number of iterations $K$ in Algorithm 1 (Fig. 1) was set to 35, because the improvement in denoising performance beyond 35 iterations is marginal. For all algorithms used for comparison, unless stated otherwise, the free parameters were set as suggested in the reference papers.

5.1 Experiments on simulated signal

The simulated CE signal (see Fig. 3a) was generated using the model formulated in (15), specifically with five peaks and two types of baseline (a linear baseline and an exponential baseline). To make the simulated signal more general, an overlapped peak consisting of two separate peaks was also included. The noise is modelled as zero-mean Gaussian noise, and its intensity is measured in terms of the signal-to-noise ratio (SNR) with respect to the underlying CE signal. Five different noise levels, corresponding to SNR = 10, 20, 25, 30 and 40 dB, were used in our experiments.
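Generating noise at a prescribed SNR can be sketched as below. The noise is rescaled so that the realised SNR of this particular draw matches the target exactly; this is an assumption, since the text does not state whether the target refers to the realised or the expected noise power. The function name and the test signal are hypothetical.

```python
import numpy as np

def add_noise_at_snr(g, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to a target SNR (dB) relative to g."""
    noise = rng.standard_normal(len(g))
    target_noise_energy = np.sum(g ** 2) / 10.0 ** (snr_db / 10.0)
    noise *= np.sqrt(target_noise_energy / np.sum(noise ** 2))
    return g + noise

rng = np.random.default_rng(42)
t = np.arange(2000)
g = 2.0 * np.exp(-4.0 * ((t - 800) / 60.0) ** 2)      # hypothetical clean peak
y = add_noise_at_snr(g, snr_db=25.0, rng=rng)
realised_snr = 10.0 * np.log10(np.sum(g ** 2) / np.sum((y - g) ** 2))
```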

To compare the results of our method thoroughly with those of SR, WTM and MSP, two indicators are introduced. The first is the SNR

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=1}^{N} g(n)^2}{\sum_{n=1}^{N} \left( g(n) - \hat{g}(n) \right)^2}, \tag{22}$$

where $g(n)$ is the original CE signal, $\hat{g}(n)$ is the signal denoised by the adopted method and $N$ is the length of the CE signal. The other is the root-mean-square error (RMSE), which is defined as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{n=1}^{N} \left( g(n) - \hat{g}(n) \right)^2}{N}}, \tag{23}$$

where $g(n)$, $\hat{g}(n)$ and $N$ are the same as in (22). From (22) and (23), for a given signal the SNR decreases monotonically as the RMSE increases: the larger the SNR, the smaller the RMSE and the better the denoising performance.
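The two indicators (22) and (23) translate directly into code; the sketch below also shows the relation between them. The function names and the toy signals are hypothetical.

```python
import numpy as np

def snr_db(g, g_hat):
    """Output SNR (22) in dB of the denoised signal g_hat with respect to g."""
    return 10.0 * np.log10(np.sum(g ** 2) / np.sum((g - g_hat) ** 2))

def rmse(g, g_hat):
    """Root-mean-square error (23)."""
    return np.sqrt(np.mean((g - g_hat) ** 2))

g = np.array([1.0, 2.0, 3.0, 4.0])
g_hat = g + 0.1 * np.array([1.0, -1.0, 1.0, -1.0])
snr_value = snr_db(g, g_hat)
rmse_value = rmse(g, g_hat)
# For a fixed clean signal, SNR = 10 log10( sum(g^2) / (N * RMSE^2) ).
snr_from_rmse = 10.0 * np.log10(np.sum(g ** 2) / (len(g) * rmse_value ** 2))
```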

Since the peaks carry very important information (i.e. the type of analyte and its concentration), preserving the peak shape is particularly important when removing noise from a CE signal. To assess the peak preservation capacity of each denoising algorithm, another indicator called the ratio of peak height [6] is introduced, which is defined as

$$\eta = \frac{\hat{I}}{I}, \tag{24}$$

where $\hat{I}$ and $I$ are the peak heights of the denoised signal and the original pure signal, respectively. Obviously, $\eta = 1$ is the ideal case in which the peak is not damaged; otherwise, the peak is distorted by the noise removal. To make the comparison more convincing, an isolated peak (marked as number 1) and an overlapped peak (marked as number 2) are used to compute the value of $\eta$ in our experiment.
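A minimal sketch of the indicator (24) is shown below. It assumes that the peak height is the maximum of the signal inside a window around the peak, measured from zero, which the text does not specify; the function name, window and signals are hypothetical.

```python
import numpy as np

def peak_height_ratio(denoised, clean, start, stop):
    """Ratio (24) of the denoised to the clean peak height inside [start, stop)."""
    return np.max(denoised[start:stop]) / np.max(clean[start:stop])

n = np.arange(400)
clean = 2.0 * np.exp(-4.0 * ((n - 200) / 40.0) ** 2)
flattened = 0.95 * clean             # a denoiser that lowers the peak by 5%
eta = peak_height_ratio(flattened, clean, 150, 250)
```

Values below 1 indicate that the peak has been flattened, whereas values above 1 indicate overshoot.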

The three indicator values obtained from the simulated CE signal with all of the denoising methods are summarised in Table 1. The table shows that the proposed method achieves the largest SNR values and the smallest RMSE values at every noise level, which indicates that it performs best in noise removal. We can also observe that its advantage grows as the noise intensity decreases. This is because, at low noise levels, the learned tight frame matches the underlying CE signal better than at high noise levels and therefore yields a more effective representation. As for the shape preservation of the peaks, the values of $\eta_1$ and $\eta_2$ listed in Table 1 indicate that the proposed algorithm preserves the peaks better than the other methods, that is, its value of $\eta$ is closer to 1, except for $\eta_2$ at SNR = 25 dB. This exception can be explained as follows: peak 2 is seriously corrupted at this noise level, which leads to a large value of $\hat{\sigma}_{W_\nu}^2$ and, subsequently, to a small value of $\hat{\sigma}_{W_g}^2$ and a large value of $T_i$, so the estimated threshold $T_i$ sets some signal coefficients to 0.

Moreover, several other observations can be made: (i) Table 1 shows that the SR method obtains the smallest SNR value and the largest RMSE value, which means that its denoising performance is the worst among all the tested algorithms. This is because the SR model focuses on non-linear measurements, and the non-linearity knowledge it relies on is inconsistent with the CE signal.

(ii) The MSP algorithm achieves the second largest SNR value among all the tested methods at every noise level, and its values of $\eta_1$ and $\eta_2$ are closer to 1 in most cases than those of the other two comparison methods. These values make the MSP method the second-best method in noise removal and shape preservation. This performance is attributed to the fact that the multi-scale product amplifies the significant features of the CE signal while diluting the noise. Thus, the coefficients of the signal can be easily distinguished from the coefficients of the noise.

Fig. 3 Simulated CE signal with five peaks, a linear baseline and an exponential baseline

(a) Pure CE signal, (b) Noisy CE signal (SNR = 25 dB)

The numbers 1 and 2 are used to mark the selected peaks used to compute the indicator η

(iii) The WTM method obtains the smallest $\eta_2$ values among the wavelet-based methods in all cases. There are two reasons: first, the Haar wavelet (with two vanishing moments) used in WTM cannot properly approximate the peaks (modelled as a quadratic function); second, peak 2 is seriously contaminated (compared with the other peaks), so the peak and the noise produce detail coefficients of comparable size and the estimated threshold removes detail coefficients of the peak. Moreover, the performance of the WTM method lies between that of SR and MSP in the vast majority of situations, that is, its SNR and RMSE values lie between those of the two algorithms. This is because the feature representation of the wavelet used in WTM is inferior to the multi-scale product used in the MSP method, while it is clearly superior to the SG filter, which cannot adaptively match the features of the input CE signal.

For a visual comparison, the synthetic noisy CE signal with SNR = 25 dB and its corresponding denoised results are illustrated in Figs. 3b and 4. From Fig. 4, it can be observed that the denoised result obtained by WTM is superior to that of SR but inferior to those of the other two methods. This visual comparison is consistent with the results (SNR = 25 dB) listed in Table 1. Careful inspection of Fig. 4d, obtained by the proposed method, shows that although there are some tiny fluctuations at the beginning and end of the peaks, most of the baseline is smoother than with the other methods, and these fluctuations are considerably smaller than those in the other results. Moreover, the curve shape (not the indicator η) of peak 2 is well preserved by the proposed method, whereas some distortions appear in peak 2 (mainly on both sides of the peak) after denoising by the other algorithms. Thus, the visual comparisons also indicate that the proposed algorithm achieves the best results, that is, it removes the noise effectively while still preserving the features of the peaks.

Table 1 Results of SNR (dB), RMSE and ratio of peak height η obtained by various methods under different noise levels

Noise level, dB  Indicator  SR       WTM      MSP      Proposed
SNR = 10         SNR        16.9378  22.9482  23.6438  25.4001*
                 RMSE       0.3456   0.1329   0.1227   0.1002*
                 η1         0.8208   1.0067   0.9904   0.9961*
                 η2         0.8139   0.8145   0.8936   0.9281*
SNR = 20         SNR        25.6908  30.9131  31.3755  34.2189*
                 RMSE       0.1107   0.0531   0.0504   0.0363*
                 η1         0.9248   0.9915   0.9924   0.9959*
                 η2         0.9301   0.9281   0.9700   0.9751*
SNR = 25         SNR        34.6201  34.9365  35.6332  39.1674*
                 RMSE       0.0478   0.0334   0.0309   0.0205*
                 η1         0.9232   1.0060   0.9886   0.9966*
                 η2         0.9335   0.9591   0.9836*  0.9791
SNR = 30         SNR        37.1742  37.3587  38.6006  43.0827*
                 RMSE       0.0257   0.0253   0.0219   0.0131*
                 η1         0.9458   1.0039   0.9970   1.0027*
                 η2         0.9497   0.9731   0.9743   0.9879*
SNR = 40         SNR        44.8905  45.0992  46.6593  52.4272*
                 RMSE       0.0112   0.0104   0.0087   0.0045*
                 η1         0.9689   1.0032   0.9981   1.0013*
                 η2         0.9782   0.9891   0.9976   0.9984*

* indicates the best result in each row (for η, the value closest to 1).

Fig. 4 Denoising results of the simulated CE signal

Finally, to compare the computational burden of our method with that of the other methods, the running times were measured using MATLAB (R2018a, 64 bit) on a personal computer with an i7-4700 CPU (3.40 GHz) and 16.00 GB of memory; they are listed in Table 2. The running time of the proposed method ranks third among the four methods. This is because: (i) in the tight frame training stage, the optimisation problem has a closed-form solution, so the running time is shorter than that of SR; and (ii) since the proposed method involves a large number of iterations in tight frame training, its running time is longer than that of the wavelet-based methods.

5.2 Experiments on real signals

The real CE signals used in this subsection were generated by separating DNA samples using CE techniques. The experimental procedure, equipment and detection apparatus are the same as those used in [6, 17]. However, the specific experimental conditions and the analytes are completely different. The first real CE signal (Fig. 5a, denoted as Lipse Activator (LPA)-CE) was generated by separating the LPA with 4.0% concentration using a capillary with a length of 50 cm and a diameter of 75 μm; the temperature and electric-field strength were 65°C and 200 V/cm, respectively. The second real CE signal (Fig. 5b, denoted as polyvinyl pyrrolidone (PVP)-CE) was generated by separating the PVP with 8.0% concentration using a capillary with a length of 50 cm and a diameter of 75 μm; the temperature and electric-field strength were 44°C and 300 V/cm, respectively.

To verify the effectiveness of the proposed method on real CE signals, the experimental settings of our method and the comparison methods are the same as those in the previous simulated experiment. Owing to the lack of a pure CE signal, the quantitative indicators mentioned above are no longer suitable for evaluating the denoising results of real CE signals, and the evaluation is mainly qualitative, based on visual inspection. Moreover, recall that the noise in a CE signal is usually assumed to be Gaussian; the error between the raw CE signal and the denoised signal should therefore be approximately Gaussian distributed if the denoising method is an ideal filter. Thus, we use the statistic W of the Shapiro–Wilk test [36], which tests whether a sample comes from a normally distributed population, to evaluate the performance of the denoising method. The statistic W is defined as

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{25}$$

where $x_{(i)}$ is the ith-order statistic, $\bar{x}$ is the sample mean and the coefficients $a_i$ are given by

$$(a_1, \ldots, a_n) = \frac{m^T V^{-1}}{\left(m^T V^{-1} V^{-1} m\right)^{1/2}},$$

where m is a vector composed of the expected values of the order statistics of a standard normal sample and V is the covariance matrix of those order statistics. The closer the value of W is to 1, the better the samples fit the Gaussian distribution.
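To make the role of W concrete, the following Python sketch evaluates a simplified version of the statistic on an error signal. It is not the exact Shapiro–Wilk test: it uses the Shapiro–Francia simplification, in which the covariance matrix V is replaced by the identity so that a = m / ‖m‖, and it approximates the expected normal order statistics m with Blom's formula. Both simplifications, as well as the function name and the toy samples, are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def shapiro_francia_w(sample):
    """Approximate W with V replaced by the identity (Shapiro-Francia variant)."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least three samples")
    x = sorted(sample)  # order statistics x_(1) <= ... <= x_(n)
    std_normal = NormalDist()
    # Blom's approximation to the expected standard normal order statistics m_i.
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    norm_m = math.sqrt(sum(v * v for v in m))
    a = [v / norm_m for v in m]  # coefficients a_i with V = I
    mean = sum(x) / n
    numerator = sum(ai * xi for ai, xi in zip(a, x)) ** 2
    denominator = sum((xi - mean) ** 2 for xi in x)
    if denominator == 0.0:
        raise ValueError("sample is constant")
    return numerator / denominator


# Toy error signals: one shaped like ideal Gaussian quantiles, one dominated
# by a single large residual (as if part of a peak had leaked into the error).
n = 50
gaussian_like = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
spiky = [0.0] * (n - 1) + [100.0]

w_gaussian = shapiro_francia_w(gaussian_like)
w_spiky = shapiro_francia_w(spiky)
print("W (Gaussian-like error):", w_gaussian)
print("W (spiky error):", w_spiky)
```

The Gaussian-like error yields a value close to 1, whereas the error dominated by one large residual yields a much smaller value, which mirrors how the values of W are interpreted in this subsection.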

The denoised results corresponding to Figs. 5a and b are illustrated in Figs. 6a–d and 7a–d, respectively. From these figures, it can be seen that the proposed method achieves the best results (Figs. 6d and 7d) in noise removal and shape preservation of peaks. Since an adaptive threshold is employed in MSP and in our method, some real fluctuations are retained in the denoised signals (Figs. 6c, d and 7c, d), whereas these fluctuations are smoothed out by WTM, which does not use an adaptive threshold; this makes the WTM-denoised signal look smoother. As in the simulated experiment, SR obtains the worst results (Figs. 6a and 7a) among all the methods: a portion of the noise remains in the recovered signals and the shapes of the peaks are seriously distorted. Careful observation of the peaks marked with a circle in Figs. 5a and b shows that these two peaks are well preserved in the denoised signals (Figs. 6d and 7d) obtained by the proposed method, whereas they are distorted in the results recovered by the comparison methods.

To further compare the performance of the denoising algorithms, we compute the values of W for the error signals obtained by the proposed method and the comparison filters; the corresponding results are listed in Table 3. From this table, it can be seen that the values of W obtained by the proposed method are closest to 1 in both cases, which indicates that our method achieves the best denoising performance. As with the visual evaluation, SR obtains the smallest values of W, which means that it loses some

Table 2 Running time of various algorithms on simulated signal

Method SR WTM MSP Ours

running time, s 10.0420 0.0507 0.0269 6.2760

Fig. 5 Two real CE signals

(a) LPA-CE signal, (b) PVP-CE signal

The green circles mark the selected peaks used to observe the effect of shape preservation

useful content of the CE signal as well as noise. Moreover, owing to limited space, only the errors between the original LPA-CE signal and the denoised signals are illustrated in Fig. 8. From the error signals, one can see that the errors obtained by MSP (Fig. 8c) and the proposed method (Fig. 8d) are more homogeneous than those of the other two methods, that is, the least information about the noise-free signal is

Fig. 6 Denoised results of the real LPA-CE signal

(a) Result by SR, (b) Result by WTM, (c) Result by MSP, (d) Result by the proposed algorithm

Fig. 7 Denoised results of the real PVP-CE signal

contained in the error signals. Then, combining the denoised signals in Fig. 6 with the values of W reported in Table 3, we can conclude that the proposed method achieves the best results.

6 Conclusion

In this paper, a novel denoising method based on a tight frame learned from the input signal has been proposed for CE signals. To decompose the signal effectively, an adaptive tight frame was constructed from the underlying signal using the dictionary learning strategy. Because an ℓ0 regularisation term is contained in the objective function, the learned tight frame can provide a sparse representation of the analysed signal. Moreover, to remove noise appropriately, an adaptive threshold based on the subband coefficients was computed to suppress the noise. The proposed algorithm was applied to remove the noise in both a simulated CE signal and real CE signals, and the results were evaluated in terms of SNR, RMSE and the ratio of peak height (η). The comparison of the indicators and the visual results indicates that the proposed method is superior to the other methods in both noise removal and shape preservation of the peaks. The noise in the experiments is assumed to be Gaussian white noise; in fact, other types of noise, except for multiplicative noise, are expected to have little effect on the results. The reason is that the orthogonality of the tight frame removes the correlations existing in the signal and the noise (e.g. correlated noise). Some tentative experiments also indicate that the results are better when correlated noise is used with the proposed method. Moreover, only the tight frame constraint was imposed in the construction of the tight frame; further constraints, which give the tight frame special properties, will be added to the construction in future work.

7 Acknowledgments

This work was supported by the Key Science Programme of Anhui Education Department (KJ2018A0012), the Research Fund for Doctor of Anhui University (J01003266) and the National Natural Science Foundation of China (NSFC) (61402004).

8 References

[1] Chen, G., Zhu, X., Dou, X., et al.: ‘Analysis of capillary electrophoresis noise characteristics’, Acta Photonica Sin., 2017, 46, (6), p. 0612004

[2] Rokhas, M.K., Rönn, J.L., Wiklund, C., et al.: ‘Analysis of butterfly reproductive proteins using capillary electrophoresis and mass spectrometry’, Anal. Biochem., 2019, 566, (2019), pp. 23–26

[3] Jarméus, A., Emmer, Å.: ‘CE determination of monosaccharides in pulp using indirect detection and curve fitting’, Chromatographia, 2008, 67, (1–2), pp. 151–155

[4] Solis, A., Rex, M., Campiglia, A.D., et al.: ‘Accelerated multiple-pass moving average: a novel algorithm for baseline estimation in CE and its application to baseline correction on real-time bases’, Electrophoresis, 2007, 28, (8), pp. 1181–1188

[5] Zhang, H., Liui, X., Shao, X., et al.: ‘The study of the improved wavelet thresholding with translation invariant de-noising on capillary electrophoresis signal’. Fourth IEEE Int. Conf. Nano/Micro Engineered and Molecular Systems, Shenzhen, China, 2009, vol. 314, pp. 1099–1102

[6] Gao, Q., Lu, Y., Sun, D., et al.: ‘A multiscale products technique for denoising of DNA capillary electrophoresis signals’, Meas. Sci. Technol., 2013, 24, pp. 065004-1–065004-9

[7] Schafer, R.W.: ‘What is a Savitzky–Golay filter?’, IEEE Signal Process. Mag., 2011, 28, (4), pp. 111–117

[8] Browne, M., Mayer, N., Cutmore, T.R.H.: ‘A multiscale polynomial filter for adaptive smoothing’, Digit. Signal Process., 2007, 17, (1), pp. 69–75

[9] García-Pérez, I., Vallejo, M., Garca, A., et al.: ‘Metabolic fingerprinting with capillary electrophoresis’, J. Chromatogr. A, 2008, 1204, (2), pp. 130–139

[10] Krawczyk, M., Gerkmann, T.: ‘STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement’, IEEE/ACM Trans. Audio Speech Lang. Process., 2014, 22, (12), pp. 4065–4080

Fig. 8 Errors between the real LPA-CE signal and the denoised signals

(a) Error between Figs. 5a and 6a, (b) Error between Figs. 5a and 6b, (c) Error between Figs. 5a and 6c, (d) Error between Figs. 5a and 6d

Table 3 Results of statistic W obtained by various methods on error signals

CE signal SR WTM MSP Ours

LPA-CE 0.625 0.984 0.992 0.994

PVP-CE 0.893 0.992 0.996 0.998

[11] Perrin, C., Walczak, B., Massart, D.: ‘The use of wavelets for signal denoising in capillary electrophoresis’, Anal. Chem., 2001, 73, (20), pp. 4903–4917

[12] Cao, H., Fan, F., Zhou, K., et al.: ‘Wheel-bearing fault diagnosis of trains using empirical wavelet transform’, Measurement, 2016, 82, (2016), pp. 439–449

[13] Yang, C., He, Z., Yu, W.: ‘Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis’, BMC Bioinf., 2009, 10, (1), pp. 4–10

[14] Paredes, J.L.: ‘A baseline correction algorithm for capillary electrophoresis data using local optimization of the legend algorithm in the wavelet domain’, Interciencia, 2009, 34, (8), pp. 556–562

[15] Stewart, R., Gideoni, I., Zhu, Y.: ‘Signal processing methods for capillary electrophoresis’, in Yang, N.S. (Eds.): ‘Systems and computational biology – bioinformatics and computational modeling’ (InTech, USA, 2011), pp. 311–333

[16] Abdelnour, F.: ‘Symmetric tight frame wavelets with dilation factor M = 4’, Signal Process., 2011, 91, (12), pp. 2852–2863

[17] Wang, Y., Gao, Q.: ‘Spatially adaptive stationary wavelet thresholding for the denoising of DNA capillary electrophoresis signal’, J. Anal. Chem., 2008, 63, (8), pp. 841–847

[18] Xu, X., Wang, Y., Chen, S.: ‘Medical image fusion using discrete fractional wavelet transform’, Biomed. Signal Proc. Control, 2016, 27, pp. 103–111

[19] Shahdoosti, H.R., Hazavei, S.M.: ‘Image denoising in dual contourlet domain using hidden Markov tree models’, Digit. Signal Process., 2017, 67, pp. 17–29

[20] Rencker, L., Bach, F., Wang, W., et al.: ‘Sparse recovery and dictionary learning from non-linear compressive measurements’, IEEE Trans. Signal Process., 2019, 67, (21), pp. 5659–5670

[21] Chen, W.: ‘Simultaneously sparse and low-rank matrix reconstruction via non-convex and non-separable regularization’, IEEE Trans. Signal Process., 2018, 66, (20), pp. 5313–5323

[22] Aharon, M., Elad, M., Bruckstein, A.: ‘K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation’, IEEE Trans. Signal Process., 2006, 54, (11), pp. 4311–4322

[23] Dian, R., Li, S., Fang, L.: ‘Multispectral and hyperspectral image fusion with spatial-spectral sparse representation’, Inf. Fusion, 2019, 49, (2019), pp. 262–270

[24] Han, Q., Xie, Q., Peng, S., et al.: ‘Simultaneous spectrum fitting and baseline correction using sparse representation’, Analyst, 2017, 142, (13), pp. 2460–2468

[25] Lu, Y., Gao, Q., Sun, D., et al.: ‘SAR speckle reduction using Laplace mixture model and spatial mutual information in the directionlet domain’, Neurocomputing, 2016, 173, (3), pp. 633–644

[26] Kreutz-Delgado, K., Murray, J.F., Rao, B.D., et al.: ‘Dictionary learning algorithms for sparse representation’, Neural Comput., 2003, 15, (2), pp. 349–396

[27] Rubinstein, R.: ‘Dictionaries for sparse representation modeling’, Proc. IEEE, 2010, 98, (6), pp. 1045–1057

[28] Lesage, S., Gribonval, R., Bimbot, F., et al.: ‘Learning unions of orthonormal bases with thresholded singular value decomposition’. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Philadelphia, PA, USA, 2005, vol. 5, pp. 293–296

[29] Zou, L., He, Q., Wu, J.: ‘Source cell phone verification from speech recordings using sparse representation’, Digit. Signal Process., 2017, 62, pp. 125–136

[30] Cai, J.F., Ji, H., Shen, Z., et al.: ‘Data-driven tight frame construction and image denoising’, Appl. Comput. Harmon. Anal., 2014, 37, pp. 89–105

[31] Zou, H., Hastie, T., Tibshirani, R.: ‘Sparse principal component analysis’, J. Comput. Graph. Stat., 2006, 15, (2), pp. 265–286

[32] Stewart, R., Wee, A., Grayden, D.B., et al.: ‘Capillary electrophoresis (CE) peak detection using a wavelet transform technique’. Proc. SPIE, Biomedical Applications of Micro- and Nanoengineering IV and Complex Systems, Melbourne, Australia, 2008, vol. 7270, pp. 1–12

[33] Graves-Morris, P.R., Fell, A.F., Bensalem, M.: ‘Parameterisation of symmetrical peaks in capillary electrophoresis using [3/2]-type rational approximates’, J. Comput. Appl. Math., 2006, 189, (1–2), pp. 220–227

[34] Donoho, D.L., Johnstone, J.M.: ‘Ideal spatial adaptation by wavelet shrinkage’, Biometrika, 1994, 81, (3), pp. 425–455

[35] Chang, S.G., Yu, B., Vetterli, M.: ‘Spatially adaptive wavelet thresholding with context modeling for image denoising’, IEEE Trans. Image Process., 2000, 9, (9), pp. 1522–1531