**Chapter 2: Creating the Core Model and Analytical Tools**

**2.2 Laboratory Experiments**

**2.2.3 Denoising and Detrending**

Theoretically, it would of course be possible to carry out a comprehensive analysis directly on the raw data sampled from the scintillation counter. However, it is generally considered prudent to first filter out misleading perturbations, mainly classified as noise and trends, through specialized procedures known as denoising and detrending, respectively.

The Origin and Treatment of Noise

When considering the origin of these effects, it is useful to remember how fundamentally stochastic our world is. Apart from fluctuations in expression rates and protein concentrations, which have even been pointed to as potentially essential features of the circadian clock, there could also be random variation in the reporter reaction with luciferin, or in the levels of emitted light. Not least, there are also inherent inaccuracies in detecting photons and recording the corresponding data, a phenomenon that is well characterized in the context of photography and digital imaging. All this underlying "noisy" variability, although in its sum often approximately following a normal (Gaussian) distribution, can add up to create phantom peaks and troughs, or to distort phase and period relationships. Depending on its precise cause, the noise can further be classified as either correlated or uncorrelated, but generally a good approximation can be achieved in denoising approaches by treating most deviations as independent and identically distributed. One possible avenue for handling noise consists of employing more robust data analysis tools, such as complex signal transforms, that are inherently better at filtering out small, random distortions than simple maxima/minima detection algorithms. Alternatively, it is also possible to filter out noisy patterns in a separate step, for example by utilizing averaging filters or detail filters, even if this oftentimes requires a trade-off between data fidelity, noise reduction, and computational cost.

The Use of Discrete Wavelet Transforms

Specifically, discrete wavelet transforms can be utilized to decompose a dataset into discrete subbands with a corresponding set of wavelet coefficients. Here, the high-frequency subbands describe the finer details of the signal, which usually contain the noise component. Provided that this high-frequency component is small relative to the overall signal, simply setting its coefficients to zero can be an effective avenue for "killing the noise". The basic procedure is often further refined into thresholding, a technique that relies on a framework of cut-off values to cancel all coefficients deemed insignificant, before reassembling the complex, but now denoised, dataset through an inverse wavelet transformation. Other possible adaptations include hybrid schemes of wavelet transforms and optimization algorithms, which can for instance be used to effectively remove non-stationary noise from electrocardiogram (ECG) signals. Here, the critical selection of wavelet denoising parameters is guided by a genetic algorithm, resulting in maximized filtration performance with significantly improved quality and signal-to-noise ratio when compared to wavelet thresholding algorithms in the same setting (El-Dahshan 2010).
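As an illustration of the decompose, threshold, and reconstruct procedure described above, the following is a minimal sketch in plain Python. It uses the Haar wavelet and soft thresholding purely for simplicity; the function names, the choice of wavelet, and the fixed threshold are assumptions made for this example and do not reproduce the wavelet packet algorithm behind MATLAB's wpdencmp or the settings used in this project.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient towards zero; small ones become exactly zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def haar_denoise(signal, levels, threshold):
    """Threshold the detail coefficients of a multi-level Haar decomposition.

    Assumption for this sketch: len(signal) is divisible by 2**levels.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(soft_threshold(detail, threshold))
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx
```

With a threshold of zero the input is reconstructed unchanged, while a threshold larger than every detail coefficient leaves only the coarse approximation. In practice the threshold would be chosen from an estimate of the noise level rather than fixed by hand.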

In the context of this project, a discrete wavelet transform was carried out on the experimental data using the inbuilt wpdencmp function of the MATLAB® computational software suite, as well as using the dedicated application WAVOS. However, it was observed that analysis using different signal transforms could detect no notable differences between the original and denoised data, pointing both to the overall reliability of the data analysis tools and to the relative clarity and smoothness of the experimental readouts. Furthermore, in the context of the simulated data, the very nature of the data generation should not produce appreciable levels of noise, except where explicitly desired through the use of stochastic simulation approaches, and so denoising is not deemed appropriate in this context.


Recognizing Underlying Trends

The second type of signal distortion to be considered, before quantifying the entrainment effect of light on an asynchronous cell population by such measures as amplitude decay or the amplitude just after the pulse, is the existence of underlying trends in the data set. Such trends would hinder the quantitative analysis and may occur in bioluminescence circadian rhythms in cultured cells for several reasons. Firstly, the response of cell cultures to different treatments is not only inherently variable, but may also be influenced by unaccounted factors. Secondly, the rhythms of the cell cultures exhibit damping, or in other words variance non-stationarities. Thirdly, these rhythms often show unstable baseline shifting, i.e. mean non-stationarities, the exact extent of which may change from experiment to experiment, or even from sample to sample. The factors that could give rise to these various non-stationarities across the time series, and to the variability between individual cells, sample populations, or test runs, are likewise multifaceted. Next to the more general stochastic effects pointed to above, which in the case of "fabricated" trends are likely of a different, slower quality, there are also countless potentially relevant surrounding conditions, such as the physiological state and age of the cell populations, the existence of background temperature fluctuations, or artefacts related to the handling of the sample and collection of the data. It can be downright impossible on a practical level to control all these effects, and while some trends may point to valuable insights, it is oftentimes preferable to reduce any corrupting influence before further data analysis. Accordingly, various approaches exist for removing these trends.
One relatively simple procedure for removing baseline drift involves calculating and subtracting a moving average from the raw data, while MATLAB® also provides an automated detrend function that subtracts either the mean or, depending on the option chosen, a least-squares best-fit line from the signal. The statistical self-affinity can also be evaluated with the use of detrended fluctuation analysis (DFA), which is related to spectral techniques such as autocorrelation, and is frequently employed for long-memory processes, even where mean or variance are found to be non-stationary. Although DFA has gained much popularity since its introduction in 1994 by Peng et al. (Peng et al. 1995), various updated techniques for the detection of long-range correlations have also been suggested, including a Modified Detrended Fluctuation Analysis and the Centered Moving Average (CMA) method. In particular, a recent comparison found that, at least for weak trends, CMA performs comparably to DFA on long data, but gives better results on short data (Bashan et al. 2008). Finally, data-driven techniques for decomposing multi-component signals include Empirical Mode Decomposition (EMD), which can be employed for both detrending and denoising by making use of partial reconstructions ("Detrending and denoising with empirical mode decompositions"). In this context it is interesting to note that it is difficult to distil a precise definition of a trend, but it has been demonstrated for climate data how EMD can be utilized to determine intrinsic trends and natural variability, namely by identifying an intrinsically determined monotonic function, or alternatively a function with at most one extremum, within a certain temporal span (Wu et al. 2007).

Detrending According to Moving Averages

In the case of removing possible masking effects from the circadian rhythms under investigation here, it is found that satisfactory results can be readily achieved by detrending traces on the basis of a 24-hour moving average; for an example, see Figure 11. After all, the oscillating signal appears to constitute a relatively strong pattern around a naturally apparent anchor point, so that removal of the underlying distortion is well suited to an appropriately calibrated moving average approach. In the context of the simulated results, however, it should once again be noted that all inputs are perfectly controlled by the computational environment, signifying that detrending might be, if anything, counterproductive for these sets of data. In fact, the ability to model even very long time spans in a tightly regulated environment is a good example of the many advantages of a well-established simulation system.
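The 24-hour moving average detrending described above can be sketched as follows. This is a minimal illustration in plain Python rather than the implementation used in this project: the function names, the samples_per_hour parameter, and the symmetric shrinking of the window near the ends of the trace are all assumptions made for the example.

```python
def moving_average(values, window):
    """Centred moving average over 2 * (window // 2) + 1 samples.

    Near the ends of the trace the window shrinks symmetrically, so that
    every average stays centred on the sample it belongs to.
    """
    n = len(values)
    half = window // 2
    trend = []
    for i in range(n):
        k = min(half, i, n - 1 - i)  # largest symmetric half-width available
        segment = values[i - k:i + k + 1]
        trend.append(sum(segment) / len(segment))
    return trend


def detrend_moving_average(values, samples_per_hour, window_hours=24.0):
    """Return (trend, residual) for an evenly sampled trace.

    samples_per_hour is a hypothetical parameter describing the sampling
    rate; the window covers approximately window_hours of data.
    """
    window = max(1, int(round(window_hours * samples_per_hour)))
    trend = moving_average(values, window)
    residual = [v - t for v, t in zip(values, trend)]
    return trend, residual
```

Because the window spans roughly one full circadian cycle, the oscillation largely averages out of the trend, so that subtracting the trend removes slow baseline drift while leaving the rhythm in the residual. The first and last half-windows are averaged over fewer samples and should be interpreted with more caution.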


FIGURE 11 Detrending

Example of 24-hour moving average detrending on a light pulse trace. The top panel shows the raw data, the middle panel the trend that is removed, and the bottom panel the residual detrended data.