
2.4 Non-stationary signals

2.4.2 Analytic tools in the time-frequency domain

2.4.2.2 Time-frequency signal decompositions

2.4.2.2.7 Matching pursuit—MP

The two methods presented above, the spectrogram and the scalogram, work well, but they are restricted by the a priori trade-off between time and frequency resolution in different regions of the time-frequency space. This trade-off does not follow the structures of the signal. In fact, interpreting a spectrogram or scalogram requires understanding which aspects of the representation are due to the signal and which are due to the properties of the method.

A time-frequency signal representation that adjusts to the local signal properties is possible in the framework of matching pursuit (MP) [Mallat and Zhang, 1993].

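In its basic form MP is an iterative algorithm that in each step finds an element $g_{\gamma_n}$ (atom) from a set of functions $D$ (dictionary) that best matches the current residue $R^n x$ of the decomposition of signal $x$, the zeroth residue being the signal itself:

$$
\begin{cases}
R^0 x = x \\
R^n x = \langle R^n x, g_{\gamma_n} \rangle \, g_{\gamma_n} + R^{n+1} x \\
g_{\gamma_n} = \arg\max_{g_{\gamma_i} \in D} |\langle R^n x, g_{\gamma_i} \rangle|
\end{cases}
\tag{2.109}
$$

where $\arg\max_{g_{\gamma_i} \in D}$ means the atom $g_{\gamma_i}$ which gives the highest (in absolute value) inner product with the current residue $R^n x$. Note that the second equation in (2.109) leads to orthogonality of $g_{\gamma_n}$ and $R^{n+1} x$, so:

$$
\|R^n x\|^2 = |\langle R^n x, g_{\gamma_n} \rangle|^2 + \|R^{n+1} x\|^2 \tag{2.110}
$$

The signal can be expressed as:

$$
x = \sum_{n=0}^{k} \langle R^n x, g_{\gamma_n} \rangle \, g_{\gamma_n} + R^{k+1} x \tag{2.111}
$$

The norm of the residue tends to zero as $k \to \infty$ (for a complete dictionary). Thus in this limit we have:

$$
x = \sum_{n=0}^{\infty} \langle R^n x, g_{\gamma_n} \rangle \, g_{\gamma_n} \tag{2.112}
$$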

FIGURE 2.20: Illustration of the iterative MP decomposition. a) Signal to be decomposed (black) and the residue after six iterations (dotted line), b) from top to bottom the six consecutive selected atoms, c) the sum of the six atoms.

In this way signal x is represented as a weighted sum of atoms (waveforms) from the dictionary D. The iterative procedure is illustrated in Figure 2.20. Taking into account (2.110) we can see that the representation conserves the energy of the signal:

$$
\|x\|^2 = \sum_{n=0}^{\infty} |\langle R^n x, g_{\gamma_n} \rangle|^2 \tag{2.113}
$$

The MP decomposition can be considered as an extension of the atomic decompositions offered by STFT or CWT. The main advantage of the MP paradigm is the relaxation of the constraint between the frequency band and frequency resolution. The MP algorithm performs the decomposition in an extremely redundant set of functions, which results in a very flexible parametrization of the signal structures.

In principle the dictionary $D$ can be any set of functions. In practical implementations (e.g., http://eeg.pl/mp) the dictionary contains a base of Dirac deltas, a base of sinusoids and a set of Gabor functions:

$$
g_\gamma(t) = K(\gamma) \, e^{-\pi \left( \frac{t-u}{\sigma} \right)^2} \sin(2\pi f (t - u) + \phi) \tag{2.114}
$$

with $K(\gamma)$ a normalization factor such that $\|g_\gamma\| = 1$, and $\gamma = \{u, f, \sigma, \phi\}$ the parameters of the functions in the dictionary ($u$: time translation, $f$: frequency, $\sigma$: time width, $\phi$: phase).
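The iteration (2.109) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the dictionary is a deliberately tiny, coarse grid of sampled Gabor atoms (2.114), and the names `gabor_atom`, `matching_pursuit`, the grid values and the test signal are hypothetical choices made for this example, not the implementation referenced above.

```python
import math

def gabor_atom(n, u, f, sigma, phi):
    """Sampled Gabor function (2.114) on t = 0..n-1, scaled to unit norm."""
    raw = [math.exp(-math.pi * ((t - u) / sigma) ** 2)
           * math.sin(2 * math.pi * f * (t - u) + phi)
           for t in range(n)]
    norm = math.sqrt(sum(v * v for v in raw))  # plays the role of 1/K(gamma)
    return [v / norm for v in raw]

def dot(a, b):
    """Inner product of two real sequences of equal length."""
    return sum(p * q for p, q in zip(a, b))

def matching_pursuit(x, dictionary, n_iter):
    """Greedy iteration (2.109): returns [(atom index, coefficient)], residue."""
    residue = list(x)                       # R^0 x = x
    picks = []
    for _ in range(n_iter):
        products = [dot(residue, g) for g in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(products[i]))
        coef = products[best]
        picks.append((best, coef))
        # R^{n+1} x = R^n x - <R^n x, g> g
        residue = [r - coef * g for r, g in zip(residue, dictionary[best])]
    return picks, residue

# Hypothetical coarse parameter grid (u, f, sigma, phi); units are samples.
n = 128
params = [(u, f, s, p)
          for u in (16, 48, 80, 112)
          for f in (0.05, 0.1, 0.2)
          for s in (8.0, 16.0)
          for p in (0.0, math.pi / 2)]
dictionary = [gabor_atom(n, *g) for g in params]

# Test signal: two well-separated atoms with weights 3 and 1.5.
i_strong = params.index((16, 0.1, 8.0, 0.0))
i_weak = params.index((112, 0.2, 8.0, 0.0))
x = [3.0 * a + 1.5 * b
     for a, b in zip(dictionary[i_strong], dictionary[i_weak])]

picks, residue = matching_pursuit(x, dictionary, n_iter=5)
energy_signal = dot(x, x)
energy_parts = sum(c * c for _, c in picks) + dot(residue, residue)
```

Because every atom has unit norm, the sum of squared coefficients plus the energy of the final residue equals the energy of the signal at every iteration, which is the finite-step form of (2.110).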

The parameters $\gamma$ can be sampled in various ways. The original idea of Mallat [Mallat and Zhang, 1993] was to follow a dyadic scheme that mimics the oversampled discrete wavelets. For applications where the parameters are used to form statistics of the atoms, the dyadic sampling produced estimators that were biased by the structure of the dictionary. Stochastic dictionaries, which rely on randomization of the time-frequency coordinates and time width of atoms, were introduced to remove this bias [Durka et al., 2001a]. Further extensions of the MP algorithm allow for analysis of multivariate signals, i.e., multichannel and multi-trial decompositions [Durka et al., 2005b, Sielużycki et al., 2009a] (Sect. 3.6.3).

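The atomic decomposition of a signal can be used to produce the time-frequency energy distribution. In this approach the best properties of energy distributions (WVD) and atomic decompositions can be joined. The WVD of the whole decomposition is:

$$
W_x(t, f) = \sum_{n=0}^{\infty} |\langle R^n x, g_{\gamma_n} \rangle|^2 \, W_{g_{\gamma_n}}(t, f) + \sum_{n=0}^{\infty} \sum_{m=0, m \neq n}^{\infty} \langle R^n x, g_{\gamma_n} \rangle \langle R^m x, g_{\gamma_m} \rangle^{*} \, W_{g_{\gamma_n} g_{\gamma_m}}(t, f) \tag{2.115}
$$

where

$$
W_{g_{\gamma_n}}(t, f) = \int g_{\gamma_n}\!\left(t + \frac{\tau}{2}\right) g^{*}_{\gamma_n}\!\left(t - \frac{\tau}{2}\right) e^{-i 2\pi f \tau} \, d\tau \tag{2.116}
$$

is the WVD of individual atoms and $W_{g_{\gamma_n} g_{\gamma_m}}$ denotes the cross-WVD of a pair of atoms. The double sum in equation (2.115) corresponds to the cross-terms, but since it is given explicitly, it can be omitted, yielding the estimator of the energy density distribution in the form:

$$
E_x^{MP}(t, f) = \sum_{n=0}^{\infty} |\langle R^n x, g_{\gamma_n} \rangle|^2 \, W_{g_{\gamma_n}}(t, f) \tag{2.117}
$$

This interpretation is valid since the normalization of atoms (2.114):

$$
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} W_{g_{\gamma_n}}(t, f) \, dt \, df = \|g_{\gamma_n}\|^2 = 1 \tag{2.118}
$$

together with (2.113) leads to:

$$
\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} E_x^{MP}(t, f) \, dt \, df = \|x\|^2 \tag{2.119}
$$

This representation has implicitly no cross-terms and for Gabor atoms offers the highest time-frequency resolution (Sect. 1.4.7).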

The MP dictionary can be adjusted to be coherent with particular structures in the signal. A dictionary containing asymmetric functions was designed and proved useful in describing components with different time courses of the rising and decaying parts [Jedrzejczak et al., 2009] (Sect. 4.5.2).

2.4.2.2.8 Comparison of time-frequency methods

The complex character of biomedical signals and their importance in health research and clinical practice have brought a wide variety of signal analysis methods into use in biomedical research. The most widespread are the spectral methods, which make it possible to identify the basic rhythms present in the signal. Conventional methods of analysis assumed stationarity of the signal, in spite of the fact that interesting processes are often reflected in fast dynamic changes of the signal. This motivated the application of methods operating in the time-frequency space.

The available time-frequency methods can be roughly divided into two categories:

• Those that give directly continuous estimators of energy density in the time-frequency space

• Those that decompose the signal into components localized in the time-frequency space, which can be described by sets of parameters; in a second step the components can be used to create estimators of the time-frequency energy density distribution

An example of time-frequency energy distributions obtained by means of different methods is shown in Figure 2.21.

In the first category, Cohen's class of time-frequency distributions, one obtains the time-frequency estimators of energy density directly, without decomposing the signal into some predefined set of simple elements. This allows for maximal flexibility in expressing the time-frequency content of the signal. However, there are two consequences:

• The first consequence is the lack of a parametric description of the signal structures

• The second consequence is that, no matter how much the signal structures are separated in time or frequency, they interfere and produce cross-terms.

The problem of compromise between time and frequency resolution manifests when one selects the proper filter kernel to suppress the cross-terms.

In the second category the most natural transition from spectral analysis to analysis in the time-frequency space is the use of the short time Fourier transform (STFT) and a representation of energy density derived from it, the spectrogram. The positive properties of this approach are the speed of computations and the time and frequency shift invariance, which makes the resulting time-frequency energy density maps easy to interpret. The main drawbacks are: (1) the a priori fixed compromise between time and frequency resolution in the whole time-frequency space, which results in smearing of the time-frequency representation, (2) the presence of cross-terms between neighboring time-frequency structures.

Another common choice in the second category is the CWT. From the practical point of view the main difference from the STFT lies in a different compromise between the time and frequency resolution. In the case of CWT, one sacrifices time resolution for better frequency resolution of low-frequency components and vice versa for higher-frequency components; also, a change of the frequency of a structure leads to a change of the frequency resolution.

FIGURE 2.21: Comparison of energy density in the time-frequency plane obtained by different estimators for a signal e): a) spectrogram, b) discrete wavelet transform, c) Choi-Williams transform, d) continuous wavelets, f) matching pursuit. Construction of the simulated signal is shown in (e); the signal consists of: a sinusoid, two Gabor functions with the same frequency but different time positions, a Gabor function with frequency higher than the previous pair, and an impulse. From [Blinowska et al., 2004b].

STFT and CWT can be considered as atomic representations of the signal, and as such give a certain parametric description of the signals. However, the representation is not sparse; in other words, there are too many parameters, hence they are not very informative.

The sparse representation of the signal is provided by DWT and MP, which leads to efficient parameterization of the time series. The DWT can decompose the signal into a base of functions, that is a set of waveforms that has no redundancy. There are fast algorithms to compute the DWT. Similar to CWT, the DWT has poor time resolution for low frequencies and poor frequency resolution for high frequencies. The DWT is very useful in signal denoising or signal compression applications. The lack of redundancy has a consequence in the loss of time and frequency shift invariance.

DWT may be appropriate for time-locked phenomena, but much less so for transients appearing in time at random, since the parameters describing a given structure depend on its location inside the considered window.

The decomposition based on the matching pursuit algorithm offers a step-wise adaptive compromise between the time and frequency resolution. The resulting decomposition is time and frequency shift invariant. The time-frequency energy density estimator derived from the MP decomposition contains no cross-terms by construction, which leads to clean and easy-to-interpret time-frequency maps of energy density. The price for the excellent properties of the MP decomposition is the higher computational complexity.

The sparsity of the DWT and MP decompositions has a different character, which affects their applicability. DWT is especially well suited to describing time-locked phenomena since it provides common bases. MP is especially useful for structures appearing in the time series at random. The sparsity of MP stems from the very redundant set of functions, which makes it possible to represent the signal structures with a limited number of atoms. The MP decomposition gives a parameterization of the signal structures in terms of amplitude, frequency, time of occurrence, and time and frequency span, which are close to the intuition of practitioners.

2.4.2.2.9 Empirical mode decomposition and Hilbert-Huang transform

The Hilbert-Huang transform (HHT) was proposed by Huang et al. [Huang et al., 1998]. It consists of two general steps:

• The empirical mode decomposition (EMD) method to decompose a signal into the so-called intrinsic mode functions (IMFs)

• The Hilbert spectral analysis (HSA) method to obtain instantaneous frequency data

The HHT is a non-parametric method and may be applied for analyzing non-stationary and non-linear time series data.

Empirical mode decomposition (EMD) is a procedure for decomposition of a signal into so-called intrinsic mode functions (IMFs). An IMF is any function with the same number of extrema and zero crossings (or numbers differing by at most one), with its envelopes being symmetric with respect to zero. The definition of an IMF guarantees a well-behaved Hilbert transform of the IMF. The procedure of extracting an IMF is called sifting. The sifting process is as follows:

1. Between each successive pair of zero crossings, identify a local extremum in the signal.

2. Connect all the local maxima by a cubic spline line as the upper envelope Eu(t).

3. Repeat the procedure for the local minima to produce the lower envelope El(t).

4. Compute the mean of the upper and lower envelope: $m_{11}(t) = \frac{1}{2}(E_u(t) + E_l(t))$.

5. A candidate $h_{11}$ for the first IMF component is obtained as the difference between the signal $x(t)$ and $m_{11}(t)$: $h_{11}(t) = x(t) - m_{11}(t)$.

In the general case the first candidate $h_{11}$ does not satisfy the IMF conditions. In such a case the sifting is repeated taking $h_{11}$ as the signal. The sifting is repeated iteratively:

$$
h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \tag{2.120}
$$

until the assumed threshold for the standard deviation SD computed for two consecutive siftings is achieved. The SD is defined as:

$$
SD = \sum_{t=0}^{T} \frac{|h_{1(k-1)}(t) - h_{1k}(t)|^2}{h^2_{1(k-1)}(t)} \tag{2.121}
$$

with the sum running over all samples $t = 0, \dots, T$ of the signal. The authors of the method suggest an SD of 0.2–0.3 [Huang et al., 1998]. At the end of the sifting process, after $k$ iterations, the first IMF is obtained:

$$
c_1 = h_{1k} \tag{2.122}
$$
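The sifting steps and the stopping rule can be sketched as follows. This is a simplified illustration, not the procedure of Huang et al. verbatim: the envelopes are piecewise-linear instead of cubic splines (to keep the sketch free of external libraries), they are held constant outside the first and last extremum, and the stopping measure is the ratio of sums $\sum |h_{1(k-1)} - h_{1k}|^2 / \sum h^2_{1(k-1)}$ rather than the sum of per-sample ratios in (2.121), which avoids division by samples close to zero. All function names are hypothetical.

```python
import math

def local_extrema(x):
    """Indices of interior local maxima and minima of a sampled signal."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(idx, x):
    """Piecewise-linear envelope through (i, x[i]) for i in idx.
    Assumption: constant continuation outside the first/last extremum."""
    env, k = [], 0
    for t in range(len(x)):
        while k + 1 < len(idx) and idx[k + 1] <= t:
            k += 1
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            i0, i1 = idx[k], idx[k + 1]
            w = (t - i0) / (i1 - i0)
            env.append((1 - w) * x[i0] + w * x[i1])
    return env

def sift_once(x):
    """One sifting pass: subtract the mean of upper and lower envelope."""
    maxima, minima = local_extrema(x)
    upper = envelope(maxima, x)
    lower = envelope(minima, x)
    return [v - 0.5 * (u + l) for v, u, l in zip(x, upper, lower)]

def sd_ratio(h_prev, h):
    """Ratio-of-sums variant of the SD criterion (see lead-in)."""
    num = sum((a - b) ** 2 for a, b in zip(h_prev, h))
    den = sum(a * a for a in h_prev)
    return num / den

def first_imf(x, threshold=0.3, max_iter=20):
    """Repeat sifting until two consecutive results are close enough."""
    h_prev = list(x)
    for _ in range(max_iter):
        maxima, minima = local_extrema(h_prev)
        if len(maxima) < 2 or len(minima) < 2:
            break                      # too few extrema to build envelopes
        h = sift_once(h_prev)
        if sd_ratio(h_prev, h) < threshold:
            return h
        h_prev = h
    return h_prev

# Example: a 5 Hz oscillation riding on a linear trend, 1000 samples over 1 s.
n = 1000
t_axis = [k / n for k in range(n)]
signal = [math.sin(2 * math.pi * 5 * tk) + 2 * tk for tk in t_axis]
candidate = sift_once(signal)
```

Away from the edges of the record, a single pass already removes the linear trend from this example and leaves the oscillation; near the edges the result depends on how the envelopes are continued, which is one of the practical difficulties of EMD.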

The $c_1$ mode should contain the shortest-period component of the signal. Subtracting it from the signal gives the first residue:

$$
r_1 = x(t) - c_1 \tag{2.123}
$$

The procedure of finding consecutive IMFs can be iteratively continued until the variance of the residue is below a predefined threshold, or the residue becomes a monotonic function, the trend (the next IMF cannot be obtained). The signal can be expressed as a sum of the $n$ empirical modes and a residue:

$$
x(t) = \sum_{i=1}^{n} c_i + r_n \tag{2.124}
$$

Each of the components can be expressed by means of a Hilbert transform as a product of instantaneous amplitude $a_j(t)$ and an oscillation with instantaneous frequency $\omega_j(t)$ (Sect. 2.4.1): $c_j = a_j(t) e^{i \int \omega_j(t) dt}$. Substituting this into (2.124) and leaving out the residue gives a representation of the signal in the form:

$$
x(t) = \sum_{j=1}^{n} a_j(t) \, e^{i \int \omega_j(t) dt} \tag{2.125}
$$

Equation (2.125) makes it possible to construct a time-frequency representation, the so-called Hilbert spectrum. The weight assigned to each time-frequency coordinate is the local amplitude.

2.5 Non-linear methods of signal analysis

Non-linear methods of signal analysis were inspired by the theory of non-linear dynamics; indeed, a biomedical signal may be generated by a non-linear process. Dynamical systems are usually defined by a set of first-order ordinary differential equations acting on a phase space. The phase space is a finite-dimensional vector space $R^m$, in which a state $x \in R^m$ is defined. For a deterministic system we can describe the dynamics by an explicit system of $m$ first-order ordinary differential equations:

$$
\frac{dx(t)}{dt} = f(t, x(t)), \quad x \in R^m \tag{2.126}
$$

If the time is treated as a discrete variable, the representation can take a form of an m-dimensional map:

$$
x_{n+1} = F(x_n), \quad n \in Z \tag{2.127}
$$

A sequence of points $x_n$ or $x(t)$ solving the above equations is called a trajectory of the dynamical system. Typical trajectories can run away to infinity or can be confined to a certain region of space, depending on $F$ (or $f$). An attractor is a geometrical object to which a dynamical system evolves after a long enough time. An attractor can be a point, a curve, a manifold, or a complicated object called a strange attractor. An attractor is considered strange if it has a non-integer dimension. Non-linear chaotic systems are described by strange attractors.

Now we have to face the problem that what we observe is not a phase space object but a time series and, moreover, we don't know the equations describing the process that generates it. The delay embedding theorem of Takens [Takens, 1981] provides the conditions under which a smooth attractor can be reconstructed from a sequence of observations of the state of a dynamical system. The reconstruction preserves the properties of the dynamical system that do not change under smooth coordinate changes. A reconstruction in $m$ dimensions can be obtained by means of the retarded vectors:

$$
\xi(t_i) = \left( x(t_i), x(t_i + \tau), \dots, x(t_i + (m - 1)\tau) \right) \tag{2.128}
$$

The number $m$ is called the embedding dimension, and $\tau$ is called the delay time or lag.

According to the theorem, almost any scalar observation (e.g., a time series) is sufficient to learn about the evolution of a very complex, high-dimensional deterministic system. However, we don't know in advance how long the retarded vector must be and we don't know the delay $\tau$.

Choosing too small a value of $\tau$ would give a trivial result, and too large a $\tau$ would obscure the information about the original system. Usually the time coordinate of the first minimum of the autocorrelation function is taken as $\tau$. The phase portrait of an ECG obtained by embedding in three-dimensional space is shown in Figure 2.22. Please note (picture b) the distortion of the phase portrait for too large $\tau$. Finding $m$ is even more complex. The method of false neighbors [Kantz and Schreiber, 2000] is difficult to apply and doesn't give unequivocal results. Usually the embedding dimension is found by increasing the dimension step by step.
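The construction of the retarded vectors (2.128) and the autocorrelation-based choice of the delay can be sketched in plain Python. The sketch assumes a uniformly sampled signal with the delay expressed in samples; the function names `delay_embed` and `first_acf_minimum` and the sine test signal are illustrative, and the autocorrelation is the simple unnormalized sum of lagged products.

```python
import math

def delay_embed(x, m, tau):
    """Retarded vectors (2.128): (x[i], x[i+tau], ..., x[i+(m-1)*tau]).
    tau is given in samples (assumption: uniform sampling)."""
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this m and tau")
    return [tuple(x[i + k * tau] for k in range(m)) for i in range(n_vectors)]

def first_acf_minimum(x):
    """Lag (in samples) of the first local minimum of the autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]

    def acf(k):
        # Unnormalized autocorrelation at lag k
        return sum(c[t] * c[t + k] for t in range(n - k))

    prev, cur = acf(0), acf(1)
    for k in range(1, n // 2):
        nxt = acf(k + 1)
        if cur < prev and cur <= nxt:
            return k
        prev, cur = cur, nxt
    return None  # no minimum found within the first half of the record

# Example: a sine with a period of 40 samples.
sine = [math.sin(2 * math.pi * t / 40) for t in range(400)]
tau = first_acf_minimum(sine)
vectors = delay_embed(sine, m=3, tau=tau)
```

For a pure sine the autocorrelation is itself an oscillation with the same period, so its first minimum lies at half a period; for real signals the minimum is usually less sharply defined and the choice of $\tau$ should be checked by inspecting the resulting phase portrait.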

2.5.1 Lyapunov exponent

Lyapunov exponents describe the rates at which nearby trajectories in phase space converge or diverge; they provide estimates of how long the behavior of a mechanical system is predictable before chaotic behavior sets in. The Lyapunov exponent, or Lyapunov characteristic exponent, of a dynamical system is a quantity that characterizes the rate of separation of infinitesimally close trajectories. Quantitatively, the separation of two trajectories in phase space with initial distance $\Delta Z_0$ can be characterized by the formula:

$$
|\Delta Z(t)| \approx e^{\lambda t} |\Delta Z_0| \tag{2.129}
$$

where $\lambda$ is the Lyapunov exponent. A positive Lyapunov exponent means that the trajectories are diverging, which is usually taken as an indication that the system is chaotic. The number of Lyapunov exponents is equal to the number of dimensions of the phase space.

FIGURE 2.22: Phase portraits of human ECG in three-dimensional space. A two-dimensional projection is displayed for two values of the delay $\tau$: (a) 12 ms and (b) 1200 ms. (c) represents the phase portrait constructed from simultaneously recorded signals from three ECG leads. From [Babloyantz and Destexhe, 1988].
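Formula (2.129) can be tried out numerically on a system whose equations are known. The sketch below is an illustration only: it uses the logistic map $x_{n+1} = 4 x_n (1 - x_n)$, a one-dimensional map of the form (2.127) that is not discussed in the text, and estimates $\lambda$ by following pairs of nearby trajectories for a few steps and averaging $\frac{1}{t} \ln(|\Delta Z(t)| / |\Delta Z_0|)$ over many starting points. The function names and all numerical settings are assumptions of this example; for this particular map the exponent is known to equal $\ln 2$.

```python
import math

def logistic(x, r=4.0):
    """Logistic map, used here only as a convenient chaotic test system."""
    return r * x * (1.0 - x)

def lyapunov_from_separation(x0=0.3, n_starts=5000, steps=12,
                             d0=1e-10, burn_in=100):
    """Average growth rate of a small separation d0 over `steps` iterations."""
    x = x0
    for _ in range(burn_in):          # discard the initial transient
        x = logistic(x)
    total, used = 0.0, 0
    for _ in range(n_starts):
        a = x
        # Perturb towards the interior so both points stay inside [0, 1]
        b = x + d0 if x < 0.5 else x - d0
        for _ in range(steps):
            a, b = logistic(a), logistic(b)
        sep = abs(b - a)
        if sep > 0.0:                 # skip the rare exact collapse
            total += math.log(sep / d0) / steps
            used += 1
        x = logistic(x)               # next starting point on the trajectory
    return total / used

lam = lyapunov_from_separation()
```

The number of steps must stay small enough for the separation to remain tiny compared with the size of the attractor; once the two trajectories are far apart, (2.129) no longer holds and the estimate saturates.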

2.5.2 Correlation dimension

The concept of generalized dimension (special cases of which are the correlation dimension and the Hausdorff dimension) was derived from the notion that geometrical objects have certain dimensions, e.g., a point has dimension 0, a line 1, a surface 2; in the case of chaotic trajectories the dimension is not an integer.

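The measure called correlation dimension was introduced by [Grassberger and Procaccia, 1983]. It involves the definition of the correlation sum $C(\varepsilon)$ for a collection of points $x_n$ in some vector space as the fraction of all possible pairs of points which are closer than a given distance $\varepsilon$ in a particular norm. The basic formula for $C(\varepsilon)$ is:

$$
C(\varepsilon) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=i+1}^{N} \Theta(\varepsilon - \|x_i - x_j\|) \tag{2.130}
$$

where $\Theta$ is the Heaviside step function,

$$
\Theta(x) =
\begin{cases}
0 & \text{for } x \leq 0 \\
1 & \text{for } x > 0
\end{cases}
\tag{2.131}
$$

The sum counts the pairs $(x_i, x_j)$ whose distance is smaller than $\varepsilon$. In the limit $N \to \infty$ and for small $\varepsilon$, we expect $C$ to scale like a power law, $C(\varepsilon) \propto \varepsilon^{D_2}$, so the correlation dimension $D_2$ is defined by:

$$
D_2 = \lim_{\varepsilon \to 0} \lim_{N \to \infty} \frac{\partial \log C(\varepsilon, N)}{\partial \log \varepsilon} \tag{2.132}
$$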

In practice, from a signal $x(n)$ the embedding vectors are constructed by means of the Takens theorem for a range of $m$ values. Then one determines the correlation sum $C(\varepsilon)$ for a range of $\varepsilon$ and for several embedding dimensions. Then $C(m, \varepsilon)$ is inspected for signatures of self-similarity, by constructing a double logarithmic plot of $C(\varepsilon)$ versus $\varepsilon$. If the curve does not change its character for successive $m$, we conjecture that the given $m$ is a sufficient embedding dimension, and $D_2$ is found as the slope of the plot of $\log C(\varepsilon, N)$ versus $\log \varepsilon$.
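The correlation sum and the slope of the double logarithmic plot can be sketched as follows. This is a brute-force illustration with the Euclidean norm; the function names and the test set (equally spaced points on a straight line, for which the slope should be close to 1) are assumptions of this example, and the slope is taken between just two values of $\varepsilon$ rather than fitted over a scaling range.

```python
import math

def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def correlation_sum(points, eps):
    """Fraction of pairs closer than eps (strict inequality, as the
    Heaviside function is 0 at 0)."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if distance(points[i], points[j]) < eps:
                count += 1
    return 2.0 * count / (n * (n - 1))

def log_log_slope(points, eps_small, eps_large):
    """Slope of log C(eps) versus log eps between two scales."""
    c_small = correlation_sum(points, eps_small)
    c_large = correlation_sum(points, eps_large)
    return ((math.log(c_large) - math.log(c_small))
            / (math.log(eps_large) - math.log(eps_small)))

# Test set: 201 equally spaced points on a line segment embedded in 2-D.
line_points = [(k / 200.0, 0.0) for k in range(201)]
slope_line = log_log_slope(line_points, 0.05, 0.2)
```

With embedding vectors of a measured signal in place of `line_points`, the same slope would be read off only in the range of $\varepsilon$ where the double logarithmic plot is approximately straight, and checked for several embedding dimensions as described above.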

The Hausdorff dimension $D_H$ may be defined in the following way: if for a set of points in $M$ dimensions the minimal number $N(l)$ of spheres of diameter $l$ needed to cover the set increases like:

$$
N(l) \propto l^{-D_H} \quad \text{for } l \to 0 \tag{2.133}
$$

then $D_H$ is the Hausdorff dimension. In general $D_H \geq D_2$; in most cases $D_H \approx D_2$.

We have to bear in mind that the definition (2.132) holds in the limit $N \to \infty$, so in practice the number of data points of the signal should be large. It has been pointed out by [Kantz and Schreiber, 2000] that $C(\varepsilon)$ can be calculated automatically, whereas a dimension may be assigned only as the result of a careful interpretation of these curves. The correlation dimension is a tool to quantify self-similarity (fractal, non-linear behavior) when it is known to be present. The correlation dimension can be calculated for any kind of signal, including a purely stochastic time series or colored noise, which doesn't mean that these series have a non-linear character. The approach which helps in distinguishing non-linear time series from stochastic or linear ones is the method of surrogate data (Sect. 1.6).

2.5.3 Detrended fluctuation analysis

Detrended fluctuation analysis (DFA) quantifies intrinsic fractal-like correlation properties of dynamic systems. A fundamental feature of a fractal system is scale-invariance, or self-similarity at different scales. In DFA the variability of the signal is analyzed with respect to local trends in data windows. The method makes it possible to detect long-range correlations embedded in a non-stationary time series. The procedure relies on the conversion of a bounded time series $x_t$ ($t \in N$) into an unbounded process $X_t$:

$$
X_t = \sum_{i=1}^{t} (x_i - \langle x_i \rangle) \tag{2.134}
$$

where $X_t$ is called the cumulative sum and $\langle x_i \rangle$ is the average in the window $t$. Then the integrated time series is divided into boxes of equal length $L$, and a local straight line (with slope and intercept parameters $a$ and $b$) is fitted to the data by the least squares method:

$$
E^2 = \sum_{i=1}^{L} (X_i - a i - b)^2 \tag{2.135}
$$

Next, the fluctuation, i.e., the root-mean-square deviation from the trend, is calculated over every window at every time scale:

$$
F(L) = \sqrt{\frac{1}{L} \sum_{i=1}^{L} (X_i - a i - b)^2} \tag{2.136}
$$

This detrending procedure followed by the fluctuation measurement is repeated over the whole signal for all time scales (different box sizes $L$). Next, a log−log graph of $L$ against $F(L)$ is constructed.

A straight line on this graph indicates statistical self-affinity, expressed as F(L) ∝ L^α. The scaling exponent α is calculated as the slope of a straight line fit to the log−log graph of L against F(L).
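The whole DFA procedure can be sketched in plain Python. The sketch makes a few assumptions that go beyond the description above: the mean subtracted in the cumulative sum is the mean of the whole series, boxes do not overlap and leftover samples at the end are dropped, and the squared deviations are pooled over all boxes of a given size before taking the root. The function names and the white-noise test signal are illustrative.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def profile(x):
    """Cumulative sum of the mean-subtracted series (global mean assumed)."""
    mean = sum(x) / len(x)
    out, total = [], 0.0
    for v in x:
        total += v - mean
        out.append(total)
    return out

def fluctuation(prof, box):
    """Root-mean-square deviation from local linear trends for one box size."""
    n_boxes = len(prof) // box
    idx = list(range(box))
    sq = 0.0
    for k in range(n_boxes):
        seg = prof[k * box:(k + 1) * box]
        a, b = fit_line(idx, seg)
        sq += sum((v - (a * i + b)) ** 2 for i, v in zip(idx, seg))
    return math.sqrt(sq / (n_boxes * box))

def dfa_exponent(x, boxes):
    """Slope of log F(L) versus log L."""
    prof = profile(x)
    log_l = [math.log(b) for b in boxes]
    log_f = [math.log(fluctuation(prof, b)) for b in boxes]
    alpha, _ = fit_line(log_l, log_f)
    return alpha

# Example: uncorrelated Gaussian noise, for which alpha is expected near 0.5.
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(8192)]
alpha_white = dfa_exponent(white_noise, [16, 32, 64, 128, 256])
```

The range of box sizes matters in practice: very small boxes are dominated by the fit itself, and very large boxes leave too few windows for a stable estimate of $F(L)$.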

The fluctuation exponent α has different values depending on the character of the

In document Practical Bio Medical Signal Analysis Using MATLAB (Page 80-200)