
2.4 Signal Reconstruction

2.4.1 S-Transform Inversion

The original ST and its discrete version are always invertible. More specifically, in the absence of intermediate modifications, ST_x[n, p] is the unique TF representation of x[n], which is then exactly recoverable from the respective transform. We emphasize here that invertibility follows from employing analysis windows that fulfill two generic and necessary conditions:

1. their sliding time responses span the whole time interval of N samples, during which the input signal is observed;

2. the composite frequency response of their modulated bandwidths covers the entire spectrum of the input signal with non-zero values.
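Both conditions can be checked numerically for a given set of windows. The following Python sketch is only an illustration under stated assumptions, not the implementation used in this work: it assumes periodic Gaussian windows w[n, q] whose width scales as N/|q| and whose discrete sum is one, and the record length N = 64 and all variable names are hypothetical.

```python
import numpy as np

N = 64                                            # hypothetical record length
k = (np.arange(N) + N // 2) % N - N // 2          # centred integers, FFT order
# Assumed Gaussian windows w[n, q] (rows: time n, columns: voice q), unit sum.
w = np.exp(-0.5 * (np.outer(k, k) / N) ** 2)
w /= w.sum(axis=0, keepdims=True)

# Condition 1: summed over its N circular shifts, a window contributes its
# discrete sum at every sample, so each voice weights the whole record.
time_coverage = w.sum(axis=0)

# Condition 2: modulating voice q moves its spectrum W[p, q] to W[p - q, q];
# the composite response adds these bandpass responses over all voices.
W = np.fft.fft(w, axis=0)
composite = np.zeros(N, dtype=complex)
for col, qi in enumerate(k):
    composite += np.roll(W[:, col], qi)

print("min time coverage:", time_coverage.min())
print("min |composite response|:", np.abs(composite).min())
```

A composite response that vanished at some bin would mean that the corresponding spectral component of the input is lost by the analysis, whatever synthesis method is applied afterwards.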

These requirements state that the transform should preserve the information of the signal analyzed. The sufficient conditions, instead, are specific to the procedure used to invert the transform. As we anticipated, there are two main synthesis methods to map ST_x[n, p] back to x[n], and both are discussed in [36]. On the one hand, the FI exploits the linear relation between the ST and the FT: every frequency component of the Fourier spectrum results from summing over time the corresponding local components of the ST, according to

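$$
\frac{1}{N}\sum_{q=-N/2}^{N/2-1}\left\{\sum_{m=0}^{N-1} ST_x[m,q]\right\} e^{j\frac{2\pi}{N}nq}
= \frac{1}{N}\sum_{q=-N/2}^{N/2-1} X[q]\, e^{j\frac{2\pi}{N}nq}
= \mathrm{DFT}^{-1}_{q}\{X[q]\} = x[n]. \tag{2.17}
$$

The ST is thus inverted by performing the inverse DFT of the one-dimensional function that is obtained by time-averaging the forward transform.

FIGURE 2.4: Amplitude of the composite transfer function of the bank of filters implementing the discrete ST.

On the other hand, the TI approximates x[n] as

$$
\tilde{x}[n] = \sum_{\substack{q=-N/2 \\ q\neq 0}}^{N/2-1} \frac{\sqrt{2\pi}}{|q|}\, ST_x[n,q]\, e^{j\frac{2\pi}{N}nq} + ST_x[n,0]
= \sum_{m=0}^{N-1} x[m] \left\{ \frac{1}{N}\sum_{q=-N/2}^{N/2-1} e^{-\frac{1}{2}\left(\frac{(n-m)q}{N}\right)^{2}} e^{j\frac{2\pi}{N}(n-m)q} \right\}
= x[n] \circledast i[n] \tag{2.18}
$$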

where i[n] denotes a smoothing function defined as the summation of all the windows, normalized and frequency-shifted:

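$$
i[n] = \sum_{\substack{q=-N/2 \\ q\neq 0}}^{N/2-1} \frac{\sqrt{2\pi}}{|q|}\, w[n,q]\, e^{j\frac{2\pi}{N}nq} + w[n,0]
= \frac{1}{N}\sum_{q=-N/2}^{N/2-1} e^{-\frac{1}{2}\left(\frac{nq}{N}\right)^{2}} e^{j\frac{2\pi}{N}nq}, \tag{2.19}
$$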

the frequency response of which is denoted by I[p] and depicted in Fig. 2.4. This function is real because it coincides with the DFT of Eq. 2.19, which is symmetric. The exact input signal is therefore obtained by deconvolving i[n] from the approximation x̃[n]. Eqs. 2.17 and 2.18 interchange the roles of time and frequency under different sufficient conditions and computational requirements. We may assess these aspects of the ST by analogy with the STFT in [33]. Indeed, the complementarity between the TI and the FI reflects the duality of the filter-bank summation (FBS) and the overlap-add (OLA) methods used to invert the STFT, both of which are reviewed in [38]. These two methods are motivated by the filter-bank and DFT interpretations of the STFT, respectively. In more detail, the OLA recovers x[n] by taking the IDFT of the cross sections of STFT_x[n, p] at each n at which the analysis is performed, then summing the results over n. As such, the complexity of this procedure is dominated by that of the DFT, which is on the order of O(N_w log2 N_w) operations per sample if FFT blocks are used. Since the FI performs the same procedure but in the opposite order, it likewise requires on the order of O(N log2 N) operations per sample. The OLA ensures the STFT invertibility if the analysis window, shifted by the samples at which the STFT is evaluated, sums to the area under the window, or equivalently, if the sampling rate in time of the STFT is dense enough: at least twice the cut-off frequency of the prototype lowpass filter used to perform the analysis. A generalized formulation of this constraint may be applied to frequency-dependent windows for the ST as

tiples of this factor (i.e., w[kN] = 0 with k ∈ Z). The constraint may be generalized for picking frequency-dependent windows according to

$$
\frac{1}{N}\sum_{q} \frac{w[n,q]}{w[0,q]}\, e^{j\frac{2\pi}{N}nq} = \delta[n] \tag{2.21}
$$

where δ[n] is the Kronecker delta function, and the summation runs over all the analyzed voices. Any fixed-resolution window for short-time analysis satisfies the above constraint if the number of frequency bins is at least equal to the window duration N_w, or equivalently, if the STFT is sampled in the frequency domain in accordance with the Nyquist theorem. This fact is exploited in [44] to build a vocoder, which is an analysis-synthesis system with limited data rate for speech signal processing. By contrast, a set of windows for multi-resolution analysis can hardly comply with Eq. 2.21 without compromising the underlying TF resolution trade-off, unless it is specifically designed to do so. The Gaussian windows used in the original ST are the optimal choice for TF analysis, but they do not satisfy this constraint even when N voices are analyzed. Indeed, their modulated frequency responses make up a set of parallel bandpass filters, the composite transfer function of which preserves the energy but produces distortion. The distorted overall frequency response is plotted in [37, Fig. 2] and, after a proper energy normalization, in Fig. 2.4. The impulse response of this distortion is deduced from the first member of Eq. 2.21, which coincides with Eq. 2.19 when Eqs. 2.6 and 2.7 are adopted. It is now clear why, unlike the FBS method, the TI has to compensate for a smoothing function through the deconvolution in Eq. 2.18, at the cost of extra computations with respect to the nominal O(N) operations per sample. In principle, the same issue would affect the FI if the OLA method were applied to invert a version of Eq. 2.5 that employs time-dependent windows (e.g., with v = m) to achieve a time-varying TF resolution.
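The two inversion routes and the role of the smoothing function can be compared numerically. The following Python sketch is a minimal illustration under stated assumptions, not the implementation used in this work. It assumes the convolution form of the discrete ST, ST_x[n, q] = Σ_m x[m] w[n − m, q] e^{−j2πmq/N}, with periodic Gaussian windows normalized to unit discrete sum; the TI weights are written as 1/(N w[0, q]), which matches the first member of Eq. 2.21. The function names and the test chirp are hypothetical.

```python
import numpy as np

def gaussian_windows(N):
    """Assumed windows w[n, q]: Gaussians of width ~N/|q|, unit sum per voice."""
    k = (np.arange(N) + N // 2) % N - N // 2       # centred integers, FFT order
    g = np.exp(-0.5 * (np.outer(k, k) / N) ** 2)   # rows: time n, cols: voice q
    return g / g.sum(axis=0, keepdims=True), k

def stransform(x, w, q):
    """ST_x[n, q]: circular convolution of the demodulated input with w[., q]."""
    X = np.fft.fft(x)
    W = np.fft.fft(w, axis=0)                      # DFT of every window over time
    S = np.empty((len(x), len(q)), dtype=complex)
    for col, qi in enumerate(q):
        S[:, col] = np.fft.ifft(np.roll(X, -qi) * W[:, col])  # X[p + q] W[p, q]
    return S

def invert_fi(S):
    """Frequency inversion (Eq. 2.17): sum over time, then inverse DFT."""
    return np.fft.ifft(S.sum(axis=0))

def invert_ti(S, w, q):
    """Time inversion (Eq. 2.18), then deconvolution of the smoothing i[n]."""
    N = S.shape[0]
    phase = np.exp(2j * np.pi * np.outer(np.arange(N), q) / N)
    i_n = ((w / w[0]) * phase).sum(axis=1) / N     # first member of Eq. 2.21
    x_tilde = ((S / w[0]) * phase).sum(axis=1) / N # smoothed estimate of x[n]
    x_hat = np.fft.ifft(np.fft.fft(x_tilde) / np.fft.fft(i_n))  # equalization
    return x_hat, x_tilde, i_n

N = 64
n = np.arange(N)
x = np.exp(1j * np.pi * 0.4 * n ** 2 / N) + 0.5    # hypothetical chirp plus offset
w, q = gaussian_windows(N)
S = stransform(x, w, q)
x_fi = invert_fi(S)
x_ti, x_tilde, i_n = invert_ti(S, w, q)

print("FI max error:", np.abs(x_fi - x).max())
print("TI max error before equalization:", np.abs(x_tilde - x).max())
print("TI max error after equalization:", np.abs(x_ti - x).max())
print("i[1] (zero would mean Eq. 2.21 holds):", i_n[1])
```

In this sketch the FI needs no correction because every window sums to one over time, whereas the TI output has to be divided by the DFT of i[n] in the frequency domain, which is only possible as long as that response has no zeros.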

As opposed to [23], we define the discrete ST using the DFT of the Gaussian windows rather than discretizing the respective continuous FT. Moreover, we normalize the Gaussian windows given by Eqs. 2.6 and 2.7 to their actual discrete summations instead of |p|/(√(2π) N) when constructing i[n]. These simple corrections are related to Eqs. 2.20 and 2.21. They preserve the energy and overcome the loss of information demonstrated in [36] at the low frequencies. As a result, we can combine either the DFT or the filtering formulation of the ST with either the TI or the FI, interchangeably, without producing artifacts. We demonstrate this, for instance, in Fig. 2.5 by recovering the signal represented in Fig. 2.3. The samples returned by the TI and the FI coincide with the input time series to machine precision. As such, the inversion exhibits neutrality with respect to the forward transforms if no intermediate modifications are performed.

FIGURE 2.5: Real values of the time series denoted as y[n], which exactly recovers the complex chirp x[n] from the ST in Fig. 2.3.

FIGURE 2.6: Reconstruction of a time series by means of the TI.

Interestingly, the side effects of filtering the ST described in [36] resemble those for the STFT in [33]. On the one hand, the synthesis of the TI is smoothed by the convolution between the input and the IDFT of the window-weighted version of any spectral modification. On the other hand, the output synthesized through the FI is smeared due to an extra undesired windowing operation. Consequently, the choice between the TI and the FI maintains either the time or the frequency localization of the implemented TF filter, respectively. If the aim is reconstructing one sample per unit time, as a data stream, the TI is the natural choice for recovering the input signal. Therefore, the samples at the output of each channel in Eq. 2.9 are first multiplied by a frequency-dependent factor, then accumulated, and finally deconvolved through Eq. 2.19. This latter operation may be interpreted as that of an equalization filter, which reverses the distortion mentioned before. In light of the formulation of the analysis and inversion with the ST, the whole real-time implementation consists of a bank of digital filters followed by the