
Envelope enhancement (EE) has been shown to improve word intelligibility scores in patients with AN [27]. The goal of this part of the thesis is to validate the results reported in [27] and to develop the EE algorithm within the RT framework discussed herein [28, 8, 5]. These studies have shown an increase in word identification scores when the envelope of the speech was enhanced. A further series of studies uses EE as a speech enhancement method for speech corrupted by noise [29, 30].

2.10.1 Principles

For either speech enhancement or improving intelligibility in patients with AN, EE implements several common steps. First, the input signal is filtered at a frequency lower than the Nyquist rate of the signal. It is then split into frequency bands using one of the following two approaches:

• Uniform bands: The input signal is split into bands of uniform width (in terms of Hz). This is the approach utilized in [29, 30], and it takes advantage of the fact that for uniform bands, finding the Hilbert envelope is equivalent to finding the modulus of the short-time Fourier transform using a window that spans the range of interest.

• Octave bands: The input signal is split into bands spanning third-octave ranges. A given frequency range is first divided into octaves, where the upper frequency of each band is twice its lower frequency. Third-octave bands split each octave further into three, with the upper frequency limit of each band being $2^{1/3}$ (approximately 1.26) times its lower frequency. This approach is utilized in [27, 28].
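The third-octave spacing described above can be sketched as follows. This is a minimal illustration, not the band layout of [27, 28]: the function name `third_octave_edges` and the 100 Hz to 800 Hz range are assumptions chosen for the example.

```python
def third_octave_edges(f_low, f_high):
    """Return (lower, upper) band-edge pairs in Hz.

    Each upper edge is 2**(1/3) times the lower edge, so three
    consecutive bands span exactly one octave.
    """
    ratio = 2.0 ** (1.0 / 3.0)  # approximately 1.26
    edges = []
    lower = f_low
    # Small tolerance so accumulated rounding does not drop the last band.
    while lower * ratio <= f_high * (1.0 + 1e-9):
        upper = lower * ratio
        edges.append((lower, upper))
        lower = upper
    return edges


# Hypothetical range: 100 Hz to 800 Hz is three octaves.
bands = third_octave_edges(100.0, 800.0)
```

With this hypothetical range the three octaves are covered by nine third-octave bands, and every third upper edge lands on an octave boundary (200 Hz, 400 Hz, 800 Hz).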

Once the input signal has been split into the corresponding bands, the envelopes of the resulting waveforms are extracted. There are two common methods for envelope detection, Hilbert envelope detection and full-wave rectification followed by low-pass filtering, which yield similar results when applied offline.

2.10. EE 31

Hilbert Envelope Detection

A common and efficient technique for envelope detection is based on the Hilbert transform. It involves computing an analytic signal whose real part is the original input signal and whose imaginary part is a 90-degree phase-shifted version of the input. The envelope can then be estimated by taking the magnitude of the resulting complex vector. To obtain a smooth envelope of the input signal, a low-pass FIR or IIR filter can be applied after the Hilbert transform.

Using the description given in [31], the Hilbert transform of any function $f(x)$ is given by:

$$F(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{f(x)}{t - x}\,dx \tag{2.3}$$

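The above integral can be evaluated as a Cauchy principal value, and it can be written as the following convolution:

$$F(t) = \frac{1}{\pi t} * f(t) = \mathcal{F}^{-1}\left\{ \mathcal{F}\{f(t)\} \cdot \left(-i\,\mathrm{sgn}(\omega)\right) \right\} \tag{2.4}$$

where, using the convolution property of the Fourier transform, the convolution is converted to a multiplication, and $\mathrm{sgn}(\omega)$ is the sign function. The factor $-i\,\mathrm{sgn}(\omega)$ assumes that the forward Fourier transform is defined with the kernel $e^{-i\omega t}$.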

Having defined the Hilbert transform for an arbitrary real function $f(x)$, the analytic signal is constructed as:

$$Y(t) = y(t) + j\,h(t) = A(t)\,e^{j\omega(t)} \tag{2.5}$$

where $y(t)$ and $Y(t)$ are the input signal and the analytic signal respectively, and $h(t)$ is the Hilbert transform of $y(t)$ calculated using equation (2.4). $Y(t)$ can then be converted to polar form, where $A(t)$ is the envelope of the input signal and $\omega(t)$ is the phase of the analytic signal, which can be discarded. In [29, 30] the authors use this approach to calculate the envelopes in their RT implementation of EE.
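The construction in equation (2.5) can be illustrated offline with SciPy, whose `scipy.signal.hilbert` returns the analytic signal directly. This is a sketch under stated assumptions, not the RT implementation of [29, 30]: the helper name `hilbert_envelope` and the amplitude-modulated test tone (500 Hz carrier, 5 Hz modulator, 8 kHz sampling rate) are chosen only for the example.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_envelope(y):
    """Return A(t), the magnitude of the analytic signal of y."""
    analytic = hilbert(y)    # Y(t) = y(t) + j*h(t)
    return np.abs(analytic)  # envelope A(t); the phase is discarded


# Hypothetical test signal: a 500 Hz carrier with a slow 5 Hz envelope.
fs = 8000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)
y = modulator * np.cos(2 * np.pi * 500 * t)

envelope = hilbert_envelope(y)
```

Because both tones complete a whole number of cycles within the one-second frame, the recovered `envelope` should match `modulator` to within numerical precision. On arbitrary frames, edge effects are to be expected.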

Full-Wave Rectification

Full-wave rectification is another approach to calculating the envelope of a given input signal. Depending on the length and type of filters used, it is often more computationally intensive than the Hilbert envelope detector. The Hilbert transform method requires one trivial filtering operation followed by a summation across the input frame to compute the envelope, whereas full-wave rectification requires taking the absolute value followed by two filtering stages to compensate for the filter lag. It is used by [28, 5, 8] for the envelope calculation in a non-RT implementation of EE. The input signal is split into bands as described above, after which the absolute value of the signal in each band is passed through a low-pass filter (LPF) with a cutoff frequency in the range 5-25 Hz (corresponding to the syllabic rate of human speech). Narne et al. [5] use a filter with a cutoff at 32 Hz, which is also the value used in [27]. A review of the literature showed that both IIR and FIR filters have been used to implement full-wave rectification. For the purpose of EE, however, the phase response, and in particular the group delay through the filter, should be constant, which facilitates proper reconstruction of the enhanced signal from the band data.
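A minimal offline sketch of full-wave rectification is given below. It assumes a second-order Butterworth low-pass filter applied forward and backward with `scipy.signal.filtfilt`, which is one way to realise two filtering stages that cancel the filter lag. The filter type and order, and the name `rectified_envelope`, are assumptions for illustration. Only the 32 Hz cutoff is taken from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def rectified_envelope(band_signal, fs, cutoff_hz=32.0, order=2):
    """Envelope by full-wave rectification and zero-phase low-pass filtering."""
    # Assumed filter design: low-order Butterworth LPF at the given cutoff.
    b, a = butter(order, cutoff_hz, btype="low", fs=fs)
    rectified = np.abs(band_signal)  # full-wave rectification
    # Forward and backward passes cancel the group delay of the filter.
    return filtfilt(b, a, rectified)


# Hypothetical test signal: a 500 Hz carrier with a slow 5 Hz envelope.
fs = 8000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)
band = modulator * np.cos(2 * np.pi * 500 * t)

env = rectified_envelope(band, fs)
```

For a narrowband carrier such as this one, the smoothed rectified signal follows the true envelope scaled by roughly 2/π, the mean of a rectified cosine. Away from the frame edges, `env * np.pi / 2` should therefore track `modulator` closely.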

Enhancing the Envelope

Calculating the enhanced envelope involves multiplying each band signal by a corresponding band gain vector, derived from the original envelope within each band. The gain equation relates the original and resulting envelopes through a non-linear function. Several approaches for calculating the gain vector were encountered in the literature review:

• Power Law: This is the simplest way of non-linearly enhancing a signal envelope and is given by Clarkson et al. as per equation (2.6), where $k$ indicates the band of interest, $y_k$ is the expanded speech envelope and $A_k$ is the calculated band envelope. It was utilized in a real-time implementation of the EE algorithm targeted at speech enhancement in low-noise conditions [29].

2.10. EE 33

$$y_k(n) = \{A_k(n)\}^{\nu} \tag{2.6}$$

Equation (2.6) can be implemented in two ways. In the first approach, as in [29], $y_k$ is used directly as the information carrier and is mixed with the noisy phase from the band prior to reconstruction. In the second approach, a gain vector is calculated as the element-by-element ratio between $y_k$ and $A_k$, which is then used to multiply the band signal, resulting in an expanded band vector.

• Spectral Threshold: This is an evolution of the previous method proposed by the same authors in [30]. The expansion method is intended to pass spectral magnitudes greater than the threshold (α), while smaller values of the original envelope are attenuated. The threshold can be either fixed or adaptive, based on the spectral variance within the band of interest. The expansion is given by:

$$\hat{S}_k(n) = \frac{[A_k(n)/\alpha]^{\nu}}{1 + [A_k(n)/\alpha]^{\nu}}\, A_k(n) \tag{2.7}$$

As mentioned above, the threshold α can be fixed or adaptive. The authors in [30] found good results for ν = 2 and α = 3σ, where σ is the standard deviation of the additive noise. The threshold value can be normalized for each band to account for the variation of signal power between the bands during speech:

$$\alpha(k) = \alpha \left[ \frac{\bar{A}_k N}{\bar{A}} \right] \tag{2.8}$$

where $\bar{A}_k$ is the long-time average of the speech-plus-noise envelope observed in band $k$, $\bar{A}$ is the long-time average over both time and band number, and $N$ is the number of bands. Unvoiced speech segments may be lower in amplitude but are still perceptually important. They can also be hard to detect in low-SNR conditions. To alleviate that problem, the authors suggest using the spectral variance defined by equation (2.10) to supplement equation (2.8) and calculate an adaptive value for α:

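$$\alpha(k, n) = \alpha \left[ \frac{\bar{A}_k N}{\bar{A}} \right] \left[ \frac{C}{\gamma(n)} \right] \tag{2.9}$$

$$\gamma(n) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left[ A_j(n) - \bar{A}(n) \right]^2} \tag{2.10}$$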

where C is a normalization constant.

• Exponential Law: This is the method used in [27, 8]. The time-domain envelope in each band ($A_k$) is raised to a power $K$, ranging from a highly compressive value to a highly expansive value. The calculation of $K$ is given by the following equation:

$$K_i = e^{\frac{A_{min} - A_i}{\tau}} \left( K_{max} - K_{min} \right) + K_{min} \tag{2.11}$$

where $K_{min} = 0.3$, $K_{max} = 4$, $A_{min}$ is the minimum amplitude of the envelope within the $i$th band, $A_i$ is the instantaneous amplitude of the envelope within the $i$th band, and τ serves as a time constant determining the disparity between “low” and “high” envelopes. A good value for τ is reported in [8] as 0.5; however, the author of [27] found that for EE applied to AN a better value for τ is 0.001. The algorithm is evaluated for two values of τ: 0.001 and 0.0001. $K$ and $A_{min}$ are calculated for each band independently.
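The three gain rules listed above can be sketched in NumPy as follows. This is an illustrative reading of equations (2.6) to (2.11), not the code of the cited authors. The function names, the guard constant `EPS`, the choice of computing the long-time averages over the whole supplied array, and the array layout (bands along axis 0, time along axis 1) are all assumptions.

```python
import numpy as np

EPS = 1e-12  # assumed guard against division by zero


def power_law_gain(A, nu):
    """Gain form of eq. (2.6): element-wise ratio y_k / A_k."""
    y = A ** nu
    return y / np.maximum(A, EPS)


def spectral_threshold(A, alpha, nu=2.0):
    """Eq. (2.7): pass values above alpha, attenuate smaller ones."""
    r = (A / alpha) ** nu
    return r / (1.0 + r) * A


def adaptive_threshold(A, alpha, C=1.0):
    """Eqs. (2.8)-(2.10) for A of shape (N bands, T samples)."""
    N = A.shape[0]
    A_bar_k = A.mean(axis=1, keepdims=True)  # per-band average over time
    A_bar = A.mean()                         # average over time and bands
    gamma = A.std(axis=0, keepdims=True)     # eq. (2.10), across bands
    return alpha * (A_bar_k * N / A_bar) * (C / np.maximum(gamma, EPS))


def exponential_law_exponent(A, tau, k_min=0.3, k_max=4.0):
    """Eq. (2.11) for the envelope A of a single band."""
    a_min = A.min()  # minimum envelope amplitude within the band
    return np.exp((a_min - A) / tau) * (k_max - k_min) + k_min


# Hypothetical single-band envelope samples.
A = np.array([0.1, 0.5, 1.0, 2.0])
K = exponential_law_exponent(A, tau=0.5)
enhanced = A ** K  # envelope raised to the per-sample power K
```

In this reading, the exponent equals `k_max` where the envelope is at its minimum and decays towards `k_min` as the envelope grows, with τ controlling how quickly the transition happens.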

2.10.2 Real-Time Implementations of Envelope Enhancement

EE has been used in previous studies to perform speech enhancement in RT situations. In 1989, Clarkson et al. [29] implemented the algorithm on a Texas Instruments TMS320C25 DSP chip. Due to the limitations of their chosen hardware, they were required to use a relatively low sampling rate of 10 kHz, and band extraction was limited to second-order FIR filters. Nonetheless, the algorithm used 20 uniform bands in the range from 0 to 4 kHz. In each band, the envelope expansion was achieved according to equation (2.6). This approach has considerably lower computational requirements than the method used in [28], which is the method of choice for the implementation in this thesis. The more intensive method is chosen because it incorporates signal dynamics by remembering the last seen minimum value of the input signal within the band of interest.