Chapter 2. Background and literature review
2.5 Overview of dereverberation approaches in the literature
2.5.6 Acoustic channel estimation and equalization
Blind dereverberation using channel equalization techniques is a two-stage process. The first stage employs BSI algorithms to estimate the AIRs using only the noisy and reverberant microphone signals. The second stage designs channel equalizers that are robust to the estimation errors from the first stage. Each stage is a well-researched problem in its own right, with a rich literature. An overview of the state of the art for both stages is given below.
Blind system identification
BSI is widely studied, with applications in communications [76], image processing [77] and seismology [78]. More recently, it has also been studied for acoustic channels. BSI algorithms differ from supervised system identification algorithms, such as those used in acoustic echo cancellation [79], where the source signal is known.
Early research into BSI algorithms was based on higher order statistics (HOS) and generally dealt with single-output systems. An overview of the research in this area is provided in [80], which concluded that HOS-based methods are generally computationally expensive and may converge to local minima. They also converge slowly and therefore require a large number of observation samples.
More recently, second order statistics (SOS)-based approaches were proposed in [81] and [82]. These have since attracted more attention because they converge faster. One SOS method, the subspace method, exploits the orthogonality between the signal and noise subspaces by performing a subspace decomposition of the received signals [83,84]. The approach is attractive because it has a closed-form solution. However, it requires the dimensions of the signal and noise subspaces to be numerically well defined, so it can be sensitive to background noise. It is also computationally expensive [81]. Another SOS-based method estimates the AIRs by maximum likelihood, with iterative approaches proposed in [80,85]. It assumes that the background noise is white and Gaussian, which may not hold in many practical situations. A third SOS-based approach to BSI exploits the cross-relation property between the multichannel AIRs and the observed microphone signals [86]. Algorithms based on this approach have gained significant interest for practical AIR estimation. They are computationally less expensive than subspace methods and adapt faster than HOS-based methods. The literature related to this approach is reviewed in more detail below.
The cross-relation between a pair of channel outputs is given by
$$ s(n,i) \ast h_m(i) \ast h_{m'}(i) = x_m(n,i) \ast h_{m'}(i) = x_{m'}(n,i) \ast h_m(i), \tag{2.39} $$
where the subscripts m, m′ ∈ {1, 2, ..., M}, with m ≠ m′, index the acoustic channels. Relation (2.39) holds in the absence of background noise. On that basis, an error signal can be constructed for each pair of microphone signals and estimated channels as
$$ e_{mm'}(n) = \left( x_m(n,i) \ast \hat{h}_{m'}(i) \right) - \left( x_{m'}(n,i) \ast \hat{h}_m(i) \right). \tag{2.40} $$
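Relations (2.39) and (2.40) can be checked numerically. The sketch below assumes a hypothetical noise-free two-channel setup with 3-tap AIRs and a white Gaussian source. For illustration, it uses full linear convolutions in place of the (n, i) indexing above. The error signal vanishes for the true AIRs and for any common rescaling of them, which is the source of the scale ambiguity. It does not vanish for an incorrect estimate. The helper `cross_relation_error` and all signal values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free two-channel setup (values chosen for illustration only)
h1 = np.array([1.0, 0.5, 0.25])
h2 = np.array([1.0, -0.8, 0.15])
s = rng.standard_normal(1000)   # white source signal
x1 = np.convolve(s, h1)         # microphone signal x_1 = s * h_1
x2 = np.convolve(s, h2)         # microphone signal x_2 = s * h_2

def cross_relation_error(x1, x2, h1_hat, h2_hat):
    """e_12 = (x_1 * h2_hat) - (x_2 * h1_hat), cf. (2.40), using full convolutions."""
    return np.convolve(x1, h2_hat) - np.convolve(x2, h1_hat)

# (2.39): both sides equal s * h_1 * h_2 in the noise-free case
cr_holds = np.allclose(np.convolve(x1, h2), np.convolve(x2, h1))

e_true = cross_relation_error(x1, x2, h1, h2)             # true AIRs
e_scaled = cross_relation_error(x1, x2, 3 * h1, 3 * h2)   # true AIRs, common scale factor
e_wrong = cross_relation_error(x1, x2, h1, h2 + np.array([0.0, 0.1, 0.0]))  # erroneous estimate

print(cr_holds, np.max(np.abs(e_true)), np.max(np.abs(e_scaled)), np.linalg.norm(e_wrong))
```

A cost function such as the energy of e_12 is therefore minimized by the whole family of scaled true AIRs rather than by a single vector.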
Cost functions based on this error signal can then be designed to estimate the AIRs $\hat{h}$ up to an arbitrary scale factor, provided the following identifiability conditions are met [87]:
1. $H_m(z)$, the z-transforms of the multichannel AIRs $h_m$, do not share any common zeros.
2. The source signal has a full-rank autocorrelation matrix.
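A closed-form, least-squares variant of the cross-relation approach illustrates both identifiability conditions. Stacking the error (2.40) over time gives e = A[ĥ1; ĥ2]. Minimizing ‖e‖² under a unit-norm constraint then yields the eigenvector of AᵀA with the smallest eigenvalue.

This is a minimal sketch under the following assumptions:
- two channels of known length L = 3;
- a white Gaussian source, which satisfies condition 2.

The helpers `data_matrix`, `cross_relation_estimate` and `npm` (normalized projection misalignment) are hypothetical names. When condition 1 holds, AᵀA has a one-dimensional null space spanned by the true AIRs. When both channels share a zero, the null space grows, and the AIRs are no longer identifiable up to a single scale factor.

```python
import numpy as np

def data_matrix(x, L):
    """Rows [x(n), x(n-1), ..., x(n-L+1)] for n = L-1, ..., len(x)-1."""
    return np.array([x[n - L + 1:n + 1][::-1] for n in range(L - 1, len(x))])

def cross_relation_estimate(x1, x2, L):
    """Minimize sum_n e_12(n)^2 subject to ||[h1_hat; h2_hat]|| = 1.

    The minimizer is the eigenvector of A^T A with the smallest eigenvalue,
    where A @ [h1_hat; h2_hat] = (x1 * h2_hat)(n) - (x2 * h1_hat)(n).
    """
    A = np.hstack([-data_matrix(x2, L), data_matrix(x1, L)])
    w, V = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    return V[:, 0], w

def npm(h, h_hat):
    """Normalized projection misalignment: scale-invariant estimation error."""
    proj = (h @ h_hat) / (h_hat @ h_hat) * h_hat
    return np.linalg.norm(h - proj) / np.linalg.norm(h)

rng = np.random.default_rng(1)
s = rng.standard_normal(2000)        # white source: full-rank autocorrelation

def simulate(h1, h2):
    """Noise-free, causally truncated microphone signals."""
    return np.convolve(s, h1)[:len(s)], np.convolve(s, h2)[:len(s)]

L = 3
# Condition 1 holds: zeros at -0.25 +/- 0.433j (h1) and 0.5, 0.3 (h2)
h1 = np.array([1.0, 0.5, 0.25])
h2 = np.array([1.0, -0.8, 0.15])
h_hat, w = cross_relation_estimate(*simulate(h1, h2), L)
err = npm(np.concatenate((h1, h2)), h_hat)
n_null = int(np.sum(w < 1e-9 * w[-1]))          # dimension of the (numerical) null space

# Condition 1 violated: both channels share the zero z = 0.9
c = np.array([1.0, -0.9])
h1c = np.convolve(c, [1.0, 0.5])
h2c = np.convolve(c, [1.0, -0.3])
_, wc = cross_relation_estimate(*simulate(h1c, h2c), L)
n_null_common = int(np.sum(wc < 1e-9 * wc[-1]))

print(err, n_null, n_null_common)
```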
Two time-domain adaptive solutions were presented in [87]: the multichannel least mean squares (MCLMS) and multichannel Newton (MCN) algorithms. In [88] and [87], a unit-norm constraint was placed on the AIRs to avoid the trivial (all-zero) solution. However, [89] showed that this constraint is unnecessary if the initial AIR estimates are not orthogonal to the true AIRs. In [90], these approaches were extended to the frequency domain, yielding the normalized multichannel frequency domain least mean squares (NMCFLMS) algorithm. Computing block convolutions and correlations in the frequency domain increases computational efficiency. Unfortunately, in the presence of background noise, all three algorithms (MCLMS, MCN and NMCFLMS) suffer from misconvergence after
initially converging towards the true system coefficients. As the SNR decreases, the misconvergence becomes more severe and sets in earlier in the adaptation process. A number of derivative algorithms were subsequently developed to improve accuracy in the presence of noise and/or to address misconvergence. In [91], a self-adaptive variable step-size was proposed to optimize the speed of convergence and improve robustness to background noise. In [92] and [93], additional constraints were placed on the direct-path coefficients. This requires knowledge of the direct path, which can be estimated as proposed in [94]. This approach was shown to reduce the misconvergence of NMCFLMS, but imperfect estimation of the direct path contributes to the BSI steady-state error.

A more promising approach introduces spectral constraints on the AIRs. In [95], it was observed that misconverged AIR estimates exhibit narrowband characteristics, so the authors proposed minimizing the spectral energy in either the low- or the high-frequency band. This algorithm does not generalize to the case where the estimated AIRs have excessive spectral energy in both bands. To address this issue, the robust NMCFLMS (RNMCFLMS) was proposed in [96]. It enforces approximate spectral flatness and was shown to reduce both misconvergence and steady-state error in the presence of background noise. Other approaches, such as [97–99], model AIRs as sparse FIR filters and exploit this structure to improve the convergence speed of NMCFLMS.
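The adaptive principle shared by these algorithms can be sketched with a simplified sample-by-sample update. Each step is a normalized LMS step on the squared cross-relation error (2.40), followed by renormalization to unit norm to avoid the trivial solution. This is a hedged illustration, not the exact MCLMS, MCN or NMCFLMS recursions of [87,90]; NMCFLMS, for instance, operates on frequency-domain blocks. The step size `mu`, the unit-impulse initialization and the helper `cr_nlms` are assumptions. The simulation is noise-free, so it does not reproduce the misconvergence discussed above.

```python
import numpy as np

def npm(h, h_hat):
    """Normalized projection misalignment: scale-invariant estimation error."""
    proj = (h @ h_hat) / (h_hat @ h_hat) * h_hat
    return np.linalg.norm(h - proj) / np.linalg.norm(h)

def cr_nlms(x1, x2, L, mu=0.5, n_passes=3, eps=1e-8):
    """Hypothetical sketch: unit-norm-constrained NLMS on the cross-relation error (2.40)."""
    h_hat = np.zeros(2 * L)
    h_hat[0] = h_hat[L] = 1.0                   # unit impulses, not orthogonal to the AIRs below (cf. [89])
    h_hat /= np.linalg.norm(h_hat)
    for _ in range(n_passes):
        for n in range(L - 1, len(x1)):
            # Regressor such that a @ [h1_hat; h2_hat] = (x1 * h2_hat)(n) - (x2 * h1_hat)(n)
            a = np.concatenate((-x2[n - L + 1:n + 1][::-1], x1[n - L + 1:n + 1][::-1]))
            e = a @ h_hat                       # cross-relation error e_12(n)
            h_hat -= mu * e * a / (a @ a + eps) # normalized gradient step on e^2
            h_hat /= np.linalg.norm(h_hat)      # unit-norm constraint
    return h_hat

rng = np.random.default_rng(2)
s = rng.standard_normal(5000)
h1 = np.array([1.0, 0.5, 0.25])
h2 = np.array([1.0, -0.8, 0.15])
x1 = np.convolve(s, h1)[:len(s)]   # noise-free microphone signals
x2 = np.convolve(s, h2)[:len(s)]

h_true = np.concatenate((h1, h2))
h_est = cr_nlms(x1, x2, L=3)
print(npm(h_true, h_est))
```

Because the true AIR vector yields zero error for every regressor in the noise-free case, each update only shrinks the component of the estimate orthogonal to it. Adding sensor noise removes this property, which is one way to see why noise-robust variants are needed.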
Robust channel equalization
In general, AIRs are non-minimum phase [100] and therefore do not have stable and causal single-channel inverse filters. In the multichannel case, however, the MINT algorithm provides exact multichannel inverse filters, provided the following two conditions are satisfied [101,102]:
C-1 $H_m(z)$, the z-transforms of the multichannel AIRs $h_m$, do not share any common zeros.
C-2 The equalization filters are of length $L_i \geq \left\lceil \frac{L-1}{M-1} \right\rceil$, where $\lceil \cdot \rceil$ denotes the ceiling operation.
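Under conditions C-1 and C-2, the MINT filters can be obtained by solving the linear system Σ_m H_m g_m = d. Here H_m is the convolution matrix of h_m, g_m is the m-th equalization filter and d is a unit impulse. The sketch below assumes known channels, a zero-delay target and the minimum filter length from C-2. The helpers `conv_matrix`, `mint` and `equalized_response` are hypothetical, and the channel values are illustrative.

```python
import numpy as np

def conv_matrix(h, n_cols):
    """Convolution matrix C such that C @ g == np.convolve(h, g) for len(g) == n_cols."""
    L = len(h)
    C = np.zeros((L + n_cols - 1, n_cols))
    for j in range(n_cols):
        C[j:j + L, j] = h
    return C

def mint(hs, Li=None):
    """Hypothetical MINT sketch: solve sum_m H_m g_m = d with d a unit impulse at tap 0."""
    M, L = len(hs), len(hs[0])
    if Li is None:
        Li = int(np.ceil((L - 1) / (M - 1)))   # minimum length from condition C-2
    H = np.hstack([conv_matrix(h, Li) for h in hs])
    d = np.zeros(H.shape[0])
    d[0] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return np.split(g, M)                      # one filter per channel

def equalized_response(hs, gs):
    """Overall response sum_m h_m * g_m."""
    return sum(np.convolve(h, g) for h, g in zip(hs, gs))

h1 = np.array([1.0, 0.5, 0.25])
h2 = np.array([1.0, -0.8, 0.15])
h3 = np.array([0.5, 1.0, -0.2])   # illustrative third channel

g2ch = mint([h1, h2])             # M = 2: Li = 2
g3ch = mint([h1, h2, h3])         # M = 3: Li = 1
print(np.round(equalized_response([h1, h2], g2ch), 10))
print(np.round(equalized_response([h1, h2, h3], g3ch), 10))
```

With two 3-tap channels, C-2 gives Li = 2 and the stacked matrix is square (4 × 4). Adding a third channel reduces Li to 1.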
Unfortunately, MINT is very sensitive to channel estimation errors and may instead introduce additional artificial reverberation into the equalized signal. Regularization was introduced in [103] to reduce the energy of the equalization filters, which in turn reduces the contribution of channel estimation errors to the equalized output. An alternative approach aims for partial rather than complete channel equalization, such that the equalized channel decays faster than the AIR. This approach, termed channel shortening, was employed successfully in digital communications for robust channel equalization [104–107]. More recently, it has been investigated for acoustic channel equalization. This work is motivated by psychoacoustic studies showing that early reflections in an AIR improve speech intelligibility, while the late reverberant tail degrades perceived speech quality (see Section 2.2.1). Algorithms proposed within the channel shortening framework therefore aim to suppress only the reverberant tail, while relaxing the constraints on the early reflections according to different criteria:
- The p-norm reshaping algorithm [108] reshapes the early coefficients based on a perceptually motivated mask.
- The RMCLS algorithm [109] completely relaxes the constraints on the early coefficients.
- The regularized P-MINT [31] constrains the initial coefficients to equal those of the estimated AIR.
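Regularized MINT and channel shortening can both be viewed as variants of one weighted, regularized least-squares design: min_g ‖W(Hg − d)‖² + δ‖g‖². Here H stacks the channel convolution matrices, W is a diagonal weighting of the equalized response taps and δ ≥ 0 is a regularization parameter. The sketch below is a simplification in the spirit of [103] and RMCLS [109], not their exact formulations.

It makes the following assumptions:
- a zero target delay;
- 0/1 weights that relax taps 1 to Lr − 1 after the direct path;
- δ = 0.01;
- hypothetical helper names.

Setting W = I and δ = 0 recovers MINT, and δ > 0 lowers the filter energy. Zero weights leave the early taps unconstrained while still forcing the tail to zero.

```python
import numpy as np

def conv_matrix(h, n_cols):
    """Convolution matrix C such that C @ g == np.convolve(h, g) for len(g) == n_cols."""
    L = len(h)
    C = np.zeros((L + n_cols - 1, n_cols))
    for j in range(n_cols):
        C[j:j + L, j] = h
    return C

def design_equalizer(hs, Li, w=None, delta=0.0):
    """Hypothetical helper: argmin_g ||W (H g - d)||^2 + delta ||g||^2, d = unit impulse at tap 0.

    Solved as a stacked least-squares problem; np.linalg.lstsq returns the
    minimum-norm solution when the system is underdetermined.
    """
    H = np.hstack([conv_matrix(h, Li) for h in hs])
    n_out, n_in = H.shape
    d = np.zeros(n_out)
    d[0] = 1.0
    w = np.ones(n_out) if w is None else np.asarray(w, dtype=float)
    A = np.vstack([w[:, None] * H, np.sqrt(delta) * np.eye(n_in)])
    b = np.concatenate([w * d, np.zeros(n_in)])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.split(g, len(hs))

def equalized_response(hs, gs):
    return sum(np.convolve(h, g) for h, g in zip(hs, gs))

def energy(gs):
    return sum(float(g @ g) for g in gs)

# Illustrative two-channel example with equalizers longer than the C-2 minimum
hs = [np.array([1.0, 0.5, 0.25]), np.array([1.0, -0.8, 0.15])]
Li = 4
n_out = len(hs[0]) + Li - 1                     # length of the equalized response (6 taps)

g_mint = design_equalizer(hs, Li)               # W = I, delta = 0: MINT
g_reg = design_equalizer(hs, Li, delta=1e-2)    # regularized: lower filter energy

Lr = 3                                          # taps 1 .. Lr-1 are relaxed
w_short = np.ones(n_out)
w_short[1:Lr] = 0.0
g_short = design_equalizer(hs, Li, w=w_short)   # channel shortening

eq_short = equalized_response(hs, g_short)
print(energy(g_mint), energy(g_reg), energy(g_short))
print(np.round(eq_short, 6))
```

The MINT filters also satisfy the relaxed constraints. Hence the minimum-norm shortening filters can never have more energy than the minimum-norm MINT filters. This gives one intuition for why relaxing the early-tap constraints can help robustness.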
This thesis further investigates robust channel equalizer design within the channel shortening framework. To this end, Chapter 4 gives more detailed mathematical formulations and a performance evaluation of the channel shortening algorithms listed above.