Chapter 2 Adaptive Algorithms for Sparse Impulse Response
2.2 Conventional Adaptive Filtering Algorithms
number of coefficients, N, in this dissertation.
Although many cost functions have been proposed for adaptive filtering, the most frequently used one is the mean-squared error, where the error signal $e(t)$ and the cost function $J(t)$ are defined as
$$e(t) = d(t) - y(t) = d(t) - \mathbf{w}^T(t)\mathbf{x}(t), \tag{2.4}$$
$$J(t) = E\{e^2(t)\}. \tag{2.5}$$
An adaptive filter $\mathbf{w}$ is designed to minimize $J(t)$. If the statistics of the underlying signals are available, the optimal tap weights of an FIR filter can be obtained by solving the Wiener-Hopf equation. Its solution is given by [2]:
$$\mathbf{w}_o = \mathbf{R}^{-1}\mathbf{p}, \tag{2.6}$$
where $\mathbf{R} = E\{\mathbf{x}(t)\mathbf{x}^T(t)\}$ is the auto-correlation matrix of the input vector and $\mathbf{p} = E\{d(t)\mathbf{x}(t)\}$ is the cross-correlation vector between the desired signal and the input vector.
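As a small numerical illustration of (2.6), the sketch below solves the Wiener-Hopf equation for a two-tap filter using Cramer's rule. The values of R and p are hypothetical: they were chosen so that p = Rh for an assumed system h = [1, -0.5], and the function name `wiener_2tap` is introduced here for illustration only.

```python
def wiener_2tap(R, p):
    """Solve R w = p for a 2x2 matrix R using Cramer's rule."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    if det == 0:
        raise ValueError("R is singular; the Wiener solution is not unique")
    w0 = (p[0] * R[1][1] - R[0][1] * p[1]) / det
    w1 = (R[0][0] * p[1] - R[1][0] * p[0]) / det
    return [w0, w1]

# Hypothetical statistics: unit-power input with lag-1 correlation 0.5,
# and p = R h for an assumed system h = [1.0, -0.5].
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.75, 0.0]
w_o = wiener_2tap(R, p)
print(w_o)
```

For longer filters a general linear solver would replace the closed-form 2x2 inverse; the point here is only that the optimal weights follow directly once R and p are known.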
2.2 Conventional Adaptive Filtering Algorithms
2.2.1 Steepest Descent Method
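An alternative method to find $\mathbf{w}_o$ is to use an iterative search algorithm that starts at some arbitrary initial point in the weight vector space and progressively moves towards the optimal point. Many iterative search algorithms have been derived to minimize the underlying cost function, with the true statistics replaced by estimates obtained in some manner. The steepest descent method is the basic gradient-based iteration: it adjusts the tap weights iteratively, moving along the error surface towards the optimal value,
$$\mathbf{w}(t+1) = \mathbf{w}(t) - \frac{\alpha}{2}\nabla_{\mathbf{w}} J(t), \tag{2.7}$$
where $\alpha$ is a step-size parameter and $\nabla_{\mathbf{w}} J(t)$ denotes the gradient of $J(t)$ with respect to the adaptive weight vector $\mathbf{w}(t)$,
$$\nabla_{\mathbf{w}} J(t) = \frac{\partial J(t)}{\partial \mathbf{w}(t)}. \tag{2.8}$$
The gradient is a vector pointing in the direction of the change in filter coefficients that causes the greatest increase in the cost $J(t)$, so the update moves against it. Evaluating the gradient of (2.5) yields
$$\nabla_{\mathbf{w}} J(t) = -2\left[\mathbf{p} - \mathbf{R}\mathbf{w}(t)\right]. \tag{2.9}$$
Substituting (2.9) into (2.7), the steepest descent algorithm is given by
$$\mathbf{w}(t+1) = \mathbf{w}(t) + \alpha\left[\mathbf{p} - \mathbf{R}\mathbf{w}(t)\right] \tag{2.10}$$
$$\phantom{\mathbf{w}(t+1)} = \mathbf{w}(t) + \alpha\mathbf{R}\left[\mathbf{h} - \mathbf{w}(t)\right], \tag{2.11}$$
where (2.11) uses $\mathbf{p} = \mathbf{R}\mathbf{h}$ for the unknown system $\mathbf{h}$, which holds when the measurement noise is uncorrelated with the input.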
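The recursion in (2.10) can be sketched in a few lines when R and p are known exactly. The statistics below are hypothetical (p = Rh for an assumed h = [1, -0.5]), and the step size 0.5 is an illustrative choice that is small enough for this particular R; neither value comes from the text.

```python
def steepest_descent(R, p, alpha, num_iters):
    """Iterate w(t+1) = w(t) + alpha * (p - R w(t)), starting from w(0) = 0."""
    n = len(p)
    w = [0.0] * n
    for _ in range(num_iters):
        # Update direction p - R w(t), as in (2.10).
        direction = [p[i] - sum(R[i][j] * w[j] for j in range(n))
                     for i in range(n)]
        w = [w[i] + alpha * direction[i] for i in range(n)]
    return w

# Hypothetical statistics, chosen so that the optimum is h = [1.0, -0.5].
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.75, 0.0]
w_sd = steepest_descent(R, p, alpha=0.5, num_iters=100)
print(w_sd)
```

Because the exact R and p are used, the iterates approach the Wiener solution deterministically, with no gradient noise.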
2.2.2 LMS Algorithm
The steepest descent algorithm cannot be implemented in practical applications, because the statistics $\mathbf{R}$ and $\mathbf{p}$ are not known a priori. In addition, both the statistical properties of the input signal and the unknown system $\mathbf{h}$ may vary with time.
The LMS algorithm is a stochastic implementation of the steepest descent algorithm. It is obtained by replacing $\mathbf{R}$ and $\mathbf{p}$ in (2.10) with their instantaneous estimates, $\mathbf{R} \approx \mathbf{x}(t)\mathbf{x}^T(t)$ and $\mathbf{p} \approx d(t)\mathbf{x}(t)$. Its coefficient updating equation is given by
$$\mathbf{w}(t+1) = \mathbf{w}(t) + \alpha\mathbf{x}(t)e(t). \tag{2.12}$$
Because the instantaneous estimates of $\mathbf{R}$ and $\mathbf{p}$ are rough, the adaptation of the LMS algorithm is noisy. The advantages of the LMS algorithm are its simple implementation and its stable, robust performance under different signal conditions. Its main disadvantage is slow convergence for
correlated input signals, due to the eigenvalue spread of the input auto-correlation matrix.
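A minimal sketch of the LMS update (2.12) in a system-identification setting is given below. The test signal, the three-tap system `h`, the step size, and the function name `lms_identify` are all assumptions made for illustration; plain Python lists are used to keep the example dependency-free.

```python
import random

def lms_identify(x, d, num_taps, alpha):
    """Run the LMS update (2.12) over x and d; return the final weights."""
    w = [0.0] * num_taps        # w(0) = 0 (assumed initial point)
    buf = [0.0] * num_taps      # input vector [x(t), x(t-1), ..., x(t-N+1)]
    for x_t, d_t in zip(x, d):
        buf = [x_t] + buf[:-1]                       # shift in the newest sample
        y = sum(wi * xi for wi, xi in zip(w, buf))   # filter output w^T(t) x(t)
        e = d_t - y                                  # error signal, eq. (2.4)
        w = [wi + alpha * xi * e for wi, xi in zip(w, buf)]
    return w

# Hypothetical setup: white Gaussian input and a noise-free 3-tap system.
rng = random.Random(0)
h = [0.5, -0.3, 0.1]
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
     for t in range(len(x))]
w_lms = lms_identify(x, d, num_taps=len(h), alpha=0.05)
print(w_lms)
```

Each iteration costs one inner product and one scaled vector addition, which is the source of the 2N multiplication count discussed in Section 2.2.6.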
2.2.3 Normalized LMS (NLMS) Algorithm
The convergence of LMS is affected by the magnitude of the input signal. To make the convergence behavior independent of the input energy, the NLMS algorithm normalizes the filter vector update by the input energy. The algorithm is given by
$$\mathbf{w}(t+1) = \mathbf{w}(t) + \alpha\frac{\mathbf{x}(t)e(t)}{\mathbf{x}^T(t)\mathbf{x}(t) + \delta}, \tag{2.13}$$
where $\delta$ is a small positive constant, called the regularization parameter, that avoids division by zero when the input signal is zero over a long period.
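The normalized update (2.13) can be sketched as follows. To illustrate the independence from input energy, the same hypothetical experiment is run at two input scales with an unchanged step size; the system `h`, the scales, and the helper names `nlms_identify` and `make_signals` are assumptions introduced for this example.

```python
import random

def nlms_identify(x, d, num_taps, alpha, delta):
    """Run the NLMS update (2.13) over x and d; return the final weights."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps      # input vector [x(t), ..., x(t-N+1)]
    for x_t, d_t in zip(x, d):
        buf = [x_t] + buf[:-1]
        e = d_t - sum(wi * xi for wi, xi in zip(w, buf))
        energy = sum(xi * xi for xi in buf)          # x^T(t) x(t)
        step = alpha / (energy + delta)              # normalized step size
        w = [wi + step * xi * e for wi, xi in zip(w, buf)]
    return w

def make_signals(h, scale, num_samples, seed):
    """Hypothetical test data: scaled white Gaussian input, noise-free output."""
    rng = random.Random(seed)
    x = [scale * rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    d = [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
         for t in range(num_samples)]
    return x, d

h = [0.5, -0.3, 0.1]
results = {}
for scale in (1.0, 100.0):
    x, d = make_signals(h, scale, num_samples=2000, seed=0)
    results[scale] = nlms_identify(x, d, len(h), alpha=0.5, delta=1e-6)
print(results)
```

With plain LMS, a step size tuned for the unit-scale input would be far too large for the input scaled by 100; here the normalization absorbs the change in energy.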
The NLMS algorithm can also be derived in several ways [2, 7]. It can be viewed as a variable step-size LMS algorithm whose optimal step size is obtained by solving a constrained optimization problem based on the least perturbation principle. This principle states that the change in the coefficient vector should be minimized subject to a constraint:
$$\mathbf{w}(t+1) = \arg\min_{\mathbf{w}(t+1)} \|\mathbf{w}(t+1) - \mathbf{w}(t)\|_2^2,$$
$$\text{subject to } \mathbf{x}^T(t)\mathbf{w}(t+1) = d(t). \tag{2.14}$$
This problem can be solved by the method of Lagrange multipliers [2], and the resulting algorithm is NLMS. The NLMS algorithm applies a minimum-norm update to the solution vector such that, for $\alpha = 1$ and $\delta = 0$, the a posteriori error of the most recently added equation is exactly zero.
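The Lagrange-multiplier step can be written out briefly. The derivation below is a sketch of the standard argument, not a reproduction of the treatment in [2]; the multiplier symbol $\eta$ is introduced here to avoid a clash with other symbols in this chapter.

```latex
\begin{align*}
\mathcal{L}\bigl(\mathbf{w}(t+1), \eta\bigr)
  &= \|\mathbf{w}(t+1) - \mathbf{w}(t)\|_2^2
     + \eta\,\bigl[d(t) - \mathbf{x}^T(t)\mathbf{w}(t+1)\bigr], \\
\frac{\partial \mathcal{L}}{\partial \mathbf{w}(t+1)}
  &= 2\bigl[\mathbf{w}(t+1) - \mathbf{w}(t)\bigr] - \eta\,\mathbf{x}(t) = \mathbf{0}
  \;\Rightarrow\;
  \mathbf{w}(t+1) = \mathbf{w}(t) + \tfrac{\eta}{2}\,\mathbf{x}(t), \\
d(t) &= \mathbf{x}^T(t)\mathbf{w}(t+1)
      = \mathbf{x}^T(t)\mathbf{w}(t) + \tfrac{\eta}{2}\,\mathbf{x}^T(t)\mathbf{x}(t)
  \;\Rightarrow\;
  \tfrac{\eta}{2} = \frac{e(t)}{\mathbf{x}^T(t)\mathbf{x}(t)}, \\
\mathbf{w}(t+1) &= \mathbf{w}(t) + \frac{\mathbf{x}(t)e(t)}{\mathbf{x}^T(t)\mathbf{x}(t)}.
\end{align*}
```

The last line is the NLMS update with unit step size and no regularization. The step size $\alpha$ and the regularization parameter $\delta$ then relax the exact constraint in exchange for robustness to noise and to low-energy input.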
The NLMS algorithm is widely used in practical applications because of its simplicity and stability. A step-size parameter in the range $0 < \alpha < 2$ ensures that the NLMS algorithm converges. Its main disadvantage is slow convergence for correlated input signals.
2.2.4 Affine Projection Algorithm (APA)
For correlated input signals, especially speech, the convergence speed of LMS-type algorithms depends on the eigenvalue spread of the input signal's auto-correlation matrix. The APA improves the convergence of the adaptive filter by pre-whitening the input signal. It is an intermediate algorithm between NLMS and RLS, in that both its performance and its complexity lie between those of the two.
Define the input matrix $\mathbf{X}(t)$ as the $P$ most recent input vectors and the desired vector $\mathbf{d}(t)$ as the $P$ most recent values of $d(t)$, where $P$ is the projection order [2]:
$$\mathbf{X}(t) = [\mathbf{x}(t)\ \ \mathbf{x}(t-1)\ \cdots\ \mathbf{x}(t-P+1)], \tag{2.15}$$
$$\mathbf{d}(t) = [d(t)\ \ d(t-1)\ \cdots\ d(t-P+1)]^T. \tag{2.16}$$
The APA solves the $P$ most recent equations exactly, based on a minimum-norm weight vector update:
$$\mathbf{w}(t+1) = \arg\min_{\mathbf{w}(t+1)} \|\mathbf{w}(t+1) - \mathbf{w}(t)\|_2^2, \tag{2.17}$$
$$\text{subject to } \mathbf{X}^T(t)\mathbf{w}(t+1) = \mathbf{d}(t). \tag{2.18}$$
The constrained cost function is minimized by taking the partial derivatives with respect to all entries of the coefficient vector and setting the results to zero. The resulting APA can be summarized as [2]:
$$\mathbf{e}(t) = \mathbf{d}(t) - \mathbf{X}^T(t)\mathbf{w}(t), \tag{2.19}$$
$$\mathbf{w}(t+1) = \mathbf{w}(t) + \alpha\mathbf{X}(t)\left[\mathbf{X}^T(t)\mathbf{X}(t) + \delta\mathbf{I}\right]^{-1}\mathbf{e}(t), \tag{2.20}$$
where $\alpha$ is a global constant step-size parameter, $\delta$ is the regularization parameter, and $\mathbf{I}$ is the $P \times P$ identity matrix.
The NLMS algorithm is a special case of the APA with $P = 1$. The APA exploits more information from the input signal and consequently obtains a more accurate estimate of the gradient. For correlated input signals, it converges faster than NLMS with only a
modest increase of computational complexity.
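The APA recursion (2.19)-(2.20) can be sketched without external libraries by solving the small P x P system with Gaussian elimination. Everything specific below is an assumption for illustration: the AR(1) test input, the system `h`, the parameter values, and the helper names `solve` and `apa_identify`.

```python
import random

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def apa_identify(x, d, num_taps, order, alpha, delta):
    """Run the APA update (2.19)-(2.20); return the final weights."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                              # current x(t)
    cols = [[0.0] * num_taps for _ in range(order)]     # columns of X(t)
    d_vec = [0.0] * order                               # d(t), ..., d(t-P+1)
    for x_t, d_t in zip(x, d):
        buf = [x_t] + buf[:-1]
        cols = [buf] + cols[:-1]
        d_vec = [d_t] + d_vec[:-1]
        # e(t) = d(t) - X^T(t) w(t)
        e = [d_vec[k] - sum(c * wi for c, wi in zip(cols[k], w))
             for k in range(order)]
        # A = X^T(t) X(t) + delta * I
        A = [[sum(a * b for a, b in zip(cols[i], cols[j]))
              + (delta if i == j else 0.0)
              for j in range(order)] for i in range(order)]
        z = solve(A, e)
        # w(t+1) = w(t) + alpha * X(t) z
        w = [w[n] + alpha * sum(cols[k][n] * z[k] for k in range(order))
             for n in range(num_taps)]
    return w

# Hypothetical setup: correlated AR(1) input and a noise-free 3-tap system.
rng = random.Random(1)
x, prev = [], 0.0
for _ in range(2000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
h = [0.5, -0.3, 0.1]
d = [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
     for t in range(len(x))]
w_apa = apa_identify(x, d, len(h), order=2, alpha=0.5, delta=1e-6)
print(w_apa)
```

Setting `order=1` reduces the P x P system to a scalar division, and the update then coincides with NLMS.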
2.2.5 Recursive Least-Square (RLS) Algorithm
The RLS algorithm is not a stochastic algorithm but a deterministic one: it is the solution of a least-squares problem whose cost function is deterministic. An exponentially weighted cost function is defined as:
$$J(t) = \sum_{i=0}^{t} \lambda^{i}\,|e(t-i)|^2, \tag{2.21}$$
where $\lambda$ is an exponential weighting factor that effectively limits the number of input samples over which the cost function is minimized. Generally, $\lambda$ is close to, but less than, 1. The optimal tap-weight vector is defined by the following normal equations:
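$$\boldsymbol{\Phi}(t) = \sum_{i=1}^{t} \lambda^{t-i}\mathbf{x}(i)\mathbf{x}^T(i) = \lambda\boldsymbol{\Phi}(t-1) + \mathbf{x}(t)\mathbf{x}^T(t), \tag{2.22}$$
$$\mathbf{z}(t) = \sum_{i=1}^{t} \lambda^{t-i}d(i)\mathbf{x}(i) = \lambda\mathbf{z}(t-1) + d(t)\mathbf{x}(t), \tag{2.23}$$
$$\boldsymbol{\Phi}(t)\mathbf{w}(t) = \mathbf{z}(t). \tag{2.24}$$
Solving for $\mathbf{w}(t)$ yields
$$\mathbf{w}(t) = \boldsymbol{\Phi}^{-1}(t)\mathbf{z}(t). \tag{2.25}$$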
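The matrix inversion lemma is then applied to the recursive model of the correlation matrix $\boldsymbol{\Phi}(t)$ in (2.22), so that its inverse, $\mathbf{S}(t) = \boldsymbol{\Phi}^{-1}(t)$, can be computed recursively [refer to Appendix A.1]. For the exponentially weighted RLS algorithm, the coefficient vector updating equations are:
$$\mathbf{S}(t) = \frac{1}{\lambda}\left[\mathbf{S}(t-1) - \frac{\boldsymbol{\Psi}(t)\boldsymbol{\Psi}^T(t)}{\lambda + \mathbf{x}^T(t)\boldsymbol{\Psi}(t)}\right], \tag{2.26}$$
$$\mathbf{w}(t+1) = \mathbf{w}(t) + \mathbf{S}(t)\mathbf{x}(t)e(t), \tag{2.27}$$
$$\boldsymbol{\Psi}(t) = \mathbf{S}(t-1)\mathbf{x}(t). \tag{2.28}$$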
of the adaptation up to the present. It converges at a much higher speed than the LMS algorithm and the APA.
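A sketch of the exponentially weighted RLS recursion is given below. It follows from applying the matrix inversion lemma to the rank-one update of the correlation matrix in (2.22), with `S` standing for the inverse correlation matrix. The initialization S(0) = I/delta is a common choice assumed here, not one stated in the text, and the test signal, system `h`, and function name `rls_identify` are likewise hypothetical.

```python
import random

def rls_identify(x, d, num_taps, lam, delta):
    """Exponentially weighted RLS; S tracks the inverse correlation matrix."""
    n = num_taps
    w = [0.0] * n
    # Assumed initialization: S(0) = I / delta.
    S = [[(1.0 / delta if i == j else 0.0) for j in range(n)] for i in range(n)]
    buf = [0.0] * n                                   # input vector x(t)
    for x_t, d_t in zip(x, d):
        buf = [x_t] + buf[:-1]
        e = d_t - sum(wi * xi for wi, xi in zip(w, buf))   # a priori error
        # psi = S(t-1) x(t)
        psi = [sum(S[i][j] * buf[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(xi * pi for xi, pi in zip(buf, psi))
        # S(t) = (1/lam) * [S(t-1) - psi psi^T / (lam + x^T psi)]
        S = [[(S[i][j] - psi[i] * psi[j] / denom) / lam for j in range(n)]
             for i in range(n)]
        # w(t+1) = w(t) + S(t) x(t) e(t)
        gain = [sum(S[i][j] * buf[j] for j in range(n)) for i in range(n)]
        w = [wi + g * e for wi, g in zip(w, gain)]
    return w

# Hypothetical setup: correlated AR(1) input and a noise-free 3-tap system.
rng = random.Random(2)
x, prev = [], 0.0
for _ in range(500):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)
h = [0.5, -0.3, 0.1]
d = [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
     for t in range(len(x))]
w_rls = rls_identify(x, d, len(h), lam=0.99, delta=0.01)
print(w_rls)
```

The update of `S` touches all N x N entries at every sample, which is where the quadratic cost discussed in Section 2.2.6 comes from.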
2.2.6 Comparison of Computational Complexity
The complexity of an adaptive algorithm is determined by the number of multiplications (with divisions counted as multiplications) per iteration.
The LMS algorithm requires $2N$ multiplications per iteration: $N$ for the calculation of $\mathbf{x}^T(t)\mathbf{w}(t)$ and $N$ for the coefficient vector update. The computational complexity of NLMS is approximately the same as that of LMS. For shift-structured input data, the normalization term $\mathbf{x}^T(t)\mathbf{x}(t)$ can be computed recursively:
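$$\sigma_x^2(t) = \sigma_x^2(t-1) + x^2(t) - x^2(t-N), \tag{2.29}$$
where $x(t-N)$ is the sample leaving the $N$-sample window. This needs only 3 multiplications. Alternatively, the term can be obtained from a recursive power estimate:
$$\sigma_x^2(t) = \lambda\sigma_x^2(t-1) + (1-\lambda)x^2(t), \tag{2.30}$$
which also needs 3 multiplications.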
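The sliding-window recursion for the input energy can be checked against a direct computation. The sketch assumes that the input vector at time t holds the N samples x(t), ..., x(t-N+1), so the sample dropping out of the window is x(t-N), and that samples before t = 0 are zero. The function names are hypothetical.

```python
def sliding_energy(x, num_taps):
    """Recursively track the energy of the last num_taps samples."""
    energy = 0.0
    out = []
    for t, x_t in enumerate(x):
        # Sample leaving the window; zero before the signal starts.
        oldest = x[t - num_taps] if t >= num_taps else 0.0
        energy += x_t * x_t - oldest * oldest
        out.append(energy)
    return out

def direct_energy(x, num_taps, t):
    """Directly sum the squares of x(t), ..., x(t-N+1) for comparison."""
    return sum(x[k] ** 2 for k in range(max(0, t - num_taps + 1), t + 1))

x = [1, 2, 3, 4, 5]
recursive = sliding_energy(x, num_taps=3)
direct = [direct_energy(x, 3, t) for t in range(len(x))]
print(recursive, direct)
```

In floating-point arithmetic the recursive form can slowly accumulate rounding error over long runs, so an occasional direct recomputation is one possible safeguard.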
The APA requires approximately $(P^2 + 2P)N + O(P^3)$ multiplications per iteration. A fast implementation exists whose computational complexity is about $2N + 20P$; see [20–23] for details.
The superior performance of the RLS algorithm is attained at the expense of a large increase in computational complexity. The RLS algorithm requires a total of $N^2 + 5N + 2$ multiplications per iteration, which grows as the square of $N$. For example, when $N = 512$, the RLS algorithm requires 264706 multiplications, whereas the NLMS algorithm requires only 1027 multiplications.
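The multiplication counts quoted in this section are easy to tabulate for a given filter length. The helper below is a hypothetical convenience function: the NLMS entry assumes 2N multiplications plus 3 for the recursive normalization, and the APA entry keeps only the leading term and ignores the O(P^3) part.

```python
def multiplications_per_iteration(num_taps, order=1):
    """Evaluate the per-iteration multiplication counts for N taps, order P."""
    n, p = num_taps, order
    return {
        "LMS": 2 * n,
        "NLMS": 2 * n + 3,                         # assumes 3 extra for normalization
        "APA (leading term)": (p * p + 2 * p) * n,  # O(P^3) term omitted
        "RLS": n * n + 5 * n + 2,
    }

counts = multiplications_per_iteration(512, order=4)
for name, value in counts.items():
    print(f"{name}: {value}")
```

The projection order of 4 is an arbitrary illustrative value; the gap between the linear and quadratic counts widens quickly as N grows.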
Table 2.1 compares the computational complexity of the related conventional adaptive filtering algorithms.