2.5 Adaptive Algorithms for the Estimation of the Parameters
2.5.2 Pseudo-Linear Regression (PLR) Algorithm
The pseudo-linear regression (PLR) algorithm represents a simplification of the RPE algorithm obtained by introducing
$$F(k, z) = G(k, z) = 1. \tag{2.147}$$
The algorithm itself may be expressed as
$$\hat{\Theta}(k+1) = \hat{\Theta}(k) + \alpha R^{-1}(k+1) X_o(k) e_o(k). \tag{2.148}$$
Here the gradient $\nabla y_o(k)$ is approximated by $\nabla y_o(k) \approx X_o(k)$. The name of the algorithm stems from the fact that the output of the adaptive filter is a nonlinear function of the parameter vector $\Theta$, while the algorithm itself, when calculating the gradient (2.128), neglects the dependence of $X_o(k)$ on the parameters $\Theta$. $X_o(k)$ is also often called the regression vector and is defined by expression (2.106), while the output signal $y_o(k)$ is defined by expression (2.105).
The PLR algorithm is very similar to the RLS algorithm, so their computational complexities are comparable and they are much lower than that of the RPE algorithm.
Table 2.5 Flow diagram of the PLR algorithm
1. Initialization
• $\hat{\Theta}(0) = 0$; $R^{-1}(0) = \sigma^2 I$, $\sigma^2 \gg 1$
• Generation of the sample of the input signal $x(0)$ and the reference signal $y(0)$
• Initial output error $e_o(0) = y(0) - y_o(0) = y(0)$
• Read in the forgetting factor $0.9 \le \rho \le 0.99$
• Calculation of the convergence factor $\alpha = 1 - \lambda$
• Forming of the initial vector of filtered data $X_o(0) = [x(0)\ 0\ \dots\ 0]^T$
2. Assuming that $\hat{\Theta}(k-1)$, $e_o(k-1)$, $R^{-1}(k-1)$ and $X_o(k-1)$ are known, at each discrete moment of time $k = 1, 2, \dots$ calculate:
• Gain matrix
$$R^{-1}(k) = \frac{1}{\rho}\left[ R^{-1}(k-1) - \frac{R^{-1}(k-1) X_o(k-1) X_o^T(k-1) R^{-1}(k-1)}{\dfrac{\rho}{\alpha} + X_o^T(k-1) R^{-1}(k-1) X_o(k-1)} \right]$$
• Filter coefficients
$$\hat{\Theta}(k) = \hat{\Theta}(k-1) + \alpha R^{-1}(k) X_o(k-1) e_o(k-1)$$
• Form the data vector $X_o(k)$, where $x(i) = y(i) = 0$ for $i < 0$ (causal signals)
• Calculate the output $y_o(k) = \hat{\Theta}^T(k) X_o(k)$
• Calculate the output error (OE) $e_o(k) = y(k) - y_o(k)$
3. Increment counter k by 1 and repeat the procedure from step 2

A disadvantage of this algorithm is that it is not guaranteed to converge to the minimum of the MSE criterion, except in the case when the polynomial in the denominator of the transfer function (2.104), denoted as $1 - A(k, z^{-1})$, satisfies the strictly positive real (SPR) condition. Let us note that a discrete transfer function $G(z^{-1})$ is said to be SPR if $\mathrm{Re}\{G(e^{j\omega})\} > 0$ for all $\omega$, $-\pi < \omega < \pi$, where j is the imaginary unit. If the condition is not satisfied, the obtained results may be entirely unacceptable [24, 28].
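The SPR condition can be checked numerically on a grid of frequencies. The following is a minimal sketch, assuming that the condition is applied to $G(z^{-1}) = 1/(1 - \sum_i a_i z^{-i})$; the function name `is_spr`, the coefficient list `a` and the grid size are hypothetical choices made for illustration only, and a grid test can indicate but not prove the SPR property.

```python
import cmath
import math

def is_spr(a, n_grid=2048):
    """Grid test of Re{G(e^{jw})} > 0 for G(z^-1) = 1 / (1 - sum_i a[i-1] z^-i).

    ASSUMPTION: `a` holds the feedback coefficients a_1..a_N with the sign
    convention 1 - A(z^-1); this convention is illustrative only.
    """
    for n in range(1, n_grid):
        w = -math.pi + 2.0 * math.pi * n / n_grid      # open interval (-pi, pi)
        z_inv = cmath.exp(-1j * w)
        denom = 1.0 - sum(ai * z_inv ** (i + 1) for i, ai in enumerate(a))
        if denom == 0 or (1.0 / denom).real <= 0.0:    # pole on the circle or Re <= 0
            return False
    return True

# Two stable denominators: the first passes the test, the second does not.
spr_first_order = is_spr([0.5])          # 1 - 0.5 z^-1
spr_second_order = is_spr([1.6, -0.8])   # 1 - 1.6 z^-1 + 0.8 z^-2
```

The second example has its poles inside the unit circle, yet it fails the test, which illustrates that stability alone does not imply the SPR property.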
Contrary to the RPE algorithm, here it is not necessary to monitor stability during the parameter update. Because of that, the PLR algorithm can be used in combination with the RPE algorithm: when the RPE algorithm becomes unstable, one adopts the PLR algorithm until the poles return to the stable region. In this way it is possible to improve the properties of the RPE algorithm, which otherwise ignores the results obtained in the time intervals when the estimated poles are in the unstable region, until the stability criterion is satisfied (Table 2.5).
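The recursion of Table 2.5 can be sketched in a few lines of code. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: the regression vector is assumed to consist of the current and past inputs followed by past filter outputs (expression (2.106) is not reproduced here), the convergence factor is assumed to equal one minus the forgetting factor, and the names `plr_identify`, `n_ff` and `n_fb` are hypothetical.

```python
import random

def plr_identify(x, y, n_ff, n_fb, rho=0.99, sigma2=1e4):
    """Run the recursion of Table 2.5 and return the final coefficient vector."""
    n = n_ff + 1 + n_fb                  # number of adapted coefficients
    alpha = 1.0 - rho                    # ASSUMPTION: convergence factor 1 - rho
    theta = [0.0] * n                    # Theta_hat(0) = 0
    r_inv = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    y_o = [0.0] * len(x)                 # filter outputs, y_o(0) = 0

    def regressor(k):
        # ASSUMED layout of X_o(k): x(k..k-n_ff), then y_o(k-1..k-n_fb),
        # with zeros for negative time indices (causal signals).
        xs = [x[k - i] if k - i >= 0 else 0.0 for i in range(n_ff + 1)]
        ys = [y_o[k - i] if k - i >= 0 else 0.0 for i in range(1, n_fb + 1)]
        return xs + ys

    x_o = regressor(0)
    e_o = y[0]                           # e_o(0) = y(0) - y_o(0) = y(0)
    for k in range(1, len(x)):
        # Gain matrix R^{-1}(k); r_inv is symmetric, so (R^{-1}X)(R^{-1}X)^T
        # equals R^{-1} X X^T R^{-1}.
        rx = [sum(r_inv[i][j] * x_o[j] for j in range(n)) for i in range(n)]
        denom = rho / alpha + sum(x_o[i] * rx[i] for i in range(n))
        r_inv = [[(r_inv[i][j] - rx[i] * rx[j] / denom) / rho
                  for j in range(n)] for i in range(n)]
        # Coefficient update with the data vector and error of the previous step.
        gain = [sum(r_inv[i][j] * x_o[j] for j in range(n)) for i in range(n)]
        theta = [theta[i] + alpha * gain[i] * e_o for i in range(n)]
        # New data vector, filter output and output error.
        x_o = regressor(k)
        y_o[k] = sum(theta[i] * x_o[i] for i in range(n))
        e_o = y[k] - y_o[k]
    return theta

# Sanity check on a noise-free system without feedback (n_fb = 0).
rng = random.Random(0)
x_demo = [rng.gauss(0.0, 1.0) for _ in range(500)]
y_demo = [0.7 * x_demo[k] - 0.3 * (x_demo[k - 1] if k > 0 else 0.0)
          for k in range(500)]
theta_demo = plr_identify(x_demo, y_demo, n_ff=1, n_fb=0)
```

With `n_fb = 0` the regression vector contains only input samples, so the sketch reduces to an RLS-type recursion, which is why this case is convenient as a sanity check. Setting `n_fb` to one or more adds the fed-back filter outputs, in which case convergence depends on the SPR condition discussed above.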
Let us note at the end that the theory of adaptive IIR filters is still insufficiently researched, since their analysis involves nonlinear systems of high order, which is also one reason for their relatively narrow application. Prior analyses and computer simulations are often necessary to determine with certainty the properties of IIR adaptive algorithms [29, 30]. Thus the analysis and synthesis of adaptive IIR filters in various tasks of processing and transfer of noise-contaminated signals remains a subject of both theoretical and practical interest.
Finite Impulse Response Adaptive Filters with Variable Forgetting Factor
The statistical properties of the input and the reference signal determine the environment of an adaptive filter. Although most of the analyses of adaptive filters in available literature are based on a stationary environment, the utilization of adaptive filters shows its advantages primarily in nonstationary environments.
Nonstationarity may be categorized with respect to the change of statistical properties of the input signal, the reference signal, also including the variation of the estimated system parameters, or both simultaneously. This Chapter considers the cases when the input signal is stationary, although it does not have to be the limiting condition for the application of the analyzed algorithms. Further, it was assumed that additive noise at the system output is stationary with regard to the reference signal (desired output), so that we considered a model of nonstationarity caused by the variation of the value of estimated filter parameters.
When an adaptive filter operates in a nonstationary environment, the most important measures of its properties are (1) the time necessary for the algorithm to converge after the onset of nonstationary changes and (2) the accuracy of the estimated parameters achieved after convergence. However, these two requirements are mutually opposed, so that it is necessary to define an algorithm representing a suitable compromise between them. One of the solutions is the use of adaptive algorithms with a variable forgetting factor.
3.1 Choice of Variable Forgetting Factor
The choice of a fixed forgetting factor with a value close to unity enables efficient tracking of slow changes of the system parameters. However, this approach gives poor results if the changes of the system parameters are abrupt. As stressed in Sect. 2.4.4, the application of a variable forgetting factor in a parameter estimation algorithm ensures that previous signal measurements are weighted differently.
B. Kovačević et al., Adaptive Digital Filters, DOI: 10.1007/978-3-642-33561-7_3, Academic Mind Belgrade and Springer-Verlag Berlin Heidelberg 2013

In the previous Chapter it was shown that the forgetting factor $\rho$ in the RLS algorithm corresponds to an asymptotically exponential decrease of memory, with a memory length defined by Eq. (2.71), i.e.
$$\tau = \frac{1}{1 - \rho}. \tag{3.1}$$
If it is assumed that the properties of an environment within an interval $\tau$ remain approximately unchanged, it is possible to use (3.1) to determine the adequate value of the forgetting factor $\rho$. Thus for nonstationary signals it is necessary to adaptively change the forgetting factor during the operation of the algorithm. On the nonstationary parts of the signal it is optimal to use a short memory length $\tau = \tau_{\min}$, for which $\rho = \rho_{\min} < 1$, while for the stationary parts of the signal one should establish a long memory, i.e. $\tau = \tau_{\max}$, for which $\rho = \rho_{\max} \approx 1$. In this manner one obtains a tradeoff between the desired accuracy and the adaptation speed of the estimated parameters. According to (3.1) it follows that
$$\tau_{\min} = 20 \Rightarrow \rho_{\min} = 0.95; \qquad \tau_{\max} = 100 \Rightarrow \rho_{\max} = 0.99. \tag{3.2}$$
Further, in such an approach it is assumed that the nonstationary signal consists of stationary parts of a certain length in a range between $\tau = \tau_{\min}$ and $\tau = \tau_{\max}$. However, in practical situations it is unlikely that the duration of these intervals of stationarity and the moment of their onset will be known.
Because of that, during the operation of the parameter estimation algorithm one has to estimate the degree of signal nonstationarity and based on that knowledge to automatically determine the change of the value of the forgetting factor.
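The relation between the memory length and the forgetting factor in (3.1) and (3.2) is simple enough to be evaluated directly. The following sketch shows the conversion in both directions; the function names are hypothetical and serve only for illustration.

```python
def forgetting_factor(tau):
    """Forgetting factor for an asymptotic memory of `tau` samples, from (3.1)."""
    if tau <= 1.0:
        raise ValueError("the memory length must exceed one sample")
    return 1.0 - 1.0 / tau

def memory_length(rho):
    """Asymptotic memory length in samples for a forgetting factor `rho`, (3.1)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("the forgetting factor must lie strictly between 0 and 1")
    return 1.0 / (1.0 - rho)

# The two memory lengths quoted in (3.2).
rho_short = forgetting_factor(20.0)
rho_long = forgetting_factor(100.0)
```

The two values reproduce, up to rounding, the pair of forgetting factors quoted in (3.2).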
Two convenient ways to adaptively determine the forgetting factor, both based on the energy of the error signal (residual) in one data window, are presented in the papers [9] and [10]. The basic idea is to generate a variable forgetting factor from the error residual, which increases in the nonstationary parts of the signal and thus indicates the onset of nonstationarity [31, 32].
3.1.1 Choice of Forgetting Factor Based on the Extended Prediction Error
Reference [9] proposed a procedure for the choice of the variable forgetting factor based on the extended prediction error (EPE algorithm), defined on a data window of length L as
$$Q(k) = \frac{1}{L} \sum_{i=0}^{L-1} e^2(k-i). \tag{3.3}$$
Since the error e occurring because of the presence of additive noise at the filter output is a stochastic process, the idea is to use averaging (summation) to remove (filter) the stochastic error component caused by additive noise, in order to avoid erroneous recognition of nonstationarity as a presence of high additive noise.
However, the value of L (the length of the data window on which one calculates the error energy, i.e. the EPE criterion) must be sufficiently small in comparison to the maximal time constant $\tau_{\max}$ (algorithm memory), in order to ensure the best possible registering of potential nonstationarity of the signal. The choice of the variable forgetting factor is defined by (3.1), i.e.
$$\rho(k) = 1 - \frac{1}{\tau(k)}, \tag{3.4}$$
where [9]:
$$\tau(k) = \frac{\sigma_n^2 \tau_{\max}}{Q(k)}. \tag{3.5}$$
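The rule (3.3)–(3.5) can be sketched as follows. This is a minimal illustration and not the implementation of [9]: the function name, the default parameter values and the treatment of the first samples (errors with negative time index are taken as zero) are assumptions, and the lower bound on the forgetting factor is applied because (3.4) and (3.5) alone can produce values that are too small or negative.

```python
def epe_forgetting_factor(errors, noise_var, tau_max=100.0, window=10,
                          rho_min=0.95):
    """Return the forgetting factor for every time index of `errors`."""
    rhos = []
    for k in range(len(errors)):
        # Windowed error energy Q(k), eq. (3.3); samples before time 0 count as 0.
        recent = errors[max(0, k - window + 1):k + 1]
        q = sum(e * e for e in recent) / window
        # Eqs. (3.4) and (3.5) combined: rho = 1 - 1/tau = 1 - Q / (var * tau_max).
        rho = 1.0 - q / (noise_var * tau_max)
        rhos.append(max(rho_min, rho))   # ASSUMED lower bound rho_min
    return rhos

# Error level equal to the noise level, followed by a burst of large errors.
errors_demo = [1.0] * 20 + [10.0] * 5
rhos_demo = epe_forgetting_factor(errors_demo, noise_var=1.0, window=5)
```

In the first part of the example the error energy equals the assumed noise variance, so the forgetting factor settles at the value that corresponds to the maximal memory, while the burst of large errors drives it to the lower bound.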
In (3.5), $\sigma_n^2$ denotes the expected (estimated) variance of additive noise, based on a real knowledge of the analyzed stochastic process which generated the measurement data at the filter output. In the stationary parts of the signal the extended prediction error $Q(k)$ tends to the noise variance $\sigma_n^2$, and in this case the maximal asymptotic value of the memory ($\tau_{\max}$) controls the adaptation speed. Since the choice of the forgetting factor defined by (3.4) and (3.5) does not guarantee positive values of the forgetting factor $\rho$ in (3.4), it is necessary to limit in advance the bottom value of this factor to $\rho_{\min} < 1$. It turns out that this algorithm is efficient in the cases when the signal-to-noise ratio (SNR) is above 20 dB. For an SNR decreasing below 10 dB this algorithm gives poor results (the SNR is defined as $\mathrm{SNR} = 10 \log_{10}(\sigma_y^2 / \sigma_n^2)$, where $\sigma_y^2$ is the variance or mean power of the filter output signal in the absence of additive noise, while $\sigma_n^2$ denotes the variance as a measure of the mean power of additive noise). Besides that, it is necessary to specify in advance the variance of additive noise $\sigma_n^2$, which is not easily determined in many cases. The scheme (3.4), (3.5) for the choice of the variable forgetting factor is very sensitive to this parameter, which in the general case may be estimated from measured data (by their adequate averaging or in some other way). In practical situations, to obtain a heuristic estimate of an unknown noise level one often uses the median of the absolute deviations from the median, calculated on a data window of length L [19, 33]
$$\sigma_n \approx d(k) = \frac{\mathrm{median}\{\, |e(i) - \mathrm{median}[e(i)]| \,\}}{0.6745}, \tag{3.6}$$
so that the corresponding estimate of the variance $\sigma_n^2$ is $d^2(k)$,
where k is the current discrete moment, and the index of discrete time i belongs to the set $i = k, k-1, \dots, k-L+1$. The median is the middle term of the sample sorted in increasing order if the sample length L is an odd number, or the arithmetic mean of the two middle terms of the sorted sample if the sample length L is an even number [16, 19, 33, 34]. The factor 0.6745 ensures that the estimate (3.6) is approximately equal to the standard deviation of the sample, $\sigma_n$, for a sufficiently large length L of the data window and in the case that the terms of the discrete sequence $\{e(i)\}$ are generated according to the normal distribution law, with zero mean value and variance $\sigma_n^2$. Instead of the estimate (3.6) one may also use the arithmetic mean [16, 19, 34]
$$\sigma_n^2 \approx \frac{1}{L} \sum_{i=0}^{L-1} [e(i) - \bar{e}]^2, \qquad \bar{e} = \frac{1}{L} \sum_{i=0}^{L-1} e(i). \tag{3.7}$$
It is not convenient to utilize the estimate (3.7) in situations when the measurement noise has an impulsive character, i.e. when it contains sporadic realizations of high intensity, denoted as "outliers"; such an estimate is non-robust in the quoted conditions [17, 18, 19, 21, 33]. The usual choice for the estimate (3.6) is $5 \le L \le 10$, while in the case of the estimate (3.7) one adopts $L \ge 30$ [26, 27, 35, 36].
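The difference in robustness between the median-based estimate (3.6) and the arithmetic-mean estimate (3.7) can be seen on a small example. The sketch below is illustrative only; the function names and the sample values are hypothetical.

```python
import statistics

def mad_sigma(window):
    """Median-based estimate (3.6) of the noise standard deviation."""
    med = statistics.median(window)
    return statistics.median([abs(e - med) for e in window]) / 0.6745

def sample_variance(window):
    """Arithmetic-mean estimate (3.7) of the noise variance."""
    mean = sum(window) / len(window)
    return sum((e - mean) ** 2 for e in window) / len(window)

# A short error window, and the same window with one sample replaced by an outlier.
clean = [0.1, -0.2, 0.05, -0.1, 0.0, -0.05, 0.15]
dirty = clean[:-1] + [50.0]

sigma_clean, sigma_dirty = mad_sigma(clean), mad_sigma(dirty)
var_clean, var_dirty = sample_variance(clean), sample_variance(dirty)
```

In this example the single outlier leaves the median-based estimate unchanged, whereas the estimate (3.7) grows by several orders of magnitude.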
3.1.2 Fortescue–Kershenbaum–Ydstie Algorithm
One of the most frequently cited algorithms for the choice of the variable forgetting factor in the recursive least squares (RLS) algorithm was proposed in the paper of Fortescue, Kershenbaum and Ydstie (FKY), after whom it is named the FKY algorithm [10]. The value of the forgetting factor is determined according to the ratio of the current value of the squared error signal and the estimated power of additive noise. The choice of the forgetting factor in the RLS algorithm is given by
$$\rho(k) = 1 - \frac{e^2(k)}{\beta_0 \left[ 1 + X^T(k) P(k-1) X(k) \right]}, \tag{3.8}$$
where $e(k)$ is the current error or residual, and $\beta_0$ is a constant chosen to satisfy the desired estimation quality in the stationary mode of operation. Similar to the EPE algorithm, the FKY algorithm (3.8) does not guarantee positive values of the forgetting factor, so it is necessary to limit its bottom value to $\rho_{\min} < 1$.
Basically this algorithm was developed to ensure higher robustness (insensitivity) with regard to the input signal characteristics, but it also proved itself successful in applications in nonstationary environments. As in the previous case, its realization requires knowledge of the characteristics of additive noise, i.e. its variance $\sigma_n^2$.
The derivation of the FKY algorithm (3.8) consists of the following steps [10].
Let a discrete system (filter) whose parameters are estimated be described by the following linear regression model
$$y(k) = \Theta^T(k) X(k) + n(k), \tag{3.9}$$
where $y(k)$ is the system output (noisy desired output of the system), $\Theta(k) = [b_0\ b_1\ b_2\ \dots\ b_M]^T$ is the vector of the estimated parameters (the system model is known up to an unknown parameter vector), $X(k)$ is the vector of input signal measurements and $n(k)$ is additive noise at the system output. If one defines a vector of estimated parameters $\hat{\Theta}(k)$ for an adaptive FIR filter of order M
$$\hat{\Theta}(k) = [\hat{b}_0(k)\ \hat{b}_1(k)\ \hat{b}_2(k)\ \dots\ \hat{b}_M(k)]^T, \tag{3.10}$$
then according to the signal model (3.9) the expected output of the adaptive filter at a moment k (output prediction) is given as
$$\hat{y}(k) = X^T(k) \hat{\Theta}(k-1), \tag{3.11}$$
where the unknown parameter vector $\Theta(k)$ is replaced by its last known estimate $\hat{\Theta}(k-1)$, obtained before the output signal $y(k)$ was measured, and the noise $n(k)$ is approximated by its mean or expected value, which is assumed to be zero.
For the considered FIR system the input data vector is given as $X^T(k) = [x(k)\ x(k-1)\ \dots\ x(k-M)]$, so that (3.9) reduces to a stochastic linear difference equation (linear regression model)
$$y(k) = \sum_{i=0}^{M} b_i(k) x(k-i) + n(k).$$
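The construction of the input data vector and the output prediction (3.11) can be illustrated with a short sketch. The function names are hypothetical, and input samples with negative time index are assumed to be zero (causal signals).

```python
def data_vector(x, k, order):
    """Input data vector [x(k), x(k-1), ..., x(k-M)], zero for negative indices."""
    return [x[k - i] if k - i >= 0 else 0.0 for i in range(order + 1)]

def predict(theta_prev, x, k):
    """Output prediction (3.11) computed from the previous parameter estimate."""
    xk = data_vector(x, k, len(theta_prev) - 1)
    return sum(t * v for t, v in zip(theta_prev, xk))

# A first-order FIR example: prediction at k = 2 is 0.5 * x(2) - 1.0 * x(1).
x_demo = [1.0, 2.0, 3.0]
vector_demo = data_vector(x_demo, 1, 2)
y_hat_demo = predict([0.5, -1.0], x_demo, 2)
```

The difference between the measured output and this prediction is the error signal that drives the parameter update.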
Parameter estimation may be achieved by the application of the recursive least squares algorithm with exponential weighting of the squared error signals (2.89), the so-called WRLS algorithm, i.e.
$$\hat{\Theta}(k) = \hat{\Theta}(k-1) + K(k) e(k), \tag{3.12}$$
$$e(k) = y(k) - \hat{y}(k) = y(k) - X^T(k) \hat{\Theta}(k-1), \tag{3.13}$$
$$K(k) = P(k-1) X(k) \left[ \rho + X^T(k) P(k-1) X(k) \right]^{-1}, \tag{3.14}$$
$$P(k) = \frac{1}{\rho} \left\{ P(k-1) - P(k-1) X(k) \left[ \rho + X^T(k) P(k-1) X(k) \right]^{-1} X^T(k) P(k-1) \right\}, \tag{3.15}$$
where the initial value $P(0) = \sigma^2 I$ represents a unit matrix, I, multiplied by a large positive number $\sigma^2 \gg 1$. As will be shown in the further text, the matrix P has the meaning of an error covariance matrix of the estimated parameters. The role of the forgetting factor is to enable tracking of parameter changes in time-varying systems. The adaptation speed is determined by the asymptotic memory of the algorithm defined by (3.1), i.e.
$$\tau = \frac{1}{1 - \rho}, \tag{3.16}$$
which limits the evaluation of the previous signal measurements to $\tau$ time samples.
It should be noted that for a choice of $\rho = 1$, with the advancing estimation process, the value of the matrix P decreases, and a consequence is that the information about the system dynamics, i.e. about the estimated parameters, decreases and finally completely disappears. On the other hand, setting $\rho$ to a value below 1, in order to include information about the changes that occurred in the system (its parameters), leads to continuous division of the matrix P by a factor lower than one, which may lead to a sharp increase of its value, as well as to a large sensitivity to disturbances and numerical errors propagating through the residual $e(k)$ in (3.13).
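Both effects can be observed on the recursion (3.15) reduced to a single parameter. The following sketch is an illustration under that simplifying assumption; the function name and the test signals are hypothetical.

```python
def scalar_p_recursion(p0, rho, inputs):
    """Iterate (3.15) for a single-parameter filter and return the final P."""
    p = p0
    for x in inputs:
        p = (p - p * x * x * p / (rho + x * p * x)) / rho
    return p

# rho = 1 with a constant unit input: P shrinks as 1 / (1/P(0) + k).
p_shrunk = scalar_p_recursion(100.0, 1.0, [1.0] * 100)

# rho < 1 with no input excitation: P is divided by rho in every step.
p_grown = scalar_p_recursion(1.0, 0.95, [0.0] * 50)
```

In the first case the gain, which is proportional to P, decays towards zero, so that new measurements are gradually ignored. In the second case P grows geometrically as long as the input provides no excitation.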
The error signal or residual (3.13) contains the information about the state of the estimator at each discrete moment k. Small values of the error signal mean, except in the case of a possible absence of the input signal, that the values of the estimated parameters are close to the desired values. In that case it is desirable to choose a value of the forgetting factor $\rho$ close to unity, in order to weight all previous measurements comparably. In the case of an increasing error signal one should increase the sensitivity of the estimator, i.e. decrease the value of the forgetting factor $\rho$ below unity, until the estimated parameters are updated to the desired values and the error signal becomes sufficiently small.
According to this requirement, one may define the measure of the information content of the filter, $\beta(k)$, as a weighted sum of squares of error signals, which in its recursive form is given as [10]
$$\beta(k) = \rho(k) \beta(k-1) + e^2(k) \left[ 1 + X^T(k) P(k-1) X(k) \right]^{-1}, \tag{3.17}$$
where $\rho(k)$ is the variable forgetting factor. Let us note that the second addend in (3.17) represents a normalized error, since the term $1 + X^T(k) P(k-1) X(k)$ represents an estimate of the variance of the error $e(k)$, as will be shown at the end of this Chapter.
The choice of $\beta(k)$ in such a manner as to preserve its constant value, i.e.
$$\beta(k) = \beta(k-1) = \dots = \beta_0 \tag{3.18}$$
may define the strategy for the choice of the forgetting factor in such a manner that at each moment it depends on the measure of the information content of the filter, which is constant. Namely, from (3.17) and (3.18) it directly follows that
$$\rho(k) = 1 - \frac{e^2(k)}{\beta_0 \left[ 1 + X^T(k) P(k-1) X(k) \right]}. \tag{3.19}$$
Starting from (3.16) and (3.19) one obtains for the effective filter memory
$$\tau(k) = \frac{\beta_0}{e^2(k) \left[ 1 + X^T(k) P(k-1) X(k) \right]^{-1}}. \tag{3.20}$$
Since $\beta_0$ is proportional to the sum of squared error signals, when choosing its value one may start from [10]
$$\beta_0 = \sigma_n^2 \tau_0, \tag{3.21}$$
where $\sigma_n^2$ is the expected variance of additive noise in (3.9), based on the real knowledge of the stochastic process, and $\tau_0$ represents a nominal filter memory length determining the total speed of the adaptive process. Let us note that, similar to (2.32), the solution of the difference equation (3.17) is given as
$$\beta(k) = \prod_{i=0}^{k} \rho(i)\, \beta_0 + \sum_{i=0}^{k} \omega(i) e^2(i),$$
where
$$\omega(0) = 1; \qquad \omega(i) = \left[ 1 + X^T(i) P(i-1) X(i) \right]^{-1}, \quad i = 1, \dots, k.$$
Taking into account (3.18) one concludes that
$$\beta_0 \left[ 1 - \prod_{i=0}^{k} \rho(i) \right] = \sum_{i=0}^{k} \omega(i) e^2(i).$$
Since the sum of the squared errors represents an estimation of the variance of additive noise in the model of the filter output signal (3.9), the derived expression implies the relation (3.21).
At the end of this section it is shown that for a choice of $\beta_0$ according to (3.21), for stationary processes one obtains $E\{\tau(k)\} = \tau_0$ when $k \to \infty$. The sensitivity of the system is determined by the choice of $\tau_0$, so that lower values of $\tau_0$ lead to a more sensitive system, and higher values to a less sensitive one, but with slower adaptability of the estimated parameters.
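The step from the recursion for the information measure to the expressions for the forgetting factor and the effective memory can be written out explicitly. The lines below are a sketch of that algebra, with $\rho$ the forgetting factor, $\tau$ the memory length and $\beta$ the information measure. The last line additionally assumes that $e^2(k)$ may be replaced by $\sigma_n^2 [1 + X^T(k) P(k-1) X(k)]$; this is only an illustrative approximation, suggested by the remark that this term estimates the error variance, and is not a derivation taken from [10].

```latex
\begin{align*}
\beta_0 &= \rho(k)\,\beta_0 + \frac{e^2(k)}{1 + X^T(k) P(k-1) X(k)}
  && \text{(3.17) with } \beta(k) = \beta(k-1) = \beta_0 \\
\rho(k) &= 1 - \frac{e^2(k)}{\beta_0 \left[ 1 + X^T(k) P(k-1) X(k) \right]}
  && \text{solved for } \rho(k) \\
\tau(k) &= \frac{1}{1 - \rho(k)}
         = \frac{\beta_0 \left[ 1 + X^T(k) P(k-1) X(k) \right]}{e^2(k)}
  && \text{by (3.16)} \\
\tau(k) &\approx \frac{\sigma_n^2 \tau_0}{\sigma_n^2} = \tau_0
  && \text{assuming } \beta_0 = \sigma_n^2 \tau_0,\;
     e^2(k) \approx \sigma_n^2 \left[ 1 + X^T(k) P(k-1) X(k) \right]
\end{align*}
```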
In summary, the recursive least squares algorithm with the FKY strategy for the choice of the forgetting factor is defined in Table 3.1.
It should be mentioned that an accurate solution of the problem of choosing a constant value of $\beta(k)$, which reduces to the solution of (3.18) in each step, requires the determination of the value of the forgetting factor prior to the determination of the gain of the estimator $K(k)$, which would result in a much more complex relation for the choice of the forgetting factor. In a majority of cases the practical difference between this algorithm and the described algorithm is very small, but one must introduce testing of the obtained value of $\rho(k)$, in order to ensure that the forgetting factor does not assume unacceptably small or even negative values. This problem is avoided by limiting the bottom value of the forgetting factor by introducing its minimal value $\rho_{\min}$.
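The WRLS recursion with a forgetting factor of the FKY type can be sketched as follows. This is a minimal illustration and not the implementation of [10]. In particular, the order of operations is an assumption: the forgetting factor is computed first from the current error and the previous matrix P, is bounded from below, and is then used in the gain and in the update of P. The function name, the parameter values and the test scenario are hypothetical.

```python
import random

def fky_wrls(x, y, order, beta0, rho_min=0.95, sigma2=1e4):
    """WRLS for a FIR model with a variable forgetting factor of the FKY type."""
    n = order + 1
    theta = [0.0] * n
    p = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    rhos = []
    for k in range(len(x)):
        xk = [x[k - i] if k - i >= 0 else 0.0 for i in range(n)]
        e = y[k] - sum(t * v for t, v in zip(theta, xk))      # error, (3.13)
        px = [sum(p[i][j] * xk[j] for j in range(n)) for i in range(n)]
        xpx = sum(xk[i] * px[i] for i in range(n))
        # Forgetting factor (3.8), bounded from below by rho_min.
        rho = max(rho_min, 1.0 - e * e / (beta0 * (1.0 + xpx)))
        gain = [v / (rho + xpx) for v in px]                  # gain, (3.14)
        theta = [theta[i] + gain[i] * e for i in range(n)]    # update, (3.12)
        # Matrix update (3.15); p is symmetric, so X^T P equals (P X)^T.
        p = [[(p[i][j] - gain[i] * px[j]) / rho for j in range(n)]
             for i in range(n)]
        rhos.append(rho)
    return theta, rhos

# Test scenario: the FIR coefficients change abruptly at k = 300.
rng = random.Random(1)
noise_std, tau0 = 0.05, 100.0
x_demo = [rng.gauss(0.0, 1.0) for _ in range(600)]
y_demo = []
for k in range(600):
    b0, b1 = (1.0, 0.5) if k < 300 else (-1.0, 0.2)
    previous = x_demo[k - 1] if k > 0 else 0.0
    y_demo.append(b0 * x_demo[k] + b1 * previous + rng.gauss(0.0, noise_std))

theta_demo, rhos_demo = fky_wrls(x_demo, y_demo, order=1,
                                 beta0=noise_std ** 2 * tau0)
```

In this scenario the constant is chosen as the product of the noise variance and a nominal memory of 100 samples. After the abrupt change the large errors drive the forgetting factor to its lower bound, which shortens the memory until the estimates have adapted to the new coefficients.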
As shown in the previous section, the algorithm (3.12)–(3.15) minimizes the criterion of weighted squared errors $e(i) = y(i) - \hat{\Theta}^T X(i)$, defined by (2.69), i.e.
$$J(k) = \sum_{i=0}^{k} w(i) \left[ y(i) - \hat{\Theta}^T X(i) \right]^2, \qquad w(i) = \rho^{k-i}. \tag{3.22}$$
Relation (3.22) can also be written in vector form.
Table 3.1 Determination of forgetting factor by FKY strategy
1. Initialization
2. …
• Forgetting factor (set bottom limit): $\rho(k)$ as in (3.8)
3. Increment iteration counter k by 1 and repeat the procedure from step 2

The nonrecursive least squares algorithm determines $\hat{\Theta}$ in one step from the condition of the minimum of the criterion (3.22) and is defined by the relation (2.53) or, if one adopts the notation $\hat{\Theta} = \hat{\Theta}(k)$, by the relation (3.25), where the matrix and the vectors of input and output data for a FIR discrete system (filter), among them $Z(k)$, are formed with the signal values taken as zero for $i < 0$. The algorithm (3.12)–(3.15) itself represents a recursive version of the nonrecursive single-step estimation procedure (3.25), i.e. this multistep algorithm determines the minimum of the adopted estimation criterion (3.22), thus these two algorithms are equivalent in the asymptotic sense ($k \to \infty$). If one further substitutes the discrete system model (3.9) into (3.25), one obtains
$$\hat{\Theta}(k) = \Theta(k) + X_k \dots$$
where the column vector of additive noise at the system output is
$$N(k) = [n(0)\ n(1)\ \dots\ n(k)]^T. \tag{3.30}$$
According to (3.29) one may conclude that the estimate $\hat{\Theta}(k)$ will be approximately equal to the accurate value of $\Theta(k)$ if the values of the noise realizations $n(i)$ are much smaller than the values of the components of the system excitation signal vector $X(i)$, i.e. $|n(i)| \ll \|X(i)\|$ and $|n(i)| \approx 0$, where $\|\cdot\|$ denotes the norm of the vector, which practically means that the signal-to-noise ratio (SNR) is satisfactorily high.
Relation (3.28) can be written in the modified form as
$$\hat{\Theta}(k) = \Theta(k) + \dots$$
so that one concludes that for sufficiently high k
$$\hat{\Theta}(k) = \Theta(k) + \left[ E\{ w(i) X(i) X^T(i) \} \right]^{-1} E\{ w(i) X(i) n(i) \}, \tag{3.32}$$
where $E\{\cdot\}$ denotes the mathematical expectation or mean value. While deriving (3.32), the law of large numbers [16, 17, 19, 29, 34, 37] was applied, according to which
$$\lim_{k \to \infty} \frac{1}{k} \sum_{i=0}^{k} w(i) X(i) n(i) = E\{ w(i) X(i) n(i) \}. \tag{3.34}$$
Since $\{w(i)\}$ is a deterministic series, the term $w(i)$ can be moved in front of the linear operator $E\{\cdot\}$ in (3.33) and (3.34), thus one concludes that for sufficiently large k (fulfilled assumption of the law of large numbers) the estimate $\hat{\Theta}(k)$ will be close to the accurate value of $\Theta(k)$. In this manner, the expected value of the estimate will be equal to the accurate value of the parameters (unbiased estimation) under the condition that the stochastic variables $X(i)$ and $n(i)$ are uncorrelated, i.e.
$$E\{ X(i) n(i) \} = 0. \tag{3.35}$$
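The role of the condition of uncorrelated input and noise can be illustrated numerically for a model with a single parameter. The sketch below is an illustration only: the function name, the parameter values and the way the correlated noise is generated are hypothetical, and the weighted least squares estimate is written directly for the scalar case.

```python
import random

def weighted_ls_scalar(x, y, rho=1.0):
    """Weighted least squares estimate of a single parameter, weights rho^(k-i)."""
    k = len(x) - 1
    num = sum(rho ** (k - i) * x[i] * y[i] for i in range(k + 1))
    den = sum(rho ** (k - i) * x[i] * x[i] for i in range(k + 1))
    return num / den

rng = random.Random(3)
theta_true = 2.0
x_demo = [rng.gauss(0.0, 1.0) for _ in range(5000)]
white = [rng.gauss(0.0, 0.5) for _ in range(5000)]

# Noise uncorrelated with the input: the estimate is close to the true value.
y_uncorr = [theta_true * xi + ni for xi, ni in zip(x_demo, white)]
est_uncorr = weighted_ls_scalar(x_demo, y_uncorr)

# Noise with a component proportional to the input: the estimate is biased.
y_corr = [theta_true * xi + (0.3 * xi + ni) for xi, ni in zip(x_demo, white)]
est_corr = weighted_ls_scalar(x_demo, y_corr)
```

In the second case the noise contains a component equal to 0.3 times the input, and the estimate is shifted by approximately that amount, which is the bias predicted by the second term in the expression for the estimate.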
Since according to (3.27) $X(i)$ contains only realizations of the stochastic input excitation signal of the FIR filter, the condition (3.35) will be fulfilled if additive noise $n(i)$ at the signal output (see relation (3.9)) is uncorrelated with the excitation