Joint state and parameter estimation

3.2 Particle filtering

3.2.2 Joint state and parameter estimation

So far we have discussed particle filtering in the conventional case of state estimation with fixed and known model parameters. PHM applications, however, have several features that conflict with this assumption. Firstly, the parameters of the degradation model are typically unique to the specific component [46], so their values can be assessed only to a limited extent from historical data. Secondly, the parameter values may deviate further from those obtained in laboratory tests because of the environmental conditions encountered in the field [16, p. 153]. Lastly, the parameters may also vary significantly over time, especially if the use conditions are highly dynamic. Therefore, if we consider the degradation as the state, robust RUL prediction often requires simultaneous estimation of both the state and the degradation parameters. This challenge falls within the framework of joint state and parameter estimation, for which selected particle filtering methods are discussed below.

The problem is commonly approached by augmenting the state vector of the system model, in our case (3.5), with the parameters $\theta$ [47]. The combined state-parameter vector may thus be expressed as

$$x^*_k = \begin{bmatrix} x_k \\ \theta_k \end{bmatrix}, \tag{3.13}$$

where the parameters can now be treated as state variables, so the problem reduces to the original state estimation problem. Algorithms 1 and 2 presented previously are therefore applicable by replacing $x_k$ with $x^*_k$. The difference is that each particle now comprises an estimate of both the states and the parameters.

However, in contrast to the states, the model according to which $\theta$ evolves is typically unknown. Nevertheless, the parameters have to be assigned some type of transition to enable their estimation. The simplest alternative is the so-called persistence model, in which the parameters are not changed between time steps, i.e. $\theta_k = \theta_{k-1}$ [47, 48]. In this case the transition function (3.5) in combination with (3.13) becomes

$$x^*_k = \begin{bmatrix} x_k \\ \theta_k \end{bmatrix} = \begin{bmatrix} f(x_{k-1}, \theta_{k-1}, v_{k-1}) \\ \theta_{k-1} \end{bmatrix}, \tag{3.14}$$

and we can see that all the $\theta$ values available to the algorithm are fixed by the initialized particles.
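The consequence of the persistence model can be illustrated with a minimal sketch. The scalar model `degrade`, the noise levels, the particle count and the multinomial resampling scheme below are assumptions chosen for illustration only; they are not taken from the source. The point is that resampling can only duplicate or discard the initial parameter values, never create new ones.

```python
import math
import random

def degrade(x, theta, v):
    """Hypothetical scalar degradation model f: growth at rate theta plus noise v."""
    return x + theta + v

def persistence_step(particles, rng, proc_std=0.01):
    """Propagate augmented particles (x, theta); theta is copied unchanged."""
    return [(degrade(x, th, rng.gauss(0.0, proc_std)), th) for x, th in particles]

def resample(particles, weights, rng):
    """Multinomial resampling: draw existing particles with replacement."""
    return rng.choices(particles, weights=weights, k=len(particles))

rng = random.Random(0)
meas_std = 0.05  # assumed measurement noise level

# Each particle holds a state estimate and a parameter estimate.
particles = [(0.0, rng.uniform(0.05, 0.15)) for _ in range(200)]
initial_thetas = {th for _, th in particles}

x_true, theta_true = 0.0, 0.1
for _ in range(20):
    x_true = degrade(x_true, theta_true, 0.0)
    y = x_true + rng.gauss(0.0, meas_std)
    particles = persistence_step(particles, rng)
    weights = [math.exp(-0.5 * ((y - x) / meas_std) ** 2) for x, _ in particles]
    particles = resample(particles, weights, rng)

surviving_thetas = {th for _, th in particles}
print(f"{len(initial_thetas)} initial theta values, {len(surviving_thetas)} survive")
```

Every surviving parameter value is one of the initial draws, so a misplaced or too sparse initial distribution cannot be repaired later by the filter.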

Although the persistence model has the advantage of computational simplicity, it often falls short if the particles have not been initialized properly. More specifically, if the initial distribution is misplaced or the number of particles is too low, the estimation performance may suffer because resampling is executed on the same poorly specified set of parameter values at each time step. The technique appears especially inadequate if the parameters are subject to significant changes, in which case proper estimation would require a relatively wide initial distribution of $\theta$. Nevertheless, in the field of prognostics the persistence model has been used to estimate the joint state-parameter space in mechanical crack growth [49] as well as in power MOSFET on-state resistance [34]. In such cases it is essential that the parameters of the degradation model can be defined accurately enough from initial or historical measurement data, and that they stay close to static throughout the degradation process.

The assumption of relatively well-known and nearly fixed $\theta$ may be too strict for PHM applications with changing environmental conditions and varied usage. Therefore, we have to consider further methods for estimating unknown and time-variant parameters. A typical approach for introducing dynamics into $\theta$ is a random walk, in which an artificial noise term is added to the model to evolve the parameters [47]. In this case, (3.14) is modified as

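$$x^*_k = \begin{bmatrix} f(x_{k-1}, \theta_{k-1}, v_{k-1}) \\ \theta_{k-1} + \zeta_{k-1} \end{bmatrix}, \tag{3.15}$$

where the artificial noise vector $\zeta_{k-1}$ may be defined as Gaussian,

$$\zeta_{k-1} \sim \mathcal{N}(0, W_{k-1}), \tag{3.16}$$

for some predefined variance matrix $W_{k-1}$.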

Although the random walk has been applied in conventional joint state and parameter estimation techniques, problems arise because the artificial evolution increases the variance of the particles, leading to overdispersed posterior estimates [50, 51]. Consider a Gaussian parameter posterior, which is approximated by the particles as [51]

$$p(\theta_k \mid y_{0:k}) \approx \mathcal{N}(\theta_k \mid \bar{\theta}_k, V_k), \tag{3.17}$$

where $\bar{\theta}_k$ and $V_k$ denote the Monte Carlo posterior mean and variance matrix. Now, if we add the artificial noise defined in (3.16), the resulting prior parameter distribution retains the mean $\bar{\theta}_{k+1} = \bar{\theta}_k$, but the variance matrix is increased to

$$V_{k+1} = V_k + W_k, \tag{3.18}$$

which accumulates over $k$.
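The accumulation in (3.18) can be checked numerically with a short sketch. It assumes a scalar parameter, a constant noise variance and no measurement updates between the steps (in a running filter the updates counteract part of the growth); the numeric values and the function name `random_walk` are illustrative assumptions.

```python
import random
import statistics

def random_walk(thetas, w_std, rng):
    """Evolve each parameter particle as theta_k = theta_{k-1} + zeta_{k-1}."""
    return [th + rng.gauss(0.0, w_std) for th in thetas]

rng = random.Random(1)
v0_std = 0.02   # assumed initial posterior standard deviation
w_std = 0.01    # assumed artificial noise standard deviation
steps = 50

thetas = [rng.gauss(0.1, v0_std) for _ in range(5000)]
variances = [statistics.pvariance(thetas)]
for _ in range(steps):
    thetas = random_walk(thetas, w_std, rng)
    variances.append(statistics.pvariance(thetas))

# Repeated application of (3.18) with constant W gives V_0 + steps * W.
predicted = v0_std ** 2 + steps * w_std ** 2
print(f"empirical variance {variances[-1]:.5f}, predicted {predicted:.5f}")
```

The empirical particle variance follows the predicted linear growth, while the particle mean stays close to its initial value.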

This inevitable increase of $V_k$ has some unwanted effects. As Daigle and Goebel [46] state, the estimation performance of the random walk PF becomes highly sensitive to the selected $W_k$. Generally, a large variance yields quick convergence, but the tracked trajectory then shows wide variations. Too small a variance, in turn, results in very slow convergence, if any, but once converged the tracking shows only small variations. For prognostics the effect of the random walk is especially harmful since the accumulated variance propagates into the estimated EoL and RUL distributions, unnecessarily increasing the uncertainty of the prediction.

To overcome the problem of the increasing variance, Liu and West [51] proposed a widely recognized method in which kernel smoothing is applied to the parameter posteriors. This is implemented by shrinking the parameter particles $\theta^{(i)}_k$ towards their mean $\bar{\theta}_k$, to the so-called kernel locations expressed as

$$m^{(i)}_k = a\,\theta^{(i)}_k + (1 - a)\,\bar{\theta}_k, \tag{3.19}$$

where $a$ is defined by the smoothing factor $h$ as

$$a = \sqrt{1 - h^2}. \tag{3.20}$$

After the shrinkage, a small amount of noise with variance $h^2 V_k$ is added, whereby the next parameter prior becomes

$$p(\theta_{k+1} \mid \theta_k) \sim \mathcal{N}\left(\theta_{k+1} \mid m^{(i)}_k, h^2 V_k\right), \tag{3.21}$$

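which is the evolution assigned to $\theta_k$ in the transition function. The over-dispersion implied by (3.18) is thereby corrected [51, p. 204]: the kernel locations have variance $a^2 V_k$, so the added noise restores the total variance to $a^2 V_k + h^2 V_k = V_k$ while the mean $\bar{\theta}_k$ is preserved. Recall that, as the particles approximate the posterior distribution as in (3.17), their mean and variance matrix are calculated as [52]

$$\bar{\theta}_k = \sum_{i} w^{(i)}_k \theta^{(i)}_k, \tag{3.22}$$

$$V_k = \sum_{i} w^{(i)}_k \left(\theta^{(i)}_k - \bar{\theta}_k\right)\left(\theta^{(i)}_k - \bar{\theta}_k\right)^{\mathsf{T}}, \tag{3.23}$$

where the weights $w^{(i)}_k$ are normalized.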
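The shrinkage and noise-addition steps can be sketched for a scalar parameter as follows. The function names, the particle count, the weights and the value of `h` are illustrative assumptions. The sketch verifies that the kernel locations keep the weighted mean and that their variance plus the added noise variance equals the original variance.

```python
import math
import random

def weighted_mean_var(thetas, weights):
    """Weighted Monte Carlo mean and variance of scalar parameter particles."""
    mean = sum(w * t for w, t in zip(weights, thetas))
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas))
    return mean, var

def kernel_smooth(thetas, weights, h, rng):
    """Shrink particles to the kernel locations (3.19), then draw from (3.21)."""
    a = math.sqrt(1.0 - h * h)  # shrinkage coefficient (3.20)
    mean, var = weighted_mean_var(thetas, weights)
    locations = [a * t + (1.0 - a) * mean for t in thetas]
    noise_std = h * math.sqrt(var)
    draws = [rng.gauss(m, noise_std) for m in locations]
    return locations, draws

rng = random.Random(7)
n = 2000
thetas = [rng.gauss(0.1, 0.02) for _ in range(n)]
raw = [rng.random() + 0.5 for _ in range(n)]
total = sum(raw)
weights = [r / total for r in raw]  # normalized, non-uniform weights

h = 0.3
mean, var = weighted_mean_var(thetas, weights)
locations, draws = kernel_smooth(thetas, weights, h, rng)
loc_mean, loc_var = weighted_mean_var(locations, weights)
draw_mean, draw_var = weighted_mean_var(draws, weights)

print(f"variance before: {var:.6f}")
print(f"kernel locations + noise: {loc_var + h * h * var:.6f}")
print(f"variance of new draws: {draw_var:.6f}")
```

The first two printed values agree up to rounding, and the variance of the new draws matches them up to Monte Carlo error, in contrast to the random walk where the variance would grow by the added noise variance.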

Within particle filtering, kernel smoothing is typically incorporated in an auxiliary sampling importance resampling (ASIR) filter [51, 52, 53], originally introduced as a variant of the standard SIR [54]. The fundamental idea in ASIR is to conduct the resampling at time instant $k-1$ utilizing knowledge of the measurement at $k$. In this manner the filter seeks to mimic a situation where the optimal importance density is available [39, p. 49]. This is achieved with the help of an auxiliary integer variable, denoted $j^{(i)}$, marking the index of the corresponding particle at $k-1$.

The ASIR filter operates as follows [39, p. 50], [51, p. 201]. The auxiliary integer variables are based on the likelihood of the so-called reference points $\mu^{(i)}_k$, which in some way characterize the state transition, for example its mean or a sample from it. Utilizing the reference points, the auxiliary integers are given by

$$j^{(i)} \sim p(j \mid y_{0:k}) \propto w^{(j)}_{k-1}\, p\left(y_k \mid \mu^{(j)}_k\right), \tag{3.24}$$

which enable us to draw the actual samples as

$$x^{(i)}_k \sim p\left(x_k \mid x^{j^{(i)}}_{k-1}\right). \tag{3.25}$$

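Finally, new weights are obtained from the ratio of the likelihoods of the actual sample and the reference point as

$$w^{(i)}_k \propto \frac{p\left(y_k \mid x^{(i)}_k\right)}{p\left(y_k \mid \mu^{j^{(i)}}_k\right)}. \tag{3.26}$$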

When the ASIR is combined with kernel smoothing of the parameter posteriors, the core idea of the filter stays the same, with the addition of calculating the kernel locations as in (3.19) and drawing the parameters from (3.21). A single iteration of the resulting ASIR-KS filter is given in Algorithm 3 [51, 53]. Here the kernel smoothing is performed only once, but optionally it can be repeated multiple times [51, 52].
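A single ASIR-KS iteration can be sketched for a scalar state and a scalar parameter as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 3: the linear model `f`, the Gaussian measurement likelihood, the noise levels, the multinomial draw of the auxiliary indices and all function names are hypothetical choices made for the example. Weights are handled in the log domain to avoid numerical underflow.

```python
import math
import random

def f(x, theta, v):
    """Hypothetical degradation model: growth at rate theta plus process noise v."""
    return x + theta + v

def log_lik(y, x, meas_std):
    """Gaussian measurement log-likelihood log p(y | x), up to a constant."""
    return -0.5 * ((y - x) / meas_std) ** 2

def normalize(log_w):
    """Convert log-weights to normalized weights, guarding against underflow."""
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def asir_ks_step(xs, thetas, weights, y, h, rng, proc_std=0.01, meas_std=0.05):
    """One ASIR-KS iteration for scalar state and parameter particles."""
    n = len(xs)
    a = math.sqrt(1.0 - h * h)
    # Weighted mean and variance of the parameter particles.
    mean = sum(w * t for w, t in zip(weights, thetas))
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas))
    # Reference points: mean of the state transition (zero process noise).
    mus = [f(x, t, 0.0) for x, t in zip(xs, thetas)]
    # Kernel locations as in (3.19).
    locs = [a * t + (1.0 - a) * mean for t in thetas]
    # Auxiliary indices drawn using the likelihood of the reference points.
    first_stage = normalize([
        (math.log(w) if w > 0.0 else -math.inf) + log_lik(y, mu, meas_std)
        for w, mu in zip(weights, mus)
    ])
    js = rng.choices(range(n), weights=first_stage, k=n)
    # Draw new parameters as in (3.21), then predict the state by the model.
    noise_std = h * math.sqrt(var)
    new_thetas = [rng.gauss(locs[j], noise_std) for j in js]
    new_xs = [f(xs[j], th, rng.gauss(0.0, proc_std))
              for j, th in zip(js, new_thetas)]
    # New weights: likelihood of the sample relative to its reference point.
    log_w = [log_lik(y, x, meas_std) - log_lik(y, mus[j], meas_std)
             for x, j in zip(new_xs, js)]
    return new_xs, new_thetas, normalize(log_w)

rng = random.Random(42)
n = 500
xs = [0.0] * n
thetas = [rng.uniform(0.0, 0.3) for _ in range(n)]
weights = [1.0 / n] * n

x_true, theta_true = 0.0, 0.1
for _ in range(40):
    x_true = f(x_true, theta_true, 0.0)
    y = x_true + rng.gauss(0.0, 0.05)
    xs, thetas, weights = asir_ks_step(xs, thetas, weights, y, h=0.1, rng=rng)

theta_hat = sum(w * t for w, t in zip(weights, thetas))
print(f"estimated theta: {theta_hat:.3f} (true value {theta_true})")
```

In this sketch the reference points are computed from the unshrunk parameter particles; using the kernel locations instead would be an equally plausible reading, and which variant applies should be checked against the cited sources.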

The convergence of the parameter estimates in kernel-density-based particle filters is adjusted by the smoothing factor, for which Chen et al. [55] propose the following settings depending on the required rate of dynamics. If $\theta$ is assumed to be fixed or to vary only slowly, a small positive value of $h$ should be set, i.e. $0 < h < 0.2$. On the other hand, if the parameters are expected to be subject to significant changes, $h$ should take a value close to 1, i.e. $0.8 < h < 1$. Other methods for defining the smoothing factor include optimization from historical data [55] and a separate online tuning algorithm [56].
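To give a feel for these ranges, the following small sketch evaluates the shrinkage coefficient from (3.20) and the share $h^2$ of the posterior variance that is re-injected as noise in (3.21). The helper name `shrinkage` and the chosen values of `h` are illustrative assumptions.

```python
import math

def shrinkage(h):
    """Shrinkage coefficient a = sqrt(1 - h^2) from (3.20)."""
    if not 0.0 < h < 1.0:
        raise ValueError("smoothing factor h must lie in (0, 1)")
    return math.sqrt(1.0 - h * h)

for h in (0.1, 0.2, 0.8, 0.9):
    a = shrinkage(h)
    print(f"h={h:.1f}  a={a:.3f}  noise share h^2={h * h:.2f}")
```

With $h = 0.1$ the particles are left almost untouched ($a \approx 0.995$) and only 1 % of the variance is regenerated as noise, whereas with $h = 0.9$ the particles are pulled strongly towards the mean ($a \approx 0.436$) and 81 % of the variance is redrawn at each step, which lets the parameter estimates move quickly.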

Kernel smoothing may be regarded as an adaptive technique, governed by the smoothing factor, for mitigating the effect of the increasing variance. In the field of prognostics, Hu et al. [57] applied a kernel-smoothing-based PF to the RUL prediction of lithium-ion batteries. Other adaptive particle filtering approaches have also been proposed to address the variance problem within prognostics. For instance, Daigle and Goebel [46] combined SIR with an online $W_k$ tuning algorithm based on the relative median absolute deviation of the posterior parameter estimates. The algorithm was applied successfully to the RUL prediction of a simulated centrifugal pump.

Algorithm 3 ASIR-KS filter

 …                                                                                ▹ Reference points as the mean
 5: $m^{(i)}_{k-1} \leftarrow a\,\theta^{(i)}_{k-1} + (1 - a)\,\bar{\theta}_{k-1}$          ▹ Kernel locations as in (3.19)
 …                                                                                ▹ Draw new parameters as in (3.21)
15: $x^{(i)}_k \leftarrow f\left(x^{j^{(i)}}_{k-1}, \theta^{j^{(i)}}_k, v_k\right)$         ▹ Predict state by the model
16: $\tilde{w}^{(i)}_k \propto$ …

In addition to the increase of variance, state-augmented particle filtering for simultaneous estimation of states and parameters is associated with challenges in computational performance. Inefficiency is observed especially with complex systems, as the number of particles required for successful estimation generally grows with the dimension of the joint state-parameter space [46]. As a consequence, alternative particle filtering approaches for combined state and parameter estimation have also been investigated. These methods include, for instance, a dual particle filter [41], in which two SIR algorithms run in parallel: one estimates the state and the other the parameters. In a process control application of an ore mill it outperformed the SIR combined with random walk. A similar algorithm was also used in battery state-of-charge estimation with unknown parameters, with successful results in comparison with traditional methods [58]. However, as the joint state-parameter space in this study is quite modest, these approaches are not examined further.
