Sequential Bayesian Filtering in Ocean Acoustics

(1)

Sequential Bayesian Filtering in Ocean Acoustics

Caglar Yardim, Member, IEEE, Zoi-Heleni Michalopoulou, Senior Member, IEEE, and Peter Gerstoft

Abstract— Sequential filtering provides an optimal framework for estimating and updating the unknown parameters of a system as data become available. Despite significant progress in the general theory and implementation, sequential Bayesian filters have been sparsely applied to ocean acoustics. The foundations of sequential Bayesian filtering with emphasis on practical issues are first presented covering both Kalman and particle filter approaches. Filtering becomes a powerful estimation tool, em- ploying prediction from previous estimates and updates stemming from physical and statistical models that relate acoustic measurements to the unknown parameters. Ocean acoustic applications are then discussed focusing on the estimation of environmental parameters evolving in time or space. The potential of particle filtering in ocean acoustics is further demonstrated through application to experimental data from the Shallow Water 2006 experiment.

Index Terms— Sequential Monte Carlo methods, underwater acoustics, acoustic signal processing, acoustic tracking, extended Kalman filter, unscented Kalman filter, ensemble Kalman filter, particle filter, sequential importance resampling.

I. INTRODUCTION

A common feature of inverse problems in ocean acoustics is that underlying physical parameters are estimated from measured acoustic data. Examples include source localization [1]–

[4], geoacoustic inversion [5]–[9], and marine mammal signal processing [10]. In a Bayesian approach, prior knowledge and acoustic models are combined with a likelihood function to provide posterior probability density functions (PDFs) of parameters-of-interest, such as the source location [11], [12].

Geoacoustic inversion was subsequently approached in the same framework, estimating, in addition to source location, ocean environment parameters and their uncertainties [13]–

[15]. Often, those parameters evolve in time or space, with acoustic data arriving on-line at consecutive steps. Information on parameter values and uncertainty at preceding steps can be invaluable for the determination of future estimates but is often ignored. Depending on the data and the problem, several Bayesian approaches can be used to address an inversion or tracking problem. A brief comparison is given in Appendix A.

This paper focuses on Bayesian tracking.

Sequential Bayesian filtering, tying together information on parameter evolution, a physical model relating acoustic field measurements to the unknown quantities, and a statistical

Manuscript received ? ?, 2010. This work was supported by the Office of Naval Research Code 32, under grants N00014-09-1-0313, N00015-05-1- 0264, N00014-05-1-0262, and N00014-10-1-0073.

C. Yardim and P. Gerstoft are with the Marine Physical Laboratory, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093–0238 USA (email: [email protected], [email protected]).

Zoi-Heleni Michalopoulou is with the Department of Mathematical Sci- ences, New Jersey Institute of Technology, Newark, New Jersey 071028 USA (email: [email protected]).

model describing random perturbations in the field observations, offers an optimal framework for the solution of such problems.

When functions relating (i) parameters between consecutive steps and (ii) data and parameters are linear and noise is additive and Gaussian, the Kalman filter (KF) is the optimal estimator in terms of minimizing mean squared error. The KF propagates expectations and covariances of the unknown parameters from step to step, fully characterizing posterior PDFs. Some early examples used in source tracking in the ocean with KFs are given in [16], [17]. Nonlinear functions require variations or generalization of the standard filter.

Implementation in ocean acoustic problems of a straightforward extension, the extended Kalman filter (EKF), that linearizes mildly nonlinear functions, has been pioneered in a series of papers [18]–[24]. More recently, unscented Kalman filters (UKFs) have achieved better RMS error and divergence performance than EKFs, operating by selecting deterministic points to represent parameter PDFs [25], [26].

Highly nonlinear models and complex noise processes require numerical methods for the computation of posterior PDFs and, eventually, point estimates and uncertainty. These approaches are termed particle filters (PFs) and have been encountered in ocean acoustic applications in [26]–[28].

In this paper we briefly present the foundations of sequential filters, starting from the well-known KF and its variants and proceeding with PFs, that formulate sequential estimation under the most general physical models relating unknown parameters and measurements, which are embedded in noise environments of a diverse nature. We examine and compare filters and present examples, illustrating practical challenges and solutions. A more detailed presentation of these methods, including theoretical derivations, can be found in many excel- lent signal processing papers [29]–[32] and texts on sequential Monte Carlo methods [33]–[35].

Once basic principles and methods of sequential filtering are discussed, the focus of the paper is shifted to an overview of sequential filtering implemented for parameter estimation of dynamical systems in ocean acoustics. We present applications in target tracking, wave estimation, geoacoustic inversion, and frequency tracking.

After presenting a general state-space formulation for the estimation of evolving parameters in dynamical systems and clarifying the notion of posterior PDF estimation, the paper is organized as follows. KFs and their extensions and basic PFs are presented in Sections II and III, respectively. Advanced PFs are briefly mentioned in Section III and are discussed more extensively in Appendix B. Practical challenges arising in sequential Bayesian filtering are discussed in Section IV and ocean acoustic applications with special attention to Shallow Water 2006 (SW06) experimental results are discussed in Sec-

(2)

tion V. Although Section V is dedicated to solving estimation problems in the ocean, specific issues in ocean acoustics as they relate to filtering are discussed throughout the paper.

A. State-space model

In dynamical systems, a major goal is to estimate parameters that evolve sequentially with time or space. As data become available, the unknown parameters forming a state vector, are estimated sequentially using collective data history and prior knowledge on evolution of the state. Tracking in time or space presents us with problems that can be readily formalized in the following framework.

Let y_k be the observation vector (for example, pressure along a hydrophone array in ocean acoustics) at step k and xk

represent the state vector (for example, location coordinates of a source emitting signal yk), where k = 1, . . . , K. The size of the state vector is nx. Our goal is to sequentially estimate xk as data observations yk become available.

Two equations define a state-space model:

xk = fk(xk−1, vk) (1) y_k = h_k(x_k, w_k) (2) The state equation, Eq. (1), describes the evolution or transition of xk with k and assumes that states follow a first order Markov process. Function fkis a known function relating the state vector at step k to that at step k − 1. Variable vk is the process or state noise and has a known PDF p(vk).

The observation equation, Eq. (2), relates measurements yk

to state vector x_k through a usually known function h_k. Vari- able w_kis the measurement noise with a PDF p(w_k). Noise vk

and wk can be additive, multiplicative, or incorporated in the state and measurement through complex functions of fk and hk, respectively. In addition to the state vector xk and data yk, the functions fkand hk, and noise components vk and wk

can also change with k. In most ocean acoustic applications, the noise terms are additive and can be taken out of fk and hk, where hk represents some form of a forward or acoustic propagation model.

A sketch of sequential localization of a moving target as described by Eqs. (1) and (2) is shown in Fig. 1 with circles representing target location at step k. Term xk−1|k−1 is the estimate of the target location at k − 1 using data obtained up to and including step k − 1. By exploiting knowledge of target position xk−1|k−1, a filter first predicts location xk|k−1, the target location at step k. The prediction is then updated when new data yk become available.

Examining the problem from a Bayesian standpoint, we are interested in deriving the posterior PDF for xk rather than simple point estimates. The initial PDF, p(x0), of the state vector, is assumed to be known; it can be assumed in a common form such as uniform or Gaussian, based on ground- truth information, or estimated using numerical techniques such as a Markov chain Monte Carlo (MCMC) sampler on the initial data [14], [36]. Let Yk= [y1, y2, . . . , yk] be the set of data observed at the first k steps and Xk = [x1, x2, . . . , xk] be the sequence of unknown state vectors. It is desirable to estimate the state vector and its uncertainty at all steps via

Fig. 1. Sequential Bayesian filtering. From state xk−1, state xk is first predicted via the state equation [Eq. (1)], providing xk|k−1. As data yk

becomes available, observation equation [Eq. (2)] is employed to update state xk, providing x_k|k.

PDF p(Xk|Yk). Due to the computational cost, the problem is often simplified by recursively estimating the marginal PDF p(x_k|Yk) from p(xk−1|Yk−1), rather than estimating joint PDF p(Xk|Yk).

With p(xk−1|Yk−1) available, we can predict p(xk|Yk−1) through the transition PDF p(xk|xk−1). The transition density is determined by the state equation [Eq. (1)] and noise PDF p(w_k). Due to the first order Markov chain assumption of xk, p(xk|xk−1) does not depend on data Yk−1. Density p(xk|Yk−1) can be written as:

p(x_k|Yk−1) = Z

p(x_k|xk−1, Y_k−1)p(x_k−1|Yk−1)dx_k−1

= Z

p(x_k|x_k−1)p(x_k−1|Y_k−1)dx_k−1. (3) When a new observation yk becomes available, we can update the state xk using the likelihood of the state vector l(xk) = p(yk|xk) and the prediction PDF p(xk|Yk−1) of Eq. (3). The new estimate of the state vector at k is calculated via Bayes theorem:

p(x_k|Y_k) =p(y_k|x_k)p(x_k|Y_k−1)

p(yk|Yk−1) . (4) The denominator in Eq. (4) is the normalizing constant of p(xk|Yk), termed the evidence, and may be expressed as:

p(y_k|Y_k−1) = Z

p(y_k|x_k)p(x_k|Y_k−1)dx_k. (5) Density p(xk|Yk) contains all information necessary for the computation of marginal PDFs of elements of xk, moments, and modes. Optimal sequential estimators [37], such as the minimum mean squared error (MMSE) estimator, which cal- culates the expected value (E[xk|Yk]) of the state after data have been observed, and the maximum a posteriori (MAP) estimator that provides the likely value of the state also after data observation, can be readily implemented.

II. KALMAN FILTERS

The discussion of Section I-A raises the question of how to track optimally and efficiently an evolving density p(xk|Yk).

Kalman [38] first formalized the answer. He showed that, if this evolving PDF can be represented by a Gaussian distribution, the optimum Bayesian estimator (one that attains the Bayesian Cram´er Rao lower bound (CRLB) [39] with

(3)

TABLE I KALMAN FILTER

♦Predict:

xb_k|k−1 = F_kbx_k−1|k−1

P_k|k−1 = FkP_k−1|k−1F^T_k+ Qk

♦Update:

bx_k|k =bx_k|k−1+ K_k`y_k− H_kxb_k|k−1´ P_k|k = (I − KkHk) P_k|k−1

Kalman gain:

Kk = P_k|k−1H^T_k`R_k+ HkP_k|k−1HkT´−1

the MMSE) exists and can be computed analytically. This optimum sequential estimator has been termed the Kalman filter (KF). Having a Gaussian posterior at each step implies that:

1) State and measurement noise must be additive. Noise variables vk and wk, and the prior p(x0) must be Gaussian.

2) fk and hk are known linear functions of the state and measurement vectors.

This reduces the system given in Eqs. (1–2) to:

xk = Fkxk−1+ vk (6)

y_k = H_kx_k+ w_k, (7)

where Fk and Hk replace fk and hk as matrices representing known linear functions of the state vector xk. State and measurement noise covariances Q_k and R_k are:

E{v_kv^T_i} = Q_kδ_ki

E{w_kw^T_i} = Rkδ_ki (8) E{v_kw^T_i} = 0 ∀ i, k.

Although the requirements given above are fairly restrictive, the resulting filter is straightforward to implement. Because a Gaussian PDF is uniquely and completely defined by its first two moments, mean and covariance, the KF needs only to recompute mean bx and covariance P at step k, using bx_k−1, P_k−1, and the new measurement yk.

This is a two-step procedure, repeated at each k:

Predict:This stage predicts the current value of the state given its previous value using Eq. (6). This is represented bybx_k|k−1 and P_k|k−1.

Update:This stage updates or corrects the values predicted in the previous stage given the information obtained from measurement yk. This provides the posterior at k represented byxb_k|k and P_k|k.

The resulting set of equations for the linear, Gaussian KF is given in Table I where Kk represents the Kalman gain.

A. Extended Kalman filter

Although elegant and easy to implement, the KF has strict linearity and Gaussian PDF requirements that make it unsuit- able for a large class of systems that cannot be characterized

TABLE II EXTENDEDKALMANFILTER

♦Predict:

bx_k|k−1 = fk

`

bx_k−1|k−1´ Pk|k−1 = FkPk−1|k−1F^T_k + Qk

♦Update:

bx_k|k =bx_k|k−1+ Kk`y_k− hk

` bx_k|k−1´´

P_k|k = (I − KkHk) P_k|k−1 Kalman gain:

K_k = P_k|k−1H^T_k`R_k+ H_kP_k|k−1H^T_k´−1

Jacobians:

F_k = _∂x^∂f^k

k−1

˛

˛x_k−1=bx_k−1|k−1

Hk = ^∂h_∂x^k

k

˛

˛x_k=bx_k|k−1

by Eqs. (6–7). An obvious way to extend the KF framework is to linearize functions fk and hk in Eqs. (1–2) around the current state value. When coupled with a Gaussian prior p(x0) and Gaussian noise terms, the posterior p(xk|Yk) will remain Gaussian. Thus, only mean and covariance need to be tracked via slightly modified KF equations.

The new filter, the extended Kalman filter [37], [40] (EKF), linearizes locally the state and measurement equations using the first terms in the Taylor series expansions. For the EKF to perform well, nonlinearities should be small and the underlying densities should be close to Gaussian. However, even under these circumstances, because of the implied approximations, the EKF cannot claim the optimality enjoyed by the KF for linear-Gaussian systems. For large nonlinearities, the mean and covariances will not be mapped correctly as shown in the example of Sec. II-B. Nevertheless, the EKF has been implemented successfully in numerous applications in areas such as radar and sonar target tracking among others. The EKF algorithm is summarized in Table II.

B. Unscented Kalman filter

The simplicity of the Kalman framework arises from its Gaussian nature. The filter only needs to carry the mean and covariance to the next step. A nonlinear system will disrupt the flow of evolving Gaussian PDFs, because a Gaussian random variable passing through a nonlinear transformation will lose its Gaussian form and solely mean and covariance will not be adequate in fully defining this density. This realization leads to the development of KFs that use sigma point transforms also known as unscented transforms (UT) [41]. The UT enables the propagation of the mean and covariance through nonlinear functions. KFs that make use of the UT are, hence, called unscented Kalman filters (UKF) [42].

While the EKF enforces linearity through analytic linearization, the UKF enforces Gaussianity while keeping the functional nonlinearity. This means the filter assumes that any nonlinear transformation of a Gaussian input PDF will always

(4)

be another Gaussian PDF. Since a Gaussian input results in a Gaussian output with this assumption, the system is effectively linearized in a statistical rather than analytical sense. This still enables the filter to carry all necessary information by propagating only the mean and covariance as required by the KF.

The UKF represents densities using only a few predeter- mined particles (samples) called sigma points. The UT chooses these deterministically to describe accurately the mean and covariance of a PDF. As the random variable undergoes a nonlinear transformation, these points are propagated through this nonlinear function and used to reconstruct the new mean and covariance using the UT weights [41]. Hence, unlike with the EKF, mean and covariance to at least second order of the nonlinearity (third if the initial PDF is Gaussian) can be computed accurately.

The UKF uses a recursive formulation where 2nx+ 1 sigma pointsXⁱ ²ⁿx

i=0 and their corresponding weights Wⁱ are generated and used with the unscented transform (UT) to perform the mean (bx_k) and covariance (P_k) calculations required in the Kalman framework. The UT weights are computed using three independent parameters: prior knowledge parameter β, scaling parameter κ, and parameter α, which is used to control the spread of the sigma points around the mean. The second scaling parameter λ can be computed from these parameters:

λ = α²(nx+κ)−nx. The algorithm is presented in Table III.

Details about the derivation can be found in [41].

Although UKFs are derivative-free, an improvement over EKFs, and efficient relative to more advanced techniques that will be discussed later, they have two disadvantages as demonstrated in the example below.

1) Example: EKF vs. UKF vs. PF: Assume that a Gaussian random variable x with p(x) ∼ N (µx, Σx) goes through a nonlinear transformation y = h(x) with:

µx=0.25 10

, Σx=0.1 0.2 0.2 0.5

, y =x²(1) − x(1)x(2) x(2) cos x(1)

. A cloud of 10,000 particles drawn from p(x) is plotted in Fig. 2 together with the mean and 3σ covariance ellipse. When these particles are propagated through the nonlinear equation, the distribution becomes non-Gaussian as shown.

The analytical linearization in EKF uses the Jacobian:

H = ^∂h(x)_∂x = 2x(1) − x(2) −x(1)

−x(2) sin x(1) cos x(1)

. This will result in a Gaussian with a mean and covariance of µb_y = h(µ_x) and bΣy= HΣxH^Trepresented by × and a dashed ellipsoid in Fig. 2. Both mean and covariance have high errors in the estimates.

Mean and covariance are also obtained by using the sigma point generation and a successive UT given in Table III. The values obtained by UT are very close to their true values with almost identical ellipses. Fig. 2 highlights two main drawbacks of KFs in nonlinear, non-Gaussian problems:

The first is the effect of nonlinearity on the accurate estimation of mean and covariance. This example is ‘too nonlinear’

for the EKF but not for the UKF for obtaining good mean and covariance estimates. However, there are many cases where the nonlinearity may be severe and may require an even higher

TABLE III UNSCENTEDKALMANFILTER

♦Predict:

xb_k|k−1 =P2nx

i=0W_mⁱX_k|k−1ⁱ P_k|k−1 = Qk

+P2n_x i=0W_covⁱ

h

X_k|k−1ⁱ −xbk|k−1

i h

X_k|k−1ⁱ −bxk|k−1

iT

♦Update:

xb_k|k =bx_k|k−1+ Kk`y_k−by_k|k−1´ Pk|k = Pk|k−1− K_k(Pyy+ Rk) K^T_k Sigma point generation:¹

X_k−1⁰ =bxk−1|k−1

W_m⁰ = _n^λ

x+λ, W_cov⁰ = W_m⁰ + β + 1 − α² X_k−1ⁱ =bx_k−1|k−1+“q

(nx+ κ)P_k−1|k−1”

i

W_mⁱ = W_covⁱ =_n^0.5

x+λ i = 1, . . . , nx

X_k−1ⁱ =bx_k−1|k−1−“q

(nx+ κ)P_k−1|k−1

”

i

W_mⁱ = W_covⁱ =_n^0.5

x+λ i = nx+ 1, . . . , 2nx

Unscented transform:

X_k|k−1ⁱ = f_k“ X_k−1ⁱ ” Y_k|k−1ⁱ = h“

X_k|k−1ⁱ ”

, yb_k|k−1=P2nx

i=0W_mⁱY_k|k−1ⁱ Pxy =P2n_x

i=0W_covⁱ h

Xⁱ

k|k−1−xbk|k−1

i h Yⁱ

k|k−1−ybk|k−1

iT

Pyy =P2nx i=0W_covⁱ h

Y_k|k−1ⁱ −by_k|k−1i h

Y_k|k−1ⁱ −by_k|k−1iT

Kalman gain:

Kk = Pxy(Pyy+ Rk)⁻¹

order accuracy than the UKF can provide to correctly capture mean and covariance.

The other is that the densities may be too non-Gaussian to be represented using the first two moments, even if they can be calculated correctly. The nonlinear PDF p(y) cannot be sufficiently characterized with mean and covariance alone in this example. In such cases, the only way to accurately propagate complex, non-Gaussian PDFs is to store the cloud of particles that represent the PDF as shown in Fig. 2. This will form the basic philosophy of particle filtering.

The UKF is only the first and most widely used sigma-point KF. It is possible to select different sigma points and a different UT to match more than the first two moments. Moreover, less complicated and more stable versions of UKF with reduced computational cost have also been proposed. [43]

C. Ensemble Kalman filter

The Ensemble KF (EnKF) [44] is a Kalman variant for systems where the state is composed of a large number of variables. This type of problem is encountered frequently in geophysical and hydrological fields, ocean and atmospheric

(5)

−1 −0.5 0 0.5 1 1.5 7.5

8 8.5 9 9.5 10 10.5 11 11.5 12 12.5

x₁

x2

−155 −10 −5 0 5 10

6 7 8 9 10 11 12

y₁

y 2

Fig. 2. Mean and covariance propagation of the EKF and UKF in a nonlinear, non-Gaussian system with y = h(x). True mean (◦), means from analytical (×) and statistical (+) linearizations are given with covariance estimates in terms of 3σ ellipses for the EKF (- - -), UKF (—), and the true covariance (—).

modeling, and data analysis [45]. As discussed in [46] and Sec.

IV-E, most PFs are not appropriate for very large dimension problems.

The EnKF is a KF that includes MC sampling. It is possible to argue that the EnKF is a hybrid between a KF and a PF (see Section III for details on PFs). The EnKF assumes that the large number of state variables and all the PDFs are Gaussian, hence the problem is based on a Kalman framework. However, working with a large covariance matrix in a high-dimensional system is computationally inefficient for the classical KF. This drawback is addressed by using an ensemble of points, similar to the particle set in a PF, to replace the covariance matrix with the ensemble covariance matrix using MC sampling. Detailed theoretical formulation and practical implementation can be found in [47].

III. PARTICLE FILTERS

Often the dynamics of a problem cannot be adequately captured by the Kalman family of filters, because of the nonlinear structure of state and measurement equations and the nature of the noise. When KFs and their extensions fail, the estimation of x_k via filtering posterior PDFs p(x_k|Y_k) may be possible using numerical sequential MC techniques, commonly referred to as PFs. It is important to emphasize that the PFs should not be viewed as superior to the KFs but more as filters complementary to the KFs. All variants of Bayesian filters have their place and perform better in terms of RMS error and divergence performance than the other filter types for particular sets of problems.

PFs track the posterior PDF p(xk|Yk) using a cloud of particles {xⁱ_k}^N_i=1^p = {x¹_k, . . . , x^N_k^p} that evolve with step k.

All other estimates we are interested in can be calculated from p(xk|Yk), including bx^{M M SE}_k andbx^{M AP}_k solutions, the variance in the estimates, and the marginal posterior PDF

p(xk(j)|Yk) of the jth state variable:

p(xk(j)|Yk) = Z

δ(x⁰_k(j) − xk(j))p(x⁰_k|Yk)dx⁰_k. (9)

A. Importance Sampling

Before presenting the details of the PF, it is important to summarize how importance sampling (IS) and MC integration work. IS is a method to compute expectations with respect to one distribution using random samples drawn from another.

Assume that we want to compute an integral I =R f (x)dx.

One way of computing I is using samplers [48] after assuming x is a random variable, defining f (x) = g(x)p(x) and rewriting I in the form of an expectation:

I ≡ E_p{g(x)} = Z

X

g(x)p(x)dx , (10)

where g(x) is some function of x with PDF p(x). This integral can be computed numerically via MC integration [48], by drawing Npindependent and identically distributed x samples from the sampling density (or proposal density) p(x):

{xⁱ}^N_i=1^p ∼ p(x) −→ I ≈ 1 Np

N_p

X

i=1

g(xⁱ) (11)

Selection of an appropriate sampling PDF is the critical part in IS. It is customary to use a PDF easy to sample from, such as a Gaussian or uniform PDF. However in Bayesian tracking and inversion, the integral is already in the form of Eq. (10) where p(x) is a complex PDF, e.g. Eq. (9). It is still possible to use a simple density q(x) as the sampling density by rewriting

(6)

Eq. (10) and using MC integration to get:

I = Z

X

g(x)p(x) q(x)

q(x)dx ≡ Eq

g(x)p(x) q(x)

{xⁱ}^N_i=1^p ∼ q(x) (12)

I ≈ 1 Np

N_p

X

i=1

g(xⁱ)p(xⁱ)

q(xⁱ) (13)

≈ 1 N_p

Np

X

i=1

wⁱg(xⁱ) with wⁱ=p(xⁱ)

q(xⁱ) , (14) where wⁱ are the importance weights. The variance is [48]

Varq = Eq

( p(x) q(x)

2

− I² )

/Np . (15)

Thus, the variance in the estimate is minimum if q(x) is proportional to p(x) and increases as q(x) deviates from p(x).

Using IS requires selection of a balanced q(x): As easy to sample from as possible without sacrificing the accuracy of the method.

B. Sequential Importance Sampling

Bayesian filtering requires performing successive IS runs at each k, where the output of each IS run is used as the prior for the next one. This process is referred to as sequential importance sampling (SIS). Recalling that Yk = [y1, y2, . . . , yk] and Xk = [x1, x2, . . . , xk] from Section I-A, it is possible to obtain the filtering PDF p(xk|Yk) from the full posterior PDF p(Xk|Yk):

p(x_k|Yk) = Z

p(X_k|Yk)dx₁dx₂. . . dx_k−1 , (16)

= Z

δ(x_k− x_k⁰)p(X_k⁰|Y_k⁰)dX_k⁰ , (17) which is in the same form as Eq. (10). Assuming a sampling density q(Xk|Yk) and performing IS, we obtain

p(xk|Yk) ≈

Np

X

i=1

wⁱ_kδ(xk− xⁱ_k) , (18)

wⁱ_k∝ p(Xⁱ_k|Yk)

q(Xⁱ_k|Y_k) . (19) However, to perform IS sequentially at step k, we utilize the IS results of the previous step k − 1, that is, we use the cloud of particles and associated weights {xⁱ_k−1, wⁱ_k−1}^N_i=1^p. This is done by selecting q(Xk|Yk):

q(X_k|Yk) = q(x_k|xk−1, Y_k)q(X_k−1|Yk−1) . (20) Expanding the full posterior PDF (result is only shown here, for additional details see [34])

p(Xk|Yk) =p(yk|xk)p(xk|xk−1)

p(yk|Yk−1) p(Xk−1|Yk−1) , (21) where the weight of the ith particle at step k is computed by inserting Eqs. (20–21) into Eq. (19) and ignoring the constant

term p(yk|Y_k−1):

wⁱ_k∝ p(yk|xⁱ_k)p(xⁱ_k|xⁱ_k−1)

q(xⁱ_k|xⁱ_k−1, Y_k) ×p(Xⁱ_k−1|Yk−1)

q(Xⁱ_k−1|Yk−1) (22)

∝ p(yk|xⁱ_k)p(xⁱ_k|xⁱ_k−1)

q(xⁱ_k|xⁱ_k−1, Yk) wⁱ_k−1 . (23) A key issue in PF design is choosing a good proposal distribution. A simple choice is q(xk|xk−1, Yk) = p(xk|xk−1), reducing Eq. (23) to

wⁱ_k= p(yk|xⁱ_k)wⁱ_k−1 . (24) It is important to realize that while we have used the full posterior PDF p(Xk|Yk) in deriving the IS Eqs. (16–24), only the likelihood p(yk|xk) at step k is used in updating the weights wⁱ_k. The implementation of PF is only concerned with the marginal PDF p(xk|Yk), as seen in Table IV.

C. Sequential Importance Resampling

Although SIS provides a complete framework for performing sequential Bayesian estimation, practical implementation quickly runs into a problem called sampling degeneracy. After a few iterations of successive SIS, the process leads to a cloud containing only few particles with large weights and numerous particles with negligible weights. In the extreme case, there will be only one particle left with a large w_kⁱ. This loss of sample diversity will result in poor filter performance.

To counter degeneracy, a second sampling stage is proposed right after the update stage [49]. The purpose of this resampling stage [50] is to create more high-weight particles from the original set of particles {xⁱ_k}^N_i=1^p. This is achieved either by setting a minimum effective number of particles N_p^{ef f} needed to maintain diversity and performing resampling every time N_p^{ef f} drops below this threshold, or, simply, by performing resampling at the end of each step k. This filter is called sequential importance resampling (SIR) [49], [51], [52] and is by far the most popular PF implementation.

The resampling process takes the posterior filtering PDF p(xk|Yk) represented by the particle set {xⁱ_k, wⁱ_k}^N_i=1^p of SIS at each step and redistributes samples so that all weights of the new particles are the same, that is, {x^j_k, w_k^j = 1/Np}^N_j=1^p . This results in a larger number of particles in the high likelihood regions, preventing degeneracy. It is best to examine the degeneracy and resampling via an example.

1) Example: Need for resampling: The example given in Figs. 3–4 is an application of a SIR PF tracking a target at step k = 10. The filter runs with Np= 200 particles on a state vector xkcontaining the location of the target. Target location can take values between 0 and 35 with the true value of x₁₀= 20. The equation relating data to state is: yk(i) = δ(i − xk) + wk(i), where wk(i) is zero mean white Gaussian noise, i = 1, . . . , 100. The state transition equation is selected as: xk = xk−1+ vk, where vk is also zero mean white Gaussian noise.

Figs. 3(a) and (b) illustrate the particle values and their corresponding weights at the end of the update stage (◦) before resampling. Only four particles have significant weights and provide substantial information on the posterior PDF. Hence, the goal of resampling is to create more particles at around

(7)

Fig. 3. The need for resampling: (a) Particles before (◦) and after (∗) resampling, (b) particles before resampling and their weights {xⁱ_k, w_kⁱ}^N_i=1^p for k = 10 .

Fig. 4. (a) Resampling via the CDF, (b) histogram of particles after resampling showing large number of newly created particles at high likelihood regions.

x10 = 20, so that SIS at step k + 1 will not be adversely affected from degeneracy.

Fig. 4 demonstrates the implementation of the resampling stage. The new particles generated after resampling are plotted (∗) together with the original set of particles in Fig. 3(a). Re- sampling creates a new discrete distribution by first computing the cumulative distribution function (CDF) of p(xk|Y_k) from {xⁱ_k, wⁱ_k}^N_i=1^p, as shown in Fig. 4(a). Using a sample-drawing technique by randomly picking samples from a [0,1] uniform distribution, new particles are generated via the inverse CDF.

Regions corresponding to large weights are replicated multiple times in the new cloud. On the other hand, particles with small weights are unlikely to be selected. The histogram of the resampled particles in Fig. 4(b) provides the new estimate of p(xk|Yk). Note that almost 90% of the new particles are in the high likelihood region with a minimal number of particles

“wasted” in areas with low probability.

2) SIR algorithm: We can now merge the SIS and resampling to create the SIR algorithm outlined below:

Predict: This stage is a SIS prediction that starts with the cloud of equal weigh particles from the previous step {xⁱ_k−1}^N_i=1^p, and creates a new set of predictions for the current step {xⁱ_k|k−1}^N_i=1^p by sampling from the transitional density p(xk|xk−1). This is done by propagating each xⁱ_k−1 through the state equation Eq. (1) together with a random realization from vk. This step corresponds to the implementation of Eq (3).

Update: Note that w_k−1ⁱ = 1/Np for all i because of resampling at step k − 1. Having measured yk, the weight of each particle is re-evaluated and normalized via Eq. (24):

wⁱ_k= p(y_k|xⁱ_k|k−1)

Np

X

i=1

p(yk|xⁱ_k|k−1)

, (25)

where p(yk|xⁱ_k|k−1) is the likelihood function defined by the observation equation [Eq. (2)], which includes statistical behavior of errors in the data. The weights of Eq. (25) are used in Eq. (18) for expressing the posterior PDF p(x_k|Y_k).

Resample: New particles {x^j_k|k, w_k^j = 1/Np}^N_j=1^p are drawn from a discrete approximation to the p(xk|Yk) obtained at the update stage. This is achieved through the calculation of the CDF of xk as explained above. Different types of resampling are presented in [32], [53]. All particle weights are now equal to 1/Np.

A single iteration of a ten-particle SIR algorithm is illus- trated in Fig. 5. Each particle is represented by a cylinder with height proportional to the likelihood. As usual, the first stage is the prediction stage where new particles are created from the particles representing the PDF of the previous step p(xk−1|Yk−1), the state noise at step k, and Eq. (1). The likelihood of each of these new particles is calculated using new data yk. This update stage is followed by the resampling stage where a new set of particles is formed from the previous one. The larger the weight of a particle, the more new particles it produces during resampling.

Resampling sometimes adds significantly to the computational requirements of a PF. In this case, quasi-random resampling methods can be used to decrease computational cost as an alternative to the method explained in Fig. 4.

The most commonly used efficient resampling algorithm is systematic resampling [33]. In this approach, only one random sample is picked from a [0,1/Np] uniform distribution and the CDF is sliced with Np equidistant points with 1/Np spacing starting from the random sample. Original particles are then converted to new particles using the inverse CDF as shown on the right in Fig. 5. Cylinders representing each particle are stacked to obtain the CDF. The new particles are selected by randomly choosing x¹and equally sampling upwards. Further discussion about resampling algorithms is given in Section IV- F.

To implement the filtering process for k = 1, an initial probability distribution for state vector x0 has to be selected.

(8)

Fig. 5. Particle flow for a SIR algorithm with Np= 10 particles.

TABLE IV SIR PARTICLE FILTER

♦Predict:

{xⁱ

k|k−1}^N_i=1^p ∼ p(x_k|x_k−1) given {xⁱ_k−1|k−1}^N_i=1^p xⁱ_k|k−1 = fk

“

xⁱ_k−1|k−1, vⁱ_k”

i = 1, . . . , Np

♦Update:

wⁱ_k = ^p(y^k^|x

i k|k−1) N_p

X

i=1

p(y_k|xⁱ_k|k−1)

p(x_k|Y_k) ≈

N_p

P

i=1

wⁱ_kδ(x_k− xⁱ_k|k−1)

♦Resample:

{xⁱ_k|k−1, wⁱ_k}^N_i=1^p 7−→ {x^j_k|k,_N¹

p}^N_j=1^p s.t.

p(x_k|Y_k) ≈ _N¹

p N_p

P

j=1

δ(x_k− x^j_k|k)

Prior p(x0) can be obtained via a sampler or a uniform or Gaussian PDF can be used. The latter requires more steps before stable estimates are obtained. One can sample from p(x0) to generate the initial cloud {xⁱ₀}^N_i=1^p. Particles in this cloud will be propagated through the predict, update, resample process for the estimation of the unknown parameters at the subsequent step.

The process at step k is summarized in Table IV.

D. Advanced Particle Filters

Although SIR provides robust tracking performance in many nonlinear/non-Gaussian tracking problems, it has two basic flaws:

1) Sample impoverishment: Since the resampling stage creates many exact replicas of high likelihood particles, it is possible to lose sample diversity as the filter progresses.

2) Sensitivity to outliers: The popularity of the SIR filter lies in the simplicity of the selection of transition prior p(xk|xk−1) as the sampling density. This is, at the same time, a weakness, because the prediction at step k does not use the newly available information in measurement yk, performing a ‘blind’ prediction. The filter becomes vulnerable to outliers, especially when the likelihood is peaked relative to the prior [34].

To address these problems, advanced PF variants have been developed. [30], [34], [54] The most common ones are briefly discussed in Appendix B in terms of the specific concern they address, their main advantages, and possible drawbacks.

Comparative information is provided in Table V, summarizing salient features and differences.

E. Model order selection

A state-space system sometimes needs to be described using more than one model. Moreover, it may not be possible to know a priori which model is the most appropriate for a specific problem. Often, the dimension of the state vector itself is not known, because either the model or model order is not known. In ocean acoustics, this problem may be manifested in:

(9)

TABLE V

FILTERCOMPARISON. L: LINEARG: GAUSSIANNL:NONLINEARNG:NON-GAUSSIAN.

Filter Sect Problem PDF Effort Reference Details

KF Sec II L/G µ and Σ Min. Kalman [38] Optimal filter when problem is L/G.

EKF Sec II-A mild µ and Σ Low Cox [40] Analytical linearization by using the Jacobians.

NL/NG Kay [37] Reduces the NL/NG problem to a L/G one.

UKF II-B moderate µ and Σ Low Julier [41] Statistical linearization: Unscented transform propagates µ and Σ.

NL/NG sigma pts Wan [42] Derivative free, reduces the NL/NG problem to a NL/G one.

EnKF Sec II-C moderate µ and Σ Med- Evensen [44] Ensemble statistics are used to propagates µ and Σ.

NL/NG ensemble High Reduces the NL/NG problem to a L/G one.

SIS Sec III-B NL/NG particles High Doucet [33] Degeneracy problem.

SIR Sec III-C NL/NG particles High Gordon [49] Resampling avoids degeneracy but results in sample impoverishment.

Blind prediction stage, potentially sensitive to outliers and inefficient.

RPF App B-A NL/NG particles High Musso [55] Impoverishment solved using kernel funcs., problems in larger dim.

The kernel approx. violates the asymptotic convergence to posterior.

MCMCPF App B-B NL/NG particles High Gilks [52] Impoverishment solved using a MCMC kernel.

A MCMC sampler is inserted in the resampling stage.

APF App B-C NL/NG particles High Pitt [56] Blind prediction and degeneracy issues solved using an aux. variable.

Resampling in prediction stage, problems with large state noise.

RBPF App B-D NL/NG particles High Liu [57] Effective when some parameters are conditionally L/G.

Doucet [51] L/G part of the state is solved using a KF, NL/NG part using a PF.

LPF App B-E NL/NG particles High Doucet [51] Each particle is treated as a KF, creating a bank of EKF/UKFs.

Merwe [58] Implementation using UKFs is called unscented particle filter.

• Tracking an unknown number of targets [59].

• Tracking targets with different motion models such as slow moving and fast moving targets, targets with large accelerations, maneuvering targets.

• Tracking a changing number of unknown vertical modes of the acoustic field, the number of which changes because of range dependence. Tracing an unknown number of modes in time-frequency representations.

• Switching between ray tracing, normal mode, adiabatic normal mode, near-field normal mode, narrow-angle parabolic equation, and wide-angle parabolic equation models that relate data to state parameters, depending on factors such as distance and environmental complexity between source and receiver.

• Geoacoustic tracking of sediments and their varying properties with an unknown number of layers. For static environmental inversions, ocean acoustic hypothesis test- ing [60] and evidence calculation [15], [61] have been suggested for addressing this problem.

Successful tracking under such circumstances requires sequential filtering algorithms that are flexible with the ability to switch between models [55], [62]: The state space is augmented with model order mk, becoming [x^T_k mk]^T, where parameter mk takes discrete values corresponding to model order or identifier, the former for an unknown model order and the latter when the model relating data and state is unknown. The transition equation relating xk to xk−1 also relates model parameter of consecutive states with a certain transition probability: particles contain samples both for xk

and mk.

At each step, mⁱ_k, i = 1, . . . , Np, are updated using values mⁱ_k−1 and the model order transition density. Once model parameter mⁱ_k within the ith particle is determined, a “model- conditioned” SIR is run, where the particles, after updates imposed by the state equation, are propagated through the likelihood, relating measurements and state [34]. Because of the unknown and varying mk, there are as many state

and observation equations as possible model orders. Since the state model for each model has a different number of parameters, the length of the state vector may be variable with state. Varying model algorithms are often referred to as jump Markov or switching filters due to their ability to jump or switch between different models.

1) Example: Multiple model target tracking: The example of Section III-C is expanded to include a second target that appears at step k = 11. Both transition and observation equations include white Gaussian noise, vk and wk, respectively:

xk= xk−1+ vk, (26)

and

yk(i) =

m_k

X

l=1

δ(i − xk(l)) + wk(i), (27) where k = 1, . . . , 30, and i = 1, . . . , 100. Model order mk

determines how many targets are present. We assume that xk(1) and xk(2) represent locations of two targets at step k.

The target locations are functions of step k: xk(1) = 2k and xk(2) = 3k. The transition matrix for model order is constant for all states and is given by:

P =

0.9 0.1 0.1 0.9

, (28)

where it is assumed that the probability of a sudden appearance of a second target or disappearance of an existing target is only 10%.

True state vector values are shown in Fig. 6(a) (solid lines).

To estimate state parameters, particles for x1(1) and x1(2) at k = 1 are drawn from a uniform distribution between 1 and 100. Model order for each particle is initialized via a uniform prior allowing discrete values 1 and 2. Particles are propagated through the state equation, are updated via the observation equation and resampled, and are then perturbed to predict the following step. Circles in Fig. 6(a) illustrate the modes of the posterior PDFs for state variables at k = 1, . . . , 30. The first target is tracked accurately with small deviations from true

(10)

Fig. 6. (a) Target trajectories with a varying model order: true (solid lines) and estimated with a PF (◦). Posterior PDF of model order mkfor (b) k = 10, (c) k = 11, (d) k = 12, and (e) k = 13.

values because of noise wk. The second target is missed at k = 11 and 12, and is subsequently detected and correctly localized.

Figs. 6(b–e) demonstrate the posterior PDFs of model order mk at steps (a) 10, (b) 11, (c) 12, (d) 13. At k = 10, only one target is present and the model order PDF clearly supports that. A second target enters at k = 11 but the PDF of the order remains concentrated at 1 and the second target is missed. At k = 12 the system still favors one target, however, significant probability appears at m12= 2. The data contain contributions from both targets and further update the PDF of the model order. At k = 13, the presence of two target is favored: m₁₃= 2. The system learns the presence of the second target slowly because of the low probability of transition between orders and the posterior PDF from k − 1. Changing P alters filter behavior. A P with a larger transition probability would be more sensitive to sudden changes, capturing the change at k = 11 but the location estimates would have larger RMS errors since more particles would be allowed to “jump”.

IV. PRACTICAL ISSUES

Sequential MC techniques provide a powerful framework for performing signal processing in non-stationary dynamic systems involving nonlinear equations and non-Gaussian PDFs. Implementation of such methods requires a careful selection of a suitable and computationally efficient filter, noise in state and measurement equations, and the number of particles when a particle filtering approach is the best choice.

Some of the practical issues arising in state-space estimation are discussed below.

A. Expressing the problem in a dynamic state space format Establishing the observation equation Eq. (2) that relates data yk to the state vector xk and the state equation Eq. (1) that expresses the dynamic evolution of xk is the first step in the implementation of a Bayesian filter. In ocean acoustics, yk

is often the acoustic field received at an array of hydrophones.

Some examples of states x_k in ocean acoustics are source bearing, location, velocity and acceleration, ocean bathymetry and sound speed profile, geoacoustic parameters, parameters related to acoustic modes and dispersion, frequency, and Doppler shift.

B. Likelihood function

In addition to the parametric relation between data and state vectors, Eq. (2) contains noise component wk, describing the error in data observations. The likelihood function for xk is derived based on the function hk and statistical properties of the noise wk. In ocean acoustic applications, data errors are often considered white and Gaussian, giving analytically tractable likelihoods. As an example, likelihood functions for geoacoustic and source location parameters derived from a unified approach based on maximum likelihood and additive, complex Gaussian noise are given in [63].

The assumption of white Gaussian noise is, however, not uniformly suitable. It is not always easy to identify likelihood p(yk|xk). This can occur either because a parametric observation equation cannot be easily derived or because the statistical model for the observed data is unknown [64]. Sequential filtering (with PFs) is still possible under such circumstances, by determining an estimate of the likelihood function with the help of a training step: computation of the likelihood is performed through kernel density estimation that employs known, training state vectors xland corresponding output data y_l, l = 1, . . . , L. The likelihood is then expressed as a sum of appropriately weighted and shifted kernels, with weights and lags calculated so that training state vectors and data are suitably connected.

C. State vector transitions

State noise vk in Eq. (1) determines the error in the state vector evolution and is often chosen as Gaussian. Further, parameters in the state vector are usually assumed uncorrelated, i.e.,Qk is diagonal.

Although state variables are not expected to vary in a way that is not consistent with the state evolution model [Eq. (1)], sudden changes can occur. The filters in such cases need a noise term vk with a suitable covariance Qk to continue tracking even when the state behaves in an unexpected way.

For example, when bathymetry is tracked in a relatively flat ocean environment, water column depth usually varies slowly with range. Hence, only small perturbations of the state vector can capture these variations. The presence of a canyon, on the other hand, results in radical bathymetry change. A small variance in the state noise may then cause the filtering process to diverge. Covariance Qkshould have values large enough to accommodate the unexpected changes, but at the same time small enough to prevent noisy estimates and poor performance.

These trade-offs can make state noise selection a challenging practical problem. An example where state noise and the number of particles need to be carefully selected because of sharp changes in sediment thickness and sound speed is discussed in Section V-B.