We have found a relationship between the effectiveness of parameter estimation and the effect of small variations of a parameter on the system's dynamics. Namely, the larger the dynamical differences produced by small variations in the value of the parameter, the more likely the parameter is to be estimated well using this method. Noticeable dynamical differences due to variations in the parameter value cause a higher RMSE in the state estimation, which in turn drives the parameter more readily toward its true value. The sensitivity of the model to a parameter has been shown to be a reliable indicator of the success of parameter estimation in general in Figure 7. It is worth noting that, due to the randomness inherent in the algorithm at multiple stages, the success or failure of parameter estimation is not guaranteed regardless of the sensitivity of the Fenton-Karma model to the parameter. Although the dynamically sensitive τ_d is shown to be reliably estimated over five separate instances in Figure 6, there may be pernicious cases where the randomness is drawn in such a way that the estimated value of the parameter does not converge to the true value. These cases are atypical, however; more commonly, the dynamically influential parameters in the model are estimated reliably regardless of the randomness used during the algorithm. For all figures throughout this thesis, the algorithm has been run multiple times to determine typical behavior, and results have been selected to be representative of the normal output produced.


distributions, consistent with the truth. Parameter D is not so clear, with perhaps some evidence of a continued modest drift to lower values, but again it is consistent with the value used to generate the identical-twin data. Parameter B, however, is clearly not constrained in the 1-year experiments, although there are some signs that it is converging in the 5-year iterations. Given that these parameters were initially selected to be those to which the model was most sensitive, this suggests that the data used are barely adequate for constraining as many as 5 parameters simultaneously. Even though there are 8196 data points in all, they represent only 2 types of measurement, so this is not an entirely unexpected result. It is, however, also possible that the preselection of sensitive parameters (based around the default values) may not be valid close to this new optimum.

5 Conclusions
A new model, the Advanced Ensemble electron density (Ne) Assimilation System (AENeAS), has been developed.
AENeAS is a physics-based data assimilation model of the Earth's upper atmosphere. Its background model is TIE-GCM, and the model uses the local ensemble transform Kalman filter (LETKF) for the assimilation scheme. The LETKF is an efficient implementation of the ensemble Kalman filter which defines local regions where the assimilation is performed. The advantage of this is that it reduces the state space of the model and brings it closer to the space spanned by the ensemble members. The algorithm iterates through each grid point independently and so is naturally suitable for parallelization. A computationally efficient implementation using eigendecomposition has also been presented.
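A local LETKF analysis of the kind described above can be sketched in a few lines. The following toy Python example is not AENeAS code; the test problem, sizes, and values are invented for illustration. It applies an ETKF analysis in the style of Hunt et al. (2007) to a single local region, using an eigendecomposition of the small k × k ensemble-space matrix to obtain both the inverse and the symmetric square root cheaply:

```python
import numpy as np

def letkf_analysis(Xb, y, H, R, rho=1.0):
    """One local ETKF analysis for a single region (toy sketch).

    Xb: (n, k) background ensemble; y: (m,) local obs;
    H: (m, n) obs operator; R: (m, m) obs error covariance;
    rho: multiplicative inflation applied to the anomalies.
    """
    n, k = Xb.shape
    xb = Xb.mean(axis=1)
    Ab = rho * (Xb - xb[:, None])              # inflated background anomalies
    Yb = H @ Ab                                # anomalies mapped to obs space
    Rinv = np.linalg.inv(R)
    # Work in the k-dimensional ensemble space: eigendecomposition of a
    # k x k matrix gives the inverse and the symmetric square root cheaply.
    C = (k - 1) * np.eye(k) + Yb.T @ Rinv @ Yb
    w, V = np.linalg.eigh(C)
    Pa_tilde = V @ np.diag(1.0 / w) @ V.T          # analysis cov, ensemble space
    Wa = V @ np.diag(np.sqrt((k - 1) / w)) @ V.T   # anomaly transform
    wa = Pa_tilde @ Yb.T @ Rinv @ (y - H @ xb)     # mean-update weights
    return (xb + Ab @ wa)[:, None] + Ab @ Wa       # analysis ensemble (n, k)

rng = np.random.default_rng(0)
Xb = rng.normal(size=(3, 8))                   # 3 state variables, 8 members
truth = np.array([1.0, -0.5, 0.3])
H, R = np.eye(3), 0.1 * np.eye(3)
y_obs = truth + rng.normal(scale=0.1, size=3)
Xa = letkf_analysis(Xb, y_obs, H, R)
```

In a full LETKF this routine would be called once per grid point on its local patch of observations, which is what makes the scheme naturally parallel.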


Data assimilation (DA) provides another method to identify potential temporal variations of model parameters by updating them in real time when observations are available (Liu and Gupta, 2007; Xie and Zhang, 2013). The DA method has been widely applied in hydrology for soil moisture estimation (Han et al., 2012; Kumar et al., 2012; Yan et al., 2015) and flood forecasting (Y. Li et al., 2013; Liu et al., 2012; Abaza et al., 2014). It has also been successfully used to estimate model parameters (Moradkhani et al., 2005; Kurtz et al., 2012; Montzka et al., 2013; Panzeri et al., 2013; Vrugt et al., 2013; Xie and Zhang, 2013; Shi et al., 2014; Xie et al., 2014). For example, Vrugt et al. (2013) proposed two Particle-DREAM (DiffeRential Evolution Adaptive Metropolis) methods, i.e., Particle-DREAM for time-variant and for time-invariant parameters, to track the evolving target distribution of HyMOD parameters; the two sets of results were approximately similar and statistically coherent since only 3 years of data were used. Xie and Zhang (2013) used a partitioned forecast-update scheme based on the ensemble Kalman filter (EnKF) to retrieve optimal parameters in a distributed hydrological model. Although the DA method has been used to estimate model parameters, these studies focused on the estimation of constant parameters. Little attention has been paid to the identification of time-variant model parameters using the DA method.


Hydrologic models serve two purposes: understanding physical processes, and prediction. This study addresses the latter, in which modelers predict, for example, streamflow at some future time given knowledge of the current state of the system and the model parameters. In this respect, good estimates of the parameters and state variables are needed to enable the model to generate accurate forecasts. In this paper, a dual state-parameter estimation approach based on the Ensemble Kalman Filter (EnKF) is presented for sequential estimation of both parameters and state variables of a hydrologic model. A systematic approach for identification of the perturbation factors used for ensemble generation and for selection of ensemble size is discussed. The dual EnKF methodology introduces a number of novel features: (1) both model states and parameters can be estimated simultaneously; (2) the algorithm is recursive and therefore does not require storage of all past information, as is the case in batch calibration procedures; and (3) the various sources of uncertainty can be properly addressed, including input, output, and parameter uncertainties. The applicability and usefulness of the dual EnKF approach for ensemble streamflow forecasting is demonstrated using a conceptual rainfall-runoff model.
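The dual idea, one filter acting on the parameters and a second on the states, can be illustrated with a deliberately simple surrogate. The following Python sketch is not the paper's rainfall-runoff model: the scalar linear reservoir, the jitter term, and all noise levels are invented for illustration. It runs two sequential stochastic-EnKF updates per cycle:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 300                       # ensemble size, assimilation cycles
a_true, q, r = 0.8, 0.05, 0.1         # true parameter; model and obs error variances

def enkf_update(ens, pred, y, r_var):
    """Stochastic EnKF update of a scalar ensemble given predicted observations."""
    P = np.cov(ens, pred)                       # 2x2 joint sample covariance
    gain = P[0, 1] / (P[1, 1] + r_var)
    perturbed = y + rng.normal(0.0, np.sqrt(r_var), len(ens))
    return ens + gain * (perturbed - pred)

x_true = 0.0
a_ens = rng.normal(0.4, 0.3, N)       # parameter ensemble, deliberately biased guess
x_ens = rng.normal(0.0, 1.0, N)       # state ensemble

for _ in range(T):
    x_true = a_true * x_true + 1.0 + rng.normal(0.0, np.sqrt(q))
    y = x_true + rng.normal(0.0, np.sqrt(r))
    a_ens = a_ens + rng.normal(0.0, 0.01, N)    # small jitter prevents collapse
    # 1) parameter filter: forecast with current parameters, update the parameters
    a_ens = enkf_update(a_ens, a_ens * x_ens + 1.0, y, r)
    # 2) state filter: re-forecast with the updated parameters, update the states
    x_fore = a_ens * x_ens + 1.0 + rng.normal(0.0, np.sqrt(q), N)
    x_ens = enkf_update(x_fore, x_fore, y, r)

a_hat = a_ens.mean()                  # should drift from 0.4 toward a_true = 0.8
```

The recursion only ever touches the current ensemble, which is the feature (2) above: no past information needs to be stored.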


1. Introduction
Data assimilation is one of the most important techniques in numerical weather prediction. It is important for data assimilation to draw as much information as possible from the observations. Evensen (1994) suggested the ensemble Kalman filter (EnKF), which approximates the covariance matrix of the Kalman filter (KF; Kalman 1960) using ensemble predictions. Because the EnKF can represent a flow-dependent covariance matrix, it can extract the information that the observations contain. There are a number of studies of the EnKF with the Lorenz-96 system (Lorenz 1996), with regional models, and with other models based on the primitive equations, e.g., the SPEEDY model (Molteni 2003). Zhang et al. (2006) investigated the EnKF with a nonhydrostatic regional model and gave details on the dependence of EnKF performance on error growth rate and scales. Hunt et al. (2007) developed the LETKF, which has an important advantage in assimilating the observations in each local patch. Due to this advantage, the sampling errors are filtered, and the LETKF performs well when implemented on parallel computers. Miyoshi et al. (2007) removed the local patches of the LETKF and applied it to AFES (AGCM (atmospheric general circulation model) for the Earth Simulator; Ohfuchi et al. 2004) at T159L48 resolution, and investigated the stability of the LETKF without the local patches. In ensemble-based KF methods, it is necessary to tune the covariance or spread inflation parameter, which is often costly. In Miyoshi (2005) the inflation parameter is estimated adaptively by means of a scalar KF algorithm in order to avoid this complicated tuning. The method, however, did not work properly in experiments with real observations, because the observational errors are not perfectly known, and these errors also influence the accuracy of an analysis. Kalnay et al. (2007) and Li et al. (2009) then reported algorithms to adaptively estimate not only the inflation parameter but also the observational errors.
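The core approximation mentioned at the start of this passage, replacing the KF covariance matrix with a sample covariance over ensemble predictions, can be shown directly. A minimal Python sketch with an invented 2-variable "background" covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 500                                   # ensemble size
P_true = np.array([[1.0, 0.6],            # a known 2-variable covariance
                   [0.6, 0.5]])
ens = rng.multivariate_normal(np.zeros(2), P_true, size=k).T   # shape (2, k)

A = ens - ens.mean(axis=1, keepdims=True)  # ensemble anomalies
P_ens = A @ A.T / (k - 1)                  # sample covariance: the EnKF's
                                           # stand-in for the KF covariance
```

In a real forecast system the members are model integrations rather than draws from a known distribution, which is exactly why the resulting covariance is flow dependent.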

For CLM, larger differences were observed in the performance of the different data assimilation methods. This larger disparity among the methods is explained by the considerably larger number of soil layers (10) used by CLM. This increased significantly the dimensionality of the parameter estimation problem. The overall best results at the 5, 20, and 50 cm measurement depths were observed for EnKF-AUG and EnKF-DUAL, with RMSE values that were somewhat smaller than their counterparts derived from PMCMC. This was true for both the calibration and evaluation periods. The RRPF exhibited the worst performance, in part determined by the use of a relatively small ensemble of N = 100 particles. The superiority of the EnKF-AUG and EnKF-DUAL methods for CLM is consistent with our expectations articulated previously in Sect. 3.1. The analysis step of the EnKF makes it much easier for EnKF-AUG and EnKF-DUAL to track the measured soil moisture dynamics, thereby promoting convergence in high-dimensional state-parameter spaces. PF-based methods, by contrast, deteriorate in robustness and efficiency with larger dimensionality of the state-parameter space, as they lack a state-analysis step and approximate the transient state-parameter PDF via the particles' likelihoods. This likelihood is only a low-dimensional summary statistic of the distance between the forecasted and measured values of the states. Resampling with MCMC via the likelihood thus becomes increasingly more difficult in high-dimensional state-parameter spaces. For CLM, the PMCMC method still achieves comparable results to EnKF-AUG and EnKF-DUAL, as the dimensionality of the state-parameter PDF of this model is only somewhat larger than its counterpart for VIC-3L. Of course, the use of a larger ensemble size makes it easier to characterize the transient state-parameter PDF, but at the expense of a significantly increased CPU cost.
For PMCMC, multiple different MCMC resampling steps can also enhance significantly the particle ensemble by allowing each particle trajectory to improve its likelihood. Yet this deteriorates significantly the efficiency of implementation, as each candidate particle requires a separate model evaluation of VIC-3L or CLM to determine its likelihood. Thus, for LSMs with relatively few state variables and model parameters, we expect the EnKF and PF methods to achieve comparable performance. For larger-dimensional state-parameter spaces we would recommend EnKF-AUG and EnKF-DUAL, unless one can afford a very large number of particles.


Several studies argued that the joint EnKF may suffer from important inconsistencies between the estimated state and parameters that could degrade the filter performance, especially for large-dimensional and strongly nonlinear systems (e.g., Moradkhani et al., 2005b; Chen and Zhang, 2006; Wen and Chen, 2007). One classical approach that has been proposed to tackle this issue is the so-called dual filter, which separately updates the state and parameters using two interactive EnKFs, one acting on the state and the other on the parameters (Moradkhani et al., 2005b). The dual EnKF has been applied to streamflow forecasting problems using rainfall-runoff models (e.g., Lü et al., 2013; Samuel et al., 2014), subsurface contaminant models (e.g., Tian et al., 2008; Lü et al., 2011; Gharamti et al., 2014b), and compositional flow models (e.g., Phale and Oliver, 2011; Gharamti et al., 2014a), to cite but a few. Gharamti et al. (2014a) concluded that the dual scheme provides more accurate state and parameter estimates than the joint scheme when implemented with large enough ensembles. In terms of complexity, however, the dual scheme requires integrating the filter ensemble twice with the numerical model at every assimilation cycle, and is therefore computationally more demanding. In related work, Gharamti et al. (2013) extended the dual filtering scheme to tackle the state estimation problem of one-way coupled models, and to the framework of the hybrid-EnKF (Gharamti et al., 2014b).


The ocean color data used for assimilation are the GlobColour GSM-derived CHL1 products obtained from the MERIS, MODIS, and SeaWiFS instruments. They correspond to an 8-day averaged chlorophyll-a concentration for case I water at 25 km resolution. Their spatial coverage varies strongly with season, resulting in an absence of observations for the Arctic Ocean in winter. Chlorophyll-a concentrations are assumed to be log-normally distributed (Campbell, 1995), and the observations are thus log-transformed before assimilation. The standard deviation of these log-transformed observations is assumed equal to 0.35, so the observation error equals 35% of the observation values (Gregg and Casey, 2004). Locally the true errors can be larger than 35%, resulting in a large underestimation of the observation error that can impair the quality of the estimated variables. This is the case in the Arctic Ocean in 2010, for which erroneously high concentrations occur due to a degradation in the quality of the MODIS Aqua ocean color product for this version of the GlobColour data set (Meister and Franz, 2014). Furthermore, the algorithm used for deriving chlorophyll estimates in case I water is unsuited to coastal waters. In compensation, observations in waters shallower than 300 m and less than 50 km away from the coast are not assimilated in the first 6 months of 2008, in order to prevent overfitting to observations of poor quality. However, this criterion excludes large areas of interest (e.g., the North Sea, Chukchi Sea, and Hudson Bay) that would benefit from assimilating case I water chlorophyll concentrations. So, all observations located at least 50 km away from the coast are assimilated from 1 July 2008. This means that the estimation of the ecosystem parameters starts in these areas with a 6-month delay compared to other North Atlantic open ocean areas (this delay is shorter at high latitudes because there are no observations during winter).
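The log-space convention can be made concrete with a small sketch (the concentration values below are invented, not GlobColour data). Note that equating a log-space standard deviation of 0.35 with a 35% relative error is itself a small-sigma approximation; the exact multiplicative bounds are asymmetric:

```python
import numpy as np

chl = np.array([0.12, 0.5, 1.8, 6.0])   # hypothetical chlorophyll-a values, mg m^-3
log_obs = np.log(chl)                    # observations assimilated in log space
sigma_log = 0.35                         # assumed obs-error std in log space

# One log-space standard deviation maps to asymmetric multiplicative bounds,
# roughly +42% / -30% rather than exactly +/-35%:
rel_err_up = np.exp(sigma_log) - 1.0
rel_err_dn = 1.0 - np.exp(-sigma_log)
```

This asymmetry is the usual reason log-normally distributed quantities such as chlorophyll are assimilated in log space in the first place.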
Except for the SST product, the physical observations correspond to the ones assimilated in TOPAZ4 (Sakov et al., 2012). The use of version 2 of the Reynolds SST product (Reynolds and Smith, 1994) from the National Climatic Data Center (NCDC), with a resolution of approximately 100 km, is motivated by the coarse resolution of the model. We refer to


Furthermore, in order to compute the LETKS nudging term, EnKF-like arguments are adopted. That is, when computing the analysis update, the posterior pdf is assumed Gaussian, and linear model evolution is assumed so that the updates can be propagated backwards in time. Making this Gaussian assumption at the timestep before the observations limits the benefits of using the fully nonlinear particle filter, which makes no such assumptions on the distribution of the posterior. Indeed, comparing the evolution of the EWPF with the LETKS nudging against that of the LETKF (Fig. 6a and b), they are markedly similar. Hence, the extra expense of the EWPF over the LETKF may not be justified.


NASA-GSFC Global Modeling and Assimilation Office, Greenbelt, MD, USA
Received: 20 December 2007 – Revised: 4 April 2008 – Accepted: 10 July 2008 – Published: 5 August 2008
Abstract. This paper compares the performance of the Local Ensemble Transform Kalman Filter (LETKF) with the Physical-Space Statistical Analysis System (PSAS) under a perfect model scenario. PSAS is a 3D-Var assimilation system used operationally in the Goddard Earth Observing System Data Assimilation System (GEOS-4 DAS). The comparison is carried out using simulated wind and geopotential height observations and the finite-volume Global Circulation Model with 72 grid points zonally, 46 grid points meridionally, and 55 vertical levels. With forty ensemble members, the LETKF obtains analyses and forecasts with significantly lower RMS errors than those from PSAS, especially over the Southern Hemisphere and the oceans. This observed advantage of the LETKF over PSAS is due to the ability of the 40-member ensemble LETKF to capture flow-dependent errors and thus create a good estimate of the evolving background uncertainty. An initial decrease of the forecast errors in the Northern Hemisphere observed in PSAS but not in the LETKF suggests that the LETKF analysis is more balanced.


As expected, the forecast RMS error of the LETKF is smaller than that of PSAS (dashed line) during the five-day forecast period over all regions (Fig. 7). The forecast from the LETKF analysis assimilating the second set of observations (solid line) has a smaller error than that from PSAS or from the LETKF assimilating the first set of observations (solid line with open circles). However, different regions show different error growth characteristics. In the NH (Fig. 7a), PSAS errors initially decay and start growing only after a day, indicating that the PSAS initial conditions are not well balanced. With the same set of observations (the first set), the LETKF starts with smaller errors, but they grow faster than those of PSAS, suggesting that when assimilating geopotential heights, our implementation of the LETKF did not completely succeed in suppressing the baroclinic "errors of the day" in the NH. The LETKF analysis using temperatures and surface pressure seems to be the most balanced, and its errors grow more slowly at an approximately constant exponential rate.


In this paper, we focus on parameter estimation for an elliptic inverse problem. We consider a 2D steady-state single-phase Darcy flow model in which the permeability and the boundary conditions are uncertain. The permeability is parameterized by the Karhunen-Loève expansion and is thus assumed Gaussian distributed. We employ two ensemble-based data assimilation methods: the ensemble Kalman filter and the ensemble transform particle filter. The former approximates the mean and variance of a Gaussian probability function by means of an ensemble. The latter transforms ensemble members to approximate an arbitrary posterior probability function. The ensemble Kalman filter considered here is employed with regularization and localization, denoted R(L)EnKF. The ensemble transform particle filter is also employed with a form of regularization called tempering, and with localization, denoted T(L)ETPF. Regularization is required for highly nonlinear problems, where the prior is updated to the posterior via a sequence of intermediate probability measures. Localization is required for small ensemble sizes to remove spurious correlations. We have shown that the REnKF outperforms the TETPF, and that localization improves the estimates of both the REnKF and the TETPF. In numerical experiments where the uncertainty is only in the permeability, the TLETPF outperforms the RLEnKF. When the uncertainty is in both the permeability and the boundary conditions, the TLETPF outperforms the RLEnKF only for a large ensemble size of 1000. Furthermore, when the uncertainty is in both the permeability and the boundary conditions but the error in the boundary conditions is not accounted for in the data assimilation, the RLEnKF outperforms the TLETPF.


Inverse modeling and, in particular, data assimilation methods are techniques that can be used to estimate the state of dynamical systems based on partial and noisy observations. In a broad sense, these techniques build on continuous or quasi-continuous observations to produce model initial conditions (analyses) that can be used to better predict the future state, taking into account uncertainties in observations and in the model formulation. Data assimilation methods have been successfully applied to the estimation of the state of the ocean and the atmosphere (e.g., Kalnay, 2003; Carrassi et al., 2018) as well as to the optimization of uncertain model parameters (e.g., Ruiz et al., 2013). More recently, applications have been extended to atmospheric constituents (e.g., Bocquet et al., 2010; Hutchinson et al., 2017), including ash dispersion models, with the purpose of estimating the 3-D distribution of ash concentrations to be used as initial conditions for forecasts. Surprisingly, examples of the application of data assimilation techniques to volcanic ash dispersion are scarce and still mainly limited to the research level. For example, Wilkins et al. (2015) implemented a data insertion methodology to improve the initial conditions of ash concentrations in a Lagrangian dispersion model based on satellite estimates of ash mass loadings. Fu et al. (2015, 2017a) applied an ensemble Kalman filter technique to the estimation of ash concentrations in an Eulerian dispersion model based on flight concentration measurements and satellite estimates, using idealized experiments and real observations. Their results showed that both observational sets (flight measurements and satellite mass loads) reduced forecast errors, which in their particular case were attributed to a wrong model representation of ash sedimentation processes.
One important issue when using satellite estimates of ash mass loadings is that the observations provide only a 2-D distribution of ash mass, while models usually require the vertical profile of ash concentration.


Both methods require additional development of advanced approaches to the treatment of model errors, including weak constraints for 4-D-Var and efficient estimates of state-dependent bias in the EnKF. Like 4-D-Var, the EnKF has a few tuning 'handles' that need to be explored, including the number of ensemble members, the strength and characteristics of the covariance localization, the handling of model errors, the use of multiplicative or additive inflation, and its adaptive estimation using observational increments. In the next few years, more experiments with real observations will build up the EnKF experience needed for operational implementation. Fortunately, because the problems solved in the EnKF and 4-D-Var are very closely related, researchers can take advantage of and share advances made in either method.


• Weights are given by the statistical likelihood of an observation
• Example: with Gaussian observation errors (for each particle i)
• Ensemble mean state computed with the weights
• This update does not assume any distribution of the state errors
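The bullet points above can be sketched in a few lines of Python (a minimal illustration with Gaussian observation errors; the prior, the observation, and the variances are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
Np = 1000
particles = rng.normal(0.0, 2.0, Np)     # prior particle states
y, r = 1.5, 0.25                         # observation and Gaussian obs-error variance

# Weight = statistical (Gaussian) likelihood of the observation for each particle
logw = -0.5 * (y - particles) ** 2 / r
w = np.exp(logw - logw.max())            # subtract the max for numerical stability
w /= w.sum()                             # normalize the weights

x_mean = np.sum(w * particles)           # ensemble mean state computed with weights
```

Nothing in the weighting step assumes Gaussian state errors; only the observation-error model enters the likelihood, which is the last bullet's point.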


In tightly coupled integration, both the pseudo-range (i.e., distance to satellite) and inertial sensor measurements are processed in a single large Kalman filter. The pseudo-range measurements can be considered as affected by white noise.
In low-cost loosely coupled integration, the GPS receiver is a stand-alone unit that processes pseudo-range measurements using its internal Kalman filter and reports only estimated global coordinates. These, however, cannot be considered observations affected by white noise, as their error usually exhibits strong sequential correlation. If the noise were assumed white, the integration filter might provide overconfident estimates and/or diverge (Julier and Uhlmann, 1997a). The problem lies in the fact that the integration filter cannot model the real underlying system state, including variables such as the GPS receiver clock bias and clock drift; it models only a high-level projection of these variables, the global coordinates. Even though the introduction of additional state variables could alleviate the problem, modelling such non-physical variables is problematic. Another often-applied approach is to artificially increase the variance of the observation variables or to ignore certain observations altogether, thus effectively discarding potentially useful information and reducing the convergence rate.
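The overconfidence can be demonstrated with a quick Monte Carlo sketch (all numbers invented, not from any real receiver): under the white-noise assumption, the variance of an n-sample average is sigma^2/n, but when the errors follow an AR(1) process with strong sequential correlation, the realized variance of that average is many times larger.

```python
import numpy as np

rng = np.random.default_rng(6)
phi, n, trials = 0.9, 100, 2000          # AR(1) coefficient, samples averaged, trials

def ar1_noise(T):
    """Sequentially correlated noise with (asymptotically) unit marginal variance."""
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = phi * e[t - 1] + rng.normal(0.0, np.sqrt(1.0 - phi ** 2))
    return e

claimed_var = 1.0 / n                    # variance of the mean IF errors were white
means = np.array([ar1_noise(n).mean() for _ in range(trials)])
actual_var = means.var()                 # empirical variance under correlated errors
```

A filter that reports `claimed_var` while the truth behaves like `actual_var` is exactly the overconfident estimator the paragraph warns about.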


III. EKF ALGORITHM
The EKF is broadly used for estimation purposes in various applications [23]-[25]. It uses a linearized model of the nonlinear system to implement Kalman filtering. The linearization uses the partial derivatives, or Jacobian matrices, of the nonlinear functions of the model. Using the a priori and a posteriori error covariances, the estimation process is defined in terms of the linearized observation model. The algorithm starts with initialization of the mean value of the state vector and the covariance matrix.
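The cycle described above can be sketched generically in Python. This is a minimal illustration, not the paper's implementation: the constant-velocity model, the range observation, and all values below are invented.

```python
import numpy as np

def ekf_step(x, P, y, f, F_jac, h, H_jac, Q, R):
    """One extended-Kalman-filter cycle: nonlinear predict, linearized update."""
    x_pred = f(x)                          # propagate the mean through the model
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q               # covariance via the model Jacobian
    H = H_jac(x_pred)                      # observation Jacobian at the prediction
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (y - h(x_pred))   # update with the linearized obs model
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Invented toy problem: constant-velocity state, nonlinear range observation
f = lambda x: np.array([x[0] + 0.1 * x[1], x[1]])
F_jac = lambda x: np.array([[1.0, 0.1], [0.0, 1.0]])
h = lambda x: np.array([np.hypot(x[0], 1.0)])        # range to an offset sensor
H_jac = lambda x: np.array([[x[0] / np.hypot(x[0], 1.0), 0.0]])

rng = np.random.default_rng(4)
Q, R = 0.01 * np.eye(2), np.array([[0.04]])
x, P = np.array([0.5, 0.0]), np.eye(2)   # initialize state mean and covariance
true = np.array([2.0, 1.0])
for _ in range(50):
    true = f(true)
    y = h(true) + rng.normal(0.0, 0.2, 1)
    x, P = ekf_step(x, P, y, f, F_jac, h, H_jac, Q, R)
```

The Jacobians `F_jac` and `H_jac` are exactly the "partial derivative" matrices the text refers to; for a rotor-position application they would come from the machine model instead.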

variation, then an expensive floating-point processor would be required to handle this complex algorithm. The third method is developed based on the function of an observer (Kim, 2004). Various types of observers have been used to estimate rotor position, notably the Extended Kalman Filter (EKF) (Lenine, 2007). The biggest advantage of using observers is that all of the states in the system can be estimated, including states that are hard to obtain by measurement. However, there are also some limitations of using the EKF as an observer, such as its first-order accuracy and its high computational complexity due to the calculation of the Jacobian matrices and the covariance matrix. The most important problem with the use of the EKF as an observer, however, lies in its weak robustness against parameter detuning.

Some results are displayed in Fig. 11. The evolution of the retrospective analysis of F is shown for the EnKF-N, the EnKS-N L = 50, the MDA IEnKS-N L = 50 S = 1, and 4D-Var L = 50 S = 1. The RMSEs are indicated in parentheses in the legends. Although the IEnKS-N L = 50 remains the best performer in both cases, the gap in performance is narrower because of the incorrect persistence assumption within the DAW. Let us remark that, in these cases, the RMSE of the retrospective analysis of the IEnKS differs from the RMSE of the filtering analysis, because the truth that serves as a point of comparison changes within the DAW. Note also that, because of the imperfection of the persistence model, a multiplicative inflation of 1.01 has been applied to the ensemble anomalies of the finite-size methods, since they are not meant to intrinsically account for extrinsic model error (Bocquet et al., 2011), whereas the EnKF requires an inflation of 1.05 here to account for both model and sampling errors.
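Multiplicative inflation of the ensemble anomalies, as used here, amounts to rescaling each member's deviation from the ensemble mean while leaving the mean untouched. A generic Python sketch (the ensemble shape and values are illustrative, not from these experiments):

```python
import numpy as np

rng = np.random.default_rng(5)
ens = rng.normal(0.0, 1.0, size=(4, 20))   # invented (state dim, members) ensemble

def inflate(ens, factor):
    """Multiplicative inflation: rescale anomalies about the ensemble mean."""
    mean = ens.mean(axis=1, keepdims=True)
    return mean + factor * (ens - mean)

inflated = inflate(ens, 1.05)              # e.g. the factor 1.05 quoted for the EnKF
```

Because the anomalies scale by the factor, the ensemble covariance scales by its square, which is how a small factor such as 1.01 or 1.05 compensates for underestimated spread.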
