arxiv: v1 [physics.comp-ph] 1 Jun 2020

Analog ensemble data assimilation and a method for constructing analogs with variational autoencoders

Ian Grooms

Department of Applied Mathematics, University of Colorado, Boulder

Abstract

It is proposed to use analogs of the forecast mean to generate an ensemble of perturbations for use in ensemble optimal interpolation (EnOI) or ensemble variational (EnVar) methods. A new method of constructing analogs using variational autoencoders (VAEs; a machine learning method) is proposed. The resulting analog methods using analogs from a catalog (AnEnOI), and using constructed analogs (cAnEnOI), are tested in the context of a multiscale Lorenz-’96 model, with standard EnOI and an ensemble square root filter for comparison. The use of analogs from a modestly-sized catalog is shown to improve the performance of EnOI, with limited marginal improvements resulting from increases in the catalog size. The method using constructed analogs (cAnEnOI) is found to perform as well as a full ensemble square root filter, and to be robust over a wide range of tuning parameters.

1 Introduction

Data assimilation methods are widely used in geophysics for a variety of purposes. Workhorse methods include the Ensemble Kalman Filter (EnKF) and its many variants (Burgers et al., 1998; Evensen, 1994; Houtekamer and Mitchell, 1998), and 3D-Var and 4D-Var (Talagrand, 2010). Traditional variational methods suffer from the use of a time-independent background covariance, whereas the drawbacks of the EnKF include the sometimes high cost of generating ensemble members and less accurate treatment of nonlinearity and non-Gaussianity. A variety of hybrids exist between ensemble and variational methods that aim to combine the strengths of the different methods (Bannister, 2017). Ensemble optimal interpolation (EnOI; Evensen, 2003; Oke et al., 2002) uses a time-independent background covariance that is generated from a time-independent ensemble of perturbations. In EnOI a single model simulation is required for each assimilation cycle to propagate the mean state. EnOI uses a gain matrix to compute the increment between the forecast and analysis means. One could alternatively use a variational approach that finds the minimizer of a loss function whose background covariance is defined using the time-independent ensemble of perturbations; such a method would be a form of EnVar (Lorenc, 2013).

The ensemble of perturbations used in EnOI usually comes from a catalog of model states from a long-running simulation. Since the ensemble of perturbations used in EnOI is static, EnOI suffers from the same drawbacks as classical variational methods, but has the benefit that only a single forecast is required for each assimilation cycle. The goal of this investigation is to explore a way of improving the performance of EnOI by generating a time-dependent ensemble of perturbations from a large catalog. The premise is that an ensemble of model states chosen as a subset from the catalog that are similar to the current forecast will produce an ensemble of perturbations that is more appropriate for use in EnOI than an ensemble that is representative of the climatology of the model. Ensemble perturbations drawn from the climatology represent the correlations in the climatology, which can be a poor proxy for correlations in the forecast error. Analog ensemble perturbations come from the part of the dynamical system’s attractor (or pullback attractor for non-autonomous systems) that is close to the actual forecast, and therefore represent correlations on a specific part of the model attractor rather than over the whole climatology. As a result they are expected to provide a more realistic representation of forecast error, since the forecast error distribution should be expected to cover a neighborhood of the attractor close to the forecast mean.

Model states that are similar to the current forecast are called ‘analogs’ (Lorenz, 1969) and have a long history in weather forecasting and forecast downscaling (Delle Monache et al., 2013; Eckel and Delle Monache, 2016; Zhao and Giannakis, 2016). Van den Dool (1994) considered finding analogs from a large historical catalog of model states, and showed that to make an effective analog global weather forecast would require an impossibly large catalog, on the order of $10^{30}$ years of data. Nevertheless, in the current setting one may still expect some degree of success with analogs drawn from a practically-sized catalog since the analogs are not being used for forecasting, but only to improve the background covariance within the data assimilation framework.

One way of avoiding the impossibly large size requirements of a catalog for analog forecasting is to use a reasonably-sized catalog to construct analogs, and there are many ways of doing this (Abatzoglou and Brown, 2012; Hidalgo et al., 2008; Maurer et al., 2010; Pierce et al., 2014). This investigation explores a new way of constructing analogs using variational autoencoders (Kingma and Welling, 2019). A standard autoencoder consists of two functions: an encoder $e(x)$ that maps the model state $x \in \mathbb{R}^d$ to a latent space $z \in \mathbb{R}^l$ where $l \ll d$, and a decoder $d(z)$ that maps a vector in the latent space to a model state. Both $e$ and $d$ are usually specified as deep artificial neural networks. Given a catalog of model states $\{x_i\}_{i=1}^N$, the parameters of $e$ and $d$ are chosen to minimize

$$\sum_i \| x_i - d(e(x_i)) \|_2^2$$

or some similar loss function. A standard autoencoder does not impose any particular structure on the latent space. For example, a sufficiently powerful autoencoder might simply learn the map $i = e(x_i)$, $d(i) = x_i$. As a result, standard autoencoders are not always useful as generative models: if $z_i = e(x_i)$, then $d(z_i + \epsilon)$ need not be close to $x_i$, even for small $\epsilon$. Variational autoencoders, by contrast, impose structure on the latent space; specifically, they aim to choose the parameters of $e$ and $d$ so that the structure of the data in latent space is approximately Gaussian. This is accomplished by two devices. First, the latent space is divided in two so that $e(x) = (\mu, \sigma)$ where $\mu, \sigma \in \mathbb{R}^l$. Then, a latent space vector is constructed as $z = \mu + \sigma \circ \epsilon$ where $\epsilon$ is a standard normal random vector and $\circ$ denotes the elementwise product (also known as Hadamard product, or Schur product). Second, the loss function is altered by the addition of a term that penalizes deviations of the latent space distribution from a standard normal. (For details on the form of this additional penalty term see Kingma and Welling (2019).) This investigation uses a variational autoencoder (VAE) to generate analogs that are then used to construct the ensemble of perturbations for use in data assimilation.
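The sampling step of a VAE, in which a latent vector is built from a mean part, a spread part, and standard normal noise combined elementwise, can be illustrated with a minimal NumPy sketch. The function name `sample_latent` and the constant values used for the two encoder outputs are illustrative assumptions; in a trained VAE they would be produced by the encoder.

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Return z = mu + sigma * eps, with eps ~ N(0, I) and an elementwise product."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
l = 492                     # latent dimension (the size of each encoder output in section 3)
mu = np.zeros(l)            # hypothetical encoder output: mean part
sigma = 0.5 * np.ones(l)    # hypothetical encoder output: spread part
z = sample_latent(mu, sigma, rng)
```

Because the noise enters only through an elementwise scaling and a shift, the sample remains a differentiable function of the encoder outputs, which is what allows the parameters to be trained by gradient-based optimization.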

Data assimilation algorithms using analogs have been proposed in the context of geophysical data assimilation by Lguensat et al. (2017, 2019). Two key differences between their approach and the current approach are (i) the current approach relies on a model simulation to make the forecast, whereas in Lguensat et al. (2017, 2019) the dynamics are data-driven in a manner similar to analog forecasting, and (ii) the current approach investigates the use of VAEs to construct analogs. The data-driven approach of Lguensat et al. (2017, 2019) is expected to be clearly superior in cases where there is no reliable dynamical model for the system in question, or where such a model would be prohibitively expensive.

The investigation is carried out in the context of a multiscale Lorenz-‘96 model, which is described in section 2. The configuration and training of the VAE is described in section 3. The data assimilation system setup is described in section 4, and the results of data assimilation experiments are described in section 5. Conclusions are offered in section 6.

2 Multiscale Lorenz-‘96 model configuration

Many data assimilation methods have been initially explored in the context of the Lorenz-’96 model (Lorenz, 1996, 2006). Higher dimensionality can be obtained in this model by simply retaining the model form while increasing the dimension; alternatively there is a two-scale version also described by Lorenz (1996). This latter two-scale model has two sets of variables, $X_i$ describing the large, slow scales and $Y_j$ describing the small, fast scales. Grooms and Lee (2015) introduced a multiscale Lorenz-’96 model with a single set of variables $x_i$ with distinct large-scale and small-scale parts. The model is governed by the following system of ordinary differential equations

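$$\dot{x} = h N_S(x) + J T^T N_L(T x) - x + F \mathbf{1} \tag{1}$$

where $h, F \in \mathbb{R}$, $J \in \mathbb{N}$, $\mathbf{1}$ is a vector of ones, and the nonlinearities have the form

$$(N_S(x))_i = -x_{i+1}(x_{i+2} - x_{i-1}) \tag{2}$$

$$(N_L(X))_k = -X_{k-1}(X_{k-2} - X_{k+1}). \tag{3}$$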

Figure 1: A simulation of the multiscale Lorenz-‘96 model initialized at t = 0 with a sample from a standard normal distribution.

41 equally-spaced points. The matrix $J T^T$ spectrally interpolates a vector of length 41 back to the full dimension of $x$. The number of state variables in $x$ is $41J$; here $J = 64$ for a total system dimension of 2624. In the definition of the nonlinear terms the indices are assumed to extend periodically, as in the Lorenz-’96 model.
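The two quadratic nonlinearities of the model, with periodic indexing, can be written compactly with `np.roll`. The following is a minimal sketch rather than the author's implementation: the function names are illustrative, the projection matrix $T$ is not reproduced, and the forcing value $F = 8$ used in the usage lines is a common choice for the Lorenz-’96 model that is assumed here, not taken from the text.

```python
import numpy as np

def N_S(x):
    """Small-scale nonlinearity: -x_{i+1} (x_{i+2} - x_{i-1}), periodic in i."""
    return -np.roll(x, -1) * (np.roll(x, -2) - np.roll(x, 1))

def N_L(X):
    """Large-scale nonlinearity: -X_{k-1} (X_{k-2} - X_{k+1}), periodic in k."""
    return -np.roll(X, 1) * (np.roll(X, 2) - np.roll(X, -1))

def lorenz96_tendency(X, F):
    """Tendency of the standard (single-scale) Lorenz-'96 model."""
    return N_L(X) - X + F

rng = np.random.default_rng(0)
x = rng.standard_normal(41 * 64)   # full state of dimension 2624
X = rng.standard_normal(41)        # a large-scale vector of length 41
energy_rate_small = x @ N_S(x)     # both nonlinear terms conserve energy,
energy_rate_large = X @ N_L(X)     # so these vanish up to round-off
```

The vanishing of `x @ N_S(x)` and `X @ N_L(X)` is a convenient sanity check on the index shifts: the quadratic terms only redistribute energy, while the damping and forcing terms change it.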

The large-scale part of the model dynamics, which can be extracted by applying $T$ to $x$, is identical to the dynamics of the standard Lorenz-’96 model, except that the large scales are coupled to small scales via the term $h T N_S(x)$. While the Lorenz-’96 model is often configured with $K = 40$ large-scale variables (e.g. Lorenz and Emanuel, 1998), this multiscale model uses 41 variables so that the real and imaginary parts of the 20th Fourier mode are not split between large and small scales. At small scales, the dynamics are the same as those of the original Lorenz-’96 model but with the direction of indexing reversed, and with coupling to the large scales. Coupling to the large scales drives small-scale instabilities, which then grow and cause feedback onto the large-scale flow. Figure 1 shows the result of a simulation of this model initialized at $t = 0$ with a sample from a standard normal distribution. After a short transient the dynamics settle onto an attractor, with large-scale nonlinear waves propagating eastward and small-scale instabilities transiently excited by the large-scale waves.

3 Variational Autoencoder

Figure 2: Architecture of the variational autoencoder. The leftmost vertical line indicates the data $x$. The blue rectangles in the left half indicate convolutional layers, and the blue rectangles in the right half indicate transposed convolutional layers. The yellow rectangles in the middle indicate fully-connected layers. The green oval indicates the random noise $\epsilon$. The rightmost vertical line indicates the output.

and the associated terminology. The architecture of the autoencoder is summarized in Fig. 2. The encoder e(x) is constructed as follows:

1. A convolutional layer with three filters of size 3 × 1
2. A convolutional layer with nine filters of size 3 × 1
3. A convolutional layer with 27 filters of size 3 × 1
4. A max pooling layer with 2 × 1 pool size
5. Two convolutional layers with 27 filters each of size 3 × 1
6. A max pooling layer with 2 × 1 pool size
7. Two convolutional layers with 27 filters each of size 3 × 1
8. A max pooling layer with 2 × 1 pool size
9. A fully connected layer with two outputs, each of size 492.

The decoder d(z) is constructed as follows:

1. A fully connected layer with output of size 656 with 27 channels.
3. A transposed convolutional layer with 27 filters of size 3 × 1 and stride of 1
4. A transposed convolutional layer with nine filters of size 3 × 1 and stride of 2
5. A transposed convolutional layer with nine filters of size 3 × 1 and stride of 1
6. A transposed convolutional layer with one filter of size 3 × 1 and a stride of 1.
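As a rough check on the encoder layer list, the size of the feature map entering the encoder's fully connected layer can be traced with a few lines of Python. This sketch assumes that the convolutions preserve the spatial length (stride 1 with periodic or "same" padding), which the text does not state; under that assumption only the three 2 × 1 max pooling layers change the length. The function name is illustrative.

```python
def encoder_feature_shape(n_in=2624, n_pools=3, pool=2, channels=27):
    """Spatial length and channel count entering the fully connected layer.

    Assumes every convolution preserves the spatial length, so that only
    the max pooling layers (each with pool size 2) reduce it.
    """
    length = n_in
    for _ in range(n_pools):
        length //= pool
    return length, channels

length, channels = encoder_feature_shape()
flat_features = length * channels   # inputs to the fully connected layer
```

Under these assumptions the fully connected layer maps `flat_features` inputs to the two latent outputs of size 492 each.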

The convolutional layers, transposed convolutional layers, and fully connected layers all use the exponential linear unit activation function, with the form

$$\mathrm{eLU}(s) = \begin{cases} s & s \ge 0 \\ e^{s} - 1 & s < 0 \end{cases} \tag{4}$$

The eLU function has a continuous first derivative, which implies that the decoder also has a continuous first derivative, since it is a composition of continuously-differentiable functions. By contrast, the use of max pooling layers in the encoder implies that the encoder is continuous, but its first derivative is only piecewise continuous.
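The continuity of the eLU derivative at the origin is easy to verify numerically. The sketch below uses the standard unit-scale form of the activation (equal to $s$ for $s \ge 0$ and $e^{s} - 1$ for $s < 0$); the function names are illustrative.

```python
import numpy as np

def elu(s):
    """Exponential linear unit: s for s >= 0, exp(s) - 1 for s < 0."""
    s = np.asarray(s, dtype=float)
    # np.minimum keeps expm1 from overflowing on the branch that is discarded
    return np.where(s >= 0, s, np.expm1(np.minimum(s, 0.0)))

def elu_prime(s):
    """Derivative of the eLU: 1 for s >= 0, exp(s) for s < 0."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0, 1.0, np.exp(np.minimum(s, 0.0)))

# One-sided derivatives just left and right of zero
left, right = elu_prime(-1e-8), elu_prime(1e-8)
```

Both one-sided derivatives tend to 1 at the origin, so the derivative has no jump there, in contrast to an activation such as the ReLU.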

The data used to train the VAE consists of 70,000 snapshots of the state of the multiscale Lorenz-’96 model described in the previous section. These snapshots are generated by initializing the model from a standard normal, then running the simulation until it reaches a statistical equilibrium, then taking data every 1 time unit, which corresponds to 5 days in the standard dimensionalization of the Lorenz-’96 model. Using training data from the model’s attractor means that the variational autoencoder is attempting to learn a map that transforms the stationary invariant measure on the system attractor to a Gaussian distribution in latent space. The model is trained (i.e. the parameters of $e$ and $d$ are estimated) using stochastic gradient descent. The batch size is 3500 snapshots, and the optimization was run for 272 epochs, at which point the objective function had saturated. In more realistic applications than the multiscale Lorenz-’96 model it would be of interest to explore multiple architectures and training regimens to investigate whether the VAE strikes a balance between being sufficiently expressive and being simple enough to train with limited data. For the purpose here of demonstrating the proof of concept in a simple model, a single architecture suffices.

4 Data Assimilation: Methods

The observations are taken at every fourth point in space and at every 0.2 time units (which corresponds to 1 day in the standard dimensionalization of the Lorenz-‘96 model). At every assimilation cycle there are effectively 16 observations for each of the 41 large-scale Lorenz-‘96 modes. The observation errors are Gaussian with zero mean and variance 1/2. With this observing system and this forecast lead time the dynamics of the model remain only weakly non-Gaussian according to the taxonomy of Metref et al. (2014) and Morzfeld and Hodyss (2019), implying that the primary limitation to performance of an EnKF will be ensemble size rather than non-Gaussianity.
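The observing system can be sketched as follows. The offset of the first observed point (index 0 here) and the use of a random stand-in for the reference state are assumptions made only for illustration; the names are likewise illustrative.

```python
import numpy as np

d = 2624                        # state dimension, 41 * 64
obs_idx = np.arange(0, d, 4)    # every fourth grid point (starting index assumed)
obs_var = 0.5                   # observation error variance

def observe(x, rng):
    """Return noisy observations of the state at the observed points."""
    noise = np.sqrt(obs_var) * rng.standard_normal(obs_idx.size)
    return x[obs_idx] + noise

rng = np.random.default_rng(1)
x_ref = rng.standard_normal(d)  # stand-in for a reference model state
y = observe(x_ref, rng)
obs_per_mode = obs_idx.size // 41
```

With 2624 state variables there are 656 observations per cycle, which is 16 per large-scale mode, in agreement with the count quoted above.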

each method, a range of parameters is explored. For each parameter combination that is tested, at least 8 experiments are run. For each experiment a reference simulation is initialized from standard normal noise and run for 9 time units, by which time it has reached a statistical equilibrium. Observations are taken starting at time t = 9, every 0.2 time units (1 day) for 73 time units (one year), which corresponds to 365 assimilation cycles. The first 73 assimilation cycles of each experiment are considered a burn-in period, and are discarded when computing performance statistics.

At each assimilation cycle the performance of the filter is measured using the root mean square error (RMSE), defined as the 2-norm of the error between the reference simulation and the filter analysis mean. At each parameter value this procedure results in at least 8 × (365 − 73) = 2336 values of RMSE. The mean of these values is used to summarize the performance of the method for that specific combination of parameters. RMSE based on the forecast is available, but does not behave qualitatively differently from analysis RMSE and is therefore not shown.
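The bookkeeping behind the summary statistic can be sketched as below. The normalization of the error by the number of state variables is an assumption about the RMSE convention, and the function names are illustrative.

```python
import numpy as np

def rmse(x_ref, x_mean):
    """Root mean square error over state variables (normalization assumed)."""
    return np.sqrt(np.mean((x_ref - x_mean) ** 2))

def summarize(rmse_series, burn_in=73):
    """Discard burn-in cycles and average what is left.

    rmse_series has shape (n_experiments, n_cycles).
    Returns the number of values kept and their mean.
    """
    kept = np.asarray(rmse_series)[:, burn_in:]
    return kept.size, kept.mean()

# Eight experiments of 365 cycles each, filled with a placeholder value
n_values, mean_rmse = summarize(np.ones((8, 365)))
```

With 8 experiments of 365 cycles and a 73-cycle burn-in, `n_values` reproduces the count of 2336 quoted in the text.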

The following subsections detail the different assimilation methods to be compared.

4.1 Serial Ensemble Square Root Filter

The point of this investigation is to consider methods that improve on EnOI but that are less computationally costly than an EnKF. As such, it is useful to run an EnKF as a baseline for comparison. The baseline method used here is the serial ensemble square root assimilation of Whitaker and Hamill (2002), with Schur-product localization in observation space and multiplicative inflation. This method is referred to as ESRF in the results.
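For orientation, the update for a single scalar observation in a serial ensemble square root filter of the Whitaker and Hamill (2002) type can be sketched as follows. This is a minimal illustration under simplifying assumptions (the observation is a direct measurement of one state component, and localization weights are supplied by the caller); it is not the code used for the experiments, and all names are illustrative.

```python
import numpy as np

def esrf_single_obs(X, y, j, obs_var, loc=None):
    """Assimilate one scalar observation y of state component j.

    X is the (d, Ne) forecast ensemble; loc is an optional length-d vector
    of localization weights applied to the gain. Returns the analysis ensemble.
    """
    ne = X.shape[1]
    xm = X.mean(axis=1)
    Xp = X - xm[:, None]
    hx = Xp[j]                          # observation-space perturbations
    hph = hx @ hx / (ne - 1)            # H P H^T (a scalar)
    pht = Xp @ hx / (ne - 1)            # P H^T (a vector)
    if loc is not None:
        pht = loc * pht
    K = pht / (hph + obs_var)           # Kalman gain for the mean
    alpha = 1.0 / (1.0 + np.sqrt(obs_var / (hph + obs_var)))
    xm_a = xm + K * (y - xm[j])
    Xp_a = Xp - alpha * np.outer(K, hx)  # reduced gain for the perturbations
    return xm_a[:, None] + Xp_a

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 50))
Xa = esrf_single_obs(X, y=0.3, j=0, obs_var=0.5)
```

The factor `alpha` reduces the gain applied to the perturbations so that, without perturbing the observations, the analysis variance at the observed point matches the Kalman filter value. Observations are processed one at a time by calling the function repeatedly on its own output.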

The initial ensemble is constructed by initializing each ensemble member using an independent draw from a standard normal distribution, then forecasting each initial condition for 9 time units (45 days), by which time the members have reached the system attractor. The final condition of each simulation at $t = 9$ is used to initialize the ensemble, so the initial ensemble is completely independent of the reference simulation used to generate the observations. The localization function has the form

$$\ell_i = e^{-\frac{1}{2}\left(\frac{i}{L}\right)^2} \tag{5}$$

where $L$ is the localization radius. For reference, the large-scale Lorenz-’96 modes in this model are effectively 64 units apart, so to convert $L$ to a comparable localization radius for the standard Lorenz-’96 model it suffices to divide $L$ by 64. The multiplicative inflation is applied to the analysis ensemble, since El Gharamti et al. (2019) recently found that posterior inflation is more appropriate and more effective in situations without model error. Inflation is applied by multiplying the analysis ensemble perturbations by an inflation factor of $r \ge 1$.
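The Gaussian localization weights and the multiplicative inflation can be sketched in a few lines. Measuring distance periodically around the domain is an assumption made here for illustration, as are the function names.

```python
import numpy as np

def localization(dist, L):
    """Gaussian localization weight exp(-(dist / L)^2 / 2)."""
    return np.exp(-0.5 * (np.asarray(dist, dtype=float) / L) ** 2)

def periodic_distance(d, j):
    """Grid distance of every point from point j on a periodic domain of size d."""
    sep = np.abs(np.arange(d) - j)
    return np.minimum(sep, d - sep)

def inflate(X, r):
    """Multiply ensemble perturbations about the mean by the factor r."""
    xm = X.mean(axis=1, keepdims=True)
    return xm + r * (X - xm)

weights = localization(periodic_distance(2624, 0), L=320)
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
X_inflated = inflate(X, 1.02)
```

A radius of 320 grid units corresponds to 320/64 = 5 units of the standard Lorenz-’96 model, following the conversion described above.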

The three tunable parameters for the ESRF are the ensemble size $N_e$, the localization radius $L$, and the inflation factor $r$. Some limited exploration of ensemble size $N_e$ was performed. First a range of $L$ and $r$ were explored at $N_e = 100$. Then, a range was explored at $N_e = 200$. The optimal RMSE obtained at $N_e = 200$ was not significantly better than at $N_e = 100$, so all results reported here for all methods (including ESRF) use $N_e = 100$.


4.2 Ensemble OI

EnOI can also be considered as a baseline for comparison of the analog methods, but on the other side from the ESRF method. The EnOI used here is configured exactly the same as the ESRF, except that no inflation needs to be applied. A different ensemble of perturbations is drawn randomly for each experiment from a catalog of 41,000 model states (once drawn, the ensemble perturbations remain time-independent for all assimilation cycles within a single experiment). This catalog is different from the one used to train the VAE, but is constructed in the same way. The climatological spread represented by this ensemble is too large to be an accurate representation of the forecast error, so the ensemble of perturbations is scaled to a pre-defined forecast spread, which forms the second tunable parameter (together with localization radius) for the EnOI method.
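Drawing a static ensemble from a catalog and rescaling it to a prescribed spread can be sketched as follows. The definition of spread used here (square root of the ensemble variance averaged over state variables) is an assumption, the random catalog is only a small stand-in for real model states, and the names are illustrative.

```python
import numpy as np

def scaled_perturbations(members, target_spread):
    """Centre an ensemble and rescale it to a prescribed spread.

    members has shape (d, Ne). Spread is taken to be the square root of the
    ensemble variance averaged over state variables (an assumed definition).
    """
    Xp = members - members.mean(axis=1, keepdims=True)
    spread = np.sqrt(np.mean(Xp.var(axis=1, ddof=1)))
    return Xp * (target_spread / spread)

rng = np.random.default_rng(0)
catalog = 3.0 * rng.standard_normal((500, 64))        # stand-in catalog (n_states, d)
draw = rng.choice(catalog.shape[0], size=100, replace=False)
Xp = scaled_perturbations(catalog[draw].T, target_spread=0.6)
new_spread = np.sqrt(np.mean(Xp.var(axis=1, ddof=1)))
```

Rescaling changes only the overall amplitude of the perturbations; the correlation structure inherited from the catalog is untouched.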

4.3 Analog Ensemble OI

The analog ensemble OI (AnEnOI) method is exactly the same as the EnOI method except for the following: At each assimilation cycle the ensemble is chosen to be the $N_e = 100$ members of the catalog that are closest to the current forecast. The impact of the size of the catalog is briefly explored by performing experiments using (i) a catalog of only 1,000 members, and (ii) the full catalog of 41,000 members. Results reported below are for the smaller catalog, unless noted otherwise. The similarity of analogs to the forecast is defined using the 2-norm; the impact of using other, more dynamically motivated measures of similarity is not explored.
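Selecting the closest catalog members in the 2-norm amounts to a nearest-neighbour search, which for a small catalog can be done by brute force. The sketch below uses a random stand-in catalog of reduced dimension; the function name and test data are illustrative assumptions.

```python
import numpy as np

def find_analogs(catalog, forecast, n_analogs=100):
    """Return the n_analogs catalog states closest to forecast in the 2-norm.

    catalog has shape (n_states, d); the result has shape (n_analogs, d),
    ordered from closest to farthest.
    """
    dist = np.linalg.norm(catalog - forecast, axis=1)
    nearest = np.argpartition(dist, n_analogs - 1)[:n_analogs]
    return catalog[nearest[np.argsort(dist[nearest])]]

rng = np.random.default_rng(0)
catalog = rng.standard_normal((1000, 20))              # stand-in catalog
forecast = catalog[17] + 1e-3 * rng.standard_normal(20)
analogs = find_analogs(catalog, forecast)
```

`np.argpartition` avoids a full sort of the catalog distances; for very large catalogs of large states, more specialised search methods would be needed.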

4.4 Constructed Analog Ensemble OI

The constructed analog ensemble OI (cAnEnOI) is exactly the same as the AnEnOI except for the construction of the analogs. To construct analogs, the forecast mean is first encoded using the encoder $e(x)$. Recall that the encoder produces two vectors, $\mu$ and $\sigma$, and during training the encoded state is $z = \mu + \sigma \circ \epsilon$ where $\epsilon$ is a standard normal random variable. For the purposes of constructing analogs, an ensemble in latent space is constructed as follows:

$$z_i = \mu + r_z \epsilon_i, \quad i = 1, \ldots, N_e \tag{6}$$

where $\epsilon_i$ are independent draws from a standard normal distribution and $r_z$ is a tunable parameter governing the spread of the ensemble in the latent space. The analog ensemble is then constructed using the decoder as $x_i = d(z_i)$.
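The construction of analogs from a latent ensemble can be sketched as below. The decoder here is a hypothetical smooth stand-in (a fixed random linear map followed by tanh), not the trained VAE decoder, and the dimensions are reduced for illustration; all names are assumptions.

```python
import numpy as np

def construct_analogs(mu, decoder, r_z, n_ens, rng):
    """Decode a latent ensemble z_i = mu + r_z * eps_i with eps_i ~ N(0, I).

    Returns an array of shape (d, n_ens) whose columns are the analogs.
    """
    eps = rng.standard_normal((n_ens, mu.size))
    Z = mu + r_z * eps
    return np.stack([decoder(z) for z in Z], axis=1)

rng = np.random.default_rng(3)
l, d, n_ens = 8, 32, 100
W = rng.standard_normal((d, l)) / np.sqrt(l)

def toy_decoder(z):
    """Hypothetical stand-in for the trained decoder d(z)."""
    return np.tanh(W @ z)

mu = np.zeros(l)                  # stand-in for the encoded forecast mean
X = construct_analogs(mu, toy_decoder, r_z=0.7, n_ens=n_ens, rng=rng)
```

Note that only the mean part of the encoder output is used; the encoder's own spread vector is replaced by the single scalar spread parameter.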

As noted above, the decoder is a continuously-differentiable function. The ensemble in latent space is Gaussian, so for small enough $r_z$ the analog ensemble will also be approximately Gaussian distributed, with a covariance matrix approximately

$$r_z^2 D D^T \tag{7}$$

Figure 3: Mean RMSE for the four methods, as a function of the governing parameters. The experimental results are shown as red dots; values in between are interpolated. Values for cAnEnOI are at $r_z = 0.7$. The colorbar is the same for all plots. Note that all methods include localization radius as a tunable parameter, but ESRF has inflation as a tunable parameter while the other methods have forecast spread. The axis limits on each panel are different.

matrix. Of course the analog ensemble covariance matrix will also have rank less than or equal to $N_e - 1$.

For small $r_z$ the correlation structure depends only on the forecast mean, and not on $r_z$. For larger $r_z$ the nonlinearity of the decoder comes into play with two important consequences. First, the analog ensemble becomes increasingly non-Gaussian, which allows the rank of the covariance matrix to exceed the dimension of the latent space (though the ensemble covariance matrix still must have rank bounded by $N_e - 1$). Second, the correlation structure of the analog ensemble begins to depend on $r_z$ as well as on the forecast mean.

It is desirable to decouple the forecast spread of the analog ensemble from the correlation structure of the analog ensemble covariance matrix. This can be achieved by first constructing the analog ensemble as described above, and then rescaling the ensemble perturbations to have the desired spread. As a result, the cAnEnOI method has three main tunable parameters: (i) localization radius, (ii) rz which controls the

5 Data Assimilation: Results

Figure 3 shows analysis RMSE as a function of tunable parameters for the four methods (ESRF, EnOI, AnEnOI, and cAnEnOI). Results for the cAnEnOI method are shown for $r_z = 0.7$; the dependence on $r_z$ is discussed below. The ESRF has a fairly broad well in parameter space where the analysis RMSE is around 1.5. The optimal observed RMSE is 1.35, which occurs at inflation factor $r = 1.02$ and localization radius $L = 320$. This is about a factor of two larger than the raw observation error $1/\sqrt{2}$, which is not bad given that only one quarter of the state variables are observed. At localization radii smaller than 192 or larger than 384 the performance begins to degrade. For larger localization radii the ESRF performance becomes erratic, being limited by the deleterious effects of rare spurious long-range correlations: some experiments perform well, while others diverge. For smaller localization radii the ESRF performance also degrades, for reasons that are not entirely clear and appear to be related to dynamical imbalance of the analysis.

The EnOI (with a catalog of 1,000 model states) has an optimal analysis RMSE of 2.27, which occurs at a localization radius of 32 and a forecast spread of 0.6. Though significantly worse than ESRF, the EnOI still produces reasonably-accurate analyses; for comparison, a random draw from the climatological distribution would produce an RMSE of 4.97. The optimal localization radius for EnOI is a factor of 10 smaller than for ESRF. This is presumably because the correlations encoded in the ESRF ensemble are far more meaningful (i.e. representative of forecast error correlations) at long range than the climatological correlations associated with the EnOI ensemble.

The use of analogs significantly improves the EnOI method: the optimal analysis RMSE for AnEnOI is 2.01, which occurs at a localization radius of 16 and a forecast spread of 0.6. It is not clear why the optimal localization radius decreases, but on the other hand as seen in Fig. 3 the analysis RMSE of AnEnOI is not too strongly sensitive to changes in localization radius or forecast spread. This is very encouraging, since a catalog of only 1,000 states would presumably be far too small to produce accurate analog forecasts for this system. Increasing the catalog size to 41,000 further improves the analysis RMSE to 1.90: a very modest improvement for a very large increase in catalog size. To put a positive spin on this, it suggests that the bulk of the benefits that can be obtained by moving from EnOI to AnEnOI do not require an unrealistically large catalog.

The real success comes from using constructed analogs. The optimal analysis RMSE obtained using cAnEnOI at $r_z = 0.6$ is 1.30: slightly better than obtained using ESRF! The optimal localization radius and forecast spread are 40 and 0.7, respectively, but as shown in Fig. 3, the performance of cAnEnOI is not strongly sensitive to changes in these parameters: performance comparable to ESRF can be obtained over a wide range of localization radii and forecast spread.

Figure 4 shows the cAnEnOI analysis RMSE as a function of latent space spread rz

Figure 4: Mean RMSE for the cAnEnOI method, as a function of latent space spread $r_z$ and forecast spread (left panel), and as a function of $r_z$ and localization radius $L$ (right panel). The experimental results are shown as red dots; values in between are interpolated. The colorbar is the same for both panels, and is the same as in Fig. 3.

for $r_z$ between 0.05 and 0.2, cAnEnOI produces RMSE of 1.38 (at optimal values of forecast spread and localization radius), which is comparable to ESRF. As $r_z$ increases the performance improves, with excellent results in the range $0.2 \le r_z \le 1$. As $r_z$ increases further the performance slowly degrades, but even at $r_z = 2$ the performance is better than the optimal results using the AnEnOI method with ‘found’ analogs.

6 Conclusions

This work introduces a new use for analogs, besides forecasting and downscaling: to construct an ensemble background covariance matrix for use in data assimilation, as in the EnOI or EnVar frameworks. The research was carried out in the context of a multiscale Lorenz-‘96 model invented by Grooms and Lee (2015). Two methods were formulated: one based on finding analogs within a catalog of historical states (AnEnOI), the other based on constructing analogs using a variational autoencoder (VAE; Kingma and Welling, 2019) trained on a catalog of historical states (cAnEnOI). It was found that AnEnOI outperforms a basic EnOI method even with a relatively small catalog of 1,000 members, and further improvements were marginal when the catalog size was increased to 41,000. The cAnEnOI method was able to perform as well as an optimized ensemble square root filter (ESRF), and was quite robust to variations in the tuning parameters of the method. Several alternate methods exist for constructing analogs (Abatzoglou and Brown, 2012; Hidalgo et al., 2008; Maurer et al., 2010; Pierce et al., 2014); these could also be used in a cAnEnOI method.

more difficult and may be practically impossible. Fortunately, given the long history of analogs, there is already research on efficient ways to find analogs within a large catalog of large model states; see, e.g., Raoult et al. (2018) and Yang and Alessandrini (2019). Since the method using constructed analogs is far more successful, the second difficulty of real geophysical models is more pertinent. To overcome this limitation it is suggested to use a local analysis in the vein of the Local Ensemble Kalman Filter (LEnKF; Brusdal et al., 2003; Evensen, 2003; Ott et al., 2004). This framework uses many local ensembles: for each model grid point a local ensemble analysis is performed using observations near that grid point. The cAnEnOI method developed here could easily be used in this local framework: For each grid point an analog ensemble is constructed for use in the local assimilation. The benefit of such a local analysis is that the VAE would only have to be trained to generate local subsets of the model state, rather than, e.g., the full state of a global coupled climate model. A similar localization procedure could be leveraged in the case of ‘found’ analogs rather than constructed ones.

Overall, the results are quite promising. EnOI is a widely used method (Backeberg et al., 2014; Deng et al., 2018; Mignac et al., 2015; Wu et al., 2018; Xie et al., 2011) because of its acceptable performance and significantly reduced cost compared to EnKF, and ensemble background covariances are widely used in EnVar and hybrid data assimilation methods (Bannister, 2017). The results here suggest that improvements could be obtained using either found analogs or constructed analogs; the increased cost of using analogs will be situation-dependent, but if the costs can be made lower than the cost of forecasting an ensemble, then the analog EnOI or EnVar methods may be an attractive alternative.

Acknowledgments

The author is grateful to Jeff Anderson for a discussion on the history of analog weather forecasting. This work used the Extreme Science and Engineering Discovery Environment (XSEDE; Towns et al., 2014) Bridges (Nystrom et al., 2015) at the Pittsburgh Supercomputing Center through allocation TG-DMS190025. This work was funded by the US National Science Foundation under grant number DMS 1821074.

References

J. T. Abatzoglou and T. J. Brown. A comparison of statistical downscaling methods suited for wildfire applications. International Journal of Climatology, 32(5):772– 780, 2012.

B. C. Backeberg, F. Counillon, J. A. Johannessen, and M.-I. Pujol. Assimilating along-track sla data using the enoi in an eddy resolving model of the agulhas system. Ocean Dynamics, 64(8):1121–1136, 2014.

K. Brusdal, J.-M. Brankart, G. Halberstadt, G. Evensen, P. Brasseur, P. J. van Leeuwen, E. Dombrowsky, and J. Verron. A demonstration of ensemble-based assimilation methods with a layered ogcm from the perspective of operational ocean forecasting systems. Journal of Marine Systems, 40:253–289, 2003.

G. Burgers, P. Jan van Leeuwen, and G. Evensen. Analysis scheme in the ensemble kalman filter. Mon. Wea. Rev., 126(6):1719–1724, 1998.

L. Delle Monache, F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight. Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141(10):3498–3516, 2013.

Z. Deng, J. Liu, X. Qiu, X. Zhou, and H. Zhu. Downscaling RCP8.5 daily temperatures and precipitation in Ontario using localized ensemble optimal interpolation (EnOI) and bias correction. Climate Dynamics, 51(1-2):411–431, 2018.

F. A. Eckel and L. Delle Monache. A hybrid nwp–analog ensemble. Mon. Wea. Rev., 144(3):897–911, 2016.

M. El Gharamti, K. Raeder, J. Anderson, and X. Wang. Comparing adaptive prior and posterior inflation for ensemble filters using an atmospheric general circulation model. Mon. Wea. Rev., 147(7):2535–2553, 2019.

G. Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Oceans, 99 (C5):10143–10162, 1994.

G. Evensen. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean dynamics, 53(4):343–367, 2003.

I. Grooms and Y. Lee. A framework for variational data assimilation with superparameterization. Nonlinear Proc. Geoph., 22(5):601–611, 2015.

H. Hidalgo, M. Dettinger, and D. Cayan. Downscaling with constructed analogues: Daily precipitation and temperature fields over the United States, 2008. California Energy Commission, PIER Energy-Related Environmental Research. CEC-500-2007-123.

C. F. Higham and D. J. Higham. Deep learning: An introduction for applied mathematicians. SIAM Review, 61(4):860–891, 2019.

P. L. Houtekamer and H. L. Mitchell. Data assimilation using an ensemble kalman filter technique. Mon. Wea. Rev., 126(3):796–811, 1998.

D. Kingma and M. Welling. An introduction to variational autoencoders. Foundations and Trends in Machine Learning, 12(4):307–392, 2019. doi: 10.1561/2200000056.

R. Lguensat, P. Tandeo, P. Ailliot, M. Pulido, and R. Fablet. The analog data

R. Lguensat, P. H. Viet, M. Sun, G. Chen, T. Fenglin, B. Chapron, and R. Fablet. Data-driven interpolation of sea level anomalies using analog data assimilation. Remote Sensing, 11(7):858, 2019.

A. C. Lorenc. Recommended nomenclature for EnVar data assimilation methods. Research Activities in Atmospheric and Oceanic Modeling, 5, 2013.

E. Lorenz. Predictability: A problem partly solved. In Proceedings of Seminar on Predictability, volume 1, pages 1–18. ECMWF, Reading, UK, 1996.

E. Lorenz. Predictability: A problem partly solved. In T. Palmer and R. Hagedorn, editors, Predictability of Weather and Climate, pages 40–58. Cambridge University Press, 2006.

E. N. Lorenz. Atmospheric predictability as revealed by naturally occurring analogues. J. Atmos. Sci., 26(4):636–646, 1969.

E. N. Lorenz and K. A. Emanuel. Optimal sites for supplementary weather observations: Simulation with a small model. J. Atmos. Sci., 55(3):399–414, 1998.

E. P. Maurer, H. G. Hidalgo, T. Das, M. D. Dettinger, and D. R. Cayan. The utility of daily large-scale climate data in the assessment of climate change impacts on daily streamflow in california. Hydrology and Earth System Sciences, 14(6):1125–1138, 2010. doi: 10.5194/hess-14-1125-2010. URL https://www.hydrol-earth-syst-sci.net/14/1125/2010/.

S. Metref, E. Cosme, C. Snyder, and P. Brasseur. A non-gaussian analysis scheme using rank histograms for ensemble data assimilation. Nonlinear Proc. Geoph., 21: 869–885, 2014.

D. Mignac, C. Tanajura, A. Santana, L. Lima, and J. Xie. Argo data assimilation into hycom with an enoi method in the atlantic ocean. Ocean Science, 11(1), 2015.

M. Morzfeld and D. Hodyss. Gaussian approximations in filters and smoothers for data assimilation. Tellus A, 71(1):1–27, 2019.

N. A. Nystrom, M. J. Levine, R. Z. Roskies, and J. R. Scott. Bridges: A uniquely flexible hpc resource for new communities and data analytics. In Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, XSEDE ’15, pages 30:1–30:8, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3720-5. doi: 10.1145/2792745.2792775. URL http://doi.acm.org/10.1145/2792745.2792775.

P. R. Oke, J. S. Allen, R. N. Miller, G. D. Egbert, and P. M. Kosro. Assimilation of surface velocity data into a primitive equation coastal ocean model. J. Geophys. Res. Oceans, 107(C9):5–1, 2002.

D. W. Pierce, D. R. Cayan, and B. L. Thrasher. Statistical downscaling using localized constructed analogs (loca). Journal of Hydrometeorology, 15(6):2558–2585, 2014.

B. Raoult, G. Di Fatta, F. Pappenberger, and B. Lawrence. Fast retrieval of weather analogues in a multi-petabytes archive using wavelet-based fingerprints. In International Conference on Computational Science, pages 697–710. Springer, 2018.

O. Talagrand. Variational assimilation. In W. Lahoz, B. Khattatov, and R. Menard, editors, Data Assimilation Making Sense of Observations, pages 41–67. Springer, 2010.

J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. Peterson, R. Roskies, J. Scott, and N. Wilkins-Diehr. XSEDE: Accelerating Scientific Discovery. 16:62–74, 2014.

H. Van den Dool. Searching for analogues, how long must we wait? Tellus A, 46(3): 314–324, 1994.

J. S. Whitaker and T. M. Hamill. Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130(7):1913–1924, 2002.

B. Wu, T. Zhou, and F. Zheng. EnOI-IAU Initialization Scheme Designed for Decadal Climate Prediction System IAP-DecPreS. Journal of Advances in Modeling Earth Systems, 10(2):342–356, 2018.

J. Xie, F. Counillon, J. Zhu, L. Bertino, and A. Schiller. An eddy resolving tidal-driven model of the South China Sea assimilating along-track SLA data using the EnOI. Ocean Science, 7(5), 2011.

D. Yang and S. Alessandrini. An ultra-fast way of searching weather analogs for renewable energy forecasting. Solar Energy, 185:255–261, 2019.