(1)

Algorithms for inverse big data problems

Serge Gratton University of Toulouse

CERFACS-IRIT

Joint work with E. Bergou, S. Gurol, E. Simon, D. Titley-Peloquin, Ph.L. Toint, J. Tshimanga, L.N. Vicente, Z. Zhang

(2)

A dynamical system is characterized by state variables, e.g.
velocity components
pressure
density
temperature
gravitational potential

Goal: predict the state of the system at a future time from
a dynamical integration model
observational data

Applications: climate, meteorology, oceanography, neutronics, finance, ...

(3)

A dynamical integration model predicts the state of the system given the state at an earlier time.

→ integrating may lead to very large prediction errors

→ (inexact physics, discretization errors, approximated parameters)

Observational data are used to improve the accuracy of the forecasts.

→ but the data are inaccurate (measurement noise, under-sampling)

→ $10^7$ observations ($10^9$ variables) processed every day: inverse big data

(6)

[Figure: schematic of the assimilation window, axis ticks from −2 to 1]

Known situation 2 days ago and background prediction

Record data for the past 2 days

Minimize the difference between the model and observations

(10)

Notations and toy example | Subspace and model reduction | Weak constraints | Conclusion

Modeling and toy example
Dual iterative solvers
Using subspace ideas, subspace reduction
Ensemble methods with a stochastic dynamical system
Variational methods with a stochastic dynamical system
Summary, caveat and ongoing related work

(11)

Modeling and toy example

(12)

A vibrating string

We consider a vibrating string, held fixed at both ends.

It is released with zero initial speed from an unknown position. The string remains in the vertical plane.

The string is observed with a set of physical devices measuring the string position at regularly spaced points during a given time span.

We would like to make a forecast of the string position outside the observation time span.

[Figure: string position $u(x)$ against $x$, with the observations]

(13)

A vibrating string : the model

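The string position $u(x)$ is the solution of the partial differential equation

$$\begin{cases} \dfrac{\partial^2}{\partial t^2}u(x,t) - \dfrac{\partial^2}{\partial x^2}u(x,t) = 0 & \text{in } ]0,1[\,\times\,]0,T[ \\ u(0,t) = u(1,t) = 0, & t \in\, ]0,T[ \\ u(x,0) = u_0(x),\quad \dfrac{\partial}{\partial t}u(x,0) = 0, & x \in\, ]0,1[ \end{cases}$$

Under regularity assumptions on $u_0$, this system has a unique solution.

We suppose that the system is observed at times $t_n$, $n \ge 0$.

The problem reads

$$\min_{u_0} \sum_{n=0}^{N_{obt}} \|u(:,t_n) - y_n\|^2$$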

This is an infinite dimensional linear least squares problem, that has to

(14)

The observations

We now consider that the string is observed regularly in time and space. No noise, more observations than unknowns.

The discretized version of the linear least-squares problem $\min_{u_0} \sum_{n=0}^{N_{obt}} \|u(:,t_n) - y_n\|^2$ is solved with a conjugate gradient technique.

→ test('over')

(15)

Realistic difficult case

In practice, observing a 3D field at all space points is out of reach

→ test('under')

The observations are noisy, which introduces high frequencies in the analysis

→ test('noisy')

Both effects (always) come together

(16)

Exploiting "a priori" information

We do not consider the previous solution acceptable, because we doubt a string might take such positions. We expect the solution to be smooth enough.

We would like to introduce the fact that the string position should not vary too much when considering points that are close in the physical space:

purely algebraic approach, e.g. $\min_{u_0} \sum_{j=0}^{N_{obx}-1} |u_{0,j} - u_{0,j+1}|^2 + \sum_{n=0}^{N_{obt}} \|u(:,t_n) - y_n\|^2$ s.t. discretized dynamics

using a pseudo-physical smoothing process

(17)

Smoothing in the discretized space with the heat equation

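We consider the discretized heat equation

$$\begin{cases} \dfrac{\partial}{\partial t}u(x,t) - \dfrac{\partial^2}{\partial x^2}u(x,t) = 0 & \text{in } ]0,1[\,\times\,]0,T[ \\ u(0,t) = u(1,t) = 0, & t \in\, ]0,T[ \\ u(x,0) = u_0(x), & x \in\, ]0,1[ \end{cases}$$

For a given $T$, $u(\cdot,T)$ is smoother than $u_0$, because high frequency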
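The smoothing effect of the heat equation can be illustrated with a minimal sketch. It assumes an explicit finite-difference scheme with homogeneous Dirichlet boundaries; the function names `heat_smooth` and `roughness`, the ratio `r = dt/dx**2` and the zigzag initial vector are illustrative choices, not taken from the talk.

```python
def heat_smooth(u0, n_steps, r=0.25):
    """Apply explicit heat-equation steps; r = dt/dx**2 must stay <= 0.5."""
    u = list(u0)
    for _ in range(n_steps):
        new = u[:]
        for j in range(1, len(u) - 1):
            # Discrete Laplacian update at interior points.
            new[j] = u[j] + r * (u[j + 1] - 2.0 * u[j] + u[j - 1])
        new[0] = new[-1] = 0.0  # homogeneous Dirichlet boundary conditions
        u = new
    return u


def roughness(u):
    """Sum of squared differences between neighbouring points."""
    return sum((b - a) ** 2 for a, b in zip(u, u[1:]))


# A zigzag (high-frequency) initial position, zero at both ends.
u0 = [0.0, 1.0, -1.0, 1.0, -1.0, 1.0, 0.0]
u_smooth = heat_smooth(u0, n_steps=5)
```

Comparing `roughness(u0)` with `roughness(u_smooth)` shows how quickly the oscillating component is damped.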

(18)
(19)

Solve a large-scale non-linear weighted least-squares problem:

$$\min_{x\in\mathbb{R}^n} \frac{1}{2}\|x - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{j=0}^{N} \|H_j(M_j(x)) - y_j\|^2_{R_j^{-1}}$$

where

$x \equiv x(t_0)$ is the control variable

$M_j$ are model operators: $x(t_j) = M_j(x(t_0))$

$H_j$ are observation operators: $y_j \approx H_j(x(t_j))$

the observations $y_j$ and the background $x_b$ are noisy

(20)

Back to the realistic difficult case

Underdetermined case

→ test('under-reg')

Noisy case

→ test('noisy-reg')

Underdetermined and noisy case

(21)

Issues on background regularization

Background matrix mat-vec in CG: another differential equation has to be solved

In case of modelling, when a direct solution is not applicable, an inner-outer iteration scheme has to be controlled

Determining a reasonable background matrix: based on physical considerations, possibly on statistics over past assimilation periods

Introduction of balanced relations in the background: when variables are related to each other by relations that are not accounted for in the model and not properly observed, an additional (weak) penalty term is added

(22)

$$\min_{x\in\mathbb{R}^n} \frac{1}{2}\|x - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{j=0}^{N} \|H_j(M_j(x)) - y_j\|^2_{R_j^{-1}}$$

Algorithmic considerations:

very large problem size: $m \sim 10^7$ observations, $n \sim 10^8$ unknowns

iterative methods for nonlinear problems:

primal/dual formulations? [S. Gratton and J. Tshimanga (2009); S. Gratton, P. Toint, and J. Tshimanga (2011)]

preconditioning? [S. Gratton, A. Sartenaer, J. Tshimanga, A.T. Weaver (2008); S. Gratton, P. Laloyaux, A. Sartenaer, and J. Tshimanga (2001)]

use of subspace information?

handling stochastic dynamical systems?

(23)

Dual iterative solvers

(24)

Solve a large-scale non-linear weighted least-squares problem:

$$\min_{x\in\mathbb{R}^n} \frac{1}{2}\|x - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{j=0}^{N} \|H_j(M_j(x)) - y_j\|^2_{R_j^{-1}}$$

Typically solved using a truncated Gauss-Newton algorithm (known as 4D-Var in the DA community)

→ linearize $H_j(M_j(x^{(k)} + \delta x^{(k)})) \approx H_j(M_j(x^{(k)})) + H_j^{(k)}\,\delta x^{(k)}$

→ solve the linearized subproblem

$$\min_{\delta x^{(k)}\in\mathbb{R}^n} \frac{1}{2}\|\delta x^{(k)} - (x_b - x^{(k)})\|^2_{B^{-1}} + \frac{1}{2}\|H^{(k)}\delta x^{(k)} - d^{(k)}\|^2_{R^{-1}}$$

→ update $x^{(k+1)} = x^{(k)} + \delta x^{(k)}$
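The linearize / solve / update loop above can be sketched on a scalar toy problem. This is a minimal illustration under stated assumptions: a single state variable, a hypothetical observation operator $h(x) = x^2$ (with the model taken as the identity), and made-up values for the background, the observation and the variances. Nothing here comes from the systems discussed in the talk.

```python
def gauss_newton(x0, xb, y, b_var, r_var, h, h_prime, n_iter=20):
    """Scalar Gauss-Newton for J(x) = (x-xb)^2/(2B) + (h(x)-y)^2/(2R)."""
    x = x0
    for _ in range(n_iter):
        H = h_prime(x)                 # linearized observation operator
        d = y - h(x)                   # misfit d = y - h(x)
        hess = 1.0 / b_var + H * H / r_var      # B^-1 + H^T R^-1 H
        rhs = (xb - x) / b_var + H * d / r_var  # minus the gradient of J
        x += rhs / hess                # update x <- x + delta_x
    return x


def gradient(x, xb, y, b_var, r_var, h, h_prime):
    """Exact gradient of the nonlinear cost, used only to check convergence."""
    return (x - xb) / b_var - h_prime(x) * (y - h(x)) / r_var


# Hypothetical toy data: background 1.0, observation of x^2 equal to 4.0.
h = lambda x: x * x
h_prime = lambda x: 2.0 * x
x_hat = gauss_newton(1.0, 1.0, 4.0, 1.0, 0.1, h, h_prime)
grad_at_x_hat = gradient(x_hat, 1.0, 4.0, 1.0, 0.1, h, h_prime)
```

With these values the analysis `x_hat` lies between the background (1.0) and the value suggested by the observation alone (2.0), much closer to the latter because the observation variance is small.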

(25)

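Defining the constraint as $\varepsilon = H\,\delta x - d$, we can write the Lagrangian function for the quadratic problem as

$$\mathcal{L}(\delta x, \varepsilon, \lambda) = \frac{1}{2}\|x + \delta x - x_b\|^2_{B^{-1}} + \frac{1}{2}\|\varepsilon\|^2_{R^{-1}} + \lambda^T(\varepsilon - H\,\delta x + d)$$

From the Karush-Kuhn-Tucker conditions, the following stationarity conditions are satisfied at any optimum:

$$\nabla_{\delta x}\mathcal{L}(\delta x, \varepsilon, \lambda) = B^{-1}(x + \delta x - x_b) - H^T\lambda = 0$$

$$\nabla_{\varepsilon}\mathcal{L}(\delta x, \varepsilon, \lambda) = R^{-1}\varepsilon + \lambda = 0$$

$$\nabla_{\lambda}\mathcal{L}(\delta x, \varepsilon, \lambda) = \varepsilon - H\,\delta x + d = 0$$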

(26)

Exploiting the structure: Dual Approach

The exact solution is either

$$x_b - x_k + B H_k^T (H_k B H_k^T + R)^{-1} (d_k - H_k(x_b - x_k))$$

(Lagrange multiplier: requires solving a linear system iteratively in $\mathbb{R}^m$)

or, from the normal equations,

$$x_b - x_k + (B^{-1} + H_k^T R^{-1} H_k)^{-1} H_k^T R^{-1} (d_k - H_k(x_b - x_k))$$

(linear system solved iteratively in $\mathbb{R}^n$)

If $m \ll n$, then performing the minimization in $\mathbb{R}^m$ can reduce both memory and computational cost.

Nonlinear problem: we want to solve iteratively and truncate after a few steps in a trust-region truncated CG solver. Does it hurt to use this dual approach?
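As a sanity check of the two closed-form expressions, the sketch below evaluates both on a tiny problem with $n = 2$ and $m = 1$ and compares the results when the systems are solved exactly. All numbers, the diagonal choice of $B$ and the variable names are illustrative assumptions.

```python
# Hypothetical toy data: n = 2 state variables, m = 1 observation.
b_diag = [2.0, 0.5]     # diagonal of B
h = [1.0, 3.0]          # the single row of H
r = 0.1                 # R (a scalar since m = 1)
c = [0.3, -0.2]         # x_b - x_k
d = 1.5                 # d_k

innovation = d - sum(hi * ci for hi, ci in zip(h, c))

# Dual formula: the system (H B H^T + R) lives in R^m, here a scalar.
s = sum(hi * hi * bi for hi, bi in zip(h, b_diag)) + r
dx_dual = [ci + bi * hi * innovation / s for ci, bi, hi in zip(c, b_diag, h)]

# Primal formula: the system (B^-1 + H^T R^-1 H) lives in R^n, here 2-by-2.
a11 = 1.0 / b_diag[0] + h[0] * h[0] / r
a12 = h[0] * h[1] / r
a22 = 1.0 / b_diag[1] + h[1] * h[1] / r
rhs = [h[0] * innovation / r, h[1] * innovation / r]
det = a11 * a22 - a12 * a12
z = [(a22 * rhs[0] - a12 * rhs[1]) / det,   # Cramer's rule
     (a11 * rhs[1] - a12 * rhs[0]) / det]
dx_primal = [ci + zi for ci, zi in zip(c, z)]
```

The two increments agree up to rounding error here; the agreement is only guaranteed for exact solves, which is why truncating the iterative solver raises the question asked on the slide.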

(27)

Dual/Primal approach compared

Cost function: $J(\delta x_k) = \frac{1}{2}\|\delta x_k - x_b + x_k\|^2_{B^{-1}} + \frac{1}{2}\|H_k\,\delta x_k - d_k\|^2_{R^{-1}}$

[Figure: primal (blue) and dual (red) curves]

The relation between the primal and dual formulas is lost when an iterative solver is truncated prematurely.

This is bad news for the nonlinear iterations.

(28)

Preconditioned CG algorithm

For $i = 0, 1, \dots$

1 $q_i = (B^{-1} + H^T R^{-1} H)\,p_i$

2 $\alpha_i = \langle r_i, z_i\rangle / \langle q_i, p_i\rangle$ (compute the step length)

3 $x_{i+1} = x_i + \alpha_i p_i$ (update the iterate)

4 $r_{i+1} = r_i - \alpha_i q_i$ (update the residual)

5 $z_{i+1} = F r_{i+1}$ (update the preconditioned residual)

6 $\beta_i = \langle r_{i+1}, z_{i+1}\rangle / \langle r_i, z_i\rangle$ (ensure A-conjugate directions)

7 $p_{i+1} = z_{i+1} + \beta_i p_i$ (update the descent direction)
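The seven steps can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the talk: the function names `pcg`, `matvec` and `precond` are hypothetical, a small symmetric positive definite matrix stands in for $B^{-1} + H^T R^{-1} H$, and a Jacobi (diagonal) preconditioner stands in for $F$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, precond, x0, tol=1e-12, max_iter=100):
    """Preconditioned conjugate gradient for A x = b with A s.p.d."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]   # residual b - A x
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        q = matvec(p)                                # step 1
        alpha = rz / dot(q, p)                       # step 2
        x = [xi + alpha * pi for xi, pi in zip(x, p)]    # step 3
        r = [ri - alpha * qi for ri, qi in zip(r, q)]    # step 4
        z = precond(r)                               # step 5
        rz_new = dot(r, z)
        beta = rz_new / rz                           # step 6
        p = [zi + beta * pi for zi, pi in zip(z, p)]     # step 7
        rz = rz_new
    return x


# Hypothetical 3-by-3 s.p.d. system with known solution [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
rhs = [6.0, 10.0, 8.0]
matvec = lambda v: [dot(row, v) for row in A]
precond = lambda v: [vi / A[i][i] for i, vi in enumerate(v)]  # Jacobi
solution = pcg(matvec, rhs, precond, [0.0, 0.0, 0.0])
```

Passing the operator as a function mirrors the large-scale setting, where the matrix is never formed and only products with it are available.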

(29)

Initialization steps: given $v_0$; $r_0 = (H^T R^{-1} H + B^{-1})\,v_0 - b$, ...

Loop: WHILE

1 $H^T \hat q_{i-1} = H^T (R^{-1} H B H^T + I_m)\,\hat p_{i-1}$

2 $\alpha_{i-1} = r_{i-1}^T z_{i-1} / \hat q_{i-1}^T \hat p_{i-1}$

3 $B H^T \hat v_i = B H^T (\hat v_{i-1} + \alpha_{i-1}\hat p_{i-1})$

4 $H^T \hat r_i = H^T (\hat r_{i-1} + \alpha_{i-1}\hat q_{i-1})$

5 $B H^T \hat z_i = F H^T \hat r_i = B H^T G\,\hat r_i$ (using $F H^T = B H^T G$)

6 $\beta_i = r_i^T z_i / r_{i-1}^T z_{i-1}$

7 $B H^T \hat p_i = B H^T (-\hat z_i + \beta_i \hat p_{i-1})$

(30)

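Initialization: $\lambda_0 = 0$, $\hat r_0 = R^{-1}(d - H(x_b - x))$, $\hat z_0 = G\hat r_0$, $\hat p_1 = \hat z_0$, $k = 1$

Loop on $k$

1 $\hat q_i = \hat A\,\hat p_i$

2 $\alpha_i = \langle \hat r_{i-1}, \hat z_{i-1}\rangle_M / \langle \hat q_i, \hat p_i\rangle_M$

3 $\lambda_i = \lambda_{i-1} + \alpha_i \hat p_i$

4 $\hat r_i = \hat r_{i-1} - \alpha_i \hat q_i$

5 $\beta_i = \langle \hat r_{i-1}, \hat z_{i-1}\rangle_M / \langle \hat r_{i-2}, \hat z_{i-2}\rangle_M$

6 $\hat z_i = G\,\hat r_i$

7 $\hat p_i = \hat z_{i-1} + \beta_i \hat p_{i-1}$

$\hat A = R^{-1} H B H^T + I_m$

$G$ is the preconditioner

$M$ is the inner product

RPCG algorithm: $M = H B H^T$

preserves monotonic decrease on the quadratic cost

$G$ should be symmetric with respect to $M$

Need $B H^T G = F H^T$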

(31)

Observations: SST (Sea Surface Temperature) and SSH (Sea Surface Height) observations from satellites. Sub-surface hydrographic observations from floats

Number of observations ($m$): $10^5$

Number of state variables ($n$): $10^6$ for strong constraint and $10^7$ for weak constraint

Computation: 64 CPUs

(32)

Let $F_k$ be s.p.d. Is the relation $F_k H_k^T = B H_k^T G_k$ reasonable?

Perfect preconditioner:

$$(B^{-1} + H_k^T R^{-1} H_k)^{-1} H_k^T R^{-1} = B H_k^T (H_k B H_k^T + R)^{-1}$$

If $H_k B H_k^T$ is nonsingular and $\mathrm{Im}(F_k H_k^T) \subset \mathrm{Im}(B H_k^T)$, there is always a unique $G_k$ that corresponds to $F_k$. But we do not want to compute $G_k = (H_k B H_k^T)^{-1} H_k F_k H_k^T$: too expensive

We focus on quasi-Newton Limited Memory Preconditioners [Morales and Nocedal 2000] [Gratton, Sartenaer and Tshimanga 2011] that are widely used in applications

[Gratton, Gurol and Toint 2012] derive the quasi-Newton LMP in dual space, which generates iterates mathematically equivalent to those of the primal approach

(33)

$F$ as a Quasi-Newton Limited Memory Preconditioner

Quasi-Newton LMPs are based on the idea of generating preconditioners with the L-BFGS updating formula

$$F_{k+1} = (I_n - \tau_k p_k q_k^T)\,F_k\,(I_n - \tau_k q_k p_k^T) + \tau_k p_k p_k^T$$

The pairs:

$$\tau_k = 1/(q_k^T p_k), \qquad q_k = (B^{-1} + H^T R^{-1} H)\,p_k$$

$\Delta F_k$, defined by $\Delta F_k = F_{k+1} - F_k$, is the solution of

$$\min_{\Delta F_k} \|W^{1/2}\,\Delta F_k\,W^{1/2}\|_F \quad \text{subject to } \Delta F_k = \Delta F_k^T,\ F_{k+1} q_k = p_k$$

where $W$ is a positive definite matrix satisfying $W p_k = q_k$. Matrix $F$ approximates the inverse of the system matrix: a preconditioner

Serge G. Large-scale inverse problems
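The update formula for $F$ can be checked numerically on a small case. The sketch below is an illustration under assumptions: a hypothetical 2-by-2 s.p.d. matrix `A` plays the role of $B^{-1} + H^T R^{-1} H$, the initial preconditioner is the identity, and the helper names are made up. It verifies the secant condition $F_{k+1} q_k = p_k$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(X, v):
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]


def lmp_update(F, p, q):
    """One quasi-Newton LMP update built from the pair (p, q)."""
    n = len(p)
    tau = 1.0 / sum(qi * pi for qi, pi in zip(q, p))
    eye = lambda i, j: 1.0 if i == j else 0.0
    left = [[eye(i, j) - tau * p[i] * q[j] for j in range(n)] for i in range(n)]
    right = [[eye(i, j) - tau * q[i] * p[j] for j in range(n)] for i in range(n)]
    core = matmul(matmul(left, F), right)
    return [[core[i][j] + tau * p[i] * p[j] for j in range(n)] for i in range(n)]


# Hypothetical data: A is s.p.d., p is a search direction, q = A p.
A = [[4.0, 1.0], [1.0, 3.0]]
p = [1.0, 2.0]
q = matvec(A, p)
F0 = [[1.0, 0.0], [0.0, 1.0]]
F1 = lmp_update(F0, p, q)
F1_q = matvec(F1, q)   # should reproduce p (secant condition)
```

In a limited-memory setting the matrix $F_{k+1}$ would not be formed; only the pairs are stored and the update is applied as a sequence of vector operations.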

(34)

$G$ as a Quasi-Newton Limited Memory Preconditioner

Given $F$, we can find $G$ that satisfies $F H^T = B H^T G$

$G$ can be derived from the formula for $F$ using the relations

$$p_i = B H^T \hat p_i, \qquad q_i = H^T \hat q_i$$

$$G_{k+1} = (I_m - \hat\tau_k\,\hat p_k (M\hat q_k)^T)\,G_k\,(I_m - \hat\tau_k\,\hat q_k \hat p_k^T M) + \hat\tau_k\,\hat p_k \hat p_k^T M$$

$M = H B H^T$

$\hat p_k$ is the search direction

The dual pairs:

$$\hat q_k = (I_m + R^{-1} H B H^T)\,\hat p_k, \qquad \hat\tau_k = 1/(\hat q_k^T H B H^T \hat p_k)$$

(35)

The quasi-Newton LMP in dual space (Nonlinear case)

In the nonlinear iterations, inheriting the previous preconditioner $G_{k-1}$ may not be possible because we need the existence of an s.p.d. $F_k$ satisfying $F_k H_k^T = B H_k^T G_{k-1}$

The inherited matrix $G_{k-1}$ may not satisfy $H_k F_k H_k^T = H_k B H_k^T G_{k-1}$ for any s.p.d. $F_k$.

The method can be improved by using a safeguarding strategy, whose convergence relies on the Cauchy decrease and a trust-region "magical" step

(36)

Numerical experiment on heat equation (1/3)

The dynamical model is the nonlinear heat equation defined by

$$\frac{\partial x}{\partial t} - \frac{\partial^2 x}{\partial u^2} - \frac{\partial^2 x}{\partial v^2} + f[x] = 0 \ \text{ in } \Omega\times(0,1), \qquad x[u,v,t] = 0 \ \text{ on } \partial\Omega\times(0,1)$$

where the temperature variable $x[u,v,t]$ depends on both time $t$ and position, given by the spatial coordinates $u$ and $v$. The function $f[x]$ is defined by

$$f[x] = \exp[\eta x]$$

(37)

Numerical experiment on heat equation (2/3)

$f[x] = \exp[\eta x]$, $\eta = 2$

(38)

Numerical experiment on heat equation (3/3)

$f[x] = \exp[\eta x]$, $\eta = 4.2$

(39)

Using subspace ideas, subspace reduction

(40)

Gradient based approach | DFO

Nemo, GYRE configuration

[Figure panels: sea level anomaly (top left), velocities (top right), temperature (bottom left) and salinity (bottom right)]

North Atlantic circulation with its Gulf Stream current.

The basin is 3180 km long, 2120 km wide and 4.5 km deep.

State of length 90000

(41)

A subspace from physical considerations

Physical considerations:

The ocean and the atmosphere exhibit an attractor: a slowly varying geometrical object that contains the state

Most of the variability can be explained in the "attractor subspace" (of low dimension $r$)

→ Minimize first in this subspace (of basis $L$)

(42)

Empirical Orthogonal Functions (EOFs)

Construction of $L$:

Let $x_1, \dots, x_p \in \mathbb{R}^n$ be a set of state vectors

Build $C = \frac{1}{p-1}\sum_{i=1}^{p} (x_i - \bar x)(x_i - \bar x)^T$

Compute the eigenvectors of $C$ (EOFs)

Store the $r$ eigenvectors corresponding to the largest eigenvalues

→ Already used in the reduced Kalman filters (SEEK filter)

(43)

Choice for $r$

Select $r$ such that $\dfrac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \ge 0.9$ $(\lambda_i \searrow)$

→ $r = 100$ yields more than 90%
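The selection rule for $r$ amounts to a cumulative sum over the sorted eigenvalues. A minimal sketch follows; the function name `choose_rank` and the sample spectrum are illustrative assumptions, not data from the GYRE experiment.

```python
def choose_rank(eigenvalues, fraction=0.9):
    """Smallest r whose leading eigenvalues capture `fraction` of the total."""
    ordered = sorted(eigenvalues, reverse=True)   # decreasing eigenvalues
    total = sum(ordered)
    running = 0.0
    for r, lam in enumerate(ordered, start=1):
        running += lam
        if running >= fraction * total:
            return r
    return len(ordered)


# Hypothetical spectrum: the first three values carry 95% of the variance.
sample_spectrum = [5.0, 3.0, 1.5, 0.3, 0.2]
r_selected = choose_rank(sample_spectrum)
```

For this spectrum the first two eigenvalues explain 80% of the total and the first three explain 95%, so the rule selects three vectors.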

(44)

Using the subspace

[Gratton, Laloyaux, Sartenaer, Tshimanga 2011]

The solution of the first system in the subspace spanned by $L$ can be obtained with a Ritz-Galerkin formula:

$$x_0^0 = x_b + L (L^T A_0 L)^{-1} L^T b_0$$

It is also possible to approximate the inverse Hessian with a Limited Memory Preconditioner

$$H = \left[I_n - L(L^T A_0 L)^{-1} L^T A_0\right]\left[I_n - A_0 L (L^T A_0 L)^{-1} L^T\right] + L (L^T A_0 L)^{-1} L^T$$

We also experimented with the use of a smoother (FFT based) on the vectors in $L$

It is possible to keep $A_0 L$ from a past iteration or to recompute it (linearization)

(45)

Subspace solution

(46)

Subspace solution with linearization

(47)

Using the subspace

[Gratton, Laloyaux, Sartenaer 2014]

The Ritz-Galerkin starting point corresponds to solving the quadratic minimisation problem inside the subspace $\mathrm{Im}(L)$

Since this subspace is typically of dimension less than 1000, using derivative-free optimization techniques is conceivable, even on the nonlinear function

Possible scheme, processing vectors by e.g. batches of 10 members:

for $k = 1, \dots$

1 Build a new subspace $L_k$

2 Minimize the function in the affine space $x_k + \mathrm{Im}\,L_k$ to get $x_k + L_k w_k$ (SID-PSM [Custodio, Vicente 2007], see also [Vaz, Vicente 2009])

3 Set $x_{k+1} = x_k + L_k w_k$

(48)

Subspace solution

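A criterion $\pi_i$ is used to promote subspace vectors $l_i$ for which a large variation of the cost function occurs (singular value $\sigma_i$):

$$\pi_i = \tau_i + \beta \cdot \frac{\max_j \tau_j}{\max_j \sigma_j}\,\sigma_i, \quad \text{where } \tau_i = \|h(x_0) - h(x_0 + l_i)\|_{R^{-1}}$$

Vectors $l_i$ can be smoothed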

See [Gratton, Vicente, Zhang 2014] for extension to full domain decomposition and stochastic methods. Also see [Gratton, Royer, Vicente, Zhang 2014] for a complexity analysis of stochastic direct search and [Diouane, Gratton, Vicente 2014] for similar algorithms in globally convergent evolution algorithms

(49)

Ensemble methods with a stochastic dynamical system

(50)

Weak-constraint 4D-Var

$$\min_{x\in\mathbb{R}^n} \frac{1}{2}\|x_0 - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{j=0}^{N}\|H_j(x_j) - y_j\|^2_{R_j^{-1}} + \frac{1}{2}\sum_{j=1}^{N}\|\underbrace{x_j - M_j(x_{j-1})}_{q_j}\|^2_{Q_j^{-1}}$$

$x = (x_0, x_1, \dots, x_N)^T \in \mathbb{R}^n$ is the control variable (with $x_j = x(t_j)$)

$x_b$ is the background given at the initial time ($t_0$)

$y_j \in \mathbb{R}^{m_j}$ is the observation vector over a given time interval

$H_j$ maps the state vector $x_j$ from model space to observation space

$M_j$ represents an integration of the numerical model from time $t_{j-1}$ to $t_j$

$B$, $R_j$ and $Q_j$ are the covariance matrices of background, observation and model error

We have a large-scale weighted nonlinear least-squares problem.

(51)

A data assimilation approach

The objective function to be minimized:

$$\frac{1}{2}\|x_0 - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{i=1}^{k}\|x_i - \mathcal{M}_i(x_{i-1})\|^2_{Q_i^{-1}} + \frac{1}{2}\sum_{i=1}^{k}\|d_i - \mathcal{H}_i(x_i)\|^2_{R_i^{-1}}.$$

The corresponding LMM linearized least-squares problem is

$$\frac{1}{2}\|x_0 + \delta x_0 - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{i=1}^{k}\|\delta x_i - m_i - M_i\,\delta x_{i-1}\|^2_{Q_i^{-1}} + \frac{1}{2}\sum_{i=1}^{k}\|y_i - H_i\,\delta x_i\|^2_{R_i^{-1}} + \frac{\gamma}{2}\sum_{i=0}^{k}\|\delta x_i\|^2,$$

where $y_i = d_i - \mathcal{H}_i(x_i)$, $H_i = \mathcal{H}_i'(x_i)$, $m_i = \mathcal{M}_i(x_{i-1}) - x_i$, $M_i = \mathcal{M}_i'(x_{i-1})$.

The penalty term can be considered as further observations.

(52)

Kalman Filter

Kalman filter (KF)

Initialization

Prediction step

$$\delta x_{i|i-1} = M_i\,\delta x_{i-1|i-1} + m_i,$$

Correction step

$$\delta x_{i|i} = \delta x_{i|i-1} - P_{i|i-1}H_i^\top (H_i P_{i|i-1} H_i^\top + R_i)^{-1}(H_i\,\delta x_{i|i-1} - y_i),$$

where $P_{i|i-1} = M_i P_{i-1|i-1} M_i^\top + Q_i$.

$P_{i|i-1}$ is expensive to compute and store ⇒ ensemble Kalman filter.
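For a single state variable the prediction and correction steps reduce to a few scalar operations, which makes the structure easy to see. The sketch below is an illustration under assumptions: all quantities are scalars, the function name `kalman_step` is hypothetical, and the posterior covariance update $P_{i|i} = (1 - K H_i)P_{i|i-1}$ is the standard one, added here because it is not written on the slide.

```python
def kalman_step(x_prev, P_prev, M, m, Q, H, R, y):
    """One scalar Kalman filter cycle: prediction followed by correction."""
    # Prediction step.
    x_pred = M * x_prev + m
    P_pred = M * P_prev * M + Q
    # Correction step; `gain` is P H^T (H P H^T + R)^-1.
    gain = P_pred * H / (H * P_pred * H + R)
    x_post = x_pred - gain * (H * x_pred - y)
    # Standard posterior covariance update (assumption, not on the slide).
    P_post = (1.0 - gain * H) * P_pred
    return x_post, P_post


# Hypothetical numbers: equal trust in forecast (P = 1) and observation (R = 1).
x_post, P_post = kalman_step(x_prev=0.0, P_prev=1.0, M=1.0, m=0.0,
                             Q=0.0, H=1.0, R=1.0, y=2.0)
```

With equal forecast and observation variances the corrected state sits halfway between the forecast and the observation, and the variance is halved.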

(53)

Ensemble Kalman filter

Initialization

Prediction step

$$\delta x^n_{i|i-1} = M_i\,\delta x^n_{i-1|i-1} + m_i + V_i^n, \quad V_i^n \sim N(0, Q_i), \quad n = 1, \dots, N,$$

Correction step

$$\delta x^n_{i|i} = \delta x^n_{i|i-1} - P^N_{i|i-1}H_i^\top (H_i P^N_{i|i-1} H_i^\top + R_i)^{-1}(H_i\,\delta x^n_{i|i-1} - W_i^n - y_i), \quad W_i^n \sim N(0, R_i), \quad n = 1, \dots, N,$$

$$P^N_{i|i-1} = \frac{1}{N-1}\sum_{n=1}^{N}\Big(\delta x^n_{i|i-1} - \frac{1}{N}\sum_{n=1}^{N}\delta x^n_{i|i-1}\Big)\Big(\delta x^n_{i|i-1} - \frac{1}{N}\sum_{n=1}^{N}\delta x^n_{i|i-1}\Big)^\top.$$

(54)

Derivative-free implementation of the EnKS

$M_i = \mathcal{M}_i'(x_{i-1})$ occurs only in advancing the time, as an action on an ensemble member:

$$M_i\,\delta x^n_{i-1|i-1} \approx \frac{\mathcal{M}_i\big(x_{i-1} + \tau\,\delta x^n_{i-1|i-1}\big) - \mathcal{M}_i(x_{i-1})}{\tau}.$$

$H_i = \mathcal{H}_i'(x_i)$ occurs only in the action on an ensemble member:

$$H_i\,\delta x^n_{i|i-1} \approx \frac{\mathcal{H}_i\big(x_i + \tau\,\delta x^n_{i|i-1}\big) - \mathcal{H}_i(x_i)}{\tau}.$$

The algorithm has to be transformed to get the least-squares solution: ensemble smoother
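The finite-difference replacement of the tangent operators can be sketched as follows. The toy nonlinear map `toy_model`, the function name `fd_tangent_action` and the step `tau` are illustrative assumptions; the point is only that the action of the derivative on a vector needs two evaluations of the nonlinear operator and no adjoint or tangent code.

```python
def fd_tangent_action(model, x, dx, tau=1e-6):
    """Approximate model'(x) applied to dx by a one-sided difference."""
    base = model(x)
    shifted = model([xi + tau * di for xi, di in zip(x, dx)])
    return [(s - b) / tau for s, b in zip(shifted, base)]


def toy_model(x):
    """Hypothetical nonlinear model standing in for the model operator."""
    return [x[0] ** 2, x[0] * x[1]]


x_ref = [1.0, 2.0]
member = [0.5, -1.0]                  # one ensemble perturbation
approx = fd_tangent_action(toy_model, x_ref, member)

# Exact Jacobian action for comparison: [[2*x0, 0], [x1, x0]] applied to member.
exact = [2.0 * x_ref[0] * member[0],
         x_ref[1] * member[0] + x_ref[0] * member[1]]
```

The choice of `tau` trades truncation error against rounding error; the value used here is only a placeholder.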

(55)

Globalization of LMM-EnKS

Initialization: choose $\eta_1 \in (0,1)$, $\eta_2 \in (0,1)$, $\gamma_{\min} > 0$, $\gamma_{\max} > 0$, $\lambda > 1$, $x_0$ and $\gamma_0 \in [\gamma_{\min}, \gamma_{\max}]$. For $j = 0, 1, 2, \dots$

1 Approximate the solution of the linearized LMM subproblem by EnKS.

2 Add the regularization term as further observations.

3 Compute $\rho = \dfrac{f(x_j) - f(x_j + \delta x_j)}{m_j(x_j) - m_j(x_j + \delta x_j)}$, where $\delta x_j$ is the EnKS mean.

If $\rho \ge \eta_1$, then set $x_{j+1} = x_j + \delta x_j$ and

$$\gamma_{j+1} = \begin{cases} \lambda\gamma_j & \text{if } \|g_{m_j}\| < \eta_2/\gamma_j, \\ \max\left\{\gamma_j / \lambda^{\frac{1-p_j}{p_j}},\ \gamma_{\min}\right\} & \text{if } \|g_{m_j}\| \ge \eta_2/\gamma_j. \end{cases}$$

Otherwise, the step is rejected and $\gamma_{j+1} = \lambda\gamma_j$.

[E. Bergou and L.N. Vicente]

(56)

Experiments set-up

Dynamical model = Lorenz 63 model

$$\frac{dx}{dt} = -\sigma(x - y), \qquad \frac{dy}{dt} = \rho x - y - xz, \qquad \frac{dz}{dt} = xy - \beta z$$

Simplified mathematical model for atmospheric convection [Lorenz 63]. Model describing idealized thermal convection in the Earth's atmosphere.

The system is fully observed at each time step $dt = 0.11$.

$k = 40$, $N = 400$.
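A minimal sketch of the Lorenz 63 dynamics is given below. The classical parameter values $\sigma = 10$, $\rho = 28$, $\beta = 8/3$, the fourth-order Runge-Kutta scheme, the integration step of 0.01 and the initial state are assumptions made for the illustration; the talk does not state which values or integrator were used.

```python
def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 63 system (classical parameters assumed)."""
    x, y, z = state
    return (sigma * (y - x), rho * x - y - x * z, x * y - beta * z)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = lorenz63(state)
    k2 = lorenz63(shifted(k1, dt / 2.0))
    k3 = lorenz63(shifted(k2, dt / 2.0))
    k4 = lorenz63(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt, n_steps):
    """Return the trajectory, including the initial state."""
    trajectory = [tuple(state)]
    for _ in range(n_steps):
        state = rk4_step(state, dt)
        trajectory.append(state)
    return trajectory


trajectory = integrate((1.0, 1.0, 1.0), dt=0.01, n_steps=1000)

# The non-trivial equilibrium (c, c, rho - 1) with c = sqrt(beta * (rho - 1)).
c = (8.0 / 3.0 * 27.0) ** 0.5
tendency_at_equilibrium = lorenz63((c, c, 27.0))
```

A fully observed twin experiment would then sample `trajectory` at the observation times and add noise, which is left out here.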

(57)

Numerical experiments

[Figure panels: objective function values; root mean square error]

[E. Bergou, S. Gratton, L.N. Vicente, submitted to SIAM/JUQ]

(58)

Variational methods with a stochastic dynamical system

(59)

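Variational approach: the linearized subproblem (Inner loop)

The linearized problem at the $k$-th outer loop is given by

$$\min_{\delta x} \frac{1}{2}\|\delta x_0 - b^{(k)}\|^2_{B^{-1}} + \frac{1}{2}\sum_{j=0}^{N}\|H_j^{(k)}\delta x_j - d_j^{(k)}\|^2_{R_j^{-1}} + \frac{1}{2}\sum_{j=1}^{N}\|\underbrace{\delta x_j - M_j^{(k)}\delta x_{j-1}}_{\delta q_j} - c_j^{(k)}\|^2_{Q_j^{-1}}$$

$\delta x = (\delta x_0, \delta x_1, \dots, \delta x_N)^T \in \mathbb{R}^n$ is the increment.

The vectors $b^{(k)}$, $c_j^{(k)}$ and $d_j^{(k)}$ are defined by

$$b^{(k)} = x_b - x_0^{(k)}, \qquad d_j^{(k)} = y_j - H_j(x_j^{(k)})$$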

(60)

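Rewriting the linearized subproblem

$$\min_{\delta x\in\mathbb{R}^n} \frac{1}{2}\|L\,\delta x - b\|^2_{D^{-1}} + \frac{1}{2}\|H\,\delta x - d\|^2_{R^{-1}}$$

where

$$L = \begin{pmatrix} I \\ -M_1 & I \\ & -M_2 & I \\ & & \ddots & \ddots \\ & & & -M_N & I \end{pmatrix}, \quad d = \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_N \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} b \\ c_1 \\ \vdots \\ c_N \end{pmatrix}$$

$$H = \mathrm{diag}(H_0, H_1, \dots, H_N)$$

$$L\,\delta x = \begin{pmatrix} I \\ -M_1 & I \\ & -M_2 & I \\ & & \ddots & \ddots \\ & & & -M_N & I \end{pmatrix} \begin{pmatrix} \delta x_0 \\ \delta x_1 \\ \delta x_2 \\ \vdots \\ \delta x_N \end{pmatrix} = \begin{pmatrix} \delta x_0 \\ \delta x_1 - M_1\delta x_0 \\ \delta x_2 - M_2\delta x_1 \\ \vdots \\ \delta x_N - M_N\delta x_{N-1} \end{pmatrix}$$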

(62)

Rewriting the linearized subproblem

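Making the change of variables

$$\delta p = L\,\delta x$$

the subproblem can also be rewritten as

$$\min_{\delta p\in\mathbb{R}^n} \frac{1}{2}\|\delta p - b\|^2_{D^{-1}} + \frac{1}{2}\|H L^{-1}\delta p - d\|^2_{R^{-1}}$$

$\delta x = L^{-1}\delta p$ is sequential → $\delta x_j = M_j\,\delta x_{j-1} + \delta q_j$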
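The sequential nature of $L^{-1}$ can be seen in a small sketch in which each block is a scalar. The function names and the numbers are illustrative assumptions. Applying $L$ needs only neighbouring blocks and can be done for all $j$ at once, whereas applying $L^{-1}$ is a forward recursion in time.

```python
def apply_L(M, x):
    """(L x)_0 = x_0 and (L x)_j = x_j - M_j x_{j-1}; each j is independent."""
    return [x[0]] + [x[j] - M[j - 1] * x[j - 1] for j in range(1, len(x))]


def apply_L_inverse(M, p):
    """Solve L x = p by forward substitution: x_j = M_j x_{j-1} + p_j."""
    x = [p[0]]
    for j in range(1, len(p)):
        # Each step needs the previous result, hence the sequential cost.
        x.append(M[j - 1] * x[j - 1] + p[j])
    return x


# Hypothetical scalar "models" M_1..M_3 (stored as M[0..2]) and forcing p_0..p_3.
M = [0.9, 1.1, 0.5]
p = [1.0, 0.2, -0.3, 0.4]
x = apply_L_inverse(M, p)
p_back = apply_L(M, x)   # should reproduce p
```

In the real problem each multiplication by $M_j$ is a model integration over one subwindow, so the loop in `apply_L_inverse` corresponds to $N$ integrations that cannot overlap.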

(63)

The linearized subproblems

State Formulation

$$\min_{\delta x} \frac{1}{2}\|L\,\delta x - b\|^2_{D^{-1}} + \frac{1}{2}\|H\,\delta x - d\|^2_{R^{-1}}$$

Matrix-vector products with $L$ can be parallelized in the time dimension.

Solution algorithm: preconditioned Lanczos or PCG type methods.

Preconditioning is difficult since $D^{1/2}\tilde L^{-T}(L^T D^{-1} L)\tilde L^{-1} D^{1/2}$ can be ill-conditioned depending on the accuracy of $\tilde L^{-1}$.

Forcing Formulation

$$\min_{\delta p} \frac{1}{2}\|\delta p - b\|^2_{D^{-1}} + \frac{1}{2}\|H L^{-1}\delta p - d\|^2_{R^{-1}}$$

Matrix-vector products with $L^{-1}$ are sequential.

Solution algorithm: preconditioned Lanczos or PCG type methods.

Preconditioning by $D$ is often effective.

The structure of $D$ is tri-diagonal by blocks!

(64)

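Saddle Point Approach

In matrix form:

$$\underbrace{\begin{pmatrix} D & 0 & L \\ 0 & R & H \\ L^T & H^T & 0 \end{pmatrix}}_{\mathcal{A}} \begin{pmatrix} \lambda \\ \mu \\ \delta x \end{pmatrix} = \begin{pmatrix} b \\ d \\ 0 \end{pmatrix}$$

where $\mathcal{A}$ is a $(2n+m)$-by-$(2n+m)$ indefinite symmetric matrix.

The solution of this problem is a saddle point

→ This approach is time-parallel.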

(65)

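Preconditioning Saddle Point Formulation of 4D-Var

$$\mathcal{A} = \begin{pmatrix} D & 0 & L \\ 0 & R & H \\ L^T & H^T & 0 \end{pmatrix} = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix}$$

$B$ is the most computationally expensive block, and calculations involving $A$ are relatively cheap.

The inexact constraint preconditioner proposed by (Bergamaschi et al. 2005) is promising for our application. The preconditioner can be chosen as:

$$\mathcal{P} = \begin{pmatrix} A & \tilde B^T \\ \tilde B & 0 \end{pmatrix} = \begin{pmatrix} D & 0 & \tilde L \\ 0 & R & 0 \\ \tilde L^T & 0 & 0 \end{pmatrix},$$

where

$\tilde L$ is an approximation to the matrix $L$

$\tilde B = [\tilde L^T \ \ 0]$ is a full row rank approximation of the matrix $B \in \mathbb{R}^{n\times(m+n)}$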

(66)

Numerical Results

Implementation platform

We used the Object Oriented Prediction System (OOPS)

OOPS consists of simplified models of a real system

The model

It is a two-layer quasi-geostrophic model with 1600 grid points

Implementation details

There are 100 observations of stream function every 3 hours, and 100 wind observations plus 100 wind-speed observations every 6 hours

The error covariance matrices are assumed to be horizontally isotropic and homogeneous, with Gaussian spatial structure

The analysis window is 24 hours, and is divided into 8 subwindows

3 outer loops with 10 inner loops each are performed

(67)

Numerical Results

Methods

1 Forcing formulation with $D$ preconditioning

→ Solution method is preconditioned conjugate gradients

2 Saddle point formulation with an updated inexact constraint preconditioner

→ Solution method is GMRES

→ The initial preconditioner is chosen as

$$\mathcal{P}_0 = \begin{pmatrix} D & 0 & \tilde L \\ 0 & R & 0 \\ \tilde L^T & 0 & 0 \end{pmatrix} \quad \text{with} \quad \tilde L = \begin{pmatrix} I \\ -I & I \\ & -I & I \\ & & \ddots & \ddots \\ & & & -I & I \end{pmatrix}.$$

$$\mathcal{P}_0^{-1} = \begin{pmatrix} 0 & 0 & \tilde L^{-T} \\ 0 & R^{-1} & 0 \\ \tilde L^{-1} & 0 & -\tilde L^{-1} D \tilde L^{-T} \end{pmatrix} \quad \text{and} \quad \tilde L^{-1} = \begin{pmatrix} I \\ I & I \\ I & I & I \\ \vdots & & \ddots & \ddots \\ I & \cdots & I & I & I \end{pmatrix}.$$

→ The low-rank low-cost preconditioner is denoted as $C_k$

→ The low-rank least-Frobenius preconditioner is denoted as $F_k$

(68)

Numerical Results with OOPS qg-model

Figure: Nonlinear cost function values along iterations

→ Last 8 pairs were used to construct the preconditioner

→ B = αvw^T (where v = r

(69)

Numerical Results with OOPS qg-model

Figure: Nonlinear cost function values along iterations

→ Last 8 pairs were used to construct the preconditioner

(70)

Numerical Results with OOPS qg-model

Figure: Nonlinear cost function values along sequential subwindow integrations

→ At each iteration, the forcing formulation requires one application of $L^{-1}$, followed by one application of $L^{-T}$ (16 sequential subwindow integrations)

→ Each iteration of the saddle point formulation requires one subwindow integration (provided that $L^{-1}$ and $L^{-T}$ are applied simultaneously)

(71)

Summary, caveat and ongoing related work

(72)

Conclusions

Very large scale, highly nonlinear forecasting problems can be solved using dual methods, projection methods and statistical sampling, adapted to nonlinearity.

In some severe cases, having "more" than a local minimum might be needed. Model reduction and exploration techniques are useful.

Most systems are a priori designed for solving the prediction problem. Applying DA techniques to other settings is not necessarily successful.

Further work: errors between time steps are correlated. Having a parallel code is a challenge. Use nonlinear multigrid and domain decomposition ideas for scalability.
