• No results found

MAP estimators and 4D-VAR

N/A
N/A
Protected

Academic year: 2021

Share "MAP estimators and 4D-VAR"

Copied!
27
0
0

Loading.... (view fulltext now)

Full text

(1)

MAP estimators and 4D-VAR

Jochen Voss

University of Leeds

12 May 2014, Reading-Warwick Data Assimilation Meeting

joint work with Masoumeh Dashti, Kody Law and Andrew Stuart

(2)

Outline

Data Assimilation using 4D-VAR

Infinite Dimensional MAP Estimators

Consistency

Conclusions

(3)

Data Assimilation using 4D-VAR

(4)

We consider the following “caricature” of a forecasting problem:

true state x

0

true state x

T

background x

(b)

forecast x

(f )

model, observations

t = 0 (now) t = T (future) time

I

We want to forecast the unknown state x

T

of a system for a future time T , starting from the current state x

0

.

I

The current state x

0

is unknown, our “best guess” is x

(b)

.

I

For times 0 ≤ t

1

≤ t

2

≤ · · · ≤ t

J

≤ T we have noisy observations

y

j

≈ H(x

tj

).

(5)

For a Bayesian approach we make the following assumptions:

I

x

(b)

− x

0

∼ N (0, C ), i.e. x

0

∈ R

d

has density p(x

0

) = 1

(2π)

d /2

|C |

1/2

exp



− 1

2 (x

0

− x

(b)

)

>

C

−1

(x

0

− x

(b)

)

 .

I

The observations are independent and satisfy y

j

∼ N H(x

tj

), R, i.e.

y

j

∈ R

m

has density p(y

j

|x

0

) = 1

(2π)

m/2

|R|

1/2

exp 

− 1

2 (y

j

− H(x

tj

))

>

R

−1

(y

j

− H(x

tj

))  , where x

tj

= M

tj

(x

0

) is the system state at time t

j

, for j = 1, . . . , J.

From these assumptions we can find the posterior density of x

0

as

p(x

0

|y ) = p(y |x

0

)p(x

0

) p(y

j

) ∝

J

Y

j =1

p(y

j

|x

0

)p(x

0

) = · · · .

(6)

For high-dimensional models, the full posterior density for x

0

can be difficult to work with and often it is convenient to use a point estimate for x

0

instead. Here we use the maximum a posteriori (MAP) estimator ˆ x

0

, defined as

ˆ

x

0

= arg max

x0

p(x

0

|y ).

This estimator attempts to obtain a “typical” value from the posterior

distribution. The MAP estimator works well if the posterior is unimodal

and highly concentrated.

(7)

For the forecasting problem, the MAP estimator is the x

0

∈ R

d

which maximises

p(x

0

|y ) ∝

J

Y

j =1

p(y

j

|x

0

)p(x

0

)

∝ exp 

− 1 2

J

X

j =1

(y

j

− H(x

tj

))

>

R

−1

(y

j

− H(x

tj

))

− 1

2 (x

0

− x

(b)

)

>

C

−1

(x

0

− x

(b)

) 

=: exp −I (x

0

) 

or, equivalently, minimises the “cost function” I . The data assimilation

method based on this procedure is called 4D-VAR.

(8)

Summary: the 4D-VAR method minimises I (x ) = Φ(x ) + 1

2 kx − x

(b)

k

2E

where

kxk

2E

= x

>

C

−1

x and

Φ(x ) = 1 2

J

X

j =1

(y

j

− H(x

tj

))

>

R

−1

(y

j

− H(x

tj

))

where H is the observation map and x

tj

= M

tj

(x ) is obtained by

integrating the model, starting with state x at time 0, until time t

j

.

By shifting coordinates, we can assume x

(b)

= 0 without loss of

generality.

(9)

Infinite Dimensional MAP Estimators

(10)

In many applications of MAP estimators, including applications in weather forecasting, the system state x is an infinite-dimensional object.

Thus, it is natural to ask whether MAP estimators (and the 4D-VAR method) still work in infinite-dimensional spaces:

I

If a numerical method does not make sense for the limiting, infinite dimensional object, the method may be ill-behaved for

high-dimensional systems.

I

Separating issues of discretisation from issues of the estimation method can lead to greater clarity.

I

General rule: discretise as late as possible.

(11)

We assume that the posterior µ is a probability measure on an infinite dimensional, separable Banach space (X , k · k

X

).

Problem. The MAP estimator is defined in terms of densities w.r.t.

Lebesgue measure, but Lebesgue measure does not exist on infinite dimensional spaces.

Solution 1. We can consider reference measures µ

0

other than Lebesgue measure. Here we assume that µ

0

is a Gaussian measure on X and that µ has density

d µ d µ

0

(x ) ∝ exp −Φ(x ) 

w.r.t. µ

0

.

(12)

In finite dimensions:

d µ

d Leb (x ) = d µ d µ

0

(x ) · d µ

0

d Leb (x )

∝ exp −Φ(x) · exp − 1

2 x

>

C

−1

x 

= exp −Φ(x ) − 1 2 kxk

2E

 for all x ∈ R

d

.

Infinite dimensional analogue of the right-hand side:

“ d µ

d Leb (x )” ∝ exp −Φ(x ) − 1 2 kxk

2E



for all x ∈ E ⊂ X where (E , k · k

E

) is the Cameron-Martin space of the

Gaussian measure µ

0

. Even if the left-hand side does not make sense any

more, we can still try to maximise the right-hand side over E .

(13)

Example. If µ is the distribution of the solution of the stochastic differential equation (SDE) dx

t

= f (x

t

) dt + dw

t

on X = L

2

([0, T ], R), then we can choose µ

0

to be Wiener measure (i.e. the distribution of a Brownian motion). By the Girsanov formula from stochastic analysis, µ has density exp −Φ(x ) w.r.t. µ

0

, where

Φ(x ) = 1 2

Z

T 0

f (x

t

)

2

dt − Z

T

0

f (x

t

) dx

t

, and the Cameron-Martin space of µ

0

is

E = n

x ∈ H

1

([0, T ], R)

x

0

= 0, Z

T

0

˙

x

t2

dt < ∞ o

, with norm

kxk

2E

= Z

T

0

˙ x

t2

dt

for all x ∈ E . The “MAP estimator” minimises Φ(x ) +

12

kxk

2E

.

(14)

Solution 2. Without using densities we can consider small ball probabilities µ B(x , ε) and then let ε ↓ 0.

Definition. ˆ x ∈ X is a MAP estimator for µ, if lim

ε↓0

µ B(ˆ x , ε) 

sup

x ∈X

µ B(x , ε)  = 1.

Our main result show that ˆ x ∈ X is a MAP estimator for µ, if and only if ˆ

x is a minimiser of the Onsager-Machlup functional

I (x ) =

( Φ(x ) +

12

kxk

2E

, if x ∈ E , and

+∞ otherwise.

In particular this implies that MAP estimators always lie in the

Cameron-Martin space E .

(15)

Assumptions. The function Φ : X → R satisfies the following conditions:

A1 Φ is bounded from below, i.e. there is an M ∈ R, such that for all x ∈ X we have

Φ(x ) ≥ M.

A2 Φ is locally bounded from above, i.e. for every r > 0 there exists K = K (r ) > 0 such that for all x ∈ X with kx k

X

< r we have

Φ(x ) ≤ K .

A3 Φ is locally Lipschitz continuous, i.e. for every r > 0 there exists L = L(r ) > 0 such that for all x

1

, x

2

∈ X with kx

1

k

X

, kx

2

k

X

< r we have

|Φ(x

1

) − Φ(x

2

)| ≤ Lkx

1

− x

2

k

X

.

(16)

Theorem. Assume A1, A2 and A3. Then the following statements hold:

i) Any MAP estimator ˆ x ∈ X minimises the Onsager-Machlup functional I . In particular, ˆ x satisfies ˆ x ∈ E .

ii) Any ˆ x ∈ E which minimises the Onsager-Machlup functional I is a

MAP estimator.

(17)

The proof of the result is long and technical, but it is based on the following property of the Onsager-Machlup functional: If x

1

, x

2

∈ E , then

lim

ε↓0

µ(B(x

2

, ε))

µ(B(x

1

, ε)) = exp I (x

1

) − I (x

2

).

The technical difficulties are caused, among other things, by the following facts:

I

A copy of the measure µ shifted by x ∈ X is absolutely continuous w.r.t. µ, if and only if x ∈ E . Thus working with the probabilities µ(B(x , ε)) works best if x ∈ E .

I

E ⊆ X is dense, but µ(E ) = 0.

(18)

Consistency

(19)

We have seen that the 4D-Var method minimises I (x ) = Φ(x ) + 1

2 kxk

2E

where

Φ(x ) = 1 2

J

X

j =1

y

j

− G

j

(x )

2 R

and

G

j

(x ) = H M

tj

(x ).

The observations y

j

satisfy y

j

= G(x

) + η

j

, where η

j

∼ N (0, R) are i.i.d.

and x

∈ X is the true state.

Question. Does the 4D-VAR estimate converge to x

as J → ∞?

(Answer: no, but . . . )

(20)

Large sample size limit. Let x

∈ X and y

j

= G(x

) + η

j

where η

j

∼ N (0, R) are i.i.d. for j = 1, . . . , J. Then the corresponding Onsager-Machlup functional is

I

J

(x ) := kx k

2E

+

J

X

j =1

|y

j

− G(x)|

2R

.

Theorem. Assume that G : X → R

K

is locally Lipschitz continuous and x

∈ E . For J ∈ N, let x

J

∈ E be a minimiser of I

J

. Then

J→∞

lim G(x

J

) = G(x

)

almost surely.

(21)

Small noise limit. Let x

∈ X and

y

n

= G(x

) + 1 n η

n

,

where η

j

∼ N (0, R) are i.i.d. for j ∈ N. Then the corresponding Onsager-Machlup functional is

I

n

(x ) := kx k

2E

+ n

2

|y

n

− G(x)|

2R

.

Theorem. Assume that G : X → R

K

is locally Lipschitz continuous. For n ∈ N, let x

n

∈ E be a minimiser of I

n

. Then

n→∞

lim G(x

n

) = G(x

)

almost surely.

(22)

Example. Consider again the process x = (x

t

)

t∈[0,T ]

defined by the SDE dx

t

= f (x

t

) dt + dW , x

0

= a

and assume we want to make inference about the path x based on observations

y

j

= x

tj

+ η

j

where 0 ≤ t

1

< t

2

< · · · < t

J

≤ T and η

j

∼ N (0, γ

2

).

To apply our results, we choose µ

0

to be the Wiener measure on X = L

2

[0, T ], R  with Cameron-Martin space

E = n

x ∈ H

1

([0, T ], R)

x

0

= 0, Z

T

0

˙

x

t2

dt < ∞ o

.

(23)

The Onsager-Machlup functional is again I (x ) = Φ(x ) + 1

2 kxk

2H1

.

The function Φ incorporates both the density d µ/d µ

0

(found using the Girsanov formula) and the observations: Assuming that the drift satisfies f = F

0

, we find

Φ(x ) = Z

T

0

Ψ(x

t

) dt − F (x

T

) + 1 2γ

2

J

X

j =1

y

j

− x

tj

2

where

Ψ(x ) = 1

2 |f (x)|

2

+ f

0

(x ).

(24)
(25)
(26)

Conclusions

(27)

I

We have shown that MAP estimators can be used in infinite dimensional problems.

I

The 4D-VAR method for data assimilation can be described in this framework.

I

The infinite dimensional approach allows for insights into the regularity properties of the problem.

Masoumeh Dashti, Kody J. H. Law, Andrew M. Stuart and Jochen Voss.

MAP Estimators and their Consistency in Bayesian Nonparametric Inverse

Problems. Inverse Problems, vol. 29, 2013.

References

Related documents

Thus, based on the 85 participants, EU farmers’ decision to encourage natural pest control was positively associated with farm income and the perceived importance of natural

Immigrants who show a dual sense of identity constitute a partial failure, while immigrants who retain their ethnic North African identities demonstrate a

Under Lydia’s leadership, and with the support of Paul and his friends, who stayed in Lydia’s home for some time, the early church in Philippi became a place for all: women,

ZEP520A is high performance positive EB resists which show high resolution, high sensitivity and dry etch resistance.. They are suitable for various

IJEDR1702038 International Journal of Engineering Development and Research ( www.ijedr.org ) 229 Then this input wav file is sent to the WBFM Transmit block where the audio rate

Due to the biodegradable and non-hazardous nature of majority of natural bioactive compounds, the current study was designed to evaluate crude saponins (Rh.Sp), crude methanolic

Those who study the reception history of the Hebrew Bible take seriously the work of the translator, recognizing that those who penned the LXX and Targums were not simply

The holding in Husserl that the Warsaw Convention, as modified by the Montreal Agreement, established the carrier's strict liabiity for a passenger's personal