How To Solve A Sequential Mca Problem

(1)

Monte Carlo-based statistical methods

(MASM11/FMS091)

Jimmy Olsson

Centre for Mathematical Sciences Lund University, Sweden

Lecture 6

Sequential Monte Carlo methods II February 3, 2012

(2)

Changes in HA1

Problem 4(c) in HA1 has been modified: You doNOThave to provide any confidence interval for the varianceV(P(V1) +P(V2))

and the standard deviationD(P(V1) +P(V2)). Provide only point

(3)

Plan of today’s lecture

1 Last time: Sequential MC problems

2 Sequential Monte Carlo (SMC) methods

Overview

(4)

We are here

−→ •

Overview

(5)

Last time: Sequential MC problems

In the sequential MC framework, we aim at sequentially estimating sequences(τn)n≥0 of expectations

τn=Efn(φ(X0:n)) = Z

Xn

φ(x0:n)fn(x0:n)dx0:n (∗)

over spacesXn ofincreasing dimension, where the densities(fn)

are known up to normalizing constants only, i.e., for everyn≥0,

fn(x0:n) =

zn(x0:n) cn

,

(6)

Last time: Markov chains

Some applications involved the notion ofMarkov chains: A Markov chain onX⊆_Rd_{is a family of random variables (=}

stochastic process)(Xk)k≥0 taking values inX such that P(Xk+1 ∈A|X0, X1, . . . , Xk) =P(Xk+1 ∈A|Xk).

The densityq of the distribution of Xk+1 givenXk=x is called

thetransition density of(Xk). Consequently,

P(Xk+1 ∈A|Xk =xk) = Z

A

q(xk+1|xk)dxk+1.

As a first example we considered an AR(1) process:

X0= 0, Xk+1=αXk+k+1,

(7)

Last time: Markov chains (cont.)

The following theorem provides the joint densityfn(x0, x1, . . . , xn)

ofX0, X1, . . . , Xn.

Theorem

Let(Xk) be Markov with X0 ∼χ. Then for n >0,

fn(x0, x1, . . . , xn) =χ(x0)

n−1 Y

k=0

q(xk+1|xk).

Corollary (The Chapman-Kolmogorov equation)

Let(Xk) be Markov. Then for n >1,

fn(xn|x0) = Z · · · Z n−1 Y k=0 q(xk+1|xk) ! dx1· · ·dxn−1.

(8)

Last time: Rare event analysis (REA) for Markov chains

Let(Xk) be a Markov chain. Assume that we want to compute, for

n= 0,1,2, . . . τn=E(φ(X0:n)|X0:n∈A) = Z A φ(x0:n) fn(x0:n) P(X0:n∈A) dx0:n = Z A φ(x0:n) χ(x0)Qn−k=01q(xk+1|xk) P(X0:n∈A) dx0:n,

whereA is a possibly “rare” event andP(X0:n∈A)is generally

unknown. We thus face a sequential MC problem (∗) with

(

zn(x0:n)←χ(x0)Qn−_k₌₀1q(xk+1|xk),

(9)

Last time: Estimation in general HMMs

Graphically: "! # "! # "! # "! # "! # "! # Yk−1 Yk Yk+1 Xk−1 Xk Xk+1 - - - -6 6 6 ... ... (Markov chain) (Observations)

Yk|Xk=xk∼g(yk|xk) (Observation density) Xk+1|Xk=xk∼q(xk+1|xk) (Transition density)

(10)

Last time: Estimation in general HMMs

In an HMM, thesmoothing distribution fn(x0:n|y0:n) is the

conditional distribution of a setX0:nof hidden states given Y0:n=y0:n.

Theorem (Smoothing distribution) fn(x0:n|y0:n) =

χ(x0)g(y0|x0)Qn_k₌₁g(yk|xk)q(xk|xk−1)

Ln(y0:n)

,

where

Ln(y0:n) =density of the observationsy0:n = Z · · · Z χ(x0)g(y0|x0) n Y k=1 g(yk|xk)q(xk|xk−1)dx0· · ·dxn.

(11)

Last time: Estimation in general HMMs

whereLn(y0:n)(= obscene integral) is generally unknown. We

thus face a sequential MC problem (∗) with

(

zn(x0:n)←χ(x0)g(y0|x0)Qnk=1g(yk|xk)q(xk|xk−1),

(12)

We are here

−→ •

Overview

(13)

We are here

−→ •

Overview

(14)

Sequential Monte Carlo (SMC) methods

It is natural to aim at solving the problem using usual self-normalized IS.

However, the generated samples(X_0:i_n, ωn(X0:in))should be such

that

having (X_0:i_n, ωn(X0:in)), the next sample (Xi

0:n+1, ωn+1(X0:in+1)) is easily generated with a complexity

that does not increase with n(online sampling). the approximation remains stable asn increases. We call each drawXi

0:n= (X0i, . . . , Xni) aparticle. Moreover, we

denote importance weights by

(15)

We are here

−→ •

Overview

(16)

Sequential importance sampling (SIS)

We proceed recursively. Assume that we have generated particles

(X_0:i_n) fromgn(x0:n) so that N X i=1 ωi_n PN `=1ω`n h(X_0:i_n)≈Efn(φ(X0:n)), where, as usual,ωi n=ωn(X0:in) =zn(X0:in)/gn(X0:in). Key trick: Choose an instrumental distribution satisfying

(17)

SIS (cont.)

Consequently,X_0:i_n₊₁ andωi_n₊₁ can be generated by keeping the previousX_0:i_n,

simulating X_ni₊₁ ∼gn+1(xn+1|X0:in), settingXi 0:n+1 = (Xni+1, X0:in), and computing ω_ni₊₁= zn+1(X i 0:n+1) gn+1(X_0:i_n₊₁) = zn+1(X i 0:n+1) zn(X0:in)gn+1(Xni+1|X0:in) × zn(X i 0:n) gn(X0:in) = zn+1(X i 0:n+1) zn(X_0:i_n)gn+1(X_ni₊₁|X_0:i_n) ×ωi_n.

(18)

SIS (cont.)

Voilà, the sample(X_0:i_n₊₁, ω_ni₊₁)can now be used to approximate

Efn+1(φ(X0:n+1))!

So, by running the SIS algorithm, we have updated an approximation N X i=1 ω_ni PN `=1ωn` h(X_0:i_n)≈_E_fn(φ(X0:n)) to an approximation N X i=1 ωi_n₊₁ PN `=1ω`n+1 h(X_0:i_n₊₁)≈Efn+1(φ(X0:n+1))

by only adding a componentX_ni₊₁ toX_0:i_n and sequentially updating the weights.

(19)

SIS: Pseudo code

for i= 1→N do draw Xi 0 ∼g0 set ω₀i = z0(X0i) g0(X0i) end for return (X₀i, ωi₀) for k= 0,1,2, . . .do for i= 1→N do draw X_ki₊₁ ∼gk+1(xk+1|X0:in) set X_0:i_k₊₁ ←(X_ki₊₁, X_0:i_k) set ω_ki₊₁ ← zk+1(X0:ik+1) zk(Xi0:k)gk+1(Xki+1|X i 0:k) ×ω_ki end for return (X_0:i_k₊₁, ωi_k₊₁) end for

(20)

Example: REA reconsidered

We consider again the example of REA for Markov chains (X=R):

τn=E(φ(X0:n)|a≥X`,∀`= 0, . . . , n) = Z (a,∞)n+1 φ(x0:n) χ(x0)Qn−k=01q(xk+1|xk) P(a≥X`,∀`) | {z } =zn(x0:n)/cn dx0:n.

Choosegk+1(xk+1|x0:k)to be the conditional density of Xk+1

givenXk andXk+1 ≥a: gk+1(xk+1|x0:k) ={cf. HA1, Problem 1}= q(xk+1|xk) R∞ a q(z|xk)dz .

(21)

Example: REA

This implies that

gn(x0:n) = χ(x0) R∞ a χ(x0)dx0 n−1 Y k=0 q(xk+1|xk) R∞ a q(z|xk)dz .

In addition, the weights are updated as

(22)

Example: REA; Matlab implementation for

AR(1) process with Gaussian noise

% design of instrumental distribution:

int = @(x) 1 − normcdf(a,alpha*x,sigma);

trunk_td_rnd = @(x) ... % use HA1, Problem 1, to draw from % the truncated transition density;

% SIS: % x = starting position.

part = a*ones(N,1); % initialization in a w = ones(N,1);

for k = 1:(n − 1), % main loop part_mut = trunk_td_rnd(part); w = w.*int(part);

part = part_mut;

end

(23)

REA: Importance weight distribution

/

Serious drawback of SIS: the importance weights degenerate!...

−150 −10 −5 0 500 1000 n = 1 −150 −10 −5 0 500 1000 n = 10 −150 −10 −5 0 50 100 n = 100

(24)

What’s next?

Weight degeneration is a universal problem with the SIS method and is due to the fact that the particle weights are generated through subsequent multiplications.

This drawback prevented—during several decades—the SIS method from being practically useful.

Next week we will discuss an efficient solution to this problem: SIS with resampling.