Monte Carlo-based statistical methods
(MASM11/FMS091)
Jimmy Olsson
Centre for Mathematical Sciences Lund University, Sweden
Lecture 6
Sequential Monte Carlo methods II February 3, 2012
Changes in HA1
Problem 4(c) in HA1 has been modified: You doNOThave to provide any confidence interval for the varianceV(P(V1) +P(V2))
and the standard deviationD(P(V1) +P(V2)). Provide only point
Plan of today’s lecture
1 Last time: Sequential MC problems
2 Sequential Monte Carlo (SMC) methods
Overview
We are here
−→ •
1 Last time: Sequential MC problems
2 Sequential Monte Carlo (SMC) methods
Overview
Last time: Sequential MC problems
In the sequential MC framework, we aim at sequentially estimating sequences(τn)n≥0 of expectations
τn=Efn(φ(X0:n)) = Z
Xn
φ(x0:n)fn(x0:n)dx0:n (∗)
over spacesXn ofincreasing dimension, where the densities(fn)
are known up to normalizing constants only, i.e., for everyn≥0,
fn(x0:n) =
zn(x0:n) cn
,
Last time: Markov chains
Some applications involved the notion ofMarkov chains: A Markov chain onX⊆Rdis a family of random variables (=
stochastic process)(Xk)k≥0 taking values inX such that P(Xk+1 ∈A|X0, X1, . . . , Xk) =P(Xk+1 ∈A|Xk).
The densityq of the distribution of Xk+1 givenXk=x is called
thetransition density of(Xk). Consequently,
P(Xk+1 ∈A|Xk =xk) = Z
A
q(xk+1|xk)dxk+1.
As a first example we considered an AR(1) process:
X0= 0, Xk+1=αXk+k+1,
Last time: Markov chains (cont.)
The following theorem provides the joint densityfn(x0, x1, . . . , xn)
ofX0, X1, . . . , Xn.
Theorem
Let(Xk) be Markov with X0 ∼χ. Then for n >0,
fn(x0, x1, . . . , xn) =χ(x0)
n−1 Y
k=0
q(xk+1|xk).
Corollary (The Chapman-Kolmogorov equation)
Let(Xk) be Markov. Then for n >1,
fn(xn|x0) = Z · · · Z n−1 Y k=0 q(xk+1|xk) ! dx1· · ·dxn−1.
Last time: Rare event analysis (REA) for Markov chains
Let(Xk) be a Markov chain. Assume that we want to compute, for
n= 0,1,2, . . . τn=E(φ(X0:n)|X0:n∈A) = Z A φ(x0:n) fn(x0:n) P(X0:n∈A) dx0:n = Z A φ(x0:n) χ(x0)Qn−k=01q(xk+1|xk) P(X0:n∈A) dx0:n,
whereA is a possibly “rare” event andP(X0:n∈A)is generally
unknown. We thus face a sequential MC problem (∗) with
(
zn(x0:n)←χ(x0)Qn−k=01q(xk+1|xk),
Last time: Estimation in general HMMs
Graphically: "! # "! # "! # "! # "! # "! # Yk−1 Yk Yk+1 Xk−1 Xk Xk+1 - - - -6 6 6 ... ... (Markov chain) (Observations)Yk|Xk=xk∼g(yk|xk) (Observation density) Xk+1|Xk=xk∼q(xk+1|xk) (Transition density)
Last time: Estimation in general HMMs
In an HMM, thesmoothing distribution fn(x0:n|y0:n) is the
conditional distribution of a setX0:nof hidden states given Y0:n=y0:n.
Theorem (Smoothing distribution) fn(x0:n|y0:n) =
χ(x0)g(y0|x0)Qnk=1g(yk|xk)q(xk|xk−1)
Ln(y0:n)
,
where
Ln(y0:n) =density of the observationsy0:n = Z · · · Z χ(x0)g(y0|x0) n Y k=1 g(yk|xk)q(xk|xk−1)dx0· · ·dxn.
Last time: Estimation in general HMMs
Assume that we want to compute, online forn= 0,1,2, . . . τn=E(φ(X0:n)|Y0:n=y0:n) = Z · · · Z φ(x0:n)fn(x0:n|y0:n)dx0· · ·dxn = Z · · · Z φ(x0:n) χ(x0)g(y0|x0)Qnk=1g(yk|xk)q(xk|xk−1) Ln(y0:n) dx0· · ·dxn,
whereLn(y0:n)(= obscene integral) is generally unknown. We
thus face a sequential MC problem (∗) with
(
zn(x0:n)←χ(x0)g(y0|x0)Qnk=1g(yk|xk)q(xk|xk−1),
We are here
−→ •
1 Last time: Sequential MC problems
2 Sequential Monte Carlo (SMC) methods
Overview
We are here
−→ •
1 Last time: Sequential MC problems
2 Sequential Monte Carlo (SMC) methods
Overview
Sequential Monte Carlo (SMC) methods
It is natural to aim at solving the problem using usual self-normalized IS.
However, the generated samples(X0:in, ωn(X0:in))should be such
that
having (X0:in, ωn(X0:in)), the next sample (Xi
0:n+1, ωn+1(X0:in+1)) is easily generated with a complexity
that does not increase with n(online sampling). the approximation remains stable asn increases. We call each drawXi
0:n= (X0i, . . . , Xni) aparticle. Moreover, we
denote importance weights by
We are here
−→ •
1 Last time: Sequential MC problems
2 Sequential Monte Carlo (SMC) methods
Overview
Sequential importance sampling (SIS)
We proceed recursively. Assume that we have generated particles
(X0:in) fromgn(x0:n) so that N X i=1 ωin PN `=1ω`n h(X0:in)≈Efn(φ(X0:n)), where, as usual,ωi n=ωn(X0:in) =zn(X0:in)/gn(X0:in). Key trick: Choose an instrumental distribution satisfying
SIS (cont.)
Consequently,X0:in+1 andωin+1 can be generated by keeping the previousX0:in,
simulating Xni+1 ∼gn+1(xn+1|X0:in), settingXi 0:n+1 = (Xni+1, X0:in), and computing ωni+1= zn+1(X i 0:n+1) gn+1(X0:in+1) = zn+1(X i 0:n+1) zn(X0:in)gn+1(Xni+1|X0:in) × zn(X i 0:n) gn(X0:in) = zn+1(X i 0:n+1) zn(X0:in)gn+1(Xni+1|X0:in) ×ωin.
SIS (cont.)
Voilà, the sample(X0:in+1, ωni+1)can now be used to approximate
Efn+1(φ(X0:n+1))!
So, by running the SIS algorithm, we have updated an approximation N X i=1 ωni PN `=1ωn` h(X0:in)≈Efn(φ(X0:n)) to an approximation N X i=1 ωin+1 PN `=1ω`n+1 h(X0:in+1)≈Efn+1(φ(X0:n+1))
by only adding a componentXni+1 toX0:in and sequentially updating the weights.
SIS: Pseudo code
for i= 1→N do draw Xi 0 ∼g0 set ω0i = z0(X0i) g0(X0i) end for return (X0i, ωi0) for k= 0,1,2, . . .do for i= 1→N do draw Xki+1 ∼gk+1(xk+1|X0:in) set X0:ik+1 ←(Xki+1, X0:ik) set ωki+1 ← zk+1(X0:ik+1) zk(Xi0:k)gk+1(Xki+1|X i 0:k) ×ωki end for return (X0:ik+1, ωik+1) end forExample: REA reconsidered
We consider again the example of REA for Markov chains (X=R):
τn=E(φ(X0:n)|a≥X`,∀`= 0, . . . , n) = Z (a,∞)n+1 φ(x0:n) χ(x0)Qn−k=01q(xk+1|xk) P(a≥X`,∀`) | {z } =zn(x0:n)/cn dx0:n.
Choosegk+1(xk+1|x0:k)to be the conditional density of Xk+1
givenXk andXk+1 ≥a: gk+1(xk+1|x0:k) ={cf. HA1, Problem 1}= q(xk+1|xk) R∞ a q(z|xk)dz .
Example: REA
This implies that
gn(x0:n) = χ(x0) R∞ a χ(x0)dx0 n−1 Y k=0 q(xk+1|xk) R∞ a q(z|xk)dz .
In addition, the weights are updated as
ωik+1= zk+1(X i 0:k+1) zk(X0:ik)gk+1(Xki+1|X0:ik) ×ωik = Qk `=0q(X`i+1|X`i) Qk−1 `=0q(X`i+1|X`i)× q(Xki+1|Xki) R∞ a q(z|X i k)dz ×ωik= Z ∞ a q(z|Xki)dz×ωki.
Example: REA; Matlab implementation for
AR(1) process with Gaussian noise
% design of instrumental distribution:
int = @(x) 1 − normcdf(a,alpha*x,sigma);
trunk_td_rnd = @(x) ... % use HA1, Problem 1, to draw from % the truncated transition density;
% SIS: % x = starting position.
part = a*ones(N,1); % initialization in a w = ones(N,1);
for k = 1:(n − 1), % main loop part_mut = trunk_td_rnd(part); w = w.*int(part);
part = part_mut;
end
REA: Importance weight distribution
/
Serious drawback of SIS: the importance weights degenerate!...−150 −10 −5 0 500 1000 n = 1 −150 −10 −5 0 500 1000 n = 10 −150 −10 −5 0 50 100 n = 100
What’s next?
Weight degeneration is a universal problem with the SIS method and is due to the fact that the particle weights are generated through subsequent multiplications.
This drawback prevented—during several decades—the SIS method from being practically useful.
Next week we will discuss an efficient solution to this problem: SIS with resampling.