Aim: produce random variables for given distribution Inverse Method
Let F be the distribution function of an univariate distribution and let F −1 (y) = inf{x|F (x) ≥ y} (generalized inverse of F ).
For a uniformly distributed random variable U ∼ U(0, 1), let X = F −1 (U ).
Then X has distribution function F :
P(X ≤ x) = P(F −1 (U ) ≤ x) = P(U ≤ F (x)) = F (x).
Example:
For the exponential distribution, we have F (x) = 1 − exp(−θx). Thus X = F −1 (U ) = − 1
θ log(1 − U )
is exponentially distributed with parameter θ.
Remarks:
The result shows that the generation of sequences of random variables (usually iid) for some given distribution is based on the the production of uniform random variables.
Since the algorithms for generating uniform random variables are deter-
Generating Random Numbers
Acceptance-Rejection Method
Let f be the density of some univariate distribution we want to sample from. Suppose the f (·) is majorized by c g(·),
f (y) ≤ c g(y),
for some simple probability density g and some constant c > 1.
The idea of acceptance-rejection is to sample proposals X from the simpler (simulationwise) density g and then to reject some proposals which are likely to be overrepresented in the sample:
◦ Sample X ∼ g(x)
◦ Sample U ∼ U (0, 1)
◦ Accept X if U ≤ f (X) c g(X) .
Example: Non-standard prior for binomial parameter Suppose that Y is binomially distributed
Y ∼ Bin(n, θ)
with non-conjugate prior distribution π(θ) = 4 1
2 − θ − 1
2
, θ ∈ [0, 1].
The posterior distribution (conditional on data Y = y) satisfies π(θ|y) ∼ θ y (1 − θ) n−y 1
2 − θ − 1
2
, θ ∈ [0, 1].
Example: Non-standard prior for binomial parameter (contd)
Since 1 2 − |θ − 1 2 | ≤ 1 2 − 2 (θ − 1 2 ) 2 = 2 θ (1 − θ) we get for some c > 0 π(θ|y) ≤ c θ y+1 (1 − θ) n−y+1
The right side is proportional to the density of a beta distribution with parameters y + 2 and n − y + 2. This suggests the following sampling scheme:
◦ Define f (θ) = θ y (1 − θ) n−y 1
2 − θ − 1
2
g(θ) = 2 θ y+1 (1 − θ) n−y+1
◦ Sample θ ∼ Beta(y + 2, n − y + 2) U ∼ U(0, 1)
◦ Accept θ if U ≤ f (θ) g(θ) .
◦ Note: We did not need to compute the proportionality factor Z
Θ
f (y|θ) π(θ) dθ,
since this cancels in the ratio f (θ) g(θ) .
2.0 2.5 3.0
cg ( θ )
1200
Acceptance−Rejection for Bayesian Inference with Non−Standard Prior 1600
Generating Random Numbers
For other distributions (e.g. the normal) exist special methods.
In this course, we will only use the built-in generators in R:
N (µ, σ 2 ) rnorm(n,mean=0,sd=1) U[min, max] runif(n,min=0,max=1) Beta(a, b) rbeta(n,a,b)
Bin(s, p) rbinom(n,size,prob) Cauchy(α, σ)) rnorm(n,loc=0,scale=1) χ 2 n (δ) rchisq(n,df,ncp=0) Exp(rate) rnorm(n,rate=1)
F m,n rf(n,df1,df2)
Γ(a, s) rgamma(n,shape,rate=1,scale=1/rate) Geom(p) rgeom(n,prob)
H(m, n, k) rhyper(nn,m,n,k)
log-normal(µ, σ 2 ) rlnorm(n,mean=0,sd=1)) Logistic(µ, σ 2 ) rlogis(n,loc=0,scale=1) NegBinom(s, p) rnbinom(n,size,prob,mu) Poisson(λ) rpois(n,lambda)
t n rt(n,df)
Weibull(a, b) rweibull(n,shape,scale=1)
R also provides functions for calculating the density (dF), the distribution
function (pF) and quantiles (qF), where F is the name of the distribution
as in the above commands.
Aim: Evaluate expectation E(h(Y )) =
Z
h(y) f (y) dy, (1)
where Y is some (possibly high-dimensional) random variable with distri- bution defined by f (y).
Examples:
◦ Suppose ˆ θ = ˆ θ(Y ) is an estimator for some parameter θ. Quantities of interest are the bias and the standard deviation of ˆ θ,
bias(ˆ θ) = E(ˆθ) − θ, σ(ˆ θ) = E(ˆθ − E(ˆθ)) 2 1 2
.
◦ In cases, where the estimate ˆ θ is obtained by an iterative estimation procedures (e.g. by Newton-Raphson), the estimator ˆ θ(Y ) cannot be written in closed form and the integral (1) cannot be computed by numerical integration.
◦ For two normally distributed samples Y 11 , . . . , Y n 1 1 and Y 12 , . . . , Y n 2 2 , the hypothesis of equal means can be tested by the two-sample t test with test statistic
T =
Y ¯ 1 − ¯ Y 2 q s 2 1
n 1 + n s 2 2
2
.
The p-value of the test is defined as
P(T (Y ) > t) = E (1
Monte Carlo Methods
Monte Carlo approach:
◦ Draw sample Y (1) , . . . , Y (n) iid ∼ f (y).
◦ Estimate expectation by 1
n
n
P
t=1
h(Y (t) )
(Monte Carlo integration)
◦ For independent samples, the Law of Large Numbers yields 1
n
n
P
t=1
h(Y (t) ) → E(h(Y )) as n → ∞.
Example: Two-sample t test in R
N<-10000 #number of MC repititions
n1<-8 #sample size
n2<-4
Y1<-rgamma(N*n1,1,1) #sample from gamma distribution Y2<-rgamma(N*n2,2,2)
#Y1<-rnorm(N*n1,0,1) #alternatively:
#Y2<-rnorm(N*n2,0,2) #sample from normal distribution Y<-c(Y1,Y2)
dim(Y)<-c(N,n1+n2)
tstat<-function(Y,n1,n2) { Y1<-Y[1:n1]
Y2<-Y[(n1+1):(n1+n2)]
#two-sample t test statistic
T<-(mean(Y1)-mean(Y2))/sqrt(var(Y1)/n1+var(Y2)/n2)
#Satterthwaite approximation of degrees of freedom
df<-(var(Y1)/n1+var(Y2)/n2)^2/((var(Y1)/n1)^2/(n1-1)+(var(Y2)/n2)^2/(n2-1)) return(c(T,df))
}
#calculate test statistic for N samples R<-apply(Y,1,tstat,n1,n2)
#estimate significance level
a<-mean(ifelse(abs(R[1,])>qt(0.975,R[2,]),1,0))
a
Suppose Y 1 , . . . , Y n ∼ P θ and ˆ θ = S(Y ) is an estimator for θ.
Aim: Determine variance of ˆ θ.
Problem: Iterative algorithms such as the EM algorithm often do not give analytic expressions for the variance of the estimator.
Idea: Simulate from the distribution P θ to obtain samples Y (b) = (Y 1 (b) , . . . , Y n (b) ), b = 1, . . . , B
and for each sample an estimate θ ˆ (b) = S(Y (b) ).
Then the standard error of ˆ θ = S(Y ) can be estimated by ˆ
σ 2 (ˆ θ) = 1
B
B
P
b=1
θ ˆ (b) − ¯ θ 2
.
Problem: We cannot sample from P θ since θ is unknown.
Idea: Approximate P θ by P θ ˆ Parametric bootstrap:
◦ Estimate θ by ˆ θ = S(Y )
◦ For b = 1, . . . , B, simulate bootstrap replications Y j ∗ (b) iid ∼ P θ ˆ .
◦ Compute bootstrap estimate ˆ θ ∗ (b) from the bootstrap sample Y ∗ (b) .
◦ Estimate the variance of ˆ θ = S(Y ) by
Importance Sampling
Aim: Reduction of variance Rewrite E(h(Y )) with Y ∼ f as
Z
h(y) f (y) dy =
Z h(y) f (y)
g(y) g(y) dy.
Thus if Z (1) , . . . , Z (n) ∼ g then 1
n
n
X
t=1
h(Z (t) ) f (Z (t) ) g(Z (t) )
offers an alternative approximation for E(h(Y )).
The variance of the Monte Carlo estimate for E(h(Y )) is minimized if
|h(y)| f (y)
g(y) ≈ constant
Example:
Suppose we want to approximate p = P(Y ≥ c) for (i) Y ∼ N (0, 1)
(ii) Y ∼ Cauchy(0, 1)
An intuitive Monte Carlo approximation is p n = 1
n
n
P
t=1
1 (c,∞) Y (t)
where Y (t) iid ∼ N (0, 1) resp. Y (t) iid ∼ Cauchy(0, 1).
Since p n is the mean of n Bernoulli random variables, its relative error is
pvar(p n )
p =
r 1 − p
p n .
Alternative approach: Let Z (t) be exponentially distributed on (c, ∞), Z (t) ∼ g(z) = exp(c − z) 1 (c,∞) (z).
Then p can be approximated by p 0 n = 1
n
n
P
t=1
f Z (t) g Z (t) ,
where f is the density of a (i) normal or (ii) Cauchy distribution.
0 10 20 30 40 50
0.0 0.1 0.2 0.3 0.4
t
pn [%]p
nsolid p’
ndashed
0 10 20 30 40 50
6 8 10 12 14
t
pn [%]p
nsolid p’
ndashed
3 4 5 6 7 8 9 10
0 1 2 3 4
z
f(z)/g(z) [x1000]
3 4 5 6 7 8 9 10
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
z
f(z)/g(z)