Generating Random Numbers

(1)

Aim: produce random variables for given distribution Inverse Method

Let F be the distribution function of an univariate distribution and let F ⁻¹ (y) = inf{x|F (x) ≥ y} (generalized inverse of F ).

For a uniformly distributed random variable U ∼ U(0, 1), let X = F ⁻¹ (U ).

Then X has distribution function F :

P(X ≤ x) = P(F ⁻¹ (U ) ≤ x) = P(U ≤ F (x)) = F (x).

Example:

For the exponential distribution, we have F (x) = 1 − exp(−θx). Thus X = F ⁻¹ (U ) = − 1

θ log(1 − U )

is exponentially distributed with parameter θ.

Remarks:

The result shows that the generation of sequences of random variables (usually iid) for some given distribution is based on the the production of uniform random variables.

Since the algorithms for generating uniform random variables are deter-

(2)

Generating Random Numbers

Acceptance-Rejection Method

Let f be the density of some univariate distribution we want to sample from. Suppose the f (·) is majorized by c g(·),

f (y) ≤ c g(y),

for some simple probability density g and some constant c > 1.

The idea of acceptance-rejection is to sample proposals X from the simpler (simulationwise) density g and then to reject some proposals which are likely to be overrepresented in the sample:

◦ Sample X ∼ g(x)

◦ Sample U ∼ U (0, 1)

◦ Accept X if U ≤ f (X) c g(X) .

Example: Non-standard prior for binomial parameter Suppose that Y is binomially distributed

Y ∼ Bin(n, θ)

with non-conjugate prior distribution π(θ) = 4 ¹

2 − θ − ¹

2 , θ ∈ [0, 1].

The posterior distribution (conditional on data Y = y) satisfies π(θ|y) ∼ θ ^y (1 − θ) ^n−y ¹

2 − θ − ¹

2 , θ ∈ [0, 1].

(3)

Example: Non-standard prior for binomial parameter (contd)

Since ¹ ₂ − |θ − ¹ ₂ | ≤ ¹ ₂ − 2 (θ − ¹ ₂ ) ² = 2 θ (1 − θ) we get for some c > 0 π(θ|y) ≤ c θ ^y+1 (1 − θ) ^n−y+1

The right side is proportional to the density of a beta distribution with parameters y + 2 and n − y + 2. This suggests the following sampling scheme:

◦ Define f (θ) = θ ^y (1 − θ) ^n−y ¹

2 − θ − ¹

2 g(θ) = 2 θ ^y+1 (1 − θ) ^n−y+1

◦ Sample θ ∼ Beta(y + 2, n − y + 2) U ∼ U(0, 1)

◦ Accept θ if U ≤ f (θ) g(θ) .

◦ Note: We did not need to compute the proportionality factor Z

Θ

f (y|θ) π(θ) dθ,

since this cancels in the ratio ^{f (θ)} _g(θ) .

2.0 2.5 3.0

cg ( θ )

1200

Acceptance−Rejection for Bayesian Inference with Non−Standard Prior 1600

(4)

Generating Random Numbers

For other distributions (e.g. the normal) exist special methods.

In this course, we will only use the built-in generators in R:

N (µ, σ ² ) rnorm(n,mean=0,sd=1) U[min, max] runif(n,min=0,max=1) Beta(a, b) rbeta(n,a,b)

Bin(s, p) rbinom(n,size,prob) Cauchy(α, σ)) rnorm(n,loc=0,scale=1) χ ² _n (δ) rchisq(n,df,ncp=0) Exp(rate) rnorm(n,rate=1)

F _m,n rf(n,df1,df2)

Γ(a, s) rgamma(n,shape,rate=1,scale=1/rate) Geom(p) rgeom(n,prob)

H(m, n, k) rhyper(nn,m,n,k)

log-normal(µ, σ ² ) rlnorm(n,mean=0,sd=1)) Logistic(µ, σ ² ) rlogis(n,loc=0,scale=1) NegBinom(s, p) rnbinom(n,size,prob,mu) Poisson(λ) rpois(n,lambda)

t _n rt(n,df)

Weibull(a, b) rweibull(n,shape,scale=1)

R also provides functions for calculating the density (dF), the distribution

function (pF) and quantiles (qF), where F is the name of the distribution

as in the above commands.

(5)

Aim: Evaluate expectation E(h(Y )) =

Z

h(y) f (y) dy, (1)

where Y is some (possibly high-dimensional) random variable with distri- bution defined by f (y).

Examples:

◦ Suppose ˆ θ = ˆ θ(Y ) is an estimator for some parameter θ. Quantities of interest are the bias and the standard deviation of ˆ θ,

bias(ˆ θ) = E(ˆθ) − θ, σ(ˆ θ) = E(ˆθ − E(ˆθ)) ² ¹ ₂

.

◦ In cases, where the estimate ˆ θ is obtained by an iterative estimation procedures (e.g. by Newton-Raphson), the estimator ˆ θ(Y ) cannot be written in closed form and the integral (1) cannot be computed by numerical integration.

◦ For two normally distributed samples Y ₁₁ , . . . , Y _n ₁ ₁ and Y ₁₂ , . . . , Y _n ₂ ₂ , the hypothesis of equal means can be tested by the two-sample t test with test statistic

T =

Y ¯ ₁ − ¯ Y ₂ q s ² ₁

n 1 + _n ^s ² ²

2 .

The p-value of the test is defined as

P(T (Y ) > t) = E (1

(6)

Monte Carlo Methods

Monte Carlo approach:

◦ Draw sample Y ⁽¹⁾ , . . . , Y ^{(n) iid} ∼ f (y).

◦ Estimate expectation by 1

n

P

t=1

h(Y ^(t) )

(Monte Carlo integration)

◦ For independent samples, the Law of Large Numbers yields 1

n

P

t=1

h(Y ^(t) ) → E(h(Y )) as n → ∞.

Example: Two-sample t test in R

N<-10000 #number of MC repititions

n1<-8 #sample size

n2<-4

Y1<-rgamma(Nn1,1,1) #sample from gamma distribution Y2<-rgamma(Nn2,2,2)

#Y1<-rnorm(N*n1,0,1) #alternatively:

#Y2<-rnorm(N*n2,0,2) #sample from normal distribution Y<-c(Y1,Y2)

dim(Y)<-c(N,n1+n2)

tstat<-function(Y,n1,n2) { Y1<-Y[1:n1]

Y2<-Y[(n1+1):(n1+n2)]

#two-sample t test statistic

T<-(mean(Y1)-mean(Y2))/sqrt(var(Y1)/n1+var(Y2)/n2)

#Satterthwaite approximation of degrees of freedom

df<-(var(Y1)/n1+var(Y2)/n2)^2/((var(Y1)/n1)^2/(n1-1)+(var(Y2)/n2)^2/(n2-1)) return(c(T,df))

}

#calculate test statistic for N samples R<-apply(Y,1,tstat,n1,n2)

#estimate significance level

a<-mean(ifelse(abs(R[1,])>qt(0.975,R[2,]),1,0))

a

(7)

Suppose Y ₁ , . . . , Y _n ∼ P _θ and ˆ θ = S(Y ) is an estimator for θ.

Aim: Determine variance of ˆ θ.

Problem: Iterative algorithms such as the EM algorithm often do not give analytic expressions for the variance of the estimator.

Idea: Simulate from the distribution P _θ to obtain samples Y ^(b) = (Y ₁ ^(b) , . . . , Y _n ^(b) ), b = 1, . . . , B

and for each sample an estimate θ ˆ ^(b) = S(Y ^(b) ).

Then the standard error of ˆ θ = S(Y ) can be estimated by ˆ

σ ² (ˆ θ) = ¹

B

P

b=1

θ ˆ ^(b) − ¯ θ 2

.

Problem: We cannot sample from P θ since θ is unknown.

Idea: Approximate P _θ by P _θ _ˆ Parametric bootstrap:

◦ Estimate θ by ˆ θ = S(Y )

◦ For b = 1, . . . , B, simulate bootstrap replications Y _j ^{∗ (b)} ^iid ∼ P _θ _ˆ .

◦ Compute bootstrap estimate ˆ θ ^{∗ (b)} from the bootstrap sample Y ^{∗ (b)} .

◦ Estimate the variance of ˆ θ = S(Y ) by

(8)

Importance Sampling

Aim: Reduction of variance Rewrite E(h(Y )) with Y ∼ f as

Z

h(y) f (y) dy =

Z h(y) f (y)

g(y) g(y) dy.

Thus if Z ⁽¹⁾ , . . . , Z ⁽ⁿ⁾ ∼ g then 1

n

X

t=1

h(Z ^(t) ) f (Z ^(t) ) g(Z ^(t) )

offers an alternative approximation for E(h(Y )).

The variance of the Monte Carlo estimate for E(h(Y )) is minimized if

|h(y)| f (y)

g(y) ≈ constant

Example:

Suppose we want to approximate p = P(Y ≥ c) for (i) Y ∼ N (0, 1)

(ii) Y ∼ Cauchy(0, 1)

An intuitive Monte Carlo approximation is p _n = ¹

n

P

t=1

1 _(c,∞) Y ^(t)

where Y ^{(t) iid} ∼ N (0, 1) resp. Y ^{(t) iid} ∼ Cauchy(0, 1).

Since p _n is the mean of n Bernoulli random variables, its relative error is

pvar(p n )

p =

r 1 − p

p n .

(9)

Alternative approach: Let Z ^(t) be exponentially distributed on (c, ∞), Z ^(t) ∼ g(z) = exp(c − z) 1 _(c,∞) (z).

Then p can be approximated by p ⁰ _n = ¹

n

P

t=1

f Z ^(t) g Z ^(t) ,

where f is the density of a (i) normal or (ii) Cauchy distribution.

0 10 20 30 40 50

0.0 0.1 0.2 0.3 0.4

t

pn [%]

p

n

solid p’

n

dashed

0 10 20 30 40 50

6 8 10 12 14

t

pn [%]

p

n

solid p’

n

dashed

3 4 5 6 7 8 9 10

0 1 2 3 4

z

f(z)/g(z) [x1000]

3 4 5 6 7 8 9 10

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

z

f(z)/g(z)

(10)

Markov Chain Monte Carlo

Suppose that Y = (Y 1 , . . . , Y d ) ^T has density f (y).

Aim: Approximation E(h(Y )) ≈ 1

n

P

t=1

h(Y ^(t) ) with Y ⁽¹⁾ , . . . , Y ^{(n) iid} ∼ f (y).

Problem: Independent sampling from a multivariate distribution is often difficult.

Solution:

◦ The Law of Large Numbers 1

n

P

t=1

h(Y ^(t) ) → E(h(Y )) as n → ∞.

still applies if Y ^(t) are (not too) dependent observations with Y ^(t) ∼ f (y).

◦ Idea: Generate a sequence of random numbers Y ^(k) which converges to a dependent sample from the joint distribution f (y)

Y ^(t) ∼ f (y) but NOT Y ^{(t) iid} ∼ f (y).

◦ We will show that this can be accomplished by sampling Y _i ^(t) from the conditional distribution

f y _i

Y ₁ ^(t) , . . . , Y _i−1 ^(t) , Y _i+1 ^(t−1) , . . . , Y _d ^(t−1)

for i = 1, . . . , d

Generating Random Numbers

Aim: produce random variables for given distribution Inverse Method

Let F be the distribution function of an univariate distribution and let F −1 (y) = inf{x|F (x) ≥ y} (generalized inverse of F ).

For a uniformly distributed random variable U ∼ U(0, 1), let X = F −1 (U ).

Then X has distribution function F :

P(X ≤ x) = P(F −1 (U ) ≤ x) = P(U ≤ F (x)) = F (x).

Example:

For the exponential distribution, we have F (x) = 1 − exp(−θx). Thus X = F −1 (U ) = − 1

θ log(1 − U )

is exponentially distributed with parameter θ.

Remarks:

The result shows that the generation of sequences of random variables (usually iid) for some given distribution is based on the the production of uniform random variables.

Since the algorithms for generating uniform random variables are deter-

Generating Random Numbers

Acceptance-Rejection Method

Let f be the density of some univariate distribution we want to sample from. Suppose the f (·) is majorized by c g(·),

f (y) ≤ c g(y),

for some simple probability density g and some constant c > 1.

The idea of acceptance-rejection is to sample proposals X from the simpler (simulationwise) density g and then to reject some proposals which are likely to be overrepresented in the sample:

◦ Sample X ∼ g(x)

◦ Sample U ∼ U (0, 1)

◦ Accept X if U ≤ f (X) c g(X) .

Example: Non-standard prior for binomial parameter Suppose that Y is binomially distributed

Y ∼ Bin(n, θ)

with non-conjugate prior distribution π(θ) = 4  1

2 − θ − 1

2

 , θ ∈ [0, 1].

The posterior distribution (conditional on data Y = y) satisfies π(θ|y) ∼ θ y (1 − θ) n−y  1

2 − θ − 1

2

 , θ ∈ [0, 1].

Example: Non-standard prior for binomial parameter (contd)

Since 1 2 − |θ − 1 2 | ≤ 1 2 − 2 (θ − 1 2 ) 2 = 2 θ (1 − θ) we get for some c > 0 π(θ|y) ≤ c θ y+1 (1 − θ) n−y+1

The right side is proportional to the density of a beta distribution with parameters y + 2 and n − y + 2. This suggests the following sampling scheme:

◦ Define f (θ) = θ y (1 − θ) n−y 1

2 − θ − 1

2

g(θ) = 2 θ y+1 (1 − θ) n−y+1

◦ Sample θ ∼ Beta(y + 2, n − y + 2) U ∼ U(0, 1)

◦ Accept θ if U ≤ f (θ) g(θ) .

◦ Note: We did not need to compute the proportionality factor Z

Θ

f (y|θ) π(θ) dθ,

since this cancels in the ratio f (θ) g(θ) .

2.0 2.5 3.0

cg ( θ )

1200

Acceptance−Rejection for Bayesian Inference with Non−Standard Prior 1600

Generating Random Numbers

For other distributions (e.g. the normal) exist special methods.

In this course, we will only use the built-in generators in R:

N (µ, σ 2 ) rnorm(n,mean=0,sd=1) U[min, max] runif(n,min=0,max=1) Beta(a, b) rbeta(n,a,b)

Bin(s, p) rbinom(n,size,prob) Cauchy(α, σ)) rnorm(n,loc=0,scale=1) χ 2 n (δ) rchisq(n,df,ncp=0) Exp(rate) rnorm(n,rate=1)

F m,n rf(n,df1,df2)

Γ(a, s) rgamma(n,shape,rate=1,scale=1/rate) Geom(p) rgeom(n,prob)

H(m, n, k) rhyper(nn,m,n,k)

log-normal(µ, σ 2 ) rlnorm(n,mean=0,sd=1)) Logistic(µ, σ 2 ) rlogis(n,loc=0,scale=1) NegBinom(s, p) rnbinom(n,size,prob,mu) Poisson(λ) rpois(n,lambda)

t n rt(n,df)

Weibull(a, b) rweibull(n,shape,scale=1)

R also provides functions for calculating the density (dF), the distribution

function (pF) and quantiles (qF), where F is the name of the distribution

as in the above commands.

Aim: Evaluate expectation E(h(Y )) =

Z

h(y) f (y) dy, (1)

where Y is some (possibly high-dimensional) random variable with distri- bution defined by f (y).

Examples:

◦ Suppose ˆ θ = ˆ θ(Y ) is an estimator for some parameter θ. Quantities of interest are the bias and the standard deviation of ˆ θ,

bias(ˆ θ) = E(ˆθ) − θ, σ(ˆ θ) = E(ˆθ − E(ˆθ)) 2 1 2

.

◦ In cases, where the estimate ˆ θ is obtained by an iterative estimation procedures (e.g. by Newton-Raphson), the estimator ˆ θ(Y ) cannot be written in closed form and the integral (1) cannot be computed by numerical integration.

◦ For two normally distributed samples Y 11 , . . . , Y n 1 1 and Y 12 , . . . , Y n 2 2 , the hypothesis of equal means can be tested by the two-sample t test with test statistic

T =

Y ¯ 1 − ¯ Y 2 q s 2 1

n 1 + n s 2 2

2

.

The p-value of the test is defined as

P(T (Y ) > t) = E (1

Let F be the distribution function of an univariate distribution and let F ⁻¹ (y) = inf{x|F (x) ≥ y} (generalized inverse of F ).

For a uniformly distributed random variable U ∼ U(0, 1), let X = F ⁻¹ (U ).

P(X ≤ x) = P(F ⁻¹ (U ) ≤ x) = P(U ≤ F (x)) = F (x).

For the exponential distribution, we have F (x) = 1 − exp(−θx). Thus X = F ⁻¹ (U ) = − 1

with non-conjugate prior distribution π(θ) = 4 ¹

2 − θ − ¹

, θ ∈ [0, 1].

The posterior distribution (conditional on data Y = y) satisfies π(θ|y) ∼ θ ^y (1 − θ) ^n−y ¹

2 − θ − ¹

, θ ∈ [0, 1].

Since ¹ ₂ − |θ − ¹ ₂ | ≤ ¹ ₂ − 2 (θ − ¹ ₂ ) ² = 2 θ (1 − θ) we get for some c > 0 π(θ|y) ≤ c θ ^y+1 (1 − θ) ^n−y+1

◦ Define f (θ) = θ ^y (1 − θ) ^n−y ¹

2 − θ − ¹

g(θ) = 2 θ ^y+1 (1 − θ) ^n−y+1

since this cancels in the ratio ^{f (θ)} _g(θ) .

N (µ, σ ² ) rnorm(n,mean=0,sd=1) U[min, max] runif(n,min=0,max=1) Beta(a, b) rbeta(n,a,b)

Bin(s, p) rbinom(n,size,prob) Cauchy(α, σ)) rnorm(n,loc=0,scale=1) χ ² _n (δ) rchisq(n,df,ncp=0) Exp(rate) rnorm(n,rate=1)

F _m,n rf(n,df1,df2)

log-normal(µ, σ ² ) rlnorm(n,mean=0,sd=1)) Logistic(µ, σ ² ) rlogis(n,loc=0,scale=1) NegBinom(s, p) rnbinom(n,size,prob,mu) Poisson(λ) rpois(n,lambda)

t _n rt(n,df)

bias(ˆ θ) = E(ˆθ) − θ, σ(ˆ θ) = E(ˆθ − E(ˆθ)) ² ¹ ₂

◦ For two normally distributed samples Y ₁₁ , . . . , Y _n ₁ ₁ and Y ₁₂ , . . . , Y _n ₂ ₂ , the hypothesis of equal means can be tested by the two-sample t test with test statistic

Y ¯ ₁ − ¯ Y ₂ q s ² ₁

n 1 + _n ^s ² ²

◦ Draw sample Y ⁽¹⁾ , . . . , Y ^{(n) iid} ∼ f (y).

h(Y ^(t) )

h(Y ^(t) ) → E(h(Y )) as n → ∞.

Y1<-rgamma(Nn1,1,1) #sample from gamma distribution Y2<-rgamma(Nn2,2,2)

Suppose Y ₁ , . . . , Y _n ∼ P _θ and ˆ θ = S(Y ) is an estimator for θ.

Idea: Simulate from the distribution P _θ to obtain samples Y ^(b) = (Y ₁ ^(b) , . . . , Y _n ^(b) ), b = 1, . . . , B

and for each sample an estimate θ ˆ ^(b) = S(Y ^(b) ).

σ ² (ˆ θ) = ¹

θ ˆ ^(b) − ¯ θ 2

Idea: Approximate P _θ by P _θ _ˆ Parametric bootstrap:

◦ For b = 1, . . . , B, simulate bootstrap replications Y _j ^{∗ (b)} ^iid ∼ P _θ _ˆ .

◦ Compute bootstrap estimate ˆ θ ^{∗ (b)} from the bootstrap sample Y ^{∗ (b)} .

Thus if Z ⁽¹⁾ , . . . , Z ⁽ⁿ⁾ ∼ g then 1

h(Z ^(t) ) f (Z ^(t) ) g(Z ^(t) )