Beta and Gamma Distributions

Academic year: 2021

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Beta and Gamma Distributions

Contents

• Beta distribution
• Gamma Function, Normalization of the Beta Distribution, Beta as a Prior to Bernoulli, Posterior and Predictive Distributions
• A Frequentist View of Bayesian Learning, Variance Decomposition
• Gamma Distribution
• Exponential Distribution
• Chi-Squared Distribution
• Inverse Gamma Distribution
• The Pareto Distribution

Following closely Chris Bishop's PRML book, Chapter 2.

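• The Beta(a, b) distribution with a, b > 0 is defined on x ∈ [0, 1] as follows:

\[ \mathrm{Beta}(x \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1-x)^{b-1} \]

• The expected value, mode and variance of a Beta random variable x with (hyper-)parameters a and b:

\[ \mathbb{E}[x] = \frac{a}{a+b}, \qquad \mathrm{mode}[x] = \frac{a-1}{a+b-2} \;\; (a, b > 1), \qquad \mathrm{var}[x] = \frac{ab}{(a+b)^2 (a+b+1)} \]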

• If a = b = 1, we obtain a uniform distribution.

• If a and b are both less than 1, we get a bimodal distribution with spikes at 0 and 1.

• If a and b are both greater than 1, the distribution is unimodal.

• The gamma function extends the factorial to real numbers:

\[ \Gamma(x) = \int_0^\infty u^{x-1} e^{-u}\, du \]

• With integration by parts:

\[ \Gamma(x+1) = x\,\Gamma(x) \]

• For integer n:

\[ \Gamma(n+1) = n! \]
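The properties of the gamma function listed above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `gamma_integral`, the test point x = 3.5 and the truncation of the integral at u = 60 are illustrative assumptions, not part of the slides.

```python
import math

def gamma_integral(x, upper=60.0, steps=200_000):
    """Midpoint-rule estimate of the integral of u^(x-1) e^(-u) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (x - 1) * math.exp(-u)
    return h * total

x = 3.5  # illustrative non-integer argument

# Recurrence from integration by parts: Gamma(x + 1) = x * Gamma(x)
recurrence_gap = abs(math.gamma(x + 1) - x * math.gamma(x))

# Factorial property for integers: Gamma(n + 1) = n!
factorial_gap = max(abs(math.gamma(n + 1) - math.factorial(n)) for n in range(1, 10))

# Integral definition against the library value
integral_gap = abs(gamma_integral(x) - math.gamma(x))

print(recurrence_gap, factorial_gap, integral_gap)
```

All three gaps should be close to zero, up to floating-point and quadrature error.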

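Beta Distribution: Normalization

• Showing that the Beta(a, b) distribution is normalized correctly is a bit tricky. We need to prove that:

\[ \int_0^1 m^{a-1} (1-m)^{b-1}\, dm = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} \]

• Starting from the product Γ(a)Γ(b), we follow the steps: (a) change the variable y to t = y + x; (b) change the order of integration in the shaded triangular region; and (c) change x to m via x = tm:

\[
\begin{aligned}
\Gamma(a)\,\Gamma(b) &= \int_0^\infty e^{-x} x^{a-1}\, dx \int_0^\infty e^{-y} y^{b-1}\, dy \\
&= \int_0^\infty x^{a-1} \left\{ \int_x^\infty e^{-t} (t-x)^{b-1}\, dt \right\} dx && \text{(a)} \\
&= \int_0^\infty e^{-t} \left\{ \int_0^t x^{a-1} (t-x)^{b-1}\, dx \right\} dt && \text{(b)} \\
&= \int_0^\infty e^{-t}\, t^{a+b-1}\, dt \int_0^1 m^{a-1} (1-m)^{b-1}\, dm && \text{(c)} \\
&= \Gamma(a+b) \int_0^1 m^{a-1} (1-m)^{b-1}\, dm
\end{aligned}
\]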
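The normalization identity for the Beta distribution can also be checked numerically. This is a minimal sketch, assuming a simple midpoint rule and parameter values a, b ≥ 1 so that the integrand is bounded; the function names `beta_integral` and `beta_function` are hypothetical helpers introduced here.

```python
import math

def beta_integral(a, b, steps=200_000):
    """Midpoint-rule estimate of the integral of m^(a-1) (1-m)^(b-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        m = (i + 0.5) * h
        total += m ** (a - 1) * (1 - m) ** (b - 1)
    return h * total

def beta_function(a, b):
    """Closed form Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# Illustrative parameter pairs, both with a, b >= 1
for a, b in [(2, 3), (8, 4)]:
    print(a, b, beta_integral(a, b), beta_function(a, b))
```

For each pair the two printed values should agree to many digits, which is what makes the prefactor Γ(a+b)/(Γ(a)Γ(b)) the correct normalizing constant.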

Beta Distribution

[Figure: pdf of Beta(a, b) plotted against x on [0, 1], in four panels: Beta(0.1, 0.1), Beta(1, 1), Beta(2, 3), Beta(8, 4).]
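The four Beta densities shown in the panels can be tabulated with a few lines of code. This is a minimal sketch that evaluates the standard Beta pdf through `math.lgamma`; the helper name `beta_pdf` and the evaluation grid are assumptions made for illustration.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density at a point x strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

panels = [(0.1, 0.1), (1, 1), (2, 3), (8, 4)]  # the four cases in the figure
grid = [0.05, 0.25, 0.5, 0.75, 0.95]           # illustrative evaluation points

for a, b in panels:
    row = "  ".join(f"{beta_pdf(x, a, b):6.3f}" for x in grid)
    print(f"Beta({a}, {b}): {row}")
```

The rows reproduce the qualitative shapes: spikes near 0 and 1 for Beta(0.1, 0.1), a flat line for Beta(1, 1), and a single interior mode for Beta(2, 3) and Beta(8, 4).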

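• Assuming a Bernoulli likelihood and a Beta prior, we derive the posterior as follows. Let μ denote the probability of x = 1, and let the data D contain m observations of x = 1 and l = N − m observations of x = 0. Then:

\[ p(\mu \mid D) \propto \mu^{m} (1-\mu)^{l} \cdot \mu^{a-1} (1-\mu)^{b-1} = \mu^{m+a-1} (1-\mu)^{l+b-1} \]

• This is also a Beta distribution:

\[ p(\mu \mid D) = \mathrm{Beta}(\mu \mid a+m,\, b+l) = \frac{\Gamma(a+b+N)}{\Gamma(a+m)\,\Gamma(b+l)}\, \mu^{m+a-1} (1-\mu)^{l+b-1} \]

• a and b are the effective number of observations of x = 1 and x = 0, respectively, introduced by the prior (they do not have to be integers).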

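Posterior Mean and Variance

• From the properties of the Beta distribution, for the posterior Beta(μ | a + m, b + l) obtained after m observations of x = 1 and l = N − m observations of x = 0, we compute:

\[ \mathbb{E}[\mu \mid D] = \frac{a+m}{a+b+N}, \qquad \mathrm{var}[\mu \mid D] = \frac{(a+m)(b+l)}{(a+b+N)^2 (a+b+N+1)} \]

• The posterior mean always lies between the prior mean a/(a+b) and the maximum likelihood estimate (MLE) m/N.

• This can be shown easily by noticing that:

\[ \mathbb{E}[\mu \mid D] = \lambda\, \frac{a}{a+b} + (1-\lambda)\, \frac{m}{N}, \qquad \lambda = \frac{a+b}{a+b+N} \in (0, 1) \]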
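The statement that the posterior mean lies between the prior mean and the MLE can be illustrated numerically. This is a minimal sketch for a Beta(a, b) prior updated with m observations of x = 1 and l observations of x = 0; the function names and the particular values of a, b, m and l are illustrative assumptions.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

a, b = 2.0, 2.0   # prior pseudo-counts (illustrative)
m, l = 7, 3       # observed counts of x = 1 and x = 0 (illustrative)
N = m + l

# Conjugate update: the posterior is Beta(a + m, b + l)
a_post, b_post = a + m, b + l

prior_mean = beta_mean(a, b)
mle = m / N
post_mean = beta_mean(a_post, b_post)
post_var = beta_var(a_post, b_post)

# Posterior mean as a convex combination of prior mean and MLE
lam = (a + b) / (a + b + N)
blend = lam * prior_mean + (1 - lam) * mle

print(prior_mean, mle, post_mean, blend, post_var)
```

The weight λ on the prior mean shrinks as N grows, so with more data the posterior mean moves toward the MLE.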

Posterior Distribution

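• We can now compute the probability that the next coin flip is heads (x = 1), i.e. the predictive distribution. With μ the probability of heads and the posterior Beta(μ | a + m, b + l), where m and l = N − m are the observed numbers of heads and tails:

\[ p(x = 1 \mid D) = \int_0^1 p(x = 1 \mid \mu)\, p(\mu \mid D)\, d\mu = \int_0^1 \mu\, p(\mu \mid D)\, d\mu = \mathbb{E}[\mu \mid D] = \frac{a+m}{a+b+N} \]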

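Properties of the Posterior Distribution

• Consider the case of infinite data (N → ∞), with m observations of x = 1 and l = N − m observations of x = 0, so that the counts dominate the prior parameters a and b. The posterior mean and variance become:

\[ \mathbb{E}[\mu \mid D] = \frac{a+m}{a+b+N} \to \frac{m}{N} = \mu_{ML}, \qquad \mathrm{var}[\mu \mid D] = \frac{(a+m)(b+l)}{(a+b+N)^2 (a+b+N+1)} \approx \frac{m\,l}{N^3} = \frac{\mu_{ML}(1-\mu_{ML})}{N} \to 0 \]

• For N → ∞, the distribution, as expected, spikes around the maximum likelihood estimate with zero variance (i.e. the uncertainty decreases as N → ∞). Is this a general property?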
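The large-N behaviour of the Beta posterior can be tabulated directly. This is a minimal sketch that assumes a fixed fraction of 70% ones in the data and a Beta(2, 2) prior; both choices are illustrative and are not taken from the slides.

```python
a, b = 2.0, 2.0  # prior parameters (illustrative)

rows = []
for N in (10, 100, 1_000, 10_000, 100_000):
    m = 7 * N // 10          # number of observations of x = 1 (fixed 70% fraction)
    l = N - m                # number of observations of x = 0
    mle = m / N
    post_mean = (a + m) / (a + b + N)
    post_var = (a + m) * (b + l) / ((a + b + N) ** 2 * (a + b + N + 1))
    rows.append((N, mle, post_mean, post_var))
    print(f"N={N:>6}  MLE={mle:.4f}  mean={post_mean:.6f}  var={post_var:.3e}")

gaps = [abs(mean - mle) for _, mle, mean, _ in rows]
variances = [var for *_, var in rows]
```

Both the gap between the posterior mean and the MLE and the posterior variance shrink as N grows, with the variance decaying roughly like 1/N.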

A Frequentist View of Bayesian Learning

• Consider inference of parameter q using data D. We expect that, because the posterior p(q|D) incorporates the information from the data D, it will imply less variability for q than the prior p(q).

• We have the following identities:

\[ \mathbb{E}[q] = \mathbb{E}\big[\mathbb{E}[q \mid D]\big] \]

\[ \mathrm{var}[q] = \mathbb{E}\big[\mathrm{var}[q \mid D]\big] + \mathrm{var}\big[\mathbb{E}[q \mid D]\big] \]

• This means that, on average over the realizations of the data D, the conditional expectation E[q|D] is equal to E[q]:

\[ \mathbb{E}[q] = \mathbb{E}\big[\mathbb{E}[q \mid D]\big] \]

• Also, the posterior variance is on average smaller than the prior variance, by an amount that depends on the variation in the posterior means over the distribution of possible data:

\[ \mathrm{var}[q] = \mathbb{E}\big[\mathrm{var}[q \mid D]\big] + \mathrm{var}\big[\mathbb{E}[q \mid D]\big] \ge \mathbb{E}\big[\mathrm{var}[q \mid D]\big] \]

Posterior Mean

Note the unsurprising result regarding the posterior mean:

\[ \underbrace{\mathbb{E}[q]}_{\text{prior mean}} = \int q\, p(q)\, dq = \iint q\, p(q, D)\, dq\, dD = \int \Big( \underbrace{\int q\, p(q \mid D)\, dq}_{\text{posterior mean}} \Big)\, p(D)\, dD = \underbrace{\mathbb{E}\big[\mathbb{E}[q \mid D]\big]}_{\text{posterior mean averaged over the data}} \]

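Variance Decomposition Identity

If (q, D) are two scalar random variables then we have:

\[ \mathrm{var}[q] = \mathbb{E}\big[\mathrm{var}[q \mid D]\big] + \mathrm{var}\big[\mathbb{E}[q \mid D]\big] \]

• Here is the proof:

\[
\begin{aligned}
\mathbb{E}\big[\mathrm{var}[q \mid D]\big] &= \mathbb{E}\big[\mathbb{E}[q^2 \mid D]\big] - \mathbb{E}\big[(\mathbb{E}[q \mid D])^2\big] = \mathbb{E}[q^2] - \mathbb{E}\big[(\mathbb{E}[q \mid D])^2\big] \\
\mathrm{var}\big[\mathbb{E}[q \mid D]\big] &= \mathbb{E}\big[(\mathbb{E}[q \mid D])^2\big] - \big(\mathbb{E}[\mathbb{E}[q \mid D]]\big)^2 = \mathbb{E}\big[(\mathbb{E}[q \mid D])^2\big] - (\mathbb{E}[q])^2 \\
\mathbb{E}\big[\mathrm{var}[q \mid D]\big] + \mathrm{var}\big[\mathbb{E}[q \mid D]\big] &= \mathbb{E}[q^2] - (\mathbb{E}[q])^2 = \mathrm{var}[q]
\end{aligned}
\]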

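Posterior Variability

• We can derive a similar expression regarding the posterior variance:

\[ \underbrace{\mathrm{var}[q]}_{\text{prior variance}} = \underbrace{\mathbb{E}\big[\mathrm{var}[q \mid D]\big]}_{\text{posterior variance averaged over all data}} + \mathrm{var}\big[\mathbb{E}[q \mid D]\big] \ge \mathbb{E}\big[\mathrm{var}[q \mid D]\big] \]

• Thus, on average (over the data), the variability in q decreases. For a particular observed data set D, it is however possible that var[q|D] > var[q].

• These results implicitly assume that the data follow the marginal distribution:

\[ p(D) = \int p(D \mid q)\, p(q)\, dq \]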
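The mean and variance identities can be verified exactly in a small conjugate example. This is a minimal sketch that assumes a Beta(a, b) prior on q and data D given by the number m of ones in N Bernoulli trials, so that D follows the beta-binomial marginal; the skewed prior Beta(1, 10), the choice N = 3 and all function names are illustrative assumptions.

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def marginal(m, N, a, b):
    """Beta-binomial probability of m ones in N trials under a Beta(a, b) prior."""
    return math.comb(N, m) * math.exp(log_beta_fn(a + m, b + N - m) - log_beta_fn(a, b))

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

a, b, N = 1.0, 10.0, 3  # illustrative prior and sample size

p_data = [marginal(m, N, a, b) for m in range(N + 1)]
post_means = [beta_mean(a + m, b + N - m) for m in range(N + 1)]
post_vars = [beta_var(a + m, b + N - m) for m in range(N + 1)]

# Averages over the marginal distribution of the data
avg_post_mean = sum(p * mu for p, mu in zip(p_data, post_means))
avg_post_var = sum(p * v for p, v in zip(p_data, post_vars))
var_of_post_mean = sum(p * (mu - avg_post_mean) ** 2 for p, mu in zip(p_data, post_means))

prior_mean, prior_var = beta_mean(a, b), beta_var(a, b)

# Data sets for which the posterior variance exceeds the prior variance
increases = [m for m in range(N + 1) if post_vars[m] > prior_var]

print(prior_mean, avg_post_mean)
print(prior_var, avg_post_var + var_of_post_mean)
print(increases)
```

With this skewed prior, observing even a single one makes the posterior variance larger than the prior variance, although the average over all possible data sets is still smaller.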

• The Gamma distribution is a two-parameter family of continuous distributions. It has a scale parameter θ > 0 and a shape parameter k > 0. If k is an integer, then the distribution represents the sum of k independent exponentially distributed random variables, each of which has a mean of θ (equivalently, a rate parameter of θ⁻¹).

• More often, we use the rate parameter 1/θ in place of the scale θ.
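The connection between the Gamma distribution with integer shape k and sums of exponential variables can be illustrated by simulation. This is a minimal sketch using the standard-library generators `random.expovariate` (which takes a rate) and `random.gammavariate` (which takes a shape and a scale); the values of k, θ, the sample size and the seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed for reproducibility
k, theta = 3, 2.0        # integer shape and scale (illustrative)
n = 100_000

# Sum of k independent exponentials, each with mean theta (rate 1 / theta)
sums = [sum(rng.expovariate(1.0 / theta) for _ in range(k)) for _ in range(n)]

# Direct Gamma draws with shape k and scale theta
direct = [rng.gammavariate(k, theta) for _ in range(n)]

sum_mean, sum_var = statistics.fmean(sums), statistics.pvariance(sums)
dir_mean, dir_var = statistics.fmean(direct), statistics.pvariance(direct)

print(sum_mean, dir_mean, k * theta)        # all close to k * theta
print(sum_var, dir_var, k * theta ** 2)     # all close to k * theta^2
```

Matching means and variances do not prove equality in distribution, but they are consistent with it; a histogram comparison would be a natural next check.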

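Gamma Distribution: Rate Parametrization

• The Gamma distribution is frequently a model for waiting times.

• It is more often parameterized in terms of a shape parameter a = k and an inverse scale parameter b = 1/θ, called a rate parameter:

\[ \mathrm{Gamma}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-bx}, \qquad x \in (0, \infty), \quad a, b > 0, \qquad \Gamma(a) = \int_0^\infty u^{a-1} e^{-u}\, du \]

• The mean, mode and variance with this parametrization are:

\[ \mathbb{E}[x] = \frac{a}{b}, \qquad \mathrm{mode}[x] = \frac{a-1}{b} \;\; (a \ge 1), \qquad \mathrm{var}[x] = \frac{a}{b^2} \]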
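The rate parametrization and its standard summary statistics (mean a/b, mode (a − 1)/b for a ≥ 1, variance a/b²) can be checked by numerical integration. This is a minimal sketch with a hypothetical helper `gamma_pdf`, an illustrative choice a = 3, b = 2, and a midpoint grid truncated at x = 40.

```python
import math

def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b at a point x > 0."""
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(x) - b * x)

a, b = 3.0, 2.0                                # illustrative shape and rate
h = 1e-3
xs = [(i + 0.5) * h for i in range(40_000)]    # midpoint grid on (0, 40)
ps = [gamma_pdf(x, a, b) for x in xs]

total = h * sum(ps)
mean = h * sum(x * p for x, p in zip(xs, ps))
var = h * sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
mode = xs[max(range(len(xs)), key=ps.__getitem__)]

print(total, mean, mode, var)   # compare with 1, a / b, (a - 1) / b, a / b**2

# Peak height at the mode for two rates: a larger rate gives a taller peak
# located further to the left.
peak_b2 = gamma_pdf((a - 1) / 2.0, a, 2.0)
peak_b4 = gamma_pdf((a - 1) / 4.0, a, 4.0)
```

The last two lines compare the peak heights for rates 2 and 4 at fixed shape, showing how the rate controls the horizontal scale of the density.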

Plots of the Gamma density.

As we increase the rate b, the distribution squeezes leftwards and upwards.

An empirical PDF of rainfall data fitted with a Gamma

distribution.

Exponential Distribution

This is defined as

\[ \mathrm{Expon}(x \mid \lambda) = \mathrm{Gamma}(x \mid 1, \lambda) = \lambda \exp(-\lambda x), \qquad x \ge 0 \]

• Here λ is the rate parameter.

This distribution describes the times between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate λ.
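Both statements about the exponential distribution can be illustrated in a few lines: that it is the Gamma distribution with shape 1 and rate λ, and that exponential gaps generate events at an average rate λ. This is a minimal sketch; the helper names, the value λ = 1.5, the time horizon T and the seed are illustrative assumptions.

```python
import math
import random

def expon_pdf(x, lam):
    """Exponential density with rate lam at x >= 0."""
    return lam * math.exp(-lam * x)

def gamma_pdf(x, a, b):
    """Gamma density with shape a and rate b at x > 0."""
    return math.exp(a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(x) - b * x)

lam = 1.5  # illustrative rate

# The exponential is the Gamma with shape 1 and rate lam
max_gap = max(abs(expon_pdf(x, lam) - gamma_pdf(x, 1.0, lam)) for x in (0.1, 0.5, 1.0, 3.0))

# Simulate a Poisson process on [0, T] by accumulating exponential gaps
rng = random.Random(1)
T = 10_000.0
t, count = 0.0, 0
while True:
    t += rng.expovariate(lam)
    if t > T:
        break
    count += 1

rate_estimate = count / T
print(max_gap, rate_estimate)
```

The estimated event rate should be close to λ, with a relative error that shrinks as T grows.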

Chi-Squared Distribution

This is defined as

\[ \chi^2(x \mid \nu) = \mathrm{Gamma}\Big(x \,\Big|\, \frac{\nu}{2}, \frac{1}{2}\Big) = \frac{1}{2^{\nu/2}\, \Gamma(\nu/2)}\, x^{\frac{\nu}{2} - 1} \exp\Big(-\frac{x}{2}\Big), \qquad x \ge 0 \]

• This is the distribution of the sum of squared Gaussian random variables. More precisely,

\[ \text{Let } Z_i \sim \mathcal{N}(0, 1) \text{ and } S = \sum_{i=1}^{\nu} Z_i^2. \text{ Then } S \sim \chi^2_\nu. \]
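The sum-of-squares characterization can be checked by simulation against the moments implied by the Gamma(ν/2, 1/2) form, namely mean a/b = ν and variance a/b² = 2ν. This is a minimal sketch; ν = 4, the sample size and the seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(2)   # fixed seed for reproducibility
nu = 4                   # degrees of freedom (illustrative)
n = 50_000

# Each sample is the sum of nu squared standard normal draws
samples = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(n)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print(sample_mean, nu)        # close to nu
print(sample_var, 2 * nu)     # close to 2 * nu
```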

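Inverse Gamma Distribution

This is defined as follows:

\[ \text{If } X \sim \mathrm{Gamma}(X \mid a, b) \;\Rightarrow\; \frac{1}{X} \sim \mathrm{InvGamma}(X \mid a, b) \]

where:

\[ \mathrm{InvGamma}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{-(a+1)} \exp(-b/x), \qquad x > 0 \]

a is the shape parameter and b is the scale parameter.

It can be shown that:

\[ \text{Mean} = \frac{b}{a-1} \;\; (\text{exists for } a > 1), \qquad \text{Mode} = \frac{b}{a+1}, \qquad \mathrm{var} = \frac{b^2}{(a-1)^2 (a-2)} \;\; (\text{exists for } a > 2) \]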
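The reciprocal relationship can be illustrated by sampling: draw X from a Gamma with shape a and rate b, invert it, and compare the sample moments with the standard inverse gamma results, mean b/(a − 1) and variance b²/((a − 1)²(a − 2)). This is a minimal sketch; note that `random.gammavariate` takes a scale, so the rate b enters as 1/b, and the values a = 6, b = 2, the sample size and the seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(3)   # fixed seed for reproducibility
a, b = 6.0, 2.0          # Gamma shape and rate; InvGamma shape and scale (illustrative)
n = 200_000

# X ~ Gamma(shape a, rate b), i.e. scale 1 / b; then 1 / X ~ InvGamma(a, b)
inv_samples = [1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(n)]

inv_mean = statistics.fmean(inv_samples)
inv_var = statistics.pvariance(inv_samples)

expected_mean = b / (a - 1)                        # requires a > 1
expected_var = b ** 2 / ((a - 1) ** 2 * (a - 2))   # requires a > 2

print(inv_mean, expected_mean)
print(inv_var, expected_var)
```

A fairly large shape is used here so that the higher moments exist and the sample variance is stable; for a close to 2 the estimate would converge very slowly.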

The Pareto Distribution

Used to model the distribution of quantities that exhibit long tails (heavy tails):

\[ \mathrm{Pareto}(x \mid k, m) = k\, m^{k}\, x^{-(k+1)}\, \mathbb{I}(x \ge m) \]

This density asserts that x must be greater than some constant m, but not too much greater; k controls what is "too much".

As k → ∞, the distribution approaches δ(x − m).

On a log-log scale, the pdf forms a straight line of the form log p(x) = a log x + c for some constants a and c (power law, Zipf's law).
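The straight line on a log-log scale can be made concrete. Assuming the standard Pareto density p(x) = k mᵏ x^(−(k+1)) for x ≥ m, taking logarithms gives slope a = −(k + 1) and intercept c = log(k mᵏ). This is a minimal sketch; the helper `pareto_pdf` and the values k = 2.5, m = 1 are illustrative assumptions.

```python
import math

def pareto_pdf(x, k, m):
    """Pareto density with index k and lower bound m; zero below m."""
    return k * m ** k * x ** (-(k + 1)) if x >= m else 0.0

k, m = 2.5, 1.0       # illustrative index and lower bound
x1, x2 = 2.0, 20.0    # two points in the support

# Slope and intercept of log p(x) as a function of log x
slope = (math.log(pareto_pdf(x2, k, m)) - math.log(pareto_pdf(x1, k, m))) / (
    math.log(x2) - math.log(x1)
)
intercept = math.log(pareto_pdf(x1, k, m)) - slope * math.log(x1)

print(slope, -(k + 1))
print(intercept, math.log(k * m ** k))
```

Any two points in the support give the same slope, which is what makes the power law appear as a line on log-log axes.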

Applications: Modeling the frequency of words vs their rank, distribution of wealth (k=Pareto Index), etc.
