• No results found

Bayesian statistics, simulation and software

N/A
N/A
Protected

Academic year: 2021

Share "Bayesian statistics, simulation and software"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

1/14

Bayesian statistics, simulation and software

Module 3: Bayesian principle, binomial model and conjugate priors

Jesper Møller and Ege Rubak

Department of Mathematical Sciences Aalborg University

(2)

2/14

Motivating example: Spelling correction

(Adapted from Bayesian Data Analysis by Gelman et al. (2014))

 Problem: Someone types ’radom’.

 Question: What did they meant to type? Random?

Ingredients:

 Data x: The observed word — radom.

 Parameter of interest θ: The correct word.

Comments: To solve this we need

 background information on which words similar to radom could be

typed;

 an idea about how these words are typically mistyped.

(3)

3/14

Bayesian idea

 Data/observation model: Conditional on θ (here the correct word),

data x (here the typed word) is distributed according to a density

π(x|θ) ∝ L(θ; x) ← the likelihood.

 Prior: Prior knowledge (i.e. before collecting data) about θ is

summaried by a density

π(θ) ← the prior.

 Posterior: The updated knowledge about θ after collecting data:

The conditional distribution of θ given data x is by Bayes theorem

π(θ|x) = π(x|θ)π(θ)

π(x) ∝ π(x|θ)π(θ)

(4)

4/14

Example: Prior

Google provides the following prior probabilities for three candidate words:

θ π(θ)

random 7.60 × 10−5

radon 6.05 × 10−6

radom 3.12 × 10−7

Comments

 The relative high probability for the word radom is perhaps

surprising: Name of a city in Poland and of a semiautomatic pistol of Polish design.

 Probably, in the context of writing a scientific report these prior

probabilities should be different.

(5)

5/14

Example: Likelihood

Google’s model of spelling and typing errors provides the following conditional probabilities/likelihoods: θ π(x = ’radom’|θ) random 0.00193 radon 0.000143 radom 0.975 Comments

 This is not a probability distribution but a likelihood function!

 If one in fact intends to write ’radom’ this actually happens in

97.5% of cases.

 If one intends to write either ’random’ or ’radon’ this is rarely

(6)

6/14

Example: Posterior

Combining the prior and likelihood we obtain the posterior probabilities:

π(θ|x) = π(x|θ)π(θ) π(x) ∝ π(x|θ)π(θ) θ π(x = ’radom’|θ)π(θ) π(θ|x = ’radom’) random 1.47 × 10−7 0.325 radon 8.65 × 10−10 0.002 radom 3.04 × 10−7 0.673 Conclusion

 With the given prior and likelihood the word ’radom’ is twice as

likely as ’random’. Criticism

 Is the posterior probability for ‘radom’ too high?

 Prior depends on context — and hence might be ‘wrong’.

 Likelihood is perhaps OK in this case – or it could depend on

context as well.

(7)

7/14

Binomial model in a Bayesian context

 Data model: Binomial, X ∼ B(n, p), n known.

π(x|p) =n

x 

px(1 − p)n−x, x= 0, 1, . . . , n, 0 ≤ p ≤ 1.

 Prior: A convenient choice is a Beta distribution:

π(p) ∼ Be(α, β),

where we have to specify the shape parameters α > 0 and β > 0. The Beta distribution has density/pdf

π(p) =

(Γ(α)Γ(β)

Γ(α+β) pα−1(1 − p)β−1 for 0 ≤ p ≤ 1

0 otherwise.

(8)

8/14

Beta distribution: Examples

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.4 0.8 α=1 , β=1 0.0 0.2 0.4 0.6 0.8 1.0 0.0 1.0 2.0 α=0.5 , β=0.5 0.0 0.2 0.4 0.6 0.8 1.0 0.0 1.0 2.0 α=2 , β=4 0.0 0.2 0.4 0.6 0.8 1.0 0.0 1.0 2.0 3.0 α=2 , β=0.5 Mean: E[p] = α α+ β, Variance: Var[p] = αβ (α + β)2(α + β + 1).

(9)

9/14

Binomial model in a Bayesian context (continued)

 Data model:X ∼ B(n, p).

 Prior: π(p) ∼ Be(α, β), that is

π(p) = (Γ(α)Γ(β) Γ(α+β) pα−1(1 − p)β−1 for 0 ≤ p ≤ 1 0 otherwise.  Posterior: π(p|x) ∝ π(x|p)π(p) =n x  px(1 − p)n−x · Γ(α)Γ(β) Γ(α + β)p α−1(1 − p)β−1 ∝ px+α−1(1 − p)n−x+β−1 ∼ Be(x + α, n − x + β).

(10)

10/14

Posterior mean & variance

Posterior π(p|x) ∼ Be(x + α, n − x + β). Posterior mean E[p|x] = x+ α (x + α) + (n − x + β) = x+ α α+ β + n.

If n ≫ max{α, β}, then E[p|x] ≈ x

n (the ”natural”unbiased estimate).

Posterior variance Var[p|x] = (x + α)(n − x + β) (x + α + n − x + β)2(x + α + n − x + β + 1) = (x + α)(n − x + β) (α + β + n)2(α + β + n + 1) = ( x n+ α n)(n−xn + β n) (α+β+nn )2(α + β + n + 1)≈ x nn−xn α+ β + n + 1 → 0 as n → ∞.

(11)

11/14

Conjugate priors

In the binomial example: Both prior and posterior were beta distributions! Very convenient!

We say that the beta distribution is conjugate.

Definition: Conjugate priors

Let π(x|θ) be the data model. A class Π of prior distributions for θ is said to be conjugate for π(x|θ) if

π(θ|x) ∝ π(x|θ)π(θ) ∈ Π

whenever π(θ) ∈ Π. That is, prior and posterior are in the same class of distributions.

Notice: Π should be a class of “tractable” distributions for this to be useful.

(12)

12/14

Example: Placentia Previa (PP)

 Question: Is the sex ratio different for PP births compared to

normal births?

 Prior knowledge: 48.5% of new-borns are girls.

 Data: Of n = 980 cases of PP x=437 were girls (437/980=44.6%).

 Data model: X ∼ B(n, p).

 Prior: π(p) ∼ Be(α, β).

 Posterior:

π(p|x) ∼ Be(x + α, n − x + β) = Be(437 + α, 543 + β) How to choose α and β, and what difference does it make?

(13)

13/14

Placenta Previa: Beta priors and posteriors

0.3 0.4 0.5 0.6 0.7 0 5 10 15 20 25 α=1, β = 1 post. density prior density post. 0.975 p = 0.485 0.3 0.4 0.5 0.6 0.7 0 5 10 15 20 25 α =4.85, β = 5.15 post. density prior density post. 0.975 p = 0.485 0.3 0.4 0.5 0.6 0.7 0 5 10 15 20 25 α =9.7, β = 10.3 post. density prior density post. 0.975 p = 0.485 0.3 0.4 0.5 0.6 0.7 0 5 10 15 20 25 α =97, β = 103 post. density prior density post. 0.975 p = 0.485

(14)

14/14

Conclusion (see the notes or Gelman et al. (2014))

For the different choices of priors, 95% posterior intervals (as defined later) for p do not contain 48.5%, which indicates that the probability for a female birth given placenta previa is lower than in the general

population.

References

Related documents

Using a design approach (Greenwald, Prat- kanis, Leippe, & Baumgardner, 1986), we attempted to maxi- mize the possibility of obtaining groupthink in an experiment that (a) used

complete unit (Alternative I) consisting of intake; hydraulic coagulation; hydraulic flocculation; tube settler sedimentation; rapid sand filter; reservoir;

The Pre-View 3 programme aims to enhance parental confidence and knowledge of SRE, by provid- ing – in conjunction with the Schools Library Service – opportunities and

conducted in both environmental simulators (e.g., thermal-vacuum chambers, etc.) and operational simulators (e.g., Earth polar and desert region Mars analogs, etc) to ensure that

The $12.8 billion acquisition of Catamaran, provider of pharmacy benefit management (PBM) services and healthcare information technology (HCIT) solutions to the healthcare

29 Main Menu “Directions” button will change to “Check In” if distance to a place is 0 mi If the x button is selected then press “Finish,” it will complete and send

In specific, CENTROTEC was able to consolidate and in some areas strengthen its market position in its largest segment Gas Flue Systems, for which it posted revenue of EUR

In conjunction with UNCG Staff and Bryan School Alumni Association Board Executive Committee, it is the responsibility of the President to lead and provide direction to