• No results found

Lecture 3 Gaussian Probability Distribution

N/A
N/A
Protected

Academic year: 2021

Share "Lecture 3 Gaussian Probability Distribution"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

Lecture 3

Gaussian Probability Distribution

p(x) = 1 s 2pe -(x -m)2 2s2 gaussian Plot of Gaussian pdf x P(x)

Introduction

l Gaussian probability distribution is perhaps the most used distribution in all of science. u also called “bell shaped curve” or normal distribution

l Unlike the binomial and Poisson distribution, the Gaussian is a continuous distribution:

m = mean of distribution (also at the same place as mode and median) s2 = variance of distribution

y is a continuous variable (-∞ £ y £ ∞)

l Probability (P) of y being in the range [a, b] is given by an integral:

u The integral for arbitrary a and b cannot be evaluated analytically

+ The value of the integral has to be looked up in a table (e.g. Appendixes A and B of Taylor).

P(y) = 1 s 2p e -( y-m) 2 2s2 † P(a < y < b) = 1 s 2p e -( y-m) 2 2s2 a b Ú dy

(2)

It is very unlikely (< 0.3%) that a measurement taken at random from a Gaussian pdf will be more than ± 3s from the true mean of the distribution.

l The total area under the curve is normalized to one. + the probability integral:

l We often talk about a measurement being a certain number of standard deviations (s) away

from the mean (m) of the Gaussian.

+ We can associate a probability for a measurement to be |m - ns|

from the mean just by calculating the area outside of this region.

ns Prob. of exceeding ±ns 0.67 0.5 1 0.32 2 0.05 3 0.003 4 0.00006

Relationship between Gaussian and Binomial distribution

l The Gaussian distribution can be derived from the binomial (or Poisson) assuming: u p is finite

u N is very large

u we have a continuous variable rather than a discrete variable

l An example illustrating the small difference between the two distributions under the above conditions: u Consider tossing a coin 10,000 time.

p(heads) = 0.5 N = 10,000P(-• < y < •) = 1 s 2p e -( y-m) 2 2s2 -• • Ú dy = 1

(3)

n For a binomial distribution:

mean number of heads = m = Np = 5000 standard deviation s= [Np(1 - p)]1/2 = 50

+ The probability to be within ±1s for this binomial distribution is:

n For a Gaussian distribution:

+ Both distributions give about the same probability!

Central Limit Theorem

l Gaussian distribution is important because of the Central Limit Theorem l A crude statement of the Central Limit Theorem:

u Things that are the result of the addition of lots of small effects tend to become Gaussian. l A more exact statement:

u Let Y1, Y2,...Yn be an infinite sequence of independent random variables

each with the same probability distribution.

u Suppose that the mean (m) and variance (s2) of this distribution are both finite. + For any numbers a and b:

+ C.L.T. tells us that under a wide range of circumstances the probability distribution

that describes the sum of random variables tends towards a Gaussian distribution as the number of terms in the sum Æ∞.

P = 10 4! (104 - m)!m! m=5000-50 5000+50  0.5m0.510 4 -m = 0.69 † P(m - s < y < m + s ) = 1 s 2p e -( y-m) 2 2s2 m-s m+s Ú dy ª 0.68 lim nÆ•P a < Y1+Y2 + ...Yn - nm s n < b È Î Í ˘ ˚ ˙ = 12p e -1 2y 2 a b Ú dy

Actually, the Y’s can be from different pdf’s!

(4)

+ Alternatively:

n sm is sometimes called “the error in the mean” (more on that later). l For CLT to be valid:

u m and s of pdf must be finite.

u No one term in sum should dominate the sum.

l A random variable is not the same as a random number.

u Devore: Probability and Statistics for Engineering and the Sciences:

+ A random variable is any rule that associates a number with each outcome in S n S is the set of possible outcomes.

l Recall if y is described by a Gaussian pdf with m = 0 and s = 1 then

the probability that a < y < b is given by:

l The CLT is true even if the Y’s are from different pdf’s as long as

the means and variances are defined for each pdf!

u See Appendix of Barlow for a proof of the Central Limit Theorem.

† lim nÆ•P a < Y - m s / n < b È Î Í ˘ ˚ ˙ = limnÆ•P a <Y - ms m < b È Î Í ˘ ˚ ˙ = 1 2p e -1 2y 2 a b Ú dy

dy

e

b

y

a

P

b a y

Ú

=

<

<

-2 2 1

2

1

)

(

p

(5)

l Example: A watch makes an error of at most ±1/2 minute per day.

After one year, what’s the probability that the watch is accurate to within ±25 minutes?

u Assume that the daily errors are uniform in [-1/2, 1/2].

n For each day, the average error is zero and the standard deviation 1/√12 minutes. n The error over the course of a year is just the addition of the daily error.

n Since the daily errors come from a uniform distribution with a well defined mean and variance + Central Limit Theorem is applicable:

+ The upper limit corresponds to +25 minutes:

+ The lower limit corresponds to -25 minutes:

+ The probability to be within ± 25 minutes:

+ less than three in a million chance that the watch will be off by more than 25 minutes in a year!

† lim nÆ•P a < Y1+Y2+ ...Yn - nm s n < b È Î Í ˘ ˚ ˙ = 12p e -1 2y 2 a b Ú dya =Y1+Y2+ ...Yn - nm s n = -25 - 365 ¥ 0 1 12 365 = -4.5 † P = 1 2p e -1 2y 2 -4.5 4.5 Ú dy = 0.999997 = 1- 3¥10-6 † b =Y1+Y2 + ...Yn - nm s n = 25 - 365 ¥ 0 1 12 365 = 4.5

(6)

l Example: Generate a Gaussian distribution using random numbers.

u Random number generator gives numbers distributed uniformly in the interval [0,1] n m = 1/2 and s2 = 1/12

u Procedure:

n Take 12 numbers (ri) from your computer’s random number generator n Add them together

n Subtract 6

+ Get a number that looks as if it is from a Gaussian pdf!

0

-6 +6

Thus the sum of 12 uniform random

numbers minus 6 is distributed as if it came from a Gaussian pdf with m = 0 and s = 1.

E

A) 5000 random numbers B) 5000 pairs (r1 + r2) of random numbers C) 5000 triplets (r1 + r2 + r3) of random numbers D) 5000 12-plets (r1 + r2 +…r12) of random numbers. E) 5000 12-plets (r1 + r2 +…r12 - 6)of random numbers. Gaussian m = 0 and s = 1 † P a <Y +Y2 + ...Yn - nm s n < b È Î Í ˘ ˚ ˙ = P a < ri -12 ⋅12 i=1 12 Â 1 12 ⋅ 12 < b È Î Í Í Í Í ˘ ˚ ˙ ˙ ˙ ˙ = P -6 < ri - 6 i=1 12 Â < 6 È Î Í ˘ ˚ ˙ = 1 2p e -12y2 -6 6 Ú dy

(7)

K.K. Gan L3: Gaussian Probability Distribution 7

l Example: The daily income of a "card shark" has a uniform distribution in the interval [-$40,$50].

What is the probability that s/he wins more than $500 in 60 days?

u Lets use the CLT to estimate this probability:

u The probability distribution of daily income is uniform, p(y) = 1.

+ need to be normalized in computing the average daily winning (m) and its standard deviation (s).

u The lower limit of the winning is $500:

u The upper limit is the maximum that the shark could win (50$/day for 60 days):

+ 16% chance to win > $500 in 60 days

† lim nÆ•P a < Y1+Y2+ ...Yn - nm s n < b È Î Í ˘ ˚ ˙ = 12p e -1 2y 2 a b Ú dya =Y1+Y2 + ...Yn - nm s n = 500 - 60 ¥ 5 675 60 = 200 201 = 1 † m = yp(y)dy -40 50 Ú p(y)dy -40 50 Ú = 1 2[50 2 - (-40)2] 50 - (-40) = 5 s2 = y2p(y)dy -40 50 Ú p(y)dy -40 50 Ú -m2 = 1 3[50 3 - (-40)3] 50 - (-40) - 25 = 675 b = Y1+Y2+ ...Yn - nm s n = 3000 - 60 ¥ 5 675 60 = 2700 201 = 13.4 P = 1 2p e -1 2y 2 1 13.4 Ú dy ª 1 2p e -1 2y 2 1 • Ú dy = 0.16

References

Related documents

The principle hypotheses for this work include (1) if an automation system has been virtually prototyped and commissioned using the component-based approach in a virtual

As consequence, this paper contains a study on the effect of the rotor bar number on the torque (average and quality) of five-phase induction machines under different supply

No Enbrel Humira Simponi Proceed to appropriate program policy Other Proceed to Figure 3 Deny Actemra Cimzia Orencia Deny Deny Will it be used in combination with a potent

I understand that I may not get a full correction from my LASIK procedure and this may require future retreatment procedures, such as more laser treatment or the use of glasses

Background: This study is the first to examine the relationship between gender and self-assessed health (SAH), and the extent to which this varies by socio- economic position

Hasil dari penelitian ini yaitu faktor yang menyebabkan CAH terindikasi lamban belajar adalah faktor postnatal (sesudah lahir), lingkungan keluarga, dan faktor

Gallium arsenide (GaAs) is one type of semiconductor material compounds and it has a zinc blende crystal structure.. The material Gallium and the material Arsenide

What are the driving factors leading companies to request sales tax outsourcing services:. • Complexity of returns at the local level of tax (County