• No results found

7 3 The Normal Distribution Estimating From Data

In document Statistical Inference For Everyone (Page 146-149)

Estimating the mean,µ, knowing the deviation,σ

Typically one is provided with a series of measurements of a quan- tity, and we want toestimatethe value of that quantity, and have a description of ouruncertaintyin the estimate. In Chapter9(Applica-

tions of Parameter Estimation and Inferenceon page165) we go through

a number of detailed examples of this process. Here, we simply sum- marize the result. We are given:

1 A series ofNmeasurements, data={x1,x2,x3, . . . ,xN} 2 The real deviation,σ

3 We are modeling the data as a true value,µ, with uncertainty with a likelihood from the Normal distribution with known deviation,

σ, as in Normal(0,σ). Further, we assume independence between

the measurements.

Since in this case we are givenσ, we wish then to estimate the pa-

rameterµ. The result will be a probability distribution overµ, with a

best (i.e. most probable) value and an uncertainty in that value. The

result is that the distribution ofµis also a Normal distribution, In scientific applications, this notation is often shortened to

µ=x¯±σ/√N, so it is clear what

is the best estimate ofµ(i.e. ¯x)

and what is the uncertainty in that estimate (i.e.σ/√N).

P(µ|data,σ) =Normal(x¯,σ/√N)

where the center value (and thus the most probable value ofµ) is

given by the sample mean of the data.

Sample MeanThesample meanof a set ofNsamples,x1,x2,· · ·,xN Sample MeanThesample meanof

a set ofNsamples,x1,x2,· · ·,xNis

given by

¯

p r i o r s, l i k e l i h o o d s,a n d p o s t e r i o r s 147

is given by

¯

x≡ x1+x2+xN3+· · ·+xN

The uncertainty inµis given byσ/√N. As a consequence, larger N(i.e. more data points), makes us more confident in the particular estimate forµ.

Estimate of location parameterµgivenNsamples andσ, the

known deviationIn summary, the best estimate for the location Estimate of location parameterµ

givenNsamples andσ, the known deviationThe best estimate for the location parameterµin the

Normal distribution given a set of

Nsamples,x1,x2,· · ·,xNis given by ˆ µ=x1+x2+· · ·+xN N ±σ/ √ N parameterµin the Normal distribution given a set ofNsamples,

x1,x2,· · ·,xNis given by ˆ µ= x1+x2+x3+· · ·+xN N ±σ/ √ N

Example 7.4 Estimating the True Length of an Object

Say we have an object, and5measurements of its length from the same ruler but from different people,

5.1[cm],4.9[cm],4.7[cm],4.9[cm],5.0[cm]

Say that we further know that the uncertainty (given this ruler) of

one measurement hasσ = 0.5[cm]. What is the best estimate of the In real measurements, there is always the problem of bias or

systematicuncertainties, where the uncertainty does not follow a Normal distribution. We will not consider this issue here.

length? The best estimate should be given by the sample mean of these5samples, ˆ µ = x1+x2+· · ·+xN N = 5.1[cm] +4.9[cm] +4.7[cm] +4.9[cm] +5.0[cm] 5 =4.92[cm]

with uncertainty related to the known uncertainty of a single mea- surement, ˆ σ = √σ N = 0.5√[cm] 5 =0.223[cm] yielding a final best estimate of

ˆ

µ = 4.92[cm]±0.223[cm]

or (with 2σrange), The95% credible interval (CI) is

really at the 1.96σlevel, yielding [4.481[cm], 5.358[cm]]. We will almost always approximate it as 2σby hand, but the computer will

generate the true95% credible interval when requested.

ˆ

148 s tat i s t i c a l i n f e r e n c e f o r e v e r yo n e

Estimating the mean,µ,notknowing the deviation,σ

If we are not so fortunate to be given the deviation, as in the previous case, then this parameter too must be estimated from the data. As a first step we can estimate the deviation with thesampledeviation.

Sample DeviationThe sample deviation of a set ofNsamples, Sample DeviationThe sample

deviation of a set ofNsamples,

x1,x2,· · ·,xNis given by S≡ r 1 N−1((x1−x¯) 2+· · ·+ (x N−x¯)2) x1,x2,· · ·,xNis given by S≡ r 1 N−1((x1−x¯) 2+ (x 2−x¯)2+· · ·+ (xN−x¯)2)

Approximate estimate of location parameterµand deviationσ

givenNsamples The posterior probability forµandσgiven a set of Approximate estimate of location

parameterµand deviationσ givenNsamplesThe posterior probability forµandσgiven a set

ofNsamples,x1,x2,· · ·,xNcan be approximated with P(µ|data) ∼ Normal(x¯,S/√N) P(σ|data) ∼ Normal(S, S2/q(N1)/3

which works well if we have many (N>30) data points.

Nsamples,x1,x2,· · ·,xN can be approximated with P(µ|data) ∼ Normal(x¯,S/ √ N) P(σ|data) ∼ Normal S,S2/ q (N−1)/3

which works well if we have many (N>30) data points.

With a smaller data set, the value ofSas an estimate for the devi- ation becomes too small. When the estimate forσis too small, then

the result is claimingmore confidencein the estimate of the mean,µ,

than is warranted. This discrepancy depends on the number of data Because the uncertainty in the mean depends explicitly on the number of data points, it goes beyond the level of this chapter to give a form for the posterior probability distribution for the deviation,σ.

points, and thus it makes sense that the proper distribution should depend on the number of data points, in addition to the sample mean and deviation. The proper, although less convenient, result is that the posterior probability forµtakes the form of the Student’st

distribution,

Estimate of location parameterµgivenNsamples andunknown

σThe posterior probability forµtakes the form of the Student’st Estimate of location parameterµ givenNsamples andunknown

σThe posterior probability forµ

takes the form of the Student’st

distribution,

P(µ|data) =Studentdof=N−1(x¯,S/

N) This distribution requiresthree

numbers to specify, referred to as the mean (µ), deviation (σ) and the degrees of freedom(dof). The degrees of freedom is defined in this case to be the number of data points less one,N−1.

distribution,

P(µ|data) =Studentdof=N−1(x¯,S/

N)

This distribution requiresthreenumbers to specify, referred to as the mean (µ), deviation (σ) and thedegrees of freedom(dof). The degrees

of freedom is defined in this case to be the number of data points less one,N−1.

Example 7.5 Estimating the True Length of an Object...Again Say we have an object, and5measurements of its length from the same ruler but from different people,

p r i o r s, l i k e l i h o o d s,a n d p o s t e r i o r s 149

Unlike earlier, let’s say that we don’t know the uncertainty (given this ruler) of one measurement What is the best estimate of the length? Again, the best estimate should be given by the sample mean of these 5samples, ˆ µ = x1+x2+· · ·+xN N = 5.1[cm] +4.9[cm] +4.7[cm] +4.9[cm] +5.0[cm] 5 =4.92[cm]

with uncertainty related to the sample deviation

S2 = 1 N−1 (x1−x¯)2+· · ·+ (xN−x¯)2 = 1 5−1 (5.1[cm]4.92[cm])2+ (4.9[cm]4.92[cm])2+ (4.7[cm]4.92[cm])2+ (4.9[cm]−4.92[cm])2+ (5.0[cm]−4.92[cm])2 = 0.024[cm]2 S = q 0.024[cm]2=0.155[cm] S √ N = 0.155[cm] √ 5 =0.069[cm]

Looking at TableD.2on page236with “Degrees of Freedom” equal

to4, we find that the95% credible interval forµ(between areas0.025 and0.975) falls±2.776·S/√N, thus we have

ˆ

µ = 4.92[cm], 95% CI= [4.92[cm]−2.776·0.069[cm], 4.92[cm] +2.776·0.069[cm]]

= 4.92[cm], 95% CI= [4.73[cm], 5.11[cm]]

Although much of this is easier with the computer, it is instructive to go through simple examples by hand.

In document Statistical Inference For Everyone (Page 146-149)

Related documents