Lecture 8. Confidence intervals and the central limit theorem

Mathematical Statistics and Discrete Mathematics

November 25th, 2015

(2)

Central limit theorem

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution of $X$ with mean $\mu$ and variance $\sigma^2$. Then, for large $n$,

$$\sum_{i=1}^{n} X_i \approx N(n\mu, n\sigma^2), \qquad \bar{X} \approx N(\mu, \sigma^2/n), \qquad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1).$$

Here $X \approx Y$ means that $X$ and $Y$ have approximately the same distribution.

Note that the central limit theorem is valid for any random variable $X$ with mean $\mu$ and variance $\sigma^2$. In particular, $X$ can be discrete, and the theorem says that the sample means for large sample sizes are well approximated by the continuous normal distribution.
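The statement can be checked empirically. The following is a minimal sketch, assuming a fair six-sided die as the discrete distribution of X; the sample size 50, the number of repetitions, the seed and the function name `standardized_means` are illustrative choices, not part of the lecture.

```python
import random
from math import sqrt
from statistics import NormalDist

def standardized_means(n, repetitions, seed=0):
    """Simulate (mean - mu) / (sigma / sqrt(n)) for samples of n fair die rolls."""
    rng = random.Random(seed)
    mu = 3.5                    # mean of a fair die
    sigma = sqrt(35 / 12)       # standard deviation of a fair die
    values = []
    for _ in range(repetitions):
        x_bar = sum(rng.randint(1, 6) for _ in range(n)) / n
        values.append((x_bar - mu) / (sigma / sqrt(n)))
    return values

z_values = standardized_means(n=50, repetitions=20000)
# Fraction of standardized sample means that are at most 1.
empirical = sum(z <= 1.0 for z in z_values) / len(z_values)
# The corresponding value of the standard normal CDF.
theoretical = NormalDist().cdf(1.0)
print(f"empirical P(Z <= 1): {empirical:.3f}, normal CDF: {theoretical:.3f}")
```

The two printed numbers should be close but not identical: the die is discrete and the simulation is random, so a small gap remains.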

(3)

Central limit theorem

Figure: A comparison of PDFs of sums of n independent uniform random variables on (0, 1) for n = 1, 2, 3, 4.

(4)

Central limit theorem

Figure: A comparison of PDFs of Binom(n, p) with n = 20, 30, 40 and p = 0.5 on the left, and n = 60, 90, 120 and p = 0.1 on the right. One can see that the shape of the PDF approaches the bell curve of the normal distribution. Note that the number of variables n required for a good approximation by a normal distribution depends on the distribution of a single variable.

(5)

Central limit theorem

We toss a fair coin 400 times. Let $X$ be the total number of heads. We want to know

$$P(190 \le X \le 210).$$

We have

$$X \sim \mathrm{Binom}(400, 1/2), \qquad \mu = E[X] = 400 \cdot 1/2 = 200,$$

$$\sigma^2 = \mathrm{Var}[X] = 400 \cdot 1/2 \cdot (1 - 1/2) = 100.$$

By the central limit theorem, $\frac{X - 200}{10}$ is approximately distributed like a standard normal variable $Z$, and hence

$$\begin{aligned}
P(190 \le X \le 210) &= P(X \le 210) - P(X \le 189) \\
&= P(X - 200 \le 10) - P(X - 200 \le -11) \\
&= P\left(\frac{X - 200}{10} \le 1\right) - P\left(\frac{X - 200}{10} \le -1.1\right) \\
&\approx F_Z(1) - F_Z(-1.1) = 0.8413 - 0.1357 = 0.7056.
\end{aligned}$$
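For comparison, the exact binomial probability can be computed directly. The sketch below uses only the Python standard library; the variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

n = 400
# Exact probability: sum the Binom(400, 1/2) probabilities for k = 190, ..., 210.
exact = sum(comb(n, k) for k in range(190, 211)) / 2**n

# Normal approximation from the example: F_Z(1) - F_Z(-1.1).
Z = NormalDist()
approx = Z.cdf(1.0) - Z.cdf(-1.1)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The approximation agrees with the table-based value 0.7056 up to rounding of the table entries.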

(6)

Confidence intervals for µ with arbitrary data and σ² known

Let $X$ be an arbitrary random variable with known variance $\sigma^2$, and let $X_1, X_2, \ldots, X_n$ be a random sample of large size $n$ from the distribution of $X$. Let $Z \sim N(0, 1)$ be a standard normal variable, and let $z_{\alpha/2} > 0$ be such that

$$F_Z(-z_{\alpha/2}) = \alpha/2.$$

Then, the random interval $[L, R]$, where

$$L = \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \quad \text{and} \quad R = \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n},$$

is a confidence interval for the true mean $\mu$ with confidence level $1 - \alpha$, that is

$$P(L \le \mu \le R) \approx 1 - \alpha.$$
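As a minimal sketch of how the interval with endpoints (sample mean) ± z·σ/√n might be computed, assuming the standard library's `NormalDist` for the normal quantile: the function name `z_confidence_interval` and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the mean, sigma known."""
    n = len(sample)
    x_bar = fmean(sample)
    # z_{alpha/2} > 0 is chosen so that F_Z(-z) = alpha/2, i.e. F_Z(z) = 1 - alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical data: 100 measurements, with sigma = 2 assumed known.
sample = [10.0 + 0.1 * ((i % 7) - 3) for i in range(100)]
left, right = z_confidence_interval(sample, sigma=2.0)
print(f"95% confidence interval: [{left:.3f}, {right:.3f}]")
```

The width of the interval depends only on σ, n and α, not on the observed values.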

(7)

Chi-squared and t-distribution

If $Z_1, Z_2, \ldots, Z_n$ is a random sample of size $n$ from the standard normal distribution, then we say that the random variable

$$Q = \sum_{i=1}^{n} Z_i^2$$

has chi-squared distribution with $n$ degrees of freedom. We denote this by writing $Q \sim \chi^2(n)$.

If $Z$ and $Q$ are independent random variables such that $Z$ is a standard normal variable and $Q$ has chi-squared distribution with $n$ degrees of freedom, then we say that the random variable

$$T = \frac{Z}{\sqrt{Q/n}}$$

has t-distribution with $n$ degrees of freedom.

These are very important distributions, and numerical values of their CDFs can be found in standard mathematical tables.
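Both definitions translate directly into a simulation. The sketch below draws from the two distributions exactly as they are defined here; the seed, the choice of 5 degrees of freedom and the helper names `chi_squared_draw` and `t_draw` are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import fmean

rng = random.Random(1)   # fixed seed, chosen arbitrarily

def chi_squared_draw(n):
    """One draw of Q = Z_1^2 + ... + Z_n^2 with Z_i standard normal."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

def t_draw(n):
    """One draw of T = Z / sqrt(Q / n) with Z and Q independent."""
    z = rng.gauss(0.0, 1.0)
    q = chi_squared_draw(n)
    return z / sqrt(q / n)

n = 5
q_draws = [chi_squared_draw(n) for _ in range(20000)]
t_draws = [t_draw(n) for _ in range(20000)]
print(f"mean of Q: {fmean(q_draws):.2f} (theory: {n})")
print(f"mean of T: {fmean(t_draws):.2f} (theory: 0)")
```

The theoretical values quoted in the output follow from E[Z²] = 1 for the chi-squared distribution and from the symmetry of the t-distribution.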

(9)

Chi-squared and t-distribution

Figure: A comparison of PDFs of the t-distribution with 1, 3, 10, and 30 degrees of freedom (orange) and the standard normal distribution (blue).

(10)

Chi-squared and t-distribution

If $X_1, X_2, \ldots, X_n$ is a sample from the normal distribution $N(\mu, \sigma^2)$, $\bar{X}$ is the sample mean, and $S^2$ is the sample variance, then

$$(n - 1)S^2/\sigma^2$$

has chi-squared distribution with $n - 1$ degrees of freedom, and

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has t-distribution with $n - 1$ degrees of freedom.

Proof. The proof is outside the scope of the course. Partial arguments can be found in the book.

(11)

Confidence intervals for µ with normal data and σ² unknown

Let $X$ be a normal random variable with unknown variance, and let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the distribution of $X$. Let $T_{n-1}$ be a random variable that has t-distribution with $n - 1$ degrees of freedom, and let $t_{\alpha/2} > 0$ be such that

$$F_{T_{n-1}}(t_{\alpha/2}) = 1 - \alpha/2.$$

Then, the random interval $[L, R]$, where

$$L = \bar{X} - t_{\alpha/2}\,S/\sqrt{n} \quad \text{and} \quad R = \bar{X} + t_{\alpha/2}\,S/\sqrt{n},$$

is a confidence interval for the true mean $\mu$ of $X$ with confidence level $1 - \alpha$, that is

$$P(L \le \mu \le R) = 1 - \alpha.$$

Proof. The proof is analogous to the one for $\sigma^2$ known. We use the fact that $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim T_{n-1}$.
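The coverage claim can be illustrated by simulation. This is only a sketch under stated assumptions: the true parameters (mean 33, standard deviation 0.7), the sample size 5, the seed and the helper name `t_interval` are hypothetical, and 2.776 is the table value of the t-distribution critical point for 4 degrees of freedom at α = 0.05.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def t_interval(sample, t_crit):
    """Interval with endpoints x_bar -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = fmean(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)   # stdev uses the n - 1 divisor
    return x_bar - half_width, x_bar + half_width

rng = random.Random(2)           # fixed seed, chosen arbitrarily
mu, sigma, n = 33.0, 0.7, 5      # hypothetical true parameters and sample size
t_crit = 2.776                   # table value for 4 degrees of freedom, alpha = 0.05
repetitions = 20000
hits = 0
for _ in range(repetitions):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    left, right = t_interval(sample, t_crit)
    hits += left <= mu <= right  # counts 1 when the interval covers mu
coverage = hits / repetitions
print(f"empirical coverage: {coverage:.3f}")
```

The fraction of intervals containing the true mean should be close to 0.95, even though each individual interval is built from only five observations.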

(12)

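The manufacturer claims that their mix of nuts and fruits contains 33 g of fruit per 100 g. We want to check this claim. We buy 5 packages and weigh the fruit content. We obtain the following numbers:

31.84, 32.35, 31.20, 32.89, 32.80.

We find

$$\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = 32.22, \qquad s^2 = \frac{1}{4}\left(\sum_{i=1}^{5} x_i^2 - 5\bar{x}^2\right) = 0.50.$$

We assume that the sample comes from a normal distribution. In the tables for the t-distribution with 4 degrees of freedom, we find that

$$t_{0.025} = 2.776.$$

The 95% confidence interval is then

$$[l, r] = \left[\bar{x} - t_{0.025}\frac{s}{\sqrt{5}},\ \bar{x} + t_{0.025}\frac{s}{\sqrt{5}}\right] = \left[32.22 - 2.776 \cdot \frac{\sqrt{0.5}}{\sqrt{5}},\ 32.22 + 2.776 \cdot \frac{\sqrt{0.5}}{\sqrt{5}}\right] \approx [31.34, 33.10].$$

The claimed value 33 lies inside this interval, so the data are consistent with the manufacturer's claim at the 95% confidence level.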
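The arithmetic of this example can be reproduced in a few lines. The sketch below takes the critical value 2.776 from the tables, as in the text, rather than computing it; the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, variance

weights = [31.84, 32.35, 31.20, 32.89, 32.80]   # fruit content in g per 100 g
n = len(weights)
x_bar = fmean(weights)
s2 = variance(weights)          # sample variance with divisor n - 1
t_crit = 2.776                  # t_{0.025} for 4 degrees of freedom, from the tables

half_width = t_crit * sqrt(s2) / sqrt(n)
left, right = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}")
print(f"95% confidence interval: [{left:.2f}, {right:.2f}]")
```

Because the code works with the unrounded mean and variance, the endpoints may differ in the last digit from a hand computation that uses 32.22 and 0.50.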

(13)

Confidence intervals for σ² with normal data

Let $X$ be a normal random variable with unknown variance, and let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the distribution of $X$. Let $\chi^2_{n-1}$ be a random variable that has chi-squared distribution with $n - 1$ degrees of freedom, and let $\chi^2_{\alpha/2}, \chi^2_{1-\alpha/2} > 0$ be numbers such that

$$F_{\chi^2_{n-1}}(\chi^2_{\alpha/2}) = 1 - \alpha/2 \quad \text{and} \quad F_{\chi^2_{n-1}}(\chi^2_{1-\alpha/2}) = \alpha/2.$$

Then, the random interval $[L, R]$, where

$$L = \frac{(n - 1)S^2}{\chi^2_{\alpha/2}} \quad \text{and} \quad R = \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}},$$

is a confidence interval for the true variance $\sigma^2$ of $X$ with confidence level $1 - \alpha$, that is

$$P(L \le \sigma^2 \le R) = 1 - \alpha.$$

(14)

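Confidence intervals for σ² with normal data

Proof. We will use the fact that $(n - 1)S^2/\sigma^2 \sim \chi^2_{n-1}$. By the definition of $\chi^2_{\alpha/2}, \chi^2_{1-\alpha/2} > 0$, where $\chi^2_{1-\alpha/2}$ is the smaller of the two numbers, we have

$$\begin{aligned}
1 - \alpha &= P\left(\chi^2_{1-\alpha/2} \le \frac{(n - 1)S^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) \\
&= P\left(\frac{\chi^2_{1-\alpha/2}}{(n - 1)S^2} \le \frac{1}{\sigma^2} \le \frac{\chi^2_{\alpha/2}}{(n - 1)S^2}\right) \\
&= P\left(\frac{(n - 1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2}}\right).
\end{aligned}$$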

(15)

Let us find a 95% confidence interval for $\sigma^2$ in the fruit mix example. We have $s^2 = 0.50$, $n - 1 = 4$, $\alpha/2 = 0.025$. We find in the tables that

$$\chi^2_{1-\alpha/2} = \chi^2_{0.975} = 0.484 \quad \text{and} \quad \chi^2_{\alpha/2} = \chi^2_{0.025} = 11.1.$$

Hence, the confidence interval is

$$[l, r] = \left[\frac{4 \cdot 0.5}{11.1},\ \frac{4 \cdot 0.5}{0.484}\right] = [0.18, 4.13].$$
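The same computation can be sketched in code, starting from the five fruit weights of the example. The two chi-squared critical values are the table values quoted in the text; the variable names are illustrative.

```python
from statistics import variance

weights = [31.84, 32.35, 31.20, 32.89, 32.80]   # fruit content in g per 100 g
n = len(weights)
s2 = variance(weights)     # sample variance with divisor n - 1

chi2_lower_point = 0.484   # chi^2_{0.975} for 4 degrees of freedom (table value)
chi2_upper_point = 11.1    # chi^2_{0.025} for 4 degrees of freedom (table value)

# Dividing by the larger critical value gives the left endpoint.
left = (n - 1) * s2 / chi2_upper_point
right = (n - 1) * s2 / chi2_lower_point
print(f"95% confidence interval for the variance: [{left:.2f}, {right:.2f}]")
```

Since the unrounded sample variance is slightly below 0.50, the right endpoint comes out slightly smaller than in the hand computation.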
