Topic 5: Confidence Intervals (Chapter 9)

Academic year: 2021

1. Introduction

• The two general areas of statistical inference are:

1) estimation of parameter(s), ch. 9

2) hypothesis testing of parameter(s), ch. 10

• Let X be some random variable with unknown mean µ. Suppose we take a sample of size n. One could find the sample mean, $\bar{x}$, and use it to estimate µ. Such a single number used to estimate a parameter is called a point estimate.

• Question: Are there other reasonable point estimates (other than $\bar{x}$) for µ?

• Claim: The “best” point estimate of µ is $\bar{x}$.

• The point estimate does not give us a sense of how close $\bar{x}$ might be to µ, i.e. it doesn’t tell us about the inherent uncertainty in the process. Therefore, interval estimates are used, which provide a range of values that contain the parameter, say µ, with some specified degree of confidence. This range is called a confidence interval.

2. Two-Sided Confidence Interval

• Consider now some random variable X with mean µ and standard deviation σ. Suppose we want to estimate µ (the estimation of σ follows later). Using the central limit theorem,

$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$

Therefore,

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \tag{1}$$

is a standard normal.

• Recall that by definition

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha \tag{2}$$

e.g. because $z_{.025} = 1.96$,

$$P(-1.96 < Z < 1.96) = 1 - .05 = .95$$
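The critical value in (2) can be checked numerically instead of read from a Z table. The following is a minimal sketch using only Python's standard library; the helper names `z_critical` and `central_probability` are illustrative and not part of the notes.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """Return z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def central_probability(z: float) -> float:
    """Return P(-z < Z < z) for a standard normal Z."""
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(round(z_critical(0.05), 2))            # 1.96
print(round(central_probability(1.96), 3))   # 0.95
```

Changing `alpha` to 0.01 gives the critical value used for a 99% interval.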

• Substituting (1) into (2), and using some algebra, one can show that (2) implies

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \tag{3}$$

e.g. letting α = .05,

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$

• The interpretation of (3) is tricky. Notice that we calculate a random interval

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right).$$

If we do this repeatedly, we expect that 95% of the intervals contain the unknown parameter µ. It does not mean that µ assumes a value within the interval with probability 0.95. An illustration of this is given in Figure 9.1.

• In general, a 100(1−α)% confidence interval for µ is

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right). \tag{4}$$
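The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are drawn from a normal population, and the values µ = 200, σ = 46 and n = 12 are hypothetical inputs chosen for illustration. The function name `coverage` is likewise made up for this example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated z-intervals (known sigma) that contain mu."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    half_width = z * sigma / sqrt(n)             # same for every sample
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical population values, assumed for illustration only.
rate = coverage(mu=200.0, sigma=46.0, n=12, alpha=0.05, trials=10_000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should be close to 0.95. In the simulation µ is fixed and the interval endpoints vary from sample to sample, which is the point of the interpretation above.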

• The confidence interval could be written as $\bar{X} \pm E$ where

$$E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \tag{5}$$

E is also called the bound on error (BOE).

• Suppose one wants to reduce E. This could obviously be done by increasing α (that is, accepting a lower confidence level) or increasing n. In fact, n may be determined from E. Notice from (5),

$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}.$$

Therefore,

$$n = \left[\frac{z_{\alpha/2}\,\sigma}{E}\right]^2. \tag{6}$$
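Formulas (5) and (6) translate directly into two small functions. The sketch below uses the exact normal quantile from the standard library rather than a rounded table value, and rounds n up because a sample size must be a whole number. The function names and the inputs σ = 46, E = 10 are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def error_bound(sigma: float, n: int, alpha: float) -> float:
    """Bound on error E = z_{alpha/2} * sigma / sqrt(n), as in (5)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)

def required_sample_size(sigma: float, bound: float, alpha: float) -> int:
    """Smallest integer n with error bound <= bound, from (6)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / bound) ** 2)

# Illustrative inputs (assumed): sigma = 46, desired E = 10, alpha = .01.
n = required_sample_size(sigma=46, bound=10, alpha=0.01)
print(n)                                  # 141
print(error_bound(46, n, 0.01) <= 10)     # True
```

Halving the desired bound roughly quadruples the required n, since n grows with the square of 1/E.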

• Consider an example. Let X be the cholesterol level of a U.S. male who smokes. Suppose µ is unknown, but that we know σ = 46 (we actually assume this value because we presumably know σ = 46 for the population of all U.S. adult males). Suppose we take a sample of n = 12, and find $\bar{x} = 217$.

Then the best point estimate of µ is 217. Also, a 95% confidence interval for µ is (using $\bar{X} = 217$)

$$217 \pm \frac{1.96(46)}{\sqrt{12}} = 217 \pm 26 = (191, 243).$$

A 99% confidence interval for µ is

$$217 \pm \frac{2.58(46)}{\sqrt{12}} = (183, 251).$$

Why is it wider?
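The two intervals above can be reproduced in a few lines. This is a sketch assuming σ is known; `z_interval` is a hypothetical helper name, and it uses the exact normal quantile, so the unrounded endpoints differ slightly from those obtained with the table values 1.96 and 2.58.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, confidence: float):
    """Two-sided confidence interval for mu when sigma is known, as in (4)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    e = z * sigma / sqrt(n)                   # bound on error
    return xbar - e, xbar + e

# Cholesterol example from the notes: xbar = 217, sigma = 46, n = 12.
for level in (0.95, 0.99):
    lo, hi = z_interval(217, 46, 12, level)
    print(f"{level:.0%}: ({lo:.0f}, {hi:.0f})")
# 95%: (191, 243)
# 99%: (183, 251)
```

The 99% interval is wider because a larger critical value is needed to capture more of the sampling distribution of the sample mean.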

• Suppose we are designing the study, and we want to know µ within 10 units. (Note that E = 10 and the interval’s width is 20.) Substituting σ, E, and α = .01 into (6), one has

$$n = \left(\frac{2.58(46)}{10}\right)^2 = 140.8.$$

Therefore our recommendation is to obtain a sample of n = 141 males.

3. One-Sided Confidence Intervals

• Sometimes (though not often), one desires just an upper or a lower bound on µ. Using algebraic manipulations as before, one has:

A 100(1−α)% upper (one-sided) confidence bound on µ is

$$\bar{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}}, \tag{7}$$

and a corresponding lower confidence bound is

$$\bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}}. \tag{8}$$
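A short sketch of (7) and (8) follows. The only change from the two-sided case is that the whole of α goes into one tail, so the quantile is taken at 1 − α rather than 1 − α/2. The function names are illustrative, and the inputs reuse the cholesterol figures from the earlier example.

```python
from math import sqrt
from statistics import NormalDist

def upper_bound(xbar: float, sigma: float, n: int, alpha: float) -> float:
    """One-sided upper confidence bound on mu, as in (7)."""
    z = NormalDist().inv_cdf(1 - alpha)       # z_alpha, not z_{alpha/2}
    return xbar + z * sigma / sqrt(n)

def lower_bound(xbar: float, sigma: float, n: int, alpha: float) -> float:
    """One-sided lower confidence bound on mu, as in (8)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return xbar - z * sigma / sqrt(n)

# xbar = 217, sigma = 46, n = 12 are the cholesterol values used earlier.
print(round(upper_bound(217, 46, 12, 0.05), 1))
print(round(lower_bound(217, 46, 12, 0.05), 1))
```

Because the exact quantile is about 1.645, the result differs in the first decimal place from a hand calculation that uses a rounded table value such as 1.65.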

e.g. for U.S. males, to find a 95% upper bound, recall that $z_{.05} = 1.65$. Therefore the upper bound from the sample is

$$217 + \frac{1.65(46)}{\sqrt{12}} = 217 + 21.9 = 238.9$$

4. Student’s t distribution

• Consider a random variable X which is known to be normally distributed, but where neither µ nor σ is known. To make inferences about µ, we could use the Z transformation, provided σ is known. However, in the case of unknown σ, which is the common case, we replace σ by s to find

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$

This statistic has the t distribution with n−1 “degrees of freedom” (df).

• Several facts of interest are

1) The t-distribution is also symmetric about 0, but it has “thicker tails”, i.e. it’s a bit flatter. This is because there is added variability introduced by using the random variable s rather than a constant σ in the denominator (see Figure 9.2).

2) The t-distribution approaches Z as n becomes large. This is because as n increases, we have more information and s approaches σ. Why?
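Both facts can be seen in a simulation. The sketch below draws normal samples, forms the t statistic with s in the denominator, and estimates how often |t| exceeds 1.96. For a standard normal Z that probability is 0.05, so larger values indicate thicker tails. The population values µ = 0, σ = 1, the sample sizes, and the function name `tail_fraction` are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def tail_fraction(n, cutoff=1.96, trials=5_000, seed=0):
    """Estimate P(|t| > cutoff) where t = (xbar - mu) / (s / sqrt(n))."""
    rng = random.Random(seed)          # fixed seed for repeatability
    mu, sigma = 0.0, 1.0               # assumed population values
    exceed = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        t = (fmean(sample) - mu) / (stdev(sample) / sqrt(n))
        if abs(t) > cutoff:
            exceed += 1
    return exceed / trials

for n in (5, 30, 100):
    print(n, round(tail_fraction(n), 3))
```

The estimated tail fraction is well above 0.05 for n = 5 and moves toward 0.05 as n grows, consistent with s becoming a more precise estimate of σ.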

• Recall we had an extensive Z table, but it’s impractical to give a table for each $t_{n-1}$ statistic. Therefore, tables typically give only upper percent points. For example, from Table A-4, one has

$$t_{10,.025} = 2.228.$$

By symmetry,

$$t_{10,.975} = -2.228,$$

and thus one can find upper and lower percentiles.

• One can construct confidence intervals based on the t-distribution in the same way as before. The expression comparable to (3) is

$$P\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha.$$

In general, a 100(1−α)% confidence interval for µ, with approximately normal X, is

$$\left(\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right). \tag{9}$$

• For example, let X denote an infant’s plasma aluminum level, which is assumed to be approximately normal. Suppose we take a random sample of n = 10 infants, and find $\bar{x} = 37.2$ and s = 7.13. To find a 95% confidence interval, we find

$$t_{9,.025} = 2.262.$$

Hence, the interval estimate is

$$37.2 \pm \frac{2.262(7.13)}{\sqrt{10}} = 37.2 \pm 5.1 = (32.1, 42.3)$$
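The same calculation as a short sketch. Python's standard library has no t quantile function, so the critical value is passed in as read from the table (2.262 for 9 df). The function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Two-sided t interval for mu, as in (9); t_crit is t_{n-1, alpha/2} from a table."""
    e = t_crit * s / sqrt(n)     # bound on error, with s in place of sigma
    return xbar - e, xbar + e

# Values from the plasma aluminum example; t_crit is the table value for 9 df.
lo, hi = t_interval(xbar=37.2, s=7.13, n=10, t_crit=2.262)
print(f"({lo:.1f}, {hi:.1f})")   # (32.1, 42.3)
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, 9)` should return the same critical value to table precision, which removes the table lookup.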

• Notice that in this problem, the interval width is a random variable. Hence, if two experimenters took different samples, their interval widths would differ, unlike the case of known σ. In general, the interval based on t would be wider than that based on Z. Why? Though the widths differ, we expect 100(1−α)% of the intervals to contain µ. See Figure 9.3.

5. Case Study

• This chapter has a nice case study on the efficacy of a particular drug for treating children with ADD.
