Estimation

(1)

(2)

Statistical inference may be divided into two major

areas:

• Estimation

(3)

ESTIMATION Terms

• In theory of estimation,

STATISTIC

is renamed as

ESTIMATOR

( )

Value of an estimator is called

ESTIMATE

(4)

Point Estimate

is a

SINGLE VALUE

of the estimator, obtained from available

sample observations

Example : proportion of vegetarians in a random sample of 50

PGP students can be a point estimate of the corresponding

proportion in the population of all PGP students.

Interval Estimate (Confidence Interval)

is an

INTERVAL

that provides an upper and lower bound for a

specific unknown population parameter.

Ex: The interval (45, 52) may contain the true proportion of

vegetarians among all PGP students with 95% confidence.

(5)

(6)

An

estimator

of a parameter is a statistic used to estimate the

parameter. The most commonly-used estimator of the:

Population (Parameter)

Estimator (statistic)

Mean (



)

is the

Mean (X)

Variance (



2

)

is the

Variance (s

2

)

Standard Deviation (



)

is the

Standard Deviation (s)

Proportion (p)

is the proportion ( )

Difference of means (

)

is the

difference of means( )

p

• Desirable properties of estimators include:



Unbiasedness



Efficiency



Consistency

Point Estimators and Their

Properties

2 1 

(7)

An

unbiased

estimator is on

target on average.

A

biased

estimator is

off target on average.

{

Bias

(8)

Properties of estimator: Unbiasedness

T is said to be an unbiased estimator of

θ iff E(T)=θ

Example:

SAMPLE MEAN IS THE ESTIMATOR OF POPULATION MEAN







  

)

(

1 )

(

1 )

1 (

)

(

1 1 1 n i n i i n i i

E

x

n

x

E

n

x

n

E

x

E

(9)

Example of biased estimator: Sample

variance.

Given sample of size n from the population with unknown mean () and variance (2_{) we estimate mean as we already know and variance (intuitively) as:}

Sample variance is not an unbiased estimator for the population variance. That is why when mean and variance are unknown the following equation is used for sample

variance:



      n i i n i i x x n x x n T 1 2 2 1 2 1 ) ( 1 2 2 2 1 2 2 2 2 1 2 2 1

1 ]

[

]

[

1 ))

(

)

[var(

]

))

(

)

[var(

1 )

(

)

(

1 )

(











n

x

E

x

E

x

n

x

E

x

E

n

T

E

n i i n i i i n i

























  









n i i

x

n

s

1 2 2

)

(

1

(10)

Consistency

• A consistent estimator converges towards the

parameter being estimated as the sample size

increases. i.e.







n

as

T

V

and

T

E

0 )

(

)

(



n = 100

n = 10

(11)

An estimator is

efficient

if it has a relatively small variance (and

standard deviation).

An

efficient

estimator is,

on average, closer to the

parameter being estimated..

An

inefficient

estimator is, on

average, farther from the

parameter being estimated.

(12)

sample mean vs sample median

E(sample mean)=μ

E(sample median)=μ

V(sample mean) is σ

2

/n

(13)

Example

Suppose, you want to estimate mean and sd of score of a batsman

in one day cricket. So, you have randomly chosen 5 different

innings and recorded scores as below

20 52 8 63 11

Find out unbiased estimator of mean and variance.

07 .

25 )

(

1

1 ˆ

8 .

30

2 1















x

n

s

x

n i i







(14)

Interval Estimator =

Point Estimator ± Margin of

Error

(15)

Confidence Interval

Sample

Statistic or

Point

Estimator

Confidence Limit

(Lower)

Confidence Limit

(Upper)

A Probability That the Population Parameter Falls

Somewhere Within the Interval.

(16)

• Confidence Coefficient/Level

: Probability that

the confidence interval will contain true

parameter

• Denoted by

(1 - α) %

e.g. 90%, 95%, 99%

: Probability that the interval does not contain the

parameter

The confidence coefficient is the area under the

curve of the sampling distribution.

(17)

Mean

Confidence

Intervals

Proportion

Known

Interval Estimator (Large

Sample)

known

(18)

Confidence Interval of



(



known)

Assumption

–

Population Standard Deviation,



is Known

–

Sample size is large

Confidence Interval Estimator of



:

Let be an iid random sample of size n,

drawn from a population with mean



and sd



.

100(1-



)% Confidence Interval of



is

n

z

x

n

z

x



_ _/ ₂











_ _/ ₂



n

x

₁

,

₂

,...,

(19)

The quantity is often called the

margin of error

or

the

sampling error

.

n

z__/₂



For example, if: n = 30



= 20

= 122



114.85,129.15



15 . 7 122 ) 65 . 3 )( 96 . 1 ( 122 30 20 96 . 1 122 96 . 1         n x



A 95% confidence interval:



=0.05 ;



/2=0.025

x

z

0.025



1 .

96

(20)

0.4 0.3 0.2 0.1 0.0 x f( x)

Sampling Dis tribution of the Me an

 x x x x x x x x 2.5% 95% 2.5% n x1.96  x1.96_n x 2.5% fall above the interval 2.5% fall below the interval 95% fall within the interval

What is happening?

(21)

Background

We define as the z value that cuts off a right-tail area of under the standard normal curve.

z

_ 2  2 P z z P z z P z z z z n >    <       < <           





2 2 2 2 2 1 ( ) (1- )100% Confidence Interval: x 5 4 3 2 1 0 -1 -2 -3 -4 -5 0.4 0.3 0.2 0.1 0.0 Z f( z)

S t a n d a rd N o r m al Dis trib uti o n

z_ 2 (1) z_ 2  2  2

/2

(22)

0.99 0.005 2.576

0.98 0.010 2.326

0.95 0.025 1.960

0.90 0.050 1.645

0.80 0.100 1.282

(

1 



)



2 z

_ 2

Critical Values of

z

and Levels of

Confidence

5 4 3 2 1 0 - 1 -2 - 3 -4 - 5 0. 4 0. 3 0. 2 0. 1 0. 0 Z f( z) S t a n d a rd N o r m al Di s trib utio n z_ 2 (1) z_ 2  2  2

(23)

(24)

When sampling from the same population, using a fixed sample size, the

higher the confidence level, the wider the confidence interval.

5 4 3 2 1 0 -1 -2 - 3 - 4 -5 0 .4 0 .3 0 .2 0 .1 0 .0 Z f( z) S t a n d a r d N o r m al Di s tri b u ti o n 80% Confidence Interval: x n 1 28.  5 4 3 2 1 0 -1 -2 - 3 -4 -5 0 .4 0 .3 0 .2 0 .1 0 .0 Z f( z) S t a n d a r d N o r m al Di s tri b u ti o n

95% Confidence Interval:

x

n



1 96

.



Confidence level and the Width

of the Confidence Interval

(25)

Sample Size and the Width of the

Confidence Interval

When sampling from the same population, using a fixed confidence

level, the

larger the sample size, n, the narrower the confidence

interval.

0 .9 0 .8 0 .7 0 .6 0 .5 0 .4 0 .3 0 .2 0 .1 0 .0 x f( x)

S a m p lin g D is trib u tio n o f th e Me a n

95% Confidence Interval:

n = 40

0 .4 0 .3 0 .2 0 .1 0 .0 x f( x)

S a m p lin g D is trib u tio n o f th e Me a n

(26)

Confidence Interval of



(



unknown)

Assumption

–

Population Standard Deviation,



is unknown

–

Sample size is large

Confidence Interval Estimator of



:

Let be an iid random sample of size n, drawn from a large sample with mean  and sd . 100(1-)% Confidence Interval of  is

n

s

z

x

n

s

z

x



_ _/ ₂









_ _/ ₂ n x x x₁, ₂,...,













n i i

x

n

s

where

1 2

1

(27)

Practice Problem 1:

• A manufacturer of light bulbs claims that its light bulbs have a mean life  hours with a standard deviation of 85 hours. A random sample of 40 such bulbs is

selected for testing. If the sample produces a mean value of 1505 hours, find out 95% Confidence Interval of .

Solution: Given, n=40 (large), =85 (known), 1-=0.95, =0.05,

Therefore, 95% CI of  is given by 1505  x

96 .

1

025 . 0 2 /



z



z

_



1478

.

66 ,

1531.34



96 .

1

40

85 1505

,

96 .

1

40

85 1505











_

_

(28)

Practice Problem 2:

• Waiting times (in hours) at a popular restaurant are found to have a mean waiting time of 1.52 hours with sd 2.25hrs. for a sample of 50 customers. Construct the 99% confidence interval for the estimate of the population mean.

Solution: Given, n=50 (large), s=2.25 (estimated), 1-=0.99, =0.01,

Therefore, 99% CI of  is given by 52 . 1  x

58 .

2

005 . 0 2 /



z



z

_



1 .

20 ,

2.34 

58 .

2

50

25 .

2

52 .

1 ,

58 .

2

50

25 .

2

52 .

1 

















(29)

The estimator of the population proportion, , is the sample proportion, . If the sample size is large,

p p p p p p pq n q = (1 - p) p p p n p n q      

has an approximately normal distribution, with E( ) = and V( ) = where . When the population proportion is unknown, use the estimated value, , to estimate the standard deviation of .

For estimating , a sample is considered large enough when both an are greater than 5.

,

 

Large-Sample Confidence Intervals

for the Population Proportion, p

(30)

• Assumptions

–

Two Categorical Outcomes

–

Population Follows Binomial Distribution

–

Large Sample

100(1-



)% Confidence Interval for population

proportion p is given by

Large-Sample Confidence Intervals

for the Population Proportion, p

n

p

Z

p

n

p

Z

p

ˆ



_ _/₂

ˆ

(

1 

ˆ

)



ˆ



_ _/₂

ˆ

(

1 

ˆ

)

(31)

A marketing research firm wants to estimate the share that foreign companies have in the Indian market for certain products. A random sample of 100

consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.







 

.

( .

)( .

)

.

( .

)( .

)

.

, .

p

z

pq

n

















 2

0 34

1 96

0 34 0 66

100 0 34

1 96 0 04737

0 34

0 0928

0 2472 0 4328

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.

(32)

The width of a confidence interval can be reduced only at the

price of:

• a

lower level of confidence,

or

• a

larger sample

.





   . . ( . )( . ) . ( . )( . ) . . . , . p z pq n          2 0 34 1 645 0 34 0 66 100 0 34 1 645 0 04737 0 34 0 07792 0 2621 0 4197

90% Confidence Interval





   . . ( . )( . ) . ( . )( . ) . . . , . p z pq n          2 0 34 1 96 0 34 0 66 200 0 34 1 96 0 03350 0 34 0 0657 0 2743 0 4057

Sample Size, n = 200

Lower Level of Confidence

Larger Sample Size

Reducing the Width of Confidence Intervals

-The Value of Information

(33)

Mean

Confidence

Intervals



Known

Interval Estimator (Small Sample)

known

(34)

Confidence Interval of



(



known)

Assumption

– Population Distribution is Normal

– Population Standard Deviation,



is known

Confidence Interval Estimator of



:

Let be an iid random sample of size n, drawn

from a normal distribution with mean



and sd



.

100(1-



)% Confidence Interval of



is

n

z

x

n

z

x



_ _/ ₂











_ _/ ₂



n

x

₁

,

₂

,...,

(35)

Confidence Interval of



(



unknown)

Assumption

– Population Distribution is Normal

– Population Standard Deviation,



is unknown

Confidence Interval Estimator of



:

Let be a random sample of size n, drawn from normal with mean  and sd . 100(1-)% Confidence Interval of  is

n

s

t

x

n

s

t

x



_ _/ ₂_,_n_₁











_ _/ ₂_,_n_₁



n x x x₁, ₂,...,















n i i

x

n

s

where

1 2

1

where is the value of the t distribution

with n-1 degrees of freedom that cuts off a tail area of to its right.

2 / 

(36)

A stock market analyst wants to estimate the average return on a certain

stock. A random sample of 15 days yields an average (annualized) return of

and a standard deviation of s = 3.5%. Assuming a normal

population of returns, give a 95% confidence interval for the average return

on this stock.

The critical value of t for df = (n -1) = (15 -1) =14 and a right-tail area of 0.025 is:

𝑠`2 = 15

14𝑠

2 _{= 13.125; 𝑠` = 3.623}

The corresponding confidence interval or interval estimate is:

t_{0 025}_.  2.145 8.56,12.18 81 . 1 37 . 10 15 623 . 3 145 . 2 37 . 10 025 . 0       n s t x

Practice Problem 4:

% 37 . 10  x

(37)

•

How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)

•

What do you want the desired confidence level (1-) to be so that the distance between your estimate and the parameter is less than or equal to B?

•

What is your estimate of the variance (or standard deviation) of the population in question?

Before determining the necessary sample size, three questions must be answered:

n







_ 2

z

x

:

for

Interval

Confidence

)%

-(1

:

example

For



}

Bound, B

Sample-Size Determination

(38)

Minimum required sample size in estimating the population

mean,

:

Bound of estimate : B (Known)







n

z

B



2 2 2 2

Minimum required sample size in estimating the population

proportion,

n

z pq

B



2 2 2

Minimum Sample Size: Mean

and Proportion

(39)

A marketing research firm wants to conduct a survey to estimate the average amount spent on entertainment by each person visiting a popular resort. The

people who plan the survey would like to determine the average amount spent by all people visiting the resort to within $120, with 95% confidence. From past operation of the resort, an estimate of the population standard deviation is s = $400. What is the minimum required sample size?

n z B      2 2 2 2 2 2 2 1 96 400 120 42 684 43 ( . ) ( ) .

Example 1

(40)

The manufacturers of a sports car want to estimate the proportion of people in a given income bracket who are interested in the model. The company wants to know the population proportion, p, to within 0.01 with 99% confidence. Current company records indicate that the proportion p may be around 0.25. What is the minimum required sample size for this survey?

n

z pq

B





 2 2 2 2 2

2 576 0 25 0 75

010

124.42

125 .

( . )( .

)

.

Example 2

(41)

Problem

NDTV randomly selected 10,000 final year students

across different management schools in India and

asked them about their career choices. 4% said they

want to take the plunge and start their own companies

even if that meant giving up lucrative job offers from

established MNCs. Find a 99% confidence interval of

the true population proportion of management

students in India who want to work their start-ups.