Statistical inference may be divided into two major areas:
• Estimation
ESTIMATION Terms
• In the theory of estimation, a STATISTIC is renamed an ESTIMATOR.
• The value of an estimator is called an ESTIMATE.
Point Estimate is a SINGLE VALUE of the estimator, obtained from the available sample observations.
Example: the proportion of vegetarians in a random sample of 50 PGP students can be a point estimate of the corresponding proportion in the population of all PGP students.
Interval Estimate (Confidence Interval) is an INTERVAL that provides an upper and lower bound for a specific unknown population parameter.
Ex: The interval (45%, 52%) may contain the true proportion of vegetarians among all PGP students with 95% confidence.
An estimator of a parameter is a statistic used to estimate the parameter. The most commonly used estimators are:

Population (Parameter)                      Estimator (Statistic)
Mean ($\mu$)                                Sample mean ($\bar{x}$)
Variance ($\sigma^2$)                       Sample variance ($s^2$)
Standard deviation ($\sigma$)               Sample standard deviation ($s$)
Proportion ($p$)                            Sample proportion ($\hat{p}$)
Difference of means ($\mu_1 - \mu_2$)       Difference of sample means ($\bar{x}_1 - \bar{x}_2$)
Point Estimators and Their Properties
• Desirable properties of estimators include:
– Unbiasedness
– Efficiency
– Consistency
An unbiased estimator is on target on average.
A biased estimator is off target on average. The gap between the estimator's expected value and the parameter is the bias.
Properties of estimator: Unbiasedness
T is said to be an unbiased estimator of $\theta$ iff $E(T) = \theta$.
Example: the sample mean is an unbiased estimator of the population mean.
$$E(\bar{x}) = E\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$$
Example of biased estimator: Sample variance.
Given a sample of size n from a population with unknown mean ($\mu$) and variance ($\sigma^2$), we estimate the mean by $\bar{x}$ as before and the variance (intuitively) as:
$$T = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$
Taking expectations:
$$E(T) = \frac{1}{n}\sum_{i=1}^{n} E(x_i^2) - E(\bar{x}^2) = \frac{1}{n}\sum_{i=1}^{n}\left[\mathrm{var}(x_i) + (E(x_i))^2\right] - \left[\mathrm{var}(\bar{x}) + (E(\bar{x}))^2\right]$$
$$= (\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \frac{n-1}{n}\,\sigma^2$$
So T is not an unbiased estimator of the population variance. That is why, when the mean and variance are unknown, the following equation is used for the sample variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
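The bias of the divide-by-n estimator can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the function name, the normal population, and the chosen values (n = 5, σ = 2, 20000 trials) are all illustrative assumptions.

```python
import random
import statistics


def average_variance_estimates(n, trials, sigma, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variance estimates
    over many simulated samples from a normal population (assumed here)."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_biased += statistics.pvariance(sample)   # divisor n (the estimator T)
        total_unbiased += statistics.variance(sample)  # divisor n - 1 (s^2)
    return total_biased / trials, total_unbiased / trials


# Hypothetical settings: true variance is sigma^2 = 4.
biased_avg, unbiased_avg = average_variance_estimates(n=5, trials=20000, sigma=2.0)
print(f"average of T   : {biased_avg:.3f}")
print(f"average of s^2 : {unbiased_avg:.3f}")
```

With these settings the average of T should land near (n − 1)/n × σ² = 3.2, while the average of s² should land near σ² = 4.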
Consistency
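• A consistent estimator converges towards the parameter being estimated as the sample size increases, i.e.
$$E(T) \to \theta \quad \text{and} \quad V(T) \to 0 \quad \text{as } n \to \infty$$
[Figure: sampling distributions of an estimator for n = 10 and n = 100]
McGraw-Hill/Irwin © 2007 The McGraw-Hill Companies, Inc. All rights reserved.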
An estimator is efficient if it has a relatively small variance (and standard deviation).
An efficient estimator is, on average, closer to the parameter being estimated.
An inefficient estimator is, on average, farther from the parameter being estimated.
Example: sample mean vs sample median
E(sample mean) = μ
E(sample median) = μ (for a symmetric population)
V(sample mean) = σ²/n
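The comparison between the sample mean and the sample median can be illustrated by simulation. This is a rough sketch under assumptions that are not in the slides: a standard normal population, n = 25, and 5000 repeated samples. The function name is hypothetical.

```python
import random
import statistics


def estimator_variances(n, trials, seed=0):
    """Simulate repeated samples from N(0, 1) and return the variance of
    the sample means and the variance of the sample medians."""
    rng = random.Random(seed)
    means = []
    medians = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)


var_mean, var_median = estimator_variances(n=25, trials=5000)
print(f"V(sample mean)   ~ {var_mean:.4f}  (theory: 1/25 = 0.04)")
print(f"V(sample median) ~ {var_median:.4f}")
```

Under these assumptions both estimators center on μ, but the sample mean shows the smaller variance, which is what makes it the more efficient of the two for a normal population.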
Example
Suppose you want to estimate the mean and sd of a batsman's score in one-day cricket. You have randomly chosen 5 different innings and recorded the scores below:
20 52 8 63 11
Find unbiased estimates of the mean and variance.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 30.8$$
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 628.7, \qquad \hat{\sigma} = s = 25.07$$
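The arithmetic for the five scores can be reproduced with Python's standard `statistics` module. This is a small illustrative check, with variable names chosen here rather than taken from the slides.

```python
import statistics

scores = [20, 52, 8, 63, 11]  # the five recorded innings

x_bar = statistics.mean(scores)    # point estimate of the mean
s2 = statistics.variance(scores)   # sample variance, divisor n - 1
s = statistics.stdev(scores)       # square root of the sample variance

print(f"mean = {x_bar:.1f}, variance = {s2:.1f}, sd = {s:.2f}")
```

`statistics.variance` and `statistics.stdev` use the n − 1 divisor, whereas `statistics.pvariance` and `statistics.pstdev` divide by n.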
Interval Estimator = Point Estimator ± Margin of Error
[Diagram: a confidence interval centered on the sample statistic (point estimator), running from the lower confidence limit to the upper confidence limit]
A confidence interval carries a probability that the population parameter falls somewhere within the interval.
• Confidence Coefficient/Level: probability that the confidence interval will contain the true parameter
• Denoted by 100(1 − α)%, e.g. 90%, 95%, 99%
• α: probability that the interval does not contain the parameter
The confidence coefficient is the area under the curve of the sampling distribution between the lower and upper confidence limits.
Interval Estimator (Large Sample)
[Diagram: confidence intervals for the mean ($\sigma$ known, $\sigma$ unknown) and for the proportion]
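Confidence Interval of $\mu$ ($\sigma$ known)
Assumption
– Population standard deviation, $\sigma$, is known
– Sample size is large
Confidence Interval Estimator of $\mu$:
Let $x_1, x_2, \ldots, x_n$ be an iid random sample of size n, drawn from a population with mean $\mu$ and sd $\sigma$.
The 100(1 − α)% Confidence Interval of $\mu$ is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$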
The quantity $z_{\alpha/2}\,\sigma/\sqrt{n}$ is often called the margin of error or the sampling error.
For example, if n = 30, $\sigma$ = 20 and $\bar{x}$ = 122, then for a 95% confidence interval $\alpha$ = 0.05, $\alpha/2$ = 0.025 and $z_{0.025}$ = 1.96:
$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{30}} = 122 \pm (1.96)(3.65) = 122 \pm 7.15 = [114.85,\ 129.15]$$
[Figure: Sampling Distribution of the Mean. 95% of sample means fall within $\mu \pm 1.96\,\sigma/\sqrt{n}$; 2.5% fall above the interval and 2.5% fall below the interval.]
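The worked example can be reproduced in a few lines. The sketch below assumes Python 3.8+ and uses `statistics.NormalDist` to look up the critical value; the function name `z_confidence_interval` is a hypothetical helper, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # right-tail area alpha/2
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval(x_bar=122, sigma=20, n=30)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Because the code carries more decimal places than the hand calculation (which rounds σ/√n to 3.65), its limits can differ from the slide's values in the second decimal place.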
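What is happening? Background
We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of $\alpha/2$ under the standard normal curve.
$$P(Z > z_{\alpha/2}) = \frac{\alpha}{2}, \qquad P(Z < -z_{\alpha/2}) = \frac{\alpha}{2}, \qquad P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$
(1 − α)100% Confidence Interval: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
[Figure: Standard Normal Distribution, with central area (1 − α) between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area α/2 in each tail]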
Critical Values of z and Levels of Confidence

(1 − α)     α/2       z_{α/2}
0.99        0.005     2.576
0.98        0.010     2.326
0.95        0.025     1.960
0.90        0.050     1.645
0.80        0.100     1.282
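The critical values in the table can be regenerated from the standard normal distribution. This is a short sketch assuming Python 3.8+ (`statistics.NormalDist`); `z_critical` is an illustrative name.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z value that cuts off a right-tail area of alpha/2, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


levels = [0.99, 0.98, 0.95, 0.90, 0.80]
critical_values = {level: round(z_critical(level), 3) for level in levels}

for level, z in critical_values.items():
    print(f"(1 - alpha) = {level:.2f}   alpha/2 = {(1 - level) / 2:.3f}   z = {z:.3f}")
```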
Confidence level and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
80% Confidence Interval: $\bar{x} \pm 1.28\,\sigma/\sqrt{n}$
95% Confidence Interval: $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
[Figures: Standard Normal Distribution with the central 80% and 95% areas marked]
Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
[Figures: Sampling Distribution of the Mean with 95% confidence intervals for two sample sizes; one panel is labeled n = 40]
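Both width effects, confidence level and sample size, can be seen by tabulating the interval width 2·z·σ/√n. The sketch below is illustrative only: σ = 10 and the sample sizes 20 and 40 are hypothetical values, and `ci_width` is a helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma, n, confidence):
    """Full width of the z-based interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / sqrt(n)


sigma = 10.0  # hypothetical population standard deviation

# Fixed n, varying confidence level: the width grows with the level.
for confidence in (0.80, 0.95):
    print(f"n = 20, level = {confidence:.2f}: width = {ci_width(sigma, 20, confidence):.2f}")

# Fixed level, varying n: the width shrinks as n grows.
for n in (20, 40):
    print(f"level = 0.95, n = {n}: width = {ci_width(sigma, n, 0.95):.2f}")
```

Since the width is proportional to 1/√n, doubling the sample size narrows the interval by a factor of √2, not 2.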
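Confidence Interval of $\mu$ ($\sigma$ unknown)
Assumption
– Population standard deviation, $\sigma$, is unknown
– Sample size is large
Confidence Interval Estimator of $\mu$:
Let $x_1, x_2, \ldots, x_n$ be an iid random sample of size n, drawn from a population with mean $\mu$ and sd $\sigma$.
The 100(1 − α)% Confidence Interval of $\mu$ is
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$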
Practice Problem 1:
• A manufacturer of light bulbs claims that its light bulbs have a mean life of $\mu$ hours with a standard deviation of 85 hours. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of 1505 hours, find the 95% Confidence Interval of $\mu$.
Solution: Given n = 40 (large), $\sigma$ = 85 (known), $\bar{x}$ = 1505, 1 − α = 0.95, α = 0.05, $z_{\alpha/2} = z_{0.025} = 1.96$.
Therefore, the 95% CI of $\mu$ is given by
$$\left(1505 - 1.96\frac{85}{\sqrt{40}},\ 1505 + 1.96\frac{85}{\sqrt{40}}\right) = (1478.66,\ 1531.34)$$
Practice Problem 2:
• Waiting times at a popular restaurant are found to have a mean of 1.52 hours with an sd of 2.25 hours for a sample of 50 customers. Construct the 99% confidence interval for the population mean.
Solution: Given n = 50 (large), s = 2.25 (estimated), $\bar{x}$ = 1.52, 1 − α = 0.99, α = 0.01, $z_{\alpha/2} = z_{0.005} = 2.58$.
Therefore, the 99% CI of $\mu$ is given by
$$\left(1.52 - 2.58\frac{2.25}{\sqrt{50}},\ 1.52 + 2.58\frac{2.25}{\sqrt{50}}\right) = (0.70,\ 2.34)$$
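Both practice problems use the same large-sample formula, so one helper covers them. This is a minimal sketch: `large_sample_ci` is a hypothetical name, and the critical values 1.96 and 2.58 are passed in as the rounded table values used in the solutions rather than computed.

```python
from math import sqrt


def large_sample_ci(x_bar, sd, n, z):
    """x_bar +/- z * sd / sqrt(n); sd is sigma if known, otherwise the sample s."""
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Practice Problem 1: sigma known, 95% confidence (z = 1.96)
bulbs_low, bulbs_high = large_sample_ci(x_bar=1505, sd=85, n=40, z=1.96)
print(f"Light bulbs  : ({bulbs_low:.2f}, {bulbs_high:.2f})")

# Practice Problem 2: sigma estimated by s, 99% confidence (z = 2.58)
wait_low, wait_high = large_sample_ci(x_bar=1.52, sd=2.25, n=50, z=2.58)
print(f"Waiting times: ({wait_low:.2f}, {wait_high:.2f})")
```

For the waiting-time data the margin of error is about 0.82 hours, which is large relative to the mean of 1.52 hours because the sample sd exceeds the sample mean.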
The estimator of the population proportion, $p$, is the sample proportion, $\hat{p}$. If the sample size is large, $\hat{p}$ has an approximately normal distribution, with $E(\hat{p}) = p$ and $V(\hat{p}) = \frac{pq}{n}$, where $q = (1 - p)$. When the population proportion is unknown, use the estimated value, $\hat{p}$, to estimate the standard deviation of $\hat{p}$.
For estimating $p$, a sample is considered large enough when both $np$ and $nq$ are greater than 5.
Large-Sample Confidence Intervals
for the Population Proportion, p
• Assumptions
– Two Categorical Outcomes
– Population Follows Binomial Distribution
– Large Sample
The 100(1 − α)% Confidence Interval for the population proportion p is given by
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
A marketing research firm wants to estimate the share that foreign companies have in the Indian market for certain products. A random sample of 100
consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.96)(0.04737) = 0.34 \pm 0.0928 = [0.2472,\ 0.4328]$$
Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
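The market-share interval can be recomputed directly from the counts. The following is a small sketch in which `proportion_ci` is an illustrative helper name and z = 1.96 is the table value for 95% confidence.

```python
from math import sqrt


def proportion_ci(successes, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    std_error = sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * std_error, p_hat + z * std_error


# 34 of 100 sampled consumers use foreign-made products.
low, high = proportion_ci(successes=34, n=100, z=1.96)
print(f"95% CI for p: [{low:.4f}, {high:.4f}]")
```

Here n·p̂ = 34 and n·q̂ = 66 are both well above 5, so the large-sample normal approximation is reasonable for this data.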
Reducing the Width of Confidence Intervals - The Value of Information
The width of a confidence interval can be reduced only at the price of:
• a lower level of confidence, or
• a larger sample.
Lower Level of Confidence: 90% Confidence Interval
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.645\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.645)(0.04737) = 0.34 \pm 0.07792 = [0.2621,\ 0.4197]$$
Larger Sample Size: Sample Size, n = 200
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{200}} = 0.34 \pm (1.96)(0.03350) = 0.34 \pm 0.0657 = [0.2743,\ 0.4057]$$
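Interval Estimator (Small Sample)
[Diagram: confidence intervals for the mean ($\sigma$ known, $\sigma$ unknown)]
Confidence Interval of $\mu$ ($\sigma$ known)
Assumption
– Population Distribution is Normal
– Population Standard Deviation, $\sigma$, is known
Confidence Interval Estimator of $\mu$:
Let $x_1, x_2, \ldots, x_n$ be an iid random sample of size n, drawn from a normal distribution with mean $\mu$ and sd $\sigma$.
The 100(1 − α)% Confidence Interval of $\mu$ is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$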
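Confidence Interval of $\mu$ ($\sigma$ unknown)
Assumption
– Population Distribution is Normal
– Population Standard Deviation, $\sigma$, is unknown
Confidence Interval Estimator of $\mu$:
Let $x_1, x_2, \ldots, x_n$ be a random sample of size n, drawn from a normal distribution with mean $\mu$ and sd $\sigma$.
The 100(1 − α)% Confidence Interval of $\mu$ is
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
where
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
and $t_{\alpha/2,\,n-1}$ is the value of the t distribution with n − 1 degrees of freedom that cuts off a tail area of $\alpha/2$ to its right.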
Practice Problem 4:
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of $\bar{x}$ = 10.37% and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.
The critical value of t for df = (n − 1) = (15 − 1) = 14 and a right-tail area of 0.025 is $t_{0.025}$ = 2.145.
Treating the reported s as computed with divisor n, the corresponding divisor-(n − 1) estimate is $s'^2 = \frac{15}{14}s^2 = 13.125$, so $s' = 3.623$.
The corresponding confidence interval or interval estimate is:
$$\bar{x} \pm t_{0.025}\frac{s'}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.623}{\sqrt{15}} = 10.37 \pm 2.01 = [8.36,\ 12.38]$$
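The t-based interval for the stock-return data can be checked with the sketch below. It stays within the standard library, so the critical value 2.145 (14 degrees of freedom, right-tail area 0.025) is hard-coded from the text rather than computed; the function name and the variable names are illustrative assumptions.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_crit):
    """x_bar +/- t_crit * s / sqrt(n), with t_crit supplied by the caller."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


n = 15
s_reported = 3.5                              # sd as given in the problem
s_adjusted = sqrt(n / (n - 1)) * s_reported   # rescale from divisor n to divisor n - 1

low, high = t_confidence_interval(x_bar=10.37, s=s_adjusted, n=n, t_crit=2.145)
print(f"s' = {s_adjusted:.3f}")
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

The rescaling step follows the slide's treatment of the reported standard deviation. If s = 3.5% were already the divisor-(n − 1) value, `s_reported` would be passed to the function directly and the interval would be slightly narrower.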
Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:
• How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
• What do you want the desired confidence level (1 − α) to be so that the distance between your estimate and the parameter is less than or equal to B?
• What is your estimate of the variance (or standard deviation) of the population in question?
For example: in the (1 − α)100% Confidence Interval for $\mu$, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, the bound B is the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$.
Minimum Sample Size: Mean and Proportion
Bound of estimate: B (known)
Minimum required sample size in estimating the population mean, $\mu$:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2}$$
Minimum required sample size in estimating the population proportion, $p$:
$$n = \frac{z_{\alpha/2}^2\,pq}{B^2}$$
Example 1
A marketing research firm wants to conduct a survey to estimate the average amount spent on entertainment by each person visiting a popular resort. The people who plan the survey would like to determine the average amount spent by all people visiting the resort to within $120, with 95% confidence. From past operation of the resort, an estimate of the population standard deviation is σ = $400. What is the minimum required sample size?
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{(1.96)^2(400)^2}{120^2} = 42.684 \approx 43$$
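The two minimum-sample-size formulas translate directly into code. The sketch below uses illustrative function names and rounds up with `math.ceil`, since a fractional sample size must be raised to the next whole unit to meet the bound.

```python
from math import ceil


def min_sample_size_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound, i.e. n >= z^2 sigma^2 / B^2."""
    return ceil((z * sigma / bound) ** 2)


def min_sample_size_proportion(z, p, bound):
    """Smallest n with z * sqrt(p q / n) <= bound, i.e. n >= z^2 p q / B^2."""
    return ceil(z ** 2 * p * (1.0 - p) / bound ** 2)


# Resort survey: within $120 of the mean with 95% confidence, sigma = $400.
n_resort = min_sample_size_mean(z=1.96, sigma=400, bound=120)
print(f"Minimum sample size for the resort survey: {n_resort}")
```

`min_sample_size_proportion` takes a planning value for p. When no prior guess is available, p = 0.5 is the conservative choice because it maximizes pq.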
The manufacturers of a sports car want to estimate the proportion of people in a given income bracket who are interested in the model. The company wants to know the population proportion, p, to within 0.01 with 99% confidence. Current company records indicate that the proportion p may be around 0.25. What is the minimum required sample size for this survey?