III. Estimation and Testing in Simple Regression
a. Estimation in the Simple Linear Regression Model
b. Sampling Distribution of b1
c. Understanding Standard Errors
d. Confidence Intervals
e. Example: Log-Log Price Elasticity Regressions
f. Hypothesis Testing
g. Example: Market Model and Hypothesis Testing
h. P Values
a. Estimation in the Simple Linear Regression Model
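Recall that the SLR assumes that every observation in the dataset was generated by the model:
$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently} $$
We use Least Squares to estimate β0 and β1. Recall the formulas: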
$$ \hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2} $$
$$ \hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X} $$
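A minimal sketch of these formulas in R. The data are simulated (the seed, sample size, and true line 2 + 0.5 X are illustrative assumptions, not course data); the sketch computes b1 and b0 by hand and compares them with lm().

```r
# Simulated data: the "true line" here is 2 + 0.5 * X (assumed for illustration)
set.seed(1)
N <- 50
X <- runif(N, 0, 10)
Y <- 2 + 0.5 * X + rnorm(N, sd = 1)

# Least squares formulas for the slope and intercept
b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0 <- mean(Y) - b1 * mean(X)

# Compare with R's built-in least squares fit
fit <- lm(Y ~ X)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The two printed lines should agree up to rounding, and neither will equal the true values 2 and 0.5 exactly.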
NOTE!!: β0 is not b0, β1 is not b1, and εi is not ei.
[Figure: plot of Y against X showing the True Line, β0 + β1 X, the Least Squares Line, b0 + b1 X, and a residual ei.]
b. Sampling Distribution of b1
It is possible to derive the sampling distribution of b1 (see Appendix 2): b1 is a weighted combination of the Y values!
This distribution describes how the estimator b1 would vary over different samples with the X values fixed.
It turns out that b1 is normally distributed:
$$ b_1 \sim N\left(\beta_1, \sigma^2_{b_1}\right) $$
The mean is β1, so b1 is unbiased. The variance term, $\sigma^2_{b_1}$, determines how close the estimate will be to the true value. Remember: large σ is bad!
What is the formula for $\sigma^2_{b_1}$? Can we intuit what should be in the formula? (See Appendix 2 for the derivation.)
– How should σ figure in the formula?
– How should N figure in the formula?
– Anything else?
Three factors:
– N
– σ2
– sX
$$ \sigma^2_{b_1} = \mathrm{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} = \frac{\sigma^2}{(N-1)\, s_X^2} $$
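The idea of a sampling distribution "with the X values fixed" can be sketched by simulation. The parameter values below (true line 1 + 3 X, σ = 2, N = 30) are assumptions chosen only for illustration; the sketch redraws the errors many times and compares the simulated variance of b1 with σ2 / ((N − 1) sX2).

```r
set.seed(2)
N <- 30
sigma <- 2                       # assumed true error standard deviation
beta0 <- 1; beta1 <- 3           # assumed true intercept and slope
X <- seq(1, 10, length.out = N)  # X values held fixed across samples

# Draw a new sample of Y values and return the least squares slope
one_slope <- function() {
  Y <- beta0 + beta1 * X + rnorm(N, sd = sigma)
  sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
}
b1_draws <- replicate(5000, one_slope())

# Theoretical variance: sigma^2 / ((N - 1) * s_X^2)
theory_var <- sigma^2 / ((N - 1) * var(X))
print(c(mean_b1 = mean(b1_draws),
        simulated_var = var(b1_draws),
        theory_var = theory_var))
```

Increasing N, spreading out X, or shrinking sigma in this sketch each reduces both variances, which matches the three factors listed above.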
[Figure: scatterplots of Price against Size illustrating how sX, N, and σ affect the sampling variance of b1.]
c. Understanding Standard Errors
When estimating a quantity, it is vital to develop a notion of the precision of the estimation.
examples:
i. estimate the slope of the regression line
ii. estimate the value of a flat-panel TV given its size
iii. estimate the expected return on a portfolio
iv. estimate the value of a brand name
v. estimate the damages from patent infringement
Why is this important?
• We plan on making business decisions based on our
estimates.
• Some decisions may be very sensitive to the
An example from “everyday” life:
– When framing a house, we can estimate a required piece of wood to ± ¼”
– When building a fine cabinet, the estimates may have to be accurate to ±1/16” or even ±1/32”
The standard deviations of the least squares estimators of the slope and intercept give a precise measurement of the accuracy of the estimator.
If we insert our estimate of σ, then we have estimated standard deviations, or standard errors, for the least squares estimators:
$$ s_{b_1} = \sqrt{\frac{s^2}{(N-1)\, s_X^2}} \qquad \text{(s, not } \sigma\text{)} $$
Bottom Line: Now we can summarize the amount of information there is in the sample about the true regression line parameters.
Where can we find the standard errors on the R printout?
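As a rough illustration, the standard errors sit in the "Std. Error" column of the coefficient table that summary() prints for an lm fit. The data below are simulated stand-ins (not the course TV data), and the by-hand calculation assumes the usual estimate s2 = (sum of squared residuals) / (N − 2).

```r
set.seed(3)
N <- 70
X <- runif(N, 30, 65)                       # made-up sizes
Y <- -1400 + 57 * X + rnorm(N, sd = 250)    # made-up prices
fit <- lm(Y ~ X)

# The coefficient table: columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients
print(coef_table)

# Standard error of the slope by hand
s2   <- sum(resid(fit)^2) / (N - 2)         # estimate of the error variance
s_b1 <- sqrt(s2 / ((N - 1) * var(X)))       # sqrt(s^2 / ((N - 1) s_X^2))
print(c(by_hand = s_b1, from_summary = coef_table["X", "Std. Error"]))
```

The two numbers on the last line should match, which confirms that the printout reports the formula above with s in place of σ.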
d. Confidence Intervals
We want a margin of error in the estimation of the slope. We can use the standard errors to construct a confidence
interval which provides the margin of error.
All confidence intervals are of the form:
$$ b_1 \pm t^* s_{b_1} $$
t* is a positive number obtained from the t distribution.
So we have the estimate ± a multiple of the standard error.
To define a confidence interval, you must first set the confidence level.
We can never be completely confident that an interval will cover the true value (the 100% confidence interval is everything!), so we set a confidence level. Typically, 95 per cent is used (for very large datasets, a 99 per cent level should be used).
We then determine the multiple, t*, so that there is a 95 per
d. Confidence Intervals: finding t*
We find t* by reference to the t distribution. The t distribution is similar to the standard normal (Z) and is indexed by the number of degrees of freedom, N−2. For a 100 × (1−α)% confidence level:
$$ t^* = t^*_{N-2,\,\alpha/2} $$
d. t Distribution and Confidence Intervals
Thus, the 100 × (1−α)% C.I. is given by:
$$ b_1 \pm t^*_{N-2,\,\alpha/2}\; s_{b_1} $$
Confidence intervals provide information about the range of values of the slope consistent with our data. This is much more useful than simply using the slope estimate.
An estimate without some idea of its precision is useless.
The only question is how to find (“look up”) t*.
Let’s compute a confidence interval for the flat-panel TV data.
First we choose α: pick α = 0.05, i.e., a 95 per cent level of confidence.
95% CI: $57.13 \pm t^*_{68,\,0.025}\,(6.555)$
Finding the t* cut-off value
We need to find the value, t*, such that
$$ \Pr\left[-t^*_{N-2,\,\alpha/2} \le X \le t^*_{N-2,\,\alpha/2}\right] = 1 - \alpha \quad \text{where } X \sim t_{N-2} $$
or
$$ \Pr\left[X \le -t^*_{N-2,\,\alpha/2}\right] = \alpha/2 $$
This should remind us of the CDF function (Cumulative Distribution Function) …
Quick Review: CDF
The CDF is a table of probabilities that tells us: for any little x, what is the probability that the random variable X is less than x?
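Plotting this “table”…
[Figure: plot of the CDF, with axes labeled X and Probability.]
Let’s blow up the boxed area:
[Figure: enlarged region of the CDF, marking the probability 0.025 and the value t68,.025.]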
Note that we are using the CDF function or “table” backwards. We are reading from Probability to Value.
In fact, we are using the “Inverse” CDF function
Let’s do it in R. We use the qt() function which is the
“quantile” or inverse CDF function. We need to tell R which t distribution to use (the one with 68 df) and feed in α/2.
So our confidence interval is
57.13 ± 1.995(6.555) = [44.05, 70.21]
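The calculation above can be sketched in R as follows. The estimate, standard error, and degrees of freedom are the values quoted in the text; the variable names are illustrative. Note that qt() takes a left-tail probability, so feeding in α/2 returns the negative cut-off and 1 − α/2 returns the positive one.

```r
b1    <- 57.13    # slope estimate quoted in the text
s_b1  <- 6.555    # standard error quoted in the text
df    <- 68       # N - 2
alpha <- 0.05

t_lower <- qt(alpha / 2, df = df)        # left-tail cut-off (negative)
t_star  <- qt(1 - alpha / 2, df = df)    # positive cut-off, about 1.995

ci <- b1 + c(-1, 1) * t_star * s_b1      # estimate +/- t* x standard error
print(round(c(t_lower = t_lower, t_star = t_star), 3))
print(round(ci, 2))
```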
e. Example: Log-Log Price Elasticity Regressions
Let’s look at some demand data. The dataset detergent has demand data on sales and prices of Tide 128 oz laundry
detergent at some 86 stores with 2-5 years of weekly data.
Hard to see much of anything here.
Note: used pch=“.” to
Quantity is an odd variable. It can’t be less than zero and is often very small. There appear to be store-weeks where quantity is huge. Are these outliers?
Recall that the logarithm function has the effect of
compressing large values and expanding the axis for small values.
Basic Properties:
$$ \log(1) = 0 $$
$$ \log(z \times w) = \log(z) + \log(w) $$
[Figure: plot of log(x) against x.]
Graph histograms of quantity and log(quantity).
[Figure: two histograms, “Histogram of q_tide128” and “Histogram of log(q_tide128)”, each with Frequency on the vertical axis.]
Run regressions with log and non-logged variables.
e. Example: Elasticities
In the regression with raw variables, we interpret the regression coefficient as the expected change in q for a given change in p.
$$ E[q \mid p] = 662 - 69.4\,p $$
$$ \frac{\Delta q}{\Delta p} = -69.4 $$
How do we interpret this? Not very meaningful without a
To interpret the coefficient, some would convert it into an elasticity.
$$ \frac{\%\Delta q}{\%\Delta p} = \frac{\Delta q / q}{\Delta p / p} = \frac{\Delta q}{\Delta p} \times \frac{p}{q} = -69.4 \times \frac{8.36}{81} = -7.16 $$
Here we used the average levels of price and quantity. A one percent reduction in price yields a 7.16 percent increase in quantity.
In the log-log regression, the coefficient on log-price can be interpreted directly as an elasticity. Why?
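$$ \frac{\Delta \log q}{\Delta \log p} = -4.4 $$
$$ \frac{\Delta \log q}{\Delta \log p} = \frac{\log(q_1) - \log(q_0)}{\log(p_1) - \log(p_0)} = \frac{\log\left(\frac{q_1}{q_0}\right)}{\log\left(\frac{p_1}{p_0}\right)} = \frac{\log\left(1 + \frac{\Delta q}{q_0}\right)}{\log\left(1 + \frac{\Delta p}{p_0}\right)} \approx \frac{\Delta q / q_0}{\Delta p / p_0} $$
The last step uses log(1 + x) ≈ x for small x, so the ratio of log changes is approximately the ratio of percentage changes.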
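A small numerical check of this argument, sketched in R. It assumes a constant-elasticity demand curve with the slope of −4.4 reported above, and it borrows the average price (8.36) and quantity (81) from the earlier slide purely as a starting point; the one percent price change is an illustrative choice.

```r
elasticity <- -4.4       # log-log slope reported in the text
p0 <- 8.36               # starting price (average price quoted earlier)
q0 <- 81                 # starting quantity (average quantity quoted earlier)
p1 <- 1.01 * p0          # a one percent price increase

# Constant-elasticity demand: log(q) = a + elasticity * log(p)
q1 <- q0 * (p1 / p0)^elasticity

# Ratio of log changes: equals the log-log slope
ratio_of_logs <- log(q1 / q0) / log(p1 / p0)

# Ratio of percentage changes: close to the slope for small changes
pct_q <- 100 * (q1 - q0) / q0
pct_p <- 100 * (p1 - p0) / p0
print(c(ratio_of_logs = ratio_of_logs, ratio_of_pcts = pct_q / pct_p))
```

The two ratios are close but not identical; the gap widens if the price change is made larger, which is where the log(1 + x) ≈ x approximation starts to break down.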
e. Example: Log-log Demand
Let’s compute a confidence interval for the price elasticity. Since the sample size is very large (> 14,000), we will use a 99 percent confidence level.
Good to be a marketer with ample and informative data.
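The detergent data are not reproduced here, so the following is only a sketch on simulated data: prices, the noise level, and a true elasticity of −4.4 are all assumptions. It shows how a 99 percent interval for the log-price coefficient could be obtained with confint(), and that it matches the estimate ± t* × standard error recipe.

```r
set.seed(4)
N <- 14000
log_p <- log(runif(N, 6, 10))                     # hypothetical prices
log_q <- 13 - 4.4 * log_p + rnorm(N, sd = 0.8)    # assumed true elasticity: -4.4
fit <- lm(log_q ~ log_p)

# 99 percent confidence interval straight from R
ci_99 <- confint(fit, "log_p", level = 0.99)

# The same interval by hand
est    <- coef(summary(fit))["log_p", "Estimate"]
se     <- coef(summary(fit))["log_p", "Std. Error"]
t_star <- qt(1 - 0.01 / 2, df = N - 2)
by_hand <- est + c(-1, 1) * t_star * se

print(ci_99)
print(by_hand)
```

With a sample this large the interval is narrow, which is the sense in which the data are "ample and informative".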
f. Hypothesis Testing
Suppose that we are interested in a specific value of the slope parameter, β1.
This can be rephrased as a hypothesis
H0: Null (from “no effect”) vs.
HA : Alternative
For example, is there any evidence in the data to support the existence of a relationship between X and Y?
So if we want to test whether X affects Y, we would test whether β1 = 0.
How can we assess whether or not the data support or refute the null hypothesis?
We can look at our estimate of the true slope and compare it to the hypothesized value:
$$ b_1 - \beta_1^* \quad \text{(discrepancy)} $$
What is wrong with just using the discrepancy above? How close is close?
t statistic:
$$ t = \frac{b_1 - \beta_1^*}{s_{b_1}} $$
The basic intuition is that if the null is true then the t statistic should be small (in absolute value).
Get worried when | t | is large!
Formal Approach to Hypothesis-Testing: Two Steps:
i. Pick the significance level (α) = Prob(reject when null true)
by deciding what level of error of this kind is acceptable (called type I error).
ii. Use α to choose a rejection region – the set of t statistic
values which will lead to a rejection. This is done by picking a
This is exactly the same problem as picking the cut-off value in setting up the confidence interval!
In practice, we take a value of α to be around .05 unless:
– Sample size is small, or is large
– Cost of making a type I error is large
(type I error = reject null when null is true)
g. Example: Market Model and Hypothesis Testing
Even though we know it to be false, let’s hypothesize that there is no relationship between the Windsor Mutual Fund and the Market.
H0: β1 = 0 HA: β1 ≠ 0
t stat = (.93572 − 0)/.02915 = 32.1
(slope estimate = .93572, hypothesized value = 0, calculated std error of slope coef = .02915)
s = 0.01872
$$ s_{b_1} = \frac{s}{\sqrt{\sum (X_i - \bar{X})^2}} $$
The t value is huge (32) relative to the null t distribution with 180-2 =178 degrees of freedom!
To illustrate just how big this is, let’s simulate some numbers from the t distribution
In 1000 draws from the t distribution, we didn’t get a single value anywhere near 32.
We conclude that we reject the null hypothesis: H0: β1 = 0
[Figure: “Histogram of rt(1000, df = 178)”, with Frequency on the vertical axis and values ranging from about −3 to 3.]
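The comparison above can be sketched in R. The slope estimate and standard error are the values quoted in the text; the seed is an arbitrary choice, so the particular draws will differ from those in the original histogram.

```r
b1   <- 0.93572     # slope estimate quoted in the text
s_b1 <- 0.02915     # standard error quoted in the text
t_stat <- (b1 - 0) / s_b1           # t statistic for H0: beta1 = 0
print(round(t_stat, 1))

# Simulate from the null t distribution with 180 - 2 = 178 degrees of freedom
set.seed(5)
draws <- rt(1000, df = 178)
print(range(draws))

# How many simulated values are at least as extreme as the observed t?
print(sum(abs(draws) >= t_stat))
```

Calling hist(draws) afterwards would draw a histogram like the one described above.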
Now let’s test a more relevant value of β1
In finance, stocks and portfolios are characterized by their betas which are estimated from regressions very similar to this one. β1 is sometimes used as an estimate of risk or volatility. The value of 1 has central
significance.
β1 > 1: volatile assets (amplify market up/down moves)
β1 < 1: non-volatile assets (shrink market movements)
This suggests that we consider the hypothesis that β1 = 1. H0: β1 = 1
Find the critical value, t*, for the .05 significance level.
We want the value of the t statistic so that:
Pr[ | t | > t* ] = .05
[Figure: null t density with a tail area of 0.025 marked; the lower cut-off, −t*, is the value such that Pr[t < −t*] = 0.025.]
We can use the qt command (the inverse CDF of the t distribution) to compute this for us; with 178 degrees of freedom, qnorm gives nearly the same value.
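A minimal sketch of this lookup in R, assuming the 178 degrees of freedom from the Windsor example. It uses qt() for the t distribution and prints the qnorm() value alongside for comparison.

```r
alpha <- 0.05
df <- 180 - 2

lower  <- qt(alpha / 2, df = df)        # value with 0.025 of the area to its left
t_star <- qt(1 - alpha / 2, df = df)    # positive critical value
print(c(lower = lower, t_star = t_star))

# Check: total area outside [-t_star, t_star] should be alpha
print(pt(-t_star, df = df) + (1 - pt(t_star, df = df)))

# Normal approximation to the same cut-off
print(qnorm(1 - alpha / 2))
```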
Now compute the value of the t stat: (estimate of b1 − hypothesized value) / std error.
In absolute value, this is larger than the 95% critical value for t(178), so we reject the null hypothesis: H0: β1 = 1
Let’s look at the intercept for the Windsor fund regression. Remember this is Jensen’s alpha.
H0: β0 = 0 HA: β0 ≠ 0
We can see that the intercept is significantly different from zero. However, the 95% CI is wide:
h. P Values
One of the problems with formal hypothesis testing is that the strength of the information in the data for or against the null is not conveyed by accept/reject!
– If the t value is a tiny bit less than the t cutoff, we accept the null
– If the t value is a tiny bit bigger, we reject the null
The information from the data is pretty much the same but we act quite differently.
Therefore, we need some measure of the strength of rejection. The
The p-value is the probability of observing a value of the t statistic farther out in the tail than the observed t value.
[Figure: null t distribution with the observed t-stat value marked.]
For the standard t-tests printed out by R, the p-value is
Let’s compute p for the mutual funds example of testing
β1 = 1. pt() is the R function for the CDF of a t distribution.
This means that about 2.75% of the area of the null distribution lies above 2.22 or below −2.22, or:
Pr[t180−2 ≥ 2.22] + Pr[t180−2 ≤ −2.22] = 0.0275
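A sketch of the two-tail calculation with pt(), taking the magnitude 2.22 of the t statistic and the 178 degrees of freedom as given in the text (variable names are illustrative).

```r
t_obs <- 2.22      # magnitude of the observed t statistic, as reported in the text
df <- 180 - 2

# Area in both tails beyond +/- t_obs under the null t distribution
p_value <- pt(-t_obs, df = df) + (1 - pt(t_obs, df = df))
print(p_value)

# Compare with the significance level
alpha <- 0.05
print(p_value < alpha)
```

Because the t distribution is symmetric, 2 * pt(-t_obs, df = df) gives the same number.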
Small p value (< α) ⇔ large | t | ⇒ reject the null
Large p value (≥ α) ⇔ small | t | ⇒ accept the null
Appendix 1: Sampling Distribution of b0, cov(b1,b0)
$$ b_0 \sim N\left(\beta_0, \sigma^2_{b_0}\right) $$
$$ \mathrm{cov}(b_0, b_1) = -\sigma^2 \left( \frac{\bar{X}}{(N-1)\, s_X^2} \right) $$
Appendix 2: Derivation of Var(b1)
First, let’s write b1 as a linear combination of the Y’s. This makes b1 very much like a weird sort of sample average.
$$ b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X})\, Y_i}{\sum (X_i - \bar{X})^2} \quad \text{(Why?)} $$
To make things easier to read, use one symbol for the denominator:
$$ D = \sum (X_i - \bar{X})^2 $$
Now we can see that b1 is a linear combination of the Y’s. We will call the weights ci:
$$ b_1 = \sum c_i Y_i, \qquad c_i = \frac{X_i - \bar{X}}{D} $$
These are not the same weights as in the sample average (1/N vs. ci). Let’s observe some simple properties of the ci.
Property 1:
$$ \sum c_i = \frac{1}{D} \sum (X_i - \bar{X}) = 0 $$
Weights sum up to zero. Observations farther from X̄ receive larger weights (in absolute value). Remember: just because something sums to 0 doesn’t mean that it is all zeroes!
Property 2:
$$ \sum c_i^2 = \sum \frac{(X_i - \bar{X})^2}{D^2} = \frac{1}{D^2} \sum (X_i - \bar{X})^2 = \frac{D}{D^2} = \frac{1}{D} $$
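Now that we know a few things about the ci weights, we can easily derive the variance formula.
$$ \mathrm{Var}(b_1) = \mathrm{Var}\left( \sum c_i Y_i \right) $$
Now we use:
i. the fact that, given Xi, the Yi are independent
ii. the formula from the Math-Stat prereq on the variance of a linear combination of independent random variables
$$ \mathrm{Var}\left( \sum c_i Y_i \right) = \sum c_i^2\, \mathrm{Var}(Y_i) = \sigma^2 \sum c_i^2 $$
Recalling that the squared weights sum to 1/D, we get the by-now familiar formula!
$$ \mathrm{Var}(b_1) = \frac{\sigma^2}{D} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} $$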
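The properties of the weights can be checked numerically. This sketch takes D to be the denominator Σ(Xi − X̄)2 and ci = (Xi − X̄)/D, and uses simulated data (the seed and true line are arbitrary assumptions).

```r
set.seed(6)
N <- 25
X <- runif(N, 0, 10)
Y <- 1 + 2 * X + rnorm(N)        # made-up data for illustration

D  <- sum((X - mean(X))^2)       # one symbol for the denominator
ci <- (X - mean(X)) / D          # weights on the Y values

# Property 1: weights sum to zero; Property 2: squared weights sum to 1/D
print(c(sum_ci = sum(ci), sum_ci_sq = sum(ci^2), one_over_D = 1 / D))

# b1 as a linear combination of the Y's, compared with lm()
b1_weights <- sum(ci * Y)
b1_lm <- unname(coef(lm(Y ~ X))[2])
print(c(b1_weights = b1_weights, b1_lm = b1_lm))
```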
Glossary of Symbols
s - standard error of the regression
ci - weights used to compute b1
D - denominator of ci weight
sb - standard errors of least squares estimates
α - significance level
tN-2 - t random variable with N-2 degrees of freedom
tN-2, α/2 - t critical value for t with N-2 df and significance level α
Important Equations
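estimate of error variance:
$$ s^2 = \frac{1}{N-2} \sum_{i=1}^{N} e_i^2 $$
standard error of the regression:
$$ s = \sqrt{s^2} $$
three factors driving sampling variance of slope:
$$ \mathrm{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} = \frac{\sigma^2}{(N-1)\, s_X^2} $$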
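std errors of coefs:
$$ s_{b_1} = \sqrt{\frac{s^2}{(N-1)\, s_X^2}} $$
Confidence Interval:
$$ 100(1-\alpha)\%\ \text{C.I.:} \quad b \pm t_{N-2,\,\alpha/2}\; s_b $$
Rejection Region for a two-sided test at significance level α:
$$ |t| > t_{N-2,\,\alpha/2} $$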
Glossary of R Commands
• hist(a): Graphs a histogram of the data values of the variable a.
• pt(t,df=10): Returns the CDF (left-tail probability) at t for a t distribution with 10 degrees of freedom; used to compute the p-value of a t statistic.
• qt(prob,df=10): Returns the t value (quantile) whose left-tail probability is prob for a t distribution with 10 degrees of freedom.
• rt(100,df=10): Generates 100 random numbers from the t distribution with 10 degrees of freedom.