
(1)

III. Estimation and Testing in Simple Regression

a.  Estimation in the Simple Linear Regression Model

b.  Sampling Distribution of b₁

c.  Understanding Standard Errors

d.  Confidence Intervals

e.  Example: Log-Log Price Elasticity Regressions

f.  Hypothesis-testing

g.  Example: Market Model and Hypothesis-testing

(2)

a. Estimation in the Simple Linear Regression Model

Recall that the simple linear regression (SLR) model assumes that every observation in the dataset was generated by the model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

We use Least Squares to estimate β₀ and β₁. Recall the formulas:

$$\hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{N}(X_i-\bar{X})^2}$$

$$\hat{\beta}_0 = b_0 = \bar{Y} - b_1\bar{X}$$
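The two formulas can be checked numerically. The sketch below is only an illustration on simulated data (the true values 2 and 0.5, the sample size, and the variable names are assumptions, not from the course data); it computes b₁ and b₀ by hand and compares them with lm().

```r
# Simulated data; the "true" intercept 2 and slope 0.5 are arbitrary choices.
set.seed(1)
N <- 50
x <- runif(N, 0, 10)
y <- 2 + 0.5 * x + rnorm(N, sd = 1)

# Least squares formulas for the slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Compare with R's built-in least squares fit
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The hand-computed values and the lm() coefficients should agree up to rounding.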

(3)

a. Estimation in the Simple Linear Regression Model

NOTE!!: β₀ is not b₀, β₁ is not b₁, and ε_i is not e_i.

[Figure: plot of Y against X showing the True Line, β₀ + β₁X, the Least Squares Line, b₀ + b₁X, and a residual e_i.]

(4)

b. Sampling Distribution of b₁

It is possible to derive the sampling distribution of b₁. See Appendix 2: b₁ is a weighted average of the Y values!

This distribution describes how the estimator b₁ would vary over different samples with the X values fixed.

It turns out that b₁ is normally distributed:

$$b_1 \sim N\left(\beta_1,\ \sigma^2_{b_1}\right)$$

–  The mean is β₁, so b₁ is unbiased.

–  σ²_{b₁} is the variance of b₁.

The variance term determines how close the estimate will be to the true value. Remember: large σ is bad!
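One way to see the sampling distribution is to simulate it. The sketch below is a hedged illustration: the true parameter values, N, and the X grid are all made up. It holds X fixed, draws many samples of Y, and re-estimates the slope each time.

```r
set.seed(2)
beta0 <- 1; beta1 <- 2; sigma <- 3      # hypothetical true values
N <- 30
x <- seq(1, 10, length.out = N)         # X values held fixed across samples

# Re-draw Y for the same X many times and re-estimate the slope each time
b1_draws <- replicate(2000, {
  y <- beta0 + beta1 * x + rnorm(N, sd = sigma)
  sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
})

theory_var <- sigma^2 / sum((x - mean(x))^2)
print(mean(b1_draws))   # should be close to beta1 (unbiasedness)
print(var(b1_draws))    # should be close to theory_var
print(theory_var)
hist(b1_draws, main = "Simulated sampling distribution of b1")
```

The histogram should look roughly bell-shaped and centered near the true slope.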

(5)

What is the formula for σ²_{b₁}? Can we intuit what should be in the formula? (See Appendix 2 for the derivation.)

–  How should σ figure in the formula?

–  How should N figure in the formula?

–  Anything else?

Three factors:

–  N

–  σ²

–  s_X

$$\mathrm{Var}(b_1) = \sigma^2_{b_1} = \frac{\sigma^2}{\sum_{i=1}^{N}(X_i-\bar{X})^2} = \frac{\sigma^2}{(N-1)\,s_X^2}$$
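The role of each factor can be checked with a few lines of R. This is a sketch under assumptions: var_b1 is a hypothetical helper name and the X values are simulated.

```r
# Theoretical variance of b1 for a given sigma and set of X values
var_b1 <- function(sigma, x) sigma^2 / sum((x - mean(x))^2)

set.seed(3)
x <- rnorm(40, mean = 50, sd = 5)   # made-up X values
N <- length(x)

# The two forms of the denominator agree: sum of squares = (N - 1) * s_X^2
print(all.equal(sum((x - mean(x))^2), (N - 1) * var(x)))

base <- var_b1(sigma = 10, x)
print(var_b1(sigma = 20, x) / base)                   # doubling sigma: variance times 4
print(var_b1(sigma = 10, c(x, x)) / base)             # doubling N, same spread: variance halves
print(var_b1(sigma = 10, 50 + 2 * (x - 50)) / base)   # doubling the spread of X: variance / 4
```

Larger σ hurts precision, while more observations and more spread in X both help.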

(6)

[Figure: four scatterplots of Price (0 to 2000) against Size (35 to 65), with panels labelled s_X, N, and σ.]

(7)

c. Understanding Standard Errors

When estimating a quantity, it is vital to develop a notion of the precision of the estimation.

examples:

i.  estimate the slope of the regression line

ii.  estimate the value of a flat-panel TV given its size

iii.  estimate the expected return on a portfolio

iv.  estimate the value of a brand name

v.  estimate the damages from patent infringement

Why is this important?

•  We plan on making business decisions based on our estimates.

•  Some decisions may be very sensitive to the

(8)

c. Understanding Standard Errors

An example from “everyday” life:

–  When framing a house, we can estimate a required piece of wood to ±¼”

–  When building a fine cabinet, the estimates may have to be accurate to ±1/16” or even ±1/32”

The standard deviations of the least squares estimators of the slope and intercept give a precise measurement of the accuracy of the estimator.

(9)

If we insert our estimate of σ, then we have estimated standard deviations, or standard errors, for the least squares estimators:

$$s_{b_1} = \sqrt{\frac{s^2}{(N-1)\,s_X^2}}$$

Note that this uses s, not σ.

Bottom Line: Now we can summarize the amount of information there is in the sample about the true regression line parameters.

(10)

Where can we find the standard errors on the R printout?
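As a hedged sketch on simulated data (not the course dataset), the standard errors appear in the "Std. Error" column of the coefficient table printed by summary(). The same number can be recovered by hand from s, N, and s_X.

```r
set.seed(4)
N <- 60
x <- runif(N, 0, 10)
y <- 3 + 2 * x + rnorm(N, sd = 4)        # made-up data
fit <- lm(y ~ x)

s    <- summary(fit)$sigma               # standard error of the regression
s_b1 <- sqrt(s^2 / ((N - 1) * var(x)))   # slope standard error computed by hand

coef_table <- summary(fit)$coefficients
print(coef_table)                        # the "Std. Error" column holds s_b0 and s_b1
print(c(by_hand = s_b1, from_R = coef_table["x", "Std. Error"]))
```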

(11)

d. Confidence Intervals

We want a margin of error in the estimation of the slope. We can use the standard errors to construct a confidence interval which provides the margin of error.

All confidence intervals are of the form:

$$b_1 \pm t^*\, s_{b_1}$$

t* is a positive number obtained from the t distribution.

So we have the estimate +/- a multiple of the standard error.

(12)

d. Confidence Intervals

To define a confidence interval, you must first set the confidence level.

We can never be completely confident that an interval will cover the true value (the 100% confidence interval is everything!). So we set a confidence level. Typically, 95 per cent is used (for very large datasets a 99 per cent level should be used).

We then determine the multiple, t*, so that there is a 95 per cent chance that the interval will cover the true value.

(13)

d. Confidence Intervals: finding t*

We find t* by reference to the t distribution. The t distribution is similar to the standard normal (Z) and indexed by the number of degrees of freedom, N−2. For a 100(1−α)% confidence level:

$$t^*_{N-2,\ \alpha/2}$$

(14)

d. t Distribution and Confidence Intervals

Thus, the 100(1−α)% C.I. is given by:

$$b_1 \pm t^*_{N-2,\ \alpha/2}\; s_{b_1}$$

Confidence intervals provide information about the range of values of the slope consistent with our data. This is much more useful than simply using the slope estimate.

An estimate without some idea of its precision is useless.

The only question is how to find (“look up”) t*.

(15)

d. t Distribution and Confidence Intervals

Let’s compute a confidence interval for the flat-panel TV data.

First we choose α. Pick α = 0.05, or a 95 per cent level of confidence.

95% CI: 57.13 ± t*_{68,0.025} (6.555)

Finding the t* cut-off value

We need to find the value, t*, such that

$$\Pr\left[-t^*_{N-2,\ \alpha/2} \le X \le t^*_{N-2,\ \alpha/2}\right] = 1-\alpha \quad \text{where } X \sim t_{N-2}$$

or

$$\Pr\left[X \le -t^*_{N-2,\ \alpha/2}\right] = \alpha/2$$

This should remind us of the CDF (Cumulative Distribution Function) …

(16)

Quick Review: CDF

The CDF is a table of probabilities that tells us: for any little x, what is the probability that the random variable X is less than x?

Plotting this “table”…

[Figure: plot of the CDF, with X on the horizontal axis and Probability on the vertical axis.]

(17)

Quick Review: CDF

Let’s blow up the boxed area:

[Figure: enlarged region of the CDF plot, marking the probability 0.025 and the value labelled t_68,.025.]

Note that we are using the CDF function or “table” backwards. We are reading from Probability to Value.

In fact, we are using the “Inverse” CDF function
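A minimal sketch of reading the table in both directions, assuming the 68 degrees of freedom used in this example: pt() maps a value to a probability, and qt() maps a probability back to a value.

```r
df <- 68
p  <- 0.025

x <- qt(p, df = df)      # inverse CDF: from probability to value
print(x)                 # a negative number: the lower cut-off
print(pt(x, df = df))    # CDF: from value back to probability, recovers 0.025
```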

(18)

Quick Review: CDF

Let’s do it in R. We use the qt() function, which is the “quantile” or inverse CDF function. We need to tell R which t distribution to use (the one with 68 df) and feed in α/2.

So our confidence interval is

57.13 ± 1.995(6.555) = [44.05, 70.21]
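The calculation can be sketched in R as follows. The estimate and standard error are the values quoted on the slide; the variable names are only for illustration.

```r
b1    <- 57.13    # slope estimate quoted on the slide
s_b1  <- 6.555    # its standard error quoted on the slide
df    <- 68       # N - 2
alpha <- 0.05

# Upper cut-off; qt(alpha / 2, df) returns the same number with a minus sign
t_star <- qt(1 - alpha / 2, df = df)
ci <- b1 + c(-1, 1) * t_star * s_b1

print(round(t_star, 3))
print(round(ci, 2))
```

Feeding in α/2 directly gives the negative cut-off; by symmetry of the t distribution its absolute value is the multiple used in the interval.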

(19)

e. Example: Log-Log Price Elasticity Regressions

Let’s look at some demand data. The dataset detergent has demand data on sales and prices of Tide 128 oz laundry detergent at some 86 stores with 2-5 years of weekly data.

Hard to see much of anything here.

Note: used pch=“.” to

(20)

Quantity is an odd variable. It can’t be less than zero and is often very small. There appear to be store-weeks where quantity is huge. Are these outliers?

Recall that the logarithm function has the effect of compressing large values and expanding the axis for small values.

Basic Properties:

$$\log(1) = 0$$

$$\log(z \times w) = \log(z) + \log(w)$$
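Both properties, and the compression effect, are easy to confirm in R. The numbers below are arbitrary illustrations.

```r
z <- 3
w <- 250

print(log(1))                                   # 0
print(all.equal(log(z * w), log(z) + log(w)))   # TRUE

# Compression: each tenfold jump in q is the same step on the log scale
q <- c(1, 10, 100, 1000)
print(diff(q))        # steps of 9, 90, 900
print(diff(log(q)))   # equal steps of log(10)
```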

(21)

Graph histogram of quantity and log(quantity).

[Figure: three panels: a plot of log(x) against x, the histogram of q_tide128, and the histogram of log(q_tide128), each histogram with Frequency on the vertical axis.]

(22)

(23)

Run regressions with log and non-logged variables.

(24)

e. Example: Elasticities

In the regression with raw variables, we interpret the regression coefficient as the expected change in q for a given change in p.

$$E[q \mid p] = 662 - 69.4\,p$$

$$\frac{\Delta q}{\Delta p} = -69.4$$

How do we interpret this? Not very meaningful without a

(25)

To interpret the coefficient, some would convert it into an elasticity.

$$\frac{\%\Delta q}{\%\Delta p} = \frac{\Delta q / q}{\Delta p / p} = \frac{\Delta q}{\Delta p} \times \frac{p}{q} = -69.4 \times \frac{8.36}{81} = -7.16$$

Here we used the average levels of price and quantity. A one percent reduction in price yields a 7.16 percent increase in quantity.
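The arithmetic can be reproduced directly from the slope and the average price and quantity quoted above (the variable names are illustrative):

```r
slope <- -69.4   # coefficient from the raw-variable regression
p_bar <- 8.36    # average price
q_bar <- 81      # average quantity

# Elasticity evaluated at the averages: (dq/dp) * (p/q)
elasticity <- slope * p_bar / q_bar
print(round(elasticity, 2))
```

Note that this elasticity depends on where it is evaluated; a different price and quantity level would give a different number.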

(26)

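In the log-log regression, the coefficient on log-price can be interpreted directly as an elasticity. Why?

$$\frac{\Delta \log q}{\Delta \log p} = -4.4 = \frac{\log(q_1) - \log(q_0)}{\log(p_1) - \log(p_0)} = \frac{\log\left(\frac{q_1}{q_0}\right)}{\log\left(\frac{p_1}{p_0}\right)} = \frac{\log\left(1 + \frac{\Delta q}{q_0}\right)}{\log\left(1 + \frac{\Delta p}{p_0}\right)} \approx \frac{\Delta q / q_0}{\Delta p / p_0} = \frac{\%\Delta q}{\%\Delta p}$$

The last step uses log(1 + x) ≈ x for small x.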
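A simulation can make the elasticity interpretation concrete. This is a sketch under stated assumptions: the prices, the intercept 13, and the noise level are invented, and −4.4 is borrowed from the slide as the "true" elasticity. It generates constant-elasticity demand and checks that the log-log slope recovers it.

```r
set.seed(9)
n <- 2000
elasticity <- -4.4                 # used here as the assumed true elasticity
p <- runif(n, 6, 10)               # hypothetical prices
q <- exp(13 + elasticity * log(p) + rnorm(n, sd = 0.5))   # hypothetical demand

fit <- lm(log(q) ~ log(p))
b1 <- unname(coef(fit)["log(p)"])
print(b1)                          # should be close to -4.4

# log(1 + x) is close to x when x is small, which is why log changes
# behave like percentage changes
x <- c(0.01, 0.05, 0.10)
print(cbind(x, log_1_plus_x = log(1 + x)))
```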

(27)

e. Example: Log-log Demand

Let’s compute a confidence interval for the price elasticity. Since the sample size is very large (> 14,000), we will use a 99 percent confidence level.

Good to be a marketer with ample and informative data.
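For a 99 percent interval the cut-off comes from qt(0.995, df). The sketch below uses N = 14000 purely as a stand-in, since the exact sample size is not given here; with this many degrees of freedom the t cut-off is essentially the normal one.

```r
df <- 14000 - 2                  # stand-in for N - 2; the slide only says N > 14,000

t99 <- qt(0.995, df = df)        # 99 percent cut-off
z99 <- qnorm(0.995)              # standard normal counterpart
t95 <- qt(0.975, df = df)        # 95 percent cut-off, for comparison

print(c(t99 = t99, z99 = z99, t95 = t95))
```

The 99 percent interval is wider than the 95 percent one by the ratio t99 / t95.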

(28)

f. Hypothesis Testing

Suppose that we are interested in a specific value of the slope parameter, β₁.

This can be rephrased as a hypothesis

H₀: Null (from “no effect”) vs.

H_A : Alternative

For example, is there any evidence in the data to support the existence of a relationship between X and Y?

So if we want to test whether X affects Y, we would test whether β₁ = 0.

(29)

How can we assess whether or not the data support or refute the null hypothesis?

We can look at our estimate of the true slope and compare it to the hypothesized value:

$$b_1 - \beta_1^* \quad \text{(discrepancy)}$$

What is wrong with just using the discrepancy above? How close is close?

(30)

t statistic:

$$t = \frac{b_1 - \beta_1^*}{s_{b_1}}$$

The basic intuition is that if the null is true then the t statistic should be small (in absolute value).

Get worried when t is large!

(31)

Formal Approach to Hypothesis-Testing: Two Steps:

i.  Pick the significance level (α) = Prob(reject when null true) by deciding what level of error of this kind is acceptable (called type I error).

ii.  Use α to choose a rejection region – the set of t statistic values which will lead to a rejection. This is done by picking a

(32)

This is exactly the same problem as picking the cut-off value in setting up the confidence interval!
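A two-sided rejection rule can be sketched as a small function. The name reject_null is hypothetical, and the rule assumes the two-sided alternative used in these slides.

```r
# Reject H0 when |t| exceeds the same cut-off used for the confidence interval
reject_null <- function(t_stat, df, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = df)
  abs(t_stat) > t_crit
}

print(reject_null(2.5, df = 68))   # TRUE: beyond the cut-off of about 1.995
print(reject_null(0.5, df = 68))   # FALSE: well inside the cut-off
```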

(33)

In practice, we take a value of α to be around .05 unless:

–  Sample size is small, or is large

–  Cost of making a type I error is large

(type I error = reject null when null is true)

(34)

g. Example: Market Model and Hypothesis Testing

Even though we know it to be false, let’s hypothesize that there is no relationship between the Windsor Mutual Fund and the Market.

H₀: β₁ = 0 H_A: β₁ ≠ 0

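t stat = (0.93572 − 0) / 0.02915 = 32.1

–  0.93572 is the slope estimate

–  0 is the hypothesized value

–  0.02915 is the calculated std error of the slope coef:

$$s_{b_1} = \frac{s}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}}, \qquad s = 0.01872$$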
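The t statistic can be reproduced from the slope estimate and standard error quoted on this slide (variable names are illustrative):

```r
b1         <- 0.93572   # slope estimate from the printout
se_b1      <- 0.02915   # standard error of the slope
beta1_null <- 0         # hypothesized value under H0

t_stat <- (b1 - beta1_null) / se_b1
print(round(t_stat, 1))
```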

(35)

The t value is huge (32) relative to the null t distribution with 180-2 =178 degrees of freedom!

To illustrate just how big this is, let’s simulate some numbers from the t distribution

In 1000 draws from the t distribution, we didn’t get a single value anywhere near 32.

We conclude that we reject the null hypothesis: H0: β1 = 0

[Figure: Histogram of rt(1000, df = 178), with Frequency on the vertical axis; the draws fall roughly between -3 and 3.]
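A sketch of the simulation follows. The seed is arbitrary, so the individual draws will differ from those behind the slide's histogram.

```r
set.seed(35)                       # arbitrary seed, for reproducibility only
draws <- rt(1000, df = 178)        # 1000 draws from the null t distribution

hist(draws, main = "Histogram of rt(1000, df = 178)")
print(range(draws))                # the most extreme simulated values
print(any(abs(draws) >= 32.1))     # is any draw as extreme as the observed t?
```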

(36)

Now let’s test a more relevant value of β₁

In finance, stocks and portfolios are characterized by their betas, which are estimated from regressions very similar to this one. β₁ is sometimes used as an estimate of risk or volatility. The value of 1 has central significance.

β₁ > 1: volatile assets (amplify market up/down moves)

β₁ < 1: non-volatile assets (shrink market movements)

This suggests that we consider the hypothesis that β₁ = 1.

H₀: β₁ = 1

(37)

We want the value of the t statistic so that:

Pr[ |t| > t* ] = .05

Find the critical value, t*, for the .05 significance level. Each tail has area 0.025, so the lower cut-off is the value such that Pr[t < −t*] = 0.025.

We can use the qt (inverse CDF) command to compute this for us.

(38)

Now compute the value of the t statistic:

t = (estimate b₁ − hypothesized value) / std error

This is larger (in absolute value) than the 95% critical value for t(178), so we reject the null hypothesis H₀: β₁ = 1.
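A hedged sketch of this test, using the rounded slope estimate (0.93572) and standard error (0.02915) from the earlier printout. Because those inputs are rounded, the t value may differ slightly from the one shown on the original slide.

```r
b1    <- 0.93572    # slope estimate (rounded, from the printout)
se_b1 <- 0.02915    # standard error of the slope (rounded)
df    <- 180 - 2

t_stat <- (b1 - 1) / se_b1        # test H0: beta1 = 1
t_crit <- qt(0.025, df = df)      # lower cut-off; the upper one is -t_crit

print(t_stat)
print(t_crit)
print(abs(t_stat) > abs(t_crit))  # TRUE means reject H0 at the .05 level
```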

(39)

g. Example: Market Model and Hypothesis Testing

Let’s look at the intercept for the Windsor fund regression. Remember this is Jensen’s alpha.

H₀: β₀ = 0    H_A: β₀ ≠ 0

We can see that the intercept is significantly different from zero. However, the 95% CI is wide:

(40)

h. P Values

One of the problems with formal hypothesis-testing is that the strength of information in the data in support of or against the null is not conveyed by accept/reject!

–  If the t value is a tiny bit less than the t cutoff, we accept the null

–  If the t value is a tiny bit bigger, we reject the null

The information from the data is pretty much the same but we act quite differently.

Therefore, we need some measure of the strength of rejection. The

(41)

h. P Values

The p-value is the probability of observing a value of the t statistic farther out in the tail than the observed t value.

Observed t-stat value

For the standard t-tests printed out by R, the p-value is

(42)

h. P Values

Let’s compute p for the mutual funds example of testing β₁ = 1. pt() is the R function for the CDF of a t distribution.

This means that about 2.7% of the area of the null distribution is greater than 2.22 or less than −2.22:

Pr[t_{180−2} ≥ 2.22] + Pr[t_{180−2} ≤ −2.22] = 0.0275
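The two-sided p-value can be sketched with pt(), taking the t value of 2.22 and the 178 degrees of freedom from this slide:

```r
t_obs <- 2.22
df    <- 180 - 2

# Area in both tails beyond |t_obs|; by symmetry, twice the lower-tail area
p_value <- 2 * pt(-abs(t_obs), df = df)
print(round(p_value, 4))
```

The result should be close to the 0.0275 quoted above; small differences can arise because 2.22 is itself a rounded value.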

(43)

h. P Values

Small p value (< α) → large |t| → reject null

Large p value (≥ α) → small |t| → accept null

(44)

Appendix 1: Sampling Distribution of b₀, cov(b₀, b₁)

$$b_0 \sim N\left(\beta_0,\ \sigma^2_{b_0}\right)$$

$$\mathrm{cov}(b_0, b_1) = -\sigma^2 \left( \frac{\bar{X}}{(N-1)\,s_X^2} \right)$$

(45)

Appendix 2: Derivation of Var(b₁)

First, let’s write b₁ as a linear combination of the Y’s. This makes b₁ very much like a weird sort of sample average.

$$b_1 = \frac{\sum (X_i-\bar{X})(Y_i-\bar{Y})}{\sum (X_i-\bar{X})^2} = \frac{\sum (X_i-\bar{X})\,Y_i}{\sum (X_i-\bar{X})^2} \quad \text{(Why?)}$$

To make things easier to read, use one symbol for the denominator:

$$D = \sum_{i=1}^{N} (X_i-\bar{X})^2$$

Now we can see that b₁ is a linear combination of the Y’s. We will call the weights c_i:

$$b_1 = \sum c_i Y_i, \qquad c_i = \frac{X_i-\bar{X}}{D}$$

(46)

Appendix: Derivation of Var(b₁)

These are not the same weights as in the sample average (1/N vs. c_i). Let’s observe some simple properties of the c_i.

Property 1:

$$\sum c_i = \frac{1}{D} \sum (X_i-\bar{X}) = 0$$

Weights sum up to zero. Observations farther from X̄ receive larger weights (in absolute value). Remember: just because something sums to 0 doesn’t mean that it is all zeroes!

Property 2:

$$\sum c_i^2 = \sum \frac{(X_i-\bar{X})^2}{D^2} = \frac{1}{D^2} \sum (X_i-\bar{X})^2 = \frac{D}{D^2} = \frac{1}{D}$$
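Both properties can be verified numerically. The sketch below assumes the weights are c_i = (X_i − X̄)/D with D = Σ(X_i − X̄)², and uses simulated data with made-up parameter values.

```r
set.seed(46)
x <- runif(25, 0, 10)
y <- 1 + 2 * x + rnorm(25)          # made-up data

D   <- sum((x - mean(x))^2)         # denominator of the slope formula
c_i <- (x - mean(x)) / D            # weights on the Y values

print(sum(c_i))                     # Property 1: zero up to rounding error
print(all.equal(sum(c_i^2), 1 / D)) # Property 2
b1_weighted <- sum(c_i * y)         # b1 as a linear combination of the Y's
print(c(weighted = b1_weighted, lm = unname(coef(lm(y ~ x))[2])))
```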

(47)

Appendix: Derivation of Var(b₁)

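Now that we know a few things about the c_i weights, we can easily derive the variance formula.

$$\mathrm{Var}(b_1) = \mathrm{Var}\left(\sum c_i Y_i\right)$$

Now we use:

i.  the fact that, given X_i, the Y_i are independent

ii.  the formula from the Math-Stat prereq on the variance of a linear combination of independent random variables

$$\mathrm{Var}\left(\sum c_i Y_i\right) = \sum c_i^2\, \mathrm{Var}(Y_i) = \sigma^2 \sum c_i^2$$

Recalling that Σ c_i² = 1/D, we get the, by now, familiar formula!

$$\mathrm{Var}(b_1) = \frac{\sigma^2}{D} = \frac{\sigma^2}{\sum (X_i-\bar{X})^2}$$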

)

(48)

Glossary of Symbols

s - standard error of the regression

c_i - weights used to compute b₁

D - denominator of c_i weight

s_b - standard errors of least squares estimates

α - significance level

t_N-2 - t random variable with N-2 degrees of freedom

t_N-2,α/2 - t critical value for t with N-2 df and significance level α

(49)

Important Equations

–  estimate of error variance

–  standard error of the regression

–  three factors driving sampling variance of slope:

$$\mathrm{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{N}(X_i-\bar{X})^2} = \frac{\sigma^2}{(N-1)\,s_X^2}$$

(50)

–  std errors of coefs

–  Confidence Interval: 100(1 − α)% C.I.: $b \pm t_{N-2,\ \alpha/2}\; s_b$

–  Rejection Region for

(51)

(52)

Glossary of R Commands

•  hist(a): Graphs a histogram of the given data values of the variable a.

•  pt(t, df=10): Returns the left-tail probability (CDF) at t for a t distribution with 10 degrees of freedom; used to compute the p-value of a t statistic.

•  qt(prob, df=10): Returns the t value whose left-tail probability is prob for a t distribution with 10 degrees of freedom.

•  rt(100, df=10): Generates 100 random numbers from the t distribution with 10 degrees of freedom.