
Academic year: 2020


(1)

III. Estimation and Testing in Simple Regression

a.  Estimation in the Simple Linear Regression Model

b.  Sampling Distribution of b1

c.  Understanding Standard Errors

d.  Confidence Intervals

e.  Example: Log-Log Price Elasticity Regressions

f.  Hypothesis-testing

g.  Example: Market Model and Hypothesis-testing

(2)

a. Estimation in the Simple Linear Regression Model

Recall the SLR, which assumes that every observation in the dataset was generated by the model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

We use Least Squares to estimate β0 and β1 . Recall the formulas:

$$\hat{\beta}_1 = b_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$

$$\hat{\beta}_0 = b_0 = \bar{Y} - b_1 \bar{X}$$
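As a minimal sketch of the least squares formulas in R (the language used later in these notes), the block below computes b1 and b0 directly and compares them with lm(). The data, the seed, and the "true" values 2 and 0.5 are made up for illustration; they are not from the course datasets.

```r
# Simulated data: every number here is an illustrative assumption.
set.seed(1)
N <- 100
x <- runif(N, 0, 10)
y <- 2 + 0.5 * x + rnorm(N, sd = 1)

# Least squares formulas, written out term by term
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates from R's built-in fitting function
fit <- lm(y ~ x)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

The two printed pairs should agree up to floating-point error, which is a quick way to check that the formulas were typed correctly.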

(3)

a. Estimation in the Simple Linear Regression Model

NOTE!!: β0 is not b0, β1 is not b1, and εi is not ei.

[Figure: plot of Y against X showing the true line β0 + β1 X, the least squares line b0 + b1 X, and a residual ei.]

(4)

b. Sampling Distribution of b1

It is possible to derive the sampling distribution of b1 (see Appendix 2): b1 is a weighted average of the Y values!

This distribution describes how the estimator b1 would vary over different samples with the X values fixed.

It turns out that b1 is normally distributed:

$$b_1 \sim N\left(\beta_1, \sigma_{b_1}^2\right)$$

–  Mean is β1 (b1 is unbiased)

–  Variance of b1 is $\sigma_{b_1}^2$

The variance term determines how close the estimate will be to the true value. Remember: large σ is bad!

(5)

b. Sampling Distribution of b1

What is the formula for $\sigma_{b_1}^2$? Can we intuit what should be in the formula? (See Appendix 2 for the derivation.)

–  How should σ figure in the formula?

–  How should N figure in the formula?

–  Anything else?

$$\sigma_{b_1}^2 = \mathrm{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} = \frac{\sigma^2}{(N-1)\, s_X^2}$$

Three factors:

–  N

–  σ2

–  sX
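One way to see the variance formula at work is a small simulation: hold the X values fixed, draw many new samples of Y, and re-estimate the slope each time. The sketch below does this in R. The parameter values (beta0 = 1, beta1 = 2, sigma = 10, X = 1, ..., 50) and the helper name one_slope are assumptions chosen only for illustration.

```r
# Illustrative simulation: all parameter values are assumptions.
set.seed(42)
beta0 <- 1; beta1 <- 2; sigma <- 10
x   <- 1:50                        # X values held fixed across samples
Sxx <- sum((x - mean(x))^2)        # equals (N - 1) * var(x)

# Draw a fresh sample of Y's and re-estimate the slope
one_slope <- function() {
  y <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
  sum((x - mean(x)) * (y - mean(y))) / Sxx
}
b1_draws <- replicate(2000, one_slope())

theory_var <- sigma^2 / Sxx        # variance formula for b1
print(c(mean_b1 = mean(b1_draws), var_b1 = var(b1_draws), theory_var = theory_var))
```

The average of the simulated slopes should sit close to beta1 (unbiasedness), and their variance should sit close to the formula's value. Raising sigma, shrinking N, or squeezing the X values together should each make the slopes more variable.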

(6)

b. Sampling Distribution of b1

[Figure: scatterplots of Price (0 to 2000) against Size (35 to 65), with panels labeled sX, N, and σ.]

(7)

c. Understanding Standard Errors

When estimating a quantity, it is vital to develop a notion of the precision of the estimation.

examples:

i.  estimate the slope of the regression line

ii.  estimate the value of a flat-panel TV given its size

iii.  estimate the expected return on a portfolio

iv.  estimate the value of a brand name

v.  estimate the damages from patent infringement

Why is this important?

•  We plan on making business decisions based on our estimates.

•  Some decisions may be very sensitive to the

(8)

c. Understanding Standard Errors

An example from “everyday” life:

–  When framing a house, we can estimate a required piece of wood to ±¼”.

–  When building a fine cabinet, the estimates may have to be accurate to ±1/16” or even ±1/32”.

The standard deviations of the least squares estimators of the slope and intercept give a precise measurement of the accuracy of the estimator.

(9)

c. Understanding Standard Errors

If we insert our estimate of σ, then we have estimated standard deviations, or standard errors, for the least squares estimators:

$$s_{b_1} = \sqrt{\frac{s^2}{(N-1)\, s_X^2}}$$

(Note: s, not σ.)

Bottom Line: Now we can summarize the amount of information there is in the sample about the true regression line parameters.
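The standard error of the slope can be computed by hand and checked against R. The sketch below assumes the usual estimate of the error variance, s² = Σe²/(N − 2), which this section does not write out. The data are simulated and only loosely shaped like a price-versus-size dataset; none of the numbers come from the course data.

```r
# Simulated data: illustrative assumptions only.
set.seed(7)
N <- 70
x <- runif(N, 35, 65)
y <- -1500 + 57 * x + rnorm(N, sd = 150)
fit <- lm(y ~ x)

e   <- resid(fit)
s2  <- sum(e^2) / (N - 2)               # assumed estimate of the error variance
sb1 <- sqrt(s2 / ((N - 1) * var(x)))    # standard error of the slope

print(sb1)
print(coef(summary(fit))["x", "Std. Error"])  # R's own value, for comparison
```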

(10)

c. Understanding Standard Errors

Where can we find the standard errors on the R printout?
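In a standard R regression printout, the standard errors appear in the "Std. Error" column of the coefficient table. The sketch below uses the cars dataset that ships with R as a stand-in, since the course data are not reproduced here.

```r
# 'cars' ships with R; it is a stand-in, not the course data.
fit <- lm(dist ~ speed, data = cars)

# Full printout: look for the "Std. Error" column of the coefficient table
print(summary(fit))

# The same column, extracted programmatically
se <- coef(summary(fit))[, "Std. Error"]
print(se)
```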

(11)

d. Confidence Intervals

We want a margin of error in the estimation of the slope. We can use the standard errors to construct a confidence interval, which provides the margin of error.

All confidence intervals are of the form:

$$b_1 \pm t^* s_{b_1}$$

t* is a positive number obtained from the t distribution.

So we have the estimate +/- a multiple of the standard error.
(12)

d. Confidence Intervals

To define a confidence interval, you must first set the confidence level.

We can never be completely confident that an interval will cover the true value (the 100% confidence interval is everything!). So we set a confidence level. Typically, 95 per cent is used (for very large datasets a 99 per cent level should be used).

We then determine the multiple, t*, so that there is a 95 per

(13)

d. Confidence Intervals: finding t*

We find t* by reference to the t distribution. The t distribution is similar to the standard normal (Z) and is indexed by the number of degrees of freedom, N−2. For a 100×(1−α)% confidence level, the multiple is:

$$t^*_{N-2,\,\alpha/2}$$

(14)

d. t Distribution and Confidence Intervals

Thus, the 100 × (1−α)% C.I. is given by:

$$b_1 \pm t^*_{N-2,\,\alpha/2}\; s_{b_1}$$

Confidence intervals provide information about the range of values of the slope consistent with our data. This is much more useful than simply using the slope estimate.

An estimate without some idea of its precision is useless.

The only question is how to find (“look-up”) t*.
(15)

d. t Distribution and Confidence Intervals

Let’s compute a confidence interval for the flat-panel TV data.

First we choose α. Pick α = 0.05, i.e., a 95 per cent level of confidence.

$$95\%\ \text{CI}: \quad 57.13 \pm t^*_{68,\,0.025}\,(6.555)$$

Finding the t* cut-off value

We need to find the value, t*, such that

$$\Pr\left[-t^*_{N-2,\,\alpha/2} \le X \le t^*_{N-2,\,\alpha/2}\right] = 1 - \alpha \quad \text{where } X \sim t_{N-2}$$

or

$$\Pr\left[X \le -t^*_{N-2,\,\alpha/2}\right] = \alpha/2$$

This should remind us of the CDF function (Cumulative Distribution Function) …

(16)

Quick Review: CDF

The CDF is a table of probabilities that tells us: for any little x, what is the probability that the random variable X is less than x?

Plotting this “table”…

[Figure: plot of the CDF, with axes labeled X and Probability.]

(17)

Quick Review: CDF

Let’s blow up the boxed area:

[Figure: enlarged region of the CDF marking the probability 0.025 and the corresponding value t68,.025.]

Note that we are using the CDF function or “table” backwards. We are reading from Probability to Value.

In fact, we are using the “Inverse” CDF function

(18)

Quick Review: CDF

Let’s do it in R. We use the qt() function, which is the “quantile” or inverse CDF function. We need to tell R which t distribution to use (the one with 68 df) and feed in α/2.

So our confidence interval is

57.13 ± 1.995(6.555) = [44.05, 70.21]
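The R step described above can be sketched as follows. The slope estimate, standard error, and degrees of freedom are the ones quoted in this section; the variable names are only illustrative. Note that feeding α/2 into qt() returns the negative left-tail cut-off, so the positive multiple t* is obtained from 1 − α/2 (or by flipping the sign).

```r
alpha <- 0.05
df    <- 68          # N - 2 for the flat-panel TV regression
b1    <- 57.13       # slope estimate quoted in the text
sb1   <- 6.555       # its standard error quoted in the text

t_lower <- qt(alpha / 2, df = df)       # left-tail cut-off, a negative number
tstar   <- qt(1 - alpha / 2, df = df)   # the positive multiple t*

ci <- b1 + c(-1, 1) * tstar * sb1
print(c(t_lower = t_lower, tstar = tstar))
print(ci)
```

The printed interval should match the [44.05, 70.21] above to two decimal places.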

(19)

e. Example: Log-Log Price Elasticity Regressions

Let’s look at some demand data. The dataset detergent has demand data on sales and prices of Tide 128 oz laundry detergent at some 86 stores with 2-5 years of weekly data.

Hard to see much of anything here.

Note: used pch=“.” to

(20)

e. Example: Log-Log Price Elasticity Regressions

Quantity is an odd variable. It can’t be less than zero and is often very small. There appear to be store-weeks where quantity is huge. Are these outliers?

Recall that the logarithm function has the effect of compressing large values and expanding the axis for small values.

Basic Properties:

$$\log(1) = 0$$

$$\log(z \times w) = \log(z) + \log(w)$$

(21)

e. Example: Log-Log Price Elasticity Regressions

Graph histogram of quantity and log(quantity).

[Figure: plot of log(x) against x for x from 0 to 1000; histogram of q_tide128 (Frequency against q_tide128); histogram of log(q_tide128) (Frequency against log(q_tide128)).]
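A minimal sketch of the two histograms in R follows. The detergent data are not reproduced here, so the quantity variable is simulated from a right-skewed (lognormal) distribution; the variable q and its parameters are assumptions for illustration only.

```r
# Simulated stand-in for q_tide128: illustrative assumption only.
set.seed(3)
q <- rlnorm(5000, meanlog = 4, sdlog = 1)

par(mfrow = c(1, 2))               # two plots side by side
hist(q, main = "Histogram of q")
hist(log(q), main = "Histogram of log(q)")

# A mean far above the median signals a long right tail;
# after taking logs the two are close, i.e. roughly symmetric.
print(c(mean = mean(q), median = median(q)))
print(c(mean = mean(log(q)), median = median(log(q))))
```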

(22)

e. Example: Log-Log Price Elasticity Regressions

(23)

e. Example: Log-Log Price Elasticity Regressions

Run regressions with log and non-logged variables.
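The two regressions can be sketched in R as below. Because the detergent dataset is not available here, the data are simulated with a constant price elasticity of −4.4, a value chosen to echo the estimate discussed later; the variable names p and q, the price range, and the noise level are all assumptions.

```r
# Simulated demand data: illustrative assumptions, not the detergent dataset.
set.seed(11)
n <- 5000
p <- runif(n, 7, 10)
q <- exp(13 - 4.4 * log(p) + rnorm(n, sd = 0.5))

fit_raw <- lm(q ~ p)             # levels: slope is the change in q per unit of p
fit_log <- lm(log(q) ~ log(p))   # logs: slope is the price elasticity

print(coef(fit_raw))
print(coef(fit_log))
```

In the simulated data the log-log slope should land near the −4.4 that was built in, while the raw slope depends on the units of price and quantity.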

(24)

e. Example: Elasticities

In the regression with raw variables, we interpret the regression coefficient as the expected change in q for a given change in p.

$$E[q \mid p] = 662 - 69.4\,p$$

$$\frac{\Delta q}{\Delta p} = -69.4$$

How do we interpret this? Not very meaningful without a

(25)

e. Example: Elasticities

To interpret the coefficient, some would convert it into an elasticity.

$$\frac{\%\Delta q}{\%\Delta p} = \frac{\Delta q / q}{\Delta p / p} = \frac{\Delta q}{\Delta p} \times \frac{p}{q} = -69.4 \times \frac{8.36}{81} = -7.16$$

Here we used the average levels of price and quantity. A one percent reduction in price yields a 7.16 percent
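The elasticity arithmetic above is easy to verify in R. The slope and the average price and quantity are the values quoted in this section; the negative sign on the slope is assumed, as for a downward-sloping demand curve.

```r
slope <- -69.4   # change in q per unit change in p (negative sign assumed)
p_bar <- 8.36    # average price quoted in the text
q_bar <- 81      # average quantity quoted in the text

elasticity <- slope * p_bar / q_bar
print(round(elasticity, 2))
```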

(26)

e. Example: Elasticities

In the log-log regression, the coefficient on log-price can be interpreted directly as an elasticity. Why?

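$$\frac{\Delta \log q}{\Delta \log p} = -4.4 = \frac{\log(q_1) - \log(q_0)}{\log(p_1) - \log(p_0)} = \frac{\log\left(\dfrac{q_1}{q_0}\right)}{\log\left(\dfrac{p_1}{p_0}\right)} = \frac{\log\left(1 + \dfrac{\Delta q}{q_0}\right)}{\log\left(1 + \dfrac{\Delta p}{p_0}\right)}$$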
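A sketch of the remaining step, assuming the standard approximation log(1 + x) ≈ x for small x (that is, for small percentage changes): the ratio of log changes is approximately the ratio of percentage changes, which is the elasticity.

$$\frac{\log\left(1 + \dfrac{\Delta q}{q_0}\right)}{\log\left(1 + \dfrac{\Delta p}{p_0}\right)} \approx \frac{\Delta q / q_0}{\Delta p / p_0} = \frac{\%\Delta q}{\%\Delta p}$$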

(27)

e. Example: Log-log Demand

Let’s compute a confidence interval for the price elasticity. Since the sample size is very large (> 14,000), we will use a 99 percent confidence level.

Good to be a marketer with ample and informative data.
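A 99 percent interval uses α = 0.01, so the multiple comes from qt(1 − 0.01/2, df = N − 2). The sketch below builds the interval by hand and compares it with R's confint(). The data are simulated with an elasticity of −4.4 and a sample size of 14,000; these values and the names logp and logq are assumptions for illustration, not the detergent data.

```r
# Simulated log-log demand data: illustrative assumptions only.
set.seed(11)
n    <- 14000
logp <- log(runif(n, 7, 10))
logq <- 13 - 4.4 * logp + rnorm(n, sd = 0.5)
fit  <- lm(logq ~ logp)

b1    <- coef(fit)[["logp"]]
sb1   <- coef(summary(fit))["logp", "Std. Error"]
tstar <- qt(1 - 0.01 / 2, df = n - 2)   # 99 percent level, so alpha = 0.01

ci_manual <- b1 + c(-1, 1) * tstar * sb1
print(ci_manual)
print(confint(fit, "logp", level = 0.99))
```

With this many observations the t multiple is close to the standard normal value of about 2.58, and the interval is narrow.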

(28)

f. Hypothesis Testing

Suppose that we are interested in a specific value of the slope parameter, β1.

This can be rephrased as a hypothesis

H0: Null (from “no effect”) vs.

HA : Alternative

For example, is there any evidence in the data to support the existence of a relationship between X and Y?

So if we want to test whether X affects Y, we would test whether β1 = 0.

(29)

f. Hypothesis Testing

How can we assess whether or not the data support or refute the null hypothesis?

We can look at our estimate of the true slope and compare it to the hypothesized value:

$$b_1 - \beta_1^* \quad \text{(discrepancy)}$$

What is wrong with just using the discrepancy above? How close is close?

(30)

f. Hypothesis Testing

t statistic:

$$t = \frac{b_1 - \beta_1^*}{s_{b_1}}$$

The basic intuition is that if the null is true then the t statistic should be small (in absolute value).

Get worried when t is large!

(31)

f. Hypothesis Testing

Formal Approach to Hypothesis-Testing: Two Steps:

i.  Pick the significance level (α) = Prob(reject when null true) by deciding what level of error of this kind is acceptable (called type I error).

ii.  Use α to choose a rejection region – the set of t statistic values which will lead to a rejection. This is done by picking a

(32)

f. Hypothesis Testing

This is exactly the same problem as picking the cut-off value in setting up the confidence interval!

(33)

f. Hypothesis Testing

In practice, we take a value of α to be around .05 unless:

–  The sample size is very small or very large

–  Cost of making a type I error is large

(type I error = reject null when null is true)

(34)

g. Example: Market Model and Hypothesis Testing

Even though we know it to be false, let’s hypothesize that there is no relationship between the Windsor Mutual Fund and the Market.

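H0: β1 = 0

HA: β1 ≠ 0

t stat = 32.1 = (.93572 − 0)/.02915

Here .93572 is the slope estimate, 0 is the hypothesized value, and .02915 is the calculated std error of the slope coef:

$$s_{b_1} = \frac{s}{\sqrt{\sum (X_i - \bar{X})^2}}, \qquad s = 0.01872$$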
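The t statistic above can be reproduced in R from the quoted slope estimate and standard error. The variable names are illustrative.

```r
b1         <- 0.93572   # slope estimate quoted in the text
sb1        <- 0.02915   # standard error of the slope quoted in the text
beta1_null <- 0         # hypothesized value under H0

t_stat <- (b1 - beta1_null) / sb1
print(round(t_stat, 1))
```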
(35)

g. Example: Market Model and Hypothesis Testing

The t value is huge (32) relative to the null t distribution with 180-2 =178 degrees of freedom!

To illustrate just how big this is, let’s simulate some numbers from the t distribution

In 1000 draws from the t distribution, we didn’t get a single value anywhere near 32.

We conclude that we reject the null hypothesis: H0: β1 = 0

[Figure: histogram of rt(1000, df = 178); Frequency on the vertical axis, values ranging from about −3 to 3.]
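The simulation described above can be sketched with rt() and hist(). The seed is arbitrary and only makes the draws reproducible.

```r
set.seed(178)                      # arbitrary seed, for reproducibility only
draws <- rt(1000, df = 178)        # 1000 draws from the null t distribution

hist(draws)
print(range(draws))
print(sum(abs(draws) >= 32.1))     # draws at least as extreme as the observed t
```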

(36)

g. Example: Market Model and Hypothesis Testing

Now let’s test a more relevant value of β1

In finance, stocks and portfolios are characterized by their betas, which are estimated from regressions very similar to this one. β1 is sometimes used as an estimate of risk or volatility. The value of 1 has central significance.

β1 > 1: volatile assets (amplify market up/down moves)

β1 < 1: non-volatile assets (shrink market movements)

This suggests that we consider the hypothesis that β1 = 1.

H0: β1 = 1

(37)

g. Example: Market Model and Hypothesis Testing

Find the critical value, t*, for the .05 significance level.

We want the value of the t statistic such that:

Pr[ | t | > t* ] = .05

We can use the qt command (the inverse CDF of the t distribution) to compute this for us. With an area of 0.025 in each tail, −t* is the value such that Pr[t < −t*] = 0.025.
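A minimal sketch of the critical-value lookup in R, using the 180 − 2 = 178 degrees of freedom from this example. The standard normal cut-off from qnorm() is printed only for comparison; with this many degrees of freedom the two are close.

```r
alpha <- 0.05
df    <- 180 - 2

t_left <- qt(alpha / 2, df = df)       # Pr[t < t_left] = 0.025, a negative number
tstar  <- qt(1 - alpha / 2, df = df)   # positive cut-off: Pr[|t| > tstar] = 0.05

print(c(t_left = t_left, tstar = tstar))
print(qnorm(1 - alpha / 2))            # normal cut-off, slightly smaller
```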

(38)

g. Example: Market Model and Hypothesis Testing

Now compute the value of the t stat from the estimate b1, the hypothesized value, and the std error:

$$t = \frac{b_1 - 1}{s_{b_1}}$$

In absolute value, this is larger than the 95% critical value for t(178), so we reject the null hypothesis: H0: β1 = 1

(39)

g. Example: Market Model and Hypothesis Testing

Let’s look at the intercept for the Windsor fund regression. Remember this is Jensen’s alpha.

H0: β0 = 0

HA: β0 ≠ 0

We can see that the intercept is significantly different from zero. However, the 95% CI is wide:

(40)

h. P Values

One of the problems with formal hypothesis-testing is that the strength of the information in the data for or against the null is not conveyed by accept/reject!

–  If the t value is a tiny bit less than the t cutoff, we accept the null

–  If the t value is a tiny bit bigger, we reject the null

The information from the data is pretty much the same but we act quite differently.

Therefore, we need some measure of the strength of rejection. The

(41)

h. P Values

The p-value is the probability, under the null hypothesis, of observing a value of the t statistic farther out in the tail than the observed t value.

[Figure: null t distribution with the observed t-stat value marked.]

For the standard t-tests printed out by R, the p-value is

(42)

h. P Values

Let’s compute p for the mutual funds example of testing β1 = 1. pt() is the R function for the CDF of a t distribution.

This means that about 2.75% of the area of the null distribution is greater than 2.22 or less than −2.22:

Pr[t180-2 ≥ 2.22] + Pr[t180-2 ≤ -2.22] = 0.0275
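The two-sided p-value can be sketched with pt() as below, taking the observed t statistic to be −2.22 as quoted above. The result should land close to the 0.0275 reported; a small difference would be consistent with rounding of the t statistic.

```r
t_obs <- -2.22        # observed t statistic as quoted in the text
df    <- 180 - 2

# Two-sided p-value: area in both tails beyond |t_obs|
p_value <- 2 * pt(-abs(t_obs), df = df)
print(p_value)
```

Using −abs(t_obs) keeps the calculation correct whether the observed t statistic is positive or negative.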

(43)

h. P Values

Small p value (< α) ⇔ large | t | ⇒ reject the null

Large p value (≥ α) ⇔ small | t | ⇒ accept the null

(44)

Appendix 1: Sampling Distribution of b0, cov(b1,b0)

$$b_0 \sim N\left(\beta_0, \sigma_{b_0}^2\right)$$

$$\mathrm{cov}(b_0, b_1) = -\sigma^2 \left( \frac{\bar{X}}{(N-1)\, s_X^2} \right)$$

(45)

Appendix 2: Derivation of Var(b1)

First, let’s write b1 as a linear combination of the Y’s. This makes b1 very much like a weird sort of sample average.

$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum (X_i - \bar{X})\, Y_i}{\sum (X_i - \bar{X})^2} \quad \text{(Why?)}$$

To make things easier to read, use one symbol for the denominator: $D = \sum (X_i - \bar{X})^2$.

Now we can see that b1 is a linear combination of the Y’s. We will call the weights ci:

$$b_1 = \sum c_i Y_i, \qquad c_i = \frac{X_i - \bar{X}}{D}$$
(46)

Appendix 2: Derivation of Var(b1)

These are not the same weights as in the sample average (1/N vs. ci). Let’s observe some simple properties of the ci.

Property 1:

$$\sum c_i = \frac{\sum (X_i - \bar{X})}{D} = 0$$

Weights sum up to zero. Observations farther from $\bar{X}$ receive larger weights (in absolute value). Remember: just because something sums to 0 doesn’t mean that it is all zeroes!

Property 2:

$$\sum c_i^2 = \sum \frac{(X_i - \bar{X})^2}{D^2} = \frac{1}{D^2} \sum (X_i - \bar{X})^2 = \frac{D}{D^2} = \frac{1}{D}$$
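Both properties, and the claim that b1 is a weighted sum of the Y’s, can be checked numerically. The sketch below uses simulated data; the names c_i and D follow the notation of this appendix, and the data-generating values are assumptions for illustration.

```r
# Simulated data: illustrative assumptions only.
set.seed(5)
x <- runif(30, 0, 10)
y <- 1 + 2 * x + rnorm(30)

D   <- sum((x - mean(x))^2)   # the denominator
c_i <- (x - mean(x)) / D      # the weights

print(sum(c_i))                                   # Property 1: numerically zero
print(c(sum(c_i^2), 1 / D))                       # Property 2: the two agree
print(c(sum(c_i * y), coef(lm(y ~ x))[["x"]]))    # b1 as a weighted sum of Y's
```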

(47)

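Appendix 2: Derivation of Var(b1)

Now that we know a few things about the ci weights, we can easily derive the variance formula.

$$\mathrm{Var}(b_1) = \mathrm{Var}\left( \sum c_i Y_i \right)$$

Now we use:

i.  the fact that, given Xi, the Yi are independent

ii.  the formula from the Math-Stat prereq on the variance of a linear combination of independent random variables

$$\mathrm{Var}\left( \sum c_i Y_i \right) = \sum c_i^2\, \mathrm{Var}(Y_i) = \sigma^2 \sum c_i^2$$

Recalling that $\sum c_i^2 = 1/D$, we get the, by now, familiar formula!

$$\mathrm{Var}(b_1) = \frac{\sigma^2}{D} = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$$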

(48)

Glossary of Symbols

s - standard error of the regression

ci - weights used to compute b1

D - denominator of ci weight

sb0, sb1 - standard errors of least squares estimates

α - significance level

tN-2 - t random variable with N-2 degrees of freedom

tN-2, α/2 - t critical value for t with N-2 df and significance level α

(49)

Important Equations

estimate of error variance: s2

standard error of the regression: s

three factors driving sampling variance of slope:

$$\mathrm{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (X_i - \bar{X})^2} = \frac{\sigma^2}{(N-1)\, s_X^2}$$

(50)

Important Equations

std errors of coefs:

$$s_{b_1} = \sqrt{\frac{s^2}{(N-1)\, s_X^2}}$$

Confidence Interval:

$$100 \times (1-\alpha)\%\ \text{C.I.}: \quad b \pm t_{N-2,\,\alpha/2}\; s_b$$

Rejection Region for

(51)

Important Equations

(52)

Glossary of R Commands

•  hist(a): Graphs a histogram of the given data values of the variable a.

•  pt(t stat, df=10): Returns the CDF (left-tail probability) at a t statistic for a t distribution with 10 degrees of freedom; p-values are computed from it.

•  qt(prob, df=10): Returns the t value (quantile) for a given left-tail probability of a t distribution with 10 degrees of freedom.

•  rt(100, df=10): Generates 100 random numbers from the t distribution with 10 degrees of freedom.
