Introductory Econometrics

Contents

1 Review of simple regression
1.1 The Sample Regression Function
1.2 Interpretation of regression as prediction
1.3 Regression in Eviews
1.4 Goodness of fit
1.5 Derivations
1.5.1 Summation notation
1.5.2 Derivation of OLS
1.5.3 Properties of predictions and residuals

2 Statistical Inference and the Population Regression Function
2.1 Simple random sample
2.2 Population distributions and parameters
2.3 Population vs Sample
2.4 Conditional Expectation
2.5 The Population Regression Function
2.6 Statistical Properties of OLS
2.6.1 Properties of Expectations
2.6.2 Unbiasedness
2.6.3 Variance
2.6.4 Asymptotic normality
2.7 Summary

3 Hypothesis Testing and Confidence Intervals
3.1 Hypothesis testing
3.1.1 The null hypothesis
3.1.2 The alternative hypothesis
3.1.3 The null distribution
3.1.4 The alternative distribution
3.1.5 Decision rules and the significance level
3.1.6 The t test: theory
3.1.7 The t test: two-sided example
3.1.8 The t test: one-sided example
3.1.9 p-values
3.1.10 Testing other null hypotheses
3.2 Confidence intervals
3.3 Prediction intervals

4 Multiple Regression
4.1 Population Regression Function
4.2 Sample Regression Function and OLS
4.3 Example: house price modelling
4.4 Statistical Inference
4.5 Applications to house price regression
4.6 Joint hypothesis tests
4.7 Multicollinearity
4.7.1 Perfect multicollinearity
4.7.2 Imperfect multicollinearity

5 Dummy Variables
5.1 Estimating two means
5.2 Estimating several means
5.3 Dummy variables in general regressions
5.3.1 Dummies for intercepts
5.3.2 Dummies for slopes

6 Some non-linear functional forms
6.1 Quadratic regression
6.1.1 Example: wages and work experience
6.2 Regression with logs: explanatory variable
6.2.1 Example: wages and work experience
6.3 Regression with logs: dependent variable
6.3.1 Example: modelling the log of wages
6.3.2 Choosing between levels and logs for the dependent variable
6.4 Practical summary of functional forms

7 Comparing regressions
7.1 Adjusted R2
7.2 Information criteria
7.3 Adjusted R2 as an IC

8 Functional form

9 Regression and Causality
9.1 Notation
9.2 Regression for prediction
9.3 Omitted variables
9.4 Simultaneity
9.5 Sample selection

10 Regression with Time Series
10.1 Dynamic regressions
10.1.1 Finite Distributed Lag model
10.1.2 Autoregressive Distributed Lag model
10.1.3 Forecasting
10.1.4 Application
10.2 OLS estimation
10.2.2 A general theory for time series regression
10.3 Checking weak dependence
10.4 Model specification
10.5 Interpretation
10.5.1 Interpretation of FDL models
10.5.2 Interpretation of ARDL models

11 Regression in matrix notation
11.1 Definitions
11.2 Addition and Subtraction
11.3 Multiplication
11.4 The PRF
11.5 Matrix Inverse
11.6 OLS in matrix notation
11.6.1 Proof
11.7 Unbiasedness of OLS
11.8 Time series regressions

1 Review of simple regression

1.1 The Sample Regression Function

Regression is the primary statistical tool used in econometrics to understand the relationship between variables. To illustrate, consider the dataset introduced in Example 2.3 of Wooldridge, which relates the salaries paid to corporate chief executive officers to the return on equity achieved by their firms. Data is available for 209 firms. The idea is to examine whether the salaries paid to CEOs are related to the earnings of their firms, and specifically whether firms with higher incomes reward their CEOs with higher salaries. A scatter plot of the possible relationship is shown in Figure 1, which suggests that higher returns on equity tend to correspond to higher CEO salaries, but also shows some apparently very high salaries for a small number of CEOs (these are known as outliers, to be discussed later).

A regression line can be fit to this data using the method of Ordinary Least Squares (OLS), as shown in Figure 2. The OLS method works as follows. The dependent variable for the regression is denoted $y_i$, where the subscript $i$ refers to the number of the observation, for $i = 1, \ldots, n$. In the example we have $n = 209$ and $y_i$ corresponds to the CEO salary for each of the 209 firms. The explanatory variable, or regressor, is denoted $x_i$ for $i = 1, \ldots, n$ and corresponds to the Return on Equity for each of the 209 firms. The data are shown in Table 1. The first observation in the dataset is $y_1 = 1095$ and $x_1 = 14.10$, meaning that the CEO of the first firm earns $1,095,000 and the firm's Return on Equity is 14.10%. The second observation is $y_2 = 1001$ and $x_2 = 10.90$, and so on, down to the last observation $y_{209} = 626$ and $x_{209} = 14.40$.

The regression line is a linear function of $x_i$ that is used to calculate a prediction of $y_i$, denoted $\hat{y}_i$. This regression line is expressed
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \quad i = 1, \ldots, n. \qquad (1)$$
This is called the Sample Regression Function (SRF). The "hat" on top of any quantity indicates that it is a prediction or an estimate calculated from the data. The method of OLS is used to calculate $\hat{\beta}_0$ and $\hat{\beta}_1$, respectively the intercept and the slope of the regression line. The prediction errors, or regression residuals, are denoted
$$\hat{u}_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n, \qquad (2)$$
and OLS chooses the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the overall residuals $(\hat{u}_1, \ldots, \hat{u}_n)$ are minimised, in the sense that the Sum of Squared Residuals (SSR)
$$SSR = \sum_{i=1}^n \hat{u}_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$
is as small as possible. This is the sense in which the OLS regression line is known as the line of best fit.

The formulae for $\hat{\beta}_0$ and $\hat{\beta}_1$ are given by
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad (3)$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad (4)$$
where $\bar{y}$ and $\bar{x}$ are the sample means
$$\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i.$$

The derivations are given below.

For the CEO salary data, the coefficients of the regression line can be calculated to be $\hat{\beta}_0 = 963.191$ and $\hat{\beta}_1 = 18.501$, so the regression line can be written
$$\hat{y}_i = 963.191 + 18.501\, x_i,$$
or equivalently using the names of the variables:
$$\widehat{salary}_i = 963.191 + 18.501\, RoE_i.$$

The interpretation of this regression line is that it gives a prediction of CEO salary in terms of the return on equity of the firm. For example, for the first firm the predicted salary on the basis of return on equity is
$$\hat{y}_1 = 963.191 + 18.501\, x_1 = 963.191 + 18.501 \times 14.10 = 1224.1,$$
or $1,224,100, and the residual is
$$\hat{u}_1 = y_1 - \hat{y}_1 = 1095 - 1224.1 = -129.1,$$
or $-129,100$. That is, the CEO of the first company in the dataset is earning $129,100 less than predicted by the firm's return on equity. Table 2 gives some of the values of $\hat{y}_i$ and $\hat{u}_i$.


Figure 1: Scatter plot of CEO salaries against Return on Equity

Figure 2: Scatter plot of CEO salaries against Return on Equity, with fitted OLS regression line

Table 1: Data on CEO salaries and Return on Equity

Observation (i)   Salary (y_i)   Return on Equity (x_i)
1                 1095           14.10
2                 1001           10.90
3                 1122           23.50
...               ...            ...
208               555            13.70
209               626            14.40

Table 2: CEO salaries and Return on Equity, with regression predictions and residuals

Observation (i)   Salary (y_i)   Return on Equity (x_i)   Predicted Salary (ŷ_i)   Residual (û_i)
1                 1095           14.10                    1224.1                   -129.1
2                 1001           10.90                    1164.9                   -163.9
3                 1122           23.50                    1398.0                   -276.0
...               ...            ...                      ...                      ...
208               555            13.70                    1216.7                   -661.7
209               626            14.40                    1229.6                   -603.6
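The OLS calculations above are easy to reproduce directly from formulas (3) and (4). The following Python sketch is an added illustration (the original notes use Eviews); the three rows of data are taken from Table 1, and running the same code on the full 209-firm dataset would reproduce the coefficients 963.191 and 18.501 and the predictions and residuals in Table 2.

```python
import numpy as np

# First three observations from Table 1 (the full dataset has n = 209 firms)
salary = np.array([1095.0, 1001.0, 1122.0])   # y_i, salary in $1000s
roe = np.array([14.10, 10.90, 23.50])         # x_i, return on equity in percent

y_bar, x_bar = salary.mean(), roe.mean()

# Formula (3): OLS slope, and formula (4): OLS intercept
beta1_hat = ((roe - x_bar) * (salary - y_bar)).sum() / ((roe - x_bar) ** 2).sum()
beta0_hat = y_bar - beta1_hat * x_bar

# Sample Regression Function (1) and residuals (2)
salary_hat = beta0_hat + beta1_hat * roe
u_hat = salary - salary_hat

print(beta0_hat, beta1_hat)
print(salary_hat, u_hat)
```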

1.2 Interpretation of regression as prediction

The interpretation of the regression coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ relies on the interpretation of regression as giving predictions for $y_i$ using $x_i$. For a general regression equation
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i,$$
the interpretation of $\hat{\beta}_0$ is that it is the predicted value of $y_i$ when $x_i = 0$. It depends on the application whether $x_i = 0$ is practically relevant. In the CEO salary example, a firm with zero return on equity (i.e. net income of zero) is predicted to have a CEO with a salary of $963,191. Such a prediction has some value in this case because it is possible for a firm to have zero net income in a particular year, and the data contains observations where the return on equity is quite close to zero. As a different example, if we had a regression of individual wages on age of the form $\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1\, age_i$, it would make no practical sense to predict the wage of an individual of age zero! In this case the intercept coefficient $\hat{\beta}_0$ does not have a natural interpretation.

The slope coefficient $\hat{\beta}_1$ measures the change in the predicted value $\hat{y}_i$ that would follow from a one unit increase in the regressor $x_i$. The predicted value of $y_i$ given the regressor takes the value $x_i$ is $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, while the predicted value of $y_i$ given the regressor takes the value $x_i + 1$ is $\hat{\beta}_0 + \hat{\beta}_1 (x_i + 1)$. The difference between these two predictions is $\hat{\beta}_1$. In the CEO salary example, an increase of one percentage point in a firm's return on equity corresponds to a predicted increase of 18.501 ($18,501) in CEO salary. This quantifies how increases in firm income change our prediction for CEO salary. Econometrics is especially concerned with the estimation and interpretation of such slope coefficients.

1.3 Regression in Eviews

Eviews is statistical software designed specifically for econometric analysis. Data can be read in from Excel files and then easily analysed using OLS regression. The steps to carry out the CEO salary analysis in the previous section are presented here.

Figure 3: Excel spreadsheet for CEO salary data

Figure 3 shows part of an Excel spreadsheet containing the CEO salary data. The variable names are in the first row, followed by the observations for each variable. To open this file in Eviews, go to "File - Open - Foreign Data as Workfile..." as shown in Figure 4, and select the Excel file in the subsequent window. On opening the file, the dialog boxes in Figures 5, 6 and 7 can often be left unchanged. The first specifies the range of the data within the workfile (in this case the first two columns of Sheet 1), the second specifies that the variable names are contained in the first row of the spreadsheet, and the third specifies that a new workfile be created in Eviews to contain the data. For simple data sets such as this, the defaults in these dialog boxes will be correct. More involved data sets will be considered later. On clicking "Finish" in the final dialog box, the new workfile is displayed in Eviews, see Figure 8.

The Range of the workfile specifies the total number of observations available for analysis, in this case 209. The Sample of the workfile specifies which observations are currently being used for analysis, and this defaults to the full range of the workfile unless otherwise specified. There are four objects displayed in the workfile: c, resid, roe and salary. The first two of these will be present in any workfile. The "c" and "resid" objects contain the coefficient values and residuals from the most recently estimated regression. The objects roe and salary contain the data on those two variables. For example, double clicking on salary gives the object view shown in Figure 9, where the observations can be seen. Many other views are possible, but a common and important first step is to obtain some graphical and statistical summaries by selecting "View - Descriptive Statistics & Tests - Histogram and Stats" as shown in Figure 10. This results in Figure 11, where the histogram gives an idea of the distribution of the variable and the descriptive statistics provide measures of central tendency (mean, median), dispersion (maximum, minimum, standard deviation) and other features. The mean CEO salary is $1,281,120 while the median is $1,039,000. The substantial difference between these two statistics arises because there are at least three very large salaries that are very influential on the mean, but not the median. These observations were also evident in the scatter plot in Figure 1. The same descriptive statistics can be obtained for the Return on Equity variable.

Figure 4: To open an Excel file in Eviews...


Figure 7: ... and specify that a new undated workfile be created.


Figure 9: Contents of the salary object


The scatter plots in Figure 1 or 2 can be obtained by selecting "Quick - Graph..." as shown in Figure 12, entering "roe salary" into the resulting Series List box as shown in Figure 13, and then specifying a Scatter with regression line (if desired) as shown in Figure 14. The result is Figure 2. The regression equation itself can be computed by selecting "Quick - Estimate Equation..." as shown in Figure 15, and then specifying the equation as shown in Figure 16. The dependent variable (salary) for the regression equation goes first, the "c" refers to the intercept of the equation ($\hat{\beta}_0$), and then comes the explanatory variable (roe). The results of the regression calculation are shown in Figure 17. In particular, the values of the intercept $\hat{\beta}_0 = 963.1913$ and the slope coefficient on RoE $\hat{\beta}_1 = 18.50119$ can be read from the "Coefficient" column of the tabulated results. The equation can be named as shown in Figure 18, which means that it will appear as an object in the workfile and can be saved for future reference.

To obtain the predicted values $\widehat{salary}_i$ for the regression, click on the "Forecast" button and enter a new variable name in the "Forecast name" box, say "salary_hat", as shown in Figure 19. A new object called salary_hat is created in the workfile, and double clicking on it reveals the predicted values, the first three of which correspond to the values given in Table 2 for $\hat{y}_i$.

To obtain the residuals $\hat{u}_i$ for the regression, select "Proc - Make Residual Series" in the equation window as shown in Figure 20 and name the new residuals object as shown in Figure 21. The resulting residuals for the CEO salary regression are shown in Figure 22, the first three of which correspond to the values given in Table 2 for $\hat{u}_i$.

Figure 10: Obtaining descriptive statistics

Figure 11: Histogram and descriptive statistics for SALARY (Sample 1 to 209, 209 observations): Mean 1281.120, Median 1039.000, Maximum 14822.00, Minimum 223.0000, Std. Dev. 1372.345, Skewness 6.854923, Kurtosis 60.54128, Jarque-Bera 30470.10, Probability 0.000000.


Figure 12: Selecting Quick - Graph...


Figure 14: Selecting a scatter plot with regression line.


Figure 16: Specifying a regression of CEO salary on an intercept and Return on Equity


Figure 18: Naming an equation to keep it in the workfile.

Figure 20: Make residuals object from a regression


Figure 22: The residuals from the CEO salary regression.

1.4 Goodness of fit

The equation (2) that defines the regression residuals can be written
$$y_i = \hat{y}_i + \hat{u}_i, \qquad (5)$$
which states that the regression decomposes each observation into a prediction ($\hat{y}_i$) that is a function of $x_i$, and the residual $\hat{u}_i$. Let $\widehat{var}(y_i)$ denote the sample variance of $y_1, \ldots, y_n$:
$$\widehat{var}(y_i) = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2,$$
and similarly $\widehat{var}(\hat{y}_i)$ and $\widehat{var}(\hat{u}_i)$ are the sample variances of $\hat{y}_1, \ldots, \hat{y}_n$ and $\hat{u}_1, \ldots, \hat{u}_n$. Some simple algebra (in section 1.5.3 below) shows that
$$\widehat{var}(y_i) = \widehat{var}(\hat{y}_i) + \widehat{var}(\hat{u}_i). \qquad (6)$$
(Note that (6) does not follow automatically from (5) and requires the additional property that $\sum_{i=1}^n \hat{y}_i \hat{u}_i = 0$.) Equation (6) shows that the variation in $y_i$ can be decomposed into the sum of the variation in the regression predictions $\hat{y}_i$ and the variation in the residuals $\hat{u}_i$. The variation of the regression predictions is referred to as the variation in $y_i$ that is explained by the regression.

A common descriptive statistic is
$$R^2 = \frac{\widehat{var}(\hat{y}_i)}{\widehat{var}(y_i)},$$
which measures the goodness of fit of a regression as the proportion of variation in the dependent variable that is explained by the variation in $x_i$. The $R^2$ is known as the coefficient of determination and lies between 0 and 1. The closer $R^2$ is to one, the better the regression is said to fit. Note that this is just one criterion by which to evaluate the quality of a regression, and others will be given during the course.

It is common, as in Wooldridge, to express the $R^2$ definitions and algebra in terms of sums of squares rather than sample variances. Equation (6) can be written
$$\frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \frac{1}{n-1}\sum_{i=1}^n \hat{u}_i^2,$$
where use is made of $\sum_{i=1}^n \hat{u}_i = 0$, which in turn implies $\sum_{i=1}^n y_i = \sum_{i=1}^n \hat{y}_i$ (see section 1.5.3 for the derivation). Cancelling the $1/(n-1)$ gives
$$SST = SSE + SSR,$$
where
$$SST = \sum_{i=1}^n (y_i - \bar{y})^2 \quad \text{("total sum of squares")},$$
$$SSE = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \quad \text{("explained sum of squares")},$$
$$SSR = \sum_{i=1}^n \hat{u}_i^2 \quad \text{("residual sum of squares")}.$$
In this case $R^2$ can equivalently be defined
$$R^2 = \frac{SSE}{SST}.$$

The $R^2$ for the CEO salary regression in Figure 17 is 0.0132, so that just 1.32% of the variation in CEO salaries is explained by the Return on Equity of the firm. This low $R^2$ (i.e. close to zero) need not imply the regression is useless, but it does imply that CEO salaries are determined by other important factors besides just the profitability of the firm.
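As a continuation of the earlier Python sketch (again an added illustration rather than part of the original Eviews-based notes), the $R^2$ can be computed directly from the sums of squares; applied to the full CEO salary regression it gives approximately 0.0132.

```python
import numpy as np

def r_squared(y, y_hat):
    # R^2 = SSE / SST, using the decomposition SST = SSE + SSR (section 1.5.3)
    sst = ((y - y.mean()) ** 2).sum()        # total sum of squares
    sse = ((y_hat - y.mean()) ** 2).sum()    # explained sum of squares
    ssr = ((y - y_hat) ** 2).sum()           # residual sum of squares
    assert np.isclose(sst, sse + ssr)        # holds when y_hat are OLS predictions
    return sse / sst

# Example: r_squared(salary, salary_hat) with the arrays from the earlier sketch.
```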

Some intuition for what $R^2$ is measuring can be found in Figures 23 and 24, which show two hypothetical regressions with $R^2 = 0.185$ and $R^2 = 0.820$ respectively. The data in Figure 24 are less dispersed around the regression line, so that changes in $x_i$ more precisely predict changes in $y_{2,i}$ than in $y_{1,i}$. There is more variation in $y_{1,i}$ that is left unexplained by the regression.

Figure 25 gives one example of how $R^2$ does not always provide a foolproof measure of the quality of a regression. The regression in Figure 25 has $R^2 = 0.975$, very close to the maximum possible value of one, but the scatter plot clearly reveals that the regression does not explain an important feature of the relationship between $y_{3,i}$ and $x_i$: there is some curvature or non-linearity that is not captured by the regression. A high $R^2$ is a nice property for a regression to have, but it is neither necessary nor sufficient for a regression to be useful.

Figure 23: $R^2 = 0.185$

Figure 24: $R^2 = 0.820$

Figure 25: $R^2 = 0.975$

1.5 Derivations

1.5.1 Summation notation

It will be necessary to know some simple properties of summation operators to follow the derivations. The summation operator is defined by
$$\sum_{i=1}^n a_i = a_1 + a_2 + \cdots + a_n.$$

It follows that
$$\sum_{i=1}^n (a_i + b_i) = \sum_{i=1}^n a_i + \sum_{i=1}^n b_i.$$

If $c$ is a constant (i.e. does not vary with $i$) then
$$\sum_{i=1}^n c = \underbrace{c + c + \cdots + c}_{n \text{ times}} = nc.$$
Similarly
$$\sum_{i=1}^n c a_i = c \sum_{i=1}^n a_i,$$
which is an extension of taking $c$ outside the brackets in $(c a_1 + c a_2) = c (a_1 + a_2)$.

The sample mean of $a_1, \ldots, a_n$ is
$$\bar{a} = \frac{1}{n}\sum_{i=1}^n a_i,$$
and then
$$\sum_{i=1}^n (a_i - \bar{a}) = \sum_{i=1}^n a_i - \sum_{i=1}^n \bar{a} = n\bar{a} - n\bar{a} = 0. \qquad (7)$$

The sum of squares of $a_i$ around the sample mean $\bar{a}$ can be expressed
$$\sum_{i=1}^n (a_i - \bar{a})^2 = \sum_{i=1}^n \left(a_i^2 - 2 a_i \bar{a} + \bar{a}^2\right) = \sum_{i=1}^n a_i^2 - 2\bar{a}\sum_{i=1}^n a_i + \sum_{i=1}^n \bar{a}^2 = \sum_{i=1}^n a_i^2 - 2n\bar{a}^2 + n\bar{a}^2 = \sum_{i=1}^n a_i^2 - n\bar{a}^2.$$

1.5.2 Derivation of OLS

Consider a prediction for $y_i$ of the form
$$\tilde{y}_i = b_0 + b_1 x_i,$$
where $b_0$ and $b_1$ could be any coefficients. The residuals from this predictor are
$$\tilde{u}_i = y_i - \tilde{y}_i = y_i - b_0 - b_1 x_i.$$
The idea of OLS is to choose the values of $b_0$ and $b_1$ that minimise the sum of squared residuals
$$SSR(b_0, b_1) = \sum_{i=1}^n \tilde{u}_i^2 = \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2.$$
The minimisation can be done using calculus. The first derivatives of $SSR(b_0, b_1)$ with respect to $b_0$ and $b_1$ are
$$\frac{\partial SSR(b_0, b_1)}{\partial b_0} = -2\sum_{i=1}^n (y_i - b_0 - b_1 x_i), \qquad \frac{\partial SSR(b_0, b_1)}{\partial b_1} = -2\sum_{i=1}^n x_i (y_i - b_0 - b_1 x_i).$$
Setting these first derivatives to zero at the desired estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ gives the first order conditions
$$\sum_{i=1}^n \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0, \qquad (8)$$
$$\sum_{i=1}^n x_i \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0. \qquad (9)$$
(See equations (2.14) and (2.15) of Wooldridge, who takes a different approach to arrive at these equations.)

The first equation can be written
$$\sum_{i=1}^n y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^n x_i = 0,$$
which is equivalent (after dividing both sides by $n$) to
$$\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0.$$
Solving for $\hat{\beta}_0$ gives (4). Substituting this expression for $\hat{\beta}_0$ into the second equation gives
$$\sum_{i=1}^n x_i \left[(y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})\right] = 0, \quad \text{or} \quad \sum_{i=1}^n x_i (y_i - \bar{y}) - \hat{\beta}_1 \sum_{i=1}^n x_i (x_i - \bar{x}) = 0.$$
Notice that
$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i (y_i - \bar{y}) - \bar{x}\sum_{i=1}^n (y_i - \bar{y}) = \sum_{i=1}^n x_i (y_i - \bar{y}),$$
and similarly
$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i (x_i - \bar{x}),$$
so the first order condition for $\hat{\beta}_1$ can be written
$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) - \hat{\beta}_1 \sum_{i=1}^n (x_i - \bar{x})^2 = 0,$$
which leads to (3).

1.5.3 Properties of predictions and residuals

The OLS residuals
$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$$
satisfy
$$\sum_{i=1}^n \hat{u}_i = 0 \qquad (10)$$
because of (8), and hence $\bar{\hat{u}} = 0$. Similarly, because of (9) the residuals satisfy
$$\sum_{i=1}^n x_i \hat{u}_i = 0. \qquad (11)$$
From these two it follows that
$$\sum_{i=1}^n \hat{y}_i = \sum_{i=1}^n (y_i - \hat{u}_i) = \sum_{i=1}^n y_i - \sum_{i=1}^n \hat{u}_i = \sum_{i=1}^n y_i, \qquad (12)$$
so the OLS predictions $\hat{y}_i$ have the same sum, and hence the same sample mean, as the original dependent variable $y_i$. Also
$$\sum_{i=1}^n \hat{y}_i \hat{u}_i = \hat{\beta}_0 \sum_{i=1}^n \hat{u}_i + \hat{\beta}_1 \sum_{i=1}^n x_i \hat{u}_i = 0. \qquad (13)$$

Now consider the total sum of squares
$$SST = \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^n (\hat{u}_i + \hat{y}_i - \bar{y})^2$$
$$= \sum_{i=1}^n \hat{u}_i^2 + 2\sum_{i=1}^n \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = \sum_{i=1}^n \hat{u}_i^2 + 2\left(\sum_{i=1}^n \hat{u}_i \hat{y}_i - \bar{y}\sum_{i=1}^n \hat{u}_i\right) + \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 = SSR + SSE.$$
The last step uses (13) and (10) to cancel the middle two terms. It also uses (12) to identify $SSE = \sum_{i=1}^n (\hat{y}_i - \bar{\hat{y}})^2 = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2$. Also (10) implies that $\bar{\hat{u}} = 0$, so that $SSR = \sum_{i=1}^n (\hat{u}_i - \bar{\hat{u}})^2$ does not require the $\bar{\hat{u}}$. Dividing this equality through by $n-1$ gives (6).

2 Statistical Inference and the Population Regression Function

The review of regression in section 1 suggests that regression is a useful tool for summarising the relationship between two observed variables and for calculating predictions for one variable based on observations on the other. In econometrics we want to do more than this. We want to use the information contained in a sample to carry out inductive inference (statistical inference) on the underlying population from which the sample was drawn. For example, we want to take the sample of 209 CEO salaries in section 1 as being representative of the salaries of CEOs in the population of all firms. In practice it is necessary to be very careful about the definition of this population. This dataset, taken from the American textbook of Wooldridge, would best be taken as being representative of only US firms, rather than all firms in the world, or all firms in OECD countries. In fact the population may be US publicly listed firms, since firms unlisted on the stock market may have quite different processes for setting executive salaries. Nevertheless, with the population carefully defined, the idea of statistical inference is to make statistical statements about that population, not only about the sample that has been observed.

2.1 Simple random sample

Suppose there is a well defined population in which we are interested, e.g. the population of publicly listed firms in the US. A simple random sample is one in which each firm in the population has an equal probability of being included in the sample. Moreover, each firm in the sample is chosen independently of all the others. That is, the probability of inclusion or exclusion of one firm in the sample does not depend on the inclusion or exclusion of any other firm.

For each firm included in the sample, we take one or more measurements of interest (e.g. CEO salary and the firm's Return on Equity). Mathematically these are represented as the random variables $y_i$ and $x_i$ for $i = 1, \ldots, n$, where $n$ is the sample size. The concept of a random variable reflects the idea that the values taken in the sample would have been different if a different random sample had been drawn. In the observed sample we had $y_1 = 1095$, $y_2 = 1001$, etc., but if another simple random sample had been drawn then different firms would (most likely) have been chosen and the $y_i$ values would have been different. That is, random variables take different values if a different sample is drawn.

In the population of all firms, there is a distribution of CEO salaries. The random variables $y_1, y_2, \ldots$ are independent random drawings from this distribution. That is, each of $y_1, y_2, \ldots$ is independent of the others and is drawn from the same underlying distribution. They are therefore called independent and identically distributed random variables, abbreviated as i.i.d.

There are many other sampling schemes that may arise in practice, some of which will be introduced later. Our initial discussion of regression modelling will be confined to cross-sectional data drawn as a simple random sample, i.e. confined to i.i.d. random variables.

2.2 Population distributions and parameters

The population distribution of interest will be characterised by certain parameters of interest. For example, the distribution of CEO salaries for the population of publicly listed firms in the US will have some population mean that could be denoted $\mu$. The population mean is defined mathematically as the expected value of the population distribution. This expected value is the weighted average of all possible values in the population, with weights given by the probability distribution denoted $f(y)$, i.e. $\mu = \int y f(y)\, dy$. (The evaluation of such integrals will not be required here, but see Appendix B.3 of Wooldridge for some more details.) Since each of the random variables $y_1, y_2, \ldots$ represents a random drawing from the population distribution, each of them also has mean $\mu$. This is written
$$E(y_i) = \mu, \quad i = 1, 2, \ldots \qquad (14)$$
Similarly, the population distribution of CEO salaries has some variance that could be denoted $\sigma^2$, which is defined in terms of $y_i$ as
$$\sigma^2 = var(y_i) = E\left[(y_i - \mu)^2\right], \quad i = 1, 2, \ldots$$
(Or, in terms of integrals, as $\sigma^2 = \int (y - \mu)^2 f(y)\, dy$.)

2.3 Population vs Sample

It will be important throughout to be clear on the distinction between the population and the sample. The population is too large or unwieldy or simply impossible to fully observe and measure. Therefore a quantity such as the population mean $\mu = E(y_i)$ is also impossible to observe. Instead we take a sample, which is a subset of the population, and attempt to estimate the population mean based on that sample. An obvious (but not the only) statistic to use to estimate $\mu$ is the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$. A sample statistic such as $\bar{y}$ is observable, e.g. $\bar{y} = 1281.12$ for the CEO salary data (see Figure 11).

It is vital at all times to keep clear the distinction between an unobservable population parameter like $\mu = E(y_i)$, about which we wish to learn, and an observable sample statistic $\bar{y}$ that we use to estimate $\mu$. More generally, we want to use $\bar{y}$ (and perhaps other statistics) to draw statistical inferences about $\mu$.

2.4 Conditional Expectation

In econometrics we are nearly always interested in at least two random variables in a population, e.g. $y_i$ for CEO salary and $x_i$ for Return on Equity, and the relationships between them. Of central interest in econometrics is the conditional distribution of $y_i$ given $x_i$. That is, rather than being interested in the distribution of CEO salaries in isolation (the so-called marginal distribution of $y_i$), we are interested in how the distribution of CEO salaries changes as the Return on Equity of the firm changes. For regression analysis, the fundamental population quantity of interest is the conditional expectation of $y_i$ given $x_i$, which is denoted by the function
$$E(y_i \mid x_i) = \mu(x_i). \qquad (15)$$
(As outlined in Appendix B.4 of Wooldridge, this conditional expectation is defined as $\mu(x) = \int y f_{Y|X}(y \mid x)\, dy$, where $f_{Y|X}$ is the conditional distribution of $y_i$ given $x_i$.) Much of econometrics is devoted to estimating conditional expectation functions.

The idea is that $E(y_i \mid x_i)$ provides the prediction of $y_i$ corresponding to a given value of $x_i$ (i.e. the value of $y_i$ that we would expect given some value of $x_i$). For example, $\mu(10)$ is the population mean of CEO salary for a firm with Return on Equity of 10%. This conditional mean will be different (perhaps lower?) than $\mu(20)$, which is the population mean of CEO salary for a firm with Return on Equity of 20%. If the population mean of $y_i$ changes when we change the value of $x_i$, there is a potentially interesting relationship between $y_i$ and $x_i$ to explore.

Consider the difference between the unconditional mean $\mu = E(y_i)$ given in (14) and the conditional mean $\mu(x_i) = E(y_i \mid x_i)$ given in (15). These are different population quantities with different uses. The unconditional mean provides an overall measure of central tendency for the distribution of $y_i$ but provides no information on the relationship between $y_i$ and $x_i$. The conditional mean $\mu(x_i)$, by contrast, describes how the predicted/mean value of $y_i$ changes with $x_i$. For example, $\mu$ is of interest if we want to investigate the overall average level of CEO salaries (perhaps to compare them to other occupations, say), while $\mu(x_i)$ is of interest if we want to start to try to understand what factors may help explain the level of CEO salaries.

Note also that $\mu$ is, by definition, a single number. On the other hand $\mu(x_i)$ is a function, that is, it is able to take different values for different values of $x_i$.

2.5 The Population Regression Function

The Population Regression Function (PRF) is, by definition, the conditional expectation function (15). In a simple regression analysis, it is assumed that this function is linear, i.e.
$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_i. \qquad (16)$$
This linearity assumption need not always be true and is discussed more later. This PRF specifies the conditional mean of $y_i$ in the population for any value of $x_i$. It specifies one important aspect of the relationship between $y_i$ and $x_i$.

Statistical inference in regression models is about using sample information to learn about $E(y_i \mid x_i)$, which in the case of (16) amounts to learning about $\beta_0$ and $\beta_1$. Consider the SRF introduced in (1), restated here:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i. \qquad (17)$$
The idea is that $\hat{\beta}_0$ and $\hat{\beta}_1$ are the sample OLS estimators that we calculate to estimate the unobserved population coefficients $\beta_0$ and $\beta_1$. Then for any $x_i$ we can use the sample predicted value $\hat{y}_i$ to estimate the conditional expectation $E(y_i \mid x_i)$.

2.6 Statistical Properties of OLS

An important question is whether the OLS SRF (17) provides a good estimator of the PRF (16) in some sense. In this section we address this question assuming that

A1: $(y_i, x_i)$, $i = 1, \ldots, n$, are i.i.d. random variables (i.e. from a simple random sample).

A2: the linear form (16) of the PRF is correct.

Estimators in statistics (such as a sample mean $\bar{y}$ or regression coefficients $\hat{\beta}_0, \hat{\beta}_1$) can be considered to be random variables, since they are functions of the random variables that represent the data. For example, the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$ is a random variable because it is defined in terms of the random variables $y_1, \ldots, y_n$. That is, if a different random sample had been drawn for $y_1, \ldots, y_n$ then a different value for $\bar{y}$ would be obtained. The distribution of an estimator is called the sampling distribution of the estimator. The statistical properties of an estimator are derived from its sampling distribution.

2.6.1 Properties of Expectations

The properties of a sampling distribution are often defined in terms of its mean and variance and other similar quantities. To work these out, it is necessary to use some simple properties of the expectations operator $E$ and the conditional expectations operator, summarised here.

Suppose $z_1, \ldots, z_n$ are i.i.d. random variables and $c_1, \ldots, c_n$ are non-random. Then

E1: $E\left(\sum_{i=1}^n c_i z_i\right) = \sum_{i=1}^n c_i E(z_i)$

E2: $var\left(\sum_{i=1}^n c_i z_i\right) = \sum_{i=1}^n c_i^2\, var(z_i)$

E3: $E(c_i) = c_i$, $var(c_i) = 0$.

Property E1 continues to hold if the $z_i$ are not i.i.d. (for example, if they are correlated with each other), but Property E2 does not continue to hold if the $z_i$ are correlated. Recall from Assumption A1 that, at least for now, we are assuming that the random variables $y_i$ and $x_i$ are each i.i.d. across $i$. Property E3 simply states that the expectation of a constant ($c_i$) is the constant itself, and that a constant has no variation.

In view of the definition of the PRF (16), conditional expectations are fundamental to regression analysis. It turns out to be useful to be able to work with not only $E(y_i \mid x_i)$ but also $E(y_i \mid x_1, \ldots, x_n)$, which is the conditional expectation of $y_i$ given information on the explanatory variables for all observations, not only observation $i$. The reason for this becomes clear in the following section. Under Assumption A1,
$$E(y_i \mid x_i) = E(y_i \mid x_1, \ldots, x_n). \qquad (18)$$
This can be proven formally, but the intuition is simply that under independent sampling, information in explanatory variables $x_j$ for $j \neq i$ is not informative about $y_i$, since $(y_i, x_i)$ and $(y_j, x_j)$ are independent for all $j \neq i$. That is, knowing $x_j$ for $j \neq i$ does not change our prediction of $y_i$. For example, our prediction of the CEO salary for firm 1 is not improved by knowing the Return on Equity of any other firms; it is assumed to be explained only by the performance of firm 1. That is,
$$E(salary_i \mid RoE_i) = E(salary_i \mid RoE_1, \ldots, RoE_n).$$
Equation (18) is reasonable under Assumption A1, but not in other sampling situations such as the time series data considered later.

The conditional variance of a random variable is a measure of its conditional dispersion around its conditional mean. For example,
$$var(y_i \mid x_i) = E\left[\left(y_i - E(y_i \mid x_i)\right)^2 \,\middle|\, x_i\right].$$
(Compare this to the unconditional variance $var(y_i) = E\left[(y_i - E(y_i))^2\right]$.) The conditional variance of $y_i$ is the variation in $y_i$ that remains when $x_i$ is given a fixed value. The unconditional variance of $y_i$ is the overall variation in $y_i$, averaged across all $x_i$ values. It follows that $var(y_i \mid x_i) \leq var(y_i)$. If $y_i$ and $x_i$ are independent then $var(y_i \mid x_i) = var(y_i)$. It is frequently the case in practice that $var(y_i \mid x_i)$ varies in important ways with $x_i$. For example, it may be that CEO salaries are more highly variable for more profitable firms than for less profitable firms. Or, if $y_i$ is wages and $x_i$ is individual age, then it is likely that the variation in wages across individuals becomes greater as age increases. If $var(y_i \mid x_i)$ varies with $x_i$ then this is called heteroskedasticity. If $var(y_i \mid x_i)$ is constant across $x_i$ then this is called homoskedasticity.

Under Assumption A1, the conditional expectations operator has properties similar to E1 and E2. Suppose $c_1, \ldots, c_n$ are either non-random or functions of $x_1, \ldots, x_n$ only (i.e. not functions of $y_1, \ldots, y_n$). Then

CE1: $E\left(\sum_{i=1}^n c_i y_i \mid x_1, \ldots, x_n\right) = \sum_{i=1}^n c_i E(y_i \mid x_i)$

CE2: $var\left(\sum_{i=1}^n c_i y_i \mid x_1, \ldots, x_n\right) = \sum_{i=1}^n c_i^2\, var(y_i \mid x_i)$

Without i.i.d. sampling (e.g. time series), CE1 would continue to hold in the form $E\left(\sum_{i=1}^n c_i y_i \mid x_1, \ldots, x_n\right) = \sum_{i=1}^n c_i E(y_i \mid x_1, \ldots, x_n)$, while CE2 would generally not be true.

The final very useful property of conditional expectations is the Law of Iterated Expectations:

LIE: For any random variables $z$ and $x$, $E[z] = E[E(z \mid x)]$.

The LIE may appear odd at first but is very useful and has some intuition. Leaving aside the regression context, let $z$ represent the outcome from a roll of a die, i.e. a number from $1, 2, \ldots, 6$. The expected value of this random variable is $E(z) = \frac{1}{6}(1 + 2 + \cdots + 6) = 3.5$, since the probability of each possible outcome is $\frac{1}{6}$. Now suppose we define another random variable $x$ that takes the value 0 if $z$ is even and 1 if $z$ is odd. That is, $x = 0$ if $z = 2, 4, 6$ and $x = 1$ if $z = 1, 3, 5$, so that $\Pr(x = 0) = \frac{1}{2}$ and $\Pr(x = 1) = \frac{1}{2}$. It should be clear that $E(z \mid x = 0) = 4$ and $E(z \mid x = 1) = 3$, which illustrates the idea that conditional expectations can take different values (4 or 3) when the conditioning variable takes different values (0 or 1). The expected value of the random variable $E(z \mid x)$ is taken as an average over the possible $x$ values, that is, $E[E(z \mid x)] = \frac{1}{2}(4 + 3) = 3.5$, since the probability of each possible outcome of $E(z \mid x)$ is $\frac{1}{2}$. This illustrates the LIE, i.e. $E(z) = E[E(z \mid x)] = 3.5$. While $E[E(z \mid x)]$ may appear more complicated than $E(z)$, it frequently turns out to be easier to work with.

The LIE also has a version in variances:

LIEvar: $var(z) = E[var(z \mid x)] + var[E(z \mid x)]$.

This shows that the variance of a random variable can be decomposed into its average conditional variance given $x$ and the variance of the regression function on $x$.
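The die example can also be checked by simulation. The following Python sketch is an added illustration; it verifies numerically that $E(z \mid x = 0) \approx 4$, $E(z \mid x = 1) \approx 3$ and $E[E(z \mid x)] \approx E(z) = 3.5$.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.integers(1, 7, size=1_000_000)    # rolls of a fair die: values 1,...,6
x = (z % 2 == 1).astype(int)              # x = 1 if z is odd, x = 0 if z is even

e_z = z.mean()                            # approximately E(z) = 3.5
e_z_x0 = z[x == 0].mean()                 # approximately E(z|x=0) = 4
e_z_x1 = z[x == 1].mean()                 # approximately E(z|x=1) = 3

# E[E(z|x)]: replace each draw by its conditional mean, then average (the LIE)
lie = np.where(x == 0, e_z_x0, e_z_x1).mean()

print(e_z, e_z_x0, e_z_x1, lie)           # lie is approximately equal to e_z
```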

2.6.2 Unbiasedness

An estimator is defined to be unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated. If $\hat{\theta}$ is any estimator of a parameter $\theta$, it is unbiased if $E(\hat{\theta}) = \theta$. The idea is that an unbiased estimator is one that does not systematically under-estimate or over-estimate the true value $\theta$. Some samples from the population will give values of $\hat{\theta}$ below $\theta$ and some samples will give values of $\hat{\theta}$ above $\theta$, and these differences average out. In practice we only get to observe a single value of $\hat{\theta}$ of course, and this single value may differ from $\theta$ by being too large or too small. It is only on average over all possible samples that the estimator gives $\theta$. So unbiasedness is a desirable property for a statistical estimator, although not one that occurs very often. However, in linear regression models there are situations where the OLS estimator can be shown to be unbiased. We consider the unbiasedness of the sample mean first, and then the OLS estimator of the slope coefficient in a simple regression.

Let $\mu_y = E(y_i)$ denote the population mean of the i.i.d. random variables $y_1, \ldots, y_n$. Then
$$E(\bar{y}) = E\left(\frac{1}{n}\sum_{i=1}^n y_i\right) = \frac{1}{n}\sum_{i=1}^n E(y_i) = \frac{1}{n}\sum_{i=1}^n \mu_y = \mu_y, \qquad (19)$$
where the second step uses Property E1 above, and this shows that the sample mean is an unbiased estimator of the population mean.

Under Assumptions A1 and A2 above, the OLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ can be shown to be unbiased. Just $\hat{\beta}_1$ is considered here. First recall the property of zero sums around sample means (7), which implies
$$\sum_{i=1}^n (x_i - \bar{x}) = 0, \qquad (20)$$
and
$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n (x_i - \bar{x}) y_i - \bar{y}\sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x}) y_i, \qquad (21)$$
and similarly
$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \bar{x}) x_i. \qquad (22)$$
Using (21) allows $\hat{\beta}_1$ in (3) to be written
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x}) y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \sum_{i=1}^n \frac{(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2}\, y_i = \sum_{i=1}^n a_{n,i}\, y_i. \qquad (23)$$
This shows that $\hat{\beta}_1$ is a weighted sum of $y_1, \ldots, y_n$, with the weight on each observation $y_i$ being given by
$$a_{n,i} = \frac{(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad (24)$$
which for each $i$ depends on all of $x_1, \ldots, x_n$ (hence the subscript $n$ included in the $a_{n,i}$ notation).

Now condition on $x_1, \ldots, x_n$ and use CE1 to write
$$E(\hat{\beta}_1 \mid x_1, \ldots, x_n) = \sum_{i=1}^n a_{n,i} E(y_i \mid x_1, \ldots, x_n) = \sum_{i=1}^n a_{n,i} (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^n a_{n,i} + \beta_1 \sum_{i=1}^n a_{n,i} x_i, \qquad (25)$$
where the second equality uses (18), which holds under Assumption A1. Using (20) gives
$$\sum_{i=1}^n a_{n,i} = \frac{\sum_{i=1}^n (x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0,$$
and using (22) gives
$$\sum_{i=1}^n a_{n,i} x_i = \frac{\sum_{i=1}^n (x_i - \bar{x}) x_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = 1.$$
Substituting these into (25) gives
$$E(\hat{\beta}_1 \mid x_1, \ldots, x_n) = \beta_1, \qquad (26)$$
and hence, applying the LIE,
$$E[\hat{\beta}_1] = E\left[E(\hat{\beta}_1 \mid x_1, \ldots, x_n)\right] = E[\beta_1] = \beta_1.$$
This shows that $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.
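Unbiasedness is a statement about the average of $\hat{\beta}_1$ over repeated samples, which can be illustrated by simulation. The Python sketch below is an added illustration; the chosen PRF coefficients, sample size and error distribution are arbitrary and not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5          # true PRF coefficients (arbitrary illustration)
n, n_reps = 50, 10_000           # sample size and number of simulated samples

slopes = np.empty(n_reps)
for r in range(n_reps):
    x = rng.uniform(0.0, 10.0, size=n)
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, size=n)   # draw a sample from the PRF
    slopes[r] = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()

print(slopes.mean())   # close to 0.5: the sampling distribution is centred on beta1
print(slopes.var())    # shrinks if n is increased, consistent with formula (29)
```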

2.6.3 Variance

The variance of an estimator measures how dispersed values of the estimator can be around the mean. In general it is preferred for an estimator to have a small variance, implying that it tends not to produce estimates very far from its mean. This is especially so for an unbiased estimator, for which a small variance implies the distribution of the estimator is closely concentrated around the true population value of the parameter of interest.

For the sample mean, consider again the i.i.d. random variables $y_1, \ldots, y_n$, each with population mean $\mu_y = E(y_i)$ and population variance $\sigma_y^2$. Then
$$var(\bar{y}) = var\left(\frac{1}{n}\sum_{i=1}^n y_i\right) = \frac{1}{n^2}\sum_{i=1}^n var(y_i) = \frac{\sigma_y^2}{n}, \qquad (27)$$
the second equality following from Property E2. This formula shows what factors influence the precision of the sample mean: the variance $\sigma_y^2$ and the sample size $n$. Specifically, having a population with a small variance $\sigma_y^2$ leads to a more precise estimator $\bar{y}$ of $\mu_y$, which makes intuitive sense. Similarly intuitively, a larger sample size $n$ implies a smaller variance of $\bar{y}$, implying that more precise estimates are obtained from larger sample sizes.

Now consider the variance of the OLS slope estimator $\hat{\beta}_1$. Using Property LIEvar above, the variance of $\hat{\beta}_1$ can be expressed
$$var(\hat{\beta}_1) = E\left[var(\hat{\beta}_1 \mid x_1, \ldots, x_n)\right] + var\left[E(\hat{\beta}_1 \mid x_1, \ldots, x_n)\right] = E\left[var(\hat{\beta}_1 \mid x_1, \ldots, x_n)\right] + var[\beta_1] = E\left[var(\hat{\beta}_1 \mid x_1, \ldots, x_n)\right],$$
where (26) is used to get the second equality and then Property E3 (the variance of a constant is zero) to get the third. The conditional variance of $\hat{\beta}_1$ given $x_1, \ldots, x_n$ is
$$var(\hat{\beta}_1 \mid x_1, \ldots, x_n) = \sum_{i=1}^n a_{n,i}^2\, var(y_i \mid x_i) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2\, var(y_i \mid x_i)}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2},$$
using Property CE2 to obtain the first equality and then substituting for $a_{n,i}$ to obtain the second. This implies
$$var(\hat{\beta}_1) = E\left[\frac{\sum_{i=1}^n (x_i - \bar{x})^2\, var(y_i \mid x_i)}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2}\right], \qquad (28)$$
which is a fairly complicated formula that doesn't shed a lot of light on the properties of $\hat{\beta}_1$, but it does have later practical use when we talk about hypothesis testing.

A simplification of the variance occurs under homoskedasticity, that is, when $var(y_i \mid x_i) = \sigma^2$ for every $i$. If the conditional variance is constant then
$$var(\hat{\beta}_1) = E\left[\frac{\sigma^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right] = \frac{\sigma^2}{n-1} E\left[\frac{1}{s_x^2}\right], \qquad (29)$$
where
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2$$
is the usual sample variance of the explanatory variable $x_i$. Formula (29) is simple enough to understand what factors in a regression influence the precision of $\hat{\beta}_1$. The variance will be small for small values of $\sigma^2$ and large values of $n-1$ and $s_x^2$. This implies practically that slope coefficients can be precisely estimated in situations where the sample size is large, where the regressor $x_i$ is highly variable, and where the dependent variable $y_i$ has small variation around the regression function (i.e. small $\sigma^2$).

2.6.4 Asymptotic normality

Having discussed the mean and variance of a sampling distribution, it is also possible to consider the entire sampling distribution. This becomes important when we discuss hypothesis testing.

First consider the sample mean of some i.i.d. random variables $y_1, \ldots, y_n$ with mean $\mu_y$ and variance $\sigma_y^2$. Recall from (19) and (27) that the sample mean $\bar{y}$ has mean $E(\bar{y}) = \mu_y$ and variance $var(\bar{y}) = \sigma_y^2/n$. In general the sampling distribution of $\bar{y}$ is not known, but in the special case where it is known that each $y_i$ is normally distributed, it also follows that $\bar{y}$ is normally distributed. That is, if $y_i \sim i.i.d.\ N(\mu_y, \sigma_y^2)$ then
$$\bar{y} \sim N\left(\mu_y, \frac{\sigma_y^2}{n}\right). \qquad (30)$$
If the distribution of $y_i$ is not normal, then the distribution of $\bar{y}$ is also not normal. In econometrics it is very rare to know that each $y_i$ is normally distributed, so it would appear that (30) has only theoretical interest. However, there is a powerful result in probability called the Central Limit Theorem that states that even if $y_i$ is not normally distributed, the sample mean $\bar{y}$ can still be taken to be approximately normally distributed, with the approximation generally working better for larger values of $n$. Technically we say that $\bar{y}$ converges to a normal distribution as $n \to \infty$, or that $\bar{y}$ is asymptotically normal, and we will write this in the form
$$\bar{y} \stackrel{a}{\sim} N\left(\mu_y, \frac{\sigma_y^2}{n}\right), \qquad (31)$$
with the "a" denoting the fact that the normal distribution for $\bar{y}$ is asymptotic (i.e. as $n \to \infty$), or more simply approximate.


Figure 26: The Gamma distribution with parameters b = r = 2


The proof of the Central Limit Theorem goes beyond our scope, but it can be illustrated using simulated data. Suppose that $y_1, \ldots, y_n$ are i.i.d. random variables with the Gamma distribution shown in Figure 26. The mean of this distribution is $\mu_y = 4$. The details of the Gamma distribution are not important for this discussion, although it is a well-known distribution for modelling certain types of data in econometrics. For example, the skewed shape of the distribution can make it suitable for income distribution modelling, in which many people or households earn low to moderate incomes and a relative few earn high to very high incomes. Clearly this Gamma distribution is very different in shape from a normal distribution! We can use Eviews to draw a sample of size $n$ from this Gamma distribution and to compute $\bar{y}$. Repeating this many times builds up a picture of the sampling distribution of $\bar{y}$ for the given $n$. The results of doing this are given in Figures 27-31.

Figure 27 shows the simulated sampling distribution of $\bar{y}$ when $n = 5$. The skewness of the population distribution of $y_i$ in Figure 26 remains evident in the distribution of $\bar{y}$ in Figure 27, but to a reduced extent. The approximation (31), which is meant to hold for large $n$, does not work very well for $n = 5$. As $n$ increases, however, through $n = 10, 20, 40, 80$ in Figures 28-31, it is clear that the sampling distribution of $\bar{y}$ becomes more and more like a normal distribution, even though the underlying data from $y_i$ are very far from being normal. This is the Central Limit Theorem at work and is why, for reasonable sample sizes, we are prepared to rely on an approximate distribution such as (31) to carry out statistical inference.

Two other features of the sampling distributions in Figures 27-31 are worth noting. Firstly, the mean of each sampling distribution is known to be $\mu_y = 4$ because $\bar{y}$ is unbiased for every $n$. Secondly, the variance of the sampling distribution becomes smaller as $n$ increases because $var(\bar{y}) = \sigma_y^2/n$. That is, the sampling distribution becomes more concentrated around $\mu_y = 4$ as $n$ increases (note carefully how the scale on the horizontal axis changes as $n$ increases).
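The simulation behind Figures 27-31 can be reproduced outside Eviews. The Python sketch below is an added illustration and assumes the Gamma distribution in Figure 26 has shape 2 and scale 2 (so its mean is 4); it builds up the simulated sampling distribution of $\bar{y}$ for each sample size.

```python
import numpy as np

rng = np.random.default_rng(2)

def ybar_sampling_distribution(n, n_reps=10_000):
    # Each row is a sample of n i.i.d. Gamma(shape=2, scale=2) draws; take row means
    draws = rng.gamma(shape=2.0, scale=2.0, size=(n_reps, n))
    return draws.mean(axis=1)

for n in (5, 10, 20, 40, 80):
    ybar = ybar_sampling_distribution(n)
    # The mean stays near 4 (unbiasedness), the variance shrinks like sigma^2/n,
    # and a histogram of ybar looks increasingly normal as n grows (the CLT).
    print(n, round(ybar.mean(), 3), round(ybar.var(), 3))
```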

Figure 27: Sampling distribution of $\bar{y}$ with $n = 5$ observations from the Gamma(2,2) distribution.

Figure 28: Sampling distribution of $\bar{y}$ with $n = 10$ observations from the Gamma(2,2) distribution.

Figure 29: Sampling distribution of $\bar{y}$ with $n = 20$ observations from the Gamma(2,2) distribution.

Figure 30: Sampling distribution of $\bar{y}$ with $n = 40$ observations from the Gamma(2,2) distribution.

Figure 31: Sampling distribution of $\bar{y}$ with $n = 80$ observations from the Gamma(2,2) distribution.

The same principle applies to the regression coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$. Each can be shown to be asymptotically normal because of the Central Limit Theorem. For $\hat{\beta}_1$, the Central Limit Theorem applies to the sum (23) and gives the approximate distribution
$$\hat{\beta}_1 \stackrel{a}{\sim} N\left(\beta_1, \omega_{1,n}^2\right), \qquad (32)$$
where in general
$$\omega_{1,n}^2 = var(\hat{\beta}_1) = E\left[\frac{\sum_{i=1}^n (x_i - \bar{x})^2\, var(y_i \mid x_i)}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2}\right], \qquad (33)$$
as given in (28). Under homoskedasticity, this simplifies to
$$\omega_{1,n}^2 = \frac{\sigma^2}{n-1} E\left[\frac{1}{s_x^2}\right], \qquad (34)$$
as shown in (29).

2.7 Summary

In introductory econometrics the topic of statistical inference and its theory is typically the most difficult to grasp, both in its concepts and its formulae. What follows is a summary of the important concepts of this section.

Populations and Samples

Statistical inference is the process of attempting to learn about some characteristics of a population based on a sample drawn from that population.

The most straightforward sampling approach is a simple random sample, in which every element in the population has an equal chance of being included in the sample.

Population characteristics such as means and variances are defined as the expectations $\mu_y = E(y_i)$ and $\sigma_y^2 = E\left[(y_i - \mu_y)^2\right]$.

Sample estimators of means and variances are defined as the sums $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$ and $s_y^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar{y})^2$.

Regression in the Population and the Sample

The Population Regression Function (PRF) is defined in terms of the conditional expectations operator: $E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$.

The Sample Regression Function (SRF) is defined in terms of the OLS regression line: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.

Statistical properties

Under simple random sampling,
$$\bar{y} \stackrel{a}{\sim} N\left(\mu_y, \sigma_y^2/n\right), \qquad \hat{\beta}_1 \stackrel{a}{\sim} N\left(\beta_1, \omega_{1,n}^2\right),$$
where in general
$$\omega_{1,n}^2 = E\left[\frac{\sum_{i=1}^n (x_i - \bar{x})^2\, var(y_i \mid x_i)}{\left(\sum_{i=1}^n (x_i - \bar{x})^2\right)^2}\right],$$
or under homoskedasticity
$$\omega_{1,n}^2 = \frac{\sigma^2}{n-1} E\left[\frac{1}{s_x^2}\right].$$

3 Hypothesis Testing and Confidence Intervals

The idea of statistical inference is that we use the observable sample information summarised by the SRF
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
to make inferences about the unobservable PRF
$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_i.$$
For example, in the CEO salary regression in Figure 17, we take $\hat{\beta}_1 = 18.50$ to be the point estimate of the unknown population coefficient $\beta_1$. This point estimate is very useful, but on its own it doesn't communicate the uncertainty that is implicit in having taken a sample of just $n = 209$ firms from all firms in the population. If we had taken a different sample of firms, we would have obtained a different value for $\hat{\beta}_1$. This uncertainty is summarised in the sampling distribution of $\hat{\beta}_1$ in equation (32), which quantifies (approximately) the entire distribution of $\hat{\beta}_1$ that could have been obtained by taking different samples from the underlying population. The techniques of hypothesis testing and confidence intervals provide ways of making probabilistic statements about $\beta_1$ that are more informative, and more honest about the statistical uncertainty, than a simple point estimate.

3.1 Hypothesis testing

3.1.1 The null hypothesis

The approach in hypothesis testing is to specify a null hypothesis about a particular value for a population parameter (say $\beta_0$ or $\beta_1$) and then to investigate whether the observed data provide evidence for the rejection of this hypothesis. For example, in the CEO salary regression, we might specify a null hypothesis that firm profitability has no predictive power for CEO salary. In the PRF
$$E(Salary_i \mid RoE_i) = \beta_0 + \beta_1 RoE_i, \qquad (35)$$
the null hypothesis would be expressed
$$H_0 : \beta_1 = 0. \qquad (36)$$
If the null hypothesis were true then $E(Salary_i \mid RoE_i) = \beta_0$, which states that average CEO salaries are constant ($\beta_0$) across all levels of firm profitability.

Note that the hypothesis is expressed in terms of the population parameter $\beta_1$, not the sample estimate $\hat{\beta}_1$. Since we know that $\hat{\beta}_1 = 18.50$, it would be nonsense to investigate whether $\hat{\beta}_1 = 0$.... it isn't! Instead we are interested in testing whether $\hat{\beta}_1 = 18.50$ differs sufficiently from zero that we can conclude that $\beta_1$ also differs from zero, albeit with some level of uncertainty that acknowledges the sampling variability inherent in $\hat{\beta}_1$.

3.1.2 The alternative hypothesis

After specifying the null hypothesis, the next requirement is the alternative hypothesis. The alternative hypothesis is specified as an inequality, as opposed to the null hypothesis which is an equality. In the case of a null hypothesis specified as (36), the alternative hypothesis would be one of the following three possibilities:
$$H_1 : \beta_1 \neq 0 \quad \text{or} \quad H_1 : \beta_1 > 0 \quad \text{or} \quad H_1 : \beta_1 < 0,$$
depending on the practical context. The alternative $H_1 : \beta_1 \neq 0$ is called a two-sided alternative (falling on both sides of the null hypothesis), while $H_1 : \beta_1 > 0$ and $H_1 : \beta_1 < 0$ are called one-sided alternatives. A one-sided alternative would be specified in situations where the only reasonable or interesting deviations from the null hypothesis lie on one side. In the case of the null hypothesis $H_0 : \beta_1 = 0$ in (36), we might specify $H_1 : \beta_1 > 0$ if the only interest were in the hypothesis that profitable firms reward their CEOs with higher salaries. However, there is also a possibility that some less profitable firms might try to improve their fortunes by attempting to attract proven CEOs with offers of a higher salary. With two conflicting stories like this, the sign of the possible relationship would be unclear and we would specify $H_1 : \beta_1 \neq 0$. One very important point is that we must not use the sign of the sample estimate to specify the alternative hypothesis: the hypothesis testing methodology requires that the hypotheses be specified before looking at any sample information. The hypotheses must be specified on the basis of the practical questions of interest. Both one- and two-sided testing will be discussed below.

3.1.3 The null distribution

The idea in hypothesis testing is to make a decision whether or not to reject $H_0$ in favour of $H_1$ on the basis of the evidence in the data. For testing (36), an approach to making this decision can be based on the sampling distribution (32). Specifically, if $H_0 : \beta_1 = 0$ is true then
$$\hat{\beta}_1 \stackrel{a}{\sim} N\left(0, \omega_{1,n}^2\right),$$
where $\omega_{1,n}^2$ is given in (33), or (34) in the special case where homoskedasticity can be assumed. This sampling distribution can also be written
$$\frac{\hat{\beta}_1}{\omega_{1,n}} \stackrel{a}{\sim} N(0, 1),$$
which is useful because the distribution on the right hand side is now a very well known one, the standard normal distribution, for which derivations and computations are relatively straightforward. However, this expression is not yet usable in practice because $\omega_{1,n}$ depends on population expectations (i.e. it contains an $E$ and a $var$) and is not observable. It can, however, be estimated using
$$\hat{\omega}_{1,n} = \frac{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \hat{u}_i^2}}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad (37)$$
which is obtained from (33) by dropping the outside expectation, replacing $var(y_i \mid x_i)$ by the squared residuals $\hat{u}_i^2$, and then taking the square root to turn $\hat{\omega}_{1,n}^2$ into $\hat{\omega}_{1,n}$. This quantity $\hat{\omega}_{1,n}$ is called the standard error of $\hat{\beta}_1$. It can then be shown (using derivations beyond our scope) that
$$\frac{\hat{\beta}_1}{\hat{\omega}_{1,n}} \stackrel{a}{\sim} N(0, 1).$$
That is, replacing the unknown standard deviation $\omega_{1,n}$ with the observable standard error $\hat{\omega}_{1,n}$ does not change the approximate distribution of $\hat{\beta}_1$. However, a better approximation in practice is often provided by
$$\frac{\hat{\beta}_1}{\hat{\omega}_{1,n}} \stackrel{a}{\sim} t_{n-2}, \qquad (38)$$
where $t_{n-2}$ denotes the $t$ distribution with $n-2$ degrees of freedom. For large values of $n$ the $t_{n-2}$ distribution is essentially indistinguishable from the $N(0,1)$ distribution, but for smaller $n$ using (38) can often give a more accurate approximation. Equation (38) provides a practically usable approximate null distribution for $\hat{\beta}_1$ (it's called the null distribution because recall we imposed the null hypothesis to obtain $\hat{\beta}_1 \stackrel{a}{\sim} N(0, \omega_{1,n}^2)$ in the first step above).

If it is known that the conditional distribution of $y_i$ given $x_i$ is homoskedastic (i.e. that $var(y_i \mid x_i)$ is constant) then (34) can be used to justify the alternative estimator
$$\hat{\omega}_{1,n} = \sqrt{\frac{\hat{\sigma}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}, \qquad (39)$$
where
$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2$$
is the sample variance of the OLS residuals. In small samples the standard error estimated using (39) may be more precise than that estimated by (37), provided the assumption of homoskedasticity is correct. If the assumption of homoskedasticity is incorrect, however, the standard error in (39) is not valid. In econometrics the standard error in (37) is referred to as "White's standard error" while (39) is referred to as the "OLS standard error". Modern econometric practice is to favour the robustness of (37), and we will generally follow that practice.
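Both standard errors are straightforward to compute from the data and the OLS residuals. The following Python sketch is an added illustration (not from the notes); it returns the slope, the White standard error (37), the OLS standard error (39) and the $t$ statistic of the next section.

```python
import numpy as np

def simple_ols_inference(x, y):
    n = len(y)
    xd = x - x.mean()                      # x_i - x_bar
    sxx = (xd ** 2).sum()                  # sum of squared deviations of x

    beta1 = (xd * (y - y.mean())).sum() / sxx            # formula (3)
    beta0 = y.mean() - beta1 * x.mean()                   # formula (4)
    u = y - beta0 - beta1 * x                             # OLS residuals

    se_white = np.sqrt((xd ** 2 * u ** 2).sum()) / sxx    # equation (37)
    sigma2_hat = (u ** 2).sum() / (n - 2)                 # residual variance estimate
    se_ols = np.sqrt(sigma2_hat / sxx)                    # equation (39)

    return beta1, se_white, se_ols, beta1 / se_white      # last entry: t statistic (41)
```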

If it is known that the conditional distribution of $y_i$ given $x_i$ is both homoskedastic and normally distributed (written $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$) then the null distribution (38), with $\hat{\omega}_{1,n}$ given in (39), is exact, no longer an approximation. This is a beautiful theoretical result, but since it is very rarely known that $y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ in practice, we should acknowledge that (38) is an approximation.

3.1.4 The alternative distribution

If $H_0 : \beta_1 = 0$ is not true then the approximate sampling distribution (32) can be written
$$\hat{\beta}_1 \stackrel{a}{\sim} \beta_1 + N\left(0, \omega_{1,n}^2\right),$$
which is informal notation that represents a normal distribution with a constant $\beta_1$ added to it (which is identical to a $N(\beta_1, \omega_{1,n}^2)$ distribution). Then repeating the steps leading to (38) gives
$$\frac{\hat{\beta}_1}{\omega_{1,n}} \stackrel{a}{\sim} \frac{\beta_1}{\omega_{1,n}} + N(0, 1),$$
and then replacing $\omega_{1,n}$ by the standard error $\hat{\omega}_{1,n}$ gives
$$\frac{\hat{\beta}_1}{\hat{\omega}_{1,n}} \stackrel{a}{\sim} \frac{\beta_1}{\omega_{1,n}} + t_{n-2}. \qquad (40)$$
This equation says that if the null hypothesis is false then the distribution of the ratio $\hat{\beta}_1 / \hat{\omega}_{1,n}$ is no longer approximately $t_{n-2}$, but instead is $t_{n-2}$ with a constant $\beta_1 / \omega_{1,n}$ added to it. That is, the distribution is shifted (either positively or negatively, depending on the sign of $\beta_1$) relative to the $t_{n-2}$ distribution. The difference between (38) under the null and (40) under the alternative is the basis on which the test distinguishes between the two hypotheses.

3.1.5 Decision rules and the significance level

In hypothesis testing we either reject H0 or do not reject H0 (we don't accept hypotheses, more on this soon). A hypothesis test requires a decision rule that specifies when H0 is to be rejected.

Because we have only partial information, i.e. a random sample rather than the entire population, there is some probability that any decision we make will be incorrect. That is, there is a chance we might reject H0 when H0 is in fact true, which is called a Type I error. There is also a

chance that we might not reject H0 when H0 is in fact false, which is called a Type II error. The

four possibilities are summarised in this table.

                                Truth in the population
   Decision              H0 true          H0 false
   Reject H0             Type I error     Correct
   Do not reject H0      Correct          Type II error

Clearly we would like a hypothesis test to minimise the probabilities of both Type I and II errors, but there is no unique way of doing this. The convention is to set the significance level of the test to a small fixed probability $\alpha$, which specifies the probability of a Type I error. The most common choice is $\alpha = 0.05$, although $\alpha = 0.01$ and $\alpha = 0.10$ are sometimes used.
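The meaning of the significance level can also be illustrated by simulation. The sketch below (an illustrative design of my own, using numpy and scipy together with the t test developed in the next subsection) generates many samples in which H0: $\beta_1 = 0$ is true and records how often a two-sided 5% test rejects; the rejection frequency should be close to $\alpha = 0.05$.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps, alpha = 50, 5000, 0.05
crit = stats.t.ppf(1 - alpha / 2, n - 2)          # two-sided critical value
rejections = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 0.0 * x + rng.normal(size=n)        # data generated with beta1 = 0, so H0 is true
    dx = x - x.mean()
    sxx = np.sum(dx ** 2)
    b1 = np.sum(dx * (y - y.mean())) / sxx        # OLS slope
    u = y - (y.mean() - b1 * x.mean()) - b1 * x   # OLS residuals
    se = np.sqrt(np.sum(dx ** 2 * u ** 2)) / sxx  # White standard error, equation (37)
    if abs(b1 / se) > crit:
        rejections += 1
print(rejections / reps)  # Type I error rate: roughly 0.05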

3.1.6 The t test — theory

The t statistic for testing H0: $\beta_1 = 0$ is
$$t = \frac{\hat\beta_1}{\hat\omega_{1,n}}. \qquad (41)$$
From (38) we know that $t \overset{a}{\sim} t_{n-2}$ if H0 is true, while from (40) we know that t is shifted away

from the $t_{n-2}$ distribution if H0 is false. First consider testing H0: $\beta_1 = 0$ against the one-sided alternative H1: $\beta_1 > 0$, implying the interesting deviations from the null hypothesis induce a positive shift of t away from the $t_{n-2}$ distribution. We will therefore define a decision rule based on t that states that H0 is rejected if t takes a larger value than would be thought reasonable from the $t_{n-2}$ distribution. The way we formalise the statement "t takes a larger value than would be thought reasonable from the $t_{n-2}$ distribution" is to use the significance level. The decision rule is defined to reject H0 if t takes a larger value than a critical value $c_\alpha$, which is defined by the probability
$$\Pr(t_{n-2} > c_\alpha) = \alpha$$
for significance level $\alpha$. The distribution of t under H0 is $t_{n-2}$, so the value of $c_\alpha$ can be computed from the $t_{n-2}$ distribution, as shown graphically in Figure 32 for $\alpha = 0.05$ and $n-2 = 30$. The critical value in this case is $c_{0.05} = 1.697$, which can be found in Table G.2 of Wooldridge (p.833) or computed in Eviews.
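If preferred, the same critical value can be computed directly in Python (a sketch only, assuming scipy is available):

from scipy import stats

print(round(stats.t.ppf(0.95, 30), 3))  # one-sided 5% critical value with 30 degrees of freedom: 1.697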

For testing H0: $\beta_1 = 0$ against H1: $\beta_1 < 0$, the procedure is essentially a mirror image. The decision rule is to reject H0 if t takes a smaller value than the critical value, which is shown in Figure 33. If $c_\alpha$ is the $\alpha$-significance critical value for testing against H1: $\beta_1 > 0$, then $-c_\alpha$ is the $\alpha$-significance critical value for testing against H1: $\beta_1 < 0$. That is
$$\Pr(t_{n-2} < -c_\alpha) = \alpha.$$
The critical value for $\alpha = 0.05$ and $n-2 = 30$ is therefore simply $-c_{0.05} = -1.697$.

For testing H0: $\beta_1 = 0$ against H1: $\beta_1 \neq 0$, the potentially interesting deviations from the null hypothesis could shift t in either direction.

Figure 32: The $t_{n-2}$ distribution with $\alpha = 0.05$ critical value for testing H0: $\beta_1 = 0$ against H1: $\beta_1 > 0$.

Therefore we need to check in either direction. That is, we will reject H0 if either t takes a larger value than considered reasonable for the $t_{n-2}$ distribution, or a smaller value. The decision rule is to reject H0 if $t > c_{\alpha/2}$ or $t < -c_{\alpha/2}$, which can be expressed more simply as $|t| > c_{\alpha/2}$, where $c_{\alpha/2}$ satisfies
$$\Pr(t_{n-2} > c_{\alpha/2}) = \frac{\alpha}{2},$$
or equivalently
$$\Pr(|t_{n-2}| > c_{\alpha/2}) = \alpha.$$
The critical value for $\alpha = 0.05$ and $n-2 = 30$ is $c_{\alpha/2} = c_{0.025} = 2.042$.
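The two-sided value can be verified the same way (again assuming scipy):

from scipy import stats

print(round(stats.t.ppf(0.975, 30), 3))  # two-sided 5% critical value with 30 degrees of freedom: 2.042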

3.1.7 The t test — two sided example

Every hypothesis test needs to specify the following elements:

1. The null hypothesis H0.
2. The alternative hypothesis H1.
3. A significance level $\alpha$.
4. A test statistic (in this case t, but we will see others soon).
5. A decision rule that states when H0 is rejected.
6. The decision, and its interpretation.

Consider the CEO salary regression, which has PRF
$$E(salary_i \mid roe_i) = \beta_0 + \beta_1\, roe_i,$$

Figure 33: The $t_{n-2}$ distribution with $\alpha = 0.05$ critical value for testing H0: $\beta_1 = 0$ against H1: $\beta_1 < 0$.

Figure 34: The $t_{n-2}$ distribution with $\alpha = 0.05$ critical value for testing H0: $\beta_1 = 0$ against H1: $\beta_1 \neq 0$.

Figure 35: Choosing to use White standard errors that allow for heteroskedasticity

and the hypotheses H0: $\beta_1 = 0$ and H1: $\beta_1 \neq 0$, so that we are interested in either positive or negative deviations from the null hypothesis, i.e. any role for firm profitability in predicting CEO salaries, whether positively or negatively. We will choose $\alpha = 0.05$, which is the default choice unless specified otherwise.

The test statistic will be the t statistic given in (41). This statistic can be computed in Eviews using either of (37) or (39), with the default choice being (39), which imposes the homoskedasticity assumption. This assumption is frequently violated in practice, and can be tested for, but we will play it safe for now and use the (37) version of $\hat\omega_{1,n}$, which allows for heteroskedasticity.

This requires an additional option to be changed in Eviews. When specifying the regression in Eviews in Figure 16, click on the "Options" tab to reveal the options shown in Figure 35, and select "White" for the coefficient covariance matrix as shown. The resulting regression is shown in Figure 36, with the selection of the appropriate White standard errors highlighted. We now have enough information to carry out the hypothesis test. The details are as follows.

1. H0: $\beta_1 = 0$
2. H1: $\beta_1 \neq 0$
3. Significance level: $\alpha = 0.05$
4. Test statistic: t = 2.71
5. Reject H0 if $|t| > c_{0.025} = 1.980$
6. H0 is rejected, so Return on Equity is a significant predictor for CEO Salary.

The critical value of $c_{0.025} = 1.980$ is found from the table of critical values on p.833 of Wooldridge, reproduced in Figure 37. For this regression with n = 209, the relevant t distribution has $n-2 = 207$ degrees of freedom. This many degrees of freedom is not included in the table, so we choose the closest degrees of freedom that is less than this number, i.e. 120. The test is two-sided with significance level $\alpha = 0.05$, so the critical value of $c_{0.025} = 1.980$ can be read from the third column of the table.

Figure 36: CEO salary regression with White standard errors
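For readers working outside Eviews, the same regression with White standard errors can be sketched in Python using statsmodels. The details below are assumptions rather than unit materials: the third-party wooldridge package is assumed to be installed and to provide the ceosal1 data with columns salary and roe, and statsmodels' HC0 covariance corresponds to formula (37) (Eviews may apply a small degrees-of-freedom adjustment, so results can differ slightly).

import statsmodels.formula.api as smf
import wooldridge  # assumed third-party package bundling the Wooldridge textbook datasets

df = wooldridge.data("ceosal1")                                # 209 CEOs with salary and roe
model = smf.ols("salary ~ roe", data=df).fit(cov_type="HC0")   # HC0 = White standard errors
print(model.params["roe"], model.bse["roe"], model.tvalues["roe"])  # slope, White SE, t statistic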

3.1.8 The t test — one sided example

The assessment for ETC2410/ETC3440 in semester two of 2013 consisted of assignments worth 40% during the semester and a final exam worth 60%. Descriptive statistics for these marks, both expressed as percentages, are shown in Figures 38 and 39. It may be of interest to investigate how well assignment marks earned during the semester predict final exam marks. In particular, we would expect that those students who do better on assignments during the semester will go on to also do better on their final exams. The scatter plot in Figure 40 shows that such a relationship potentially does exist in the data, so we will carry out a formal hypothesis test in a regression.

The PRF has the form

$$E(exam_i \mid asgnmt_i) = \beta_0 + \beta_1\, asgnmt_i, \qquad (43)$$

and we will test H0: $\beta_1 = 0$ (that assignment marks have no predictive power for exam marks) against the one-sided alternative H1: $\beta_1 > 0$ (that higher assignment marks predict higher exam marks). The estimates are given in Figure 41, in which the SRF is
$$\widehat{exam}_i = \underset{(5.360)}{23.763} + \underset{(0.095)}{0.548}\, asgnmt_i.$$

The numbers in parentheses below the coefficients are the standard errors. This is a common way of reporting an estimated regression equation, since it provides sufficient information for the reader to carry out some inference themselves if they wish. The hypothesis test of interest proceeds as follows.

1. H0: $\beta_1 = 0$
2. H1: $\beta_1 > 0$
3. Significance level: $\alpha = 0.05$
4. Test statistic: t = 5.766


Figure 38: Assignment marks for ETC2410 / ETC3440 in semester two of 2013 (n = 118; mean 59.67, median 61.92, min 19.25, max 83.63, std. dev. 13.27, skewness -0.74, kurtosis 3.34, Jarque-Bera 11.30 with p-value 0.004).

Figure 39: Exam marks for ETC2410 / ETC3440 in semester two of 2013 (n = 118; mean 56.49, median 56.62, min 0.00, max 93.25, std. dev. 16.74, skewness -0.79, kurtosis 4.71, Jarque-Bera 26.66 with p-value 0.000).

5. Reject H0 if $t > c_{0.05} = 1.662$
6. H0 is rejected, so there is evidence that higher assignment marks predict significantly higher final exam marks.

The critical value in this case is found from the table in Figure 37 using 90 degrees of freedom (the closest value in the table below the actual $n-2 = 116$) and the column corresponding to the $\alpha = 0.05$ level of significance for a one-sided test.
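The exact critical value for the actual degrees of freedom can also be computed directly (a sketch assuming scipy; not required for the unit):

from scipy import stats

print(round(stats.t.ppf(0.95, 116), 3))  # about 1.658, slightly below the tabulated 1.662 at 90 df
# Either way, t = 5.766 comfortably exceeds the critical value, so the conclusion is unchanged.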

3.1.9 p-values

A convenient alternative way to express a decision rule for a hypothesis test is to use p-values rather than critical values, where they are available.

First consider testing H0: $\beta_1 = 0$ against H1: $\beta_1 > 0$. The critical value for this t test is $c_{0.05}$ as shown in Figure 32. Recall that $c_{0.05}$ is defined to satisfy $\Pr(t_{n-2} > c_{0.05}) = 0.05$, which means that the area under the $t_{n-2}$ distribution to the right of $c_{0.05}$ is 0.05. Any value of the test statistic t that falls above $c_{0.05}$ leads to a rejection of the null hypothesis, and the area under the $t_{n-2}$ distribution to the right of the observed value of t is called the p-value. The decision rule "reject H0 if $t > c_{0.05}$" is therefore equivalent to "reject H0 if the p-value is less than 0.05".
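For the assignment-marks example above, the one-sided p-value can be computed directly (a sketch assuming scipy):

from scipy import stats

p_value = stats.t.sf(5.766, 116)  # right-tail area beyond the observed t with 116 degrees of freedom
print(p_value)                    # far below 0.05, so H0 is rejected at the 5% level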


Figure 40: Scatter plot of exam marks against assignment marks.
