Ordinary Least Squares Estimation


EC114 Introduction to Quantitative Economics

12. Ordinary Least Squares Estimation

Marcus Chambers

Department of Economics, University of Essex


Outline

1 Ordinary Least Squares (OLS) Estimation

2 Goodness-of-fit

3 Computing OLS Estimates

Reference: R. L. Thomas, Using Statistics in Economics,


Recall that the population regression line is given by

$$E(Y) = \alpha + \beta X$$

and the sample regression line is given by

$$\hat{Y} = a + bX,$$

where a and b can be regarded as estimates of α and β. Another way to think of these relationships is in terms of Y itself:

$$Y = \alpha + \beta X + \varepsilon,$$

$$Y = a + bX + e,$$

where ε is the disturbance and e is the residual.


If we vary the sample regression line in some way, we obtain a different set of residuals. In other words, each method of estimating the sample regression line produces its own set of residuals.


The best known method of fitting a straight line to a scatter diagram is Ordinary Least Squares (OLS).

The sample regression line is determined by the intercept a and the slope b.

A good criterion in the choice of a and b is to make the residuals ‘small’ somehow.

Small residuals imply that the differences between the actual Y and the fitted $\hat{Y}$ are small.

The OLS method of estimation chooses a and b in order to minimise the sum of the squares of the residuals:

$$\sum_{i=1}^{n} e_i^2.$$

We know that $e_i = Y_i - a - bX_i$ so that the sum of squared residuals can be written

$$S = \sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2,$$

which is a function of a and b alone because $Y_i$ and $X_i$ are the given data points.

The objective is to minimise S with respect to a and b. To do this we need to partially differentiate S with respect to a and b and set these derivatives equal to zero:

$$\frac{\partial S}{\partial a} = -2 \sum (Y_i - a - bX_i) = 0, \qquad \frac{\partial S}{\partial b} = -2 \sum X_i (Y_i - a - bX_i) = 0.$$

As these derivatives are set equal to zero we can divide both sides by −2 and re-arrange the terms to give:

$$\sum Y_i = na + b \sum X_i, \qquad \sum X_i Y_i = a \sum X_i + b \sum X_i^2,$$

noting that $\sum a = na$.

These are known as the normal equations (but are not related to the normal distribution).

Note that, because $Y_i - a - bX_i = e_i$, we can also write the first-order conditions in the form:

$$\frac{\partial S}{\partial a} = -2 \sum e_i = 0, \qquad \frac{\partial S}{\partial b} = -2 \sum X_i e_i = 0.$$

We therefore have two equations in two unknowns which we can solve for a and b.

The extension question on Problem Set 12 deals with this solution.

A compact representation of the solution is:

$$b = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad a = \bar{Y} - b\bar{X},$$

where $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$, and $\bar{X}$ and $\bar{Y}$ are the sample means of X and Y respectively.

The above expressions for a and b are the OLS estimators of α and β.
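As a minimal sketch of how the two normal equations pin down a and b, the Python below solves them as a 2×2 linear system (using Cramer's rule) and compares the result with the deviation-form expressions. The function names and the five data points are made up for illustration; they are not taken from Thomas or from the problem set.

```python
def solve_normal_equations(xs, ys):
    """Solve the two normal equations for (a, b) by Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:
    #   sum_y  = n * a     + b * sum_x
    #   sum_xy = a * sum_x + b * sum_xx
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all X values are identical, so the slope is undefined")
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


def ols_deviation_form(xs, ys):
    """Compute (a, b) from deviations about the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical illustrative data (an assumption, not from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

a, b = solve_normal_equations(xs, ys)
a_dev, b_dev = ols_deviation_form(xs, ys)

# The first-order conditions imply sum(e_i) = 0 and sum(X_i * e_i) = 0.
residuals = [y - a - b * x for x, y in zip(xs, ys)]
sum_e = sum(residuals)
sum_xe = sum(x * e for x, e in zip(xs, residuals))

print(a, b, a_dev, b_dev, sum_e, sum_xe)
```

Both routes should give the same a and b up to floating-point rounding, and the two residual sums should be zero to within the same tolerance.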


We can compute a and b from various sample sums, making use of the following:

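$$\sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = \sum X_i^2 - n\bar{X}^2,$$

$$\sum x_i y_i = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = \sum X_i Y_i - n\bar{X}\bar{Y}.$$

In view of this another common expression for b is:

$$b = \frac{\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}.$$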
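The computational identity for the sum of squared deviations follows by expanding the square. A short derivation, using only the definition of the sample mean ($\sum X_i = n\bar{X}$), is sketched here; the cross-product identity follows in the same way.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum X_i^2 - 2\bar{X} \sum X_i + n\bar{X}^2 \\
  &= \sum X_i^2 - 2n\bar{X}^2 + n\bar{X}^2
     \qquad \text{(since } \textstyle\sum X_i = n\bar{X}\text{)} \\
  &= \sum X_i^2 - n\bar{X}^2
   = \sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}.
\end{align*}
```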


The data on money stock (Y) and GDP (X) in Table 9.1 of Thomas yield:

$$\sum X_i = 132.004, \quad \sum X_i^2 = 1247.66, \quad \sum X_i Y_i = 220.956, \quad \sum Y_i = 23.718, \quad \sum Y_i^2 = 45.154, \quad n = 30.$$

Based on these quantities we obtain:

$$\sum x_i^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n} = 1247.66 - \frac{132.004^2}{30} = 666.86,$$

$$\sum x_i y_i = \sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} = 220.956 - \frac{132.004 \times 23.718}{30} = 116.60.$$

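The slope coefficient is therefore

$$b = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{116.60}{666.86} = 0.1748.$$

We also find that

$$\bar{X} = \frac{132.004}{30} = 4.40, \qquad \bar{Y} = \frac{23.718}{30} = 0.7906,$$

and so the intercept is

$$a = \bar{Y} - b\bar{X} = 0.7906 - (0.1748 \times 4.40) = 0.0212,$$

where the final figure is obtained using the unrounded values of b and $\bar{X}$. The sample regression line is therefore

$$\hat{Y} = 0.0212 + 0.1748X.$$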

Note that the residuals (the vertical distances between the data points and the line) are larger for the countries with larger GDP (X).


Note, too, that the sample regression line passes through the point $(\bar{X}, \bar{Y})$.

The reason can be seen by re-arranging the equation for a, which gives

$$\bar{Y} = a + b\bar{X}.$$

This is known as the point of sample means. In our example this point is (4.40,0.791).
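The worked example can be reproduced from the sample sums quoted from Table 9.1. The sketch below is an illustration only and the variable names are assumptions. Because the printed sums are rounded, the intermediate sum of squared deviations may differ from the figure in the text in the second decimal place, while b and a should agree with the reported values to about four decimal places. The last line checks that the fitted line passes through the point of sample means.

```python
# Sample sums as quoted in the text (Table 9.1 of Thomas), with n = 30.
n = 30
sum_x, sum_y = 132.004, 23.718
sum_xx, sum_xy = 1247.66, 220.956

# Sums of squares and cross-products in deviation form.
s_xx = sum_xx - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# OLS slope and intercept.
b = s_xy / s_xx
x_bar = sum_x / n
y_bar = sum_y / n
a = y_bar - b * x_bar

# The fitted value at X-bar should equal Y-bar (point of sample means).
fitted_at_mean = a + b * x_bar

print(round(s_xx, 2), round(s_xy, 2), round(b, 4), round(a, 4))
print(round(x_bar, 2), round(fitted_at_mean, 4))
```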

Note that the intercept a > 0, although its value is small. We had expected a relationship of the form Y = bX, suggesting a = 0.

Although a is small we will want to know whether it is significantly different from zero; we shall consider testing the hypothesis that α = 0 at a later point.


The value b = 0.175 means that the demand for money per head will increase by $175 whenever GDP per head increases by $1000.

But a more interesting quantity is the income (GDP) elasticity of the demand for money.

We can use the previous results to compute an estimate of it – the required elasticity is given by the formula

$$\eta = \frac{dY}{dX} \cdot \frac{X}{Y}.$$

However the elasticity varies along our sample regression line because the values of X and Y vary along the line. It is, however, common to evaluate η at the sample means of X and Y, while dY/dX can be estimated by b.


In our case the elasticity evaluated at the sample means is

$$\eta = 0.175 \times \frac{4.40}{0.791} = 0.973.$$

Thus we obtain a GDP elasticity close to unity: a 1% rise in GDP per head is associated with an estimated 0.97% rise in the demand for money per head.

It would be of interest to test the hypothesis that η = 1 and we will examine how to do this later on in the term.
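The elasticity calculation can be sketched in a few lines of Python. The function names are illustrative assumptions, and the two X values used to show how the elasticity varies along the fitted line (1.0 and 10.0) are arbitrary choices rather than observations from the data set.

```python
def elasticity(slope, x, y):
    """Point elasticity (dY/dX) * (X / Y), with dY/dX estimated by the slope."""
    return slope * x / y


def elasticity_on_line(a, b, x):
    """Elasticity at a point X on the fitted line, where Y-hat = a + b * X."""
    return elasticity(b, x, a + b * x)


# Elasticity at the sample means, using the rounded values from the text.
eta_at_means = elasticity(0.175, 4.40, 0.791)

# Elasticity at two arbitrary points on the fitted line Y-hat = 0.021 + 0.175 X.
eta_low = elasticity_on_line(0.021, 0.175, 1.0)
eta_high = elasticity_on_line(0.021, 0.175, 10.0)

print(round(eta_at_means, 3), round(eta_low, 3), round(eta_high, 3))
```

With a positive intercept the elasticity on the fitted line lies below one and rises towards one as X increases, which is why a single evaluation point (here the sample means) has to be chosen.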


So far we have fitted the sample regression line

$$\hat{Y} = 0.021 + 0.175X$$

to our scatter of points in the money-income example. The values of the intercept (0.021) and slope (0.175) were obtained by the method of ordinary least squares (OLS), which chooses these values so as to minimise the sum of squared residuals:

$$\sum_i e_i^2 = \sum_i (Y_i - a - bX_i)^2.$$

But we might want to ask the question: how well does our sample regression line fit the data?


We can observe that the sample regression line passes ‘fairly close’ to each point in the scatter, although with greater dispersion for larger values of X (GDP per head). We need, however, to be more precise about this; in other words we need some sort of numerical measure of goodness-of-fit.

We will use the coefficient of determination, $R^2$, which is equal to the square of the correlation coefficient R, where

$$R = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}.$$

We know that $-1 \le R \le 1$, and so it follows that $0 \le R^2 \le 1$.

In our example of the demand for money we found that R = 0.8787.

The coefficient of determination is therefore $R^2 = (0.8787)^2 = 0.772$.

In regression analysis it is possible to give a precise interpretation to the value 0.772 obtained for $R^2$.

Suppose we ask the question:

What proportion of the variation in the demand for money in our 30 countries can be attributed to the variation in GDP?

If our sample regression line is able to explain a high proportion of the variation in the demand for money then it must provide a good fit to the data.

Consider the next Figure, which refers to a single sample point, namely France, which is observation i = 8.

We have $Y_8 = 2.3912$, $\hat{Y}_8 = 1.6776$, and the overall sample mean is $\bar{Y} = 0.7906$.

The diagram shows the fitted line, the sample mean line, the residual $e_8 = Y_8 - \hat{Y}_8$, as well as the deviations $Y_8 - \bar{Y}$ and $\hat{Y}_8 - \bar{Y}$.

The variations in demand for money are measured relative to the mean.

The following relationship holds:

total variation = variation due to X + residual variation

$$Y_8 - \bar{Y} = (\hat{Y}_8 - \bar{Y}) + e_8$$

$$1.6006 = 0.8870 + 0.7136$$

Such a relationship holds for all points in the sample, so that we can write

$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + e_i, \qquad i = 1, \ldots, n.$$

Note that these variations can be positive or negative, and that they only apply to a single point in the sample.

However, we require an overall measure for the entire sample, and when we talk about variation we usually have in mind a positive measure.

A measure of variation of Y taken over the entire sample is the total sum of squares (SST):

$$\text{SST} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

This is the total variation in Y that we attempt to explain by our regression line, and is always non-negative.

We have seen this sort of quantity before – dividing by n− 1 gives the sample variance.


A sample-wide measure of the variation in Y due to X is given by the explained sum of squares (SSE):

$$\text{SSE} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2.$$

This quantity is also non-negative.

Finally, a measure of the total residual variation is the residual sum of squares (SSR):

$$\text{SSR} = \sum_{i=1}^{n} e_i^2,$$

which is the quantity that OLS minimises.

The following relationship holds:

total sum of squares = explained sum of squares + residual sum of squares

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2$$

SST = SSE + SSR

The extension question on Problem Set 12 deals with this identity.
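The identity can be checked numerically on any data set fitted by OLS. The sketch below does so for five made-up data points (an assumption for illustration, not data from Thomas); the function name is likewise hypothetical. It is a numerical check, not a proof.

```python
def sums_of_squares(xs, ys):
    """Fit the OLS line and return (SST, SSE, SSR) in the lecture's notation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                # total
    sse = sum((f - y_bar) ** 2 for f in fitted)            # explained
    ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # residual
    return sst, sse, ssr


# Hypothetical illustrative data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

sst, sse, ssr = sums_of_squares(xs, ys)
r_squared = sse / sst

print(sst, sse + ssr, r_squared)
```

The identity relies on the fitted line being the OLS line; for an arbitrary line the cross-product term between the explained deviations and the residuals would not in general vanish.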

These quantities are used to define the coefficient of determination, $R^2$, as follows:

$$R^2 = \frac{\text{variation in Y due to X}}{\text{total variation in Y}} = \frac{\text{SSE}}{\text{SST}}.$$

Alternative (but equivalent) expressions for $R^2$ include

$$R^2 = 1 - \frac{\text{SSR}}{\text{SST}},$$

obtained by making the substitution SSE = SST − SSR. Another expression is

$$R^2 = \frac{b^2 \sum x_i^2}{\sum y_i^2},$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.

The derivation of this last expression requires showing that $\text{SSE} = b^2 \sum x_i^2$ and noting that $\text{SST} = \sum y_i^2$ (see the

In our demand for money example we have already shown that $R^2 = 0.772$ by squaring the correlation coefficient R = 0.8787.

However, we know that

$$b = 0.17485, \qquad \sum x_i^2 = 666.86, \qquad \sum y_i^2 = 26.403,$$

and hence an alternative derivation is

$$R^2 = \frac{b^2 \sum x_i^2}{\sum y_i^2} = \frac{0.17485^2 \times 666.86}{26.403} = 0.772.$$

This implies that just over 77% of the variation in the demand for money can be attributed to the variation in GDP.
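Both routes to the coefficient of determination can be reproduced from the sample sums quoted earlier from Table 9.1. The following is a sketch under the assumption that those printed (rounded) sums are used as inputs, so intermediate values may differ slightly from the figures in the text; the variable names are illustrative.

```python
import math

# Sample sums as quoted in the text (Table 9.1 of Thomas), with n = 30.
n = 30
sum_x, sum_y = 132.004, 23.718
sum_xx, sum_yy, sum_xy = 1247.66, 45.154, 220.956

# Sums of squares and cross-products in deviation form.
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Route 1: square the correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)
r2_from_r = r ** 2

# Route 2: b^2 * sum(x_i^2) / sum(y_i^2), using the OLS slope.
b = s_xy / s_xx
r2_from_slope = b ** 2 * s_xx / s_yy

print(round(r, 4), round(r2_from_r, 3), round(r2_from_slope, 3))
```

The two routes are algebraically identical, since $b^2 \sum x_i^2 / \sum y_i^2 = (\sum x_i y_i)^2 / (\sum x_i^2 \sum y_i^2)$, so any difference between them is floating-point rounding only.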


In (a) and (c) $R^2 = 1$ because all points lie on a single straight line; in (a) R = +1 and in (c) R = −1.

In (b) R = 0 due to the lack of association between the two variables and hence $R^2 = 0$.

The correlation coefficient, R, is a measure of the strength of association between two variables, and says nothing about the direction of causation (if any exists).

The coefficient of determination, $R^2$, however, is based on the regression model $Y = \alpha + \beta X + \varepsilon$ in which the causation is assumed to go from X to Y.

However, we should be careful to refer to $R^2$ as the percentage of the variation in Y attributed to X rather than explained by X, because any such relationship could be spurious.

In practice we use computer software for OLS calculations. As an example, the Stata output for the money demand example is of the form:

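. regress m g

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  1,    28) =   94.88
       Model |  20.3862321     1  20.3862321           Prob > F      =  0.0000
    Residual |  6.01600434    28  .214857298           R-squared     =  0.7721
-------------+------------------------------           Adj R-squared =  0.7640
       Total |  26.4022364    29  .910421946           Root MSE      =  .46353

------------------------------------------------------------------------------
           m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           g |   .1748489   .0179502     9.74   0.000     .1380795    .2116182
       _cons |   .0212579   .1157594     0.18   0.856    -.2158645    .2583803
------------------------------------------------------------------------------

Quite a lot of information is provided by default, but note that the estimates b and a are given at the start of the final two rows (the rows labelled g and _cons respectively, under the heading ‘Coef.’).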
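The ‘SS’ column of the output lines up with the sums of squares defined earlier: Model corresponds to SSE, Residual to SSR and Total to SST. As a rough check, the Python below recomputes the reported R-squared and Root MSE from those three figures. The variable names are illustrative, and reading Root MSE as the square root of the residual mean square (SSR divided by its 28 degrees of freedom) follows the ‘MS’ column of the table.

```python
import math

# Figures copied from the 'SS' and 'df' columns of the Stata output above.
ss_model = 20.3862321      # explained sum of squares (SSE in the lecture)
ss_residual = 6.01600434   # residual sum of squares (SSR in the lecture)
ss_total = 26.4022364      # total sum of squares (SST in the lecture)
df_residual = 28           # n - 2 with n = 30 observations

# Two equivalent expressions for the coefficient of determination.
r2 = ss_model / ss_total
r2_alt = 1 - ss_residual / ss_total

# Residual mean square and its square root.
ms_residual = ss_residual / df_residual
root_mse = math.sqrt(ms_residual)

print(round(r2, 4), round(r2_alt, 4), round(ms_residual, 6), round(root_mse, 5))
```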


Summary

Ordinary Least Squares (OLS) Estimation

Goodness-of-fit

Next week: