1997 [Jack Johnston, John Dinardo] Econometric Methods.pdf


ECONOMETRIC METHODS

1.3 To derive cov(a, b)

1.4 Gauss-Markov theorem

1.5 To derive var(e0)

Problems

2 Further Aspects of Two-variable Relationships

2.1 Time as a Regressor

2.1.1 Constant Growth Curves

2.1.2 Numerical Example

2.2 Transformations of Variables

2.2.1 Log-Log Transformations

2.2.2 Semilog Transformations

2.2.3 Reciprocal Transformations

2.3 An Empirical Example of a Nonlinear Relation: U.S. Inflation and Unemployment

2.4 Lagged Dependent Variable as Regressor

2.4.1 An Introduction to Asymptotics

2.4.2 Convergence in Probability

2.4.3 Convergence in Distribution

2.4.4 The Autoregressive Equation

2.5 Stationary and Nonstationary Series

2.5.1 Unit Root

2.5.2 Numerical Illustration

2.6 Maximum Likelihood Estimation of the Autoregressive Equation

2.6.1 Maximum Likelihood Estimators

2.6.2 Properties of Maximum Likelihood Estimators

Appendix

2.1 Change of variables in density functions

2.2 Maximum likelihood estimators for the AR(1) model

Problems

3 The k-variable Linear Equation

3.1 Matrix Formulation of the k-variable Model

3.1.1 The Algebra of Least Squares

3.1.2 Decomposition of the Sum of Squares

3.1.3 Equation in Deviation Form

3.2 Partial Correlation Coefficients

3.2.1 Sequential Buildup of the Explained Sum of Squares

3.2.2 Partial Correlation Coefficients and Multiple Regression Coefficients

3.2.3 General Treatment of Partial Correlation and Multiple Regression Coefficients

3.3 The Geometry of Least Squares

3.4 Inference in the k-variable Equation

3.4.1 Assumptions

3.4.2 Mean and Variance of b

3.4.3 Estimation of $\sigma^2$

3.4.4 Gauss-Markov Theorem

3.4.5 Testing Linear Hypotheses about $\beta$

3.4.6 Restricted and Unrestricted Regressions

3.4.7 Fitting the Restricted Regression

3.5 Prediction

Appendix

3.1 To prove $r_{12.3} = (r_{12} - r_{13}r_{23}) / \sqrt{1 - r_{13}^2}\sqrt{1 - r_{23}^2}$

3.2 Solving for a single regression coefficient in a multiple regression

3.3 To show that minimizing $a'a$ subject to $X'a = c$ gives $a = X(X'X)^{-1}c$

3.4 Derivation of the restricted estimator $b_*$

Problems

4 Some Tests of the k-variable Linear Equation for Specification Error

4.1 Specification Error

4.1.1 Possible Problems with u

4.1.2 Possible Problems with X

4.1.3 Possible Problems with $\beta$

4.2 Model Evaluation and Diagnostic Tests

4.3 Tests of Parameter Constancy

4.3.1 The Chow Forecast Test

4.3.2 The Hansen Test

4.3.3 Tests Based on Recursive Estimation

4.3.4 One-step Ahead Prediction Errors

4.3.5 CUSUM and CUSUMSQ Tests

4.3.6 A More General Test of Specification Error: The Ramsey RESET Test

4.4 A Numerical Illustration

4.5 Tests of Structural Change

4.5.1 Test of One Structural Change

4.5.2 Tests of Slope Coefficients

4.5.3 Tests of Intercepts

4.5.4 Summary

4.5.5 A Numerical Example

4.5.6 Extensions

4.6 Dummy Variables

4.6.1 Introduction

4.6.2 Seasonal Dummies

4.6.3 Qualitative Variables

4.6.4 Two or More Sets of Dummy Variables

4.6.5 A Numerical Example

Appendix

4.1 To show $\operatorname{var}(d) = \sigma^2[I_{n_2} + X_2(X_1'X_1)^{-1}X_2']$

Problems

5 Maximum Likelihood (ML), Generalized Least Squares (GLS), and Instrumental Variable (IV) Estimators

5.1 Maximum Likelihood Estimators

5.1.1 Properties of Maximum Likelihood Estimators

5.2 ML Estimation of the Linear Model

5.3 Likelihood Ratio, Wald, and Lagrange Multiplier Tests

5.3.1 Likelihood Ratio (LR) Tests

5.3.2 The Wald (W) Test

5.3.3 Lagrange Multiplier (LM) Test

5.4 ML Estimation of the Linear Model with Nonspherical Disturbances

5.4.1 Generalized Least Squares

5.5 Instrumental Variable (IV) Estimators

5.5.1 Special Case

5.5.2 Two-stage Least Squares (2SLS)

5.5.3 Choice of Instruments

5.5.4 Tests of Linear Restrictions

Appendix

5.1 Change of variables in density functions

5.2 Centered and uncentered $R^2$

5.3 To show that $e_*'X(X'X)^{-1}X'e_* = e_*'e_* - e'e$

Problems

6 Heteroscedasticity and Autocorrelation

6.1 Properties of OLS Estimators

6.2 Tests for Heteroscedasticity

6.2.1 The White Test

6.2.2 The Breusch-Pagan/Godfrey Test

6.2.3 The Goldfeld-Quandt Test

6.2.4 Extensions of the Goldfeld-Quandt Test

6.3 Estimation Under Heteroscedasticity

6.3.1 Estimation with Grouped Data

6.3.2 Estimation of the Heteroscedasticity Relation

6.4 Autocorrelated Disturbances

6.4.1 Forms of Autocorrelation: Autoregressive and Moving Average Schemes

6.4.2 Reasons for Autocorrelated Disturbances

6.5 OLS and Autocorrelated Disturbances

6.6 Testing for Autocorrelated Disturbances

6.6.1 Durbin-Watson Test

6.6.2 The Wallis Test for Fourth-order Autocorrelation

6.6.3 Durbin Tests for a Regression Containing Lagged Values of the Dependent Variable

6.6.4 Breusch-Godfrey Test

6.6.5 Box-Pierce-Ljung Statistic

6.7 Estimation of Relationships with Autocorrelated Disturbances

6.8 Forecasting with Autocorrelated Disturbances

6.9 Autoregressive Conditional Heteroscedasticity (ARCH)

Appendix

6.1 LM test for multiplicative heteroscedasticity

6.2 LR test for groupwise homoscedasticity

6.3 Properties of the ARCH(1) process

Problems

7 Univariate Time Series Modeling

7.1 A Rationale for Univariate Analysis

7.1.1 The Lag Operator

7.1.2 ARMA Modeling

7.2 Properties of AR, MA, and ARMA Processes

7.2.1 AR(1) Process

7.2.2 AR(2) Process

7.2.3 MA Processes

7.2.4 ARMA Processes

7.3 Testing for Stationarity

7.3.1 Graphical Inspection

7.3.2 Integrated Series

7.3.3 Trend Stationary (TS) and Difference Stationary (DS) Series

7.3.4 Unit Root Tests

7.3.5 Numerical Example

7.4 Identification, Estimation, and Testing of ARIMA Models

7.4.1 Identification

7.4.2 Estimation

7.4.3 Diagnostic Testing

7.5 Forecasting

7.5.1 MA(1) Process

7.5.2 ARMA(1,1) Process

7.5.3 ARIMA(1,1,0) Process

7.6 Seasonality

7.7 A Numerical Example: Monthly Housing Starts

Problems

8 Autoregressive Distributed Lag Relationships

8.1 Autoregressive Distributed Lag Relations

8.1.1 A Constant Elasticity Relation

8.1.2 Reparameterization

8.1.3 Dynamic Equilibrium

8.1.4 Unit Elasticity

8.1.5 Generalizations

8.2 Specification and Testing

8.2.1 General to Simple and Vice Versa

8.2.2 Estimation and Testing

8.2.3 Exogeneity

8.2.4 Exogeneity Tests

8.2.5 The Wu-Hausman Test

8.3 Nonstationary Regressors

8.4 A Numerical Example

8.4.1 Stationarity

8.4.2 Cointegration

8.4.3 A Respecified Relationship

8.4.4 A General ADL Relation

8.4.5 A Reparameterization

8.5 Nonnested Models

Appendix

8.1 Nonsingular linear transformations of the variables in an equation

8.2 To establish the equality of the test statistics in Eqs. (8.37) and (8.41)

9 Multiple Equation Models

9.1 Vector Autoregressions (VARs)

9.1.1 A Simple VAR

9.1.2 A Three-variable VAR

9.1.3 Higher-order Systems

9.2 Estimation of VARs

9.2.1 Testing the Order of the VAR

9.2.2 Testing for Granger Causality

9.2.3 Forecasting, Impulse Response Functions, and Variance Decomposition

9.2.4 Impulse Response Functions

9.2.5 Orthogonal Innovations

9.2.6 Variance Decomposition

9.3 Vector Error Correction Models

9.3.1 Testing for Cointegration Rank

9.3.2 Estimation of Cointegrating Vectors

9.3.3 Estimation of a Vector Error Correction Model

9.4 Simultaneous Structural Equation Models

9.5 Identification Conditions

9.6 Estimation of Structural Equations

9.6.1 Nonstationary Variables

9.6.2 System Methods of Estimation

Appendix

9.1 Seemingly Unrelated Regressions (SUR)

9.2 Higher-order VARs

9.2.1 A VAR(1) Process

9.2.2 A VAR(2) Process

Problems

10 Generalized Method of Moments

10.1 The Method of Moments

10.2 OLS as a Moment Problem

10.3 Instrumental Variables as a Moment Problem

10.4 GMM and the Orthogonality Condition

10.5 Distribution of the GMM estimator

10.6 Applications

10.6.1 Two-stage Least Squares, and Tests of Overidentifying Restrictions

10.6.2 Wu-Hausman Tests Revisited

10.6.3 Maximum Likelihood

10.6.4 Euler Equations

10.7 Readings

Problems

11 A Smorgasbord of Computationally Intensive Methods

11.1 An Introduction to Monte Carlo Methods

11.1.1 Some Guidelines for Monte Carlo Experiments

11.1.2 An Example

11.1.3 Generating Pseudorandom Numbers

11.1.4 Presenting the Results

11.2 Monte Carlo Methods and Permutation Tests

11.3 The Bootstrap

11.3.1 The Standard Error of the Median

11.3.2 An Example

11.3.3 The Parametric Bootstrap

11.3.4 Residual Resampling: Time Series and Forecasting

11.3.5 Data Resampling: Cross-section Data

11.3.6 Some Remarks on Econometric Applications of the Bootstrap

11.4 Nonparametric Density Estimation

11.4.1 Some General Remarks on Nonparametric Density Estimation

11.4.2 An Application: The Wage Effects of Unions

11.5 Nonparametric Regression

11.5.1 Extension: The Partially Linear Regression Model

11.6 References

Problems

12 Panel Data

12.1 Sources and Types of Panel Data

12.2 The Simplest Case: The Pooled Estimator

12.3 Two Extensions to the Simple Model

12.4 The Random Effects Model

12.6 The Fixed Effects Model in the Two-period Case

12.7 The Fixed Effects Model with More Than Two Time Periods

12.8 The Perils of Fixed Effects Estimation

12.8.1 Example 1: Measurement Error in X

12.8.2 Example 2: Endogenous X

12.9 Fixed Effects or Random Effects?

12.10 A Wu-Hausman Test

12.11 Other Specification Tests and an Introduction to Chamberlain's Approach

12.11.1 Formalizing the Restrictions

12.11.2 Fixed Effects in the General Model

12.11.3 Testing the Restrictions

13 Discrete and Limited Dependent Variable Models

13.1 Types of Discrete Choice Models

13.2 The Linear Probability Model

13.3 Example: A Simple Descriptive Model of Union Participation

13.4 Formulating a Probability Model

13.5 The Probit

13.6 The Logit

13.7 Misspecification in Binary Dependent Models

13.7.1 Heteroscedasticity

13.7.2 Misspecification in the Probit and Logit

13.7.3 Functional Form: What Is the Right Model to Use?

13.8 Extensions to the Basic Model: Grouped Data

13.8.1 Maximum Likelihood Methods

13.8.2 Minimum $\chi^2$ Methods

13.9 Ordered Probit

13.10 Tobit Models

13.10.1 The Tobit as an Extension of the Probit

13.10.2 Why Not Ignore "The Problem"?

13.10.3 Heteroscedasticity and the Tobit

13.11 Two Possible Solutions

13.11.1 Symmetrically Trimmed Least Squares

13.11.2 Censored Least Absolute Deviations (CLAD) Estimator

13.12 Treatment Effects and Two-step Methods

13.12.1 The Simple Heckman Correction

13.12.2 Some Cautionary Remarks about Selectivity Bias

13.12.3 The Tobit as a Special Case

Appendix A

A.1 Vectors

A.1.1 Multiplication by a Scalar

A.1.2 Addition and Subtraction

A.1.3 Linear Combinations

A.1.4 Some Geometry

A.1.5 Vector Multiplication

A.1.6 Equality of Vectors

A.2 Matrices

A.2.1 Matrix Multiplication

A.2.2 The Transpose of a Product

A.2.3 Some Important Square Matrices

A.2.4 Partitioned Matrices

A.2.5 Matrix Differentiation

A.2.6 Solution of Equations

A.2.7 The Inverse Matrix

A.2.8 The Rank of a Matrix

A.2.9 Some Properties of Determinants

A.2.10 Properties of Inverse Matrices

A.2.11 More on Rank and the Solution of Equations

A.2.12 Eigenvalues and Eigenvectors

A.2.13 Properties of Eigenvalues and Eigenvectors

A.2.14 Quadratic Forms and Positive Definite Matrices

Appendix B

B.1 Random Variables and Probability Distributions

B.2 The Univariate Normal Probability Distribution

B.3 Bivariate Distributions

B.4 Relations between the Normal, $\chi^2$, t, and F Distributions

B.5 Expectations in Bivariate Distributions

B.6 Multivariate Densities

B.7 Multivariate Normal pdf

B.8 Distributions of Quadratic Forms

B.9 Independence of Quadratic Forms

B.10 Independence of a Quadratic Form and a Linear Function

Appendix C

Appendix D

Index

CHAPTER 1

Relationships between Two Variables

The economics literature contains innumerable discussions of relationships between variables in pairs: quantity and price; consumption and income; demand for money and the interest rate; trade balance and the exchange rate; education and income; unemployment and the inflation rate; and many more. This is not to say that economists believe that the world can be analyzed adequately in terms of a collection of bivariate relations. When they leave the two-dimensional diagrams of the textbooks behind and take on the analysis of real problems, multivariate relationships abound. Nonetheless, some bivariate relationships are significant in themselves; more importantly for our purposes, the mathematical and statistical tools developed for two-variable relationships are fundamental building blocks for the analysis of more complicated situations.

1.1 EXAMPLES OF BIVARIATE RELATIONSHIPS

Figure 1.1 displays two aspects of the relationship between real personal saving (SAV) and real personal disposable income (INC) in the United States. In Fig. 1.1a the value of each series is shown quarterly for the period from 1959.1 to 1992.1. These two series and many of the others in the examples throughout the book come from the DRI Basic Economics Database (formerly Citibase); where relevant, we indicate the correspondence between our labels and the Citibase labels for the variables.1 Figure 1.1a is a typical example of a time series plot, in which time is displayed on the horizontal axis and the values of the series are displayed on the vertical axis. Income shows an upward trend throughout the period, and in the early years saving does likewise. This pattern, however, is not replicated in the middle and later years. One might be tempted to conclude from Fig. 1.1a that saving is much more volatile than income, but that does not necessarily follow, since the series have separate scales.2

1 A definition of all series is given in the data disk, which accompanies this volume. Instructions for

FIGURE 1.1
[Panel (a): time series plot of SAV and INC against year. Panel (b): scatter plot of the two series.]

An alternative display of the same information is in terms of a scatter plot, shown in Fig. 1.1b. Here one series is plotted against the other. The time dimension is no longer shown explicitly, but most software programs allow the option of joining successive points on the scatter so that the evolution of the series over time may still be traced. Both parts of Fig. 1.1 indicate a positive association between the variables: increases in one tend to be associated with increases in the other. It is clear that although the association is approximately linear in the early part of the period, it is not so in the second half.

Figures 1.2 and 1.3 illustrate various associations between the natural log of real personal expenditure on gasoline (GAS), the natural log of the real price of gasoline (PRICE), and the natural log of real disposable personal income (INCOME). The derivations of the series are described in the data disk. The rationale for the logarithmic transformations is discussed in Chapter 2. Figure 1.2 gives various time plots of gasoline expenditure, price, and income. The real price series, with 1987 as the base year, shows the two dramatic price hikes of the early and late 1970s, which were subsequently eroded by reductions in the nominal price of oil and by U.S. inflation, so the real price at the end of the period was less than that obtaining at the start. The income and expenditure series are both shown in per capita form, because U.S. population increased by about 44 percent over the period, from 176 million to 254 million. The population series used to deflate the expenditure and income series is the civilian noninstitutional population aged 16 and over, which has increased even faster than the general population. Per capita real expenditure on gasoline increased steadily in the 1960s and early 1970s, as real income grew and real price declined. This steady rise ended with the price shocks of the 1970s, and per capita gas consumption has never regained the peak levels of the early seventies.

The scatter plots in Fig. 1.3 further illustrate the upheaval in this market. The plot for the whole period in Fig. 1.3a shows very different associations between expenditure and price in the earlier and later periods. The scatter for 1959.1 to 1973.3 in Fig. 1.3b looks like a conventional negative association between price and quantity. This is shattered in the middle period (1973.4 to 1981.4) and reestablished, though with a very different slope, in the last period (1982.1 to 1992.1). This data set will be analyzed econometrically in this and later chapters.

These illustrative scatter diagrams have three main characteristics. One is the sign of the association or covariation; that is, do the variables move together in a positive or negative fashion? Another is the strength of the association. A third characteristic is the linearity (or otherwise) of the association: is the general shape of the scatter linear or curvilinear? In Section 1.2 we discuss the extent to which the correlation coefficient measures the first two characteristics for a linear association, and in later chapters we will show how to deal with the linearity question, but first we give an example of a bivariate frequency distribution.

FIGURE 1.2
Time series plots of natural log of gasoline consumption in 1987 dollars per capita. (a) Gasoline consumption vs. natural log of price in 1987 cents per gallon. (b)

FIGURE 1.3
Scatter plots of price and gasoline consumption. Panels: (a) 1959.1 to 1992.1; (b) 1959.1 to 1973.3; (c) 1973.4 to 1981.4; (d) 1982.1 to 1992.1.

1.1.1 Bivariate Frequency Distributions

The data underlying Figs. 1.1 to 1.3 come in the form of n pairs of observations of the form $(X_i, Y_i)$, $i = 1, 2, \ldots, n$. When the sample size n is very large, the data are usually printed as a bivariate frequency distribution; the ranges of X and Y are split into subintervals and each cell of the table shows the number of observations in the corresponding pair of subintervals. Table 1.1 provides an example.3 It is not possible to give a simple, two-dimensional representation of these data. However, inspection of the cell frequencies suggests a positive association between the two measurements.

TABLE 1.1
Distribution of heights and chest circumferences of 5732 Scottish militiamen

                        Chest circumference (inches)
Height (inches)   33-35   36-38   39-41   42-44   45 and over   Row totals
64-65                39     331     326      26             0          722
66-67                40     591    1010     170             4         1815
68-69                19     312    1144     488            18         1981
70-71                 5     100     479     290            23          897
72-73                 0      17     120     153            27          317
Column totals       103    1351    3079    1127            72         5732

Source: Edinburgh Medical and Surgical Journal (1817, pp. 260-264).

TABLE 1.2
Conditional means for the data in Table 1.1

Mean of height given chest (inches)   66.31   66.84   67.89   69.16   70.53
Mean of chest given height (inches)   38.41   39.19   40.26   40.76   41.80

This is confirmed by calculating the conditional means. First of all, each of the five central columns of the table gives a distribution of heights for a given chest measurement. These are conditional frequency distributions, and traditional statistics such as means and variances may be calculated. Similarly, the rows of the table give distributions of chest measurements, conditional on height. The two sets of conditional means are shown in Table 1.2; each mean series increases monotonically with increases in the conditioning variable, indicating a positive association between the variables.
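The conditional means can be recomputed from the cell frequencies of Table 1.1. The following is a minimal sketch, not the authors' own calculation: it assumes each class is represented by its midpoint, and it assumes a midpoint of 46 inches for the open-ended "45 and over" chest class. The names `FREQ`, `HEIGHT_MID`, `CHEST_MID`, and `weighted_mean` are illustrative.

```python
# Cell frequencies from Table 1.1: rows are height classes, columns are chest classes.
FREQ = [
    [39, 331, 326, 26, 0],
    [40, 591, 1010, 170, 4],
    [19, 312, 1144, 488, 18],
    [5, 100, 479, 290, 23],
    [0, 17, 120, 153, 27],
]
# Assumed class midpoints (inches). The value 46 for the open-ended
# "45 and over" class is an assumption, not something stated in the text.
HEIGHT_MID = [64.5, 66.5, 68.5, 70.5, 72.5]
CHEST_MID = [34.0, 37.0, 40.0, 43.0, 46.0]


def weighted_mean(values, weights):
    """Mean of `values` weighted by the frequencies in `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Mean height within each chest class (one conditional distribution per column).
height_given_chest = [
    weighted_mean(HEIGHT_MID, [row[j] for row in FREQ])
    for j in range(len(CHEST_MID))
]
# Mean chest within each height class (one conditional distribution per row).
chest_given_height = [weighted_mean(CHEST_MID, row) for row in FREQ]

print([round(m, 2) for m in height_given_chest])
print([round(m, 2) for m in chest_given_height])
```

Under these midpoint assumptions the rounded results agree with the entries of Table 1.2, and both lists increase from left to right, which is the monotonic pattern described above.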

1.2 THE CORRELATION COEFFICIENT

The direction and closeness of the linear association between two variables are measured by the correlation coefficient.4 Let the observations be denoted by $(X_i, Y_i)$ with $i = 1, 2, \ldots, n$. Once the sample means have been calculated, the data may be expressed in deviation form as

$$x_i = X_i - \bar{X} \qquad y_i = Y_i - \bar{Y}$$

where $\bar{X}$ and $\bar{Y}$ denote the sample means of X and Y. Figure 1.4 shows an illustrative point on a scatter diagram with the sample means as new axes, giving four quadrants, which are numbered counterclockwise. The product $x_i y_i$ is positive for all points in quadrants I and III and negative for all points in quadrants II and IV. Since a positive relationship will have points lying for the most part in quadrants I and III, and a negative relationship will have points lying mostly in the other two quadrants, the sign of $\sum_{i=1}^{n} x_i y_i$ will indicate whether the scatter slopes upward or downward. This sum, however, will tend to increase in absolute terms as more data are added to the sample. Thus, it is better to express the sum in average terms, giving the sample covariance,

$$\operatorname{cov}(X, Y) = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})/n = \sum_{i=1}^{n} x_i y_i / n \tag{1.1}$$

3 Condensed from Stephen M. Stigler, The History of Statistics, Harvard University Press, 1986, p. 208.

The value of the covariance depends on the units in which the variables are measured. Changing one variable from dollars to cents will give a new covariance 100 times the old. To obtain a measure of association that is invariant with respect to units of measurement, the deviations are expressed in standard deviation units. The covariance of the standardized deviations is the correlation coefficient, r, namely,

$$r = \sum_{i=1}^{n} \left(\frac{x_i}{s_x}\right)\left(\frac{y_i}{s_y}\right) \Big/ n = \frac{\sum_{i=1}^{n} x_i y_i}{n s_x s_y}$$

where

$$s_x = \sqrt{\sum_{i=1}^{n} x_i^2 / n} \qquad s_y = \sqrt{\sum_{i=1}^{n} y_i^2 / n}$$

FIGURE 1.4
[Scatter diagram with axes through the sample means. Quadrant I: (xy) positive; Quadrant II: (xy) negative; Quadrant III: (xy) positive; Quadrant IV: (xy) negative.]
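The dependence of the covariance on units, and the invariance of r, can be checked numerically. The sketch below uses invented data (the `price_dollars` and `quantity` lists are hypothetical, not taken from the book) and the divisor n used in the text.

```python
import math


def covariance(xs, ys):
    """Sample covariance with divisor n, as in the text."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance of the deviations expressed in standard deviation units."""
    s_x = math.sqrt(covariance(xs, xs))
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)


# Hypothetical data, invented purely for illustration.
price_dollars = [1.0, 2.0, 3.0, 4.0, 5.0]
quantity = [9.0, 7.0, 8.0, 4.0, 2.0]
price_cents = [100.0 * p for p in price_dollars]

cov_dollars = covariance(price_dollars, quantity)
cov_cents = covariance(price_cents, quantity)
r_dollars = correlation(price_dollars, quantity)
r_cents = correlation(price_cents, quantity)

print(cov_dollars, cov_cents)  # the second is 100 times the first
print(r_dollars, r_cents)      # equal up to floating-point rounding
```

Rescaling one variable multiplies both the covariance and that variable's standard deviation by the same factor, so the factor cancels in r.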

Omitting subscripts and the limits of summation (since there is no ambiguity) and performing some algebraic manipulations give three equivalent expressions for the correlation coefficient, two in terms of deviations and one in terms of the raw data:

$$r = \frac{\sum xy}{n s_x s_y} = \frac{\sum xy}{\sqrt{\sum x^2}\sqrt{\sum y^2}} = \frac{n \sum XY - (\sum X)(\sum Y)}{\sqrt{n \sum X^2 - (\sum X)^2}\sqrt{n \sum Y^2 - (\sum Y)^2}}$$
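As a quick check that the three expressions agree, the sketch below evaluates each of them on a small invented data set. The function names and the data are illustrative assumptions; only the formulas come from the text.

```python
import math


def deviations(values):
    """Express a list of observations as deviations from its sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def r_standardized(X, Y):
    """First form: sum(xy) / (n * s_x * s_y)."""
    n = len(X)
    x, y = deviations(X), deviations(Y)
    s_x = math.sqrt(sum(v * v for v in x) / n)
    s_y = math.sqrt(sum(v * v for v in y) / n)
    return sum(a * b for a, b in zip(x, y)) / (n * s_x * s_y)


def r_deviations(X, Y):
    """Second form: sum(xy) / (sqrt(sum x^2) * sqrt(sum y^2))."""
    x, y = deviations(X), deviations(Y)
    numerator = sum(a * b for a, b in zip(x, y))
    return numerator / (math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y)))


def r_raw(X, Y):
    """Third form: raw-data expression, no deviations needed."""
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_xx = sum(a * a for a in X)
    sum_yy = sum(b * b for b in Y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


# Hypothetical observations, invented for illustration.
X = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
Y = [3.0, 5.0, 4.0, 8.0, 9.0, 11.0]

print(r_standardized(X, Y), r_deviations(X, Y), r_raw(X, Y))
```

The raw-data form avoids a separate pass to compute means, but it subtracts large, nearly equal quantities, so with poorly scaled data the deviation forms are usually the safer choice numerically.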

1.2.1 The Correlation Coefficient for a Bivariate Frequency Distribution

In general, a bivariate distribution such as that shown in Table 1.1 may be represented by the paired values $X_i, Y_j$ with frequency $n_{ij}$ for $i = 1, \ldots, m$ and $j = 1, \ldots, p$. $X_i$ is the midpoint of the ith subinterval on the X axis, and $Y_j$ the midpoint of the jth subinterval on the Y axis. If we use a period for a subscript over which summation has taken place, the marginal frequencies for X are given by $n_{i.} = \sum_{j=1}^{p} n_{ij}$ for $i = 1, \ldots, m$. In conjunction with the $X_i$ values these marginal frequencies will yield the standard deviation of X, that is, $s_x$. The marginal frequencies for Y are $n_{.j} = \sum_{i=1}^{m} n_{ij}$ for $j = 1, \ldots, p$. Thus, the standard deviation of Y, or $s_y$, may be obtained. Finally the covariance is obtained from

$$\operatorname{cov}(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{p} n_{ij}(X_i - \bar{X})(Y_j - \bar{Y})/n \tag{1.4}$$

where n is the total number of observations. Putting the three elements together, one may express the correlation coefficient for the bivariate frequency distribution in terms of the raw data as

$$r = \frac{n \sum_{i=1}^{m} \sum_{j=1}^{p} n_{ij} X_i Y_j - \left(\sum_{i=1}^{m} n_{i.} X_i\right)\left(\sum_{j=1}^{p} n_{.j} Y_j\right)}{\sqrt{n \sum_{i=1}^{m} n_{i.} X_i^2 - \left(\sum_{i=1}^{m} n_{i.} X_i\right)^2}\sqrt{n \sum_{j=1}^{p} n_{.j} Y_j^2 - \left(\sum_{j=1}^{p} n_{.j} Y_j\right)^2}}$$
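The raw-data formula for a frequency table can be applied directly to Table 1.1. This is a sketch under stated assumptions: X is taken to be height and Y chest circumference, classes are represented by midpoints, and the midpoint 46 for the open-ended "45 and over" class is an assumption. The function name `grouped_correlation` is illustrative.

```python
import math

# Cell frequencies n_ij from Table 1.1 (rows: height classes, columns: chest classes).
FREQ = [
    [39, 331, 326, 26, 0],
    [40, 591, 1010, 170, 4],
    [19, 312, 1144, 488, 18],
    [5, 100, 479, 290, 23],
    [0, 17, 120, 153, 27],
]
X_MID = [64.5, 66.5, 68.5, 70.5, 72.5]  # assumed height midpoints X_i
Y_MID = [34.0, 37.0, 40.0, 43.0, 46.0]  # assumed chest midpoints Y_j (46 is a guess for the open class)


def grouped_correlation(freq, x_mid, y_mid):
    """Correlation coefficient for a bivariate frequency distribution (raw-data form)."""
    n = sum(sum(row) for row in freq)
    n_i = [sum(row) for row in freq]         # marginal frequencies n_i.
    n_j = [sum(col) for col in zip(*freq)]   # marginal frequencies n_.j
    sum_xy = sum(
        freq[i][j] * x_mid[i] * y_mid[j]
        for i in range(len(x_mid))
        for j in range(len(y_mid))
    )
    sum_x = sum(w * x for w, x in zip(n_i, x_mid))
    sum_y = sum(w * y for w, y in zip(n_j, y_mid))
    sum_xx = sum(w * x * x for w, x in zip(n_i, x_mid))
    sum_yy = sum(w * y * y for w, y in zip(n_j, y_mid))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


r = grouped_correlation(FREQ, X_MID, Y_MID)
print(round(r, 3))
```

Because every observation in a cell is replaced by the cell midpoints, the result is an approximation to the correlation of the underlying individual measurements, and it depends on the value chosen for the open-ended class.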

1.2.2 The Limits of r

The correlation coefficient must lie in the range from -1 to +1. To see this, let c be any arbitrary constant. Then $\sum (y - cx)^2 \ge 0$. Now let $c = \sum xy / \sum x^2$. Substitution in the inequality gives $(\sum xy)^2 \le (\sum x^2)(\sum y^2)$, that is, $r^2 \le 1$. This expression is one form of the Cauchy-Schwarz inequality. The equality will only hold if each and every y deviation is a constant multiple of the corresponding x deviation. In such a case the observations all lie on a single straight line, with a positive slope (r = 1) or a negative slope (r = -1). Figure 1.5 shows two cases in which r is approximately zero. In one case the observations are scattered over all four quadrants; in the other they lie exactly on a quadratic curve, where positive and negative products offset one another. Thus, the correlation coefficient measures the degree of linear association. A low value for r does not rule out the possibility of a strong nonlinear association, and such an association might give positive or negative values for r if the sample observations happen to be located in particular segments of the nonlinear relation.
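The limiting cases and the quadratic example can be illustrated with a few lines of code. The data below are constructed for the purpose (exact lines and an exact parabola) and are not taken from the book; `correlation` implements the deviation form of r.

```python
import math


def correlation(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y in deviation form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


symmetric = [-3, -2, -1, 0, 1, 2, 3]

r_line_up = correlation(symmetric, [2 * x + 1 for x in symmetric])    # exact line, positive slope
r_line_down = correlation(symmetric, [5 - 3 * x for x in symmetric])  # exact line, negative slope
r_parabola = correlation(symmetric, [x * x for x in symmetric])       # exact quadratic, symmetric range

# Observing only one branch of the same parabola gives a strong linear correlation.
right_branch = [0, 1, 2, 3, 4, 5, 6]
left_branch = [-6, -5, -4, -3, -2, -1, 0]
r_right = correlation(right_branch, [x * x for x in right_branch])
r_left = correlation(left_branch, [x * x for x in left_branch])

print(r_line_up, r_line_down, r_parabola)
print(round(r_right, 3), round(r_left, 3))
```

On the symmetric range the positive and negative products cancel exactly, so r is zero even though y is an exact function of x; on either branch alone the same relation yields a large positive or negative r.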

1.2.3 Nonsense Correlations and Other Matters

Correlation coefficients must be interpreted with care. Many coefficients that are both numerically large and also adjudged statistically significant by tests to be described later may contain no real information. That statistical significance has been achieved does not necessarily imply that a meaningful and useful relationship has been found. The crucial question is, What has caused the observed covariation? If there is a theory about the joint variation of X and Y, the sign and size of the correlation coefficient may lend support to that theory, but if no such theory exists or can be devised, the correlation may be classed as a nonsense correlation.

FIGURE 1.5
Paired variables for which $r^2 \approx 0$.

Our favorite spurious, or nonsense, correlation was given in a beautiful 1926 paper by the statistician G. Udny Yule.5 Yule took annual data from 1866 to 1911 for the death rate in England and Wales and for the proportion of all marriages solemnized in the Church of England and found the correlation coefficient to be +0.95. However, no British politician proposed closing down the Church of England to confer immortality on the electorate. More recently, using annual data from 1897 to 1958, Plosser and Schwert have found a correlation coefficient of +0.91 between the log of nominal income in the United States and the log of accumulated sunspots.6 Hendry has noted a very strong, though somewhat nonlinear, positive relationship between the inflation rate and the accumulation of annual rainfall in the United Kingdom.7 It would be nice if the British could reduce their inflation rate and, as a bonus, enjoy the inestimable side effect of improved weather, but such happy conjunctions are not to be.

In these three examples all of the variables are subject to trend-like movements over time.8 Presumably some complex set of medical, economic, and social factors contributed to the reduction in the death rate in England and Wales, even as a different set of factors produced a decline in the proportion of marriages in the Church of England. Cumulative sunspots and cumulative rainfall necessarily trend upward, as do the U.S. nominal income and the British inflation rate. Series responding to essentially unrelated generating mechanisms may thus display contemporaneous upward and/or downward movements and thus yield strong correlation coefficients. Trends may be fitted to such series, as will be shown in the next chapter, and the residuals from such trends calculated. Correlations between pairs of residuals for such series will be negligible.
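The effect of detrending can be sketched with simulated data. The two series below are hypothetical: each is a linear time trend plus independent noise, so any correlation between them comes from the shared drift in time alone. The linear trend and the least-squares fit in `detrend` are assumptions made for this illustration; the book's own treatment of trend fitting comes in the next chapter.

```python
import math
import random


def correlation(xs, ys):
    """Correlation coefficient in deviation form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_xx = sum((a - mean_x) ** 2 for a in xs)
    sum_yy = sum((b - mean_y) ** 2 for b in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def detrend(series):
    """Residuals from a least-squares linear trend fitted against t = 0, 1, ..., n-1."""
    n = len(series)
    t = list(range(n))
    t_bar = sum(t) / n
    s_bar = sum(series) / n
    slope = sum((ti - t_bar) * (si - s_bar) for ti, si in zip(t, series)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    intercept = s_bar - slope * t_bar
    return [si - (intercept + slope * ti) for ti, si in zip(t, series)]


random.seed(0)  # fixed seed so the sketch is repeatable
n = 200
# Two hypothetical series with unrelated noise but a common upward drift in time.
x = [0.5 * t + random.gauss(0.0, 1.0) for t in range(n)]
y = [10.0 + 0.3 * t + random.gauss(0.0, 1.0) for t in range(n)]

r_levels = correlation(x, y)
r_residuals = correlation(detrend(x), detrend(y))
print(round(r_levels, 3), round(r_residuals, 3))
```

The levels correlation is close to one because both series rise with time, while the correlation between the detrended residuals reflects only the unrelated noise and is small.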

An alternative approach to correlating detrended residuals is to correlate the first differences of the series. The first differences are simply the changes in the series between adjacent observations. They are usually denoted by the prefix $\Delta$. Thus,

$$\Delta X_t = X_t - X_{t-1} \qquad \Delta Y_t = Y_t - Y_{t-1}$$

Many series that show very high correlations between X and Y (the levels) will show very low correlations between $\Delta X$ and $\Delta Y$ (the first differences). This result usually indicates a spurious relationship. On the other hand, if there is a causal relationship between the variables, we expect to find correlations between levels and also between first differences. This point has recently been emphasized in an important paper by Stigler and Sherwin.9 The main thesis of the paper is that if two goods or services are in the same market their prices should be closely related. However, since most prices, like many economic series, show trend-like movements over time, Stigler and Sherwin wish to guard against being misled by spurious correlation. Thus, in addition to correlating price levels they correlate price changes. As one example, the prices of December 1982 silver futures on the New York Commodity Exchange and the Chicago Board of Trade over a 30-day trading period gave r = 0.997, and the price changes gave r = 0.956. In Minneapolis, Minnesota, and Kansas City, Missouri, two centers of the flour-milling industry, the monthly wholesale prices of flour over 1971-1981 gave correlations of 0.97 for levels and 0.92 for first differences. In these two cases the first difference correlations strongly reinforce the levels correlations and support the thesis of a single market for these goods.

5 G. Udny Yule, "Why Do We Sometimes Get Nonsense Correlations between Time Series?", Journal of the Royal Statistical Society, Series A, General, 89, 1926, 1-69.

6 Charles I. Plosser and G. William Schwert, "Money, Income, and Sunspots: Measuring Economic Relationships and the Effects of Differencing," Journal of Monetary Economics, 4, 1978, 637-660.

7 David F. Hendry, "Econometrics-Alchemy or Science?", Economica, 47, 1980, 387-406.

8 Trends, like most economic phenomena, are often fragile and transitory. The point has been made in lyrical style by Sir Alec Cairncross, one of Britain's most distinguished economists and a former chief economic adviser to the British government. "A trend is a trend, is a trend, but the question is, will it bend? Will it alter its course, through some unforeseen force and come to a premature end?"

9 George J. Stigler and Robert A. Sherwin, "The Extent of the Market," Journal of Law and Economics, 28, 1985, 555-585.
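The contrast between levels and first-difference correlations can be sketched with simulated series. Everything below is hypothetical: `x` and `y` are independent random walks with drift (an assumed generating mechanism chosen for illustration), while `z` is tied to `x` by construction, standing in for two prices set in a single market.

```python
import math
import random


def correlation(xs, ys):
    """Correlation coefficient in deviation form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_xx = sum((a - mean_x) ** 2 for a in xs)
    sum_yy = sum((b - mean_y) ** 2 for b in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def first_differences(series):
    """Changes between adjacent observations: X_t - X_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def random_walk_with_drift(n, drift, rng):
    """Cumulate a constant drift plus independent standard normal shocks."""
    level = 0.0
    path = []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 200
x = random_walk_with_drift(n, 1.0, rng)
y = random_walk_with_drift(n, 1.0, rng)    # generated independently of x
z = [v + rng.gauss(0.0, 0.2) for v in x]   # genuinely tied to x

r_xy_levels = correlation(x, y)
r_xy_diffs = correlation(first_differences(x), first_differences(y))
r_xz_levels = correlation(x, z)
r_xz_diffs = correlation(first_differences(x), first_differences(z))

print("unrelated:", round(r_xy_levels, 3), round(r_xy_diffs, 3))
print("related:  ", round(r_xz_levels, 3), round(r_xz_diffs, 3))
```

For the unrelated pair the high levels correlation disappears after differencing; for the related pair it survives, which is the pattern Stigler and Sherwin look for.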

1.2.4 A Case Study

Gasoline is retailed on the West Coast of the United States by the "majors" (Arco, Shell, Texaco, etc.) and by "minors," or "independents." Traditionally the majors have offered a greater variety of products, differentiated in terms of grade of gasoline, method of payment, degree of service, and so forth; whereas the minors have sold for cash and offered a smaller range of products. In the spring of 1983 Arco abolished its credit cards and sold for cash only. By the fall of 1983 the other majors had responded by continuing their credit cards but introducing two prices, a credit price and a lower cash price. Subsequently one of the independents sued Arco under the antitrust laws. The essence of the plaintiff's case was that there were really two separate markets for gasoline, one in which the majors competed with each other, and a second in which the minors competed. They further alleged, though not in this precise language, that Arco was like a shark that had jumped out of the big pool into their little pool with the intention of gobbling them all up. No one questioned that there was competition within the majors and competition within the minors: the crucial question was whether there was competition between majors and minors.

The problem was a perfect candidate for the Stigler/Sherwin type of analysis. The Lundberg Survey reports detailed information twice a month on the prices of all types and grades of gasoline at a very large sample of stations. These data are also averaged for majors and minors. Twelve differentiated products were defined for the majors and four for the minors. This step allowed the calculation of 66 correlation coefficients for all pairs of products within the majors and 6 correlation coefficients within the minors. Each set of coefficients would be expected to consist of very high numbers, reflecting the intensity of competition inside each group. However, it was also possible to calculate 48 correlation coefficients for all cross-pairs of a major price and a minor price. If the plaintiff's argument were correct, these 48 coefficients would be of negligible size. On the other hand, if there were just a single large market for gasoline, the cross correlations should not be markedly less than correlations within each group. A nice feature of the problem was that the within-group correlations provided a standard of reference for the assessment of the cross correlations. In the cases discussed in the Stigler/Sherwin paper only subjective judgments could be made about the size of correlation coefficient required to establish that two goods were in the same market.

The preceding approach yielded a matrix of 120 correlation coefficients. In order to guard against possible spurious correlation, such a matrix was computed for levels, for first differences, for logs of levels, and for first differences of logs (which measure percent changes in price). In addition, regression analysis was used to adjust for possible common influences from the price of crude oil or from general inflation, and matrices were produced for correlations between the residuals from these regressions. In all cases the matrices showed "forests" of tall trees (that is, high correlation coefficients), and the trees were just as tall in the rectangle of cross correlations as in the triangles of within correlations. The simple correlation coefficients thus provided conclusive evidence for the existence of a single market for retail gasoline.
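The bookkeeping behind the 66, 6, 48, and 120 coefficients can be sketched as follows. The price data here are simulated under the assumption of one common market factor; they are not the Lundberg Survey data, and the function name `split_correlations` is a hypothetical helper introduced only for illustration.

```python
from math import comb
import numpy as np

N_MAJOR, N_MINOR = 12, 4   # product counts quoted in the text

def split_correlations(corr, n_major):
    """Split a correlation matrix into within-major, within-minor, and cross entries.

    The first n_major rows/columns are taken to be the majors' products.
    """
    corr = np.asarray(corr)
    n_minor = corr.shape[0] - n_major
    within_major = corr[:n_major, :n_major][np.triu_indices(n_major, k=1)]
    within_minor = corr[n_major:, n_major:][np.triu_indices(n_minor, k=1)]
    cross = corr[:n_major, n_major:].ravel()   # the "rectangle" of cross correlations
    return within_major, within_minor, cross

# Simulated prices (assumption): T periods, 16 products, one common factor plus noise.
rng = np.random.default_rng(1)
T = 60
common = np.cumsum(rng.normal(size=T))
prices = common[:, None] + rng.normal(scale=0.3, size=(T, N_MAJOR + N_MINOR))

corr = np.corrcoef(prices, rowvar=False)          # 16 x 16 correlation matrix
within_major, within_minor, cross = split_correlations(corr, N_MAJOR)

print(len(within_major), len(within_minor), len(cross))        # 66 6 48
print(len(within_major) + len(within_minor) + len(cross))      # 120
print(within_major.mean(), within_minor.mean(), cross.mean())  # compare the three groups
```

The same split could be applied to matrices computed from first differences, logs, or regression residuals, with the within-group averages serving as the reference standard for the cross-group average.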

1.3 PROBABILITY MODELS FOR TWO VARIABLES

Classical statistical inference is based on the presumption that there exists some population distribution of all possible observations on the variables of interest. That distribution is characterized by certain crucial parameter values. From a sample of n observations sample statistics are computed and these serve as a basis for inference about the population parameters. Ever since the work of Haavelmo in the 1940s the probability approach has been extensively used in econometrics.^10 Indeed the development of econometrics in the past half century has been driven mainly by the effort to adapt and extend classical inference procedures to deal with the special problems raised by the nature of the data generation process in economics and the general unavailability of controlled economic experiments.

1.3.1 Discrete Bivariate Probability Distribution

To introduce some of the main ideas, consider a discrete bivariate probability distribution as shown in Table 1.3. The cell entries indicate the probability of the joint occurrence of the associated X, Y values. Thus, $p_{ij}$ = probability that $X = X_i$ and $Y = Y_j$. The column and row totals, where a period indicates the subscript over which summation has taken place, give the marginal probabilities for X and Y, respectively. There are six important population parameters for the bivariate distribution. The means are defined by

$$\mu_x = E(X) = \sum_i p_{i.} X_i \qquad \text{and} \qquad \mu_y = E(Y) = \sum_j p_{.j} Y_j \tag{1.6}$$

The variances are defined as

$$\begin{aligned}
\sigma_x^2 &= \operatorname{var}(X) = E[(X - \mu_x)^2] = \sum_i p_{i.}(X_i - \mu_x)^2 \\
\sigma_y^2 &= \operatorname{var}(Y) = E[(Y - \mu_y)^2] = \sum_j p_{.j}(Y_j - \mu_y)^2
\end{aligned} \tag{1.7}$$

10 Trygve Haavelmo, The Probability Approach in Econometrics, supplement to Econometrica, 12, July, 1944.

TABLE 1.3
A bivariate probability distribution

                        X_1    ...   X_i    ...   X_m     Marginal probability
Y_1                     p_11   ...   p_i1   ...   p_m1    p_.1
...                     ...          ...          ...     ...
Y_j                     p_1j   ...   p_ij   ...   p_mj    p_.j
...                     ...          ...          ...     ...
Y_p                     p_1p   ...   p_ip   ...   p_mp    p_.p
Marginal probability    p_1.   ...   p_i.   ...   p_m.    1

The covariance is

$$\sigma_{xy} = \operatorname{cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = \sum_i \sum_j p_{ij}(X_i - \mu_x)(Y_j - \mu_y) \tag{1.8}$$

Finally, the population correlation coefficient is defined as

$$\operatorname{corr}(X, Y) = \rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{1.9}$$

In these formulae $\sum_i$ and $\sum_j$ indicate summation over the relevant subscripts.

Conditional probabilities

Consider the $X_i$ column in Table 1.3. Each cell probability may be divided by the column total, $p_{i.}$, to give a conditional probability for Y given $X_i$. Thus,

$$\frac{p_{ij}}{p_{i.}} = \text{probability that } Y = Y_j \text{ given that } X = X_i = \operatorname{prob}(Y_j \mid X_i) \tag{1.10}$$

The mean of this distribution is the conditional expectation of Y, given $X_i$, that is,

$$\mu_{y|x_i} = E(Y \mid X_i) = \sum_j Y_j \left(\frac{p_{ij}}{p_{i.}}\right) \tag{1.11}$$

Similarly, the variance of this distribution is a conditional variance, or

$$\sigma^2_{y|x_i} = \operatorname{var}(Y \mid X_i) = \sum_j \frac{p_{ij}}{p_{i.}}\,(Y_j - \mu_{y|x_i})^2 \tag{1.12}$$

The conditional means and variances are both functions of X, so there is a set of m conditional means and variances. In a similar fashion one may use the row probabilities to study the conditional distributions of X given Y.


TABLE 1.4
Bivariate distribution of income (X) and vacation expenditure (Y)

                                  X ($'000)
                            20        30        40
Y ($'000)        1         .28       .03        0
                 2         .08       .15       .03
                 3         .04       .06       .06
                 4          0        .06       .15
                 5          0         0        .03
                 6          0         0        .03
Marginal probability       .40       .30       .30
Mean (Y | X)               1.4       2.5       3.9
Var (Y | X)                .44       .85      1.09

TABLE 1.5
Conditional probabilities from Table 1.4

                        Y
X          1      2      3      4      5      6
20        0.7    0.2    0.1     0      0      0
30        0.1    0.5    0.2    0.2     0      0
40         0     0.1    0.2    0.5    0.1    0.1

A numerical example

Table 1.4 presents hypothetical data on income and vacation expenditure for an imaginary population. There are just three levels of income and six possible levels of vacation expenditure. Everyone, no matter how humble, gets to spend at least $1,000 on vacation. The marginal probabilities show that 40 percent of this population have incomes of $20,000, 30 percent have incomes of $30,000, and 30 percent have incomes of $40,000. The conditional probabilities derived from these data are shown in Table 1.5. These conditional probabilities are used to calculate the conditional means and variances shown in the last two rows of Table 1.4. Mean vacation expenditure rises with income but the increase is not linear, being greater for the increase from $30,000 to $40,000 than for the increase from $20,000 to $30,000. The conditional variance also increases with income. One could carry out the parallel analysis for X given Y. This might be of interest to a travel agent concerned with the distribution of income for people with a given vacation expenditure.
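The calculations behind Tables 1.4 and 1.5 can be reproduced in a few lines. The sketch below assumes the joint probabilities of Table 1.4 and applies Eqs. (1.10) to (1.12); the array names are illustrative choices, not notation from the text.

```python
import numpy as np

x = np.array([20, 30, 40])           # income levels, $'000
y = np.array([1, 2, 3, 4, 5, 6])     # vacation expenditure levels, $'000

# joint[j, i] = prob(Y = y[j] and X = x[i]), taken from Table 1.4
joint = np.array([
    [0.28, 0.03, 0.00],
    [0.08, 0.15, 0.03],
    [0.04, 0.06, 0.06],
    [0.00, 0.06, 0.15],
    [0.00, 0.00, 0.03],
    [0.00, 0.00, 0.03],
])

p_x = joint.sum(axis=0)              # marginal probabilities for X (column totals)
p_y = joint.sum(axis=1)              # marginal probabilities for Y (row totals)

cond = joint / p_x                   # Eq. (1.10): each column divided by its total
cond_mean = y @ cond                 # Eq. (1.11): E(Y | X_i)
cond_var = (y ** 2) @ cond - cond_mean ** 2   # Eq. (1.12), via E(Y^2 | X_i) - mean^2

print("marginal of X:     ", p_x)
print("conditional means: ", cond_mean)
print("conditional vars:  ", cond_var)
print("Table 1.5 (rows are income levels):")
print(cond.T.round(2))
```

Dividing by the row totals `p_y` instead would give the conditional distributions of X given Y mentioned at the end of the example.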

1.3.2 The Bivariate Normal Distribution

The previous examples have been in terms of discrete variables. For continuous variables the most famous distribution is the bivariate normal. When X and Y follow a bivariate normal distribution, the joint density is

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 \right] \right\} \tag{1.13}$$

In this equation we have used x and y to indicate the values taken by the variables X and Y. The lower-case letters here do not measure deviations from sample means, as they do in the discussion of the correlation coefficient in Section 1.2. The range of variation for both variables is from minus to plus infinity. Integrating over y in Eq. (1.13) gives the marginal distribution for X, which is

$$f(x) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left[ -\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2 \right] \tag{1.14}$$

Thus, the marginal distribution of X is seen to be normal with mean $\mu_x$ and standard deviation $\sigma_x$. Likewise, the marginal distribution of Y is normal with mean $\mu_y$ and standard deviation $\sigma_y$. The remaining parameter in Eq. (1.13) is $\rho$, which can be shown to be the correlation coefficient between X and Y. Finally, from the joint distribution [Eq. (1.13)] and the marginal distribution [Eq. (1.14)], the conditional distribution of Y given X may be obtained^11 as

$$f(y \mid x) = \frac{f(x, y)}{f(x)} = \frac{1}{\sigma_{y|x}\sqrt{2\pi}} \exp\left[ -\frac{1}{2}\left(\frac{y-\mu_{y|x}}{\sigma_{y|x}}\right)^2 \right] \tag{1.15}$$

The conditional distribution is also seen to be normal. The conditional mean is

$$\mu_{y|x} = \alpha + \beta x \tag{1.16}$$

where

$$\alpha = \mu_y - \beta\mu_x \qquad \text{and} \qquad \beta = \rho\,\frac{\sigma_y}{\sigma_x} \tag{1.17}$$

The conditional mean is thus a linear function of the X variable. The conditional variance is invariant with X and is given by

$$\sigma^2_{y|x} = \sigma_y^2(1 - \rho^2) \tag{1.18}$$

This condition of constant variance is referred to as homoscedasticity. Finally, the conditional mean and variance for X given Y may be obtained by interchanging x and y in the last three formulae.
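Equations (1.16) to (1.18) can be checked numerically. The sketch below uses hypothetical parameter values (they do not come from the text), computes the intercept, slope, and conditional variance from the population parameters, and compares them with a straight line fitted to a large simulated sample from the corresponding bivariate normal distribution.

```python
import numpy as np

def conditional_normal_params(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Intercept, slope, and variance of Y given X for a bivariate normal."""
    beta = rho * sigma_y / sigma_x           # Eq. (1.17)
    alpha = mu_y - beta * mu_x               # Eq. (1.17)
    cond_var = sigma_y ** 2 * (1 - rho ** 2) # Eq. (1.18)
    return alpha, beta, cond_var

# Hypothetical population parameters, chosen only for illustration.
mu_x, mu_y, sigma_x, sigma_y, rho = 30.0, 2.5, 8.0, 1.5, 0.6
alpha, beta, cond_var = conditional_normal_params(mu_x, mu_y, sigma_x, sigma_y, rho)

# Simulate from the joint distribution and fit a line to the draws.
rng = np.random.default_rng(2)
cov = [[sigma_x ** 2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y ** 2]]
draws = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)
xs, ys = draws[:, 0], draws[:, 1]

slope, intercept = np.polyfit(xs, ys, 1)     # degree-1 fit: [slope, intercept]
resid_var = np.var(ys - (intercept + slope * xs))

print("theory:    alpha=%.4f beta=%.4f var=%.4f" % (alpha, beta, cond_var))
print("simulated: alpha=%.4f beta=%.4f var=%.4f" % (intercept, slope, resid_var))
```

The residual variance around the fitted line does not depend on which range of x is examined, which is the homoscedasticity property in simulated form.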


1.4 THE TWO-VARIABLE LINEAR REGRESSION MODEL

In many bivariate situations the variables are treated in a symmetrical fashion. For the Scottish soldiers of Table 1.1 the conditional distribution of height, given chest size, is just as meaningful and interesting as the conditional distribution of chest size, given height. These are two aspects of the joint variation. However, in the vacation expenditure/income example we have already tended to show more interest in the conditional distribution of expenditure, given income, than in the distribution of income, given expenditure. This example is typical of many economic situations. Economists often have explicit notions, derived from theoretical models, of causality running from X, say, to Y. Thus, the theory of consumer behavior leads one to expect that household income will be a major determinant of household vacation expenditure, but labor economics does not give equal strength to the proposition that household vacation expenditure is a major determinant of household income. Although it is formally true that a joint distribution can always be factored in two different ways into the product of a marginal and a conditional distribution, one factorization will often be of more interest to an economist than the other. Thus, in the expenditure/income case the factorization $f(X, Y) = f(X) \cdot f(Y \mid X)$ will be of greater interest than the factorization $f(X, Y) = f(Y) \cdot f(X \mid Y)$. Moreover, in the first factorization the conditional distribution of expenditure, given income, will usually receive much more attention and analysis than the marginal distribution for income.

11 See Problem 1.4.

1.4.1 A Conditional Model

To formulate a model for vacation expenditure that is conditional on income, let us consider how data on such variables might be obtained. One possibility is that a sample of n households from the N households in the population was taken and the values of Y and X recorded for the year in question.^12 This is an example of cross-section data. There will be some (presumably complex and certainly unknown) bivariate distribution for all N households. This bivariate distribution itself will be some marginalization of a multivariate distribution covering income and all categories of expenditure. Concentrating on the conditional distribution, economic theory would suggest

$$E(Y \mid X) = g(X)$$

where g(X) is expected to be an increasing function of X. If the conditional expectation is linear in X, as in the case of a bivariate normal distribution, then

$$E(Y \mid X) = \alpha + \beta X \tag{1.19}$$

For the ith household this expectation gives

$$E(Y \mid X_i) = \alpha + \beta X_i$$

The actual vacation expenditure of the ith household is denoted by $Y_i$, so we define a discrepancy or disturbance $u_i$ as

$$u_i = Y_i - E(Y \mid X_i) = Y_i - \alpha - \beta X_i \tag{1.20}$$

12 We now return to the earlier convention of using X and Y to indicate both the label for a variable and

The disturbance $u_i$ must therefore represent the net influence of everything other than the income of the ith household. These other factors might include such things as the number and ages of household members, accumulated savings, and so forth. Such factors might be measured and included in Eq. (1.19), but with any finite number of explanatory factors we still cannot expect perfect agreement between individual observations and expected values. Thus, the need to specify a disturbance term remains. Taking conditional expectations of both sides of Eq. (1.20) gives $E(u_i \mid X_i) = 0$. The variance of $u_i$ is also seen to be the variance of the conditional distribution, $\sigma^2_{y|x_i}$. If we look at the jth household, the disturbance $u_j$ will have zero expectation and variance $\sigma^2_{y|x_j}$. These conditional variances may well vary with income. In the hypothetical data of Table 1.4 they are positively associated with income. For the present, however, we will make the homoscedasticity assumption that the disturbance variances are constant and independent of income. Finally, we make the assumption that the disturbances are distributed independently of one another. This rules out such things as "vacation mania," where everyone rushes off to Europe and large positive disturbances become apparent. This assumption implies that the disturbances are pairwise uncorrelated.^13 Collecting these assumptions together gives

$$\begin{aligned}
E(u_i) &= 0 && \text{for all } i \\
\operatorname{var}(u_i) = E(u_i^2) &= \sigma^2 && \text{for all } i \\
\operatorname{cov}(u_i, u_j) = E(u_i u_j) &= 0 && \text{for } i \neq j
\end{aligned} \tag{1.21}$$

These assumptions are embodied in the simple statement

$$\text{The } u_i \text{ are iid}(0, \sigma^2) \tag{1.22}$$

which reads "the $u_i$ are independently and identically distributed with zero mean and variance $\sigma^2$."

13 Two variables are said to be independently distributed, or stochastically independent, if the conditional distributions are equal to the corresponding marginal distributions. This statement is equivalent to the joint probabilities being the product of the marginal probabilities. For the discrete case, the covariance between X and Y is then

$$\begin{aligned}
\operatorname{cov}(X, Y) &= \sum_i \sum_j p_{ij}(X_i - \mu_x)(Y_j - \mu_y) \\
&= \sum_i p_{i.}(X_i - \mu_x) \sum_j p_{.j}(Y_j - \mu_y) \\
&= 0 \qquad \text{using Eq. (1.6)}
\end{aligned}$$

The converse is not necessarily true since the covariance measures linear association; but substituting $\rho = 0$ in Eq. (1.13) shows that it is true for the bivariate normal distribution, since the bivariate density then collapses into the product of the two marginal densities.

Now suppose the available data come in time series form and that

$X_t$ = aggregate real disposable personal income in year t
$Y_t$ = aggregate real vacation expenditure in year t

where t = 1, 2, ..., n. The series $\{X_t\}$ is no longer a set of sample values from the distribution of all N incomes in any year: it is the actual sum of all incomes in each year. It might be regarded as a sample of n observations from the "population" of all possible aggregate income numbers, but this interpretation seems to be putting some strain on the meaning of both sample and population. Moreover, the usual time series "sample" consists of data for n adjacent years. We would be rather suspicious of cross-section samples that always consisted only of n adjacent households. They could be from Millionaires' Row or from Skid Row. Thus, it is difficult to give an unambiguous and useful interpretation of f(X), the marginal distribution of X over time. However, the conditional distribution $f(Y \mid X)$ is still important and must be given a probabilistic formulation. To see this reasoning, return to the cross section formulation and introduce the time subscript. Thus,

$$Y_{it} = \alpha + \beta X_{it} + u_{it} \tag{1.23}$$

where $Y_{it}$ = real vacation expenditure by the ith household in year t
      $X_{it}$ = real disposable income of the ith household in year t

Making the (implausible) assumption that the $\alpha$ and $\beta$ parameters are the same for all households and aggregating Eq. (1.23) over all N households in the economy, we find

$$\sum_i Y_{it} = N\alpha + \beta \sum_i X_{it} + \sum_i u_{it}$$

which may be rewritten as

$$Y_t = N\alpha + \beta X_t + U_t \tag{1.24}$$

where Y and X denote aggregate expenditure and aggregate income and U is an aggregate disturbance. The assumptions made about the household u's imply that $U_t$ is a stochastic variable with zero mean and variance $N\sigma^2$. In the context of time series, one needs to make a further assumption about the independence, or lack thereof, of the U's. If the independence assumption is chosen, then the statement is that the $U_t$ are iid$(0, N\sigma^2)$.
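The claim that the aggregate disturbance has zero mean and variance $N\sigma^2$ follows from summing N uncorrelated disturbances, and it can be illustrated by simulation. The values of N, $\sigma$, and the number of replications below are arbitrary assumptions, and normal disturbances are used only for convenience; the result depends on the iid$(0, \sigma^2)$ assumption, not on normality.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50          # number of households (hypothetical)
sigma = 2.0     # standard deviation of each household disturbance (hypothetical)
reps = 100_000  # number of simulated "years"

# Each row holds the N household disturbances u_it for one period.
u = rng.normal(loc=0.0, scale=sigma, size=(reps, N))

# Aggregate disturbance U_t = sum over households of u_it.
U = u.sum(axis=1)

print("mean of U:       %.4f (theory 0)" % U.mean())
print("variance of U:   %.2f (theory N*sigma^2 = %.2f)" % (U.var(), N * sigma ** 2))
```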

1.4.2 Estimates and Estimators

Whether the sample data are of cross section or time series form, the simplest version of the two-variable model is $Y_i = \alpha + \beta X_i + u_i$, with the $u_i$ being iid$(0, \sigma^2)$. There are thus three parameters to be estimated in the model, namely, $\alpha$, $\beta$, and $\sigma^2$. The parameters $\alpha$ and $\beta$ are taken as a pair, since numerical values of both are required to fit a specific line. Once such a line has been fitted, the residuals from that line may be used to form an estimate of $\sigma^2$.

An estimator is a formula, method, or recipe for estimating an unknown population parameter; and an estimate is the numerical value obtained when sample data are substituted in the formula. The first step in fitting a straight line to sample data is to plot the scatter diagram and make sure from visual inspection that the scatter is approximately linear. The treatment of nonlinear scatters is discussed in the next chapter. Let the straight line fitted to the data be denoted by $\hat{Y}_i = a + bX_i$, where $\hat{Y}_i$ indicates the height of the line at $X_i$. The actual $Y_i$ value will in general deviate from $\hat{Y}_i$. Many estimators of the pair a, b may be devised.

1. Fit a line by eye and read off the implied values for the intercept a and slope b. Different "artists" may, of course, draw different lines, so it is preferable to have an estimator that will yield the same result for a given data set, irrespective of the investigator.

2. Pass a line through the leftmost point and the rightmost point of the scatter. If $X_*$ denotes the smallest value of X in the sample and $X_{**}$ the largest and $Y_*$, $Y_{**}$ the associated Y values, this estimator is

$$b = (Y_{**} - Y_*)/(X_{**} - X_*)$$

$$a = Y_* - bX_* = Y_{**} - bX_{**}$$

This estimator can hardly be expected to perform very well since it uses only two of the sample points and ignores the rest.

3. The last criticism may be met by averaging the X and Y coordinates of the m leftmost and the m rightmost points, where m is some integer between 1 and n/2, and passing a line through the resultant average points. Such an estimator with m set at n/3 or n/2 has been proposed in the literature on errors in variables, as will be discussed later. This type of estimator does not easily lend itself to mathematical manipulation, and some of its properties in repeated applications are difficult to determine.
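Estimators 2 and 3 in the list above are simple enough to sketch in code. The function names are hypothetical, ties in X are ignored, and the five data points are those used in the numerical example of Section 1.4.5, borrowed here only for illustration.

```python
import numpy as np

def endpoint_line(X, Y):
    """Estimator 2: line through the leftmost and rightmost points of the scatter."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    lo, hi = np.argmin(X), np.argmax(X)
    b = (Y[hi] - Y[lo]) / (X[hi] - X[lo])
    a = Y[lo] - b * X[lo]
    return a, b

def grouped_average_line(X, Y, m):
    """Estimator 3: line through the averages of the m leftmost and m rightmost points."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    order = np.argsort(X)                 # sort the points by their X coordinate
    Xs, Ys = X[order], Y[order]
    x_lo, y_lo = Xs[:m].mean(), Ys[:m].mean()
    x_hi, y_hi = Xs[-m:].mean(), Ys[-m:].mean()
    b = (y_hi - y_lo) / (x_hi - x_lo)
    a = y_lo - b * x_lo
    return a, b

X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]

a2, b2 = endpoint_line(X, Y)
a3, b3 = grouped_average_line(X, Y, m=2)
print("endpoint line:        a = %.4f, b = %.4f" % (a2, b2))
print("grouped averages m=2: a = %.4f, b = %.4f" % (a3, b3))
```

Both estimators give the same answer for a given data set, unlike fitting by eye, but each discards information: the first uses two points only, and the second ignores the spread within each group.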


1.4.3 Least-Squares Estimators

The dominant and powerful estimating principle, which emerged in the early years of the nineteenth century for this and other problems, is that of least squares.^14 Let the residuals from any fitted straight line be denoted by

$$e_i = Y_i - \hat{Y}_i = Y_i - a - bX_i \qquad i = 1, 2, \ldots, n \tag{1.25}$$

From the definition of $\hat{Y}_i$ and from Fig. 1.6 these residuals are seen to be measured in the vertical (Y) direction. Each pair of a, b values defines a different line and hence a different set of residuals. The residual sum of squares is thus a function of a and b. The least squares principle is

Select a, b to minimize the residual sum of squares.

$$\text{RSS} = \sum e_i^2 = f(a, b)$$

The necessary conditions for a stationary value of RSS are^15

$$\frac{\partial\left(\sum e^2\right)}{\partial a} = -2\sum (Y - a - bX) = -2\sum e = 0 \tag{1.26}$$

and

$$\frac{\partial\left(\sum e^2\right)}{\partial b} = -2\sum X(Y - a - bX) = -2\sum Xe = 0 \tag{1.27}$$

Simplifying gives the normal equations for the linear regression of Y on X. That is,

$$\begin{aligned}
\sum Y &= na + b\sum X \\
\sum XY &= a\sum X + b\sum X^2
\end{aligned} \tag{1.28}$$

The reason for the adjective normal will become clear when we discuss the geometry of least squares later.

The first normal equation may be rewritten as

$$a = \bar{Y} - b\bar{X} \tag{1.29}$$

Substituting for a in the second normal equation gives

$$b = \frac{\sum xy}{\sum x^2} = r\,\frac{s_y}{s_x} \tag{1.30}$$

FIGURE 1.6
Residuals from a fitted straight line.

14 See again the unfolding story in Stephen M. Stigler, The History of Statistics, Harvard University Press, 1986.

15 In obtaining the derivatives we leave the summation sign in place and differentiate the typical term with respect to a and b in turn, and simply observe the rule that any constant can be moved in front of the summation sign but anything that varies from one sample point to another must be kept to the right of the summation sign. Finally, we have dropped the subscripts and range of summation since there is no ambiguity. Strictly speaking, one should also distinguish between the a and b values that appear in the expression to be minimized and the specific values that actually do minimize the residual sum of squares, but again there is little risk of ambiguity and we have kept the expressions uncluttered.

Thus, the least-squares slope may first of all be estimated by Eq. (1.30) from the sample deviations, and the intercept then obtained from substituting for b in Eq. (1.29). Notice that these two expressions have exactly the same form as those given in Eq. (1.17) for the intercept and slope of the conditional mean in the bivariate normal distribution. The only difference is that Eqs. (1.29) and (1.30) are in terms of sample statistics, whereas Eq. (1.17) is in terms of population statistics.

To summarize, the least-squares line has three important properties. It minimizes the sum of the squared residuals. It passes through the mean point $(\bar{X}, \bar{Y})$, as shown by Eq. (1.29). Finally, the least-squares residuals have zero correlation in the sample with the values of X.^16
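The least-squares formulae and the three properties just listed can be verified numerically. The sketch below applies Eqs. (1.29) and (1.30) to the five observations of the numerical example in Section 1.4.5 and cross-checks the result by solving the normal equations (1.28) directly; the function name and the perturbation check are illustrative choices.

```python
import numpy as np

def least_squares_line(X, Y):
    """Return (a, b) from Eqs. (1.29) and (1.30)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = X - X.mean()                 # deviations from the sample mean
    y = Y - Y.mean()
    b = (x @ y) / (x @ x)            # Eq. (1.30): sum(xy) / sum(x^2)
    a = Y.mean() - b * X.mean()      # Eq. (1.29)
    return a, b

X = np.array([2.0, 3.0, 1.0, 5.0, 9.0])
Y = np.array([4.0, 7.0, 3.0, 9.0, 17.0])
n = X.size

a, b = least_squares_line(X, Y)
e = Y - a - b * X                    # residuals, Eq. (1.25)

# Cross-check: solve the two normal equations (1.28) as a linear system.
lhs = np.array([[n, X.sum()], [X.sum(), (X ** 2).sum()]])
rhs = np.array([Y.sum(), (X * Y).sum()])
a_ne, b_ne = np.linalg.solve(lhs, rhs)

def rss(a_, b_):
    return ((Y - a_ - b_ * X) ** 2).sum()

print("a = %.4f, b = %.4f" % (a, b))
print("sum of residuals:      %.2e" % e.sum())          # Eq. (1.26)
print("sum of X * residuals:  %.2e" % (X * e).sum())    # Eq. (1.27)
print("line at mean of X:     %.4f (mean of Y = %.4f)" % (a + b * X.mean(), Y.mean()))
print("RSS at LS values: %.4f, at perturbed values: %.4f" % (rss(a, b), rss(a + 0.1, b - 0.05)))
```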

The disturbance variance $\sigma^2$ cannot be estimated from a sample of u values, since these depend on the unknown $\alpha$ and $\beta$ values and are thus unobservable. An estimate can be based on the calculated residuals (the $e_i$). Two possibilities are $\sum e^2/n$ or $\sum e^2/(n - 2)$. For reasons to be explained in Chapter 3 the usual choice is

$$s^2 = \frac{\sum e^2}{n - 2} \tag{1.31}$$

1.4.4 Decomposition of the Sum of Squares

Using Eqs. (1.25) and (1.29), one may express the residuals in terms of the x, y deviations, namely

$$e_i = y_i - bx_i \qquad i = 1, 2, \ldots, n \tag{1.32}$$

Squaring both sides, followed by summing over the sample observations, gives

$$\sum e^2 = \sum y^2 - 2b\sum xy + b^2\sum x^2$$

The residual sum of squares is thus seen to be a quadratic function of b. Since $\sum x^2 \geq 0$, and the equality would only hold in the pathological case of zero variation in the X variable, the single stationary point is necessarily a minimum. Substitution from Eq. (1.30) gives

$$\begin{aligned}
\sum y^2 &= b^2\sum x^2 + \sum e^2 \\
&= b\sum xy + \sum e^2 \\
&= r^2\sum y^2 + \sum e^2
\end{aligned} \tag{1.33}$$

This famous decomposition of the sum of squares is usually written as

TSS = ESS + RSS

where^17

TSS = total sum of squared deviations in the Y variable
RSS = residual, or unexplained, sum of squares from the regression of Y on X
ESS = explained sum of squares from the regression of Y on X

The last line of Eq. (1.33) may be rearranged to give

$$r^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = \frac{\text{ESS}}{\text{TSS}} \tag{1.34}$$

Thus, $r^2$ may be interpreted as the proportion of the Y variation attributable to the linear regression on X. Equation (1.34) provides an alternative demonstration that the limits of r are $\pm 1$ and that in the limiting case the sample points all lie on a single straight line.

16 $\sum Xe = \sum (x + \bar{X})e = \sum xe + \bar{X}\sum e = \sum xe$ using Eq. (1.26)
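The decomposition in Eq. (1.33) and the two expressions for $r^2$ in Eq. (1.34) can be confirmed with a short calculation. The sketch below again assumes the five observations of the numerical example in Section 1.4.5; the function name is a hypothetical helper.

```python
import numpy as np

def sums_of_squares(X, Y):
    """Return (TSS, ESS, RSS, r_squared) for the regression of Y on X."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x = X - X.mean()
    y = Y - Y.mean()
    b = (x @ y) / (x @ x)       # Eq. (1.30)
    e = y - b * x               # Eq. (1.32): residuals in deviation form
    tss = y @ y                 # total sum of squares
    ess = b * (x @ y)           # explained sum of squares, second line of Eq. (1.33)
    rss = e @ e                 # residual sum of squares
    return tss, ess, rss, ess / tss

X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]
tss, ess, rss, r2 = sums_of_squares(X, Y)

r = np.corrcoef(X, Y)[0, 1]     # sample correlation coefficient, for comparison
print("TSS = %.1f, ESS = %.1f, RSS = %.1f" % (tss, ess, rss))
print("ESS/TSS = %.4f, 1 - RSS/TSS = %.4f, r^2 = %.4f" % (r2, 1 - rss / tss, r ** 2))
```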

1.4.5 A Numerical Example

Table 1.6 gives some simple data to illustrate the application of these formulae. Substitution in Eq. (1.28) then gives the normal equations

$$\begin{aligned}
40 &= 5a + 20b \\
230 &= 20a + 120b
\end{aligned}$$

with solution

$$\hat{Y} = 1 + 1.75X$$

The same data in deviation form are shown in Table 1.7. The regression coefficients may be obtained from

$$b = \frac{\sum xy}{\sum x^2} = \frac{70}{40} = 1.75$$

and

$$a = \bar{Y} - b\bar{X} = 8 - 1.75(4) = 1$$

The explained sum of squares may be calculated as

$$\text{ESS} = b\sum xy = 1.75(70) = 122.5$$

and the residual sum of squares is given by subtraction as

$$\text{RSS} = \text{TSS} - \text{ESS} = 124 - 122.5 = 1.5$$

Finally, the proportion of the Y variation explained by the linear regression is

$$r^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{122.5}{124} = 0.9879$$

17 Unfortunately there is no uniform notation for sums of squares. Some authors use SSR to indicate the sum of squares due to the regression (our ESS), and SSE to indicate the sum of squares due to error (our RSS).

TABLE 1.6

          X       Y       XY      X^2     Y-hat     e        Xe
          2       4       8       4       4.50     -0.50    -1.00
          3       7       21      9       6.25      0.75     2.25
          1       3       3       1       2.75      0.25     0.25
          5       9       45      25      9.75     -0.75    -3.75
          9       17      153     81      16.75     0.25     2.25
Sums      20      40      230     120     40        0        0

TABLE 1.7

          x       y       xy      x^2     y^2     y-hat     e        xe
         -2      -4       8       4       16      -3.50    -0.50     1.00
         -1      -1       1       1       1       -1.75     0.75    -0.75
         -3      -5       15      9       25      -5.25     0.25    -0.75
          1       1       1       1       1        1.75    -0.75    -0.75
          5       9       45      25      81       8.75     0.25     1.25
Sums      0       0       70      40      124      0        0        0

1.5 INFERENCE IN THE TWO-VARIABLE, LEAST-SQUARES MODEL

The least-squares (LS) estimators of $\alpha$ and $\beta$ have been defined in Eqs. (1.28) to (1.30). There are now two important questions:

1. What are the properties of these estimators?
2. How may these estimators be used to make inferences about $\alpha$ and $\beta$?

1.5.1 Properties of LS Estimators

The answers to both questions depend on the sampling distribution of the LS estimators. A sampling distribution describes the behavior of the estimator(s) in repeated applications of the estimating formulae. A given sample yields a specific numerical estimate. Another sample from the same population will yield another numerical estimate. A sampling distribution describes the results that will be obtained for the estimator(s) over the potentially infinite set of samples that may be drawn from the population.

The parameters of interest are $\alpha$, $\beta$, and $\sigma^2$ of the conditional distribution, $f(Y \mid X)$. In that conditional distribution the only source of variation from one hypothetical sample to another is variation in the stochastic disturbance (u), which in conjunction with the given X values will determine the Y values and hence the sample values of a, b, and $s^2$. Analyzing Y conditional on X thus treats the $X_1, X_2, \ldots, X_n$ values as fixed in repeated sampling. This treatment rests on the implicit assumption that the marginal distribution for X, that is, f(X), does not involve the parameters of interest or, in other words, that f(X) contains no information on $\alpha$, $\beta$, and $\sigma^2$. This is called the fixed regressor case, or the case of nonstochastic X. From Eq. (1.30) the LS slope may be written

$$b = \sum w_i y_i$$

where the weights $w_i$ are given by

$$w_i = \frac{x_i}{\sum x_i^2} \tag{1.35}$$

These weights are fixed in repeated sampling and have the following properties:

$$\sum w_i = 0 \qquad \sum w_i^2 = \frac{1}{\sum x_i^2} \qquad \text{and} \qquad \sum w_i x_i = \sum w_i X_i = 1 \tag{1.36}$$

It then follows that

$$b = \sum w_i Y_i \tag{1.37}$$

so that the LS slope is a linear combination of the Y values.

The sampling distribution of b is derived from Eq. (1.37) by substituting $Y_i = \alpha + \beta X_i + u_i$ and using the stochastic properties of u to determine the stochastic properties of b. Thus,

$$b = \alpha\left(\sum w_i\right) + \beta\left(\sum w_i X_i\right) + \sum w_i u_i = \beta + \sum w_i u_i \tag{1.38}$$

and so

$$E(b) = \beta \tag{1.39}$$

that is, the LS slope is an unbiased estimator of $\beta$. From Eq. (1.38) the variance of b is seen to be

$$\operatorname{var}(b) = E[(b - \beta)^2] = E\left[\left(\sum w_i u_i\right)^2\right]$$

From the properties of the w's it may be shown^18 that

$$\operatorname{var}(b) = \frac{\sigma^2}{\sum x^2} \tag{1.40}$$
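The weight properties in Eq. (1.36) and the results $E(b) = \beta$ and $\operatorname{var}(b) = \sigma^2/\sum x^2$ can be illustrated by a small Monte Carlo experiment in the fixed regressor setting. The X values below are those of Table 1.6, held fixed across replications; the values of $\alpha$, $\beta$, and $\sigma$, the normal disturbances, and the number of replications are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

X = np.array([2.0, 3.0, 1.0, 5.0, 9.0])   # fixed regressor values (Table 1.6)
alpha, beta, sigma = 1.0, 1.75, 1.0       # hypothetical population parameters
n = X.size

x = X - X.mean()
w = x / (x @ x)                           # weights of Eq. (1.35)

# Properties of the weights, Eq. (1.36).
print("sum w       =", round(w.sum(), 12))
print("sum w^2     =", w @ w, " 1/sum x^2 =", 1 / (x @ x))
print("sum w x     =", w @ x, " sum w X =", w @ X)

# Repeated sampling: only the disturbances change from sample to sample.
reps = 200_000
u = rng.normal(0.0, sigma, size=(reps, n))
Y = alpha + beta * X + u                  # each row is one hypothetical sample
b = Y @ w                                 # Eq. (1.37) applied to every sample

var_b_theory = sigma ** 2 / (x @ x)       # Eq. (1.40)
print("mean of b     = %.4f (beta = %.4f)" % (b.mean(), beta))
print("variance of b = %.5f (theory = %.5f)" % (b.var(), var_b_theory))
```

Replacing the normal draws with any other zero-mean, constant-variance, independent disturbances should leave the mean and variance of b unchanged, since Eqs. (1.39) and (1.40) rely only on the assumptions in (1.21).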

By similar methods it may be shown^19 that

$$E(a) = \alpha \tag{1.41}$$

and

$$\operatorname{var}(a) = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum x^2}\right] \tag{1.42}$$

These four formulae give the means and variances of the marginal distributions of a and b. The two estimators, however, are in general not stochastically independent,

18 See Appendix 1.1.

19 See Appendix 1.2.