vi ECONOMETRIC METHODS
1.3 To derive cov(a, b)
1.4 Gauss-Markov theorem
1.5 To derive var(e0)
Problems
2 Further Aspects of Two-variable Relationships
2.1 Time as a Regressor
2.1.1 Constant Growth Curves
2.1.2 Numerical Example
2.2 Transformations of Variables
2.2.1 Log-Log Transformations
2.2.2 Semilog Transformations
2.2.3 Reciprocal Transformations
2.3 An Empirical Example of a Nonlinear Relation: U.S. Inflation and Unemployment
2.4 Lagged Dependent Variable as Regressor
2.4.1 An Introduction to Asymptotics
2.4.2 Convergence in Probability
2.4.3 Convergence in Distribution
2.4.4 The Autoregressive Equation
2.5 Stationary and Nonstationary Series
2.5.1 Unit Root
2.5.2 Numerical Illustration
2.6 Maximum Likelihood Estimation of the Autoregressive Equation
2.6.1 Maximum Likelihood Estimators
2.6.2 Properties of Maximum Likelihood Estimators
Appendix
2.1 Change of variables in density functions
2.2 Maximum likelihood estimators for the AR(1) model
Problems
3 The k-variable Linear Equation
3.1 Matrix Formulation of the k-variable Model
3.1.1 The Algebra of Least Squares
3.1.2 Decomposition of the Sum of Squares
3.1.3 Equation in Deviation Form
3.2 Partial Correlation Coefficients
3.2.1 Sequential Buildup of the Explained Sum of Squares
3.2.2 Partial Correlation Coefficients and Multiple Regression Coefficients
3.2.3 General Treatment of Partial Correlation and Multiple Regression Coefficients
3.3 The Geometry of Least Squares
3.4 Inference in the k-variable Equation
3.4.1 Assumptions
3.4.2 Mean and Variance of b
Contents vii
3.4.3 Estimation of $\sigma^2$
3.4.4 Gauss-Markov Theorem
3.4.5 Testing Linear Hypotheses about $\beta$
3.4.6 Restricted and Unrestricted Regressions
3.4.7 Fitting the Restricted Regression
3.5 Prediction
Appendix
3.1 To prove $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{1 - r_{13}^2}\sqrt{1 - r_{23}^2}$
3.2 Solving for a single regression coefficient in a multiple regression
3.3 To show that minimizing $a'a$ subject to $X'a = c$ gives $a = X(X'X)^{-1}c$
3.4 Derivation of the restricted estimator $b_*$
Problems
4 Some Tests of the k-variable Linear Equation for Specification Error
4.1 Specification Error
4.1.1 Possible Problems with u
4.1.2 Possible Problems with X
4.1.3 Possible Problems with $\beta$
4.2 Model Evaluation and Diagnostic Tests
4.3 Tests of Parameter Constancy
4.3.1 The Chow Forecast Test
4.3.2 The Hansen Test
4.3.3 Tests Based on Recursive Estimation
4.3.4 One-step Ahead Prediction Errors
4.3.5 CUSUM and CUSUMSQ Tests
4.3.6 A More General Test of Specification Error: The Ramsey RESET Test
4.4 A Numerical Illustration
4.5 Tests of Structural Change
4.5.1 Test of One Structural Change
4.5.2 Tests of Slope Coefficients
4.5.3 Tests of Intercepts
4.5.4 Summary
4.5.5 A Numerical Example
4.5.6 Extensions
4.6 Dummy Variables
4.6.1 Introduction
4.6.2 Seasonal Dummies
4.6.3 Qualitative Variables
4.6.4 Two or More Sets of Dummy Variables
4.6.5 A Numerical Example
Appendix
4.1 To show $\operatorname{var}(d) = \sigma^2[I_{n_2} + X_2(X_1'X_1)^{-1}X_2']$
Problems
5 Maximum Likelihood (ML), Generalized Least Squares (GLS), and Instrumental Variable (IV) Estimators
5.1 Maximum Likelihood Estimators
5.1.1 Properties of Maximum Likelihood Estimators
5.2 ML Estimation of the Linear Model
5.3 Likelihood Ratio, Wald, and Lagrange Multiplier Tests
5.3.1 Likelihood Ratio (LR) Tests
5.3.2 The Wald (W) Test
5.3.3 Lagrange Multiplier (LM) Test
5.4 ML Estimation of the Linear Model with Nonspherical Disturbances
5.4.1 Generalized Least Squares
5.5 Instrumental Variable (IV) Estimators
5.5.1 Special Case
5.5.2 Two-stage Least Squares (2SLS)
5.5.3 Choice of Instruments
5.5.4 Tests of Linear Restrictions
Appendix
5.1 Change of variables in density functions
5.2 Centered and uncentered $R^2$
5.3 To show that $e_*'X(X'X)^{-1}X'e_* = e_*'e_* - e'e$
Problems
6 Heteroscedasticity and Autocorrelation
6.1 Properties of OLS Estimators
6.2 Tests for Heteroscedasticity
6.2.1 The White Test
6.2.2 The Breusch-Pagan/Godfrey Test
6.2.3 The Goldfeld-Quandt Test
6.2.4 Extensions of the Goldfeld-Quandt Test
6.3 Estimation Under Heteroscedasticity
6.3.1 Estimation with Grouped Data
6.3.2 Estimation of the Heteroscedasticity Relation
6.4 Autocorrelated Disturbances
6.4.1 Forms of Autocorrelation: Autoregressive and Moving Average Schemes
6.4.2 Reasons for Autocorrelated Disturbances
6.5 OLS and Autocorrelated Disturbances
6.6 Testing for Autocorrelated Disturbances
6.6.1 Durbin-Watson Test
6.6.2 The Wallis Test for Fourth-order Autocorrelation
6.6.3 Durbin Tests for a Regression Containing Lagged Values of the Dependent Variable
6.6.4 Breusch-Godfrey Test
6.6.5 Box-Pierce-Ljung Statistic
6.7 Estimation of Relationships with Autocorrelated Disturbances
6.8 Forecasting with Autocorrelated Disturbances
6.9 Autoregressive Conditional Heteroscedasticity (ARCH)
Appendix
6.1 LM test for multiplicative heteroscedasticity
6.2 LR test for groupwise homoscedasticity
6.3 Properties of the ARCH(1) process
Problems
7 Univariate Time Series Modeling
7.1 A Rationale for Univariate Analysis
7.1.1 The Lag Operator
7.1.2 ARMA Modeling
7.2 Properties of AR, MA, and ARMA Processes
7.2.1 AR(1) Process
7.2.2 AR(2) Process
7.2.3 MA Processes
7.2.4 ARMA Processes
7.3 Testing for Stationarity
7.3.1 Graphical Inspection
7.3.2 Integrated Series
7.3.3 Trend Stationary (TS) and Difference Stationary (DS) Series
7.3.4 Unit Root Tests
7.3.5 Numerical Example
7.4 Identification, Estimation, and Testing of ARIMA Models
7.4.1 Identification
7.4.2 Estimation
7.4.3 Diagnostic Testing
7.5 Forecasting
7.5.1 MA(1) Process
7.5.2 ARMA(1,1) Process
7.5.3 ARIMA(1,1,0) Process
7.6 Seasonality
7.7 A Numerical Example: Monthly Housing Starts
Problems
8 Autoregressive Distributed Lag Relationships
8.1 Autoregressive Distributed Lag Relations
8.1.1 A Constant Elasticity Relation
8.1.2 Reparameterization
8.1.3 Dynamic Equilibrium
8.1.4 Unit Elasticity
8.1.5 Generalizations
8.2 Specification and Testing
8.2.1 General to Simple and Vice Versa
8.2.2 Estimation and Testing
8.2.3 Exogeneity
8.2.4 Exogeneity Tests
8.2.5 The Wu-Hausman Test
8.3 Nonstationary Regressors
8.4 A Numerical Example
8.4.1 Stationarity
8.4.2 Cointegration
8.4.3 A Respecified Relationship
8.4.4 A General ADL Relation
8.4.5 A Reparameterization
8.5 Nonnested Models
Appendix
8.1 Nonsingular linear transformations of the variables in an equation
8.2 To establish the equality of the test statistics in Eqs. (8.37) and (8.41)
9 Multiple Equation Models
9.1 Vector Autoregressions (VARs)
9.1.1 A Simple VAR
9.1.2 A Three-variable VAR
9.1.3 Higher-order Systems
9.2 Estimation of VARs
9.2.1 Testing the Order of the VAR
9.2.2 Testing for Granger Causality
9.2.3 Forecasting, Impulse Response Functions, and Variance Decomposition
9.2.4 Impulse Response Functions
9.2.5 Orthogonal Innovations
9.2.6 Variance Decomposition
9.3 Vector Error Correction Models
9.3.1 Testing for Cointegration Rank
9.3.2 Estimation of Cointegrating Vectors
9.3.3 Estimation of a Vector Error Correction Model
9.4 Simultaneous Structural Equation Models
9.5 Identification Conditions
9.6 Estimation of Structural Equations
9.6.1 Nonstationary Variables
9.6.2 System Methods of Estimation
Appendix
9.1 Seemingly Unrelated Regressions (SUR)
9.2 Higher-order VARs
9.2.1 A VAR(1) Process
9.2.2 A VAR(2) Process
Problems
10 Generalized Method of Moments
10.1 The Method of Moments
10.2 OLS as a Moment Problem
10.3 Instrumental Variables as a Moment Problem
10.4 GMM and the Orthogonality Condition
10.5 Distribution of the GMM estimator
10.6 Applications
10.6.1 Two-stage Least Squares, and Tests of Overidentifying Restrictions
10.6.2 Wu-Hausman Tests Revisited
10.6.3 Maximum Likelihood
10.6.4 Euler Equations
10.7 Readings
Problems
11 A Smorgasbord of Computationally Intensive Methods
11.1 An Introduction to Monte Carlo Methods
11.1.1 Some Guidelines for Monte Carlo Experiments
11.1.2 An Example
11.1.3 Generating Pseudorandom Numbers
11.1.4 Presenting the Results
11.2 Monte Carlo Methods and Permutation Tests
11.3 The Bootstrap
11.3.1 The Standard Error of the Median
11.3.2 An Example
11.3.3 The Parametric Bootstrap
11.3.4 Residual Resampling: Time Series and Forecasting
11.3.5 Data Resampling: Cross-section Data
11.3.6 Some Remarks on Econometric Applications of the Bootstrap
11.4 Nonparametric Density Estimation
11.4.1 Some General Remarks on Nonparametric Density Estimation
11.4.2 An Application: The Wage Effects of Unions
11.5 Nonparametric Regression
11.5.1 Extension: The Partially Linear Regression Model
11.6 References
Problems
12 Panel Data
12.1 Sources and Types of Panel Data
12.2 The Simplest Case - The Pooled Estimator
12.3 Two Extensions to the Simple Model
12.4 The Random Effects Model
12.6 The Fixed Effects Model in the Two-period Case
12.7 The Fixed Effects Model with More Than Two Time Periods
12.8 The Perils of Fixed Effects Estimation
12.8.1 Example 1: Measurement Error in X
12.8.2 Example 2: Endogenous X
12.9 Fixed Effects or Random Effects?
12.10 A Wu-Hausman Test
12.11 Other Specification Tests and an Introduction to Chamberlain's Approach
12.11.1 Formalizing the Restrictions
12.11.2 Fixed Effects in the General Model
12.11.3 Testing the Restrictions
12.12 Readings
Problems
13 Discrete and Limited Dependent Variable Models
13.1 Types of Discrete Choice Models
13.2 The Linear Probability Model
13.3 Example: A Simple Descriptive Model of Union Participation
13.4 Formulating a Probability Model
13.5 The Probit
13.6 The Logit
13.7 Misspecification in Binary Dependent Models
13.7.1 Heteroscedasticity
13.7.2 Misspecification in the Probit and Logit
13.7.3 Functional Form: What Is the Right Model to Use?
13.8 Extensions to the Basic Model: Grouped Data
13.8.1 Maximum Likelihood Methods
13.8.2 Minimum $\chi^2$ Methods
13.9 Ordered Probit
13.10 Tobit Models
13.10.1 The Tobit as an Extension of the Probit
13.10.2 Why Not Ignore "The Problem"?
13.10.3 Heteroscedasticity and the Tobit
13.11 Two Possible Solutions
13.11.1 Symmetrically Trimmed Least Squares
13.11.2 Censored Least Absolute Deviations (CLAD) Estimator
13.12 Treatment Effects and Two-step Methods
13.12.1 The Simple Heckman Correction
13.12.2 Some Cautionary Remarks about Selectivity Bias
13.12.3 The Tobit as a Special Case
13.13 Readings
Problems
Appendix A
A.1 Vectors
A.1.1 Multiplication by a Scalar
A.1.2 Addition and Subtraction
A.1.3 Linear Combinations
A.1.4 Some Geometry
A.1.5 Vector Multiplication
A.1.6 Equality of Vectors
A.2 Matrices
A.2.1 Matrix Multiplication
A.2.2 The Transpose of a Product
A.2.3 Some Important Square Matrices
A.2.4 Partitioned Matrices
A.2.5 Matrix Differentiation
A.2.6 Solution of Equations
A.2.7 The Inverse Matrix
A.2.8 The Rank of a Matrix
A.2.9 Some Properties of Determinants
A.2.10 Properties of Inverse Matrices
A.2.11 More on Rank and the Solution of Equations
A.2.12 Eigenvalues and Eigenvectors
A.2.13 Properties of Eigenvalues and Eigenvectors
A.2.14 Quadratic Forms and Positive Definite Matrices
Appendix B
B.1 Random Variables and Probability Distributions
B.2 The Univariate Normal Probability Distribution
B.3 Bivariate Distributions
B.4 Relations between the Normal, $\chi^2$, t, and F Distributions
B.5 Expectations in Bivariate Distributions
B.6 Multivariate Densities
B.7 Multivariate Normal pdf
B.8 Distributions of Quadratic Forms
B.9 Independence of Quadratic Forms
B.10 Independence of a Quadratic Form and a Linear Function
Appendix C
Appendix D
Index
CHAPTER 1
Relationships between Two Variables
The economics literature contains innumerable discussions of relationships between variables in pairs: quantity and price; consumption and income; demand for money and the interest rate; trade balance and the exchange rate; education and income; unemployment and the inflation rate; and many more. This is not to say that economists believe that the world can be analyzed adequately in terms of a collection of bivariate relations. When they leave the two-dimensional diagrams of the textbooks behind and take on the analysis of real problems, multivariate relationships abound. Nonetheless, some bivariate relationships are significant in themselves; more importantly for our purposes, the mathematical and statistical tools developed for two-variable relationships are fundamental building blocks for the analysis of more complicated situations.
1.1 EXAMPLES OF BIVARIATE RELATIONSHIPS

Figure 1.1 displays two aspects of the relationship between real personal saving (SAV) and real personal disposable income (INC) in the United States. In Fig. 1.1a the value of each series is shown quarterly for the period from 1959.1 to 1992.1. These two series and many of the others in the examples throughout the book come from the DRI Basic Economics Database (formerly Citibase); where relevant, we indicate the correspondence between our labels and the Citibase labels for the variables.¹ Figure 1.1a is a typical example of a time series plot, in which time is displayed on the horizontal axis and the values of the series are displayed on the vertical axis. Income shows an upward trend throughout the period, and in the early years saving does likewise. This pattern, however, is not replicated in the middle and later years. One might be tempted to conclude from Fig. 1.1a that saving is much more volatile than income, but that does not necessarily follow, since the series have separate scales.²

¹ A definition of all series is given in the data disk, which accompanies this volume. Instructions for

[FIGURE 1.1 (a) Time series plot of SAV and INC against year; (b) scatter plot of SAV against INC.]
An alternative display of the same information is in terms of a scatter plot, shown in Fig. 1.1b. Here one series is plotted against the other. The time dimension is no longer shown explicitly, but most software programs allow the option of joining successive points on the scatter so that the evolution of the series over time may still be traced. Both parts of Fig. 1.1 indicate a positive association between the variables: increases in one tend to be associated with increases in the other. It is clear that although the association is approximately linear in the early part of the period, it is not so in the second half.
illustratevarious associations between thenatural log ofreal
personal expenditure ongasoline (GAS),the natural logofthereal priceofgasoline
(PRICE), and the natural logof real disposable personal income (INCOME). The
derivationsoftheseries are described in the data disk.The rationale for the logarith-mic transformations is discussed in Chapter 2.Figure 1.2
givesvarious time pletsof
gasoline expenditure, price, and income. The real price series, with 1987 as the base year, shows the two dramatic price hikes of the early and late 1970s, which were subsequently eroded byreductions in the nominal price ofoil and by U.S. inflation, so thereal price at the end of the period was less than that obtaining at the start.
The incomeand expenditure series are bothshown in per capita form, because U.S. population increased byabout 44 percent over the period, from 176million to 254
million. The population series used todellate the expenditure and income series is the civilian noninstitutional population aged l 6 and over, which has increased even
faster than the general population. Per capitareal expenditure ongasoline increased
steadily in the 1960: and early 1970s,as real income grew and real price declined.
This steady rise endedzwith the price shocks of the 1970s,and per capita gas
con-sumption hasnever regained the peak levelsoftheearly seventies.
The scatter plots in Fig. 1.3 further illustrate the upheaval in this market. The plot for the whole period in Fig. 1.3a shows very different associations between expenditure and price in the earlier and later periods. The scatter for 1959.1 to 1973.3 in Fig. 1.3b looks like a conventional negative association between price and quantity. This is shattered in the middle period (1973.4 to 1981.4) and reestablished, though with a very different slope, in the last period (1982.1 to 1992.1). This data set will be analyzed econometrically in this and later chapters.
These illustrative scatter diagrams have three main characteristics. One is the sign of the association or covariation: that is, do the variables move together in a positive or negative fashion? Another is the strength of the association. A third characteristic is the linearity (or otherwise) of the association: is the general shape of the scatter linear or curvilinear? In Section 1.2 we discuss the extent to which the correlation coefficient measures the first two characteristics for a linear association, and in later chapters we will show how to deal with the linearity question, but first we give an example of a bivariate frequency distribution.
[FIGURE 1.2 Time series plots of natural log of gasoline consumption in 1987 dollars per capita. (a) Gasoline consumption vs. natural log of price in 1987 cents per gallon. (b) GAS and INCOME against year.]

[FIGURE 1.3 Scatter plots of price and gasoline consumption. (a) 1959.1 to 1992.1; (b) 1959.1 to 1973.3; (c) 1973.4 to 1981.4; (d) 1982.1 to 1992.1.]

1.1.1 Bivariate Frequency Distributions

The data underlying Figs. 1.1 to 1.3 come in the form of n pairs of observations of the form $(X_i, Y_i)$, $i = 1, 2, \ldots, n$. When the sample size n is very large, the data are usually printed as a bivariate frequency distribution; the ranges of X and Y are split into subintervals and each cell of the table shows the number of observations in the corresponding pair of subintervals. Table 1.1 provides an example.³ It is not possible to give a simple, two-dimensional representation of these data. However, inspection of the cell frequencies suggests a positive association between the two measurements.

TABLE 1.1
Distribution of heights and chest circumferences of 5732 Scottish militiamen

                         Chest circumference (inches)
Height (inches)   33-35   36-38   39-41   42-44   45 and over   Row totals
64-65                39     331     326      26             0          722
66-67                40     591    1010     170             4         1815
68-69                19     312    1144     488            18         1981
70-71                 5     100     479     290            23          897
72-73                 0      17     120     153            27          317
Column totals       103    1351    3079    1127            72         5732

Source: Edinburgh Medical and Surgical Journal (1817, pp. 260-264).

TABLE 1.2
Conditional means for the data in Table 1.1

Mean of height given chest (inches)   66.31   66.84   67.89   69.16   70.53
Mean of chest given height (inches)   38.41   39.19   40.26   40.76   41.80

This is confirmed by calculating the conditional means. First of all, each of the five central columns of the table gives a distribution of heights for a given chest measurement. These are conditional frequency distributions, and traditional statistics such as means and variances may be calculated. Similarly, the rows of the table give distributions of chest measurements, conditional on height. The two sets of conditional means are shown in Table 1.2; each mean series increases monotonically with increases in the conditioning variable, indicating a positive association between the variables.
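The conditional means can be recomputed directly from the cell frequencies. The sketch below is illustrative only: it assumes each class is represented by its midpoint (64.5, 66.5, ... for height; 34, 37, ... for chest) and that the open-ended "45 and over" class is represented by 46, neither of which is stated in the text. The names `freq`, `height_mid`, `chest_mid`, and `weighted_mean` are introduced here for illustration.

```python
# Cell frequencies from Table 1.1: rows are height classes, columns are chest classes.
freq = [
    [39, 331, 326, 26, 0],
    [40, 591, 1010, 170, 4],
    [19, 312, 1144, 488, 18],
    [5, 100, 479, 290, 23],
    [0, 17, 120, 153, 27],
]

# Assumed class midpoints; 46 for the open "45 and over" class is an assumption.
height_mid = [64.5, 66.5, 68.5, 70.5, 72.5]
chest_mid = [34.0, 37.0, 40.0, 43.0, 46.0]


def weighted_mean(values, weights):
    """Mean of `values` weighted by the frequencies in `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Mean height given chest: one conditional mean per column of the table.
mean_height_given_chest = [weighted_mean(height_mid, col) for col in zip(*freq)]

# Mean chest given height: one conditional mean per row of the table.
mean_chest_given_height = [weighted_mean(chest_mid, row) for row in freq]

print([round(m, 2) for m in mean_height_given_chest])
print([round(m, 2) for m in mean_chest_given_height])
```

Under these midpoint assumptions the rounded results agree with the entries of Table 1.2, and both series rise monotonically.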
1.2 THE CORRELATION COEFFICIENT

The direction and closeness of the linear association between two variables are measured by the correlation coefficient.⁴ Let the observations be denoted by $(X_i, Y_i)$ with $i = 1, 2, \ldots, n$. Once the sample means have been calculated, the data may be expressed in deviation form as

$$x_i = X_i - \bar{X} \qquad y_i = Y_i - \bar{Y}$$
³ Condensed from Stephen M. Stigler, The History of Statistics, Harvard University Press, 1986, p. 208.
where $\bar{X}$ and $\bar{Y}$ denote the sample means of X and Y. Figure 1.4 shows an illustrative point on a scatter diagram with the sample means as new axes, giving four quadrants, which are numbered counterclockwise. The product $x_i y_i$ is positive for all points in quadrants I and III and negative for all points in quadrants II and IV. Since a positive relationship will have points lying for the most part in quadrants I and III, and a negative relationship will have points lying mostly in the other two quadrants, the sign of $\sum_{i=1}^{n} x_i y_i$ will indicate whether the scatter slopes upward or downward. This sum, however, will tend to increase in absolute terms as more data are added to the sample. Thus, it is better to express the sum in average terms, giving the sample covariance,

$$\operatorname{cov}(X, Y) = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})/n = \sum_{i=1}^{n} x_i y_i / n$$

The value of the covariance depends on the units in which the variables are measured. Changing one variable from dollars to cents will give a new covariance 100 times the old. To obtain a measure of association that is invariant with respect to units of measurement, the deviations are expressed in standard deviation units. The covariance of the standardized deviations is the correlation coefficient, r, namely,
Quadrant11 QuadrantI
(xy)negative (xy)positive
(arr,l'j) I I I I yf= K- F I l l F xi= Xi -.# ' . . . . QuadrantIlI QzzzzzfrtznllV
(.ry)msitive (ay)negative
r
0 X x
FIGURE 1.4
$$r = \sum_{i=1}^{n} \left(\frac{x_i}{s_x}\right)\left(\frac{y_i}{s_y}\right) \Big/ n = \sum_{i=1}^{n} x_i y_i / n s_x s_y$$

where

$$s_x = \sqrt{\sum_{i=1}^{n} x_i^2 / n} \qquad s_y = \sqrt{\sum_{i=1}^{n} y_i^2 / n}$$

Omitting subscripts and the limits of summation (since there is no ambiguity) and performing some algebraic manipulations give three equivalent expressions for the correlation coefficient, two in terms of deviations and one in terms of the raw data:

$$r = \frac{\sum xy}{n s_x s_y} = \frac{\sum xy}{\sqrt{\sum x^2}\sqrt{\sum y^2}} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}}$$

1.2.1 The Correlation Coefficient for a Bivariate Frequency Distribution

In general, a bivariate distribution such as that shown in Table 1.1 may be represented by the paired values $(X_i, Y_j)$ with frequency $n_{ij}$ for $i = 1, \ldots, m$ and $j = 1, \ldots, p$. $X_i$ is the midpoint of the ith subinterval on the X axis, and $Y_j$ the midpoint of the jth subinterval on the Y axis. If we use a period for a subscript over which summation has taken place, the marginal frequencies for X are given by $n_{i.} = \sum_{j=1}^{p} n_{ij}$ for $i = 1, \ldots, m$. In conjunction with the $X_i$ values these marginal frequencies will yield the standard deviation of X, that is, $s_x$. The marginal frequencies for Y are $n_{.j} = \sum_{i=1}^{m} n_{ij}$ for $j = 1, \ldots, p$. Thus, the standard deviation of Y, or $s_y$, may be obtained. Finally the covariance is obtained from

$$\operatorname{cov}(X, Y) = \sum_{i=1}^{m}\sum_{j=1}^{p} n_{ij}(X_i - \bar{X})(Y_j - \bar{Y})/n \qquad (1.4)$$

where n is the total number of observations. Putting the three elements together, one may express the correlation coefficient for the bivariate frequency distribution in terms of the raw data as

$$r = \frac{n\sum_{i=1}^{m}\sum_{j=1}^{p} n_{ij}X_iY_j - \left(\sum_{i=1}^{m} n_{i.}X_i\right)\left(\sum_{j=1}^{p} n_{.j}Y_j\right)}{\sqrt{n\sum_{i=1}^{m} n_{i.}X_i^2 - \left(\sum_{i=1}^{m} n_{i.}X_i\right)^2}\,\sqrt{n\sum_{j=1}^{p} n_{.j}Y_j^2 - \left(\sum_{j=1}^{p} n_{.j}Y_j\right)^2}}$$
n.jYj)1 f=1 =1 j=3 /=1CHAIC'ER 1: Relationships between Two Variables 9 1.2.2 The Limits ofr
The correlation coefficient must lie in the range from -1 to +1. To see this, let c be any arbitrary constant. Then $\sum (y - cx)^2 \ge 0$. Now let $c = \sum xy / \sum x^2$. Expanding the square with this value of c gives $\sum y^2 - (\sum xy)^2 / \sum x^2 \ge 0$, so substitution in the inequality gives $(\sum xy)^2 \le (\sum x^2)(\sum y^2)$, that is, $r^2 \le 1$. This expression is one form of the Cauchy-Schwarz inequality. The equality will only hold if each and every y deviation is a constant multiple of the corresponding x deviation. In such a case the observations all lie on a single straight line, with a positive slope ($r = 1$) or a negative slope ($r = -1$). Figure 1.5 shows two cases in which r is approximately zero. In one case the observations are scattered over all four quadrants; in the other they lie exactly on a quadratic curve, where positive and negative products offset one another. Thus, the correlation coefficient measures the degree of linear association. A low value for r does not rule out the possibility of a strong nonlinear association, and such an association might give positive or negative values for r if the sample observations happen to be located in particular segments of the nonlinear relation.
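The limiting cases can be illustrated with a few made-up data points. In this sketch `corr` is a hypothetical helper that implements the deviation-form expression for r, and the data are invented purely for illustration: points on a straight line, points on a symmetric quadratic curve, and points taken from one segment of that curve.

```python
from math import sqrt


def corr(xs, ys):
    """Correlation coefficient computed from deviations about the sample means."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    x = [v - xbar for v in xs]
    y = [v - ybar for v in ys]
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))


X = [-3, -2, -1, 0, 1, 2, 3]

# Every y deviation is a constant multiple of the x deviation: r is +1 or -1.
r_up = corr(X, [2 + 0.5 * v for v in X])
r_down = corr(X, [2 - 0.5 * v for v in X])

# Exact quadratic relation, symmetric about the mean of X: products offset, r is 0.
r_quad = corr(X, [v * v for v in X])

# Same quadratic relation, but sampled only on its rising segment: r is strongly positive.
X_seg = [0, 1, 2, 3]
r_segment = corr(X_seg, [v * v for v in X_seg])

print(r_up, r_down, r_quad, round(r_segment, 3))
```

The last case shows why a nonzero r from a restricted sample says little about the shape of the full relation.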
1.2.3 Nonsense Correlations and Other Matters

Correlation coefficients must be interpreted with care. Many coefficients that are both numerically large and also adjudged statistically significant by tests to be described later may contain no real information. That statistical significance has been achieved does not necessarily imply that a meaningful and useful relationship has been found. The crucial question is, What has caused the observed covariation? If there is a theory about the joint variation of X and Y, the sign and size of the correlation coefficient may lend support to that theory, but if no such theory exists or can be devised, the correlation may be classed as a nonsense correlation.

[FIGURE 1.5 Paired variables for which $r^2 \approx 0$.]
Our favorite spurious, or nonsense, correlation was given in a beautiful 1926 paper by the statistician G. Udny Yule.⁵ Yule took annual data from 1866 to 1911 for the death rate in England and Wales and for the proportion of all marriages solemnized in the Church of England and found the correlation coefficient to be +0.95. However, no British politician proposed closing down the Church of England to confer immortality on the electorate. More recently, using annual data from 1897 to 1958, Plosser and Schwert have found a correlation coefficient of +0.91 between the log of nominal income in the United States and the log of accumulated sunspots.⁶ Hendry has noted a very strong, though somewhat nonlinear, positive relationship between the inflation rate and the accumulation of annual rainfall in the United Kingdom.⁷ It would be nice if the British could reduce their inflation rate and, as a bonus, enjoy the inestimable side effect of improved weather, but such happy conjunctions are not to be.
In these three examples all of the variables are subject to trend-like movements over time.⁸ Presumably some complex set of medical, economic, and social factors contributed to the reduction in the death rate in England and Wales, even as a different set of factors produced a decline in the proportion of marriages in the Church of England. Cumulative sunspots and cumulative rainfall necessarily trend upward, as do the U.S. nominal income and the British inflation rate. Series responding to essentially unrelated generating mechanisms may thus display contemporaneous upward and/or downward movements and thus yield strong correlation coefficients. Trends may be fitted to such series, as will be shown in the next chapter, and the residuals from such trends calculated. Correlations between pairs of residuals for such series will be negligible.
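A small simulation can make the point about trends concrete. The sketch below is a toy experiment, not data from the text: it assumes two series that share nothing except a linear time trend, with independent normal noise, and it assumes a straight-line trend fitted by least squares. The helpers `corr` and `detrend` and all parameter values are hypothetical choices for illustration.

```python
import random
from math import sqrt


def corr(xs, ys):
    """Correlation coefficient computed from deviations about the sample means."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    x = [v - xbar for v in xs]
    y = [v - ybar for v in ys]
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))


def detrend(series):
    """Residuals from a least-squares straight-line trend in time."""
    n = len(series)
    t = list(range(n))
    tbar, sbar = sum(t) / n, sum(series) / n
    slope = (sum((ti - tbar) * (si - sbar) for ti, si in zip(t, series))
             / sum((ti - tbar) ** 2 for ti in t))
    return [si - sbar - slope * (ti - tbar) for ti, si in zip(t, series)]


rng = random.Random(0)  # fixed seed so the experiment is repeatable
n = 400

# Two series generated independently of each other apart from a time trend.
X = [1.0 * t + rng.gauss(0, 5) for t in range(n)]
Y = [2.0 * t + rng.gauss(0, 5) for t in range(n)]

r_levels = corr(X, Y)                       # dominated by the common trend
r_resid = corr(detrend(X), detrend(Y))      # trend removed from both series

print(round(r_levels, 3), round(r_resid, 3))
```

In this setup the levels correlation is close to one while the residual correlation is close to zero, which is the pattern the paragraph describes.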
An alternative approach to correlating detrended residuals is to correlate the first differences of the series. The first differences are simply the changes in the series between adjacent observations. They are usually denoted by the prefix $\Delta$. Thus,

$$\Delta X_t = X_t - X_{t-1} \qquad \Delta Y_t = Y_t - Y_{t-1}$$

Many series that show very high correlations between X and Y (the levels) will show very low correlations between $\Delta X$ and $\Delta Y$ (the first differences). This result usually indicates a spurious relationship. On the other hand, if there is a causal relationship between the variables, we expect to find correlations between levels and also between first differences. This point has recently been emphasized in an important paper by Stigler and Sherwin.⁹ The main thesis of the paper is that if two goods or services are in the same market their prices should be closely related. However, since most prices, like many economic series, show trend-like movements over time, Stigler and Sherwin wish to guard against being misled by spurious correlation. Thus, in addition to correlating price levels they correlate price changes. As one example, the prices of December 1982 silver futures on the New York Commodity Exchange and the Chicago Board of Trade over a 30-day trading period gave r = 0.997, and the price changes gave r = 0.956. In Minneapolis, Minnesota, and Kansas City, Missouri, two centers of the flour-milling industry, the monthly wholesale prices of flour over 1971-1981 gave correlations of 0.97 for levels and 0.92 for first differences. In these two cases the first difference correlations strongly reinforce the levels correlations and support the thesis of a single market for these goods.

⁵ G. Udny Yule, "Why Do We Sometimes Get Nonsense Correlations between Time Series?", Journal of the Royal Statistical Society, Series A, General, 89, 1926, 1-69.
⁶ Charles I. Plosser and G. William Schwert, "Money, Income, and Sunspots: Measuring Economic Relationships and the Effects of Differencing," Journal of Monetary Economics, 4, 1978, 637-660.
⁷ David F. Hendry, "Econometrics-Alchemy or Science?", Economica, 47, 1980, 387-406.
⁸ Trends, like most economic phenomena, are often fragile and transitory. The point has been made in lyrical style by Sir Alec Cairncross, one of Britain's most distinguished economists and a former chief economic adviser to the British government. "A trend is a trend, is a trend, but the question is, will it bend? Will it alter its course, through some unforeseen force and come to a premature end?"
⁹ George J. Stigler and Robert A. Sherwin, "The Extent of the Market," Journal of Law and Economics, 28, 1985, 555-585.
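The contrast between levels and first differences can be sketched with simulated series. Everything here is an illustrative assumption rather than the Stigler and Sherwin data: X and Y are independent random walks with drift, Z is constructed from X plus a small amount of noise to stand in for a genuinely related series, and `corr`, `diff`, and `random_walk` are hypothetical helpers.

```python
import random
from math import sqrt


def corr(xs, ys):
    """Correlation coefficient computed from deviations about the sample means."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    x = [v - xbar for v in xs]
    y = [v - ybar for v in ys]
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y))


def diff(series):
    """First differences: the change between adjacent observations."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)  # fixed seed so the experiment is repeatable
n = 400


def random_walk(drift, sd):
    """Random walk with drift; each step adds `drift` plus normal noise."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, sd)
        path.append(level)
    return path


X = random_walk(1.0, 0.5)
Y = random_walk(1.0, 0.5)                      # generated independently of X
Z = [0.8 * x + rng.gauss(0, 0.1) for x in X]   # constructed from X

# Unrelated series: high correlation in levels, low in first differences.
r_xy_levels, r_xy_diffs = corr(X, Y), corr(diff(X), diff(Y))
# Related series: high correlation in both levels and first differences.
r_xz_levels, r_xz_diffs = corr(X, Z), corr(diff(X), diff(Z))

print(round(r_xy_levels, 3), round(r_xy_diffs, 3))
print(round(r_xz_levels, 3), round(r_xz_diffs, 3))
```

Only the pair with a built-in link keeps a high correlation after differencing, which mirrors the silver futures and flour price examples.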
1.2.4 A Case Study

Gasoline is retailed on the West Coast of the United States by the "majors" (Arco, Shell, Texaco, etc.) and by "minors," or "independents." Traditionally the majors have offered a greater variety of products, differentiated in terms of grade of gasoline, method of payment, degree of service, and so forth; whereas the minors have sold for cash and offered a smaller range of products. In the spring of 1983 Arco abolished its credit cards and sold for cash only. By the fall of 1983 the other majors had responded by continuing their credit cards but introducing two prices, a credit price and a lower cash price. Subsequently one of the independents sued Arco under the antitrust laws. The essence of the plaintiff's case was that there were really two separate markets for gasoline, one in which the majors competed with each other, and a second in which the minors competed. They further alleged, though not in this precise language, that Arco was like a shark that had jumped out of the big pool into their little pool with the intention of gobbling them all up. No one questioned that there was competition within the majors and competition within the minors: the crucial question was whether there was competition between majors and minors.
The Lundberg Surveyreports detailed information twice a month on the prices of al1
types and grades of gasoline atavery large sample of stations. These data are also
averaged formajors and minors. Twelve differentiated products were defined for the
majors and four for theminors. This step allowed the calculation of 66correlation
coefhcients fora11pairs ofproducts within themajors and 6correlation coefcients
within the minors. Each set of coefficients would be expected to consist of very
highnumbers, re:ecting the intensity of competition inside each group. However, it
wasalso possible tocalculate 48correlation coecients for a1lcross-pairs ofa major
priceand a minor price. If the plaintiff's
argument were correct,these 48coefficients
would be ofnegligible size. On the other hand, if there were
just
a single largemar-ket for gasoline, thecross correlations shouldnot bemarkedly less thancorrelations
within each group. Anice feature of the problem was that the within-group
corre-lations provided a standard of reference for theassessment ofthecross correlations.
In thecases discussed in the Stigler/sherwin paperonly subjectivejudgments could
bemade about the size of correlation coefficient required to establish that two goods
The preceding approach yielded a matrix of 120 correlation coefficients. In order to guard against possible spurious correlation, such a matrix was computed for levels, for first differences, for logs of levels, and for first differences of logs (which measure percent changes in price). In addition, regression analysis was used to adjust for possible common influences from the price of crude oil or from general inflation, and matrices were produced for correlations between the residuals from these regressions. In all cases the matrices showed "forests" of tall trees (that is, high correlation coefficients), and the trees were just as tall in the rectangle of cross correlations as in the triangles of within correlations. The simple correlation coefficients thus provided conclusive evidence for the existence of a single market for retail gasoline.
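The bookkeeping behind the 66 + 6 + 48 = 120 coefficients can be sketched in a few lines of Python. The prices below are simulated, not the Lundberg Survey data: the series, the single common market factor, the noise levels, and the helper name `correlation_blocks` are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 12 "major" and 4 "minor" price series over 60 periods,
# all driven by one common market factor plus small idiosyncratic noise.
n_periods, n_major, n_minor = 60, 12, 4
common = 100.0 + np.cumsum(rng.normal(0.0, 1.0, n_periods))
prices = common[:, None] + rng.normal(0.0, 0.3, (n_periods, n_major + n_minor))


def correlation_blocks(data, n_major):
    """Split a correlation matrix into within-major, within-minor and cross parts."""
    corr = np.corrcoef(data, rowvar=False)          # columns are the price series
    k = data.shape[1]
    upper_major = np.triu_indices(n_major, k=1)     # the "triangle" for majors
    upper_minor = np.triu_indices(k - n_major, k=1) # the "triangle" for minors
    within_major = corr[:n_major, :n_major][upper_major]
    within_minor = corr[n_major:, n_major:][upper_minor]
    cross = corr[:n_major, n_major:].ravel()        # the "rectangle" of cross pairs
    return within_major, within_minor, cross


# The four versions of the data mentioned in the text.
transforms = {
    "levels": prices,
    "first differences": np.diff(prices, axis=0),
    "logs of levels": np.log(prices),
    "first differences of logs": np.diff(np.log(prices), axis=0),
}

for name, data in transforms.items():
    wm, wn, cr = correlation_blocks(data, n_major)
    print(f"{name:26s} within majors {wm.mean():.3f}  "
          f"within minors {wn.mean():.3f}  cross {cr.mean():.3f}")
```

Because the simulated series share one market factor by construction, the cross correlations come out about as high as the within-group ones, which is the pattern the text describes for a single market. With separate factors for the two groups the rectangle of cross correlations would be much lower.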
1.3
PROBABILITY MODELS FOR TWO VARIABLES
Classical statistical inference is based on the presumption that there exists some population distribution of all possible observations on the variables of interest. That distribution is characterized by certain crucial parameter values. From a sample of n observations sample statistics are computed and these serve as a basis for inference about the population parameters. Ever since the work of Haavelmo in the 1940s the probability approach has been extensively used in econometrics.¹⁰ Indeed the development of econometrics in the past half century has been driven mainly by the effort to adapt and extend classical inference procedures to deal with the special problems raised by the nature of the data generation process in economics and the general unavailability of controlled economic experiments.
1.3.1
Discrete Bivariate Probability Distribution
To introduce some of the main ideas, consider a discrete bivariate probability distribution as shown in Table 1.3. The cell entries indicate the probability of the joint occurrence of the associated X, Y values. Thus, $p_{ij}$ = probability that $X = X_i$ and $Y = Y_j$. The column and row totals, where a period indicates the subscript over which summation has taken place, give the marginal probabilities for X and Y, respectively. There are six important population parameters for the bivariate distribution. The means are defined by

$$\mu_x = E(X) = \sum_i p_{i.} X_i \quad\text{and}\quad \mu_y = E(Y) = \sum_j p_{.j} Y_j \tag{1.6}$$

The variances are defined as

$$\sigma_x^2 = \operatorname{var}(X) = E[(X - \mu_x)^2] = \sum_i p_{i.}(X_i - \mu_x)^2$$
$$\sigma_y^2 = \operatorname{var}(Y) = E[(Y - \mu_y)^2] = \sum_j p_{.j}(Y_j - \mu_y)^2 \tag{1.7}$$

The covariance is

$$\sigma_{xy} = \operatorname{cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)] = \sum_i \sum_j p_{ij}(X_i - \mu_x)(Y_j - \mu_y) \tag{1.8}$$

Finally, the population correlation coefficient is defined as

$$\operatorname{corr}(X, Y) = \rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{1.9}$$

In these formulae $\sum_i$ and $\sum_j$ indicate summation over the relevant subscripts.

TABLE 1.3
A bivariate probability distribution

                        X_1    ...   X_i    ...   X_m    Marginal probability
Y_1                     p_11   ...   p_i1   ...   p_m1   p_.1
...
Y_j                     p_1j   ...   p_ij   ...   p_mj   p_.j
...
Y_p                     p_1p   ...   p_ip   ...   p_mp   p_.p
Marginal probability    p_1.   ...   p_i.   ...   p_m.   1

¹⁰ Trygve Haavelmo, The Probability Approach in Econometrics, supplement to Econometrica, 12, July, 1944.

Conditional probabilities
Consider the $X_i$ column in Table 1.3. Each cell probability may be divided by the column total, $p_{i.}$, to give a conditional probability for Y given $X_i$. Thus,

$$\frac{p_{ij}}{p_{i.}} = \text{probability that } Y = Y_j \text{ given that } X = X_i = \operatorname{prob}(Y_j \mid X_i) \tag{1.10}$$

The mean of this distribution is the conditional expectation of Y, given $X_i$, that is,

$$\mu_{y|x_i} = E(Y \mid X_i) = \sum_j \left(\frac{p_{ij}}{p_{i.}}\right) Y_j \tag{1.11}$$

Similarly, the variance of this distribution is a conditional variance, or

$$\sigma^2_{y|x_i} = \operatorname{var}(Y \mid X_i) = \sum_j \left(\frac{p_{ij}}{p_{i.}}\right)(Y_j - \mu_{y|x_i})^2 \tag{1.12}$$

The conditional means and variances are both functions of X, so there is a set of m conditional means and variances. In a similar fashion one may use the row probabilities to study the conditional distributions of X given Y.
TABLE 1.4
Bivariate distribution of income (X) and vacation expenditure (Y)

                                X ($'000)
                           20      30      40
Y ($'000)      1          .28     .03      0
               2          .08     .15     .03
               3          .04     .06     .06
               4           0      .06     .15
               5           0       0      .03
               6           0       0      .03
Marginal probability      .40     .30     .30
Mean (Y | X)              1.4     2.5     3.9
Var (Y | X)               .44     .85    1.09

TABLE 1.5
Conditional probabilities from Table 1.4

                      Y
X         1      2      3      4      5      6
20       0.7    0.2    0.1     0      0      0
30       0.1    0.5    0.2    0.2     0      0
40        0     0.1    0.2    0.5    0.1    0.1

A numerical example
Table 1.4 presents hypothetical data on income and vacation expenditure for an imaginary population. There are just three levels of income and six possible levels of vacation expenditure. Everyone, no matter how humble, gets to spend at least $1,000 on vacation. The marginal probabilities show that 40 percent of this population have incomes of $20,000, 30 percent have incomes of $30,000, and 30 percent have incomes of $40,000. The conditional probabilities derived from these data are shown in Table 1.5. These conditional probabilities are used to calculate the conditional means and variances shown in the last two rows of Table 1.4. Mean vacation expenditure rises with income but the increase is not linear, being greater for the increase from $30,000 to $40,000 than for the increase from $20,000 to $30,000. The conditional variance also increases with income. One could carry out the parallel analysis for X given Y. This might be of interest to a travel agent concerned with the distribution of income for people with a given vacation expenditure.
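The conditional means and variances of the numerical example can be checked directly from Eqs. (1.10) to (1.12). The following Python sketch enters the joint probabilities of Table 1.4 as an array; the variable names and the array layout (rows for Y, columns for X) are choices made here for illustration.

```python
import numpy as np

income = np.array([20, 30, 40])            # X levels, $'000
spend = np.array([1, 2, 3, 4, 5, 6])       # Y levels, $'000

# joint[j, i] = p_ij: probability that Y = spend[j] and X = income[i]
joint = np.array([
    [0.28, 0.03, 0.00],
    [0.08, 0.15, 0.03],
    [0.04, 0.06, 0.06],
    [0.00, 0.06, 0.15],
    [0.00, 0.00, 0.03],
    [0.00, 0.00, 0.03],
])

# Column totals p_i. are the marginal probabilities of X.
marginal_x = joint.sum(axis=0)

# Eq. (1.10): divide each column by its total to get prob(Y_j | X_i).
conditional = joint / marginal_x

# Eq. (1.11): conditional mean of Y for each income level.
cond_mean = (conditional * spend[:, None]).sum(axis=0)

# Eq. (1.12): conditional variance of Y for each income level.
cond_var = (conditional * (spend[:, None] - cond_mean) ** 2).sum(axis=0)

for x, p, m, v in zip(income, marginal_x, cond_mean, cond_var):
    print(f"X = {x}: p = {p:.2f}, mean(Y|X) = {m:.2f}, var(Y|X) = {v:.2f}")
```

Transposing `joint` and swapping the roles of `income` and `spend` gives the parallel analysis for X given Y mentioned at the end of the example.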
1.3.2
The Bivariate Normal Distribution
The previous examples have been in terms of discrete variables. For continuous variables the most famous distribution is the bivariate normal. When X and Y follow a bivariate normal distribution, the probability density function (pdf) is given by

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\} \tag{1.13}$$

In this equation we have used x and y to indicate the values taken by the variables X and Y. The lower-case letters here do not measure deviations from sample means, as they do in the discussion of the correlation coefficient in Section 1.2. The range of variation for both variables is from minus to plus infinity. Integrating over y in Eq. (1.13) gives the marginal distribution for X, which is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2\right\} \tag{1.14}$$

Thus, the marginal distribution of X is seen to be normal with mean $\mu_x$ and standard deviation $\sigma_x$. Likewise, the marginal distribution of Y is normal with mean $\mu_y$ and standard deviation $\sigma_y$. The remaining parameter in Eq. (1.13) is $\rho$, which can be shown to be the correlation coefficient between X and Y. Finally, from the joint distribution [Eq. (1.13)] and the marginal distribution [Eq. (1.14)], the conditional distribution of Y given X may be obtained¹¹ as

$$f(y \mid x) = \frac{f(x, y)}{f(x)} = \frac{1}{\sqrt{2\pi}\,\sigma_{y|x}} \exp\left\{-\frac{1}{2}\left(\frac{y-\mu_{y|x}}{\sigma_{y|x}}\right)^2\right\} \tag{1.15}$$

The conditional distribution is also seen to be normal. The conditional mean is

$$\mu_{y|x} = \alpha + \beta x \tag{1.16}$$

where

$$\alpha = \mu_y - \beta\mu_x \quad\text{and}\quad \beta = \rho\,\frac{\sigma_y}{\sigma_x} \tag{1.17}$$

The conditional mean is thus a linear function of the X variable. The conditional variance is invariant with X and is given by

$$\sigma^2_{y|x} = \sigma_y^2(1 - \rho^2) \tag{1.18}$$

This condition of constant variance is referred to as homoscedasticity. Finally, the conditional mean and variance for X given Y may be obtained by interchanging x and y in the last three formulae.
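Equations (1.16) to (1.18) are easy to evaluate for any given set of population parameters, and a simulation gives a rough check. In the sketch below the parameter values, the seed, and the sample size are hypothetical choices, and fitting a straight line to the simulated draws is used only as a convenient way of approximating the conditional mean function.

```python
import numpy as np

# Hypothetical population parameters of a bivariate normal distribution.
mu_x, mu_y = 10.0, 20.0
sigma_x, sigma_y, rho = 2.0, 3.0, 0.6

# Eq. (1.17): slope and intercept of the conditional mean of Y given X.
beta = rho * sigma_y / sigma_x
alpha = mu_y - beta * mu_x

# Eq. (1.18): conditional variance, the same for every x (homoscedasticity).
cond_var = sigma_y**2 * (1.0 - rho**2)

# Rough simulation check: draw from the joint distribution and fit a line.
rng = np.random.default_rng(42)
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
draws = rng.multivariate_normal([mu_x, mu_y], cov, size=200_000)
x, y = draws[:, 0], draws[:, 1]

slope, intercept = np.polyfit(x, y, 1)      # returns highest power first
resid_var = np.var(y - (intercept + slope * x))

print(f"theory:     alpha = {alpha:.3f}, beta = {beta:.3f}, var = {cond_var:.3f}")
print(f"simulation: alpha = {intercept:.3f}, beta = {slope:.3f}, var = {resid_var:.3f}")
```

The simulated values only approximate the theoretical ones, with the gap shrinking as the number of draws grows.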
1.4
THE TWO-VARIABLE LINEAR REGRESSION MODEL
In many bivariate situations the variables are treated in a symmetrical fashion. For the Scottish soldiers of Table 1.1 the conditional distribution of height, given chest size, is just as meaningful and interesting as the conditional distribution of chest size, given height. These are two aspects of the joint variation. However, in the vacation expenditure/income example we have already tended to show more interest in the conditional distribution of expenditure, given income, than in the distribution of income, given expenditure. This example is typical of many economic situations. Economists often have explicit notions, derived from theoretical models, of causality running from X, say, to Y. Thus, the theory of consumer behavior leads one to expect that household income will be a major determinant of household vacation expenditure, but labor economics does not give equal strength to the proposition that household vacation expenditure is a major determinant of household income. Although it is formally true that a joint distribution can always be factored in two different ways into the product of a marginal and a conditional distribution, one factorization will often be of more interest to an economist than the other. Thus, in the expenditure/income case the factorization $f(X, Y) = f(X) \cdot f(Y \mid X)$ will be of greater interest than the factorization $f(X, Y) = f(Y) \cdot f(X \mid Y)$. Moreover, in the first factorization the conditional distribution of expenditure, given income, will usually receive much more attention and analysis than the marginal distribution for income.

¹¹ See Problem 1.4.

1.4.1
A Conditional Model
To formulate a model for vacation expenditure that is conditional on income, let us consider how data on such variables might be obtained. One possibility is that a sample of n households from the N households in the population was taken and the values of Y and X recorded for the year in question.¹² This is an example of cross-section data. There will be some, presumably complex and certainly unknown, bivariate distribution for all N households. This bivariate distribution itself will be some marginalization of a multivariate distribution covering income and all categories of expenditure. Concentrating on the conditional distribution, economic theory would suggest

$$E(Y \mid X) = g(X)$$

where g(X) is expected to be an increasing function of X. If the conditional expectation is linear in X, as in the case of a bivariate normal distribution, then

$$E(Y \mid X) = \alpha + \beta X \tag{1.19}$$

For the ith household this expectation gives

$$E(Y \mid X_i) = \alpha + \beta X_i$$

The actual vacation expenditure of the ith household is denoted by $Y_i$, so we define a discrepancy or disturbance $u_i$ as

$$u_i = Y_i - E(Y \mid X_i) = Y_i - \alpha - \beta X_i \tag{1.20}$$

¹² We now return to the earlier convention of using X and Y to indicate both the label for a variable and the values that it may assume.
The disturbance $u_i$ must therefore represent the net influence of everything other than the income of the ith household. These other factors might include such things as the number and ages of household members, accumulated savings, and so forth. Such factors might be measured and included in Eq. (1.19), but with any finite number of explanatory factors we still cannot expect perfect agreement between individual observations and expected values. Thus, the need to specify a disturbance term remains. Taking conditional expectations of both sides of Eq. (1.20) gives $E(u_i \mid X_i) = 0$. The variance of $u_i$ is also seen to be the variance of the conditional distribution, $\sigma^2_{y|x_i}$. If we look at the jth household, the disturbance $u_j$ will have zero expectation and variance $\sigma^2_{y|x_j}$. These conditional variances may well vary with income. In the hypothetical data of Table 1.4 they are positively associated with income. For the present, however, we will make the homoscedasticity assumption that the disturbance variances are constant and independent of income. Finally, we make the assumption that the disturbances are distributed independently of one another. This rules out such things as "vacation mania," where everyone rushes off to Europe and large positive disturbances become apparent. This assumption implies that the disturbances are pairwise uncorrelated.¹³ Collecting these assumptions together gives

$$E(u_i) = 0 \quad \text{for all } i$$
$$\operatorname{var}(u_i) = E(u_i^2) = \sigma^2 \quad \text{for all } i \tag{1.21}$$
$$\operatorname{cov}(u_i, u_j) = E(u_i u_j) = 0 \quad \text{for } i \neq j$$

These assumptions are embodied in the simple statement

$$\text{The } u_i \text{ are iid}(0, \sigma^2) \tag{1.22}$$

which reads "the $u_i$ are independently and identically distributed with zero mean and variance $\sigma^2$."
¹³ Two variables are said to be independently distributed, or stochastically independent, if the conditional distributions are equal to the corresponding marginal distributions. This statement is equivalent to the joint probabilities being the product of the marginal probabilities. For the discrete case, the covariance between X and Y is then

$$\operatorname{cov}(X, Y) = \sum_i \sum_j p_{ij}(X_i - \mu_x)(Y_j - \mu_y) = \sum_i p_{i.}(X_i - \mu_x) \sum_j p_{.j}(Y_j - \mu_y) = 0 \quad \text{using Eq. (1.6)}$$

The converse is not necessarily true since the covariance measures linear association; but substituting $\rho = 0$ in Eq. (1.13) shows that it is true for the bivariate normal distribution, since the bivariate density then collapses into the product of the two marginal densities.

Now suppose the available data come in time series form and that

$X_t$ = aggregate real disposable personal income in year t
$Y_t$ = aggregate real vacation expenditure in year t

where t = 1, 2, ..., n. The series $\{X_t\}$ is no longer a set of sample values from the distribution of all N incomes in any year: it is the actual sum of all incomes in each year. It might be regarded as a sample of n observations from the "population" of all possible aggregate income numbers, but this interpretation seems to be putting some strain on the meaning of both sample and population. Moreover, the usual time series "sample" consists of data for n adjacent years. We would be rather suspicious of cross-section samples that always consisted only of n adjacent households. They could be from Millionaires' Row or from Skid Row. Thus, it is difficult to give an unambiguous and useful interpretation of f(X), the marginal distribution of X over time. However, the conditional distribution $f(Y \mid X)$ is still important and must be given a probabilistic formulation. To see this reasoning, return to the cross section formulation and introduce the time subscript. Thus,

$$Y_{it} = \alpha + \beta X_{it} + u_{it} \tag{1.23}$$
where

$Y_{it}$ = real vacation expenditure by the ith household in year t
$X_{it}$ = real disposable income of the ith household in year t

Making the (implausible) assumption that the $\alpha$ and $\beta$ parameters are the same for all households and aggregating Eq. (1.23) over all N households in the economy, we find

$$\sum_i Y_{it} = N\alpha + \beta \sum_i X_{it} + \sum_i u_{it}$$

which may be rewritten as

$$Y_t = N\alpha + \beta X_t + U_t \tag{1.24}$$

where Y and X denote aggregate expenditure and aggregate income and U is an aggregate disturbance. The assumptions made about the household u's imply that $U_t$ is a stochastic variable with zero mean and variance $N\sigma^2$. In the context of time series, one needs to make a further assumption about the independence, or lack thereof, of the U's. If the independence assumption is chosen, then the statement is that the $U_t$ are iid$(0, N\sigma^2)$.
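The mean and variance of the aggregate disturbance follow in one step from the household-level assumptions of Eq. (1.21). Written out, with $U_t = \sum_i u_{it}$ and the sums running over the N households, the argument is:

```latex
\begin{aligned}
E(U_t) &= \sum_{i} E(u_{it}) = 0 \\
\operatorname{var}(U_t) &= E\Bigl[\Bigl(\sum_{i} u_{it}\Bigr)^{2}\Bigr]
  = \sum_{i} E(u_{it}^{2}) + \sum_{i \neq j} E(u_{it}u_{jt})
  = N\sigma^{2} + 0 = N\sigma^{2}
\end{aligned}
```

The cross-product terms vanish because the household disturbances are pairwise uncorrelated, and each of the N squared terms contributes $\sigma^2$ because of homoscedasticity.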
1.4.2
Estimates and Estimators
Whether the sample data are of cross section or time series form, the simplest version of the two-variable model is $Y_i = \alpha + \beta X_i + u_i$, with the $u_i$ being iid$(0, \sigma^2)$. There are thus three parameters to be estimated in the model, namely, $\alpha$, $\beta$, and $\sigma^2$. The parameters $\alpha$ and $\beta$ are taken as a pair, since numerical values of both are required to fit a specific line. Once such a line has been fitted, the residuals from that line may be used to form an estimate of $\sigma^2$.
An estimator is a formula, method, or recipe for estimating an unknown population parameter; and an estimate is the numerical value obtained when sample data are substituted in the formula. The first step in fitting a straight line to sample data is to plot the scatter diagram and make sure from visual inspection that the scatter is approximately linear. The treatment of nonlinear scatters is discussed in the next chapter. Let the straight line fitted to the data be denoted by $\hat{Y}_i = a + bX_i$, where $\hat{Y}_i$ indicates the height of the line at $X_i$. The actual $Y_i$ value will in general deviate from $\hat{Y}_i$.
Many estimators of the pair a, b may be devised.
1. Fit a line by eye and read off the implied values for the intercept a and slope b. Different "artists" may, of course, draw different lines, so it is preferable to have an estimator that will yield the same result for a given data set, irrespective of the investigator.
2. Pass a line through the leftmost point and the rightmost point of the scatter. If $X_*$ denotes the smallest value of X in the sample and $X_{**}$ the largest and $Y_*$, $Y_{**}$ the associated Y values, this estimator is
$$b = (Y_{**} - Y_*)/(X_{**} - X_*)$$
$$a = Y_* - bX_* = Y_{**} - bX_{**}$$
This estimator can hardly be expected to perform very well since it uses only two of the sample points and ignores the rest.
3. The last criticism may be met by averaging the X and Y coordinates of the m leftmost and the m rightmost points, where m is some integer between 1 and n/2, and passing a line through the resultant average points. Such an estimator with m set at n/3 or n/2 has been proposed in the literature on errors in variables, as will be discussed later. This type of estimator does not easily lend itself to mathematical manipulation, and some of its properties in repeated applications are difficult to determine.
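Estimators 2 and 3 in the list above are mechanical enough to code directly. The Python sketch below is one possible reading of them; the function names `endpoint_line` and `group_average_line` are invented here, and the five observations are those of the numerical example in Table 1.6 of this chapter.

```python
import numpy as np


def endpoint_line(x, y):
    """Estimator 2: line through the leftmost and rightmost points."""
    lo, hi = np.argmin(x), np.argmax(x)
    b = (y[hi] - y[lo]) / (x[hi] - x[lo])
    a = y[lo] - b * x[lo]
    return a, b


def group_average_line(x, y, m):
    """Estimator 3: line through the averages of the m leftmost and m rightmost points."""
    order = np.argsort(x)                  # indices sorted by X value
    left, right = order[:m], order[-m:]
    x_lo, y_lo = x[left].mean(), y[left].mean()
    x_hi, y_hi = x[right].mean(), y[right].mean()
    b = (y_hi - y_lo) / (x_hi - x_lo)
    a = y_lo - b * x_lo
    return a, b


# Data of the chapter's numerical example (Table 1.6).
x = np.array([2.0, 3.0, 1.0, 5.0, 9.0])
y = np.array([4.0, 7.0, 3.0, 9.0, 17.0])

a2, b2 = endpoint_line(x, y)
a3, b3 = group_average_line(x, y, m=2)
print(f"endpoint line:      a = {a2:.3f}, b = {b2:.3f}")
print(f"group-average line: a = {a3:.3f}, b = {b3:.3f}")
```

Both functions return a single, reproducible answer for a given data set, which is the property that estimator 1 (fitting by eye) lacks.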
1.4.3 Least-Squares Estimators
The dominant and powerful estimating principle, which emerged in the early years of the nineteenth century for this and other problems, is that of least squares.¹⁴ Let the residuals from any fitted straight line be denoted by

$$e_i = Y_i - \hat{Y}_i = Y_i - a - bX_i \qquad i = 1, 2, \ldots, n \tag{1.25}$$

From the definition of $\hat{Y}_i$ and from Fig. 1.6 these residuals are seen to be measured in the vertical (Y) direction. Each pair of a, b values defines a different line and hence a different set of residuals. The residual sum of squares is thus a function of a and b. The least squares principle is

Select a, b to minimize the residual sum of squares.

$$\text{RSS} = \sum e_i^2 = f(a, b)$$

The necessary conditions for a stationary value of RSS are¹⁵

$$\frac{\partial(\sum e^2)}{\partial a} = -2\sum(Y - a - bX) = -2\sum e = 0 \tag{1.26}$$

and

$$\frac{\partial(\sum e^2)}{\partial b} = -2\sum X(Y - a - bX) = -2\sum Xe = 0 \tag{1.27}$$

Simplifying gives the normal equations for the linear regression of Y on X. That is,

$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2 \tag{1.28}$$

The reason for the adjective normal will become clear when we discuss the geometry of least squares later.
The first normal equation may be rewritten as

$$a = \bar{Y} - b\bar{X} \tag{1.29}$$

FIGURE 1.6
Residuals from a fitted straight line.

¹⁴ See again the unfolding story in Stephen M. Stigler, The History of Statistics, Harvard University Press, 1986.
¹⁵ In obtaining the derivatives we leave the summation sign in place and differentiate the typical term with respect to a and b in turn, and simply observe the rule that any constant can be moved in front of the summation sign but anything that varies from one sample point to another must be kept to the right of the summation sign. Finally, we have dropped the subscripts and range of summation since there is no ambiguity. Strictly speaking, one should also distinguish between the a and b values that appear in the expression to be minimized and the specific values that actually do minimize the residual sum of squares, but again there is little risk of ambiguity and we have kept the expressions uncluttered.
Substituting for a in the second normal equation gives

$$b = \frac{\sum xy}{\sum x^2} = r\,\frac{s_y}{s_x} \tag{1.30}$$

Thus, the least-squares slope may first of all be estimated by Eq. (1.30) from the sample deviations, and the intercept then obtained from substituting for b in Eq. (1.29). Notice that these two expressions have exactly the same form as those given in Eq. (1.17) for the intercept and slope of the conditional mean in the bivariate normal distribution. The only difference is that Eqs. (1.29) and (1.30) are in terms of sample statistics, whereas Eq. (1.17) is in terms of population statistics.
To summarize, the least-squares line has three important properties. It minimizes the sum of the squared residuals. It passes through the mean point $(\bar{X}, \bar{Y})$, as shown by Eq. (1.29). Finally, the least-squares residuals have zero correlation in the sample with the values of X.¹⁶
The disturbance variance $\sigma^2$ cannot be estimated from a sample of u values, since these depend on the unknown $\alpha$ and $\beta$ values and are thus unobservable. An estimate can be based on the calculated residuals (the $e_i$). Two possibilities are $\sum e^2/n$ or $\sum e^2/(n - 2)$. For reasons to be explained in Chapter 3 the usual choice is

$$s^2 = \frac{\sum e^2}{n - 2} \tag{1.31}$$
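A short numerical sketch may help fix the least-squares recipe. It solves the two normal equations of Eq. (1.28) as a linear system, then checks the three properties of the fitted line and computes the variance estimate with the n - 2 divisor. The data are the five observations of the chapter's numerical example (Table 1.6); the use of NumPy and the variable names are choices made here.

```python
import numpy as np

# Data of the chapter's numerical example (Table 1.6).
x = np.array([2.0, 3.0, 1.0, 5.0, 9.0])
y = np.array([4.0, 7.0, 3.0, 9.0, 17.0])
n = len(x)

# Normal equations (1.28) written as a 2 x 2 linear system in (a, b):
#   sum(Y)  = n*a      + b*sum(X)
#   sum(XY) = a*sum(X) + b*sum(X^2)
lhs = np.array([[n, x.sum()],
                [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(lhs, rhs)

# Residuals as defined in Eq. (1.25).
e = y - a - b * x

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"sum of residuals      = {e.sum():.2e}")         # Eq. (1.26): zero
print(f"sum of X * residuals  = {(x * e).sum():.2e}")   # Eq. (1.27): zero
print(f"line at X-bar         = {a + b * x.mean():.4f}, Y-bar = {y.mean():.4f}")

# Estimate of the disturbance variance with the n - 2 divisor.
s2 = (e**2).sum() / (n - 2)
print(f"s^2 = {s2:.4f}")
```

The two sums printed as approximately zero are the sample counterparts of the first-order conditions, which is why they hold for any data set and not just this one.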
1.4.4 Decomposition of the Sum of Squares
Using Eqs. (1.25) and (1.29), one may express the residuals in terms of the x, y deviations, namely

$$e_i = y_i - bx_i \tag{1.32}$$

Squaring both sides, followed by summing over the sample observations, gives

$$\sum e^2 = \sum y^2 - 2b\sum xy + b^2\sum x^2$$

The residual sum of squares is thus seen to be a quadratic function of b. Since $\sum x^2 \geq 0$, and the equality would only hold in the pathological case of zero variation in the X variable, the single stationary point is necessarily a minimum. Substitution from Eq. (1.30) gives

$$\sum y^2 = b^2\sum x^2 + \sum e^2 = b\sum xy + \sum e^2 = r^2\sum y^2 + \sum e^2 \tag{1.33}$$

This famous decomposition of the sum of squares is usually written as

TSS = ESS + RSS

where¹⁷

TSS = total sum of squared deviations in the Y variable
RSS = residual, or unexplained, sum of squares from the regression of Y on X
ESS = explained sum of squares from the regression of Y on X

The last line of Eq. (1.33) may be rearranged to give

$$r^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = \frac{\text{ESS}}{\text{TSS}} \tag{1.34}$$

Thus, $r^2$ may be interpreted as the proportion of the Y variation attributable to the linear regression on X. Equation (1.34) provides an alternative demonstration that the limits of r are ±1 and that in the limiting case the sample points all lie on a single straight line.

¹⁶ $\sum Xe = \sum(x + \bar{X})e = \sum xe + \bar{X}\sum e = \sum xe$ using Eq. (1.26). Since $\sum Xe = 0$ by Eq. (1.27), it follows that $\sum xe = 0$, so the sample covariance between X and e is zero.
1.4.5
A Numerical Example
Table 1.6 gives some simple data to illustrate the application of these formulae. Substitution in Eq. (1.28) then gives the normal equations

40 = 5a + 20b
230 = 20a + 120b

with solution

$$\hat{Y} = 1 + 1.75X$$

The same data in deviation form are shown in Table 1.7. The regression coefficients may be obtained from

$$b = \frac{\sum xy}{\sum x^2} = \frac{70}{40} = 1.75$$

and

$$a = \bar{Y} - b\bar{X} = 8 - 1.75(4) = 1$$

The explained sum of squares may be calculated as

$$\text{ESS} = b\sum xy = 1.75(70) = 122.5$$

and the residual sum of squares is given by subtraction as

RSS = TSS - ESS = 124 - 122.5 = 1.5

Finally, the proportion of the Y variation explained by the linear regression is

$$r^2 = \frac{\text{ESS}}{\text{TSS}} = \frac{122.5}{124} = 0.9879$$

TABLE 1.6

          X      Y      XY     X²      Ŷ        e        Xe
          2      4       8      4     4.50    -0.50    -1.00
          3      7      21      9     6.25     0.75     2.25
          1      3       3      1     2.75     0.25     0.25
          5      9      45     25     9.75    -0.75    -3.75
          9     17     153     81    16.75     0.25     2.25
Sums     20     40     230    120    40        0        0

TABLE 1.7

          x      y      xy     x²     y²       ŷ        e        xe
         -2     -4       8      4     16     -3.50    -0.50     1.00
         -1     -1       1      1      1     -1.75     0.75    -0.75
         -3     -5      15      9     25     -5.25     0.25    -0.75
          1      1       1      1      1      1.75    -0.75    -0.75
          5      9      45     25     81      8.75     0.25     1.25
Sums      0      0      70     40    124      0        0        0

¹⁷ Unfortunately there is no uniform notation for sums of squares. Some authors use SSR to indicate the sum of squares due to the regression (our ESS), and SSE to indicate the sum of squares due to error (our RSS).
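The arithmetic of this example is easy to reproduce in deviation form. The Python sketch below recomputes the slope, the three sums of squares, and $r^2$ from the five (X, Y) pairs; the variable names are chosen here, and the cross-check against `np.corrcoef` is an addition for illustration.

```python
import numpy as np

# The five (X, Y) observations of the numerical example.
X = np.array([2.0, 3.0, 1.0, 5.0, 9.0])
Y = np.array([4.0, 7.0, 3.0, 9.0, 17.0])

# Deviations from sample means (lower-case x and y in the text).
x = X - X.mean()
y = Y - Y.mean()

b = (x * y).sum() / (x**2).sum()      # Eq. (1.30)
a = Y.mean() - b * X.mean()           # Eq. (1.29)

tss = (y**2).sum()                    # total sum of squares
ess = b * (x * y).sum()               # explained sum of squares
rss = tss - ess                       # residual sum of squares
r2 = ess / tss                        # Eq. (1.34)

# Cross-check: r^2 should equal the squared simple correlation of X and Y.
r = np.corrcoef(X, Y)[0, 1]

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}")
print(f"r^2 = {r2:.4f}, squared correlation = {r**2:.4f}")
```

The residual sum of squares obtained by subtraction agrees with the sum of the squared entries in the e column of the tables.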
1.5
INFERENCE IN THE TWO-VARIABLE, LEAST-SQUARES MODEL
The least-squares (LS) estimators of $\alpha$ and $\beta$ have been defined in Eqs. (1.28) to (1.30). There are now two important questions:
1. What are the properties of these estimators?
2. How may these estimators be used to make inferences about $\alpha$ and $\beta$?
1.5.1
Properties of LS Estimators
The answers to both questions depend on the sampling distribution of the LS estimators. A sampling distribution describes the behavior of the estimator(s) in repeated applications of the estimating formulae. A given sample yields a specific numerical estimate. Another sample from the same population will yield another numerical estimate. A sampling distribution describes the results that will be obtained for the estimator(s) over the potentially infinite set of samples that may be drawn from the population.
The parameters of interest are $\alpha$, $\beta$, and $\sigma^2$ of the conditional distribution, $f(Y \mid X)$. In that conditional distribution the only source of variation from one hypothetical sample to another is variation in the stochastic disturbance (u), which in conjunction with the given X values will determine the Y values and hence the sample values of a, b, and $s^2$. Analyzing Y conditional on X thus treats the $X_1, X_2, \ldots, X_n$ values as fixed in repeated sampling. This treatment rests on the implicit assumption that the marginal distribution for X, that is, f(X), does not involve the parameters of interest or, in other words, that f(X) contains no information on $\alpha$, $\beta$, and $\sigma^2$. This is called the fixed regressor case, or the case of nonstochastic X. From Eq. (1.30) the LS slope may be written

$$b = \sum w_i y_i$$

where the weights $w_i$ are given by

$$w_i = \frac{x_i}{\sum x_i^2} \tag{1.35}$$
These weights are fixed in repeated sampling and have the following properties:

$$\sum w_i = 0, \qquad \sum w_i^2 = \frac{1}{\sum x_i^2}, \qquad\text{and}\qquad \sum w_i x_i = \sum w_i X_i = 1 \tag{1.36}$$

It then follows that

$$b = \sum w_i Y_i \tag{1.37}$$

so that the LS slope is a linear combination of the Y values.
The sampling distribution of b is derived from Eq. (1.37) by substituting $Y_i = \alpha + \beta X_i + u_i$ and using the stochastic properties of u to determine the stochastic properties of b. Thus,

$$b = \alpha\left(\sum w_i\right) + \beta\left(\sum w_i X_i\right) + \sum w_i u_i = \beta + \sum w_i u_i \tag{1.38}$$

and so

$$E(b) = \beta \tag{1.39}$$

that is, the LS slope is an unbiased estimator of $\beta$. From Eq. (1.38) the variance of b is seen to be

$$\operatorname{var}(b) = E[(b - \beta)^2] = E\left[\left(\sum w_i u_i\right)^2\right]$$

From the properties of the w's it may be shown¹⁸ that

$$\operatorname{var}(b) = \frac{\sigma^2}{\sum x^2} \tag{1.40}$$
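The idea of a sampling distribution with fixed X can be made concrete by simulation. The sketch below holds five X values fixed, draws many sets of disturbances, and compares the simulated mean and variance of b with $\beta$ and $\sigma^2/\sum x^2$. The parameter values, the seed, the number of replications, and the use of normally distributed disturbances are all assumptions made for illustration; the results above require only that the disturbances be iid$(0, \sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(123)

# Fixed regressor values (held constant across all hypothetical samples).
X = np.array([2.0, 3.0, 1.0, 5.0, 9.0])
alpha, beta, sigma = 1.0, 1.75, 1.0     # hypothetical population parameters

# Weights of Eq. (1.35), built from deviations of X about its mean.
x = X - X.mean()
w = x / (x**2).sum()

# Properties of the weights listed in Eq. (1.36).
print(f"sum w     = {w.sum():.2e}")
print(f"sum w^2   = {(w**2).sum():.4f}  (1/sum x^2 = {1 / (x**2).sum():.4f})")
print(f"sum w * X = {(w * X).sum():.4f}")

# Repeated sampling: only the disturbances change from sample to sample.
n_samples = 20_000
u = rng.normal(0.0, sigma, size=(n_samples, X.size))
Y = alpha + beta * X + u

# Eq. (1.37): b is a linear combination of the Y values, one b per sample.
b = Y @ w

print(f"mean of b = {b.mean():.4f}  (beta = {beta})")
print(f"var of b  = {b.var():.4f}  (sigma^2 / sum x^2 = {sigma**2 / (x**2).sum():.4f})")
```

With a finite number of replications the simulated moments only approximate the theoretical ones; increasing `n_samples` tightens the agreement.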
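By similar methods it may be shown¹⁹ that

$$E(a) = \alpha \tag{1.41}$$

and

$$\operatorname{var}(a) = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{\sum x^2}\right] \tag{1.42}$$

These four formulae give the means and variances of the marginal distributions of a and b. The two estimators, however, are in general not stochastically independent,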