6.3 Serial correlation in the error distribution
6.3.2 FGLS estimation with serial correlation
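For AR(1) disturbances of (6.15), if ρ were known, we could estimate the coefficients by GLS. The form of $\Sigma_u$ displayed in (6.3) simplifies when we consider first-order serial correlation with one parameter ρ. An analytical inverse of $\Sigma_u$ may be derived as
\[
\Sigma_u^{-1} = \sigma_u^{-2}
\begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & \cdots & -\rho & 1+\rho^2 & -\rho \\
0 & 0 & \cdots & 0 & -\rho & 1
\end{pmatrix}
\tag{6.16}
\]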
As with heteroscedasticity, we do not explicitly construct and apply this matrix. Rather, we implement GLS by transforming the original data and running a regression on the transformed data. For observations 2, . . . , T, we quasidifference the data: $y_t - \rho y_{t-1}$, $x_{j,t} - \rho x_{j,t-1}$, and so on. The first observation is multiplied by $\sqrt{1 - \rho^2}$.
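The transformation just described can be sketched by hand in Stata. This is only an illustration under stated assumptions: y and x are placeholder variable names, the data are tsset with no gaps and sorted by the time variable, and the value 0.5 stored in the scalar rhoval is arbitrary, since in practice ρ is unknown.

```stata
* Sketch only: y, x, and rhoval are hypothetical names, not from the text.
* Assumes the data are tsset, sorted by time, and contain no gaps.
scalar rhoval = 0.5

* Observations 2,...,T: quasi-difference every variable, including the constant
generate double ystar = y - rhoval*L.y
generate double xstar = x - rhoval*L.x
generate double cstar = 1 - rhoval

* Observation 1: scale by sqrt(1 - rho^2) rather than losing it to the lag
replace ystar = sqrt(1 - rhoval^2)*y in 1
replace xstar = sqrt(1 - rhoval^2)*x in 1
replace cstar = sqrt(1 - rhoval^2)   in 1

* OLS on the transformed data; cstar takes the place of the intercept
regress ystar xstar cstar, noconstant
```

The coefficient on cstar estimates the original intercept, because the constant term is transformed in the same way as every other regressor.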
The GLS estimator is not feasible because ρ is an unknown population parameter, just like β and $\sigma_u^2$. Replacing the unknown ρ values above with a consistent estimate and computing $\hat{\Sigma}_u$ yields the FGLS estimator. As with heteroscedasticity, the OLS residuals from the original model may be used to generate the necessary estimate. The Prais and Winsten (1954) estimator uses an estimate of ρ based on the OLS residuals to compute $\hat{\Sigma}_u^{-1}$ by (6.16). The closely related Cochrane and Orcutt (1949) variation on that estimator differs only in its treatment of the first observation of the transformed data, given the estimate of ρ from the regression residuals. Either estimator may be iterated to convergence: essentially, they alternate between estimating β given ρ and estimating ρ given β. Iteration is optional, but it refines the estimate of ρ and is strongly recommended in small samples. Both estimators are available in Stata with the prais command.
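As a brief sketch of how these variants might be requested, the commands below use the corc and twostep options of prais. They assume the interest-rate data used in this chapter (variables rs and r20) are in memory and have been tsset; nothing else about the dataset is assumed.

```stata
* Assumes the chapter's interest-rate data are in memory and tsset.

* Prais-Winsten, iterated to convergence (the default); nolog hides the iteration log
prais D.rs LD.r20, nolog

* Prais-Winsten, stopping after the first estimate of rho
prais D.rs LD.r20, twostep

* Cochrane-Orcutt variant: the first observation is not used
prais D.rs LD.r20, corc nolog
```

Comparing the estimated rho and the number of observations across the three sets of results shows how much the iteration and the handling of the first observation matter in a given sample.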
Other approaches include maximum likelihood, which simultaneously estimates one parameter vector $(\beta', \sigma^2, \rho)$, and the grid search approach of Hildreth and Lu (1960). Although you could argue for the superiority of a maximum likelihood approach, Monte Carlo studies suggest that the Prais-Winsten estimator is nearly as efficient in practice as maximum likelihood.
I illustrate the Prais-Winsten estimator by using the monetary policy reaction function displayed above. FGLS on this model finds a value of ρ of 0.19 and a considerably smaller coefficient on the lagged change in the long-term interest rate than that of our OLS estimate.
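. prais D.rs LD.r20, nolog

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     524
-------------+------------------------------           F(  1,   522) =   25.73
       Model |  6.56420242     1  6.56420242           Prob > F      =  0.0000
    Residual |  133.146932   522   .25507075           R-squared     =  0.0470
-------------+------------------------------           Adj R-squared =  0.0452
       Total |  139.711134   523    .2671341           Root MSE      =  .50505

------------------------------------------------------------------------------
        D.rs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         r20 |
         LD. |   .3495857    .068912     5.07   0.000     .2142067    .4849647
       _cons |   .0049985   .0272145     0.18   0.854    -.0484649    .0584619
-------------+----------------------------------------------------------------
         rho |   .1895324
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.702273
Durbin-Watson statistic (transformed) 2.007414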
In summary, although we may use FGLS to deal with autocorrelation, we should always be aware that this diagnosis may reflect misspecification of the model’s dynamics or omission of one or more key factors from the model. We may mechanically correct for first-order serial correlation in a model, but we then attribute this persistence to some sort of clockwork in the error process rather than explaining its existence. Applying FGLS as described here is suitable for AR(1) errors but not for higher-order AR(p) errors or moving-average (MA) error processes, both of which may be encountered in practice.
Regression equations with higher-order AR errors or MA errors can be modeled by using Stata’s arima command.
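A short sketch of what such arima specifications might look like for the reaction function used above follows. It assumes the same interest-rate data (rs and r20) are in memory and tsset; the particular lag orders are illustrative choices, not recommendations from the text.

```stata
* Assumes the chapter's interest-rate data are in memory and tsset.
* Lag orders below are illustrative only.

* Regression with AR(2) errors
arima D.rs LD.r20, ar(1/2)

* Regression with MA(1) errors
arima D.rs LD.r20, ma(1)

* Regression with ARMA(1,1) errors
arima D.rs LD.r20, ar(1) ma(1)
```

Because arima estimates by maximum likelihood, its AR(1) results need not match those of prais exactly, although they should usually be close.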
Exercises
1. Use the cigconsump dataset, retaining only years 1985 and 1995. Regress lpackpc on lavgprs and lincpc. Use the Breusch-Pagan test (hettest) for variable year. Save the residuals, and use robvar to compute their variances by year. What do these tests tell you?
2. Use FGLS to refit the model, using analytical weights based on the residuals from each year. How do these estimates differ from the OLS estimates?
3. Use the sp500 dataset installed with Stata (sysuse sp500), applying tsset date. Regress the first difference of close on two lagged differences and lagged volume. How do you interpret the coefficient estimates? Use the Breusch-Godfrey test to evaluate the errors’ independence. What do you conclude?
4. Refit the model with FGLS (using prais). How do the FGLS estimates compare to those from OLS?
Chapter 7
Regression with indicator variables
One of the most useful concepts in applied economics is the indicator variable, which signals the presence or absence of a characteristic. Indicator variables are also known as binary or Boolean variables and are well known to econometricians as dummy variables (although the meaning of that latter term is shrouded in the mists of time). Here we consider how to use indicator variables
• to evaluate the effects of qualitative factors;
• in models that mix quantitative and qualitative factors;
• in seasonal adjustment; and
• to evaluate structural stability and test for structural change.
7.1 Testing for significance of a qualitative factor
Economic data come in three varieties: quantitative (or cardinal), ordinal (or ordered), and qualitative.¹ In chapter 3, I described the first category as continuous data to stress that their values are quantities on the real line that may conceptually take on any value. We also may work with ordinal or ordered data. They are distinguished from cardinal measurements in that an ordinal measure can express only inequality of two items and not the magnitude of their difference; for example, a Likert scale of “How good a job has the president done? 5 = great, 4 = good, 3 = fair, 2 = poor, 1 = very poor” will generate ordered numeric responses. A response of 5 beats 4, which in turn beats 3 for voter satisfaction. But we cannot state that a respondent answering 5 is five times more likely to support the president than a voter responding 1, nor 25% more likely than a respondent answering 4, and so on. The numbers can be taken only as ordered. They could be any five ordered points on the real line (or the set of integers). The implication: if data are actually ordinal rather than cardinal, we should not mistake them for cardinal measures and should not use them as a response variable or as a regressor in a linear regression model.
¹ I discuss censored data in chapter 10.
In contrast, we often encounter economic data that are purely qualitative, lacking any obvious ordering. If these data are coded as string variables, such as M and F for survey respondents’ genders, we are not likely to mistake them for quantitative values. We hope that few researchers would contemplate using five-digit zip codes (U.S. postal codes) in a quantitative setting. But where a quality may be coded numerically, there is the potential to misuse this qualitative factor as quantitative. This misuse is, of course, nonsensical: as described in section 2.2.4, we can encode a two-letter U.S. state code (AK, AL, AZ, . . . , WY) into a set of integers 1, . . . , 50 for ease of manipulation, but we should never take those numeric values as quantitative measures.
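The distinction can be made concrete with a small sketch. The variable names here (state, y, x) are hypothetical, and the code assumes state is a string variable holding exactly 50 distinct two-letter codes.

```stata
* Hypothetical variables: state (string, e.g. "AK"), y, x.
* Assumes exactly 50 distinct values of state.

* Integer codes 1,...,50, assigned in alphabetical order, with value labels
encode state, generate(stateid)

* One 0/1 indicator per state: st1, st2, ..., st50
tabulate stateid, generate(st)

* Not meaningful: this would treat the arbitrary integer code as a quantity
* regress y x stateid

* Meaningful: the states enter through indicators (st1 is the omitted category)
regress y x st2-st50
```

Reordering the states would change the coefficient on stateid in the commented-out regression, which is one way to see that the integer codes carry no quantitative information.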
How should we evaluate the effects of purely qualitative measures? Since the answer to this question will apply largely to ordinal measures as well, it may be taken to cover all nonquantitative economic and financial data. To test the hypothesis that a qualitative factor has an effect on a response variable, we must convert the qualitative factor into a set of indicator variables, or dummy variables. Following the discussion in section 4.5.3, we then conduct a joint test on their coefficients. If the hypothesis to be tested includes one qualitative factor, the estimation problem may be described as a one-way analysis of variance (ANOVA). Economic researchers recognize that ANOVA models may be expressed as linear regressions on an appropriate set of indicator variables.²
The equivalence of one-way ANOVA and linear regression on a set of indicator variables that correspond to one qualitative factor generalizes to multiple qualitative factors. If two qualitative factors (e.g., race and sex) are hypothesized to affect income, an economic researcher would regress income on two appropriate sets of indicator variables, each representing one of the qualitative factors. If we include one or many qualitative factors in a model, we will estimate a linear regression on several indicator (dummy) variables.
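The equivalence for one qualitative factor can be checked with a brief sketch. It uses the auto dataset shipped with Stata rather than data from this chapter, and treats the repair record rep78 (five categories) as the qualitative factor; that choice is only for illustration.

```stata
* Illustration with Stata's shipped auto dataset (not data from this chapter)
sysuse auto, clear

* One indicator per category of rep78: rep1, ..., rep5
tabulate rep78, generate(rep)

* Regression on the indicators; rep1 is the excluded category
regress price rep2 rep3 rep4 rep5

* Joint test that the qualitative factor has no effect
test rep2 rep3 rep4 rep5

* One-way ANOVA of price by rep78
oneway price rep78
```

The F statistic from test, the overall F statistic reported by regress, and the F statistic reported by oneway should coincide, since all three test the same hypothesis that mean price does not differ across the categories of rep78.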
² Stata’s anova command has a regress option that presents the results of ANOVA models in a regression framework.