Effect of Regressor Forecast Error on the
Variance of Regression Forecasts
LEONARD J. TASHMAN1*, THORODD BAKKEN2 AND JEFFREY BUZAS1
1University of Vermont, USA 2Citibank International, Norway
ABSTRACT
It is well understood that the standard formulation for the variance of a regression-model forecast produces interval estimates that are too narrow, principally because it ignores regressor forecast error. While the theoretical problem has been addressed, there has not been an adequate explanation of the effect of regressor forecast error, and the empirical literature has supplied a disparate variety of bits and pieces of evidence. Most business-forecasting software programs continue to supply only the standard formulation. This paper extends existing analysis to derive and evaluate large-sample approximations for the forecast error variance in a single-equation regression model. We show how these approximations substantially clarify the expected effects of regressor forecast error. We then present a case study, which (a) demonstrates how rolling out-of-sample evaluations can be applied to obtain empirical estimates of the forecast error variance, (b) shows that these estimates are consistent with our large-sample approximations and (c) illustrates, for 'typical' data, how seriously the standard formulation can understate the forecast error variance. Copyright © 2000 John Wiley & Sons, Ltd.
KEY WORDS regression; regressor; ex ante versus ex post forecasts; forecast error variance; relative variance; prediction interval; out-of-sample; rolling evaluation
INTRODUCTION
It is well understood that the standard formulation for the variance of a regression-model forecast (hence Standard) produces interval estimates that are too narrow. Articulate statements of this problem can be found, among many other places, in Adams (1987, p. 139), Fildes (1985, p. 563), Intrilligator (1978, p. 517), Levenbach and Cleary (1984, p. 241), and Newbold and Bos (1994, p. 8). The source of difficulty, in large part, is that the Standard assumes that future values of each regressor are known with certainty.
* Correspondence to: Leonard J. Tashman, School of Business Administration, University of Vermont, Burlington, VT 05405, USA. E-mail: [email protected]
The error variance of an h-step-ahead forecast from origin T is represented, for the case of a single regressor, by equation (0), the Standard (Diebold, 1998, p. 293). Incorporated here is uncertainty due to (a) random variation about the true regression function, $\sigma^2$, and (b) estimation (sampling) error in the regression coefficients. The latter component, in turn, depends also upon sample size and the deviations of future values of the regressor from their in-sample mean.
$$\text{The Standard} = \sigma^2\left[1 + \frac{1}{n} + \frac{(x_{T+h}-\bar{x})^2}{\sum_t (x_t-\bar{x})^2}\right] \qquad (0)$$
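Equation (0) is easy to evaluate directly. The following is a minimal sketch in Python, not taken from any of the software packages discussed; the function name and the toy data are assumptions introduced purely for illustration.

```python
def standard_forecast_variance(x, sigma2, x_future):
    """Equation (0): sigma^2 * [1 + 1/n + (x_future - xbar)^2 / sum((x_t - xbar)^2)].

    x        : in-sample values of the single regressor
    sigma2   : variance of the regression error (in practice, its estimate)
    x_future : value of the regressor assumed for period T+h
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xt - xbar) ** 2 for xt in x)
    return sigma2 * (1.0 + 1.0 / n + (x_future - xbar) ** 2 / sxx)


# Hypothetical data: five in-sample observations of the regressor.
x_hist = [1.0, 2.0, 3.0, 4.0, 5.0]
at_mean = standard_forecast_variance(x_hist, sigma2=2.0, x_future=3.0)
far_out = standard_forecast_variance(x_hist, sigma2=2.0, x_future=7.0)
print(at_mean, far_out)
```

The variance is smallest when the assumed future value equals the in-sample mean and grows as it moves away. Nothing in the calculation depends on how uncertain that future value is.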
However, error in forecasting out-of-sample values of the regressor (hence, regressor forecast error) introduces an additional source of uncertainty. Hence the forecast error variance is necessarily larger than that given by (0). The Standard also fails to reflect the likelihood that the uncertainty associated with out-of-sample forecasts of the regressor will increase with the lead time of these forecasts.
Many textbooks take care to note that the Standard applies to forecasts that are conditional on the assumed values of the regressor. (See, for example, Diebold, 1998, p. 291.) No such admonitions, however, are to be found in the manuals of six well-known forecasting programs. (See Software List, before the Reference list.) The concern here is that practitioners insouciantly report prediction intervals that are too narrow, perhaps severely so.
Two key issues emerge. How much greater will the forecast error variance be (or how much wider will a prediction interval be) if it is also to account for regressor forecast error? Can variance inflation factors be estimated and used as adjustments to the Standard?
The empirical literature, reviewed in the next section, supplies bits and pieces of often disparate evidence, mostly gleaned from comparisons of ex post (X known out-of-sample) and ex ante (out-of-sample values of X must be forecast) scenarios (see, for example, Osborn and Teal, 1979; Jarrett, 1990). However, because they have not systematically linked measures of forecast error in the dependent variable to regressor forecast error, these studies have not provided useful guidance to practitioners of regression forecasting. In contrast, Feldstein's early analysis (Feldstein, 1971) of the case of stochastic regressors provides a useful starting point.
For the dependent variable, let us define the relative forecast error variance (hence rfev) as the ratio of the ex ante to ex post forecast error variance. The denominator, the ex post forecast error variance, is the Standard.
In the third section, we extend the Feldstein analysis to derive large-sample approximations for the rfev. These substantially clarify the relationship between forecast error in the dependent variable and regressor forecast error.
We then describe a methodology for estimating the rfev from out-of-sample forecast errors. Our methodology employs matching, rolling out-of-sample forecasts of the regressors and the dependent variable.
A case study is presented to illustrate, for 'typical' data, how seriously the Standard can understate the ex ante forecast error variance.
If our large-sample approximation of the rfev is adequate, straightforward adjustments to the Standard can be made to account for regressor forecast error. We use the case study data to test the large-sample approximations against purely empirical measures of the variance of out-of-sample forecast errors. The results show a close match.
PRIOR EVIDENCE
There is ample evidence that regressor forecast error can have serious implications. Jacobs and Sterken (1994) compare ex post and ex ante scenarios for several variables in their macroeconomic model, GUESS, and conclude that 'forecast errors to a large extent can be attributed to wrong assumptions for exogenous variables' (p. 17). Ashley's (1983, 1988) studies demonstrate that, for use of macroeconomic variables as regressors in forecasting models, the incremental error often is so severe as to make inclusion of the regressor in a model more harmful than beneficial. Bassin's (1987) analysis of the errors in regression-model forecasts of quarterly shipments series in 15 industries found that the mean absolute percent forecast errors (MAPEs) were twice as large on average when the (largely macroeconomic) regressors were forecast ex ante as when known values were assumed ex post. In this study, the ex ante forecasts of the regressors were econometric forecasts obtained from Data Resources, Inc. and all forecast error measures represented averages across the forecast horizon of 1–12 quarters. Geriner and Ord (1989) performed ex ante versus ex post evaluations to compare bivariate against univariate forecasting models. The ex post forecasts were made using the known, post-sample values of the explanatory variables. The ex ante forecasts were based on univariate ARIMA projections of the regressor. For their four annual data series, ex ante forecasting accuracy is substantially worse than ex post, at both short and long horizons. For a one-period-ahead forecast, the ex ante measure is 60% higher than the ex post. For the average of 1–6 periods ahead, the ex ante measure is 25 times as large. Curiously, however, for their two monthly and four quarterly series, ex ante forecasting accuracy is no worse than ex post.
In seeming contradiction to the studies cited above, Armstrong (1985) writes that of 13 published studies he found (all from the 1960s and 1970s), ten 'mysteriously' showed that ex ante forecasting accuracy was at least as good as, if not better than, ex post accuracy. His inference: 'The point is quickly reached where greater accuracy in forecasting the causal variables does not lead to greater accuracy in forecasting the dependent variable' (p. 241).
Armstrong's surmise raises additional questions. In the studies cited, how serious were the magnitudes of regressor forecast error? Are there diminishing returns to improving regressor forecasting accuracy? Finally, what sort of invisible hand is at work that benevolently offsets the effect of regressor forecast error, leaving ex ante accuracy no worse (indeed sometimes better) than ex post accuracy?
Our analysis in the next section will show that, when forecasting is done 'automatically' (no user judgement is input), there is no invisible hand at work; that is, in terms of expectations, the uncertainty imparted by regressor forecast error must increase the forecast error variance. Individual departures from the mathematical expectation are, of course, possible. In two of the 15 industries examined by Bassin (1987), the ex ante MAPE was below the ex post MAPE. We do find support for Armstrong's diminishing returns argument, as will be shown in the next section.
RELATIVE FORECAST ERROR VARIANCE
Making the traditional assumptions that underpin the classical regression model and, in addition, assuming independence of (a) model errors from (b) errors in forecasting the regressors, Feldstein (1971) derives a general expression for the ex ante forecast error variance in a single-equation regression model (Eqs (3) or (4), p. 56). Pindyck and Rubinfeld present a simplified version of the Feldstein equation, in which there is but a single explanatory variable (1991, Eq. (8.46), p. 197). Neither Feldstein nor Pindyck and Rubinfeld show explicitly how the forecast error variance is affected by the magnitudes of regressor forecast error.
To clarify the relationship between the forecast error variance and regressor forecast error, we have derived large-sample approximations for the rfev, again based upon the single-equation regression model that satisfies the traditional assumptions of zero mean and constant variance. The key additional assumptions needed for our derivation are that regressor forecast errors are uncorrelated (a) among themselves and (b) with model errors. A useful property of our expressions is that they are independent of the measurement scales of both the dependent variable and the regressors and thus are relatively easy to interpret.
Let the dependent variable $y_t$ be related to the vector of regressors $\tilde{x}_t = (x_{1t}, x_{2t}, x_{3t}, \ldots, x_{kt})$,
$$y_t = \tilde{x}_t\beta + \varepsilon_t, \qquad t = 1, \ldots, T$$
where $\beta$ is a vector of regression coefficients and $\varepsilon_t$ is random error with mean zero and variance $\sigma^2$. In our large-sample derivations (Appendix A), it will not be necessary to assume that the errors $\{\varepsilon_t\}_{t=1}^{T}$ are uncorrelated.
The regressors $\tilde{x}_t$ are assumed to be stochastic with mean $\mu_x$ and covariance matrix $\Sigma_x$. Let $\hat{\beta}$ represent an estimate of $\beta$ from the observations $\{y_t, \tilde{x}_t\}_{t=1}^{T}$. Typically $\hat{\beta}$ is the ordinary least squares estimator.
The h-step-ahead forecast, $\hat{y}_{T+h}$, for the dependent variable when the vector $\tilde{x}_{T+h}$ is known is
$$\hat{y}_{T+h} = \tilde{x}_{T+h}\hat{\beta}$$
and the C% prediction interval for $y_{T+h}$ is $\hat{y}_{T+h} \pm t_{n-k,C}\,\sigma(\hat{y}_{T+h})$, where
$$\sigma(\hat{y}_{T+h}) = \sqrt{E(y_{T+h}-\hat{y}_{T+h})^2}$$
is the standard deviation of the forecast error and $t_{n-k,C}$ is the critical value from the t-distribution with $n-k$ degrees of freedom and confidence level C. The exact expression of $\sigma(\hat{y}_{T+h})$ will depend on how $\hat{\beta}$ is estimated and on assumptions about the correlation structure of $\{\varepsilon_t\}_{t=1}^{T}$.
When the explanatory variables themselves must be forecast, we let $\hat{x}_{T+h}$ represent the vector of forecasts from time T. We assume that
$$\hat{x}_{T+h} = \tilde{x}_{T+h} + u_{T+h}$$
where the vector $u_{T+h}$ denotes the forecast errors. We assume that $u_{T+h}$ has mean zero and covariance matrix
$$\Sigma_u = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & \sigma^2_{u_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_{u_k} \end{pmatrix} = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & p_1\sigma^2_{x_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p_k\sigma^2_{x_k} \end{pmatrix}$$
where $p_j$ represents the ratio of the forecast error variance for $x_j$ to the variance of $x_j$; that is,
$$p_j = \frac{\sigma^2_{u_j}}{\sigma^2_{x_j}}$$
The $p_j$ represent the portion of the variance of x that is unexplained by the forecasting model for x. The form of the covariance matrix follows from the assumption that regressor forecast errors are uncorrelated among themselves.
It is worth noting that if one restricts attention to the case in which regressors are forecast from an autoregressive process, then our assumption that regressor forecast errors are uncorrelated implies that the regressors themselves are uncorrelated. In the more general framework explored here, however, one can assume uncorrelated forecast errors without implying uncorrelated regressors.
The forecast for $y_{T+h}$ when the regressors are forecast is given by:
$$\tilde{y}_{T+h} = \hat{x}_{T+h}\hat{\beta}$$
For large samples and normally distributed forecast error in both the regressor and the model, the error in forecasting the dependent variable is approximately the difference between two independent, normal variables and hence approximately normally distributed. The prediction interval thus is of the form
$$\tilde{y}_{T+h} \pm t_{n-k,C}\,\sigma(\tilde{y}_{T+h})$$
where the forecast standard deviation
$$\sigma(\tilde{y}_{T+h}) = \sqrt{E(y_{T+h}-\tilde{y}_{T+h})^2}$$
contains a component for forecast error in the regressors.
To describe the increase in the prediction interval due to regressor forecast error, we can look at the relative forecast error variance. In Appendix A, we show that
$$\lim_{n\to\infty}\frac{\sigma^2(\tilde{y}_{T+h})}{\sigma^2(\hat{y}_{T+h})} = 1 + \sum_{j=1}^{k}\frac{p_j\, r^2_{y x_j \mid x_{-j}}}{(1-r^2_{x_j \mid x_{-j}})(1-r^2_{y x_j \mid x_{-j}})} \qquad (1)$$
where $r^2_{y x_j \mid x_{-j}}$ represents the population coefficient of partial determination for adding $x_j$ to a model already containing $x_{-j} = (1, x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k)'$, and $r^2_{x_j \mid x_{-j}}$ represents the population coefficient of multiple determination for the regression of $x_j$ on $x_{-j}$. The square root of the right-hand side gives the ratio of the width of the ex ante to ex post prediction interval. We note in Appendix A that examining the ratio of prediction errors in the limit is equivalent to assuming that the regression coefficients are known.
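The right-hand side of equation (1) can be sketched as a short Python function. The function names and the numerical inputs below are hypothetical, chosen only to show the arithmetic; they are not estimates from the case study.

```python
import math


def marginal_effect(p, r2_partial, r2_collinear):
    """One term of the sum in equation (1).

    p            : ratio of regressor forecast error variance to regressor variance
    r2_partial   : partial coefficient of determination for adding x_j
    r2_collinear : coefficient of determination of x_j on the other regressors
    """
    return p * r2_partial / ((1.0 - r2_collinear) * (1.0 - r2_partial))


def rfev(regressors):
    """Large-sample rfev: 1 plus the sum of the marginal effects."""
    return 1.0 + sum(marginal_effect(p, r2p, r2c) for p, r2p, r2c in regressors)


# Hypothetical inputs: (p_j, partial r^2, collinearity r^2) for two regressors.
example = [(0.10, 0.90, 0.20), (0.75, 0.60, 0.20)]
value = rfev(example)
width_ratio = math.sqrt(value)  # ex ante / ex post prediction interval width
print(value, width_ratio)
```

Because the partial and collinearity coefficients are not independent of one another in practice, inputs to such a function should come from one fitted model and not be varied freely.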
TWO SPECIAL CASES OF INTEREST
Single explanatory variable
Consider the simplest case, in which there is a single regressor, $\tilde{x}_t = (1, x_1)'$, and $x_1$ is forecast with error. The right-hand side of equation (1) reduces to
$$1 + \frac{p_1 r^2}{1-r^2} \qquad (2)$$
where $r^2$ is the square of the correlation between y and x.
When sample size is large, the rfev (the extent of variance inflation from the Standard that is due to error in forecasting x) is seen to depend on two factors:
(1) The strength of the relationship between y and x, as measured by $r^2$
(2) The degree of error in forecasting x, measured by $p_1$ (written simply as p in what follows).
If $r^2$ is close to zero, the rfev is close to unity. Hence, if the model supplies a very poor fit to the (in-sample) data, the question of accuracy in forecasting x out-of-sample is moot. Conversely, if $r^2$ is high, any degree, p, of error in forecasting x is considerably leveraged. Achieving forecast accuracy in a regressor becomes more important the better the model fits the in-sample data. Given the in-sample utility of the model, summarized by the leverage ratio, $r^2/(1-r^2)$, the rfev grows in proportion to increases in p. If p = 1, the rfev (and hence the relative width of the prediction interval) is determined by the leverage ratio.
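To make the role of the leverage ratio concrete, the single-regressor expression can be tabulated for a few combinations of $r^2$ and p. This is an illustrative sketch only; the grid of values and the function names are assumptions.

```python
import math


def leverage_ratio(r2):
    """In-sample utility of the model: r^2 / (1 - r^2)."""
    return r2 / (1.0 - r2)


def rfev_single(p, r2):
    """Single-regressor rfev: 1 + p * r^2 / (1 - r^2)."""
    return 1.0 + p * leverage_ratio(r2)


def relative_width(p, r2):
    """Ratio of ex ante to ex post prediction interval width."""
    return math.sqrt(rfev_single(p, r2))


# Hypothetical grid: weak, moderate and strong in-sample fit.
for r2 in (0.10, 0.50, 0.90):
    widths = [round(relative_width(p, r2), 2) for p in (0.1, 0.5, 1.0)]
    print(f"r2={r2:.2f}: relative widths for p=0.1, 0.5, 1.0 -> {widths}")
```

With p = 0 the ratio is exactly one, and with p = 1 the rfev equals one plus the leverage ratio, as stated in the text.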
There is an interesting second-order effect. By taking the derivative of the square root of equation (2) with respect to p, we find that a given percentage point reduction in p (that is, a unit improvement in forecasting X) will generate a larger reduction in the relative width of the prediction interval when p is initially high than when p is initially low. This result lends support to Armstrong's above-mentioned assertion that there are diminishing returns to improved accuracy in forecasting a regressor. However, diminishing returns should be viewed in a relative sense: as shown above, for a model that fits well in-sample, improved accuracy in forecasting X out-of-sample can be worthwhile throughout the range of p.
Multiple regressors, only one forecast with error
We next consider the case where there are multiple explanatory variables but only $x_1$ is forecast with error. The extension of the single regressor case is remarkably straightforward. The right-hand side of equation (1) becomes
$$1 + \frac{p_1\, r^2_{y x_1 \mid x_{-1}}}{(1-r^2_{x_1 \mid x_{-1}})(1-r^2_{y x_1 \mid x_{-1}})} \qquad (3)$$
The expression following the plus sign can be termed the marginal effect of regressor $x_1$. From it we see that the effect of forecast error in $x_1$ on the precision of forecasting the dependent variable depends once again on $p_1$, the degree of forecast error in $x_1$, as well as on the coefficient of determination reflecting the strength of relationship between $x_1$ and y. In this case, the term $r^2_{y x_1 \mid x_{-1}}$ is a partial coefficient of determination, representing the utility of adding $x_1$ to a model already containing $x_2, x_3, \ldots, x_k$.
An additional term comes in as well, $r^2_{x_1 \mid x_{-1}}$, which is the coefficient of determination in an equation in which $x_1$ is regressed on $x_2, x_3, \ldots, x_k$. So this term expresses the degree of collinearity between the given regressor and the others in the equation.
The second and third factors are not independent. Rather, as the degree of collinearity increases, the utility of adding $x_1$ to a model already containing $x_2, x_3, \ldots, x_k$ decreases. Collinearity may mitigate or exacerbate the effect of regressor forecast error on the rfev. At the other extreme, if $x_1$ and $x_2, x_3, \ldots, x_k$ are orthogonal, equation (3) reduces to equation (2).
It is helpful to note that, in the general case described by equation (1), the rfev is an additive sum of the marginal effect terms, each of which represents the marginal contribution to the rfev of forecast error in an individual regressor.
ESTIMATING THE RELATIVE FORECAST ERROR VARIANCE
We will describe our methodology for estimating the rfev within the context of the model with two regressors:
$$y_t = \beta_1 x_t + \beta_2 z_t + \varepsilon_t \qquad (4)$$
The model contains no lagged variables and $\varepsilon_t$ is assumed to have zero mean and constant variance. The estimation relies on the form of the rfev expressed by Appendix A, equation (A1), whose components are the regression coefficients ($\beta_j$), the variance of the random error term, $\sigma^2$, in the regression equation, and the variances of regressor forecast error, $\sigma^2_u$.
Historical series of length n are available. We withhold the last n − T observations from the series, so that the model is fit across periods 1, ..., T and used to forecast each test period, T + h, where h = 1, ..., n − T. The presence of the regressors requires that an assumption be made about the test period values of x and z. The several variations are shown in Table I along with the notation of the type of regression forecast obtained.
Our procedure employs rolling out-of-sample evaluations. See Schnaars (1986) for an excellent empirical application and Tashman (2000) for a comprehensive evaluation of the procedure. Normally, rolling out-of-sample evaluations have been applied to compare the forecasting accuracy of extrapolation methods. The application to regression involves some additional considerations, as will now be described. There are two phases.
First, after the regression model is estimated over the initial period of fit, the fit period is successively updated from origin T to origin n − 1. At each origin, the regression coefficients $\beta_j$ are recalibrated and a new mean square error obtained (estimate of $\sigma^2$). Then averages are taken of the individual-origin estimates of the $\beta_j$ and of the individual-origin estimates of $\sigma^2$. These averages are used as the inputs in Appendix A, equation (A1), for the $\beta_j$ and for $\sigma^2$.
Table I. Test-period assumptions for the regressors (type of forecasts for y)
Both x and z are known           y(x, z)
x is known, z is forecast        y(x, z→)
x is forecast, z is known        y(x→, z)
Second, forecasts must be generated for the regressors. Such forecasts can be judgemental, extrapolative, outputs of another model or a mix of the three. For this study, only automatic extrapolations were applied. Doing so not only eliminates judgement as a potentially confounding factor, but provides a statistical basis for measurement of regressor forecast error. Extrapolative forecasts of the regressors were made at each origin T through n − 1 and input into a regression equation of a matching fit period. For example, regressor forecasts made at origin T + 2 were input into the regression equation that is fit through period T + 2. At each origin, h-step-ahead forecasts for x and z are subtracted from the known values of the regressors during period T + h to obtain h-step-ahead regressor forecast errors. Finally, for each regressor and each lead time, the mean square of these errors is calculated and input into Appendix A, equation (A1), for the estimate of $\sigma^2_u$.
In summary, we use Appendix A, equation (A1), to calculate a large-sample approximation of each h-step-ahead rfev. The inputs for the $\beta_j$ and $\sigma^2$ terms are averages of coefficient estimates obtained at each origin T through n − 1. The inputs for $\sigma^2_u$, one for each regressor, are the mean square regressor forecast errors at h steps ahead.
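The two phases summarized above can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure as implemented in any package: it uses a single regressor (not the two of equation (4)), a carry-forward (random walk) forecast as a stand-in for the automatic extrapolation, and synthetic data. All function names are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares fit of y on an intercept and one regressor.
    Returns (slope, residual mean square)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, sse / (n - 2)


def naive_forecast(history, h):
    """Assumed stand-in extrapolation: carry the last value forward."""
    return history[-1]


def rolling_rfev(x, y, first_origin, max_h, forecast=naive_forecast):
    """Large-sample rfev by lead time, using the form of equation (A1)."""
    n = len(y)
    slopes, sigma2s = [], []
    sq_errors = {h: [] for h in range(1, max_h + 1)}
    for origin in range(first_origin, n):        # origins T, ..., n-1
        # Phase 1: recalibrate the regression on periods 1..origin.
        slope, s2 = ols_fit(x[:origin], y[:origin])
        slopes.append(slope)
        sigma2s.append(s2)
        # Phase 2: regressor forecast errors from the matching origin.
        for h in range(1, max_h + 1):
            if origin + h <= n:                  # period origin+h is observed
                actual = x[origin + h - 1]       # zero-based index
                sq_errors[h].append((actual - forecast(x[:origin], h)) ** 2)
    beta, sigma2 = mean(slopes), mean(sigma2s)   # averages across origins
    return {h: 1.0 + beta ** 2 * mean(errs) / sigma2
            for h, errs in sq_errors.items() if errs}


# Synthetic data: a trending regressor and a noisy linear response.
x = [float(t) for t in range(1, 13)]
noise = [0.4, -0.3, 0.1, -0.2, 0.3, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.2]
y = [2.0 * xi + e for xi, e in zip(x, noise)]
rfev_by_lead = rolling_rfev(x, y, first_origin=8, max_h=3)
print(rfev_by_lead)
```

In this toy series the carry-forward forecast misses a trending regressor by exactly h units at lead h, so the inflation term grows with the square of the lead time. With several regressors, one such term per regressor would be added, as in equation (A1).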
CASE STUDY
This case study applies the preceding methodology to illustrate plausible values for the rfev. We have adapted and updated the Harvard Business School Case called Alpha Concrete Products (Harvard College, 1974), a case that examines the use of regression analysis to forecast a company's annual sales revenue (Sales).
Two primary regressors were the population of the sales region (Pop) and the number of Building Permits (Perms). The annual series are shown in Appendix B. Overall model fit was good, with a multiple $R^2$ above 0.95 and residuals showing no evidence of model misspecification. The partial coefficients of determination (for the initial fit period) were 0.91 for Pop and 0.67 for Perms.
Forecasts for the regressors were made using an automatic exponential smoothing algorithm. For Pop, a linear trend (Holt's method) was chosen. Out-of-sample forecast errors were small, with the parameter p, measuring the degree of regressor forecast error, being less than 0.1.
Perms, in contrast, was a highly cyclical variable. The automatic algorithm defaulted to a random walk and out-of-sample forecast error, with p above 0.75, was substantial.
To summarize the characteristics of the regressors: Pop was an extremely important regressor in-sample and could be forecast accurately out of sample. Perms was a statistically significant but less important regressor whose out-of-sample forecasting accuracy was very poor. There was a low degree of collinearity between the two regressors.
Based on the analysis of the previous section, we know that the effects on the rfev of regressor utility and regressor forecastability are offsetting, so that neither regressor in this case study presents an extreme case.
We selected the commercial software package tsMetrix to fit the regression equations because of its unique capability (Tashman, 2000) to recalibrate regression coefficients in a rolling out-of-sample evaluation.
In Table II, we report estimates of the relative forecast standard error (the square root of the rfev) for forecast horizons of 1–5 years. The initial fit period was set by withholding the last 7 years of data. Withholding somewhat more data than the longest forecast horizon ensures that the empirical estimates of the error variances at the longest horizon are based on more than a single forecast error.
Each value in the table represents the ratio of the standard error of an ex ante forecast (square root of the forecast error variance) to the standard error of the ex post forecast, Sales(Pop, Perms). The first line of values, Sales(Pop, Perms→), shows the degree of inflation in the standard error of the forecast attributable to the marginal effect of forecast error in Perms. (In this line, known values of Pop and forecasts of Perms were used to forecast Sales.) The ex ante standard error is 38% higher for one-year-ahead forecasts and more than double the Standard at horizons 2–5.
From the second line, Sales(Pop→, Perms), we see that, while forecast error in Pop inflates the standard error by only 5% at the first horizon, the inflation grows dramatically as the forecast horizon lengthens. This pattern is due to the growth in the out-of-sample estimates of the regressor forecast error variance, $\sigma^2_u$, as the forecast horizon lengthens.
In the general ex ante forecast, Sales(Pop→, Perms→), the reported ratios are uniformly highest, an expected result reflecting the additive marginal effects of individual-regressor forecast error. For a three-year-ahead forecast of Sales, the results show that the ex ante standard error of the forecast is nearly three times that of the Standard. As a first approximation (that is, assuming that the distributional critical values applicable to the distributions of ex post forecast errors were applicable to the ex ante errors as well), we could say that the prediction interval for the 3-year-ahead forecast should be nearly three times as wide as the prediction interval the forecaster will be shown by forecasting software.
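The additivity of the marginal effects can be illustrated with the published ratios. The sketch below squares the two partial rows of Table II to recover each rfev, adds the two marginal effects, and takes the square root. It assumes that the partial and full ex ante rows rest on the same coefficient and variance inputs, which the text implies but does not state outright; the function name is hypothetical.

```python
import math

# Ratios from Table II, divided by 100 (relative standard errors by horizon).
perms_forecast = [1.38, 2.09, 2.32, 2.56, 2.02]  # Sales(Pop, Perms->)
pop_forecast = [1.05, 1.31, 1.87, 2.34, 3.22]    # Sales(Pop->, Perms)


def combine_additively(ratios_a, ratios_b):
    """Add the two marginal effects (rfev - 1) and return sqrt of the total rfev."""
    combined = []
    for a, b in zip(ratios_a, ratios_b):
        total_rfev = 1.0 + (a ** 2 - 1.0) + (b ** 2 - 1.0)
        combined.append(math.sqrt(total_rfev))
    return combined


combined = [round(100 * r) for r in combine_additively(perms_forecast, pop_forecast)]
print(combined)
```

At horizons 1, 4 and 5 the combined values round to the large-sample row reported in Table III (142, 332 and 367). At horizons 2 and 3 they come out somewhat lower than the published figures, a difference this sketch does not attempt to explain.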
If these large-sample approximations are adequate, the adjustments required to the Standard to account for regressor forecast error are reasonably straightforward. An estimate of the degree of regressor forecast error (p) must be made for each lead time. Although this can be done judgementally, an automatic extrapolation would provide an efficient macro for the regression routines. The remaining two components (Appendix A, equation (A1)) are byproducts of any regression algorithm. Software developers should be encouraged to make the relatively minor adaptations required to facilitate these calculations.
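As a rough illustration of how such an adjustment might look in software, the sketch below widens a Standard prediction interval by the square root of the rfev. It assumes, as a first approximation, that the same critical value applies ex ante and ex post; the function name and the numbers are hypothetical.

```python
import math


def ex_ante_interval(point_forecast, standard_half_width, rfev):
    """Widen a Standard prediction interval to allow for regressor forecast error.

    standard_half_width : critical value times the Standard forecast standard error
    rfev                : relative forecast error variance for this lead time
    """
    half_width = standard_half_width * math.sqrt(rfev)
    return point_forecast - half_width, point_forecast + half_width


# Hypothetical example: the Standard interval is 100 +/- 10 and rfev = 4.
lower, upper = ex_ante_interval(100.0, 10.0, rfev=4.0)
print(lower, upper)
```

An rfev of 4 doubles the interval width, so the adjusted interval in this example runs from 80 to 120.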
TESTING THE LARGE-SAMPLE APPROXIMATION
In this section, we compare the results in Table II against empirical measures of the variance of out-of-sample forecast errors.
Table II. Large-sample approximations of the $\sqrt{rfev}$ in the Alpha Concrete Products case (base of 100 is the Standard)
Forecast (notation in Table I)    Forecast horizon
                                    1     2     3     4     5
Sales(Pop, Perms→)                138   209   232   256   202
Sales(Pop→, Perms)                105   131   187   234   322
Empirical out-of-sample forecast errors were a byproduct of the rolling out-of-sample evaluations of the prior sections. Based on each combination of known and forecasted values of the regressors, as shown in Table I, we generated forecasts of the dependent variable from each origin, T (year 15) to n − 1 (year 21). When collated by lead time, the result is a collection of seven one-step-ahead forecasts, six two-step-ahead forecasts and so forth through three five-step-ahead forecasts. We used the actual values of the dependent variable through the test period to calculate the forecast errors and then, for each group of specific lead-time errors, we computed the variance as the mean of the squared errors.
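The collation by lead time can be sketched as follows. The first function counts how many forecasts a rolling evaluation yields at each lead time, and the second computes the mean of the squared errors within each lead-time group. Both names are hypothetical, and the error values passed in the last line are placeholders, not case-study results.

```python
def forecasts_per_lead(n, first_origin, max_h):
    """Number of h-step-ahead forecasts available from origins T, ..., n-1."""
    counts = {}
    for origin in range(first_origin, n):
        for h in range(1, max_h + 1):
            if origin + h <= n:          # the target period must be observed
                counts[h] = counts.get(h, 0) + 1
    return counts


def mse_by_lead(errors):
    """errors maps lead time h to the list of h-step-ahead forecast errors."""
    return {h: sum(e * e for e in errs) / len(errs) for h, errs in errors.items()}


# Case-study dimensions: 22 annual observations, first origin at year 15.
counts = forecasts_per_lead(n=22, first_origin=15, max_h=5)
print(counts)

# Placeholder errors, only to show the calculation.
print(mse_by_lead({1: [1.0, -1.0], 2: [2.0]}))
```

With 22 observations and origins at years 15 through 21, the counts run from seven one-step-ahead forecasts down to three five-step-ahead forecasts, matching the description above.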
Shown in Table III are the results from the complete ex ante forecasting equation: Sales(Pop→, Perms→). Results for the partial ex ante forecasts are very similar. The weights used in the weighted average represent the number of forecasts recorded at each lead time: this was seven one-year-ahead forecasts down to three five-year-ahead forecasts.
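The weighting scheme can be checked against the large-sample row of Table III. A minimal sketch, using the published values and the forecast counts as weights:

```python
# Large-sample approximations by horizon (Table III) and forecasts per lead time.
large_sample = [142, 226, 287, 332, 367]
weights = [7, 6, 5, 4, 3]

weighted_average = sum(v * w for v, w in zip(large_sample, weights)) / sum(weights)
print(round(weighted_average))
```

The result rounds to 249, the weighted average reported in the table.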
The purely empirical calculations and our large-sample approximations appear to be a close match. The two types of estimates of the rfev are consistent in demonstrating (a) inflation of the forecast error variance in the face of regressor forecast error and (b) the tendency of the inflation factors to increase with the lead time of the forecast. (There is a reduction in the purely empirical rfev at lead 5; however, empirical measures can be erratic.)
The proximity of the results in Table III can, of course, be at least partly a chance occurrence, and evidence based on a single case study can be considered no more than indicative. Nonetheless, the findings are encouraging and, in our judgement, suggest that the proposed adjustments to the Standard to account for regressor forecast error are worthy of further investigation.
SUMMARY
In this paper, we have reported large-sample approximations for the relative forecast error variance of a single-equation regression model. The assumptions made are that the regressor forecast errors are uncorrelated among themselves and with the model errors. The results show that, under these assumptions, the rfev depends upon three parameters:
• The incremental utility of adding a regressor to a model, $r^2_{y x_j \mid x_{-j}}$
• The degree of error in forecasting the regressor out-of-sample, p
• The degree of multicollinearity between the regressor in question and the set of remaining regressors in the model, $r^2_{x_j \mid x_{-j}}$
Table III. Large-sample approximations versus purely empirical estimates of the $\sqrt{rfev}$ in the Alpha Concrete Products case (both regressors forecast ex ante)
                                Forecast horizon
                                  1     2     3     4     5   Wt ave.
Large-sample approximation      142   226   287   332   367     249
These factors interact; for example, a given degree of error in forecasting a regressor has a more powerful effect on the forecast error variance the greater the utility of adding that regressor to the model.
We have shown how estimates can be made of the rfev using rolling out-of-sample evaluations with matching fit periods for the regressors and dependent variable. When applied to a case study using 'typical data', the results suggested that the $\sqrt{rfev}$ (a) grows with the lead time of the forecast, reflecting increases in the variance of regressor forecast error over the forecast horizon and (b) can readily exceed a factor of 2, which is to say that regressor forecast error can more than double the width of a prediction interval.
Finally we compared our large-sample approximations of the rfev to purely empirical calculations of the out-of-sample forecast error variance and found the two sets of estimates to be very close. Our conclusion is that the large-sample approximations show promise as a valid basis for calculation of ex ante prediction intervals and are worthy of further empirical evaluations.
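APPENDIX A: PROOF OF EQUATION (1)
Here we establish the identity given in equation (1). We first show that
$$\lim_{n\to\infty}\frac{\sigma^2(\tilde{y}_{T+h})}{\sigma^2(\hat{y}_{T+h})} = 1 + \sum_{j=1}^{k}\frac{\beta_j^2\sigma^2_{u_j}}{\sigma^2} \qquad (A1)$$
and we then show that
$$\frac{\beta_j^2\sigma^2_{u_j}}{\sigma^2} = \frac{p_j\, r^2_{y x_j \mid x_{-j}}}{(1-r^2_{x_j \mid x_{-j}})(1-r^2_{y x_j \mid x_{-j}})} \qquad (A2)$$
from which the result follows.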
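Examining the ratio of prediction errors in the limit is equivalent to assuming that $\beta$ is known. Then,
$$\sigma^2(\hat{y}_{T+h}) = E(y_{T+h}-\hat{y}_{T+h})^2 = E(y_{T+h}-x'_{T+h}\beta)^2 = \sigma^2$$
and
$$\begin{aligned}\sigma^2(\tilde{y}_{T+h}) &= E(y_{T+h}-\tilde{y}_{T+h})^2 = E(y_{T+h}-\hat{x}'_{T+h}\beta)^2 = E(y_{T+h}-x'_{T+h}\beta-u'_{T+h}\beta)^2 \\ &= E(y_{T+h}-x'_{T+h}\beta)^2 - 2E[u'_{T+h}\beta\,(y_{T+h}-x'_{T+h}\beta)] + E(u'_{T+h}\beta)^2 \\ &= \sigma^2 + \beta'\Sigma_u\beta = \sigma^2 + \sum_{j=1}^{k}\beta_j^2\sigma^2_{u_j}\end{aligned}$$
In the middle line of the expression for $\sigma^2(\tilde{y}_{T+h})$, the cross-product term is zero under the assumption that regressor forecast errors are uncorrelated with model errors. In the final line, the $\beta'\Sigma_u\beta$ term becomes the summation of the products
$$\sum_{j=1}^{k}\beta_j^2\sigma^2_{u_j}$$
under the assumption that regressor forecast errors are uncorrelated among themselves. Taking the ratio of the final expression for $\sigma^2(\tilde{y}_{T+h})$ to that for $\sigma^2(\hat{y}_{T+h})$ establishes equation (A1).
To establish equation (A2), let $SSE(x_{-j})$ represent the sum of squared errors for the regression of $y_t$ on $x_{-j}$. A straightforward application of the expectation of a quadratic form shows that
$$E[SSE(x_{-j})] = \sigma^2(n-k+1) + \beta_j^2\,E[SSE(x_j \text{ on } x_{-j})]$$
where $SSE(x_j \text{ on } x_{-j})$ represents the sum of squared errors for the regression of $x_j$ on $x_{-j}$. It is not difficult to show that $E[SSE(x_j \text{ on } x_{-j})] = (n-k+1)(1-r^2_{x_j \mid x_{-j}})\sigma^2_{x_j}$. The population coefficient of partial determination for adding $x_j$ to a model already containing $x_{-j}$ is, by definition,
$$r^2_{y x_j \mid x_{-j}} = \lim_{n\to\infty}\frac{SSE(x_{-j}) - SSE(x_j, x_{-j})}{SSE(x_{-j})}$$
where $SSE(x_j, x_{-j})$ represents the sum of squared errors from the regression of $y_t$ on $(x_j, x_{-j})$.
Then it follows that
$$r^2_{y x_j \mid x_{-j}} = \frac{\beta_j^2(1-r^2_{x_j \mid x_{-j}})\sigma^2_{x_j}}{\sigma^2 + \beta_j^2(1-r^2_{x_j \mid x_{-j}})\sigma^2_{x_j}}$$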
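The algebra connecting the last expression to equation (A2) can be checked numerically. The sketch below computes the partial coefficient of determination from assumed population quantities and compares the two sides of (A2). The parameter values and function names are arbitrary, hypothetical choices.

```python
def partial_r2(beta, sigma2, var_x, r2_collinear):
    """Population partial r^2 from the expression immediately above."""
    signal = beta ** 2 * (1.0 - r2_collinear) * var_x
    return signal / (sigma2 + signal)


def lhs_a2(beta, sigma2, var_u):
    """Left-hand side of (A2): beta_j^2 * sigma_u_j^2 / sigma^2."""
    return beta ** 2 * var_u / sigma2


def rhs_a2(p, r2_partial, r2_collinear):
    """Right-hand side of (A2)."""
    return p * r2_partial / ((1.0 - r2_collinear) * (1.0 - r2_partial))


# Arbitrary illustrative population values.
beta, sigma2, var_x, r2_collinear, p = 1.5, 2.0, 4.0, 0.3, 0.25
var_u = p * var_x                       # definition of p_j
r2p = partial_r2(beta, sigma2, var_x, r2_collinear)
gap = abs(lhs_a2(beta, sigma2, var_u) - rhs_a2(p, r2p, r2_collinear))
print(gap)
```

The two sides agree up to floating-point rounding, whatever positive values are chosen for the inputs.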
Straightforward algebra leads to equation (A2) and equation (1) follows immediately from this.
APPENDIX B: THE ALPHA CONCRETE PRODUCTS DATA
Year   Sales (dollars)   Pop (# of people)   Perms (# of permits)
1959       2,904,000           868,000               8000
1960       2,868,000           890,627               8272
1961       3,303,000           936,000               6700
1962       4,888,476           958,000               8030
1963       5,879,591           974,000               8420
1964       5,947,587           978,000               8744
1965       5,905,301           991,000               6494
1966       5,442,447         1,009,000               5657
1967       4,327,223         1,019,000               3982
1968       6,237,503         1,029,000               5116
1969       5,921,922         1,047,000               5490
1970       7,619,577         1,059,000               5964
1971       7,863,210         1,099,000               9070
1972       9,853,870         1,128,000             12,777
1973      11,979,262         1,157,000             17,320
1974      11,954,110         1,195,500             13,450
1975      11,402,350         1,234,000               8841
1976      12,580,030         1,275,000               7658
1977      13,662,405         1,316,000             10,117
1978      15,090,037         1,364,000             13,085
1979      17,760,224         1,416,000             13,310
1980      18,685,524         1,472,000             11,505
SOFTWARE LIST
(1) Forecast Pro for Windows, Version 3 (1997), Business Forecast Systems, Belmont, CA. (2) Minitab, Release 12 (1998), Minitab Inc., State College, PA.
(3) SAS/ETS,Release 6 (1997), SAS Institute, Inc., Cary, NC.
(4) SmartForecasts for Windows, Version 1 (1997), SmartSoftware Inc., Belmont, CA. (5) SPSS Trends, Version 75 (1997), SPSS, Inc., Chicago, IL.
(6) TsMetrix, Version 2 (1997), RER, Inc., San Diego, SA.
The paper refers only to the regression options in these packages. In SAS/ETS and other ARIMA packages, a sophisticated analyst can use ARIMA procedures to obtain forecast standard errors that incorporate regressor forecast error. To do so, one can (a) create a univariate forecast for each regressor and (b) use a transfer function to reproduce a regression model containing those regressors. However, this approach is not a satisfactory substitute for many practitioners of regression-based forecasting. First, it is infeasible if the time series are too short for ARIMA modeling. Second, it does not allow for judgemental inputs of regressor forecast error variance. Third, the software capability is not widely available: of the six packages listed above, for example, only SAS/ETS offers the requisite ARIMA technology.
ACKNOWLEDGEMENTS
Many thanks go to former University of Vermont students Michael Brodie and Peter Tashman for very significant contributions to early stages of this research, and to Professor William Bassin of Shippensburg University for his constant support and feedback over the long life of this project.
REFERENCES
Adams FG. 1987. The Business Forecasting Revolution. Oxford University Press: New York.
Armstrong JS. 1985. Long-Range Forecasting, 2nd edn. Wiley-Interscience: New York.
Ashley R. 1983. On the usefulness of macroeconomic forecasts as inputs to forecasting models. Journal of Forecasting 2: 211–223.
Ashley R. 1988. On the relative worth of recent economic forecasts. International Journal of Forecasting 4: 363–376.
Bassin WM. 1987. How to anticipate the accuracy of a regression model. Journal of Business Forecasting: Methods & Systems 6: 26–28.
Diebold F. 1998. Elements of Forecasting. South-Western: Cincinnati.
Feldstein M. 1971. The error of forecast in econometric models when the forecast-period exogenous variables are stochastic. Econometrica 39: 55–60.
Fildes R. 1985. Quantitative forecasting – the state of the art: econometric models. Journal of the Operational Research Society 36: 549–580.
Geriner PT, Ord JK. 1991. Automatic forecasting using explanatory variables: a comparative study. International Journal of Forecasting 7: 127–140.
Intrilligator MD. 1978. Econometric Methods, Techniques and Applications. Prentice Hall: Englewood Cliffs, NJ.
Jacobs J, Sterken E. 1994. Macroeconomic models and portfolio investment. In International Symposium on Forecasting, Stockholm.
Jarrett J. 1990. Improving forecasts by decomposing the error. Journal of Business Forecasting: Methods & Systems 9: 12–15.
Levenbach H, Cleary JP. 1984. The Modern Forecaster. Lifetime Learning Publications: Belmont, CA.
Newbold P, Bos T. 1994. Introductory Business and Economic Forecasting, 2nd edn. South-Western: Cincinnati.
Osborn DR, Teal F. 1979. An assessment and comparison of two NIESR econometric model forecasts. National Institute Economic Review 27: 50–62.
Pindyck RS, Rubinfeld DL. 1991. Econometric Models and Economic Forecasts, 3rd edn. McGraw-Hill: New York.
Schnaars SP. 1986. A comparison of extrapolation procedures on yearly sales forecasts. International Journal of Forecasting 2: 71–85.
Tashman LJ. 2000. Out-of-sample tests of forecast accuracy: an analysis and review. International Journal of Forecasting 16: forthcoming.
Authors' biographies:
Leonard J. Tashman has spent half his life on the faculty of the School of Business Administration of the University of Vermont. Forecasts of his near-term retirement are probably accurate.
Thorodd Bakken received his B.S. and MBA degrees from the School of Business Administration of the University of Vermont. He works for Citigroup as a foreign exchange and interest rate dealer. He was a national team cross-country skier in Norway and a four-time NCAA champion in the USA.
Jeffrey Buzas is Associate Professor of Statistics in the Department of Mathematics and Statistics at the University of Vermont. His research interests include covariate measurement error in non-linear regression models.
Authors' addresses:
Leonard J. Tashman, School of Business Administration, University of Vermont, Burlington, VT 05405,
USA.
Thorodd Bakken, Citibank International plc, Norway Branch, Tordenskiolds Gate 8-10, P.O. Box 1481
Vika, 0116 Oslo, Norway.
Jeffrey Buzas, Department of Mathematics and Statistics, University of Vermont, Burlington, VT 05405,