Forecasting and model averaging with structural breaks

(1)

2015

Forecasting and model averaging with structural

breaks

Anwen Yin

Iowa State University

Follow this and additional works at:https://lib.dr.iastate.edu/etd

Part of theEconomics Commons,Finance and Financial Management Commons, and the

Statistics and Probability Commons

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please [email protected].

Recommended Citation

Yin, Anwen, "Forecasting and model averaging with structural breaks" (2015).Graduate Theses and Dissertations. 14720.

(2)

by

Anwen Yin

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Economics

Program of Study Committee: Helle Bunzel, Co-major Professor Gray Calhoun, Co-major Professor

Joydeep Bharttacharya David Frankel

Jarad Niemi Dan Nordman

Iowa State University Ames, Iowa

(3)

DEDICATION

(4)

TABLE OF CONTENTS

LIST OF TABLES . . . vi

LIST OF FIGURES . . . vii

ACKNOWLEDGEMENTS . . . ix

ABSTRACT . . . x

CHAPTER 1. FORECASTING EQUITY PREMIUM WITH STRUC-TURAL BREAKS . . . 1

1.1 Introduction . . . 1

1.2 Detecting and Dating Structural Breaks . . . 4

1.2.1 Break Model . . . 5

1.2.2 Data . . . 6

1.2.3 Break Estimation . . . 7

1.2.4 Full Sample Estimation Results . . . 8

1.3 Forecast with Parameter Instability . . . 9

1.3.1 Methodology . . . 10

1.4 Out-of-sample Forecast . . . 15

1.4.1 Forecast Using the Stable Model of Stock Market Variance . . . . 15

1.4.2 Forecast Using the Break Model of Stock Market Variance . . . . 16

1.4.3 Comparing the Stable Model with the Break Model . . . 17

(5)

CHAPTER 2. COMBINING MULTIPLE PREDICTIVE MODELS

WITH POSSIBLE STRUCTURAL BREAKS . . . 20

2.2 Econometric Model . . . 24

2.2.1 Bivariate Predictive Model . . . 24

2.2.2 Forecast Combination . . . 25

2.2.3 Forecast Evaluation . . . 29

2.3 Empirical Results . . . 30

2.3.1 Data and Out-of-sample Forecast . . . 30

2.3.2 Bivariate Model Prediction . . . 33

2.3.3 Forecast Excess Returns Using Combined Model . . . 34

2.4 Conclusion . . . 37

CHAPTER 3. OUT-OF-SAMPLE FORECAST MODEL AVERAG-ING WITH PARAMETER INSTABILITY . . . 39

3.2 Related Literature . . . 43

3.3 Econometric Theory . . . 46

3.3.1 Model and Estimation . . . 46

3.3.2 Cross-Validation Criterion . . . 47 3.3.3 Cross-Validation Weights . . . 49 3.4 Simulation Results . . . 54 3.4.1 Design I . . . 57 3.4.2 Design II . . . 58 3.4.3 Design III . . . 59 3.4.4 Summary . . . 59 3.5 Empirical Application . . . 59

(6)

3.5.2 Forecast Taiwan GDP Growth . . . 62 3.6 Conclusion . . . 63

APPENDIX A. FORECASTING EQUITY PREMIUM WITH STRUC-TURAL BREAKS . . . 65

APPENDIX B. COMBINING MULTIPLE PREDICTIVE MODELS

WITH POSSIBLE STRUCTURAL BREAKS . . . 77

APPENDIX C. OUT-OF-SAMPLE FORECAST MODEL

AVERAG-ING WITH PARAMETER INSTABILITY . . . 94

(7)

LIST OF TABLES

Table A.1 Estimation Results for Stable Predictive Models . . . 66

Table A.2 Estimation Results for the Stock Market Variance Model with Three Breaks . . . 67

Table B.1 U.S. Market Equity Premium Out-of-Sample R2_OS Statistics for Combining Methods . . . 78

Table C.1 Monte Carlo Simulation: Design I . . . 95

Table C.2 Monte Carlo Simulation: Design II . . . 95

Table C.3 Monte Carlo Simulation: Design III . . . 96

Table C.4 U.S. Quarterly GDP Growth Rate Forecast Comparison . . . 96

(8)

LIST OF FIGURES

Figure A.1 Break Estimation Results for Historical Mean, Dividend-price Ra-tio, Dividend Yield, Earnings-price RaRa-tio, Dividend-payout Ratio and Stock Market Variance . . . 68 Figure A.2 Break Estimation Results for Cross Sectional Premium,

Book-to-market Ratio, Net Equity Expansion, Treasury Bill, Long Term Yield and Term Spread . . . 69

Figure A.3 Break Estimation Results for Default Premium and Inflation . . 70

Figure A.4 Out-of-Sample Forecast Evaluation for the Stable Model . . . 71

Figure A.5 Out-of-Sample Forecast Evaluation for the Break Model under

Fixed Window . . . 71

Recursive Window . . . 72

Rolling Window . . . 72 Figure A.8 Recursive window out-of-sample forecast comparison between the

break Model and stable model . . . 73

Figure A.9 Rolling window out-of-sample forecast comparison between the

break Model and stable model . . . 74 Figure A.10 Fixed window out-of-sample forecast comparison between the break

Model and stable model . . . 75 Figure A.11 Robust Weights Example . . . 76

(9)

Figure B.1 Monthly Data Time Series Plots . . . 79

Figure B.2 Quarterly Data Time Series Plots . . . 80

Figure B.3 Annual Data Time Series Plots . . . 81

Figure B.4 Monthly Data Variable Correlation Matrix . . . 82

Figure B.5 Quarterly Data Variable Correlation Matrix . . . 83

Figure B.6 Annual Data Variable Correlation Matrix . . . 84

Figure B.7 Cumulative Difference in Squared Forecast Error (CDSFE): Indi-vidual Model, Monthly Data . . . 85

Figure B.8 Cumulative Difference in Squared Forecast Error (CDSFE): Indi-vidual Model, Quarterly Data . . . 86

Figure B.9 Cumulative Difference in Squared Forecast Error (CDSFE): Indi-vidual Model, Annual Data . . . 87

Figure B.10 Monthly Data: Model Out-of-Sample Forecasts Correlation Matrix 88 Figure B.11 Quarterly Data: Model Out-of-Sample Forecasts Correlation Matrix 89 Figure B.12 Annual Data: Model Out-of-Sample Forecasts Correlation Matrix 90 Figure B.13 Cumulative Difference in Squared Forecast Error (CDSFE): Com-bined Model, Monthly Data . . . 91

Figure B.14 Cumulative Difference in Squared Forecast Error (CDSFE): Com-bined Model, Quarterly Data . . . 92

Figure B.15 Cumulative Difference in Squared Forecast Error (CDSFE): Com-bined Model, Annual Data . . . 93

(10)

ACKNOWLEDGEMENTS

I would like to take this opportunity to express my thanks to those who helped me with various aspects of conducting research and the writing of this dissertation. First and foremost, Helle and Gray for their guidance, patience and support throughout this research and the writing of my dissertation. Their insights and words of encouragement have often inspired me and renewed my hopes for completing my doctoral degree. I would also like to thank my committee members for their efforts and contributions to this work: Dan, David, Jarad and Joydeep. Thank you everyone!

(11)

ABSTRACT

This dissertation consists of three papers. Collectively they attempt to investigate on how to better forecast a time series variable when there is uncertainty on the stability of model parameters.

The first chapter applies the newly developed theory of optimal and robust weights to forecasting the U.S. market equity premium in the presence of structural breaks. The empirical results suggest that parameter instability cannot fully explain the weak forecasting performance of most predictors used in related empirical research.

The second chapter introduces a two-stage forecast combination method to forecast-ing the U.S. market equity premium out-of-sample. In the first stage, for each predictive model, we combine its stable and break cases by using several model averaging methods. Next, we pool all adjusted predictive models together by applying equal weights. The empirical results suggest that this new method can potentially offer substantial predictive gains relative to the simple one-stage overall equal weights method.

The third chapter extends model averaging theory under uncertainty regarding struc-tural breaks to the out-of-sample forecast setting, and proposes new predictive model weights based on the leave-one-out cross-validation criterion (CV), as CV is robust to heteroscedasticity and can be applied generally. It provides Monte Carlo and empirical evidence showing that CV weights outperform several competing methods.

(12)

CHAPTER 1.

FORECASTING EQUITY PREMIUM WITH

STRUCTURAL BREAKS

1.1

Introduction

Recent econometric advances and empirical evidence seem to suggest that the market excess returns are predictable to some degree. Forty years ago this would have been tantamount to an outright rejection of the efficient capital market hypothesis. In fact, the martingale is long considered to be a necessary condition for an efficient asset market, one in which the information contained in past prices is instantly, fully, and perpetually reflected in the asset’s current price. If the market is efficient, then it should not be possible to profit by trading on the information contained in the asset’s price history, hence the conditional expectation of future price changes, conditional on the price history, cannot be either positive or negative and therefore must be zero. A model associated with the efficient market hypothesis is the random walk model. It assumes that the successive returns are independent, and that the returns are identically distributed over time. Consequently, it implies that the efficient market hypothesis and random walk model combined can fully explain the weak forecasting performance of a wide range of predictors in empirical studies.

However, one of the central tenets of modern financial economics is the necessity of some degree of trade-off between risk and the expected excess returns. In addition, although the martingale hypothesis places a restriction on the expected returns, it does not account for risk in any way. Particularly, if an asset’s expected price change is

(13)

positive, it may be the reward necessary to attract investors to hold the asset and to bear the associated risk. Therefore, the martingale property may be neither a necessary nor a sufficient condition for rationally determined asset prices. The complex structure of security markets and frictions in the trading process could possibly generate stock return predictability.

Recently, Goyal and Welch (2008) show that the simple historical average model of the U.S. equity market excess returns forecasts future returns better than other models with various predictors suggested by the literature. They argue that the poor out-of-sample performance of linear predictive regressions is a systematic problem, not confined to any decade. They compare predictive regressions with historical average returns and find that historical average returns almost always generate superior return forecasts, so they conclude that “the profession has yet to find some variable that has meaningful and robust empirical equity premium forecasting power”. Subsequently, in examining the cause of the forecast failure shown in Goyal and Welch (2008), Rapach et al. (2010) argue that model uncertainty and parameter instability impair the forecasting ability. Additionally, Rapach and Wohar (2006) and Paye and Timmermann (2006) have shown empirical evidence of detected structural breaks in equity premium predicative models. But the literature on how to forecast excess returns with detected structural breaks is limited.

In this paper, we attempt to answer two empirical questions. First, if the true data generating process underlying the predictive model indeed has structural breaks, how to forecast excess returns? Second, can structural breaks or parameter instability fully explain the poor out-of-sample performance of those variables evaluated in Goyal and Welch (2008)? For the presence of parameter instability, using monthly data from Goyal and Welch (2008) and the break testing procedure by Bai and Perron (1998), we find that all models except for the one using the stock market variance, do not have significant statistical evidence for breaks. Therefore, parameter instability alone cannot explain the

(14)

puzzle of weak out-of-sample predictive power for most variables. Next, for the stock market variance model with estimated breaks, we apply the optimal and robust weights theory proposed by Pesaran et al. (2013) to forecasting the U.S. market equity premium out-of-sample. Our empirical results suggest that the stock market variance does have predictive power in forecasting excess returns. In addition, its predictive ability is present even without assuming parameter instability for the linear predictive model. Our further analysis shows that for the stock market variance, its break model outperforms the stable one.

This paper builds on literature related to out-of-sample forecast evaluation and struc-tural breaks. Researchers, such as Giacomini and Rossi (2009), have provided empirical evidence and suggest that parameter instability or structural break is an important source of forecast failure in macroeconomics and finance. Parameter instability can arise as a result of changes in tastes, technology, institutional arrangements and government policy. If there are breaks in the underlying data generating process and the break sizes are large, predictive models without taking into account this fact tend to forecast poorly out-of-sample. Researchers, such as Inoue and Kilian (2004), Goyal and Welch (2008) and Giacomini and Rossi (2009), have documented this out-of-sample forecast breakdown under parameter instability.

In the modeling of structural breaks, parameters can be assumed to change at discrete time intervals or continuously. With the discrete break model, break dates are estimated and forecasts are typically constructed using the post-break observations. Furthermore, Pesaran and Timmermann (2007) have proposed the optimal window theory to forecast in the presence of breaks. They argue that forecasts from the post-break window may not be mean squared forecast error optimal, as the estimation error could be large due to small post-break sample size. Their optimal estimation window includes pre-break observations which involves a bias-variance trade-off. On the other hand, Pesaran et al. (2013) propose optimal weights in the sense that the resulting forecasts minimize the expected mean

(15)

squared forecast error. With known break sizes and dates, their optimal weights follow a step function that allocates constant weights within regimes, but different weights across regimes. Since in practice break dates and sizes are unknown and their estimation could be highly imprecise, Pesaran et al. (2013) also develop weights that are robust to the uncertainty surrounding the break dates and sizes. With the continuously varying parameter model, breaks are assumed to occur at every time instant and observations are down-weighted to take account of the slowly changing nature of the parameters, for example, exponential smoothing.

The remainder of this paper is organized as follows. Section 1.2 reports the break estimation results. Section 1.3 outlines the weighted least squares theory we use to forecast out-of-sample with breaks. Section 1.4 reports empirical results. Section 1.5 concludes.

1.2

Detecting and Dating Structural Breaks

Goyal and Welch (2008) use the stable linear one-step ahead predictive model to evaluate the predictive power of a wide range of variables,1

yt+1 = ¯y+βxt+ut+1 (1.1)

where t = 1, ..., T. yt+1 is the market excess returns, ¯y is the intercept, xt is the ex-ogenous predictor available at time t to forecast the next period returns yt+1 and ut+1

is a disturbance term. The un-modeled structural breaks may be the cause why many predictors are week to forecast the excess returns relative to the benchmark which is simply

yt+1 = ¯y+ut+1. (1.2)

In this section we will present the break model and outline the method we will use to detect and estimate possible breaks for model (1.1).

(16)

1.2.1 Break Model

The model subject to m breaks occurring at times (t1, t2, ..., tm) is

yt+1 =                        y₁+β1xt+ut+1, t= 1, ..., t1 y₂+β2xt+ut+1, t=t1+ 1, ..., t2 .. . ... y_m+βmxt+ut+1, t=tm−1+ 1, ..., tm y_m₊₁+βm+1xt+ut+1, t=tm+ 1, ..., T (1.3)

where yt+1 is the one-step ahead market excess returns, xt is the exogenous predictor available at time t to forecast the next period returns yt+1 and ut+1 is a disturbance

term. The reason for using the discrete, step-function type break model is that some of the potential sources of breaks, such as shifts in economic policy regimes or large macroeconomic shocks, are likely to lead to rather sudden shifts in the parameters of the forecasting model. In addition, we assume that parameter instability only occurs in the regression coefficients ¯y and β.

The idea of estimating structural breaks in Bai and Perron (1998) is to find a set of dates which globally minimizes the sum of squared residuals from the least squares regression (ˆt1,ˆt2, ...ˆtM) = argmin m+1 X i=1 ti X s=ti−1+1 [ys+1−y¯s−βsxs]2 (1.4)

where i indexes the number of regimes. The regression parameter estimates are the

ordinary least squares estimates associated with the m-partition of the data sample. For break identification, a crucial assumption in Bai and Perron (1998) is that there is enough number of observations within each regime. Given the break date estimates, the regression model coefficients, nβb_i

om+1 i=1

, are the least squares estimates associated with the partition comprised of the estimated break dates.

(17)

1.2.2 Data

Our monthly data from January 1871 to December 2011 are obtained from Goyal and Welch (2008). Since not all variables are available for the entire time span, in order to take a comprehensive look at the performance of all predictors, we only consider a subset of the data from May 1937 to December 2011 for our empirical analysis. It is worth mentioning that in this paper we examine more predictive variables than those studied in Paye and Timmermann (2006) and Rapach and Wohar (2006).

The dependent variable, the market equity premium, is the log returns on the S&P 500 index including dividends minus the log returns on the risk-free rate. The predictors are

• Log dividend-price ratio (dp): log of a 12-month moving sum of dividends paid on the S&P 500 index minus the log of stock prices.

• Log dividend yield (dy): log of a 12-month moving sum of dividends minus the log of lagged prices.

• Log earnings-price ratio (ep): log of a 12-month moving sum of earnings on the S&P 500 index minus the log of stock prices.

• Log dividend-payout ratio (de): log of a 12-month moving sum of dividends minus the log of a 12-month moving sum of earnings.

• Stock market variance (svar): monthly sum of squared daily returns on the S&P 500 index.

• Cross sectional premium (csp): the relative valuations of high- and low-beta stocks.

• Book to market ratio (bm): ratio of book value to market value for the Dow Jones Industrial Average.

(18)

• Net equity expansion (ntis): ratio of a 12-month moving sum of net equity issues by NYSE-listed stocks to the total end of year market capitalization of NYSE stocks.

• 3-month Treasury bill rate (tbl): interest rate on a three-month secondary market Treasury bill.

• Long term government bond yield (lty): long term government bond yield.

• Term spread (tms): long term yield minus the Treasury bill rate.

• Default premium (dfy): difference between BAA- and AAA-rated corporate bond

yields.

• Inflation (infl): inflation is the Consumer Price Index (all urban consumers) from the Bureau of Labor Statistics.

These variables can be put into three categories: stock characteristics variables, such as the dividend price ratio; market micro-structure variables, such as the net equity expansion; and macroeconomic indicators, for example, the inflation rate.

1.2.3 Break Estimation

Our model (1.3) assumes that all regression coefficients are subject to structural breaks, since there is no convincing evidence saying otherwise. Because the total number of breaks is another parameter to estimate, a predictive model with a large number of estimated break dates fully based on equation (1.4) may be overfitted. To correct possible model overfitting, we adopt the approach by Zeileis et al. (2003) to select the number of estimated breaks based on the Bayesian information criterion which penalizes overfitting. The number of breaks associated with the minimum Bayesian information criterion (BIC) value will be selected. If the BIC value achieves its minimum at the point where the total number of breaks is zero, then it favors a stable model with no breaks.

(19)

The total number of breaks estimation results for all models are presented in FigureA.1, FigureA.2 and Figure A.3.

We have 14 models in total, 13 univariate regression models plus one historical mean model as benchmark. For each model labeled by its predictor, FigureA.1, FigureA.2and FigureA.3report the BIC value and the sum of squared residuals (RSS) as a function of the number of breaks. The RSS is shown in blue colored curve and it is downward-sloping in all figures. This is not surprising because adding one more arbitrary break is analogous to adding one more regressor in a linear model and the RSS will decrease as the result of model overfitting. The black colored BIC curve is the criterion we use in break number selection. By BIC, we can see that only the stock market variance model has evidence of parameter instability with three breaks. But the evidence is not strong enough to rule out the stable model shown in Figure A.1. Both models have approximately the same BIC value, so next we will split the analysis of the stock market variance model into two cases, the break model case and the stable model case. For other models, it is clear from these figures that the stable model is the best choice.

For the break model of stock market variance, the break date estimates are March 1956, September 1974 and November 1985. Note that the second break date, September 1974, corresponds to the timing of the oil shock documented by economists.2 The last break date may be related to the great moderation.

1.2.4 Full Sample Estimation Results

For all stable models, we simply estimate their parameters by least squares then conduct inference. Separately, for the stock market variance model with breaks, based on previous results, we estimate its parameters for each segment by least squares. Our full sample least squares estimation results for all stable models are presented in Table A.1. The full sample estimation results for the stock market variance model with breaks

(20)

are reported in Table A.2. In Table A.1, for each model labeled by its predictor, we report its in-sample R2 _{statistic, intercept estimate and predictor coefficient estimate} _β_.

Parentheses report the t statistic for each parameter estimate above. In Table A.2, we report all statistics separately for each segment.

For all predictor-based stable models except for the stock market variance model, the in-sample explanatory power of predictors measured by R2 _{is very low. Furthermore,}

most predictor coefficients are insignificant. Our results contradict with studies, such as Giacomini and Rossi (2009) and Goyal and Welch (2008), which conjecture that the insignificant predictive ability of economic variables is likely due to parameter instability. Our results show that most predictors in Goyal and Welch’s monthly data are stable in the bivariate predictive model, and the poor forecasting performance of these variables cannot be attributed to un-modeled parameter instability.

For the stock market variance model with three breaks, its R2 value is higher than any other predictors shown in Table A.1 in all segments. Furthermore, its parameter estimates are significant in all segments. Our results suggest that the stock market variance has predictive power in forecasting excess returns.

Next we will show how to apply the optimal and robust weights to forecasting out-of-sample with breaks.

1.3

Forecast with Parameter Instability

With mounting evidence of parameter instability in many macroeconomic and fi-nancial predictive models (see Rapach and Wohar (2006) and Paye and Timmermann (2006)), how to forecast a time series variable of interest with model parameter instabil-ity is an important issue. Researchers have proposed various methods to forecast under modeled breaks, and this strand of literature is fast evolving. Here we apply the weighed least squares theory proposed by Pesaran et al. (2013) to forecast in the presence of

(21)

breaks. In this section we will outline the construction of optimal weights and robust weights, and examine their empirical performance next. From a forecaster’s perspective, the latest break date should be most important to predict the future, so for models with multiple estimated breaks, we only focus on forecasting after the latest break and drive weights accordingly.

1.3.1 Methodology

1.3.1.1 Optimal Weights

The theory supporting the optimal weights assumes that the break dates and sizes are known. Following the notation of related out-of-sample forecast literature, we denote the total sample sizeT + 1, and split the sample into two parts: the first R observations for the training sample while the remaining P observations for prediction and forecast evaluation, R+P =T + 1. In addition, we impose that the break point,τ, falls into the estimation sample, and is bounded far away from both ends, that is, 1<< τ << R. We only consider the one-step ahead forecast problem. The predictive model with optimal weights is

b

yt+1 =x0t+1βb_topt (1.5)

The weights used in parameter estimation are optimal in the sense of minimizing the expected mean squared forecast error

w= arg min w E yt+1−x0t+1βb_t 2 (1.6) There are three popular estimation windows in the out-of-sample forecast literature: recursive window, rolling window and fixed window. Under the recursive window, at each point in time, the estimated parameters are updated by adding one more obser-vation starting with sample size R. Under the rolling window, the estimation window

(22)

is always fixed at length of R, for example, the first estimate uses data from period 1 to period R, while the second estimate runs from period 2 to period R+ 1. Under the fixed window, parameters are estimated only once using the entire estimation sampleR. Mathematically, for the recursive window

b β_topt = t X s=1 wsxsx0s !−1 _t X s=1 wsxsys ! (1.7)

for the rolling window

b β_topt = t X s=t−R+1 wsxsx0s !−1 _t X s=t−R+1 wsxsys ! (1.8)

and for the fixed window

b βopt = R X s=1 wsxsx0s !−1 _R X s=1 wsxsys ! (1.9) where t=R, ..., R+P −1.

The optimal weights theory states that observations in each regime will receive differ-ent weights for parameter estimation. If there is only one break, then the optimal weights take a simple two-regime form under fixed window, distinct weights across regimes but constant within each regime

     w1 = _R1 _µ₊₍₁₋_µ₎₍₁₊1 _µRλ2_ω2₎ w2 = _R1 1+µRλ 2_ω2 µ+(1−µ)(1+µRλ2_ω2₎ (1.10)

where τ is the break date, µ = τ /R, λ = β1−β2

σ , ω =

1

τ Pτ

s=1x2s. Optimal weights under recursive window or rolling window take the same form except that we need to update Rwith the actual sample size in each estimation step. Since we do not know the population value of these parameters, in practice we need to take advantage of our break

(23)

detection results earlier to provide sample approximations for the population parameter values ofβ1, β2 and σ. Our ordinary least squares estimates for the βs in the third and

fourth segments in table A.2 will serve as proxies for β1 and β2. The sample standard

deviation from September 1974 to December 2011 will be used to approximate σ.

1.3.1.2 Robust Weights

For optimal weights we have assumed that the dates and the sizes of parameter breaks are known. However, this assumption may not be relevant to real time forecasting. Specifically, the break sizes are difficult to estimate unless a relatively large number of post-break observations is available. So in addition to optimal weights, Pesaran et al. (2013) also propose weights which are robust to the uncertainty of break dates and sizes. In the robust weights theory, break dates and sizes are unknown.

The derivation of robust weights is an extension to deriving optimal weights. To illustrate the main idea of robust weights, we will continue the derivation process from equation (1.10). Rewrite equation (1.10) as

     Rw1 = _µ₊₍₁₋_µ₎₍₁₊1 _µRλ2_ω2₎ Rw2 = 1+µRλ 2_ω2 µ+(1−µ)(1+µRλ2_ω2₎ (1.11)

We can reformulate the time profile of the weights as

Rwt µ, λ2 =w2+ (w1−w2)I[τ−t] (1.12) for t= 1,2, ..., R.Hence, Rw a, µ, λ2= 1 R +µλ 2 1 R+µ(1−µ)λ 2 − µλ2 1 R +µ(1−µ)λ 2 I[µ−a] (1.13) where a=t/R ∈[0,1].

(24)

There is one discrete break in βi, but now we do not know the exact date of the break, τ. Instead, to derive the robust weights, we can impose a uniform distribution assumption on the break fraction, µ ≡ τ /R ∼ Uµ, µ, where µ and µ are some pre-specified lower and upper bounds for the break fraction. µ could take the value of zero while µ can be very close to one. By minimizing the expected mean squared forecast error, the population robust weights can be solved as

Rw(a) =            0 +O(R−1₎ _for _{a < µ} µ−µ−1R_µµ₁₋1_µdµ− µ−µ−1R_aµ₁₋1_µdµ+O(R−1_{) for} _µ_≤_a_≤_µ µ−µ−1Rµ µ 1 1−µdµ+O(R −1₎ _for_{a > µ} (1.14) then approximated by w(a)≈              0 fora < µ −1 R₍µ−µ₎log 1−a 1−µ forµ≤a≤µ −1 R(µ−µ)log 1−µ 1−µ fora > µ (1.15)

In the case where µ and µare close to the end points of 0 and 1, we have

w(a)≈ −log(1−a)

R , a ∈[0, µ] (1.16)

A discrete time version can be obtained by setting Rµ=R−1. Namely,

w_t∗ = −log(1−t/R) R−1 , fort = 1,2, ..., R−1 (1.17) and w_R∗ = −1 R−1log 1− R−1 R = log(R) R−1 (1.18)

(25)

Due to approximation error, these weights do not sum to unity, so they need to be re-scaled as wt = w_t∗ PR i=1w ∗ i , for t= 1,2, ..., R (1.19)

So under fixed window, the sample robust weights take the following form

wt =      log(1−s/R) PR−1

i=1 log(1−i/R)−log(R)

,s = 1, ..., R−1 log(R) log(R)−PR−1 i=1 log(1−i/R) ,s =R (1.20)

Robust weights under recursive window or rolling window take the same form as in equation (1.20) except that we need to update R with the actual sample size used in each estimation step. With robust weights, the least squares parameter estimates under the fixed window are:

b βR= R X s=1 wsxsx0s !−1 _R X s=1 wsxsys ! (1.21)

under the rolling window

b β_tR= t X s=t−R+1 wsxsx0s !−1 _t X s=t−R+1 wsxsys ! (1.22)

and under the recursive window

b β_tR= t X s=1 wsxsx0s !−1 _t X s=1 wsxsys ! (1.23) where t=R, ..., T.

Note that the robust weights shown in equation (1.20) do not involve break dates and sizes. Comparing robust weights with optimal weights, we can see that robust weights

(26)

take different values for different observations, as opposed to constant weights within a structural regime under optimal weights. In our empirical applications, the robust weights are monotonically increasing as time runs toward the end of the sample: the most recent observation receives the highest weight while observations in the distant past receive smaller weights. An example is shown in Figure A.11.

1.4

Out-of-sample Forecast

In the empirical analysis, we reserve the last 36 observations from the monthly data as the evaluation sample, P = 36. Theses observations represent the last three years of monthly data from January 2009 to December 2011. For the break model of stock market variance (1.3), the training sample starts with the first observation after the second break date (August 1974) and ends with the observation right before the evaluation sample (December 2008). The justification for our training sample size choice is that the econometric theory for forecasting with more than one break in the coefficient is not fully developed. Furthermore, from a researcher’s perspective in empirical analysis, the

latest break matters the most. Overall, we have R = 859 and P = 36 for the stable

model of the stock market variance in equation (1.1), while R = 412 and P = 36 for the structural break model of the stock market variance (1.3). We use the mean squared forecast error (MSFE) to evaluate forecasts and compare results.

1.4.1 Forecast Using the Stable Model of Stock Market Variance

We first examine the out-of-sample performance of model (1.1) for the stock market variance without assuming structural breaks. We use model (1.1) to forecast the last 36 months of the equity premium using all window choices. In addition, we also include forecasts from the historical mean benchmark model (1.2). The results are shown in FigureA.4.

(27)

Figure A.4shows that almost all estimation windows perform at least as well as the benchmark measured by a series of test errors, which supports our in-sample estimation results that the predictive power of stock market variance is significant. It is worth mentioning that forecasting results using annual data in Goyal and Welch (2008) suggest that the regression coefficient for the stock market variance predictor is insignificant, but our results using monthly data state otherwise. This could be due to the fact that we have more observations for parameter estimation using monthly data.

1.4.2 Forecast Using the Break Model of Stock Market Variance

Previously we have shown the forecasting performance of the stable model (1.1). Here we switch to the break model (1.3) and apply the optimal and robust weights to forecasting out-of-sample. In addition, we also consider the post-break window method which only uses observations after the latest break to estimate parameters.

In practice, it is up to the researcher to decide which method to use among optimal weights, robust weights and post-break window. Robust weights involve using observa-tions even before the break date to estimate parameters so it may introduce estimation bias. The post-break window only uses observations after the recent break so it may help reduce estimation bias, but if the post-break window size is small, it may result in a large efficiency loss. Optimal weights assume that the true break dates and sizes are known, but in practice it is almost impossible to estimate them with great precision, especially when either the sample size or the break size is small.

1.4.2.1 Fixed Window

Out-of-sample results for the stock market variance model under fixed window are shown in Figure A.5.

We can see that the stock market variance model performs at least as well as the benchmark over the evaluation sample period measured by a series of test errors. The

(28)

robust weights perform especially well towards the end of the evaluation sample period. Comparing weighting methods, our results suggest that the post-break window could be used as an alternative to robust weights if the computation of robust weights is costly.

1.4.2.2 Recursive Window

Out-of-sample forecasting results for the stock market variance model under recursive window are shown in Figure A.6.

In this case we see that the robust weights and the post-break window work well over most part of the evaluation sample period. For most forecasts, the efficiency gains are relatively large under either robust weights or post-break window compared with the historical mean model.

1.4.2.3 Rolling Window

Out-of-sample results for the stock market variance model under rolling window are shown in Figure A.7.

Results in this case are similar to those under fixed window. Robust weights forecast better than others at the beginning and towards the end of the sample. Post-break window does well during the middle of the evaluation period.

1.4.3 Comparing the Stable Model with the Break Model

Previously we have shown that the stock market variance has predictive power in forecasting excess returns based on Goyal and Welch’s monthly data, and the predictive ability stays regardless of the presence of structural breaks. Since our break detection results presented in section 1.2.3 do not provide a clear guidance on which model to choose, the break model (1.3) or the stable model (1.1) for the stock market variance, a natural extension is to compare the out-of-sample performance between these two models.

(29)

Following Goyal and Welch (2008), to construct a graphical device to evaluate the out-of-sample forecasting performance for two competing models, we will create a time series plot of the mean squared forecast error (MSFE) difference between the stable and break model,

∆MSFE = MSFEstable−MSFEbreak (1.24)

We will consider all estimation window choices, namely, recursive window, rolling window and the fixed window. In addition, since we have three weighting choices for the break model, totally we have nine MSFE difference time series plots. These MSFE difference plots are presented in Figure A.8, FigureA.9 and FigureA.10. In each plot, if the curve moves up, it implies that the break model outperforms the stable model during that evaluation period. If the curve moves down, it supports the stable model during that period.

A number of fluctuations can be seen under the recursive window in Figure A.8. All weighting methods show strong support for the break model at the end of the sample, and the MSFE difference curve remains positive for most part of the evaluation period. Rolling window favors the stable model as shown in FigureA.9. Both optimal weights and robust weights support the stable model at the beginning of the series, and the MSFE difference remains negative for most part of the evaluation period. The post-break window curve is very flat, and it stays close to zero through the entire evaluation period.

For the fixed window shown in Figure A.10, we can see that the robust weights

show strong support for the break model at the beginning and towards the end of the evaluation period. Optimal weights and post-break window are flat with the difference remaining positive for most part of the evaluation period.

(30)

1.5

Conclusion

Goyal and Welch (2008) have examined the out-of-sample performance of a wide range of predictors suggested by the empirical finance literature in forecasting excess returns using stable linear models. They conclude that most predictors have weak predictive ability. Furthermore, researchers argue that the cause of the failure for these predictors is parameter instability and have provided empirical evidence, see Paye and Timmermann (2006) and Rapach and Wohar (2006). Then the problem is how to forecast out-of-sample with modeled breaks, and how to evaluate forecasts and compare models.

This paper applies the newly developed theory of optimal and robust weights to forecasting the U.S. market equity premium in the presence of structural breaks using Goyal and Welch’s data. The weights are optimal in the sense of minimizing the expected mean squared forecast error, or robust to the break dates and size estimation error. Our empirical results suggest that parameter instability cannot fully explain the weak forecasting performance of most predictors considered in Goyal and Welch (2008). We find that out of 13 predictors, only one variable, the stock market variance, has evidence of structural breaks. But the evidence is not strong to rule out the stable model. Our empirical results suggest that the stock market variance has predictive power for market equity premium regardless of the presence of modeled breaks. Comparing the break model with the stable one, our results favor the former in forecasting excess returns.

(31)

CHAPTER 2.

COMBINING MULTIPLE PREDICTIVE

MODELS WITH POSSIBLE STRUCTURAL BREAKS

2.1

Introduction

Forecast combination is receiving growing attention in econometrics and finance. Combining predictive models is a smoothed extension of model selection, and may pos-sibly substantially reduce risk relative to model selection. While a broad consensus is that forecast combination improves forecast accuracy, there is no consensus on how to construct the forecast weights. Particularly, researchers have recognized the usefulness of forecast combination in the presence of model parameter instability, and structural breaks are often mentioned as motivation for combining predictive models. The under-lying idea is that models may differ in how they adapt to changes. Thus, when breaks are small, predictive models with stable parameters may outperform models with time-varying parameters. The converse is true in the presence of large breaks happened in the distant past. Since estimating the break dates and sizes precisely is difficult in real time, it is possible that combining forecasts from models with different degrees of adaptability can offer significant gains relative to selecting a single best model. Recent literature on economic forecasting1 _{has focused on two particularly appealing methods, equal weights} and Bayesian averaging. The equal weights method selects a set of models and then assigns them all equal weight for all forecasts. The Bayesian averaging method produces weights as by-product of Bayesian model averaging. In addition to the aforementioned

(32)

weights, Hansen (2007) proposes Mallows’ model averaging, and has extended the theory to various settings in subsequent research.2

In the literature on forecasting the market equity premium, Rapach et al. (2010) and Rapach and Zhou (2013) show that forecast combination can deliver statistically and economically significant out-of-sample gains relative to the historical average returns consistently over time. They argue that model uncertainty and instability seriously impair the predictability of individual model and the empirical explanations for the benefits of forecast combination are that combining forecasts can take advantage of all available information and combining forecasts are linked to the real economy. In their empirical analysis, they report that forecast combination can solve the puzzle presented in Goyal and Welch (2008) that many economic variables have week or no predictive power to forecast the U.S. market excess returns based on linear models. Specifically, Rapach et al. (2010) and Rapach and Zhou (2013) apply combination methods such as equal weights and discounted mean squared forecast error weights to demonstrating its superior out-of-sample performance relative to the historical mean benchmark. But it is not clear in their analysis how the equal weights and the discounted mean squared forecast error weights are related to structural breaks.

This paper introduces a two-stage forecast combination method which explicitly deals with structural breaks. In the first stage, to take into account the uncertainty on param-eter instability for each predictive model, we combine its stable and break cases by using one of the four proposed methods, namely, equal weights, discounted mean squared fore-cast error weights, Schwarz information criterion weights weights and Mallows’ weights. Next, we pool all adjusted predictive models obtained from the first stage together by applying equal weights. We recommend using the Mallows’ weights in the first stage because it is theoretically justified by Hansen (2009).3

2_{See Hansen (2008), Hansen (2009) and Hansen and Racine (2011)}

3_{In our empirical analysis, Mallows’ weights work the best among all four methods in forecasting}

(33)

To evaluate our two-stage forecast combination method and to compare results with related literature, we apply the two-stage forecast combination method to forecasting the U.S. market equity premium out-of-sample using an updated comprehensive dataset from Goyal and Welch (2008), and compare our results with those from Rapach et al. (2010) and Rapach and Zhou (2013) in similar studies. It is worth mentioning that in this paper we use all frequencies of data available to thoroughly investigate the empir-ical performance for all combination methods.4 _{Our empirical results suggest that the} two-stage forecast combination method, especially the one based on Mallows’ weights, can potentially offer substantial forecasting gains relative to a simple one-stage equal weighting method used in Rapach et al. (2010) and Rapach and Zhou (2013) over the same dataset.5

This paper builds on an extensive literature on forecast combination and market equity premium prediction. Timmermann (2006) provides a comprehensive survey on forecast combination by analyzing theoretically the factors that determine the advantages from combining forecasts, and discussing several cases related to model misspecification, parameter instability and the role of combinations under asymmetric loss. Hansen (2008) proposes forecast combination based on the Mallows’ information criterion which is an asymptotically unbiased estimate of both the in-sample mean squared error and the out-of-sample one-step ahead mean squared forecast error. Clark and McCracken (2010) examines the effectiveness of combining various models of instability in improving VAR forecasts made with real-time data, and considers a wide range of forecast combination methods in their analysis. Elliott (2011a) examines the sizes of the theoretical gains to optimal combination and provides conditions under which averaging and optimal com-bination are equivalent. Cheng and Hansen (2013) consider forecast comcom-bination with factor-augmented regression and investigate forecast combination across models using

4_{Rapach et al. (2010) uses quarterly data only. Rapach and Zhou (2013) uses monthly data only.} 5_{Rapach et al. (2010) and Rapach and Zhou (2013) also consider other methods, such as median}

weighting, trimmed mean weighting and discounted mean squared forecast error weights with different values of the discount factor, the simple one-step overall equal weights perform the best.

(34)

weights that minimize the Mallows’ and the leave-h-out cross validation criteria. Ra-pach and Wohar (2006) examine the structural stability of predictive regression models of the U.S. quarterly aggregate real stock returns over the postwar era. They find strong evidence of structural breaks in several bivariate predictive regression models of S&P 500 returns. Goyal and Welch (2008) systematically investigates the in-sample and out-of-sample performance of linear regressions that predict the equity premium with prominent variables suggested by the academic literature. Campbell and Thompson (2008) show that many predictive regressions of equity premium using other financial predictors can beat the historical market average return once some restrictions are imposed on the signs of model parameters and return forecasts. Rapach et al. (2010) recommend combining individual forecasts to predict market equity premium and show that forecast combina-tion offers statistically and economically significant out-of-sample gains relative to the historical average market returns consistently over time. Rapach and Zhou (2013) survey the literature on equity premium forecasting and show strategies, such as economically motivated model restrictions, forecast combination, diffusion indices and regime switch-ing models, can improve forecastswitch-ing performance by addressswitch-ing the substantial model uncertainty and parameter instability.

The remainder of this paper is organized as follows. Section 2.2 presents the econo-metric models, estimation method, out-of-sample forecast procedure and forecast com-bination methods. Section 2.3 presents the data and our empirical results. Section 2.4 concludes.

(35)

2.2

Econometric Model

2.2.1 Bivariate Predictive Model

Following Goyal and Welch (2008) and Rapach et al. (2010), we first consider the stable one-step ahead bivariate predictive model:

rt+1 =β0i +β1iXi,t+et (2.1)

where rt+1 is the one period ahead market equity premium, Xi,t is predictor i avail-able at time t to forecast the next period excess returns and et is a disturbance term. We generate a series of out-of-sample forecasts of the market equity premium using a recursive estimation window. Specifically, we split the total sample of T observations into two parts, an estimation sample of sizeR and an evaluation sample of sizeP, where R+P =T. Under the recursive window, at each point in time, the estimated parameters are updated by adding one more observation starting with sample size R. For example, the first out-of-sample forecast of the market equity premium based on predictor xi,t is

ˆ

ri, R+1 = ˆβ0i, R+ ˆβ i

1, RXi, R (2.2)

where ˆβi

0, R and ˆβ1i, R are the ordinary least squares estimates of β0i and β1i, respectively,

in equation (2.1) using the first R observations in the sample. Then, the second period out-of-sample forecast is

ˆ

ri, R+2 = ˆβ0i, R+1+ ˆβ1i, R+1Xi, R+1 (2.3)

where ˆβ₀i_{, R}₊₁ and ˆβ₁i_{, R}₊₁ are the least squares estimates ofβ₀i andβ₁i using the firstR+ 1 observations. Proceeding in this manner through the end of the out-of-sample periodT, we have recursively produced a sequence of out-of-sample forecasts of size P,{rˆs+1}Ps=1,

using predictor xi,t. Using predictive model (2.1), we can apply the same procedure to the rest of predictors, xi,t, where i = 1, ..., M. Since we have 14 predictors available to

(36)

forecast excess returns across all data frequencies in our empirical applications presented in section 2.3,M = 14.

In additional to the bivariate predictive model (2.1) using various predictors to fore-cast the market equity premium, we can simply use the historical mean of the excess returns for prediction

rt+1 =β0i +et. (2.4)

We apply the same out-of-sample procedure outlined previously to generate a sequence of forecasts of sizeP according to model (2.4), we label this series{¯rs+1}Ps=1. In the

liter-ature of examining the efficient capital market hypothesis and forecasting stock returns, the historical mean model (2.4) serves as a natural benchmark predictive model to com-pare with other proposed complex models, see Goyal and Welch (2008), Campbell and Thompson (2008) and Rapach et al. (2010). To compare results with related literature, we continue to use model (2.4) as benchmark in this paper.

2.2.2 Forecast Combination

Since there is no prior information implying that model (2.1) is the true model or the best linear predictor, given that researchers have documented evidence of parameter instability or structural breaks in linear models forecasting stock returns,6 _{a natural} competing alternative to model (2.1) is a model with breaks in its coefficients

rt+1 =β0i,t+β i

1,tXi,t +et. (2.5)

Because of sample size concerns and the uncertainty surrounding the quality of the estimates of structural break dates and sizes, we only consider the one break model in this paper, rt+1 =        β₀i_,₁+β₁i_,₁Xi,t+et if t < τ β₀i_,₂+β₁i_,₂Xi,t+et if t≥τ (2.6)

(37)

whereτ is the time index of the break. The break dateτ is restricted to the interval [τ1, τ2]

which is bounded away from the ends of the sample on both sides, 1 < τ1 < τ2 < R.

Following related literature,7 _{we restrict that the break date falls into the middle 70%} portion of the estimation sample.

The break date can be estimated by concentration. That is, for a fixed value of τ, we can estimate the piece-wise model parameters by least squares and then calculate the sum of squared errors, SSE(τ) = ˆet(τ). We apply this procedure to all possible values of τ in the interval [τ1, τ2], so we have a series of values of the sum of squared errors, {SSEs(τ)}sτ2=τ1. Our estimate of the break date, ˆτ, would be the value of τ that is the

global minimizer of {SSEs(τ)}τs2=τ1.

For a given bivariate predictive model with predictor Xi, we can choose a stable version of model (2.1) or a break version of model (2.6). Next, we are going to present several forecast combination methods to combine model (2.1) and model (2.6) to form a averaged model with predictor Xi. We assign weight w to the break model (2.6) and 1−w to the stable model (2.1), where w∈[0,1], so the averaged model, MODi, is

rt+1 =w β₀i_,t+β₁i_,tXi,t + (1−w) β₀i +β₁iXi,t +et. (2.7) 2.2.2.1 Equal Weights

We can equally weight all models without estimating or calculating any additional parameters, so the averaged model (2.7), MODe_i, is

rt+1 = 1 2 β₀i_,t+β₁i_,tXi,t + 1 2 β₀i +β₁iXi,t +et. (2.8)

2.2.2.2 Discounted Mean Squared Forecast Error Weights

Stock and Watson (2003) propose a discounted mean squared forecast error (DMSFE) combination method that computes weights based on the past predictive performance of

(38)

individual models over a holdout out-of-sample period. That is, for model j at timet, w_j,td = φ −1 j,t PM s=1φ −1 s,t (2.9) where φj,t = t X l=1 θt−l(rl+1−rˆl+1)2 (2.10)

and θ is a discount factor. The DSMFE method assigns greater weight to individual

predictive model with lower past mean squared forecast error (MSFE) over the holdout out-of-sample period. When θ = 1, there is no discounting, so all past observations are treated equally when calculating MSFE over the holdout period. If θ < 1, DMSFE allows for greater weights on the more recent observations. The averaged model (2.7), MODd_i, is rt+1 =wdi,t β₀i_,t+β₁i_,tXi,t + (1−wdi,t) β₀i +β₁iXi,t +et. (2.11)

2.2.2.3 Schwarz Information Criterion Weights

The Schwarz information criterion weight8 _{for each predictive model is calculated} based on the associated value of the Schwarz information criterion (SIC). For example, at time t, if the the SIC value for the break model (2.6) is SICb(t) and the SIC value for the stable model (2.1) is SICs(t), then the SIC weight for the break model, w_ts, is

w_ts = exp (SIC

b₍_t₎₎

exp (SICb(t)) + exp (SICs(t)). (2.12)

The averaged model (2.7), MODd_i, is: rt+1 =wsi,t β₀i_,t+β₁i_,tXi,t + (1−wsi,t) β₀i +β₁iXi,t +et. (2.13) 2.2.2.4 Mallows’ Weights

Hansen (2007) proposes an averaging estimator with the weight selected to minimize a Mallows’ information criterion, which is an asymptotically unbiased estimate of both

(39)

the in-sample mean squared error and the out-of-sample one-step ahead mean squared forecast error. Subsequently, Hansen (2009) extends Mallows’ model averaging to the structural break case. Specifically, at time period t, the Mallows’ weight for the break model (2.6), wm_t , is w_tm =        0 if Ft<p¯ 1− _Fp¯ t if Ft≥p¯ (2.14)

where Ft is the standard F-test statistic in Andrews (1993), and ¯pis a penalty coefficient whose value depends on the asymptotic distribution of the Andrews’ SupF test statistic. Hansen (2009) provides a table of ¯pvalues for various cases.

The averaged model (2.7), MODm_i , is

rt+1 =wmi,t

β₀i_,t+β₁i_,tXi,t + (1−wmi,t)

β₀i +β₁iXi,t +et. (2.15)

2.2.2.5 Combining All Predictive Models

Since the seminal work of Bates and Granger (1969), it has been known that combin-ing forecasts across predictive models can produce forecasts that outperform any scombin-ingle individual model. Forecast combination can be viewed as a diversification strategy anal-ogous to portfolio diversification that improves forecasting performance.

For our bivariate predictive model (2.1), since totally we haveM predictors available, there are M candidate models to forecast the market equity premium. Once we include break model (2.6), we end up with 2M predictive models. Previously we have shown several methods to average the stable and break version of a predictive model with predictor Xi, next, we are going to combine all 2M models to form one pooled model.

Specifically, for each bivariate model iwith predictor Xi, first, we combine its stable and break cases using the four previously outlined combination methods to get MODj_i, i = 1, ..., M and j ∈ {e, d, s, m}. So for a given weighting method j, at this point we haveM models left from the initial 2M models.

(40)

Next, we will assign equal weight to all M models for a given method j, that is, 1 M PM i=1MOD j

i. This will be our final combined model to forecast,

rt+1 = 1 M M X i=1 w_i,tj β₀i_,t+β₁i_,tXi,t + (1−wj_i,t)β₀i +β₁iXi,t +et (2.16) where j ∈ {e, d, s, m}.9

Intuitively, we have introduced a two-stage weighting procedure. In the first stage, for each predictive modeli, we combine its stable and break cases by using one of the four outlined methods, namely, equal weights, DMSFE weights, SIC weights and Mallows’ weights. Then, we pool all models together by equal weights.

Note that in the second stage we only consider equal weights to pool all models which have been averaged for parameter instability in the first round. The reason is that in many empirical applications, combining a large number of predictive models by equal weights tend to outperform other complex methods. But at the first stage regarding model parameter instability, complex averaging methods may offer substantial gains.

2.2.3 Forecast Evaluation

A popular metric to evaluate forecasts is the mean squared forecast error (MSFE)

MSFE = 1 T T X t=1 (yt−yˆt)2. (2.17)

Since in this paper we compare forecasting performance of various predictors and combination methods with the historical mean benchmark, and to compare our results with those from related literature, we adopt Campbell and Thompson’s out-of-sample R_OS2 statistic to compare methods and models,

R2_OS(i) = 100× 1− M SF E i M SF E0 (2.18) whereiindexes the model or method and 0 represents the benchmark. TheR2_OS statistic measures the reduction in MSFE for the predictive model or combination forecast relative

(41)

to the historical average model. Thus, if theR2

OS statistic is positive, it indicates better forecasting performance for model i than the benchmark model. The higher the R2

OS value, the better the out-of-sample performance.

In addition, following Goyal and Welch (2008) and Rapach and Zhou (2013), to con-struct a series of graphical device to evaluate the out-of-sample forecasting performance for the benchmark and the competing models, we adopt the cumulative difference in squared forecast errors (CDSFE) between the historical mean model and the bivariate model or the combined model as another metric,

CDSFEt = t X s=1 (rs+1−r¯s+1) 2 − t X s=1 (rs+1−rˆs+1) 2 (2.19) where ¯rs+1 is the forecast from the benchmark model (2.4) and ˆrs+1 is the forecast from

model (2.1) or model (2.16). At any periodt, if the value of CDSFEtis positive, it implies

that the competing model outperforms the benchmark by having a smaller prediction error rate.

Time series plots of the CDSFE curve can be conveniently used to determine if the competing model has a lower MSFE than the historical average benchmark for any period by simply comparing the height of the curve at the beginning and end points of the segment corresponding to the period of evaluation. If the curve is higher at the end of the evaluation period relative to the starting time, then the competing model or method has a lower MSFE than the benchmark during the out-of-sample evaluation period. A model or method which forecasts better than the historical average model will thus have a slope that is positive everywhere during the out-of-sample evaluation period.

2.3

Empirical Results

2.3.1 Data and Out-of-sample Forecast

Our data are from Goyal and Welch (2008). We use the most recently updated data up to the year of 2013. The market equity premium is the log returns on the S&P 500

(42)

index including dividends minus the log returns on the risk-free rate. Since we apply forecast combination methods to data of all frequencies, we only keep the following fourteen variables which are available for all data frequencies:

• Log dividend-price ratio (dp): log of a 12-month moving sum of dividends paid on the S&P 500 index minus the log of stock prices.

• Log dividend yield (dy): log of a 12-month moving sum of dividends minus the log of lagged prices.

• Log earnings-price ratio (ep): log of a 12-month moving sum of earnings on the S&P 500 index minus the log of stock prices.

• Log dividend-payout ratio (de): log of a 12-month moving sum of dividends minus the log of a 12-month moving sum of earnings.

• Stock market variance (svar): monthly sum of squared daily returns on the S&P 500 index.

• Book to market ratio (bm): ratio of book value to market value for the Dow Jones Industrial Average.

• Net equity expansion (ntis): ratio of a 12-month moving sum of net equity issues by NYSE-listed stocks to the total end of year market capitalization of NYSE stocks.

• Treasury bill rate (tbl): interest rate on a three-month secondary market Treasury bill.

• Long term yield (lty): long term government bond yield.

• Long term return (ltr): return on long term government bonds.

(43)

• Default yield spread (dfy): difference between BAA- and AAA-rated corporate bond yields.

• Default return spread (dfr): long term corporate bond returns minus the long term government bond returns.

• Inflation (infl): inflation is the Consumer Price Index (all urban consumers) from the Bureau of Labor Statistics.

As for the sample split choices, to make our results comparable to those of Goyal and Welch (2008), Rapach et al. (2010) and Rapach and Zhou (2013), we adopt the following choices:

• Monthly data: the estimation sample runs from 1927:01 to 1956:12 (R = 360), and the evaluation sample runs from 1957:01 to 2013:12 (P = 684).

• Quarterly data: the estimation sample runs from 1947:1 to 1964:4 (R = 72), and the evaluation sample runs from 1965:1 to 2013:4 (P = 196).

• Yearly data: the estimation sample runs from 1927 to 1964 (R = 38), and the

evaluation sample runs from 1965 to 2013 (P = 49).

We use the recursive window to forecast out-of-sample, meaning that the estimation sample always starts from the same beginning period and additional observations are used as they become available. Following this procedure, all model parameters are estimated recursively and out-of-sample forecasts are generated accordingly.10

The time series plots of monthly, quarterly and yearly data are presented in Fig-ureB.1, FigureB.2and FigureB.3, respectively. Additionally, we also present the corre-lation matrices for all variables across all data frequencies in Figure B.4, FigureB.5 and 10_{Model parameters can also be estimated using a rolling window, which drops earlier observations as}

additional data become available. Rolling window may be justified by appealing to structural breaks, but our results shown in this paper do not change substantially if we switch to use the rolling window.

(44)

Figure B.6. We can see that the dependent variable, the U.S. market equity premium, is weekly correlated with all 14 predictors across all frequencies. This may partially ex-plain the earlier findings in Goyal and Welch (2008) that the simple bivariate predictive model (2.1) forecasts excess returns poorly out-of-sample compared with the historical mean model (2.4). Furthermore, some predictors, such as dividend-price ratio (dp), dividend-yield ratio (dy), earnings-price ratio (ep) and book to market ratio (bm), are highly correlated with each other. This may explain the poor out-of-sample performance of the “kitchen-sink” model which includes all predictors in Goyal and Welch (2008).

2.3.2 Bivariate Model Prediction

To reexamine the empirical results of Goyal and Welch (2008) with updated data, here we use model (2.1) to forecast equity premium out-of-sample for all 14 predictors. Then we present the out-of-sample time series plots of the cumulative difference of squared forecast error (CDSFE) between model (2.1) and the historical benchmark model (2.4) for all predictors. The monthly, quarterly and yearly CDSFE plots are shown in FigureB.7, Figure B.8 and Figure B.9, respectively. This is an informative graphical device that shows an individual model’s out-of-sample forecasting performance over time. When the curve in each panel of those figures increases, the predictive model outperforms the historical average model, while the opposite holds when the curve decreases. These plots conveniently illustrate whether an individual model has a lower MSFE than the benchmark over a selected out-of-sample evaluation period. A predictive model that always beats the benchmark for any period will have a curve with a slope that is always positive.

From Figure B.7, Figure B.8 and Figure B.9, we can see that the historical mean model still outperforms most predictors even with updated data. Furthermore, there is no models or predictors which consistently outperform the historical mean over the evaluation period for all data frequencies. These figures also suggest that dividend-price

(45)

ratio (dp), dividend-yield ratio (dy), earnings-price ratio (ep) and book to market ratio (bm) tend to outperform the benchmark over the most part of the evaluation period. This is not surprising because in the previous section we have shown that these variables are highly positively correlated with each other, but they are weakly correlated with the market premium. Overall, whether these variables have statistically significant predic-tive power is questionable in linear predicpredic-tive models, so it is still difficult to identify individual predictors that reliably beat the benchmark in predicting excess returns.

2.3.3 Forecast Excess Returns Using Combined Model

Rapach et al. (2010) and Rapach and Zhou (2013) conclude that forecast combination appears successful for out-of-sample premium prediction because it can substantially re-duce forecast variance and include information from various economic predictors. They also suggest that the usefulness of forecast combination ultimately stems from the highly uncertain, complex and constantly evolving data-generating process underlying the ex-pected market excess returns. They find that combining forecasts, especially using equal weights averaging all models, outperform the historical average by statistically and eco-nomically meaningful margins, and more consistently than a range of individual models suggested by the literature.

Our main empirical contribution is to prove that the two-stage forecast combination method outlined in section 2.2.2.5, can substantially improve the out-of-sample forecast performance using the same dataset studied by Rapach et al. (2010) and Rapach and Zhou (2013). The out-of-sample performance are evaluated and compared using both the

CDSFE plots and the Campbell and Thompson (2008)R2

OS. The results hold consistently

for all data frequencies.

Before reporting our empirical results on two-stage forecast combination, we start with presenting the out-of-sample forecast correlation matrices for all bivariate models for monthly, quarterly and yearly data in Figure B.10, Figure B.11 and Figure B.12,

(46)

respectively. From these figures we can see that forecasts form some models are correlated to some degree, for example, forecasts from the dividend-price ratio model (dp) and the long term yield model (lty) are negatively correlated. This graphical device help us confirm that our choice of equal weights for the second stage combination outlined in section 2.2.2.5is more desirable than optimization based complex weights, for example, the least squares weights.11

In additional to the four first-step combination methods outlined in section 2.2.2, namely, equal weights, DMSFE weights, SIC weights and Mallows’ weights, we also con-sider combining all 14 stable models12_{(model (}_2.1_{)) and all 14 break models (model (}_2.6₎₎ in the following analysis. They are labeled “stable” and “break”, respectively, in our sub-sequent figures and tables. So there are six combined models available.

The solid lines in Figure B.13, Figure B.14 and Figure B.15 plot the differences between the cumulative squared forecast error for the historical mean model and the cu-mulative squared forecast error for the combined model of all six methods for monthly, quarterly and annual data, respectively. In contrast to previous figures for the bivariate

model, the CDSFE curves in Figure B.13, Figure B.14 and Figure B.15 are

predomi-nantly positive, indicating that for this particular dataset and forecast problem, forecast combination delivers substantial and consistent gains compared with individual predic-tive model. Note that the combined models work particularly well for the monthly data as the slope of the CDSFE curve is almost positive for the entire out-of-sample evaluation period for all methods. While for the annual data, it looks like the forecast performance of the combined model somehow deteriorates towards the end of the evaluation sample period.

Furthermore, we are interested in comparing the six averaging methods used in our combined models. This can be done by comparing their associated R2

OS statistics.

Ta-11_{See Timmermann (2006).}

12_{This is equivalent to the “Mean” combination method used in Rapach et al. (2010), and the}

(47)

ble B.1reports the out-of-sampleR2

OS statistics for all combination methods for all data

frequencies. R2

OSmeasures the percent reduction in mean squared forecast error (MSFE)

for the combination methods (2.16) given in the first row of Table B.1 relative to the historical average benchmark forecast by model (2.4). Cp stands for Mallows’ weights. DMSFE is the discounted mean squared forecast error weights with discount factorθ = 1. The column titled “Break” shows equal weights for the break version of all bivariate pre-dictive models. The column titled “Stable” shows equal weights for the stable version of all bivariate predictive models. We apply the two-stage forecast combination procedure to the first four columns, meaning that in the first stage, for each bivariate predictive model, we use Cp, DMSFE, Equal or SIC weights to average its stable and break cases, then we apply equal weights to all 14 break-adjusted models. For the last two columns, we simply equally weight all 14 break or stable bivariate predictive models.

From Table B.1 we can see that forecast combination offers the largest gains relative to the benchmark for monthly data. More than 10% reduction in mean squared forecast error can be achieved by using combined models. Among the four methods de