A few remarks about model robustness - A sorted leading indicators dynamic (SLID) factor model

Apart from uncertainty about the future, it is customary to distinguish between three types of uncertainty that might affect model-based forecast accuracy: measurement, model and parameter uncertainty. Factor models are less subject to such uncertainties than most low-dimension models. In factor models, measurement errors (resulting in noise) are likely to be removed from the first few factors. Most variables that might contain leading information can be included in the model. There is no risk of omitting a relevant variable [66], and unnecessary variables are likely to be poorly correlated with the first few factors. Factor models are “non parametric” in the sense that no specific assumption is made about the statistical distributions of the data and, consequently, parameter uncertainty is not an issue.

However, the issue of robustness is also relevant for factor models. In the present model, as with every factor model, the selection of series and hence the composition of the data set is based on judgement and data availability. It is likely that data composition has an impact on factors (especially when they are extracted in real time from leading variables) and that factor model robustness might depend to some extent on the database structure.

Since the model is non-parametric, classic robustness tests are not relevant. Tests of robustness are rather concerned with monitoring the impact ceteris paribus of each calibration, and of the database composition (e.g. dropping ex ante some selected series).

4.1. FACTOR SELECTION

The choice of factors was based on an adapted version of the BIC and confirmed by out-of-sample RMSE statistics. RMSE statistics generally differed across factor combinations, with one exception: over our out-of-sample span, results were close with three or four factors, but at different levels of filtering. Given the room for an empirical choice of factors, the issue is twofold: is the model robust to the choice of factors? Can economic analysis inform the choice of factors?
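As a rough illustration of the out-of-sample check mentioned above, the sketch below compares rolling-window RMSEs across numbers of factors on simulated data. It assumes plain principal-component factors and an OLS regression of the reference variable on the factors, which is not necessarily the estimation method used in this paper. The function names `principal_factors` and `oos_rmse`, the 28-quarter window and the simulated panel are all hypothetical.

```python
import numpy as np

def principal_factors(X, k):
    """First k principal-component factors of the standardised panel X (T x N)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

def oos_rmse(X, y, k, window):
    """Rolling-window RMSE of coincident forecasts of y built on k factors."""
    errors = []
    for end in range(window, len(y)):
        # Factors are extracted from the window plus the target quarter,
        # for which X is observed but y is treated as not yet known.
        F = principal_factors(X[end - window:end + 1], k)
        A = np.column_stack([np.ones(window), F[:-1]])
        beta, *_ = np.linalg.lstsq(A, y[end - window:end], rcond=None)
        forecast = beta[0] + F[-1] @ beta[1:]
        errors.append(y[end] - forecast)
    return float(np.sqrt(np.mean(np.square(errors))))

# Simulated one-factor panel (purely illustrative, not the paper's data).
rng = np.random.default_rng(0)
T, N = 60, 40
f = rng.standard_normal(T)
X = f[:, None] * rng.uniform(0.5, 1.5, N) + 0.5 * rng.standard_normal((T, N))
y = f + 0.3 * rng.standard_normal(T)

for k in (1, 2, 3, 4):
    print(k, round(oos_rmse(X, y, k, window=28), 3))
```

When two values of `k` give close RMSEs, as reported here for three and four factors, a comparison of this kind cannot settle the choice on its own, which is where the interpretation of the factors comes in.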

It is impossible to answer the first question on the basis of the latest developments in the econometric literature on factor models. Most models are based on fitted factors, and this is a weak spot which probably weighs on their forecast performance.

On the second question, the interpretation of factors can definitely help and inform their selection. For instance, in the case of the coincident forecast, the fourth factor seems to reflect the impact of wealth effects on the recent developments in private consumption where the data were filtered. The economist can choose to keep the factor if the recent trend is thought likely to continue. On the other hand, the model with four factors seemed to perform best with almost no filtering, in which case the first factors (accounting for the largest share of the data variance) cannot be interpreted as readily as with three factors and filtered data (in particular, the correlation of the first factor with the reference variable is much lower). This discussion shows that, in practice, factor interpretation can substantially help factor selection thanks to data filtering.

[66] Banerjee et al. [2003] suggest that a real-time indicator-selection procedure produces significantly better results. It means that even if a selected indicator cannot be used in the medium run to produce reliable forecasts, it could be used reliably in the short run, which would suggest some kind of short-run inertia in the forecasting power of such an indicator. The explanation probably lies in the fact that a given subset of indicators provides reliable forecasts because they are appropriate for describing a given shock to the economy. The indicators remain relatively reliable as long as the effects of this shock are spreading through the economy.

4.2. SIGNAL/NOISE RATIO

This paper suggests that factor estimation can be enhanced by data filtering that removes series with a low correlation to GDP from the database. The impact of the correlation threshold was checked for a small range running from nil (no filtering) to 0.5. Sorting and selecting series on the basis of a signal-to-noise ratio always improved forecast accuracy substantially. Forecast performances obtained with three factors change gradually over forecast horizons and are relatively stable across levels of filtering.
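The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the filter uses the absolute contemporaneous correlation with the reference series, factors are plain principal components, and the names `filter_by_correlation` and `principal_factors` as well as the simulated panel are hypothetical rather than taken from the paper.

```python
import numpy as np

def filter_by_correlation(X, y, threshold):
    """Keep the columns of X whose absolute correlation with y is >= threshold.

    X: (T, N) panel of candidate series; y: (T,) reference series (e.g. GDP growth).
    Returns the filtered panel and the boolean mask of retained columns.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    keep = np.abs(corr) >= threshold
    return X[:, keep], keep

def principal_factors(X, k):
    """First k principal-component factors of the standardised panel X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

# Simulated panel: 30 informative series and 30 pure-noise series (illustrative only).
rng = np.random.default_rng(0)
T = 40
y = rng.standard_normal(T)
informative = y[:, None] + 0.3 * rng.standard_normal((T, 30))
uninformative = rng.standard_normal((T, 30))
X = np.hstack([informative, uninformative])

# Thresholds span the range examined in the text, from nil to 0.5.
for threshold in (0.0, 0.25, 0.5):
    X_filtered, keep = filter_by_correlation(X, y, threshold)
    F = principal_factors(X_filtered, 3)
    print(threshold, int(keep.sum()), F.shape)
```

A threshold of zero keeps every series, so the unfiltered model is the special case at the lower end of the range.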

A possible explanation for this phenomenon is that an appropriate level of filtering removes subsets of series with very low correlation to the reference variable but high idiosyncratic cross-correlation, which might bias factor estimation in an approximate factor model framework. If one accepts this argument, the relevant level of filtering is intrinsic to the database that is used. In other words, filtering can correct for some unwelcome features of the data with respect to factor estimation: too high a correlation among some series, combined with too weak a signal about the business cycle. This comes from the fact that even when a larger number of economic series (by contrast to simulated data) is used, i.e. for larger N, the number of different types/groups of data might not increase. In fact, the latter tends to stabilise quite rapidly, as there is only a limited number of economic data groups/types. Any new series introduced in the database tends to be more correlated with the data of the same type than with the variable of interest. Thus, there is an increased risk that one of the first principal components corresponds more to this type of data than to a latent factor common to all series. Apart from filtering, the other solution is obviously to reduce the number of series per type/group of data, which comes at the cost of increased series selection uncertainty and weaker consistency owing to the smaller number of series.
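The argument above can be illustrated with a stylised simulation. Everything in it is an assumption made for illustration: the two latent series `f` (business cycle) and `g` (a shock specific to one data type), the noise levels, the 0.3 threshold and the sample length, which is deliberately much longer than any realistic quarterly window so that the outcome does not hinge on sampling noise.

```python
import numpy as np

def first_pc(X):
    """First principal component of the standardised panel X (T x N)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * s[0]

def abs_corr(a, b):
    """Absolute sample correlation between two series."""
    return abs(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
T = 300
f = rng.standard_normal(T)                # business-cycle factor
g = rng.standard_normal(T)                # shock specific to one type of data
y = f + 0.2 * rng.standard_normal(T)      # reference variable

cyclical = f[:, None] + 1.0 * rng.standard_normal((T, 20))    # noisy but informative
same_type = g[:, None] + 0.3 * rng.standard_normal((T, 30))   # tightly cross-correlated
X = np.hstack([cyclical, same_type])

# Filter on the absolute correlation with the reference variable.
corr_with_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.abs(corr_with_y) >= 0.3

pc_all = first_pc(X)
pc_filtered = first_pc(X[:, keep])
print("unfiltered: |corr| with f, g =", abs_corr(pc_all, f), abs_corr(pc_all, g))
print("filtered:   |corr| with f, g =", abs_corr(pc_filtered, f), abs_corr(pc_filtered, g))
```

In this setting the large, tightly correlated group dominates the first principal component of the unfiltered panel, whereas the first component of the filtered panel tracks the business-cycle factor.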

4.3. TIME WINDOW WIDTH

Understanding the impact of the time window width is probably the most difficult issue in the current factor model framework. At the first stage of estimation (and not at the end of the convergence process), forecast accuracy is better with longer sample spans. Forecast accuracy does not deteriorate with a shorter six-year sample and the SEM method (although the model seems to converge more slowly with a narrower window), but it deteriorates significantly with a five-year sample. The optimal span of the sample could be connected to an integer number of business cycles, especially with very short samples. With a short sample of 1½ cycles, most variables with a peak frequency at the business cycle will appear non-stationary. Intuitively, it might be more difficult to extract the business cycle signal with more upturns than downturns (or the opposite), possibly because of the asymmetry between the two phases of the cycle.

Tests were also conducted with samples longer than 7 years, but the longer series available for the out-of-sample period are significantly less numerous and of lower quality. Any difference in forecast accuracy could therefore be due more to data availability than to the structure of the factors implied by the sample span. With an 8 or 9-year sample, results are nevertheless close to those obtained with a 7-year window [67].

Moreover, there is a complex link between the optimal number of factors and the sample span. As the time window gets narrower, the total number of factors is reduced (in line with the rank of the data matrix). Hence, the loadings and their respective factors reflect different shocks over the time window period. The narrower the time window, the more likely it is that most factors (apart from the first) correspond to idiosyncratic shocks and that the more common information is summarised in the first factor. However, the discrepancy between a large number of series and a small matrix rank widens.
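The rank constraint mentioned above is easy to check numerically. The sketch below uses a hypothetical panel of 200 simulated series and windows of 5, 7 and 9 years of quarterly data; the helper name `panel_rank` is an assumption. With more series than observations, the demeaned data matrix has rank at most T - 1, which bounds the number of factors that can be extracted.

```python
import numpy as np

def panel_rank(X):
    """Rank of the demeaned (T x N) panel, an upper bound on the number of factors."""
    Z = X - X.mean(axis=0)   # demeaning each series removes one degree of freedom
    return int(np.linalg.matrix_rank(Z))

rng = np.random.default_rng(1)
N = 200                      # number of series (hypothetical)
for years in (5, 7, 9):
    T = 4 * years            # quarterly observations in the window
    X = rng.standard_normal((T, N))
    print(years, T, panel_rank(X))
```

The gap between the number of series and this rank is the discrepancy referred to in the text, and it widens as the window narrows.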

Last but not least, the sample span also has an impact on cross-correlation estimation. Factor models can be assumed to be robust to new shocks (contrary to low-dimension models [68]) insofar as they incorporate all data available to describe all kinds of shocks. But they are not robust to changes in the commonalities of shocks, i.e. the different ways in which shocks affect the various series in the data set. With a given time window, the assumption is that the shock commonalities for the time period are relevant for assessing the situation in the near future. The choice of the period over which shock commonalities are measured necessarily has an impact on forecast performance. A potential enhancement to the model may be found in calibrating the sample span according to the duration of past business cycles.

4.4. DATABASE COMPOSITION

The description of the database composition shows that three sorts of data are markedly overweighted in the database: French data, survey data and industrial data. The prevalence of survey data explains why the model is not very reliable following a major break in confidence due to an unpredictable event (such as Sept. 11th 2001). Abundant industrial data partially explain why the model overshoots an upturn that is mainly located in the industrial sector (as at the end of 2003) and led by France. It is difficult to correct these unappealing features, as industrial data are abundant and display a very good correlation with global activity, i.e. the link between industry and the whole economy is strong most of the time. Survey data also contain useful, albeit noisy, leading information.

Some attempts were made to reduce the weight of these types of data in the data set. These attempts were not successful in the sense that only very slight (and most likely not robust) improvements in the RMSE were recorded. It seems that without removing some of the very well correlated industrial or French series, results would be roughly the same. Conversely, it is unlikely that results could be robustly better without these series. The noise added to the data matrix by survey data does not have an observable adverse effect on the model's forecast performance, provided that the noisiest series are dropped from the data set on the basis of the signal-to-noise ratio test. Based on limited and unfruitful experiments, it seemed preferable to retain the full database. More sophisticated schemes for the database composition could be envisaged, based on judgmental assessment, sample survey techniques or bootstrapping. But considerable computation costs would be incurred given the use of a database of 2000 series times 6 leads or lags.

[67] Same number of factors used, same optimal filtering level, very slight deterioration in the RMSE, almost no change in the hit ratios.

[68] With leading equations, it can be argued that a model will perform well with respect to GDP forecasts only if the nature of the main shocks to the economy occurring in the forecast period is foreseen, and the relevant leading indicators for these shocks are selected accordingly.

Other experiments also showed that the factors and the results obtained depend on the database composition. For example, the addition of intra-EU trade data or the removal of EC survey data deeply changes the factors' features and suggests another optimal level of filtering (but the optimal number of factors is not affected). It means that series selection is a crucial step in the model design and that the economist's job cannot be replaced by a purely statistical procedure. This feature of approximate factor models has not yet been tackled by most of the literature.

4.5. MAXIMUM LAG OF THE SERIES IN THE DATA SET

Tests were also conducted with respect to the number of lags that are used for all series in the data set. The issue is the following: if the factor estimation method is consistent, there can be no harm in introducing series not only coincidentally, but also with one lag, two lags, etc. The relevant signal at x lags can be extracted and the idiosyncratic component excluded from the factors. As a series enters the database with x lags, it is possible to estimate factors up to the observations corresponding to the following x quarters. On the other hand, if factor estimation is biased by the introduction of series with too low a signal, too many lagged series might deteriorate forecast accuracy at close horizons.
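The way lagged series extend the data set beyond the last observed quarter can be sketched as follows. This is a minimal illustration: the helper `stack_lags` and the use of NaN for unobserved cells are assumptions, and the treatment of those missing cells during factor estimation is not reproduced here.

```python
import numpy as np

def stack_lags(X, max_lag):
    """Build a (T + max_lag, N * (max_lag + 1)) panel holding X at lags 0..max_lag.

    In the block for lag x, row t contains X[t - x]; cells without an
    observation are left as NaN.
    """
    T, N = X.shape
    panel = np.full((T + max_lag, N * (max_lag + 1)), np.nan)
    for lag in range(max_lag + 1):
        panel[lag:lag + T, lag * N:(lag + 1) * N] = X
    return panel

# Two series observed over four quarters, entered with up to two lags.
X = np.arange(8.0).reshape(4, 2)
panel = stack_lags(X, max_lag=2)
print(panel.shape)
print(panel[4])   # first quarter after the sample: only the lagged blocks are filled
print(panel[5])   # second quarter after the sample: only the lag-2 block is filled
```

Rows beyond the last observed quarter are populated only by the lagged blocks, which is what makes factor estimates available for the following x quarters. It also shows why each additional lag adds N columns whose signal may be weak.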

In practice, a slight deterioration was noticed when series were introduced in the data set with more than four lags for coincident, Q+1 and Q+2 forecasts and with more than five lags for Q+3 forecasts. This phenomenon is in accordance with common sense: there are very few series that contain relevant information with a lead of more than four or five quarters. Using series with the corresponding lags of four or five quarters adds noise to the data set and very little signal, so that the estimation of factors can only get worse. On this issue, the experiment under real-time availability constraints showed again that a consistent estimation of factors is not guaranteed irrespective of the series that are selected, even with a very large number of series.
