Application - Sparse estimation of high-dimensional time series models.

Table 2.4: Mean Absolute Forecast Error (MAFE) for the four considered estimators (rows) and three samples (columns). The MAFE averaged over the three samples is given in the last column.

| Estimator      | Sample 1 | Sample 2 | Sample 3 | Average |
|----------------|----------|----------|----------|---------|
| GroupLasso+Cov | 0.81     | 0.80     | 0.80     | 0.80    |
| GroupLasso     | 1.23     | 1.38     | 1.72     | 1.44    |
| Lasso+Cov      | 0.83     | 0.81     | 0.97     | 0.87    |
| Lasso          | 1.51     | 1.85     | 2.37     | 1.91    |

[Figure 2.3 image: network graph titled "Directed effects". Node labels: SID1, S1C22A4, CDKN1B, RABEP1, TOM1, PAX5, SPIN1, AA739413, AI153246, XIRP1, H13, P0, GTF2A, TOR1B, CRP1, OLFM1, X63240, SAA2, IGH, SOCS3, ACTB, TNFAIP6, LCN2, IGIV1, HSD17B, FMOD, PGM, PTPRB, IGHB, CYP1B1.]

Figure 2.3: Directed effects: a directed edge is drawn from one gene to another if the GroupLasso+Cov estimator indicates, by giving a non-zero regression estimate, that the former influences the latter.

The GroupLasso+Cov attains the best forecast performance, closely followed by the Lasso+Cov. Accounting for the correlation structure of the error terms yields an important gain in prediction accuracy: the MAFE of the GroupLasso+Cov is, on average, roughly 45% lower than the MAFE of the GroupLasso. Furthermore, Table 2.4 shows that the GroupLasso estimators perform better than the corresponding Lasso estimators.
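The comparison above can be checked directly from the entries of Table 2.4. The following is a minimal sketch: the helper names `average` and `relative_reduction` are illustrative, and only the rounded table values are used. With these rounded inputs the reduction of GroupLasso+Cov relative to GroupLasso comes out at roughly 44%, close to the reported figure, which was presumably computed from unrounded values (an assumption).

```python
# MAFE values copied from Table 2.4 (Sample 1, Sample 2, Sample 3).
mafe = {
    "GroupLasso+Cov": [0.81, 0.80, 0.80],
    "GroupLasso": [1.23, 1.38, 1.72],
    "Lasso+Cov": [0.83, 0.81, 0.97],
    "Lasso": [1.51, 1.85, 2.37],
}


def average(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def relative_reduction(better, worse):
    """Fraction by which the average MAFE of `better` lies below that of `worse`."""
    return 1.0 - average(mafe[better]) / average(mafe[worse])


# Last column of Table 2.4, recomputed from the three samples.
averages = {name: round(average(values), 2) for name, values in mafe.items()}

# Gain from modelling the error covariance, for each penalty type.
gain_grouplasso = relative_reduction("GroupLasso+Cov", "GroupLasso")
gain_lasso = relative_reduction("Lasso+Cov", "Lasso")

print(averages)
print(f"GroupLasso+Cov vs GroupLasso: {gain_grouplasso:.1%}")
print(f"Lasso+Cov vs Lasso: {gain_lasso:.1%}")
```

The same helper can compare the two penalty types with each other, for example `relative_reduction("GroupLasso", "Lasso")`.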

Figure 2.4: Contemporaneous interactions: an undirected edge is drawn between two genes if the GroupLasso+Cov estimator indicates, by giving a non-zero estimate in Ω, that the innovations are partially correlated. Contemporaneous interactions are observed for only a subset of 13 genes, as indicated by the rectangle.

We study the interaction between the genes that trigger transitions to the mammary gland’s main development stages. Figure 2.3 represents the ‘directed, lagged effects’ [Abegaz and Wit, 2013] inferred from $\widehat{B}$. We discuss the results obtained from the first sample. Results for the other two samples are similar and available from the authors upon request. The nodes in the network are the different genes. A directed edge from gene A to gene B is drawn if the GroupLasso+Cov indicates, by giving a non-zero estimate, that gene A has a lagged effect on gene B. The solution is very sparse: 850 out of the possible $900 = 30^2$ effects are estimated as zero. Some genes, such as GTF2A and TOR1B, neither influence any other genes nor are influenced by other genes. Other genes, such as HSD17B and SAA2, are important hubs in the gene regulatory network. Previous research (Abegaz and Wit, 2013, and references therein) found these genes to play a central role in the mammary gland’s development stages.
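How a directed network of this kind can be read off a sparse coefficient matrix is sketched below. This is an illustration under stated assumptions, not the thesis code: a single q × q matrix of lagged effects is assumed, with entry `coef[i][j]` taken to be the effect of gene j on gene i (the row/column convention is an assumption), and the coefficients in the toy example are made up. The helper names `directed_edges` and `isolated_nodes` are hypothetical.

```python
def directed_edges(coef, names, tol=0.0):
    """Return (source, target) pairs for every non-zero lagged effect.

    Assumed convention: coef[i][j] is the lagged effect of gene j on gene i.
    """
    edges = []
    for i, row in enumerate(coef):
        for j, value in enumerate(row):
            if abs(value) > tol:
                edges.append((names[j], names[i]))
    return edges


def isolated_nodes(edges, names):
    """Genes that neither send nor receive any edge."""
    touched = {gene for edge in edges for gene in edge}
    return [gene for gene in names if gene not in touched]


# Toy 4-gene example with made-up coefficients (NOT the thesis estimates).
names = ["HSD17B", "SAA2", "GTF2A", "TOR1B"]
coef = [
    [0.0, 0.2, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
edges = directed_edges(coef, names)
isolated = isolated_nodes(edges, names)

# Sparsity arithmetic from the text: q = 30 genes, 850 effects estimated as zero.
q = 30
possible = q * q
nonzero = possible - 850

print(edges)
print(isolated)
print(possible, nonzero)
```

In the toy example the two genes with all-zero rows and columns end up isolated, mirroring the role described for GTF2A and TOR1B in the text.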

Figure 2.4 represents the ‘contemporaneous interactions’ [Abegaz and Wit, 2013] inferred from $\widehat{\Omega}$. Again, the genes are the different nodes in the network. The elements of $\widehat{\Omega}$ have a natural interpretation as partial correlations between the innovations (or error components) of the q equations in the VAR model. An edge is drawn between gene A and gene B if the corresponding element in the inverse error covariance matrix is estimated as non-zero. This means that the innovations of genes A and B are contemporaneously partially correlated: conditional on all other innovations, a shock in the innovation of gene A will lead to an instantaneous shock in the innovation of gene B, and vice versa. As can be seen from Figure 2.4, contemporaneous interactions are observed only between a subset of 13 gene innovations, indicated by the rectangle. An important advantage of the sparse estimator is that the main interactions in the large gene regulatory network are highlighted. Out of the possible 435 interactions, only 32 are estimated as non-zero. As such, researchers can concentrate on these results to deepen our knowledge of the interactions at play in the development stages of the mammary gland.
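The link between the inverse error covariance matrix and partial correlations can be made concrete with the standard identity $\rho_{ij} = -\omega_{ij} / \sqrt{\omega_{ii}\,\omega_{jj}}$ for $i \neq j$. The sketch below applies it to a small made-up precision matrix (not the thesis estimate) and also reproduces the count of 435 possible undirected edges among 30 genes. The function names are hypothetical.

```python
import math


def partial_correlations(omega):
    """Partial correlation matrix implied by a precision matrix omega.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j.
    """
    q = len(omega)
    pc = [[1.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            if i != j:
                pc[i][j] = -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
    return pc


def undirected_edges(omega, tol=0.0):
    """Index pairs (i, j), i < j, whose off-diagonal precision entry is non-zero."""
    q = len(omega)
    return [(i, j) for i in range(q) for j in range(i + 1, q)
            if abs(omega[i][j]) > tol]


# Toy 3x3 precision matrix with made-up values (NOT the thesis estimate).
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]
pc = partial_correlations(omega)
edges = undirected_edges(omega)

# Number of possible undirected edges among q = 30 genes: q * (q - 1) / 2.
possible_edges = 30 * 29 // 2

print(pc[0][1])
print(edges)
print(possible_edges)
```

A zero off-diagonal entry in the precision matrix gives a zero partial correlation, which is why sparsity in the estimate translates directly into missing edges in the network.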

Chapter 3

The predictive power of the business and bank sentiment of firms: A high-dimensional Granger Causality approach

Abstract

We study the predictive power of industry-specific economic sentiment indicators for future macro-economic developments. In addition to the sentiment of firms towards their own business situation, we study their sentiment with respect to the banking sector – their main credit providers. The use of industry-specific sentiment indicators results in a high-dimensional forecasting problem. To identify the most predictive industries, we present a bootstrap Granger Causality test based on the Adaptive Lasso. This test is more powerful than the standard Wald test in such high-dimensional settings. Forecast accuracy is improved by using only the most predictive industries rather than all industries.

3.1 Introduction

Sentiment indicators are often considered to be among the most important leading indicators of the real economy [Dreger and Kholodilin, 2013] and are therefore closely followed by business cycle analysts, central banks and business owners (Vuchelen, 2004; Claveria et al., 2007; Martinsen et al., 2014). However, studies on the predictive power of sentiment indicators find mixed results. While many studies find that sentiment indicators have predictive power for future economic developments (Kumar et al., 1995; Hansson et al., 2005; Lemmens et al., 2005; Abberger, 2007; Klein and Oezmucur, 2010; Christiansen et al., 2014), others conclude that sentiment indicators provide only limited information for predicting economic variables (Cotsomitis and Kwan, 2006; Claveria et al., 2007; Dreger and Kholodilin, 2013; Bruno, 2014).

An important commonality between these studies is their use of aggregate sentiment indicators. This paper, instead, examines the predictive power of disaggregate sentiment indicators. Especially in the context of business sentiment – as is the topic of this paper – some segments have more predictive power than others. Here, we segment firms according to their industry. Our methodology takes into account that different industry segments might contain predictive power for different macro-economic indicators.

To study the predictive power, we use a Granger Causality approach. A set of time series Granger Causes another time series if the former has incremental predictive power for the latter, beyond what the latter's own past already provides. Granger Causality tests in low-dimensional time series settings have a long history. They are used, among others, in macro-economics to study the predictive power of monetary aggregates for output and price variables [Sahoo and Acharya, 2010], in operational research to study the predictive power of academic literature for practitioner literature [Ghosh et al., 2010], or in finance to study the predictive power of volume for stock prices [Blasco et al., 2005]. Because predictive analysis based on disaggregate sentiment indicators requires handling a large number of such indicators, we introduce a Granger Causality test for high-dimensional time series data.
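To illustrate the notion of incremental predictive power, the sketch below runs a classical low-dimensional Granger Causality check on simulated data. It is not the bootstrap Adaptive Lasso test introduced in this chapter: it is the textbook F-type comparison of a restricted regression (own lag only) with an unrestricted one (own lag plus the lag of the candidate series), using one lag and ordinary least squares. All function names, the simulated coefficients and the seed are assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def rss(rows, target):
    """Residual sum of squares of an OLS fit of target on the regressor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(x, y):
    """F statistic for 'x Granger causes y' using one lag (illustrative helper)."""
    target = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - 3  # observations minus parameters in the unrestricted model
    return (rss_r - rss_u) / (rss_u / dof)


# Simulated data in which x leads y by one period, but y does not lead x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

f_x_to_y = granger_f(x, y)
f_y_to_x = granger_f(y, x)
print(f_x_to_y, f_y_to_x)
```

A large F statistic for the direction from x to y indicates that the lag of x reduces the forecast error of y beyond what the lag of y achieves on its own. With many candidate series relative to the sample size, the unrestricted regression becomes unstable or infeasible, which is the setting that motivates a high-dimensional test.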

Recently, testing procedures for high-dimensional cross-section data have gained attention, for instance Wasserman and Roeder [2009], Meinshausen et al. [2009] and Chatterjee and Lahiri [2011]. We extend the residual bootstrap procedure of Chatterjee and Lahiri [2011] to high-dimensional time series data. The bootstrap test statistic, based on the Adaptive Lasso [Zou, 2006], identifies those industry segments whose predictive power is statistically significant. Our simulation study shows that this test statistic is more powerful than the standard Wald test statistic in a high-dimensional setting. Furthermore, important gains in forecast accuracy are obtained by not using all industry segments but by first selecting the most predictive ones using the bootstrap test.
