(8.3) Different methods

Recently, there has been a resurgence in the use of greedy forward/backward selection procedures, similar to the one described by Edwards (2000) at the start of the methods section. The motivation for this type of method is that the full structure of the model can be learnt with high probability from just O(d log p) samples, where d is the maximum node degree of the graph and p is the number of variables. This is a substantial improvement over the glasso, one of the methods used in this thesis, which requires O(d² log p) samples. The approach proposed by Johnson et al. (2011) combines the forward and backward greedy algorithms. It starts with an empty set of variables, and each forward step adds the best next candidate variable to the active set, but only if it improves the loss function by a significant amount. The next step (the backward step) re-examines the active set: if at least one of the variables added earlier no longer contributes a significant amount to the loss function, the algorithm removes it from the active set.
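The forward-backward idea can be illustrated with a minimal sketch. It applies the scheme to a single node-wise regression (one variable regressed on the others) with squared-error loss. This is a simplification for illustration, not necessarily the exact variant, stopping rules or constants of Johnson et al. (2011); the threshold `eps` for a "significant" forward improvement, the backward fraction `nu` and the function names are hypothetical.

```python
import numpy as np


def mean_sq_loss(X, y, active):
    """Mean squared residual of the least-squares fit of y on the columns in `active`."""
    if not active:
        return float(y @ y) / len(y)
    A = X[:, sorted(active)]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid) / len(y)


def forward_backward_greedy(X, y, eps=1e-2, nu=0.5, max_steps=100):
    """Greedy forward-backward selection of the neighbours of one node.

    eps: minimum loss reduction for a forward step to be accepted (hypothetical).
    nu:  a variable is dropped if removing it raises the loss by less than
         nu times the gain of the most recent forward step (hypothetical).
    """
    p = X.shape[1]
    active = set()
    loss = mean_sq_loss(X, y, active)
    for _ in range(max_steps):
        candidates = [j for j in range(p) if j not in active]
        if not candidates:
            break
        # Forward step: the candidate giving the largest drop in loss.
        gains = {j: loss - mean_sq_loss(X, y, active | {j}) for j in candidates}
        best = max(gains, key=gains.get)
        gain = gains[best]
        if gain < eps:
            break  # no candidate improves the loss significantly
        active.add(best)
        loss = mean_sq_loss(X, y, active)
        # Backward step: drop variables that no longer contribute enough.
        while active:
            costs = {j: mean_sq_loss(X, y, active - {j}) - loss for j in active}
            worst = min(costs, key=costs.get)
            if costs[worst] >= nu * gain:
                break
            active.remove(worst)
            loss = mean_sq_loss(X, y, active)
    return sorted(active)


# Usage: y depends only on columns 0 and 3 of X.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.standard_normal(200)
selected = forward_backward_greedy(X, y)
```

What separates this from pure forward selection is that a variable accepted early can later be removed by a backward step once other variables explain the same part of the loss.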

While this seems an interesting approach to compare with the other methods used in this thesis, questions remain about its computational complexity, and about whether it would be slower than the other approaches in a high-dimensional setting.

(9) Conclusions

Throughout this thesis, the aim has been to convince the reader of the usefulness of graphical modelling in the context of multivariate time series. The advantages of this type of model were demonstrated by using it to represent structural vector autoregressive (SVAR) models of varying orders. Previous research in this area, on which this thesis builds, in the form of the GMTS and SIN approaches, was shown to remain very important when the aim is to discover the structure of the graphical model.

The SVAR model has been shown to provide a useful platform for determining the dependencies not only between the contemporaneous and lagged variables, but also among the contemporaneous variables alone. Being able to use the coefficients of each regression to determine which variables are linearly dependent is an attractive prospect. Other aspects of analysis associated with SVAR models, such as impulse response analysis and forecast error variance decompositions, could be examined in future work.
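As a minimal sketch of reading dependencies off the SVAR regression coefficients, the code below assumes a recursive SVAR(1) with a known causal ordering (variable i may depend contemporaneously only on variables 0, ..., i-1), zero-mean data and unit-variance Gaussian errors. Each equation is fitted by ordinary least squares, and the t-statistics indicate which contemporaneous and lagged coefficients are clearly non-zero. The ordering, the coefficient values and the function name `fit_recursive_svar1` are illustrative assumptions.

```python
import numpy as np


def fit_recursive_svar1(Y):
    """OLS fit of a recursive SVAR(1) under the assumed ordering 0, 1, ..., k-1.

    Equation i regresses y_{i,t} on y_{0..i-1,t} (contemporaneous) and on
    y_{t-1} (lagged). Returns one (coefficients, t_statistics) pair per
    equation: the first i entries are contemporaneous, the last k are lagged.
    """
    _, k = Y.shape
    Yt, Ylag = Y[1:], Y[:-1]
    results = []
    for i in range(k):
        Z = np.hstack([Yt[:, :i], Ylag])  # regressors for equation i
        y = Yt[:, i]
        ZtZ_inv = np.linalg.inv(Z.T @ Z)
        beta = ZtZ_inv @ Z.T @ y
        resid = y - Z @ beta
        sigma2 = resid @ resid / (len(y) - Z.shape[1])  # residual variance
        t_stats = beta / np.sqrt(sigma2 * np.diag(ZtZ_inv))
        results.append((beta, t_stats))
    return results


# Usage: simulate y_t = B0 y_t + A1 y_{t-1} + e_t with B0 strictly lower triangular.
rng = np.random.default_rng(1)
B0 = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]])
A1 = np.array([[0.5, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.4, 0.2]])
M = np.linalg.inv(np.eye(3) - B0)  # maps structural shocks to y_t
T = 5000
Y = np.zeros((T, 3))
for t in range(1, T):
    Y[t] = M @ (A1 @ Y[t - 1] + rng.standard_normal(3))
fits = fit_recursive_svar1(Y)
```

Here, for example, the large t-statistic on the first coefficient of equation 1 flags the contemporaneous dependence of variable 1 on variable 0.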

An issue becomes apparent when comparing these approaches with the ℓ1-regularization methods introduced in this thesis. When the aim of the research is to obtain a sparse structure for the model, the SIN and GMTS approaches merely provide different tests for conditional independencies between variables; they have no mechanism for inducing sparsity in the structure of the graphical model, as the penalties in ℓ1-regularization do. However, the better results obtained by the GMTS and SIN methods on the simulated datasets in this thesis suggest that penalization is not always necessary for finding the Conditional Independence Graph (CIG) of a dataset. It must be noted that the ℓ1-regularization methods require further analysis, because it is still unclear why these approaches gave relatively poor results.
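To make the contrast concrete, the sketch below decides each edge with a hypothesis test rather than a penalty. It uses a generic Fisher z-test of a zero sample partial correlation, assuming i.i.d. Gaussian observations with n > p + 1. This is a stand-in for illustration, not the SIN or GMTS statistics themselves; `alpha` is an assumed significance level and the function names are hypothetical.

```python
import math

import numpy as np


def partial_correlations(X):
    """Partial correlation of each pair of columns given all remaining columns."""
    omega = np.linalg.inv(np.cov(X, rowvar=False))  # sample precision matrix
    scale = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def edges_by_testing(X, alpha=0.001):
    """Retain edge (i, j) only if H0: partial correlation = 0 is rejected at level alpha."""
    n, p = X.shape
    pcor = partial_correlations(X)
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            # Fisher z-transform; conditioning on p - 2 variables gives the
            # statistic a standard error of 1 / sqrt(n - (p - 2) - 3).
            z = math.atanh(pcor[i, j]) * math.sqrt(n - p - 1)
            p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
            if p_value < alpha:
                edges.append((i, j))
    return edges


# Usage: a chain 0 - 1 - 2 plus an isolated variable 3.
rng = np.random.default_rng(0)
n = 500
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
x3 = rng.standard_normal(n)
X = np.column_stack([x0, x1, x2, x3])
edges = edges_by_testing(X)
```

Sparsity here comes only from failing to reject: no estimate is shrunk towards zero, which is the distinction drawn in the paragraph above.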

Even in terms of convergence speed, the simpler testing procedures (GMTS and SIN) were more attractive on these simulated datasets, because they require no optimization. A more formal comparison of convergence speed is still required for each of the ℓ1-regularization methods, since informal observations of running times do not constitute a rigorous result in the context of research in this area.
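One way to make such a comparison more formal is to time every method on the same replicated datasets and summarize the distribution of run times rather than relying on a single observation. The sketch below assumes each method is available as a Python callable taking one dataset; the function name `benchmark` and its summary fields are hypothetical, and wall-clock timing with `time.perf_counter` measures elapsed time, not the number of iterations to convergence.

```python
import statistics
import time


def benchmark(methods, datasets, repeats=5):
    """Time each method on every dataset `repeats` times.

    methods: dict mapping a method name to a callable taking one dataset.
    Returns, per method, the median and quartiles of wall-clock seconds.
    """
    summary = {}
    for name, fit in methods.items():
        times = []
        for data in datasets:
            for _ in range(repeats):
                start = time.perf_counter()
                fit(data)
                times.append(time.perf_counter() - start)
        q1, _, q3 = statistics.quantiles(times, n=4)
        summary[name] = {
            "median": statistics.median(times),
            "q1": q1,
            "q3": q3,
            "runs": len(times),
        }
    return summary


# Usage with two toy stand-ins for real estimation routines.
toy_methods = {
    "sum": lambda data: sum(data),
    "sorted": lambda data: sorted(data),
}
toy_datasets = [list(range(1000)), list(range(2000))]
results = benchmark(toy_methods, toy_datasets, repeats=3)
```

Reporting quartiles alongside the median keeps one slow or fast run from dominating the comparison.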

While the main motivation behind ℓ1-regularization holds (approximately) for each of the four methods examined in this thesis, there are many differences, large and small, between them. These lie in the function that is penalized in each method, in the way the tuning parameter is selected, and in the penalty itself, which in these methods is imposed on either the precision matrix or the partial correlation matrix.
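The link between the two kinds of penalty can be made explicit with a standard identity. Writing Ω = Σ⁻¹ with entries ω_ij for the precision matrix, the partial correlation between variables i and j given all the others is a rescaled off-diagonal entry of Ω. The generic penalty forms below are illustrative and omit method-specific details, such as whether the diagonal is penalized.

```latex
\rho_{ij} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}, \qquad i \neq j,
\qquad \text{so} \qquad \rho_{ij} = 0 \iff \omega_{ij} = 0 \quad (\text{since } \omega_{ii} > 0).

\text{Precision-matrix penalty:} \quad \lambda \sum_{i \neq j} \lvert \omega_{ij} \rvert,
\qquad
\text{partial-correlation penalty:} \quad \lambda \sum_{i < j} \lvert \rho_{ij} \rvert
  = \lambda \sum_{i < j} \frac{\lvert \omega_{ij} \rvert}{\sqrt{\omega_{ii}\,\omega_{jj}}}.
```

Both penalties therefore target the same zero pattern, the edges of the CIG, but the second acts like a precision-matrix penalty with entry-specific weights 1/√(ω_ii ω_jj), which is one way the estimates produced by the methods can differ.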

Section 8 of this thesis considered several areas of future research that lay outside its scope. It would be useful to examine the performance of these six methods in different simulation studies. In particular, because the ℓ1-regularization methods are designed to cope with high-dimensional problems (p ≫ n), a set of simulation studies similar to the one conducted in this thesis could be carried out to determine whether the GMTS and SIN methods would again outperform the ℓ1-regularization methods. If the ℓ1-regularization methods are found to perform better, then a more comprehensive analysis of these approaches would be worthwhile, including some of the methods not used in this research, such as the SCIO of Liu and Luo (2012) and the scaled lasso of Sun and Zhang (2013).
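A high-dimensional design of this kind could start from a known sparse precision matrix. The sketch below assumes i.i.d. Gaussian observations (simpler than the SVAR-based simulations used in this thesis) and a chain-structured tridiagonal precision matrix whose edges form the true CIG. The sizes, the off-diagonal value `rho` and the function name `simulate_chain_ggm` are illustrative choices.

```python
import numpy as np


def simulate_chain_ggm(p, n, rho=0.4, seed=0):
    """Draw n Gaussian observations whose precision matrix is tridiagonal.

    The precision matrix has 1 on the diagonal and -rho next to it; it is
    positive definite whenever |rho| < 0.5. Returns the data, the precision
    matrix and the list of true edges (i, i + 1).
    """
    omega = np.eye(p)
    idx = np.arange(p - 1)
    omega[idx, idx + 1] = -rho
    omega[idx + 1, idx] = -rho
    # If omega = L L^T (Cholesky), then x = L^{-T} z has covariance omega^{-1}.
    L = np.linalg.cholesky(omega)
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((p, n))
    X = np.linalg.solve(L.T, Z).T  # shape (n, p)
    edges = [(i, i + 1) for i in range(p - 1)]
    return X, omega, edges


# Usage: p = 200 variables but only n = 50 observations (p >> n).
X, omega, true_edges = simulate_chain_ggm(p=200, n=50)
```

Because the true edge set is known, each method's estimated graph can be scored against `true_edges` in the same way as in the lower-dimensional studies.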

(10) Bibliography

[1] Banerjee, O., El Ghaoui, L., & d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9, 485-516.

[2] Banerjee, O., El Ghaoui, L., d'Aspremont, A., & Natsoulis, G. (2006, June). Convex optimization techniques for fitting sparse Gaussian graphical models. In Proceedings of the 23rd International Conference on Machine Learning (pp. 89-96). ACM.

[3] Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806.

[4] Bien, J., & Tibshirani, R. J. (2011). Sparse estimation of a covariance matrix. Biometrika, 98(4), 807-820.

[5] Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

[6] Brüggemann, R. (2004). Model reduction methods for vector autoregressive processes (Vol. 536). Springer Science & Business Media.

[7] Bühlmann, P., & Van De Geer, S. (2011). Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media.

[8] Cai, T. T., Liu, W., & Zhou, H. H. (2012). Estimating sparse precision matrix: Optimal rates of convergence and adaptive estimation. arXiv preprint

[9] Cai, T. T., Zhang, C. H., & Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics, 38(4), 2118-2144.

[10] Cai, T., Liu, W., & Luo, X. (2011). A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494), 594-607.

[11] d'Aspremont, A., Banerjee, O., & El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Applications, 30(1), 56-66.

[12] Dempster, A. P. (1972). Covariance selection. Biometrics, 157-175.

[13] Drton, M., & Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statistical Science, 430-449.

[14] Drton, M., & Perlman, M. D. (2008). A SINful approach to Gaussian graphical model selection. Journal of Statistical Planning and Inference, 138(4), 1179-1200.

[15] Edwards, D. (2000). Introduction to graphical modelling. Springer Texts in Statistics.

[16] Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1). Springer, Berlin: Springer series in statistics.

[17] Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.

[18] Friedman, J., Hastie, T., & Tibshirani, R. (2010). Applications of the lasso and grouped lasso to the estimation of sparse graphical models (pp. 1-22). Technical

[19] Gottard, A., & Pacillo, S. (2010). Robust concentration graph model selection. Computational Statistics & Data Analysis, 54(12), 3070-3079.

[20] Hsieh, C. J., Dhillon, I. S., Ravikumar, P. K., & Sustik, M. A. (2011). Sparse inverse covariance matrix estimation using quadratic approximation. In Advances in Neural Information Processing Systems (pp. 2330-2338).

[21] James, G. M., Radchenko, P., & Lv, J. (2009). DASSO: connections between the Dantzig selector and lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(1), 127-142.

[22] Johnson, C. C., Jalali, A., & Ravikumar, P. (2011). High-dimensional sparse inverse covariance estimation using greedy methods. arXiv preprint arXiv:1112.6411.

[23] Li, X., Zhao, T., Yuan, X., & Liu, H. (2012). An R Package flare for High Dimensional Linear Regression and Precision Matrix Estimation. R Package Vignette.

[24] Lin, A. (2008). Edge Deletion Tests in Graphical Modelling for Multivariate Time Series. Honours Project dissertation (unpublished), University of Canterbury.

[25] Liu, H., & Wang, L. (2012). Tiger: A tuning-insensitive approach for optimally estimating Gaussian graphical models. arXiv preprint arXiv:1209.2437.

[26] Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40(4), 2293-2326.

[27] Liu, H., Lafferty, J., & Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal

[28] Liu, W., & Luo, X. (2012). High-dimensional sparse precision matrix estimation via sparse column inverse operator. arXiv preprint arXiv:1203.3896.

[29] Lütkepohl, H. (2005). New introduction to multiple time series analysis. Springer Science & Business Media, New York.

[30] Mazumder, R., & Hastie, T. (2012). The graphical lasso: New insights and alternatives. Electronic Journal of Statistics, 6, 2125-2149.

[31] Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 1436-1462.

[32] Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486).

[33] Pfaff, B. (2008). VAR, SVAR and SVEC models: Implementation within R package vars. Journal of Statistical Software, 27(4), 1-32.

[34] Sims, C. A. (1980). Macroeconomics and reality. Econometrica: Journal of the Econometric Society, 1-48.

[35] Sun, T., & Zhang, C. H. (2013). Sparse matrix inversion with scaled lasso. The Journal of Machine Learning Research, 14(1), 3385-3418.

[36] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267-288.

[37] Tsay, R. S. (2013). Multivariate Time Series Analysis: With R and Financial Applications. John Wiley & Sons.

[38] Verzelen, N., & Villers, F. (2009). Tests for Gaussian graphical

[39] Vichik, S., & Oshman, Y. (2011, June). Optimal covariance selection for estimation using graphical models. In American Control Conference (ACC), 2011 (pp. 5049-5054). IEEE.

[40] Wilson, G. T. (2010). Atmospheric CO2 and global temperatures: the strength and nature of their dependence. Working paper.

[41] Wilson, G. T., Reale, M., & Morton, A. S. (2001). Developments in multivariate time series modeling. Department of Mathematics and Statistics, University of Canterbury.

[42] Wilson, G. T., & Reale, M. (2008). The sampling properties of conditional independence graphs for I(1) structural VAR models. Journal of Time Series Analysis, 29(5), 802-810.

[43] Xue, L., & Zou, H. (2012). Regularized rank-based estimation of high-dimensional nonparanormal graphical models. The Annals of Statistics, 40(5), 2541-2571.

[44] Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. The Journal of Machine Learning Research, 11, 2261-2286.

[45] Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian

Appendix A

In document Edge deletion tests and ℓ1-regularization methods in graphical modelling for multivariate time series. (Page 83-92)