Recently, there has been a resurgence in the use of greedy forward/backward selection procedures, similar to the one described by Edwards (2000) at the start of the methods section. The motivation for this type of method is that the full structure of the model can be learnt with high probability from just O(d log(p)) samples. Compared with the glasso, one of the methods used in this thesis, this is a vast improvement, since the glasso requires O(d² log(p)) samples. The approach proposed by Johnson et al. (2011) combines forward and backward greedy steps. The algorithm starts with an empty active set of variables. The first (forward) step finds the best next candidate variable and adds it to the active set only if it improves the loss function by a significant amount. The next step (the backward step) then checks the active set: if at least one of the variables added before does not contribute a significant amount to the loss function, then the algorithm removes them from the active set.
While this seems like an interesting approach to compare with the other methods used in this thesis, questions remain about the computational complexity of the problem, and about whether this approach would be slower than the other approaches in a high-dimensional setting.
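The forward and backward steps described above can be sketched in a few lines of code. The following is only a minimal illustration under simplifying assumptions: it uses a least-squares loss for a single response variable (as in a neighbourhood-selection setting) rather than the Gaussian log-likelihood loss over the precision matrix, and the function names, the forward threshold eps and the backward ratio nu are hypothetical choices made for this example, not taken from Johnson et al. (2011).

```python
import numpy as np


def ls_loss(X, y, active):
    """Mean squared residual of y regressed on the columns of X in `active`."""
    if not active:
        return float(np.mean(y ** 2))
    cols = sorted(active)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ beta
    return float(np.mean(resid ** 2))


def forward_backward_greedy(X, y, eps=1e-2, nu=0.5, max_steps=50):
    """Illustrative forward-backward greedy selection (assumed thresholds).

    eps: minimum loss decrease required to add a variable (forward step).
    nu:  a variable is removed if dropping it raises the loss by at most
         nu times the most recent forward gain (backward step).
    """
    active = set()
    loss = ls_loss(X, y, active)
    p = X.shape[1]
    for _ in range(max_steps):
        # Forward step: try every inactive variable and keep the best one.
        candidates = [j for j in range(p) if j not in active]
        if not candidates:
            break
        trial = {j: ls_loss(X, y, active | {j}) for j in candidates}
        best = min(trial, key=trial.get)
        gain = loss - trial[best]
        if gain < eps:
            break  # no candidate improves the loss by a significant amount
        active.add(best)
        loss = trial[best]

        # Backward step: drop variables that no longer contribute much.
        while len(active) > 1:
            cost = {j: ls_loss(X, y, active - {j}) - loss for j in active}
            worst = min(cost, key=cost.get)
            if cost[worst] > nu * gain:
                break
            active.remove(worst)
            loss += cost[worst]
    return sorted(active)
```

Calling forward_backward_greedy(X, y) on an n-by-p design matrix X and a response vector y returns the indices of the selected columns. Each forward step refits one regression per inactive variable, which gives a rough sense of why the computational cost in a high-dimensional setting remains an open question.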
(9) Conclusions
Throughout this thesis the motivation has been to convince the reader of the usefulness of graphical modelling in the context of Multivariate Time Series. The advantages of this type of model were demonstrated by using it to represent structural vector autoregressive (SVAR) models of varying orders. Previous research in this area, in the form of the GMTS and SIN approaches, was developed further in this thesis, and these approaches were shown to remain very important when the aim is to discover the structure of the graphical model.
The SVAR model has been shown to provide a useful platform for determining the dependencies not only between the contemporaneous and lagged variables, but also among the contemporaneous variables alone. Simply being able to use the coefficients of each regression to determine which variables are linearly dependent is an attractive prospect. There are also other aspects of analysis associated with SVAR models, such as impulse response analysis and forecast error variance decompositions, which could be examined further in other analyses.
An issue becomes apparent when comparing these approaches with the ℓ1-regularization methods introduced in this thesis. When the aim of the research is to provide a sparse structure for the model, the SIN and GMTS methods merely provide different tests for conditional independencies between variables; unlike the penalization in the ℓ1-regularization methods, they have no mechanism for inducing sparsity in the structure of the graphical model. In this thesis, however, it was found that penalization methods are not always necessary for finding the Conditional Independence Graph (CIG) of a dataset, as shown by the superior results from the GMTS and SIN methods. It must be noted, however, that the ℓ1-regularization methods require further analysis, because it is still unclear why these approaches gave relatively poor results.
Even in terms of the convergence rates, the simpler testing procedures (GMTS and SIN) were a more attractive prospect on these simulated datasets, because they require no optimization. A more formal comparison of the speeds of convergence is required for each of the ℓ1-regularization methods, since basic observations of processing speed do not provide rigorous results in the context of research in this area.
It has been found that, while the main motivation behind ℓ1-regularization holds (approximately) for each of the four methods examined in this thesis, there are many differences, great or small, between them. These can be seen in the function that is penalized in each method, in the manner in which the tuning parameter is selected, and in the penalty itself, which can be imposed on either the precision matrix or the partial correlation matrix.
In Section 8 of this thesis, many areas of future research were considered that fall outside the motivations of this thesis. It would be useful to examine the performance of these six methods using different simulation studies. In particular, because the ℓ1-regularization methods are designed to cope with high-dimensional problems (p ≫ n), a set of simulation studies similar to the one conducted in this thesis could be carried out to determine whether the GMTS and SIN methods would again outperform the ℓ1-regularization methods. If it is found that the ℓ1-regularization methods do perform better, then it would perhaps be useful to consider a more comprehensive analysis of these approaches, including some of the methods not used in this research, such as the SCIO by Liu and Luo (2012) and the scaled lasso by Sun and Zhang (2013).
(10) Bibliography
[1] Banerjee, O., El Ghaoui, L., & d'Aspremont, A. (2008). Model selection through
sparse maximum likelihood estimation for multivariate gaussian or binary data. The Journal of Machine Learning Research, 9, 485-516.
[2] Banerjee, O., El Ghaoui, L., d'Aspremont, A., & Natsoulis, G. (2006, June).
Convex optimization techniques for fitting sparse Gaussian graphical models. In Proceedings of the 23rd international conference on Machine learning (pp. 89-96). ACM.
[3] Belloni, A., Chernozhukov, V., & Wang, L. (2011). Square-root lasso: pivotal
recovery of sparse signals via conic programming. Biometrika, 98(4), 791-806.
[4] Bien, J., & Tibshirani, R. J. (2011). Sparse estimation of a covariance matrix.
Biometrika, 98(4), 807-820.
[5] Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge
university press.
[6] Brüggemann, R. (2004). Model reduction methods for vector autoregressive
processes (Vol. 536). Springer Science & Business Media.
[7] Bühlmann, P., & Van De Geer, S. (2011). Statistics for high-dimensional data:
methods, theory and applications. Springer Science & Business Media.
[8] Cai, T. T., Liu, W., & Zhou, H. H. (2012). Estimating sparse precision matrix:
Optimal rates of convergence and adaptive estimation. arXiv preprint
[9] Cai, T. T., Zhang, C. H., & Zhou, H. H. (2010). Optimal rates of convergence
for covariance matrix estimation. The Annals of Statistics, 38(4), 2118-2144.
[10] Cai, T., Liu, W., & Luo, X. (2011). A constrained ℓ1 minimization approach to
sparse precision matrix estimation. Journal of the American Statistical
Association, 106(494), 594-607.
[11] d'Aspremont, A., Banerjee, O., & El Ghaoui, L. (2008). First-order methods for
sparse covariance selection. SIAM Journal on Matrix Analysis and
Applications, 30(1), 56-66.
[12] Dempster, A. P. (1972). Covariance selection. Biometrics, 157-175.
[13] Drton, M., & Perlman, M. D. (2007). Multiple testing and error control in
Gaussian graphical model selection. Statistical Science, 430-449.
[14] Drton, M., & Perlman, M. D. (2008). A SINful approach to Gaussian graphical
model selection. Journal of Statistical Planning and Inference, 138(4), 1179-1200.
[15] Edwards, D. (2000). Introduction to graphical modelling. Springer Texts in
Statistics.
[16] Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical
learning (Vol. 1). Springer, Berlin: Springer series in statistics.
[17] Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance
estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
[18] Friedman, J., Hastie, T., & Tibshirani, R. (2010). Applications of the lasso and
grouped lasso to the estimation of sparse graphical models (pp. 1-22). Technical
[19] Gottard, A., & Pacillo, S. (2010). Robust concentration graph model
selection. Computational Statistics & Data Analysis, 54(12), 3070-3079.
[20] Hsieh, C. J., Dhillon, I. S., Ravikumar, P. K., & Sustik, M. A. (2011). Sparse
inverse covariance matrix estimation using quadratic approximation. In
Advances in Neural Information Processing Systems (pp. 2330-2338).
[21] James, G. M., Radchenko, P., & Lv, J. (2009). DASSO: connections between
the Dantzig selector and lasso. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 71(1), 127-142.
[22] Johnson, C. C., Jalali, A., & Ravikumar, P. (2011). High-dimensional sparse
inverse covariance estimation using greedy methods. arXiv preprint
arXiv:1112.6411.
[23] Li, X., Zhao, T., Yuan, X., & Liu, H. (2012). An R Package flare for High
Dimensional Linear Regression and Precision Matrix Estimation. R Package
Vignette.
[24] Lin, A. (2008). Edge Deletion Tests in Graphical Modelling for Multivariate
Time Series. Honours Project dissertation (unpublished), University of Canterbury.
[25] Liu, H., & Wang, L. (2012). Tiger: A tuning-insensitive approach for optimally
estimating gaussian graphical models. arXiv preprint arXiv:1209.2437.
[26] Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012). High-
dimensional semiparametric Gaussian copula graphical models. The Annals of
Statistics, 40(4), 2293-2326.
[27] Liu, H., Lafferty, J., & Wasserman, L. (2009). The nonparanormal:
Semiparametric estimation of high dimensional undirected graphs. The Journal
[28] Liu, W., & Luo, X. (2012). High-dimensional sparse precision matrix estimation
via sparse column inverse operator. arXiv preprint arXiv:1203.3896.
[29] Lütkepohl, H. (2005). New introduction to multiple time series analysis.
Springer Science & Business Media, New York.
[30] Mazumder, R., & Hastie, T. (2012). The graphical lasso: New insights and
alternatives. Electronic journal of statistics, 6, 2125-2149.
[31] Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and
variable selection with the lasso. The Annals of Statistics, 1436-1462.
[32] Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by
joint sparse regression models. Journal of the American Statistical
Association, 104(486).
[33] Pfaff, B. (2008). VAR, SVAR and SVEC models: Implementation within R
package vars. Journal of Statistical Software, 27(4), 1-32.
[34] Sims, C. A. (1980). Macroeconomics and reality. Econometrica: Journal of the
Econometric Society, 1-48.
[35] Sun, T., & Zhang, C. H. (2013). Sparse matrix inversion with scaled lasso. The
Journal of Machine Learning Research, 14(1), 3385-3418.
[36] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal
of the Royal Statistical Society. Series B (Methodological), 267-288.
[37] Tsay, R. S. (2013). Multivariate Time Series Analysis: With R and Financial
Applications. John Wiley & Sons.
[38] Verzelen, N., & Villers, F. (2009). Tests for Gaussian graphical
[39] Vichik, S., & Oshman, Y. (2011, June). Optimal covariance selection for
estimation using graphical models. In American Control Conference (ACC),
2011 (pp. 5049-5054). IEEE.
[40] Wilson, G. T. (2010). Atmospheric CO2 and global temperatures: the strength
and nature of their dependence. Working paper.
[41] Wilson, G. T., Reale, M., & Morton, A. S. (2001). Developments in multivariate
time series modeling. Department of Mathematics and Statistics, University of
Canterbury.
[42] Wilson, G. T., & Reale, M. (2008). The sampling properties of conditional
independence graphs for I (1) structural VAR models. Journal of Time Series
Analysis, 29(5), 802-810.
[43] Xue, L., & Zou, H. (2012). Regularized rank-based estimation of high-
dimensional nonparanormal graphical models. The Annals of Statistics, 40(5), 2541-2571.
[44] Yuan, M. (2010). High dimensional inverse covariance matrix estimation via
linear programming. The Journal of Machine Learning Research, 11, 2261-2286.
[45] Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian