Concluding remarks - Sparse Models in High-Dimensional Dependence Modelling and Index Tracking

The vine copula has been successfully applied in a variety of areas as a flexible tool for dependence modelling. The major technical compromise in applications of vine copulas lies in the so-called simplifying assumption, under which all the bivariate conditional copulas depend on the corresponding conditioning variables only through the copula observations, while the functional forms of these bivariate copulas do not depend on the conditioning variables. In order to relax the simplifying assumption while maintaining a reasonable model complexity, we propose a generalized-linear-model-based framework to capture the effect of the conditioning variables on a bivariate dependency, leading to the vine-GLM copula models. Moreover, we develop a penalized-MLE-based regularization procedure to control the complexity of vine copula models, which leads to the sparse vine copula models. Empirical studies on financial datasets show that our proposed models with GLM and/or sparsity improve significantly on the conventional vine copula model with the simplifying assumption under criteria such as the Bayesian information criterion. In this chapter, eight bivariate copulas are considered as candidates in the estimation. Other bivariate copulas can be analyzed similarly with our proposed models to increase the flexibility of dependence modelling. Furthermore, while our applications of the vine-GLM copula models focus on the linear effect of the conditioning variables, other transformations of the conditioning variables, such as quadratic terms, can easily be included to increase the model flexibility.

Chapter 3 Index Tracking using Principal Component Analysis

3.1 Introduction

Index tracking is a dominant form of passive investment. It constructs a tracking portfolio to reproduce the return of a benchmark stock market index. A stock market index can be tracked by full replication, which buys and holds all stocks that make up the index with the same weights as those in the index. When full replication is infeasible (see more details in Section 1.2.2), many passively managed funds use a subset of stocks to construct a tracking portfolio that mimics the benchmark index return (see evidence in [71]). We refer to the problem of constructing partial replications as the index tracking problem.

In general, the index tracking problem should be addressed in two steps. The first is to identify the stocks to hold in the tracking portfolio. The second is to compute the fund allocation to each selected stock. Focusing on minimizing the in-sample tracking error, [7] formulates the index tracking problem as a mixed-integer quadratic programming problem, and solves the “two steps” simultaneously. This paper inspires numerous studies that explicitly exploit various mathematical optimization tools. For example, [55] compares several tracking errors. [22] introduces another tracking error from the regression point of view, and formulates the index tracking problem as a mixed-integer linear programming problem. [105] suggests solving the index tracking problem using a hybrid programming method: for each given stock subset, stock weights are determined using quadratic programming to minimize the tracking error, and the stock subset which leads to the smallest tracking error is searched for by a genetic algorithm. Most of the above methods focus on minimizing the in-sample tracking error by solving a mixed-integer quadratic programming problem, which is NP-hard (see [105]). Consequently, it is challenging to obtain optimal solutions efficiently, especially when the number of index components is on the order of hundreds.
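For orientation, a generic cardinality-constrained tracking problem of the mixed-integer quadratic type can be written as below. This is only an illustrative form and not necessarily the exact formulation of [7]; the symbols are assumptions introduced here: $r^{I}_{t}$ is the index return in period $t$, $r_{i,t}$ the return of stock $i$, $w_i$ its weight, $z_i$ a binary inclusion indicator, and $K$ the maximum number of stocks held.

```latex
\begin{aligned}
\min_{w,\,z}\quad & \frac{1}{T}\sum_{t=1}^{T}\Bigl(r^{I}_{t}-\sum_{i=1}^{n} w_i\, r_{i,t}\Bigr)^{2} \\
\text{s.t.}\quad & \sum_{i=1}^{n} w_i = 1, \\
& 0 \le w_i \le z_i, \quad i=1,\dots,n, \\
& \sum_{i=1}^{n} z_i \le K, \\
& z_i \in \{0,1\}, \quad i=1,\dots,n.
\end{aligned}
```

The binary variables $z_i$ carry the stock selection and the continuous variables $w_i$ carry the allocation, which is why both steps are solved at once and why the problem becomes hard as $n$ grows.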

While the above-mentioned papers focus on constructing a tracking portfolio that minimizes the in-sample tracking error, other criteria have been advocated for constructing the tracking portfolio. [102] studies the mean-variance performance of a tracking portfolio in the Markowitz framework, but discusses only full replication. In terms of partial replication, [5] studies an index tracking procedure based on the cointegration between the index level and the value of the tracking portfolio, suggesting that the value of the tracking portfolio should be cointegrated with the index level. [51] applies cluster analysis to the index tracking problem. After the stocks are clustered, the authors of [51] suggest selecting one stock subjectively from each cluster. A factor model is used in [26] to address the index tracking problem. The authors of [26] suggest that the tracking portfolio should share the same factor structure as the index. However, most of these methods assume that the stocks in the tracking portfolio are given, or use only naive or ad hoc methods to select them. For example, one ad hoc approach is to select the stocks with the largest market capitalization.

This chapter provides a more quantitative and theoretically supported method to select stocks in tracking portfolios. To do so, the index return is modelled as a linear combination of stock returns plus an independent random noise. A method to identify dominant stocks is proposed based on principal component analysis (PCA). We first decompose the index return as a function of the principal components (PCs) of stock returns. According to Sobol’s total sensitivity index, some essential PCs are retained to approximate the index return, and the approximation error is controlled by Sobol’s total sensitivity index. When stock returns follow a multivariate normal distribution, some analytical properties are established.
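The retention step can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's exact procedure: the systematic part of the index return is taken to be w'R with known index weights w, and the PCs are treated as uncorrelated (independent under multivariate normality), in which case the total sensitivity index of each PC for this linear function reduces to its share of Var(w'R). The function names, the tolerance `tol`, and the rule "keep the fewest PCs whose omitted share is at most `tol`" are hypothetical.

```python
import numpy as np

def pc_sensitivity(returns, index_weights):
    """Share of Var(w'R) attributable to each principal component.

    returns: (T, n) array of stock returns; index_weights: (n,) array.
    PCs are ordered by decreasing eigenvalue of the sample covariance.
    """
    cov = np.cov(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    loadings = eigvecs.T @ index_weights        # coefficient of each PC in w'R
    contrib = loadings ** 2 * eigvals           # variance contributed by each PC
    return contrib / contrib.sum(), eigvecs

def retain_pcs(shares, tol=0.05):
    """Indices of the fewest PCs whose omitted variance share is at most tol."""
    order = np.argsort(shares)[::-1]            # most influential PCs first
    cum = np.cumsum(shares[order])
    k = min(int(np.searchsorted(cum, 1.0 - tol)) + 1, len(shares))
    return np.sort(order[:k])

# Usage on simulated data (assumption: equally weighted index of 6 stocks).
rng = np.random.default_rng(1)
sim_returns = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))
sim_weights = np.full(6, 1.0 / 6.0)
sim_shares, _ = pc_sensitivity(sim_returns, sim_weights)
sim_kept = retain_pcs(sim_shares, tol=0.05)
```

Note that a PC with a large eigenvalue is not necessarily retained: what matters is its contribution to the index return, which also depends on how the index weights load on it.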

In our proposed approach, the selection of dominant stocks to construct tracking portfolios becomes the question of choosing the stocks which explain the retained PCs. If the number of stocks in a tracking portfolio is pre-determined, we suggest selecting the stocks that have the largest “similarity” with the retained PCs. To measure this similarity, [21] suggests Yanai’s generalized coefficient of determination (GCD). In this chapter, we additionally recommend the distance correlation and HHG test statistics.
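As an illustration of the GCD criterion, the sketch below computes Yanai's GCD between the subspace spanned by the retained PC scores and the subspace spanned by a candidate set of stock returns, using GCD = tr(P_A P_B) / sqrt(rank(A) rank(B)) with P_A and P_B the orthogonal projectors. The greedy forward search and the function names are assumptions made for the example; the source does not specify the search strategy, and the distance correlation and HHG alternatives are not shown.

```python
import numpy as np

def yanai_gcd(a, b):
    """Yanai's GCD between the column spaces of a and b (full column rank assumed)."""
    qa, _ = np.linalg.qr(a)                     # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b)                     # orthonormal basis of span(b)
    # tr(P_A P_B) equals the squared Frobenius norm of qa' qb
    return np.sum((qa.T @ qb) ** 2) / np.sqrt(a.shape[1] * b.shape[1])

def greedy_select(returns, pc_idx, n_stocks):
    """Forward-select n_stocks columns maximizing the GCD with the retained PCs."""
    x = returns - returns.mean(axis=0)          # centre the returns
    _, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
    eigvecs = eigvecs[:, ::-1]                  # PCs by decreasing variance
    scores = x @ eigvecs[:, pc_idx]             # scores of the retained PCs
    chosen = []
    for _ in range(n_stocks):
        rest = [j for j in range(x.shape[1]) if j not in chosen]
        best = max(rest, key=lambda j: yanai_gcd(scores, x[:, chosen + [j]]))
        chosen.append(best)
    return chosen
```

The GCD lies between 0 and 1 and equals 1 when the two subspaces coincide, so a value close to 1 indicates that the selected stocks span nearly the same space as the retained PCs. A greedy search is cheap but not guaranteed to find the best subset.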

Given the selected stocks, determining their weights by minimizing a specific tracking error is computationally easy. When the mean square error (of the difference between the index return and the tracking portfolio return) is used as the measure of tracking error, the weights are obtained by quadratic programming. When the conditional value at risk (of the same difference) is used as the measure of tracking error, the weights are obtained by linear programming.
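For the mean-square-error case, a minimal sketch is given below. It assumes that the only constraint is the budget constraint (weights sum to one), which is an assumption of this example and not stated in the text; under that assumption the quadratic program reduces to a linear KKT system. Inequality constraints such as no short selling would require a general QP solver. The function names are hypothetical.

```python
import numpy as np

def tracking_weights(stock_returns, index_returns):
    """Minimize ||y - X w||^2 subject to sum(w) = 1 via the KKT system.

    stock_returns: (T, k) returns of the selected stocks (X).
    index_returns: (T,) index returns (y).
    """
    x = np.asarray(stock_returns, dtype=float)
    y = np.asarray(index_returns, dtype=float)
    k = x.shape[1]
    # Stationarity: 2 X'X w + lam * 1 = 2 X'y ; feasibility: 1'w = 1
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * x.T @ x
    kkt[:k, k] = 1.0
    kkt[k, :k] = 1.0
    rhs = np.append(2.0 * x.T @ y, 1.0)
    return np.linalg.solve(kkt, rhs)[:k]        # drop the Lagrange multiplier

def tracking_mse(stock_returns, index_returns, weights):
    """In-sample mean square tracking error for the given weights."""
    diff = np.asarray(index_returns) - np.asarray(stock_returns) @ weights
    return float(np.mean(diff ** 2))
```

Because the stocks are already fixed at this stage, the problem has only as many unknowns as selected stocks, which is what makes this step cheap compared with the mixed-integer formulations discussed earlier.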

The rest of this chapter is organized as follows. Section 3.2 sets up the mathematical formulation of the index tracking problem. Section 3.3 discusses the methodology to retain the significant PCs. In Section 3.4, stocks in tracking portfolios are determined according to the retained PCs. In Section 3.5, some applications on real financial data are presented to support the tracking accuracy and the computational efficiency of our proposed method.
