
Models of asset fluctuations

11.5 Portfolio analysis using random matrix theory

With real physical systems, it is usually possible to relate correlations to underlying interactions between basic sub-units such as atoms or molecules. However, this is

[Figure 11.7: two tree diagrams, panels (a) and (b). Nodes: LLOY, BARC, BP, RDSA, ULVR, plus FTSE in panel (b). Edges are labelled with distances.]

Figure 11.7 Minimal spanning trees constructed from the distance matrix of Table 11.3 for FTSE 100 data over the period 2006–12. (a) MST for the BP, RDSA, LLOY, BARC, and ULVR stocks. (b) MST that also includes the data for the correlation of these stocks with the FTSE index itself.

Figure 11.8 Average minimum spanning tree constructed from data for markets of 53 countries over the period 1997–2006. Coding is: Europe, grey circles; North America, white diamonds; South America, grey squares; Asian-Pacific area, black triangles; and ‘other’ (Israel, Jordan, Turkey, South Africa), white squares. (Figure reprinted from Coelho et al. (2007b), with permission from Elsevier.)

not possible for stocks or financial assets, where the underlying interactions are not known. Further difficulties arise when analysing the significance and meaning of cross-correlations determined empirically. Firstly, market conditions change with time and correlations may not be stationary. Secondly, the finite length of the time series used to estimate the correlations can introduce ‘measurement noise’. It is nevertheless pertinent to ask: can we estimate, from a correlation matrix, which stocks remain correlated on average over a particular time period? One answer lies in comparing the statistics of the empirical correlation matrix with those of a random correlation matrix and assessing the deviations. To develop this approach we first need to understand a little about random matrices.

11.5.1 Elements of random matrix theory

The methods behind random matrix theory (RMT) have a rich history and were originally developed more than 50 years ago by Wigner, Dyson, and others to explain the statistics associated with energy levels in complex nuclei. They proposed that the Hamiltonian H describing a complex heavy nucleus was made up of independent random elements $H_{ij}$, drawn from a probability distribution. For a physical system, the RMT predictions are thus an average over all the possible interactions. Deviations from these predictions then provide insights into the system-specific, non-random properties of the empirical system being considered.

Of particular interest is the distribution of eigenvalues of the matrix under consideration. It can be shown that for a real symmetric matrix with independent and identically distributed elements with finite variance $\sigma^2$, the density $\rho(\lambda)$ of the eigenvalues $\lambda$ is given by

$$\rho(\lambda) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - \lambda^2} \quad \text{if } 4\sigma^2 \ge \lambda^2, \tag{11.32}$$

else ρ is zero (see Appendix 11.8). This is the ‘semicircle’ law first derived by Wigner.
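The semicircle law can be checked numerically. The sketch below is only an illustration under stated assumptions: the elements are Gaussian, and the matrix is scaled so that the off-diagonal elements have variance σ²/N, which is the normalisation under which the spectrum settles onto the interval from −2σ to 2σ. The function names are hypothetical.

```python
import numpy as np

def wigner_eigenvalues(n, sigma=1.0, seed=0):
    """Eigenvalues of an n x n real symmetric random matrix.

    Assumption: off-diagonal elements have variance sigma**2 / n, so that
    the spectrum approaches a semicircle on [-2*sigma, 2*sigma].
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sigma, size=(n, n))
    h = (a + a.T) / np.sqrt(2.0 * n)  # symmetrise and rescale
    return np.linalg.eigvalsh(h)

def semicircle_density(lam, sigma=1.0):
    """Eqn (11.32): sqrt(4 sigma^2 - lam^2) / (2 pi sigma^2), zero outside the support."""
    lam = np.asarray(lam, dtype=float)
    inside = np.clip(4.0 * sigma**2 - lam**2, 0.0, None)
    return np.sqrt(inside) / (2.0 * np.pi * sigma**2)

eigs = wigner_eigenvalues(1000)
hist, edges = np.histogram(eigs, bins=40, range=(-2.0, 2.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
max_abs_dev = np.max(np.abs(hist - semicircle_density(centres)))

print(f"largest |eigenvalue|: {np.max(np.abs(eigs)):.3f}")
print(f"largest histogram deviation from eqn (11.32): {max_abs_dev:.3f}")
```

For a matrix of this size the histogram should follow the semicircle closely, with the extreme eigenvalues lying near ±2σ.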

In our case, we see from the previous section that we have an N × T matrix G composed of N time series, each with T elements. The matrix is generally not square, but we can construct a square N × N correlation matrix by computing $C = (1/T)\,G G^{\mathrm{T}}$.
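As a minimal sketch of this construction, the code below builds G from an N × T array by shifting each series to zero mean and scaling it to unit variance, then forms the N × N matrix C. It assumes normalisation by the series length T, so that the diagonal of C equals one. The data are random numbers used purely for illustration, and `correlation_matrix` is a hypothetical helper name.

```python
import numpy as np

def correlation_matrix(series):
    """Return (G, C) for an N x T array of time series.

    Each row is shifted to zero mean and scaled to unit variance to give G;
    C = G G^T / T is then the N x N correlation matrix.
    """
    x = np.asarray(series, dtype=float)
    t = x.shape[1]
    g = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    return g, g @ g.T / t

rng = np.random.default_rng(1)
returns = rng.normal(size=(5, 250))  # illustrative data: N = 5 series, T = 250 points
g, c = correlation_matrix(returns)

print(c.shape)
print(np.allclose(c, np.corrcoef(returns)))  # agrees with NumPy's own estimator
```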

Provided G is square, we can use Wigner’s semicircle law to obtain, from the properties of G, the corresponding properties of the matrix C. The eigenvalues of C are obtained from those of G by squaring them: $\lambda_C = \lambda_G^2$. Furthermore, the density of eigenvalues of C is related to that of G via $\rho(\lambda_C)\,d\lambda_C = 2\rho(\lambda_G)\,d\lambda_G$, where the factor 2 takes into account that there are two solutions, $\lambda_G = \pm\sqrt{\lambda_C}$. This yields

$$\rho(\lambda_C) = \frac{1}{2\pi\sigma^2}\sqrt{\frac{4\sigma^2 - \lambda_C}{\lambda_C}}. \tag{11.33}$$

This density is valid for $4\sigma^2 > \lambda_C$; else it is zero.
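The change of variables can be written out step by step. The lines below are a worked version of the argument just given, using only the semicircle law (11.32) and the relation between the eigenvalues of C and of G. The final line is a normalisation check via the substitution $\lambda_C = 4\sigma^2\sin^2\theta$, which is a choice made here for convenience.

```latex
\begin{align*}
\rho(\lambda_C)
  &= 2\,\rho(\lambda_G)\,\frac{d\lambda_G}{d\lambda_C},
  \qquad \lambda_G = \sqrt{\lambda_C},
  \quad \frac{d\lambda_G}{d\lambda_C} = \frac{1}{2\sqrt{\lambda_C}} \\
  &= 2 \cdot \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - \lambda_C}
     \cdot \frac{1}{2\sqrt{\lambda_C}} \\
  &= \frac{1}{2\pi\sigma^2}\sqrt{\frac{4\sigma^2 - \lambda_C}{\lambda_C}}, \\[6pt]
\int_0^{4\sigma^2}\rho(\lambda_C)\,d\lambda_C
  &= \frac{4}{\pi}\int_0^{\pi/2}\cos^2\theta\,d\theta = 1 .
\end{align*}
```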

For the case when G is not square, i.e. N ≠ T, it can be shown that in the limit T → ∞, keeping the ratio Q = T/N fixed, the spectrum of eigenvalues $\lambda_C$ of the square matrix C is bounded and distributed according to the density

$$\rho(\lambda_C) = \frac{Q}{2\pi\sigma^2}\,\frac{\sqrt{(\lambda_{\max} - \lambda_C)(\lambda_C - \lambda_{\min})}}{\lambda_C}, \tag{11.34}$$

where $\lambda_{\min} < \lambda_C < \lambda_{\max}$ and

[Figure 11.9: plot of the density of eigenvalues (vertical axis) against eigenvalues (horizontal axis), with curves for Q = 1, Q = 4, and Q = 8.]

Figure 11.9 Density of eigenvalues of the square random matrix C, as defined in the text (eqn (11.34)) for different values of Q. For Q > 1, the function has a cut-off for both small and large values of λ. For Q = 1 it diverges as λ→0.

$$\lambda_{\min}^{\max} = \sigma^2\left(1 \pm 1/\sqrt{Q}\right)^2. \tag{11.35}$$
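Equations (11.34) and (11.35) are straightforward to evaluate numerically. The following sketch assumes Q ≥ 1 and uses hypothetical function names. It computes the band edges, evaluates the density on a grid, and checks by a simple trapezoidal sum that the density integrates to approximately one.

```python
import numpy as np

def mp_bounds(q, sigma2=1.0):
    """Eqn (11.35): (lambda_min, lambda_max) for Q = T/N >= 1."""
    lam_min = sigma2 * (1.0 - 1.0 / np.sqrt(q)) ** 2
    lam_max = sigma2 * (1.0 + 1.0 / np.sqrt(q)) ** 2
    return lam_min, lam_max

def mp_density(lam, q, sigma2=1.0):
    """Eqn (11.34), returned as a 1-D array; zero outside (lambda_min, lambda_max)."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lam_min, lam_max = mp_bounds(q, sigma2)
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    rho[inside] = q / (2.0 * np.pi * sigma2) * np.sqrt((lam_max - x) * (x - lam_min)) / x
    return rho

areas = {}
for q in (4.0, 8.0):
    lam_min, lam_max = mp_bounds(q)
    grid = np.linspace(lam_min, lam_max, 20001)
    rho = mp_density(grid, q)
    # trapezoidal rule written out explicitly
    areas[q] = float(np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(grid)))
    print(f"Q={q:g}: lambda_min={lam_min:.4f}, lambda_max={lam_max:.4f}, area={areas[q]:.4f}")
```

For Q = 1 the lower edge is at zero and the expression reduces to eqn (11.33), which diverges as λ → 0, so a grid-based integral would need more care in that case.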

The main characteristic of this density function is that the spectrum has a lower bound: there are no eigenvalues below λ = λmin, and λmin tends to zero as Q → 1. The density also vanishes for values of λ > λmax. For finite values of N, these sharp edges become blurred and there is a small probability of finding eigenvalues in the ‘forbidden’ range.
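The behaviour at finite N can be explored with a small simulation. The sketch below assumes independent standard normal entries (so σ² = 1 and there are no true correlations) and a correlation matrix normalised by T. It reports the smallest and largest eigenvalues of C and the fraction lying outside the band of eqn (11.35). The function names and the chosen sizes are illustrative only.

```python
import numpy as np

def random_correlation_eigenvalues(n, t, seed=0):
    """Eigenvalues (ascending) of C = G G^T / T for an n x t matrix G of
    independent standard normal entries."""
    rng = np.random.default_rng(seed)
    g = rng.normal(size=(n, t))
    return np.linalg.eigvalsh(g @ g.T / t)

def edge_report(n, q, seed=0):
    """Return (smallest eigenvalue, largest eigenvalue, fraction outside the band)."""
    eigs = random_correlation_eigenvalues(n, int(round(q * n)), seed)
    lam_min = (1.0 - 1.0 / np.sqrt(q)) ** 2
    lam_max = (1.0 + 1.0 / np.sqrt(q)) ** 2
    outside = np.mean((eigs < lam_min) | (eigs > lam_max))
    return float(eigs[0]), float(eigs[-1]), float(outside)

for n in (50, 200, 800):
    lo, hi, frac = edge_report(n, q=4.0)
    print(f"N={n}: min={lo:.3f}, max={hi:.3f}, fraction outside={frac:.4f}")
```

With Q = 4 the theoretical edges are 0.25 and 2.25, and one would expect the extreme eigenvalues to settle closer to these values as N grows.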

Figure 11.9 illustrates eqn (11.34) for three different values of Q. This density arises purely from the noise associated with the random matrix and, as implied in the introduction, the interesting step is to compare it with the eigenvalue density obtained from analysing real asset price correlation data.

11.5.2 Eigenvalue analysis of stock data

A typical result for an eigenvalue analysis of stock data is shown in Figure 11.10. While the majority of eigenvalues lie within the range predicted by random matrix theory, i.e. λmin < λ < λmax, there is a discrete spectrum of larger eigenvalues well outside this range. This is indicative of correlations in the data, which can be explored by looking at the components of the corresponding eigenvectors.
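The band edges quoted in the caption of Figure 11.10 (λmin = 0.2994, λmax = 2.1107) can be inverted to recover the parameters behind them. From eqn (11.35), √λmax + √λmin = 2σ and √λmax − √λmin = 2σ/√Q. The sketch below applies these two relations; the helper name is hypothetical, and the implied series length T = QN is an inference from the quoted numbers, not a value stated in the text.

```python
import math

def infer_sigma2_and_q(lam_min, lam_max):
    """Invert eqn (11.35): return (sigma^2, Q) from the two band edges."""
    root_sum = math.sqrt(lam_max) + math.sqrt(lam_min)   # equals 2 * sigma
    root_diff = math.sqrt(lam_max) - math.sqrt(lam_min)  # equals 2 * sigma / sqrt(Q)
    sigma2 = (root_sum / 2.0) ** 2
    q = (root_sum / root_diff) ** 2
    return sigma2, q

sigma2, q = infer_sigma2_and_q(0.2994, 2.1107)
print(f"sigma^2 = {sigma2:.4f}, Q = {q:.3f}, implied T = Q * N = {q * 641:.0f}")
```

The result gives σ² very close to one, as expected for normalised series, and Q a little below 5, which for N = 641 corresponds to a little over 3,000 observations per series.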

It is found that the eigenvector components associated with the largest eigenvalue are distributed more or less uniformly across all the stocks that make up the correlation matrix. The smaller of the outlying eigenvalues are associated with smaller clusters of companies or industrial sectors.
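A synthetic example can illustrate how a common influence shows up in the spectrum. The sketch below uses invented data, not market data: every series shares one common ‘market’ factor with the same hypothetical loading `beta`, plus independent noise. It then counts the eigenvalues above λmax from eqn (11.35) and inspects the eigenvector belonging to the largest one.

```python
import numpy as np

def one_factor_returns(n, t, beta=0.3, seed=0):
    """Synthetic data: n series of length t sharing one common factor."""
    rng = np.random.default_rng(seed)
    market = rng.normal(size=t)        # common factor, shape (t,)
    noise = rng.normal(size=(n, t))    # idiosyncratic part
    return beta * market + np.sqrt(1.0 - beta**2) * noise

n, t = 100, 500
x = one_factor_returns(n, t)
g = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
c = g @ g.T / t

eigvals, eigvecs = np.linalg.eigh(c)           # eigenvalues in ascending order
lam_max = (1.0 + 1.0 / np.sqrt(t / n)) ** 2    # eqn (11.35) with sigma^2 = 1
outliers = eigvals[eigvals > lam_max]

top = eigvecs[:, -1]
top = top * np.sign(top.sum())                 # fix the arbitrary overall sign

print(f"lambda_max (RMT) = {lam_max:.3f}")
print(f"eigenvalues above lambda_max: {len(outliers)}, largest = {eigvals[-1]:.2f}")
print(f"top eigenvector components: mean = {top.mean():.3f}, std = {top.std():.3f}")
```

In this toy setting the largest eigenvalue sits well above the random-matrix band and its eigenvector has components of the same sign and similar size across all series, which is the pattern described above for the largest eigenvalue.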

Figure 11.11 shows the eigenvector components corresponding to the second largest eigenvalue for a portfolio of 641 stocks traded on the stock exchanges of Paris, London, and New York (the corresponding eigenvalue spectrum is shown in Figure 11.10). The data are grouped into nine industrial sectors, and one can see a clear segregation between (negative) European and (positive) US components. Extending this analysis

[Figure 11.10: histogram with horizontal axis ‘Values of eigenvalues’ and vertical axis ‘Distribution’, with an inset showing the small-eigenvalue region.]

Figure 11.10 Distribution of eigenvalues of the correlation matrix for a portfolio of 641 stocks traded at three different markets: Paris, London, and NYSE (data: daily closing prices in US dollars from 30 December 1994 to 1 January 2007). The vertical lines in the inset indicate the region predicted by random matrix theory, λmin = 0.2994 and λmax = 2.1107 (eqn (11.35)). Arrows point to eigenvalues that are outside this region. (Figure reprinted from Coelho et al. (2008a), with permission from Advances in Complex Systems, World Scientific.)

[Figure 11.11: eigenvector components (axis ticks from −0.04 to 0.08), grouped by market (PAR, LON, NYSE) and by sector (a to i).]

Figure 11.11 The elements of the eigenvector corresponding to the second largest eigenvalue of Figure 11.10 show a separation of the European data (negative) from the US data (positive). The elements are grouped by market and into nine industrial sectors: (a) basic materials, (b) consumer goods, (c) consumer services, (d) financials, (e) health care, (f) industrials, (g) oil and gas, (h) technology, (i) utilities. (Figure reprinted from Coelho et al. (2008a), with permission from Advances in Complex Systems, World Scientific.)

to the components of the third eigenvector groups together the New York and London data for all sectors (Coelho et al. 2008a).

This methodology would then appear to be a powerful way of establishing portfolios of assets with specific correlation characteristics, based on the data alone. No other knowledge of a qualitative nature, not even the nature of the company, is necessary. More detail of the method can be found in the paper by Plerou et al. (2002).

The method has been extended to cover other random matrices arising from Lévy processes. Details of these developments, and much more, can be found in the book edited by Burda et al. (2005).