Models of asset fluctuations
11.4 Portfolio analysis using minimum spanning trees
Mantegna and Stanley (2000) have proposed a simple and convenient way to visualize correlations between a large number of stocks via a minimum spanning tree (MST) or an indexed hierarchical tree, i.e. by using concepts developed in graph theory.
In order to compute these objects it is first necessary to introduce a quantity which can be thought of as a distance. To this end we introduce a $T$-dimensional vector $\tilde{r}_i$ whose components are the normalized log-returns $\tilde{r}_{i,n} = \tilde{r}_i(t_n)$ with $1 \le n \le T$, see eqn (11.27).
A distance $d_{ij}$ between vectors $\tilde{r}_i$ and $\tilde{r}_j$ (thus corresponding to stocks $i$ and $j$) may then be defined using the Euclidean norm as follows,
$$d_{ij} = \|\tilde{r}_i - \tilde{r}_j\| = \left[ \sum_{n=1}^{T} \left( \tilde{r}_{i,n} - \tilde{r}_{j,n} \right)^2 \right]^{1/2}. \tag{11.30}$$

Figure 11.5 A plot of the correlation coefficient $\rho_{\text{Lloyds-BP}}$ for Lloyds and BP for the data shown in Figure 11.4. The coefficient is computed as a moving average with a time window of 100 days (daily data, obtained from yahoo.com). Periods of high correlations may be linked to specific external events. (Horizontal axis: year, 2003 to 2012; vertical axis: correlation coefficient, −0.1 to 0.8.)
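The distance may be expressed in terms of the correlation coefficients $\rho_{ij}$ (see eqn 11.28),
$$d_{ij}^2 = (\tilde{r}_i - \tilde{r}_j) \cdot (\tilde{r}_i - \tilde{r}_j) = \|\tilde{r}_i\|^2 + \|\tilde{r}_j\|^2 - 2\,\tilde{r}_i \cdot \tilde{r}_j = 2(1 - \rho_{ij}), \tag{11.31}$$
where we have made use of the normalization $\|\tilde{r}_i\|^2 = \|\tilde{r}_j\|^2 = 1$ (see eqn 11.27) and of the fact that, for normalized returns, $\tilde{r}_i \cdot \tilde{r}_j = \rho_{ij}$.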
Since $-1 \le \rho_{ij} \le 1$, it follows that $0 \le d_{ij} \le 2$. Since $d_{ij} = 0 \Leftrightarrow i = j$, $d_{ij} = d_{ji}$, and the so-called triangle inequality $d_{ij} \le d_{ik} + d_{kj}$ also holds, $d_{ij}$ satisfies all the properties associated with a distance.
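The identity of eqn (11.31) is easily checked numerically. The following is a minimal Python sketch, assuming that the normalization of eqn (11.27), which is not reproduced in this section, amounts to subtracting the mean of each return series and rescaling it to unit Euclidean length. The function names and the two short return series are made up for illustration only.

```python
import math

def normalize(returns):
    """Centre a return series and rescale it to unit Euclidean length.

    Assumption: this is what the normalization of eqn (11.27) amounts to.
    """
    mean = sum(returns) / len(returns)
    centred = [r - mean for r in returns]
    norm = math.sqrt(sum(c * c for c in centred))
    return [c / norm for c in centred]

def distance(a, b):
    """Euclidean distance between two normalized vectors, eqn (11.30)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation(a, b):
    """For zero-mean, unit-length vectors the correlation is the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Made-up log-returns for two hypothetical stocks (illustration only).
r_i = normalize([0.010, -0.020, 0.015, 0.005, -0.010])
r_j = normalize([0.008, -0.015, 0.020, -0.002, -0.005])

rho = correlation(r_i, r_j)
d = distance(r_i, r_j)

# Difference between the two sides of eqn (11.31); zero up to rounding error.
identity_gap = abs(d * d - 2.0 * (1.0 - rho))
```

Because the check only uses the unit length of the two vectors, it holds for any pair of return series, not just for the two invented here.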
The matrix of distances between pairs of stocks i and j may now be used to create a network of nodes with each node representing a time series of a stock. Stocks that are highly correlated then correspond to nodes that are close together on the network.
A minimum spanning tree (MST) is a network that uses only $N-1$ of the $N(N-1)/2$ possible links between $N$ different stocks. The $N-1$ links are chosen so that the total length of the network is minimized, under the conditions that all nodes are connected and that there are no loops. (The absence of loops turns the network into a tree in the language of graph theory.)
Various algorithms are available to compute an MST (which is unique only if all distances $d_{ij}$ differ). Prim’s algorithm is given as follows.
• Determine the element with the minimum distance in the matrix and connect the corresponding nodes i and j.
• Determine the minimum distance amongst the remaining matrix entries.
• Connect the corresponding nodes, provided that exactly one of them is already connected to other nodes; otherwise pass over this entry and consider the next smallest distance.
• Repeat this procedure until all nodes are connected.
In this way the minimum spanning tree may be fully constructed.

Figure 11.6 The log-price of five FTSE 100 stocks traded on the London Stock Exchange, together with the value of the FTSE itself, over the period 2006–12 (weekly data from http://www.yahoo.com). (BP: BP plc, formerly British Petroleum; RDSA: Royal Dutch Shell plc; LLOY: Lloyds Banking Group plc; BARC: Barclays plc; ULVR: Unilever.) (Horizontal axis: year, 2006 to 2012; vertical axis: log-price, 3 to 9; curves: FTSE, BP, RDSA, LLOY, BARC, ULVR.)
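The steps of the procedure listed above may be sketched in Python as follows. This is an illustrative implementation rather than a definitive one: the function name `minimum_spanning_tree` and the four-node distance matrix are our own inventions, and no attempt is made at efficiency (the sorted list of links is simply rescanned from the start each time a node has been added).

```python
def minimum_spanning_tree(dist):
    """Build an MST from a symmetric distance matrix (a list of lists).

    Start from the globally shortest link, then repeatedly add the shortest
    link that has exactly one node already connected to the tree.
    Returns a list of links (i, j, distance).
    """
    n = len(dist)
    if n < 2:
        return []
    # All candidate links with i < j, sorted by increasing distance.
    links = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    d0, i0, j0 = links[0]
    connected = {i0, j0}
    tree = [(i0, j0, d0)]
    while len(connected) < n:
        for d, i, j in links:
            # Accept a link only if exactly one of its nodes is connected;
            # links with both or neither node connected are passed over.
            if (i in connected) != (j in connected):
                connected.update((i, j))
                tree.append((i, j, d))
                break
    return tree

# Hypothetical distances between four nodes (illustration only).
toy = [
    [0.0, 1.0, 4.0, 3.0],
    [1.0, 0.0, 2.0, 5.0],
    [4.0, 2.0, 0.0, 1.5],
    [3.0, 5.0, 1.5, 0.0],
]
tree = minimum_spanning_tree(toy)
total_length = sum(d for _, _, d in tree)
```

In this toy example the second shortest link (between nodes 2 and 3, length 1.5) is passed over at first, since neither node is yet connected, and is only added once node 2 has joined the tree.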
Let us illustrate this with a simple, specific example: a portfolio of five stocks from the London Stock Exchange, together with the FTSE index itself, see Figure 11.6. Weekly data over the period January 2006 to January 2012 have been used to compute first the correlation matrix $\rho_{ij}$ (see Table 11.2) and then the distance matrix $d_{ij}$ (see Table 11.3).
Leaving the FTSE index aside for the moment, the minimum distance is between the pair LLOY and BARC (0.68), so these will be connected. The next minimum distance is between BP and RDSA (0.75), but neither of these is already connected, so the next pair, BP and BARC (1.08), needs to be considered, and will be connected. This is now followed by the links BP and RDSA (0.75) and RDSA and ULVR (1.25), which completes the procedure, resulting in the minimum spanning tree shown in Figure 11.7(a). In Figure 11.7(b) we show the MST for all the data of Table 11.3; the FTSE index is now in the centre.
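The worked example may be cross-checked with a different MST algorithm. The sketch below uses Kruskal's algorithm (not the procedure described in the text): links are added in order of increasing length, and any link that would close a loop is skipped. The distances are those of Table 11.3; the function and variable names are illustrative. Note that the rounded entries for FTSE-BP and BP-RDSA are both 0.75, so with the tabulated values the tree for all six nodes is not unique; for that case we therefore look only at the total length and at the number of links attached to each node.

```python
# Upper triangle of the distance matrix of Table 11.3.
names = ["FTSE", "BP", "RDSA", "LLOY", "BARC", "ULVR"]
table_11_3 = {
    ("FTSE", "BP"): 0.75, ("FTSE", "RDSA"): 0.62, ("FTSE", "LLOY"): 0.92,
    ("FTSE", "BARC"): 0.87, ("FTSE", "ULVR"): 1.19,
    ("BP", "RDSA"): 0.75, ("BP", "LLOY"): 1.17, ("BP", "BARC"): 1.08,
    ("BP", "ULVR"): 1.28,
    ("RDSA", "LLOY"): 1.15, ("RDSA", "BARC"): 1.11, ("RDSA", "ULVR"): 1.25,
    ("LLOY", "BARC"): 0.68, ("LLOY", "ULVR"): 1.33,
    ("BARC", "ULVR"): 1.30,
}

def kruskal(nodes, dist):
    """Kruskal's algorithm restricted to the given nodes."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the representative of v's cluster.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    candidates = sorted((d, a, b) for (a, b), d in dist.items()
                        if a in parent and b in parent)
    for d, a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:      # different clusters, so no loop is created
            parent[root_a] = root_b
            tree.append((a, b, d))
    return tree

# Five stocks only, as in the worked example.
tree_stocks = kruskal(names[1:], table_11_3)
links_stocks = {frozenset((a, b)) for a, b, _ in tree_stocks}
length_stocks = sum(d for _, _, d in tree_stocks)

# All six nodes, including the FTSE index.
tree_all = kruskal(names, table_11_3)
length_all = sum(d for _, _, d in tree_all)
degree = {v: 0 for v in names}
for a, b, _ in tree_all:
    degree[a] += 1
    degree[b] += 1
```

For the five stocks the distances are all different, so the tree is unique and the links found here (LLOY-BARC, BP-RDSA, BP-BARC and RDSA-ULVR) are the same as those obtained step by step above. For all six nodes the FTSE index carries the largest number of links whichever way the tie is resolved.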
An alternative representation of the correlation between stocks is in the form of an indexed hierarchical tree which may be constructed from the minimum spanning tree, as described in the book by Mantegna and Stanley (2000).
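The construction itself is not reproduced here. As a hedged illustration of one common route, and not necessarily in the form given in the book, the hierarchy may be read off from the so-called subdominant ultrametric distance: for each pair of nodes, the largest single link met on the path that joins them along the minimum spanning tree. The Python sketch below computes this quantity for the five-stock tree of the worked example, using the distances of Table 11.3; the function name is our own.

```python
def subdominant_ultrametric(nodes, tree):
    """Largest single link on the tree path between every pair of nodes."""
    neighbours = {v: [] for v in nodes}
    for a, b, d in tree:
        neighbours[a].append((b, d))
        neighbours[b].append((a, d))
    ultra = {}
    for start in nodes:
        # Walk the tree from `start`, carrying the largest link seen so far.
        stack = [(start, 0.0)]
        seen = {start}
        while stack:
            v, biggest = stack.pop()
            ultra[(start, v)] = biggest
            for w, d in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, max(biggest, d)))
    return ultra

stocks = ["BP", "RDSA", "LLOY", "BARC", "ULVR"]
# Links of the five-stock tree of the worked example (Table 11.3 distances).
mst = [("LLOY", "BARC", 0.68), ("BP", "BARC", 1.08),
       ("BP", "RDSA", 0.75), ("RDSA", "ULVR", 1.25)]
ultra = subdominant_ultrametric(stocks, mst)
```

Read in order of increasing distance, the result groups LLOY with BARC at 0.68 and BP with RDSA at 0.75; the two pairs merge at 1.08, and ULVR joins last at 1.25.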
Minimum spanning trees have been used to explore the correlations and clustering in stock portfolios in various markets across the globe. Stocks traded on a market are often classified according to economic sectors, such as Resources, or Basic Industries, with various subdivisions in these sectors. Coelho et al. (2007a) have used a minimum spanning tree analysis to examine the clustering of stocks traded on the FTSE 100. By doing so, they could show that some stocks would be better reclassified into a different sector. A new classification scheme, adopted by FTSE at the beginning of 2006, offered an improvement in the description of the correlations seen in the data. However, still not all correlations were captured appropriately, which could have affected the design of optimum portfolios at the time.

Table 11.2 Computed correlation matrix for the five FTSE 100 stocks and the value of the FTSE from Figure 11.6. (Note that this is a symmetric matrix, with entries of 1.00 (full correlation) as diagonal elements; only the upper triangle is listed.)

        FTSE    BP      RDSA    LLOY    BARC    ULVR
FTSE    1.00    0.72    0.81    0.58    0.62    0.29
BP              1.00    0.72    0.32    0.42    0.18
RDSA                    1.00    0.34    0.38    0.22
LLOY                            1.00    0.77    0.11
BARC                                    1.00    0.15
ULVR                                            1.00

Table 11.3 Computed distance matrix, obtained from the correlation matrix of Table 11.2. (Note that this is a symmetric matrix, with entries of 0.00 (distance zero) as diagonal elements; only the upper triangle is listed.)

        FTSE    BP      RDSA    LLOY    BARC    ULVR
FTSE    0.00    0.75    0.62    0.92    0.87    1.19
BP              0.00    0.75    1.17    1.08    1.28
RDSA                    0.00    1.15    1.11    1.25
LLOY                            0.00    0.68    1.33
BARC                                    0.00    1.30
ULVR                                            0.00
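The entries of Table 11.3 follow from those of Table 11.2 through eqn (11.31), i.e. $d_{ij} = \sqrt{2(1 - \rho_{ij})}$. A short Python check, with illustrative variable names, recomputes the distances from the tabulated correlations and rounds them to two decimal places.

```python
import math

labels = ["FTSE", "BP", "RDSA", "LLOY", "BARC", "ULVR"]

# Upper triangle of Table 11.2, row by row, with the diagonal omitted.
rho_rows = [
    [0.72, 0.81, 0.58, 0.62, 0.29],   # FTSE with BP, RDSA, LLOY, BARC, ULVR
    [0.72, 0.32, 0.42, 0.18],         # BP with RDSA, LLOY, BARC, ULVR
    [0.34, 0.38, 0.22],               # RDSA with LLOY, BARC, ULVR
    [0.77, 0.11],                     # LLOY with BARC, ULVR
    [0.15],                           # BARC with ULVR
]

# Distances from eqn (11.31), rounded to the two decimals used in Table 11.3.
dist_rows = [[round(math.sqrt(2.0 * (1.0 - rho)), 2) for rho in row]
             for row in rho_rows]

# The same numbers keyed by pairs of labels, for convenient look-up.
distances = {}
for i, row in enumerate(dist_rows):
    for k, d in enumerate(row):
        distances[(labels[i], labels[i + 1 + k])] = d
```

For example, the largest correlation among the stocks, 0.77 for LLOY and BARC, gives $d^2 = 2(1 - 0.77) = 0.46$ and hence $d = 0.68$, the shortest distance in the table.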
Coelho et al. (2007b) also used an MST analysis to study the process of market integration for a large group of national stock market indices, as shown in Figure 11.8. Using a moving time window, they showed how the asset tree evolves over time and described the dynamics of its normalized length, mean occupation layer, and other parameters. Over the period studied, 1997–2006, the minimum spanning tree shows a tendency to become more compact, implying that global equity markets are increasingly interrelated. The consequence for global investors is a potential reduction of the benefits of international portfolio diversification.
The use of minimum spanning trees has been criticized for involving only a subset of the complete set of elements of the correlation matrix. One could imagine using a different subset, corresponding, for example, to the maximal spanning tree. Would this have any relevance? It is not so straightforward to address this question. For this reason we turn now to a different method that exploits the complete set of elements of the correlation matrix in a more systematic manner.