
5. BIOGEOGRAPHICAL ZONES OF THE IRMINGER BASIN:

5.3. EMPIRICAL ORTHOGONAL FUNCTION ANALYSIS:

5.3.2. INTERPRETING EOFS:

Imagine that every row in the matrix F is plotted as a position vector in p-dimensional space. If the observations are completely random, the resulting plot will look like a large, shapeless mass of data points. If, however, there are any patterns or regularities in the data, these will appear as clusters of points in particular regions (or directions) of the plot. The aim of EOF analysis is to define a new coordinate system in which the axes are rotated so that they pass through the centre of a particular cluster (see Figure 5.8 for an illustration of a case where p = 2). In this way a pattern (or mode of variability) in the data is picked out, and the resulting EOF describes the spatial distribution of that mode. The first EOF is the axis of the new coordinate system onto which the projection of the original measurements has the greatest variance; each subsequent EOF is orthogonal to those before it and explains the greatest possible share of the remaining variance. An analysis often reveals that the first few EOF modes alone explain a large proportion of the variance. This is exactly what is hoped for: the EOF analysis has reduced a large and complex dataset to a few modes of variability.
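The procedure above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming F is a t × p matrix with one row per observation and one column per grid point; it obtains the EOFs from a singular value decomposition of the mean-removed data, which is one common route to the eigenvectors of the covariance matrix and not necessarily the computation used in this study. The function name `eof_analysis` and the synthetic data are hypothetical.

```python
import numpy as np

def eof_analysis(F):
    """Hypothetical helper: EOF patterns, their time series and the fraction
    of variance explained by each mode, for a t x p matrix F (rows = observations)."""
    F = F - F.mean(axis=0)                    # remove the time mean at each grid point
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    eofs = Vt.T                               # p x k: column j is the j-th EOF pattern
    pcs = U * s                               # t x k: column j is the time series of mode j
    var_frac = s**2 / np.sum(s**2)            # fraction of total variance per mode
    return eofs, pcs, var_frac

# Synthetic example (assumed): one oscillating spatial pattern plus weak noise
rng = np.random.default_rng(0)
t, p = 120, 50
time = np.arange(t)
pattern = np.sin(np.linspace(0, np.pi, p))    # hypothetical spatial mode
signal = np.outer(np.sin(2 * np.pi * time / 12), pattern)
F = signal + 0.1 * rng.standard_normal((t, p))

eofs, pcs, var_frac = eof_analysis(F)
print(f"Mode 1 explains {var_frac[0]:.1%} of the variance")
```

Because the synthetic field contains a single dominant signal, almost all of its variance collapses onto the first mode, which is the data reduction described in the text.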

The difficulty then lies in interpreting the physical basis of the EOF patterns. The first issue is that there is not necessarily any link between the patterns and a ‘real-world’ physical mechanism. EOFs are purely statistical entities, and there is no a priori reason why they should reflect dynamical processes. Indeed, a single physical process can be spread over several modes, or, alternatively, more than one process may contribute to a single EOF mode.

The most important factor in successfully interpreting EOF modes is a thorough understanding of, and familiarity with, the data to be analysed. An EOF analysis should not be embarked upon until the data has been investigated using simpler methods, such as identification of anomalies, time series at selected grid points and correlation analyses. In addition, knowledge of the physical processes that might be observed is vital; in oceanography these may be, for example, upwelling events, El Niño or topographically forced processes.

Secondly, the maps of the modes of variability need to be presented in a way that aids interpretation. The EOF patterns themselves are dimensionless, and interpreting them in terms of useful physical quantities is not always easy. Instead, a homogeneous correlation map can be created: the correlation between the time series of the EOF mode and the time series of the original data at each point. This map highlights the ‘centres of action’ of the mode; in addition, the square of the correlation at each point gives the fraction of the local variance explained by that mode (Houghton and Tourre, 1992).
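A homogeneous correlation map can be sketched as follows, assuming the same t × p layout of F and a single EOF time series `pc`; the function name and the three-point test field are hypothetical. Grid points with zero variance would need to be masked beforehand, since the correlation is undefined there.

```python
import numpy as np

def homogeneous_correlation_map(F, pc):
    """Hypothetical helper: Pearson correlation between one EOF time series `pc`
    (length t) and the data time series at every grid point of F (t x p)."""
    Fa = F - F.mean(axis=0)                    # anomalies at each grid point
    pca = pc - pc.mean()
    cov = pca @ Fa                             # length-p sums of cross-products
    denom = np.sqrt(np.sum(pca**2) * np.sum(Fa**2, axis=0))
    return cov / denom                         # undefined where a column has zero variance

# Assumed example: one point in phase with the mode, one out of phase, one pure noise
rng = np.random.default_rng(1)
pc = np.sin(2 * np.pi * np.arange(60) / 12)
F = np.column_stack([
    2.0 * pc + 0.2 * rng.standard_normal(60),
    -1.0 * pc + 0.2 * rng.standard_normal(60),
    rng.standard_normal(60),
])
r = homogeneous_correlation_map(F, pc)
local_var_explained = r**2                     # fraction of local variance due to the mode
print(np.round(r, 2), np.round(local_var_explained, 2))
```

The first two points show up as centres of action of opposite sign, while the noise point has a correlation, and hence a local explained variance, close to zero.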

Finally, physical interpretation of the EOF patterns is greatly helped by studying the time series of the EOF modes. Generally an inter-annual signal (or possibly an inter-decadal one, if a longer time series is available) will be easily recognisable in the time series, although a running-mean filter is often applied first to remove some of the noise. The time series will usually oscillate between positive and negative values. When the time series is positive, the data show positive anomalies in the areas of positive correlation on the EOF spatial maps and negative anomalies in the areas of negative correlation; when it is negative, this pattern is reversed.
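The smoothing and the sign convention described above can be illustrated with a simple moving average, assuming monthly data; the 12-month window, the synthetic time series and the three-point `pattern` are illustrative assumptions, not values from this study. The product of the time series and a spatial pattern gives that mode's contribution to the anomalies, which is why the sign of the time series determines whether the positive-correlation regions are anomalously high or low.

```python
import numpy as np

def running_mean(x, window):
    """Moving average over `window` consecutive values; with mode='valid'
    the output has len(x) - window + 1 values (the ends are dropped)."""
    if window < 1 or window > len(x):
        raise ValueError("window must be between 1 and len(x)")
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Hypothetical monthly EOF time series: a slow oscillation plus month-to-month noise
rng = np.random.default_rng(2)
months = np.arange(240)
pc = np.sin(2 * np.pi * months / 60) + 0.5 * rng.standard_normal(months.size)
smooth = running_mean(pc, 12)               # 12-month window (illustrative choice)

# Sign convention: this mode's contribution to the data is pc(t) * pattern(x)
pattern = np.array([0.8, 0.1, -0.6])        # hypothetical 3-point spatial pattern
anomaly = np.outer(smooth, pattern)         # t x p contribution of the mode
# Where smooth > 0 the first point is anomalously high and the third low;
# where smooth < 0 the signs are reversed.
```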

One difficulty in interpreting EOFs often arises from the fact that the modes are constrained to be orthogonal. In climate studies there is usually no reason to expect that the data were generated by orthogonal modes of variability. Richman (1986) suggests that rotating the EOFs may yield more insight into the physical processes behind them. The general concept is to replace the EOF spatial patterns in C (from Equation 5.4) with patterns $C_R$ that satisfy:

$$C_R = C\,R \qquad \text{[Eqn 5.6]}$$

where the matrix R is chosen such that the resulting rotated patterns, $C_R$, maximise a simplicity function. There are several different rotations, each with its own simplicity function, but the ‘varimax’ rotation is the most commonly used. The varimax rotation finds the orthogonal linear combination of the original EOFs that maximises the variance of the squared time series values (or loadings), summed over the retained modes. This drives the individual loadings towards either large magnitudes or values near zero, which is what is meant by a ‘simple’ structure. The function to be maximised is:

$$V = \sum_{j}\left[\frac{1}{t}\sum_{i=1}^{t} a_{ij}^{4} - \left(\frac{1}{t}\sum_{i=1}^{t} a_{ij}^{2}\right)^{2}\right] \qquad \text{[Eqn 5.7]}$$

where t is the number of observations (see Figure 5.7), i indexes the observations, j indexes the retained modes and a are the time series of Equation 5.5 (Richman, 1986). An orthogonal rotation such as varimax finds a new orthogonal basis but, unlike in the unrotated case, the time series in the rotated frame are in general no longer uncorrelated. There is much debate on whether or not to rotate, but the general consensus is that rotation should be carried out if it aids interpretation (Jolliffe, 1989). More details on the mathematics and on the pros and cons of rotation can be found in Preisendorfer (1988) and von Storch and Navarra (1999).
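A minimal sketch of the varimax rotation follows, assuming the criterion is applied in the raw (unnormalised) form of Eqn 5.7 to the t × k matrix A of retained EOF time series. It uses a widely used SVD-based iteration rather than Kaiser's original pairwise rotations; the function names, the tolerance and the synthetic two-signal example are assumptions for illustration, and published analyses often normalise the loadings first or rotate the spatial patterns instead.

```python
import numpy as np

def varimax_criterion(A):
    """Raw varimax criterion V of Eqn 5.7 for a t x k matrix of loadings A."""
    t = A.shape[0]
    return np.sum(np.sum(A**4, axis=0) / t - (np.sum(A**2, axis=0) / t) ** 2)

def varimax(A, max_iter=100, tol=1e-8):
    """Hypothetical helper: orthogonal k x k matrix R intended to maximise the
    raw varimax criterion of A @ R, via the common SVD-based iteration."""
    t, k = A.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # gradient of V with respect to R, up to a constant factor
        G = A.T @ (L**3 - L @ np.diag(np.sum(L**2, axis=0)) / t)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                          # nearest orthogonal matrix to G
        crit = np.sum(s)
        if crit_old != 0 and crit < crit_old * (1 + tol):
            break
        crit_old = crit
    return R

# Assumed data: two overlapping, non-orthogonal spatial signals plus noise
rng = np.random.default_rng(3)
t, p = 200, 30
x = np.linspace(0, 1, p)
patterns = np.vstack([np.exp(-((x - 0.3) / 0.1) ** 2),
                      np.exp(-((x - 0.5) / 0.1) ** 2)])
F = rng.standard_normal((t, 2)) @ patterns + 0.05 * rng.standard_normal((t, p))
F = F - F.mean(axis=0)

U, s, Vt = np.linalg.svd(F, full_matrices=False)
k = 2
C = Vt[:k].T               # p x k unrotated EOF patterns
A = U[:, :k] * s[:k]       # t x k unrotated time series (mutually uncorrelated)

R = varimax(A)
C_R = C @ R                # rotated patterns, Eqn 5.6
A_R = A @ R                # rotated time series
print(varimax_criterion(A), varimax_criterion(A_R))
print(np.corrcoef(A_R.T)[0, 1])   # generally non-zero after rotation
```

Because R is orthogonal, the rotated patterns remain orthonormal and `A_R @ C_R.T` reproduces the same part of the data as the unrotated modes; only the partition of that variance between the modes changes, and the rotated time series pick up the correlation noted in the text.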

In conclusion, the correlation maps and the time series of the EOF modes, combined with a good understanding of the physical processes under study, should together yield a sensible interpretation of the results. A note of caution, though, from von Storch and Navarra (1999): ‘[advanced statistical] methods are often needed to find a signal in a vast noisy phase space, i.e. the needle in the haystack. But after having the needle in our hand, we should be able to identify the needle by simply looking at it.’

5.3.3. EOF ANALYSIS OF IRMINGER BASIN SEAWIFS