4. Development of a Synoptic Classification Procedure for Application to
4.4. Temporal Synoptic Indexing: Data and Methods
The TSI technique adopted in this study uses gridded data from ERA-Interim, the latest reanalysis product offered by the European Centre for Medium-Range Weather Forecasts (ECMWF), which was introduced briefly in Chapter 2, Section 2.4.
The reanalysis data provide a multivariate, spatially complete and physically coherent data set describing atmospheric circulation, and consist of global observations assimilated by a numerical forecast model (Dee et al., 2011). The global coverage of these data makes reanalysis products ideally suited to studies in remote or inaccessible terrain, where prolonged in-situ measurements are challenging. These attributes have led to increasing use of reanalysis data in glaciological studies. Examples include: energy balance modelling (Hock et al., 2007; Rye et al., 2010); temperature-index modelling (Hanna et al., 2005; Radić and Hock, 2006; Hock et al., 2007); parameterization of near-surface, on-glacier lapse rates (Gardner et al., 2009; Hodgkins et al., 2012a, b); and identification of the relationship between atmospheric indices and glacier mass balance (Shea and Marshall, 2007).
The choice of which climatic elements to retain for classification was determined by this study’s aim of investigating the effect of synoptic scale meteorological variability on processes of ablation; the variables chosen for inclusion are therefore guided by a priori consideration of meteorological elements that are likely to affect components of the SEB. Additionally, the TSI procedure was developed for the identification of ‘air masses’ (Kalkstein and Corrigan, 1986), which associates source areas with categories (Barry and Chorley, 2009). This is a useful concept to retain for the purposes of this study, as it allows discussion of the categories identified to be contextualised in the synoptic climatological literature. Consequently, the variables chosen for inclusion in this study were as follows:
- dewpoint temperature (2 m);
- air temperature (2 m);
- west-east wind scalar (the ‘U’ component of wind speed; 10 m);
- south-north wind scalar (the ‘V’ component of wind speed; 10 m);
- surface air pressure;
- total cloud cover.
Temperature and humidity are retained for the effect that these variables have on the sensible and latent heat fluxes respectively; these variables are also considered to be key air mass indicators (Barry and Perry, 1973). The U and V wind components provide information about wind speed and direction: wind speed is important for its effect on the turbulent heat fluxes, whilst the wind direction is evidently important for determining the origin of advected air (and hence air-mass source). The wind direction is regarded as additionally important as it implicitly considers the effect of local topographic barriers in modulating local micro-meteorology. Cloud cover is included because this parameter is relevant for both the longwave and shortwave radiative energy components (e.g. Sedlar and Hock, 2009; Klok and Oerlemans, 2002). Processes of mass convergence and uplift are quantified by air pressure which, along with cloud cover, has been used extensively in other weather classifications (e.g. Kalkstein and Corrigan, 1986; Kalkstein et al., 1990; Brazel et al., 1992;
Kalkstein et al., 1996; Cheng and Lam, 2000; Bejarán and Camilloni, 2003; Fealy and Sweeney, 2007; Bower et al., 2007). These variables were obtained at 6-hourly intervals (00, 06, 12 and 18 UTC) for the 1.5° × 1.5° grid boxes in closest proximity to the study locations (Figure 4.1) for input into the TSI procedure. At Storglaciären, the TSI procedure is performed on the reanalysis data for July-August, 2005-2011, and at Vestari Hagafellsjökull for the period June-August, 2001-2010.
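The way in which the U and V components jointly encode wind speed and direction can be illustrated with a minimal Python sketch. This conversion is not a step in the TSI procedure itself, in which U and V are entered directly; the function name `wind_speed_direction` is hypothetical, and the meteorological convention (direction from which the wind blows, in degrees clockwise from north) is assumed.

```python
import numpy as np


def wind_speed_direction(u, v):
    """Convert U (west-east) and V (south-north) components to speed and direction.

    Assumption: direction follows the meteorological convention, i.e. the
    bearing FROM which the wind blows, in degrees clockwise from north.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)  # magnitude of the horizontal wind vector
    # Reversing the vector (-u, -v) gives the bearing of the source direction.
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction


if __name__ == "__main__":
    # Westerly, southerly and northerly winds of equal strength.
    speed, direction = wind_speed_direction([10, 0, 0], [0, 10, -10])
    print(speed, direction)
```

A positive U with zero V therefore corresponds to a westerly wind (270°), which is the information used when interpreting the source region of advected air.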
Ultimately, the methodology employed by TSI classifies individual days into categories through application of a clustering algorithm; however, prior to this stage, PCA is performed on the data. This is required to remove co-linearity within the meteorological variables, which maximises the ‘disconnecting power’ of the subsequent clustering procedure (Huth et al., 1993). The goal of PCA is to reduce a large number of variables exhibiting varying degrees of co-linearity to a smaller number of orthogonal components, whilst preserving as much variability in the original data as possible (Wilks, 1995). The input to the PCA is a similarity matrix.
When the variables contain different units (e.g. air pressure and cloud cover), this should be in the form of a correlation, rather than a covariance matrix (Kalkstein et al., 1990). For the data in this study, the correlation matrix is a k × k symmetric matrix, where k = 24 (6 parameters measured 4 times daily). PCA is therefore a way of reducing the number of variables required to describe each day’s meteorology.
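As an illustration of how k = 24 arises, the following Python sketch arranges 6-hourly values of the six parameters into one row per day and forms the correlation matrix. It is a sketch under stated assumptions rather than the code used in this study: the function name `daily_matrix` is hypothetical, the input is assumed to hold complete days in time order, and the ordering of the 24 columns (hour by hour, six parameters each) is an arbitrary choice.

```python
import numpy as np


def daily_matrix(six_hourly, obs_per_day=4):
    """Reshape (n_days * obs_per_day, n_parameters) into (n_days, obs_per_day * n_parameters).

    Assumption: rows are in time order and every day is complete.
    """
    six_hourly = np.asarray(six_hourly, dtype=float)
    n_obs, n_par = six_hourly.shape
    if n_obs % obs_per_day != 0:
        raise ValueError("input does not contain a whole number of days")
    # Each output row concatenates the 00, 06, 12 and 18 UTC observations of one day.
    return six_hourly.reshape(n_obs // obs_per_day, obs_per_day * n_par)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake = rng.normal(size=(62 * 4, 6))      # synthetic stand-in for the reanalysis series
    X = daily_matrix(fake)                   # shape (62, 24)
    R = np.corrcoef(X, rowvar=False)         # k x k correlation matrix, k = 24
    print(X.shape, R.shape)
```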
Figure 4.1. Location of grid cells from which reanalysis data are extracted.
The principal components (PCs) are calculated as a linear function of the original, standardised, k variables, and are a projection of these data onto the m eigenvectors:
$$u_m = \sum_{k} e_{km}\, x'_k , \qquad (4.1)$$
where $u_m$ is the mth PC, $e_{km}$ are coefficients and $x'_k$ are the k variables, standardised according to:
$$x'_k = \frac{x_k - \bar{x}_k}{\sigma_k} , \qquad (4.2)$$
in which the overbar denotes the mean and σ indicates the standard deviation. Each PC can therefore be considered as analogous to a weighted average; the weights (the $e_{km}$ coefficients) are dependent on the correlation between the original variables and eigenvectors (Johnson, 1991), subject to the constraint that:
$$\sum_{k} e_{km}^{2} = 1 . \qquad (4.3)$$
For k variables there are k possible PCs, although it is hoped that much of the variance in the original data can be represented by many fewer components (Wilks, 1995). The choice as to how many PCs to retain, however, is a matter of judgement. In this study, a scree plot is used to assist in the decision making, which involves a line plot of each component’s eigenvalue: a steep break in slope is an indicator of a large relative change in the amount of variance explained by the PC, and suggests a suitable cut-off point (Wilks, 1995).
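The PCA step described by Equations 4.1 to 4.3, together with the eigenvalues needed for a scree plot, can be sketched in Python as follows. This is a minimal illustration using NumPy, not necessarily the software used in this study; the function name `pca_correlation` is hypothetical, and the sample standard deviation (ddof = 1) is assumed for the standardisation.

```python
import numpy as np


def pca_correlation(X):
    """Unrotated PCA of the correlation matrix of X (rows = days, columns = variables).

    Returns eigenvalues (descending), eigenvectors (columns), daily PC scores
    and the fraction of variance explained by each component.
    """
    X = np.asarray(X, dtype=float)
    # Standardise each variable (Equation 4.2); ddof=1 is an assumption.
    Zs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # k x k correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Zs @ eigvecs                     # Equation 4.1 for every day and component
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, explained


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.normal(size=(60, 3))
    X = np.hstack([base, base + 0.5 * rng.normal(size=(60, 3))])  # synthetic, co-linear data
    eigvals, eigvecs, scores, explained = pca_correlation(X)
    # The eigenvalues plotted against component number form the scree plot.
    print(np.round(eigvals, 3), np.round(explained.cumsum(), 3))
```

The sketch does not select the number of components to retain; that choice remains a judgement made from the scree plot, after which only the first m columns of `scores` would be passed to the clustering step.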
Whilst the rotation of the PCs is an important consideration for some applications within the atmospheric sciences, this is primarily to aid in their physical interpretation (e.g. Mote, 1998a). For the TSI methodology, PCA is only required to reduce the dimensionality of the data into orthogonal components, so that the efficacy of the clustering procedure is maximised; no rotation is therefore required (Huth et al., 1993;
Kalkstein et al., 1990). The clustering algorithm is therefore performed on daily PC scores as calculated through Equation 4.1.
Days characterised by similar meteorological conditions will manifest in proximate component scores. The aim of the clustering procedure is to aggregate these scores into homogeneous groups. For this purpose, an agglomerative, hierarchical clustering algorithm is applied. This method starts with each of the days in a separate cluster; then, at each stage, the two most similar (closest) clusters are joined until all days reside in a single group. The distance between any two clusters ($A$, $B$) is evaluated according to the average linkage method:
$$D_{AB} = \frac{1}{n_A n_B} \sum_{p=1}^{n_A} \sum_{q=1}^{n_B} d_{pq} , \qquad (4.4)$$
where $n_A$ and $n_B$ are the numbers of members (days) in each of the clusters being evaluated; $p$ and $q$ are the order of observation within each cluster ($1 \ldots n_A$ and $1 \ldots n_B$, respectively); and $d_{pq}$ is the squared Euclidean distance (Kalkstein et al., 1987), calculated from:
$$d_{pq} = \sum_{m} \left( u_{pm} - u_{qm} \right)^{2} , \qquad (4.5)$$
in which $u_{pm}$ and $u_{qm}$ are the scores of days $p$ and $q$ on the mth PC.
Thus, the squared difference is summed across each of the retained PCs. The average linkage method is preferred over other frequently used agglomerative algorithms, following Kalkstein et al. (1987), who found this technique to be superior to Ward’s and centroid methods for the TSI procedure: the former failed to partition
‘extreme’ events into distinct clusters, and the latter was prone to ‘snowballing’, whereby larger clusters grew at the expense of smaller ones. Because the agglomerative, hierarchical algorithm joins all days into a single group, the stage at which the algorithm is terminated determines the number of clusters that are retained.
Deciding when this termination should occur is therefore of significant importance, and represents the second critical decision in the TSI procedure, after choosing how many PCs to retain.
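The average linkage distance of Equations 4.4 and 4.5 can be written out directly. The following Python sketch is illustrative only: the function name `average_linkage_distance` is hypothetical, and each cluster is assumed to be supplied as an array of daily PC scores (rows = days, columns = retained PCs).

```python
import numpy as np


def average_linkage_distance(cluster_a, cluster_b):
    """Mean squared Euclidean distance over all between-cluster pairs of days."""
    A = np.asarray(cluster_a, dtype=float)
    B = np.asarray(cluster_b, dtype=float)
    # Pairwise differences, shape (n_A, n_B, n_PCs), via broadcasting.
    diffs = A[:, None, :] - B[None, :, :]
    d = (diffs ** 2).sum(axis=2)   # Equation 4.5 for every pair of days
    return d.mean()                # Equation 4.4: divide the double sum by n_A * n_B


if __name__ == "__main__":
    a = [[0.0, 0.0], [2.0, 0.0]]   # two days in the first cluster
    b = [[0.0, 1.0]]               # one day in the second cluster
    print(average_linkage_distance(a, b))  # pairs give 1 and 5, so the mean is 3.0
```

Because every between-cluster pair contributes equally, a single outlying day shifts the distance less than it would under a nearest- or furthest-neighbour criterion.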
Although some sophisticated methods have been proposed to assist with deciding on how many clusters are appropriate (Jolliffe and Philipp, 2010), the lack of consensus about which rule to apply (Everitt et al., 2001) means that judgement-based methods remain the most popular (Baxter, 1994). Among such approaches is the use of the distance measure $D$ (Equation 4.4). This involves examining a line plot of the value of $D$ at each point in the agglomeration schedule: a large increase in value (break in slope) indicates that two dissimilar clusters have been fused and suggests a suitable point to truncate the algorithm. This method has been used in synoptic climatological studies before (e.g. Kalkstein and Corrigan, 1986; Fernau and Samson, 1990; Mote, 1998b;
Fealy, 2004), and is used in this study. To aid in the visual inspection of the distance measures, the values at each step were standardised according to:
$$D'_i = \frac{D_i - \bar{D}}{\sigma_D} , \qquad (4.6)$$
in which $D_i$ is the distance measure as given by Equation 4.4, i is the stage of the agglomeration (1...n – 1), and $\bar{D}$ and $\sigma_D$ are the mean and standard deviation of the previous i distance measures, respectively (Everitt et al., 2001). Inspection of the agglomeration schedule in this way can be inconclusive when multiple breaks of slope are present (Aldenderfer and Blashfield, 1984). To aid decisions in such instances, the following were used as a guide: i) too many clusters are preferred over too few; and ii) the partitioning of days within clusters should be physically plausible, such that the occurrence of one or two very large clusters and many small clusters would be undesirable (e.g. Huth et al., 1993). The first condition is imposed because the retention of a larger number of clusters not only leads to greater homogeneity within those clusters identified, but also allows the possibility of aggregating clusters at a later stage based on subsequent analysis of their meteorological characteristics and effect on the SEB. The second criterion mainly relates to prior considerations of weather types at the study locations. Iceland’s climatological setting, being regularly affected by contrasting air masses and disturbances in the polar front (Einarsson, 1984), means that a limited number of large weather types would be unlikely to capture this variability. The same considerations apply to Storglaciären’s location in northern Sweden, which is similarly influenced by frequent extratropical cyclones and both maritime and continental air masses.
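The agglomeration schedule and the standardisation of Equation 4.6 can be sketched in Python using SciPy's hierarchical clustering routines. This is an illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the mean and standard deviation at stage i are taken over stages 1 to i inclusive (one reading of "the previous i distance measures"), the sample standard deviation (ddof = 1) is used, and the input scores are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def agglomeration_distances(scores):
    """Average linkage on squared Euclidean distances between daily PC scores."""
    Z = linkage(np.asarray(scores, dtype=float), method="average", metric="sqeuclidean")
    return Z, Z[:, 2]  # column 2 holds the fusion distance at each stage


def standardise_schedule(distances):
    """Standardise each fusion distance (Equation 4.6).

    Assumption: statistics at stage i use stages 1..i inclusive, with ddof=1.
    The first stage is left as NaN because its standard deviation is undefined.
    """
    D = np.asarray(distances, dtype=float)
    out = np.full(D.shape, np.nan)
    for i in range(2, D.size + 1):
        window = D[:i]
        sd = window.std(ddof=1)
        if sd > 0:
            out[i - 1] = (D[i - 1] - window.mean()) / sd
    return out


def cut_tree(Z, n_clusters):
    """Assign each day to one of n_clusters groups by truncating the schedule."""
    return fcluster(Z, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    scores = np.vstack([c + rng.normal(scale=0.3, size=(10, 2)) for c in centres])
    Z, D = agglomeration_distances(scores)
    print(np.round(standardise_schedule(D)[-5:], 2))  # late stages show the break in slope
    print(cut_tree(Z, 3))
```

The number of clusters passed to `cut_tree` is not determined by the code; it reflects the judgement described above, guided by the break in slope and by the physical plausibility of the resulting partition.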