4.5 Classification analysis

By using three separate methods of describing seasonality, a large amount of information has been produced in the preceding sections of this chapter. The three methods can be seen as complementing one another, as each provides information not revealed by the others. This amounts to a very comprehensive description of the patterns of seasonality across Scotland. However, it is difficult to assimilate all the information provided to gain a general picture of how seasonality varies across space. A method of condensing the appropriate information was required to simplify the many patterns which have been described and, to this end, a classification analysis was employed. This enables the seasonality of flooding at each site to be described by reference to one of a small number of seasonality groups and, by plotting the group membership of individual stations on a map, allows spatial patterns to be easily understood. The greatest advantage of a classification analysis was seen to be the ability to condense several sources of information into a single-term description of seasonality for each site, since values of a large number of scalar variables can be used to determine group membership. To implement the classification analysis of the seasonal data, a computer package, CLUSTAN (Wishart 1987), was used. Four separate aspects of the classification had to be determined: the choice of input variables to describe seasonality; the classification method to be employed; the similarity measure to be used; and the final number of groups (clusters) to be arrived at.

It was decided that the analysis should be based upon the two-monthly flood frequency data described in Section 4.3 as this provided a detailed source of information for the analysis. Mean day of flood and r values would not be needed in addition, since these can be seen as summarising the two-monthly data. It was also decided not to include the modal month data produced for high discharge events, as it was felt that at stations with relatively short records, the small number of events exceeding a low frequency threshold would make the use of such data somewhat unreliable. However, the data set did include all stations which satisfied the requirements of the threshold revision, including those where standard period-adjusted two-monthly frequency values were not available. In such cases, unadjusted values based simply on the period of record available were used, thus enabling an improved spatial coverage to be achieved.
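As an illustration of how the six two-monthly variables and the r statistic relate to the same underlying record, the following minimal Python sketch derives both from a list of flood dates. The function names, the use of a 365-day circle for r and the example dates are assumptions made for illustration only; the definitions actually used in this study are those given in the earlier sections of this chapter.

```python
import math
from datetime import date

# Two-month periods in the order used in this chapter's figures.
PERIODS = ["JJ", "AS", "ON", "DJ", "FM", "AM"]
# Calendar month -> index into PERIODS.
MONTH_TO_PERIOD = {6: 0, 7: 0, 8: 1, 9: 1, 10: 2, 11: 2,
                   12: 3, 1: 3, 2: 4, 3: 4, 4: 5, 5: 5}


def two_monthly_percentages(flood_dates):
    """Percentage of flood events falling in each two-month period."""
    if not flood_dates:
        raise ValueError("at least one flood date is required")
    counts = [0] * len(PERIODS)
    for d in flood_dates:
        counts[MONTH_TO_PERIOD[d.month]] += 1
    return [100.0 * c / len(flood_dates) for c in counts]


def mean_resultant_length(flood_dates):
    """Concentration of flood dates treated as angles on a 365-day circle.

    Assumed definition: 0 means events are spread evenly through the year,
    1 means all events fall on the same day of the year.
    """
    if not flood_dates:
        raise ValueError("at least one flood date is required")
    angles = [2.0 * math.pi * d.timetuple().tm_yday / 365.0 for d in flood_dates]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


# Hypothetical flood dates for one station (illustrative, not thesis data).
example_dates = [date(1980, 12, 3), date(1981, 1, 15),
                 date(1982, 11, 20), date(1983, 8, 9)]
freqs = two_monthly_percentages(example_dates)
r = mean_resultant_length(example_dates)
```

Both quantities are computed from the same dates, which is the sense in which r can be regarded as a summary of the seasonal distribution rather than an independent input.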

A classification analysis involves the allocation of cases (in this instance catchments) to clusters, and this can be done using a great variety of methods. The choice of method determines the specific means by which cases are combined to form clusters, and the CLUSTAN user manual recommends a two-part method for a study with this number of cases (143). Initially, Ward's method, which involves a hierarchical fusion of clusters at each step in the classification, is used. At the first step, each case is considered to constitute a one-member cluster and, at each subsequent step, the two closest clusters are merged, distances being calculated in terms of the Euclidean sum of squares, which is a standard method of calculating distance in multivariate space (principal component values were used, as recommended, rather than the actual values of the six original variables). This method was used to produce ten clusters, after which point a second method is recommended. This involves iterative relocation of cases from their existing groups to new groups wherever this results in a reduction in within-group variation (distance between members of a group) and an increase in between-group variation (distance between groups). In this way, groups become better defined than is possible by a hierarchical method in which no relocation is possible after initial fusion of clusters. Fusion then proceeds until the desired number of clusters is produced, and in this case the classification was terminated with four clusters since these appeared to represent four physically meaningful models of flood seasonality.

Figure 4.5 shows two-monthly flood frequency values for the members of each of these four groups; actual values are presented in Figure 4.5a, while a more accurate picture of the typical seasonality of flooding in each can be gained from Figure 4.5b, where the 25th, 50th and 75th percentile values of the two-month frequencies are given.

Taking each group in turn, Group A is characterised by a strong winter seasonality, with the highest two-monthly percentage occurring in either October-November or December-January at all but one of its stations, and 69% of stations recording over 50% of events in these four months. Flood frequencies in the remaining months of the year are therefore rather low, and it can be seen that frequencies in August-September and February-March are considerably lower than in the intervening two two-month periods. Group A may therefore be considered to be composed of stations with winter-strong seasonality.

Group B is distinguished by the unusually high number of events occurring in late winter, with February-March frequencies being considerably higher than for any other group, and generally only a little lower than for December-January. Frequencies in this group are low in April-May and June-July - as is generally the case - but then gradually rise through August-September and October-November to a maximum in December-January or occasionally February-March. This group has less pronounced seasonality than Group A (mean r value 0.491 compared with 0.564 in Group A), but still the high proportion of events occurring in late winter is a strong characteristic.

Group C is characterised by a much less pronounced seasonality than any of the other three groups and is therefore composed of the stations with the lowest r values (mean=0.355). Only rarely does the number of events occurring in any two-month period exceed 30% of the total, and June-July frequencies are conspicuously high. Unexpectedly perhaps, the opposite is found in August-September for most stations in this group, but the lack of any season with unusually high flood frequency defines this group clearly as one with a very weak seasonal signature, and a rather earlier mean day of flood than Group B which has only a relatively modest seasonality.

Figure 4.5a Two-month frequency values for Groups A-D

JJ=June-July; AS=August-September; ON=October-November; DJ=December-January; FM=February-March; AM=April-May

Figure 4.5b Median and quartile ranges for two-month frequency values in Groups A-D (vertical axis: % frequency; horizontal axis: two-month period; lines show the 75%, 50% and 25% values)

JJ=June-July; AS=August-September; ON=October-November; DJ=December-January; FM=February-March; AM=April-May

Finally, Group D is characterised by a relatively strong early seasonality. August-September frequencies are unusually high at stations in this group, and the season of maximum frequency is no later than October-November at 78% of its members. Though there is a strong seasonality at these stations, Group D stations are clearly differentiated from those of Group A by their significantly earlier bias.
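The two-stage procedure that produced these four groups (Ward's fusion down to ten clusters, followed by relocation and further fusion down to four) can be sketched in Python as follows. This is a minimal illustration and not a reproduction of the CLUSTAN routines: the function names are hypothetical, the relocation step is simplified to reassigning each case to its nearest group centroid, the variables are centred but not standardised, and the input data are synthetic profiles invented for the example.

```python
import numpy as np


def principal_components(X):
    """Principal component scores of the centred variables (via SVD)."""
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt.T


def fuse_once(scores, labels):
    """Ward step: merge the pair of clusters whose fusion gives the
    smallest increase in the within-group (error) sum of squares."""
    ids = np.unique(labels)
    sizes = np.array([(labels == c).sum() for c in ids])
    cents = np.array([scores[labels == c].mean(axis=0) for c in ids])
    d2 = ((cents[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    cost = sizes[:, None] * sizes[None, :] / (sizes[:, None] + sizes[None, :]) * d2
    cost[np.tril_indices(len(ids))] = np.inf  # keep only pairs with i < j
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    merged = labels.copy()
    merged[merged == ids[j]] = ids[i]
    return merged


def relocate(scores, labels, max_iter=100):
    """Simplified relocation: move each case to its nearest group centroid
    until no case changes group (a stand-in for CLUSTAN's procedure)."""
    labels = labels.copy()
    for _ in range(max_iter):
        ids = np.unique(labels)
        cents = np.array([scores[labels == c].mean(axis=0) for c in ids])
        d2 = ((scores[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        new = ids[np.argmin(d2, axis=1)]
        # Stop when stable, or if a move would leave a group empty.
        if np.array_equal(new, labels) or len(np.unique(new)) < len(ids):
            break
        labels = new
    return labels


def classify(X, n_ward=10, n_final=4):
    """Ward's fusion to n_ward clusters, then relocate and fuse to n_final."""
    scores = principal_components(X)
    labels = np.arange(len(scores))  # every case starts as its own cluster
    while len(np.unique(labels)) > n_ward:
        labels = fuse_once(scores, labels)
    labels = relocate(scores, labels)
    while len(np.unique(labels)) > n_final:
        labels = relocate(scores, fuse_once(scores, labels))
    return labels


# Synthetic two-monthly percentages (JJ, AS, ON, DJ, FM, AM) for 143
# hypothetical stations; the four profiles are invented for illustration.
rng = np.random.default_rng(0)
profiles = [[3, 8, 30, 40, 12, 7], [4, 10, 20, 30, 28, 8],
            [20, 12, 18, 20, 16, 14], [5, 25, 35, 22, 8, 5]]
group_sizes = [36, 36, 36, 35]
X = np.vstack([rng.dirichlet(np.array(p) * 2.0, size=n) * 100.0
               for p, n in zip(profiles, group_sizes)])
labels = classify(X)
```

Because all principal components are retained and the variables are only centred, the scores here are simply a rotation of the original data, so distances between cases are unchanged; the projection would alter the result only if the variables were rescaled or some components were discarded, and which of these was done in the original analysis is not assumed here.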

The spatial patterns in group membership can be seen in Figure 4.6, and it is immediately apparent that each of the groups has its own spatial distribution of members. Group A stations, with their winter-dominated seasonality, are generally found in inland areas and often on rivers draining large catchments: many are found on the Rivers Clyde, Tweed, Dee and Spey and some of their larger tributaries. Group B stations are heavily concentrated in Northumberland and the Borders, with only six of the seventeen members occurring outwith this area. Of these, two occur below two of the largest loch storages in the study (06007 Ness @ Ness-side below Loch Ness and 94001 Ewe @ Poolewe below Loch Maree), but the concentration of Group B stations in the south-east of the study area is striking.

Even more striking, perhaps, is the spatial distribution of the members of Group C, which are essentially confined to two geographic areas. In north-east Scotland, a cluster of these stations with especially weak seasonality is found in a coastal area extending from the River Dee to the lower River Findhorn and across the Moray Firth to Caithness. Further south, a larger and more concentrated cluster of stations is found on the south shore of the Firth of Forth, encompassing the rivers between the Almond and the Tyne, with a further four examples being found further south in Berwickshire and Northumberland. None of the stations on rivers along this stretch of the Firth of Forth belongs to any other group. Finally, the members of Group D, showing a pronounced early seasonality, are found predominantly in south-west Scotland. Only three of the 21 stations in hydrometric areas 77-83 belong to any other group; a strong cluster of Group D stations is also found in the Kelvin sub-catchment of the Clyde, but only five examples are found north of the Forth-Clyde valley.

Figure 4.6 Group membership for 143 stations (legend: Group A, winter strong; Group B, late winter; Group C, weak seasonality; Group D, autumn/winter)

This classification analysis allows a succinct description of the seasonality of flooding across Scotland to be made. It has been possible to identify broad spatial patterns, along with exceptions to them. The area with the earliest flooding in the year is south-west Scotland, where flood frequencies in August-September and October-November are especially high. Moving directly east, flooding in Northumberland and at some stations in the Tweed basin is somewhat contrasting, with the months between December and March assuming greatest importance. Immediately to the north, a concentrated cluster of Group C stations indicates a much more even distribution of floods amongst the seasons, and another similar cluster of Group C stations is found in north-east Scotland. Stations with winter-dominated seasonality are found in some numbers in all parts of Scotland except the south-west, but are mostly confined to relatively large basins and inland areas. However, these are only general trends and Figure 4.6 shows exceptions to them. Some of these appear as the result of the four groups employed merging into each other to some extent, with the effect that, in some instances, stations with relatively similar seasonalities will be shown as belonging to different groups. In other cases, however, adjacent catchments do have markedly differing seasonalities. A good example might be a cluster of three stations in the Tweed basin which each belong to a different group: 21009 Tweed @ Norham belongs to Group A, having a clearly winter-dominated flood seasonality, but a left bank tributary, the Whiteadder Water (station 21022), shows a much less pronounced seasonality and is assigned to Group C, while on the opposite side of the main river, the River Till (station 21031) is assigned to Group B on account of its late winter seasonality. It would therefore seem that the determinants of flood seasonality must operate at both regional and more local scales.
On the one hand, regional effects such as the general trend for flooding to occur later in the east than the west are likely to be controlled by large-scale meteorological factors, while more localised differences such as the Tweed example given above are likely to be the result of a variety of catchment-scale factors. The causes of the patterns of flood seasonality described here are discussed in Chapter 5.