7. Extreme value analysis
7.1 Introduction
Most statistical methods focus on the centre of a distribution and aim to describe a dataset adequately without being unduly influenced by extreme values. There are situations, however, where the extreme values are of prime interest, as is the case for minimum values in low-flow analysis. Estimates of the probability of occurrence of low-flow events can be derived from historical records using frequency analysis. The chapter starts by presenting two example Australian catchments and their low-flow data (section 7.2). This is followed by a general introduction to the concepts of frequency analysis (section 7.3), which involves the definition and selection of the type of hydrological event and extreme characteristics to be studied (section 7.4), the choice of probability distribution (section 7.5), the estimation of distribution parameters (section 7.6) and, lastly, the calculation of extreme quantiles or design values for a given problem (section 7.7).
The procedure is demonstrated using the Weibull distribution to estimate the T-year event for the two example catchments (section 7.8). The chapter closes with a brief introduction to regional frequency analysis (section 7.9). For a more general and detailed presentation of frequency analysis in hydrology, the reader is referred to Haan (1977), Stedinger and others (1993) and Tallaksen and others (2004). Frequency analysis of low flows is covered more specifically in the low-flow review by Smakhtin (2001) and in the Institute of Hydrology (1980) report on low-flow prediction at the ungauged site.
7.2 Example data for at-site low-flow frequency analysis
In this chapter, two gauging stations in eastern Victoria, Australia, namely the Nicholson River at Deptford (Station 223204) and the Timbarra River at Timbarra (Station 223207), are used to illustrate the calculation procedure for at-site low-flow frequency analysis. The two catchments have areas of 287 km² and 205 km², respectively. Both lie at altitudes above 500 m a.s.l. and are mainly covered by tall forest. The underlying geology of both catchments consists of Palaeozoic igneous and metamorphic rocks.

Figure 7.1 Location of example catchments in Australia and their hydrological regime [panels for Station 223204 and Station 223207: mean monthly flow and monthly minimum flow by month, January to December, in 1 000 m³/day]
The location of the two catchments is shown in Figure 7.1, along with plots of the hydrological regime at each station. The seasonal variation in mean monthly flow and monthly minimum flow shows that the lowest flows occur in late summer (January–March) at both stations. In this example, annual minimum 7-day (AM(7)) series (section 7.4.1) are determined for a hydrological year starting on 1 August. Only Station 223204 records zero values (twice in the observed record). The derived low-flow values are listed below for Stations 223204 and 223207, covering the periods 1963–95 (n = 34 years) and 1959–83 (n = 25 years), respectively. Values are in 1 000 m³/d (divide by 86.4 to convert to m³/s).

Station 223204: 8.0, 13.0, 2.0, 2.7, 9.1, 0.0, 0.7, 21.6, 35.1, 12.7, 1.3, 9.1, 26.0, 22.7, 21.3, 7.9, 23.0, 6.0, 3.1, 4.1, 0.0, 4.9, 2.3, 12.4, 4.4, 2.1, 3.1, 7.6, 9.0, 9.0, 20.4, 13.6, 3.3, 10.3

Station 223207: 27.3, 38.0, 72.9, 53.6, 45.7, 37.0, 21.7, 36.6, 43.1, 12.6, 20.7, 66.7, 76.6, 53.1, 15.7, 66.6, 78.0, 53.1, 62.6, 40.7, 51.6, 23.1, 23.4, 23.0, 6.0
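The AM(7) values above are derived from daily flow records. The exact definition is given in section 7.4.1; the sketch below assumes the usual one, the annual minimum of the 7-day moving average of daily flow, with hydrological years starting on 1 August. The function names, the convention of labelling a hydrological year by the calendar year in which it starts, and the rule of assigning each 7-day window to the year of its last day are all assumptions made for illustration. The daily series in the demo is synthetic, not station data.

```python
from datetime import date, timedelta

# 1 000 m³/d divided by 86.4 gives m³/s (1 000 / 86 400 s per day)
THOUSAND_M3_PER_DAY_TO_M3_PER_S = 86.4


def to_m3_per_s(q_thousand_m3_per_day):
    """Convert a flow in 1 000 m³/d to m³/s."""
    return q_thousand_m3_per_day / THOUSAND_M3_PER_DAY_TO_M3_PER_S


def hydrological_year(d, start_month=8):
    """Label a date by the calendar year in which its hydrological year starts
    (assumed convention; hydrological year starts on 1 August by default)."""
    return d.year if d.month >= start_month else d.year - 1


def annual_minimum_7day(dates, flows, window=7, start_month=8):
    """Return {hydrological year: minimum 7-day mean flow}.

    Assumptions: `dates` are consecutive days with no gaps, and each moving
    average is assigned to the hydrological year of the last day in its window.
    """
    am = {}
    for i in range(window - 1, len(flows)):
        mean_flow = sum(flows[i - window + 1:i + 1]) / window
        year = hydrological_year(dates[i], start_month)
        if year not in am or mean_flow < am[year]:
            am[year] = mean_flow
    return am


# Synthetic demo: two hydrological years of constant flow with one 7-day dip each
dates = [date(1990, 8, 1) + timedelta(days=k) for k in range(730)]
flows = [50.0] * 730
flows[100:107] = [5.0] * 7    # dip in hydrological year 1990
flows[500:507] = [10.0] * 7   # dip in hydrological year 1991
print(annual_minimum_7day(dates, flows))  # {1990: 5.0, 1991: 10.0}
```

Real records contain gaps and missing days, which this sketch does not handle; in practice, incomplete years are usually screened before the annual minima are extracted.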
7.3 Introduction to frequency analysis
Statistics is concerned with methods (estimators) for drawing conclusions about the properties of a population (the true values) from the properties of a sample drawn from that population. A given characteristic, for example the mean, computed by an estimator is called a sample estimate or statistic and is commonly denoted by a hat symbol (^).
Let X denote a random variable and x a real number. The cumulative distribution function (cdf)

$$F_X(x) = \Pr\{X \le x\} \tag{7.1}$$

designates the probability P that the random variable X is less than or equal to x, that is, the non-exceedance probability for x. The probability density function (pdf) is the derivative of the cdf and describes the relative likelihood that the continuous random variable X takes on different values:

$$f_X(x) = \frac{dF_X(x)}{dx} \tag{7.2}$$

The relation between f(x) and F(x) is shown in Figure 7.2, where F(x), the non-exceedance probability, equals the area under the curve for X ≤ x. The total area under f(x) equals 1.

An exploratory data analysis, including a graphical display of the data, is generally recommended before a statistical analysis is performed, as it can reveal important characteristics of the time series. In Figure 7.3, the AM(7) flow values (in 1 000 m³/d) for Station 223204 are plotted for each year in the record (data from section 7.2). The plot provides information on extreme values in the sample and their time of occurrence, possible trends in the series, spurious data and so on. Although no clear trend can be identified in the series in Figure 7.3, a sequence of wet and dry periods can be observed: relatively high low flows occur in the 1970s, indicating wetter conditions than in the drier 1960s and 1980s.
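One simple way to put numbers on the wet and dry periods visible in Figure 7.3 is to average the AM(7) values for Station 223204 by decade. The sketch below makes one assumption that is not confirmed by the text: the first listed value belongs to hydrological year 1963, and the remaining values follow in consecutive years (the stated period 1963–95 with n = 34 leaves the exact year assignment open). The name `decadal_means` is illustrative.

```python
from collections import defaultdict

# AM(7) flows for Station 223204 in 1 000 m³/d, as listed in section 7.2
AM7_223204 = [8.0, 13.0, 2.0, 2.7, 9.1, 0.0, 0.7, 21.6, 35.1, 12.7, 1.3, 9.1,
              26.0, 22.7, 21.3, 7.9, 23.0, 6.0, 3.1, 4.1, 0.0, 4.9, 2.3, 12.4,
              4.4, 2.1, 3.1, 7.6, 9.0, 9.0, 20.4, 13.6, 3.3, 10.3]

# Assumption: first value is hydrological year 1963, then consecutive years
FIRST_YEAR = 1963


def decadal_means(values, first_year):
    """Group an annual series by decade and return {decade: mean value}."""
    groups = defaultdict(list)
    for offset, q in enumerate(values):
        decade = (first_year + offset) // 10 * 10
        groups[decade].append(q)
    return {decade: sum(qs) / len(qs) for decade, qs in sorted(groups.items())}


for decade, mean_q in decadal_means(AM7_223204, FIRST_YEAR).items():
    print(f"{decade}s: mean AM(7) = {mean_q:.1f} (1 000 m³/d)")
```

Under this year assignment, the 1970s mean is about 18, compared with about 5 for the 1960s and about 4 for the 1980s. This matches the wet and dry sequence described above. Decadal means are only a coarse summary, though, and they do not replace a formal trend test.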
Figure 7.2 Probability density function, f(x), and non-exceedance probability, F(x) [plot of f(x) against x; F(x) is the area under the curve up to x]
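The relation between Eqs. (7.1) and (7.2), illustrated in Figure 7.2, can be checked numerically. Integrating the pdf from the lower bound up to x should reproduce the non-exceedance probability F(x), and integrating over the whole range should give 1. The sketch below assumes a two-parameter Weibull distribution with arbitrary illustrative parameters (scale 10, shape 2), not fitted to the station data, and uses the composite trapezoidal rule for the integral.

```python
import math


def weibull_pdf(x, scale, shape):
    """Two-parameter Weibull density f(x); zero for x < 0."""
    if x < 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_cdf(x, scale, shape):
    """Two-parameter Weibull non-exceedance probability F(x) = Pr{X <= x}."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def area_under(pdf, lower, upper, n=10_000):
    """Approximate the integral of pdf over [lower, upper] (composite trapezoidal rule)."""
    h = (upper - lower) / n
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, n):
        total += pdf(lower + i * h)
    return total * h


SCALE, SHAPE = 10.0, 2.0  # illustrative parameters only


def f(t):
    return weibull_pdf(t, SCALE, SHAPE)


x = 10.0
print(area_under(f, 0.0, x), weibull_cdf(x, SCALE, SHAPE))  # both close to 1 - e^-1
print(area_under(f, 0.0, 20 * SCALE))                       # total area close to 1
```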
The range of values observed within a sample can be displayed in a histogram, a bar plot showing the fraction of the total sample that falls into different class intervals. The histograms of the low-flow series from the two Australian catchments reveal clear differences between the two stations (Figure 7.4). Station 223204 has zero values, and its series has a long tail towards higher values (positive skewness). Station 223207, by contrast, has a more symmetrical distribution and thus lower skewness. Summary statistics such as the mean, standard deviation and skewness can be used to describe samples of a population (section 7.6.1). If the true distribution of the population is known, the sample statistics can be compared with the theoretical values (section 7.8).
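The histogram fractions and summary statistics described above can be computed directly from the AM(7) series in section 7.2. In the sketch below, the class width of 10 (1 000 m³/d) is arbitrary. The skewness estimator (the adjusted Fisher–Pearson coefficient, n/((n−1)(n−2)) Σ((x − x̄)/s)³) is one common choice and an assumption here, since section 7.6.1 may define the sample statistics differently. The function names are illustrative.

```python
import math

# AM(7) series from section 7.2, in 1 000 m³/d
AM7 = {
    "223204": [8.0, 13.0, 2.0, 2.7, 9.1, 0.0, 0.7, 21.6, 35.1, 12.7, 1.3, 9.1,
               26.0, 22.7, 21.3, 7.9, 23.0, 6.0, 3.1, 4.1, 0.0, 4.9, 2.3, 12.4,
               4.4, 2.1, 3.1, 7.6, 9.0, 9.0, 20.4, 13.6, 3.3, 10.3],
    "223207": [27.3, 38.0, 72.9, 53.6, 45.7, 37.0, 21.7, 36.6, 43.1, 12.6,
               20.7, 66.7, 76.6, 53.1, 15.7, 66.6, 78.0, 53.1, 62.6, 40.7,
               51.6, 23.1, 23.4, 23.0, 6.0],
}


def sample_mean(x):
    return sum(x) / len(x)


def sample_std(x):
    """Sample standard deviation with divisor n - 1."""
    m = sample_mean(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / (len(x) - 1))


def sample_skewness(x):
    """Adjusted Fisher-Pearson skewness coefficient (one common estimator)."""
    n = len(x)
    m, s = sample_mean(x), sample_std(x)
    return n / ((n - 1) * (n - 2)) * sum(((xi - m) / s) ** 3 for xi in x)


def class_fractions(x, width):
    """Fraction of the sample in each class interval [k*width, (k+1)*width)."""
    counts = {}
    for xi in x:
        k = int(xi // width)
        counts[k] = counts.get(k, 0) + 1
    return {(k * width, (k + 1) * width): c / len(x) for k, c in sorted(counts.items())}


for station, q in AM7.items():
    print(station, f"mean={sample_mean(q):.2f}", f"sd={sample_std(q):.2f}",
          f"skew={sample_skewness(q):.2f}")
    print("  class fractions:", class_fractions(q, 10.0))
```

With this estimator, Station 223204 shows a clearly larger positive skewness than Station 223207, which is consistent with the histogram comparison above. Other skewness estimators give somewhat different values for samples this small.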