Pre-processing Methods - Statistical models for dependent and non stationary extreme events

3.3.1 Full pre-processing model

A common approach for handling non-stationarity in a time series is to pre-process (or pre-whiten) the full data series before fitting a model for a stationary series (Chatfield, 2004). We propose this as the basis for modelling extreme values of a non-stationary process. Our pre-processing approach involves first fitting a model for the covariate effect on the underlying distribution of the process $\{Y_t\}$. In some contexts an established model, based on a scientific or data-based rationale, may already exist. In the absence of such a model, a flexible statistical model could be fitted. Specifically, we propose a Box-Cox location-scale model of the form

$$\frac{Y_t^{\lambda(x_t)} - 1}{\lambda(x_t)} = \mu(x_t) + \sigma(x_t) Z_t \tag{3.3.1}$$

where $\{Z_t\}$ is assumed to be approximately stationary, and $\lambda$, $\mu$ and $\log(\sigma)$ are linear functions of the covariates. We do not include previous values of $\{Y_t\}$ as covariates for two reasons. First, we assume that, conditionally on the covariates, the $\{Y_t\}$ process has independent events. Second, we wish to remain consistent with the standard method, for which we know of no examples that use previous values of the process as covariates for the current value.
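As an illustration, the mapping between $Y_t$ and $Z_t$ in equation (3.3.1), and its inverse, can be sketched in Python. This is a minimal sketch rather than the implementation used for the analyses: the function names, the single scalar covariate, the (intercept, slope) coefficient pairs and the logarithmic limit at $\lambda = 0$ are all assumptions made for the example.

```python
import math

def linear(coef, x):
    """Linear predictor intercept + slope * x for a scalar covariate (assumed form)."""
    intercept, slope = coef
    return intercept + slope * x

def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1) / lam; uses log(y) as the lam -> 0 limit."""
    if y <= 0:
        raise ValueError("the Box-Cox transform needs y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(w, lam):
    """Invert box_cox: (lam * w + 1)**(1 / lam), or exp(w) when lam is 0."""
    if abs(lam) < 1e-12:
        return math.exp(w)
    base = lam * w + 1.0
    if base <= 0:
        raise ValueError("lam * w + 1 must be positive to back-transform")
    return base ** (1.0 / lam)

def to_z(y, x, lam_coef, mu_coef, log_sigma_coef):
    """Residual Z_t = (BoxCox(Y_t) - mu(x_t)) / sigma(x_t)."""
    lam = linear(lam_coef, x)
    mu = linear(mu_coef, x)
    sigma = math.exp(linear(log_sigma_coef, x))  # log(sigma) is linear in x
    return (box_cox(y, lam) - mu) / sigma

def to_y(z, x, lam_coef, mu_coef, log_sigma_coef):
    """Back-transform: Y_t = InverseBoxCox(mu(x_t) + sigma(x_t) * Z_t)."""
    lam = linear(lam_coef, x)
    mu = linear(mu_coef, x)
    sigma = math.exp(linear(log_sigma_coef, x))
    return inverse_box_cox(mu + sigma * z, lam)

# Hypothetical coefficients, chosen only to exercise the functions.
lam_coef, mu_coef, log_sigma_coef = (0.5, 0.0), (1.0, 0.2), (0.0, 0.1)
z = to_z(9.0, 2.0, lam_coef, mu_coef, log_sigma_coef)
y = to_y(z, 2.0, lam_coef, mu_coef, log_sigma_coef)
```

Modelling $\log(\sigma)$ rather than $\sigma$ as linear keeps the scale positive for any coefficient values, which is why the sketch exponentiates the linear predictor.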

We shall assume that the body of the distribution of the derived series $\{Z_t\}$ is stationary and can be modelled using its empirical distribution $\tilde{F}_Z$. However, we do not use the stationary model of Section 3.2.1 for the extremes of $\{Z_t\}$: the extreme values of $\{Y_t\}$ may have a different form of non-stationarity from that of $\{Y_t\}$ as a whole, or our Box-Cox location-scale model may not fully capture all the covariate effects, so the extreme values of $\{Z_t\}$ may not behave like extreme values of a stationary series. Instead, we model the extreme values of $\{Z_t\}$ using the methods for non-stationary extremes in Section 3.2.2, i.e. with a fixed threshold $u_z$. Let $\phi_{z,u}(x_t)$ be the rate of exceedance of $u_z$ by $Z_t$, and denote the GPD scale and shape parameters by $\psi_{z,u}(x_t)$ and $\xi_z(x_t)$ respectively. Thus the full pre-processing model comprises a GPD($\psi_{z,u}(x_t)$, $\xi_z(x_t)$) for threshold exceedances and the empirical distribution $\tilde{F}_Z$ of the transformed process $\{Z_t\}$ below this level. To estimate return levels, we therefore use the GPD if $\phi_{z,u}(x_t) > p$; otherwise we use the empirical distribution $\tilde{F}_Z$. Critical to our use of the standard method of analysis for the extremes of $\{Z_t\}$ is that we believe most, if not all, of the non-stationarity of $\{Y_t\}$ will have been removed, or at least simplified, so that the majority of problems identified in Section 3.2.2 concerning the lack of threshold stability will have been alleviated.

Inference for this model then follows a two-step procedure. The first step is to estimate the Box-Cox location-scale parameters $(\lambda(x_t), \mu(x_t), \sigma(x_t))$. There are many possible ways to do this, but we suggest assuming that the underlying distribution is Gaussian, since it is then straightforward to use likelihood inference to estimate the Box-Cox and location-scale parameters, and it is robust to observations in the tails. The second step is to model the tail of the approximately stationary series $\{Z_t\}$ using the approach for non-stationary series discussed in Section 3.2.2.
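The first step can be sketched as a Gaussian negative log-likelihood for the Box-Cox location-scale model, which could then be passed to any numerical optimiser. The sketch assumes a single scalar covariate, (intercept, slope) parameter pairs and $\lambda(x_t) \neq 0$; the function name and argument layout are hypothetical. The term $(\lambda(x_t) - 1)\log y_t$ is the Jacobian of the Box-Cox transform.

```python
import math

def box_cox_gaussian_nll(params, ys, xs):
    """Negative log-likelihood when Z_t in (3.3.1) is treated as standard Gaussian.

    params = (l0, l1, m0, m1, s0, s1), giving lambda(x) = l0 + l1*x,
    mu(x) = m0 + m1*x and log sigma(x) = s0 + s1*x (assumed parametrisation).
    """
    l0, l1, m0, m1, s0, s1 = params
    nll = 0.0
    for y, x in zip(ys, xs):
        lam = l0 + l1 * x
        mu = m0 + m1 * x
        log_sigma = s0 + s1 * x
        z = ((y ** lam - 1.0) / lam - mu) / math.exp(log_sigma)
        # Density of Y_t: standard normal density at z times dz/dy = y**(lam - 1) / sigma.
        log_density = (-0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
                       - log_sigma + (lam - 1.0) * math.log(y))
        nll -= log_density
    return nll
```

The residuals $z_t$ evaluated at the minimising parameters would then form the series whose tail is modelled in the second step.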

The conditional and marginal return levels defined for a non-stationary series in equations (3.1.1) and (3.1.2) can easily be obtained under the pre-processing approach. We start with the conditional return levels. Since

$$p = \Pr(Y_t > y_{p,t} \mid X_t = x_t) = \Pr\left( \mu(x_t) + \sigma(x_t) Z_t > \frac{y_{p,t}^{\lambda(x_t)} - 1}{\lambda(x_t)} \,\middle|\, X_t = x_t \right)$$

we can first find the conditional return levels $z_{p,t}$ for the transformed series $\{Z_t\}$ and then back-transform these to give

$$y_{p,t} = \{\lambda(x_t)[\mu(x_t) + \sigma(x_t) z_{p,t}] + 1\}^{1/\lambda(x_t)}.$$

Unlike in the standard method, if $\phi_{z,u}(x_t) \le p$ the conditional return levels $z_{p,t}$ can be estimated using $\tilde{F}_Z$. If $\phi_{z,u}(x_t) > p$ the conditional return levels $z_{p,t}$ can be estimated using expression (3.2.8).
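The choice between the two estimators, followed by the back-transformation, can be sketched as below. Expression (3.2.8) is not reproduced in this section, so the sketch assumes it is the usual GPD return level obtained by solving $\phi[1 + \xi(z - u)/\psi]^{-1/\xi} = p$ for $z$. The function names, the use of callables for the covariate-dependent parameters and the empirical quantile convention are likewise assumptions.

```python
import math

def gpd_return_level(p, u, phi, psi, xi):
    """Solve phi * [1 + xi * (z - u) / psi]**(-1 / xi) = p for z (needs phi > p)."""
    if abs(xi) < 1e-12:
        return u + psi * math.log(phi / p)  # exponential-tail limit as xi -> 0
    return u + (psi / xi) * ((phi / p) ** xi - 1.0)

def empirical_return_level(p, z_sample):
    """Smallest observed z whose empirical distribution function is >= 1 - p."""
    zs = sorted(z_sample)
    k = max(math.ceil((1.0 - p) * len(zs)), 1)
    return zs[k - 1]

def conditional_return_level(p, x, z_sample, u_z, phi, psi, xi, lam, mu, sigma):
    """Return y_{p,t} at covariate value x; phi, psi, xi, lam, mu, sigma are callables."""
    if phi(x) > p:
        # Return level lies above the threshold: use the GPD tail model.
        z = gpd_return_level(p, u_z, phi(x), psi(x), xi(x))
    else:
        # Return level lies in the body: use the empirical distribution of Z.
        z = empirical_return_level(p, z_sample)
    # Back-transform to the original scale; assumes lam(x) != 0 and a positive base.
    return (lam(x) * (mu(x) + sigma(x) * z) + 1.0) ** (1.0 / lam(x))
```

Because the body of $\{Z_t\}$ is modelled empirically, return levels below the threshold remain available, which is the contrast with the standard method drawn above.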

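Let $z_p(x_t)$ be the transformation under equation (3.3.1) of the marginal return level $y_p$. Then $y_p$ is the solution to the equation

$$
\begin{aligned}
p &= \frac{1}{n}\left[ \sum_{t \in T} \Pr(Z_t > z_p(x_t) \mid X_t = x_t, Z_t > u_z) \Pr(Z_t > u_z \mid X_t = x_t) + \sum_{t \notin T} \Pr(Z_t > z_p(x_t) \mid X_t = x_t) \right] \\
  &= \frac{1}{n}\left[ \sum_{t \in T} \phi_{z,u}(x_t) \left( 1 + \xi_z \frac{z_p(x_t) - u_z}{\psi_{z,u}(x_t)} \right)_+^{-1/\xi_z} + \sum_{t \notin T} \left( 1 - \tilde{F}_Z(z_p(x_t)) \right) \right]
\end{aligned}
$$

where $T = \{t : z_p(x_t) > u_z\}$ is the set of all times where the transformed marginal return level exceeds the threshold $u_z$, so that the GPD model for exceedances holds.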
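Since $y_p$ appears on the right-hand side only through $z_p(x_t)$, the equation has to be solved numerically. The sketch below uses bisection, which is an assumption of this example and not a method stated in the text. It also assumes a constant shape $\xi_z$, $\lambda(x_t) \neq 0$, callables for the covariate-dependent quantities, and a bracket `[lo, hi]` supplied by the user; because $\tilde{F}_Z$ is a step function, the result is the crossing point and not necessarily an exact root.

```python
import math
from bisect import bisect_right

def mean_exceedance_prob(y, xs, z_sorted, u_z, lam, mu, sigma, phi, psi, xi):
    """Right-hand side of the marginal return level equation at candidate level y."""
    total = 0.0
    for x in xs:
        # z_p(x_t): the candidate level y mapped through equation (3.3.1).
        z = ((y ** lam(x) - 1.0) / lam(x) - mu(x)) / sigma(x)
        if z > u_z:  # t in T: GPD tail above the threshold
            if abs(xi) < 1e-12:
                tail = math.exp(-(z - u_z) / psi(x))
            else:
                tail = max(1.0 + xi * (z - u_z) / psi(x), 0.0) ** (-1.0 / xi)
            total += phi(x) * tail
        else:  # t not in T: empirical survivor function of Z
            total += 1.0 - bisect_right(z_sorted, z) / len(z_sorted)
    return total / len(xs)

def marginal_return_level(p, xs, z_sample, u_z, lam, mu, sigma, phi, psi, xi,
                          lo, hi, tol=1e-10):
    """Bisection for y_p on [lo, hi]; the exceedance probability falls as y rises."""
    z_sorted = sorted(z_sample)

    def f(y):
        return mean_exceedance_prob(y, xs, z_sorted, u_z, lam, mu, sigma,
                                    phi, psi, xi) - p

    if f(lo) < 0.0 or f(hi) > 0.0:
        raise ValueError("[lo, hi] does not bracket the return level")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid  # probability still above p, so the return level is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```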

3.3.2 Varying threshold approach

An alternative method that lies between the standard and pre-processing methods is to use a time-varying (and/or covariate-varying) threshold to define the extremes on the original scale. This extends the already popular approach of splitting data into seasons to allow for different thresholds in different seasons (see Smith, 1989, Küchenhoff and Thamerus, 1996, and Heffernan and Tawn, 2004, for examples with ozone data) by allowing a continuously varying threshold. Such a threshold may be obtained from the pre-processing method by transforming the constant threshold $u_z$ back to the original scale to give the varying threshold

$$u(x_t) = \{\lambda(x_t)[\mu(x_t) + \sigma(x_t) u_z] + 1\}^{1/\lambda(x_t)}. \tag{3.3.2}$$

The excesses of this threshold can then be modelled using the method for non-stationary extremes outlined in Section 3.2.2. Estimates of both conditional and marginal return levels are obtained in the same way as for the standard method. Specifically, as in the standard method and unlike the pre-processing method, we cannot make estimates of either return level below the threshold.
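Computing the varying threshold of equation (3.3.2) and extracting the excesses on the original scale can be sketched as follows. The function names and the constant-valued parameter functions are hypothetical, and $\lambda(x_t) \neq 0$ is assumed.

```python
def varying_threshold(x, u_z, lam, mu, sigma):
    """u(x_t) from equation (3.3.2); lam, mu, sigma are callables with lam(x) != 0."""
    return (lam(x) * (mu(x) + sigma(x) * u_z) + 1.0) ** (1.0 / lam(x))

def threshold_excesses(ys, xs, u_z, lam, mu, sigma):
    """Return (time index, excess) pairs for observations above their own threshold."""
    excesses = []
    for t, (y, x) in enumerate(zip(ys, xs)):
        u = varying_threshold(x, u_z, lam, mu, sigma)
        if y > u:
            excesses.append((t, y - u))
    return excesses

# Hypothetical fitted functions of a scalar covariate, for illustration only.
lam = lambda x: 0.5
mu = lambda x: 1.0 + x
sigma = lambda x: 1.0
pairs = threshold_excesses([7.0, 8.0], [0.0, 1.0], 2.0, lam, mu, sigma)
```

In this sketch the excesses returned by `threshold_excesses` are the quantities to which the non-stationary GPD of Section 3.2.2 would be fitted.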

A further disadvantage of this method compared to the pre-processing method is that the GPD parameters fitted under the varying threshold method are likely to have more covariates than in the pre-processing model, making it more difficult to fit the model. This can be seen by considering the simplest case where the extremes of $\{Z_t\}$ are stationary, i.e. $Z_t \mid Z_t > u_z \sim \mathrm{GPD}(\psi_{z,u}, \xi_z)$. By a change of variable, the distribution of the exceedances of the varying threshold given in equation (3.3.2) is then, for $y > 0$,

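$$\Pr(Y_t \ge y + u(x_t) \mid Y_t > u(x_t), X_t = x_t) = \left[ 1 + \xi_z \frac{[y + u(x_t)]^{\lambda(x_t)} - u(x_t)^{\lambda(x_t)}}{\psi_{z,u}\,\lambda(x_t)\,\sigma(x_t)} \right]_+^{-1/\xi_z}. \tag{3.3.3}$$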

For general $\lambda(x_t)$, this is not a GPD, and so any attempt to model the exceedances $Y_t - u(x_t) \mid Y_t > u(x_t)$ using a GPD model is likely to result in a poor fit. Suppose that the Box-Cox parameter $\lambda(x_t)$ is equal to 1; in this case equation (3.3.3) simplifies to a GPD with shape parameter $\xi_z$ and scale parameter $\psi_{z,u}\sigma(x_t)$. Now $\sigma(x_t)$ needs to be estimated for both the varying threshold and pre-processing methods. However, the pre-processing method will give the more efficient estimate of $\sigma(x_t)$, as it uses all the data $\{Y_t\}$, not only those $Y_t$ which are exceedances of $u(x_t)$.
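The change of variable can be written out step by step. The lines below are a sketch of the intermediate algebra, using only equations (3.3.1) and (3.3.2), the stationary GPD assumption for the extremes of $\{Z_t\}$, and the fact that the Box-Cox transform is increasing, so that $Y_t > u(x_t)$ is equivalent to $Z_t > u_z$.

```latex
\begin{align*}
Z_t - u_z
  &= \frac{1}{\sigma(x_t)}\left[\frac{Y_t^{\lambda(x_t)} - 1}{\lambda(x_t)} - \mu(x_t)\right]
   - \frac{1}{\sigma(x_t)}\left[\frac{u(x_t)^{\lambda(x_t)} - 1}{\lambda(x_t)} - \mu(x_t)\right]
   = \frac{Y_t^{\lambda(x_t)} - u(x_t)^{\lambda(x_t)}}{\lambda(x_t)\,\sigma(x_t)}, \\[4pt]
\Pr(Y_t \ge y + u(x_t) \mid Y_t > u(x_t), X_t = x_t)
  &= \Pr\left(Z_t - u_z \ge
       \frac{[y + u(x_t)]^{\lambda(x_t)} - u(x_t)^{\lambda(x_t)}}{\lambda(x_t)\,\sigma(x_t)}
       \,\middle|\, Z_t > u_z, X_t = x_t\right) \\
  &= \left[1 + \xi_z
       \frac{[y + u(x_t)]^{\lambda(x_t)} - u(x_t)^{\lambda(x_t)}}
            {\psi_{z,u}\,\lambda(x_t)\,\sigma(x_t)}\right]_+^{-1/\xi_z}, \\[4pt]
\text{and for } \lambda(x_t) = 1: \quad
\Pr(Y_t \ge y + u(x_t) \mid Y_t > u(x_t), X_t = x_t)
  &= \left[1 + \xi_z \frac{y}{\psi_{z,u}\,\sigma(x_t)}\right]_+^{-1/\xi_z}.
\end{align*}
```

The last line is the survivor function of a GPD with shape $\xi_z$ and scale $\psi_{z,u}\sigma(x_t)$, so even in this simplest case the scale parameter on the original scale inherits the covariate dependence of $\sigma(x_t)$.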
