Estimators - 9-6-2011 12:00 AM The Analysis of Extreme Synoptic Winds

is defined for all $x$ and $\theta = [\mu\ \sigma]^T$. Several methods and estimators exist for fitting the unknown parameters to a given dataset and are covered in Section 4.2. The equations defining additional statistical properties such as the probability density function of the GEVD are provided in Appendix B.1.

4.2 Estimators

Several classical estimators exist which estimate the unknown parameters of a distribution, either graphically or numerically, for a specific dataset. An estimator is a function of the observed data which is utilised to estimate the unknown parameters, or estimands. Graphical or least-squares methods have historically been preferred over numerical methods in the field of wind engineering. One of the debates when using these methods is the selection of an appropriate plotting position. The original plotting position given by Gumbel (1958) as

$$F(x_i) = \frac{i}{N+1} \tag{4.3}$$

has been considered biased and is often replaced by the plotting position given by Gringorten (1963) as

$$F(x_i) = \frac{i - 0.44}{N + 0.12}. \tag{4.4}$$
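The two plotting positions can be compared numerically. The following Python sketch assumes that $i$ is the rank of observation $x_i$ when the $N$ observations are sorted in ascending order; the function names and the sample values are hypothetical and serve only as an illustration.

```python
def gumbel_position(i, n):
    """Equation 4.3: F(x_i) = i / (N + 1), for rank i = 1..N."""
    return i / (n + 1)


def gringorten_position(i, n):
    """Equation 4.4: F(x_i) = (i - 0.44) / (N + 0.12), for rank i = 1..N."""
    return (i - 0.44) / (n + 0.12)


def plotting_positions(sample, position):
    """Pair each observation, sorted ascending, with its plotting position."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(x, position(i, n)) for i, x in enumerate(ordered, start=1)]


# Hypothetical annual maximum wind speeds (m/s), for illustration only.
speeds = [21.3, 24.8, 19.6, 27.1, 22.9]
gumbel = plotting_positions(speeds, gumbel_position)
gringorten = plotting_positions(speeds, gringorten_position)
for (x, f_gumbel), (_, f_gringorten) in zip(gumbel, gringorten):
    print(f"{x:5.1f}  {f_gumbel:.4f}  {f_gringorten:.4f}")
```

The two formulas differ most at the extremes of the ordered sample, which is where the choice of plotting position has the greatest effect on a graphical fit.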

The debate over the appropriate plotting position has recently been revisited and is ongoing; see Makkonen (2006), Makkonen (2008), and Cook (2011). Overall, the methods typically applied by the wind engineering community to calculate the fitted parameters of the extreme value distribution are outdated when compared with the efficient numerical estimators utilised by statisticians. The plotting position debate is extraneous given the statistical techniques which are available to solve directly for the estimands (de Haan, 2007). In the field of extreme value statistics, the method of moments (MoM), maximum likelihood estimators (MLE), and probability weighted moments (PWM) are established estimators. Despite the general disregard for such methods by the wind engineering community, in the statistical modelling of extremes these methods are considered classical relative to current research. More recent methods include optimal bias-robust estimators (OBRE) and Bayesian methods.

Parameter estimation using MoM is carried out by equating the population moments (e.g. mean, variance) to the sample moments and solving for the parameters. The estimator is easily biased, as calculation of the sample mean can be sensitive to outliers for small sample sizes. An alternative to MoM which is less sensitive to outliers is PWM. PWM belong to the family of L-estimates introduced by Greenwood et al. (1979) and further developed by Landwehr et al. (1979) and Hosking et al. (1985). L-estimators tend to be less sensitive to outliers than other classical estimators as they are calculated from linear combinations of the ordered data (Hosking, 1990), which is also what distinguishes them from conventional moments. Hosking et al. (1985) show that PWM have reduced bias, which often provides a better fit to observed data than MLE. In contrast, Dupuis and Field (1998b) found that PWM can be biased by a single large event and therefore suggest the use of OBRE. OBRE are a robust extension of MLE which produce parameter estimates similar to those of PWM and provide additional information describing the quality of the fit to each observation.
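To make the difference between MoM and PWM concrete, the following Python sketch fits both estimators under the assumption of a Gumbel (Type I) model with $\theta = [\mu\ \sigma]^T$. The relations used (mean $\mu + \gamma\sigma$, variance $\pi^2\sigma^2/6$, and second L-moment $\sigma \ln 2$) are the standard Gumbel results and are not taken from this section; the function names and data are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_mom(sample):
    """Method of moments for an assumed Gumbel model.

    Matches the sample mean and variance to mu + gamma * sigma and
    (pi * sigma)**2 / 6 respectively.
    """
    mean = statistics.mean(sample)
    std = statistics.stdev(sample)
    sigma = std * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * sigma
    return mu, sigma


def gumbel_pwm(sample):
    """Probability weighted moments for an assumed Gumbel model.

    b0 and b1 are linear combinations of the ordered data; the second
    L-moment 2 * b1 - b0 is matched to sigma * ln(2).
    """
    ordered = sorted(sample)
    n = len(ordered)
    b0 = sum(ordered) / n
    # Weight (rank - 1) / (n - 1) on each ordered value; k equals rank - 1.
    b1 = sum(k * x for k, x in enumerate(ordered)) / (n * (n - 1))
    sigma = (2.0 * b1 - b0) / math.log(2.0)
    mu = b0 - EULER_GAMMA * sigma
    return mu, sigma


# Hypothetical annual maxima (m/s); the last value plays the role of an outlier.
maxima = [21.3, 24.8, 19.6, 27.1, 22.9, 25.4, 20.7, 23.5, 22.1, 41.0]
print("MoM:", gumbel_mom(maxima))
print("PWM:", gumbel_pwm(maxima))
```

Because the MoM scale depends on squared deviations while the PWM scale depends only on linear combinations of the ordered data, rerunning the sketch with and without the last value gives a quick feel for the sensitivity to outliers described above.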

4.2.1 Maximum Likelihood Estimators

MLE were introduced by Fisher (1912, 1922) and were applied to the GEVD by Jenkinson (1969). Huber (1964) generalised the MLE to a class of estimators called M-estimators, which provide the basis of the OBRE discussed in Section 4.2.2. The formulation of the MLE is summarised here in the context of M-estimators, the general form of which is given by Huber (1964) as

$$\min_{\theta} \sum_{i=1}^{n} \rho(x_i, \theta) \tag{4.5}$$

where $\rho$ is an appropriate function. An estimate of the parameters minimising Equation 4.5 is calculated by setting the derivative of $\rho$, expressed as

$$\psi(x;\theta) = \frac{\partial}{\partial\theta}\rho(x,\theta), \tag{4.6}$$

equal to zero and solving the resulting implicit equation

$$\sum_{i=1}^{n} \psi(x_i;\theta) = 0. \tag{4.7}$$
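As a minimal worked illustration of this general form (an added example, not part of the original derivation), consider the squared-error choice of $\rho$ for a scalar location parameter $\theta$. Differentiating and setting the sum of the derivatives to zero gives:

```latex
\begin{align*}
\rho(x,\theta) &= \tfrac{1}{2}(x-\theta)^2, \\
\psi(x;\theta) &= \frac{\partial}{\partial\theta}\rho(x,\theta) = -(x-\theta), \\
\sum_{i=1}^{n}\psi(x_i;\theta) = 0
  \;&\Longrightarrow\; \hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\end{align*}
```

The least-squares choice of $\rho$ therefore recovers the sample mean, and other choices of $\rho$ lead to other members of the M-estimator class.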

In the case of the MLE, the general form of the M-estimators is rewritten as a maximum by taking the negative value of the function $\rho$, giving

$$\max_{\theta} \sum_{i=1}^{n} -\rho(x_i,\theta). \tag{4.8}$$

The parameters which will maximise the likelihood function, defined as

$$L(\theta; x) = f_\theta(x_1, \ldots, x_n \mid \theta) \tag{4.9}$$

$$\phantom{L(\theta; x)} = \prod_{i=1}^{n} f_\theta(x_i) \tag{4.10}$$

are then sought. Equation 4.9 can be written in the form given by Equation 4.10 provided the observations $x_1, \ldots, x_n$ are independent. By taking the logarithm, a monotonic transformation, of Equation 4.10 the log-likelihood is written as

$$\log L(\theta; x) = \sum_{i=1}^{n} \log f_\theta(x_i). \tag{4.11}$$
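The log-likelihood can be maximised numerically. The following Python sketch does so under the assumption of a Gumbel density $f_\theta(x) = \sigma^{-1}\exp(-z - e^{-z})$ with $z = (x-\mu)/\sigma$, which is an illustrative choice and not the GEVD form defined earlier in the document. Setting the derivatives of the log-likelihood to zero reduces the problem to a single equation in $\sigma$, solved here by bisection; the function names and data are hypothetical.

```python
import math


def gumbel_log_likelihood(sample, mu, sigma):
    """Sum of log f_theta(x_i) for the assumed Gumbel density."""
    total = 0.0
    for x in sample:
        z = (x - mu) / sigma
        total += -math.log(sigma) - z - math.exp(-z)
    return total


def gumbel_mle(sample, iterations=100):
    """Solve the Gumbel likelihood equations by bisection on sigma.

    Assumes the sample contains at least two distinct values.
    """
    n = len(sample)
    mean = sum(sample) / n
    x_min = min(sample)
    spread = max(sample) - x_min

    def weights(sigma):
        # Shifting by x_min keeps every exponent <= 0 and avoids overflow.
        return [math.exp(-(x - x_min) / sigma) for x in sample]

    def sigma_equation(sigma):
        # Zero of: sigma - mean + sum(x * w) / sum(w), with w = exp(-x / sigma).
        w = weights(sigma)
        weighted_mean = sum(wi * x for wi, x in zip(w, sample)) / sum(w)
        return sigma - mean + weighted_mean

    lo, hi = 1e-6 * spread, 10.0 * spread  # sign change is bracketed
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sigma_equation(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    mu = x_min - sigma * math.log(sum(weights(sigma)) / n)
    return mu, sigma


# Hypothetical annual maxima (m/s), for illustration only.
maxima = [21.3, 24.8, 19.6, 27.1, 22.9, 25.4, 20.7, 23.5, 30.2, 22.1]
mu_hat, sigma_hat = gumbel_mle(maxima)
print(mu_hat, sigma_hat, gumbel_log_likelihood(maxima, mu_hat, sigma_hat))
```

Bisection is used only because it needs no derivative information and cannot diverge once the root is bracketed; a Newton or quasi-Newton optimiser applied to the log-likelihood would serve equally well.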

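Thus, the function described in the general form of the M-estimators is equal to

$$\rho(x;\theta) = -\log f_\theta(x), \tag{4.12}$$

and its derivative, defined by Equation 4.6, equals, up to a change of sign which leaves the roots of the estimating equation unchanged,

$$\psi(x_i;\theta) = \frac{\partial}{\partial\theta}\log f_\theta(x_i), \tag{4.13}$$

which is commonly referred to as the maximum likelihood scores function, $s(x_i;\theta)$ (Hampel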

4.2.2 Optimal Bias-Robust Estimators

The most significant shortcoming of the classical estimators is their lack of robustness. Depending on the number of observations, small deviations from the underlying model can greatly affect the estimates if the influence function (IF) is unbounded (Dupuis and Field, 1998b). When the IF of an estimator is bounded, small contaminations in the data do not greatly affect its outcome. The influence function in its general form is provided in Appendix B.2.1. To mitigate the influence of deviations from the assumed model, robust estimators are based on the data that are well fit by the model. Observations not well fit by the assumed model are therefore weighted lower than those fit well by the model. OBRE have been successfully applied to environmental extremes such as temperature (Dupuis and Field, 1998b) and wind measurements obtained from buoys moored in the Pacific Ocean (Dupuis and Field, 2004).

The M-estimators discussed in Section 4.2.1 form a starting point for OBRE. The IF of MLE is unbounded as a result of the score function, given by Equation 4.13, being unbounded in $x$ (Dupuis and Field, 1998b). An overview of the IF for MLE is provided in Appendix B.2.2. To construct a bounded influence function for the MLE, a bounded version of Equation 4.6 is required which is as similar as possible to Equation 4.13, the maximum likelihood scores function. To bound the influence of observations not well fit by the model, the Huber function forms the basis of a weighting function. The Huber function maps values of $z$ whose norm exceeds the bound $c$ to the nearest value on that bound ($z \mapsto h_c(z)$), thus reducing the influence of the furthest values (Hampel et al., 1986). The multidimensional Huber function is given by

$$h_c(z) = z\,W_c(z) = z \min\left\{1, \frac{c}{\|z\|}\right\} \tag{4.14}$$

where $W_c$ is the weighting function, $c$ is the robustness constant and $\|\cdot\|$ denotes the Euclidean norm. When the robustness constant in Equation 4.14 equals infinity, the MLE is recovered since $W_c(x,\theta) = 1$ for all observations and parameters. The complete derivation of the estimator is provided by Hampel et al. (1986) and Dupuis and Field (1998b), while the resulting bounded estimator and associated algorithm for the OBRE procedure are provided in Appendix B.3. The OBRE algorithm provides estimates of the fitted parameters and the weight applied to each observation.
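The behaviour of the multidimensional Huber function can be illustrated with a short Python sketch. It covers only the weighting step of Equation 4.14, not the full OBRE algorithm of Appendix B.3; the function names and the example vectors are hypothetical, and a weight of 1 for the zero vector is an assumption made to avoid division by zero.

```python
import math


def huber_weight(z, c):
    """W_c(z) = min(1, c / ||z||) with the Euclidean norm.

    The zero vector is assigned weight 1 (an assumption of this sketch).
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return 1.0
    return min(1.0, c / norm)


def huber(z, c):
    """h_c(z) = z * W_c(z): vectors with norm above c are shrunk onto radius c."""
    w = huber_weight(z, c)
    return [w * v for v in z]


# Hypothetical two-parameter score vectors, for illustration only.
well_fit = [0.3, 0.4]   # norm 0.5, inside the bound
poorly_fit = [3.0, 4.0]  # norm 5.0, outside the bound
c = 2.5
print(huber_weight(well_fit, c), huber(well_fit, c))
print(huber_weight(poorly_fit, c), huber(poorly_fit, c))
print(huber_weight(poorly_fit, math.inf))  # no bounding: every weight is 1
```

Vectors inside the bound pass through unchanged, while those outside keep their direction but are rescaled to norm $c$, which is how poorly fitted observations receive a weight below 1.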
