Schatten van de extreme value index voor afgeronde data (Engelse titel: Estimation of the extreme value index for imprecise data)


Technische Universiteit Delft

Faculty of Electrical Engineering, Mathematics and Computer Science
Delft Institute of Applied Mathematics

Report submitted to the Delft Institute of Applied Mathematics in partial fulfilment of the requirements for the degree of

BACHELOR OF SCIENCE in APPLIED MATHEMATICS

by

Jasper Jonathan Velthoen
Delft, the Netherlands

November 2014


BSc report APPLIED MATHEMATICS

"Estimation of the extreme value index for imprecise data" (Dutch title: "Schatten van de extreme value index voor afgeronde data")

Jasper Jonathan Velthoen

Technische Universiteit Delft

Supervisor

Dr. J. Cai

Other committee members

Dr. D.C. Gijswijt
Dr.ir. F.H. van der Meulen
Dr.ir. M. Keijzer


Notations

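$GPD(\mu, \sigma, \gamma)$: the generalized Pareto distribution with location $\mu$, scale $\sigma$ and shape $\gamma$

$P(\sigma, \gamma)$: the Pareto distribution with scale $\sigma$ and shape $\gamma$

$[\,\cdot\,]_n$: rounding to $n$ decimals

$\lceil\,\cdot\,\rceil_n$: taking the ceiling, keeping the first $n$ decimals

$\lfloor\,\cdot\,\rfloor_n$: taking the floor, keeping the first $n$ decimals

$U = \left(\frac{1}{1-F}\right)^{\leftarrow}$: the left-continuous inverse of $\frac{1}{1-F}$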


Chapter 1

Introduction

In this bachelor thesis we focus on estimating the extreme value index. This parameter is studied extensively in extreme value theory, the part of statistics concerned with extreme events, i.e. events that we observe only a few times or perhaps not at all. It differs from the more familiar parts of statistics, which are concerned with regular events, i.e. events that are observed frequently. To illustrate this difference, let us consider the experiment of throwing a fair coin.

Probability mass function of a fair coin:

\[ p(x) = \begin{cases} \frac{1}{2} & x = \text{heads} \\ \frac{1}{2} & x = \text{tails} \end{cases} \]

Say we do not know the probability of throwing heads. In that case we can try to estimate it by throwing the coin several times. Throwing the coin 1000 times may result in 512 heads. The estimated probability is then $512/1000 = 0.512$, which is fairly close to 0.5. The estimate is close because the event, throwing heads, is very likely to happen. Let us now consider an unfair coin.

Probability mass function of an unfair coin:

\[ p(x) = \begin{cases} \frac{1}{10000} & x = \text{heads} \\ \frac{9999}{10000} & x = \text{tails} \end{cases} \]

When performing the same experiment of throwing the coin 1000 times, it is quite likely that we do not observe a single throw resulting in heads, which leads to an estimated probability of zero. On the other hand, if we do observe one throw resulting in heads, the estimated probability is 1 in 1000, which is ten times the true value. The method used for the fair coin is therefore no longer applicable to rare events.
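The claim about the unfair coin can be checked with a few lines of arithmetic. The sketch below is only an illustration; the variable names are chosen here and do not come from the thesis.

```python
# Unfair coin from the example: P(heads) = 1/10000, thrown 1000 times.
p_heads = 1 / 10000
n_throws = 1000

# Probability that not a single throw results in heads.
p_no_heads = (1 - p_heads) ** n_throws
print(f"P(no heads in {n_throws} throws) = {p_no_heads:.3f}")

# If exactly one head is observed, the estimate is 1/n_throws.
# Ratio of that estimate to the true probability:
overestimate_factor = (1 / n_throws) / p_heads
print(f"one observed head overestimates by a factor of about {overestimate_factor:.0f}")
```

So in roughly nine out of ten repetitions of the experiment the estimate would be exactly zero.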

Extreme value theory approaches this kind of problem by looking at the distribution of the sample maxima and minima instead of sample means. Recall that the tails of a probability density function have to converge to zero. Extreme value theory estimates the speed of this convergence and extrapolates it. In this way the tail distribution can be estimated, and thereby also the probability of rare events. The extreme value index is a parameter of the generalized Pareto distribution, i.e. the distribution of the tail. Its value determines the speed of convergence and with that the shape of the tail.

The applications of extreme value theory are very broad. For example in insurance: to adequately insure against earthquakes, it is good to know the probability of them happening, especially the large ones. In risk analysis it is likewise used to estimate the probability of a huge loss.

When estimating, problems often arise from the data itself. In this thesis we focus on estimating the extreme value index for the tail distribution of the magnitudes of earthquakes in Greece. The problem with earthquake data is that the magnitude measurements are only available rounded to one decimal. Rounded data is a serious problem when estimating the extreme value index; the reason lies in the way in which the extreme value index is estimated.

The problem of rounded data is discussed in [7] and indeed occurs frequently. An example is measurement error, which results in rounded or imprecise data.

In this thesis we begin in Chapter 2 by introducing extreme value theory. Next, in Chapter 3, we introduce some important estimators for the extreme value index. In Chapter 4 we state in detail the problem with rounded data. We then define a new estimator and show some simulation results in Chapters 5 and 6. Finally, in Chapter 7, we apply the new estimator to real-life earthquake data from Greece.


Chapter 2

Extreme value theory

2.1 Introduction

The aim of this chapter is to explain some of the basics of extreme value theory and to provide the necessary tools for estimating the extreme value index. The goal is to estimate the tail distribution of the magnitudes of earthquakes in Greece. Let us denote the magnitudes of the earthquakes by $X_1, X_2, \ldots, X_n$. When estimating the tail distribution it is no longer useful to consider the mean of the magnitudes; instead we consider the distribution of their maximum. This means we are no longer interested in $\sum X_i$ but in $\max X_i$. The central limit theorem can then no longer be applied, so we need a new limit relation that concerns the sample maximum.

First consider the maximum of $n$ independent and identically distributed observations as $n$ tends to infinity:

\[ \max(X_1, X_2, \ldots, X_n) \xrightarrow{P} x^*, \quad n \to \infty. \]

Here $x^*$ is the right endpoint of the distribution $F(x) = P(X \le x)$, i.e. $x^* = \sup\{x : F(x) < 1\}$. This means that, with infinitely many samples, the sample maximum approaches $x^*$, where $F(x^*) = 1$. Let us now consider the distribution of this maximum for $n$ observations:

\[ P(\max(X_1, X_2, \ldots, X_n) \le x) = P(X_1 \le x, X_2 \le x, \ldots, X_n \le x) = F^n(x). \]

This follows from the assumption that all $X_i$ are independent. Letting $n \to \infty$ we get the following distribution:

\[ \lim_{n\to\infty} F^n(x) = \begin{cases} 0 & \text{if } x < x^* \\ 1 & \text{if } x \ge x^* \end{cases} \]

Obviously this is a degenerate distribution. It is more interesting to consider a non-degenerate distribution. To achieve this result we will use a normalization.

\[ \frac{\max(X_1, X_2, \ldots, X_n) - b_n}{a_n} \xrightarrow{d} G \iff \lim_{n\to\infty} F^n(a_n x + b_n) = G(x) \tag{2.1} \]

with $G(x)$ a non-degenerate distribution and $a_n > 0$ and $b_n \in \mathbb{R}$ for all $n \in \mathbb{N}$.

The limit relation (2.1) is not easy to work with in practice. Therefore some equivalent relations are derived, which are stated in the theorem below. These relations will be used in Chapter 3 to derive estimators for the extreme value index. The theorem and its proof can be found in Theorem 1.1.6 in [1].


Theorem 2.1.1. For $\gamma \in \mathbb{R}$ the following statements are equivalent:

• There exist real constants $a_n > 0$ and $b_n \in \mathbb{R}$ such that

\[ \lim_{n\to\infty} F^n(a_n x + b_n) = G_\gamma(x) = \exp\left(-(1+\gamma x)^{-1/\gamma}\right) \quad \forall x \text{ with } 1 + \gamma x > 0. \tag{2.2} \]

• With $U$ the left-continuous inverse of $\frac{1}{1-F}$, where $F$ is the distribution function of $X$, there exists a positive function $a(t)$ such that

\[ \lim_{t\to\infty} \frac{U(tx) - U(t)}{a(t)} = \frac{x^\gamma - 1}{\gamma}, \tag{2.3} \]

with the right-hand side equal to $\log x$ when $\gamma = 0$.

• There exists a positive non-decreasing function $f$ such that

\[ \lim_{t\uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)} = (1+\gamma x)^{-1/\gamma} \quad \forall x \text{ with } 1 + \gamma x > 0, \tag{2.4} \]

where $x^* = \sup\{x : F(x) < 1\}$.

The previous theorem gives equivalent formulations of the limit relation (2.1) for all $\gamma \in \mathbb{R}$. In the theorem below we simplify (2.3) and (2.4) for $\gamma > 0$. This result and its proof can be found in Corollary 1.2.10 in [1].

Theorem 2.1.2.

• For $\gamma > 0$, (2.3) is equivalent to

\[ \lim_{t\to\infty} \frac{U(tx)}{U(t)} = x^\gamma \quad \text{for } x > 0. \tag{2.5} \]

• For $\gamma > 0$, (2.4) is equivalent to

\[ \lim_{t\to\infty} \frac{\int_t^\infty (1 - F(x)) \frac{dx}{x}}{1 - F(t)} = \gamma. \tag{2.6} \]

We have now derived a limit relation for the maximum of a sample. In the following section we show the three possible types of non-degenerate distributions $G(x)$ that can appear in the limit. We also state a sufficient condition on the distribution function $F(x)$ such that the limit relation (2.1) holds for a given $G(x)$.

2.2 Limiting Distributions

The non-degenerate limiting distributions $G(x)$ form the class of extreme value distributions [4], [5].

Theorem 2.2.1. The class of extreme value distributions is given by

\[ G_\gamma(x) = \exp\left(-(1+\gamma x)^{-1/\gamma}\right), \quad 1 + \gamma x > 0, \tag{2.7} \]

with $\gamma \in \mathbb{R}$ called the extreme value index; for $\gamma = 0$, $G_\gamma$ is given by $\exp(-e^{-x})$.

This theorem gives us the explicit form of all the non-degenerate distributions $G(x)$ that satisfy the limit relation (2.1). To go a bit further, we can divide this general class of distributions into three subclasses, i.e. $\gamma > 0$, $\gamma = 0$ and $\gamma < 0$ respectively. We discuss every case separately and give some properties of these distributions. For these three cases see [1, p. 9, 10].

2.2.1 The case γ > 0

In this case we can rewrite the class of extreme value distributions. We look at $G_\gamma\left(\frac{x-1}{\gamma}\right)$; note that the condition $1 + \gamma x > 0$ on this distribution then becomes $x > 0$.

\[ G_\gamma\left(\frac{x-1}{\gamma}\right) = \exp\left(-x^{-1/\gamma}\right) = \exp\left(-x^{-\alpha}\right) \]

with $\alpha = 1/\gamma > 0$, i.e. the tail index is greater than 0. This is the Fréchet class of distributions. Consider now the right endpoint of the Fréchet class, i.e. $x^* = \sup\{x : G_\gamma(x) < 1\}$. It is clear that the endpoint of this class of distributions is infinity. The distributions in this class are heavy tailed.

2.2.2 The case γ = 0

For this case too we have a reformulation of the distribution given in Theorem 2.2.1, now with $\gamma = 0$:

\[ G_\gamma(x) = \exp(-e^{-x}). \]

This class is called the Gumbel distribution and is defined for all $x \in \mathbb{R}$. Notice that, in the same way as in the case $\gamma > 0$, the right endpoint is infinity. In contrast to the previous case, however, this class is light tailed.

2.2.3 The case γ <0

In this case we can find an alternative parametrization with $G_\gamma\left(-\frac{1+x}{\gamma}\right)$:

\[ G_\gamma\left(-\frac{1+x}{\gamma}\right) = \exp\left(-(-x)^{\alpha}\right), \quad x < 0, \]

with $\alpha = -1/\gamma > 0$. Looking at the original condition $1 + \gamma x > 0$, we find $x < -\frac{1}{\gamma}$. So the right endpoint is finite and therefore we have a short-tailed distribution.

We have now distinguished three different classes of distributions, each describing a different tail behaviour. It remains to give a sufficient condition on the distribution function $F$ under which the limit relation holds for one of the three cases. In other words, we state a condition on the original distribution function $F$ such that the limit relation (2.1) holds for $G_\gamma$ as defined above.


2.3 A condition on the sampling distribution function

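When the limit relation holds for a certain $F$ and a certain $\gamma \in \mathbb{R}$, i.e. $\lim_{n\to\infty} F^n(a_n x + b_n) = G_\gamma(x)$, we say that $F$ is in the domain of attraction of $G_\gamma$. This is commonly denoted by $F \in D(G_\gamma)$. The next theorem, which can be found in [1], states the condition we want.

Theorem 2.3.1. $F \in D(G_\gamma)$, with $G_\gamma$ the generalized extreme value distribution, if and only if

1. for $\gamma > 0$: $F$ has an infinite right endpoint and

\[ \lim_{t\to\infty} \frac{1 - F(tx)}{1 - F(t)} = x^{-1/\gamma}, \quad \forall x > 0; \tag{2.8} \]

2. for $\gamma < 0$: $F$ has a finite right endpoint $x^*$ and

\[ \lim_{t\downarrow 0} \frac{1 - F(x^* - tx)}{1 - F(x^* - t)} = x^{-1/\gamma}, \quad \forall x > 0; \tag{2.9} \]

3. for $\gamma = 0$: $F$ has a finite or infinite right endpoint $x^*$ and, for a suitable positive function $f$,

\[ \lim_{t\uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)} = e^{-x}, \quad \forall x \in \mathbb{R}. \tag{2.10} \]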


Chapter 3

Estimators for the extreme value index

3.1 Introduction

In the previous chapter we have seen the basics of extreme value theory. When looking at extreme events, we look at the distribution of a sample maximum, the so-called class of extreme value distributions [4], [5],

\[ G_\gamma(x) = \exp\left(-(1+\gamma x)^{-1/\gamma}\right), \quad 1 + \gamma x > 0. \tag{3.1} \]

The parameter $\gamma$ is called the extreme value index. In this chapter we look at a few different estimators for the extreme value index. We start with the best known and most widely used estimator, the Hill estimator. Its disadvantage is that it can only be applied when $\gamma > 0$, i.e. for infinite tails, see Chapter 2. Therefore we also discuss estimators that work for negative $\gamma$ as well, such as the moment estimator and the maximum likelihood estimator.

3.2 The Hill estimator

The following derivation of the Hill estimator is due to Hill [6]. By (2.6) in Theorem 2.1.2, we have for $\gamma > 0$

\[ \lim_{t\to\infty} \frac{\int_t^\infty (1 - F(x)) \frac{dx}{x}}{1 - F(t)} = \gamma. \tag{3.2} \]

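We rewrite the numerator of equation (3.2) using integration by parts:

\[ \begin{aligned} \int_t^\infty (1 - F(x)) \frac{dx}{x} &= \Big[ \log x \, (1 - F(x)) \Big]_t^\infty + \int_t^\infty \log x \, dF(x) \\ &= -\log t \left( 1 - \int_{-\infty}^t dF(x) \right) + \int_t^\infty \log x \, dF(x) \\ &= \int_t^\infty (\log x - \log t) \, dF(x). \end{aligned} \]

Hence we have

\[ \lim_{t\to\infty} \frac{\int_t^\infty (\log x - \log t) \, dF(x)}{1 - F(t)} = \gamma. \tag{3.3} \]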

To extract an estimator from this result we substitute the order statistic $X_{n-k,n}$ for $t$. Intuitively this can be considered the beginning of the tail: since estimating the extreme value index means estimating the tail distribution, we only need data from the tail. For the distribution function $F$ we use the empirical distribution function $F_n$. This gives the following result,

\[ \hat{\gamma}_H := \frac{\int_{X_{n-k,n}}^\infty (\log x - \log X_{n-k,n}) \, dF_n(x)}{1 - F_n(X_{n-k,n})} = \frac{n}{k} \cdot \frac{1}{n} \sum_{i=1}^k \left( \log X_{n-i+1,n} - \log X_{n-k,n} \right) = \frac{1}{k} \sum_{i=1}^k \left( \log X_{n-i+1,n} - \log X_{n-k,n} \right). \]

This leads to the following definition.

Definition 3.2.1. The estimator $\hat{\gamma}_H$ is called the Hill estimator and is defined by

\[ \hat{\gamma}_H = \frac{1}{k} \sum_{i=1}^k \left( \log X_{n-i+1,n} - \log X_{n-k,n} \right), \tag{3.4} \]

where $X_{i,n}$ is the $i$th order statistic of an i.i.d. random sample.
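As an illustration, the Hill estimator (3.4) can be written down in a few lines of Python. This is a minimal sketch: the function name `hill_estimator`, the seed, the sample size and the choice $k = 500$ are assumptions made here for the example, and the test data are simulated from a Pareto distribution with $P(X > x) = x^{-1/\gamma}$, which is not a data set used in the thesis.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate based on the k largest values of a positive sample."""
    x = sorted(sample)                      # x[j - 1] is the order statistic X_{j,n}
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(x[n - k - 1])  # log X_{n-k,n}
    # Average of log X_{n-i+1,n} - log X_{n-k,n} over i = 1, ..., k.
    return sum(math.log(x[n - i]) - log_threshold for i in range(1, k + 1)) / k


# Simulated Pareto data with P(X > x) = x^(-1/gamma) for x >= 1 (inverse transform).
gamma_true = 0.5
rng = random.Random(2014)                   # fixed seed, only for reproducibility
sample = [(1.0 - rng.random()) ** (-gamma_true) for _ in range(10_000)]

gamma_hat = hill_estimator(sample, k=500)
print(f"Hill estimate with k = 500: {gamma_hat:.3f} (true value {gamma_true})")
```

Sorting the sample once keeps the indexing close to the notation of the definition, at the cost of $O(n \log n)$ work per call.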

To be able to use the Hill estimator we have to prove its consistency, i.e. that it converges to the extreme value index. It is also essential to show that its error is asymptotically normal; the asymptotic normality indicates the speed of convergence of the estimator. Below we state the consistency theorem; for the theorem concerning asymptotic normality we refer to [1]. Let us now consider $n$ observations $X_1, \ldots, X_n$. We do not want to use all the data to estimate the tail index, as already noted in the derivation of the Hill estimator; we want to limit ourselves to the tail data. This raises a new question: which data belong to the tail? We therefore consider the order statistic $X_{n-k,n}$ as the beginning of the tail. Here $k$ is unknown, but we can impose a condition on it: $k = k(n)$, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. In Chapter 4 it is explained how to cope with not knowing this $k$.

Theorem 3.2.2. Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with distribution function $F$. Suppose $F \in D(G_\gamma)$ with $\gamma > 0$. Then

\[ \hat{\gamma}_H \xrightarrow{P} \gamma \]

if $k = k(n)$, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$.

The complete proof will not be given here, but let us give the idea of the proof. The first thing to show is that if $X \sim F$ and $Y$ has distribution function $H(y) = P(Y \le y) = 1 - \frac{1}{y}$, $y \ge 1$, then $X \overset{d}{=} U(Y)$, where $U$ was defined earlier as the left-continuous inverse of $\frac{1}{1-F}$. This is true because

\[ P(U(Y) \le x) = P(Y \le U^{\leftarrow}(x)) = 1 - \frac{1}{U^{\leftarrow}(x)} = 1 - \frac{1}{\frac{1}{1 - F(x)}} = F(x) = P(X \le x). \]

Because $X$ has the same distribution as $U(Y)$, it is possible to say something about the relation between $X$ and $Y$ in the limit $n \to \infty$, using (2.5) in Theorem 2.1.2:

\[ \lim_{n\to\infty} \frac{X_{n-i+1,n}}{X_{n-k,n}} \overset{d}{=} \lim_{n\to\infty} \frac{U(Y_{n-i+1,n})}{U(Y_{n-k,n})} \overset{d}{=} \left( \frac{Y_{n-i+1,n}}{Y_{n-k,n}} \right)^{\gamma}. \]

This means that for $n$ large enough

\[ \frac{X_{n-i+1,n}}{X_{n-k,n}} \overset{d}{\approx} \left( \frac{Y_{n-i+1,n}}{Y_{n-k,n}} \right)^{\gamma}. \]

Let us now look at what this means for the Hill estimator.

\[ \hat{\gamma}_H = \frac{1}{k} \sum_{i=1}^k \log \frac{X_{n-i+1,n}}{X_{n-k,n}} \overset{d}{=} \frac{1}{k} \sum_{i=1}^k \log \frac{U(Y_{n-i+1,n})}{U(Y_{n-k,n})} \overset{d}{\approx} \frac{1}{k} \sum_{i=1}^k \log \left( \frac{Y_{n-i+1,n}}{Y_{n-k,n}} \right)^{\gamma} = \gamma \cdot \frac{1}{k} \sum_{i=1}^k \log \frac{Y_{n-i+1,n}}{Y_{n-k,n}} \]

The only thing left to show is that $\frac{1}{k} \sum_{i=1}^k \log \frac{Y_{n-i+1,n}}{Y_{n-k,n}}$ converges in probability to 1. This can be shown by noticing that $\log Y_i$ has a standard exponential distribution and then proving that this sum is asymptotically normal. We will not show that here, but it is proven in [1].

3.3 The moment estimator

As stated above the Hill estimator has a downside, which is that it can only be used for a positive extreme value index. The estimator in this section is the moment estimator. The moment estimator is derived from the Hill estimator, but it is altered in such a way that it is applicable for negative γ [2]. We will state the moment estimator and its consistency here without the proof.

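Definition 3.3.1. The estimator $\hat{\gamma}_M$ is called the moment estimator and is defined by

\[ \hat{\gamma}_M := M_n^{(1)} + 1 - \frac{1}{2} \left( 1 - \frac{(M_n^{(1)})^2}{M_n^{(2)}} \right)^{-1} \tag{3.5} \]

with

\[ M_n^{(j)} := \frac{1}{k} \sum_{i=1}^k \left( \log X_{n-i+1,n} - \log X_{n-k,n} \right)^j, \tag{3.6} \]

where $X_{i,n}$ is the $i$th order statistic of an i.i.d. random sample.

Suppose $F \in D(G_\gamma)$ with $\gamma \in \mathbb{R}$ and $x^* > 0$. Then

\[ \hat{\gamma}_M \xrightarrow{P} \gamma \]

if $k = k(n)$, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$.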
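The moment estimator (3.5) can be sketched in the same way. The function name `moment_estimator`, the seed and the values of $n$ and $k$ are assumptions made for this example. The test data are uniform on $[1, 2)$, a distribution with a finite right endpoint and extreme value index $\gamma = -1$, chosen here only to show that the estimator also handles negative $\gamma$.

```python
import math
import random


def moment_estimator(sample, k):
    """Moment estimate based on the k largest values of a positive sample."""
    x = sorted(sample)                      # x[j - 1] is the order statistic X_{j,n}
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    log_threshold = math.log(x[n - k - 1])  # log X_{n-k,n}
    log_excess = [math.log(x[n - i]) - log_threshold for i in range(1, k + 1)]
    m1 = sum(log_excess) / k                # M_n^(1), identical to the Hill estimate
    m2 = sum(v * v for v in log_excess) / k # M_n^(2)
    # Note: if all k log-excesses are equal (heavy ties), m1^2 == m2 and the
    # expression below is undefined.
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)


# Uniform data on [1, 2): finite right endpoint, extreme value index -1.
rng = random.Random(2014)                   # fixed seed, only for reproducibility
sample = [1.0 + rng.random() for _ in range(20_000)]

gamma_hat = moment_estimator(sample, k=500)
print(f"moment estimate with k = 500: {gamma_hat:.3f} (true value -1)")
```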


3.4 The maximum likelihood estimator

The final estimator we look at is the maximum likelihood estimator for $\gamma$. The full derivation of this estimator is given in [3]. For the derivation we look at relation (2.4) in Theorem 2.1.1, with $f$ a positive non-decreasing function:

\[ \lim_{t\uparrow x^*} \frac{1 - F(t + x f(t))}{1 - F(t)} = (1+\gamma x)^{-1/\gamma}, \quad 0 < x < (0 \vee (-\gamma))^{-1}. \]

We can also write this equation as

\[ \lim_{t\uparrow x^*} P\left( \frac{X - t}{f(t)} > x \,\Big|\, X > t \right) = (1+\gamma x)^{-1/\gamma}. \tag{3.7} \]

The right-hand side of this relation is the tail of a distribution that we will use often in the following chapters; therefore we define it below.

Definition 3.4.1. The distribution given below is called the generalized Pareto distribution:

\[ P(X \le x) = 1 - \left( 1 + \frac{\gamma x}{\sigma} \right)^{-1/\gamma}, \quad 0 < x < (0 \vee (-\gamma))^{-1}. \]

Relation (3.7) tells us that the large observations are approximately distributed according to the generalized Pareto distribution. Say we have the observations $X_1, \ldots, X_n$. The large observations are then $X_{n-k,n}, \ldots, X_{n,n}$. We write $Y_0, Y_1, \ldots, Y_k$ for $X_{n-k,n}, X_{n-k+1,n} - X_{n-k,n}, \ldots, X_{n,n} - X_{n-k,n}$ respectively. As $n \to \infty$ the exceedances $Y_1, \ldots, Y_k$ are distributed according to the generalized Pareto distribution, under the condition that $k = k(n)$, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. As a result a maximum likelihood estimator can be constructed from these observations $Y_i$.

Definition 3.4.2. The maximum likelihood estimators for the shape and scale are defined as the values $\hat{\gamma}, \hat{\sigma}$ that solve the following likelihood equations:

\[ \frac{1}{k} \sum_{i=1}^k \log\left( 1 + \frac{\gamma}{\sigma} (X_{n-i+1,n} - X_{n-k,n}) \right) = \gamma, \]

\[ \frac{1}{k} \sum_{i=1}^k \frac{1}{1 + \frac{\gamma}{\sigma} (X_{n-i+1,n} - X_{n-k,n})} = \frac{1}{\gamma + 1}. \]

This maximum likelihood estimator behaves irregularly for $\gamma < -1$; it is consistent for $\gamma \in (-1, \infty)$ [9].

Suppose $F \in D(G_\gamma)$ with $\gamma \in (-1, \infty)$ and $x^* > 0$. Then

\[ \hat{\gamma}_{MLE} \xrightarrow{P} \gamma. \]


Chapter 4

The problem with rounded data

4.1 Introduction

We have now laid the groundwork for the problem at hand, i.e. estimation with imprecise data, in our case rounded data. First let us look at what we encounter when applying the existing estimators from Chapter 3 directly to rounded data. In this chapter we first estimate the extreme value index, using the moment estimator, for simulated data that is not rounded. Knowing what the estimates should look like, we then do the same with data rounded to 0 up to 3 decimals.

4.2 Estimation with non-rounded data

To give an example of how estimation with non-rounded data is done, we consider data simulated from the $GPD(0, 1, -0.1)$. This distribution has a finite right endpoint. Recall that we need to choose $k$ before we can estimate: $k$ determines the order statistic $X_{n-k,n}$, i.e. the beginning of the tail, and its value is unknown. The condition on $k$ was $k = k(n)$, $k \to \infty$ and $k/n \to 0$ as $n \to \infty$. This does not tell us how large $k$ should be for a given sample, and therefore we use the Hill plot to visualize the estimation. It shows the estimated values $\hat{\gamma}_k$ plotted against $k$. In Figure 4.1 the Hill plot shows a clear convergence to the tail index, i.e. $-0.1$, when 1000 to 1750 observations from the tail are used. For small $k$ we also see some fluctuations, which are caused by too little data being used in the estimation. In practice, when we do not know the real value of $\gamma$, we estimate it by the average of the estimates over the first stable interval of $k$ in the Hill plot.
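The numbers behind such a plot can be reproduced along the following lines. This is a sketch under assumptions: the helper names `gpd_sample` and `moment_estimator`, the seed and the grid of $k$ values are chosen here, and the sampling uses the inverse of the generalized Pareto distribution function with location and scale, $x = \mu + \sigma\,((1-u)^{-\gamma} - 1)/\gamma$. The output is a short table of $(k, \hat{\gamma}_k)$ pairs rather than a figure.

```python
import math
import random


def gpd_sample(n, mu, sigma, gamma, rng):
    """Inverse-transform sample of size n from GPD(mu, sigma, gamma), gamma != 0."""
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [mu + sigma * ((1.0 - rng.random()) ** (-gamma) - 1.0) / gamma
            for _ in range(n)]


def moment_estimator(sorted_sample, k):
    """Moment estimate from the k largest values of an ascending sorted sample."""
    n = len(sorted_sample)
    log_threshold = math.log(sorted_sample[n - k - 1])        # log X_{n-k,n}
    log_excess = [math.log(sorted_sample[n - i]) - log_threshold
                  for i in range(1, k + 1)]
    m1 = sum(log_excess) / k
    m2 = sum(v * v for v in log_excess) / k
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)


rng = random.Random(1)                                        # fixed seed for reproducibility
data = sorted(gpd_sample(10_000, 0.0, 1.0, -0.1, rng))

# Estimates as a function of k, i.e. the values one would plot against k.
estimates = {k: moment_estimator(data, k) for k in (250, 500, 1000, 1500)}
for k, gamma_hat in estimates.items():
    print(f"k = {k:5d}: estimate = {gamma_hat:+.3f}")
```

The estimates should scatter around the true value $-0.1$; how close they are depends on $k$ and on the seed.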

4.3 Estimation with rounded data

Let us now look at what happens when this data is rounded. We consider four cases: rounding to integers, to one decimal, to two decimals and to three decimals. Figure 4.2 shows that the rounded data give different estimates than the non-rounded data. In all four plots the estimates of the extreme value index oscillate. When rounding to zero decimals the plot tells us nothing at all about the value of the extreme value index, while with rounding to three decimals it already looks a lot like the non-rounded estimates. The problem we now have is the oscillation of the estimator: as a result we cannot give a clear estimate of the extreme value index, although roughly the plots resemble the estimates for the non-rounded data. This motivates developing a method that lets us estimate the extreme value index more precisely.


Figure 4.1: The Hill plot of Moment estimator on data simulated from a GPD(0,1,-0.1) with N = 10000

4.4 The oscillation

Before we find a way to make the data smoother, it is good to explain where the oscillation comes from. For that it is important to know the difference between rounded and non-rounded data. The difference is that in rounded data the data points are often not unique, i.e. by rounding you are likely to observe the same value several times. To show the effect of these ties, let us look again at the Hill estimator from Definition 3.2.1:

\[ \hat{\gamma}_H = \frac{1}{k} \sum_{i=1}^k \left( \log X_{n-i+1,n} - \log X_{n-k,n} \right). \]

The estimate always depends on $k$, and therefore we construct a Hill plot to examine the estimates as a function of $k$. The oscillation comes from the ties in the data. Say we have the observations $(1, 1, 2, 2, 2, 3, 3)$ and we calculate the Hill estimate of $\gamma$ for $k = 3$; this is equal to 0.27. Now let $k$ increase to $k = 4$. Because of the ties, $X_{n-k,n}$ does not change and the newly included term is zero, so only the number of observations increases while the sum stays the same. The new estimate of $\gamma$ is 0.20. However, when $k$ increases such that $X_{n-k,n} < X_{n-k+1,n}$, in our example at $k = 5$, every term in the sum of the Hill estimator increases and thereby the estimate increases as well. In our example we have $X_{n-k,n} = 1 < 2 = X_{n-k+1,n}$. The new estimate shows this increase clearly: $\hat{\gamma} = 0.86$.
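The three values in this example can be verified directly. The sketch below reuses the Hill estimator of Definition 3.2.1; the function name `hill_estimator` is chosen here for illustration.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate based on the k largest values of a positive sample."""
    x = sorted(sample)                      # x[j - 1] is the order statistic X_{j,n}
    n = len(x)
    log_threshold = math.log(x[n - k - 1])  # log X_{n-k,n}
    return sum(math.log(x[n - i]) - log_threshold for i in range(1, k + 1)) / k


data = [1, 1, 2, 2, 2, 3, 3]
for k in (3, 4, 5):
    print(f"k = {k}: Hill estimate = {hill_estimator(data, k):.2f}")
# k = 3: Hill estimate = 0.27
# k = 4: Hill estimate = 0.20
# k = 5: Hill estimate = 0.86
```

For $k = 3$ and $k = 4$ the threshold is the tied value 2, so the sum is $2\log(3/2)$ in both cases and only the divisor changes; at $k = 5$ the threshold drops to 1 and all five terms grow at once.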

With observations that are not rounded, we are far less likely to observe ties in the data. The oscillating behaviour is then less drastic, because the sum in the estimate is adjusted slightly every time $k$ increases, instead of jumping when there is a jump in the observations.


Figure 4.2: The Hill plot of the moment estimator on data simulated from a GPD(0,1,0.1), rounded to 0, 1, 2 and 3 decimals respectively, N = 10000


Chapter 5

Proposal for a new estimator

5.1 Introduction

In this chapter we propose an estimator that performs better on rounded data than the estimators mentioned in Chapter 3. As seen in Chapter 4, the estimates for rounded data oscillate as a result of ties in the data. Our approach is to make the data smoother. We therefore first investigate the rounding error, i.e. $X - [X]$. Although the rounding error is often assumed to be uniformly distributed, this is not always the case. After giving some insight into the distribution of the rounding error, we suggest a new estimator for data that come from a distribution with an infinite right endpoint, i.e. $\gamma > 0$. We also look at observations from a distribution with a finite tail and at the problems that occur there.

5.2 The rounding error

As discussed before in the introduction, the rounding error is often assumed to be uniformly distributed. This is a valid assumption when working with a symmetric or nearly symmetric distribution. However when working with heavily skewed distributions, such as tail distributions, this assumption of a uniformly distributed rounding error is no longer valid. We can demonstrate this by comparing the rounding errors of data simulated from the standard normal distribution and from a generalized Pareto distribution.

5.2.1 The rounding error of the normal distribution

In Figure 5.1 the rounding error of data from the standard normal distribution is shown. The rounding is done on integers. Clearly this approximates a uniform distribution on [−0.5,0.5].


Figure 5.1: The non-parametric density estimate of the rounding differences of standard normally distributed data. Simulated with N=1000000.

5.2.2 The generalized Pareto Distribution

When we look at the generalized Pareto distribution we split it up into four cases, because the distribution looks very different for different values of $\gamma$. We look at the characteristics of the generalized Pareto distribution for the following values of $\gamma$.

1. $\gamma < -1$
2. $\gamma = -1$
3. $\gamma \in (-1, 0)$
4. $\gamma \ge 0$

For every case a histogram of simulated data is shown in Figure 5.2. In the first case the density function is monotonically increasing, and when $\gamma$ becomes very negative the distribution gets skewed to the left. In the second case the density function is that of the standard uniform distribution on $[0, 1]$. The simulation was done from the $GPD(0, 1, -1)$. We can also show this analytically:

\[ 1 - \left( 1 + \gamma \cdot \frac{x - \mu}{\sigma} \right)^{-1/\gamma} = 1 - (1 - x)^{1} = x. \]

This is exactly the distribution function of the uniform distribution on $[0, 1]$. In the third case the density function is monotonically decreasing and for values of $\gamma$ close to zero the distribution is skewed to the right, but the distribution always has a short tail. In the last case it is skewed to the right and has an infinite tail. Moreover, when $\gamma$ increases, the skewness increases.
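The monotonicity claims for the four cases can also be read off from the density. The short derivation below is added as a check and assumes $\mu = 0$, $\sigma = 1$ and $\gamma \neq 0$; the symbol $f_\gamma$ for the density is introduced here only for this purpose.

\[ F_\gamma(x) = 1 - (1 + \gamma x)^{-1/\gamma} \quad\Longrightarrow\quad f_\gamma(x) = F_\gamma'(x) = (1 + \gamma x)^{-1/\gamma - 1}, \]

\[ f_\gamma'(x) = -(1 + \gamma)\,(1 + \gamma x)^{-1/\gamma - 2}. \]

Since $1 + \gamma x > 0$ on the support, the sign of $f_\gamma'$ is the sign of $-(1 + \gamma)$: the density is increasing for $\gamma < -1$, constant for $\gamma = -1$ and decreasing for $\gamma > -1$. For $\gamma = 0$ the density is $e^{-x}$, which is decreasing as well.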

5.2.3 Rounding error for the generalized Pareto distribution

We now return to the distribution of the rounding error when simulating from the generalized Pareto distribution. Because in this thesis we are considering earthquake magnitudes, it is fair to assume that the right tail distribution is skewed to the right. The appropriate extreme value index would therefore lie between $-0.25$ and 1, so we only look at the rounding error distribution of a right-skewed generalized Pareto distribution. To visualize it we take a heavily skewed distribution with $\gamma = 1$. Figure 5.3 shows the rounding error of data from $GPD(0, 1, 1)$ when rounding to zero decimals. This obviously does not look like a uniform distribution, and it even has a jump discontinuity at $x = 0$. Intuitively this is not logical, and we can clear this up by taking floors instead of rounding, as shown in the right plot of Figure 5.3. It is immediately clear that taking floors swaps the part of the error distribution below 0 with the part above 0. From the plot it is clear that the rounding error density is better approximated by a decreasing straight line. This behaviour can be explained by looking at Figure 5.4. From this figure it is clear that the behaviour of the rounding differences comes from the skewness of the distribution. The probability of rounding up, i.e. the grey area, is always greater than the probability of rounding down, i.e. the black area. The reason why the probability of a positive rounding error is greater than that of a negative rounding error is that $x > 0$: all data points rounded to zero have a positive rounding error.

Figure 5.2: Different histograms for the generalized Pareto distribution, respectively simulated from GPD(0,1,-1.5), GPD(0,1,-1), GPD(0,1,0.4), GPD(0,1,0.1). Simulated with N=1000000

Figure 5.3: The non-parametric density estimate of the rounding differences of a GPD(0,1,0.1). On the left the data is rounded on zero decimals, on the right floors are taken of the data. Simulated with N=1000000.

Figure 5.4: The density function of the GPD(0,1,0.1) between 0.5 and 1.5. The grey area is the probability of rounding up to 1, the black area is the probability of rounding down to 1. Simulated with N=1000000.
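The asymmetry of the rounding error can also be checked numerically. The following sketch simulates from $GPD(0, 1, 1)$ by inverse transform, rounds to integers and counts how often the rounding error $X - [X]$ is positive. The seed and sample size are assumptions made here; the reference value $2 - 2\log 2 \approx 0.614$ is obtained by summing $P(m \le X < m + 0.5)$ over $m = 0, 1, 2, \ldots$ for this particular distribution and is not a result taken from the thesis.

```python
import math
import random

rng = random.Random(7)                      # fixed seed, only for reproducibility
n = 200_000

# GPD(0, 1, 1) by inverse transform: X = (1 - U)^(-1) - 1, so P(X > x) = 1 / (1 + x).
data = [1.0 / (1.0 - rng.random()) - 1.0 for _ in range(n)]

# Rounding to integers (half up) and the resulting rounding error X - [X].
round_errors = [x - math.floor(x + 0.5) for x in data]
frac_positive = sum(e > 0 for e in round_errors) / n

# Taking floors instead: the error lies in [0, 1) and small errors dominate.
floor_errors = [x - math.floor(x) for x in data]
frac_small = sum(e < 0.5 for e in floor_errors) / n

theory = 2.0 - 2.0 * math.log(2.0)          # sum over m of P(m <= X < m + 0.5)
print(f"share of positive rounding errors: {frac_positive:.3f}")
print(f"share of floor errors below 0.5:   {frac_small:.3f}")
print(f"value from the GPD(0,1,1) formula: {theory:.3f}")
```

A uniform rounding error would give a share of one half in both cases, so the simulated shares illustrate how far this distribution is from that assumption.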

5.3 An estimator for infinite tails

As seen above, the rounding errors are not uniformly distributed, which means that in order to correctly add the rounding error back to the data we need the exact rounding error distribution. Unfortunately this is not possible, because it requires the sampling distribution of the data and therefore the extreme value index, which is what we want to estimate.

The next question is whether the exact error distribution is needed to correctly estimate our tail index $\gamma$, i.e. does adding a small random variable to our data change its tail distribution? It turns out that adding a small random variable to the data does not change the tail distribution, but only if the tail is infinite, i.e. $\gamma > 0$. This result is proven in the theorem below.

Theorem 5.3.1. Let $X$ be a random variable with distribution function $F \in D(G_\gamma)$, where $\gamma > 0$ is the extreme value index. Then $[X]_p + U$ has a distribution function $H \in D(G_\gamma)$, i.e. it has the same tail distribution as $X$. Here $U$ is uniformly distributed on $[-0.5 \cdot 10^{-p}, 0.5 \cdot 10^{-p}]$ and independent of $X$.

Proof. For $\gamma > 0$ we have from Theorem 2.3.1

\[ \lim_{t\to\infty} \frac{1 - F(tx)}{1 - F(t)} = \lim_{t\to\infty} \frac{P(X > tx)}{P(X > t)} = x^{-1/\gamma}, \quad \forall x > 0. \]

Let us now look at the sum of the rounded $X$ and the uniformly distributed random variable $U$:

\[ \frac{P([X]_p + U > tx)}{P([X]_p + U > t)} = \frac{P(X > t)}{P([X]_p + U > t)} \cdot \frac{P(X > tx)}{P(X > t)} \cdot \frac{P([X]_p + U > tx)}{P(X > tx)}. \tag{5.1} \]

By the assumption on $X$, the middle term converges to $x^{-1/\gamma}$ as $t \to \infty$. It remains to prove that the left and the right term converge to 1 as $t \to \infty$. The right term is the reciprocal of the left term with $t$ replaced by $tx$, so if one of them converges to 1, so does the other. We therefore prove it only for the left term, by constructing an upper and a lower bound that both converge to 1. Since $|[X]_p - X| \le 0.5 \cdot 10^{-p}$ and $|U| \le 0.5 \cdot 10^{-p}$, we have $X - 10^{-p} \le [X]_p + U \le X + 10^{-p}$, and hence, as $t \to \infty$,

\[ \frac{P(X > t)}{P([X]_p + U > t)} \le \frac{P(X > t)}{P(X - 10^{-p} > t)} = \frac{P(X > t)}{P(X > t + 10^{-p})} \sim \left( \frac{t}{t + 10^{-p}} \right)^{-1/\gamma}, \]

\[ \frac{P(X > t)}{P([X]_p + U > t)} \ge \frac{P(X > t)}{P(X + 10^{-p} > t)} = \frac{P(X > t)}{P(X > t - 10^{-p})} \sim \left( \frac{t}{t - 10^{-p}} \right)^{-1/\gamma}. \]

The upper bound and the lower bound clearly converge to 1 as $t \to \infty$. As a result the right-hand side of (5.1) converges to $x^{-1/\gamma}$, which means that the distribution function $H$ of $[X]_p + U$ is in the domain of attraction of $G_\gamma$, and so its extreme value index is the same as that of $X$.

Remark 5.3.2. The result of Theorem 5.3.1 does not only hold for ordinary rounding. The proof is similar for taking floors and ceilings; only the theorem and the bounds change slightly. For taking the floor, the uniform distribution added is $U[0, 10^{-p}]$ and the upper and lower bound are determined by

\[ \lfloor X \rfloor_p + U \le \lceil X \rceil_p \le X + 10^{-p}, \]

\[ \lfloor X \rfloor_p + U \ge \lfloor X \rfloor_p \ge X - 10^{-p}. \]

For taking the ceiling, the uniform distribution used is $U[-10^{-p}, 0]$ and the upper and lower bound are determined by

\[ \lceil X \rceil_p + U \le \lceil X \rceil_p \le X + 10^{-p}, \]

\[ \lceil X \rceil_p + U \ge \lfloor X \rfloor_p \ge X - 10^{-p}. \]

The rest of the proof is the same as above.

From now on we will introduce some notation: let X1, . . . , Xn denote the real underlying observations. These for example are the non-rounded magnitudes of earthquakes. The data points we observe we will denote by Y1, . . . Yn, i.e. the rounded magnitudes. From these Yi we want to estimate the tail distribution, assuming we know the tail index is greater as zero. If we try to estimate it using the moment or the Hill estimator we get a oscillating result as shown in figure 4.2. The result tells that now we can add a uniform distribution to our data, Zi = Xi+Ui from the uniform distribution stated in the theorem, which gives us a altered sample Z1, . . . , Zn. The theorem tells us that now the constructed sample Zi is distributed with the same tail distribution as our original sampleXi and therefore we can estimate it’s tail distribution by estimating the tail distribution of theZi sample. This gives rise to the following definition

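Definition 5.3.3. The estimator $\hat{\gamma}_R$ is defined by

\[
\hat{\gamma}_R := \frac{1}{k} \sum_{i=1}^{k} \left( \log Z_{n-i+1,n} - \log Z_{n-k,n} \right) \tag{5.2}
\]

where $Z_i = X_i + U_i$, with $X_i$ a rounded i.i.d. random variable and $U_i$ a uniformly distributed random variable.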
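The definition can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the thesis: the function names are hypothetical, the symmetric interval of half-width $0.5 \cdot 10^{-p}$ is an assumption taken from the one-decimal example $[-0.05, 0.05]$ in the concluding remarks, and the simulated sample assumes that $P(1, \gamma)$ has tail $P(X > x) = x^{-1/\gamma}$ for $x \ge 1$.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator from the k largest order statistics of a positive sample."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    ordered = sorted(sample)
    log_threshold = math.log(ordered[n - k - 1])  # log Z_{n-k,n}
    return sum(math.log(z) - log_threshold for z in ordered[n - k:]) / k


def adjusted_hill_estimator(rounded, k, decimals, rng):
    """Add a simulated rounding error to each observation, then apply Hill."""
    half_width = 0.5 * 10.0 ** (-decimals)  # assumed symmetric rounding error
    smoothed = [y + rng.uniform(-half_width, half_width) for y in rounded]
    return hill_estimator(smoothed, k)


# Illustration on simulated data; seed, n, k and gamma are assumptions.
rng = random.Random(2014)
gamma = 0.1
true_values = [(1.0 - rng.random()) ** (-gamma) for _ in range(10000)]
rounded_values = [round(x, 1) for x in true_values]  # rounded to one decimal
print("Hill on rounded data:         ", hill_estimator(rounded_values, 400))
print("adjusted Hill on rounded data:", adjusted_hill_estimator(rounded_values, 400, 1, rng))
```

The smoothing is random, so repeated calls with a different generator state give slightly different estimates. Averaging over an interval of $k$ reduces this variability.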


Theorem 5.3.4. Let $X_1, X_2, \ldots$ be i.i.d. random variables, all rounded to $p$ decimals, with distribution function $F$, and let $Z_1, Z_2, \ldots$ be defined as $Z_i = X_i + U_i$, with $U_i$ i.i.d. $U[0, 10^{-p}]$ and independent of the $X_i$. Let $F \in D(G_\gamma)$ with $\gamma > 0$. Then, if $n \to \infty$, $k = k(n) \to \infty$ and $k/n \to 0$,

\[
\hat{\gamma}_R \xrightarrow{P} \gamma.
\]

Proof. Notice that the $Z_i$ are i.i.d., hence the result follows from Theorem 3.2.2 if $Z_i \sim H \in D(G_\gamma)$, which follows immediately from Theorem 5.3.1.

5.4 An estimator for finite tails

Since we now have an estimator for infinite tails, let us have a look at finite tails. According to Theorem 2.3.1 we are then dealing with $\gamma < 0$, and the theorem provides a limit relation similar to the one in the previous section,

\[
\lim_{t \downarrow 0} \frac{1 - F(x^* - tx)}{1 - F(x^* - t)} = \lim_{t \downarrow 0} \frac{P(X > x^* - tx)}{P(X > x^* - t)} = x^{-1/\gamma},
\]

where $x^* = \sup\{x : F(x) < 1\}$. When we try to approach the problem in the same way as in the proof of Theorem 5.3.1, the two terms that should converge to 1 do not. Intuitively this can be seen as follows. With $\gamma > 0$ we have an infinite tail, so we keep this tail when we add a small random variable. With $\gamma < 0$, on the other hand, the tail is finite, so adding a small random variable moves the endpoint a little and thereby changes the tail distribution. Relative to the tail, the added random variable is much smaller when the endpoint is infinite than when it is finite. Therefore it changes a tail distribution with a finite endpoint significantly.
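The effect can be made concrete in a toy case that is not part of the thesis and is only meant as a hedged illustration. Assume $X$ is uniform on $[0, 1]$, so that $x^* = 1$ and $\gamma = -1$, and let $U$ be uniform on $[-\delta, \delta]$ with $0 < \delta \le 1/2$, independent of $X$. Write $Z = X + U$, with endpoint $z^* = 1 + \delta$. Since $1 - X$ is uniform on $[0, 1]$ and $\delta - U$ is uniform on $[0, 2\delta]$, we get for $0 < s \le 2\delta$

\[
P(X > x^* - s) = s,
\qquad
P(Z > z^* - s) = P\big( (1 - X) + (\delta - U) < s \big) = \frac{1}{2\delta} \cdot \frac{s^2}{2} = \frac{s^2}{4\delta}.
\]

Plugging both tails into the limit relation gives

\[
\lim_{t \downarrow 0} \frac{P(X > x^* - tx)}{P(X > x^* - t)} = x = x^{-1/(-1)},
\qquad
\lim_{t \downarrow 0} \frac{P(Z > z^* - tx)}{P(Z > z^* - t)} = x^2 = x^{-1/(-1/2)}.
\]

Under these assumptions the extreme value index moves from $-1$ to $-1/2$, however small $\delta$ is, which is consistent with the intuition given above.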


Chapter 6

The estimator with simulated data

6.1 Introduction

In the previous chapter we developed a new estimator, defined in Definition 5.3.3, which is only valid for distributions with an infinite tail, i.e. for $\gamma > 0$. The new estimator is based on adding back the rounding error and in this way smoothing the data. We were not able to prove the same result for finite tails: adding the rounding error back to a distribution with a finite tail shifts the endpoint of the distribution and thereby changes the tail distribution. In this chapter we compare the finite-sample performance of the new estimator with that of existing estimators. We also look at a method for approximating the extreme value index when it is negative.

6.2 Simulation for $\hat{\gamma}_R$

Simulations are done with a $P(1, 0.1)$ distribution, which has a tail index of 0.1. In Figure 6.1 we compare the Hill estimator with our new estimator in a Hill plot. The black line shows the estimate of the Hill estimator on rounded data, which is not informative due to the large oscillations. The red line shows the estimate of the new estimator. The oscillation is clearly reduced, and the red line is much closer to the blue line, i.e. the estimate of the Hill estimator on non-rounded data. To obtain a single estimate of the extreme value index we take the average of the estimates over some interval of $k$. Recall the condition on $k$: $k = k(n) \to \infty$ and $k/n \to 0$ as $n \to \infty$. This means the interval should lie somewhere in the tail; as explained in Chapter 4, we use the mean over the first stable period.
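The two steps described above, computing the Hill estimate for every $k$ and averaging over an interval of $k$, can be sketched as follows. The function names are hypothetical, and choosing the interval (the first stable period) is left to the reader, since the thesis does this by inspecting the plot.

```python
import math


def hill_plot_values(sample, k_max):
    """Return the Hill estimates for k = 1, ..., k_max as a list."""
    n = len(sample)
    if not 1 <= k_max < n:
        raise ValueError("k_max must satisfy 1 <= k_max < n")
    # logs[0] is the largest log-observation, logs[k] equals log X_{n-k,n}.
    logs = sorted((math.log(x) for x in sample), reverse=True)
    estimates = []
    running_sum = 0.0
    for k in range(1, k_max + 1):
        running_sum += logs[k - 1]                   # sum of the k largest logs
        estimates.append(running_sum / k - logs[k])  # subtract log X_{n-k,n}
    return estimates


def mean_over_interval(estimates, k_low, k_high):
    """Average the estimates for k_low <= k <= k_high, with k counted from 1."""
    window = estimates[k_low - 1:k_high]
    return sum(window) / len(window)
```

For example, `mean_over_interval(hill_plot_values(sample, 500), 400, 425)` would average the estimates over the interval of $k$ used for Table 6.1, given a positive sample of sufficient size.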

Let us now look at the bias and the mean squared error (MSE) to assess the accuracy of our new estimator. We run $m = 100$ iterations, each estimating the extreme value index from a sample of size $n = 10000$ from $P(1, 0.1)$. Then we calculate the bias, $\frac{1}{m} \sum_{i=1}^{m} \hat{\gamma}_i - \gamma$, and the MSE, $\frac{1}{m} \sum_{i=1}^{m} (\hat{\gamma}_i - \gamma)^2$. In Table 6.1 the estimates are the mean over $k \in [400, 425]$. The table shows that the original Hill estimator on rounded data has roughly twice the bias of our new estimator. Its mean squared error is also about a factor 2 higher than that of our estimator.
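The two summary statistics can be computed as in the sketch below. The function name and the toy input are hypothetical; the toy numbers are not the thesis simulations.

```python
def bias_and_mse(estimates, true_gamma):
    """Return (bias, MSE) of a list of estimates of true_gamma."""
    m = len(estimates)
    bias = sum(estimates) / m - true_gamma
    mse = sum((g - true_gamma) ** 2 for g in estimates) / m
    return bias, mse


# Toy input: three hypothetical estimates of gamma = 0.1.
bias, mse = bias_and_mse([0.09, 0.10, 0.12], 0.1)
print(f"bias = {bias:.5f}, MSE = {mse:.6f}")
```

Since the bias is the mean estimate minus $\gamma$, the bias column of Table 6.1 can be checked directly against the mean column, for example $0.08245 - 0.1 = -0.01755$.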

In conclusion, both the visual and the numerical evidence confirm that our estimator outperforms the Hill estimator on rounded data.


Figure 6.1: Hill plot of the Hill estimator and the adjusted Hill estimator on original and rounded data. The simulated data is from $P(1, 0.1)$ with $n = 10000$.

Estimator                               mean       bias       MSE
$\hat{\gamma}_H$ (on original data)     0.09962    -0.00038   0.00002
$\hat{\gamma}_H$ (on rounded data)      0.08245    -0.01755   0.00034
$\hat{\gamma}_R$ (on rounded data)      0.09099    -0.00901   0.00018

Table 6.1: The estimated results for the Hill estimator on non-rounded and rounded data, and for the adjusted Hill estimator on rounded data. The numbers are the result of 100 simulations with $n = 10000$. The data is from $P(1, 0.1)$.

6.3 Simulation for $\gamma < 0$

For $\gamma < 0$ we can no longer use the estimator from Definition 5.3.3, because its consistency is only proven for $\gamma > 0$. When $\gamma < 0$, adding a small uniform random variable has a significant effect on the tail distribution and therefore does not result in a correct estimate, as explained in Section 5.4. So we look at existing estimators. The most obvious choice is the moment estimator because, unlike the Hill estimator, it also applies when $\gamma$ is negative. On the other hand, the maximum likelihood estimator also performs well for $\gamma > -1$. Therefore, we compare the maximum likelihood estimator with the moment estimator. Figure 6.2 shows the estimates of $\gamma$ with both the moment and the maximum likelihood estimator. The moment estimator, the red line, clearly oscillates much more heavily than the maximum likelihood estimator, the blue line.
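For reference, the moment estimator of Dekkers, Einmahl and de Haan [2] can be sketched as below. The formula $\hat{\gamma}_M = M^{(1)} + 1 - \frac{1}{2} \left( 1 - (M^{(1)})^2 / M^{(2)} \right)^{-1}$ is the standard one from that paper; the function name is hypothetical, and the code is a sketch rather than the implementation behind Figure 6.2.

```python
import math


def moment_estimator(sample, k):
    """Moment estimator from the k largest order statistics of a positive sample."""
    n = len(sample)
    if not 2 <= k < n:
        raise ValueError("k must satisfy 2 <= k < n")
    ordered = sorted(sample)
    log_threshold = math.log(ordered[n - k - 1])  # log X_{n-k,n}, must be positive data
    excesses = [math.log(x) - log_threshold for x in ordered[n - k:]]
    m1 = sum(excesses) / k                 # first empirical moment M^(1)
    m2 = sum(e * e for e in excesses) / k  # second empirical moment M^(2)
    # Note: if all k excesses are equal (possible with heavily tied, rounded
    # data), the denominator below is zero and a ZeroDivisionError is raised.
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)
```

On rounded data many of the largest observations coincide, so the two empirical moments jump whenever $k$ crosses a group of tied values. This is one way to understand the oscillation visible in the plot.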

Let us now look at the accuracy of the estimation, as done above for our new estimator. The results are displayed in Table 6.2. On rounded data the maximum likelihood estimator is clearly less biased than the moment estimator (-0.00612 against 0.01428), while the mean squared errors are comparable (0.00403 against 0.00340).

In conclusion, we are not able to remove the oscillation from the estimate, but both the visual evidence and the smaller bias suggest that the maximum likelihood estimator performs better than the moment estimator.


Figure 6.2: Hill plot of the estimation with the moment estimator and the maximum likelihood estimator on rounded data. The simulated data is from $GPD(0, 1, -0.1)$ with $n = 10000$.

Estimator                                 mean       bias       MSE
$\hat{\gamma}_M$ (on original data)       -0.09229   0.00771    0.00957
$\hat{\gamma}_M$ (on rounded data)        -0.08572   0.01428    0.00340
$\hat{\gamma}_{ML}$ (on rounded data)     -0.10612   -0.00612   0.00403

Table 6.2: The estimated results for the moment estimator on original and rounded data, and for the maximum likelihood estimator on rounded data. The numbers are the result of 100 simulations with $n = 10000$. The data is from $GPD(0, 1, -0.1)$.


Chapter 7

Application on real life data

7.1 Introduction

In the previous chapters we developed several approaches to the problem of rounded data, i.e. we are now able to estimate the extreme value index for both $\gamma > 0$ and $\gamma < 0$. In this chapter we look at real-life data from Greece. The developed methods are used to estimate the probability of an earthquake with a magnitude greater than 6 on the Richter scale.

7.2 The data

The data used is from the Institute of Geodynamics of the National Observatory of Athens [8]. The institute publishes all earthquakes with a magnitude greater than or equal to 2 on the Richter scale from 01/01/2008 onwards. All data comes from measurements in a specified area in Greece, shown in Figure 7.1. For the application we consider all data from 1/1/2008 up to 30/9/2014, a total of 70260 data points. Figure 7.1 also shows the histogram of this data. At the beginning of the histogram the distribution is not clear, due to the decrease in the number of observations between 2 and 3. If, on the other hand, we only consider the tail, i.e. data of magnitude three or higher, the observations look similar to a generalized Pareto distribution with $\gamma \in [-0.25, 1]$.

7.3 The estimation of the extreme value index

For the estimation of the extreme value index of the tail distribution of the magnitudes, we first need an indication of its sign. When this indication gives a positive $\gamma$, it is best to estimate with the adjusted Hill estimator from Definition 5.3.3. If it is negative, we had better use the maximum likelihood estimator, which, as shown in the previous chapter, gives a better estimate than the moment estimator. In Figure 7.2 we see the estimate of the moment estimator on the earthquake data in black and that of the adjusted Hill estimator in red.

The oscillation of the moment estimator has a large amplitude, but $\gamma$ seems to be positive. We check this by taking the means of the estimates over the first few intervals of $k$. These results are shown in Table 7.1. The estimate goes below zero in only one interval, therefore we conclude that the extreme value index is positive.

The adjusted Hill estimator is now the best choice for the estimation of the extreme value index. In Figure 7.2 the red line shows the estimate by the adjusted Hill method. In order to choose an interval for $k$, we need to find the first stable period in the Hill plot. The Hill plot is shown in Figure 7.3. We choose $[263, 327]$ as the interval of $k$.

The estimate we get is $\hat{\gamma} = 0.0902$.

Figure 7.1: Left, the area where the earthquakes are measured. Right, a histogram of the earthquake data from 1-1-2008 up to 30-9-2014.

Figure 7.2: The moment estimator in black and the adjusted Hill estimator in red on the earthquake magnitudes.

7.4 The tail probabilities

Since we have estimated the extreme value index, we are able to calculate the tail probability. To do so we use the first limit relation from Theorem 2.3.1, which we can write in the following way:

\[
\lim_{t \to \infty} \frac{P(X > tx)}{P(X > t)} = x^{-1/\gamma}.
\]

Let $q > X_{n-k,n}$ be the wanted quantile, $t = X_{n-k,n}$ the beginning of the tail and $\hat{\gamma}$ the estimated extreme value index. Then

\[
\frac{P(X > q)}{P(X > X_{n-k,n})} \approx \left( \frac{q}{X_{n-k,n}} \right)^{-1/\hat{\gamma}}
\;\Rightarrow\;
P(X > q) \approx P(X > X_{n-k,n}) \left( \frac{q}{X_{n-k,n}} \right)^{-1/\hat{\gamma}}
\;\Rightarrow\;
\hat{P}(q) = \frac{k}{n} \left( \frac{q}{X_{n-k,n}} \right)^{-1/\hat{\gamma}}.
\]

k               [300,400]   [400,500]   [500,600]   [600,700]   [700,800]
Mean estimate   0.040       0.024       0.011       -0.011      0.047

Table 7.1: Gamma estimate for different intervals of data used from the tail.

Figure 7.3: The estimate of the adjusted Hill estimator for the earthquake magnitudes.

In Figure 7.4 we calculated the probability of an earthquake with a magnitude greater than 6. We took the mean over the first stable period in the Hill plot, i.e. $k \in [359, 423]$, which gave the final result $\hat{p} = 1.144 \cdot 10^{-4}$.
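The estimator $\hat{P}(q) = \frac{k}{n} (q / X_{n-k,n})^{-1/\hat{\gamma}}$ translates directly into code. The sketch below uses a hypothetical function name and a four-point toy sample; it is not the earthquake data and does not reproduce the figure quoted above.

```python
def tail_probability(sample, k, q, gamma_hat):
    """Estimate P(X > q) as (k / n) * (q / X_{n-k,n}) ** (-1 / gamma_hat)."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if gamma_hat <= 0:
        raise ValueError("this formula is meant for a positive extreme value index")
    threshold = sorted(sample)[n - k - 1]  # X_{n-k,n}, the start of the tail
    if q <= threshold:
        raise ValueError("q must lie above the threshold X_{n-k,n}")
    return (k / n) * (q / threshold) ** (-1.0 / gamma_hat)


# Toy numbers for illustration only: threshold 2.0, so (2/4) * (4/2) ** (-2).
toy_sample = [1.0, 2.0, 3.0, 4.0]
print(tail_probability(toy_sample, k=2, q=4.0, gamma_hat=0.5))  # 0.125
```

In practice one would evaluate this for every $k$ in the chosen stable interval and average the results, as is done for the extreme value index itself.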


Chapter 8

Concluding Remarks

In this bachelor thesis we looked at earthquake data from the Institute of Geodynamics of the National Observatory of Athens, in particular at the magnitudes of earthquakes from 2008 until now. The goal was to correctly estimate the extreme value index and thereby the tail probabilities, but because the data is rounded to one decimal the existing estimators fail. As shown in Chapter 4, the estimate starts to oscillate with respect to the amount of data used from the tail, i.e. $k$.

The approach to tackle this problem was to make the data smoother, by adding a uniformly distributed random variable that represents the rounding error. For example, one adds a uniformly distributed random variable on $[-0.05, 0.05]$ to an observation that is rounded to one decimal. This result is stated and proven in Theorem 5.3.1. It is only valid for $\gamma > 0$, for the reason explained in Chapter 2: the endpoint of the tail distribution is infinite for these values of $\gamma$. As a consequence, adding a small uniformly distributed random variable does not change the tail behaviour of the distribution. Unfortunately the result is not valid for $\gamma < 0$, because these tail distributions have a finite endpoint, so adding a small uniformly distributed random variable does change the tail distribution.

In Chapter 6 we tested the performance of the new estimator against the Hill estimator. The new estimator performed considerably better: there was almost no oscillation, and the bias and the mean squared error were both roughly halved. For the case of finite tails we compared the moment estimator with the maximum likelihood estimator. Judging from the visual and numerical results, the maximum likelihood estimator was better.

After having developed a method to correctly estimate the extreme value index, we applied it to the magnitudes of the earthquakes in Chapter 7. First, we used the moment estimator to assess whether the extreme value index was positive or negative. The result was a positive $\gamma$, and therefore we could make the data smoother by adding a small uniformly distributed random variable. This resulted in an estimate of $\hat{\gamma} = 0.0902$.

Finally, as discussed in Section 7.4, we were able to estimate the tail probability of a magnitude greater than 6, $\hat{p}_6 = 0.0001144$.

For further research one can look at tail distributions with a finite endpoint. Although the maximum likelihood estimator performs better than the moment estimator, it still oscillates. One can consider the data as coming from a discrete distribution and try to perform maximum likelihood estimation on these discrete data. The case $\gamma = 0$ is not covered in this thesis, so one can look for a way to estimate the extreme value index in this case. When the estimator is used in practice, an asymptotic result is needed, for which further research is required.


8.1 Acknowledgements

During the last months several people helped me with my research and with the writing of this thesis, and I would like to express my great appreciation to them. First of all, thank you Juan-Juan Cai for the encouragement during the whole process, for always helping me whenever needed and for being a great supervisor. Next I would like to thank Richard Kraaij for his time and for helping me with some difficult parts. Last but not least, I thank Marjolein Velthoen for correcting the English of my thesis.


Bibliography

[1] L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer Series in Operations Research and Financial Engineering. Springer, 2007.

[2] A. L. M. Dekkers, J. H. J. Einmahl, and L. De Haan. A moment estimator for the index of an extreme-value distribution. The Annals of Statistics, 17(4):1833–1855, 12 1989.

[3] Holger Drees, Ana Ferreira, and Laurens de Haan. On maximum likelihood estimation of the extreme value index. The Annals of Applied Probability, 14(3):1179–1201, 08 2004.

[4] R. A. Fisher and L. H. C. Tippett. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Mathematical Proceedings of the Cambridge Philosophical Society, 24:180–190, 4 1928.

[5] Boris Gnedenko. Sur la distribution limite du terme maximum d’une série aléatoire. Annals of Mathematics, pages 423–453, 1943.

[6] Bruce M. Hill. A simple general approach to inference about the tail of a distribution. The Annals of Statistics, 3(5):1163–1174, 09 1975.

[7] Muneya Matsui, Thomas Mikosch, and Laleh Tafakori. Estimation of the tail index for lattice-valued sequences. Extremes, 16(4):429–455, 2013.

[8] Institute of Geodynamics of the National Observatory of Athens. Earthquake dataset from 1-1-2008 to 30-9-2014, 2014. http://bbnet.gein.noa.gr/HL/database.

[9] Chen Zhou. Existence and consistency of the maximum likelihood estimator for the extreme value index. Journal of Multivariate Analysis, 100(4):794 – 815, 2009.
