6. Some Aspects of Heavy Tails 118
6.3. Parameter Estimation
If a model for the data is chosen then it remains to estimate the parameters.
6.3.1. The Exponential Distribution
For the exponential distribution both the maximum likelihood estimator and the method of moments yield the estimator
$$\hat\alpha = \Bigl(\frac{1}{n}\sum_{i=1}^{n} Y_i\Bigr)^{-1}.$$
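As a minimal sketch, the estimator (the reciprocal of the sample mean) can be computed as follows. The function name `exp_rate_estimate` and the simulated sample are illustrative assumptions, not part of the text.

```python
import random

def exp_rate_estimate(ys):
    """Reciprocal of the sample mean: ML and method-of-moments estimate of alpha."""
    return len(ys) / sum(ys)

# Illustrative data only: simulate Exp(2) claims and recover the rate.
random.seed(0)
sample = [random.expovariate(2.0) for _ in range(10_000)]
alpha_hat = exp_rate_estimate(sample)
print(alpha_hat)
```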
6.3.2. The Gamma Distribution
The log likelihood function is
$$\sum_{i=1}^{n} \bigl(\gamma \log \alpha - \log \Gamma(\gamma) + (\gamma - 1)\log Y_i - \alpha Y_i\bigr).$$
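Setting the partial derivatives with respect to $\alpha$ and $\gamma$ to zero yields the equations
$$\frac{\hat\gamma}{\hat\alpha} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \log\hat\alpha - \frac{\Gamma'(\hat\gamma)}{\Gamma(\hat\gamma)} + \frac{1}{n}\sum_{i=1}^{n}\log Y_i = 0.$$
These equations can only be solved numerically.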
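Because Gamma distributions are light tailed, the method of moments can be used. Matching the mean $\gamma/\alpha$ and the variance $\gamma/\alpha^2$ to the empirical mean $\bar Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and the empirical variance $S^2$ yields the estimators
$$\hat\gamma = \frac{\bar Y^2}{S^2}, \qquad \hat\alpha = \frac{\hat\gamma}{\frac{1}{n}\sum_{i=1}^{n} Y_i}.$$
Because of its simplicity, the second approach (the method of moments) should be preferred.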
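The method of moments for the Gamma distribution can be sketched as below. It matches the mean $\gamma/\alpha$ and the variance $\gamma/\alpha^2$ to their empirical counterparts. The helper name `gamma_moment_estimates` and the choice of the $n-1$ denominator for the empirical variance are assumptions made for this illustration.

```python
import random
import statistics

def gamma_moment_estimates(ys):
    """Method-of-moments estimates (alpha_hat, gamma_hat) for a Gamma sample.

    Uses mean = gamma/alpha and variance = gamma/alpha^2.
    """
    mean = statistics.fmean(ys)
    var = statistics.variance(ys)      # n - 1 denominator (an assumption here)
    gamma_hat = mean * mean / var
    alpha_hat = gamma_hat / mean
    return alpha_hat, gamma_hat

# Illustrative data only: shape gamma = 3, rate alpha = 2.
# random.gammavariate takes (shape, scale) with scale = 1 / alpha.
random.seed(1)
sample = [random.gammavariate(3.0, 1.0 / 2.0) for _ in range(20_000)]
alpha_hat, gamma_hat = gamma_moment_estimates(sample)
print(alpha_hat, gamma_hat)
```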
6.3.3. The Weibull Distribution
The log likelihood function is
$$\sum_{i=1}^{n} \bigl(\log \alpha + \log c + (\alpha - 1)\log Y_i - c Y_i^{\alpha}\bigr).$$
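We can assume that $\alpha \ne 1$. Then, setting the partial derivatives with respect to $c$ and $\alpha$ to zero, we obtain the equations
$$\frac{1}{\hat c} = \frac{1}{n}\sum_{i=1}^{n} Y_i^{\hat\alpha}, \qquad \frac{1}{\hat\alpha} + \frac{1}{n}\sum_{i=1}^{n}\log Y_i - \frac{\hat c}{n}\sum_{i=1}^{n} Y_i^{\hat\alpha}\log Y_i = 0.$$
These equations also have to be solved numerically.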
The equations obtained from the method of moments are even harder to solve.
Anyway, taking into account that the distribution is heavy tailed for α < 1, the method of moments should not be used, except if it is clear that α > 1.
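One possible way to solve the Weibull likelihood equations numerically is sketched here. Differentiating the log likelihood in $c$ gives $1/c = \frac{1}{n}\sum Y_i^{\alpha}$; substituting this into the derivative in $\alpha$ leaves a single equation in $\alpha$, which is solved by bisection. The function name `weibull_mle`, the bracket $[0.01, 20]$ and the simulated data are assumptions of this sketch, not a method prescribed by the text.

```python
import math
import random

def weibull_mle(ys, lo=0.01, hi=20.0, tol=1e-10):
    """ML estimates (alpha_hat, c_hat) for the tail exp(-c * y^alpha).

    Assumes the root in alpha lies in [lo, hi]; c is eliminated
    through 1/c = mean(Y^alpha).
    """
    logs = [math.log(y) for y in ys]
    n = len(ys)
    mean_log = sum(logs) / n
    max_log = max(logs)

    def score(a):
        # Weights proportional to Y_i^a, rescaled by the largest one to avoid overflow.
        w = [math.exp(a * (l - max_log)) for l in logs]
        weighted = sum(wi * l for wi, l in zip(w, logs)) / sum(w)
        return 1.0 / a + mean_log - weighted

    # score is decreasing in a, so plain bisection suffices.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    c_hat = n / sum(y ** alpha_hat for y in ys)
    return alpha_hat, c_hat

# Illustrative data only: random.weibullvariate takes (scale, shape);
# scale 1 corresponds to c = 1 in the parametrization used here.
random.seed(2)
sample = [random.weibullvariate(1.0, 0.7) for _ in range(5_000)]
alpha_hat, c_hat = weibull_mle(sample)
print(alpha_hat, c_hat)
```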
6.3.4. The Lognormal Distribution
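Here the best unbiased estimators are obviously (compare with the normal distribution)
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n}\log Y_i, \qquad \hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(\log Y_i - \hat\mu\bigr)^2.$$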
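A minimal sketch of the lognormal fit, assuming the estimators meant are the sample mean and the unbiased ($n-1$) sample variance of $\log Y_i$. The function name `lognormal_estimates` and the simulated data are illustrative.

```python
import math
import random
import statistics

def lognormal_estimates(ys):
    """Return (mu_hat, sigma2_hat) computed from the logarithms of the data."""
    logs = [math.log(y) for y in ys]
    return statistics.fmean(logs), statistics.variance(logs)

# Illustrative data only: mu = 0.5, sigma = 1.2.
random.seed(3)
sample = [random.lognormvariate(0.5, 1.2) for _ in range(20_000)]
mu_hat, sigma2_hat = lognormal_estimates(sample)
print(mu_hat, sigma2_hat)
```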
6.3.5. The Pareto Distribution
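With the tail $\mathbb{P}[Y_i > y] = (1 + y/\beta)^{-\alpha}$, setting the partial derivatives of the log likelihood to zero yields the two equations
$$\frac{1}{\hat\alpha} = \frac{1}{n}\sum_{i=1}^{n}\log\bigl(1 + Y_i/\hat\beta\bigr), \qquad \frac{\hat\alpha + 1}{n}\sum_{i=1}^{n}\frac{Y_i}{\hat\beta + Y_i} = 1.$$
These equations also have to be solved numerically.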
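We shall try to find another approach. Assume for the moment that $\beta$ is known. Then the maximum likelihood estimator for $\alpha$,
$$\hat\alpha = \Bigl(\frac{1}{n}\sum_{i=1}^{n}\log(1 + Y_i/\beta)\Bigr)^{-1},$$
looks similar to the estimator in the exponential case. In fact,
$$\mathbb{P}[\log(1 + Y_i/\beta) > x] = \mathbb{P}[Y_i > \beta(e^{x} - 1)] = e^{-\alpha x},$$
that is, $\log(1 + Y_i/\beta)$ is exponentially distributed. Because, for $Y_i$ large, $\log(1 + Y_i/\beta) \approx \log(Y_i/\beta) = \log Y_i - \log\beta$, we find that intuitively $\log Y_i$ is approximately Exp($\alpha$) distributed for large values of $Y_i$. Explicitly,
$$\mathbb{P}[\log Y_i > x] = (1 + e^{x}/\beta)^{-\alpha} = (\beta e^{-x} + 1)^{-\alpha}\beta^{\alpha}e^{-\alpha x} \sim \beta^{\alpha}e^{-\alpha x}.$$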
If we choose $M$ large enough then
$$\hat\alpha = \Bigl(\frac{1}{\sum_{i=1}^{n}\mathbf{1}_{\{\log Y_i > M\}}}\sum_{i=1}^{n}\mathbf{1}_{\{\log Y_i > M\}}(\log Y_i - M)\Bigr)^{-1}$$
is an estimator for $\alpha$.
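The fixed-threshold estimator can be sketched as follows: it averages the excesses $\log Y_i - M$ over the observations with $\log Y_i > M$ and inverts the result. The function name `threshold_estimate` and the toy data are assumptions for illustration.

```python
import math

def threshold_estimate(ys, m):
    """Estimate alpha from the observations with log(y) > m."""
    excesses = [math.log(y) - m for y in ys if math.log(y) > m]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return len(excesses) / sum(excesses)

# Toy data with logarithms 0, 1, 2, 3 and threshold M = 0.5.
data = [1.0, math.e, math.e ** 2, math.e ** 3]
alpha_m = threshold_estimate(data, 0.5)
print(alpha_m)
```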
There remains one problem: How shall we choose M ?
• The larger M the closer is the distribution of log Yi − M to an exponential distribution.
• The larger M the less data are used in the estimator.
It is clear that $M$ can be chosen larger if $n$ is larger. Thus $M$ must depend on $n$. Moreover, the optimal $M$ will depend on the parameters $\alpha$ and $\beta$, which are not known. Thus $M$ must depend on the sample as well. This can be achieved if we estimate $\alpha$ using only the largest $k(n) + 1$ data. Let $\{Y_{i:n} : i \le n\}$ denote the order statistics $Y_{1:n} \le Y_{2:n} \le \cdots \le Y_{n:n}$ of $\{Y_i : i \le n\}$. We define the estimator
$$\hat\alpha = \Bigl(\frac{1}{k(n)}\sum_{i=1}^{k(n)}\bigl(\log Y_{n+1-i:n} - \log Y_{n-k(n):n}\bigr)\Bigr)^{-1}. \qquad (6.2)$$
Intuitively, this estimator will converge to $\alpha$ as $n \to \infty$ provided $k(n) \to \infty$ and $Y_{n-k(n):n} \to \infty$ a.s. It can be shown that whenever $k(n)/n \to 0$ and $k(n)/\log\log n \to \infty$ then $\hat\alpha \to \alpha$ a.s.
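A minimal sketch of estimator (6.2), Hill's estimator, applied to simulated data. The helper names `hill_estimate` and `pareto_sample`, the parameter values and the sample size are assumptions for illustration. The sample is drawn by inverse transform from the tail $(1 + y/\beta)^{-\alpha}$.

```python
import math
import random

def hill_estimate(ys, k):
    """Estimator (6.2) from the k + 1 largest observations; needs 1 <= k < len(ys)."""
    s = sorted(ys)
    n = len(s)
    log_threshold = math.log(s[n - k - 1])      # log Y_{n-k:n} (0-based index n-k-1)
    # Y_{n+1-i:n} sits at 0-based index n - i.
    total = sum(math.log(s[n - i]) - log_threshold for i in range(1, k + 1))
    return k / total

def pareto_sample(alpha, beta, n, rng):
    """Inverse-transform sample with tail P[Y > y] = (1 + y/beta)^(-alpha)."""
    return [beta * ((1.0 - rng.random()) ** (-1.0 / alpha) - 1.0) for _ in range(n)]

# Illustrative data only: alpha = 1.5, beta = 2, n = 20000.
rng = random.Random(42)
sample = pareto_sample(1.5, 2.0, 20_000, rng)
alpha_hat = hill_estimate(sample, 736)          # 736 = floor(20000^(2/3))
print(alpha_hat)
```

For finite $k$ the estimate is somewhat biased, because $\log Y_i$ is only approximately exponential above the threshold.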
It remains to estimate $\beta$. Note that for $n \to \infty$ and $\gamma \in (0, 1)$
$$1 - \bigl(1 + Y_{\lfloor\gamma n\rfloor:n}/\beta\bigr)^{-\alpha} \to \gamma \quad \text{a.s.},$$
which we will see later (Section 6.4.1). Thus for $n$ large we find the estimator
$$\hat\beta = \frac{Y_{\lfloor(1-\gamma)n\rfloor:n}}{\gamma^{-1/\hat\alpha} - 1}.$$
Quite often the distribution will not be exactly Pareto; only the tail will behave like the tail of a Pareto distribution. We therefore wish to let $\gamma \to 0$ as $n \to \infty$. Choose therefore $\gamma_n = k(n)/n$, i.e. our estimator is
$$\hat\beta = \frac{Y_{n-k(n):n}}{(n/k(n))^{1/\hat\alpha} - 1}.$$
In order to simplify the estimator note that asymptotically this is the same as
$$\hat\beta = \Bigl(\frac{k(n)}{n}\Bigr)^{1/\hat\alpha} Y_{n-k(n):n}. \qquad (6.3)$$
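The two versions of the estimator for $\beta$ can be compared with a short sketch. The function name `beta_estimate`, its `simplified` flag and the toy data are assumptions for illustration; `alpha_hat` is passed in, for instance from estimator (6.2).

```python
def beta_estimate(ys, k, alpha_hat, simplified=True):
    """Estimate beta from the order statistic Y_{n-k:n}; needs 1 <= k < len(ys)."""
    s = sorted(ys)
    n = len(s)
    y_threshold = s[n - k - 1]                  # Y_{n-k:n} (0-based index n-k-1)
    if simplified:
        # Asymptotic form (6.3).
        return (k / n) ** (1.0 / alpha_hat) * y_threshold
    # Form before the simplification.
    return y_threshold / ((n / k) ** (1.0 / alpha_hat) - 1.0)

# Toy data: Y = 1, ..., 10 with k = 2, so the threshold is Y_{8:10} = 8.
data = list(range(1, 11))
print(beta_estimate(data, 2, 1.0))
print(beta_estimate(data, 2, 1.0, simplified=False))
```

The two forms differ noticeably when $n/k(n)$ is small and agree asymptotically as $n/k(n) \to \infty$.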
Figure 6.3: Hill's estimator as a function of k for the Swedish fire insurance data.
One can show that $\hat\beta \to \beta$ a.s. as $n \to \infty$.
The last question of interest is how to choose $k(n)$ optimally. It can be shown that in many situations the following holds. If, for any $\varepsilon > 0$, $k(n)\,n^{-2/3+2\varepsilon}$ converges to 0 then the rate of convergence of $\hat\alpha$ to $\alpha$ is $n^{-1/3+\varepsilon}$. Thus, in practice, one would take $\varepsilon$ as small as possible and thus use $k(n) = \lfloor n^{2/3}\rfloor$. Another possibility is to plot the estimator for $k = 1, 2, 3, \ldots$ and then choose the first point where the function $\hat\alpha(k)$ stabilises (for a short time). More about this and also other estimators can be found in [36].
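Both ways of choosing $k$ can be sketched in a few lines. `k_rule` computes $\lfloor n^{2/3}\rfloor$ with an integer correction, so that floating-point rounding cannot give an off-by-one result. `hill_path` returns $\hat\alpha(k)$ for $k = 1, \ldots, k_{\max}$, which can then be plotted. Both function names are assumptions for illustration.

```python
import math

def k_rule(n):
    """Largest integer k with k^3 <= n^2, i.e. floor(n^(2/3))."""
    k = round(n ** (2.0 / 3.0))
    while k ** 3 > n * n:
        k -= 1
    while (k + 1) ** 3 <= n * n:
        k += 1
    return k

def hill_path(ys, kmax):
    """Hill estimates for k = 1, ..., kmax (needs kmax < len(ys), no ties at the threshold)."""
    logs = sorted((math.log(y) for y in ys), reverse=True)   # logs[0] is the largest
    path = []
    top_sum = 0.0
    for k in range(1, kmax + 1):
        top_sum += logs[k - 1]                               # sum of the k largest logs
        path.append(k / (top_sum - k * logs[k]))             # logs[k] is log Y_{n-k:n}
    return path

print(k_rule(218), k_rule(500))
print(hill_path([1.0, math.e, math.e ** 2, math.e ** 3], 2))
```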
6.3.6. Parameter Estimation for the Fire Insurance Data
We now fit a Pareto and a lognormal model to the Swedish and Danish fire insurance data considered before. We have already seen that a heavy tailed distribution function should be used in order to model the fire insurance data. The explicit figures for the claims have not been given here, so the reader cannot reproduce the calculations.
Example 6.1 (continued). In the Pareto case we choose $k(218) = \lfloor 218^{2/3}\rfloor = 36$. Figure 6.3 indicates that $k$ indeed should be chosen close to 36. We find the estimates
$$\hat\alpha = 1.37567, \qquad \hat\beta = 0.879807.$$
Note that we find the mean value $\hat\beta/(\hat\alpha - 1) = 2.34199$, whereas the empirical mean value is 2.28172, only slightly smaller. The big difference lies in the estimate for the variance. The empirical variance is 14.3158, whereas in the Pareto model the variance is infinite.
Figure 6.4: Hill's estimator as a function of k for the Danish fire insurance data.
Fitting a lognormal model we find the parameters
$$\hat\mu = 0.218913, \qquad \hat\sigma^2 = 0.945877.$$
In the lognormal model we obtain for the mean value 1.99741, which is even smaller than the empirical mean value. The variance of the lognormal model is 6.28397, which again is smaller than the empirical variance. This shows that by choosing a Pareto model the insurance company is on the safer side, which is why the Pareto model is very popular among practitioners. The comparison with the empirical mean value and the empirical variance shows that we should not choose the lognormal model.
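The model moments quoted for the Swedish data can be checked from the fitted parameters. The sketch uses the Pareto mean $\beta/(\alpha - 1)$ from the text and the standard lognormal formulas $e^{\mu + \sigma^2/2}$ for the mean and $(e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$ for the variance. The function names are assumptions for illustration.

```python
import math

def pareto_mean(alpha, beta):
    """Mean beta / (alpha - 1) of the Pareto model; infinite for alpha <= 1."""
    return beta / (alpha - 1.0) if alpha > 1.0 else math.inf

def lognormal_mean(mu, sigma2):
    return math.exp(mu + 0.5 * sigma2)

def lognormal_variance(mu, sigma2):
    return (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)

# Fitted parameters for the Swedish fire insurance data (Example 6.1).
print(pareto_mean(1.37567, 0.879807))
print(lognormal_mean(0.218913, 0.945877))
print(lognormal_variance(0.218913, 0.945877))
```

Since $\hat\alpha < 2$, the fitted Pareto model has infinite variance, so no finite value can be computed for it.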
The problem we have with the lognormal model is that there are too many small claims, i.e. a real lognormal distribution would have more claims close to $e^{\hat\mu}$. These small claims are responsible for $\hat\mu$ being relatively small. The Pareto distribution has a lot of very small claims, but they are not considered in the estimation. That is another advantage of the Pareto model: for the estimation only the tail is considered, and the tail is the important part of the distribution function in practice.
Example 6.2 (continued). In the Pareto case we choose $k(500) = \lfloor 500^{2/3}\rfloor = 62$. Figure 6.4 shows that the Hill estimator as a function of $k$ is very unstable. For a short range it stabilizes around 70. Thus 62 does not seem to be too bad a choice. It really starts to stabilize at about $k = 170$. This seems to be too large for $k$. Anyway, we calculate the parameters for both $k = 62$ and $k = 180$.
For $k = 62$ the estimates are
$$\hat\alpha = 1.69605, \qquad \hat\beta = 4.20413.$$
The mean value is 6.03997, whereas the empirical mean was 9.08176. This indicates that $k = 62$ is not a good choice. However, the Pareto distribution will have a lot of very small claims, which is not the case here. We should here only model the claims over a certain threshold as Pareto distributed. For example, if we only take claims over 10 into account, i.e. the largest 109 claims, the empirical mean will be 24.0818, whereas the chosen parametrization yields a conditional mean of 30.4067.
For $k = 180$ the estimates are
$$\hat\alpha = 1.34103, \qquad \hat\beta = 2.87879.$$
The mean value is 8.4415, which still is smaller than the empirical mean value.
Taking only values over 10 into account the conditional mean will be 47.7646, which is much larger than the empirical conditional mean 24.0818.
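The conditional means quoted for the Danish data can be reproduced with a short sketch. It assumes the Pareto tail $\mathbb{P}[Y > y] = (1 + y/\beta)^{-\alpha}$ with $\alpha > 1$, for which $\mathbb{E}[Y \mid Y > u] = u + (\beta + u)/(\alpha - 1)$. The function name `pareto_conditional_mean` is an assumption for illustration.

```python
def pareto_conditional_mean(alpha, beta, u):
    """E[Y | Y > u] for the tail (1 + y/beta)^(-alpha); requires alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the conditional mean is infinite for alpha <= 1")
    return u + (beta + u) / (alpha - 1.0)

# Fitted parameters for the Danish fire insurance data (Example 6.2), threshold 10.
print(pareto_conditional_mean(1.69605, 4.20413, 10.0))   # k = 62
print(pareto_conditional_mean(1.34103, 2.87879, 10.0))   # k = 180
```

With $u = 0$ the formula reduces to the unconditional mean $\beta/(\alpha - 1)$.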
Fitting a lognormal distribution the parameters are
$$\hat\mu = 1.84616, \qquad \hat\sigma^2 = 0.461025.$$
In the lognormal model we obtain for the mean value 7.97787, which is also much smaller than the empirical mean value. The second moment of the lognormal model is $e^{2\hat\mu + 2\hat\sigma^2} = 100.924$, so its variance is $100.924 - 7.97787^2 \approx 37.28$, which again is much smaller than the empirical variance 270.933. This shows that by choosing a Pareto model the insurance company is on the safer side.