Summary - Semiparametric Bayesian Risk Estimation for Complex Extremes

In this chapter, we have reviewed how the probability of very extreme sets can be inferred from that of moderately extreme sets, in the univariate setting using maxima or exceedances of a high threshold, and in the multivariate setting, where no natural ordering exists. We have then focused on extremes in time series, where dependence in time prevents the use of standard inference procedures; the peaks-over-threshold approach is one of the methods that deal with short-range dependence of extreme observations. In the two-dimensional case, we have described how asymptotic dependence can be characterised and measured. We have emphasised the importance of models for data that are asymptotically independent, with dependence at subasymptotic levels.

The last section was dedicated to the conditional tail approach, and we have seen how it can be used to estimate the probabilities of extreme events. In Laplace margins, this approach is parsimonious and easy to use for inference on multidimensional data. It is very flexible and covers a broad class of extremal structures of dependence, but lacks self-consistency for extreme joint probabilities extrapolated from different conditional distributions. It is also unclear to what extent the assumptions needed to make inference have an impact on the estimation of extreme probabilities and on the assessment of uncertainty for risk estimates.

We suggest a new approach to fitting the conditional tail model in Chapter 5, but before we do so, we introduce background material on the Bayesian nonparametric framework and we explore finite-sample properties of the conditional model in Chapter 4.

3 The Dirichlet process

3.1 Formal definitions

3.1.1 The Dirichlet distribution

In Bayesian modelling, the Dirichlet distribution is a conjugate prior for the parameters of a multinomial distribution. It is also a generalisation of the beta distribution to the $(d-1)$-dimensional simplex $S = \{(x_1,\dots,x_{d-1}) : x_j \geq 0, \sum_{j=1}^{d-1} x_j \leq 1\}$. The Dirichlet density function is

$$f(x_1,\dots,x_{d-1} \mid \gamma_1,\dots,\gamma_d) = \frac{\Gamma(\gamma_1+\cdots+\gamma_d)}{\Gamma(\gamma_1)\cdots\Gamma(\gamma_d)} \left(1-\sum_{j=1}^{d-1}x_j\right)^{\gamma_d-1} \prod_{j=1}^{d-1} x_j^{\gamma_j-1}, \quad (x_1,\dots,x_{d-1}) \in S, \tag{3.1}$$

with $\gamma_j > 0$, $j = 1,\dots,d$. Ferguson (1973) uses a constructive approach that permits a more general definition. Let $Z_1,\dots,Z_d$ be independently gamma distributed $\mathrm{Ga}(1,\gamma_j)$ variables, with shape parameters $\gamma_j \geq 0$, and $\gamma_j > 0$ for some $j$, $j = 1,\dots,d$. The distribution of $(X_1,\dots,X_d)$, with

$$X_j = \frac{Z_j}{\sum_{k=1}^{d} Z_k}, \quad j = 1,\dots,d,$$

is Dirichlet with parameter $(\gamma_1,\dots,\gamma_d)$, which we shall write $\mathrm{Dir}(\gamma_1,\dots,\gamma_d)$. Any $\gamma_j = 0$ implies that $X_j \equiv 0$; if $\gamma_j > 0$ for all $j = 1,\dots,d$, the density function of $(X_1,\dots,X_{d-1})$ is exactly (3.1).

The marginal expectation and variance are

$$E(X_j) = \frac{\gamma_j}{\sum_{k=1}^{d}\gamma_k}, \qquad \mathrm{var}(X_j) = \frac{\gamma_j\left(\sum_{k=1}^{d}\gamma_k - \gamma_j\right)}{\left(\sum_{k=1}^{d}\gamma_k\right)^2\left(\sum_{k=1}^{d}\gamma_k + 1\right)}. \tag{3.2}$$
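The gamma construction and the moments in (3.2) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the parameter vector and the number of draws are illustrative assumptions, and a gamma variable with shape 0 is treated as degenerate at 0 so that the corresponding component vanishes.

```python
import random

def dirichlet_from_gammas(gammas, rng):
    """One draw of (X_1, ..., X_d) obtained by normalising independent gamma variables."""
    # A gamma variable with shape 0 is taken to be degenerate at 0, so X_j = 0.
    z = [rng.gammavariate(g, 1.0) if g > 0 else 0.0 for g in gammas]
    total = sum(z)
    return [zj / total for zj in z]

def dirichlet_moments(gammas):
    """Marginal means and variances given by (3.2)."""
    s = sum(gammas)
    means = [g / s for g in gammas]
    variances = [g * (s - g) / (s ** 2 * (s + 1.0)) for g in gammas]
    return means, variances

rng = random.Random(1)
gammas = [2.0, 0.0, 3.0, 5.0]  # illustrative parameters; the second component is 0
draws = [dirichlet_from_gammas(gammas, rng) for _ in range(50_000)]

n = len(draws)
d = len(gammas)
emp_means = [sum(x[j] for x in draws) / n for j in range(d)]
emp_vars = [sum((x[j] - emp_means[j]) ** 2 for x in draws) / n for j in range(d)]
means, variances = dirichlet_moments(gammas)

for j in range(d):
    print(f"j={j + 1}: mean {emp_means[j]:.4f} vs {means[j]:.4f}, "
          f"var {emp_vars[j]:.5f} vs {variances[j]:.5f}")
```

The empirical moments should agree with (3.2) up to Monte Carlo error, and the component with zero parameter is identically zero.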

An interesting property of the Dirichlet distribution is its updating of prior beliefs after recording multinomial observations, building a useful link with the Pólya urn scheme developed in Section 3.1.3. If $(X_1,\dots,X_d)$ have prior distribution $\mathrm{Dir}(\gamma_1,\dots,\gamma_d)$ and observations are such that

$$\Pr(Y = j \mid X_1,\dots,X_d) = X_j, \quad j = 1,\dots,d,$$

almost surely, then the posterior distribution becomes

$$(X_1,\dots,X_d) \mid \{Y = j\} \sim \mathrm{Dir}(\gamma_1,\dots,\gamma_j + 1,\dots,\gamma_d). \tag{3.3}$$
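As an illustration of (3.3), the sketch below applies the update once per observation to a stream of categorical data. Repeating the single-observation update for independent observations is the usual conjugacy argument; the function name, the prior parameters and the data-generating probabilities are assumptions made for the example.

```python
import random

def update_dirichlet(gammas, y):
    """Posterior parameters after observing category y (1-based), as in (3.3)."""
    posterior = list(gammas)
    posterior[y - 1] += 1.0
    return posterior

rng = random.Random(2)
prior = [1.0, 2.0, 0.5]       # illustrative prior parameters
true_probs = [0.6, 0.3, 0.1]  # hypothetical data-generating probabilities

params = prior
n_obs = 2000
for _ in range(n_obs):
    y = rng.choices([1, 2, 3], weights=true_probs)[0]
    params = update_dirichlet(params, y)

total = sum(params)
posterior_mean = [g / total for g in params]
print("posterior parameters:", params)
print("posterior mean:", [round(m, 3) for m in posterior_mean])
```

The posterior parameters are the prior parameters plus the category counts, so the posterior mean moves from the prior mean towards the observed frequencies as observations accumulate.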

3.1.2 Ferguson’s definition

Given a set $\mathcal{P}$ and its associated $\sigma$-field $\mathcal{T}$, Ferguson (1973) gives the following definition of a Dirichlet process:

Definition 3.1 (Dirichlet process)

Let $\nu(\cdot)$ be a finite measure on $(\mathcal{P},\mathcal{T})$. We say that $P$ is a Dirichlet process on $(\mathcal{P},\mathcal{T})$ with parameter $\nu(\cdot)$, and we write $P \sim \mathrm{DP}(\nu)$, if, for every $k = 1,2,\dots$ and measurable partition $(C_1,\dots,C_k)$ of $\mathcal{P}$,

$$\{P(C_1),\dots,P(C_k)\} \sim \mathrm{Dir}\{\nu(C_1),\dots,\nu(C_k)\}.$$

The joint distribution of the probabilities of any measurable sets $D_1,\dots,D_l$, for any $l = 1,2,\dots$, can be derived from the partition with sets

$$C_{k_1,\dots,k_l} = \bigcap_{j=1}^{l} D_j^{k_j},$$

where $k_j \in \{0,1\}$ and $D_j^1$ is interpreted as $D_j$ and $D_j^0$ as its complement $D_j^c = \mathcal{P} \setminus D_j$. The distribution of $\{P(D_1),\dots,P(D_l)\}$ then follows from

$$P(D_j) = \sum_{\{(k_1,\dots,k_l)\,:\,k_j = 1\}} P(C_{k_1,\dots,k_l}), \quad j = 1,\dots,l.$$

A more practical expression for the finite measure $\nu(\cdot)$ is $\nu(\cdot) = \gamma P_0(\cdot)$, where $\gamma = \nu(\mathcal{P}) > 0$ is a constant termed the concentration parameter and $P_0(\cdot)$ is a probability distribution termed the baseline distribution. These terms can be understood from the expectation and variance of the Dirichlet distribution (3.2), which yield, for any set $C \in \mathcal{T}$,

$$E\{P(C)\} = P_0(C), \qquad \mathrm{var}\{P(C)\} = \frac{P_0(C)\{1 - P_0(C)\}}{\gamma + 1}. \tag{3.4}$$

The baseline distribution $P_0(\cdot)$ can thus be interpreted as the prior belief for $P(\cdot)$, and the concentration parameter $\gamma$ as the assurance we have in this prior belief; the larger the value of $\gamma$, the stronger the confidence.
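The role of the concentration parameter can be illustrated by simulation. Applying the definition of the Dirichlet process to the two-set partition formed by a set C and its complement, P(C) has a beta distribution with parameters γP0(C) and γ{1 − P0(C)}, which is the two-dimensional Dirichlet case. The sketch below draws from this marginal distribution and compares the empirical moments with (3.4); the function names, the value P0(C) = 0.3 and the grid of concentration parameters are assumptions chosen for illustration.

```python
import random

def dp_set_moments(p0_c, gamma):
    """Mean and variance of P(C) from (3.4), given P0(C) and the concentration gamma."""
    return p0_c, p0_c * (1.0 - p0_c) / (gamma + 1.0)

def simulate_dp_set_prob(p0_c, gamma, n_draws, rng):
    """Draws of P(C) from its marginal Beta(gamma * P0(C), gamma * (1 - P0(C))) law."""
    return [rng.betavariate(gamma * p0_c, gamma * (1.0 - p0_c)) for _ in range(n_draws)]

rng = random.Random(4)
p0_c = 0.3  # hypothetical baseline probability of the set C
results = {}
for gamma in (1.0, 10.0, 100.0):
    draws = simulate_dp_set_prob(p0_c, gamma, 50_000, rng)
    emp_mean = sum(draws) / len(draws)
    emp_var = sum((p - emp_mean) ** 2 for p in draws) / len(draws)
    results[gamma] = (emp_mean, emp_var)
    mean, var = dp_set_moments(p0_c, gamma)
    print(f"gamma={gamma:>5}: mean {emp_mean:.4f} vs {mean:.4f}, "
          f"var {emp_var:.5f} vs {var:.5f}")
```

The mean stays at P0(C) for every γ, while the variance shrinks roughly in proportion to 1/(γ + 1), which is the sense in which a larger γ expresses stronger confidence in the baseline distribution.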

In analogy with the Bayesian update of the Dirichlet distribution as a prior distribution for multinomial data in (3.3), Ferguson shows that if $X$ is a sample from $P \sim \mathrm{DP}\{\gamma P_0(\cdot)\}$, then $P(\cdot \mid X)$ is the updated Dirichlet process $\mathrm{DP}\{\gamma P_0(\cdot) + \delta_X(\cdot)\}$, where $\delta_x(\cdot)$ is the measure on $(\mathcal{P},\mathcal{T})$ such that $\delta_x(C) = 1(x \in C)$, for any $C \in \mathcal{T}$.
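A small numerical sketch may help to read this update. Combining the updated parameter with the expectation in (3.4), the posterior mean of P(C) is a weighted average of the baseline probability and the indicator of the observation; iterating the update over several observations, which is an assumption of this example rather than a statement made in the text above, gives the parameter γP0(·) plus one point mass per observation. The standard normal baseline distribution, the interval sets and the function names are likewise illustrative choices.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function, used here as a hypothetical P0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def posterior_mean_prob(a, b, gamma, data, base_cdf=normal_cdf):
    """Posterior mean of P((a, b]) when the updated parameter is
    gamma * P0 + sum_i delta_{x_i}, obtained by iterating the one-observation update."""
    prior_mass = gamma * (base_cdf(b) - base_cdf(a))
    count = sum(1 for x in data if a < x <= b)
    return (prior_mass + count) / (gamma + len(data))

gamma = 2.0
no_data = posterior_mean_prob(-math.inf, 0.0, gamma, [])
one_obs = posterior_mean_prob(-math.inf, 0.0, gamma, [-1.0])
many_obs = posterior_mean_prob(-math.inf, 0.0, gamma, [-1.0, -0.5, 0.2, 1.3, -2.0])
print(no_data, one_obs, many_obs)
```

With no data the function returns the baseline probability; each observation adds one unit of mass to the sets that contain it, so the influence of the baseline distribution decays at rate γ/(γ + n).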

3.1.3 Extension of the Pólya urn scheme

Blackwell and MacQueen (1973) give another definition of the Dirichlet process based on a generalisation of Pólya urn schemes using a continuum of colours.

Definition 3.2 (Pólya sequence)

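The sequence $(X_n)$ of random variables taking values in $\mathcal{P}$ is a Pólya sequence with parameter $\nu(\cdot)$ if for every $C \in \mathcal{T}$, $\Pr(X_1 \in C) = \nu(C)/\nu(\mathcal{P}) = P_0(C)$, and

$$\Pr(X_{n+1} \in C \mid X_1,\dots,X_n) = \nu_n(C)/\nu_n(\mathcal{P}),$$

with $\nu_n(\cdot) = \nu(\cdot) + \sum_{i=1}^{n} \delta_{X_i}(\cdot)$.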

For finite $\mathcal{P}$, this definition mimics the process of drawing a ball from an urn initially containing $\nu(x)$ balls of colour $x$ and putting the ball drawn back into the urn with an additional ball of the same colour. By extending this to the continuous setting as in Definition 3.2, Blackwell and MacQueen show that $\nu_n(\cdot)/\nu_n(\mathcal{P})$ converges with probability 1 to a discrete distribution $P(\cdot)$ and $P \sim \mathrm{DP}(\nu)$. They also show that given $P(\cdot)$, the variables $X_1,\dots,X_n$ are independent and

$$X_1,\dots,X_n \mid P \sim P.$$

As we shall see in Section 3.2, this is one of the building blocks of the Dirichlet process mixture model.
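The predictive rule of Definition 3.2 translates directly into a sampler. Writing ν = γP0, the next value is a fresh draw from P0 with probability γ/(γ + n) and otherwise a copy of one of the n previous values chosen uniformly. The sketch below implements this rule; the function name, the standard normal baseline distribution and the value of γ are assumptions made for the example.

```python
import random

def polya_sequence(n, gamma, base_sampler, rng):
    """Draw X_1, ..., X_n from a Polya sequence with parameter nu = gamma * P0."""
    xs = []
    for i in range(n):
        # With probability gamma / (gamma + i), draw a new value from P0;
        # otherwise repeat one of the i previous values, chosen uniformly.
        if rng.random() < gamma / (gamma + i):
            xs.append(base_sampler(rng))
        else:
            xs.append(rng.choice(xs))
    return xs

rng = random.Random(3)
xs = polya_sequence(1000, gamma=2.0,
                    base_sampler=lambda r: r.gauss(0.0, 1.0),  # hypothetical P0
                    rng=rng)
n_distinct = len(set(xs))
print(f"{n_distinct} distinct values among {len(xs)} draws")
```

Even with a continuous baseline distribution the sequence contains many ties, which reflects the discreteness of the limiting distribution P(·).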

3.1.4 Constructive definition

Similarly to the construction of a Dirichlet distribution using gamma-distributed random variables described in Section 3.1.1, Ferguson (1973) introduces an alternative definition of the Dirichlet process through a gamma process with independent increments. A more intuitive approach is presented by Sethuraman (1994), who shows that a Dirichlet process with measure γP0(·) can be represented as

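$$P(\cdot) = \sum_{c=1}^{\infty} w_c \delta_{X_c}(\cdot), \tag{3.5}$$

where the weights $w_c$ are constructed using the stick-breaking process as follows,

$$w_1 = V_1, \quad V_1 \sim \mathrm{Beta}(1,\gamma), \qquad w_c = V_c \prod_{k=1}^{c-1} (1 - V_k), \quad V_c \overset{\mathrm{iid}}{\sim} \mathrm{Beta}(1,\gamma), \quad c = 2,3,\dots, \tag{3.6}$$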

where the $V_c$ are mutually independent and independent of the $X_c$, which are independent draws from $P_0(\cdot)$. The name comes from the analogy of breaking a stick of length 1 to get $w_1$, then breaking the remainder of the stick to get $w_2$, and so on. Notice that $\sum_{c=1}^{\infty} w_c = 1$, since

$$\sum_{c=1}^{N} w_c = V_1 + \sum_{c=2}^{N} V_c \prod_{k=1}^{c-1} (1 - V_k) = 1 - \prod_{c=1}^{N} (1 - V_c) \longrightarrow 1, \quad N \to \infty,$$

where the convergence holds with probability 1, and the last equality is obtained using a simple recursion argument.

The very simple representation offered by the stick-breaking process is widely used in the conditional approach to fitting Dirichlet process mixtures, as we shall see in Section 3.3.2.
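The stick-breaking construction can be sketched in a few lines. Since the sum in (3.5) is infinite, the example truncates it after a fixed number of atoms and reports the mass left on the unbroken part of the stick; the truncation level, the function name and the standard normal baseline distribution are assumptions made for illustration and are not prescribed by the text.

```python
import random

def stick_breaking(gamma, n_atoms, base_sampler, rng):
    """First n_atoms weights and atoms of the stick-breaking representation.

    Returns the weights w_c, the atoms X_c drawn from P0, and the remaining
    stick length prod_c (1 - V_c), i.e. the mass not yet allocated.
    """
    weights, atoms = [], []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, gamma)   # V_c ~ Beta(1, gamma)
        weights.append(v * remaining)     # w_c = V_c * prod_{k<c} (1 - V_k)
        atoms.append(base_sampler(rng))   # X_c drawn from the baseline P0
        remaining *= 1.0 - v
    return weights, atoms, remaining

rng = random.Random(5)
weights, atoms, remaining = stick_breaking(
    gamma=2.0, n_atoms=200,
    base_sampler=lambda r: r.gauss(0.0, 1.0),  # hypothetical P0
    rng=rng)
print(f"sum of weights: {sum(weights):.12f}, unallocated mass: {remaining:.3e}")
```

The partial sum of the weights plus the unallocated mass equals 1 up to floating-point error, which is the finite-N identity used above, and the unallocated mass decays geometrically with the number of atoms.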
