
Validity of the Parametric Bootstrap Procedure

6.4 Parametric Bootstrap Procedure for Goodness-of-Fit Testing

6.4.2 Validity of the Parametric Bootstrap Procedure

Motivation

Consider testing the appropriateness of various dependence structures on the basis of a random sample $X_1 = (X_{11}, \ldots, X_{1d}), \ldots, X_n = (X_{n1}, \ldots, X_{nd})$ from a continuous random vector $X$ with cumulative distribution function $F$. Denote by $F_1, \ldots, F_d$ the univariate marginal distributions of $X$ and let $C : \mathbb{R}^d \to \mathbb{R}$ be the copula associated with $F$. Then $C$ is the cumulative distribution function of $U = \xi(X)$, where $\xi : \mathbb{R}^d \to \mathbb{R}^d$ is defined for all $x_1, \ldots, x_d \in \mathbb{R}$ by

$$\xi(x_1, \ldots, x_d) = (F_1(x_1), \ldots, F_d(x_d)).$$

The vectors $U_1 = \xi(X_1), \ldots, U_n = \xi(X_n)$ are only observed if the marginals $F_1, \ldots, F_d$ are known. However, $F_j$ can be estimated by

$$F_{jn}(t) = \frac{1}{n+1} \sum_{i=1}^{n} \mathbf{1}\{X_{ij} \le t\},$$

for all $t \in \mathbb{R}$ and $j \in \{1, \ldots, d\}$.

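Letting $\xi_n(x_1, \ldots, x_d) = (F_{1n}(x_1), \ldots, F_{dn}(x_d))^\top$ for all $x_1, \ldots, x_d \in \mathbb{R}$, we can base a test of the hypothesis

$$H_0 : C \in \mathcal{C} = \{C_\theta : \theta \in O\}$$

on the pseudo-observations

$$\hat{U}_1 = \xi_n(X_1), \ldots, \hat{U}_n = \xi_n(X_n),$$

for instance through a statistic

$$S_n = \varphi(\mathbb{G}_n^{C}) \tag{6.102}$$

with

$$\mathbb{G}_n^{C} = n^{1/2}(C_n - C_{\theta_n}),$$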

where $C_{\theta_n}$ is a parametric estimate of $C_\theta$ derived from the estimate $\theta_n = T(X_1, \ldots, X_n)$ of $\theta$ under $H_0$, while $C_n$ is the empirical copula defined for all $u \in [0, 1]^d$ by

$$C_n(u) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{U}_i \le u\}.$$
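As a minimal sketch of the two estimators above, the following Python snippet computes the pseudo-observations as rescaled ranks and evaluates the empirical copula at one point. The function names `pseudo_observations` and `empirical_copula` and the toy data are illustrative assumptions, not part of the source, and the data are assumed to have no ties (continuous margins).

```python
import numpy as np

def pseudo_observations(x):
    """Return U_hat with U_hat[i, j] = F_jn(x[i, j]) = rank of x[i, j] in column j / (n + 1)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort applied twice gives 0-based ranks within each column (assumes no ties)
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1.0)

def empirical_copula(u_hat, u):
    """C_n(u): proportion of pseudo-observations that are <= u in every coordinate."""
    u_hat = np.asarray(u_hat, dtype=float)
    u = np.asarray(u, dtype=float)
    return np.mean(np.all(u_hat <= u, axis=1))

# Toy sample with n = 4 and d = 2 (made-up numbers)
x = np.array([[0.3, 10.0], [1.2, 7.5], [-0.4, 9.1], [2.0, 12.3]])
u_hat = pseudo_observations(x)
c_n = empirical_copula(u_hat, [0.65, 0.65])
```

Dividing by n + 1 rather than n keeps every pseudo-observation strictly inside the unit cube, which matters when a copula density has to be evaluated at these points.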

Another way is to base the test on Kendall's distribution, i.e., the distribution function $K$ of the probability integral transformation $W = F(X)$. Since we can write $W$ in the form $W = C(U)$, a consistent estimator of $K$ is given by the empirical distribution $K_n$ of the pseudo-observations $\hat{W}_1 = C_n(\hat{U}_1), \ldots, \hat{W}_n = C_n(\hat{U}_n)$, defined by

$$K_n(w) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{W}_i \le w\}.$$
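The construction of $K_n$ can be sketched in a few lines of Python. The helper names `kendall_pseudo_obs` and `empirical_kendall` and the four pseudo-observations below are hypothetical and chosen only for illustration.

```python
import numpy as np

def kendall_pseudo_obs(u_hat):
    """W_hat[i] = C_n(U_hat_i): share of pseudo-observations dominated by U_hat_i."""
    u_hat = np.asarray(u_hat, dtype=float)
    # leq[i, k] is True when U_hat_k <= U_hat_i in every coordinate
    leq = np.all(u_hat[None, :, :] <= u_hat[:, None, :], axis=2)
    return leq.mean(axis=1)

def empirical_kendall(w_hat, w):
    """K_n(w): proportion of the W_hat_i that are <= w."""
    return np.mean(np.asarray(w_hat) <= w)

# Made-up pseudo-observations with n = 4 and d = 2
u_hat = np.array([[0.4, 0.6], [0.6, 0.2], [0.2, 0.4], [0.8, 0.8]])
w_hat = kendall_pseudo_obs(u_hat)
k_n = empirical_kendall(w_hat, 0.5)
```

The pairwise comparison uses O(n²d) memory, which is adequate for moderate sample sizes; a loop over observations would be preferable for very large n.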

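Therefore, if $K_\theta$ denotes the distribution function of $W$ when $C = C_\theta$ for some $\theta \in O$, and if $K_{\theta_n}$ is a parametric estimate of $K_\theta$ derived from $\theta_n = T(X_1, \ldots, X_n)$ under the subsidiary hypothesis

$$H_0^{K} : K \in \mathcal{K} = \{K_\theta : \theta \in O\},$$

a goodness-of-fit test can be based on a continuous functional

$$S_n = \varphi(\mathbb{G}_n^{K}) \tag{6.103}$$

of

$$\mathbb{G}_n^{K} = n^{1/2}(K_n - K_{\theta_n}).$$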

Whether $H_0^{C}$ is tested using $\mathbb{G}_n^{C}$ or $H_0^{K}$ is tested using $\mathbb{G}_n^{K}$, the limiting distribution of the test statistic $S_n$ depends not only on the unknown parameter $\theta$ but also possibly on the nuisance parameters $F_1, \ldots, F_d$. Although a parametric bootstrap may help to find valid P-values, its use cannot be justified on the basis of the results of Stute et al. [52] (see [23]), because the pseudo-observations $\hat{U}_1, \ldots, \hat{U}_n$ and $\hat{W}_1, \ldots, \hat{W}_n$ are dependent. It then becomes necessary to establish the validity of the parametric bootstrap in situations where the hypothesis to be tested

Chapter 6. A Review of Goodness-of-Fit Test Statistics 79

concerns the distribution $P$ of an unobservable $s$-variate random vector $U$, viz.

$$H_0 : P \in \mathcal{P} = \{P_\theta : \theta \in O\},$$

where $O$ is an open subset of $\mathbb{R}^p$, and $U = \xi(X)$ for some function $\xi : \mathbb{R}^d \to \mathbb{R}^s$ of an observable $d$-variate random vector $X$.

In order to encompass procedures based on $\mathbb{G}_n^{C}$ and $\mathbb{G}_n^{K}$ as special cases, suppose that a test of $H_0$ is to be derived from a continuous functional

$$S_n = \varphi(\mathbb{G}_n^{A}) \tag{6.104}$$

of an abstract empirical process of the form

$$\mathbb{G}_n^{A} = n^{1/2}(A_n - A_{\theta_n}),$$

where $A_{\theta_n}$ and $A_n$ are respectively parametric and nonparametric estimates of an abstract quantity $A$ that depends on $P$. More generally, $A$ is a function mapping a closed rectangle $T \subset [-\infty, \infty]$ into $\mathbb{R}^s$, and $A_\theta$ denotes the form taken by $A$ when $P = P_\theta$ for some $\theta \in O$. In particular, $T = [0, 1]^d$, $s = d$ and $A_\theta = C_\theta$ for the test based on $\mathbb{G}_n^{C}$; $T = [0, 1]$, $s = 1$ and $A_\theta = K_\theta$ for a test based on $\mathbb{G}_n^{K}$.

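In order to show that the parametric bootstrap yields a valid approximation to the null distribution of the empirical process $\mathbb{G}_n^{A}$ under appropriate conditions, the processes

$$\Theta_n = n^{1/2}(\theta_n - \theta) \quad \text{and} \quad \mathbb{A}_n = n^{1/2}(A_n - A)$$

need to converge weakly, as $n \to \infty$, respectively to a centered random variable $\Theta$ and a centered process $\mathbb{A}$ in the space $D(T, \mathbb{R}^s)$ of càdlàg processes from $T$ to $\mathbb{R}^s$.

Symbolically, we write

$$\Theta_n = n^{1/2}(\theta_n - \theta) \rightsquigarrow \Theta \tag{6.105}$$

and

$$\mathbb{A}_n = n^{1/2}(A_n - A) \rightsquigarrow \mathbb{A}. \tag{6.106}$$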

Validity of the One-Level Parametric Bootstrap

Let $U_1, \ldots, U_n$ be a random sample from some distribution $P$, and assume that we want to test the hypothesis

$$H_0 : P \in \mathcal{P} = \{P_\theta : \theta \in O\},$$

where $\mathcal{P}$ is a family of probability measures on $\mathbb{R}^d$ indexed by the parameter $\theta$ living in an open set $O \subset \mathbb{R}^p$. Assume that $\mathcal{P}$ is identifiable, i.e.,

$$\theta \neq \theta_0 \;\Rightarrow\; P_\theta \neq P_{\theta_0}.$$

Let $T \subset [-\infty, \infty]$ be a closed rectangle and suppose that the test of $H_0$ is to be based on an abstract mapping $A : T \to \mathbb{R}^s$. Suppose that $A = A_\theta$ when $P = P_\theta$, and let $\mathcal{A} = \{A_\theta : \theta \in O\}$.

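Then the identifiability is ensured if, for each $\varepsilon > 0$,

$$\inf\Big\{ \sup_{t \in T} \|A_\theta(t) - A_{\theta_0}(t)\| \;:\; \theta \in O \text{ and } |\theta - \theta_0| > \varepsilon \Big\} > 0.$$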

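Furthermore, assume that the mapping $\theta \mapsto A_\theta$ is Fréchet differentiable with derivative $\dot{A}_\theta$, i.e., for all $\theta_0 \in O$,

$$\lim_{\|h\| \to 0} \; \sup_{t \in T} \frac{\|A_{\theta_0 + h}(t) - A_{\theta_0}(t) - \dot{A}_{\theta_0}(t)\,h\|}{\|h\|} = 0. \tag{6.107}$$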
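Condition (6.107) can be checked numerically in a simple case. The sketch below takes, purely as an assumed example that does not appear in the source, the exponential distribution function $A_\theta(t) = 1 - e^{-\theta t}$ with $\dot{A}_\theta(t) = t e^{-\theta t}$, and evaluates the ratio in (6.107) on a finite grid standing in for $T$. The function name `frechet_ratio` is hypothetical.

```python
import numpy as np

def frechet_ratio(theta0, h, t):
    """sup over the grid t of |A_{theta0+h}(t) - A_{theta0}(t) - Adot_{theta0}(t) h| / |h|."""
    def a(theta):
        return 1.0 - np.exp(-theta * t)       # assumed example: exponential cdf
    a_dot = t * np.exp(-theta0 * t)           # derivative of A_theta(t) in theta at theta0
    return np.max(np.abs(a(theta0 + h) - a(theta0) - a_dot * h)) / abs(h)

# A finite grid is used as a stand-in for the closed rectangle T = [0, inf]
t_grid = np.linspace(0.0, 50.0, 5001)
ratios = [frechet_ratio(1.0, h, t_grid) for h in (1e-1, 1e-2, 1e-3)]
```

In this example the ratio shrinks roughly in proportion to $|h|$, as a second-order Taylor expansion in $\theta$ suggests.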

Finally, let $\theta_n = T_n(U_1, \ldots, U_n)$ be a consistent estimate of $\theta$ and assume that the $D(T, \mathbb{R}^s)$-valued process $A_n = \Upsilon_n(U_1, \ldots, U_n)$ estimates $A$ consistently. Suppose specifically that the processes $\Theta_n = n^{1/2}(\theta_n - \theta)$ and $\mathbb{A}_n = n^{1/2}(A_n - A)$ have centered Gaussian limits as $n \to \infty$, as in (6.105) and (6.106).

Before giving the conditions under which the weak limits of the processes $\mathbb{G}_n^{A} = n^{1/2}(A_n - A_{\theta_n})$ and $\mathbb{G}_n^{A^*} = n^{1/2}(A_n^* - A_{\theta_n^*})$ are independent and identically distributed, which in turn guarantees that a parametric bootstrap based on the process $A_n$ is valid, let us give the following definitions (see [23]).

Definition 6.2. A family $\mathcal{P} = \{P_\theta : \theta \in O\}$ is said to belong to the class $S(\lambda)$ for a given measure $\lambda$ (independent of $\theta$) if

1. The measure $P_\theta$ is absolutely continuous with respect to $\lambda$ for all $\theta \in O$.

2. The density $p_\theta = dP_\theta / d\lambda$ admits first- and second-order derivatives with respect to all components of $\theta \in O$. The gradient (row) vector with respect to $\theta$ is denoted $\dot{p}_\theta$, and the Hessian

converges weakly in $D(T, \mathbb{R}^s) \times \mathbb{R}^p$ to a centered Gaussian pair $(\mathbb{A}, W)$ and the Fréchet derivative $\dot{A}$ of $A$ defined by Equation (6.107) satisfies

$$\dot{A}(t) = \mathrm{E}[\mathbb{A}(t)\, W_P(t)]$$

and

$$\mathbb{A}_n^* = n^{1/2}(A_n^* - A).$$

The conditions under which the weak limits of the processes $\mathbb{G}_n^{A} = n^{1/2}(A_n - A_{\theta_n})$ and $\mathbb{G}_n^{A^*} = n^{1/2}(A_n^* - A_{\theta_n^*})$ are independent and identically distributed are given by the following theorem, which in turn guarantees that a parametric bootstrap based on the process $A_n$ is valid.

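Theorem 6.15. Assume that $\mathcal{P} \in S(\lambda)$ and that, as $n \to \infty$,

$$(\mathbb{A}_n, \Theta_n, W_{P,n}) \rightsquigarrow (\mathbb{A}, \Theta, W_P)$$

in $D(T, \mathbb{R}^s) \times (\mathbb{R}^p)^{\otimes 2}$, where the limit is a centered Gaussian process. Let $\Gamma = \mathrm{E}[\Theta W_P^\top]$ and set $a(t) = \mathrm{E}[\mathbb{A}(t) W_P^\top]$ for every $t \in T$. Then, as $n \to \infty$,

$$(\mathbb{A}_n, \mathbb{A}_n^*, \Theta_n, \Theta_n^*) \rightsquigarrow (\mathbb{A}, \mathbb{A}^*, \Theta, \Theta^*)$$

in $D(T, \mathbb{R}^s)^{\otimes 2} \times (\mathbb{R}^p)^{\otimes 2}$. In the limit, $\mathbb{A}^* = \mathbb{A}^{\perp} + a\Theta$ and $\Theta^* = \Theta^{\perp} + \Gamma\Theta$ are defined in terms of an independent copy $(\mathbb{A}^{\perp}, \Theta^{\perp})$ of $(\mathbb{A}, \Theta)$. If in addition $(A_n, \theta_n)$ is $\mathcal{P}$-regular for $\mathcal{A} \times O$, then

$$(\mathbb{G}_n^{A}, \mathbb{G}_n^{A^*}) \rightsquigarrow (\mathbb{A} - \dot{A}\Theta, \; \mathbb{A}^{\perp} - \dot{A}\Theta^{\perp})$$

in $D(T, \mathbb{R}^s)^{\otimes 2}$, as $n \to \infty$.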
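To make the role of the replicate process $n^{1/2}(A_n^* - A_{\theta_n^*})$ concrete, here is a minimal Python sketch of a one-level parametric bootstrap in a deliberately simple setting that is assumed for illustration and not taken from the source: $P_\theta$ is the exponential distribution with rate $\theta$, $A_\theta$ is its distribution function, $A_n$ is the empirical distribution function, $\theta_n$ is the maximum likelihood estimate, and $\varphi$ is a Cramér-von Mises functional. The names `cvm_statistic` and `one_level_bootstrap` are hypothetical, and the P-value is approximated by the proportion of bootstrap statistics exceeding $S_n$, which is the usual Monte Carlo convention.

```python
import numpy as np

def cvm_statistic(u, rate):
    """Cramer-von Mises functional: sum_i (A_n(U_i) - A_theta(U_i))^2."""
    u = np.sort(u)
    n = u.size
    a_n = np.arange(1, n + 1) / n           # empirical cdf A_n at the order statistics
    a_theta = 1.0 - np.exp(-rate * u)       # parametric cdf A_theta (assumed exponential)
    return np.sum((a_n - a_theta) ** 2)

def one_level_bootstrap(u, n_boot=500, seed=0):
    """Return (S_n, approximate P-value) for H0: P is exponential."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=float)
    n = u.size
    rate_n = 1.0 / u.mean()                 # theta_n = T_n(U_1, ..., U_n)
    s_n = cvm_statistic(u, rate_n)
    s_star = np.empty(n_boot)
    for k in range(n_boot):
        # Draw U*_1, ..., U*_n from P_{theta_n}, then re-estimate theta on the replicate
        u_star = rng.exponential(scale=1.0 / rate_n, size=n)
        rate_star = 1.0 / u_star.mean()     # theta*_n
        s_star[k] = cvm_statistic(u_star, rate_star)
    return s_n, np.mean(s_star > s_n)

# Data that are clearly not exponential: the test should reject
rng = np.random.default_rng(1)
s_alt, p_alt = one_level_bootstrap(rng.uniform(5.0, 6.0, size=100))
```

Re-estimating the parameter inside each replicate is what lets the bootstrap mimic the term $\dot{A}\Theta$ in the limit; comparing $A_n^*$ with $A_{\theta_n}$ instead would ignore the estimation effect.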

A Two-Level Parametric Bootstrap

When performing a goodness-of-fit test based on a continuous functional $S_n = \varphi(\mathbb{G}_n^{A})$ of a process $\mathbb{G}_n^{A} = n^{1/2}(A_n - A_{\theta_n})$, we have to compute $A_{\theta_n}$ at various points, but this is not always easy. For tests based on the empirical copula, we have $A_{\theta_n} = C_{\theta_n}$, and many copula families do not admit a closed-form expression. In this case, a simple way to solve the problem is to generate a random sample $V_1^*, \ldots, V_m^*$ from the probability measure $Q_{\theta_n}$ with distribution function $C_{\theta_n}$ and, for $u \in [0, 1]^d$, to approximate $C_{\theta_n}(u)$ by

$$\check{C}_n^*(u) = \frac{1}{m} \sum_{j=1}^{m} \mathbf{1}\{V_j^* \le u\}.$$

In other words, we replace $A_{\theta_n}$ by an approximation $\check{A}_n^* = \Psi_m(V_1^*, \ldots, V_m^*)$ built from a random sample $V_1^*, \ldots, V_m^*$ from

$$Q_{\theta_n} \in \mathcal{Q} = \{Q_\theta : \theta \in O\}.$$


For the approach to make sense, we assume that if $A = A_{\theta_0}$ and $\check{A}_n = \Psi_m(V_1, \ldots, V_m)$ for a random sample $V_1, \ldots, V_m$ from $Q = Q_{\theta_0}$, then

$$\check{\mathbb{A}}_n = n^{1/2}(\check{A}_n - A) \rightsquigarrow \check{\mathbb{A}} \in D(T, \mathbb{R}^s), \tag{6.108}$$

as $n \to \infty$ (and hence $m \to \infty$).

Given that such a process exists, the following method can be used to circumvent the lack of a closed form for $A_{\theta_n}$ in the computation of the test statistic $S_n$.

1. Compute $\theta_n = T_n(U_1, \ldots, U_n)$ and let $A_n = \Upsilon_n(U_1, \ldots, U_n)$.

2. Given $U_1, \ldots, U_n$, generate a random sample $V_1^*, \ldots, V_m^*$ from $Q_{\theta_n}$.

3. Let $\check{A}_n^* = \Psi_m(V_1^*, \ldots, V_m^*)$ and compute

$$S_n = \varphi(\mathbb{G}_n^{\check{A}^*}), \tag{6.109}$$

where

$$\mathbb{G}_n^{\check{A}^*} = n^{1/2}(A_n - \check{A}_n^*). \tag{6.110}$$

A second parametric bootstrap procedure is necessary to approximate the distribution of $S_n$. To this end (see [23]), take $N$ large and repeat the following steps for every $k \in \{1, \ldots, N\}$:

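1. Given $U_1, \ldots, U_n$ and $V_1^*, \ldots, V_m^*$, generate a random sample $U_{1,k}^*, \ldots, U_{n,k}^*$ from $P_{\theta_n}$.

2. Compute $\theta_{n,k}^* = T_n(U_{1,k}^*, \ldots, U_{n,k}^*)$ and let $A_{n,k}^* = \Upsilon_n(U_{1,k}^*, \ldots, U_{n,k}^*)$.

3. Given $U_1, \ldots, U_n$, $V_1^*, \ldots, V_m^*$ and $U_{1,k}^*, \ldots, U_{n,k}^*$, generate a random sample $V_{1,k}^{**}, \ldots, V_{m,k}^{**}$ from $Q_{\theta_{n,k}^*}$.

4. Let $\check{A}_{n,k}^{**} = \Psi_m(V_{1,k}^{**}, \ldots, V_{m,k}^{**})$ and compute

$$S_{n,k}^* = \varphi(\mathbb{G}_{n,k}^{\check{A}^*}), \tag{6.111}$$

where

$$\mathbb{G}_{n,k}^{\check{A}^*} = n^{1/2}(A_{n,k}^* - \check{A}_{n,k}^{**}). \tag{6.112}$$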
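The two sets of steps above can be sketched as follows. The setting is an assumption made for illustration only: $P_\theta = Q_\theta$ is the exponential distribution with rate $\theta$, its distribution function is treated as if it had no closed form, $\Upsilon_n$ and $\Psi_m$ are both empirical distribution functions, and $\varphi$ is a Cramér-von Mises functional. The names `ecdf_at`, `statistic` and `two_level_bootstrap` are hypothetical, and the P-value is approximated by the proportion of the $S_{n,k}^*$ exceeding $S_n$, the usual Monte Carlo convention.

```python
import numpy as np

def ecdf_at(sample, points):
    """Empirical cdf of `sample` evaluated at each entry of `points`."""
    return np.searchsorted(np.sort(sample), points, side="right") / sample.size

def statistic(u, v):
    """sum_i (A_n(U_i) - A_check(U_i))^2, with A_n built from u and A_check from v."""
    return np.sum((ecdf_at(u, u) - ecdf_at(v, u)) ** 2)

def two_level_bootstrap(u, m, n_boot=200, seed=0):
    """Return (S_n, approximate P-value) for H0: P is exponential."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u, dtype=float)
    n = u.size
    theta_n = 1.0 / u.mean()                              # first stage, step 1
    v_star = rng.exponential(1.0 / theta_n, size=m)       # step 2: V* from Q_{theta_n}
    s_n = statistic(u, v_star)                            # step 3: S_n as in (6.109)
    s_star = np.empty(n_boot)
    for k in range(n_boot):
        u_k = rng.exponential(1.0 / theta_n, size=n)      # step 1: U*_k from P_{theta_n}
        theta_k = 1.0 / u_k.mean()                        # step 2: theta*_{n,k}
        v_k = rng.exponential(1.0 / theta_k, size=m)      # step 3: V**_k from Q_{theta*_{n,k}}
        s_star[k] = statistic(u_k, v_k)                   # step 4: S*_{n,k} as in (6.111)
    return s_n, np.mean(s_star > s_n)

# Data that are clearly not exponential: the test should reject
rng = np.random.default_rng(2)
s_alt, p_alt = two_level_bootstrap(rng.uniform(5.0, 6.0, size=100), m=500)
```

The second-level sample is drawn from $Q_{\theta_{n,k}^*}$ rather than from $Q_{\theta_n}$, so that each replicate reproduces both the estimation error in $\theta_n$ and the Monte Carlo error in $\check{A}_n^*$.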

Under the convention that large values of Sn lead to the rejection of H0, and under regularity

In order to establish the validity of the previous two-level parametric bootstrap, Genest and Rémillard [23] first introduced the following notation. Let $U_1, \ldots, U_n$ and $V_1, \ldots, V_m$ be independent random samples from $P_{\theta_n}$ and $Q_{\theta_n}$, respectively.

2. Given $U_1^*, \ldots, U_n^*$, $V_1^*, \ldots, V_m^*$ and $\theta_n^* = T_n(U_1^*, \ldots, U_n^*)$, the random vectors $V_1^{**}, \ldots, V_m^{**}$

The following result (see [23]) gives the conditions under which the weak limits of the processes

$$\mathbb{G}_n^{\check{A}^*} = n^{1/2}(A_n - \check{A}_n^*)$$

and

$$\mathbb{G}_n^{\check{A}^{**}} = n^{1/2}(A_n^* - \check{A}_n^{**})$$

are independent and identically distributed, and thereby establishes the validity of the two-level parametric bootstrap.

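Theorem 6.16. Assume that $\mathcal{P} \in S(\lambda)$, $\mathcal{Q} \in S(\nu)$ and that, as $n \to \infty$,

$$(\mathbb{A}_n, \check{\mathbb{A}}_n, \Theta_n, W_{P,n}, W_{Q,n}) \rightsquigarrow (\mathbb{A}, \check{\mathbb{A}}, \Theta, W_P, W_Q)$$

and that the limit is a centered Gaussian process in $D(T, \mathbb{R}^s)^{\otimes 2} \times (\mathbb{R}^p)^{\otimes 3}$. Let $\Gamma = \mathrm{E}[\Theta W_P^\top]$ and set $a(t) = \mathrm{E}[\mathbb{A}(t) W_P^\top]$ and $\check{a}(t) = \mathrm{E}[\check{\mathbb{A}}(t) W_Q^\top]$ for every $t \in T$. Then, as $n \to \infty$,

$$(\mathbb{A}_n, \mathbb{A}_n^*, \check{\mathbb{A}}_n, \check{\mathbb{A}}_n^*, \check{\mathbb{A}}_n^{**}, \Theta_n, \Theta_n^*) \rightsquigarrow (\mathbb{A}, \mathbb{A}^*, \check{\mathbb{A}}, \check{\mathbb{A}}^*, \check{\mathbb{A}}^{**}, \Theta, \Theta^*)$$

in $D(T, \mathbb{R}^s)^{\otimes 5} \times (\mathbb{R}^p)^{\otimes 2}$. In the limit, $\mathbb{A}^* = \mathbb{A}^{\perp} + a\Theta$, $\Theta^* = \Theta^{\perp} + \Gamma\Theta$, $\check{\mathbb{A}}^* = \check{\mathbb{A}}^{\perp} + \check{a}\Theta$ and $\check{\mathbb{A}}^{**} = \check{\mathbb{A}}^{\perp\perp} + \check{a}\Theta^*$, where $(\mathbb{A}^{\perp}, \Theta^{\perp})$ is an independent copy of $(\mathbb{A}, \Theta)$. In addition, the processes $\check{\mathbb{A}}$, $\check{\mathbb{A}}^{\perp}$ and $\check{\mathbb{A}}^{\perp\perp}$ are mutually independent and identically distributed, as well as independent of $\mathbb{A}$, $\mathbb{A}^{\perp}$, $\Theta$ and $\Theta^{\perp}$. Moreover, if $(A_n, \theta_n)$ is $\mathcal{P}$-regular for $\mathcal{A} \times O$ and $\check{A}_n$ is $\mathcal{Q}$-regular for $\mathcal{A}$, then

$$(\mathbb{G}_n^{\check{A}^*}, \mathbb{G}_n^{\check{A}^{**}}) \rightsquigarrow (\mathbb{A} - \check{\mathbb{A}}^{\perp} - \dot{A}\Theta, \; \mathbb{A}^{\perp} - \check{\mathbb{A}}^{\perp\perp} - \dot{A}\Theta^{\perp})$$

in $D(T, \mathbb{R}^s)^{\otimes 2}$ as $n \to \infty$.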
