• No results found

Uncertainty quantification for the family-wise error rate in multivariate copula models

N/A
N/A
Protected

Academic year: 2021

Share "Uncertainty quantification for the family-wise error rate in multivariate copula models"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

Uncertainty quantification for the family-wise error rate

in multivariate copula models

Thorsten Dickhaus

(joint work with Taras Bodnar, Jakob Gierl and Jens Stange)

University of Bremen Institute for Statistics

Adaptive Designs and Multiple Testing Procedures Workshop 2015

University of Cologne, 24.06.2015

(2)

STPs with p-value copulae Asymptotics Copula calibration Application

Outline

Simultaneous test procedures in terms of p-value copulae Asymptotic behavior of empirically calibrated multiple tests Estimation of an unknown copula

Application: Exchange rate risks

References:

Dickhaus, T., Gierl, J. (2013): Stange, J., Bodnar, T., Dickhaus, T. (2014):

Simultaneous test procedures in Uncertainty quantification for the family-wise terms of p-value copulae. error rate in multivariate copula models.

CMCGS 2013 Proceedings, 75-80. AStA Adv. Stat. Anal., online first.

(3)

Notational setup

Given: Statistical model (Ω, F , (P

ϑ

)

ϑ∈Θ

)

H

m

= (H

i

)

i=1,...,m

Family of null hypotheses with ∅ 6= H

i

⊂ Θ and alternatives K

i

= Θ \ H

i

(Ω, F , (P

ϑ

)

ϑ∈Θ

, H

m

) multiple test problem ϕ = (ϕ

i

: i = 1, . . . , m) multiple test for H

m

Test decision

Hypotheses 0 1

true U

m

V

m

m

0

false T

m

S

m

m

1

W

m

R

m

m

(4)

STPs with p-value copulae Asymptotics Copula calibration Application

Local significance level

(Strong) control of the Family-Wise Error Rate (FWER):

∀ϑ ∈ Θ : FWER

ϑ

(ϕ) = P

ϑ

(V

m

> 0) ≤ α

!

Bonferroni correction:

Carry out each individual test ϕ

i

at local level α

loc.

:= α/m.

Let I

0

(ϑ) denote the index set of true hypotheses in H

m

under ϑ.

FWER

ϑ

(ϕ) = P

ϑ

 [

i∈I0(ϑ)

i

= 1}

≤ X

i∈I0(ϑ)

P

ϑ

({ϕ

i

= 1})

≤ m

0

α

loc.

≤ mα

loc.

= α.

(5)

Simultaneous test procedures

K. R. Gabriel (1969), Hothorn et al. (2008)

Definition:

Define the (global) intersection hypothesis by H

0

= T

m i=1

H

i

. Consider the extended problem (Ω, F , (P

ϑ

)

ϑ∈Θ

, H

m+1

) with H

m+1

= {H

i

, i ∈ I

:= {0, 1, . . . , m}}.

Assume real-valued test statistics T

i

, i ∈ I

, which tend to larger values under alternatives. Then we call

(a) (H

m+1

, T ) with T = {T

i

, i ∈ I

} a testing family.

(b) ϕ = (ϕ

i

, i ∈ I

) a simultaneous test procedure (STP), if

∀0 ≤ i ≤ m : ϕ

i

=

( 1, if T

i

> c

α

,

0, if T

i

≤ c

α

, such that

∀ϑ ∈ H

0

: P

ϑ

({ϕ

0

= 1}) = P

ϑ

({T

0

> c

α

}) ≤ α.

(6)

STPs with p-value copulae Asymptotics Copula calibration Application

FWER control with STPs

Assumptions (for the moment):

1. There exists a ϑ

∈ H

0

which is a least favorable parameter configuration (LFC) for the FWER of the STP ϕ

based on T

1

, . . . , T

m

.

2. ∀1 ≤ i ≤ m : H

i

: {θ

i

(ϑ) = θ

i

}, where θ : Θ → Θ

0

3. L(T

i

) is continuous under H

i

with known cdf. F

i

. Exemplary model classes:

• ANOVA1: all pairs comparisons (Tukey contrasts), multiple comparisons with a control group (Dunnett contrasts) Assumptions 1. - 3. are fulfilled (θ: difference operator)

• Multiple association tests in contingency tables, genetic association studies

Assumptions 1. - 3. are fulfilled, at least asymptotically

(for large sample sizes)

(7)

FWER control with STPs

Assumptions (for the moment):

1. There exists a ϑ

∈ H

0

which is a least favorable parameter configuration (LFC) for the FWER of the STP ϕ

based on T

1

, . . . , T

m

.

2. ∀1 ≤ i ≤ m : H

i

: {θ

i

(ϑ) = θ

i

}, where θ : Θ → Θ

0

3. L(T

i

) is continuous under H

i

with known cdf. F

i

. Exemplary model classes:

• ANOVA1: all pairs comparisons (Tukey contrasts), multiple comparisons with a control group (Dunnett contrasts) Assumptions 1. - 3. are fulfilled (θ: difference operator)

• Multiple association tests in contingency tables, genetic association studies

Assumptions 1. - 3. are fulfilled, at least asymptotically

(for large sample sizes)

(8)

STPs with p-value copulae Asymptotics Copula calibration Application

Copulae

Theorem: (Sklar (1959, 1996))

Let X = (X

1

, . . . , X

m

)

>

a random vector with values in R

m

and with joint cdf F

X

and marginal cdfs F

X1

, . . . , F

Xm

.

Then there exists a function C : [0, 1]

m

→ [0, 1] such that

∀x = (x

1

, . . . , x

m

)

>

∈ ¯ R

m

: F

X

(x) = C(F

X1

(x

1

), . . . , F

Xm

(x

m

)).

If all m marginal cdfs are continuous, the copula C is unique.

Obviously, it holds:

If all X

i

, 1 ≤ i ≤ m, are marginally distributed as UNI[0, 1],

then F

X

= C !

(9)

p-values, distributional transforms

Under our general assumptions 1. - 3., appropriate p-values corresponding to the T

i

are given by

∀1 ≤ i ≤ m : p

i

= 1 − F

i

(T

i

).

Properties of p

i

under assumptions 1. - 3.:

• T

i

> c

α

⇐⇒ p

i

< 1 − F

i

(c

α

), if F

i

is strictly isotone.

We may think of α

(i)loc.

:= 1 − F

i

(c

α

) as a multiplicity-adjusted local significance level.

• 1 − p

i

is equal to R ¨uschendorf’s distributional transform.

• Under H

i

, we have p

i

∼ UNI[0, 1] and 1 − p

i

∼ UNI[0, 1].

(10)

STPs with p-value copulae Asymptotics Copula calibration Application

A simple calculation

Let us construct an STP ϕ in terms of p-values.

Due to the above, we only have to consider multiple tests of the form ϕ = (ϕ

i

: 1 ≤ i ≤ m) with ϕ

i

= 1

[0,α(i)loc.)

(p

i

).

For arbitrary ϑ ∈ Θ and ϑ

∈ H

0

, we get:

FWER

ϑ

(ϕ) = P

ϑ

 [

i∈I0(ϑ)

{p

i

< α

(i)loc.

}

 ≤ P

ϑ m

[

i=1

{p

i

< α

(i)loc.

}

!

= 1 − P

ϑ m

\

i=1

{1 − p

i

≤ 1 − α

(i)loc.

}

!

= 1 − C

ϑ

(1 − α

(1)loc.

, . . . , 1 − α

(m)loc.

),

with C

ϑ

denoting the copula of (1 − p

i

: 1 ≤ i ≤ m) under ϑ

.

(11)

Projection method, Hothorn et al. (2008)

Assume that an (asymptotically) jointly normal vector of test statistics T = (T

1

, . . . , T

m

)

>

is at hand.

For control of the FWER by an STP based on T, determine the equicoordinate (two-sided) (1 − α)-quantile of the joint normal distribution of T and project onto the axes.

R: vcov() + mvtnorm

(12)

STPs with p-value copulae Asymptotics Copula calibration Application

FWER control at level α = 0.3 via contour lines of the copula C ϑ

We obtain α

loc.

≈ 0.2.

Cross-check: Φ

−1

(1 −

αloc.

/

2

) is equal to the tabulated normal quantile for the chosen parameters. The structural information provided by C

ϑ

increases power!

If one hypothesis is more important than the other,

just change the slope of the blue straight line.

(13)

STPs with p-value copulae Asymptotics Copula calibration Application

FWER control at level α = 0.3 via contour lines of the copula C ϑ

Cross-check: Φ

−1

(1 −

αloc.

/

2

) is equal to the tabulated normal quantile for the chosen parameters. The structural information provided by C

ϑ

increases power!

If one hypothesis is more important than the other,

just change the slope of the blue straight line.

(14)

STPs with p-value copulae Asymptotics Copula calibration Application

FWER control at level α = 0.3 via contour lines of the copula C ϑ

We obtain α

loc.

≈ 0.2.

Cross-check: Φ

−1

(1 −

αloc.

/

2

) is equal to the tabulated normal quantile for the chosen parameters. The structural information provided by C

ϑ

increases power!

If one hypothesis is more important than the other,

just change the slope of the blue straight line.

(15)

STPs with p-value copulae Asymptotics Copula calibration Application

FWER control at level α = 0.3 via contour lines of the copula C ϑ

Cross-check: Φ

−1

(1 −

αloc.

/

2

) is equal to the tabulated normal quantile for the chosen parameters. The structural information provided by C

ϑ

increases power!

If one hypothesis is more important than the other,

just change the slope of the blue straight line.

(16)

STPs with p-value copulae Asymptotics Copula calibration Application

FWER control at level α = 0.3 via contour lines of the copula C ϑ

We obtain α

loc.

≈ 0.2.

Cross-check: Φ

−1

(1 −

αloc.

/

2

) is equal to the tabulated normal quantile for the chosen parameters.

The structural information provided by C

ϑ

increases power!

If one hypothesis is more important than the other,

just change the slope of the blue straight line.

(17)

Unknown copula C ϑ

In the case that we are willing to assume 1. - 3., but do not know the copula C

ϑ

, we propose:

• Parametric copula estimation

(e. g., via Spearman’s ρ and/or Kendall’s τ and/or Hoeffding’s lemma)

• Nonparametric copula estimation (e. g., with Bernstein copulae)

• Modeling with structured (hierarchical) copulae (e. g., for block dependencies)

• Approximating contour lines by resampling or statistical learning techniques

These are research topics within our Research Unit FOR 1735

”Structural Inference in Statistics: Adaptation and Efficiency”.

(18)

STPs with p-value copulae Asymptotics Copula calibration Application

Extended model setup with copula parameter

Extended model for the family of probability measures:

P = (P

ϑ,η

: ϑ ∈ Θ, η ∈ Ξ)

ϑ ∈ Θ Parameter of interest (H

j

⊂ Θ, 1 ≤ j ≤ m), η ∈ Ξ Nuisance (copula) parameter

representing the dependency structure Fundamental assumption: η does not depend on ϑ.

FWER control in the extended model:

sup

ϑ∈Θ,η∈Ξ

FWER

ϑ,η

(ϕ) ≤ α.

!

LFC ϑ

∈ H

0

: Put P

η

= P

ϑ

and FWER

η

(ϕ) = FWER

ϑ

(ϕ).

(19)

Empirical calibration of critical values

We recall for a multiple test ϕ with test statistics T

1

, . . . , T

m

and critical values c

1

, . . . , c

m

under our general assumptions 1. - 3.:

FWER

ϑ,η

(ϕ) ≤ FWER

η

(ϕ) = P

η

m

[

j=1

{T

j

> c

j

}

= 1 − C

η

(F

1

(c

1

), . . . , F

m

(c

m

)).

Empirical calibration of ϕ:

• Assume that the dependence structure of T is determined by the copula function C

η0

, η

0

∈ Ξ.

• Utilization of an estimate ˆ η for η

0

leads to the empirically calibrated critical values ˆc = c(ˆ η) and the calibrated test ˆ ϕ.

• Calibrated local significance levels: Take u(ˆ η) from the set

C

η−1ˆ

(1 − α) and put α

(j)loc.

= 1 − u

j

(ˆ η), 1 ≤ j ≤ m.

(20)

STPs with p-value copulae Asymptotics Copula calibration Application

Empirical calibration of critical values

We recall for a multiple test ϕ with test statistics T

1

, . . . , T

m

and critical values c

1

, . . . , c

m

under our general assumptions 1. - 3.:

FWER

ϑ,η

(ϕ) ≤ FWER

η

(ϕ) = P

η

m

[

j=1

{T

j

> c

j

}

= 1 − C

η

(F

1

(c

1

), . . . , F

m

(c

m

)).

Empirical calibration of ϕ:

• Assume that the dependence structure of T is determined by the copula function C

η0

, η

0

∈ Ξ.

• Utilization of an estimate ˆ η for η

0

leads to the empirically calibrated critical values ˆc = c(ˆ η) and the calibrated test ˆ ϕ.

• Calibrated local significance levels: Take u(ˆ η) from the set

C

η−1ˆ

(1 − α) and put α

(j)loc.

= 1 − u

j

(ˆ η), 1 ≤ j ≤ m.

(21)

Regard FWER

η

0

(ϕ) as a derived parameter of the copula model for T.

Theorem:

Assume that C

η0

∈ {C

η

|η ∈ Ξ ⊆ R

p

}, p ∈ N.

Suppose an estimator ˆ η

n

: Ω → Ξ of η

0

fulfilling

√ n(ˆ η

n

− η

0

) → N

d p

(0, Σ

0

) as n → ∞.

Then, under standard regularity assumptions, it holds:

a) Asymptotic Normality (Delta method)

√ n FWER

η0

( ˆ ϕ) − α 

d

→ N (0, σ

2η0

).

b) Asymptotic Confidence Region (ˆ σ

n2

consistent for σ

2η0

)

n→∞

lim P

η0

 √

n FWER

η0

( ˆ ϕ) − α ˆ

σ

n

≤ z

1−δ



= 1 − δ.

(22)

STPs with p-value copulae Asymptotics Copula calibration Application

Three ”inversion formulas”

Lemma:

X and Y real-valued random variables with marginal cdfs F

X

and F

Y

and bivariate copula C

η

, depending on a copula parameter η.

σ

X,Y

: Covariance of X and Y

ρ

X,Y

: Spearman’s rank correlation coefficient (population version) τ

X,Y

: Kendall’s tau (population version)

Then it holds:

σ

X,Y

= f

1

(η) = Z

R2

[C

η

{F

X

(x), F

Y

(y)}

−F

X

(x)F

Y

(y)] dx dy, ρ

X,Y

= f

2

(η) = 12

Z

[0,1]2

C

η

(u, v) du dv − 3, τ

X,Y

= f

3

(η) = 4

Z

[0,1]2

C

η

(u, v) dC

η

(u, v) − 1.

(23)

Example: Gumbel-Hougaard copulae

(One-parametric Archimedean copula)

C

η

(u

1

, . . . , u

m

) = exp

 −

m

X

j=1

(− ln(u

j

))

η

1/η

 , η ≥ 1.

Taking m = 2, we obtain

τ

η

= η − 1 η and, consequently,

η = (1 − τ )

−1

. (1)

Thus, η can easily be calibrated by a method of moments

(plug-in of an augmented sample version of τ into (1)).

(24)

STPs with p-value copulae Asymptotics Copula calibration Application

Gumbel-Hougaard copulae and max-stability

Proposition: (max-stability of Gumbel-Hougaard copulae) For all η ≥ 1 and (u

1

, . . . , u

m

)

>

∈ [0, 1]

m

, it holds:

1. C

η

is a max-stable copula, i. e.,

∀n ∈ N : C

η

(u

1

, . . . , u

m

)

n

= C

η

(u

n1

, . . . , u

nm

).

2. It exists a family of copulas such that for any member C, it holds

n→∞

lim



C(u

1/n1

, . . . , u

1/nm

) 

n

= C

η

(u

1

, . . . , u

m

).

=⇒ Applications of Gumbel-Hougaard copulae

in multivariate extreme value statistics

(25)

Example: Multiple support tests

X

1

, . . . , X

n

: sample of iid. random vectors with values in [0, ∞)

m

, each of which distributed as X = (X

1

, . . . , X

m

)

>

with

∀1 ≤ j ≤ m : X

j

= ϑ

d j

Z

j

, ϑ

j

> 0, where Z

j

has cdf. F

j

: [0, 1] → [0, 1].

Parameter of interest: ϑ = (ϑ

1

, . . . , ϑ

m

)

>

∈ Θ = (0, ∞)

m

. Multiple test problem (ϑ

j

: 1 ≤ j ≤ m given constants):

H

j

: {ϑ

j

≤ ϑ

j

} versus K

j

: {ϑ

j

> ϑ

j

}, j = 1, . . . , m Test statistics: T

j

= max

1≤i≤n

X

i,j

j

, 1 ≤ j ≤ m

If the copula of X is in the domain of attraction of some C

η

,

our theory applies, at least asymptotically.

(26)

STPs with p-value copulae Asymptotics Copula calibration Application

An application to exchange rate risks

Consider daily exchange rates:

EUR/CNY, EUR/HKD, EUR/MXN, and EUR/USD.

Data from 01/07/2010 to 30/06/2014 (http://sdw.ecb.europa.eu) were transformed into log-returns.

Entire sample was split into two sub-samples, where the first sub-sample consists of the data for the first three years.

Research question:

For which of the four time series does the tail behavior of the

returns remain stable during the fourth year of analysis?

(27)

Stochastic model for extreme returns

It is common practice to model excesses over large thresholds u by generalized Pareto distributions (GPDs) with cdf

G

ξ,ϑ

(x) =

 1 − (1 + ξx/ϑ)

−1/ξ

, ξ 6= 0, 1 − exp(−x/ϑ), ξ = 0, where x ≥ 0 for ξ ≥ 0 and 0 ≤ x ≤ −ϑ/ξ if ξ < 0.

Table: Maximum likelihood estimates of the GPD parameters based on data from 01/07/2010 until 30/06/2013

Parameter EUR/CNY EUR/HKD EUR/MXN EUR/USD ξ -0.18027 -0.14824 -0.05606 -0.22055

(0.09342) (0.09707) (0.10757) (0.06810)

ϑ 0.00315 0.00309 0.00485 0.00403

(0.00046) (0.00046) (0.00076) (0.00044) x0= u − ϑ/ξ 0.02503 0.02868 0.09441 0.02620

(28)

STPs with p-value copulae Asymptotics Copula calibration Application

Results of the data analysis on second sub-sample

Table: Lower confidence limits for ϑ

j

and x

0,j

, 1 ≤ j ≤ 4, for the second time period from 01/07/2013 until 30/06/2014

ϑj

EUR/CNY EUR/HKD EUR/MXN EUR/USD Bonferroni 0.002384 0.002189 0.002248 0.002691 Sid ´akˇ 0.002387 0.002192 0.002253 0.002694 Gumbel Gηˆ 0.002510 0.002321 0.002449 0.002809

x0,j

EUR/CNY EUR/HKD EUR/MXN EUR/USD Bonferroni 0.020769 0.022605 0.047982 0.020143 Sid ´akˇ 0.020784 0.022625 0.048063 0.020155 Gumbel Gηˆ 0.021465 0.023501 0.051565 0.020678

(29)

References

Gabriel, K. R. (1969). Simultaneous test procedures - some theory of multiple comparisons. Ann. Math. Stat., Vol. 40, 224-250.

Hothorn, T., Bretz, F., Westfall, P. (2008). Simultaneous Inference in General Parametric Models. Biometrical Journal, Vol. 50, No. 3, 346-363.

R ¨uschendorf, L. (2009). On the distributional transform, Sklar’s theorem, and the empirical copula process. J. Stat.

Plann. Inference, Vol. 139, No. 11, 3921-3927.

Sklar, A. (1996). Random variables, distribution functions, and copulas - a personal look backward and forward. In:

Distributions with Fixed Marginals and Related Topics.

Institute of Mathematical Statistics, Hayward, CA, 1-14.

References

Related documents

Quality: We measure quality (Q in our formal model) by observing the average number of citations received by a scientist for all the papers he or she published in a given

○ If BP elevated, think primary aldosteronism, Cushing’s, renal artery stenosis, ○ If BP normal, think hypomagnesemia, severe hypoK, Bartter’s, NaHCO3,

Proprietary Schools are referred to as those classified nonpublic, which sell or offer for sale mostly post- secondary instruction which leads to an occupation..

Newby indicated that he had no problem with the Department’s proposed language change.. O’Malley indicated that the language reflects the Department’s policy for a number

All the figures and tables should be labeled (Times New Roman 11) and included in list of figures and list of tables respectively.

In this study, it is aimed to develop the Science Education Peer Comparison Scale (SEPCS) in order to measure the comparison of Science Education students'

This type of tyre was first developed by Litchfield of Goodyear Tyre having an extra layer of rubber inside the tyre, which acts as an envelope.. These tyres are known as tubeless

The research that followed this literature review used four focus groups to describe video edited sequences of in vivo art therapy for people with borderline personality disorder.. A