• No results found

Construction of a copula estimator through recursive partitioning of the unit hypercube

N/A
N/A
Protected

Academic year: 2021

Share "Construction of a copula estimator through recursive partitioning of the unit hypercube"

Copied!
33
0
0

Loading.... (view fulltext now)

Full text

(1)

Construction of a copula estimator through recursive partitioning of the unit hypercube

O. Laverny1,2 V. Maume-Deschamps1 E. Masiello1 D. Rulli`ere1 April 17, 2020

1Universit´e Lyon 1

2SCOR SE

(2)

Table of contents

1. Introduction

2. Piecewise linear copulas 3. The Cort Estimator 4. Asymptotic behavior

5. Bagging and cross-validation 6. Simulation Study

7. Conclusion

1/24

(3)

Introduction

(4)

Copulas Basics

Suppose that X is a (continuous) random vector of dimension d with c.d.f F and marginals c.d.f (Fi)i ∈{1,...,d }. Then Sklar’s theorem [6] gives us the copula of X as :

C (u) = F F1−1(u1), ..., Fd−1(ud)

• C is a c.d.f with uniform margins on [0, 1].

• It characterises the dependence structure of F in the sense that F is completely characterised by C and the Fi’s.

The estimation of the copula is a wide-treated subject: there exists a lot of parametric distributions that can be fitted. Some

non-parametric models exists but are facing problems in high dimensions.

2/24

(5)

Density estimation trees

In regression,the CART algorithm from Breiman [3] selects a covariate and a univariate breakpoint, minimizing a loss, and assign to each leaf the mean response inside the leaf.

In density estimation,the DET from Ram and Gray [4] selects a dimension and a breakpoint minimizing a loss, and assign to each leaf the frequency of observations:

f (x ) =X

`∈L

f` λ(`)1x ∈`

• What loss can we use ?

• Will this yield a copula if applied to pseudo-observations ?

3/24

(6)

Piecewise linear copulas

(7)

Definition

Let I = [0, 1]d be the unit hypercube and L a partition of I.

Definition (Piecewise linear copula)

Let the piecewise linear copula be defined by its distribution function:

∀u ∈ I, Cp,L(u) =X

`∈L

p`λ`(u)

• λ`(u) = λ([0,u]∩`)λ(`) where λ is the lebesgue measure.

• p is a vector of weights summing to one.

Corresponding density : cp,L(u) = P

`∈L p`

λ(`)1u∈`

4/24

(8)

Copula constraints

We restrict the leaves in L to be hyperboxes of the form [a, b].

Property (Copula constraints are linear in the weights) Cp,L is a proper copula

⇐⇒

p ∈ CL= {p ∈ R|L| : Bp = g and p ≥ 0}

B1= (λ`i(ui))(i ,u)∈{1,...,d }×ML, `∈L (size nd × |L|)

B2= (1)`∈L (size 1 × |L|)

g1= (ui)(i ,u)∈{1,...,d }×ML (size nd )

B = (B1, B2) (size (nd + 1) × |L|)

g = (g1, 1) (size (nd + 1))

Where MLis the set of middle-points of leaves in L. 5/24

(9)

Kendall τ , Spearman ρ

Definition

τ = 4 Z

C (u) c(u) du − 1, and ρ = 12 Z

C (u)du − 3, and K (t) = P(C (U) ≤ t)

Property (Piecewise linear class)

τ = −1 + 2X

`∈L k∈L

d

Y

i =1

(bi∧ di− ai∧ ci) (bi∧ di+ ai∧ ci− 2ci)

+ 2 (di− ci) (bi− ai∧ di)

ρ = −3 + 6X

`∈L

p` d

Y

i =1

(2 − bi− ai)

where we denoted ` = (a, b] and k = (c, d ], ∧ denotes the minimum operator and γ+is the upper regularised gamma function.

6/24

(10)

The Cort Estimator

(11)

An integrated square error loss...

We use the integrated square error between densities.

kcp,L− ck22 ≈ kcp,Lk22− 2 hcp,L, ci (additive indep.)

≈ kcp,Lk22−2 n

n

X

i =1

cp,L(ui) (MC plug-in)

=X

`∈L

p`2

λ(`) − 2X

`∈L

p`f`

λ(`) (f = emp. freq)

= p0Ap − 2p0Af (A = diag (λ(`)−1)

= kpk2L− 2 hp, fiL

Where hx , y iL= P

`∈L x`y`

λ(`) is, indeed, a scalar product.

7/24

(12)

... yields a simple quadratic program

The weights p that minimize the integrated square error for a given partition L are given by the following:

Definition (Quadratic program)

p is the solution to the quadratic program : arg min

p∈CL

kpk2L− 2 hp, fiL which is the projection of f onto CL regarding k.k2L. We denote p = PCL(f) this projection.

Note that without the constraints,p = f.

8/24

(13)

Joint optimisation fo the breakpoint and the weights For a set of dimensions D in P({1, ..., d }), let L(`, x , D) be the partition of the leaf ` splitted on a point x in dimensions D, i.e:

L((a, b], x , D) = ×

j ∈D

{(aj, xj], (xj, bj]} ×

j ∈{1,...,d }\D

{(aj, bj]} . Define then the full partition by:

Lx ,D= L \ {`(x )} ∪ L(`(x ), x , D).

We will omit the parameter D if D = {1, .., d }.

Definition (Final optimisation problem)

The global optimisation problem we want to solve is : arg min

D ∈ P({1,...,d }) x ∈ I

p ∈ CLx,D

kpk2L

x ,D − 2 hp, fLx,DiLx ,D

9/24

(14)

The recursive procedure

1. Solve the density problem:

arg min

D ∈ P({1,...,d }) x ∈ I

−kfLx,Dk2L

x ,D

• Find the splitting dimensions D first

• Minimize greedily on x via a non-linear programming solver.

2. Recurse on each ` in Lx ,D by rescaling ` to I and solving the same problem to obtain the final partition L.

3. Then, with L fixed, solve the projection:

arg min

p ∈ CL

kpk2L− 2 hp, fLiL

via a quadratic programming solver, with inital values fL.

10/24

(15)

Finding the splitting dimensions D (for U ∼ C )

Hypothesis (Hj)

(Uj ⊥⊥ U−j) |U ∈ ` and Uj|U ∈ ` ∼ U (`j)

Bowman [2] : Suppose that ` = I, containing n obs. of the random vairable U ∼ F , for F the restriction of C to `, rescaled to I. Then:

Definition (Test statistic)

Denote by ff,L(n) the piecewise constant density that will be estimated on data U1, ..., Un∼ F , and ej ,n(x ) = E(ff,L(n)(x )|Hj).

The test statistic is given by :

Ij = kej ,n− ff,L(n)k22

where L, ej ,n and ff,L(n)are stochastic objetcs, depending on U.

11/24

(16)

Test procedure

We weakened the test by assuming that the next split is enough to test Hj. This gives a test procedure as follows:

1. Solve

x= arg min

x ∈ I

−kfLxk2Lx 2. Compute:

Ibj = X

k∈Lx ∗,{1,...,d }\{j }

 fk2

λ(k)+ X

`∈Lx ∗,{1,..,d }

`⊂k

 f`2

λ(`)− 2fkf`

λ(k)



 .

Argument : the cut will be on the same x in dimensions other than j wheter or not we work under Hj.

3. Compare to a Monte-carlo simulation of its distribution under the null to exclude the dimension j if necessary.

12/24

(17)

Asymptotic behavior

(18)

Previous result

Ram and Gray [4] gave the consistency of ff,L(n). Assuming the maximum diameter of leaves goes to 0 as n goes to ∞, we have :

P



n7→+∞lim kff,L(n)− f k22 = 0



= 1.

Denoting q s.t:

∀` ∈ L, q` = Z

`

c(u)du,

this results writes dL(f, q)2→ 0, a.s.

Furthermore, by construction, q ∈ C.

13/24

(19)

Constraint influence

Definition (Integrated constraint influence) kcp,L(n) − ff,L(n)k22 = dL(p, f)2

This quantity measures how much f and p are far from each other.

But since f is closer and closer to q, which is in the set that f is projected on to give p, we have :

Property (Asymptotical effect of constraints) The integrated constraint influence is asymptotically 0.

Proof.

C is convex, closed and non-empty. Hence p = PC(f) exist and is unique. Since q ∈ C, we have that dL(f, p)2 ≤ dL(f , q)2.

14/24

(20)

Cort consistency

Property (Consistency)

For c the density of the true copula, assuming the diameter of the leaves goes to 0 as n goes to ∞, the estimator cp,L(n) is consistent, i.e :

P



n7→+∞lim kcp,L(n)− ck22 = 0



= 1

Proof.

kcp,L(n)− ck22 = dL(p, q)2 and dL(p, q)2 ≤ dL(f, q)2.

15/24

(21)

Bagging and cross-validation

(22)

A simple forest

In regression : See Breiman [3]

In density estimation,Kernels uses leave-one-out for bandwidths.

Sain, Baggerly and Scot [5] formalized the cross-validation process for density estimation. The more involved out-of-bag procedure we propose is inspired by Wu [7].

Definition (Out-of-bag ”density” and metrics)

coob(u) = 1 N(u)

N

X

j =1

c(j )(u)1u was not in the training set of c(j )

Joob(cN) = kcNk22−2 n

n

X

i =1

coob(ui)

KLoob(cN) = Z

c(u) ln c(u) cN(u)



≈ −1 n

n

X

i =1

ln(coob(xi))

16/24

(23)

A weighted forest

After fitting the treesc1, ..., cN, we can assign weights to them minimizing an out-of-bag integrated square error for the forest :

Definition (Out-of-bag ”density” and metrics, weighted case)

coobw (u) = 1 W (u)

N

X

j =1

wjc(j )(u)1u was not in the training set of c(j )

Where W (u) is the sum of wj’s for trees that did not see u. Then : Definition (Optimal weights)

w= arg min

w

Joob(cNw)

17/24

(24)

Simulation Study

(25)

The Cort estimator is implemented in thecort R package, avaliable on CRAN.

The dataset is as follows :

• Simulation of 200 points from a 3-dimensional clayton copula with θ = 8 for marginals 1, 3 and 4.

• The second marginal is added as independent uniform draws.

• The fourth marginal is flipped, inducing anticomonotonicity.

Marginals 1, 3 and 4 exibit strong dependency.

18/24

(26)

Figure 1: The dataset we will use.

(27)

Figure 2: In gray scale, we observe a bivariate histogram of the simulation from the estimated tree. The small red points represent the input data.

(28)

Figure 3: Left : Oob Kullback-leibler and Oob ISE if function of the number of trees;

Right : Constraint influence and L2-Norm in function of weights

(29)

Figure 4: Top row: kendall’s taus. Bottom row: Spearman’s rho. Left: empirical values from burn-in data. Right : values from the fitted models. The size of the subsamples is in abssissa.

(30)

Table 1: Statitics of several models on the Dataset

Empirical Cb(m=10) Cb(m=5) Beta Cort Bagged Cort Kendall Taus

τ1,2 -0.01 0.00 0.02 -0.01 0.00 -0.04

τ1,3 -0.80 -0.75 -0.68 -0.80 -0.78 -0.56

τ1,4 0.78 0.73 0.66 0.78 0.71 0.54

τ2,3 0.03 0.02 0.00 0.02 0.00 0.05

τ2,4 -0.03 -0.02 -0.01 -0.04 0.00 -0.05

τ3,4 -0.78 -0.73 -0.65 -0.77 -0.69 -0.55

Spearman Rhos

ρ1,2 -0.02 0.00 0.02 -0.02 0.00 -0.02

ρ1,3 -0.93 -0.91 -0.87 -0.93 -0.93 -0.72

ρ1,4 0.93 0.90 0.86 0.93 0.87 0.70

ρ2,3 0.04 0.02 0.00 0.04 0.00 0.04

ρ2,4 -0.05 -0.03 -0.01 -0.06 0.00 -0.04

ρ3,4 -0.92 -0.90 -0.85 -0.92 -0.86 -0.71

Bagging Results

KLoob Inf 4.48 3.80 -4.55 -5.15 NaN

23/24

(31)

Conclusion

(32)

Take away and potential improvements

Some take away points:

• Piecewise linear distribution function are handy models for copula modeling since the copula constraints have a nice expression

• Fitting piecewise linear d.f with trees is quite simple and fast

• The main issue is the degree of freedom in weights took away by the copula constraint.

• Such models can easily be bagged, boosted, cross-validated...

24/24

(33)

References

20erard Biau, Erwan Scornet, and Johannes Welbl. “Neural Random Forests”.

In: (Apr. 25, 2016).

20A.W. Bowman. “Density Based Tests for Goodness-of-Fit”.In: Journal of Statistical Computation and Simulation 40.1-2 (Feb. 1992), pp. 1–13. issn:

0094-9655, 1563-5163.

20Leo Breiman. “Out-of-Bag Estimation”.In: (1996), p. 13.

20Parikshit Ram and Alexander G. Gray. “Density Estimation Trees”.In:

Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’11. The 17th ACM SIGKDD International Conference. San Diego, California, USA: ACM Press, 2011, p. 627. isbn:

978-1-4503-0813-7.

20Stephan R Sain, Keith A Baggerly, and David W Scott. “Cross-Validation of Multivariate Densities”.In: (1994), p. 12.

20A Sklar. “Fonctions de Repartition `a n Dimension et Leurs Marges”.In:

Universit´e Paris 8.3.2 (1959), pp. 1–3.

20Kaiyuan Wu, Wei Hou, and Hongbo Yang. “Density Estimation via the Random Forest Method”.In: (2017), p. 28.

References

Related documents

Thus, if we can learn the usage relations among API elements (i.e. characterizing an API via its surrounding APIs), when we know some of the corresponding APIs in two

as to whether the financial statements presented fairly the financial position of UNAIDS as at 31 December 2012, the results of its financial performance, changes in net

AIDS Healthcare Foundation CONTACTS FOR ADMINISTRATION: Medical Case Management (MCM) Address for Agency's Headquarters (HQ): Official Contact for Programmatic, Day-to-Day and

Key words: Quasi-likelihood estimator, minimum contrast estimator, least-squares estimator, least absolute deviation estimator, maximum likelihood estimator, uniform strong law of

The PLC-5 controller stands at the center of a control architecture that brings together existing and future systems by means of networks such as EtherNet/IP, ControlNet and

This paper aims at shedding light on the role that guarantees and collateral play within the capital requirements argument and, consequently on the interest rate Italian banks

Our proposed punishment of the borrower (if she deviates from any ongoing path at date t) is the perfect equilibrium path starting in period t + 1 that gives all the surplus for

70 million tonnes Production value: 8.2 billion NOK 1,03 billion EUR Export: 65 percent of production Number of gravel and crushed rock