Open Access Dissertations Theses and Dissertations
8-2016
Maximum empirical likelihood estimation in
U-statistics based general estimating equations
Lingnan Li
Purdue University
Follow this and additional works at:https://docs.lib.purdue.edu/open_access_dissertations
Part of theMathematics Commons, and theStatistics and Probability Commons
This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information.
Recommended Citation
Li, Lingnan, "Maximum empirical likelihood estimation in U-statistics based general estimating equations" (2016).Open Access Dissertations. 796.
PURDUE UNIVERSITY GRADUATE SCHOOL Thesis/Dissertation Acceptance
This is to certify that the thesis/dissertation prepared By
Entitled
For the degree of
Is approved by the final examining committee:
To the best of my knowledge and as understood by the student in the Thesis/Dissertation Agreement, Publication Delay, and Certification Disclaimer (Graduate School Form 32), this thesis/dissertation adheres to the provisions of Purdue University’s “Policy of Integrity in Research” and the use of copyright material.
Approved by Major Professor(s):
Approved by:
Head of the Departmental Graduate Program Date
Lingnan Li
Maximum Empirical Likelihood Estimation in U-statistics Based General Estimating Equations
Doctor of Philosophy Hanxiang Peng Chair Benzion Boukai Guang Cheng Wei Zheng Hanxiang Peng Evgeny Mukhin 7/23/2016
BASED GENERAL ESTIMATING EQUATIONS
A Dissertation Submitted to the Faculty
of
Purdue University by
Lingnan Li
In Partial Fulfillment of the Requirements for the Degree
of
Doctor of Philosophy
August 2016 Purdue University Indianapolis, Indiana
ACKNOWLEDGMENTS
I want to acknowledge the effort from Professor Hanxiang Peng, who graciously became to be my thesis advisor four years ago and inspired me to start the research in the area of empirical likelihood.
I would like to thank Professor Fei Tan, who helped me on the projects; to thank Professor Benzion Boukai, who admitted me to the statistics program long time ago and supported me through the time; to thank Professor Jyoti Sarkar, Professor Fang Li, and Professor Wei Zheng; and to thank Professor Guang Cheng for serving on my dissertation committee.
I would also like to acknowledge the effort from Professor Zhongmin Shen, who helped me become part of the Ph.D. program and one of Professor Peng’s students.
Finally, I would like to thank the Department of Mathematical Sciences of IUPUI, without whom this thesis would not have been possible at all.
TABLE OF CONTENTS Page LIST OF TABLES . . . v SYMBOLS . . . vi ABBREVIATIONS . . . viii ABSTRACT . . . ix 1 INTRODUCTION . . . 1
2 JACKKNIFE EMPIRICAL LIKELIHOOD FOR VECTOR U-STATISTICS 5 2.1 Vector U-statistics . . . 5
2.2 JEL for vector U-statistics . . . 6
3 MAXIMUM EMPIRICAL LIKELIHOOD ESTIMATION . . . 9
3.1 Asymptotic behaviors of the local logarithm of the JEL ratio . . . . 9
3.2 Some lemmas . . . 12
3.3 Proof of Theorem 3.1.1 . . . 14
4 EASY MAXIMUM EMPIRICAL LIKELIHOOD ESTIMATION . . . 15
4.1 MELE’s and semiparametric effeciency . . . 15
4.2 Easy maximum empirical likelihood estimation . . . 18
5 APPLICATIONS: EXAMPLES . . . 23
5.1 Estimating the expected values with side information . . . 23
5.2 Smoothed U-quantiles . . . 30
6 AN EL APPROACH TO GOODNESS-OF-FIT TESTING FOR MULTI-VARIATE AND HIGH DIMENSIONAL SYMMETRIES . . . 33
6.1 Testing multivariate symmetries . . . 33
6.1.1 Spherical symmetry . . . 33
6.1.2 Rotational symmetry . . . 36
6.1.3 Antipodally symmetric distributions . . . 38
6.1.4 Coordinatewise symmetry . . . 38
6.1.5 Exchangeability . . . 39
6.2 Simulation study . . . 40
6.2.1 Testing the center of spherical symmetry . . . 40
6.2.2 Testing rotational symmetry . . . 45
6.3 Proof of Theorem 6.1.2 . . . 45
Page
7.1 Proofs for Lemmas in Chapter 3 . . . 51
7.2 Proofs for Lemmas and Theorems in Chapter 4 . . . 65
REFERENCES . . . 70
LIST OF TABLES
Table Page
6.1 Case 1: The simulated α = .05 level of significance of the EL test about
the center of spherical symmetryH0 :θ=0withJ =r,d= dim,n= 400
and m= 2000 repetitions. Data generated from multivariate normal. . 41
6.2 Case 1: The simulated power of the test about the center of spherical
symmetry H0 :θ =0 vs H1 :θ = rep(0.4,dim) with J =r, d = dim, and
m = 2000 repetitions at α = 0.05 level of significance. Data generated
from multivariate normal. . . 42
6.3 Case 2: The simulated power of the test about the center of spherical
symmetry H0 : θ = 0 vs H1 : θ = θ1 with J = r, d = dim, n = 400 and
m = 2000 repetitions at α = 0.05 level of significance. Data generated
from multivariate normal. . . 43
6.4 Case 3: The simulated α = .05 level of significance of the test about the
center of spherical symmetry H0 : θ =0 with J = K =r, d = L = dim,
n = 50, and m = 100 repetitions. Data generated from multivariate
normal. . . 44
6.5 Case 3: The simulated power of the test about the center of spherical
symmetry H0 : θ = 0 vs H1 : θ = θ1 with J = K = r, d = L = dim,
n = 50, and m = 100 repetitions at α = 0.05 level of significance. Data
generated from multivariate normal. . . 44
6.6 Case 1: The simulated α= 0.05 level of significance of the test about the
direction of rotational symmetry H0 : θ = (0,0,1)> with d = 3, J = r,
and m = 2000 repetitions. Data generated from the von Mises-Fisher
distribution. . . 46
6.7 Case 2: The simulated power of the test about rotational symmetry H0 :
θ = (0,0,1)> vs H1 : θ = (0.14,0.14,0.98)> with d = 3, J = r, and
m = 2000 repetitions at α = 0.05. Data generated from the von
SYMBOLS
X, Y, Z, . . . column random variables (vectors) x, y, z, . . . realizations of X, Y, Z, . . .
a, b, c, . . . constants
α, β, θ, . . . column parameter vectors
N(θ0) a neighborhood of θ0
N (µ,Σ) normal distribution with µ and variance (matrix) Σ
R the real space
Rm the m dimensional real (column) vector space
Rn,Rˆn empirical likelihood (ratio)
π1, . . . , πn probability weights
P, E, V ar, Cov probability, expected value, variance, covariance f, g, h, . . . column vector functions
˜
f the centered vector function off i.e. ˜f =f −E(f(Z))
Unm(h) a U-statistic of order m with kernel h
θ0 the true value of parameter
Θ the parameter space
kak the euclidean norm of a vectora
a> the transpose of a vector a
A> the transpose of a matrix A
A−1 the inverse of a matrix A
A⊗2 AA>
˙
f(z;θ) ∂θ∂f(z;θ)∈Rk (column vector) for θ ∈
Rk, f ∈R
∂
∂θ>f(z;θ) a row vector of partial derivatives for θ∈Rk, f ∈R ˙
f(z;θ) ∂θ∂>f(z;θ) the m×k matrix for θ ∈Rk, f ∈Rm ˙
f(z;θ)> ∂θ∂f(z;θ)> the k×m matrix for θ ∈Rk, f ∈
¨
f(z;θ) ∂θ∂θ∂2>f(z;θ) = ∂θ∂>∂θ∂f(z;θ) the k×k matrix forθ ∈Rk, f ∈R
→p convergence in probability
ABBREVIATIONS
a.s. almost sure convergence
DF Distribution Function
CDF Cumulative Distribution Function
EDF Empirical Distribution Function
EL Empirical Likelihood
GEE General Estimating Equation
i.i.d. Independent and Identically Distributed
JEL Jackknife Empirical Likelihood
MELE Maximum Empirical Likelihood Estimate
MVUE Minimum Variance Unbiased Estimate
r.v. Random Variable
ABSTRACT
Li, Lingnan Ph.D., Purdue University, August 2016. Maximum Empirical Likelihood
Estimation in U-statistics Based General Estimating Equations. Major Professor:
Hanxiang Peng.
In the first part of this thesis, we study maximum empirical likelihood estimates (MELE’s) in U-statistics based general estimating equations (UGEE’s). Our technical maneuver is the jackknife empirical likelihood (JEL) approach. We give the local uniform asymptotic normality condition for the log-JEL for UGEE’s. We derive the estimating equations for finding MELE’s and provide their asymptotic normality. We obtain easy MELE’s which have less computational burden than the usual MELE’s and can be easily implemented using existing software. We investigate the use of
side information of the data to improve efficiency. We exhibit that the MELE’s
are fully efficient, and the asymptotic variance of a MELE will not increase as the number of UGEE’s increases. We give several important examples and demonstrate that efficient estimates of moment based distribution characteristics in the presence of side information can be obtained using JEL for U-statistics.
In the second part, we propose several JEL goodness-of-fit tests for spherical symmetry, rotational symmetry, antipodal symmetry, coordinatewise symmetry and exchangeability. We employ the jackknife empirical likelihood for vector U-statistics to incorporate side information. We use estimated constraint functions and allow the number of constraints and the dimension to grow with the sample size so that these tests can be used to test hypotheses for high dimensional symmetries. We demonstrate that these tests are distribution free and asymptotically chisquare distributed. We conduct extensive simulations to evaluate the performance of these tests.
1. INTRODUCTION
Empirical likelihood (EL) is a data-driven likelihood approach with nonparametric nature which is effective and requires few assumptions about the distribution of the data. Owen (1988, 1990, 1991) showed that the empirical likelihood ratio statistics have the limiting chi-square distribution under mild conditions. He also demonstrat-ed that tests and confidence intervals can be constructdemonstrat-ed. The empirical likelihood theory has been successfully extended to various areas of statistics with tremendous
accomplishments. These include Bartlett correction (DiCiccio, et al., 1991),
gener-alized linear models (Kolaczyk, 1994), heteroscedastic partially linear models (Lu, 2009), partially linear models (Shi and Lau, 2000; Wang and Jing, 2003), paramet-ric and semiparametparamet-ric models in multiresponse regression (Chen and Van Keilegom, 2009), right censored data (Li and Wang, 2003), U-statistics with side information
(Yuan, et al., 2012), and stratified samples with nonresponse (Fang, et al., 2009).
Qin and Lawless (1994) linked empirical likelihood with finitely many estimating
e-quations and investigated maximum empirical likelihood estimators. Chen, et al.
(2009) obtained asymptotic normality for the number of constraints growing to
infin-ity. Hjort,et al. (2009) and Peng and Schick (2013a, 2013b) generalized the empirical
likelihood approach to allow for the number of constraints to grow with the sample size and for the constraints to use estimated criteria functions. Algorithms, calibra-tion and higher-order precision of the approach can be found in Hall and La Scala (1990), Emerson and Owen (2009) and Liu and Chen (2010) among others.
In Owen’s homepage (http://statweb.stanford.edu/∼owen/empirical/) software
can be found. Here are two algorithms from this site: scel.R (R function to compute
empirical likelihood using a self-concordant convex criterion) andel.R(Mai Zhou’s R
U-statistics is a class of statistics which is especially useful in estimation. Many popular statistics such as high order moments information can be expressed by
U-statistics, see e.g. Serfling (1980), Kowalski and Tu (2008) and Lee (1990). Yuan, et
al. (2012) explored maximum empirical likelihood estimates (MELE’s) in U-statistics
with side information. However, usual EL method runs into serious computational difficulties when it’s applied to U-statistics. U-statistics are not independent but correlated so that they do not satisfy the independence or at least asymptotic inde-pendence which is assumed by the definition of empirical likelihood. Moreover, unlike the usual empirical likelihood, the nonlinearality of EL weightsπj’s in the constraints
equations results in that there are no explicit solutions for the EL weights. Jing, et
al. (2009) identified the asymptotic independence of the jackknife pseudo values of a
U-statistic and introduced their jackknife empirical likelihood (JEL) for U-statistics, and showed its effectiveness in handling one- or two-sample U-statistics. Some nice properties of the jackknife pseudo values of a U-statistic were exploited to establish the Wilks theorems for their cases. For example, the average of the jackknife pseudo values is equal to the U-statistic, and the sample variance of them is an asymptotically unbiased estimator of the asymptotic variance of the U-statistic.
Motivated by applications to goodness of fit U-statistic testing, Peng and Tan (2016) gave two approaches to justify the JEL for vector U-statistics and proved the Wilks theorems. They extended empirical likelihood for general estimating equations (GEE’s) to U-statistics based general estimating equations (UGEE’s). The results were extended to allow for the use of estimated constraints and for the number of constraints to grow with the sample size. They exhibited that the JEL can be used to construct EL tests for moment based distribution characteristics (e.g. skewness, coefficient of variation) with less computational burden and more flexibility than the usual EL. This can be done in the U-statistic representation approach and the vector U-statistic approach which were illustrated with several examples including JEL tests for Pearson’s correlation, Goodman-Kruskal’s Gamma, overdisperson, U-quantiles, variance components, and simplicial depth. They showed that tests are
asymptotically distribution free. They ran simulations to exhibit power improvement of the tests with incorporation of side information.
Soon it was realized that it can also be used to construct point estimators. Qin and Lawless (1994) linked empirical likelihood with GEE’s and investigated maximum em-pirical likelihood estimators (MELE’s). They established consistency and asymptotic normality of MELE’s under the usual regularity conditions, and demonstrated that the variance of a MELE will not increase when the number of estimating equations is increased. Furthermore, they showed that MELE’s are fully semiparametrically
efficient in the sense of least dispersed regular estimators (Bickel, et al. (1993), Van
der Vaart (2000)). Peng and Schick (2013) explored MELE’s in the case of constraint functions that may be discontinuous and/or depend on additional parameters. The latter is the case in applications to semiparametric models where the constraint func-tions may depend on the nuisance parameter. Zhang (1995, 1997) used the method of MELE’s to construct improved estimates in M-estimation and quantile processes with the availability of auxiliary (side) information. He established consistency and asymp-totic normality, and proved that the asympasymp-totic variances of the resulting estimators are smaller than those of the usual sample M-estimators and sample quantiles. It was utilized by Hellerstein and Imbens (1999) for the least squares estimators in a linear regression model and the application to a real data set was presented. These authors dealt with finitely many of constraints. Peng and Schick (2013) has employed on-step estimator to construct MELE’s. Peng (2015) developed a class of easy MELE’s which is computationally more efficient. Recently, Tang and Leng (2012) used this idea to construct more efficient estimators of parameters in quantile regression.
In this thesis, we study MELE’s in UGEE’s and their asymptotic behaviors. Our technical maneuver is the jackknife empirical likelihood approach. It is well known that the asymptotic behaviors of the U-statisticUn(h) is determined byh1 (see
Chap-ter 2). Here we shall apply the theory of Qin and Lawless (1994) on h1 to derive the
asymptotic behaviors of the MELE’s of the JEL for UGEE’s. These results for the UGEE’s are parallel to those of Qin and Lawless (1994). We obtain the uniform local
asymptotic normality for the logarithm of the JEL ratio in Chapter 3. We derive the estimating equations for the MELE’s in UGEE’s in Chapter 4. Here we also give a class of easy MELE’s and establish their asymptotic distribution. In Chapter 5, we provide a number of examples. Here we demonstrate that efficient estimates of moment based distribution characteristics in the presence of side information can be obtained using JEL for U-statistics. In Chapter 6, we propose several JEL tests for various multivariate and high dimensional symmetries. Some of the technical details are provided in Chapter 7.
2. JACKKNIFE EMPIRICAL LIKELIHOOD FOR
VECTOR U-STATISTICS
In this chapter, we recall some facts about one-sample multivariate U-statistics and introduce the JEL approach.
2.1 Vector U-statistics
Let (Z,S) be a measurable space andP be a probability measure on this space.
Let Z1, . . . , Zn be independent copies of a Z-valued random variable Z with
cumu-lative distribution functionF underP. Leth:Rm 7→Rd be a known function that is
permutation symmetric in its m arguments. θ ∈ Θ is a parameter we are interested
in. A multivariate or vector U-statistic with kernel h of order m is defined as
Un =:Unm(h) =: n m −1 X 1≤i1<...<im≤n h(Zi1, . . . , Zim;θ), n≥2.
Throughout we assume h is Fm-square integrable, that is, h ∈ L2(Fm), where
L2(Fm) =
f :R
kfk2dFm <∞ , where kvk denotes the euclidean norm of vecotr
v. We assume throughout that
E(h(;θ)) =:E(h(Z1, . . . , Zm;θ)) = 0. (2.1.1)
This of course implies E(Un) = 0. Also, we shall abbreviate Pnf = n−1Pnj=1f(Zj)
and P f =E(f(Z)). Lethm =h and
hc(z1, . . . , zc;θ) =E(h(z1, . . . , zc, Zc+1, . . . , Zm;θ)), c= 1, . . . , m−1.
Then hc is a version of the conditional expectation, that is,
Letδz be the point mass at z ∈ Z. We now define h∗c(z1, . . . , zc) = (δz1 −P). . .(δzc −P)P m−ch, c= 0,1, . . . , m. Throughout we let ˜ f =f−P f
be the centered version of a functionf for whichP f is well defined. Clearlyh∗1 = ˜h1.
2.2 JEL for vector U-statistics
Let Un(−−j)1 denote the U-statistic based on the n−1 observations Z1, . . ., Zj−1,
Zj+1, . . ., Zn. The jackknife pseudo values of the U-statistic are defined as
Vnj =nUn−(n−1)U (−j)
n−1, j = 1, . . . , n.
LetRnj = ˜Vnj−m˜h1(Zj;θ). It has been shown in (4.6) of Peng and Tan (2016) that
each component of Rnj is ofOp(n−1/2), hence
˜
Vnj(θ) =m˜h1(Zj, θ) +Op(n−1/2), j = 1, . . . , n.
As argued in Peng and Tan (2016), this shows that each jackknife value ˜Vnj depends
asymptotically on Zj, so that ˜Vnj, j = 1, . . . , nare asymptotically independent. As a
result, if πj is a probability mass placed at Zj, then approximately the same
proba-bility massπj is placed at the jackknife value ˜Vnj forj = 1, . . . , n; because of the the
asymptotic independence of the jackknife values, the joint likelihood is approximately
the product of these πj’s. Consequently, it is justified to introduce the JEL for the
vector U-statistic Un(h) as follows:
ˆ Rn(θ) = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjV˜nj(θ) = 0 o , θ ∈Θ, (2.2.1)
where Pn denotes the closed probability simplex in dimension n, i.e.
Pn =
Using Lagrange multipliers, Owen (1988) derived πj = 1 n 1 1 +ξ>V˜ nj(θ) , j = 1, . . . , n, (2.2.2)
where ξ satisfies the equation
1 n n X j=1 ˜ Vnj(θ) 1 +ξ>V˜ nj(θ) = 0. (2.2.3)
As (2.1.1) holds, ˜Vnj = Vnj. For notational brevity, we sometimes write ˜Vnj = Vnj.
Peng and Tan (2016) showed that the JEL for vector U-statistics are asymptotically chi-square distributed under the same usual assumption as for the asymptotic nor-mality of vector U-statistics, that is, ifV ar(h1(Z)) is nonsingular then−2 log ˆRn(θ0)
is asymptotic chi-square distributed with d degrees of freedom, i.e.
3. MAXIMUM EMPIRICAL LIKELIHOOD ESTIMATION
In this chapter, we give the local uniform asymptotic normality condition for the log-JEL for UGEE’s.
3.1 Asymptotic behaviors of the local logarithm of the JEL ratio
Let (Z,S) be a measurable space, Q be a family of probability measures onS,
and θbe a parameter of interest which is from an open subset Θ ofRk. LetZ
1, ..., Zn
be independent and identically distributed (i.i.d.) Z-valued random variables with
an unknown distribution Q belonging to the model Q. Recall that a kernel function
h :Rm 7→
Rd is permutation symmetric about its m arguments and satisfies (2.1.1).
We are interested in inference about the characteristic θ and work with the jackknife
empirical likelihood (JEL) ratio ˆRn(θ) in (2.2.1).
Qin and Lawless (1994) studied the maximum empirical likelihood estimator (MELE), ˆ θ= arg max θ∈Θ ˆ Rn(θ). (3.1.1)
Recall that the local JEL ratio is defined as ˆ Ln(t) = log ˆ Rn(θ0+n−1/2t) ˆ Rn(θ0) , t∈Rk, θ0+n−1/2t∈Θ.
For a function f onZ ×Θ, let ˙f ,f¨denote the first and second partial derivative of
f with respect to parameter θ, that is,
˙ f(z;θ) = ∂ ∂θf(z;θ), ¨ f(z;θ) = ∂ 2 ∂θ∂θ>f(z;θ), z∈Z, θ ∈Θ. Recall ˜h=h−E(h). Let us introduce the following assumptions.
(A1) There exist a neighborhood N(θ0) of θ0 and a square-integrable function Gon
Zm such that h(z
to θ for every (z1, . . . , zm)∈ Zm with the first partial derivative ˙h of full rank
and second partial derivative ¨h satisfying
kh˙(z1, . . . , zm;θ)k+k¨h(z1, . . . , zm;θ)k ≤G(z1, . . . , zm), θ∈N(θ0). (A2) supθ∈N(θ0)Ekh(Z1, . . . , Zm;θ)k2 <∞. (A3) W =m2E ˜h 1(Z;θ0)⊗2 is positive definite.
Remark 3.1.1 (A1) implies that kh(z1, . . . , zm;θ)k is also bounded by a
square-integrable function G∗(z1, . . . , zm) on Zm×N(θ0) provided that N(θ0) is bounded
(by K). In fact, it follows from the mean value theorem that for θ ∈ N(θ0), there
exist θ∗ lying between θ0 and θ such that
kh(z1, . . . , zm;θ)k ≤ kh(z1, . . . , zm;θ0)k+kh˙(z1, . . . , zm;θ∗)kkθ−θ0k ≤ kh(z1, . . . , zm;θ0)k+G(z1, . . . , zm)kθ−θ0k ≤ kh(z1, . . . , zm;θ0)k+ 2KG(z1, . . . , zm) :=G∗(z1, . . . , zm). Letun =un(θ0) where un(θ) =n−1/2 n X j=1 mh˜1(Zj;θ). (3.1.2)
We have the following uniform local asymptotic normality.
Theorem 3.1.1 Assume that (A1)-(A3) are satisfied. Then it holds the expansions sup ktk≤C k −2 log ˆRn(θ0+n−1/2t)−(un−At)>W−1(un−At)k=oP(1) (3.1.3) and sup ktk≤C kLˆn(t)−t>A>W−1un+ 1/2t>A>W−1Atk=oP(1) (3.1.4)
Note that A has full rank by (A1) and plays the same role as the quantity
−E[ ˙u(Z;θ)] does in Qin and Lawless (1994). Note also that the quadratic functionq defined by
q(t) = t>A>W−1un−1/2t>A>W−1At, t∈Rk, (3.1.5)
is uniquely maximized by ˆt = (A>W−1A)−1A>W−1u
n. This shows that θ 7→ Rˆn(θ)
has a local maximizer ˆθ such that
n1/2(ˆθ−θ0)−(A>W−1A)−1A>W−1un =oP(1). Therefore, ˆ θ =θ0+ (A>W−1A)−1 m n n X j=1 A>W−1˜h1(Zj;θ0) +oP(n−1/2). (3.1.6)
This of course implies the asymptotic normality of ˆθ, i.e.
√ n(ˆθ−θ0) =⇒ N (0,(A>W−1A)−1). (3.1.7) Substituting (2.2.2) in ˆRn(θ), we get −log ˆRn(θ) = n X j=1 log(1 +ξ>Vnj(θ)), θ ∈Θ, (3.1.8)
where ξ satisfies (2.2.3). It is not difficult to see that under (A1) – (A3) the random
functionθ7→Rˆn(θ) is continuously differentiable. Consequently, ˆθand ˆξ=ξ(ˆθ) must
satisfy B1n(θ, ξ) = 0, B2n(θ, ξ) = 0, (3.1.9) where B1n(θ, ξ) = 1 n n X j=1 Vnj(θ) 1 +ξ>V nj(θ) , (3.1.10) and B2n(θ, ξ) = 1 n n X j=1 1 1 +ξ>V nj(θ) ∂Vnj(θ)> ∂θ ξ. (3.1.11)
Note that (3.1.9) are the estimating equations for the MELE ˆθ. Summarizing the
Theorem 3.1.2 Suppose (A1) – (A3) hold. Then there is a maximizerθˆfor the JEL θ 7→ Rˆn(θ) such that θˆ solves (3.1.9) and satisfies the stochastic expansion (3.1.6)
hence (3.1.7).
3.2 Some lemmas
To prove Theorem 3.1.1, we need the following lemmas with the proof delayed to Chapter 7.
Lemma 3.2.1 Assume (A1)-(A2) are met. Then it holds for every finite C, ˆ Dn(C) = sup ktk≤C 1 n n X j=1 kVnj(θ0+n−1/2t)−m˜h1(Zj;θ0)k2 =OP(1/n). (3.2.1)
Lemma 3.2.2 Under assumptions (A1)-(A2), it holds the expansion,
sup ktk≤C kn−1/2 n X j=1 (Vnj(θ0+n−1/2t)−mh˜1(Zj;θ0)) +Atk=oP(1) (3.2.2)
for every finite constant C, with A=−E[mh˙˜1(Z;θ0)].
To complete the proof of Theorem 3.1.1, we need a general result from Peng and Schick (2013). Let Tn1(t), ...,Tnn(t) be d-dimensional random vectors indexed by
t ∈ Rk, where k ≤ d. We are interested in the asymptotic behavior of the empirical
likelihood process Rn(t) = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjTnj(t) = 0 o , ktk ≤C,
where C is a positive constant. We shall use the following result which is a special
case of Lemma 5.2 of Peng and Schick (2015).
Lemma 3.2.3 Let x1, ..., xn be d-dimensional vectors. Set
x∗ = max 1≤j≤nkxjk, x¯= 1 n n X j=1 xj, S = 1 n n X j=1 xjx>j ,
and let λ denote the smallest and Λ the largest eigen value of the matrix S. Then the inequality λ >5kx¯kx∗ implies k −2 logR−nx¯>S−1x¯k ≤(Λ + Λ 3 4Λ2) 2nkx¯k3x∗ (λ− kx¯kx∗)3, (3.2.3) where R = supn n Y j=1 nπj :π ∈Pn, n X j=1 πjxj = 0 o .
Motivated by this we introduce the quantities
T∗n(t) = max 1≤j≤nkTnj(t)k, ¯ Tn(t) = 1 n n X j=1 Tnj(t), Sn(t) = 1 n n X j=1 Tnj(t)T>nj(t).
We impose the following conditions. (B1) supktk≤CT∗n(t) =oP(n1/2).
(B2) There is a positive definite d×d matrix S such that
sup ktk≤C
kSn(t)−Sk=oP(1).
(B3) There exist k-dimensional random vectors un and and d×k matrix A of full
rankk such that un=OP(1) and
sup ktk≤C
k√nT¯n(t)−un+Atk=oP(1).
We have the following result with the proof delayed to Chapter 7.
Lemma 3.2.4 Suppose (B1)-(B3) hold. Then sup ktk≤C k −2 logRn(t)−(un−At)>S−1(un−At)k=oP(1) (3.2.4) and therefore sup ktk≤C klog Rn(t) Rn(0) −t>A>S−1un+ 1 2t > A>S−1Atk=oP(1). (3.2.5)
3.3 Proof of Theorem 3.1.1
With the help of the Lemma 3.2.1 – Lemma 3.2.4 we now prove the main result.
Proof of Theorem 3.1.1. We verify the assumptions of Lemma 3.2.3 with
Tnj(t) = Vnj(θ+n−1/2t),S =W,un=n−1/2Pnj=1m˜h1(Zj;θ), andA=−E[mh˙˜1(Z;θ0)].
Since Ekm˜h1(Z;θ0)kis finite, we obtain
max
1≤j≤nkm
˜
h1(Zj;θ0)k=oP(n1/2). (3.3.1)
For a fixed C, note that we have the bound
sup ktk≤C max 1≤j≤nkVnj(θ+n −1/2t)k ≤ max 1≤j≤nkm ˜ h1(Zj;θ)k+n1/2Dˆ1/2n
where ˆDn is given in (3.2.1). It then follows from Lemma 3.2.1 that ˆDn=op(1) hence
sup ktk≤C max 1≤j≤nkVnj(θ+n −1/2t)k=o P(n1/2).
This implies (B1). From Lemma 3.2.2 and the central limit theorem it follows that (B3) holds. We are now left to verify (B2). To this end, set
¯ Wn = 1 n n X j=1 ˜ h1(Z;θ)˜h>1(Z;θ). By (A2), kW¯n−Wk=oP(1). (3.3.2)
We conclude (B2) from Lemma 3.2.1, (3.3.2) and the bound,
ka>(Sn(t)−W¯n)ak=k 1 n n X j=1 (a>Vnj(θ+n−1/2t))2− 1 n n X j=1 (a>h˜1(Zj;θ))2k ≤Dˆn+ 2( 1 n n X j=1 k˜h1(Zj;θ)k2Dˆn)1/2,
valid for every unit vector ainRk, everytwith ktk ≤C. We now apply Lemma 3.2.3
4. EASY MAXIMUM EMPIRICAL LIKELIHOOD
ESTIMATION
In this chapter, we discuss the existence of maximum empirical likelihood estimate
(MELE) ˆθ. We derive the estimating equations for the MELE’s and obtain their
asymptotic behaviors. In the end, we give a class of easy MELE’s and establish their asymptotic distributions.
4.1 MELE’s and semiparametric effeciency
If h1 were a known function, we would work with the empirical likelihood ratio,
Rn(θ) = sup nYn j=1 nπj :π∈Pn, n X j=1 πjm˜h1(Zj;θ) = 0 o , θ ∈Θ. (4.1.1)
It follows from Owen (1988) that if Var(mh1) is finite and positive definite then (4.1.1)
reaches its maximum when πj = 1 n 1 1 +ξ>m˜h 1(Zj;θ) , j = 1, ..., n,
where ξ= (ξ1, ..., ξd)> are Lagrange multipliers, which is a d×1 vector and satisfies n X j=1 mh˜1(Zj;θ) 1 +ξ>m˜h 1(Zj;θ) = 0. Moreover, ξ→0 as n→ ∞.
In addition to satisfying (A1)-(A3), we further assume EG3 <∞. It then follows
from Lemma 1 of Qin and Lawless (1994) that as n → ∞, with probability 1 the EL
ratio function in (4.1.1) attains its maximum value at some ˜θ in the interior of the
ball kθ−θ0k ≤n−1/3, and ˜θ and ˜ξ=ξ(˜θ) satisfy
where A1n(θ, ξ) = 1 n n X j=1 mh˜1(Zj;θ) 1 +ξ>m˜h 1(Zj;θ) , (4.1.3) A2n(θ, ξ) = 1 n n X j=1 1 1 +ξ>m˜h 1(Zj;θ) ∂m˜h1(Zj;θ)> ∂θ ξ. (4.1.4)
The equations in (4.1.2) are theoretically useful and can’t be used to find the MELE’s
defined by the JEL ratio (2.2.1) becauseh1 is unknown. Instead, we find the MELE’s
by solving the estimating equations in (3.1.10) – (3.1.11). The next lemma states that both sets of equations give the same solutions as the sample size tends to infinity. To this end, let N0(θ0,0) = {θ : kθ −θ0k ≤ n−1/3} × {ξ : kξk ≤ n−1/3} denote
a neighborhood of (θ0,0). We have the following lemma with the proof delayed to
Chapter 7.
Lemma 4.1.1 Assume (A1)-(A3) hold. Then sup (θ,ξ)∈N0(θ0,0) kB1n(θ, ξ)−A1n(θ, ξ)k=oP(1), (4.1.5) and sup (θ,ξ)∈N0(θ0,0) kB2n(θ, ξ)−A2n(θ, ξ)k=oP(1). (4.1.6)
Consequently, from Lemma 4.1.1 it follows that for large n, there exists some
point ˆθ in a shrinking neighborhood of θ0, such that ˆθ and ˆξ = ξ(ˆθ) satisfy (3.1.9)
and the JEL (2.2.1) reaches its maximum value at (ˆθ,ξˆ). These statement hold on an
event Ω withP(Ω) = 1 at least for sufficiently large n. In general, on its complement
Ωc where (ˆθ,ξˆ) are not defined we define them to be arbitrary numbers. The below
theorem gives the asymptotic normality and the proof can be found in Chapter 7.
Theorem 4.1.1 Assume (A1)-(A3) hold with the dominating function G satisfying E(G3)<∞. Then, as n tends to infinity, with probability one θˆ(θ) attains its maxi-mum at some θˆin a shrinking neighborhood of θ0, and θˆand ξˆ=ξ(ˆθ) solves (3.1.9),
and satisfy
√
n(ˆθ−θ0)→N (0, V),
√
where V = E(h˙˜1)> E(˜h⊗12) −1 E(h˙˜1) −1 , U = 1 m2 E(˜h ⊗2 1 ) −1 Id−E(h˙˜1)V E(h˙˜1)> E(˜h⊗12) −1 , and θˆand ξˆare asymptotically uncorrelated.
As a result, the weights πj, j = 1, . . . , n can be estimated by
ˆ πj = 1 n 1 1 + ˆξ>m˜h 1(Zj; ˆθ) , j = 1, ..., n. (4.1.8)
This, in turn, yields an efficient estimate of the DF F as follows:
ˆ Fn(z) = n X j=1 ˆ πj1(Zj ≤z). (4.1.9)
Remark 4.1.1 The asymptotic variance matrixV can be consistently estimated by ˆ V = Xn j=1 ˆ πjh˙˜1(Zj; ˆθ) > n X j=1 ˆ πj˜h⊗12(Zj; ˆθ) −1 n X j=1 ˆ πjh˙˜1(Zj; ˆθ) −1 . (4.1.10)
Theorem 4.1.1 also can be used to get approximate confidence limits for θ or F.
By the U-statistics theory, the asymptotic distribution of a U-statistic Umn(h)
of order m with kernel h is dictated by h1. Thus we apply Corollary 1 of Qin and
Lawless (1994) with their estimating function g =h1 to obtain the below result.
Corollary 4.1.1 Assume that the assumptions of Theorem 4.1.1 hold. Suppose d >
k. Then the asymptotic covariance-variance matrix V = Vd of
√
n(ˆθ −θ0) does not
increase (in the sense of positive definiteness of positive definite matrices) as the number of estimating equations increases.
Using the same argument as above and applying Theorem 3 of Qin and Lawless (1994), we have the following.
Theorem 4.1.2 Under the assumptions of Theorem 4.1.1, the MELEθˆis fully effi-cient in the sense of Van der Vaart (1988) and Bickel, Klaassen, Ritov and Wellner (1993).
The efficiency criteria used are that of a least dispersed regular estimator or that of a locally asymptotic minimax (LAM) estimator. These criteria are based on the convolution theorems and on the lower bounds on the local asymptotic risk in LAN (locally asymptotically normal) and LAM families, see the above references and ad-ditional references therein.
4.2 Easy maximum empirical likelihood estimation
In this section, we study a special case of the U-statistics based estimating equa-tions, that is, some of the equations do not involve parameters. At a first glance, it is seemingly restrictive for use. But actually it is quite useful as we shall demonstrate below. As to this special case, we derive the estimating equations for the MELE’s which are computationally faster than the usual MELE’s – easy MELE’s – as the solutions of the estimating equations given before.
Consider a kernel functions of the form,
h(Z1, . . . , Zm;θ) = (u(Z1, . . . , Zm;θ)>, v(Z1, . . . , Zm)>)>,
where u : Rm×Θ 7→ Rp and v :
Rm 7→ Rq are measurable functions and Θ is an
open subset of Rk. Supposeu and v satisfy
E(u(Z1, . . . , Zm;θ)) = 0, θ∈Θ, (4.2.1)
and
E(v(Z1, . . . , Zm)) = 0. (4.2.2)
While (4.2.1) serves as a criterion equation for the parameter θ, (4.2.2) describes side
information about the underlying distribution. The parameter θ is usually estimated
by the M-estimate, the solution to the sample version of (4.2.1), that is, the UGEE with kernel u(;θ), Unm(u(;θ)) = n m −1 X 1≤i1<···<im≤n u(Zi1, . . . , Zim;θ) = 0, θ ∈Θ, (4.2.3)
where we assume, without loss of generality, thatu is argument symmetric about its
m variables (otherwise we symmetrize it). The M-estimate is not efficient in general
as the side information given by (4.2.2) is not utilized. Often we assume that the
number p of equations in (4.2.1) is equal to the dimension k of the parameter θ
(otherwise we can eliminate those redundant equations). Throughout this section we
assume p=k.
We now work with the JEL, ˆ Rn(θ) = sup nYn j=1 nπj :π∈Pn, n X j=1 πjvec(Vnju(θ), V v nj) = 0 o , θ ∈Θ, (4.2.4) where Vu
nj, Vnjv are the jackknife pseudo values based on the U-statistics with kernel
function u and v respectively.
In this case, V =m2E (˜u 1(Z;θ0)>,˜v1(Z)>)>(˜u1(Z;θ0)>,˜v1(Z)>) , so that V =m2 E(˜u⊗12) E(˜u1˜v1>) E(˜v1u˜>1) E(˜v ⊗2 1 ) . (4.2.5)
Introduce the following assumption.
(A4) Suppose u satisfies (A1), uand v satisfy (A2), and V satisfies (A3).
Under (A4), we can apply Lemma 3.2.1 and Lemma 3.2.2 to u, so that it holds for
every finite C, ˆ Dn(C) = sup ktk≤C 1 n n X j=1 kVnju(θ0+n−1/2t)−mu˜1(Zj;θ0)k2 =OP( 1 n). (4.2.6) sup ktk≤C kn−1/2 n X j=1 (Vnju(θ0+n−1/2t)−mu˜1(Zj;θ0)) +Autk=oP(1), (4.2.7)
where Au = −E[mu˙˜1(Z;θ0)] of full rank. Let Av = 0q×k and A = (A>u, A
>
v)
>. It follows from Theorem 3.1.1 holds for the JEL (4.2.4). Let ξ = (ξu>
(p×1), ξ >
v(q×1)) be the Lagrange multipliers. Then we have
πj = 1 n 1 1 +ξ> uVnju(θ) +ξ>vVnjv , j = 1, . . . , n,
where ξ satisfies n X j=1 vec(Vnju(θ), Vnjv) 1 +ξ> uVnju(θ) +ξv>Vnjv = 0.
Thus, similar to the discussion in the previous section, as n→ ∞, (4.2.4) attains its
maximum value in probability at some ˆθ in a shrinking neighborhood of θ0, and ˆθ
and ˆξ = ( ˆξu>,ξˆv>)> = (ξu(ˆθ)>,ξˆv>)> satisfy C1n(ˆθ,ξˆ) = 0, C2n(ˆθ,ξˆ) = 0, (4.2.8) where C1n(θ, ξ) = 1 n n X j=1 vec(Vnju(θ), Vnjv) 1 +ξ> uVnju(θ) +ξv>Vnjv , (4.2.9) C2n(θ, ξ) = 1 n n X j=1 ξu>V˙nju(θ) 1 +ξ> uVnju(θ) +ξv>Vnjv . (4.2.10)
Therefore, as a corollary of Theorem 4.1.1, the asymptotic properties of the MELE’s ˆ
θ of (4.2.4) can be obtained as stated below. As a convention, we drop the argument
at the true value of parameter so that u(Z;θ0) = u(Z), E(u(Z;θ0)⊗2) = E(u⊗2(Z))
and of course E(u⊗2(Z)) =E(u⊗2), etc. Under (A4),V is invertible. Let the inverse
of the block matrix V be V−1 = (Vij)
i,j=1,2. By the inverse formulas for a block
matrix, we have V−1 = V11−·12 −V11−·12E(˜u1v˜>1)(E(˜v ⊗2 1 )) −1 −V22−·11E(˜v1u˜>1)(E(˜u ⊗2 1 ))−1 V −1 22·1 , (4.2.11) where V11·2 =E(˜u⊗12)−E(˜u1v˜>1) E(˜v ⊗2 1 ) −1 E(˜v1u˜>1), V22·1 =E(˜˜v1⊗2)−E(˜v1u˜>1) E(˜u ⊗2 1 ) −1 E(˜u1v˜>1).
Theorem 4.2.1 Suppose (A4) holds. Suppose also p = k. Then, as n tends to in-finity, with probability one θˆ(θ) attains its maximum at some θˆin a shrinking neigh-borhood of θ0, ξˆ= (0,ξˆv>)> solves (4.2.8), and satisfy
√
n(ˆθ−θ0)→N (0,V¯),
√
where ¯ V = E( ˙˜u1) −1 V11·2 E( ˙˜u1) −> , U¯ =V22·1−V21V11−·12V 12 . and θˆand ξˆare asymptotically uncorrelated.
Remark 4.2.1 As the asymptotic variance of the estimate of the UGEE (4.2.3) is Σ = E( ˙˜u1)
−1
E(˜u⊗12) E( ˙˜u1)
−>
, the asymptotic variance of the MELE’s satisfies ¯
V ≤Σ.
Now let us expand the constraints in (4.2.4) as n X j=1 πjVnju(θ) = 0, ϑ∈Θ, (4.2.12a) n X j=1 πjVnjv = 0. (4.2.12b)
Based on (4.2.12b), we look at the JEL,
Rn = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjVnjv = 0 o . (4.2.13)
This has the solution ˆ πj = 1 n 1 1 + ˆξ> vVnjv , j = 1, . . . , n, where ˆξv satisfies n X j=1 Vv nj 1 +ξ> vVnjv = 0. (4.2.14)
Naturally, we substitute ˆπj in (4.2.12a) to get
1 n n X j=1 Vnju(θ) 1 + ˆξ> v Vnjv = 0. (4.2.15)
Therefore, we find ξu = 0, and ˆξ = (0p×1,ξˆv). It is worth to note that (4.2.14) –
(4.2.15) are identical to (4.2.9) – (4.2.10). Consequently, we can find the MELE’s by solving (4.2.14) – (4.2.15). We refer the MELE’s as the solutions to (4.2.15) to as
the easy empirical likelihood estimates as they have less computational burden than
those usual MELE’s as the solutions to (4.2.9) – (4.2.10) (or (3.1.9)) as pointed by Peng (2015).
5. APPLICATIONS: EXAMPLES
In this chapter, we give several examples.
5.1 Estimating the expected values with side information
We are interested in estimation of the expected value θ = E(ψ(Z1, . . . , Zm)) for
some known function ψ : Rm 7→
Rd in the presence of side information given by
ET(Z1, . . . , Zm) = a for some measurable function T : Rm 7→ R and constant a.
Without loss of generality, we assume bothψ andT are argument-symmetric. Hence
our kernel functions are given by (u(Z
1, . . . , Zm;θ) =ψ(Z1, . . . , Zm)−θ, (5.1.1a)
v(Z1, . . . , Zm) = T(Z1, . . . , Zm)−a. (5.1.1b)
We shall apply Theorem 4.2.1 to derive the asymptotic behaviors.
Let us first mention that U-statistics are quite general. Heffernan (1997) showed that a statistical functionalθ =θ(Q) of a distributionQadmits an unbiased estimator iff there is a functionψ ofm variables such thatθ(Q) = R· · ·R
ψ dQm, and derived the
U-statistic as the unique MVUE of a central moment. Moment based distribution characteristics (e.g. Pearson’s correlation) are functions of central moments, so that the sample versions as test statistics can be expressed as functions of U-statistics.
Example 1 Estimating the mean difference in the presence of known covariance. Let Z = (X, Y) be a bivariate random vector with finite second
mo-ments. We are interested in estimating the mean differenceθ=E(X−Y) when there
is available the side information Cov(X, Y) =a for some knowna. Let Zi = (Xi, Yi),
i= 1, . . . , n be a random sample of Z. Let u(Z1, Z2;θ) =
1
be the symmetrized kernel function. The side information can be expressed by the U-statistic based equation E(Un(v)) = 0 with the kernel equal to
v(Z1, Z2) =
1
2(X1Y1+X2Y2−X1Y2−X2Y1)−a.
We shall apply Theorem 4.2.1 to derive the asymptotic behavior of the MELE ˆθ.
Let E(X) =µ1,E(Y) = µ2, Var(X) = σ21, and Var(Y) =σ22. Then
u1(z1;θ) =E(u(Z1, Z2;θ)|Z1 =z1) =E(1 2(x1−y1) + 1 2(X2−Y2)−θ) = 1 2(x1−y1) + 1 2(µ1−µ2)−θ, = 1 2(x1−y1)− 1 2θ, v1(z1) =E(v(Z1, Z2)|Z1 =z1) = 1 2(x1y1+E(X2Y2)−x1µ2−y1µ1)−a = 1 2(x1y1+µ1µ2+a−x1µ2−y1µ1)−a, = 1 2(x1−µ1)(y1−µ2)− 1 2a, and ˙ u1(z;θ) =− 1 2.
Let ˆθ be the MELE. Then it follows from Theorem 4.2.1 that
√ n(ˆθ−θ0)→N (0,V¯), where ¯ V =E((X−Y)−θ0)2− E[ (X−Y)−θ0 (X−µ1)(Y −µ2)−a ]2 E (X−µ1)(Y −µ2)−a 2 =σ21 +σ22−2a− E (X−µ1)2(Y −µ2) −E (X−µ1)(Y −µ2)2 2 E (X−µ1)2(Y −µ2)2 +a2 .
Example 2 Estimating the DF in the presence of known CV. By (4.1.9),
we can construct an improved distribution function (DF) F of a random variable Z
in when side information is available. Let Z1, . . . , Zn be i.i.d. copies ofZ. Let
ψ(Z1, Z2;t) =
1
Then
θt=E(ψ(Z1, Z2;t)) =
1
2(P(Z1 ≤t) +P(Z2 ≤t)) =F(t).
Often we have some side information aboutF. Here we assume the side information
is given byσ/µ=c0, that is, the coefficient of variation ofZ equals to a constantc0.
This side information can be expressed as a U-statistic equation by taking T(Z1, Z2) =
1 2(Z
2
1 +Z22)−(1 +c20)Z1Z2.
Therefore, with the kernel functions equal to u(Z1, Z2;θt) = 1 2(1[Z1 ≤t] +1[Z2 ≤t])−θt, v(Z1, Z2) = 1 2(Z 2 1 +Z 2 2)−(1 +c 2 0)Z1Z2,
the jackknife pseudo values Vu
nj, Vnjv can be computed, and the estimates (ˆθt,ξˆ) can
be obtained as the solutions to the estimating equations (3.1.9). Alternatively, we can apply (4.2.14) and (4.2.15) to this example and obtain computationally faster
estimates as the solutions to the below equations: Find ˆξ as the solution to
n X j=1 Vv nj 1 +ξ> vVnjv = 0,
while ˆθt is the solution to
1 n n X j=1 Vnju(θt) 1 + ˆξ> v Vnjv = 0. As ˆ πj = 1 n 1 1 + ˆξ> vVnjv , j = 1, . . . , n. by (4.1.9), an efficient estimate of the DFF(z) is given by
ˆ Fn(z) = n X j=1 ˆ πj1[Zj ≤z].
Here we suppress the dependence of ˆπj on the fixed t.
Example 3 Estimating the convolution with side information. Let Z1,
σ2 = Var(Z
1). Let ψ(z1, z2) = 1[z1 +z2 ≤ t], z1, z2 ∈ R for known t ∈ R. Hence
θt=P(Z1+Z2 ≤t) is the convolution ofZ1 and Z2. Suppose that there is available
the side information that the coefficient of variation is known: σ/µ = c0 for some
known constantc0. The kernel functions can be constructed as
u(Z1, Z2;θt) = 1(Z1+Z2 ≤t)−θt, v(Z1, Z2) = 1 2(Z 2 1 +Z22)−(1 +c20)Z1Z2.
LetF be the DF of Z. It follows
u1(z1;θ) =E(u(Z1, Z2;θ)|Z1 =z1) = F(t−z1)−θt, v1(z1) =E(v(Z1, Z2)|Z1 =z1) = 1 2(z 2 1 + (1 +c 2 0)µ 2 )−(1 +c20)µz1; and ˙ u1(z1;θt) = −1.
By Theorem 4.2.1, the MELE ˆθt is asymptotically normally:
√ n(ˆθt−θt)→N (0,V¯t), where ¯ Vt=E(F(t−Z1)−θt)2 − E(F(t−Z1)−θt)(12(Z12+σ2+µ2)−(1 +c20)µZ1) 2 E(12(Z2 1 + (1 +c20)µ2)−(1 +c20)µZ1)2 .
Example 4 Estimating Gini’s mean difference with side information.
Let Z1, . . . , Zn be i.i.d. with r.v Z. Gini’s mean difference of Z1 and Z2 is
de-fined as E|Z1−Z2|, which is an alternative index of variability. It is estimated by
the U-statistic Un2 = n 2 −1 X 1≤i<j≤n |Zi−Zj|.
Suppose there is available side information that the inter-quartile range is known, P(q1 ≤Z ≤q3) = 0.5,
for some known constants q1, q3. Clearly the U-statistic as estimate does not utilize
this side information. Here we use the JEL method to incorporate the side informa-tion. To this end, let θ = E|Z1 −Z2|. The kernel is then given by u(Z1, Z2;θ) =
|Z1−Z2|−θ. The side information can be expressed by the U-statistic with the kernel
equal to
v(z1, z2) = (1[q1 ≤z1 ≤q3] +1[q1 ≤z2 ≤q3])/2−0.5.
It is easy to calculate
u1(z1;θ) =E|z1−Z2| −θ,
v1(z1) = 1[q1 ≤z1 ≤q3]/2−0.25,
and ˙u1(z1;θ) = −1. By Theorem 4.2.1, the MELE ˆθ has asymptotic normal
distribu-tion with the variance equal to
E(u21)−(E(u1v1))2/E(v21) =E(u 2
1)−16(E(u1v1))2.
Example 5 Estimating the overdispersion parameter with side infor-mation. Overdispersion is common in count data. This can be modeled by
overdis-persion parameter φ as
Var(Z) = φE(Z), φ >1. (5.1.2) LetZ1, . . . , Zn be count data (frequency data). Suppose there is available side
infor-mation that
P(Z = 0) =P0.
We are interested in estimating the overdispersion parameter φ. Clearly (5.1.2) can
be written as
E(Z2) =φE(Z) + (E(Z))2, which is equivalent to
E(Z12−φZ1−Z1Z2) = 0.
Thus the corresponding U-statistic based equation with the kernel equal to the sym-metrized function: u(Z1, Z2;φ) = 1 2(Z 2 1 +Z 2 2)− φ 2(Z1+Z2)−Z1Z2. (5.1.3)
The jackknife pseudo value of (5.1.3) is one constraint for estimating φ. Moreover, the kernel function of the side information can be constructed as
v(Z1, Z2) =
1
2(1[Z1 = 0] +1[Z2 = 0])−P0. (5.1.4)
Now we apply Theorem 4.2.1 to (5.1.3) and (5.1.4) to estimateφ. LetE(Z) = µ. We
have u1(z1;φ) =E(u(Z1, Z2;φ)|Z1 =z1) =E 1 2(z 2 1 +Z 2 2)− φ 2(z1+Z2)−z1Z2 =1 2z 2 1 − φ 2z1+ 1 2µ 2−z 1µ; v1(z1) =E(v(Z1, Z2)|Z1 =z1) =1 2(1[z1 = 0] +P(Z2 = 0))−P0 =1 2(1[z1 = 0]−P0), and ˙ u1(z;θ) = z 2. Hence, by Theorem 4.2.1, √ n( ˆφ−φ0)→N (0,V¯), where ¯ V = 4 µ2 E(u21)− E(u1v1) 2 E(v2 1) = 4 µ2 E(u21)−4 E(u1v1) 2 P0(1−P0) .
Example 6 The simplicial depth function. Let Z1, . . . , Zn be i.i.d. with a
distribution Q on Rm. The simplicial depth function D(z) of a point z ∈
Rm with
respect to distribution Q is defined as follows:
D(z) = P z∈∆(Z1, . . . , Zm+1)
, z ∈Rm,
where ∆(Z1, . . . , Zm+1) denotes the random simplex with vertices Z1, . . . , Zm+1, i.e.,
value of the population simplicial depth at pointz. The usual estimate of the depth function is the sample simplicial depth Dn(z) given by the U-statistic,
Dn(z) = n m+ 1 −1 X 1≤i1<···<im+1≤n 1[z ∈∆(Zi1, . . . , Zim+1)], z ∈R m.
The depth function can be used to define the multivariate medians and possess ro-bustness property. When additional information is available about the underlying
distribution Q, our JEL approach is capable to employ it into the estimation. In
this example, we assume the marginal medians of Z are known as the side
informa-tion. Let Z = (Z(1), . . . , Z(m))>, and M = (a(1), . . . , a(m))> where a(l) = med(Z(l)), l = 1, . . . , m. Let Zj = (Z (1) j , . . . , Z (m) j ) >, j = 1, . . . , n. Fix z ∈ Rm and D =D(z).
The kernel functions can be constructed as
u(Z1, . . . , Zm+1;D) = 1[z ∈∆(Z1, . . . , Zm+1)]−D; and v(l)(Z1, . . . , Zm+1) = 1 m+ 1 m+1 X j=1 1[Zj(l) ≤a(l)]− 1 2 , l = 1, . . . , m. Thus u1(z1;D) =P z ∈∆(z1, Z2, . . . , Zm+1) −D; v1(l)(z1) = 1 m+ 1 1[z (l) 1 ≤a (l) ]− 1 2 + 1 m+ 1 m+1 X j=2 P(Zj(l) ≤a(l))− 1 2 = 1 m+ 1 1[z (l) 1 ≤a (l)]− 1 2 , l= 1, . . . , m, and ˙ u1(z;D) =−1. Letv1 = (v (1) 1 , . . . , v (m) 1 )
>. Hence, by Theorem 4.2.1, the MELE ˆDsatisfies
√ n( ˆD−D)→N (0,V¯), where ¯ V =E(u21)−E(u1v>1) E(v1v>1) −1 E(v1u1).
5.2 Smoothed U-quantiles
The theory of U-quantile provides a unified treatment of several commonly used
statistics, see Arcones (1993). Let κ:Rm 7→
Rbe a measurable argument-symmetric
function. Associated with κ there induces a distribution function
H(t) = P(κ(Z1, . . . , Zm) ≤ t), t ∈ R. The minimum variance unbiased estimate
(MVUE) of H(t) is the U-statistic of order m given by
Un(t) =Unm(t) =: n m −1 X 1≤i1<···<im≤n 1[κ(Zi1, . . . , Zim)≤t], t∈R, (5.2.1)
and κ shall be referred to as the kernel (of the U-quantile). AsH(t) is a distribution function, its p-th quantile tp is well defined by tp = inf{t : H(t)≥p} for p ∈ [0,1].
The U-quantiles include the Hodges-Lehmann median estimator, Gini’s mean differ-ence, Theil’s estimator of the slope in a simple linear model, and Kendall’s tau. They correspond to the U-quantiles with p0 = 1/2 and the kernels κ(z1, z2) = 2−1(z1+z2),
|z1−z2|, (y1−y2)/(x1−x2), and (x1−x2)(y1−y2) respectively.
As the U-quantiles are discontinuous, our theory does not apply here. We now
consider smoothed U-quantiles. Let F be a continuous DF. A continuous estimator
of H(t) is the smoothed U-quantile,
Hnm(t) = n m −1 X 1≤i1<···<im≤n Fb((t−κ(Zi1, . . . , Zim)), t∈R, (5.2.2)
where Fb(t) = F(t/b) with b a bandwidth. This is the smoothed version of the
U-statistic in (5.2.1). Given p ∈ [0,1], the p-th U-quantile solves H(tp) = p. The
smoothed sample p-th U-quantile ˆtp is a solution of Hnm(tp) = p.
Let us take the Theil-Sen estimator for an illustration. The Theil-Sen estimator
is a robust estimator of the slope in a simple linear model. Suppose that Zi =
(Xi, Yi)>, i= 1, . . . , n are independent and satisfy
whereαis the intercept andβ is the slope, andεi, i= 1, . . . , nare i.i.d. The Theil-Sen
estimator of the slope is the median of the slopes (Yi−Yj)/(Xi−Xj):
β = med{(Yi−Yj)/(Xi−Xj) : 1≤i < j ≤n},
whereX1, . . . , Xn are assumed to be distinct for the sake of convenience. In this case,
κ(z1, z2) = (y1−y2)/(x1−x2) and
H(t) = P((Y1−Y2)/(X1 −X2)≤t).
While the corresponding U-statistic is Un2(β) = n 2 −1 X 1≤i<j≤n 1[(Yi−Yj)/(Xi−Xj)−β <0],
the smoothed version is Hn2(β) = n 2 −1 X 1≤i<j≤n Fb β−(Yi−Yj)/(Xi−Xj) . (5.2.3)
Hence, to estimate β, one constraint can be chosen as
Hn2(β)−
1
2 = 0. (5.2.4)
In a simple linear model, we assume errors εi, i = 1, . . . , n are i.i.d. with the
normal with zero mean. Therefore, we have P(0≤εj) =
1
2. (5.2.5)
Here we relax the normality assumption to (5.2.5), the assumption of zero median of the error and use it to improve the efficiency of the Theil-Sen estimator. The U-statistic based equation of (5.2.5) is
n 2 −1 X 1≤i<j≤n 1[0< εi+εj 2 ]− 1 2 = 0. (5.2.6)
Thus, we obtain the smoothed version of (5.2.6) n 2 −1 X 1≤i<j≤n Fb( εi+εj 2 )− 1 2 = 0. (5.2.7)
This gives us another constraint.
Now let us apply Theorem 4.1.1 to get the asymptotic distribution of the im-proved Theil-Sen estimator β. From (5.2.4) and (5.2.7), the kernel is h(Z1, Z2;β) =
(h(1)(Z1, Z2;β), h(2)(Z1, Z2;β))>, where h(1)(Z1, Z2;β) =Fb β−(Y1−Y2)/(X1−X2) −1/2, h(2)(Z1, Z2;β) =Fb (Y1+Y2−β(X1+X2))/2 −1/2. Therefore, h(1)1 (z1;β) =E[Fb(β−(y1−Y2)/(x1−X2))]−1/2, h(2)1 (z1;β) =E[Fb((y1+Y2−β(x1+X2))/2)]−1/2.
Letf be the pdf of F and fb(t) =f(tb). SinceF is a cdf, it follows
˙ h(1)1 (z1;β) =E b−1fb(β−(y1−Y2)/(x1 −X2)) , ˙ h(2)1 (z1;β) =E −(x1+X2)/(2b)fb((y1+Y2−β(x1+X2))/2) .
Using these we can obtain the asymptotic normal distribution of the MELE by The-orem 4.1.1.
6. AN EL APPROACH TO GOODNESS-OF-FIT TESTING
FOR MULTIVARIATE AND HIGH DIMENSIONAL
SYMMETRIES
In this chapter, we develop empirical likelihood tests to various multivariate and high dimensional symmetries based on the characterizations of the symmetries. We report some simulation results.
6.1 Testing multivariate symmetries 6.1.1 Spherical symmetry
A random vector X in Rd is spherically symmetric about a point θ ∈
Rd if
X−θ= Γ(d X−θ),
for every orthogonal d×d matrix Γ, where= denotes both sides of the equality haved
an identical distribution. Spherical symmetry is equivalent to the assertion that the radius V =kX−θkis independent of the spatial unit vectorU = (X−θ)/kX−θk,
which is uniformly distributed on the unit sphere Sd−1 in
Rd, i.e. U ∼ U(Sd−1).
Independence of V and U of course implies
E[aj(V)bk(U)] = 0, aj ∈L2,0(FV), bk ∈L2,0(FU), j, k= 1,2, . . . , (6.1.1) where L2(F) = h:R h2dF <∞ and L2,0(F) = h∈L2(F) : R h dF = 0 for a
distributionF, andFV andFU are the distribution functions ofV andU respectively.
There are numerous choices foraand b, for example, one can choose the sign function
a1(v) = sign(v), Huber’s function a2(v) = v1[kvk ≤1.4] + 1.4sign(v)1[kvk>1.4] and
AsU is uniformly distributed overU(Sd−1), we can use it as side information. To
this end, we shall resort to the Jackknife empirical likelihood for vector U-statistics developed by Tan, et al. (2015). Let (Vi, Ui), i = 1, . . . , n be a random sample of
(V, U), and let R = U1 + U2 and R0 = R/kRk. It is well known that U being
uniformly distributed over the unit sphere U(Sd−1) is equivalent to the assertion
that kRkand R0 are independent. Independence implies
E(cl(kRk)R0) = 0, cl∈L2,0(G), l= 1,2, . . . ,
where G denotes the distribution function of kRk. Note that G is known and
com-putable. We mention the formula below in the usual case.
Remark 6.1.1 Ford= 3, the distribution function G of kRk is given by G(r) = 1 16π2 Z kxk=1,kyk=1,kx+yk≤r dx1dx2dy1dy2 p (1−x2 1−x22)(1−y21−y22) , where x= (x1, x2, x3), y = (y1, y2, y3) and 0≤r≤2.
We can chooseclas foraj. AsGis known, a systematic way of choosingcl, l= 1,2, . . .
is to take them to be basis functions of L2,0(G), for example, cl = ϕl ◦G, where
ϕl(t) =
√
2 cos(lπt), t ∈ (0,1) is the usual orthonormal trigonometric basis. Denote cL= (c1, . . . , cL)> for some positive integer L. Let κ(U1, U2) =cL(kRk)⊗R0, where
⊗ denotes the Kronecker product. Then it is an argument-symmetric vector kernel
and satisfies E(κ(U1, U2)) = 0 by the preceding independence. The vector U-statistic
with κ as the kernel is now given by
Un(cL) = n 2 −1 X 1≤i<j≤n κ(Ui, Uj).
The Jackknife pseudo values of the vector U-statistics are calculated by Vnj =nUn(cL)−(n−1)U
(−j)
n−1(cL), j = 1, . . . , n,
whereUn(−−j)1 is the vector U-statistic based on then−1 observations with the deletion
(6.1.1) as constraints to construct the Jackknife empirical likelihood ratio with side information as follows: Rssu n = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjvec aJ(Vj)⊗bK(Uj), Vnj(cL) = 0o,
whereaJ = (a(1), . . . , a(J))>,bK = (b(1), . . . , b(K))>, and vec(X, Y) denotes the column
vector consisting of stacking X, Y. Let Rss
n be the empirical likelihood ratio when
the Jackknife pseudo values are not included. It follows from Corollary 3.1 of Tan,et
al. (2015) that the following holds. Denote κ1(u) =E(κ(U1, U2)|U1 =u).
Theorem 6.1.1 Suppose the covariance Cov(vec(aJ(V), bK(U), κ1(U)) has full rank
J K +Ld. Then −2 logRssu
n has an asymptotic chisquare distribution with J K+Ld
degrees of freedom, that is,
−2 logRnssu ⇒χ2(J K+Ld).
If E(a⊗J2(V)E(b⊗K2(U)) has full rank J K, then −2 logRnss ⇒χ2J K.
A systematic way of choosing a(j) is a(j) = ϕ
j ◦ FV, j = 1,2, . . ., which is an
orthonormal basis of L2,0(FV). But it is not computable as FV is unknown. One
can estimate it by the empirical distribution function ˆFV and obtain computable
ˆ
a(j) =ϕj◦FˆV. With estimated constraints, we now work with the empirical likelihood
ratio: ˆ Rss n = sup nYn i=1 nπj :π ∈Pn, n X j=1 πjˆaJ(Vj)⊗Uj = 0 o .
We will allow J, d to depend on the sample size n, J = Jn, d = dn, and grow to
infinity slowly. The following is the asymptotic result with the proof delayed to the last section.
Theorem 6.1.2 Suppose Jndn→ ∞ but Jn3d5n+Jn4d3n+Jn5d3n=o(n). Then
−2 log ˆ√Rnss−Jndn
Jndn
This shows −2 log ˆRss
n is approximately chisquare distributed with Jndn degrees of
freedom. Thus for 0< α <1,
P −2 log ˆRnss> χ2Jndn(1−α)→α, whereχ2
d(1−α) is the (1−α)-th percentile of the chisquare distribution withddegrees
of freedom. Accordingly the test1[−2 log ˆRss
n > χ2Jndn(1−α)] has an asymptotic size
α. If Jn =J for alln, then this is the case of the sphere with infinity dimension. If
dn =d for all n, then this is the case of a d-dimensional sphere.
6.1.2 Rotational symmetry
A random vector X ∈ Sd is rotationally symmetric about a fixed directionθ if
X−θ =d O(X−θ),
for every d×d rotation matrixO about the direction θ in Rd. Rotational symmetry
is equivalent to the assertion that the projection T = θ>X of X onto the direction
θ is independent of the unit tangent ξ at θ to Sd−1, which is uniformly distributed
on Sd−2, i.e. ξ ∼ U(Sd−2(θ)), where Sd−2(θ) =
x∈Rd:kxk= 1, x>θ = 0 . For
more details, see page 179 of Madia and Jupp (2000). Notice that X satisfies the
tangent-normal equation,
X =T θ+√1−T2ξ. (6.1.3)
Rotationally symmetric distributions include von Mises-Fisher-type distributions with densities of the form f(θ>x), x∈ Sd, Waston-type distributions with densities of the
form g κ(θ>x)2, x ∈ Sd, and Bingham-type distributions of densities of the form
h(x>Kx), x∈ Sd, where f, g, h are nonnegative functions and
K is a positive definite
matrix.
In modeling directional and axial data using parametric distributions, one often wishes to test the null hypothesis that the underlying distribution is rotationally
test is the empirical likelihood ratio elaborated below. Let (Ti, ξi), i = 1, . . . , n be a
random sample of (T, ξ), where T = θ>0X and ξ = (X −T θ0)/
√
1−T2. As in the
spherical symmetry, the uniform distribution of ξ can be used as side information.
To this end, let R = ξ1 +ξ2 and R0 = R/kRk. Again the statement that ξ ∼
U(Sd−2) is equivalent to the assertion thatkRk and R0 are independent. Let G be
the distribution function ofkRk. Let d(m) =ϕm◦GanddM = (d(1), . . . , d(M))>. This
is computable as G is known. Let κ(ξ1, ξ2) = dM(kRk)⊗R0. Then it is
argument-symmetric and satisfies E(κ(ξ1, ξ2)) = 0 by the preceding independence. The vector
U-statistic with κ as kernel is now given by
Un(dM) = n 2 −1 X 1≤i<j≤n κ(ξi, ξj).
The Jackknife pseudo values of the vector U-statistics are calculated by Vnj(dM) =nUn(dM)−(n−1)U
(−j)
n−1(dM), j = 1, . . . , n,
Analogously we construct the Jackknife empirical likelihood ratio with side informa-tion as follows: Rrsu n = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjvec(aJ(Tj)⊗bK(ξj), Vnj(dM) = 0o,
for some choices of aJ = (a(1), . . . , a(J))> and bK = (b(1), . . . , b(K))>. Let Rnrs be
the empirical likelihood ratio when the jackknife pseudo values are not included.
It follows from Corollary 3.1 of Tan, et al. (2015) that the following holds. Let
ξ1(x) =E(κ(ξ1, ξ2)|ξ2 =x).
Theorem 6.1.3 Suppose the matrixCov(vec(aJ(T)), bK(ξ), κ1(ξ))has full rankJ K+
M(d−1). Then−2 logRrsu
n ⇒χ2(J K+M(d−1)).IfE(a
⊗2
J (T)E(b
⊗2
K (ξ))has full rankJ K,
then −2 logRnrs⇒χ2J K.
As the distribution FT of T is unknown, we estimate it by the empirical
ˆ
aJn = (a
(1), . . . , a(Jn))> = (ϕ
1, . . . , ϕJn)
> ◦Fˆ
T. With the estimated constraints, the
empirical likelihood ratio as test statistic is now given by ˆ Rrs n = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjˆa(Jn)(Tj)⊗ξj = 0 o .
We have the following.
Theorem 6.1.4 Suppose Jndn→ ∞ but Jn5d5n=o(n). Then
−2 log ˆRrs n −Jn(dn−1) p Jn(dn−1) ⇒N (0,1). (6.1.4) Thus −2 log ˆRrs
n is approximately chisquare distributed with Jn(dn−1) degrees of
freedom and the test 1[−2 log ˆRnrs > χ2J
n(dn−1)(1−α)] has an asymptotic size α ∈
(0,1).
6.1.3 Antipodally symmetric distributions
LetX have the continuous generalized Scheiddegger-Watson distribution, i.e., the
density is of the form g(kxvk), x ∈ Sd−1, where g is some known function, and xv is
the part ofx in ans-dimensional subspaceV. Then the tangent-norm equation of X
is given by
X =T η+ (1−T2)1/2ξ, T =kxvk, η∈ V, ξ∈ V⊥.
where kηk = 1, η ∈ V, kξk = 1, ξ ∈ V⊥. Here we take η = X
v/kXvk, ξ = (X −
T η)/(1−T2)1/2. A relationship similar to the rational symmetry is that T, η and ξ
are independent, and ηand ξ are uniformly distributed on unit spheres inV and V⊥.
Analogous to the preceding discussions, one can construct an empirical likelihood ratio test and we shall omit the details.
6.1.4 Coordinatewise symmetry
A random vector X inRd has a distribution coordinatewise symmetric about θ if
(X1−θ1, . . . , Xd−θd) d
Coordinatewise symmetry of X about θ is equivalent to the assertion that the coor-dinatewise radius vector V = (V1, . . . , Vd)> is independent of the coordinatewise sign
vectorU = (U1, . . . , Ud)>, whereVj =kXj−θjkandUj = sign(Xj−θj). Independence
implies
E[a(j)(V)b(k)(U)] = 0, a(j)∈L2,0(FV), b(k) ∈L2,0(FU), j, k= 1,2, . . . .
Analogous to the preceding discussions, we choose severala(j) ∈L
2,0(FV), j = 1, . . . , J
and b(k) ∈L2,0(FU), k= 1, . . . , K and construct the empirical likelihood ratio:
Rcs n = sup nYn i=1 nπj :π ∈Pn, n X j=1 πjaJ(Vi)⊗bK(Ui) = 0 o , By Owen’s theorem, we have the following.
Theorem 6.1.5 Suppose the matrix E(aJa>J(V))⊗E(bKb>K(U)) has full rank J K.
Then
−2 logRncs ⇒χ2J K.
6.1.5 Exchangeability
A random vector X in Rd is exchangeable if
(X1, . . . , Xd) d
= (Xπ1, . . . , Xπd)
for every permutation π1, . . . , πd of 1, . . . , d. Then O = (Xd,1, . . . , Xd,d)>, where
Xd,1 ≤. . .≤Xd,d are the order statistics of X, isindependent of R = (R1, . . . , Rd)>,
where Rj =Pdi=11[Xi ≤Xj] are the rank statistics. Independence implies
E[a(j)(O)b(k)(R)] = 0, a(j) ∈L2,0(FO), b(k) ∈L2,0(FR), j, k = 1,2, . . . .
In the same fashion, we choose several a(j) ∈ L
2,0(FO), j = 1, . . . , J and b(k) ∈
L2,0(FR), k = 1, . . . , K and construct the empirical likelihood ratio:
Res n = sup nYn j=1 nπj :π ∈Pn, n X j=1 πjaJ(Oj)⊗bK(Rj) = 0 o , where (Oj, Rj), j = 1, . . . , nis a random sample of (O, R).
Theorem 6.1.6 Suppose the matrix E(aJa>J(O))⊗E(bKb>K(R)) has full rank J K.
Then
−2 logRnes⇒χ2J K.
6.2 Simulation study
6.2.1 Testing the center of spherical symmetry
Consider the null hypothesis that the center of symmetry of a distribution is the origin of Rd versus the alternative hypothesis that the center is θ1 different from the
origin,
H0 :θ =0 vs H1 :θ =θ1.
We were interested in the power performance of the proposed tests at the nominal level of significanceα =.05 in different cases. Specifically, we looked at the following cases.
Case 1. How the power of the tests decreases as the dimension d grows with respect
to different sample sizes n. Case 2. How the power of the tests increases as θ1 moves
away from the origin for a fixed sample size n with respect to different dimensions d
and different number of constraints J, K. Case 3. How the side information increases
the power of the tests. We examined the test1[−2 log ˆRss
n > χ2Jndn(1−α)] in Cases 1
and 2 and the test 1[−2 logRssu
n > χ2J K+Ld(1−α)] in Case 3. In the latter situation,
we choose a(v) = (sign(v), v1[kvk ≤ 1.4] + 1.4sign(v)1[kvk>1.4])> and bK(U) =U.
As E(U) = 0, these choices still satisfy the equalities in (6.1.1) by the independence ofV andU even though the componentsa(1), a(2) ofaare not inL
2,0(FV). Examining
Cases 2 and 3, one observes that there is significant power increase and sample size reduction with the use of the side information
Table 6.1.
Case 1: The simulated α = .05 level of significance of the EL test
about the center of spherical symmetry H0 : θ = 0 with J = r,
d = dim, n = 400 and m = 2000 repetitions. Data generated from
multivariate normal. r=1 r=2 r=3 r=4 r=5 dim=1 0.0395 0.0520 0.0470 0.0585 0.0465 dim=2 0.0475 0.0610 0.0480 0.0460 0.0560 dim=3 0.0570 0.0605 0.0520 0.0615 0.0595 dim=4 0.0555 0.049 0.05450 0.06500 0.0705