Maximum empirical likelihood estimation in U-statistics based general estimating equations

(1)

Open Access Dissertations Theses and Dissertations

8-2016

Maximum empirical likelihood estimation in

U-statistics based general estimating equations

Lingnan Li

Purdue University

Follow this and additional works at:https://docs.lib.purdue.edu/open_access_dissertations

Part of theMathematics Commons, and theStatistics and Probability Commons

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information.

Recommended Citation

Li, Lingnan, "Maximum empirical likelihood estimation in U-statistics based general estimating equations" (2016).Open Access Dissertations. 796.

(2)

PURDUE UNIVERSITY GRADUATE SCHOOL Thesis/Dissertation Acceptance

This is to certify that the thesis/dissertation prepared By

Entitled

For the degree of

Is approved by the final examining committee:

To the best of my knowledge and as understood by the student in the Thesis/Dissertation Agreement, Publication Delay, and Certification Disclaimer (Graduate School Form 32), this thesis/dissertation adheres to the provisions of Purdue University’s “Policy of Integrity in Research” and the use of copyright material.

Approved by Major Professor(s):

Approved by:

Head of the Departmental Graduate Program Date

Lingnan Li

Maximum Empirical Likelihood Estimation in U-statistics Based General Estimating Equations

Doctor of Philosophy Hanxiang Peng Chair Benzion Boukai Guang Cheng Wei Zheng Hanxiang Peng Evgeny Mukhin 7/23/2016

(3)

(4)

BASED GENERAL ESTIMATING EQUATIONS

A Dissertation Submitted to the Faculty

of

Purdue University by

Lingnan Li

In Partial Fulfillment of the Requirements for the Degree

of

Doctor of Philosophy

August 2016 Purdue University Indianapolis, Indiana

(5)

ACKNOWLEDGMENTS

I want to acknowledge the effort from Professor Hanxiang Peng, who graciously became to be my thesis advisor four years ago and inspired me to start the research in the area of empirical likelihood.

I would like to thank Professor Fei Tan, who helped me on the projects; to thank Professor Benzion Boukai, who admitted me to the statistics program long time ago and supported me through the time; to thank Professor Jyoti Sarkar, Professor Fang Li, and Professor Wei Zheng; and to thank Professor Guang Cheng for serving on my dissertation committee.

I would also like to acknowledge the effort from Professor Zhongmin Shen, who helped me become part of the Ph.D. program and one of Professor Peng’s students.

Finally, I would like to thank the Department of Mathematical Sciences of IUPUI, without whom this thesis would not have been possible at all.

(6)

TABLE OF CONTENTS Page LIST OF TABLES . . . v SYMBOLS . . . vi ABBREVIATIONS . . . viii ABSTRACT . . . ix 1 INTRODUCTION . . . 1

2 JACKKNIFE EMPIRICAL LIKELIHOOD FOR VECTOR U-STATISTICS 5 2.1 Vector U-statistics . . . 5

2.2 JEL for vector U-statistics . . . 6

3 MAXIMUM EMPIRICAL LIKELIHOOD ESTIMATION . . . 9

3.1 Asymptotic behaviors of the local logarithm of the JEL ratio . . . . 9

3.2 Some lemmas . . . 12

3.3 Proof of Theorem 3.1.1 . . . 14

4 EASY MAXIMUM EMPIRICAL LIKELIHOOD ESTIMATION . . . 15

4.1 MELE’s and semiparametric effeciency . . . 15

4.2 Easy maximum empirical likelihood estimation . . . 18

5 APPLICATIONS: EXAMPLES . . . 23

5.1 Estimating the expected values with side information . . . 23

5.2 Smoothed U-quantiles . . . 30

6 AN EL APPROACH TO GOODNESS-OF-FIT TESTING FOR MULTI-VARIATE AND HIGH DIMENSIONAL SYMMETRIES . . . 33

6.1 Testing multivariate symmetries . . . 33

6.1.1 Spherical symmetry . . . 33

6.1.2 Rotational symmetry . . . 36

6.1.3 Antipodally symmetric distributions . . . 38

6.1.4 Coordinatewise symmetry . . . 38

6.1.5 Exchangeability . . . 39

6.2 Simulation study . . . 40

6.2.1 Testing the center of spherical symmetry . . . 40

6.2.2 Testing rotational symmetry . . . 45

6.3 Proof of Theorem 6.1.2 . . . 45

(7)

Page

7.1 Proofs for Lemmas in Chapter 3 . . . 51

7.2 Proofs for Lemmas and Theorems in Chapter 4 . . . 65

REFERENCES . . . 70

(8)

LIST OF TABLES

Table Page

6.1 Case 1: The simulated α = .05 level of significance of the EL test about

the center of spherical symmetryH0 :θ=0withJ =r,d= dim,n= 400

and m= 2000 repetitions. Data generated from multivariate normal. . 41

6.2 Case 1: The simulated power of the test about the center of spherical

symmetry H0 :θ =0 vs H1 :θ = rep(0.4,dim) with J =r, d = dim, and

m = 2000 repetitions at α = 0.05 level of significance. Data generated

from multivariate normal. . . 42

symmetry H0 : θ = 0 vs H1 : θ = θ1 with J = r, d = dim, n = 400 and

m = 2000 repetitions at α = 0.05 level of significance. Data generated

from multivariate normal. . . 43

6.4 Case 3: The simulated α = .05 level of significance of the test about the

center of spherical symmetry H0 : θ =0 with J = K =r, d = L = dim,

n = 50, and m = 100 repetitions. Data generated from multivariate

normal. . . 44

symmetry H0 : θ = 0 vs H1 : θ = θ1 with J = K = r, d = L = dim,

n = 50, and m = 100 repetitions at α = 0.05 level of significance. Data

generated from multivariate normal. . . 44

6.6 Case 1: The simulated α= 0.05 level of significance of the test about the

direction of rotational symmetry H0 : θ = (0,0,1)> with d = 3, J = r,

and m = 2000 repetitions. Data generated from the von Mises-Fisher

distribution. . . 46

6.7 Case 2: The simulated power of the test about rotational symmetry H0 :

θ = (0,0,1)> vs H1 : θ = (0.14,0.14,0.98)> with d = 3, J = r, and

m = 2000 repetitions at α = 0.05. Data generated from the von

(9)

SYMBOLS

X, Y, Z, . . . column random variables (vectors) x, y, z, . . . realizations of X, Y, Z, . . .

a, b, c, . . . constants

α, β, θ, . . . column parameter vectors

N(θ0) a neighborhood of θ0

N (µ,Σ) normal distribution with µ and variance (matrix) Σ

R the real space

Rm the m dimensional real (column) vector space

Rn,Rˆn empirical likelihood (ratio)

π1, . . . , πn probability weights

P, E, V ar, Cov probability, expected value, variance, covariance f, g, h, . . . column vector functions

˜

f the centered vector function off i.e. ˜f =f −E(f(Z))

Unm(h) a U-statistic of order m with kernel h

θ0 the true value of parameter

Θ the parameter space

kak the euclidean norm of a vectora

a> the transpose of a vector a

A> the transpose of a matrix A

A−1 _{the inverse of a matrix} _A

A⊗2 AA>

˙

f(z;θ) _∂θ∂f(z;θ)∈_Rk _{(column vector) for} _θ _∈

Rk, f ∈R

∂

∂θ>f(z;θ) a row vector of partial derivatives for θ∈Rk, f ∈R ˙

f(z;θ) _∂θ∂>f(z;θ) the m×k matrix for θ ∈Rk, f ∈Rm ˙

f(z;θ)> _∂θ∂f(z;θ)> the k×m matrix for θ ∈_Rk_{, f} _∈

(10)

¨

f(z;θ) _∂θ∂θ∂2>f(z;θ) = _∂θ∂>_∂θ∂f(z;θ) the k×k matrix forθ ∈Rk, f ∈R

→p convergence in probability

(11)

ABBREVIATIONS

a.s. almost sure convergence

DF Distribution Function

CDF Cumulative Distribution Function

EDF Empirical Distribution Function

EL Empirical Likelihood

GEE General Estimating Equation

i.i.d. Independent and Identically Distributed

JEL Jackknife Empirical Likelihood

MELE Maximum Empirical Likelihood Estimate

MVUE Minimum Variance Unbiased Estimate

r.v. Random Variable

(12)

ABSTRACT

Li, Lingnan Ph.D., Purdue University, August 2016. Maximum Empirical Likelihood

Estimation in U-statistics Based General Estimating Equations. Major Professor:

Hanxiang Peng.

In the first part of this thesis, we study maximum empirical likelihood estimates (MELE’s) in U-statistics based general estimating equations (UGEE’s). Our technical maneuver is the jackknife empirical likelihood (JEL) approach. We give the local uniform asymptotic normality condition for the log-JEL for UGEE’s. We derive the estimating equations for finding MELE’s and provide their asymptotic normality. We obtain easy MELE’s which have less computational burden than the usual MELE’s and can be easily implemented using existing software. We investigate the use of

side information of the data to improve efficiency. We exhibit that the MELE’s

are fully efficient, and the asymptotic variance of a MELE will not increase as the number of UGEE’s increases. We give several important examples and demonstrate that efficient estimates of moment based distribution characteristics in the presence of side information can be obtained using JEL for U-statistics.

In the second part, we propose several JEL goodness-of-fit tests for spherical symmetry, rotational symmetry, antipodal symmetry, coordinatewise symmetry and exchangeability. We employ the jackknife empirical likelihood for vector U-statistics to incorporate side information. We use estimated constraint functions and allow the number of constraints and the dimension to grow with the sample size so that these tests can be used to test hypotheses for high dimensional symmetries. We demonstrate that these tests are distribution free and asymptotically chisquare distributed. We conduct extensive simulations to evaluate the performance of these tests.

(13)

(14)

1. INTRODUCTION

Empirical likelihood (EL) is a data-driven likelihood approach with nonparametric nature which is effective and requires few assumptions about the distribution of the data. Owen (1988, 1990, 1991) showed that the empirical likelihood ratio statistics have the limiting chi-square distribution under mild conditions. He also demonstrat-ed that tests and confidence intervals can be constructdemonstrat-ed. The empirical likelihood theory has been successfully extended to various areas of statistics with tremendous

accomplishments. These include Bartlett correction (DiCiccio, et al., 1991),

gener-alized linear models (Kolaczyk, 1994), heteroscedastic partially linear models (Lu, 2009), partially linear models (Shi and Lau, 2000; Wang and Jing, 2003), paramet-ric and semiparametparamet-ric models in multiresponse regression (Chen and Van Keilegom, 2009), right censored data (Li and Wang, 2003), U-statistics with side information

(Yuan, et al., 2012), and stratified samples with nonresponse (Fang, et al., 2009).

Qin and Lawless (1994) linked empirical likelihood with finitely many estimating

e-quations and investigated maximum empirical likelihood estimators. Chen, et al.

(2009) obtained asymptotic normality for the number of constraints growing to

infin-ity. Hjort,et al. (2009) and Peng and Schick (2013a, 2013b) generalized the empirical

likelihood approach to allow for the number of constraints to grow with the sample size and for the constraints to use estimated criteria functions. Algorithms, calibra-tion and higher-order precision of the approach can be found in Hall and La Scala (1990), Emerson and Owen (2009) and Liu and Chen (2010) among others.

In Owen’s homepage (http://statweb.stanford.edu/∼owen/empirical/) software

can be found. Here are two algorithms from this site: scel.R (R function to compute

empirical likelihood using a self-concordant convex criterion) andel.R(Mai Zhou’s R

(15)

U-statistics is a class of statistics which is especially useful in estimation. Many popular statistics such as high order moments information can be expressed by

U-statistics, see e.g. Serfling (1980), Kowalski and Tu (2008) and Lee (1990). Yuan, et

al. (2012) explored maximum empirical likelihood estimates (MELE’s) in U-statistics

with side information. However, usual EL method runs into serious computational difficulties when it’s applied to U-statistics. U-statistics are not independent but correlated so that they do not satisfy the independence or at least asymptotic inde-pendence which is assumed by the definition of empirical likelihood. Moreover, unlike the usual empirical likelihood, the nonlinearality of EL weightsπj’s in the constraints

equations results in that there are no explicit solutions for the EL weights. Jing, et

al. (2009) identified the asymptotic independence of the jackknife pseudo values of a

U-statistic and introduced their jackknife empirical likelihood (JEL) for U-statistics, and showed its effectiveness in handling one- or two-sample U-statistics. Some nice properties of the jackknife pseudo values of a U-statistic were exploited to establish the Wilks theorems for their cases. For example, the average of the jackknife pseudo values is equal to the U-statistic, and the sample variance of them is an asymptotically unbiased estimator of the asymptotic variance of the U-statistic.

Motivated by applications to goodness of fit U-statistic testing, Peng and Tan (2016) gave two approaches to justify the JEL for vector U-statistics and proved the Wilks theorems. They extended empirical likelihood for general estimating equations (GEE’s) to U-statistics based general estimating equations (UGEE’s). The results were extended to allow for the use of estimated constraints and for the number of constraints to grow with the sample size. They exhibited that the JEL can be used to construct EL tests for moment based distribution characteristics (e.g. skewness, coefficient of variation) with less computational burden and more flexibility than the usual EL. This can be done in the U-statistic representation approach and the vector U-statistic approach which were illustrated with several examples including JEL tests for Pearson’s correlation, Goodman-Kruskal’s Gamma, overdisperson, U-quantiles, variance components, and simplicial depth. They showed that tests are

(16)

asymptotically distribution free. They ran simulations to exhibit power improvement of the tests with incorporation of side information.

Soon it was realized that it can also be used to construct point estimators. Qin and Lawless (1994) linked empirical likelihood with GEE’s and investigated maximum em-pirical likelihood estimators (MELE’s). They established consistency and asymptotic normality of MELE’s under the usual regularity conditions, and demonstrated that the variance of a MELE will not increase when the number of estimating equations is increased. Furthermore, they showed that MELE’s are fully semiparametrically

efficient in the sense of least dispersed regular estimators (Bickel, et al. (1993), Van

der Vaart (2000)). Peng and Schick (2013) explored MELE’s in the case of constraint functions that may be discontinuous and/or depend on additional parameters. The latter is the case in applications to semiparametric models where the constraint func-tions may depend on the nuisance parameter. Zhang (1995, 1997) used the method of MELE’s to construct improved estimates in M-estimation and quantile processes with the availability of auxiliary (side) information. He established consistency and asymp-totic normality, and proved that the asympasymp-totic variances of the resulting estimators are smaller than those of the usual sample M-estimators and sample quantiles. It was utilized by Hellerstein and Imbens (1999) for the least squares estimators in a linear regression model and the application to a real data set was presented. These authors dealt with finitely many of constraints. Peng and Schick (2013) has employed on-step estimator to construct MELE’s. Peng (2015) developed a class of easy MELE’s which is computationally more efficient. Recently, Tang and Leng (2012) used this idea to construct more efficient estimators of parameters in quantile regression.

In this thesis, we study MELE’s in UGEE’s and their asymptotic behaviors. Our technical maneuver is the jackknife empirical likelihood approach. It is well known that the asymptotic behaviors of the U-statisticUn(h) is determined byh1 (see

Chap-ter 2). Here we shall apply the theory of Qin and Lawless (1994) on h1 to derive the

asymptotic behaviors of the MELE’s of the JEL for UGEE’s. These results for the UGEE’s are parallel to those of Qin and Lawless (1994). We obtain the uniform local

(17)

asymptotic normality for the logarithm of the JEL ratio in Chapter 3. We derive the estimating equations for the MELE’s in UGEE’s in Chapter 4. Here we also give a class of easy MELE’s and establish their asymptotic distribution. In Chapter 5, we provide a number of examples. Here we demonstrate that efficient estimates of moment based distribution characteristics in the presence of side information can be obtained using JEL for U-statistics. In Chapter 6, we propose several JEL tests for various multivariate and high dimensional symmetries. Some of the technical details are provided in Chapter 7.

(18)

2. JACKKNIFE EMPIRICAL LIKELIHOOD FOR

VECTOR U-STATISTICS

In this chapter, we recall some facts about one-sample multivariate U-statistics and introduce the JEL approach.

2.1 Vector U-statistics

Let (Z,S) be a measurable space andP be a probability measure on this space.

Let Z1, . . . , Zn be independent copies of a Z-valued random variable Z with

cumu-lative distribution functionF underP. Leth:_Rm 7→_Rd _{be a known function that is}

permutation symmetric in its m arguments. θ ∈ Θ is a parameter we are interested

in. A multivariate or vector U-statistic with kernel h of order m is defined as

Un =:Unm(h) =: n m −1 X 1≤i1<...<im≤n h(Zi1, . . . , Zim;θ), n≥2.

Throughout we assume h is Fm-square integrable, that is, h ∈ L2(Fm), where

L2(Fm) =

f :R

kfk2_dFm _<_∞ _{, where} _k_v_k _{denotes the euclidean norm of vecotr}

v. We assume throughout that

E(h(;θ)) =:E(h(Z1, . . . , Zm;θ)) = 0. (2.1.1)

This of course implies E(Un) = 0. Also, we shall abbreviate Pnf = n−1Pn_j=1f(Zj)

and P f =E(f(Z)). Lethm =h and

hc(z1, . . . , zc;θ) =E(h(z1, . . . , zc, Zc+1, . . . , Zm;θ)), c= 1, . . . , m−1.

Then hc is a version of the conditional expectation, that is,

(19)

Letδz be the point mass at z ∈ Z. We now define h∗_c(z1, . . . , zc) = (δz1 −P). . .(δzc −P)P m−c_h, _c_{= 0}_,₁_{, . . . , m.} Throughout we let ˜ f =f−P f

be the centered version of a functionf for whichP f is well defined. Clearlyh∗₁ = ˜h1.

2.2 JEL for vector U-statistics

Let U_n(−₋j)₁ denote the U-statistic based on the n−1 observations Z1, . . ., Zj−1,

Zj+1, . . ., Zn. The jackknife pseudo values of the U-statistic are defined as

Vnj =nUn−(n−1)U (−j)

n−1, j = 1, . . . , n.

LetRnj = ˜Vnj−m˜h1(Zj;θ). It has been shown in (4.6) of Peng and Tan (2016) that

each component of Rnj is ofOp(n−1/2), hence

˜

Vnj(θ) =m˜h1(Zj, θ) +Op(n−1/2), j = 1, . . . , n.

As argued in Peng and Tan (2016), this shows that each jackknife value ˜Vnj depends

asymptotically on Zj, so that ˜Vnj, j = 1, . . . , nare asymptotically independent. As a

result, if πj is a probability mass placed at Zj, then approximately the same

proba-bility massπj is placed at the jackknife value ˜Vnj forj = 1, . . . , n; because of the the

asymptotic independence of the jackknife values, the joint likelihood is approximately

the product of these πj’s. Consequently, it is justified to introduce the JEL for the

vector U-statistic Un(h) as follows:

ˆ Rn(θ) = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjV˜nj(θ) = 0 o , θ ∈Θ, (2.2.1)

where Pn denotes the closed probability simplex in dimension n, i.e.

Pn =

(20)

Using Lagrange multipliers, Owen (1988) derived πj = 1 n 1 1 +ξ>_V˜ nj(θ) , j = 1, . . . , n, (2.2.2)

where ξ satisfies the equation

1 n n X j=1 ˜ Vnj(θ) 1 +ξ>_V˜ nj(θ) = 0. (2.2.3)

As (2.1.1) holds, ˜Vnj = Vnj. For notational brevity, we sometimes write ˜Vnj = Vnj.

Peng and Tan (2016) showed that the JEL for vector U-statistics are asymptotically chi-square distributed under the same usual assumption as for the asymptotic nor-mality of vector U-statistics, that is, ifV ar(h1(Z)) is nonsingular then−2 log ˆRn(θ0)

is asymptotic chi-square distributed with d degrees of freedom, i.e.

(21)

(22)

3. MAXIMUM EMPIRICAL LIKELIHOOD ESTIMATION

In this chapter, we give the local uniform asymptotic normality condition for the log-JEL for UGEE’s.

3.1 Asymptotic behaviors of the local logarithm of the JEL ratio

Let (Z,S) be a measurable space, Q be a family of probability measures onS,

and θbe a parameter of interest which is from an open subset Θ of_Rk_{. Let}_Z

1, ..., Zn

be independent and identically distributed (i.i.d.) Z-valued random variables with

an unknown distribution Q belonging to the model Q. Recall that a kernel function

h :_Rm _7→

Rd is permutation symmetric about its m arguments and satisfies (2.1.1).

We are interested in inference about the characteristic θ and work with the jackknife

empirical likelihood (JEL) ratio ˆRn(θ) in (2.2.1).

Qin and Lawless (1994) studied the maximum empirical likelihood estimator (MELE), ˆ θ= arg max θ∈Θ ˆ Rn(θ). (3.1.1)

Recall that the local JEL ratio is defined as ˆ Ln(t) = log ˆ Rn(θ0+n−1/2t) ˆ Rn(θ0) , t∈_Rk, θ0+n−1/2t∈Θ.

For a function f onZ ×Θ, let ˙f ,f¨denote the first and second partial derivative of

f with respect to parameter θ, that is,

˙ f(z;θ) = ∂ ∂θf(z;θ), ¨ f(z;θ) = ∂ 2 ∂θ∂θ>f(z;θ), z∈Z, θ ∈Θ. Recall ˜h=h−E(h). Let us introduce the following assumptions.

(A1) There exist a neighborhood N(θ0) of θ0 and a square-integrable function Gon

Zm _{such that} _h₍_z

(23)

to θ for every (z1, . . . , zm)∈ Zm with the first partial derivative ˙h of full rank

and second partial derivative ¨h satisfying

kh˙(z1, . . . , zm;θ)k+k¨h(z1, . . . , zm;θ)k ≤G(z1, . . . , zm), θ∈N(θ0). (A2) sup_θ_∈_N(θ₀₎Ekh(Z1, . . . , Zm;θ)k2 <∞. (A3) W =m2_E _˜_h 1(Z;θ0)⊗2 is positive definite.

Remark 3.1.1 (A1) implies that kh(z1, . . . , zm;θ)k is also bounded by a

square-integrable function G∗(z1, . . . , zm) on Zm×N(θ0) provided that N(θ0) is bounded

(by K). In fact, it follows from the mean value theorem that for θ ∈ N(θ0), there

exist θ∗ lying between θ0 and θ such that

kh(z1, . . . , zm;θ)k ≤ kh(z1, . . . , zm;θ0)k+kh˙(z1, . . . , zm;θ∗)kkθ−θ0k ≤ kh(z1, . . . , zm;θ0)k+G(z1, . . . , zm)kθ−θ0k ≤ kh(z1, . . . , zm;θ0)k+ 2KG(z1, . . . , zm) :=G∗(z1, . . . , zm). Letun =un(θ0) where un(θ) =n−1/2 n X j=1 mh˜1(Zj;θ). (3.1.2)

We have the following uniform local asymptotic normality.

Theorem 3.1.1 Assume that (A1)-(A3) are satisfied. Then it holds the expansions sup ktk≤C k −2 log ˆRn(θ0+n−1/2t)−(un−At)>W−1(un−At)k=oP(1) (3.1.3) and sup ktk≤C kLˆn(t)−t>A>W−1un+ 1/2t>A>W−1Atk=oP(1) (3.1.4)

(24)

Note that A has full rank by (A1) and plays the same role as the quantity

−E[ ˙u(Z;θ)] does in Qin and Lawless (1994). Note also that the quadratic functionq defined by

q(t) = t>A>W−1un−1/2t>A>W−1At, t∈Rk, (3.1.5)

is uniquely maximized by ˆt = (A>W−1_A₎−1_A>_W−1_u

n. This shows that θ 7→ Rˆn(θ)

has a local maximizer ˆθ such that

n1/2(ˆθ−θ0)−(A>W−1A)−1A>W−1un =oP(1). Therefore, ˆ θ =θ0+ (A>W−1A)−1 m n n X j=1 A>W−1˜h1(Zj;θ0) +oP(n−1/2). (3.1.6)

This of course implies the asymptotic normality of ˆθ, i.e.

√ n(ˆθ−θ0) =⇒ N (0,(A>W−1A)−1). (3.1.7) Substituting (2.2.2) in ˆRn(θ), we get −log ˆRn(θ) = n X j=1 log(1 +ξ>Vnj(θ)), θ ∈Θ, (3.1.8)

where ξ satisfies (2.2.3). It is not difficult to see that under (A1) – (A3) the random

functionθ7→Rˆn(θ) is continuously differentiable. Consequently, ˆθand ˆξ=ξ(ˆθ) must

satisfy B1n(θ, ξ) = 0, B2n(θ, ξ) = 0, (3.1.9) where B1n(θ, ξ) = 1 n n X j=1 Vnj(θ) 1 +ξ>_V nj(θ) , (3.1.10) and B2n(θ, ξ) = 1 n n X j=1 1 1 +ξ>_V nj(θ) ∂Vnj(θ)> ∂θ ξ. (3.1.11)

Note that (3.1.9) are the estimating equations for the MELE ˆθ. Summarizing the

(25)

Theorem 3.1.2 Suppose (A1) – (A3) hold. Then there is a maximizerθˆfor the JEL θ 7→ Rˆn(θ) such that θˆ solves (3.1.9) and satisfies the stochastic expansion (3.1.6)

hence (3.1.7).

3.2 Some lemmas

To prove Theorem 3.1.1, we need the following lemmas with the proof delayed to Chapter 7.

Lemma 3.2.1 Assume (A1)-(A2) are met. Then it holds for every finite C, ˆ Dn(C) = sup ktk≤C 1 n n X j=1 kVnj(θ0+n−1/2t)−m˜h1(Zj;θ0)k2 =OP(1/n). (3.2.1)

Lemma 3.2.2 Under assumptions (A1)-(A2), it holds the expansion,

sup ktk≤C kn−1/2 n X j=1 (Vnj(θ0+n−1/2t)−mh˜1(Zj;θ0)) +Atk=oP(1) (3.2.2)

for every finite constant C, with A=−E[mh˙˜1(Z;θ0)].

To complete the proof of Theorem 3.1.1, we need a general result from Peng and Schick (2013). Let _Tn1(t), ...,Tnn(t) be d-dimensional random vectors indexed by

t ∈ _Rk_{, where} _k _≤ _d_{. We are interested in the asymptotic behavior of the empirical}

likelihood process Rn(t) = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjTnj(t) = 0 o , ktk ≤C,

where C is a positive constant. We shall use the following result which is a special

case of Lemma 5.2 of Peng and Schick (2015).

Lemma 3.2.3 Let x1, ..., xn be d-dimensional vectors. Set

x∗ = max 1≤j≤nkxjk, x¯= 1 n n X j=1 xj, S = 1 n n X j=1 xjx>j ,

(26)

and let λ denote the smallest and Λ the largest eigen value of the matrix S. Then the inequality λ >5kx¯kx∗ implies k −2 logR−nx¯>S−1x¯k ≤(Λ + Λ 3 4Λ2) 2nkx¯k3_x∗ (λ− kx¯kx∗₎3, (3.2.3) where R = supn n Y j=1 nπj :π ∈Pn, n X j=1 πjxj = 0 o .

Motivated by this we introduce the quantities

T∗n(t) = max 1≤j≤nkTnj(t)k, ¯ Tn(t) = 1 n n X j=1 Tnj(t), Sn(t) = 1 n n X j=1 Tnj(t)T>nj(t).

We impose the following conditions. (B1) sup_k_t_k≤_C_T∗_n(t) =oP(n1/2).

(B2) There is a positive definite d×d matrix S such that

sup ktk≤C

k_Sn(t)−Sk=oP(1).

(B3) There exist k-dimensional random vectors un and and d×k matrix A of full

rankk such that un=OP(1) and

sup ktk≤C

k√n_T¯n(t)−un+Atk=oP(1).

We have the following result with the proof delayed to Chapter 7.

Lemma 3.2.4 Suppose (B1)-(B3) hold. Then sup ktk≤C k −2 logRn(t)−(un−At)>S−1(un−At)k=oP(1) (3.2.4) and therefore sup ktk≤C klog Rn(t) Rn(0) −t>A>S−1un+ 1 2t > A>S−1Atk=oP(1). (3.2.5)

(27)

3.3 Proof of Theorem 3.1.1

With the help of the Lemma 3.2.1 – Lemma 3.2.4 we now prove the main result.

Proof of Theorem 3.1.1. We verify the assumptions of Lemma 3.2.3 with

Tnj(t) = Vnj(θ+n−1/2t),S =W,un=n−1/2Pn_j=1m˜h1(Zj;θ), andA=−E[mh˙˜1(Z;θ0)].

Since Ekm˜h1(Z;θ0)kis finite, we obtain

max

1≤j≤nkm

˜

h1(Zj;θ0)k=oP(n1/2). (3.3.1)

For a fixed C, note that we have the bound

sup ktk≤C max 1≤j≤nkVnj(θ+n −1/2_t₎_{k ≤} _max 1≤j≤nkm ˜ h1(Zj;θ)k+n1/2Dˆ1/2n

where ˆDn is given in (3.2.1). It then follows from Lemma 3.2.1 that ˆDn=op(1) hence

sup ktk≤C max 1≤j≤nkVnj(θ+n −1/2_t₎_k₌_o P(n1/2).

This implies (B1). From Lemma 3.2.2 and the central limit theorem it follows that (B3) holds. We are now left to verify (B2). To this end, set

¯ Wn = 1 n n X j=1 ˜ h1(Z;θ)˜h>1(Z;θ). By (A2), kW¯n−Wk=oP(1). (3.3.2)

We conclude (B2) from Lemma 3.2.1, (3.3.2) and the bound,

ka>(_Sn(t)−W¯n)ak=k 1 n n X j=1 (a>Vnj(θ+n−1/2t))2− 1 n n X j=1 (a>h˜1(Zj;θ))2k ≤Dˆn+ 2( 1 n n X j=1 k˜h1(Zj;θ)k2Dˆn)1/2,

valid for every unit vector ain_Rk_{, every}_t_with _k_t_{k ≤}_C_{. We now apply Lemma 3.2.3}

(28)

4. EASY MAXIMUM EMPIRICAL LIKELIHOOD

ESTIMATION

In this chapter, we discuss the existence of maximum empirical likelihood estimate

(MELE) ˆθ. We derive the estimating equations for the MELE’s and obtain their

asymptotic behaviors. In the end, we give a class of easy MELE’s and establish their asymptotic distributions.

4.1 MELE’s and semiparametric effeciency

If h1 were a known function, we would work with the empirical likelihood ratio,

Rn(θ) = sup n_Yn j=1 nπj :π∈Pn, n X j=1 πjm˜h1(Zj;θ) = 0 o , θ ∈Θ. (4.1.1)

It follows from Owen (1988) that if Var(mh1) is finite and positive definite then (4.1.1)

reaches its maximum when πj = 1 n 1 1 +ξ>_m˜_h 1(Zj;θ) , j = 1, ..., n,

where ξ= (ξ1, ..., ξd)> are Lagrange multipliers, which is a d×1 vector and satisfies n X j=1 mh˜1(Zj;θ) 1 +ξ>_m˜_h 1(Zj;θ) = 0. Moreover, ξ→0 as n→ ∞.

In addition to satisfying (A1)-(A3), we further assume EG3 <∞. It then follows

from Lemma 1 of Qin and Lawless (1994) that as n → ∞, with probability 1 the EL

ratio function in (4.1.1) attains its maximum value at some ˜θ in the interior of the

ball kθ−θ0k ≤n−1/3, and ˜θ and ˜ξ=ξ(˜θ) satisfy

(29)

where A1n(θ, ξ) = 1 n n X j=1 mh˜1(Zj;θ) 1 +ξ>_m˜_h 1(Zj;θ) , (4.1.3) A2n(θ, ξ) = 1 n n X j=1 1 1 +ξ>_m˜_h 1(Zj;θ) ∂m˜h1(Zj;θ)> ∂θ ξ. (4.1.4)

The equations in (4.1.2) are theoretically useful and can’t be used to find the MELE’s

defined by the JEL ratio (2.2.1) becauseh1 is unknown. Instead, we find the MELE’s

by solving the estimating equations in (3.1.10) – (3.1.11). The next lemma states that both sets of equations give the same solutions as the sample size tends to infinity. To this end, let N0(θ0,0) = {θ : kθ −θ0k ≤ n−1/3} × {ξ : kξk ≤ n−1/3} denote

a neighborhood of (θ0,0). We have the following lemma with the proof delayed to

Chapter 7.

Lemma 4.1.1 Assume (A1)-(A3) hold. Then sup (θ,ξ)∈N0(θ0,0) kB1n(θ, ξ)−A1n(θ, ξ)k=oP(1), (4.1.5) and sup (θ,ξ)∈N0(θ0,0) kB2n(θ, ξ)−A2n(θ, ξ)k=oP(1). (4.1.6)

Consequently, from Lemma 4.1.1 it follows that for large n, there exists some

point ˆθ in a shrinking neighborhood of θ0, such that ˆθ and ˆξ = ξ(ˆθ) satisfy (3.1.9)

and the JEL (2.2.1) reaches its maximum value at (ˆθ,ξˆ). These statement hold on an

event Ω withP(Ω) = 1 at least for sufficiently large n. In general, on its complement

Ωc _{where (ˆ}_θ,_ξˆ_{) are not defined we define them to be arbitrary numbers. The below}

theorem gives the asymptotic normality and the proof can be found in Chapter 7.

Theorem 4.1.1 Assume (A1)-(A3) hold with the dominating function G satisfying E(G3)<∞. Then, as n tends to infinity, with probability one θˆ(θ) attains its maxi-mum at some θˆin a shrinking neighborhood of θ0, and θˆand ξˆ=ξ(ˆθ) solves (3.1.9),

and satisfy

√

n(ˆθ−θ0)→N (0, V),

√

(30)

where V = E(h˙˜1)> E(˜h⊗12) −1 E(h˙˜1) −1 , U = 1 m2 E(˜h ⊗2 1 ) −1 Id−E(h˙˜1)V E(h˙˜1)> E(˜h⊗12) −1 , and θˆand ξˆare asymptotically uncorrelated.

As a result, the weights πj, j = 1, . . . , n can be estimated by

ˆ πj = 1 n 1 1 + ˆξ>_m˜_h 1(Zj; ˆθ) , j = 1, ..., n. (4.1.8)

This, in turn, yields an efficient estimate of the DF F as follows:

ˆ Fn(z) = n X j=1 ˆ πj1(Zj ≤z). (4.1.9)

Remark 4.1.1 The asymptotic variance matrixV can be consistently estimated by ˆ V = _Xn j=1 ˆ πjh˙˜1(Zj; ˆθ) > n X j=1 ˆ πj˜h⊗12(Zj; ˆθ) −1 n X j=1 ˆ πjh˙˜1(Zj; ˆθ) −1 . (4.1.10)

Theorem 4.1.1 also can be used to get approximate confidence limits for θ or F.

By the U-statistics theory, the asymptotic distribution of a U-statistic Umn(h)

of order m with kernel h is dictated by h1. Thus we apply Corollary 1 of Qin and

Lawless (1994) with their estimating function g =h1 to obtain the below result.

Corollary 4.1.1 Assume that the assumptions of Theorem 4.1.1 hold. Suppose d >

k. Then the asymptotic covariance-variance matrix V = Vd of

√

n(ˆθ −θ0) does not

increase (in the sense of positive definiteness of positive definite matrices) as the number of estimating equations increases.

Using the same argument as above and applying Theorem 3 of Qin and Lawless (1994), we have the following.

Theorem 4.1.2 Under the assumptions of Theorem 4.1.1, the MELEθˆis fully effi-cient in the sense of Van der Vaart (1988) and Bickel, Klaassen, Ritov and Wellner (1993).

(31)

The efficiency criteria used are that of a least dispersed regular estimator or that of a locally asymptotic minimax (LAM) estimator. These criteria are based on the convolution theorems and on the lower bounds on the local asymptotic risk in LAN (locally asymptotically normal) and LAM families, see the above references and ad-ditional references therein.

4.2 Easy maximum empirical likelihood estimation

In this section, we study a special case of the U-statistics based estimating equa-tions, that is, some of the equations do not involve parameters. At a first glance, it is seemingly restrictive for use. But actually it is quite useful as we shall demonstrate below. As to this special case, we derive the estimating equations for the MELE’s which are computationally faster than the usual MELE’s – easy MELE’s – as the solutions of the estimating equations given before.

Consider a kernel functions of the form,

h(Z1, . . . , Zm;θ) = (u(Z1, . . . , Zm;θ)>, v(Z1, . . . , Zm)>)>,

where u : _Rm×Θ 7→ _Rp _and _v _:

Rm 7→ Rq are measurable functions and Θ is an

open subset of _Rk_{. Suppose}_u _and _v _satisfy

E(u(Z1, . . . , Zm;θ)) = 0, θ∈Θ, (4.2.1)

and

E(v(Z1, . . . , Zm)) = 0. (4.2.2)

While (4.2.1) serves as a criterion equation for the parameter θ, (4.2.2) describes side

information about the underlying distribution. The parameter θ is usually estimated

by the M-estimate, the solution to the sample version of (4.2.1), that is, the UGEE with kernel u(;θ), Unm(u(;θ)) = n m −1 X 1≤i1<···<im≤n u(Zi1, . . . , Zim;θ) = 0, θ ∈Θ, (4.2.3)

(32)

where we assume, without loss of generality, thatu is argument symmetric about its

m variables (otherwise we symmetrize it). The M-estimate is not efficient in general

as the side information given by (4.2.2) is not utilized. Often we assume that the

number p of equations in (4.2.1) is equal to the dimension k of the parameter θ

(otherwise we can eliminate those redundant equations). Throughout this section we

assume p=k.

We now work with the JEL, ˆ Rn(θ) = sup n_Yn j=1 nπj :π∈Pn, n X j=1 πjvec(Vnju(θ), V v nj) = 0 o , θ ∈Θ, (4.2.4) where Vu

nj, Vnjv are the jackknife pseudo values based on the U-statistics with kernel

function u and v respectively.

In this case, V =m2_E _(˜_u 1(Z;θ0)>,˜v1(Z)>)>(˜u1(Z;θ0)>,˜v1(Z)>) , so that V =m2   E(˜u⊗₁2) E(˜u1˜v1>) E(˜v1u˜>1) E(˜v ⊗2 1 )  . (4.2.5)

Introduce the following assumption.

(A4) Suppose u satisfies (A1), uand v satisfy (A2), and V satisfies (A3).

Under (A4), we can apply Lemma 3.2.1 and Lemma 3.2.2 to u, so that it holds for

every finite C, ˆ Dn(C) = sup ktk≤C 1 n n X j=1 kV_nju(θ0+n−1/2t)−mu˜1(Zj;θ0)k2 =OP( 1 n). (4.2.6) sup ktk≤C kn−1/2 n X j=1 (V_nju(θ0+n−1/2t)−mu˜1(Zj;θ0)) +Autk=oP(1), (4.2.7)

where Au = −E[mu˙˜1(Z;θ0)] of full rank. Let Av = 0q×k and A = (A>u, A

>

v)

>_{. It} follows from Theorem 3.1.1 holds for the JEL (4.2.4). Let ξ = (ξ_u>

(p×1), ξ >

v(q×1)) be the Lagrange multipliers. Then we have

πj = 1 n 1 1 +ξ> uVnju(θ) +ξ>vVnjv , j = 1, . . . , n,

(33)

where ξ satisfies n X j=1 vec(V_nju(θ), V_njv) 1 +ξ> uVnju(θ) +ξv>Vnjv = 0.

Thus, similar to the discussion in the previous section, as n→ ∞, (4.2.4) attains its

maximum value in probability at some ˆθ in a shrinking neighborhood of θ0, and ˆθ

and ˆξ = ( ˆξ_u>,ξˆ_v>)> = (ξu(ˆθ)>,ξˆv>)> satisfy C1n(ˆθ,ξˆ) = 0, C2n(ˆθ,ξˆ) = 0, (4.2.8) where C1n(θ, ξ) = 1 n n X j=1 vec(V_nju(θ), V_njv) 1 +ξ> uVnju(θ) +ξv>Vnjv , (4.2.9) C2n(θ, ξ) = 1 n n X j=1 ξ_u>V˙_nju(θ) 1 +ξ> uVnju(θ) +ξv>Vnjv . (4.2.10)

Therefore, as a corollary of Theorem 4.1.1, the asymptotic properties of the MELE’s ˆ

θ of (4.2.4) can be obtained as stated below. As a convention, we drop the argument

at the true value of parameter so that u(Z;θ0) = u(Z), E(u(Z;θ0)⊗2) = E(u⊗2(Z))

and of course E(u⊗2(Z)) =E(u⊗2), etc. Under (A4),V is invertible. Let the inverse

of the block matrix V be V−1 _{= (}_Vij₎

i,j=1,2. By the inverse formulas for a block

matrix, we have V−1 =   V₁₁−_·1₂ −V₁₁−_·1₂E(˜u1v˜>1)(E(˜v ⊗2 1 )) −1 −V₂₂−_·1₁E(˜v1u˜>1)(E(˜u ⊗2 1 ))−1 V −1 22·1  , (4.2.11) where V11·2 =E(˜u⊗12)−E(˜u1v˜>1) E(˜v ⊗2 1 ) −1 E(˜v1u˜>1), V22·1 =E(˜˜v1⊗2)−E(˜v1u˜>1) E(˜u ⊗2 1 ) −1 E(˜u1v˜>1).

Theorem 4.2.1 Suppose (A4) holds. Suppose also p = k. Then, as n tends to in-finity, with probability one θˆ(θ) attains its maximum at some θˆin a shrinking neigh-borhood of θ0, ξˆ= (0,ξˆv>)> solves (4.2.8), and satisfy

√

n(ˆθ−θ0)→N (0,V¯),

√

(34)

where ¯ V = E( ˙˜u1) −1 V11·2 E( ˙˜u1) −> , U¯ =V22·1−V21V11−·12V 12 . and θˆand ξˆare asymptotically uncorrelated.

Remark 4.2.1 As the asymptotic variance of the estimate of the UGEE (4.2.3) is Σ = E( ˙˜u1)

−1

E(˜u⊗₁2) E( ˙˜u1)

−>

, the asymptotic variance of the MELE’s satisfies ¯

V ≤Σ.

Now let us expand the constraints in (4.2.4) as            n X j=1 πjVnju(θ) = 0, ϑ∈Θ, (4.2.12a) n X j=1 πjVnjv = 0. (4.2.12b)

Based on (4.2.12b), we look at the JEL,

Rn = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjVnjv = 0 o . (4.2.13)

This has the solution ˆ πj = 1 n 1 1 + ˆξ> vVnjv , j = 1, . . . , n, where ˆξv satisfies n X j=1 Vv nj 1 +ξ> vVnjv = 0. (4.2.14)

Naturally, we substitute ˆπj in (4.2.12a) to get

1 n n X j=1 V_nju(θ) 1 + ˆξ> v Vnjv = 0. (4.2.15)

Therefore, we find ξu = 0, and ˆξ = (0p×1,ξˆv). It is worth to note that (4.2.14) –

(4.2.15) are identical to (4.2.9) – (4.2.10). Consequently, we can find the MELE’s by solving (4.2.14) – (4.2.15). We refer the MELE’s as the solutions to (4.2.15) to as

the easy empirical likelihood estimates as they have less computational burden than

those usual MELE’s as the solutions to (4.2.9) – (4.2.10) (or (3.1.9)) as pointed by Peng (2015).

(35)

(36)

5. APPLICATIONS: EXAMPLES

In this chapter, we give several examples.

5.1 Estimating the expected values with side information

We are interested in estimation of the expected value θ = E(ψ(Z1, . . . , Zm)) for

some known function ψ : _Rm _7→

Rd in the presence of side information given by

ET(Z1, . . . , Zm) = a for some measurable function T : Rm 7→ R and constant a.

Without loss of generality, we assume bothψ andT are argument-symmetric. Hence

our kernel functions are given by (_u₍_Z

1, . . . , Zm;θ) =ψ(Z1, . . . , Zm)−θ, (5.1.1a)

v(Z1, . . . , Zm) = T(Z1, . . . , Zm)−a. (5.1.1b)

We shall apply Theorem 4.2.1 to derive the asymptotic behaviors.

Let us first mention that U-statistics are quite general. Heffernan (1997) showed that a statistical functionalθ =θ(Q) of a distributionQadmits an unbiased estimator iff there is a functionψ ofm variables such thatθ(Q) = R· · ·R

ψ dQm, and derived the

U-statistic as the unique MVUE of a central moment. Moment based distribution characteristics (e.g. Pearson’s correlation) are functions of central moments, so that the sample versions as test statistics can be expressed as functions of U-statistics.

Example 1 Estimating the mean difference in the presence of known covariance. Let Z = (X, Y) be a bivariate random vector with finite second

mo-ments. We are interested in estimating the mean differenceθ=E(X−Y) when there

is available the side information Cov(X, Y) =a for some knowna. Let Zi = (Xi, Yi),

i= 1, . . . , n be a random sample of Z. Let u(Z1, Z2;θ) =

1

(37)

be the symmetrized kernel function. The side information can be expressed by the U-statistic based equation E(Un(v)) = 0 with the kernel equal to

v(Z1, Z2) =

1

2(X1Y1+X2Y2−X1Y2−X2Y1)−a.

We shall apply Theorem 4.2.1 to derive the asymptotic behavior of the MELE ˆθ.

Let E(X) =µ1,E(Y) = µ2, Var(X) = σ21, and Var(Y) =σ22. Then

u1(z1;θ) =E(u(Z1, Z2;θ)|Z1 =z1) =E(1 2(x1−y1) + 1 2(X2−Y2)−θ) = 1 2(x1−y1) + 1 2(µ1−µ2)−θ, = 1 2(x1−y1)− 1 2θ, v1(z1) =E(v(Z1, Z2)|Z1 =z1) = 1 2(x1y1+E(X2Y2)−x1µ2−y1µ1)−a = 1 2(x1y1+µ1µ2+a−x1µ2−y1µ1)−a, = 1 2(x1−µ1)(y1−µ2)− 1 2a, and ˙ u1(z;θ) =− 1 2.

Let ˆθ be the MELE. Then it follows from Theorem 4.2.1 that

√ n(ˆθ−θ0)→N (0,V¯), where ¯ V =E((X−Y)−θ0)2− E[ (X−Y)−θ0 (X−µ1)(Y −µ2)−a ]2 E (X−µ1)(Y −µ2)−a 2 =σ2₁ +σ₂2−2a− E (X−µ1)2(Y −µ2) −E (X−µ1)(Y −µ2)2 2 E (X−µ1)2(Y −µ2)2 +a2 .

Example 2 _{Estimating the DF in the presence of known CV}. By (4.1.9),

we can construct an improved distribution function (DF) F of a random variable Z

in when side information is available. Let Z1, . . . , Zn be i.i.d. copies ofZ. Let

ψ(Z1, Z2;t) =

1

(38)

Then

θt=E(ψ(Z1, Z2;t)) =

1

2(P(Z1 ≤t) +P(Z2 ≤t)) =F(t).

Often we have some side information aboutF. Here we assume the side information

is given byσ/µ=c0, that is, the coefficient of variation ofZ equals to a constantc0.

This side information can be expressed as a U-statistic equation by taking T(Z1, Z2) =

1 2(Z

2

1 +Z22)−(1 +c20)Z1Z2.

Therefore, with the kernel functions equal to u(Z1, Z2;θt) = 1 2(1[Z1 ≤t] +1[Z2 ≤t])−θt, v(Z1, Z2) = 1 2(Z 2 1 +Z 2 2)−(1 +c 2 0)Z1Z2,

the jackknife pseudo values Vu

nj, Vnjv can be computed, and the estimates (ˆθt,ξˆ) can

be obtained as the solutions to the estimating equations (3.1.9). Alternatively, we can apply (4.2.14) and (4.2.15) to this example and obtain computationally faster

estimates as the solutions to the below equations: Find ˆξ as the solution to

n X j=1 Vv nj 1 +ξ> vVnjv = 0,

while ˆθt is the solution to

1 n n X j=1 V_nju(θt) 1 + ˆξ> v Vnjv = 0. As ˆ πj = 1 n 1 1 + ˆξ> vVnjv , j = 1, . . . , n. by (4.1.9), an efficient estimate of the DFF(z) is given by

ˆ Fn(z) = n X j=1 ˆ πj1[Zj ≤z].

Here we suppress the dependence of ˆπj on the fixed t.

Example 3 Estimating the convolution with side information. Let Z1,

(39)

σ2 _{= Var(}_Z

1). Let ψ(z1, z2) = 1[z1 +z2 ≤ t], z1, z2 ∈ R for known t ∈ R. Hence

θt=P(Z1+Z2 ≤t) is the convolution ofZ1 and Z2. Suppose that there is available

the side information that the coefficient of variation is known: σ/µ = c0 for some

known constantc0. The kernel functions can be constructed as

u(Z1, Z2;θt) = 1(Z1+Z2 ≤t)−θt, v(Z1, Z2) = 1 2(Z 2 1 +Z22)−(1 +c20)Z1Z2.

LetF be the DF of Z. It follows

u1(z1;θ) =E(u(Z1, Z2;θ)|Z1 =z1) = F(t−z1)−θt, v1(z1) =E(v(Z1, Z2)|Z1 =z1) = 1 2(z 2 1 + (1 +c 2 0)µ 2 )−(1 +c2₀)µz1; and ˙ u1(z1;θt) = −1.

By Theorem 4.2.1, the MELE ˆθt is asymptotically normally:

√ n(ˆθt−θt)→N (0,V¯t), where ¯ Vt=E(F(t−Z1)−θt)2 − E(F(t−Z1)−θt)(1₂(Z12+σ2+µ2)−(1 +c20)µZ1) 2 E(1₂(Z2 1 + (1 +c20)µ2)−(1 +c20)µZ1)2 .

Example 4 Estimating Gini’s mean difference with side information.

Let Z1, . . . , Zn be i.i.d. with r.v Z. Gini’s mean difference of Z1 and Z2 is

de-fined as E|Z1−Z2|, which is an alternative index of variability. It is estimated by

the U-statistic Un2 = n 2 −1 X 1≤i<j≤n |Zi−Zj|.

Suppose there is available side information that the inter-quartile range is known, P(q1 ≤Z ≤q3) = 0.5,

(40)

for some known constants q1, q3. Clearly the U-statistic as estimate does not utilize

this side information. Here we use the JEL method to incorporate the side informa-tion. To this end, let θ = E|Z1 −Z2|. The kernel is then given by u(Z1, Z2;θ) =

|Z1−Z2|−θ. The side information can be expressed by the U-statistic with the kernel

equal to

v(z1, z2) = (1[q1 ≤z1 ≤q3] +1[q1 ≤z2 ≤q3])/2−0.5.

It is easy to calculate

u1(z1;θ) =E|z1−Z2| −θ,

v1(z1) = 1[q1 ≤z1 ≤q3]/2−0.25,

and ˙u1(z1;θ) = −1. By Theorem 4.2.1, the MELE ˆθ has asymptotic normal

distribu-tion with the variance equal to

E(u2₁)−(E(u1v1))2/E(v21) =E(u 2

1)−16(E(u1v1))2.

Example 5 Estimating the overdispersion parameter with side infor-mation. Overdispersion is common in count data. This can be modeled by

overdis-persion parameter φ as

Var(Z) = φE(Z), φ >1. (5.1.2) LetZ1, . . . , Zn be count data (frequency data). Suppose there is available side

infor-mation that

P(Z = 0) =P0.

We are interested in estimating the overdispersion parameter φ. Clearly (5.1.2) can

be written as

E(Z2) =φE(Z) + (E(Z))2, which is equivalent to

E(Z₁2−φZ1−Z1Z2) = 0.

Thus the corresponding U-statistic based equation with the kernel equal to the sym-metrized function: u(Z1, Z2;φ) = 1 2(Z 2 1 +Z 2 2)− φ 2(Z1+Z2)−Z1Z2. (5.1.3)

(41)

The jackknife pseudo value of (5.1.3) is one constraint for estimating φ. Moreover, the kernel function of the side information can be constructed as

v(Z1, Z2) =

1

2(1[Z1 = 0] +1[Z2 = 0])−P0. (5.1.4)

Now we apply Theorem 4.2.1 to (5.1.3) and (5.1.4) to estimateφ. LetE(Z) = µ. We

have u1(z1;φ) =E(u(Z1, Z2;φ)|Z1 =z1) =E 1 2(z 2 1 +Z 2 2)− φ 2(z1+Z2)−z1Z2 =1 2z 2 1 − φ 2z1+ 1 2µ 2₋_z 1µ; v1(z1) =E(v(Z1, Z2)|Z1 =z1) =1 2(1[z1 = 0] +P(Z2 = 0))−P0 =1 2(1[z1 = 0]−P0), and ˙ u1(z;θ) = z 2. Hence, by Theorem 4.2.1, √ n( ˆφ−φ0)→N (0,V¯), where ¯ V = 4 µ2 E(u2₁)− E(u1v1) 2 E(v2 1) = 4 µ2 E(u2₁)−4 E(u1v1) 2 P0(1−P0) .

Example 6 The simplicial depth function. Let Z1, . . . , Zn be i.i.d. with a

distribution Q on _Rm_{. The} _{simplicial depth function} _D₍_z_{) of a point} _z _∈

Rm with

respect to distribution Q is defined as follows:

D(z) = P z∈∆(Z1, . . . , Zm+1)

, z ∈_Rm_,

where ∆(Z1, . . . , Zm+1) denotes the random simplex with vertices Z1, . . . , Zm+1, i.e.,

(42)

value of the population simplicial depth at pointz. The usual estimate of the depth function is the sample simplicial depth Dn(z) given by the U-statistic,

Dn(z) = n m+ 1 −1 X 1≤i1<···<im+1≤n 1[z ∈∆(Zi1, . . . , Zim+1)], z ∈R m_.

The depth function can be used to define the multivariate medians and possess ro-bustness property. When additional information is available about the underlying

distribution Q, our JEL approach is capable to employ it into the estimation. In

this example, we assume the marginal medians of Z are known as the side

informa-tion. Let Z = (Z(1), . . . , Z(m))>, and M = (a(1), . . . , a(m))> where a(l) = med(Z(l)), l = 1, . . . , m. Let Zj = (Z (1) j , . . . , Z (m) j ) >_, _j _{= 1}_{, . . . , n}_{. Fix} _z _∈ Rm and D =D(z).

The kernel functions can be constructed as

u(Z1, . . . , Zm+1;D) = 1[z ∈∆(Z1, . . . , Zm+1)]−D; and v(l)(Z1, . . . , Zm+1) = 1 m+ 1 m+1 X j=1 1[Z_j(l) ≤a(l)]− 1 2 , l = 1, . . . , m. Thus u1(z1;D) =P z ∈∆(z1, Z2, . . . , Zm+1) −D; v₁(l)(z1) = 1 m+ 1 1[z (l) 1 ≤a (l) ]− 1 2 + 1 m+ 1 m+1 X j=2 P(Z_j(l) ≤a(l))− 1 2 = 1 m+ 1 1[z (l) 1 ≤a (l)_]₋ 1 2 , l= 1, . . . , m, and ˙ u1(z;D) =−1. Letv1 = (v (1) 1 , . . . , v (m) 1 )

>_{. Hence, by Theorem 4.2.1, the MELE ˆ}_D_satisfies

√ n( ˆD−D)→N (0,V¯), where ¯ V =E(u2₁)−E(u1v>1) E(v1v>1) −1 E(v1u1).

(43)

5.2 Smoothed U-quantiles

The theory of U-quantile provides a unified treatment of several commonly used

statistics, see Arcones (1993). Let κ:_Rm _7→

Rbe a measurable argument-symmetric

function. Associated with κ there induces a distribution function

H(t) = P(κ(Z1, . . . , Zm) ≤ t), t ∈ R. The minimum variance unbiased estimate

(MVUE) of H(t) is the U-statistic of order m given by

Un(t) =Unm(t) =: n m −1 X 1≤i1<···<im≤n 1[κ(Zi1, . . . , Zim)≤t], t∈R, (5.2.1)

and κ shall be referred to as the kernel (of the U-quantile). AsH(t) is a distribution function, its p-th quantile tp is well defined by tp = inf{t : H(t)≥p} for p ∈ [0,1].

The U-quantiles include the Hodges-Lehmann median estimator, Gini’s mean differ-ence, Theil’s estimator of the slope in a simple linear model, and Kendall’s tau. They correspond to the U-quantiles with p0 = 1/2 and the kernels κ(z1, z2) = 2−1(z1+z2),

|z1−z2|, (y1−y2)/(x1−x2), and (x1−x2)(y1−y2) respectively.

As the U-quantiles are discontinuous, our theory does not apply here. We now

consider smoothed U-quantiles. Let F be a continuous DF. A continuous estimator

of H(t) is the smoothed U-quantile,

Hnm(t) = n m −1 X 1≤i1<···<im≤n Fb((t−κ(Zi1, . . . , Zim)), t∈R, (5.2.2)

where Fb(t) = F(t/b) with b a bandwidth. This is the smoothed version of the

U-statistic in (5.2.1). Given p ∈ [0,1], the p-th U-quantile solves H(tp) = p. The

smoothed sample p-th U-quantile ˆtp is a solution of Hnm(tp) = p.

Let us take the Theil-Sen estimator for an illustration. The Theil-Sen estimator

is a robust estimator of the slope in a simple linear model. Suppose that Zi =

(Xi, Yi)>, i= 1, . . . , n are independent and satisfy

(44)

whereαis the intercept andβ is the slope, andεi, i= 1, . . . , nare i.i.d. The Theil-Sen

estimator of the slope is the median of the slopes (Yi−Yj)/(Xi−Xj):

β = med{(Yi−Yj)/(Xi−Xj) : 1≤i < j ≤n},

whereX1, . . . , Xn are assumed to be distinct for the sake of convenience. In this case,

κ(z1, z2) = (y1−y2)/(x1−x2) and

H(t) = P((Y1−Y2)/(X1 −X2)≤t).

While the corresponding U-statistic is Un2(β) = n 2 −1 X 1≤i<j≤n 1[(Yi−Yj)/(Xi−Xj)−β <0],

the smoothed version is Hn2(β) = n 2 −1 X 1≤i<j≤n Fb β−(Yi−Yj)/(Xi−Xj) . (5.2.3)

Hence, to estimate β, one constraint can be chosen as

Hn2(β)−

1

2 = 0. (5.2.4)

In a simple linear model, we assume errors εi, i = 1, . . . , n are i.i.d. with the

normal with zero mean. Therefore, we have P(0≤εj) =

1

2. (5.2.5)

Here we relax the normality assumption to (5.2.5), the assumption of zero median of the error and use it to improve the efficiency of the Theil-Sen estimator. The U-statistic based equation of (5.2.5) is

n 2 −1 X 1≤i<j≤n 1[0< εi+εj 2 ]− 1 2 = 0. (5.2.6)

Thus, we obtain the smoothed version of (5.2.6) n 2 −1 X 1≤i<j≤n Fb( εi+εj 2 )− 1 2 = 0. (5.2.7)

(45)

This gives us another constraint.

Now let us apply Theorem 4.1.1 to get the asymptotic distribution of the im-proved Theil-Sen estimator β. From (5.2.4) and (5.2.7), the kernel is h(Z1, Z2;β) =

(h(1)(Z1, Z2;β), h(2)(Z1, Z2;β))>, where h(1)(Z1, Z2;β) =Fb β−(Y1−Y2)/(X1−X2) −1/2, h(2)(Z1, Z2;β) =Fb (Y1+Y2−β(X1+X2))/2 −1/2. Therefore, h(1)₁ (z1;β) =E[Fb(β−(y1−Y2)/(x1−X2))]−1/2, h(2)₁ (z1;β) =E[Fb((y1+Y2−β(x1+X2))/2)]−1/2.

Letf be the pdf of F and fb(t) =f(t_b). SinceF is a cdf, it follows

˙ h(1)₁ (z1;β) =E b−1fb(β−(y1−Y2)/(x1 −X2)) , ˙ h(2)₁ (z1;β) =E −(x1+X2)/(2b)fb((y1+Y2−β(x1+X2))/2) .

Using these we can obtain the asymptotic normal distribution of the MELE by The-orem 4.1.1.

(46)

6. AN EL APPROACH TO GOODNESS-OF-FIT TESTING

FOR MULTIVARIATE AND HIGH DIMENSIONAL

SYMMETRIES

In this chapter, we develop empirical likelihood tests to various multivariate and high dimensional symmetries based on the characterizations of the symmetries. We report some simulation results.

6.1 Testing multivariate symmetries 6.1.1 Spherical symmetry

A random vector X in _Rd _{is spherically symmetric about a point} _θ _∈

Rd if

X−θ= Γ(d X−θ),

for every orthogonal d×d matrix Γ, where= denotes both sides of the equality haved

an identical distribution. Spherical symmetry is equivalent to the assertion that the radius V =kX−θkis independent of the spatial unit vectorU = (X−θ)/kX−θk,

which is uniformly distributed on the unit sphere Sd−1 _in

Rd, i.e. U ∼ U(Sd−1).

Independence of V and U of course implies

E[aj(V)bk(U)] = 0, aj ∈L2,0(FV), bk ∈L2,0(FU), j, k= 1,2, . . . , (6.1.1) where L2(F) = h:R h2dF <∞ and L2,0(F) = h∈L2(F) : R h dF = 0 for a

distributionF, andFV andFU are the distribution functions ofV andU respectively.

There are numerous choices foraand b, for example, one can choose the sign function

a1(v) = sign(v), Huber’s function a2(v) = v1[kvk ≤1.4] + 1.4sign(v)1[kvk>1.4] and

(47)

AsU is uniformly distributed overU(Sd−1_{), we can use it as side information. To}

this end, we shall resort to the Jackknife empirical likelihood for vector U-statistics developed by Tan, et al. (2015). Let (Vi, Ui), i = 1, . . . , n be a random sample of

(V, U), and let R = U1 + U2 and R0 = R/kRk. It is well known that U being

uniformly distributed over the unit sphere U(Sd−1_{) is equivalent to the assertion}

that kRkand R0 are independent. Independence implies

E(cl(kRk)R0) = 0, cl∈L2,0(G), l= 1,2, . . . ,

where G denotes the distribution function of kRk. Note that G is known and

com-putable. We mention the formula below in the usual case.

Remark 6.1.1 Ford= 3, the distribution function G of kRk is given by G(r) = 1 16π2 Z kxk=1,kyk=1,kx+yk≤r dx1dx2dy1dy2 p (1−x2 1−x22)(1−y21−y22) , where x= (x1, x2, x3), y = (y1, y2, y3) and 0≤r≤2.

We can chooseclas foraj. AsGis known, a systematic way of choosingcl, l= 1,2, . . .

is to take them to be basis functions of L2,0(G), for example, cl = ϕl ◦G, where

ϕl(t) =

√

2 cos(lπt), t ∈ (0,1) is the usual orthonormal trigonometric basis. Denote cL= (c1, . . . , cL)> for some positive integer L. Let κ(U1, U2) =cL(kRk)⊗R0, where

⊗ denotes the Kronecker product. Then it is an argument-symmetric vector kernel

and satisfies E(κ(U1, U2)) = 0 by the preceding independence. The vector U-statistic

with κ as the kernel is now given by

Un(cL) = n 2 −1 X 1≤i<j≤n κ(Ui, Uj).

The Jackknife pseudo values of the vector U-statistics are calculated by Vnj =nUn(cL)−(n−1)U

(−j)

n−1(cL), j = 1, . . . , n,

whereU_n(−−j)1 is the vector U-statistic based on then−1 observations with the deletion

(48)

(6.1.1) as constraints to construct the Jackknife empirical likelihood ratio with side information as follows: Rssu n = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjvec aJ(Vj)⊗bK(Uj), Vnj(cL) = 0o,

whereaJ = (a(1), . . . , a(J))>,bK = (b(1), . . . , b(K))>, and vec(X, Y) denotes the column

vector consisting of stacking X, Y. Let Rss

n be the empirical likelihood ratio when

the Jackknife pseudo values are not included. It follows from Corollary 3.1 of Tan,et

al. (2015) that the following holds. Denote κ1(u) =E(κ(U1, U2)|U1 =u).

Theorem 6.1.1 Suppose the covariance Cov(vec(aJ(V), bK(U), κ1(U)) has full rank

J K +Ld. Then −2 logRssu

n has an asymptotic chisquare distribution with J K+Ld

degrees of freedom, that is,

−2 logR_nssu ⇒χ2_{(J K+Ld)}.

If E(a⊗_J2(V)E(b⊗_K2(U)) has full rank J K, then −2 logR_nss ⇒χ2_{J K}.

A systematic way of choosing a(j) _is _a(j) ₌ _ϕ

j ◦ FV, j = 1,2, . . ., which is an

orthonormal basis of L2,0(FV). But it is not computable as FV is unknown. One

can estimate it by the empirical distribution function ˆFV and obtain computable

ˆ

a(j) =ϕj◦FˆV. With estimated constraints, we now work with the empirical likelihood

ratio: ˆ Rss n = sup n_Yn i=1 nπj :π ∈Pn, n X j=1 πjˆaJ(Vj)⊗Uj = 0 o .

We will allow J, d to depend on the sample size n, J = Jn, d = dn, and grow to

infinity slowly. The following is the asymptotic result with the proof delayed to the last section.

Theorem 6.1.2 Suppose Jndn→ ∞ but Jn3d5n+Jn4d3n+Jn5d3n=o(n). Then

−2 log ˆ_√R_nss−Jndn

Jndn

(49)

This shows −2 log ˆRss

n is approximately chisquare distributed with Jndn degrees of

freedom. Thus for 0< α <1,

P −2 log ˆR_nss> χ2_J_n_d_n(1−α)→α, whereχ2

d(1−α) is the (1−α)-th percentile of the chisquare distribution withddegrees

of freedom. Accordingly the test1[−2 log ˆRss

n > χ2Jndn(1−α)] has an asymptotic size

α. If Jn =J for alln, then this is the case of the sphere with infinity dimension. If

dn =d for all n, then this is the case of a d-dimensional sphere.

6.1.2 Rotational symmetry

A random vector X ∈ Sd _{is rotationally symmetric about a fixed direction}_θ _if

X−θ =d _O(X−θ),

for every d×d rotation matrix_O about the direction θ in _Rd. Rotational symmetry

is equivalent to the assertion that the projection T = θ>X of X onto the direction

θ is independent of the unit tangent ξ at θ to Sd−1_{, which is uniformly distributed}

on Sd−2_{, i.e.} _ξ _∼ U₍_Sd−2₍_θ_{)), where} _Sd−2₍_θ_{) =}

x∈_Rd_:_k_x_k_{= 1}_{, x}>_θ _{= 0} _{. For}

more details, see page 179 of Madia and Jupp (2000). Notice that X satisfies the

tangent-normal equation,

X =T θ+√1−T2_ξ. _(6.1.3)

Rotationally symmetric distributions include von Mises-Fisher-type distributions with densities of the form f(θ>x), x∈ Sd_{, Waston-type distributions with densities of the}

form g κ(θ>x)2, x ∈ Sd_{, and Bingham-type distributions of densities of the form}

h(x>_Kx), x∈ Sd_{, where} _{f, g, h} _{are nonnegative functions and}

K is a positive definite

matrix.

In modeling directional and axial data using parametric distributions, one often wishes to test the null hypothesis that the underlying distribution is rotationally

(50)

test is the empirical likelihood ratio elaborated below. Let (Ti, ξi), i = 1, . . . , n be a

random sample of (T, ξ), where T = θ>₀X and ξ = (X −T θ0)/

√

1−T2_{. As in the}

spherical symmetry, the uniform distribution of ξ can be used as side information.

To this end, let R = ξ1 +ξ2 and R0 = R/kRk. Again the statement that ξ ∼

U(Sd−2_{) is equivalent to the assertion that}_k_R_k _and _R0 _{are independent. Let} _G _be

the distribution function ofkRk. Let d(m) =ϕm◦GanddM = (d(1), . . . , d(M))>. This

is computable as G is known. Let κ(ξ1, ξ2) = dM(kRk)⊗R0. Then it is

argument-symmetric and satisfies E(κ(ξ1, ξ2)) = 0 by the preceding independence. The vector

U-statistic with κ as kernel is now given by

Un(dM) = n 2 −1 X 1≤i<j≤n κ(ξi, ξj).

The Jackknife pseudo values of the vector U-statistics are calculated by Vnj(dM) =nUn(dM)−(n−1)U

(−j)

n−1(dM), j = 1, . . . , n,

Analogously we construct the Jackknife empirical likelihood ratio with side informa-tion as follows: Rrsu n = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjvec(aJ(Tj)⊗bK(ξj), Vnj(dM) = 0o,

for some choices of aJ = (a(1), . . . , a(J))> and bK = (b(1), . . . , b(K))>. Let Rnrs be

the empirical likelihood ratio when the jackknife pseudo values are not included.

It follows from Corollary 3.1 of Tan, et al. (2015) that the following holds. Let

ξ1(x) =E(κ(ξ1, ξ2)|ξ2 =x).

Theorem 6.1.3 Suppose the matrixCov(vec(aJ(T)), bK(ξ), κ1(ξ))has full rankJ K+

M(d−1). Then−2 logRrsu

n ⇒χ2(J K+M(d−1)).IfE(a

⊗2

J (T)E(b

⊗2

K (ξ))has full rankJ K,

then −2 logR_nrs⇒χ2_{J K}.

As the distribution FT of T is unknown, we estimate it by the empirical

(51)

ˆ

aJn = (a

(1)_{, . . . , a}(Jn)₎> _{= (}_ϕ

1, . . . , ϕJn)

> _◦_F_ˆ

T. With the estimated constraints, the

empirical likelihood ratio as test statistic is now given by ˆ Rrs n = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjˆa(Jn)(Tj)⊗ξj = 0 o .

We have the following.

Theorem 6.1.4 Suppose Jndn→ ∞ but Jn5d5n=o(n). Then

−2 log ˆRrs n −Jn(dn−1) p Jn(dn−1) ⇒N (0,1). (6.1.4) Thus −2 log ˆRrs

n is approximately chisquare distributed with Jn(dn−1) degrees of

freedom and the test 1[−2 log ˆR_nrs > χ2_J

n(dn−1)(1−α)] has an asymptotic size α ∈

(0,1).

6.1.3 Antipodally symmetric distributions

LetX have the continuous generalized Scheiddegger-Watson distribution, i.e., the

density is of the form g(kxvk), x ∈ Sd−1, where g is some known function, and xv is

the part ofx in ans-dimensional subspaceV. Then the tangent-norm equation of X

is given by

X =T η+ (1−T2)1/2ξ, T =kxvk, η∈ V, ξ∈ V⊥.

where kηk = 1, η ∈ V, kξk = 1, ξ ∈ V⊥_{. Here we take} _η ₌ _X

v/kXvk, ξ = (X −

T η)/(1−T2)1/2. A relationship similar to the rational symmetry is that T, η and ξ

are independent, and ηand ξ are uniformly distributed on unit spheres inV and V⊥_.

Analogous to the preceding discussions, one can construct an empirical likelihood ratio test and we shall omit the details.

6.1.4 Coordinatewise symmetry

A random vector X in_Rd _{has a distribution coordinatewise symmetric about} _θ _if

(X1−θ1, . . . , Xd−θd) d

(52)

Coordinatewise symmetry of X about θ is equivalent to the assertion that the coor-dinatewise radius vector V = (V1, . . . , Vd)> is independent of the coordinatewise sign

vectorU = (U1, . . . , Ud)>, whereVj =kXj−θjkandUj = sign(Xj−θj). Independence

implies

E[a(j)(V)b(k)(U)] = 0, a(j)∈L2,0(FV), b(k) ∈L2,0(FU), j, k= 1,2, . . . .

Analogous to the preceding discussions, we choose severala(j) _∈_L

2,0(FV), j = 1, . . . , J

and b(k) ∈L2,0(FU), k= 1, . . . , K and construct the empirical likelihood ratio:

Rcs n = sup n_Yn i=1 nπj :π ∈Pn, n X j=1 πjaJ(Vi)⊗bK(Ui) = 0 o , By Owen’s theorem, we have the following.

Theorem 6.1.5 Suppose the matrix E(aJa>J(V))⊗E(bKb>K(U)) has full rank J K.

Then

−2 logR_ncs ⇒χ2_{J K}.

6.1.5 Exchangeability

A random vector X in _Rd _{is exchangeable if}

(X1, . . . , Xd) d

= (Xπ1, . . . , Xπd)

for every permutation π1, . . . , πd of 1, . . . , d. Then O = (Xd,1, . . . , Xd,d)>, where

Xd,1 ≤. . .≤Xd,d are the order statistics of X, isindependent of R = (R1, . . . , Rd)>,

where Rj =Pd_i=11[Xi ≤Xj] are the rank statistics. Independence implies

E[a(j)(O)b(k)(R)] = 0, a(j) ∈L2,0(FO), b(k) ∈L2,0(FR), j, k = 1,2, . . . .

In the same fashion, we choose several a(j) _∈ _L

2,0(FO), j = 1, . . . , J and b(k) ∈

L2,0(FR), k = 1, . . . , K and construct the empirical likelihood ratio:

Res n = sup n_Yn j=1 nπj :π ∈Pn, n X j=1 πjaJ(Oj)⊗bK(Rj) = 0 o , where (Oj, Rj), j = 1, . . . , nis a random sample of (O, R).

(53)

Theorem 6.1.6 Suppose the matrix E(aJa>J(O))⊗E(bKb>K(R)) has full rank J K.

Then

−2 logR_nes⇒χ2_{J K}.

6.2 Simulation study

6.2.1 Testing the center of spherical symmetry

Consider the null hypothesis that the center of symmetry of a distribution is the origin of _Rd versus the alternative hypothesis that the center is θ1 different from the

origin,

H0 :θ =0 vs H1 :θ =θ1.

We were interested in the power performance of the proposed tests at the nominal level of significanceα =.05 in different cases. Specifically, we looked at the following cases.

Case 1. How the power of the tests decreases as the dimension d grows with respect

to different sample sizes n. Case 2. How the power of the tests increases as θ1 moves

away from the origin for a fixed sample size n with respect to different dimensions d

and different number of constraints J, K. Case 3. How the side information increases

the power of the tests. We examined the test1[−2 log ˆRss

n > χ2Jndn(1−α)] in Cases 1

and 2 and the test 1[−2 logRssu

n > χ2J K+Ld(1−α)] in Case 3. In the latter situation,

we choose a(v) = (sign(v), v1[kvk ≤ 1.4] + 1.4sign(v)1[kvk>1.4])> and bK(U) =U.

As E(U) = 0, these choices still satisfy the equalities in (6.1.1) by the independence ofV andU even though the componentsa(1)_{, a}(2) _of_a_{are not in}_L

2,0(FV). Examining

Cases 2 and 3, one observes that there is significant power increase and sample size reduction with the use of the side information

(54)

Table 6.1.

Case 1: The simulated α = .05 level of significance of the EL test

about the center of spherical symmetry H0 : θ = 0 with J = r,

d = dim, n = 400 and m = 2000 repetitions. Data generated from

multivariate normal. r=1 r=2 r=3 r=4 r=5 dim=1 0.0395 0.0520 0.0470 0.0585 0.0465 dim=2 0.0475 0.0610 0.0480 0.0460 0.0560 dim=3 0.0570 0.0605 0.0520 0.0615 0.0595 dim=4 0.0555 0.049 0.05450 0.06500 0.0705