A converse to Scheffe's theorem

(1)

A Converse to Scheffe's Theorem

Dennis D. Boos

The Annals of Statistics, Vol. 13, No. 1. (Mar., 1985), pp. 423-427.

Stable URL:

http://links.jstor.org/sici?sici=0090-5364%28198503%2913%3A1%3C423%3AACTST%3E2.0.CO%3B2-C

The Annals of Statistics is currently published by Institute of Mathematical Statistics.

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at

http://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/journals/ims.html.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academic journals and scholarly literature from around the world. The Archive is supported by libraries, scholarly societies, publishers, and foundations. It is an initiative of JSTOR, a not-for-profit organization with a mission to help the scholarly community take advantage of advances in technology. For more information regarding JSTOR, please contact [email protected].

(2)

A CONVERSE TO SCHEFFE'S THEOREM

North Carolina State University

Convergence of densities implies convergence of their distribution functions via Scheffk's theorem. This paper is concerned with the converse: what are sufficient conditions to obtain convergence of densities from convergence of distribution functions? A general lemma is given and local limit results are

obtained for translation and scale statistics.

1. Introduction. When does convergence in distribution lead to conver-

gence of the associated densities? Specifically, let T, = (aln(T1, - bl,),

. .

. ,

ak,(Tk, -bkn)) be a standardized random vector in R? which converges weakly to a distribution having density (Radon-Nikodym derivative) g with respect to (wrt) Lebesgue measure p on Rk. If T, has density g, wrt p, what are sufficient conditions for g, to converge to g, say pointwise a.e. p? Lemma 1below gives one set of such conditions and Theorems 1and 2 of Section 3 verify these conditions for certain translation and scale statistics. The statistical motivation for these new local limit theorems arose in a Bayesian context and is discussed briefly a t the end of Section 3.

2. The main lemma. Let G, and G denote the distribution functions (dfs)

of g, and g and let

"+"

stand for weak convergence. If G,

+

G, a sufficient condition for g,(x) +g(x) a.e. p is that each subsequence g,, have a further subsequence g,,, converging to some density g*. In that case, Scheffk's theorem (Serfling, 1980, page 17) yields G,.

+

G* which would contradict G,

+

G if g* # g on a set of positive p measure. However, the existence of convergent subsequences is not easy to verify in the absence of a metric space. (In the topology of pointwise convergence g, is compact if (g,) is closed and (g,(x)) is bounded for each x. Unfortunately, this topology is not metrizable here, e.g., Dugundji, 1966, page 273). Thus, it is convenient to use uniform convergence on compacts for which the Ascoli theory is available (e.g., Royden, 1968, page 179). This leads to Lemma 1and (2.3). If the equicontinuity is uniform, then we can extend the result to uniform convergence on Rk, i.e., wrt the norm sup,

I

g,(x)

-g(x)I

IIgn-gIIm-LEMMA1. Suppose that G, and G have continuous densities g, and g wrt p on Rk.If G,

+

G and

(2.1) sup,

I

g,(x)

I

5 M ( x )

<

m, each x E Rk,

Received March 1983; revised July 1984.

AMS 1980 subject classifications. Primary 62320; secondary 62G05.

Key words andphrases. Convergence of densities, local limit theorems, translation statistics, scale statistics.

(3)

424 DENNIS D. BOOS

and

(g,] is equicontinuous, i.e., for each x and c

>

0 there exists (2.2) 6(x, c) and n(x, c) such that

I

x - y

1

<

6(x, c) implies that

I

gn(x) -g n ( ~ )

I

<

&for all n 2n(x, c), then for any compact subset C of R k

If (g,] is uniformly equicontinuous, i.e., 6(x, c) = 6(c) and n(x, c) = n(c) in (2.2) do not depend on x, and g(x,) +0 whenever

I

x,

I

+w, then

PROOF.Conditions (2.1) and (2.2) are exactly what the classical Ascoli theorem requires for (g,] to be compact wrt the topology of uniform convergence on compacts. Thus, if g,, is a subsequence of g,, then there is a further subsequence g,- which converges uniformly on each compact subset of Rk to some g*. Scheffk's theorem then shows that g* =g a.e. p, and since they are both continuous g*(x) =g(x), each x E Rk. SO, for each compact set C

and (2.3) follows by the usual argument (Royden, 1968, page 37, problem 11). Now, suppose that (2.4) is false. Then, there must exist an c

>

0 and a subsequence n' of n and a sequence x,, such that

(2.6)

I

g,, (x,,) -g(x,.)

1

>

c for all n'.

If the x,, are bounded, then there exists some C which contain all the x,, and (2.6) contradicts (2.5). If the xnf are not bounded, then there is a subsequence x,. such that at least one coordinate of x,. is tending to fw. Since g(x,.) +0 as n" +w we have by (2.6) that g,-(x,-) r c/2 for all n " sufficiently large, say all n" r no. Let 6 = 6(c/4) be such that

I

x,. -y

I

<

6 implies

I

gnI,(xnf,)-g,,,(y)

I

<

c/4 for all n 2n(c/4). Then

for all n" r max(no, n(c/4)) since g,.(y) 2 c/4 for all y in the 6 neighborhood of xnf, (let

I

-

I

on Rk be the maximum of the coordinates). But Gnf,(xnf,

+

6)

-G,,,(x,,, -6 ) 5 2

11

G,. -G

11,

+

G(x,,,

+

6) -G(x,. -6), and

11

G,- -G

11,

+0 by Polya's theorem in Rk and G(x,.

+

6) - G(x,. - 6 ) = g(x,*.) ( 2 ~ 7 ) ~ , where x,*. E (x,. -6, xnfl

+

6). Since a t least one coordinate of x,. +fw, the same is true for x,*.. Thus, g(x,*.) +0 as n " +w and (2.7) is contradicted. O

(4)

distributed (iid) observations with common density f (x). Then, using the translation property we have

= 1 -

J

I(T(x)

>

y ) e ~ p ( z ? = ~ log f(xi

+

t ) ] dxl

. .

dx,.

Now, to get derivatives of G,(y) we need only justify the interchange of operations

Klaassen (1984) introduced this approach and showed that if f is absolutely continuous with

J

I

f '(x)

I

dx

<

03, then

For notational convenience, the first derivative off is often written f

',

and if G, is a df then g, is its first derivative.

We can get similar representations for the logarithm of positive scale statistics S ( X ) , i.e., those satisfying S(ax) = aS(x)

>

0 for all a

>

0 and x = (xl, . .

.

,

x,). Then,

(3.3)

= 1-

J

I(1og S(x)

>

y)exp(zL, log(etf (etxi))) dxl

. . .

dx,.

Iff is absolutely continuous with J

I

xf '(x)

I

dx

<

co, then

(3.4) h,(y) =

J

I(log S(x)

>

y)

n?.,

f(xi) dxl

. . . dx,.

Obviously, higher order derivatives may be taken under sufficient regularity conditions. A similar approach is also possible for two-sample shift statistics and ratios of positive scale statistics.

(5)

426 DENNIS D. BOOS

THEOREM1. Let X 1 ,

. . .

,

X , be independent with common density f ( x ) on R1. Let T , be a translation statistic such that G , ( y ) = P ( n l / ' [ T n-T ( F ) ]5y ) +

@ ( y / a ) , each y. I f f is absolutely continuous and I ( f )

<

w, then with g ( x ) = a-lc$(x/a) we have

THEOREM2. Let X 1 ,

.-

. ,

X , be independent with common density f ( x ) on R1. Let S , be a positive scale statistic such that H , ( y ) = P(nl/'[log S -log S ( F ) ]

5y ) +@ ( y / a ) ,each y. I f f is absolutely continuous and I l ( f )

<

w, then with h ( x )

= a - ' $ ( x / a ) we have

PROOFS. Since both proofs are virtually the same', only the first will be given. From I ( f )

<

w via Cauchy-Schwarz we get

J

I

f ' ( x )

1

dx

<

co and thus (3.2) with

g,(y) replaced by n-1/2gn(n-1/2y

+

T ( F ) )can be written as

g n ( ~ )= E[n-lI2 C Z i

- ( f ' l f )

( X i ) l I ( n l / ' ( T n- T ( F ) )

>

Y ) .

One application of the Cauchy-Schwarz inequality yields g,(y) 5 [ I ( f

)I1/'

and

another gives

I

gn(x) -g n ( ~ )

I

5 [ I ( f

)I1/'

I

G n ( x )-G n ( y )

1

5 I ( f ) x

I

-Y

1

Hence, (g,) is uniformly bounded and uniformly equicontinuous so that Lemma

1 applies. O

Relation to local limit theorems. Theorems 1 and 2 yield local limit theorems

for virtually all location and scale estimators in use as long as I ( f )

<

w or I l ( f )

<

w. On the other hand, when T , ( X ) =

X

the sample mean, then characteristic

function arguments yield better results. From Feller (1966, page 515, 516) we know that if X 1 has finite 2nd moment and characteristic function w such that

I

w

I

'

is integrable for some integer r, then

11

g, -g

11,

+0. Moreover, if

I

w

I

'

is not integrable for some r, then g, must be unbounded. Since I ( f )

<

leads to bounded g,, it must be a stronger condition. For example, if X 1 has the uniform density f ( x ) = I(-% I x I l/2), then I ( f ) = w but

I

w ( t )

1

'

= 4t-'sin2(t/2) is

integrable.

Extensions. Analogous results for two-sample shift statistics and ratios of

positive scale statistics should be clear. Boos (1983)also gives results for conver- gence of g; to g' and for the convergence of certain bivariate densities. However, the methods of this section are not ideally suited for joint densities, and there are no "density" versions of Slutsky's theorem or of the Crambr-Wold device to help extend results from 1 to k dimensions.

Statistical application. The motivation for considering convergence of densi-

(6)

the full likelihood of the sample. Convergence of the resulting posteriors then depends on convergence of &(x 10) and Theorems 1 and 2 are relevant. For example, $ might be the sample median or a trimmed mean. (We know that the density of the sample median converges pointwise (Serfling, 1981, page 85), but Theorem 1 adds the uniformity.) However, the approach is most useful in a semiparametric framework where /i(x 18) is unknown but can be estimated by the bootstrap. In that case analogues to Theorems 1and 2 can be proved for the bootstrap density estimator ;i(x

I

0) as long as I(f,) or Il(fn) stay bounded, where f, is the density generating the bootstrap samples.

Acknowledgement. This paper was written during a sabbatical visit to the

Department of Statistics, University of California, Berkeley. I would like to thank Peter Bickel, Harold Goldstein, Chris Klaassen, and Lucien LeCam for helpful discussions during this time.

REFERENCES

BOOS,D. D. (1983). On convergence of densities of translation and scale statistics. Institute of Statistics Mimeo Series #1625, North Carolina State University, Raleigh.

BOOS, D. D. and MONAHAN, J. F. (1983). A robust Bayesian approach using bootstrapped likelihoods. Preprint.

DUGUNDJI,J. (1966). Topology. Allyn and Bacon, Boston.

FELLER,W. (1966). Introduction to Probability Theory and Its Applications II. Wiley, New York. KLAASSEN,C. A. J. (1984). Location estimators and spread. Ann. Statist. 12 311-321.

ROYDEN,H. L. (1968). Real Analysis. Macmillan, New York.

SERFLING,R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

DEPARTMENTOF STATISTICS Box 8203