
The Annals of Statistics 2003, Vol. 31, No. 2, 536–559. © Institute of Mathematical Statistics, 2003.

ADAPTIVE BAYESIAN INFERENCE ON THE MEAN OF AN INFINITE-DIMENSIONAL NORMAL DISTRIBUTION

BY EDUARD BELITSER AND SUBHASHIS GHOSAL

Utrecht University and North Carolina State University

We consider the problem of estimating the mean of an infinite-dimensional normal distribution from the Bayesian perspective. Under the assumption that the unknown true mean satisfies a "smoothness condition," we first derive the convergence rate of the posterior distribution for a prior that is the infinite product of certain normal distributions and compare with the minimax rate of convergence for point estimators. Although the posterior distribution can achieve the optimal rate of convergence, the required prior depends on a "smoothness parameter" $q$. When this parameter $q$ is unknown, besides the estimation of the mean, we encounter the problem of selecting a model. In a Bayesian approach, this uncertainty in the model selection can be handled simply by further putting a prior on the index of the model. We show that if $q$ takes values only in a discrete set, the resulting hierarchical prior leads to the same convergence rate of the posterior as if we had a single model. A slightly weaker result is presented when $q$ is unrestricted. An adaptive point estimator based on the posterior distribution is also constructed.

1. Introduction. Suppose we observe an infinite-dimensional random vector $X = (X_1, X_2, \ldots)$, where the $X_i$'s are independent, $X_i$ has distribution $N(\theta_i, n^{-1})$, $i = 1, 2, \ldots$, and $\theta = (\theta_1, \theta_2, \ldots) \in \ell_2$, that is, $\sum_{i=1}^{\infty} \theta_i^2 < \infty$. The parameter $\theta$ is unknown and the goal is to make inference about $\theta$. Let $P_{\theta,n}$ stand for the distribution of $X$; here and throughout, we suppress the dependence of $X$ and $P_{\theta,n}$ on $n$ unless indicated otherwise. We shall write $\|\cdot\|$ for the $\ell_2$ norm throughout the paper.

We also note that $X$ may be thought of as the sample mean vector of the first $n$ observations of an i.i.d. sample $\mathbf{Y}_1, \mathbf{Y}_2, \ldots$, each taking values in $\mathbb{R}^{\infty}$ and distributed like $\mathbf{Y} = (Y_1, Y_2, \ldots)$, where the $Y_i$'s are independent and $Y_i$ has distribution $N(\theta_i, 1)$, $i = 1, 2, \ldots$. Since the sample mean is sufficient, these two formulations are statistically equivalent. To avoid complicated notation, we shall generally work with the first formulation. However, the latter formulation will help us apply the general theory of posterior convergence developed by Ghosal, Ghosh and van der Vaart (2000).

The interest in the infinite-dimensional normal model is partly due to its equivalence with the prototypical white noise model
$$dX^{\varepsilon}(t) = f(t)\,dt + \varepsilon\,dW(t), \qquad 0 \le t \le 1,$$

Received February 2000; revised March 2002.

AMS 2000 subject classifications. Primary 62G20; secondary 62C10, 62G05.

Key words and phrases. Adaptive Bayes procedure, convergence rate, minimax risk, posterior distribution, model selection.


where $X^{\varepsilon}(t)$ is the noise-corrupted signal, $f(\cdot) \in L_2[0,1]$ is the unknown signal, $W(t)$ is a standard Wiener process and $\varepsilon > 0$ is a small parameter. The statistical estimation problem is to recover the signal $f(t)$, based on the observation $X^{\varepsilon}(t)$. The above model arises as the limiting experiment in some curve estimation problems such as in density estimation [Nussbaum (1996); Klemelä and Nussbaum (1998)] and nonparametric regression [Brown and Low (1996)], where $\varepsilon = n^{-1/2}$ and $n$ is the sample size.

Suppose that the functions $\{\phi_i, i = 1, 2, \ldots\}$ form an orthonormal basis in $L_2[0,1]$. Under the assumption $f(\cdot) \in L_2[0,1]$, we can reduce this problem to the problem of estimating the mean $\theta = (\theta_1, \theta_2, \ldots) \in \ell_2$ of an infinite-dimensional normal distribution. Indeed,
$$X_i = \theta_i + \varepsilon \xi_i, \qquad i = 1, 2, \ldots,$$
where $X_i = \int_0^1 \phi_i(t)\,dX^{\varepsilon}(t)$, $\theta_i = \int_0^1 \phi_i(t) f(t)\,dt$ and $\xi_i = \int_0^1 \phi_i(t)\,dW(t)$, so that the $\xi_i$'s are independent standard normal random variables. In so doing, we arrive at the infinite-dimensional Gaussian shift experiment with $\varepsilon = n^{-1/2}$. The signal $f(t)$ can be recovered from the basis expansion $f(t) = \sum_{i=1}^{\infty} \theta_i \phi_i(t)$, and the map relating $f$ and $\theta$ is an isometric isomorphism.
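The reduction above is easy to check numerically. The following sketch (an illustration assumed here, not taken from the paper) computes the coefficients $\theta_i = \int_0^1 \phi_i(t) f(t)\,dt$ for the orthonormal trigonometric basis and reconstructs $f$ from its expansion; the particular test signal and grid size are hypothetical choices.

```python
import numpy as np

# Orthonormal trigonometric basis on [0, 1]: phi_1 = 1, then
# sqrt(2) cos(2*pi*k*t) and sqrt(2) sin(2*pi*k*t) in alternation.
def phi(i, t):
    if i == 1:
        return np.ones_like(t)
    k = i // 2
    return np.sqrt(2.0) * (np.cos(2*np.pi*k*t) if i % 2 == 0 else np.sin(2*np.pi*k*t))

def trapezoid(y, t):
    # Simple trapezoidal rule on a uniform grid.
    return float(np.sum((y[:-1] + y[1:]) * np.diff(t)) / 2.0)

t = np.linspace(0.0, 1.0, 20001)
# A test signal built from the first few basis functions (hypothetical choice).
f = 1.0 + 0.5 * phi(2, t) - 0.25 * phi(3, t)

theta = [trapezoid(phi(i, t) * f, t) for i in range(1, 8)]   # theta_i = <phi_i, f>
f_hat = sum(th * phi(i, t) for i, th in enumerate(theta, start=1))

print([round(th, 4) for th in theta[:3]])       # [1.0, 0.5, -0.25]
print(trapezoid((f_hat - f) ** 2, t) < 1e-12)   # True: f is recovered exactly
```

The recovered coefficients match the ones used to build $f$, and the $L_2$ reconstruction error is at machine precision, reflecting the isometry between $f$ and $\theta$.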

Coming back to the infinite-dimensional normal model, if a Bayesian analysis is intended, one assigns a prior to $\theta$ and looks at the posterior distribution. Diaconis and Freedman (1997) and Freedman (1999) considered the independent normal prior $N(0, \tau_i^2)$ for $\theta_i$, $i = 1, 2, \ldots$, and comprehensively studied the nonlinear functional $\|\theta - \hat{\theta}\|^2$, where $\hat{\theta}$ is the Bayes estimator, both from the Bayesian and the frequentist perspectives. As a consequence of their main results, they established frequentist consistency of the Bayes estimator for all $\theta \in \ell_2$ if $\sum_{i=1}^{\infty} \tau_i^2 < \infty$. A peculiar implication of their result is that the Bayesian and frequentist asymptotic distributions of $\|\theta - \hat{\theta}\|^2$ differ, and hence the Bernstein–von Mises theorem fails to hold for this simple infinite-dimensional problem. A similar conclusion was reached earlier by Cox (1993) in a slightly different model.

This estimation problem was first studied in a minimax setting by Pinsker (1980). He showed that if the unknown infinite-dimensional parameter $\theta$ is assumed to belong to an ellipsoid $\Theta_q(Q) = \{\theta : \sum_{i=1}^{\infty} i^{2q} \theta_i^2 \le Q\}$, then the exact asymptotics of the minimax quadratic risk over the ellipsoid $\Theta_q(Q)$ are given by
$$\lim_{n\to\infty} \inf_{\hat{\theta}} \sup_{\theta \in \Theta_q(Q)} n^{2q/(2q+1)} E_{\theta} \|\hat{\theta} - \theta\|^2 = Q^{1/(2q+1)} \gamma(q), \tag{1.1}$$
where $\gamma(q) = (2q+1)^{1/(2q+1)} (q/(q+1))^{2q/(2q+1)}$ is the Pinsker constant [Pinsker (1980)] and $E_{\theta}$ denotes the expectation with respect to the probability measure generated by $X$ given $\theta$.
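For concreteness, the constant in (1.1) is easy to evaluate numerically; the short sketch below (illustrative only) tabulates $\gamma(q)$ for a few smoothness values.

```python
# Pinsker constant gamma(q) = (2q+1)^{1/(2q+1)} (q/(q+1))^{2q/(2q+1)} from (1.1).
def pinsker_gamma(q: float) -> float:
    return (2*q + 1) ** (1.0/(2*q + 1)) * (q/(q + 1.0)) ** (2*q/(2*q + 1))

for q in (0.5, 1.0, 2.0, 5.0):
    print(q, round(pinsker_gamma(q), 4))
```

For example, $\gamma(1) = (3/4)^{1/3}$ and $\gamma(1/2) = \sqrt{2/3}$, which the function reproduces.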


and showed that the posterior mean attains the minimax rate $n^{-q/(2q+1)}$. We shall write $\Pi_q$ to denote this prior. We show in Theorem 2.1 that the posterior distribution $\Pi_q(\cdot \mid X)$ obtained from the prior $\Pi_q$ also converges at the rate $n^{-q/(2q+1)}$ in the sense that, for any sequence $M_n \to \infty$,
$$\Pi_q\big(\theta : n^{q/(2q+1)} \|\theta - \theta_0\| > M_n \mid X\big) \to 0$$
in $P_{\theta_0}$-probability as $n \to \infty$, where the true value of the parameter $\theta_0$ belongs to the linear subspace $\Theta_q = \{\theta : \sum_{i=1}^{\infty} i^{2q} \theta_i^2 < \infty\}$. It has long been known that for finite-dimensional models the posterior distribution converges at the classical $n^{-1/2}$ rate, but for infinite-dimensional models results have only been obtained recently by Ghosal, Ghosh and van der Vaart (2000) and Shen and Wasserman (2001).

Our main goal in the present paper is to construct, without knowing the value of the smoothness parameter $q$, a prior such that the posterior distribution attains the optimal rate of convergence. In this case, instead of one single model $\theta \in \Theta_q$, we have a nested collection of models $\Theta_q$, $q \in \mathcal{Q}$. The inference based on such a prior is therefore adaptive in the sense that the same prior gives the optimal rate of convergence irrespective of the smoothness condition. Here, besides the problem of estimation, we encounter the problem of model selection. In a Bayesian framework, perhaps the most natural candidate for a prior that gives the optimal rate over various competing models is a mixture of the appropriate priors in different models, indexed by the smoothness parameter $q$. So we consider a mixture of the $\Pi_q$'s over $q$, that is, a prior of the form $\sum_q \lambda_q \Pi_q$, where $\lambda_q > 0$, $\sum_q \lambda_q = 1$ and the sum is taken over some countable set. We show that the resulting hierarchical or mixture prior leads to the optimal convergence rate of the posterior simultaneously for all $q$ under consideration. Similar results were also found by Ghosal, Lember and van der Vaart (2002) in the context of density estimation and Huang (2000) for density estimation and regression.

The problem of adaptive estimation in a minimax sense was first studied by Efromovich and Pinsker (1984). They proposed an estimator $\hat{\theta}$ of $\theta$ which does not assume knowledge of the smoothness parameter $q$, yet attains the minimum in (1.1).


result. Uniformity of the convergence of the posterior distribution with respect to the parameter lying in an infinite-dimensional ellipsoid is discussed in Section 8.

From now on, all symbols $O$ and $o$ refer to the asymptotics $n \to \infty$ unless otherwise specified. By $\lfloor x \rfloor$, we shall mean the greatest integer less than or equal to $x$.

2. Known smoothness. Recall that we have independent observations $X_i$ distributed as $N(\theta_i, n^{-1})$, $i = 1, 2, \ldots$. From now on, we assume that the unknown parameter $\theta = (\theta_1, \theta_2, \ldots)$ belongs to the set $\Theta_q = \{\theta : \sum_{i=1}^{\infty} i^{2q} \theta_i^2 < \infty\}$, a Sobolev-type subspace of $\ell_2$. Note that $q$ measures the "smoothness" of $\theta$, since, in the equivalent white noise model with the standard trigonometric Fourier basis, the set $\Theta_q$ essentially corresponds to the class of all periodic functions $f(\cdot) \in L_2[0,1]$ whose $L_2[0,1]$ norm of the $q$th derivative is bounded, when $q$ is an integer (otherwise the $q$th Weyl derivative is meant). So, in some sense $q$ stands for the "number of derivatives" of the unknown signal in the equivalent white noise model. For this reason, $q$ will often be referred to as the smoothness parameter of $\theta$.

Let $\theta_0 = (\theta_{10}, \theta_{20}, \ldots)$ denote the true value of the parameter, so that we have $\theta_0 \in \Theta_q$. Let the prior $\Pi_p$ be defined as

the $\theta_i$'s are independent $N(0, i^{-(2p+1)})$.

Then the posterior distribution of $\theta$ given $X$ is described by

the $\theta_i$'s are independent $N\left(\dfrac{nX_i}{n + i^{2p+1}},\ \dfrac{1}{n + i^{2p+1}}\right)$.

The posterior mean is given by $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots)$, where
$$\hat{\theta}_i = \frac{nX_i}{n + i^{2p+1}}, \qquad i = 1, 2, \ldots.$$
Note that the posterior distribution of $\theta_i$ depends on $X$ only through $X_i$. In Theorem 5.1 of Zhao (2000) [and implicitly in Section 3 of Cox (1993) and Theorem 5 of Freedman (1999)], it is shown that $\hat{\theta}$ converges to $\theta_0$ at the rate $n^{-\min(p,q)/(2p+1)}$. By a slight extension of Zhao's (2000) argument, we observe that the posterior distribution also converges at this rate.

THEOREM 2.1. For any sequence $M_n \to \infty$,
$$\Pi_p\big(\theta : n^{\min(p,q)/(2p+1)} \|\theta - \theta_0\| > M_n \mid X\big) \to 0 \tag{2.1}$$
in $P_{\theta_0}$-probability as $n \to \infty$.


PROOF. Applying Chebyshev's inequality, the posterior probability in (2.1) is at most
$$M_n^{-2} n^{2\min(p,q)/(2p+1)} \sum_{i=1}^{\infty} \big[(\hat{\theta}_i - \theta_{i0})^2 + \operatorname{var}(\theta_i \mid X_i)\big].$$
It suffices to show that the $P_{\theta_0}$-expectation of the above expression tends to zero. Now it is easy to see that
$$\sum_{i=1}^{\infty} \operatorname{var}(\theta_i \mid X_i) = \sum_{i=1}^{\infty} (n + i^{2p+1})^{-1} = O\big(n^{-2p/(2p+1)}\big).$$
It follows from Theorem 5.1 of Zhao (2000) that
$$\sum_{i=1}^{\infty} E_{\theta_0} (\hat{\theta}_i - \theta_{i0})^2 = O\big(n^{-2\min(p,q)/(2p+1)}\big),$$
and hence the result follows.

REMARK 2.1. Shen and Wasserman (2001) also calculated the rate of convergence of the posterior distribution, using a different method, for the special case when $\theta_{i0} = i^{-p}$, and also showed that the obtained rate is sharp.

REMARK 2.2. Interestingly, as Zhao (2000) pointed out, although the prior $\Pi_p$ with $p = q$ is the best choice (among independent normal priors with power–variance structure) as far as the convergence rate is concerned, $\Pi_q$ has a peculiar property. Both the prior and the posterior assign zero probability to $\Theta_q$. This is an easy consequence of the criterion for summability of a random series. It is also interesting to note that summability of $\sum_{i=1}^{\infty} i^{2q} \theta_i^2$ is barely missed by the prior, since $\sum_{i=1}^{\infty} i^{2p} \theta_i^2 < \infty$ almost surely for any $p < q$, and so $q = \sup\{p : \sum_{i=1}^{\infty} i^{2p} \theta_i^2 < \infty \text{ almost surely}\}$. To fix this problem, Zhao (2000) proposed a compound prior $\sum_{k=1}^{\infty} w_k \pi_k$, where under $\pi_k$, the $\theta_i$'s are independent with distribution $N(0, i^{-(2q+1)})$ for $i = 1, \ldots, k$ and degenerate at 0 otherwise, and the $w_k$'s are weights bounded from below by a sequence exponentially decaying in $k$. She showed that, on the one hand, the posterior mean attains the minimax rate of convergence and, on the other hand, the prior assigns probability 1 to $\Theta_q$.
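The zero-probability property is visible numerically: under the prior, $E[i^{2q}\theta_i^2] = 1/i$, so the partial sums of $\sum_i i^{2q}\theta_i^2$ grow like $\log I$ and the series diverges almost surely. The following sketch (an assumed illustration) tracks the partial sums along one prior draw.

```python
import numpy as np

# Draw theta_i ~ N(0, i^{-(2q+1)}) and accumulate partial sums of i^{2q} theta_i^2.
# Each term has mean 1/i, so the cumulative sums behave like the harmonic series.
rng = np.random.default_rng(1)
q = 1.0
I = 10**6
i = np.arange(1, I + 1)
theta = rng.normal(0.0, 1.0, size=I) * i ** (-(2*q + 1)/2)
partial = np.cumsum(i ** (2*q) * theta ** 2)
for I_k in (10**2, 10**4, 10**6):
    print(I_k, round(float(partial[I_k - 1]), 2))   # grows roughly like log(I_k)
```

The printed values keep growing logarithmically rather than settling down, illustrating why the prior puts no mass on $\Theta_q$ itself.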

3. Discrete spectrum adaptation. So far our approach relies heavily on the fact that we know the smoothness, that is, we have chosen a single model as the “correct” one from the collection of possible models corresponding to different choices of q >0. In general, when the parameter specifying the model is not chosen correctly, this may lead to suboptimal or inconsistent procedures, since further analysis does not take into account the possibility of other models.


the assertion holds for no larger value of $q$. If actually $\theta_0 \in \Theta_{q'}$ for some $q' > q$, then we lose in terms of the rate of convergence, since we could have obtained the better rate $n^{-q'/(2q'+1)}$ by using the prior $\Pi_{q'}$. On the other hand, if $\theta_0 \in \Theta_{q'}$, $q' < q$ only, then the model is misspecified. In general, one expects inconsistency in this type of situation. In this case, however, since all the models are dense in $\ell_2$, the posterior is nevertheless consistent. Still, there is a loss in the rate of convergence again, as we only get the rate $n^{-q'/(2q+1)}$ from Theorem 2.1 instead of the rate $n^{-q'/(2q'+1)}$ achievable by the prior $\Pi_{q'}$.

Therefore, we intend to present a prior that achieves the posterior rate of convergence $n^{-q/(2q+1)}$ at $\theta_0$ whenever $\theta_0 \in \Theta_q$, for different values of the smoothness parameter $q$. Let $\mathcal{Q} = \{\ldots, q_{-1}, q_0, q_1, \ldots\}$ be a countable subset of the positive real semiaxis without any accumulation point other than 0 or $\infty$. Such a set may be arranged in an increasing sequence that preserves the natural order, that is, $0 < \cdots < q_{-1} < q_0 < q_1 < \cdots$. We show that there is a prior for $\theta$ such that whenever $\theta_0 \in \Theta_q$, $q \in \mathcal{Q}$, the posterior converges at the rate $n^{-q/(2q+1)}$. We may think of $q$ as another unknown parameter. So, instead of one single model, we have a sequence of nested models parameterized by a "smoothness" parameter $q$ ranging over the discrete set $\mathcal{Q}$. In a Bayesian approach, perhaps the most natural way to handle this uncertainty in the model is to put a prior on the index of the model or the "hyperparameter" $q$. The resulting prior $\Pi$ is therefore the two-level hierarchical prior

given $q$, the $\theta_i$'s are independent $N(0, i^{-(2q+1)})$;

$q$ has distribution $\lambda$.

The main purpose of this article is to show that this simple and natural procedure is rate adaptive, that is, the mixture prior achieves the convergence rate $n^{-q/(2q+1)}$ whenever $\theta_0 \in \Theta_q$ and $q \in \mathcal{Q}$.

Let $\lambda_m = \lambda(q = q_m)$. Henceforth, we shall write $\Pi_m$ for $\Pi_{q_m}$. We thus have $\Pi = \sum_{m=-\infty}^{\infty} \lambda_m \Pi_m$. Throughout, we shall assume that $\lambda_m > 0$ for all $m$.

Without loss of generality, we may assume that $\theta_0 \in \Theta_{q_0}$. Otherwise, we simply relabel.

THEOREM 3.1. For any sequence $M_n \to \infty$, the posterior probability
$$\Pi\big(\theta : n^{q_0/(2q_0+1)} \|\theta - \theta_0\| > M_n \mid X\big) \to 0 \tag{3.1}$$
in $P_{\theta_0}$-probability as $n \to \infty$.


prior to the form $\sum_{m \ge 0} \lambda_m \Pi_m$. For such a prior we then show that the posterior probability of $\{\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B, q > q_0\}$ is small for a sufficiently large $B$, while Theorem 2.1 implies that $\Pi\{\theta : n^{q_0/(2q_0+1)} \|\theta - \theta_0\| > M_n, q = q_0 \mid X\}$ converges to 0 in $P_{\theta_0}$-probability. Therefore, the effective parameter space is further reduced to $\{\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 \le B\}$, for which the general theory developed by Ghosal, Ghosh and van der Vaart (2000) applies.

REMARK 3.1. It should be noted that the cases $q < q_0$ and $q > q_0$ are not symmetrically treated. Unlike $q < q_0$, we do not show that $\Pi(q > q_0 \mid X)$ tends to 0. An obvious reason is that it may still be possible that $\theta_0 \in \Theta_{q_m}$ for $m > 0$. For instance, if $\theta_{i0}^2$ decreases exponentially with $i$, then $\theta_0 \in \Theta_q$ for all $q$. In such cases, clearly it is not to be expected that $\Pi(q > q_0 \mid X)$ tends to 0. Even if $\theta_0 \notin \Theta_{q_1}$, $q_1 > q_0$, it is still possible that $\Pi(q > q_0 \mid X)$ does not tend to 0. Indeed, considering two values $\{q_0, q_1\}$ and the special case $\theta_{i0} = i^{-p}$, a referee exhibited that if $p \ge q_1 + 1 - (2q_1+1)/(2(2q_0+1))$, then $\Pi(q = q_1 \mid X) \to 1$ in probability. In general, the $\Pi_{q_1}$ prior leads to a slower posterior convergence rate than the required $n^{-q_0/(2q_0+1)}$ for a general $\theta_0 \in \Theta_{q_0}$, so it seems at first glance that our mixture prior cannot simultaneously achieve the optimal rate. The apparent paradox is resolved when one observes, as the referee does, that the latter happens only when $p < q_1 + 1 - (2q_1+1)/(2(2q_0+1))$.

The following lemma shows that, given the observations, the posterior probability of misspecifying the model in favor of a coarser model (than the true one) converges to zero.

LEMMA 3.1. There exist an integer $N$ and a constant $c > 0$ such that for any $m < 0$ and $n > N$,
$$E_{\theta_0} \Pi(q = q_m \mid X) \le \frac{\lambda_m}{\lambda_0} \exp\big[-c\, n^{1/(q_0+q_m+1)}\big] \tag{3.2}$$
and, therefore,
$$\Pi(q < q_0 \mid X) \to 0 \tag{3.3}$$
in $P_{\theta_0}$-probability as $n \to \infty$.

The proof of Lemma 3.1 is somewhat lengthy and involves some messy calculations. We defer it to Section 5.


According to Lemma 3.1, the posterior mass is asymptotically concentrated on $q \ge q_0$. We separate two possibilities: $q = q_0$ and $q > q_0$. For $m > 0$, introduce $\tilde{\lambda}_m = \lambda_m / \sum_{j=1}^{\infty} \lambda_j$ and $\tilde{\Pi} = \sum_{m=1}^{\infty} \tilde{\lambda}_m \Pi_m$. The next lemma shows that the posterior $\tilde{\Pi}(\cdot \mid X)$ is effectively concentrated on the set $\{\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 \le B\}$ for a large $B$. The proof of this lemma is postponed to Section 6.

LEMMA 3.2.
$$\lim_{B \to \infty} \sup_{n \ge 1} E_{\theta_0} \tilde{\Pi}\left(\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B \,\Big|\, X\right) = 0. \tag{3.4}$$

Finally, the following lemma shows that the posterior converges at rate $n^{-q_0/(2q_0+1)}$. In the proof, which is given in Section 6, we exploit the general theory of posterior convergence developed by Ghosal, Ghosh and van der Vaart (2000). For $m \ge 0$, introduce $\bar{\lambda}_m = \lambda_m / \sum_{j=0}^{\infty} \lambda_j$ and $\bar{\Pi} = \sum_{m=0}^{\infty} \bar{\lambda}_m \Pi_m$. Note that, unlike in $\tilde{\Pi}$, the possibility $q = q_0$ is not ruled out here.

LEMMA 3.3. For any $B > 0$ and $M_n \to \infty$,
$$\lim_{n \to \infty} E_{\theta_0} \bar{\Pi}\left(\theta : n^{q_0/(2q_0+1)} \|\theta - \theta_0\| > M_n,\ \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 \le B \,\Big|\, X\right) = 0. \tag{3.5}$$

With the help of Lemmas 3.1–3.3 and Theorem 2.1, the main result can now be proved easily.

PROOF OF THEOREM 3.1. For the sake of brevity, denote $r_n = r_n(q_0) = n^{q_0/(2q_0+1)}$. Clearly,
$$
\begin{aligned}
\Pi\{\theta : r_n \|\theta - \theta_0\| > M_n \mid X\}
&= \Pi\Big(\theta : r_n \|\theta - \theta_0\| > M_n,\ q > q_0,\ \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B \,\Big|\, X\Big) \\
&\quad + \Pi\Big(\theta : r_n \|\theta - \theta_0\| > M_n,\ q > q_0,\ \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 \le B \,\Big|\, X\Big) \\
&\quad + \Pi\{\theta : r_n \|\theta - \theta_0\| > M_n,\ q < q_0 \mid X\} \\
&\quad + \Pi\{\theta : r_n \|\theta - \theta_0\| > M_n,\ q = q_0 \mid X\} \\
&\le \tilde{\Pi}\Big(\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B \,\Big|\, X\Big)\, \Pi(q > q_0 \mid X) \\
&\quad + \Pi\Big(\theta : r_n \|\theta - \theta_0\| > M_n,\ q \ge q_0,\ \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 \le B \,\Big|\, X\Big) \\
&\quad + \Pi\{\theta : r_n \|\theta - \theta_0\| > M_n,\ q < q_0 \mid X\} \\
&\quad + \Pi_{q_0}\{\theta : r_n \|\theta - \theta_0\| > M_n \mid X\}\, \Pi(q = q_0 \mid X) \\
&\le \tilde{\Pi}\Big(\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B \,\Big|\, X\Big)
 + \bar{\Pi}\Big(\theta : r_n \|\theta - \theta_0\| > M_n,\ \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 \le B \,\Big|\, X\Big) \\
&\quad + \Pi(q < q_0 \mid X) + \Pi_{q_0}\{\theta : r_n \|\theta - \theta_0\| > M_n \mid X\}.
\end{aligned}
$$
Given $\varepsilon > 0$, $B$ can be chosen sufficiently large to make the first term less than $\varepsilon$ by Lemma 3.2. For this $B$, apply Lemma 3.3 to the second term. The third term goes to zero in $P_{\theta_0}$-probability by Lemma 3.1, while the last term converges to zero by Theorem 2.1. The theorem is proved.

REMARK 3.3. In place of $\Pi_m$, Zhao's (2000) mixture prior described in Remark 2.2 may also possibly be used. However, we do not pursue this approach for the following reasons. First, the expressions will be even more complicated. Second, we think that when $\ell_2$ is considered as the parameter space, the property that $\Pi_m$ assigns zero probability to $\Theta_q$ is not as bad as it might first appear, since $\Theta_q$ is not closed and its closure, the whole of $\ell_2$, obviously receives the whole mass. Indeed, both $\Pi_m$ and Zhao's mixture prior have as support the whole of $\ell_2$. Finally, the criterion that "$\Theta_q$ should receive the whole mass" loses much of its original motivation in the present context of adaptation, when $\ell_2$ is the grand parameter space.

4. Adaptive estimator. As a consequence of the result on adaptivity of the posterior distribution, we now show that there is an estimator $\hat{\theta}$ based on the posterior distribution that is rate adaptive in the frequentist sense. The problem of adaptive estimation in a minimax sense was first studied by Efromovich and Pinsker (1984). They proposed an estimator which attains the minimum in (1.1) without knowledge of the smoothness parameter $q$. Their method is based on adaptively determining optimal damping coefficients in an orthogonal series estimator.

To construct an estimator based on the posterior distribution, one may maximize the posterior probability of a ball of appropriate radius as in Theorem 2.5 of Ghosal, Ghosh and van der Vaart (2000). However, their construction requires knowledge of the convergence rate and hence does not apply to the adaptive setup of the problem. The following modification of their construction is due to van der Vaart (2000). Let
$$\delta_n^* = \inf\big\{\delta_n : \Pi(\theta : \|\theta - \theta'\| \le \delta_n \mid X) \ge 3/4 \text{ for some } \theta' \in \ell_2\big\}. \tag{4.1}$$


Take any point $\hat{\theta} \in \ell_2$ which satisfies
$$\Pi\big(\theta : \|\theta - \hat{\theta}\| \le \delta_n^* + n^{-1} \mid X\big) \ge 3/4. \tag{4.2}$$
In words, $\hat{\theta}$ is the center of a ball of nearly the smallest radius subject to the constraint that its posterior mass is not less than 3/4. Note that while defining $\hat{\theta}$, we have not used the value of the smoothness parameter. The next theorem shows that $\hat{\theta}$ has the optimal frequentist rate of convergence $n^{-q_0/(2q_0+1)}$ and hence is an adaptive estimator.
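The ball-center construction in (4.1)–(4.2) can be approximated by Monte Carlo. The sketch below rests on several assumptions not in the paper: a truncated model, a single conjugate prior $\Pi_p$ with $p = 1$ rather than the full mixture, and the infimum over all centers replaced by a search over posterior draws.

```python
import numpy as np

rng = np.random.default_rng(3)
n, I, p = 500, 200, 1.0
i = np.arange(1, I + 1)
theta0 = i ** (-2.0)                               # hypothetical truth
X = theta0 + rng.normal(0.0, n ** -0.5, size=I)

post_mean = n * X / (n + i ** (2*p + 1))
post_sd = (n + i ** (2*p + 1)) ** -0.5
draws = post_mean + rng.normal(size=(2000, I)) * post_sd   # posterior sample

# For each draw taken as a candidate center, compute the radius of the
# smallest ball holding 3/4 of the posterior sample; the estimator is the
# center minimizing that radius (a discrete stand-in for (4.1)-(4.2)).
sq = np.sum(draws ** 2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * draws @ draws.T, 0.0)
radius = np.quantile(np.sqrt(d2), 0.75, axis=1)
theta_hat = draws[np.argmin(radius)]

print(round(float(np.linalg.norm(theta_hat - theta0)), 3),
      round(float(np.linalg.norm(post_mean - theta0)), 3))
```

The selected center lands close to the posterior mean, as expected: the smallest 3/4-mass ball is centered near the bulk of the posterior, and no smoothness value was used in its definition.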

THEOREM 4.1 [van der Vaart (2000)]. For any $M_n \to \infty$ and $\theta_0 \in \Theta_{q_0}$,
$$P_{\theta_0}\big(n^{q_0/(2q_0+1)} \|\hat{\theta} - \theta_0\| > M_n\big) \to 0,$$
where $\hat{\theta}$ is defined by (4.1) and (4.2).

PROOF. By Theorem 3.1, we have that for any $M_n \to \infty$,
$$\Pi\big(\theta : \|\theta - \theta_0\| \le M_n n^{-q_0/(2q_0+1)} \mid X\big) \ge 3/4 \tag{4.3}$$
with $P_{\theta_0}$-probability tending to 1.

Let $B(\tilde{\theta}, r) = \{\theta : \|\theta - \tilde{\theta}\| \le r\}$ denote the ball around $\tilde{\theta}$ of size $r$. The balls $B(\hat{\theta}, \delta_n^* + 1/n)$ and $B(\theta_0, M_n n^{-q_0/(2q_0+1)})$ both have posterior probability at least 3/4, so they must intersect; otherwise, the total posterior mass would exceed 1. Therefore, by the triangle inequality,
$$\|\hat{\theta} - \theta_0\| \le \delta_n^* + n^{-1} + M_n n^{-q_0/(2q_0+1)} \le 2 M_n n^{-q_0/(2q_0+1)} + n^{-1},$$
since, by the definition of $\delta_n^*$ and (4.3), $\delta_n^* \le M_n n^{-q_0/(2q_0+1)}$. The proof follows.

5. Proof of Lemma 3.1. We begin with a couple of preliminary lemmas. We recall that given $\theta$, the $X_i$'s are independent and distributed as $N(\theta_i, n^{-1})$, and given $q = q_m$, the $\theta_i$'s are independent and distributed as $N(0, i^{-(2q_m+1)})$. Therefore, given $q = q_m$, the marginal distribution of $X$ is given by the countable product of $N(0, n^{-1} + i^{-(2q_m+1)})$, $i = 1, 2, \ldots$. Let us denote this measure by $P_m$. Let $P_\lambda = \sum_{m=-\infty}^{\infty} \lambda_m P_m$, the marginal distribution of $X$ when $q$ is distributed according to $\lambda$. The true distribution of $X$ is, however, $P_{\theta_0}$, the countable product of $N(\theta_{i0}, n^{-1})$, $i = 1, 2, \ldots$.

LEMMA 5.1. For any $n \ge 1$ and $\theta_0 \in \ell_2$, $P_\lambda$ and $P_{\theta_0}$ are mutually absolutely continuous.


PROOF. Recall that the affinity $A(f, g)$ between two densities $f(x)$ and $g(x)$ on the real line is defined by $A(f, g) = \int \sqrt{f(x) g(x)}\,dx$. The affinity between two normal densities $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ is therefore given by
$$\left(1 - \frac{(\sigma_1 - \sigma_2)^2}{\sigma_1^2 + \sigma_2^2}\right)^{1/2} \exp\left[-\frac{(\mu_1 - \mu_2)^2}{4(\sigma_1^2 + \sigma_2^2)}\right].$$
Since both $P_m$ and $P_{\theta_0}$ are product measures, according to Kakutani's criterion [see, e.g., Williams (1991), page 144], we need only to show that the infinite product
$$\prod_{i=1}^{\infty} A\big(N(0, n^{-1} + i^{-(2q+1)}),\ N(\theta_{i0}, n^{-1})\big)$$
or, equivalently,
$$\prod_{i=1}^{\infty} \left(1 - \frac{\big((n^{-1} + i^{-(2q+1)})^{1/2} - n^{-1/2}\big)^2}{2n^{-1} + i^{-(2q+1)}}\right) \tag{5.1}$$
and
$$\prod_{i=1}^{\infty} \exp\left[-\frac{\theta_{i0}^2}{4(2n^{-1} + i^{-(2q+1)})}\right] \tag{5.2}$$
do not diverge to zero. A product of the form $\prod_{i=1}^{\infty} (1 - a_i)$, $a_i > 0$, $i = 1, 2, \ldots$, converges if $\sum_{i=1}^{\infty} a_i$ converges. Also, $(a - b)^2 \le a^2 - b^2$ for $a \ge b \ge 0$. Therefore, (5.1) converges, as
$$\sum_{i=1}^{\infty} \frac{i^{-(2q+1)}}{2n^{-1} + i^{-(2q+1)}} \le \frac{n}{2} \sum_{i=1}^{\infty} i^{-(2q+1)} < \infty.$$
The product in (5.2) also converges, since
$$\sum_{i=1}^{\infty} \frac{\theta_{i0}^2}{8n^{-1} + 4i^{-(2q+1)}} \le \frac{n}{8} \sum_{i=1}^{\infty} \theta_{i0}^2 < \infty.$$
The lemma is proved.

The following lemma estimates the posterior probability of the $m$th model.

LEMMA 5.2. For any integer $m$ and $n \ge 1$,
$$E_{\theta_0} \Pi(q = q_m \mid X) \le \frac{\lambda_m}{\lambda_0} \exp\left[\frac{1}{2} \sum_{i=1}^{\infty} \frac{(i^{-(2q_0+1)} - i^{-(2q_m+1)})(i^{-(2q_0+1)} - \theta_{i0}^2)}{i^{-2(q_0+q_m+1)} + 2n^{-1} i^{-(2q_0+1)} + n^{-2}}\right].$$


PROOF. By the martingale convergence theorem [see, e.g., Williams (1991), page 109],
$$\Pi(q = q_m \mid X) = \lim_{k \to \infty} \Pi(q = q_m \mid X_1, \ldots, X_k) \quad \text{a.s. } [P_\lambda].$$
Therefore, by Lemma 5.1, $\Pi(q = q_m \mid X_1, \ldots, X_k)$ converges, as $k \to \infty$, to $\Pi(q = q_m \mid X)$ a.s. $[P_{\theta_0}]$. Since these, being probabilities, are bounded by 1, the convergence also takes place in $L_1(P_{\theta_0})$. By Bayes' theorem,
$$
\begin{aligned}
\Pi(q = q_m \mid X_1, \ldots, X_k)
&= \frac{\lambda_m \prod_{i=1}^{k} (n^{-1} + i^{-(2q_m+1)})^{-1/2} \exp\big[-\frac{1}{2} \sum_{i=1}^{k} (n^{-1} + i^{-(2q_m+1)})^{-1} X_i^2\big]}{\sum_{l=-\infty}^{\infty} \lambda_l \prod_{i=1}^{k} (n^{-1} + i^{-(2q_l+1)})^{-1/2} \exp\big[-\frac{1}{2} \sum_{i=1}^{k} (n^{-1} + i^{-(2q_l+1)})^{-1} X_i^2\big]} \\
&\le \frac{\lambda_m \prod_{i=1}^{k} (n^{-1} + i^{-(2q_m+1)})^{-1/2} \exp\big[-\frac{1}{2} \sum_{i=1}^{k} (n^{-1} + i^{-(2q_m+1)})^{-1} X_i^2\big]}{\lambda_0 \prod_{i=1}^{k} (n^{-1} + i^{-(2q_0+1)})^{-1/2} \exp\big[-\frac{1}{2} \sum_{i=1}^{k} (n^{-1} + i^{-(2q_0+1)})^{-1} X_i^2\big]}.
\end{aligned}
$$
Put $a_i = (n^{-1} + i^{-(2q_m+1)})^{-1} - (n^{-1} + i^{-(2q_0+1)})^{-1}$. Exploiting the independence of the $X_i$'s under $P_{\theta_0}$ and using
$$E \exp\left[-\frac{\alpha}{2} X^2\right] = \frac{1}{\sqrt{1 + \alpha\sigma^2}} \exp\left[-\frac{\mu^2 \alpha}{2(1 + \alpha\sigma^2)}\right]$$
for $X$ distributed as $N(\mu, \sigma^2)$ and $\alpha > -\sigma^{-2}$, we obtain
$$
\begin{aligned}
E_{\theta_0} \Pi(q = q_m \mid X)
&= \lim_{k \to \infty} E_{\theta_0} \Pi(q = q_m \mid X_1, \ldots, X_k) \\
&\le \frac{\lambda_m}{\lambda_0} \limsup_{k \to \infty} \prod_{i=1}^{k} \left(\frac{n^{-1} + i^{-(2q_0+1)}}{n^{-1} + i^{-(2q_m+1)}}\right)^{1/2} E_{\theta_0} \exp\left[-\frac{a_i}{2} X_i^2\right] \\
&= \frac{\lambda_m}{\lambda_0} \prod_{i=1}^{\infty} \left(\frac{n^{-1} + i^{-(2q_0+1)}}{n^{-1} + i^{-(2q_m+1)}}\right)^{1/2} \left(1 + \frac{a_i}{n}\right)^{-1/2} \exp\left[-\frac{a_i \theta_{i0}^2}{2(1 + (a_i/n))}\right] \\
&= \frac{\lambda_m}{\lambda_0} \prod_{i=1}^{\infty} \left(1 + \frac{a_i i^{-(2q_0+1)}}{1 + (a_i/n)}\right)^{1/2} \exp\left[-\frac{a_i \theta_{i0}^2}{2(1 + (a_i/n))}\right] \\
&\le \frac{\lambda_m}{\lambda_0} \exp\left[\frac{1}{2} \sum_{i=1}^{\infty} \frac{a_i}{1 + (a_i/n)} \big(i^{-(2q_0+1)} - \theta_{i0}^2\big)\right] \\
&= \frac{\lambda_m}{\lambda_0} \exp\left[\frac{1}{2} \sum_{i=1}^{\infty} \frac{(i^{-(2q_0+1)} - i^{-(2q_m+1)})(i^{-(2q_0+1)} - \theta_{i0}^2)}{i^{-2(q_0+q_m+1)} + 2n^{-1} i^{-(2q_0+1)} + n^{-2}}\right].
\end{aligned}
$$


REMARK 5.1. By exactly the same arguments, we can also conclude that for any $l$,
$$E_{\theta_0} \Pi(q = q_m \mid X) \le \frac{\lambda_m}{\lambda_l} \exp\left[\frac{1}{2} \sum_{i=1}^{\infty} \frac{(i^{-(2q_l+1)} - i^{-(2q_m+1)})(i^{-(2q_l+1)} - \theta_{i0}^2)}{i^{-2(q_l+q_m+1)} + 2n^{-1} i^{-(2q_l+1)} + n^{-2}}\right].$$

PROOF OF LEMMA 3.1. Set
$$S_1 = \sum_{i=1}^{\infty} \frac{(i^{-(2q_0+1)} - i^{-(2q_m+1)})\, i^{-(2q_0+1)}}{i^{-2(q_0+q_m+1)} + 2n^{-1} i^{-(2q_0+1)} + n^{-2}}$$
and
$$S_2 = -\sum_{i=1}^{\infty} \frac{(i^{-(2q_0+1)} - i^{-(2q_m+1)})\, \theta_{i0}^2}{i^{-2(q_0+q_m+1)} + 2n^{-1} i^{-(2q_0+1)} + n^{-2}}.$$
We shall show that there exists an $N$ not depending on $m$, $m < 0$, such that for $n > N$,
$$S_1 \le -\tfrac{1}{12}\, n^{1/(q_0+q_m+1)} \tag{5.3}$$
and
$$S_2 \le \tfrac{1}{24}\, n^{1/(q_0+q_m+1)}. \tag{5.4}$$
Note that
$$i^{-(2q_0+1)} - i^{-(2q_m+1)} \begin{cases} \le 0, & \text{for all } i, \\[2pt] \le -\tfrac{1}{2}\, i^{-(2q_m+1)}, & \text{for } i > I, \end{cases}$$
where $I = 2^{1/(2(q_0 - q_{-1}))}$. Also, if $n > N_1 = 2^{(q_0+q_{-1}+1)/(q_0-q_{-1})}$, then for $i \le n^{1/(q_0+q_m+1)}$, the first term in the common denominator in the expressions for $S_1$ and $S_2$ dominates the other two terms. Piecing these facts together, we see that the terms in $S_1$ are less than or equal to $-1/6$ for $I < i \le n^{1/(q_0+q_m+1)}$ and less than or equal to zero in general. Thus
$$S_1 \le -\tfrac{1}{6}\big(\lfloor n^{1/(q_0+q_m+1)} \rfloor - I\big) \le -\tfrac{1}{12}\, n^{1/(q_0+q_m+1)}$$
if $n > N_2 = (2I + 2)^{2q_0+1}$, and so (5.3) follows.

Let $C_0 = \max(1, \sum_{i=1}^{\infty} i^{2q_0} \theta_{i0}^2)$. Now for any $k$,
$$
\begin{aligned}
S_2 &\le \sum_{i=1}^{\infty} \frac{i^{-(2q_m+1)}\, \theta_{i0}^2}{i^{-2(q_0+q_m+1)} + 2n^{-1} i^{-(2q_0+1)} + n^{-2}} \\
&\le \sum_{i=1}^{k} i^{2q_0+1} \theta_{i0}^2 + n^2 \sum_{i=k+1}^{\infty} i^{-(2q_m+1)} \theta_{i0}^2 \\
&\le k C_0 + n^2 (k+1)^{-(2q_0+2q_m+1)} \sum_{i=k+1}^{\infty} i^{2q_0} \theta_{i0}^2.
\end{aligned} \tag{5.5}
$$


The first term can be made less than or equal to $(48)^{-1} n^{1/(q_0+q_m+1)}$ by choosing $k = \lfloor (48 C_0)^{-1} n^{1/(q_0+q_m+1)} \rfloor$. Since $k + 1 \ge (48 C_0)^{-1} n^{1/(2q_0+1)}$, the second term is also less than or equal to $(48)^{-1} n^{1/(q_0+q_m+1)}$ for $n > N_3$, where $N_3$ is the smallest $n$ such that
$$\sum_{i \ge (48 C_0)^{-1} n^{1/(2q_0+1)}} i^{2q_0} \theta_{i0}^2 < (48 C_0)^{-2(2q_0+1)}. \tag{5.6}$$
Note that $N_1$, $N_2$ and $N_3$ do not depend on a particular $m < 0$. Thus for $n > N = \max(N_1, N_2, N_3)$, (5.3) and (5.4) follow. Equation (3.2) now follows from Lemma 5.2. Equation (3.3) is an immediate consequence of (3.2).

6. Proofs of Lemmas 3.2 and 3.3. To simplify notation, we drop the overhead tildes from $\tilde{\Pi}$ and the $\tilde{\lambda}_m$'s and simply write $\Pi = \sum_{m=1}^{\infty} \lambda_m \Pi_m$, where $\sum_{m=1}^{\infty} \lambda_m = 1$.

PROOF OF LEMMA 3.2. By Chebyshev's inequality,
$$\Pi\left(\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B \,\Big|\, X\right) \le B^{-1} \sum_{i=1}^{\infty} i^{2q_0} E(\theta_i^2 \mid X).$$
Now
$$
\begin{aligned}
E(\theta_i^2 \mid X) &= \sum_{m=1}^{\infty} \Pi(q = q_m \mid X)\, E(\theta_i^2 \mid X, q = q_m) \\
&= \sum_{m=1}^{\infty} \Pi(q = q_m \mid X) \left(\frac{1}{n + i^{2q_m+1}} + \frac{n^2 X_i^2}{(n + i^{2q_m+1})^2}\right) \\
&\le \frac{1}{n + i^{2q_1+1}} + \frac{n^2 X_i^2}{(n + i^{2q_0+1})^2}.
\end{aligned}
$$
Therefore,
$$
\begin{aligned}
E_{\theta_0} \Pi\left(\theta : \sum_{i=1}^{\infty} i^{2q_0} \theta_i^2 > B \,\Big|\, X\right)
&\le B^{-1} \sum_{i=1}^{\infty} i^{2q_0} \left(\frac{1}{n + i^{2q_1+1}} + \frac{n^2\, E_{\theta_0} X_i^2}{(n + i^{2q_0+1})^2}\right) \\
&= B^{-1} \left(\sum_{i=1}^{\infty} \frac{i^{2q_0}}{n + i^{2q_1+1}} + \sum_{i=1}^{\infty} \frac{n^2 i^{2q_0} \theta_{i0}^2}{(n + i^{2q_0+1})^2} + \sum_{i=1}^{\infty} \frac{n\, i^{2q_0}}{(n + i^{2q_0+1})^2}\right).
\end{aligned}
$$
It therefore suffices to show that the sums in the above display are finite and bounded in $n$. The first two sums are clearly so, because they are bounded by the convergent series $\sum_{i=1}^{\infty} i^{-1-2(q_1-q_0)}$ and $\sum_{i=1}^{\infty} i^{2q_0} \theta_{i0}^2$, respectively.


To bound the last sum, split it into sums over the ranges $\{i : 1 \le i \le k\}$ and $\{i : i > k\}$, with $k = \lfloor n^{1/(2q_0+1)} \rfloor$. Then
$$\sum_{i=1}^{\infty} \frac{n\, i^{2q_0}}{(n + i^{2q_0+1})^2} \le n^{-1} \sum_{i=1}^{k} i^{2q_0} + n \sum_{i=k+1}^{\infty} i^{-2q_0-2} \le \frac{n^{-1} k^{2q_0+1} + n (k+1)^{-2q_0-1}}{2q_0+1},$$
which is bounded by $2/(2q_0+1)$.

To prove Lemma 3.3, it is more convenient to view the sample as $n$ i.i.d. observations $\mathbf{Y}_1, \mathbf{Y}_2, \ldots, \mathbf{Y}_n$, where each $\mathbf{Y}_j$ is distributed like $\mathbf{Y} = (Y_1, Y_2, \ldots)$ with distribution $P_\theta$: the $Y_i$'s are independently distributed as $N(\theta_i, 1)$, $i = 1, 2, \ldots$. Note that here we distinguish between this $P_\theta$ and $P_{\theta,n}$, the distribution of $X$. We need some preparatory lemmas.

LEMMA 6.1. For any $\theta, \theta_0 \in \ell_2$:

(i) $P_\theta$ is absolutely continuous with respect to $P_{\theta_0}$.

(ii) $\sum_{i=1}^{\infty} (Y_i - \theta_{i0})(\theta_i - \theta_{i0})$ converges a.s. $[P_{\theta_0}]$ and in $L_2(P_{\theta_0})$, and has mean 0 and variance $\|\theta - \theta_0\|^2$.

(iii) $\dfrac{dP_\theta}{dP_{\theta_0}}(\mathbf{Y}) = \exp\left[\displaystyle\sum_{i=1}^{\infty} (Y_i - \theta_{i0})(\theta_i - \theta_{i0}) - \dfrac{1}{2} \|\theta - \theta_0\|^2\right]$.

(iv) $-\displaystyle\int \log \dfrac{dP_\theta}{dP_{\theta_0}}(\mathbf{y})\, dP_{\theta_0}(\mathbf{y}) = \dfrac{1}{2} \|\theta - \theta_0\|^2$.

(v) $\displaystyle\int \left(\log \dfrac{dP_\theta}{dP_{\theta_0}}(\mathbf{y})\right)^2 dP_{\theta_0}(\mathbf{y}) = \|\theta - \theta_0\|^2 + \dfrac{1}{4} \|\theta - \theta_0\|^4$.

(vi) The Hellinger distance
$$H(\theta_0, \theta) = \left(\int \left[\left(\frac{dP_\theta}{dP_{\theta_0}}\right)^{1/2}(\mathbf{y}) - 1\right]^2 dP_{\theta_0}(\mathbf{y})\right)^{1/2}$$
satisfies
$$H^2(\theta_0, \theta) = 2\left(1 - \exp\left[-\tfrac{1}{8} \|\theta - \theta_0\|^2\right]\right), \tag{6.1}$$
so that, in particular, $H(\theta_0, \theta) \le \frac{1}{2} \|\theta - \theta_0\|$, and if $\|\theta - \theta_0\| \le 1$, then $H(\theta_0, \theta)$ is also bounded below by a constant multiple of $\|\theta - \theta_0\|$.


PROOF. Whereas $P_\theta$ and $P_{\theta_0}$ are product measures and the infinite product of affinities of their respective components converges to $\exp[-\frac{1}{8} \|\theta - \theta_0\|^2]$, (i) follows from Kakutani's criterion as in the proof of Lemma 5.1. For (ii), consider the mean zero martingale $S_k = \sum_{i=1}^{k} (Y_i - \theta_{i0})(\theta_i - \theta_{i0})$ and note that $\sup_{k \ge 1} E S_k^2 = \|\theta - \theta_0\|^2 < \infty$. The martingale convergence theorem [see, e.g., Williams (1991), page 109] now applies. For (iii), note that on the sigma-field generated by $(Y_1, Y_2, \ldots, Y_k)$, the Radon–Nikodym derivative is given by
$$\exp\left[\sum_{i=1}^{k} (Y_i - \theta_{i0})(\theta_i - \theta_{i0}) - \frac{1}{2} \sum_{i=1}^{k} (\theta_i - \theta_{i0})^2\right].$$
The rest follows from (ii) and the martingale convergence theorem. Assertions (iv) and (v) are immediate consequences of (ii) and (iii). For (vi), note that $H^2(\theta_0, \theta) = 2 - 2 E_{\theta_0} ((dP_\theta/dP_{\theta_0})(\mathbf{Y}))^{1/2}$. To evaluate $E_{\theta_0} ((dP_\theta/dP_{\theta_0})(\mathbf{Y}))^{1/2}$, consider the martingale
$$S_k = \exp\left[\frac{1}{2} \sum_{i=1}^{k} (Y_i - \theta_{i0})(\theta_i - \theta_{i0}) - \frac{1}{8} \sum_{i=1}^{k} (\theta_i - \theta_{i0})^2\right].$$
Clearly $S_k$ is positive, has unit expectation and bounded second moment, so its limit $(dP_\theta/dP_{\theta_0})^{1/2}(\mathbf{Y}) \exp[\frac{1}{8} \|\theta - \theta_0\|^2]$ also has expectation 1. This implies (6.1). The next assertion in (vi) now follows from the inequality $1 - e^{-x} \le x$, and the last from the mean value theorem.

The following lemma, which estimates the probability of a ball under a product normal measure from below, will be used in the calculation of certain prior probabilities. The result appeared as Lemma 5 in Shen and Wasserman (2001). Its simple proof is included for completeness.

LEMMA 6.2. Let $W_1, \ldots, W_N$ be independent random variables, with $W_i$ having distribution $N(\xi_i, i^{-2d})$, $d > 0$. Then
$$\Pr\left(\sum_{i=1}^{N} W_i^2 \le \delta^2\right) \ge 2^{-N/2} e^{-dN} \exp\left[-\sum_{i=1}^{N} i^{2d} \xi_i^2\right] \Pr\left(\sum_{i=1}^{N} V_i^2 \le 2\delta^2 N^{2d}\right),$$
where $V_1, \ldots, V_N$ are independent standard normal random variables.

PROOF. Using $(a + b)^2 \le 2(a^2 + b^2)$, $N! \ge e^{-N} N^N$ and the change of variables $v_i = \sqrt{2}\, N^d w_i$, we obtain
$$
\begin{aligned}
\Pr\left(\sum_{i=1}^{N} W_i^2 \le \delta^2\right)
&= \int_{\sum_{i=1}^{N} w_i^2 \le \delta^2} \prod_{i=1}^{N} (2\pi)^{-1/2} i^d \exp\left[-\tfrac{1}{2} i^{2d} (w_i - \xi_i)^2\right] dw_1 \cdots dw_N \\
&\ge (2\pi)^{-N/2} (N!)^d \int_{\sum_{i=1}^{N} w_i^2 \le \delta^2} \exp\left[-\sum_{i=1}^{N} i^{2d} (w_i^2 + \xi_i^2)\right] dw_1 \cdots dw_N \\
&\ge (2\pi)^{-N/2} (N!)^d \exp\left[-\sum_{i=1}^{N} i^{2d} \xi_i^2\right] \int_{\sum_{i=1}^{N} w_i^2 \le \delta^2} \exp\left[-N^{2d} \sum_{i=1}^{N} w_i^2\right] dw_1 \cdots dw_N \\
&= (2\pi)^{-N/2} (N!)^d \exp\left[-\sum_{i=1}^{N} i^{2d} \xi_i^2\right] 2^{-N/2} N^{-dN} \int_{\sum_{i=1}^{N} v_i^2 \le 2\delta^2 N^{2d}} \exp\left[-\tfrac{1}{2} \sum_{i=1}^{N} v_i^2\right] dv_1 \cdots dv_N \\
&\ge 2^{-N/2} e^{-dN} \exp\left[-\sum_{i=1}^{N} i^{2d} \xi_i^2\right] \Pr\left(\sum_{i=1}^{N} V_i^2 \le 2\delta^2 N^{2d}\right).
\end{aligned}
$$

For simplicity, we now drop the overhead bars from $\bar{\Pi}$ and $\bar{\lambda}$ and simply write $\Pi = \sum_{m=0}^{\infty} \lambda_m \Pi_m$, where $\sum_{m=0}^{\infty} \lambda_m = 1$.

LEMMA 6.3. There exist positive constants $C$, $c$ such that for all $\varepsilon > 0$,
$$\Pi\{\theta : \|\theta - \theta_0\| \le \varepsilon\} \ge C \exp\big[-c\, \varepsilon^{-1/q_0}\big]. \tag{6.2}$$

PROOF. Clearly the left-hand side (LHS) of (6.2) is bounded from below by $\lambda_0 \Pi_0\{\theta : \sum_{i=1}^{\infty} (\theta_i - \theta_{i0})^2 \le \varepsilon^2\}$. Now by independence, for any $N$,
$$\Pi_0\left(\theta : \sum_{i=1}^{\infty} (\theta_i - \theta_{i0})^2 \le \varepsilon^2\right) \ge \Pi_0\left(\theta : \sum_{i=1}^{N} (\theta_i - \theta_{i0})^2 \le \varepsilon^2/2\right) \Pi_0\left(\theta : \sum_{i=N+1}^{\infty} (\theta_i - \theta_{i0})^2 \le \varepsilon^2/2\right).$$
Also
$$\sum_{i=N+1}^{\infty} (\theta_i - \theta_{i0})^2 \le 2 \sum_{i=N+1}^{\infty} \theta_i^2 + 2 \sum_{i=N+1}^{\infty} \theta_{i0}^2. \tag{6.3}$$

The second sum in (6.3) is less than or equal to
$$2 N^{-2q_0} \sum_{i=N+1}^{\infty} i^{2q_0} \theta_{i0}^2 \le 2 N^{-2q_0} C_0 < \frac{\varepsilon^2}{4}$$
whenever $N > N_1 = (8 C_0)^{1/(2q_0)} \varepsilon^{-1/q_0}$, where $C_0 = \sum_{i=1}^{\infty} i^{2q_0} \theta_{i0}^2$. By Chebyshev's inequality, the first sum on the right-hand side (RHS) of (6.3) is less than $\varepsilon^2/4$ with probability at least
$$1 - \frac{8}{\varepsilon^2} \sum_{i=N+1}^{\infty} E_{\Pi_0}(\theta_i^2) = 1 - \frac{8}{\varepsilon^2} \sum_{i=N+1}^{\infty} i^{-(2q_0+1)} \ge 1 - \frac{4}{q_0 N^{2q_0} \varepsilon^2} > \frac{1}{2}$$
if $N > N_2 = (8/q_0)^{1/(2q_0)} \varepsilon^{-1/q_0}$. To bound $\Pi_0\{\theta : \sum_{i=1}^{N} (\theta_i - \theta_{i0})^2 \le \varepsilon^2/2\}$, we apply Lemma 6.2 with $d = q_0 + \frac{1}{2}$, $\xi_i = \theta_{i0}$ and $\delta^2 = \varepsilon^2/2$. Note that by the central limit theorem, the factor $\Pr\{\sum_{i=1}^{N} V_i^2 \le 2\delta^2 N^{2d}\}$ on the RHS of the inequality from Lemma 6.2 is at least $\frac{1}{4}$ if $2\delta^2 N^{2d} > N$ and $N$ is large, that is, if $N > N_3 = \varepsilon^{-1/q_0}$ and $N > N_4$. Choosing $N = \max(N_1, N_2, N_3, N_4)$ and noting that $\sum_{i=1}^{N} i^{2q_0+1} \theta_{i0}^2 \le N C_0$, (6.2) is obtained.

The final lemma gives the entropy estimates. Recall that for a totally bounded subset $S$ of a metric space with a distance function $d$, the $\varepsilon$-packing number $D(\varepsilon, S, d)$ is defined to be the largest integer $m$ such that there exist $s_1, \ldots, s_m \in S$ with $d(s_j, s_k) > \varepsilon$ for all $j, k = 1, 2, \ldots, m$, $j \ne k$. The $\varepsilon$-entropy is $\log D(\varepsilon, S, d)$. A bound for the entropy of Sobolev balls was obtained in Theorem 5.2 of Birman and Solomjak (1967). It may be possible to derive a bound similar to that in Lemma 6.4 from their result by a Fourier transformation. However, the exact correspondence is somewhat unclear because of the possible fractional value of the index $q$. Below, we present a simple proof which could be of some independent interest as well.

LEMMA 6.4. For all $\varepsilon > 0$,
$$\log D\left(\varepsilon, \left\{\theta : \sum_{i=1}^{\infty} i^{2q} \theta_i^2 \le B\right\}, \|\cdot\|\right) \le \tfrac{1}{2} (8B)^{1/(2q)} \log\big(4(2e)^{2q}\big)\, \varepsilon^{-1/q}. \tag{6.4}$$

PROOF. Denote $\Theta_q(B) = \{\theta : \sum_{i=1}^{\infty} i^{2q} \theta_i^2 \le B\}$. If $\varepsilon^2 > 4B$, then
$$\sum_{i=1}^{\infty} \theta_i^2 \le \sum_{i=1}^{\infty} i^{2q} \theta_i^2 \le B < \frac{\varepsilon^2}{4}$$
and hence for any two $\theta, \theta' \in \Theta_q(B)$, $\|\theta - \theta'\| < \varepsilon$, implying that $D(\varepsilon, \Theta_q(B), \|\cdot\|) = 1$. Thus (6.4) is trivially satisfied for such an $\varepsilon$. We shall therefore assume that $\varepsilon^2 \le 4B$.

Let $\theta_1, \ldots, \theta_m \in \Theta_q(B)$ be such that $\|\theta_j - \theta_k\| > \varepsilon$, $j, k = 1, 2, \ldots, m$, $j \ne k$. For an integer $N$, consider the set
$$\Theta_{q,N}(B) = \left\{(\theta_1, \ldots, \theta_N, 0, \ldots) : \sum_{i=1}^{N} i^{2q} \theta_i^2 \le B\right\}.$$


With $\bar{\theta} = (\theta_1, \ldots, \theta_N, 0, \ldots)$, note that for any $\theta \in \Theta_q(B)$,
$$\|\theta - \bar{\theta}\|^2 = \sum_{i=N+1}^{\infty} \theta_i^2 \le (N+1)^{-2q} B \le \frac{\varepsilon^2}{8}$$
if $N = \lfloor (8B)^{1/(2q)} \varepsilon^{-1/q} \rfloor$. For this choice of $N$, we have
$$1 \le N \le (8B)^{1/(2q)} \varepsilon^{-1/q} < N + 1 \le 2N. \tag{6.5}$$

Therefore, for $j,k=1,2,\ldots,m$, $j\ne k$,
$$\varepsilon^{2}<\|\theta^{j}-\theta^{k}\|^{2}=\|\bar\theta^{j}-\bar\theta^{k}\|^{2}+\|(\theta^{j}-\bar\theta^{j})-(\theta^{k}-\bar\theta^{k})\|^{2}\le\|\bar\theta^{j}-\bar\theta^{k}\|^{2}+4\,\frac{\varepsilon^{2}}{8},$$
implying that $\|\bar\theta^{j}-\bar\theta^{k}\|>\varepsilon/\sqrt{2}$. The set $\Theta_{q,N}(B)$ may be viewed as an $N$-dimensional ellipsoid and the $\bar\theta^{j}$'s as $N$-dimensional vectors in $\mathbb{R}^{N}$ with the usual Euclidean distance. For $t=(t_1,\ldots,t_N)$ and $\delta>0$, let $B(t,\delta)$ stand for the Euclidean ball with radius $\delta$ centered at $t$. The balls $B(\bar\theta^{j},\varepsilon/(2\sqrt{2}))$ are clearly disjoint. Note that if $t\in\Theta_{q,N}(B)$ and $t'=(t'_1,\ldots,t'_N)\in B(t,\varepsilon/(2\sqrt{2}))$, we have
$$\sum_{i=1}^{N}i^{2q}t_i'^{2}\le 2\sum_{i=1}^{N}i^{2q}t_i^{2}+2\sum_{i=1}^{N}i^{2q}(t'_i-t_i)^{2}\le 2B+2N^{2q}\,\frac{\varepsilon^{2}}{8}\le 4B$$
by (6.5), and hence
$$\bigcup_{j=1}^{m}B\Bigl(\bar\theta^{j},\frac{\varepsilon}{2\sqrt{2}}\Bigr)\subset\Theta_{q,N}(4B).$$

Let $V_N$ stand for the volume of the unit ball in $\mathbb{R}^{N}$. Note that the volume of the ellipsoid $\{t:\sum_{i=1}^{N}a_i^{2}t_i^{2}\le A\}$ is given by $V_N A^{N/2}/\prod_{i=1}^{N}a_i$. Bounding the volume of $\bigcup_{j=1}^{m}B(\bar\theta^{j},\varepsilon/(2\sqrt{2}))$ by that of $\Theta_{q,N}(4B)$, we obtain
$$m\Bigl(\frac{\varepsilon^{2}}{8}\Bigr)^{N/2}V_N\le(4B)^{N/2}V_N\prod_{i=1}^{N}i^{-q}=(4B)^{N/2}V_N(N!)^{-q}.$$
Since $N!\ge N^{N}e^{-N}$ for all $N\ge 1$, we arrive at the estimate
$$m\le\Bigl(\frac{32Be^{2q}}{N^{2q}\varepsilon^{2}}\Bigr)^{N/2}. \tag{6.6}$$
From (6.5), we also have $N^{2q}\varepsilon^{2}\ge 2^{3-2q}B$. Substituting in (6.6), the required bound is obtained.
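The ellipsoid volume formula $V_NA^{N/2}/\prod_{i=1}^{N}a_i$ used in the proof can be sanity-checked by Monte Carlo in a low dimension. The sketch below (ours, with arbitrary values of $a_1,a_2,A$) compares the formula for $N=2$, where $V_2=\pi$, against a hit-or-miss estimate:

```python
import math
import random

random.seed(0)
# Ellipse {t : a1^2 t1^2 + a2^2 t2^2 <= A}; the formula gives area V_2 * A / (a1*a2), V_2 = pi.
a1, a2, A = 1.0, 2.0, 3.0
exact = math.pi * A / (a1 * a2)

bx, by = math.sqrt(A) / a1, math.sqrt(A) / a2   # bounding-box half-widths
M, hits = 200000, 0
for _ in range(M):
    t1, t2 = random.uniform(-bx, bx), random.uniform(-by, by)
    if a1 ** 2 * t1 ** 2 + a2 ** 2 * t2 ** 2 <= A:
        hits += 1
estimate = hits / M * (2 * bx) * (2 * by)   # hit fraction times box area
print(estimate, exact)  # the two agree to within Monte Carlo error
```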


THEOREM 6.1. Suppose we have i.i.d. observations $Y_1,Y_2,\ldots$ from a distribution $P_0$ having a density (with respect to some sigma-finite measure) belonging to a class $\mathcal{P}$. Let $H(P_0,P)$ denote the Hellinger distance between $P_0$ and $P$. Let $\Pi$ be a prior on $\mathcal{P}$ and $\bar{\mathcal{P}}\subset\mathcal{P}$. Let $\varepsilon_n$ be a positive sequence such that $\varepsilon_n\to 0$ and $n\varepsilon_n^{2}\to\infty$, and suppose that
$$\log D(\varepsilon_n,\bar{\mathcal{P}},H)\le c_1n\varepsilon_n^{2} \tag{6.7}$$
and
$$\Pi\Bigl\{P:-\int\log\frac{p}{p_0}\,dP_0\le\varepsilon_n^{2},\ \int\Bigl(\log\frac{p}{p_0}\Bigr)^{2}dP_0\le\varepsilon_n^{2}\Bigr\}\ge c_2e^{-c_3n\varepsilon_n^{2}} \tag{6.8}$$
for some constants $c_1,c_2,c_3$, where $p_0$ stands for the density of $P_0$. Then for a sufficiently large constant $M$, the posterior probability
$$\Pi\bigl\{P\in\bar{\mathcal{P}}:H(P_0,P)\ge M\varepsilon_n\mid Y_1,Y_2,\ldots,Y_n\bigr\}\to 0 \tag{6.9}$$
in $P_0^{n}$-probability.

PROOF OF LEMMA 3.3. Take $\mathcal{P}=\{P_\theta:\theta\in\ell_2\}$ and $\bar{\mathcal{P}}=\{P_\theta:\sum_{i=1}^{\infty}i^{2q_0}\theta_i^{2}\le B\}$. By part (vi) of Lemma 6.1, the Hellinger distance $H(P_{\theta_0},P_\theta)$ is equivalent to the $\ell_2$ distance $\|\theta_0-\theta\|$, so in (6.7) and in the conclusion (6.9), the former may be replaced by the latter. Lemma 6.4 shows that (6.7) is satisfied by $\varepsilon_n$ a multiple of $n^{-q_0/(2q_0+1)}$. For (6.8), note that by parts (iv) and (v) of Lemma 6.1, the set
$$\Bigl\{\theta:-E_{\theta_0}\log\frac{dP_\theta}{dP_{\theta_0}}(Y)\le\varepsilon^{2},\ E_{\theta_0}\Bigl(\log\frac{dP_\theta}{dP_{\theta_0}}(Y)\Bigr)^{2}\le\varepsilon^{2}\Bigr\}$$
contains $\{\theta:\|\theta-\theta_0\|^{2}\le\varepsilon^{2}/2\}$ for $\varepsilon<1$. Therefore, by Lemma 6.3, (6.8) is also satisfied by $\varepsilon_n$ a multiple of $n^{-q_0/(2q_0+1)}$. The result thus follows.
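To see the rate $n^{-q_0/(2q_0+1)}$ concretely, the following simulation (our sketch; it assumes the conjugate product prior $\theta_i\sim N(0,i^{-(2q_0+1)})$ used earlier in the paper) computes the $\ell_2$ error of the posterior mean in the sequence model $X_i\sim N(\theta_i,n^{-1})$ and checks that it shrinks as $n$ grows:

```python
import math
import random

random.seed(1)

def posterior_mean_error(n, q, theta0, trials=50):
    """Average l2 error of the posterior mean under the conjugate product prior
    theta_i ~ N(0, i^{-(2q+1)}) in the model X_i ~ N(theta_i, 1/n)."""
    total = 0.0
    for _ in range(trials):
        err2 = 0.0
        for i, th in enumerate(theta0, start=1):
            x = random.gauss(th, 1 / math.sqrt(n))
            tau2 = i ** -(2 * q + 1)            # prior variance of theta_i
            shrink = tau2 / (tau2 + 1 / n)      # posterior mean = shrink * X_i
            err2 += (shrink * x - th) ** 2
        total += math.sqrt(err2)
    return total / trials

q = 1.0
theta0 = [i ** -(q + 1) for i in range(1, 500)]  # a truth with sum i^{2q} theta_i^2 < infinity
e_small = posterior_mean_error(100, q, theta0)
e_big = posterior_mean_error(10000, q, theta0)
print(e_small, e_big)  # the error decreases; the theory predicts roughly n^{-1/3} for q = 1
```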

7. Continuous spectrum adaptation. Suppose now that $q_0\in\mathcal{Q}$, where $\mathcal{Q}$ is an arbitrary subset of the positive semiaxis $\mathbb{R}^{+}$. Since our basic approach relies on the assumption that the smoothness parameter can take only countably many possible values, we proceed as follows. Choose a countable dense subset $\mathcal{Q}^{*}$ of $\mathcal{Q}$ and put a prior $\lambda$ on it such that $\lambda(q=s)>0$ for each $s\in\mathcal{Q}^{*}$.
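One concrete construction of such a pair $(\mathcal{Q}^{*},\lambda)$ for $\mathcal{Q}=(0,\infty)$, our illustration rather than anything prescribed by the paper, takes $\mathcal{Q}^{*}$ to be the positive rationals in a diagonal enumeration and gives the $k$-th point the geometric weight $2^{-k}$, so every point of $\mathcal{Q}^{*}$ is charged:

```python
from fractions import Fraction

def positive_rationals(n):
    """First n positive rationals from a diagonal enumeration, duplicates skipped."""
    seen, out, s = set(), [], 2
    while len(out) < n:
        for p in range(1, s):
            r = Fraction(p, s - p)   # numerator + denominator = s
            if r not in seen and len(out) < n:
                seen.add(r)
                out.append(r)
        s += 1
    return out

Q_star = positive_rationals(10)
lam = {r: 2.0 ** -(k + 1) for k, r in enumerate(Q_star)}  # lambda(q = s) > 0 for all s
print(Q_star)
```

Extending the enumeration indefinitely gives a dense subset of $(0,\infty)$ with total prior mass one.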

THEOREM 7.1. For any sequence $M_n\to\infty$ and for any given $\delta>0$, the posterior probability
$$\Pi\bigl\{\theta:n^{q_0/(2q_0+1)-\delta}\|\theta-\theta_0\|>M_n\mid X\bigr\}\to 0$$
in $P_{\theta_0,n}$-probability.

Note that the prior does not depend on $\delta$, so the obtained rate is almost optimal. The proof of this theorem is very similar to that of Theorem 3.1; the main differences are indicated in the outline of the proof below.

OUTLINE OF THE PROOF. The true $q_0$ may not belong to $\mathcal{Q}^{*}$, but we may choose $q^{*}\in\mathcal{Q}^{*}$, $q^{*}<q_0$, arbitrarily close to $q_0$. Choose also $\bar q,\tilde q\in\mathcal{Q}^{*}$ which satisfy $\bar q<q^{*}<\tilde q<q_0$, and hence $\theta_0\in\Theta_{q_0}\subset\Theta_{\tilde q}\subset\Theta_{q^{*}}\subset\Theta_{\bar q}$.

Lemmas 3.1 and 3.2 go through in the sense that $\Pi(q<q^{*}\mid X)\to 0$ as $n\to\infty$ and
$$\sup_{n}\Pi\Bigl\{\theta:\sum_{i=1}^{\infty}i^{2\bar q}\theta_i^{2}>B,\ q\ge q^{*}\Bigm|X\Bigr\}\to 0\qquad\text{as }B\to\infty$$
in $P_{\theta_0,n}$-probability. To see this, apply Lemma 3.1 with $q_0$ replaced by $\tilde q$ and $q_{-1}$ by $q^{*}$, and Lemma 3.2 with $q_0$ replaced by $\bar q$ and $q_1$ by $q^{*}$.

As in the proof of Theorem 3.1, we obtain the bound
$$\begin{aligned}
\Pi\bigl\{\theta:n^{\bar q/(2\bar q+1)}\|\theta-\theta_0\|>M_n\bigm|X\bigr\}
&=\Pi\bigl\{\theta:n^{\bar q/(2\bar q+1)}\|\theta-\theta_0\|>M_n,\ q\ge q^{*}\bigm|X\bigr\}\\
&\quad+\Pi\bigl\{\theta:n^{\bar q/(2\bar q+1)}\|\theta-\theta_0\|>M_n,\ q<q^{*}\bigm|X\bigr\}\\
&\le\Pi\Bigl\{\theta:\sum_{i=1}^{\infty}i^{2\bar q}\theta_i^{2}>B,\ q\ge q^{*}\Bigm|X\Bigr\}\\
&\quad+\Pi\Bigl\{\theta:n^{\bar q/(2\bar q+1)}\|\theta-\theta_0\|>M_n,\ \sum_{i=1}^{\infty}i^{2\bar q}\theta_i^{2}\le B,\ q\ge q^{*}\Bigm|X\Bigr\}\\
&\quad+\Pi(q<q^{*}\mid X).
\end{aligned}$$
Note that the fourth term on the RHS of the series of inequalities in the proof of Theorem 3.1 has been absorbed into the first two terms on the RHS of the last inequality. Now, the second term on the RHS of the last display converges to zero by Lemma 3.3 with $q_0$ replaced by $\bar q$. Therefore all the terms go to zero. Since $\bar q$ can be made arbitrarily close to $q_0$, this means that for any given $\delta>0$, the posterior converges at least at the rate $n^{-q_0/(2q_0+1)+\delta}$ in $\ell_2$.
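The reason choosing $\bar q$ close to $q_0$ costs only an arbitrarily small power $\delta$ is that the exponent $q\mapsto q/(2q+1)$ is continuous; a quick numerical check of this arithmetic (ours, for illustration):

```python
def rate_exponent(q):
    """Exponent in the rate n^{-q/(2q+1)}."""
    return q / (2 * q + 1)

q0 = 1.0
deltas = []
for q_bar in (0.9, 0.99, 0.999):
    delta = rate_exponent(q0) - rate_exponent(q_bar)
    deltas.append(delta)
    print(q_bar, delta)  # the loss delta shrinks as q_bar approaches q0
```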

REMARK 7.1. The loss of a power δ is due to the fact that the q’s are not strictly separated. It is intuitively clear that the requirement of strict separation stems from the fact that the closer the possible values of the smoothness parameter are, the more difficult it is to distinguish between them.

An argument based on the estimates in Lemmas 5.2 and 6.3 does not seem to work because of the absence of point masses. Nevertheless, we believe that Theorem 7.1 continues to hold in this case, but a complete analogue of Theorem 3.1 will possibly not hold. The intuition behind our belief is essentially the same as in Theorem 7.1: since the slightest deviation in the assumed value of $q$ from the true $q_0$ affects the rate by a power of $n$, and without a point mass at $q_0$ one can at best hope for a concentration of the posterior distribution of $q$ in a neighborhood of $q_0$, the rate of convergence of the posterior distribution will be affected by an arbitrarily small power of $n$.

8. Uniformity over ellipsoids. It is of some interest to know the extent to which the posterior convergence in Theorems 2.1, 3.1 and 7.1 is uniform over $\theta_0$; see Brown, Low and Zhao (1997) in this context. For Theorem 2.1, it is immediately seen from the proof that (2.1) holds uniformly over an ellipsoid, that is, for any $Q\ge 0$ and $M_n\to\infty$,
$$\sup_{\theta_0\in\Theta_q(Q)}E_{\theta_0}\Pi_q\bigl\{\theta:n^{q/(2q+1)}\|\theta-\theta_0\|>M_n\bigm|X\bigr\}\to 0.$$
For Theorem 3.1, such a proposition is much more subtle. One needs to check uniformity in each of Lemmas 3.1, 3.2 and 3.3. In Theorem 8.1 below, we show that a local uniformity holds in the sense that for every $\theta_0\in\Theta_{q_0}$, there exists a small ellipsoid around it where the convergence is uniform.

THEOREM 8.1. For any $\theta^{*}\in\Theta_{q_0}$ there exists $\varepsilon>0$ such that
$$\sup_{\theta_0\in E_{q_0}(\theta^{*},\varepsilon)}E_{\theta_0}\Pi\bigl\{\theta:n^{q_0/(2q_0+1)}\|\theta-\theta_0\|>M_n\bigm|X\bigr\}\to 0,$$
where $E_{q_0}(\theta^{*},\varepsilon)=\{\theta_0:\sum_{i=1}^{\infty}i^{2q_0}(\theta_{i0}-\theta_i^{*})^{2}<\varepsilon\}$.

OUTLINE OF THE PROOF. We need to check the uniformity in every step of the proof of Theorem 3.1. Going through the proof of Lemma 3.2, it is easily seen that for any $Q>0$,
$$\lim_{B\to\infty}\sup_{\theta_0\in\Theta_{q_0}(Q)}\sup_{n\ge 1}E_{\theta_0}\Pi\Bigl\{\theta:\sum_{i=1}^{\infty}i^{2q_0}\theta_i^{2}>B\Bigm|X\Bigr\}=0.$$
For the uniform version of Lemma 3.3, we need to show that for any $B>0$,
$$\lim_{n\to\infty}\sup_{\theta_0\in\Theta_{q_0}(Q)}E_{\theta_0}\Pi\Bigl\{\theta:r_n\|\theta-\theta_0\|>M_n,\ \sum_{i=1}^{\infty}i^{2q_0}\theta_i^{2}\le B\Bigm|X\Bigr\}=0. \tag{8.1}$$
Further, note that (6.9) in Theorem 6.1 could have been stated as an inequality [see the proof of Theorem 2.1 of Ghosal, Ghosh and van der Vaart (2000)],
$$E_{P_0}\Pi\bigl\{P\in\bar{\mathcal{P}}:H(P_0,P)\ge M\varepsilon_n\bigm|Y_1,Y_2,\ldots,Y_n\bigr\}\le Ce^{-cn\varepsilon_n^{2}}+\frac{1}{n\varepsilon_n^{2}},$$
where $C$ and $c$ are positive constants depending on $c_1$, $c_2$ and $c_3$ that appeared in the conditions of Theorem 6.1. Therefore, (8.1) holds if the conditions of Theorem 6.1 hold simultaneously for $\theta_0\in\Theta_{q_0}(Q)$. The entropy condition (6.7) does not depend on the true value $\theta_0$ [see relation (6.4) in Lemma 6.4], so $c_1$ is free of $\theta_0$. It is possible to choose the same $c_2$ and $c_3$ for $\theta_0$ belonging to the ellipsoid $\Theta_{q_0}(Q)$ by the estimates given by Lemmas 6.2 and 6.3. Therefore, it remains to check uniformity in Lemma 3.1.

However, (3.3) in Lemma 3.1 does not hold uniformly over $\theta_0\in\Theta_{q_0}(Q)$ for a given $Q>0$. The problem is that by varying $\theta_0$ over $\Theta_{q_0}(Q)$, one can encounter arbitrarily slow convergence of the LHS of (5.6) to zero, making the integer $N_3$ defined there dependent on $\theta_0$. Nevertheless, for a given $\theta^{*}=(\theta_1^{*},\theta_2^{*},\ldots)\in\Theta_{q_0}$, it is possible to choose an $\varepsilon>0$ such that
$$\lim_{n\to\infty}\sup_{\theta_0\in E_{q_0}(\theta^{*},\varepsilon)}E_{\theta_0}\Pi(q<q_0\mid X)=0. \tag{8.2}$$
The proof is largely a repetition of the arguments given in the proof of Lemma 3.1, so we restrict ourselves only to those places where modifications are needed.

First, if $0<\varepsilon<1$, $\theta_0\in E_{q_0}(\theta^{*},\varepsilon)$ and $C_0=\sum_{i=1}^{\infty}i^{2q_0}(\theta_i^{*})^{2}$, then using $(a-b)^{2}\le 2(a^{2}+b^{2})$, we have $\sum_{i=1}^{\infty}i^{2q_0}\theta_{i0}^{2}\le D_0$, where $D_0=2C_0+2$. Proceed as in the proof of Lemma 3.1 with $k=\lfloor(48D_0)^{-1}n^{1/(q_0+q_m+1)}\rfloor$. If we choose $\varepsilon=4^{-1}(48D_0)^{-2(2q_0+1)}$ and $N_3$ the smallest integer $n$ such that
$$\sum_{i\ge(48D_0)^{-1}n^{1/(2q_0+1)}}i^{2q_0}(\theta_i^{*})^{2}<4^{-1}(48D_0)^{-2(2q_0+1)},$$
then it easily follows that for $n\ge N_3$,
$$\sum_{i\ge(48D_0)^{-1}n^{1/(2q_0+1)}}i^{2q_0}\theta_{i0}^{2}<(48D_0)^{-2(2q_0+1)}.$$
The proof of (8.2) now follows as before and, hence, the theorem is proved.
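The elementary inequality $(a-b)^{2}\le 2(a^{2}+b^{2})$ behind the bound $\sum_i i^{2q_0}\theta_{i0}^{2}\le 2C_0+2\varepsilon\le D_0$ can be exercised numerically. In this sketch (ours, with an arbitrary $\theta^{*}$ and perturbation), a point of $E_{q_0}(\theta^{*},\varepsilon)$ is generated and the bound verified:

```python
import math
import random

random.seed(2)
q0, eps, I = 1.0, 0.5, 200
theta_star = [i ** -(q0 + 1) for i in range(1, I + 1)]  # a point of Theta_{q0}
# Perturb each coordinate so that sum_i i^{2q0} (theta_i0 - theta*_i)^2 < eps
theta0 = [t + random.uniform(-1, 1) * math.sqrt(eps / (2 * I)) * i ** -q0
          for i, t in enumerate(theta_star, start=1)]

C0 = sum(i ** (2 * q0) * t ** 2 for i, t in enumerate(theta_star, start=1))
dist = sum(i ** (2 * q0) * (a - b) ** 2
           for i, (a, b) in enumerate(zip(theta0, theta_star), start=1))
lhs = sum(i ** (2 * q0) * t ** 2 for i, t in enumerate(theta0, start=1))
print(dist < eps, lhs <= 2 * C0 + 2 * dist <= 2 * C0 + 2)  # prints: True True
```

Both checks hold deterministically here: each coordinate perturbation contributes at most $\varepsilon/(2I)$ to the weighted distance, and the second chain is exactly the $(a-b)^2\le 2(a^2+b^2)$ argument with $\varepsilon<1$.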

REMARK 8.1. The uniformity holds for $\theta_0$ belonging to a compact set in the topology generated by the norm $(\sum_{i=1}^{\infty}i^{2q_0}\theta_i^{2})^{1/2}$ on $\Theta_{q_0}$.

REMARK 8.2. In a similar manner, one can formulate the uniform version of posterior convergence for the case of a continuous spectrum as well.


REFERENCES

BIRMAN, M. S. and SOLOMJAK, M. Z. (1967). Piecewise-polynomial approximation of functions of the class $W_p^{\alpha}$. Soviet Math. Sb. 73 295–317.

BROWN, L. D. and LOW, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist. 24 2384–2398.

BROWN, L. D., LOW, M. G. and ZHAO, L. H. (1997). Superefficiency in nonparametric function estimation. Ann. Statist. 25 2607–2625.

COX, D. D. (1993). An analysis of Bayesian inference for nonparametric regression. Ann. Statist. 21 903–923.

DIACONIS, P. and FREEDMAN, D. (1997). On the Bernstein–von Mises theorem with infinite dimensional parameters. Technical Report 492, Dept. Statistics, Univ. California.

EFROMOVICH, S. Y. and PINSKER, M. S. (1984). Learning algorithm for nonparametric filtering. Autom. Remote Control 45 1434–1440.

FREEDMAN, D. (1999). On the Bernstein–von Mises theorem with infinite dimensional parameters. Ann. Statist. 27 1119–1140.

GHOSAL, S., GHOSH, J. K. and VAN DER VAART, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.

GHOSAL, S., LEMBER, J. and VAN DER VAART, A. W. (2002). On Bayesian adaptation. In Proceedings of the 8th Vilnius Conference on Probability and Statistics.
