2.2 The Goldenshluger–Lepski Method
2.2.3 Multiple Parameters
We now discuss the Goldenshluger–Lepski method itself. The method is introduced by Goldenshluger and Lepski (2008). Let D be an open interval of Rd containing D0 = [−1/2,1/2]d. The authors consider the stochastic process Y onD defined by
dY(t) =F(t)dt+ε dW(t),
where F : D →R, W is a standard Weiner process on Rd and ε ∈(0,1) determines the variability of Y. It is assumed that F is continuous and bounded. The aim is to estimate F(x0) for some fixed x0 ∈ D0. The authors obtain expectation bounds
on therth power of the error of an adaptive estimator with respect to the Euclidean norm forr >0.
Goldenshluger and Lepski (2008) consider a range of non-adaptive kernel estimators indexed by the compact set Θ ⊆ Rm equipped with the Euclidean norm |·|
2. Let
KΘ be the set of kernels Kµ : Rd×Rd → R for µ ∈ Θ. The authors assume that there is some open interval D1 of Rd with D0 ⊆ D1 ⊆ D such that, for all µ ∈ Θ,
supp(Kµ(·, y))⊆D1 for all y∈D0 and
Z
D
Kµ(t, y)dt= 1
for all y ∈ D1. This ensures that Kµ satisfies the usual definition of a kernel, along with some regularity conditions.
The authors also assume some boundedness properties of the collection of kernelsKΘ.
CHAPTER 2. LITERATURE REVIEW 28 p∈[1,∞] and kfkp,∞ = sup x∈D0 Z Rd |f(t, x)|p dt 1/p and kfk∞,∞ = sup x∈D0 sup t∈Rd |f(t, x)| for f : Rd×D
0 → R and p ∈ [1,∞). The assumption is that supµ∈ΘkKµk1,∞ < ∞
and supµ∈ΘkKµk2,∞ < ∞. Continuity of the kernels is also required. It is assumed
that there existγ ∈(0,1] andL >0 such that
sup µ,µ0∈Θ kK˜µ−K˜µ0k2,∞ |µ−µ|γ2 ≤L and µ,µsup0∈Θ sup x∈Rd |1− kKµ(·, x)k2/kKµ0(·, x)k2| |µ−µ|γ2 ≤L, where ˜Kµ(·, x) = Kµ(·, x)/kKµ(·, x)k2 for µ∈Θ.
Goldenshluger and Lepski (2008) define the set KΘ×Θ of auxiliary kernels Kµ,ν : Rd×Rd→R by
Kµ,ν(t, x) =
Z
D1
Kµ(t, y)Kν(y, x)dy fort, x ∈Rd. The kernel K
µ,ν is in some sense smoother than bothKµ and Kν. The authors also demand that Kµ,ν(t, x) = Kν,µ(t, x) for all t, x ∈ Rd and all µ, ν ∈ Θ. This occurs if, for example,Kµ(t, x) =Kµ(t−x,0) for all t, x∈Rd and all µ∈Θ. Having defined all of the kernels, the authors define the non-adaptive estimators. Let
ˆ Fµ(x) = Z D Kµ(t, x)dY(t) and ˆFµ,ν(x) = Z D Kµ,ν(t, x)dY(t)
forx∈D0 and µ, ν ∈Θ. We first consider the bias of these estimators. Let
Bµ(x) = Z D Kµ(t, x)F(t)dt−F(x) and Bµ,ν(x) = Z D Kµ,ν(t, x)F(t)dt−F(x)
CHAPTER 2. LITERATURE REVIEW 29
forx∈D0 and µ, ν ∈Θ. We also require
˜ Bµ(x) = sup ν∈Θ |Bµ,ν(x)−Bν(x)| ∨ |Bµ(x)|
for x ∈ D0 and µ ∈ Θ. In order to define the adaptive estimator, the variability of
the non-adaptive estimators must be taken into account. Let
ξµ(x) = Z D Kµ(t, x)dW(t) and ξµ,ν(x) = Z D Kµ,ν(t, x)dW(t)
for x ∈ D0 and µ, ν ∈ Θ, and let σµ(x) = kKµ(·, x)k2 and σµ,ν(x) = kKµ,ν(·, x)− Kν(·, x)k2. We also require ˜ σµ(x) = sup ν∈Θ Z Rd |Kν(y, x)|σµ(y)dy ∨σµ(x) forx∈D0 and µ∈Θ.
Goldenshluger and Lepski (2008) continue by defining the majorant, which is neces- sary for defining the adaptive estimator. Recall thatx0 ∈D is the point at which we
are interested in estimating F. Let ΣΘ = {σ˜µ(x0) : µ ∈ Θ} and σmin = inf ΣΘ. In
order to define the majorant, the authors must controlg : ΣΘ→[0,∞) by
g(σ) = sup µ∈ΘE sup ν∈Θ:˜σν(x0)≤σ |ξµ,ν −ξν| ! .
The function g measures the variability of the pairwise comparisons involved in the definition of the adaptive estimator.
The authors assume that there is a known functione: ΣΘ→[0,∞), which is contin-
uous and non-decreasing, such that e(σ) ≥ g(σ) for all σ ∈ ΣΘ. It is also assumed
thate(2σ)/e(σ)∈[ce, Ce] for all σ∈ΣΘ and some Ce ≥ce >1. This is similar to the functione being slowly varying. In general, findinge is a very difficult problem. The
CHAPTER 2. LITERATURE REVIEW 30
majorant can then be defined as Q: ΣΘ →[0,∞) by
Q(σ) =κ0e(σ) +σ(1 +κ1log(σ/σmin))1/2,
where κ0 = 2Ce and κ1 = 128r(1∨(log(Ce)/log(2))). This is an inflated version of e(σ), with the increase based on the variability σ ∈ΣΘ.
Goldenshluger and Lepski (2008) then define the adaptive estimator. Let
ˆ Rµ= sup ν∈Θ:˜σν(x0)≥˜σµ(x0) |Fˆµ,ν −Fˆν| −εQ(˜σν(x0))/2 .
Forδ=εQ(σmin)/4, let ˆµ be a random element of Θ such that
ˆ Rµˆ+εQ(˜σµˆ(x0))≤ inf µ∈Θ ˆ Rµ+εQ(˜σµ(x0)) +δ.
The adaptive estimator is given by ˆFµˆ. The authors provide a bound on the error of
ˆ
Fµ(xˆ 0) under a final assumption.
Let ΘF be the set of µ ∈ Θ such that for all σ ∈ ΣΘ for which σ ≥ σ˜µ(x0), there
exists θ ∈ Θ such that ˜σθ(x0) = σ and ˜Bθ(x0) ≤ εQ(˜σθ(x0))/4. It is assumed
that ΘF 6= ∅, which gives a condition on F with respect to the kernels. Define µ∗ = arg minµ∈ΘF σ˜µ(x0). Theorem 1 of Goldenshluger and Lepski (2008) shows that
(E(|Fˆµˆ(x0)−F(x0)|r))1/r ≤CεQ(˜σµ∗(x0))
for ε ∈ (0,1) sufficiently small and some C > 0. Therefore, the error of ˆFµˆ(x0) is
bounded by an inflated version of the variability of the pairwise comparisons with respect toµ∗, whereµ∗ in some sense produces an estimator with small variability.
CHAPTER 2. LITERATURE REVIEW 31
2.3
Optimal Transport
We now discuss a sample of the relevant literature from optimal transport. We start with the first modern treatment.
2.3.1
Early Research
The optimal transport problem in its modern form is introduced by Kantorovitch (1958). The author allows transport between any finite measures with the same total mass. However, here we assume that the measures are probability measures. The author demands that X =Y is compact and the cost function c is continuous. The quantity
Wc(P, Q) = inf γ∈Π(P,Q)
Z
c dγ
for P, Q ∈ P(X) is defined, and considered as a distance on P(X). Kantorovitch (1958) states that Wc(P, Q) is attained by some γ ∈ Π(P, Q) because Π(P, Q) is compact.
The author then considers an early form of the dual problem. Define U :X → R to be a potential for γ ∈ Π(P, Q) if for all x, y ∈ X we have |U(y)−U(x)| ≤ c(x, y), and furthermore U(y)−U(x) = c(x, y) if γ(A×B) > 0 for all open sets A, B such that x∈A and y ∈B. The theorem of Kantorovitch (1958) shows that γ ∈Π(P, Q) attainsWc(P, Q) if and only if it has a potential.