Tuning Parameter Selection - Partially Linear Mode Regression

Chapter 2 Partially Linear Mode Regression

2.4 Tuning Parameter Selection

To implement the two estimation methods above, the number of interior knots $k$ and the bandwidth $h$ need to be selected appropriately. In this section, we propose two methods for selecting these tuning parameters. Following the strategy in Zhao et al. (2013) and Zhao et al. (2014), we first consider a two-dimensional $M$-fold cross validation method, which can be used to select the tuning parameters for both the one-stage and the two-stage methods. Additionally, to reduce the computing time of our one-stage estimation method, we develop a two-layer tuning parameter selection method, which is designed for the one-stage method only.

Two-dimensional cross validation

As in Zhao et al. (2014), He et al. (2002) and Wang et al. (2009), we use cubic spline basis functions, that is, splines of order $\ell = 4$, to approximate $g(T)$. Lower-order spline basis functions can be used if $g(T)$ is less smooth (He et al., 2002). Once the order of the spline basis is fixed, the B-spline estimate is typically sensitive to the number of interior knots $k$. In addition, the performance of any kernel-based method can be noticeably affected by the choice of bandwidth. To address the choice of $k$ and $h$, we propose a two-dimensional cross validation method that entails maximizing the criterion

$$
\mathrm{CV}(k, h) = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{n_m} \sum_{i \in I_m} K_h\left\{ Y_i - \hat{g}^{(-m)}(T_i) - \hat{\beta}_1^{(-m)} X_i \right\}, \tag{2.10}
$$

where $M$ is the number of partitions of the data set, $I_m$ is the index set of the observations in the $m$th subset, and $n_m$ is the number of observations in $I_m$, for $m = 1, \ldots, M$; $\hat{\beta}_1^{(-m)}$ and $\hat{g}^{(-m)}(\cdot)$ are the estimates obtained by applying the estimation method under consideration to the observed data after deleting the $m$th subset. More specifically, when selecting $(h, k)$ for the one-stage estimation method, $\hat{\beta}_1^{(-m)}$ and $\hat{g}^{(-m)}(\cdot)$ in (2.10) are $\hat{\beta}_{RO,1}$ and $\hat{g}_{RO}(\cdot)$ computed from the data $\{(Y_j, T_j, X_j),\ j \in I \setminus I_m\}$, where $I = \{1, \ldots, n\}$ and "$\setminus$" denotes set subtraction, for $m = 1, \ldots, M$. Similarly, when selecting $(h, k)$ for the two-stage estimation method, $\hat{\beta}_1^{(-m)}$ and $\hat{g}^{(-m)}(\cdot)$ in (2.10) are $\hat{\beta}_{RT,1}$ and $\hat{g}_{RT}(\cdot)$ computed from the same data $\{(Y_j, T_j, X_j),\ j \in I \setminus I_m\}$, for $m = 1, \ldots, M$. All these estimators depend on $(h, k)$; we suppress this dependence on the right-hand side of (2.10) for notational simplicity. Following this cross validation procedure, referred to as the two-dimensional CV in the sequel, the selected number of knots and bandwidth are given by

$$
(\hat{k}, \hat{h}) = \operatorname*{arg\,max}_{k,\, h} \mathrm{CV}(k, h).
$$
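The following Python sketch illustrates the two-dimensional CV of (2.10) together with the grid search for $(\hat{k}, \hat{h})$. It is a minimal sketch under stated assumptions, not the estimator of this chapter: the kernel $K$ is taken to be the standard normal density, and `toy_fit` is a hypothetical stand-in for the one-stage estimator that runs a modal EM-type iteration (iteratively kernel-weighted least squares) on a cubic truncated power basis with equally spaced interior knots on $[0, 1]$. All function names (`kernel_h`, `spline_basis`, `toy_fit`, `cv_criterion`, `two_dim_cv`) are illustrative; any routine `fit(Y, T, X, k, h)` returning `(beta1, g_hat)` can be plugged in instead.

```python
import numpy as np


def kernel_h(r, h):
    """K_h(r) = K(r / h) / h, assuming K is the standard normal density."""
    return np.exp(-0.5 * (r / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def spline_basis(t, k, degree=3):
    """Cubic truncated power basis (with intercept) on k equally spaced
    interior knots in (0, 1); it spans the same space as cubic B-splines."""
    knots = np.linspace(0.0, 1.0, k + 2)[1:-1]
    cols = [t ** p for p in range(degree + 1)]
    cols += [np.clip(t - kn, 0.0, None) ** degree for kn in knots]
    return np.column_stack(cols)


def toy_fit(Y, T, X, k, h, n_iter=50):
    """Hypothetical stand-in for the one-stage estimator: iteratively
    kernel-weighted least squares on [X, B(T)]; the intercept sits in g."""
    D = np.column_stack([X, spline_basis(T, k)])
    theta = np.linalg.lstsq(D, Y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        z2 = ((Y - D @ theta) / h) ** 2
        w = np.exp(-0.5 * (z2 - z2.min()))  # weights proportional to K_h(residual)
        sw = np.sqrt(w)
        theta = np.linalg.lstsq(D * sw[:, None], Y * sw, rcond=None)[0]
    beta1, gamma = theta[0], theta[1:]
    return beta1, lambda t: spline_basis(t, k) @ gamma


def cv_criterion(Y, T, X, k, h, fit, M=5, seed=0):
    """CV(k, h) in (2.10): mean over folds of the held-out average K_h(residual)."""
    n = len(Y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), M)
    total = 0.0
    for I_m in folds:
        train = np.setdiff1d(np.arange(n), I_m)  # index set I \ I_m
        beta1, g_hat = fit(Y[train], T[train], X[train], k, h)
        resid = Y[I_m] - g_hat(T[I_m]) - beta1 * X[I_m]
        total += kernel_h(resid, h).mean()  # (1 / n_m) * sum over I_m
    return total / M


def two_dim_cv(Y, T, X, k_grid, h_grid, fit, M=5, seed=0):
    """Evaluate CV on the full C x D grid and return its maximizer."""
    scores = {(k, h): cv_criterion(Y, T, X, k, h, fit, M, seed)
              for k in k_grid for h in h_grid}
    return max(scores, key=scores.get), scores
```

Reusing the same seed for every pair keeps the folds identical across the grid, so the CV values are directly comparable; the grid search still refits the model $M$ times for each of the $C \times D$ pairs.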

We follow the strategy in He et al. (2002) to determine the candidate values for $k$: given the chosen order $\ell$ of the B-spline basis, these values lie in $[\max(0.5n^{1/5} - \ell,\ 8 + 2n^{1/5} - \ell)]$. The two-dimensional $M$-fold cross validation method can be computationally prohibitive, because the model must be refitted on each of the $M$ training sets for every candidate pair $(k, h)$. To ease the computational burden of tuning parameter selection for the one-stage estimation method, we next propose another procedure tailored to this estimation method.

Two-layer tuning parameter selection

We observe in extensive simulation studies that the quality of an estimator for $\beta_1$ is often noticeably influenced by how well $g(\cdot)$ is estimated, whereas the reverse is not necessarily true. This motivates our second strategy for selecting $(h, k)$, referred to as the two-layer tuning parameter selection method and outlined in the following algorithm.

(L1) For each candidate value $k_c$ of $k$, where $c \in \{1, \ldots, C\}$, find the value of $h$ among its candidate values $\{h_1, \ldots, h_D\}$ that minimizes the integrated squared error (ISE) of the estimate of $g(\cdot)$,

$$
\mathrm{ISE}(h, k_c) = \int_0^1 \left\{ \hat{g}_{RO}(t) - \tilde{g}(t) \right\}^2 dt, \tag{2.11}
$$

where $\tilde{g}(t)$ is a preliminary estimate of $g(\cdot)$, and $\hat{g}_{RO}(t)$ is the one-stage estimate obtained from the entire observed data set; its dependence on $h$, with $k$ fixed at $k_c$, is suppressed on the right-hand side. Denote $h^{(c)} = \arg\min_{1 \le d \le D} \mathrm{ISE}(h_d, k_c)$, for $c = 1, \ldots, C$.

(L2) Compute the $M$-fold CV criterion in (2.10) evaluated at $(h^{(c)}, k_c)$, for $c = 1, \ldots, C$. The tuning parameters selected for the one-stage estimation method are $(h^{(c^*)}, k_{c^*})$, where $c^* = \arg\max_{1 \le c \le C} \mathrm{CV}(k_c, h^{(c)})$.
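Steps (L1) and (L2) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: `fit_g(k, h)` (returning the full-data one-stage estimate $\hat{g}_{RO}$ as a callable) and `cv(k, h)` (returning the criterion (2.10)) are hypothetical hooks for the user's own code, the ISE in (2.11) is approximated by the trapezoidal rule on a uniform grid over $[0, 1]$, and, consistent with (2.10), a larger CV value is treated as better.

```python
import numpy as np


def ise(g_hat, g_pilot, n_grid=401):
    """ISE in (2.11), approximated by the trapezoidal rule on [0, 1]."""
    t = np.linspace(0.0, 1.0, n_grid)
    f = (g_hat(t) - g_pilot(t)) ** 2
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))


def two_layer_select(k_grid, h_grid, fit_g, g_pilot, cv):
    """Two-layer selection of (k, h).

    fit_g(k, h): hypothetical hook, returns t -> g_hat_RO(t) fitted on all data.
    cv(k, h):    hypothetical hook, returns the M-fold CV(k, h) of (2.10).
    """
    best_h = {}
    # Layer (L1): for each k_c, pick h^(c) minimizing the ISE against the pilot.
    for k in k_grid:
        errors = [ise(fit_g(k, h), g_pilot) for h in h_grid]
        best_h[k] = h_grid[int(np.argmin(errors))]
    # Layer (L2): run CV only at the C pairs (k_c, h^(c)) and keep the best one.
    cv_scores = {k: cv(k, best_h[k]) for k in k_grid}
    k_star = max(cv_scores, key=cv_scores.get)
    return k_star, best_h[k_star]
```

Layer (L1) calls `fit_g` exactly $CD$ times and layer (L2) calls `cv` exactly $C$ times; because each `cv` call itself refits the model on $M$ training sets, the procedure performs $C(D + M)$ fits in total.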

This two-layer procedure requires $C(D + M)$ rounds of estimation of $\beta_1$ and $g(\cdot)$, namely $CD$ fits to the full data in (L1) and $CM$ fits in the $M$-fold CV of (L2), in contrast to the $CDM$ rounds of such estimation that the two-dimensional CV procedure involves. Hence, besides being motivated by the empirical evidence that estimating the nonparametric part of the regression model has a greater impact on estimating the parametric part than the other way around, the two-layer tuning parameter selection method yields substantial savings in computing time. The price one pays for such savings is the need for a pilot estimator of $g(\cdot)$, namely $\tilde{g}(t)$ in (2.11), that estimates the truth reasonably well. One way to obtain $\tilde{g}(t)$ is to posit a flexible parametric model for the mode residual, $\epsilon = Y - \beta_1 X - g(T)$, approximate $g(T)$ by a polynomial function of some order, and then estimate the unknowns by maximum likelihood. Another option is to use the estimate from the two-stage estimation method, $\hat{g}_{RT}(t)$, as the pilot estimate.
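The fit counts of the two procedures can be checked with a few lines of Python; the grid sizes in the example ($C = 10$ knot numbers, $D = 20$ bandwidths, $M = 5$ folds) are illustrative only, and the function names are hypothetical.

```python
def n_fits_two_layer(C, D, M):
    """(L1): D full-data fits for each of the C knot numbers;
    (L2): an M-fold CV at each of the C selected pairs."""
    return C * D + C * M


def n_fits_two_dim_cv(C, D, M):
    """Two-dimensional CV refits every (k_c, h_d) pair on each of the M folds."""
    return C * D * M


print(n_fits_two_layer(10, 20, 5))   # 250
print(n_fits_two_dim_cv(10, 20, 5))  # 1000
```

In general, the two-layer procedure needs a fraction $(D + M)/(DM)$ of the fits required by the two-dimensional CV, which is $1/4$ for this grid.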

It is worth pointing out that, for the two-stage estimation method, the estimation of the nonparametric part $\beta_0 + g(t)$ is mostly accomplished in the first stage, i.e., step (T1) in Section 2.3, where $\hat{\beta}_0 + \hat{g}(t)$ is obtained and does not depend on $h$. Hence, the two-layer tuning parameter selection procedure is not applicable to the two-stage estimation method: step (L1) chooses $h$ through its effect on the estimate of the nonparametric part, and in the two-stage method that estimate does not depend on $h$.
