Chapter 3 Linear Hypothesis Testing in Dense High-Dimensional Linear

3.3.4 Theoretical properties

In deriving the theoretical properties of our test, we impose the following assumption.

Assumption 5. Let (i) $x_i$ and $\varepsilon_i$ have Gaussian distributions, $N(0, \Sigma_X)$ and $N(0, \sigma_\varepsilon^2)$, respectively. Moreover, assume (ii) that there exist constants $c_1, c_2 > 0$ such that $\sigma_\varepsilon$ and the eigenvalues of $\Sigma_X$ lie in $[c_1, c_2]$. Lastly, let (iii) there exist constants $c_3, c_4 \in (0, 1)$ such that $\sigma_u^2/\sigma_z^2 \ge c_3$ and $\sigma_\varepsilon^2/\sigma_y^2 \ge c_4$.

Assumption 5(i) is imposed only to simplify the proof. In the high-dimensional literature, Gaussian design is a very common assumption (e.g., Javanmard and Montanari (2014b) and Cai and Guo (2015)). The same results can be derived for sub-Gaussian designs and errors, at the expense of more complicated proofs. Assumption 5(ii) is standard in the high-dimensional literature (see Bickel, Ritov, and Tsybakov 2009; Ning and Liu 2014; Geer, Bühlmann, Ritov, and Dezeure 2014 for more details).

Assumption 5(iii) imposes nondegeneracy of signal-to-noise ratios for models (3.1.1) and (3.3.3). Since $\|a\|_2$ is allowed to tend to infinity, $\sigma_z^2 = a^\top \Sigma_X a/(a^\top a)^2$ can tend to zero and thus it is too restrictive to assume that $\sigma_u$ is bounded away from zero. Hence, Assumption 5(iii) is a relaxation, as it only rules out the uninteresting case of asymptotic noiselessness.
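To see why $\sigma_z^2$ can vanish, note that when the eigenvalues of $\Sigma_X$ lie in $[c_1, c_2]$, the ratio $a^\top \Sigma_X a/(a^\top a)^2$ lies between $c_1/\|a\|_2^2$ and $c_2/\|a\|_2^2$. The Python sketch below illustrates this numerically. The dense loading vector of ones, the choice $\Sigma_X = I_p$ and the helper names are assumptions made purely for illustration; none of them is taken from the text.

```python
def sigma_z_squared(a, sigma_x):
    """Return a' Sigma_X a / (a' a)^2 for a vector a and a square matrix Sigma_X."""
    p = len(a)
    # Quadratic form a' Sigma_X a, written out with plain loops.
    quad = sum(a[i] * sigma_x[i][j] * a[j] for i in range(p) for j in range(p))
    # Squared Euclidean norm a' a.
    norm_sq = sum(v * v for v in a)
    return quad / norm_sq ** 2


def identity(p):
    """Hypothetical design covariance: the p x p identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


# Illustrative dense loading: a = (1, ..., 1), so ||a||_2^2 = p grows with p
# and the ratio equals 1/p, which shrinks towards zero.
for p in (10, 100, 400):
    a = [1.0] * p
    print(p, sigma_z_squared(a, identity(p)))
```

In this illustrative setting the printed value decreases like $1/p$, which is why a lower bound on $\sigma_u$ alone would be too restrictive, whereas a lower bound on the ratio $\sigma_u^2/\sigma_z^2$ is not.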

Remark 3.3.2. The sparsity condition is imposed on neither $a$ nor $\beta^*$. Theorem 3.3.1 below says that we can conduct valid inference of a non-sparse linear combination of a non-sparse high-dimensional parameter without knowing $\Sigma_X$. To the best of our knowledge, this is the first result that allows for such generality.

Theorem 3.3.1. Let Assumption 5 hold. Consider estimators (3.3.6) and (3.4.2) with suitable choice of tuning parameters: $\eta, \lambda \asymp \sqrt{n^{-1}\log p}$, $\rho_0^{-1} = O(1)$ and $\rho_0 \le [1 + c_2 c_1^{-1}(c_3^{-1} - 1)]^{-1/2}$. Suppose that $\|\gamma^*\|_0 = o(\sqrt{n}/\log p)$. Then, under $H_0$ in (3.1.2), optimization problems (3.3.6) and (3.4.2) are feasible with probability approaching one and

$$\lim_{n,p\to\infty} P\left(|S_n| > \Phi^{-1}(1 - \alpha/2)\right) = \alpha \quad \forall \alpha \in (0, 1),$$

where $S_n$ is defined in Equation (3.3.9).
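The decision rule implied by the theorem can be sketched in a few lines of Python. The sketch takes the value of the statistic $S_n$ as given (its construction in Equation (3.3.9) is not reproduced here), and the function names are hypothetical. It uses only the standard-library `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided standard normal critical value Phi^{-1}(1 - alpha/2)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_null(s_n, alpha=0.05):
    """Reject H0 when |S_n| exceeds the critical value.

    s_n is assumed to be an already computed value of the test statistic.
    """
    return abs(s_n) > critical_value(alpha)


def limiting_size(alpha):
    """P(|Z| > Phi^{-1}(1 - alpha/2)) for Z ~ N(0, 1); equals alpha."""
    z = critical_value(alpha)
    return 2.0 * (1.0 - NormalDist().cdf(z))
```

The helper `limiting_size` only checks the elementary fact that a standard normal variable exceeds the critical value in absolute value with probability $\alpha$; the content of the theorem is that $S_n$ behaves this way asymptotically under $H_0$.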

Theorem 3.3.1 establishes that the proposed test is asymptotically exact regardless of how sparse the model parameter or the loading vector is. In that sense, the result is unique in the existing literature, as it covers the cases of $\beta$ sparse and $a$ sparse (SS), $\beta$ sparse and $a$ dense (SD), $\beta$ dense and $a$ sparse (DS) and, especially, $\beta$ dense and $a$ dense (DD). The (SS) case appears in a number of existing works (see Belloni, Chernozhukov, and Hansen (2014), Geer, Bühlmann, Ritov, and Dezeure (2014), Javanmard and Montanari (2014b), and Ning and Liu (2014)); the (SD) case appears in Cai and Guo (2015). Whenever the (SS) case holds, our result matches the above-mentioned work; see Theorem 3.3.2. In the special setting of (SD), our result generalizes that of Cai and Guo (2015), as Theorem 3.3.1 does not impose any restriction on the size of the loading vector $a$. The last two cases, (DS) and (DD), are extremely challenging: inference based on estimation (much like the Wald, Rao or likelihood principles) fails due to the inherent limit of detection; Cai and Guo (2016) provide details on the impossibility of estimation in such settings. Despite these challenges, our method provides asymptotically valid inference because it is based on a specifically designed moment condition (and not on parameter estimation alone).

The result in Theorem 3.3.1 is based on the assumption that $\hat{\pi}^{*}$ is a possibly inconsistent estimator of the parameter vector $\pi^{*}$, i.e. the full model is dense with all non-zero entries. In the following, we will show that if the model is a sparse model, the proposed test (3.3.9) maintains strong power properties. To facilitate the mathematical derivations, we consider the local alternatives of the form

$$H_{1,n}:\ a^\top \beta^* = g_0 + n^{-1/2}(a^\top \Omega_X a)^{1/2}\sigma_\varepsilon d, \qquad (3.3.10)$$

where $d \in \mathbb{R}$ is a fixed constant. The following result shows that the proposed test achieves certain optimality in detecting alternatives $H_{1,n}$.

Theorem 3.3.2. Consider $z_i$ and $w_i$ defined in (3.3.1). Let Assumption 5 hold and consider the choice of tuning parameters as in Theorem 3.3.1. Suppose that $\|\gamma^*\|_0 \vee \|\beta^*\|_0 \vee \|a\|_0 = o(\sqrt{n}/\log p)$. Then, under $H_{1,n}$ in (3.3.10), optimization problems (3.3.6) and (3.4.2) are feasible with probability approaching one and

$$\lim_{n,p\to\infty} P\left(|S_n| > \Phi^{-1}(1 - \alpha/2)\right) = \Psi_\alpha(d) \quad \forall \alpha \in (0, 1),$$

where $\Psi_\alpha(d) := \Phi\left(-\Phi^{-1}(1 - \alpha/2) + d\right) + \Phi\left(-\Phi^{-1}(1 - \alpha/2) - d\right)$.
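The limiting power function $\Psi_\alpha(d)$ is straightforward to evaluate numerically. The following Python sketch implements the displayed formula with the standard-library `statistics.NormalDist`; the function name `local_power` is an assumption made for illustration.

```python
from statistics import NormalDist


def local_power(d, alpha=0.05):
    """Evaluate Psi_alpha(d) = Phi(-z + d) + Phi(-z - d), z = Phi^{-1}(1 - alpha/2)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(-z + d) + std_normal.cdf(-z - d)


# Tabulate the limiting power over a few illustrative values of d.
for d in (0.0, 1.0, 2.0, 3.0):
    print(d, round(local_power(d), 4))
```

Two properties follow directly from the formula and can be checked with this function: $\Psi_\alpha(0) = \alpha$, so the limit under the null is recovered at $d = 0$, and $\Psi_\alpha(d) = \Psi_\alpha(-d)$, so the limiting power depends only on $|d|$.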

To better understand the optimality of the result above, consider the estimator (possibly infeasible) discussed at the end of Section 3.2: let $\bar{\beta}$ denote an estimator satisfying $\sqrt{n}(\bar{\beta} - \beta^*) \sim N(0, \Omega_X \sigma_\varepsilon^2)$. Notice that, for the low-dimensional components of $\beta^*$, $\bar{\beta}$ achieves semi-parametric efficiency; see Robinson (1988). Therefore, for sparse $a$, $a^\top \bar{\beta}$ is a semi-parametrically efficient estimator for $a^\top \beta^*$. Notice that $\sqrt{n}(a^\top \bar{\beta} - a^\top \beta^*) \sim N(0, a^\top \Omega_X a\, \sigma_\varepsilon^2)$. Based on such an efficient estimator, one might consider an “oracle” test: for a test of nominal size $\alpha$, reject the null $H_0: a^\top \beta^* = g_0$ if and only if

$$\frac{\sqrt{n}\,|a^\top \bar{\beta} - g_0|}{(a^\top \Omega_X a)^{1/2}\sigma_\varepsilon} > \Phi^{-1}(1 - \alpha/2).$$

It is easy to verify that the power of this “oracle” test of nominal size $\alpha$ against the local alternatives $H_{1,n}$ in (3.3.10) is asymptotically equal to $\Psi_\alpha(d)$. Therefore, Theorem 3.3.2 says that our test asymptotically achieves the same power as the “oracle” test under sparse $a$ and $\beta^*$, i.e. it is as efficient as the “oracle” test.
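The verification of the power of the “oracle” test can be sketched as follows. The shorthand $T_n$ for the oracle statistic, $z_\alpha$ for the critical value and $Z$ for a standard normal variable are notation introduced here for convenience and do not appear in the text.

```latex
% Notation introduced for this sketch only:
%   z_\alpha = \Phi^{-1}(1-\alpha/2), \quad Z \sim N(0,1).
\begin{align*}
T_n &:= \frac{\sqrt{n}\,(a^\top\bar{\beta} - g_0)}{(a^\top\Omega_X a)^{1/2}\sigma_\varepsilon}
      = \frac{\sqrt{n}\,(a^\top\bar{\beta} - a^\top\beta^*)}{(a^\top\Omega_X a)^{1/2}\sigma_\varepsilon} + d
      \;\sim\; N(d, 1) \quad \text{under } H_{1,n}, \\
P\left(|T_n| > z_\alpha\right)
    &= P\left(Z > z_\alpha - d\right) + P\left(Z < -z_\alpha - d\right) \\
    &= \Phi\left(-z_\alpha + d\right) + \Phi\left(-z_\alpha - d\right)
     = \Psi_\alpha(d).
\end{align*}
```

The first line uses the definition of $H_{1,n}$, under which $\sqrt{n}(a^\top\beta^* - g_0) = (a^\top\Omega_X a)^{1/2}\sigma_\varepsilon d$, together with the stated distribution of $\sqrt{n}(a^\top\bar{\beta} - a^\top\beta^*)$.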

models, the rate of Theorem 4 can also be shown to be optimal. As existing results apply only to the case of $a = e_j$ for a coordinate vector $e_j$, $1 \le j \le p$, we discuss the relation of our work to them in this specific setting. We note that the tests based on VBRD and BCH are asymptotically equivalent to this “oracle” test and hence have the same asymptotic local power; the power of the Wald or Score inferential methods (see Theorem 2.2 in Geer, Bühlmann, Ritov, and Dezeure (2014), Theorem 1 in Belloni, Chernozhukov, and Hansen (2014) or Theorem 4.7 in Ning and Liu (2014)) is asymptotically equal to $\Psi_\alpha(d)$, and that of Javanmard and Montanari (2014b) (see Theorem 2.3 therein) converges to $\Psi_\alpha(d)$. This, in turn, implies that the proposed method is semi-parametrically efficient and asymptotically minimax. For vectors $a$ that have more than one non-zero coordinate, we can only compare our work with that of Cai and Guo (2015), where we observe that the results of Theorems 1 and 3 therein match those of Theorem 4, covering the case of extremely sparse $\beta^*$ and potentially dense vectors $a$. However, observe that the confidence intervals developed therein require specific knowledge of the sparsity of the parameter $\beta^*$, $\|\beta^*\|_0$, a quantity rarely known in practice. Unlike their method, our method can be directly implemented without knowledge of the sparsity of $\beta^*$ and yet achieves the same optimality guarantees.

In document Essays on estimation and inference in high-dimensional models with applications to finance and economics (Page 116-119)