
4.2 Edge Selection in High Dimensional Gaussian

4.2.3 Consistent Validation for Edge Selection

We first illustrate the CCV procedure of Feng and Yu (2013) for generalized linear models (GLMs). Suppose we have $n$ independent and identically distributed observations $(x_i, y_i)$, $i = 1, \dots, n$, and that the conditional distribution of $y$ given $x$ belongs to an exponential family with a canonical link. Its density function is

$$f(y; x, \beta) = c(y, \phi)\exp\{[y\theta - b(\theta)]/a(\phi)\}, \quad \text{where } \theta = x\beta$$

and $\phi \in (0, \infty)$ is the dispersion parameter. Here, $\beta$ is the parameter of interest and $\beta_0$ is the true parameter, with $\|\beta_0\|_0 = d_0 < n$, where $\|\beta\|_0 = |\{j : \beta_j \neq 0\}|$. Up to an affine transformation, the log-likelihood can be written as

$$l(y, \beta) = \frac{1}{n}\sum_{i=1}^{n}[y_i\theta_i - b(\theta_i)].$$
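To make the scaled log-likelihood concrete, the sketch below evaluates $l(y, \beta)$ for a canonical-link GLM. It assumes that the affine transformation simply drops $c(y, \phi)$ and $a(\phi)$, which do not involve $\beta$. The function name `glm_loglik` and the two cumulant functions are illustrative choices, not code from Feng and Yu (2013).

```python
import numpy as np


def glm_loglik(y, X, beta, b):
    """Scaled GLM log-likelihood l(y, beta) = (1/n) * sum_i [y_i * theta_i - b(theta_i)].

    Assumes a canonical link, so theta_i = x_i beta, and drops c(y, phi) and
    a(phi), which do not depend on beta.
    """
    theta = X @ beta
    return np.mean(y * theta - b(theta))


def b_gaussian(theta):
    """Cumulant function of the Gaussian family (identity link)."""
    return theta ** 2 / 2


def b_bernoulli(theta):
    """Cumulant function of the Bernoulli family (logit link), log(1 + e^theta)."""
    return np.logaddexp(0.0, theta)  # numerically stable log(1 + exp(theta))


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = (rng.random(50) < 0.5).astype(float)
print(glm_loglik(y, X, np.zeros(3), b_bernoulli))  # approximately -log(2) = -0.693 at beta = 0
```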

CCV considers sparse estimation by minimizing a penalized negative log-likelihood with tuning parameter $\lambda$:

$$\hat\beta = \arg\min_{\beta}\Big[-l(y, \beta) + \sum_{j=1}^{p} p_\lambda(|\beta_j|)\Big],$$

where $p_\lambda(\cdot)$ is the penalty function. Feng and Yu (2013) considered both convex and folded-concave penalties for CCV. The multi-stage CCV procedure is described in Algorithm 1.
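The penalized criterion can be sketched as follows. The source only says that convex and folded-concave penalties were considered, so the specific choices here are illustrative: the lasso as a convex example and SCAD (with the conventional $a = 3.7$) as a folded-concave example. All helper names are hypothetical.

```python
import numpy as np


def lasso_penalty(t, lam):
    """Convex L1 penalty p_lambda(t) = lambda * t, evaluated at t = |beta_j|."""
    return lam * t


def scad_penalty(t, lam, a=3.7):
    """Folded-concave SCAD penalty evaluated at t = |beta_j| (a = 3.7 by convention)."""
    t = np.asarray(t, dtype=float)
    return np.where(
        t <= lam,
        lam * t,                                                    # linear near zero
        np.where(t <= a * lam,
                 (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),  # quadratic blend
                 lam ** 2 * (a + 1) / 2),                           # constant for large t
    )


def b_gaussian(theta):
    """Cumulant function of the Gaussian family, used here as the example GLM."""
    return theta ** 2 / 2


def penalized_objective(beta, y, X, lam, penalty, b):
    """-l(y, beta) + sum_j p_lambda(|beta_j|) for a canonical-link GLM."""
    theta = X @ beta
    loglik = np.mean(y * theta - b(theta))
    return -loglik + np.sum(penalty(np.abs(beta), lam))
```

SCAD coincides with the lasso for $|\beta_j| \le \lambda$ and stays constant once $|\beta_j| > a\lambda$, which is what makes it folded-concave: large coefficients are not shrunk further.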

Algorithm 1. CCV Implementation (Feng and Yu 2013)

1. Compute the solution path using the entire dataset, generating a sequence of solutions $\hat\beta(\lambda)$ as a function of the penalty level $\lambda$.

2. Randomly split the whole dataset $r$ times into $\{(x_i, y_i), i \in s\}$ (the validation set) and $\{(x_i, y_i), i \in s^c\}$ (the construction set), of sizes $n_v$ and $n_c$, respectively. For each split $j = 1, \dots, r$, compute the restricted MLE path on the construction set according to the active set sequence generated in Step 1.

3. For each model from Step 1, average the validation negative log-likelihood over the $r$ splits, and choose the model whose tuning parameter gives the smallest average loss.

4. Compute the restricted MLE for the selected model.
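A minimal sketch of Steps 2 to 4 of Algorithm 1, assuming a Gaussian response with identity link, so that $b(\theta) = \theta^2/2$ and the restricted MLE is least squares on the active columns. Step 1 is not reproduced: the candidate active sets are passed in directly as a hypothetical `supports` list, standing in for the supports read off a penalized solution path. The default $n_v = n/2$ is an assumption, not a value taken from Feng and Yu (2013).

```python
import numpy as np


def restricted_mle(X, y, support):
    """Unpenalized Gaussian MLE (least squares) on the columns in `support`; other coefficients stay 0."""
    beta = np.zeros(X.shape[1])
    cols = list(support)
    if cols:
        beta[cols] = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
    return beta


def neg_loglik(X, y, beta):
    """-l(y, beta) for the Gaussian family, b(theta) = theta^2 / 2."""
    theta = X @ beta
    return -np.mean(y * theta - theta ** 2 / 2)


def ccv_select(X, y, supports, r=20, n_v=None, seed=0):
    """Average validation loss over r random splits, pick the best support, refit on all data."""
    n = X.shape[0]
    n_v = n // 2 if n_v is None else n_v          # assumed default split size
    rng = np.random.default_rng(seed)
    losses = np.zeros(len(supports))
    for _ in range(r):
        perm = rng.permutation(n)
        s, s_c = perm[:n_v], perm[n_v:]           # validation / construction indices
        for m, support in enumerate(supports):
            beta = restricted_mle(X[s_c], y[s_c], support)   # fit on construction set
            losses[m] += neg_loglik(X[s], y[s], beta) / r    # evaluate on validation set
    best = supports[int(np.argmin(losses))]
    return best, restricted_mle(X, y, best), losses
```

Every candidate is evaluated on the same $r$ splits, so the comparison is paired: differences in average loss reflect the models rather than the particular splits.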

Our proposed method, CoVES, follows the steps of CCV. We first consider the edge structures along the entire solution path. For each structure, CoVES computes the empirical negative log-likelihood via repeated random subsampling validation, and we finally select the edge structure with the smallest negative log-likelihood. Unlike CCV, CoVES aims to select important edges rather than significant variables, and its likelihood is based on the multivariate Gaussian distribution. The detailed procedure is given in Algorithm 2.

Algorithm 2. CoVES Implementation

1. Calculate the solution path of the precision matrices using the entire dataset, generating a sequence of solutions with corresponding penalty levels $\lambda$ from (4.1). Along the path, a sequence of candidate edge sets is determined from the precision matrix estimates $\hat\Theta(\lambda)$:

$$E(\lambda) = \{(x_j, x_k) : \hat\Theta(\lambda)_{jk} \neq 0\}.$$

2. Randomly split the dataset $r$ times into a validation set $s$ (of size $n_v$) and a construction set $s^c$ (of size $n_c$). For each split $j = 1, \dots, r$, compute the restricted MLE path according to the active edge sequence generated in Step 1.

3. For each active edge set from Step 1, average the negative log-likelihood over the $r$ splits, and choose the active edge set $\hat E$ with the smallest average validation error.

4. Compute the restricted MLE with the edge set $\hat E$ selected in Step 3.
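Step 1 of Algorithm 2 and the splitting in Step 2 can be sketched as follows. The penalized path from (4.1) is not computed here; `path` is assumed to be a list of precision estimates already produced by some solver. Two further assumptions: diagonal entries are left out of the edge set (they are always estimated freely), and an edge set that repeats along the path is kept only once. All names are hypothetical.

```python
import numpy as np


def edge_set(Theta, tol=0.0):
    """E(lambda): off-diagonal pairs (j, k) with j < k and |Theta_jk| > tol."""
    p = Theta.shape[0]
    return frozenset((j, k) for j in range(p) for k in range(j + 1, p)
                     if abs(Theta[j, k]) > tol)


def candidate_edge_sets(path):
    """Distinct edge sets along a path of precision estimates, in path order."""
    seen, out = set(), []
    for Theta in path:
        E = edge_set(Theta)
        if E not in seen:
            seen.add(E)
            out.append(E)
    return out


def random_splits(n, n_v, r, seed=0):
    """Yield r random (validation s, construction s^c) index pairs with |s| = n_v."""
    rng = np.random.default_rng(seed)
    for _ in range(r):
        perm = rng.permutation(n)
        yield perm[:n_v], perm[n_v:]
```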

In the second step of the CoVES algorithm, each repetition proceeds as follows for a given edge set $E$. Let $E_d = \{(1,1), \dots, (p,p)\}$ denote the diagonal entries and let $E$ be one of the estimated graphs from the full solution path. We minimize the unpenalized negative log-likelihood subject to zero constraints on the unselected edges, $E^c$:

$$\tilde\Theta_{s^c,E} = \arg\min_{\Theta \succ 0}\{-\log\det(\Theta) + \mathrm{tr}(S_{s^c}\Theta)\} \quad \text{subject to } \Theta_{ij} = 0, \ (i, j) \notin E_d \cup E,$$

where $S_{s^c} = \frac{1}{n_c}\sum_{i \in s^c}(x_i - \bar x_{s^c})^T(x_i - \bar x_{s^c})$ is the empirical covariance matrix of the construction sample and $\bar x_{s^c} = \frac{1}{n_c}\sum_{i \in s^c} x_i$ is the construction sample average. Note that all repetitions share the same edge set $E$ but may yield different precision matrix estimates $\tilde\Theta_{s^c,E}$.
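The text does not say how the constrained problem is solved. One standard option, swapped in here as an assumption, is the modified regression algorithm for a Gaussian graphical model with known structure (Hastie, Tibshirani and Friedman, *The Elements of Statistical Learning*, Algorithm 17.1). It cycles over nodes, regressing each node on its allowed neighbours through the current covariance estimate $W$, and recovers $\Theta$ from the final cycle, so non-edges come out exactly zero. `edges` is a hypothetical list of 0-indexed pairs $(j, k)$.

```python
import numpy as np


def restricted_mle(S, edges, tol=1e-10, max_iter=1000):
    """Gaussian MLE of Theta with Theta_jk = 0 for every off-diagonal pair not in `edges`."""
    p = S.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for j, k in edges:
        adj[j, k] = adj[k, j] = True
    W, Theta = S.copy(), np.zeros((p, p))      # W is the fitted covariance; diag(W) = diag(S)
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.delete(np.arange(p), j)
            nbr = adj[j, rest]                 # allowed neighbours of node j
            W11, beta = W[np.ix_(rest, rest)], np.zeros(p - 1)
            if nbr.any():
                # Solve the reduced system W11* beta* = s12* over the neighbours only.
                beta[nbr] = np.linalg.solve(W11[np.ix_(nbr, nbr)], S[rest, j][nbr])
            w12 = W11 @ beta
            W[rest, j] = W[j, rest] = w12
            Theta[j, j] = 1.0 / (S[j, j] - w12 @ beta)   # 1 / theta_22 = s_22 - w12^T beta
            Theta[rest, j] = -beta * Theta[j, j]        # theta_12 = -beta * theta_22
        if np.max(np.abs(W - W_old)) < tol:
            break
    return (Theta + Theta.T) / 2
```

At convergence, $\tilde\Theta^{-1}$ matches $S_{s^c}$ on the diagonal and on the selected edges, which is the first-order condition of the constrained likelihood problem.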

Next, the validation set is used to evaluate the empirical negative log-likelihood of the precision matrix estimator $\tilde\Theta_{s^c,E}$. The log-likelihood comes from the multivariate Gaussian density and is written as $l_s(\tilde\Theta_{s^c,E})$, where

$$l_s(\Theta) = \frac{1}{2}\{\log\det(\Theta) - \mathrm{tr}(S_s\Theta)\}.$$

Here $S_s = \frac{1}{n_v}\sum_{i \in s}(x_i - \bar x_s)^T(x_i - \bar x_s)$ is the empirical covariance matrix of the validation sample and $\bar x_s = \frac{1}{n_v}\sum_{i \in s} x_i$ is the validation sample average.
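The two validation-side quantities, the $1/n_v$-scaled covariance $S_s$ and $l_s(\Theta)$, can be computed as below; the function names are illustrative. The log-determinant is taken from a Cholesky factor, which also raises an error if $\Theta$ is not positive definite (a positive determinant alone would not guarantee that).

```python
import numpy as np


def sample_cov(X):
    """S = (1/m) * sum_i (x_i - xbar)^T (x_i - xbar) over the m rows of X."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def gaussian_loglik(Theta, S):
    """l(Theta) = (1/2) * (log det Theta - tr(S Theta))."""
    L = np.linalg.cholesky(Theta)  # raises LinAlgError if Theta is not positive definite
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (logdet - np.trace(S @ Theta))
```

For a validation index set `s`, the empirical validation loss of an estimate `Theta_tilde` is then `-gaussian_loglik(Theta_tilde, sample_cov(X[s]))`.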

The negative log-likelihood measures how well each edge set fits the validation set. The expected loss at $E$ is the expectation of the negative log-likelihood with respect to the random selection of $s$,

$$\Gamma_E = -\mathbb{E}[l_s(\tilde\Theta_{s^c,E})],$$

which is called the risk function at $\tilde\Theta_{s^c,E}$. We estimate the risk function by averaging the empirical negative log-likelihood over the $r$ random splits:

$$\hat\Gamma_E = -\frac{1}{r}\sum_{s \in \mathcal{R}} l_s(\tilde\Theta_{s^c,E}),$$

where $\mathcal{R}$ denotes the collection of the $r$ validation sets.
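The estimate $\hat\Gamma_E$ for a single edge set can be sketched as below. The restricted-MLE solver is passed in as a hypothetical `fit(S, edges)` callable so that any solver can be plugged in; the default split size $n_v = n/2$ and the seeding scheme are assumptions rather than part of the method as stated.

```python
import numpy as np


def sample_cov(X):
    """Empirical covariance with 1/m scaling over the m rows of X."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def gaussian_loglik(Theta, S):
    """l(Theta) = (1/2) * (log det Theta - tr(S Theta)) via a Cholesky factor."""
    L = np.linalg.cholesky(Theta)
    return 0.5 * (2.0 * np.sum(np.log(np.diag(L))) - np.trace(S @ Theta))


def estimate_risk(X, edges, fit, r=20, n_v=None, seed=0):
    """Gamma_hat_E = -(1/r) * sum over r random splits of l_s(Theta_tilde_{s^c, E}).

    `fit(S, edges)` is a hypothetical callable returning the restricted MLE.
    """
    n = X.shape[0]
    n_v = n // 2 if n_v is None else n_v
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(r):
        perm = rng.permutation(n)
        s, s_c = perm[:n_v], perm[n_v:]                 # validation / construction
        Theta_tilde = fit(sample_cov(X[s_c]), edges)    # fit on construction sample
        total += gaussian_loglik(Theta_tilde, sample_cov(X[s]))  # score on validation
    return -total / r
```

For the complete graph, the restricted MLE is just the inverse sample covariance, so `estimate_risk(X, None, lambda S, E: np.linalg.inv(S))` gives a quick baseline.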

We now have an estimated risk $\hat\Gamma_E$ for each candidate edge set, and we choose the set with the smallest value. This step determines the graph structure, and the resulting edge set estimate is denoted $\hat E$. In the last step, we estimate the strength of the conditional dependencies in the selected graph using the complete dataset; this estimate is denoted $\hat\Theta_{\hat E}$.
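Putting the pieces together, the following end-to-end sketch computes $\hat\Gamma_E$ for every candidate edge set on shared random splits, selects $\hat E$, and refits on the complete data. It assumes the candidate edge sets are already given (in practice they would come from the path in (4.1)), uses the modified regression algorithm (ESL Algorithm 17.1) as an assumed stand-in for the restricted MLE solver, and takes $n_v = n/2$ by default; none of these choices is prescribed by the text.

```python
import numpy as np


def sample_cov(X):
    """Empirical covariance with 1/m scaling over the m rows of X."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]


def gaussian_loglik(Theta, S):
    """l(Theta) = (1/2) * (log det Theta - tr(S Theta)) via a Cholesky factor."""
    L = np.linalg.cholesky(Theta)
    return 0.5 * (2.0 * np.sum(np.log(np.diag(L))) - np.trace(S @ Theta))


def restricted_mle(S, edges, tol=1e-10, max_iter=1000):
    """Gaussian MLE with Theta_jk = 0 off `edges` (modified regression, ESL Alg. 17.1)."""
    p = S.shape[0]
    adj = np.zeros((p, p), dtype=bool)
    for j, k in edges:
        adj[j, k] = adj[k, j] = True
    W, Theta = S.copy(), np.zeros((p, p))
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            rest = np.delete(np.arange(p), j)
            nbr = adj[j, rest]
            W11, beta = W[np.ix_(rest, rest)], np.zeros(p - 1)
            if nbr.any():
                beta[nbr] = np.linalg.solve(W11[np.ix_(nbr, nbr)], S[rest, j][nbr])
            w12 = W11 @ beta
            W[rest, j] = W[j, rest] = w12
            Theta[j, j] = 1.0 / (S[j, j] - w12 @ beta)
            Theta[rest, j] = -beta * Theta[j, j]
        if np.max(np.abs(W - W_old)) < tol:
            break
    return (Theta + Theta.T) / 2


def coves(X, candidates, r=20, n_v=None, seed=0):
    """Steps 2-4 of Algorithm 2; returns (E_hat, Theta_hat, Gamma_hat for each candidate)."""
    n = X.shape[0]
    n_v = n // 2 if n_v is None else n_v
    rng = np.random.default_rng(seed)
    risks = np.zeros(len(candidates))
    for _ in range(r):
        perm = rng.permutation(n)
        s, s_c = perm[:n_v], perm[n_v:]          # validation and construction indices
        S_s, S_c = sample_cov(X[s]), sample_cov(X[s_c])
        for m, edges in enumerate(candidates):
            risks[m] -= gaussian_loglik(restricted_mle(S_c, edges), S_s) / r
    E_hat = candidates[int(np.argmin(risks))]
    return E_hat, restricted_mle(sample_cov(X), E_hat), risks   # refit on complete data
```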
