4.2 Edge Selection in High Dimensional Gaussian
4.2.3 Consistent Validation for Edge Selection
We first illustrate the CCV procedure of Feng and Yu (2013) for GLMs. Suppose we have $n$ independently and identically distributed observations $(x_i, y_i)$, $i = 1, \dots, n$, and the conditional distribution of $y$ given $x$ belongs to an exponential family with a canonical link. Its density function is written as follows:
$$f(y; x, \beta) = c(y, \phi) \exp\left[\frac{y\theta - b(\theta)}{a(\phi)}\right],$$
where $\theta = x\beta$ and $\phi \in (0, \infty)$ is the dispersion parameter. Here, $\beta$ is the parameter of interest, and $\beta_0$ is the true parameter, with $\|\beta_0\|_0 = d_0 < n$, where $\|\beta\|_0 = |\{j : \beta_j \neq 0\}|$. Up to an affine transformation, the log-likelihood can be written as follows, with $\theta_i = x_i\beta$:
$$l(y, \beta) = \frac{1}{n} \sum_{i=1}^{n} \left[y_i\theta_i - b(\theta_i)\right].$$
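As a minimal sketch of the scaled log-likelihood above, the following Python code evaluates $l(y, \beta)$ under the assumption of a Bernoulli response, for which $b(\theta) = \log(1 + e^{\theta})$. The function name `glm_loglik_logistic` and the toy data are illustrative only and are not taken from Feng and Yu (2013).

```python
import numpy as np

def glm_loglik_logistic(X, y, beta):
    """Scaled log-likelihood (1/n) * sum_i [y_i * theta_i - b(theta_i)]
    for the Bernoulli family, where b(theta) = log(1 + exp(theta))."""
    theta = X @ beta                  # canonical parameter theta_i = x_i beta
    b = np.logaddexp(0.0, theta)      # numerically stable log(1 + exp(theta))
    return float(np.mean(y * theta - b))

# Toy data (assumed for illustration): three observations, two covariates.
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])

# At beta = 0 every theta_i is 0, so each term is -log(2).
value_at_zero = glm_loglik_logistic(X, y, np.zeros(2))
```

Other exponential families would only change the cumulant function $b(\cdot)$ in this sketch.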
CCV considers sparse estimation by minimizing a penalized negative log-likelihood function with a tuning parameter $\lambda$:
$$\hat{\beta} = \arg\min_{\beta} \left[-l(y, \beta) + \sum_{j=1}^{p} p_{\lambda}(|\beta_j|)\right],$$
where $p_{\lambda}(\cdot)$ is the penalty function. Feng and Yu (2013) considered both convex and folded-concave penalties for CCV. The multi-stage CCV procedure is described in Algorithm 1.
Algorithm 1. CCV Implementation (Feng and Yu 2013)
1. Compute the solution path using the entire dataset. A sequence of solutions $\hat{\beta}(\lambda)$ is generated as a function of the penalty level $\lambda$.
2. Randomly split the whole dataset into $\{(x_i, y_i), i \in s\}$ (the validation set) and $\{(x_i, y_i), i \in s^c\}$ (the construction set) $r$ times. The sizes of the validation set and the construction set are $n_v$ and $n_c$, respectively. For each split $j = 1, \dots, r$, compute the restricted MLE path according to the active set sequence generated in the previous step.
3. Average the negative log-likelihood over the r splits for each model (from Step 1),
and choose the estimator in the model with the tuning parameter corresponding to the smallest loss function value.
4. Compute the restricted MLE for the selected model.
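Steps 2 to 4 of Algorithm 1 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the response is Gaussian with fixed error variance, so the restricted MLE reduces to least squares and the validation negative log-likelihood is proportional to the mean squared error up to constants. The candidate active sets from Step 1 are supplied by hand, and the name `ccv_select` and all toy values are hypothetical.

```python
import numpy as np

def ccv_select(X, y, active_sets, n_v, r, seed=0):
    """Return (index of the selected active set, average validation losses)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    losses = np.zeros(len(active_sets))
    for _ in range(r):
        perm = rng.permutation(n)
        val, con = perm[:n_v], perm[n_v:]      # validation set s, construction set s^c
        for m, active in enumerate(active_sets):
            cols = list(active)
            # Restricted MLE on the construction set (least squares here).
            coef, *_ = np.linalg.lstsq(X[np.ix_(con, cols)], y[con], rcond=None)
            resid = y[val] - X[np.ix_(val, cols)] @ coef
            losses[m] += np.mean(resid ** 2) / r   # average validation loss
    return int(np.argmin(losses)), losses

# Toy example (assumed): only the first covariate carries signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((120, 3))
y = 2.0 * X[:, 0] + rng.standard_normal(120)
active_sets = [[1], [0], [0, 1, 2]]            # stand-in for the Step 1 path
best, losses = ccv_select(X, y, active_sets, n_v=80, r=20)

# Step 4: refit the selected model on the full dataset.
beta_hat, *_ = np.linalg.lstsq(X[:, active_sets[best]], y, rcond=None)
```

The sketch uses a validation set larger than the construction set purely as an example setting; the split sizes are arguments and can be changed freely.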
Our proposed method follows the steps of CCV. We first consider the edge structures from the entire solution path. For each structure, CoVES computes the empirical negative log-likelihood via repeated random subsampling validation. We finally select the edge structure with the smallest negative log-likelihood. CoVES aims to select important edges rather than significant variables, and its likelihood is based on the multivariate Gaussian distribution. The procedure is detailed in Algorithm 2.
Algorithm 2. CoVES Implementation
1. Calculate the solution path of the precision matrices using the entire dataset. A sequence of solutions is generated with corresponding penalty level $\lambda$ from (4.1). Along the path, a sequence of candidate edge sets is determined based on the precision estimates $\hat{\Theta}(\lambda)$:
$$\hat{E}(\lambda) = \{(x_j, x_k) : \hat{\Theta}(\lambda)_{(j,k)} \neq 0\}.$$
2. Randomly split the dataset into a validation set $s$ (size $n_v$) and a construction set $s^c$ (size $n_c$) $r$ times. For each split $j = 1, \dots, r$, compute the restricted MLE path according to the active edge sequence generated in Step 1.
3. Average the negative log-likelihood over the $r$ splits for each active edge set in Step 1, and choose the active edge set $\hat{E}$ with the smallest average validation error.
4. Compute the restricted MLE with the selected edge set $\hat{E}$ in Step 3.
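To illustrate Step 1 of Algorithm 2, the following Python sketch converts a sequence of precision matrix estimates along a path into candidate edge sets. It assumes the path has already been computed by some penalized estimator, and it lists only off-diagonal pairs $(j, k)$ with $j < k$. The function name `edge_sets_from_path` and the numerical zero threshold `tol` are hypothetical choices made for this example.

```python
import numpy as np

def edge_sets_from_path(precisions, tol=1e-8):
    """Return one list of edges (j, k), j < k, per precision matrix on the path."""
    edge_sets = []
    for theta in precisions:
        rows, cols = np.triu_indices_from(theta, k=1)   # strictly upper triangle
        keep = np.abs(theta[rows, cols]) > tol          # treat tiny entries as zero
        edge_sets.append([(int(j), int(k)) for j, k in zip(rows[keep], cols[keep])])
    return edge_sets

# Two made-up estimates: a chain graph and an empty graph.
theta_chain = np.array([[2.0, -1.0, 0.0],
                        [-1.0, 2.0, -1.0],
                        [0.0, -1.0, 2.0]])
theta_empty = np.eye(3)
candidates = edge_sets_from_path([theta_chain, theta_empty])
# candidates == [[(0, 1), (1, 2)], []]
```

Distinct penalty levels may yield the same edge set, so in practice duplicates along the path could be removed before validation.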
We now describe the second step of the CoVES algorithm in detail for a single repetition and a given set of edges $E$. Let $E_d = \{(1,1), \dots, (p,p)\}$ and let $E$ be one of the estimated graphs from the full solution path. We minimize an unpenalized negative log-likelihood function with zero constraints on the unselected edges $E^c$, and the corresponding optimization problem is written as follows:
$$\tilde{\Theta}_{s^c,E} = \arg\min_{\Theta \succ 0} \left\{-\log\det(\Theta) + \mathrm{tr}(S_{s^c}\Theta)\right\} \quad \text{subject to } \Theta_{ij} = 0,\ (i,j) \in E_d \setminus E,$$
where $S_{s^c} = \frac{1}{n_c}\sum_{i \in s^c}(x_i - \bar{x}_{s^c})^T(x_i - \bar{x}_{s^c})$ is the empirical covariance matrix from the construction sample and $\bar{x}_{s^c} = \frac{1}{n_c}\sum_{i \in s^c} x_i$ is the construction sample average. Note that all the repetitions share the common set of edges $E$, but may give different values of the precision matrix estimator $\tilde{\Theta}_{s^c,E}$.
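The text does not specify a solver for this constrained problem. One possible choice, shown here purely as a sketch, is the coordinate-wise regression scheme for Gaussian graphical models with known structure described in Hastie, Tibshirani and Friedman, The Elements of Statistical Learning (Section 17.3). The function name `restricted_mle`, the boolean adjacency representation of $E$, and the stopping rule are assumptions of this example.

```python
import numpy as np

def restricted_mle(S, adj, max_sweeps=1000, tol=1e-10):
    """Gaussian MLE of the precision matrix with zeros wherever adj is False.

    S   : (p, p) positive definite sample covariance.
    adj : (p, p) symmetric boolean adjacency matrix (the diagonal is ignored).
    """
    p = S.shape[0]
    W = S.astype(float).copy()                  # current covariance estimate
    for _ in range(max_sweeps):
        W_prev = W.copy()
        for j in range(p):
            rest = np.delete(np.arange(p), j)   # all indices except j
            nb = np.flatnonzero(adj[j, rest])   # neighbours of j, as positions in rest
            beta = np.zeros(p - 1)
            W11 = W[np.ix_(rest, rest)]
            if nb.size > 0:
                # Regress on the neighbours only; other coefficients stay zero.
                beta[nb] = np.linalg.solve(W11[np.ix_(nb, nb)], S[rest[nb], j])
            w12 = W11 @ beta
            W[rest, j] = w12
            W[j, rest] = w12
        if np.max(np.abs(W - W_prev)) < tol:    # stop when a sweep changes nothing
            break
    return np.linalg.inv(W)

# Toy construction sample (assumed) and a chain graph 0 - 1 - 2 - 3.
rng = np.random.default_rng(0)
X_con = rng.standard_normal((200, 4))
centered = X_con - X_con.mean(axis=0)
S_con = centered.T @ centered / X_con.shape[0]
adj = np.zeros((4, 4), dtype=bool)
for j in range(3):
    adj[j, j + 1] = adj[j + 1, j] = True
theta_tilde = restricted_mle(S_con, adj)
```

In this sketch the fitted covariance matches the sample covariance on the selected edges, while the precision entries of unselected pairs are driven to zero.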
Next, the validation set is used to obtain the empirical negative log-likelihood for the precision matrix estimator $\tilde{\Theta}_{s^c,E}$. The corresponding log-likelihood is from the multivariate Gaussian density and is written as $l_s(\tilde{\Theta}_{s^c,E})$, where
$$l_s(\Theta) = \frac{1}{2}\left(\log\det(\Theta) - \mathrm{tr}(S_s\Theta)\right).$$
In the log-likelihood, $S_s = \frac{1}{n_v}\sum_{i \in s}(x_i - \bar{x}_s)^T(x_i - \bar{x}_s)$ is the empirical covariance matrix from the validation sample, and $\bar{x}_s = \frac{1}{n_v}\sum_{i \in s} x_i$ is the validation sample average.
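The validation log-likelihood can be evaluated directly from its definition. The following Python sketch does so for a given precision matrix and a block of validation rows; the function name `gaussian_loglik` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_loglik(theta, X_val):
    """l_s(Theta) = 0.5 * (log det(Theta) - tr(S_s Theta)) on the rows of X_val."""
    centered = X_val - X_val.mean(axis=0)
    S_s = centered.T @ centered / X_val.shape[0]   # covariance with divisor n_v
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:
        # A positive determinant is necessary (not sufficient) for positive definiteness.
        raise ValueError("theta must have a positive determinant")
    return 0.5 * (logdet - np.trace(S_s @ theta))

# Toy validation sample (assumed): its covariance S_s equals 0.5 * I.
X_val = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
ll_identity = gaussian_loglik(np.eye(2), X_val)        # 0.5 * (0 - 1) = -0.5
ll_inverse = gaussian_loglik(2.0 * np.eye(2), X_val)   # log(2) - 1
```

As expected, the inverse of the validation covariance attains a higher log-likelihood than the identity matrix in this toy case.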
The negative log-likelihood evaluates how well each set of edges fits the validation set. The expected loss at $E$ is the expectation of the negative log-likelihood with respect to a random selection of $s$,
$$\Gamma_E = -\mathbb{E}\left[l_s(\tilde{\Theta}_{s^c,E})\right].$$
It is called the risk function at $\tilde{\Theta}_{s^c,E}$. We estimate the risk function by averaging the empirical negative log-likelihood across the $r$ random splits $s \in R$, and denote the estimate by $\hat{\Gamma}_E$:
$$\hat{\Gamma}_E = -\frac{1}{r}\sum_{s \in R} l_s(\tilde{\Theta}_{s^c,E}).$$
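The risk estimate can be sketched in Python as an average over repeated random splits. To keep the example self-contained, the restricted estimator is passed in as a callable `fit` that maps a construction-sample covariance to a precision matrix; the two fits used below (full graph and empty graph) are closed-form stand-ins chosen for illustration. The name `estimate_risk` and all toy values are hypothetical.

```python
import numpy as np

def estimate_risk(X, fit, n_v, r, seed=0):
    """Average validation negative log-likelihood over r random splits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(r):
        perm = rng.permutation(n)
        val, con = X[perm[:n_v]], X[perm[n_v:]]          # validation / construction rows
        theta = fit(np.cov(con, rowvar=False, bias=True))  # estimate on construction set
        S_s = np.cov(val, rowvar=False, bias=True)         # validation covariance
        _, logdet = np.linalg.slogdet(theta)
        total -= 0.5 * (logdet - np.trace(S_s @ theta))    # minus l_s(theta)
    return total / r

# Toy data (assumed): the first two variables are strongly dependent.
rng = np.random.default_rng(2)
Z = rng.standard_normal((200, 3))
X = Z.copy()
X[:, 1] = Z[:, 0] + 0.3 * Z[:, 1]

risk_full = estimate_risk(X, np.linalg.inv, n_v=100, r=10)
risk_empty = estimate_risk(X, lambda S: np.diag(1.0 / np.diag(S)), n_v=100, r=10)
```

Because both calls use the same seed, the two candidate graphs are compared on identical splits, and the graph that keeps the strong edge attains the smaller estimated risk in this toy example.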
We now have an empirical negative log-likelihood for each candidate edge set, and we choose the set with the smallest value. This step determines the graph structure, and the edge set estimate is denoted as $\hat{E}$. In the last step, we estimate the signals of the conditional dependency in the selected graph using the complete dataset, and the estimate is denoted as $\hat{\Theta}_{\hat{E}}$.