Model selection and stability selection - Graphical models for estimating dynamic networks

We have seen that estimation of ΘΘΘ considering different coloured graphs given a smoothing parameter λ is possible inside a convex optimization framework. We have also proposed several factorial graphical models. In this subsection we ad- dress same issues on how to choose the ’best’ coloured graph, and what should be a good compromise between a sparse and a dense graph. In particular, according to Meinshausen and B¨uhlmann (2010) we want to find a smoothing parameter such that the expected number of false positive links is taken under control. This aim

Algorithm 3.1: Calculate sparse ΘΘΘ with structured on ΩΩΩ

Require: Coloured graph Si≺ ∗,Ni≺ ∗ and set an initial vector ΣΣΣ0.

1. Find the linear maps A1, . . . , Am.

2. k = 0, . . . ,. 3. Estimate ΘΘΘ(k). 4. Set ΣΣΣ0= diag(ΘΘΘ(k)).

5. Replay ΣΣΣ(k)₀ with ΣΣΣ(k+1)₀ and estimate ˆΘΘΘ(k+1). 6. If|| ˆΘΘΘ(k+1)− ˆΘΘΘ(k)||1< ε Stop

else k = k + 1, go to 3 end.

can be reached through stability selection. Stability selection is similar to bootstrap idea which consists of a re-sampling procedure. For example we used SGLΩ

to estimate the matrix represented in Table 3.5.

3.4.1 Model selection.

Let’s assume that the smoothing parameter is fixed (point-wise control) λ = λopt

and two coloured graphs need to be compared:

[S0≺ FΓT, N0≺ FΓ, S1≺ F1, N1≺ 0],

and

[S0≺ FΓT, N0≺ FΓ, S1≺ FΓ, N1≺ 0].

An information criterion such as AIC, BIC and AICc can be used to compare different factorial graphical models. AIC typically will select more and more com- plex models as the sample size increases, because the maximum log-likelihood increases linearly with n while the penalty term for complexity is proportional to the number of degrees of freedom. Note that for factorial graphical models the number of degrees of freedom is approximated by the number of ”free” parameters different from zero, i.e. the estimated elements different from zero which do not belong to the same partition. AICc penalizes complexity more strongly than AIC, with less chance of over-fitting the model. BIC is constructed in a manner quite similar to AIC with stronger penalty for complexity (Claeskens and Hjort, 2008).

3.4.2 Stability selection.

Usually, the choice of smoothing parameter λ is crucial as it leads to the structure of the network. In particular, if λ = 0 and there are no constraints, the maximum likelihood estimator is ˆΘΘΘ = S−1, if λ → ∞, ˆΘΘΘ is diagonal which means all random variables are independent. Generalized cross validation can be used to select the tuning parameter λ = λcvbut Leng et al. (2006) showed that given λcvthe estima-

tor of ΘΘΘ is not consistent in terms of variables selection. Alternatively, bootstrap (Breiman, 1999) can be considered to estimate empirical distribution of each el- emtent of ˆΘΘΘ. We prefer stability selection (Meinshausen and B ¨uhlmann, 2010) because the expected number of links falsely estimated is controlled and variables selection is consistent. Moreover, the choice of a smoothing parameter λ becomes less important Meinshausen and B¨uhlmann (2010). We adapt stability selection to factorial graphical models.

Let’s consider a vector of smoothing parameters such that λ _{∈ Λ ⊆ R}+ that determines the amount of regularization.

Theorem 3.4.1. A necessary and sufficient condition for θi j,kl= 0 for all i, k∈ Γ

and j, l_{∈ T is that |S}i j,kl| ≤ λ for all i 6= k and l 6= j Mazumder and Hastie (2011).

Upper and lower bounds of λ , uλ and lλ respectively, are calculated such that

u_λ= max(|Si j,kl|) and lλ = min(|Si j,kl|) for i 6= k l 6= j, where i,k ∈ Γ and j,l ∈ T .

Then, for all λ> u_λan empty graph is estimated while for λ< l_λa fully connected graph is estimated. We search the solution of the optimization problem (3.3) for value of λ into the range [lλ, uλ]. The ”optimal” value of λ = λopt can be chosen

by minimizing a score which measures the goodness-of-fit. Let’s suppose a graph ˆG= (V, ˆE) has been inferred, where

E_λ_opt ={(vi, vj) : ˆθi, j6= 0}

is the estimated edge set and,

E=_{(vi, vj) : θi, j6= 0}

denotes the active set. Let I =_{{1,...,n} be the index set for sample y}(i), i_{∈ I, then:} The set of stable edges is indicated as:

Estable={(vi, vj) : ˆΠλi j ≥ πthr},

and it depends on λopt via ˆΠ λopt

i j . The tuning parameter πthr indicates a threshold

Algorithm 3.2: Stability selection for graphical models. Require: n.

• Draw sub-samples of size [n/2] without replacement, denoted by I∗_{⊂ {1,...,n}, where |I}∗_{| = [n/2].}

• Run the selection algorithm ˆEλopt(I

∗_{) on I}∗_.

• Do these steps many times and compute the relative frequencies, ˆ

Πλ_{i j}opt = P∗((vi, vj)∈ ˆEλ), for i, j = 1, . . . , p.

distribution of the random variables is exchangeable and ˆE is a better choice than a random guessing, then it can be shown that

E(F P)≤ 1

2πthr− 1

q2 k ,

where k is the dimension of the model (it depends on the factorial model), q is the number of selected variables (e.g. _{| ˆE|), and FP = |E}c_{∩ ˆE}stable| is the number of

falsely positive selected. This is a finite sample control, even if k>> N. Choose E(F P)≤ v, then if q2≤ vk:

πthr= (1 + q2/vk)/2,

and πthr∈ (1₂, 1) is bounded.

Figure 3.2 is taken from Meinshausen and B¨uhlmann (2010) and it illustrates that the choice of a tuning parameter λ is less important with stability selection than cross validation.

In document Graphical models for estimating dynamic networks (Page 69-72)