tional tool used by the BBA is the QRD. Specifically, it employs Givens rotations to compute the QRD of a triangular matrix after deleting a column [45, 63]. Theoretical measures of complexity are derived in order to compare the proposed BBA against existing methods. Several strategies and heuristics that improve the computational performance of the BBA are presented. The algorithms are implemented using FORTRAN, BLAS and LAPACK on a Sun-Fire-280R of two SPARC-v9 processors with a clock speed of 750MHz and 4Gb of RAM.
The next section briefly discusses the use of the QRD to compute the least-squares and residual sum of squares of the regression model. The regression tree that generates all possible models is introduced. Section 3 presents the BBA and various strategies for improving the efficiency and the applicability of the algorithm. Various heuristics that allow the investigation of large-scale models are also considered. Experimental results are analyzed in section 4. Conclusions and notions for future work are presented and discussed in the final section.
3.2 Computing all possible sub-models
The QRD of the exogenous matrix A in (3.1) is given byQTA = R
0 !
n
m − n , (3.2)
where Q ∈ Rm×mis orthogonal and R ∈ Rn×nis upper triangular and non-singular. Let QTy = ˜y = ˜y1
˜y2
!
n
m − n. (3.3)
The least-squares estimator ofβ is given by ˆβ = argmin
β ky − Aβk
2=argmin β kQ
T(y − Aβ)k2=R−1˜y1,
where k · k denotes the Euclidian norm. The residual sum of squares (RSS) is computed by RSS = ky − Aˆβk2=kQT(y − Aˆβ)k2=k ˜y
2k2.
Let S denote a selection matrix, which comprises d columns of the n×n identity matrix In. Consider
the modified regression model
36 CHAPTER 3. BRANCH-AND-BOUND ALGORITHMS FOR SUBSET REGRESSION where A(S)=AS ∈ Rm×d and β(S)=STβ ∈ Rd. The least-squares estimator ofβ(S), i.e., ˆβ(S), is
obtained from the solution of argmin β(S) ky − A(S)β(S)k2=argmin β(S) kQT(y − ASβ(S))k2=argmin β(S) k ˜y1− RSβ(S)k2.
Now, computing the QRD of RS
QT(S)RS = R(S) 0 ! d n − d and Q T (S)˜y1= ˆy1 ˆy2 ! d n − d, (3.5)
the least-squares estimator and the residual sum of squares of the modified model (3.4) are given by ˆβ(S)=R−1(S)ˆy1 and RSS(S)=RSS + ˆyT2 ˆy2, respectively. The factorization (3.5) is equivalent to
the re-triangularization of R in (3.2) after deleting columns [60, 61, 63, 65, 69, 71]. Notice that if
S consists of the first d (d = 1,... ,n) columns of In, then QT(S)=Inand R(S)is the leading d × d
sub-matrix of RS.
Now, let V = [v1,v2, . . . ,vd] denote the d = |V| indices of the selected columns (variables)
included in the sub-matrix R(S)(model). The sub-models corresponding to the sub-sets [v1], [v1,v2],
···, [v1,v2, . . . ,vd] are immediately available. The problem is equivalent to computing all 2n− 1
subsets of V = [v1,v2, . . . ,vn], where vi=i (i = 1,... ,n). In solving this problem, a function called
Drop will be used. This function re-triangularizes a triangular matrix after deleting a column. That is,
Drop(V,i) = [v1, . . . ,vi−1,vi+1, . . . ,vn], where i = 1,... ,n − 1.
The re-triangularization performed by Drop is obtained by applying a sequence of Givens rotations using (n − i)(n − i + 3)/2 floating point operations (flops) [45, 63]. The application of the Givens rotations on the modified response vector y is not discussed but is included in the complexity analysis. The rotations are between adjacent planes. Figure 3.1 shows the re-triangularization of a 6 × 6 matrix after deleting the 3rd column. A rotation Gj (j = 4,5,6) annihilates the element at
position ( j, j − 1). The rows affected by the rotations are indicated by an arc.
A formal and detailed description of the regression tree (hereafter abbreviated to RT) that gen- erates all subset models has been given in [42]. Here the basic concepts using a more convenient notation are introduced. A node of the RT is a tuple (V,k), where V is the set of indices and k (k = 0,... ,|V| − 1) indicates that the children of this node will include the first k variables. If
V = [v1,v2, . . . ,vn], then the RT T (V,k) is an (n − 1)–tree having as root the node (V,k), where k = 0,... ,n − 1. The children are defined by the tuples (Drop(V,i),i − 1) for i = k + 1,...,n − 1.
3.3. THE BRANCH-AND-BOUND ALGORITHM (BBA) 37 • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • ◦ • • ◦ • ◦ initial matrix - - - G4 G5 G6
applying G4 applying G5 applying G4
Zero element • Non-zero element ◦ Annihilated element
Figure 3.1: Re-triangularization of an n × n upper triangular matrix after deleting the ith column using Givens rotations, where n = 6 and i = 3.
Formally this can be expressed recursively as
T (V,k) =
(
(V,k) if k = n − 1,
(V,k),T (Drop(V,k + 1),k),··· ,T(Drop(V,n − 1),n − 2) if k < n − 2. The number of nodes in the sub-tree T (Drop(V,i),i − 1) is given by δi=2n−i−1 and δi=2δi+1,
where i = 1,... ,n − 1 [42].
Computing all possible subset regressions of a model having n variables is equivalent to gener- ating T (V,0), where V = [1,2,... ,n]. Generally, the complexity –in terms of flops– of generating
T (V,k) is given by
18 · 2|V |−k− 3(|V | − k + 2)(|V | − k + 3). (3.6)
Figure 3.2 shows T (V,k) together with the sub-models generated from each node, where V = [1,2,3,4,5] and k = 0. A sub-model is denoted by a sequence of numerals that correspond to the indices of variables.
3.3 The Branch-and-Bound Algorithm (BBA)
Various statistics, such as R2 and Cp, are associated with model selection [53, 87, 88, 94, 108].
Many of these criteria are monotone functions of the RSS for subsets with the same number of variables. That is, if RSS(V1)≥ RSS(V2), then f (RSS(V1))≥ f (RSS(V2)), where V1 and V2 are
sets of independent variables, |V1| = |V2|, RSS(Vi)is the RSS of the model comprising the variables
in Vi (i = 1,2) and f is the statistic R2 or Cp. In this context, the problem of finding the best