An algorithm is termed greedy if at each iteration it only considers the best local solution and does not reconsider any previous decision. Full enumeration of a single component consisting of $p$ loadings requires the comparison of $(3^p-1)/2$ cases. This is feasible for small problems but not tractable for larger numbers of variables. Dynamic programming approaches are not appropriate, as previously considered cases can become important after considering subsequent cases.
It is proposed that variable tuples are considered together and that these are fully enumerated. If all combinations of $k$-tuples are taken, then solutions are found which, although not guaranteed to be globally optimal, may be good relative to the optimal properties of PCA. There are $p!/((p-k)!\,k!)$ combinations to consider on a component, and each $k$-tuple requires $3^k$ patterns to be considered. As a simple illustration taking pairwise tuples, the patterns to consider for each pair $(i, j)$ of variable loadings are shown in Table 2.4.
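The counts in the two paragraphs above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the reading of $(3^p-1)/2$ as "all loading vectors over {-1, 0, 1}, excluding the zero vector and counting a vector and its negation once" is an assumption about how the figure arises.

```python
from itertools import product
from math import comb


def tuple_patterns(k):
    """All 3^k assignments of the values -1, 0, 1 to a k-tuple of loadings."""
    return list(product((-1, 0, 1), repeat=k))


def tuple_search_count(p, k):
    """Patterns examined on one component: 3^k for each of the C(p, k) tuples."""
    return 3**k * comb(p, k)


def full_enumeration_count(p):
    """Cases for one component under full enumeration, (3^p - 1) / 2."""
    return (3**p - 1) // 2  # 3^p is odd, so the division is exact


pairs = tuple_patterns(2)
print(len(pairs))                  # 9 patterns for a pair of variables
print(tuple_search_count(20, 2))   # 9 * 190 = 1710
print(full_enumeration_count(20))  # 1743392200
```

For 20 variables, the pairwise search examines 1,710 patterns on a component, against roughly 1.7 billion cases for full enumeration.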
 ith   jth
  0     0
  0     1
  0    -1
  1     0
  1     1
  1    -1
 -1     0
 -1     1
 -1    -1

Table 2.4: Each row is a pair of simple unscaled loadings to consider for the selected pair of variables.

The idea is presented in pseudo-code as Algorithm 1. To initialize, a random or homogeneous vector is used. Alternatively, the sign function can be applied to threshold the principal component, where any weight with absolute value less than a chosen threshold is zeroed. Combinations are considered at random from the $p!/((p-k)!\,k!)$ possibilities. To extract all $p$ components there are

$$p\,3^k\,\frac{p!}{(p-k)!\,k!}$$

comparisons. This should be considerably less than full enumeration, so that

$$3^k\,\frac{p!}{(p-k)!\,k!} \ll \frac{3^p-1}{2}.$$
If $k$ is kept small, less than 3 or 4, then solutions for large data sets become feasible. In fact $k$ must be less than $0.4p$ to be a better option than enumeration. For small $k$ the algorithm may be repeated starting from the best solution found to date. Once a simple component has been found, the next is found subject to maximizing the objective described earlier in equation (2.4).
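The break-even point between the tuple search and full enumeration can be explored numerically. The sketch below, with a hypothetical helper name, finds the largest tuple size $k$ for which $3^k\,p!/((p-k)!\,k!)$ is still smaller than $(3^p-1)/2$ on a single component. It is offered as a rough check of the $0.4p$ rule of thumb, not as a derivation of it.

```python
from math import comb


def largest_useful_k(p):
    """Largest k such that every tuple size up to k needs fewer comparisons
    on one component than full enumeration of that component."""
    full = (3**p - 1) // 2
    k = 0
    while k < p and 3**(k + 1) * comb(p, k + 1) < full:
        k += 1
    return k


for p in (10, 20, 30):
    print(p, largest_useful_k(p), 0.4 * p)
```

For these three values of $p$ the break-even tuple size sits at $0.4p$, in line with the bound quoted above. In practice $k$ would be kept far below this, since the comparison count is already large well before the break-even point.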
If strict orthogonality is required, then a subsequent simple component could be sought with the extra constraint that it lies in the $(p-q)$-dimensional subspace orthogonal to the $q$ components already found. However, a simple component consisting of Hausman weights is not guaranteed to be in an orthogonal subspace. Alternatively, in a similar way to principal component regression, the simple components can be regressed onto an independent variable, in which case it is desirable for them to be uncorrelated.
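Orthogonal loadings and uncorrelated components are different requirements, which a small numerical example can make concrete. The covariance matrix and loading vectors below are invented purely for illustration, and the function name is hypothetical. The only fact used is that, for data with covariance $S$, the scores of loadings $a$ and $b$ have covariance $a^\top S b$.

```python
import numpy as np


def score_correlation(a, b, S):
    """Correlation between the component scores X a and X b when cov(X) = S."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a @ S @ b) / np.sqrt((a @ S @ a) * (b @ S @ b))


# Hypothetical covariance matrix for three variables.
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.2],
              [0.2, 0.2, 1.0]])

a = [1, 1, 0]
b = [1, -1, 0]
c = [0, 0, 1]

print(np.dot(a, b), score_correlation(a, b, S))  # orthogonal and uncorrelated
print(np.dot(a, c), score_correlation(a, c, S))  # orthogonal but correlated
```

Here `a` and `c` are orthogonal as loading vectors, yet their scores are correlated, so a penalty on one property does not automatically control the other.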
Finding principal components sequentially is optimal, as the global solution is the solution to an eigenvalue problem. For simple components, however, this is not the case. Consequently, better solutions may be obtained by considering groups of simple components simultaneously, at the cost of an expanded search space. If $q$ is the number of components to consider together and $k$ the tuple size to consider across the $q$ components, then the number of comparisons is

$$3^{kq}\,\frac{p!}{(p-k)!\,k!} \tag{2.8}$$

as each $k$-tuple across $q$ components requires $3^{kq}$ comparisons. Sequentially there are

$$q\,3^{k}\,\frac{p!}{(p-k)!\,k!} \tag{2.9}$$

comparisons. So finding components simultaneously requires

$$\frac{p!\,\left(3^{qk}-q\,3^{k}\right)}{(p-k)!\,k!}$$

extra comparisons, or

$$\frac{3^{k(q-1)}}{q}$$

times as many. For example, to find 3 simple components simultaneously for a problem involving 20 variables, simultaneous enumeration requires $3^{20\times 3}\approx 4\times 10^{28}$ comparisons; simultaneous simple components taking pairs requires 138,510 and sequential simple components requires 5,130. Sequential enumeration would require $3\times 3^{20}/2\approx 5\times 10^{9}$ comparisons. Clearly full simultaneous enumeration is likely to be impractical. Sequential enumeration would be practical for small problems, and simultaneous simple components would be tractable for moderate size problems. However, a sequential simple component approach may find solutions to large problems, so long as these solutions are close to principal components in terms of the variance they explain and their orthogonality or correlation.
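The figures in this example follow directly from equations (2.8) and (2.9) and can be checked with a short script. The function names are hypothetical. The enumeration counts use $3^{pq}$ for simultaneous and $q\,3^p/2$ for sequential enumeration, as in the text.

```python
from math import comb


def simultaneous_tuple_count(p, k, q):
    """Equation (2.8): 3^(kq) patterns for each of the C(p, k) tuples."""
    return 3**(k * q) * comb(p, k)


def sequential_tuple_count(p, k, q):
    """Equation (2.9): q components, each needing 3^k * C(p, k) comparisons."""
    return q * 3**k * comb(p, k)


p, k, q = 20, 2, 3
simultaneous = simultaneous_tuple_count(p, k, q)
sequential = sequential_tuple_count(p, k, q)

print(simultaneous)               # 138510
print(sequential)                 # 5130
print(simultaneous / sequential)  # 27.0, i.e. 3^(k(q-1)) / q
print(f"{3**(p * q):.2e}")        # simultaneous enumeration, about 4e28
print(f"{q * 3**p / 2:.2e}")      # sequential enumeration, about 5e9
```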
Input: Covariance matrix S (p×p); the number of components to find, q; the number of elements to consider simultaneously, k; the penalty weight λ.
Output: Simple component set C (q×p).

repeat
    Initialize the simple component {random, first principal component, vector of ones}
    Apply threshold to make it simple
    repeat
        Reorganize the vector and covariance into fixed and variable parts to reduce computation
        forall the possible patterns (see Table 2.4) for k variables do
            Evaluate the loss in Equation (2.13) and keep the best solution
        end
    until all k-combinations of the p variables have been evaluated;
until all q components are extracted;

Algorithm 1: Outline of the basic greedy search approach to find simple components.
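A minimal Python sketch of the search in Algorithm 1 is given below, under several stated assumptions. Equation (2.13) is not reproduced in this section, so the `objective` function is a hypothetical stand-in: variance explained by the normalized loadings, minus λ times the squared cosine with each component already found. The function names, the vector-of-ones start, the single pass over the combinations and the toy covariance matrix are likewise illustrative choices. The reorganization into fixed and variable parts is omitted for brevity.

```python
import itertools

import numpy as np


def objective(a, S, found, lam):
    """Hypothetical stand-in for Equation (2.13); larger is better."""
    norm2 = a @ a
    if norm2 == 0:
        return -np.inf  # the all-zero vector is not a valid component
    variance = (a @ S @ a) / norm2
    # Squared cosine between a and each previously extracted component.
    penalty = sum((a @ c) ** 2 / (norm2 * (c @ c)) for c in found)
    return variance - lam * penalty


def greedy_simple_components(S, q, k, lam, seed=0):
    """Extract q components sequentially by enumerating k-tuples of loadings."""
    rng = np.random.default_rng(seed)
    p = S.shape[0]
    patterns = list(itertools.product((-1, 0, 1), repeat=k))
    combos = list(itertools.combinations(range(p), k))
    found = []
    for _ in range(q):
        a = np.ones(p)  # homogeneous starting vector
        best = objective(a, S, found, lam)
        # Visit the k-combinations of variables in random order.
        for idx in rng.permutation(len(combos)):
            cols = list(combos[idx])
            for pattern in patterns:
                trial = a.copy()
                trial[cols] = pattern
                value = objective(trial, S, found, lam)
                if value > best:  # greedy: accept improvements only
                    a, best = trial, value
        found.append(a)
    return np.array(found, dtype=int)


# Toy covariance matrix with two uncorrelated blocks of variables.
S = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
C = greedy_simple_components(S, q=2, k=2, lam=2.0)
print(C)
```

On this toy matrix the search separates the two blocks of variables, one per component. On real data the result depends on the starting vector, the visiting order and the penalty weight, which is why the text suggests repeating the search from the best solution found so far.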
Returning to the example in Section 2.2, Table 2.5 shows the simple components found for $k = 3$ using a squared orthogonal penalty. This can be compared with those found by enumeration, Table 2.3. Notice that the first three components describe identical axes. However, the enumerated solutions explain slightly more variation overall. This is still encouraging, as the number of comparisons is substantially less than for full enumeration. Full enumeration of $q$ components requires $q\,3^p/2$ comparisons. For these data this equates to 1,328,600 comparisons by enumeration, compared to 29,700 using the simple component search. In this case the simple component solution is more sparse than after sequential enumeration. For instance, SC4 can be easily interpreted as a contrast between the freshness on application of the deodorant and the freshness after application together with feeling confident. SC5 contrasts freshness with gentle and pleasant. The performance of the sequential simple component algorithm is explored in the next section. If good solutions are found in polynomial time then there may be value for large problems.

                                      SC1    SC2    SC3    SC4    SC5
Fresh deodorant                         1      1      0      1      1
Fresh on application                   -1     -1      0     -1      0
Pleasant on skin during application    -1     -1      0      0      1
Gentle on skin                         -1      0      0      0      1
Did not mark clothes                   -1      1      1      0      0
Did not leave white marks              -1      1      1      0      0
Feeling fresh all day                  -1     -1      0      1     -1
Feeling confident                      -1     -1      0      1      0
Sticky                                  1     -1      1      0      0
Greasy                                  1     -1      1      0      0
Wetness                                 1     -1      0      0      0
Coldness                                0     -1      0      0      1
Variance explained                   5.29   1.38   1.17   0.77    0.6

Table 2.5: The unscaled components found with the simple component algorithm ($k = 3$, squared orthogonal penalty).
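As a consistency check on Table 2.5 and on the comparison counts quoted for this example, the sketch below transcribes the loadings into an array (rows in the order of the table), forms their Gram matrix and recomputes both counts. The variable names are illustrative. The variance explained is not recomputed, because the covariance matrix is not given in this section.

```python
from math import comb

import numpy as np

# Loadings transcribed from Table 2.5 (rows: variables, columns: SC1..SC5).
loadings = np.array([
    [ 1,  1, 0,  1,  1],   # Fresh deodorant
    [-1, -1, 0, -1,  0],   # Fresh on application
    [-1, -1, 0,  0,  1],   # Pleasant on skin during application
    [-1,  0, 0,  0,  1],   # Gentle on skin
    [-1,  1, 1,  0,  0],   # Did not mark clothes
    [-1,  1, 1,  0,  0],   # Did not leave white marks
    [-1, -1, 0,  1, -1],   # Feeling fresh all day
    [-1, -1, 0,  1,  0],   # Feeling confident
    [ 1, -1, 1,  0,  0],   # Sticky
    [ 1, -1, 1,  0,  0],   # Greasy
    [ 1, -1, 0,  0,  0],   # Wetness
    [ 0, -1, 0,  0,  1],   # Coldness
])

# Off-diagonal entries of the Gram matrix are inner products between components.
gram = loadings.T @ loadings
off_diagonal = gram - np.diag(np.diag(gram))
print(off_diagonal)

p, q = loadings.shape
k = 3
full_enumeration = q * 3**p / 2        # q * 3^p / 2
tuple_search = q * 3**k * comb(p, k)   # q * 3^k * C(p, k)
print(full_enumeration, tuple_search)
```

With the loadings as transcribed, all off-diagonal inner products are zero, so the five components in Table 2.5 are mutually orthogonal. The counts agree with the figures quoted in the text.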