A New Greedy Algorithm to Find Simple Components

An algorithm is termed greedy if, at each iteration, it considers only the best local solution and does not reconsider any previous decision. Full enumeration of a single component consisting of $p$ loadings requires the comparison of $(3^p - 1)/2$ cases. This is feasible for small problems but not tractable for larger numbers of variables. Dynamic programming approaches are not appropriate, as previously considered cases can become important after considering subsequent cases.
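One way to arrive at the full enumeration count, assuming (as the patterns in Table 2.4 suggest) that each unscaled loading takes a value in $\{-1, 0, 1\}$, that the all-zero vector is excluded, and that a vector and its negation describe the same axis:

```latex
\underbrace{3^{p}}_{\text{all vectors in } \{-1,0,1\}^{p}}
\;-\; \underbrace{1}_{\text{zero vector}}
\;=\; 3^{p}-1 ,
\qquad
\frac{3^{p}-1}{2} \ \text{distinct axes after identifying } c \text{ with } -c .
% Examples: p = 12 gives (3^{12}-1)/2 = 265{,}720 ;
%           p = 20 gives (3^{20}-1)/2 = 1{,}743{,}392{,}200 .
```

The count roughly triples with each added variable, which is why enumeration quickly becomes impractical.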

It is proposed that variables are considered together in tuples and that each tuple is fully enumerated. If all combinations of $k$-tuples are taken, then solutions are found which, although not guaranteed to be globally optimal, may be good when compared with the optimal properties of PCA. There are $p!/((p-k)!\,k!)$ combinations to consider on a component, and each $k$-tuple requires $3^k$ patterns to be considered. As a simple illustration, taking pairwise tuples, the patterns to consider for each pair $(i, j)$ of variable loadings are shown in Table 2.4.
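The tuple and pattern counts can be illustrated with a minimal Python sketch. The function name `tuple_patterns` and the choice $p = 12$ are assumptions made for this illustration only; the value order `(0, 1, -1)` is chosen so that the pairwise case lists the patterns in the same order as Table 2.4.

```python
from itertools import combinations, product
from math import comb


def tuple_patterns(k):
    """Return all 3**k assignments of {0, 1, -1} to a k-tuple of loadings."""
    return list(product((0, 1, -1), repeat=k))


p, k = 12, 2  # illustrative sizes, not taken from the text

pairs = list(combinations(range(p), k))   # every k-tuple of variable indices
patterns = tuple_patterns(k)              # every pattern for one k-tuple

print(len(pairs), comb(p, k))   # both equal p! / ((p - k)! k!)
print(len(patterns), 3 ** k)    # both equal 3^k
for pattern in patterns:
    print(pattern)
```

For pairs this gives nine patterns per pair, so one pass over a component costs $9 \times p(p-1)/2$ evaluations.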

The idea is presented in pseudo-code as Algorithm 1. To initialize, a random or homogeneous vector is used. Alternatively, the sign function can be applied to the thresholded principal component, where any weight with absolute value less than a chosen threshold is zeroed. Combinations are considered at random from the $p!/((p-k)!\,k!)$ possibilities. To extract all $p$ components there are

$$p\,3^{k}\,\frac{p!}{(p-k)!\,k!}$$

comparisons, which is considerably less than full enumeration, so that

$$3^{k}\,\frac{p!}{(p-k)!\,k!} \ll 3^{p-1}.$$

    ith   jth
      0     0
      0     1
      0    -1
      1     0
      1     1
      1    -1
     -1     0
     -1     1
     -1    -1

Table 2.4: Each row is the pair of simple unscaled loadings to consider for the selected pair of variables.

If $k$ is kept small, say less than 3 or 4, then solutions for large data sets become feasible. In fact, $k$ must be less than $0.4p$ to be a better option than enumeration. For small $k$ the algorithm may be repeated, starting from the best solution found to date. Once a simple component has been found, the next is found subject to maximizing the objective described earlier in equation (2.4).
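The $0.4p$ rule of thumb can be checked numerically. The sketch below compares the per-component cost of the tuple search, $3^k\,p!/((p-k)!\,k!)$, with the full enumeration count $(3^p-1)/2$ given at the start of this section. The function names are hypothetical and the three values of $p$ are arbitrary test sizes.

```python
from math import comb


def tuple_search_cost(p, k):
    """Comparisons for one component using k-tuples: 3^k * C(p, k)."""
    return 3 ** k * comb(p, k)


def full_enumeration_cost(p):
    """Comparisons for one component by full enumeration: (3^p - 1) / 2."""
    return (3 ** p - 1) // 2


def largest_useful_k(p):
    """Largest k such that every tuple size up to k is cheaper than enumeration."""
    best_k = 0
    for k in range(1, p + 1):
        if tuple_search_cost(p, k) < full_enumeration_cost(p):
            best_k = k
        else:
            break
    return best_k


for p in (10, 20, 30):
    k_max = largest_useful_k(p)
    print(p, k_max, k_max / p)   # k_max is 4, 8 and 12, i.e. 0.4 * p
```

For these sizes the break-even tuple size is exactly $0.4p$, although in practice $k$ would be kept far below it.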

If strict orthogonality is required, then a subsequent simple component could be sought with the extra constraint that it lies in the $(p-q)$-dimensional subspace orthogonal to the $q$ components already found. However, a simple component consisting of Hausman weights is not guaranteed to be in an orthogonal subspace. Alternatively, in a similar way to principal component regression, the simple components can be regressed onto an independent variable, in which case it is desirable for them to be uncorrelated.

Finding principal components sequentially is optimal, as the global solution is the solution to an eigenvalue problem. For simple components, however, this is not the case. Consequently, better solutions may be obtained by considering groups of simple components simultaneously, although the search space is then expanded. If $q$ is the number of components to consider together and $k$ the tuple size to consider across the $q$ components, then the number of comparisons is

$$3^{kq}\,\frac{p!}{(p-k)!\,k!} \tag{2.8}$$

as each $k$-tuple across $q$ components requires $3^{kq}$ comparisons. Sequentially there are

$$q\,3^{k}\,\frac{p!}{(p-k)!\,k!} \tag{2.9}$$

comparisons. So finding components simultaneously requires

$$\frac{p!\,\left(3^{qk} - q\,3^{k}\right)}{(p-k)!\,k!}$$

extra comparisons or

$$\frac{3^{k(q-1)}}{q}$$

times as many. For example, to find 3 simple components for a problem involving 20 variables, simultaneous enumeration requires $3^{20 \times 3} \approx 4 \times 10^{28}$ comparisons; simultaneous simple components taking pairs requires 138,510; and sequential simple components requires 5,130. Sequential enumeration would require $3 \times 3^{20}/2 \approx 5 \times 10^{9}$ comparisons. Clearly, full simultaneous enumeration is likely to be impractical. Sequential enumeration would be practical for small problems, and simultaneous simple components would be tractable for moderate-size problems. However, a sequential simple component approach may find solutions to large problems so long as these solutions are close to principal components in terms of the variance they explain and their orthogonality or correlation.
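The figures in this example can be reproduced directly from expressions (2.8) and (2.9). The following is a small arithmetic sketch; the function names are hypothetical.

```python
from math import comb


def sequential_cost(p, k, q):
    """Expression (2.9): q * 3^k * C(p, k)."""
    return q * 3 ** k * comb(p, k)


def simultaneous_cost(p, k, q):
    """Expression (2.8): 3^(k*q) * C(p, k)."""
    return 3 ** (k * q) * comb(p, k)


p, k, q = 20, 2, 3   # 20 variables, pairs, 3 components

seq = sequential_cost(p, k, q)
sim = simultaneous_cost(p, k, q)

print(seq)                      # 5130
print(sim)                      # 138510
print(sim - seq)                # extra comparisons: 133380
print(sim / seq)                # 27.0
print(f"{3 ** (p * q):.1e}")    # full simultaneous enumeration
print(f"{q * 3 ** p / 2:.1e}")  # sequential enumeration
```

With pairs and three components the simultaneous search costs 27 times the sequential one, while both remain many orders of magnitude below either form of enumeration.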

Input: Covariance matrix S (p × p); the number of components to find, q; the number of elements to consider simultaneously, k; the penalty weight λ.
Output: Simple component set C (q × p).

repeat
    Initialize the simple component {random, first principal component, vector of ones}
    Apply threshold to make it simple
    repeat
        Reorganize the vector and covariance into fixed and variable parts to reduce computation
        forall the possible patterns (see Table 2.4) for k variables do
            Evaluate the loss in Equation (2.13) and keep the best solution
        end
    until all k-combinations of p variables have been evaluated;
until all q components are extracted;

Algorithm 1: Outline of the basic greedy search approach to find simple components.
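A minimal Python sketch of the search outlined in Algorithm 1 is given below, under several stated assumptions. Equation (2.13) is not reproduced in this section, so the function `score` is a stand-in objective chosen for illustration (variance of the normalized component minus λ times the squared cosines with earlier components), not the loss used in the text. The sketch always starts from the vector of ones, omits the thresholding step and the reorganization into fixed and variable parts, and uses plain Python lists. All function names are hypothetical.

```python
import random
from itertools import combinations, product


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def quad_form(S, c):
    """Return c' S c for a list-of-lists matrix S."""
    n = len(c)
    return sum(c[i] * S[i][j] * c[j] for i in range(n) for j in range(n))


def score(S, c, found, lam):
    """Stand-in objective (an assumption, NOT Equation 2.13); larger is better."""
    norm2 = dot(c, c)
    if norm2 == 0:
        return float("-inf")  # the all-zero vector is not a valid component
    variance = quad_form(S, c) / norm2
    penalty = sum(dot(c, d) ** 2 / (norm2 * dot(d, d)) for d in found)
    return variance - lam * penalty


def greedy_simple_components(S, q, k, lam, seed=0):
    """Extract q simple components sequentially using k-tuple enumeration."""
    rng = random.Random(seed)
    p = len(S)
    found = []
    for _ in range(q):
        c = [1] * p                                # homogeneous starting vector
        best = score(S, c, found, lam)
        combos = list(combinations(range(p), k))
        rng.shuffle(combos)                        # visit tuples in random order
        for idx in combos:
            for pattern in product((-1, 0, 1), repeat=k):
                trial = list(c)
                for i, value in zip(idx, pattern):
                    trial[i] = value
                s = score(S, trial, found, lam)
                if s > best:                       # keep strict improvements only
                    c, best = trial, s
        found.append(c)
    return found


# Toy covariance with two correlated blocks (illustrative data, not from the text).
S = [[1.0, 0.9, 0.0, 0.0],
     [0.9, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.8],
     [0.0, 0.0, 0.8, 1.0]]
components = greedy_simple_components(S, q=2, k=2, lam=10.0)
print(components)
```

Each component costs $3^k\,p!/((p-k)!\,k!)$ evaluations of the objective, in line with the counts above. A single pass is shown; as noted earlier, for small $k$ the pass could be repeated from the best solution found.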

Returning to the example in Section 2.2, Table 2.5 shows the simple components found for $k = 3$ using a squared orthogonal penalty. This can be compared with those found by enumeration, Table 2.3. Notice that the first three components describe identical axes. However, the enumerated solutions explain slightly more variation overall. This is still encouraging, as the number of comparisons is substantially less than for full enumeration. Full enumeration of $q$ components requires $q(3^p - 1)/2$ comparisons. For these data this equates to 1,328,600 comparisons by enumeration compared to 29,700 using the simple component search. In this case the simple component solution is more sparse than after sequential enumeration. For instance, SC4 can be easily interpreted as a contrast between the freshness on application of the deodorant and the freshness after application together with feeling confident. SC5 contrasts freshness with gentle and pleasant. The performance of the sequential simple component algorithm is explored in the next section. If good solutions are found in polynomial time then there may be value for large problems.

                                       SC1   SC2   SC3   SC4   SC5
    Fresh deodorant                      1     1     0     1     1
    Fresh on application                -1    -1     0    -1     0
    Pleasant on skin during application -1    -1     0     0     1
    Gentle on skin                      -1     0     0     0     1
    Did not mark clothes                -1     1     1     0     0
    Did not leave white marks           -1     1     1     0     0
    Feeling fresh all day               -1    -1     0     1    -1
    Feeling confident                   -1    -1     0     1     0
    Sticky                               1    -1     1     0     0
    Greasy                               1    -1     1     0     0
    Wetness                              1    -1     0     0     0
    Coldness                             0    -1     0     0     1
    Variance explained                5.29  1.38  1.17  0.77   0.6

Table 2.5: The unscaled components found with the simple component algorithm ($k = 3$, squared orthogonal penalty).
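As a quick check on Table 2.5, the sketch below computes the inner products between the unscaled loading vectors and reproduces the two comparison counts quoted above. The vectors are transcribed from the table in row order (Fresh deodorant first, Coldness last); the variable names are hypothetical.

```python
from math import comb

# Columns of Table 2.5, one list per simple component, in the table's row order.
loadings = {
    "SC1": [1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 0],
    "SC2": [1, -1, -1, 0, 1, 1, -1, -1, -1, -1, -1, -1],
    "SC3": [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "SC4": [1, -1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "SC5": [1, 0, 1, 1, 0, 0, -1, 0, 0, 0, 0, 1],
}


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


names = list(loadings)
gram = [[dot(loadings[a], loadings[b]) for b in names] for a in names]
for name, row in zip(names, gram):
    print(name, row)   # diagonal = number of non-zero loadings

p, q, k = 12, 5, 3
enumeration = q * (3 ** p - 1) // 2          # full sequential enumeration
tuple_search = q * 3 ** k * comb(p, k)       # simple component search, k = 3
print(enumeration, tuple_search)             # 1328600 29700
```

If the table is transcribed correctly, every off-diagonal entry is zero, so the five unscaled loading vectors are mutually orthogonal, and SC3 and SC4 each involve only four of the twelve variables.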
