CHAPTER 6: USING PRINCIPAL RAYS TO MODEL PCA IN TREE SPACE
6.3 The First Principal Ray Sets
Up to this point we have concentrated on finding a good single ray. However, a single ray is not an effective principal component when it captures only a small proportion of the variation, which happens when a large portion of the data points do not project positively onto the ray. In this section we define the analog of the first principal component as a set of rays, called the first principal ray set (1st PR set). Of course, we want the set of rays to have properties that make it useful for PCA-type analysis. A major concern about using a set of rays as a principal component is the uniqueness of projections: if a whole set of rays is considered as one principal component, then the projection of any data tree onto this set should be unique. To address this issue, we need to introduce the concept of antipodality between rays.
Definition 6.3.1. In $\mathcal{T}_n$, two rays $\vec{r}_1$ and $\vec{r}_2$ are antipodal if any two trees $T_1 \in \vec{r}_1$ and $T_2 \in \vec{r}_2$ are antipodal.
Now we can define a mutually antipodal ray set (MAR set) as a maximal set of rays that are pairwise antipodal. One nice property of antipodality is that the projection of any data tree onto a MAR set is always unique. This property is stated formally in the following lemma.
Lemma 6.3.1. Given a MAR set $R$, any tree $T$ projects positively onto at most one ray of $R$.
Proof. We use the fact that the angle sum of any triangle in a CAT(0) space is no larger than $180^\circ$. Suppose that $T$ has positive projections $P$ and $Q$ onto two rays $\vec{r}_1$ and $\vec{r}_2$ of $R$, respectively, and let $O$ denote the origin (see Figure 6.38). Then $\angle TPO = 90^\circ$ and $\angle TQO = 90^\circ$. By the above fact, we have $\angle TOP + \angle OTP \le 90^\circ$ and $\angle TOQ + \angle OTQ \le 90^\circ$, hence $\angle TOP < 90^\circ$ and $\angle TOQ < 90^\circ$. Therefore $\angle POQ \le \angle TOP + \angle TOQ < 180^\circ$, which contradicts the fact that $\vec{r}_1$ and $\vec{r}_2$ are antipodal.
Figure 6.38: This plot shows that there will be a contradiction if tree $T$ projects positively onto two antipodal rays $\vec{r}_1$ and $\vec{r}_2$.
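As a sanity check on the lemma, the following minimal sketch works in the Euclidean plane rather than in tree space. This is an assumption made purely for illustration: in the plane, two rays from the origin are antipodal exactly when they point in opposite directions. The helper name `positive_projection` is hypothetical and does not come from the text.

```python
import math

def positive_projection(x, u):
    """Length of the projection of point x onto the ray {t*u : t >= 0}.

    Returns 0.0 when x does not project positively onto the ray.
    Euclidean stand-in for tree space (an illustrative assumption).
    """
    norm_u = math.hypot(*u)
    dot = sum(xi * ui for xi, ui in zip(x, u))
    return max(0.0, dot / norm_u)

# Two antipodal rays in the plane: directions u and -u.
u = (1.0, 2.0)
minus_u = (-1.0, -2.0)

points = [(3.0, 1.0), (-2.0, 0.5), (0.0, -4.0), (2.0, -1.0)]
for x in points:
    p1 = positive_projection(x, u)
    p2 = positive_projection(x, minus_u)
    # At most one of the two projections can be strictly positive.
    assert not (p1 > 0 and p2 > 0)
```

In this flat setting the claim is immediate because the two inner products have opposite signs. The lemma extends the same conclusion to tree space through the CAT(0) angle argument.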
Due to this nice property of MAR sets, it is natural to define the first principal ray set as the best MAR set in terms of capturing data variation.
Definition 6.3.2. Given a set of trees $T = \{T_1, T_2, \dots, T_r\}$ in $\mathcal{T}_n$, the first principal ray set (1st PR set) is defined as the MAR set which has the largest sum of squared projections from $T$.
Now we are ready to modify the optimization formulation defined by (6.12) and (6.13) in order to search for the 1st PR set. Suppose we look for a 1st PR set containing at most $m$ rays. Then there are $m$ variable trees $\tau_1, \dots, \tau_m$, and to enforce antipodality, a large penalty is added to the objective function for any pair of rays that is not antipodal. The modified formulation is as follows.
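$$\text{Maximize} \quad \sum_{j=1}^{m} \sum_{l=1}^{r} \left[ \frac{\left( \sum_{e \in C^{(j,l)}} |e|_{\tau_j} \, |e|_{T^{(j,l)}} - \sum_{i=1}^{k^{(j,l)}} \|A_i^{(j,l)}\| \, \|B_i^{(j,l)}\| \right)_{+}}{\|\tau_j\|} \right]^{2} - P(\tau_1, \dots, \tau_m) \tag{6.14}$$

$$\text{Subject to} \quad \tau_j \in V_j(T; O; A_j^T, B_j^T), \quad \text{for all } 1 \le j \le m, \tag{6.15}$$

where

$$P(\tau_1, \dots, \tau_m) = H \sum_{1 \le i < j \le m} \left( \|\tau_i\| + \|\tau_j\| - L(\tau_i, \tau_j) \right)$$

is the penalty function, and $H$ is a large positive constant chosen to prevent any pair of rays from being non-antipodal. However, if the number of rays $m$ is large, it can be computationally expensive to apply the steepest descent algorithm, especially when multiple rays approach the orthant boundaries at the same time. For this reason, we propose two simple heuristics. The first heuristic proceeds as follows.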
1. Obtain a set $\Omega$ of $n$ locally optimal rays by solving the single ray problem defined in (6.12) and (6.13) from $n$ different starting points.
2. Sort the rays in $\Omega$ in descending order of the proportion of data variation captured, and label the rays from 1 to $n$.
3. Construct the $i$th candidate MAR set $\omega_i$ by starting from ray $i$ in the sorted $\Omega$ and adding rays with labels larger than $i$ to $\omega_i$ as long as $\omega_i$ remains a MAR set.
4. Choose the set with the largest total proportion of data variation among $\omega_1, \dots, \omega_n$ to be the approximate 1st PR set.
This heuristic is easy to implement because it completely avoids solving the penalized optimization problem defined in (6.14) and (6.15). However, one clear drawback is that the accuracy of the heuristic highly depends on the set Ω obtained in step 1.
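Steps 2 to 4 of this heuristic can be sketched as follows, assuming step 1 has already produced the rays. The function name `approximate_first_pr_set` and the two callbacks are hypothetical: `captured(ray)` is assumed to return the proportion of variation a ray captures, and `antipodal(a, b)` is assumed to test Definition 6.3.1. Neither computation is implemented here. The toy rays at the end are plain labels with a hand-written antipodality relation, not trees.

```python
from typing import Callable, Hashable, Sequence

def approximate_first_pr_set(
    rays: Sequence[Hashable],
    captured: Callable[[Hashable], float],
    antipodal: Callable[[Hashable, Hashable], bool],
) -> list:
    """Steps 2-4 of the first heuristic, given the rays from step 1."""
    # Step 2: sort by proportion of variation captured, largest first.
    ordered = sorted(rays, key=captured, reverse=True)

    best_set, best_total = [], float("-inf")
    for i, start in enumerate(ordered):
        # Step 3: grow the i-th candidate set using only later-ranked rays.
        candidate = [start]
        for ray in ordered[i + 1:]:
            if all(antipodal(ray, member) for member in candidate):
                candidate.append(ray)
        # Step 4: keep the candidate with the largest total variation.
        total = sum(captured(ray) for ray in candidate)
        if total > best_total:
            best_set, best_total = candidate, total
    return best_set

# Toy input (hypothetical): four labelled rays and their antipodal pairs.
scores = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.1}
pairs = {frozenset(p) for p in [("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]}
result = approximate_first_pr_set(
    list(scores), scores.get, lambda x, y: frozenset((x, y)) in pairs
)
print(result)  # ['a', 'c']
```

In the toy input the candidate started from `a` captures about 0.7 in total, which beats the larger candidate `b, c, d` at about 0.6. Rays ranked above the starting ray are never revisited, so the result is only as good as the pool from step 1.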
The second heuristic is a sequential greedy algorithm consisting of the following steps:
1. Start with a ray obtained by solving the single ray problem defined in (6.12) and (6.13), and label this ray $R_1$.
2. Suppose we already have a set of rays $R_1, \dots, R_k$. To search for the next ray $R_{k+1}$, solve the optimization problem defined in (6.14) and (6.15) with the penalty function
$$P(\tau_{k+1}) = H \sum_{1 \le i \le k} \left( \|\tau_i\| + \|\tau_{k+1}\| - L(\tau_i, \tau_{k+1}) \right), \quad \text{where } \tau_i \in R_i.$$
3. Repeat Step 2 until no more rays can be added to increase the total variation captured.
Although this heuristic solves the penalized optimization problem, it only deals with one variable tree at a time, hence it is easier to implement than the original formulation.
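The control flow of the second heuristic can be illustrated with a deliberately simplified sketch. It replaces the continuous optimization in step 2 with a search over a finite pool of candidate rays, which is a swapped-in simplification and not the formulation in (6.14) and (6.15). All inputs are hypothetical precomputed quantities: `gain[c]` stands for the variation captured by candidate `c` and is assumed to be additive across selected rays, `norm[c]` stands for $\|\tau_c\|$, and `length[a][b]` stands for $L(\tau_a, \tau_b)$.

```python
def penalty(new, chosen, norm, length, H):
    """P(tau_new) = H * sum_i (||tau_i|| + ||tau_new|| - L(tau_i, tau_new))."""
    return H * sum(norm[i] + norm[new] - length[i][new] for i in chosen)

def greedy_ray_set(gain, norm, length, H=1e3):
    """Sequential greedy selection over a finite pool of candidate rays.

    Simplification (not the source's method): step 2's continuous
    optimization is replaced by picking the best remaining pool member.
    """
    if not gain:
        return []
    remaining = set(range(len(gain)))
    # Step 1: start from the best single ray (no penalty applies yet).
    first = max(remaining, key=lambda c: gain[c])
    chosen = [first]
    remaining.remove(first)

    while remaining:
        # Step 2: maximize captured variation minus the antipodality penalty.
        def value(c):
            return gain[c] - penalty(c, chosen, norm, length, H)
        best = max(remaining, key=value)
        # Step 3: stop once no candidate increases the penalized total.
        if value(best) <= 0:
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy pool (hypothetical numbers): candidates 0 and 1, and 1 and 2, satisfy
# L = ||tau_i|| + ||tau_j||, so their penalty term is zero; 0 and 2 do not.
gain = [5.0, 3.0, 4.0]
norm = [1.0, 1.0, 1.0]
length = [[0.0, 2.0, 1.2], [2.0, 0.0, 2.0], [1.2, 2.0, 0.0]]
selected = greedy_ray_set(gain, norm, length)
print(selected)  # [0, 1]
```

In the toy pool, candidate 2 has a larger gain than candidate 1 but incurs a penalty of 800 against candidate 0, so it is never added. The penalty term vanishes whenever $L(\tau_i, \tau_j) = \|\tau_i\| + \|\tau_j\|$, and the constant $H$ controls how strongly other pairs are discouraged.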