Correlation between a Grouping and a Feature

4.4 Correlations and Problem Definition

4.4.1 Correlation between a Grouping and a Feature

If a grouping $G^F_i$ is correlated with a feature $f_j$, objects in the same object subset of the partition tend to have similar $f_j$ values, while objects from different object subsets tend to have different $f_j$ values. We use the data matrix in Figure 4.2 as an example. Consider feature $f_3$ and two groupings indicated by trees $T^{\{f_1\}}$ and $T^{\{f_2\}}$ in Figure 4.3:

• $G_8^{\{f_1\}} = \{\{s_2, s_3\}, \{s_4, s_5\}, \{s_7, s_8\}\}$

• $G_1^{\{f_2\}} = \{\{s_2, s_5\}, \{s_6, s_8\}, \{s_3, s_4\}\}$

The $f_3$ values of these objects are plotted in Figure 4.4, using different markers to represent objects in different subsets of a grouping. As the plot shows, grouping $G_8^{\{f_1\}}$ is more correlated with $f_3$ than $G_1^{\{f_2\}}$.

We use normalized RSS (residue squared sum of error) and BIC (Bayesian information criterion) to measure the correlation between a grouping and a feature.

Figure 4.4: Correlation: feature $f_3$ and groupings $G_8^{\{f_1\}}$, $G_1^{\{f_2\}}$

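Given a grouping $G^F_i = \{S_1, S_2, \ldots, S_u\}$ with $S = \bigcup_{l=1}^{u} S_l$ and a feature $f_j$, we calculate the total variance $SST$, the variance between object subsets $SSB$, and the variance inside object subsets $SSE$ as follows:

$$M(S_l) = \frac{1}{|S_l|} \sum_{s_k \in S_l} f_j(s_k), \qquad MM = \frac{1}{|S|} \sum_{s_k \in S} f_j(s_k) \tag{4.1}$$

$$SSB = \sum_{l=1}^{u} |S_l| \, (M(S_l) - MM)^2 \tag{4.2}$$

$$SSE = \sum_{l=1}^{u} \sum_{s_k \in S_l} (f_j(s_k) - M(S_l))^2 \tag{4.3}$$

$$SST = \sum_{s_k \in S} (f_j(s_k) - MM)^2 = SSE + SSB \tag{4.4}$$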
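The decomposition in Equations 4.1 to 4.4 can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `variance_decomposition` and the feature values are hypothetical (the actual data matrix of Figure 4.2 is not reproduced here), and each object subset is represented simply as a list of its feature values.

```python
from typing import Sequence, Tuple


def variance_decomposition(
    subsets: Sequence[Sequence[float]],
) -> Tuple[float, float, float]:
    """Return (SSB, SSE, SST) for feature values grouped by object subset."""
    all_values = [v for subset in subsets for v in subset]
    grand_mean = sum(all_values) / len(all_values)        # MM in Eq. 4.1
    ssb = 0.0
    sse = 0.0
    for subset in subsets:
        mean = sum(subset) / len(subset)                  # M(S_l) in Eq. 4.1
        ssb += len(subset) * (mean - grand_mean) ** 2     # Eq. 4.2
        sse += sum((v - mean) ** 2 for v in subset)       # Eq. 4.3
    sst = sum((v - grand_mean) ** 2 for v in all_values)  # Eq. 4.4
    return ssb, sse, sst


# Hypothetical feature values for three object subsets (an assumption,
# not the values plotted in Figure 4.4).
grouping = [[1.0, 1.2], [3.0, 3.4], [6.0, 6.2]]
ssb, sse, sst = variance_decomposition(grouping)

# The identity SST = SSE + SSB from Eq. 4.4 holds up to rounding error.
assert abs(sst - (sse + ssb)) < 1e-9
```

Computing $SST$ directly from the grand mean, rather than as $SSE + SSB$, lets the final assertion act as a check on the identity in Equation 4.4.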

Then we get the normalized RSS

$$RSS(G^F_i, f_j) = \frac{SSE}{SST} = \frac{SSE}{SSE + SSB} \tag{4.5}$$

For example, for the two groupings and the feature in Figure 4.4, the RSS scores are

• $RSS(G_8^{\{f_1\}}, f_3) = \frac{3.57}{28.93} = 0.12$

• $RSS(G_1^{\{f_2\}}, f_3) = \frac{33.14}{34.14} = 0.97$

As we can see, a lower RSS value implies a stronger correlation.
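To make the interpretation of the score concrete, the following sketch computes the normalized RSS of Equation 4.5 for two groupings. The function name `normalized_rss` and all feature values are assumptions made for illustration. The first grouping puts similar values together and the second mixes them.

```python
from typing import Sequence


def normalized_rss(subsets: Sequence[Sequence[float]]) -> float:
    """Normalized RSS = SSE / SST (Eq. 4.5) for grouped feature values."""
    values = [v for subset in subsets for v in subset]
    grand_mean = sum(values) / len(values)
    sst = sum((v - grand_mean) ** 2 for v in values)
    if sst == 0.0:
        # All feature values are identical, so the ratio is undefined.
        raise ValueError("SST is zero; normalized RSS is undefined")
    sse = 0.0
    for subset in subsets:
        mean = sum(subset) / len(subset)
        sse += sum((v - mean) ** 2 for v in subset)
    return sse / sst


# Hypothetical feature values (not the data of Figure 4.2).
coherent = [[1.0, 1.2], [3.0, 3.4], [6.0, 6.2]]  # similar values share a subset
mixed = [[1.0, 6.2], [1.2, 6.0], [3.0, 3.4]]     # dissimilar values share a subset

rss_coherent = normalized_rss(coherent)
rss_mixed = normalized_rss(mixed)

# The coherent grouping leaves little within-subset variance, so its score
# is close to 0, while the mixed grouping scores close to 1.
assert rss_coherent < rss_mixed
```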

Property 4.4.1. Monotonicity of RSS: Given two partitions $G^F_i$ and $G^F_j$, if $G^F_j$ is a child (finer) partition of $G^F_i$, then for any feature $f_u$ ($f_u \in \mathcal{F} - F$), we have $RSS(G^F_i, f_u) \geq RSS(G^F_j, f_u)$.

Proof: Because $G^F_j$ is a child (finer) partition of $G^F_i$, $G^F_i$ and $G^F_j$ contain the same set of objects. Therefore, we have

$$SST(G^F_i, f_u) = SST(G^F_j, f_u)$$

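For each subset $S^i_l$ in $G^F_i$:

• If $\exists\, S^j_t \in G^F_j$ such that $S^i_l = S^j_t$, then

$$\sum_{s_k \in S^i_l} (f_u(s_k) - M(S^i_l))^2 = \sum_{s_k \in S^j_t} (f_u(s_k) - M(S^j_t))^2$$

• If $S^i_l$ is partitioned into $\{S^j_1, S^j_2, \ldots, S^j_w\}$ in $G^F_j$, then

$$\sum_{s_k \in S^i_l} (f_u(s_k) - M(S^i_l))^2 = \sum_{x=1}^{w} \sum_{s_k \in S^j_x} (f_u(s_k) - M(S^j_x))^2 + \sum_{x=1}^{w} |S^j_x| \, (M(S^j_x) - M(S^i_l))^2$$

Thus,

$$\sum_{s_k \in S^i_l} (f_u(s_k) - M(S^i_l))^2 \geq \sum_{x=1}^{w} \sum_{s_k \in S^j_x} (f_u(s_k) - M(S^j_x))^2$$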

Therefore, according to Equation 4.3, we have

$$SSE(G^F_i, f_u) \geq SSE(G^F_j, f_u)$$

Since $RSS = \frac{SSE}{SST}$, we get

$$RSS(G^F_i, f_u) \geq RSS(G^F_j, f_u)$$

According to Property 4.4.1, if we measure correlation using RSS only, we will find that the finest groupings always have the lowest RSS score and hence the strongest correlation. To correct this bias of favoring finest groupings, we normalize the score by the number of subsets in the grouping since a finer grouping must contain a larger number of object subsets.
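Property 4.4.1 and the bias it causes can be checked numerically. The sketch below uses hypothetical feature values and a chain of three groupings of the same objects, each a refinement of the previous one. The name `normalized_rss` is an illustrative helper, not part of the original work.

```python
from typing import List


def normalized_rss(subsets: List[List[float]]) -> float:
    """Normalized RSS = SSE / SST (Eq. 4.5)."""
    values = [v for subset in subsets for v in subset]
    grand_mean = sum(values) / len(values)
    sst = sum((v - grand_mean) ** 2 for v in values)
    sse = 0.0
    for subset in subsets:
        mean = sum(subset) / len(subset)
        sse += sum((v - mean) ** 2 for v in subset)
    return sse / sst


# Hypothetical values; every grouping covers the same six objects and
# each one refines the grouping before it.
coarse = [[1.0, 1.2, 3.0, 3.4], [6.0, 6.2]]
finer = [[1.0, 1.2], [3.0, 3.4], [6.0, 6.2]]
finest = [[1.0], [1.2], [3.0], [3.4], [6.0], [6.2]]

scores = [normalized_rss(g) for g in (coarse, finer, finest)]

# RSS never increases along the refinement chain (Property 4.4.1) ...
assert scores[0] >= scores[1] >= scores[2]
# ... and the all-singleton grouping trivially reaches zero, because every
# object equals its own subset mean.
assert scores[2] == 0.0
```

The last assertion shows why RSS alone is not a usable criterion: the grouping that places every object in its own subset always attains the minimum, regardless of the feature.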

We utilize BIC (Bayesian information criterion; Schwarz, 1978) to define the correlation between a grouping and a feature, taking into account RSS, the number of subsets and also the total number of objects in the grouping.

Definition 4.4.1. BIC Correlation between a Grouping and a Feature: Given grouping $G^F_i$ and feature $f_j$, $f_j \in \mathcal{F} - F$, the correlation between them is defined as

$$C(G^F_i, f_j) = \log(RSS(G^F_i, f_j)) + u \cdot \frac{\log(|S|)}{|S|} \tag{4.6}$$

in which $u$ is the number of object subsets in the grouping and $|S|$ is the total number of objects in the grouping.

A lower $C(G^F_i, f_j)$ value indicates a stronger correlation between grouping $G^F_i$ and feature $f_j$. For example, the correlation scores in Figure 4.4 are

• $C(G_8^{\{f_1\}}, f_3) = \log(0.12) + 3 \cdot \frac{\log(6)}{6} = -1.76$

• $C(G_1^{\{f_2\}}, f_3) = \log(0.97) + 3 \cdot \frac{\log(6)}{6} = 1.25$

The correlation score between $G_8^{\{f_1\}}$ and $f_3$ is much lower than the score between $G_1^{\{f_2\}}$ and $f_3$.
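The scores quoted above can be recomputed with a short sketch of Equation 4.6. The text does not state the base of the logarithm; base 2 is assumed here because it reproduces the quoted values to within about 0.01 when the two-decimal RSS values are used as input. The function name `bic_correlation` is hypothetical.

```python
import math


def bic_correlation(rss: float, u: int, n: int) -> float:
    """BIC correlation of Eq. 4.6: log(RSS) + u * log(|S|) / |S|.

    rss: normalized RSS of the grouping (must be > 0)
    u:   number of object subsets in the grouping
    n:   total number of objects |S| in the grouping
    The base-2 logarithm is an assumption, see the note above.
    """
    return math.log2(rss) + u * math.log2(n) / n


# Both groupings in Figure 4.4 have 3 subsets and 6 objects.
c_g8 = bic_correlation(0.12, u=3, n=6)
c_g1 = bic_correlation(0.97, u=3, n=6)

print(f"C(G8, f3) = {c_g8:.3f}")
print(f"C(G1, f3) = {c_g1:.3f}")

# The lower score identifies the grouping more correlated with f3.
assert c_g8 < c_g1
```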

Incorporating $u$ (the number of subsets in a grouping) into Equation 4.6 avoids the case where the finest groupings always have the lowest score. When two groupings have similar RSS scores, the BIC correlation favors the one containing fewer object subsets. Consider two groupings used as examples earlier:

• $G_8^{\{f_1\}} = \{\{s_2, s_3\}, \{s_4, s_5\}, \{s_7, s_8\}\}$

• $G_6^{\{f_1\}} = \{\{s_2, s_3\}, \{s_4, s_5, s_7, s_8\}\}$

$G_8^{\{f_1\}}$ is a finer grouping of $G_6^{\{f_1\}}$. Their RSS values are similar:

• $RSS(G_8^{\{f_1\}}, f_3) = 0.12$

• $RSS(G_6^{\{f_1\}}, f_3) = 0.13$

However, since $G_6^{\{f_1\}}$ contains fewer subsets, its BIC correlation score is lower than that of $G_8^{\{f_1\}}$:

• $C(G_8^{\{f_1\}}, f_3) = -1.76$

• $C(G_6^{\{f_1\}}, f_3) = -2.08$

The reason for incorporating $|S|$ (the total number of objects, or samples, in a partition) into Equation 4.6 is straightforward: a correlation supported by a larger number of objects is preferred over one supported by fewer objects.
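The effect of $|S|$ can be seen by isolating the second term of Equation 4.6. The sketch below is an illustration under assumptions: the helper name `size_penalty` is hypothetical, and the base-2 logarithm is assumed because the text does not state the base. For a fixed number of subsets, the term shrinks as the number of objects grows (for $|S| \geq 3$, since $\log(n)/n$ is decreasing there), so a grouping backed by more objects receives a lower score at equal RSS.

```python
import math


def size_penalty(u: int, n: int) -> float:
    """Second term of Eq. 4.6: u * log(|S|) / |S| (base 2 assumed)."""
    return u * math.log2(n) / n


# Same number of subsets, increasing number of supporting objects.
penalties = [size_penalty(3, n) for n in (6, 12, 24, 48)]

for n, p in zip((6, 12, 24, 48), penalties):
    print(f"|S| = {n:2d}: penalty = {p:.3f}")

# The penalty strictly decreases as more objects support the grouping.
assert all(a > b for a, b in zip(penalties, penalties[1:]))
```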

As we discussed in Section 4.3, each tree hierarchy $T^F$ implies a set of groupings $\{G^F\}$. Similar to TreeQA, we define the correlation between tree $T^F$ and feature $f_i$, $f_i \in \mathcal{F} - F$, as the strongest correlation achieved between a grouping in $\{G^F\}$ and feature $f_i$.

Definition 4.4.2. Correlation between a Tree and a Feature: Given tree $T^F$ and feature $f_i$, $f_i \in \mathcal{F} - F$, the correlation between them is

$$C(T^F, f_i) = \min\{\, C(G^F_j, f_i) \mid G^F_j \in \{G^F\} \,\} \tag{4.7}$$
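Equation 4.7 amounts to scoring every grouping implied by a tree and keeping the minimum. The sketch below illustrates this under several assumptions: the groupings are given explicitly as lists of feature values rather than derived from a tree, the values are hypothetical, the logarithm is taken to base 2, and a grouping with zero RSS is assigned a score of negative infinity. None of these choices is confirmed by the text.

```python
import math
from typing import List

Grouping = List[List[float]]  # feature values of each object subset


def bic_correlation(subsets: Grouping) -> float:
    """C of Eq. 4.6 for one grouping (base-2 logarithm assumed)."""
    values = [v for subset in subsets for v in subset]
    n = len(values)
    grand_mean = sum(values) / n
    sst = sum((v - grand_mean) ** 2 for v in values)
    sse = 0.0
    for subset in subsets:
        mean = sum(subset) / len(subset)
        sse += sum((v - mean) ** 2 for v in subset)
    rss = sse / sst
    if rss == 0.0:
        return -math.inf  # assumption: treat a perfect fit as the lowest score
    return math.log2(rss) + len(subsets) * math.log2(n) / n


def tree_correlation(groupings: List[Grouping]) -> float:
    """C(T, f) of Eq. 4.7: the lowest score over the tree's groupings."""
    return min(bic_correlation(g) for g in groupings)


# Hypothetical groupings standing in for those implied by one tree.
groupings = [
    [[1.0, 1.2, 3.0, 3.4], [6.0, 6.2]],
    [[1.0, 1.2], [3.0, 3.4], [6.0, 6.2]],
    [[1.0, 1.2], [3.0, 3.4]],
]

best = tree_correlation(groupings)
assert all(best <= bic_correlation(g) for g in groupings)
```

Returning only the minimum score mirrors Definition 4.4.2. A variant that also returns the grouping attaining the minimum would be a natural extension when the grouping itself is of interest.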

In document Efficient algorithms in analyzing genomic data (Page 78-83)