    Define $\mathrm{Count}_i(x, m) = \sum_{p=1}^{50} \mathrm{Count}_i(x, m, p)$
    Sort $\mathrm{Count}_i(x, m)$ in descending order
    Take the top N elements in the sorted $\mathrm{Count}_i(x, m)$
ENDFOR
Sequence $S_x$ matches $S_i$ iff $\mathrm{Count}_i(x, m)$ is in the top N positions for all $m = 1, \ldots, 6$.
There is one missing detail in the above description. If the value of any element in the sorted $\mathrm{Count}_i(x, m)$ list is less than C% of the first element, for some parameter C, this element and all elements after it in the sorted list of length N are excluded.
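The final steps of the matching procedure, including the C% cutoff, can be sketched in Python as follows. This is only an illustrative sketch: the function names, the nested list/dictionary layout of the counts, and the choice to apply the cutoff after truncating to the top N are assumptions, not the original implementation.

```python
from typing import Dict, List, Set


def top_matches(counts: Dict[int, float], n: int, c_percent: float) -> List[int]:
    """Rank candidates by count, keep the top n, then apply the C% cutoff."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if not ranked:
        return []
    cutoff = c_percent / 100.0 * ranked[0][1]  # C% of the first (largest) element
    kept = []
    for seq_id, value in ranked:
        if value < cutoff:
            break  # this element and all elements after it are excluded
        kept.append(seq_id)
    return kept


def match_set(raw: List[Dict[int, List[int]]], n: int, c_percent: float) -> Set[int]:
    """raw[m][x] is assumed to hold the per-p values Count_i(x, m, p).

    A sequence x matches only if it survives the top-n selection for every m.
    """
    survivors = []
    for per_x in raw:
        summed = {x: sum(values) for x, values in per_x.items()}  # sum over p
        survivors.append(set(top_matches(summed, n, c_percent)))
    return set.intersection(*survivors) if survivors else set()
```

Here `n` plays the role of N and `c_percent` the role of C; both are parameters whose values the text does not fix.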
Unsupervised Face Clustering

We have now shown how to compare each sequence $S_i$ against the rest of the data set $R = \{S_x \mid x \neq i,\ x = 1, \ldots, N\}$ to produce an output set $M_i$ containing the matches for $S_i$. As shown in Figure 5-7, this comparison is carried out for every sequence in the data set. If the output set $M_i$ is not empty, the system combines $S_i$ and each element $S_x$ in $M_i$ into a cluster. This process is repeated for each $M_i$ from each sequence $S_i$. The clustering step is performed greedily: if any two clusters contain matching elements, the two clusters are merged.
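One common way to realize this kind of greedy merging is a union-find (disjoint-set) structure. The sketch below is written under that assumption; the text does not say how the merging is implemented, and the name `greedy_cluster` and the dictionary input are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cluster(match_sets: Dict[int, Set[int]]) -> List[Set[int]]:
    """match_sets[i] is the output set M_i (ids of sequences matching S_i).

    Sequences with an empty M_i that appear in no other match set stay
    unclustered and are left out of the result.
    """
    parent: Dict[int, int] = {}

    def find(a: int) -> int:
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Combine S_i with every element of M_i; clusters that share an
    # element end up under the same root, i.e. they are merged.
    for i, matches in match_sets.items():
        for x in matches:
            union(i, x)

    clusters: Dict[int, Set[int]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=min)
```

Because merging is transitive, a single false match between two individuals is enough to join their clusters.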
Figure 5-6: The Face Sequence Matching Algorithm. This matching procedure receives one sequence as input and produces an output match set containing one or more sequences. The intuition behind this algorithm is that if two sequences belong to the same individual, there will be multiple occurrences of similar local image patches between these two sequences.
Figure 5-7: The Face Sequence Clustering Procedure. Given a data set containing t sequences, each sequence is compared to the rest of the sequences in the data set. The sequence matching algorithm produces an output set of matches, which are greedily merged such that any two clusters containing matching elements will be merged together.
false recognition, but in an incremental setup, the latter may get fixed if the robot acquires more data from the corresponding individual. Moreover, given the robot’s greedy clustering mechanism, the merging failure has a compounding effect over time.
In order to reflect how the robot’s clustering mechanism performs with respect to both of these failure modes, we opted for a set of metrics. Given that an individual P has a set of sequences $X = \{S_i \mid i = 0, \ldots, X_{size}\}$ in the training set, we define the following categories:
• Perfect cluster: if the system forms one cluster containing all elements in P’s sequence set X.
• Good cluster: if the system forms one cluster containing $\{S_j \mid j = 0, \ldots, M\}$, $M < X_{size}$, and leaves the rest as singletons. Note that the perfect cluster category is a subset of the good cluster category.
• Split cluster: if the system splits the elements of X into multiple clusters.
• Merged cluster: if the system combines sequences from one or more other individuals with any sequences inside X into a cluster. Note that the set of sequences labeled as non-faces is treated as if it were an individual. Thus, merging a non-face sequence into any cluster will be penalized in the same way.
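The four categories can be illustrated with a small Python sketch. The function name and data layout are hypothetical. Two assumptions are made that the text does not settle: a case that is both merged and split is reported as merged, and an individual whose sequences appear in no cluster is reported as "none".

```python
from typing import Dict, Hashable, List, Set


def classify_individual(
    person: Hashable,
    labels: Dict[int, Hashable],
    clusters: List[Set[int]],
) -> str:
    """labels maps sequence id -> ground-truth individual (non-faces get their
    own label); clusters lists the multi-sequence clusters the system formed."""
    own = {s for s, who in labels.items() if who == person}  # the set X
    touching = [c for c in clusters if c & own]
    if not touching:
        return "none"  # nothing from X was clustered
    if any(labels[s] != person for c in touching for s in c):
        return "merged"  # a cluster mixes X with another individual
    if len(touching) > 1:
        return "split"  # X is spread over several pure clusters
    # Exactly one pure cluster: perfect if it covers all of X, else good.
    return "perfect" if touching[0] == own else "good"
```

Since the perfect category is a subset of the good category, a caller counting good clusters would count both the "perfect" and "good" results.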
We also define some additional metrics for analyzing the split and merged clusters:
• Split degree: the proportion of sequences held by the largest of the A clusters in a split case.
• Merged purity: the proportion of sequences in a merged cluster that belong to the individual I who holds the majority of the sequences in that cluster.
• Merged maximum: the maximum number of individuals merged together in a cluster over all of the merged cases.
The split degree and merged purity provide some information about the severity of a split or merged failure. A high split degree corresponds to a lower severity, as this means the clustering still successfully forms a significant cluster of an individual instead of many small ones. A high merged purity also corresponds to a lower severity, since it reflects cases where an individual still holds a significant majority of a cluster.
If the merged purity value is very high, the few bad sequences may not be well represented and will not significantly affect recognition performance.
The merged maximum may provide more information than the total number of merged cases when a large number of sequences are falsely merged together into a single cluster. This would yield only one merged case, and thus its severity would not be reflected by the total number of merged cases.
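As a rough illustration, the three severity metrics could be computed as in the Python sketch below. The function names are hypothetical, and the denominator of the split degree (the total number of the individual's sequences across the A clusters) is an assumption, since the text does not spell it out.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def split_degree(split_clusters: List[Set[int]]) -> float:
    """Share of the split individual's clustered sequences in the largest cluster."""
    sizes = [len(c) for c in split_clusters]
    return max(sizes) / sum(sizes)


def merged_purity(cluster: Set[int], labels: Dict[int, Hashable]) -> float:
    """Share of a merged cluster held by its majority individual."""
    counts = Counter(labels[s] for s in cluster)
    return counts.most_common(1)[0][1] / len(cluster)


def merged_maximum(merged_clusters: List[Set[int]], labels: Dict[int, Hashable]) -> int:
    """Largest number of distinct individuals found in any one merged cluster."""
    return max((len({labels[s] for s in c}) for c in merged_clusters), default=0)
```

For example, a split into clusters of 3 and 1 sequences gives a split degree of 0.75, and a 4-sequence cluster with 3 sequences from one individual gives a merged purity of 0.75.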
Given an unlabelled training set containing face sequences from M individuals (according to ground truth), we then measure the clustering performance by the following measurements:
• Number of people: the number of individuals who have at least 2 sequences in the data set
• None: the number of individuals whose sequences did not get clustered at all
• Perfect: the number of perfect clusters
• Good: the number of good clusters, which also includes the perfect clusters
• Split: the number of split clusters
• Split degree: a distribution of the split degrees of all the split cases
• Merged purity: a distribution of the merged purity of all the merged cases
• Merged maximum: the maximum number of individuals merged together in a cluster from all the merged cases