Complexity concepts
III. 2.4.2.2 Bound methods: PRISM approach
These methods count the number of samples in the overlap region of the sample distributions. The counting is done in a controlled way: the boundaries are modified so that the overlap region progressively decays. The trajectory of this count is then examined to determine the separability of the classes. Information regarding class separation can be obtained from two sources:
1. Value of the trajectory.
2. Shape of the trajectory.
We should also mention Pierson’s measure of classification complexity, called the Overlap Sum (Pierson 1998). The Overlap Sum is the arithmetic mean of the number of overlapped points over the progressive collapsing iterations. This criterion does not require exact knowledge of the distributions and is easy to compute. It has been shown (Pierson 1998) to correlate strongly with the Bayes error (Rybnik 2004). Furthermore, in terms of computational complexity, it has an advantage over kNN (Bouyoucef 2007).
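To make the idea of a progressively collapsing overlap region concrete, the following is a minimal sketch and not Pierson's exact procedure. It assumes two classes, axis-aligned bounding boxes as the class boundaries, and a linear shrink of each box towards its class mean; the names `overlap_trajectory`, `overlap_sum` and the parameter `steps` are hypothetical.

```python
import numpy as np

def overlap_trajectory(a, b, steps=10):
    """Count samples inside the intersection of two shrinking class boxes.

    Assumption: each class boundary is its axis-aligned bounding box,
    collapsed linearly towards the class mean at every iteration.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    points = np.vstack([a, b])
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    lo_a, hi_a = a.min(axis=0), a.max(axis=0)
    lo_b, hi_b = b.min(axis=0), b.max(axis=0)
    counts = []
    for t in range(steps):
        s = 1.0 - t / steps  # shrink factor: 1.0 at the first iteration
        la, ha = ca + s * (lo_a - ca), ca + s * (hi_a - ca)
        lb, hb = cb + s * (lo_b - cb), cb + s * (hi_b - cb)
        # Intersection of the two boxes; empty when lo > hi on any axis.
        lo, hi = np.maximum(la, lb), np.minimum(ha, hb)
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        counts.append(int(inside.sum()))
    return counts

def overlap_sum(a, b, steps=10):
    """Arithmetic mean of the overlap counts over all iterations."""
    return float(np.mean(overlap_trajectory(a, b, steps)))
```

Both sources of information named above are available from this sketch: the value of the trajectory is summarised by `overlap_sum`, and its shape can be inspected directly in the list returned by `overlap_trajectory`.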
The following space partitioning methods are quite new. They were proposed for pattern recognition in Singh’s work (Singh 2003). As the example of Pierson’s Overlap Sum method shows, the space partitioning group of methods is more efficient than conventional data analysis approaches (Pierson 1998).
The space partitioning methods analyze classification problems by decomposing the feature space into a number of cells or boxes, which are then analyzed at different resolutions.
According to Singh (Singh 2003), the only other class separability measure based on feature space partitioning is the method proposed in (Kohn, Nakano and Silva 1996). This measure is called the Class Discriminability Measure (CDM).
tel-00481367, version 1 - 6 May 2010
It is grounded on a data-adaptive partitioning of the feature space, such that regions containing samples from two or more classes are partitioned more finely than those containing a single class. CDM is based on the analysis of inhomogeneous buckets. The basic idea is to divide the feature space into a number of hyper-cuboids.
Each of these subspaces is termed a box. At any stage, a given box is first tested against a number of stopping criteria. If the stopping criteria are not satisfied, the partitioning of the box continues. Once the stopping criteria are satisfied, all boxes that are inhomogeneous and not linearly separable are used to calculate the CDM, which is defined on the basis of the difference between the total number of samples in a box and the number of samples from the majority class, taken over all boxes (Kohn, Nakano and Silva 1996), (Singh 2003).
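The counting step behind the CDM can be sketched as follows. This is only an illustration under stated assumptions: the adaptive partitioning, the stopping criteria and the linear separability test are omitted, the boxes are assumed to be given already, and the normalization by the total number of samples is our assumption, not necessarily that of (Kohn, Nakano and Silva 1996). The name `cdm_sketch` is hypothetical.

```python
from collections import Counter

def cdm_sketch(boxes):
    """Sum, over inhomogeneous boxes, of (box size - majority class count).

    boxes: list of lists of class labels, one inner list per final box.
    Assumption: the sum is divided by the total number of samples.
    """
    total = sum(len(labels) for labels in boxes)
    if total == 0:
        return 0.0
    minority = 0
    for labels in boxes:
        if not labels:
            continue  # empty boxes contribute nothing
        counts = Counter(labels)
        if len(counts) > 1:  # box holds samples from two or more classes
            minority += len(labels) - max(counts.values())
    return minority / total
```

A value of 0 is returned when every box is homogeneous, and the value grows as more samples fall outside the majority class of their box.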
A class separability ratio calculated using the PRISM-based approach has been compared with the Bayes error ratio, as well as with the aforementioned Fukunaga scatter matrix measures, across a range of normal and non-normal data sets (Kohn, Nakano and Silva 1996). The main conclusion is that, when ranking the importance of features, the CDM measure is more in line with the Bayes error than the indirect Bayes error estimating criteria, and that it is attractive in use due to its reduced computational cost (Kohn, Nakano and Silva 1996), (Singh 2003).
The next paragraph presents a more sophisticated concept of feature space partitioning, which forms the Pattern Recognition using Information Slicing Method (PRISM) framework.
The dim-dimensional feature space can be partitioned using subspaces with different topologies. In order to avoid the curse of dimensionality, a hyper-cuboid primitive is used in the PRISM framework (Singh 2002). The partitioning algorithms create hyper-cuboids for m data points (indexed by i, 1 ≤ i ≤ m) in the dim-dimensional feature space, each of which can be assigned to one of the l known classes {1, … , l}. An extra parameter, the resolution B (commonly 0 ≤ B ≤ 31), which sets the number of partitions per axis, must be selected by the user in advance. The total number of boxes is $K_{total} = (B+1)^{dim}$.
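The partitioning into hyper-cuboids can be illustrated by a short sketch. It assumes that every axis is cut into B + 1 equal-width intervals between the minimum and maximum of the corresponding feature, which is consistent with the total of (B+1)^dim boxes but is our assumption about the details; the function names `assign_boxes` and `total_boxes` are hypothetical.

```python
import numpy as np

def assign_boxes(X, B):
    """Return, for each row of X, its box index along every axis (0..B).

    Assumption: each axis is split into B + 1 equal-width intervals
    spanning the observed range of that feature.
    """
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    scaled = (X - lo) / span                # every feature now lies in [0, 1]
    idx = np.floor(scaled * (B + 1)).astype(int)
    return np.minimum(idx, B)               # the maximum falls in the last box

def total_boxes(B, dim):
    """Total number of boxes, empty ones included."""
    return (B + 1) ** dim
```

Since empty boxes are not analyzed, only the distinct rows of the returned index array matter in practice; they can be obtained with `np.unique(idx, axis=0)`.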
PRISM differs from CDM (Kohn, Nakano and Silva 1996) in that it does not start splitting from the median position. In addition, there is no stopping criterion, and empty boxes are not analyzed. The measures of classification complexity are defined on the basis of the partitioning scheme described above. The value of each measure lies in the interval [0;1]. A higher value indicates a simpler problem (Singh 2003[2]).
Since the space is partitioned into hyper-cuboids according to the resolution factor B, we can calculate the Purity measure, which defines “how pure the data is” in the hyper-cuboids. This method of calculation is more advanced than the CLC (Cluster Label Consistency) complexity estimate proposed in (Shipp and Kuncheva 2001).
Therefore, based on the initial PRISM clustering, the instances are allocated to the boxes, and the category ratios are computed for each box. For a total of $K_j$ [...]
For each box and for each class inside the $j$-th box, we apply equation II.34 and obtain a probability distribution vector $p_{1j}, p_{2j}, \ldots, p_{K_j j}$. Taking into account the required normalization (Singh 2003), the separability parameter for box $G_j$ is calculated as:
[equation missing in the source]
The overall purity of the different cells is given as:
[equation missing in the source]
To give the largest weight to the lowest resolution (Singh 2003), and to invert the meaning of the complexity value, equation III.6.3 takes the form:
[equation missing in the source]
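As an illustration of a purity computation at a single resolution, the sketch below assumes the normalized form usually attributed to (Singh 2003): the per-box purity is the scaled distance of the class-probability vector from the uniform distribution, and the overall purity is the average of the box purities weighted by box size. This is an assumption and may differ from the equations of this section; the weighting across resolutions is not included, and the names `box_purity` and `overall_purity` are hypothetical.

```python
import numpy as np

def box_purity(counts):
    """Purity of one non-empty box from its per-class sample counts.

    Assumed form: sqrt(k / (k - 1) * sum_i (p_i - 1/k)^2), with k >= 2
    classes; 1 for a single-class box, 0 for a uniform mixture.
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    p = counts / counts.sum()
    return float(np.sqrt(k / (k - 1) * np.sum((p - 1.0 / k) ** 2)))

def overall_purity(box_counts):
    """Size-weighted mean purity over the non-empty boxes.

    box_counts: array of shape (number of boxes, number of classes).
    """
    box_counts = np.asarray(box_counts, dtype=float)
    sizes = box_counts.sum(axis=1)
    keep = sizes > 0  # empty boxes are not analyzed
    purities = np.array([box_purity(c) for c in box_counts[keep]])
    return float(np.sum(purities * sizes[keep]) / sizes.sum())
```

With this form, a partition in which every box holds a single class yields an overall purity of 1, in line with the convention that a higher value indicates a simpler problem.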
Next, the Neighbourhood separability criterion of PRISM, proposed by Singh (Singh 2003), defines a classification complexity measure that depends on the concept of decision boundaries. It is very similar to Purity, which is why we have skipped its T-DTS implementation. The following PRISM measure, Collective entropy, is based on information theory and is a very important measure that represents the order/disorder of the system (Singh 2003[2]).
In this particular PRISM case, Collective entropy signifies the order/disorder accumulated at different resolutions, considering the non-empty cells for a given global resolution parameter B. For the probability distribution vector $p_{1j}, p_{2j}, \ldots, p_{K_j j}$ produced by equation II.6.1, the entropy measure for each box $G_j$ is calculated as:
[equation missing in the source]
Let us note here that, in general, the base of the logarithm is not defined in (Singh 2003). Finally, the overall collective entropy, including the weighting factor and the required normalization, is calculated as:
[equation missing in the source]
This keeps the measure consistent with the other measures: the maximal value of 1 signifies complete certainty, and the minimal value of 0 signifies uncertainty and disorder.
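A minimal sketch of a collective entropy computation at a single resolution is given below. It assumes the Shannon entropy per box, divided by the logarithm of the number of classes (so that the choice of logarithm base cancels out), averaged with box-size weights and subtracted from 1 to obtain the orientation described above. These choices are our assumptions rather than the exact equations of (Singh 2003); the weighting across resolutions is omitted, and the names `box_entropy` and `collective_entropy` are hypothetical.

```python
import numpy as np

def box_entropy(counts):
    """Shannon entropy (natural logarithm) of the class distribution in a box."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # drop absent classes: 0 * log 0 := 0
    return float(-np.sum(p * np.log(p)))

def collective_entropy(box_counts):
    """1 minus the size-weighted, normalized mean entropy of non-empty boxes.

    box_counts: array of shape (number of boxes, number of classes),
    with at least two classes. Returns 1 for complete certainty and
    0 for maximal disorder.
    """
    box_counts = np.asarray(box_counts, dtype=float)
    n_classes = box_counts.shape[1]
    sizes = box_counts.sum(axis=1)
    keep = sizes > 0  # only non-empty cells are considered
    ent = np.array([box_entropy(c) for c in box_counts[keep]]) / np.log(n_classes)
    return float(1.0 - np.sum(ent * sizes[keep]) / sizes.sum())
```

Dividing by the logarithm of the number of classes bounds each box entropy by 1 whatever base is used, which sidesteps the undefined base noted above.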
The utility of the PRISM framework and of the above complexity measures, Purity and Collective entropy, both based on feature space partitioning, has been demonstrated in practice (Singh 2003) through their ability to predict the relevant level of classification error.
The methods that cannot be ranked with the classification complexity measures mentioned above are gathered in a group named Other methods. The methods in this group may differ considerably in their origin. We do not attempt to provide a complete overview of these methods. Our aim is to give an idea of this group of complexity estimators and to define the type of estimators/criteria that are implemented in T-DTS. One of these measures is the well-known Fisher linear discriminant ratio.