Three-stage Classifier Using a 2-2-2 Strategy

6.9 Pattern Classification

6.9.1 Three-stage Classification

6.9.1.2 Three-stage Classifier Using a 2-2-2 Strategy

A major step is taken next by excluding the training patterns that represent the random pattern, which reduces the size of the training set to some extent. The three-stage classifier is modified accordingly, with a 2-2-2 approach used instead of the 2-2-3 approach (see Figure 6.18). The third stage of the classifier now only has to classify the input pattern into two classes, circular and spiderweb. The omission of the random pattern should reduce the confusion level. Table 6.11 shows the revised Fisher Ratio scores for the selected features of the three stages.

[Flowchart omitted. Labels: input pattern, x; feature set 3; 3rd stage classifier, k₃; training set 3; G2 {circular, spiderweb, random}; {circular}, {spiderweb}; u₃, u₄.]

Figure 6.18: Modification made to the flowchart of the three-stage classifier, with the third stage now classifying the input pattern into one of two classes, circular or spiderweb.

1st stage           2nd stage           3rd stage
Feature  Score      Feature  Score      Feature  Score
r2       3.0414     f0       2.2073     s4       0.1564
d1       2.8886     f3       1.9565     f2       0.1229
u2       2.8331     f4       1.7016     f5       0.0785
f0       2.1186     f2       1.4367     u2       0.0154
-        -          r2       1.2765     -        -

Table 6.11: Selected features for the three-stage classifier using a 2-2-2 strategy, based on Fisher Ratio scores.
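As a rough illustration of how per-feature scores like those in Table 6.11 might be obtained, the sketch below uses one common two-class form of the Fisher ratio, the squared difference of the class means divided by the sum of the class variances. This section does not restate the exact definition behind the table, so the formula, the function name `fisher_ratio`, the feature names and the toy values are all assumptions.

```python
from statistics import fmean, pvariance

def fisher_ratio(a, b):
    """Fisher ratio of one feature between two classes (assumed two-class form).

    a, b: sequences holding that feature's values, one sequence per class.
    """
    gap = (fmean(a) - fmean(b)) ** 2      # squared distance between class means
    spread = pvariance(a) + pvariance(b)  # sum of within-class variances
    return gap / spread

# Toy data (hypothetical): per-class values for two made-up features.
class_a = {"f_sep": [0.0, 2.0], "f_flat": [1.0, 3.0]}
class_b = {"f_sep": [4.0, 6.0], "f_flat": [1.0, 3.0]}

scores = {name: fisher_ratio(class_a[name], class_b[name]) for name in class_a}
ranking = sorted(scores, key=scores.get, reverse=True)  # best feature first
```

A feature whose class means lie far apart relative to the spread within each class receives a high score, which is why such a ranking can be used to pick the features for each stage.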

The leave-one-out strategy is once again employed to estimate the accuracy of the classifier, which now has four classes to classify into. The focus here is on reducing class-specific classification errors, especially for the spiderweb class, which proved to be the main drawback of the 2-2-3 strategy. The effect of omitting the random pattern on the overall classification error remains the main interest. Based on the results shown in Figure 6.19, both the k-NN and the average distance k-NN classifiers using a 2-2-2 strategy show large improvements over their 2-2-3 counterparts in the percentage of correct classifications for specific values of k.

The k-NN classifier improves by as much as 12.6 percentage points (a maximum of 76.0% against 63.4%), while the average distance k-NN classifier improves by 17.1 percentage points (82.0% against 64.9%). Looking at the class-specific classification error, omitting the random pattern proves to be a crucial step, according to the results shown in Table 6.12.
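The leave-one-out estimate mentioned above can be sketched as follows. This is a minimal illustration with a plain, single-stage majority-vote k-NN and Euclidean distance; it does not reproduce the three-stage cascade or the average distance k-NN variant, and the function names and toy samples are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k):
    """Majority vote among the k training samples nearest to `query`."""
    nearest = sorted(train, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(samples, k):
    """Classify each sample using all remaining samples as the training set."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        hits += knn_predict(rest, x, k) == label
    return hits / len(samples)

# Toy 2-D data (hypothetical): two well-separated classes.
samples = [
    ((0.0, 0.0), "circular"), ((0.1, 0.2), "circular"), ((0.2, 0.1), "circular"),
    ((5.0, 5.0), "spiderweb"), ((5.1, 5.2), "spiderweb"), ((5.2, 5.1), "spiderweb"),
]
accuracy = leave_one_out_accuracy(samples, k=1)
```

Repeating the call for each k in the range of interest gives an accuracy-versus-k curve of the kind plotted in Figure 6.19.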

[Plot omitted. x-axis: k, 0 to 50; y-axis: Percentage of Correct Classifications, %, 0 to 100. Curves: 3-stage k-NN with 2-2-3 strategy; 3-stage average distance k-NN with 2-2-3 strategy; 3-stage k-NN with 2-2-2 strategy; 3-stage average distance k-NN with 2-2-2 strategy.]

Figure 6.19: Percentage of successful classifications for k in the range [1, 50] using the three-stage approach with the random pattern omitted, leaving only four classes to classify into. The maximum successful classification percentages and the respective values of k are as follows: k-NN using a 2-2-3 strategy (63.4% at various values of k), average distance k-NN using a 2-2-3 strategy (64.9% at k=4), k-NN using a 2-2-2 strategy (76.0% at various values of k) and average distance k-NN using a 2-2-2 strategy (82.0% at k=4).

Class            k-NN (k=2)   Average distance k-NN (k=4)
Circular         60.9%        65.2%
Rectangular      94.4%        100.0%
Unidirectional   76.2%        81.0%
Spiderweb        60.0%        70.0%

Table 6.12: Distribution of correct classifications over the different classes for the three-stage classifier using the 2-2-2 strategy. A large improvement is seen over the 2-2-3 approach, especially for the spiderweb pattern.

The k-NN and average distance k-NN classifiers succeeded in classifying 60% and 70% of the spiderweb test samples, respectively. These are important results, considering that the 2-2-3 strategy performed very poorly on this task, directly affecting the overall classification performance.

Another way of evaluating classification performance is to include fuzzy elements when deciding what counts as a correct classification. The previous experiments assumed hard classification, which gives total membership to a single class. In evaluating fuzzy classification, soft memberships are taken into account in deciding whether a test pattern is correctly classified. As an example, let test pattern A belong to the circular class and be classified as 32% circular, 12% rectangular, 3% unidirectional and 53% spiderweb. Under a hard classification rule, A is considered misclassified. In fuzzy terms, however, this is not really the case, since A is classified as 32% circular (second in rank), which is quite a strong figure given that there are four classes (25% per class on average). Furthermore, from the point of view of human perception, subjectivity is clearly inevitable when classifying crack patterns, or in any problem that involves complicated combinations of line structures. Each crack class may contain elements of the others to a certain extent, and this is the situation that the fuzzy strategy tries to model.

Thus, in the next analysis, a correct classification is defined as any case where the test pattern is assigned to its actual class with more than 25% confidence, regardless of rank. Figure 6.20 compares the performance of the k-NN and the average distance k-NN classifiers when this fuzzy element is taken into account. As expected, the percentage of correct classifications increases dramatically when the new rule is applied. The interesting point from the analysis is that nearly all the test patterns (96% for both classifiers) receive a confidence vote above 25% for their actual class. Table 6.13 shows the class-specific classification success for both classifiers.
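The difference between the hard rule and the 25% fuzzy rule can be made concrete with test pattern A from the text. The sketch below is only an illustration: the function names and the dictionary representation of the soft memberships are assumptions, and it does not show how the memberships themselves are computed.

```python
def hard_correct(memberships, actual):
    """Hard rule: correct only if the actual class has the highest membership."""
    return max(memberships, key=memberships.get) == actual

def fuzzy_correct(memberships, actual, threshold=0.25):
    """Fuzzy rule: correct if the actual class exceeds the threshold, at any rank."""
    return memberships[actual] > threshold

# Test pattern A from the text; its actual class is circular.
pattern_a = {
    "circular": 0.32,
    "rectangular": 0.12,
    "unidirectional": 0.03,
    "spiderweb": 0.53,
}

is_hard = hard_correct(pattern_a, "circular")    # False: spiderweb ranks first
is_fuzzy = fuzzy_correct(pattern_a, "circular")  # True: 0.32 > 0.25
```

With four classes, at most three memberships can exceed 25% at once, so the fuzzy rule is more lenient than the hard rule but still excludes patterns whose actual class receives only a weak vote.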

[Plot omitted. x-axis: k, 0 to 50; y-axis: Percentage of Correct Classifications, %, 70 to 100. Curves: 3-stage k-NN using 2-2-2 strategy with fuzzy classification; 3-stage average distance k-NN using 2-2-2 strategy with fuzzy classification.]

Figure 6.20: Percentage of successful classifications for k in the range [1, 50] when fuzziness is considered in defining correct classification. A pattern is considered correctly classified if more than 25% confidence is recorded for its actual class. The maximum successful classification percentages and the respective values of k are as follows: k-NN (96% at k=4) and average distance k-NN (96% at various values of k).
