
Stefano Cagnoni, Monica Mordonini, and Giovanni Adorn

3.2. Evolving binary classifiers using Sub-machine-code genetic programming

SmcGP can be used to efficiently develop high-performance binary classifiers, in terms of both accuracy and computation speed. This is a result of more general interest, since any N-class classifier of arbitrary complexity can be implemented as an ensemble of distinct specialized binary classifiers which, organized in different possible architectures with different degrees of redundancy, can perform the original, more complex classification task.

However, the choice to use a multiclassifier approach must be corroborated by methods that produce fast, accurate classifiers very efficiently, for several reasons. Firstly, since the final accuracy of an ensemble of classifiers depends, at least linearly, on the error rates of the single classifiers, developing an ensemble of classifiers requires that each component be very accurate. Secondly, even if an ensemble of classifiers usually yields better results than the corresponding single classifier, thanks to a richer and/or redundant processing of information, developing an ensemble of binary classifiers is usually much more time-consuming than developing a single, equivalent classifier. Finally, running an ensemble of classifiers is generally very computationally demanding.

We describe a general framework within which the evolutionary approach can be used to design binary classifiers which meet the above-mentioned requirements. The performance of binary classifiers thus obtained has been assessed on a low-resolution digit recognition problem and on an image “segmentation-by-classification” task, in which classifiers have not only been used as stand-alone modules, but also as building blocks for multiple-classifier architectures.

Following a typical detection scheme, each classifier is associated with one class and is required to have 1 as output when the input pattern belongs to the corresponding class, and 0 otherwise. To solve N-class problems, a set of N binary classifiers can be used; the final classification can be derived from the analysis of the response of all classifiers. This is, for instance, the typical architecture and training strategy used in N-class classifiers based on feed-forward neural networks. In the approach described in this chapter, however, classifiers are evolved independently of one another, unlike in neural networks, in which paths leading from the input to the output layer share several weights, allowing classifiers to be trained concurrently.

The ideal situation for such a multiclassifier architecture occurs when only the classifier corresponding to the class of the input pattern outputs 1, while all others give 0 as output. As pointed out in [5], where this kind of classification architecture is described in detail with reference to the use of classifiers evolved by GP, there are, quite intuitively, two trivial cases in which this strategy fails. One occurs when no classifier produces 1 as output, while the other occurs when more than one classifier produces 1 as output. The problem of deriving a final decision in these cases can be tackled by several different approaches, such as a hierarchy of increasingly specialized classifiers or an “a posteriori” statistical approach.
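The ideal case and the two failure cases can be told apart by counting how many classifiers output 1. The following is a minimal Python sketch under stated assumptions: the function name `decide`, the status strings, and the toy classifiers are hypothetical and are not taken from the chapter or from [5].

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: a binary classifier is any callable mapping a pattern to 0 or 1.
BinaryClassifier = Callable[[int], int]


def decide(classifiers: Sequence[BinaryClassifier],
           pattern: int) -> Tuple[str, Optional[int]]:
    """Combine one-classifier-per-class outputs into a final decision.

    Returns ("ok", label) when exactly one classifier outputs 1,
    ("no_response", None) when none does, and ("conflict", None)
    when more than one does.
    """
    fired = [label for label, clf in enumerate(classifiers)
             if clf(pattern) == 1]
    if len(fired) == 1:
        return "ok", fired[0]
    if not fired:
        return "no_response", None
    return "conflict", None


# Toy classifiers (illustrative only): classifier k fires when bit k is set.
toy_classifiers = [lambda p, k=k: (p >> k) & 1 for k in range(3)]
```

With these toy classifiers, `decide(toy_classifiers, 0b010)` yields a clean decision for class 1, while the patterns `0b000` and `0b110` fall into the two failure cases and would have to be passed on to one of the tie-resolving strategies mentioned above.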

Such architectures usually perform classification based on criteria similar to those used in sports tournaments. If one looks upon the output of the classifier as the result of a match between the two classes under consideration, the decision could be based, for example, upon a knock-out or a round-robin tournament mechanism. A more computationally expensive but usually better-performing approach is to record all outputs of the classifier set and combine them into a pattern that is then classified by a so-called “stacked generalizer” [16].
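As an illustration of the round-robin mechanism, the sketch below plays every pair of classes against each other and selects the class with the most wins. It is only a sketch: the `match` callable stands in for a pairwise binary classifier, and both it and the tie-breaking rule (lowest class label wins) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from typing import Callable


def round_robin(n_classes: int,
                match: Callable[[int, int, int], int],
                pattern: int) -> int:
    """Return the class that wins the most pairwise matches.

    match(a, b, pattern) must return either a or b, i.e. the class
    preferred by the (hypothetical) pairwise classifier for this pattern.
    Ties are broken in favor of the lowest class label (an assumption).
    """
    wins = Counter()
    for a, b in combinations(range(n_classes), 2):
        wins[match(a, b, pattern)] += 1
    return max(range(n_classes), key=lambda c: wins[c])


def toy_match(a: int, b: int, pattern: int) -> int:
    """Toy pairwise rule: prefer the class label closest to the pattern."""
    return a if abs(pattern - a) <= abs(pattern - b) else b
```

A round-robin tournament requires one evaluation per pair of classes, whereas a knock-out scheme needs only one match per eliminated class but lets a single wrong match eliminate the correct class.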

Therefore, from the point of view of classification strategy, the general architecture of the evolutionary classifiers considered in this chapter is quite conventional. The peculiar features of the approach are related to the high degree of parallelism in the computation performed by each individual and to the criterion by which the output of each individual, and therefore its fitness function, is computed. Regarding the former property, SmcGP performs parallel computation by applying bitwise operators to packed representations of arrays of 1-bit data. More precisely, if $P$ is the dimension in bits of the input space in SmcGP, the function computed by each individual $I_k$ is a function $f_k : \{0,1\}^P \to \{0,1\}^S$ ($S > 1$). Each $f_k$ can be seen as a set of $S$ alternative binary-output functions $f_{ki} : \{0,1\}^P \to \{0,1\}$ computed concurrently at each function evaluation. The output space size $S$ is equal to the size of the word into which 1-bit data are packed.
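The idea that one bitwise evaluation yields S binary-output functions at once can be sketched in Python as follows. The word size `S = 8`, the two-word input, and the particular expression inside `individual` are all assumptions chosen for illustration; they do not reproduce any program evolved in the chapter.

```python
from typing import Sequence

S = 8                 # assumed word size, i.e. the output space size
MASK = (1 << S) - 1   # keeps results within S bits


def individual(words: Sequence[int]) -> int:
    """Hypothetical evolved program f_k: bitwise ops on packed S-bit words.

    Here the input space has P = 2 * S bits (two packed words).
    """
    a, b = words
    return ((a & b) ^ (a >> 1)) & MASK


def f_ki(words: Sequence[int], i: int) -> int:
    """The i-th binary-output function: bit i of the packed result."""
    return (individual(words) >> i) & 1


words = (0b11001010, 0b10100110)
packed = individual(words)                    # one evaluation of f_k
bits = [f_ki(words, i) for i in range(S)]     # the S functions f_ki, LSB first
```

For the input above, `packed` equals `0b11100111`, so a single pass over the expression tree produces all eight candidate 1-bit outputs.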

Therefore, decoding each individual $I_k$ implies two steps. In the first one, the function encoded by the corresponding representation is decoded. In the second one, each bit of the result obtained in the first step is considered, in turn, as the output of the classifier, and the corresponding binary-output function is taken into consideration. As long as a fitness function $F$ is defined for the problem at hand, a different fitness value $F(f_{ki})$ can therefore be associated with each solution. The fitness $F(I_k)$ of each individual is equal to the maximum fitness obtained in the second step,

$$F(I_k) = \max_i F(f_{ki}). \tag{3.1}$$

This means that while only one fitness value is assigned to each individual $I_k$, $S$ fitness function evaluations are performed for each evaluation of $I_k$. This implies that defining fitness as in (3.1) can be interpreted as choosing one of the bits of the output pattern as the actual 1-bit output of an individual.
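A minimal sketch of the fitness computation in (3.1) follows. It assumes that the fitness F of a binary-output function is its classification accuracy on a labeled sample set, which the text above does not specify; the word size `S = 8` and the names `bitwise_fitness` and `toy_individual` are likewise hypothetical.

```python
from typing import Callable, Sequence, Tuple

S = 8                 # assumed word size
MASK = (1 << S) - 1

Sample = Tuple[Tuple[int, ...], int]   # (packed input words, label in {0, 1})


def bitwise_fitness(individual: Callable[[Sequence[int]], int],
                    samples: Sequence[Sample]) -> Tuple[float, int]:
    """Return (F(I_k), best bit index) as in (3.1), with accuracy as F.

    Each individual is evaluated once per sample; the S per-bit
    correct-counts are then updated from the single packed result.
    """
    correct = [0] * S
    for words, label in samples:
        out = individual(words) & MASK
        target = MASK if label else 0      # replicate the label on all S bits
        agree = ~(out ^ target) & MASK     # bit i set when f_ki is correct
        for i in range(S):
            correct[i] += (agree >> i) & 1
    best_bit = max(range(S), key=lambda i: correct[i])
    return correct[best_bit] / len(samples), best_bit


def toy_individual(words: Sequence[int]) -> int:
    """Toy program: passes its single input word through unchanged."""
    return words[0]


# Toy data: the label equals bit 3 of the input word.
toy_samples = [((0b00001000,), 1), ((0b00000000,), 0),
               ((0b11111111,), 1), ((0b11110111,), 0)]
fitness, chosen_bit = bitwise_fitness(toy_individual, toy_samples)
```

On the toy data, bit 3 classifies every sample correctly while every other bit is right on only two of the four samples, so the maximum in (3.1) selects bit 3 as the 1-bit output of the individual.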

Besides being beneficial from the point of view of computation efficiency, this strategy favors extraction of the most relevant features of the patterns under classification. This can be explained considering that, in SmcGP, there exists a direct morphologic and geometric correspondence between input and output. Even when SmcGP spans a larger function space than the one defined by the encoding of inputs into one long word, evolving complex functions made up of building blocks that operate on different “slices” of the input pattern and keeping such a correspondence for each of the words into which the whole input is divided, the operations performed on those slices are local.

Because of these properties, the fact that one bit of the output pattern provides the best fitness generally implies that the area of the input pattern that most influences the value of that bit contains the most significant feature for the classifier under consideration.