Decoding methods - Using Output Codes for Two-class Classification Problems

The decoding process applies the n binary classifiers to a new instance and collects their predictions into an output code x. This output code is compared with the base codewords (rows) defined in the matrix M, and the new instance is assigned to the class with the closest codeword. The most frequently used decoding designs are Hamming decoding, inverse Hamming decoding and Euclidean decoding.

4.2.1 Hamming decoding

The Hamming decoding method is based on the Hamming distance and is one of the most common decoding techniques. The experimental results in this thesis are obtained with Hamming decoding. In this section, we briefly introduce the Hamming distance and then describe how the Hamming decoding method is used.

Hamming distance

We briefly review the Hamming distance here. The Hamming distance was introduced by Richard Hamming in 1950 and is used in telecommunications to detect and correct bit-flip errors. The Hamming distance between two words of equal length is the number of positions at which the corresponding symbols differ. In other words, it is the minimum number of substitutions needed to change one word into the other. The calculation is simple and proceeds as follows (suppose each string has k symbols):

• Set a counter d to 0.
• Iterate i from 0 to k − 1.
• Compare the symbols at the ith position in both strings.
• If they are different, increase d by 1.
• Stop after the last symbol of the string; d is the Hamming distance.
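The steps above can be sketched in Python as follows. This is a minimal illustration, not code from the thesis; the function name hamming_distance is our own, and raising an error for strings of different lengths is an assumption (the distance is simply undefined in that case).

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    d = 0  # counter of differing positions
    for i in range(len(a)):  # i runs from 0 to k - 1
        if a[i] != b[i]:
            d += 1
    return d


print(hamming_distance("apple", "apply"))        # 1
print(hamming_distance("10010011", "01011111"))  # 4
print(hamming_distance("10101010", "01010101"))  # 8
```

The three calls reproduce the examples given in the text.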

For example, the Hamming distance between "apple" and "apply" is 1; between "10010011" and "01011111" it is 4; and between "10101010" and "01010101" it is 8.

The Hamming distance can be computed for strings over any alphabet. In this thesis, however, the examples are binary, so the strings consist of 0s and 1s.

Hamming decoding method

The Hamming decoding method (Hamming, 1950) is one of the most popular decoding strategies for ECOCs. As its name suggests, it decodes by means of the Hamming distance. An alternative way to calculate the Hamming distance is:

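$$HD(x, y_i) = \sum_{j=1}^{n} \frac{1 - \operatorname{sign}(x_j \, y_{ij})}{2}$$

where the code symbols are taken as −1 and +1 (a 0 in the code matrix is read as −1), so each term is 0 when x_j and y_{ij} agree and 1 when they differ.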
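As a sanity check, the sketch below evaluates the sign-based sum, with each term (1 − sign(x_j y_ij)) divided by 2 so that it counts one mismatch, and compares it with a direct count of differing positions. It assumes that a 0 in a code string is mapped to −1 before multiplying; the helper names (sign, to_pm1, hd_sign, hd_count) are hypothetical.

```python
def sign(v: int) -> int:
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def to_pm1(bits: str) -> list:
    """Map a 0/1 code string to a list of -1/+1 symbols (assumed coding)."""
    return [1 if c == "1" else -1 for c in bits]


def hd_sign(x: str, y: str) -> float:
    """Sum of (1 - sign(x_j * y_j)) / 2 over all positions of +/-1 codes."""
    return sum((1 - sign(xj * yj)) / 2 for xj, yj in zip(to_pm1(x), to_pm1(y)))


def hd_count(x: str, y: str) -> int:
    """Direct count of differing positions."""
    return sum(a != b for a, b in zip(x, y))


x = "1011101"  # output code used in the worked example of Table 4.1
for y in ["0000000", "0001111", "0110011", "1010101"]:
    assert hd_sign(x, y) == hd_count(x, y)  # both formulations agree
print(hd_sign(x, "1010101"))  # 1.0
```

Without the mapping to ±1, a product such as 0 · 1 has sign 0 and the sum no longer counts mismatches, which is why the coding assumption matters.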

The Hamming decoding method is based on the error-correcting principle, under the assumption that each position of the sequence can take one of two possible symbols. Each learning task can therefore be modelled as a binary problem.

Hamming decoding is guaranteed to correct up to $\lfloor (d-1)/2 \rfloor$ bit errors, where d is the minimum Hamming distance over all pairs of codewords in the code matrix. Suppose we have the following codeword matrix:

         f1  f2  f3  f4  f5  f6  f7
class A   0   0   0   0   0   0   0
class B   0   0   0   1   1   1   1
class C   0   1   1   0   0   1   1
class D   1   0   1   0   1   0   1
output    1   0   1   1   1   0   1

Table 4.1: An example of an exhaustive code matrix decoded with the Hamming distance

Here f1, f2, ..., f7 are the base learners. They can be any binary learners, since each one learns to discriminate between 0s and 1s. For example, f1 learns class D against classes A, B and C. In the output row of the table, the prediction of f1 is positive, so the output for f1 is 1. This case is straightforward.

Learner f3 is slightly different: it learns classes C and D against classes A and B. In other words, f3 predicts whether an instance belongs to class A or B, or to class C or D. In this example, the prediction of f3 is also positive (1). Collecting the outputs of f1, f2, ..., f7 in the same way gives the output code string 1011101.
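Each column of Table 4.1 defines one binary problem: the classes with a 1 in that column form the positive group and the classes with a 0 form the negative group. The sketch below reads these partitions off the matrix; the dictionary layout and the helper name column_partition are our own illustration, not part of the thesis.

```python
# Codeword matrix from Table 4.1: one codeword (row) per class.
CODEWORDS = {
    "A": "0000000",
    "B": "0001111",
    "C": "0110011",
    "D": "1010101",
}


def column_partition(codewords: dict, j: int) -> tuple:
    """Return (positive classes, negative classes) for learner f_{j+1}."""
    positive = [c for c, word in codewords.items() if word[j] == "1"]
    negative = [c for c, word in codewords.items() if word[j] == "0"]
    return positive, negative


for j in range(7):
    pos, neg = column_partition(CODEWORDS, j)
    print(f"f{j + 1}: {pos} vs {neg}")
# Among the printed lines:
# f1: ['D'] vs ['A', 'B', 'C']
# f3: ['C', 'D'] vs ['A', 'B']
```

The two lines quoted in the comment match the descriptions of f1 and f3 given above.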

With the output code string 1011101 in hand, we then calculate the Hamming distance to each base codeword. The class with the smallest Hamming distance is the predicted class. In our example, the Hamming distances to each base codeword are:

class A: 0000000 vs 1011101 is 5
class B: 0001111 vs 1011101 is 3
class C: 0110011 vs 1011101 is 5
class D: 1010101 vs 1011101 is 1

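Class D has the smallest Hamming distance, so we assign the test instance to class D. Every pair of codewords in Table 4.1 is at Hamming distance 4, so d = 4 and Hamming decoding can correct one bit error; here the output code differs from the codeword of class D in exactly one bit (f4). The Hamming distance can be calculated with either of the two formulations given above (for the sign-based one, each 0 is read as −1).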
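The worked example can be reproduced with a short sketch of Hamming decoding on Table 4.1. It also computes the minimum pairwise distance d and the number of correctable bit errors. This is an illustration under our own naming (hamming, hamming_decode); the tie-breaking rule (the first class listed wins) is our assumption, since the text does not specify one.

```python
from itertools import combinations

# Codeword matrix from Table 4.1 and the output code of f1, ..., f7.
CODEWORDS = {
    "A": "0000000",
    "B": "0001111",
    "C": "0110011",
    "D": "1010101",
}
OUTPUT = "1011101"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_decode(output: str, codewords: dict) -> str:
    """Assign the class whose codeword is closest in Hamming distance."""
    # min() returns the first minimal key, so ties go to the earlier row.
    return min(codewords, key=lambda c: hamming(output, codewords[c]))


distances = {c: hamming(OUTPUT, w) for c, w in CODEWORDS.items()}
print(distances)                          # {'A': 5, 'B': 3, 'C': 5, 'D': 1}
print(hamming_decode(OUTPUT, CODEWORDS))  # D

# Minimum pairwise distance d and the number of correctable bit errors.
d = min(hamming(u, v) for u, v in combinations(CODEWORDS.values(), 2))
print(d, (d - 1) // 2)  # 4 1
```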

4.2.2 Inverse Hamming decoding

Inverse Hamming decoding (Escalera & Pujol, 2010) is another popular decoding method. It is defined as follows. Let Δ be the matrix of Hamming decoding measures between the codewords. Inverting Δ yields the vector containing the N individual class likelihood functions:

$$IHD(x, y_i) = \max(\Delta^{-1} D^T)$$

where

$\Delta(i_1, i_2) = HD(y_{i_1}, y_{i_2})$, and

D is the vector of Hamming decoding values of the test codeword x for each of the base codewords y_i.
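The sketch below applies inverse Hamming decoding to the example of Table 4.1. Rather than forming Δ^{-1} explicitly, it solves Δ z = D^T exactly with rational arithmetic, which gives the same vector Δ^{-1} D^T; we then read the predicted class as the one with the largest entry. The helper names are hypothetical, and the solver assumes Δ is non-singular (which holds here).

```python
from fractions import Fraction

# Codeword matrix and output code from Table 4.1.
CODEWORDS = {"A": "0000000", "B": "0001111", "C": "0110011", "D": "1010101"}
OUTPUT = "1011101"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def solve(matrix, rhs):
    """Solve matrix @ z = rhs exactly by Gauss-Jordan elimination.

    Assumes the matrix is non-singular.
    """
    n = len(rhs)
    # Augmented matrix with exact rational entries.
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Swap a row with a non-zero pivot into position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


classes = list(CODEWORDS)
words = [CODEWORDS[c] for c in classes]
delta = [[hamming(u, v) for v in words] for u in words]  # Delta(i1, i2)
D = [hamming(OUTPUT, w) for w in words]                  # [5, 3, 5, 1]
scores = solve(delta, D)                                 # Delta^{-1} D^T
print([str(s) for s in scores])  # ['-1/12', '5/12', '-1/12', '11/12']
print(classes[scores.index(max(scores))])  # D
```

In this example inverse Hamming decoding selects class D, the same class as Hamming decoding.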

Escalera and Pujol state that, in practice, the inverse Hamming decoding method behaves very similarly to the Hamming decoding strategy.

4.2.3 Euclidean decoding

Euclidean decoding (Escalera & Pujol, 2010) is another well-known decoding strategy. This measure is defined as follows:

$$ED(x, y_i) = \sqrt{\sum_{j=1}^{n} (x_j - y_{ij})^2}$$

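It measures the Euclidean distance between two code vectors. When both vectors are binary (0/1), each squared term equals 1 exactly where the bits differ, so ED(x, y_i) is the square root of HD(x, y_i) and Euclidean decoding ranks the classes in the same order as Hamming decoding; more generally, it behaves similarly to the Hamming distance.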
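A minimal sketch of Euclidean decoding on the example of Table 4.1 follows. It assumes the output code is a binary 0/1 string, as in the worked example; if learners returned real-valued scores, x would instead be a numeric vector and the equality with the square root of the Hamming distance would no longer hold. The helper names are our own.

```python
import math

# Codeword matrix and output code from Table 4.1.
CODEWORDS = {"A": "0000000", "B": "0001111", "C": "0110011", "D": "1010101"}
OUTPUT = "1011101"


def euclidean(x: str, y: str) -> float:
    """ED(x, y) = sqrt(sum_j (x_j - y_j)^2) for 0/1 code strings."""
    return math.sqrt(sum((int(a) - int(b)) ** 2 for a, b in zip(x, y)))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


for word in CODEWORDS.values():
    # For 0/1 codes each squared term is 1 exactly where the bits differ,
    # so ED is the square root of the Hamming distance.
    assert math.isclose(euclidean(OUTPUT, word), math.sqrt(hamming(OUTPUT, word)))

print(min(CODEWORDS, key=lambda c: euclidean(OUTPUT, CODEWORDS[c])))  # D
```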
