3.3 Selection Strategies

3.3.1 Exploitation Based Selection Strategies

3.3.1.1 Classifiers in Active Learning

In many selection strategies, especially exploitation based ones, a classifier is built from the labelled set and then used to predict the labels of the unlabelled examples in the pool and to assign an informativeness measure to each member of the pool. Many selection strategy algorithms have been proposed using support vector machines (Raghavan et al., 2006), logistic regression (Hoi et al., 2006a), Naïve Bayes (Segal et al., 2006), maximum entropy (Zhu et al., 2008a), etc. Among them, we will focus on Naïve Bayes, SVM and k-NN based active learners as they are suitable for large-scale text systems.
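The scoring step described above can be sketched as follows. This is a minimal illustration only: the function `rank_pool`, the toy probabilities, and the choice of uncertainty (1 - |2p - 1| for a binary classifier) as the informativeness measure are assumptions made for the example, not a method taken from the cited work.

```python
from typing import Callable, List, Sequence, Tuple


def rank_pool(
    pool: Sequence[str],
    predict_proba: Callable[[str], float],
) -> List[Tuple[float, str]]:
    """Score every pool example and rank the pool by informativeness.

    Assumption: informativeness is binary uncertainty, 1 - |2p - 1|,
    which is 1 when p = 0.5 and 0 when p is 0 or 1.
    """
    scored = [(1.0 - abs(2.0 * predict_proba(x) - 1.0), x) for x in pool]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical positive-class probabilities produced by some classifier
# that was trained on the current labelled set.
toy_probs = {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.10}

ranking = rank_pool(list(toy_probs), toy_probs.__getitem__)
next_query = ranking[0][1]  # the example the oracle would be asked to label
```

In a full active learning loop, the selected example would be labelled, moved from the pool to the labelled set, and the classifier rebuilt before the pool is scored again.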

Naïve Bayes Based Active Learners

As discussed previously in Section 2.3.3, Naïve Bayes classifiers are well-known probabilistic classifiers that became popular because of their simplicity, efficiency and accuracy. Using a Naïve Bayes classifier in active learning is particularly efficient because training requires only a single pass over the labelled set to gather term frequency information, and no optimisation is needed (Rennie & Rifkin, 2001). Segal et al. (2006) used boosted Naïve Bayes classifiers in their work on efficient uncertainty sampling for labelling large email corpora. Roy & McCallum (2001) tried to optimise the true generalisation error rate with a Naïve Bayes classifier in active learning using an expected error reduction method. McCallum & Nigam (1998b) used a Naïve Bayes classifier in their QBC selection strategy, where they modified the QBC method and applied EM to improve the parameter estimates of the Naïve Bayes classifier. After that, Naïve Bayes classifiers became widely used with EM (see Ghani, 2001; Nigam & Ghani, 2000; Nigam et al., 1998, 2000; Probst & Ghani, 2007).
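To illustrate why training from counts makes Naïve Bayes cheap to update inside an active learning loop, the sketch below keeps only per-class term counts, so adding a newly labelled document is a constant-time count update rather than a retraining run. The class name `CountingNaiveBayes`, the multinomial model, and the Laplace smoothing parameter `alpha` are assumptions chosen for illustration; the boosted and EM-based variants cited above are not reproduced here.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


class CountingNaiveBayes:
    """Multinomial Naive Bayes stored purely as counts (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                        # Laplace smoothing (assumed)
        self.class_docs: Counter = Counter()      # documents seen per class
        self.term_counts = defaultdict(Counter)   # term counts per class
        self.vocab: set = set()

    def add(self, tokens: Iterable[str], label: str) -> None:
        """Absorb one labelled document: a single pass over its tokens."""
        tokens = list(tokens)
        self.class_docs[label] += 1
        self.term_counts[label].update(tokens)
        self.vocab.update(tokens)

    def posterior(self, tokens: List[str]) -> Dict[str, float]:
        """Return P(class | tokens), ignoring out-of-vocabulary tokens."""
        total_docs = sum(self.class_docs.values())
        log_scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((counts[t] + self.alpha) / denom)
            log_scores[label] = score
        # Normalise in log space for numerical stability.
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


nb = CountingNaiveBayes()
nb.add(["cheap", "pills"], "spam")
nb.add(["meeting", "agenda"], "ham")
probs = nb.posterior(["cheap"])
```

The posterior returned here could feed an uncertainty measure directly; after the oracle labels a queried document, a single call to `add` brings the model up to date.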

SVM Based Active Learners

SVMs have been very successful and are very widely used in active learning. One of the most popular approaches to support vector machine based active learning was proposed by Tong & Koller (2001). They developed their selection strategy based on an analysis of the version space. The idea is that the most informative examples are those which can halve the version space. Three methods are proposed, each an approximation to a querying component that always halves the version space. The first method, ‘SIMPLE MARGIN’, selects the example closest to the decision hyperplane in the kernel space; namely, the point with the smallest margin. The second method, ‘MAXMIN MARGIN’, computes two margins $m^{+}$ and $m^{-}$ for each unlabelled example, obtained when it is labelled as the positive class and the negative class respectively, and then queries the example for which the quantity $\min(m^{+}, m^{-})$ is greatest. The third method, ‘RATIO MARGIN’, also computes the two margins $m^{+}$ and $m^{-}$ as in MAXMIN MARGIN but queries the example with the largest value of $\min\left(\frac{m^{-}}{m^{+}}, \frac{m^{+}}{m^{-}}\right)$ instead. MAXMIN MARGIN and RATIO MARGIN achieve better performance than SIMPLE MARGIN, but their main drawback is their high computational cost: for each query, two SVMs need to be built for each unlabelled example in the pool.
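The three selection rules can be compared side by side in a short sketch. It assumes the expensive part has already been done: `decision_values` holds hypothetical signed SVM outputs for each pool example, and `margins` holds hypothetical pairs of positive margins obtained by retraining with each example labelled positive and negative. No SVM is trained here, and all numbers are invented for illustration.

```python
from typing import Dict, Tuple


def simple_margin(decision_values: Dict[str, float]) -> str:
    """Pick the example closest to the hyperplane (smallest |f(x)|)."""
    return min(decision_values, key=lambda x: abs(decision_values[x]))


def maxmin_margin(margins: Dict[str, Tuple[float, float]]) -> str:
    """Pick the example with the largest min(m_pos, m_neg)."""
    return max(margins, key=lambda x: min(margins[x]))


def ratio_margin(margins: Dict[str, Tuple[float, float]]) -> str:
    """Pick the example with the largest min(m_neg / m_pos, m_pos / m_neg)."""
    def score(x: str) -> float:
        m_pos, m_neg = margins[x]  # both assumed strictly positive
        return min(m_neg / m_pos, m_pos / m_neg)
    return max(margins, key=score)


# Hypothetical signed decision values f(x) from the current SVM.
decision_values = {"x1": 1.4, "x2": -0.2, "x3": 0.9}

# Hypothetical (m_pos, m_neg) pairs; obtaining them in practice means
# training two SVMs per pool example, which is the cost noted above.
margins = {"x1": (0.9, 0.1), "x2": (0.5, 0.4), "x3": (0.2, 0.21)}

simple_choice = simple_margin(decision_values)
maxmin_choice = maxmin_margin(margins)
ratio_choice = ratio_margin(margins)
```

On this toy data MAXMIN MARGIN and RATIO MARGIN disagree: `x2` has the larger absolute margins, while `x3` has the more balanced pair, which is what the ratio rewards.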

Since SIMPLE MARGIN is more efficient, it is widely used in SVM based active learning (Moskovitch et al., 2009; Novak et al., 2006), in particular because it performs quite well on text classification problems (see Godbole et al., 2004). Schohn & Cohn (2000) described an application of the SIMPLE MARGIN strategy to text classification and found that an SVM trained on a small set of actively selected documents performed better than an SVM trained on the whole dataset. Ertekin (2005) also found that active learning can significantly reduce the number of training examples an SVM needs, while the learner’s classification performance is preserved, and even increased in some cases.

Many extensions or variants of the basic SVM-based active learner using SIMPLE MARGIN have been proposed. Ertekin (2005) proposed an extension of SIMPLE MARGIN called ‘Simple Random Active Learning’. In Simple Random Active Learning, a small constant number of unlabelled examples is first selected at random, and then the example closest to the margin among this small set is chosen for labelling. Raghavan et al. (2006) used a method similar to Simple Random Active Learning but focused on extending the traditional active learning framework to include feedback on features in addition to labelling examples. Xu et al. (2003) proposed representative sampling as an extension to the SIMPLE MARGIN method. The representative sampling method explores the clustering structure of uncertain documents (documents in the classification margin) and selects the cluster centres for labelling. Dasgupta & Ng (2009) used an SVM based selection strategy similar to SIMPLE MARGIN and combined active learning, transductive learning and ensemble learning for sentiment classification.
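The Simple Random Active Learning idea described above can be sketched in a few lines: score only a small random subset of the pool instead of the whole pool. The function name `simple_random_query`, the `margin_distance` callback (smaller means closer to the margin), and the toy one-dimensional pool are assumptions for illustration, not the original implementation.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def simple_random_query(
    pool: List[T],
    margin_distance: Callable[[T], float],
    sample_size: int,
    rng: random.Random,
) -> T:
    """Sample a small random subset, then pick its closest-to-margin member.

    Only `sample_size` examples are scored per query, so the cost of one
    query no longer grows with the size of the pool.
    """
    candidates = rng.sample(pool, min(sample_size, len(pool)))
    return min(candidates, key=margin_distance)


# Toy pool of integers; the hypothetical decision boundary sits at x = 3,
# so |x - 3| stands in for the distance to the margin.
pool = list(range(-50, 51))


def toy_distance(x: int) -> float:
    return abs(x - 3)


sampled_choice = simple_random_query(pool, toy_distance, 10, random.Random(0))
exhaustive_choice = simple_random_query(pool, toy_distance, len(pool), random.Random(0))
```

When `sample_size` equals the pool size the procedure reduces to ordinary SIMPLE MARGIN over the full pool; smaller values trade some selection quality for speed.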

k-NN Based Active Learners

The re-building, re-classifying and re-ranking of the pool can make uncertainty-based active learning very computationally expensive. As a lazy learner, the k Nearest Neighbour classifier is attractive for active learning because introducing new examples to the classifier simply involves adding them to the labelled set, and because much of the computation required for classification (e.g. the similarities between all examples) can be pre-computed.

In k-NN based uncertainty sampling, the output of the k-NN classifier can be transformed into a class membership probability estimate, where the distribution is based on the distance of the query example to its k nearest neighbours, and the estimate can then be used as a measure of uncertainty. Examples of using k-NN classifiers in the active learning process were proposed initially by Hasenjager & Ritter (1998) and Lindenbaum et al. (2004). More recent examples include developing recommender systems that minimise the number of requests for user evaluations (Teixeira et al., 2002); investigating dimensionality reduction for active learning with a nearest neighbour classifier in text classification (Davy & Luz, 2007b); a supervised network intrusion detection method based on Transductive Confidence Machines (TCM-KNN) (Li & Guo, 2007); and building classification systems with a weighted k-nearest neighbour classifier (Cebron & Berthold, 2008).
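One possible way to turn k-NN output into a probability estimate and an uncertainty score is sketched below. The inverse-distance weighting, the `1 - max probability` uncertainty measure, and the toy two-dimensional data are all assumptions made for the example; the cited works use their own formulations.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def knn_class_probabilities(
    query: Point,
    labelled: Sequence[Tuple[Point, str]],
    k: int,
) -> Dict[str, float]:
    """Distance-weighted class membership estimate from the k nearest neighbours."""
    neighbours = sorted(labelled, key=lambda ex: math.dist(query, ex[0]))[:k]
    weights: Dict[str, float] = {}
    for features, label in neighbours:
        # Assumed weighting: closer neighbours count more (inverse distance).
        w = 1.0 / (math.dist(query, features) + 1e-9)
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def uncertainty(probs: Dict[str, float]) -> float:
    """Least-confidence uncertainty: 1 minus the largest class probability."""
    return 1.0 - max(probs.values())


# Toy labelled set and pool (invented values).
labelled: List[Tuple[Point, str]] = [
    ((0.0, 0.0), "neg"), ((0.0, 1.0), "neg"),
    ((4.0, 4.0), "pos"), ((5.0, 4.0), "pos"),
]
pool: List[Point] = [(0.2, 0.5), (2.2, 2.3)]

most_uncertain = max(
    pool, key=lambda x: uncertainty(knn_class_probabilities(x, labelled, k=3))
)
```

Because k-NN is a lazy learner, adding the newly labelled example to `labelled` is the only update needed before the next query; in a larger system the pairwise distances could also be computed once and cached.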

It is interesting to note that, except for the small number of examples mentioned above, the k-NN classifier has not been widely used in active learning research. This is something that we intend to pursue in this work.