SIMPLE ENHANCED - Online Choice of Active Learning Algorithms

SIMPLE

Figure 11: Left: Data Set ‘Twonorm2D’ that was created by projecting the UCI ‘Twonorm’ data set to two dimensions. This data set demonstrates the unstable general behavior of active learners during early stages of the learning session. Right: Learning curves ofSIMPLE

and enhancedSIMPLE(via buffering) on the ‘Twonorm2D’ data set.

We propose the following solution using the idea of “buffering”, as well as our classification entropy estimator. One of the symptoms of the drastic decision boundary instability described above is a (drastic) change in the classification entropy. We observe that when the boundary shifts from the border between the two clusters to the middle of one of the clusters (iterations 5,7) the entropy decreases. Therefore, the solution we propose is to look at the change in entropy of the pool (and training set) classification in the end of each trial. As long as the entropy decreases, new examples are buffered and not added to the training set. Once the entropy has not decreased, all buffered

(4) (5) (6)

(7) (8)

Figure 12: Pool classification and training sets (blackened points) generated by theSIMPLEactive learner after 4-8 iterations on the ‘Twonorm2D’ data set.

points are added to the training set. If we want to keep a bounded buffer this solution introduces a new parameter; namely, the buffer size. In our experience a small buffer is sufficient and in all our experiments we used a buffer of three points.

This buffering approach provides an effective solution to the above problem. In Figure 11 (right) we see the learning curves of SIMPLEon the ‘Twonorm2D’ data set with and without this

enhancement. It is evident that the learning curve of the original algorithm is very unstable. Adding the buffering enhancement smoothes the learning curve. Figure 13 shows the pool classification and the training sets generated by the enhancedSIMPLEactive learner after 4-8 iterations. The decision

boundary now does not change drastically from one iteration to another.

References

(4) (5) (6)

(7) (8)

Figure 13: Pool classification and training sets (blackened points) generated by the enhanced

SIMPLEactive learner after 4-8 iterations on the ‘Twonorm2D’ data set.

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

R. Bachrach, S. Fine, and E. Shamir. Query by committee, linear separation and random walks. In Proceedings of Euro-COLT, 13th European Conference on Computational Learning Theory, 1999.

O. Bousquet and A. Elisseeff. Stability and generalization. Journal of Machine Learning Research, pages 499–526, 2002.

C. Campbell, N. Cristianini, and A. Smola. Query learning with large margin classifiers. In Pro- ceedings of ICML-2000, 17th International Conference on Machine Learning, pages 111–118, 2000.

G. Cauwenberghs and T. Poggio. Incremental and decremental support vector machine learning. In Neural Information Processing Systems (NIPS), pages 409–415, 2000.

N. Ceza-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, May 1997.

C. C. Chang and C. J. Lin. Libsvm: a library for support vector machines, 2002. URL

http://www.csie.ntu.edu.tw/ cjlin/papers/libsvm.pdf.

D. A. Cohn, L. Atlas, and R. E. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.

D. A. Cohn, Z. Ghahramani, and M. I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.

N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge Uni- versity Press, 2000.

Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168, 1997.

I. Guyon, N. Matic, and V. Vapnik. Discovering informative patterns and data cleaning. In Advances in Knowledge Discovery and Data Mining, pages 181–203. 1996. URL http://citeseer.nj.nec.com/guyon96discovering.html.

R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines. Journal of Machine Learning Research, 1:245–279, 2001. URLhttp://citeseer.nj.nec.com/herbrich01bayes.html. D. Hochbaum and D. Shmoys. A best possible heuristic for the K-center problem. Mathematics of

Operations Research, 10(2):180–184, 1985.

A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Infor- mation Processing Systems, volume 7, pages 231–238. The MIT Press, 1995. URL http://citeseer.nj.nec.com/krogh95neural.html.

S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. Technical Report TR-2002-03, University of Chicago, 2002.

M. Lindenbaum, S. Markovitch, and D. Rusakov. Selective sampling for nearest neighbor classi- fiers. Machine Learning, 54(2):125–152, 2004.

D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992. URLhttp://citeseer.nj.nec.com/47461.html.

A. K. McCallum and K. Nigam. Employing EM in pool-based active learning for text classifica- tion. In Proceedings of ICML-98, 15th International Conference on Machine Learning. Morgan Kaufmann Publishers.

T. Mitchell. Generalization as search. Artificial Intelligence, 28:203–226, 1982.

G. R¨atsch, T. Onoda, and K. R. M ¨uller. Soft margins for Adaboost. Machine Learning, 42:287–320, 2001.

N. Roy and A. McCallum. Toward optimal active learning through sampling estimation of error reduction. In Proceedings of ICML-2001, 18th International Conference on Machine Learning, pages 441–448, 2001.

M. Saar-Tsechansky and F. Provost. Active learning for class probability estimation and ranking, 2002. URLhttp://citeseer.nj.nec.com/tsechansky01active.html.

G. Schohn and D. Cohn. Less is more: Active learning with support vector machines. In ICMLPro- ceedings of ICML-2000, 17th International Conference on Machine Learning, pages 839–846, 2000.

B. Sch¨olkopf and A. Smola. Learning with Kernels. MIT Press, 2002.

M. Seeger. Learning with labeled and unlabeled data, 2002. URL

http://citeseer.nj.nec.com/seeger01learning.html.

H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Computational Learning Theory, pages 287–294, 1992. URLhttp://citeseer.nj.nec.com/seung92query.html. J. Shaw-Taylor and N. Christianini. Further results on the margin distribution. In Proceedings of

the 12th Annual ACM Conference on Computational Learning Theory (COLT), pages 278–285, 1999.

J. Shaw-Taylor and N. Christianini. On the generalization of soft margin algorithms. IEEE Trans- actions on Information Theory, 48(10):2721–2735, 2002.

N. Tishby, F. C. Pereira, and W. Bialek. Information bottleneck method. In 37-th Allerton Confer- ence on Communication and Computation, 1999.

S. Tong and D. Koller. Support vector machine active learning with applications to text classifica- tion. Journal of Machine Learning Research, 2:45–66, 2001.

V. N. Vapnik. Statistical Learning Theory. John Wiley & Sons, Inc., 1998.

T. Zhang and F. Oles. A probability analysis on the value of unlabeled data for classification prob- lems. In International Joint Conference on Machine Learning, pages 1191–1198, 2000.

In document Online Choice of Active Learning Algorithms (Page 33-37)