
To the best of our knowledge, FIMT-DD is the first algorithm for learning model trees from time-changing data streams with explicit drift detection. The algorithm learns very fast, with low processing and prediction time per example. The only memory it requires is for storing sufficient statistics at the tree leaves, and this requirement is independent of the size of the input. The model tree is available for use at any time during the course of learning and will always reflect the latest changes in the functional dependencies. This is achieved through local change detection and adaptation, which avoids the cost of re-growing the whole tree when only local changes are necessary.

In terms of accuracy, FIMT-DD is competitive with batch algorithms even for medium-sized datasets and has smaller values for the variance component of the error. In a sense, this means that, given an infinite stream of instances arriving at high speed, the incrementally learned model tree will asymptotically have accuracy comparable to that of the tree learned by an assumed ideal batch learner that is able to process an infinite amount of training data. In addition, the algorithm effectively maintains an up-to-date model even in the presence of different types of concept drift. Both of the proposed methods for change detection are quite robust and rarely trigger false alarms. With respect to the ability to handle concept drift, the general conclusion is that the best combination of a change detection method (TD and BU) and an adaptation method (Prune and AltTree) depends largely on the type of concept drift.


7 Online Option Trees for Regression

You may delay, but time will not. Benjamin Franklin

Hoeffding-based algorithms for learning decision or regression trees are among the most popular methods for online classification and regression. Because they can make valid selection and stopping decisions based on probabilistic estimates, they are able to analyze quantities of data that go beyond the processing abilities of batch learning algorithms. The probabilistic estimates enable provably approximately correct decisions on the estimated advantage of one attribute over another. This type of selection decision is based on an estimate of the merit of each candidate refinement of a particular hypothesis, which corresponds to a simple hill-climbing search strategy in the space of possible hypotheses. Hill-climbing search strategies are known for their inability to avoid local minima, and tree-building approaches that employ them are susceptible to myopia.
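As a rough illustration of such a probabilistic selection decision, the sketch below uses the standard form of the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), and accepts the best candidate once its observed advantage over the runner-up exceeds epsilon. The function names, the merit values and the choice of comparing a simple difference of merits are illustrative assumptions, not the exact test used by any particular algorithm discussed in this chapter.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound for the mean of n observations.

    value_range (R) is the assumed range of the random variable and
    delta is the allowed probability that the bound is violated.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_select_best(best_merit: float, second_merit: float,
                    value_range: float, delta: float, n: int) -> bool:
    """Hypothetical selection rule: commit to the best candidate only when
    its observed advantage over the runner-up exceeds the bound."""
    return (best_merit - second_merit) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # The same observed advantage becomes decisive only with more examples.
    for n in (1_000, 10_000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        decided = can_select_best(0.30, 0.25, 1.0, 1e-7, n)
        print(f"n={n}: epsilon={eps:.4f}, select={decided}")
```

The bound shrinks as more examples are observed, which is why a decision that cannot be made after a thousand examples may become possible after ten thousand.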

In this chapter, we discuss the idea of introducing option nodes in regression trees as a method for avoiding the problem of local minima. We start with a presentation of existing algorithms for capping options for Hoeffding (decision) trees, and continue with a presentation of our algorithm for online learning of option trees for regression. The presentation is structured in two sections: the first discusses the splitting criterion, and the second discusses methods for aggregating multiple predictions. The last section presents an experimental evaluation conducted on a number of real-world and artificial datasets. Our results show that an online option tree has better generalization power than a Hoeffding-based regression tree, as a result of its better exploration of the hypothesis space. At the same time, the use of options enables us to resolve ambiguous situations and reach selection decisions sooner.

7.1 Capping Options for Hoeffding Trees

Pfahringer et al. (2007) were the first to explore the idea of using options for improving the accuracy of online classification. Their algorithm, termed Hoeffding Option Trees, extends the Hoeffding tree algorithm of Domingos and Hulten (2000) by introducing a mechanism for appending additional splits to the existing internal nodes. In particular, the algorithm periodically reevaluates each selection decision. If a different split is probabilistically estimated to have higher merit than the existing one, it is added as an optional split.

The question that one might ask now is: "Why would a different split have a higher merit than the one which was previously chosen based on the collected statistical evidence?" The answer to this question is threefold. The first part is concerned with the probabilistic nature of the selection process. Namely, although the probability of a correct selection can be very high, there is still a non-zero chance of failure. The second part takes into account the existence of tie situations and the assumption that the third-best split or lower can be safely removed from the "competition". The third part is related to the possibility of changes (concept drift) in the functional dependency modeled by the tree.

In the work of Pfahringer et al. (2007), the options are not introduced during the split selection phase. Their algorithm gradually adds optional splits to the existing ones until a maximal number of allowed options is reached. When computing a prediction, the option tree uses a weighted voting strategy to combine all of the predictions produced by the alternate subtrees. This strategy sums the individual probabilistic predictions (per class) obtained from the alternate subtrees and outputs the class with the highest probability. A slightly improved version of the same algorithm, termed Adaptive Hoeffding Option Trees, has been proposed, in which each leaf stores an estimate of its current error using an exponentially weighted moving average (Bifet et al., 2009b). This estimate is used in the voting scheme to adjust the weight of each node in proportion to the square of the inverse of its error.
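The two voting schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of either cited algorithm: the function names, the representation of a subtree's prediction as a dictionary of per-class probabilities, the smoothing factor `alpha` and the small constant `eps` that guards against division by zero are all hypothetical.

```python
from collections import defaultdict


def vote_by_summed_probabilities(predictions):
    """Sum the per-class probabilities of all alternate subtrees and
    return the class with the highest total."""
    totals = defaultdict(float)
    for distribution in predictions:
        for label, probability in distribution.items():
            totals[label] += probability
    return max(totals, key=totals.get)


def update_ewma_error(current_error, observed_error, alpha=0.2):
    """Exponentially weighted moving average of the error.
    alpha is an assumed smoothing factor."""
    return (1.0 - alpha) * current_error + alpha * observed_error


def vote_by_inverse_squared_error(predictions, errors, eps=1e-9):
    """Weight each subtree's prediction by the square of the inverse of its
    estimated error, then return the class with the highest weighted total."""
    totals = defaultdict(float)
    for distribution, error in zip(predictions, errors):
        weight = 1.0 / (error + eps) ** 2
        for label, probability in distribution.items():
            totals[label] += weight * probability
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Three alternate subtrees predicting over two classes (made-up numbers).
    predictions = [
        {"a": 0.60, "b": 0.40},
        {"a": 0.20, "b": 0.80},
        {"a": 0.55, "b": 0.45},
    ]
    errors = [0.1, 0.5, 0.1]  # assumed EWMA error estimates per subtree
    print(vote_by_summed_probabilities(predictions))
    print(vote_by_inverse_squared_error(predictions, errors))
```

In this made-up example the second subtree dominates the unweighted sum, but its larger error estimate sharply reduces its influence under the error-based weighting, so the two schemes can disagree.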

Following a similar approach, Liu et al. (2009) have proposed another variant of introducing options by using the change detection features of the CVFDT algorithm (Hulten et al., 2001). Options are appended to the existing splits based on a similar reevaluation. The main difference is in the method for combining the available predictions. The approach of Liu et al. (2009) selects the most recent and most accurate alternate subtree, from which a prediction is derived. An optional split would thus reflect the most recent and best performing "knowledge".

Option nodes encode valuable information on the existence of alternative splits at various parts of the tree structure. The alternative splits point out different directions for exploring the hypothesis space and emphasize the ambiguity in the model. They are an interesting direction for extending Hoeffding-based trees, as noted previously by Pfahringer et al. (2007). However, we see some room for improvement in the way options are introduced. Our observation is that, because the Hoeffding bound is conservative and requires a considerable number of examples to be processed before statistical stability is reached, options will be introduced with some delay, which translates into a delay in the learning process. In addition, the proposed algorithms require sufficient statistics to be continuously maintained in all of the internal nodes. This implies an additional increase in memory consumption and in processing time per example. Finally, options have so far not been used in online regression trees. In the following section, we present our suggestions for improvements.