3.3 Handling Concept Drift
3.3.1 Instance Selection Based Concept Drift Handling Approaches
Instance selection approaches handle concept drift by selecting which instances to train on, based on their perceived relevance to the current concept. The most common instance selection technique is a variation of the continuous rebuild approach known as a sliding window. In sliding window approaches, the set of instances on which the classifier is trained is known as the training window. The data in the training window is refreshed periodically with new data, on the assumption that recent data is more likely to belong to the current concept.
With sliding windows, new instances are added to the front of the training window as they arrive. Once the new training window has been formed the classifier is rebuilt, making sliding window approaches continuous rebuild approaches. The number of instances in the training window is known as the size of the window. A fixed size sliding window maintains a fixed number of instances in the window: when new instances are added to the front of the window, an equal number of instances are removed from the end of the window. Figure 3.2 illustrates how a fixed size sliding window behaves on a data stream containing concept drift. The example shows a data stream split into batches, but the principle of a sliding window can be applied to an instance based data stream in a very similar manner.
(a) A fixed size sliding window at time t.
(b) A fixed size sliding window at time t + 15.
(c) A fixed size sliding window at time t + 30.
Figure 3.2: An illustration of how a fixed size sliding window handles concept drift.
Each box is a batch of data in the data stream. The red batch is the current batch, that is, the batch of data that the classifier is currently processing, and the blue batches are “historic data”, batches which the classifier has processed in the past. The purple batches are historic data which form the training data of the classifier, and the orange batches are unlabelled data yet to be processed. At each batch the classifier is trained on the data in the training window and classifies the unlabelled data in the current batch. After the current batch has been classified it is added to the training window and the oldest batch is removed from the training window. In Figure 3.2a the sliding window is one batch away from a change in concept, marked by the dashed vertical line. Figure 3.2b shows the same data stream after the training window has moved 15 batches to the right. The training window now contains data from both before and after the change in concept, which might result in a drop in classifier performance. Finally, Figure 3.2c shows the process 15 batches further along. At this stage the training window is composed solely of data from the new concept, which should result in a rebound in classifier performance. The size of the sliding window dictates the properties of the fixed size sliding window. A small window reacts to a change in concept faster than a large window, but is more sensitive to noise, whereas a large window tends to perform better when the concept is stable.
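The behaviour of a fixed size window can be sketched in a few lines of Python. This is a minimal illustration rather than any published implementation: the class name `FixedSlidingWindow` and its methods are hypothetical, the window size is counted in batches, and each batch is assumed to be a plain list of instances.

```python
from collections import deque


class FixedSlidingWindow:
    """Training window holding the most recent `size` batches (hypothetical helper)."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("size must be at least 1")
        # A deque with maxlen discards the oldest batch when a new one is appended.
        self._batches = deque(maxlen=size)

    def add(self, batch):
        """Add the batch that has just been classified to the front of the window."""
        self._batches.append(list(batch))

    def training_data(self):
        """Return all instances in the window, oldest batch first."""
        return [instance for batch in self._batches for instance in batch]


window = FixedSlidingWindow(size=3)
for batch in (["a1", "a2"], ["b1"], ["c1"], ["d1", "d2"]):
    window.add(batch)

# The oldest batch (a1, a2) has been dropped; the three most recent batches remain.
current_training_set = window.training_data()
```

In a full system the classifier would be rebuilt from `training_data()` before each new batch is classified, which is what makes this a continuous rebuild approach.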
A variable size sliding window allows the size of the window to change, rather than remain fixed. With a variable size window, the window is usually allowed to grow when the concept is stable (Figure 3.3a) and shrink when there is a suspicion that the concept has changed (Figure 3.3b).
One of the earliest sliding window approaches was the Floating Rough Approximation (FLORA) family of concept drift handling approaches, starting with FLORA (Kubat, 1989). FLORA uses a sliding window with a fixed window size. This was improved in FLORA2 (Widmer & Kubat, 1992), which uses a variable size window with the window size adjusted based on the classifier error rate. FLORA3 and FLORA4 (Widmer & Kubat, 1996) improved the handling of recurring trends and noise respectively.
(a) A variable size sliding window at time t.
(b) A variable size sliding window at time t + 15.
(c) A variable size sliding window at time t + 30.
Figure 3.3: An illustration of how a variable size sliding window handles concept drift.
Klinkenberg & Renz (1998) introduced another notable sliding window approach, which uses accuracy, precision and recall as indicators to adjust the window size. The average value of each indicator is calculated over a number of previous batches. If any indicator for the current batch falls below its average value over the previous batches, a change in concept is flagged. The difference between the current indicator value and the average value obtained from the previous batches determines by how much the training window is shrunk. If the difference is large, a concept shift is suspected and the window is shrunk to the current batch. Otherwise a more gradual change in concept is presumed and the window is shrunk by a user defined constant.
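The indicator based resizing rule can be sketched for a single indicator as follows. This is a simplified illustration and not the procedure from the cited paper: it assumes that higher indicator values are better, so that a drop relative to the recent average is the drift signal, and the function name `adjust_window`, the constants `shift_drop` and `shrink_step`, and the choice to grow the window by one batch when no change is flagged are all assumptions of this sketch.

```python
from statistics import mean


def adjust_window(previous_values, current_value, window_size,
                  shift_drop=0.2, shrink_step=2):
    """Return the new window size, measured in batches.

    previous_values: indicator values (e.g. accuracy) for earlier batches.
    shift_drop and shrink_step are illustrative, user-chosen constants.
    """
    if not previous_values:
        return window_size + 1  # nothing to compare against yet
    drop = mean(previous_values) - current_value
    if drop <= 0:
        # Indicator is at or above its recent average: no change flagged.
        # Growing by one batch here is an assumption of this sketch.
        return window_size + 1
    if drop >= shift_drop:
        # Large drop: suspect a concept shift, keep only the current batch.
        return 1
    # Smaller drop: presume gradual drift, shrink by a fixed step.
    return max(1, window_size - shrink_step)
```

With three indicators, one plausible design is to evaluate the function once per indicator and keep the smallest resulting window, so that any single indicator can trigger a shrink.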
error based signal. If the error rate is above a warning threshold, a change in concept is suspected and a new classifier is trained in parallel with the current classifier. If the error rate goes above a second threshold, the error threshold, a change in concept is declared and the current classifier is replaced by the one trained from the point where the error rate exceeded the warning threshold.
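The two-threshold scheme can be viewed as a small state machine. The sketch below is illustrative only: the class name `TwoThresholdMonitor` is hypothetical, both thresholds are assumed to be fixed constants supplied by the user, and discarding the parallel classifier when the error rate falls back below the warning threshold is an assumption rather than something stated in the text.

```python
class TwoThresholdMonitor:
    """Tracks warning and drift states from a stream of error rates (hypothetical helper)."""

    def __init__(self, warning_threshold, error_threshold):
        if not warning_threshold < error_threshold:
            raise ValueError("warning_threshold must be below error_threshold")
        self.warning_threshold = warning_threshold
        self.error_threshold = error_threshold
        self.warning_start = None  # batch index at which the current warning began

    def update(self, batch_index, error_rate):
        """Return (state, start): state is 'stable', 'warning' or 'drift'.

        For 'warning' and 'drift', start is the batch index from which the
        replacement classifier is (or was) trained; for 'stable' it is None.
        """
        if error_rate > self.error_threshold:
            # Drift declared: the replacement was trained since the warning began
            # (or from this batch if no warning preceded the jump).
            start = batch_index if self.warning_start is None else self.warning_start
            self.warning_start = None
            return "drift", start
        if error_rate > self.warning_threshold:
            if self.warning_start is None:
                self.warning_start = batch_index
            return "warning", self.warning_start
        # Error rate back below the warning level: discard the parallel classifier.
        self.warning_start = None
        return "stable", None


monitor = TwoThresholdMonitor(warning_threshold=0.2, error_threshold=0.35)
states = [monitor.update(i, e) for i, e in enumerate([0.10, 0.25, 0.28, 0.40, 0.12])]
```

Here the warning begins at batch 1, so when drift is declared at batch 3 the replacement classifier is the one trained on batches 1 onwards.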
Kuncheva (2009) uses statistical tests on the error rate to adjust the window size. When the error rate is greater than the mean error rate plus three times the standard deviation, the window collapses to the current batch; otherwise the window grows.
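The three standard deviation rule can be written down directly. The sketch below is an illustration under stated assumptions, not the cited implementation: the function name `next_window_size` is hypothetical, the window is counted in batches, the population standard deviation of past batch error rates is used, and the window is assumed to grow by exactly one batch when no spike is detected.

```python
from statistics import mean, pstdev


def next_window_size(past_errors, current_error, window_size, k=3.0):
    """Collapse the window to one batch on a k-sigma error spike, otherwise grow it by one."""
    if len(past_errors) >= 2:
        threshold = mean(past_errors) + k * pstdev(past_errors)
        if current_error > threshold:
            return 1  # keep only the current batch
    return window_size + 1


past = [0.10, 0.12, 0.11, 0.09]
collapsed = next_window_size(past, 0.30, window_size=5)  # spike well above the threshold
grown = next_window_size(past, 0.11, window_size=5)      # within the normal range
```

With fewer than two past error rates no standard deviation can be estimated, so the sketch simply lets the window grow.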
The above mentioned sliding window approaches all use the error rate of a classifier as an indicator which influences the window size. This is a very common way of selecting the window size (other examples using this idea include Baena-García et al., 2006; Klinkenberg & Joachims, 2000; Nishida & Yamauchi, 2007), and tends to handle concept drift in an intuitive and effective manner. However, these approaches require that the classification error can be calculated, that is, that the true labels of recently classified instances are available.
Vorburger & Bernstein (2006) use an adaptation of Shannon’s entropy to calculate a window size indicator. Concept drift is handled using a sliding window: when the entropy goes below a threshold the training window collapses, otherwise it grows.
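To give a feel for how an entropy based indicator can drive the window size, the following sketch compares the distribution of discrete values (for example discretised feature values) in an older window and a newer window. It is a simplified stand-in and not the measure defined in the cited work: the functions `window_entropy` and `resize_on_entropy`, the default threshold of 0.8 and the growth step of one batch are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def binary_entropy(p):
    """Shannon entropy (in bits) of a two-outcome distribution with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def window_entropy(old_values, new_values):
    """Mean per-value entropy of 'old vs new' membership, using relative frequencies.

    Returns 1.0 when both windows have identical distributions and
    0.0 when they share no values.
    """
    if not old_values or not new_values:
        raise ValueError("both windows must be non-empty")
    old_freq = Counter(old_values)
    new_freq = Counter(new_values)
    values = set(old_freq) | set(new_freq)
    total = 0.0
    for v in values:
        p_old = old_freq[v] / len(old_values)
        p_new = new_freq[v] / len(new_values)
        total += binary_entropy(p_old / (p_old + p_new))
    return total / len(values)


def resize_on_entropy(entropy, window_size, threshold=0.8):
    """Collapse the window to one batch on low entropy, otherwise grow it by one."""
    return 1 if entropy < threshold else window_size + 1
```

Because this indicator is computed from the data distribution alone, a scheme of this kind does not depend on the classification error being available.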
Instance selection based approaches attempt to handle concept drift by selecting which instances are used to train the classifier. The most common selection technique is a sliding window variant. Sliding window based techniques are intuitive, do not usually require much parameter tuning, and are based on the reasonable assumption that instances which are chronologically close are likely to belong to the same concept. However, sliding window techniques are restricted by the classifier used. For example, some simple classifiers are fast to update, which is important for processing data streams, but may suffer in terms of accuracy. Ensembles have been shown to achieve high accuracy on some non-evolving data, and can be altered to handle concept drift. The next section looks at how ensembles can be used on data exhibiting concept drift.