
2.5 Online Approaches to Learn in the Presence of Concept Drift

2.5.1 Methods which Reset the System Upon Drift Detection

Some of the approaches which reset the system upon drift detection are the Drift Detection Method (DDM), the Early Drift Detection Method (EDDM) and the Statistical Test of Equal Proportions (STEPD).

DDM (Gama et al.; 2004) is based on the idea that the error rate of a learning algorithm should decrease as the number of training examples increases when the distribution of the examples is stationary. So, a significant increase in the error of the algorithm suggests that the class distribution is changing. It is important to note that ensemble approaches that do not explicitly detect drifts, but handle them by pruning ensemble members with high training error, are also based on this idea.

DDM stores the error rate $p_{min}$ and standard deviation $s_{min}$ recorded at the point where $p_i + s_i$ reached its minimum so far, where $p_i$ is the current error rate and $s_i$ is the current standard deviation. If $p_i + s_i \geq p_{min} + 2s_{min}$, a warning level is triggered. If $p_i + s_i \geq p_{min} + 3s_{min}$, it is considered that there is concept drift.
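The DDM rule can be sketched as an incremental detector that receives one misclassification flag per example. This is a minimal sketch, not Gama et al.'s implementation, and the class and method names are hypothetical. It assumes, following the original DDM paper rather than the text above, that $s_i = \sqrt{p_i(1-p_i)/i}$ after $i$ predictions and that no test is made before 30 predictions. The text states the rule with $\geq$; the sketch uses strict $>$ so that an error-free stream, where $p_{min} = s_{min} = 0$, does not signal a drift at every step.

```python
import math


class DDM:
    """Minimal sketch of the DDM rule (hypothetical class, not Gama et al.'s code)."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances  # assumed start-up period
        self.reset()

    def reset(self):
        # Forget everything learnt since the last drift.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Process one prediction (error=True if misclassified).

        Returns "normal", "warning" or "drift".
        """
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n                 # p_i: current error rate
        s = math.sqrt(p * (1 - p) / self.n)      # s_i: assumed binomial std
        if self.n < self.min_instances:
            return "normal"
        # Keep p and s from the point where p_i + s_i was smallest.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        # Strict comparisons avoid constant alarms when p_min = s_min = 0.
        if p + s > self.p_min + 3 * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "normal"
```

On "drift" the detector clears its own statistics, so monitoring restarts from scratch for the new concept.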

EDDM (Baena-García et al.; 2006) is inspired by DDM. However, it is based on the idea that, while a stable concept is being learnt, not only does the accuracy improve but the distance between two consecutive errors also increases. So, the average distance between two errors ($p'_i$) and its standard deviation ($s'_i$) are calculated, and their values at the point where $p'_i + 2s'_i$ reached its maximum so far are stored ($p'_{max}$ and $s'_{max}$).

If $(p'_i + 2s'_i)/(p'_{max} + 2s'_{max}) < \alpha$, a warning level is triggered. If $(p'_i + 2s'_i)/(p'_{max} + 2s'_{max}) < \beta$, with $\alpha > \beta$, it is considered that there is concept drift.
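A comparable sketch for EDDM tracks the distance, in number of examples, between consecutive errors. The thresholds $\alpha = 0.95$ and $\beta = 0.90$ are assumed defaults taken from Baena-García et al.'s paper, not from this text; no test, and no update of the stored maximum, is made before 30 errors, an assumption modelled on the 30-error rule this section gives for EDDM. The mean and standard deviation are updated online with Welford's algorithm, and only the combined maximum $p'_{max} + 2s'_{max}$ is stored, since that is all the ratio test needs. Class and method names are hypothetical.

```python
import math


class EDDM:
    """Minimal sketch of the EDDM rule (hypothetical class, not the authors' code)."""

    def __init__(self, alpha=0.95, beta=0.90, min_errors=30):
        if not alpha > beta:
            raise ValueError("EDDM requires alpha > beta")
        self.alpha, self.beta, self.min_errors = alpha, beta, min_errors
        self.reset()

    def reset(self):
        self.n = 0              # examples seen since the last drift
        self.last_error = 0     # position of the previous error
        self.num_errors = 0
        self.mean = 0.0         # p'_i: mean distance between errors
        self.m2 = 0.0           # sum of squared deviations (Welford)
        self.max_score = 0.0    # stored p'_max + 2 s'_max
        self.state = "normal"   # EDDM only re-evaluates on errors

    def update(self, error):
        """Process one prediction (error=True if misclassified)."""
        self.n += 1
        if not error:
            return self.state
        distance = self.n - self.last_error
        self.last_error = self.n
        self.num_errors += 1
        # Welford's online update of the mean and variance of the distances.
        delta = distance - self.mean
        self.mean += delta / self.num_errors
        self.m2 += delta * (distance - self.mean)
        std = math.sqrt(self.m2 / self.num_errors)   # s'_i
        score = self.mean + 2 * std                  # p'_i + 2 s'_i
        if self.num_errors < self.min_errors:
            return self.state
        self.max_score = max(self.max_score, score)
        ratio = score / self.max_score
        if ratio < self.beta:
            self.reset()
            return "drift"
        self.state = "warning" if ratio < self.alpha else "normal"
        return self.state
```

Because the distances are only updated when an error occurs, the sketch repeats the last state for correctly classified examples instead of reporting "normal", so a warning persists until the next error is evaluated.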

STEPD (Nishida and Yamauchi; 2007b) assumes that, after a concept drift, the accuracy on recent examples and the overall accuracy since the beginning of learning should differ significantly. So, STEPD compares these two accuracies by using a statistical test of equal proportions. The proportions are $\hat{p}_A = r/(n - W)$ and $\hat{p}_B = s/W$, where $n$ is the total number of examples learnt by the online classifier, $s$ is the number of correct classifications among the most recent $W$ examples and $r$ is the number of correct classifications among the other $n - W$ examples (i.e., excluding the most recent $W$ examples from the $n$ examples). A drift can be detected only when $n \geq 2W$. The statistic is calculated by using Pearson's chi-square test with Yates' correction for continuity (Yates; 1934), which has a low computational cost. If $\hat{p}_A > \hat{p}_B$ and the difference is statistically significant at a given one-tailed significance level, a significant decrease in recent accuracy is detected. STEPD uses two significance levels: $\alpha_w$ (warning) and $\alpha_d$ (drift). While $\hat{p}_A > \hat{p}_B$ and the difference is significant at $\alpha_w$, a warning level is set. When $\hat{p}_A > \hat{p}_B$ and the difference is significant at $\alpha_d$, it is considered that there is a concept drift.
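The STEPD test can be sketched as follows. The sketch assumes the normal-approximation form of the test: the continuity-corrected two-proportion statistic $T = (|\hat{p}_A - \hat{p}_B| - 0.5(1/(n-W) + 1/W)) / \sqrt{\hat{p}(1-\hat{p})(1/(n-W) + 1/W)}$ with pooled $\hat{p} = (r+s)/n$, whose square equals Pearson's chi-square with Yates' correction for the corresponding 2×2 table, compared against a one-tailed standard normal distribution. The defaults $W = 30$, $\alpha_w = 0.05$ and $\alpha_d = 0.003$ are assumptions, as are the class and method names; the detector is fed a "correct" flag because STEPD works on accuracies.

```python
from collections import deque
from statistics import NormalDist


class STEPD:
    """Minimal sketch of STEPD (hypothetical class, not the authors' code)."""

    def __init__(self, window=30, alpha_w=0.05, alpha_d=0.003):
        self.window = window                     # W
        self.alpha_w, self.alpha_d = alpha_w, alpha_d
        self.reset()

    def reset(self):
        self.recent = deque()   # 1/0 outcomes of the most recent W examples
        self.s = 0              # correct predictions among the recent W
        self.r = 0              # correct predictions among the older n - W
        self.n = 0

    def update(self, correct):
        """Process one prediction (correct=True if classified correctly)."""
        self.n += 1
        self.recent.append(int(correct))
        self.s += int(correct)
        if len(self.recent) > self.window:
            oldest = self.recent.popleft()       # leaves the recent window
            self.s -= oldest
            self.r += oldest
        if self.n < 2 * self.window:
            return "normal"                      # tested only once n >= 2W
        n_a, n_b = self.n - self.window, self.window
        p_a, p_b = self.r / n_a, self.s / n_b
        if p_a <= p_b:
            return "normal"                      # recent accuracy has not dropped
        # p_a > p_b guarantees 0 < p < 1, so the denominator is positive.
        p = (self.r + self.s) / self.n
        inv = 1 / n_a + 1 / n_b
        z = (p_a - p_b - 0.5 * inv) / (p * (1 - p) * inv) ** 0.5
        p_value = 1 - NormalDist().cdf(z)        # one-tailed test
        if p_value < self.alpha_d:
            self.reset()
            return "drift"
        if p_value < self.alpha_w:
            return "warning"
        return "normal"
```

Keeping running counts for $r$ and $s$ makes each update O(1), apart from the deque bookkeeping.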

All these methods start storing the new incoming examples when a warning level is triggered. Then, when the drift is confirmed, the model learnt so far (including the variables stored by the drift detection method, such as $p_{min}$, $s_{min}$, $p'_{max}$, $s'_{max}$ and $n$) is reset and a new model starts being learnt from the examples stored since the warning level was triggered. In EDDM, new values of $p'_{max}$ and $s'_{max}$ are considered after a drift detection only once 30 errors have occurred.

If the similarity between (p′ […] triggered, the stored examples are removed and the method returns to normality.
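The reset-on-drift behaviour shared by these methods can be sketched as a wrapper around any incremental learner. Everything here is hypothetical: the learner is assumed to offer `predict(x)` and `learn(x, y)`, and the detector is assumed to expose `update(error)` returning "normal", "warning" or "drift" and to clear its own statistics on drift. The toy `MajorityClass` learner exists only to make the sketch runnable.

```python
from collections import Counter


class MajorityClass:
    """Toy incremental learner used only to make the sketch runnable."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class ResetOnDrift:
    """Hypothetical wrapper: buffer examples on warning, rebuild the model on drift."""

    def __init__(self, make_model, detector):
        self.make_model = make_model   # factory returning a fresh learner
        self.model = make_model()
        self.detector = detector       # assumed: update(error) -> state string
        self.stored = []               # examples kept since the warning level

    def process(self, x, y):
        y_hat = self.model.predict(x)
        state = self.detector.update(y_hat != y)
        if state == "normal":
            self.stored.clear()        # false alarm: drop the buffer
        else:
            self.stored.append((x, y))
        if state == "drift":
            # Discard the old model and learn only from the stored examples.
            self.model = self.make_model()
            for xs, ys in self.stored:
                self.model.learn(xs, ys)
            self.stored.clear()
        else:
            self.model.learn(x, y)
        return y_hat, state
```

Whether the old model should keep learning during the warning period is a design choice the text leaves open; this sketch keeps updating it until the drift is confirmed.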

Successful points

These approaches can quickly react to drifts once they are detected.

Experiments using perceptrons, neural networks and decision trees as base learners on eight artificial databases presenting concept drift and one real-world electricity market database showed that DDM recovered from concept drift faster than base learners not using DDM, usually achieving significantly better generalization at the end of training.

Experiments on an artificial database presenting a very slow gradual drift showed that EDDM reacted faster than DDM to this type of drift. Similar behaviour occurred on a real-world database. Experiments on four artificial databases with gradual and abrupt concept drifts showed that EDDM behaved similarly to DDM for these types of drift. The experiments used three different base learners: C4.5 decision trees (Quinlan; 1993), instance-based learning (Aha et al.; 1991) and nearest neighbour with generalization (Martin; 1995).

Experiments using instance-based learning (Aha et al.; 1991) and naive Bayes as base learners on five artificial data sets presenting different types of drift showed that STEPD usually detected drifts faster than DDM and EDDM, obtaining lower error rates on sudden drifts and error rates comparable to EDDM's on gradual drifts. However, the claim of faster detection than EDDM is questionable, as only a single choice of parameters was tried, whereas EDDM has parameters that allow tuning the trade-off between speed of drift detection and false alarms.

Problems

As the model is reset whenever a concept drift is detected, these approaches cannot use any previously learnt information. So, right after a drift, they can suffer a larger increase in generalization error than approaches which do not handle drifts, especially when the drift causes few changes. Besides, no advantage can be taken of the previous model in the case of recurrent drifts. Resetting the model after a drift detection also makes these approaches very sensitive to false alarms.

Another problem is that, if the warning level is triggered and a drift really is occurring at that moment, the classifier's accuracy on the new concept can only start to improve after the drift level is triggered.

Moreover, when the warning level is triggered, the new incoming examples must be stored, which is not true online learning behaviour. However, modifying the implementation to start training a new classifier on the new examples as soon as a warning level is triggered would transform these approaches into true online learning approaches.
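The modification suggested above can be sketched by replacing the example buffer with a background model that is trained from the moment the warning level is triggered. This is a hypothetical variant, not part of DDM, EDDM or STEPD as published. It assumes a learner exposing `predict(x)` and `learn(x, y)` and a detector whose `update(error)` returns "normal", "warning" or "drift"; the class name is illustrative.

```python
class EarlyStartOnWarning:
    """Hypothetical variant: train a background model from the warning level
    instead of storing examples."""

    def __init__(self, make_model, detector):
        self.make_model = make_model   # factory returning a fresh learner
        self.model = make_model()
        self.background = None         # new model started at the warning level
        self.detector = detector       # assumed: update(error) -> state string

    def process(self, x, y):
        y_hat = self.model.predict(x)
        state = self.detector.update(y_hat != y)
        if state == "normal":
            self.background = None     # false alarm: discard the new model
        else:
            if self.background is None:
                self.background = self.make_model()
            self.background.learn(x, y)  # learnt once, never stored
        if state == "drift":
            # Promote the background model, which already saw the warning-period data.
            self.model, self.background = self.background, None
        else:
            self.model.learn(x, y)
        return y_hat, state
```

Each example is learnt once and then discarded, so memory no longer grows with the length of the warning period.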


DDM may also suffer from slow drift detection, as it detects drifts based on the current error rate ($p_i$) and its standard deviation ($s_i$), which are calculated over all examples since the last detected concept drift. So, when a drift happens, it may take a long time for $p_i + s_i$ to increase considerably and trigger a drift detection. A similar problem may affect EDDM, even though its parameter $\beta$ allows tuning the trade-off between speed of drift detection and false alarms. STEPD has a parameter $W$ which works similarly to a sliding window for detecting drifts. A too large $W$ delays drift detection, while a too small $W$ may cause unstable behaviour. Besides, STEPD needs at least $2W$ examples between consecutive drifts to detect them properly, so drifts are not monitored continuously.
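The slow-detection effect for DDM can be illustrated numerically. The hypothetical helper below replays a deterministic stream (one error every `pre_period` examples before an abrupt drift, one every `post_period` examples after it), applies the DDM drift rule with $s_i = \sqrt{p_i(1-p_i)/i}$ (an assumption taken from the original DDM paper) and a 30-example start-up period, and returns how many post-drift examples pass before the rule fires.

```python
import math


def ddm_detection_delay(pre_drift, pre_period=10, post_period=2, min_instances=30):
    """Hypothetical helper: post-drift examples until the DDM drift rule fires.

    Before the drift every pre_period-th example is an error; after it every
    post_period-th example is. Statistics accumulate from the first example.
    """
    if post_period >= pre_period:
        raise ValueError("the error rate must increase after the drift")
    n = errors = 0
    p_min = s_min = math.inf
    while True:
        n += 1
        if n <= pre_drift:
            error = n % pre_period == 0
        else:
            error = (n - pre_drift) % post_period == 0
        errors += error
        p = errors / n                       # p_i over all examples so far
        s = math.sqrt(p * (1 - p) / n)       # s_i
        if n < min_instances:
            continue
        if p + s <= p_min + s_min:
            p_min, s_min = p, s
        if p + s > p_min + 3 * s_min:
            return n - pre_drift             # a value <= 0 would be a false alarm


for length in (1000, 10000):
    print(length, ddm_detection_delay(length))
```

With the same post-drift error rate, the longer stable period yields the longer delay, because $p_i$ is averaged over every example since the last detection and $s_{min}$ shrinks as that history grows.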

In document Online ensemble learning in the presence of concept drift (Page 49-52)