6. Self-adaptation
6.3 A combination of continuous and threshold-based learning from observed traffic
6.3.1 Retraining the cluster-based model
We have discussed the clustering-based detection method used in [34], which distinguishes “suspicious” behavior in addition to the regular classifications of “normal” and “abnormal” behavior. “Suspicious” behavior is collected in a reservoir, with which the entire model is periodically updated. The modeling technique itself is described extensively in [35]. There it is claimed that the clustering technique on which the proposed method is based, namely DBSCAN, is less suitable for detecting concept drift, because it assumes a static environment and is not suited to real-time updating from a continuous data stream.
The modeling technique is based on Affinity Propagation, an algorithm that produces a set of clusters whose centers are real data items (called “exemplars”) and in which the summed distance between the data items in each cluster and that cluster’s exemplar is minimized. A parameter controls the number of clusters that is created. We omit the underlying mathematics of the method, but we briefly describe the way in which the model is updated.
The updating consists of two parts. The first is the continuous updating from real-time data. This applies when new data flows in and is flagged as “normal” (the data is within the ε-range of an exemplar). The data item then belongs to a certain cluster, represented by that exemplar. As the new data item comes in, the exemplar and its properties are updated as follows:
The fractional term is called the “forgetting factor”; it assigns more weight to recent data and reduces the weight of past data, so that the model stays up to date with a changing environment. The property n_i is especially important, because it affects the second part of the model-updating mechanism. In chapter 7 we already discussed that the retraining phase of the model is triggered as soon as one of the rebuilding criteria is met. The retraining is performed by applying the Affinity Propagation based method to the existing exemplars as well as all the items in the reservoir. Existing exemplars receive an additional preference, based on n_i, to be selected as an exemplar in the new model. The more data items fell within the ε-range of an exemplar during the period of real-time detection, the more important the exemplar becomes, which is expressed in a higher value of n_i.
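The real-time part of the update can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of [34] or [35]: the names `Exemplar`, `forgetting_factor` and `process_item` and the parameters `window` and `eps` are hypothetical, the decay `window / (window + elapsed)` is only one possible fractional forgetting factor, and every item outside the ε-range is simply placed in the reservoir, which ignores the distinction between “suspicious” and “abnormal” items.

```python
import math
from dataclasses import dataclass


@dataclass
class Exemplar:
    position: tuple     # stays fixed between retrainings
    n: float            # decayed count of items associated with this exemplar
    last_update: float  # time at which the last item was assigned


def forgetting_factor(window, elapsed):
    """Assumed fractional decay in (0, 1]; close to 1 when the last update is recent."""
    return window / (window + elapsed)


def process_item(item, now, exemplars, reservoir, eps, window):
    """Update the nearest exemplar's properties, or keep the item in the reservoir."""
    nearest = min(exemplars, key=lambda e: math.dist(e.position, item))
    if math.dist(nearest.position, item) <= eps:
        decay = forgetting_factor(window, now - nearest.last_update)
        # Older evidence fades, the new item counts fully.
        nearest.n = nearest.n * decay + 1.0
        nearest.last_update = now
        return "normal"  # the item itself is not stored
    # Simplification: anything outside the eps-range is kept for retraining.
    reservoir.append(item)
    return "reservoir"
```

Only `n` and `last_update` change when a “normal” item arrives; the exemplar's position is untouched and the item is not retained, which mirrors the behavior described above.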
Note that the model itself is in fact not updated in real time: the position of each exemplar remains the same and no cluster changes exemplars; only the properties of the cluster, which is represented by its exemplar, are updated. In addition, incoming real-time items are discarded as soon as the cluster’s properties have been updated. An alternative would be to update the (position of the) exemplar itself. In [35] it is also suggested that “normal” items could be stored and used in the retraining to increase accuracy. A significant drift would still be caught by the change detection and rebuilding method, which is triggered by one of the rebuilding criteria. A minor drift within ε could then be handled by retraining the current exemplar and adapting the corresponding model parameters. This means that an exemplar may change position, or may even be replaced entirely by another exemplar, before the “major” retraining that incorporates the items in the reservoir takes place. However, [35] shows that such real-time updating of exemplars, in addition to the periodic update that incorporates the reservoir, is not required to maintain a high detection accuracy. The figure below gives a visual representation of this principle.
Figure 13 - Classification of legitimate items after clustering
One option would be to retain incoming data items and to update the model frequently to cope with minor drifts, in addition to the full-fledged reservoir-based retraining. The new exemplar to which the current exemplar would drift in that case is displayed as the larger blue circle. Storing all incoming data items and retraining frequently to cope with those minor drifts is computationally less efficient. A more efficient method is to discard all incoming items after updating only some of the exemplar’s properties, such as the number of items associated with the exemplar, which later affects the reservoir-based retraining. The new exemplar that would be picked after the reservoir-based retraining in this example is displayed as the larger red circle. Although the blue-circle exemplar would be optimal with respect to adaptation to the live environment, [35] shows that the blue-circle and red-circle exemplars are, in general, not far apart. It is therefore desirable to use the computationally more efficient method, which discards incoming items immediately after they have been processed and which is the method used in [34].
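How the stored item count can influence the reservoir-based retraining may be illustrated with a small Python sketch. It deliberately does not implement Affinity Propagation: as a stand-in, it picks a weighted medoid for a single cluster, where the old exemplar carries a weight equal to its item count and every reservoir item carries weight 1. The function names `weighted_medoid` and `retrain_cluster` and the weighting scheme are assumptions made for illustration only.

```python
import math


def weighted_medoid(points, weights):
    """Index of the point with the smallest weighted sum of squared distances."""
    def cost(candidate):
        return sum(
            w * math.dist(points[candidate], p) ** 2
            for p, w in zip(points, weights)
        )
    return min(range(len(points)), key=cost)


def retrain_cluster(exemplar, n, reservoir_items):
    """Choose a new exemplar from the old one (weight n) and the reservoir (weight 1)."""
    points = [exemplar] + list(reservoir_items)
    weights = [n] + [1.0] * len(reservoir_items)
    return points[weighted_medoid(points, weights)]


reservoir = [(1.0, 0.0), (1.2, 0.0), (0.8, 0.0)]
weak = retrain_cluster((0.0, 0.0), 1.0, reservoir)     # exemplar with little support
strong = retrain_cluster((0.0, 0.0), 50.0, reservoir)  # exemplar with much support
```

In this sketch an exemplar that absorbed few items is replaced by a reservoir item, whereas an exemplar with a high count is retained, even though none of the discarded “normal” items is available at retraining time.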