6.3 Adaptive One-Class Ensemble-based Anomaly Detection
6.3.2 Progressive Update Component
Because insider threats are high-stakes, the deployed detection system requires continuous monitoring by its administrator(s). A false alarm arises when the system flags normal behaviour as anomalous. This corresponds to normal instances that closely resemble anomalous instances and thus appear as suspicious events. To address this issue and reduce the number of false alarms raised, we introduce the progressive update component. The role of this component is to progressively update the detection system with acquired false positive (FP) chunks. We define an FP chunk as follows:
Definition 6.3.1: FP Chunk. An FP chunk is a segment of capacity $c$ that accumulates test instances declared as FPs. Let $FPchunk_s = \{FP_1, FP_2, \ldots, FP_c\}$ represent an FP chunk acquired at sequence $s$, such that $D_t$ for each $FP_t \in FPchunk_s$; $1 \le t \le c$ is predicted as positive (i.e. an anomalous instance) while the actual class label for $FP_t$ is normal. Thus, each $FP_t$ is declared as an FP.
In Fig. 6.1, the blue arrows represent the testing instances declared as FPs, and each segment of blue arrows represents an FP chunk acquired sequentially. For example, the segment of blue arrows at sequence $s=1$ represents $FPchunk_1$. We define $c$ as the capacity (size) of FP chunks; each FP chunk $FPchunk_s$ can accumulate $c$ FPs. Fig. 6.1 illustrates FP chunks of capacity $c=3$; each FP chunk accumulates 3 FPs.
The progressive update method relies on continuous monitoring by the administrator to investigate whether the flagged alarms are true or false. While the detection system runs, whenever an alarm is flagged, the administrator is required
102 Chapter 6. Anomaly Detection for Insider Threat Detection
to investigate the suspected user and decide whether the alarm indicates a malicious insider threat or is a false alarm. In the latter case, the FP (false alarm) is accumulated into the FP chunk. This procedure continues until the current FP chunk is full (i.e. the capacity $c$ of the FP chunk is reached). The FP chunk is then fed to the progressive update component, which oversamples the FP instances in order to generate artificial samples. The oversampling component is described later in this section. The set of FP instances together with the set of artificial instances is then utilised to update the pre-generated models. Let $N$ represent the pre-generated set of genuine normal instances, and let $A_s$ represent the set of artificial instances generated for $FPchunk_s$. Hence, the pre-generated models are retrained on $R = N \cup FPchunk_s \cup A_s$. This process is repeated progressively for each accumulated FP chunk.
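The accumulate-then-retrain procedure described above can be sketched as follows. This is a minimal illustration rather than the system's actual implementation: the names `FPChunkBuffer` and `build_retraining_set` are hypothetical, and `oversample` is a placeholder for the oversampling component, passed in as a callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Instance = Sequence[float]  # one feature vector


@dataclass
class FPChunkBuffer:
    """Accumulates administrator-confirmed FPs until capacity c is reached."""
    capacity: int
    items: List[Instance] = field(default_factory=list)

    def add(self, fp: Instance) -> bool:
        """Store one confirmed FP; return True once the chunk is full."""
        self.items.append(fp)
        return len(self.items) >= self.capacity

    def flush(self) -> List[Instance]:
        """Hand over the current chunk and start a new, empty one."""
        chunk, self.items = self.items, []
        return chunk


def build_retraining_set(
    normal: Sequence[Instance],
    fp_chunk: Sequence[Instance],
    oversample: Callable[[Sequence[Instance], Sequence[Instance]], List[Instance]],
) -> List[Instance]:
    """Return R = N ∪ FPchunk_s ∪ A_s as a flat list of instances."""
    artificial = oversample(fp_chunk, normal)  # A_s (hypothetical callable)
    return list(normal) + list(fp_chunk) + list(artificial)
```

In such a sketch, the caller would invoke `add` each time the administrator confirms a false alarm, and retrain the models on the output of `build_retraining_set` whenever `add` returns `True`.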
The rationale behind the progressive update method is not simply to retrain the models with new instances, but also to enrich the models with recently detected FPs and with synthetically generated artificial samples that are close to (but not replicates of) those FPs. Therefore, the decision boundary in each model adapts to the falsely detected behaviour, which reduces the chance of FPs among the upcoming testing instances.
The idea of continuous monitoring to investigate flagged alarms and identify FPs has been applied in [61]. The authors defined a cumulative recall measure based on a daily budget. The term daily budget refers to the maximum number of alarms an analyst can investigate per day to judge whether they are TPs or FPs.
Oversampling Component. Updating the pre-generated models with only the FP instances in the FP chunks may not have a significant influence on the decision boundary. However, oversampling the FP instances generates artificial instances and enriches the updated model with the recently acquired FP instances as well as normal artificial samples. In this way, the recently acquired normal behaviour of a user or a community will be well represented, which in turn will trigger the base method to adapt the model’s decision boundary.
The oversampling component is in charge of generating artificial samples from the FP instances upon the acquisition of each $FPchunk_s$. The task of allocating
the number of samples to be generated for each FP instance in $FPchunk_s$ depends on the degree of outlierness of each FP instance. Thus, the number of samples to be generated varies among FP instances. We utilise a density-based method, namely the Local Outlier Factor (LOF) [113], to calculate the local outlier factor (score) $lof^N_t$ for each FP instance $FP_t$ in the acquired $FPchunk_s$ with respect to the $k_{LOF}$ nearest neighbours drawn only from the set of genuine normal class instances $N$. $lof^N_t$ is tuned for $k_{LOF} = \sqrt{1 + card(N)}$ (rule of thumb), where the radicand $1 + card(N)$ represents the number of instances utilised to calculate $lof^N_t$. The motivation for using LOF to calculate the anomaly score for FP instances, even though iForest can provide one, is that LOF can be used independently of the base method (ocsvm or iForest).
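As an illustration of scoring an FP instance against the normal set only, the sketch below computes the textbook LOF of a query point with respect to $N$ using only the Python standard library. Several details are assumptions rather than part of the method described here: the rounding of $k_{LOF}$ to the nearest integer, the Euclidean distance, the truncation of ties to exactly $k$ neighbours, and the small constant guarding against division by zero. The function names are hypothetical, and $N$ is assumed to contain at least two instances.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_lof(n_normal: int) -> int:
    """Rule of thumb k_LOF = sqrt(1 + card(N)); the rounding is an assumption."""
    return max(1, round(math.sqrt(1 + n_normal)))


def lof_against_normal(fp: Point, normal: List[Point], k: int) -> float:
    """LOF of `fp` where every neighbourhood is drawn from `normal` only."""
    # k nearest neighbours of each normal instance among the other normal instances
    neigh: List[List[Tuple[float, int]]] = []
    for i, p in enumerate(normal):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(normal) if j != i
        )
        neigh.append(dists[:k])
    # k-distance: distance from each normal instance to its k-th neighbour
    k_dist = [d[-1][0] for d in neigh]

    def lrd(dists: List[Tuple[float, int]]) -> float:
        """Local reachability density from (distance, neighbour index) pairs."""
        reach = [max(k_dist[j], d) for d, j in dists]
        return 1.0 / (sum(reach) / len(reach) + 1e-10)  # guard is an assumption

    lrd_normal = [lrd(d) for d in neigh]
    fp_neigh = sorted((math.dist(fp, q), j) for j, q in enumerate(normal))[:k]
    lrd_fp = lrd(fp_neigh)
    # LOF: mean ratio of the neighbours' densities to the query's own density
    return sum(lrd_normal[j] for _, j in fp_neigh) / (len(fp_neigh) * lrd_fp)
```

A score well above 1 indicates that the FP instance lies in a sparser region than its normal neighbours, while a score near or below 1 indicates a density comparable to theirs.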
Let $perc_{lof^N_t}$ represent the percentile rank of $lof^N_t$ for each $FP_t$ relative to the set of normal instances $N$. In Fig. 6.3, we represent the genuine normal instances $N$ by blue filled circles. We define two types of FP instances, where each type is oversampled based on its degree of outlierness $lof^N_t$:
• Outlier instance: represented by a solid-line red empty circle. Each FP instance $FP_t$ is considered an outlier instance if it has a high $lof^N_t$ value, i.e. it is located far away from the genuine normal instances $N$. For example, if $perc_{lof^N_t} = 90$, this means that $lof^N_t$ for $FP_t$ is greater than that of 90% of the normal instances $N$.
• Safe instance: represented by a solid-line blue empty circle. Each FP instance $FP_t$ is considered a safe instance if it has a low $lof^N_t$ value, i.e. it is located at or near the borderline of the genuine normal instances $N$.
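The percentile rank used in the two cases above can be computed as in the following sketch. The percentile-rank function follows the reading given in the example (the share of normal instances whose LOF is below that of the FP instance). The cut-off separating outlier from safe instances is not stated in this section, so the `threshold` parameter and its default value are purely hypothetical, as are the function names.

```python
from typing import Sequence


def percentile_rank(lof_fp: float, lof_normal: Sequence[float]) -> float:
    """Percentage of normal instances whose LOF is strictly below lof_fp."""
    below = sum(1 for score in lof_normal if score < lof_fp)
    return 100.0 * below / len(lof_normal)


def fp_type(perc: float, threshold: float = 90.0) -> str:
    """Label an FP as 'outlier' or 'safe'; `threshold` is a hypothetical cut-off."""
    return "outlier" if perc >= threshold else "safe"
```

Here `lof_normal` is assumed to hold the LOF scores of the genuine normal instances themselves, computed with the same $k_{LOF}$.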
In Fig. 6.3, the red dashed circles represent the artificial samples generated for FP outlier instances, while the blue dashed circles represent the artificial samples for FP safe instances. The idea is that safe instances are given more chance to generate artificial samples around them, while outlier instances are given less chance. This gives the update component more conservative control over the adaptation of the decision boundary. Oversampling safe instances more heavily than outlier instances safeguards the system from fast movement of the decision boundary due to outliers. Otherwise, more False Negatives (FN) (i.e. anomalous instances predicted as normal) will appear in the upcoming FP chunks.
FIGURE 6.3: Artificial oversampling of FP instances over two features. The blue filled circles represent the genuine normal instances $N$, the solid-line red empty circle represents an FP outlier instance, and the solid-line blue empty circle represents an FP safe instance. The red dashed circles represent the artificial samples generated for FP outlier instances, and the blue dashed circles represent the artificial samples for FP safe instances. The solid-line green boundary represents the decision boundary of the pre-generated model, and the green dashed boundary represents the adapted decision boundary of the updated model.
Let $perc.over$ represent the percentage of artificial samples to be generated, and let $numS = (perc.over/100) \times c$ represent the number of artificial samples to be generated. The process of generating artificial samples associated with each instance $FP_t \in FPchunk_s$ is executed feature-wise over a number of iterations.
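As a worked example of this formula, take the chunk capacity $c = 3$ used in Fig. 6.1 and assume, purely for illustration, an oversampling percentage of $perc.over = 200$ (this value is not taken from the experiments):

```latex
\[
  numS = \frac{perc.over}{100} \times c = \frac{200}{100} \times 3 = 6
\]
```

Under these assumed values, six artificial samples would be generated for the chunk.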
Recall that $x^{t'}_f$ represents the value of the $f$th feature of $X_{t'} \in N$ at a session slot $t'$. Likewise, let $p^t_f$ represent the value of the $f$th feature of $FP_t$ at a session slot $t$, given that $FP_t = \{p^t_1, p^t_2, \ldots, p^t_m\}$. For each feature $f$; $1 \le f \le m$, we find the nearest neighbour $x^{t'}_f$ of $X_{t'} \in N$ for $p^t_f$ of $FP_t$. In other words, we search the set of normal instances $N$ at the level of feature $f$ only, and we find the closest feature value $x^{t'}_f$ for $p^t_f$. At the level of feature $f$, there exist two directions: positive (+ve) and negative (−ve). Thus, $p^t_f$ may have (1) only +ve neighbours from the set $N$, (2) only −ve neighbours, or (3) both +ve neighbours and −ve neighbours. We define the positive (+ve) nearest neighbour and the negative (−ve) nearest neighbour as follows:
Definition 6.3.2: Positive nearest neighbour. A +ve nearest neighbour is the closest $x^{t'}_f$ of $X_{t'} \in N$ for $p^t_f$ of $FP_t$, such that $x^{t'}_f$ is located in the +ve direction to $p^t_f$.
Definition 6.3.3: Negative nearest neighbour. A −ve nearest neighbour is the closest $x^{t'}_f$ of $X_{t'} \in N$ for $p^t_f$ of $FP_t$, such that $x^{t'}_f$ is located in the −ve direction to $p^t_f$.
Fig. 6.4 illustrates generating artificial samples of FP instances over one feature (i.e. one dimension). Let the blue filled circle represent $x^{t'}_f$, the blue empty circle represent $p^t_f$, and the blue dashed circle represent an artificial feature value $a^t_f$ associated with $p^t_f$. The process of generating an artificial feature value $a^t_f$ is executed as follows:
• If $p^t_f$ has only a +ve nearest neighbour $x^{t'}_f$ at the level of feature $f$ (Fig. 6.4.1), then $a^t_f$ is calculated in the +ve direction along the segment joining $p^t_f$ and $x^{t'}_f$ according to Eq. 6.1, such that $dir = +1$.
• If $p^t_f$ has only a −ve nearest neighbour $x^{t'}_f$ at the level of feature $f$ (Fig. 6.4.2), then $a^t_f$ is calculated in the −ve direction along the segment joining $p^t_f$ and $x^{t'}_f$ according to Eq. 6.1, such that $dir = -1$.
• If $p^t_f$ has both a +ve nearest neighbour and a −ve nearest neighbour at the level of feature $f$ (Fig. 6.4.3), then $a^t_f$ can be calculated in the +ve or −ve direction. A random direction $dir$ is selected at each iteration, and $a^t_f$ is calculated in the selected direction $dir$ according to Eq. 6.1.
$$a^t_f = p^t_f + dir \times rand\left(0 : \lambda \times dist(p^t_f, x^{t'}_f)\right) \tag{6.1}$$
where $dist(p^t_f, x^{t'}_f)$ represents the distance between an FP feature value $p^t_f$ and the nearest neighbour $x^{t'}_f$.
Note that $\lambda$, tuned to $\lambda = 0.8$, denotes a parameter that controls the distance permitted to generate artificial features along the segment joining $p^t_f$ and $x^{t'}_f$. The value of $\lambda$ is in the range $]0.5, 1[$. The rationale behind this is to generate the artificial samples a bit closer to the FP instances and not the normal instances, so that the adapted decision boundary is influenced by these samples.
Consequently, an artificial sample $A_t = \{a^t_1, a^t_2, \ldots, a^t_m\}$ associated with an $FP_t$ instance is generated at each iteration. The steps described are repeated for a number of iterations until $numS$ artificial samples are generated.
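The feature-wise generation of Eq. 6.1 can be sketched as below. This is an illustrative reading, not a definitive implementation: $rand(0 : \cdot)$ is assumed to be a uniform draw, normal feature values equal to $p^t_f$ are assumed to count as neither +ve nor −ve neighbours, and a feature with no neighbour in either direction is left unchanged. All function names are hypothetical, and the number of samples per FP instance is simply passed in as `num_samples`.

```python
import random
from typing import List, Optional, Sequence, Tuple

Instance = Sequence[float]


def nearest_neighbours(
    p: float, column: Sequence[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return the (+ve, -ve) nearest neighbours of p on one feature, or None."""
    pos = min((x for x in column if x > p), default=None)
    neg = max((x for x in column if x < p), default=None)
    return pos, neg


def artificial_value(p: float, column: Sequence[float], lam: float = 0.8,
                     rng=random) -> float:
    """Draw a = p + dir * rand(0 : lam * dist(p, x)) as in Eq. 6.1."""
    pos, neg = nearest_neighbours(p, column)
    if pos is None and neg is None:
        return p  # assumption: no distinct neighbour, keep the value as is
    if pos is not None and neg is not None:
        direction = rng.choice((+1, -1))  # both sides exist: pick at random
    else:
        direction = +1 if pos is not None else -1
    neighbour = pos if direction == +1 else neg
    # uniform draw is an assumption about rand(0 : .)
    return p + direction * rng.uniform(0.0, lam * abs(p - neighbour))


def artificial_sample(fp: Instance, normal: Sequence[Instance],
                      lam: float = 0.8, rng=random) -> List[float]:
    """Build one artificial sample A_t, one feature at a time."""
    return [
        artificial_value(p, [x[f] for x in normal], lam, rng)
        for f, p in enumerate(fp)
    ]


def oversample_fp(fp: Instance, normal: Sequence[Instance], num_samples: int,
                  lam: float = 0.8, rng=random) -> List[List[float]]:
    """Repeat the feature-wise step until num_samples samples exist."""
    return [artificial_sample(fp, normal, lam, rng) for _ in range(num_samples)]
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible. Because $\lambda < 1$, each generated value stays strictly on the FP side of its nearest normal neighbour.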
FIGURE 6.4: Artificial oversampling of FP instances over one feature. The blue filled circle represents a genuine normal feature value $x^{t'}_f$, the blue empty circle represents an FP feature value $p^t_f$, and the blue dashed circle represents an artificial feature value $a^t_f$ associated with $p^t_f$. Fig. 6.4.1 illustrates the case where $p^t_f$ has only a +ve nearest neighbour $x^{t'}_f$. Fig. 6.4.2 illustrates the case where $p^t_f$ has only a −ve nearest neighbour $x^{t'}_f$. Fig. 6.4.3 illustrates the case where $p^t_f$ has nearest neighbours in both directions.