Progressive Update Component - Adaptive One-Class Ensemble-based Anomaly Detection

6.3 Adaptive One-Class Ensemble-based Anomaly Detection

6.3.2 Progressive Update Component

The severity of the insider threat problem requires the system's administrator(s) to continuously monitor the implemented detection system. A false alarm is raised when normal behaviour is detected as anomalous by the system. This corresponds to normal instances that have a high similarity with anomalous instances and thus appear as suspicious events. To address this issue and to reduce the number of false alarms raised, we introduce the progressive update component. The role of this component is to progressively update the detection system with acquired False Positive (FP) chunks. We define an FP chunk as follows:

Definition 6.3.1: FP Chunk. An FP chunk is a segment of capacity $c$ that accumulates test instances declared as FPs. Let $FPchunk_s = \{FP_1, FP_2, ..., FP_c\}$ represent an FP chunk acquired at sequence $s$, such that each $FP_t \in FPchunk_s$; $1 \le t \le c$ is predicted as positive (i.e. an anomalous instance) while the actual class label for $FP_t$ is normal. Thus, each $FP_t$ is declared as an FP.

In Fig. 6.1, the blue arrows represent the testing instances declared as FPs, and each segment of blue arrows represents an FP chunk acquired sequentially. For example, the segment of blue arrows at sequence $s = 1$ represents $FPchunk_1$. We define $c$ as the capacity (size) of FP chunks; each FP chunk $FPchunk_s$ can accumulate $c$ FPs. Fig. 6.1 illustrates FP chunks of capacity $c = 3$; each FP chunk accumulates 3 FPs.

102 Chapter 6. Anomaly Detection for Insider Threat Detection

The progressive update method relies on continuous monitoring by the administrator to investigate whether the flagged alarms are true or false. While the detection system is running, whenever an alarm is flagged, the administrator is required to investigate the suspected user and decide whether it is a malicious insider threat or a false alarm. In the latter case, the FP (false alarm) is accumulated into the FP chunk. This procedure continues until the current FP chunk is full (i.e. the capacity $c$ of the FP chunk is reached). The FP chunk is then fed to the progressive update component, which oversamples the FP instances in order to generate artificial samples. The oversampling component is described later in this section. The set of FP instances together with the set of artificial instances are then utilised to update the pre-generated models. Let $N$ represent the pre-generated set of genuine normal instances, and let $A_s$ represent the set of artificial instances generated for $FPchunk_s$. Hence, the pre-generated models are retrained on $R = N \cup FPchunk_s \cup A_s$. This process is repeated progressively for each accumulated FP chunk.
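The accumulate-then-retrain procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `ProgressiveUpdater`, the method `report_alarm`, and the `oversample` and `retrain` callbacks are hypothetical. The sketch also assumes that the normal set stays fixed between updates; the text does not state whether earlier FP chunks are carried forward.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]


class ProgressiveUpdater:
    """Collects analyst-confirmed FPs and triggers a retrain per full chunk."""

    def __init__(
        self,
        normal: List[Instance],
        capacity: int,
        oversample: Callable[[List[Instance]], List[Instance]],
        retrain: Callable[[List[Instance]], None],
    ) -> None:
        self.normal = list(normal)    # N: genuine normal instances
        self.capacity = capacity      # c: size of one FP chunk
        self.oversample = oversample  # hypothetical: FP chunk -> artificial set
        self.retrain = retrain        # hypothetical: refits the models on R
        self.chunk: List[Instance] = []

    def report_alarm(self, instance: Instance, is_false_alarm: bool) -> bool:
        """Record the administrator's verdict; return True if an update ran."""
        if not is_false_alarm:
            return False              # true alarms are not accumulated
        self.chunk.append(instance)
        if len(self.chunk) < self.capacity:
            return False              # chunk not yet full
        artificial = list(self.oversample(self.chunk))
        # R = N ∪ FPchunk_s ∪ A_s
        self.retrain(self.normal + self.chunk + artificial)
        self.chunk = []               # start accumulating the next chunk
        return True
```

Keeping the oversampling and retraining steps as callbacks leaves the sketch independent of the base method (ocsvm or iForest).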

The rationale behind the progressive update method is not simply to retrain the models with new instances, but also to enrich the models with recently detected FPs and synthetically generated artificial samples close to (but not replicates of) FPs. Therefore, the decision boundary in each model adapts to the falsely detected behaviour and reduces the chances of FPs in the upcoming testing instances.

The idea of continuous monitoring to investigate flagged alarms and identify FPs has been applied in [61]. The authors defined a cumulative recall measure based on a daily budget. The term daily budget refers to the maximum number of alarms an analyst can investigate per day to judge whether they are TPs or FPs.

Oversampling Component. Updating the pre-generated models with only the FP instances in the FP chunks may not have a significant influence on the decision boundary. However, oversampling the FP instances generates artificial instances and enriches the updated model with the recently acquired FP instances as well as normal artificial samples. In this way, the recently acquired normal behaviour of a user or a community will be well represented, which in turn will trigger the base method to adapt the model's decision boundary.

The oversampling component is in charge of generating artificial samples from the FP instances upon the acquisition of each $FPchunk_s$. The number of samples to be generated for each FP instance in $FPchunk_s$ depends on the degree of outlierness of that FP instance. Thus, the number of samples to be generated varies among FP instances. We utilise a density-based method, namely Local Outlier Factor (LOF) [113], to calculate the local outlier factor (score) $lof^t_N$ for each FP instance $FP_t$ in the acquired $FPchunk_s$ with respect to the $k_{LOF}$ nearest neighbours from only the set of genuine normal class instances $N$. $lof^t_N$ is tuned for $k_{LOF} = \sqrt{1 + card(N)}$ (thumb-rule), where the radicand $1 + card(N)$ represents the number of instances utilised to calculate $lof^t_N$. The motivation for integrating LOF to calculate the anomaly score for FP instances, even though iForest can provide one, is that LOF can be used independently of the base method (ocsvm or iForest).
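To make the scoring step concrete, the sketch below computes a LOF score for a single FP instance using neighbours drawn only from the normal set, with the neighbourhood size set by the square-root thumb-rule. It is an illustrative pure-Python version under stated assumptions: Euclidean distance, rounding of the square root to the nearest integer, and the function names `k_lof_thumb_rule` and `lof_score` are all choices made here and are not specified in the text. A library implementation would normally be preferred in practice.

```python
import math
from typing import List, Sequence, Tuple

Instance = Sequence[float]


def k_lof_thumb_rule(n_normal: int) -> int:
    """k_LOF = sqrt(1 + card(N)); rounding to an integer is an assumption."""
    return max(1, round(math.sqrt(1 + n_normal)))


def _knn(point: Instance, data: List[Instance], k: int,
         skip: int = -1) -> List[Tuple[float, int]]:
    """Return the k nearest (distance, index) pairs in data, ignoring `skip`."""
    dists = sorted(
        (math.dist(point, q), j) for j, q in enumerate(data) if j != skip
    )
    return dists[:k]


def lof_score(fp: Instance, normal: List[Instance], k: int) -> float:
    """LOF of `fp` with respect to its k nearest neighbours in `normal`."""
    if not 1 <= k < len(normal):
        raise ValueError("k must satisfy 1 <= k < len(normal)")
    # Neighbourhood and k-distance of every normal instance (within N only).
    neigh = [_knn(x, normal, k, skip=i) for i, x in enumerate(normal)]
    kdist = [nb[-1][0] for nb in neigh]

    def lrd(nbrs: List[Tuple[float, int]]) -> float:
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(kdist[j], d) for d, j in nbrs]
        return len(reach) / max(sum(reach), 1e-12)  # guard against duplicates

    lrd_normal = [lrd(nb) for nb in neigh]
    fp_nbrs = _knn(fp, normal, k)
    lrd_fp = lrd(fp_nbrs)
    # LOF: mean ratio of the neighbours' densities to the FP's own density.
    return sum(lrd_normal[j] for _, j in fp_nbrs) / (len(fp_nbrs) * lrd_fp)
```

A score close to 1 indicates that the FP instance lies in a region as dense as its normal neighbours, while larger scores indicate a higher degree of outlierness.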

Let $perc_{lof^t_N}$ represent the percentile rank for each $FP_t$ compared to the set of normal instances $N$. In Fig. 6.3, we represent the genuine normal instances $N$ by blue filled circles. We define two types of FP instances, where each type is oversampled based on its degree of outlierness $lof^t_N$:

• Outlier instance: represented by a solid-line red empty circle. Each FP instance $FP_t$ is considered an outlier instance if it has a high $lof^t_N$ value, i.e. it is located far away from the genuine normal instances $N$. For example, if $perc_{lof^t_N} = 90$, this means that the $lof^t_N$ for $FP_t$ is greater than that of 90% of the normal instances $N$.

• Safe instance: represented by a solid-line blue empty circle. Each FP instance $FP_t$ is considered a safe instance if it has a low $lof^t_N$ value, i.e. it is located at or near the borderline of the genuine normal instances $N$.
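The percentile rank used to separate the two types can be illustrated with a short sketch. The function names and, in particular, the cut-off `outlier_percentile=90.0` are hypothetical: the text gives 90 only as an example of a percentile rank and does not state the threshold at which an FP instance is treated as an outlier.

```python
from typing import Sequence


def percentile_rank(fp_lof: float, normal_lofs: Sequence[float]) -> float:
    """Percentage of normal instances whose LOF score is below the FP's score."""
    below = sum(1 for score in normal_lofs if score < fp_lof)
    return 100.0 * below / len(normal_lofs)


def fp_type(fp_lof: float, normal_lofs: Sequence[float],
            outlier_percentile: float = 90.0) -> str:
    """Label an FP instance 'outlier' or 'safe' (the cut-off is an assumption)."""
    rank = percentile_rank(fp_lof, normal_lofs)
    return "outlier" if rank >= outlier_percentile else "safe"
```

With ten normal scores of which nine lie below the FP's score, the rank is 90, matching the example in the first bullet above.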

In Fig. 6.3, the red dashed circles represent the artificial samples generated for FP outlier instances, while the blue dashed circles represent the artificial samples for FP safe instances. The idea is that safe instances are given more chance to generate artificial samples around them, while the outlier instances are given less chance. This gives the update component more conservative control on the adaptation of the decision boundary. Oversampling more safe instances than outlier instances safeguards the system from fast movement of the decision boundary due to outliers. Otherwise, more False Negatives (FN) (i.e. anomalous instances predicted as normal) will be in the upcoming FP chunks.


FIGURE 6.3: Artificial oversampling of FP instances over two features. The blue filled circles represent the genuine normal instances $N$, the solid-line red empty circle represents an FP outlier instance, and the solid-line blue empty circle represents an FP safe instance. The red dashed circles represent the artificial samples generated for FP outlier instances, and the blue dashed circles represent the artificial samples for FP safe instances. The solid-line green boundary represents the decision boundary of the pre-generated model, and the green dashed boundary represents the adapted decision boundary of the updated model.

Let $perc.over$ represent the percentage of artificial samples to be generated, and let $numS = (perc.over/100) \times c$ represent the number of artificial samples to be generated. The process of generating artificial samples associated with each $FP_t \in FPchunk_s$ instance is executed feature-wise over a number of iterations.
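As a worked illustration, take the chunk capacity $c = 3$ shown in Fig. 6.1 and a hypothetical oversampling percentage of 200 (a value chosen here only for the arithmetic, not taken from the text):

```latex
\[
numS = \frac{perc.over}{100} \times c = \frac{200}{100} \times 3 = 6
\]
```

Under these assumed values, six artificial samples would be generated.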

Recall that $x^{t'}_f$ represents the value of the $f$th feature of $X^{t'} \in N$ at a session slot $t'$. Likewise, let $p^t_f$ represent the value of the $f$th feature of $FP_t$ at a session slot $t$, given that $FP_t = \{p^t_1, p^t_2, ..., p^t_m\}$. For each feature $f$; $1 \le f \le m$, we find the nearest neighbour $x^{t'}_f$ of $X^{t'} \in N$ for $p^t_f$ of $FP_t$. In other words, we search the set of normal instances $N$ at the level of feature $f$ only, and we find the closest feature value $x^{t'}_f$ for $p^t_f$. At the level of feature $f$, there exist two directions: positive (+ve) and negative (−ve). Thus, $p^t_f$ may have (1) only +ve neighbours from the set $N$, (2) only −ve neighbours, or (3) both +ve neighbours and −ve neighbours. We define the positive (+ve) nearest neighbour and the negative (−ve) nearest neighbour as follows:

Definition 6.3.2: Positive nearest neighbour. A +ve nearest neighbour is the closest $x^{t'}_f$ of $X^{t'} \in N$ for $p^t_f$ of $FP_t$, such that $x^{t'}_f$ is located in the +ve direction to $p^t_f$.

Definition 6.3.3: Negative nearest neighbour. A −ve nearest neighbour is the closest $x^{t'}_f$ of $X^{t'} \in N$ for $p^t_f$ of $FP_t$, such that $x^{t'}_f$ is located in the −ve direction to $p^t_f$.
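The two definitions can be illustrated for a single feature as follows. The function name `directional_neighbours` is hypothetical, and the sketch assumes that a normal value exactly equal to the FP value belongs to neither direction, a case the definitions do not cover.

```python
from typing import Iterable, Optional, Tuple


def directional_neighbours(
    p: float, normal_values: Iterable[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return the (+ve, -ve) nearest neighbours of p along one feature.

    Either entry is None when no normal value lies in that direction.
    """
    values = list(normal_values)
    # Closest normal value strictly above p (+ve direction).
    pos = min((x for x in values if x > p), default=None)
    # Closest normal value strictly below p (-ve direction).
    neg = max((x for x in values if x < p), default=None)
    return pos, neg
```

For example, `directional_neighbours(2.0, [0.5, 1.5, 2.5, 4.0])` returns `(2.5, 1.5)`, which corresponds to the case with neighbours in both directions.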


Fig. 6.4 illustrates generating artificial samples of FP instances over one feature (i.e. one dimension). Let the blue filled circle represent $x^{t'}_f$, the blue empty circle represent $p^t_f$, and the blue dashed circle represent an artificial feature value $a^t_f$ associated with $p^t_f$. The process of generating an artificial feature value $a^t_f$ is executed as follows:

• If $p^t_f$ has only a +ve nearest neighbour $x^{t'}_f$ at the level of feature $f$ (Fig. 6.4.1), then $a^t_f$ is calculated in the +ve direction along the segment joining $p^t_f$ and $x^{t'}_f$ according to Eq. 6.1, such that $dir = +1$.

• If $p^t_f$ has only a −ve nearest neighbour $x^{t'}_f$ at the level of feature $f$ (Fig. 6.4.2), then $a^t_f$ is calculated in the −ve direction along the segment joining $p^t_f$ and $x^{t'}_f$ according to Eq. 6.1, such that $dir = -1$.

• If $p^t_f$ has both a +ve nearest neighbour and a −ve nearest neighbour at the level of feature $f$ (Fig. 6.4.3), then $a^t_f$ can be calculated in the +ve or −ve direction. A random direction $dir$ is selected at each iteration, and $a^t_f$ is calculated in the selected direction $dir$ according to Eq. 6.1.

$$a^t_f = p^t_f + dir \times rand(0 : \lambda \times dist(p^t_f, x^{t'}_f)) \qquad (6.1)$$

where $dist(p^t_f, x^{t'}_f)$ represents the distance between an FP feature value $p^t_f$ and the nearest neighbour $x^{t'}_f$.

Note that $\lambda$, tuned for $\lambda = 0.8$, denotes a parameter that controls the distance permitted to generate artificial features along the segment joining $p^t_f$ and $x^{t'}_f$. The value of $\lambda$ is in the range $]0.5, 1[$. The rationale behind this is to generate the artificial samples a bit closer to the FP instances and not the normal instances, so that the adapted decision boundary is influenced by these samples.
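As a worked illustration of Eq. 6.1, assume a hypothetical FP feature value of 2.0 whose only nearest neighbour is a normal value of 3.0 in the +ve direction, with the distance taken as the absolute difference along the feature and $\lambda = 0.8$ as tuned above. The numbers are illustrative only:

```latex
\begin{align*}
dist(p^t_f, x^{t'}_f) &= |3.0 - 2.0| = 1.0 \\
\lambda \times dist(p^t_f, x^{t'}_f) &= 0.8 \times 1.0 = 0.8 \\
a^t_f &= 2.0 + (+1) \times rand(0 : 0.8) \in [2.0, 2.8]
\end{align*}
```

Under these values, the artificial feature value stays on the segment between the FP value and the normal neighbour and never reaches the neighbour at 3.0.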

Consequently, an artificial sample $A^t = \{a^t_1, a^t_2, ..., a^t_m\}$ associated with an $FP_t$ instance is generated at each iteration. The steps described are repeated for a number of iterations until $numS$ artificial samples are generated.
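The feature-wise generation can be sketched end to end as below. This is a minimal illustration rather than the authors' implementation. The names `artificial_feature` and `oversample_fp` are hypothetical. The sketch assumes that the one-dimensional distance is the absolute difference, that `rand(0 : ...)` is a uniform draw, that the random direction is chosen with equal probability, and that a feature with no neighbour in either direction keeps the FP value unchanged. How many samples each FP instance receives is decided outside this function.

```python
import random
from typing import List, Sequence


def artificial_feature(p: float, normal_values: Sequence[float],
                       lam: float = 0.8, rng=random) -> float:
    """Generate one artificial feature value following Eq. 6.1."""
    pos = min((x for x in normal_values if x > p), default=None)  # +ve neighbour
    neg = max((x for x in normal_values if x < p), default=None)  # -ve neighbour
    candidates = []  # (dir, dist) pairs for the available directions
    if pos is not None:
        candidates.append((+1, pos - p))
    if neg is not None:
        candidates.append((-1, p - neg))
    if not candidates:
        return p  # assumption: no neighbour in either direction
    direction, dist = rng.choice(candidates)  # random only when both exist
    return p + direction * rng.uniform(0.0, lam * dist)


def oversample_fp(fp: Sequence[float], normal: Sequence[Sequence[float]],
                  n_samples: int, lam: float = 0.8,
                  seed=None) -> List[List[float]]:
    """Generate n_samples artificial samples around one FP instance."""
    rng = random.Random(seed)
    columns = list(zip(*normal))  # normal values grouped per feature
    return [
        [artificial_feature(p, col, lam, rng) for p, col in zip(fp, columns)]
        for _ in range(n_samples)
    ]
```

Because each feature is drawn independently, every artificial sample lies within a box around the FP instance bounded by a fraction $\lambda$ of the distance to the nearest normal value in each direction.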


FIGURE 6.4: Artificial oversampling of FP instances over one feature. The blue filled circle represents a genuine normal feature value $x^{t'}_f$, the blue empty circle represents an FP feature value $p^t_f$, and the blue dashed circle represents an artificial feature value $a^t_f$ associated with $p^t_f$. Fig. 6.4.1 illustrates the case where $p^t_f$ has only a +ve nearest neighbour $x^{t'}_f$. Fig. 6.4.2 illustrates the case where $p^t_f$ has only a −ve nearest neighbour $x^{t'}_f$. Fig. 6.4.3 illustrates the case where $p^t_f$ has two nearest neighbours in both directions.

In document Opportunistic machine learning methods for effective insider threat detection (Page 122-127)