3.4 Numerical Results
3.4.2 Anomaly Detection Accuracy
Next, the typical weekly profiles are used as ground-truth signals to evaluate the anomaly detection accuracy. This makes it possible to work with labeled time series and, in turn, to precisely quantify the classification performance of the Bayesian network in terms of: i) Precision, ii) True Positive Rate (TPR) and iii) F measure.
In Fig. 3.4, we plot a segment from a typical weekly profile for the sensor (target) node with ID 1000003 and cause nodes 1000003, 1000090, 1000197 (the speed traces in this plot are synchronized in time). According to the DAG construction method of Section 3.3.2, the target node 1000003 is also among the cause nodes in the DAG, where it contains the past readings $\{x^c_{t-1}, \dots, x^c_{t-W}\}$. The effect node instead contains $y_t$, i.e., the speed measured at the target road at time $t$.
Let us now consider Fig. 3.4 to illustrate how anomalies are injected and detected. For each sensor node, we inject random artificial anomalies of length $D$ time slots in random non-overlapping positions, as explained above. We then compute a probabilistic score from the marginal CDF of the effect node (computed by taking the noisy profile as the input sequence). Whenever the magnitude of the score reaches the (sensor-specific) threshold $\zeta$, the corresponding time slot is flagged as containing an anomaly (see the circular markers in the top subplot of Fig. 3.4). The bottom subplot shows the score for the trace in the top subplot, which is defined as:
$$
\mathrm{SCORE}_t =
\begin{cases}
\log_{10}(C_t) - \log_{10}(0.5), & C_t < 0.5 \\
-\log_{10}(1 - C_t) + \log_{10}(0.5), & C_t \geq 0.5
\end{cases}
\tag{3.5}
$$
where $C_t = \mathrm{CDF}(y_t | x_t)$ is the Cumulative Distribution Function computed for $y_t$ and conditioned on the past readings ($x_t$ in the DAG). According to Eq. (3.5), the further the current speed $y_t$ is from the median of the PDF $P(y_t | x_t)$, the greater the magnitude of the score. A score of large magnitude means that the speed value $y_t$ is atypical with respect to what would be predicted by the (marginalized) PDF. In this example, we use $\sigma_n = 5$ km/h, which is the maximum noise level considered in our experiments. As for the threshold $\zeta$, for Fig. 3.4 we have set $\log_{10}(\text{no. of anomalies} / \text{no. of samples}) = -3$ (application requirement), where “no. of samples” is the total number of data points in the validation set for each sensor node; $\zeta$ is found numerically so that this requirement is met. For a given threshold $\zeta$, anomalies are detected (circular markers in the top subplot of Fig. 3.4) by assessing whether $|\mathrm{SCORE}_t| \geq \zeta$. Note that in Fig. 3.4 we consider a single target road and, as such, the given application requirement is used to compute a single threshold $\zeta$ for that road. For an entire network, however, the same requirement is employed to derive one threshold per DAG (i.e., one for each target road in the physical topology).
Figure 3.4: Anomaly detection example for sensor node with ID 1000003 and cause nodes 1000003, 1000090, 1000197.
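
The score of Eq. (3.5) and the numerical search for the threshold can be illustrated with a minimal Python sketch. It assumes that the conditional CDF values are already available and lie strictly between 0 and 1, and it reads “no. of anomalies” as the number of flagged time slots. The helper names (`score`, `threshold_for_requirement`, `flag_anomalies`) and the CDF values are hypothetical; the quantile-style search is only one possible way to find the threshold, not necessarily the procedure used for the reported results.

```python
import math

def score(c):
    """Eq. (3.5) for a conditional CDF value c = CDF(y_t | x_t), assuming 0 < c < 1."""
    if c < 0.5:
        # y_t lies below the median: the score is negative.
        return math.log10(c) - math.log10(0.5)
    # y_t lies at or above the median: the score is non-negative.
    return -math.log10(1.0 - c) + math.log10(0.5)

def threshold_for_requirement(scores, requirement):
    """Hypothetical helper: choose zeta so that roughly a fraction
    10**requirement of the samples satisfies |score| >= zeta
    (ties in the score magnitudes may flag a few more slots)."""
    magnitudes = sorted((abs(s) for s in scores), reverse=True)
    n_flagged = round(len(magnitudes) * 10.0 ** requirement)
    n_flagged = min(len(magnitudes), max(1, n_flagged))
    return magnitudes[n_flagged - 1]

def flag_anomalies(scores, zeta):
    """Flag time slot t whenever |SCORE_t| >= zeta."""
    return [abs(s) >= zeta for s in scores]

# Made-up CDF values: 18 typical readings followed by 2 extreme ones.
cdf_values = [0.5] * 18 + [0.001, 0.9999]
scores = [score(c) for c in cdf_values]
zeta = threshold_for_requirement(scores, requirement=-1)  # flag about 10% of the slots
flags = flag_anomalies(scores, zeta)
```

In this toy series, only the two extreme readings end up flagged, one in each tail of the conditional distribution.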
Referring to $\alpha$ as the total number of (artificial) anomalies that were injected, the number of time slots that may be affected is $S = \alpha(D + W)$. This is because anomalies are non-overlapping, each anomaly lasts $D$ time slots, and its effect can propagate for $W$ further time slots due to the memory in the DAG, i.e., $D + W$ is the support of a single anomaly. Given this, we define $\mathcal{S}$ (with $|\mathcal{S}| = S$) as the set of time slots that could contain an anomalous reading. From this definition, it follows that the maximum number of True Positives (TP) is $|\mathcal{S}| = S$. We also track the number of False Negatives (FN), False Positives (FP), and True Negatives (TN), and with $\sum X$ we denote the total number of time slots that are flagged as being of type $X$, with $X \in \{\mathrm{TP}, \mathrm{FP}, \mathrm{TN}, \mathrm{FN}\}$. For instance, in the example of Fig. 3.4, we have $\alpha = 2$, $D = 5$, $W = 5$, $S = 20$, $\sum \mathrm{TP} = 13$, $\sum \mathrm{FN} = 7$, $\sum \mathrm{FP} = 0$, and $\sum \mathrm{TN} = T' - S$, where $T'$ is the number of time slots in the graphs. For the following results, we used $\alpha = 70$, which corresponds to an average of 10 artificial anomalies added per day.
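
The slot-level bookkeeping described above can be sketched as follows. The function names, the anomaly start positions, the flag pattern, and the segment length of 100 slots are all made up for illustration; only the totals ($\alpha = 2$, $D = 5$, $W = 5$, 13 true positives, 7 false negatives, no false positives) follow the example of Fig. 3.4. The sketch also assumes that the supports of different anomalies do not overlap, so that the support set has exactly $\alpha(D + W)$ elements.

```python
def support_slots(starts, D, W):
    """Slots possibly affected: each anomaly covers D slots plus W slots of DAG memory."""
    slots = set()
    for start in starts:
        slots.update(range(start, start + D + W))
    return slots

def confusion_counts(flags, support):
    """Count TP, FN, FP and TN slot by slot against the known support set."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, flagged in enumerate(flags):
        if t in support:
            counts["TP" if flagged else "FN"] += 1
        else:
            counts["FP" if flagged else "TN"] += 1
    return counts

# Hypothetical segment of 100 slots with two anomalies starting at slots 10 and 50.
n_slots, D, W = 100, 5, 5
starts = [10, 50]
support = support_slots(starts, D, W)   # 2 * (5 + 5) = 20 slots

# Made-up detector output: 7 alarms in the first support, 6 in the second.
flagged_slots = set(range(10, 17)) | set(range(50, 56))
flags = [t in flagged_slots for t in range(n_slots)]

counts = confusion_counts(flags, support)
```

With these inputs, the true negatives equal the segment length minus $S$, matching $\sum \mathrm{TN} = T' - S$ in the text.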
Fig. 3.5 shows the classification performance of the proposed score-based anomaly detector in terms of: i) Precision, ii) TPR and iii) F measure ($F$). These metrics are defined as follows: $\mathrm{Precision} = \sum \mathrm{TP} / \sum(\mathrm{TP} + \mathrm{FP})$, $\mathrm{TPR} = \sum \mathrm{TP} / \sum(\mathrm{TP} + \mathrm{FN})$, and $F$, which is the harmonic mean of Precision and TPR, i.e., $F = 2\sum \mathrm{TP} / (2\sum \mathrm{TP} + \sum(\mathrm{FP} + \mathrm{FN}))$. In Fig. 3.5, these metrics are plotted as a function of the application requirement (i.e., $\log_{10}(\text{no. of anomalies} / \text{no. of samples})$), which is reported on the abscissa. As an example, a requirement equal to zero means that all the time samples are flagged as containing an anomaly and, as such, the true positive rate is $\mathrm{TPR} = 1$. However, in this case, the Precision is heavily impacted by the number of false positives (FP), which is at least $T - S$, where $S$ is the maximum number of time slots affected by real anomalies (true positives) and $T$ is the number of time slots in the time series. As expected, the anomaly detection accuracy increases with an increasing noise level, approaching $F = 0.8$ for $\sigma_n = 5$ km/h (when the requirement on the x-axis is $-2$). Also, a higher Precision entails a smaller TPR and vice versa.
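
As a worked illustration of these definitions, the sketch below evaluates the three metrics for the counts quoted earlier for Fig. 3.4 (13 true positives, 7 false negatives, no false positives) and for the limiting case in which every slot is flagged. The function names are hypothetical, and the series length of 100 slots used in the limiting case is an assumption made only to obtain concrete numbers.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 when nothing is flagged."""
    return tp / (tp + fp) if tp + fp else 0.0

def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN); returns 0.0 when there are no positives."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(tp, fp, fn):
    """F = 2 TP / (2 TP + FP + FN), the harmonic mean of Precision and TPR."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Counts from the example of Fig. 3.4.
p = precision(13, 0)              # 13 / 13
r = true_positive_rate(13, 7)     # 13 / 20
f = f_measure(13, 0, 7)           # 26 / 33

# Requirement equal to zero: all T slots are flagged (T = 100 is assumed, S = 20).
T, S = 100, 20
p_all = precision(S, T - S)           # S / T
r_all = true_positive_rate(S, 0)      # every affected slot is flagged
```

The limiting case shows the trade-off stated in the text: the TPR reaches 1 while the Precision drops to $S / T$.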
In Fig. 3.6, we show the Receiver Operating Characteristic (ROC) space, obtained by plotting the TPR (i.e., Sensitivity) against the False Positive Rate, $\mathrm{FPR} = \sum \mathrm{FP} / \sum(\mathrm{FP} + \mathrm{TN})$ (i.e., $1 - \text{Specificity}$), with the application requirement varied as a free parameter. This space shows the discrimination capability of the score-based anomaly classifier as the requirement varies. Ideally, we would like to get $\mathrm{TPR} \to 1$ and $\mathrm{FPR} \to 0$, which means that desirable working points lie in the upper-left corner of the ROC space. As expected, the anomaly detection accuracy increases with an increasing noise level (increasing $\sigma_n$).
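
One way to trace such working points is to sweep the application requirement, derive the corresponding threshold, and record the resulting (FPR, TPR) pair. The sketch below does this for a made-up series of ten scores; the function name `roc_points`, the scores, and the labels are hypothetical, and the threshold is chosen with a simple rank-based rule that is assumed here rather than taken from the text.

```python
def roc_points(scores, in_support, requirements):
    """Return one (FPR, TPR) pair per application requirement.

    scores       -- per-slot scores as in Eq. (3.5)
    in_support   -- True for slots inside the support of an injected anomaly
    requirements -- values of log10(flagged slots / total slots)
    """
    magnitudes = sorted((abs(s) for s in scores), reverse=True)
    points = []
    for requirement in requirements:
        # Number of slots to flag for this requirement (at least one, at most all).
        n_flagged = round(len(magnitudes) * 10.0 ** requirement)
        n_flagged = min(len(magnitudes), max(1, n_flagged))
        zeta = magnitudes[n_flagged - 1]
        tp = fp = fn = tn = 0
        for s, positive in zip(scores, in_support):
            flagged = abs(s) >= zeta
            if positive:
                tp += flagged
                fn += not flagged
            else:
                fp += flagged
                tn += not flagged
        fpr = fp / (fp + tn) if fp + tn else 0.0
        tpr = tp / (tp + fn) if tp + fn else 0.0
        points.append((fpr, tpr))
    return points

# Made-up scores: three slots belong to an anomaly support and have large magnitudes.
scores = [3.0, -2.5, 0.1, 0.2, -0.3, 2.0, 0.05, 0.4, -0.15, 0.25]
in_support = [True, True, False, False, False, True, False, False, False, False]
points = roc_points(scores, in_support, requirements=[0.0, -0.5, -1.0])
```

A requirement of zero flags every slot and lands in the upper-right corner of the ROC space, while stricter requirements move the working point towards lower FPR.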
Figure 3.6: The ROC space.
Moreover, the performance of the proposed Bayesian framework can be further improved through the following TP aggregation criterion (“TP aggr.”). As discussed above, each anomaly has an associated support of $D + W$ time slots. Hence, all $D + W$ data points of an anomaly instance are counted as true positives if at least one alarm is raised (i.e., the score exceeds the threshold at some time instant) within the real (and known) support of the injected anomaly. With aggregation, the ROC curves move towards the upper-left corner of the space, which is a marked improvement. For the example in Fig. 3.4, this strategy leads to $\sum \mathrm{TP} = 20$, $\sum \mathrm{FN} = 0$, $\sum \mathrm{FP} = 0$, and the total number of true negatives is $\sum \mathrm{TN} = T' - S$, where $T'$ is the number of time slots in the plot. The rationale behind this approach is that, if there is at least one alarm within the support of an anomaly instance, in practice, this may be sufficient to declare the entire anomaly instance as detected.
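
The aggregation rule can be sketched as below. The function name `aggregated_counts`, the anomaly start positions, the alarm positions, and the segment length of 100 slots are hypothetical; the sketch assumes non-overlapping supports and truncates a support that would run past the end of the series.

```python
def aggregated_counts(flags, starts, D, W):
    """Slot counts under TP aggregation: a support of D + W slots counts entirely
    as TP if at least one alarm falls inside it, and entirely as FN otherwise."""
    n_slots = len(flags)
    support = set()
    tp = fn = 0
    for start in starts:
        slots = range(start, min(start + D + W, n_slots))
        support.update(slots)
        if any(flags[t] for t in slots):
            tp += len(slots)   # one alarm is enough to detect the whole instance
        else:
            fn += len(slots)   # the whole instance is missed
    # Alarms outside every support remain false positives.
    fp = sum(1 for t, flagged in enumerate(flags) if flagged and t not in support)
    tn = n_slots - len(support) - fp
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Hypothetical segment of 100 slots, anomalies starting at slots 10 and 50 (D = W = 5).
n_slots, D, W = 100, 5, 5
starts = [10, 50]

# Made-up alarms: 13 flagged slots, all inside the two supports.
flagged_slots = set(range(10, 17)) | set(range(50, 56))
flags = [t in flagged_slots for t in range(n_slots)]
both_detected = aggregated_counts(flags, starts, D, W)

# Same segment, but with alarms only inside the first support.
flags_first_only = [t in set(range(10, 17)) for t in range(n_slots)]
one_missed = aggregated_counts(flags_first_only, starts, D, W)
```

In the first case the 13 slot-level alarms are promoted to 20 true positives with no false negatives, as in the example of Fig. 3.4; in the second case the undetected instance contributes its full support to the false negatives.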