3 Methodology
3.2 System architecture and operation
3.3.3 Phase three: performance at detecting attacks
The aim of phase three was to determine the systems ability to actually detect clone tags in an audit log. This would be the overall indicator in determining the feasibility of this intrusion detection system. The question that needed to be answered was: how different does attack behaviour need to be for it to be detectable?
An attack is represented in an audit log as an audit record representing when a clone tag has been used. This phase involved inserting a number of attack audit records throughout the UTAS audit log for reader four, and seeing if Deckard could identify these attacks from the normal records.
It can be seen that there would be 22 tags involved in this phase. In actual fact, this number was reduced to 11 tags because some tags did not have enough data to adequately represent the problem of tag cloning. For example, tags with less than four observations were discarded because inserting attacks at three points in the audit log (as this phase was about to do) would not work out fairly.
The testing for phase three was undertaken by first inserting attacks into the data. The data was then fed into Deckard, which went about updating the LFP; classifying the audit records as either normal or as anomalous. After the profile had been updated, the system checked to see if the data that was used in an update contained an attack record.
A confusion matrix recorded the outcome of each profile’s updates. It contained information about the system’s ability to make correct classifications and incorrect classification. True positives and true negatives are correct classifications when the system correctly classifies an observation as containing an attack or not containing an attack. The false positives and false negatives are classifications when the system misclassifies an observation as either containing an attack, when in fact it does not, or when it fails to detect the presence of an attack.
Using the values in each confusion matrix, it is be possible to calculate some quantitative measures of the systems performance at detecting clone tags. More information on the theory of confusion matrices can be obtained from Witten & Frank (2000, p. 138). As stated in section 2.4, the hallmarks of a good intrusion detection system are a high rate of detection (high true positive rate), and a low false alarm rate (false positive rate).
Figure 8 - Detecting attacks, shows the algorithm that the processing engine of the system used to determine if a classification was true or false. The system increments one of four counters each time a profile is updated to record the outcome of an update: the true positive counter is incremented when an attack is correctly detected; the false positive counter is incremented when an attack incorrectly detected; the false negative counter is incremented when an attack is missed; and the true negative counter is incremented when no attack is detected in an update that did not contain attacks.
If (value > threshold) { ++alerts; if (alertsInPeriod > 0){ ++TruePostives; } else{ ++FalsePositives; } else{ if (alertsInPeriod > 0){ ++FalseNegatives; } else{ ++TrueNegatives; } }
Figure 8 - Detecting attacks
The values in the resulting confusion matrices could then be used to produce the detection rates. They allow the system to evaluate the classification in relation to the other values in the confusion matrix and are a better indicator then simply examining the individual values. The formulas of how they would be calculated were obtained from Kohavi & Provost (1998, p. 272):
Figure 9 - True positive rate
• The proportion of positive cases that were correctly identified. True Positive Rate =
False Negatives + True Positives True Positives
Figure 10 - False positive rate
• The proportion of negative cases that were incorrectly classified as positive.
Figure 11 - Precision
• The proportion of the predicted positive cases that was correct.
3.3.3.1 Synthesizing attack data in the audit log
As this phase was detecting the systems ability to detect attacks, it was necessary to insert some attacks into the data set. Attack audit records were inserted into the audit log in two of the ways that could represent the scenario in section 2.4. Firstly, an attacker could use a clone tag a number of times in a time period; this was labelled the attack intensity. For example, an attacker uses a clone tag three times in one morning on a particular date. Secondly, an attacker could use a clone tag several times over a sustained time period; this was labelled the attacked frequency. For example, an attacker uses a clone tag every two weeks. Figure 12 - Synthesizing attacks in audit log, illustrates how attacks were inserted.
Figure 12 - Synthesizing attacks in audit log
False Positive Rate =
True Negatives + False Positives False Positives
Precision =
False Positives + True Positives True Positives
It was hoped that elevating the frequency and intensity of the number of attacks would provide the answer to the question of how different attack behaviour needs to be from normal behaviour in order for it to be detected by Deckard.
Attack data for each tag was inserted as follows:
1. The audit records for each tag were extracted from the audit log and counted. 2. Three positions in the data were identified at locations 25%, 50%, 75% of the number of records in the audit log. For example, if the audit contained 100 records, then location 25% would be at record 25 of the log.
3. The audit record at this point that was present was copied and then reinserted into the same position n number of times based on the attack intensity setting. 4. This was repeated at locations 50% and 75%.
5. Then the process was restarted, at the next intensity level.
For example, the program initialized with a frequency of 25% and an intensity of one attack. This means that the audit log for each tag had a one attack audit record inserted into it at 25% of the data. The system would then update all its profiles in an attempt to detect this single attack and the results recorded into a confusion matrix. Then, the intensity was increased to two and process repeated up to an intensity of three.
At this point, intensity was reset to one attack, but the frequency of attacks was increased to two. This means attacks were now being inserted at two locations in the audit log - 25% and 50%. The process of increasing the attack intensity was repeated, until finally, attacks had been inserted at the three frequency locations (25%, 50%, and 75%).
3.3.3.2 Limitations on the method
There are several known limitations on the method that may have affected the results in the confusion matrices.
The attacks that were used in the audit log may not realistically interact with the background activity (Mell et al. 2003, p. 15) because they were inserted synthetically. This may mean that the system is given an unfair advantage at
distinguishing attack records from legitimate audit records. The only way that this could be evaluated would be to compare the performance of the system using an audit log that did have “real” attacks in it. However this was not possible because such an audit log was not available.
The statistical model that has been used to model behaviour in the system is influenced by the attack audit records. The system does not remove those audit records that were classified as anomalous, therefore they go onto be used in the future classification of audit records. It was decided that discarding these audit records would not be suitable for this implementation because of the underlying true error rate that was determined in section 3.3. That is, discarding data may result in legitimate data or normal behaviour being removed from the audit log. On the other hand, it may be argued that in keeping those records that did not produce an alert may in fact be anomalous, as there are times when the system will fail to detect attacks.
Another limitation on the system may be that in some way it is predisposed to trigger alerts coincidently where attack records have been inserted in the audit log. It was demonstrated in section 3.3, that there is a degree of error already present in the classification of tag behaviour. Thus it may be possible that attacks have been inserted into those erroneous periods where alerts are already occurring. It would be possible to verify this by recording the locations in section 3.3 that triggered alerts and comparing them to the locations that triggered alerts in this phase. However in the real world this is hardly feasible. Therefore it was decided that the best way to overcome this would be to simply insert attacks at more than one location in the audit log to reduce this possibility.
Section 3.3.2.1 briefly discussed the potential problem of pre-existing attacks in the original audit log. Mell et al. (2003, pp. 15-6) believes that such attacks would pose a problem in establishing the false positive rate of the system. For example, if the system were to classify a period as containing an attack, when the synthetic data was not inserted into that period, it may in fact be detecting a pre-existing attack that existed in the original audit directly from the beginning. The pre-existence of attacks in the audit logs is impossible to know or verify.