
4.5 The involvement of the security analyst

5.1.1 Representation of alternative hypotheses

From the defender's point of view, a multi-step attack manifests itself as a sequence of traces, as stated in Section 3.3.1 (page 36). The hypotheses formulated by the analyst should therefore be identifiable with a possible set of traces left by the attacker. The concept of alternative hypotheses has already been addressed in the literature on multi-step attack detection. However, these hypotheses are not treated as proposals made by the human analyst in the search for detection models, but as statements produced by the detection methods themselves. The hypotheses are then alternative choices that the system creates from the information it has, either to present them to the analyst [Mathew 2010] or as an intermediate step of the detection process [Skopik 2014]. They refer to one of the following four elements:

• Alternative goals or root causes of an attack [Geib 2001, Julisch 2001].

• [...] are tentatively proposed by the system but they need to be confirmed.

• The presence of missing steps [Cuppens 2002b, Ning 2004b, Zhai 2006], if the expected sequence of events is inconsistent with the observations.

• The whole attack scenario [Mathew 2010], as it could be a false positive.

As far as we know, the only publication in the selected corpus on multi-step attack detection that addresses alternative hypotheses proposed by the analyst is the one presenting the SLEUTH system [Hossain 2017]. However, hypothesis decision is not fully integrated in SLEUTH: its authors only propose to reclassify the data according to a different hypothesis and to “re-run the analysis”.

In the literature on attack graphs, which is outside the scope of the systematic survey presented in Chapter 4, hypotheses proposed by the analyst are more common. These hypotheses correspond to the different paths that a multi-step attack can take in a network according to structural data. Much work has been done on attack graphs [Sheyner 2002, Shostack 2014, Singhal 2017], which constitute the basis of the structural-based detection methods (Section 4.3.3). However, hypotheses made on an attack graph consider only the structural information of the network, not the sequence of traces representing the attack. To address our research question, what interests us is the proposal of hypotheses from a set of traces taken as evidence, not from the topology and characteristics of the defended network.
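To illustrate what a structural hypothesis looks like, the following minimal sketch enumerates the paths an attack could take through a small attack graph. The graph, its node names and the function `attack_paths` are hypothetical and chosen only for illustration; they are not taken from the cited works, whose attack graph models are considerably richer.

```python
from typing import Dict, List, Optional

# Hypothetical toy attack graph: each node lists the nodes an attacker
# could reach from it in one step, according to structural data only.
ATTACK_GRAPH: Dict[str, List[str]] = {
    "internet": ["web_server", "vpn_gateway"],
    "web_server": ["app_server"],
    "vpn_gateway": ["app_server", "workstation"],
    "app_server": ["database"],
    "workstation": ["database"],
    "database": [],
}


def attack_paths(graph: Dict[str, List[str]], source: str, target: str,
                 path: Optional[List[str]] = None) -> List[List[str]]:
    """Return every simple path from source to target (depth-first search)."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found: List[List[str]] = []
    for nxt in graph.get(source, []):
        if nxt not in path:  # avoid revisiting a node on the same path
            found.extend(attack_paths(graph, nxt, target, path))
    return found


# Each path is one structural hypothesis about how the attack may unfold.
hypotheses = attack_paths(ATTACK_GRAPH, "internet", "database")
for h in hypotheses:
    print(" -> ".join(h))
```

Note that nothing in this sketch looks at observed traces: the hypotheses follow from the topology alone, which is precisely the limitation discussed above.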

Models for the representation of hypotheses can be found in other domains of Cybersecurity. For example, Yen et al. [Yen 2010] propose a hypothesis reasoning framework for cyber situation awareness. Their objective is to translate the hypotheses created in the “mental world” of the analyst to a specific platform allowing team collaboration. The framework is based on recognition-primed decision (RPD), a model of rapid decision making. The authors do not specify any formal detail of the model representing the hypotheses, but they give a very interesting list of requirements that such a model should meet:

1. Representativeness. The model should be able to represent all the important elements of the hypotheses.

2. Simplicity and intuitiveness. The analyst should be capable of building an instance of the model in a simple way and of intuitively interpreting instances created by other analysts or by herself in the past.

3. Expressed in mathematical form. Hypotheses in the model should be expressed in a mathematics-based language and the model itself should be a mathematical object. The purpose of this requirement is to reduce the work of the analyst through efficient analysis with mathematics-based tools.

4. Computation friendliness. The model and the hypotheses represented in it should be manageable by a computer system, which could assist the human analyst.

Figure 5.1: Trees in the AOH model: (a) E-Tree, (b) H-Tree. Image adapted from [Zhong 2013].

Yen et al. argue that traditional graphs cannot meet these requirements and that hypothesis models have to be represented as hypergraphs. They say so because of the broad scope of the problem they tackle: representing any human hypothesis related to cyber security awareness. We will see in Chapter 5 that if we restrict the hypotheses to sequences of events representing multi-step attacks, models based on traditional graphs are perfectly feasible.
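As a rough illustration of why an ordinary directed graph can be enough once hypotheses are restricted to sequences of events, the sketch below encodes a hypothesis as a graph whose nodes expect a given event type and checks it against an ordered list of traces. The class `HypothesisGraph`, its fields and the event names are assumptions made for this example; this is not the model presented in Chapter 5.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class HypothesisGraph:
    """Hypothetical graph-based hypothesis: nodes are attack steps,
    edges give the allowed order between steps."""
    event_type: Dict[str, str]        # step name -> event type it expects
    successors: Dict[str, List[str]]  # step name -> possible next steps
    start: str
    final: Set[str]

    def is_supported_by(self, traces: List[str]) -> bool:
        """True if the traces contain, in order, the events of at least
        one path from the start step to a final step."""
        def search(step: str, position: int) -> bool:
            for i in range(position, len(traces)):
                if traces[i] != self.event_type[step]:
                    continue
                if step in self.final:
                    return True
                if any(search(nxt, i + 1)
                       for nxt in self.successors.get(step, [])):
                    return True
            return False

        return search(self.start, 0)


# Two alternative branches (exploit or brute force) in a single hypothesis.
hypothesis = HypothesisGraph(
    event_type={"scan": "port_scan", "exploit": "exploit_alert",
                "brute": "failed_logins", "exfil": "large_upload"},
    successors={"scan": ["exploit", "brute"],
                "exploit": ["exfil"], "brute": ["exfil"]},
    start="scan",
    final={"exfil"},
)

observed = ["port_scan", "dns_query", "failed_logins", "large_upload"]
print(hypothesis.is_supported_by(observed))
```

The graph is a plain mathematical object (requirement 3) and the matching is mechanical (requirement 4); whether such a structure is simple and representative enough for the analyst is a separate question that this sketch does not settle.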

The AOH (Action, Observation and Hypothesis) model is another reasoning system, intended to capture the know-how of expert analysts in order to guide inexperienced ones [Zhong 2013, Zhong 2014, Zhong 2015]. The experience-based reasoning process is expressed in a model called E-Tree, where actions are related to observations within the frame of a working hypothesis. A representation of an E-Tree is shown in Figure 5.1a. Each action-observation relationship is called an Experience Unit (EU). The disjunctive hypotheses in the E-Tree can be extracted into another model, the H-Tree (Figure 5.1b), so that the analyst can focus on them alone.
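The relationship between the two trees can be sketched as follows, assuming a simplified reading of the description above: a hypothesis frames several Experience Units, and an observation may raise new alternative hypotheses. The class and function names, as well as the hypothesis texts, are invented for illustration and do not reproduce the formal definitions given by Zhong et al.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperienceUnit:
    """One action-observation pair (EU)."""
    action: str
    observation: str
    # Alternative hypotheses raised by this observation (assumed structure).
    hypotheses: List["Hypothesis"] = field(default_factory=list)


@dataclass
class Hypothesis:
    """A working hypothesis framing a set of Experience Units."""
    text: str
    units: List[ExperienceUnit] = field(default_factory=list)


def extract_h_tree(hypothesis: Hypothesis) -> dict:
    """Keep only the hypotheses of an E-Tree, dropping actions and observations."""
    return {
        "hypothesis": hypothesis.text,
        "children": [extract_h_tree(child)
                     for unit in hypothesis.units
                     for child in unit.hypotheses],
    }


# Illustrative E-Tree; the hypothesis texts are made up for this example.
e_tree = Hypothesis(
    text="Something abnormal is happening on the network",
    units=[
        ExperienceUnit(
            action="browse a log file",
            observation="a lot of alerts representing an IRC connection",
            hypotheses=[
                Hypothesis("A host is controlled through IRC"),
                Hypothesis("The IRC traffic is legitimate"),
            ],
        )
    ],
)

h_tree = extract_h_tree(e_tree)
print([child["hypothesis"] for child in h_tree["children"]])
```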

This model has evolved since it was first published. The idea of extracting the H-Tree has been abandoned and the E-Tree has become the AOH model itself. It has been used for other purposes: the identification of multimedia data that is of interest for cybersecurity analysis [Alnusair 2017], the triage of security data arriving at a Security Operations Center (SOC) [Zhong 2017] and the sharing of the processes followed by the analyst [Thomas 2018]. All these proposals, which have Chen Zhong as a common author, are based on capturing the activities performed by the analyst in the form of cognitive traces. Both actions (e.g. browse a log file) and observations (e.g. a lot of alerts representing an IRC connection) generate traces that can be automatically incorporated into the AOH model. Actions become traces thanks to a defined catalogue. Observations, for their part, become traces through the selection of data performed by the analyst. However, the model does not provide a format to represent the hypotheses, which are simply described in plain text. One of the examples proposed by Zhong et al. [Zhong 2015] is represented in Figure 5.2.

Figure 5.2: Example of the AOH model for capturing the cognitive process of the analyst in the investigation of an abnormal quantity of IRC alerts. Image adapted from one of the publications by Zhong et al. [Zhong 2015].

We do not want to close this section without addressing issue trees, also called logic trees. They were created to simplify the resolution of issues through logical reasoning [Wojick 1975] and are applied in business decision making. There are not many scientific publications analyzing them, but they are very similar in structure to our AASG model presented in Chapter 5. In an issue tree, each node represents a brief statement or question about an issue. One moves down the tree by responding to the parent node, which leads to other nodes with new statements or questions. Paths in the tree represent lines of reasoning. An example of an issue tree representing a discussion about the appearance of the stirrup in history as the origin of Feudalism [Wojick 1975] is shown in Figure 5.3.

Figure 5.3: Example of an issue tree representing the discussion about the creation of the stirrup marking the beginning of Feudalism [Wojick 1975].
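The structure described above can be sketched in a few lines. In this minimal example, assuming a plain tree of statements, each root-to-leaf path is read as one line of reasoning. The class `IssueNode`, the function `lines_of_reasoning` and the statements in the tree are hypothetical; they do not reproduce the content of the tree in [Wojick 1975].

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IssueNode:
    """A brief statement or question, with the responses it receives."""
    statement: str
    responses: List["IssueNode"] = field(default_factory=list)


def lines_of_reasoning(node: IssueNode,
                       prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return every root-to-leaf path, each one being a line of reasoning."""
    path = prefix + (node.statement,)
    if not node.responses:
        return [path]
    return [line
            for child in node.responses
            for line in lines_of_reasoning(child, path)]


# Made-up statements, for illustration only.
tree = IssueNode(
    "Did the stirrup give rise to Feudalism?",
    [
        IssueNode("Yes, it changed mounted warfare", [
            IssueNode("Is there evidence for the timing?"),
            IssueNode("Were other factors also required?"),
        ]),
        IssueNode("No, the social changes came first"),
    ],
)

lines = lines_of_reasoning(tree)
for line in lines:
    print(" / ".join(line))
```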
