The Development of CORAS Modelling - Security and Safety Standards


6.1.2 The Development of CORAS Modelling

We then turn to the CORAS modelling language itself, which is used to model risks. Several modelling restrictions in CORAS are relevant to this domain and require consideration. These primarily relate to vulnerabilities and treatments.

A vulnerability is defined broadly as “A weakness, flaw or deficiency that opens for, or may be exploited by, a threat to cause harm to or reduce the value of an as-

FIGURE 6.2: Risks Method Completeness, Reprinted From [113, p. 119]

though this could be the machine learning system, a storage system for the training data and networks.

First, one initial limitation is that each vulnerability and risk is related to a single asset, as in many existing risk methodologies.[74, p. 25] In this domain, the training data (or even sets of training data) and the machine learning system are collectively vulnerable, and it makes more sense to conceive of, and model, data flows. However, a newer feature of CORAS already allows dependency boxes to be drawn to compose a larger system from subsystems. This allows assets to be grouped together and hence allows a single vulnerability to be mapped to multiple assets.[22, p. 8]
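The grouping idea can be illustrated with a minimal data-structure sketch. This is not CORAS tooling or its API: the class names `Asset`, `AssetGroup` and `Vulnerability`, and the example vulnerability text, are hypothetical and serve only to show how one vulnerability could resolve to several grouped assets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A single asset, e.g. the training data or the ML system."""
    name: str


@dataclass
class AssetGroup:
    """Hypothetical stand-in for assets grouped into one larger system."""
    name: str
    members: list[Asset] = field(default_factory=list)


@dataclass
class Vulnerability:
    """A vulnerability attached either to one asset or to a group of assets."""
    description: str
    target: Asset | AssetGroup

    def affected_assets(self) -> list[Asset]:
        # A group target expands to all of its members; a single asset
        # target yields just that asset.
        if isinstance(self.target, AssetGroup):
            return list(self.target.members)
        return [self.target]


training_data = Asset("training data")
ml_system = Asset("machine learning system")
pipeline = AssetGroup("ML pipeline", [training_data, ml_system])

# Illustrative vulnerability only; the wording is an assumption.
weak_validation = Vulnerability("insufficient validation of incoming data", pipeline)
affected = [asset.name for asset in weak_validation.affected_assets()]
print(affected)
```

In this sketch the vulnerability is stated once against the group, and the affected assets are derived from the grouping rather than being repeated per asset.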

Second, the knowledge that the attacker may have can only be modelled as a vulnerability. For example, as a necessary condition of an attack, the attacker may need to have knowledge of the model itself. (See Subsection 2.3.1.) In CORAS, the only way to model this knowledge is to state that there is a vulnerability relating to the control that protects that knowledge about the asset. Indeed, the association between a vulnerability and the knowledge is not absolute, given that it is not necessary to reproduce the training data exactly in order to divulge secrets about the learned hypothesis.[70, p. 166] While this may be sufficient in many cases to draw still comprehensible diagrams, the inability to model knowledge per se comes at the expense of clarity and understanding of the risks, if only because the models become more complicated than necessary.

Third, the data itself is of great importance in the machine learning life-cycle. The collection, pre-processing and storing of data means that multiple systems and networks can be attacked to obtain, alter or insert data. Tying the data to the storage assets and network assets, and showing the vulnerabilities of each, can quickly lead to the explosion of a CORAS diagram. This explosion happens because the vulnerabilities leading to data leakage are likely to be different for each asset: the vulnerabilities of a storage system differ from those of a network and of an internet connection, and so cannot easily be abstracted. On the other hand, a more detailed analysis needs to consider the individual vulnerabilities of the different components, and hence such scaling issues cannot, ultimately, always be avoided.

Fourth, formally, it is not possible to model the relationships between vulnerabilities even when these are clear, at least in the CORAS software tool. That is, the tool cannot distinguish an attack that is possible with vulnerability "A" OR vulnerability "B" from an attack that is only possible with vulnerability "A" AND vulnerability "B". However, this distinction can be achieved in practice by adding "AND" and "OR" labels to the lines, by drawing the vulnerabilities in sequence for an "AND" relationship, by attaching them to the same relationship line (between attacker and threat scenario) at the same point for an "OR" relationship, or by other similar graphical techniques. There appears to have been a similar consideration of the need to include AND/OR labels in the modelling language in [31]. (See Fig. B.1.) Lund points out that, with CORAS modelling, a balance needs to be found between the expressiveness of the model and its comprehension/usability, and the use of the "AND" and "OR" labels may well be low in practice.[65] Indeed, the lack of a full understanding of the vulnerabilities in machine learning strongly suggests that these relationships between vulnerabilities may not always be clear enough to allow for such labelling in all cases. Also, such relationships can still be captured in a tabular format. This labelling may well be useful to enhance expressiveness, although it was not found to be necessary during the use case in the evaluation case study.
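As an illustration of capturing such relationships in a tabular format, the following minimal sketch records one row per threat scenario with an explicit "AND" or "OR" operator. It is not part of CORAS or its software tool: the class `AttackPrecondition`, the scenario names and the vulnerability identifiers "A" and "B" are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AttackPrecondition:
    """One table row: which vulnerabilities a threat scenario requires."""
    threat_scenario: str
    operator: str  # "AND": all vulnerabilities needed; "OR": any one suffices
    vulnerabilities: tuple[str, ...]

    def is_enabled(self, present: set[str]) -> bool:
        # Check each listed vulnerability against those actually present.
        checks = [v in present for v in self.vulnerabilities]
        if self.operator == "AND":
            return all(checks)
        if self.operator == "OR":
            return any(checks)
        raise ValueError(f"unknown operator: {self.operator!r}")


# Hypothetical table with the two cases described in the text.
table = [
    AttackPrecondition("scenario 1", "OR", ("A", "B")),
    AttackPrecondition("scenario 2", "AND", ("A", "B")),
]

# Suppose only vulnerability "A" is present in the system.
present = {"A"}
results = {row.threat_scenario: row.is_enabled(present) for row in table}
print(results)
```

With only vulnerability "A" present, the "OR" row is enabled and the "AND" row is not, which is the distinction that the diagram labels would otherwise have to carry.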

Fifth, the risk treatment modelling should be extended in a similar way to allow individual treatments to be added to treatment scenarios as an additional model component. A large number of possible treatments can be added, and it would help to show these being added over time, as they would alter the risks. An example is training treatments, which can include adversarial training and ensemble training. Indeed, more attack-specific treatments are likely to be discovered and may need to be added.
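A class of treatments that accumulates individual treatments could be sketched as follows. This is an assumption about how such an extension might be represented, not an existing CORAS construct: the name `TreatmentClass` and its `add` method are hypothetical, while the two treatments are the examples given above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreatmentClass:
    """Hypothetical class of treatments that individual treatments join over time."""
    name: str
    treatments: List[str] = field(default_factory=list)

    def add(self, treatment: str) -> None:
        # Keep insertion order so the history of additions stays visible,
        # and ignore duplicates.
        if treatment not in self.treatments:
            self.treatments.append(treatment)


training = TreatmentClass("training treatments")
training.add("adversarial training")
training.add("ensemble training")
training.add("adversarial training")  # duplicate, ignored
print(training.treatments)
```

Newly discovered, attack-specific treatments would be appended to the relevant class in the same way, without changing the treatment scenario that refers to the class.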

Collectively, while these issues still allow for the modelling of high-level threats in this domain, CORAS should be modified, as a priority, for machine learning and Industry 4.0 to:

• Include knowledge in the language

• Allow for classes of treatments to which individual treatments can be added.

Including knowledge could be achieved by the change to the CORAS model language shown in Fig. 6.3.

In document The Design of a Risk Management Framework for Machine Learning Systems in Industry 4.0 (Page 85-87)