Software Requirements - Representation And Utilisation Of Extracted Rules

4.3 Representation And Utilisation Of Extracted Rules

5.1.1 Software Requirements

The proposed solution is implemented with the help of two separate, prototype software applications. The requirements of both software applications are described in Chapter3

and Chapter4. To get a better understanding of the overall scheme, Figure5.1shows all components of both applications: knowledge acquisition (1A) and representation (1B), and knowledge utilisation (2). This figure summarises the steps taken from processing the event log entries to producing a security action plan. Both applications should be fully automated and fulfil all functional requirements. The synopsis of each application is described in the following:

1. An application to perform the knowledge acquisition and representation process, which consists of the following two modules:

(a) For the first module, read the event log entries of a machine that was configured by security experts, remove the routine events and acquire the expert security knowledge by learning user-activity patterns in the form of TAC rules;

(b) Convert the TAC rules into a PDDL domain action model file, such that it represents the acquired knowledge from a particular event log dataset. The users should be able to enable or disable this feature;

(c) Store every TAC rule in a MySQL database. Each rule should have the following information: properties of every event in the form of object-value pairs, number of times this specific rule was found in different datasets, a temporal accuracy value and a causal rank; and

(d) For the second module, collect all TAC rules from the database that were acquired from one or more distinct event log datasets and convert them into a single PDDL domain action model file to represent and utilise the acquired knowledge.

2. The second application will perform the knowledge utilisation process, which also consists of two modules:

Figure 5.1: Overview of the proposed solution represented in two phases. The first phase is further divided into two segments; first, extract temporal-association-causal (TAC) rules from the given event log dataset and second, represent the TAC rules in a PDDL domain action model file. In second phase, a plan solution is generated for the

(a) For the first module, read the event log entries of a previously unseen, vulnerable machine along with the existing domain model to generate a PDDL problem instance;

(b) Modify the default parameters of existing LPG planner software tool (details are in Section5.1.2.2) to accommodate the processing of domain models and problems instances with a large number of objects, and use it to generate a sequence of actions that can be used to perform security assessment and configuration on the vulnerable machine; and

(c) For the second module, create a simple parsing tool that essentially converts the output plan solution of LPG planner in a clear, descriptive format for presenting the discovered security risks and proposed mitigation actions to non-expert users.

Flexibility is one of the significant requirements for building the application. This is be- cause the working of some features needs exploration before finding the best performing implementation methodology. This will also allow the users to configure the software application in a desired manner. For example, the proposed solution calculates the support range automatically for Apriori algorithm (Section 3.2.2); however, users should also be allowed to manually input the minimum and maximum support values in case of incorrect automated values. Another example is the elimination of temporal-association rules based on a 50% threshold of temporal-association accuracy value (Section 3.3.1). This threshold should be changeable by the user to achieve a desired balance between the quantity and quality of rules.

Another requirement is the design of both applications that should cater the need of all the potential users. For the knowledge acquisition and representation application, the main focus should be on the expert-level users, who want to acquire domain action models from configured machines. For the knowledge utilisation application, the main focus should be on the non-expert users, who want to analyse and understand the security status of vulnerable machines. Furthermore, in order to provide a better experience to all kinds of users, both applications should display and explain the step-by-step output of each module in a well-known and clear format, e.g. within console and HTML. This is not an essential requirement for the developed solution to operate; however, it

can benefit both expert and non-expert users in terms of debugging and visualising the results of overall process as shown in Appendix C.

The proposed solution should also aim to reuse the open-source code of existing tools and libraries during the development of both applications (where possible). Such code is generally reliable, tested, verified, crowd-sourced and will also minimise the development effort [188]. For example, the Two-Sample Kolmogorov–Smirnov normality test, performed in Section 3.2.2.1, has already been implemented in Accord.NET1 library. Similarly, the source code of graph analysis and several causal inference algorithms used in Section 3.4 is available in a Pcalg2 library of R language. Furthermore, the source code of LPG planner3 used in Section 4.3.4.2 is also available and can be easily modi- fied to fit our needs. The format of input and output of different components in both applications are designed to be compatible with external tools.

In document Discovering and Utilising Expert Knowledge from Security Event Logs (Page 110-113)