
9. Conclusions and future work

9.1 Conclusions

Let us recapitulate the research questions. The main research question is as follows.

HOW CAN A WEB-BASED ANIDS AUTONOMOUSLY ADAPT TO LEGITIMATE CHANGES IN THE MONITORED WEB APPLICATION?

In chapter 5 we distinguished three sub-questions, which together serve to answer the main research question. These sub-questions are the following.

- WHEN CAN AN OBSERVED CHANGE BE CONSIDERED TO BE LEGITIMATE?

- WHEN A LEGITIMATE CHANGE HAS BEEN DISTINGUISHED, HOW TO UPDATE THE MODEL?

- ARE CERTAIN TRAINING ALGORITHMS MORE PREFERABLE THAN OTHERS WHEN IT COMES TO THE PROPOSED METHODS FOR DISTINGUISHING LEGITIMATE CHANGES AND RETRAINING THE MODEL?

Based on the promising results shown in chapter 8, we conclude that our method can serve as an effective way to cope with the challenges involved in web-based anomaly detection. However, the method has its limitations. In addition to the shortcomings described in paragraph 8.4.1, anyone considering deploying our method in a production environment must take into account that several parameters have to be adjusted manually for the specific environment or application. This became especially apparent in paragraph 8.2.2.3, and it contradicts the goal described in paragraph 7.2.3, namely a system that requires no manually configured parameters that relate to properties of the training data that are unknown beforehand. On the other hand, it may be possible to automate the adjustment of these parameters. With this in mind, the system as it stands can be seen as a solid foundation that lends itself to extensions with more extensive self-learning capabilities. A brief overview of such extensions is included in the next and last paragraph of this report, which covers the future work that can be done to improve the current system and the opportunities to research remaining uncertainties.

This brings us to the answers to the research questions. Since our system, despite its limitations, proved to perform well in coping with the challenges of legitimate change detection and retraining, and given the uniqueness of our approach, we argue that our system can offer an acceptable solution to these challenges in general. By this we do not imply that our system is the optimal solution in web-based anomaly detection. Instead, our method is a solution that we have shown to be effective under certain circumstances. By combining several methods that are individually able to cope with some of the challenges related to anomaly-based intrusion detection, we have designed a system that would theoretically be superior to using each method independently: the whole is greater than the sum of its parts. Even though the system created for this research lacks certain features that, had they been implemented, would have improved its effectiveness with respect to the challenges, we believe we can make the well-founded assertion that our system, given its current results and its opportunities for optimization, can serve as a solution, and we answer the research questions on that basis. The answers are as follows.

WHEN CAN AN OBSERVED CHANGE BE CONSIDERED TO BE LEGITIMATE?

The method used to determine whether a change is legitimate depends on the entity that is subject to this assessment. For whitelisted web resources, parameter names, and regular parameter values, an observed change can be considered legitimate when the newly observed entity that represents the change has a sufficient reputation index after a certain period of time. The reputation index is based on the variety of clients as well as the confidence indexes of the clients that accessed the new entity in the monitored environment. The confidence index of a client is a measure of how well the client can be trusted, i.e. to what extent the client can be considered non-malicious, and depends on the variety of entities that the client accessed in the past.
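The relationship between the two indexes can be illustrated with a minimal sketch. The formulas, the saturation constants, the threshold, and all names below (`Client`, `confidence_index`, `reputation_index`, `is_legitimate`) are assumptions made for illustration only; they are not the calculations used in the system described in this report. The sketch only preserves the stated dependencies: confidence grows with the variety of entities a client accessed, and reputation grows with both the variety of clients and their confidence.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """A client and the set of distinct entities it has accessed so far."""
    client_id: str
    accessed_entities: set = field(default_factory=set)


def confidence_index(client: Client, saturation: int = 20) -> float:
    """Hypothetical trust measure in [0, 1].

    Grows with the variety of entities the client accessed in the past.
    `saturation` is an assumed constant: the number of distinct entities
    at which a client counts as fully trusted.
    """
    return min(1.0, len(client.accessed_entities) / saturation)


def reputation_index(clients: list, saturation: float = 5.0) -> float:
    """Hypothetical reputation of a newly observed entity, in [0, 1].

    Sums the confidence of every distinct client that accessed the entity,
    so both the variety of clients and their confidence raise the score.
    """
    distinct = {c.client_id: c for c in clients}.values()
    total = sum(confidence_index(c) for c in distinct)
    return min(1.0, total / saturation)


def is_legitimate(clients: list, threshold: float = 0.6) -> bool:
    """Accept the entity once its reputation reaches the assumed threshold."""
    return reputation_index(clients) >= threshold
```

Under these assumptions, an entity visited by a handful of well-established clients is accepted, while one visited only by a single newcomer is not. Repeated visits by the same client do not raise the score, which reflects the emphasis on client variety.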

For irregular parameter values, there are different gradations of change, which are specified by pre-defined thresholds. Minor change is automatically considered legitimate and is incorporated in the model on a per-request basis, directly after detection. Instances of moderate change (suspicious items) and major change (suspicious-anomalous items) are evaluated at certain intervals using the clustering algorithm described in this report. An entity will be considered legitimate in the following situations:

- The requests belong to a cluster that has an exemplar that was already legitimate, or:

    o The exemplar is suspicious and the cluster is either large and dense enough or has a sufficient reputation index.

    o The exemplar is suspicious-anomalous and the cluster is large and dense enough and has a sufficient reputation index.
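The three situations listed above amount to a small decision rule, which can be sketched as follows. The type and function names (`Status`, `Cluster`, `cluster_is_legitimate`) are hypothetical, and the size, density, and reputation checks are assumed to have been evaluated elsewhere and are passed in as booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LEGITIMATE = "legitimate"
    SUSPICIOUS = "suspicious"                      # moderate change
    SUSPICIOUS_ANOMALOUS = "suspicious-anomalous"  # major change


@dataclass
class Cluster:
    exemplar_status: Status
    large_and_dense: bool        # outcome of the size and density checks
    sufficient_reputation: bool  # outcome of the reputation index check


def cluster_is_legitimate(cluster: Cluster) -> bool:
    """Apply the three acceptance situations to a single cluster."""
    if cluster.exemplar_status is Status.LEGITIMATE:
        # The exemplar was already legitimate, so its cluster is accepted.
        return True
    if cluster.exemplar_status is Status.SUSPICIOUS:
        # Moderate change: one of the two conditions is enough.
        return cluster.large_and_dense or cluster.sufficient_reputation
    # Major change: both conditions must hold.
    return cluster.large_and_dense and cluster.sufficient_reputation
```

The only difference between the two suspicious cases is the switch from "or" to "and": the further a cluster's exemplar deviates from the model, the more evidence is required before it is accepted.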

WHEN A LEGITIMATE CHANGE HAS BEEN DISTINGUISHED, HOW TO UPDATE THE MODEL?

There are different types of models. For whitelisted web resources, the new web resource is simply whitelisted. For new parameter names, the parameter name itself is whitelisted, and the corresponding values that were monitored are used to create either a regular parameter values model or an irregular parameter values model, depending on the number of different special characters in the batch of values. For new regular parameter values that belong to existing parameter names, it is first determined whether the existing regular model should be updated or replaced, based on the fraction of newly observed requests that the existing model considers valid. When the new requests deviate too much from the existing model, the model is replaced instead of updated. Finally, for new irregular parameter values belonging to existing parameter names, the clustering algorithm outputs a new set of clusters, from which the illegitimate clusters are removed based on the criteria listed earlier.
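Two of these decisions can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used in this research: the function names, the definition of a "special character" (anything that is neither alphanumeric nor whitespace), the direction of the comparison (more special characters means irregular), and both threshold values are hypothetical.

```python
def choose_model_type(values, max_special_chars=2):
    """Pick a model type for the values of a newly whitelisted parameter name.

    Counts the different special characters across the whole batch.
    Assumption: a batch with more than `max_special_chars` distinct
    special characters gets an irregular parameter values model.
    """
    special = {
        ch
        for value in values
        for ch in value
        if not ch.isalnum() and not ch.isspace()
    }
    return "irregular" if len(special) > max_special_chars else "regular"


def update_or_replace(existing_model_accepts, new_values, min_valid_fraction=0.5):
    """Decide whether an existing regular model is updated or replaced.

    `existing_model_accepts` is a callable standing in for the existing
    model's validity check. The model is replaced when too small a fraction
    of the newly observed values is considered valid by it.
    """
    if not new_values:
        return "update"  # nothing new was observed, keep the model as it is
    valid = sum(1 for value in new_values if existing_model_accepts(value))
    fraction = valid / len(new_values)
    return "update" if fraction >= min_valid_fraction else "replace"
```

For example, with a stand-in model that accepts only digit strings, `update_or_replace(str.isdigit, ["1", "2", "x"])` keeps and updates the model, because two thirds of the new values are still valid.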

ARE CERTAIN TRAINING ALGORITHMS MORE PREFERABLE THAN OTHERS WHEN IT COMES TO THE PROPOSED METHODS FOR DISTINGUISHING LEGITIMATE CHANGES AND RETRAINING THE MODEL?

When we talk about training algorithms in our proposed methods, we mainly mean the calculations that are performed in the trusted client system and the clustering method. For the trusted client system, we are not aware of any methods that could outperform our current method with respect to the properties that we deem important in calculating “trust”, and the ability to adjust the relative importance of these properties by tweaking the variables of the algorithm.

For the clustering algorithm, however, we have pointed out that choosing the Affinity Propagation (AP) algorithm over, for example, the DBSCAN algorithm was a deliberate decision, because AP is more suitable for adapting to a changing environment (refer to paragraph 6.3.1).

Finally, we have noted the importance of the system’s autonomicity. The algorithms that we introduced can certainly be improved so that they automatically adjust the system’s parameter values for different applications, or for an application that grows over time with respect to the number of different entities and clients. Such more autonomous algorithms would be preferable.

HOW CAN A WEB-BASED ANIDS AUTONOMOUSLY ADAPT TO LEGITIMATE CHANGES IN THE MONITORED WEB APPLICATION?

Our proposed method, on which we based the web-based ANIDS named Scandax, proved to be fairly capable of autonomously adapting to legitimate changes in the monitored web application. Improvements are possible to make the system even more

In document Self adaptation to concept drift in web based anomaly detection (Page 85-88)