9. Conclusions and future work
9.1 Conclusions
Let us recapitulate the research questions. The main research question is as follows.
HOW CAN A WEB-BASED ANIDS AUTONOMOUSLY ADAPT TO
LEGITIMATE CHANGES IN THE MONITORED WEB
APPLICATION?
In chapter 5 we distinguished three sub-questions, which together serve to answer the main research question. These sub-questions are the following.
- WHEN CAN AN OBSERVED CHANGE BE CONSIDERED TO BE LEGITIMATE?
- WHEN A LEGITIMATE CHANGE HAS BEEN DISTINGUISHED, HOW TO UPDATE THE MODEL?
- ARE CERTAIN TRAINING ALGORITHMS MORE PREFERABLE THAN OTHERS WHEN IT COMES TO THE PROPOSED METHODS FOR DISTINGUISHING LEGITIMATE CHANGES AND RETRAINING THE MODEL?
Based on the promising results shown in chapter 8, we can conclude that our method can serve as an effective way to cope with the challenges involved in web-based anomaly detection. However, the method has its limitations. In addition to the shortcomings described in paragraph 8.4.1, anyone considering deploying our method in a production environment should take into account that several parameters have to be adjusted manually for the specific environment or application. This became especially apparent in paragraph 8.2.2.3, and it contradicts the goal described in paragraph 7.2.3, namely to have a system that does not require manually configured parameters that relate to properties of the training data that are unknown beforehand. On the other hand, it may be possible to automate the adjustment of these parameters. With this in mind, the system as it stands can be seen as a solid foundation that lends itself to extensions with more extensive self-learning capabilities. A brief overview of such extensions is included in the next and last paragraph of this report, which covers the future work that can be done to improve the current system and the opportunities to research remaining uncertainties.
This brings us to the answers to the research questions. Since our system, despite its limitations, proved to be a well-performing method for coping with the challenges of legitimate change detection and retraining, and given the uniqueness of our approach, we argue that our system can offer an acceptable solution to these challenges in general. With this we do not imply that our system is the optimal solution in web-based anomaly detection. Instead, our method is a solution that we have shown can be effective under certain circumstances. By building on several methods that are individually able to cope with some of the challenges related to anomaly-based intrusion detection, we have designed a system that should theoretically be superior to using each method independently: the whole is greater than the sum of its parts. Even though the system created for this research lacks certain features that, had they been implemented, would have improved its effectiveness with respect to the challenges, we believe that, given its current results and its opportunities for optimization, it can serve as a solution on which to base our answers to the research questions. The answers are as follows.
WHEN CAN AN OBSERVED CHANGE BE CONSIDERED TO BE LEGITIMATE?
The method used to determine whether a change is legitimate depends on the entity
that is subject to this assessment. For whitelisted web resources, parameter names, and
regular parameter values, an observed change can be considered legitimate when the
newly observed entity that represents the change has a sufficient reputation index after a
certain period of time. The reputation index is based on the variety of clients as well as the
confidence indexes of the clients that accessed the new entity in the monitored environment.
The confidence index of a client is a measure of how well the client can be trusted, i.e. to what
extent the client can be considered non-malicious, and depends on the variety of entities
that were accessed by the client in the past.
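The relation between the two indexes can be illustrated with a minimal sketch. The formulas below are illustrative stand-ins and not the definitions used earlier in this report: the sketch assumes that a client's confidence index grows linearly with the number of distinct entities it accessed (capped at 1.0), and that the reputation index of a new entity is the sum of the confidence indexes of the distinct clients that accessed it. The names `confidence_index`, `reputation_index`, `has_sufficient_reputation`, `saturation` and `threshold` are hypothetical, and the observation period is left out.

```python
def confidence_index(accessed_entities, saturation=20):
    """Hypothetical confidence index in [0, 1].

    Assumption: trust grows with the variety of entities a client has
    accessed in the past and saturates at `saturation` distinct entities.
    """
    distinct = len(set(accessed_entities))
    return min(1.0, distinct / saturation)


def reputation_index(accessing_clients, client_history, saturation=20):
    """Hypothetical reputation index of a newly observed entity.

    Assumption: each distinct client contributes its confidence index once,
    so both the variety of clients and their confidence raise the result.
    """
    return sum(
        confidence_index(client_history.get(client, ()), saturation)
        for client in set(accessing_clients)
    )


def has_sufficient_reputation(accessing_clients, client_history,
                              threshold=2.0, saturation=20):
    """Compare the reputation index with an assumed, pre-defined threshold."""
    return reputation_index(accessing_clients, client_history, saturation) >= threshold
```

Under these assumptions, repeated requests from a single client do not raise the reputation of a new entity, and clients without any history contribute nothing.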
For irregular parameter values, there are different gradations of change, which are specified
by pre-defined thresholds. Minor change is automatically considered to be legitimate and
incorporated in the model on a per-request basis, directly after detection. Instances of
moderate change (suspicious items) and major change (suspicious-anomalous items) are
evaluated at certain intervals by using the clustering algorithm that was described in this
report, and in the following situations an entity will be considered to be legitimate:
- The requests belong to a cluster that has an exemplar that was already legitimate, or:
  o The exemplar is suspicious and the cluster is either large and dense enough or has a sufficient reputation index.
  o The exemplar is suspicious-anomalous and the cluster is large and dense enough and has a sufficient reputation index.
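The three situations listed above can be written down as a small decision function. This is only a sketch: the type names `Status` and `Cluster` and the function `cluster_is_legitimate` are hypothetical, and the two boolean fields stand in for the size/density and reputation checks, whose thresholds are not repeated here.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LEGITIMATE = "legitimate"
    SUSPICIOUS = "suspicious"                      # moderate change
    SUSPICIOUS_ANOMALOUS = "suspicious-anomalous"  # major change


@dataclass
class Cluster:
    exemplar_status: Status
    large_and_dense: bool        # assumed outcome of the size/density check
    sufficient_reputation: bool  # assumed outcome of the reputation check


def cluster_is_legitimate(cluster: Cluster) -> bool:
    """Apply the legitimacy criteria for clusters of irregular values."""
    if cluster.exemplar_status is Status.LEGITIMATE:
        # The exemplar was already legitimate: accept the cluster.
        return True
    if cluster.exemplar_status is Status.SUSPICIOUS:
        # Moderate change: one of the two conditions is enough.
        return cluster.large_and_dense or cluster.sufficient_reputation
    # Major change: both conditions are required.
    return cluster.large_and_dense and cluster.sufficient_reputation
```

The sketch makes the asymmetry explicit: the stricter the gradation of change, the more evidence is required before a cluster is accepted.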
WHEN A LEGITIMATE CHANGE HAS BEEN DISTINGUISHED, HOW TO UPDATE THE MODEL?
There are different types of models. For whitelisted web resources the new web resource is
simply whitelisted. For new parameter names, the parameter name itself is whitelisted, and
the corresponding values that were monitored are used to create either a regular parameter
values model or an irregular parameter values model, depending on the number of different
special characters in the batch of values. For new regular parameter values that belong to
existing parameter names, it is first determined whether the existing regular model should
be updated or replaced, based on the fraction of newly observed requests that are
considered to be valid by the existing model. When the new requests deviate too much from
the existing model, the model is replaced instead of updated. Finally, for new irregular
parameter values belonging to existing parameter names, the clustering algorithm will output
a new set of clusters, of which the illegitimate clusters will be removed based on the criteria
that were listed earlier.
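Two of the decisions described above can be sketched in a few lines. The sketch rests on assumptions that are not taken from this report: a special character is taken to be any non-alphanumeric character, a batch with few distinct special characters is assumed to lead to a regular model and a batch with many to an irregular one, and the threshold values `max_special_chars` and `min_valid_fraction` are hypothetical placeholders. The function names are hypothetical as well.

```python
def count_special_characters(values):
    """Number of distinct non-alphanumeric characters in a batch of values."""
    return len({ch for value in values for ch in value if not ch.isalnum()})


def choose_model_type(values, max_special_chars=3):
    """Pick the model type for the values of a newly whitelisted parameter name.

    Assumption: few distinct special characters -> regular model,
    otherwise -> irregular model.
    """
    if count_special_characters(values) <= max_special_chars:
        return "regular"
    return "irregular"


def choose_model_action(new_values, is_valid, min_valid_fraction=0.5):
    """Decide whether an existing regular model is updated or replaced.

    `is_valid` is a callable that returns True when the existing model
    accepts a value. If too few new values are accepted, the new requests
    deviate too much and the model is replaced.
    """
    if not new_values:
        raise ValueError("at least one newly observed value is required")
    valid = sum(1 for value in new_values if is_valid(value))
    if valid / len(new_values) >= min_valid_fraction:
        return "update"
    return "replace"
```

For example, with an existing model that only accepts digits (`str.isdigit`), a batch in which two out of three values are numeric would still lead to an update under the assumed threshold of 0.5.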
ARE CERTAIN TRAINING ALGORITHMS MORE PREFERABLE THAN OTHERS WHEN IT COMES TO THE PROPOSED METHODS FOR DISTINGUISHING LEGITIMATE CHANGES AND RETRAINING
THE MODEL?