Challenges - Evaluating Vulnerability Prediction Models

1.4.1 Overview

In today’s connected world, software systems are pervasive, so a vulnerability found in one of them has the potential to impact billions of users. Unfortunately, it is not possible for every software vendor to continuously test its entire codebase. First, uncovering vulnerabilities, unlike uncovering ordinary bugs, requires specific skills and knowledge that developers do not necessarily possess. Second, there remains the cost of searching the entire codebase, which amounts to millions of lines of code for the largest projects.

To help reduce this cost, researchers have proposed various methods to guide the security testing effort. Among those, vulnerability prediction modelling (VPM) has shown promising results, and research in this area has been active over the past few years.

Most studies suggesting new approaches for vulnerability prediction modelling fail to compare their results with previously introduced ones. Furthermore, the way approaches are evaluated often differs from one study to another. This difference can originate from the choice of dataset, ground truth, evaluation methodology, performance metrics, or a combination of those. This makes it impossible to compare results across studies, which can give an impression of fuzziness and instability. To keep evolving, research on vulnerability prediction modelling needs more comparison and replication studies that acknowledge what has already been proposed. Novel approaches should continue to emerge, but they should always be compared to previous ones. One possibility for a novel approach is the use of the “naturalness of code”, which has been successfully applied to defect prediction [156].

In the following, an overview of the main challenges addressed in this thesis is presented.

1.4.2 Challenges addressed in this thesis

Collecting Vulnerabilities: One of the many factors explaining the lack of empirical studies in this line of research is the absence of standard vulnerability datasets that could be used to evaluate VPM. Indeed, most researchers start by creating their own datasets. This hinders research in the area, as creating a dataset is hard: the required information is usually scattered across different places. Additionally, more than one project should be used to build such a dataset in order to verify that results generalize. Ideally, those projects should also have a large number of reported vulnerabilities, be security-sensitive and be open source, which reduces the number of candidates.

From the researcher’s point of view, creating a dataset is appealing, as it ensures full control over and understanding of the data. Yet this is counterproductive at a larger scale. Moreover, as new vulnerabilities are found on a daily basis, a dataset can quickly become outdated and bias the results.

Finally, vulnerability datasets are not only useful for vulnerability prediction modelling but can also be used for a large variety of analyses. Among these, the ones analysing vulnerability properties are of special interest: when put together with the results of VPM, they can help developers understand the output of the model.

Challenge #1:

The first challenge addressed in this thesis is the automatic collection of vulnerable code instances. The collection process can be used to build a reliable, evolving, multi-project and large dataset that is suitable for vulnerability analysis.

Evaluating the Existing VPM Approaches: Previous studies on VPM often do not compare their results with earlier work. One explanation, aside from the use of different datasets, lies in the fact that the target and evaluation methodology of the studies usually differ. Hence, even if the metrics used for measuring the performance of the models are the same (which is not always the case), it is unwise to compare results obtained with 10-fold cross-validation to results obtained with next-release validation.
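The difference between the two evaluation methodologies can be made concrete with a minimal sketch. The toy data, the function names and the `uses_future_data` check below are illustrative assumptions, not part of any study discussed here. The sketch only shows how the two schemes split the data: cross-validation mixes releases, so a model may be trained on components from later releases than the ones it is tested on, whereas next-release validation never does.

```python
from typing import Iterator, List, Tuple

# A sample is (component name, release index); both are illustrative.
Sample = Tuple[str, int]
Split = Tuple[List[Sample], List[Sample]]


def k_fold_splits(samples: List[Sample], k: int) -> Iterator[Split]:
    """k-fold cross-validation: each fold is the test set once; releases are mixed."""
    folds = [samples[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test


def next_release_splits(samples: List[Sample]) -> Iterator[Split]:
    """Next-release validation: train on all earlier releases, test on release r."""
    releases = sorted({release for _, release in samples})
    for r in releases[1:]:
        train = [s for s in samples if s[1] < r]
        test = [s for s in samples if s[1] == r]
        yield train, test


def uses_future_data(train: List[Sample], test: List[Sample]) -> bool:
    """True if some training sample comes from a later release than a test sample."""
    return max(r for _, r in train) > min(r for _, r in test)


# Toy data: 3 releases with 4 components each.
samples = [(f"file{i}.c", i // 4) for i in range(12)]
```

On this toy data, every cross-validation split trains on data from a release later than part of its test set, while no next-release split does. The two schemes therefore answer different questions, which is one reason their scores should not be compared directly.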

The only remaining solution for researchers wishing to compare their results with other approaches is thus to replicate those approaches using their own dataset, evaluation methodology and target. Unfortunately, only a couple of studies provide a replication framework, so researchers have to recreate the approach based on the information available in the papers. Replicating approaches is time consuming, which explains the lack of comparison.

Challenge #2:

The second major challenge addressed in this dissertation is the replication and comparison of existing approaches.

Suggesting a new approach based on naturalness: The application of Natural Language Processing (NLP) to software engineering has received growing interest in recent years [25]. In particular, the study of software naturalness [77] has given birth to many approaches for generating source code (e.g., code completion [77], synthesis [155], review [76], obfuscation [115] and repair [154]) and for performing static analyses [81, 104, 140].

The naturalness of software is a measure of how surprising a software component is to a statistical language model trained on other software components. Intuitively, one might think that a “surprising” component might be suspicious. Yet, interestingly, the naturalness of software has never been used for security-related tasks. Additionally, the fact that it has been successfully applied to defect prediction [156] makes it an even more interesting candidate on which to build a VPM approach. The history of VPM has shown that approaches working for defect prediction are usually good candidates for vulnerability prediction. Still, the diverse ways to compute naturalness (e.g., the tokenization, the n-gram order n, the smoothing technique) require careful thought about which settings to use as features.
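One common way to quantify this "surprise" is the cross-entropy of a token sequence under an n-gram language model: the higher the value, the less natural the component looks to the model. The sketch below is a minimal illustration under stated assumptions, not the configuration used in this thesis. It assumes whitespace tokenization, an order n chosen by the caller and add-one (Laplace) smoothing; the class name `NgramModel` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


class NgramModel:
    """Toy n-gram language model with add-one smoothing (an assumed setting)."""

    def __init__(self, n: int = 2) -> None:
        self.n = n
        self.ngram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        self.vocab: set = set()

    def _pad(self, tokens: Sequence[str]) -> List[str]:
        # Start/end markers so the first and last tokens also have a context.
        return ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]

    def train(self, corpus: Iterable[Sequence[str]]) -> None:
        for tokens in corpus:
            padded = self._pad(tokens)
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngram_counts[context + (padded[i],)] += 1
                self.context_counts[context] += 1

    def prob(self, context: Tuple[str, ...], token: str) -> float:
        # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
        size = len(self.vocab) + 1
        seen = self.ngram_counts[context + (token,)]
        return (seen + 1) / (self.context_counts[context] + size)

    def cross_entropy(self, tokens: Sequence[str]) -> float:
        """Average number of bits per token; higher means more surprising."""
        padded = self._pad(tokens)
        positions = range(self.n - 1, len(padded))
        total = sum(
            math.log2(self.prob(tuple(padded[i - self.n + 1:i]), padded[i]))
            for i in positions
        )
        return -total / len(positions)


# Hypothetical training corpus, tokenized on whitespace.
corpus = [
    "for i in range ( n ) :".split(),
    "for j in range ( m ) :".split(),
    "if x == 0 :".split(),
]
model = NgramModel(n=2)
model.train(corpus)
```

With this model, a line that resembles the training corpus, such as `for i in range ( n ) :`, receives a lower cross-entropy than a shuffled sequence of the same kind of tokens. Changing the tokenizer, the order n or the smoothing changes the scores, which is why these settings need to be chosen carefully before using the values as features.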

Challenge #3:

The third major challenge addressed in this dissertation is the development of a new VPM approach based on the notion of naturalness.

Using Naturalness of software for Software Engineering: As stated before, the study of software naturalness has led to the development of many approaches. Still, the spectrum of its possible uses is large and has not been fully explored, and some fields of research, such as mutation testing, could benefit from it.

Mutation testing mutates parts of a program to evaluate its test suite. To test a mutant, it must be executed against the test suite. One of the main issues of mutation testing is that the generated mutants are numerous and testing them all takes time, while not all of them are of interest. Thus, some preliminary analysis to reduce their number is required, and naturalness could be a good indicator for selecting mutants.
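The basic mutation testing loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function `price`, the two operator swaps in `SWAPS` and the helper names are all hypothetical, and the sketch does not implement any mutant selection. It only shows that every mutant must be run against the test suite, which is where the cost comes from.

```python
import ast
from typing import Iterator, List, Tuple

# Hypothetical program under test.
SOURCE = """
def price(qty, unit, fee):
    return qty * unit + fee
"""

# Assumed mutation operators: replace one arithmetic operator by another.
SWAPS = {ast.Add: ast.Sub, ast.Mult: ast.Add}

TestCase = Tuple[tuple, object]  # (arguments, expected result)


class SwapNth(ast.NodeTransformer):
    """Mutate only the target-th eligible binary operator."""

    def __init__(self, target: int) -> None:
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)  # visit nested operations first
        replacement = SWAPS.get(type(node.op))
        if replacement is not None:
            if self.seen == self.target:
                node.op = replacement()
            self.seen += 1
        return node


def generate_mutants(source: str) -> Iterator[ast.Module]:
    """Yield one mutant per eligible operator in the source."""
    sites = sum(
        isinstance(n, ast.BinOp) and type(n.op) in SWAPS
        for n in ast.walk(ast.parse(source))
    )
    for i in range(sites):
        tree = SwapNth(i).visit(ast.parse(source))
        yield ast.fix_missing_locations(tree)


def is_killed(mutant: ast.Module, func_name: str, tests: List[TestCase]) -> bool:
    """A mutant is killed if at least one test fails or raises."""
    namespace: dict = {}
    exec(compile(mutant, "<mutant>", "exec"), namespace)
    for args, expected in tests:
        try:
            if namespace[func_name](*args) != expected:
                return True
        except Exception:
            return True
    return False


def mutation_score(source: str, func_name: str, tests: List[TestCase]) -> float:
    """Fraction of mutants killed by the test suite."""
    results = [is_killed(m, func_name, tests) for m in generate_mutants(source)]
    return sum(results) / len(results)


WEAK_SUITE = [((2, 2, 0), 4)]
STRONG_SUITE = WEAK_SUITE + [((3, 5, 2), 17)]
```

Here the weak suite kills none of the two mutants, while the strong suite kills both, so the mutation score separates the two suites. On a real project the number of mutants is far larger, and a ranking step, for instance one based on a naturalness score, would be applied before the costly execution step.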

Challenge #4:

The last challenge addressed in this thesis is how naturalness could be used for other software engineering tasks.
