Figure 1.4: Thesis structure. The figure shows four challenges: collecting information on vulnerabilities, evaluating existing VPM approaches, suggesting a new approach based on naturalness, and exploring the use of naturalness for S.E. Each challenge is addressed through a preliminary analysis, a proposition and a validation analysis. Core contributions are distinguished from derivative contributions, and each step is mapped to a chapter between Chapter 3 and Chapter 13.
1.5 Contributions and Thesis structure
This thesis is composed of five parts. The first part introduces the thesis, the technical background and the state of the art. The next three parts propose solutions to the main challenges presented in Section 1.4. Finally, the last part concludes the thesis.
Each challenge is addressed using the methodology shown in Figure 1.4. First, a preliminary study motivates the challenge. Then, a proposition to tackle the challenge is made in the form of a publicly available framework that enables further analysis. Finally, a validation study is presented. Parts II and III address challenges 1 and 2 respectively, while Part IV focuses on the naturalness of software and addresses the last two challenges (3 and 4).
Part I: Introduction and state of the art. This part consists of the present chapter (Chapter 1), which introduces the context of this thesis, and Chapter 2, which presents the state of the art in vulnerability prediction modelling.
Part II: Analysing and collecting vulnerabilities. This part addresses the challenge of collecting information on vulnerabilities to build a dataset. Chapter 3 presents a manual analysis of Android vulnerabilities. This analysis highlights the difficulties researchers face when trying to collect and analyse a large number of vulnerabilities. Chapter 4 then presents Data7, a publicly available and extensible framework that automatically collects vulnerability fixes and related information. This framework can serve as a basis for VPM evaluation, but also to support empirical analyses of vulnerabilities such as the one presented in Chapter 5. That chapter analyses vulnerability fixes according to their types and severities and gives pointers for more specific VPM analyses.
The contributions presented in this part are based on work that has been presented in the following papers:
• Matthieu Jimenez, Mike Papadakis, Tegawendé F. Bissyandé, and Jacques Klein. Profiling Android vulnerabilities. In 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS), pages 222–229, Aug 2016
• Matthieu Jimenez, Mike Papadakis, and Yves Le Traon. An empirical analysis of vulnerabilities in OpenSSL and the Linux kernel. In 23rd Asia-Pacific Software Engineering Conference, APSEC 2016, Hamilton, New Zealand, December 6-9, 2016, pages 105–112, 2016
• Matthieu Jimenez, Yves Le Traon, and Mike Papadakis. Enabling the continuous analysis of security vulnerabilities with VulData7. In IEEE International Working Conference on Source Code Analysis and Manipulation, 2018
Part III: Investigating Vulnerability Prediction Models. This part is dedicated to the second of the four challenges of this thesis: the replication and comparison of existing VPM approaches. Chapter 7 starts by presenting an exact independent replication study of three of the main VPM approaches on a dataset of Linux kernel vulnerabilities built with the framework of Chapter 4. Building on this experience, Chapter 8 presents an extensible framework that allows practitioners to replicate, evaluate and compare VPM approaches. This framework is then used in Chapter 9 to perform the largest empirical study on VPMs, comparing the three previously replicated approaches under different settings and evaluation criteria.
The contributions presented in this part are based on work that has been presented in the following papers:
• Matthieu Jimenez, Yves Le Traon, and Mike Papadakis. Enabling the continuous analysis of security vulnerabilities with VulData7. In IEEE International Working Conference on Source Code Analysis and Manipulation, 2018
• An Empirical Study on Vulnerability Prediction of Open-Source Software Releases (under review)
Part IV: Naturalness of Software. This part addresses the third and fourth challenges, which are both related to the use of naturalness. Chapter 11 presents a study on the use of naturalness for software engineering, focusing on the effect of code representation and language model parameters on naturalness. The chapter also introduces a framework for computing the naturalness of software, which is used to build a VPM approach that is studied and evaluated in Chapter 12 to address the third challenge. Finally, Chapter 13 addresses the last challenge by presenting an empirical study on the use of naturalness for selecting "fault-revealing" mutants.
The contributions presented in this part are based on work that has been presented in the following papers:
• Matthieu Jimenez, Maxime Cordy, Yves Le Traon, and Mike Papadakis. On the impact of tokenizer and parameters on n-gram based code analysis. In 34th IEEE International Conference on Software Maintenance and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018, 2018
• Matthieu Jimenez, Thierry Titcheu Chekam, Maxime Cordy, Mike Papadakis, Marinos Kintis, Yves Le Traon, and Mark Harman. Are mutants really natural? A study on how "naturalness" helps mutant selection. In 12th International Symposium on Empirical Software Engineering and Measurement, ESEM 2018, Oulu, Finland, 11-13 October 2018, 2018
Finally, this dissertation is concluded in Chapter 15, where possible future research directions are discussed.
2 State of the Art
This chapter reviews works and studies on VPMs and provides an overview of work on the related research topics introduced in Section 1.3. Special care was taken to exhaustively cover all studies published up to the time of writing.
Contents
2.1 VPMs: Over a decade of studies