Methodology - Effective Bug Finding

config INC
    bool "Increment variable x"

config DEC
    bool "Decrement variable x"
    depends on INC

3.2 Methodology

3.2.1 Objective

My objective was to qualitatively understand the complexity and nature of bugs in highly-configurable systems-level software (cf. Section 1.3). This includes addressing the following research questions:

RQ1 What are the variability characteristics of bugs?

RQ2 What are the challenges for program analysis tools?

RQ3 What kind of bug finders do we need?

3.2.2 Subject

I study Linux, which is likely the largest highly configurable open-source system in existence, with more than 14 million lines of code and 16 thousand configuration options as of 2016. Crucially, I have free access to the bug tracker,³ the source code and change history,⁴ and public discussions on the mailing list⁵ and other forums. There also exist books on Linux development [BC05, Lov10], which are valuable resources when trying to understand a bug fix.

³ https://bugzilla.kernel.org/
⁴ http://git.kernel.org/

32 Chapter 3. A Qualitative Study of Bugs in Linux

Message filters               Content filters
---------------               ---------------
CONFIG_fid                    select fid
configuration                 config fid
config option                 depends on fid
if fid is [not]? set          #if
when fid is [not]? set        #ifdef fid
if fid is [en|dis]abled       #else
when fid is [en|dis]abled     #elif
                              #endif

Figure 3.2: Regular expressions selecting configuration-related commits, where fid abbreviates [A-Z0-9_]+, matching feature identifiers.
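To make the notation of Figure 3.2 concrete, the filters can be written as ordinary regular expressions. The following Python sketch is only an illustration and not the tooling used in the study: the names FID, MESSAGE_FILTERS, CONTENT_FILTERS, and is_variability_related are hypothetical, the figure's [not]? and [en|dis] are read as an optional word and an alternation, and the commit message and patch are assumed to be available as plain strings.

```python
import re

# fid abbreviates [A-Z0-9_]+ (feature identifiers), as stated in the caption.
FID = r"[A-Z0-9_]+"

# Message filters: matched against the commit message.
# Assumption: "[not]?" means an optional "not", "[en|dis]" means "en" or "dis".
MESSAGE_FILTERS = [
    rf"CONFIG_{FID}",
    r"configuration",
    r"config option",
    rf"(if|when) {FID} is (not )?set",
    rf"(if|when) {FID} is (en|dis)abled",
]

# Content filters: matched against the text of the patch.
CONTENT_FILTERS = [
    rf"select {FID}",
    rf"config {FID}",
    rf"depends on {FID}",
    r"#if",
    rf"#ifdef {FID}",
    r"#else",
    r"#elif",
    r"#endif",
]

# All expressions are matched case-insensitively, as described in the text.
MESSAGE_RE = re.compile("|".join(MESSAGE_FILTERS), re.IGNORECASE)
CONTENT_RE = re.compile("|".join(CONTENT_FILTERS), re.IGNORECASE)


def is_variability_related(message: str, patch: str) -> bool:
    """True if the message or the patch matches any filter of Figure 3.2."""
    return bool(MESSAGE_RE.search(message) or CONTENT_RE.search(patch))
```

Because matching is case-insensitive and fid then also matches ordinary lowercase words, such a filter is deliberately permissive; it only produces candidates that later steps narrow down.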

3.2.3 Part 1: Finding variability bugs

I focus my attention on bugs already found, corrected, and merged into the stable branch of Linux. These bugs have been publicly discussed (usually on LKML) and confirmed as actual bugs by the developers, so the information about the nature of the bug fix is reliable, and I minimize the chance of including fictitious problems. In early 2014, when this study was conducted, the Linux stable repository had over 400,000 commits. A commit history of this size rules out manual investigation of each commit. I therefore settled on a semi-automated search through Linux commits to find (variability) bugs via historic bug fixes, using the following steps:

1. Selecting variability-related commits. I find commits whose message indicates a variability-related change, or whose patch appears to alter the feature model, the feature mapping, or configuration-dependent code. This is achieved by matching, case-insensitively, the regular expressions of Figure 3.2. The expressions on the left (message filters) identify commits in which the author's message relates the commit to specific features. Those on the right (content filters) identify commits introducing changes to the Kconfig feature model, the (CPP) feature mapping, or code near an #if conditional. This step selects on the order of tens of thousands of commits.

2. Selecting bug-fixing commits. I further narrow the selection to commits that potentially fix bugs; together with the previous filter, this yields candidates for variability bug fixes. This is achieved by matching (also case-insensitively) the regular expressions of Figure 3.3 against the commit message. Expressions on the left are generic keywords that can appear in any bug-fixing commit message. Expressions on the right of the figure try to identify bug-fixing commits for specific types of bugs, such as void-pointer dereferences (void *), undefined symbols (undefined), use before initialization (uninitialized), and so on. Generic filters may select thousands of commits in Linux, whereas specific filters may select only a few hundred or a few tens.

Generic filters               Specific filters
---------------               ----------------
bug                           void \*
fix[es]                       unused
closes \#                     overflow
oops                          undefined
warn                          deadlock
error                         double lock
unsafe                        memory leak
invalid                       uninitialized
violation                     dangling pointer
end trace                     null [pointer]? dereference
kernel panic                  . . .

Figure 3.3: Regular expressions selecting bug-fixing commits.

3. Manual scrutiny. Finally, I read the commit message or the issue, and inspect the changes introduced by the commit, to remove false positives. For instance, commit 7518b5890d passes the previous two filtering steps, yet examining the complete commit message made clear that it does not fix a bug but adds new functionality. I examined simple commits first. A complex commit either introduces more than a few changes (I chose a cut-off value of ten) or affects very complex subsystems (e.g., the kernel/sched Linux subsystem). The ideal commit has an elaborate message providing some form of error trace, and introduces few modifications.

The selection of the keywords and patterns of Figures 3.2 and 3.3 was based on my own understanding of the Linux kernel and on a preliminary analysis of Linux code and commit messages to identify keywords and phrases used by developers. For instance, I learned that, in Linux, an Oops refers to a critical condition detected by the kernel, such as a NULL pointer dereference in kernel code.
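The second filtering step and the ordering used during manual scrutiny can be sketched as follows. This is a hedged illustration under several assumptions, not the scripts used in the study: the Commit record, its field names, and the functions bugfix_candidates and review_order are hypothetical; the keyword lists follow one reading of Figure 3.3 (its trailing ". . ." is left unfilled); "changes" are counted as changed lines; and the input is assumed to have already passed the variability filter of step 1. The subsystem criterion for complex commits is omitted.

```python
import re
from dataclasses import dataclass

# Keyword lists following one reading of Figure 3.3.
# "fix" also matches "fixes", since patterns are searched as substrings.
GENERIC_FILTERS = [
    r"bug", r"fix", r"closes #", r"oops", r"warn", r"error", r"unsafe",
    r"invalid", r"violation", r"end trace", r"kernel panic",
]
SPECIFIC_FILTERS = [
    r"void \*", r"unused", r"overflow", r"undefined", r"deadlock",
    r"double lock", r"memory leak", r"uninitialized", r"dangling pointer",
    r"null (pointer )?dereference",
]

# Matched case-insensitively against the commit message only.
BUGFIX_RE = re.compile("|".join(GENERIC_FILTERS + SPECIFIC_FILTERS), re.IGNORECASE)

CUTOFF = 10  # more than ten changes makes a commit "complex"


@dataclass
class Commit:
    sha: str            # hypothetical identifier
    message: str
    changed_lines: int  # assumption: "changes" counted as changed lines


def bugfix_candidates(commits):
    """Step 2: keep commits whose message matches a bug-fix keyword."""
    return [c for c in commits if BUGFIX_RE.search(c.message)]


def review_order(candidates):
    """Step 3 ordering: simple commits first, smaller commits before larger."""
    return sorted(candidates, key=lambda c: (c.changed_lines > CUTOFF, c.changed_lines))
```

A possible use, with made-up commits: calling review_order(bugfix_candidates(commits)) returns the candidates in the order in which they would be read, leaving the final decision about each one to a human reviewer.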

3.2.4 Part 2: Analysis of bug candidates

The second part of the methodology is significantly more laborious than the first part. For each variability bug identified, I manually analyze the commit message, the patch fix, and the actual code to build an understanding of the bug. When more context is required, I find and follow the associated discussion on LKML. Code inspection is supported by ctags⁶ and the Unix grep utility.

1. The semantics of the bug. For each bug, I want to understand the cause of the bug, the effect on the program semantics, and the relation between the two. This often requires understanding the inner workings of the kernel, and translating this understanding to general programming language terms accessible to a broader audience. As part of this process, I try to identify a relevant bug trace, and collect links to available information about the bug online.

2. Variability-related properties. I establish the presence condition of the bug (a precondition in terms of configuration choices) and where it was fixed: in the code, in the feature model, or in the mapping.

3. Simplified version. I condense my understanding in a simplified version of the bug. This serves to explain the original bug, and constitutes an easily accessible benchmark for testing and evaluating ideas and proof-of-concept prototypes.
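The notion of a presence condition can be illustrated with a toy example. The sketch below reuses the INC and DEC options and the constraint "DEC depends on INC" from the Kconfig fragment at the start of this section; everything else is an assumption made for illustration, in particular the toy program in run and the toy bug (a later division by x that fails when x is zero). It enumerates all configurations, discards those the feature model forbids, and reports the ones in which the bug occurs.

```python
from itertools import product

OPTIONS = ("INC", "DEC")


def is_valid(cfg):
    """Feature model: DEC depends on INC."""
    return cfg["INC"] or not cfg["DEC"]


def run(cfg):
    """Toy configurable program (hypothetical): returns the final value of x."""
    x = 0
    if cfg["INC"]:  # stands for code under #ifdef INC
        x += 1
    if cfg["DEC"]:  # stands for code under #ifdef DEC
        x -= 1
    return x


def has_bug(cfg):
    """Toy bug (hypothetical): a later division by x fails when x == 0."""
    return run(cfg) == 0


def presence_condition():
    """All valid configurations in which the toy bug occurs."""
    configs = [dict(zip(OPTIONS, bits))
               for bits in product([False, True], repeat=len(OPTIONS))]
    return [cfg for cfg in configs if is_valid(cfg) and has_bug(cfg)]
```

For this toy program the bug occurs when both options are disabled or both are enabled, so its presence condition can be written as (!INC && !DEC) || (INC && DEC). Exhaustive enumeration only works for a handful of options; with thousands of options, as in Linux, the presence condition has to be established by reasoning about the code.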

I analyzed bugs from the previous step (cf. Section 3.2.3) following this method, and stored the reports from this analysis in a publicly available database. The detailed content of the report is explained in Section 3.4.

3.2.5 Part 3: Data analysis

I reflect on the set of collected data in order to find answers to my three research questions (cf. Section 3.2.1). This step is supported with some
