5.9 Experimental Evaluation
5.9.11 Conclusions
We presented two efficient, scalable feature model synthesis algorithms in this chapter: FGE-CNF and FGE-DNF. We evaluated FGE-CNF against FGE-BDD, a BDD-based implementation, and FGE-DNF against FGE-FCA, an FCA-based synthesis technique by Ryssel et al. [RPK11].
We used two datasets for our evaluation: a dataset with input derived from the Linux variability model and 267 models gathered from the SPLOT model repository, and a dataset of 20 generated feature models with 3-CNF constraints. We used both datasets to evaluate FGE-CNF. To evaluate FGE-DNF, we generated configurations for the SPLOT models with fewer than 100,000 configurations. The input from the generated 3-CNF feature models was more difficult for both FGE-BDD and FGE-CNF: FGE-BDD timed out in most cases, while FGE-CNF completed the computation for 12 of the 20 models. Overall, our evaluation showed that FGE-CNF was significantly faster than FGE-BDD. FGE-DNF and FGE-FCA were generally comparable for the real-world model
dataset. However, there were five models where FGE-DNF was significantly faster than
FGE-FCA. These models had many features that were mandatory with respect to the
Feature Model Synthesis
In the previous chapter, we introduced the FEATURE-GRAPH-EXTRACTION algorithm for
synthesizing a feature graph given input as a propositional formula. A feature graph encapsulates all feature diagrams that are entailed by the input formula. However, the feature graph is not a proper feature model—the hierarchy is a DAG instead of a tree, and feature groups can overlap.
In this chapter, we describe a semi-automated procedure, called FEATURE-TREE-SYNTHESIS, that abstractly selects the most suitable feature diagram from the feature graph. In practice, FEATURE-TREE-SYNTHESIS and FEATURE-GRAPH-EXTRACTION are intertwined.
FEATURE-TREE-SYNTHESIS supplements FEATURE-GRAPH-EXTRACTION by providing a semi-automated procedure for determining a distinct feature hierarchy. Our procedure uses the input dependencies as a guide for the configuration semantics and a textual similarity measure to approximate the domain semantics. Given a feature, the user selects its parent from a list of implied features that are ranked by their similarity to the selected feature. In a practical synthesis scenario, the input dependencies may be incomplete, i.e., constraints may be missing from the input. Our procedure deals with incomplete dependencies by providing a second list that ranks features solely using the textual similarity measure. This procedure, combined with FEATURE-GRAPH-EXTRACTION from Chapter 5, forms a complete feature model synthesis algorithm.
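The two ranked lists described above can be illustrated with a minimal Python sketch. It assumes a simple Jaccard word-overlap score as a stand-in for the textual similarity measure, which is not specified at this point in the text; the function names and the small feature set are hypothetical and chosen only for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercased words in a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard word overlap; an assumed stand-in for the similarity measure."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_parent_candidates(feature, implied, descriptions):
    """Return (implied features ranked by similarity, all features ranked by similarity)."""
    def score(other):
        return similarity(descriptions[feature], descriptions[other])

    # First list: only features implied by `feature`, most similar first.
    ranked_implied = sorted(implied, key=lambda f: (-score(f), f))
    # Second list: every other feature, for use when dependencies are missing.
    ranked_all = sorted(
        (f for f in descriptions if f != feature),
        key=lambda f: (-score(f), f),
    )
    return ranked_implied, ranked_all


# Hypothetical input: descriptions plus the features implied by "powersave".
descriptions = {
    "pm": "Power management, CPU and ACPI options",
    "acpi": "Advanced Configuration and Power Interface support",
    "cpu_freq": "CPU frequency scaling",
    "powersave": "This CPU governor uses the lowest frequency",
}
ranked_implied, ranked_all = rank_parent_candidates(
    "powersave", ["pm", "cpu_freq"], descriptions
)
```

In this toy input, cpu_freq shares two words with powersave and pm shares one, so cpu_freq is ranked first in both lists. The user would still make the final choice of parent; the sketch only orders the candidates.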
Chapter Organization. In Section 6.1, we introduce the motivation and context of our synthesis algorithm. Section 6.2 gives an overview of the procedure with an example, and Section 6.3 describes the procedure in detail. Section 6.4 describes the evaluation of our procedure on the Linux and eCos kernels and on portions of the FreeBSD kernel.
6.1 Introduction
Variability-rich software systems such as FreeBSD do not have a feature model. FreeBSD describes features and dependencies in an ad-hoc manner: features are scattered in documentation and dependencies are hidden in code. Such projects would benefit from having an explicit feature model. Unfortunately, constructing a feature model is both time- and cost-intensive. Building the feature hierarchy, in particular, requires substantial effort from a modeler, who must review feature descriptions and dependencies to determine which dependencies to model in the hierarchy and which to defer to cross-tree constraints. FreeBSD has 1203 features; constructing a feature model for a project of this size would require tremendous time and effort. The difficulty is compounded when the modeler lacks a complete set of dependencies. In this case, the missing dependencies could be uncovered by examining supplemental data such as feature names and feature descriptions, which may require the modeler to sift through the text of potentially hundreds of features in order to determine the correct placement for a single feature. Even with a complete set of dependencies, selecting the right parent for a feature is still challenging: a single feature may depend on over a hundred others, as we have observed in the variability models of the Linux and eCos kernels.
We present a tool-supported approach for reverse engineering feature models called FEATURE-TREE-SYNTHESIS. The key challenge is the construction of the feature diagram.
This task reduces to the selection of a parent for each feature. We present heuristics for identifying the likely parent candidates for a given feature. Our heuristics significantly decrease the number of features that a user has to consider, from potentially thousands to only a handful (typically five or fewer, as shown by our experiments). We also provide automated procedures for finding feature groups and implies and excludes edges. If the set of input dependencies is complete, the final feature model is entailed by the input dependencies.
FEATURE-TREE-SYNTHESIS requires a list of feature names, supplementary descriptions, and a propositional formula describing the dependencies among the features. Feature names and descriptions can be extracted from documentation, preprocessor symbols, or code comments. For our evaluation on the FreeBSD kernel, we extracted the input data by analyzing Makefiles, preprocessor declarations, and documentation, using a combination of generic and custom extraction tools.
Due to the complexity, size and nature of most software projects, it is likely that the extracted feature dependencies and descriptions are incomplete. Our heuristics
accommodate this incompleteness by leveraging two sources of data that complement one another—when dependencies are incomplete, the feature descriptions are used to identify parent candidates and vice versa.
We evaluate the effectiveness of our procedures by comparing the results of our heuristics to the reference feature models of the Linux, eCos, and FreeBSD kernels. Linux and eCos both have an existing reference feature model [BSL+10b]. The input dependencies and supplementary feature descriptions were extracted from the reference models themselves. For FreeBSD, we manually constructed a reference feature model for a subset of features after domain analysis. The evaluations show that, for 76% of features in Linux and 79% in eCos, the correct parent is among the top five parent candidates returned by our heuristics. In contrast to Linux and eCos, the input set of dependencies for FreeBSD is incomplete; we therefore report two separate results for FreeBSD: (1) for 84% of the features whose parent dependency is present, the correct parent is among the top two candidates; (2) for 75% of the remaining features, the correct parent is in the top 3% of all 1203 features. Finally, our procedure automatically recovers all feature groups present in the reference models for Linux and eCos, assuming that the modeler settles on the same hierarchies as these models. With the incomplete dependencies of FreeBSD, we were still able to retrieve one of the three feature groups.
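The percentages above are top-k hit rates: the share of features whose reference parent appears among the first k ranked candidates. The following Python sketch shows one way such a rate could be computed. The function name and the toy rankings are hypothetical and are not data from the evaluation.

```python
def top_k_hit_rate(rankings, reference_parent, k):
    """Fraction of features whose reference parent is among the first k candidates."""
    if not reference_parent:
        raise ValueError("reference_parent must not be empty")
    hits = sum(
        1
        for feature, parent in reference_parent.items()
        if parent in rankings.get(feature, [])[:k]
    )
    return hits / len(reference_parent)


# Hypothetical toy data: ranked parent candidates per feature (best first)
# and the parent each feature has in a made-up reference model.
rankings = {
    "acpi": ["pm", "cpu_freq"],
    "cpu_freq": ["pm", "acpi"],
    "powersave": ["pm", "acpi", "cpu_freq"],
    "performance": ["cpu_freq", "pm"],
}
reference_parent = {
    "acpi": "pm",
    "cpu_freq": "pm",
    "powersave": "cpu_freq",
    "performance": "cpu_freq",
}

rate_at_2 = top_k_hit_rate(rankings, reference_parent, 2)  # 3 of 4 features hit
rate_at_3 = top_k_hit_rate(rankings, reference_parent, 3)  # all 4 features hit
```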
The contribution of this work is twofold. On the practical side, we present heuristics and procedures for reverse engineering feature models. Although reverse engineering feature models from logic formulas [CW07] and from descriptions [ASB+08, NE08b] has been considered before in isolation, the main contribution of our approach is that it combines both sources of information. This combination is desirable since, as our evaluation shows, the two sources are complementary. Moreover, the procedures of [CW07] and [ASB+08, NE08b] are not complete, in the sense that the former cannot recover parents that are not direct dependencies, while the latter suggest only a single hierarchy that is unlikely to be the desired one. We also demonstrate the reverse engineering of large-scale feature models on input derived from the Linux and eCos kernels, showing that our approach and procedures scale. On the theoretical front, we describe how both configuration semantics and domain semantics relate to the feature hierarchy.
      (acpi → acpi_system ∧ pm)
    ∧ (acpi_system → acpi)
    ∧ (cpu_freq → pm)
(1) ∧ (cpu_freq → powersave ∨ performance)
(2) ∧ (cpu_hotplug → powersave)
(3) ∧ (cpu_hotplug → ¬performance)
    ∧ (cpu_hotplug → acpi ∧ cpu_freq)
    ∧ (powersave → ¬performance)
    ∧ (powersave → cpu_freq)
    ∧ (performance → cpu_freq)
    ∧ (powersave ∧ acpi → cpu_hotplug)
(a) Dependencies
pm            Power management, CPU and ACPI options
acpi          Advanced Configuration and Power Interface support
acpi_system   Enable your system to shut down using ACPI
cpu_freq      CPU frequency scaling
cpu_hotplug   Allows turning CPU on and off
powersave     This CPU governor uses the lowest frequency
performance   This CPU governor uses the highest frequency
(b) Features and descriptions
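To make the role of the dependencies concrete, the following Python sketch encodes the formula in (a) and computes, by brute-force enumeration of all 2^7 assignments, which features each feature implies. Enumeration is used here only because the example is tiny; it is not the technique used in this thesis, and the function names are hypothetical.

```python
from itertools import product

FEATURES = ["pm", "acpi", "acpi_system", "cpu_freq",
            "cpu_hotplug", "powersave", "performance"]


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


def valid(c):
    """True if assignment c (feature name -> bool) satisfies every clause in (a)."""
    return all([
        implies(c["acpi"], c["acpi_system"] and c["pm"]),
        implies(c["acpi_system"], c["acpi"]),
        implies(c["cpu_freq"], c["pm"]),
        implies(c["cpu_freq"], c["powersave"] or c["performance"]),
        implies(c["cpu_hotplug"], c["powersave"]),
        implies(c["cpu_hotplug"], not c["performance"]),
        implies(c["cpu_hotplug"], c["acpi"] and c["cpu_freq"]),
        implies(c["powersave"], not c["performance"]),
        implies(c["powersave"], c["cpu_freq"]),
        implies(c["performance"], c["cpu_freq"]),
        implies(c["powersave"] and c["acpi"], c["cpu_hotplug"]),
    ])


def configurations():
    """Yield every assignment that satisfies the formula."""
    for bits in product([False, True], repeat=len(FEATURES)):
        c = dict(zip(FEATURES, bits))
        if valid(c):
            yield c


def implied_features(feature):
    """Features selected in every valid configuration that selects `feature`.

    Note: for a feature that can never be selected, this returns all others.
    """
    selected = [c for c in configurations() if c[feature]]
    return sorted(
        g for g in FEATURES
        if g != feature and all(c[g] for c in selected)
    )


hotplug_candidates = implied_features("cpu_hotplug")
powersave_candidates = implied_features("powersave")
```

Here cpu_hotplug implies acpi, acpi_system, cpu_freq, pm, and powersave, so even in this seven-feature example it has five potential parents, while powersave implies only cpu_freq and pm. The descriptions in (b) are the second source of information for choosing among such candidates.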