2.3 Software Security as a Characteristic of Software Quality
2.3.3 Modeling the Vulnerability Discovery Process
The same year that Milk or Wine was published, another group attempted to determine whether models similar to those used for software reliability engineering could be used to provide software security metrics [CA05, AMR05, AMR07]. Like Ozment and Schechter, Alhazmi, et al., were interested in seeing whether vulnerability density was a useful metric for software security, and like Arbaugh, et al., the authors looked at vulnerability discovery rates to determine whether models could be used to predict trends. In addition, they compared the ratio of known vulnerabilities to known defects. Two years earlier, Anderson [And02] had proposed this ratio as a metric for software security and guessed that its value might be around 1%, while McGraw [McG03] suggested that it was probably higher, around 5%; neither actually measured it. Alhazmi, et al., hoped to determine which (if either) estimate was correct, and hypothesized that if one were correct, the ratio could be used to estimate the number of remaining undiscovered vulnerabilities. Their goal was to develop a model for the entire vulnerability discovery process.
2.3.3.1 Windows VDD and VDR
For their analysis, the authors looked at different versions of the Microsoft Windows and the Redhat Linux operating systems.
Table 2.2: Vulnerability density vs. defect density for several versions of the Microsoft Windows operating system. - from [CA05]
Table 2.2 shows the defect density (DKD) and vulnerability density (here labeled VKD), and the ratio between the two, for the Microsoft Windows client operating systems Windows 95, 98 and XP and the Windows server operating systems Windows NT and 2000. Looking at the client operating systems, they noted that while the defect densities and vulnerability densities were quite close for versions 95 and 98, the values for Windows XP were much lower. They attributed this difference to the fact that their dataset included the defects reported in the beta version, as well as the final release, resulting in a much larger defect total. They also stated that their numbers represented only a fraction of XP’s overall vulnerability density and therefore they expected this value to “go up significantly, perhaps to a value more comparable to the two previous versions.” (See Table 2.2.) Interpreting their results, the authors observed that there were several vulnerabilities shared between Windows 98 and XP, and that the slope of the XP graph showed almost no learning rate.
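The relationship between the three quantities compared in the table can be sketched in a few lines. This is a minimal illustration, assuming densities are expressed per thousand source lines of code (KSLOC); the `Release` class, its field names, and all numbers are hypothetical and are not taken from Table 2.2.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Counts for one release; every figure passed in here is illustrative."""
    name: str
    ksloc: float                 # code size in thousands of source lines (assumed unit)
    known_defects: int
    known_vulnerabilities: int

    def dkd(self) -> float:
        """Known defect density: defects per KSLOC."""
        return self.known_defects / self.ksloc

    def vkd(self) -> float:
        """Known vulnerability density: vulnerabilities per KSLOC."""
        return self.known_vulnerabilities / self.ksloc

    def ratio(self) -> float:
        """Ratio of known vulnerabilities to known defects (VKD / DKD)."""
        return self.vkd() / self.dkd()


# Hypothetical numbers, NOT taken from Table 2.2.
example = Release("ExampleOS 1.0", ksloc=10_000, known_defects=5_000,
                  known_vulnerabilities=100)
print(f"DKD={example.dkd():.4f}  VKD={example.vkd():.4f}  ratio={example.ratio():.1%}")
```

Because both densities share the same denominator, the ratio reduces to known vulnerabilities divided by known defects, which is why it can be compared directly against the 1% and 5% guesses.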
They also compared the vulnerability and defect densities for two versions of Microsoft Windows Server: Windows NT and 2000. They were surprised to find that the VKD was around three times higher for the server versions than for the client versions. The authors offered two possible explanations: first, that a larger portion of a server’s software is involved with functions requiring external access, which they claimed made it more vulnerable; and second, that server software must have undergone more stringent testing, and therefore more vulnerabilities were found and reported.
Figure 2.17: Cumulative and Shared vulnerabilities between Windows 95 and Windows 98. [CA05]
Figure 2.18: Cumulative and Shared vulnerabilities between Windows 98 and Windows XP. [CA05]
Figure 2.17 shows the cumulative vulnerabilities for Windows 95 and 98 as well as the shared vulnerabilities. Figure 2.18 compares Windows 98 and XP, and Figure 2.19 shows the same for Windows NT and 2000.
Table 2.3: Vulnerability density vs. defect density for two versions of the Redhat operating system. - from [CA05]
2.3.3.2 Linux VDD and VDR
After examining the various MS Windows operating systems, the authors were curious to see if an open source operating system displayed the same characteristics as the closed source systems. They chose two versions of Redhat Linux for comparison. Table 2.3 shows their results for Redhat version 6.2 and 7.1. Figure 2.20 shows the plot of cumulative vulnerabilities for both versions as well as the vulnerabilities shared between them. Looking at the graph, they made the following observations:
While the code size for version 7.1 is twice as large as that of version 6.2, the VKD and DKD are similar. Additionally, the VKD for Red Hat Linux is in the same range as that of Windows 2000. Looking at the ratio of VKD to DKD in Red Hat 7.1, the authors state that they expected the VKD “to rise significantly in the near future” and note that the ratios for both Linux versions are close to the 5% proposed by McGraw [McG03]. Here, as with MS Windows, they noted the shared vulnerabilities between versions.
Figure 2.19: Cumulative and Shared vulnerabilities between Windows NT and Windows 2000. [CA05]
Figure 2.20: Cumulative and Shared vulnerabilities between Redhat Linux versions 6 and 7. [CA05]
2.3.3.3 A proposed model
Figure 2.21: Proposed 3-phase model. See [CA05]
Alhazmi, et al., found a common pattern across all the operating systems they examined: vulnerability discovery appeared to pass through three phases. They claimed that these phases follow the s-shaped model they had proposed in their earlier work [CA05].
Figure 2.21 describes the proposed three-phase model. According to their definitions, Phase 1 is the phase where users begin to switch to the new operating system and testers (both good and bad) gather knowledge about how to break it. Phase 2 is the time when usage of the operating system gathers momentum and reaches its peak. The authors claimed that most vulnerabilities would be found in this phase. Phase 3 begins as the system is replaced by a newer release and attention shifts to the newer system. From this model, the authors claimed that the vulnerability discovery rate is controlled by two factors, the momentum gained by market acceptance and saturation (defined as total vulnerabilities minus the cumulative number of discovered vulnerabilities), and that the vulnerability discovery process could be modeled by the following equation:
dy/dt=Ay(B−y) (2.1)
where t is calendar time, A is a constant of proportionality, y is the cumulative number of discovered vulnerabilities, and B is the total number of vulnerabilities.
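Equation 2.1 is a logistic differential equation, and separating variables gives the closed form y(t) = B / (1 + C e^(−ABt)), where C = (B − y0) / y0 and y0 = y(0). The sketch below evaluates that closed form to show how the S-shape arises: growth is slow while y is small, fastest at y = B/2, and flattens as y approaches B. The function names and the parameter values are hypothetical and chosen only for illustration; they are not fitted values from [CA05].

```python
import math


def cumulative_vulnerabilities(t: float, A: float, B: float, y0: float) -> float:
    """Closed-form solution of dy/dt = A*y*(B - y) with y(0) = y0, 0 < y0 < B."""
    C = (B - y0) / y0
    return B / (1.0 + C * math.exp(-A * B * t))


def discovery_rate(y: float, A: float, B: float) -> float:
    """Right-hand side of Equation 2.1."""
    return A * y * (B - y)


# Hypothetical parameters for illustration only.
A, B, y0 = 0.002, 100.0, 1.0

# The rate A*y*(B - y) peaks when y = B/2, which occurs at t_mid = ln(C) / (A*B).
t_mid = math.log((B - y0) / y0) / (A * B)

for t in (0.0, t_mid / 2, t_mid, 1.5 * t_mid, 2 * t_mid):
    y = cumulative_vulnerabilities(t, A, B, y0)
    print(f"t={t:6.2f}  y={y:6.2f}  dy/dt={discovery_rate(y, A, B):.3f}")
```

The factor y captures the momentum term and the factor (B − y) captures saturation, so the product is small at both ends of the product lifetime and largest in the middle.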
Fitting their data to the model, the authors applied a chi-squared goodness of fit test to determine whether this model applied. Figure 2.22 and Figure 2.23 show the results of this fit and their corresponding P-values for Windows NT 4.0, and Figure 2.24 and Figure 2.25 show the same for Red Hat Linux 7.1.
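As a rough illustration of what such a test involves, the sketch below computes the Pearson chi-squared statistic between observed counts and the counts a fitted model predicts, then compares it with a tabulated critical value. It is not the authors' procedure: the helper names, the data, and the choice of degrees of freedom are all assumptions made for the example.

```python
def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum over bins of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fit_not_rejected(observed, expected, critical_value):
    """True when the statistic stays below the chosen critical value."""
    return chi_squared_statistic(observed, expected) < critical_value


# Hypothetical cumulative counts per period, NOT data from [CA05].
observed = [2, 5, 12, 25, 41, 52]
expected = [2.5, 5.5, 11.0, 24.0, 42.0, 53.0]

# 11.070 is the standard 5% critical value for 5 degrees of freedom. The
# appropriate degrees of freedom depend on the number of bins and fitted
# parameters, so this choice is an assumption of the example.
CRITICAL_5PCT_DF5 = 11.070

stat = chi_squared_statistic(observed, expected)
print(f"chi-squared = {stat:.3f}, fit not rejected: "
      f"{fit_not_rejected(observed, expected, CRITICAL_5PCT_DF5)}")
```

In a goodness of fit test the null hypothesis is that the model describes the data, so a statistic below the critical value (equivalently, a P-value above the significance level) means the fit is not rejected.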
For most of the operating systems tested, the fit does appear to be statistically significant, and the authors concluded that, like defect densities, vulnerability densities seem to fall within a range, and that this range appears to support the 1%-5% values proposed by Anderson and McGraw. They claimed that vulnerability density is a
“significant and useful metric”, and further surmised that the ratio of VKD to VDD
Figure 2.22: Chi-squared test of the Alhazmi vulnerability discovery equation for Windows NT. See [CA05]
Figure 2.23: Results of the Alhazmi model fit tests for vulnerability discovery equation for Windows NT. See [CA05]
shared code in a newer operating system can impact the VDR of a previous version
Figure 2.24: Chi-squared test of the Alhazmi vulnerability discovery equation for RedHat Linux. See [CA05]
Figure 2.25: Results of the Alhazmi model fit tests for vulnerability discovery equation for Redhat Linux. See [CA05]
2.3.3.4 Discussion
In this paper, the authors proposed a 3-phase ’S’ curve model to describe vulnerability discovery over the lifetime of a software product. Later work has confirmed that the ’S’ curve does appear to describe the vulnerability lifecycle. However, the authors stated that the discovery rate was governed by a finite number of vulnerabilities and their market value (which they referred to as ’market share’), and used the ratio of vulnerabilities to defects to support their assumption. While research presented in Chapter 3 suggests that market share does appear to play a role in the number of vulnerabilities discovered, especially in phase 2, my research also suggests that the long slow rise (here described as phase 1) followed by the steep linear rise (phase 2) is more likely the result of the attackers’ learning curve, regardless of the ratio of vulnerabilities to software defects or the number of remaining undiscovered vulnerabilities [CFBS10, CCBS14].
Alhazmi, et al., go on to explain that the shape of their model resulted from “the variability of the effort that goes into discovering vulnerabilities”. They believed that the rise in phase 2 indicated a strong increase in effort devoted to finding vulnerabilities because this period was the one in which discovering vulnerabilities would be “the most rewarding”. However, their only justification was that this was the period where the operating system reaches its peak of popularity. In spite of observing that legacy code carried over to a later version resulted in shared vulnerabilities between versions, and that some vulnerabilities found in the later version actually affected the earlier version, they concluded that the increase resulted from increased effort.
Although the authors mentioned the attacker learning process, code involved in “external access” and the effects of shared code when discussing their results, they did not consider these to be important factors affecting the vulnerability discovery process they were attempting to model. Instead, the authors considered the size of the installed base and the time to saturation to be the most important drivers in the vulnerability lifecycle. Since time to saturation is related to the vulnerability density and the vulnerability-to-defect ratio, they even claimed that measuring vulnerability density “allows us to measure the quality of programming in terms of how secure the code is.” Research has shown that while vulnerability density may help determine whether software quality is improving, it can say nothing about the security of the code [Gaf14, Bea16].