Linux Kernel. Security Report

(1)

L i n u x R e p o r t Executive Summary

Four years of analysis of the source code for the Linux operating system have revealed that the system has become more secure by reducing the number of dangerous security defects detected in the most critical parts of the Linux kernel. Coverity has combined two years of analysis work carried out in a commercial setting at Coverity with four years of academic research

conducted at the Computer Science Laboratory at Stanford University using an early prototype of Coverity's technology. Our study had led us to several conclusions:

Security Report

Linux

Kernel

Coverity has combined two years of analysis work carried out in a commercial setting at Coverity with four years of academic research conducted at the Computer Science Laboratory at Stanford

University. Linux has proven to be a stable, secure platform, which is only improving over time.

Major development efforts increase defect density while subsequent small revisions tend to fix a large number of bugs.

The number of security defects tends to increase with code size, yet the percentage located in security-critical areas in the code continues to decrease.

The code-heavy drivers account for 50-65% of security defects, while the core kernel accounts for between 1-2% and the file systems account for 9-13%.

Linux's open source development process aids in the continued security and stability of the system as a whole.

As Linux grows and more developers have the chance to contribute to the source code, basic secure coding rules are not followed consistently in certain areas of the code causing severe security defects to escape into released versions of the kernel.

Linux's development methodology would benefit from a centralized security auditing process. Adding more contributors to the Linux kernel who do not understand security rules will decrease the overall security of the system without a centralized auditing effort to ensure that secure coding rules are followed consistently.

September 2005

Authors: Andy Chou, Bryan Fulton and Seth Hallem

(2)

The Linux Security Study

In this report we illustrate the results of a series of security-specific analyses of the Linux kernel. This is an update to our December, 2004 report regarding the overall quality of the 2.6.9 kernel, which analyzed over 5.5 million lines of Linux kernel source code. This report examines the evolving state of security in the Linux kernel over the course of 4 years worth of kernel releases. The main improvements to our methodology in this study are:

Coverity's team regularly reports detected security flaws and other critical bugs to the Linux kernel implementers. Currently, we are working to make all the potential defects available to the maintainers of the major

components of the kernel. In general, these bug reports are quickly converted into patches by the responsible implementers that remedy the defective source code. The patch is then released to the Linux kernel mailing list for vetting by a broader audience, which helps ensure that each fix is, in fact, a complete fix, and that each fix does not introduce another problem. Patches are generally applied to the latest release candidate of the kernel by the maintainer for that candidate. Patches are often released within a matter of days of the release of the bug reports.

In some circumstances, a security advisory is released that contains numerous security-specific fixes and users are recommended to upgrade to the newest version immediately rather than wait for the next release candidate.

The use of advanced security-specific analyses.

The use of a sophisticated "tainted data" dataflow analysis engine that specifically tracks the propagation of user-controllable tainted data.

Analysis improvements that understand the implications and security concerns of "user land" and "kernel land" memory mapping.

Coverity's Security Analysis

Coverity's security analysis finds numerous coding defects that can lead to dangerous security vulnerabilities. These defects include improper

stack/heap access, integer overflow, buffer overflow, file-based race conditions, platform specific bugs, insecure coding practices, and exploitable string management errors. These errors can lead to serious issues such as memory corruption, privilege escalation, unauthorized reading of memory or files, process/system crashes, denial of service, etc.

Modern applications and systems must constantly interact with the untrustworthy outside world while still maintaining a high level of reliability and availability in the face of malicious users or processes.

Given this operating environment, a key rule in computer security is to properly validate all inputs to a program. Data obtained from the outside world must be treated as potentially dangerous until the proper steps have been taken to sanitize its contents and ensure its validity. Untrustworthy data can enter a system through a variety of means such as command-line arguments, files, environment variables, over the network, copied from user-land memory, etc. Coverity's security analysis uses a combination of Coverity Prevent's analysis engine and sophisticated tracking of this "tainted"

data to pinpoint security holes. Using Coverity's interprocedural analysis engine, we are able to track data as it flows throughout a program, searching for potential improper uses of this data.

Coverity's team regularly reports

detected security flaws and other critical

bugs to the Linux kernel implementers.

Currently, we are

working to make

all the detected bugs

available to the

maintainers of the

major components

of the kernel.

(3)

For security analyses of operating system code such as the Linux kernel, we have augmented our analysis to understand the security implications of “kernel land” and “user land” memory mapping.

When a kernel needs access to user data, they will typically use a “paranoid” copy routine such as copy_from_user() to safely copy data stored in user-controllable memory into the kernel’s memory space.

Similar interactions occur at system calls, ioctl handling code, etc. This data, from the perspective of the kernel, is completely untrustworthy as a malicious user can supply any range of possible data values.

Our analysis understands this notion and the security rules associated with user/kernel interactions.

Type

Tainted Scalar Buffer overflow, integer overflow,

denial of service, or memory corruption.

Tainted String Unsafe usage of tainted strings Unsafe resource read/write, environment corruption, cross-site scripting, file corruption, format string vulnerabilities,

command injection, or SQL injection.

Description Effect

Unsafe usage of tainted scalars

User Pointer Unsafe dereference of user-land

pointers Denial of service or memory

corruption.

String Null Denial of service or memory

corruption.

Unsafe use of tainted, potentially non- terminated strings

String Size Buffer overflow or possible

execution of arbitrary code.

Unsafe use of tainted strings with a potentially unbounded size.

Overrun Static Buffer overrun on the stack

Overrun Dynamic Buffer overrun on the heap

Stack corruption, malicious code injection, or denial of service.

Heap corruption, malicious code injection, or denial of service.

In this study, our analysis focused on automatically detecting seven different categories of security

defects. The first five (Tainted Scalar, Tainted String, User Pointer, String Null and String Size)

focus on potential misuses of different types of tainted data. The remaining two analyses (Overrun

Static and Overrun Dynamic) detect "traditional" buffer overruns of stack or heap allocated data.

(4)

We ran our security analysis over 6 different versions of the Linux kernel, ginning with the 2.2.16 release from July, 2000 and ending with the 2.6.12.4 released in August, 2005. For each version, we have recorded statistics such as the number of defects discovered, the location where the bugs were discovered, and the number of lines of code. We also recorded metrics about the code base, such as the number of files, the number of functions, the number of paths analyzed, and the number of potential violations of the Tainted Scalar rule described above. We chose this last metric in order to correlate the number of security violations we detected to the number of potential

instances where this particular security rule could, hypothetically, be violated (the developer success rate for this rule).

0 1000000 2000000 3000000 4000000 5000000 6000000

Lines of Code

2.2.16 2.4.5 2.4.19 2.4.21 2.6.7 2.6.12.4

Kernel Version

Code Growth Across 4 Years of Linux Development

Other Arch Drivers Networking Filesystem Kernel

2.2.16

2.4.5

2.4.19

2.4.21

2.6.7

2.6.12.4

Files Functions 0

10000 20000 30000 40000 50000 60000 70000 80000

Code Growth Over Time

Files Functions

As shown in these charts, the Linux kernel is a constantly evolving and expanding piece of software.

As thousands of developers work together to incorporate new features and fixes, the difficulty in maintaining high levels of quality and security increases with each new release.

Architecture

(5)

Results and Key Conclusions

As new features and additional drivers are constantly being developed, the amount of code continues to grow with each new release. From the data, we can see that the greatest spikes in defect numbers are associated with major development efforts (i.e., 2.2 to 2.4, 2.4 to 2.6).

These are places where large amounts of new code are added. Large volumes of new code cause a predictable increase in the raw number of bugs discovered throughout a system. In contrast to these inflection points across major releases, we can also see that minor releases tend to incorporate many bug fixes and add fewer new bugs as most development efforts are spent improving and debugging the previously introduced, new features.

Since the 2.4.19 release, the security defect density has steadily declined for each subsequent release, continually strengthening the overall security of the system. Highlighting this trend, the 2.4.21 kernel is the first release where both the total number of security defects discovered and the security defect density have gone down from the previous release. From this release onward, both metrics have continually decreased over time. This trend illustrates that while new, non- security related defects are still being added, developers are learning to take security concerns into account as security defects are becoming less and less common even in the face of continued code churn and code growth.

From this study, we can conclude that the Linux kernel is a stable, secure operating system and continues to improve over time. We also stress that the Linux development process helps enable the security and stability of such a large project. Here, we outline and discuss the findings and major conclusions reached by our study.

0 1,000,000 2,000,000 3,000,000 4,000,000 5,000,000 6,000,000 7,000,000

2.2.16 2.4.5 2.4.19 2.4.21 2.6.7 2.6.12.4

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 Code count Security Density

0 1,000,000 2,000,000 3,000,000 4,000,000 5,000,000 6,000,000 7,000,000

2.2.16 2.4.5 2.4.19 2.4.21 2.6.7 2.6.12.4

0.225 0.23 0.235 0.24 0.245 0.25 0.255 0.26 0.265 0.27 0.275 Code count Total Density

1 - More code implies more bugs, yet security defects are decreasing

2.2.16 2.4.5

2.4.19

2.4.21 2.6.7

2.6.12.4

Total Security Non Security 0

200 400 600 800 1000 1200

Kernel Version

Security Vs. Non-Security Bugs Over Time

Total Security Non Security

(6)

Another interesting result of our study is that the location of the majority of both security and non-security bugs continues to move from high priority areas such as the kernel, filesystem, and networking code into the lesser used drivers. Any driver defect is only a serious vulnerability if that specific driver is being used in a given system configuration. Many of the drivers are only used by enthusiasts deploying Linux on home desktops or laptops, so the large, enterprise distributions of Linux are not affected by these defects. This migration trend is most likely due to the increased number of kernel developers as Linux becomes more popular. New developers tend to start by writing driver code to support a new device. Developers new to the Linux kernel often do not fully understand the security implications and rules associated with Linux kernel development in their first foray into the kernel community.

2 - Defects are low where it matters

0 200 400 600 800 1000 1200 1400

Defect Count

2.2.16 2.4.5 2.4.19 2.4.21 2.6.7 2.6.12.4

Kernel Version

Defect Distribution Over Time

Other Arch Drivers Networking Filesystem Kernel Architecture

(7)

The results of this study indicate that the Linux development process is a key factor in the security and stability of the system. While developers vary both in skill and in security expertise, new code must pass through a series of steps before it is incorporated into a kernel release. Thus, new, less experienced developers will have their code routinely checked and inspected before it will be signed-off by the subsystem maintainer and incorporated into the kernel. Ultimately, the code will go through the ranks and be inspected by one of the two lead developers, Andrew Morton and Linus Torvalds. This hierarchical development process allows highly skilled, specialized developers to inspect new code from more inexperienced contributors before that code ever gets to the next stage. Those developers that understand both simple and complex security rules are able to weed out potentially devastating defects early in the

development process.

The following image outlines the Linux kernel development process:

While this process has been shown to be effective and helpful to the overall security of the system, it is not perfect. Many defects discovered by our analysis are very common and simple mistakes. Thus, while having many eyeballs inspect each code change can be a great asset to overall system security, it does not guarantee that even the simplest of defects will be caught before a release. Many new developers do not understand the kernel's security rules, or, perhaps, feel that they can put off security concerns to the subsystem maintainer as he/she is more of a security expert. As such, we can see an argument for a centralized security auditing process to aid in the early detection of security vulnerabilities as well as the handling of new security reports.

3 - Linux development process is good for security, but not perfect

(8)

The Evolution of Source Code Analysis Technology

Compile time source code analysis for security has long remained more of a promise than a reality.

Coverity has developed the first system that can identify defects with a combination of precision, speed, and accuracy across millions of lines of code that satisfies the most demanding industry customers as well as the fast paced open source community. Early attempts at static source code analysis focused on software verification. While these efforts yielded tools, the domains of those tools were inevitably limited to either restrictive, custom programming languages, or very small programs that do not reflect the complexity of today's commercial software.

Coverity Prevent represents a breakthrough in the source code analysis community by applying sophisticated compiler optimization techniques combined with an innovative system architecture to produce the most effective analysis system every built. Coverity Prevent is designed as a generic analysis platform with modular plug-ins that carry out specific simulations of the source code and report specific types of defects. Because of this module design, each analysis is finely tuned to the type of bug that is detects, allowing Coverity's system to achieve a false positive rate that is the overall lowest amongst source code analysis systems.

About Coverity and Linux

Coverity was founded by Stanford professor Dr. Dawson Engler and four Ph.D. students in the Computer Science Laboratory at Stanford University. By combining a strong academic foundation with fast growing market traction experience, Coverity is able to leverage a combination of

innovative ideas and enterprise experience to build a source code analysis technology that is unparalleled in either research or industry.

Coverity's experience with the Linux source code began when the founding team was working at Stanford on a research project then known as the Meta Compiler Project. The project was focused on proving that compiler technologies could be successfully applied to find defects in robust software systems. The first published work on the Meta Compiler system was described to the research community in late 2000. In that publication, 511 bugs were reported, while the research uncovered over 2000 defects in the Linux kernel source code using the primitive analysis technology in our first experimental implementation of the analysis platform.

In 2001, a follow up presentation to the research community used the second iteration of the analysis platform to present a study of defect trends in the Linux operating system. In this study, over 1000 defects were detected in the Linux kernel source code from version 2.4.1. At that time, the source code contained 1.6 million lines of code.

With such solid results, a follow-on study was conducted to detect security flaws in the Linux kernel due to unsafe usage of user-controllable tainted data. The study illustrated a basic technique for detecting defects that, if left in the source code, could possibly cause severe security

vulnerabilities and lead to complete system takeover if discovered and abused by a malicious party.

The analysis uncovered 125 security flaws in the 2.4.12 kernel as well as substantial security flaws in an un-named commercial security system.

As Coverity's analysis technology matured, repeated analyses of the Linux kernel were performed as a means for validating the progress of Coverity's technology. This work was punctuated by

numerous releases of security holes to the Linux kernel implementers dating back to 2004, and July 2004 posting of the http://linuxbugs.coverity.com site that presented approximately 900 bugs in the Linux 2.6.4 kernel. At that time, the Linux kernel contained over 5 million lines of source code. These bugs were released publicly to the Linux kernel implementers and many were fixed, although the exact number of subsequent fixed is still yet to be assessed. Several patches to the Linux kernel were applied in response to that release.

http://linuxbugs.coverity.com Prevent in action:

http://www.coverity.com

Coverity, Inc.

185 Berry St. Suite 3600 San Francisco, CA 94107 (800) 873-8193 [email protected]