6.2 A Case Study
7.1.2 Static Analysis to Find Bugs
The literature concerning static analysis techniques to find bugs is very extensive.
This section only attempts to summarize a few of the most relevant works.
Lint [45], the ancestor of all program checkers, was designed to find common
C programming errors. Due to the limited CPU and memory resources available
at the time of its implementation, the analysis performed by lint is simplistic. For
example, lint checks for uses of uninitialized local variables by seeing if they are
assigned earlier in the source file than their first use, rather than using a more
sophisticated analysis (such as dataflow analysis). Due to the imprecision of lint’s
analysis, special annotation comments may be added to the program’s source code to
suppress false warnings. Many checks performed by lint are now built into modern
analysis tools. However, because of the much greater computing power available
today, it uses more powerful analysis techniques, such as dataflow analysis. Because
FindBugs analyzes Java rather than C, it focuses less on low-level bugs, such as
memory allocation and dereferencing errors, and more on higher level concerns,
such as API use.
LCLint [26] is a static analysis tool to detect errors in C programs. For pro-
grams with no specifications, LCLint works much like the original lint tool, warning
about common C programming mistakes. For programs with specifications, LCLint
checks the code for consistency with the specifications. Examples of properties
checked by LCLint include incorrect use of boolean values, abstraction violations
(such as incorrect use of a “private” element of a struct), and violations of temporal
properties (such as using a field before it is initialized). Subsequent work extended
LCLint to check for memory errors [25], such as failure to free allocated memory and
use of uninitialized memory, and security exploits [27], such as buffer overruns and
format string vulnerabilities. The analysis performed by LCLint is neither sound nor
complete. However, explicit specifications help to make the analysis more precise in
cases where properties cannot be inferred by analysis alone. The null pointer anno-
tations in FindBugs (described in Section 3.3.10) were inspired by LCLint, and our
study of bugs in student programming projects confirm the value of specifications
in finding bugs.
Warlock [65] is a lint-like static analysis tool to find race conditions in C pro-
grams. Each source file of a program is analyzed to determine which mutex locks
(or set of locks) are flagged. The analysis is inter-procedural; functions with no
callers are assumed to be call graph roots. Imprecision in the analysis is mitigated
through the use of annotations, which can be used to describe the locks intended to
protect particular variables, which sections of code are multithreaded, which func-
tions might be called through a function pointer, etc. The detector for inconsistent
synchronization in FindBugs uses a somewhat similar approach; however, it relies
on the common Java idiom of locking an object in order to protect its fields, and
does not handle the more general case where any object might be used to protect a
particular memory location.
PREfix [9] applies static analysis to find common dynamic errors in C and
C++ programs. Examples of errors detected by PREfix are invalid pointer accesses,
leaked memory, use of freed memory, and use of an invalid resource (such as a file or
window handle). The analysis performed by PREfix works by symbolically executing
a large number of paths through the program, tracking the state of variables and
memory. PREfix works inter-procedurally, starting at the call graph leaves and
working towards the roots. It analyzes functions by evaluating a large number
of paths through their control flow graphs. To a significant degree, the values of
expressions are tracked using predicates, to avoid analyzing infeasible paths. As
functions are analyzed, models are built for later use when calls to those functions are
encountered. Constraints are added to functions to detect situations such as passing
a null pointer to a function which dereferences it unconditionally. PREfix was able to
find significant numbers of errors in large software systems with a low (25% or less)
was required to present detected errors to the user in a meaningful way. PREfix is
a good example of a “deep” analysis to find bugs. In our work on FindBugs, we
have focused largely on bugs which can be found using local analysis. One of the
surprising conclusions of our work is that a significant number of serious bugs can
be found without deep analysis.
MC [20, 36] (Meta-Compilation) is an inter-procedural static analysis system
for analyzing C and C++ programs. The novel aspect of MC is the use of a language,
called Metal, to describe the properties checked by the analysis. Metal is based on
a syntactic pattern matching technique similar to ASTLOG. Syntactic patterns are
used to describe functions and data objects of interest to the analysis. The MC sys-
tem uses state machines to describe temporal correctness properties. For example,
a lock object can be in unlocked or locked states. The analysis creates new state
machine instances as interesting objects are encountered. Syntactic patterns specify
when state transitions should occur; for example, calling a lock function changes
the state of the lock. Some state transitions represent bugs, such as locking a non-
recursive lock object already in the locked state. The analysis also has the notion
of global state to describe global properties such as whether or not interrupts are
enabled. The analysis is inter-procedural, starting at the call graph roots and work-
ing towards the leaves. Aggressive caching and memoization of function effects are
used to make the analysis more efficient, at the expense of accuracy and correctness.
Within functions, the analysis is path sensitive, using knowledge of variable states
to avoid analyzing false paths. MC is similar to PREfix in many respects. Its main
properties. Our work on FindBugs focuses on a somewhat different class of bugs,
although there are areas of overlap.
RacerX [21] is a static analysis system to find deadlocks and data races in mul-
tithreaded software. The techniques employed by RacerX are very similar to those
used in Warlock; lock sets are tracked at all program points. Variables not accessed
with a consistent lock set are potential data races. In order for the analyzed system
to be free of deadlocks, an acquisition of a lock l implies an ordering constraint that
all locks in the current lock set must be acquired before l. Based on all observed
constraints, RacerX computes the transitive closure and emits potential deadlock
warnings for any detected cycles. The analysis used is very similar to MC; however,
variables are not tracked to eliminate false paths. To reduce the impact of false
positives due to imprecision in the analysis, the authors employ a wide variety of
heuristics. One novel technique is unlockset analysis, which computes locks known
not to be held using a backwards analysis. Unlocksets are used to increase confi-
dence in the accuracy of locksets; when the unlockset disagrees with the lockset, the
most likely explanation is that an analysis error wrongly concluded that a lock was
held. RacerX also uses heuristics to rank the warnings produced by detected data
races. For example, unsynchronized variable accesses or unlocked function calls are
ranked higher if they are the first or last statement in a critical section, since those
accesses are most likely to be intentionally protected by the lock, and not merely
coincidentally locked. The authors used RacerX to find a number of deadlocks and
data races in Linux, FreeBSD, and a commercial OS kernel. FindBugs also has de-
cases. For example, FindBugs only detects data races related to the common idiom
of locking an object to protect access to its fields. Deadlock detection in FindBugs
is limited to identifying locations where a wait is performed with two locks held.
What is interesting in FindBugs is that simple analysis techniques can reveal serious
bugs.