How I Learned to Stop Fuzzing
and Find More Bugs
Jacob West Fortify Software
August 3-5, 2007 Las Vegas
Agenda
Introduction to fuzzing
What is fuzzing?
Challenges with fuzzing
Introduction to static analysis
How static analysis works
Examples of bugs static analysis is good at finding
Untapped potential: Customization
Experiment
Fuzzing versus static analysis
Conclusion
What is Fuzzing?
Encompasses runtime testing that attempts to induce faults in software systems by inputting random or semi-random values
Introduced by Barton Miller at the University of Wisconsin, Madison in 1990
How Fuzzing Works
Identify sources of input to a program
Permute or generate pseudorandom input
Monitor the program for failures
Record the input and program state combinations that generate faults
Repeat for desired duration
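The loop above can be sketched in a few lines of C. This is a minimal illustration, not any particular tool's implementation: demo_target, next_random, fuzz and MAX_INPUT are hypothetical names, the target signals a fault through its return value instead of crashing, and the "fault" is planted on purpose. Identifying the input sources is left outside the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INPUT 256

/* Hypothetical target signature: returns nonzero when it detects a fault.
   A real harness would watch for crashes or hangs instead. */
typedef int (*target_fn)(const unsigned char *data, size_t len);

/* Stand-in target with a planted fault: any first byte >= 0x80. */
int demo_target(const unsigned char *data, size_t len) {
    return len > 0 && data[0] >= 0x80;
}

/* Small xorshift PRNG so that a given seed always replays the same inputs. */
uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Runs `iterations` mutated copies of seed_input through the target.
   Returns the number of faulting inputs and copies the first one into
   first_fault (if not NULL) so it can be replayed later. */
int fuzz(target_fn target, const unsigned char *seed_input, size_t len,
         int iterations, uint32_t rng_seed, unsigned char *first_fault) {
    assert(len > 0 && len <= MAX_INPUT && rng_seed != 0);
    unsigned char input[MAX_INPUT];
    uint32_t rng = rng_seed;
    int faults = 0;
    for (int i = 0; i < iterations; i++) {
        memcpy(input, seed_input, len);
        /* Permute: overwrite two random positions with random bytes. */
        for (int k = 0; k < 2; k++) {
            size_t pos = next_random(&rng) % len;
            input[pos] = (unsigned char)(next_random(&rng) >> 24);
        }
        /* Monitor: run the target and check for a fault. */
        if (target(input, len) != 0) {
            /* Record: keep the first faulting input for replay. */
            if (faults == 0 && first_fault != NULL) {
                memcpy(first_fault, input, len);
            }
            faults++;
        }
    }
    return faults;
}
```

Calling fuzz(demo_target, seed, len, 1000, 42, saved) runs 1000 iterations; reusing the same PRNG seed regenerates the same inputs, which is what makes a recorded fault reproducible.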
Input Sources: File Formats
Identify all valid file formats (e.g. JPG, TIFF, PDF, DOC, XLS)
Collect a library of valid files
Malform a file
Input Sources: Protocols
Create bogus messages (e.g. TCP/IP, RPC, SOAP, HTTP)
Record-fuzz-replay
Enable a sniffer
Collect a few thousand messages
Fuzz the messages
Replay the fuzzed messages
Intelligence
Dumb fuzzing = modify data randomly
Most input will be entirely invalid
Can make for good test cases
Takes a long time to enumerate valid test cases
May test the validation logic of high-level protocols instead of the underlying application
Smart fuzzing = aware of data structure
Altering content size
Replacing null-terminated strings
Altering numeric values or flipping signs
0, 2^n +/- 1
Adding invalid headers, altering header values, duplicate headers
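A structure-aware mutation such as "altering content size" with the boundary values 0 and 2^n +/- 1 might look like the sketch below. The function names and the message layout (a big-endian 16-bit size field at a known offset) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills out[] with 0, followed by 2^n - 1 and 2^n + 1 for each bit width n
   in widths[]. Returns the number of values written (at most cap). */
size_t boundary_values(const unsigned *widths, size_t count,
                       uint64_t *out, size_t cap) {
    size_t written = 0;
    if (written < cap) {
        out[written++] = 0;
    }
    for (size_t i = 0; i < count && written + 2 <= cap; i++) {
        assert(widths[i] >= 1 && widths[i] <= 32);
        uint64_t power = (uint64_t)1 << widths[i];
        out[written++] = power - 1;
        out[written++] = power + 1;
    }
    return written;
}

/* Overwrites a big-endian 16-bit size field at offset off (hypothetical
   layout) and leaves the rest of the message intact. Values wider than
   16 bits are truncated to their low 16 bits. */
void set_size_field(unsigned char *msg, size_t len, size_t off,
                    uint64_t value) {
    assert(msg != NULL && off + 2 <= len);
    msg[off] = (unsigned char)((value >> 8) & 0xFF);
    msg[off + 1] = (unsigned char)(value & 0xFF);
}
```

Because only the size field changes, the rest of the message stays well formed and is more likely to get past early validation than a randomly corrupted one.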
Challenges: Nebulous File Formats / Protocols
No problem for a standard Web application
What about:
Proprietary Web Services interfaces
Network servers
Thick client software
Difficult to enumerate input sources to fuzz
Even harder to generate valid input
Challenges: Program Semantics / Reachability
Example:
if (!strcmp(input1, "static_string")) {
  strcpy(buffer2, input2);
}
Need to provide a value of input1 equal to "static_string" and a large value of input2
Requires N*M random inputs to reach a bug guarded by two-variable conditions
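To put a rough number on the N*M estimate, assume (this is an illustration, not a figure from the talk) that a dumb fuzzer draws each byte of input1 uniformly at random, that N and M are the expected numbers of random tries needed to satisfy each condition on its own, and that the string terminator is ignored. Matching the 13-character guard string alone then gives:

```latex
N = 256^{13} = 2^{104} \approx 2.0 \times 10^{31},
\qquad
\text{expected inputs to reach the bug} \approx N \cdot M
```

Even with a small M, a purely random fuzzer is unlikely ever to get past the strcmp() guard.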
Challenges: Identifying Errors
Error reporting conventions differ between programs
Good design guidelines require programs to mask errors and error details
Challenges: Completeness / Coverage
Microsoft SDL mandates that you run 100,000 iterations per file format/parser. If you find a bug, you reset to 0 and start running another 100,000 with a new random seed. Why?
How many input sources were missed?
How much of the program was tested?
How good were the tests?
Advantages
Verifiable and reproducible at runtime
Scales across programs that use the same protocol
Least effort to find a bug; completeness is impossible to ensure and very costly to approach
Tools
Open Source / Free
SPIKE
Scratch
Peach
Commercial
Cenzic
iDefense
SPI Dynamics
(http://www.hacksafe.com.au/blog/2006/08/21/fuzz-testing-tools-and-techniques/)
Static Source Code Analysis
Benefits
1000x faster than code review
Security knowledge built in
Consistent
Limitations
Does not understand architecture
Does not understand application semantics
Does not understand social context
The Many Faces of Static Analysis
Type checking
Style checking
Program understanding
Program verification / Property checking
Bug finding
Type Checking
Taken for granted
Imperfect:
short s = 0;
int i = s;   /* the type checker allows this */
short r = i; /* false positive: this will cause a
                type checking error at compile time. */

/* false negative:
   passes type checking, fails at runtime */
Object[] objs = new String[1];
objs[0] = new Object(); /* throws ArrayStoreException */
Style Checking
Pickier than the type checker; might look at:
Whitespace
Naming
Deprecated functions
gcc -Wall does some style checking
typedef enum { red, green, blue } Color;
char* getColorString(Color c) {
  char* ret = NULL;
  switch (c) {
    case red:
      printf("red");
  }
  return ret;
}
Tools: Lint, PMD
Program Understanding
Help make sense of a large codebase
Tools:
Fujaba
Klocwork
Program Verification / Property Checking
Prove that a program has particular properties
Partial specification -> property checking
Often focuses on temporal safety properties
Example: allocated memory must be freed
inBuf = (char*) malloc(bufSz);
if (inBuf == NULL)
  return -1;
outBuf = (char*) malloc(bufSz);
if (outBuf == NULL)
  return -1; /* memory leak */
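For contrast, a version that satisfies the "allocated memory must be freed" property on the failure path could look like this. The function name alloc_buffers and its out-parameter interface are assumptions added for the sketch; the slide shows only the leaking fragment.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates both buffers or neither. Returns 0 on success, -1 on failure. */
int alloc_buffers(size_t bufSz, char **inBuf, char **outBuf) {
    assert(inBuf != NULL && outBuf != NULL);
    *inBuf = malloc(bufSz);
    if (*inBuf == NULL) {
        return -1;
    }
    *outBuf = malloc(bufSz);
    if (*outBuf == NULL) {
        free(*inBuf);   /* release the first buffer before bailing out */
        *inBuf = NULL;
        return -1;
    }
    return 0;
}
```

On every path out of the function, each successful malloc() is either handed to the caller or matched by a free(), which is the temporal property a checker would try to prove.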
Soundness
Aspires to “Sound WRT the specification”: reports all bugs
Tools:
Bug Finding
More sophisticated than a style checker
Less ambitious than program verification
Search code for “bug idioms”
Find high-confidence, low noise results (low false positives)
Soundness
Aspires to “Sound WRT counterexample”: never reports a bug that isn’t a bug
Example: double checked locking
if (fitz == null) {
  synchronized (this) {
    if (fitz == null) {
      fitz = new Fitzer();
    }
  }
}
Security Review
Focus on finding exploitable code
Find high-risk code constructs for review (low false negatives)
Example
int main(int argc, char* argv[]) {
char buf1[1024];
char buf2[1024];
char* shortString = "a short string";
strcpy(buf1, shortString); /* innocuous use of strcpy */
strcpy(buf2, argv[0]); /* dangerous use of strcpy */
...
Trace potentially tainted data through the program
Report locations where an attacker could take advantage of a vulnerable function or construct
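One common way to remove the dangerous strcpy() flagged above is a copy that is bounded by the destination size. The helper below is a sketch under that assumption; copy_bounded is a hypothetical name, not something from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copies src into dst, which holds cap bytes. Always NUL-terminates.
   Returns 1 if the whole string fit, 0 if it was truncated. */
int copy_bounded(char *dst, size_t cap, const char *src) {
    assert(dst != NULL && src != NULL && cap > 0);
    int needed = snprintf(dst, cap, "%s", src);
    return needed >= 0 && (size_t)needed < cap;
}
```

A call such as copy_bounded(buf2, sizeof buf2, argv[0]) cannot write past buf2, and the return value tells the caller when attacker-controlled input was too long.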
Security Example: Dataflow Analysis
buff = getInputFromNetwork();
copyBuffer(newBuff, buff);
exec(newBuff); (command injection vulnerability)
A Peek Inside a Static Analysis Tool
[Diagram components: src, Front End, System Model, Analyzers, Modeling Rules, Security Properties, Results Viewer]
Parsing
Language support
One language/parser is straightforward
Lots of combinations is harder
Could analyze compiled code…
Everybody has the binary
No need to guess how the compiler works
No need for rules
…but
Decompilation can be difficult
Loss of context hurts
Analysis / Rules: Structural
Identify bugs in the program's structure
Example: calls to gets()
Structural rule:
Analysis / Rules: Structural
Identify bugs in the program's structure
Example: memory leaks caused by realloc()
buf = realloc(buf, 256);
Structural rule:
FunctionCall c1: (
  c1.function is [name == "realloc"] and
  c1 in [AssignmentStatement: rhs is c1 and
         lhs == c1.arguments[0]]
)
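The rule flags this idiom because realloc() returns NULL on failure and leaves the original block allocated; assigning the result straight back to buf loses the only pointer to it. A version the rule would not match keeps the result in a temporary first. The name grow_buffer is an assumption for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Grows *buf to new_size bytes. On failure returns -1 and leaves *buf
   pointing at the original, still-valid block so the caller can free it. */
int grow_buffer(char **buf, size_t new_size) {
    assert(buf != NULL && new_size > 0);
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL) {
        return -1;
    }
    *buf = tmp;
    return 0;
}
```

Here the left-hand side of the assignment (tmp) differs from the first argument (*buf), so the structural pattern no longer applies.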
Analysis / Rules: Dataflow Source Rule
Following interesting values through the program
Example: Command injection vulnerability
Source rule:
Function: getInputFromNetwork()
Postcondition: return value is tainted
Analysis / Rules: Dataflow Pass-Through Rule
Following interesting values through the program
Example: Command injection vulnerability
Pass-through rule:
Function: copyBuffer()
Postcondition: if the second argument is tainted, then the first argument becomes tainted
Analysis / Rules: Dataflow Sink Rule
Following interesting values through the program
Example: Command injection vulnerability
Sink rule:
Function: exec()
Precondition: the first argument must not be tainted
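The three rule kinds (source, pass-through, sink) can be combined in a toy taint tracker. This is a deliberately simplified sketch over a straight-line list of statements; the Stmt encoding and every name in it are assumptions made for illustration and say nothing about how a real analyzer represents programs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { BUFF, NEWBUFF, NUM_VARS };   /* variables in the example */

typedef enum { OP_SOURCE, OP_PASS, OP_SINK } OpKind;

typedef struct {
    OpKind kind;
    int dst;   /* variable written (source, pass-through) or checked (sink) */
    int src;   /* variable read (pass-through only) */
} Stmt;

/* Applies the rules in program order. Returns the index of the first sink
   reached with a tainted argument, or -1 if no rule is violated. */
int find_tainted_sink(const Stmt *prog, size_t n) {
    bool tainted[NUM_VARS] = { false };
    for (size_t i = 0; i < n; i++) {
        const Stmt *s = &prog[i];
        assert(s->dst >= 0 && s->dst < NUM_VARS);
        switch (s->kind) {
        case OP_SOURCE:            /* postcondition: result is tainted */
            tainted[s->dst] = true;
            break;
        case OP_PASS:              /* tainted source taints the destination */
            assert(s->src >= 0 && s->src < NUM_VARS);
            if (tainted[s->src]) {
                tainted[s->dst] = true;
            }
            break;
        case OP_SINK:              /* precondition: argument not tainted */
            if (tainted[s->dst]) {
                return (int)i;
            }
            break;
        }
    }
    return -1;
}
```

Encoding the example as a source writing BUFF, a pass-through from BUFF to NEWBUFF, and a sink on NEWBUFF makes the function report the third statement, which corresponds to the exec() call.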
Analysis / Rules: Control Flow
Look for dangerous sequences
Example: Double-free
[State diagram: start -> initial state -free(x)-> freed -free(x)-> error; (other operations) loop on initial state and on freed]
while ((node = *ref) != NULL) {
  *ref = node->next;
  free(node);
  if (!unchain(ref)) {
    break;
  }
}
if (node != 0) {
  free(node);
  return UNCHAIN_FAIL;
}
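The double-free state machine can be written down directly. The sketch below tracks a single pointer value along a single path; the type and function names are assumptions, and a real analyzer would also have to enumerate paths and handle reassignment of the pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_INITIAL, ST_FREED, ST_ERROR } State;
typedef enum { EV_FREE, EV_OTHER } Event;

/* One transition of the state machine for a tracked pointer x. */
State step(State s, Event e) {
    if (s == ST_ERROR) {
        return ST_ERROR;           /* once in error, stay there */
    }
    if (e != EV_FREE) {
        return s;                  /* other operations keep the state */
    }
    return (s == ST_INITIAL) ? ST_FREED : ST_ERROR;
}

/* Feeds the events seen along one program path through the machine. */
State run_path(const Event *path, size_t n) {
    assert(path != NULL || n == 0);
    State s = ST_INITIAL;
    for (size_t i = 0; i < n; i++) {
        s = step(s, path[i]);
    }
    return s;
}
```

On the path where unchain() fails, node sees the events other, free, other, free (the assignment to *ref, the first free(), the unchain() call, the second free()), so the machine ends in the error state.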
Common Problems
False positives
Incomplete/inaccurate model
Conservative analysis
Missing rules
False negatives
Incomplete/inaccurate model
“Forgiving” analysis
Missing rules
Untapped Potential: Customization
Improve tool understanding of the program
Model the behavior of third-party libraries
Describe program semantics
Identify program-specific vulnerabilities
Enforce specific coding standards
Find vulnerabilities in custom interfaces
Design for testability
Advantages of Static Analysis over Fuzzing
Speed
Doesn’t require running the code
Customization has almost no impact on performance
Thoroughness
Experiment
Comparison of fuzzing and static analysis on an open-source code base
Without customization
With customization
Results TBA
Summary
Static analysis is spot-on for security
Important attributes
Language support
Analysis techniques
Rule set
Performance
Results management
Customization
PDF for talk available here:
http://www.fortify.com/presentations
Send me e-mail!
Jacob West [email protected]
Secure Programming with Static Analysis