How I Learned to Stop Fuzzing and Find More Bugs

(1)

How I Learned to Stop Fuzzing

and Find More Bugs

Jacob West Fortify Software

August 3-5, 2007 Las Vegas

(2)

Agenda

Introduction to fuzzing What is fuzzing?

Challenges with fuzzing

Introduction to static analysis How static analysis works

Examples of bugs static analysis is good at finding Untapped potential: Customization

Experiment

Fuzzing versus static analysis Conclusion

(3)

What is Fuzzing?

Encompasses runtime testing that attempts to induce faults in software systems by inputting random or semi-random values

Introduced by Barton Miller at the University of Wisconsin, Madison in 1990

(4)

How Fuzzing Works

Identify sources of input to a program

Permute or generate pseudorandom input Monitor the program for failures

Record the input and program state combinations that generate faults Repeat for desired duration

(5)

Input Sources: File Formats

Identify all valid file formats (e.g. JPG, TIFF, PDF, DOC, XLS) Collect a library of valid files

Malform a file

(6)

Input Sources: Protocols Create bogus messages

(e.g. TCP/IP, RPC, SOAP, HTTP) Record-fuzz-replay

Enable a sniffer

Collect a few thousand messages Fuzz the messages

Replace the fuzzed messages

(7)

Intelligence

Dumb fuzzing = modify data randomly Most input will be entirely invalid

Can make for good test cases

Takes a long time to enumerate valid test cases

May test the validation logic of high-level protocols instead of the underlying application

Smart fuzzing = aware of data structure Altering content size

Replacing null-terminated strings

Altering numeric values or flipping signs 0, 2^n +/- 1

Adding invalid headers, altering header values, duplicate headers

(8)

Challenges: Nebulous File Formats / Protocols No problem for a standard Web application What about?

Proprietary Web Services interfaces Network servers

Thick client software

Difficult to enumerate input sources to fuzz Even harder to generate valid input

(9)

Challenges: Program Semantics / Reachability Example:

if (!strcmp(input1, “static_string”) { strcpy(buffer2, input2);

}

Need to provide value of input1 equal to “static_string” and large value of input2 Requires N*M random inputs to reach bug guarded by two-variable conditions

(10)

Challenges: Identifying Errors

Error reporting conventions differ between programs

Good design guidelines require programs to mask errors and error details

(11)

Challenges: Completeness / Coverage

Microsoft SDL mandates that you run 100,000 iterations per file format/parser. If you find a bug, you reset to 0 and start running another 100,000 with a new random seed. Why?

How many input sources were missed? How much of the program was tested? How good were the tests?

(12)

Advantages

Verifiable and reproducible at runtime

Scalable to across programs that utilize the same protocol

Least effort to find a bug, impossible to ensure completeness, very costly to approach

(13)

Tools

Open Source / Free SPIKE Scratch Peach Commercial Cenzic iDefense SPI Dynamics (http://www.hacksafe.com.au/blog/2006/08 /21/fuzz-testing-tools-and-techniques/)

(14)

(15)

Static Source Code Analysis Benefits

1000x faster than code review Security knowledge built in

Consistent Limitations

Does not understand architecture

Does not understand application semantics Does not understand social context

(16)

The Many Faces of Static Analysis Type checking

Style checking

Program understanding

Program verification / Property checking Bug finding

(17)

Type Checking

Taken for granted Imperfect:

short s = 0;

int i = s; /* the type checker allows this */

short r = i; /* false positive: this will cause a

type checking error at compile time. */

/* false negative:

passes type checking, fails at runtime */ Object[] objs = new String[1];

(18)

Style Checking

Pickier than type checker, might look at

Whitespace Naming

Deprecated functions

gcc -Wall does some style checking

typedef enum { red, green, blue } Color; char* getColorString(Color c) {

char* ret = NULL; switch (c) { case red: printf("red"); } return ret; } Tools Lint, PMD

(19)

Program Understanding

Help make sense of a large codebase Tools:

Fujaba

Klockwork

(20)

Program Verification / Property Checking Prove that a program has particular properties Partial specification -> property checking

Often focuses on temporal safety properties Example: allocated memory must be freed

inBuf = (char*) malloc(bufSz);

if (inBuf == NULL) return -1;

outBuf = (char*) malloc(bufSz); if (outBuf == NULL)

return -1; /* memory leak */

Soundness

Aspires to “Sound WRT the specification”: reports all bugs

Tools:

(21)

Bug Finding

More sophisticated than a style checker Less ambitious than program verification Search code for “bug idioms”

Find high-confidence, low noise results (low false positives)

Soundness

Aspires to “Sound WRT counterexample”: never reports a bug that isn’t a bug

Example: double checked locking

if (fitz == null) {

synchronized (this) { if (fitz == null) {

fitz = new Fitzer(); }

} }

(22)

Security Review

Focus on finding exploitable code

Find high-risk code constructs for review (low false negatives)

Example

int main(int argc, char* argv[]) { char buf1[1024];

char buf2[1024];

char* shortString = "a short string";

strcpy(buf1, shortString); /* innocuous use of strcpy */

strcpy(buf2, argv[0]); /* dangerous use of strcpy */

...

(23)

Trace potentially tainted data through the program

Report locations where an attacker could take advantage of a vulnerable function or construct

Security Example: Dataflow Analysis

= getInputFromNetwork(); copyBuffer( , ); exec( ); buff buff newBuff

newBuff (command injection vulnerability)

(24)

A Peek Inside a Static Analysis Tool Modeling Rules Security Properties Front End src System Model Analyzer Analyzer Analyzer Results Viewer

(25)

Parsing

Language support

One language/parser is straightforward Lots of combinations is harder

Could analyze compiled code… Everybody has the binary

No need to guess how the compiler works No need for rules

…but

Decompilation can be difficult Loss of context hurts

(26)

Analysis / Rules: Structural

Identify bugs in the program's structure Example: calls to gets()

Structural rule:

(27)

Analysis / Rules: Structural

Identify bugs in the program's structure

Example: memory leaks caused by realloc() buf = realloc(buf, 256);

Structural rule:

FunctionCall c1: (

c1.function is [name == "realloc"] and

c1 in [AssignmentStatement: rhs is c1 and lhs == c1.arguments[0]

] )

(28)

Analysis / Rules: Dataflow Source Rule

Following interesting values through the program

Example: Command injection vulnerability

Source rule:

Function: getInputFromNetwork()

Postcondition: return value is tainted

= getInputFromNetwork(); copyBuffer( , ); exec( ); buff buff newBuff newBuff

(29)

Analysis / Rules: Dataflow Pass-Through Rule Following interesting values through the

program

Pass-through rule: Function: copyBuffer()

Postcondition: if the second argument is tainted, then the first argument becomes tainted

(30)

Following interesting values through the program

Sink rule: Function: exec()

Precondition: the first argument must not be tainted Analysis / Rules: Dataflow Sink Rule

(31)

Analysis / Rules: Control Flow Look for dangerous sequences Example: Double-free free(x) free(x) initial state freed error start (other operations) (other operations)

while ((node = *ref) != NULL) { *ref = node->next; free(node); if (!unchain(ref)) { break; } } if (node != 0) { free(node); return UNCHAIN_FAIL; }

(32)

(33)

(34)

Common Problems False positives Incomplete/inaccurate model Conservative analysis Missing rules False negatives Incomplete/inaccurate model “Forgiving” analysis Missing rules

(35)

Untapped Potential: Customization

Improve tool understanding of the program Model the behavior of third-party libraries

Describe program semantics

Identify program-specific vulnerabilities Enforce specific coding standards

Find vulnerabilities in custom interfaces Design for testability

(36)

Advantages of Static Analysis over Fuzzing Speed

Doesn’t require running the code

Customization has almost no impact on performance

Thoroughness

(37)

Experiment

Comparison of fuzzing and static analysis on an open-source code base

Without customization With customization

(38)

Results TBA

(39)

Summary

Static analysis is spot-on for security Important attributes Language support Analysis techniques Rule set Performance Results management Customization

(40)

<end>

PDF for talk available here:

http://www.fortify.com/presentations

Send me e-mail!

Jacob West [email protected]

Secure Programming with Static Analysis