Checkpointing-Based Testing


Master Thesis

Institute of Information Security, Department of Computer Science, ETH Zurich

by Tristan Buchs

Autumn 2014 - Spring 2015

ETH student ID: 09-928-904 E-mail address: [email protected]

Supervisor: Prof. Dr. David Basin

Advisors: Dr. Mohammad Torabi Dashti, Marco Guarnieri, Petar Tsankov

Date of submission: April 2, 2015


Declaration of Originality

I hereby declare that this written work I have submitted is original work which I alone have authored and which is written in my own words.

With the signature I declare that I have been informed regarding normal academic citation rules and that I have read and understood the information on “Citation etiquette”:

https://www.ethz.ch/content/dam/ethz/main/education/rechtliches-abschluesse/leistungskontrollen/plagiarism-citationetiquette.pdf

The citation conventions usual to the discipline in question here have been respected. This written work may be tested electronically for plagiarism.

Zurich, April 2, 2015 Signature: Tristan Buchs (see printed declaration)


Abstract

Testing web applications is challenging as they usually have deep and large state spaces and very long execution traces. We develop Checkpointing-Based Testing, a novel approach for efficiently testing web applications. Our approach is based on the idea that many test cases share common prefixes and we can significantly reduce the test execution time by avoiding the re-execution of the common prefixes. We implement our approach in a system called WebCheck and empirically evaluate it through case studies over three real-world PHP web applications. Our results demonstrate that WebCheck can achieve up to 19x speedup in test execution time and also detects 10 previously unknown errors.


Acknowledgements

I would like to express my sincere gratitude to my mentor, Prof. Dr. David Basin, for giving me the opportunity to write my thesis in his research group.

I would like to thank my supervisors Marco Guarnieri and Petar Tsankov for their valuable feedback and continuous guidance throughout this thesis. I would also like to thank Dr. Mohammad Torabi Dashti for his insightful suggestions on the subject.

Finally, I thank all my family and friends who unconditionally supported me during my studies at ETH, especially Armand Kurum for proofreading my work.


5.1.4 Solution
5.2 WebCheck
5.2.1 Goal
5.2.2 Fuzzing Framework
5.2.3 Fuzz Operators
5.2.4 Test Cases
5.2.5 Oracles
6 Implementation
6.1 Checkpointing
6.1.1 Target Applications
6.1.2 WebCheck Proxy
6.1.3 Application-level Instrumentation
6.1.4 Complexity
6.2 WebCheck Fuzzer
6.2.1 Input Parsing
6.2.2 Fuzzer
6.2.3 Fuzz Operators
6.2.4 Oracles
7 Empirical Evaluation
7.1 Hypothesis
7.2 Experimental Setup
7.3 Evaluation of H1
7.4 Evaluation of H2
7.4.1 First Set of Experiments
7.4.2 Second Set of Experiments
7.5 Evaluation of H3
7.5.1 Design of Experiments
7.5.2 Results & Discussion
8 Conclusion


1 Introduction

Web applications are nowadays ubiquitous and handle all sorts of sensitive data, from banking information to health-related data. Their correctness is, therefore, of utmost importance. Testing is, in practice, the most used technique for gaining confidence in a web application’s correctness. It consists of exercising the web application with a finite number of inputs (which are typically sequences of HTTP requests) to check whether its behavior conforms to its specification. However, testing is in general incomplete as most web applications accept infinitely many inputs. Therefore, web applications must be thoroughly tested with a large number of inputs to gain a sufficient level of confidence in their correctness.

Web applications accept HTTP requests sent by the clients and reply with HTTP responses containing the requested content (e.g. a web page). They typically process long sequences of such events. For instance, to buy a book on Amazon, the client has to (1) provide credentials to authenticate to Amazon, (2) select the book to purchase and add it to the shopping cart, (3) go through all the checkout steps such as entering the shipping and payment information, and, finally, (4) pay for the book. This interaction consists of a large number of HTTP requests and HTTP responses, circa 60, which are individually processed by the web application as separate events. The event-driven nature of web applications results in deep and large state spaces and very long execution traces. These aspects pose serious limitations to web application testing. First, due to the large state space, a tester must execute a large number of test cases to gain any reasonable confidence in the software. Second, due to the long execution traces, executing a single test case may take a long time. For instance, testing Amazon’s payment phase requires first executing the login, the book selection, and the checkout phases, as they must be executed before the payment. Executing even a single test case can therefore be quite expensive. Furthermore, to thoroughly test the payment phase, the tester must check how the application handles different kinds of payment requests, such as payments with insufficient funds or with missing payment information. Each payment request results in a separate test case for which the login procedure, the book selection procedure, and the checkout procedure are always executed at the beginning.

In this thesis, we develop Checkpointing-Based Testing, a novel approach for efficiently testing web applications. Our key insight is that many test cases share common prefixes and we can significantly reduce the test execution time by avoiding the re-execution of common prefixes. In contrast to all state-of-the-art testing techniques, which execute all requests in a test case, including the common prefix, our technique checkpoints the application’s state right after executing a common prefix and restores the application’s state instead of re-executing the prefix. As shown by the Amazon example, the common prefixes can be long and expensive to execute, in particular if some steps (e.g. authenticating the client) in the common prefix are computationally intensive. Note that our approach is not confined to web applications and is also applicable to any event-driven application.

To evaluate our Checkpointing-Based Testing approach, we design and implement a novel system, called WebCheck. WebCheck has a specialized checkpointing component for taking and restoring snapshots of web-application states. To capture a web application’s state, WebCheck stores all state-relevant information, such as data stored in a database and session information. WebCheck builds upon the transaction mechanisms, which are readily available in virtually all SQL databases, to efficiently rollback to previous database states. Through experiments, we show that WebCheck is efficient in terms of the time required for restoring the states of real-world web applications as well as scalable in the size of the state (e.g. the database size).

WebCheck has a modular design and supports arbitrary fuzz operators for generating test cases using valid inputs and arbitrary test oracles. We have pre-configured it with standard fuzz operators for modifying the sequence of requests in a test case as well as with standard oracles for detecting generic web-application errors such as unhandled interpreter errors and malformed SQL queries. We empirically evaluate WebCheck through case studies over several popular PHP web applications. Our results demonstrate that WebCheck achieves up to 19x speedup in test execution time compared to state-of-the-art solutions and discovers 10 previously unknown vulnerabilities in our test subjects.

Structure. In Chapters 2 and 3, we present related work and background information on the subject at hand. In Chapter 4, we give the main definitions for recurring terms in this thesis. In Chapter 5, we present both our checkpointing solution and our WebCheck fuzzing framework, and we explain and motivate our main design choices. In Chapter 6, we discuss the implementation of every component of the WebCheck system. In Chapter 7, we discuss our main hypotheses and evaluate them with a set of experiments. For each hypothesis, we present each case study and review the results. Chapter 8 concludes.


2 Related Work

2.1 Application Checkpointing

Application checkpointing is a technique whose goal is to make computer systems more fault tolerant. The state-of-the-art checkpointing approaches [3] [4] [8] [15] usually save the state of the computation on disk, in such a way that the computation can be restored when a subsequent failure occurs. The computation’s state consists of the memory data, the process identifiers of the application, and many additional components, e.g., file locks or open TCP connections.

Application checkpointing is used, in general, in parallel and distributed applications. [8] [12] [1] [2] However, approaches also exist for single-node applications. [15] In the following, we focus on single-node applications, as they are relevant for this thesis.

There are a few available tools and libraries focusing on single-node checkpointing, namely Cryopid¹, CRIU² and Libckpt [15]. These tools make it possible to suspend the execution of a running process and take a snapshot of its state, a checkpoint, into a set of files on disk. The process’ execution can then be restored from the checkpoint.

Although existing checkpointing tools are promising, they are not mature enough to checkpoint web applications. In a few experiments, we encountered several issues. For instance, checkpointing tools cannot handle applications spanning many processes, as is the case with web applications. The reason seems to be that the checkpointing tools do not support shared memory segments within the targeted applications.

Furthermore, they have strict limitations in dealing with network connections. Therefore, we cannot integrate existing checkpointing tools into our design, as they cannot be used with web applications.

¹ https://code.google.com/p/cryopid/

² http://criu.org/Main_Page


2.2 Web Application Testing

In the context of web application testing, there are multiple important prerequisites that come into play. First of all, selecting complex test cases that exercise the web application as desired is not trivial. [17] User-session-based techniques can help tremendously as they allow the tester to record users’ interactions (HTTP requests) to generate test cases. [6]

Web application testing is often neglected or done very quickly, as thoroughly testing a web application is time consuming and often lacks significant payoff. [13] Moreover, most commercially available tools do not cover all functionalities of web applications and mostly address accessibility and portability issues. This is one reason why well-defined web testing strategies should be designed to point web application testing in the right direction. [5]

The emphasis of current research, rather than being on executing test suites faster, is on optimizing the way test suites are executed [16] or on optimizing test scheduling. [19]

There does not seem to be previous work on increasing the runtime efficiency of test suites.

2.3 Fuzz Testing Web Applications

Input validation in web applications has often been shown to be a hard task. Many weaknesses in web applications are a direct consequence of it. Moreover, it is infeasible to exhaustively test web applications, due to the large number of possible inputs. Fuzz testing web applications tries to address this issue by leveraging a tool to automatically generate test data, which is referred to as “fuzzing”. [14] [11] The fuzzing tools solely focus on altering HTTP requests (GET and POST) and their parameters, trying to expose web application flaws, such as reflected and stored cross-site scripting (XSS) vulnerabilities and SQL injection bugs.

Alternative ways of applying fuzz testing, other than modifying single requests, have been successfully applied to other domains. For instance, mutating entire traces of messages has been successfully applied to security protocols. [18] To the best of our knowledge, however, this approach has not yet been applied to fuzz testing web applications.


3 Technical Background

In this chapter, we present background information on web applications, web application testing, and fuzz testing. First, we discuss web applications and their inner workings with an emphasis on topics relevant to the thesis. Afterwards, we describe web application testing and detail fuzz testing and its different usage methods. Finally, we present relevant background on checkpointing and databases.

3.1 Web Applications

3.1.1 Context

Over the past two decades, we witnessed the quick expansion of the World Wide Web, which enables users to access web pages over the network via a web browser. Web services are mostly provided by web applications.

3.1.2 Structure

Figure 3.1¹ depicts the main components of a typical web application.

The web browser is a software application that is mainly used to retrieve and display information, such as web pages, images, videos, and files from web servers. A web server hosts web sites and answers HTTP(s) requests from clients by delivering static or dynamically created content. The web server identifies the resource to be retrieved by the request’s URL and maps it to the corresponding file or program. Popular web servers include Apache, Microsoft IIS, and nginx.

Application servers usually run the web application’s business logic. They act as a bridge between the front-end and the back-end (storage). Within web applications, application servers are often responsible for creating dynamic web pages, that is to say, using server-side scripts to assemble web pages based on many parameters. The database server runs a database management system (DBMS) and offers its services to other applications that need to manage persistent data. The choice of an appropriate DBMS is based on the requirements of the web application.

Figure 3.1: Typical structure of a web application.

From an architectural perspective, a web application is made of three or more tiers, as depicted in Figure 3.2². The application tier can be divided into as many tiers as needed by the developers, and it is often divided into multiple components. The web server and the application logic are the most common ones and make it possible to have a working web application. On top of that, other components can be added to integrate the web application within a bigger system (e.g. enterprise business logic), or simply to hold the model instances for web applications designed using the model-view-controller (MVC) architectural pattern.

Figure 3.2: A web application’s tier list.


3.1.3 Storage

As we have seen above, web applications usually store persistent data in databases. There are also other means for storing data persistently, such as files, but they are rarely used in practice.

SQL databases are the most popular storage engines used by web applications. Typically, a database server contains one or more databases, each consisting of a number of tables. The data stored in the database is managed using SQL queries, which are used both to read from and to write to the database. Most programming languages feature APIs for interacting with the database server using built-in functions. Web application developers have to be very thorough when dealing with SQL queries, as different checks have to be made depending on the nature of the query.

3.1.4 Stateful Web Applications

The communication between the web server and the client runtime is built on HTTP/HTTPS³. The HTTP protocol is stateless, which raises the following question: how can web applications be stateful when they are built on top of a stateless protocol?

The answer to this question is simple: sessions. “A session is defined as a series of related browser requests that come from the same client during a certain time period. Session tracking ties together a series of browser requests – think of these requests as pages – that may have some meaning as a whole, such as a shopping cart application.”⁴ In the following, we detail how PHP handles sessions. Other programming languages and frameworks may implement sessions differently. We review here the main concepts about session management that are relevant to the thesis.

A session can be thought of as a key-value data structure. Each session identifier is linked to the corresponding data, such as the user name or the user identifier. Both the server and the client keep track of some of the session data. There are two ways to store the session information: either on the client or on the server. HTTP cookies usually store the session information on the client. A cookie is a small piece of data from a website that is stored inside the browser. In contrast, on the server, session information is usually stored in a file, in memory, or in a database. When the session information is stored on the server, the client has to store a reference to its session information in the form of a session identifier in the corresponding cookie, as we can see in Figure 3.3⁵. In addition, the web server keeps track of the user’s state using sessions. Each session ordinarily has an expiration date, after which the session becomes obsolete. Sessions can also become obsolete after a predefined time of inactivity. The web server configuration dictates when an obsolete session has to be discarded.

³ https://tools.ietf.org/html/rfc2616

⁴ http://docs.oracle.com/cd/E13222_01/wls/docs92/webapp/sessions.html

⁵ Source: http://www.lte.lu/files/1013/9306/1619/wda 080.gif

Figure 3.3: Setting up an HTTP cookie.
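The server-side variant of session storage described above can be illustrated with a minimal sketch. It is not PHP's actual session implementation: the class `SessionStore` and its methods are hypothetical names chosen for illustration, and the store lives purely in memory. The only PHP-specific detail is the cookie name `PHPSESSID`, which is PHP's default session cookie name.

```python
import secrets


class SessionStore:
    """Toy in-memory, server-side session store (illustrative only)."""

    COOKIE_NAME = "PHPSESSID"  # PHP's default session cookie name

    def __init__(self):
        # Maps a session identifier to the data kept on the server.
        self._sessions = {}

    def create(self, data):
        """Store the session data and return the identifier sent to the client."""
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = dict(data)
        return session_id

    def lookup(self, cookie_header):
        """Resolve the session referenced by a Cookie request header, or None."""
        for part in cookie_header.split(";"):
            name, _, value = part.strip().partition("=")
            if name == self.COOKIE_NAME:
                return self._sessions.get(value)
        return None


store = SessionStore()
sid = store.create({"user": "alice"})
# Value the server would place in the Set-Cookie response header.
set_cookie = f"{SessionStore.COOKIE_NAME}={sid}; Path=/; HttpOnly"
# On a later request, the browser sends the identifier back in the Cookie header.
session = store.lookup(f"theme=dark; {SessionStore.COOKIE_NAME}={sid}")
```

The client only ever holds the identifier; the data itself stays on the server, which is why capturing a web application's state has to include the server-side session storage.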

3.1.5 Web Application Frameworks

Web application frameworks are designed to help with the development of web applications. They apply different techniques and patterns from software engineering directly to web development. There are a number of web framework families, which can be frameworks, languages, or environments, namely PHP, Python, Ruby, Microsoft .NET, the Java VM, and JavaScript. The list is not complete and represents a selection of some well-known (or widely used) web framework families. Each framework has its strengths and weaknesses; it is up to the developers to choose which one suits them best.

Figure 3.4⁶ depicts the framework usage statistics for the top 10k web sites.

Figure 3.4: Framework usage statistics for Top 10k web sites.


3.1.6 Web Application Security

Web application security is an important subject because web applications typically store sensitive data, such as confidential business information or private user data, which must be handled with great caution. A fundamental weakness is that the web server must process any request, including requests that are sent by malicious users. Malicious users can submit carefully crafted requests in order to exploit security flaws in the web application. The web application must therefore correctly validate all requests so as to avoid security threats.

3.2 Web Application Testing

3.2.1 Context

Finding faults in web applications has always been a challenge. The wide variety of heterogeneous technologies used to implement web applications, as well as a lack of well-defined test strategies, complicates the effective testing of web applications. Moreover, the wide variety of possible failures makes it cumbersome to reveal the most critical failures within web applications.

Over the last few years, web testing has grown fast by trying to overcome the limitations (broad variety of failures and undefined testing strategies) that initially made finding errors in web applications difficult. There is a vast landscape of diverse tools that allow the tester to scan a web application in a reliable way.

The goal of web application testing is to reveal failures, i.e. the inability of an application to perform its required functions, so that the underlying faults (weaknesses in the implementation that can lead to a failure) can be fixed, yielding a more robust and secure web application. Web application testing covers many distinct aspects, such as usability testing, user acceptance testing, performance testing, security testing, functional testing, and interface testing.⁷ All those techniques have precise testing guidelines and methods. In this thesis, we mainly focus on security testing.

Security testing’s main goal is to uncover faults that, if exploited by attackers, might lead to data leaks, such as unauthorized access to confidential data or protected resources, and integrity violations, such as corrupting the web application’s state. Faults are usually difficult to uncover and the web application needs to be thoroughly tested to reveal them. Therefore, testing for different kinds of failures might take a lot of time. Different features can be tested. A list of tasks associated with security testing can be found online.⁸ We cannot test for all kinds of failures, as exploring all aspects of security testing is far too broad a domain.

Security testing in web applications can be broken down into several methodologies, such as penetration testing and fuzz testing. Fuzz testing has already been used to successfully find many security problems in real-world applications. Therefore, we solely focus on fuzz testing.

3.2.2 Approaches

There are mainly two standard approaches to test web applications: black-box testing and white-box testing. Black-box testing considers the system as a black box. There are entry points and exit points but the content of the inside of the box is unknown. The tester can test the inputs and outputs. Black-box testing is principally used to ensure that the system works according to the specifications. In contrast, white-box testing implies that the tester has a complete knowledge of the inner working of the web application. This knowledge can be used to target tests to a specific region of the web application code.

3.2.3 Tools

A number of automated tools exist for security testing in web applications. Some of them are “black-box oriented”, meaning that they primarily focus on the inputs and outputs of the web applications, whereas others can be customized to fit into a more “white-box oriented” testing approach using knowledge from the implementation of the web applications. On top of that, some tools target a very specific methodology of web application security testing such as, for example, penetration testing. A list containing most of the web application testing tools can be found online.⁹

3.3 Fuzz Testing

Here, we describe fuzz testing in detail. Fuzz testing is a software testing technique that tests programs by feeding them with invalid, malformed, unexpected, or random inputs. The tester uses monitoring tools, such as Valgrind¹⁰, to identify the possible failures that may reveal security vulnerabilities.

⁸ https://www.owasp.org/index.php/Web_Application_Security_Testing_Cheat_Sheet

⁹ https://www.owasp.org/index.php/Appendix_A:_Testing_Tools

Fuzz testing is ordinarily used within black-box testing techniques. It gives a high benefit-to-cost ratio [9], as fuzz testing sometimes reveals critical exploitable faults and is not difficult to set up. For instance, fuzz testing can be used to test the robustness of error-handling routines.

There are two distinct approaches to fuzz testing: mutation-based and generation-based fuzz testing. With the mutation-based approach, a fuzzer (a tool used to test the parameters of an application) mutates existing data, whereas with the generation-based approach, a fuzzer generates new data based on its prior knowledge of the application’s data.

Moreover, successful fuzzers usually leverage good fuzz operators with deep knowledge of the application or protocol under test. A fuzz operator is a function used to mutate a well-formed input. The mutated input is subsequently likely to expose vulnerabilities. Fuzz operators can, in the case of white-box testing, be extracted from the specification of the protocol itself. Using the specifications of a program to generate mutated inputs has the advantage that the fuzz operators can target a specific part of the specifications directly. [18]
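To make the notion of a fuzz operator concrete, the following minimal sketch treats a well-formed input as a trace of requests and defines three mutation functions over it. The operator names (`drop_request`, `swap_requests`, `mutate_parameter`), the request representation as a `(method, path, params)` tuple, and the example paths are all assumptions made for illustration; they are not taken from any particular fuzzer.

```python
def drop_request(trace, index):
    """Return a copy of the trace with the request at `index` removed."""
    return trace[:index] + trace[index + 1:]


def swap_requests(trace, i, j):
    """Return a copy of the trace with the requests at positions i and j exchanged."""
    mutated = list(trace)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def mutate_parameter(request, name, value):
    """Return a copy of one request with a single parameter replaced."""
    method, path, params = request
    return (method, path, {**params, name: value})


# A well-formed input: log in, add an item to the cart, check out.
trace = [
    ("POST", "/login", {"user": "alice", "password": "secret"}),
    ("POST", "/cart/add", {"item": "42"}),
    ("POST", "/checkout", {"card": "4111"}),
]

without_login = drop_request(trace, 0)       # checkout without authentication
reordered = swap_requests(trace, 1, 2)       # checkout before adding an item
quoted = mutate_parameter(trace[1], "item", "'")  # parameter containing a stray quote
```

Each operator returns a new value and leaves the original trace untouched, so the same well-formed input can be reused to derive many mutated test cases.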

On one hand, the main goal of fuzz testing is to help find faults to be fixed during the development phase of a software. On the other hand, seen from the attacker perspective, fuzz testing is a reliable option to find vulnerabilities in software.

There are a few limitations to fuzz testing. First of all, it is not possible to test software based on its specification at an early stage of development: testing is always done on the system, not on the specification. Additionally, the tester does not always have access to the entire code or specification of the system under test (black-box testing). This complicates fuzz testing, as it becomes harder to target specific application areas, and leads to fuzz testing mostly finding simple errors.

Mutation-based and generation-based fuzz testing both have their own advantages and disadvantages:

• Mutation-based PROs: It mutates existing inputs that are meaningful to the application. There is no need to extract any kind of specification from the system under test.

• Mutation-based CONs: Inputs are well-formed meaning that, by mutating those, we may miss some aspects that require more mutations. It is suited for black-box testing only. It requires a set of system inputs to work with.

• Generation-based PROs: By generating inputs itself, it can target specific parts of the system. It is suited for white-box testing. It does not require existing system inputs.

• Generation-based CONs: It is required to extract some specifications from the system under test in order to generate meaningful inputs.


3.4 Checkpointing and Databases

3.4.1 Checkpointing

Context

Originally, application checkpointing was investigated to improve fault tolerance in applications. It works by taking a snapshot of the application’s state and storing it in a set of files on disk. In case of failure, the application can be restarted from the saved state by restoring it from its files.

Our checkpointing method differs from the standard approach we just described, as it does not store the state of a web application completely on disk but mostly in memory.

State manipulation

Here, we describe the inner working of checkpointing processes, detailing the necessary manipulations on the processes’ states.

A snapshot of a process is a dump of all the process’ relevant data. It includes the following elements: a copy of the process’s address space; the state maintained by the kernel and the system server for that process. [7]

The snapshot is stored into files on a reliable storage (e.g. hard drive) that can sustain failures of the running process.

In case of failure, the process can be restored to the snapshotted state. The state information is read from the files and each component of the snapshot is restored to replicate the process’s state at the time of the snapshot.
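The snapshot-and-restore cycle can be illustrated by analogy with a deliberately simplified sketch. Real process checkpointing captures the address space and kernel-maintained state; here we only serialize a plain Python dictionary standing in for "the state", using the standard `pickle` module. The function names `take_snapshot` and `restore_snapshot` and the example state are hypothetical.

```python
import os
import pickle
import tempfile


def take_snapshot(state, path):
    """Write a serialized copy of the state to reliable storage."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def restore_snapshot(path):
    """Read the snapshot back and rebuild the state it describes."""
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    state = {"counter": 3, "open_items": ["a", "b"]}
    fd, path = tempfile.mkstemp(suffix=".ckpt")
    os.close(fd)
    try:
        take_snapshot(state, path)
        # Simulate further execution (or a failure) after the checkpoint.
        state["counter"] = 99
        state["open_items"].clear()
        return restore_snapshot(path)
    finally:
        os.remove(path)


restored = demo()
```

The restored value reflects the state at the time of the snapshot, regardless of what happened to the live state afterwards.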

3.4.2 Databases

Context

In the previous sections, we mentioned that web applications interact heavily with databases. Here, we give more details about the inner workings of databases, focusing specifically on transactional databases. Nowadays, most database engines have native support for transactions. A transaction can be seen as a unit of work in which one or more actions are performed against the database at once.


Transactions

A transaction is a group of operations that share some special properties: they are atomic, consistent, isolated, and durable (ACID)¹¹. This implies that either all operations are executed or none of them are. The database will not end up in an inconsistent state following the execution of the transaction. The intermediate operations are completely isolated from the rest of the world until the transaction’s completion.

In most databases, a transaction is started using a specific command such as “BEGIN” or “START TRANSACTION”. Following this command, all the queries performed against the database are isolated and visible only from inside the transaction. The transaction is successful when the command “COMMIT” is executed, meaning that all the changes made during the transaction are now stored in the database. The transaction can also be aborted, in which case a “ROLLBACK” command undoes everything that was done inside the transaction.

Additional mechanisms give transactions even more flexibility. It is possible to create a savepoint inside a transaction with a command such as “SAVEPOINT savepoint_name” (the exact syntax varies between database engines). A smaller portion of the transaction can then be rolled back by executing “ROLLBACK TO SAVEPOINT savepoint_name”, as long as no auto-committed queries have been issued in between.
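The transaction and savepoint commands described above can be tried out with a small sketch. We use SQLite through Python's standard `sqlite3` module purely because it needs no server; other engines accept similar but not necessarily identical syntax. The table `cart`, the savepoint name `after_first_item`, and the function `demo` are hypothetical names chosen for the example.

```python
import sqlite3


def items(cur):
    """Return the current contents of the cart table, in insertion order."""
    return [row[0] for row in cur.execute("SELECT item FROM cart ORDER BY rowid")]


def demo():
    # isolation_level=None disables the module's implicit transaction handling,
    # so BEGIN / SAVEPOINT / ROLLBACK are issued exactly as written below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    cur = conn.cursor()
    cur.execute("CREATE TABLE cart (item TEXT)")

    cur.execute("BEGIN")
    cur.execute("INSERT INTO cart VALUES ('book')")
    cur.execute("SAVEPOINT after_first_item")
    cur.execute("INSERT INTO cart VALUES ('pen')")
    inside = items(cur)

    # Undo only the work done since the savepoint.
    cur.execute("ROLLBACK TO SAVEPOINT after_first_item")
    after_savepoint = items(cur)

    # Abort the whole transaction.
    cur.execute("ROLLBACK")
    after_rollback = items(cur)

    conn.close()
    return inside, after_savepoint, after_rollback


inside, after_savepoint, after_rollback = demo()
```

Rolling back to the savepoint discards only the second insert, while the final rollback discards the whole transaction. This is the kind of partial undo that makes savepoints attractive for returning a database to an earlier state.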


4 Prerequisites

In this Chapter, we give the main definitions for recurring terms in this thesis.

4.1 Definitions

4.1.1 Stateful Web Applications

A web page displayed in a web browser is made of multiple elements. Some elements are static (e.g. text and images) whereas others are dynamic and can be interacted with (e.g. links, buttons, and input fields). When interacting with a dynamic element, an event is triggered which causes a reaction. For example, a new web page appears or some provided data is now stored by the web application.

A stateful web application keeps track of the executed events and may react differently depending on what events have already been executed. Its state machine (or automaton) defines which user-triggered events are enabled in a given state. For example, a user first has to log in prior to changing his sensitive personal information, and therefore in the initial state the only enabled event is “login”.
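The idea that a state machine determines which events are enabled can be sketched as a small transition table. The state names, the event names other than “login”, and the helper functions are assumptions made for this example only.

```python
# (current state, event) -> next state; anything not listed is not enabled.
TRANSITIONS = {
    ("logged_out", "login"): "logged_in",
    ("logged_in", "change_personal_info"): "logged_in",
    ("logged_in", "logout"): "logged_out",
}


def enabled_events(state):
    """List the events the application accepts in the given state."""
    return sorted(event for (source, event) in TRANSITIONS if source == state)


def step(state, event):
    """Execute one event and return the resulting state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not enabled in state {state!r}") from None


state = "logged_out"
initially_enabled = enabled_events(state)  # only the login event
state = step(state, "login")
```

In the initial state, only the login event is accepted; attempting to change personal information before logging in is rejected by `step`.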

4.1.2 Test Cases

A test case is a trace of events (or messages). Events are defined by HTTP requests. Our test cases only contain those events that change the state of the web application. Consequently, HTTP requests used to load images, JavaScript, and style sheets are not part of the test case.
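As a rough illustration of how a recorded session could be reduced to a test case, the sketch below drops requests for images, JavaScript, and style sheets based on the file extension of the URL path. This extension-based heuristic, the extension list, the function names, and the example URLs are our assumptions for the example; deciding precisely which requests change the application's state requires more than inspecting the URL.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as static resources.
STATIC_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".js", ".css"}


def is_static_resource(url):
    """Heuristically decide whether a URL points to an image, script, or style sheet."""
    path = urlsplit(url).path  # the query string is not part of the path
    return posixpath.splitext(path)[1].lower() in STATIC_EXTENSIONS


def to_test_case(recorded_requests):
    """Keep only the requests that are candidates for state-changing events."""
    return [(method, url) for method, url in recorded_requests
            if not is_static_resource(url)]


recorded = [
    ("GET", "http://shop.example/index.php"),
    ("GET", "http://shop.example/static/app.js?v=3"),
    ("GET", "http://shop.example/img/logo.PNG"),
    ("POST", "http://shop.example/login.php"),
    ("GET", "http://shop.example/css/site.css"),
]

test_case = to_test_case(recorded)
```

Filtering in this way shortens the trace that has to be replayed, which matters when the same prefix appears in many test cases.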

4.1.3 Test Suites


4.1.4 Snapshot of a Web Application

A snapshot of the web application captures the web application’s state information at a given point in time. We can restore the snapshot at a later point in time to bring the web application back to this prior state.


5 Design

In Sections 5.1 and 5.2, we discuss the design of our tool. We present both our checkpointing solution and our WebCheck fuzzing framework. We also explain and motivate our main design choices.

5.1 Checkpointing

5.1.1 Goal

Our goal is to develop an efficient technique for testing stateful web applications. The key insight is that many of the test cases in a given test suite share a common prefix of events, for example, sending a login request to a web application. We build upon the ideas from the state-of-the-art techniques for checkpointing applications, discussed in Chapter 2, which can be used to take snapshots of different application states. To execute a test case, we can then restore a particular state, as opposed to starting the application from its initial state. The difficulty lies in designing a checkpointing technique tailored to web applications: web applications are typically large systems spanning multiple layers (e.g. database and application tiers) and the state-of-the-art checkpointing approaches do not scale to such complex systems. We expect our approach, called Checkpointing-Based Testing, to reduce the overall test case execution time because it eliminates the need to execute the events that are part of the prefix shared among multiple test cases.
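The expected saving can be made tangible with a small back-of-the-envelope sketch that simply counts executed events. It assumes a single prefix shared by all test cases in the suite and ignores the cost of taking and restoring the snapshot, so it only indicates an upper bound on the benefit; the event names and both functions are hypothetical.

```python
def common_prefix_length(test_cases):
    """Length of the longest prefix of events shared by all test cases."""
    length = 0
    for events in zip(*test_cases):
        if any(event != events[0] for event in events):
            break
        length += 1
    return length


def executed_events(test_cases, checkpointing):
    """Count the events executed when running the whole suite."""
    total = sum(len(tc) for tc in test_cases)
    if not checkpointing or not test_cases:
        return total
    prefix = common_prefix_length(test_cases)
    # The shared prefix runs once; every test case then resumes from the
    # snapshot and executes only its own suffix.
    return prefix + sum(len(tc) - prefix for tc in test_cases)


prefix = ["login", "select_book", "checkout"]
suite = [
    prefix + ["pay_valid"],
    prefix + ["pay_insufficient_funds"],
    prefix + ["pay_missing_info"],
]

baseline = executed_events(suite, checkpointing=False)
with_checkpoint = executed_events(suite, checkpointing=True)
```

With three test cases that share a three-event prefix, the baseline executes twelve events, whereas the checkpointed run executes six: the prefix once plus one payment event per test case. The gap grows with both the prefix length and the number of test cases.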

Our approach also comes with several challenges. Using existing checkpointing tools to capture the states of web applications is nontrivial: they have been introduced only recently, they are still unstable, and they do not scale to large web applications with databases. Therefore, we cannot readily use existing checkpointing tools to achieve our goal and we need an alternative solution. The notion of state must be explicitly defined to guarantee that the behavior of a checkpointed web application is similar to that observed when the web application is restarted from its initial state. We assume that the web application does not depend on elements outside of this state (e.g. time); where such a dependency is essential to our design, those elements can be controlled.


5.1.2 Web Application State

We now formalize states for web applications. There are several key factors to consider. A web application, like other event-driven programs, does not operate like most standalone applications found in an operating system. It does not distinctly start and end; it runs endlessly (excluding maintenance). Therefore, the state of a web application does not depend on the web application alone, but on its interaction with external entities. Web applications usually process events or requests sent by those entities, and the events themselves cause the web application to modify its running state. Hence, the web application’s state is associated with the events it receives. This makes it easier to pinpoint exactly where to look in order to define the state of the web application.

The states of a web application can be represented using automata theory. Figure 5.1 depicts such an automaton, a state machine consisting of three distinct states.

Figure 5.1: Simple automaton.

Starting in the initial state A, the web application can transition into other states (B and then C) according to predefined transitions. We define the transitions to be events received by the web application. We can map this simple automaton to a real life example; take the following use case: a client logs into a cart application and adds an item to his cart. We have the following states and transitions:

• 1st state A: no client is logged in (initial state of the web application)

• 2nd state B: client is logged in

• 3rd state C: client has an item in its cart

• 1st transition A to B: client submits a login request with its credentials

• 2nd transition B to C: client adds an item to its cart

First, the client has to log into the web application; this is usually handled by the web application’s session management utility, as described in Chapter 3. Afterwards, the client adds an item to its cart. Web applications usually store their user and product data in persistent storage such as a database.


From this example, we can extract two key elements required to precisely define a web application’s state: the session data and the database.

Keeping track of these two components allows us to capture the web application’s state, defined by the pair ⟨session, database⟩, where the session data has been extracted from the web application and saved, and the database has been dumped in its current state. The state can subsequently be restored, erasing everything that has been done in the meantime.

5.1.3 Checkpoint Methods

Here, we describe the checkpoint methods that we are going to use starting with the database component and following with the session data.

Databases

Most web applications use a database management system (DBMS) to persistently store their data. The naive approach to database checkpointing is to dump the content of the entire database. While trivial to implement, this approach is inefficient and does not scale well with the size of the database. We will use this naive approach as a baseline in Chapter 7, where we compare different approaches for database checkpointing.

To achieve a robust, efficient, and highly scalable checkpointing solution, we leverage the power of transactional databases by isolating the execution of test cases inside transactions.

The transactions can be rolled back to the initial state before the execution of the next test case. Rather than starting the execution of each test case from the beginning, we use the save-point mechanism of transactional databases to execute the common prefix of similar test cases only once. We set a save-point after the Nth request to save the state of the database. For each test case sharing the same first N requests, we roll back directly to the save-point, which saves us the execution of the first N requests. The save-point acts as a snapshot of the database after the Nth request.

All the subsequent requests are erased when executing a rollback to the stored save-point. There is no trace of the previous test case execution after restoring the state at the Nth request.
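The save-point mechanism can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a transactional DBMS; the table name, the helper functions, and the save-point name are hypothetical and only serve the illustration.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN / SAVEPOINT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE cart (item TEXT)")


def run_shared_prefix(conn):
    """Stand-in for the first N requests shared by several test cases."""
    conn.execute("INSERT INTO cart VALUES ('book')")


def run_suffix(conn, item):
    """Stand-in for the requests that differ between test cases."""
    conn.execute("INSERT INTO cart VALUES (?)", (item,))
    rows = conn.execute("SELECT item FROM cart ORDER BY rowid")
    return [row[0] for row in rows]


conn.execute("BEGIN")
run_shared_prefix(conn)                  # executed only once
conn.execute("SAVEPOINT after_prefix")   # snapshot after the Nth request

observed = []
for item in ("pen", "lamp"):
    observed.append(run_suffix(conn, item))
    # Erase everything the test case did after the save-point.
    conn.execute("ROLLBACK TO SAVEPOINT after_prefix")

# Roll the whole transaction back to return to the initial state.
conn.execute("ROLLBACK")
remaining = conn.execute("SELECT COUNT(*) FROM cart").fetchone()[0]
```

Each suffix sees the effect of the shared prefix but not of the other suffix, and the final rollback leaves no trace of any test case in the table.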

Sessions

We now address the session data. Session data can be stored in a database, a file, or a cookie. When the session information is stored in the database, it is covered by the database checkpointing described in the previous section. When it is stored in a file, which can be a cookie or a session file, we proceed as follows: to take a snapshot, we make a copy of the file; to restore a snapshot, we simply replace the current session file with the saved copy. It is important to note that, when sessions are managed on the server side, the cookie identifier (shared at the HTTP level) needs to be stored too. The cookie identifier is used to authenticate the incoming requests and to attach them to their corresponding session data.
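For file-based sessions, the snapshot and restore steps amount to two file copies. The following is a minimal sketch under the assumption that the session lives in a single file whose name encodes the session identifier; the file names and contents are hypothetical.

```python
import os
import shutil
import tempfile


def snapshot_session(session_path, snapshot_path):
    """Take a snapshot by copying the session file."""
    shutil.copy2(session_path, snapshot_path)


def restore_session(session_path, snapshot_path):
    """Restore a snapshot by overwriting the session file with the copy."""
    shutil.copy2(snapshot_path, session_path)


# Demonstration in a throwaway directory (hypothetical file names).
workdir = tempfile.mkdtemp()
session_file = os.path.join(workdir, "sess_example")
snapshot_file = os.path.join(workdir, "sess_example.snapshot")

with open(session_file, "w") as handle:
    handle.write("user=alice")
snapshot_session(session_file, snapshot_file)

# A later request modifies the session ...
with open(session_file, "w") as handle:
    handle.write("user=alice;cart=book")

# ... and restoring the snapshot undoes the modification.
restore_session(session_file, snapshot_file)
with open(session_file) as handle:
    restored = handle.read()

shutil.rmtree(workdir)
```

Because the session file keeps its name across snapshot and restore, a client that replays the stored cookie identifier is attached to the restored session data.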

5.1.4 Solution

Our solution to checkpoint web applications, called WebCheck, is to use a proxy between the database driver of the web application and the database itself. The idea, depicted in Figure 5.2, is to extend the database driver in such a way that, instead of executing database queries itself, database queries are redirected to an external proxy and the proxy returns queries’ results as would the standard database driver.

[Figure 5.2 diagram: Web Application (Native Database Driver, Extended Driver), Proxy, Database]

Figure 5.2: WebCheck’s design.

The WebCheck proxy opens a persistent connection to the database to allow transactions to be used. All the received queries are forwarded through the persistent connection, which gives the WebCheck proxy full control over all communications with the database. In particular, the WebCheck proxy can set save-points and perform rollbacks for checkpointing purposes.

The results are parsed and returned in a format that the web application’s database driver understands. The extended database driver communicates only with the WebCheck proxy and no longer interacts with the database directly.

Sessions are handled within the WebCheck proxy. The WebCheck proxy is tuned on a per-application basis to match the way in which the sessions are handled. Consequently, each test case can be run in total isolation from the others, as the WebCheck proxy rolls the transaction back to an initial state following each test case.

Another advantage of using a proxy is modularity: all the checkpointing logic is implemented in the proxy and, to support different kinds of databases, we simply need to extend their drivers.


5.2 WebCheck

5.2.1 Goal

In addition to showing the feasibility of checkpointing applied to testing, our secondary goal is to use our newly developed approach within a fuzz testing framework for web applications, the WebCheck fuzzing framework. Fuzz testing web applications tends to rely on an immense set of test cases, where each test case differs only slightly from the previous one. Many of those test cases share a common prefix: fuzz testing a web application cannot rely on single-request test cases, because some elements to fuzz require previous events to have happened, such as being logged into the web application. Leveraging our checkpointing techniques within fuzz testing can significantly increase the number of test cases one can execute in a fixed amount of time.

Web applications have deep state spaces. To thoroughly test them, we need long traces of events. We must, therefore, fuzz entire traces as opposed to individual requests. We apply the same technique that is used to fuzz-test protocols, as seen in [18], where full traces are mutated. Providing invalid or unexpected sequences of requests to a web application can potentially lead to erroneous behavior. Our goal is to investigate this direction and see what can be uncovered.

5.2.2 Fuzzing Framework

The WebCheck fuzzing framework makes it possible to fuzz-test web applications efficiently. From a set of test cases and a set of fuzz operators, it generates a test suite to execute against a web application. It tests the resilience of the web application when exposed to unexpected sequences of events.

The test suite contains altered test cases, built by passing the original test cases to the fuzz operators, which mutate them.

The responses from the web application are redirected to an oracle capable of discovering and reporting various failures.

The WebCheck fuzzing framework extensively uses the checkpointing method designed in Section 5.1. After each run, when the web application responses have been analyzed by the oracle, we roll back the state of the web application to the most recent save-point, erasing all the modifications performed by the test case. We thereby achieve test case isolation, which means that each test case runs independently of the others. The benefit of this isolation is that if a test case puts the database in an inconsistent state, the next test case is not affected. Putting the database back into a consistent state takes resources, a cost that execution isolation avoids. Moreover, execution isolation makes results easy to reproduce, as there is no need to run multiple days’ worth of test cases to reach the interesting result. A drawback is that interactions between different test cases are harder to test.

5.2.3 Fuzz Operators

Here, we specify what our fuzz operators are.

As mentioned earlier, our goal is to focus on altered traces, rather than on malformed individual requests. Consequently, our fuzz operators are targeting and mutating full traces.

With this goal in mind, we design the following fuzz operators:

• INSERT: is used to insert at least one arbitrary request at a given position in the provided trace.

• REPEAT: is used to repeat the request at a given position in the provided trace.

• SWAP: is used to switch two consecutive requests with one another from a given position in the provided trace.

• REMOVE: is used to remove at least one request from a given position in the provided trace.

• REPLACE: is used to replace a request with an arbitrary request at a given position in the provided trace.
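To make the five operators concrete, the following sketch applies each of them to a short trace. It follows the definitions above in their simplest form (a single request inserted or removed) and uses hypothetical request names; it is an illustration of the operator semantics, not the WebCheck implementation.

```python
def insert(trace, position, request):
    """INSERT: add an arbitrary request at the given position."""
    return trace[:position] + [request] + trace[position:]


def repeat(trace, position):
    """REPEAT: duplicate the request at the given position."""
    return trace[:position + 1] + [trace[position]] + trace[position + 1:]


def swap(trace, position):
    """SWAP: exchange the request at the position with its successor."""
    altered = list(trace)
    if position < len(altered) - 1:
        altered[position], altered[position + 1] = (
            altered[position + 1], altered[position])
    return altered


def remove(trace, position):
    """REMOVE: drop the request at the given position."""
    return trace[:position] + trace[position + 1:]


def replace(trace, position, request):
    """REPLACE: substitute the request at the given position."""
    return trace[:position] + [request] + trace[position + 1:]


trace = ["login", "add_item", "checkout"]   # hypothetical request names

inserted = insert(trace, 1, "logout")
repeated = repeat(trace, 1)
swapped = swap(trace, 0)
removed = remove(trace, 0)
replaced = replace(trace, 2, "logout")
```

All functions return a new list, so the original trace can be reused for the next operator and position.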

5.2.4 Test Cases

The WebCheck framework, due to its checkpoint-based test execution engine, enables the efficient execution of a large number of long test cases.

In addition, we require a set of test cases that covers as many functionalities of the web application as possible. Higher coverage allows our fuzzing framework to potentially discover more failures in the web application. Each web application has its own set of specific test cases.

5.2.5 Oracles

The WebCheck framework relies on an oracle to catch failures during the execution of altered test cases.


Our primary goal is to focus on “navigation errors”. A navigation error occurs when an unexpected chain of events is sent to the web application, triggering unanticipated responses. With current web browsers, it is possible to save a web page in a bookmark, allowing the user to quickly access the saved web page by requesting the bookmark directly. This feature may have unforeseen consequences. Web pages typically require context; for example, some actions have to be performed before requesting a specific web page. Without context, requesting the web page can lead to a navigation error.

There is no single root cause for navigation errors. Web applications should handle these special cases and prevent requests arriving in an unexpected order from being executed. However, it has been observed that web applications do not always handle them correctly [10].

Furthermore, a similar behavior can be achieved when using the “back button” of most web browsers. When using this feature, web browsers usually do not resend the request for the web page to the web application. They simply load the web page back from their caches. It leads to similar navigation inconsistencies, comparable to the ones stated above.

The aforementioned faults can lead to more or less critical security issues. For example, missing access control checks can allow attackers to access pages that they should not be able to visit otherwise. The error pages may also leak data, such as providing a dump of the database or they could hint at how to inject a malicious query.

To uncover this kind of web applications’ failures, we design specialized oracles capable of exposing such inconsistencies. Our solution is to create navigation state machines (NSM) for web applications.

An NSM is a state machine that describes the set of authorized transitions from each navigation state. A navigation state represents a web page, and the transitions between navigation states are defined by the events that are allowed to be issued from this web page. We extract the NSM directly from the web applications by following the available hyperlinks on web pages and defining which events trigger a transition in the NSM. A simple example is depicted in Figure 5.3.

The WebCheck fuzzer generates altered traces. If a trace belongs to the NSM’s language, the oracle does not trigger any errors; otherwise, the oracle flags the trace as illegal. For a trace flagged as illegal, the oracle inspects the web application’s responses to see if the illicit request has been properly handled by the web application. A web application properly handles an illicit request when it recovers from it by, for example, redirecting the request to a legitimate web page. The oracle reports a navigation error if and only if the web application does not correctly handle the illegal trace.


Figure 5.3: A simple NSM.

The nature of the failure has to be verified by hand to decide on its kind (access-control issues, web server errors, . . . ).
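The two-step check performed by the oracle can be sketched as follows. The NSM, the event names, and the rule that a redirect counts as a proper recovery are assumptions made for this illustration; a real NSM is extracted from the web application under test.

```python
# Hypothetical NSM: navigation state -> {event: next navigation state}.
NSM = {
    "logged_out": {"login": "logged_in"},
    "logged_in": {
        "view_cart": "logged_in",
        "add_item": "logged_in",
        "logout": "logged_out",
    },
}


def first_illegal_event(nsm, initial_state, trace):
    """Return the index of the first event not allowed by the NSM, or None."""
    state = initial_state
    for index, event in enumerate(trace):
        transitions = nsm.get(state, {})
        if event not in transitions:
            return index
        state = transitions[event]
    return None


def is_recovered(status_code):
    """Assumption: a 3xx redirect means the application recovered."""
    return 300 <= status_code < 400


def oracle(nsm, initial_state, trace, status_codes):
    """Report a navigation error only for mishandled illegal traces."""
    index = first_illegal_event(nsm, initial_state, trace)
    if index is None:
        return "ok"                      # trace is in the NSM's language
    if is_recovered(status_codes[index]):
        return "ok"                      # illegal, but handled properly
    return "navigation error"


legal = oracle(NSM, "logged_out", ["login", "add_item", "logout"], [200, 200, 200])
handled = oracle(NSM, "logged_out", ["add_item", "login"], [302, 200])
mishandled = oracle(NSM, "logged_out", ["add_item", "login"], [200, 200])
```

In the last call, the application answers an out-of-place add_item request with a regular page instead of redirecting, so the oracle reports a navigation error that then has to be inspected by hand.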


6 Implementation

In Sections 6.1 and 6.2, we discuss the implementation of every component of the WebCheck system presented in Chapter 5, as well as the small refinements of the initial design that are specific to the concrete choice of web applications supported by our implementation.

6.1 Checkpointing

6.1.1 Target Applications

We focus solely on PHP-based web applications that use MySQL as their database. Note that a number of existing web applications fall in this category (Facebook.com¹, Yahoo², and Wordpress.com³), as shown also by the prominent market share of PHP (see Figure 3.4) and MySQL⁴.

6.1.2 WebCheck Proxy

During the design phase, we have seen in Figure 5.2 what our ideal proxy should look like.

First of all, we briefly address the reason why the proxy could not be implemented directly in the database driver of the web application. With PHP, every visit to a PHP script invokes a new instance of the PHP interpreter. Nothing is shared between those instances. Therefore, keeping and reusing a persistent connection to the database cannot be done, as the database driver is reset at the start of every instance. On top of that, with web servers such as Apache, the database connection pools are linked to a unique web server child process; thus, there is no guarantee that a user’s HTTP request will be serviced by the process containing the desired persistent database connection. The web server processes do not share the connections. Finally, transactions cannot span multiple database connections. These factors are the reason we chose to implement a proxy to handle the persistent connection to the database.

¹ https://www.facebook.com/
² https://www.yahoo.com/
³ https://wordpress.com/

Our initial design, as seen in Figure 5.2, is simple and lightweight: the queries are executed once, and the only overhead comes from sending the queries and results to and from the proxy.

Implementing our straightforward design is, however, not so simple when dealing with PHP web applications. This issue is specific to PHP and should not happen with other common web frameworks.

The issue specific to PHP web applications occurs when retrieving the result of a database query through the standard database driver. The result variable is stored in a PHP resource, rather than in an array or some other standard data structure. The PHP documentation states that “a resource is a special variable, holding a reference to an external resource. Resources are created and used by special functions.”⁵ This rather imprecise definition does not give more insight into what a resource exactly is. Moreover, it is also mentioned that: “As resource variables hold special handlers to opened files, database connections, image canvas areas and the like, converting to a resource makes no sense.”⁶

The main problem is that our proxy has to return the result of the database query it received, and the result of a database query should be passed as a PHP resource. There is a resource type called “mysql result” which contains the outcome of the query executed against the database⁷. Looking deeper inside the actual implementation of the PHP MySQL database driver, we discover that the “mysql result” resource is in fact a complicated C struct containing multiple arrays and variables.

Moreover, PHP resources cannot be stored persistently due to their inability to be serialized. They cannot be sent over a communication channel. Consequently, we have to use the native PHP database driver at some point to get the result from the database as there is no other way to build this resource.

The refined design of our PHP proxy is depicted in Figure 6.1.

The idea is to overcome the fact that building a PHP resource is complex and time consuming while leveraging the persistent connection from the proxy. The WebCheck proxy receives a query q from the web application and runs it against the database DB inside a persistent database connection.

Instead of directly returning the result r to the web application, the database proxy stores the result r in a temporary database DB’. The proxy creates a new query q’ to retrieve the stored result r from DB’. Afterwards, the query q’ is returned to the extended database driver of the web application and executed using the native PHP database driver. The result r is recovered as a PHP resource.

[Figure 6.1 diagram: Web Application (Extended Driver, Native Database Driver), Proxy, DB, DB’; messages 1. q, 2. q, 3. r, 4. r, 5. q’, 6. q’, 7. r]

Figure 6.1: WebCheck’s design for PHP.

⁵ http://php.net/manual/en/language.types.resource.php
⁶ http://php.net/manual/en/language.types.resource.php
⁷ http://php.net/manual/en/resource.php

In more detail, here are the specifics of the setup:

• The WebCheck proxy is written in Python. Python contains the features to easily build the proxy and simplifies the implementation and understandability of the proxy over alternatives such as C++ and Java.

• The communication channel between the extended database driver and the proxy is implemented with basic client and server sockets over TCP connections. The server listens on a predefined port and waits for incoming requests.

• The database proxy can also receive messages from the fuzzer, notifying it when to start or roll back transactions and when to set save-points. Therefore, the persistent connection to the database is always in the desired state.

• An incoming query is parsed and subsequently executed inside the persistent database connection using the standard Python MySQL connector. Afterwards, the result is parsed and a new table containing the corresponding data is created. A new query to retrieve the data stored in the temporary table is produced and sent back to the extended database driver of the web application.

• The new query that is received by the web application is executed using the native PHP database driver over a standard non-persistent database connection.

In this setup, the proxy can handle the incoming queries and successfully works as an intermediary between the web application and the database.
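The query-handling step of the proxy can be sketched as follows. This is a simplified illustration, not the WebCheck code: it uses Python's sqlite3 module in place of a MySQL connector, omits the TCP socket layer, and the function name handle_query and the webcheck_result_ table prefix are hypothetical.

```python
import itertools
import sqlite3

_result_ids = itertools.count(1)


def handle_query(main_conn, temp_conn, query):
    """Run `query` on the persistent connection and build the proxy's reply.

    Returns "true" or "false" for queries without a result set; otherwise
    stores the result in the temporary database and returns the new query q'.
    """
    try:
        cursor = main_conn.execute(query)
    except sqlite3.Error:
        return "false"
    if cursor.description is None:       # e.g. INSERT or UPDATE
        return "true"

    columns = [column[0] for column in cursor.description]
    rows = cursor.fetchall()

    # Materialize the result r in a fresh table of the temporary database.
    table = "webcheck_result_%d" % next(_result_ids)
    column_list = ", ".join('"%s"' % name for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    temp_conn.execute("CREATE TABLE %s (%s)" % (table, column_list))
    temp_conn.executemany(
        "INSERT INTO %s VALUES (%s)" % (table, placeholders), rows)
    temp_conn.commit()
    return "SELECT * FROM %s" % table


# Persistent connection (inside a transaction) and temporary database.
main = sqlite3.connect(":memory:", isolation_level=None)
temp = sqlite3.connect(":memory:")
main.execute("CREATE TABLE items (name TEXT)")
main.execute("BEGIN")

insert_reply = handle_query(main, temp, "INSERT INTO items VALUES ('book')")
new_query = handle_query(main, temp, "SELECT name FROM items")
error_reply = handle_query(main, temp, "SELECT * FROM missing_table")

# The extended driver would run q' through the native driver.
rows_seen_by_driver = temp.execute(new_query).fetchall()
main.execute("ROLLBACK")
```

The temporary database is committed independently of the persistent connection, so the driver can read the stored result over an ordinary connection while the test transaction stays open and can still be rolled back.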


6.1.3 Application-level Instrumentation

As explained in the previous section, to intercept all database queries using our proxy, we need to extend the database driver of the web application. In practice, there are two possible cases: either the web application uses its own implementation of the database driver on top of the native PHP database driver, or the web application simply uses the native PHP database driver. Both cases are handled similarly. The only difference is that in the first case, we only have to do one centralized code injection in the database driver file, whereas in the second case, we have to modify all the web application’s files with our code.

First of all, we need a brief insight into how database calls are handled in PHP. There are three different PHP APIs for accessing a MySQL database: mysql, mysqli, and PDO. Our technique can be used with each one of them. We describe it in detail for the MySQL API.

The MySQL API⁸ contains functions allowing the developer to query and retrieve data from the database. We are especially interested in the function for sending a MySQL query. The function definition⁹ is:

    mixed mysql_query ( string $query [, resource $link_identifier = NULL ] )

It can be called, for example, as follows:

    $result = mysql_query('SELECT * FROM table WHERE 1 = 1', $db_connection);

The query is directly sent to the underlying native PHP database driver, which transmits it to the database and returns the result. Our goal is to intercept the query before it is sent to the native database driver.

A PHP extension called “runkit” allows us to do exactly this: “The runkit extension provides means to modify constants, user-defined functions, and user-defined classes. It also provides for custom superglobal variables and embeddable sub-interpreters via sandboxing”¹⁰.

Listing 6.1 shows how we use the runkit extension within our system.

⁸ http://php.net/manual/en/ref.mysql.php
⁹ http://php.net/manual/en/function.mysql-query.php
¹⁰ http://php.net/manual/en/intro.runkit.php


Listing 6.1: Database driver extension code.

    // Keep the native function available under a new name.
    runkit_function_copy('mysql_query', 'mysql_query_old');

    // Redefine mysql_query; the new body is passed to runkit as a string.
    runkit_function_redefine('mysql_query', '$query, $link_identifier = null', '
        // Connect to the proxy.
        $client = stream_socket_client("tcp://127.0.0.1:12321", $errno, $errorMessage);
        if ($client === false) {
            throw new UnexpectedValueException("Failed to connect: $errorMessage");
        }

        // Send the original query to the proxy.
        fwrite($client, $query);

        // Get the response from the proxy, then close the connection.
        $response = stream_get_contents($client);
        fclose($client);

        // Queries without a result set are answered with a Boolean.
        if ($response == "true") {
            return true;
        } elseif ($response == "false") {
            return false;
        }

        // Otherwise the response is the new query; run it with the native driver.
        $newQuery = $response;
        if (is_null($link_identifier)) {
            return mysql_query_old($newQuery);
        }
        return mysql_query_old($newQuery, $link_identifier);
    ');

The initial step is to copy the function that we want to redefine to a new one with a different name. We can then redefine our target function (here mysql_query) by overwriting it with our own code. We start by creating a client socket to connect to the database proxy. When the stream is successfully opened, we send the query to the proxy. We wait for a response from the proxy containing the new query used to retrieve the result using the native PHP database driver.

There are a few cases where the query only returns a Boolean value; this happens with queries of the forms “INSERT INTO”, “UPDATE”, and so on. In those cases, we return the received Boolean value. If the response is a query, we use the copy of the mysql_query function that we created previously. Using the original function is straightforward: we pass the new query and a non-persistent database connection, if one exists, and it returns the result as a PHP resource. We close the connection to the proxy before exiting.

6.1.4 Complexity

The proposed solution for checkpointing web applications comes at a price: the added complexity brought by the additional back-and-forth exchanges with the database.


When using the native PHP database driver, there is a single interaction D with the database, making the complexity O(D). In the proposed solution, depicted in Figure 6.1, there are two additional interactions with the database, making the complexity O(3 ∗ D). In addition, there is the processing time due to the underlying actions happening within the proxy. Let P be this additional time. The complexity is, therefore, O(3 ∗ D + P). Finally, we have to take into account the communication time needed to pass the query from the extended database driver of the web application to the proxy. We define one round-trip time as C, which leaves us with a complexity of O(3 ∗ D + P + C), compared to the initial complexity of O(D) when using the default database driver.

Note that the complexity is the “per-query” complexity and each request may contain multiple queries.

6.2 WebCheck Fuzzer

6.2.1 Input Parsing

The test cases are recorded using tcpdump¹¹ with the command: “sudo tcpdump -i lo -w file_name.pcap 'port 80'”. The test cases are stored in PCAP files containing all the necessary requests to replay the execution of the test case. Initially, test cases are recorded from a user interacting with a web browser. Consequently, the traces saved in the PCAP files contain a lot of requests that are irrelevant for testing. We are testing the behavior of the web application when altering the flow of user requests. Thus, only the requests that are part of this flow are relevant. Browsing a web application through a web browser generates requests for the various images, videos, and JavaScript files to be loaded by the web browser.

Those requests are irrelevant as we are not scrutinizing the way a web page is displayed. Therefore, our first step after loading the traces from the various test case files is to filter out the irrelevant packets. The filter rejects requests for: JavaScript files, CSS files, icons, PNG images, GIF images, and JPG images to name a few.
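The filtering step can be sketched as a check on the extension of the requested path. The sketch assumes that the request URLs have already been extracted from the PCAP files; the extension set and the example URLs are illustrative, and the actual filter may cover more resource types.

```python
import os
from urllib.parse import urlparse

# Extensions of resources that do not change the application state
# (illustrative list, based on the resource types named above).
STATIC_EXTENSIONS = {".js", ".css", ".ico", ".png", ".gif", ".jpg", ".jpeg"}


def is_relevant(url):
    """Keep a request unless its path points to a static resource."""
    path = urlparse(url).path
    extension = os.path.splitext(path)[1].lower()
    return extension not in STATIC_EXTENSIONS


def filter_trace(urls):
    """Drop the requests that are irrelevant for testing."""
    return [url for url in urls if is_relevant(url)]


recorded = [
    "/login.php",
    "/static/app.js?v=3",
    "/img/logo.PNG",
    "/cart.php?add=1",
]
filtered = filter_trace(recorded)
```

Parsing the URL first keeps query strings such as ?v=3 from hiding the file extension.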

Afterwards, the unique requests used by the fuzz operators are extracted from the test cases. We focus on test cases covering as many functionalities from the web application as possible. A set of test cases that thoroughly exercise the web application will naturally result in more complete test cases generated by the fuzz operators. The test cases have to be generated accordingly.


6.2.2 Fuzzer

The proxy needs to be started when the fuzzer uses checkpointing. The proxy is an external component independent from the fuzzer. Additionally, the web application must use the extended database driver to communicate with the proxy.

The fuzzer has two distinct modes: normal mode and checkpointing mode. The normal mode simply executes all the test cases from starting message to fuzzed message without exploiting the benefits of checkpointing. The database has to be brought back to its initial state in between each round.

On the other hand, the checkpointing mode uses all the functionalities provided by the checkpointing technique. The common prefixes among the test cases are executed only once and the fuzzer can leverage the rollback of the transaction to come back to the initial state.

Listing 6.2 describes how the fuzzer alters the initial test cases to form new ones that are then executed.

Listing 6.2: Pseudo-code of fuzzer execution.

    for testcase in set_testcase
        for position in testcase
            if checkpointing_mode == true:
                # execute the shared prefix once and save the resulting state
                execute(request, position - 1)
                add(savepoint)
            for fuzz_op in set_fuzzOperators
                for unique_request in set_unique_request
                    altered_testcase = fuzz(testcase, position, fuzz_op, unique_request)
                    if checkpointing_mode == true:
                        # only the requests from position onwards are executed
                        execute(altered_testcase, position)
                    else:
                        execute(altered_testcase)

There can be hundreds of altered test cases generated from a single test case. Every time we increment the position, the resulting set of altered test cases shares a common prefix, namely the first “position - 1” requests that come before the alteration. Consequently, the longer the test case, the longer the shared prefix becomes and the more requests we can save using checkpointing.

A fuzzer instance starts by picking a test case from the set of test cases. It iterates through all the possible request positions in the trace. For each position, the fuzzer applies each fuzz operator with all the unique requests at its disposal. Given that there are P positions in a test case, F fuzz operators, and R unique requests, the total number of unique altered test cases is P ∗ F ∗ R.

At position N, the first N − 1 requests are the same for all the altered test cases. With checkpointing, instead of running those N − 1 requests for all F ∗ R altered test cases, we execute them only once and create a save-point. We can then roll back to this save-point for every altered test case and save:

(N − 1) ∗ (F ∗ R − 1) requests.

The bigger the N, the more time is saved.
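The saving at a single position can be worked out with a few lines of arithmetic. The sketch below only counts prefix requests, under the assumption that the prefix is executed once per position in checkpointing mode; the values chosen for N and R are hypothetical, while F = 5 corresponds to the five fuzz operators of Section 5.2.3.

```python
def prefix_executions(n, f, r, checkpointing):
    """Number of prefix requests executed at position n.

    n: position of the alteration (the shared prefix has n - 1 requests)
    f: number of fuzz operators
    r: number of unique requests
    """
    prefix_length = n - 1
    altered_testcases = f * r
    if checkpointing:
        return prefix_length                     # executed once, then restored
    return prefix_length * altered_testcases     # replayed for every test case


def saved_requests(n, f, r):
    """Prefix requests saved at position n thanks to checkpointing."""
    return (prefix_executions(n, f, r, checkpointing=False)
            - prefix_executions(n, f, r, checkpointing=True))


# Hypothetical numbers: position 10, 5 fuzz operators, 20 unique requests.
without_checkpointing = prefix_executions(10, 5, 20, checkpointing=False)
with_checkpointing = prefix_executions(10, 5, 20, checkpointing=True)
saved = saved_requests(10, 5, 20)
```

With these numbers, the 9 prefix requests are executed once instead of 900 times; at position 1 the prefix is empty and nothing is saved.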

6.2.3 Fuzz Operators

We use fuzz operators that alter traces of requests. Traces are stored in lists; consequently, fuzz operators are operations on lists, modifying their content or order. The Python implementation of the fuzz operators is described in Listing 6.3.

Listing 6.3: Python implementation of fuzz operators.

    def fuzz_op(operator, testcase, position, unique_request):
        # Work on a copy so that the caller's test case is never modified.
        trace = list(testcase)
        altered_testcase = None  # stays None when the operator cannot be applied

        if operator == "INSERT":
            altered_testcase = trace[:position + 1] + [unique_request]

        elif operator == "REPEAT":
            altered_testcase = trace[:position + 1] + [trace[position]]

        elif operator == "SWAP" and position < len(trace) - 1:
            # swap the request at position with its successor
            trace[position], trace[position + 1] = trace[position + 1], trace[position]
            altered_testcase = trace[:position + 2]

        elif operator == "REMOVE":
            del trace[position]
            altered_testcase = trace[:position + 1]

        elif operator == "REPLACE":
            trace[position] = unique_request
            altered_testcase = trace[:position + 1]

        return altered_testcase

6.2.4 Oracles

Our fuzzing framework makes use of two oracles: a generic oracle and an application-specific oracle.

The generic oracle inspects the content of the responses received from the web application. We check for three kinds of errors:

• Server errors: happen when the status element in the header of the response is 500. It means that the server was confronted with an unexpected condition and could not complete the execution of the request.

• PHP errors: happen when the PHP interpreter encounters an unforeseen event and does not handle it correctly.


• Database errors: happen when the PHP interpreter faces difficulties while accessing the database. It results in a database error that can potentially leak important information on the database schema.
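The three checks of the generic oracle can be sketched as simple pattern matches on the response. The regular expressions below are assumptions based on commonly seen PHP and MySQL error messages; they are not the patterns used by WebCheck and would need tuning for a given application.

```python
import re

# Assumed patterns for PHP interpreter messages and database error messages.
PHP_ERROR = re.compile(
    r"\b(Fatal error|Parse error|Warning|Notice)\b.*\bon line\b")
DB_ERROR = re.compile(
    r"You have an error in your SQL syntax|SQLSTATE\[")


def classify_response(status_code, body):
    """Return the kinds of generic errors found in one HTTP response."""
    findings = []
    if status_code == 500:
        findings.append("server error")
    if PHP_ERROR.search(body):
        findings.append("PHP error")
    if DB_ERROR.search(body):
        findings.append("database error")
    return findings


server = classify_response(500, "")
php = classify_response(
    200,
    "<b>Warning</b>: Division by zero in <b>/var/www/cart.php</b> "
    "on line <b>12</b>")
database = classify_response(
    200, "You have an error in your SQL syntax; check the manual")
clean = classify_response(200, "<html>ok</html>")
```

A response can trigger several checks at once, which is why the function returns a list rather than a single label.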

The application-specific oracle is a bit more complex and works in tandem with the generic one. In Section 5.2.5, we described our idea to build navigation state machines (NSM) for each web application.

The NSMs are extracted manually from the web application. We navigate through the web pages and extract meaningful events as transitions between states. The NSM extraction takes a few hours for simple web applications, but it does not scale well to larger web applications, which are considerably more complex. Automatic tools (e.g. web crawlers or model-based testing tools) can help with NSM generation, but this is out of the scope of this thesis.

We start from the default page of the web application, for example the index page or login page, and select this to be the initial state of the NSM. Afterwards, we focus on determining which requests or events are central to the web application and can initiate a change of state.

Those events have in common that they usually are prerequisites for other events, for example a login event has to happen before accessing some private data. Thus, requests can be categorized into two sets, the ones which are relevant to the flow of the web application and the ones which are not.

The relevant requests make the transitions between the states whereas the others represent “transitions” to the same state. The requests from the second category usually have a request of the first category as a prerequisite but are never the prerequisite of another request. The requests of the first category are the prerequisite of requests of both categories.

The altered test cases generated by the fuzz operators contain a trace of requests that is analyzed by the oracle using the NSM of the corresponding web application. In many cases, the oracle would issue false positives, as altered traces often fall outside of the NSM's language. To solve this issue, the oracle examines the responses returned by the web application to see whether the illegal sequence of events is handled properly. For example, trying to access private content before being logged into the web application should redirect to the login page. Using this method, the oracle reports a failure only when the web application handles an out-of-place request inappropriately.
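The interplay between the NSM and the response check can be sketched as follows. The state names, request names, the REQUIRES table and the function nsm_oracle are hypothetical; the sketch assumes that each request in a trace is already paired with a flag telling whether the response showed a proper rejection (for example, a redirect to the login page).

```python
# Hypothetical NSM for a small shop: (state, request) -> next state.
# Requests without an entry are "irrelevant" and keep the current state.
NSM = {
    ("anonymous", "login"): "logged_in",
    ("logged_in", "logout"): "anonymous",
}
# Hypothetical prerequisites: the state a request requires.
REQUIRES = {"view_orders": "logged_in", "logout": "logged_in"}


def nsm_oracle(trace, initial_state="anonymous"):
    """Return positions of out-of-place requests that were not rejected.

    `trace` is a list of (request, rejected) pairs. The `rejected` flag is
    only consulted for requests that fall outside the NSM's language.
    """
    state, failures = initial_state, []
    for position, (request, rejected) in enumerate(trace):
        required = REQUIRES.get(request)
        if required is not None and required != state:
            # Out-of-place request: a failure only if the application let it through.
            if not rejected:
                failures.append(position)
            continue  # simplification: an illegal request leaves the state unchanged
        state = NSM.get((state, request), state)
    return failures


fuzzed = [("login", False), ("view_orders", False),
          ("logout", False), ("view_orders", False)]
print(nsm_oracle(fuzzed))
```

Here the second view_orders request arrives after logout; because its response was not a rejection, the oracle reports position 3 as a suspected navigation error.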

The two oracles allow the WebCheck fuzzing framework to uncover specific kinds of failures, namely web server errors and navigation errors. It is important to note, however, that the set of failures found is only as good as the oracle being used. The more effort is put into the design of the oracle, the better the results we can expect from the fuzzing framework as a whole.

7 Empirical Evaluation

In this chapter, we state our main hypotheses and evaluate them through a set of experiments. For each hypothesis, we describe the corresponding case study and review the results.

7.1 Hypothesis

In this thesis, (1) we study the problem of checkpointing web applications and (2) we build a fuzzing framework leveraging checkpointing of web applications to increase the efficiency of running large test suites. By increasing the efficiency, we mean running more test cases in a given amount of time.

Concretely, we investigate the following hypotheses:

• H1: Applying checkpointing techniques to web applications is feasible.

• H2: Web application checkpointing reduces the time for running test suites.

• H3: Our fuzz testing approach effectively exposes web application errors.

7.2 Experimental Setup

The experiments are conducted on the following three PHP web applications:

• TomatoCart¹ 1.1.8, an open source professional eCommerce solution (325k lines of code).

• BambooInvoice² 0.8.9, an invoicing software intended for small businesses (79k lines of code).

• Cuteflow³ 2.11.2, a web-based open source document circulation and workflow system (50k lines of code).

¹ http://www.tomatocart.com/
² http://www.turnkeylinux.org/bambooinvoice
³ http://cuteflow.org/

The web applications use a MySQL database (Version 14.14; Distribution 5.5.41) as well as the original PHP MySQL API. We use PHP version 5.5 with the Zend Engine version 2.5.0; in addition, the runkit module has to be installed. The web server runs Apache version 2.4.7 on a Linux Ubuntu machine (kernel version: 3.13.0-48-generic). The machine's specifications are: Intel Core i7-4600U (3.30GHz), 8GB DDR3L SDRAM 1600 MHz, 256GB SSD.

7.3 Evaluation of H1

Our first experiment investigates whether checkpointing web applications is feasible. We compare the outcome of the following two use cases:

1. Run a test case from start to finish.

2. Restore a snapshot of the web application and then execute only the last request.

We run this experiment on the TomatoCart web application, as it offers a suitable scenario and is a mature, well-tested product. The test case proceeds as follows: a client logs into the web application, adds an item to the cart, and proceeds to the checkout process. The snapshot is taken just before entering the checkout process, while the test case is being recorded from the browser, which makes it easy to detect whether checkpointing the web application works.

If the checkpointing works, executing the last request brings the client to the checkout process, logged in and with an item in the cart. If the checkpointing does not work, the client is redirected to the login page of the cart application.

Results We run 100 test cases with and without checkpointing. In all cases, the state was restored properly.

Discussion This small experiment shows that checkpointing web applications is indeed feasible. Restoring the pair <session data; database data>, which precisely captures the state, before the last request, proved to be sufficient to trick the web application into thinking that the events leading to this state happened previously.
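The logic of this experiment can be illustrated with a toy model. The class ToyShop below is a hypothetical in-memory stand-in for a web application, not TomatoCart or WebCheck; it only assumes, as stated above, that the state is the pair of session data and database data.

```python
import copy


class ToyShop:
    """Toy stand-in whose state is session data plus database data."""

    def __init__(self):
        self.session = {}              # per-client session data
        self.database = {"cart": []}   # persistent data

    def handle(self, request):
        if request == "login":
            self.session["user"] = "alice"
        elif request == "add_item":
            self.database["cart"].append("book")
        elif request == "checkout":
            if "user" not in self.session:
                return "redirect:login"
            return "checkout:%d item(s)" % len(self.database["cart"])
        return "ok"

    def snapshot(self):
        return copy.deepcopy((self.session, self.database))

    def restore(self, snap):
        self.session, self.database = copy.deepcopy(snap)


test_case = ["login", "add_item", "checkout"]

# Use case 1: run the test case from start to finish.
full = ToyShop()
full_outcome = [full.handle(r) for r in test_case][-1]

# Record a snapshot just before the last request.
recorder = ToyShop()
for r in test_case[:-1]:
    recorder.handle(r)
snap = recorder.snapshot()

# Use case 2: restore the snapshot in a fresh instance and
# execute only the last request.
restored = ToyShop()
restored.restore(snap)
restored_outcome = restored.handle(test_case[-1])

print(full_outcome, restored_outcome)
```

Both use cases reach the checkout with one item in the cart, while a fresh instance without the restored state would answer the same request with a redirect to the login page.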

7.4 Evaluation of H2

In our second set of experiments, we investigate the time required to run test cases with and without checkpointing. We divide our experiments into two parts.

7.4.1 First Set of Experiments

Design of Experiments

We compare our checkpointing solution to various techniques that can be used to execute test cases, namely: Selenium; executing the test cas
