JMF: Java Measurement Framework

Language-supported Runtime Integrity Measurement

Mark Thober

The Johns Hopkins University Applied Physics Laboratory

Laurel, MD, USA [email protected]

J. Aaron Pendergrass

The Johns Hopkins University

Applied Physics Laboratory Laurel, MD, USA

[email protected]

Andrew D. Jurik

The Johns Hopkins University

Applied Physics Laboratory Laurel, MD, USA [email protected]

ABSTRACT

Runtime integrity measurement systems provide the capability to observe the runtime state of a process and to determine whether or not it is acceptable. Existing software systems tend to forgo integrity checks altogether or to enlist static mechanisms (e.g., assertions) to detect unacceptable process states at runtime. A large and growing base of malicious software necessitates more sophisticated handling of threats to process integrity.

In this paper, we describe an approach to runtime integrity measurement we call the Java Measurement Framework (JMF) that presents a new way to define and check runtime integrity policies. We define a policy language based on Java that provides an accessible way to write integrity policies, and we describe a periodic, dynamic measurer that obtains snapshots of process state, which are evaluated with respect to a policy by an appraiser. With full process state available to the appraiser, policies can express rich relationships between multiple objects, thereby detecting abnormalities in an application’s data structures. Our framework may be used to detect a powerful adversary who has the capability to modify both the runtime bytecode and data structures of Java applications. We show that our prototype implementation in Java has acceptable overhead and that it can be used to detect runtime integrity violations in several real Java programs.

Categories and Subject Descriptors

D.2.4 [Software Engineering]: Software/Program Verification; D.4.6 [Operating Systems]: Security and Protection

General Terms

Measurement, Security

Keywords

Runtime integrity measurement, integrity policy, attestation, Java

1. INTRODUCTION

People depend upon computing devices to store, manipulate, transmit, and visualize data. A major risk of this dependence is that platform software components can be altered unexpectedly either by accidental error or malicious act, leading to concerns about the integrity of the software. A piece of software (e.g., a running process) is said to have integrity if it runs without improper system alterations [5]. Stakeholders would like to base their access control decisions (e.g., to restrict access to particular data or network resources for a given process) on the operational integrity of the process, granting access to critical resources only when the process that is running is the right one and is executing as it should (e.g., ensuring that the correct program is started and continues to remain in place).

We categorize solutions that help recognize an altered or misbehaving system as either measurement systems or monitoring systems. A measurement system attempts to determine whether a process is valid by periodically examining its current state, independent of any previous behavior. In contrast, a monitoring system abstractly models the execution of a process, reflecting its state transitions in the monitor’s model. A monitoring system, for example, may embed runtime assertion checks into the process to ensure that relevant invariants hold at specific points during execution.

Measurement systems may be classified as either load-time or runtime systems. Load-time measurement systems verify only that the correct software is loaded. These systems (e.g., [21, 35]) are typically based on computing a cryptographic hash of the program image as it is loaded. Since load-time measurements do not mitigate the threat of runtime attacks, runtime measurement is an important technique for determining trustworthiness. Runtime measurement systems attempt to establish that the current state of the target process is consistent with its expected execution.

Several techniques and systems have been developed to perform runtime integrity measurement [26, 29, 30]. The focus of these systems is primarily on measurement, which extracts the relevant measurement data from the measurement target, with relatively little emphasis on appraisal, which determines if a measurement is accepted by some policy.

The main goals of our work are to establish a runtime measurement framework that focuses on appraisal policies, and to explore programming language support for the definition and application of appraisal policies. We call our approach the Java Measurement Framework (JMF), which provides a mechanism for defining and checking runtime integrity policies on Java programs. Java is a suitable language for our purposes because the object-oriented nature of the language provides clear representation of data structures both in the source code and the runtime environment. We define an integrity policy language that provides for the writing of policies that are based on program source code. Integrity policies consist of a set of invariants that specifies the set of “acceptable” or “valid” runtime states for the given program.

We have implemented an end-to-end prototype upon the OpenJDK 7 [37] implementation of the Java runtime environment. Our implementation provides (1) a measurer that dumps the heap, thread stack traces, and loaded code (collectively known as the measurements) from a running Java process, (2) a policy compiler that takes a policy and generates the source code for an appraiser, and (3) an appraiser harness that runs the policy compiler output on the measurements to determine whether the policy is satisfied or not.

The main contributions of this paper are:

• A novel design and implementation of a system for creating policies, measuring Java processes, and appraising the policies with respect to measurements without modifying or directly annotating application source code.

• A presentation of case studies showing sample policies in real Java applications that can be used to detect actual malicious modifications.

• An evaluation that demonstrates our measurement approach has sufficiently low overhead for practical purposes.

2. ADVERSARIAL MODEL

In order to further motivate our measurement and appraisal approach, we consider the adversary’s capabilities and we describe the advantages of using JMF to detect application compromise. An important aspect of this work is that our measurement system does not assume that a program always executes according to the language semantics. We assume that an attacker is able to modify the runtime bytecode of a Java application, as well as any of its runtime data. Such malware is embedded in extant software and can be difficult to detect.

By not trusting the bytecode of the application, we can detect a more powerful adversary than monitoring systems that use inline code instrumentation (e.g., [4, 13]). Since the adversary can alter such bytecode, he can simply remove the necessary portions of the monitor to obtain the desired effect, rendering the monitor useless. Such modifications will be detected by JMF.

Recent works [27, 41] have described the dangers of Java-based rootkits, which can enable malicious behavior at the level of the Java application that is running it. By modifying the bytecode of the application, an adversary can insert malicious behavior. Consider the following example code that authenticates a user by a password and includes a malicious modification.

1: public User authenticate(Auth auth) {
2:     String user = auth.getUsername();
3:     String password = auth.getPassword();
4:     SendToURL("attacker.com",user+password);
5:     ...
6: }

The call to SendToURL, highlighted on line 4, shows a modification of the application that will send the username and password to an external site (we show this modification in the source code instead of bytecode to improve readability). Such malicious code modifications can provide a full range of capabilities for an attacker, which is particularly dangerous for applications on servers running with privilege. In addition, since Java provides ample code reuse through libraries, modifications to these libraries can result in the automatic infection of all Java programs on the platform (consider changes to the System.out.println() method).

Although modifications to the bytecode may give the attackers more apparent abilities, changing critical data structures of a runtime application can provide varying degrees of functionality to an attacker, depending on the application. For example, data structures in an FTP server hold the access rights of each user, and modifying these data structures can give a user access she shouldn’t have, or create a new user entirely. Such a modification does not require changes to the bytecode of the application, but can still give an attacker access to every file on the server.

Since measurement is periodic, it is most suitable for discovering policy violations in a target process’s data structures that are long lived and likely to appear in most measurements. Identifying locations in software that are subject to such problems is left to the policy writer and would likely be guided by the magnitude and probability of data loss should an application become compromised.

We assume that the runtime environment (i.e., the JVM and operating system) is trusted when the measurement is taken, so that we can be sure the measurement reflects the actual runtime state of the application. We also assume that the measurement output is secured, for example, by relying on a chain of trust provided by a Trusted Platform Module. Other measurers can be used to justify the trustworthiness of the runtime environment, to provide trust in the JMF measurement. Prior work has shown how to measure the runtime integrity of an OS kernel [26, 29]; we believe the kernel integrity concepts can be expanded to the JVM as well. For the purposes of this paper, we assume that the runtime environment is either measured with such a technique or is inherently trusted.

We acknowledge the possibility, however, that the environment may have suffered an attack at a time prior to the measurement, which may have contributed to the modification of the Java application. Other possible attacks may arise due to class path attacks that load alternate classes, native code, or application bugs. An example of an application bug is code that returns references to mutable objects that are intended to be private. This permits unintended modification of data structures from an external (possibly untrusted) caller; this kind of data structure modification can be detected by JMF.
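The application bug mentioned above can be illustrated with a minimal sketch. The AccessTable class and its methods are hypothetical names chosen for illustration; they do not come from any application studied in this paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class, used only to illustrate the bug pattern.
final class AccessTable {
    private final List<String> admins = new ArrayList<>();

    AccessTable() {
        admins.add("root");
    }

    // Buggy: returns the internal list itself, so any caller can mutate it.
    List<String> getAdminsLeaky() {
        return admins;
    }

    // Safer: callers receive a read-only view of the same data.
    List<String> getAdminsSafe() {
        return Collections.unmodifiableList(admins);
    }

    boolean isAdmin(String user) {
        return admins.contains(user);
    }
}

final class LeakDemo {
    public static void main(String[] args) {
        AccessTable table = new AccessTable();
        // An external caller silently extends the private data structure.
        table.getAdminsLeaky().add("mallory");
        System.out.println(table.isAdmin("mallory"));
    }
}
```

No bytecode changes are needed for the leaky accessor to be abused, which is why a policy over the data structure itself (rather than over the code) is the relevant check for this class of problem.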

3. INTEGRITY POLICIES

Integrity policies represent the mechanism through which JMF is able to recognize aberrant program state. We define a syntax for writing integrity policies in Figure 1. The syntax is similar to invariant specifications in Design By Contract [28] languages such as JML [25]. We chose to use a new syntax here as it simplifies the presentation, so we can focus only on invariants, and not on the other parts of these specification languages. In the future, we plan to implement a version that is compatible with JML.

qual ::= ε | x.m | ¬(qual) | qual ∨ qual | qual ∧ qual      qualifier
spec ::= ε | ∀x:τ ∈ H | ∃x:τ ∈ H | spec, spec               specification
fdef ::= boolean f(C x){s}                                  function def.
ρ    ::= spec, qual ⊢ f(e)                                  policy

Figure 1: Integrity policy syntax

In our syntax, a policy consists of a number of specifications and qualifiers, and an application of a boolean function. The specifications describe the types of objects on the heap to which the policy applies and the qualifiers describe locations in the execution where the policy must hold. The boolean function can be computed for objects matching the specification if all the qualifiers are satisfied. The definition of a boolean function fdef is consistent with the definition of methods in Java, only it must return a boolean. Thus, a function has a list of formal parameters, each with its own type C and identifier x, and a body consisting of a sequence of statements {s}. Since appraisal of a measurement is purely functional (i.e., taking a measurement and producing a yes/no answer), the appraiser must not alter the measurement and for this reason, our implementation does not allow method calls on objects in the measurement, as a method may mutate the state of the object. The statements in the function definitions cannot contain method invocations of the input arguments, and all object fields must be accessed directly.
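To make the reading of a policy concrete, the following sketch evaluates a policy of the shape "for all x of a given type, unless one of the listed methods of x is executing, f(x) must hold" over a toy snapshot. All types here (HeapObject, DumpedFrame, Snapshot, PolicyEvaluator) are hypothetical stand-ins invented for illustration; they are not the data structures of the JMF appraiser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical snapshot model; these types are illustrative, not JMF's.
final class HeapObject {
    final long id;
    final String className;
    HeapObject(long id, String className) { this.id = id; this.className = className; }
}

// One frame of a dumped thread stack: which object's method was executing.
final class DumpedFrame {
    final long receiverId;
    final String method;
    DumpedFrame(long receiverId, String method) { this.receiverId = receiverId; this.method = method; }
}

final class Snapshot {
    final List<HeapObject> heap = new ArrayList<>();
    final List<DumpedFrame> frames = new ArrayList<>();

    // Qualifier x.m: true when method m of object x appears on some stack.
    boolean executing(HeapObject x, String m) {
        for (DumpedFrame f : frames) {
            if (f.receiverId == x.id && f.method.equals(m)) return true;
        }
        return false;
    }
}

final class PolicyEvaluator {
    // Evaluates: forall x:type, not(x.m1 or ... or x.mk) |- f(x)
    static boolean appraise(Snapshot s, String type, List<String> exempt,
                            Predicate<HeapObject> f) {
        for (HeapObject x : s.heap) {
            if (!x.className.equals(type)) continue;   // spec: only objects of this type
            boolean busy = false;
            for (String m : exempt) busy |= s.executing(x, m);
            if (busy) continue;                        // qualifier false: check is skipped
            if (!f.test(x)) return false;              // invariant violated
        }
        return true;
    }
}
```

The evaluator only reads the snapshot, which mirrors the requirement that appraisal be purely functional.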

Measurement systems may use a baseline measurement, taken while the process is known to be in a good state, as a parameter to appraisal. Hence, the H specification is included in our syntax definition to allow support for separate measurement and baseline heaps. In the remainder of this paper, we will refer to the measurement target heap as H_T and the baseline heap as H_B; when a baseline is not part of the policy, we assume the heap in question is the target heap, and shall omit it from the policy and simply write ∀x:τ. Baseline policies are discussed further in Section 3.2.

The integrity policy language of JMF enables a partial solution to the measurement quiescence problem [39]: how to measure a target when it is executing in some critical section (e.g., updating a field). The qualifiers qual allow particular policy constraints to be enabled or disabled for a given set of objects if the specified methods of those objects are currently executing. This capability of our policy specification system permits object invariants to be temporarily violated while the object is being updated. In particular, this provides a means to avoid race conditions between the target and the measurer when object updates are not atomic in the underlying system. Our approach assumes that the temporary violation of invariants occurs at the granularity of methods rather than statements. In other words, it must be acceptable to disable the invariant check for the entire body of the method, so the method must not include statements for which a broken invariant could be problematic. If this does present a problem, a programmer can refactor the problematic code in question into separate methods. Alternatively, we could extend our policy syntax to permit qualifiers at the statement granularity, as the information is already available within the running JVM; however, we believe this is generally against the spirit of object-oriented programming, and a more logical grouping of the code into distinct methods is a better solution.

Based on the policy, the appraiser can determine if any part of the measurement target is in a critical section merely by looking at the measurement. Hence, our solution to the quiescence problem does not require the target to actively participate in measurement, avoiding the risk that a compromised target can clean up before a measurement. The other half of the coherency problem is atomicity, i.e., that the measurement must reflect the state of the target at a particular moment in time. Our framework assumes that the underlying system is able to fulfill the atomicity requirement by providing a snapshot of the target heap and stack; our implementation, described in Section 4, fulfills this requirement.

One of the useful aspects of our framework is that the policy is orthogonal to the source code of the application, so the application does not need to be recompiled. This supports the writing of policies for legacy applications and libraries. This also provides flexibility in that any number of different policies may be written for the same application, based on the needs of the appraiser. Inline monitoring approaches such as MOP [13] lack such flexibility and require recompilation of the source code to embed the monitor in the application.

Writing accurate integrity policies remains the responsibility of the policy writer, and JMF is not intended to determine if a policy is the best policy for a given program. However, we do aim to provide a useful mechanism to help programmers write maintainable integrity policies for their programs.

3.1 Simple Example Policy

We now present a simple example policy to show how a useful policy is written; in Section 4 we describe how this policy is enforced in the implementation. Realistic policies written on real Java applications are discussed in Section 5. The following policy (and corresponding function definition) states that every LinkedList object must have the same number of nodes on the list as stated by the size field in the object, except when the add or remove methods are in the process of manipulating the list.¹

∀l:LinkedList, ¬(l.add ∨ l.remove) ⊢ correctListSize(l)

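boolean correctListSize(LinkedList list) {
    int count = 0;
    Entry node;
    // Walk the list from the header until the traversal returns to it.
    for (node = list.header.next; node != list.header; node = node.next) {
        count++;
    }
    // The invariant holds when the node count matches the recorded size.
    return count == list.size;
}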
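For readers who wish to experiment with this function outside of JMF, the following self-contained sketch runs the same check against a toy model of a dumped list. ListEntry and ListSnapshot are hypothetical stand-ins that expose only fields, in keeping with the rule that policy functions access fields directly; they are not the classes used by the JMF appraiser.

```java
// Hypothetical stand-ins for list objects parsed from a heap dump.
final class ListEntry {
    ListEntry next;
}

final class ListSnapshot {
    final ListEntry header = new ListEntry();  // sentinel node of a circular list
    int size;

    ListSnapshot() {
        header.next = header;                  // empty list: header points to itself
    }
}

final class ListPolicy {
    // Same logic as correctListSize, using field access only.
    static boolean correctListSize(ListSnapshot list) {
        int count = 0;
        for (ListEntry node = list.header.next; node != list.header; node = node.next) {
            count++;
        }
        return count == list.size;
    }

    // Helper for experiments: splices a node in without updating size,
    // imitating a modification that bypasses the list's own methods.
    static void spliceWithoutCount(ListSnapshot list) {
        ListEntry e = new ListEntry();
        e.next = list.header.next;
        list.header.next = e;
    }
}
```

A production appraiser would likely also bound the traversal, since a corrupted list that never returns to its header would otherwise cause this loop to fail or run indefinitely.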

¹ The Java LinkedList class includes other methods for adding/removing elements that should be in the qualifier; these are omitted for brevity.

3.2 Baseline Policies

Baselining allows the appraisal policy to be more easily parameterized by values that are either unknown until runtime or may be tedious to enumerate explicitly. For example, one may wish to have a policy on a server application stating that the data structures holding the user access information are the same as in a known-good program state, as in the following policy.

∀l1:User ∈ H_T, ∃l2:User ∈ H_B ⊢ HasSameAccess(l1, l2)

boolean HasSameAccess(User l1, User l2) {
    return ((l1.name == l2.name) &&
            (l1.access == l2.access));
}
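The quantifier structure of this policy (every target object must have some matching baseline object) can be sketched as a nested loop. UserRecord and BaselinePolicy are hypothetical names, and comparing fields by value with Objects.equals is an assumption made for this sketch, on the grounds that objects read from two separate dumps cannot share identity; how the JMF appraiser interprets == on dumped fields is not specified here.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical record of a User object as read from a heap dump.
final class UserRecord {
    final String name;
    final String access;
    UserRecord(String name, String access) { this.name = name; this.access = access; }
}

final class BaselinePolicy {
    // Assumption: fields are compared by value across the two heaps.
    static boolean hasSameAccess(UserRecord a, UserRecord b) {
        return Objects.equals(a.name, b.name) && Objects.equals(a.access, b.access);
    }

    // forall t in target heap, exists b in baseline heap: hasSameAccess(t, b)
    static boolean appraise(List<UserRecord> target, List<UserRecord> baseline) {
        for (UserRecord t : target) {
            boolean matched = false;
            for (UserRecord b : baseline) {
                if (hasSameAccess(t, b)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) return false;   // a target user has no baseline counterpart
        }
        return true;
    }
}
```

Under this reading, a user that exists only in the target heap, or whose access field differs from the baseline, causes appraisal to fail, while baseline users that are absent from the target do not.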

The addition of baseline heaps does not add completely new capabilities over other specification languages like JML. One could write a policy where the function takes the baseline heap as input and parses it to produce the same functionality. However, we submit that the capability to compare two heaps is a useful layer of abstraction for writing practical appraisal policies; the appraisal system is then responsible for the comparison of heap objects, which is more desirable than placing the burden on the policy writer.

A useful feature of baselining is the ability to generate a baseline from a static configuration file. In the user access example, we may want to ensure that all User objects in the runtime have a corresponding entry in a configuration file. A separate baselining program could generate a baseline heap H_B from this static configuration file, so the server would not actually need to be booted to obtain a baseline.

3.3 Class-based Policies

We choose to enforce policy at the level of classes, rather than applying different policies to each individual object instance. Class-based policies simplify the design of the policy syntax without any loss of expressiveness in describing policy. It also aligns with related work on class invariants (see Section 8.3).

To write a policy about specific object instances, the objects must be distinguishable at the class level; the programmer must either use static definitions to declare variables that are effectively global to that class, and not per-instance (policy applying to that class will be specific to those variables), or use subclassing to create a unique class that is only instantiated once, and then write policy about that subclass. For example, one may write a policy that applies to particular static fields of a class. Consider the following example Java code that defines a class Globals, which contains two linked lists.

class Globals {
    static LinkedList inUse;
    static LinkedList notInUse;
}

A policy that enforces the inUse list and notInUse list to be disjoint is as follows:

⊢ disjointLists(Globals.inUse, Globals.notInUse)

The corresponding function definition of disjointLists has been omitted for brevity; it simply compares the elements of the two lists and returns false if any entries are the same and true otherwise. This policy applies only to the two LinkedLists that are contained in Globals. The alternative way to write policies at a finer granularity is to derive a subclass and write a policy about the subclass. A class may be defined as follows:

class DisjointLinkedList extends LinkedList {}

A policy for objects of this class may be as follows:

∀l1:DisjointLinkedList, ∀l2:DisjointLinkedList ⊢ disjointLists(l1, l2)

Note that the programmer must be sure to declare objects of this new subclass type when this policy is to be enforced, which is beneficial in that it makes the policy decisions explicit and observable in the API, though it may require the programmer to rewrite some code.
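Since the definition of disjointLists is omitted, the following is only one plausible sketch of it, written over ordinary live Java lists rather than dumped heap objects. Treating "the same" as reference identity is an assumption of this sketch, as is the class name DisjointSketch.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class DisjointSketch {
    // Returns false if any element object appears in both lists, true otherwise.
    // Assumption: "the same" means reference identity, not equals().
    static boolean disjointLists(List<?> a, List<?> b) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        seen.addAll(a);
        for (Object o : b) {
            if (seen.contains(o)) return false;
        }
        return true;
    }
}
```

Under this sketch a non-empty list is never disjoint from itself, so a policy that quantifies over all pairs of lists would need to skip the case where both variables bind the same object.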

4. JMF DESIGN AND IMPLEMENTATION

We have implemented our Java Measurement Framework using the Java language and runtime environment provided by OpenJDK 7 [37]. Figure 2 depicts the key aspects of the implementation, including the input of a policy, measurement of the process during runtime, and appraisal of the measurement results with respect to the policy. In this section we describe each of the components in detail.

4.1 Policy Creation

In order to measure and appraise an application, an integrity policy must be written using the specification language described in Section 3, describing the important aspects of the data structures of the target application. The upper right portion of Figure 2 shows the flow of the policy into the system.

The policy compiler transforms the policy directly into Java source code that integrates into the appraiser. The compiler consists of a lexer, parser, and transformation filter written in OCaml and outputs the source code for a Java class. This source code is merged with auxiliary Java code that parses Java application heaps and stacks; the code is then compiled to produce the policy appraiser described in Section 4.3.

4.2 Measurement

A runtime measurement of a Java process consists of three parts: a heap dump, thread stack dumps, and a dump of the hashes of loaded classes. We have implemented a stand-alone tool, jmack, that produces these measurements by attaching to a running JVM at any time after the process has been started. jmack is based on the JVM monitoring tools jmap and jstack. jmack is invoked with the process ID of the target process and the desired location of the output measurements.

In order to obtain bytecode hashes of the JVM’s loaded classes, atomic heap and stack trace dumps, and the benefits of measuring the target process in parallel while it runs, the target process must be run using our modified version of OpenJDK 7 so that the VM command is recognized when jmack invokes it within the target. Outside of running the necessary version of java, jmack requires no further support from the compiler, class files, or target program to run on legacy software. Most annotation solutions (e.g., [12]) require both access to and recompilation of source code. We later describe our modifications to the JDK that permit jmack measurements and also improve performance of the target during measurement; an unmodified JVM does possess the capability to dump the heap and stack traces to capture the same state, but the target process must suspend itself while the measurements occur.

4.2.1 Naïve Measurement

The jmack functionality appears in the center of Figure 2. At its core, jmack extracts a heap dump and thread stack traces (augmented with local variable information) from the JVM (note that these are the heap/stack for the Java application maintained by the JVM, not the heap/stack of the JVM maintained by the OS). It also outputs SHA-1 hashes of the class constant pool bytecodes and method bytecodes of loaded classes from the permgen space of the target, so the appraiser may compare them with class file definitions.

[Figure 2 diagram. Labeled components: integrity policy file, policy compiler, policy appraiser, class appraiser, reference class files; target process JVM with target heap, target stack, and permgen space; jmack outputs (heap dump, stack dump, loaded classes); each appraiser reports success or failure.]

Figure 2: JMF system diagram. The policy is written and compiled into an appraiser program. The target Java process is periodically measured at runtime, and the appraiser is subsequently run on the measurement outputs and applicable policies.

We have combined the collection of all three measurements in one tool to provide an atomic snapshot, so the entire measurement represents a particular moment in time. Our initial implementation achieves this atomicity by pausing the Java process while taking the measurement. This is clearly undesirable, since the time to extract the measurement is linear in the size of the heap, stack, and permgen space; in the case of larger applications, this can take several seconds.

4.2.2 Improving Measurement Performance

The manner in which the JDK is built plays a significant role in its performance properties. We observed that the product build is roughly five to ten times as fast as the debug build, and we report all results using the product build. The manner in which the heap and stack traces are dumped plays an equally significant role in performance. The JVM includes several ways of dumping heap and stack information, each of them tailored for particular scenarios. One way to dump the heap and stacks is coded in Java and forces a heap dump at a point in time when a process isn’t cooperating, but can leave objects unresolved. An object is unresolved when it has an ID but the detailed object information is not found within the heap snapshot; the presence of unresolved references indicates a problem with the heap snapshot file (i.e., the hprof file).

An alternative, native code path uses C++ code, which not only speeds up the measurement but also provides more accurate heap dumps (i.e., fewer unresolved objects). In the default implementation of OpenJDK, however, the native code path will only measure a process once because the domain socket used for inter-process communication is unlinked and not recreated (presumably because it assumes a dump will only be taken once, when something bad happens, as in a program crash). We modified the JVM to circumvent that limitation to allow unlimited dumps throughout the lifetime of the process.

Using the product build and native code to extract measurement information improves performance, but the target process still must be paused for an amount of time linear in the size of the heap, thread stack traces, and methods. To further minimize the impact of measurement on the target process, we add a new VM command to the JDK, fork op, that forks a target JVM process and calls several VM operations, namely to dump the target heap, stack, and bytecodes of the child process. VM commands provide the interface from an attached agent to the VM internals by calling VM operations. Forking the process utilizes the memory copy-on-write (COW) mechanism of Linux, which allows the target process to run while the measurement is performed on the forked process. Copy-on-write has been shown elsewhere to be a useful mechanism to improve target performance while still guaranteeing atomicity of the measurement [39]. It is important to note that this is an OS-level fork of the JVM that is running the Java application; we do not need to implement any new COW functionality, but instead make use of the existing COW implementation within Linux that happens automatically with the fork.

We invoke the fork system call within the fork op VM command. At that location in the call stack we have access to the target process state, and can obtain the measurements from a target process clone while the target process runs in parallel. Once jmack initiates a fork of the target process, jmack begins capturing the state of the newly created process while the original target process proceeds.

4.3 Appraisal

The appraiser consists of two separate components: the measurement appraiser generated by the policy compiler that verifies the integrity of the heap and stack dumps, and a class appraiser which verifies the integrity of the loaded classes. The policy compiler takes a policy as input and produces Java source code. This Java code is added to a standard set of Java appraisal code to produce an appraiser that is specific to the policy. The standard appraisal code consists of code derived from the Java Heap Analysis Tool (jhat) that can parse heap dumps and stack traces, and is augmented to support the relevant aspects of our policy language including qualifiers and existential and universal quantifiers. The combined Java code for the appraiser is then compiled by a Java compiler to produce a program that serves as the appraiser. This program takes as input the snapshot from the measurement tool and outputs whether the policy has been satisfied or not.

Appraisal of a class measurement involves comparing the SHA-1 hashes contained in the measurement with SHA-1 hashes of the respective portions from class files. We use the class file parser from the Byte Code Engineering Library (BCEL) [3] to read in the class file information. We then compute a SHA-1 hash of the constant pool for each class file and SHA-1 hashes of the bytecodes of all methods contained in each class file. These hashes are then compared with the loaded class measurement, with any discrepancies causing a failure of the appraisal.
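The comparison step can be sketched as follows. This is a simplified illustration rather than the JMF class appraiser: it does not use BCEL, and it assumes the reference bytes for each constant pool or method body have already been extracted from the class files. The class name, method names, and map keys are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ClassHashAppraiser {
    // Hex-encoded SHA-1 digest of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // measured:  item name -> hex hash reported by the measurer
    // reference: item name -> raw bytes taken from the trusted class file
    // Returns the names whose hashes differ or have no reference entry.
    static List<String> mismatches(Map<String, String> measured,
                                   Map<String, byte[]> reference) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : measured.entrySet()) {
            byte[] ref = reference.get(e.getKey());
            if (ref == null || !sha1Hex(ref).equals(e.getValue())) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }
}
```

An empty result corresponds to a successful class appraisal; any entry in the returned list corresponds to a discrepancy that fails it.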

4.4 Method Inlining and Other Optimizations

The Java HotSpot Server VM [38], upon which our implementation is based, is an adaptive compiler with many improvements on just-in-time (JIT) and static compiling approaches. The HotSpot VM maintains two call stacks, one for the dynamically compiled code that is actually run and a second, “virtual” stack that retains the structure of the original bytecode. The VM starts by interpreting bytecodes; a profiler monitors the execution to determine code “hot spots” that will be subject to dynamic compilation and optimizations. As a result of JMF’s thread stack traces representing the structure of the bytecode, optimizations such as method inlining and dead code elimination can be done with no effect upon our measurement framework.

If an adversary were able to modify the native code without modifying the bytecode, then the JMF approach could be vulnerable. There are several reasons why modifying only the native code (that corresponds to bytecode) would be very difficult. First, since the HotSpot VM dynamically compiles code, the attacker is limited to code that actually gets compiled (as opposed to bytecode-interpreted) which may be as little as 10% of the bytecode [38]. Second, the HotSpot VM may also back out an inlining optimization if necessitated by dynamic code loading. These first two points imply that modifications to native code are brittle, requiring substantial control of the operation of the HotSpot VM and the configuration of the application. A third difficulty is that the new or modified native code would have to be reconciled to the VM’s virtual call stack. Ultimately, since we must trust the JVM for taking accurate measurements, we do not specifically guard against an attack of on-the-fly native code. However, we do plan to explore the feasibility and impact of such attacks in future work and look for ways to mitigate such threats.

5. CASE STUDIES

To show the practicality of our approach and implementation, we now present several case studies of actual Java applications. We discuss some malicious modifications that an attacker may wish to perform on these applications, and then describe how they may be detected either by the class appraiser or by the policy appraiser with a suitable policy.

5.1 Vuze

Vuze [40], formerly known as Azureus, is a BitTorrent application written in Java. It represents a suitable use case for JMF because BitTorrent clients are often left running for long periods of time. We give an example of a malicious bytecode modification in Vuze, illustrating how JMF can detect such malicious activity. Specifically, we implemented a bytecode change in the constructor of StartServer.class that creates a torrent of a directory and announces it without user interaction.

Our class appraiser is able to detect changes at class-level granularity, including discrepancies in individual methods and the class constant pool. Modifications to the class constant pool can cascade into changes to the bytecodes of multiple methods, even though the functionality of the methods remains the same. Using jmack, we extracted hashes of the bytecodes and constant pools from a running Vuze process that contained the modified class. Of the approximately 4000 classes examined in a measurement of a Vuze process, only the StartServer hashes are (appropriately) identified as modified.
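The comparison performed by a class appraiser can be illustrated with a minimal sketch. The hash algorithm (SHA-256), the class name ClassHashAppraiserSketch, and the map-based baseline format are assumptions made for illustration; the text above does not specify how jmack encodes or hashes its measurements.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; names are hypothetical and not part of JMF or jmack. */
final class ClassHashAppraiserSketch {

    /** Hex-encoded SHA-256 digest of a byte array (e.g., a class's bytecode). */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Names (sorted) of measured classes whose hash is absent from, or differs from, the baseline. */
    static List<String> modifiedClasses(Map<String, String> baseline, Map<String, String> measured) {
        List<String> modified = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(measured).entrySet()) {
            if (!e.getValue().equals(baseline.get(e.getKey()))) {
                modified.add(e.getKey());
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        // Stand-in byte strings; a real measurement would hash actual bytecode.
        Map<String, String> baseline = new TreeMap<>();
        baseline.put("StartServer", sha256Hex("original bytecode".getBytes(StandardCharsets.UTF_8)));
        baseline.put("Other", sha256Hex("unchanged".getBytes(StandardCharsets.UTF_8)));

        Map<String, String> measured = new TreeMap<>(baseline);
        measured.put("StartServer", sha256Hex("patched bytecode".getBytes(StandardCharsets.UTF_8)));

        System.out.println(modifiedClasses(baseline, measured)); // prints [StartServer]
    }
}
```

Keeping separate hashes for method bytecodes and for the constant pool, as the text describes, would follow the same pattern with one map per kind of hash.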

5.2 Apache FtpServer

Apache FtpServer [2] is an open source FTP server written in Java. We discuss three malicious modifications to this server and describe policies for combating them.

To improve readability, the example policies in this section do not use full classpaths; they also do not contain qualifiers, which would be necessary in a real deployment to describe when the policy may be legitimately invalid. The policies make use of several helper functions (HashMapGet, HashtableGet, and GetClassName), which we omit for brevity. These functions are needed to traverse the data structures of the heap dump; we cannot directly use the normal Java methods for accessing these data structures, since they are not objects in a live heap, and so they cannot have methods invoked on them. However, we believe these helper methods can be automatically generated from the Java source code that manages the data structures for a live heap (e.g., the get method in HashMap.java).
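As a rough illustration of why such helpers are needed, the sketch below looks up a key in a dumped hash table using field access only. The DumpedEntry type and the hashMapGet name are hypothetical stand-ins; the actual representation of heap-dump records in JMF is not described here, and this version scans every bucket rather than recomputing the hash index.

```java
/** Illustrative sketch; these types are hypothetical stand-ins for heap-dump records. */
final class HeapDumpHelpersSketch {

    /** A dumped hash-table node: plain fields only, no invocable methods. */
    static final class DumpedEntry {
        final String key;
        final Object value;
        final DumpedEntry next; // next node in the same bucket, or null

        DumpedEntry(String key, Object value, DumpedEntry next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Scans every bucket and chain of a dumped table; returns null if the key is absent. */
    static Object hashMapGet(DumpedEntry[] table, String key) {
        if (table == null || key == null) {
            return null;
        }
        for (DumpedEntry bucket : table) {
            for (DumpedEntry e = bucket; e != null; e = e.next) {
                if (key.equals(e.key)) {
                    return e.value;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DumpedEntry chained = new DumpedEntry("user", "alice", null);
        DumpedEntry[] table = { null, new DumpedEntry("other", "x", chained), null };
        System.out.println(hashMapGet(table, "user"));    // prints alice
        System.out.println(hashMapGet(table, "missing")); // prints null
    }
}
```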

5.2.1 Session Modification

An important property of the FTP server is that all logged-in users should be valid users of the system. A policy asserting that the user associated with an active FTP session exists in the data structure that contains the list of users is shown in Figure 3(a). Information on each session is contained in an FtpIoSession object. The user information of the session is stored several objects deep inside a HashMap. A PropertiesUserManager object stores all the information on the valid users of the server, including the username, password, and whether the account is enabled; the core information is also stored several objects deep in a HashTable. Hence the policy ensures that each username associated with an FtpIoSession exists in the table of the PropertiesUserManager, and that the enableflag for the account is set to true.

5.2.2 User Account Modification

Another important property of the FTP server is that all the user account information should be as expected and not be modified. The PropertiesUserManager object stores the users and properties for those users (such as enableflag and writepermission, for each account). An example of a policy that uses a baseline measurement for comparison of account information properties is shown in Figure 3(b). It ensures that every object of type PropertiesUserManager in the target has a corresponding object of the same type in the baseline, and the property values are the same.

∀x:FtpIoSession, ∃p:PropertiesUserManager ⊢ IsValidSession(x,p)

boolean IsValidSession(FtpIoSession x, PropertiesUserManager p) {
    String username = HashMapGet(x.wrappedSession.attributes.attributes.m.table, "user");
    return HashTableGet(p.userDataProp.table, "user."+username+".enableflag");
}

(a) FTP session policy

∀pTgt:PropertiesUserManager∈HT, ∃pBase:PropertiesUserManager∈HB ⊢ SameAccts(pTgt,pBase)

boolean SameAccts(PropertiesUserManager pTgt, PropertiesUserManager pBase) {
    for (int i=0; i < pTgt.userDataProp.table.length; i++) {
        String key = pTgt.userDataProp.table[i].key;
        String value = pTgt.userDataProp.table[i].value;
        if (!value.equals(HashTableGet(pBase.userDataProp.table, key))) {
            return false;
        }
    }
    return true;
}

(b) FTP user account policy

∀c:DefaultCommandFactory ⊢ HasSameCommands(c)

boolean HasSameCommands(org.apache.ftpserver.command.impl.DefaultCommandFactory c) {
    return (GetClassName(c.commandMap("ABOR")) == "ABOR") &&
           (GetClassName(c.commandMap("ACCT")) == "ACCT") && ...
}

(c) FTP command policy

Figure 3: FTP server policies
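The baseline comparison of Figure 3(b) can be mirrored in ordinary Java over plain maps. This is only a sketch under the assumption that the account properties of the target and the baseline have already been extracted into key/value maps; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Illustrative sketch; not the JMF policy compiler's output. */
final class BaselineCompareSketch {

    /** True if every property in the target has the same value in the baseline. */
    static boolean sameAccounts(Map<String, String> target, Map<String, String> baseline) {
        for (Map.Entry<String, String> e : target.entrySet()) {
            if (!Objects.equals(e.getValue(), baseline.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> baseline = new HashMap<>();
        baseline.put("user.alice.enableflag", "true");
        baseline.put("user.alice.writepermission", "false");

        Map<String, String> target = new HashMap<>(baseline);
        System.out.println(sameAccounts(target, baseline)); // prints true

        // Simulated tampering: write permission granted at runtime.
        target.put("user.alice.writepermission", "true");
        System.out.println(sameAccounts(target, baseline)); // prints false
    }
}
```

Like the loop in Figure 3(b), this check walks only the target's entries, so a property that was deleted from the target would not be flagged; comparing in both directions is one way such a policy might be tightened.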

5.2.3 Command Behavior Modification

The FtpServer implementation includes a separate class for each command (e.g., the DELE command is defined in DELE.java). Each of these command classes is a subclass of AbstractCommand. Objects of each of the commands are stored within a HashMap inside a CommandFactory object. When a command is sent, the command string is looked up in the HashMap, and method execute is invoked on the resulting object. By replacing the object in the HashMap with an object of a different type (though with the same base class), an attacker could replace the functionality of a command.

For example, an attacker may replace the SITE_WHO object with an EVIL_SITE_WHO object that disables the check to be sure only an administrator can perform the action (assuming the EVIL_SITE_WHO class has been loaded into the JVM). This attack can be mitigated by checking that unexpected classes are not loaded in the JVM, using the class appraiser. Alternatively, the attacker may simply replace a command with another pre-existing command. By replacing the NOOP object with a RETR object, and thereby retrieving a file with a different command, the attacker may possibly avoid being logged.

An example policy that will detect these modifications is given in Figure 3(c). It simply goes through the HashMap object that holds the commands and ensures they all have types of the correct class.
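A live-heap analogue of the command policy in Figure 3(c) can be sketched as follows. The Command interface and the NOOP and RETR classes below are hypothetical miniatures rather than the Apache FtpServer classes, and the check assumes (as the figure suggests) that each command is registered under its own class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch; the command classes here are stand-ins, not FtpServer's. */
final class CommandMapPolicySketch {

    interface Command { }

    static final class NOOP implements Command { }

    static final class RETR implements Command { }

    /** True if every entry maps a command name to an object whose class has that name. */
    static boolean hasSameCommands(Map<String, ?> commandMap) {
        for (Map.Entry<String, ?> e : commandMap.entrySet()) {
            Object cmd = e.getValue();
            if (cmd == null || !cmd.getClass().getSimpleName().equals(e.getKey())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Command> commands = new LinkedHashMap<>();
        commands.put("NOOP", new NOOP());
        commands.put("RETR", new RETR());
        System.out.println(hasSameCommands(commands)); // prints true

        // Simulated attack from the text: NOOP now dispatches to a RETR object.
        commands.put("NOOP", new RETR());
        System.out.println(hasSameCommands(commands)); // prints false
    }
}
```

Unlike Figure 3(c), which enumerates the expected commands explicitly, this variant iterates over whatever the map contains, so it would not notice a command that has been removed entirely.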

Dispatching through a table of command objects in this way is a common programming practice and is extremely important to the control flow of the program. It is analogous to function pointers in C programs [26]. Modification to such data structures can result in unexpected control flows, which may give an attacker a great deal of power.

5.3 bluffin-muffin

Bluffin-muffin [10] is an open source Texas hold ’em application, consisting of a centralized server and multiple clients. The client relies on the server to play the game correctly and fairly, so the client wants to ensure that the server is operating correctly. This is an example of an application where an attestation protocol may be employed between the client and server [14], since in a real deployment there may be money at stake.

An attacker may employ a variety of modifications to the server’s bytecode, e.g., to shuffle cards unfairly. Such modifications will be detected by the class appraiser. Alternatively, suppose an attacker has the ability to modify a player’s cards without changes to the bytecode. We created a policy that ensures that all cards in players’ hands and in the deck are unique; i.e., a player should not have the ace of spades if it also appears in the deck. This example involves multiple class instances (game, players, cards, etc.), showing that our system is suitable to describe and enforce sophisticated security policies.
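The uniqueness property described above can be sketched in plain Java. The representation of cards as strings and the method name allCardsUnique are assumptions for illustration; the actual bluffin-muffin classes and the policy written for them are not reproduced in the text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch; card and hand representations are hypothetical. */
final class UniqueCardsPolicySketch {

    /** True if no card appears more than once across the deck and all hands. */
    static boolean allCardsUnique(List<String> deck, List<List<String>> hands) {
        Set<String> seen = new HashSet<>();
        for (String card : deck) {
            if (!seen.add(card)) {
                return false;
            }
        }
        for (List<String> hand : hands) {
            for (String card : hand) {
                if (!seen.add(card)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> deck = Arrays.asList("AS", "KD", "7C");
        List<List<String>> fair = Arrays.asList(Arrays.asList("2H", "3H"), Arrays.asList("QS", "JD"));
        System.out.println(allCardsUnique(deck, fair)); // prints true

        // A player holds the ace of spades although it is still in the deck.
        List<List<String>> tampered = Arrays.asList(Arrays.asList("AS", "3H"), Arrays.asList("QS", "JD"));
        System.out.println(allCardsUnique(deck, tampered)); // prints false
    }
}
```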

This example also illustrates a policy on a server application, which can help clients verify that the server is acting in an acceptable manner. Similarly, a server may want to verify properties of a connected client to be sure that the client is acting properly as well. In the poker example, the server may wish to ensure that only valid client programs are running, and no poker bots are being used. These kinds of client/server trust requirements are increasingly common, as in multi-user online gaming, financial software, and beyond. Paired with a proper attestation protocol, we believe our system can have wide applicability to client/server trust relationships.

6. EXPERIMENTAL RESULTS

In order to demonstrate the practicality of our implementation, we have performed a series of benchmarks showing we can perform runtime measurement of a Java process with reasonable overhead. We now describe our experimental methodology and analyze the performance results.

6.1 Experimental Methodology

We carried out our experiments on a desktop computer running 32-bit Ubuntu 10.04 with 2 GiB of memory and dual Intel Pentium D 3.20 GHz processors, using a modification of the OpenJDK Client VM (1.7.0). We used a subset of the DaCapo-9.12-bach benchmark suite [9], which includes several open source Java programs. The subset we used included avrora, eclipse, h2, jython, lusearch, pmd, sunflow, tradebeans, tradesoap, and xalan. We did not use batik and tomcat because they use libraries that are unsupported by the OpenJDK JRE, and we did not use fop or luindex because they do not have a “large” benchmark size (using the -s large command line argument), which we required for adequate performance analysis (the smaller benchmark inputs take no more than six seconds to complete in our experimental setup, so the number of measurements would necessarily be limited). We used the default settings for all other options, and ran each benchmark five times.

6.2 Performance Results

We evaluate the performance of measuring a process using JMF’s jmack tool, including the copy-on-write implementation using fork. The sequence of steps is as follows:

1. Start the target process.

2. jmack is invoked with the process id of the target process.

3. jmack executes the fork op VM command within the target process JVM. The target process then continues unabated.

4. The forked process runs in parallel to the original process to capture the state snapshot, then terminates; the target JVM is momentarily paused while the measurement transfers to the forked JVM.

Figure 4 shows the execution times of each of the benchmarks for various delays between measurements; we report the mean values. The standard deviations are small relative to the mean values. For example, in this set of experiments the average standard deviation is 2.4% of the mean value with a range of 0.3% to 12.8%. “Delay = 0s” means that once jmack completes a measurement, another measurement is immediately initiated. A positive delay means the process is allowed to run for the specified number of seconds before another measurement is taken. We restrict test values to at most “Delay = 10s” because the shortest benchmark takes approximately 10 seconds to run on our experimental setup. Further increasing the delay reduces the overhead.

[Figure 4 (plot): benchmark completion time in seconds (axis range 0 to 270) for avrora, eclipse, h2, jython, lusearch, pmd, sunflow, tradebeans, tradesoap, and xalan, with series Delay=0s, Delay=5s, Delay=10s, and No Measurement.]

Figure 4: Execution times of the DaCapo benchmarks when the amount of delay between measurements varied. The execution times of the benchmarks vary from as short as 10 seconds to over 3 minutes, and as the delay between jmack invocations increases, the execution times decrease.

The question of the frequency of measurements can be reduced to a tradeoff between performance and concerns about the power of the adversary. An adversary who uses measurement infrequency as an attack vector must compromise the target between measurements, and exit the target when a measurement occurs (an adversary who makes changes that are not part of the integrity policy can obviously avoid detection). Since measurement is best suited for persistent changes to code and data structures, we believe the time between measurements need not be small. In many cases, it may best be used as part of an attestation scenario (e.g., where one party of a network connection wants to ascertain if another party is operating as expected before connecting) [14].

We calculate the runtime overhead by comparing the execution times of a particular execution scenario and the “No Measurement” (i.e., with no measurements being taken) execution scenario. The average overhead across all the benchmarks for “Delay = 0s” is 38.0% and for “Delay = 5s” it is 6.8%. For “Delay = 10s”, the average overhead shrinks further to a more manageable 3.0%. The number of jmack invocations for each benchmark varies because we allow the measurements to complete before delaying. For example, although jython and tradebeans possess comparable execution times, jython averages about twice as many measurements because measurements for tradebeans take longer. This phenomenon can be seen in Figure 5, which shows the measurement times for each benchmark. The number of invocations is inversely related to the measurement time.
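The overhead calculation and the relationship between measurement time and invocation count can be written out as a small sketch. The formulas are our reading of the description above: relative overhead against the “No Measurement” run, and a rough model in which each cycle consists of one measurement followed by the configured delay. The class name, method names, and input numbers are hypothetical, not measured results.

```java
/** Illustrative sketch; inputs below are made-up values, not results from the paper. */
final class OverheadSketch {

    /** Relative slowdown, in percent, of a measured run against the unmeasured baseline. */
    static double overheadPercent(double measuredSeconds, double baselineSeconds) {
        if (baselineSeconds <= 0) {
            throw new IllegalArgumentException("baseline time must be positive");
        }
        return (measuredSeconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    /**
     * Rough estimate of how many measurements fit in a run when each measurement
     * must complete before the delay starts (one cycle = measurement + delay).
     */
    static long estimatedInvocations(double runSeconds, double measurementSeconds, double delaySeconds) {
        double cycle = measurementSeconds + delaySeconds;
        if (cycle <= 0) {
            throw new IllegalArgumentException("cycle length must be positive");
        }
        return (long) Math.floor(runSeconds / cycle);
    }

    public static void main(String[] args) {
        System.out.printf("overhead: %.1f%%%n", overheadPercent(103.0, 100.0));
        // Same run length and delay, but the second target takes longer to measure.
        System.out.println(estimatedInvocations(120.0, 2.0, 10.0));
        System.out.println(estimatedInvocations(120.0, 14.0, 10.0));
    }
}
```

Under this model, a benchmark whose measurements take longer completes fewer cycles in the same run time, which is consistent with the jython and tradebeans observation.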

In order to put the measurement performance of JMF into context, we compare it to the monitoring performance of JavaMOP [13]. JavaMOP uses the previous 2006-10 version of the DaCapo benchmark suite; relative to that version, the 9.12 release removes three single-threaded benchmarks, replaces hsqldb with h2, adds six new benchmarks, and updates the workloads [16]. JavaMOP performance results are reported for the monitoring of six different properties. The authors devise a “decentralized indexing” optimization to reduce runtime overhead and report that less than 8% of the experiments showed more than 10% runtime overhead. Averaging the overhead for experiments using their decentralized indexing technique results in a 7.3% slowdown. Many of the overheads are listed as 0.0%, most likely because the properties being monitored were infrequent or not even applicable. This illustrates an interesting point: their overhead depends on which properties they are checking, and will increase if they add more checks. The overhead in JMF will not increase, as we are already taking a measurement of the entire state of the process, and the appraiser can decide which portions of the measurement it wishes to inspect. Although a direct comparison between a monitoring and a measurement system is difficult, our performance results are comparable. Given the other advantages of measurement (we do not need to recompile programs and can detect bytecode modifications), we contend that the JMF measurement system is an attractive approach.

[Figure 5 (plot): measurement process lifetime in seconds (axis range 0 to 11) per benchmark, with series Delay=0s, Delay=5s, and Delay=10s.]

Figure 5: The average measurement times (i.e., the lifetime of the jmack process) for each of the benchmarks and each delay. Note the similarity in shape between the measurement time and average heap size graph in Figure 6.

Although performance of the target is the most important characteristic of our implementation, we also show the average heap size and measurement time for each benchmark. Figure 6 illustrates the average size of the heap dump for each of the benchmarks. Heap size is an indicator of several other properties (including measurement time), and is itself a property that remains roughly constant within a benchmark across many delays.

As depicted in Figure 5, measurement times vary, and are highly correlated with the size of the heap. Although the relative proportion of time it takes to dump the heap varies between benchmarks (intuitively, the larger the heap the longer jmack takes to dump it), we observe that dumping the heap tends to be the dominating factor in measurement time, with bytecode hashes next, then stack traces and other overhead (e.g., the time it takes to fork) accounting for the rest of the time. Across all benchmarks and across all delay amounts, the average measurement time per megabyte of heap remains relatively constant at 0.04 seconds/MB. For particular benchmarks, the value ranges from 0.034 seconds/MB (lusearch) to 0.048 seconds/MB (avrora).

[Figure 6 (plot): heap size in MB (axis range 0 to 280) per benchmark, with series Delay=0s, Delay=5s, and Delay=10s.]

Figure 6: The average size of the heap dumps remains relatively constant in spite of measurements taken at different frequencies (and therefore at different times).
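As a worked example of the per-megabyte figures above, the sketch below estimates a measurement time from a heap size using a simple linear model. The linear model itself and the 100 MB heap are assumptions for illustration; only the 0.034, 0.04, and 0.048 seconds/MB rates come from the text.

```java
/** Illustrative sketch; a linear estimate based on the reported seconds-per-MB rates. */
final class MeasurementTimeEstimateSketch {

    /** Estimated measurement time in seconds for a heap of the given size. */
    static double estimateSeconds(double heapMb, double secondsPerMb) {
        if (heapMb < 0 || secondsPerMb < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return heapMb * secondsPerMb;
    }

    public static void main(String[] args) {
        double heapMb = 100.0; // hypothetical heap size
        System.out.printf("lowest rate:  %.1f s%n", estimateSeconds(heapMb, 0.034));
        System.out.printf("average rate: %.1f s%n", estimateSeconds(heapMb, 0.04));
        System.out.printf("highest rate: %.1f s%n", estimateSeconds(heapMb, 0.048));
    }
}
```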

In future work, we plan to explore a policy-based approach to extracting measurements. Our current implementation dumps the entire heap, which can clearly be sizable and take significant time. Another approach is to use the knowledge of what the policy will inspect to dump only the relevant portions of the heap. This should reduce the size of the measurement and will likely take less time as well. While this will improve the performance of the measurer, we expect it will have minimal impact on the target, which is already running using COW. The effect on the original target process would be incidental, as a result of having a smaller concurrent measurement process in the system.
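One way a policy-driven partial dump might be approximated is to collect only the objects reachable from the instances a policy quantifies over. The sketch below is not the authors' design; it assumes the heap is available as a map from object ids to the ids they reference, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch; the heap-graph representation is a hypothetical simplification. */
final class PartialDumpSketch {

    /** Ids of all objects reachable from the policy-relevant roots (roots included). */
    static Set<Integer> reachable(Map<Integer, List<Integer>> references, Collection<Integer> roots) {
        Set<Integer> seen = new LinkedHashSet<>();
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Integer id = work.poll();
            if (seen.add(id)) {
                // Unknown ids are treated as objects with no outgoing references.
                work.addAll(references.getOrDefault(id, Collections.<Integer>emptyList()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> heap = new HashMap<>();
        heap.put(1, Arrays.asList(2));    // e.g., a session object referencing its attributes
        heap.put(2, Arrays.asList(3, 1)); // a cycle back to the root is handled
        heap.put(4, Arrays.asList(5));    // unrelated objects, never visited

        System.out.println(reachable(heap, Arrays.asList(1))); // prints [1, 2, 3]
    }
}
```

Full reachability over-approximates what a policy needs, since a policy follows only specific fields; restricting the traversal to the field paths named in the policy would shrink the dump further.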

7. DISCUSSION

The primary use of JMF is to ensure the integrity of applications as part of a “defense in depth” security approach, with other components ensuring the correct operation of the environment (JVM and OS). Our tool also has other uses. Once a policy is developed, JMF can be used as a debugging tool to ensure desired properties actually hold at runtime. It is also possible to use JMF in concert with other systems. For example, if one wants to use a security monitor that is integrated into the Java bytecode of an application, JMF may be used to ensure that this bytecode has not been modified. This approach effectively unifies measurement and monitoring systems.

Taken together, measurement and monitoring can provide the means to satisfy the Bell-LaPadula security theorem [8]: measurement systems may show that the system is currently in a valid state, and monitors ensure that if the current state is valid then any (successful) transition will yield a valid state. Unfortunately, merging measurement and monitoring into a single system is generally impractical. A measurement may contain the entire state of the process, while a monitor must observe every transition of the process. As the amount of information a monitor keeps about a process increases, so does the performance impact on the process. Similarly, as the frequency of measurements increases, the performance of the process decreases. Taken to the extreme, we would have a monitor/measurer that inspects the entire state of the process after each step of the execution, which would likely make the process too slow to perform useful computation. One must weigh the benefits and risks of monitors versus measurers before deciding which to employ, or if some combination of both is necessary (keeping practicality in mind).

In future work, we plan to improve our implementation in several ways. We aim to make a more efficient measurer that only collects the portions of the heap that are relevant to a policy, and improve the appraiser to more efficiently traverse the target heap. We also plan to make general improvements to the compiler to ensure support of all of Java’s language features.

Although we largely assume an acceptable policy is available within the context of the system, drafting an effective policy is a challenging task. Policies may require domain expertise, and suffer from many of the same pitfalls that software does. For this reason, automatic policy generation may be helpful in identifying not only useful invariants, but also starting places upon which to develop more sophisticated policies. Other systems have already been used to identify invariants automatically based on traces of execution [6, 15, 18, 32]. We plan to explore how these techniques can be applied to JMF.

We also plan to modify our policy compiler to support integration of the original source code when producing the appraiser. In many cases, our policies replicate code that already exists in the original source, such as looking up entries in HashMap objects or getting the data fields of objects. We aim to improve the efficiency of writing policies by allowing the policy writer to more easily make use of functionality that is already implemented in the program’s source code.

We will also investigate how the concepts of JMF may apply to other languages, and determine which other languages may have features that are useful for measurement. Whether or not an integrity policy is sufficient for justifying process integrity remains an open problem. Hence, we plan to consider techniques to improve the confidence that a policy fully covers the integrity requirements of a program. We intend to explore programming language concepts that can help a programmer write programs that are more amenable to measurement, and create metrics that may be used to evaluate the robustness of an integrity policy.

8. RELATED WORK

JMF is related to work on integrity policies for kernel data, monitoring systems, and data structure invariant generation, which we now discuss in turn.

8.1 Integrity of Kernel Dynamic Data

Several prior works developed systems to provide runtime integrity measurement [26, 29, 30] of a kernel, focused primarily on the measurement component, with little emphasis on appraisal. There has so far been little work on general techniques for specifying integrity policies.

Petroni et al. [33] describe an architecture for defining integrity specifications suited specifically for dynamic kernel data. Since they are concerned with low-level compiled C code, they must describe how the data is organized in memory and build up a model that abstracts the low-level data into a more understandable form. Constraints, which are similar to our policies, specify rules consisting of quantifiers over the sets defined in the model along with a predicate that must hold, and with an optional response to a predicate failure. In their architecture, the specification writer must be careful to get the low-level structure definitions and model definitions correct, else the constraints become meaningless. In essence, the specification writer must recreate the structure of the program that the programmer understood when writing the program. Since our system is object-oriented, the data structures are explicitly represented both in the code and in the runtime, thus avoiding the step of representing data structures in a model, and our policies are described directly over the program source code. Our system only requires someone to write policies (which are similar to their constraints) about a program.

KOP [11] enables integrity checking by mapping dynamic kernel data and taking generic pointers, unions, and dynamic arrays into consideration in its analysis. KOP is similar to JMF in that an appraisal is made based on an explicit snapshot of the system, but KOP focuses more on building the snapshot while JMF’s contribution lies in the use of policies to reason about process snapshots.

Schiffman et al. [36] expand the scope of integrity measurement beyond the kernel. They describe a service called IVP for monitoring integrity of a VM using VM introspection; their focus is on integrity measurement protocols, and they require measurement modules to be written for desired measurement targets (e.g., SELinux policy, netfilter function pointers). The focus of JMF is on the integrity policies and the actual measurement and appraisal of Java programs; it may be possible to use the IVP approach to gain trust in the environment in which JMF runs, and to manage attestations involving JMF.

We acknowledge the importance of kernel measurement, but not at the expense of application measurement. We note that applications can be compromised independently of the kernel, and therefore systems like JMF remain relevant.

8.2 Program Monitors

Integrity measurement as described is closely related to a significant body of work on invariant monitoring; Delgado et al. [17], for example, present an overview of software-fault monitoring tools and Parno et al. [31] categorize and explain extant approaches to bootstrapping trust. Projects such as MOP [13], InvTS [19], Tracematches [4], Jahob [42], and Java-MaC [23] provide inline invariant enforcement by instrumenting a program’s source code based on a policy specification similar to that used by JMF.² The TrustedVM implementation of Haldar et al.’s semantic remote attestation concept [20] provides a modification of the JVM to monitor protocols. JMF provides a more comprehensive implementation and focuses on application data structures. Furthermore, JMF can be run on an unmodified, already-running process and considers only the state captured in a snapshot when making policy appraisals.

² MOP also provides for outline monitors, where the monitor runs in a separate process. However, it is dependent on the monitored process to send it messages with the relevant information.


The instrumented code in monitoring systems maintains an abstract model of the source program’s policy-relevant state that may be consulted to validate any transition the program takes against the policy. It is essential to the correctness of the monitor that the system is unable to make a policy-relevant state transition without consulting the monitor. If the system is allowed to make even a single unmonitored transition, the monitor’s internal state machine model of the program state will no longer be valid and thus all future decisions of the monitor may be suspect. This makes monitoring-based systems vulnerable to state transitions outside of the expected execution model of the system, such as code injection attacks or direct memory access (DMA).

As noted previously, measurement systems attempt to validate the current state, independent of any previous behavior. The ability of a measurement to recognize an invalid state is limited only by its ability to inspect the state of its target. This makes measurement systems immune to the vulnerabilities described for monitors, but leaves open the potential for the system to pass through intermediate invalid states without detection.

Similar to JMF, ReDAS [22] does not require the code instrumentation used by other monitoring systems. ReDAS checks integrity of Linux applications using a kernel component to monitor structural integrity (e.g., return addresses on the stack), and certain data invariants of global variables; these are checked at each system call. ReDAS is limited in the number of things it can monitor; more checks lead to a greater performance impact at each system call. In contrast, JMF provides a comprehensive measurement of all the state of a Java program (both code and data). We also present a mechanism for specifying integrity policies for any data within the Java program, including for complex structures like linked lists; this is a known limitation for ReDAS.

8.3 Class Invariants

A considerable amount of work exists for defining invariants in object-oriented languages, the majority of which is based on the notion of Design By Contract (DBC) [28]. DBC-related work focuses on several concepts for describing program obligations and benefits (such as pre- and post-conditions). We here focus only on invariants, as they are related to our work.

A number of Java-based approaches enforce class invariants using runtime checks. Jass [7] contracts are specified as comments in the source code, and the Jass pre-compiler inserts Java code to evaluate the class invariants at the beginning and end of method calls. Similarly, iContract [24] invariants are described as comments and evaluated at the beginning and end of public methods, and on exceptions. They also support universal and existential quantification over a set of elements, and they also have an implies operator over contracts. In jContractor [1], invariants are defined in a protected, non-static method Invariant() that is evaluated at the beginning and end of public methods. They also support logical quantifiers and operators, such as Forall, Exists, suchThat, and implies. The Java Modeling Language (JML) [25] provides a similar mechanism for describing invariants via comments, and support for universal and existential quantifiers in assertions, amongst others.

The JMF approach makes it easier to write up-to-date policies because policies are written and compiled independently of the source code. JML, for example, has been hampered by the difficulty in keeping up with both the latest JML specifications and the latest extensions to the core Java language. JMF does not require recompilation of the source code to include runtime assertion checks. Although OpenJIR [34] attempts to address some of the shortcomings of JML by providing an intermediate representation, our approach circumvents the problem altogether by the independent paradigm we use to write policies. Furthermore, assertions tend to capture local policies, while JMF makes it easy to express global properties across several classes.
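To make the contrast concrete, the following hand-written sketch imitates the kind of entry and exit invariant check that the runtime-check tools discussed above insert automatically. It does not use the syntax of Jass, iContract, jContractor, or JML; the account class and its invariant are hypothetical.

```java
/** Illustrative sketch of an inline class-invariant check; not generated by any DBC tool. */
final class AccountWithInvariantSketch {

    private int balance;

    /** The class invariant: the balance is never negative. */
    private void checkInvariant() {
        if (balance < 0) {
            throw new IllegalStateException("invariant violated: balance < 0");
        }
    }

    public void deposit(int amount) {
        checkInvariant(); // evaluated at the beginning of the public method
        balance += amount;
        checkInvariant(); // and again at the end
    }

    public void withdraw(int amount) {
        checkInvariant();
        balance -= amount;
        checkInvariant();
    }

    public int balance() {
        return balance;
    }

    public static void main(String[] args) {
        AccountWithInvariantSketch account = new AccountWithInvariantSketch();
        account.deposit(10);
        account.withdraw(4);
        System.out.println(account.balance()); // prints 6
    }
}
```

Checks of this kind live inside the class and are local to it, and adding or changing them requires recompiling the program; a measurement policy is instead evaluated against a snapshot from outside the process and can relate objects of several classes.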

The aforementioned MOP work [13] provides a framework for adding logic plugins, such as JML, Extended Regular Expressions (ERE) and Linear Temporal Logics (LTL). They claim that the logics of assertions/invariants of DBC falls under their logical framework. They also support monitoring of parametric specifications (universal quantifiers). The underlying issue is efficiency; since they are doing runtime monitoring, it can be expensive to keep track of all of the groups of objects associated with universal quantifiers.

Though we do not provide a mechanism for plug-and-play logics, our technique can be extended to other policy languages. Further, some of the things a monitor may observe are not in the scope of a measurement system. In particular, temporal logics are less meaningful in a measurement system that only periodically measures. For example, a monitor may wish to validate that all accesses are preceded by an authentication. A periodic measurer cannot ensure this property, but can only be sure that a particular state is acceptable. For example, a measurer may only check the current state of the process and be sure that the accessed data has a corresponding authentication certificate. Such a policy can be described in our policy language.
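The state-based version of the authentication property can be sketched as a check over a single snapshot. The snapshot representation below (a map from accessed data items to the accessing user, and a set of users holding a certificate) is a hypothetical simplification, written in plain Java rather than in the JMF policy language.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch; the snapshot representation is hypothetical. */
final class AccessSnapshotPolicySketch {

    /**
     * True if, in this snapshot, every accessed data item was accessed by a user
     * who currently holds an authentication certificate.
     */
    static boolean everyAccessHasCertificate(Map<String, String> accessedBy, Set<String> certifiedUsers) {
        for (String user : accessedBy.values()) {
            if (!certifiedUsers.contains(user)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> accessedBy = new HashMap<>();
        accessedBy.put("report.pdf", "alice");
        Set<String> certified = new HashSet<>(Arrays.asList("alice"));
        System.out.println(everyAccessHasCertificate(accessedBy, certified)); // prints true

        accessedBy.put("payroll.xls", "mallory");
        System.out.println(everyAccessHasCertificate(accessedBy, certified)); // prints false
    }
}
```

The check says nothing about the order of events: it cannot tell whether the certificate was obtained before the access, which is exactly the temporal property a monitor could enforce and a periodic measurer cannot.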

9. CONCLUSIONS

We have presented a Java Measurement Framework (JMF) that provides language support for defining and enforcing runtime integrity policies. Our integrity policies represent a new way to define a set of class invariants based on the source code and allow policy writers to avoid quiescence problems by declaring the parts of the execution for which the invariant is enforced. We have shown the feasibility of our approach with a proof-of-concept implementation that can be run on unmodified, legacy software with comparable overhead to other measurement and monitoring approaches. In the case studies we have shown representative usage scenarios and the expressiveness of our policies.

Acknowledgments

We gratefully acknowledge Lauren Won and Scott Stanchfield for their insights and assistance with the implementation.

10. REFERENCES

[1] P. Abercrombie and M. Karaorman. jContractor: Bytecode instrumentation techniques for implementing design by contract in Java. In Proceedings of the Second Workshop on Runtime Verification (RV), volume 70(4) of Electronic Notes in Theoretical Computer Science, pages 55–79, Copenhagen, Denmark, July 2002.

[2] Apache Software Foundation. Apache FtpServer. http://mina.apache.org/ftpserver/.

[3] Apache Software Foundation. The Byte Code Engineering Library. http://jakarta.apache.org/bcel/.


[4] P. Avgustinov, E. Bodden, E. Hajiyev, L. Hendren, O. Lhoták, O. de Moor, N. Ongkingco, D. Sereni, G. Sittampalam, J. Tibble, et al. Aspects for trace monitoring. In Formal Approaches to Software Testing and Runtime Verification, pages 20–39, Seattle, WA, USA, Aug. 2006.

[5] A. Avižienis, J.-C. Laprie, B. Randell, and C. Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Trans. Dependable Secure Comput., 1(1):11–43, Jan. 2004.

[6] A. Baliga, V. Ganapathy, and L. Iftode. Automatic inference and enforcement of kernel data structure invariants. In Annual Computer Security Applications Conference (ACSAC), pages 77–86, Anaheim, CA, USA, Dec. 2008.

[7] D. Bartetzko, C. Fischer, M. Möller, and H. Wehrheim. Jass - Java with assertions. In Proceedings of the First Workshop on Runtime Verification (RV), volume 55(2) of Electronic Notes in Theoretical Computer Science, Paris, France, July 2001.

[8] D. E. Bell and L. J. La Padula. Secure computer system: Unified exposition and multics interpretation. Technical Report MTR-2997, MITRE Corporation, 1976.

[9] S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 169–190, Portland, OR, USA, Oct. 2006.

[10] bluffin muffin. Open source poker game. http://code.google.com/p/bluffin-muffin/.

[11] M. Carbone, W. Cui, L. Lu, W. Lee, M. Peinado, and X. Jiang. Mapping kernel objects to enable systematic integrity checking. In ACM Conference on Computer and Communications Security (CCS), pages 555–565, Chicago, IL, USA, Nov. 2009.

[12] F. Chen and G. Roşu. Java-MOP: A monitoring oriented programming environment for java. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 546–550, Edinburgh, UK, Apr. 2005.

[13] F. Chen and G. Roşu. Mop: an efficient and generic runtime verification framework. In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 569–588, Montreal, QC, Canada, Oct. 2007.

[14] G. Coker, J. Guttman, P. Loscocco, J. Sheehy, and B. Sniffen. Attestation: Evidence and trust. In L. Chen, M. Ryan, and G. Wang, editors, Information and Communications Security, volume 5308 of Lecture Notes in Computer Science, pages 1–18. Springer Berlin / Heidelberg, 2008.

[15] C. Csallner, N. Tillmann, and Y. Smaragdakis. DySy: Dynamic symbolic execution for invariant inference. InInternational Conference on Software Engineering (ICSE), pages 281–290, Leipzig, Germany, May 2008.

[16] DaCapo Project. dacapo-9.12-bach Release Notes.

http://www.dacapobench.org/RELEASE_NOTES.txt. [17] N. Delgado, A. Gates, and S. Roach. A taxonomy and catalog

of runtime software-fault monitoring tools.IEEE Trans. Softw. Eng., 30(12):859–872, Dec. 2004.

[18] B. Demsky, M. D. Ernst, P. J. Guo, S. McCamant, J. H. Perkins, and M. Rinard. Inference and enforcement of data structure consistency specifications. InInternational

Symposium on Software Testing and Analysis (ISSTA), pages 233–244, Portland, ME, USA, July 2006.

[19] M. Gorbovitski, T. Rothamel, Y. A. Liu, and S. D. Stoller. Efficient runtime invariant checking: a framework and case study. InInternational Workshop On Dynamic Analysis (WODA), pages 43–49, Seattle, WA, USA, July 2008. [20] V. Haldar, D. Chandra, and M. Franz. Semantic remote

attestation: a virtual machine directed approach to trusted computing. InProceedings of the 3rd conference on Virtual Machine Research And Technology Symposium - Volume 3, page 3–3, Berkeley, CA, USA, 2004. USENIX Association. ACM ID: 1267245.

[21] T. Jaeger, R. Sailer, and U. Shankar. PRIMA: Policy-reduced integrity measurement architecture. InACM Symposium on

Access Control Models And Technologies (SACMAT), pages 07–09, Lake Tahoe, CA, USA, June 2006.

[22] C. Kil, E. Sezer, A. Azab, P. Ning, and X. Zhang. Remote attestation to dynamic system properties: Towards providing complete system integrity evidence. InIEEE/IFIP

International Conference on Dependable Systems and Networks (DSN 2009), Estoril, Portugal, June 2009.

[23] M. Kim, S. Kannan, I. Lee, O. Sokolsky, and M. Viswanathan. Java-MaC: A run-time assurance tool for Java programs.

Electronic Notes in Theoretical Computer Science, 55(2):129–155, Mar. 2001.

[24] R. Kramer. iContract - the Java design by contract tool. In

Proceedings of the Technology of Object-Oriented Languages and Systems (TOOLS), page 295, Santa Barbara, CA, USA, Aug. 1998.

[25] G. T. Leavens, A. L. Baker, and C. Ruby.JML: A notation for detailed design, chapter 12, pages 175–188. Kluwer Academic Publishers, Boston, MA, USA, Sept. 1999.

[26] P. A. Loscocco, P. W. Wilson, J. A. Pendergrass, and C. D. McDonell. Linux kernel integrity measurement using contextual inspection. InACM Workshop on Scalable Trusted Computing (STC), pages 21–29, Alexandria, VA, USA, Nov. 2007. [27] E. Metula. Managed code rootkits. Presentation at DEFCON

2009.

[28] B. Meyer. Applying “Design by Contract”.Computer, 25(10):40–51, Oct. 1992.

[29] Nick L. Petroni, Jr., T. Fraser, J. Molina, and W. A. Arbaugh. Copilot - a coprocessor-based kernel runtime integrity monitor. InUSENIX Security Symposium, pages 179–194, San Diego, CA, USA, Aug. 2004.

[30] Nick L. Petroni, Jr. and M. Hicks. Automated detection of persistent kernel control-flow attacks. InACM Conference on Computer and Communications Security (CCS), pages 103–115, Alexandria, VA, USA, Nov. 2007.

[31] B. Parno, J. McCune, and A. Perrig. Bootstrapping trust in commodity computers. InIEEE Symposium on Security and Privacy, pages 414–429, Oakland, CA, USA, May 2010. [32] J. H. Perkins, S. Kim, S. Larsen, S. Amarasinghe, J. Bachrach,

M. Carbin, C. Pacheco, F. Sherwood, S. Sidiroglou, G. Sullivan, W. Wong, Y. Zibin, M. D. Ernst, and M. Rinard.

Automatically patching errors in deployed software. InACM Symposium on Operating Systems Principles (SOSP), pages 87–102, Big Sky, MT, USA, Oct. 2009.

[33] N. L. Petroni, Jr., T. Fraser, A. Walters, and W. A. Arbaugh. An architecture for specification-based detection of semantic integrity violations in kernel dynamic data. InUSENIX Security Symposium, Vancouver, BC, Canada, Aug. 2006. [34] Robby and P. Chalin. Preliminary design of a unified JML

representation and software infrastructure. InFormal Techniques for Java-like Programs - FTfJP ’09, pages 1–7, Genova, Italy, July 2009.

[35] R. Sailer, X. Zhang, T. Jaeger, and L. van Doorn. Design and implementation of a TCG-based integrity measurement architecture. InUSENIX Security Symposium, pages 16–16, San Diego, CA, USA, Aug. 2004.

[36] J. Schiffman, H. Vijayakumar, and T. Jaeger. Verifying system integrity by proxy. In5th International Conference on Trust and Trustworthy Computing (TRUST), pages 179–200, 2012. [37] Sun Microsystems. OpenJDK.http://openjdk.java.net/. [38] Sun Microsystems. The Java HotSpot Server VM.http://

java.sun.com/products/hotspot/docs/general/hs2.html. [39] M. Thober, J. A. Pendergrass, and C. D. McDonell. Improving

coherency of runtime integrity measurement. InACM Workshop on Scalable Trusted Computing (STC), pages 51–60, Fairfax, VA, USA, Oct. 2008.

[40] Vuze, Inc. Vuze.http://www.vuze.com/.

[41] J. Williams. Enterprise Java rootkits. InBlackHat USA, Las Vegas, NV, USA, July 2009.

[42] K. Zee, V. Kuncak, M. Taylor, and M. Rinard. Runtime checking for program verification. InInternational Conference on Runtime Verification (RV), pages 202–213, Vancouver, BC, Canada, Mar. 2007.