Program Tailoring: Slicing by Sequential Criteria


Yue Li†1, Tian Tan†2, Yifei Zhang3, and Jingling Xue4

1 School of Computer Science and Engineering, UNSW Australia

[email protected]

Abstract

Protocol and typestate analyses often report some sequences of statements ending at a program point P that needs to be scrutinized, since P may be erroneous or imprecisely analyzed. Program slicing focuses only on the behavior at P by computing a slice of the program affecting the values at P. In this paper, we propose to restrict our attention to the subset of that behavior at P affected by one or several statement sequences, called a sequential criterion (SC). By leveraging the ordering information in a SC, e.g., the temporal order in a few valid/invalid API method invocation sequences, we introduce a new technique, program tailoring, to compute a tailored program that comprises the statements in all possible execution paths passing through at least one sequence in SC in the given order. With a prototype implementation, Tailor, we show why tailoring is practically useful by conducting two case studies on seven large real-world Java applications. For program debugging and understanding, Tailor can complement program slicing by removing SC-irrelevant statements. For program analysis, Tailor can enable a pointer analysis that does not scale to a whole program to perform a more focused and therefore potentially scalable analysis on its SC-relevant parts containing hard language features such as reflection.

1998 ACM Subject Classification F.3.2 Semantics of Programming Languages - Program Analysis, D.2.5 Testing and Debugging - Code inspections and Debugging aids

Keywords and phrases Program Slicing, Program Analysis, API Protocol Analysis

Digital Object Identifier 10.4230/LIPIcs.ECOOP.2016.15

Supplementary Material ECOOP Artifact Evaluation approved artifact available at

http://dx.doi.org/10.4230/DARTS.2.1.8

1 Introduction

Program slicing, supported by industry-strength tools, such as WALA [52] and CodeSurfer [18], has found many diverse applications, such as program debugging, comprehension, analysis, testing, verification and optimization [8, 20, 43, 49]. Given a slicing criterion consisting of a program point P and several variables used at P [53], program slicing computes a slice of the program that may affect their values at P in terms of data and control dependences. In the past three decades, several variations on this theme of program slicing have been proposed, including static vs. dynamic, backward vs. forward, and closure vs. executable [43, 49].

∗ This work is supported by ARC grants, DP130101970 and DP150102109.

† These authors contributed equally to this work.

Figure 1 Program tailoring with some of its potential applications highlighted. [Diagram labels: analysis tools (e.g., API protocol analysis) output a sequential criterion (SC) such as [open close write] with write marked as an error; tailoring the program with the SC yields the tailored program, while slicing with the traditional criterion [write] yields the sliced program; SC-irrelevant statements are removed; applications include slicing-related techniques, program debugging, program understanding, and program analysis.]

In practice, API protocol analysis [7, 37] and typestate analysis [9, 16, 34] often report some statement sequences ending at a program point P that needs to be scrutinized, since P may be erroneous or imprecisely analyzed. Each sequence can represent a valid or invalid API usage call sequence. Such protocol specifications or violations can also be provided manually or mined automatically [1,3,17,36,42,60]. As such analyses are either conservative or unsound, the temporal order specified in a sequence may or may not be feasible. However, program slicing focuses on P by ignoring this order, which is often essential for analyzing P. As illustrated in Figure 1, we introduce a new technique, program tailoring, to reap additional benefits missed by program slicing at a point P (e.g., file.write()) by leveraging the temporal order specified in a new criterion, called a sequential criterion (SC) for P. A SC_P consists of one or several statement sequences ending at P, with each representing, e.g., a valid API call sequence like file.open() → file.write() or an invalid API call sequence like file.open() → file.close() → file.write(). In what follows, we will drop P from SC_P when the context is clear or when we are not interested in it. Given a SC_P, tailoring aims to obtain a tailored program, denoted T(SC_P), that comprises the statements in all the possible execution paths passing through at least one statement sequence in SC_P in the given order. By construction, all data and control dependences needed for understanding the behavior at P affected by SC_P are included. Any statement that is not in T(SC_P) is irrelevant to SC_P, i.e., SC_P-irrelevant. For illustration purposes, we write S(P) to represent the (backward) slice affecting P obtained by program slicing. Note that slicing all the points in SC_P independently still fails to capture their ordering constraint (and is unscalable, too).

Like slicing, tailoring enables software developers or client applications to inspect only the interesting parts of a program. Unlike slicing, which focuses on understanding the program behavior at P, tailoring restricts our attention to the subset of that behavior affected by SC_P only. Due to incompatible criteria used, tailoring can be used either as a complementary technique to slicing or in cases where slicing is ineffective, as discussed below.

1.1 Goals and Motivations

We have developed a prototype implementation of program tailoring for a given SC_P, denoted Tailor, for Java programs, with the following three goals in mind:

Precision Tailor is designed to sharpen the precision of many client applications, as highlighted in Figure 1, by exploiting the temporal order in a SC_P. One significant class of client applications includes many slicing-related techniques, such as thin slicing [47], program chopping [22] and value slicing [26]. For a given program, T(SC_P) is shown as the blue circle and S(P) as the white circle. The statements in I(SC_P) = S(P) − S(P) ∩ T(SC_P) […] and understanding by a human. If I(SC_P) = ∅, then Tailor is ineffective at P but no harm is done. If I(SC_P) ≠ ∅, then Tailor can make slicing more precise, by leveraging the otherwise wasted SC_P information that is widely available. In Section 6.1, we show that Tailor can improve the precision of thin slicing [47], a state-of-the-art practical but unsound slicing technique, for Java programs. In Section 6.2, we show that Tailor can enable a sophisticated pointer analysis for Java, S-2Obj [24], which is unscalable for a program, to perform a focused analysis on its SC-relevant parts containing hard language features such as reflection, where existing slicing techniques are ineffective.

Scalability Tailor is designed to work efficiently for large object-oriented programs, for which traditional slicing [21] is unscalable (even with industry-strength implementations [18,52]), with the key bottleneck coming from handling of the heap [47]. Like any slicing tool, Tailor is not always scalable. However, Tailor is designed to scale significantly better than traditional slicing, as Tailor avoids handling of the heap by reasoning about essentially the (un)reachability of a statement towards the statement sequences in a SCP.

Soundiness Tailor is designed to be a practical tool to facilitate program debugging and understanding as well as program analysis (among others) for large object-oriented programs. In this setting, any sound static analysis would be either unscalable or imprecise due to the presence of many hard language features, such as native code, dynamic class loading, reflection and multi-threading [30]. Therefore, Tailor is designed to be sound whenever a sound graph representation is available for capturing all the control flows in a (single- or multi-threaded) program. In other words, Tailor is always sound with respect to the part of the program behavior that is modeled. According to [30], “a soundy analysis aims to be as sound as possible without excessively compromising precision and/or scalability.” Therefore, Tailor represents one such soundy analysis.

1.2 Challenges and Solutions

We examine some challenges faced in achieving our three goals and describe our solutions:

Precision/Scalability Tradeoffs How do we compute T(SC) efficiently and precisely for large object-oriented programs? Due to the exponential blowup of program paths, enumerating paths with DFS or BFS is out of the question. We formulate the problem of computing T(SC) as an IFDS (Interprocedural Finite Distributive Subset) data-flow analysis problem [38], which can be solved efficiently on the interprocedural control-flow graph (ICFG) representation of the program. To avoid unrealizable paths with mismatched calls and returns, our analysis must be (fully) context-sensitive in order to achieve useful precision for Java programs. Otherwise, many unrealizable paths, which go through the constructor of java.lang.Object, cannot be filtered out, causing T(SC) to be severely over-approximated. However, distinguishing calling contexts fully with call strings, object-sensitivity [33], and method cloning [54] is known to be unscalable for large programs [12, 44]. We achieve (full) context sensitivity by solving a CFL-reachability problem over a balanced-parentheses language, matching call and return edges within the IFDS framework as described in [38].

How do we ensure that Tailor works effectively for a SC of any given length, which is defined to be the number of statements in its longest statement sequence? In general, the longer a SC is, the more SC-irrelevant statements will be removed. We propose to lengthen any given SC by leveraging the concept of object-sensitivity [33,44] developed by the pointer analysis community for Java. As a result, some infeasible paths that would otherwise be introduced are avoided. We will also try to avoid making SC extensions that make our analysis run longer but contribute nothing to precision improvement.


Soundiness How do we make Tailor as sound as possible? We decompose the problem of tailoring a program for a given SC into two sub-problems: (1) building an ICFG, G_ICFG, for the program and (2) computing T(SC) from G_ICFG. Tailor is sound if G_ICFG is sound (i.e., represents all control flows in the program). In fact, Tailor is designed to be practically useful for analyzing the program behavior modeled by G_ICFG even if G_ICFG is unsound. This paper solves (2) while resorting to the state-of-the-art for (1).

1.3 Contributions

We introduce program tailoring, a new technique for trimming a program based on SCs, which are widely available from other analyses but never exploited by program slicing.

We describe how to extend a given SC for object-oriented programs in order to improve the precision of program tailoring (by making tailored programs smaller).

We formulate the problem of computing a tailored program as one of solving a data-flow problem efficiently in the IFDS framework. Tailor, which is implemented in Soot [51], is released as an open-source tool at http://www.cse.unsw.edu.au/~corg/tailor.

We describe two case studies to demonstrate why Tailor is practically useful on a set of seven large real-world Java programs, by (1) assisting program slicing with program debugging and understanding tasks, and (2) enabling a focused pointer analysis on the parts of a program containing hard language features such as reflection.

2 A Motivating Example

We use an example to describe how Tailor can assist slicing tools to simplify program debugging and understanding tasks through exploiting the temporal ordering information in a given SC that is otherwise ignored by program slicing. In Section 6.2, we provide additional motivations on why Tailor can be useful in simplifying program analysis tasks.

Large object-oriented programs are very difficult to debug and understand, due to the pervasive use of heap-allocated data, nested data structures, and large libraries with complex dependences and configurations. Tracing the flow of values via multiple levels of pointer indirection through the heap across many classes in both the application and libraries is unworkable. A practical tool is needed to pinpoint relevant statements for the task at hand.

Our Java example is given in Figure 2. The Driver class is used to create and initialize a Driver object according to some user input (lines 33 – 36) or by default (lines 37 – 41). Then the corresponding initialization information stored in info is dumped to a log at line 42.

This example has a typical error found in Java programs caused by ignoring the fact that closing a wrapper file handler will also close its internal file handler. The internal file handler, fw, is passed as an argument at line 6 and assigned to out at line 25. Later, closing its wrapper file handler, bw, at line 9 will also close out (i.e., fw) at line 27. Then any further access to a closed fw (e.g., at line 12) will trigger a runtime exception at line 20.

Now we use a static typestate analysis tool, Clara [9], to analyze this program. Some typestate specifications regarding the file operations used will include “accessing a closed file leads to an error state”. For our example, Clara produces the following error report:

Potential Point of Failure:
    Statement: fw.write(info) at line 12
Related Program Points:
    Statement: out.close() at line 27
    Statement: new FileWriter(...) at line 2

The report marks line 12 as a “potential point of failure”, together with a sequence of two method calls leading to line 12. We have one SC_{line 12} : line 2 → line 27 → line 12. As static analyses like Clara are either conservative or unsound, there may or may not be an error at line 12. Now our debugging task begins.


 1  class Driver {
 2    Writer fw = new FileWriter(...);
 3    Driver (String s) {…}
 4    Driver () {…}
 5    void config(String[] args) {
 6      Writer bw = new BufferedWriter(fw);
 7      for (int i = 1; i < args.length; i++)
 8        bw.write(args[i] + "\n");
 9      bw.close();
10    }
11    void log(String info) {
12      fw.write(info);
13    }
14  }

15  class FileWriter {
16    boolean isOpen = true;
17    void close() { isOpen = false; }
18    void write(...) {
19      if (!isOpen)
20        throw new IOException();
21    }
22  }

23  class BufferedWriter {
24    Writer out;
25    BufferedWriter(Writer w) { out = w; }
26    void close() {
27      out.close();
28    }
29  }
30

31  void main(String[] args) {
32    Driver d; String info;
33    if (args[0].equals(…)) {
34      d = new Driver(args[0]);
35      d.config(args);
36      info = getConfigInfo(args);
37    } else {
38      d = new Driver();
39      File f = getSystemFile(...);
40      info = getSystemInfo(f);
41    }
42    d.log(info);
43  }

Figure 2 An example showing how Tailor removes SC_{line 12}-irrelevant statements.

The error at line 12 happens only when a Driver object is created in the if branch of main(). Therefore, a tool that instructs the developer to examine this if branch only would enable its cause to be identified quickly. In contrast, marking some lines in its matching else branch as also being relevant can increase human analysis effort significantly (especially if the developer has to trace the flow of values across many nested heap structures).

Traditional Slicing. For large object-oriented programs, traditional (sound) slicing [21,53] is unscalable or yields slices that are too large for human consumption [47]. Given a virtual call fw.write(info) at line 12, the set of variables of interest consists of (1) the receiver reference and the arguments of the call [52], and (2) some relevant variables selected manually or recognized automatically [2, 13]. In our example, these variables form the set {fw, info, fw.isOpen}. Then, a (backward) slice that affects their values comprises all the statements except lines 7 – 8 and 19 – 20. This slice, which includes everything in main(), contains too many statements that are not all directly relevant to the task at line 12.

Thin Slicing. Thin slicing [47] is introduced to facilitate program debugging and understanding for object-oriented programs by trading soundness for scalability and (direct) relevance. All control dependences and all base pointer data dependences are excluded. Given a point of interest, thin slicing includes only so-called producer statements that affect directly the values at the point. Statements that serve to explain why producer statements affect the point are ignored. For example, given x = p.f and q.f = y, where p and q may be aliased, q.f = y is a producer statement for x = p.f, because there may be a direct value flow from y to x. All statements that help explain or establish why p and q are aliases are ignored.

If we adopt the same slicing criterion at line 12 as above, thin slicing will include only seven statements at lines 2, 12, 16, 17, 36, 39 and 40. Compared with the traditional slice obtained, this smaller slice still retains line 17, a statement for explaining an immediate cause of the error at line 12. However, two SC_{line 12}-irrelevant statements at lines 39 – 40 (in the else branch of main()) are also included.


Figure 3 Leveraging the ordering information SC : A → B → C to trim irrelevant statements away. [Diagram legend: virtual call, branch, intra-procedural control flow, inter-procedural control flow.]

Program Tailoring. Given SC_{line 12} : line 2 → line 27 → line 12, Tailor produces a tailored program consisting of all the statements in all execution paths passing through lines 2, 27 and 12 in that order. As line 27 is only reachable from line 9, which resides in the config method invoked at line 35, the tailored program includes all the lines in the example except lines 37 – 41 that appear in the else branch of main() and lines 19 – 20.

Let us revisit the two slices obtained above, the traditional slice, P_trad, and the thin slice, P_thin. Let our tailored program be identified as P_tal. Despite |P_thin| < |P_tal| < |P_trad|, tailoring brings several benefits, obtained from exploiting the temporal order of invocation sequences in SC_{line 12}. First, P_tal is the only one that includes nothing from the else branch of main(), revealing more clearly to a human debugger that the potential error at line 12 is triggered by a Driver object created in the if branch of main() (“according to some user input”) rather than in its matching else branch (“according to some default configuration”). Such contextual information enables the developer to locate the cause for the error at line 12 more quickly. In the case of P_trad and P_thin, the developer may end up wasting a lot of analysis effort on navigating through a lot of SC_{line 12}-irrelevant code, highlighted by getSystemInfo() and getSystemFile(), in the else branch. Second, any statement that is not included in P_tal is SC_{line 12}-irrelevant for understanding the SC-specific behavior at line 12. Thus, Tailor can make, e.g., thin slicing more effective, by removing the SC_{line 12}-irrelevant statements P_thin − P_thin ∩ P_tal, i.e., lines 39 and 40, from P_thin. Then P_thin is down to only five statements at lines 2, 12, 16, 17 and 36. Starting from line 17 (an immediate cause for the error at line 12), the developer can trace the flow of isOpen backwards to find the original cause. Finally, P_tal includes all data and control dependences reaching line 12 affected by SC_{line 12}, enabling it to be analyzed further by other analyses, e.g., a pointer analysis, as will be discussed in Section 6. However, P_trad and P_thin will not be applicable since P_trad is unobtainable scalably for large programs and P_thin is unsound.

Note that P_tal still contains lines 7 – 8 that do not affect SC_{line 12}. Removing all such irrelevant statements for large programs may be neither necessary (due to the first two points made earlier) nor practical, as a sound slicing tool would be unscalable. Thus, we have designed Tailor based on the precision/scalability tradeoffs described in Section 1.
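The set arithmetic behind the second benefit can be checked mechanically. The following is a minimal sketch, not part of Tailor: the class and method names are hypothetical, the thin slice is the seven lines listed in the text, and the tailored program is assumed to be lines 1 to 43 of Figure 2 minus lines 37 – 41 and 19 – 20, as stated above.

```java
import java.util.Set;
import java.util.TreeSet;

public class ThinSliceRefinement {
    /** Assumed P_tal: lines 1..43 of Figure 2 minus lines 37-41 and 19-20. */
    static Set<Integer> tailoredProgram() {
        Set<Integer> pTal = new TreeSet<>();
        for (int line = 1; line <= 43; line++) {
            boolean inElseBranch = line >= 37 && line <= 41;
            boolean inOpenCheck = line == 19 || line == 20;
            if (!inElseBranch && !inOpenCheck) pTal.add(line);
        }
        return pTal;
    }

    /** P_thin as listed in the text. */
    static Set<Integer> thinSlice() {
        return new TreeSet<>(Set.of(2, 12, 16, 17, 36, 39, 40));
    }

    /** P_thin - (P_thin ∩ P_tal), which equals P_thin - P_tal. */
    static Set<Integer> irrelevant(Set<Integer> slice, Set<Integer> tailored) {
        Set<Integer> result = new TreeSet<>(slice);
        result.removeAll(tailored);
        return result;
    }

    /** The refined slice P_thin ∩ P_tal. */
    static Set<Integer> refined(Set<Integer> slice, Set<Integer> tailored) {
        Set<Integer> result = new TreeSet<>(slice);
        result.retainAll(tailored);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(irrelevant(thinSlice(), tailoredProgram())); // [39, 40]
        System.out.println(refined(thinSlice(), tailoredProgram()));    // [2, 12, 16, 17, 36]
    }
}
```

The two printed sets correspond to the removed lines 39 and 40 and to the five remaining statements named in the text.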

Figure 3 recaps our insight behind tailoring. Given a SC : A → B → C, we focus on the behavior at C affected by a sequential execution of A, B and C. If one point in SC is reached from only one branch (e.g., the one containing A) of a multi-way branching statement, then the statements in the other branches are SC-irrelevant and can be trimmed away. This is particularly suitable for object-oriented languages, since a virtual call site behaves as a multi-way branch switching to its target methods. For example, B can be regarded as residing in a target method invoked at the marked virtual call on a receiver object created only at the allocation site at A. Thus, the other target methods unreachable to C are trimmed away.

Figure 4 Overview of Tailor (with all the components implemented in this paper in blue). [Diagram: the program goes through ICFG Construction to produce G_ICFG; analysis tools (e.g., API protocol analysis) supply a SC; inside Tailor, SC Extension (SCEXT) produces an extended SC, which the SC-Based Data-Flow Analysis (SCDFA), with its bottom-up and top-down passes, uses on G_ICFG to produce the tailored program.]

3 Methodology

Figure 4 gives an overview of Tailor, with all the components implemented in this paper highlighted in blue. Given a program, we rely on the state-of-the-art (shown as “ICFG Construction”) to build an inter-procedural control flow graph (ICFG), denoted G_ICFG, to represent all the possible control flows in the program. A SC is simply a set of one or more statement sequences ending at the same statement, with all statements identified by their line numbers, i.e., program points only. The length of a SC is the number of statements in its longest sequence. SCs can be deduced from the results returned by many analysis tools such as API protocol analysis [7, 37] and typestate analysis [9, 16, 34]. For example, a typestate analysis may report a potential error at line C, f.write(), together with two invocation sequences of related methods, A : f.open() → B1 : f1.close() and A : f.open() → B2 : f2.close(), leading to line C. Therefore, we may choose to tailor the program at C to investigate its behavior affected by SC = {A → B1 → C, A → B2 → C}.

Given a SC, a tailored program, T(SC), consists of the statements on all possible execution paths in G_ICFG passing through at least one statement sequence in SC. By exploiting the temporal ordering information in SC, it is possible to scale tailoring significantly better than slicing for large object-oriented programs, which makes it a practically useful technique.

Tailor is sound as it computes T(SC) over-approximately with respect to G_ICFG. In contrast, traditional (sound) slicing [21] is unscalable when G_ICFG is large and thin slicing [47] is unsound for G_ICFG (as it is designed for program debugging and understanding only).

Tailor computes T (SC ) in two stages, SC Extension and SC-based Data-Flow Analysis. In the first stage (Section 3.2), we exploit “branch correlations” in object-oriented programs to lengthen a given SC in order to avoid some infeasible paths that would otherwise be introduced in the second stage. In the second stage (Section 3.1), we compute T (SC) by solving a data-flow problem in order to avoid unrealizable paths efficiently. This design allows Tailor to achieve good precision and scale well to large object-oriented programs.

3.1 SC-Based Data-Flow Analysis

We compute T (SC) by solving flow- and context-sensitively an IFDS (Interprocedural Finite Distributive Subset) data-flow problem [38], efficiently on GICFG, via graph reachability.

This formulation of our SC-based data-flow analysis, denoted SCDFA, is significant for three reasons. (1) With flow-sensitivity, SCDFA can filter out imprecisely ordered statement sequences in a SC, as many static analyses from which SCs are deduced are flow-insensitive in order to be scalable. Conversely, precisely ordered statement sequences in SC also enable SCDFA to filter out irrelevant control flows in tailoring a program. This mutually beneficial process improves the precision of both parts, therefore avoiding the unnecessary complexity faced for solving both as one problem. (2) With context-sensitivity, we avoid introducing unrealizable paths with mismatched calls and returns, which is critical for achieving precision in computing T(SC). (3) With an IFDS formulation, SCDFA can scale well to reasonably large object-oriented programs. In particular, (full) context-sensitivity can be realized more efficiently by solving a CFL-reachability problem over a simplified balanced-parentheses language. As an IFDS problem, SCDFA has a time complexity of O(ED^3), where E is the number of edges in G_ICFG and D is the size of the set of SC-related data-flow facts used (i.e., the set of suffixes of sequences in SC, as will be clear below and defined in Section 4).

Figure 5 A simplified G_ICFG of the program given in Figure 2 for illustrating the bottom-up (BU) and top-down (TD) passes of SCDFA with SC_w = {n→c→w, n′→c→w}. [Diagram: nodes ENTRY, 33 if (...), 34 new Driver(...), 2 new FileWriter() (n), 35 d.config(), 27 out.close() (c), 36 getConfigInfo(), 38 new Driver(), 2′ new FileWriter() (n′), 39 getSystemFile(), 40 getSystemInfo(), 42 d.log(), 12 fw.write() (w), and the end node, each annotated with its BU and TD fact sets; the legend distinguishes SC statements from non-SC statements and intra-procedural from inter-procedural control flow.]

SCDFA starts with a bottom-up pass (BU) and finishes with a top-down pass (TD), with both performed (fully) context-sensitively. By using the example program from Figure 2, we first explain the functionalities of two passes and then examine briefly how context-sensitivity is realized efficiently in the IFDS framework. Suppose we are given a SC as line 2 : fw = new FileWriter() → line 27 : out.close() → line 12 : fw.write(). For convenience, we write it as SC : n→c→w. In its G_ICFG, the allocation site for FileWriter at line 2 is replicated in the two constructors of Driver. We identify the one in the single-arg constructor as line 2 (denoted by n) and the one in the non-arg constructor as line 2′ (denoted by n′). As a result, we finally have a two-sequence SC_w = {n→c→w, n′→c→w}.

BU and TD Passes. Both passes operate on G_ICFG, as shown in Figure 5, for our example. As in any data-flow analysis, all control flow edges in G_ICFG are treated as non-deterministically executable. Let ENTRY be the entry node of G_ICFG and EXIT the node marking the point of interest w at line 12 in SC_w. Conceptually, if a sequence S ∈ SC_w, which is ncw or n′cw, lies on a path, then BU must see a suffix of S at every node n on the path backwards from EXIT and TD must see the corresponding prefix of S at n forwards from ENTRY.


Figure 6 Context-sensitivity via CFL-reachability. [Diagram: the two call sites of FileWriter() at nodes n (line 2) and n′ (line 2′) are connected to FileWriter.<init> ENTER and FileWriter.<init> EXIT by call-to-start edges (_n and (_n′ and exit-to-return edges )_n and )_n′, alongside call-to-return edges; each node carries the facts 0, ncw, cw and w.]

BU computes a global property, PANTI, for every node n in G_ICFG, backwards against the control flow, starting at EXIT. PANTI_out(n) represents the set of suffixes of some sequences in SC_w that are partially anticipable at the entry of n, i.e., appear on some outgoing path of n ending at EXIT. In our example, PANTI_out(ENTRY) = {ncw, cw, w}. As ncw is partially anticipable (but n′cw is not) at ENTRY, G_ICFG contains some paths passing through ncw.

TD computes a global property, PAVAIL, for every node n in G_ICFG, forwards along the control flow, starting from ENTRY. PAVAIL_in(n) specifies the set of suffixes of some sequences in PANTI_out(ENTRY) to represent implicitly (for efficiency reasons) the fact that their corresponding prefixes are partially available at the entry of n, i.e., appear on some incoming paths of n starting from ENTRY. For example, PAVAIL_in(12) = {ncw, w}, indicating that the prefixes ε (the empty prefix) and nc of ncw are partially available at the entry of node 12.

For this example, a node n is included in the tailored program if PAVAIL_in(n) ∩ PANTI_out(n) ≠ ∅, since a suffix s ∈ PAVAIL_in(n) ∩ PANTI_out(n) is partially anticipable at the entry of n and some sequence in SC_w with s removed is partially available at the entry of n. In our example, the tailored program consists of all the lines except lines 38 – 40.
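The two passes can be mimicked on a toy graph. The following is a minimal sketch under stated assumptions, not the authors' SCDFA: it is intra-procedural (the graph of Figure 5 is flattened by hand, with calls inlined), it uses plain fixed-point iteration instead of an IFDS solver, and its transfer functions are inferred from the fact sets quoted in the text (BU accumulates suffixes, TD consumes the head of a fact at a matching node). The class name, method names and node labels are hypothetical.

```java
import java.util.*;

public class TailorSketch {
    /** All non-empty suffixes of every sequence in the SC; these are the facts. */
    static Set<List<String>> suffixes(List<List<String>> sc) {
        Set<List<String>> facts = new HashSet<>();
        for (List<String> seq : sc)
            for (int i = 0; i < seq.size(); i++)
                facts.add(new ArrayList<>(seq.subList(i, seq.size())));
        return facts;
    }

    /** Nodes v with PAVAIL(v) ∩ PANTI(v) non-empty, both taken at the entry of v. */
    static Set<String> tailor(Map<String, List<String>> succ, String entry,
                              List<List<String>> sc) {
        Set<String> nodes = new LinkedHashSet<>(succ.keySet());
        succ.values().forEach(nodes::addAll);
        Map<String, Set<List<String>>> panti = new HashMap<>();
        Map<String, Set<List<String>>> pavail = new HashMap<>();
        for (String v : nodes) {
            panti.put(v, new HashSet<>());
            pavail.put(v, new HashSet<>());
        }
        Set<List<String>> facts = suffixes(sc);

        // BU pass: a suffix is anticipable at v if it is anticipable at a
        // successor, or if it starts with v and its remainder is anticipable.
        for (boolean changed = true; changed; ) {
            changed = false;
            for (String v : nodes) {
                Set<List<String>> seen = new HashSet<>();
                for (String s : succ.getOrDefault(v, List.of())) seen.addAll(panti.get(s));
                for (List<String> f : facts) {
                    boolean startsHere = f.get(0).equals(v);
                    boolean restSeen = f.size() == 1 || seen.contains(f.subList(1, f.size()));
                    if (startsHere && restSeen) seen.add(f);
                }
                changed |= panti.get(v).addAll(seen);
            }
        }

        // TD pass: start from the full sequences anticipable at the entry and
        // strip the head of a fact whenever control passes through that node.
        for (List<String> seq : sc)
            if (panti.get(entry).contains(seq)) pavail.get(entry).add(seq);
        for (boolean changed = true; changed; ) {
            changed = false;
            for (String v : nodes)
                for (List<String> f : new ArrayList<>(pavail.get(v))) {
                    boolean consume = f.size() > 1 && f.get(0).equals(v);
                    List<String> out = consume ? new ArrayList<>(f.subList(1, f.size())) : f;
                    for (String s : succ.getOrDefault(v, List.of()))
                        changed |= pavail.get(s).add(out);
                }
        }

        Set<String> kept = new LinkedHashSet<>();
        for (String v : nodes)
            if (!Collections.disjoint(pavail.get(v), panti.get(v))) kept.add(v);
        return kept;
    }

    /** Hand-flattened version of Figure 5; n, n', c, w stand for lines 2, 2', 27, 12. */
    static Set<String> example() {
        Map<String, List<String>> succ = new LinkedHashMap<>();
        succ.put("ENTRY", List.of("33"));
        succ.put("33", List.of("34", "38"));
        succ.put("34", List.of("n"));
        succ.put("n", List.of("35"));
        succ.put("35", List.of("c"));
        succ.put("c", List.of("36"));
        succ.put("36", List.of("42"));
        succ.put("38", List.of("n'"));
        succ.put("n'", List.of("39"));
        succ.put("39", List.of("40"));
        succ.put("40", List.of("42"));
        succ.put("42", List.of("w"));
        return tailor(succ, "ENTRY",
                List.of(List.of("n", "c", "w"), List.of("n'", "c", "w")));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

On this toy graph the sketch keeps ENTRY, 33, 34, n, 35, c, 36, 42 and w, and drops 38, n′, 39 and 40, which agrees with the statement that lines 38 – 40 are excluded. It makes no attempt at context sensitivity, which the IFDS formulation handles.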

Context Sensitivity. Figure 6 illustrates as an example how the TD pass shown in Figure 5 is performed for FileWriter() context-sensitively by solving a CFL-reachability-based balanced-parentheses problem, efficiently in the IFDS framework. For technical details, we refer to [38]. There are four data-flow facts, ncw, cw, w, and 0 (for the empty set). There are two call sites to FileWriter() at lines 2 and 2′, which are identified as nodes n and n′, respectively, with their call-to-start and exit-to-return edges labeled as (_n, )_n, (_n′ and )_n′ appropriately. ENTER and EXIT are the start and exit nodes of FileWriter(), whose CFG is irrelevant and thus elided. In SCDFA, the call-to-return edge always serves as a kill edge (to stop the data-flow facts from bypassing a callee). For each node n, PAVAIL_in(n) is the set of facts, i.e., suffixes of ncw shown in black dots. According to Figure 5, PAVAIL_in(2) = PAVAIL_in(2′) = {ncw}. If FileWriter() is entered from (_n and exited from )_n, then PAVAIL_in(ENTER) = PAVAIL_in(EXIT) = {cw}. Hence, PAVAIL_in(35) = {cw}. However, if FileWriter() is entered from (_n′ and exited from )_n′, then PAVAIL_in(ENTER) = […]
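The balanced-parentheses condition on call and return edges can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical encoding in which a path is given as a list of edge labels: "(x" for a call-to-start edge at call site x, ")x" for the matching exit-to-return edge, and anything else for an intra-procedural edge. An IFDS solver does not enumerate paths like this; the sketch only shows which paths the language admits.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class RealizablePath {
    /**
     * Returns true if every return edge matches the most recent unmatched
     * call edge. Calls that are still open at the end of the path are allowed;
     * a return with no pending call is treated as unrealizable here.
     */
    static boolean isRealizable(List<String> edgeLabels) {
        Deque<String> pendingCalls = new ArrayDeque<>();
        for (String label : edgeLabels) {
            if (label.startsWith("(")) {
                pendingCalls.push(label.substring(1));
            } else if (label.startsWith(")")) {
                if (pendingCalls.isEmpty()) return false;
                if (!pendingCalls.pop().equals(label.substring(1))) return false;
            }
            // Any other label is an intra-procedural edge and needs no matching.
        }
        return true;
    }

    public static void main(String[] args) {
        // Enter FileWriter() from n and return to n: realizable.
        System.out.println(isRealizable(List.of("(n", "e", ")n")));   // true
        // Enter from n but return to n': mismatched, hence unrealizable.
        System.out.println(isRealizable(List.of("(n", "e", ")n'")));  // false
    }
}
```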


 1  class PMD {
 2    void main (String[] args) {
 3      CMLOptions opts = new CMLOptions(args);
 4      Renderer renderer = opts.createRenderer();
 5      renderer.end(); } }

 6  class AbstractRenderer ... {
 7    void end () {
 8      render(writer, ... ); } }

 9  class SummaryHTMLRenderer ... {
10    void render (Writer writer, … ) {
11      renderer = new HTMLRenderer(...);
12      renderer.renderBody(writer, ...); } }

13  class CMLOptions {
14    Renderer createRenderer () {
15      if ( … ) {
16        return new EmacsRenderer();
17      } else if ( … ) {
18        return new HTMLRenderer();
19      } else if ( … ) {
20        return new SummaryHTMLRenderer();
21      } else if ( … ) {
22        … } } }

23  class HTMLRenderer ... {
24    void renderBody (Writer writer, … ) {
25      writer.write(...); } }

[Annotations in the original figure: Ext 1, Ext 2, Ext 3.]

Figure 7 A code snippet from PMD for illustrating SCEXT in a program understanding task.

3.2 SC Extension

Tailor is designed to work effectively with any SC. While an API specification mining tool [17] may generate SCs longer than 10, an assertion verifier may produce only single-point SCs. In general, the longer a SC is, the more SC-irrelevant statements will be eliminated.

To improve the precision of SCDFA, we first perform a SC extension pass, denoted SCEXT, as shown in Figure 4. Given a sequence S ∈ SC, with F being its first point, we look for a set of n extension points E_1, …, E_n that collectively dominate F, such that any program execution that passes through F must pass through one E_i. We do so effectively by leveraging the concept of object-sensitivity [33,44] developed by the pointer analysis community for Java. For F, its extension points are chosen as the object allocation sites used for representing the object-sensitive calling contexts for the method containing F. As a result, some infeasible paths are avoided by exploiting branch correlations in object-oriented programs (Figure 3). To make SC extensions, we use the points-to information obtained by, e.g., the pointer analysis performed earlier for building G_ICFG. Consider an ICFG fragment discussed earlier in Figure 3. Suppose we are given SC : B → C such that B resides in a target method m for the virtual call site shown and m is only invokable on the objects created at A, which represents an object allocation site. Then we can prepend A to SC to obtain ESC : A → B → C. According to SCDFA, T(SC) will include both branches of “Branch” shown in Figure 3 since both reach SC but T(ESC) will include only the branch containing A as the other (infeasible) branch does not reach ESC. However, lengthening a SC increases the number of SC-related data-flow facts used (Figure 5). So a precision/scalability tradeoff needs to be made.

We use a real program understanding task to show why SCEXT is useful for real code. PMD is a static analyzer for Java programs and can print a range of source code flaws in different formats such as Emacs, CSV, and HTML. In this task, we want to understand how PMD renders its outputs in the commonly used HTML format. Figure 7 shows the simplified code. The only knowledge we have initially is that the HTMLRenderer class is responsible for writing to the file at line 25, but how it is done is unknown. At this stage, we have a single-point SC, SCline 25: line 25, representing just the write statement at line 25.

If we apply SCDFA to SCline 25, T(SCline 25) will include all the lines in the code given. Below we show how to extend this to ESCline 25: line 20 → line 11 → line 25 object-sensitively [33, 44]. If we apply SCDFA to ESCline 25, T(ESCline 25) will now be smaller, consisting of all the lines except lines 16, 18 and 22 in method createRenderer(). In other words, all the branches except the one enclosing line 20 in method createRenderer() are infeasible for SCline 25, according to the points-to information provided for this program.

Let us see how SCEXT works in growing SCline 25 to become ESCline 25. Initially, SCline 25


object pointed to by renderer at line 12 is allocated only at the allocation site at line 11 and is considered as the calling context for line 25 in an object-sensitive pointer analysis. There is another allocation site at line 18 for the same type, HTMLRenderer, but this is not considered, since the abstract object created at line 18 does not flow to line 12 (according to the points-to information available). At this stage, we have SCline 25: line 11 → line 25.

Similarly, we now look for the allocation sites that can be used as the calling contexts for line 11 contained in method render() at line 10, which is called from this.render() at line 8 in method end() defined in the AbstractRenderer class. Thus, the receiver objects on which method render() (line 10) is invoked are the ones pointed to by renderer at line 5. These objects are returned from the call to createRenderer() at line 4. Thus, renderer points to the objects created by the allocation sites in all different branches in createRenderer(). According to the points-to information, the target method end() invoked at the virtual call site at line 5 can only be invoked on a receiver object of type SummaryHTMLRenderer. Thus, we now have an even longer SCline 25: line 20 → line 11 → line 25.

The allocation site at line 3 is the object-sensitive context for the target method createRenderer(), which contains line 20, invoked at line 4. No further extension is possible, since main() has been reached. Now, we have SCline 25: line 3 → line 20 → line 11 → line 25.

If we apply SCDFA to this four-point SCline 25, T(SCline 25) will consist of all the lines except lines 16, 18 and 22. However, including line 3 does not help as it dominates line 20 in GICFG. By removing it, we obtain ESCline 25: line 20 → line 11 → line 25 as desired. Applying SCDFA to ESCline 25 yields the same tailored program obtained for the three-point SCline 25.

When lengthening SCs, we aim to reduce infeasible paths by exploiting branch correlations. However, with longer SCs, SCDFA will end up using more data-flow facts, as seen in Figure 5, making it less efficient than before. So a precision/scalability tradeoff has to be made. Our key observation is to keep a SC extension point found if it is inside one branch in a multi-way branching statement or a target method invoked at a polymorphic call site (Figure 3), provided that the point is not in a cycle. This way, the other branches (or target methods) are infeasible (with respect to a given SC) and will not show up in the tailored program.

Let us return to the program understanding task at hand. From the tailored program T(ESCline 25) obtained for ESCline 25, we can see clearly that the allocation site at line 20 in method createRenderer() is responsible for the rendering-related write at line 25.

4 Formalism

We first formalize SCDFA and SCEXT and then prove some properties about Tailor.

4.1 SC-based Data-Flow Analysis

We provide a context-insensitive formalization of SCDFA as context sensitivity is achieved on top of this formalization via CFL-reachability, as shown in Figure 6. Essentially, we describe the data-flow equations needed for solving its BU and TD passes in the IFDS framework [38].

The IFDS problem framework is precise and efficient for solving data-flow problems with three properties. First, the set $D$ of data-flow facts is finite. Second, the domain and range of each flow (i.e., transfer) function is the power set of $D$, denoted $2^D$, the lattice ordering relation $\sqsubseteq$ on $2^D$ is $\subseteq$ or $\supseteq$, and the meet operator $\sqcap$ is $\cap$ or $\cup$. Finally, each flow function $f$ must be distributive over $2^D$: $\forall S_1, S_2 \in 2^D : f(S_1 \sqcap S_2) = f(S_1) \sqcap f(S_2)$.

Below we describe our BU and TD passes, including their finite domain $D$ and data-flow equations used. In both cases, the ordering relation $\sqsubseteq$ is $\supseteq$ and the meet operator $\sqcap$ is $\cup$. All our transfer functions given below are easily seen to be distributive over $2^D$.


Domain. A $SC_P$ is a set of statement sequences ending at the same statement $P$, with all statements identified by their line numbers (or labels). The domain $D$ for BU and TD is:

\[ D = \bigcup_{S \in SC_P} \mathit{Suffix}(S) \tag{1} \]

where $\mathit{Suffix}(S)$ returns the set of all suffixes of $S$, including $\epsilon$ (the empty string). Note that $\epsilon$ is necessary when $P$ appears in a control-flow cycle (i.e., a loop or recursion). In the IFDS framework, the data-flow facts used are generated on the fly for efficiency reasons.
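As a small illustration of the domain in (1), the following Python sketch enumerates the suffix set for a given set of sequences. It assumes sequences are modelled as tuples of line labels and the empty string as the empty tuple; the function names and the second sequence in the example are hypothetical and only serve the illustration.

```python
def suffixes(seq):
    """Return all suffixes of a sequence, including the empty one (epsilon)."""
    seq = tuple(seq)
    return {seq[i:] for i in range(len(seq) + 1)}


def domain(sc_p):
    """Union of Suffix(S) over every sequence S in SC_P, mirroring (1)."""
    d = set()
    for s in sc_p:
        d |= suffixes(s)
    return d


# Two sequences ending at the same point (label 12). The first follows the
# shape line 2 -> line 27 -> line 12; the second is made up for illustration.
SC_P = [(2, 27, 12), (5, 27, 12)]
D = domain(SC_P)
```

Since the two sequences share the suffixes (27, 12), (12,) and the empty tuple, D contains five facts rather than eight, which is why shared suffixes keep the domain small.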

We make use of the car, cdr and cons functions operating on sequences in the standard manner. If $s$ and $s'$ are two sequences, then $s \mathbin{+\!+} s'$ returns the concatenation of $s$ and $s'$.

Graph Representation. The BU and TD passes operate on the GICFG representation of a program. Without loss of generality, we assume that a node (i.e., basic block) in GICFG contains one single statement. Thus, a node and the statement represented by it are used interchangeably. Let ENTRY be the entry node of GICFG and EXIT be the node for $P$ in $SC_P$. Given a node $n$, $succ(n)$ ($pred(n)$) is the set of its successors (predecessors) in GICFG.

BU: Bottom-Up Analysis. BU computes a global property, PANTI, for every node $n$ in GICFG, backwards against the control flow, starting at EXIT. $\mathit{PANTI}_{out}(n)$ represents the set of suffixes in $D$ that are partially anticipable at the entry of $n$, i.e., that appear on some outgoing path of $n$ ending at EXIT. Thus, we have the following initialization at EXIT:

\[ \mathit{PANTI}_{out}(\mathrm{EXIT}) = \{P\} \tag{2} \]

which means that our point of interest $P$ in $SC_P$ is partially anticipable at the entry of EXIT.

The transfer equations at a node $n$ in GICFG are given by:

\[ \mathit{PANTI}_{in}(n) = \bigcup_{n' \in succ(n)} \mathit{PANTI}_{out}(n') \tag{3} \]

\[ \mathit{PANTI}_{out}(n) = \mathit{GEN}_{BU}(n) \cup \mathit{PANTI}_{in}(n) \tag{4} \]

Due to the presence of $\mathit{PANTI}_{in}(n)$ in (4), whatever is anticipable at the exit of $n$ is also anticipable at the entry of $n$. No old data-flow facts (i.e., known suffixes) are killed, because some sequences in $SC_P$ may share some common suffixes but have different prefixes. $\mathit{GEN}_{BU}$ is applied to generate all the new suffixes partially anticipable at a node:

\[ \mathit{GEN}_{BU}(n) = \bigcup_{s \in \mathit{PANTI}_{in}(n)} \text{ADD-SUFFIX-SEEN}_{BU}(n, s) \tag{5} \]

Given a partially anticipable suffix $s$ at the exit of a node $n$, $cons(n, s)$ will also be partially anticipable at the entry of $n$ if $cons(n, s) \in D$. Hence, we have:

\[ \text{ADD-SUFFIX-SEEN}_{BU}(n, s) = \begin{cases} \{cons(n, s)\} & \text{if } cons(n, s) \in D \\ \emptyset & \text{otherwise} \end{cases} \tag{6} \]

The following lemma follows immediately from the data-flow equations (2) – (6).

Lemma 1. For any node $n$, $\epsilon \notin \mathit{PANTI}_{in}(n)$ and $\epsilon \notin \mathit{PANTI}_{out}(n)$ always hold.
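Equations (2) – (6) can be exercised on a toy graph with the Python sketch below. It is a plain round-robin fixed-point iteration, not the IFDS/CFL-reachability formulation used by SCDFA, and it is context-insensitive; the diamond-shaped graph, the node labels and the function name `bottom_up` are assumptions made for illustration. Sequences are tuples of node labels.

```python
def bottom_up(succ, exit_node, domain):
    """Iterate equations (2)-(6) to a fixed point on a small graph."""
    nodes = set(succ) | {m for targets in succ.values() for m in targets}
    panti_in = {n: set() for n in nodes}
    panti_out = {n: set() for n in nodes}
    panti_out[exit_node] = {(exit_node,)}            # initialization (2)
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = set()
            for m in succ.get(n, ()):                # equation (3)
                new_in |= panti_out[m]
            # equations (5) and (6): keep cons(n, s) only if it is in D
            gen = {(n,) + s for s in new_in if (n,) + s in domain}
            # equation (4); old facts (including the seed at EXIT) are kept
            new_out = panti_out[n] | gen | new_in
            if new_in != panti_in[n] or new_out != panti_out[n]:
                panti_in[n], panti_out[n] = new_in, new_out
                changed = True
    return panti_in, panti_out


# Hypothetical diamond: 0 -> {1, 2} -> 3 -> 4, with SC_P = {1 -> 4}.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
DOMAIN = {(1, 4), (4,), ()}
PANTI_IN, PANTI_OUT = bottom_up(SUCC, 4, DOMAIN)
```

On this graph, only the branch through node 1 generates the longer suffix (1, 4); the other branch merely passes (4,) along, and the empty sequence never appears, in line with Lemma 1.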

Example 2. Figure 5 illustrates the BU pass with the data-flow results shown. We start with


TD: Top-Down Analysis. TD computes a global property, PAVAIL, for every node $n$ in GICFG, forwards along the control flow, starting from ENTRY and visiting only the nodes reachable in BU. The basic idea is simple. A statement sequence $S \in SC_P$ starts at ENTRY and flows forwards along the control flow and ends up with its first statement $car(S)$ removed on encountering the node representing $car(S)$. Therefore, a node $n$ is included in the tailored program $T(SC_P)$ if the remaining suffix of $S$ that flows to the entry of $n$ appears also in $\mathit{PANTI}_{out}(n)$ or the entire sequence $S$ has been removed upon reaching $n$ (when $P$ in $SC_P$ appears in a control-flow cycle). Formally, $\mathit{PAVAIL}_{in}(n)$ computes the set of suffixes $s \in D$ to represent implicitly (for efficiency reasons) the fact that their corresponding prefixes $p$, such that $p \mathbin{+\!+} s = S$ for some $S \in SC_P$, are partially available at the entry of $n$.

As all the sequences in $SC_P \setminus \mathit{PANTI}_{out}(\mathrm{ENTRY})$ have been filtered out by BU, we only need to focus on the ones partially anticipable at ENTRY. Hence, our initialization is:

\[ \mathit{PAVAIL}_{in}(\mathrm{ENTRY}) = SC_P \cap \mathit{PANTI}_{out}(\mathrm{ENTRY}) \tag{7} \]

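The transfer equations at a non-ENTRY node in GICFG are given by:

\[ \mathit{PAVAIL}_{in}(n) = \bigcup_{n' \in pred(n)} \mathit{PAVAIL}_{out}(n') \tag{8} \]

\[ \mathit{PAVAIL}_{out}(n) = \begin{cases} \mathit{GEN}_{TD}(n) & \text{if } \mathit{PANTI}_{out}(n) \neq \emptyset \\ \emptyset & \text{otherwise} \end{cases} \tag{9} \]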

During the top-down pass, we only need to visit a node $n$ if $n$ is reachable during the bottom-up pass, which happens when $\mathit{PANTI}_{out}(n) \neq \emptyset$.

Unlike $\mathit{GEN}_{BU}$, $\mathit{GEN}_{TD}$ may preserve or kill an old data-flow fact and generate some new ones:

\[ \mathit{GEN}_{TD}(n) = \bigcup_{s \in \mathit{PAVAIL}_{in}(n)} \text{ADD-SUFFIX-EXPECTED-TO-SEE}_{TD}(n, s) \tag{10} \]

As a suffix $s \in \mathit{PAVAIL}_{in}(n)$ represents the fact that its corresponding prefix $p$, such that $p \mathbin{+\!+} s = S$ for some $S \in SC_P$, is partially available at the entry of $n$, we have:

\[ \text{ADD-SUFFIX-EXPECTED-TO-SEE}_{TD}(n, s) = \begin{cases} \{cdr(s)\} & \text{if } car(s) = n \\ \{s\} & \text{otherwise} \end{cases} \tag{11} \]

There are two cases. If $car(s) = n$, then $cdr(s)$ is generated, indicating that $p \mathbin{+\!+} n$ is partially available at the exit of $n$. At the same time, $s$ is killed (for efficiency, not correctness), as $s$ would be redundantly propagated otherwise. If $car(s) \neq n$, then $s$ is simply preserved.
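The two cases of (11), lifted to sets as in (10), can be written as the following Python sketch. Sequences are again modelled as tuples; treating the empty sequence as simply preserved (since car is undefined on it) is an assumption of this sketch, as are the function names and the symbolic labels c and w.

```python
def add_suffix_expected_to_see(n, s):
    """Equation (11): consume the head of s when node n matches it."""
    if s and s[0] == n:
        return {s[1:]}        # cdr(s) is generated, s itself is killed
    return {s}                # s is preserved (also when s is empty)


def gen_td(n, pavail_in):
    """Equation (10): union of the per-suffix results at node n."""
    out = set()
    for s in pavail_in:
        out |= add_suffix_expected_to_see(n, s)
    return out


# A suffix c -> w arriving at the node for c leaves as w.
AT_C = gen_td("c", {("c", "w")})
# At an unrelated node x, the same suffix passes through unchanged.
AT_X = gen_td("x", {("c", "w")})
# At the node for w, the remaining suffix w is consumed to the empty sequence.
AT_W = gen_td("w", {("w",)})
```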

Example 3. Figure 5 illustrates the TD pass with the data-flow results shown. We start with $\mathit{PAVAIL}_{in}(\mathrm{ENTRY}) = SC_w \cap \mathit{PANTI}_{out}(\mathrm{ENTRY}) = \{ncw\}$ and finish with $\mathit{PAVAIL}_{in}(\mathrm{EXIT}) = \mathit{PAVAIL}_{in}(12) = \{ncw, w\}$. At node 27, $\mathit{PAVAIL}_{out}(27) = \{w\}$, since $\mathit{PAVAIL}_{in}(27) = \{cw\}$.

Tailored Program. Finally, a node $n$ is included in $T(SC_P)$ if TAILORED($n$) holds:

\[ \mathit{TAILORED}(n) = \big(\mathit{PAVAIL}_{in}(n) \cap \mathit{PANTI}_{out}(n) \neq \emptyset\big) \lor \epsilon \in \mathit{PAVAIL}_{in}(n) \tag{12} \]

The first disjunct suffices if $P$ in $SC_P$ is not in a cycle (in GICFG). If $s \in \mathit{PAVAIL}_{in}(n) \cap \mathit{PANTI}_{out}(n)$, then a prefix $p$, such that $p \mathbin{+\!+} s = S$ for some $S \in \mathit{PAVAIL}_{in}(\mathrm{ENTRY})$, is partially available at the entry of $n$, so $n$ must be in $T(SC_P)$, since it lies on a path passing through all statements in $S$. If $P$ is in a cycle, then the whole sequence $S$ that starts at ENTRY ends up being removed eventually, resulting in $\epsilon \in \mathit{PAVAIL}_{in}(n)$. Then $n$ should be included in $T(SC_P)$ as well. Note that $\epsilon \notin \mathit{PANTI}_{out}(n)$ by Lemma 1.
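To see how the BU pass, the TD pass and the predicate in (12) fit together, the Python sketch below runs all three on a toy graph. It is a context-insensitive, round-robin fixed-point iteration written purely for illustration and is not the IFDS-based implementation; the function name `tailor`, the diamond-shaped graph and its node labels are assumptions.

```python
def tailor(succ, entry, exit_node, sc_p):
    """Return the nodes n for which TAILORED(n) in (12) holds."""
    sc_p = {tuple(s) for s in sc_p}
    nodes = set(succ) | {m for targets in succ.values() for m in targets}
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for m in targets:
            pred[m].add(n)
    domain = {s[i:] for s in sc_p for i in range(len(s) + 1)}   # (1)

    # BU pass, equations (2)-(6).
    panti_out = {n: set() for n in nodes}
    panti_out[exit_node] = {(exit_node,)}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            panti_in = set().union(*(panti_out[m] for m in succ.get(n, ())))
            gen = {(n,) + s for s in panti_in if (n,) + s in domain}
            new_out = panti_out[n] | gen | panti_in
            if new_out != panti_out[n]:
                panti_out[n], changed = new_out, True

    # TD pass, equations (7)-(11).
    pavail_in = {n: set() for n in nodes}
    pavail_out = {n: set() for n in nodes}
    pavail_in[entry] = sc_p & panti_out[entry]
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n != entry:
                pavail_in[n] = set().union(*(pavail_out[m] for m in pred[n]))
            if panti_out[n]:
                new_out = {s[1:] if s and s[0] == n else s
                           for s in pavail_in[n]}
            else:
                new_out = set()
            if new_out != pavail_out[n]:
                pavail_out[n], changed = new_out, True

    # Equation (12); the empty tuple plays the role of epsilon.
    return {n for n in nodes
            if (pavail_in[n] & panti_out[n]) or () in pavail_in[n]}


# Hypothetical diamond: 0 -> {1, 2} -> 3 -> 4.
SUCC = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
SINGLE_POINT = tailor(SUCC, 0, 4, [(4,)])
TWO_POINT = tailor(SUCC, 0, 4, [(1, 4)])
```

With the single-point criterion both branches are kept, whereas the two-point criterion 1 → 4 drops node 2, which is the effect that lengthening a SC is meant to achieve.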

Example 4. According to the data-flow facts shown for the program given in Figure 5,


4.2 SC Extension

We make use of the points-to information provided by a pointer analysis to extend a SC to reduce infeasible paths that would otherwise be introduced by SCDFA. Given a statement sequence $S \in SC$, SCEXT will lengthen it recursively by prepending the object allocation sites representing the calling contexts for the method containing its first statement, as demonstrated in Section 3.2. The general algorithm for lengthening a sequence $S : L_1 \to \cdots \to L_n$ is as follows. Suppose that $L_1$ resides in a method $m$ invoked at a virtual call site. Let $A_1, \cdots, A_r$ be all the $r$ allocation sites for creating the receiver objects on which $m$ is invoked. Then $S$ grows into $A_1 \to L_1 \to \cdots \to L_n$, $\cdots$, $A_r \to L_1 \to \cdots \to L_n$, and the same process continues.
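The lengthening process can be sketched in Python as follows. The sketch assumes the points-to information has already been condensed into a map from a statement to the allocation sites that serve as object-sensitive contexts for its enclosing method; that map, the `keep` predicate and the function names are hypothetical stand-ins, and the line labels follow the walk-through of Figure 7 in Section 3.2.

```python
def full_extensions(seq, alloc_sites_of):
    """Prepend context allocation sites recursively until none are left."""
    seq = tuple(seq)
    # Skip sites already in the sequence so that cyclic maps terminate.
    sites = [a for a in alloc_sites_of.get(seq[0], ()) if a not in seq]
    if not sites:
        return [seq]
    result = []
    for a in sites:
        result.extend(full_extensions((a,) + seq, alloc_sites_of))
    return result


def sc_ext(seq, alloc_sites_of, keep):
    """Extend seq, then drop extension points rejected by keep()."""
    seq = tuple(seq)
    extended = set()
    for ext in full_extensions(seq, alloc_sites_of):
        prefix = ext[:len(ext) - len(seq)]      # only the added points
        extended.add(tuple(p for p in prefix if keep(p)) + seq)
    return extended


# Contexts from Section 3.2: line 11 for line 25, line 20 for line 11,
# and line 3 for line 20.
CONTEXTS = {25: [11], 11: [20], 20: [3]}
FULL = full_extensions((25,), CONTEXTS)
# Line 3 lies directly in main() and is therefore not kept.
ESC = sc_ext((25,), CONTEXTS, keep=lambda line: line != 3)
```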

With longer SCs, SCDFA will be less efficient due to more data-flow facts introduced. We will keep a SC extension point if it is embedded in a branch and ignore it otherwise. This way, SCEXT can enable SCDFA to avoid infeasible paths more effectively by exploiting branch correlations. A statement is said to be embedded in a branch if it appears intraprocedurally (directly) or interprocedurally (indirectly) inside a multi-way branching statement or a target method invoked at a polymorphic call site (i.e., a virtual call site with at least two target methods). Let Stmts be the set of all statements in GICFG. We use the following function InBranchOrVC: Stmts → boolean to capture formally this branch-embedding relation:

\[ \mathit{InBranchOrVC}(s) = \begin{cases} \mathit{true} & \text{if } \mathit{InIntraBranch}(s) \\ \mathit{false} & \text{else if } \mathit{InMain}(s) \\ \mathit{InInterBranchOrVC}(s) & \text{otherwise} \end{cases} \tag{13} \]

where InIntraBranch(s) determines if s appears directly in a branch or not and InMain(s) tells us whether s appears directly in main() or not. InInterBranchOrVC : Stmts → boolean checks to see whether s is embedded in a branch interprocedurally or not:

\[ \mathit{InInterBranchOrVC}(s) = \bigvee_{c \in \mathit{Caller}(m)} \big( \mathit{InBranchOrVC}(c) \lor |\mathit{Callee}(c)| > 1 \big), \quad \text{where } m \text{ is the method containing } s \tag{14} \]

where Caller(m) returns the set of call sites at which m is invoked. For each call site c, there are two disjuncts. One represents a recursive application of InBranchOrVC defined in (13) to c. The other one, |Callee(c)| > 1, evaluates to true if the call site c is polymorphic.
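The mutual recursion between (13) and (14) can be sketched in Python as below. The inputs (sets of statements inside a branch or in main(), the enclosing method of each statement, the call sites of each method and the targets of each call site) are assumed to be precomputed; the guard against revisiting a method is an addition of this sketch to ensure termination on recursive call graphs, and the callee names at line 5 are hypothetical.

```python
def make_in_branch_or_vc(in_intra_branch, in_main, method_of, callers, callees):
    """Build the InBranchOrVC predicate of (13) from precomputed facts."""

    def in_branch_or_vc(s, visiting=frozenset()):
        if s in in_intra_branch:
            return True
        if s in in_main:
            return False
        return in_inter_branch_or_vc(s, visiting)

    def in_inter_branch_or_vc(s, visiting):
        m = method_of[s]
        if m in visiting:            # sketch-only guard for recursive calls
            return False
        visiting = visiting | {m}
        # Equation (14): some call site is in a branch or is polymorphic.
        return any(in_branch_or_vc(c, visiting) or len(callees.get(c, ())) > 1
                   for c in callers.get(m, ()))

    return in_branch_or_vc


# Facts loosely modelled on Figure 7; line numbers are statement labels.
in_branch_or_vc = make_in_branch_or_vc(
    in_intra_branch={20},
    in_main={3, 4, 5},
    method_of={8: "end", 11: "render", 20: "createRenderer"},
    callers={"end": [5], "render": [8], "createRenderer": [4]},
    callees={4: {"createRenderer"}, 8: {"render"},
             5: {"AbstractRenderer.end", "OtherRenderer.end"}},
)
```

With these facts, line 3 is rejected because it sits directly in main(), line 20 is accepted because it is inside a branch, and line 11 is accepted through the polymorphic call site at line 5.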

Example 5. For the program in Figure 7, as discussed in Section 3.2, SCEXT starts with SCline 25: line 25, grows it to SCline 25: line 3 → line 20 → line 11 → line 25, and finally settles with ESCline 25: line 20 → line 11 → line 25. Let us see why a SC extension point is kept or ignored. Line 3 should be ignored since InBranchOrVC(line 3) = ¬InMain(line 3) = false. Line 20 is retained since InInterBranchOrVC(line 20) = InIntraBranch(line 20) = true. Finally, line 11 is also retained because we have InInterBranchOrVC(line 11) = InInterBranchOrVC(line 8) = InInterBranchOrVC(line 5) = |Callee(line 5)| > 1 = true, where the call site at line 5 is polymorphic according to the points-to information provided.

In practice, SCDFA does not usually benefit from a SC extension point if it appears in a control-flow cycle (a loop or a recursion cycle) in GICFG. Such cycle-related points are identified and ignored as well. To detect recursion cycles effectively, we apply Tarjan’s algorithm [48] to find strongly connected components on the call graph of the program. To detect (natural) loops, we resort to a textbook loop detection algorithm. The statements reachable directly or indirectly from a loop are also considered as being part of the loop.
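A minimal sketch of the recursion-cycle check is given below: Tarjan's algorithm computes the strongly connected components of a call graph, and a method is treated as recursive if its component has more than one member or if it calls itself. The call graph and method names are hypothetical, and the recursive formulation is only suitable for small graphs (deep call graphs would need an iterative version).

```python
def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return sccs


def recursive_methods(call_graph):
    """Methods on a recursion cycle: non-trivial SCCs or self-calls."""
    recursive = set()
    for component in tarjan_scc(call_graph):
        if len(component) > 1:
            recursive |= component
        else:
            (m,) = component
            if m in call_graph.get(m, ()):
                recursive.add(m)
    return recursive


# Hypothetical call graph: a and b call each other, c calls itself.
CALL_GRAPH = {"main": ["a"], "a": ["b"], "b": ["a", "c"], "c": ["c"], "d": []}
RECURSIVE = recursive_methods(CALL_GRAPH)
```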


statement:              $s, s_1, s_2, \dots, s_m \in S$
execution:              $e : s_1 s_2 \dots s_m \in E_G$
sequence:               $sq : s_1 s_2 \dots s_m \in SC$
relation on $E_G \times SC$:  e sq
relation on $E_G \times S$:   $e \to s$
execution set:          $E_{sq} = \{e \in E_G \mid$ e sq$\}$
statement set:          $S_{sq} = \{s \in S \mid e \in E_{sq},\ e \to s\}$

Figure 8 Notations used in proofs.

4.3 Properties

We prove that Tailor is sound (Theorem 3), by showing that SCDFA and SCEXT are sound with respect to GICFG (Theorems 1 and 2), and consequently, that every tailored program is SC-executable, i.e., executable along all execution paths passing through a given SC. We make use of the notations given in Figure 8 in our proofs. $S$ is a set of statements in GICFG. $E_G$ represents all runtime executions of the program represented by GICFG and each execution $e$ consists of a sequence of statements in $S$. We write e sq if execution $e$ contains all the statements in a statement sequence $sq = s_1 s_2 \cdots s_m$ in exactly the same order, ending at $s_m$. We write $e \to s$ if execution $e$ contains statement $s$. $E_{sq}$ is the set of all executions that pass through a given sequence $sq$. Finally, $S_{sq}$ is the set of all statements that appear in all possible executions $e$ (passing through $sq$), where $e \in E_{sq}$.

The following theorem states that SCDFA is sound with respect to GICFG.

Theorem 1 (Soundness of SCDFA). Let GICFG be the ICFG of a program. Let SC be given (as defined in Section 3). If $sq \in SC$, then $s \in S_{sq} \implies \mathit{TAILORED}(s)$.

Proof. We show that for every execution $e$ such that e sq, where $sq \in SC$, TAILORED($s$), which is given in (12), holds for all statements $s$ such that $e \to s$. Let $sq = s_1 s_2 \dots s_m$. Since e sq, every statement $s_i$ must appear at least once in $e$, or more in the presence of control-flow cycles. For convenience, let $s_e^0$ be a fictitious statement at the beginning of $e$. Let $s_e^{m+1}$ be the last occurrence of $s_m$ in $e$, i.e., the last statement in $e$. Let $s_e^i$ be the first occurrence of $s_i$ in $e$ after $s_e^{i-1}$, where $1 \leq i \leq m$. We can now divide $e$ into $m+1$ sub-executions, $e_1, e_2, \cdots, e_{m+1}$, where $e_i$ represents the sub-sequence between $s_e^{i-1}$ (exclusive) and $s_e^i$ (inclusive). We will still write $e_i \to s$ if $e_i$ contains a statement $s$.

As e sq is an execution, e represents a realizable path, which must appear in GICFG. In SCDFA, its BU and TD passes are distributive over $2^D$ given in (1), performed context-sensitively. Thus, we only need to focus on this path. Note that in our formulation of SCDFA, a statement $s_i$ and its corresponding node $n_i$ in GICFG are used interchangeably.

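During the BU pass in terms of (2) – (6), we must have:

\[ \forall\, 1 \leq i \leq m : \forall\, n_i \text{ s.t. } e_i \to n_i : s_i s_{i+1} \cdots s_m \in \mathit{PANTI}_{out}(n_i) \tag{15} \]

\[ \forall\, n_{m+1} \text{ s.t. } e_{m+1} \to n_{m+1} : s_m \in \mathit{PANTI}_{out}(n_{m+1}) \tag{16} \]

which implies $sq \in \mathit{PANTI}_{out}(\mathrm{ENTRY})$ (since $\forall\, n_1$ s.t. $e_1 \to n_1 : s_1 s_2 \cdots s_m \in \mathit{PANTI}_{out}(n_1)$).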

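During the TD pass, $sq \in \mathit{PAVAIL}_{in}(\mathrm{ENTRY})$ by (7). By (8) – (11), we must have:

\[ \forall\, 1 \leq i \leq m : \forall\, n_i \text{ s.t. } e_i \to n_i : s_i s_{i+1} \cdots s_m \in \mathit{PAVAIL}_{in}(n_i) \tag{17} \]

\[ \forall\, n_{m+1} \text{ s.t. } e_{m+1} \to n_{m+1} : \epsilon \in \mathit{PAVAIL}_{in}(n_{m+1}) \tag{18} \]

(Note that (18) is needed only if $e_{m+1}$ is non-empty, which happens when $s_m$ is in a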


For a SC, we write α(GICFG, SC) to represent the set of all executions in GICFG that pass through at least one sequence in SC, i.e., $\alpha(\text{GICFG}, SC) = \bigcup_{sq \in SC} E_{sq}$. Theorem 1 can also be stated equivalently as follows: TAILORED($s$) holds for every $s \in \{s \mid e \in \alpha(\text{GICFG}, SC),\ e \to s\}$.

The following theorem states that SCEXT is sound since only infeasible paths are ignored.

Theorem 2 (Soundness of SCEXT). Let ESC be extended from a given SC by applying SCEXT in GICFG, then α(GICFG, SC) = α(GICFG, ESC).

Proof. According to the algorithm of SCEXT operating in GICFG (Section 4.2), all possible object-sensitive calling contexts for a method containing the first point F in a sequence of SC are considered as the extension points of F. According to (13) and (14), no feasible paths with respect to SC are excluded if an extension point of F is not selected. Thus, only infeasible paths with respect to SC are ignored when SC is extended into ESC this way.

Theorem 3 (Soundness of Tailor). Tailor is sound with respect to GICFG.

Proof. Follows from Theorems 1 and 2.

Theorem 4 (SC-Executability). T(SC) obtained in GICFG is SC-executable.

Proof. By Theorem 3, α(GICFG, SC) is included in the tailored program.

5 Implementation

We have implemented Tailor (http://www.cse.unsw.edu.au/~corg/tailor) in Soot, a framework for analyzing and optimizing Java programs [51]. To build the ICFG for a program, we apply Soot’s Spark pointer analysis [27]. During the ICFG construction, Soot models the effects of native methods by using abstract Java code and creates the corresponding control-flow edges. In addition, Soot considers both explicit and implicit exceptions and treats the exceptional edges as normal control-flow edges. Finally, Soot models thread creation and running as method calls by assuming that threads execute in a sequential order. How to build ICFGs to support multi-threading soundly, precisely and scalably for Java programs is a big challenge in its own right.

Tailor has two main components, SCEXT and SCDFA. When extending SCs, we make use of the points-to information provided by Spark to find the required object allocation sites object-sensitively. To perform SCDFA, we choose Heros [10] as the IFDS solver for its BU and TD data-flow problems, because it can be easily plugged into the Soot framework.

6 Evaluation

Program tailoring is designed to be sound for a program (with respect to its ICFG, as proved in Section 4), with useful precision and good scalability for large object-oriented programs. This section serves to evaluate the last two goals by answering three research questions:

RQ1: Is Tailor useful to support program debugging and understanding, in practice?

RQ2: Is Tailor useful to support program analysis, in practice?

RQ3: Is Tailor scalable for large object-oriented programs, in practice?

To address RQ1 and RQ2, we conduct two real-world case studies. In one study, we demonstrate that Tailor can assist a state-of-the-art slicing tool, a thin slicer [47] implemented in WALA [52], to simplify debugging and understanding tasks. In the other study, we demonstrate that Tailor can enable a sophisticated pointer analysis algorithm, S-2Obj [24],


Table 1 Program characteristics. For each program, the numbers are produced for both the application and library code, including only reachable classes, methods and statements by Spark [27].

Application               Description                         #Classes  #Methods  #ByteCodes  LOC
ANTLR (2.7.2)             a recognizer and parser generator   2049      13,751    261,727     90,404
Avrora (1.7.117)          an assembly program simulator       3196      17,186    276,340     92,505
Eclipse (4.5)             IDE                                 2517      16,953    305,575     106,640
Apache™ FOP (0.20.5)      a formatting-objects processor      4681      28,105    492,686     171,087
JBoss AS (4.0.2)          an application server               4039      25,634    448,163     154,290
PMD (4.0)                 a source code analyzer              4234      26,623    467,249     161,300
Apache Tomcat™ (8.0.24)   a Java Servlet container            3920      25,157    432,652     150,074

provided in a state-of-the-art pointer analysis tool for Java, Doop [14], to investigate the multi-object typestate (reflective) behavior in programs for which S-2Obj is unscalable as a whole-program analysis. In both studies, all SCs used are deduced from the results generated by state-of-the-art clients, Clara [9] and Solar [29], rather than injected manually.

To address RQ3, we perform a stress test on Tailor by using a large number of randomly generated SCs. Tailor scales well to relatively large Java programs, suggesting that program tailoring represents an attractive option as a practical tool.

Our experiments are carried out on a Xeon E5-2650 2GHz machine with 64GB RAM. We have selected seven large and diverse real-world Java programs under a large library, JDK 1.6.0_45, described in Table 1. For each program, with its bytecode representation generated by Soot [51], all the statistics are calculated by using Soot’s Spark pointer analysis [27].

6.1 RQ1: Program Debugging and Understanding

Wala [52] includes a traditional slicer [21] and a thin slicer [47] (denoted TSlicer), with industry-strength implementations. Traditional slicing does not scale to large object-oriented programs due to the key bottleneck in handling of the heap [47]. Thin slicing [47] alleviates the bottleneck by including only producer statements that affect directly the values at a program point. This unsound design can facilitate program debugging and understanding [25,50,59].

By focusing on producer statements, TSlicer is surprisingly effective, often producing small (or thin) slices quickly. In this study, we show that Tailor can make TSlicer even more effective, as highlighted in Figure 1, by exploiting the temporal order of statements in SCs to prune away more statements irrelevant to SCs. In addition, Tailor achieves this improved precision by trimming a program more efficiently than TSlicer does. These results are significant given TSlicer’s unsoundness (even for GICFG) and well-tuned implementation in Wala, demonstrating clearly the practical benefits of our SC-oriented program tailoring.

To ensure a fair comparison between Tailor (implemented in Soot) and TSlicer (implemented in Wala), we have configured Soot to minimize the differences in the ICFGs constructed for a program in both frameworks. There are four main contributing factors: (1) pointer analysis, (2) reflection analysis, (3) exception handling, and (4) native code handling. Our bottom line is to make sure that Tailor is never more precise than TSlicer in dealing with (1) – (4). For (1), we select Wala’s VanillaZeroOneCFA option to perform its allocation-site-sensitive pointer analysis, which is more precise than Soot’s Spark pointer analysis, as Spark merges some java.lang.String allocation sites for improving performance. For (2), we use Wala’s no_flow_to_casts option to resolve reflective calls. In Soot, we have taken advantage of Solar [29] to perform reflection resolution in the same way. For (3) and (4), Wala and Soot model the same native methods in JDK 1.3 and handle both explicit and implicit exceptions with some differences. However, these differences do not affect the


77 SCs in total, classified into four cases. Tailor is useful for Cases 1, 2 and 3 (82%) but not for Case 4 (18%). Sizes are the size ranges of TSlicer's thin slices before and after Tailor is applied.

Case 1 (Tailor helps filter out false alarms): 43 SCs in PMD and ANTLR; before: min 5, max 23; after: 0; precision improved: 100%.
Case 2 (Tailor helps without ESC): 7 SCs in ANTLR, Avrora, FOP and PMD; before: min 13, max 43; after: min 12, max 20; precision improved: 15% (avg), 7% (min), 56% (max).
Case 3 (Tailor helps with ESC): 13 SCs in FOP and PMD; before: min 9, max 22; after: min 5, max 13; precision improved: 46% (avg), 41% (min), 50% (max).
Case 4 (Tailor fails to help): 14 SCs in ANTLR and FOP; before: min 6, max 31; after: min 6, max 31; precision improved: 0%.

Figure 9 A case study demonstrating how Tailor enables TSlicer to remove more SC-irrelevant statements based on the SCs deduced from the results reported by a typestate analysis, Clara [9].

precision achievements obtained, validated by having inspected all results manually. Both Tailor and TSlicer trim a program based on the writer/OutputStream-related SCs deduced from the results reported by a state-of-the-art typestate analysis tool, Clara [9]. TSlicer is run context-sensitively by using the last point in each SC as its slicing criterion with the variables of interest selected automatically. However, the results in the following cases are excluded: (1) Clara crashes due to runtime exceptions in the case of Eclipse, JBoss and Tomcat, (2) TSlicer is unscalable for a criterion (within 1 hour), and (3) the size of a thin slice is less than 5 (small enough for human consumption). Finally, 77 SCs are considered in total, involving 66 errors and 11 program points (like the one in Figure 7) for our debugging and understanding tasks. All these SCs are provided in our artifact.

Results and Analysis. Figure 9 presents the final results, with all the 77 SCs classified into four cases. In each case, we give the number of SCs included, the names of contributing programs, the size ranges of TSlicer’s thin slices before and after the SC-irrelevant statements detected by Tailor are removed, as well as the minimum, maximum and average precision improvements achieved. Note that Wala’s traditional slicer is unscalable for any SC (within 1 hour). Below we first analyze each case and then make some remarks.

Case 1. There are 43 SCs distributed in two programs, ANTLR and PMD. TSlicer has produced thin slices ranging from 5 to 23 statements, requiring further human analysis efforts. In contrast, Tailor has produced only zero-sized tailored programs, declaring all the 43 errors as false alarms with respect to the given SCs. Furthermore, as Clara is sound (with respect to GICFG), all the 43 errors are false alarms for the 43 reported locations.

Let us consider a SC from PMD. Clara reports a “write after close” typestate error for a call to writer.write() at line 33 in class net.sourceforge.pmd.renderers.TextRenderer, together with four two-statement sequences of method calls leading to this potential error location: line 290 → line 325, line 290 → line 337, line 292 → line 325, and line 292 → line 337. These five calls are distributed in classes net.sourceforge.pmd.PMD and the aforementioned class TextRenderer. Given this four-sequence SC ending at line 33, Tailor recognizes that all these sequences are infeasible since line 33 is not reachable from lines 325 and 337.


Clara reports such false errors as it is partially context-sensitive and intraprocedurally but not interprocedurally flow-sensitive. This is a typical trade-off made by static analyses, which must, for example, reason about complicated typestate or protocol information as well. Otherwise, full context- and flow-sensitivity is unattainable scalably for large programs [9]. Unlike these client analyses, Tailor reasons about the (un)reachability of a statement towards a SC, making it substantially more amenable to a fully context- and flow-sensitive analysis. Tailor’s success in this case is potentially replicable for other analysis tools [9, 29, 37].

Case 2. There are 7 SCs spread across ANTLR, Avrora, FOP and PMD. In this case, SCEXT is not useful. These SCs share the same characteristic as SCline 12: line 2 → line 27 → line 12 from our motivating example in Figure 2 (Section 2). For each SC, Tailor has succeeded in removing some SC-irrelevant statements in some branches from TSlicer’s thin slices.

Case 3. There are 13 SCs found in FOP and PMD. In this case, Tailor will be ineffective unless SCEXT is turned on. We take a SC in FOP ending at line 101 in class CommandLineStarter to show how Tailor can simplify debugging tasks for object-oriented programs enormously. Given this point of interest, TSlicer returns a thin slice containing nine statements, which are distributed in five classes, including AWTStarter, PrintStarter and CommandLineStarter in package org.apache.fop.apps, where the first two are the subclasses of the last one. The first statement of this SC resides in CommandLineStarter’s run() method, which is overridden in the two subclasses. During the SC extension, the object allocation site at line 522 in the CommandLineOptions class is found to be the sole object-sensitive context for CommandLineStarter’s run() method. Therefore, with this extension point, Tailor is able to remove four irrelevant statements, line 94 in AWTStarter, line 91 in PrintStarter, and lines 494 and 508 in CommandLineOptions, from TSlicer’s thin slice, saving a human a lot of debugging effort on navigating through many irrelevant classes unnecessarily.

Case 4. There are 14 SCs found in ANTLR and FOP. Tailor fails to reduce TSlicer’s thin slices any further. By including producer statements (and ignoring the others unsoundly), TSlicer has happened to eliminate all the SC-irrelevant statements removed by Tailor.

Remarks. First, Tailor has succeeded in making 82% (56%) of TSlicer’s 77 thin slices smaller (empty), as shown in Figure 9, even though TSlicer is known to return small slices unsoundly. Second, Tailor aims to eliminate SC-irrelevant statements. As discussed in Section 2 and elaborated in Case 3, removing several or even just one SC-irrelevant statement can save a lot of debugging effort, particularly for large object-oriented programs. Finally, Tailor is fast, as compared with TSlicer in Figure 10. To make the analysis times for Tailor visible, the longest analysis times spent by TSlicer on several SCs are depicted at the top-left corner. In Case 4, Tailor is ineffective, but no harm is done, as Tailor is fast. Without Tailor, the practical benefits reaped from exploiting the temporal order in the other SCs in Cases 1 – 3 will be missed.

6.2 RQ2: Program Analysis

In this second case study, we demonstrate that Tailor can be invaluable for pointer analysis (the foundation for virtually all other analyses). In particular, we show how Tailor can enable today’s most sophisticated pointer analysis algorithms, which are unscalable for a program, to perform a more focused and thus potentially scalable analysis to its specific parts that contain usually hard-to-analyze language features such as reflection [30]. For a Java program, a pointer analysis requires a reflection analysis to resolve part of its call graph representing reflective calls, and conversely, a reflection analysis requires the points-to information from a pointer analysis to discover the reflective targets at a reflective call site.

Figure 10 Efficiency of Tailor vs. TSlicer (analysis time in seconds for each of the 77 SCs in Figure 9).
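The mutual dependence between pointer analysis and reflection analysis can be seen in a small, self-contained example. The class and method names (Greeter, Farewell, greet, callReflectively) are hypothetical; the reflection calls are the standard java.lang.Class and java.lang.reflect.Method APIs. Which method body m.invoke() reaches depends on the string that flows into Class.forName(), which is points-to information. The call graph edge out of invoke() can be added only after that string has been resolved.

```java
import java.lang.reflect.Method;

class ReflectiveCallSketch {
    // Two hypothetical classes that both declare a public greet() method.
    public static class Greeter {
        public String greet() { return "hello"; }
    }
    public static class Farewell {
        public String greet() { return "goodbye"; }
    }

    // The target of m.invoke() is not visible in the code: it is
    // determined by the value of className at run time.
    static String callReflectively(String className) {
        try {
            Class<?> c = Class.forName(className);
            Object receiver = c.getDeclaredConstructor().newInstance();
            Method m = c.getMethod("greet");
            return (String) m.invoke(receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callReflectively(Greeter.class.getName()));
        System.out.println(callReflectively(Farewell.class.getName()));
    }
}
```

An analysis that cannot tell which strings reach className must either assume that every greet() method is a target, which is imprecise, or ignore the call, which is unsound.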

Reflection analysis has many real-world applications in its own right, such as bug detection and security analysis [5, 11, 28, 31]. Despite recent advances [28, 29, 45], a sophisticated reflection analysis does not co-exist well with a sophisticated pointer analysis, since the latter is unscalable for large programs [28, 29, 31, 45]. If a scalable but imprecise points-to analysis is used instead, the reflection analysis may introduce many false call graph edges [29, 45], making its client applications too imprecise to be practically useful.

We show how Tailor can alleviate this problem. We choose Doop [14], a state-of-the-art pointer analysis framework for Java, and focus on its three pointer analyses, 1-Call, S-2Type and S-2Obj. 1-Call is usually the most scalable but the least precise. S-2Obj is the most precise but the least scalable. S-2Type enjoys both precision and scalability, although its precision is never better than that of S-2Obj [24]. For reflection analysis, we choose Solar [29], a state-of-the-art reflection analysis built on top of Doop. Our program analysis task is to investigate the multi-object typestate behavior at some reflective call sites as precisely as possible, by running Solar with S-2Obj. If S-2Obj is scalable for a program, we are done. Otherwise, we run Solar together with 1-Call, and if that is unscalable, with S-2Type on the same program. If either is scalable, we treat Solar as a client analysis for producing the required SCs. To investigate the behavior at a reflective call site P, say, m.invoke(), we let its SC_P be all possible sequences of API method calls ending at P, such as Class.forName() → c.getMethod() → m.invoke(), found by Solar. Then we run S-2Obj on the tailored program T(SC_P). If SC_P represents all possible sequences of method invocations for P, then T(SC_P) exhibits the same behavior at P as in the whole program (Theorem 3). Otherwise, a SC_P-specific behavior at P is analyzed.
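The selection strategy described above can be summarized in a short sketch. Everything in it is a hypothetical model for illustration: the Analysis enum, the scalability predicate and the step descriptions are assumptions, not an interface of Doop, Solar or Tailor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class AnalysisPlanSketch {
    // The three Doop analyses named in the text.
    enum Analysis { ONE_CALL, S_2TYPE, S_2OBJ }

    // Returns the ordered steps for analyzing a reflective call site, given a
    // (hypothetical) predicate saying whether an analysis scales on the whole program.
    static List<String> plan(Predicate<Analysis> scalesOnWholeProgram) {
        List<String> steps = new ArrayList<>();
        if (scalesOnWholeProgram.test(Analysis.S_2OBJ)) {
            // The most precise analysis already scales: no tailoring is needed.
            steps.add("Solar + S_2OBJ on whole program");
            return steps;
        }
        // Otherwise try 1-Call first, then S-2Type, to produce the SCs.
        for (Analysis a : new Analysis[] { Analysis.ONE_CALL, Analysis.S_2TYPE }) {
            if (scalesOnWholeProgram.test(a)) {
                steps.add("Solar + " + a + " on whole program to produce SCs");
                steps.add("tailor program with the SCs");
                steps.add("S_2OBJ on tailored program");
                return steps;
            }
        }
        steps.add("no scalable configuration");
        return steps;
    }

    public static void main(String[] args) {
        // Example: only S-2Type scales on the whole program.
        for (String step : plan(a -> a == Analysis.S_2TYPE)) {
            System.out.println(step);
        }
    }
}
```

The cheaper analysis is used only to discover the sequences ending at the call site, and the precise analysis then runs on the smaller tailored program.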

For the seven applications listed in Table 1, S-2Obj is unscalable only for Eclipse, Tomcat and JBoss, which are therefore the focus of our second study. Figure 11 shows the number of reflective targets resolved at all call sites to Class.newInstance()/Method.invoke() (the most commonly used in practice [28]) in the application code of Eclipse, Tomcat and JBoss, by 1-Call (the red dots) and S-2Type (the green dots, used for JBoss as 1-Call is unscalable for it).