Flow Propagation Algorithm - The Object Flow Graph


2.4 Flow Propagation Algorithm

The OFG represents all data flows involving objects. It is thus possible to exploit it to analyze the program’s behavior, by propagating proper information according to the same flows along which objects are possibly propagated. In the next chapters some examples of the kind of information to be propagated will be given. The type to which an object is cast is one such example. The allocation of an object at a given program point is another one. However, in general it can be assumed that some interesting piece of information, taken from a set V, is propagated along the OFG. Correspondingly, a flow propagation algorithm can be given, independent of the specific elements in V.

Fig. 2.3 shows the pseudocode of the generic flow propagation algorithm. It is a specific instance of the flow analysis framework described in [2], applied to the OFG instead of the control flow graph. Each node n of the OFG stores the incoming and outgoing flow information in the sets in[n] and out[n] respectively, which are initially empty. Moreover, each node n generates the flow information items contained in the set gen[n], and prevents the elements in the set kill[n] from being propagated further after node n. Incoming flow information is obtained from the predecessors of node n as the union of their out sets (forward propagation). For some analyses, it may be appropriate to propagate flow information following the OFG edges in reverse order (backward propagation). This is obtained by collecting the incoming information from the out sets of the successors. In other words, pseudo-statement 7 becomes:


Fig. 2.3. Pseudocode of the flow propagation algorithm (forward propagation).

7'  $in[n] = \bigcup_{s \in succ(n)} out[s]$

in case of backward propagation. Incoming flow information is transformed into outgoing information by removing the elements in the node's kill set and adding those in its gen set, that is, out[n] = gen[n] ∪ (in[n] − kill[n]). Flow information is repeatedly propagated inside the OFG until the fixpoint is reached: no incoming and no outgoing information changes, in any OFG node.
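The description above can be sketched in Python as follows. This is a minimal illustration, not the pseudocode of Fig. 2.3: the function name `propagate`, its parameters, and the three-node example graph are assumptions made for this sketch. The sets are plain Python sets, and the `forward` flag switches between collecting from predecessors and collecting from successors.

```python
from collections import defaultdict

def propagate(nodes, edges, gen, kill, forward=True):
    """Iterate in/out sets to a fixpoint over a directed graph.

    nodes: iterable of node names (every edge endpoint must be listed).
    edges: iterable of (source, target) pairs.
    gen, kill: dicts mapping a node to a set of items (missing means empty).
    """
    nodes = list(nodes)
    preds, succs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)
    # Forward propagation reads from predecessors, backward from successors.
    sources = preds if forward else succs

    in_ = {n: set() for n in nodes}
    out = {n: set() for n in nodes}
    changed = True
    while changed:                      # stop when no in/out set changes
        changed = False
        for n in nodes:
            new_in = set()
            for m in sources[n]:
                new_in |= out[m]
            new_out = gen.get(n, set()) | (new_in - kill.get(n, set()))
            if new_in != in_[n] or new_out != out[n]:
                in_[n], out[n] = new_in, new_out
                changed = True
    return in_, out

# Hypothetical three-node chain a -> b -> c.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
gen = {"a": {"x"}, "b": {"y"}}
kill = {"b": {"x"}}

in_fwd, out_fwd = propagate(nodes, edges, gen, kill, forward=True)
in_bwd, out_bwd = propagate(nodes, edges, gen, kill, forward=False)
```

In the forward run, item `x` reaches `b` but is killed there, so only `y` reaches `c`. In the backward run, `y` travels from `b` to `a` and nothing reaches `c`.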

Assuming an upper bound for the flow information propagated in the OFG, the algorithm in Fig. 2.3 is guaranteed to converge in polynomial time. The actual performance can be greatly improved by choosing a proper ordering of the nodes in the OFG. In the absence of loops, the best ordering is one consistent with the partial order induced by the graph edges (a topological order). When loops are present, a good strategy consists of propagating the flow information inside the loop before considering the nodes following the loop.
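The effect of node ordering can be illustrated with a small experiment. The sketch below is an assumption-laden simplification: kill sets are empty, the graph is a hypothetical five-node chain, and `count_passes` is a helper introduced only here. It counts how many sweeps over the node list are needed before nothing changes, once with a topological order and once with the reverse order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def count_passes(order, preds, gen):
    """Forward propagation (no kill sets); returns the number of sweeps made."""
    out = {n: set() for n in order}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for n in order:
            new_out = set(gen.get(n, set()))
            for p in preds.get(n, ()):
                new_out |= out[p]
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return passes

# Hypothetical loop-free chain n0 -> n1 -> n2 -> n3 -> n4, given as node -> predecessors.
preds = {"n1": {"n0"}, "n2": {"n1"}, "n3": {"n2"}, "n4": {"n3"}}
gen = {"n0": {"x"}}

topological = list(TopologicalSorter(preds).static_order())
passes_topological = count_passes(topological, preds, gen)
passes_reversed = count_passes(list(reversed(topological)), preds, gen)
```

With the topological order a single sweep carries the information to the end of the chain and a second sweep only confirms the fixpoint. With the reversed order each sweep advances the information by one edge, so the number of sweeps grows with the length of the chain.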

The solution produced by the algorithm in Fig. 2.3 has the property of being valid for all program executions that give rise to the data flows represented in the OFG. Since the OFG has been defined in order to take into account all statically possible data flows, the resulting solution is conservative (safe), in that no data flow can ever occur at run time which is not represented by a path in the OFG. However, in general it is impossible to decide statically if a path is feasible or not (i.e., if it can actually be executed for some input). Thus, the solution produced by the algorithm might be over-conservative, in that it may permit flow propagation along infeasible paths. Consequently, if an item of flow information is present at a node, there may be an execution of the program that actually produces it, while if it is absent, it is guaranteed that no execution can ever produce it.


2.5 Object sensitivity

According to the abstract syntax in Fig. 2.1, class attributes, method names, program locations, etc., are scoped at the class level. This means that it is possible to distinguish two locations (e.g., two class attributes) when they belong to different classes, while this cannot be done when they belong to the same class but to different class instances (objects). In other words, the OFG constructed according to the rules given in Section 2.2 is object insensitive. While this may be satisfactory for some analyses, in some cases the ability to distinguish among locations that belong to different objects might improve the analysis results substantially.

An object sensitive OFG can be built by giving all non-static program names an object scope instead of a class scope (static attributes and program locations that belong to static methods maintain the class scope). Objects can be identified statically by their allocation points. Thus, in an object sensitive OFG, non-static class attributes and methods (including their parameters and local variables) are replicated for every statically identified object. Syntactically, an object allocation point in the code is determined by statements of the kind (5) in Fig. 2.1. For each such allocation point, an object identifier is created, and all attributes and methods in the class of the allocated object are replicated for it. Replicated program locations become distinct nodes in the OFG.
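The replication of non-static locations can be sketched as follows. This is only an illustration under stated assumptions: the function `object_scoped_nodes`, the input format, and the static attribute `count` are hypothetical and not taken from the book. Object identifiers are formed by numbering the allocation points of each class, as in the identifiers Loan1 and Loan2 used later in this section.

```python
def object_scoped_nodes(members, allocations):
    """Return (object_ids, nodes) for an object sensitive OFG.

    members: class name -> list of (location, is_static) pairs, where the
             location is written relative to its class, e.g. "getDocument.return".
    allocations: class names of the allocation points, in program order.
    """
    counters = {}
    objects_of = {}      # class name -> object identifiers allocated for it
    object_ids = []
    for cls in allocations:
        counters[cls] = counters.get(cls, 0) + 1
        obj = f"{cls}{counters[cls]}"
        object_ids.append(obj)
        objects_of.setdefault(cls, []).append(obj)

    nodes = []
    for cls, locations in members.items():
        for location, is_static in locations:
            if is_static:
                nodes.append(f"{cls}.{location}")       # class scope is kept
            else:
                for obj in objects_of.get(cls, []):
                    nodes.append(f"{obj}.{location}")   # one copy per object
    return object_ids, nodes

# "count" is a hypothetical static attribute, added only to show the class-scoped case.
members = {"Loan": [("Loan.doc", False), ("getDocument.return", False), ("count", True)]}
allocations = ["User", "Document", "Loan", "Document", "Loan"]
object_ids, nodes = object_scoped_nodes(members, allocations)
```

Here the class-scoped location Loan.Loan.doc becomes the two nodes Loan1.Loan.doc and Loan2.Loan.doc, while the static location stays a single node.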

Construction of the OFG edges becomes more complicated when locations are object sensitive. For example, in the presence of method calls, sources and targets of OFG edges can be determined only if the current object (pointed to by this) and the objects pointed to by the reference variable used as invocation target are known. Chapter 4 provides the details of an algorithm to infer such information.

eLib example

Let us consider two statements, one from the method getUser (line 141) and the other from getDocument (line 144) of class Loan. Their abstract syntax, with class scoped names, is:

Assuming that two Loan objects are created in the program, their identifiers being Loan1 and Loan2, the two statements, with object scoped names, become:

The effect of object sensitivity on the accuracy of the OFG consists of a finer grain edge construction, resulting in a more precise propagation of information along the data flows. In fact, information is not mixed when propagated along different objects, in an object sensitive OFG. Let us consider the following code fragment, inside a hypothetical method main of class Main:

in addition to the body of Loan.Loan (line 136) and Loan.getDocument (line 143) represented as:

Five objects are allocated in total inside the code fragment above. We will identify them as User1, Document1, Loan1, Document2, Loan2 respectively.

Fig. 2.4. Object insensitive OFG.

Figures 2.4 and 2.5 contrast object insensitive and object sensitive OFGs for the code given above. Object flows in Fig. 2.5 capture the data flows occurring in the code fragment more accurately than those in Fig. 2.4. For example, the two variables d1 and d2 are assigned a Document object created at two distinct allocation points. While in the OFG of Fig. 2.4 the incoming edges come from the same node (Document.Document.this), in Fig. 2.5 the edge for the first object comes from node Document1.Document.this and ends at Main.main.d1, while the second edge goes from Document2.Document.this to Main.main.d2. In this way, the data flows related to these two objects are kept separate.

Similarly, the two Loan objects assigned to l1 and l2 belong to two different flows in Fig. 2.5 (bottom), while they share the same flow in Fig. 2.4. In the object sensitive OFG (Fig. 2.5), Main.main.d1 flows into Loan1.Loan.doc, due to parameter passing, while Main.main.d2 flows into Loan2.Loan.doc. These two flows are mixed in Fig. 2.4. When getDocument is called on object l1, a single location (Loan.getDocument.return) stores the return value in Fig. 2.4, combining both flows from Main.main.d1 and Main.main.d2. On the contrary, two return locations are represented in Fig. 2.5, namely Loan1.getDocument.return and Loan2.getDocument.return. Since the call is issued on l1, and this variable can reference Loan1 only, an OFG edge is created from Loan1.getDocument.return to Main.main.doc, but not from Loan2.getDocument.return.
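The difference between the two graphs can be checked with a small propagation experiment. The sketch below is not a transcription of Figs. 2.4 and 2.5: it uses only the edges named in the text, plus an assumed intermediate attribute node (`Loan.document`, `Loan1.document`, `Loan2.document`) linking the constructor parameter to the return location. The helper `propagate` is a simplified forward propagation with empty kill sets, and the propagated items are the identifiers of the allocated Document objects.

```python
def propagate(edges, gen):
    """Forward propagation with empty kill sets; returns the out set of each node."""
    nodes = {n for edge in edges for n in edge} | set(gen)
    out = {n: set(gen.get(n, ())) for n in nodes}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if not out[src] <= out[dst]:
                out[dst] |= out[src]
                changed = True
    return out

# Object insensitive graph: one node per class-scoped location.
insensitive_edges = [
    ("Document.Document.this", "Main.main.d1"),
    ("Document.Document.this", "Main.main.d2"),
    ("Main.main.d1", "Loan.Loan.doc"),
    ("Main.main.d2", "Loan.Loan.doc"),
    ("Loan.Loan.doc", "Loan.document"),             # assumed attribute node
    ("Loan.document", "Loan.getDocument.return"),   # assumed attribute node
    ("Loan.getDocument.return", "Main.main.doc"),
]
insensitive_gen = {"Document.Document.this": {"Document1", "Document2"}}

# Object sensitive graph: non-static locations replicated per object.
sensitive_edges = [
    ("Document1.Document.this", "Main.main.d1"),
    ("Document2.Document.this", "Main.main.d2"),
    ("Main.main.d1", "Loan1.Loan.doc"),
    ("Main.main.d2", "Loan2.Loan.doc"),
    ("Loan1.Loan.doc", "Loan1.document"),            # assumed attribute node
    ("Loan2.Loan.doc", "Loan2.document"),            # assumed attribute node
    ("Loan1.document", "Loan1.getDocument.return"),
    ("Loan2.document", "Loan2.getDocument.return"),
    ("Loan1.getDocument.return", "Main.main.doc"),   # call issued on l1 only
]
sensitive_gen = {
    "Document1.Document.this": {"Document1"},
    "Document2.Document.this": {"Document2"},
}

insensitive = propagate(insensitive_edges, insensitive_gen)
sensitive = propagate(sensitive_edges, sensitive_gen)
```

Under these assumptions, Main.main.doc receives both Document identifiers in the object insensitive graph, but only Document1 in the object sensitive one.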

The potential advantages of an object sensitive OFG construction are apparent from the example above. In practice, the actual benefits depend on the purposes for which the successive analysis is conducted.

The main difficulty in object sensitive OFG construction is the static estimation of the objects referenced by variables. This information is necessary whenever an attribute is accessed or a method is invoked through a reference variable. In fact, the related edges connect locations scoped by the pointed-to objects. In the example above, Loan1.getDocument.return (but not Loan2.getDocument.return) is connected to Main.main.doc, because l1 references Loan1 (but not Loan2).

In order to construct an object sensitive OFG, the information about the objects possibly referenced by program variables can be obtained by defining a flow propagation on the OFG aimed at statically estimating the referenced objects. This is the topic of Chapter 4. However, the algorithm used for this purpose assumes the availability of the OFG itself. Thus, we have a mutual dependence. It can be resolved by constructing the OFG edges incrementally. By contrast, all OFG nodes can be constructed from the very beginning. Initially, all allocation points are associated with object identifiers, used to scope the names of non-static program locations. This produces the set of all OFG nodes. As regards edges, only internal edges can be built at this stage, that is, edges involving constructor/method parameters or local variables, which are replicated for every object scope (boxes in Fig. 2.5).

Fig. 2.5. Object sensitive OFG. Dashed (resp. solid) boxes indicate a method body replicated for each allocated object.

Invocation of methods and access to class attributes require knowledge about the objects referenced by variables and by the special location this. Such information is approximated by a first round of flow propagation. At the end of the propagation, edges can be added to the OFG for method calls and attribute accesses, using the objects pointed to by the related variables, as determined by the flow propagation. On the new version of the OFG obtained in this way, including the edges produced by the result of the previous flow propagation, a better estimate of the objects pointed to by variables can be obtained. Refinement of the OFG can continue, until a stable one is produced (it should be noted that the incremental construction is monotone, in that edges are possibly added, but never removed).
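The refinement loop can be sketched as below. This is a rough illustration and not the algorithm of Chapter 4: the helper names, the representation of a call as a (receiver variable, method name, result variable) triple, and the allocation edges from Loan1.Loan.this and Loan2.Loan.this (written by analogy with the Document edges described earlier) are all assumptions. Only return-value edges are resolved, to keep the example short.

```python
def propagate(edges, gen):
    """Forward propagation with empty kill sets; returns the out set of each node."""
    nodes = {n for edge in edges for n in edge} | set(gen)
    out = {n: set(gen.get(n, ())) for n in nodes}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            if not out[src] <= out[dst]:
                out[dst] |= out[src]
                changed = True
    return out

def build_edges(internal_edges, gen, calls):
    """Alternate propagation and edge creation until no new edge appears.

    calls: list of (receiver_variable, method_name, result_variable) triples.
    Returns the final edge set and the number of propagation rounds.
    """
    edges = set(internal_edges)
    rounds = 0
    while True:
        rounds += 1
        out = propagate(edges, gen)          # estimate referenced objects
        new_edges = set()
        for receiver, method, result in calls:
            for obj in out.get(receiver, ()):
                new_edges.add((f"{obj}.{method}.return", result))
        if new_edges <= edges:               # stable: nothing to add
            return edges, rounds
        edges |= new_edges                   # monotone: edges are only added

internal_edges = {
    ("Loan1.Loan.this", "Main.main.l1"),     # assumed allocation edges
    ("Loan2.Loan.this", "Main.main.l2"),
}
gen = {"Loan1.Loan.this": {"Loan1"}, "Loan2.Loan.this": {"Loan2"}}
calls = [("Main.main.l1", "getDocument", "Main.main.doc")]

edges, rounds = build_edges(internal_edges, gen, calls)
```

In this small case the first round discovers that l1 references Loan1 and adds the edge from Loan1.getDocument.return to Main.main.doc. The second round finds nothing new, so the graph is stable.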

Complete construction of an object sensitive OFG is possible only if the whole program is available (including the main), since all allocation points of all involved objects must be part of the code under analysis. In Object-Oriented programming this may not be the case, since incomplete systems are often produced and classes are often reused in different contexts. In these situations, an object insensitive OFG construction may be more appropriate.

In document Reverse Engineering of Object Oriented Code pdf (Page 45-51)