3.5.1 Object identification in procedural code

In this chapter, reverse engineering of the class diagram has been presented with reference to Object Oriented programs. A substantial body of work [12, 13, 51, 75, 80, 88, 102] within the reverse engineering research community has instead aimed at identifying abstract data types in procedural code. In these approaches, classes are tentatively reverse engineered from procedural (instead of Object Oriented) code.

The purpose of the analyses considered in these works is supporting the migration from procedural to Object Oriented programming. It was recognized that this migration process cannot be fully automated and the results available in the literature provide local approaches which help in some cases, but not in others. If a software system was built around data types in the first place, it is possible to identify and extract them as objects. If not, it is hard to retrofit objects into the system and, until now, no one has come up with a general, automated solution for transforming procedural systems into Object Oriented ones. In such a case, the output of reverse engineering may be only the starting point for a highly human-intensive reengineering activity.

In [51] the main methods for class identification are classified as global-based or type-based, depending on whether functions are clustered around globally accessible objects or around formal parameter and return types. A new identification method, based on the concept of receiver parameter type, is also proposed. The approach presented in [12], which considers accesses to global variables, uses an internal connectivity index to decide which functions should be clustered around the recognized class. This method is extended in [13] to include type-based relations, and it is combined with the strong direct dominance tree to obtain a more refined result. The recovery technique described in [102] builds a graph showing the references of the procedures to the internal fields of structures. Accesses to global variables drive the recognition of classes.
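The difference between global-based and type-based clustering can be illustrated with a minimal sketch. It assumes a hand-written summary of a small procedural program; the function names, global variables, and the `Item` type are all hypothetical, and the grouping below is a simplification rather than the algorithm of any of the cited works.

```python
from collections import defaultdict

# Hypothetical summary of a small procedural program: for each function,
# the globals it accesses and the user-defined types in its signature.
FUNCTIONS = {
    "stack_push": {"globals": {"stack_data", "stack_top"}, "types": {"Item"}},
    "stack_pop": {"globals": {"stack_data", "stack_top"}, "types": {"Item"}},
    "queue_put": {"globals": {"queue_data"}, "types": {"Item"}},
    "item_print": {"globals": set(), "types": {"Item"}},
}


def cluster(functions, key):
    """Group function names around each entity listed under `key`."""
    clusters = defaultdict(set)
    for name, facts in functions.items():
        for entity in facts[key]:
            clusters[entity].add(name)
    return dict(clusters)


# Global-based: candidate classes are built around global variables.
global_based = cluster(FUNCTIONS, "globals")
# Type-based: candidate classes are built around signature types.
type_based = cluster(FUNCTIONS, "types")
```

In this toy input the global-based view separates the stack functions from the queue function, while the type-based view groups all four functions around `Item`. This suggests why the two criteria are often combined or refined with further heuristics.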

In [27] the star diagram is proposed to help programmers restructure programs by improving the encapsulation of abstract data types. Another decomposing and restructuring system is described in [58]. Both provide sophisticated means of interaction to assist the user in analyzing and restructuring a program.

Several works [50, 75, 80, 88] on identification and remodularization of abstract data types are based on the output produced by concept analysis [25]. The relation between procedures and global variables is analyzed by means of concept analysis in [50]. The resulting lattice is used to identify module candidates. Concept analysis is used in [75] to identify modules, by considering both positive and negative information about the types of the function arguments and of the return value. An example of how to identify class candidates from a C implementation of two tangled data structures is provided in [75]. Concept analysis succeeds in separating them into two distinct classes. In [88], encapsulation around dynamically allocated memory locations and module restructuring are considered. Points-to analysis is used to determine dynamic memory accesses, while concept analysis permits grouping functions around the accessed dynamic locations. Concept analysis is exploited in [80] to reengineer class hierarchies. A context describing the usage of a class hierarchy is the starting point for the construction of a concept lattice, from which redesign possibilities are derived.
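As a rough illustration of how concept analysis yields module candidates, the sketch below computes all formal concepts of a tiny context relating procedures to the global variables they access. The context is hypothetical and the brute-force enumeration is only suitable for toy inputs; it is not the procedure used in the cited works.

```python
from itertools import combinations

# Hypothetical context: procedure -> global variables it accesses.
CONTEXT = {
    "push": {"stack_data", "stack_top"},
    "pop": {"stack_data", "stack_top"},
    "enqueue": {"queue_data", "queue_len"},
    "dequeue": {"queue_data", "queue_len"},
}


def common_attributes(objects, context):
    """Attributes shared by all given objects (all attributes if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    found = set()
    names = list(context)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            found.add((frozenset(extent), frozenset(intent)))
    return found


lattice = concepts(CONTEXT)
# Module candidates: concepts with a non-empty extent and a non-empty intent.
candidates = [(e, i) for e, i in lattice if e and i]
```

Each candidate pairs a maximal set of procedures with the maximal set of variables they all access. With this input the two candidates correspond to the stack procedures and the queue procedures.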

4 Object Diagram

This chapter describes a technique to statically characterize the behavior of an object oriented system by means of diagrams which represent the class instances (objects) and their mutual relationships.

Although the class diagram is the basic view for program understanding of Object Oriented systems, it is not very informative of the behavior that a program will exhibit at run time, being focused on the static relationships among classes. On the contrary, the object diagram represents the instances of the classes and the related inter-object relationships. This program representation provides additional information with respect to the class diagram on the way classes are actually used. In fact, while the class diagram shows all possible relationships for all possible class instances, the object diagram takes into consideration the specific object allocations occurring in a program, and for each class instance it provides the specific relationships a given object has with other objects. While in the class diagram a single entity represents a class and summarizes the properties of all of its instances, in the object diagram different instances are represented as distinct diagram nodes, with their own properties. Thus, the dynamic layout of objects and inter-object relationships emerges from the object diagram, while it is only implicit in the class diagram.

A static analysis of the source code based on the flow propagation in the OFG can be exploited to reverse engineer information about the objects allocated in a program and the inter-object relationships mediated by the object attributes. The allocation points in the code are used to approximate the set of objects created by a program, while the OFG is used to determine the inter-object relationships. The resulting diagrams statically approximate any run-time object creation and inter-object relationship, in a conservative way. A second, dynamic technique that can be considered to produce the object diagram is based on the execution of the program on a set of test cases. Each test case is associated with an object diagram depicting the objects and the relationships that are instantiated when the test case is run. The diagram can be obtained by postprocessing the program traces generated during each execution.
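The static idea can be sketched in a few lines, under strong simplifying assumptions. Each allocation point stands for one object, objects are propagated along hypothetical flow edges until a fixpoint is reached, and every object that reaches a field location is linked to the objects of the class owning that field. All location, class, and object names are illustrative, and this object-insensitive simplification should not be read as the algorithm detailed later in the chapter.

```python
# Hypothetical flow edges: (x, y) means "values of x may flow into y".
EDGES = [
    ("e", "Catalog.add.entry"),               # actual-to-formal parameter
    ("Catalog.add.entry", "Catalog.entries"),  # parameter stored in a field
]
# Allocation points: location receiving the new object -> object name.
ALLOCATIONS = {"c": "Catalog1", "e": "Entry1"}
# Class of each allocated object, and the class owning each field location.
OBJECT_CLASS = {"Catalog1": "Catalog", "Entry1": "Entry"}
FIELD_OWNER = {"Catalog.entries": "Catalog"}


def propagate(edges, allocations):
    """Propagate allocated objects along the edges until nothing changes."""
    out = {loc: {obj} for loc, obj in allocations.items()}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            new = out.get(src, set()) - out.setdefault(dst, set())
            if new:
                out[dst] |= new
                changed = True
    return out


reaching = propagate(EDGES, ALLOCATIONS)

# Inter-object relationships: owner object --field--> reachable object.
links = {
    (owner, field, target)
    for field, cls in FIELD_OWNER.items()
    for owner, owner_cls in OBJECT_CLASS.items()
    if owner_cls == cls
    for target in reaching.get(field, ())
}
```

Here the only recovered relationship links `Catalog1` to `Entry1` through the `Catalog.entries` field. Because every object of a class shares the same field location, this simplification may report more links than any execution would exhibit, which is consistent with a conservative approximation.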

The static and the dynamic techniques are complementary. The static technique is safe with respect to the objects and relationships it represents, but it cannot provide precise information on the actual multiplicity of the allocated objects (e.g., in the presence of loops), nor on the actual layout of the relationships associated with the allocated objects (e.g., in the presence of infeasible paths). The dynamic view is accurate with respect to the number of instances and the relationship layout, but it is (by definition) partial, in that it holds for a single test run. Therefore, it is useful to contrast the dynamic and static views, to determine the portion of the latter that was explored with the available test suite and to refine it with information suggested by the dynamic views.
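A small simulation may help make the contrast concrete. It assumes a hypothetical program with three allocation sites, one of them inside a loop and one on a branch that the test run never takes; the site labels and the trace format are invented for illustration.

```python
from collections import Counter

# Static view: one node per allocation site (hypothetical, hand-listed here;
# a real static analysis would extract the sites from the source code).
STATIC_SITES = {"Catalog@main", "Entry@fill", "Index@rebuild"}


def run_test_case(n_entries):
    """Simulate one test run and return the trace of executed allocations."""
    trace = ["Catalog@main"]
    for _ in range(n_entries):
        trace.append("Entry@fill")  # allocation inside a loop
    # The branch allocating Index@rebuild is not taken in this run.
    return trace


# Dynamic view: exact number of objects created at each site in this run.
dynamic_counts = Counter(run_test_case(3))

# Contrast: which part of the static view did the test run explore?
explored = STATIC_SITES & set(dynamic_counts)
unexplored = STATIC_SITES - explored
```

The static view contains a single node for `Entry@fill` regardless of how many times the loop iterates, whereas the dynamic view records the exact multiplicity for this run. The unexplored set points at allocation sites that either need additional test cases or lie on infeasible paths.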

This chapter is organized as follows: after a summary presentation of the object diagram elements, given in Section 4.1, Section 4.2 describes a static method for object diagram recovery. It is a specialization of the general purpose framework defined in Chapter 2. Section 4.3 provides the details of an object sensitive OFG algorithm for the recovery of the object diagram. The dynamic technique for object diagram recovery is presented in Section 4.4. At the end of this section, static and dynamic analysis views are contrasted, highlighting advantages and disadvantages of both, and providing hints on how they can complement each other. Static and dynamic extraction of the object diagram is conducted on the eLib program in Section 4.5. Related works are discussed in Section 4.6.