7.4 Program comprehension and tools

7.4.4 Integrated forward and reverse engineering tools

CASE analysis and design tool vendors have formed alliances with maintenance workbench vendors in an attempt to provide integrated support for design, implementation and maintenance across the software lifecycle. Maintenance workbenches were discussed above and are not discussed further.

The focus here is on the "reverse engineering" capabilities of front-end analysis and design tools.

Reverse engineering approaches to supporting program comprehension are characterized by the parsing of source code (i.e., they are also static analysis techniques) in order to identify structural and functional elements of a system and to obtain higher-level abstractions. The information obtained via reverse engineering approaches overlaps greatly with the support offered by static analysis and maintenance workbenches. All three categories of tools produce structure charts, data flow and control diagrams, call graphs, and, less commonly, entity-relationship diagrams and inheritance diagrams.
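To make the idea of deriving a call graph by parsing concrete, the following is a minimal sketch. It assumes Python source and the standard `ast` module purely for illustration; the tools discussed in this section target languages such as COBOL and C, and the sample program and the `call_graph` helper are hypothetical.

```python
import ast
from collections import defaultdict

# Hypothetical sample program to be analyzed (never executed, only parsed).
SOURCE = '''
def load(path):
    return parse(read(path))

def parse(text):
    return text.split()

def read(path):
    return open(path).read()

def main():
    words = load("input.txt")
    print(len(words))
'''

def call_graph(source):
    """Map each top-level function to the plain names it calls directly."""
    tree = ast.parse(source)
    graph = defaultdict(set)
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        graph[func.name]  # create an entry even if the function calls nothing
        for node in ast.walk(func):
            # Only calls of the form name(...) are recorded; method calls
            # such as text.split() are ignored in this sketch.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                graph[func.name].add(node.func.id)
    return dict(graph)

GRAPH = call_graph(SOURCE)
for caller in sorted(GRAPH):
    print(caller, "->", sorted(GRAPH[caller]))
```

The result captures only what is physically present in the code, which is the kind of structural information all three categories of tools report.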

Reverse engineering approaches are differentiated from other static analysis approaches primarily by their claims of recovery of design information, which is presented as allowing the user to move backward from source code to design and forward from design to source code with relative ease.

In addition, systems can be documented through integrations with common document preparation systems. One vendor claims support from studies indicating that their approach can cut future maintenance costs by 80% [IDE 90]. Unfortunately, no references are provided (sic).

7.4.4.1 Support for program comprehension

Reverse engineering tools are adept at identifying the components of a system that are directly reflected in the source code and at identifying the physical dependencies between components. In this, they mirror the capabilities of static analysis and maintenance workbench tools.

Of course, only design information that is physically reflected in the code can be captured; much essential design information is lost in the transformation from requirements analysis to design and eventually to code (or perhaps never captured). The inability to recover information that is no longer available, or is well hidden in the details of an implementation, is a major disadvantage of reverse engineering tools.

Researchers and practitioners have identified three primary limitations of the reverse engineering approaches. These include:

inadequate support for identification of analysis/design information and abstractions that are not clearly reflected in the code, such as specifications and logical subsystems;

inadequate support for relating high level abstractions to concepts in the application domain;

inability to identify the rationale for the design decisions reached, the alternatives rejected, and the processes by which decisions were reached.

These limitations relate directly to types of information that are commonly lost during program implementation or hidden in the details of the source code. Subsequent sections discuss these limitations.

7.4.4.1.1 Identification of analysis and design level abstractions

In spite of the claims of vendors, few commercial tools provide mechanisms for identifying design abstractions, such as specifications or logical subsystems, that are not obviously reflected in the source. The general approach taken by tools is to provide information such as structure charts. The maintainer is then expected to infer specification and design abstractions from the artifacts identified in the source.

Unfortunately, a large system of 500,000 source lines averaging 200 lines per module may have 2,500 modules to be represented in structure charts. This presents an information load that can overwhelm the maintainer. It is not clear how much assistance in identifying high level design abstractions is provided in such cases.

In addition, it is not always clear that the source code is a good representation of the designer's intent or of the "logical" (as opposed to physical) structure of a system. Even in situations where intended function and subsystem design were originally evident in the source code, years of maintenance will often blur the edges of subsystems and hide design decisions and intent. In short, tools that help reflect the existing structure of software may not help in reflecting the logical structure of the system.

In an attempt to deal with this problem, [Muller, Tilley and Wong, 93] have developed the Rigi system, which combines source code parsing capabilities with support for the identification of subsystem structure. Once the code is parsed and stored in the system repository, the Rigi system applies basic software engineering principles such as module cohesion and coupling to analyze the system. A potential subsystem structure for the system is identified, which can be modified by user interaction. The Rigi system also provides software quality measures to assist in the evaluation of resulting (and potentially alternative) subsystem structures.
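The general idea of proposing subsystems from coupling data can be sketched as follows. This is not Rigi's algorithm; it is a deliberately simple stand-in that groups modules whose pairwise coupling meets a threshold and then reports one crude quality measure. The module names, coupling weights, threshold, and function names are all assumptions made for illustration.

```python
def propose_subsystems(coupling, threshold):
    """Group modules whose pairwise coupling meets the threshold (union-find)."""
    parent = {}

    def find(module):
        parent.setdefault(module, module)
        while parent[module] != module:
            parent[module] = parent[parent[module]]  # path halving
            module = parent[module]
        return module

    for (a, b), weight in coupling.items():
        root_a, root_b = find(a), find(b)
        if weight >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two candidate subsystems

    groups = {}
    for module in list(parent):
        groups.setdefault(find(module), set()).add(module)
    return sorted(sorted(group) for group in groups.values())

def inter_subsystem_coupling(coupling, subsystems):
    """Total weight of dependencies crossing subsystem boundaries (lower is better)."""
    owner = {m: i for i, group in enumerate(subsystems) for m in group}
    return sum(w for (a, b), w in coupling.items() if owner[a] != owner[b])

# Hypothetical coupling weights, e.g. counts of shared references between modules.
COUPLING = {
    ("parser", "lexer"): 9,
    ("parser", "symtab"): 7,
    ("report", "format"): 8,
    ("symtab", "report"): 1,
    ("lexer", "format"): 1,
}

SUBSYSTEMS = propose_subsystems(COUPLING, threshold=5)
CROSSING = inter_subsystem_coupling(COUPLING, SUBSYSTEMS)
print(SUBSYSTEMS, CROSSING)
```

A maintainer could rerun such a proposal with a different threshold, or move a module by hand, and compare the crossing weight of the alternatives; this mirrors the interactive refinement and evaluation described above.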

The Rigi system has been used in three case studies to reverse engineer COBOL and C applications and a commercial database system. These case studies have pointed out the need for a highly scalable and flexible reverse engineering system that supports tailorability and extensibility of the user interface and of the operations of the system.

Capabilities such as those provided by Rigi are now becoming available commercially. The ParaSET workbench performs similar subsystem recognition functions based on both control and data-based cohesion and coupling.

An alternative approach aimed at identifying high level design abstractions involves the use of program transformation systems, which modify a program into a different form that has the same external behavior. [Ward 93] discusses one such research system (the Maintainer's Assistant), which uses knowledge-based heuristics to suggest suitable transformations and allows the maintainer to selectively apply them. Transformations include eliminating "gotos" and unnecessary or redundant flags, and analyzing structures to identify and implement equivalent but "higher level" data and control structures. The "higher" level structures identified are assumed to make the program easier to comprehend by eliminating much of the detail and clutter and focusing the maintainer on the "essence" of the program. Eventually, by iterating over the current representation of the system, a language-independent abstraction of the system is identified. Conceivably, this abstraction represents the high level design of the system.

While this approach has only been tested with small segments of code, results are reported to be positive. The authors suggest that the technology has reached a phase where application to larger scale maintenance tasks is possible.

One commercially available tool that provides some capability for program transformation is the Software Refinery [Markosian, Newcomb, Brand, Burson, Kitzmiller 94]. The Software Refinery provides a number of components, including:

a parser generator in which the syntax of the source language for the reengineering effort is defined. Predefined parsers are available for COBOL, C, Ada and Fortran, but the parser generator can be used to build parsers for a wide range of other languages (including assembler languages).

an object base in which abstract syntax trees produced by the parser, and other information, are stored.

a pattern matcher for matching abstract syntax trees in the object base.

a specification language that allows the user to define (and apply) transformations to the object base.

The Software Refinery produces standard "high level" abstractions such as call and control flow graphs and data models, but additionally allows the user to define transformations on patterns that are matched in the object base. The combination of parser generator, specification language, and pattern matching and replacement supports the generation of reverse engineering tools for non-supported languages and automates the regular replacement (transformation) of source features with alternate structures. This capability can be used to support activities such as porting to new languages and databases.
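The parse, match, and replace cycle described above can be illustrated with a small sketch. It uses Python's standard `ast` module as a stand-in for an object base of abstract syntax trees and is not the Software Refinery's specification language. The legacy routine names (`isam_read`, `isam_write`) and their replacements are hypothetical, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast

class PortLegacyCalls(ast.NodeTransformer):
    """Match calls to legacy routines and replace the callee name."""

    def __init__(self, renames):
        self.renames = renames  # hypothetical legacy name -> replacement name
        self.matches = 0

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        # Pattern: a call whose callee is a plain name listed in the rename table.
        if isinstance(node.func, ast.Name) and node.func.id in self.renames:
            new_name = ast.Name(id=self.renames[node.func.id], ctx=ast.Load())
            node.func = ast.copy_location(new_name, node.func)
            self.matches += 1
        return node

def port(source, renames):
    """Parse the source, apply the rewrite rule, and regenerate source text."""
    tree = ast.parse(source)
    rule = PortLegacyCalls(renames)
    tree = ast.fix_missing_locations(rule.visit(tree))
    return ast.unparse(tree), rule.matches

BEFORE = "rec = isam_read(key)\nisam_write(key, update(rec))\n"
AFTER, MATCHES = port(BEFORE, {"isam_read": "db_fetch", "isam_write": "db_store"})
print(AFTER)
print(MATCHES, "call sites rewritten")
```

As in the tool described, choosing which patterns to rewrite is left to the user; the sketch only automates the regular replacement once a rule has been written.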

The identification of appropriate transformations is primarily manual. The tool does not support sophisticated knowledge-based heuristics for identifying "higher" level structures (beyond those produced by common reverse engineering tools). As is evident from research tools such as the Maintainer's Assistant, such heuristics remain primarily a subject of research.

7.4.4.1.2 Relating high level abstractions to domain concepts

When a requirements analyst specifies a system, s/he attempts to discover the human and often domain-oriented intent of the intended user. It is precisely this intent that is often the driving force for choosing from among a set of design alternatives. For example, the pattern of expected access to data based on the domain of the application can be a driving force in determining the appropriateness of specific data models and the efficiency of alternative database search algorithms. This information assists the system designer in making design decisions, such as determining that a specific database is appropriate for real time update by a radar device, while another is more appropriate for personnel records.

Often, little of the intent of the system is immediately evident from the source code beyond the algorithms that represent the end product of the analyst's work. Some information may be available in documentation; however, this information is notoriously inaccurate and incomplete. What a maintainer must do, then, is attempt to recapture this information, based primarily on what is often the sole trusted artifact: the system source code. The maintainer uses skills of analysis, experimentation and guesswork to discover this intent. As pieces of the programmer's intent and domain information are discovered, they must be related to their manifestation in the program source code. The problem of (re)discovering elements of human and domain concepts and associating them with program concepts and context has been termed the concept assignment problem [Biggerstaff, Midbander, and Webster 94].

[Biggerstaff, Midbander, and Webster 94] believe that parsing-oriented tools (the class includes the vast majority of static analysis, maintenance workbench, and reverse engineering tools) are good at recognition of programming concepts, but are failures at recognition of human/domain oriented concepts. In fact, they suggest that the human/domain oriented concepts do not just represent a level of abstraction higher than programming concepts; rather, human/domain concept reasoning uses domain knowledge, plausible reasoning, and ambiguous terms. In short, human/domain concepts and programming concepts exist in distinct worlds, and no formal set of rules or transformations can directly convert one to the other. Thus, parsing-based reverse engineering tools alone can provide only limited assistance in helping the maintainer understand the application.

In the DESIRE toolset, Biggerstaff et al. attempt to address the concept assignment problem with both naive and intelligent assistance in relating human/domain concepts to program concepts. The "naive" portion of the DESIRE toolset provides facilities similar to common static analysis tools that identify functions which utilize specified variables and associated call chains. This portion of DESIRE also provides a slicer that supports generation and modification of slices, as well as a cluster analyzer which selects and displays clusters of functions based on criteria input by the user.
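The kind of query attributed to the "naive" facilities, finding the functions that use a specified variable and the call chains leading to them, can be sketched as below. This is not DESIRE's implementation; it assumes Python source and the standard `ast` module for illustration, and the sample program and helper names are hypothetical.

```python
import ast

# Hypothetical program under study (parsed, never executed).
SOURCE = '''
balance = 0

def deposit(amount):
    global balance
    balance += amount

def report():
    print(balance)

def greet(name):
    print("hello", name)

def main():
    deposit(10)
    report()
    greet("ops")
'''

def functions_using(tree, variable):
    """Names of functions whose bodies read or write the given variable."""
    users = set()
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            if any(isinstance(n, ast.Name) and n.id == variable
                   for n in ast.walk(func)):
                users.add(func.name)
    return users

def callers_reaching(tree, targets):
    """Functions that reach any target through a chain of direct calls."""
    calls = {}
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            calls[func.name] = {
                n.func.id for n in ast.walk(func)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    reached = set(targets)
    changed = True
    while changed:  # propagate backwards along call edges until stable
        changed = False
        for name, callees in calls.items():
            if name not in reached and callees & reached:
                reached.add(name)
                changed = True
    return reached - set(targets)

TREE = ast.parse(SOURCE)
USERS = functions_using(TREE, "balance")
CALLERS = callers_reaching(TREE, USERS)
print(sorted(USERS), sorted(CALLERS))
```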

The "intelligent assistant" portion of the DESIRE toolset scans the source code looking for candidate domain concepts based on information stored in a domain knowledge base. Candidate domain concepts are identified based on the clustering techniques and on matches of identifiers to key words in the domain. The "intelligent" portion of the DESIRE system is a research prototype, but the more traditional "naive" portion of the system has been used with moderately large systems.
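Matching identifiers against domain key words can be illustrated with a small sketch. It covers only the key word matching, not the clustering or the knowledge base, and it is not DESIRE's method. The domain vocabulary, the identifiers, and the helper names are hypothetical.

```python
import re

# Hypothetical domain vocabulary: concept -> key words that suggest it.
DOMAIN_KEYWORDS = {
    "reservation": {"reserve", "reservation", "booking", "seat"},
    "billing": {"invoice", "payment", "charge", "fare"},
}

def split_identifier(identifier):
    """Split snake_case and camelCase identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]

def candidate_concepts(identifiers, keywords=DOMAIN_KEYWORDS):
    """Map each concept to the identifiers containing one of its key words."""
    hits = {}
    for identifier in identifiers:
        words = set(split_identifier(identifier))
        for concept, vocabulary in keywords.items():
            if words & vocabulary:
                hits.setdefault(concept, []).append(identifier)
    return {concept: sorted(found) for concept, found in hits.items()}

IDENTIFIERS = ["reserveSeat", "compute_fare_total", "seat_map", "idx", "printInvoice"]
CANDIDATES = candidate_concepts(IDENTIFIERS)
print(CANDIDATES)
```

An identifier such as `idx` yields no candidate, which illustrates why key word matching alone can only suggest concepts where programmers chose domain-oriented names.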

7.4.4.1.3 Identifying design decisions

Common reverse engineering systems are able to highlight some aspects of the existing design for a program. They fail, however, to identify the design decisions reached, the design alternatives rejected, and the process by which the decisions were reached. Yet it is precisely such design decisions that become critical to the maintainer in determining how a system performs and whether the system can be adapted to perform in some new manner.

Some rather unique work to address this tool limitation was carried out by [Rugaber, Ornburn, LeBlanc, 90]. They attempted to classify several types of design decisions in source code and to locate the artifacts (for example, identifying algorithms) left by those decisions in the code.

Additional work by Rugaber and his colleagues suggested that large scale, monolithic reverse engineering tools fail because they are not flexible enough to deal with spontaneous requests for information that arise because of the need to make rapid, unanticipated changes to source code.

To remedy this situation, they recommend rapidly assembled simple tools to provide support for identifying design decisions. They found that even relatively crude, custom-crafted tools like "throw away" scripts and parsers built with "yacc" were useful for collecting data from the source code and reducing it to answer specific questions. However, they also found that control flow analysis and slicing tools could assist in the identification of design decisions.
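A "throw away" script of the kind described might look like the sketch below, which answers one narrow question: which files use `goto`, and to which labels. It is an illustration only, not a tool from the cited work. The C fragments and file names are invented, and the regular expression is deliberately crude (it does not skip comments or string literals).

```python
import re
from collections import Counter

# Hypothetical C sources, held in memory so the sketch is self-contained.
C_SOURCES = {
    "io.c": "int rd(void){ if(err) goto fail; return 0; fail: return -1; }",
    "main.c": "int main(void){ return rd(); }",
    "net.c": "void tx(void){ retry: if(busy) goto retry; if(down) goto out; out: ; }",
}

GOTO = re.compile(r"\bgoto\s+(\w+)\s*;")

def goto_targets(sources):
    """For each file containing a goto, count the jumps to each label."""
    report = {}
    for name, text in sources.items():
        labels = Counter(GOTO.findall(text))
        if labels:
            report[name] = labels
    return report

REPORT = goto_targets(C_SOURCES)
for name in sorted(REPORT):
    print(name, dict(REPORT[name]))
```

The value of such a script lies in how quickly it can be written and discarded, not in its precision; a parser-based tool would be needed where comments, strings, or macros matter.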

In document Reengineering Center. Carnegie Mellon University. Pittsburgh, PA (Page 112-117)