The Elimination of Overheads due to Type Annotations and the Identification of Candidate Refactorings

(1)

CARLOS, COHAN SUJAY The Elimination of Overheads due to Type Annotations and the Identification of Candidate Refactorings. (Under the direction of Associate Professor S. Purushothaman Iyer).

Refactorings are meaning-preserving transformations of object-oriented programs carried out with the aim of improving their design. Sometimes refactorings accomplish just cosmetic improvements. At other times, they make programs easier to modify. Many systems for refactoring programs have been described in the literature.

The past few years have seen a plethora of object-oriented languages that enforce strict static type-checking and use explicit type declarations (type annotations). Type annotations add to the expense of modification. Our first result is a categorisation of refactorings by their effect on type annotations.

In the second part of this thesis, we study a tool that makes it easier to change programs with type annotations. The tool uses type inference to automate the management of type annotations.

(2)

by

Cohan Sujay Carlos

A thesis submitted to the Graduate Faculty of North Carolina State University

in partial satisfaction of the requirements for the Degree of

Master of Science

Department of Computer Science

Raleigh 2002

Approved By:

Dr. S. Purushothaman Iyer Chair of Advisory Committee

(3)

(4)

Biography

(5)

Acknowledgements

I am indebted to my advisor, Dr. Purushothaman Iyer, for the advice, guidance and support, the many insights and just for keeping me focussed, to Dr. Edward Gehringer who taught me to program in Smalltalk and gave me the opportunity to see it used in practice, and to Dr. Laurie Williams, who introduced me to refactoring and encouraged my efforts in so many ways.

Many thanks to my good friend, Prashant Iyer, author of the elegant Decaf com-piler that I used in my research. I am also grateful to my teachers, associates and friends and the staff of the Department of Computer Science, North Carolina State University, Raleigh, for their delightful company, friendliness and understanding.

(6)

List of Figures

1.1 Type Annotation Examples . . . 2

3.1 Extracting a Method . . . 13

3.2 TAI Example . . . 18

3.3 Replace-Parameter-With-Method . . . 20

3.4 Change-Parameter-Into-Member-Variable . . . 22

3.5 Replace-Method-With-Method-Object . . . 23

3.6 Introduce-Parameter-Object . . . 24

3.7 Separate-Query-From-Modifier . . . 24

4.1 Overheads from Type Annotations . . . 28

4.2 Sample Code . . . 32

4.3 Sample Type Constraint Graph . . . 32

4.4 A Sample Decaf Program . . . 34

4.5 A Sample Set of Inputs . . . 35

4.6 Stage One Output Snippet . . . 37

4.7 Stage Two Output Snippet . . . 38

4.8 Sample Decaf Program after the Change . . . 39

5.1 Trap Inviting a Split-Temporary-Variable . . . 45

5.2 The Effect of Split-Temporary-Variable . . . 46

5.3 Before i-Pull-Up-Method . . . 47

5.4 After i-Pull-Up-Method . . . 48

5.5 Before Extract-Superclass . . . 48

5.6 After Extract-Superclass . . . 49

5.7 Before Split-Method . . . 49

(9)

List of Tables

3.1 Opdyke’s Primitive Refactorings . . . 14

3.2 Primitives Refactorings that Affect the Number of Type Annotations . . . 15

3.3 Type Annotation Increasing Refactorings . . . 17

3.4 Type Annotation Reducing Refactorings . . . 18

3.5 Type Annotation Neutral Refactorings . . . 18

3.6 Left-Over Refactorings . . . 19

3.7 Composition of TAI Refactorings made up of TAI refactorings . . . 20

4.1 Palsberg and Schwartzbach’s Local Constraint Rules. . . 30

(10)

Chapter 1

Introduction

This thesis studies how type systems affect software modification. In particular, we investigate the use of type inference to make software modification easier, and to improve programming speeds. The two goals of the thesis are:

1. The elimination of programming inefficiencies resulting from type annotations.

2. The identification of candidate refactorings.

1.1 Definitions

We commence by defining a few terms which are used throughout the thesis. The terms areType,Subtype,Type Annotation,Refactorings,Type CheckingandType Inference.

1.1.1 Types and Subtypes

(11)

int i ;

float foo() ;

Figure 1.1: Type Annotation Examples

1.1.2 Type Annotation

A type annotation is that part of an explicit declaration which specifies the type of the declared variable, in a statically typed language like Java and C++. An example of a type annotation is the type specifier intin Figure 1.1. The type identifier floatdenoting the return type of the functionfoo(), is also a type annotation.

1.1.3 Refactorings

Refactorings have been described as “program restructuring operations (refactor-ings) that support the design, evolution and reuse of object-oriented frameworks” [25]. They play an important role in the development of software because they force a distinction be-tween structural changes and changes in functionality.

1.1.4 Type Inference

Type inference is the assignment of types to identifiers, expressions and methods whose types have not been explicitly specified.

1.1.5 Type-Checking

The verification problem corresponding to type inference is type checking. Type checking guarantees that a program is well-typed. Two types of type-checking, static and dynamic, are used widely in object-oriented languages. Smalltalk [13] employs dynamic type-checking, which occurs at run-time. Languages like C++ [32, 21] and Java [1, 14] employ static type-checking. Static type-checking occurs at compile time and can verify the correctness of a program in some respects.

(12)

1. There is a certain amount of program verification

2. Type annotations on variables and methods are a form of documentation

3. Type information can be used in optimisation algorithms.

The advantages of dynamic type-checking are as follows [11]:

1. There is more flexibility

2. Abstractions tend to be more general

3. The syntax and semantics of the language are simpler

1.2 Preliminary Study

I had the impression, when programming in a certain dynamically typed language that did not need type annotations, that it was much easier and faster to write programs in it than in a statically typed language with a similar philosophy that I had used very often before. Conversations with programmers who had considerable experience with both those languages indicated that the impression was shared by many. This convinced me that it made sense to delve deeper into the problem, and motivated me to try and identify ways in which writing programs and performing modifications in dynamically typed languages could be faster than in statically typed and type annotated ones.

The most obvious difference between a dynamically typed language like Smalltalk and a statically typed language like Java, neglecting their syntax, seemed to be their type systems. In an effort to understand how these differences affected the ease of modification of programs, it was decided to undertake a study of refactorings, which are a form of behaviour-preserving software modification.

In order to see if refactorings were in any way affected by differences in type systems, we studied properties of refactorings culled from a catalogue [12]. The aim of this preliminary study was also to further our understanding of the process of software modification, and to explore the use of type inference in helping with it.

(13)

of interest from the point of view of the goals of the thesis, but we discovered while dividing refactorings into smaller constituent refactorings, that type annotations were the cause of certain modification overheads in statically typed programming languages, and that the overheads could be eliminated by type inference.

1.3 Problems

1.3.1 Type Annotation Management

While Smalltalk has been a very important OO language, all of the more recently designed, and widely used languages, such as C++ and Java, have static type systems and use type-annotations. Type annotations are comments for the compiler indicating the type of an identifier or function. In languages like C++, they take the form of type specifiers in declarations. In Java and C++, they also have a high degree of redundancy. In other words, they are required in places where the type could easily have been inferred.

In the absence of automated support, type annotations force programmers to man-ually propagate changes from one annotated point to all other annotated points affected by a modification. The time taken to perform a set of changes on a program can be reduced by automating the satisfaction of type annotation dependencies. (The disadvantages of type annotations are described in more detail in Chapter 4).

No tools currently exist to automate the satisfaction of type annotation depen-dencies in a program (that is, to manage type annotations). Any such tool would be useful in Integrated Development Environments (IDEs) or reengineering environments like MOOSE [7], to speed up programming in statically typed languages.

1.3.2 Identification of Candidate Refactorings

(14)

which refactorings to apply, and concludes that therefore refactorings cannot be completely automated.

There are many popular tools that automate the performance of a refactoring once it is selected by a user. But, there are only a few tools (described in Chapter 2) that recom-mend refactorings. The difficulty with recomrecom-mending refactorings is that design soundness is very hard to qualify and quantify. Besides, “there is no theory (i.e. systematically orga-nized explanation) of how people refactor object-oriented application frameworks and the kinds of refactorings they make.” [25]

An attempt to automate the identification of refactorings was made by Kataoka et al [18] in 2001. Their tool uses invariants discovered by profiling a program to identify candidates from a certain small pool of refactorings. We follow a different approach to identify candidates from a different but again fairly small set of refactorings in real-time, as described in Chapter 5.

1.4 Proposed Solution

The thesis hypotheses are:

1. Control-flow insensitive type inference is useful in eliminating overheads due to type annotations.

2. Certain refactorings can be identified when an error occurs because of type inconsis-tencies, using information associated with the type constraint graph.

Type inference was found to enable interesting solutions to both the problems described in the previous section. It is known to be useful for verifying type correctness, for compiler optimisations and for creating documentation for users [27, 26]. Control-flow insensitive type inference is not very useful in type annotated languages, since type correctness can be verified in a much simpler manner. It does, however, have benefits. Type inference (even control-flow insensitive type inference) provides a solution to the problem of programming inefficiencies caused by type annotation redundancies. Type inference can also be used to identify candidate refactorings.

(15)

recom-mend refactorings. The tool uses type-inference algorithms described by Palsberg and Schwartzbach [27, 26]. An analysis of refactorings reveals that some of them induce a re-laxation of constraints at some point in a type constraint graph. This property is used to identify candidate refactorings while a program is being modified.

1.5 Contributions

We have thus far described the problems we have attempted to solve in this thesis, the context of those problems, and the proposed solutions. Our principal contributions are:

1. The development of a tool to remove inefficiencies associated with programming in statically typed, type annotated languages. The tool, which we shall call the type an-notation management tool, uses control-flow insensitive type inference to assist in the modification of programs. It makes the modification of programs easier by automat-ing the satisfaction of type annotation dependencies in a program. The tool works with a subset of the Decaf [8] language which resembles Java. We have not performed user studies, but the benefits can be easily recognized in terms of a reduction in the number of alterations that have to be made when a type annotation is changed.

2. The development of a new approach to identifying candidate refactorings. We have developed a new method to identify candidate refactorings. Our approach detects refactorings that can help resolve type inconsistencies in the course of modification of a program. These capabilities have not been integrated into the type annotation management tool as yet, so this thesis will primarily deal with the theory.

1.6 Thesis Outline

This thesis is structured as shown below:

Chapter 1 introduces the aims of the thesis, describes preliminary work and states the problem.

(16)

Chapter 3 describes a categorisation of refactorings based on the way they affect type annotations.

Chapter 4 details the theory of type inference and how it can be adapted to our needs. The chapter also contains a description of our type annotation management tool, and the algorithm it uses.

Chapter 5 talks about an approach to identifying candidate refactorings.

(17)

Chapter 2

Related Work

2.1 Refactoring

Refactorings are incremental, systematic reorganisations of software, usually with a view to making it more readable and easier to modify. A definition of refactorings is as follows [30].

Refactorings are changes whose purpose is to make a program more reusable and easier to understand, rather than to add behavior. Refactorings are specified as parameterized program transformations along with a set of preconditions that guarantee behavior preservation if satisfied.

The term ‘refactoring’ seems to have originated from a statement of Peter Deutsch’s that contains the phrase ‘functional factoring’ [31]. Refactoring gained popularity with members of the Smalltalk community, and became a central feature of the agile process called Extreme Programming (XP) [2]. It was in connection with object-oriented languages, especially Smalltalk, that a lot of early work on developing refactoring as an activity sepa-rate from other software development activities, was done. Catalogues of refactorings were first developed for object-oriented languages [12], in connection with literature on Extreme Programming (XP) [2]. Recently, however, there have been attempts to create catalogues of refactorings for functional programs [33].

(18)

structure of software, ultimately making maintenence more costly. He also suggests that the introduction of errors in the course of restructuring could be prevented by separating structural manipulations from other maintenance activities, and holding the semantics of a system constant using a tool [17].

2.2 Refactoring Tools

Griswold, and later Opdyke and Roberts, preferred the tools approach to solving the problems of structural maintenance. Griswold draws his conclusions about the prof-itability of tools usage from the observation that tools reduce errors by preserving meaning. Griswold experimented with restructuring in a modular imperative programming language, Scheme, and he says,

Automating the meaning-preserving activities of restructuring through transfor-mation involves the manual process of restructuring. In particular, the automa-tion not only prevents the introducautoma-tion of errors during restructuring, but also allows locally specifying structural changes.

Opdyke’s work defines a set of basic program restructuring operations and three complex ones. He shows how behaviour-preserving transformations could be automated, with some knowledge of their preconditions, and defines design constraints (class invariants and exclusive components) needed in refactoring.

Dan Roberts, in his PhD thesis [31], uses postconditions to make transformation algorithms more efficient, and presents a refactoring tool which uses postconditions to reduce computation in chains of refactorings. His work formed the basis of the Smalltalk refactoring browser.

2.3 Programming and Design Enhancement Tools

(19)

good design, and use that as a yardstick for determining the choice of changes. The third tool attempts to identify candidate refactorings.

2.3.1 Demeter System

While the purpose of restructuring is to make software easier to modify and en-hance, the purpose of refactoring tools is to reduce the difficulty of restructurings by elim-inating transformation errors. However, there have been other approaches to making soft-ware easier to modify and enhance. Some of these approaches blur the distinction between tools and languages. One of them is the Demeter system. The Demeter tools, as we shall describe below, allow a user to program at a higher level of abstraction and are capable of restructuring a program to make it conform to rules known as the Laws of Demeter [20].

The objective of the Demeter project is ‘to improve the productivity of software developers by an order of magnitude through tools and a theory of adaptive program-ming.’ [19] Demeter tools ‘automate common programming practices’ [20] in order to im-prove programming efficiency. Demeter also uses a more expressive class notation than most object-oriented programming languages, thus blurring the distinction between the language and language tools.

The first Law of Demeter (LoD) is described as a programming language indepen-dent rule which encodes ideas of good design [20]. It is provable that any object-oriented program can be transformed to satisfy the law. The law says that in a given method, messages may be sent to only a restricted set of objects, and these include the method arguments, the object on which the method was called, and the immediate subparts of that object. The LoD is closely related to the Demeter SystemTMdeveloped by the Demeter Re-search Group at Northeastern University. The results claimed for the LoD are very similar to those claimed for refactoring, i.e., the promotion of maintainability and comprehensibility of software.

2.3.2 Guru

(20)

With Guru, the user need not specify what refactorings to perform and where. In spite of running without any additional information (it actually loses information about the inheritance hierarchy), it succeeds in performing behaviour-preserving transformations on Self programs.

Guru takes a set of objects as input, discarding any existing inheritance informa-tion, and creates a hierarchy based on the methods present in the objects, and their fields. The system has the ability to simultaneously refactor methods and inheritance hierarchies. The rule-of-thumb used is that a design is better if it ‘maximizes sharing and minimizes duplication of the features (mostly methods) of objects and concrete classes’ [23].

2.3.3 Kataoka’s Tool

In their 2001 paper [18], Kataoka, Griswold, Ernst and Notkin describe a tool for identifying candidate refactorings. Aimed at encouraging the application of refactorings by ‘reducing the cost of detecting candidates for refactoring and of choosing the appropriate refactoring transformation’ [18], their approach uses program invariants computed by an-other tool called Daikon [9], as a means of detection of conditions suitable for the initiation of a refactoring.

In the first step of a two-step process, Daikon [9] performs dynamic invariant detection by appropriately instrumenting the target program for tracing variables of interest, running the program over a suitable test suite, and inferring invariants over the values thus obtained. In the second step, a program and the invariants in the program are taken as inputs, and candidate refactorings are identified.

(21)

Chapter 3

Categorisation of Refactorings

In this chapter, we will analyse a catalogue of refactorings with respect to their behaviour in the presence of type annotations. The refactorings studied are drawn from the collection published by Fowler et al [12], and the primitive refactorings suggested by Opdyke [25, 31].

Our motivation for undertaking this preliminary study is a) to understand how refactorings work and b) to understand the role that types play in refactoring

It was decided to categorize refactorings according to the way they affected type annotation counts. The choice was motivated by a desire to understand the way refactorings affected type annotations, and through that, hopefully the way software modification was affected by type systems.

(22)

int foo(int b) {

return l + b * 2 ; }

i ---int twice(---int b) {

return b*2 ; }

int foo(int b) {

return l + twice(b) ; }

ii

Figure 3.1: Extracting a Method

3.1 Types of Refactorings within a Category

The notion of primitive, intermediate and large refactorings is examined in more detail below.

3.1.1 Primitive Refactorings

The primitive refactorings are six in number, as shown below, and are defined as refactorings that do not use any other refactorings. They are from Opdyke’s list, and were picked for their effect on type annotations. The six refactorings split neatly into two groups of three each, where each group is the inverse of the other. The first three refactorings of Table 3.2 constitute one of the groups and the last three refactorings constitute the other. All primitive refactorings are assigned the prefix ‘p-’.

3.1.2 Intermediate Refactorings

(23)

i-Extract-Methodis one since it only uses the primitive refactoringp-Create-Method. Replace-Parameter-With-Method only uses primitive refactorings, but it is not used by any other refactoring, and therefore is not an intermediate refactoring, but a large refactoring. All intermediate refactorings are assigned the prefix ‘i-’.

3.1.3 Large Refactorings

Large refactorings are those that do not fall in either of the two categories. They are not assigned a prefix, and comprise the largest group. Examples are Decompose-Conditional-Expressionand Consolidate-Conditional-Expression.

3.2 Methodology

Creating an empty class Creating a member variable Creating a member function Deleting an unreferenced class Deleting an unreferenced variable Deleting a set of member functions Changing a class name

Changing a variable name

Changing a member function name

Changing the type of a set of variables and functions Changing access contol code

Adding a function argument Deleting a function argument Reordering function arguments Adding a function body

Deleting a function body

Convert an instance variable to a variable that points to an instance Convert variable references to function calls

Replacing statement list with function call Inlining a function call

Change the superclass of a class

Moving a member variable to a superclass Moving a member variable to a subclass

(24)

p-Create-Variable Creating a member variable p-Create-Method Creating a member function p-Add-Parameter Adding a function argument p-Remove-Variable Deleting an unreferenced variable p-Remove-Method Deleting a set of member functions p-Remove-Parameter Deleting a function argument

Table 3.2: Primitives Refactorings that Affect the Number of Type Annotations

Opdyke identified a set of twenty-three primitive refactorings and defined a few large refactorings based on the primitives. The primitive refactorings [25, 31] are listed in Table 3.1.

From among those primitives, a set of six primitives was found to affect the number of type annotations. The six are listed in Table 3.2 alongside the primitive refactorings from Table 3.1 that they were derived from.

The study proceeded in three stages.

1. In the first stage of the study, the refactorings in the catalogue of refactorings by Fowler et al [12] were examined for their effect on type annotations. According to how they changed the number of type annotations, they were placed in one of three categories: Type Annotation Increasing (TAR), Type Annotation Reducing (TAR) and Type Annotation Neutral (TAN).

2. In the second stage, the properties of each group were studied. Each refactoring in a group was decomposed into the biggest refactorings that it was made up of.

3. In the third stage, common properties of all refactorings in each group were identified.

The following example illustrates our comparative notion of ‘bigger’ as applied to refactorings.

(25)

3.3 Results

Of the seventy-eight refactorings taken from the catalogue of Fowler et al [12], and Opdyke’s PhD thesis [25], thirty-six were found to increase type annotations (TAI). A twenty-four were found to decrease type annotations (TAR). Of the remainder, six were found not to affect the number of type annotations (TAN) though they did affect type annotations. The remaining twelve were not included in the analysis for various reasons listed below.

1. Refactorings that were not likely to affect type annotations at all, like the refactor-ing to change the name of a method, were not taken into account. This eliminated the refactoringsConsilidate-Duplicate-Conditional-Fragments,Introduce-Assertion, Rename-Methodand Hide-Method.

2. Refactorings that used programming constructs not easily expressed in Decaf, Change-Value-To-Reference,Change-Reference-To-Value,Replace-Error-Code-With-Exception and Replace-Exception-With-Testwere not included.

3. Refactorings that were not easy to express in terms of components built up from the Opdyke primitives were also not taken into account, and these were Substitute Al-gorithm, Replace-Nested-Conditional-With-Guard-Clauses, Remove-Control-Flag and Duplicate-Observed-Data.

The refactorings in each category are listed in Tables 3.3, 3.4 and 3.5.

The left-over refactorings are listed in Table 3.6. Below, we study the individual refactorings and the rationale behind the choice of categories.

3.3.1 Type Annotation Increasing

TAI refactorings commence with the three primitive refactorings,p-Create-Variable, p-Create-Methodand p-Add-Parameter.

(26)

p-Create-Variable p-Create-Method p-Add-Parameter i-Push-down-Field i-Push-down-Method i-Extract-Method i-Create-Accessor Extract-Subclass Extract-Interface Pull-Up-Constructor-Body Encapsulate-Downcast Form-Template-Method Decompose-Conditional-Expression Consolidate-Conditional-Expression Replace-Conditional-With-Polymorphism Introduce-Null-Object Replace-Parameter-With-Method Replace-Temp-With-Query Split-Temporary-Variable Introduce-Explaining-Variable Remove-Assignments-To-Parameters Introduce-Foreign-Method Introduce-Local-Extension Self-Encapsulate-Field Replace-Data-Value-With-Object Replace-Array-With-Object Change-Unidirectional-Association-To-Bidirectional Encapsulate Field Encapsulate-Collection Replace-Record-With-Data-Class Replace-Type-Code-With-Class Replace-Type-Code-With-State-Strategy Replace-Magic-Number-With-Symbolic-Constant Extract-Class Hide-Delegate Replace-Type-Code-With-Subclass

(27)

p-Remove-Variable p-Remove-Method p-Remove-Parameter i-Pull-up-Field i-Pull-up-Method i-Inline-Method i-Remove-Parameter-Use-Member-Variable i-Remove-Accessor i-Preserve-Whole-Object Extract-Superclass Collapse-Hierarchy Parameterize-Method Replace-Parameter-With-Method Introduce-Parameter-Object Remove-Setting-Method Inline-Temp Change-Parameter-Into-Member-Variable Replace-Method-With-Method-Object Move-Method Move-Field Change-Bidirectional-Association-To-Unidirectional Inline-Class Remove-Middleman Replace-Subclass-With-Fields

Table 3.4: Type Annotation Reducing Refactorings

p-Create-Class p-Remove-Class i-Move-Method i-Move-Field Separate-Query-From-Modifier Replace-Constructor-With-Factory-Method

Table 3.5: Type Annotation Neutral Refactorings

// Examples of

int i ; // p-Create-Variable

float foo(float f) ; // p-Create-Method

float foo(float f, float ff) ; // p-Add-Parameter

(28)

Consolidate-Duplicate-Conditional-Fragments Remove-Control-Flag

Replace-Nested-Conditional-With-Guard-Clauses Introduce-Assertion

Rename-Method Hide-Method

Substitute-Algorithm

Replace-Error-Code-With-Exception Replace-Exception-With-Test Duplicate-Observed-Data Change-Value-To-Reference Change-Reference-To-Value

Table 3.6: Left-Over Refactorings

of type annotations by the number of parameters plus one for the return type. Adding a parameter to an existing method increases the number of type annotations by one.

Some TAI refactorings can be shown to be type increasing because they are made up only of other TAI refactorings. Such refactorings are listed in Table 3.7 There are five other refactorings in TAI category that present a tougher problem. They are:

1. i-Push-down-Fieldwhich usesp-Create-Variableandp-Remove-Variable. This is TAN when the number of variables created equals the number of variables removed, which occurs when there is only one subclass. If there is more than one subclass to which the variable is pushed down, the refactoring is TAI.

2. i-Push-down-Method which uses p-Create-Methodand p-Remove-Method. This works just like the previous refactoring.

3. Replace-Parameter-With-Method, which is made up ofp-Create-Methodand p-Remove-Parameter. The latter being a TAR refactoring, it is less easy to see how this refac-toring is TAI.

Consider the example in Figure 3.3

(29)

i-Extract-Method p-Create-Method

i-Create-Accessor p-Create-Method

Extract-Subclass p-Cr-Class, i-Push-dn-Fld, i-Push-dn-Meth Extract-Interface p-Create-Class and p-Create-Method Encapsulate-Downcast i-Extract-Method

Form-Template-Method i-Extract-Method Pull-Up-Constructor-Body i-Extract-Method Decompose-Conditional-Expression i-Extract-Method Consolidate-Conditional-Expression i-Extract-Method Replace-Conditional-With-Polymorphism i-Extract-Method Introduce-Null-Object Extract-Subclass Split-Temporary-Variable p-Create-Variable Introduce-Explaining-Variable p-Create-Variable Remove-Assignments-To-Parameters p-Create-Variable Introduce-Foreign-Method p-Create-Method Introduce-Local-Extension p-Create-Method Self-Encapsulate-Field i-Create-Accessor

Replace-Data-Value-With-Object p-Create-Class and i-Create-Accessor Replace-Array-With-Object p-Create-Class and i-Create-Accessor Change-Unidirectional-Assoc-To-Bidir i-Create-Accessor

Encapsulate Field i-Create-Accessor

Encapsulate-Collection p-Create-Class and i-Create-Accessor Replace-Record-With-Data-Class p-Create-Class and i-Create-Accessor Replace-Type-Code-With-Class p-Create-Class and i-Create-Accessor Replace-Magic-No-With-Symbolic-Const p-Create-Variable

Extract-Class p-Create-Class and i-Create-Accessor

Hide-Delegate p-Create-Method

Table 3.7: Composition of TAI Refactorings made up of TAI refactorings

// Before

boolean setFlag(boolean b) { return b_ = b ; }

// After

boolean setFlagTrue() { return b_ = true ; } boolean setFlagFalse() { return b_ = false ; }

(30)

happens).

4. Replace-Temp-With-Querywhich usesi-Extract-Methodandp-Remove-Variable. This results in the removal of a temporary variable and the addition of a method. Since a method addition increases the number of annotations by at least one, (owing to the return type that needs to be specified even though the message takes no parame-ters), this refactoring is TAN in one special case, where the method created takes no parameters. In all other cases, the refactoring is TAI.

5. Replace-Type-Code-With-Subclasswhich usesp-Create-Class,p-Create-Methodand p-Remove-Variable Again, as above, the refactoring can be TAN or TAI, since the components other than the type annotation neutral p-Create-Class are identical to those in the previous refactoring.

3.3.2 Type Annotation Reducing

TAR refactorings stem from three primitives,p-Remove-Variable,p-Remove-Method andp-Remove-Parameter. They are all TAR because they are the inverses of the three TAI primitive refactorings. Since they are additive, any refactorings using them alone will also be TAR.

i-Inline-Method uses p-Remove-Method.

i-Remove-Parameter-Use-Member-Variable uses p-Remove-Parameter. i-Remove-Accessor uses p-Remove-Method.

i-Preserve-Whole-Object uses p-Remove-Parameter, p-Remove-Variable. Extract-Superclass uses p-Create-Class, i-Pull-up-Fieldand i-Pull-up-Method. Collapse-Hierarchy uses i-Pull-up-Field, i-Pull-up-Methodand p-Remove-Class. Replace-Parameter-With-Method uses p-Remove-Parameter.

Remove-Setting-Method uses p-Remove-Method. Inline-Temp uses p-Remove-Variable.

Move-Methodusesi-Move-Methodandi-Remove-Parameter-Use-Member-Variable. Move-Field usesi-Move-Method andi-Remove-Accessor.

Change-Bidirectional-Association-To-Unidirectional uses i-Remove-Accessor. Inline-Class uses i-Remove-Accessorand p-Remove-Class.

(31)

Replace-Subclass-With-Fieldsusesp-Create-Variable, p-Remove-Methodand p-Remove-Class.

All the above use only TAR or TAN refactorings, and therefore are TAR refactor-ings. The ones that follow sometimes use TAI refactorrefactor-ings.

1. i-Pull-up-Fielduses p-Remove-Variable. This is the inverse ofi-Push-Down-Fieldand is therefore TAR.

2. i-Pull-up-Methoduses p-Remove-Method. This is the inverse ofi-Push-Down-Method and is therefore TAR.

3. Change-Parameter-Into-Member-Variable

// Before

void foo(boolean b) { ... = b ; } void bar(boolean b) { ... = b ; }

// After boolean b_ ;

void foo() { ... = b_ ; } void bar() { ... = b_ ; }

Figure 3.4: Change-Parameter-Into-Member-Variable

Again consider the example in Figure 3.4. As many type annotations are removed as there are methods using that object. This refactoring is TAN iff the number of methods using the object is one.

4. Replace-Method-With-Method-Object

This usesp-Create-Class,Change-Parameter-Into-Member-Variableandi-Create-Accessor. Consider the example in Figure 3.5. The penalty of the member variable declara-tion and the setter have to be offset by gains from Change-Parameter-Into-Member-Variable. Thus, at least two method parameters have to be eliminated by the refac-toring for it to be TAR.

(32)

// Before

void foo() { ... = bar(3) ; }

int bar(int i) { return ..*barcalls(i) ; } int barcalls(int i) { return ..(. = i) ; } int barcallscalls(int i) { return ..(. = i) ; }

// After

void foo() { Bar b = new Bar() ; b.setI(3) ; ... = b.bar() } class Bar {

private int i_ ;

void setI(int i) { i_ = i ; }

int bar() { return ..*barcalls(i_) ; } int barcalls() { return ..(. = i_) ; } int barcallscalls() { return ..(. = i_) ; } }

Figure 3.5: Replace-Method-With-Method-Object

This usesp-Create-Class,p-Add-Method,p-Add-Variableandi-Preserve-Whole-Object. As shown in Figure 3.6, the addition of methods and variables exacts a type anno-tation cost, which has to be overcome by the benefits of i-Preserve-Whole-Objectfor the refactoring to be TAR.

For the creation of an object withn member variables, there is a cost of at least 3n

type annotations. For the refactoring to be TAR, the number of methods benefiting from this will have to be greater than 3n/(n−1), since each method will shave off

n−1 type annotations from the program.

6. Parameterize-Methodis the inverse of Replace-Parameter-With-Method and is there-fore TAR.

3.3.3 Type Annotation Neutral

The primitive refactorings of this group are p-Create-Class and p-Remove-Class. These have no effect on type annotations.

(33)

// Before

void foo(int a, int b) { ... = a ; ... = b ; }

// After

class Pair { int a; int b;

Pair(int aa, int bb) { a = aa ; b = bb ; } int a() { return a ; }

int b() { return b ; } }

void foo(Pair p) { ... = p.a() ; ... = p.b() ; }

Figure 3.6: Introduce-Parameter-Object

1. i-Move-Method which uses p-Create-Method and p-Remove-Method. Since p-Create-Methodandp-Remove-Method are inverses of each other, and since just one of each is used in the refactoring, the refactoring as a whole is TAN.

2. i-Move-Field which uses p-Create-Variableand p-Remove-Variable. This is TAN for the same reasons as the previous refactoring is.

The large TAN refactorings are:

// Before

int foo(int i) { ... = i ; return ... ; }

// After

void foo(int i) { ... = i ; } int bar() { return ... ; }

Figure 3.7: Separate-Query-From-Modifier

1. Separate-Query-From-Modifier which as we see in Figure 3.7, does not change the number of annotations.

(34)

constructors do not have a return type, but since their name is tied to the ‘return type,’ it can be thought of as a type annotation.

3.4 Observations

The following observations can be made about the categories

:-1. TAR refactorings tend to increase the visibility of code and data. They often move code and/or data to broader scopes, or broader namespaces, and consequently increase their visibility. An example is i-Pull-up-Field, which moves a member variable to a superclass, making it available to a potentially larger mass of code.

2. TAI refactorings do the opposite. They tend to decrease the visibility of code and data. A good example is Extract-Subclass, which moves some code from a class into a new subclass. This prevents the private members in the subclass from being visible to the superclass as before.

3. TAN refactorings add and delete type annotations, but do not change the number of type annotations in the program.

4. As we have seen, the three primitive refactorings in the TAR group are inverses of the primitive refactorings of the TAI group. Interestingly, many of the intermedi-ate refactorings of the two groups are also inverses of one another. Thus, the TAI intermediate refactorings, i-Push-down-Field,i-Push-down-Method, i-Extract-Method andi-Create-Accessorhave counterparts in the TAR group with the names, i-Pull-up-Field,i-Pull-up-Method,i-Inline-Methodandi-Remove-Accessor. There are, however, two refactorings in the TAR group which do not have corresponding opposites in the TAI group, and they arei-Remove-Parameter-Use-Member-Variablewhich is a refac-toring not found in the catalogue, and i-Preserve-Whole-Object, which is from the catalogue.

(35)

change type annotations. They only have the effect of changing the dependencies between type annotations and of adding and deleting type annotations.

3.5 Conclusions

It can be concluded from the observations about the categories, that in a vast majority of cases where a set of behaviour-preserving transformations is performed, reducing the visibility of code or data increases the number of type annotations, and increasing the visibility of code or data reduces the number of type annoations.

It can also be seen that most refactorings do not change type annotations, but only the number of type annotations in the program, and the dependencies between them. This suggests that there are two kinds of modifications that can be made to a program, modifications that change type dependencies, and modifications that change the types of expressions, identifiers and methods without affecting type dependencies in any way.

(36)

Chapter 4

Type Annotation Management

This chapter deals with the problem of eliminating overheads in software modifi-cation caused by type annotations in many popular statically typed languages.

4.1 Disadvantages of Type Annotations

In his description of the ML language, Milner says, “Although it can be argued convincingly that to demand type specifications for declared variables, including the for-mal parameters of procedures, leads to more intelligible problems, it is also convenient — particularly in on-line programming — to be able to leave out these specifications.” [22]

Languages like JavaScript that have no type annotations. They are used widely in problem domains which do not demand high run-time efficiency. They are also useful to users who want to avoid the greater programming overhead and more numerous rules of syntax and semantics of statically typed, type annotated languages. Since these dynamically typed languages are type-checked at run-time, a measure of confidence in the software can only be obtained through extensive testing.

(37)

int i ; int j = i ;

Figure 4.1: Overheads from Type Annotations

prone.” Type annotation is a source of typing dependencies. Thus, it is faster to program in languages with fewer type annotation.

The increase in difficulty of programming associated with type annotations is di-rectly related to the lack of type inference. In order to accomplish static type verification in the absence of type inference, it becomes essential to manually maintain type annotations and constraints between them. Whenever a change is made to the type of an identifier, parameter or method return, henceforward referred to as an annotated point, the user is forced to readjust all annotated points dependent on the same. This results in considerable programming overhead that could be avoided through the use of type inference.

4.2 Hypothesis

Type annotations make it more difficult to change programs. For example, consider the program in Figure 4.1. If one changes the type of variable i tostring, one is forced to change the type of variablejtostringas well, in order to make the assignment type-correct. Thus, a primary change requested by a programmer necessitates one or more secondary changes, resulting in greater effort being expended per modification than necessary.

The hypothesis that we attempt to verify in this chapter is that control-flow in-sensitive type inference is useful in eliminating overheads due to type annotations.

(38)

4.3 Imlementation Inheritance and Specification Inheritance

In his PhD thesis [16], Graver defines implementation inheritance hierarchies and specification inheritance hierarchies as follows:

Implementation inheritance hierarchy. An implementation inheritance hierarchy is one where a subclass inherits the implementation of its superclass (or superclasses). A subclass may override an inherited method merely by defining a new method with the same name. Dynamically typed programming languages use implementation inheri-tance hierarchies.

Specification inheritance hierarchy. A specification inheritance hierarchy is one where a subclass inherits not only the implementation of its superclass but also its speci-fication. A subclass may only override an inherited method if the type of the new method is a subtype of the type of the inherited method. Almost all statically typed object-oriented programming languages, especially those that use type annotations, also use specification inheritance hierarchies.

We shall henceforth refer to languages with implementation inheritance hierarchies as implementation-inheritance languages, and languages with specification inheritance hi-erarchies as specification-inheritance languages.

4.4 Type Inference Theory

In this section, we present some of the theory of type-inference and discuss how it can be adapted to our needs. In the world of functional programming, there are languages like ML that are strongly typed and use type inference [22]. ML has a Semantic Soundness Theorem which certifies that well-typed programs won’t fail in certain ways, and a Syntactic Soundness Theorem that guarantees a program to be well-typed if a type-checking algorithm accepts it.

(39)

Expression: Constraint

1) Id := E [Id] ⊇ [E] V [Id := E] ⊇ [E] 2) E m₁ E₁ ... m_n E_n [E] ⊆ {C |C implements m₁ ... m_n}

3) E₁ ; E₂ [E₁ ; E₂] ⊇ [E₂]

4) if E₁ then E₂ else E₃ [if E₁ then E₂ else E₃] ⊇ [E₂] S [E₃] 5) C new [C new] = {C}

6) E instanceOf C [E instanceOf C] ={C}

7) self [self] = {the enclosing class}

8) Id [Id] = [Id]

9) nil [nil] = {}

Table 4.1: Palsberg and Schwartzbach’s Local Constraint Rules

Their control-flow insensitive algorithm for type inference works with a non-type-annotated implementation-inheritance language. It takes as input a program in the lan-guage, modifies the program to eliminate inheritance [15, 16] abstractions, and outputs either a safety guarantee along with type information about all expressions in the program, or an error.

The type constraints identified by Palsberg and Schwartzbach are of three kinds, local, connecting, and global. The constraints within method bodies are the local constraints and comprise the rules in Table 4.1. The first rule states that the type of the left-hand side in an assignment is a superset of the type of the right-hand side. The second rule is related to method calls. It states that if the messages m₁ to m_n are sent to E, then the type of

E is a subset of the type of all classes with the methods m₁ to m_n. The next two rules are recursive definitions of the types of complex expressions, and the last five describe the types of simple expressions.

Type constraints between methods are called connecting constraints. They ensure that the type of an actual parameter to a method is a subtype of the corresponding formal parameter of the method, and that the type of the return value of a method is a subtype of the variable it is assigned to.

Global constraints are derived from paths in the trace graph, and are useful in certain algorithms for type inference. They are compiled and solved in a constraint solver in worst case exponential time. A quadratic time algorithm was later developed for solving the resulting inequations.

(40)

Expression: Constraint

1) Id = E [Id] ⊇ [E] V [Id = E] ⊇ [E] V |[Id]| = 1 2) E.m₁(E₁) ; ... E.m_n(E_n) ; [E] ⊆ {C |C implements m₁ ... m_n}

3) new C() [new C()] = {C}

4) this [this] = {the enclosing class}

5) super [super] = {the superclass of the enclosing class}

6) Id [Id] = [Id]

7) null [null] = {}

Table 4.2: Local Constraint Rules for Decaf

problem with suitable changes. Firstly, in languages where a type is essentially a class and its subclasses, as in Java and C++, subtyping is the same as subclassing [28]. As these languages permit no explicit union types, a union of types needs to be approximated by the smallest superclass larger than it.

Secondly, since the Decaf [8] language does not have some of the expressions Pals-berg and Schwartzbach’s does, some rules in Table 4.1 can be omitted.

The local constraints we use in our solution are listed in Table 4.2 using Decaf syntax. These constraints form the basis of our algorithm to manage type annotations.

4.5 Type Constraint Graphs

A directed graph whose nodes represent type annotations and object instantiations in a method, and whose edges stand for the constraints between them, is a type constraint graph. Assume a type constraint graph with two nodes, A and B, representing two variables. An edge from node A to node B represents an implicit or explicit assignment of A to B. Therefore, the type of A is a subset of the type of B. For example, the type constraint graph for the method in Figure 4.2 is as shown in Figure 4.3.

(41)

class A {

A foo() {

A x = new A() ; A y = x ;

if (true) return y.foo() ; y = new A() ;

x = y ; return y ; }

}

Figure 4.2: Sample Code

(42)

4.6 Type Annotation Management Tool

The type annotation management tool that we have developed is a proof-of-concept tool that works with a small language called Decaf [8]. When a type annotation is changed anywhere in a program supplied to the tool, secondary changes are made throughout the program until all type constraints are satisfied once again.

4.6.1 Decaf

Decaf [8] is a language similar to the Java language and is statically typed, type annotated, and belongs to the family of specification-inheritance languages. Only a subset of Decaf is supported by our tool. Languages features left out include:

1. Primitive types

These were not included because the differences between by-reference and pass-by-value do not affect typing constraints. Consequently, primitive types in Java are not of much interest.

2. Standard Library

The standard library was left out to create a closed system where the source code for all objects in the system is readily available.

3. Operators

Operators are not interesting since they are mere syntactic sugar for methods.

4. Overloading

Overloading is not permitted in Decaf.

5. Type-Casts

Type-casts are not handled by the tool, because they are problematic. Programs using them can be compile-time correct and fail at run-time with incorrect casts.

4.6.2 Input

(43)

class A {

void a() ; }

class B extends A { void a() ; }

class C extends A { void a() ; }

class Main {

public void main() { C b = new C() ; b = new C() ; b.a() ;

} }

(44)

Enter the 4-tuple Main

main 2 B

Figure 4.5: A Sample Set of Inputs

1. A Program

The tool accepts programs in the Decaf language subject to the constraints outlined above. The parser first checks the program for type-correctness. Only syntactically correct and well-typed programs are accepted by the tool, though future work might be able to extend the tool to handle incomplete or incorrect programs. A sample program has been provided in Figure 4.4.

2. The Required Type Change

The required change is indicated as a four-tuple

{class name, method name, node number, new node type}

where the class name and method name identify a portion of the type constraint graph and the node number the node to be changed. The ‘new node type’ is the type to be assigned to that annotated node, replacing the existing annotation. As soon as the program has been supplied to the tool, the tool outputs a list of annotated nodes with constraint information for each node, and requests the user for a point to change. An example change specification can be seen in Figure 4.5.

4.6.3 Stages

(45)

Stage 1 – Construction of Constraint Graphs

In the first stage, the tool’s front-end converts the given program into a type constraint graph. The front-end is taken from a Decaf compiler, and instrumented to maintain an additional data-structure representing the annotated points. As the parser builds the parse tree, it also builds the constraint graph for the program, and associates it with the parse tree for easy identification. Thus, the nodes of the constraint graph representing annotated points local to a method are associated with the method. This makes it easier for a user to identify the node and to request that it be changed. This is why a four-tuple is used to specify changes. The first two elements of the four-tuple identify the context of the node, and the third element locates the exact node to be modified. The constraint graphs of different methods may be linked to one another through common member variables.

Stage 2 – Performing Annotation Changes

At the beginning of stage two, the user is requested to input the four-tuple spec-ifying the change. The tool then infers the types of nodes affected by the change and alters them suitably. The changes are currently not propagated to the source code, but that should be easy to do by maintaining references to the location of the annotation in the source code in each node of the type constraint graph. At the end of the stage, the changes are communicated to the user along with the new nodes of the constraint graph, and errors are reported if any have occurred. If the errors can be eliminated by refactoring, the corresponding refactoring is recommended.

4.6.4 Outputs

(46)

Class Main Method main>>

0) $temporary1:LC; false:false D [<=b] calls [] and [a:V] U []

1) b:LC; false:false D [] calls [a:V] and []

U [>=$temporary1, >=$temporary2] 2) $temporary2:LC; false:false D [<=b] calls [] and [a:V] U []

3) a:V

4) return:V false:false D [] calls [] and [] U []

Figure 4.6: Stage One Output Snippet

Output of Stage 1

The output at the end of the first stage is mainly used to describe the type con-straint graph and to provide the user with sufficient information to choose a node to change. The output consists of the nodes of all the constraint graphs, listed in the context of their method if any, and of their class if the nodes represent member variables.

(47)

Class Main Method main>>

0) $temporary1:LC; false:false D [<=b] calls [] and [a:V] U []

1) b:LA; true:false ***

D [] calls [a:V] and []

U [>=$temporary1, >=$temporary2]

2) $temporary2:LB; true:false *** D [<=b] calls [] and [a:V]

U [] 3) a:V

4) return:V false:false D [] calls [] and [] U []

Figure 4.7: Stage Two Output Snippet

In some nodes, this is followed by two further lines. These lines list the constraints on the node. The first line lists nodes that are either supersets of or equal to the current node. There are also two other lists on the same line containing method calls by that node and by nodes that are dependent on it.

Outputs of Stage 2

The outputs for stage two include the changes made to the program as a result of user requested changes to annotated points in the program. The second stage outputs are obtained after the algorithms have been run on the constraint graph and the changes propagated to all affected nodes.

The output is a listing of the nodes of the program similar to the listing obtained at the end of stage one. However, the nodes are labelled with the new types as computed by the change propagation algorithm. Nodes that have changed are also flagged as seen in Figure 4.7.

(48)

class A {

void a() ; }

class B extends A { void a() ; }

class C extends A { void a() ; }

class Main {

public void main() { A b = new C() ; b = new B() ; b.a() ;

} }

(49)

4.6.5 Algorithm

Palsberg and Schwartzbach [27, 26] developed an efficient algorithm for type infer-ence in a Smalltalk-like program. They used a constraint solver, and with improvements, were able to reduce the complexity of the algorithm to quadratic time. In our tool, we use an algorithm which closely mimics the way a human would propagate changes to a program. The algorithm has two components as described below.

Local

The local component of the algorithm takes a change request and modifies a node according to the constraints upon it, as determined from the edges of the graph and the method call lists associated with each node. Since the algorithm uses only method lists and graph edges for its computations, and since the constraint graphs are partial in that they do not take into account the typing dependencies between methods, it cannot propagate changes to parts of the program not reachable through some path within the constraint graph in which the changes were initiated.

The algorithm works by making a requested change to the current node if the change is possible given the method call constraints on the node, then requesting all nodes related to it to adjust themselves accordingly. The computation of the types of the new set of nodes can in some cases take worst-case exponential time.

There are two cases to be considered:

1. Change to Assigned

When the right-hand side of an assignment changes, the left-hand side may have to change appropriately if the type constraints are to be maintained. If the type of the left-hand side is equal to or a superset of the type of the right-hand side, there is no need to make any changes. If it is not, a modification will have to be made.

(50)

if the type satisfies all the method call constraints on the node. Thus the algorithm always terminates. If keeps moving the types of the annotated points toward the highest class in the hierarchy, and then, the iteration comes to an end.

As explained above, each node can only be changed as many times as the depth of the class hierarchy in the program. Each time the computation concentrates on a node, it either terminates without propagating changes to the node’s neighbours, or moves the type of the node closer to the root of the hierarchy until all constraints are satisfied, or an error is reported. The example in the previous section illustrates the working of this mode of the local change propagation algorithm.

2. Change to Assignee

The selection of the new type when the left-hand side has changed and the change is being propagated to the node representing the right-hand side, is a little more involved. The reason for this is that the right-hand side node must be either equal to or a subset of the left-hand side node. In Decaf, a class can have only one superclass, but it can have many subclasses, and they must all be tried until a good fit is found. A depth-first search algorithm with backtracking was used, which had worst-case exponential time-complexity.

When a candidate is identified, the type of the local node is changed to that type if the node has never before been changed, and to the union of that and the node’s existing type if it has already been modified.

Thus, the greatest weakness of the algorithm is the backtracking depth-first search just described, which accounts for most of the complexity of the same. However, it is hoped that future work can eliminate the bottleneck.

Interprocedural

Interprocedural change propagation is performed at the same time as local change propagation and is used to propagate the results of a change to other methods. Interproce-dural change propagation is also performed in two directions:

(51)

If the parameters of a method or the method’s return type has changed, all uses of the method are identified and changed and the local change propagation algorithm is run from those points. This terminates in the case of recursive methods because of the terminating conditions for the local change propagation algorithm described above.

2. From Method Use to Definition

If the parameters passed to a method call, a return type or the object on which the method is called is changed, the change is propagated to the parameters or return type of the method as the case may be, and the local change propagation algorithm initiated from that point. The algorithm terminates because of the termination conditions of the local change propagation algorithm.

4.7 Conclusion

(52)

Chapter 5

Identification of Candidate

Refactorings

Kataoka et al [18] opine that refactoring is difficult because it takes considerable effort to determine what refactorings should be performed and when. In this chapter, we describe a method for identifying refactorings from information present in the type constraint graph.

5.1 Background

Casais [5, 4] notices that the library provided with the Eiffel environment incurred major redesigns, and that similar problems had been reported about the design of class libraries for various application domains. He opines that these difficulties arise with the object-oriented approach because:

1. User needs change: additional functionality has to be constantly integrated into ex-isting software.

(53)

apparent.

3. Stable, reusable classes are not discovered in one go, but are developed iteratively.

Casais [4] describes an incremental restructuring algorithm based on the hypothesis that “design flaws can be uncovered at the time a hierarchy is extended with an additional object description.” Since the new class may override or refine inherited properties in ways that might constitute a suboptimal application of object-oriented mechanisms, his algorithm analyses the redefinitions and the inheritance hierarchy to improve the design and to pinpoint places warranting further redesign.

We use an approach to the identification of refactorings that is similar in spirit. Our hypothesis is that certain refactorings can be identified at the time an error occurs because of an inability to satisfy type constraints, by examining conditions in a type constraint graph.

Below, we identify refactorings with properties of interest. Some refactorings cause a relaxation of constraints at certain points in the graph. This property is useful in dealing with overconstrained graphs, and refactorings possessing it can convert a graph with un-satisfiable type constraints to one that can be solved. The following refactorings make type constraint graphs less constrained:

1. Split-Temporary-Variable

2. i-Push-Down-Field

3. i-Pull-Up-Method

4. Extract-Superclass

5. Split-Method

6. i-Push-Down-Method

Furthermore, the following candidate refactorings and refactoring patterns can be identified from observation of a type constraint graph.

1. Split-Temporary-Variable

(54)

Foo temp = foo() ; ... = bar(temp) ; temp = foo() ; Foo bar = temp ;

Figure 5.1: Trap Inviting a Split-Temporary-Variable

3. i-Pull-Up-Method

4. Extract-Superclass

5. Extract-Superclass followed by i-Pull-Up-Method

6. Split-Methodori-Push-Down-Method

5.2 Properties of Refactorings

In this section, we take a closer look at the six refactorings listed in the previous section as having interesting properties which contribute to reducing the tightness of con-straints in a type constraint graph. The next section deals with detecting these refactorings.

5.2.1 Split-Temporary-Variable

The Static Single Assignment (SSA) Form [6, 3] is a way of representing and reasoning about data flow properties of programs. In the SSA form of representation, no variable is assigned to twice. Split-Temporary-Variableworks in a manner reminiscent of the conversion of code to the SSA form. It splits a variable so that a variable that is assigned to twice is now only assigned to once.

The effect ofSplit-Temporary-Variable on a constraint graph is the addition of an extra node. As can be seen, the refactoring can reduce type constraints because the number of edges of the graph does not change but the number of nodes increases by one. Therefore, the refactoring could be useful to a suitable type inference tool.

(55)

Foo temp = foo() ; ... = bar(temp) ; Foo temp1 = foo() ; Foo bar = temp1 ;

Figure 5.2: The Effect of Split-Temporary-Variable

same. For example, in the code snippet listed in Figure 5.1,Split-Temporary-Variable does not have any effect on the graph in terms of loosening constraints, because the constraints on the variable were duplicated in the first place. The effect of performing the refactoring is seen in Figure 5.2. bar is as tightly constrained as ever throughtemp1, and so is temp.

5.2.2 i-Push-Down-Field

Before we deal with this refactoring, we need to make an assumption regarding member variables. We make the Assumption of Strict Encapsulation; this means that member variables are always either private or protected, and never public. This constraint simplifies the detection of the refactoring as we shall see later on.

i-Push-Down-Field moves a member variable to all or some of its subclasses. This is possible if it is not used in its own class. This constraint can be satisfied only if a constraint graph node representing the variable to be pushed down has no edges to or from it created in its enclosing class.

i-Push-Down-Field is very similar to Split-Temporary-Variable in its effect on a constraint graph. It increases the number of nodes while keeping the number of edges constant. In fact, the two refactorings look almost identical, and the value of an instance of the i-Push-Down-Field refactoring can be determined by how well it loosens constraints on the graph.

(56)

class A { }

class B extends A { void z() ; }

class C extends A {

public static void main(String argv[]) { B temp = new B() ;

temp.z() ; }

}

Figure 5.3: Before i-Pull-Up-Method

5.2.3 i-Pull-Up-Method

i-Pull-Up-Method has the effect of making a system less tightly constrained with respect to types because in specification-inheritance languages like Java and C++, it is essential that each annotated point have a type with only one element, a class. (Union types are not available in these languages). So, if a superclass has a method overridden by one or more of its subclasses, it (the superclass) can be used in place of its subclasses in many more places.

Consider the example in Figure 5.3. The variable temp in method main is con-strained to one and only one class even if all constraint edges are removed. It can only be of typeB because the method z() is called on it, and B is the only class with a method z(). Therefore, |[temp]|= 1 and [temp] ={B}, where [temp] stands for the type of the variable temp, and|[temp]|stands for the number of elements in the type of the variable temp.

(57)

class A {

void z() ; }

class B extends A { void z() ; }

class C extends A {

temp.z() ; }

}

Figure 5.4: After i-Pull-Up-Method

class B { }

class C {

} }

Figure 5.5: Before Extract-Superclass

5.2.4 Extract-Superclass

This refactoring works in a way quite similar to the above. It makes a constraint graph less tightly constrained by allowing the union of two previously unrelated classes.

Consider the code in Figure 5.5. The number of types that can be held by the variable temp is only one if all the constraint edges remain as they are (there is one from the object created by ‘new B()’). Therefore, |[temp]|= 1 and [temp] ={B}.

The Elimination of Overheads due to Type Annotations and the Identification of Candidate Refactorings

Contents

List of Figures

List of Tables

Chapter 1

Introduction

1.1

Definitions

1.2

Preliminary Study

1.3

Problems

1.4

Proposed Solution

1.5

Contributions

1.6

Thesis Outline

Chapter 2

Related Work

2.1

Refactoring

2.2

Refactoring Tools

2.3

Programming and Design Enhancement Tools

Chapter 3

Categorisation of Refactorings

3.1

Types of Refactorings within a Category

3.2

Methodology

3.3

Results

3.4

Observations

3.5

Conclusions

Chapter 4

Type Annotation Management

4.1

Disadvantages of Type Annotations

4.2

Hypothesis

4.3

Imlementation Inheritance and Specification Inheritance

4.4

Type Inference Theory

4.5

Type Constraint Graphs

4.6

Type Annotation Management Tool

4.7

Conclusion

Chapter 5

Identification of Candidate

Refactorings

5.1

Background

5.2

Properties of Refactorings