

3.5.6 Other Related Work

Logic Program Querying and Meta-programming Languages

Program query languages [Wuyts 2001, Janzen 2003, Hajiyev 2006, Hou 2006, De Volder 2006] allow writing custom queries that extract information from the source code of a system. SOUL [Wuyts 2001] is a logic-based program querying language to reason over the structure of object-oriented systems. While the SOUL language is very similar to PROLOG, it provides a number of specialized features (such as linguistic symbiosis) that facilitate reasoning over software systems, as well as a set of logic libraries that offer dedicated predicates for reasoning about programs written in Smalltalk, Java, C(++) and Cobol.

The JQuery tool [De Volder 2006] uses a PROLOG dialect to offer an expressive means to query source-code entities and the relationships between these entities. The work of Verbaere and De Moor concerning CodeQuest [Hajiyev 2006] and SemmleCode [de Moor 2007] provides a different approach that favours performance over expressivity. These approaches use Datalog [Ceri 1989] and QL respectively, languages that offer only a subset of PROLOG, for example by limiting the possible forms of recursion and excluding the definition of data structures.
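To make the notion of a recursive program query concrete, the following is a minimal sketch written in Python rather than in any of the query languages cited above. It evaluates a Datalog-style rule (the transitive subtype relation) over a small set of facts by naive fixpoint iteration. The fact base, the class names and the function names are illustrative assumptions and do not come from SOUL, JQuery, CodeQuest or SemmleCode.

```python
# Hypothetical facts: (subclass, superclass) pairs extracted from source code.
EXTENDS = {
    ("ArrayList", "AbstractList"),
    ("AbstractList", "AbstractCollection"),
    ("AbstractCollection", "Object"),
}


def subtype_closure(extends):
    """Naive fixpoint evaluation of the Datalog-style rules

        subtype(X, Y) :- extends(X, Y).
        subtype(X, Z) :- extends(X, Y), subtype(Y, Z).
    """
    subtype = set(extends)
    while True:
        # Join extends(X, Y) with subtype(Y, Z) to derive subtype(X, Z).
        derived = {
            (x, z)
            for (x, y) in extends
            for (y2, z) in subtype
            if y == y2
        }
        if derived <= subtype:  # nothing new: fixpoint reached
            return subtype
        subtype |= derived


def supertypes_of(name, extends=EXTENDS):
    """Query: every Y such that subtype(name, Y), sorted for stable output."""
    return sorted(y for (x, y) in subtype_closure(extends) if x == name)
```

Calling `supertypes_of("ArrayList")` collects the direct and indirect superclasses recorded in the fact base. The loop terminates because the set of derivable pairs is finite, which is the kind of guarantee that restricted recursion is meant to preserve.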

Meta-programming languages allow developers to write programs that generate, analyze or transform other programs. Rascal is a new meta-programming language for source code analysis and manipulation. Rascal programs can read, analyze, transform, generate and/or visualize other programs. It has been designed following the Extract-Analyze-SYnthesize (EASY) paradigm [Klint 2011]. Rascal can be applied to several domains such as compiler construction, implementing domain-specific languages, constraint solving, software renovation and so on.

Querying Source Code History

Kellens et al. [Kellens 2011] propose ABSINTHE, a logic-based program query language that supports querying versioned software systems using logic queries. It extends the SOUL program query language with quantified regular path expressions for reasoning about a system’s history. These quantified regular path expressions exhibit the properties of each individual version in a sequence of successive software versions.

In previous work we have proposed Time warp [Uquillas-Gómez 2009], a prototype implementation that extends the SOUL program query language to allow developers to write queries about the history of a system. It is based on the FAMIX and Hismo meta-models and offers an ad-hoc specification language (library of dedicated predicates) to reason about these models, as well as to express temporal relationships between the entities in both models.

Hindle and German [Hindle 2005] propose SCQL, a dedicated formal model and query language for reasoning over source code repositories. A repository is instantiated as a formal model to serve as the underlying model which they reason about. The model is a graph in which the different entities (e.g., revisions, files, authors) stored in the repository are vertices and their relationships are edges.

SCQL supports temporal logic operators such as previous, after, always and never, which are used to express queries.
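As an illustration of what temporal operators over repository entities can look like, the sketch below models a linear history of revisions in Python and defines four helper functions named after the operators mentioned above. This is not SCQL syntax, and the semantics chosen here (for example, `previous` returning the immediate predecessor) are simplifying assumptions made for this example. The `Revision` type, the sample history and the author and file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    number: int        # position in the history, oldest first
    author: str
    files: frozenset   # files touched by this revision


# Hypothetical history; names are illustrative only.
HISTORY = [
    Revision(1, "alice", frozenset({"Parser.java"})),
    Revision(2, "bob", frozenset({"Parser.java", "Lexer.java"})),
    Revision(3, "alice", frozenset({"Lexer.java"})),
]


def always(history, predicate):
    """True if the predicate holds for every revision in the history."""
    return all(predicate(r) for r in history)


def never(history, predicate):
    """True if the predicate holds for no revision in the history."""
    return not any(predicate(r) for r in history)


def after(history, revision):
    """All revisions that come later than the given one."""
    return [r for r in history if r.number > revision.number]


def previous(history, revision):
    """The immediate predecessor of the given revision, or None."""
    earlier = [r for r in history if r.number < revision.number]
    return max(earlier, key=lambda r: r.number) if earlier else None
```

A query such as "no revision was authored by carol" then becomes `never(HISTORY, lambda r: r.author == "carol")`. A graph model with vertices for files and authors, as described above, would generalize this linear list.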

Source Code Change Extraction

Fluri et al. [Fluri 2007] propose change distilling, a tree differencing algorithm for fine-grained source code change extraction. They identify changes between two Java programs by finding both a match between the nodes of the two compared abstract syntax trees (ASTs) and a minimum edit script that can transform one tree into the other given the computed matching. The authors improved the existing tree differencing algorithm by Chawathe et al. [Chawathe 1996] to classify change types based on a taxonomy of source code changes that Fluri and Gall established in prior work [Fluri 2006]. This taxonomy defines changes according to tree edit operations (insert, delete, move, update) in the AST and classifies each change type with a significance level (e.g., else-part insert (high), attribute renaming (high)).
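The idea of describing a change as tree edit operations can be illustrated with a deliberately simplified sketch. The Python code below is not change distilling and not the algorithm of Chawathe et al.: it pairs children by position instead of computing a node matching, it does not detect moves, and the script it produces is not guaranteed to be minimal. The `Node` type, the labels and the path notation are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                  # node kind, e.g. "if" or "method"
    value: str = ""             # source text attached to the node
    children: list = field(default_factory=list)


def diff(old, new, path="root"):
    """Return (operation, path, detail) tuples that turn `old` into `new`.

    Simplification: children are paired by position, and a label mismatch
    is reported as a delete followed by an insert. No move detection.
    """
    if old.label != new.label:
        return [("delete", path, old.label), ("insert", path, new.label)]
    ops = []
    if old.value != new.value:
        ops.append(("update", path, f"{old.value} -> {new.value}"))
    for i in range(max(len(old.children), len(new.children))):
        child_path = f"{path}/{i}"
        if i >= len(old.children):
            ops.append(("insert", child_path, new.children[i].label))
        elif i >= len(new.children):
            ops.append(("delete", child_path, old.children[i].label))
        else:
            ops.extend(diff(old.children[i], new.children[i], child_path))
    return ops


# Hypothetical example: a condition is changed and an else-part is added.
OLD = Node("method", "foo", [Node("if", "x > 0", [Node("then")])])
NEW = Node("method", "foo", [Node("if", "x >= 0", [Node("then"), Node("else")])])
EDITS = diff(OLD, NEW)
```

For this pair of trees, `EDITS` contains one update of the condition and one insert of the else node. A classifier in the spirit of the taxonomy described above would then map such operations to change types and significance levels.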

Mining Software Repositories (MSR)

MSR refers to the extraction and processing of information stored by version control systems, such as SVN, CVS and Git. Hassan [Hassan 2009] proposes a technique to predict faults in a system by applying complexity metrics on the changes that are present in the repository. Source Sticky Notes [Hassan 2004] is an approach that annotates a static dependency graph of a system with information that is extracted from the history of a system, to help developers understand the context of the changes they are applying. DynaMine [Livshits 2005] is a tool that applies data mining techniques on version archives to find common usage patterns by analyzing co-changed methods.
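A minimal sketch of co-change mining, assuming that each commit has already been reduced to the set of method names it touched, could look as follows in Python. It simply counts how often two methods appear in the same commit and keeps the pairs that reach a support threshold. This is plain pair counting, not DynaMine's mining algorithm, and the commit data, method names and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of methods changed by each commit.
COMMITS = [
    {"open", "close"},
    {"open", "close", "read"},
    {"lock", "unlock"},
    {"open", "close"},
    {"lock", "unlock", "read"},
]


def co_change_pairs(commits, min_support=2):
    """Count unordered method pairs that change in the same commit.

    Returns a dict mapping each pair (as a sorted tuple) to the number of
    commits containing both methods, restricted to pairs with at least
    `min_support` such commits.
    """
    counts = Counter()
    for changed in commits:
        # Sorting gives every pair one canonical order before counting.
        for pair in combinations(sorted(changed), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


PATTERNS = co_change_pairs(COMMITS)
```

With the sample data, only the pairs (close, open) and (lock, unlock) reach the threshold, which matches the intuition that such methods are used together.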

Software Classifications

De Hondt [Hondt 1998] proposes software classifications as a means to recover architectural elements in evolving object-oriented systems. This approach is based on Reuse Contracts that we described before. Software classifications consist of a model and a technique. The software classification model provides simple concepts to organize large software systems and their evolution in manageable units (classifications). The software classification technique provides strategies to set up and recover those manageable units. Software classifications have been applied to: (a) expressing multiple views on software, (b) recovering of collaboration contracts, (c) recovering of reuse contracts, (d) recovering of architectural components, and (e) management of changes.

Aspect-Oriented Software Analysis

Within the field of aspect-oriented software development, numerous techniques have been proposed that mine for crosscutting concerns [Kellens 2007], such as the work of Marin [Marin 2007], who uses fan-in analysis to identify possible aspect candidates.
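Fan-in analysis can be sketched in a few lines, under the assumption that a call graph is available as a list of caller and callee pairs. The Python code below computes the fan-in of each method as its number of distinct callers and reports methods above a threshold as aspect candidates. The call graph, the method names and the threshold value are hypothetical, and the filtering steps that a full fan-in analysis would apply are omitted.

```python
from collections import defaultdict

# Hypothetical call graph as (caller, callee) pairs.
CALLS = [
    ("Account.deposit", "Logger.log"),
    ("Account.withdraw", "Logger.log"),
    ("Order.submit", "Logger.log"),
    ("Order.submit", "Account.withdraw"),
    ("Account.deposit", "Logger.log"),  # second call site, same caller
]


def fan_in(calls):
    """Map each callee to its number of distinct callers."""
    callers = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    return {callee: len(who) for callee, who in callers.items()}


def aspect_candidates(calls, threshold=3):
    """Methods whose fan-in reaches the (assumed) threshold, sorted by name."""
    return sorted(m for m, n in fan_in(calls).items() if n >= threshold)
```

Here `Logger.log` is called from three distinct methods and is therefore reported, which reflects the observation that logging is a typical crosscutting concern.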

3.6 Conclusion

In this chapter we have presented the intent of our solution: a comprehensive tool suite to provide integrators access to the information discussed in Section 3.4.

We covered four topics related to changes that complement the problems inherent in the integration process in a collaborative environment (see Chapter 2). The use of branching and merging makes the integration process a very complex task. Unfortunately, current tools do not provide adequate support for developers who integrate changes within a single branch or between branches. Developers lack tools that help them understand the context of changes efficiently. Such activity is mostly manual and time consuming. Merging changes between branches is even more complicated, and no tools exist to identify and understand changes that can be applied from one branch to another, that is, to support cherry picking.

An overview of the integration process as found in open-source development was briefly described as background to introduce the definitions and terminology that we use in this dissertation. Even though most of the terminology is well known (e.g., commits or history), we provided our definitions in the context of the integration of changes. We introduced terms such as stream of changes and delta dependencies to refer to a sequence of sets of changes within a branch, and to the dependencies between these sets of changes, respectively. Note also that some definitions are tailored to support integration in Pharo, but they can be refined and applied to other infrastructures.

The second topic presented in this chapter was a catalogue of 64 questions that developers ask when they are integrating or want to integrate changes. We conducted a study to gather such questions as a means to identify and understand the information developers need to answer them, and therefore to support them in integrating changes. Moreover, these questions serve as the foundation to assess our contributions in Chapters 5 and 7. We described the methodology used for our study, and the data and results obtained. However, because we intend to base our contributions not only on the questions we identified, but also on relevant questions that integrators raise independently of the programming language or tools used, our findings were also extended and verified with other questions found in similar but broader studies. The 64 questions were clustered in 5 categories, and for each category a description was added. This part ended with a discussion of how tools support or can support answering these questions.

The third topic described the information that can be used to define the requirements for our solution presented in Chapter 2.4. Based on the questions, we identified 12 kinds of information, such as size, author, time, structure, change scope, vocabulary and dependencies, that we can use for the change characterizations. Additionally, we presented a summary of which questions can be answered by each kind of information, and to what extent support can be provided.

The last topic presented the state of the art of relevant work in the context of this dissertation. We focused on five topics: (a) modeling source code, history and changes, (b) merging, (c) change impact analysis, (d) change dependencies, and (e) understanding development tasks by means of questions. We discussed how their goals relate to or differ from ours.

The next chapter introduces the Ring source code meta-model, the core of the infrastructure needed to assess our contributions. Our history and change models and the analyses that will be presented in later chapters are built on top of it.

CHAPTER 4

Ring: a Unified Model for Source Code Representation

Contents

4.1 Introduction