
Implementation Evaluation and Conclusions

To conclude Chapter 6 and Chapter 7, an evaluation of the framework implementation is provided. The assessment considers the system prerequisites, the functionality offered by the current implementation, and the level of support provided for the high-level requirements set out in Chapter 4.

The current implementation of the ACM framework enables users to perform framework setup, which unifies heterogeneous artefacts into a property graph representation, and consistency management. Invoking the functionality of the framework requires certain prerequisites. These assumptions and the system requirements for running the framework can be summarised as follows.

• Version Control. The current implementation assumes that the original artefacts are stored in a version control system. This makes it possible for the framework to pull artefacts from the repository and perform operations on them.

• Database backend. It is assumed that the user has a graph database installed. The current implementation supports Neo4j and can be extended to allow migrating to other graph databases.

• User-supplied data. In order to extend the framework with new artefacts, users are required to supply XSLT files that transform their custom XML-based representation to the custom GraphML structure.

• Tools. The tools used to create and store original artefacts are required to provide functionality to export artefact data to an XML-based representation.

• System requirements. The framework currently runs on a Windows platform. Further prerequisites include a version control system of the user's choice (in the current implementation Mercurial is required) and the Java platform11.
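As an illustration of the unification step that these prerequisites support, the following is a minimal sketch of how a tool's XML export might be mapped to GraphML nodes and edges. It is not the framework's implementation: the framework relies on user-supplied XSLT files, whereas this sketch performs the mapping with Python's standard library only to show the shape of the result. The input vocabulary (`model`, `class`, `association`) and the single `name` property are hypothetical, and the output uses plain standard GraphML elements rather than the framework's custom structure, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Hypothetical XML export of a small class diagram (not a real tool format).
SAMPLE_EXPORT = """
<model>
  <class id="c1" name="Order"/>
  <class id="c2" name="Customer"/>
  <association from="c1" to="c2"/>
</model>
"""


def _tag(local_name):
    """Qualify a local element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{local_name}"


def export_to_graphml(source_xml):
    """Map the hypothetical export format to a GraphML document string."""
    model = ET.fromstring(source_xml)
    # Serialise GraphML elements without a prefix (default namespace).
    ET.register_namespace("", GRAPHML_NS)

    root = ET.Element(_tag("graphml"))
    # Declare one node property, "name", as a GraphML key.
    ET.SubElement(root, _tag("key"), {
        "id": "name", "for": "node",
        "attr.name": "name", "attr.type": "string",
    })
    graph = ET.SubElement(root, _tag("graph"),
                          {"id": "artefact", "edgedefault": "directed"})

    # Each class becomes a node carrying its name as a property.
    for cls in model.findall("class"):
        node = ET.SubElement(graph, _tag("node"), {"id": cls.get("id")})
        data = ET.SubElement(node, _tag("data"), {"key": "name"})
        data.text = cls.get("name")

    # Each association becomes a directed edge between two nodes.
    for assoc in model.findall("association"):
        ET.SubElement(graph, _tag("edge"), {
            "source": assoc.get("from"), "target": assoc.get("to"),
        })

    return ET.tostring(root, encoding="unicode")


graphml_text = export_to_graphml(SAMPLE_EXPORT)
```

A document produced this way consists only of nodes, edges, and key/value properties, which is the information a property graph database needs in order to store the artefact.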

The following is a summary of the framework functionality and an analysis of how the current implementation achieves the high-level requirements set out in Chapter 4.

• Tool independence. The framework caters for any tool, assuming that it allows the extraction of artefact data to an XML-based format. For the present version of the implementation the following heterogeneous tools and formats are selected: OpenOffice Writer (.odt), DIA (.dia) and Eclipse (.java). Additionally, the framework can be extended to cater for artefacts created in further tools.

• Artefact independence. The property graph model allows the representation of any entity characterised by any structural attributes and abstraction level assuming artefact data can be extracted as mentioned above. Apart from the ability to handle any artefact regardless of its type, another important aspect to consider is that the framework should prioritise artefacts which are the most widely used in software development projects. Representations selected for the current implementation include requirement specifications written in natural language, UML class diagrams, Java source code, JUnit test cases, UML use cases, UML sequence diagrams, software architectures (conceptual and module view), subsets of which are used in traditional and agile software projects.

• Automation. A major goal of the ACM framework is the discovery of approaches that allow automation across all stages of the consistency management process. The framework makes it possible to automatically extract and transform heterogeneous artefacts to a uniform format, assuming data in the original tools can be accessed. The creation of inter trace links is currently semi-automated, and the approach is presented in the following chapter. Change detection is carried out automatically but is currently invoked manually by the user. Change impact analysis and consistency checking are automatic. Change propagation automatically suggests resolutions to inconsistencies; however, the changes themselves are propagated manually by the user at their discretion.

• Configuration. The ability to configure the framework during the setup process addresses the "Customisable and non-intrusive" requirement. Users can currently perform framework configuration at startup. However, future work remains to be done in the area to allow users to customise other aspects of the framework, which is discussed in Chapter 10.

• Performance. The ability to effectively handle a varying number of artefacts and changes of different complexity is evaluated in Chapter 9.
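To make the role of trace links in the automated stages more concrete, the following is a minimal sketch of change impact analysis over a graph of artefacts. It rests on assumptions that are not taken from the implementation described above: impact is treated as simple transitive reachability over trace links, links are treated as bidirectional, and the graph is held in memory rather than in Neo4j. The artefact identifiers and the function name `impacted_artefacts` are hypothetical.

```python
from collections import deque


def impacted_artefacts(trace_links, changed):
    """Return every artefact reachable from `changed` via trace links.

    Assumption: a link propagates impact in both directions.
    """
    adjacency = {}
    for source, target in trace_links:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    # Breadth-first traversal starting from the changed artefact.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    seen.discard(changed)  # the changed artefact is not its own impact
    return seen


# Hypothetical trace links between heterogeneous artefacts.
links = [
    ("REQ-1", "ClassDiagram.Order"),
    ("ClassDiagram.Order", "Order.java"),
    ("Order.java", "OrderTest.java"),
    ("REQ-2", "ClassDiagram.Customer"),
]
impact = impacted_artefacts(links, "ClassDiagram.Order")
```

In this example a change to the `Order` class in the class diagram reaches the requirement, the source file and the test case linked to it, while the unrelated `REQ-2` and `Customer` artefacts are left out of the impact set.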

CHAPTER EIGHT

AUTOMATING TRACEABILITY CREATION USING MACHINE LEARNING

An integral aspect of the ACM framework is traceability creation, which lays the foundations for subsequent stages. Therefore, its automation plays a pivotal role in providing an effective solution. This chapter introduces an approach based on machine learning to automate trace link creation, which is identified as a classification problem. It then discusses data collection, and feature and model selection. Finally, the trained models are evaluated, and an assessment of the approach and the strategy used to integrate it in the framework are provided.

8.1 Introduction

Traceability creation aims to establish inter trace links between software artefacts. Since the stages of consistency management, which are discussed in Chapter 4, rely on the existence of correct and complete trace links, a mechanism for creating them is intrinsic to the ACM framework. In accordance with the high-level requirements of the framework, any traceability approach should be independent of artefacts and tools, and be as automatic as possible. Automatic link creation is also central to the adoption of the framework in real-world scenarios, where establishing links manually across a potentially large number of artefacts may not be feasible.
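The framing of trace link creation as a classification problem can be illustrated with a minimal sketch: every pair of artefacts from two sets is a candidate link, described by features, and a classifier labels it as a link or not. The sketch is only an illustration of the framing. The single feature used here (Jaccard overlap of word tokens) and the fixed threshold standing in for a trained model are hypothetical and are not the features or models selected later in this chapter.

```python
import re
from itertools import product


def tokens(text):
    """Lower-case alphabetic word tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(left, right):
    """Token overlap between two texts, in the range [0, 1]."""
    left_tokens, right_tokens = tokens(left), tokens(right)
    if not left_tokens or not right_tokens:
        return 0.0
    return len(left_tokens & right_tokens) / len(left_tokens | right_tokens)


def candidate_pairs(sources, targets):
    """Build one (source id, target id, feature) row per artefact pair."""
    return [
        (source_id, target_id, jaccard(source_text, target_text))
        for (source_id, source_text), (target_id, target_text)
        in product(sources.items(), targets.items())
    ]


def predict_links(rows, threshold=0.2):
    """Stand-in for a trained binary classifier: link if feature >= threshold."""
    return [(s, t) for s, t, score in rows if score >= threshold]


# Hypothetical artefact contents.
requirements = {"R1": "The customer places an order"}
classes = {
    "Order": "order customer items",
    "Logger": "log message level",
}
rows = candidate_pairs(requirements, classes)
predicted = predict_links(rows)
```

Because the classifier only sees feature rows, the same pipeline applies to any pair of artefact types for which features can be computed, which is what makes the framing compatible with the artefact and tool independence requirements.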

Automating trace link creation is a well-established research problem and various techniques have been proposed to develop more intelligent algorithms to automatically identify links or to complement and improve the accuracy of existing solutions. As described in Chapter 3, these can be categorised in different ways including information retrieval [70], heuristic [100], data mining [109], ontology [190], and rule-based [112] techniques. Despite the number of approaches, providing a solution to accurately and automatically establish trace links among a set of heterogeneous representations remains an open problem. The aim of the approach discussed in this chapter is to provide a machine learning based semi-automated solution to create inter trace links and to cater for diverse artefacts. Prior to discussing the specifics of the approach, basic concepts of machine learning are introduced.