Evaluation Design - Towards a holistic framework for software artefact consistency management

The design of the evaluation consists of two steps. Firstly, the most suitable methodology to investigate each evaluation question was selected. Secondly, the data collection strategy was planned and specific systems were selected.

9.3.1 Research Method Selection

9.3.1.1 Evaluation of Hypotheses

As described by Sjøberg et al., empirical research "seeks to explore, describe, predict, and explain natural, social, or cognitive phenomena by using evidence based on observation or experience" [217]. Empirical software engineering research provides an extensive toolset to achieve these goals. After careful consideration of the evaluation questions and of alternatives such as controlled experiments and surveys [218], the case study method was selected.

Case studies are widely used in software engineering and can be defined as a method "aimed at investigating contemporary phenomena in their context" [219]. The primary motivation for using a case study approach is to demonstrate that the hypotheses hold in real project scenarios, which is a significant consideration for a software engineering solution. Additionally, a case study can provide a deeper understanding of the problem and reveal potential shortcomings of the proposed approach.

Criteria for Success

A pivotal aspect of the design of the case study is the specification of success and failure criteria. Success criteria were established per evaluation question.

Q1 - Methodology and tool independence

Success: The consistency management process does not require any change to the methodology used to create the original artefacts. Additionally, users are not required to use additional tools to create and edit artefacts. The outcome is a true or false statement that accepts or rejects the corresponding hypothesis.

Q2 - Level of automation

Success: One or more aspects of artefact consistency management can be carried out without manual intervention. The outcome of the investigation is a list of automatic, semi-automatic and manual steps.

Q3 - Artefact independence

Success: The framework can be extended to handle any software artefact. The outcome is a true or false statement that accepts or rejects the corresponding hypothesis.

9.3.1.2 Correctness testing

Framework correctness is evaluated through software engineering validation and verification. Specifically, each functional area is tested, and its expected outputs are compared with the actual outputs. Both individual functional units (unit testing) and groups of functional units (integration testing) are assessed. The high-level functionality areas, their inputs and their expected outputs are summarised in Table 9.2. The tests can be found in the ACM framework's GitHub repository¹. Test packages are named framework.X.Tests, where X stands for the framework component or functionality area under test.
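The following Python sketch illustrates how expected outputs can be compared with actual outputs, using a Traceability Creation result as the example. It is not taken from the framework's own test packages. The links-file layout is a hypothetical assumption: a <links> root containing <link> elements with source and target attributes. The node ids are also hypothetical. The text itself only states that the file holds pairs of ids of connected GraphML nodes.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical links-file layout <links><link source="..." target="..."/></links>.
# The text only states that the file holds pairs of ids of connected GraphML nodes.


def parse_trace_links(xml_text: str) -> set[tuple[str, str]]:
    """Return the (source, target) GraphML node id pairs found in a links document."""
    root = ET.fromstring(xml_text.strip())
    return {(link.get("source"), link.get("target")) for link in root.iter("link")}


def traceability_test_passes(actual_xml: str, expected_pairs: set[tuple[str, str]]) -> bool:
    """Pass only if the actual output contains exactly the expected trace links."""
    return parse_trace_links(actual_xml) == expected_pairs


# Hypothetical node ids, chosen for illustration only.
actual_output = """
<links>
  <link source="req_R1" target="class_Service"/>
  <link source="class_Service" target="uml_Service"/>
</links>
"""
expected_links = {("req_R1", "class_Service"), ("class_Service", "uml_Service")}
print(traceability_test_passes(actual_output, expected_links))  # True
```

Because the comparison uses sets, the order in which links appear in the file does not affect the verdict. Both missing and extra links make the test fail.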

Functionality | Inputs | Outputs
Artefact Extraction | Original artefacts in their original format | XML representation of original artefacts
Artefact Transformation | XML representation of original artefacts, XSLT stylesheets | GraphML representation of artefacts
Traceability Creation | GraphML representation of artefacts converted to feature data for classification | XML representation of trace links, containing pairs of ids of connected GraphML nodes
Data Storage | GraphML representation of artefacts, XML representation of trace links | Graph database populated with data consisting of nodes and edges
Change Detection | A change in an external repository | ChangeData and updated graph database
Change Impact Analysis | ChangeData | Set of potentially impacted artefact elements expressed as graph nodes
Consistency Checking | ChangeData, set of potentially impacted artefact elements | Artefact element is consistent, potentially inconsistent, or inconsistent
Change Propagation | ChangeData, list of inconsistent elements | Update suggestions for each element to re-establish consistency
Configuration | User input (database location, external repository location, XSLT file path and framework root folder) | The framework configuration file is populated with values specified by the user

Table 9.2: Functionality areas, inputs and expected outputs.

Criteria for Success

Besides identifying the expected outputs of each functionality area, the following success criteria were defined to determine whether a given test passes or fails.

• Artefact Extraction. The XML representation of the original artefacts is saved to the specified framework folder as .xml files.

• Artefact Transformation. The GraphML representation of the XML inputs is saved to the specified framework folder as .graphml files. The GraphML representation captures the required artefact data: artefact elements and their intra trace links.

• Traceability Creation. An XML links file is produced that contains the correct trace links between the artefact elements obtained from the GraphML representation.

• Data Storage. The graph database is populated with data obtained from the .graphml and .xml links files.

• Change Detection.

A) File-level changes in the external repository are extracted and identified.
B) Artefact element-level changes, if any, are identified.
C) Local representations in the framework folder are updated.
D) Graph database nodes, properties and specific edges are updated.

• Impact Analysis. The framework returns a set of potentially impacted elements based on graph traversals. This includes the graph nodes directly connected to the changed node through inter and intra links.

• Consistency Checking. Based on consistency checking rules, the framework returns one of the following results: consistent, inconsistent or potentially inconsistent.

• Change Propagation. The framework suggests resolutions to each identified (potential) inconsistency.
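The Impact Analysis criterion can be pictured as a one-hop traversal from the changed node. The Python sketch below uses an in-memory edge list as a stand-in for the graph database and treats links as undirected. The edge representation, the "inter" and "intra" string labels, and the node ids are illustrative assumptions. The framework's actual traversal may go beyond direct neighbours.

```python
from collections import defaultdict

# ASSUMPTION: a plain edge list stands in for the graph database.
# Each edge is (source_id, target_id, link_kind), link_kind being "inter" or "intra".
Edge = tuple[str, str, str]


def potentially_impacted(changed: str, edges: list[Edge]) -> set[str]:
    """Return nodes directly connected to `changed` via inter or intra links, in either direction."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for source, target, kind in edges:
        if kind in ("inter", "intra"):
            neighbours[source].add(target)
            neighbours[target].add(source)
    return neighbours[changed] - {changed}


# Hypothetical ids: a class, one of its methods, its UML counterpart and a requirement.
edges = [
    ("class_Service", "method_Service.start", "intra"),
    ("uml_Service", "class_Service", "inter"),
    ("req_R1", "uml_Service", "inter"),
]
print(sorted(potentially_impacted("class_Service", edges)))
# ['method_Service.start', 'uml_Service']  (req_R1 is two hops away)
```

Following links in both directions reflects the idea that a change can affect elements that trace to the changed node as well as elements the changed node traces to.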

9.3.1.3 Performance Evaluation

Performance measurements reveal how the framework performs when subjected to specific workloads [220], in terms of the number of artefacts, the size of artefacts and the number of changes. This evaluation question (Q4) can be analysed through a case study approach and specific metrics. For this evaluation, execution time (s) was selected as the metric. The objective of scenarios 1 and 2 is to reveal the correlation between execution times and the number of artefacts. Scenario 1 tests the steps involved in Framework Setup, while scenario 2 measures the execution times of the Consistency Management steps. Scenarios 3 and 4 investigate the correlation between artefact size and execution times. Finally, scenarios 5 and 6 measure the performance of the change identification algorithm and of change detection, respectively.
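Execution times in seconds can be collected with a small harness such as the Python sketch below. It runs a stage several times and reports the mean, minimum and maximum. The text does not state how many runs were performed per scenario, so the repetition count, the choice of summary statistics and the placeholder workload are all assumptions.

```python
import statistics
import time
from typing import Callable


def measure_execution_time(step: Callable[[], object], repetitions: int = 5) -> dict[str, float]:
    """Run `step` repeatedly and summarise its wall-clock execution time in seconds."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        step()
        samples.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(samples), "min_s": min(samples), "max_s": max(samples)}


# Placeholder workload; a real run would call a framework stage instead,
# e.g. a hypothetical framework_setup(system) for Scenario 1.
summary = measure_execution_time(lambda: sum(range(100_000)), repetitions=3)
print(summary)
```

The harness uses time.perf_counter because it is intended for measuring short durations, whereas time.time reports wall-clock calendar time. Reporting the minimum next to the mean helps separate the cost of the stage itself from noise caused by other processes.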

Scenario 1. Measure the execution times of Framework Setup with a system consisting of the smallest number of artefacts (MazeSolver), a system consisting of the largest number of artefacts (MyRobotLab), and a system in between the two (JBBP).

Scenario 2. Measure the execution times of Consistency Management with a system consisting of the smallest number of artefacts (MazeSolver) and a system consisting of the largest number of artefacts (MyRobotLab).

Scenario 3. Measure the execution times of Framework Setup with the artefacts containing the largest and the smallest number of nodes, as well as one artefact in between. The inputs include:

• GraphML file representing the Service Java class of the MyRobotLab system, which contains 171 nodes. This artefact contains the highest number of nodes of all artefacts used in this evaluation.

• GraphML file representing the Owner interface from the Neo4j system. This interface represents the other end of the spectrum with 2 nodes, the lowest number in the data set.

• GraphML file representing the JBBPToken class from the JBBP system. The number of nodes in this class (37) falls between those of the Service class and the Owner interface.
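In these scenarios, artefact size is expressed as the number of GraphML nodes. A count of this kind can be reproduced with Python's standard ElementTree module, as in the sketch below. The sketch makes two assumptions about the framework's GraphML schema: that files use the standard GraphML namespace, and that each <node> element corresponds to one artefact element. The two-node sample is a toy graph, not the actual Owner artefact.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def count_graphml_nodes(graphml_text: str) -> int:
    """Count <node> elements anywhere in a GraphML document (including nested graphs)."""
    root = ET.fromstring(graphml_text.strip())
    return sum(1 for _ in root.iter(f"{GRAPHML_NS}node"))


# Toy two-node graph; ids are illustrative and not taken from any evaluated system.
sample = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"/>
    <node id="n1"/>
    <edge source="n0" target="n1"/>
  </graph>
</graphml>
"""
print(count_graphml_nodes(sample))  # 2
```

For files on disk, ET.parse(path).getroot() returns the same root element, which can then be passed to root.iter in the same way.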

Scenario 4. Measure the execution times of Consistency Management with the artefacts containing the largest and smallest number of nodes (Service class, Owner interface).

Scenario 5. Measure the execution times of the change identification algorithm, which is part of the Change Detection framework stage. This test measures the performance of the algorithm with the largest and smallest number of nodes and all artefact element-level change types. The inputs of this test scenario are the same as those described in test scenarios 3 and 4.

Scenario 6. Measure the execution times of Change Detection with different change types. This test measures the impact of change types on the performance of change detection.
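The change identification algorithm itself is not described in this section. For reference only, the Python sketch below shows a naive element-level identification step that compares two versions of an artefact keyed by element id. The three change types ("added", "deleted", "modified"), the property dictionaries and the element ids are assumptions. A keyed comparison like this cannot recognise renamed elements, which a real algorithm may need to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementChange:
    element_id: str
    change_type: str  # ASSUMED change types: "added", "deleted" or "modified"


def identify_changes(old: dict[str, dict[str, str]],
                     new: dict[str, dict[str, str]]) -> list[ElementChange]:
    """Compare two versions of an artefact, keyed by element id, and list element-level changes."""
    changes = [ElementChange(e, "deleted") for e in sorted(old.keys() - new.keys())]
    changes += [ElementChange(e, "added") for e in sorted(new.keys() - old.keys())]
    changes += [ElementChange(e, "modified")
                for e in sorted(old.keys() & new.keys()) if old[e] != new[e]]
    return changes


# Hypothetical element ids and properties.
old_version = {"Service.start": {"visibility": "public"}, "Service.stop": {"visibility": "public"}}
new_version = {"Service.start": {"visibility": "private"}, "Service.restart": {"visibility": "public"}}
for change in identify_changes(old_version, new_version):
    print(change.change_type, change.element_id)
# deleted Service.stop
# added Service.restart
# modified Service.start
```

In this sketch, the cost is dominated by set operations and sorting, so it grows roughly as n log n in the number of elements. Contrasting the largest and smallest artefacts, as Scenario 5 does, is one way to observe such growth empirically.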

9.3.2 Data Collection

9.3.2.1 Selecting a Data Collection Technique

The selection of the most suitable data collection method was driven by the evaluation objectives and questions. An additional factor was the volume of data required for carrying out the evaluation. The technique that best fulfils these requirements was chosen from a number of data collection methods for software field studies. For the purposes of this evaluation, second-degree techniques were considered, which are characterised by the indirect involvement of software engineers. Such techniques include Static and Dynamic Analysis of a System and Documentation Analysis. In this work, the Analysis of Electronic Databases of Work Performed technique was selected, which took the form of extracting artefact data from online version control systems [221]. The aim of obtaining data from existing open source software development projects hosted in online repositories was to allow the solution to be evaluated in realistic project scenarios. This technique is also used extensively in research on mining software repositories [124]. Comments in code were not considered, as their investigation is outside the scope of this thesis.

9.3.2.2 Selecting Particular Open Source Systems

The next step was the identification of subject systems. The selection criteria are described in Chapter 8; for convenience, a brief summary is provided here. Principally, candidate systems are required to provide a wide range of artefacts so that artefact independence can be assessed; this requirement stems from question Q3. Another aspect is system size. The evaluation of question Q4 requires different system sizes to model different levels of complexity in terms of the number of artefacts. The challenges encountered and the specifics of the selected systems are also described in Chapter 8. The particular systems used in each step of the evaluation process are specified in the corresponding step.

9.3.2.3 Change Selection

An integral part of evaluating the framework was introducing changes to artefacts. For this purpose, existing changesets from the selected repositories were used. These changesets provide changes of varying size and complexity. The main motivation for selecting existing changesets was to capture realistic project scenarios. Since changes in open source repositories are largely confined to source code, custom modifications to other representations, such as UML class diagrams, were also introduced.
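Reusing an existing changeset starts with listing the files it touches. The Python sketch below runs git diff-tree and parses its --name-status output. The thesis does not state that changesets were extracted this way, and the function names and sample paths are hypothetical. The sketch assumes ordinary, non-merge commits; the first commit of a repository needs the additional --root option.

```python
import subprocess


def parse_name_status(output: str) -> list[tuple[str, str]]:
    """Turn `git diff-tree --name-status` lines into (status letter, path) pairs."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][0]  # drop similarity scores such as R087 -> R
        changes.append((status, fields[-1]))  # renames/copies: keep the new path
    return changes


def changeset_files(repo_dir: str, commit: str) -> list[tuple[str, str]]:
    """File-level changes introduced by `commit` relative to its parent (non-merge commits)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff-tree", "--no-commit-id", "--name-status", "-r", commit],
        capture_output=True, text=True, check=True,
    )
    return parse_name_status(result.stdout)


sample = "M\tsrc/Service.java\nA\tdocs/Service.uml\nR087\told/Owner.java\tnew/Owner.java\n"
print(parse_name_status(sample))
# [('M', 'src/Service.java'), ('A', 'docs/Service.uml'), ('R', 'new/Owner.java')]
```

Rename entries such as the R087 line above appear only when rename detection is requested, for example by adding -M. Paths containing unusual characters are quoted by git unless -z is used, and this sketch does not handle that case.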

9.3.2.4 Artefacts Obtained from Open Source Systems

Only a subset of artefacts was obtained from each system. This is due to the challenge of establishing inter trace links in larger systems, where relationships between entities may be complex. The task therefore requires domain and expert knowledge of the given system to ensure that the property graph representation of the system is accurate. Hence the problem was constrained to a subset of artefacts. The number of artefacts obtained from each system is shown in Table 9.3. The types of artefacts and the methodology for establishing trace links between them are discussed in detail in Chapter 8.
