Spotting the structures in the package hierarchy that required attention

using test coverage data 101

7.3 Spotting the structures in the package hierarchy that required attention using test coverage data

“Are My Unit Tests in the Right Package?”

[21] Gergő Balogh et al. “Are My Unit Tests in the Right Package?” In: Source Code Analysis and Manipulation (SCAM), 2016 IEEE 16th International Working Conference on. IEEE. 2016, pp. 137–146

The software development industry has adopted written and de facto standards for creating effective and maintainable unit tests. Unfortunately, unit tests, like any other source code artifact, are often written without conforming to these guidelines, or they evolve into such a state over time (challenge 2).

This work addressed the quality of unit test suites from a novel angle. Our approach was to compare the physical organization of tests and tested code in the package hierarchy to the structure that can be observed from the dynamic behaviour of the tests. Applying community detection algorithms to derive the latter is a viable approach, and we believe that this kind of analysis may reveal knowledge about unit tests that has not been investigated before (sub-thesis point 3.1). Our results indicate that for realistic systems there are quite a lot of discrepancies between the package-based and community-based structures. However, this does not necessarily mean that each of them needs to be fixed by refactoring the test code. Furthermore, it is generally not possible to decide whether the problem lies in the placement of the test cases in the package structure or in the way the test cases invoke elements of the tested code. Hence, any discrepancies found are treated as “bad smells” (sub-thesis point 3.2).
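The comparison step can be illustrated with a minimal sketch. It assumes that community detection has already been run on the coverage data and that both structures are given as mappings from each element (test or code unit) to its package or community id. The function names and the toy data are hypothetical and are not taken from the paper's implementation; the sketch only shows how discrepancies between the two partitions can be enumerated.

```python
from collections import defaultdict


def group_members(partition):
    """Invert an element -> group mapping into group -> set of elements."""
    groups = defaultdict(set)
    for element, group in partition.items():
        groups[group].add(element)
    return dict(groups)


def partition_discrepancies(packages, communities):
    """Compare a package-based and a community-based partition of the
    same elements (tests and code units).

    Returns (spanning, split): communities whose members lie in more than
    one package, and packages whose members fall into more than one
    community.
    """
    spanning = {}
    for community, members in group_members(communities).items():
        pkgs = {packages[e] for e in members}
        if len(pkgs) > 1:
            spanning[community] = pkgs
    split = {}
    for package, members in group_members(packages).items():
        comms = {communities[e] for e in members}
        if len(comms) > 1:
            split[package] = comms
    return spanning, split


# Hypothetical toy data: Baz is placed in package "b" but is exercised
# together with Foo and FooTest, so it ends up in community 0.
packages = {"FooTest": "a", "Foo": "a", "BarTest": "b", "Bar": "b", "Baz": "b"}
communities = {"FooTest": 0, "Foo": 0, "Baz": 0, "BarTest": 1, "Bar": 1}
spanning, split = partition_discrepancies(packages, communities)
print(spanning, split)
```

In this example, community 0 spans packages a and b, and package b is split between communities 0 and 1. Both findings are only candidate "bad smells": the sketch cannot tell whether Baz is misplaced or whether FooTest simply reaches too far into the code.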

“Analysis of Static and Dynamic Test-to-code Traceability Information”

[44] Tamás Gergely et al. “Analysis of Static and Dynamic Test-to-code Traceability Information”. In: Acta Cybernetica 23.3 (2018), pp. 903–919

In this study, we carried out an analysis of test-to-code traceability information. Unit test development has some widely accepted rules that support, for example, the maintenance of test suites. Some of these rules concern the structural attributes of the tests, which can be described by traceability relations between the tests and the code. Previous studies demonstrated that fully automatic test-to-code traceability recovery is difficult, if not impossible, in the general case. Several fundamental approaches have been proposed for this task, based on, among other things, static code analysis, call graphs, dynamic dependency analysis, name analysis, change history, and even questionnaires. However, researchers seem to generally agree that no single method can provide accurate information about test-to-code relations (challenge 4).

Following this line of thinking, we developed a method that is able to detect Structural Unit Test Smells, i.e., locations in the code where unit test development rules are violated. This method foreshadows the unified comparison methodology related to sub-thesis point 3.3. In particular, we compute test-to-code traceability using two relatively straightforward automatic approaches: one based on the static physical code structure and the other on the dynamic behavior of test cases in terms of code coverage. Both can be viewed as objective descriptions of the relationship between unit tests and code units, but from different perspectives; hence, each location where they disagree about traceability can be treated as a Structural Unit Test Smell. We use clustering to form mutually traceable groups of elements instead of relying on atomic traceability information, which makes the method more robust, because minor inconsistencies are unlikely to influence the overall results.
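The idea of deriving two clusterings and treating their disagreements as smells can be sketched as follows. This is a simplified illustration under stated assumptions, not the method's actual clustering: static clusters are taken to be the packages, and dynamic clusters are approximated by the connected components of the coverage graph (computed with union-find), whereas the published approach may use a different clustering algorithm. All names are hypothetical, and only elements that appear in the coverage data are considered.

```python
def dynamic_clusters(coverage):
    """Connected components of the coverage graph, where each test is
    linked to every code unit it executes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for test, units in coverage.items():
        find(test)  # register tests that cover nothing
        for unit in units:
            parent[find(test)] = find(unit)

    clusters = {}
    for element in list(parent):
        clusters.setdefault(find(element), set()).add(element)
    return [frozenset(c) for c in clusters.values()]


def static_clusters(package_of, elements):
    """Group the given elements by the package they are placed in."""
    clusters = {}
    for element in elements:
        clusters.setdefault(package_of[element], set()).add(element)
    return [frozenset(c) for c in clusters.values()]


def structural_smells(package_of, coverage):
    """Dynamic clusters that do not coincide with any static cluster."""
    dynamic = dynamic_clusters(coverage)
    covered = set().union(*dynamic)
    static = set(static_clusters(package_of, covered))
    return [c for c in dynamic if c not in static]


# Hypothetical data: BarTest also executes Util from another package.
package_of = {"FooTest": "a", "Foo": "a", "BarTest": "b", "Bar": "b", "Util": "c"}
coverage = {"FooTest": {"Foo"}, "BarTest": {"Bar", "Util"}}
print(structural_smells(package_of, coverage))
```

Here the only reported cluster is {BarTest, Bar, Util}, because coverage ties Util to package b while the package structure places it in c. Comparing whole clusters rather than single links is what makes a single stray call less likely to dominate the result.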

Here, we investigated the results of this method applied to four subject programs. Our goal was to manually check the reported Structural Unit Test Smells to see whether at least some of them are real problems that need to be examined. Our experience indicates that most of the reported Structural Unit Test Smells point to parts of the test and code that could be reorganized to better follow unit test guidelines. However, in some situations it might not be worth modifying the tests and the code (e.g. for technical reasons). Overall, we found several typical reasons for the smells that could form the basis of future studies and might lead to an automatic classification of Structural Unit Test Smells.

These findings have several implications. First, the method has the potential to find Structural Unit Test Smells, but the results will probably contain a large number of false positives (sub-thesis point 3.2). To filter them out, each reported situation has to be investigated. Fortunately, similar situations seem to recur, which can provide a basis for the automatic classification of the identified smells and may assist developers in their refactoring activities. However, our manual analysis also made it clear that automatic classification requires additional knowledge (i.e. relying only on the currently used static and dynamic data is not enough). Furthermore, we found several intricate Structural Unit Test Smell patterns in the CSGs for which we could not make informed refactoring suggestions because of their complexity and size.

“Differences between a static and a dynamic test-to-code traceability recovery method”

[45] Tamás Gergely et al. “Differences between a static and a dynamic test-to-code traceability recovery method”. In: Software Quality Journal (2018), pp. 1–26

Recovering test-to-code traceability links may be required in virtually every phase of development. This task might seem simple for unit tests thanks to two fundamental unit testing guidelines: isolation (unit tests should exercise only a single unit) and separation (they should be placed next to this unit). However, practice shows that recovery may
be challenging because the guidelines typically cannot be fully followed. Furthermore, previous works have already demonstrated that fully automatic test-to-code traceability recovery for unit tests is virtually impossible in the general case (challenge 4).
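The two guidelines can be made concrete with a small check. This sketch assumes that a "unit" is a class, that the set of units a test executes is available from coverage data, and that each unit's package is known; the function name and data are hypothetical. It flags a test as violating isolation when it does not exercise exactly one unit, and as violating separation when an exercised unit lives outside the test's package.

```python
def guideline_violations(test_package, covered_units, package_of):
    """Check a unit test against the isolation and separation guidelines.

    test_package:  package the test class is placed in
    covered_units: code units (here: classes) the test executes
    package_of:    maps each code unit to its package
    """
    violations = []
    # Isolation: the test should exercise exactly one unit
    # (a test covering no unit at all is also reported).
    if len(covered_units) != 1:
        violations.append("isolation")
    # Separation: the exercised unit(s) should be placed next to the test.
    if any(package_of[unit] != test_package for unit in covered_units):
        violations.append("separation")
    return violations


package_of = {"Foo": "a", "Bar": "b"}
print(guideline_violations("a", {"Foo"}, package_of))         # []
print(guideline_violations("a", {"Foo", "Bar"}, package_of))  # ['isolation', 'separation']
print(guideline_violations("b", {"Foo"}, package_of))         # ['separation']
```

In real systems, many tests legitimately execute helper or library classes, so a literal check like this one reports far more violations than are actually problematic; this is one reason why the guidelines alone do not make traceability recovery simple.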

In this work, we proposed a semi-automatic method for this task. It computes traceability links using static and dynamic approaches, compares their results, and presents the discrepancies to the user, who determines the final traceability links based on the differences and contextual information (sub-thesis point 3.3). We defined a set of discrepancy patterns that can help the user in this task (sub-thesis point 3.2). Additional outcomes of analyzing the discrepancies were structural unit testing issues and related refactoring suggestions. For static test-to-code traceability, we relied on the physical code structure, while for dynamic traceability, we used code coverage information. In both cases, we computed combined test and code clusters, which represent sets of mutually traceable elements. We also presented an empirical study of the method involving eight non-trivial open source Java systems.
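The "compute both, compare, present the differences" workflow can be sketched at the level of individual links. Note that this is a deliberate simplification: the method described above compares clusters of mutually traceable elements, not single links, and the user's final decision is represented here only by returning the discrepancies for review. The function names and the data are hypothetical.

```python
def links_from_packages(tests, units, package_of):
    """Static links: a test traces to every code unit in its own package."""
    return {(t, u) for t in tests for u in units if package_of[t] == package_of[u]}


def links_from_coverage(coverage):
    """Dynamic links: a test traces to every code unit it executes."""
    return {(t, u) for t, covered in coverage.items() for u in covered}


def compare_links(static, dynamic):
    """Split links into agreed ones and the discrepancies shown to the user."""
    return {
        "agreed": static & dynamic,
        "static_only": static - dynamic,   # same package, never executed
        "dynamic_only": dynamic - static,  # executed, but placed elsewhere
    }


package_of = {"FooTest": "a", "Foo": "a", "Bar": "b"}
coverage = {"FooTest": {"Foo", "Bar"}}
report = compare_links(
    links_from_packages(["FooTest"], ["Foo", "Bar"], package_of),
    links_from_coverage(coverage),
)
print(report)
```

In this example, the link from FooTest to Foo is agreed, while FooTest to Bar is reported as dynamic-only; the user would then decide, using contextual information, whether Bar is a genuine test target or just an incidentally executed dependency.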

“First Steps towards a Methodology for Unified Graph’s Discrepancy Analysis”

[14] Gergő Balogh. “First Steps towards a Methodology for Unified Graph’s Discrepancy Analysis”. submitted for review to 13th International Conference of Graph Transformation (part of STAF 2020)

During software analysis, researchers and IT experts often rely on the comparison of datasets. They also frequently draw conclusions based on the differences between two representations of the same set of items (challenge 4). For example, developers may examine the densely connected parts of method call graphs in the context of their location in the package hierarchy tree to find error-prone parts of the system. Such analyses could be aided by a generalized methodology for graphs that unifies the underlying process of discrepancy analysis. In this paper, we present a methodology for unified graph’s discrepancy analysis, named UniGDA (sub-thesis point 3.3). It is based on the previously defined domain-specific discrepancy detection technique for cluster comparison. Our generalized methodology uses different types of characteristic functions to capture the similarity structures between vertices of arbitrary graphs. We provided several domain-independent options for the free parameters of UniGDA. We also presented two possible use cases of UniGDA, the classification of structural test smells and the clustering of test-to-code traceability discrepancies, to showcase the usage of our methodology.
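The role of characteristic functions can be illustrated with a minimal sketch based on the call graph versus package tree example above. It assumes that a characteristic function maps a pair of vertices to a Boolean, using "share the same parent package" for the tree and "joined by a call edge" for the call graph, and that a discrepancy is any vertex pair on which the two functions disagree. These particular functions and all names are hypothetical illustrations; UniGDA's actual characteristic functions and its further processing of discrepancies may differ.

```python
from itertools import combinations


def same_parent(parent_of):
    """Characteristic function of a hierarchy tree: u and v share a parent."""
    return lambda u, v: parent_of[u] == parent_of[v]


def adjacent(edges):
    """Characteristic function of a graph: u and v are joined by an edge
    (edge direction is ignored)."""
    undirected = {frozenset(e) for e in edges}
    return lambda u, v: frozenset((u, v)) in undirected


def discrepancies(vertices, chi_a, chi_b):
    """Vertex pairs on which the two characteristic functions disagree."""
    return [
        (u, v)
        for u, v in combinations(sorted(vertices), 2)
        if chi_a(u, v) != chi_b(u, v)
    ]


# Hypothetical example: three methods, their packages, and one call edge.
methods = {"a.f", "a.g", "b.h"}
package_tree = same_parent({"a.f": "a", "a.g": "a", "b.h": "b"})
call_graph = adjacent([("a.f", "b.h")])
print(discrepancies(methods, package_tree, call_graph))
# [('a.f', 'a.g'), ('a.f', 'b.h')]
```

Because the comparison only sees the characteristic functions, the same discrepancy step works for any pair of graphs over a shared vertex set; swapping in a different function (for example, membership in the same cluster) changes what "similar" means without changing the comparison itself.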

Appendix A

Measuring, Predicting, and Comparing Productivity of Developer Teams

A.1 General Notions and Definitions

In this chapter, we use the following notions and definitions.

Definition A.1.1. For a given software system, we define $R = \langle r_0, \ldots, r_n \rangle$ to be the ordered set of revisions of the source code.

During the experiment, the various modifications were collected to capture the effort spent by developers.

Definition A.1.2. A modification $m$ is any difference between two revisions, $m \in \mathrm{diff}(r_i, r_j)$, where $i < j$. Each modification is assigned a type from a predefined set $T$, based on the affected source-code element and, if any, its affected property: $t(m) \in T$.

Definition A.1.3. $\delta_t(r_i, r_j) \in \mathbb{N}$ is the number of modifications of type $t$ between the revisions $r_i$ and $r_j$. In other words, $\delta_t(r_i, r_j) = |M|$, where $M = \{\, m \in \mathrm{diff}(r_i, r_j) \mid t(m) = t \,\}$. Furthermore, $\Delta(r_i, r_j) \in \mathbb{N}^{|T|}$ is a vector of natural numbers containing the counts of all predefined modification types between the revisions $r_i$ and $r_j$.
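The counts in Definition A.1.3 can be computed directly once the modifications are available. This sketch assumes that $\mathrm{diff}(r_i, r_j)$ has already been extracted as a list of (element, type) pairs, so that $t(m)$ is simply the second component; computing the diff itself is out of scope here, and the type names are hypothetical. The result is $\Delta(r_i, r_j)$ as a list with one entry $\delta_t$ per type in a fixed order of $T$.

```python
from collections import Counter


def delta(modifications, types):
    """Return Delta(r_i, r_j) as a list with one count delta_t per type in T.

    modifications: diff(r_i, r_j) as (element, type) pairs, i.e. t(m) is
                   the second component of each pair
    types:         the predefined modification types T, in a fixed order
    """
    counts = Counter(m_type for _, m_type in modifications)
    return [counts[t] for t in types]  # Counter yields 0 for absent types


T = ["method_added", "method_removed", "field_renamed"]  # hypothetical types
diff = [
    ("Foo.bar", "method_added"),
    ("Foo.baz", "method_added"),
    ("Foo.x", "field_renamed"),
]
print(delta(diff, T))  # [2, 0, 1]
```

Fixing the order of $T$ is what turns the per-type counts into a vector, so that $\Delta$ values of different revision pairs can be compared component by component.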

Definition A.1.4. Furthermore, we use $\mathit{devtime}_{r_i \to r_j}$ to represent the net development time between the revisions $r_i$ and $r_j$, where $i < j$.

In document Utilizing static and dynamic software analysis to aid cost estimation, software visualization, and test quality management (Page 117-121)