Software systems nowadays tend to be large, complex, and heterogeneous in nature and face increasing pressure on delivery time and product quality. Studies estimate that up to 80% of software costs occur in the maintenance phase, of which 40% goes into understanding the software system [178]. In this context, various measurements, or software metrics, are often used to obtain objective, reproducible, and quantifiable measurements that assist developers in quality assurance testing, software debugging, and software performance optimization. As such, regardless of the application area, different measurements are scrutinized to ensure that the system in question performs optimally, is safe and reliable, or is of high quality.
Analyzing such measurements of software systems typically involves scrutinizing a large amount of analysis data, a process that converts analysis data into measurement data through the use of software metrics. Most mainstream analysis tools specify and examine these software metrics through a relational database approach. In this approach, an analyzable representation of the source code is generated, stored in a relational database, and queried through SQL requests to create measurement data (e.g., quality or maintainability metrics) [72].
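To make the relational approach concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. The two-table schema (`classes`, `methods`) and the two metrics (NOM, number of methods; LOC, lines of code) are illustrative assumptions, not the schema of any particular analysis tool. The schema deliberately stores no method bodies or expressions, so any metric that depends on such details cannot be computed from it.

```python
import sqlite3

# Hypothetical, heavily simplified schema; table and column names are
# illustrative only. Note that no method bodies or expressions are stored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE methods (
    id INTEGER PRIMARY KEY,
    class_id INTEGER NOT NULL REFERENCES classes(id),
    name TEXT NOT NULL,
    loc INTEGER NOT NULL  -- lines of code of the method
);
""")
conn.executemany("INSERT INTO classes (id, name) VALUES (?, ?)",
                 [(1, "Parser"), (2, "Lexer"), (3, "Util")])
conn.executemany("INSERT INTO methods (class_id, name, loc) VALUES (?, ?, ?)",
                 [(1, "parse", 40), (1, "parseExpr", 25), (2, "nextToken", 30)])

# Measurement data: number of methods (NOM) and total LOC per class.
# The LEFT JOIN keeps classes without any methods (NOM = 0).
rows = conn.execute("""
    SELECT c.name, COUNT(m.id) AS nom, COALESCE(SUM(m.loc), 0) AS loc
    FROM classes c LEFT JOIN methods m ON m.class_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

for name, nom, loc in rows:
    print(f"{name}: NOM={nom}, LOC={loc}")
```

Even in this toy setting, a new metric means writing a new SQL query against a schema the user must know, and a metric that needs method bodies would first require a schema change.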
However, for large software systems, the analysis data becomes so enormous that existing approaches trade the amount of source code detail stored for performance. This typically results in an incomplete representation of the source code in which lower-level details (e.g., method bodies and expressions) are omitted. Although this strategy facilitates scalability, it often limits the range of supported measurements. In addition to being large, software systems also undergo continuous change in order to adapt to new technology, meet new requirements, and repair errors. These accumulating changes imply that software analysis tools have to repeatedly adjust their data models, restructure their relational database schemas, and reformulate complex SQL requests. Further, in order to obtain measurement data, end-users (e.g., quality experts and project managers) typically formulate queries on software metrics against the underlying database. This procedure requires not only in-depth knowledge of the database schema but also expertise in the SQL query language.
In order to address the above-mentioned concerns, we propose techniques and methods for the interactive visual analysis of software measurement data. To resolve scalability issues, we advocate a NoSQL approach based on a graph database. Using this methodology, we can perform complete analyses of large software systems and still maintain reasonable performance while creating measurement data. Furthermore, as data volumes approach "big data" scale, scalability can be maintained by distributing the underlying graph database across a multi-machine cluster. Additionally, to address changing requirements, a graph database can be restructured far more easily owing to its index-free adjacency and schema-free data access [75]. In terms of the former, additional information, such as the evolution of the software or the amount of effort needed to change software components, may be added via links to new nodes. In terms of the latter, the underlying graph model may be changed at run-time to store intermediate results without corrupting the analysis data model. Finally, in terms of querying, rather than requiring users to write textual queries, whether in SQL for relational databases or in graph query languages such as Gremlin or Cypher [179], we provide an interactive visual workflow modeling approach. Our approach makes it easier for non-experts to adjust existing metrics or define new metrics without prior knowledge of the underlying database schema or the required database query language. This relieves end-users of that burden and instead empowers them with a means to visually specify their queries.
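The two graph-database properties mentioned above can be illustrated with a small in-memory Python sketch. It is not a graph database and does not use Gremlin or Cypher; the node labels (`Package`, `Class`, `Method`, `Commit`), edge types (`CONTAINS`, `CHANGED_IN`), property names, and the `total_loc` helper are all hypothetical. Each node keeps direct references to its neighbours (index-free adjacency), and new node labels, edge types, and properties can be attached at run time without a schema migration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hashing: each node is a distinct vertex
class Node:
    label: str
    props: dict = field(default_factory=dict)
    # Index-free adjacency: each node holds direct references to its neighbours,
    # grouped by edge type, so traversal follows these references instead of
    # looking up a global index or joining tables.
    out: dict = field(default_factory=dict)

    def link(self, edge_type: str, target: "Node") -> None:
        self.out.setdefault(edge_type, []).append(target)

    def neighbours(self, edge_type: str) -> list:
        return self.out.get(edge_type, [])


def total_loc(node: Node) -> int:
    """Sum the 'loc' property over the node and everything reachable via CONTAINS."""
    return node.props.get("loc", 0) + sum(
        total_loc(child) for child in node.neighbours("CONTAINS")
    )


# A tiny, hypothetical code graph: package -> classes -> methods.
pkg = Node("Package", {"name": "parser"})
parser_cls = Node("Class", {"name": "Parser"})
lexer_cls = Node("Class", {"name": "Lexer"})
pkg.link("CONTAINS", parser_cls)
pkg.link("CONTAINS", lexer_cls)
parser_cls.link("CONTAINS", Node("Method", {"name": "parse", "loc": 40}))
parser_cls.link("CONTAINS", Node("Method", {"name": "parseExpr", "loc": 25}))
lexer_cls.link("CONTAINS", Node("Method", {"name": "nextToken", "loc": 30}))

# Schema-free extension at run time: a new node label and edge type for change
# history are attached without migrating anything; traversals that follow only
# CONTAINS edges are unaffected.
parser_cls.link("CHANGED_IN", Node("Commit", {"effort_hours": 3}))

# Intermediate results can be stored as ad hoc properties.
parser_cls.props["loc_total"] = total_loc(parser_cls)

print(total_loc(pkg))  # 95
```

A real graph database adds persistence, transactions, and a query language on top, but the traversal pattern, following stored references from node to node, is the same idea.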
Irrespective of the database approach employed, the ensuing measurement data consists of metrics (numbers and sets of numbers) and is typically aggregated through statistical techniques such as mean, variance, and standard deviation. However, numerical data by itself may not convey information in a comprehensible fashion. For example, it may be more appropriate to depict the distribution of a given metric across the software hierarchy graphically. In this regard, we aim to provide the user with a wide range of options that include traditional methods as well as interactive software visualizations, thus providing a combined computational analysis and visualization approach.
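As a small illustration of the statistical aggregation described above, the sketch below uses Python's standard `statistics` module on metric values invented for illustration; the package and class names are hypothetical. It computes the population variance and standard deviation (`pvariance`, `pstdev`); `statistics.variance` and `statistics.stdev` would give the sample versions instead.

```python
import statistics
from collections import defaultdict

# Hypothetical measurement data: (package, class, number of methods).
measurements = [
    ("parser", "Parser", 12),
    ("parser", "Lexer", 8),
    ("util", "StringUtil", 20),
    ("util", "IOUtil", 4),
]

# System-wide aggregation of the metric.
values = [value for _, _, value in measurements]
summary = {
    "mean": statistics.mean(values),
    "variance": statistics.pvariance(values),  # population variance
    "stdev": statistics.pstdev(values),        # population standard deviation
}

# Distribution of the same metric across the package level of the hierarchy.
by_package = defaultdict(list)
for package, _cls, value in measurements:
    by_package[package].append(value)
per_package_mean = {pkg: statistics.mean(vals) for pkg, vals in by_package.items()}

print(summary)
print(per_package_mean)
```

The per-package means express, as plain numbers, the kind of hierarchical distribution that a graphical depiction can convey more directly.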
Further, when the user changes a metric, the workflow-based approach of our framework updates only the relevant measurement data and views, thereby yielding an interactive, exploratory style of analyzing software measurements. In contrast, traditional approaches require the complete measurement to be evaluated again, a process that is too slow for large software systems with a complete set of analysis data.
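The incremental update can be sketched as a pull-based dataflow with "dirty" flags, a common technique for this purpose; it is an assumption for illustration, not necessarily how our framework implements it. The `WorkflowNode` class, its methods, and the node names below are hypothetical. Changing one step marks only that step and its dependents as stale; unrelated views keep their cached values.

```python
class WorkflowNode:
    """One step of a hypothetical data workflow (source, metric, or view)."""

    def __init__(self, name, compute, inputs=()):
        self.name = name
        self.compute = compute      # function mapping input values to this step's value
        self.inputs = list(inputs)
        self.downstream = []
        self.dirty = True           # stale until first evaluated
        self.value = None
        self.evaluations = 0        # how often this step was (re)computed
        for node in self.inputs:
            node.downstream.append(self)

    def invalidate(self):
        # Mark this step and all steps depending on it as stale. A stale step's
        # dependents are already stale, so propagation can stop there.
        if self.dirty:
            return
        self.dirty = True
        for node in self.downstream:
            node.invalidate()

    def replace_compute(self, compute):
        # Called when the user edits this step, e.g. changes a metric definition.
        self.compute = compute
        self.invalidate()

    def get(self):
        # Pull-based evaluation: recompute only if stale, otherwise reuse the cache.
        if self.dirty:
            self.value = self.compute(*(node.get() for node in self.inputs))
            self.evaluations += 1
            self.dirty = False
        return self.value


# Hypothetical source data: methods per class.
counts = WorkflowNode("counts", lambda: {"Parser": 12, "Lexer": 8, "Util": 20})
# Metric: classes with more than 10 methods.
large = WorkflowNode("large", lambda c: sorted(k for k, v in c.items() if v > 10), [counts])
# Two views: one on the metric, one on the raw counts only.
view = WorkflowNode("view", lambda names: ", ".join(names), [large])
total = WorkflowNode("total", lambda c: sum(c.values()), [counts])

view.get(), total.get()

# The user changes the metric threshold; only "large" and "view" are recomputed.
large.replace_compute(lambda c: sorted(k for k, v in c.items() if v > 5))
print(view.get(), total.get())  # Lexer, Parser, Util 40
print({n.name: n.evaluations for n in (counts, large, view, total)})
# {'counts': 1, 'large': 2, 'view': 2, 'total': 1}
```

The evaluation counters show the point of the design: the source data and the unrelated `total` view are computed once, while only the edited metric and the view that depends on it are re-evaluated.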
It is for these reasons that we endorse a user-centric approach. Our proposed solution combines the specification and visualization of software measurements through data workflows. We aim to address the concerns of different stakeholders through the synergy of configurable data abstractions, metrics, and visualizations. Such analyses may target, among other aspects, the performance, safety, security, or quality of the system. Ultimately, the goal of our research is to empower end-users to apply tailored metrics and visualization metaphors to visually explore the characteristics of a software system according to their individual requirements.
In order to validate our ideas, we have developed a live-data prototype tool called VIMETRIK. Our preliminary study indicates the promise and feasibility of our approach to analyzing software measurement data. In particular, our experiment shows that even graduate students with no knowledge of the underlying database model, querying mechanism, or software analysis could use our tool to generate and analyze some basic software metrics. The participants completed these analysis tasks with a completion rate and accuracy of over 85%. We expect that if non-experts can use our tool, professional software analysts would likely benefit from its intuitive and easy-to-use means of understanding software measurement and analysis data.