3.4 Visualization of Sets
3.4.8 UpSetR-style Set Visualization in SoSeVi 3 (2017)
In the third and final version of the Social Set Visualizer, UpSetR-style set visualizations are incorporated for the purpose of Visual Analytics of Big Social Data, as detailed in Publication IV [Flesch et al. 2017] of this thesis.
The UpSetR-style set visualizations in the Social Set Visualizer are used for interactive, set-based Visual Analytics of Big Social Data along the two dimensions of time and space, as illustrated in Figure 3.14. Unlike the static visualizations produced by UpSetR [Conway et al. 2017], the Social Set Visualizer is an interactive tool in which the user can adapt the displayed data. Furthermore, it uses logarithmic scales and color-coded bar charts to achieve a high level of usability.
In contrast to the original UpSet [Lex et al. 2014], the Social Set Visualizer performs its set calculations on the server side, directly on its underlying Big Social Data corpus, and is therefore able to handle much larger volumes of data, with hundreds of thousands to millions of actors in the underlying sets. The Social Set Visualizer thus takes the original idea of UpSet well beyond the client-side datasets presented in the original UpSet paper and applies this novel set visualization technique to sets with large cardinalities. As these large-scale set calculations are computationally intensive and require a lot of working memory, the analytical processing is offloaded to a dedicated backend system. This separation enables a smooth Visual Analytics experience for users of the Social Set Visualizer. Moreover, the holistic approach of the Social Set Visualizer covers most of the Big Data Value Chain [Miller & Mork 2013] introduced earlier.
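To illustrate what such a server-side set calculation involves, the following Python sketch computes UpSet-style exclusive intersection cardinalities, i.e., the number of actors that belong to exactly one combination of base sets, in a single pass over all set memberships. The thesis does not describe the backend algorithm; the function name `exclusive_intersections` and the in-memory dictionaries are assumptions, and a production backend would more likely run an equivalent aggregation inside its database.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def exclusive_intersections(
    sets: Dict[str, Iterable[Hashable]]
) -> Dict[Tuple[str, ...], int]:
    """Count actors per exact combination of base sets (UpSet-style exclusive intersections).

    Runs in roughly linear time in the total number of set memberships, so it never
    enumerates all 2^n - 1 possible combinations of the n base sets.
    """
    # Invert the input: actor ID -> names of the base sets the actor belongs to.
    membership: Dict[Hashable, Set[str]] = {}
    for name, actors in sets.items():
        for actor in actors:
            membership.setdefault(actor, set()).add(name)

    # Each actor contributes to exactly one combination: its sorted set signature.
    counts = Counter(tuple(sorted(names)) for names in membership.values())
    return dict(counts)
```

For three toy sets A = {1, 2, 3, 4}, B = {3, 4, 5}, and C = {4, 5, 6, 7}, the sketch yields 2 actors only in A, 1 in A and B, 1 in A, B, and C, 1 in B and C, and 2 only in C. The counts sum to 7, the size of the union, because every actor falls into exactly one combination.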
Most importantly, the visualization is adapted to handling and displaying large differences in set cardinalities by using logarithmic scales on both axes. Therefore, both small and large sets and set intersections remain clearly identifiable and discernible. Even though bar lengths on a logarithmic scale are no longer proportional to cardinalities, so that the visualization is technically not area-proportional, this design decision benefits the overall usability of the Visual Analytics software tool in a research context.
Figure 3.14: UpSetR-style Set Visualization in SoSeVi 3 (Publication IV [Flesch
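The effect of this decision can be sketched in Python. The sketch assumes a bar chart whose longest bar is 300 pixels and a log10(1 + n) mapping, which keeps an empty intersection at zero length; the thesis specifies neither the logarithm base nor any offset, so both are assumptions, as are the function names.

```python
import math


def linear_bar_length(cardinality: int, max_cardinality: int, max_px: float = 300.0) -> float:
    """Bar length proportional to the cardinality (area-proportional encoding)."""
    return max_px * cardinality / max_cardinality


def log_bar_length(cardinality: int, max_cardinality: int, max_px: float = 300.0) -> float:
    """Bar length on a log10(1 + n) scale: not area-proportional, but small sets stay visible."""
    if cardinality <= 0:
        return 0.0
    return max_px * math.log10(1 + cardinality) / math.log10(1 + max_cardinality)
```

With a largest set of 1,200,000 actors, an intersection of 12 actors is drawn about 0.003 pixels long on the linear scale, i.e., invisible, but roughly 55 pixels long on the logarithmic scale.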
Furthermore, the UpSetR-style set visualization in the third version of the Social Set Visualizer makes extensive use of color coding. The horizontal bar chart is color-coded in different shades of gray according to the degree of each intersection, i.e., the number of sets that are intersected with each other. This makes set intersections visually discernible and comparable, and allows the user to easily detect outliers in the dataset. The vertical bar chart is color-coded with a categorical color scale, in which each base set is represented by one particular color throughout the entire visualization, including the intersection matrix. This consistent use of color increases the usability of the overall visualization.
Both of these changes are novel additions compared to UpSet and UpSetR, which use a single color for all sets in both the vertical and horizontal bar charts and provide no visual help through colors within the intersection matrix.
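A minimal sketch of this color assignment, in Python, is shown below. The gray levels (from light gray for single sets to dark gray for the highest degree of intersection), the ten-color categorical palette, and the function names are assumptions chosen for illustration, not the exact values or code used in SoSeVi 3.

```python
from typing import Dict, List, Tuple

# Assumed ten-color categorical palette, reused cyclically if there are more than ten base sets.
CATEGORICAL_PALETTE = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def gray_for_degree(degree: int, n_sets: int, light: int = 210, dark: int = 40) -> str:
    """Gray shade for an intersection of `degree` base sets: higher degree, darker bar."""
    if n_sets <= 1:
        level = dark
    else:
        t = (degree - 1) / (n_sets - 1)  # 0.0 for single sets, 1.0 when all sets intersect
        level = round(light + t * (dark - light))
    return f"#{level:02x}{level:02x}{level:02x}"


def base_set_colors(set_names: List[str]) -> Dict[str, str]:
    """Assign each base set one categorical color, used consistently across all views."""
    return {name: CATEGORICAL_PALETTE[i % len(CATEGORICAL_PALETTE)]
            for i, name in enumerate(set_names)}


def matrix_cell_colors(combination: Tuple[str, ...], set_names: List[str],
                       inactive: str = "#e0e0e0") -> List[str]:
    """Dot colors for one intersection in the matrix: the base-set color if the set takes part."""
    colors = base_set_colors(set_names)
    return [colors[name] if name in combination else inactive for name in set_names]
```

Encoding the degree of intersection in lightness rather than hue keeps hue free for identifying the base sets, so the two color codes do not compete with each other.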
3.5 Summary
In this chapter, an overview of the different approaches to set-theoretical visualization was presented, as summarized in Table 3.1. Based on the theoretical foundations, three essential design requirements for set-based visualizations in a Visual Analytics software tool such as the Social Set Visualizer have been derived, namely interactivity, area-proportionality, and the number of sets that can be displayed. Approaches that perform strongly on the relevant design requirements have therefore been selected for implementation in the Social Set Visualizer.
Type                 Year  Application domain         Interactive  Area-proportional  Large number of sets
Euler diagram        1750  General Purpose
Venn diagram         1880  General Purpose
EulerAPE             2014  General Purpose
Linear Diagrams      2014  General Purpose
UpSet                2014  Bioinformatics
“Exploded” Venn      2015  Big Social Data Analytics
UpSet-style SoSeVi   2016  Big Social Data Analytics
UpSetR               2017  General Purpose
UpSetR-style SoSeVi  2017  Big Social Data Analytics
Table 3.1: Comparative evaluation of set-theoretical visualizations
Thus, it was shown that the three versions of the Social Set Visualizer contributed to the state of the art in set visualization through the design and presentation of the “Exploded” Venn Diagram (section 3.4.3), the UpSet-style (section 3.4.6), and the UpSetR-style set visualizations (section 3.4.8) in Publication II [Flesch et al. 2015a], Publication III [Flesch et al. 2016], and Publication IV [Flesch et al. 2017], respectively. Furthermore, the presented set-based visualizations improve both the utility and the usability of the set-based approach to Big Social Data Analytics that is presented in this thesis.
Chapter 4
Development
This chapter gives insight into the development of the Social Set Visualizer, which was created as an interactive Visual Analytics software tool based on the Social Set Analysis methodology. First, the main objectives of the Social Set Visualizer software tool are highlighted. Next, the underlying technological foundations of the Social Set Visualizer are presented, with particular focus on its approach to data storage and visualization. Furthermore, an overview is given of the software architecture of the frontend and backend, and the different iterations of the IT artifact are introduced. Throughout this PhD project, three versions of the Social Set Visualizer software tool were implemented, which are detailed in Publication II [Flesch et al. 2015a], Publication III [Flesch et al. 2016], and Publication IV [Flesch et al. 2017] of this dissertation. Lastly, the deployment process is outlined.
4.1 Development Objectives
Application performance, fault tolerance, and maintainability have been identified as the core objectives of software development [Spacey 2018] and are applied to the development of the Social Set Visualizer software tool to ensure a sound development process.
Application Performance
With regard to the first key objective for the development of the Social Set Visualizer, application performance has been shown to increase the usability of a software tool for its users [Etezadi-Amoli & Farhoomand 1996, Albert & Tullis 2013]. Good interactive performance can positively influence the pace of insight generation in Big Social Data Analytics. Even though this effect is difficult to measure quantitatively, it becomes apparent through qualitative user feedback. This is why a holistic approach to application performance is adopted during the development of the Social Set Visualizer, focusing on both the client-side and the server-side performance of the software tool.
In a web-based Visual Analytics dashboard such as the Social Set Visualizer, client-side performance is immediately apparent to the end user. However, client-side performance is difficult to measure and improve, as it mainly depends on the computational capabilities of the users' devices and on network connectivity [Viavant et al. 2002]. Nevertheless, because of its direct impact on the user experience, the optimization of client-side performance is an essential theme during the development of the Social Set Visualizer.
Moreover, server-side performance is important for processing large volumes of Big Data, which creates computational difficulties when Big Social Data Analytics is performed in an interactive, user-guided manner. From a software engineering perspective, the server-side performance of web-based analytical applications can be improved through caching, parallelization, and pre-computation [Tilkov & Vinoski 2010, Subramanian et al. 1999]. However, the gains from these measures may be limited by the computational performance of the underlying hardware, and the possibility of upgrading to more powerful hardware largely depends on the available budget.
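Two of these measures, caching and pre-computation, can be sketched in a few lines of Python with the standard library's `functools.lru_cache`. The in-memory `SETS` dictionary stands in for the backend's Big Social Data corpus and, like the function names, is a hypothetical illustration rather than the Social Set Visualizer's actual backend.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical stand-in for the backend corpus: base-set name -> actor IDs.
SETS = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({3, 4, 5}),
    "C": frozenset({4, 5, 6, 7}),
}


@lru_cache(maxsize=1024)
def intersection_size(names: frozenset) -> int:
    """Cardinality of the intersection of the named base sets, cached per distinct query."""
    members = [SETS[name] for name in sorted(names)]
    return len(frozenset.intersection(*members))


def precompute(max_degree: int = 2) -> None:
    """Warm the cache with all intersections up to `max_degree` sets before users query them."""
    for k in range(1, max_degree + 1):
        for combo in combinations(sorted(SETS), k):
            intersection_size(frozenset(combo))
```

Calling `precompute()` at server start fills the cache, so that a later user query such as `intersection_size(frozenset({"A", "B"}))` is answered from memory (here: 2 actors). Passing a `frozenset` rather than a list makes the cache key hashable and independent of the order in which the user selected the sets. If the corpus changes, `intersection_size.cache_clear()` has to be called to avoid stale results.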
Hence, a general decision problem concerning the optimal distribution of computational tasks between client and server emerges during the development of the Social Set Visualizer. The UpSet software tool from the domain of bioinformatics, which was introduced in section 3.4.5, performs all computations in the client-side web browser. Such client-side computation is feasible only as long as users of UpSet process small- to medium-sized datasets that do not conflict with the limitations of a browser-based application, such as the maximum size of input files and the available working memory. Because Big Social Data Analytics involves datasets with millions of records, this approach is not feasible for the Social Set Visualizer. Therefore, most of the calculations are performed on the server side. Since all computations involving raw social media data are performed on the server, only aggregated results are transmitted to the client. This also strengthens compliance and data privacy in the Social Set Visualizer software tool.
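The following Python sketch illustrates this division of labor: the server holds the raw actor IDs and serializes only cardinalities for the browser. The payload layout, the field names, and the restriction to pairwise intersections are assumptions made for illustration; the thesis does not specify the actual interface between frontend and backend.

```python
import json
from itertools import combinations
from typing import Dict, Set


def aggregate_response(sets: Dict[str, Set[str]]) -> str:
    """Build the JSON sent to the browser: cardinalities only, never the actor IDs themselves."""
    names = sorted(sets)
    payload = {
        # Size of each base set.
        "sets": {name: len(sets[name]) for name in names},
        # Size of every pairwise intersection, computed on the server.
        "pairwise_intersections": [
            {"sets": [a, b], "size": len(sets[a] & sets[b])}
            for a, b in combinations(names, 2)
        ],
    }
    return json.dumps(payload)
```

Because the resulting JSON string contains only set names and counts, no actor identifiers reach the client, which is the privacy property described above.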
Fault Tolerance
Fault tolerance is the second key objective for the development of the Social Set Visualizer. It is based on the principles of availability, reliability, correctness, and error handling.
In this context, the term availability refers to the system’s online status and whether it responds to the end user. Unavailability of the Social Set Visualizer software tool negatively affects the ability of researchers to utilize the tool and to generate insights through Big Social Data Analytics.
Reliability is directly linked to the concept of availability and is commonly measured as the mean time between failures of the application. While availability of the Social Set Visualizer software tool means that researchers can navigate to the website and use the tool, the tool will be deemed unreliable if interactions with its user interface fail with an error message. This example underlines the importance of reliability as an objective during development.
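Both notions can be quantified from an incident log. The Python sketch below assumes that failures are recorded as non-overlapping (start, end) intervals in hours within an observation window; it computes the mean time between failures (MTBF), the mean time to repair (MTTR), and the steady-state availability, which by the standard definition equals MTBF / (MTBF + MTTR). The incident-log format and the function name are hypothetical.

```python
from typing import List, Tuple


def reliability_metrics(window_hours: float,
                        outages: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (MTBF, MTTR, availability) for non-overlapping outages within the window."""
    downtime = sum(end - start for start, end in outages)
    uptime = window_hours - downtime
    failures = len(outages)
    mtbf = uptime / failures if failures else float("inf")  # operating time per failure
    mttr = downtime / failures if failures else 0.0         # repair time per failure
    availability = uptime / window_hours                    # equals MTBF / (MTBF + MTTR) if failures > 0
    return mtbf, mttr, availability
```

For a 30-day window (720 hours) with two outages of 2 and 1 hours, the sketch yields an MTBF of 358.5 hours, an MTTR of 1.5 hours, and an availability of about 99.58%.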
Correctness is a further component of fault tolerance. It requires the Social Set Visualizer to provide correct computational results to the user and to correctly implement the set-theoretical operations that are performed on the theoretical data model of Big Social Data. Due to the negative impact of software errors on the validity of scientific studies, the correctness of computational results remains a crucial objective; the risk of incorrect results can be mitigated through software testing or formal verification.
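As an example of such software testing, the Python sketch below checks a hand-written intersection routine against an independent oracle (differential testing). The routine `sorted_intersection_count`, which counts shared actor IDs in two sorted ID lists with a linear merge, is a hypothetical stand-in for a backend set operation; Python's built-in `set` intersection serves as the trusted reference on randomly generated inputs.

```python
import random
from typing import List


def sorted_intersection_count(a: List[int], b: List[int]) -> int:
    """Count IDs shared by two ascending, duplicate-free ID lists using a linear merge."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def differential_test(trials: int = 500, seed: int = 7) -> bool:
    """Compare the merge routine with Python's set intersection on random inputs."""
    rng = random.Random(seed)  # fixed seed makes any failing input reproducible
    for _ in range(trials):
        a = sorted(rng.sample(range(1000), rng.randint(0, 50)))
        b = sorted(rng.sample(range(1000), rng.randint(0, 50)))
        expected = len(set(a) & set(b))  # trusted oracle
        if sorted_intersection_count(a, b) != expected:
            return False
    return True
```

A property-based testing library such as Hypothesis could additionally shrink a failing case to a minimal example, which makes errors in set operations easier to diagnose.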