
2.1 Visual Analytics

2.1.2 Sense-Making Loop

In the sense-making loop, data needs to be restructured according to a hypothesis in order to validate or invalidate it. This process typically consists of formulating hypotheses, executing queries, and examining results.

In traditional approaches, relational databases are queried with textual SQL-style queries, and the results are scrutinized in the form of tabular data. However, analysts not only need to know the query language but must also understand the data and its ontology. Textual queries typically present several problems for the database user, such as the need to identify the database classes, attributes, and relationship structure before writing a query, as well as semantic and syntactic errors [77].
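To make the burden of textual querying concrete, the following minimal sketch uses Python's built-in `sqlite3` module. The schema (`classes`, `methods`), the sample rows, and the helper names are hypothetical and chosen only for illustration; they are not taken from any system discussed here. Even a simple question ("which classes contain at least a given number of lines of code?") requires the analyst to know both table names, the foreign-key relationship between them, and the aggregation syntax.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
SCHEMA = """
CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE methods (
    id INTEGER PRIMARY KEY,
    class_id INTEGER NOT NULL REFERENCES classes(id),
    name TEXT NOT NULL,
    loc INTEGER NOT NULL
);
INSERT INTO classes VALUES (1, 'Parser'), (2, 'Lexer');
INSERT INTO methods VALUES
    (1, 1, 'parse', 120),
    (2, 1, 'reset', 10),
    (3, 2, 'next_token', 40);
"""


def open_demo_db():
    """Create an in-memory database populated with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def large_classes(conn, min_loc):
    """Return (class name, total LOC) for classes with at least min_loc lines."""
    return conn.execute(
        """
        SELECT c.name, SUM(m.loc) AS total_loc
        FROM classes AS c
        JOIN methods AS m ON m.class_id = c.id
        GROUP BY c.id, c.name
        HAVING SUM(m.loc) >= ?
        ORDER BY total_loc DESC
        """,
        (min_loc,),
    ).fetchall()


def try_query(conn, sql):
    """Run a query; return (rows, None), or (None, message) if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)
```

With this data, `large_classes(open_demo_db(), 100)` returns `[('Parser', 130)]`, whereas a single misspelled column name, as in `try_query(conn, "SELECT nmae FROM classes")`, is rejected outright. This is the kind of error that visual query specification aims to prevent.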

Cammarano et al. [78] observe that most data analysis user interfaces take one of two approaches: one focuses on simplifying query specification, while the other examines the results through visualization metaphors and techniques. While specifying queries visually or presenting their results visually has been beneficial in the context of exploratory search, it is the coupling of both approaches that is of special interest [79]. Researchers have recently applied a number of combined approaches successfully to a wide variety of application domains, such as network security, epidemiology, and biomedicine [2,80–82]. Figure 2.2 is an example of this approach that targets the detection of, and response to, an infectious disease outbreak.

The above examples highlight the need to simplify query specification and the importance of innovative ways of viewing the results so that effective decisions may be made on large multidimensional measurement data, an aspect not so dissimilar to what software analysis tools strive for. In our work, we aim to bridge this gap by means of a Visual Programming Language (VPL) that combines query specification and interactive software visualizations to present results in a more meaningful manner. VPLs are languages that exploit visual representations in order to focus on the domain of interest instead of command languages, while interactive software visualizations encode software metrics in hierarchical representations of software systems.

Figure 2.2. Epinome: a VA Workbench for Epidemiology data [2]

Visual Query Specification

A number of techniques can be found as alternatives to command languages. A popular visualization for queries is the use of graph or network representations where nodes and edges represent components and their relationships. The Ecosystem Services Database [83] is a good illustration of this approach where users compare ecosystem service values across various geographic regions through the use of a graph-based visual query system.

Some researchers have focused on graphically representing the Boolean operations found in command languages. Representations of Venn diagrams [84] have been used to form graphical queries, where each query term is associated with a ring or circle and conjunctions of terms are indicated by the intersection of circles. Similarly, flow diagrams [85] have been used to depict conjunctions using sequential flows and disjunctions using parallel flows. Elmqvist et al. [86] present a visual canvas for constructing visual queries through the use of a graphical set representation that they refer to as DataRoses. Each DataRose comprises a starplot of selected columns in a dataset, displayed as a multivariate visualization that incorporates dynamic query sliders into each axis. Tools such as InfoCrystal [87] and KMVQL [88] are similar in providing a means to find and select graphical representations of interest. The former employs iconographic representations while the latter makes use of Karnaugh maps.
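The Boolean semantics behind the Venn-diagram and flow-diagram representations mentioned above can be sketched in a few lines of Python. This is an illustrative reading of those metaphors under stated assumptions, not the implementation of any cited system: the records, predicates, and function names are hypothetical. A "circle" is modelled as the set of record identifiers matching one query term, a sequential flow as a chain of filters, and a parallel flow as branches whose outputs are merged.

```python
# Hypothetical records, for illustration only.
RECORDS = [
    {"id": 1, "domain": "security", "year": 2005},
    {"id": 2, "domain": "security", "year": 2011},
    {"id": 3, "domain": "biomedicine", "year": 2011},
    {"id": 4, "domain": "epidemiology", "year": 2008},
]


def is_security(record):
    return record["domain"] == "security"


def is_recent(record):
    return record["year"] > 2010


def circle(predicate, records=RECORDS):
    """Venn metaphor: one circle holds the ids of records matching one term."""
    return {r["id"] for r in records if predicate(r)}


def sequential_flow(records, *predicates):
    """Flow metaphor: filters applied one after another act as a conjunction."""
    for predicate in predicates:
        records = [r for r in records if predicate(r)]
    return {r["id"] for r in records}


def parallel_flow(records, *predicates):
    """Flow metaphor: branches run side by side and are merged (disjunction)."""
    merged = set()
    for predicate in predicates:
        merged |= {r["id"] for r in records if predicate(r)}
    return merged


# Overlap of two circles is a conjunction; their combined area is a disjunction.
venn_and = circle(is_security) & circle(is_recent)
venn_or = circle(is_security) | circle(is_recent)
```

Both metaphors yield the same answers for the same query: here `venn_and` and `sequential_flow(RECORDS, is_security, is_recent)` are both `{2}`, while `venn_or` and `parallel_flow(RECORDS, is_security, is_recent)` are both `{1, 2, 3}`.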

Other interfaces found in the literature are either based on the ubiquitous file-system browser interface [89] or focus on certain cues that assist the user in forming a query. Examples of the latter are the work of Sinha and Karger [90], which assists the user through navigation hints, and the work of Trigoni [91], which lets the user improve a query over time by gradually revealing the underlying data.

Query Result Display

Traditional query interfaces display data items that meet query specifications in the form of tabular results. While this is still a useful approach, it often makes it difficult for the user to find correlations in large datasets. In the recent past, researchers have sought to tackle this problem by giving users better visual feedback.

Researchers such as Lucas et al. [92] and Mathew Ward [93] have implemented a variety of standard statistical charts as well as information visualization graphing techniques to communicate the results to the user. Similarly, systems such as Visionary [94] offer a direct-manipulation interface for browsing the results. The survey paper of Oliveira et al. [95] provides a closer look at such database visualization techniques.

In the context of software analysis, interactive software visualizations can be applied to present the extracted facts in a more meaningful manner than traditional methods. In Section 2.2, we provide a detailed overview of the relevant software visualization tools and techniques.

Coupling Queries with Results

The well-known metaphor of a pivot table in a spreadsheet is used in the Polaris system [96] to incorporate both a novel query interface mechanism and an integrated visualization that displays correlations in data with respect to any attribute in the dataset. Similarly, the research of Livnat et al. [82] focuses on visual correlations of heterogeneous data to facilitate situational awareness and decision-making processes. His work with Draper [81] introduces an interactive radial query language that simplifies the task of searching for correlations in data. They place icons representing individual entities around the circumference of a ring and allow the user to interactively focus on certain relationships by dragging relationship icons into the ring’s interior.

Figure 2.3. Simple flow-based diagram

In our work, we employ a diagram-based VPL that combines software-related queries with query results. VPLs are languages that exploit visual representations in order to focus on the domain of interest instead of command languages. They allow users to program by manipulating or arranging graphical elements rather than writing textual source code. This lets users work at a higher level of abstraction, where they need no prior experience or knowledge to express their programming requirements, and provides end-users with a more intuitive way to create, modify, or extend parts of a software system.

Every VPL can be classified into one of three basic categories: icon-based, form-based, or diagram-based, depending on which type of visual expression is used. In his thesis, Stehno [97] examines these categories in more detail. Our work is based on the concept of boxes and arrows, which belongs to the category of diagram-based visual programming. In this flow-based paradigm [98], nodes can be thought of as “black boxes” and arrows as “arcs” that send data tokens to other connected nodes. Figure 2.3 shows the major entities of a flow-based diagram: A, B, and C are black-box processes that execute code components, and M and N are arcs that connect to their respective processes via the O1, O2, and two IN1 ports. Each node performs a predefined task as soon as it has received all the tokens it requires for execution, while arcs carry numbers, arrays, or even pointers to objects as data tokens from a sending node to a receiving node. This principle is often referred to as the dataflow execution model.
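The firing rule of the dataflow execution model can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the implementation used in this work: the `Node` and `Graph` classes, the tasks, and the token values are hypothetical. The wiring of arcs M and N (A.O1 to B.IN1 and A.O2 to C.IN1) is one plausible reading of Figure 2.3, and node D, with two input ports, is an addition that does not appear in the figure; it is included only to show a node waiting until all of its tokens have arrived.

```python
from collections import deque


class Node:
    """A black-box process that fires once every input port holds a token."""

    def __init__(self, name, in_ports, task):
        self.name = name
        self.in_ports = tuple(in_ports)
        self.task = task      # callable(**tokens) -> {out_port: token}
        self.pending = {}     # tokens received since the last firing

    def receive(self, port, token):
        self.pending[port] = token

    def ready(self):
        return all(port in self.pending for port in self.in_ports)

    def fire(self):
        tokens, self.pending = self.pending, {}
        return self.task(**tokens)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.arcs = {}        # (source node, out port) -> [(target node, in port)]

    def add(self, node):
        self.nodes[node.name] = node

    def connect(self, src, out_port, dst, in_port):
        self.arcs.setdefault((src, out_port), []).append((dst, in_port))

    def run(self, source):
        """Fire the source node, then every node that becomes ready."""
        fired, results = [], {}
        queue = deque([source])
        while queue:
            node = self.nodes[queue.popleft()]
            if not node.ready():
                continue      # still waiting for at least one token
            fired.append(node.name)
            for out_port, token in node.fire().items():
                targets = self.arcs.get((node.name, out_port))
                if not targets:
                    # Unconnected output port: keep the token as a result.
                    results[(node.name, out_port)] = token
                    continue
                for dst, in_port in targets:
                    self.nodes[dst].receive(in_port, token)
                    queue.append(dst)
        return fired, results


def build_demo():
    graph = Graph()
    graph.add(Node("A", [], lambda: {"O1": 3, "O2": 4}))
    graph.add(Node("B", ["IN1"], lambda IN1: {"OUT": IN1 * 2}))
    graph.add(Node("C", ["IN1"], lambda IN1: {"OUT": IN1 ** 2}))
    # Hypothetical node D (not in Figure 2.3) needs two tokens before it fires.
    graph.add(Node("D", ["IN1", "IN2"], lambda IN1, IN2: {"SUM": IN1 + IN2}))
    graph.connect("A", "O1", "B", "IN1")   # arc M (assumed wiring)
    graph.connect("A", "O2", "C", "IN1")   # arc N (assumed wiring)
    graph.connect("B", "OUT", "D", "IN1")
    graph.connect("C", "OUT", "D", "IN2")
    return graph


fired, results = build_demo().run("A")
```

In this run, `fired` is `['A', 'B', 'C', 'D']` and `results` is `{('D', 'SUM'): 22}`. D is scheduled as soon as B delivers its token, but it is skipped until the token from C has also arrived, which is the behaviour the firing rule describes.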