
Interactive Data Visualization Software

We wrote the RnavGraph software (see Chapter 2) that implemented a user interface for navigation graphs by providing a “bullet” on a graph to drive the transitions, as proposed by Hurley and Oldford [44]. With RnavGraph our goal was to interactively analyze data in the R statistical environment [60] using navigation graphs. While working on the interactive graph we investigated various options for readily available interactive scatterplots in R; it became obvious to us that interactivity for both the graph and the scatterplot display (or any display, for that matter) was important for effectively using navigation graphs.

There is a long history of the design and development of interactive visualization software for exploratory data analysis, dating back to at least PRIM [27] in 1973. Other examples include Quail [43], Lisp-stat [76], Plot Windows [67], DINDE [58], DataDesk [80], Data Viewer [40], the gobi family [68,13,69,47], iplots [79] and Mondrian [75]. Among other features, these systems provide a scatterplot that supports dynamic zooming and panning via mouse gestures and some form of brushing and linking (we are not completely sure about PRIM and linking).

These systems take different approaches to providing graphical user interfaces (GUIs) for interactive data visualization. PRIM, DataDesk, Data Viewer, the gobi family and Mondrian provide in essence an encapsulated environment to visualize and explore data. That is, they have limited or no connection to a complete statistical system with a major user community such as R. Hence, creating new plots and control widgets dynamically from a command line interface and incorporating various statistical analyses is not possible with these systems. On the other hand, Quail and Lisp-stat do support dynamic creation and incorporation of statistical analyses, but they are not integrated into a complete statistical system; adding new statistical tools to Quail or Lisp-stat requires their respective authors to write these tools first; see, for example, Anglin and Oldford [4]. Finally, iplots was designed to bring interactive graphics to the R environment. However, iplots uses actions in menus that cannot be controlled via the command line.

For RnavGraph, we first used the interactive scatterplot display of Ggobi via the rggobi R package [47]. However, we were missing some important features such as advanced point glyphs for the scatterplot display, including images, text and star glyphs. We also found that installing rggobi was difficult on certain operating systems, which would have limited the potential users of the RnavGraph package. We were frustrated with not having interactive tools, whose value in exploratory data analysis has been known for at least 20 years [70,51,7,42,2], that were integrated with a commonly used and statistically rich set of more formal analysis tools (as provided, for example, by the open source system R). This frustration is shared by others. At a recent R users conference, Di Cook [22] shared her frustration and listed the following “challenges to the young developers”:

• Interactivity on the plot
• Different types of brushes
• Different kinds of linking between plots
• Programmability
• Strong connection with model fitting
• Portability, easy install, web compatible
• Large quantities of data
• Incorporating inference
• Conceptual framework

We ended up writing our own interactive scatterplot display, tk2d, as part of the RnavGraph R package; see Section 2.3. Motivated by the results of tk2d, we took up designing and implementing a new interactive general-purpose visualization system called loon. We reflect in Chapter 8 on how loon meets the challenges set by Di Cook.

This thesis is structured as follows. In Chapter 2, we discuss RnavGraph, a software environment for interactively exploring data using navigation graphs. In Chapter 3, we present a visual exploratory analysis of the visible minority populations distributed across major census metropolitan areas of Canada. We highlight visualization and interaction methods that are used for this analysis. We end Chapter 3 with an introduction to loon and discuss how loon is used to perform the visual analysis of the minority data. To that end, we introduce the relevant conceptual aspects of the loon framework.

In Chapter 4 and Chapter 5, we present loon’s framework in detail. Chapter 6 presents some relevant statistical applications that were enhanced with interactive visualization in loon. In Chapter 7, we introduce some novel tools in loon for exploring high-dimensional data with navigation graphs. We conclude that chapter by introducing a novel high-dimensional point glyph called the spiro glyph. Chapter 8 wraps up this thesis with conclusions and a discussion of future research work.

(a) Navigation graph with canonical graph semantic and bullets representing different dimensionality reduction methods.

(b) Scatterplots driven by the location of the bullet corresponding to a particular dimensionality reduction method.

Figure 1.12: Comparing seven dimensionality reduction methods using navigation graphs with the canonical semantic.

Chapter 2

RnavGraph

As part of our research, we have developed a software package called RnavGraph that provides an interactive environment to explore high-dimensional data using navigation graphs. RnavGraph is an open source package for the R statistical environment and is hosted on the Comprehensive R Archive Network (CRAN).

RnavGraph is a major milestone in our research as it represents a first implementation of the concept of navigation graphs and it demonstrates that, in practice, navigation graphs are useful to explore real data. We designed RnavGraph to be flexible so that novel graph semantics can be applied and tested.

The design of RnavGraph is an important part of our research. This design includes the selection of essential features for a useful interactive navigation graph environment, the software architecture and the user experience design.

In this chapter, we discuss the functionality of the RnavGraph package. We first show how to initialize an RnavGraph session for the canonical graph semantic. We then describe the user interactions with the two main displays: the navigation graph display and the 2d scatterplot display. Next, we present part of the software architecture and show how RnavGraph can be extended to accommodate a new graph semantic. We end this chapter by listing some limitations of the RnavGraph package.

Many examples included in this thesis use the olive data first introduced in Subsection 1.3.1. We therefore attach the olive data in R, which allows us to refer to its variables by their names (i.e. Area vs. olive$Area).

attach(olive)

We also create a second data set called oliveAcids that includes only the fatty acid variables, but not the Region and Area variables.

oliveAcids <- subset(olive, select = -c(Area, Region))
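
For readers without the olive data at hand, the effect of these two commands can be sketched on a small stand-in data frame. The object names oliveToy and oliveToyAcids, the labels and all values below are made up for illustration and are not taken from the olive data; only the column names Region and Area follow the text above.

```r
# Hypothetical stand-in for the olive data: made-up labels and values,
# with two grouping variables and two numeric "fatty acid" columns.
oliveToy <- data.frame(
  Region   = factor(c("R1", "R1", "R2", "R3")),
  Area     = factor(c("A1", "A2", "A3", "A4")),
  palmitic = c(10, 11, 12, 13),
  oleic    = c(70, 71, 72, 73)
)

# attach() puts the columns on the search path, so Area can be used
# in place of oliveToy$Area until the data frame is detached again.
attach(oliveToy)
areaLevels <- levels(Area)
detach(oliveToy)

# A negative selection in subset() drops the named columns and keeps
# the rest, which mirrors the construction of oliveAcids.
oliveToyAcids <- subset(oliveToy, select = -c(Area, Region))
names(oliveToyAcids)
```

Note that attached variables are copies: later changes to the data frame are not reflected in the attached names until it is detached and attached again.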