BACHELOR THESIS. Peter Fabian Interactive Data Visualization Tool


Charles University in Prague

Faculty of Mathematics and Physics

BACHELOR THESIS

Peter Fabian

Interactive Data Visualization Tool

Department of Software Engineering

Supervisor: RNDr. Pavel Parízek

Study program: Computer Science


I would like to thank my former supervisor, Tomáš Kalibera, for suggesting the idea to me and for giving me much advice. I would also like to thank my later supervisor, Pavel Parízek, for helping me finish the work on the project.

Furthermore, I would like to thank my friend Ondrej Ruttkay for his corrections and my parents for supporting me in my work.

I hereby declare that I wrote the thesis myself using only the referenced sources. I agree with lending of the thesis.


Abstract: The goal of this work was to design and create the IVP tool for interactive and easy work with a graphical representation of statistical data. The tool supports basic graph adjustments as well as operations that are hard to perform in existing tools or are missing from them entirely. An important property of the tool is its extensibility: the individual graph types and interactive operations are implemented as modules that provide the given functionality to the application core.

The text of the thesis includes a description of the implementation of the IVP tool, user documentation, and an overview of existing tools for working with graphs together with their comparison to the IVP tool.

Keywords: data visualization, graph, interactivity, modules

Title: Interactive Data Visualization Tool

Author: Peter Fabian

Department: Department of Software Engineering

Supervisor: RNDr. Pavel Parízek

Supervisor’s e-mail address: [email protected]

Abstract: The goal of this work was to design and implement a tool that allows users to work easily and interactively with a graphical representation of statistical data. The IVP tool supports both basic adjustments of graphs and advanced modifications that are either very hard or impossible to achieve in similar tools. The key feature of the tool, however, is its extensibility: the individual graph types and interactive features are implemented as plug-ins, which provide their functionality to the tool’s core.

The thesis also presents a description of the IVP tool’s implementation details, user documentation, an overview of the existing graphing software and its comparison to the IVP tool.


Chapter 1 Introduction

Large amounts of statistical information are produced every day, and this information is nowadays usually processed by computer systems. One of the most effective and revealing techniques for presenting quantitative information is the visual display of the data [12]. Visualizing a large amount of data in a single graph may offer a new perspective and make the important information easy to identify.

Displaying statistical data in various visual styles and with various graphical methods is, according to [12], a quite recent invention. It dates back to the middle of the 18th century, when the basic theories of calculus, logarithms and Cartesian coordinates already existed. Among the earliest graph types were time-series graphs and bar plots¹ depicting univariate data, soon followed by the two-variable design of scatter plots.

Besides time-series and scatter plots, many new graphing techniques have emerged over the last centuries, e.g. bar plots and their siblings: histograms, spectrograms, lag plots, box plots, etc. Every graph type presents data in a different way, but all of them serve the same purpose: to help humans describe the data, view and compare it from different perspectives, reveal the dependencies and correlations between variables, and perceive many numbers in a relatively small space [12].

Almost every graph produced today comes from some type of computer. Computers are able to process huge amounts of data, and there are many ways to represent it graphically. However, especially for large data sets, interactivity and easy manipulation of the graphical representation can be helpful in exploring the nature of the data. Only a few existing tools offer both interactivity and flexibility. The goal of this work is to implement a tool that offers not only an intuitive way of working with data and its representation, but also the flexibility of adding new features by creating new plug-ins.

The following section provides an overview of related applications along with the goals of the Interactive Visualization Project (IVP). In Chapter 2, the features supported by the IVP tool are presented in greater depth. A detailed description of the implementation is then given in Chapter 3. Chapter 4 concludes and evaluates the project and describes possible future work on the IVP tool. The user documentation is given in Appendix A.

¹ According to [12], 43 of the 44 graphs in The Commercial and Political Atlas by William Playfair, published in London in 1786, were time-series plots, and the popularity of time-series (or run-sequence) graphs still remains very high.

1.1 Related Work

The existing graphing tools vary in their approach to the task of creating a graph. The tools presented in this section have approximately the same target audience as the IVP tool.

1. SigmaPlot [9] by Systat Software is a complex graphing package for Microsoft Windows with many features and capabilities. It supports approximately 80 graph types (both 2D and 3D). The most important of its interactive features are graph zooming, breaking and scaling of axes, support for multiple axes, and adjusting the appearance of the graph in many ways (changing colors and ticks, adding a grid, etc.). It can import data from various types of source files and produce vector and bitmap images. The tool also includes some basic statistical features and can cooperate with the statistical software SigmaStat.

2. Origin [6] by OriginLab is scientifically oriented graphing software for Microsoft Windows that uses the NAG (Numerical Algorithms Group) library and even includes a C compiler. It supports approximately 60 graph types, including statistical graphs, plots in layers (up to 80 layers per graph), and supports working with matrices and linear data sets. The tool also provides many data analysis techniques (regression, smoothing, descriptive statistics, FFT, etc.) and can import data from a database. It also supports breaking and scaling axes and adjusting the appearance of the plotting area using a grid, colors, etc.

3. gnuplot [2] by Thomas Williams and Colin Kelley is a multiplatform (UNIX/Linux, Windows, MacOS, Amiga, OS/2, ...) command-line plotting tool. It supports many commands, ‘hotkeys’ and mouse actions. Several graphical user interfaces have been written as front ends for gnuplot, and it is also used as the plotting engine of the Octave project, an open-source Matlab-compatible numerical tool. It is a powerful tool, but without a good graphical user interface it is hard to use all of its features and its full interactivity.

4. Dataplot [1] by James J. Filliben and Alan Heckert is a multiplatform tool for scientific, engineering, statistical, mathematical and graphical analysis. It is also an interactive, command-driven language/system with English-like syntax. Dataplot was developed in response to data analysis problems encountered at the National Bureau of Standards (now the National Institute of Standards and Technology) in the USA. It is a powerful tool that can draw many types of graphs and customize their appearance, but it offers no interactive manipulation of the graph itself and lacks some important features, such as breaking coordinate axes.

5. R project [7], initially written by Ross Ihaka and Robert Gentleman (there are now many contributors, the most important of whom form the R Development Core Team), is a multiplatform (UNIX/Linux, Windows, MacOS) system for statistical computation and graphics. It consists of a language and a run-time environment with graphics, a debugger, access to certain system functions and the ability to run programs stored in script files. R provides a wide variety of statistical and graphical techniques and is highly extensible [5]. It is a very flexible and powerful tool; however, interactivity can be achieved only by using the command history and replotting the graph, which is quite uncomfortable and unintuitive.

To summarize, there are several tools designed for plotting data. Some of them are supported by statistical software (R project, SigmaPlot with SigmaStat), but many of them lack real interactivity (R project, Dataplot) or do not provide a graphical user interface (gnuplot). Some of these tools are not free (SigmaPlot, Origin), while others are freely available as open-source software (R project, gnuplot). Commercial solutions usually offer many features for customizing the graph’s appearance, while the open-source ones focus on the underlying data. The IVP tool tries to combine the extensibility of the R project, a native graphical user interface supporting intuitive work with graphs, and the ability to run in batch mode from the command line. The tool will, however, not support scripts or custom programs (as the R project and Origin do); it will only support interactive plug-ins (more information about plug-ins can be found in later sections). The IVP tool will be available under the GNU GPL license and is primarily aimed at the UNIX/Linux platform. Unlike some commercial products, it tries to present interesting data rather than to show what a computer is able to draw.


Chapter 2 Supported Features

2.1 Overview

IVP is a tool that allows the user to work easily and intuitively with a graphical representation of large data sets. Its main features are:

• Plotting multiple graph types: run-sequence graphs, scatter plots, histograms, bihistograms, box plots, bar plots and lag plots.

• Scaling and breaking coordinate axes, zooming into parts of the graph.

• Drawing in ‘layers’, which allows plotting a graph with data stored in multiple files (e.g. if medians are stored in a different file than the actual values).

• Saving the settings of the view created interactively by the user, and loading view settings from files, without the need to repeat the same actions each time the same data is plotted.

• Command line support for basic functions.

• Output of bitmap and vector images.

• Extensibility through a plug-in system with graph plug-ins and interactive plug-ins. Plug-ins extending the interactive features modify the view of the data, the way the data is plotted, which part of it is used for visualization, etc.

2.2 Graph Types

The IVP tool is designed with extensibility in mind, so new types of graphical representation can be added easily using the plug-in system. The basic supported graph types are the run sequence graph, scatter plot, lag plot, bar plot, box plot, histogram and bihistogram. While the run sequence graph, bar plot and scatter plot have been the most common and popular general graph types since their first occurrence [12], the histogram (along with the bihistogram), lag plot and box plot are basic graphs used for statistical analysis of data [3].

Figure 2.1: Sample run sequence graph (horizontal axis: index; vertical axis: response variable)

Supported graph types include:

1. Run sequence graph or time-series graph (Figure 2.1) has been one of the most popular graph types since the 11th century, when it was first used. Run sequence graphs are used to depict a univariate data set and are “at their best for big data sets with big variability” [12]. Shifts in location and scale, as well as outliers, can be easily revealed by looking at a run sequence graph [10].

Run sequence graph is formed by:

• Horizontal axis: index i (i = 1, 2, 3, ..., optionally starting at 0) or time units (from seconds to millennia)

• Vertical axis: response variable y(i)

2. Scatter plot (Figure 2.2) is the barest form of relational graphics. It is widely used in modern scientific literature and college textbooks: according to [12], “about 40 percent of published graphics have a relational form with two or more variables”, and Tufte also calls it “the greatest of all graphical designs” [12]. A scatter plot graphically summarizes a bivariate data set by linking at least two variables, and it can easily reveal a correlation between them, which may (or may not) indicate a causal relationship between the variables [12].

Figure 2.2: Sample scatter plot (horizontal axis: CO2 concentration (ppm); vertical axis: relative temperature (Celsius))

Scatter plot is formed by:

• Horizontal axis: x, the variable representing the probable cause

• Vertical axis: y, the variable potentially influenced by x

3. Lag plot (Figure 2.3) is a special type of scatter plot in which both the causal and the influenced variable are represented by the same data set. It is used to determine whether the underlying data set is random or not: a non-random structure in the lag plot indicates that the tested data set is probably not random, and vice versa [10]. A lag is a fixed shift or time displacement. If the values in the data set are indexed v_1, v_2, v_3, ..., v_n, then plotting v_5, v_6, v_7, ... as response variables to v_1, v_2, v_3, ... corresponds to a lag of size 4. Lag plots usually use a lag of size 1.

Lag plot is formed by:

• Horizontal axis: v_i for all i

• Vertical axis: v_{i+1} for all i
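As a minimal sketch of how the points of a lag plot can be derived from a single data set, the following C++ function pairs each value with the value that lies `lag` positions later. The function name lagPairs and its signature are illustrative assumptions and are not taken from the IVP tool.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of IVP): returns the points of a lag plot
// as (v_i, v_{i+lag}) pairs, i.e. (horizontal, vertical) coordinates.
std::vector<std::pair<double, double>> lagPairs(const std::vector<double>& v,
                                                std::size_t lag = 1)
{
    std::vector<std::pair<double, double>> points;
    if (lag == 0 || lag >= v.size())
        return points;  // no meaningful pair can be formed
    points.reserve(v.size() - lag);
    for (std::size_t i = 0; i + lag < v.size(); ++i)
        points.emplace_back(v[i], v[i + lag]);
    return points;
}
```

For six values v_1, ..., v_6 and a lag of 4, the sketch yields the two points (v_1, v_5) and (v_2, v_6), which matches the indexing example given for the lag plot.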

Figure 2.3: Sample lag plot (axis label: beam deflection)

4. Bar plot (Figure 2.4) is a rather simple but quite common graph type, usually with a suboptimal data-ink ratio¹. It is used to summarize univariate data sets and provides the same information as an index-based run sequence graph.

¹ Data-ink ratio is the “proportion of a graphic’s ink devoted to the non-redundant display of data-information”; maximizing this ratio is, according to [12], one of the techniques for improving graph quality.

Typical bar plot is formed by:

• Horizontal axis: area(s) of interest

• Vertical axis: response variable

Bar plots and box plots can be drawn with single or multiple bars/boxes (Figure 2.5). A single bar plot can be drawn for one batch of data with no distinct groups. Alternatively, multiple bar plots can be drawn together to compare multiple data sets or to compare groups within a single data set. The bars usually have equal width, and the width carries no meaning with regard to the underlying data set.

5. Histogram (Figure 2.6) is a special type of bar plot, sometimes referred to as a frequency bar plot. It is used to depict the distribution of a univariate data set and can be very helpful in determining the distributional model of the data set, its outliers, modes, center and skewness. A typical histogram is obtained by splitting the range of the data into equally sized buckets (bins) and counting how many values fall into each bucket. The resulting frequencies are then plotted as a bar plot [10]. The number of buckets can be arbitrary; some useful, theoretically supported suggestions on how to determine it can be found in [8].

Figure 2.4: Single bar plot (horizontal axis: day no.; vertical axis: no. of fleas; series: Lab 1)

Figure 2.5: Bar plot with multiple bars (horizontal axis: day no.; vertical axis: no. of fleas; series: Lab 1, Lab 2)

Figure 2.6: Sample histogram (title: Histogram -- Old Faithful; horizontal axis: interruption time; vertical axis: counts)

• Horizontal axis: response variable

• Vertical axis: frequencies (counts for each bucket)
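The bucketing step of a histogram can be sketched as follows. This is only an illustration under stated assumptions: the function name histogramCounts is hypothetical, the buckets are assumed to span the range from the smallest to the largest value, and the largest value is counted in the last bucket. The IVP histogram plug-in may handle these details differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not IVP code): counts how many values fall into each
// of `bins` equally wide buckets spanning [minimum, maximum] of the data.
std::vector<std::size_t> histogramCounts(const std::vector<double>& values,
                                         std::size_t bins)
{
    std::vector<std::size_t> counts(bins, 0);
    if (values.empty() || bins == 0)
        return counts;
    const auto extremes = std::minmax_element(values.begin(), values.end());
    const double lowest = *extremes.first;
    const double width = (*extremes.second - lowest) / static_cast<double>(bins);
    for (double x : values) {
        // If all values are equal, width is 0 and everything goes to bucket 0.
        const std::size_t k =
            width > 0.0 ? static_cast<std::size_t>((x - lowest) / width) : 0;
        // The maximum would land in bucket `bins`; count it in the last one.
        counts[std::min(k, bins - 1)] += 1;
    }
    return counts;
}
```

For the eleven values 0, 1, ..., 10 and five buckets, each bucket is 2 units wide and the counts are 2, 2, 2, 2 and 3, the last bucket also holding the maximum.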

6. Bihistogram is a plot derived from the basic histogram. It uses two properly positioned histograms, one above the other, to depict differences in the type or center of the data distribution, in skewness and in the positions of outliers between two data sets (Figure 2.7). Because all this information can be read from one plot, it is usually a good alternative to the t-test, and it is a great technique for detecting whether a modification of conditions caused a change in the location, variation or distribution of the data [10].

Bihistogram is formed by two histograms:

• First histogram for condition 1 above the horizontal axis

• Second histogram for condition 2 below the horizontal axis

Figure 2.7: Sample bihistogram (horizontal axis: ceramic strength; vertical axis: counts; series: Batch 1, Batch 2)

7. Box plot is a graph type used for summarizing and comparing one or more data sets. A typical box plot is obtained by determining the median, the first and third quartiles (the medians of the values smaller and larger than the median, respectively), and the minimum and maximum of each depicted data set, and plotting these values on the graph. The box spans from the first to the third quartile, and a dot or a line inside it usually represents the median. The lines drawn from the top and the bottom of the box (that is, from the upper and lower quartiles) can reach the minimum and maximum values (Figure 2.8). Alternatively, their length can be equal to a multiple of the inter-quartile range, in which case the remaining values, usually called outliers, are drawn as separate dots (or are not drawn at all).

Box plots summarize the location and variance of data sets and are very useful for spotting changes in these properties among the examined data sets [10].

Box plot is formed by:

• Horizontal axis: the factor of interest

• Vertical axis: response variable
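The second whisker variant of the box plot, in which the whisker length is a multiple of the inter-quartile range, can be illustrated by the sketch below. The type Whiskers, the function whiskerLimits and the default multiple of 1.5 are assumptions made for this example (1.5 is a common convention, but the text does not state which multiple IVP uses).

```cpp
#include <vector>

// Hypothetical type (not IVP code) describing the whiskers of one box.
struct Whiskers {
    double lower;                  // lowest position a whisker may reach
    double upper;                  // highest position a whisker may reach
    std::vector<double> outliers;  // values lying beyond the whiskers
};

// q1 and q3 are the first and third quartiles; k is the multiple of the
// inter-quartile range (the default 1.5 is an assumption of this sketch).
Whiskers whiskerLimits(const std::vector<double>& values,
                       double q1, double q3, double k = 1.5)
{
    const double iqr = q3 - q1;
    Whiskers w{q1 - k * iqr, q3 + k * iqr, {}};
    for (double x : values)
        if (x < w.lower || x > w.upper)
            w.outliers.push_back(x);  // would be drawn as a separate dot
    return w;
}
```

For the values 1, 2, ..., 9, 100 with quartiles 3 and 8, the inter-quartile range is 5, the whisker limits are -4.5 and 15.5, and the only outlier is 100.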

2.3 Interactive Features

Interactive features improve the user experience and make it easier to explore the data. This section describes those that are the most common and the most useful to users. They can be divided into two groups: basic (built-in) features and features supported by plug-ins. The former are part of the tool itself and can be used even when a plug-in feature is active; the latter must be activated and can only be used one at a time.

Figure 2.8: Sample box plot (vertical axis: ceramic strength; series: Batch 1, Batch 2)

2.3.1 Built-in

1. Zooming: when the user selects a part of a graph (using a mouse or another pointing device), the IVP tool can zoom in on the selection or create a new window containing the zoomed selection (Figure 2.9). Zooming is a convenient way of focusing on smaller portions of the graphical representation and provides an easy way to explore the data in more detail.

2. Adding data sets to a graph: selecting any of the loaded data sets (or whole files) in the tool’s data window, dragging them over to the graph area and dropping them onto the appropriate axis assigns them to that axis and displays the result immediately (Figure 2.10). This interactive feature allows the user to change which data the graph depicts and to compare the results instantly.

Figure 2.9: Graph before and after the zoom action (the marked area was selected by the pointing device)

Figure 2.10: Graph before and after the addition of a new data set to the y axis (the arrow symbolizes the drag and drop action)

Figure 2.11: Graph before and after the breaking of a coordinate axis (the marked area is the selection made by the pointing device)

2.3.2 Supported by Plug-ins

1. Breaking axes: when the user selects a part of the graph or of an axis, the IVP tool excludes it from the graph, i.e. breaks the coordinate axes (Figure 2.11). Breaking coordinate axes allows the user to effectively display data of different ranges, or data with a large variance in magnitude, in one plot. It also allows the user to exclude irrelevant or unwanted values from the graph.

2. Scaling axes: by clicking somewhere in the graph area, the user can set the selected point as the maximum or minimum of one of the axes (or of both, see Figure 2.12). This feature is provided for convenience: it allows the user to easily set the extremes within the graph area and serves the same purpose as the zooming feature.


Figure 2.12: Graph before and after the scaling of coordinate axis (the circle with dark dot inside shows where the click event occurred)

2.4 Drawing in Layers

The IVP tool is able to easily create graphs using data sets from various files. Graph plug-ins have access to all the data sets loaded into program memory and can present them in an arbitrary way.

2.5 Saving and Loading Parameterized View

The process of creating a graph using both the interactive and non-interactive features may involve many steps. To recreate the same graph and continue working on it later, the IVP tool can save the view of the graph to an XML file and load it again when needed.

2.6 Input Formats

The IVP tool supports import from CSV (comma-separated values) files. CSV is a simple but widely used format for storing data in a human-readable manner. Columns are separated by a comma or a whitespace character (space, tab, etc.). The input data for a graph can be stored in several files, which provides flexibility when working with data. The IVP tool uses heuristics to determine what the value separator is and whether the first line contains headers with the names of the data sets. If it fails to guess the proper settings automatically, the user can adjust the data import settings manually.

2.7 Output Formats

The preferred format for bitmap output is the lossless PNG (Portable Network Graphics). Other formats are supported, too: BMP (Windows Bitmap), JPG (Joint Photographic Experts Group), PPM (Portable Pixmap), XBM (X11 Bitmap) and XPM (X11 Pixmap).

The preferred formats for vector output are PDF (Portable Document Format) and PS (PostScript).

2.8 Command Line Functions

The IVP tool can be run in both interactive and batch mode. In batch mode, the user can use a previously created file containing information about the view, or specify the options by passing parameters to the program (for further details, see the User Documentation in Appendix A).


Chapter 3 Implementation

The design of the IVP tool is affected by the fact that it must be able to run both in the command line mode and in the interactive mode with graphic user interface (GUI), and also by the extensive use of plug-ins to provide (and extend) its features. This chapter presents a detailed description of the implementation of the features described in chapter 2.

The plug-ins, which are in fact shared objects (also called dynamically linked libraries), need to use several objects from the tool. These objects are grouped into a core library called ivp_corelib, which is linked with both the main tool and the plug-ins.

The IVP tool uses the Qt library [11] to create its GUI and takes advantage of Qt’s powerful paint system and convenient plug-in API.

3.1 Core Library

The core library consists of

• basic classes representing the graph and its components – coordinate axes, legend and its visual properties and

• shared GUI elements used by both the plug-ins and the main tool.

3.1.1 Graph

The most important class of the library is the Graph class (Figure 3.1). It represents a graph as an abstract object formed by coordinate axes, a graph legend, the plotting area, and the titles and labels of the graph and its axes. Coordinate axes and the graph legend are also represented by separate classes: every Graph object can create and use several instances of the Axis class and one instance of the GraphLegend class. The Graph class also needs to access the data loaded into the application, therefore a reference to the data management class iPA (intelligent Pointer Array) needs to be passed to the Graph object on construction. It also needs an instance of the QPainter class, which is used for painting on all QPaintDevices, and one instance of the View class, which represents various visual aspects of the graph. If the IVP tool runs without a GUI, the Graph object also takes care of loading the plug-ins into the tool (using the PluginManager class) and of creating the paint device (an instance of the QImage class) that the plug-ins paint on (see also Figure 3.1).

The methods of the Graph class provide various ways of working with the coordinate axes: assigning a data set to an axis, removing a data set from an axis, adding an interval to an axis or removing one from it, adjusting an axis automatically, and drawing an axis with ticks and labels. The class also supports plotting a graph using the graph plug-ins and interactive plug-in extensions, and storing arbitrary numeric graph-specific data for the plug-ins (e.g. frequencies for the histogram and bihistogram graph types). To allow the plug-ins to store a large amount of organized data, this arbitrary numeric information is stored as a two-dimensional vector of doubles.

3.1.2 View

The View class is used for storing various graph-specific visual properties. It stores the image and graph size, the offset of the graph from the borders of the image, the type of value marks on the graph, the background color of the graph area, the font styles for headings, etc. (see also Figure 3.2). The View class also stores information about the currently active graph and interactive plug-ins. The motivation for this class was to simplify the implementation of the undo feature: all the user-modifiable properties of the graph itself are concentrated in the View class, whose instances are copied during undo/redo operations.

3.1.3 Axis

Another class used by the Graph class is the Axis class. It is the basic abstraction representing coordinate axes in the tool. Every Axis is formed by a vector of Intervals (see also 3.1.5), which represent the areas visible in the graph (and on the coordinate axes). To support zooming and breaking of coordinate axes, Intervals can easily be added to or removed from the Axis. The intervals are closed, so intervals that share an end point are merged: if the intervals assigned to the axis are [0; 5] and [10; 15], adding the interval [5; 10] results in an axis with the single interval [0; 15].

The intervals produced by interactive actions like zooming and breaking axes usually do not have very ‘nice’ extreme values (minimum and maximum). The IVP tool can be set to adjust the extremes of the produced intervals automatically, which usually makes the interactive features more pleasant to use. The algorithm used to adjust the borders of an interval is described in [4]. Another way of improving the user experience is that graph plug-ins can adjust the extremes of each coordinate axis according to the type of graph and the way it presents the data (for further details, see also 3.5.3).
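The behaviour of closed intervals on an axis, as in the example with [0; 5], [10; 15] and [5; 10], can be sketched in a few lines of C++. The struct Interval and the function addInterval below are simplified, hypothetical stand-ins written for this illustration; they are not the actual IVP classes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for IVP's Interval class: a closed interval [min, max].
struct Interval {
    double min;
    double max;
};

// Adds `added` to a list of closed intervals and returns the list sorted by
// lower bound, with all overlapping or touching intervals fused into one.
std::vector<Interval> addInterval(std::vector<Interval> axis, Interval added)
{
    axis.push_back(added);
    std::sort(axis.begin(), axis.end(),
              [](const Interval& a, const Interval& b) { return a.min < b.min; });
    std::vector<Interval> merged;
    for (const Interval& iv : axis) {
        // Closed intervals: sharing a single end point is enough to fuse.
        if (!merged.empty() && iv.min <= merged.back().max)
            merged.back().max = std::max(merged.back().max, iv.max);
        else
            merged.push_back(iv);
    }
    return merged;
}
```

Adding [5; 10] to an axis holding [0; 5] and [10; 15] returns the single interval [0; 15], whereas adding [6; 8] leaves three separate intervals, i.e. an axis with two breaks.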


The Axis class keeps track of which data sets are assigned to it and supports assigning and removing them. To represent this information, it uses an array of pointers (QList, see [11] for further details) to data set identification numbers (integers). The Axis object is also aware of the real dimension of the axis (how many pixels are reserved for it), so it can compute the scale of the coordinate axis and adjust it when the graph is resized. Axis objects also hold information about their visual appearance, from their thickness and the count and style of ticks to the font styles used for their name and labels. For this reason, the Axis objects are also part of the state structure (StateStruct, for further details see section 3.6) used for undo/redo purposes.
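One way to picture how a scale can be derived from the pixel size of an axis that consists of several visible intervals is sketched below. It is an assumption-laden illustration, not the IVP implementation: the names Interval, axisScale and toPixel are hypothetical, and the space taken by the break marks themselves is ignored.

```cpp
#include <vector>

// Hypothetical stand-in for one visible part of an axis (closed interval).
struct Interval {
    double min;
    double max;
};

// Pixels per data unit: the pixel length of the axis divided by the total
// length of the visible intervals (break marks are ignored in this sketch).
double axisScale(const std::vector<Interval>& visible, double pixels)
{
    double total = 0.0;
    for (const Interval& iv : visible)
        total += iv.max - iv.min;
    return total > 0.0 ? pixels / total : 0.0;
}

// Pixel offset of `value` from the start of the axis, or -1 if the value
// lies in a hidden (broken-out) part. `visible` must be sorted and disjoint.
double toPixel(const std::vector<Interval>& visible, double pixels, double value)
{
    const double scale = axisScale(visible, pixels);
    double before = 0.0;  // visible length to the left of the current interval
    for (const Interval& iv : visible) {
        if (value >= iv.min && value <= iv.max)
            return (before + value - iv.min) * scale;
        before += iv.max - iv.min;
    }
    return -1.0;
}
```

With the visible intervals [0; 5] and [10; 15] on a 200-pixel axis, the scale is 20 pixels per unit and the value 12 is drawn at offset 140. Resizing the axis to 400 pixels only changes the `pixels` argument, which doubles the scale.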

3.1.4 Discussion about coordinate axes

The Axis class is designed to be as independent as possible. However, the fact that an Axis must be prepared to be broken in the middle could not be avoided. If the axis were totally unaware of the possibility of breaks, it would be very difficult to construct an interactive plug-in that ‘breaks’ the axis. The first reason is visual: the axis break is a special, visually distinctive element that would have to be drawn by the interactive plug-in. Determining the position of the breaks, their proper scaling and their orientation would all happen in the interactive plug-in (not to mention problems with graph labels), whose purpose is to break the axis, not to take care of these secondary concerns. Plug-ins should have to deal only with what is necessary, so that their authors can experiment with the representation of the graph without struggling with marginal issues. This is why the Axis class is prepared (at least) for breaking the axis and for holding several non-contiguous intervals. This may not be a universal solution, but it at least extends the possibilities of the graph’s coordinate axes, and the feature could be used by other types of interactive plug-ins, too. The requirement that the Axis class should know as little as possible about modifications of the graph is also the reason why the axes (including their labels) are drawn in the Graph class.

3.1.5 Interval

The Axis class depends on the Interval class, because one of the basic pieces of information that a coordinate axis holds is which intervals are visible on the graph. The Interval class provides a convenient way of working with closed intervals (i.e. with the minimum and maximum belonging to the interval). It supports determining the length of an interval and the mutual position of several intervals, and adding intervals to and removing them from a list of intervals (e.g. the one forming a coordinate axis).


3.1.6 GraphLegend

The GraphLegend class, as its name suggests, is the abstraction of the legend on the graph. An instance of this class holds information about the legend’s visibility, background color and position on the graph, and the style of the font used for drawing it. The legend is able to draw itself, since it has access to the QPainter of the Graph object and, unlike the coordinate axes, its appearance is not supposed to be modified by plug-ins. Because the GraphLegend object also holds information about the appearance of a specific part of the graph, it is a part of the state structure used for the undo/redo feature (for details see section 3.6).

3.1.7 Data management system

The iPA (intelligent Pointer Array) class forms the data management center of the IVP tool. It is intended to be used as a singleton (although nothing in its design prevents a programmer from creating and using more than one instance). It is able to import data from CSV files, use heuristics to analyze the file, and provide the tool with fast and straightforward access to the imported data sets.

The imported data sets are saved in a two-dimensional QVector of double-precision numbers. The QVector class was chosen because it is native to the Qt library, and thus portable, and because it supports fast appending (amortized time complexity O(1)) and index look-up (time complexity O(1)). These two operations are the only ones used: the former during the import of a CSV file into the program’s memory, the latter during the plotting of graphs. According to my own measurements, QVector outperforms other Qt storage classes such as QList or QLinkedList ([11] provides further details about these classes). The disadvantage of the QVector container, and the price paid for fast index-based access, is that it stores the values in adjacent memory positions, which means that it may not be able to find a contiguous chunk of memory large enough to store the data from the imported file.

This disadvantage becomes even more pronounced because, immediately after the data import, the iPA class computes some basic statistical information: the median, minimum, maximum, and first and third quartiles. This information is obtained by sorting a copy of the original QVector object. As a consequence, the IVP tool needs more than twice as much memory as the size of the imported data (the sorted copy takes as much memory as the original, and quicksort needs some additional working space). However, the statistics are saved at the time of the data set import and do not need to be computed again during the same run of the IVP tool. Obtaining them is not very expensive, since the quicksort algorithm runs in O(n log n) time on average, where n is the number of sorted items. So even if some plug-ins do not need this information, computing it does not hurt the overall performance of the IVP tool.
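The idea of computing the basic statistics once from a sorted copy can be sketched as follows. The sketch uses std::vector and std::sort from the C++ standard library in place of QVector and Qt's sorting routine, and the names Summary, medianOf and summarize are hypothetical. The quartiles are taken as the medians of the lower and upper halves, which is one of several common conventions and not necessarily the one used by iPA.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical summary record (not the iPA data layout).
struct Summary {
    double min, q1, median, q3, max;
};

// Median of the sorted, non-empty index range [first, last) of `s`.
static double medianOf(const std::vector<double>& s,
                       std::size_t first, std::size_t last)
{
    const std::size_t n = last - first;
    const std::size_t mid = first + n / 2;
    return n % 2 == 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
}

// Precondition: `data` is not empty. The input itself is left unsorted.
Summary summarize(const std::vector<double>& data)
{
    std::vector<double> sorted(data);         // the copy doubles memory use
    std::sort(sorted.begin(), sorted.end());  // O(n log n) comparisons
    const std::size_t n = sorted.size();
    const std::size_t half = n / 2;
    Summary s;
    s.min = sorted.front();
    s.max = sorted.back();
    s.median = medianOf(sorted, 0, n);
    // For odd n the middle element is left out of both halves.
    s.q1 = n > 1 ? medianOf(sorted, 0, half) : s.median;
    s.q3 = n > 1 ? medianOf(sorted, n - half, n) : s.median;
    return s;
}
```

For the unsorted input 7, 1, 5, 3, 2, 6, 4 the sketch returns minimum 1, first quartile 2, median 4, third quartile 6 and maximum 7, while the original order of the data is preserved for plotting.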

The iPA class also stores the names of the imported data sets (which the user can adjust to simplify their identification), can tell which data set comes from which file, shows a progress bar during longer data imports, and uses some simple heuristics to analyze the input CSV file. The first heuristic determines whether the first line of the imported file contains the names of the data sets. It is very simple: it searches the line for a letter, and if it finds one, it assumes that the first line contains names. The second heuristic tries to guess the value separator used in the data file (to enhance the user experience and possibly simplify the data import). Its algorithm is quite simple, too. It looks for the first character that is not a digit. If all the characters on the line are digits, it assumes that the newline character is the separator. Otherwise, the separator is the first non-digit character. An example of a confusing and ambiguous string is “123.456,789”. There is no way to tell whether the file uses the comma or the dot as the decimal separator, so I chose to prefer the English convention and treat such strings as two numbers: 123.456 and 789. A similar problem arises with strings like “123.456” and “123,456”. I chose to treat strings of this kind as two numbers: 123 and 456.
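Both heuristics are short enough to sketch. The code below is a hedged reconstruction from the description, not the tool’s implementation: the function names are hypothetical, std::string stands in for QString, and the tie-break for a line containing both a dot and a comma is an assumption chosen so that the three example strings from the text behave as described.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True when the first line looks like a header, i.e. contains any letter.
// Note that a value such as "1e5" would also be taken for a name.
bool firstLineHasNames(const std::string &line) {
    for (std::size_t i = 0; i < line.size(); ++i)
        if (std::isalpha(static_cast<unsigned char>(line[i])))
            return true;
    return false;
}

// Guesses the value separator from one line of the file.
char guessSeparator(const std::string &line) {
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (std::isdigit(static_cast<unsigned char>(c)))
            continue;
        // Assumed tie-break: a dot that is followed later by a comma is a
        // decimal point, so the comma separates the values.
        if (c == '.' && line.find(',', i) != std::string::npos)
            return ',';
        return c;  // otherwise the first non-digit character wins
    }
    return '\n';   // only digits: one value per line
}

// The example strings from the text, written as checks.
void checkExamplesFromText() {
    assert(guessSeparator("123.456,789") == ',');  // 123.456 and 789
    assert(guessSeparator("123.456") == '.');      // 123 and 456
    assert(guessSeparator("123,456") == ',');      // 123 and 456
}
```

The sketch ignores signs, exponents and quoted fields, which a production CSV reader would have to handle.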

3.1.8 PluginManager and Plug-in Management System

The IVP tool uses one of Qt’s APIs to support the plug-in functionality. In the Qt documentation ([11]), the API providing the means for extending Qt applications is called the lower-level API. The following steps create an application with a functional plug-in system:

1. Create and declare a set of interfaces (i.e. abstract classes with only pure virtual functions) for the plug-ins.

2. Use the Q_DECLARE_INTERFACE macro to inform Qt’s meta-object system about the interfaces.

3. In the application, load the plug-ins using the QPluginLoader class and test them using qobject_cast.

The interfaces declared for three types of plug-ins used in IVP are shown in Figure 3.3:

    class GraphPluginInterface {
    public:
        virtual ~GraphPluginInterface() {}

        virtual QStringList graphNames() const = 0;
        virtual int plotGraph( const QString &graphName,
                               const Graph &myGraph,
                               const iPA &myData ) = 0;
        virtual GraphPluginWidget* settingsWidget( const QString &graphName,
                                                   const Graph &myGraph,
                                                   QWidget *parent ) = 0;
        virtual void adjustGraph( const QString &graphName,
                                  Graph &myGraph,
                                  const iPA &myData ) = 0;
    };

    class InteractivePluginInterface {
    public:
        virtual ~InteractivePluginInterface() {}

        virtual QStringList interactivePlugins() const = 0;
        virtual void alterGraph( Graph *myGraph, const QRectF &rect ) = 0;
        virtual int plotGraph( Graph *graph ) = 0;
        virtual InteractivePluginWidget* settingsWidget( const QString &pluginName,
                                                         const Graph &myGraph,
                                                         QWidget *parent ) = 0;
        virtual bool paintReplace() = 0;
    };

Figure 3.3: Plug-in interfaces used in IVP

The QPluginLoader class detects and loads the plug-ins (i.e. shared libraries) into the IVP tool at runtime. It also checks whether the plug-in is linked against the same version of the Qt library as the application, and it provides a convenient way of accessing the components of the plug-in without the need to resolve function names manually. The qobject_cast function works similarly to dynamic_cast in C++, except that it does not need RTTI support and works across library boundaries. It requires the object being cast to inherit (at least indirectly) from the QObject class and to be declared using the Q_OBJECT macro; a plug-in interface used as the target type must be registered with Q_DECLARE_INTERFACE (for further details, see [11]). Because of these features, the instance created by QPluginLoader can be tested with qobject_cast to find out which interface it implements. Only a cast to the proper interface returns a valid pointer to the plug-in instance; casts to other interfaces return zero.
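The probing idea can be shown without Qt. In the sketch below, dynamic_cast stands in for qobject_cast (so, unlike the Qt function, it does need RTTI), PluginObject stands in for QObject, and all type names are hypothetical simplifications of the interfaces in Figure 3.3.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Common base playing the role of QObject in this sketch.
struct PluginObject {
    virtual ~PluginObject() {}
};

// Simplified stand-ins for the two plug-in interfaces.
struct GraphInterface {
    virtual ~GraphInterface() {}
    virtual std::vector<std::string> graphNames() const = 0;
};

struct InteractiveInterface {
    virtual ~InteractiveInterface() {}
    virtual bool paintReplace() const = 0;
};

// A hypothetical plug-in that implements only the graph interface.
struct BarPlotPlugin : PluginObject, GraphInterface {
    std::vector<std::string> graphNames() const override {
        return std::vector<std::string>(1, "Bar plot");
    }
};

enum class PluginKind { Graph, Interactive, Unknown };

// Tries each interface in turn; a failed cast yields a null pointer,
// just as qobject_cast returns zero for an interface that is not
// implemented.
PluginKind classify(PluginObject *plugin) {
    assert(plugin != nullptr);
    if (dynamic_cast<GraphInterface *>(plugin))
        return PluginKind::Graph;
    if (dynamic_cast<InteractiveInterface *>(plugin))
        return PluginKind::Interactive;
    return PluginKind::Unknown;
}
```

The loader never needs to know the concrete plug-in class; it only asks which interface the loaded object supports.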

The plug-in itself is a shared library, sometimes also called a dynamically linked library. To write a plug-in that loads into the application successfully, one must:

1. Declare a plug-in class that inherits from both the QObject class and the interface it implements.

2. Use the Q_INTERFACES macro in the class declaration to inform Qt’s meta-object system about the interfaces the class implements.

3. Use the Q_EXPORT_PLUGIN2 macro in the implementation file of the class.

Besides these generic rules, valid for every low-level plug-in extending a Qt application, the plug-ins used in IVP must respect some additional rules to be fully usable in the tool and its graphical user interface. The most basic requirement is that every plug-in should provide a method that returns the names or identifications of the actions it is able to perform (e.g. a graph plug-in has a method graphNames, which returns the names of the graph types it is able to draw, such as “Bar plot”). These names then identify the plug-in’s actions throughout the IVP tool. Another requirement is enforced by the use of a GUI in the tool. The graph plug-ins, as well as the interactive plug-ins, should provide a widget that is used for interactive communication between the user and the plug-in and for convenient access to the plug-in settings (for further information, see also 3.1.9).

The PluginManager class is a wrapper around the QPluginLoader class that simplifies the use of plug-ins in the IVP tool. An instance of this class is constructed during the start-up of the IVP tool, and it tries to load all the plug-ins residing in the “./plugins/” directory automatically (see also Figure 3.4).

When qobject_cast succeeds in casting the plug-in to one of the interfaces, the PluginManager creates an instance of the QAction class and passes the identification string of the plug-in action and the pointer to the plug-in object to QAction’s constructor as the text and parent parameters, respectively. The PluginManager also saves the newly created QAction object to the list of plug-ins of the appropriate type (the PluginManager uses three QLists to store the QAction objects) and creates a connection between the triggered signal of the QAction and the appropriate function that starts the plug-in action (e.g. for a graph plug-in, a connection between the QAction::triggered() signal and the plotGraph slot of the PluginManager’s parent is created; for more information about Qt’s signals and slots mechanism, see [11]). See also Figure 3.5.

    foreach (QString fileName, pluginsDir.entryList(QDir::Files)) {
        QPluginLoader loader(pluginsDir.absoluteFilePath(fileName));
        QObject *plugin = loader.instance();
        if (plugin) {
            GraphPluginInterface *gPlugin =
                qobject_cast<GraphPluginInterface *>(plugin);
            if (gPlugin)
                addPluginActions(plugin, gPlugin->graphNames(),
                                 SLOT(plotGraph()));
        }
    }

Figure 3.4: Determining the type of plug-in by qobject_cast

    void addPluginActions( QObject *plugin, const QStringList &texts,
                           const char *member, QMenu *menu ) {
        foreach (QString text, texts) {
            QAction *action = new QAction(text, plugin);
            connect(action, SIGNAL(triggered()), this, member);
        }
    }

Figure 3.5: Creating QAction objects for the plug-in actions

The PluginManager class also provides methods for searching plug-ins by their identification text or by their index (the ordinal number of their loading into the IVP tool), as well as easy ways of accessing particular types of plug-ins. During the IVP tool’s start-up, it is also used to find an appropriate plug-in that can perform the drawing of the graphs (see also section 3.5.4).
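The bookkeeping described in this section (one action per plug-in name, look-up by text or by loading order) can be sketched in standard C++. This is an analogy rather than the PluginManager source: std::function replaces the signal-slot connection, and the names Action, ActionRegistry, findByText and at are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One entry per action a plug-in offers (plays the role of a QAction).
struct Action {
    std::string text;                 // identification shown in the menus
    std::function<void()> triggered;  // stands in for the triggered() signal
};

class ActionRegistry {
public:
    // Registers one action per name; the order of registration defines
    // the index of the action.
    void addPluginActions(const std::vector<std::string> &texts,
                          const std::function<void(const std::string &)> &slot) {
        for (std::size_t i = 0; i < texts.size(); ++i) {
            const std::string text = texts[i];
            Action action;
            action.text = text;
            action.triggered = [slot, text]() { slot(text); };
            actions_.push_back(action);
        }
    }

    // Index of the action with the given text, or -1 when there is none.
    int findByText(const std::string &text) const {
        for (std::size_t i = 0; i < actions_.size(); ++i)
            if (actions_[i].text == text)
                return static_cast<int>(i);
        return -1;
    }

    const Action &at(std::size_t index) const {
        assert(index < actions_.size());
        return actions_[index];
    }

    std::size_t count() const { return actions_.size(); }

private:
    std::vector<Action> actions_;
};
```

Since the names double as look-up keys, two plug-ins that return the same name would shadow each other in findByText, which is why the names should not collide.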

3.1.9 Shared Graphic User Interface Elements

Several GUI elements need to be used by both the IVP tool and the plug-ins. These are grouped together in the tool’s core library so that the separate parts of the IVP tool can share them.

The GraphSettingsDialog class is used for creating new graphs and for adjusting their settings. It is formed by four separate widgets managed by a QTabWidget, which takes care of showing them as tabs of the same dialog window. Three of these widgets (AxesTab, LegendTab and TitlesTab) are ordinary widgets, descendants of the QWidget class, that can change various graph settings. The GraphTypeTab widget, however, is a special kind of widget that is able to show the settings widgets provided by the graph plug-ins.

When the GraphTypeTab widget is instantiated, it asks all the graph plug-ins to supply it with their widgets. These widgets are descendants of the GraphPluginWidget class, which declares the methods the settings widgets must support to be able to communicate with the GraphTypeTab and GraphSettingsDialog objects. The GraphTypeTab saves all the important information from the settings widgets in three QStrings: xAxis, yAxis and arbitrary. Every graph plug-in settings widget (i.e. every class that inherits GraphPluginWidget) should implement its own methods for writing the plug-in’s settings to GraphTypeTab’s strings when the user confirms the dialog, and for reading these settings back when the dialog is constructed again. These methods are called write and read, respectively (see Figure 3.6 for a comprehensive scheme).

To make the creation of a graph more convenient for the user, and the graph plug-ins more versatile, the graph plug-ins are able to set the initial range of the coordinate axes. These initial ranges are computed not only when the dialog window is confirmed, but also when the user switches from the GraphTypeTab to the AxesTab. This feature is supplied by the axisRange function of GraphPluginWidget’s child classes. Another useful feature is provided by GraphPluginWidget’s validate method. This function is called when the user clicks the OK button to confirm the dialog, and it gives the graph plug-in a way to validate the data entered by the user. The GraphSettingsDialog cannot be confirmed until the validate method returns true, so this method must be implemented with care in the plug-ins.
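The write/read/validate protocol can be illustrated with a small stand-alone sketch. It is not taken from the IVP sources: StoredSettings models the three strings kept by GraphTypeTab, while BucketSettings, its fields, the "buckets=" encoding and the confirm function are hypothetical and serve only to show the round trip.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The three strings that the text attributes to GraphTypeTab.
struct StoredSettings {
    std::string xAxis, yAxis, arbitrary;
};

// Hypothetical settings widget of a histogram-like plug-in, without GUI.
class BucketSettings {
public:
    std::string dataSet;  // name of the data set shown on the x axis
    std::string buckets;  // raw user input for the number of buckets

    // Accepts only a non-empty data set name and a positive integer.
    bool validate() const {
        if (dataSet.empty() || buckets.empty())
            return false;
        bool nonZero = false;
        for (std::size_t i = 0; i < buckets.size(); ++i) {
            if (buckets[i] < '0' || buckets[i] > '9')
                return false;
            if (buckets[i] != '0')
                nonZero = true;
        }
        return nonZero;
    }

    // Called when the dialog is confirmed, after validation succeeded.
    void write(StoredSettings &out) const {
        assert(validate());
        out.xAxis = dataSet;
        out.arbitrary = "buckets=" + buckets;
    }

    // Called when the dialog is constructed again.
    void read(const StoredSettings &in) {
        const std::string key = "buckets=";
        dataSet = in.xAxis;
        buckets = in.arbitrary.compare(0, key.size(), key) == 0
                      ? in.arbitrary.substr(key.size())
                      : std::string();
    }
};

// Models the OK button: nothing is stored unless validate() succeeds.
bool confirm(const BucketSettings &widget, StoredSettings &store) {
    if (!widget.validate())
        return false;
    widget.write(store);
    return true;
}
```

A validate method that can never return true would make the dialog impossible to confirm, which is the reason for the warning above.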

Figure 3.6: The concept of GraphSettingsDialog

The InteractivePluginSettingsDialog class is used for adjusting the settings of interactive plug-ins. The principle of its implementation is the same as for the GraphSettingsDialog class. It displays the settings widgets provided by the interactive plug-in’s settingsWidget function, which are descendants of the InteractivePluginWidget class.

3.2 Command Line Mode

The command line mode uses only the core of the IVP tool and the plug-ins. Since the tool uses the Qt library, it must first construct an instance of the QApplication class. Without this object, some important features such as font drawing would be impossible; the object is also crucial for Qt’s signals and slots mechanism. Besides the Qt library’s requirements, the IVP tool must also construct its data management object, iPA, and an instance of the basic graph-representing class, Graph. During the construction of the Graph object, if the application runs in command line mode, the ConsoleSettingsReader class, together with IvpXMLparser, loads the user settings passed to the program as parameters, plots the graph according to those settings and, if no errors occur, creates an image as the program output.

3.3 Interactive Mode

The interactive mode of the IVP tool starts a GUI, which uses the multiple document interface (MDI) paradigm to allow users to create and work with several graphs simultaneously. At the beginning, the basic class for Qt library applications, QApplication, is constructed. The basic window, which manages all the other windows, widgets and dialogs, is an instance of the MainWindow class. During the construction of the MainWindow object, the data and plug-in management objects (iPA and PluginManager) are created, too. The IVP tool menu is constructed together with the QAction objects needed for its functionality, and the signal-slot mechanism is used for updating the availability of menu items. The MainWindow then also creates two dock widgets and takes care of updating them with proper data. The first QDockWidget presents basic information about the currently active graph (i.e. the name of the graph, the currently active data sets and the range of the coordinate axes). The second QDockWidget holds information about the imported data sets. It uses a modified QTreeWidget, TreeDragWidget, that supports interactive adding of data sets to a graph by drag and drop (for further information about drag and drop in Qt, see [11]).

The MainWindow class uses Qt’s QWorkspace class to manage the ChildWindows that represent particular graphs. Nearly all the graph actions are passed from the MainWindow to the currently active ChildWindow. The MainWindow object usually displays dialog widgets (e.g. csvDialog for opening files or GraphSettingsDialog for creating new graphs and changing the settings of existing ones) and then passes control to the appropriate ChildWindow object.

The ChildWindow class, which inherits the QWidget class, represents a graph in interactive mode and processes all kinds of user-generated events. It creates its own QPaintDevice, a QPixmap, which is then passed to the newly constructed Graph object (along with references to the already existing iPA and PluginManager objects). The painter of the Graph object then paints directly onto the ChildWindow’s pixmap, using plug-ins from the PluginManager object and data from the iPA object.

The ChildWindow class also instantiates the UndoManager class, which provides undo and redo features for every window (and graph). For detailed information about the undo/redo feature, see 3.6.

3.4 Plotting System

The system for plotting graphs is quite sophisticated. The basic ideas are, however, simple and straightforward. At some point, the graph plug-in function must be called to plot the graph. The graph plug-in plots only the graph itself. The coordinate axes, legend and graph titles are drawn by the interactive plug-in plotting function. This approach was chosen to make both the graph and interactive plug-ins as independent as possible. When the graph needs to be plotted,

1. the plotting function first calls Graph’s adjustByGraphPlugin function, which calls the graph plug-in’s method to prepare the graph settings and coordinate axes according to plug-in specific needs. Then

2. Graph’s plotGraph function is called. This function finds out which paint-replace interactive plug-in is active (this information should already be set and stored in the View object) and

3. uses the paint-replace¹ interactive plug-in’s plotting function. The interactive plug-in’s function takes care of plotting the coordinate axes and the graph legend and, in turn,

4. uses Graph’s purePlotGraph function to plot the graph using the graph plotting plug-in. The appropriate graph plug-in is determined by the graph name and is then supplied by the PluginManager object (for further information regarding PluginManager, see section 3.1.8).

This combination of graph plug-ins and interactive plug-ins ensures that graph plug-ins only take care of plotting the graph itself, while interactive plug-ins remain very versatile.
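The four steps form a fixed call chain, which the following sketch reproduces with stand-in classes that only record the order of the calls. The function names adjustByGraphPlugin, plotGraph and purePlotGraph come from the text; everything else (CallLog, the plot entry point, the constructor, and the choice to draw the graph before the axes) is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> CallLog;

// Stand-in for a graph plotting plug-in.
struct GraphPlugin {
    void adjustGraph(CallLog &log) { log.push_back("graph plug-in: adjustGraph"); }
    void plotGraph(CallLog &log) { log.push_back("graph plug-in: plotGraph"); }
};

class Graph;

// Stand-in for the active paint-replace interactive plug-in.
struct PaintReplacePlugin {
    void plotGraph(Graph &graph, CallLog &log);
};

class Graph {
public:
    Graph(GraphPlugin *graphPlugin, PaintReplacePlugin *paintPlugin)
        : graphPlugin_(graphPlugin), paintPlugin_(paintPlugin) {
        assert(graphPlugin_ != nullptr && paintPlugin_ != nullptr);
    }

    // Entry point: step 1 followed by step 2.
    void plot(CallLog &log) {
        adjustByGraphPlugin(log);
        plotGraph(log);
    }

    void adjustByGraphPlugin(CallLog &log) { graphPlugin_->adjustGraph(log); }

    // Steps 2 and 3: delegate to the active paint-replace plug-in.
    void plotGraph(CallLog &log) { paintPlugin_->plotGraph(*this, log); }

    // Step 4: only the graph itself, drawn by the graph plug-in.
    void purePlotGraph(CallLog &log) { graphPlugin_->plotGraph(log); }

private:
    GraphPlugin *graphPlugin_;
    PaintReplacePlugin *paintPlugin_;
};

void PaintReplacePlugin::plotGraph(Graph &graph, CallLog &log) {
    log.push_back("interactive plug-in: plotGraph");
    graph.purePlotGraph(log);
    log.push_back("interactive plug-in: axes, legend and titles");
}
```

Because the interactive plug-in decides when purePlotGraph is called, it can draw the graph once, several times with different offsets, or not at all.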

3.5 Plug-in System

The IVP tool was designed to be easily extensible by creating new plug-ins. This decision not only improves the extensibility and usability of the tool, but could also prolong its lifetime. The IVP tool implements two types of plug-ins: graph plotting plug-ins and interactive plug-ins. Both plug-in types are shared objects that use Qt’s low-level API to extend the functionality of the tool (for further implementation details, see 3.1.8). Because the plug-ins usually offer drawing customizable by several parameters, they need a way to communicate with the user. The IVP tool requires them to provide a widget that the user can interact with. The tool’s Graph and View classes can store arbitrary information from plug-ins as QStrings.

3.5.1 Interfaces

Two abstract classes with only pure virtual methods were designed to provide the interfaces for graph and interactive plug-ins. A plug-in must inherit one of these classes and the QObject class, and implement the virtual methods of the interface. The GraphPluginInterface requires the graph plug-in to implement methods for plotting the graph and for automatic adjustment of the coordinate axes and graph settings, and a method providing a GraphPluginWidget (a GUI element which interacts with the user). The InteractivePluginInterface declares methods for handling mouse events and for managing the graph plotting process and, as for the graph plug-in, it requires a method providing a GUI element (InteractivePluginWidget) that interacts with the user. (For a listing of the interfaces, see Figure 3.3.)

¹ The distinction between paint-replace and only mouse events handling plug-ins is

3.5.2 Lifecycle of Plug-ins

The IVP tool’s plug-ins are always loaded during the start-up of the tool. The PluginManager class is constructed either by the MainWindow or by the Graph object, depending on the user interface mode of the tool. This class makes use of Qt’s QPluginLoader class, which creates instances of plug-in classes from the shared objects that meet all the requirements (for further information about these requirements, see the PluginManager details in 3.1.8). Every plug-in class is instantiated only once, and this instance is used during the application’s whole lifetime. This approach limits the use of plug-in classes: they cannot be used for storing graph-specific data, because the data would be shared among all the graphs that use the plug-in. Because of that, the plug-ins act as external functions providing new functionality.

Access to the plug-ins is provided by the PluginManager class, which stores pointers to the successfully instantiated plug-ins as parents of QAction objects. These objects are created for every action that a plug-in can perform (a graph plug-in can provide several actions). This approach makes calling the main plug-in function easy, because it is connected to the triggered signal of the appropriate QAction.

3.5.3 Graph Plotting Plug-ins

The graph plotting plug-ins draw graphs depending on the underlying data and on the user’s information about its representation. To make these plug-ins as independent as possible, they are not aware of the details of the coordinate axes or of the scaling of the graph. All the required information is provided by the Graph and View classes. Every graph plug-in must implement these methods:

• graphNames – returns the names of the graph types the plug-in is able to draw. Because these names are used as identifiers throughout the IVP tool, they should not collide

• plotGraph – is the worker method of the graph plug-in, which takes care of the plotting itself

• settingsWidget – creates and sets up a settings widget that can interact with the user in the GUI

• adjustGraph – customizes the graph and its axes in various ways

• axisRange – automatically creates an Axis with a range adjusted according to the data shown in the graph

Plotting of the graph happens in the plug-in’s plotGraph function. To make this task as straightforward as possible, the plug-in can use the QPainter object provided by the Graph, which is already prepared for painting. Plotting the graph then usually involves these steps:

1. finding out the ranges of the coordinate axes using the graph.axis(axisName)->min/max() functions,

2. setting up the colors of the pen and brush (graph.painter()->setPen/setBrush()),

3. retrieving the data from the data management object (using the myData.getValue(dataSetNumber, index) function) and

4. drawing the representation of the data using Qt’s QPainter functions, e.g. drawLine, drawPoints, drawRect or drawEllipse.
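The four steps can be illustrated by a tiny line-graph routine. The sketch does not use Qt: RecordingPainter is a hypothetical stand-in that merely records drawLine calls, AxisRange models the min/max pair of an axis, and the rule for skipping invisible segments is an assumption. As in the text, the routine works in data coordinates and leaves scaling to the prepared painter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Line {
    double x1, y1, x2, y2;
};

// Hypothetical painter that records what it is asked to draw.
struct RecordingPainter {
    std::string pen;
    std::vector<Line> lines;
    void drawLine(double x1, double y1, double x2, double y2) {
        const Line line = {x1, y1, x2, y2};
        lines.push_back(line);
    }
};

struct AxisRange {
    double min, max;
};

// Draws a polyline through the points (index, value).
void plotLineGraph(const std::vector<double> &values, const AxisRange &xAxis,
                   RecordingPainter &painter) {
    assert(xAxis.min <= xAxis.max);             // step 1: axis range is known
    painter.pen = "blue";                       // step 2: set up the pen
    for (std::size_t i = 1; i < values.size(); ++i) {
        const double x1 = static_cast<double>(i - 1);
        const double x2 = static_cast<double>(i);
        if (x2 < xAxis.min || x1 > xAxis.max)   // segment entirely outside
            continue;
        // steps 3 and 4: fetch two neighbouring values and connect them
        painter.drawLine(x1, values[i - 1], x2, values[i]);
    }
}
```

A real plug-in would fetch the values through the data management object instead of receiving them as a vector.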

The adjustGraph function is called every time before the graph is plotted, so it can be used for any graph-specific settings (e.g. preparing histogram data, setting up the variables according to user changes, changing the axis range).

The axisRange function is called whenever a new data set is assigned to one of the coordinate axes, to suggest a new range for that axis (it returns a sample Axis object with the suggested interval assigned to it).

The settingsWidget function returns a widget which makes it possible to adjust the graph plug-in settings easily. The widget must inherit the GraphPluginWidget class and provide these methods:

• writeSettings – writes the settings entered by the user to the GraphTypeTab object

• readSettings – reads the settings from the given parameters and sets the widget’s GUI elements appropriately

• validate – ensures that all the data entered by the user are understandable to the plug-in

• axisRange – a wrapper function that calls the appropriate graph plug-in function

The widgets provided by the graph plotting plug-ins become a part of the GraphSettingsDialog, which is used both for creating new graphs and for changing the settings of existing ones. They appear on a stacked widget on the GraphTypeTab settings tab.

Methods for reading and writing settings were introduced to simplify and unify the communication between these external widgets and the core of the IVP tool. The GraphTypeTab widget stores the information from these widgets in three QStrings, two for the coordinate axes and the third for arbitrary settings. The IVP core can easily read the settings written to GraphTypeTab’s internal structures by the writeSettings function, and the settings can then be retrieved and passed as parameters when the settings widget of the graph is shown again (see also 3.1.9 and Figure 3.6).

Histogram and Bihistogram plug-in

While the rest of the graph plug-ins use the general steps described earlier in this section, the graph plug-in used for drawing histograms and bihistograms shows the capability of plug-ins to store graph-related data in the Graph object. Both graph types need the frequencies for each bucket (for a description of the histogram and bihistogram, see section 2.2), which are not present in the tool’s data management system, so the plug-in uses its own function, prepareHistogramData, to count the frequencies. This data is then stored in the Graph object using Graph’s appendAdditionalData method (see also section 3.1.1). To optimize the plug-in, not only the frequencies but also information about the counted data is stored. The frequencies are then recounted only if the user has changed the settings of the graph (e.g. the depicted data set or the number of buckets).

Instead of retrieving the data from the iPA object, the plug-in retrieves it from the Graph object using the getAdditionalData function.
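The recount-only-on-change idea can be sketched as follows. This is not the plug-in’s code: HistogramCache models the additional data stored in the Graph object, countFrequencies and frequenciesFor are hypothetical names, and equal-width buckets spanning the minimum and maximum of the data are an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts how many values fall into each of `buckets` equal-width buckets
// between the minimum and the maximum of the data (assumed layout).
std::vector<std::size_t> countFrequencies(const std::vector<double> &values,
                                          std::size_t buckets) {
    assert(buckets > 0);
    std::vector<std::size_t> freq(buckets, 0);
    if (values.empty())
        return freq;
    const std::pair<std::vector<double>::const_iterator,
                    std::vector<double>::const_iterator>
        mm = std::minmax_element(values.begin(), values.end());
    const double lo = *mm.first;
    const double width = (*mm.second - lo) / static_cast<double>(buckets);
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::size_t b = width > 0.0
                            ? static_cast<std::size_t>((values[i] - lo) / width)
                            : 0;
        if (b >= buckets)
            b = buckets - 1;  // the maximum belongs to the last bucket
        ++freq[b];
    }
    return freq;
}

// Frequencies plus the settings they were computed for; this models the
// additional data kept in the Graph object.
struct HistogramCache {
    bool valid = false;
    std::size_t dataSet = 0;
    std::size_t buckets = 0;
    std::vector<std::size_t> frequencies;
    int recomputations = 0;
};

// Recounts only when the data set or the number of buckets has changed.
const std::vector<std::size_t> &frequenciesFor(HistogramCache &cache,
                                               std::size_t dataSet,
                                               const std::vector<double> &values,
                                               std::size_t buckets) {
    if (!cache.valid || cache.dataSet != dataSet || cache.buckets != buckets) {
        cache.frequencies = countFrequencies(values, buckets);
        cache.dataSet = dataSet;
        cache.buckets = buckets;
        cache.valid = true;
        ++cache.recomputations;
    }
    return cache.frequencies;
}
```

Keeping the cache in the graph rather than in the plug-in matters because a single plug-in instance serves every graph in the application.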

3.5.4 Interactive Plug-ins

The interactive plug-ins provide new functionality for the IVP tool. They can either only react to the mouse release event or also take care of plotting the graph (the latter are called “paint-replace” plug-ins). The former are simpler, because they only implement the function for handling the mouse release event (alterGraph), while the latter must also provide a function that plots the graph and the coordinate axes (plotGraph). The View object stores the information about which paint-replace plug-in and which mouse release event plug-in are active. If an interactive plug-in does not offer the graph plotting function, its alterGraph function can be called by the mouse release event handler, but the plotting of the graph is provided by the last active paint-replace interactive plug-in. This makes the interactive plug-ins versatile and powerful: they can either offer a completely different manner of plotting the graph, or implement just the mouse release event handler.
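The rule that plotting stays with the last active paint-replace plug-in can be expressed in a few lines. The sketch is hypothetical: the View fields, the InteractivePlugin struct and the activate function are invented names, and it assumes that activating a paint-replace plug-in also makes it the handler of mouse release events.

```cpp
#include <cassert>
#include <string>

// What the sketch needs to know about an interactive plug-in.
struct InteractivePlugin {
    std::string name;
    bool paintReplace;  // true when the plug-in also plots the graph
};

// The two pieces of information the text says the View object stores.
struct View {
    std::string paintPlugin;  // last activated paint-replace plug-in
    std::string mousePlugin;  // plug-in handling mouse release events
};

// Activating a plug-in always redirects mouse release events to it, but
// replaces the plotting plug-in only if the new one can plot.
void activate(View &view, const InteractivePlugin &plugin) {
    assert(!plugin.name.empty());
    view.mousePlugin = plugin.name;
    if (plugin.paintReplace)
        view.paintPlugin = plugin.name;
}
```

With this rule, a plug-in that only handles mouse events can be combined with any plotting style the user selected earlier.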

During the start-up of the IVP tool, the PluginManager object tries to use the standard paint-replace plug-in called “Standard 2D”. If this plug-in is not present, the PluginManager object searches for another suitable plug-in. If it finds one, that plug-in becomes the default paint-replace plug-in for every created graph. Otherwise the IVP tool runs in restricted mode.

Standard Plug-in

The Std2DInteractivePlugin class is a basic paint-replace interactive plug-in that is usually the default plug-in when the IVP tool starts. It does not offer any special interactive features; its purpose is to ensure that a plug-in which can plot the graphs exists. It also shows how a sample paint-replace interactive plug-in should work. The plotGraph function obtains the painter from the Graph object and uses the purePlotGraph method to plot the graph by the graph plug-in. Then the painter is translated into the appropriate position to draw the coordinate axes, the legend and the title of the graph (using Graph’s drawAxis and drawLegendAndTitles functions). The plug-in does not provide any widget for adjusting its settings.

Break Axis Plug-in

The BreakAxisInteractivePlugin class is a paint-replace interactive plug-in that allows the user to break the coordinate axes. It implements both the plotGraph and alterGraph functions, which means that it draws the graph in its own manner and can react to mouse release events, too. The breaking of the coordinate axes is accomplished by drawing separate smaller graphs using the appropriate offset and axis settings. The alterGraph function, which is called whenever a mouse release event occurs, removes an interval from an axis depending on the shape and the position of the selected area.
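Removing an interval from an axis that is stored as a list of intervals can be sketched as follows. The representation and the names Interval and removeInterval are assumptions made for illustration; the text only states that the Axis holds several intervals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One visible piece of a (possibly broken) coordinate axis.
struct Interval {
    double lo, hi;
};

// Removes [cutLo, cutHi] from the axis. A piece that contains the cut is
// split into the parts left and right of it; pieces that the cut does not
// touch are kept unchanged.
std::vector<Interval> removeInterval(const std::vector<Interval> &axis,
                                     double cutLo, double cutHi) {
    assert(cutLo <= cutHi);
    std::vector<Interval> result;
    for (std::size_t i = 0; i < axis.size(); ++i) {
        const Interval &part = axis[i];
        if (cutHi <= part.lo || cutLo >= part.hi) {
            result.push_back(part);  // no overlap with the cut
            continue;
        }
        if (cutLo > part.lo) {
            const Interval left = {part.lo, cutLo};
            result.push_back(left);
        }
        if (cutHi < part.hi) {
            const Interval right = {cutHi, part.hi};
            result.push_back(right);
        }
    }
    return result;
}
```

Each remaining interval would then be drawn as one of the smaller graphs mentioned above, shifted by the appropriate offset.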

Scale Axis Plug-in

The ScaleAxisInteractivePlugin is an interactive plug-in that does not provide the paint-replace function; it implements only the alterGraph function. It therefore only adds new functions, which remove intervals from the coordinate axes in response to mouse release events.

3.6 Undo and Redo Features

The undo and redo features are very important for interactive work with a graphical representation of data, because they allow the user to revert unwanted changes to the graph and return to its previous state. The IVP tool offers undo and redo with the number of steps limited only by the size of the computer’s memory.

The undo and redo features were added in the later stages of the application’s development, and as a result they form an isolated part of the tool, the UndoManager class. The basic structure used by the UndoManager class is StateStruct, which stores all the relevant graph settings. It consists of pointers to the View and Legend objects and a QVector of pointers to the coordinate axes. During work with the tool, UndoManager’s saveState function is called whenever the graph settings change. This function creates copies of the objects currently used by the Graph object, stores pointers to them in a StateStruct and pushes this structure onto the stack. When the undo function is called later, the previous settings are retrieved from the StateStruct’s objects by copying them to the Graph object, which restores its previous settings. The redo function works almost the same way, except that it increments the stack index instead of decrementing it.

If the saveState function is called when the stack index points to the middle of the stack, all the states that are higher on the stack are deleted.
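The index-based stack can be sketched generically. The class below is an illustration under assumptions, not the UndoManager source: the name UndoStack and its interface are hypothetical, the states are stored by value rather than as pointers to copied objects, and position_ counts the states up to and including the current one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename State>
class UndoStack {
public:
    UndoStack() : position_(0) {}

    // Stores a new state; any states above the current one are discarded,
    // so redo is no longer possible after a new change.
    void saveState(const State &state) {
        states_.erase(states_.begin() + static_cast<std::ptrdiff_t>(position_),
                      states_.end());
        states_.push_back(state);
        position_ = states_.size();
    }

    bool canUndo() const { return position_ > 1; }
    bool canRedo() const { return position_ < states_.size(); }

    // Moves one step down and returns the state to restore.
    const State &undo() {
        assert(canUndo());
        --position_;
        return states_[position_ - 1];
    }

    // Moves one step up and returns the state to restore.
    const State &redo() {
        assert(canRedo());
        ++position_;
        return states_[position_ - 1];
    }

private:
    std::vector<State> states_;
    std::size_t position_;  // number of states at or below the current one
};
```

Undo and redo differ only in the direction in which the index moves, which matches the description of the two functions above.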

3.7 Loading and Saving View Files

The graphs created by the user may be complex and non-trivial. Therefore, the ability to save the graph settings, load them later and continue working on the graph is necessary.

The QTextStream class is used to write the XML graph settings files in an easy and straightforward way. To read the settings from an XML settings file, the IvpXMLparser class is used. This class uses Qt’s SAX2 interface to read the XML files (for further information about Qt’s SAX2 implementation, see [11]).

Chapter 4

Conclusion

The main goal of the Interactive Data Visualization Project was to design and implement an extensible tool for interactive visualization of scientific data. Except for the non-interactive plug-ins, which were supposed to modify the data and allow the user to apply various statistical techniques, all of the features from the specification have been implemented. Support for non-interactive plug-ins is subject to future work.

The design of the IVP tool evolved continuously during its development. New features were added and various discovered issues were fixed, but the overall design of the tool remained the same. A short characterization of the key design decisions I have made and of some remaining issues that I have identified follows, together with alternative or possible solutions.

The Axis class also includes support for breaking (used by the Break Axis interactive plug-in); however, special requirements of other plug-ins would probably require its further modification. It is quite difficult to anticipate how potential plug-ins might modify the coordinate axes. Drawing of the coordinate axes is another issue whose solution may need future revision. The axes, along with their ticks, labels and names, are drawn by the Graph class, which is another compromise. Either the Axis acts as a simple class that holds several intervals and is unaware of the breaks, or it uses the data management system, knows which graph it belongs to, draws itself, etc. I chose the former, lightweight Axis, because it is easier to manipulate during undo/redo operations and in interactive plug-ins.

A consequence of the versatility of the interactive plug-ins is that the Graph and the tool are unaware of the positions of the objects displayed on the graph. Thus, it is impossible to change the settings of these objects by clicking on them or to move them using a pointing device. It may be useful to represent these objects as descendants of two classes, MovableGraphElement and StaticGraphElement. The legend, the names of the coordinate axes or the graph title (as candidates for descendants of MovableGraphElement) would then be movable or clickable, while the descendants of StaticGraphElement (e.g. the coordinate axes) could only be selected and modified, but not moved. These improvements can be implemented in future versions of the project.

The primary flaw in the design of the iPA class is that it can only import data from CSV files and is incompatible with any other file format. The use of the QVector class for storing the imported data can also be limiting. A database, such as the SQLite database built into Qt, could be used for storing the data, which could simplify work with large data sets. The performance impact of this solution was, however, not evaluated. Another feature subject to future work is the ability to edit the imported data in a spreadsheet-like manner. This should be a simple extension, since the Qt library offers special spreadsheet widgets.

The painting system of the IVP tool is rather simple and it may need revision in the future. The main flaw is that the painting of the graph takes place in the GUI thread. This makes the IVP tool unresponsive when a large data set is plotted. However, this could not be easily changed since the Qt library can draw fonts outside the GUI thread only under certain circumstances. One of the possible solutions may be to plot only the graph itself in a separate thread and the fonts in the GUI thread. This solution is also subject to future work.

Another improvement of the painting system could be accomplished by making use of the hardware acceleration for drawing graphs. The newer versions of the Qt library offer a convenient support for using the OpenGL acceleration, which could be used, as well. Even using pixel and vertex shaders is possible.

Other useful features that could improve the graphs, such as drawing a grid over the graph, drawing axes on both sides of the graph, using axis labels from external files, or coordinate axes with logarithmic scales could also be implemented.

Despite these flaws, the IVP tool allows the user to draw basic graph types and work with them interactively in a convenient way.

Bibliography

[1] Dataplot homepage. http://www.itl.nist.gov/div898/software/dataplot/, May 30, 2008.

[2] gnuplot homepage. http://www.gnuplot.info/, May 30, 2008.

[3] Harris Robert L. (1999): Information Graphics: A Comprehensive Il-lustrated Reference. Oxford University Press.

[4] Heckbert Paul S. (1991): Nice Numbers for Graph Labels. Graphic Gems II, pp. 61-63, Academic Press.

[5] Hornik Kurt (2008): The R FAQ. http://CRAN.R-project.org/doc/FAQ/R-FAQ.html.

[6] Origin homepage. http://www.originlab.com/index.aspx?s=8&lm=10, May 30, 2008.

[7] R project homepage. http://www.r-project.org/, May 30, 2008.

[8] Scott David W. (1992): Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley-Interscience.

[9] Sigma Plot homepage. http://www.systat.com/products/sigmaplot/, May 30, 2008.

[10] NIST/SEMATECH e-Handbook of Statistical Methods. http://www.itl.nist.gov/div898/handbook/, May 30, 2008.

[11] Qt Reference Documentation. http://doc.trolltech.com/4.3/index.html, May 30, 2008.

[12] Tufte Edward R. (1983): The Visual Display of Quantitative Information. Graphics Press, Cheshire.


Appendix A

User Documentation

The IVP tool is modular interactive graphing software that supports easy manipulation of graphical representations of data. This document describes how to obtain, install and use the tool to create graphs according to the user’s needs.

A.1 Package content

The IVP package contains:

• The IVP tool source code

• Qt 4.4.0 library

• User Documentation

• Test data

A.2 Requirements

A.2.1 Hardware Requirements

The IVP tool should work on all architectures where the Qt library is available. It was tested on the x86 and x86-64 architectures.

A.2.2 Software Requirements

The IVP tool depends on the Qt library, which is used both for creating the GUI and for drawing the graphs. It was tested on these configurations:

• Slackware Linux 12.0 and 12.1 with Qt 4.3.3, 4.3.4 and 4.4.0


The tool may also work with older versions of the Qt library, but the newest one, Qt 4.4.0, introduces various useful features that improve the usability of the IVP tool. The current version of the Qt library can be downloaded from the web page of the Trolltech ASA company (http://trolltech.com) or installed from the enclosed package.

To compile the tool, a modern C++ compiler is needed; the compilation was tested with gcc 4.

A.3 Compilation

The compilation of the IVP tool is simple. To start the compilation script on Unix platforms, run

./compile.sh /path/to/Qt4/library/

The script checks whether it can find a suitable version of the Qt library. If it cannot, it asks the user to install a newer version; otherwise it creates the makefiles and compiles the application using the make tool. If the compilation finishes without errors, the application is ready to use.

A.4 Installation

The IVP tool does not need to be installed; it can be run from the directory where it was compiled. However, if you would like to install the IVP tool on your system, run

./install.sh /prefix/

This script copies the IVP binary, library and plug-ins to locations relative to the /prefix/ directory. Installing the application system-wide usually requires root privileges.

The IVP tool can then be started by running the

./ivp

script, which sets the environment variables for the tool and starts it.

A.5 Usage

The IVP tool can work in two modes: the batch mode and the interactive mode. The former is usually faster but less powerful, whereas the latter lets the user use all the interactive features and explore the graphical representation of the data.


A.5.1 Console Mode

The batch (or console) mode of the application is typically used for drawing graphs quickly, using previously saved view files or settings given by command-line parameters. Because the Qt library cannot draw fonts on Linux when the X server is not running, the IVP tool requires a running X server even in this mode. The IVP tool supports these parameters:

• -d "fileName1.csv[1,2];fileName2.csv[3] axisToDraw fileName2.csv axisToDraw",

e.g. -d "myData.csv[0] y myData.csv[1] x" assigns the first data set from the myData.csv file to the y-axis and the second data set to the x-axis. Multiple files are separated by a semicolon. The numbers of the data sets to be assigned to the given axis are listed after the file name in square brackets; as the example shows, the first data set has number 0. If several data sets are assigned to the same axis, a graph with more than one data series is plotted.

• -t "graphType",

e.g. -t "Histogram" draws a histogram from the data specified by -d

• -o "fileName.png res X;res Y",

e.g. -o "myGraph.png 800;600" creates a PNG image with a resolution of 800 by 600 pixels. The format of the output file is determined by the file name extension. Both vector and bitmap output formats are supported.

• -y "min Y value;max Y value",

e.g. -y "500;800" scales the y-axis so that 500 is the minimum and 800 the maximum value

• -x "min X value;max X value", analogous to -y for the x-axis

Other attributes keep their default values; they can be changed through the GUI or by using a view file:

• -v myView.xml

If both -v and other batch arguments are passed, the batch arguments take priority and override the settings stored in the view file.
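The parameter formats listed above can be made concrete with a small parsing sketch. It is written in Python for brevity and is not the IVP implementation; all function names are hypothetical. It assumes that file names contain no spaces, and it returns None for the data set numbers when no square brackets are given, because the behaviour of the tool in that case is not described here.

```python
import os
import re

# One source is a file name with an optional [i,j,...] list of data set numbers.
SOURCE_RE = re.compile(r"^(?P<file>[^\[\]]+)(?:\[(?P<idx>\d+(?:,\d+)*)\])?$")


def parse_data_arg(spec):
    """Parse a -d value into (axis, file, data set numbers) triples."""
    tokens = spec.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("expected '<sources> <axis>' pairs")
    triples = []
    for sources, axis in zip(tokens[0::2], tokens[1::2]):
        for source in sources.split(";"):
            match = SOURCE_RE.match(source)
            if match is None:
                raise ValueError(f"malformed source: {source!r}")
            idx = match.group("idx")
            numbers = [int(i) for i in idx.split(",")] if idx else None
            triples.append((axis, match.group("file"), numbers))
    return triples


def parse_range(spec):
    """Parse a -x or -y value such as '500;800' into (minimum, maximum)."""
    low, high = (float(part) for part in spec.split(";"))
    return low, high


def parse_output_arg(spec):
    """Parse a -o value into (file name, format from extension, width, height)."""
    name, size = spec.rsplit(" ", 1)
    width, height = (int(part) for part in size.split(";"))
    fmt = os.path.splitext(name)[1].lstrip(".").lower()
    return name, fmt, width, height


def merge_settings(view_settings, batch_settings):
    """Batch arguments override the values loaded from the view file."""
    return {**view_settings, **batch_settings}


print(parse_data_arg("myData.csv[0] y myData.csv[1] x"))
print(parse_output_arg("myGraph.png 800;600"))
print(merge_settings({"type": "Histogram", "y": (0.0, 1.0)},
                     {"y": parse_range("500;800")}))
```

The last call mirrors the priority rule: the y-axis range from the view file is replaced by the one given on the command line, while the graph type from the view file is kept.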

A.5.2 Interactive Mode

The interactive mode offers the full-featured environment of the IVP tool. It uses the MDI (multiple document interface) paradigm to allow working with several graphs concurrently. This mode is started by passing the -g parameter to the IVP tool. The main window consists of a workspace, which can contain graphs, dock widgets displaying useful information to the user, and the tool’s menu (Figure A.1).
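How a front end might choose between the two modes can be sketched with Python’s standard argparse module. Only the option letters come from this documentation; the parser itself, the function names and the assumption that the absence of -g means batch mode are illustrative and say nothing about how IVP actually processes its arguments.

```python
import argparse


def build_parser():
    """Command-line parser with the options documented for the IVP tool."""
    parser = argparse.ArgumentParser(prog="ivp")
    parser.add_argument("-g", action="store_true", help="start the interactive mode")
    parser.add_argument("-d", metavar="DATA", help="data sets and their axes")
    parser.add_argument("-t", metavar="TYPE", help="graph type, e.g. Histogram")
    parser.add_argument("-o", metavar="OUTPUT", help="output file and resolution")
    parser.add_argument("-x", metavar="RANGE", help="x-axis minimum and maximum")
    parser.add_argument("-y", metavar="RANGE", help="y-axis minimum and maximum")
    parser.add_argument("-v", metavar="VIEW", help="view file with saved settings")
    return parser


def select_mode(argv):
    """Return ('interactive' or 'batch', parsed arguments) for an argument list."""
    args = build_parser().parse_args(argv)
    return ("interactive" if args.g else "batch"), args


mode, args = select_mode(["-d", "myData.csv[0] y myData.csv[1] x", "-t", "Histogram"])
print(mode, args.t)
```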

The main window of the tool only allows the user to open new data files or exit from the application. After choosing the files, which should be


Contents

Chapter 1

Introduction

1.1 Related Work

Chapter 2

Supported Features

2.1 Overview

2.2 Graph Types

2.3 Interactive Features

2.3.1 Built-in

2.3.2 Supported by Plug-ins

2.4 Drawing in Layers

2.5 Saving and Loading Parameterized View

2.6 Input Formats

2.7 Output Formats

2.8 Command Line Functions

Chapter 3

Implementation

3.1 Core Library

3.1.1 Graph

3.1.2 View

3.1.3 Axis

3.1.4 Discussion about coordinate axes

3.1.5 Interval

3.1.6 GraphLegend

3.1.7 Data management system

3.1.8 PluginManager and Plug-in Management System

3.1.9 Shared Graphic User Interface Elements

3.2 Command Line Mode

3.3 Interactive Mode

3.4 Plotting System

3.5 Plug-in System

3.5.1 Interfaces

3.5.2 Lifecycle of Plug-ins

3.5.3 Graph Plotting Plug-ins

3.5.4 Interactive Plug-ins

3.6 Undo and Redo Features

3.7 Loading and Saving View Files

Chapter 4