Towards Design Patterns for Dynamic Analytical Data Visualization

Hong Chen

Analytical Solutions Division, SAS Institute Inc.

SAS Campus Drive, Cary, NC 27513

[email protected]

ABSTRACT

This paper advocates the study and use of visualization design patterns to improve development productivity and usage effectiveness in dynamic, analytical data visualization. Nine visualization design patterns are presented formally using the current de facto pattern description language. Organized in three categories (data, structural, and behavioral), these patterns summarize many common practices and techniques used in the process of dynamic, analytical data visualization. A relationship diagram is also introduced to illustrate the common relationships and uses of the patterns. Driven by the study of design patterns, a simple, yet powerful, architecture design for a dynamic, analytical data visualization library is proposed. The fundamental characteristics of the design are component-based, data centric, and layout-aided.

Keywords: design patterns, data visualization, dynamic graphics, data model, graphics layout

1. INTRODUCTION

Dynamic visualization techniques [10] are widely used in various analytical visualization tasks including exploratory data analysis, process monitoring, and reporting results. While some of these tasks are very simple, many can also be quite complex. In either case, the development of software visualization systems to address these tasks is time consuming and burdensome. With the growing emphasis on information visualization in the scientific arena, new techniques are needed to improve the efficiency and effectiveness of the visualization development process.

Conceptually, dynamic, analytical data visualizations can be decomposed as a function of at least four domains: analysis, data, graphics, and user interface:

F(analysis, data, graphics, user interface).

There are many, if not countless, factors to consider in each domain. The richness of the graphics domain alone is well demonstrated by [4] and [14]. Accounting for all the factors in these four domains in a dynamic, analytical data visualization system is a complex and formidable task.

To address this complexity, researchers and developers have spent many years developing general-purpose visualization systems, such as SAS/JMP [20] and SpotFire [2], with as many features as possible. To simplify the development process, other applications have focused on specific domains of a smaller scale [9].

Recently, researchers have attempted to construct reference models and taxonomies of information visualization techniques [6][7][9][19]. These models and taxonomies can be used to help developers quickly identify, apply, and implement various techniques needed in different visualization applications.

Researchers have also attempted to apply formalisms to construct graphs and transformations more efficiently and systematically. Mackinlay’s APT system [15] uses formal graphical specifications, including graphical languages and composition rules, to automate the construction of 2D displays of relational data. Recently, Wilkinson [24] introduced a language to construct statistical graphs using a graph algebra. The Polaris system [21] extends Wilkinson’s idea to visualize relational databases based on formal specifications.

Since Gamma et al. published their monumental book on design patterns [13], designing with patterns has become common practice in object-oriented software engineering [5]. Properly used, design patterns help developers make their designs and implementations smaller, simpler, more flexible, modular, reusable, and understandable, which in turn makes the developers more productive. Visualization researchers and developers can also benefit from the use of design patterns.

This paper describes our efforts to use design patterns to improve the development productivity and the usage effectiveness in dynamic, analytical data visualization. A new area called visualization design patterns is introduced to help the visualization users and developers better understand, use, and develop visualization tasks. Nine visualization design patterns are presented which encompass many common practices and techniques used in the process of dynamic, analytical data visualization. For each pattern, the context, problem, forces, solution, and other aspects are described using the current de facto pattern description language. These patterns are organized in data, structural, and behavioral categories. A relationship diagram is also introduced to illustrate the relationships among patterns.

In addition, a simple, yet powerful, software architecture is presented for a dynamic, analytical data visualization library based on visual and software design patterns. The architecture is used to illustrate the applications of visualization design patterns. One of the novel features of this architecture is the use of the scale model and the lattice as the fundamental building blocks of a dynamic, analytical data visualization system. This pattern-driven approach is complementary to those previously discussed, and design patterns are viewed as another weapon in the visualization developer’s arsenal.

2. RELATED WORK

The discussion of design patterns in the field of information visualization has been limited and the area is ripe for exploitation. Eick [11] introduced the notion of visual design patterns to informally describe three patterns used to construct perspectives in the visual query and analysis environment ADVIZOR.

Stolte et al. [22] extended Polaris [21] to support multiscale visualization using multiple zoom paths and lattice of OLAP data cubes. Four patterns were presented informally to capture the zoom structures within their system.

This paper introduces the visualization design patterns which generalize the notion of visual design patterns [11]. Most of the patterns introduced in the above work can be viewed as the variants or special usages of some of the visualization design patterns described later in this paper.

3. ARCHITECTURE OF A DYNAMIC, ANALYTICAL DATA VISUALIZATION LIBRARY

We have designed and implemented a dynamic, analytical data visualization library in the Java language. The design motivation is to support the common uses of the visualization design patterns presented later in the paper.

The library architecture is composed of four key element types: component, data, scale, and lattice. This design simplifies and unifies the development of many graphical displays in dynamic, analytical data visualization. It can be characterized as a component-based, data centric, and layout-aided design. Each element type is described in detail in the following sections.

Component: Components facilitate the abstraction and encapsulation of the visualization details. The fully object-oriented component concept, analogous to the Java Swing Component, is employed in the architecture. The primary components support a small common interface that defines the essential operations. Developers can choose the appropriate object-oriented techniques and design patterns to address the complexities and the needs of the visualization applications they are developing. For example, components can be as complex as a view with multiple interacting graphs, or as simple as a display of several points or labels. They can be containers that encapsulate all Model-View-Controller [13][5] elements, or simply contain the presentation and/or controller logic. Any suitable graphics API, such as Java 2D/3D and OpenGL, may be used as needed. With the components available in the architecture, developers can choose to reuse the existing components, extend these components, or construct new components that support the common component interface.

The common component interface defines the methods to support the operations in the following key areas:

Scaling: All components need one or more scales (defined later in this document). Scales are used in displaying the data and viewing ranges. Each component can have its own preferred scale setting(s). If scales need to be synchronized across multiple components, the preferred scale settings can be collected from the components and used as constraints to construct common scales.

Orientation setting: Component display can be rotated or flipped to different orientations. For example, bars in a histogram can be either vertical (upward or downward) or horizontal (leftward or rightward).

Popup menu customization: Context-sensitive popup menus can be used to choose the probe and navigation tools/controllers, such as selection, zoom, pan, label, lens, etc. The menus can also serve as a short cut to invoke dialogs, property pages, and other custom actions to set or modify components’ attributes. If necessary, the probe and navigation tools can be added, removed, and disabled.

Dialog/property page customization: Context-sensitive dialogs and property pages can be registered in the components for repeated use.

Initialization/disposal: Occasionally, special initialization may be necessary before the component can be used. Similarly, extra cleanup may be needed, for example to improve system resource usage.

Overhang sizes: These are the external margins around the main drawing area of a component used for aesthetics.

geometric transformations, custom metadata/properties setting, and so on. Since the Java Swing component architecture already provides the necessary supports for these operations, they do not have to be explicitly defined in the common component interface for our implementation.
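As a rough illustration of the key areas listed above, a common component interface might look like the following Java sketch. All names here (`VisComponent`, `Orientation`, `PointsComponent`, and the method names) are hypothetical, since the paper does not give the library's actual signatures, and dialog/property page registration is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; the paper does not list the actual method signatures.
enum Orientation { UPWARD, DOWNWARD, LEFTWARD, RIGHTWARD }

/** Minimal sketch of a common component interface covering the key areas. */
interface VisComponent {
    double[] preferredRange(int dimension);        // scaling: {min, max} for one dimension
    void setOrientation(Orientation orientation);  // orientation setting
    Orientation getOrientation();
    List<String> popupTools();                     // popup menu customization (mutable list)
    void initialize();                             // initialization
    void dispose();                                // disposal
    int[] overhangSizes();                         // margins: top, left, bottom, right
}

/** A trivial component that displays a set of (x, y) points. */
final class PointsComponent implements VisComponent {
    private final double[][] points;               // points[i] = {x, y}
    private Orientation orientation = Orientation.UPWARD;
    private final List<String> tools =
            new ArrayList<>(Arrays.asList("selection", "zoom", "pan"));

    PointsComponent(double[][] points) { this.points = points; }

    @Override public double[] preferredRange(int dimension) {
        // The preferred scale setting is simply the extent of the data.
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] p : points) {
            min = Math.min(min, p[dimension]);
            max = Math.max(max, p[dimension]);
        }
        return new double[] {min, max};
    }
    @Override public void setOrientation(Orientation o) { orientation = o; }
    @Override public Orientation getOrientation() { return orientation; }
    @Override public List<String> popupTools() { return tools; }
    @Override public void initialize() { }
    @Override public void dispose() { tools.clear(); }
    @Override public int[] overhangSizes() { return new int[] {4, 4, 4, 4}; }
}
```

Keeping the interface this small is what would let very different components, from a single label list to a full lattice, be treated uniformly by layout and scaling code.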

To provide developers with a starting point for actual analytical visualization development, many common components are available in our research library. These include: axis, scatter/point, lines, polygon/area, parallel, bar, histogram, box, heat map/matrix, contour, dendrogram/tree, label list, lattice, and so on.

Data: The data framework consists of a data model and several auxiliary models as depicted in Figure 1. Each model is presented as an interface which can have different concrete implementations. Listener interfaces and events are also defined accordingly. This design is inspired by the decorator pattern [13]. In this data framework, developers use the observer pattern [13] to process change events fired by these models. The statistical convention of treating a row of data values as an observation and a column as a variable is used here.

Figure 1. Data framework

Data model: This interface provides access to data values as well as the common table metadata and variable metadata, such as variable types, measurements, names, labels, and format strategies. It also manages the auxiliary models. The data values are accessible and modifiable either as primitive types, such as numeric and character, or as a generic object type. This hybrid scheme offers modeling generality and run-time efficiency not available in some other data schemes [11]. Because data values are most often accessed by variable, our implementation of this interface stores them in column-major order for better run-time performance.
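The hybrid access scheme and column-major storage can be sketched as follows. This is a minimal illustration under assumed names (`ColumnMajorData` and its methods are hypothetical, not the library's interface): each variable is stored as one primitive array, and values can be read either as primitives or as generic objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and layout are assumptions, not the library's API.
final class ColumnMajorData {
    private final List<String> names = new ArrayList<>();
    private final List<Object> columns = new ArrayList<>(); // each entry: double[] or String[]

    void addNumeric(String name, double[] values) {
        names.add(name);
        columns.add(values.clone());
    }

    void addCharacter(String name, String[] values) {
        names.add(name);
        columns.add(values.clone());
    }

    int variableCount() { return columns.size(); }
    String variableName(int variable) { return names.get(variable); }

    /** Primitive access: no boxing, and a whole variable is one contiguous array. */
    double getDouble(int observation, int variable) {
        return ((double[]) columns.get(variable))[observation];
    }

    void setDouble(int observation, int variable, double value) {
        ((double[]) columns.get(variable))[observation] = value;
    }

    /** Generic access: works for any variable type, at the cost of boxing numerics. */
    Object getObject(int observation, int variable) {
        Object column = columns.get(variable);
        if (column instanceof double[]) {
            return Double.valueOf(((double[]) column)[observation]);
        }
        return ((String[]) column)[observation];
    }

    /** Column-major storage makes a per-variable statistic a single array scan. */
    double mean(int variable) {
        double[] column = (double[]) columns.get(variable);
        double sum = 0;
        for (double v : column) sum += v;
        return sum / column.length;
    }
}
```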

Observation property model: This interface provides access to observation metadata, visual encodings, maskings (selection, visibility, inclusion, etc.), and observation relationships, if any. The interface is object-based. To improve efficiency, different concrete implementation classes are provided for the primitive types of property values including Booleans/bits, bytes, and integers. The property values are indexed by the observation indices.

At any given time, multiple instances of the observation property model can be registered in the data model and used by one or many components. By default, instances of the observation property model are available for the selection, color, marker shape, and graphics visibility properties. The factory method pattern [13] is used to create these instances on an as-needed basis.

Variable property model: This interface is similar to the observation property model but is used to address variable properties. If needed, variable roles and some statistics, for example, can be modeled as variable properties shared among graphs and analyses.
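A bit-backed Boolean observation property with change listeners might be sketched as below. The class and interface names are hypothetical, and the lazy creation method in `PropertyRegistry` is only a simple stand-in for the factory method pattern mentioned above, not the library's actual mechanism.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; interface and class names are assumptions.
interface PropertyListener {
    void propertyChanged(String property, int observation);
}

/** A Boolean per-observation property (for example, selection) backed by bits. */
final class BooleanObservationProperty {
    private final String name;
    private final BitSet bits = new BitSet();
    private final List<PropertyListener> listeners = new ArrayList<>();

    BooleanObservationProperty(String name) { this.name = name; }

    void addListener(PropertyListener listener) { listeners.add(listener); }

    boolean get(int observation) { return bits.get(observation); }

    /** Observer pattern: listeners are notified only when the value really changes. */
    void set(int observation, boolean value) {
        if (bits.get(observation) == value) return;
        bits.set(observation, value);
        for (PropertyListener l : listeners) l.propertyChanged(name, observation);
    }
}

/** Stand-in for the data model's registry of property models. */
final class PropertyRegistry {
    private final Map<String, BooleanObservationProperty> properties = new HashMap<>();

    /** Creates the named property model only when it is first requested. */
    BooleanObservationProperty booleanProperty(String name) {
        return properties.computeIfAbsent(name, BooleanObservationProperty::new);
    }
}
```

Because every component asking for the "selection" property receives the same instance, a change made through one graph is visible to all others that listen to it.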

Observation permutation model: This interface provides access to different observation ordering and grouping information resulting from applying algorithms and operations, such as sorting, classifying, and partitioning. Our library provides several sorting-based implementations of this interface to permute observations using one or more variables or observation property models.
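A sorting-based permutation can be illustrated with an index sort: the data values stay in place and only an ordering of observation indices is produced. `SortPermutation` is a hypothetical name used for this sketch, and it handles a single numeric variable only.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of a sorting-based observation permutation.
final class SortPermutation {
    /** Returns observation indices ordered by ascending value; the data are not moved. */
    static int[] byVariable(double[] values) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sorting an object array is stable, so ties keep their original order.
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));
        int[] result = new int[order.length];
        for (int i = 0; i < result.length; i++) result[i] = order[i];
        return result;
    }
}
```

For example, for the values {3.0, 1.0, 2.0} the resulting permutation is {1, 2, 0}: observation 1 comes first, then observation 2, then observation 0.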

Data reader and writer models: These two interfaces provide local and remote data I/O support for different data sources and targets. The concrete implementations can be different I/O driver classes for SAS and/or other data sets.

Other auxiliary models include general-purpose property model interfaces for listener registration and change event firing support. They are designed to accommodate any special needs not satisfied by the previously mentioned models. For example, if an application ever needs to create additional properties for the individual data values, and to monitor changes to these properties, the developer can implement these interfaces to satisfy their requirements.

[Figure 1 labels: data model (variable, observation); observation property models; variable property models; observation permutation models; data reader model; data writer model; other data property models]

The combination of these model interfaces provides a very uniform, flexible, and extendable data framework to effectively address the data complexities associated with analytical visualization applications. One or more data models and auxiliary models can be used as the inputs and outputs for various data, analytical, and visual transformations such as sorting, classifying, linking, joining, merging, filtering, aggregating, partitioning, extracting, summarizing, and so on.

Scale: Each component has a local coordinate system with at least two scales. A scale controls one dimension of the local coordinate system. Components use scales to measure the data and to determine viewing transformations. With shared scales and data, data and visual synchronizations can be achieved among multiple components.

The scale is modeled mainly by four integrated elements: scale model, scale ticker, transformation function, and scale model connector.

Scale model: This class provides access to ranges, ticks, transformation functions, and attributes. There are three different ranges: data range, visible range, and tick mark range. Ticks are the elements registered on a scale. Tick attributes include positions, colors, labels, and custom objects. The scale model also manages a ticker and a list of registered change listeners. If needed, one or more preferred data range constraints can be provided to help the scale model select an optimal data range. This facilitates the synchronization of data and visual scaling among different components.
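The interplay of data range, visible range, preferred range constraints, and change listeners can be sketched as follows. This is an illustration under assumptions: the class shape is hypothetical, tick management is left out, and taking the smallest range that covers all constraints is just one simple policy for choosing a data range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the actual scale model class is not shown in the paper.
final class ScaleModel {
    private double dataMin = 0, dataMax = 1;
    private double visibleMin = 0, visibleMax = 1;
    private final List<double[]> preferredRanges = new ArrayList<>();
    private final List<Runnable> listeners = new ArrayList<>();

    void addChangeListener(Runnable listener) { listeners.add(listener); }

    /** Each component sharing this scale contributes its preferred data range. */
    void addPreferredRange(double min, double max) {
        preferredRanges.add(new double[] {min, max});
    }

    /** Assumed policy: use the smallest range that covers every constraint. */
    void fitDataRange() {
        if (preferredRanges.isEmpty()) return;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] r : preferredRanges) {
            min = Math.min(min, r[0]);
            max = Math.max(max, r[1]);
        }
        dataMin = min;
        dataMax = max;
        visibleMin = min;
        visibleMax = max;
        fireChanged();
    }

    /** Zooming or panning changes only the visible range, clamped to the data range. */
    void setVisibleRange(double min, double max) {
        visibleMin = Math.max(dataMin, min);
        visibleMax = Math.min(dataMax, max);
        fireChanged();
    }

    /** Maps a data value to [0, 1] within the visible range. */
    double normalize(double value) {
        return (value - visibleMin) / (visibleMax - visibleMin);
    }

    double getDataMin() { return dataMin; }
    double getDataMax() { return dataMax; }

    private void fireChanged() {
        for (Runnable l : listeners) l.run();
    }
}
```

Components that share one such instance redraw from the same ranges whenever any of them changes it, which is the synchronization effect described above.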

Ticker: This abstract class handles the tick placement. By extending this class, different concrete tickers can be used to handle different placement schemes, such as numeric/interval, nominal/categorical, time/date, binning, hierarchical, and so on.

Transformation functions: These functions perform one-dimensional data or visual transformations, such as logarithmic and exponential.
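The ticker is a natural fit for the strategy pattern: the scale model holds a ticker and delegates placement to it. The sketch below uses hypothetical class names and deliberately simple placement rules (fixed step, one tick per category index); transformation functions are represented with the standard `DoubleUnaryOperator` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch; class names and placement rules are assumptions.
abstract class Ticker {
    /** Returns tick positions, in data units, for the given range. */
    abstract List<Double> place(double min, double max);
}

/** Numeric/interval placement: ticks at multiples of a fixed step. */
final class IntervalTicker extends Ticker {
    private final double step;

    IntervalTicker(double step) { this.step = step; }

    @Override List<Double> place(double min, double max) {
        List<Double> ticks = new ArrayList<>();
        long first = (long) Math.ceil(min / step);
        long last = (long) Math.floor(max / step);
        for (long k = first; k <= last; k++) ticks.add(k * step);
        return ticks;
    }
}

/** Nominal/categorical placement: one tick per category index inside the range. */
final class NominalTicker extends Ticker {
    private final int categoryCount;

    NominalTicker(int categoryCount) { this.categoryCount = categoryCount; }

    @Override List<Double> place(double min, double max) {
        List<Double> ticks = new ArrayList<>();
        for (int i = 0; i < categoryCount; i++) {
            if (i >= min && i <= max) ticks.add((double) i);
        }
        return ticks;
    }
}

/** One-dimensional transformation functions. */
final class Transforms {
    static final DoubleUnaryOperator IDENTITY = DoubleUnaryOperator.identity();
    static final DoubleUnaryOperator LOG10 = Math::log10;
    static final DoubleUnaryOperator EXP = Math::exp;
}
```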

Scale model connector: This abstract class can be used to synchronize multiple scale models, such as the main and any secondary scales, in different coordinate systems.

The above scale design uses the observer, strategy, mediator, and adapter patterns [13].

Lattice: Latticing is a hierarchical 2.5D layout mechanism for laying out graphical components to address many common visual patterns in analytical visualization. In addition to its layout functionality, the lattice can be used with scale and data models to provide data and visual synchronization among the components in a systematic way. These kinds of synchronizations are extremely important in dynamic, analytical data visualization.

The lattice design consists of the lattice context and the lattice components. In a lattice, components are arranged into a grid that is managed by one or several lattice components. Each cell in the grid holds either a component or a stack of components overlaying each other. A lattice itself is also a component and can be used for overlaying. This means, besides the overlaying of fine grain layers within or among components like points or labels, the lattice also supports coarse grain layers like other lattices. These design features result in a hierarchical 2.5D layout scheme.
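The hierarchical 2.5D idea, a grid whose cells hold stacks of components and where a lattice is itself a component, can be sketched with a small composite structure. The names (`LayoutComponent`, `Leaf`, `Lattice`) are hypothetical, and sizing, scales, and painting are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the hierarchical layout idea; names are assumptions.
interface LayoutComponent {
    String name();
    /** Number of non-lattice components contained (1 for a plain component). */
    default int leafCount() { return 1; }
}

final class Leaf implements LayoutComponent {
    private final String name;
    Leaf(String name) { this.name = name; }
    @Override public String name() { return name; }
}

/** A grid whose cells each hold a stack of components, listed bottom to top. */
final class Lattice implements LayoutComponent {
    private final String name;
    private final int columns;
    private final List<List<LayoutComponent>> cells = new ArrayList<>();

    Lattice(String name, int rows, int columns) {
        this.name = name;
        this.columns = columns;
        for (int i = 0; i < rows * columns; i++) cells.add(new ArrayList<>());
    }

    /** Adds a component on top of whatever the cell already holds. */
    void overlay(int row, int column, LayoutComponent component) {
        cells.get(row * columns + column).add(component);
    }

    List<LayoutComponent> layers(int row, int column) {
        return cells.get(row * columns + column);
    }

    @Override public String name() { return name; }

    /** Because a lattice is a component, nested lattices are counted recursively. */
    @Override public int leafCount() {
        int count = 0;
        for (List<LayoutComponent> stack : cells) {
            for (LayoutComponent c : stack) count += c.leafCount();
        }
        return count;
    }
}
```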

Lattice context: This context is the attribute specification of a lattice. Some of the key attributes are:

• Lattice grid row/column counts

• Cell components, if any

• Shared common scale models, if any, within one or multiple rows/columns

• Row/column weights

• Spacing constraints between any two adjacent rows/columns

• Lattice orientations: horizontal/vertical, left/right, and up/down

• Annotations: title, footnote, labels, legends, etc.

Other attributes control the less frequently used aspects such as:

• whether columns/rows can be interchanged, moved, or resized

• whether a component can occupy multiple grid cells, either partially or completely

• whether aspect ratio preferences, if any, should be honored when possible

Lattice components: These are the default components used to construct a lattice based on the specification provided in a lattice context. They can be used together, if needed, to provide support for functionality like context sensitive popup menus and dialogs, input event handling, built-in components (title, footnote, etc.) management, and so on. They also provide layer management for coarse grain layers.

In our library implementation, a lattice can also host any Java Swing component. Of course, since these components do not implement our common component interface, they would have to be extended to support applicable analytical data and visual interactivity if needed.

Developers can use a lattice to construct complex displays, such as Trellis [3] and plot matrix [10][4][23], by using either the factory patterns [13] or simple class / interface inheritances.

4. VISUALIZATION DESIGN PATTERNS

Famous furniture makers, like James Krenov and Sam Maloof, believe that good design should involve the process of doing and using. The same can be said about analytical data visualization.

We differentiate between two types of recurring patterns in the doing and using in the area of analytical data visualization: software design patterns and visualization design patterns. The visualization design patterns are used by users of visualization systems to model, design, and perform visualization tasks, while the software design patterns [13][5] are used by developers to design and implement a visualization system. It is worth noting that some of the visualization design patterns also have great impact on visualization software development and become special software design patterns used by visualization developers as well.

In the remainder of this section, we first briefly describe the criteria that we used to identify the visualization design patterns in the area of dynamic, analytical visualization. Then we categorize and discuss nine important visualization design patterns. The relationships among these patterns are presented as well using a relationship diagram. Next, these nine patterns are discussed in detail using a pattern description language.

Figure 2. Relationships of visualization design patterns

[Figure 2 content: a diagram whose nodes are the nine patterns (Decorated Data, Visual Encoding, Graphic Grid, Overlay, Linked Graphs, Brushing, Network Flow, Details Management, Progressive Refinement), grouped into the data, structural, and behavioral pattern categories. Edge labels: sharing data; sharing visual effects; encoding selection states; defining data coverage; arranging data at levels; arranging/coordinating details in views; arranging/coordinating details in cells; arranging/coordinating details in layers; changing visual effects; selecting data; processing data; adjusting layers; adjusting cells; adjusting process; structure-based brushing; choosing details; linking cells; linking layers; defining details as network; overlaying in/over grids; making data displayable.]

4.1. Overview

We have investigated a variety of visualization design patterns in the area of dynamic, analytical data visualization. Nine patterns are listed in this paper. These patterns meet the following criteria:

• exist in the area of dynamic, analytical data visualization

• address a recurring problem in visualization tasks, and present a solution to it

• document existing, well-proven, common design practice and experience

• identify and specify abstractions that are above the level of specific practice and experience

• provide users and developers a common vocabulary and understanding for good practices and principles

• provide a means of documenting and communicating visualization design

Since the study of design patterns in data visualization is still at an early stage, this paper attempts to address only the most common patterns in the area. More visualization design patterns can be identified using less restrictive criteria.

4.2. Pattern Description Template

In this paper, the visualization design patterns are formally discussed using a pattern description template adapted from the current de facto template in [16]. This template includes five mandatory parts:

• Context: the situations in which the pattern would apply

• Problem: a statement of the problem to be solved

• Forces: the factors which must be considered when applying the pattern

• Solution: the proposed solution to the problem

• Examples: cases demonstrating the existence of the recurring problem and the application of the pattern

The template also includes variants, related patterns, and known uses as optional parts. The variants and known uses list some of the existing research and commercial work. The related patterns refer to patterns that solve similar or related problems, and to patterns that refine the pattern being described.

The Java visual component library described earlier in the paper is used to facilitate the discussions of the solution and examples parts for each pattern described in the following section.

All the examples used in this paper employ dynamic graphics. To save space, some examples are used to illustrate several different patterns.

4.3. Visualization Design Patterns in Detail

Figure 2 shows the nine patterns identified in this paper and the common relationships and uses among these patterns. The patterns are classified into three categories: data, structural, and behavioral. These categories address the data manipulation, graphics layout, and interactivity in the area of dynamic, analytical data visualization.

4.3.1. Data Patterns

First and foremost, dynamic, analytical data visualization is about displaying abstract data, typically on a computer monitor. The data patterns focus on how to organize and visually represent the data in a common statistical/analytical visualization. The Decorated Data pattern addresses how to organize the common raw data, metadata, and various run-time states as the starting point of dynamic data visualization. The Visual Encoding pattern describes how the abstract data are mapped to visual forms for drawing.

• Decorated Data:

Context: Need to handle multiple data elements in dynamic analytical data visualization.

Problem: How to organize and process multiple data elements effectively.

Forces: Data elements may include raw data, metadata, relationships, derived data, and even visual encodings [4]. Different types of data sources may be used. Card et al. [7] gave excellent discussions on the factors involved.

Solution: Separate the related data elements into a main data part and decorating parts so that the different parts can be created and manipulated dynamically.

The data framework is designed to provide direct support for this solution. The main part is in the data model, and the decorating parts are in various auxiliary models.

Examples: Figure 3 is an example of a one-way microarray display in a genomic application. A heat map, label, dendrogram, and axis components are used in this display along with three data models. These data models are used for the raw data, the hierarchical filtering and clustering tree data, and also for the data resulting from joining these two. The heat map and label components employ the joined data model, while the tree data model is used by the dendrogram. Each data model has registered property models for visual encoding and states (colors, selections, etc.). Both the main data values and the property models in different data models are joined to synchronize data and visual manipulations in this display. Any change in one of the models triggers changes in other related models. For example, selecting a tree branch in the dendrogram will highlight the corresponding labels and heat map rows. These data models can also be used separately in other displays to explore other aspects of this display in the same synchronized fashion.
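The synchronization in this example, where selecting a tree branch highlights the corresponding heat map rows, can be sketched as a join between two decorating parts. Everything in the sketch is an assumption made for illustration: `BranchRowLink` and its methods are hypothetical, selections are plain bit sets, and overlapping branches are not reconciled when one of them is deselected.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of joining a decorating part (selection) across two data models.
final class BranchRowLink {
    private final Map<Integer, int[]> rowsByBranch = new HashMap<>();
    private final BitSet branchSelection = new BitSet(); // decorates the tree data model
    private final BitSet rowSelection = new BitSet();    // decorates the joined data model

    /** Records which joined-data rows fall under a given tree branch. */
    void link(int branch, int... rows) {
        rowsByBranch.put(branch, rows.clone());
    }

    /** Changing the decoration on one model updates the decoration on the other. */
    void selectBranch(int branch, boolean selected) {
        branchSelection.set(branch, selected);
        int[] rows = rowsByBranch.getOrDefault(branch, new int[0]);
        // Simplification: rows shared with another selected branch are not re-checked.
        for (int row : rows) rowSelection.set(row, selected);
    }

    boolean isBranchSelected(int branch) { return branchSelection.get(branch); }
    boolean isRowSelected(int row) { return rowSelection.get(row); }
}
```

The main data values are never touched here; only the decorating parts change, which is what allows the same raw data to be reused in other displays.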

Related patterns: Visual encoding, Decorator [13].

Known uses: SAS (JMP, Insight, IML Workshop) [20], ADVIZOR [11].

Figure 3. Microarray display

Figure 4. Trellis display

Figure 5. Triangular scatter plot matrix with a detail view

• Visual Encoding:

Context: Visualize abstract data.

Problem: How to convert abstract data information into something displayable.

Forces: The information to display in dynamic analytical data visualization may include data values and metadata, derived or calculated data, data structures (observations, variables, groups, relationships, etc.), and importantly, data states (selections, exclusions, labels, etc.)

Solution: Encode the information visually as position, color, shape, size, orientation, saturation, texture, and so on. Bertin [4] described these encoding schemes in great detail.

The data framework provides very good support for visual encoding via various property models.

Variants: A less discussed scheme is to encode the coarse grain elements and structures in a display of multiple graphs.

Examples: Almost all examples in this paper illustrate the use of this pattern to different extents. Figure 4 shows the use of several encoding schemes in scatter plots. The observation property models of integer types are used to encode colors and marker shapes. A boolean type property model is used for selection state. The selected points, such as the one in the University Farm panel, are encoded with larger sizes and stronger intensities. An observation permutation model, based on the median yields of the barley species, is used to encode the Y positions of the points in the scatter components and the tick marks in the Y axes.

Figure 4 also demonstrates the coarse grain encoding. The positions of the scatter component panels are decided based on descending order of the median values of barley yield at the farms. This encoding is further demonstrated with texts and color bars in the strip labels above the panels. Figure 5 is another example of coarse grain encoding. The relationship between the selected component and the detail view component is indicated with the highlight borders of the same color.
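The fine grain encodings described for Figure 4 can be sketched as simple mappings from property values to visual attributes. The palette, shape names, and size factor below are illustrative assumptions rather than the values used in the figure, and `PointEncoding` is a hypothetical name.

```java
// Hypothetical sketch; the palette, shape names, and size factor are illustrative.
final class PointEncoding {
    private static final int[] PALETTE_RGB = { 0x1F77B4, 0xD62728, 0x2CA02C, 0xFF7F0E };
    private static final String[] SHAPES = { "circle", "square", "triangle", "cross" };

    /** Integer-valued property to color, wrapping around the palette. */
    static int colorRgb(int colorIndex) {
        return PALETTE_RGB[Math.floorMod(colorIndex, PALETTE_RGB.length)];
    }

    /** Integer-valued property to marker shape, wrapping around the shape list. */
    static String shape(int shapeIndex) {
        return SHAPES[Math.floorMod(shapeIndex, SHAPES.length)];
    }

    /** Boolean selection state to marker size: selected points are drawn larger. */
    static double markerSize(boolean selected, double baseSize) {
        return selected ? baseSize * 1.5 : baseSize;
    }

    /** Permutation to position: rank[i] is the axis slot of observation i. */
    static int[] ranks(int[] permutation) {
        int[] rank = new int[permutation.length];
        for (int position = 0; position < permutation.length; position++) {
            rank[permutation[position]] = position;
        }
        return rank;
    }
}
```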

Related patterns: Decorated data

Known uses: Common in data visualization.

4.3.2. Structural Patterns

In analytical data visualization, multiple plots are often arranged according to rules and principles to facilitate the exploration of patterns and trends. The structural patterns are concerned with how the plots are arranged in a display. The Graphic Grid pattern defines a layout for arranging plots in separated areas in a display, while the Overlay pattern arranges plots in a shared display space. The combination of these two patterns can address many sophisticated layout needs in an analytical data visualization.

Figure 6. Partial reproduction of Minard Chart

• Graphic Grid:

Context: Multiple graphs on different aspects of data need to be used/inspected together within a limited area.

Problem: How to arrange the graphs in the space available in a way that makes them easy to understand and use.

Forces: Frequently, graphs are used together for exploratory study of data patterns and trends. The graphs can be of different types and sizes, and usually data and visual synchronizations are involved. For example, common scales may be shared among the graphs.

Solution: Arrange the graphs and other visual components in a layout resembling a grid or table to facilitate the visual discovery of patterns and trends across different dimensions of the data.

This solution applies directly to complicated displays such as the (scatter) plot matrix [4][7][10][23] and Trellis [3]. It also applies to more common graphs. If an axis is treated as a 1D graph component, then most 2D statistical charts, such as the scatter plot and histogram, can be arranged in a 2×2 grid with two axes and another graphic component. We consider this pattern one of the most important visualization design patterns and use it as a paradigm to simplify and unify many displays in analytical data visualization. The lattice is designed primarily to support the frequent use of this pattern.

Variants: Graphs span over multiple cells.

Examples: Most of the figures in this paper use this pattern. For instance, Figure 4 is a dynamic version of a famous example of the static Trellis display [3]. A 13×2 lattice is constructed with cells of different sizes. Six axis components occupy six of the cells in the first column. The second column is composed of six strip labels, six scatter components, and one axis component. One scale model is shared among the scatter and horizontal axis components, and six other scale models are shared in the six rows of scatter and vertical axis components. The strip labels are instances of a simple extension of Java Swing JLabel.

Our lattice mechanism enables components to span a fractional portion of multiple rows or columns. This feature is not available in some other layout schemes [17][18][21]. Figure 5 shows the use of this feature on the Anderson iris [10] data set. A 5×4 lattice is constructed with the same types of dynamic axis and scatter components as in Figure 4. The scatter component at the upper-right corner of the display spans 1.9 rows and 1.9 columns. This component is an enlarged detail view of the selected component, the lower-left one in this case, in the triangular scatter plot matrix. Multiple scale models are shared within rows and columns in both examples. When any data or visual manipulations in a cell component affect its current scales, other components sharing the scale models will be automatically updated with the new scales.
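How a fractional span might translate into pixel bounds can be sketched as below. This is an assumption-laden illustration, not the lattice's actual algorithm: `FractionalSpan` is a hypothetical helper, spacing between cells is ignored, and the component is assumed to be anchored at its top-left cell and to extend rightward and downward.

```java
// Hypothetical sketch; anchoring and spacing rules are assumptions.
final class FractionalSpan {
    /** Sum of the sizes of all cells before the start index. */
    static double offset(double[] sizes, int start) {
        double total = 0;
        for (int i = 0; i < start; i++) total += sizes[i];
        return total;
    }

    /** Length covered when spanning a fractional number of cells from start. */
    static double extent(double[] sizes, int start, double span) {
        double total = 0;
        double remaining = span;
        for (int i = start; i < sizes.length && remaining > 0; i++) {
            double fraction = Math.min(1.0, remaining); // whole cell, or the final partial one
            total += fraction * sizes[i];
            remaining -= fraction;
        }
        return total;
    }

    /** Returns {x, y, width, height} for a component anchored at (row, column). */
    static double[] bounds(double[] columnWidths, double[] rowHeights,
                           int row, int column, double rowSpan, double columnSpan) {
        return new double[] {
            offset(columnWidths, column),
            offset(rowHeights, row),
            extent(columnWidths, column, columnSpan),
            extent(rowHeights, row, rowSpan)
        };
    }
}
```

With four columns of width 100 and five rows of height 80, a component anchored at row 0, column 2 that spans 1.9 rows and 1.9 columns would cover roughly 190 by 152 units starting at x = 200, y = 0.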

Known uses: Common in information visualization.

• Overlay:

Context: You already have a set of relatively simple graphs specialized to perform particular tasks. You need to provide a new, single graph to support several of these tasks.

Problem: How to build the complex new graph from two or more simple existing graphs.

Forces: These simple graphs share the same drawing area either completely or partially. They often appear to be in the same coordinate system. If needed, they may have some common interactive behaviors, such as zoom and pan, as well as different ones, like selection and brushing. Some of these forces are studied in [15] as constraints.

Solution: Overlay different layers of simple graphs to build one complex graph.

Our lattice supports overlaying components as layers. A mediator [13] is used to coordinate the context-sensitive popup menus and dialogs for the interactions involved. The scale models in different components can be either simply shared or coordinated by scale model connectors.
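The coordination role of the mediator can be sketched as follows: behaviors common to all layers, such as zoom, are broadcast, while layer-specific ones, such as selection, go only to the active layer. The names (`Layer`, `RecordingLayer`, `OverlayMediator`) and this particular routing rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the routing rule is an assumption, not the library's behavior.
interface Layer {
    void zoom(double factor);
    void select(double x, double y);
}

/** A layer that only records the events it receives. */
final class RecordingLayer implements Layer {
    final String name;
    final List<String> events = new ArrayList<>();

    RecordingLayer(String name) { this.name = name; }

    @Override public void zoom(double factor) { events.add("zoom " + factor); }
    @Override public void select(double x, double y) { events.add("select " + x + "," + y); }
}

/** Mediator: layers never talk to each other; all interaction goes through here. */
final class OverlayMediator {
    private final List<Layer> layers = new ArrayList<>(); // bottom to top
    private Layer active;

    /** Adds a layer on top; the newest layer becomes the active one. */
    void addLayer(Layer layer) {
        layers.add(layer);
        active = layer;
    }

    void setActive(Layer layer) {
        if (!layers.contains(layer)) throw new IllegalArgumentException("unknown layer");
        active = layer;
    }

    /** Common behavior: every layer zooms together. */
    void zoom(double factor) {
        for (Layer l : layers) l.zoom(factor);
    }

    /** Layer-specific behavior: only the active layer handles selection. */
    void select(double x, double y) {
        if (active != null) active.select(x, y);
    }
}
```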

Variants: The graphs used as overlay layers can be complex ones, such as graphic grids [21] or previously overlaid graphs.

Examples: Figure 7 is an example of overlaying scatter, linked line, and polygon components in three separate layers using a 1×1 lattice. In Figure 6, the famous Minard Chart [23] is partially reproduced using a 3×2 lattice. The cell at the upper-right corner has a line component overlaid by a scatter component. The line component shows Napoleon’s march path along with the size of his army, and the scatter component, with its points turned off, labels the cities. Three data models are used for the march, city, and temperature data of the campaign. Finally, a transparent Java JPanel overlays the whole lattice as an annotation layer. Two Java Swing text components are created and adjusted interactively in the annotation layer for the title and legend of the display. This interactive approach is also used to generate all legends and annotations for the other examples in this paper.

Related patterns: Graphic grid, Composite [13].

Known uses: DataSplash [25], Polaris [21], SAS/IML Workshop [20].

4.3.3. Behavioral Patterns

A key characteristic of dynamic data visualization is employing multiple coordinated plots to explore the data interactively. The behavioral patterns address how one or multiple plots are manipulated interactively during the exploratory process and how the exploratory process can be managed.

Figure 7. Overlay pattern

Figure 8. Compound brushing

Figure 9. MM1 discrete event simulation

• Linked Graphs:

Context: Use multiple graphs to explore data dynamically.

Problem: How to visually illustrate the data, analytical, and visual associations between the different dynamic graphs.

Forces: The data associations can be value-related, structure-related (such as observations, variables, groups, etc.), or state-related (such as selections, exclusions, and labels). Sometimes visual associations such as rotation, pan, and zoom are also involved. The corresponding data or visual elements should be displayed in the graphs with consistent visual encoding, if possible. The direct manipulations and updates supported in the dynamic graphs [10] should not disrupt this consistency.

Solution: Link the graphs together through shared models. The models capture the data, analytical, and/or visual states and associations. The graphs interpret the contents of the same models consistently.
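A minimal sketch of linking through a shared model, assuming a simple listener mechanism. The class names `ScaleModel` and `GraphComponent` and all of their methods are hypothetical stand-ins chosen for this illustration, not the interfaces of the library described in this paper.

```python
from typing import Callable, List, Tuple

Range = Tuple[float, float]


class ScaleModel:
    """Hypothetical shared model that holds a visible range and notifies listeners."""

    def __init__(self, low: float, high: float) -> None:
        self._range: Range = (low, high)
        self._listeners: List[Callable[[Range], None]] = []

    def add_listener(self, listener: Callable[[Range], None]) -> None:
        self._listeners.append(listener)

    @property
    def visible_range(self) -> Range:
        return self._range

    def set_visible_range(self, low: float, high: float) -> None:
        if low >= high:
            raise ValueError("low must be smaller than high")
        self._range = (low, high)
        # Every component that shares this model is told about the change.
        for listener in list(self._listeners):
            listener(self._range)


class GraphComponent:
    """Hypothetical view that interprets the shared model when drawing."""

    def __init__(self, name: str, scale: ScaleModel) -> None:
        self.name = name
        self.scale = scale
        self.drawn_range: Range = scale.visible_range
        self.repaints = 0
        scale.add_listener(self._on_scale_changed)

    def _on_scale_changed(self, new_range: Range) -> None:
        self.drawn_range = new_range
        self.repaints += 1  # stands in for an actual repaint

    def zoom(self, low: float, high: float) -> None:
        # The component never talks to its peers; it only edits the model.
        self.scale.set_visible_range(low, high)
```

Because the components communicate only through the model, a scatter component and an axis component constructed with the same `ScaleModel` stay consistent when either one is zoomed, and neither needs to know that the other exists.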

Variants: The interactivity relationships among the linked graphs may be in different topologies, such as bi-directed/undirected or peer/primary-secondary. Some graphs can be frozen if needed.

Examples: By default, most of our graphics components are linked directly or indirectly through scale, data, and property models. For example, Figure 8 shows a 4×3 lattice with scatter and histogram components linked directly through a data model. The components within the same row or column are also directly linked through the shared row or column scale models. When the data are changed or one component is zoomed or panned, all linked components are updated correspondingly. Each component uses its own selection property model for highlighting. Those models are linked indirectly, as explained in the next pattern.

Related patterns: Almost all other visualization design patterns discussed in this paper, Model-View-Controller [13][5].

Known uses: Common in dynamic data visualization.

• Brushing:

Context: Manipulate a group of data elements in a graph.

Problem: How to identify a group of elements and apply different operations to it using a point-and-click device like a mouse.

Forces: Possible operations to perform can include: delete, label, highlight, and so on [10]. Fast update is imperative. Selection, which identifies the elements to manipulate, is the basis for all these operations.

Solution: Use a geometric object, commonly called a brush, in the graph. The brush can be dragged or turned using an input device like a mouse. The elements touched by the brush are the candidates for selection and other operations. While almost all graphic components in our research library support the rectangular brush (where applicable), other kinds of brushes are also possible.
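A minimal sketch of rectangular brushing over a list of point positions. The `RectBrush` class and the `brushed_indices` function are hypothetical names introduced for this illustration; the hit test shown (inclusive bounds on a point's position) is one simple assumption among several possible ones.

```python
from dataclasses import dataclass
from typing import Sequence, Set, Tuple


@dataclass
class RectBrush:
    """Hypothetical axis-aligned rectangular brush in data coordinates."""
    x: float
    y: float
    width: float
    height: float

    def move_to(self, x: float, y: float) -> None:
        """Drag the brush so that its lower-left corner is at (x, y)."""
        self.x = x
        self.y = y

    def contains(self, px: float, py: float) -> bool:
        """Return True if the point lies inside the brush (bounds inclusive)."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def brushed_indices(brush: RectBrush,
                    points: Sequence[Tuple[float, float]]) -> Set[int]:
    """Return the indices of the points currently touched by the brush."""
    return {i for i, (px, py) in enumerate(points) if brush.contains(px, py)}
```

The returned index set is the selection on which operations such as highlighting, labeling, or deletion can then act. Recomputing it on every drag event is linear in the number of points; a spatial index would be one way to meet the fast-update force for larger data sets.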

Variants: Brushes and brushed elements may have different geometric attributes (shape, angle …). Operations may involve many functional or structural [12] aspects of data, analytical, and visual processing. Brush manipulation can be indirect either programmatically or via other graphs or widgets such as sliders.

Examples: Figure 4 shows simple rectangular brushing inside the University Farm panel. Figure 8 is a simple example of more sophisticated compound brushing [8]. As mentioned earlier, each of the scatter and histogram components uses its own observation selection property model. Three brushes are used in the histograms. Brushing the histograms selects different data subsets indicated by different selection models. The selections in the left scatter component are the union of the selections in the upper-left and the right histograms, and the selections in the right scatter component are the intersection of the selections in the upper-right and the right histograms. These selections are shown as highlights in different components.
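The union and intersection in the Figure 8 example can be sketched with plain set operations over observation indices. The helper `histogram_selection`, the variable names, and the data values below are hypothetical and are not the compound brushing implementation of [8].

```python
from typing import Sequence, Set


def histogram_selection(values: Sequence[float], low: float, high: float) -> Set[int]:
    """Return indices of observations whose value falls in the brushed bins [low, high)."""
    return {i for i, v in enumerate(values) if low <= v < high}


# Three variables observed on the same five observations (made-up data).
var_a = [1, 4, 6, 9, 3]
var_b = [2, 8, 5, 7, 1]
var_c = [10, 20, 30, 40, 50]

# One brush per histogram, each feeding its own selection model.
upper_left = histogram_selection(var_a, 0, 4)
upper_right = histogram_selection(var_b, 5, 10)
right = histogram_selection(var_c, 25, 55)

# The scatter components combine the histogram selections differently.
left_scatter_selection = upper_left | right    # union
right_scatter_selection = upper_right & right  # intersection
```

Keeping one selection set per component, and deriving the scatter selections from them, mirrors the statement above that each component uses its own selection property model.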

Known uses: Common in dynamic graphics, like XmdvTool [12], SAS (Insight, JMP, IML Workshop) [20], etc.

• Details Management:

Context: Visual exploratory analysis of a large or complex data set.

Problem: How to uncover relationships and patterns buried in a large data set.

Forces: Usually it is neither feasible nor reasonable to expect a simple graph to be an effective display when the data are large and multi-dimensional.

Solution: Organize data into different detail levels based on analysis, data, or visual processing techniques such as extraction, exclusion, summarization, distortion, and so on. The details at different levels are shown as needed, either in a single graph or in multiple graphs.

The simplest visual techniques used here are the linear zoom and pan, which are implemented by changing visible scale ranges. These techniques are included as probe tools in most of our graphics components. Nonlinear magnification or lens capabilities can be implemented as well. If needed, multiple graphics components can be overlaid in a single graph or used as different linked graphs to address different detail levels.
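A minimal sketch of linear zoom and pan expressed purely as changes to a visible scale range. The function names `pan`, `zoom`, and `to_pixel` are hypothetical and are used here only for illustration.

```python
from typing import Optional, Tuple

Range = Tuple[float, float]


def pan(visible: Range, delta: float) -> Range:
    """Shift the visible range by `delta` data units."""
    low, high = visible
    return (low + delta, high + delta)


def zoom(visible: Range, factor: float, focus: Optional[float] = None) -> Range:
    """Shrink (factor > 1) or grow (factor < 1) the visible range around `focus`.

    The focus keeps its relative position on screen; it defaults to the midpoint.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    low, high = visible
    if focus is None:
        focus = (low + high) / 2.0
    return (focus - (focus - low) / factor, focus + (high - focus) / factor)


def to_pixel(value: float, visible: Range, pixels: float) -> float:
    """Map a data value to a pixel offset along an axis `pixels` long."""
    low, high = visible
    return (value - low) / (high - low) * pixels
```

Under this assumption no data are touched by either operation: only the range used by `to_pixel` changes, so every component sharing the same range follows the zoom or pan.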

Variants: Overview-details [7], Focus-context [7], level of details [7], multiscale techniques like data zooming [22], etc.

Examples: Figure 3 shows a simple use of the focus-context technique to enhance the display of a hierarchical clustering tree. A 1D nonlinear magnification function is used in the horizontal scale model to magnify the range (0.0, 0.24) out of the total range (0.0, 0.36). The focal point is at 0.0. Without this function, the display of tree nodes and branches near the tree leaves would have been very crowded.
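The exact magnification function used for Figure 3 is not given here. As a hedged illustration of the idea only, the sketch below uses a piecewise-linear mapping, which is an assumption of this example; the name `magnify` and the `focus_share` parameter (the fraction of the axis given to the magnified range, here 0.8) are hypothetical.

```python
def magnify(value: float,
            total_max: float = 0.36,
            focus_max: float = 0.24,
            focus_share: float = 0.8) -> float:
    """Map a value in [0, total_max] to a normalized axis position in [0, 1].

    The range [0, focus_max] receives `focus_share` of the axis, which is more
    than its proportional share, so it appears magnified; the rest is compressed.
    """
    if not 0.0 < focus_max < total_max:
        raise ValueError("focus_max must lie strictly inside (0, total_max)")
    if not focus_max / total_max < focus_share < 1.0:
        raise ValueError("focus_share must exceed the proportional share")
    if not 0.0 <= value <= total_max:
        raise ValueError("value is outside the total range")
    if value <= focus_max:
        # Magnified segment anchored at the focal point 0.0.
        return value / focus_max * focus_share
    # Compressed segment for the remainder of the range.
    return focus_share + (value - focus_max) / (total_max - focus_max) * (1.0 - focus_share)
```

With these assumed settings, a value of 0.12 is placed at 0.4 of the axis instead of the 0.33 that a linear scale would give, which is the kind of extra room near the focal point that the example above relies on. A smooth function would avoid the visible kink at `focus_max`.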

Figure 5 is an example of using a triangular scatter plot matrix as the overview of a multidimensional data set. The small display of any two dimensions in the matrix can be cloned and manipulated in a more detailed view at the upper-right corner.

Related patterns: Progressive refinement.

Known uses: FilmFinder [1], SpotFire [2], ADVIZOR [11], Polaris [22], DataSplash [25].

• Network Flow:

Context: You need to perform a complicated task involving multiple operations.

Problem: How to organize and monitor the task visually.

Forces: The involved operations can include different analysis, data, and visual processing.

Solution: Divide the task into several subtasks which are modeled as nodes with different functionalities. The nodes are linked to form a directed graph network. Various processed information flows from node to node through links. Our graphics components and lattices can be used as nodes, and the data framework is designed to facilitate information flowing within the network.

Variants: Compound/composed nodes can be used to model the tasks hierarchically.

Examples: Figure 9 is a simple MM1 example in discrete event simulation. Event information flows from the source, to the queue, server, and sink nodes and finally is displayed in two graphics nodes. The entire simulation process can be monitored via animation techniques.
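A minimal sketch of such a node network for a single-server queue with exponential interarrival and service times. All class names are hypothetical, the queue and server are merged into one node for brevity (Figure 9 shows them as separate nodes), and the graphics nodes are replaced by a sink that simply records events.

```python
import random
from typing import Dict, List

Event = Dict[str, float]


class Node:
    """Hypothetical node in a directed flow network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: List["Node"] = []

    def connect(self, other: "Node") -> "Node":
        """Add a directed link to `other` and return it so links can be chained."""
        self.downstream.append(other)
        return other

    def emit(self, event: Event) -> None:
        for node in self.downstream:
            node.receive(event)

    def receive(self, event: Event) -> None:
        raise NotImplementedError


class Source(Node):
    """Generates arrivals with exponential interarrival times."""

    def __init__(self, name: str, arrival_rate: float, rng: random.Random) -> None:
        super().__init__(name)
        self.arrival_rate = arrival_rate
        self.rng = rng

    def run(self, count: int) -> None:
        clock = 0.0
        for i in range(count):
            clock += self.rng.expovariate(self.arrival_rate)
            self.emit({"id": float(i), "arrival": clock})


class QueueServer(Node):
    """First-come, first-served queue feeding one exponential server."""

    def __init__(self, name: str, service_rate: float, rng: random.Random) -> None:
        super().__init__(name)
        self.service_rate = service_rate
        self.rng = rng
        self.free_at = 0.0  # time at which the server next becomes idle

    def receive(self, event: Event) -> None:
        start = max(event["arrival"], self.free_at)  # wait if the server is busy
        self.free_at = start + self.rng.expovariate(self.service_rate)
        self.emit({**event, "start": start, "departure": self.free_at})


class Sink(Node):
    """Collects finished events; a graphics node could plot them instead."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.events: List[Event] = []

    def receive(self, event: Event) -> None:
        self.events.append(event)
```

A run is assembled by linking nodes, for example `source.connect(server).connect(sink)` followed by `source.run(50)`. The per-event waiting times `start - arrival` recorded at the sink are the kind of information a linked graphics node could display while the simulation is animated.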

Known uses: Interactive tools using the data-flow model [7] or the data state model [9][24], like SAS/EMiner [20] and DataSplash [25]; discrete event simulation tools like SAS/QSim [20].

• Progressive Refinement:

Context: The existing display is not exactly what you want. It may be too simple, or you have additional needs or different preferences.

Problem: How to reuse the display to suit your needs.

Forces: Due to the development complexities and diverse uses of analytical graphics, it is very difficult, if not impossible, to create a collection of graphs that will meet everyone’s needs.

Solution: Start with a simple display and progressively refine it until it has all the elements you want.

We attempt to follow many of the principles outlined by Tufte [23] and Bertin [4] to develop the graphics components. These components are also equipped with various probing tools, dialogs, and property pages to enable users to manipulate many aspects of the displays. More complex displays can be constructed by adjusting components’ data specifications or using the overlay feature of our lattice.

Examples: The making of the Minard Chart [23] in Figure 6 is used to illustrate the refinement process. Inspired by Wilkinson’s work in [24], we created three data sets for the march, city, and temperature information. These data were loaded into three data models and used in the initial black and white display. Then the advance and retreat routes in the march data were selected and encoded in gold and blue. Next, the display sizes and scales were adjusted based on the aspect ratio of Minard’s original chart. The path thickness was adjusted using the survivor variable in the march data as the weight. The labels were turned on and the markers were turned off in the scatter component to show the city names. Next, in the line component with the temperature data, the point labels for dates were turned on, gray was picked as the line color, and the horizontal reference lines were turned on. Finally, the title and legend in the annotation layer were added interactively.

Related patterns: Details management, Overlay.

5. CONCLUSIONS AND FUTURE WORK

This paper describes our pattern-driven approach for improving development productivity and usage effectiveness in the area of dynamic, analytical data visualization. We suggest a definition of visualization design patterns, argue for their importance, and distinguish them from software design patterns. Nine important visualization design patterns have been identified and classified into three categories: data, structural, and behavioral. In the paper, these patterns are presented using the current de facto pattern description language.

Driven by this study of design patterns, we propose a simple, yet powerful, component-based, data centric and layout-aided architecture design for a dynamic, analytical data visualization library. This design simplifies and unifies the development of many graphical displays in dynamic, analytical data visualization. The simplicity, versatility, and effectiveness of the design are demonstrated by many dynamic graphics presented in the paper.

We hope this work will attract more attention to the study and use of visual and software design patterns in information visualization. We intend to continue our own efforts in the area of analytical data visualization. Taxonomies driven by visualization design patterns can be developed for analytical data visualization applications. Another potential research topic is the development of interactive pattern programming environments and tools for the use of visualization design patterns. At this point, we are working on interactive tools for discrete event simulation using the network flow, brushing, and other patterns.

6. ACKNOWLEDGMENTS

The author is very grateful to many colleagues for their generous help and support. Special thanks to Phil Meanor for countless invaluable discussions, suggestions, and critiques which have inspired the author greatly. The author also thanks Todd Barlow for designing and reviewing the GUI for our graphics components, and Ann Kuo for her contribution to our research library implementation. Finally, the author thanks Russ Wolfinger, Marc Cohen, Radhika Kulkarni, David DeNardis, and David Duling for their support, advice, and feedback on our efforts.

REFERENCES

1. C. Ahlberg, B. Shneiderman. “Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays”. Proc. of ACM CHI94, pp. 313-317, 1994.

2. C. Ahlberg, E. Wistrand. “IVEE: An Information Visualization and Exploration Environment”. Proc. IEEE Symposium on Information Visualization’95, pp. 66-73, 1995.

3. R. A. Becker, W. S. Cleveland, M. J. Shyu. “The Visual Design and Control of Trellis Display”. Journal of Computational and Graphical Statistics, vol. 5(2), pp. 123-155, 1996.

4. J. Bertin. Semiology of Graphics: Diagrams, Networks, Maps. Univ. of Wisconsin Press, 1983.

5. F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, M. Stal. Pattern-Oriented Software Architecture. Wiley, 1996.

6. S. K. Card, J. D. Mackinlay. “The Structure of the Information Visualization Design Space”. Proc. IEEE Symposium on Information Visualization’97, pp. 92-99, 1997.

7. S. K. Card, J. D. Mackinlay, B. Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, 1999.

8. H. Chen. “Compound Brushing”. Proc. IEEE Symposium on Information Visualization 2003, pp. 181-188, 2003.

9. E. H. Chi. “A Taxonomy of Visualization Techniques using the Data State Reference Model”. Proc. IEEE Symposium on Information Visualization 2000, pp. 69-75, 2000.

10. W. S. Cleveland, R. McGill eds. Dynamic Graphics For Statistics, Wadsworth & Brooks, 1988.

11. S. G. Eick. “Visual Discovery and Analysis”. IEEE Trans. on Visualization and Computer Graphics, vol. 6(1), pp. 44-58, 2000.

12. Y. Fua, M. Ward, E. Rundensteiner. “Structure-based Brushes: a Mechanism for Navigating Hierarchically Organized Data and Information Spaces”. IEEE Trans. on Visualization and Computer Graphics, vol. 6(2), pp. 150-159, 2000.

13. E. Gamma, R. Helm, R. Johnson, J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.

14. R. Harris. Information Graphics: A Comprehensive Illustrated Reference. Management Graphics, 1996.

15. J. D. Mackinlay. “Automating the Design of Graphical Presentation of Relational Information”. ACM Trans. on Graphics, Vol. 5(2), pp. 110-141, 1986.

16. G. Meszaros, J. Doble. “A Pattern Language for Pattern Writing”. Pattern Languages of Program Design 3, (edited by R. C. Martin, D. Riehle, F. Buschmann), pp. 529-574, Addison-Wesley, 1998.

17. P. R. Murrell. “Layouts: A Mechanism for Arranging Plots on a Page”. Journal of Computational and Graphical Statistics, vol. 8(1), pp. 121-134, 1999.

18. P. R. Murrell. “The grid Graphics Package”. R News, vol. 2(2), pp. 14-19, 2002.

19. OLIVE: On-line Library of Information Visualization Environments. http://otal.umd.edu/Olive/. 1999.

20. SAS Products. www.sas.com/products, SAS Institute Inc., 2002.

21. C. Stolte, D. Tang, P. Hanrahan. “Polaris: A System for Query, Analysis, and Visualization of Multidimensional Relational Databases”. IEEE Trans. on Visualization and Computer Graphics, vol. 8(1), pp. 52-65, 2002.

22. C. Stolte, D. Tang, P. Hanrahan. “Multiscale Visualization Using Data Cubes”. Proc. IEEE Symposium on Information Visualization 2002, pp. 7-14, 2002.

23. E. R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1983.

24. L. Wilkinson. The Grammar of Graphics. Springer, 1999.

25. A. Woodruff, C. Olston, A. Aiken, M. Chu, V. Ercegovac, M. Lin, M. Spalding, M. Stonebraker. “DataSplash: A Direct Manipulation Environment for Programming Semantic Zoom Visualizations of Tabular Data”. Journal of Visual Languages and Computing, vol. 12(5), pp. 551-571, 2001.