2.2 Early Classification Proposals
2.2.2 Data Visualization Taxonomy by Buja et al.
In 1996 Buja et al. introduced a taxonomy for data visualization that, at a high level, is divided into two categories⁴: rendering and manipulation. It is regarded as a process-based approach because its main categories are the processes involved in generating a visualization.
The taxonomy is more focused on classifying approaches or aspects of a visualization technique, rather than the technique as a whole. This means that a technique may fit into several subtypes in each category of the taxonomy.
The rendering category describes the features of a static image and is divided into three subtypes, as follows:
1. Scatterplots: the observations are mapped to the locations of points in two- or three-dimensional space;
2. Traces: the observations are mapped to functions of a real parameter. Examples of this type of display are parallel coordinates and Andrews curves;
3. Glyphs: the observations are mapped to complex symbols whose features represent attributes of the observations. Examples of this visual representation are trees and castles, Chernoff faces, shape coding, and stars. A decision then has to be made about the layout of the glyphs. It may be appropriate to associate the spatial location of the glyphs with one of the data dimensions, possibly with the independent dimensions, if such exist. The positioning of glyphs on the display is an important step because it supports the comparison of the glyphs generated by the mapping step.
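To make the traces subtype concrete, the following minimal sketch evaluates an Andrews curve, in which each observation (x1, x2, x3, ...) becomes the function f(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + ... of the real parameter t. The function names andrews_curve and trace are illustrative assumptions, not names used by Buja et al.

```python
import math


def andrews_curve(observation, t):
    """Evaluate the Andrews curve of one observation at parameter t.

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    value = observation[0] / math.sqrt(2)
    for k, x in enumerate(observation[1:], start=1):
        harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            value += x * math.sin(harmonic * t)  # x2, x4, ... weight sine terms
        else:
            value += x * math.cos(harmonic * t)  # x3, x5, ... weight cosine terms
    return value


def trace(observation, samples=5):
    """Sample the curve at evenly spaced values of t in [-pi, pi]."""
    step = 2 * math.pi / (samples - 1)
    return [andrews_curve(observation, -math.pi + i * step) for i in range(samples)]
```

Plotting trace(obs) for every observation in a dataset, one line per observation, would give the kind of display the traces subtype describes; observations with similar attribute values yield curves that stay close together.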
The authors emphasize the importance of interaction for meaningful data exploration. The idea of using visualization enhanced with interactive features to explore the data and gain knowledge is a methodology called Exploratory Data Analysis (EDA), introduced by Tukey [197] and further explored by Cleveland and McGill [49]. The work of these authors has had a major influence on subsequent methods for visualization as an aid to statistical analysis. Buja et al.’s taxonomy is therefore more detailed in, and centered on, the manipulation category, which is organized in terms of three basic search tasks that it should support: finding Gestalt, posing queries, and making comparisons. They also proposed a correspondence between these three tasks and typical methods found in EDA, named respectively focusing, linking, and arranging views.
⁴ We have adopted the term category to keep consistency of terminology, but the original term used by
2.2.2.1 Finding Gestalt
This typically involves the search for some structure in the data or relationship between attributes. Examples of such a task are: find local or global linearities and nonlinearities; identify discontinuities; and, locate clusters, outliers, and unusual groups.
The authors then associate this task with a category of tools, namely focusing individual views. They compare the functionality of these tools to setting up a camera and deciding which view to look at. Put differently, this is the stage in which the variables (or the projections) for viewing are chosen, or the aspect ratio and the zoom and pan parameters are set. All this is usually accomplished interactively.
Because these parameters can sometimes be continuous (e.g. choice of projection, zoom and pan parameters), they can be animated smoothly. Examples of the animation of Gestalt parameters are (a) the grand tour, a technique that presents a dynamic sequence of projections (normally visualized as scatterplots) of the data onto a low-dimensional (≤ 2) plane moved along a continuous path in the n-dimensional variate space; (b) the projection pursuit method, an exploratory data analysis tool that tries to find interesting low-dimensional projections of multivariate data (e.g. projections showing clusters) by optimizing a projection index (a specific function associated with the method); and (c) the Exvis project, which permitted the animation of the stick figure’s⁵ parameters to find visually interesting textures.
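The two projection-based examples can be illustrated with a small sketch. It is a simplification under stated assumptions: tour_frames blends two projection frames linearly and re-orthonormalizes them, which gives a continuous path but not the path construction of the original grand tour; best_frame only picks the best of a given set of frames rather than running a real optimizer; and spread is a hypothetical projection index chosen for brevity. All function names are illustrative.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_frame(u, v):
    """Gram-Schmidt: turn two non-collinear n-vectors into an orthonormal 2-frame."""
    norm_u = math.sqrt(dot(u, u))
    e1 = [a / norm_u for a in u]
    w = [b - dot(e1, v) * a for a, b in zip(e1, v)]
    norm_w = math.sqrt(dot(w, w))
    return e1, [a / norm_w for a in w]


def random_frame(n, rng):
    """A random 2-D projection plane in n-dimensional space."""
    u = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    return orthonormal_frame(u, v)


def project(data, frame):
    """Project each n-dimensional row onto the plane spanned by the frame."""
    e1, e2 = frame
    return [(dot(row, e1), dot(row, e2)) for row in data]


def tour_frames(start, end, steps):
    """Frames along a continuous path from start to end (simplified blend)."""
    frames = []
    for i in range(steps + 1):
        s = i / steps
        u = [(1 - s) * a + s * b for a, b in zip(start[0], end[0])]
        v = [(1 - s) * a + s * b for a, b in zip(start[1], end[1])]
        frames.append(orthonormal_frame(u, v))
    return frames


def spread(points):
    """Hypothetical projection index: total variance of the projected points."""
    n = len(points)
    total = 0.0
    for axis in (0, 1):
        mean = sum(p[axis] for p in points) / n
        total += sum((p[axis] - mean) ** 2 for p in points) / n
    return total


def best_frame(data, frames, index):
    """Projection-pursuit flavour: keep the frame whose projection scores highest."""
    return max(frames, key=lambda frame: index(project(data, frame)))
```

Rendering project(data, frame) as a scatterplot for each successive frame from tour_frames would produce the animated sequence of views; a fuller tour would chain many random_frame targets one after another.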
2.2.2.2 Posing queries
This task tries to make sense of the findings (i.e. views) from the finding Gestalt stage, usually via a graphical query posed on these views. The authors argue that linking multiple views is a representative approach for this task, illustrated by the brushing⁶ technique, which was first used in the M and N plot and has since become almost a standard interaction tool for several visualization methods. In this technique one uses one of the views of the data to select elements (query formation); the corresponding elements are immediately highlighted in the other existing views, yielding a graphical response to the ‘query’ posed.
⁵ A stick figure is an m-limbed icon whose limb features (such as length, thickness, colour, and angle) are mapped to the variables of a data observation with up to m variables. The whole collection of icons placed side by side generates a texture in a complex image, which relies on the human ability to recognize patterns to find interesting features in the data [156].
The use of brushing in linked views is a powerful device because it may help to identify correlations between dimensions of a dataset. Consider, for example, a case in which a linear relationship between two variables, say X and Y, is observed in one view, the ‘query’ view. Highlighting this linear pattern in the ‘query’ view and observing the corresponding result in the other views might bring out a similar linear pattern involving other dimensions of the original set, thereby extending the original correlation between variables X and Y.
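The mechanics of brushing across linked views can be sketched as follows. This is a minimal illustration, not the implementation of any particular system: the functions view, brush, and linked_highlight, the rectangular brush, and the sample records are all assumptions made for the example. The key point is that the views share observation indices, so a selection made in one view can be replayed in every other.

```python
def view(records, x_var, y_var):
    """A 2-D view: one (x, y) point per observation, in record order."""
    return [(r[x_var], r[y_var]) for r in records]


def brush(points, x_range, y_range):
    """Query formation: indices of the points inside the brush rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i
        for i, (x, y) in enumerate(points)
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    }


def linked_highlight(views, query_view, x_range, y_range):
    """Graphical response: per-view highlight flags for the brushed observations."""
    selected = brush(views[query_view], x_range, y_range)
    return {
        name: [i in selected for i in range(len(points))]
        for name, points in views.items()
    }


# Hypothetical data: X and Y are linearly related in the first three records.
records = [
    {"X": 1.0, "Y": 2.0, "Z": 10.0, "W": 5.0},
    {"X": 2.0, "Y": 4.0, "Z": 20.0, "W": 3.0},
    {"X": 3.0, "Y": 6.0, "Z": 30.0, "W": 8.0},
    {"X": 9.0, "Y": 1.0, "Z": 5.0, "W": 7.0},
]
views = {"XY": view(records, "X", "Y"), "ZW": view(records, "Z", "W")}
highlight = linked_highlight(views, "XY", (0.0, 3.5), (1.5, 6.5))
```

Here the brush drawn on the XY view selects the three linearly related observations, and the same three are flagged in the ZW view, where a user could then inspect whether Z or W follows a similar pattern.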
2.2.2.3 Making comparisons
This task involves the comparison of several views of the data (i.e. related plots of data) generated in the finding Gestalt task. The goal is to facilitate meaningful comparisons, and the authors call this process arranging views.
The several views generated during this task require some organization strategy to facilitate understanding. For views with two variables the most common arrangement is the matrix-like organization that combines the variables in pairs (e.g. scatterplot matrix). We have identified other options for views with more than two variables, thus expanding the original work done by the authors. These options are nesting variables within variables (e.g. worlds within worlds and hierarchical axis), organizing the views in a spreadsheet-like format (e.g. table lens and spreadsheet-like interface for visualization exploration), applying distortion to accommodate both detail and overview (e.g. fisheye views and the perspective wall), or making use of dynamic organization (rapid serial visual presentation, RSVP⁷).
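The matrix-like arrangement for two-variable views can be sketched as a mapping from grid cells to variable pairs. The function names below are illustrative assumptions; the layout follows the common scatterplot-matrix convention in which every row shares a vertical variable and every column shares a horizontal one.

```python
from itertools import combinations


def scatterplot_matrix_layout(variables):
    """Map each grid cell (row, col) to the (x, y) variable pair it displays."""
    layout = {}
    for row, y_var in enumerate(variables):
        for col, x_var in enumerate(variables):
            layout[(row, col)] = (x_var, y_var)
    return layout


def distinct_pairs(variables):
    """The unordered variable pairs, i.e. the cells above the diagonal."""
    return list(combinations(variables, 2))
```

For n variables the grid has n × n cells but only n(n − 1)/2 distinct pairs; cells on the diagonal pair a variable with itself and are often used for labels instead of plots.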
2.2.2.4 Applying the taxonomy to techniques
The authors did not provide full examples of how the taxonomy should be applied to the mentioned techniques, restricting themselves to classifying only three techniques according to the finding Gestalt sub-category of the manipulation category.
Table 2.2 summarizes their textual description of the examples. The entries under the posing queries and making comparisons columns are our interpretations of those techniques with regard to these categories.
⁷ This technique presents successive views in a brief period of time to support the browsing of views
| Technique | Finding Gestalt (focusing individual views) | Posing queries (linking multiple views) | Making comparisons (arranging views) |
|---|---|---|---|
| Scatterplots (scatterplots sub-category) | Choice of projections, aspect ratio, zoom and pan | Brushing | Matrix-like arrangement |
| Parallel coordinates and Andrews curves (traces sub-category) | Choice of variables, their order, their scale, and the scale and aspect ratio of the plot | Brushing, hierarchical brushing | Single view |
| Glyphs (glyphs sub-category) | Choice of variables and their mapping to glyph features | Brushing | Comparison of glyphs with distinct mappings |
Table 2.2: Listing some techniques classified according to the Buja et al. taxonomy. Note that the emphasis is given to the manipulation category. The classification under the rendering category is provided in brackets after the technique’s name.
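As an illustration of how such a classification could be put to use, the sketch below encodes the rows of Table 2.2 as records that can be queried. The Classification type, its field names, and the techniques_supporting helper are assumptions introduced for this example; the cell contents are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    rendering: str    # rendering sub-category
    focusing: tuple   # finding Gestalt (focusing individual views)
    linking: tuple    # posing queries (linking multiple views)
    arranging: tuple  # making comparisons (arranging views)


TABLE_2_2 = {
    "Scatterplots": Classification(
        rendering="scatterplots",
        focusing=("choice of projections", "aspect ratio", "zoom and pan"),
        linking=("brushing",),
        arranging=("matrix-like arrangement",),
    ),
    "Parallel coordinates and Andrews curves": Classification(
        rendering="traces",
        focusing=(
            "choice of variables",
            "order of variables",
            "scale of variables",
            "scale and aspect ratio of the plot",
        ),
        linking=("brushing", "hierarchical brushing"),
        arranging=("single view",),
    ),
    "Glyphs": Classification(
        rendering="glyphs",
        focusing=("choice of variables", "mapping of variables to glyph features"),
        linking=("brushing",),
        arranging=("comparison of glyphs with distinct mappings",),
    ),
}


def techniques_supporting(tool, task):
    """Names of the techniques that list `tool` under the given manipulation task."""
    return sorted(
        name
        for name, entry in TABLE_2_2.items()
        if tool in getattr(entry, task)
    )
```

A query such as techniques_supporting("hierarchical brushing", "linking") then returns only the traces entry, which is the kind of comparison across techniques that the manipulation sub-categories are meant to support.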
2.2.2.5 Critique
The Buja et al. taxonomy is innovative in the sense that it contains three high-level sub-categories for the manipulation category. This introduces a degree of abstraction and allows different techniques to be compared according to their capabilities in dealing with the three manipulation processes: focusing individual views, linking multiple views, and arranging views. Note that the use of processes as categories is characteristic of a process-based classification.
Another positive aspect is the use of Gestalt theory (which has been studied for over 80 years) as a formal basis to describe the interaction (manipulation) part of a visualization. This contributes positively to the visualization field in the same way that perception theory has influenced the visual design of methods in the field, providing guidelines, frameworks, and models.
However, their taxonomy presents some fundamental deficiencies. Firstly, the rendering sub-categories are unable to describe some valuable visualization methods, such as dimension stacking and pixel-oriented methods. Secondly, the scatterplots sub-category is very limited and describes only one technique: the scatterplot.
Finally, the only practical and complete example of their taxonomy in use was the classification of techniques implemented by their system, called XGobi. Apart from that, the techniques mentioned in their work were classified only with respect to the rendering category (i.e. scatterplots, traces, and glyphs sub-categories). It is quite difficult to evaluate the taxonomy’s usefulness in understanding visualization methods because the authors provided neither sufficient examples nor any comparative observations on the techniques classified according to their taxonomy.