
2.3 The Three Stage Visualization Ontology

2.3.2 The Data Picturing Stage

The data picturing category has four sub-categories, each representing the main¹³ visual strategy that a visualization technique may adopt to present a high-dimensional dataset on a lower-dimensional display. This is also the most important of the three categories, because it characterizes the essential quality of a visualization method. The sub-categories are:

• Filtering: This sub-category represents techniques that either (a) select a small group of dimensions (usually no more than three) from the original set of dimensions, or (b) filter data items according to some restriction imposed on them. The technique then presents the result of this filtering process, usually a low-dimensional subset, using any standard low-dimensional technique such as a line graph, contour lines, or a scatterplot.

Most of the methods that adopt the filtering strategy differ in the way they select the subsets, depending on whether the dataset is multidimensional or multivariate. Methods in this sub-category may also show multiple views of the data simultaneously, each covering a different subset of the variables. For example, hyperslice and hyperbox implement filtering by providing a simultaneous view of all possible pairwise combinations of dimensions, thus following strategy (a) described above. On the other hand, data visualization sliders and dynamic queries use sliders to select the data items portrayed on screen and offer a single view of the result, thus following strategy (b).

Other examples of techniques in this category are the 3D scatterplot matrix, which selects three variables from the original n-dimensional dataset; table lens, which relies on a distortion mechanism to integrate a detailed view of selected data items with a less detailed graphical representation of the whole dataset; and visualization for multidimensional function by projections (VMFP), which uses an approximation step called a uniformly distributed sequence to transform a multidimensional function into a multivariate dataset and then selects two or three dimensions to be visualized as 2D or 3D scatterplots, respectively.

¹³ We explain why we have used the term main visual strategy after the description of the four sub-categories.

• Embedding: The techniques in this sub-category organize the data in a hierarchy of dimensions, and they differ from one another in the algorithm they use to create this hierarchy. For example, in the dimension stacking technique the user selects the order in which one coordinate system is embedded within another. First, the user chooses two variables to form the outermost coordinate system (the top of the hierarchy). This two-dimensional coordinate system is divided into rectangular bins, and within each bin the next two variables span the second-level coordinate system. The process is repeated until only one or two variables are left (the bottom of the hierarchy), at which point the data is displayed using any visualization method suitable for low-dimensional data. This is, in essence, the same strategy followed by hierarchical axis and worlds within worlds.

Alternatively, the hierarchy of dimensions may be created automatically, as in the quad-tree mapping (QTM) technique. This technique deals with an n-dimensional multivariate dataset in which each data item holds the fields of an individual together with its fitness level. These individuals are produced by a genetic algorithm, in which they represent alternative solutions to the target problem. The n fields of a data item are transformed into an n-bit binary value, and the method places each data item on a two-dimensional area as follows: (1) the two most significant bits decide in which of the four quadrants of the 2D display area the data item is placed, which forms the first level of the hierarchy; (2) the procedure is repeated recursively, using in each iteration the next two most significant bits to subdivide the current quadrant further and refine the location of the data item; and (3) when all bits have been used, the last level of the hierarchy is reached, which gives the definitive location of the data item. The fitness values are then used to create a height field over this layout.

Finally, a technique in this category may simply represent a hierarchy inherent to the dataset, as is the case for the cone trees technique. This method uses a three-dimensional tree structure as a metaphor to represent the structure of, say, a file system. The technique relies on interactive methods such as rotation, zoom and panning to overcome the occlusion that naturally arises from drawing a three-dimensional tree on a two-dimensional display. The treemaps technique also represents the inherent hierarchy of a dataset, but instead of a three-dimensional representation it applies a strategy similar to that of the QTM technique. The difference is that the recursive subdivision of the two-dimensional display area is guided by the number of elements at each level of the dataset's hierarchy, rather than being fixed at four subdivisions as in the QTM technique.

• Mapping: The essence of any technique in this sub-category is to map the attributes of each individual data item to the graphical properties of a visual mark (cf. Chapter 1, Section 1.1.2). A visual mark may be a pixel (e.g. pixel-oriented, natural textures, circle segments, heat maps, survey plots); in this case the variates are associated with specific regions of the display, each variate value is assigned to a coloured pixel or texture, and the pixels or textures that belong to the same variable are placed together in the corresponding region. Alternatively, a visual mark may be an icon (e.g. stick figure, star glyph, color icons, shape coding, and Chernoff faces), in which case the technique should arrange all the icons generated in the mapping step in a way that reveals new information about the data.

Common to all techniques in this sub-category is the intention of exploiting human perceptual abilities to reveal the inherent behaviour of a dataset and to recognize relationships among data elements.

• Projection: The projection sub-category comprises techniques that apply a geometric projection of the n-dimensional data onto a lower-dimensional subspace, usually two-dimensional. A projection, in this case, may be a simple parallel projection onto a low-dimensional plane (e.g. grand tour and prosections); a non-linear projection (e.g. SOM maps and RadViz); a projection into a mathematical domain (e.g. Andrews curves); or simply a rearrangement of the axes so that they are no longer orthogonal, with the data displayed along all the axes simultaneously (e.g. parallel coordinates, multi-line graphs, polar charts, and star coordinates).
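The two filtering strategies of the first sub-category, (a) choosing small groups of dimensions and (b) restricting the data items shown, can be illustrated with a minimal Python sketch. The function names `pairwise_views` and `range_filter` and the toy data are hypothetical; the sketch only mimics the idea behind hyperslice/hyperbox (all pairwise views) and dynamic queries (slider ranges), not their actual implementations.

```python
from itertools import combinations


def pairwise_views(dimensions):
    """Strategy (a): one low-dimensional view per pair of dimensions,
    similar in spirit to the simultaneous pairwise views of hyperslice/hyperbox."""
    return list(combinations(dimensions, 2))


def range_filter(items, ranges):
    """Strategy (b): keep only the items whose values lie inside every
    slider range (both ends inclusive), as a dynamic-query style filter.

    items  -- list of dicts mapping variable name -> value
    ranges -- dict mapping variable name -> (low, high) slider positions
    """
    return [
        item
        for item in items
        if all(low <= item[var] <= high for var, (low, high) in ranges.items())
    ]


if __name__ == "__main__":
    # Four dimensions yield six pairwise views.
    print(pairwise_views(["x1", "x2", "x3", "x4"]))
    # Hypothetical toy data: two sliders restrict mpg and hp.
    cars = [{"mpg": 30, "hp": 90}, {"mpg": 18, "hp": 200}, {"mpg": 25, "hp": 120}]
    print(range_filter(cars, {"mpg": (20, 35), "hp": (0, 150)}))
```

Each selected pair, or the filtered subset, would then be handed to an ordinary low-dimensional plot such as a scatterplot.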
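The dimension stacking procedure described under Embedding (outer variables define bins, inner variables span a coordinate system inside each bin) can be sketched as a function that computes the screen position of one data item. This is a minimal sketch under stated assumptions: values are already normalized to [0, 1], every level uses exactly two variables, and all outer levels use the same number of equal-width bins; the name `stack_position` is hypothetical.

```python
def stack_position(item, levels, bins):
    """Screen position of one data item under dimension stacking.

    item   -- dict mapping variable name -> value normalized to [0, 1]
    levels -- (x_var, y_var) pairs from the outermost to the innermost level;
              the last pair is drawn as an ordinary scatterplot inside the
              innermost cell (assumes every level uses two variables)
    bins   -- number of equal-width bins per variable on every outer level
              (a hypothetical uniform choice)
    Returns (x, y) in the unit square.
    """
    x0, y0, w, h = 0.0, 0.0, 1.0, 1.0
    for x_var, y_var in levels[:-1]:
        # Discretize the two variables of this level; clamp value 1.0 into the last bin.
        i = min(int(item[x_var] * bins), bins - 1)
        j = min(int(item[y_var] * bins), bins - 1)
        # Shrink the current cell and move into bin (i, j).
        w, h = w / bins, h / bins
        x0, y0 = x0 + i * w, y0 + j * h
    # Bottom of the hierarchy: plot the last two variables inside the cell.
    x_var, y_var = levels[-1]
    return x0 + item[x_var] * w, y0 + item[y_var] * h


if __name__ == "__main__":
    point = {"a": 0.75, "b": 0.25, "c": 0.5, "d": 0.5}
    # Outer axes a/b pick a bin; inner axes c/d place the point within it.
    print(stack_position(point, [("a", "b"), ("c", "d")], bins=2))
```

The order of the `levels` list plays the role of the user's choice of which variables sit at the top of the hierarchy.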
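The quad-tree mapping (QTM) steps (1) to (3) from the Embedding sub-category translate fairly directly into code: each pair of bits, most significant first, selects one of four quadrants, which amounts to appending one binary digit to a column index and one to a row index. In this minimal sketch the convention that the first bit of a pair chooses left/right and the second bottom/top is an assumption, as are the names `qtm_cell` and `qtm_height_field`; n is assumed to be even.

```python
def qtm_cell(bits):
    """Locate one individual on a grid by quad-tree mapping.

    bits -- the individual's n-bit encoding, most significant bit first.
    Each pair of bits picks one of four quadrants of the current cell; the
    first bit of a pair chooses left/right and the second bottom/top (this
    bit-to-axis convention is an assumption of the sketch).
    Returns (column, row, cells_per_side) of the final cell.
    """
    if len(bits) % 2:
        raise ValueError("this sketch assumes an even number of bits")
    col = row = 0
    for k in range(0, len(bits), 2):
        # Descend one hierarchy level: the current quadrant splits into four.
        col = 2 * col + bits[k]
        row = 2 * row + bits[k + 1]
    return col, row, 2 ** (len(bits) // 2)


def qtm_height_field(population):
    """Build a height field {(column, row): fitness} from (bits, fitness) pairs."""
    return {qtm_cell(bits)[:2]: fitness for bits, fitness in population}


if __name__ == "__main__":
    # Hypothetical population of 4-bit individuals with their fitness values.
    population = [([1, 0, 0, 1], 3.5), ([0, 0, 1, 1], 1.2), ([1, 1, 1, 1], 2.0)]
    print(qtm_height_field(population))
```

Integer cell indices avoid floating-point drift; dividing by `cells_per_side` gives coordinates in the unit square if needed.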
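For the treemaps technique mentioned under Embedding, the recursive subdivision can be sketched with the classic slice-and-dice layout, which alternates the split direction at each level of the hierarchy. Weighting each child's strip by the number of leaf elements beneath it is an assumption of this sketch, as are the tree encoding (a leaf is a string, an internal node a `(name, [children])` tuple) and the function names.

```python
def count_leaves(node):
    """A leaf is a string; an internal node is a (name, [children]) tuple."""
    if isinstance(node, str):
        return 1
    _, children = node
    return sum(count_leaves(child) for child in children)


def slice_and_dice(node, x=0.0, y=0.0, w=1.0, h=1.0, depth=0, out=None):
    """Return a list of (leaf, x, y, w, h) rectangles tiling the given area."""
    if out is None:
        out = []
    if isinstance(node, str):
        out.append((node, x, y, w, h))
        return out
    _, children = node
    total = count_leaves(node)
    if total == 0:  # an empty directory gets no area in this sketch
        return out
    offset = 0.0
    for child in children:
        share = count_leaves(child) / total
        if depth % 2 == 0:
            # Even levels: split the area into vertical strips (along x).
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd levels: split into horizontal strips (along y).
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


if __name__ == "__main__":
    # Hypothetical file system: two files at the root and one sub-directory.
    tree = ("root", ["a.txt", "b.txt", ("docs", ["c.txt", "d.txt"])])
    for rect in slice_and_dice(tree):
        print(rect)
```

Unlike the QTM sketch, the number of strips at each level follows the number of children of a node rather than being fixed at four.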
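The pixel case of the Mapping sub-category (one display region per variable, one pixel per variate value, pixels of the same variable grouped together) can be illustrated with a minimal sketch. Here gray levels stand in for colours, values are min-max normalized per variable, and pixels are placed row by row inside each square region; published pixel-oriented techniques often use spiral or recursive arrangements instead, so the placement and the name `pixel_regions` are assumptions.

```python
import math


def pixel_regions(items, variables):
    """Pixel-oriented mapping sketch: one square display region per variable.

    items     -- non-empty list of dicts mapping variable name -> number
    variables -- names of the variables to display
    Returns {variable: grid}, where grid is a list of rows holding a gray
    level 0-255 per data item (None marks pixels left unused).
    """
    side = math.ceil(math.sqrt(len(items)))
    regions = {}
    for var in variables:
        values = [item[var] for item in items]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # constant variables map to gray level 0
        grid = [[None] * side for _ in range(side)]
        for k, value in enumerate(values):
            # Row-major placement (an assumption), so item k sits at the same
            # spot in every region and can be compared across variables.
            row, col = divmod(k, side)
            grid[row][col] = round(255 * (value - lo) / span)
        regions[var] = grid
    return regions


if __name__ == "__main__":
    # Hypothetical three-item dataset with two variables.
    data = [{"temp": 10, "rain": 0}, {"temp": 20, "rain": 5}, {"temp": 30, "rain": 2}]
    for name, grid in pixel_regions(data, ["temp", "rain"]).items():
        print(name, grid)
```

Drawing each grid as a small image, side by side, yields the kind of region-per-variable display described for the pixel-based techniques.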
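Two of the projections listed under Projection are simple enough to write down directly. Star coordinates place n axes at evenly spaced angles around an origin and position an item at the sum of the axis unit vectors scaled by its normalized values; Andrews curves map an item x to the function f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ... The sketch below assumes the default, evenly spaced star-coordinates axes and min-max normalization; the function names are hypothetical.

```python
import math


def star_coordinates(item, mins, maxs):
    """Project one n-dimensional item to 2D with evenly spaced star axes."""
    n = len(item)
    x = y = 0.0
    for j, value in enumerate(item):
        # Min-max normalize to [0, 1]; a constant dimension contributes 0.
        t = (value - mins[j]) / ((maxs[j] - mins[j]) or 1)
        angle = 2 * math.pi * j / n
        x += t * math.cos(angle)
        y += t * math.sin(angle)
    return x, y


def andrews_curve(item, t):
    """Evaluate the Andrews curve of one item at parameter t."""
    total = item[0] / math.sqrt(2)
    for k, value in enumerate(item[1:], start=1):
        harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if k % 2 == 1:
            total += value * math.sin(harmonic * t)
        else:
            total += value * math.cos(harmonic * t)
    return total


if __name__ == "__main__":
    print(star_coordinates([3.0, 1.0, 4.0, 1.0], [0.0] * 4, [5.0] * 4))
    # Sample one curve over [-pi, pi] as it would be drawn on screen.
    samples = [andrews_curve([1.0, 2.0, 0.5], -math.pi + i * math.pi / 4) for i in range(9)]
    print(samples)
```

In both cases every dimension contributes to the final 2D position or curve, which is what distinguishes these techniques from the filtering approaches that discard dimensions.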

At the introduction of the data picturing category we mentioned that this classification represents the main visualization strategy a technique may adopt. However, a technique does not have to employ a single strategy to generate a visualization; sometimes it uses a secondary strategy to complete the representation of the data. In our classification we consider only the core, or primary, strategy of the visualization process. For instance, the hyperslice technique follows the filtering strategy to extract subspaces, but uses a gradient field to represent each subset on screen. Therefore hyperslice's data picturing stage is classified in the filtering category, while a secondary classification would place hyperslice in the mapping category.

Table 2.6 summarizes the classification of the visualization methods mentioned earlier under the data picturing category, considering only the main classification and ignoring any secondary strategy a method may use. Note that the projection approach is the most popular, with ten instances, followed by mapping with nine, filtering with eight, and embedding with only five instances from our list. The number of techniques per strategy stays fairly close to the average of eight (32 techniques over four strategies), with embedding the furthest below it, so the techniques are reasonably balanced across the categories. A final remark is that the data picturing category could be further subdivided according to the different algorithms a technique may follow to produce a visualization.
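The average of eight comes from dividing the total number of classified techniques by the number of strategies, using the counts reported for Table 2.6 (projection, mapping, filtering and embedding, respectively); the deviations show how far each strategy lies from that average:

```latex
\bar{n} = \frac{10 + 9 + 8 + 5}{4} = \frac{32}{4} = 8,
\qquad
n_i - \bar{n} \in \{+2,\; +1,\; 0,\; -3\}
```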

Source: A framework for the visualization of multidimensional and multivariate data (pp. 45-48)