

3.1.1 The Visualization Issue

Finding an answer to the first challenge means solving the complex problem of graphically representing high-dimensional entities on a two-dimensional display. We consider this a complex problem because a dataset’s dimensionality is the single most influential factor in the design of a visualization method. When the data to be visualized are defined in an n-dimensional space, where n is lower than four, there exist a great number of well-known techniques and procedures to deal with them efficiently (see for example Keller and Keller [111]). Indeed, as pointed out by Hibbard in [93], “Visualization has been successful because so much computer data is produced that describe the four-dimensional space-time world that our eyes and brains evolved to see.”

For four dimensions, one possible strategy is to incorporate the fourth dimension as time and represent the data as an animation of visualizations. But even this solution sometimes does not achieve satisfactory results, especially if we consider, for example, the memory factor, i.e. it is not easy to keep track of the visualizations presented in each frame in order to compare them. The real problem, particularly for multidimensional data, arises when the data dimensionality is greater than four, in which case the range of available solutions starts to narrow.

For multivariate data, however, the dimensionality problem is slightly different. There exists a variety of solutions that can handle more than four variates comfortably (of course, some limitations do exist, perhaps when approaching hundreds of variates and thousands of observations). The problem lies not so much in the number of variates as in the question of how efficiently a visualization can provide a visual interpretation of the data observations that is capable of fostering insight into possible relationships among them.

Ideally, a representation of high-dimensional data should be designed in such a way as to afford perception by the human mind, which is accustomed to dealing with our four-dimensional space-time world. The degree of effectiveness of such a visual representation is a function of both the data type and the visualization goal and, hence, cannot be achieved independently of these factors. Therefore there is no ultimate strategy capable of solving this problem with the same degree of effectiveness for all possible visualization scenarios. For example, according to the intrinsic dimensionality metrics introduced by Grinstein et al. in [86], parallel coordinates is not as effective as the RadViz technique in uniquely identifying data records representing binary vectors, whereas RadViz is not as good as parallel coordinates in retaining the original value of individual data observations.

to do by enumerating four strategies – filtering, mapping, embedding, and projection – as sub-categories of the data picturing stage in the TSV ontology (cf. Chapter 2, Section 2.3). They have been identified as representative strategies based on the various visualization techniques designed for multivariate and multidimensional data.

The Filtering strategy

Of the four strategies described in Section 2.3, we are especially interested in the filtering approach. It comprises those techniques whose central idea is the reduction of the amount of data presented. The filtering process starts by defining a focus point in the n-dimensional data space of interest. The focus point defines the position from which the slices are extracted during the filtering process.

Usually a slice is low-dimensional, i.e. one-, two-, three-, or even four-dimensional (in which case the use of animation may be necessary), because using higher-dimensional slices would lead us back to the original problem of visualizing a high-dimensional space. Normally ‘thick’ slices are used to filter multivariate datasets because the data space is commonly scattered and sparse, so the thicker the slice, the more data observations are ‘filtered’; in the multidimensional case, by contrast, a ‘thin’ slice is more appropriate, since the continuous nature of such a (‘dense’) space allows us to sample it virtually everywhere. Figure 3.1 shows the filtering being applied to a 3D multivariate dataset to extract a 2D slice defined by the variates X and Z. Note in that picture that only those observations that lie within the slice appear in the final projection shown in Figure 3.1-(d). So if we think of a multivariate dataset as a table, in which data items are rows and variates are columns, the selection of a slice is akin to the creation of a derived table using all rows of the original table but extracting only their values in the selected columns.
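The ‘thick’ slice of Figure 3.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five observations, the function name `thick_slice`, and its parameters are hypothetical and do not come from any system described in this chapter. Observations are kept when their Y value lies within a sub-range centred on the focus point, and the survivors are then projected onto the selected variates X and Z.

```python
# Hypothetical 3D multivariate dataset: each row is one observation (X, Y, Z).
observations = [
    (0.10, 0.48, 0.90),
    (0.35, 0.10, 0.20),
    (0.60, 0.55, 0.40),
    (0.80, 0.95, 0.70),
    (0.25, 0.52, 0.15),
]


def thick_slice(rows, focus_value, thickness, selected=(0, 2), unselected=1):
    """Keep rows whose unselected variate lies inside the slice, then project.

    focus_value -- coordinate of the focus point along the unselected variate
    thickness   -- total width of the sub-range centred on focus_value
    selected    -- column indices that span the slice (here X and Z)
    """
    half = thickness / 2.0
    inside = [r for r in rows if abs(r[unselected] - focus_value) <= half]
    # Projection: drop the unselected column, keep only the selected ones.
    return [tuple(r[i] for i in selected) for r in inside]


projected = thick_slice(observations, focus_value=0.5, thickness=0.2)
print(projected)
```

With these made-up values, three of the five observations fall inside the slice around Y = 0.5; widening `thickness` admits more of them, which is why a thick slice suits sparse multivariate data.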

For the multidimensional case a slice also corresponds to the subspace spanned by the dimensions selected in the filtering process. In this case, however, a ‘thin’ slice is used, which means assigning a single value to each unselected dimension, rather than a sub-range as in the multivariate case shown above. Therefore the unselected dimensions are fixed to the corresponding coordinates of a focus point (thus defining the slice’s ‘thickness’), whereas the selected dimensions are allowed to vary within a specified region (thus defining the slice’s ‘size’).

Figure 3.2 demonstrates the filtering concept applied to a 3D multidimensional space defined by a three-dimensional unit cube, having one vertex at the origin of a Cartesian system and its faces parallel to the canonical planes. In order to ‘filter’ a two-dimensional subspace of this unit cube, the first step is to define the location of the focus point, which is used to identify the projection plane (in this case the slice is also the projection plane).

[Figure 3.1, panels (a) to (d): unit cube with axes X, Y, Z spanning (0,0,0) to (1,1,1); location of the projection plane at Y = 0.5; slicing region; X-Z projection plane.]

Figure 3.1: ‘Filtering’ a two-dimensional subspace defined by variates X and Z from a three-dimensional multivariate dataset defined within a unit cube. The filtering takes place after defining: 1) the position of the projection plane (in this case identified by the coordinate Y of the focus point); 2) the size of the slice (determined by the ranges in X and Z); and 3) the ‘thickness’ of the slice to be extracted (determined by the sub-ranges in Y). Picture (a) shows the original multivariate dataset with 10 observations (the cyan balls); picture (b) shows the definition of the slice (in green), highlighting the selected observations in dark blue; picture (c) demonstrates how the ‘sliced’ observations are projected onto the projection plane; and picture (d) shows the end result, the projection plane with the ‘sliced’ observations being represented by red dots.

Then we proceed to select the dimensions to compose the filtered subspace, say dimensions X and Z. These indeed define a two-dimensional subspace (or slice) in the cube, but they alone are not enough to determine the location of the slice within the cube. Finally, the location of the ‘thin’ slice is determined by the current value of the focus point’s coordinate corresponding to the unselected dimension, i.e. Y. Now the slice can be uniquely located within the original multidimensional space.
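A ‘thin’ slice through a continuous multidimensional space can be sketched as sampling a function on a grid over the selected dimensions while the unselected dimension stays fixed at the focus point. The scalar field below, the name `thin_slice`, and its parameters are all hypothetical, chosen only to make the example self-contained; the optional `domains` argument mimics the kind of domain constraint drawn in Figure 3.2-(b).

```python
def field(x, y, z):
    """Hypothetical scalar field defined everywhere in the unit cube."""
    return x + 2.0 * y + 3.0 * z


def thin_slice(f, focus, selected=(0, 2),
               domains=((0.0, 1.0), (0.0, 1.0)), resolution=5):
    """Sample f over the selected dimensions; the others stay at the focus.

    focus    -- full n-dimensional focus point; its coordinates in the
                unselected dimensions locate the slice
    selected -- indices of the two dimensions spanning the slice
    domains  -- (low, high) interval for each selected dimension
    """
    def samples(low, high):
        return [low + (high - low) * i / (resolution - 1)
                for i in range(resolution)]

    first, second = selected
    grid = []
    for a in samples(*domains[0]):
        row = []
        for b in samples(*domains[1]):
            point = list(focus)   # start from the focus point
            point[first] = a      # vary the first selected dimension
            point[second] = b     # vary the second selected dimension
            row.append(f(*point))
        grid.append(row)
    return grid


# X-Z slice located at Y = 0.5, over the whole unit square.
full = thin_slice(field, focus=(0.0, 0.5, 0.0))
# Same slice with the domain of X restricted to [0.3, 0.7].
narrow = thin_slice(field, focus=(0.0, 0.5, 0.0),
                    domains=((0.3, 0.7), (0.0, 1.0)))
```

Because the space is dense, any `resolution` can be requested; only the Y coordinate of the focus point decides where in the cube the slice sits.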

[Figure 3.2, panels (a) and (b): unit cube with axes X, Y, Z spanning (0,0,0) to (1,1,1); 2D subspace at Y = 0.5; in (b), domain constraint on X = [0.3, 0.7].]

Figure 3.2: ‘Filtering’ a two-dimensional subspace defined by the dimensions X and Z from a three-dimensional multidimensional unit cube. Besides selecting the dimensions for filtering (X and Z), it is necessary to specify a value for the other dimension (Y = 0.5), so that the two-dimensional subspace can be uniquely identified, as shown in picture (a). In picture (b) a domain filter has been applied to the dimension X, further reducing the size of this dimension. Only the data items that lie within that pattern-filled plane compose the ‘filtered’ data.

This form of filtering is somewhat similar to the philosophy of “divide to conquer” or, better put, “slice to understand”. Another form of controlling the filtering outcome in both multivariate and multidimensional cases is achieved when we define constraints on the variates’ range (in the multivariate case) or on the dimensions’ domain (in the multidimensional case). This type of control is shown in Figure 3.2-(b), in which the size of the XZ slice has been reduced via a constraint on dimension X.

Normally the control over a dataset’s ranges/domains is realized by a simple interface such as sliders associated with each range/domain (see for example the IVEE system [3] for this concept adapted to multivariate applications). This type of filtering is comparable to querying a database in search of records whose values lie within ranges set up (for example via sliders) for each variable. Imposing restrictions on a variate’s range has the added bonus that the formation of the query (i.e. the selection of the variates to form a slice) is interactive and the result is immediately available on the visualization. It also allows the combination of variates in conjunctions (AND), but disjunctive (OR) combination is also possible if this form of filtering is applied in sequence.
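The analogy with a database query can be illustrated with a minimal sketch. The records, the function name `range_query`, and the slider values are hypothetical. Constraining several variates at once behaves as a conjunction; for the disjunction, the sketch assumes that applying the filter ‘in sequence’ means running separate queries and merging their results, which is one possible reading rather than a description of how IVEE or any other system works.

```python
# Hypothetical records; each key is a variate controlled by one range slider.
records = [
    {"X": 0.2, "Y": 0.5, "Z": 0.9},
    {"X": 0.5, "Y": 0.4, "Z": 0.1},
    {"X": 0.6, "Y": 0.8, "Z": 0.3},
    {"X": 0.9, "Y": 0.9, "Z": 0.5},
]


def range_query(rows, ranges):
    """Conjunctive (AND) filter: every constrained variate must be in range."""
    return [r for r in rows
            if all(low <= r[v] <= high for v, (low, high) in ranges.items())]


# AND: both sliders constrain the same query.
both = range_query(records, {"X": (0.3, 0.7), "Y": (0.3, 0.6)})

# OR (assumed reading): run one query per slider, then merge the results
# while preserving the original record order.
in_x = range_query(records, {"X": (0.3, 0.7)})
in_y = range_query(records, {"Y": (0.3, 0.6)})
either = [r for r in records if r in in_x or r in in_y]
```

In an interactive setting each slider movement would simply re-run `range_query` with the updated interval, so the visualization can be refreshed immediately.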

In short, the filtering process may be controlled in two ways:

1. By selecting variables from the original data space to form a subspace in which the data is to be presented. The way in which this form of filtering is carried out in both multivariate and multidimensional data can be described by a similar process called ‘slicing’.

2. By imposing constraints on the range of each variate or on the domain of each dimension, thus filtering the data items that will appear in the visualization. This strategy also works similarly for both multivariate and multidimensional datasets and is responsible for defining the ‘size’ and ‘thickness’ of a slice.