In todays database systems, queries are specified in a one-by-one fashion. This is adequate if the user of the database exactly specifies the desired data and accesses a clearly separated data set. For many application areas where databases are used on a regular basis, e.g. accounting, reservation systems, and so on, queries are often based on keys, accessing exactly the desired data. In other application areas, however, especially those with very large data volumes such as scientific, engineering and environmental databases, it is often difficult to find the desired data. If, for example, researchers in environmental science are searching a huge database of test series for significant values, they might be looking for some correlation between multiple dimensions for some specific period of time and some geographic region. Since none of the parameters for the query are fixed, it is in general very difficult to find the desired information. With a query interface that provides users more feedback on the results of their queries, the researchers would probably start to specify one query that corresponds to some assumption and after issuing many refined queries and applying statistical methods to the results, they might find some interesting correlation.
4.3.1
Query Specification in XmdvTool
Query Specification in XmdvTool is through N-dimensional brushing. Brushing is the process of interactively painting over a subregion of the data display using a mouse, stylus, or other input device that enables the specification of location at- tributes. It has been used as a method for performing selection in graphics for many years. The principles of brushing were first explored by Becker and Cleve- land [BC87] and applied to high dimensional scatterplots. In this system, the user
specified a rectangular region in one of the 2D scatterplot projections and based on the mode of operation, points in other views corresponding to those falling within the brush were highlighted, deleted or labelled. Ward and Martin [War94, MW95] extended brushing to permit brushes to have the same dimensionality as the data (N-D instead of 2-D). The goal is to allow the user to gain some understanding of the spatial relationships in N-space by highlighting all data points that fall within a user-defined, relocatable subspace.
One common method of classifying brushing techniques is by identifying space in which the selection is being performed, namely screen or data space. This can then be used to specify a containment criteria (whether a particular point is inside or outside the brush). In screen space techniques, a brush is completely specified by a 2-D contiguous subspace on the screen. In data space techniques, a complete spec- ification consists of either an enumeration of the data elements contained within the brush or the N-D boundaries of a hyperbox that encapsulates the selection. A third category, namely structure space techniques, allows selection based on structural relationships between data points as discussed in Chapter 5.
In addition, brush manipulation may be direct or indirect. Direct manipulation refers to the ability to interactively control brush creation and manipulation by mouse (or other locator) actions on the data display itself. On the other hand, indirect manipulation refers to the use of separate widgets such as sliders to specify or manipulate the brush coverage. Direct manipulation is generally preferred for data- driven operations (such as isolating an interesting subset of the display), while user- driven operations (such as a range query) are often easier to specify with indirect methods.
The brush region provides the query for the data to be selected. Various brush operations are defined in XmdvTool that can be performed on the data selected by
the brush, important among them being highlighting and linking. Highlighting is one of the most fundamental brush operations. Points that are contained by the brush are colored differently from other points to make them stand out. Linking is the ability to select data in one display and see the same data selected in another display. This is useful when multiple methods of visualization are being used in conjunction, as is the case in XmdvTool. By default, XmdvTool provides brush linking between all views.
4.3.2
Query Specification in Pixel Oriented Displays
Brushes in XmdvTool correspond to N-dimensional range queries for databases. The data objects that are within the query region form the result of the query. In most cases, the number of results cannot be determined a priori; the resulting data set may be quite large, or it may even be empty. The results are visually represented in other displays of XmdvTool as differently colored points to make them stand out. This would correspond to large number of red colored polylines in the parallel coordinate display for large results and to none for empty results. To give the user more feedback, not only the data objects that are within the query region are presented in pixel oriented display, but also those that are “close” to the query region and only approximately fulfill the query. The approximate results are determined by using distance functions for each of the dimensions which are combined into the overall distance. The distance functions are datatype and application dependent and must be provided by the application. Examples of distance functions are the numerical difference (for metric types), distance matrices (for ordinal and nominal types), lexicographical, character- wise, substring or phonetic difference (for strings), and so on. The distance function yields distance tuples (d1
i, d2i, d3i, ..., dki) that denote
each of the dimensions, the distances are combined into the overall distance dk+1 i to
the query region. The overall distance dk+1
i would be close to 0 for objects that are
within the query region and increasing for the other objects.
The combination of distances for the different dimensions can be a problem because the distances for the different dimensions now have to be considered with respect to the distances of the other dimensions and the combined distance must be defined and meaningful globally. One problem is that the values calculated by the distance functions may be in completely different orders of magnitude. A second problem is that the relative importance of the multiple dimensions is highly user and query dependent. The first problem can be solved by a normalization of the distances. A simple normalization may be defined as a linear transformation of the range [0, dmax] (or [dmin, dmax] in case of bi-directional distances) for each dimension
to a fixed range (e.g. [0, 255]). The second problem can only be solved by user interaction since only the user is able to determine the priority of the dimensions. Therefore, in general, it is necessary to obtain weighting factors (w1, w2, w3, ..., wk+1)
representing the order of importance of the dimensions assigned by the user. This is somehow similar to techniques used in information retrieval that have been proposed to allow a ranking of the resulting data according to its relevance for the query.
With this kind of pixel oriented display in the XmdvTool system, the query specification process would be much easier. In the beginning, the users will see the default N-dimensional range query where the extents of each dimension is set to the mid 50% of its complete range. Then, guided by the visual feedback, they may interactively change the query according to their impression of the visualized results. In exploring very large databases, the visualization of results coupled with the possibility to incrementally refine the query are an effective way of finding the interesting properties of the data. The key idea of the system is to use the phenom-
enal abilities of the human vision system, which is able to analyze small to mid size amounts of data very efficiently and recognizes immediately patterns in images that would be very difficult (in some cases even impossible) if done by a computer.