The user study presented in this chapter evaluates the effectiveness of SDDS for tasks performed on multivariate 3D data sets similar to those used by radiologists studying MR spectroscopy. Effectiveness has been defined as how well viewers could perform two tasks designed to address the original visualization goals: value estimation and relationship identification. The study compared SDDS to superquadric glyphs, an alternative spatial multivariate 3D visualization technique, for context and comparison. Compared to the superquadric condition, participants were significantly more accurate and faster estimating data values and identifying positive correlations with SDDS visualizations. The contrast between the two visualization techniques was particularly dramatic for the correlation identification task, with participants responding nearly twice as quickly and more than twice as accurately.
The study did not necessarily test the ideal set of shape-varying glyph properties or the perfect dynamic range for those properties, nor does it prove that SDDS is always better for these tasks. However, the results do highlight an important effect of choosing variable channels that are not perceptually equivalent. The most common explanation given by participants for why they preferred SDDS to superquadrics was that the superquadric roundness properties were much more difficult to interpret than thickness or color, which was the property that participants understood the most accurately. SDDS uses color in a more perceptually uniform nominal color encoding for different variables. Because humans perceive color preattentively, distinguishing between two colors is generally easier than distinguishing between color variation and shape variation.
From a visualization design perspective, this difference in perceptibility between variable channels results in an emphasis on one variable over another. The difference is unavoidable in the case of techniques like superquadrics: separate channels such as shape and color have different perceptual saliences. Having perceptually different channels in a single glyph is not necessarily a bad decision in general. The freedom to manipulate the dynamic range of particular variables that results from using perceptually different channels is useful if one variable is more important than another for the viewer. That said, a study of the parameter space of shape-varying glyphs is necessary to customize dynamic range accurately. The need to perform such a study may prohibit the use of significantly different variable channels in a visualization like the superquadrics described in this work. Note also that, for the glyphs chosen for this study, SDDS always worked at least as well as the strongest channels in the superquadric glyphs.
One point of minor concern is how exactly viewers perceive the size of spherical glyphs. Acevedo and Laidlaw have studied this problem for circular glyphs and indicate that viewers perceive size by the square root of the circle’s radius rather than the radius itself (Acevedo and Laidlaw, 2006). Along the same lines, the perception of size in glyphs is also affected by local size contrast. If a glyph is surrounded by many larger glyphs, it appears to be smaller than a similar glyph surrounded by smaller glyphs (Ware, 2000). Studying how size is perceived for spherical glyphs, which now have a volume component, is a potentially interesting avenue of future research.
Glyph-based techniques like SDDS apply best to data sets with slow spatial variation, as is the case with the MR spectroscopy data set. Applied to properly filtered high-frequency data sets, SDDS will reveal low-frequency trends. SDDS should also work well when visualizing sub-regions of high-frequency data.
The SDDS visualization enables viewers to explore and analyze relationships between multivariate volume scalar fields. SDDS is the first known multivariate scalar volume visualization technique that can potentially scale to 11 simultaneous display channels. The complexity of the SDDS visualization makes adding uncertainty extremely difficult. The next chapter will discuss a complementary view that attempts to solve this problem.
CHAPTER 3
Multivariate Uncertainty Visualization in Abstract Plots
(The contents of this chapter were presented in the proceedings at the Conference on Information Visualization 2010 and the Conference on Visualization and Data Analysis 2010 (Feng et al., 2010a; Feng et al., 2010b))
Data uncertainty can have a critical impact on what can be properly inferred from a data set. As an example, consider a simple election data set consisting of the percentage of votes cast for a set of candidates in a region:
Name     East   West   North   South
Betsy    60     30     15      50
Frank    30     10     15      30
John     10     20     20      10
Nancy    10     40     50      10
Table 3.1: An example election data set (percentage of votes cast), used to illustrate the problems of hidden uncertainty.
By looking at this table, the casual observer might notice a few interesting facts. First, Betsy carried the east and south regions, while Nancy carried the west and north regions. Second, candidate scores in the east and south are negatively correlated with candidate scores in the north and west. However, what if these are only preliminary results, with only 50% of the votes tallied? In this case, these are only projected results. They will be reported with error bars, for example ±20%, since so few of the votes have been counted. In this context the problem is clear: with such large error bars, conclusions about trends and winners are unreliable. This example is based on simple statistics: if two distributions overlap significantly, one cannot confidently argue that they represent different populations. Bar graphs of the data in Table 3.1, with and without error bars, are shown in Figure 3.1. From a visualization perspective, overlapping error bars signal that the difference between two means may not be meaningful, but as data sets grow in size and complexity, encoding uncertainty into visualizations becomes more challenging.
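The overlap argument can be checked directly against Table 3.1. The following is a minimal sketch, not part of the original study: it assumes that the ±20% error bar is a symmetric margin in percentage points, and the helper names (interval, overlaps, leader_is_clear) are hypothetical.

```python
# Vote percentages from Table 3.1 (one list per candidate, one entry per region).
REGIONS = ["East", "West", "North", "South"]
VOTES = {
    "Betsy": [60, 30, 15, 50],
    "Frank": [30, 10, 15, 30],
    "John": [10, 20, 20, 10],
    "Nancy": [10, 40, 50, 10],
}


def interval(value, margin):
    """Return the (low, high) error bar around a reported value."""
    return (value - margin, value + margin)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def leader_is_clear(region_index, margin):
    """True if the leader's error bar does not overlap the runner-up's."""
    scores = sorted((v[region_index] for v in VOTES.values()), reverse=True)
    leader, runner_up = scores[0], scores[1]
    return not overlaps(interval(leader, margin), interval(runner_up, margin))


# Assumption: margin is in percentage points, matching the +/-20% example.
for i, region in enumerate(REGIONS):
    print(region, "clear winner:", leader_is_clear(i, margin=20))
```

Under this assumed margin, no region has a leader whose error bar is separated from the runner-up's, which matches the conclusion drawn from the overlapping bars in Figure 3.1.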
This chapter describes a set of visualization techniques for the display of multivariate data with statistical estimations of data value error. The techniques leverage known characteristics of visual perception so that viewers can preattentively identify patterns in trustworthy values while discouraging viewers from making false inferences based on uncertain values. The goal is not simply to let viewers distinguish certain values from uncertain values, but to disengage the visual system’s preattentive feature detection mechanisms for uncertain values. This ensures that a data value’s visual saliency grows with its certainty.
The primary contributions of this chapter are novel augmentations of the scatter plot and parallel coordinates plot. These plots are augmented to incorporate uncertainty using density plots. More formally, the plots use kernel density estimation (KDE) to approximate the probability density function (PDF) of the underlying data. The PDF describes the likelihood of each value, so it naturally de-emphasizes unreliable data points in the original data set. One prerequisite to using KDE is that data point uncertainty must be quantified using statistical distributions. For example, if the data value can fall anywhere within a range of values, this uncertainty can be modeled using a uniform distribution.
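To make the idea concrete, the sketch below builds a one-dimensional density estimate in which every data point contributes its own uncertainty distribution as a kernel. It is an illustration under stated assumptions rather than the implementation used in this chapter: the data values, the choice of Gaussian and uniform kernels, and the function names are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """PDF of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, low, high):
    """PDF of a value known only to lie somewhere in [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def density(x, kernels):
    """Average the per-point PDFs at x; the result integrates to one."""
    return sum(kernel(x) for kernel in kernels) / len(kernels)


# Hypothetical data: one kernel per data point, shaped by that point's uncertainty.
kernels = [
    lambda x: gaussian_pdf(x, 2.0, 0.1),  # confident measurement
    lambda x: gaussian_pdf(x, 5.0, 1.5),  # uncertain measurement
    lambda x: uniform_pdf(x, 7.0, 9.0),   # value known only as a range
]

print("density at the confident point:", density(2.0, kernels))
print("density at the uncertain point:", density(5.0, kernels))
```

Because each kernel has unit area, a confident point produces a tall, narrow peak while an uncertain point is spread thin, so the estimated density at the confident value is much higher even though both points carry equal weight.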
[Figure 3.1: two bar-graph panels, (a) Without Error Bars and (b) With Error Bars, each showing vote percentages (scale 0 to 80) for Betsy, Frank, John, and Nancy in the East, West, North, and South regions.]
Figure 3.1: A bar graph of the data in Table 3.1. Without error bars (above), the viewer may attempt to look for correlations in the results. Overlapping error bars (below) indicate that very little can be learned from the data.
Density plots are useful tools for summarizing extremely large data sets. Whereas normal plots become over-plotted with glyphs overlapping each other, PDFs always highlight regions of high density. This chapter demonstrates that density-based plots are useful for visualizing uncertain data sets, large or small. However, density plots have two noticeable problems: data values are no longer always individually identifiable and outliers are de-emphasized. I make two modifications to density plots to address these problems: first, scaling the distribution mean to a brighter intensity introduces a discrete, identifiable feature that fades in proportion to its uncertainty; second, a novel animated plot, called the probabilistic plot, cycles through PDF samples so that outliers draw the viewer’s eye by intermittently flickering in and out. Confident regions remain stable over time. When summed, these random samples aggregate into a histogram approximation of the fully integrated PDF.
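The sampling behavior of the probabilistic plot can be sketched in a few lines. This is a simplified one-dimensional illustration, not the rendering code behind the plots in this chapter: it assumes Gaussian uncertainty for every point, and the data values, bin layout, and function names are hypothetical.

```python
import random


def sample_frame(points, rng):
    """One animation frame: draw a single sample per (mean, std) data point."""
    return [rng.gauss(mean, std) for mean, std in points]


def accumulate(points, n_frames, low, high, n_bins, seed=0):
    """Sum the samples of many frames into a histogram over [low, high)."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for _ in range(n_frames):
        for value in sample_frame(points, rng):
            if low <= value < high:
                # Clamp to guard against floating-point rounding at the top edge.
                index = min(int((value - low) / width), n_bins - 1)
                counts[index] += 1
    return counts


# Hypothetical data: a confident point and an uncertain point.
points = [(2.25, 0.1), (7.25, 1.5)]
counts = accumulate(points, n_frames=1000, low=0.0, high=10.0, n_bins=20)
for i, count in enumerate(counts):
    print(f"[{i * 0.5:4.1f}, {(i + 1) * 0.5:4.1f}) {count}")
```

In this sketch the confident point lands in the same bin in nearly every frame, so it appears stable, while the uncertain point jumps between bins from frame to frame. Summing the frames yields a histogram that approximates the combined PDF.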
3.1 Background
The design of density plots for uncertain data builds on previous work in multivariate visualization, uncertainty visualization, scatter plots, parallel coordinates plots, and visual perception.