3.3 Visualization for multidimensional variables in parallel and perpendicular plots
3.3.4 Comparative discussion
The preceding discussion of parallel and perpendicular plots allows the features of the two types of plot to be summarized, as shown in Table 3.1. The two plots share most features, with two exceptions: optimization of space (feature 4), which favours the parallel plot, and perceptibility (feature 5), which favours the perpendicular plot.
Apart from these exceptions, the advantages of the two plots are roughly balanced, and this has a design implication. Choosing either technique is likely to give similar outcomes in terms of the effectiveness and efficiency of the system, so the usage context may be what decides which plot to use. For example, Siirtola, Laivo, Heimonen, & Raiha (2009) and Henley, Hagen, & Bergeron (2007) found that the parallel plot led to a more accurate interpretation of data. In contrast, Li, Martens, & Wijk (2008) found that the scatter plot was superior for judging correlation.
Table 3.1: Summary of comparison of features in parallel and perpendicular plots
No. | Feature | Parallel plot | Perpendicular plot
1 | Visualize multidimensional attributes | Yes | Yes
2 | Attribute value representation | Lines joining parallel axes | Points within perpendicular axes
3 | Differentiating attribute sets | Colour | Colour, symbols and shapes
4 | Number of charts generated | 1 | m(m-1)/2, where m = number of attributes
5 | Graphical perception (perceptibility) | Not good | Good
6 | Targeted users | Novice to experts | Novice to experts
7 | Pattern interpretation | Correlation, cluster | Correlation, cluster
8 | Application | Wide | Wide
9 | Type of visualization | Geometry-based | Pixel-based
10 | Data cluttering | Avoidable via shading and brushing | Avoidable via multiple charts
11 | Evidence of usability performance | Yes | Yes
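Feature 4 in Table 3.1 can be illustrated with a short Python sketch. It counts the charts each approach needs for m attributes. A parallel plot needs a single chart, while a set of scatter plots needs one chart per unordered pair of attributes, which gives m(m-1)/2. The function names are hypothetical. The count assumes a scatter plot matrix that omits the diagonal and the mirrored half.

```python
from itertools import combinations


def scatter_plot_pairs(attributes):
    """Return every unordered pair of attributes.

    Each pair corresponds to one scatter plot (perpendicular axes),
    ignoring the diagonal and the mirrored half of a full matrix.
    """
    return list(combinations(attributes, 2))


def chart_count(m):
    """Charts needed for m attributes: one parallel plot vs m(m-1)/2 scatter plots."""
    return {"parallel": 1, "perpendicular": m * (m - 1) // 2}


# Example: the four numeric attributes of the Iris Flower data set.
iris_attributes = ["sepal length", "sepal width", "petal length", "petal width"]
pairs = scatter_plot_pairs(iris_attributes)
print(len(pairs))       # 6
print(chart_count(4))   # {'parallel': 1, 'perpendicular': 6}
```

The quadratic growth is the reason feature 4 favours the parallel plot. For example, 13 attributes already need 78 pairwise scatter plots but still only one parallel plot.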
Grinstein, Hoffman, & Pickett (2002) also discussed, in a lengthy article, how to set up a standard benchmark for evaluating visualization for data mining. To do so, they compared five visualization techniques on ten publicly available data sets: Simple Seven, Balloons, Contact Lenses, Shuttle O-rings, Monks problems, Iris Flower, Congress Voting, Liver Disorders, Cars and Wine. The five techniques were parallel coordinates, scatter plot matrix, survey plot, circle segments and RadViz.
They proposed several criteria for testing visualization techniques:
- scalability, in terms of data processing time;
- ease of integrating domain knowledge;
- ease of classification;
- handling of incorrect data;
- high dimensionality;
- query functionality;
- summarization of results.

However, their final comparison was based on the features exhibited by the data sets and on how efficiently these features could be visualized with each of the five techniques.
For the purposes of this research, the discussion focuses on parallel coordinates and the scatter plot matrix. Table 3.2 presents the conclusions of Grinstein, Hoffman, & Pickett (2002) for these two techniques. Yp stands for "yes" in the parallel coordinate view and Ys for "yes" in the scatter plot matrix. An empty cell means that the task is not applicable.
Table 3.2: Scatter plot and parallel coordinate comparison
Tasks: See outliers; See clusters; Find class cluster; See all important features; See some important features; See possible rule/model; See exact rule/model
No. | Data set | Marked cells
1 | Simple Seven | (none)
2 | Balloons | (none)
3 | Contact Lenses | (none)
4 | Shuttle O-rings | Ys Yp, Ys Yp, Ys Yp, Ys Yp, Ys Yp, Ys Yp
5 | Monks problems | Yp, Yp, Yp, Yp, Yp
6 | Iris Flower | Ys Yp, Ys Yp, Ys Yp, Yp, Ys Yp, Ys Yp
7 | Congress Voting | (none)
8 | Liver Disorders | Ys Yp, Ys Yp
9 | Cars | Ys Yp, Ys Yp, Ys Yp, Ys Yp, Ys Yp
10 | Wine | Ys Yp, Ys Yp, Ys Yp, Ys Yp, Ys Yp
Source: Grinstein, Hoffman, & Pickett (2002).
Table 3.2 shows that parallel coordinates and the scatter plot matrix are roughly on a par in supporting the listed tasks. However, parallel coordinates can be used to see all important features in the Iris Flower data set, which is not possible with the scatter plot matrix. For the Monks problems data set, only parallel coordinates are marked, not the scatter plot. Across the tested data sets, parallel coordinates cover all seven listed tasks, from seeing outliers to seeing the exact rule/model, whereas the scatter plot matrix covers only six.
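The "seven tasks versus six tasks" totals are a union over data sets. A task counts as covered by a technique if it is marked for that technique in at least one data set. A minimal Python sketch of that counting is shown below. The helper `tasks_covered` is hypothetical. The rows in `example` are invented for illustration and are not the entries of Table 3.2.

```python
# The seven task columns of Table 3.2.
TASKS = [
    "See outliers", "See clusters", "Find class cluster",
    "See all important features", "See some important features",
    "See possible rule/model", "See exact rule/model",
]


def tasks_covered(table, technique):
    """Return the set of tasks marked for `technique` ('Yp' or 'Ys') in any data set."""
    covered = set()
    for marks in table.values():
        for task, cell in marks.items():
            if technique in cell:
                covered.add(task)
    return covered


# Hypothetical rows for illustration only; NOT the values of Table 3.2.
example = {
    "data set A": {"See outliers": {"Ys", "Yp"}, "See all important features": {"Yp"}},
    "data set B": {"See clusters": {"Ys", "Yp"}, "See exact rule/model": {"Yp"}},
}

for tech in ("Yp", "Ys"):
    found = tasks_covered(example, tech)
    print(tech, f"{len(found)}/{len(TASKS)}")  # Yp 4/7, then Ys 2/7
```

Applied to the real rows of Table 3.2, the same union would give the reported totals for the parallel coordinate view and the scatter plot matrix.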
3.4 Summary
This chapter first discussed word visualization, which concerns illustrating words through drawings and sketches that convey word meaning and word structure. The next discussion covered information visualization based on quantity and connectivity. For quantity, scatter plots and bar charts have been used to display frequency values. For connectivity, hyperbolic views, distorted views and network displays have been used to display positional information. Galaxy and parallel plots combine both aspects. Information visualization for multidimensional data was then discussed in terms of parallel and perpendicular plots. Both the scatter plot and the parallel plot can support most tasks, such as seeing outliers, clusters and important features, with the parallel plot supporting slightly more of them.