Multidimensional Data Visualisation - Suitable Visualisation Techniques for Design Optimisation

2 Literature Review

2.3. Suitable Visualisation Techniques for Design Optimisation

2.3.1. Multidimensional Data Visualisation

Among the multidimensional visualisation methods, scatter plot matrices, parallel coordinates plots and self-organising maps are widely used in MOO because of their ability to represent large multidimensional datasets. A synopsis of their main features follows.

Parallel Coordinates Plot (PCP)

Because of their effectiveness in simultaneously displaying high-dimensional datasets on a simple two-dimensional plot, parallel coordinate plots provide both a global vision of the entire data at hand and a tool to perform a local and accurate data examination by visualising only the axes and/or the samples of interest. The key concept is displayed in Figure 4.

          X1   X2   X3
Point A    1    1    1
Point B    1    5    5
Point C    8    2    7
Point D    9    9    9

Figure 4. Representation of four three-dimensional points through the Cartesian-coordinate system and a parallel coordinate plot. The values of each point in each dimension are specified in the table above.

Furthermore, the parallel coordinate plot allows data analysis to be conducted by identifying one-dimensional features (e.g., marginal densities), two-dimensional features (e.g., correlations), and multidimensional features (e.g., clustering) [120].
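The mapping illustrated in Figure 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used to produce the figure: it takes the four points from the table and turns each one into a polyline with one vertex per axis. Scaling each axis independently to the range [0, 1] is an assumption made here, and the names `to_polylines`, `AXES` and `POINTS` are hypothetical.

```python
# Points and axis names taken from the table accompanying Figure 4.
AXES = ["X1", "X2", "X3"]
POINTS = {
    "Point A": [1, 1, 1],
    "Point B": [1, 5, 5],
    "Point C": [8, 2, 7],
    "Point D": [9, 9, 9],
}


def to_polylines(points, n_axes):
    """Map each n-dimensional point to a polyline of (axis position, scaled value)."""
    # Per-axis minimum and maximum, used to scale every axis to [0, 1].
    # Assumption: every axis has a non-zero range, otherwise the division fails.
    lows = [min(p[i] for p in points.values()) for i in range(n_axes)]
    highs = [max(p[i] for p in points.values()) for i in range(n_axes)]
    polylines = {}
    for name, values in points.items():
        polylines[name] = [
            (i, (values[i] - lows[i]) / (highs[i] - lows[i]))
            for i in range(n_axes)
        ]
    return polylines


polylines = to_polylines(POINTS, len(AXES))
for name, vertices in polylines.items():
    print(name, vertices)
```

Each point becomes a line crossing all the parallel axes, so Point A runs along the bottom of the plot and Point D along the top, while Points B and C cross each other between X1 and X2.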

Nonetheless, there are some shortcomings which must be kept in mind. Firstly, although all the dimensions are visualised simultaneously in the plot, the entire space is not represented: since the axes are plotted side by side, the i-th dimension is linked to at most two other dimensions. Therefore, in an n-dimensional problem no information is visualised about the relationships between the i-th axis and the other (n - 3) axes which are not adjacent to it. This shortcoming can be mitigated by performing multiple permutations of the axes, which gives insight into the problem at hand through different perspectives of the same input dataset.
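The adjacency limitation can be made concrete with a short sketch that lists the axis pairs never shown side by side for a given set of axis orderings. The function names and the four-axis example are hypothetical, and the second ordering is simply one permutation that happens to cover the remaining pairs, not a prescribed scheme.

```python
from itertools import combinations


def adjacent_pairs(order):
    """Return the set of axis pairs that sit side by side in one axis ordering."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def missing_pairs(axes, orderings):
    """Return the axis pairs that are never adjacent in any of the given orderings."""
    all_pairs = {frozenset(pair) for pair in combinations(axes, 2)}
    shown = set()
    for order in orderings:
        shown |= adjacent_pairs(order)
    return all_pairs - shown


axes = ["X1", "X2", "X3", "X4"]
# A single plot leaves some pairwise relationships without a direct link.
single = missing_pairs(axes, [axes])
# Adding one permuted plot covers the pairs missed by the first ordering.
both = missing_pairs(axes, [axes, ["X2", "X4", "X1", "X3"]])
print(len(single), len(both))
```

With four axes, the single ordering leaves three of the six pairs unlinked, and the added permutation displays exactly those three.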

A further deficiency is the unfriendly and unfamiliar nature of this visualisation technique, in contrast with our familiarity with the Cartesian-coordinate system. Consequently, the analysis of large multidimensional datasets may prove excessively complex and onerous for designers who are accustomed to traditional visualisation tools. However, such a drawback can be greatly reduced by means of a suitably intuitive and user-friendly interface, which still exploits the capabilities of parallel coordinate plots in visualising multidimensional data. In this context, McDonnell and Mueller [74] propose the Illustrative Parallel Coordinates (IPC). This is a set of rendering techniques (edge-bundling, branched clusters, silhouettes, shadows, halos, faded histograms within clusters and density plots) aimed at augmenting and improving the graphical aspects of parallel coordinate plots so that as much information as possible can be conveyed to non-expert data analysts as well.

In an attempt to achieve the same goal, other approaches are based on the integration of parallel coordinate plots with other visualisation techniques. Wegman and Luo [121], for example, suggest a coupling with the grand tour technique to allow the user to explore datasets which are both high-dimensional and massive in size.

Scatter Plot Matrix (SPM)

Scatter plots are well suited for discovering or checking correlations between two variables. Scatter plot matrices can be obtained by applying the same concept to every pair of the variables contained in multidimensional datasets. The systematic format of the resulting visualisation technique allows the user to compare all the dimensions at hand with respect to each other in a simple and immediate fashion by moving along a single row/column of a matrix of bivariate graphs [50], as shown in Figure 5.

Figure 5. Scatter plot matrix displaying a satellite design dataset [111].

The main limitation is of a practical nature and arises when visualising datasets with a large number of dimensions. In this case, the analysis of the individual scatter plots may be significantly complicated by their number and small size, especially if the plot is displayed on a computer monitor. Therefore, scatter plot matrices are advisable for the visualisation of datasets containing at most 8-10 variables. This limitation can be partially addressed by integrating the half-matrix version of scatter plots with an interactive interface which allows the user to steer data analysis and to select the information to be displayed on the graph [111][126].
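A short count shows how quickly the number of panels grows with the number of variables. The sketch below assumes a full matrix of n x n panels (diagonal included) and a half matrix holding one panel per unordered pair of variables; the function name `panel_counts` is hypothetical.

```python
from itertools import combinations


def panel_counts(n_variables):
    """Return (panels in the full matrix, panels in the half matrix)."""
    # Full matrix: one panel per ordered pair, diagonal included.
    full = n_variables * n_variables
    # Half matrix: one panel per unordered pair of distinct variables.
    half = len(list(combinations(range(n_variables), 2)))
    return full, half


for n in (4, 8, 10, 20):
    full, half = panel_counts(n)
    print(f"{n} variables: {full} panels (full), {half} panels (half)")
```

At ten variables the full matrix already holds 100 panels and the half matrix 45, which is consistent with the practical limit of 8-10 variables mentioned above.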

Self-Organising Map (SOM)

Self-organising maps are an efficient technique for visualising multidimensional data [63]. Through unsupervised learning, the cells within the maps are organised to best describe the set of input data samples, which allows a high-dimensional space to be projected onto two-dimensional component maps. Consequently, the main strength of SOMs lies in identifying data similarities and in clustering [25][46]. An example is provided in Figure 6 for the same dataset considered in Appendix A.

Figure 6. Representation of the same dataset considered in Appendix A by means of self-organising maps.

The basic idea is that, through a learning process, the map is organised in such a way that cells close to each other represent inputs having similar features. The representation of any dataset is thus obtained via a set of two-dimensional plots, as many as the dimensions of the problem at hand. Each data sample is represented by a cell, which always has the same position within all the plots. Each self-organising map is associated with a particular dimension, and the values of its cells are encoded according to the colour bar located beside the map.
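The learning process and the resulting per-dimension maps can be sketched as follows. This is a minimal, generic SOM written for illustration only; it is not the implementation behind Figure 6. The toy data, the 3 x 3 grid, the linear decay schedules and all function names are assumptions.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return the grid position of the cell whose weight vector is closest to the sample."""
    return min(
        weights,
        key=lambda cell: sum((w - x) ** 2 for w, x in zip(weights[cell], sample)),
    )


def train_som(data, rows, cols, epochs=40, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise each cell with a copy of a randomly chosen input sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for sample in order:
            progress = step / total_steps
            lr = lr0 * (1.0 - progress)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - progress), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, sample)
            for cell, w in weights.items():
                # Cells near the winning cell on the grid are pulled more strongly.
                grid_d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (sample[k] - w[k])
            step += 1
    return weights


def component_planes(weights, rows, cols, n_dims):
    """Build one rows x cols grid per dimension; a cell keeps its position in every grid."""
    return [
        [[weights[(r, c)][k] for c in range(cols)] for r in range(rows)]
        for k in range(n_dims)
    ]


# Hypothetical toy data: two well-separated groups of three-dimensional samples.
low = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1], [0.1, 0.1, 0.0]]
high = [[1.0, 0.9, 1.0], [0.9, 1.0, 0.9], [1.0, 1.0, 0.9], [0.9, 0.9, 1.0]]
data = low + high

som = train_som(data, rows=3, cols=3)
planes = component_planes(som, 3, 3, n_dims=3)
bmu_low = best_matching_unit(som, [0.0, 0.0, 0.0])
bmu_high = best_matching_unit(som, [1.0, 1.0, 1.0])
print(bmu_low, bmu_high)
```

Each entry of `planes` plays the role of one colour-coded map: reading the same grid position across all three planes gives the full weight vector of that cell, and the two groups of samples end up represented by different cells.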

As regards the dimensionality of the data samples, SOMs are not suitable for visualising high-dimensional datasets containing more than about ten variables. When dealing with a large number of variables, either the numerical analysis of the individual maps deteriorates significantly because of their number and reduced size, or the global perspective of the problem under study is compromised (it may be impossible to visualise all the maps simultaneously on the same sheet or screen), which may hamper the identification of data clusters and relationships.

With respect to the number of input samples, the main limitation is related to the number of cells contained in the maps: the more input samples are taken into account, the larger the number of map units to consider. The practical limit on dataset size therefore depends on the available computer processing capability: increasing the number of input samples makes the learning process more demanding, and consequently a longer time is required [118].
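The growth in training effort can be illustrated with a rough operation count. The sketch assumes that every cell is updated for every sample in every epoch, and it sizes the map with a rule of thumb of about 5 * sqrt(N) cells for N samples. That sizing rule is an assumption adopted purely for illustration and is not taken from the source; both function names are hypothetical.

```python
import math


def map_units(n_samples):
    """Hypothetical sizing rule (about 5 * sqrt(N) cells); an assumption, not from the source."""
    return math.ceil(5 * math.sqrt(n_samples))


def weight_updates(n_samples, n_dims, epochs=1):
    """Count scalar weight updates when every cell is updated for every sample."""
    return epochs * n_samples * map_units(n_samples) * n_dims


for n in (100, 400, 1600):
    print(n, map_units(n), weight_updates(n, n_dims=3))
```

Under these assumptions, quadrupling the number of samples multiplies the number of weight updates per epoch by eight, since both the sample count and the map size increase.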

In document Design exploration for engineering design optimisation : an aircraft conceptual perspective (Page 37-41)