

5.4 Visual Analysis Framework

5.4.2 Typical Workflow of the Analysis Process

The visual analysis process follows the information-seeking mantra as defined by Shneiderman [152]: “overview first, zoom and filter, then details-on-demand”. At first, the analyst creates an appropriate global overview based on the landscape metaphor. This includes finding a suitable filter radius and thresholds to suppress structural noise with topological simplification. Afterwards, the analyst inspects the clustering and compares significant clusters based on the characteristics of the hills in the landscape. Local feature analysis is performed by linking subsets to standard techniques to learn more about the semantics of the data. The user loops through these steps several times during the analysis.

An exemplary workflow is as follows: The analyst constructs (or loads) the module graph as described above (cf. Figure 5.7). Initially, all graphic views as well as the filter radius controller and the simplification controller are empty (cf. Figures 5.8a-b). After loading a data set, the analyst activates the topology module and specifies the neighborhood graph type and sampling thresholds in the control panel. A suitable filter radius is then determined with the controller widget: First, the widget is initialized for the current data set by an arbitrary click in the controller. This evaluates the suitability for the two extreme cases, i.e., a value for σ that is too small and one that is too large (both determined from the pair-wise distances), which leads to a line perpendicular to the diagonal (cf. Figure 5.8c). The graph is then refined either automatically or manually. The evaluation for a single filter radius can be accelerated by using a sparse neighborhood graph and by using only a sample of the data without reinsertion of non-samples. Because the topological analysis has to be applied multiple times, this reduces the time needed to find a suitable σ. Once the approximate position of the local minimum has been found, a few more evaluations at higher accuracy refine the plot near this location. Alternatively, the graph could be precomputed automatically at higher accuracy if time is not a critical factor.

Figure 5.8: Different states of the controller widgets based on the image segmentation data set: (a)-(b) Initially, the filter radius plot and the three sliders of the simplification controller are empty. (c) After loading the input data, the suitability graph is initialized for the smallest and largest possible values of the filter radius σ. (d) When the merge tree changes, the three sliders in the simplification controller are initialized to show the value distribution of the branches for the unsimplified tree. (e) To find the local minimum, the suitability graph is refined automatically or manually for different values on the x-axis. (f) Dragging one of the sliders in the simplification controller updates the value distribution in all three sliders in real-time. The desired thresholds leave only prominent features with high property values. In the persistence diagram, these are the circles far away from the diagonal.

The filter radius controller supports two clicking modes: Right-clicking in the graph only refines the plot without updating the module’s output connector. This avoids consecutive updates of connected modules while the filter radius is being determined. Left-clicking the graph additionally updates the merge tree on the output connector of the topology module. That is, if the user left-clicks on the graph near the local minimum (cf. Figure 5.8e), the connected simplification controller automatically initializes the three sliders with the respective value distributions of the merge tree’s branch decomposition (cf. Figure 5.8d). The focus now switches to this controller to remove noisy features in real-time. The analyst adjusts the sliders so that only those branches with high values for persistence, size, or stability (cf. Figure 5.8f) remain. In the persistence diagram, these are the circles far away from the diagonal. While a slider is being dragged, the simplified merge tree is updated on the module’s output connector. This triggers an update of the connected visualization module(s). If the landscape has already been constructed, changing the slider in the controller or in the control panels highlights removal candidates by changing the color of these hills to red. When the user releases the slider, the profile is reconstructed for the simplified merge tree. The analyst can refine parameter settings by reading the landscape, e.g., by looking for noticeable plateaus or suspicious data glyph accumulations that suggest a smaller filter radius to split a cluster.
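The effect of the three sliders on the branch decomposition can be illustrated with a minimal Python sketch. It is not the framework's implementation: the `Branch` record, the function `split_by_thresholds`, and the example values are hypothetical. The sketch assumes that a branch is kept only if it reaches all three thresholds, and it ignores the parent-child relations between branches that a real merge tree simplification has to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """Hypothetical record for one branch of the branch decomposition."""
    name: str
    persistence: float
    size: int
    stability: float


def split_by_thresholds(branches, min_persistence=0.0, min_size=0, min_stability=0.0):
    """Return (kept, removal_candidates) for the given slider thresholds.

    Assumption: a branch is kept only if it reaches all three thresholds.
    Branches that fail at least one threshold are the removal candidates,
    i.e. the hills that would be highlighted in red.
    """
    kept, candidates = [], []
    for branch in branches:
        passes = (
            branch.persistence >= min_persistence
            and branch.size >= min_size
            and branch.stability >= min_stability
        )
        (kept if passes else candidates).append(branch)
    return kept, candidates


# Made-up example values, for illustration only.
branches = [
    Branch("hill A", persistence=0.90, size=120, stability=0.80),
    Branch("hill B", persistence=0.05, size=3, stability=0.10),
    Branch("hill C", persistence=0.40, size=10, stability=0.50),
]
kept, candidates = split_by_thresholds(branches, min_persistence=0.3, min_size=5)
print("kept:", [b.name for b in kept])
print("removal candidates:", [b.name for b in candidates])
```

Returning the removal candidates separately mirrors the two-stage interaction described above: candidates can be highlighted while a slider is dragged, and the profile is rebuilt from the kept branches only once the slider is released.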

The global clustering overview is stable and robust in that small adjustments of any parameter do not lead to significant changes in the landscape. That is, moderate changes of the filter radius or the simplification thresholds only add or remove some small hills without changing the profile’s overall structure. In our experiments, typically 7-10 refinements were necessary to find the local minimum of the plot and thus a suitable overview. Depending on the parameter setting, a single evaluation typically takes around one second for the data sets used in this thesis. However, evaluating the suitability of a given σ takes longer for larger data sets or disadvantageous parameter settings. Changing simplification thresholds to reveal or hide small features is typically fast and can be applied in real-time. Once an appropriate overview has been found, the local analysis phase starts with the identification of interesting features and their linking as described in Chapter 5.3. Linking selected subsets to another view and creating the PCA or the PCP is also fast and happens in real-time for medium-sized data sets. Still, projecting high-dimensional data can quickly become expensive depending on the applied projection technique and its implementation.
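The refinement of the suitability plot toward its local minimum, starting from the two extreme values of σ, can be sketched as follows. This is only one possible heuristic under stated assumptions and not the refinement strategy of the framework: the function `refine_minimum`, its parameters, and the toy suitability function are hypothetical, and the callable `suitability` stands in for one full evaluation of the topological analysis at a given filter radius.

```python
def refine_minimum(suitability, sigma_min, sigma_max, max_evaluations=10):
    """Sample `suitability` adaptively and return (best_sigma, samples).

    The plot is initialized with the two extreme filter radii. Each further
    step evaluates the midpoint of the wider interval next to the currently
    best sample. At least the two extremes are always evaluated.
    """
    if not sigma_min < sigma_max:
        raise ValueError("sigma_min must be smaller than sigma_max")
    samples = {sigma_min: suitability(sigma_min), sigma_max: suitability(sigma_max)}
    while len(samples) < max_evaluations:
        sigmas = sorted(samples)
        i = min(range(len(sigmas)), key=lambda k: samples[sigmas[k]])
        # Candidate intervals to the left and right of the current best sample.
        intervals = []
        if i > 0:
            intervals.append((sigmas[i - 1], sigmas[i]))
        if i < len(sigmas) - 1:
            intervals.append((sigmas[i], sigmas[i + 1]))
        # Refine the wider neighboring interval at its midpoint.
        lo, hi = max(intervals, key=lambda iv: iv[1] - iv[0])
        mid = 0.5 * (lo + hi)
        if mid in samples:  # interval can no longer be split in floating point
            break
        samples[mid] = suitability(mid)
    best = min(samples, key=samples.get)
    return best, samples


# Toy stand-in for the suitability measure with a single minimum at sigma = 3.
def toy_suitability(sigma):
    return (sigma - 3.0) ** 2


best_sigma, samples = refine_minimum(toy_suitability, 0.0, 10.0, max_evaluations=10)
print("evaluations:", len(samples), "best sigma:", best_sigma)
```

The evaluation budget is an explicit parameter because each evaluation is comparatively expensive. The heuristic is not guaranteed to find the minimum of a plot with several local minima, which is one reason why manual refinement remains useful.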

5.5 Examples and Results

In this example section, we focus on the interactive analysis process and on local feature inspection. We revisit some of the real-world data sets from the result section in Chapter 4.5 about the global overview and explore features individually. The aim is to achieve a better understanding of the topological view on the data, to reveal features that are invisible in standard techniques, and to demonstrate how the structural view and geometric details complement each other for in-depth analysis of high-dimensional clusterings. After reviewing data analysis aspects, we also consider advantages of the framework for unclassified data.