Multi-Aspect Pattern Presentation - Algorithms, applications and systems towards interpretable

The key to interpretability is to present results in understandable terms. This section surveys existing work on ways to present the results of multi-aspect mining. We divide the literature into two categories: work that presents results statically, as in most publications, and work that provides interactive support for pattern presentation, as mostly seen in the visualization field.

2.5.1 Pattern Presentation in Literature

Existing work communicates multi-aspect patterns to readers in different ways. We call this static pattern presentation because the papers give no indication that an interactive system was developed to aid the discovery process. We therefore assume that the authors of these studies went through a process of selecting results to showcase their models, and that they explored and interpreted all patterns in the same way as they present them in the paper. Since pattern presentation from multi-aspect data usually involves several graphics, one for each descriptor, we categorize the displays into two groups based on how authors arrange the presentations of different descriptors: adjacent and isolated alignments.

A few studies examine the results from the perspective of individual factor matrices (isolated alignment). Rather than going through each pattern as a whole, they look at the columns of each descriptor separately [29,64,172,251,257]. For example, Chen et al. [29] analyze speed patterns via tensor decomposition on a network traffic speed dataset. The authors show the results one factor matrix at a time instead of organizing them by pattern. This makes sense because the purpose of the work is to identify interpretable traffic patterns under varying levels of missing values. With such an organization, the authors can display how the resulting factor matrix varies with the training data used. However, it can be challenging to comprehend a single pattern, because the visual explanation of one pattern is split across several figures. Similarly, Gauvin et al. [64] present the results organized by factor matrix instead of by pattern. This makes it easier to see the differences between components within each factor matrix. The trade-off is that this organization cannot explain a pattern as a whole.

More often, authors use adjacent alignments of descriptors (e.g., [5,13,56,57,61,168,190,211,233]). In practice, authors use a dedicated graphic for each descriptor, and the graphics of all descriptors associated with a pattern are positioned side by side to deliver a comprehensive set of perspectives on the pattern. In this way, each graphic provides a complementary explanation of the pattern, and interpreting it involves walking through each figure to reach a complete understanding. For example, Williams et al. [233] propose TCA (Tensor Component Analysis) to discover latent components from a three-dimensional tensor of neuron × temporal × trial based on a simulated neuron activity dataset. To demonstrate the results, they show eight noteworthy components from a 15-component model, where each component consists of three graphics, from left to right, for the neuron, temporal, and across-trial descriptors, respectively. In another example, Fan et al. [57] propose citySpectrum to model city dynamics with a four-dimensional tensor of hour × day × region × POI, based on a mobile GPS dataset. Similarly, they present each descriptor with a dedicated figure and show two interesting patterns related to “entertaining” and “commercial” life patterns in their results.
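The difference between the two alignments can be illustrated with a minimal sketch. It assumes a rank-R factorization has already produced one factor matrix per descriptor, with column r of every matrix belonging to pattern r. The function names (make_factors, isolated_layout, adjacent_layout), the toy tensor shape, and the random data are hypothetical and are not taken from any of the surveyed papers.

```python
import numpy as np

def make_factors(shape, rank, seed=0):
    """Return one random non-negative factor matrix per mode (toy stand-in
    for the output of a tensor factorization; not real results)."""
    rng = np.random.default_rng(seed)
    return [rng.random((dim, rank)) for dim in shape]

def isolated_layout(factors, mode_names):
    """Group by descriptor: one figure per mode, holding all R columns."""
    return {name: [f[:, r] for r in range(f.shape[1])]
            for name, f in zip(mode_names, factors)}

def adjacent_layout(factors, mode_names):
    """Group by pattern: one row per component, with one panel per mode
    placed side by side."""
    rank = factors[0].shape[1]
    return [{name: f[:, r] for name, f in zip(mode_names, factors)}
            for r in range(rank)]

# Hypothetical neuron x temporal x trial example with four components.
mode_names = ["neuron", "temporal", "trial"]
factors = make_factors(shape=(50, 100, 30), rank=4)

by_descriptor = isolated_layout(factors, mode_names)  # 3 figures, 4 curves each
by_pattern = adjacent_layout(factors, mode_names)     # 4 rows, 3 panels each
```

Both layouts contain the same vectors. The isolated layout makes it easy to compare components within one descriptor, while the adjacent layout keeps everything about one pattern in a single row.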

2.5.2 Interactive Pattern Discovery

More recently, researchers have become concerned with the interpretability of pattern discovery from multi-aspect data and have attempted to ease the process using visual analytic systems [26,125,240]. Viola [26] is among the few efforts to interactively present patterns for anomaly detection in traffic data. It is a tensor-based anomaly analysis algorithm with a visualization and interaction design that dynamically produces interpretable data summaries and allows domain experts to rank anomalous patterns. Compared to the existing practice of visualizing results from multi-aspect mining, Viola introduces an interactive pattern exploration mechanism.

TPFlow by Liu et al. [125] uses a piecewise rank-one tensor decomposition algorithm to automatically slice the data into homogeneous partitions and extract the latent patterns in each partition. Compared to Viola, TPFlow has the advantage of helping users understand the entire pattern space, as a result of its progressive partitioning framework. Yan et al. [240] provide a visual analytic system for pattern discovery in bike-sharing data, which introduces a pattern relation view to describe the relations between patterns. The pattern relation view helps users browse patterns quickly and find interesting ones.
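To make the idea of partitioning with rank-one models concrete, the following is a minimal sketch and not TPFlow's actual algorithm. It assumes a non-negative three-way tensor, fits a rank-one model by simple alternating updates, and greedily searches for a single cut along one mode that minimizes the total squared residual of the two resulting partitions. The function names (rank_one_fit, residual, best_split) and the toy data are hypothetical.

```python
import numpy as np

def rank_one_fit(tensor, n_iter=50):
    """Approximate a 3-way tensor by an outer product a x b x c using
    alternating updates. a and b are kept at unit norm; c carries the scale."""
    _, J, K = tensor.shape
    b = np.ones(J) / np.sqrt(J)
    c = np.ones(K) / np.sqrt(K)
    for _ in range(n_iter):
        a = np.einsum("ijk,j,k->i", tensor, b, c)
        a /= np.linalg.norm(a) or 1.0
        b = np.einsum("ijk,i,k->j", tensor, a, c)
        b /= np.linalg.norm(b) or 1.0
        c = np.einsum("ijk,i,j->k", tensor, a, b)
    return a, b, c

def residual(tensor, factors):
    """Squared reconstruction error of a rank-one model."""
    a, b, c = factors
    return float(np.sum((tensor - np.einsum("i,j,k->ijk", a, b, c)) ** 2))

def best_split(tensor, axis, min_size=1):
    """Try every cut along `axis` and return (cut, total_error) for the cut
    whose two partitions are best explained by one rank-one model each."""
    n = tensor.shape[axis]
    best = None
    for cut in range(min_size, n - min_size + 1):
        left, right = np.split(tensor, [cut], axis=axis)
        err = (residual(left, rank_one_fit(left))
               + residual(right, rank_one_fit(right)))
        if best is None or err < best[1]:
            best = (cut, err)
    return best

# Toy data: two regimes along the second mode, each exactly rank one.
rng = np.random.default_rng(0)
u1, w1 = rng.random(5) + 0.1, rng.random(4) + 0.1
u2, w2 = rng.random(5) + 0.1, rng.random(4) + 0.1
part1 = np.einsum("i,j,k->ijk", u1, np.ones(6), w1)
part2 = np.einsum("i,j,k->ijk", u2, np.ones(4), w2)
tensor = np.concatenate([part1, part2], axis=1)

cut, err = best_split(tensor, axis=1)
```

Applying such a split recursively to each partition would yield a hierarchy of progressively more homogeneous slices, which is the general intuition behind a progressive partitioning framework.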

2.5.3 Summary

Table 3: Existing Work in Pattern Presentation From Multi-Aspect Mining.

Studies                                         Interactive   Pattern Presentation
[5,13,56,194,211,233], [57,61,168,190,212],     ×             adjacent
[14,108,175,204,226,259], PairFac, iDisc
[29,64,172,251,257]                             ×             isolated
[26,125,240]                                    ✓             adjacent
FacIt                                           ✓             multi-scaled, integrated, adjacent

The existing practice in pattern presentation often hinders the interpretation of tensor factorization results, because the presentations typically do not match how humans perceive patterns. The adjacent alignment of descriptors can be considered a brute-force way of handing everything about the patterns to domain experts without helping them connect different descriptors or different patterns. The problem is exacerbated when the number of patterns that experts need to go through is large. We argue that mitigating this mismatch requires solutions designed with “user-first” principles in mind.

We have seen recent efforts to develop people-centric designs for pattern exploration from multi-aspect data [26,125,240]. However, these efforts are situated in a spatial-temporal context, which provides a limited understanding of domain experts’ requirements when working with and interpreting patterns in a generic multi-aspect data setting. Another aspect of bridging human understandability and pattern presentation is enabling people to take part in the pattern discovery and exploration process. Although Yan et al. [240] allow users to perform a set of operations on the patterns (e.g., merge, reset, etc.), these operations remain one-way interactions, because the underlying modeling process does not incorporate such feedback when updating the patterns.

3.0 PAIRFAC: EVENT ANALYTICS THROUGH DISCRIMINANT
