My dissertation proposes a framework for interpretable tensor factorization for multi-aspect data. To formulate the framework, it presents three studies that incrementally address interpretability in the process of pattern discovery. Beyond the specific results of each study discussed previously, reviewing the studies as a whole offers key insights into designing better methods for interpretable pattern discovery from multi-aspect data.
6.2.1 Multiplex Pattern Discovery to Ease the Mismatch Between Human Information Need and Naive Error-Based Optimization
To properly repair the mismatch, this dissertation argues that one plausible way is to understand the information need and design customized models beyond the standard pattern discovery process. The different information needs presented in this dissertation are all structured around the same idea: understand the exact information need, then conceive a corresponding problem formulation that accounts for both reconstruction quality and the human information need. Our first study targets event impact analytics in the aftermath of disasters in a city. Compared to a typical participatory impact assessment, there is a need for an expeditious, data-driven evaluation mechanism that conveys the impact on the stakeholders in the city. Accordingly, we formulate a problem of contrasting pattern discovery in mobility data, leveraging its multi-aspect nature. PairFac is designed to identify the underlying mobility patterns and to distinguish the persistent patterns from the changing ones. Our second study tackles the information need to understand the behavioral patterns of users from different performance groups on MOOC platforms. In addition to recognizing the patterns that lead to different performance outcomes, there is also a need to explore patterns at multiple scales. To cater to this information need, we formulate a problem of multi-level discriminative pattern discovery from a pair of tensors. iDisc is an iterative framework that reveals the contrasting patterns at multiple levels. In our last study, we address the problem of generic tensor factorization. Here, multiplex pattern discovery takes the form of a comprehensive list of metrics: in the model inspection tool of FacIt, users can tune the model directly according to their information needs, including sparsity, stability, and reconstruction quality.
This connects to existing work in "model-based" interpretable supervised machine learning, where users may favor a revised model for the sake of being able to interpret it. For example, smaller models [17,78,85] or sparse models [254] are preferred over large, black-box models.
6.2.2 Multifaceted Pattern Evaluation to Mine Under Insufficient Evaluation Criteria
While existing unsupervised learning focuses on revealing underlying data patterns, there has not been a systematic way to evaluate these patterns. When evaluating tensor factorization, one line of work focuses on validating patterns from the domain experts' point of view (see the survey in [7]). While pattern examination often leads to hidden insights into multi-way interactions, it is unclear how it deepens our understanding of the data. Another line of work evaluates the patterns directly through their use in downstream tasks (e.g., recommendations [21,97,175,182,185]). Despite the success of such work, it still leaves users with a black-box model that does not explain the mechanism by which the recommendations are generated.
Given the increasingly popular use of tensor techniques, we call for a multifaceted pattern evaluation that considers the quality, validity, and utility of the results. Quality refers to the set of metrics that evaluate overall factorization performance, such as reconstruction error; validity indicates how well the results align with experts' expectations based on their domain knowledge; and utility reflects the applicability of the results in downstream tasks, such as clustering, classification, or recommendation. In our first study, PairFac was able to generate a set of patterns that describe the impacts of major events in the city. However, it is unclear how the patterns can be used beyond explaining and examining what has happened. In our second study, we conduct an intrinsic evaluation to make sure the patterns from iDisc make sense to experts. In addition, we involve domain experts to qualitatively examine the utility of the patterns in a classification task. In our third study, FacIt first presents a comprehensive set of quality indexes. Then, experts check the validity of the patterns by examining them directly, and the utility of the patterns is verified by inspecting the pairwise relationships between items based on the patterns.
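To make the quality facet concrete, the following is a minimal sketch of two generic quality indexes for a three-way CP (rank-R) factorization: reconstruction fit, defined as one minus the relative Frobenius-norm error, and factor sparsity, the fraction of near-zero entries in the factor matrices. It assumes NumPy and a CP model; the function names and the tolerance `tol` are hypothetical illustrations and are not the indexes as implemented in FacIt.

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild a 3-way tensor from CP factor matrices A (I x R), B (J x R), C (K x R)."""
    A, B, C = factors
    # Sum over r of the rank-one outer products a_r (x) b_r (x) c_r
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def fit(X, factors):
    """Hypothetical quality index: 1 - ||X - X_hat||_F / ||X||_F (1.0 = perfect)."""
    X_hat = cp_reconstruct(factors)
    # np.linalg.norm on an N-d array with default arguments is the Frobenius norm
    return 1.0 - np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def sparsity(factors, tol=1e-6):
    """Hypothetical quality index: fraction of factor entries with |value| < tol."""
    entries = np.concatenate([F.ravel() for F in factors])
    return float(np.mean(np.abs(entries) < tol))

# Usage on a synthetic rank-2 tensor built exactly from known factors
rng = np.random.default_rng(0)
A, B, C = rng.random((4, 2)), rng.random((5, 2)), rng.random((3, 2))
A[0, :] = 0.0                      # zero out one row: 2 of 24 factor entries
X = cp_reconstruct((A, B, C))
print(fit(X, (A, B, C)))           # exact factors, so the fit is 1.0 up to rounding
print(sparsity((A, B, C)))         # 2 / 24
```

In practice the factors would come from a fitted model rather than being known, so the fit falls below 1.0; reporting such indexes side by side is what lets a user trade reconstruction quality against sparsity. Note that `fit` is undefined for an all-zero tensor.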
We acknowledge that such evaluation schemas are not new to the field of tensor factorization. For example, Ho et al. [87] addressed the interpretability and predictivity of phenotypes discovered from multi-aspect data built from electronic health records. This echoes our call for multifaceted pattern evaluation in terms of both validity and utility.
6.2.3 Multipurpose Pattern Presentation to Overcome the Mismatch Involving Domain Knowledge and Human Understandability
To the best of our knowledge, this dissertation presents the first attempt to involve experts in the process of pattern discovery in a generic, multi-aspect setting, with novel pattern presentations and interaction mechanisms.
Tensor factorization has many applications in a wide range of domains, e.g., telecommunications [46,196,197], neuroscience [136,143,147], and data mining [206,207]. However, few applications involve domain experts in the process of pattern discovery. Our first two studies fall into the category of not utilizing experts' domain knowledge. We have become increasingly concerned about how wide the gap is between the discovered results and results that experts can understand. To address this problem, our last study argues that keeping experts in the loop, along with a thoughtful design of pattern presentation, can help them better explore, interpret, and refine patterns.
Multipurpose pattern presentation features several novelties in the visualization design of patterns. First, it sits within an interactive visual analytics system, which allows experts to manipulate patterns and provide feedback to refine them. Second, the pattern presentation features high-level summary displays that enable efficient exploration and identification of patterns. Last but not least, the pattern presentation is enriched by both quantitative and qualitative visualizations that allow for detailed pattern examination on demand. We believe the multipurpose nature of pattern presentation turns tensor pattern discovery into both a transparent and effective process that supports human understanding and an interactive and responsive mechanism that builds upon domain knowledge.