Weakly Semi-supervised Tensor Factorization

In this section, we introduce a novel weakly semi-supervised pattern discovery method based on tensor factorization. Given a set of multi-aspect data, the objective of this algorithm is to discover a set of patterns that are faithful to the data, but also reflect experts’ domain knowledge of the dataset. We first present the users with patterns generated from a standard tensor factorization toolkit. Users then provide their feedback on the outputs through various interactions with the system. The system incorporates the feedback into the model and updates the patterns to match users’ expectations.

5.4.1 Standard Tensor Factorization

Given multi-way data represented as a tensor, standard tensor factorization can be applied to extract an initial set of patterns. We use non-negative CP decomposition to factorize the tensor into a set of components. With a specified rank $R$, conventional tensor factorization seeks a set of latent factor matrices from a multi-way tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_M}$. The objective is to minimize the following cost function:

$$\mathcal{L}_0 = \left\| \mathcal{X} - \mathcal{U} \right\|^2, \tag{5.1}$$

where $\mathcal{U} = [\![ \mathbf{U}^{(1)}, \mathbf{U}^{(2)}, \ldots, \mathbf{U}^{(M)} ]\!]$ and $\mathbf{U}^{(m)}$ is the factor matrix that corresponds to the $m$-th dimension of tensor $\mathcal{X}$.
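As an illustration of Eq. (5.1), the following NumPy sketch rebuilds a tensor from CP factor matrices and evaluates the squared reconstruction error. The function names `cp_reconstruct` and `cp_loss`, the tensor sizes, and the random data are assumptions made for this example and are not taken from the original system. The optimization itself (for example, a non-negative CP solver) is not shown.

```python
import numpy as np


def cp_reconstruct(factors):
    """Rebuild a tensor from CP factor matrices, each of shape (I_m, R)."""
    rank = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    tensor = np.zeros(shape)
    for r in range(rank):
        # Outer product of the r-th column of every factor matrix.
        component = factors[0][:, r]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, r])
        tensor += component
    return tensor


def cp_loss(tensor, factors):
    """Squared reconstruction error, in the spirit of Eq. (5.1)."""
    return float(np.sum((tensor - cp_reconstruct(factors)) ** 2))


# Hypothetical example: a rank-2, three-mode, non-negative tensor.
rng = np.random.default_rng(0)
true_factors = [rng.random((4, 2)), rng.random((3, 2)), rng.random((5, 2))]
X = cp_reconstruct(true_factors)

loss_exact = cp_loss(X, true_factors)
loss_perturbed = cp_loss(X, [f + 0.1 for f in true_factors])
```

The loss vanishes for the factors that generated the tensor and becomes positive once they are perturbed, which is the quantity a CP solver drives down.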

5.4.2 Weakly Supervised Tensor Factorization

In this work, we propose a weakly supervised tensor factorization algorithm that allows users to interactively incorporate their domain knowledge and drive the factorization process. The model accepts two kinds of feedback to guide the factorization process: 1) feedback on the patterns and 2) feedback on the items.

5.4.2.1 Feedback On Patterns Presented with a set of initial patterns from standard tensor factorization, our domain experts raised two concerns. First, some patterns are uninformative and have little interpretable use. For example, a pattern could be almost uniformly distributed over all of its descriptors. Experts would like the system to remove such patterns. Second, experts might see several patterns that are almost identical, which diminishes the value of interpreting these patterns individually. We therefore support interactions that allow users to delete or merge patterns (see Section 5.5.5 for details about interaction support). Given a collection of $R$ components $C = \{C_1, C_2, \ldots, C_R\}$ that a model outputs, each merge or delete operation reduces the rank of the model by one, i.e., to $R - 1$. Specifically, deleting the $r$-th pattern $C_r$ is equivalent to removing the $r$-th column $\mathbf{u}^{(m)}_r$ from $\mathbf{U}^{(m)}$, $\forall m \in \{1, \ldots, M\}$. Consider two patterns $C_i$ and $C_j$ to be merged. We first remove both from $C$, and then add component $C_k = \{\mathbf{u}^{(1)}_k, \ldots, \mathbf{u}^{(M)}_k\}$ to $C$, where $\mathbf{u}^{(m)}_k = \mathbf{u}^{(m)}_i + \mathbf{u}^{(m)}_j$, $\forall m \in \{1, \ldots, M\}$. After users’ interactions, we obtain a new factor matrix $\mathbf{U}^{(m')}$ for each factor matrix $\mathbf{U}^{(m)}$, which serves as the reference matrix for the next iteration of pattern discovery through the following regularization:

$$\mathcal{L}_1 = \sum_{m=1}^{M} \left\| \mathbf{U}^{(m)} - \mathbf{U}^{(m')} \right\|_F^2, \tag{5.2}$$

which forces the factorization outputs to be close to the reference factor matrix.
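The delete and merge operations described above, together with the reference regularizer of Eq. (5.2), can be sketched in NumPy as follows. The helper names `delete_pattern`, `merge_patterns`, and `reference_penalty` are hypothetical, as are the small example matrices. Appending the merged component as the last column is an assumption of this sketch, since the text does not specify where the new component is placed.

```python
import numpy as np


def delete_pattern(factors, r):
    """Drop the r-th column of every factor matrix (rank R -> R - 1)."""
    return [np.delete(U, r, axis=1) for U in factors]


def merge_patterns(factors, i, j):
    """Remove columns i and j and append their sum (rank R -> R - 1)."""
    merged = []
    for U in factors:
        u_k = (U[:, i] + U[:, j])[:, None]       # merged column, per mode
        rest = np.delete(U, [i, j], axis=1)      # remaining components
        # Assumption: the merged component is appended as the last column.
        merged.append(np.hstack([rest, u_k]))
    return merged


def reference_penalty(factors, references):
    """Sum over modes of squared Frobenius distances, as in Eq. (5.2)."""
    return float(sum(np.sum((U - U_ref) ** 2)
                     for U, U_ref in zip(factors, references)))


# Hypothetical two-mode example with R = 3 components.
U1 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])
U2 = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 3.0],
               [2.0, 2.0, 2.0]])

deleted = delete_pattern([U1, U2], 1)     # remove the second pattern
merged = merge_patterns([U1, U2], 0, 2)   # merge the first and third patterns

penalty_same = reference_penalty(merged, merged)
penalty_shifted = reference_penalty(merged, [U + 1.0 for U in merged])
```

In this sketch, the edited matrices returned by `delete_pattern` or `merge_patterns` would play the role of the reference matrices in the next factorization round.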

5.4.2.2 Feedback On Items Tensor factorization has been used as a way to discover latent relationships among items. A factor matrix $\mathbf{U}^{(m)} \in \mathbb{R}^{I_m \times R}$ can be considered as the item embeddings. With this information, item relationships can be further explored and used as a tool to verify the correctness of the model. If the items are clustered in the latent space in a way that is not intuitive to domain experts, the data-driven factorization process is potentially flawed and needs correction. To this end, we design a set of interactions that allow users to adjust item relationships in a 2-D space based on their domain knowledge. We use such feedback as a reference for the next iteration of the factorization. Specifically, we can infer a matrix $\mathbf{P}^{(m')} \in \mathbb{R}^{I_m \times 2}$ which captures the updated item coordinates in that 2-D space.

Given $\mathbf{P}^{(m')}$, we infer a pairwise similarity matrix between the items based on a heat kernel with a local scaling scheme [248]:

$$W^{(m')}_{ij} = \exp\left( -\frac{1}{\sigma_i \sigma_j} \left\| \mathbf{P}^{(m')}_i - \mathbf{P}^{(m')}_j \right\|^2 \right), \tag{5.3}$$

where $\mathbf{P}^{(m')}_i$ and $\mathbf{P}^{(m')}_j$ are the coordinates for item $i$ and item $j$ in the $m$-th mode, and $\sigma$ varies for each item. $\sigma_i$ is determined based on the distance between item $i$ and its $K$-th nearest neighbor, where $K = \min(7, I_m)$. We use 7 because it has been shown to give good results [248]. With $\mathbf{W}^{(m')}$, we derive the following cost function:

$$\mathcal{L}_2 = \sum_{m=1}^{M} \mathrm{Tr}\left( \mathbf{U}^{(m)T} \mathbf{L}^{(m)} \mathbf{U}^{(m)} \right), \tag{5.4}$$

where $\mathbf{L}^{(m)}$ is the graph Laplacian matrix, computed as

$$\mathbf{L}^{(m)} = \mathbf{D}^{(m)} - \mathbf{W}^{(m')}, \tag{5.5}$$

where $\mathbf{D}^{(m)}$ is a diagonal matrix whose entries are $D^{(m)}_{ii} = \sum_j W^{(m')}_{ij}$.
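To make Eqs. (5.3) to (5.5) concrete, the sketch below builds the locally scaled heat-kernel matrix from 2-D item coordinates, forms the graph Laplacian, and evaluates the trace term for one mode. All function names and the synthetic two-cluster data are assumptions of this example. In addition, the sketch caps $K$ at the number of other items so that a $K$-th neighbor always exists, which is an assumption beyond the rule $K = \min(7, I_m)$ stated in the text.

```python
import numpy as np


def local_scaling_affinity(P, k=7):
    """Heat-kernel weights with per-item scale sigma_i, as in Eq. (5.3)."""
    n = P.shape[0]
    diff = P[:, None, :] - P[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Assumption: cap K at n - 1 so that the K-th neighbour exists.
    k_eff = min(k, n - 1)
    # Column 0 of each sorted row is the item itself (distance 0).
    sigma = np.sort(dist, axis=1)[:, k_eff]
    return np.exp(-dist ** 2 / (sigma[:, None] * sigma[None, :]))


def graph_laplacian(W):
    """L = D - W with D_ii = sum_j W_ij, as in Eq. (5.5)."""
    return np.diag(W.sum(axis=1)) - W


def laplacian_penalty(U, L):
    """Tr(U^T L U) for a single mode, the summand of Eq. (5.4)."""
    return float(np.trace(U.T @ L @ U))


# Hypothetical user-adjusted layout: two well-separated groups of 8 items.
rng = np.random.default_rng(0)
P = np.vstack([rng.normal(0.0, 0.1, size=(8, 2)),
               rng.normal(10.0, 0.1, size=(8, 2))])
W = local_scaling_affinity(P)
L = graph_laplacian(W)

# Embedding that agrees with the layout vs. one that ignores it.
U_smooth = np.eye(2)[np.repeat([0, 1], 8)]
U_rough = np.eye(2)[np.arange(16) % 2]
smooth_cost = laplacian_penalty(U_smooth, L)
rough_cost = laplacian_penalty(U_rough, L)
```

The embedding whose rows agree within each user-defined group incurs almost no penalty, while the embedding that separates nearby items is penalized, which is the behavior the regularizer relies on.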

We note that one could also provide interactions that distill experts’ knowledge of the pairwise relationships between patterns, and use them as another regularization that forces the new set of patterns to resemble those relationships. However, our experts believe this process is less straightforward than it is for the items, because patterns can be a noisy reflection of the data, and adjusting the pairwise relationships between patterns requires a complete understanding of the entire pattern space.

5.4.2.3 Overall Objective Function To summarize, using the initial set of patterns from a standard tensor factorization, users can provide their feedback to the model by performing operations on the patterns and items. The system incorporates their knowledge and uses it as a regularization for the factorization in the next iteration. The overall objective function for factorization with supervision is as follows:

$$\mathcal{L} = \mathcal{L}_0 + \alpha \mathcal{L}_1 + \beta \mathcal{L}_2. \tag{5.6}$$
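A minimal sketch of how the three terms of Eq. (5.6) combine is given below for a three-mode tensor. The function `total_objective`, the helper `path_laplacian` (a simple chain graph standing in for the Laplacians derived from user feedback), and all sizes and weights are hypothetical choices for illustration only.

```python
import numpy as np


def path_laplacian(n):
    """Laplacian of a chain graph; a stand-in for a feedback-derived one."""
    W = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(W.sum(axis=1)) - W


def total_objective(X, factors, references, laplacians, alpha, beta):
    """L = L0 + alpha * L1 + beta * L2 for a three-mode tensor."""
    A, B, C = factors
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)          # CP reconstruction
    l0 = np.sum((X - X_hat) ** 2)                         # data fit
    l1 = sum(np.sum((U - U_ref) ** 2)                     # pattern feedback
             for U, U_ref in zip(factors, references))
    l2 = sum(np.trace(U.T @ Lap @ U)                      # item feedback
             for U, Lap in zip(factors, laplacians))
    return float(l0 + alpha * l1 + beta * l2)


# Hypothetical rank-2 example.
rng = np.random.default_rng(0)
factors = [rng.random((4, 2)), rng.random((3, 2)), rng.random((5, 2))]
X = np.einsum("ir,jr,kr->ijk", *factors)
references = [U.copy() for U in factors]
laplacians = [path_laplacian(U.shape[0]) for U in factors]

no_feedback = total_objective(X, factors, references, laplacians, 1.0, 0.0)
with_items = total_objective(X, factors, references, laplacians, 0.0, 0.5)
shifted_refs = [U + 1.0 for U in factors]
with_refs = total_objective(X, factors, shifted_refs, laplacians, 0.5, 0.0)
```

Setting a weight to zero switches the corresponding kind of feedback off, so the same function covers the unsupervised case as well as either form of supervision.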

Figure 28: Using FacIt to interpret, fine-tune and scrutinize patterns based on tensor factorization from NBA shot data: (a) Model Inspection View provides various metrics of model sensitivity for selecting a desirable setting of rank from different aspects. (b) Pattern Projection View provides users with a high-level overview of the entire pattern space. (c) Circular Bar Charts (c1) and Treemap View (c2) allow for examining the detailed content of patterns. (d) Pattern Comparison Mode allows users to analyze pairs of common and discriminative patterns and their associated items. (e) Pattern Query Mode enables users to retrieve the most relevant patterns (e2) by query (text) input (e1) and item bars (e3).

5.4.3 Summary

FacIt features a weakly semi-supervised factorization model to iteratively incorporate domain experts’ feedback. We note that FacIt shares this goal with Utopian [35]. However, it differs in the tasks it addresses and therefore in the optimization objective it follows. Utopian was proposed as an interactive topic discovery tool and is limited to two dimensions, while our method is more generic for discovering, presenting and interpreting patterns from high-dimensional datasets.

exact set of requirements from domains of literature and requires experts who are experienced in pattern discovery from high-dimensional data. As a result, the design requirements are not entirely the same. For example, pattern (topic) deleting/merging is one of the shared operations because, in both cases, users’ control over patterns needs to be acknowledged. However, the item modification presented in this work does not serve the same purpose and therefore has different underlying mechanisms, compared to word modification in [35]. Unlike topic modeling, where a topic can be easily modified by changing the weights of its keywords, the complexity of modifying descriptor distributions grows dramatically as the number of tensor modes increases. Indeed, domain experts did not appreciate this interaction when we introduced this function to them. Instead of modifying the distribution of the items, experts preferred to manipulate items in the embedded space to incorporate their feedback. Through such straightforward interactions, the relationship between items becomes more aligned with experts’ expectations.

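Since the objective function $\mathcal{L}$ is not convex with respect to $\mathcal{U}$, we aim to find a local minimum of $\mathcal{L}$ by iteratively updating each factor matrix in $\mathcal{U}$.

Let $\mathbf{U}$ represent the mode-$m$ factor matrix. For simplicity of notation, we use $\bar{\mathbf{U}}$ to denote the set of factor matrices that correspond to modes other than $m$. Then, the optimization of $\mathbf{U}$ is equivalent to the following least squares problem:

$$\mathbf{U} \leftarrow \underset{\mathbf{U} \geq 0}{\arg\min} \; \frac{1}{2} \left( \frac{1}{n} \left\| \mathbf{X} - \mathbf{U} \bar{\mathbf{U}}^{T} \right\|_F^2 \right) + \alpha \left\| \mathbf{U} - \mathbf{U}' \right\|_F^2 + \beta \, \mathrm{Tr}\left( \mathbf{U}^{T} \mathbf{L} \mathbf{U} \right), \tag{5.7}$$

where $\mathbf{X}$ is the mode-$m$ unfolding of tensor $\mathcal{X}$. Then the gradient update of $\mathbf{U}$ can be computed as:

$$\nabla_{\mathbf{U}} \mathcal{L} = \frac{1}{n} \left( \mathbf{U} \bar{\mathbf{U}}^{T} - \mathbf{X} \right) \bar{\mathbf{U}} + \alpha \left( \mathbf{U} - \mathbf{U}' \right) + \beta \, \mathbf{L} \mathbf{U}. \tag{5.8}$$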
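The per-mode update can be sketched as a projected gradient step, shown below for mode 0 of a three-mode tensor. Several choices here are assumptions rather than details from the text: the product with the other modes is realized as a Khatri-Rao product, $n$ is taken to be the number of tensor entries, non-negativity is enforced by clipping at zero, and the step size is set from a Lipschitz bound. The two regularization weights are named `ref_weight` and `lap_weight` to make explicit which term each one scales, and the sketch keeps the factors of 2 that arise when the two quadratic regularizers are differentiated, which amounts to a rescaling of the weights.

```python
import numpy as np


def unfold(X, mode):
    """Mode-m unfolding with the remaining axes kept in their order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def khatri_rao(matrices):
    """Column-wise Kronecker product, ordered to match `unfold`."""
    out = matrices[0]
    for M in matrices[1:]:
        out = np.einsum("ir,jr->ijr", out, M).reshape(-1, out.shape[1])
    return out


def mode_objective(U, X_m, U_bar, L, U_ref, lap_weight, ref_weight):
    """Data fit plus Laplacian and reference regularizers for one mode."""
    fit = np.sum((X_m - U @ U_bar.T) ** 2) / (2 * X_m.size)
    lap = lap_weight * np.trace(U.T @ L @ U)
    ref = ref_weight * np.sum((U - U_ref) ** 2)
    return float(fit + lap + ref)


def mode_gradient(U, X_m, U_bar, L, U_ref, lap_weight, ref_weight):
    """Exact gradient of `mode_objective` with respect to U."""
    fit = (U @ U_bar.T - X_m) @ U_bar / X_m.size
    return fit + 2 * lap_weight * (L @ U) + 2 * ref_weight * (U - U_ref)


# Hypothetical rank-2 problem; mode 0 is updated, modes 1 and 2 are fixed.
rng = np.random.default_rng(0)
A, B, C = rng.random((4, 2)), rng.random((3, 2)), rng.random((5, 2))
X = np.einsum("ir,jr,kr->ijk", A, B, C)
X_m, U_bar = unfold(X, 0), khatri_rao([B, C])

W = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)   # stand-in item graph
L = np.diag(W.sum(axis=1)) - W
args = (X_m, U_bar, L, A.copy(), 0.1, 0.1)              # A acts as reference

# Step size from a Lipschitz bound on the gradient (assumption of the sketch).
lipschitz = (np.linalg.eigvalsh(U_bar.T @ U_bar)[-1] / X_m.size
             + 2 * 0.1 * np.linalg.eigvalsh(L)[-1] + 2 * 0.1)
step = 1.0 / lipschitz

U = rng.random((4, 2))
losses = [mode_objective(U, *args)]
for _ in range(200):
    # Gradient step followed by projection onto the non-negative orthant.
    U = np.maximum(U - step * mode_gradient(U, *args), 0.0)
    losses.append(mode_objective(U, *args))
```

Because the sub-problem for a single mode is convex once the other modes are fixed, this step size yields a non-increasing objective. Cycling such updates over all modes is one way to reach a local minimum of the full, non-convex objective.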

In document Algorithms, applications and systems towards interpretable pattern mining from multi-aspect data (Page 141-146)