CLUDE: An Efficient Algorithm for LU Decomposition Over a Sequence of Evolving Graphs


Title: CLUDE: An Efficient Algorithm for LU Decomposition Over a Sequence of Evolving Graphs

Author(s): Ren, CH; Mo, L; Kao, CM; Cheng, CK; Cheung, DWL

Citation: Advances in Database Technology - EDBT 2014: Proceedings of the 17th International Conference on Extending Database Technology, Athens, Greece, 24-28 March, 2014, p. 319-330

Issued Date: 2014

URL: http://hdl.handle.net/10722/203633


CLUDE: An Efficient Algorithm for LU Decomposition Over a Sequence of Evolving Graphs

Chenghui Ren, Luyi Mo, Ben Kao, Reynold Cheng, David W. Cheung

Department of Computer Science, University of Hong Kong

Pokfulam Road, Hong Kong

{chren, lymo, kao, ckcheng, dcheung}@cs.hku.hk

ABSTRACT

In many applications, entities and their relationships are represented by graphs. Examples include the WWW (web pages and hyperlinks) and bibliographic networks (authors and co-authorship). A graph can be conveniently modeled by a matrix from which various quantitative measures are derived. Example measures include PageRank and SALSA (which measure nodes’ importance), and Personalized PageRank and Random Walk with Restart (which measure proximities between nodes). To compute these measures, linear systems of the form Ax = b, where A is a matrix that captures a graph’s structure, need to be solved. To facilitate solving the linear system, the matrix A is often decomposed into two triangular matrices (L and U). In a dynamic world, the graph that models it changes with time, and so does the matrix A that represents the graph. We consider a sequence of evolving graphs and its associated sequence of evolving matrices. We study how LU decomposition should be done over the sequence so that (1) the decomposition is efficient and (2) the resulting LU matrices best preserve the sparsity of the matrices A (i.e., the number of extra non-zero entries introduced in L and U is minimized). We propose a cluster-based algorithm, CLUDE, for solving the problem. Through an experimental study, we show that CLUDE is about an order of magnitude faster than the traditional incremental update algorithm. The number of extra non-zero entries introduced by CLUDE is also about an order of magnitude smaller than that of the traditional algorithm. CLUDE is thus an efficient algorithm for LU decomposition that produces high-quality LU matrices over an evolving matrix sequence.

1. INTRODUCTION

Graphs are a powerful tool for modeling real-world entities and their relationships through nodes and edges. For example, a graph can be used to model a social network in which users are represented by nodes while their interactions (such as friendship or whether they have recently communicated) are represented by edges. A graph can also model web pages and the hyperlinks connecting them, or a bibliographic network capturing co-authorship between authors.

(c) 2014, Copyright is with the authors. Published in Proc. 17th International Conference on Extending Database Technology (EDBT), March 24-28, 2014, Athens, Greece: ISBN 978-3-89318065-3, on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0

A number of measures have been proposed for analyzing graph structures. Examples include PageRank (PR) [22] and SALSA [18] (which measure the importance of nodes), and Personalized PageRank (PPR) [12], Discounted Hitting Time (DHT) [14] and Random Walk with Restart (RWR) [23] (which measure the proximities between nodes). These measures have extensive applications, especially in the structural analyses of the underlying information the graphs model. For example, Google employs PR to rank search results [17], and PPR is often used in node clustering and community detection [2].

A common property of these measures is that computing them requires solving certain linear systems. As an example, consider RWR: we start from a node $u$ in a graph and at each step transit to another node. Specifically, with probability $d$ (called the damping factor), we transit to a neighboring node via an edge, and with probability $(1-d)$, we transit to the starting node $u$. We can compute the stationary distribution of the nodes (i.e., how likely it is that we are at a particular node at any time instant). Intuitively, a large stationary probability of a node $v$ implies that $v$ is close to node $u$ under the RWR measure. Let $x_u$ be a vector representing the stationary distribution such that $x_u(v)$ represents the stationary probability of node $v$. Then $x_u$ can be determined by solving:

$$x_u = dWx_u + (1-d)q_u, \qquad (1)$$

where $W$ is the column normalized adjacency matrix of the graph¹ and $q_u$ is a vector whose only non-zero entry is $q_u(u) = 1$. Let $I$ be the identity matrix. Eq. 1 can be rewritten as

$$Ax_u = b_u,$$

where $A = I - dW$ and $b_u = (1-d)q_u$. In fact, each of the measures we have mentioned can be similarly determined by solving an equation of the form $Ax = b$ for $x$ by composing a matrix $A$ and a vector $b$. In this formulation, the matrix $A$ depends solely on the graph structure (and the measure to be determined), while the vector $b$ can be considered as an input query for the measure. For example, by setting $b = b_u$ for various nodes $u$, we obtain the stationary distributions of different starting nodes $u$ for the RWR measure.

¹ If $(i, j)$ is an edge in the graph, then $W(j, i) = 1/\lambda(i)$,
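The RWR formulation can be illustrated with a minimal Python sketch. It builds $W$ for a small hypothetical graph, iterates Eq. 1 to a fixed point, and checks that the result satisfies $Ax_u = b_u$ with $A = I - dW$. The edge list, the helper names, and the reading of $\lambda(i)$ as the out-degree of node $i$ are assumptions made for illustration, and the fixed-point iteration is only a stand-in solver, not the decomposition-based approach studied in this paper.

```python
def column_normalized(edges, n):
    """W[j][i] = 1/outdeg(i) for each edge (i, j).
    Assumption: lambda(i) is the out-degree and every node has an out-edge."""
    outdeg = [0] * n
    for i, _ in edges:
        outdeg[i] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[j][i] = 1.0 / outdeg[i]
    return W


def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def rwr_scores(W, u, d=0.85, iterations=300):
    """Iterate x <- d W x + (1 - d) q_u, i.e. Eq. 1 used as a fixed-point update."""
    n = len(W)
    q = [1.0 if v == u else 0.0 for v in range(n)]
    x = q[:]
    for _ in range(iterations):
        Wx = matvec(W, x)
        x = [d * Wx[v] + (1.0 - d) * q[v] for v in range(n)]
    return x


# Hypothetical 3-node graph given as directed edges (i, j).
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
n, d, u = 3, 0.85, 0

W = column_normalized(edges, n)
x_u = rwr_scores(W, u, d)

# The same vector should satisfy A x_u = b_u with A = I - dW, b_u = (1 - d) q_u.
A = [[(1.0 if i == j else 0.0) - d * W[i][j] for j in range(n)] for i in range(n)]
b_u = [(1.0 - d) if v == u else 0.0 for v in range(n)]
residual = max(abs(r - b) for r, b in zip(matvec(A, x_u), b_u))
```

Changing `u` changes only `b_u`, while `A` stays fixed, which is why a reusable decomposition of `A` pays off when many queries are issued against the same graph.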

In a dynamic world, the information modeled by the graph changes with time. For example, a hyperlink is added to a web page, or a Facebook link between two friends is established. The graph thus evolves with time. In [25], it was proposed that evolving graphs should be archived and analyzed as an Evolving Graph Sequence (or EGS). An EGS is a sequence of snapshot graphs, each of which captures the world’s state at a particular instant. As we have discussed, a graph’s structure can be conceptually represented by a matrix (A) from which various measures can be computed. Hence, as the graph evolves, so do the matrices and the measures. An interesting question is “How shall all these matrices be processed so that graph-based measures can be computed efficiently to support EGS analysis?”

Before we discuss how we tackle the problem, let us first consider a few motivating examples to illustrate how graph-based measures over an EGS could lead to interesting analysis.

Example 1 PageRank (PR) [22] is a widely used metric to measure the importance of hyperlinked web pages. A web publisher who makes money by putting Google Ads on his web contents, for example, would be very interested in the PageRank score of his web site. To illustrate how PageRank scores change with time, we have collected 1000 daily snapshots of a set of 20,000 Wikipedia pages and their hyperlinks. Figure 1 shows how the PR score of a Wiki page labeled 152 changes over the 1000 snapshots. From the figure, we see a number of interesting key moments at which the PR score changes significantly. To illustrate, let us discuss a few of them, which are marked by arrows in Figure 1. These PR score changes are also illustrated in Figure 2.

First we see a sharp rise of the score at snapshot #197. On further investigation, we found that at that snapshot, new links pointing to Page 152 were added to two other pages (see Figures 2 (a) and (b)). Since these two pages (Pages 777 and 1169) had very large PR scores, they contributed much to the rise of Page 152’s PR score at snapshot #197. Page 152’s high PR score, however, was short-lived. A rapid decline of the score was observed at snapshot #247. We found that at that snapshot, a high-PR page (Page 8774), which only pointed to Page 152, was edited with 30 more new outgoing links added to it (see Figure 2 (c)). This drastically reduced Page 8774’s contribution to Page 152’s PR score, resulting in its sharp drop. Next, we see that the PR score of Page 152 steadily declined over a one-year period (from snapshot #585 to #912). We found that over that period, no new pages with large PR scores linked to Page 152 while at the same time the out-degrees of the pages that were pointing to Page 152 gradually increased. Hence, their contributions to Page 152’s PR scores were gradually reduced. ✷

In this example, we see that interesting events occurred which led to significant changes to the measure. To discover such events, we need to identify the key moments in order to focus the investigation on a manageable set of snapshots. To achieve that, it is best to display the measure as a time series from which key moments are extracted. This, in turn, requires that the measure be evaluated over the whole EGS.

[Figure 1: PR score of Wiki Page 152 over a 1000-day EGS. Horizontal axis: snapshot number (0 to 1000); vertical axis: PageRank score (×10^-3); marked snapshots: #197, #247, #585, #912.]

[Figure 2: Graph structure at specific snapshot. Panels: (a) #196, (b) #197, (c) #247; nodes shown: 152, 777, 1169, 8774.]

Example 2 In Google’s official guide on improving a web page’s PR scores², a number of actions are recommended. Some of these actions include translating the web page to other languages, publicizing the web site through newsletters, providing a rich site summary (RSS), and submitting the web site to various web directories. How shall we evaluate the effectiveness of these actions? What are the usual lag times between the actions and their observable effects? To answer those questions, we need to systematically analyze web pages’ PR scores as time series (such as the one shown in Figure 1) and discover any association between various actions taken and any observable changes to the measure. This again requires that the PR score be computed at every snapshot. ✷

Example 3 Link prediction has been a popular topic, especially in the data mining literature. Most existing works on link prediction consider a static graph snapshot and evaluate a certain “proximity” measure (e.g., RWR) with which “closest node pairs” are identified as potential endpoints of links. If one has access to not only, say, the RWR scores on a single graph snapshot, but the time series of the scores over an EGS, then the upward/downward trends of the scores provide an important dimension based on which link predictions could be made. The availability of the scores as time series allows a wealth of prediction techniques to be applied to predict links, such as those mentioned in [16] and [7]. ✷

² http://www.googleguide.com/improving_pagerank.

An EGS is a sequence of graphs $G = \{G_1, \ldots, G_T\}$. As we have mentioned, each graph $G_i$ derives a matrix $A_i$, which is composed based on the structure of $G_i$ and the kind of measure to be evaluated. An EGS thus derives an evolving matrix sequence (EMS) $M = \{A_1, \ldots, A_T\}$. To obtain a series of measure values (such as Figure 1), we need to solve the equation $A_i x = b$ for each matrix $A_i$. Our objective is to study how these equations can be solved efficiently.

A standard method to solve a linear system is to perform Gaussian Elimination (or GE), which is a rather expensive operation for large graphs. Recall that the vector $b$ is an input of the measure (e.g., we set $b = b_u$ to compute the RWR scores for the case where the starting node of the random walk is $u$). Repeatedly applying GE for each input $b$ is very expensive. An alternative approach (LU decomposition) is to first decompose the matrix $A_i$ as a product of a lower triangular matrix $L_i$ and an upper triangular matrix $U_i$ (i.e., $A_i = L_i U_i$). Although LU decomposition is computationally similar to GE (and so they have similar cost), once the matrix $A_i$ is decomposed, all linear systems of the same matrix $A_i$ can be solved very efficiently using forward and backward substitution methods [9]. Hence using LU decomposition allows us to avoid performing expensive GE repeatedly for different inputs $b$. As an example, with our Wikipedia dataset, once the matrix is LU-decomposed, solving the linear system is about 5,000 times faster than executing one GE. Hence, we reduce the problem of solving the linear system $A_i x = b$ to decomposing the matrix $A_i$.

To derive an efficient method to decompose all the matrices in an EMS, we first need to understand the properties of the EMS. For most applications of interest, the snapshot graphs (and hence the matrices of an EMS) are (1) sparse and (2) gradually evolving. As an example, for our Wikipedia dataset, the average out-degree of a node is 7. Also, successive snapshots share more than 99% of their edges. The first property calls for LU decomposition methods that are specialized for sparse matrices, while the second property suggests that incremental LU decomposition be applied. Let us briefly review the two techniques.

Given a sparse matrix $A$, decomposing it into $L$ and $U$ usually does not perfectly preserve its sparsity. That is, some 0 entries in $A$ would become non-zero in $L$ and $U$. These entries are called fill-ins. A large number of fill-ins is undesirable because (1) it takes more space to store the matrices $L$ and $U$, (2) it slows down forward and backward substitutions in solving the linear systems, and most important of all, (3) it increases the decomposition time³. A common approach to reduce the number of fill-ins is to apply a reordering technique, which shuffles the rows and columns of $A$ before the matrix is decomposed. Although finding the optimal ordering of $A$ to minimize the number of fill-ins is an NP-Complete problem [26], there are a few heuristic reordering strategies, such as Markowitz [20] and AMD [1], which have been shown to be very effective in reducing the number of fill-ins in practice. The quality of an LU decomposition refers to the number of fill-ins. Intuitively, the fewer fill-ins, the higher the quality of the decomposition. Different orderings of the matrix induce different decomposition quality. As we have mentioned, a higher-quality decomposition generally gives faster decomposition time as well as faster equation solving time (in the execution of forward/backward substitutions).

There are previous studies on how to perform incremental LU decomposition. Specifically, given a matrix $A_1$ and a low-rank matrix $\Delta A_1$, Bennett’s algorithm [5] computes the LU factors of $A_2 = A_1 + \Delta A_1$ from the LU factors of $A_1$ and the updates $\Delta A_1$. The complexity of Bennett’s algorithm is proportional to the rank of $\Delta A_1$ and the number of non-zero entries in the LU factors of $A_2$. It has been shown that Bennett’s algorithm is much more efficient than directly computing the LU decomposition on $A_2$ if the update matrix $\Delta A_1$ has a low rank.

³ A number of factors affect the execution time of LU decomposition on a sparse matrix. A major factor is the number of non-zero entries in the resulting L, U matrices. The general complexity, however, is unknown [9].

Ideally, to achieve fast LU decomposition over an EMS, we should perform reordering to reduce the number of fill-ins and apply incremental LU decomposition such as Bennett’s algorithm. Unfortunately, integrating the two techniques is tricky. This is because to apply an incremental algorithm, the same ordering (if any) has to be applied to both matrices $A_1$ and $A_2$. However, an ordering that is best for $A_1$ may not be so for $A_2$. This issue is even more pronounced when we attempt to apply Bennett’s algorithm to a long EMS. This is because a good ordering for $A_1$ could be badly unfit for the last matrix $A_T$ of the EMS, resulting in large numbers of fill-ins and thus very slow LU decomposition.

Another issue of applying incremental LU decomposition over an EMS is how the various factors $L_i$ and $U_i$ are represented. Since these LU factors are typically sparse, they are implemented using adjacency lists, which store the non-zero entries of the factors. When we apply Bennett’s algorithm to compute $L_2$, $U_2$ from $L_1$, $U_1$, the adjacency lists representing $L_1$, $U_1$ would have to be (structurally) modified to form the adjacency lists for $L_2$, $U_2$. This structural update of the data structures (such as node inserts and deletes in the linked lists) turns out to be a dominating cost of the incremental algorithm compared with the numerical computation.

In this paper we propose an algorithm, CLUDE, for performing LU decomposition on matrices of an EMS. CLUDE groups the matrices in an EMS into clusters and applies an incremental algorithm to decompose the matrices within each cluster. The idea is that if matrices of the same cluster are sufficiently similar to each other, then we may derive an ordering that generally fits all the matrices in the cluster. This cluster-based ordering allows the decomposition of the matrices to be of high quality, which leads to faster LU decomposition. Moreover, since the same ordering is applied to all the matrices in the cluster, an incremental decomposition algorithm can be applied. Finally, CLUDE applies symbolic computation to build an adjacency-lists structure that covers all the non-zero entries of all the LU factors of a cluster. This avoids the expensive structural changes to adjacency lists that happen when the incremental algorithm is straightforwardly applied.

Here we summarize our contributions.

• We propose to study the problem of LU decomposition over a sequence of evolving matrices, which finds many applications, especially those that involve the sequential analysis of graph structural measures.

• We study the interaction between matrix ordering and incremental decomposition algorithms, with a focus on optimizing decomposition quality and speed.

• We propose the algorithm CLUDE, which employs a clustering strategy to partition the matrices in an EMS so that a universal ordering and a universal data structure can be applied to all the matrices in a cluster.

• We perform an extensive experimental study using both real and synthetic datasets to evaluate CLUDE and compare it against other decomposition methods. Our experiment shows that CLUDE is up to an order of magnitude faster than the basic incremental algorithm and at the same time achieves up to an order of magnitude smaller number of fill-ins.

Table 1: Glossary of symbols

Symbols                  Meanings
$n$                      number of nodes in a snapshot graph
$A$, $L$, $U$            an $n \times n$ matrix and its LU factors
$\hat{A}$                decomposed version of $A$
$x$, $b$                 $n \times 1$ vectors
$sp(A)$                  sparsity pattern of $A$, i.e., $\{(i, j) \mid A(i, j) \neq 0\}$
$fp(A)$                  fill-in pattern of $A$
$\widetilde{sp}(A)$      symbolic sparsity pattern of $A$, i.e., $sp(A) \cup fp(A)$
$O$, $O^*(A)$            a matrix ordering; the Markowitz order of $A$
$A^O$, $L^O$, $U^O$      reordered matrix $A$ and its LU factors
$\hat{A}^O$              decomposed reordered version of matrix $A$
$A^*$                    reordered matrix $A$ with $O^*(A)$, i.e., $A^* = A^{O^*(A)}$
$ql(O, A)$               quality-loss of applying $O$ on $A$
$A_\cap$, $A_\cup$       intersection and union of the matrices in a cluster in terms of their sparsity patterns

The rest of this paper is organized as follows. In Section 2 we cover the basics of traditional LU decomposition. We present a formal problem definition in Section 3 and describe the various algorithms in Section 4. In Section 5 we extend our solution to one that can guarantee decomposition quality. In Section 6 we present the experimental results. In Section 7 we present a case study showing how interesting observations can be obtained by analyzing certain measures on real data. In Section 8 we discuss some related works. Finally, we conclude the paper in Section 9.

2. PRELIMINARY

In this section we give some details of LU decomposition and matrix reordering. We will also define the various symbols and notations used in the paper. Figure 3 illustrates the various concepts and Table 1 lists the frequently used symbols.

2.1 LU decomposition

A system of linear equations $Ax = b$, where $A$ is an $n \times n$ non-singular matrix, has a unique solution $x = A^{-1}b$. A straightforward method to solve for $x$ is to first invert the matrix $A$. After that, $x$ can be computed by multiplying $A^{-1}$ by any input query $b$. The problem with this approach is that when $A$ is sparse, $A^{-1}$ is usually dense [24] (e.g., see Figures 3(b) and (i) for a sparse matrix $A$ and its dense inverse). It thus takes $O(n^2)$ space to store $A^{-1}$, which is impractical for large graphs (and matrices). Besides, computing $A^{-1}b$ takes $O(n^2)$ time (because $A^{-1}$ is dense), which is too expensive, considering that it has to be done for every input query $b$. The matrix inversion method is thus impractical.

To facilitate solving for $x$ for various input queries $b$, we decompose a matrix $A$ into its LU factors. The decomposition can be done by Crout’s method [9]. Figure 3(c) shows an example decomposition. Now, $\{Ax = b\} \Leftrightarrow \{L(Ux) = b\}$. To find $x$, we first perform forward substitution to get $Ux$, followed by a backward substitution process to obtain $x$ [9].
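As a rough illustration of decomposing once and solving for many queries, the sketch below implements a textbook dense Crout decomposition (general diagonal in $L$, unit diagonal in $U$) followed by forward and backward substitution. It assumes every pivot is non-zero and does no pivoting or sparse handling; the function names and the 2-by-2 example are hypothetical and are not taken from the paper.

```python
def crout_decompose(A):
    """Return (L, U) with A = L U, where U has a unit diagonal.
    Assumes no pivot L[j][j] turns out to be zero."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of L.
        for i in range(j, n):
            L[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(j))
        # Row j of U, to the right of the diagonal.
        for i in range(j + 1, n):
            U[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U


def forward_substitute(L, b):
    """Solve L y = b from the top row downwards."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y


def backward_substitute(U, y):
    """Solve U x = y from the bottom row upwards (unit diagonal in U)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x


def lu_solve(L, U, b):
    """Solve A x = b given the factors of A."""
    return backward_substitute(U, forward_substitute(L, b))


A = [[4.0, 3.0], [6.0, 3.0]]
L, U = crout_decompose(A)          # done once
x1 = lu_solve(L, U, [10.0, 12.0])  # each query reuses the same factors
x2 = lu_solve(L, U, [4.0, 6.0])
```

Each additional query costs only the two substitution passes, which is the reuse that motivates decomposing $A$ instead of eliminating from scratch for every $b$.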

2.2 Preserving sparsity in LU decomposition

Let $\hat{A} = L + U$ be the decomposed representation of matrix $A$. If the number of non-zero entries in $\hat{A}$ is $k$, then the complexity of forward/backward substitutions is $O(k)$. Also, the time complexity of Crout’s method is a function of $k$. As we have explained in the introduction, a good decomposition should preserve the sparsity of the matrix $A$ as much as possible. That is, $k$ should be kept small, which is typically achieved by applying some reordering technique. Here, we briefly discuss reordering. First, some definitions.

Definition 1 (Sparsity pattern). Given a matrix $A$, its sparsity pattern, denoted by $sp(A)$, is the set of indices of $A$ at which $A$ contains non-zero values. That is, $sp(A) := \{(i, j) \mid A(i, j) \neq 0\}$.

Figure 3(a) shows an illustration of a sparsity pattern. Note that decomposing a matrix $A$ into its LU factors may introduce extra non-zero entries. This is illustrated by Figures 3(a) and 3(d), which show the sparsity pattern of the original matrix $A$ and that of its decomposed form $\hat{A}$, respectively. These extra non-zero entries introduced by LU decomposition are called fill-ins. (In Figure 3(d), fill-ins are shown as dark grey entries.)

To reduce the number of fill-ins, we reorder the matrix A based on an ordering O.

Definition 2 (Ordering). An ordering $O = (P, Q)$ is a pair of $n$-by-$n$ permutation matrices $P$, $Q$. Each row or column of a permutation matrix contains exactly one non-zero entry, whose value is 1. (Figure 3(j) shows an example.) We say that a matrix $A$ is reordered by the ordering $O$ into a matrix $A^O$ if $A^O = PAQ$.

Figure 3(f) shows a reordered matrix. Instead of decomposing the original matrix $A$, we decompose the reordered matrix $A^O$ into two factors, denoted by $L^O$ and $U^O$ (see Figure 3(g)). The purpose of reordering the matrix is to reduce the number of fill-ins. Let $\hat{A}^O = L^O + U^O$ be the decomposed (hat) reordered (superscript $O$) version of $A$. Figure 3(h) shows its sparsity pattern, $sp(\hat{A}^O)$. Compared against the sparsity pattern $sp(A^O)$ (Figure 3(e)), there is only one fill-in brought about by the decomposition. The reordering step has thus resulted in far fewer fill-ins compared with the original decomposition (Figure 3(d)). One of the best reordering strategies is given by Markowitz, which has been shown to be very effective [20].
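The effect of an ordering on fill-ins can be seen on a small hypothetical "arrow" matrix (this is not the matrix of Figure 3). The sketch below runs dense elimination without pivoting, stores the multipliers in place of the eliminated entries so that the array plays the role of $\hat{A}$, and counts non-zero positions before and after a symmetric reordering. Permutations are given as index arrays `p` and `q` with $A^O(i, j) = A(p[i], q[j])$, which is an assumed encoding of $P$ and $Q$. The elimination variant differs from Crout’s method in where the unit diagonal sits; only the count of non-zero positions is compared here.

```python
def lu_nonzeros(A):
    """Count non-zero positions of the compact L+U array after elimination
    without pivoting (assumes non-zero pivots)."""
    n = len(A)
    M = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            if M[i][k] != 0.0:
                M[i][k] /= M[k][k]              # multiplier kept in the lower part
                for j in range(k + 1, n):
                    M[i][j] -= M[i][k] * M[k][j]
    return sum(1 for row in M for v in row if v != 0.0)


def reorder(A, p, q):
    """A^O = P A Q with permutations given as index arrays (assumed encoding)."""
    n = len(A)
    return [[A[p[i]][q[j]] for j in range(n)] for i in range(n)]


# Hypothetical arrow matrix: node 0 is connected to every other node.
A = [[4.0, 1.0, 1.0, 1.0],
     [1.0, 4.0, 0.0, 0.0],
     [1.0, 0.0, 4.0, 0.0],
     [1.0, 0.0, 0.0, 4.0]]
nnz_A = sum(1 for row in A for v in row if v != 0.0)

fill_original = lu_nonzeros(A) - nnz_A

# Move the densely connected node to the last position.
order = [3, 2, 1, 0]
fill_reordered = lu_nonzeros(reorder(A, order, order)) - nnz_A
```

Eliminating the dense row and column first couples all remaining rows, whereas eliminating it last leaves nothing to fill; heuristics such as Markowitz look for orderings with this kind of effect.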

Given the LU factors, $L^O$ and $U^O$, of $A^O$, solving the original equation $Ax = b$ for $x$ is simple. Note that,

$$\{Ax = b\} \Leftrightarrow \{P^{-1} A^O Q^{-1} x = b\} \Leftrightarrow \{A^O (Q^{-1} x) = Pb\}.$$

Let $x' = Q^{-1}x$ and $b' = Pb$. We have $A^O x' = b'$. Given $L^O$ and $U^O$, $x'$ can be solved efficiently using forward/backward substitutions. Finally, $x$ is computed by $x = Qx'$. Note that the permutation matrices $P$ and $Q$ contain only one non-zero entry in each row or column. Therefore, computing $b' = Pb$ and $x = Qx'$ takes only $O(n)$ time.
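The identities above can be checked numerically with a short sketch. It encodes $P$ and $Q$ as index arrays (an assumed representation: row $i$ of $A^O$ is row `p[i]` of $A$, and column $j$ of $A^O$ is column `q[j]` of $A$), picks a known $x$, and verifies that $A^O x' = b'$ and that $x = Qx'$ recovers the original vector. The matrix, the permutations, and the variable names are hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


n = 3
A = [[4.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [2.0, 0.0, 5.0]]
p = [2, 0, 1]   # row permutation P as an index array
q = [1, 2, 0]   # column permutation Q as an index array

# A^O = P A Q.
A_O = [[A[p[i]][q[j]] for j in range(n)] for i in range(n)]

x = [1.0, 2.0, 3.0]   # known solution, used only to build a consistent b
b = matvec(A, x)

b_prime = [b[p[i]] for i in range(n)]   # b' = P b, one pass over n entries
x_prime = [x[q[j]] for j in range(n)]   # x' = Q^{-1} x

lhs = matvec(A_O, x_prime)              # should equal b'

# x = Q x': scatter x' back to the original positions, again in O(n).
x_back = [0.0] * n
for j in range(n):
    x_back[q[j]] = x_prime[j]
```

In an actual solver `x_prime` would come out of the forward/backward substitutions on $L^O$ and $U^O$; here it is derived from the known `x` only to keep the check self-contained.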

2.3 Implementing LU decomposition on sparse matrices

For most applications of interest, the matrix $A$ and its LU factors $L$, $U$ are sparse. They are thus typically represented using adjacency lists. Figure 4 shows the data structures for representing the matrix and its factors. The decomposition process consists of two phases [9], namely, (1) symbolic decomposition (SD-phase) and (2) numerical decomposition (ND-phase). The purpose of the SD-phase is to determine the locations of all possible fill-ins so that the data structures for representing the LU factors (see Figures 4(b) and 4(c)) can be efficiently created. In the ND-phase, the actual values of the entries are computed.

[Figure 3: Illustration of LU decomposition, sparsity pattern, fill-ins, reordering, and matrix inverse. Panels: (a) $sp(A)$, (b) $A$, (c) $L$ and $U$, (d) $sp(\hat{A})$, (e) $sp(A^O)$, (f) $A^O$, (g) $L^O$ and $U^O$, (h) $sp(\hat{A}^O)$, (i) $A^{-1}$, (j) $P$ and $Q$.]

[Figure 4: Data structures for storing a matrix and its LU factors. Panels: (a) $A$, (b) $L$, (c) $U$.]
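A possible shape for such adjacency lists, and for the hand-off between the two phases, is sketched below. The row-wise layout, the function names, and the example pattern are assumptions for illustration; the exact layout of Figure 4 may differ. The point is that once every position that can become non-zero is known, the numerical phase only overwrites values in existing slots and never inserts or deletes list nodes.

```python
def to_adjacency_lists(dense):
    """Row-wise lists of (column, value) pairs for the non-zero entries."""
    return [[(j, v) for j, v in enumerate(row) if v != 0.0] for row in dense]


def allocate_from_pattern(pattern, n):
    """Create one zero-valued slot [column, value] per position in `pattern`
    (the kind of structure a symbolic phase could prepare in advance)."""
    rows = [[] for _ in range(n)]
    for i, j in sorted(pattern):
        rows[i].append([j, 0.0])
    return rows


def set_value(rows, i, j, value):
    """Overwrite an existing slot; the list structure itself is untouched."""
    for slot in rows[i]:
        if slot[0] == j:
            slot[1] = value
            return
    raise KeyError((i, j))


A = [[1.0, -0.85, 0.0],
     [0.0, 1.0, -0.43],
     [-0.28, 0.0, 1.0]]
lists_A = to_adjacency_lists(A)

# Hypothetical symbolic pattern: the positions of A plus one extra position (2, 1).
pattern = {(i, j) for i, row in enumerate(lists_A) for j, _ in row} | {(2, 1)}
factor_slots = allocate_from_pattern(pattern, 3)
set_value(factor_slots, 2, 1, -0.238)   # numerical phase fills a pre-allocated slot
```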

More specifically, in the SD-phase, we determine a fill-in pattern, $fp(A)$ [26], given by

$$fp(A) = \{(u, v) \notin sp(A) \mid \exists\, u_1, \ldots, u_k \ \text{s.t.}\ (1)\ k \geq 1,\ (2)\ u_i < \min\{u, v\}\ \forall\, 1 \leq i \leq k,\ (3)\ (u, u_1), (u_i, u_{i+1}), (u_k, v) \in sp(A)\ \forall\, 1 \leq i < k\}. \qquad (2)$$

In words, w.r.t. the graph from which the matrix $A$ is derived, the node pair $(u, v)$ is in $fp(A)$ if there is a path of length 2 or longer from $u$ to $v$ such that every intermediate node visited along this path has an index smaller than those of both $u$ and $v$. We define the symbolic sparsity pattern, $\widetilde{sp}(A)$, of a matrix $A$ as the union of $A$’s sparsity pattern and fill-in pattern, i.e.,

$$\widetilde{sp}(A) = sp(A) \cup fp(A). \qquad (3)$$

It can be shown that $fp(A)$, as defined in Eq. 2, covers all fill-ins’ locations and so $\widetilde{sp}(A)$ covers all the locations in $sp(\hat{A})$ (Figure 3(d)), i.e., $sp(\hat{A}) \subseteq \widetilde{sp}(A)$. Hence, by determining $\widetilde{sp}(A)$ in the SD-phase, we get to cover all non-zero locations of the LU factors. So, the data structures for storing the LU factors can be prepared before the numerical decomposition. Note that our discussion of the fill-in pattern and the symbolic sparsity pattern is orthogonal to whether reordering is done. In other words, if an ordering $O$ is first applied to the matrix $A$ before it is decomposed, then the fill-in pattern and the symbolic sparsity pattern are defined on the matrix $A^O$, giving $fp(A^O)$ and $\widetilde{sp}(A^O)$.
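To make the symbolic phase concrete, the sketch below computes a fill-in pattern in two ways on small hypothetical patterns: by simulating elimination on positions only, and by applying the path condition of Eq. 2 directly with a breadth-first search restricted to intermediate nodes below $\min\{u, v\}$. Indices are 0-based, patterns are Python sets of `(row, column)` pairs, and all names are illustrative; this is a naive dense-loop version, not the efficient procedure a sparse solver would use.

```python
from collections import deque


def fill_in_by_elimination(sp, n):
    """Pattern-only elimination: pivot k links every row i > k that holds (i, k)
    with every column j > k that holds (k, j)."""
    pattern = set(sp)
    for k in range(n):
        rows = [i for i in range(k + 1, n) if (i, k) in pattern]
        cols = [j for j in range(k + 1, n) if (k, j) in pattern]
        pattern.update((i, j) for i in rows for j in cols)
    return pattern - set(sp)


def fill_in_by_paths(sp, n):
    """Eq. 2 applied literally: (u, v) is a fill-in if some path u -> u_1 -> ... -> v
    exists whose intermediate nodes are all smaller than min(u, v)."""
    succ = [[j for j in range(n) if (i, j) in sp and j != i] for i in range(n)]
    fp = set()
    for u in range(n):
        for v in range(n):
            if (u, v) in sp:
                continue
            limit = min(u, v)
            frontier = deque(w for w in succ[u] if w < limit)
            seen = set(frontier)
            while frontier:
                w = frontier.popleft()
                if v in succ[w]:
                    fp.add((u, v))
                    break
                for z in succ[w]:
                    if z < limit and z not in seen:
                        seen.add(z)
                        frontier.append(z)
    return fp


n = 4
diagonal = {(i, i) for i in range(n)}
# Hypothetical patterns: a hub node placed first, and the same hub placed last.
hub_first = diagonal | {(0, j) for j in range(1, n)} | {(j, 0) for j in range(1, n)}
hub_last = diagonal | {(3, j) for j in range(3)} | {(j, 3) for j in range(3)}

fp_first = fill_in_by_elimination(hub_first, n)
fp_last = fill_in_by_elimination(hub_last, n)
symbolic_first = hub_first | fp_first   # Eq. 3: sp(A) united with fp(A)
```

On these examples both routines agree, and the symbolic pattern of the hub-first ordering covers all 16 positions while the hub-last ordering adds none.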

3. PROBLEM DEFINITION

As we have discussed in the introduction, reordering and incremental decomposition are two techniques we can apply in decomposing the matrices in an EMS. Different orderings $O_i$, when applied to a matrix $A_i$, result in different symbolic sparsity patterns $\widetilde{sp}(A_i^{O_i})$. Note that the larger $\widetilde{sp}(A_i^{O_i})$ is, the larger the data structure for storing the LU factors is (see Section 2.3), and the longer it takes to perform the decomposition and to solve the linear system $A_i x = b$. Therefore, it is important that a good ordering $O_i$ for each matrix $A_i$ be found, such that the size of $\widetilde{sp}(A_i^{O_i})$ is small.

One of the best reordering strategies is given by Markowitz. For any matrix $A$, let $O^*(A)$ be the Markowitz order of $A$, and let $A^*$ be $A$ reordered with $O^*(A)$ (i.e., $A^* = A^{O^*(A)}$).

Ideally, each $A_i$ in an EMS should be reordered into “its best form” $A_i^*$ before it is decomposed. There are, however, two difficulties. First, determining the Markowitz order of a matrix is generally as expensive as doing a Gaussian Elimination [9]. So, finding the Markowitz order for every matrix in an EMS is very expensive. Second, to apply an incremental LU decomposition algorithm on two successive matrices $A_i$ and $A_{i+1}$, if we apply an ordering on $A_i$, the same ordering has to be applied to $A_{i+1}$ as well. However, $O^*(A_i)$ and $O^*(A_{i+1})$ could be different. As a result, an algorithm for decomposing the matrices in an EMS has to be selective in determining what orderings are applied to the matrices, and which matrices in the sequence should share the same ordering so that efficient incremental decomposition can be performed on them. With this discussion, we are ready to formally define the problem of LU Decomposition over an Evolving Matrix sequence (LUDEM).

Definition 3 (The LUDEM Problem). Given an EMS $M = \{A_1, A_2, \ldots, A_T\}$, where each $A_i$ is an $n \times n$ sparse matrix, determine, for $1 \leq i \leq T$, an ordering $O_i$ for $A_i$ and compute the LU factors of $A_i^{O_i}$.

We can evaluate an algorithm for solving the LUDEM problem by two metrics: (1) how fast it executes and (2) how good the orderings $O_i$ are. Since Markowitz is a known method for generating very good orderings, we use the Markowitz order $O^*(A_i)$ as a quality reference, and define the quality-loss of an ordering as follows.

Definition 4 (Quality-loss of an ordering). Given an ordering $O$ of a matrix $A$, the quality-loss of $O$ on $A$, denoted by $ql(O, A)$, is given by,

$$ql(O, A) = \frac{|\widetilde{sp}(A^O)| - |\widetilde{sp}(A^*)|}{|\widetilde{sp}(A^*)|}. \qquad (4)$$

That is, we compare the size of the symbolic sparsity pattern of $A^O$ against that of the Markowitz ordered $A^*$. Note that a smaller $ql(O, A)$ implies a higher ordering quality.
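Eq. 4 amounts to a relative size difference, as the small sketch below shows. The function name and the two pattern sizes are hypothetical; the reference size merely stands in for $|\widetilde{sp}(A^*)|$ and is not derived from an actual Markowitz ordering.

```python
def quality_loss(symbolic_size, reference_size):
    """Eq. 4: relative excess of |sp~(A^O)| over the reference |sp~(A*)|."""
    return (symbolic_size - reference_size) / reference_size


# Hypothetical sizes: an ordering whose symbolic pattern has 16 positions,
# compared against a reference pattern with 10 positions.
ql_bad = quality_loss(16, 10)    # 60% more positions than the reference
ql_same = quality_loss(10, 10)   # an ordering as good as the reference
```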

In general, $\widetilde{sp}(A^*)$ cannot be determined without determining the Markowitz ordering and decomposing $A^*$. However, for the special case in which $A$ is a symmetric matrix, it has been shown that its Markowitz ordering and $\widetilde{sp}(A^*)$ can be determined very efficiently without physically decomposing the matrix [1, 13]. In this case, an algorithm for solving the LUDEM problem can very efficiently evaluate (using Equation 4) the quality-loss of the orderings it produces. In particular, the algorithm can perform quality control on its own output. So, for the special case of symmetric matrices, we extend the LUDEM problem to one that has an additional quality constraint. We call this problem LUDEM-QC.

Definition 5 (The LUDEM-QC Problem). Given an EMS $M = \{A_1, A_2, \ldots, A_T\}$, where each $A_i$ is an $n \times n$ sparse symmetric matrix, and a quality requirement $\beta \geq 0$, determine, for $1 \leq i \leq T$, an ordering $O_i$ for $A_i$ such that $ql(O_i, A_i) \leq \beta$, and compute the LU factors of $A_i^{O_i}$.

4. ALGORITHMS FOR LUDEM

In this section we describe algorithms for solving the LUDEM problem.

[Brute Force (BF)] The brute force method (BF) determines the Markowitz ordering $O^*(A_i)$ of each matrix $A_i$, reorders $A_i$ to the Markowitz ordered $A_i^*$ and then decomposes $A_i^*$. Under BF, $O_i = O^*(A_i)$. BF is generally slow because it takes much time to determine the orderings of all matrices and it does not employ a fast incremental decomposition algorithm. However, BF achieves the best ordering quality because all matrices are Markowitz ordered. We will use BF as the baseline against which the performances of other algorithms are measured. In particular, we evaluate the ordering quality of other algorithms against Markowitz orderings (see Definition 4). Also, the execution times of other algorithms are expressed as speedup factors over BF.

[Straightly Incremental (INC)] The INC algorithm first determines the Markowitz ordering of $A_1$ and applies the ordering to every matrix in the EMS to obtain $A_i^{O^*(A_1)}$ for all $1 \leq i \leq T$. INC then decomposes $A_1^{O^*(A_1)}$ followed by applying Bennett’s algorithm to incrementally decompose the successive matrices $A_2^{O^*(A_1)}, \ldots, A_T^{O^*(A_1)}$. Hence, under INC, $O_i = O^*(A_1)$. INC computes only one Markowitz ordering and performs only one full decomposition, in addition to executing Bennett’s algorithm $T - 1$ times.

A problem with INC is that the ordering quality deteriorates as we move from $A_1$ to $A_T$ because the matrices deviate from $A_1$ progressively. As we have explained, a bad ordering makes decomposition (full or incremental) slower because of a much larger number of fill-ins in the LU factors. However, to apply Bennett’s algorithm, the matrices have to share the same ordering. Our next two algorithms attempt to strike a balance between ordering quality and the applicability of incremental decomposition. The idea is to partition the EMS into clusters such that matrices within the same cluster are sufficiently similar. With highly similar cluster members, a single ordering can be shared by all members of a cluster and yet the ordering is of good enough quality. We call our next two algorithms cluster-based algorithms. Before their descriptions, we first give the details of the clustering procedure.

In order to group matrices in an EMS into clusters, we need to define a similarity measure. We measure two matrices' similarity by comparing the structures of their underlying graphs, which are conveniently captured by the sparsity patterns of the matrices (see Figure 3(a)). Specifically, we use a normalized matrix edit similarity (mes) measure that is based on the symmetric difference of the matrices' sparsity patterns:

Definition 6 (Matrix edit similarity). Given two matrices $A_a$ and $A_b$,

$$mes(A_a, A_b) := \frac{2\,|sp(A_a) \cap sp(A_b)|}{|sp(A_a)| + |sp(A_b)|}. \qquad (5)$$
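As an illustration of Equation 5, the following minimal Python sketch computes mes on sparsity patterns stored as sets of (row, column) pairs. The helper names `sparsity_pattern` and `mes`, the dense list-of-lists input, and the convention that two empty patterns have similarity 1 are assumptions made for this example only; they are not taken from the paper's implementation.

```python
def sparsity_pattern(matrix):
    """Return the set of (row, col) index pairs that hold non-zero entries."""
    return {(i, j)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)
            if value != 0}


def mes(sp_a, sp_b):
    """Matrix edit similarity (Equation 5) of two sparsity patterns."""
    total = len(sp_a) + len(sp_b)
    if total == 0:
        return 1.0  # assumption: two empty patterns count as identical
    return 2 * len(sp_a & sp_b) / total


# Two 3 x 3 matrices that differ in a single entry, (1, 2).
A_a = [[1, 1, 0],
       [0, 1, 0],
       [0, 0, 1]]
A_b = [[1, 1, 0],
       [0, 1, 1],
       [0, 0, 1]]
similarity = mes(sparsity_pattern(A_a), sparsity_pattern(A_b))
```

Here the patterns have 4 and 5 entries and share 4 of them, so the similarity is 2·4/(4 + 5) = 8/9.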

Let $C = \{A_1, \ldots, A_t\}$ be a cluster of t matrices⁴. We derive two bounding matrices $A_\cap$ and $A_\cup$, which are the intersection and union of the matrices in C in terms of their sparsity patterns. Formally,

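Definition 7 ($A_\cap$, $A_\cup$). For all 1 ≤ i, j ≤ n,

$$A_\cap(i, j) := \begin{cases} 1 & \text{if } (i, j) \in \bigcap_{k=1}^{t} sp(A_k), \\ 0 & \text{otherwise;} \end{cases} \qquad A_\cup(i, j) := \begin{cases} 1 & \text{if } (i, j) \in \bigcup_{k=1}^{t} sp(A_k), \\ 0 & \text{otherwise.} \end{cases}$$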

It can be easily seen that,

⁴ W.l.o.g., we assume that the cluster starts with the matrix

Property 1. sp(A∩) ⊆ sp(Ai) ⊆ sp(A∪) ∀1 ≤ i ≤ t.

Hence, $A_\cap$ and $A_\cup$ sandwich the matrices in C. We can thus measure the compactness of the cluster by the similarity between $A_\cap$ and $A_\cup$.

Definition 8 (α-boundedness). A cluster C of matrices is said to be α-bounded if and only if $mes(A_\cap, A_\cup) \ge \alpha$.

Since typically the matrices in an EMS are progressively evolving, we use a simple segmentation strategy to partition the matrices of an EMS into clusters. Specifically, given a user-specified similarity threshold α, we start with an empty cluster $C_1$ and incrementally insert the matrices into $C_1$, starting with $A_1$, then $A_2$, etc., as long as $C_1$ remains α-bounded. If the bounding requirement would be violated by adding one more matrix, we start building the next cluster $C_2$ and repeat the process. We call this α-clustering. Algorithm 1 shows the clustering algorithm.

Algorithm 1: α-clustering.

Input : EMS $M = \{A_1, A_2, \ldots, A_T\}$, similarity threshold α
Output: Clusters $\{C_1, C_2, \ldots, C_j\}$

$j \leftarrow 1$; $C_j \leftarrow \{A_1\}$
for $i \leftarrow 2$ to $T$ do
    Construct $A_\cap$, $A_\cup$ from $C_j \cup \{A_i\}$ based on Definition 7
    if $mes(A_\cap, A_\cup) \ge \alpha$ then
        $C_j \leftarrow C_j \cup \{A_i\}$
    else  // start building the next cluster
        $j \leftarrow j + 1$; $C_j \leftarrow \{A_i\}$
    end
end
return $\{C_1, C_2, \ldots, C_j\}$

Note that a larger α implies that $A_\cap$ and $A_\cup$ of a cluster are more similar, which then implies a tighter bounding requirement. This results in fewer matrices in a cluster and more clusters segmented from an EMS.
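The segmentation strategy of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: sparsity patterns are assumed to be given as sets of (row, column) pairs, the function name `alpha_clustering` is hypothetical, and clusters are returned as lists of indices into the input sequence. Instead of rebuilding $A_\cap$ and $A_\cup$ from scratch for each candidate, the sketch maintains them incrementally, which yields the same sets.

```python
def mes(sp_a, sp_b):
    """Matrix edit similarity (Equation 5) of two sparsity patterns."""
    total = len(sp_a) + len(sp_b)
    return 1.0 if total == 0 else 2 * len(sp_a & sp_b) / total


def alpha_clustering(patterns, alpha):
    """Segment a sequence of sparsity patterns into alpha-bounded clusters.

    Returns a list of clusters, each a list of indices into `patterns`.
    """
    if not patterns:
        return []
    clusters = [[0]]
    inter = set(patterns[0])   # sparsity pattern of A_cap of the current cluster
    union = set(patterns[0])   # sparsity pattern of A_cup of the current cluster
    for i in range(1, len(patterns)):
        new_inter = inter & patterns[i]
        new_union = union | patterns[i]
        if mes(new_inter, new_union) >= alpha:
            clusters[-1].append(i)            # cluster stays alpha-bounded
            inter, union = new_inter, new_union
        else:                                 # start building the next cluster
            clusters.append([i])
            inter, union = set(patterns[i]), set(patterns[i])
    return clusters


# Four 3 x 3 patterns: the first three grow by one entry each,
# the fourth drops the off-diagonal entries and adds a new one.
p0 = {(0, 0), (1, 1), (2, 2), (0, 1)}
p1 = p0 | {(1, 2)}
p2 = p1 | {(2, 0)}
p3 = {(0, 0), (1, 1), (2, 2), (2, 1)}
patterns = [p0, p1, p2, p3]
clusters = alpha_clustering(patterns, alpha=0.75)
```

With α = 0.75 the first three patterns share one cluster (their bounding matrices have similarity 8/10), while raising α to 0.85 splits the third pattern off, in line with the observation that a larger α produces more and smaller clusters.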

[Cluster-based Incremental (CINC)] Our next algorithm CINC applies INC on each cluster independently. More specifically, for each cluster C, CINC determines the Markowitz ordering of the first matrix in C and applies that ordering to all the matrices in C. After that, it decomposes the first matrix of C followed by applying Bennett's algorithm to incrementally decompose the other matrices in the cluster. Algorithm 2 shows the pseudo code of CINC.

Algorithm 2: CINC on one cluster.

Input : A cluster $C = \{A_1, A_2, \ldots, A_t\}$
Output: Ordering and LU factors of $A_i$, for 1 ≤ i ≤ t

$O_1 \leftarrow O^*(A_1)$
$(L_1^{O_1}, U_1^{O_1}) \leftarrow$ LU decomposition on $A_1^{O_1}$
for $i \leftarrow 2$ to $t$ do
    $O_i \leftarrow O_1$
    $\Delta A \leftarrow A_i^{O_1} - A_{i-1}^{O_1}$
    $(L_i^{O_i}, U_i^{O_i}) \leftarrow$ Bennett($A_{i-1}^{O_1}$, $\Delta A$, $L_{i-1}^{O_1}$, $U_{i-1}^{O_1}$)
end
return $\{O_1, \ldots, O_t\}$ and $\{(L_1^{O_1}, U_1^{O_1}), \ldots, (L_t^{O_t}, U_t^{O_t})\}$

[Fast Cluster-based LU Decomposition (CLUDE)] Given two consecutive matrices $A_i$ and $A_{i+1}$ in a cluster C, their symbolic sparsity patterns are typically different. The adjacency-list structures for storing their LU factors are therefore different (see Section 2.3). As we apply Bennett's algorithm to obtain the LU factors of matrix $A_{i+1}$ from those of $A_i$, the list structures of $A_{i+1}$ are dynamically created based on those of $A_i$. We have profiled the execution of Bennett's algorithm. Interestingly, about 70% of its execution time is spent on constructing the list structures of $A_{i+1}$, which involves frequent scanning and restructuring of various adjacency lists. Our next algorithm, CLUDE, takes advantage of the matrix cluster to determine a universal symbolic sparsity pattern (USSP). As we will show later, a USSP of a cluster C covers all the symbolic sparsity patterns of the matrices in C. We can thus build a universal adjacency-list structure that is shared to store the LU factors of all matrices in C. Since this universal structure is static, we avoid the expensive dynamic construction of each individual matrix's list structure, leading to substantial savings in execution time.

Before we describe the details of CLUDE, let us first explain the idea of USSP and prove some of its properties.

Definition 9 (Universal symbolic sparsity pattern). Consider a cluster C. A set of matrix indices, S, is a USSP of C iff $\widetilde{sp}(A) \subseteq S$, ∀A ∈ C.

Recall that for any matrix A, the data structures for storing A's LU factors are determined by its symbolic sparsity pattern $\widetilde{sp}(A)$ (see Figure 4). In particular, a node is created in an adjacency list for each matrix index that is present in $\widetilde{sp}(A)$. We can likewise derive the data structures from a USSP S of a cluster. Since $\widetilde{sp}(A) \subseteq S$, ∀A ∈ C, the data structures for A are substructures of those derived from S. Hence the structures for S can act as static structures with which the LU factors of the matrices in C are computed. In the following, we show how to obtain a USSP for a cluster based on $A_\cup$ (see Definition 7). First, we prove a monotonicity property given by the following lemma.

Lemma 1. Given two matrices $A_a$ and $A_b$,

$$(sp(A_a) \subseteq sp(A_b)) \Rightarrow (\widetilde{sp}(A_a) \subseteq \widetilde{sp}(A_b)).$$

Proof. Assume $sp(A_a) \subseteq sp(A_b)$ and $(u, v) \in \widetilde{sp}(A_a)$; it suffices to show that (u, v) is also in $\widetilde{sp}(A_b)$. First, from Equation 3, $sp(A_b) \subseteq \widetilde{sp}(A_b)$. Hence, if $(u, v) \in sp(A_b)$, then $(u, v) \in \widetilde{sp}(A_b)$. The only case left to be considered is $(u, v) \notin sp(A_b)$. Since $sp(A_a) \subseteq sp(A_b)$, we have $(u, v) \notin sp(A_a)$. Now, $((u, v) \in \widetilde{sp}(A_a)) \wedge ((u, v) \notin sp(A_a)) \Rightarrow (u, v) \in fp(A_a)$ (Equation 3), i.e., $\exists u_1, \ldots, u_k$ such that the three conditions listed in Equation 2 are satisfied. In particular, condition (3) gives $(u, u_1), (u_i, u_{i+1}), (u_k, v) \in sp(A_a)$ ∀1 ≤ i < k. Since $sp(A_a) \subseteq sp(A_b)$, we have $(u, u_1), (u_i, u_{i+1}), (u_k, v) \in sp(A_b)$ ∀1 ≤ i < k. And thus, $(u, v) \in fp(A_b)$. Hence, $(u, v) \in \widetilde{sp}(A_b)$ (Equation 3).

Theorem 1. Given a cluster $C = \{A_1, \ldots, A_t\}$, let $A_\cup$ be the matrix as defined in Definition 7. Then $\widetilde{sp}(A_\cup)$ is a USSP of C.

Proof. ∀$A_i$ ∈ C, we have $sp(A_i) \subseteq sp(A_\cup)$ (by Property 1), which implies $\widetilde{sp}(A_i) \subseteq \widetilde{sp}(A_\cup)$ (by Lemma 1). Hence, by Definition 9, $\widetilde{sp}(A_\cup)$ is a USSP of C.
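Lemma 1 and Theorem 1 can be checked on a toy cluster with a short Python sketch. The function `symbolic_pattern` below is an assumption of this example: it computes the symbolic sparsity pattern by simulating Gaussian elimination on index pairs only, in the natural order 0..n-1, without pivoting and ignoring numerical cancellation. It is meant to play the role of $sp \cup fp$ in Equation 3 and is not the paper's data structure.

```python
def symbolic_pattern(sp, n):
    """Symbolic sparsity pattern: sp plus the fill-ins of LU elimination.

    Assumes elimination in the natural order 0..n-1, no pivoting and no
    numerical cancellation.
    """
    pattern = set(sp)
    for k in range(n):
        rows = [i for i in range(k + 1, n) if (i, k) in pattern]
        cols = [j for j in range(k + 1, n) if (k, j) in pattern]
        for i in rows:
            for j in cols:
                pattern.add((i, j))  # eliminating k couples row i with column j
    return pattern


n = 4
diag = {(i, i) for i in range(n)}
sp1 = diag | {(1, 0), (0, 2)}            # pattern of a first matrix
sp2 = diag | {(1, 0), (0, 3), (2, 1)}    # pattern of a second matrix
cluster = [sp1, sp2]

sp_union = set().union(*cluster)          # sparsity pattern of A_cup
ussp = symbolic_pattern(sp_union, n)      # candidate USSP (Theorem 1)

# Every member's symbolic pattern is covered by the USSP.
covered = all(symbolic_pattern(sp, n) <= ussp for sp in cluster)
```

In this example the first matrix has the single fill-in (1, 2), the second has the fill-ins (1, 3) and (2, 3), and the pattern derived from $A_\cup$ contains all three, so a structure built once from it has a slot for every entry either matrix can produce.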

$\widetilde{sp}(A_\cup)$ can be obtained by performing symbolic [...] structure is derived from $\widetilde{sp}(A_\cup)$ on which Bennett's algorithm operates. To reduce the size of the structure and thus decomposition time, we precede the above steps by finding the Markowitz ordering of $A_\cup$ and applying the ordering to $A_\cup$ as well as all matrices in the cluster. Algorithm 3 shows the pseudo code of CLUDE.

Algorithm 3: CLUDE on one cluster.

Input : A cluster $C = \{A_1, A_2, \ldots, A_t\}$
Output: Ordering and LU factors of $A_i$, for 1 ≤ i ≤ t

Construct $A_\cup$ from $\bigcup_{i=1}^{t} sp(A_i)$ based on Definition 7
$O_\cup \leftarrow O^*(A_\cup)$
Apply symbolic decomposition on $A_\cup^{O_\cup}$ to obtain $\widetilde{sp}(A_\cup^{O_\cup})$
Create static structure from $\widetilde{sp}(A_\cup^{O_\cup})$ for LU factors
$O_1 \leftarrow O_\cup$
$(L_1^{O_1}, U_1^{O_1}) \leftarrow$ LU decomposition on $A_1^{O_1}$
for $i \leftarrow 2$ to $t$ do
    $O_i \leftarrow O_\cup$
    $\Delta A \leftarrow A_i^{O_\cup} - A_{i-1}^{O_\cup}$
    $(L_i^{O_i}, U_i^{O_i}) \leftarrow$ Bennett($A_{i-1}^{O_\cup}$, $\Delta A$, $L_{i-1}^{O_\cup}$, $U_{i-1}^{O_\cup}$)
end
return $\{O_1, \ldots, O_t\}$ and $\{(L_1^{O_1}, U_1^{O_1}), \ldots, (L_t^{O_t}, U_t^{O_t})\}$
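The role of the static structure in Algorithm 3 can be illustrated with a simplified Python sketch. It deliberately departs from the algorithm in two ways, both assumptions of this example: the Markowitz reordering step is skipped (the natural order is used), and each matrix is refactorized from scratch instead of being updated with Bennett's algorithm. What the sketch does show is that one fixed set of slots, derived from the symbolic pattern of $A_\cup$, suffices to hold the LU factors of every matrix in the cluster. All function names are hypothetical.

```python
def symbolic_pattern(sp, n):
    """sp plus LU fill-ins (natural order, no pivoting, no cancellation)."""
    pattern = set(sp)
    for k in range(n):
        rows = [i for i in range(k + 1, n) if (i, k) in pattern]
        cols = [j for j in range(k + 1, n) if (k, j) in pattern]
        pattern.update((i, j) for i in rows for j in cols)
    return pattern


def factor_into(slots, a):
    """LU-factor dense matrix `a`, touching only the fixed index set `slots`.

    Returns a dict holding U on and above the diagonal and the strictly
    lower part of the unit lower triangular L below it.
    """
    n = len(a)
    f = {(i, j): float(a[i][j]) for (i, j) in slots}
    for k in range(n):
        for i in range(k + 1, n):
            if (i, k) not in f:
                continue
            f[(i, k)] /= f[(k, k)]
            for j in range(k + 1, n):
                if (k, j) in f:
                    # (i, j) is always a slot: the pattern is closed under fill-in
                    f[(i, j)] -= f[(i, k)] * f[(k, j)]
    return f


def multiply_back(f, n):
    """Recompute L * U from the combined factor dict (used only for checking)."""
    def lower(i, k):
        return 1.0 if i == k else (f.get((i, k), 0.0) if k < i else 0.0)

    def upper(k, j):
        return f.get((k, j), 0.0) if k <= j else 0.0

    return [[sum(lower(i, k) * upper(k, j) for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 4
cluster = [
    [[4, 0, 1, 0], [1, 4, 0, 0], [0, 0, 4, 0], [0, 0, 0, 4]],
    [[4, 0, 0, 1], [1, 4, 0, 0], [0, 1, 4, 0], [0, 0, 0, 4]],
]
sp_union = {(i, j) for a in cluster
            for i in range(n) for j in range(n) if a[i][j] != 0}
slots = symbolic_pattern(sp_union, n)      # USSP of the cluster
factors = [factor_into(slots, a) for a in cluster]
```

Because the key set of each factor dict never changes, no per-matrix list restructuring is needed; an incremental update routine could overwrite the values in place.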

5. ALGORITHMS FOR LUDEM-QC

We extend our cluster-based algorithms CINC and CLUDE to solve the LUDEM-QC problem, for which an additional quality constraint $ql(O_i, A_i) \le \beta$ has to be enforced. The key to enforcing the quality constraint is to control the size of the cluster. The smaller the cluster is, the higher the chance that the orderings produced by CINC or CLUDE satisfy the quality constraint. In the extreme case, when each cluster contains just one matrix, the ordering given by CINC or CLUDE for the (lone) matrix in the cluster is just Markowitz. Hence, $ql(O_i, A_i) = 0$ and so the constraint is trivially satisfied. In the following, we discuss how the clustering algorithm should be modified under CINC and CLUDE so that the quality constraint is enforced. We call this clustering β-clustering. In the following discussion, we describe how to construct the first cluster of the EMS. Subsequent clusters are constructed similarly.

[β-clustering CINC version] Given a cluster $C = \{A_1, \ldots, A_t\}$, CINC uses the Markowitz ordering $O_1$ of the first matrix in the cluster as the ordering of all the matrices in the cluster. As we attempt to expand the current cluster by adding a matrix $A_{t+1}$ from the EMS, we evaluate the quality-loss $ql(O_1, A_{t+1})$. If the quality constraint is violated, we start constructing a new cluster. Essentially, we replace the α-boundedness condition in α-clustering by the β quality-constraint. Algorithm 4 shows the clustering algorithm.

Algorithm 4: β-clustering (CINC version).

Input : EMS $M = \{A_1, A_2, \ldots, A_T\}$, quality requirement β
Output: Clusters $\{C_1, C_2, \ldots, C_j\}$

$j \leftarrow 1$; $C_j \leftarrow \{A_1\}$
$O \leftarrow O^*(A_1)$
for $i \leftarrow 2$ to $T$ do
    if $|\widetilde{sp}(A_i^{O})| - |\widetilde{sp}(A_i^*)| \le \beta \cdot |\widetilde{sp}(A_i^*)|$ then
        $C_j \leftarrow C_j \cup \{A_i\}$
    else  // start building the next cluster
        $j \leftarrow j + 1$; $C_j \leftarrow \{A_i\}$
        $O \leftarrow O^*(A_i)$
    end
end
return $\{C_1, C_2, \ldots, C_j\}$

[β-clustering CLUDE version] CLUDE uses the Markowitz ordering $O_\cup$ of $A_\cup$ as the ordering of the matrices in the cluster. Checking the quality constraint as we attempt to add $A_{t+1}$ to the cluster is trickier than in CINC's case. This is because adding $A_{t+1}$ to C changes $A_\cup$ and thus $O_\cup$. Hence, the quality constraints on all the t matrices that are already in the cluster have to be re-evaluated. To speed up constraint checking, we take a shortcut. Note that the constraint on $A_i \in C$ is equivalent to $\varphi_i$: $\{|\widetilde{sp}(A_i^{O_\cup})| - |\widetilde{sp}(A_i^*)| \le \beta \cdot |\widetilde{sp}(A_i^*)|\}$. Also, from Property 1 and Lemma 1, we have $|\widetilde{sp}(A_i^{O_\cup})| \le |\widetilde{sp}(A_\cup^{O_\cup})|$. Therefore the constraint $\varphi_\cup$: $\{|\widetilde{sp}(A_\cup^{O_\cup})| - |\widetilde{sp}(A_i^*)| \le \beta \cdot |\widetilde{sp}(A_i^*)|\}$ implies $\varphi_i$. Hence, as we attempt to add $A_{t+1}$ to the current cluster, we only need to compute one $|\widetilde{sp}(A_\cup^{O_\cup})|$ instead of t $|\widetilde{sp}(A_i^{O_\cup})|$'s. Algorithm 5 shows this clustering algorithm.

Algorithm 5: β-clustering (CLUDE version).

Input : EMS $M = \{A_1, A_2, \ldots, A_T\}$, quality requirement β
Output: Clusters $\{C_1, C_2, \ldots, C_j\}$

$j \leftarrow 1$; $C_j \leftarrow \{A_1\}$
for $i \leftarrow 2$ to $T$ do
    Construct $A_\cup$ from $C_j \cup \{A_i\}$ based on Definition 7
    $O_\cup \leftarrow O^*(A_\cup)$
    if $\forall A_l \in C_j \cup \{A_i\}$, $|\widetilde{sp}(A_\cup^{O_\cup})| - |\widetilde{sp}(A_l^*)| \le \beta \cdot |\widetilde{sp}(A_l^*)|$ then
        $C_j \leftarrow C_j \cup \{A_i\}$
    else  // start building the next cluster
        $j \leftarrow j + 1$; $C_j \leftarrow \{A_i\}$
    end
end
return $\{C_1, C_2, \ldots, C_j\}$
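A rough Python sketch of the CLUDE version of β-clustering is given below for symmetric patterns. Several parts are assumptions of this example rather than the paper's method: a simple greedy minimum-degree ordering (`min_degree_order`) stands in for the Markowitz ordering $O^*$, and the symbolic size is obtained by plain symbolic elimination (`symbolic_size`) rather than by the faster techniques of [1, 13]. The test inside the loop is the shortcut constraint $\varphi_\cup$.

```python
def min_degree_order(sp, n):
    """Greedy minimum-degree ordering of a symmetric pattern (stand-in for O*)."""
    adj = {v: set() for v in range(n)}
    for i, j in sp:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    order = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))   # ties broken by index
        nbrs = adj.pop(v)
        for u in nbrs:               # eliminating v turns its neighbours into a clique
            adj[u].discard(v)
            adj[u] |= nbrs - {u}
        order.append(v)
    return order


def symbolic_size(sp, order):
    """Size of the symbolic pattern when the matrix is eliminated in `order`."""
    pos = {v: r for r, v in enumerate(order)}
    pattern = {(pos[i], pos[j]) for (i, j) in sp}
    n = len(order)
    for k in range(n):
        rows = [i for i in range(k + 1, n) if (i, k) in pattern]
        cols = [j for j in range(k + 1, n) if (k, j) in pattern]
        pattern.update((i, j) for i in rows for j in cols)
    return len(pattern)


def beta_clustering_clude(patterns, n, beta):
    """Cluster indices of `patterns` so that the shortcut constraint holds."""
    best = [symbolic_size(p, min_degree_order(p, n)) for p in patterns]
    clusters, union = [[0]], set(patterns[0])
    for i in range(1, len(patterns)):
        new_union = union | patterns[i]
        size_u = symbolic_size(new_union, min_degree_order(new_union, n))
        members = clusters[-1] + [i]
        if all(size_u - best[l] <= beta * best[l] for l in members):
            clusters[-1].append(i)
            union = new_union
        else:                        # start building the next cluster
            clusters.append([i])
            union = set(patterns[i])
    return clusters


def sym(n, edges):
    """Symmetric pattern with a full diagonal and the given undirected edges."""
    return ({(i, i) for i in range(n)}
            | {(a, b) for a, b in edges} | {(b, a) for a, b in edges})


star0 = [(0, j) for j in range(1, 5)]        # star centred at vertex 0
star1 = [(1, j) for j in (0, 2, 3, 4)]       # star centred at vertex 1
patterns = [sym(5, star0), sym(5, star0 + [(1, 2)]), sym(5, star1)]
loose = beta_clustering_clude(patterns, 5, beta=0.5)
tight = beta_clustering_clude(patterns, 5, beta=0.2)
```

On this toy sequence a loose requirement (β = 0.5) keeps all three patterns in one cluster, while β = 0.2 starts a new cluster at the third pattern, which mirrors the trade-off between cluster size and ordering quality described above.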

6. EXPERIMENTAL EVALUATION

We conduct experiments to evaluate the algorithms INC, CINC, and CLUDE. We execute BF to obtain baseline performance numbers against which the other algorithms are evaluated. In particular, we execute BF to determine the Markowitz ordering of each matrix in the EMS to measure the quality-loss of the orderings given by other algorithms. Also, the execution times of the other algorithms are expressed as speedup factors over BF's execution time. All algorithms are implemented in Java and the experiments are conducted on a Linux machine with a 3.40GHz Octo-Core Intel(R) processor and 16GB of memory.

We conduct experiments on two EMS's that are derived from two real datasets⁵ and also on a synthetic EMS. Here we briefly describe the datasets.

[Wiki] We collected a set of 1000 daily snapshots of 20,000 Wikipedia pages and their hyperlinks. The numbers of hyperlinks in the first and the last snapshots are 56,181 and 138,072, respectively. The average (mes) similarity (Eq. 5) between successive matrices derived from the snapshots is 99.88%.

⁵ http://socialnetworks.mpi-sws.org/,

[DBLP] The DBLP dataset consists of 70 years of publications. We extracted all publications in three areas: (1) DB, (2) Vision, (3) Algorithms & Theory. Based on these publications, we constructed a sequence of co-authorship graphs. The snapshot graph of a date is derived from all the papers published before that date⁶. We used the latest 1000 daily snapshots for our experiments. There are 97,931 vertices; the numbers of edges in the first and the last snapshots are 387,960 and 547,164, respectively. The average similarity between successive matrices derived from the snapshots is 99.86%.

[Synthetic] We generated synthetic EGS's from which EMS's are derived. Our EGS generator takes six parameters (their default values are shown in parentheses):

• V (50,000): the number of vertices.

• |EP| (450,000): the number of edges in an "edge pool" EP.

• d (5): the average vertex degree of the first snapshot.

• k (4): the ratio $\Delta E^+ / \Delta E^-$, where $\Delta E^+$ and $\Delta E^-$ are the numbers of edges added to and removed from a snapshot to generate the next snapshot, respectively.

• ∆E (500): $\Delta E^+ + \Delta E^-$.

• T (500): the number of snapshots in the EGS.

To generate an EGS, we first use the BA model [4] to generate a scale-free⁷ base graph G that has V vertices and |EP| edges. All the edges are collected in the edge pool EP. Next, we randomly pick d·V edges from EP to form the edge set E of the first snapshot. Then we repeat the following procedure to generate subsequent snapshot graphs:

1. Randomly remove $\Delta E^- = \Delta E/(k + 1)$ edges from E.

2. Randomly pick $\Delta E^+ = (k \cdot \Delta E)/(k + 1)$ edges from EP − E and add them to E.
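The two-step update above can be sketched in Python as follows. The function name `next_snapshot`, the representation of edges as tuples in sets, and the toy edge pool (a complete graph on 30 vertices instead of a BA-model base graph) are assumptions made for this example. The sketch reads the two steps sequentially, so the candidates for insertion are drawn from EP minus the edge set that remains after the removal; integer division is used, which is exact for the default parameters.

```python
import random


def next_snapshot(edges, edge_pool, delta_e, k, rng):
    """Derive the next snapshot's edge set from the current one."""
    n_remove = delta_e // (k + 1)         # |dE-| = dE / (k + 1)
    n_add = (k * delta_e) // (k + 1)      # |dE+| = k * dE / (k + 1)
    # Step 1: randomly remove n_remove edges from E.
    removed = set(rng.sample(sorted(edges), n_remove))
    remaining = edges - removed
    # Step 2: randomly pick n_add edges from EP - E and add them to E.
    candidates = sorted(edge_pool - remaining)
    added = set(rng.sample(candidates, n_add))
    return remaining | added


rng = random.Random(0)                    # fixed seed for reproducibility
edge_pool = {(u, v) for u in range(30) for v in range(u + 1, 30)}  # toy pool
edges = set(rng.sample(sorted(edge_pool), 100))
snapshot = next_snapshot(edges, edge_pool, delta_e=50, k=4, rng=rng)
```

With ∆E = 50 and k = 4, each step removes 10 edges and adds 40, so the snapshot grows by 30 edges. With the paper's defaults (∆E = 500, k = 4) the same formulas give 100 removals and 400 insertions per snapshot.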

We can prove that the snapshot graphs generated by the above procedure are scale-free. We omit the proof due to space limitation.

6.1 Ordering Quality Analysis

Our first set of experiments evaluates the algorithms in terms of their ordering qualities. Recall that INC finds the Markowitz ordering, $O^*(A_1)$, of $A_1$ and applies that to all matrices $A_i$ in the whole EMS. The ordering quality degrades with i as $A_i$ gradually deviates from $A_1$. Figure 5 shows the quality-loss $ql(O^*(A_1), A_i)$ vs. the matrix index i for the two real datasets. We see that the quality-loss increases with i as explained. Indeed, the ordering quality of INC is quite poor. For Wiki, the average quality-loss (over the 1000 matrices) is about 2. That means if a matrix $A_i$ is ordered by $O^*(A_1)$, on average, the number of "extra" entries in $A_i$'s LU factors is twice the size that $A_i$'s LU factors would have if $A_i$ were Markowitz-ordered! The quality-loss reaches 2.7 for the last snapshot of the EMS.

By grouping similar matrices into a cluster and applying the same ordering only to matrices of the same cluster, CINC and CLUDE give much better ordering qualities. Figure 6 shows the average quality-loss of the orderings given by CINC and CLUDE as the α-clustering similarity threshold varies. A larger α implies a more stringent similarity requirement and thus clusters are more compact. It is thus easier for the same ordering to cover all the matrices in the cluster and still give good ordering quality. This explains why quality-loss drops as α increases. Comparing CINC and CLUDE, CLUDE gives much better ordering qualities. This is because while CINC uses the Markowitz ordering of the first matrix in the cluster, CLUDE uses the Markowitz ordering of $A_\cup$, which covers all matrices in the cluster and thus fits them better. For example, for the Wiki dataset, when α = 0.95, the quality-losses of CINC and CLUDE are 0.53 and 0.13, respectively. Compared with the average value of 2 for INC, the quality-loss of CLUDE is about 15 times smaller than that of INC.

Figure 5: INC: quality-loss vs. matrix index (i). Panels: (a) Wikipedia, (b) DBLP; x-axis: matrix index (i); y-axis: quality-loss.

Figure 6: Average quality-loss vs. similarity threshold α. Panels: (a) Wikipedia, (b) DBLP; curves: CLUDE, CINC.

⁶ For publications that only have publication year, we evenly distribute them to the dates of their corresponding years.

⁷ A graph is scale-free if the distribution of vertices' degrees follows a power law: $P(t) \propto 1/t^{\gamma}$, where P(t) is the probability that a vertex has a degree t, and γ is a constant. Following [4], we set γ = 3.

6.2 Efficiency Analysis

In this section, we compare the algorithms in terms of speed. We express algorithms' efficiency in terms of their speedup factors over BF's execution time. Figure 7 shows the speedups as α varies. Note that INC does not cluster the matrices and so its speedup is shown as straight lines in the graphs. From the figure, we see that among the three algorithms, INC is the slowest while CLUDE is the fastest. This is despite the fact that INC determines only one Markowitz ordering (on $A_1$), performs only one full LU decomposition (on $A_1$) and applies the (supposedly fast) Bennett's algorithm to incrementally LU decompose all the other matrices in the EMS.

INC is slow (only 2.6 times faster than BF for the Wiki dataset) because of its poor ordering quality. As we have explained, the Markowitz ordering of $A_1$ is unfit for most of the other matrices in the EMS. Hence, the LU factors computed by INC are huge. This significantly slows down the incremental decompositions (Bennett's). The speedups of CINC are generally above 5 for the Wiki set and CLUDE registers a speedup of 20. These significant speedups are brought about by their much higher ordering qualities. From Figure 7, we see how the performances of CINC and CLUDE change with α. In particular, their speedups drop when α is very close to 1. This is because a very large α value implies a very stringent clustering requirement. In the extreme case, when α is very large, each cluster contains only one matrix, which reduces CINC and CLUDE to BF. We observe that the speedups of CINC and CLUDE are very significant and quite stable unless α is very large. We remark that selecting the threshold α is an engineering effort as its best value depends on various factors such as the nature of the graphs. Fortunately, it is not very critical that the optimal α be found, as the algorithms perform very well over a wide range of α.

Figure 7: Speedup vs. similarity threshold α. Panels: (a) Wikipedia, (b) DBLP; curves: CLUDE, CINC, INC.

Figure 8: CLUDE's execution time breakdown (Wiki dataset). Panels: (a) Execution time (CLUDE) in hours vs. similarity threshold α, with curves for total, clustering, Markowitz, LU decomposition and Bennett time; (b) Bennett time in hours vs. similarity threshold α, with curves for CLUDE and CINC.

We further investigate the reasons behind the big performance gap between CINC and CLUDE as shown by their speedup curves. There are two factors that contribute to the improvement of CLUDE over CINC: (1) CLUDE gives better ordering quality than CINC, which leads to smaller LU factors and thus faster decomposition time (full or incremental). (2) CLUDE uses the universal symbolic sparsity pattern to prepare the data structures for storing matrices' LU factors. This greatly facilitates the incremental updating of the LU factors across matrices (see discussion in Section 4). Both of these factors improve the speed of Bennett's algorithm, which incrementally decomposes matrices.

CLUDE's execution time consists of four components: (1) Clustering time ($t_c$): time to perform α-clustering on the EMS. (2) Markowitz time ($t_M$): time to compute the Markowitz orderings of matrices (done once per cluster). (3) LU decomposition time ($t_d$): time to perform full LU decompositions (done once per cluster on the first matrix of the cluster). (4) Bennett's time ($t_B$): time to perform incremental LU decompositions (done on all matrices but the first of each cluster). Figure 8(a) shows these four components when CLUDE is applied to the Wiki dataset over different α values.

Figure 9: Varying ∆E (Synthetic). Panels: (a) Average quality-loss, (b) Speedup; x-axis: ∆E; curves: CLUDE, CINC, INC.

Figure 10: Varying quality requirement β (DBLP). Panels: (a) Average quality-loss (curves: CLUDE, CINC), (b) Speedup (curves: CLUDE, CINC, INC); x-axis: quality requirement β.

First, we see that $t_c$ is negligible and stays constant. Second, we note that as α increases, fewer matrices are collected in a cluster without violating the similarity constraint. Hence, clusters are smaller and there are more clusters. Consequently, $t_M$ and $t_d$ increase with α. Third, tighter clustering implies better ordering quality (see Figure 6(a)), which speeds up incremental decomposition. Therefore, $t_B$ drops as α increases. Now, let us focus on the numbers when α = 0.95, which is the case when CLUDE gives the best speedup. We see that $t_B$ dominates CLUDE's execution time. In fact, $t_B$ is also the dominating component of CINC's execution time. Figure 8(b) gives a head-to-head comparison between the $t_B$ components of CINC and CLUDE. We see that CLUDE significantly outperforms CINC in $t_B$ because of the two factors mentioned above. This explains the big gap between their execution times.

6.3 Synthetic Dataset

Our next experiments evaluate the algorithms using the synthetic dataset. The synthetic dataset allows us to vary the various properties of the graphs (matrices) so that we can perform various sensitivity studies. Figures 9(a) and (b) compare the algorithms in terms of quality and speedup, respectively, as the number of edge changes between snapshots (∆E) varies. Note that a larger ∆E causes the matrices in the EMS to deviate more from $A_1$. This makes INC's ordering more unfit for the matrices, leading to worse ordering quality. CINC and CLUDE, on the other hand, are very adaptive. Through α-clustering, they maintain the similarity of the matrices in the same cluster (by including more or fewer matrices in a cluster) and thus their ordering qualities remain stable as ∆E changes. However, faster evolving matrices mean more and smaller clusters. This increases $t_M$ and $t_d$. Also, a larger ∆E makes incremental [...] speedups drop when ∆E increases. We remark that CLUDE gives very impressive speedups (10-20) compared with others (Figure 9(b)).

We have conducted many other experiments with the synthetic dataset, varying the various parameters of the synthetic data generators. The general observations from these results are that CLUDE gives the best ordering quality and at the same time is much faster than INC and CINC. CLUDE typically registers a speedup from 10 to 20. Due to space limitations, we omit those results in the paper.

6.4 The LUDEM-QC Problem

Our last set of experiments compares the performance of CINC and CLUDE in solving the LUDEM-QC problem. Recall that the problem can be efficiently solved for symmetric matrices. Hence, we conducted the experiments on the DBLP dataset, whose matrices are symmetric. Figures 10(a) and 10(b) show the qualities and speedups of the algorithms as the quality requirement β varies.

From Figure 10(a), we see that both CINC and CLUDE are adaptive to β. In particular, when the requirement is looser (a larger β), the algorithms employ bigger clusters so that they can perform fewer full decompositions but more incremental decompositions. The result is trading ordering quality (increasing quality-loss, Figure 10(a)) for faster decomposition (increasing speedup, Figure 10(b)). We observe that both CINC and CLUDE are able to maintain an ordering quality that is well within the requirement. Between the two, CLUDE gives higher ordering quality. Again, this is because it uses the ordering of $A_\cup$, which covers all the matrices in the same cluster. Moreover, CLUDE can provide more than 10 times speedup. It significantly outperforms the other algorithms.

7. CASE STUDY

To further illustrate the use of evaluating measures in a graph sequence, we conducted a case study on a Patent dataset [15]. This dataset contains information (e.g., patent name, year granted, company, etc.) of 3 million U.S. patents and the citations among them between 1975 and 1999. Analyzing the citations among patents can help us answer questions such as "How does company X depend on company Y in technology development?" "How does the dependency evolve over time?" These insights are useful in predicting new alliances and acquisitions, which have much impact on the companies' stock prices. We use IBM as an example subject of analysis.

We take the yearly snapshots of the patent citation graphs spanning 1979 to 1999. Based on a citation graph, we measure the proximity of company Y from company X by summing the PPR scores of Y's patent nodes using X's patent nodes as the set of starting seed nodes.
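The proximity measure just described can be sketched in Python as below. The sketch makes several assumptions that are not stated in this section: W is a column-normalized transition matrix of the citation graph, the restart distribution is uniform over the seed patents, the damping factor is d = 0.85, and scores are approximated by a fixed number of power iterations rather than by an LU-based solve. The four-patent graph and the company assignments are invented purely for illustration.

```python
def ppr_scores(W, seeds, d=0.85, iters=200):
    """Approximate PPR scores by iterating x <- d*W*x + (1-d)*q."""
    n = len(W)
    q = [1.0 / len(seeds) if v in seeds else 0.0 for v in range(n)]
    x = q[:]
    for _ in range(iters):
        x = [d * sum(W[i][j] * x[j] for j in range(n)) + (1 - d) * q[i]
             for i in range(n)]
    return x


def proximity(W, seeds_x, nodes_y, d=0.85):
    """Proximity of company Y from company X: sum of PPR scores of Y's patents."""
    scores = ppr_scores(W, seeds_x, d)
    return sum(scores[v] for v in nodes_y)


# Toy citation graph (hypothetical): patents 0 and 1 belong to company X,
# patent 2 to company Y, patent 3 to company Z.
# Citations: 0 -> 2, 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 2.
# W[i][j] is the probability of moving from patent j to patent i.
W = [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [1.0, 0.5, 0.0, 1.0],
     [0.0, 0.5, 1.0, 0.0]]
prox_y = proximity(W, seeds_x={0, 1}, nodes_y={2})
prox_z = proximity(W, seeds_x={0, 1}, nodes_y={3})
```

Since X's patents cite Y's patent more heavily than Z's, Y obtains the larger proximity score, which is the kind of ranking signal that the case study tracks over the yearly snapshots.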

Taking IBM as company X, Figure 11 shows the proximity of a few representative companies from IBM over the years from 1979 to 1999. In the figure, we show the ranks of the companies based on their proximity scores. The figure reflects how much IBM depended on other companies in its technology development. For example, Xerox developed Alto (widely regarded as the first PC) and invented the Graphical User Interface (GUI), which are important components of IBM PC's development. Xerox thus maintained a high rank during those 20 years.

Among the seven companies shown in Figure 11, Harris, an international telecommunications equipment company, stands out. While the ranks of others were quite stable, Harris' rank increased steadily since 1979. This trend is a good predictor of a closer collaboration between the companies. In fact, in 1992, IBM and Harris announced their alliance to share technology and to capitalize their strengths in technology development. Harris' stock price hit a closing high shortly after the announcement. This case study shows that the trends of various graph measures over a graph sequence could provide interesting insights that are beyond what measures from a single graph can derive.

Figure 11: PPR score rankings (IBM patents as seed nodes). x-axis: year (1979 to 1999); y-axis: rank; curves: CDC, HARRIS, INTEL, MOTOROLA, NATIONAL, SONY, XEROX.

8. RELATED WORK

EGS processing was first introduced in [25], which studies the computation of the shortest path distance between two nodes across a graph sequence. Our clustering approach shares some flavor with that presented in [25].

There are a number of studies on efficient computation of the various measures, such as PR/SALSA/PPR/DHT/RWR, on single graphs [22, 18, 12, 14, 23, 10, 11]. One interesting approach is approximation methods. Two such popular methods are the power iteration (PI) method [6] and the Monte Carlo (MC) method [10]. For example, to compute RWR scores, PI iteratively refines the solution x based on the recurrence relation $x_u^{(k+1)} = dW x_u^{(k)} + (1 - d) q_u$ and MC simulates random walks to approximate the stationary distributions. PI or MC has to be executed once for every input query $q_u$. In contrast, our problem is to decompose a matrix such that queries can be answered very efficiently. For example, with our Wiki dataset, answering queries after matrices are LU-decomposed is about two orders of magnitude faster than answering them using either PI or MC.
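The contrast between the two approaches can be made concrete with a small Python sketch. The fixed point of the PI recurrence satisfies the linear system (I − dW)x = (1 − d)q, so once I − dW has been LU-decomposed, each new query q only costs a forward and a backward substitution. The dense Doolittle factorization without pivoting, the 3-node column-stochastic matrix W and d = 0.85 are assumptions of this example; the paper works with sparse, reordered matrices.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting; returns dense (L, U)."""
    n = len(a)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def lu_solve(L, U, b):
    """Solve L U x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x


def power_iteration(W, q, d, iters=200):
    """Iterate x <- d*W*x + (1-d)*q starting from x = q."""
    n = len(q)
    x = q[:]
    for _ in range(iters):
        x = [d * sum(W[i][j] * x[j] for j in range(n)) + (1 - d) * q[i]
             for i in range(n)]
    return x


d = 0.85
W = [[0.0, 0.5, 1.0],          # column-stochastic transition matrix (toy graph)
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
n = len(W)
A = [[(1.0 if i == j else 0.0) - d * W[i][j] for j in range(n)] for i in range(n)]
L, U = lu_decompose(A)         # done once per matrix

queries = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
x_lu = [lu_solve(L, U, [(1 - d) * v for v in q]) for q in queries]   # per query
x_pi = [power_iteration(W, q, d) for q in queries]                   # per query
```

Both routes give the same scores up to the iteration tolerance; the difference is that the factorization cost is paid once, whereas power iteration repeats its full loop for every query.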

There are fast solutions for answering some very specific queries that exploit matrix sparsity [11, 12]. For example, in [11], sparse matrix decomposition is used to find the top-k nodes of the highest RWR scores in a graph. These studies focus on processing single graphs. Instead, our work focuses on processing graph sequences and on answering queries that involve general Gaussian elimination.

There are algorithms (e.g., [8, 3]) for incrementally maintaining specific measures when the underlying graph changes. For example, [3] employs the MC method and stores a number of random walk segments (RWS's) in a database. When the graph changes, the stored RWS's are updated accordingly. PPR scores are then approximated based on the stored RWS's. Our algorithms compute exact measures instead of approximations and they are not restricted to a specific measure. Also, after matrices are LU-decomposed, query answering can be done much faster than with those incremental measure maintenance solutions. For example, with our Wiki dataset, our approach is at least an order of magnitude faster.

In recent years, there have been works on processing graph streams [19, 21]. Their focus is on how to detect interesting changes and how to perform fast aggregations as graphs arrive. For example, [19] studies how to detect sub-graphs that change rapidly over a small window of the stream. Like other studies on stream processing, the data (graphs) that arrives in the stream is not archived. This limits the kind of analyses that can be performed. In contrast, we focus on decomposing the matrices in a graph sequence such that more complex analytical tasks can be done efficiently.

9. CONCLUSIONS

In this paper we studied the LUDEM problem and its quality-constraint variant LUDEM-QC. We illustrated that by decomposing the matrices in an EMS into their LU factors, interesting structural analyses on a sequence of evolving graphs can be carried out efficiently. We gave an in-depth discussion on matrix reordering and incremental LU decomposition, based on which we designed our solutions for the LUDEM problem. Through extensive experiments, we analyzed our algorithms and showed that CLUDE outperformed the rest. Over a wide range of settings, CLUDE significantly outperformed the straightforward incremental algorithm (INC) both in terms of ordering quality and speed. Typically, CLUDE's quality-loss was more than 10 times smaller than that of INC. Also, CLUDE registered a speedup over the brute-force approach that in most cases was at least an order of magnitude.

Acknowledgment

This research is partly supported by Hong Kong Research Grants Council grants HKU712712E and HKU711309E, and the University of Hong Kong (Project 201211159083). We would like to thank the reviewers for their insightful comments.

10. REFERENCES

[1] P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. Appl., 17(4):886–905, Oct. 1996.

[2] R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS ’06, pages 475–486, Washington, DC, USA, 2006. IEEE Computer Society.

[3] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and personalized pagerank. Proc. Very Large Data Base Endow., 4(3):173–184, Dec. 2010.

[4] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[5] J. Bennett. Triangular factors of modified matrices. Numerische Mathematik, 7(3):217–221, 1965.

[6] P. Berkhin. A survey on pagerank computing. Internet Mathematics, 2:73–120, 2005.

[7] G. E. Box, G. M. Jenkins, and G. C. Reinsel. Time series analysis: forecasting and control. Wiley, 2013.

[8] P. K. Desikan, N. Pathak, J. Srivastava, and V. Kumar. Incremental page rank computation on evolving graphs. In WWW (Special interest tracks and posters), pages 1094–1095, 2005.

[9] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct methods for sparse matrices. Oxford University Press, Inc., New York, NY, USA, 1986.

[10] D. Fogaras and B. Rácz. Towards scaling fully personalized pagerank. In WAW, pages 105–117, 2004.

[11] Y. Fujiwara, M. Nakatsuji, M. Onizuka, and M. Kitsuregawa. Fast and exact top-k search for random walk with restart. Proc. VLDB Endow., 5(5):442–453, 2012.

[12] Y. Fujiwara, M. Nakatsuji, T. Yamamuro, H. Shiokawa, and M. Onizuka. Efficient personalized pagerank with accuracy assurance. In KDD, pages 15–23, 2012.

[13] A. George, M. Heath, J. Liu, and E. Ng. Solution of sparse positive definite systems on a hypercube. Journal of Computational and Applied Mathematics, 27(1-2):129–156, 1989.

[14] Z. Guan, J. Wu, Q. Zhang, A. K. Singh, and X. Yan. Assessing and ranking structural correlations in graphs. In ACM SIGMOD Conference, pages 937–948, 2011.

[15] B. Hall, A. Jaffe, and M. Trajtenberg. The NBER patent citation data file: Lessons, insights and methodological tools. Technical report, NBER, 2001.

[16] Z. Huang and D. K. J. Lin. The time-series link prediction problem with applications in communication surveillance. INFORMS Journal on Computing, 21(2):286–303, 2009.

[17] A. N. Langville and C. D. Meyer. Google’s PageRank and beyond: The science of search engine rankings. Princeton University Press, 2006.

[18] R. Lempel and S. Moran. Salsa: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst., 19(2):131–160, 2001.

[19] Z. Liu and J. X. Yu. Discovering burst areas in fast evolving graphs. In Database Systems for Advanced Applications (1), pages 171–185, 2010.

[20] H. M. Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 1957.

[21] A. McGregor. Graph mining on streams. In Encyclopedia of Database Systems, pages 1271–1275, 2009.

[22] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: bringing order to the web. 1999.

[23] J.-Y. Pan, H.-J. Yang, C. Faloutsos, and P. Duygulu. Automatic multimedia cross-modal correlation discovery. In KDD, pages 653–658, 2004.

[24] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes 3rd Edition: The Art of Scientific Computing. 2007.

[25] C. Ren, E. Lo, B. Kao, X. Zhu, and R. Cheng. On querying historical evolving graph sequences. Proc. VLDB Endow., 4(11):726–737, 2011.

[26] D. J. Rose and R. E. Tarjan. Algorithmic aspects of