

PERFORMANCE OF GREEDY ORDERING HEURISTICS FOR SPARSE CHOLESKY FACTORIZATION

ESMOND G. NG† AND PADMA RAGHAVAN‡

Abstract. Greedy algorithms for ordering sparse matrices for Cholesky factorization can be based on different metrics. Minimum degree, a popular and effective greedy ordering scheme, minimizes the number of nonzero entries in the rank-1 update (degree) at each step of the factorization. Alternatively, minimum deficiency minimizes the number of nonzero entries introduced (deficiency) at each step of the factorization.

In this paper we develop two new heuristics: "modified minimum deficiency" (MMDF) and "modified multiple minimum degree" (MMMD). The former uses a metric similar to deficiency while the latter uses a degree-like metric. Our experiments reveal that on the average MMDF has 21% fewer operations to factor than minimum degree; MMMD has 15% fewer operations to factor than minimum degree. MMMD is no more expensive to compute than minimum degree while MMDF requires on the average 30% more time than minimum degree.

Keywords. sparse matrix ordering, minimum degree, minimum deficiency, greedy heuristics.

AMS(MOS) subject classifications. 65F05, 65F50.

1. Introduction.

It is well known that ordering the rows and columns of a matrix is a crucial step in the solution of sparse linear systems using Gaussian elimination. The ordering can drastically affect the amount of fill introduced during factorization and hence the cost of computing the factorization [7, 13]. When the matrix is symmetric and positive definite, the ordering step is independent of the numerical values and can be performed prior to numerical factorization. The ideal choice is an ordering that introduces the least fill, but the problem of computing such an ordering is NP-complete [22]. Consequently, almost all ordering algorithms are heuristic in nature. Examples include reverse Cuthill-McKee [5, 6, 8], automatic nested dissection [9], and minimum degree [18].

A greedy ordering heuristic numbers columns successively by selecting at each step a column with the optimal value of a metric. In the minimum degree algorithm of Tinney and Walker [21] the metric is the number of nonzero entries (and hence operations) in the rank-1 update associated with a column in a right-looking, sparse Cholesky factorization.

The algorithm can be stated in terms of vertex eliminations in a graph representing the matrix [18]; now the metric translates into the degree of a vertex. Efficient implementations of minimum degree are due to George and Liu [11, 12]. The minimum degree algorithm with multiple eliminations (MMD) due to Liu [16] has become the method of choice in the last decade. Multiple independent vertices are eliminated at a single step in MMD to reduce the ordering time. Recently, Amestoy, Davis, and Duff [1] have developed the "approximate minimum degree" algorithm (AMD), which uses an approximation to the degree to further reduce the ordering time. Berman and Schnitger [4] have analytically shown that the minimum degree algorithm can, in some rare cases, produce a poor ordering. However, experience has shown that the minimum degree algorithm and its variants are effective heuristics for generating fill-reducing orderings. In fact, only some very recent separator-based schemes [3, 14, 15] have outperformed MMD for certain classes of sparse matrices. Some of these new schemes are hybrids that use the minimum degree algorithm to order some of the columns.

Work was supported in part by the Defense Advanced Research Projects Agency under contracts DAAL03-91-C-0047, ERD9501310, and Xerox-MP002315, and by the Applied Mathematical Sciences Research Program, Office of Energy Research, U.S. Department of Energy under contract DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp., and by the National Science Foundation under grants NSF-ASC-94-11394 and NSF-CDA-9529459.

† Computer Science and Mathematics Division, Oak Ridge National Laboratory, P. O. Box 2008, Oak Ridge, TN 37831-6367.

‡ 107 Ayres Hall, Department of Computer Science, The University of Tennessee, Knoxville, TN 37996-1301.

A greedy ordering heuristic that was also proposed by Tinney and Walker [21], but has largely been ignored, is the minimum deficiency (or minimum fill) algorithm. The minimum deficiency algorithm minimizes the number of fill entries introduced at each step of sparse Cholesky factorization (or deficiency in graph terminology). Although the metrics look similar, the minimum deficiency and minimum degree algorithms are different. For example, the deficiency could well be zero even when the degree is not. There are two reasons why the minimum deficiency algorithm has not become as popular as the minimum degree algorithm [7]. First, the minimum deficiency algorithm is typically much more expensive than the minimum degree algorithm. Second, it has been believed that the quality of minimum deficiency orderings is not much better than that of minimum degree orderings [7].

Results by Rothberg [19] (and also by us [17]) demonstrate that minimum deficiency leads to significantly better orderings than minimum degree. However, current implementations of the minimum deficiency algorithm require substantially more time than MMD.

In this paper, we develop two greedy heuristics that are less expensive to compute than minimum deficiency, but compute better orderings than minimum degree on average. The heuristics are variants of minimum deficiency and minimum degree. In Section 2, we provide background material and introduce some special notation to help describe our heuristics. In Section 3 we develop our "modified minimum deficiency" (MMDF) and "modified multiple minimum degree" (MMMD) heuristics. We also show that the two heuristics can be implemented using the update mechanism in the "approximate degree" scheme of Amestoy, Davis, and Duff [1]. In Section 4 we provide empirical results on the performance of MMDF and MMMD. Section 5 contains some concluding remarks. The remaining part of this section describes recent related work.

Related work. Rothberg has investigated metrics for greedy ordering schemes based on approximations to the deficiency [19]. His work and our work [17] were done independently of each other.¹ Rothberg [19]:

- shows that the minimum deficiency algorithm is significantly superior to MMD in terms of the number of operations required to compute the Cholesky factor,
- develops three "approximate minimum fill" (AMF) heuristics based on approximations to the deficiency, and
- concludes that heuristic AMF1 is the best among the three; on the average, AMF1 orderings require 14% fewer operations to factor than MMD orderings.

In our earlier report [17], we:

- establish that many of the techniques used in efficient implementations of the minimum degree algorithm (namely, indistinguishable vertices, mass elimination, and outmatching) also apply to the minimum deficiency algorithm,
- corroborate Rothberg's empirical results establishing the superior performance of the minimum deficiency metric,
- develop our "modified minimum deficiency" (MMDF) and "modified multiple minimum degree" (MMMD) heuristics, and
- show that MMDF (MMMD) orderings require 17% (15%) fewer operations to factor than MMD (on the average).

It is difficult to compare the results in [17] and [19] directly because the test suites used in the two papers are substantially different. The aggregate measures reported in the two papers are also different. Moreover, they were based on performance data obtained from different sets of initial numberings.

¹ Raghavan and Rothberg presented their results independently at the Second SIAM Conference on Sparse Matrices in 1996.

More recently, in a revision of [19], Rothberg and Eisenstat have developed two new metrics for greedy ordering schemes [20]. Rothberg and Eisenstat [20]:

- develop two heuristics, "approximate minimum mean fill" (AMMF) and "average minimum increase in neighbor degree" (AMIND), and
- show that AMMF orderings require 22% (median) to 25% (geometric mean) fewer operations to factor than MMD orderings; AMIND orderings require 20% (median) to 21% (geometric mean) fewer operations to factor than MMD orderings.

This paper is a shorter version of [17]. The test suite in this paper is substantially different from that in the original paper. In an attempt to compare the performance of our heuristics with that of the AMF1 (called AMF in [20]), AMMF, and AMIND heuristics in [19] and [20], we now use nearly the same test suite as in those two papers. Four of the matrices in [19] and [20] are proprietary, and therefore are not available to us. To report performance relative to MMD, we had earlier used the "median of ratios" over 7 initial orderings (6 random orderings and the ordering in which the matrix was given to us). In this paper, we use the "ratio of medians" over 11 random initial orderings (as in [19, 20]). As we will see later in the paper, our MMDF and MMMD heuristics produce better orderings than MMD.

The MMDF and MMMD orderings are very competitive with those produced by AMF1, AMMF, and AMIND. What we see as interesting is the development of five different metrics that can produce orderings that are significantly better than those produced by MMD. We had commented in our earlier report [17] that there could well be other relatively inexpensive greedy strategies that outperform the ones known at that time. The performance of newer schemes AMMF and AMIND [20] supports our prediction; AMMF seems to be slightly better than our MMDF. As we discuss in Section 5, we still believe that there may well be other greedy metrics that perform better than the five developed so far.

2. Implementing Greedy Ordering Heuristics.

The efficient implementation of greedy ordering schemes is based on a compact realization of the graph-model of Cholesky factorization [18]. In this section, we provide a brief description of elimination graphs and quotient graphs, and introduce an example, together with some notation used to describe our greedy heuristics. We also describe minimum degree and minimum deficiency schemes using quotient graphs.

Throughout, we use terminology common in sparse matrix factorization. The reader is referred to the book by George and Liu [13] for details.

Elimination graphs and quotient graphs. Sparse Cholesky factorization can be modeled using elimination graphs [18]. Let $G$ denote an elimination graph. At the beginning, $G$ is initialized to $G_0$, the graph of a sparse symmetric positive definite matrix [13]. At each step a vertex and its incident edges are removed from $G$. If $x$ is the vertex removed, edges are added to $G$ so that the neighbors of $x$ become a clique. Thus cliques are formed as the elimination proceeds.
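One elimination step on an elimination graph can be sketched as follows. This is a minimal illustration only: the dictionary-of-sets representation and the name `eliminate` are assumptions made for this example, not the data structures used in the codes discussed in this paper.

```python
from itertools import combinations

def eliminate(adj, x):
    """Remove vertex x from the elimination graph `adj` (vertex -> set of
    neighbors) and connect its former neighbors into a clique.
    Returns the list of fill edges that had to be added."""
    nbrs = adj.pop(x)
    for u in nbrs:
        adj[u].discard(x)
    fill = []
    for u, v in combinations(sorted(nbrs), 2):
        if v not in adj[u]:          # edge missing: it becomes fill
            adj[u].add(v)
            adj[v].add(u)
            fill.append((u, v))
    return fill

# Hypothetical 4-vertex star: vertex 0 is adjacent to 1, 2, and 3.
graph = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
fill_edges = eliminate(graph, 0)
print(fill_edges)  # [(1, 2), (1, 3), (2, 3)]
```

Eliminating the center of the star first turns its three neighbors into a triangle, which is the clique formation described above.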

A quotient graph [10, 13] is a compact representation of an elimination graph. It requires no more space than that for $G_0$ [13]. Unlike the elimination graph, vertices are not explicitly removed and neither are cliques formed explicitly. Instead vertices are grouped in "supervertices" and marked as "eliminated" or "uneliminated."

Let $G$ denote the current elimination graph. Let $S$ be the set of vertices that have been eliminated. Consider the subgraph induced by $S$ in $G_0$. This subgraph will contain one or more connected components (which are also called domains). In the quotient graph, the vertices in each connected component are coalesced into an eliminated supervertex. Note that the cliques created by the elimination process in $G$ are easy to identify in a quotient graph. Each such clique contains all (uneliminated) neighbors of an eliminated supervertex.

It is well known that as the elimination proceeds, some (uneliminated) vertices will become indistinguishable from each other; that is, they share essentially the same adjacency structure in the current elimination graph $G$. Now each set of indistinguishable vertices is coalesced into an uneliminated supervertex in the quotient graph. Observe that all vertices in an uneliminated supervertex have the same degree (or deficiency) and hence can be "mass-eliminated" when the degree (or deficiency) becomes minimum [13, 17]. Furthermore, vertices in an uneliminated supervertex form a clique.
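As a small illustration of indistinguishable vertices, the sketch below groups vertices of an elimination graph whose closed neighborhoods (the vertex together with its neighbors) coincide. Using equality of closed neighborhoods as the test is an assumption made for this example; the function name and the toy graph are hypothetical.

```python
from collections import defaultdict

def indistinguishable_groups(adj):
    """Group vertices whose closed neighborhoods adj[v] | {v} are identical.
    Each group is a candidate uneliminated supervertex."""
    groups = defaultdict(list)
    for v in sorted(adj):
        groups[frozenset(adj[v] | {v})].append(v)
    return sorted(groups.values())

# Hypothetical graph: a and b are adjacent and see exactly the same vertices.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b"},
}
groups = indistinguishable_groups(graph)
print(groups)  # [['a', 'b'], ['c'], ['d']]
```

Because members of a group share a closed neighborhood, they are mutually adjacent and have equal degree, which matches the observation that a supervertex forms a clique and can be mass-eliminated.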

Fig. 1. An example of a quotient graph; $X$ is the most recently eliminated supervertex. Supervertices $\{Y_1, \ldots, Y_m\}$ enclosed in a curve form a clique; other partial cliques used in MMDF are shown using dotted curves.

Thus a quotient graph can be viewed as a graph containing two kinds of supervertices: uneliminated supervertices and eliminated supervertices. Each uneliminated supervertex is a clique of indistinguishable vertices of the corresponding elimination graph $G$. Each eliminated supervertex is a subset of the vertices that have been eliminated from the original graph $G_0$. Vertices in the set of uneliminated supervertices adjacent to the same eliminated supervertex in the quotient graph form a clique in the elimination graph. For simplicity, we will say that these uneliminated supervertices form a "clique" in the quotient graph. Two uneliminated supervertices are said to be "neighbors" in the quotient graph when there is an edge between them or they are adjacent to the same eliminated supervertex in the quotient graph. Thus vertices belonging to one uneliminated supervertex are adjacent to those of the other supervertex in the corresponding elimination graph.

An example and some notation. Figure 1 contains an example of a quotient graph. Eliminated supervertices are represented by rectangles and uneliminated supervertices are represented by circles. Assume that an uneliminated supervertex has been selected according to the greedy criterion, and the quotient graph has been transformed. This gives a new eliminated supervertex in the quotient graph. Denote the new eliminated supervertex by $X$; in the remaining part of this paper $X$ will be referred to as the "most recently eliminated supervertex." Using our convention, both $Z_1$ and $Z_2$ are neighbors of $Y_1$. Note that the uneliminated supervertices $Y_1, \ldots, Y_m$ (enclosed by a curve), adjacent to the eliminated supervertex $X$, form a clique. Two other cliques are $\{Y_1, Z_2, \ldots, Z_5\}$ and $\{Y_1, Y_2, Z_5, \ldots, Z_6\}$. Observe that the three cliques are not disjoint. Uneliminated supervertices that are enclosed by a dotted curve (such as $Z_2$, $Z_3$, $Z_4$, and $Z_5$) form what we call a "partial" clique; these "partial" cliques will be used to describe our heuristics in the next section.

If $V$ is a supervertex (either uneliminated or eliminated) in the quotient graph, we define $N_1(V)$ as the set of uneliminated supervertices that are neighbors of $V$. We use $N_2(V)$ to denote the set of uneliminated supervertices that are neighbors of those in $N_1(V)$. We use $\deg(V)$ to denote the degree of an uneliminated supervertex $V$; $\deg(V)$ is the sum of $|V| - 1$ and the total number of vertices in all supervertices in $N_1(V)$.
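The neighbor sets and the degree just defined can be sketched on a toy quotient graph. Everything in this example is hypothetical: the representation (direct edges between uneliminated supervertices plus, for each eliminated supervertex, the set of adjacent uneliminated supervertices), the function names, and the sizes. The sketch also assumes that the second-level set excludes the supervertex itself and its first-level neighbors, which the text does not state explicitly.

```python
def n1(v, edges, elim_adj):
    """Uneliminated supervertices that are neighbors of supervertex v.
    edges:    uneliminated supervertex -> directly adjacent uneliminated ones
    elim_adj: eliminated supervertex   -> adjacent uneliminated supervertices"""
    if v in elim_adj:                      # v is an eliminated supervertex
        return set(elim_adj[v])
    nbrs = set(edges.get(v, ()))
    for members in elim_adj.values():      # shared eliminated supervertex
        if v in members:
            nbrs |= members
    nbrs.discard(v)
    return nbrs

def n2(v, edges, elim_adj):
    """Neighbors of the supervertices in n1(v), excluding v and n1(v)
    (the exclusion is an assumption of this sketch)."""
    first = n1(v, edges, elim_adj)
    second = set()
    for u in first:
        second |= n1(u, edges, elim_adj)
    return second - first - {v}

def deg(v, size, edges, elim_adj):
    """Degree of an uneliminated supervertex: |V| - 1 plus the number of
    vertices in all supervertices of n1(v)."""
    return size[v] - 1 + sum(size[u] for u in n1(v, edges, elim_adj))

# Hypothetical quotient graph: X is eliminated and adjacent to Y1 and Y2.
size = {"Y1": 2, "Y2": 1, "Z1": 3, "Z2": 1}
elim_adj = {"X": {"Y1", "Y2"}}
edges = {"Y1": {"Z1"}, "Z1": {"Y1", "Z2"}, "Z2": {"Z1"}}

print(sorted(n1("Y1", edges, elim_adj)))   # ['Y2', 'Z1']
print(sorted(n2("X", edges, elim_adj)))    # ['Z1']
print(deg("Y1", size, edges, elim_adj))    # 5
```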

Minimum degree and deficiency schemes. Recall that a greedy heuristic needs a metric $d(v)$ for selecting the next supervertex to eliminate. Examples of $d(\cdot)$ are the degree (in minimum degree) and the deficiency (in minimum deficiency). In terms of elimination graphs, a greedy heuristic has the following structure: select a vertex that minimizes $d(\cdot)$, eliminate it from the current elimination graph, form the next elimination graph, and update the value of the metric for each vertex affected by the elimination. A greedy scheme can also be described in terms of quotient graphs: select an uneliminated supervertex that minimizes $d(\cdot)$, create a new quotient graph, and update the value of the metric for each uneliminated supervertex affected by the elimination.
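The greedy structure described above can be sketched on a plain elimination graph with a pluggable metric. This is a deliberately naive illustration: it recomputes the metric for every vertex at every step instead of updating only the affected vertices, it uses no quotient graph, and ties are broken by smallest label. All names and the test graph are hypothetical.

```python
from itertools import combinations

def degree(adj, v):
    return len(adj[v])

def deficiency(adj, v):
    """Number of fill edges that eliminating v would create right now."""
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

def greedy_order(adj, metric):
    """Return (elimination order, total fill) for the given metric."""
    adj = {v: set(n) for v, n in adj.items()}      # work on a copy
    order, fill = [], 0
    while adj:
        v = min(sorted(adj), key=lambda u: metric(adj, u))
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        for a, b in combinations(nbrs, 2):         # neighbors become a clique
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill += 1
        order.append(v)
    return order, fill

def from_edges(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Hypothetical graph: two 4-cliques {q,a1,a2,a3} and {r,b1,b2,b3},
# joined through a vertex p that is adjacent to q and r only.
clique_a = list(combinations(["q", "a1", "a2", "a3"], 2))
clique_b = list(combinations(["r", "b1", "b2", "b3"], 2))
graph = from_edges(clique_a + clique_b + [("p", "q"), ("p", "r")])

md_order, md_fill = greedy_order(graph, degree)
mdf_order, mdf_fill = greedy_order(graph, deficiency)
print(md_fill, mdf_fill)  # 1 0
```

On this graph minimum degree eliminates `p` first (degree 2) and creates the fill edge between `q` and `r`, whereas minimum deficiency starts with vertices of zero deficiency and produces no fill at all.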

In the minimum deficiency heuristic, updating the deficiency after one step of elimination may be significantly more time consuming than updating the degree in the minimum degree algorithm. Consider the example in Figure 1 where $X$ is the most recently eliminated supervertex. With minimum degree (MMD and AMD) only the uneliminated supervertices in $N_1(X)$ need a degree update. However, with minimum deficiency, we need to update the deficiency of not only supervertices in $N_1(X)$, but also some of the supervertices belonging to $N_2(X)$. Any supervertex in $N_2(X)$ that is a neighbor of two or more supervertices in $N_1(X)$ would need a deficiency update. With respect to Figure 1, we would have to update the deficiency of $Z_1$ since it is a neighbor of both $Y_1$ and $Y_m$. Similarly, we would have to update the deficiency of each of $Z_5, Z_6, \ldots, Z_k$ (each supervertex is a neighbor of both $Y_1$ and $Y_2$).

Rothberg showed that the true minimum deficiency algorithm (true local fill in [19]) produces significantly better orderings than MMD. We obtained similar results in [17]. However, our implementation of the minimum deficiency algorithm was on the average slower than MMD by two orders of magnitude [17]. Let $X$ be the most recently eliminated supervertex. Using the deficiency as the metric but restricting updates to uneliminated supervertices in $N_1(X)$ (as in MMD) leads to orderings that are inferior to true minimum deficiency but still significantly better than MMD. This was observed by Rothberg [19] and later corroborated by Ng and Raghavan [17]. However, even such a restricted scheme is more than 40 times slower than MMD [17]. In the next section we describe two relatively inexpensive but effective heuristics based on modifications to the deficiency and degree.

3. Modified Minimum Deficiency and Minimum Degree Heuristics.

We now describe two heuristics based on approximations to the deficiency and the degree. Both metrics can be implemented using either the update mechanism in MMD or the faster scheme in AMD.

Our first heuristic, "modified minimum deficiency" (MMDF), is based on a deficiency-like metric. Consider the example in Figure 1 and assume $X$ is the most recently eliminated supervertex. We update the values of the metric $d(\cdot)$ of uneliminated vertices in $N_1(X) = \{Y_1, \ldots, Y_m\}$ just as in MMD. Consider updating $d(Y_1)$ in Figure 1. An upper bound on the deficiency of $Y_1$ can be obtained in terms of the degree of $Y_1$. The true deficiency of $Y_1$ is obtained by subtracting from the upper bound the number of edges that are present before $Y_1$ is eliminated. Identifying all such edges requires examining the uneliminated supervertices in $N_1(Y_1)$ and $N_2(Y_1)$. However, some of these edges can be identified easily because in the quotient graph representation, uneliminated supervertices connected to a common eliminated supervertex form a clique. Using notation introduced earlier, $N_1(Y_1) = \{Y_2, \ldots, Y_m\} \cup \{Z_1, \ldots, Z_k\}$ is the set of uneliminated neighbors of $Y_1$. The elements of $N_1(Y_1)$ can be grouped into a set of disjoint "partial" cliques $K$. The obvious member of $K$ is $\{Y_2, \ldots, Y_m\}$, consisting of the uneliminated neighbors of the eliminated supervertex $X$. The other partial cliques depend on the order in which the neighbors of $Y_1$ are examined. Without loss of generality, assume $\{Z_1\}$ forms the second clique. Likewise let $\{Z_2, Z_3, \ldots, Z_5\}$ be the next partial clique we examine. Finally, $\{Z_6, \ldots, Z_k\}$ is the fourth disjoint partial clique.

The metric $d(Y_1)$ is defined as $ub - C - ct$, where $ub$ is an upper bound on deficiency, $C$ is the sum of contributions from partial cliques, and $ct$ is the correction term. At initialization, $d(Y_1)$ is set to the upper bound. We define $ub$, $C$, and $ct$ below.

- $ub$: The upper bound is based on the external degree [16]: $ub = \mathrm{edeg}(Y_1)\,[\mathrm{edeg}(Y_1) - 1]$, where $\mathrm{edeg}(Y_1) = \deg(Y_1) - |Y_1|$. This upper bound favors larger supervertices. If two supervertices have the same degree, the larger supervertex will have a smaller upper bound on deficiency since its external degree is smaller.
- $C$: Let $K$ be the set of disjoint partial cliques as described above. We define $C = \sum_{V \in K} |V|\,[|V| - 1]$, where $V$ is a partial clique in $K$; the size of $V$ is the total number of vertices in all uneliminated supervertices that constitute $V$.
- $ct$: The correction term $ct$ takes into account contributions missed because (1) partial cliques in $K$ are forced to be disjoint, and (2) cliques such as $\{Z_1, Y_m\}$ are not detected because we do not examine $N_2(Y_1)$. Our heuristic value of $ct$ is $2\,\mathrm{edeg}(Y_1)\,|Y_1|$. The rationale for the choice of $ct$ is as follows. Assume that each supervertex in $N_1(Y_1)$ is connected to one other supervertex (in $N_1(Y_1)$) and that the associated contribution has been missed. Assume further that the size of $Y_1$ is representative of the sizes of supervertices in $N_1(Y_1)$; then the contributions that have been missed equal $\sum_{V \in N_1(Y_1),\, V \neq Y_1} 2\,|V|\,|Y_1|$, which simplifies to $ct = 2\,\mathrm{edeg}(Y_1)\,|Y_1|$.
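The three terms of the MMDF metric can be put together in a few lines. The sketch below follows the formulas in the text (upper bound from the external degree, clique contributions, correction term); the function name, argument names, and numeric inputs are hypothetical and chosen only for illustration.

```python
def mmdf_metric(deg, size, clique_sizes):
    """MMDF-style metric: upper bound minus clique contributions minus
    correction term.
    deg:          degree of the supervertex
    size:         number of vertices in the supervertex
    clique_sizes: sizes of the disjoint partial cliques among its neighbors"""
    edeg = deg - size                         # external degree, as in the text
    upper = edeg * (edeg - 1)                 # upper bound on deficiency
    contrib = sum(s * (s - 1) for s in clique_sizes)
    correction = 2 * edeg * size
    return upper - contrib - correction

# Hypothetical supervertex: degree 12, size 2, partial cliques of sizes 4, 3, 3.
value = mmdf_metric(12, 2, [4, 3, 3])
print(value)                      # 90 - 24 - 40 = 26

# At initialization no cliques are known and the text uses the upper bound alone.
print(10 * (10 - 1))              # 90
```

Note that the metric is a heuristic score rather than a count of fill edges, so nothing prevents it from becoming negative for supervertices with large partial cliques.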

We would like to emphasize that MMDF is heuristic. We see the correction term as an approximation to edges missed because we restrict our attention to partial cliques that are disjoint. In our experiments we found that small multiples of the correction term behaved just as well if not better.

Our second heuristic, "modified multiple minimum degree" (MMMD), attempts to use a metric that is bounded by a small multiple of the degree. As indicated by the name, it is a close variant of the minimum degree algorithm with multiple eliminations. Consider Figure 1 and once again assume $X$ is the most recently eliminated supervertex. For the supervertex $Y_1$, MMMD uses the metric $d(Y_1) = 2\,\mathrm{edeg}(Y_1) - \hat{U}$, where $\hat{U}$ is the size of the largest partial clique in the set $K$ (described above). More precisely, $\hat{U} = \max_{V \in K} |V|$. At initialization we simply use $2\,\mathrm{edeg}(Y_1)$. Note that MMMD differs from MMD only in the definition of the metric. The metric in MMMD tries to take into account contributions from the largest clique that contains $Y_1$.

The disjoint partial cliques in the set $K$ of $Y_1$ are exactly those used implicitly in Liu's MMD code to compute $\mathrm{edeg}(Y_1)$. Hence the update cost of MMDF and MMMD is similar to that of Liu's MMD.
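The MMMD metric is simple enough to state directly in code: twice the external degree minus the size of the largest partial clique. The function name and the sample values below are hypothetical.

```python
def mmmd_metric(edeg, clique_sizes):
    """MMMD-style metric: 2 * external degree minus the size of the largest
    partial clique; with no cliques known (initialization) it is 2 * edeg."""
    largest = max(clique_sizes, default=0)
    return 2 * edeg - largest

# Two hypothetical supervertices with the same external degree.
print(mmmd_metric(10, [4, 3, 3]))   # 16
print(mmmd_metric(10, [8, 1, 1]))   # 12
print(mmmd_metric(10, []))          # 20 (initial value)
```

With equal external degree, the supervertex whose neighbors already contain a larger clique receives the smaller score and is preferred, which is the only way in which this metric departs from a pure degree ordering.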

Approximate MMDF and MMMD. We now briefly outline how "approximate" versions of the two schemes can be implemented using the faster update mechanism in the AMD scheme of Amestoy, Davis, and Duff [1].

Consider "approximate" MMDF. Consider once again Figure 1 and the metric for $Y_1$, an uneliminated supervertex adjacent to $X$, the most recently eliminated supervertex. The upper bound is now calculated using the approximate external degree of AMD. The correction term can also be easily calculated in terms of this approximate external degree and the size of supervertex $Y_1$. The main difference is in how $K$ is constructed, and hence the term $C$. Now the set $K$ corresponds to the cliques used in AMD to compute an approximation to the degree. AMD uses the sizes of certain cliques of supervertices in the set $N_1(Y_1) = \{Y_2, \ldots, Y_m\} \cup \{Z_1, \ldots, Z_k\}$. With respect to the example in Figure 1, the cliques used in AMD are: $\{Y_2, \ldots, Y_m\}$, $\{Z_1\}$, $\{Z_2, \ldots, Z_5\}$, $\{Z_5, Z_6, \ldots, Z_k\}$. The first clique is the one formed by elimination of $X$; the remaining cliques have no overlap with this clique. However, the remaining cliques may have supervertices in common. MMDF based on MMD forces the partial cliques to be disjoint. On the other hand, approximate-MMDF relaxes this restriction, i.e., it uses the cliques in AMD and these cliques may have common uneliminated supervertices. The approximation to $C$ (the contribution to the deficiency from the partial cliques) is computed using the clique sizes used in AMD (for the approximation to the degree). Approximate-MMMD is similar; it also uses the cliques in AMD.
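The difference between disjoint and overlapping cliques in the contribution term can be illustrated as follows. This is only a sketch of the distinction drawn in the text, not of the actual MMD or AMD update code; the function name, the choice $m = 3$, $k = 7$, and the unit supervertex sizes are all assumptions made for the example.

```python
def clique_contribution(cliques, size, disjoint):
    """Sum of |V| * (|V| - 1) over the given cliques, where |V| counts the
    vertices in the member supervertices. With disjoint=True, a supervertex
    already used by an earlier clique is skipped (partial cliques); with
    disjoint=False, overlapping cliques are used as they are."""
    seen, total = set(), 0
    for clique in cliques:
        members = [s for s in clique if not (disjoint and s in seen)]
        seen.update(clique)
        n = sum(size[s] for s in members)
        total += n * (n - 1)
    return total

# Cliques around Y1 as in the Figure 1 example, with m = 3 and k = 7.
cliques = [
    ["Y2", "Y3"],
    ["Z1"],
    ["Z2", "Z3", "Z4", "Z5"],
    ["Z5", "Z6", "Z7"],            # shares Z5 with the previous clique
]
size = {name: 1 for clique in cliques for name in clique}

print(clique_contribution(cliques, size, disjoint=True))    # 16
print(clique_contribution(cliques, size, disjoint=False))   # 20
```

In the disjoint case the last clique shrinks to the two supervertices not seen before, so its contribution drops from 6 to 2; the overlapping variant keeps the full clique size.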

Relation to other deficiency-like schemes. We would like to note that MMDF is similar to AMF3, proposed by Rothberg [19]; it differs mainly in the way in which the partial cliques are constructed, as well as in the definition of the correction term. AMF ("approximate minimum fill"), AMMF ("approximate mean minimum fill") and AMIND ("approximate mean increase in neighbor degree") are three other heuristics developed by Rothberg and Eisenstat [19, 20]. The deficiency-like metrics in AMF, AMMF, and AMIND use only edges in the most recently formed clique while the metric in MMDF takes into account edges in as many cliques as we can "easily identify." AMIND also uses a term which is similar to our correction term in MMDF. MMMD is similar to AMF in that it uses only the size of a single clique but it differs in the sense that it uses a degree-like metric.

4. Performance of MMDF and MMMD.

We now report on the performance of MMDF and MMMD. We use a set of 36 test matrices in our empirical study. Our test suite is a subset of the one used by Rothberg and Eisenstat [19, 20]; their test suite contains four other matrices that are proprietary and hence are not available to us. Our MMD code is Liu's Fortran implementation converted to C. Our MMDF and MMMD heuristics are built using the MMD code. MMDF differs from MMD in the metric update as well as in the use of heaps to store and retrieve the metric. Furthermore, unlike MMD, MMDF does not allow "multiple eliminations." MMMD is nearly identical to MMD and differs only in the metric update portion. All our experiments were performed on a Sun Ultra Sparc-2 workstation.

The quality of greedy orderings can vary depending on the initial numbering. For each test matrix, we use 11 different random initial numberings for MMD, MMDF, and MMMD. We consider two quantities for the quality of ordering: the number of nonzeros in the Cholesky factor, and the number of floating-point operations required to compute the Cholesky factor. We also report actual execution times for MMD, MMDF, and MMMD.

The characteristics of the test matrices and the quality of MMD orderings are reported in Table 1. We report mean and median values over 11 initial random numberings for MMD in Table 1.

Table 2 shows the performance of MMDF and MMMD relative to that of MMD. The relative measure is computed as the ratio of the medians over 11 initial random numberings.

We also present the geometric mean and the median over all test matrices in the last two lines of the table. The execution time of MMMD matches that of MMD, while MMDF requires on average an overhead of 34% over MMD. Our experiments indicate that avoiding the use of heaps in MMDF (as suggested by a referee) will reduce about a third of this overhead.

Table 1
Performance of MMD; $|L|$ and operations are mean and median values over 11 initial orderings.

                          mean                       median
Problem   rank    |A|     Time   |L|     Operations  Time   |L|     Operations
                  (10^3)  (secs) (10^3)  (10^6)      (secs) (10^3)  (10^6)
3DTUBE    45330   3256.9  10.08  31842   42100       10.08  31842   42101
bcsstk15  3948    117.8   0.61   652     167         0.61   653     168
bcsstk16  4884    290.3   0.42   736     142         0.42   737     143
bcsstk17  10974   428.6   0.97   1134    196         0.97   1135    197
bcsstk18  11948   140.1   1.01   644     130         1.01   644     131
bcsstk23  3134    45.2    0.62   454     139         0.62   455     139
bcsstk25  15439   252.2   1.88   1536    342         1.88   1537    342
bcsstk29  13992   619.5   1.70   1744    425         1.70   1744    425
bcsstk30  28924   2043.5  2.89   3855    935         2.89   3856    935
bcsstk31  35588   1181.4  4.78   5263    2484        4.78   5263    2485
bcsstk32  44609   2014.7  5.25   5227    1096        5.25   5228    1096
bcsstk33  8738    591.9   1.32   2640    1302        1.32   2640    1302
bcsstk35  30237   1480.4  2.80   2734    389         2.80   2734    389
bcsstk36  23052   1166.2  1.56   2760    609         1.56   2761    609
bcsstk37  25503   1166.5  1.77   2834    559         1.77   2834    560
bcsstk38  8032    363.5   1.26   747     121         1.26   747     122
bcsstk39  46772   2089.3  8.12   7644    2191        8.12   7645    2191
bikker2   173160  854.9   43.27  100869  195802      43.27  100870  195802
cfd1      70656   1899.0  19.38  39959   46935       19.38  39959   46935
cfd2      123440  3211.3  36.84  90609   178166      36.84  90610   178166
copter2   55476   759.9   10.79  14182   12620       10.79  14183   12620
crystk01  4875    320.7   0.46   1082    337         0.46   1083    338
crystk02  13965   982.5   1.84   6123    4339        1.84   6124    4340
crystk03  24696   1775.8  3.92   13943   13050       3.92   13944   13051
flap      51537   1010.7  5.77   5630    1904        5.77   5630    1905
ford2     100196  544.7   8.57   2448    308         8.57   2448    309
gearbox   153746  9234.1  40.52  52972   57327       40.52  52973   57328
msc10848  10848   1240.6  1.07   2028    576         1.07   2028    576
msc23052  23052   1177.9  1.57   2748    607         1.57   2748    608
pwt       36519   326.1   2.51   1768    224         2.51   1768    225
sphere6   16386   114.7   0.57   795     133         0.57   796     133
struct1   46949   2329.5  4.13   5067    1298        4.13   5068    1299
struct2   73752   3670.9  7.74   9810    3817        7.74   9810    3817
struct3   53570   1227.3  6.36   5309    1215        6.36   5309    1216
struct4   4350    242.1   2.07   2248    1756        2.07   2248    1756
troll     213453  1198.5  42.37  61171   153228      42.37  61171   153228

Table 2
Performance of MMDF and MMMD relative to MMD. For each problem we report the ratio of median values over 11 initial random orderings.

Problem   Ordering time   |L|             Operations
          MMDF    MMMD    MMDF    MMMD    MMDF    MMMD
3DTUBE    1.50    0.99    0.87    0.86    0.79    0.77
bcsstk15  0.77    0.93    0.89    0.92    0.76    0.83
bcsstk16  1.64    1.50    0.93    0.93    0.84    0.85
bcsstk17  1.21    1.11    1.00    0.98    0.97    0.95
bcsstk18  1.04    1.06    0.90    0.93    0.76    0.82
bcsstk23  0.79    1.05    0.87    0.93    0.77    0.88
bcsstk25  1.06    1.10    0.87    0.92    0.72    0.83
bcsstk29  1.08    1.05    0.94    0.95    0.82    0.83
bcsstk30  1.41    1.17    0.94    0.97    0.83    0.93
bcsstk31  1.36    1.18    0.92    0.95    0.81    0.88
bcsstk32  1.64    1.36    0.96    0.97    0.85    0.89
bcsstk33  1.10    1.17    0.90    0.89    0.77    0.76
bcsstk35  1.64    1.34    1.02    1.00    0.98    0.98
bcsstk36  1.68    1.47    1.01    1.00    0.98    0.96
bcsstk37  1.72    1.50    0.98    0.98    0.89    0.91
bcsstk38  1.02    1.21    0.99    1.00    0.94    1.00
bcsstk39  1.59    1.09    1.08    1.02    1.32    1.08
bikker2   2.02    1.24    0.86    0.86    0.69    0.71
cfd1      1.26    1.01    0.79    0.82    0.62    0.69
cfd2      1.25    0.99    0.80    0.80    0.65    0.69
copter2   1.19    1.05    0.80    0.86    0.66    0.75
crystk01  1.33    1.22    0.91    0.89    0.80    0.77
crystk02  1.23    1.04    0.86    0.83    0.72    0.68
crystk03  1.25    1.04    0.89    0.80    0.79    0.64
flap      1.50    1.06    0.93    0.91    0.81    0.78
ford2     1.30    1.06    0.92    0.95    0.73    0.81
gearbox   1.58    1.13    0.91    1.00    0.77    1.17
msc10848  1.56    1.33    0.95    0.96    0.85    0.88
msc23052  1.69    1.52    1.00    0.99    0.91    0.94
pwt       1.48    1.03    0.94    0.97    0.85    0.92
sphere6   1.74    1.04    0.89    0.96    0.76    0.93
struct1   1.72    1.08    0.95    0.96    0.85    0.90
struct2   1.80    1.06    0.97    0.98    0.94    0.93
struct3   1.28    1.05    0.95    0.96    0.87    0.90
struct4   0.73    0.99    0.80    0.84    0.65    0.70
troll     1.13    1.04    0.80    0.85    0.65    0.73
g-mean    1.33    1.13    0.91    0.92    0.80    0.84
median    1.34    1.07    0.92    0.95    0.80    0.86
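The aggregate measures used in this section (a per-problem ratio of medians over the initial numberings, followed by the geometric mean and the median across problems) can be sketched with the Python standard library. The operation counts below are made up for illustration and are not taken from the tables; the function names are hypothetical. The "median of ratios" used in the earlier report is included for contrast.

```python
from statistics import geometric_mean, median

def ratio_of_medians(new_runs, base_runs):
    """Median over runs of the new method divided by median of the baseline."""
    return median(new_runs) / median(base_runs)

def median_of_ratios(new_runs, base_runs):
    """Median of the per-run ratios (runs paired by initial numbering)."""
    return median([n / b for n, b in zip(new_runs, base_runs)])

# Hypothetical operation counts for one problem over three initial numberings.
base_ops = [100, 200, 400]     # baseline ordering
new_ops = [90, 150, 380]       # alternative ordering

print(ratio_of_medians(new_ops, base_ops))   # 0.75
print(median_of_ratios(new_ops, base_ops))   # 0.9

# Aggregating hypothetical per-problem ratios across a test suite.
per_problem = [0.75, 0.80, 0.98]
print(median(per_problem))                   # 0.8
print(geometric_mean(per_problem))
```

The two per-problem measures can differ noticeably on the same data, which is one reason results reported with different aggregates are hard to compare directly.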

Results for variants of MMDF and MMMD are summarized in Table 3. The approximate versions of MMDF and MMMD were based on our implementation of AMD and did not include features such as "aggressive absorption." The approximate version of MMDF/MMMD performs equally well; the geometric mean and the median are the same as those for MMDF/MMMD. For both MMDF and MMMD (and their approximate versions), adding Ashcraft's initial compression step [2] improved the performance slightly (1% on the average).

Table 3
Summary of performance of MMDF and MMMD variants relative to MMD. The geometric mean and median over all problems in the test suite are based on the ratio of median values over 11 initial random orderings for each problem.

                        |L|               Operations
Method                  g-mean  median    g-mean  median
Without initial compression
  MMDF                  .91     .92       .80     .80
  approximate-MMDF      .91     .92       .80     .80
  MMMD                  .92     .95       .84     .86
  approximate-MMMD      .92     .95       .84     .86
With initial compression
  MMDF                  .90     .90       .79     .79
  approximate-MMDF      .90     .91       .79     .79
  MMMD                  .92     .95       .84     .85
  approximate-MMMD      .92     .95       .84     .86

5. Conclusions.

We have developed two new greedy heuristics: \modied minimum deciency" (MMDF) and \modied multipleminimumdegree" (MMMD). Both these schemes produce orderings that are better than MMD orderings. The rst scheme MMDF produces orderings that require approximately 21% fewer oating-point operations for factorization than MMD, while the second scheme MMMD generates orderings that incurs 15% fewer operations for factorization than MMD. MMDF uses a deciency-like metric, i.e., a metric whose value is a quadratic function of the degree. The execution time of MMDF is approximately 1^:3 times that of MMD. On the other hand, MMMD uses a degree-like metric, which is bounded above by twice the value of the degree. MMMD is the same as MMD but for the dierence in the choice of the metric. Consequently, the ordering time of MMMD is very similar to that of MMD. Furthermore, there is no change in the quality of the orderings when MMDF (MMMD) is implemented using \approximate degree" [1] framework.

For completeness, Table 4 summarizes the performance of our schemes, as well as those in [20]. It appears that the performance of MMMD and "approximate minimum fill" (AMF1 [19] and AMF [20]) is similar. Likewise, MMDF and "approximate mean increase in neighbor degree" (AMIND [20]) produce orderings of similar quality. "Approximate mean minimum fill" (AMMF [20]) appears to be slightly better than MMDF. Relative to MMD, AMMF orderings require 25% (geometric mean) to 22% (median) fewer operations for factorization, while MMDF (with compression) orderings require 21% (geometric mean and median) fewer operations for factorization.


Table 4

Summary of operation counts to factor for MMDF, MMMD, AMF, AMMF, and AMIND relative to MMD (with initial compression).

              Ng-Raghavan         Rothberg-Eisenstat
Measure      MMDF    MMMD       AMF1    AMMF    AMIND

g-mean       .79     .84        .85     .75     .79
median       .79     .85        .85     .78     .80
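The percentages quoted in the text follow directly from the relative operation counts in Table 4: a ratio r relative to MMD corresponds to (1 - r) x 100 percent fewer operations. The short sketch below makes that conversion explicit; the helper name `percent_fewer` is hypothetical, and the numbers are copied from Table 4.

```python
def percent_fewer(ratio):
    """Convert an operation count relative to MMD into percent fewer operations."""
    return round((1.0 - ratio) * 100)


# Relative operation counts from Table 4 (with initial compression).
table4 = {
    "g-mean": {"MMDF": 0.79, "MMMD": 0.84, "AMF1": 0.85, "AMMF": 0.75, "AMIND": 0.79},
    "median": {"MMDF": 0.79, "MMMD": 0.85, "AMF1": 0.85, "AMMF": 0.78, "AMIND": 0.80},
}

for measure, row in table4.items():
    for method, ratio in row.items():
        print(f"{measure:7s} {method:6s} {percent_fewer(ratio):3d}% fewer operations than MMD")
```

For example, the AMMF entries .75 and .78 give the 25% and 22% figures, and the MMDF entry .79 gives 21%.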

Our work is an attempt to understand factors affecting the performance of greedy ordering heuristics. We tried several metrics that are close to those in MMDF and MMMD. Many of these had average operation counts for factorization similar to those reported for MMDF and MMMD, while others varied substantially. We also experimented with a variant of MMDF that did update the metric for "neighbors of neighbors" as in true minimum deficiency. Surprisingly, the operation counts were on the average higher by 3-4% for this variant. The performance of true minimum deficiency shows that deficiency is a better metric than the degree. However, we surmise that the improved performance of our heuristics is from the complicated interplay of the metric and the greedy process, and not necessarily from accurately modeling the true deficiency. We conjecture that there could well be other relatively inexpensive greedy strategies that significantly outperform the ones known so far.
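To make the distinction between the two underlying metrics concrete, the following sketch runs a greedy elimination on an explicit elimination graph with a pluggable metric: the degree (number of neighbors of the candidate vertex) or the true deficiency (number of fill edges its elimination would create). It is only an illustration under simplifying assumptions. It is not MMDF or MMMD, it uses no quotient graph, multiple elimination, or approximate degrees, and the names `degree`, `deficiency`, and `greedy_order` are hypothetical.

```python
from itertools import combinations


def degree(adj, v):
    """Minimum degree metric: number of neighbors of v in the elimination graph."""
    return len(adj[v])


def deficiency(adj, v):
    """Minimum deficiency metric: fill edges created if v were eliminated now."""
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])


def greedy_order(n, edges, metric):
    """Greedily eliminate the vertex with the smallest metric value.

    Ties are broken by the smallest vertex index. Returns (order, fill),
    where fill is the total number of fill edges introduced.
    """
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    order, fill = [], 0
    while adj:
        v = min(adj, key=lambda u: (metric(adj, u), u))
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v makes its remaining neighbors pairwise adjacent.
        for a, b in combinations(nbrs, 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill += 1
        order.append(v)
    return order, fill


# Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined through vertex 8.
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 8), (4, 8)]

print("minimum degree fill:    ", greedy_order(9, edges, degree)[1])
print("minimum deficiency fill:", greedy_order(9, edges, deficiency)[1])
```

On this toy graph minimum degree eliminates vertex 8 first (it has the smallest degree) and creates one fill edge between vertices 3 and 4, while minimum deficiency creates none. A single small example says nothing about average behavior on a test suite; it only shows how the choice of metric changes the greedy process. Recomputing the deficiency from scratch at every step, as done here, is also far more expensive than the degree updates used in practical implementations.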

REFERENCES

[1] P. Amestoy, T. A. Davis, and I. S. Duff, An approximate minimum degree ordering algorithm, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 886-905.

[2] C. Ashcraft, Compressed graphs and the minimum degree algorithm, SIAM J. Sci. Comput., 16 (1995), pp. 1404-1411.

[3] C. Ashcraft and J. W. H. Liu, Robust orderings of sparse matrices using multisection, Tech. Rep. ISSTECH-96-002, Boeing Computer Services, Seattle, WA, 1996.

[4] P. Berman and G. Schnitger, On the performance of the minimum degree ordering for Gaussian elimination, SIAM J. Matrix Anal. Appl., 11 (1990), pp. 83-88.

[5] E. Cuthill, Several strategies for reducing bandwidth of matrices, in Sparse Matrices and their Applications, D. J. Rose and R. A. Willoughby, eds., New York, 1972, Plenum Press.

[6] E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in Proceedings 24th ACM National Conference, Aug. 1969, pp. 157-172.

[7] I. Duff, A. Erisman, and J. Reid, Direct Methods for Sparse Matrices, Oxford University Press, Oxford, England, 1987.

[8] A. George, Computer Implementation of the Finite Element Method, PhD thesis, Dept. of Computer Science, Stanford University, 1971.

[9] A. George and J. W.-H. Liu, An automatic nested dissection algorithm for irregular finite element problems, SIAM J. Numer. Anal., 15 (1978), pp. 1053-1069.

[10] A. George and J. W.-H. Liu, A quotient graph model for symmetric factorization, in Sparse Matrix Proceedings 1978, I. S. Duff and G. W. Stewart, eds., Philadelphia, 1979, SIAM, pp. 154-175.

[11] A. George and J. W.-H. Liu, A fast implementation of the minimum degree algorithm using quotient graphs, ACM Trans. Math. Software, 6 (1980), pp. 337-358.

[12] A. George and J. W.-H. Liu, A minimal storage implementation of the minimum degree algorithm, SIAM J. Numer. Anal., 17 (1980), pp. 282-299.

[13] A. George and J. W.-H. Liu, Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1981.

[14] A. Gupta, Fast and effective algorithms for graph partitioning and sparse matrix ordering, Tech. Rep. RC-20496, IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY 10598, 1996.

[15] B. Hendrickson and E. Rothberg, Improving the runtime and quality of nested dissection ordering, Tech. Rep., Sandia National Laboratories, Albuquerque, NM 87185, 1996.


[16] J. W.-H. Liu, Modification of the minimum degree algorithm by multiple elimination, ACM Trans. Math. Software, 11 (1985), pp. 141-153.

[17] E. G.-Y. Ng and P. Raghavan, Performance of greedy ordering heuristics for sparse Cholesky factorization, Tech. Rep. CS-97-XX, University of Tennessee, Knoxville, TN, 1997.

[18] D. Rose, A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations, in Graph Theory and Computing, R. C. Read, ed., Academic Press, 1972, pp. 183-217.

[19] E. Rothberg, Ordering sparse matrices using approximate minimum local fill. April 1996.

[20] E. Rothberg and S. Eisenstat, Node selection strategies for bottom-up sparse matrix ordering. April 1997.

[21] W. Tinney and J. Walker, Direct solution of sparse network equations by optimally ordered triangular factorization, Proc. IEEE, 55 (1967), pp. 1801-1809.

[22] M. Yannakakis, Computing the minimum fill-in is NP-complete, SIAM J. Alg. Disc. Meth., 2 (1981), pp. 77-79.