
6.3 Proposed reordering method

6.3.2 Row ordering

Algorithm 7: BRGC(CRS(S), CCS(S), m, n, b, t)

Input: CRS(S) and CCS(S) are the CRS and CCS representations of a sparse matrix S that has m rows and n columns; b is a positive integer, and t is a real number with 0 < t < 1.

Output: CRS(S), the CRS representation of S after reordering of its rows and columns.

1  compute A^0 considering each column of S as a binary object;
2  compute A^1 from A^0 using Algorithm 5 with the modifications stated in Proposition 19;
3  update CRS(S) and CCS(S) according to the column permutation found in A^1, making sure that the column (row) indices for each row (column) in CRS (CCS) are sorted in ascending order;
4  Γ = RowOrdering(A^1, CRS(S), CCS(S), m);
5  update c_i.v (for i = 0, ..., n−1, where c_i is the i-th column of S after the initial column permutation), CRS(S) and CCS(S) according to the row permutation found in Γ;
6  merge(A^1, b, m);
7  k = 1;
8  while true do
9      break if the number of lists in A^k is greater than tn;
10     create A^{k+1} from A^k using Algorithm 5 with the modifications stated in Proposition 19;
11     k = k + 1;
12 update CRS(S) according to the column permutation found in A^k;
13 return CRS(S);

After the initial column ordering, we perform a row permutation, described in Algorithm 8. The intention of this row permutation is to bring more non-zeros into regular patterns, which may reduce cache misses during SpMxV. We stress that not all rows participate in this row permutation. To be more precise, we need the following notion.

The function SelectedRows(A^1, m) returns precisely the set R of rows of S that participate in creating A^1 and the set R̄ of rows that do not. By participation, we mean that a row participates in A^1 if and only if at least one non-zero of that row participates in the stable sorting routine that is required to create L′ from L (see Section 5.4.1). Both R and R̄ are subsets of {r_0, ..., r_{m−1}}, and R̄ is defined as {r_0, ..., r_{m−1}} − R.

The time complexity of SelectedRows(A^1, m) is O(m). Figure 6.1 shows the rows in R.
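A minimal Python sketch of SelectedRows follows. It assumes that the participation of each row has already been recorded as a boolean flag; the flag array `participates` and the function name `selected_rows` are hypothetical, since the text derives participation from the stable sort of Section 5.4.1. The sketch shows why the split costs O(m): every row is examined once, and R̄ is simply the complement of R.

```python
from typing import List, Tuple


def selected_rows(participates: List[bool]) -> Tuple[List[int], List[int]]:
    """Split the row indices 0..m-1 into R (participating) and R-bar (the rest).

    `participates` is a hypothetical input: participates[i] is True iff at
    least one non-zero of row i took part in the stable sort that builds A^1.
    Computing these flags is outside this sketch.
    """
    r: List[int] = []
    r_bar: List[int] = []
    for i, flag in enumerate(participates):  # a single pass over the m rows: O(m)
        (r if flag else r_bar).append(i)
    return r, r_bar


# Rows 0 and 2 participate, rows 1 and 3 do not.
print(selected_rows([True, False, True, False]))  # ([0, 2], [1, 3])
```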

Algorithm 8 calls Algorithm 9 and returns the row permutation Γ. Algorithm 9 first determines an ordering for the rows in R̄ and then computes an ordering for the rows in R. The former step is done in a straightforward manner in Lines 1 to 4 of Algorithm 9; the latter step is detailed below.

We first need the following definitions.

Definition 3. Let R_i(j) be the column index of the j-th non-zero element in the i-th row of S, for 0 ≤ i < m and 0 ≤ j < n. Getting R_i(j) for a single non-zero from the CRS representation of a sparse matrix is expensive. However, we can identify the column indices of all non-zeros of S one by one from the CRS representation, which requires O(τ) bit operations.
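The one-by-one scan mentioned in Definition 3 can be sketched in Python over a plain array-based CRS layout with hypothetical arrays `row_ptr` and `col_idx`. This layout is an assumption and is simpler than the representation the text has in mind: here a single R_i(j) is just `col_idx[row_ptr[i] + j]`, so the sketch only illustrates the row-by-row traversal, which touches every stored non-zero exactly once.

```python
from typing import Iterator, List, Tuple


def enumerate_nonzeros(row_ptr: List[int], col_idx: List[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, R_i(j)) for every non-zero of S, row by row.

    Assumes a plain array-based CRS: the non-zeros of row i are stored at
    positions row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx.
    """
    for i in range(len(row_ptr) - 1):
        for j, p in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            yield i, j, col_idx[p]  # each stored non-zero is visited exactly once


# 2 x 3 example: row 0 has non-zeros in columns 0 and 2, row 1 in column 1.
print(list(enumerate_nonzeros([0, 2, 3], [0, 2, 1])))
# [(0, 0, 0), (0, 1, 2), (1, 0, 1)]
```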

Definition 4. We say that a list A^1_h of A^1 is the owner of the non-zero referred to by R_i(j) if the non-zero belongs to a column in A^1_h. Furthermore, we call A^1_h the winner of the i-th row if no other list in A^1 owns more non-zeros from the i-th row than A^1_h. In Algorithm 9, we use a routine owner(R_i(j), A^1), which returns the owner of the non-zero referred to by R_i(j).
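Definition 4 can be made concrete with a short Python sketch. It assumes a hypothetical table `list_of_col` that maps each column index to the index h of the list A^1_h containing it, so that owner() becomes a constant-time lookup. A winner is then found by counting the owners of a row's non-zeros. The helper names and the tie-breaking rule are illustrative assumptions, not necessarily the choices made in Algorithm 9.

```python
from collections import Counter
from typing import List


def owner(col: int, list_of_col: List[int]) -> int:
    """Index h of the list A^1_h that contains column `col`.

    `list_of_col` is a hypothetical lookup table built once from A^1.
    """
    return list_of_col[col]


def winner(i: int, row_ptr: List[int], col_idx: List[int], list_of_col: List[int]) -> int:
    """Index of a list of A^1 that owns the most non-zeros of row i.

    Ties are broken towards the smallest list index (an assumption; Definition 4
    accepts any list with the maximal count). Returns -1 for an empty row.
    """
    counts = Counter(owner(col_idx[p], list_of_col) for p in range(row_ptr[i], row_ptr[i + 1]))
    if not counts:
        return -1
    best = max(counts.values())
    return min(h for h, c in counts.items() if c == best)


# A^1_0 = {columns 0, 1}, A^1_1 = {columns 2, 3, 4}.
list_of_col = [0, 0, 1, 1, 1]
row_ptr, col_idx = [0, 3, 5], [0, 2, 3, 1, 4]  # row 0: cols 0, 2, 3; row 1: cols 1, 4
print(winner(0, row_ptr, col_idx, list_of_col))  # 1 (two non-zeros against one)
print(winner(1, row_ptr, col_idx, list_of_col))  # 0 (tie, smallest index wins)
```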

Now we are ready to describe Lines 5 to 21 of Algorithm 9. At Line 5, we initialize an array of queues called winlist, whose size equals the number of lists in A^1.

Because the column indices of the non-zero entries of each row in CRS are in ascending order, after the execution of the for-loop (Lines 6 to 21), the queue winlist[k] contains the indices of all rows won by the list A^1_k, for every list index k. Lines 22 to 25 determine the ordering of the rows in R as follows. Let i and j be the indices of two distinct rows in R that are won by two different lists A^1_{s_1} and A^1_{s_2}, respectively, where 0 ≤ s_1 < s_2. Then i appears before j in Γ. Rows that are won by the same list can be placed in arbitrary order in Γ, though our algorithm dedicates one queue per list of A^1 for this purpose. The time complexity of Algorithm 9 is O(τ).
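The ordering rule can be sketched in Python as follows, with the winners taken as already computed (for example, by counting owners as in Definition 4). The sketch places R̄ ahead of R and keeps the rows of R̄ in their original order. Both choices are assumptions that stand in for Lines 1 to 4 of Algorithm 9, which are not reproduced here. The names `order_rows`, `win`, and `num_lists` are hypothetical.

```python
from collections import deque
from typing import Deque, List


def order_rows(r_bar: List[int], r: List[int], win: List[int], num_lists: int) -> List[int]:
    """Build the row permutation Gamma (Gamma[new position] = old row index).

    Assumptions: the rows of R-bar keep their original relative order and are
    placed first; win[i] is the index of the winning list of row i for every
    row i in R, with 0 <= win[i] < num_lists.
    """
    gamma = list(r_bar)
    # One FIFO queue per list of A^1, playing the role of winlist.
    winlist: List[Deque[int]] = [deque() for _ in range(num_lists)]
    for i in r:
        winlist[win[i]].append(i)
    # Emptying the queues in increasing list order puts every row won by
    # A^1_{s1} before every row won by A^1_{s2} whenever s1 < s2.
    for k in range(num_lists):
        while winlist[k]:
            gamma.append(winlist[k].popleft())
    return gamma


# Row 1 is in R-bar; rows 0 and 3 are won by list 1, row 2 by list 0.
print(order_rows([1], [0, 2, 3], [1, -1, 0, 1], 2))  # [1, 2, 0, 3]
```

Because each row of R is enqueued once and dequeued once, the work is linear in |R| plus the number of lists.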

Each of the data structures used in row ordering requires O(τ) words of storage. Figure 6.2 shows what a sparse matrix looks like after row permutation.
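Applying Γ to CRS(S), as Line 5 of Algorithm 7 does, amounts to copying whole rows in the order given by Γ. The following minimal Python sketch assumes a plain array-based CRS (`row_ptr`, `col_idx`, `vals`) and the convention Γ[new row] = old row. Both are assumptions, and `permute_crs` is a hypothetical name. The optional `col_pos` argument also covers a column permutation like the one in Line 3 of Algorithm 7, re-sorting each row so that its column indices remain ascending, as that step requires. Updating CCS(S) and c_i.v is not covered.

```python
from typing import List, Optional, Tuple


def permute_crs(
    row_ptr: List[int],
    col_idx: List[int],
    vals: List[float],
    gamma: List[int],
    col_pos: Optional[List[int]] = None,
) -> Tuple[List[int], List[int], List[float]]:
    """Return the CRS arrays of S after reordering its rows (and optionally columns).

    gamma[new_row] = old_row, i.e. the rows in the order in which they appear in Gamma.
    col_pos[old_col] = new_col is an optional column relabelling; when given, the
    column indices of each row are re-sorted so they stay in ascending order.
    """
    new_ptr = [0]
    new_idx: List[int] = []
    new_val: List[float] = []
    for old in gamma:
        start, end = row_ptr[old], row_ptr[old + 1]
        entries = list(zip(col_idx[start:end], vals[start:end]))
        if col_pos is not None:
            # Column indices within a row are distinct, so this sorts by column only.
            entries = sorted((col_pos[c], v) for c, v in entries)
        for c, v in entries:
            new_idx.append(c)
            new_val.append(v)
        new_ptr.append(len(new_idx))
    return new_ptr, new_idx, new_val


# 2 x 3 example: row 0 = {0: 1.0, 2: 2.0}, row 1 = {1: 3.0}; swap the two rows.
print(permute_crs([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], [1, 0]))
# ([0, 1, 3], [1, 0, 2], [3.0, 1.0, 2.0])
```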

Algorithm 8: RowOrdering(A^1, CRS(S), CCS(S), m)

Input: A^1, the list of lists of columns of S described in Definition 2; CRS(S) and CCS(S), the CRS and CCS representations of the sparse matrix S, respectively; and m, the row dimension of S.

Output: Γ, the row permutation vector.

1  [R, R̄] = SelectedRows(A^1, m);
2  return RowPerm(A^1, R, R̄, CRS(S));