Background - Hardware Acceleration Technologies in Computer Algebra: Challenges and Impact

In this section, we review some of the data structures and introduce some notations used in this chapter.

6.2.1 Compressed row storage scheme (CRS)

Storage schemes used for unstructured sparse matrices usually involve some form of indirect indexing of their non-zero elements via auxiliary data structures. For example, the compressed row storage (CRS) scheme [4] uses two auxiliary arrays, colind of length τ (the number of non-zero elements) and rowptr of length m+1, where m is the number of rows of S. This is the most common storage scheme for sparse matrices. The three arrays required to store the sparse matrix S are described below.

1. value: for storing the non-zeros of S row-by-row,

2. colind: for storing the column index of each non-zero, and

3. rowptr: for storing the index of the first non-zero of each row in the value array.
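As a minimal sketch of the three arrays above, the following Python function builds value, colind and rowptr from a small dense matrix. The function name dense_to_crs and the list-of-lists input format are assumptions made for illustration only; they do not come from the chapter.

```python
def dense_to_crs(S):
    """Build the CRS arrays (value, colind, rowptr) from a dense matrix.

    S is assumed to be a list of m rows, each a list of n numbers.
    """
    value, colind, rowptr = [], [], [0]
    for row in S:
        for j, s in enumerate(row):
            if s != 0:
                value.append(s)   # non-zeros, stored row by row
                colind.append(j)  # column index of that non-zero
        # rowptr[i + 1] is where row i ends and row i + 1 begins in value.
        rowptr.append(len(value))
    return value, colind, rowptr


S = [
    [5, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 4, 0, 6],
]
value, colind, rowptr = dense_to_crs(S)
print(value)   # [5, 2, 3, 1, 4, 6]
print(colind)  # [0, 3, 2, 0, 1, 3]
print(rowptr)  # [0, 2, 3, 3, 6]
```

Note that rowptr has m+1 entries, so an empty row (row 2 here) simply repeats the previous offset.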

6.2.2 SpMxV with CRS scheme

Pseudocode for computing y = Sx under the CRS scheme is given in Algorithm 6. In this algorithm, accesses to the vector y and to all three CRS arrays are regular. The accesses to the vector x, however, may be irregular because the column indices within a row need not be consecutive. Accessing x can therefore incur a large number of cache misses, which may make SpMxV very slow in practice.

6.2.3 Compressed column storage scheme (CCS)

This scheme is the same as CRS except that the non-zeros are stored column-by-column. Like CRS, the compressed column storage (CCS) scheme uses three arrays to store the sparse matrix S, as described below.

1. value: for storing the non-zeros of S column-by-column,

2. rowind: for storing the row index of each non-zero, and

3. colptr: for storing the index of the first non-zero of each column in the value array.
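The column-wise counterpart can be sketched in the same way. The function name dense_to_ccs and the list-of-lists input are again illustrative assumptions, not part of the chapter.

```python
def dense_to_ccs(S):
    """Build the CCS arrays (value, rowind, colptr) from a dense matrix.

    S is assumed to be a list of m rows, each a list of n numbers.
    """
    m = len(S)
    n = len(S[0]) if m else 0
    value, rowind, colptr = [], [], [0]
    for j in range(n):
        for i in range(m):
            if S[i][j] != 0:
                value.append(S[i][j])  # non-zeros, stored column by column
                rowind.append(i)       # row index of that non-zero
        # colptr[j + 1] is where column j ends in value.
        colptr.append(len(value))
    return value, rowind, colptr


S = [
    [5, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 4, 0, 6],
]
value, rowind, colptr = dense_to_ccs(S)
print(value)   # [5, 1, 4, 3, 2, 6]
print(rowind)  # [0, 3, 3, 1, 0, 3]
print(colptr)  # [0, 2, 3, 4, 6]
```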

6.2.4 Notations

We consider a sparse matrix $S$ with arbitrary sparsity structure having $m$ rows $(r_0, \ldots, r_{m-1})$, $n$ columns $(c_0, \ldots, c_{n-1})$ and $\tau$ non-zero elements. Here, $s_{i,j}$ refers to the entry of $S$ which is at the $i$-th row and the $j$-th column. We denote by $\rho_r$ and $\rho_c$ the average number of non-zeros in a row and a column respectively. Let $1/p$ be the probability that an element of $S$ is non-zero. Throughout this chapter, we assume that $m$ and $n$ are positive integers of machine word size, or smaller. We assume that $\tau > m + n$ and $\min(m, n) > p$. Time and space complexity estimates are given for the RAM model with memory holding a finite number of $w$-bit words, for a fixed $w$ [64]. Cache complexity is measured by considering the ideal cache model described in Chapter 2.
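To make the notation concrete, the sketch below computes the two averages and p for given m, n and τ. It assumes that the probability 1/p is estimated by the observed density τ/(mn); that estimate, the function name sparsity_stats and the sample sizes are illustrative assumptions rather than definitions taken from the chapter.

```python
def sparsity_stats(m, n, tau):
    """Return (rho_r, rho_c, p) for an m x n matrix with tau non-zeros.

    Assumption: 1/p is taken to be the observed density tau / (m * n).
    """
    rho_r = tau / m      # average number of non-zeros per row
    rho_c = tau / n      # average number of non-zeros per column
    p = (m * n) / tau    # so that 1/p = tau / (m * n)
    return rho_r, rho_c, p


m, n, tau = 1000, 2000, 10000
rho_r, rho_c, p = sparsity_stats(m, n, tau)
print(rho_r, rho_c, p)   # 10.0 5.0 200.0
print(tau > m + n)       # True: first assumption of this section holds
print(min(m, n) > p)     # True: second assumption of this section holds
```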

Algorithm 6: SpMxV(value, colind, rowptr, x)

Input: value, colind, rowptr are the three arrays that represent S in CRS, and x is a dense vector

Output: vector y, where y = Sx

    for i = 0, 1, ..., m-1 do
        y[i] = 0;
    for i = 0, 1, ..., m-1 do
        for k = rowptr[i] to rowptr[i+1]-1 do
            j = colind[k];
            y[i] += value[k] * x[j];
    return y;
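Algorithm 6 translates almost line for line into runnable code. The following is a sketch in Python; the function name spmxv_crs and the sample data are chosen here for illustration. The indirect read x[j] in the inner loop is the access that the text identifies as potentially irregular.

```python
def spmxv_crs(value, colind, rowptr, x):
    """Compute y = S x for a matrix S stored in CRS (sketch of Algorithm 6)."""
    m = len(rowptr) - 1          # rowptr has m + 1 entries
    y = [0] * m
    for i in range(m):
        # Non-zeros of row i occupy value[rowptr[i] : rowptr[i + 1]].
        for k in range(rowptr[i], rowptr[i + 1]):
            j = colind[k]
            y[i] += value[k] * x[j]   # x is read through colind: irregular
    return y


# CRS arrays of the 4 x 4 matrix
# [[5, 0, 0, 2], [0, 0, 3, 0], [0, 0, 0, 0], [1, 4, 0, 6]]
value = [5, 2, 3, 1, 4, 6]
colind = [0, 3, 2, 0, 1, 3]
rowptr = [0, 2, 3, 3, 6]
x = [1, 2, 3, 4]
y = spmxv_crs(value, colind, rowptr, x)
print(y)  # [13, 9, 0, 33]
```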

6.2.5 Binary reflected Gray code

A $q$-bit binary reflected Gray code [43] is a Gray code denoted by $G^q$ and defined by $G^1 = [0, 1]$ and

$$G^q = [\,0G^{q-1}_0, \ldots, 0G^{q-1}_{2^{q-1}-1}, 1G^{q-1}_{2^{q-1}-1}, \ldots, 1G^{q-1}_0\,], \quad \text{for } q > 1,$$

where $G^q_i$ is the $i$-th binary string of $G^q$ and $0 \le i < 2^q$. We call $i$ the rank of $G^q_i$ in $G^q$. For example, $G^2 = [00, 01, 11, 10]$ and $G^3 = [000, 001, 011, 010, 110, 111, 101, 100]$. So, the rank of 011 in $G^3$ is 2. For details please see [43].
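The recursive definition can be checked with a short Python sketch. The function names brgc and rank are illustrative assumptions; rank uses the standard Gray-to-binary decoding (a running XOR over the bits), which is a well-known property of this code rather than something stated in the chapter.

```python
def brgc(q):
    """Return the q-bit binary reflected Gray code as a list of bit strings."""
    if q < 1:
        raise ValueError("q must be at least 1")
    codes = ["0", "1"]  # G^1
    for _ in range(q - 1):
        # Prefix 0 to the previous list, then prefix 1 to its reflection.
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes


def rank(code):
    """Return the rank of a bit string in its Gray code, by running XOR."""
    r, bit = 0, 0
    for ch in code:
        bit ^= int(ch)       # parity of the bits read so far
        r = (r << 1) | bit   # that parity is the next binary digit of the rank
    return r


print(brgc(2))      # ['00', '01', '11', '10']
print(brgc(3))      # ['000', '001', '011', '010', '110', '111', '101', '100']
print(rank("011"))  # 2
```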

6.2.6 Sorting of binary reflected Gray codes

In this chapter, we develop a new row and column permuting algorithm based on binary reflected Gray code for sparse matrices. We call it BRGC ordering. For our proposed reordering algorithm, we consider each non-zero of $S$ as 1. We also consider each column of $S$ as a binary reflected Gray code in $G^m$. Like in Section 5.2, we consider the bits from row 0 and row $m-1$ as the most and least significant bits respectively.
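To illustrate only the view of columns as Gray codes, the sketch below computes the rank of each column of a small pattern matrix and orders the columns by descending rank with Python's built-in sort. This is a reference check under stated assumptions, not the sorting algorithm of Chapter 5 and not the full reordering algorithm; the names column_rank and reference_column_order and the dense input format are hypothetical.

```python
def column_rank(S, j):
    """Rank of column j of S, read as a Gray code with row 0 as the MSB.

    Every non-zero entry is treated as the bit 1.
    """
    r, bit = 0, 0
    for row in S:
        bit ^= 1 if row[j] != 0 else 0  # running XOR decodes the Gray code
        r = (r << 1) | bit
    return r


def reference_column_order(S):
    """Column indices sorted by descending rank (built-in sort, for checking)."""
    n = len(S[0])
    return sorted(range(n), key=lambda j: column_rank(S, j), reverse=True)


S = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]
# Columns read top to bottom: 100, 011, 001, 110.
print([column_rank(S, j) for j in range(4)])  # [7, 2, 1, 4]
print(reference_column_order(S))              # [0, 3, 1, 2]
```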

In this section, we explain how binary reflected Gray codes can be sorted in descending order of their ranks by our proposed sorting algorithm described in Chapter 5. From the definition of binary reflected Gray code in Section 6.2.5, we obtain Corollary 3, which is the basis for sorting binary reflected Gray codes.

Corollary 3. Let $G^q_i$ and $G^q_j$ be two different binary reflected Gray codes in $G^q$. Let their first disagree bit (see Section 5.2) be $h$ for $0 \le h < q$. Assume that the $h$-th bit of $G^q_j$ is 1. If the number of 1s in $G^q_i$ or $G^q_j$ before the $h$-th bit is even (odd), we can conclude $j > i$ ($i > j$).
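The comparison rule of Corollary 3 can be sketched as a small Python predicate. The function name rank_less and the use of bit strings with index 0 as the most significant bit are assumptions made for this illustration.

```python
def rank_less(a, b):
    """Return True if rank(a) < rank(b), for distinct Gray codes of equal length.

    Only the first disagree bit h and the parity of the 1s before it are used,
    as in Corollary 3. Index 0 is taken to be the most significant bit.
    """
    if len(a) != len(b) or a == b:
        raise ValueError("expected two distinct codes of equal length")
    h = next(k for k in range(len(a)) if a[k] != b[k])  # first disagree bit
    ones_before = a[:h].count("1")   # identical in a and b before position h
    parity_even = ones_before % 2 == 0
    # Even parity: the code holding 1 at position h has the larger rank.
    # Odd parity: the code holding 1 at position h has the smaller rank.
    return (b[h] == "1") == parity_even


print(rank_less("011", "010"))  # True: the ranks in G^3 are 2 and 3
print(rank_less("100", "000"))  # False: the ranks in G^3 are 7 and 0
```

In the first call the codes first disagree at bit 2, exactly one 1 precedes that bit, and 011 holds the 1 there, so by the odd case its rank is the smaller one.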

Proposition 19 describes how we can modify our proposed sorting algorithm in Chapter 5 to sort binary reflected Gray codes according to their ranks.

Proposition 19. Our proposed sorting algorithm in Chapter 5 can sort binary reflected Gray codes in descending order according to their ranks with one modification.

While creating $A_{k+1}$ from $A_k$ (see Section 5.4.1), we need to form array $L$ and apply a stable sort algorithm on $L$ to obtain $L'$ in ascending order only when $k$ is even.

Proof. It follows from Corollary 3.

Note that we do not use a well-established sorting algorithm, such as quick sort, for this purpose, even though an off-the-shelf implementation such as the one available in the C++ STL could be used. The reasons for not using such techniques are given below.

1. Our reordering algorithm, which is described later in this chapter, is not just a sorting of columns by their ranks in binary reflected Gray codes.

2. As we have already seen in Chapter 5, our proposed sorting algorithm is suitable for sparse objects.
