
3 The Many Decoding Algorithms for Reed – Solomon Codes

3.10 Systolic and pipelined algorithms

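\[
\begin{bmatrix} \Gamma^{(1)}(x) \\ A^{(1)}(x) \end{bmatrix}
=
\begin{bmatrix} 1 & -\delta_1 x \\ \epsilon_1\delta_1^{-1} & (1-\epsilon_1)x \end{bmatrix}
\begin{bmatrix} 0 \\ -x^{-1} \end{bmatrix}.
\]

Because $E_0 = \delta_1$, this reduces to $\Gamma^{(1)}(x) = E_0$, and

\[
A^{(1)}(x) =
\begin{cases}
-1 & \text{if } E_0 = 0 \\
0 & \text{if } E_0 \neq 0,
\end{cases}
\]

as required for the first iteration. Thus the first iteration is correct, and iteration $r$ is correct if iteration $r-1$ is correct.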

3.10 Systolic and pipelined algorithms

The performance of a fast algorithm for decoding is measured by its computational complexity, which can be defined in a variety of ways. The most evident way to define the computational complexity of an algorithm is by its total number of elementary arithmetic operations. These are the four operations of addition, multiplication, subtraction, and division. In a large problem, however, these elementary operations may be less significant sources of complexity than the pattern of data movement between operations. The complexity of such movement of data, however, is hard to quantify.

A systolic algorithm is one in which the computations can be partitioned into small repetitive pieces, which will be called cells. The cells are arranged in a regular, usually square, array. If, further, the cells can be arranged as a one-dimensional array with data transferred in only one direction along the array, the algorithm would instead be called a pipelined algorithm. During one iteration of a systolic algorithm, each cell is allowed to exchange data with neighboring cells, but a cell is not normally allowed to exchange any (or much) data with distant cells. The complexity of a cell is considered to be less important than the interaction between cells. In this sense, a computational algorithm has a structure that may be regarded as something like a topology. In such a situation, the topology of the computation may be of primary importance, while the number of multiplications and additions may be of secondary importance.

We shall examine the structure of the Berlekamp–Massey algorithm and the Sugiyama algorithm from this point of view. The Berlekamp–Massey algorithm and the Sugiyama algorithm solve the same system of equations, so one may inquire whether the two algorithms have a common structure. We shall see in this section that the two algorithms can be arranged to have a common computational element, but the way that this element is used by the two algorithms is somewhat different. Indeed, there must be a difference because the polynomial iterates of the Berlekamp–Massey algorithm have increasing degree, whereas the polynomial iterates of the Sugiyama algorithm have decreasing degree.

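The Berlekamp–Massey algorithm begins with two polynomials of degree 0, $\Lambda(x)$ and $B(x)$, and, at each iteration, may increase the degree of either or both polynomial iterates. The central computation of the $r$th iteration of the algorithm has the form

\[
\begin{bmatrix} \Lambda(x) \\ B(x) \end{bmatrix}
\longleftarrow
\begin{bmatrix} 1 & -\delta_r x \\ \epsilon_r\delta_r^{-1} & \bar\epsilon_r x \end{bmatrix}
\begin{bmatrix} \Lambda(x) \\ B(x) \end{bmatrix},
\]

where $\delta_r$ is the discrepancy computed during the $r$th iteration, $\bar\epsilon_r = 1 - \epsilon_r$ is either zero or one, and $\delta_r$ can be zero only when $\epsilon_r$ is zero. Depending on the values of the parameters $\delta_r$ and $\epsilon_r$, the update matrix takes one of the following three forms:

\[
A^{(r)} =
\begin{bmatrix} 1 & 0 \\ 0 & x \end{bmatrix},
\quad
\begin{bmatrix} 1 & -\delta_r x \\ 0 & x \end{bmatrix},
\quad \text{or} \quad
\begin{bmatrix} 1 & -\delta_r x \\ \delta_r^{-1} & 0 \end{bmatrix}.
\]

Each of the $2t$ iterations involves multiplication of the current two-vector of polynomial iterates by one of the three matrices on the right. The Berlekamp–Massey algorithm terminates with a locator polynomial, $\Lambda(x)$, of degree $\nu$ at most equal to $t$.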
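As a rough illustration of the matrix update, the following Python sketch runs the iteration over a small prime field. The field GF(7), the function name `berlekamp_massey`, the low-order-first coefficient lists, the discrepancy formula with syndromes indexed from zero, and the rule used to set the zero/one indicator (`eps`, the customary test $2L \le r-1$ on the current register length $L$) are assumptions made for this example; they are not taken from the text above.

```python
P = 7  # hypothetical prime field GF(7); inverses come from pow(a, -1, P)


def berlekamp_massey(S, p=P):
    """Return (Lambda, L) for the syndrome list S over GF(p).

    Polynomials are coefficient lists, lowest order first.
    """
    n = len(S)                  # n plays the role of 2t
    Lam = [1] + [0] * n         # Lambda(x) starts as the degree-0 polynomial 1
    B = [1] + [0] * n           # B(x) starts as the degree-0 polynomial 1
    L = 0
    for r in range(1, n + 1):
        # Discrepancy: delta_r = sum_j Lambda_j * S_{r-1-j} (assumed indexing).
        delta = sum(Lam[j] * S[r - 1 - j] for j in range(r)) % p
        xB = [0] + B[:-1]       # x * B(x)
        eps = 1 if (delta != 0 and 2 * L <= r - 1) else 0
        # First row of the update matrix: Lambda <- Lambda - delta * x * B.
        new_Lam = [(a - delta * b) % p for a, b in zip(Lam, xB)]
        if eps:
            # Second row [delta^{-1}, 0]: B <- delta^{-1} * (old Lambda).
            inv = pow(delta, -1, p)
            B = [(inv * a) % p for a in Lam]
            L = r - L
        else:
            # Second row [0, x]: B <- x * B.
            B = xB
        Lam = new_Lam
    return Lam[:L + 1], L
```

For the syndromes 1, 1, 2, 3 in GF(7), which satisfy $S_k = S_{k-1} + S_{k-2}$, the sketch returns the coefficients `[1, 6, 6]`, that is, $1 - x - x^2$ modulo 7, with $L = 2$.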

In analyzing the structure of the Berlekamp–Massey algorithm, it is important to note that the iterate $\delta_r$ is a global variable because it is computed from all coefficients of $\Lambda(x)$ and $B(x)$. (It is interesting that a similar iterate with this global attribute does not occur in the Sugiyama algorithm.) An obvious implementation of a straightforward decoder using the Berlekamp–Massey algorithm might be used in a computer program, but the deeper structure of the algorithm is revealed by formulating high-speed hardware implementations.

A systolic implementation of the Berlekamp–Massey algorithm might be designed by assigning one cell to each coefficient of the locator polynomial. This means that, during iteration r, the jth cell is required to perform the following computation:

\[
\Lambda_j \leftarrow \Lambda_j - \delta_r B_{j-1}, \qquad
B_j \leftarrow \epsilon_r\delta_r^{-1}\Lambda_j + \bar\epsilon_r B_{j-1}, \qquad
\delta_{r+1,j} = \Lambda_j S_{r-j}.
\]

Here the update of $B_j$ uses the value of $\Lambda_j$ held before the iteration, and $\delta_{r+1,j}$ uses the updated value of $\Lambda_j$.

The computations within a single cell require that $B_j$ and $S_j$ be passed from neighbor to neighbor at each iteration, as shown in Figure 3.5, with cells appropriately initialized to zero. In addition to the computations in the cells, there is one global computation for the discrepancy, given by $\delta_{r+1} = \sum_j \delta_{r+1,j}$, in which data from all cells must be combined into the sum $\delta_{r+1}$, and the sum $\delta_{r+1}$ returned to all cells. During the $r$th iteration, the $j$th cell computes $\Lambda_j$, the $j$th coefficient of the current polynomial iterate $\Lambda^{(r)}(x)$. After $2t$ iterations, the computation of $\Lambda(x)$ is complete, with one polynomial coefficient in each cell.

Figure 3.5. Structure of systolic Berlekamp–Massey algorithm.
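The cell-level description can be mimicked in software. The sketch below is only a simulation under stated assumptions: arithmetic is in the hypothetical prime field GF(7), the name `systolic_bm` is invented for the example, $n + 1$ cells are allocated as a generous choice, and the zero/one indicator `eps` is set by the customary test $2L \le r-1$, which acts as a second global control signal alongside the discrepancy. Each list index stands for one cell; a cell reads only its own registers and those of its left neighbor, and the only global step is the sum that forms the next discrepancy.

```python
P = 7  # hypothetical prime field GF(7)


def systolic_bm(S, p=P):
    """Simulate one-cell-per-coefficient Berlekamp-Massey on syndromes S."""
    n = len(S)                  # n plays the role of 2t
    m = n + 1                   # number of cells (assumed, generous)
    lam = [1] + [0] * n         # cell j holds Lambda_j
    b = [1] + [0] * n           # cell j holds B_j
    s = [0] * m                 # cell j holds a syndrome register
    s[0] = S[0] % p
    delta = S[0] % p            # delta_1 = Lambda_0 * S_0
    L = 0
    for r in range(1, n + 1):
        eps = 1 if (delta != 0 and 2 * L <= r - 1) else 0
        inv = pow(delta, -1, p) if eps else 0
        # Neighbor transfers: B and S each move one cell to the right.
        b_left = [0] + b[:-1]
        s_left = [S[r] % p if r < n else 0] + s[:-1]
        # Local cell computations, all using values from before the update.
        new_lam = [(lam[j] - delta * b_left[j]) % p for j in range(m)]
        new_b = [(eps * inv * lam[j] + (1 - eps) * b_left[j]) % p
                 for j in range(m)]
        # Partial discrepancies delta_{r+1,j} = Lambda_j * S_{r-j}.
        partial = [new_lam[j] * s_left[j] % p for j in range(m)]
        lam, b, s = new_lam, new_b, s_left
        if eps:
            L = r - L
        # Global step: combine all cells and broadcast the result.
        delta = sum(partial) % p
    return lam[:L + 1], L
```

The list comprehensions stand in for what would be simultaneous operations in hardware; only the final `sum` requires data from every cell.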

An alternative version of the Berlekamp–Massey algorithm might be a pipelined implementation of 2t cells, with the rth cell performing the rth iteration of the algorithm.

This would be a high-speed decoder in which 2t Reed–Solomon sensewords are being decoded at the same time. As the last cell is performing the final iteration on the least-recent Reed–Solomon codeword still in the decoder, the first cell is performing the first iteration on the most-recent Reed–Solomon codeword in the decoder. This decoder has the same number of computational elements as 2t Berlekamp–Massey decoders working concurrently, but the data flow is different and perhaps simpler.
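A software caricature of such a pipeline is sketched below. It assumes the hypothetical prime field GF(7), invented names `bm_stage` and `pipeline`, one new syndrome vector entering per clock tick, and the customary test $2L \le r-1$ for the zero/one indicator; none of these details come from the text. Stage $r$ applies only iteration $r$, and data moves in one direction from stage to stage.

```python
P = 7  # hypothetical prime field GF(7)


def bm_stage(r, state, p=P):
    """Stage r: apply iteration r of Berlekamp-Massey to one word's state."""
    S, lam, b, L = state
    delta = sum(lam[j] * S[r - 1 - j] for j in range(r)) % p
    xb = [0] + b[:-1]                       # x * B(x)
    eps = 1 if (delta != 0 and 2 * L <= r - 1) else 0
    new_lam = [(u - delta * v) % p for u, v in zip(lam, xb)]
    if eps:
        inv = pow(delta, -1, p)
        b, L = [inv * u % p for u in lam], r - L
    else:
        b = xb
    return (S, new_lam, b, L)


def pipeline(words, n, p=P):
    """Push syndrome vectors of length n through n stages, one per tick."""
    stages = [None] * n         # stages[i] holds the output of stage i + 1
    feed = list(words)
    done = []
    while feed or any(x is not None for x in stages):
        # The word leaving the last stage is finished.
        if stages[-1] is not None:
            _, lam, _, L = stages[-1]
            done.append(lam[:L + 1])
        # Every stage works on the word handed over by its predecessor.
        for i in range(n - 1, 0, -1):
            prev = stages[i - 1]
            stages[i] = bm_stage(i + 1, prev, p) if prev is not None else None
        # The first stage accepts the next word, if any.
        if feed:
            S = feed.pop(0)
            stages[0] = bm_stage(1, (S, [1] + [0] * n, [1] + [0] * n, 0), p)
        else:
            stages[0] = None
    return done
```

Once the pipeline is full, up to `n` words are in flight at the same time, and one locator polynomial leaves per tick in the order the words entered.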

The Sugiyama algorithm, in contrast to the Berlekamp–Massey algorithm, begins with two polynomials of nonzero degree, one of degree $2t$, and one of degree $2t-1$. At each iteration, the algorithm may decrease the degrees of the two polynomial iterates. The central computation of the Sugiyama algorithm at the $\ell$th iteration has the following form:

\[
\begin{bmatrix} s(x) \\ t(x) \end{bmatrix}
\longleftarrow
\begin{bmatrix} 0 & 1 \\ 1 & -Q^{(\ell)}(x) \end{bmatrix}
\begin{bmatrix} s(x) \\ t(x) \end{bmatrix}.
\]

The Sugiyama algorithm terminates with the locator polynomial $\Lambda(x)$ of degree $\nu$. The polynomial $\Lambda(x)$ is the same locator polynomial as computed by the Berlekamp–Massey algorithm. The coefficients of the quotient polynomial $Q^{(\ell)}(x)$ are computed, one by one, by the division algorithm.
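For comparison, here is a minimal sketch of a Sugiyama-style (Euclidean) computation. It assumes the hypothetical prime field GF(7), starting polynomials $x^{2t}$ and the syndrome polynomial $S(x)$, a stopping rule of remainder degree less than $t$, and a final normalization so that the constant term of the locator is one. The helper names are invented for the example, and the input is assumed to be a valid syndrome sequence so that the normalizing constant is nonzero.

```python
P = 7  # hypothetical prime field GF(7)


def trim(a):
    """Drop high-order zero coefficients (lists are lowest order first)."""
    while a and a[-1] == 0:
        a = a[:-1]
    return a


def poly_divmod(s, t, p=P):
    """Quotient and remainder of s(x) / t(x) over GF(p); t must be nonzero."""
    s, t = trim([c % p for c in s]), trim([c % p for c in t])
    q = [0] * max(len(s) - len(t) + 1, 1)
    inv = pow(t[-1], -1, p)
    r = s[:]
    while len(r) >= len(t):
        k = len(r) - len(t)
        c = r[-1] * inv % p                 # next quotient coefficient
        q[k] = c
        r = trim([(rc - c * t[i - k]) % p if i >= k else rc
                  for i, rc in enumerate(r)])
    return q, r


def poly_mul(a, b, p=P):
    out = [0] * (len(a) + len(b) - 1) if a and b else []
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def poly_sub(a, b, p=P):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([(x - y) % p for x, y in zip(a, b)])


def sugiyama(S, p=P):
    """Return the locator polynomial for the syndrome list S (length 2t)."""
    n = len(S)
    s = [0] * n + [1]                       # x^{2t}
    t = trim([c % p for c in S])            # S(x), degree at most 2t - 1
    a_prev, a_cur = [], [1]                 # multipliers of S(x)
    while len(t) - 1 >= n // 2:             # stop when deg t(x) < t
        q, rem = poly_divmod(s, t, p)
        s, t = t, rem                       # (s, t) <- (t, s - Q t)
        a_prev, a_cur = a_cur, poly_sub(a_prev, poly_mul(q, a_cur, p), p)
    inv = pow(a_cur[0], -1, p)              # normalize the constant term to 1
    return [c * inv % p for c in a_cur]
```

For the syndromes 1, 1, 2, 3 in GF(7) the sketch takes two division steps, with quotients $5x + 6$ and $6x + 3$, and returns `[1, 6, 6]`, the polynomial $1 - x - x^2$ modulo 7.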

Because $Q^{(\ell)}(x)$ need not have degree 1, the structure of one computational step of the Sugiyama algorithm seems quite different from the structure of one computational step of the Berlekamp–Massey algorithm. Another difference in the two algorithms is that the Sugiyama algorithm has a variable number of iterations, while the Berlekamp–Massey algorithm has a fixed number of iterations. However, there are similarities at a deeper level. It is possible to recast the description of the Sugiyama algorithm to expose common elements in the structure of the two algorithms.

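To restructure the Sugiyama algorithm, let $d_\ell$ denote the degree of $Q^{(\ell)}(x)$, and write

\[
\begin{bmatrix} 0 & 1 \\ 1 & -Q^{(\ell)}(x) \end{bmatrix}
=
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & -Q_0^{(\ell)} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -Q_1^{(\ell)} x \\ 0 & 1 \end{bmatrix}
\cdots
\begin{bmatrix} 1 & -Q_{d_\ell}^{(\ell)} x^{d_\ell} \\ 0 & 1 \end{bmatrix}.
\]

To multiply any vector by the matrix on the left side, multiply that vector, sequentially, by each matrix of the sequence on the right side, starting with the rightmost. Indeed, this matrix factorization is easily seen to be a representation of the individual steps of the division algorithm. With this decomposition of the matrix on the left into the product of matrices on the right, the notion of an iteration can be changed so that each multiplication by one of these submatrices on the right is counted as one iteration. On this new, finer scale, each iteration applies a single matrix of this elementary kind, and now the Sugiyama algorithm has a fixed number of iterations.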

The two by two matrices now more closely resemble those of the Berlekamp–Massey algorithm.
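The finer-scale view of one division step can be sketched as follows. The code assumes the hypothetical prime field GF(7), invented names `fine_step` and `division_by_fine_steps`, and lowest-order-first coefficient lists with $\deg s \ge \deg t$. Each call to `fine_step` corresponds to multiplying the pair $(s(x), t(x))$ by one elementary matrix with rows $(1, -c\,x^k)$ and $(0, 1)$; the final exchange of the two polynomials corresponds to the matrix with rows $(0, 1)$ and $(1, 0)$.

```python
P = 7  # hypothetical prime field GF(7)


def fine_step(s, t, c, k, p=P):
    """Apply [[1, -c x^k], [0, 1]] to (s, t): s <- s - c x^k t."""
    s = s + [0] * max(0, len(t) + k - len(s))   # copy, padded if needed
    for i, ti in enumerate(t):
        s[i + k] = (s[i + k] - c * ti) % p
    return s, t


def division_by_fine_steps(s, t, p=P):
    """One division step (s, t) -> (t, s mod t) as d + 1 fine steps and a swap.

    t must have a nonzero leading coefficient and len(s) >= len(t).
    The remainder is returned without trimming its high-order zeros.
    """
    inv = pow(t[-1], -1, p)
    d = len(s) - len(t)                 # degree of the quotient
    quotient = [0] * (d + 1)
    s = s[:]
    for k in range(d, -1, -1):          # highest quotient coefficient first
        # The quotient coefficient is read from a single position of s.
        c = s[k + len(t) - 1] * inv % p
        quotient[k] = c
        s, t = fine_step(s, t, c, k, p)
    return t, s, quotient               # final swap of the two iterates
```

Dividing $x^4$ by $3x^3 + 2x^2 + x + 1$ over GF(7) in this way takes two fine steps and yields the quotient $5x + 6$ and the remainder $4x^2 + 3x + 1$. A fine step whose coefficient `c` happens to be zero is still counted, which is one way to keep the number of iterations fixed.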

A systolic implementation of the Sugiyama algorithm can be designed by defining one cell to perform the computation of one coefficient of $(s(x), u(x))$. Then $\delta_r$ must be provided as a global variable to all cells. This is strikingly different from the case of the Berlekamp–Massey algorithm because now, since $\delta_r$ arises naturally within one cell, it need not be computed as a global variable.

It is possible to make the similarity between the Sugiyama algorithm and the Berlekamp–Massey algorithm even stronger by redefining the polynomials to make each matrix contain only the first power of x. Let u(x) = xδrt(x). Then the iteration can be written as follows:



Another change can be made by recalling that the Sugiyama algorithm terminates with a normalization step to put the result in the form of a monic polynomial. The coefficient $\delta_r = Q_r^{(\ell)}$ is the $r$th coefficient of the quotient polynomial at that iteration. If $t(x)$ is found to be monic, then $\delta_r$ is immediately available as a coefficient of $s(x)$.