RNS BACKWARD CONVERTER BY USING MATRIX TECHNIQUE

K. Soumya (1), A. Chaitanya Lakshmi (2), M. Sunitha Rani (3), M. Renuka (4)

(1,2,3,4) Assistant Professor, ECE Dept, Vidya Jyothi Institute of Technology, Aziz Nagar, C.B. Post, INDIA

ABSTRACT:

The Residue Number System (RNS) has been an important research area for the last five decades. The forward and backward conversion processes are the bottleneck that limits the use of RNS for computing needs. In this paper, we propose an efficient VLSI architecture for a matrix-based RNS backward converter. We analyze the performance of the proposed architecture for moduli sets of size up to ten. The design was implemented using TSMC 180 nm standard-cell CMOS technology libraries, and the result analysis indicates that the proposed converter achieves about 59% area reduction and is about 30% more efficient with respect to the Time-Delay Product when compared to state-of-the-art backward converters. Computing hardware built on Weighted Number Systems (WNSs) suffers performance degradation due to the carry propagation phenomenon inherent to WNSs. The reduction or elimination of carry chains is the major challenge in improving computer arithmetic performance. Several approaches have been proposed, such as carry look-ahead, prefix calculations, anticipated calculation, and alternative number representation systems such as Residue Number Systems (RNS). RNS, which is the topic of this work, has interesting inherent characteristics such as parallelism, modularity, fault tolerance, and carry-free operations. For this reason it has been utilized in Digital Signal Processing (DSP) applications.

Keywords: CMOS, Residue Number System (RNS), Weighted Number Systems (WNSs).

1. INTRODUCTION:

Digital signal processors (DSPs) are the core of a wide range of applications such as audio, image, and video processing and consumer electronics, to name a few. Unlike general-purpose microprocessors, DSPs involve repetitive numerical computations at high data rates. Most DSP blocks, such as digital filters, correlators, and FFT processors, involve repetitive operations of addition, subtraction, and multiplication on large integers. Such specialized needs of DSPs demand very high-speed VLSI implementation of arithmetic units that perform computations in real time as the data arrives. There has been significant research, since the emergence of VLSI implementations of DSPs in the 1970s, on developing algorithms for high-speed arithmetic operations. These traditional approaches to improving speed have resulted in complex hardware and power-hungry circuits to implement simple arithmetic operations. The performance and complexity of an arithmetic circuit are highly dependent on word length. A smaller word length results in a faster system with less complex hardware.

The residue number system (RNS) represents a large integer in slices of small integers. Arithmetic operations on large integers can then be performed on these small integers in parallel without carry propagation, thus improving the speed of the processor. This simple feature of RNS, the reduction of the word length of an operation, makes it attractive for VLSI implementation of computationally intensive DSP applications using low-power architectures. RNS speeds up simple arithmetic operations like addition, subtraction, and multiplication, but it is complex to perform division, comparison, and sign-detection

consumption and delay when compared to the other state of art backward converters.

2. BACKWARD CONVERTER:

The backward converter translates a residue represented number into its equivalent weighted number. It is an important part of the RNS system because the conversion delay should not counteract the speed gain of the RNS arithmetic unit. Furthermore, most algorithms for performing difficult RNS operations, such as division and magnitude comparison, are based on backward conversion. Thus, an efficient design of the backward converter could improve the overall performance of the RNS system and increase its applicability. Due to these considerations, most research on RNS is focused on designing efficient backward converters. The moduli set and conversion algorithm both determine the complexity of the reverse converter. In other words, a judicious selection of the moduli set, with a dynamic range suitable for the application, and of a conversion algorithm compatible with the properties of the selected moduli set, could result in a high-performance hardware design for a backward converter.

The algorithms of backward conversion are principally based on the Chinese remainder theorem (CRT), mixed-radix conversion (MRC), and new Chinese remainder theorems (New CRTs). In addition to these, novel conversion algorithms have been proposed, which are not as general as CRT, MRC, or New CRTs. To design a reverse converter, the values of moduli of the moduli set must first be substituted in conversion algorithm formulas. Then, the resulting equations should be simplified by using arithmetic properties. Finally, simplified equations would be realized using hardware components such as full adders, half-adders, logic gates, or LUT.

2.1 Choice of moduli:

Ignoring other, more practical issues, the best moduli are probably prime numbers, at least from a purely mathematical perspective. A particularly useful property of such moduli is that of “generation”. If a modulus, $m$, is prime, then there is at least one primitive root (or generator), $p \le m - 1$, such that the set $\{\,|p^i|_m : i = 0, 1, 2, \ldots, m-2\,\}$ is the set of all the non-zero residues with respect to $m$. Evidently, for such moduli, multiplication and powering of residues may be carried out in terms of simple operations on indices of the power of the primitive root, in a manner similar to the use of logarithms and anti-logarithms in ordinary multiplication.

For computer applications, it is important to have moduli-sets that facilitate both efficient representation and balance, where the latter means that the differences between the moduli should be as small as possible. Take, for example, the choice of 13 and 17 for the moduli, these being adjacent prime numbers; the dynamic range is 221. With a straightforward binary encoding, four bits and five bits, respectively, will be required to represent the corresponding residues. In the former case, the representational efficiency is 13/16, and in the latter it is 17/32. If instead we chose 13 and 16, then the representational efficiency would be improved to 16/16 in the second case, but at the cost of a reduction in the range (down to 208). On the other hand, with the better balanced pair, 15 and 16, we would have both better efficiency and greater range: 15/16 and 16/16 for the former, and 240 for the latter.
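The efficiency and range figures quoted above can be checked with a short calculation. The following Python sketch is illustrative only: the helper names `residue_bits`, `representational_efficiency`, and `dynamic_range` are assumptions introduced here, and the efficiency is taken, as in the text, to be the modulus divided by the number of code words of a straightforward binary encoding.

```python
from math import prod

def residue_bits(m):
    """Bits needed to binary-encode the residues 0 .. m-1 (hypothetical helper)."""
    return max(1, (m - 1).bit_length())

def representational_efficiency(m):
    """Return (code words used, code words available) for modulus m."""
    return m, 2 ** residue_bits(m)

def dynamic_range(moduli):
    """Product of the moduli; this is the dynamic range when they are pairwise coprime."""
    return prod(moduli)

# The three pairs of moduli discussed in the text.
for moduli in [(13, 17), (13, 16), (15, 16)]:
    effs = ", ".join("%d/%d" % representational_efficiency(m) for m in moduli)
    print(moduli, "efficiency:", effs, "range:", dynamic_range(moduli))
```

The sketch reports the efficiency as an unreduced fraction so that 16/16 is shown in the same form as in the text.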

It is also useful to have moduli that simplify the implementation of the arithmetic operations. This invariably means that arithmetic on residue digits should not deviate too far from conventional arithmetic, which is just arithmetic modulo a power of two. A common choice of prime modulus that does not complicate arithmetic and which has good representational efficiency is $m_i = 2^i - 1$. Not all pairs of numbers of the form $2^i - 1$ are relatively prime, but it can be shown that $2^j - 1$ and $2^k - 1$ are relatively prime if and only if $j$ and $k$ are relatively prime. Unbalanced moduli-sets lead to uneven architectures, in which the role of the largest moduli, with respect to both cost and performance, is excessively dominant. An example of a moduli-set with good balance is $\{2^n - 1, 2^n, 2^n + 1\}$. Many moduli sets are based on these choices, but there are other possibilities; for example, moduli-sets of the form $\{2^n - 1, 2^n, 2^n + 1\}$ are among the most popular in use.
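The coprimality condition stated above can be checked numerically for small exponents. The sketch below is not a proof; the function names and the exponent limit of 16 are assumptions chosen for illustration. It also checks that one instance of a three-moduli set built around a power of two is pairwise relatively prime.

```python
from math import gcd
from itertools import combinations

def pairwise_coprime(moduli):
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))

def check_power_of_two_minus_one(limit=16):
    """Check that gcd(2^j - 1, 2^k - 1) == 1 exactly when gcd(j, k) == 1,
    for all 1 <= j, k <= limit."""
    return all(
        (gcd(2**j - 1, 2**k - 1) == 1) == (gcd(j, k) == 1)
        for j in range(1, limit + 1)
        for k in range(1, limit + 1)
    )

def balanced_set(n):
    """The moduli-set (2^n - 1, 2^n, 2^n + 1) for a given n."""
    return (2**n - 1, 2**n, 2**n + 1)

print(check_power_of_two_minus_one())
print(balanced_set(4), pairwise_coprime(balanced_set(4)))
```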


to some extent be compatible with those for conventional arithmetic, especially given the “legacy” that exists for the latter.

And the fourth is the size of individual moduli: although, as we shall see, certain RNS-arithmetic operations do not require carries between digits, which is one of the primary advantages of RNS, this is so only between digits. Since a digit is ultimately represented in binary, there will be carries between bits, and therefore it is important to ensure that digits (and, therefore, the moduli) are not too large. Low-precision digits also make it possible to realize cost-effective table-lookup implementations of arithmetic operations. On the other hand, if the moduli are small, then a large number of them may be required to ensure a sufficient dynamic range. Of course, ultimately the choices made, and indeed whether RNS is useful or not, depend on the particular applications and technologies at hand.

2.2 Existing Algorithms of Backward Conversion:

The algorithms of backward conversion [9] are principally based on the Chinese remainder theorem (CRT), mixed-radix conversion (MRC), and new Chinese remainder theorems (New CRTs). In addition to these, novel conversion algorithms have been proposed, which are not as general as CRT, MRC, or New CRTs.

2.2.1 Chinese Remainder Theorem:

The Chinese Remainder Theorem (CRT) may rightly be viewed as one of the most important fundamental results in the theory of residue number systems. It is, for example, what assures us that if the moduli of a RNS are chosen appropriately, then each number in the dynamic range will have a unique representation in the RNS and that from such a representation we can determine the number represented. The CRT is also useful in backward conversion as well as several other operations.

Conversion from residue numbers to conventional equivalents seems relatively straightforward on the basis of the Chinese Remainder Theorem (CRT). Unfortunately, the direct realization of an architectural implementation based on this theorem presents quite a few problems, and, compared to forward conversion, a generalized realization is likely to be both complex and slow.

Given a set of pair-wise relatively-prime moduli, $m_1, m_2, \ldots, m_N$, and a residue representation $(x_1, x_2, \ldots, x_N)$ in that system of some number $X$, i.e. $x_i = |X|_{m_i}$, that number and its residues are related by the equation

$$|X|_M = \left| \sum_{i=1}^{N} x_i \left| M_i^{-1} \right|_{m_i} M_i \right|_M \qquad (2.1)$$

where $M$ is the product of the $m_i$'s, $M_i = M/m_i$, and $|M_i^{-1}|_{m_i}$ is the multiplicative inverse of $M_i$ with respect to $m_i$. If the values involved are constrained so that the final value of $X$ is within the dynamic range, then the modular reduction on the left-hand side may be omitted. Equation 2.1 shows clearly the primary difficulty in the application of the CRT: the need for the modular reduction of a potentially very large number relative to a large modulus ($M$).

A straightforward way to implement Equation 2.1 is as follows. The constants $X_i = M_i |M_i^{-1}|_{m_i}$ are multiplied, in parallel, with the residues, $x_i$, and the results are then added in a multi-operand modulo-$M$ adder. Evidently, for a large dynamic range, large or many multipliers will be required, according to whether the range is obtained from a few, large moduli or from many, small moduli, and this may be costly. It should, however, be noted that the multipliers here need not be full multipliers: because one of each pair of operands is a constant, each multiplier can be optimized according to that particular operand.

Depending on the actual technology used for the realization, i.e. on the relative cost and performance figures, an obvious way to avoid the use of multipliers is to replace them with look-up tables (say ROMs). That is, for each value of $X_i$, have a ROM that stores all the possible values of $x_i X_i$, for $x_i = 0, 1, 2, \ldots, m_i - 1$. Then for each tuple, $(x_1, x_2, \ldots, x_N)$, of residues the appropriate values are concurrently read out and added in a multi-operand modulo-$M$ adder. The resulting high-level architecture is then as shown in Figure 2.1.

Figure 2.1: ROM-based CRT reverse converter

With either of the general approaches outlined above, the multi-operand modular adder may be realized as purely combinational logic, all-ROM, or a mixture of both. Almost all architectures that have been proposed so far are of the type shown in Figure 2.1 and fall into two main categories, according to how the multi-operand modular adder is implemented: those that use a tree of two-input modulo-$M$ adders, realized as ROMs or in combinational logic, and those that use combinational logic, with, perhaps, a tiny amount of ROM. In the first category the outer modular reduction in Equation 2.1 is performed incrementally through the tree, since each modulo-$M$ adder produces a result that is less than $M$. In the second category that modular reduction is performed after the intermediate sum has been computed. It should nevertheless be noted that there have been a few architectures that do not fit neatly into this broad classification. The use of two-input modular adders realized as ROMs or combinational logic produces a structure whose performance can be rather low: the combinational adders, for example, must be full carry-propagate adders (CPAs). For high performance, a better approach is to utilize a fundamental technique in the implementation of high-performance multipliers: in a sequence of additions, carries need not be propagated with each addition but may be saved and assimilated after the last addition. Thus the multi-operand modular adder may be implemented as a tree of carry-save adders (CSAs) with a single final CPA to assimilate the partial-carry/partial-sum output of the CSA-tree.
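As a software illustration of Equation 2.1, the following Python sketch multiplies each residue by a pre-computable constant and reduces the sum modulo M. It is a behavioural model under stated assumptions, not the hardware architecture: the function names are introduced here for illustration, the moduli are assumed to be pairwise relatively prime, and the modular inverse relies on the three-argument `pow` with a negative exponent (Python 3.8 or later).

```python
from math import prod

def crt_constants(moduli):
    """Constants X_i = M_i * |M_i^{-1}|_{m_i}; these could be stored or hard-wired."""
    M = prod(moduli)
    constants = []
    for m_i in moduli:
        M_i = M // m_i
        inverse = pow(M_i, -1, m_i)  # multiplicative inverse of M_i modulo m_i
        constants.append(M_i * inverse)
    return constants

def crt_reverse(residues, moduli):
    """Reverse conversion: X = | sum_i x_i * X_i |_M."""
    M = prod(moduli)
    total = sum(x_i * X_i for x_i, X_i in zip(residues, crt_constants(moduli)))
    return total % M  # the single large modulo-M reduction

# Example with an assumed moduli set (5, 4, 3): 18 has residues (3, 2, 0).
print(crt_reverse((3, 2, 0), (5, 4, 3)))
```

The final `% M` step is the reduction relative to a large modulus that the text identifies as the primary difficulty of a direct CRT realization.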

2.2.2 Mixed-radix number systems and conversion:

There is some relationship between representations in a residue number system and those in some mixed-radix number system. The latter are therefore significant, since they are weighted number systems and so facilitate the implementation of operations (such as magnitude comparison) that are problematic in residue number systems. Consider the example of a number system in which the digit-weights are not fixed but vary according to the sequence 15, 13, 11, 7, with the interpretation that the weights of the digits are, from left to right, 15×13×11×7, 13×11×7, 11×7, 7, and 1. Essentially, each of these weights determines a distinct radix, whence the “mixed-radix”. To ensure unique representations in such a system, it is necessary to impose the constraint that the maximum weight contributed by the lower $k$ digits must never exceed the positional weight of the $(k+1)$st digit. Then, if the radices are $r_N, r_{N-1}, \ldots, r_1$, any number $X$ can be uniquely expressed in mixed-radix form as

$$X \cong (z_N, z_{N-1}, \ldots, z_1) \qquad (2.2)$$

whose interpretation is that

$$X = z_N r_{N-1} r_{N-2} \cdots r_1 + \cdots + z_3 r_2 r_1 + z_2 r_1 + z_1 \qquad (2.3)$$

where $0 \le z_i < r_i$. It is evident that this is a weighted, positional number system. The bounds above on $z_i$ guarantee a unique representation. The conversion from a residue number to a mixed-radix number may be regarded as a reverse transform since the mixed-radix system is weighted.

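By MRC the number $X$ can be calculated from the residues $(x_1, x_2, \ldots, x_n)$ with moduli set $(P_1, P_2, \ldots, P_n)$ as

$$X = z_n \prod_{i=1}^{n-1} P_i + \cdots + z_3 P_2 P_1 + z_2 P_1 + z_1 \qquad (2.4)$$

The coefficients $\{z_i\}$ can be obtained from the residues by

$$z_1 = x_1$$

$$z_2 = \left| (x_2 - z_1) \left| P_1^{-1} \right|_{P_2} \right|_{P_2}$$

$$z_3 = \left| \left( (x_3 - z_1) \left| P_1^{-1} \right|_{P_3} - z_2 \right) \left| P_2^{-1} \right|_{P_3} \right|_{P_3}$$

In general

$$z_n = \left| \left( \cdots \left( (x_n - z_1) \left| P_1^{-1} \right|_{P_n} - z_2 \right) \left| P_2^{-1} \right|_{P_n} - \cdots - z_{n-1} \right) \left| P_{n-1}^{-1} \right|_{P_n} \right|_{P_n}$$

Figure 2.2: ROM-based MRC reverse converter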
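The mixed-radix conversion described in this subsection can be sketched in software as follows. This is the standard sequential MRC procedure written as a behavioural model, with assumed function names and an assumed example moduli set; it is not a description of the ROM-based architecture in Figure 2.2. Each digit is obtained by subtracting the previous digit and multiplying by the inverse of the corresponding modulus, which is why the digits must be produced one after another.

```python
def mrc_digits(residues, moduli):
    """Mixed-radix digits, least significant first, by repeated subtract-and-divide."""
    x = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        z = x[i] % m_i
        digits.append(z)
        # In every remaining channel: subtract the digit, then "divide" by m_i
        # by multiplying with the inverse of m_i modulo that channel's modulus.
        for j in range(i + 1, len(moduli)):
            x[j] = ((x[j] - z) * pow(m_i, -1, moduli[j])) % moduli[j]
    return digits

def mrc_reverse(residues, moduli):
    """Weighted value: digit i is weighted by the product of the first i moduli."""
    value, weight = 0, 1
    for z, m_i in zip(mrc_digits(residues, moduli), moduli):
        value += z * weight
        weight *= m_i
    return value

# Example with an assumed moduli set (5, 4, 3): 18 has residues (3, 2, 0).
print(mrc_digits((3, 2, 0), (5, 4, 3)), mrc_reverse((3, 2, 0), (5, 4, 3)))
```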

2.2.3 New Chinese Remainder Theorem – I:

The weighted number can be computed from the residues $(x_1, x_2, \ldots, x_n)$ with moduli set $(P_1, P_2, \ldots, P_n)$ by New CRT-I as follows:

(2.5)

By New CRT-I, the size of the final modulo adder is reduced in comparison to the traditional CRT. In particular, if the first modulus of the moduli set is selected in the form, and the multiplication of other moduli is in the form, the New CRT-I can be implemented by only a multi-operand modulus adder. The main drawback of New CRT-I and II is that they generally apply only to restricted moduli sets.

2.2.4 New Chinese Remainder Theorem – II:

The general algorithm of New CRT-II has a tree-like architecture, and it reduces the final modulus adder size more than New CRT-I. Based on this algorithm, for the four moduli set , the number can be calculated from its corresponding residues by using the following equations

3. PROPOSED BACKWARD CONVERTER:

Our approach, the Matrix Method [2], is based on the periodicity property inherent in RNS. Periodicity is the ability of numbers to cycle in fixed periods with respect to some given moduli and within the dynamic range of the system. For example, given a 3-moduli set $\{m_1 = 5, m_2 = 4, m_3 = 3\}$, the residues cycle in a basic period. That is, the residue sequence of modulus $m_1$ will have a period of 5 entries (0,1,2,3,4), $m_2$ will have a period of 4 entries (0,1,2,3), while $m_3$ will have a period of 3 entries (0,1,2). Based on that, the decimal equivalent of any given residue number is obtained by jumping backwards in the residue table (a table containing all the possible decimal numbers together with their residue equivalents within the dynamic range of the system) to the nearest residue number with at least one residue being zero. The value of that jump is recorded and the process continues until all the residues become zeros. Suppose that we have the residue set $\{x_1, x_2, \ldots, x_n\}$ corresponding to the moduli set $\{m_1, m_2, \ldots, m_n\}$; we can calculate its decimal counterpart by a maximum of $n$ consecutive jumps in the residue table such that each jump increases the number of zero residues by one. The process continues until in the final location all the elements become zero.
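The residue table and the periodicity property can be made concrete with a few lines of Python. The sketch below builds the table for the 3-moduli set {5, 4, 3} used in the text; the function name `residue_table` and the choice of which rows to print are assumptions made for illustration.

```python
from math import prod

def residue_table(moduli):
    """Residue tuples for every X in the dynamic range 0 .. M-1, indexed by X."""
    return [tuple(x % m for m in moduli) for x in range(prod(moduli))]

moduli = (5, 4, 3)
table = residue_table(moduli)

# The first rows show each residue column cycling with its own period.
for x in range(8):
    print(x, table[x])

# Two illustrative rows: jumping back by 3 from 18 lands on 15,
# where the first residue has become zero.
print(18, table[18])
print(15, table[15])
```

Within the dynamic range every residue tuple occurs exactly once, which is what allows a sequence of backward jumps to identify a unique decimal value.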

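More formally, given the moduli set $\{m_1, m_2, \ldots, m_n\}$, the residue number $(x_1, x_2, \ldots, x_n)$ is converted into the decimal number $X$ as follows:

$$X = p_1 + p_2 + \cdots + p_n \qquad (3.1)$$

where $p_i$ is defined as

$$p_i = c_i \prod_{j=1}^{i-1} m_j, \quad i > 1 \qquad (3.2)$$

and $c_i$ is the value to be determined based on the following matrix-based computation. In what follows, $x_j^{(k)}$ denotes the $j$-th residue of the location reached after $k$ jumps.

If $p_1, p_2, \ldots, p_n$ are the jumps and the corresponding residues are $x_1, x_2, x_3, \ldots, x_n$ respectively, then $p_1 = x_1$ and the first location is:

$$\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \\ \vdots \\ x_n^{(1)} \end{bmatrix} = \begin{bmatrix} |x_1 - p_1|_{m_1} \\ |x_2 - p_1|_{m_2} \\ \vdots \\ |x_n - p_1|_{m_n} \end{bmatrix} = \begin{bmatrix} 0 \\ x_2^{(1)} \\ \vdots \\ x_n^{(1)} \end{bmatrix} \qquad (3.3)$$

The second jump is $p_2 = c_2 m_1$, where $c_2$ has to satisfy $|x_2^{(1)} - c_2 m_1|_{m_2} = 0$. Solving the above two equations we obtain that

$$c_2 = \left| x_2^{(1)} \left| m_1^{-1} \right|_{m_2} \right|_{m_2} \qquad (3.4)$$

and the second location is defined by:

$$\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \\ x_3^{(2)} \\ \vdots \\ x_n^{(2)} \end{bmatrix} = \begin{bmatrix} |x_1^{(1)} - p_2|_{m_1} \\ |x_2^{(1)} - p_2|_{m_2} \\ |x_3^{(1)} - p_2|_{m_3} \\ \vdots \\ |x_n^{(1)} - p_2|_{m_n} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ x_3^{(2)} \\ \vdots \\ x_n^{(2)} \end{bmatrix} \qquad (3.5)$$

The third jump is $p_3 = c_3 m_2 m_1$, where $c_3$ has to satisfy $|x_3^{(2)} - c_3 m_2 m_1|_{m_3} = 0$, and it is given by:

$$c_3 = \left| x_3^{(2)} \left| (m_2 m_1)^{-1} \right|_{m_3} \right|_{m_3} \qquad (3.6)$$

The third location is therefore given by:

$$\begin{bmatrix} x_1^{(3)} \\ x_2^{(3)} \\ x_3^{(3)} \\ x_4^{(3)} \\ \vdots \\ x_n^{(3)} \end{bmatrix} = \begin{bmatrix} |x_1^{(2)} - p_3|_{m_1} \\ |x_2^{(2)} - p_3|_{m_2} \\ |x_3^{(2)} - p_3|_{m_3} \\ |x_4^{(2)} - p_3|_{m_4} \\ \vdots \\ |x_n^{(2)} - p_3|_{m_n} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ x_4^{(3)} \\ \vdots \\ x_n^{(3)} \end{bmatrix} \qquad (3.7)$$

This iterative process continues until the final location is zero, and the last jump is determined by $p_n = c_n m_{n-1} m_{n-2} \cdots m_1$, where $c_n$ has to satisfy $|x_n^{(n-1)} - c_n m_{n-1} m_{n-2} \cdots m_1|_{m_n} = 0$. Solving the above, $c_n$, which is needed in the last matrix computation, is given by:

$$c_n = \left| x_n^{(n-1)} \left| (m_{n-1} m_{n-2} \cdots m_1)^{-1} \right|_{m_n} \right|_{m_n} \qquad (3.8)$$

Then the final location is given by:

$$\begin{bmatrix} x_1^{(n)} \\ x_2^{(n)} \\ \vdots \\ x_n^{(n)} \end{bmatrix} = \begin{bmatrix} |x_1^{(n-1)} - p_n|_{m_1} \\ |x_2^{(n-1)} - p_n|_{m_2} \\ \vdots \\ |x_n^{(n-1)} - p_n|_{m_n} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (3.9)$$

Therefore, as stated in Equation (3.1), the required decimal equivalent of $\{x_1, x_2, \ldots, x_n\}$ with respect to the moduli set $\{m_1, m_2, \ldots, m_n\}$ is given by:

$$X = p_1 + p_2 + p_3 + \cdots + p_n$$

A critical look at Equations (3.4), (3.6), and (3.8) indicates that this process is similar at every step, i.e., it is simply the product of moduli and their multiplicative inverses, which are pre-computed together with a number t1, i = 2, ..., n. This is similar to the process of computing the mixed-radix digits (MRD), but the subtractions are done in parallel.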
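The jump procedure can be summarized in a short behavioural sketch. The Python below is a software model written under stated assumptions, not the proposed hardware: the function name `matrix_reverse` is introduced here, the moduli are assumed pairwise relatively prime, and the input (3, 2, 0) for the moduli set {5, 4, 3} is an assumed example chosen because it represents 18. Each jump is the product of the preceding moduli and a coefficient obtained from a multiplicative inverse, and the same jump is subtracted in every residue channel, which is the step that can be done in parallel.

```python
def matrix_reverse(residues, moduli):
    """Return (X, jumps) where X is the sum of the jumps p_1 .. p_n."""
    location = [x % m for x, m in zip(residues, moduli)]
    jumps = []
    weight = 1  # product of the moduli already processed (1 for the first jump)
    for i, m_i in enumerate(moduli):
        # Coefficient that makes the i-th residue zero after the jump;
        # for i = 0 the weight is 1, so the jump is simply x_1.
        c_i = (location[i] * pow(weight, -1, m_i)) % m_i
        p_i = c_i * weight
        jumps.append(p_i)
        # Subtract the same jump in every channel (independent per channel).
        location = [(x - p_i) % m for x, m in zip(location, moduli)]
        weight *= m_i
    assert all(x == 0 for x in location)  # final location is all zeros
    return sum(jumps), jumps

# Assumed example: 18 in the moduli set (5, 4, 3) has residues (3, 2, 0).
print(matrix_reverse((3, 2, 0), (5, 4, 3)))
```

In this model the coefficients play the same role as mixed-radix digits; the difference from the sequential MRC sketch is that each iteration needs only one inverse multiplication followed by channel-wise subtractions.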

In order to clarify the algorithm, let us assume for example that we want to convert the residue number (3, 2, 0) with respect to the moduli set {5, 4, 3} to decimal. The algorithm is applied as follows: (i) $p_1 = x_1 = 3$, so the first location is $(|3 - 3|_5, |2 - 3|_4, |0 - 3|_3) = (0, 3, 0)$. (ii) $c_2$ is computed by:

$$c_2 = \left| 3 \cdot \left| 5^{-1} \right|_4 \right|_4 = |3 \cdot 1|_4 = 3, \qquad p_2 = c_2 m_1 = 3 \cdot 5 = 15$$

The next location is therefore given by:

$$(|0 - 15|_5, |3 - 15|_4, |0 - 15|_3) = (0, 0, 0)$$

The final location is already (0,0,0), so there is no need to proceed further and hence the result is X = 3 + 15 = 18, as it should be.

Verilog HDL is used to describe the hardware, which is simulated using the Xilinx ISE 14.4 simulator and synthesized using Cadence RTL Compiler. TSMC 180 nm standard-cell technology libraries are used for synthesis. The proposed VLSI architecture for the reverse converter based on the Matrix Method is shown in Figure 3.1 for the five-moduli set {11, 7, 5, 3, 2}.

Figure 3.1: Proposed VLSI architecture for the backward converter based on the matrix method

4. RESULTS:

4.1 SIMULATION RESULTS:

4.1.1 Simulation Results for Proposed Matrix Method:

[image:7.612.113.571.514.648.2]


4.2 Synthesis Report:

4.2.1: ASIC for moduli set {11, 7, 5, 3, 2}

Module   Timing (ps*10^3)   Area    Power (nW* )   T*A (*10^6)
MATR     11.4               4614    48.8           52.8
MRC      13.8               5159    42.1           71.3
CRT      10.6               5705    61.0           60.5

Table 4.1: Synthesis report for ASIC Implementation to {11, 7, 5, 3, 2}

4.2.2: FPGA for moduli set {11, 7, 5, 3, 2}

Module   Timing (ns)   No. of slices
MATR     35.995        231
MRC      26.598        77
CRT      30.529        90

Table 4.2: Synthesis report for FPGA Implementation to {11, 7, 5, 3, 2}

4.3 Theoretical Area comparative Analysis:

In the Matrix Method, every iteration except the first requires n parallel subtractions. Finding each $p_i$ ($i = 2, \ldots, n$) requires 2 multiplications, because every such computation involves the product of the moduli and their multiplicative inverse, as is clear from the MATR Equations (3.4), (3.6), and (3.8). Only (n-1) such conversion steps are required, because $p_1 = x_1$ is obtained directly. A further n-1 additions are needed for the computation of the backward conversion result X. Therefore the total number of computations required for the Matrix Method is 4n-3. The asymptotic complexity is of the order O(n).

No. of Moduli sets                  MATR   MRC   CRT
2-{3,2}                             5      4     7
5-{11,7,5,3,2}                      17     28    34
7-{17,13,11,7,5,3,2}                25     54    62
10-{29,23,19,17,13,11,7,5,3,2}      37     108   119


In the Mixed-Radix Method, finding the mixed-radix digits requires n(n-1)/2 multiplications and n(n-1)/2 subtractions. The decimal conversion equation for X requires a further (n-1) additions and (n-1) multiplications. For the Chinese Remainder Theorem (CRT), evaluating the expression for X requires (n-1) additions and n multiplications, and in addition it requires modulo operations and n divisions, i.e. n(n) operations for the computation of the multiplicative inverses and the modulo division. Adding all of these computations, CRT requires n^2 + 2n - 1 operations. This completes the discussion of the theoretical computation requirements.
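The operation counts discussed in this subsection can be tabulated with a few lines of Python. The closed forms for MRC and CRT below are a reading of the counts given in the text (summing the listed multiplications, subtractions, additions, and modular operations), and the function names are assumptions introduced here; the values agree with the theoretical comparison table for n = 2, 5, 7, and 10.

```python
def ops_matr(n):
    """Matrix Method: 4n - 3 operations, as stated in the text."""
    return 4 * n - 3

def ops_mrc(n):
    """MRC: n(n-1)/2 multiplications and n(n-1)/2 subtractions for the digits,
    plus (n-1) additions and (n-1) multiplications for X (assumed reading)."""
    return n * (n - 1) + 2 * (n - 1)

def ops_crt(n):
    """CRT: (n-1) additions, n multiplications, and n*n inverse/modulo
    operations (assumed reading)."""
    return (n - 1) + n + n * n

print("n   MATR  MRC  CRT")
for n in (2, 5, 7, 10):
    print(n, ops_matr(n), ops_mrc(n), ops_crt(n))
```

The counts show the linear growth of the Matrix Method against the quadratic growth of the other two techniques.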

Fig 4.2: Theoretical Estimated Area for different No. of Moduli sets with different Techniques

In practice, the Matrix Method is the best choice compared with the other methods in terms of area and the timing-area product, as the tables below show. Both the practical results and the theoretical estimates indicate that the Matrix Method is the best among the compared methods, i.e. CRT and MRC. In terms of power, however, MRC is better than the Matrix Method.

4.4 Experimental Area Comparative Analysis:

No. of Moduli sets                  MATR   MRC     CRT
2-{3,2}                             1157   957     1598
5-{11,7,5,3,2}                      4614   5159    5705
7-{17,13,11,7,5,3,2}                6374   12648   18067
10-{29,23,19,17,13,11,7,5,3,2}      9107   22576   27978

Table 4.4: Practical calculations for area

Fig 4.3: Practical Estimated Area for different No. of Moduli sets with different Techniques
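The relative area saving can be computed directly from the area values in Table 4.4. The sketch below is only an arithmetic check: it assumes that the digits split across lines in the table belong to single numbers (for example 22576 and 27978 for the 10-moduli set), and the function name is introduced here for illustration. For the 10-moduli set, the saving relative to MRC comes out close to the 59% area reduction quoted in the paper.

```python
# Area values as read from Table 4.4 (assumed reading of the split digits).
AREA = {
    2:  {"MATR": 1157, "MRC": 957,   "CRT": 1598},
    5:  {"MATR": 4614, "MRC": 5159,  "CRT": 5705},
    7:  {"MATR": 6374, "MRC": 12648, "CRT": 18067},
    10: {"MATR": 9107, "MRC": 22576, "CRT": 27978},
}

def area_reduction_percent(matr, other):
    """Percentage by which the MATR area is smaller than the other converter's area."""
    return 100.0 * (1.0 - matr / other)

for n, row in AREA.items():
    vs_mrc = area_reduction_percent(row["MATR"], row["MRC"])
    vs_crt = area_reduction_percent(row["MATR"], row["CRT"])
    print(f"n={n}: vs MRC {vs_mrc:.1f}%, vs CRT {vs_crt:.1f}%")
```

A negative value, as for the 2-moduli set against MRC, means the Matrix Method occupies more area in that case.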

4.5 Experimental Timing Comparative Analysis:

No. of Moduli sets                  MATR    MRC     CRT
2-{3,2}                             9878    10806   9281
5-{11,7,5,3,2}                      11456   13833   10612
7-{17,13,11,7,5,3,2}                19830   25867   15789
10-{29,23,19,17,13,11,7,5,3,2}      23571   30188   19804

Table 4.5: Practical Values for timing

4.6 Experimental Timing and Area Comparative Analysis:

No. of Moduli sets                  MATR   MRC   CRT
2-{3,2}                             19     36    27
5-{11,7,5,3,2}                      52     71    60
7-{17,13,11,7,5,3,2}                76     157   113
10-{29,23,19,17,13,11,7,5,3,2}      96     178   148

Table 4.6: Practical values for T*A (T-timing, A-Area)

Fig 4.5: Product of Practical timing and area for different No. of Moduli sets with different Techniques

5. CONCLUSION AND FUTURE SCOPE:

5.1 Conclusion:

The residue number system (RNS) has been an important research field in computer arithmetic for many decades, mainly because of its carry-free nature, which can provide high-performance computing architectures with superior delay specifications. Recently, research on RNS has found new directions that have resulted in the introduction of efficient algorithms and hardware implementations for RNS with much better performance than previous ones. In RNS, the weighted operands are converted into residue representations by a forward converter. Then, arithmetic operations such as addition, subtraction, and multiplication are performed on RNS numbers in parallel without carry propagation between residue digits. Hence, RNS results in parallel and high-speed addition, subtraction, and multiplication. Reverse conversion is the process, usually after some residue-arithmetic operations, of translating from residue representations back to conventional notations. The main methods for reverse conversion are based on the Chinese Remainder Theorem (CRT) and the Mixed-Radix Conversion (MRC) technique. The reverse converter is the bottleneck and the major limiting factor in the widespread use of RNS. An efficient reverse converter is crucial in order to reduce the conversion delay overhead and the power consumption in an RNS.

In this work, we proposed an efficient VLSI architecture for a matrix-based RNS backward converter. First, we generalized the previously proposed technique up to 10-moduli sets. Then we designed and implemented an efficient VLSI architecture for a matrix-based RNS backward converter for up to 10-moduli sets, which results in an area reduction of 59% and is 30% more efficient with respect to the Time-Delay product when compared to state-of-the-art backward converters. These results show that the matrix method is an efficient method for area. While performing this implementation we also found that CRT is better for speed and MRC is better for power. Verilog HDL is used to describe the hardware, which is simulated using the Xilinx ISE 14.4 simulator and synthesized using Cadence RTL Compiler. TSMC 180 nm standard-cell technology libraries are used for synthesis.

5.2 Future Scope:

The residue number system is not so popular in the current scenario because of the bottleneck of the conversions from RNS to binary and from binary to RNS, which is preventing the widespread application of RNS. Once this issue has been rectified, the use of RNS for low- and high-speed applications will expand. In this work we have addressed one of these issues, i.e. the residue-to-binary conversion or reverse conversion, by proposing reverse converters which reduced the power consumption and also increased the computational speed.

The performance of the Matrix Method can be enhanced further by applying the following methods:

• Low-power techniques

• Pipelining

6. REFERENCES:

[1] A. Omondi and B. Premkumar, Residue Number Systems: Theory and Implementation, Imperial College Press, London, 2007.

[3] A. A. Hiasat, "High-speed and reduced area modular adder structures for RNS," IEEE Transactions on Computers, vol. 51, no. 1, pp. 74-79, Jan. 2002.

[4] W. Wang, M. N. S. Swamy, M. O. Ahmad, "RNS Application for Digital Image Processing," in Proc. 4th IEEE Int. Workshop System-on-Chip for Real-Time Appl., 2004, pp. 77-80.

[5] G. L. Bernocchi, G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re, "Low-power adaptive filter based on RNS components," in ISCAS, 2007, pp. 3211-3214.