DESIGN OF BWA BASED ARCHITECTURE FOR MATCHING OF DATA ENCODED WITH ERROR CORRECTING CODES FOR REDUCED LATENCY AND COMPLEXITY


D. Rajesh¹, D. Suneel²

¹Pursuing M.Tech, ²Asst. Professor (ECE)

Nalanda Institute of Engineering and Technology (NIET), Siddharth Nagar, Kantepudi Village, Sattenepalli Mandal, Guntur Dist., A.P., INDIA

ABSTRACT

Error-correcting codes (ECCs) give encoded data a well-defined structure in which calculated parity bits accompany the data bits. The number of parity bits depends on the particular code. Errors are usually detected by comparing the parity bits evaluated from the incoming data. The previous designs, the decode-and-compare and encode-and-compare architectures, occupy a large area because of their extensive hardware, which also adds delay to error correction. These implementations are based on the saturating adder (SA). The proposed architecture compares the data bits and the parity bits in parallel to gain speed. To increase performance further, we adopt a new adder design, the butterfly-formed weight accumulator (BWA), which contains fewer hardware blocks than the saturating adder and therefore reduces area and design complexity. The proposed design was implemented in Verilog HDL and simulated with Xilinx ISE 13.2.

Keywords: Data comparison, error-correcting codes (ECCs), Hamming distance, systematic codes, tag matching.

I. INTRODUCTION

Data comparison is widely used in computing systems to perform many operations, such as tag matching in a cache memory and virtual-to-physical address translation in a translation lookaside buffer (TLB). Because of this prevalence, it is important to implement the comparison circuit with low hardware complexity. In addition, the data comparison usually resides in the critical path of components that are devised to increase system performance, e.g., caches and TLBs, whose outputs determine the flow of the succeeding operations in a pipeline. The circuit must therefore be designed to have as low a latency as possible; otherwise the components are disqualified from serving as accelerators and the overall performance of the whole system is severely degraded. As recent computers employ error-correcting codes (ECCs) to protect data and improve reliability, the complicated decoding procedure, which must precede the data comparison, lengthens the critical path and aggravates the complexity overhead. It thus becomes much harder to meet the above design constraints.

Despite the need for sophisticated designs as described, work that copes with the problem is not widely known in the literature, since it has usually been treated within industry for commercial products. Recently, however, the problem has begun to attract more attention from academia. The most recent solution to the matching problem is the direct compare method [2], which encodes the incoming data and then compares it with the retrieved data that has already been encoded. The method thereby eliminates the complex decoding from the critical path. In performing the comparison, the method does not examine whether the retrieved data is exactly the same as the incoming data.

Fig. 1.(a) Decode-and-compare architecture, (b) encode-and-compare architecture.

Instead, it checks whether the retrieved data lies within the error-correctable range of the codeword corresponding to the incoming data. As the check requires an additional circuit to compute the Hamming distance, i.e., the number of differing bits between the two codewords, the saturating adder (SA) was introduced in [2] as a basic building block for calculating the Hamming distance. However, [2] did not consider an important fact that can improve effectiveness further: a practical ECC codeword is usually represented in a systematic form in which the data and parity parts are completely separated from each other.

In addition, as the SA always forces its output not to exceed the number of detectable errors by more than one, it contributes to the increase of the overall circuit complexity. In this brief, we renovate the SA-based direct compare architecture to reduce the latency and hardware complexity by resolving the aforementioned drawbacks. More specifically, we take the characteristics of systematic codes into account in designing the proposed architecture and propose a low-complexity processing element that computes the Hamming distance faster. The latency and the hardware complexity are therefore reduced considerably, even compared with the SA-based architecture. The rest of this brief is organized as follows. Section II describes the existing architectures, Section III presents the proposed design, Section IV reports the synthesis and simulation results, and Section V concludes.

II. DESCRIPTION OF EXISTING SYSTEM

This section describes the conventional decode-and-compare architecture and the encode-and-compare architecture based on the direct compare method. For the sake of concreteness, only the tag matching performed in a cache memory is discussed in this brief, but the proposed architecture can be applied to similar applications without loss of generality.

2.1 Decode-and-Compare Architecture

Let us consider a cache memory where a k-bit tag is stored as an n-bit codeword after being encoded by an (n, k) code. In the decode-and-compare architecture depicted in Fig. 1(a), the n-bit retrieved codeword must first be decoded to extract the original k-bit tag. The extracted k-bit tag is then compared with the k-bit tag field of an incoming address to determine whether the tags match.

As the retrieved codeword must pass through the decoder before being compared with the incoming tag, the critical path is too long to be used in a practical cache system designed for high-speed access.

In addition, since the decoder is one of the most complicated processing elements, the complexity overhead is not negligible.

Fig 2. SA-based architecture supporting the direct compare method

2.2 Encode-and-Compare Architecture

Note that decoding is usually more complex and takes more time than encoding, as it encompasses a series of error detection or syndrome calculation, and error correction. Reported implementation results support this claim. To resolve the drawbacks of the decode-and-compare architecture, therefore, the decoding of a retrieved codeword is replaced with the encoding of an incoming tag in the encode-and-compare architecture. More precisely, a k-bit incoming tag is first encoded to the corresponding n-bit codeword X and compared with an n-bit retrieved codeword Y. The comparison examines in how many bits the two codewords differ, not whether the two codewords are exactly equal to each other. For this, we compute the Hamming distance d between the two codewords and classify the cases according to the range of d. Let tmax and rmax denote the numbers of maximally correctable and detectable errors, respectively. The cases are summarized as follows.

1) If d = 0, X matches Y exactly.

2) If 0 < d ≤ tmax, X will match Y provided at most tmax errors in Y are corrected.

3) If tmax < d ≤ rmax, Y has detectable but uncorrectable errors. In this case, the cache may issue a system fault so as to make the central processing unit take a proper action.

4) If rmax < d, X does not match Y.

Assuming that the incoming address has no errors, we can regard the two tags as matched if d is in either the first or the second range. In this way, while maintaining the error-correcting capability, the architecture can remove the decoder from its critical path at the cost of an encoder being newly introduced.

Note that the encoder is, in general, much simpler than the decoder, and thus the encoding cost is significantly less than the decoding cost.
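The four ranges above can be illustrated with a short behavioural sketch in Python. It is not the authors' circuit: the encoder `encode_8_4` is a hypothetical systematic (8, 4) extended Hamming code chosen only for illustration, and the function names and the default values tmax = 1 and rmax = 2 are assumptions.

```python
def encode_8_4(tag):
    """Hypothetical systematic (8, 4) SEC-DED encoder: 4 data bits, then 4 parity bits."""
    d1, d2, d3, d4 = tag
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    p4 = d1 ^ d2 ^ d3  # equals the overall parity of the seven bits above
    return [d1, d2, d3, d4, p1, p2, p3, p4]


def hamming_distance(x, y):
    """Number of bit positions in which codewords x and y differ."""
    return sum(a ^ b for a, b in zip(x, y))


def classify(d, t_max, r_max):
    """Map a Hamming distance d onto the four ranges listed above."""
    if d == 0:
        return "match"
    if d <= t_max:
        return "match_after_correction"
    if d <= r_max:
        return "uncorrectable_fault"
    return "mismatch"


def tag_match(incoming_tag, stored_codeword, t_max=1, r_max=2):
    """Encode the incoming tag, then compare codewords instead of decoding."""
    x = encode_8_4(incoming_tag)
    return classify(hamming_distance(x, stored_codeword), t_max, r_max)
```

With this sketch, a stored codeword with one flipped bit still counts as a match, two flipped bits raise the fault case, and a codeword belonging to a different tag falls in the fourth range because any two codewords of this assumed code differ in at least four bits.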

Since the above method needs to compute the Hamming distance, [2] presented a circuit dedicated to the computation. The circuit shown in Fig. 2 first performs XOR operations for every pair of bits in X and Y so as to generate a vector representing the bitwise difference of the two codewords. The following half adders (HAs) are used to count the number of 1's in two adjacent bits of the vector. The numbers of 1's are accumulated by passing through the following SA tree. In the SA tree, the accumulated value z is saturated to rmax + 1 if it exceeds rmax. More precisely, given inputs x and y, z can be expressed as follows:

$$z = \begin{cases} x + y, & \text{if } x + y \le r_{\max} \\ r_{\max} + 1, & \text{otherwise} \end{cases} \tag{1}$$

The final accumulated value indicates the range of d. As the compulsory saturation requires additional logic circuitry, the complexity of an SA is higher than that of a conventional adder.
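The SA-based computation described above can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not the circuit of Fig. 2: the function names are hypothetical, the inputs are equal-length non-empty bit lists, and rmax is assumed to be at least 1 so that the half-adder outputs never exceed rmax + 1.

```python
def saturating_add(x, y, r_max):
    """Equation (1): the sum is clamped to r_max + 1 once it exceeds r_max."""
    total = x + y
    return total if total <= r_max else r_max + 1


def sa_tree_distance(x_bits, y_bits, r_max):
    """Saturated Hamming distance of two equal-length, non-empty bit lists (r_max >= 1)."""
    diff = [a ^ b for a, b in zip(x_bits, y_bits)]  # XOR bank
    # Half adders: count the 1's in each pair of adjacent difference bits.
    counts = [diff[i] + diff[i + 1] for i in range(0, len(diff) - 1, 2)]
    if len(diff) % 2:
        counts.append(diff[-1])  # an unpaired bit passes through unchanged
    # SA tree: accumulate pairwise until a single value is left.
    while len(counts) > 1:
        nxt = [saturating_add(counts[i], counts[i + 1], r_max)
               for i in range(0, len(counts) - 1, 2)]
        if len(counts) % 2:
            nxt.append(counts[-1])
        counts = nxt
    return counts[0]
```

The returned value equals the true Hamming distance when it does not exceed rmax and equals rmax + 1 otherwise, which is all the range check needs.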

III. PROPOSED DESIGN IMPLEMENTATION

3.1 Data path Design for Systematic Codes

In the SA-based architecture, the comparison of two codewords is invoked after the incoming tag is encoded. Therefore, the critical path consists of a series of the encoding and the n-bit comparison, as shown in Fig. 3.1(a).

Fig. 3.1 Timing diagram of the tag match in (a) direct compare method [2] and (b) proposed architecture.

However, [2] did not consider the fact that, in practice, the ECC codeword is of a systematic form in which the data and parity parts are completely separated. Since the data part of a systematic codeword is exactly the same as the incoming tag field, it is immediately available for comparison, while the parity part becomes available only after the encoding is completed. Based on this fact, the comparison of the k-bit tags can be started before the remaining (n − k)-bit comparison of the parity bits. In the proposed architecture, therefore, the encoding process that generates the parity bits from the incoming tag is performed in parallel with the tag comparison, reducing the overall latency.
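The split between the data part and the parity part can be made concrete with a small Python sketch. It assumes a hypothetical systematic (8, 4) code whose parity equations are chosen only for illustration, and a codeword layout of data bits followed by parity bits; neither is taken from the paper.

```python
def parity_of(tag):
    """Parity bits p1..p4 of a hypothetical systematic (8, 4) code."""
    d1, d2, d3, d4 = tag
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4, d1 ^ d2 ^ d3]


def split_distance(incoming_tag, stored_codeword):
    """Return (d_data, d_parity) for a codeword laid out as data bits, then parity bits."""
    k = len(incoming_tag)
    stored_data, stored_parity = stored_codeword[:k], stored_codeword[k:]
    # Data part: the incoming tag is compared as-is, so no encoder sits in this path.
    d_data = sum(a ^ b for a, b in zip(incoming_tag, stored_data))
    # Parity part: needs the encoder output, so it becomes available later.
    d_parity = sum(a ^ b for a, b in zip(parity_of(incoming_tag), stored_parity))
    return d_data, d_parity
```

The two partial distances add up to the Hamming distance between the full codewords, so starting the data-part comparison early does not change the result; it only removes the encoder from that part of the path.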

3.2 Architecture for Computing the Hamming Distance

Fig 3.2 Proposed architecture optimized for systematic codewords.

The proposed architecture based on this data path design is shown in Fig. 3.2. It contains multiple butterfly-formed weight accumulators (BWAs) proposed to improve the latency and complexity of the Hamming distance computation. The basic function of the BWA is to count the number of 1's among its input bits. It consists of multiple stages of HAs, as shown in Fig. 3.3(a), where each output bit of an HA is associated with a weight. The HAs in a stage are connected in a butterfly form so as to accumulate the carry bits and the sum bits of the upper stage separately. In other words, both inputs of an HA in a stage, except the first stage, are either carry bits or sum bits computed in the upper stage.

This connection method leads to the property that if an output bit of an HA is set, the number of 1's among the bits in the paths reaching the HA is equal to the weight of the output bit. In Fig. 3.3(a), for example, if the carry bit of the gray-colored HA is set, the number of 1's among the associated input bits, i.e., A, B, C, and D, is 2. At the last stage of Fig. 3.3(a), the number of 1's among the input bits, d, can be calculated as

$$d = 8I + 4(J + K + M) + 2(L + N + O) + P \tag{2}$$
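The butterfly connection can be modelled behaviourally in Python. The sketch below is an assumption-laden reading of the text, not the authors' netlist: it pairs adjacent bits in every stage, keeps carries and sums in separate groups, and requires the number of inputs to be a power of two. The names `half_adder` and `bwa` are hypothetical.

```python
def half_adder(a, b):
    """Return (carry, sum) of two bits."""
    return a & b, a ^ b


def bwa(bits):
    """Final-stage outputs of a BWA as (weight, bit) pairs; len(bits) must be a power of two."""
    n = len(bits)
    assert n >= 1 and n & (n - 1) == 0
    groups = [(1, list(bits))]  # (weight, bits sharing that weight)
    while len(groups[0][1]) > 1:
        nxt = []
        for weight, grp in groups:
            pairs = [half_adder(grp[i], grp[i + 1]) for i in range(0, len(grp), 2)]
            nxt.append((2 * weight, [c for c, _ in pairs]))  # carries accumulate together
            nxt.append((weight, [s for _, s in pairs]))      # sums accumulate together
        groups = nxt
    return [(w, g[0]) for w, g in groups]


def count_ones(bits):
    """Weighted sum of the BWA outputs, i.e., the number of 1's among the inputs."""
    return sum(w * b for w, b in bwa(bits))
```

For eight inputs this model yields eight outputs whose weights, in order, are 8, 4, 4, 2, 4, 2, 2, and 1, which agrees with the coefficients of I through P in (2). Because a half adder satisfies a + b = 2·carry + sum, the weighted sum is preserved from stage to stage.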

Fig 3.3 Proposed BWA. (a) General structure

Since what we need is not the precise Hamming distance but the range it belongs to, it is possible to simplify the circuit. When rmax = 1, for example, two or more 1's among the input bits can be regarded as the same case, which falls in the fourth range.

In that case, we can replace several HAs with a simple OR-gate tree, as shown in Fig. 3.3. This is an advantage over the SA, which resorts to the compulsory saturation expressed in (1).
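For the rmax = 1 case mentioned above, one possible simplification can be sketched as follows. This is an illustrative model rather than the circuit in the figure: it assumes that only the sum bits are accumulated by HAs and that every carry bit is merged into a single flag by an OR-gate tree. The function names are hypothetical and the input length must be a power of two.

```python
def half_adder(a, b):
    """Return (carry, sum) of two bits."""
    return a & b, a ^ b


def revised_bwa_rmax1(bits):
    """Return (two_or_more, low_bit) for r_max = 1; len(bits) must be a power of two."""
    level = list(bits)
    two_or_more = 0
    while len(level) > 1:
        pairs = [half_adder(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        for carry, _ in pairs:
            two_or_more |= carry  # OR-gate tree takes the place of the carry-side HAs
        level = [s for _, s in pairs]  # only the sum bits go to the next stage
    return two_or_more, level[0]
```

A carry is set only when both halves below it contain an odd number of 1's, so the flag is raised exactly when two or more inputs are 1. When the flag is clear, the remaining weight-1 bit is the exact count (0 or 1).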

Note that in Fig. 3.3 there is no overlap between any pair of carry-bit lines or any pair of sum-bit lines. As the overlaps exist only between carry-bit lines and sum-bit lines, it is not hard to resolve the overlaps in contemporary technology, which provides multiple routing layers, no matter how many bits a BWA takes. We now explain the overall architecture in more detail. Each XOR stage in Fig. 3.2 generates the bitwise difference vector for either the data bits or the parity bits, and the following processing elements count the number of 1's in the vector, i.e., the Hamming distance. Each BWA at the first level is in the revised form shown in Fig. 3.3 and generates an output from the OR-gate tree and several weight bits from the HA trees. In the interconnection, these outputs are fed into their associated processing elements at the second level.

The output of the OR-gate tree is connected to the subsequent OR-gate tree at the second level, and the remaining weight bits are connected to the second-level BWAs according to their weights.

More precisely, the bits of weight w are connected to the BWA responsible for w-weight inputs. Each BWA at the second level is associated with a weight that is a power of two less than or equal to Pmax, where Pmax is the largest power of two that is not greater than rmax + 1. As the weight bits associated with the fourth range are all ORed in the revised BWAs, there is no need to deal with the powers of two that are larger than Pmax. As an example, let us consider a simple (8, 4) single-error correction, double-error detection (SEC-DED) code.

Fig 3.5 BWA architecture

The corresponding first- and second-level circuits are shown in Fig. 3.5. Note that the encoder and XOR banks are not drawn in Fig. 3.5 for the sake of simplicity. Since rmax = 2, Pmax = 2 and there are only two BWAs, dealing with weights 2 and 1, at the second level. As the bits of weight 4 fall in the fourth range, they are ORed. The remaining bits associated with weight 2 or 1 are connected to their corresponding BWAs. Note that the interconnection induces no hardware complexity, since it is achieved by hard wiring alone.

Taking the outputs of the preceding circuits, the decision unit finally determines whether the incoming tag matches the retrieved codeword by considering the four ranges of the Hamming distance. The decision unit is in fact a combinational logic block whose functionality is specified by a truth table that takes the outputs of the preceding circuits as inputs. For the (8, 4) code whose first- and second-level circuits are shown in Fig. 3.5, the truth table for the decision unit is described in Table 3.5. Since U and V cannot be set simultaneously, such cases are implicitly included in the don't-care terms of Table 3.5.

Table 3.5 Decision unit
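The two-level weight routing and the final decision can be sketched behaviourally in Python for the (8, 4) example with rmax = 2. The sketch rests on several assumptions that are not taken from the paper: the first- and second-level BWAs are modelled with the same hypothetical `bwa` function, every bit heavier than Pmax is merged into one flag that stands in for the OR-gate trees, and the decision unit is modelled arithmetically instead of reproducing the truth table, whose signal names are not used here. Group sizes must be powers of two.

```python
def half_adder(a, b):
    return a & b, a ^ b  # (carry, sum)


def bwa(bits):
    """Weighted outputs [(weight, bit), ...]; len(bits) must be a power of two."""
    groups = [(1, list(bits))]
    while len(groups[0][1]) > 1:
        nxt = []
        for weight, grp in groups:
            pairs = [half_adder(grp[i], grp[i + 1]) for i in range(0, len(grp), 2)]
            nxt.append((2 * weight, [c for c, _ in pairs]))
            nxt.append((weight, [s for _, s in pairs]))
        groups = nxt
    return [(w, g[0]) for w, g in groups]


def p_max(r_max):
    """Largest power of two that is not greater than r_max + 1."""
    p = 1
    while 2 * p <= r_max + 1:
        p *= 2
    return p


def two_level_range(data_diff, parity_diff, t_max=1, r_max=2):
    """Classify the Hamming distance from the data and parity difference vectors."""
    limit = p_max(r_max)
    over = 0  # stands in for the OR-gate trees: any bit heavier than Pmax
    by_weight = {}
    for vec in (data_diff, parity_diff):  # first-level BWAs
        for w, b in bwa(vec):
            if w > limit:
                over |= b
            else:
                by_weight.setdefault(w, []).append(b)
    kept = []
    for w, grp in by_weight.items():  # second level: one BWA per weight
        for rel, b in bwa(grp):
            if w * rel > limit:
                over |= b
            else:
                kept.append((w * rel, b))
    if over:
        return "mismatch"  # a heavy bit implies d > r_max
    d = sum(w * b for w, b in kept)  # arithmetic stand-in for the truth table
    if d == 0:
        return "match"
    if d <= t_max:
        return "match_after_correction"
    if d <= r_max:
        return "uncorrectable_fault"
    return "mismatch"
```

When the flag is set, some bit of weight at least 2·Pmax is 1, and 2·Pmax always exceeds rmax + 1, so the distance is guaranteed to lie in the fourth range. When the flag is clear, all discarded bits are 0 and the kept weights add up to the exact distance.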

3.3 General Expressions for the Complexity and the Latency

The complexity as well as the latency of combinational circuits heavily depends on the algorithm employed. In addition, as the complexity and the latency usually conflict with each other, it is unfortunately hard to derive an analytical and fully deterministic equation that shows the relationship between the number of gates and the latency for the proposed architecture and also for the conventional SA-based architecture. To circumvent the difficulty in analytical derivation, we instead present an expression that can be used to estimate the complexity and the latency by employing some variables for the nondeterministic parts. The complexity of the proposed architecture, C, can be expressed as

$$C = C_{XOR} + C_{ENC} + C_{BWA}(k) + C_{BWA}(n - k) + C_{2nd} + C_{DU} \le n + C_{ENC} + 2C_{BWA}(n) + C_{DU} \tag{3}$$

where CXOR, CENC, C2nd, CDU, and CBWA(n) are the complexities of the XOR banks, an encoder, the second-level circuits, the decision unit, and a BWA for n inputs, respectively. Using the recurrence relation, CBWA(n) can be calculated as

$$C_{BWA}(n) = C_{BWA}(\lceil n/2 \rceil) + C_{BWA}(\lfloor n/2 \rfloor) + 2\lfloor n/2 \rfloor \tag{4}$$

where the seed value, CBWA(1), is 0. Note that when a + b = c, CBWA(a) + CBWA(b) ≤ CBWA(c) holds for all positive integers a, b, and c. Because of this inequality and the fact that an OR-gate tree for n inputs is always simpler than a BWA for n inputs, both CBWA(k) + CBWA(n − k) and C2nd are bounded by CBWA(n), which gives the upper bound in (3).

The latency of the proposed architecture, L, is bounded in the same manner by the latencies along its critical path, where LXOR, LENC, L2nd, LDU, and LBWA(n) are the latencies of an XOR bank, an encoder, the second-level circuits, the decision unit, and a BWA for n inputs, respectively. Note that the latencies of the OR-gate tree and the BWAs for x ≤ n inputs at the second level are all bounded by ⌈log2 n⌉. If one of the BWAs at the first level finishes earlier than the other, some components at the second level may start earlier. Similarly, some BWAs or the OR-gate tree at the second level may provide their outputs to the decision unit earlier, so that the unit can begin its operation without waiting for all of its inputs. In such cases, L2nd and LDU are partially hidden by the critical path of the preceding circuits, and L becomes shorter than this bound.
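The recurrence for CBWA(n) can be evaluated numerically with a few lines of Python. The sketch assumes that the recurrence splits n into ⌈n/2⌉ and ⌊n/2⌋ and adds 2⌊n/2⌋, with CBWA(1) = 0 as the seed; the function name `c_bwa` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def c_bwa(n):
    """Complexity of a BWA for n inputs under the assumed recurrence, with c_bwa(1) = 0."""
    if n == 1:
        return 0
    return c_bwa((n + 1) // 2) + c_bwa(n // 2) + 2 * (n // 2)


def superadditive_up_to(limit):
    """Check c_bwa(a) + c_bwa(b) <= c_bwa(a + b) for all 1 <= a, b <= limit."""
    return all(c_bwa(a) + c_bwa(b) <= c_bwa(a + b)
               for a in range(1, limit + 1)
               for b in range(1, limit + 1))
```

Under these assumptions an 8-input BWA evaluates to 24, which corresponds to three stages of four HAs if each HA is counted as two gates. The helper `superadditive_up_to` gives a quick numerical check of the inequality used to bound (3) for small input sizes.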

IV. SYNTHESIS AND SIMULATION RESULTS

In this project we have designed an architecture with very low delay and complexity for calculating the Hamming distance, which denotes the number of bit positions in which the two codewords differ. The existing architecture is based on the saturating adder, whereas the proposed design is based on the butterfly-formed weight accumulator (BWA). The architecture has BWA blocks at the first level and modified BWA blocks at the second level. These are connected through an OR gate to the decision unit, which decides between match, mismatch, and fault. Several BWA architectures, the OR gate, and the decision unit modules are designed using Verilog HDL. To check the latency, we synthesized the design descriptions in Xilinx ISE 13.2i. The corresponding schematics, reports, and simulation results are shown below.

Fig 4.1 Technology schematic for final architecture

Fig 4.2 Technology schematic for BWA architecture

Fig 4.3 Synthesis area report

Fig 4.4 Synthesis timing report

Fig 4.5 Simulation report

V. CONCLUSION

To reduce the latency and hardware complexity, a novel architecture was employed for matching data protected with an ECC. The proposed design is based on the BWA architecture, which uses fewer half adders than the saturating-adder design. The modules of the proposed architecture were designed in Verilog HDL, and the designs were synthesized and simulated in Xilinx ISE 13.2i. The synthesis results show that the proposed architecture uses far fewer LUTs and avoids much of the delay caused by the comparator in the saturating adder.

REFERENCES

[1] B. Y. Kong, J. Jo, H. Jeong, M. Hwang, S. Cha, B. Kim, and I.-C. Park, “Low-complexity low-latency architecture for matching of data encoded with hard systematic error-correcting codes,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 7, Jul. 2014.

[2] W. Wu, D. Somasekhar, and S.-L. Lu, “Direct compare of information coded with error-correcting codes,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 11, pp. 2147–2151, Nov. 2012.

[3] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2004.

[4] Y. Lee, H. Yoo, and I.-C. Park, “6.4Gb/s multi-threaded BCH encoder and decoder for multi-channel SSD controllers,” in ISSCC Dig. Tech. Papers, 2012, pp. 426–427.

[5] H. Ando, Y. Yoshida, A. Inoue, I. Sugiyama, T. Asakawa, K. Morita, T. Muta, T. Motokurumada, S. Okada, H. Yamashita, and Y. Satsukawa, “A 1.3 GHz fifth generation SPARC64 microprocessor,” in IEEE ISSCC Dig. Tech. Papers, Feb. 2003, pp. 246–247.

[6] M. Tremblay and S. Chaudhry, “A third-generation 65nm 16-core 32-thread plus 32-scout-thread CMT SPARC processor,” in ISSCC Dig. Tech. Papers, Feb. 2008, pp. 82–83.

[7] AMD Inc. (2010). Family 10h AMD Opteron Processor Product Data Sheet, Sunnyvale, CA, USA [Online]. Available: http://support.amd.com/us/Processor_TechDocs/40036.pdf

[8] J. Chang, M. Huang, J. Shoemaker, J. Benoit, S.-L. Chen, W. Chen, S. Chiu, R. Ganesan, G. Leong, V. Lukka, S. Rusu, and D. Srivastava, “The 65-nm 16-MB shared on-die L3 cache for the dual-core Intel Xeon processor 7100 series,” IEEE J. Solid-State Circuits, vol. 42, no. 4, pp. 846–852, Apr. 2007.

AUTHOR DETAILS

D. RAJESH is pursuing an M.Tech at Nalanda Institute of Engineering and Technology. He completed his B.Tech in Electronics and Communication Engineering. His research interests include communication systems, digital communications, and satellite communication.

D. SUNEEL is working as an assistant professor at Nalanda Institute of Engineering and Technology. He completed his postgraduate degree and has four years of teaching experience. His research interests include communication systems, digital communications, and satellite communication.