
ISSN: 2395-1303 http://www.ijetjournal.org Page 1

VLSI Design of Low Power Cache Architecture

P. Umadevi, K. Nishaji

Department of Computer Science and Engineering, SBM College of Engineering and Technology, Dindigul-5.

Abstract: High-performance microprocessors employ the cache write-through policy to improve performance while achieving good tolerance to soft errors in on-chip caches. However, the write-through policy also incurs a large energy overhead because of the increased accesses to lower-level caches (e.g., L2 caches) during write operations. Existing work introduced a cache architecture, referred to as the way-tagged cache, to improve the energy efficiency of write-through caches. By maintaining the way tags of the L2 cache in the L1 cache during read operations, that technique enables the L2 cache to work in an equivalent direct-mapping manner during write hits, which account for the majority of L2 cache accesses. This leads to significant energy reduction without performance degradation. This paper proposes a Modified LFSR (Linear Feedback Shift Register) architecture that manages the data array efficiently. The proposed system reduces hardware consumption to 77 slices, 144 LUTs and 220 IOBs, compared with 95 slices, 167 LUTs and 364 IOBs for the existing system. The proposed system is designed in Verilog HDL, simulated using ModelSim, and synthesized using Xilinx Project Navigator.

Index terms: Way Decoder, Way-tag array, Way register, Modified LFSR.

I. INTRODUCTION

Multi-level on-chip cache systems have been widely adopted in high-performance microprocessors [2]-[4]. To keep data consistent throughout the memory hierarchy, write-through and write-back policies are commonly employed. Under the write-back policy, a modified cache block is copied back to its corresponding lower-level cache only when the block is about to be replaced. Under the write-through policy, all copies of a cache block are updated immediately after the block is modified at the current cache, even though the block might not be evicted. As a result, the write-through policy maintains identical data copies at all levels of the cache hierarchy throughout most of their lifetime of execution. This feature is important as CMOS technology is scaled into the nanometer range, where soft errors have emerged as a major reliability issue in on-chip cache systems. It has been reported that single-event multi-bit upsets are getting worse in on-chip memories [8]-[10]. Currently, this problem is addressed at different levels of design abstraction. At the architecture level, an effective solution is to keep data consistent among different levels of the memory hierarchy to prevent the system from collapsing due to soft errors. Because of its immediate updates, the cache write-through policy is inherently tolerant to soft errors: the data at all related levels of the cache hierarchy are always kept consistent. Due to this feature, many high-performance microprocessor designs have adopted the write-through policy [14]-[15].

Although the write-through policy is tolerant to soft errors, it also incurs a large energy overhead, because caches at the lower level experience more accesses during write operations. Consider a two-level (i.e., Level-1 and Level-2) cache system as an example. If the L1 data cache implements the write-back policy, a write hit in the L1 cache does not need to access the L2 cache. In contrast, if the L1 cache is write-through, then both the L1 and L2 caches need to be accessed for every write operation. The write-through policy therefore incurs more write accesses in the L2 cache, which in turn increases the energy consumption of the cache system. Power dissipation is now considered one of the critical issues in cache design. Studies have shown that on-chip caches can consume about 50% of the total power in high-performance microprocessors [5]-[7].
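The difference in L2 write traffic between the two policies can be illustrated with a small trace-driven sketch. It is not taken from the paper: the direct-mapped L1 model, the function name `count_l2_writes`, the `num_lines` parameter, and the example trace are all assumptions made for illustration, and dirty blocks still resident at the end of the trace are not flushed.

```python
def count_l2_writes(store_blocks, policy, num_lines=4):
    """Count L2 write accesses caused by a trace of L1 stores.

    store_blocks: block addresses written by the processor, in order.
    policy: "write-through" or "write-back" (the L1 write policy).
    num_lines: size of the hypothetical direct-mapped L1 (an assumption).
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    lines = {}  # L1 index -> (resident block, dirty flag)
    l2_writes = 0
    for block in store_blocks:
        index = block % num_lines
        resident = lines.get(index)
        if resident is not None and resident[0] != block and resident[1]:
            # Write-back: a dirty block is copied to L2 only when replaced.
            l2_writes += 1
        if policy == "write-through":
            # Write-through: every store is forwarded to L2 immediately.
            l2_writes += 1
            lines[index] = (block, False)
        else:
            lines[index] = (block, True)
    return l2_writes


trace = [0, 0, 0, 1, 1, 4]  # block 4 maps to the same L1 line as block 0
print(count_l2_writes(trace, "write-through"))  # 6
print(count_l2_writes(trace, "write-back"))     # 1
```

In this toy trace the write-through L1 sends six writes to the L2 cache, while the write-back L1 sends one, when the dirty copy of block 0 is replaced.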

Existing work uses a way-tagged cache to improve the energy efficiency of write-through cache systems without performance degradation. In a two-level cache hierarchy, the data residing in the L1 cache have copies in the L2 cache, and the locations of these copies in the L2 cache do not change until they are evicted from the L2 cache. Thus, a tag can be attached to each way in the L2 cache, and this tag information can be sent to the L1 cache when the data is loaded into the L1 cache. By doing so, the locations (i.e., ways) of the L2 copies are known exactly for all the data in the L1 cache.

During subsequent accesses, when there is a write hit in the L1 cache (which also initiates a write access to the L2 cache under the write-through policy), the L2 cache can be accessed in an equivalent direct-mapping manner because the way tag of the data copy in the L2 cache is available. As this operation accounts for the majority of L2 cache accesses in most applications, the energy consumption of the L2 cache can be reduced significantly. Although the existing way-tagged cache reduces energy consumption without performance degradation, there is still demand for lower power consumption and hardware utilization.
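The saving from way tagging can be sketched by counting how many L2 ways each access activates. The model below is only an illustration under stated assumptions: a single 4-way L2 set, round-robin replacement, and the hypothetical names `TinyL2Set`, `load`, `write_conventional` and `write_way_tagged`; none of these come from the paper.

```python
class TinyL2Set:
    """One set of an N-way L2 cache that counts activated ways (toy model)."""

    def __init__(self, num_ways=4):  # 4 ways is an assumed associativity
        self.tags = [None] * num_ways
        self.data = [None] * num_ways
        self.ways_activated = 0
        self._next_victim = 0  # round-robin replacement (an assumption)

    def load(self, tag):
        """Read access: probes all ways and returns (data, way tag)."""
        self.ways_activated += len(self.tags)
        if tag not in self.tags:
            way = self._next_victim
            self._next_victim = (way + 1) % len(self.tags)
            self.tags[way] = tag
            self.data[way] = 0
        way = self.tags.index(tag)
        return self.data[way], way

    def write_conventional(self, tag, value):
        """Write without a way tag: every way must be probed."""
        self.ways_activated += len(self.tags)
        self.data[self.tags.index(tag)] = value

    def write_way_tagged(self, way, value):
        """Write with a way tag kept by L1: only one way is activated."""
        self.ways_activated += 1
        self.data[way] = value


tagged = TinyL2Set()
_, way = tagged.load("A")          # L1 stores this way tag with the line
for value in (1, 2, 3):
    tagged.write_way_tagged(way, value)

plain = TinyL2Set()
plain.load("A")
for value in (1, 2, 3):
    plain.write_conventional("A", value)

print(tagged.ways_activated, plain.ways_activated)  # 7 16
```

After one load and three write hits, the way-tagged model has activated 7 ways against 16 for the conventional access, which is the effect the text describes as working in an equivalent direct-mapping manner.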

In the proposed work, modifications are made to the architecture of the way-tagged cache. The data array is managed using a modified linear feedback shift register (LFSR) in order to improve the performance of the architecture and reduce hardware utilization. The modified LFSR is preferable to a conventional LFSR because it can generate any bit combination of test patterns.

II. WAY-TAGGED CACHE ARCHITECTURE

The way-tagged cache consists of a tag array, data array, way-tag arrays, way-tag buffer, way decoder, and way register. The way tags of each cache line in the L2 cache are maintained in the way-tag arrays, located with the L1 data cache. Note that write buffers are commonly employed in write-through caches (and even in many write-back caches) to improve performance. With a write buffer, the data to be written into the L1 cache is also sent to the write buffer. The operations stored in the write buffer are then sent to the L2 cache in sequence. This avoids write stalls in which the processor waits for write operations to be completed in the L2 cache. The way tags stored in the way-tag arrays must also be sent to the L2 cache along with the operations in the write buffer. Thus, a small way-tag buffer is introduced to buffer the way tags read from the way-tag arrays.

Fig 1: Way-tagged Cache

III. WAY-TAGGED CACHE ARCHITECTURE WITH MODIFIED LFSR

A. Tag Array:

Fig 2: Tag array


The tag array architecture uses an XOR network as its on-chip memory hardware architecture. All bits of an input data slice are generated in parallel and applied to the scan chains simultaneously. The scan-out values of the internal scan chains are compacted in parallel using an XOR-based response compactor or a multi-input signature register (MISR).

B. Data Array

The data array contains the data itself, while the tag array contains the addresses of the data held in the cache. The processor accesses the tag array first. Once the tag array has been accessed, its output must be compared with the address of the memory reference to determine whether a hit has occurred. On a hit, the address is used to read the data array or to modify the data stored at that address, producing the final result.
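The tag comparison described above can be sketched as follows. The sketch assumes a direct-mapped organization with 32-byte lines and 64 sets; these sizes and the names `split_address` and `lookup` are illustrative choices, not values given in the paper.

```python
OFFSET_BITS = 5  # assumed 32-byte cache lines
INDEX_BITS = 6   # assumed 64 sets


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def lookup(tag_array, data_array, addr):
    """Compare the stored tag with the address tag; read data on a hit."""
    tag, index, offset = split_address(addr)
    if tag_array[index] == tag:
        return True, data_array[index][offset]
    return False, None


tag_array = [None] * (1 << INDEX_BITS)
data_array = [bytearray(1 << OFFSET_BITS) for _ in range(1 << INDEX_BITS)]

# Install one line by hand so that address 0x1234 hits.
tag, index, offset = split_address(0x1234)
tag_array[index] = tag
data_array[index][offset] = 0xAB

print(lookup(tag_array, data_array, 0x1234))              # (True, 171)
print(lookup(tag_array, data_array, 0x1234 + (1 << 11)))  # (False, None)
```

The second lookup maps to the same set but carries a different tag, so the comparison reports a miss and the data array is not read.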

C. Way Decoder

The function of the way decoder is to decode way tags and activate only the desired ways in the L2 cache. The line size of the way-tag arrays is n = log2(N) bits, where N is the number of ways in the L2 cache. This minimizes the energy overhead from the additional wires, and the impact on chip area is negligible. For an L2 write access caused by a write hit in the L1 cache, the way decoder works as an n-to-N decoder that selects just one way-enable signal. The way decoder operates simultaneously with the decoders of the tag and data arrays in the L2 cache.

For a write miss or a read miss in the L1 cache, all way-enable signals must be asserted so that all ways in the L2 cache are activated. Two signals, read and write miss, determine the operation mode of the way decoder. Signal read is “1” when a read access is sent to the L2 cache. Signal write miss is “1” if the write operation accessing the L2 cache is caused by a write miss in the L1 cache.
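The behaviour of the way decoder can be summarized in a short behavioural sketch. It assumes a 4-way L2 cache and a binary-encoded way tag; the function name `way_decoder` and its argument names are hypothetical, and the sketch models only the logic function, not the Verilog implementation.

```python
def way_decoder(way_tag, read, write_miss, num_ways=4):
    """Return the list of way-enable signals for one L2 access.

    way_tag: binary-encoded way index forwarded from the way-tag buffer.
    read: 1 when a read access is sent to the L2 cache.
    write_miss: 1 when the L2 write is caused by a write miss in L1.
    num_ways: L2 associativity (4 is an assumption for this sketch).
    """
    if read or write_miss:
        # No valid way tag is available: activate every way.
        return [1] * num_ways
    if not 0 <= way_tag < num_ways:
        raise ValueError("way tag out of range")
    # Write hit in L1: enable only the way named by the tag.
    return [1 if way == way_tag else 0 for way in range(num_ways)]


print(way_decoder(2, read=0, write_miss=0))  # [0, 0, 1, 0]
print(way_decoder(2, read=1, write_miss=0))  # [1, 1, 1, 1]
print(way_decoder(0, read=0, write_miss=1))  # [1, 1, 1, 1]
```

Only the first case, a write hit in the L1 cache, lets the L2 cache operate with a single enabled way.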

D. Way tag Buffer

Fig 3: Way tag buffer

The way-tag buffer receives way tags from the way-tag arrays and stores them in memory elements; its output is sent to the way decoder. In other words, the way-tag buffer temporarily stores the way tags read from the way-tag arrays. The way-tag buffer has separate write and read logic in order to support parallel write and read operations. The write operations in the way-tag buffer always occur one clock cycle later than the corresponding write operations in the write buffer. This is because the write buffer, L1 cache, and way-tag arrays are all updated in the same clock cycle when a STORE instruction accesses the L1 data cache (see Fig. 4). Since the way tag to be sent to the way-tag buffer comes from the way-tag arrays, this tag is written into the way-tag buffer one clock cycle later. The EMPTY signal of the way-tag buffer is employed as the enable signal for read operations; i.e., when the way-tag buffer is empty, a read operation is not allowed. During normal operation, the write operation and the way tag are written into the write buffer and way-tag buffer, respectively. Thus, when this write operation is ready to be sent to the L2 cache, the corresponding way tag is also available in the way-tag buffer, and both can be sent together.
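A minimal behavioural sketch of the way-tag buffer as a first-in, first-out queue is given below. It models only the ordering and the EMPTY gating of reads, not the one-cycle write delay. The class name `WayTagBuffer`, the `depth` parameter, and the `full` flag are assumptions added for the sketch; the paper names only the EMPTY signal.

```python
from collections import deque


class WayTagBuffer:
    """FIFO of way tags whose reads are gated by an EMPTY flag (toy model)."""

    def __init__(self, depth=4):  # depth is an assumed buffer size
        self.depth = depth
        self._fifo = deque()

    @property
    def empty(self):
        """EMPTY signal: high when no way tag is stored."""
        return len(self._fifo) == 0

    @property
    def full(self):
        """Hypothetical FULL flag, not named in the paper."""
        return len(self._fifo) == self.depth

    def write(self, way_tag):
        """Store a way tag read from the way-tag arrays."""
        if self.full:
            raise OverflowError("way-tag buffer is full")
        self._fifo.append(way_tag)

    def read(self):
        """Return the oldest way tag, or None when EMPTY disables the read."""
        if self.empty:
            return None
        return self._fifo.popleft()


buf = WayTagBuffer(depth=2)
print(buf.read())   # None, because the buffer is empty
buf.write(3)
buf.write(1)
print(buf.read())   # 3, tags leave in the order the writes arrived
print(buf.read())   # 1
```

Keeping the tags in arrival order is what lets each way tag be paired with the matching operation leaving the write buffer.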

E. Way-tag array:

TABLE I Way-tag array operations

WRITEH_W   UPDATE   OPERATION
1          1        Write way-tag arrays
1          0        Read way-tag arrays
0          0        No access
0          1        No access


Each cache line in the L1 cache keeps its L2 way-tag information in the corresponding entry of the way-tag arrays, as shown in Fig. 4, where only one L1 data array and the associated way-tag array are shown. When data is loaded from the L2 cache to the L1 cache, the way tag of the data is written into the way-tag array. Later, when this data is updated in the L1 data cache, the corresponding copy in the L2 cache needs to be updated as well. The way tag stored in the way-tag array is read out and forwarded to the way-tag buffer together with the data from the L1 data cache. The write/read signal of the way-tag arrays, WRITEH_W, is generated from the write/read signal of the data arrays in the L1 data cache. A control signal referred to as UPDATE is obtained from the cache controller.

During read operations of the L1 cache, the way-tag arrays do not need to be accessed and are therefore deactivated to reduce energy overhead. To achieve this, the word-line selection signals generated by the decoder are disabled by WRITEH through AND gates.
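The control behaviour listed in Table I can be written as a small lookup function. The function name `way_tag_array_op` and the returned strings are illustrative; the two inputs correspond to the write/read signal of the way-tag arrays and the UPDATE signal from the cache controller.

```python
def way_tag_array_op(writeh_w, update):
    """Map the two control signals of Table I to a way-tag array operation."""
    if writeh_w == 1 and update == 1:
        return "write way-tag arrays"
    if writeh_w == 1 and update == 0:
        return "read way-tag arrays"
    # With the write/read signal low, the arrays stay deactivated
    # regardless of UPDATE.
    return "no access"


for writeh_w, update in [(1, 1), (1, 0), (0, 0), (0, 1)]:
    print(writeh_w, update, way_tag_array_op(writeh_w, update))
```

The two "no access" cases are the ones in which the word lines stay disabled, so the way-tag arrays consume no access energy during L1 reads.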

F. Interfacing Way tag array and buffer

Fig 4: Interfacing Way tag array and buffer

G. Modified LFSR

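The structure of a conventional LFSR is fixed. A 4-bit LFSR, for example, can step through at most 2^4 = 16 bit combinations (15 nonzero states for a maximal-length XOR-feedback LFSR), and always in the same order. The modified LFSR, in contrast, can generate any bit combination of test patterns, so a much larger set of test patterns can be produced.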
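The fixed sequence of a conventional LFSR can be seen in a short sketch. It models a standard 4-bit Fibonacci LFSR with feedback polynomial x^4 + x^3 + 1, which is an assumed baseline; the paper does not give the tap positions, and the modified LFSR itself is not modelled here. The names `lfsr_step` and `lfsr_sequence` are hypothetical.

```python
def lfsr_step(state):
    """Advance a 4-bit Fibonacci LFSR (taps at bits 4 and 3) by one clock."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1  # XOR of the two tap bits
    return ((state << 1) | feedback) & 0xF


def lfsr_sequence(seed=0b0001):
    """Collect the states visited until the register returns to the seed."""
    states = [seed]
    state = lfsr_step(seed)
    while state != seed:
        states.append(state)
        state = lfsr_step(state)
    return states


sequence = lfsr_sequence()
print(len(sequence))   # 15
print(sequence[:5])    # [1, 2, 4, 9, 3]
print(lfsr_step(0))    # 0, the all-zero state never changes
```

The register visits the 15 nonzero states in one fixed order and can never leave the all-zero state, which illustrates the limitation that the modified LFSR is intended to remove.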

Fig 5: Modified LFSR

The data array is managed using the modified linear feedback shift register in order to improve the performance of the architecture and reduce hardware utilization.

IV. EXPERIMENTAL RESULT

1) Modified LFSR

2) Way-tag cache architecture


Device utilization summary:

Selected Device : 3s250epq208-4

Number of Slices: 95 out of 2448 3%

Number of 4 input LUTs: 167 out of 4896 3%

Number of IOs: 365

Number of bonded IOBs: 364 out of 158 230%

3) Modified LFSR

Device utilization summary:

Selected Device : 3s250epq208-4

Number of Slices: 77 out of 2448 3%

Number of 4 input LUTs: 144 out of 4896 2%

Number of IOs: 221

Number of bonded IOBs: 220 out of 158 139%
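The relative savings implied by the two device utilization summaries can be computed directly from the reported counts. The snippet below is only arithmetic on the figures above; the dictionary and function names are illustrative.

```python
existing = {"Slices": 95, "LUTs": 167, "IOBs": 364}   # way-tag cache
proposed = {"Slices": 77, "LUTs": 144, "IOBs": 220}   # with modified LFSR


def reduction_percent(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


for resource in existing:
    saving = reduction_percent(existing[resource], proposed[resource])
    print(f"{resource}: {saving:.1f}% fewer")
# Slices: 18.9% fewer
# LUTs: 13.8% fewer
# IOBs: 39.6% fewer
```

The largest relative saving is in bonded IOBs, followed by slices and 4-input LUTs.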

4) Performance Analysis

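TABLE II Performance Comparison

Performance Evaluation Parameter   Existing Method   Proposed Method
Slices                             95                77
LUTs                               167               144
IOBs                               364               220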

5) Performance Evaluation Graph

V. CONCLUSION AND FUTURE WORK

This work presents an energy-efficient cache technique for high-performance microprocessors employing the write-through policy. The technique attaches a tag to each way in the L2 cache. This way tag is sent to the way-tag arrays in the L1 cache when the data is loaded from the L2 cache to the L1 cache. Utilizing the way tags stored in the way-tag arrays, the L2 cache can be accessed as a direct-mapping cache during subsequent write hits. The results demonstrate a significant reduction in device utilization, thereby reducing the complexity of the architecture, with no performance degradation. Furthermore, the idea of way tagging can be applied to many existing low-power cache techniques, such as the phased access cache, to further reduce cache energy consumption.

The data array is managed using the modified LFSR in order to improve the performance of the architecture.

In future, this work can be extended to further reduce hardware utilization and power consumption using a shared LUT. The proposed cache design will be implemented in real time on a Spartan-3E FPGA to analyze its performance.

REFERENCES

[1] Jianwei Dai and Lei Wang, “An Energy-Efficient L2 Cache Architecture Using Way Tag Information Under Write-Through Policy,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, Jan. 2013.

[2] G. Konstadinidis, K. Normoyle, S. Wong, S. Bhutani, H. Stuimer, T. Johnson, A. Smith, “Implementation of a third-generation 1.1-GHz 64-bit microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1461–1469, Nov. 2002.

[3] S. Rusu, J. Stinson, S. Tam, J. Leung, H. Muljono, and B. Cherkauer, “A 1.5-GHz 130-nm Itanium 2 processor with 6-MB on-die L3 cache,” IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1887–1895, Nov. 2003.

[4] D. Wendell, J. Lin, P. Kaushik, S. Seshadri, A. Wang, V. Sundararaman, P. Wang, H. McIntyre, S. Kim, W. Hsu, H. Park, G. Levinsky, J. Lu, M. Chirania, R. Heald, and P. Lazar, “A 4 MB on-chip L2 cache for a 90 nm 1.6 GHz 64 bit SPARC microprocessor,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, 2004, pp. 66–67.

[5] S. Segars, “Low power design techniques for microprocessors,” in Proc. Int. Solid-State Circuits Conf. Tutorial, 2001, pp. 268–273.

[6] A. Malik, B. Moyer, and D. Cermak, “A low power unified cache architecture providing power and performance flexibility,” in Proc. Int. Symp. Low Power Electron. Design, 2000, pp. 241–243.

[7] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for architectural-level power analysis and optimizations,” in Proc. Int. Symp. Comput. Arch., 2000, pp. 83–94.

[8] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, “Characterization of multi-bit soft error events in advanced SRAMs,” in Proc. Int. Electron Devices Meeting, 2003, pp. 21.4.1–21.4.4.

[9] K. Osada, K. Yamaguchi, and Y. Saitoh, “SRAM immunity to cosmic-ray-induced multierrors based on analysis of an induced parasitic bipolar effect,” IEEE J. Solid-State Circuits, pp. 827–833, 2004.

[10] F. X. Ruckerbauer and G. Georgakos, “Soft error rates in 65 nm SRAMs: Analysis of new phenomena,” in Proc. IEEE Int. On-Line Test. Symp., 2007, pp. 203–204.

[11] G. H. Asadi, V. Sridharan, M. B. Tahoori, and D. Kaeli, “Balancing performance and reliability in the memory hierarchy,” in Proc. Int. Symp. Perform. Anal. Syst. Softw., 2005, pp. 269–279.

[12] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, “Soft error and energy consumption interactions: A data cache perspective,” in Proc. Int. Symp. Low Power Electron. Design, 2004, pp. 132–137.

[13] X. Vera, J. Abella, A. Gonzalez, and R. Ronen, “Reducing soft error vulnerability of data caches,” presented at the Workshop System Effects Logic Soft Errors, Austin, TX, 2007.

[14] P. Kongetira, K. Aingaran, and K. Olukotun, “Niagara: A 32-way multithreaded Sparc processor,” IEEE Micro, vol. 25, no. 2, pp. 21–29, Mar. 2005.

[15] J. Mitchell, D. Henderson, and G. Ahrens, “IBM POWER5 processor-based servers: A highly available design for business-critical applications,” IBM, Armonk, NY, White Paper, 2005.

[16] Shi-You Cheng and Juinn-Dar Huang, “Low Power Instruction Cache Architecture Using Pre-Tag Checking,” VLSI Design Automation and Test, 2007.

[17] Uming Ko and P. T. Balsara, “Characterization and design of a Low-Power, High-Performance Cache Architecture,” VLSI Technology, System and Application, 1995.