International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 4, Issue 10, October 2014)
255
Partial Tag CBF Technique Based Low Power L2 Cache
Architecture
Ritty Varghese
1, Minu Johny
21P.G. Student, 2Asst. Professor, Electronics & Communication Department, IIET, Nellikuzhi, India
Abstract—To improve the performance of memory, cache
memory is placed between the main memory and processor to reduce the memory access time. Many processors uses cache write through policy which increases more write access and incur more energy, during write operations. Way-tagged cache is used to improve the energy efficiency of write through policy. In this the way tag of the L2 cache information is available in L1 cache, during read operation. Due to this the way-tag technique enables L2 cache to work in an equivalent direct mapping manner during write hits, and it leads to reduce the significant energy without performance degradation. While for read/write misses, the corresponding way tag information is not available in the way-tag arrays. As a result, all ways in the L2 cache are activated simultaneously under read misses. It uses set associative technique and incurs more energy. The partial tag counting bloom filter is used to improve the accuracy of the cache. The incoming address is partially matches with the cache line address by using partial tag. If the partial tag gives matches then only full tag comparison of address will be performed. If there is no match, then the further comparison is not needed. Counting bloom filter improve the energy efficiency. The partial tag CBF is significantly faster and requires much less energy than accessing the large set of memory. The Counting Bloom Filter is used, to test current existence of an element in the large set. The element is definitely not in the large set, if count is zero; otherwise, CBF cannot assist and the large set must be searched. Due to this CBF improves the energy efficiency of cache architecture and reduces the more power dissipation.
Keywords— Cache, low power, write-through policy,
Write-back policy, counting bloom filter
I. INTRODUCTION
An important reliability issue [2] in on-chip cache systems is soft errors, which occurs in a signal or data that is wrong. It will not damage the system hardware but damage the data that is being processed. In on-chip memories, single event multi-bit upsets is a growing issue. Power dissipation is a critical issue in cache design. On-chip cache can consume about 50% of the total power in high- performance microprocessors as they use longer addresses and cache tag arrays become larger [3].
Level 1 cache, often called primary cache, is a static memory integrated with processor core that is used to store information recently accessed by a processor, in multilevel on-chip cache systems. For data fetching, checking the smallest Level 1 (L1) cache first, if it hits (data found), the processor proceeds by taking data from L1 cache at high speed. If a miss is given by the smaller cache, then the next larger cache Level 2 (L2) cache is checked, and so on, before the main memory is checked. L2 caches are usually unified caches and its size starts from 256kB. The cache size increases in the high-end applications to meet the larger bandwidth requirements. The most recently used (MRU) addresses based predictions has poor prediction rate and miss predictions lead to performance degradation [4]. Applying way prediction to L2 caches cause large timing and area overheads. The High-Performance microprocessors now-a-days started using longer addresses, hence cache tag arrays become larger and significant energy overhead occurs.
In high-performance microprocessors, multilevel on-chip cache systems have been widely adopted [5]. To keep data in the cache memory and main memory identical, write-through and write-back policies are commonly employed. Under the write-back policy, the information is written inthe main memory only when the line is removed from the cache. While under the write-through policy, all the data/instruction is written both in the cache and in the main memory. The copy of data in the main memory is never different from that of the copy in the cache. As a result, the write-through policy maintains identical data copies at all levels of the cache hierarchy throughout most of their life time of execution. Due to frequent memory access during write operations this enables large energy overhead.
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 4, Issue 10, October 2014)
256
The location of the data in L2 cache will not be modified unless it is deleted. When the L1 data cache loads a data from the L2 cache, the way tag of the data in the L2 cache is also sent to the L1 cache and stored in a new set of way-tag arrays. These way way-tags provide the key information for the subsequent write accesses to the L2 cache hence reducing significant power consumption. The cache hit prediction causes high energy consumption in applications with high cache miss rate. Bloom filter algorithm tries to predict the cache misses [6], [7]. Based on this prediction, energy saving in tag comparison is achieved. The fast counting bloom filter consists of an array of up/down linear feedback shift registers (LFSRs) and local zero detectors. The idea of tag-CBF architecture is proposed that achieves cache hit as well as cache miss prediction. The idea of tag bloom can be applied to other low-power cache architectures so as to achieve better performance and energy efficiency.
The rest of this paper is organized as follows. In Section II, we provide the literature review. In Section III, contains a detailed description of the proposed partial tag cbf cache. Section IV reports the simulation results. Section V concludes this paper.
II. LITERATURE REVIEW
Tag comparison and data access are the major factors of cache power consumption. To develop the cache architecture with low power consumption and better performance research is being made continuously. Dai and Wang [1] proposed a new cache architecture called way-tagged cache to reduce L2 cache accesses under write-through policy. Here, way-tags of L1 cache are stored in L2 cache and during write operations they allow request to access only the required way in L2 cache. Su et al. [6] portioned cache data arrays into banks. Only the sub-bank containing the required data is activated with each access. Powel et al. [7] used selective direct mapping for energy reduction in L1 cache. Panwar et al. have shown that cache-tag access and tag comparison do not need to be performed for all instruction fetches. Here without performing any tag check, we can find the way for instruction. Bloom filter predicts the cache misses in [8], deactivating the ways corresponding to cache miss. Applying way-prediction also results in huge area and timing overhead . The way-predicting set-associative cache technique proposed by Inoue et al. [8], predicts the ways of tag as well as data arrays where the required data may be available. This technique provided considerable energy efficiency and hence is used in microprocessor design.
Redundant cache proposed by Min et al. [9] uses redundant cache to predict incoming cache references. In our proposal, way-tagging reduces the energy consumption and decaying Bloom filter improves the accuracy of cache miss prediction. Section V-C provides more detail comparing the proposed architecture with these related work.
III. TWO LEVEL CACHE ARCHITECTURE
Fig.1: Two level cache architecture
Read operations accounts for a large portion of total memory access in L1 cache, whereas in L2 cache, write operations are dominant, specially write hits. A read/write miss in L1 cache leads to the read/write access in the L2 cache. Each L2 read or write access consumes roughly the same amount of energy on average.
IV. COUNTING BLOOM FILTER
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 4, Issue 10, October 2014)
257 1) CBF: A CBF is conceptually an array of counts indexed via a hash function of the element under membership test. A CBF has three operations: 1) increment count (INC) 2) decrement count (DEC) and 3) PROBE –it test if the count is zero. If the element needs to be increment, the corresponding count will be incremented by one. Probe is used to check whether the count is zero or one.
2) CBF Characteristics: Membership tests are done by using probe operation. Under the membership test, a CBF provides one of the two answers: 1) ―definite no,‖ indicating that the element is definitely not a member of the large set and 2) ―I don’t know,‖ implying that the CBF cannot assist in a member ship test.
3) CBF Functionality: Initially, all counts are set to zero and the large set is empty. When any element wants updated into or removed from the large set, the corresponding CBF count is incremented or decremented by one. To be able to check whether an element currently exists in the cache, the corresponding Counting Bloom Filter is used. If the count indicates zero, then the element is definitely not in the large set. If the count indicates one, then the large set must be searched
V. PROPOSED METHOD
In this section, we present the partial tag counting bloom filter architecture
A) Way-Tag Modules
To decrease the energy consumption during the L2 cache, the way-tag cache can be used. When the L1 data cache loads a data from the L2 cache, the way-tag information of L2 cache is also loads to the L1 cache and it is stored in the way tag array. This provides the key information about the location of the corresponding data in the L2 cache. For every read and write operations in the L1 cache, L2 cache need to be accessed. In which, the way-tag cache performs different operations for read and write access. Under the write-through policy, for every write operations in the L1 cache, L2 cache needs to be accessed. If there is write hit in the L1 cache, the L2 cache selects only one way and that will be activated. Because the way-tag is attached in each way of L2, due to this it provides information to the way-tag array. From the way-tag array, we can get the desired way of data. If there is write-miss in the L1 cache means, the requested data is not available in the L1 cache (i.e.) if the data is not there in the L1 cache the desired information will not be available in way-tag arrays.
So, under the write miss in all the ways the L2 cache can be accessed simultaneously. Due to this it consumes more energy consumption. During the read hit, the requested data is available in the L1 cache, therefore L2 cache need not to be accessed. If there is read miss, the requested data is not available in the L1 cache. Therefore, the L2 cache needs to be accessed simultaneously in all the ways. By using the way-tag cache, under the write operation it access only fewer ways and it reduces the memory energy consumption.
Components included are way-tag arrays, way-tag buffer, way decoder and way register. Way tag array keeps the L2 way tag information corresponding to the l1 cache. Way-tag buffer temporarily stores the way tags reads from the way tag arrays. Write buffer avoids write stalls when the processor waits for write operations to be completed in the L2 cache. Way-decoder decodes way tags and generate enable signal for L2 cache, activates only desired ways in L2 cache. Way register stores way tags and provide information to way-tag arrays
B. Partial Tag-Enhanced Bloom Filter
[image:3.612.334.560.457.607.2]The Bloom filter per each cache way is looked up, before the tag structure access each cache way. If the Bloom filter indicates zero, then tag comparison for the cache way is avoided. Due to this we can reduce the most of the energy consumption during the tag comparison.
Fig 3.Partial tag comparison
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 4, Issue 10, October 2014)
258 C. Implementation and Performance
The implementation of the proposed method is shown in Fig.4. It includes partial tag Bloom filter, Tag arrays, Data array, way decoder, way tag registers, way tag arrays and way tag buffer.
Fig.4. Proposed cache system
VI. RESULT
Modelsim and Xilinx ISE are the software used for the simulation. Following figure.4 shows the simulation results of and proposed cache system under write through policy.
Fig.4. Simulation result of existing system
Following figure.5 shows the simulation results of the proposed cache system under write through policy.
Fig.5. Simulation result of proposed system
Following table1 shows the comparison of way tag cache with the partial tag cbf technique. For the comparison power consumption, area, delay and memory usages are taken.
Table: I
Comparison of existing and proposed method
Existing cache system
Proposed cache system Power
consumption(mW)
252 215
Area(kilobytes) 11589 11528
Delay(ns) 27.537 25.638
Memory usage(kilobytes)
122084 121892
VII. CONCLUSION
The objective of the thesis to design and implement a efficient low power l2 cache architecture is achieved. The design of the partial tag cbf technique based cache provides better power consumption than the existing systems. The HDL used is Verilog; simulation is done with Modelsim ISE software; and finally implemented on Xilinx Spartan 2 FPGA kit.
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 4, Issue 10, October 2014)
259
Partial tag partially matches the incoming address with the cache line address. If the partial tag gives matches then full tag comparison will be performed. Under the membership test CBF improves the energy and speed of the cache. Before the tag structure enters the cache way is accessed, first the Bloom filter is attached. If the Bloom filter indicates zero, then tag comparison for the cache way is avoided, thereby saving the energy that would have been consumed in tag comparison.
REFERENCES
[1] J. Maiz, S. hareland, K. Zhang, and P. Armstrong, ―Characterization of multi-bit soft error events in advanced SRAMs,‖ in Proc. Int. Electron Devices Meeting, 2003, pp. 21.4.1–21.4.4. .
[2] F. X. Ruckerbauer and G. Georgakos, ―Soft error rates in 65 nm SRAMs: Analysis of new phenomena,‖ in Proc. IEEE Int. On-Line Test. Symp., 2007, pp. 203–204.
[3] L. Li, V. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, ―Soft error and energy consumption interactions: A data cache per-spective,‖ in Proc. Int. Symp. Low Power Electron. Design, 2004, pp. 132–137.
[4] X. Vera, J. Abella, A. Gonzalez, and R. Ronen, ―Reducing soft error vulnerability of data caches,‖ presented at the Workshop System Ef-fects Logic Soft Errors, Austin, TX, 2007.
[5] C. Su and A. Despain, ―Cache design tradeoffs for power and perfor-mance optimization: A case study,‖ in Proc. Int. Symp. Low Power Electron. Design, 1997, pp. 63–68.
[6] K. Ghose and M. B. Kamble, ―Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmen-tation,‖ in Proc. Int. Symp. Low Power Electron. Design, 1999, pp. 70–75.
[7] A. Hasegawa, I. Kawasaki, K. Yamada, S. Yoshioka, S. Kawasaki, and P. Biswas, ―Sh3: High code density, low power,‖ IEEE Micro, vol. 15, no. 6, pp. 11–19, Dec. 1995.
[8] K. Inoue, T. Ishihara, and K. Murakami, ―Way-predicting set-associa-tive cache for high performance and low energy consumption,‖ in Proc. Int. Symp. Low Power Electron. Design, 1999, pp. 273–275.
[9] B. Calder, D. Grunwald, and J Emer, ―Predictive sequential associative cache,‖ in Proc. 2nd IEEE Symp. High-Perform. Comput. Arch., 1996, pp. 244–254.
[10]J. Dai and L. Wang, ―Way-tagged cache: An energy efficient L2 cache architecture under write through policy,‖ in Proc. Int. Symp. Low Power Electron. Design, 2009, pp. 159–164.
[11]T. Lyon, E. Delano, C. McNairy, and D. Mulla, ―Data cache design con-siderations for Itanium 2 processor,‖ in Proc. IEEE Int. Conf. Comput. Design, 2002, pp. 356–362.