Abstract— To improve memory performance, a cache is placed between the processor and main memory to reduce the average memory access time. Many processors use a write-through cache policy, which generates additional write accesses and therefore consumes more energy during write operations. A way-tagged cache improves the energy efficiency of the write-through policy: the way tags of the L2 cache are made available in the L1 cache during read operations. This allows the L2 cache to operate in an equivalent direct-mapped manner during write hits, yielding significant energy savings without performance degradation. For read/write misses, however, the corresponding way-tag information is not available in the way-tag arrays, so all ways of the L2 cache are activated simultaneously on a read miss; this set-associative access consumes more energy. A partial-tag counting Bloom filter (CBF) is used to improve cache accuracy: the incoming address is first matched against cache line addresses using a partial tag, and only if the partial tag matches is a full tag comparison performed. If there is no partial match, no further comparison is needed. The CBF tests the current membership of an element in a large set: if the count is zero, the element is definitely not in the set; otherwise, the CBF cannot assist and the large set must be searched. Accessing the partial-tag CBF is significantly faster and requires much less energy than accessing the large set, so the CBF improves the energy efficiency of the cache architecture and reduces power dissipation.
Cache memory is used to increase the data transfer rate. Because the cache is much smaller than RAM and main memory, data requested by the microprocessor is often not found there; such a request results in a cache miss. Since the cache is on-chip memory, a hit means the data is close to the processor and the data transfer rate increases. On a miss, the data must be fetched from RAM or main memory, which takes more time and directly reduces the transfer rate. The speed, or data transfer rate, therefore increases as cache misses are reduced. To track cache misses we design a cache controller, which helps to increase the data transfer rate between main memory and the processor. Parameters that enable the design of various cache architectures include the overall cache size, block size, and associativity, along with the replacement policy employed by the cache controller. The proposed architecture takes advantage of a number of extra victim lines that can be associated with cache sets that experience more conflict misses for a given program; the concept of a victim cache was proposed to reduce the miss delay in caches. To implement the proposed cache architecture, three modules are considered: a cache controller and two storage modules, the data store and the TAG store. This is illustrated in Fig. below. The cache controller handles memory access requests from the processor and issues control signals that govern the flow of data within the cache.
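As a minimal sketch of the lookup the cache controller performs, the fragment below splits an address into tag, index, and offset fields and checks the TAG store for a hit. The sizes (64 lines of 16 bytes) and all names are illustrative assumptions, not parameters of the proposed architecture.

```python
# Direct-mapped lookup sketch: illustrative sizes, not the paper's design.
LINE_SIZE = 16      # bytes per cache line (assumed)
NUM_LINES = 64      # lines in the data store (assumed)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 4 bits select the byte in a line
INDEX_BITS = NUM_LINES.bit_length() - 1    # 6 bits select the line (set)

tag_store = [None] * NUM_LINES             # TAG store: one tag per line
data_store = [bytes(LINE_SIZE)] * NUM_LINES  # data store

def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(addr):
    """Return True on a cache hit, False on a cache miss."""
    tag, index, _ = split_address(addr)
    return tag_store[index] == tag

def fill(addr, line):
    """On a miss, install the line fetched from main memory."""
    tag, index, _ = split_address(addr)
    tag_store[index] = tag
    data_store[index] = line
```

A miss detected by `lookup` would trigger the controller's fetch from main memory followed by `fill`, after which the same address hits.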
While caching is a promising approach to reducing route-computation load, we believe that recent proposals have taken very simplistic approaches and that several fundamental issues have received no attention. First, the hierarchical architecture of very large networks has not been taken into account. Large networks are typically partitioned into several domains, and a realistic caching scheme should offer an end-to-end solution across multiple domains. Second, for scalability, topology aggregation is an essential technique in large networks with multiple domains. An end-to-end route potentially crosses several domains, and each domain exposes only an aggregated view of its internal topology and state information; the important question is how such an end-to-end route can be cached efficiently. Finally, cached routes are subject to changes in network conditions and must be updated regularly. Simple update techniques that periodically re-compute all cached routes can cause considerable computing load.
2) “I don’t know,” implying that the CBF cannot assist in a membership test and the large set must be searched. The CBF can produce the answer to a membership test much faster than the large set and save power under two conditions. First, accessing the CBF must be significantly faster and require much less energy than accessing the large set. Second, most membership tests must be serviced by the CBF alone; whether this holds is investigated by studying application behavior. For instance, when the CBF is exploited as a miss predictor, previous work shows that more than 95% of the accesses to the cache tag array are serviced by the CBF.
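The two possible answers above can be sketched with a small counting Bloom filter. This is an illustrative implementation, not the paper's hardware design; the counter count, hash count, and the use of SHA-256 to derive indexes are all assumptions.

```python
import hashlib

class CountingBloomFilter:
    """Sketch of a counting Bloom filter: counters instead of bits,
    so elements can be removed as well as inserted."""

    def __init__(self, num_counters=256, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, item):
        # Derive k counter indexes from one digest (an illustrative choice).
        digest = hashlib.sha256(str(item).encode()).digest()
        n = len(self.counters)
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % n
                for i in range(self.num_hashes)]

    def insert(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1

    def remove(self, item):
        for i in self._indexes(item):
            self.counters[i] -= 1

    def maybe_contains(self, item):
        """False -> definitely not in the large set (answer 1);
        True -> "I don't know", the large set must be searched (answer 2)."""
        return all(self.counters[i] > 0 for i in self._indexes(item))
```

A `False` from `maybe_contains` avoids the expensive search entirely, which is where the speed and energy savings come from.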
Abstract: In the field of computer architecture, researchers compare architectures by simulating them on a common platform with common benchmark programs. This paper accomplishes the following objectives: it designs a high-level cache architecture with the goal of high-performance, low-power computing; it compares the new design to existing designs through software simulation; and it concludes whether or not the design outperforms existing cache designs with regard to high-performance, low-power computing and determines the new cache configuration. We present the steps of our methodology for tuning caches to a specific MPSoC application: we generate a configuration file from which the best size, speed, and associativity for the given application can be selected. Keywords: cache memory, proposed methodology, VHDL, survey paper.
The trend towards integrating multiple cores on the same die has accentuated the need for larger on-chip caches. Large caches are constructed as a multitude of smaller cache banks interconnected through a packet-based Network-on-Chip (NoC) communication fabric. The NoC plays a critical role in optimizing the performance and power consumption of non-uniform cache-based multicore architectures. We examine a data compression technique, Compression in the NIC (NC). Higher frequencies lead to higher power consumption, which in turn spawned cooling and reliability issues. By utilizing a number of simpler, narrower cores on a single die, architects can now provide Thread-Level Parallelism (TLP) at much lower frequencies. The presence of several processing units on the same die necessitates oversized L2 and, where applicable, L3 caches to accommodate the needs of all cores. Larger cache sizes are easily facilitated by the aforementioned explosion in on-chip transistor counts. However, the implementation of such large cache memories can be impeded by excessive interconnect delays: while smaller technology nodes shorten gate delays, they also increase global interconnect delays. Therefore, the traditional assumption that each level in the memory hierarchy has a single, uniform access time is no longer valid. Cache access times become variable latencies based on the distance traversed along the chip, which has spurred the Non-Uniform Cache Architecture (NUCA) concept. A large, monolithic L2 cache is divided into multiple independent banks, which are interconnected through an on-chip interconnection network. The network fabric undertakes the fundamental task of providing seamless communication between the processing cores and the scattered cache banks. NoCs are becoming increasingly popular because of their well-controlled, highly predictable electrical properties and their scalability.
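A simple static NUCA mapping can be sketched as follows: low-order bits of the line address select which bank holds the line. The bank count and line size are assumptions for illustration, not values from the text.

```python
# Illustrative static NUCA bank mapping (assumed parameters).
NUM_BANKS = 16    # independent L2 banks on the NoC (assumed)
LINE_BYTES = 64   # cache line size in bytes (assumed)

def bank_of(addr):
    """Bank that holds the cache line containing `addr`: the line
    address modulo the bank count (interleaved mapping)."""
    return (addr // LINE_BYTES) % NUM_BANKS
```

Under such a mapping, consecutive lines land in consecutive banks, and the access latency then depends on the NoC distance to `bank_of(addr)` rather than being uniform.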
However, research in NoC architectures clearly points to some alarming trends: the chip area and power budgets in distributed, communication-centric systems are progressively being dominated by the interconnection network. The ramifications of this NoC-dominant design space have led to the development of sophisticated router architectures with performance enhancements, area-constrained methodologies, power-efficient and thermal-aware designs, and fault-tolerant mechanisms. Data compression is used for additional gains in performance and power.
Under the write-through policy, caches at the lower level experience more accesses during write operations; the write-through policy obviously incurs more write accesses in the L2 cache, and power dissipation is now considered one of the critical issues in cache design. Here a new cache architecture, referred to as the counting-Bloom-filter cache system, is proposed to improve the energy efficiency of write-through cache systems with minimal area overhead and no performance degradation. Consider a two-level cache hierarchy where the L1 data cache is write-through and the L2 cache is inclusive for high performance. All data residing in the L1 cache have copies in the L2 cache, and the locations of these copies in the L2 cache do not change until they are evicted from the L2 cache. Thus, we can attach a tag to each way in the L2 cache and send this tag information to the L1 cache when the data is loaded into the L1 cache. By doing so, for all the data in the L1 cache, we know exactly the locations (i.e., ways) of their copies in the L2 cache. On a subsequent write hit in the L1 cache (which also initiates a write access to the L2 cache under the write-through policy), we can access the L2 cache in an equivalent direct-mapped manner because the way tag of the data copy in the L2 cache is available. As this operation accounts for the majority of L2 cache accesses in most applications, the energy consumption of the L2 cache can be reduced significantly.
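The way-tag mechanism described above can be sketched as follows: a normal read probes all ways of a set, while a write hit uses the way tag recorded in L1 to activate exactly one way. The set and way counts are illustrative assumptions, and only the tag arrays are modeled.

```python
# Way-tag sketch for a write-through L1 / inclusive L2 hierarchy.
# Set and way counts are assumed; only tag arrays are modeled.
NUM_SETS = 8
NUM_WAYS = 4

# l2[set][way] = tag stored in that way (None = empty)
l2 = [[None] * NUM_WAYS for _ in range(NUM_SETS)]

def l2_read(tag, set_idx):
    """Read access: all ways of the set are activated (set-associative).
    On a hit, the matching way number is returned so L1 can record it
    as the way tag for this line."""
    for way in range(NUM_WAYS):
        if l2[set_idx][way] == tag:
            return way          # hit: this is the way tag sent to L1
    return None                 # miss: every way was probed

def l2_write_hit(tag, set_idx, way_tag):
    """Write hit under write-through: the way tag kept in L1 lets the
    L2 behave like a direct-mapped cache -- only one way is activated."""
    assert l2[set_idx][way_tag] == tag   # inclusion guarantees this holds
    # ... update only the data array entry [set_idx][way_tag] ...
```

The energy saving comes from `l2_write_hit` probing one way where `l2_read` probes `NUM_WAYS`, and write hits dominate L2 accesses under write-through.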
run-time or design-time mechanisms to reduce the heat flux and did not consider 3D-ICs with heterogeneous stacks. The goal of this work is to achieve a balanced thermal gradient in 3D-ICs while reducing peak temperatures. In this research, placement algorithms for design-time optimization and a choice of appropriate cooling mechanisms for run-time modulation of temperature are proposed. Specifically, an architectural framework that introduces a weight-based simulated annealing (WSA) algorithm for thermal-aware placement of through-silicon vias (TSVs) with inter-tier liquid cooling is proposed for design time. In addition, integrating a dedicated stack of emerging NVMs such as RRAM, PCRAM, and STT-RAM, a run-time simulation framework is developed to analyze the thermal and performance impact of these NVMs in 3D-MPSoCs with inter-tier liquid cooling. Experimental results of the WSA algorithm implemented on the MCNC91 and GSRC benchmarks demonstrate up to 11 K reduction in the average temperature across the 3D-IC chip. In addition, the power-density arrangement in WSA improved uniformity by 5%. Furthermore, simulation results of the PARSEC benchmarks with an NVM L2 cache demonstrate a temperature reduction
Privacy-preserving data mining has abundant applications in surveillance, which are seemingly destined to be privacy-violating. The challenge is to devise methods that remain efficient without compromising privacy. Some methods apply a transformation to the data in order to achieve privacy preservation; typically, such methods reduce the granularity of representation to protect privacy. This reduction in granularity results in some loss of effectiveness of data management or mining algorithms, which is the natural trade-off between information loss and privacy. Examples of such strategies are randomization methods, the k-anonymity model and l-diversity, distributed privacy preservation, and downgrading application effectiveness. To address the privacy-preservation problem, Cache is a general strategy for a class of location-based services that allows users to benefit from the services while diminishing the associated privacy concerns. Cache borrows a well-explored idea from distributed systems, namely caching, and applies it in the context of privacy. Cache rests on two core ideas: (1) location-enhanced content can be pre-fetched in large geographic chunks onto a device before it is actually needed, and (2) the content can be accessed locally on the device when it is actually needed, without relying on any networked services external to the device. Thus, rather than disclosing the current location on each request for information, the user only needs to share general content preferences.
Fig. 5 shows the L-CBF architecture. Here an up/down Linear Feedback Shift Register (LFSR) is used to generate pseudo-random addresses. A comparator correlates the addresses to test whether an address is a member or not. Fig. 6 shows the LFSR block diagram; a number of D flip-flops and XOR gates are used to generate different addresses. Every LFSR has the following properties: 1) the number of bits in the shift register equals the width of the LFSR; 2) the taps are the D flip-flop outputs connected into the feedback loop; 3) the initial state of the LFSR can be any value except the single lock-up state (all zeros, for XOR feedback). Here the total number of tag bits is 5, so the feedback polynomial is X^5 + X^3 + 1: the outputs of the 3rd and 5th D flip-flops are XORed and fed back to the 1st flip-flop. The inputs of the LFSR are clock and reset. The memory core is used to tell whether an element is a member of the large set or not.
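The 5-bit LFSR described above can be sketched in software as follows (a behavioral model of the hardware, not the paper's VHDL): the outputs of flip-flops 5 and 3 are XORed and fed back into flip-flop 1, matching the feedback polynomial X^5 + X^3 + 1.

```python
def lfsr5_step(state):
    """One step of a 5-bit Fibonacci LFSR with feedback polynomial
    x^5 + x^3 + 1: the outputs of flip-flops 5 and 3 are XORed and
    fed back into flip-flop 1 (bit positions counted from 1)."""
    bit5 = (state >> 4) & 1     # output of the 5th flip-flop
    bit3 = (state >> 2) & 1     # output of the 3rd flip-flop
    feedback = bit5 ^ bit3
    return ((state << 1) | feedback) & 0b11111

def lfsr5_sequence(seed, n):
    """Generate n successive 5-bit pseudo-random addresses."""
    out, state = [], seed
    for _ in range(n):
        state = lfsr5_step(state)
        out.append(state)
    return out
```

Because x^5 + x^3 + 1 is primitive, any nonzero seed cycles through all 31 nonzero states before repeating; the all-zero state is the lock-up state the LFSR must never be initialized to.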
In the above example, many messages are generated across the network to authenticate and authorize each user/learner every time, and many users retrieve the same learning contents or objects from the database. This can decrease the performance and scalability of E-learning applications. We can improve the performance of E-learning applications by using caching and a proxy in a distributed environment. The proxy reduces the load on large-scale distributed e-learning applications by centrally managing and caching the authentication and authorization of web service requests. The problem statement is to improve performance and scalability in the design of a reference architecture for a distributed environment.
In the NoC architecture, there are several techniques for power management that are difficult to implement with traditional busses. The NoC can be divided into sub-networks; in a specific application, any sub-network that is not operating can be independently powered off. This reduces the static power, which is consumed in greater amounts in a bus architecture.
such models are only an approximate guide and no substitute for an actual structure, they do provide a means to appreciate certain key features (Fig. 6). Firstly, the model shows the expected hydrophobic interactions at the interface mediated by the 'a' and 'd' positions of a heptad. Likewise the model also supports the idea that residues in 'g' and 'e' form stabilizing interactions via oppositely charged residues or through hydrogen-bonding between polar residues. However, there are some notable variations on the general CC theme. The 10th residue, corresponding to an 'a' position, is most often an asparagine rather than a hydrophobic residue. This N is predicted to form stabilizing hydrogen-bonding interactions with its cognate from the adjacent monomer, and is similar to asparagines located in the 'a' or 'd' positions of bZIP proteins. More importantly, the arginine of the conserved ERT signature lies in a 'd' position that is typically hydrophobic. Given its size it is likely that the charged head of the R projects to the exterior, where it could potentially form a polar interaction with the T at the flanking 'e' position (Fig. 6). Such interactions are likely to be critical for the function of the S-helix as suggested by mutational data, and consistent with such a proposal, mutation of the T to other polar residues, like an acidic residue, does not disrupt function, unlike a hydrophobic substitution. These observations suggest that the conserved RT signature of the linker forms a distinctive structural feature that functions as a switch within the CC. It is likely that the arginine, owing to the length of its side chain, can form alternative interactions that respectively prevent or allow downstream domains from "firing". Given that the RT signature lies in the key 'd' position of the central heptad of the S-helix, its interactions are likely to affect the conformation of the entire CC.
This proposal is consistent with the observed polarity in the domain architecture graph, where catalytic domains are typically downstream of the S-helix, with various sensory domains, or the receiver domain that gets phosphorylated on a conserved aspartate, being upstream. Thus, due to its central position, the switch in the CC could respond to a conformational alteration in the upstream domain (the stimulus), only then undergo an appropriate conformational alteration itself, and thereby transmit a signal to allow action of the downstream domains.
With the limited storage of a cache, the aim is a higher hit ratio, and a new prediction model has been created that uses past data, together with offline and online databases of videos. The algorithm is improved by modeling the hit ratio: the more popular a video, the longer it stays in the cache, even as new videos become popular. A small time window, such as a day, can incorporate daily user behavior and achieve higher accuracy, but the need for a small window is one limitation of this system. Hybrid Regression uses watch time, subscription, and age of video; with these parameters it can predict whether a video should be stored in the cache system or not. We do the same in this scenario with our signature-based factors: memory region, memory instruction PC, and instruction sequence. Below is the formula for hybrid regression,
Abstract In hard real-time systems, cache partitioning is often suggested as a means of increasing the predictability of caches in pre-emptively scheduled systems: when a task is assigned its own cache partition, inter-task cache eviction is avoided, and timing verification is reduced to the standard worst-case execution time analysis used in non-pre-emptive systems. The downside of cache partitioning is the potential increase in execution times. In this paper, we evaluate cache partitioning for hard real-time systems in terms of overall schedulability. To this end, we examine the sensitivity of (i) task execution times and (ii) pre-emption costs to the size of the cache partition allocated, and present a cache partitioning algorithm that is optimal with respect to taskset schedulability. We also devise an alternative algorithm which primarily optimises schedulability but also minimises processor utilisation. We evaluate the performance of cache partitioning compared to state-of-the-art pre-emption cost analysis based on benchmark code and on a large number of synthetic tasksets with both fixed-priority and EDF scheduling. This allows us to derive general conclusions about the usability of cache partitioning and to identify taskset and system parameters that influence the relative effectiveness of cache partitioning. We also examine the improvement in processor
Cache memory is widely used in computers and mobile phones. Cache memory is very fast but costly. It comprises three levels: L1 (up to 64 KB), L2 (up to 512 KB), and L3 (2 MB or more). It is faster than RAM. The main point of our cache system is to manage it by inserting addresses into cache memory, retrieving them, and deciding which pages should be removed from cache memory. Traditionally, this cache system is managed with the LRU or MRU method, but LRU and MRU are not optimal. In this paper we discuss several replacement algorithms: LRU, FIFO, Hawkeye/PA-Hawkeye, and SHiP.
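As a baseline for the replacement algorithms discussed, LRU can be sketched in a few lines. This is a generic software model (the class name and capacity are illustrative), not an implementation from the paper.

```python
from collections import OrderedDict

class LRUCache:
    """Least-Recently-Used replacement sketch: on overflow, evict the
    entry that has gone longest without being accessed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # address -> page, oldest first

    def get(self, addr):
        if addr not in self.entries:
            return None                 # cache miss
        self.entries.move_to_end(addr)  # mark as most recently used
        return self.entries[addr]

    def put(self, addr, page):
        if addr in self.entries:
            self.entries.move_to_end(addr)
        self.entries[addr] = page
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
```

MRU would differ only in evicting from the other end of the ordering, while Hawkeye and SHiP replace this fixed recency heuristic with learned predictions of reuse.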
We perform the same actions as for Upload-VM. We then run CCleaner to delete browser data remnants such as passwords, cookies, cache, and history. We also delete the Windows Explorer history, such as the most-recently-used file list, image cache, Recycle Bin, Scrapbook, etc. In IE, the account name cannot be found; in GC, we find the account name in $LogFile and pagefile.sys. All other remnants of the activities of these virtual machines are shown in Table 2. We find data remnants of the keyword Google, the account name, and the test files because EnCase can recover deleted files. Much evidence can be found in system log files and Unallocated Clusters in these experiments, which indicates that the delete operations were performed beforehand.
The purpose of this study is to improve web server performance. Bottlenecks such as network traffic overload and congested web servers remain unsolved as internet usage increases. Caching is one of the popular and efficient solutions to reduce network latency; however, current caching techniques need further study and optimization. Therefore, this study proposes a Two-Tiered Caching System (2TCS). The 2TCS adapts a selective-caching concept with two tiers in its caching architecture and is used to further increase the probability of a cache hit on top of the normal caching system. The proposed technique utilizes SQUID as its web cache server and Mozilla Firefox's default caching system at the client side. A performance comparison of a normal caching system against the 2TCS is made using the IBM Rational Performance Tester.
If the required instruction is not available in the cache, a cache miss occurs, necessitating a fetch from main memory. This is referred to as a pipeline stall and delays processing of the next instruction. Information is passed from one stage to the next by means of storage buffers, as shown in Figure 2. There must be a register at the input of each stage (or between stages) to store information transmitted from the preceding stage. This prevents data being processed by one stage from interfering with the following stage during the same clock period.
improved if the cache memories serve as accelerators. Due to technology scaling, caches are vulnerable to soft errors. In tag-matching caches, different architectures based on the direct-compare method are used, namely the encode-compare and decode-compare architectures. This paper compares the encode-compare and decode-compare architectures using various performance metrics.