
Multi-Level Buffer Cache Management

[Figure residue: workload labels CS, GLIMPSE, ZIPF, RANDOM, SPRITE]

1. httpd was collected on a 7-node parallel web server for 24 hours [71]. The size of the data set served was 524 MB, which is stored in 13,457 files. A total of about 5M

5.5 Related Work and Discussions

To address the challenges of replacement in a buffer cache hierarchy, researchers have mainly tried two approaches: (1) re-designing the low-level cache replacement; (2) extending existing replacement into a unified hierarchy replacement through coordination. The MQ algorithm [82] is representative of the first approach. However, without coordination with clients, the performance potential of MQ is significantly constrained. Since LRU is commonly used in software-managed buffer caches due to its simplicity and adaptability, Wong and Wilkes [76] propose a protocol that integrates a two-level buffer cache hierarchy into a single, large unified cache based on “demotion” operations, and manage it using LRU. Their goal is to effectively utilize the built-in cache in RAID, so the network they assume is a high-speed SAN. To reduce the possible network bottleneck caused by demotions in a database client and storage server system, Chen et al. [14] even propose re-loading evicted blocks from disks rather than from clients. Our technique reduces demotions by effectively utilizing history access patterns.
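The demotion idea can be illustrated with a minimal Python sketch. It is not the protocol of [76]; it only assumes a two-level exclusive hierarchy in which a block evicted from the client is demoted to the server, so that the two caches together behave like one LRU list. The class name `DemotingHierarchy` and its methods are hypothetical.

```python
from collections import OrderedDict


class DemotingHierarchy:
    """Two-level exclusive cache kept in one LRU order via demotions.

    Illustrative only: names and structure are assumptions, not the
    published protocol.
    """

    def __init__(self, client_size, server_size):
        self.client = OrderedDict()  # least recently used block first
        self.server = OrderedDict()  # least recently demoted block first
        self.client_size = client_size
        self.server_size = server_size
        self.demotions = 0  # each demotion costs one network transfer

    def access(self, block):
        """Return where the block was found: 'client', 'server', or 'disk'."""
        if block in self.client:
            self.client.move_to_end(block)
            return "client"
        if block in self.server:
            del self.server[block]  # exclusive: the block moves up
            where = "server"
        else:
            where = "disk"
        if len(self.client) >= self.client_size:
            victim, _ = self.client.popitem(last=False)
            self._demote(victim)
        self.client[block] = True
        return where

    def _demote(self, block):
        """Send a block evicted by the client down to the server cache."""
        self.demotions += 1
        if len(self.server) >= self.server_size:
            self.server.popitem(last=False)  # falls out of the hierarchy
        self.server[block] = True


h = DemotingHierarchy(client_size=2, server_size=2)
results = [h.access(b) for b in "abcda"]
print(results, "demotions:", h.demotions)
```

Every client eviction in this sketch triggers a transfer to the server, which is the network cost that motivates reducing demotions.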

Jiang and Zhang [33] propose the LIRS replacement algorithm to address the performance degradation of LRU on workloads with weak locality. They use a LIRS stack to track the recencies of accessed blocks. Blocks that are accessed at small recencies are kept in the cache. This single-level cache replacement motivates us to investigate whether the last locality distance, LLD, can be effectively used to exploit hierarchical locality, so that blocks with different locality strengths can be arranged into the correct cache levels.
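As a rough illustration of the recency notion, the following Python sketch computes, for each reference in a trace, the recency at which the block is re-accessed. It assumes that recency means the number of distinct other blocks referenced since the block's previous reference, and that a block's last locality distance is the most recent such value; both readings, and the function names, are assumptions made for this example. The list-based stack is linear per access and is chosen only for clarity.

```python
def reaccess_recencies(trace):
    """For each reference, return the recency at re-access, or None on first access."""
    stack = []  # LRU stack as a list, most recently used block last
    out = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Number of distinct blocks referenced since this block's last reference.
            out.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            out.append(None)
        stack.append(block)
    return out


def last_distances(trace):
    """Map each re-accessed block to the recency of its most recent re-access."""
    last = {}
    for block, recency in zip(trace, reaccess_recencies(trace)):
        if recency is not None:
            last[block] = recency
    return last


trace = ["a", "b", "c", "a", "b", "a"]
print(reaccess_recencies(trace))
print(last_distances(trace))
```

Under these assumptions, blocks with small values would be candidates for the upper cache levels and blocks with large values for the lower ones.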

The work on cooperative caching [23, 66, 74] coordinates the buffer caches of many client machines distributed on a LAN to form a fourth level in the network file system's cache hierarchy. Besides the local cache, server cache, and server disk, data can also be cached in another client's cache. Associated issues include idle cache availability and consistent sharing. Our ULC protocol is intended for the conventional file buffer cache hierarchy, while the characterization of non-uniform locality is expected to enhance the effectiveness of data placement in cooperative caching.

As far as the cache hierarchy between processor and memory is concerned, the interaction of replacements at various levels and its performance implications do not pose a problem. Multi-level inclusivity among the L1, L2, ..., Ln caches can be accepted as a principle to simplify the cache coherence protocol [3]. This is because a lower-level cache is more than ten times larger than its upper-level cache. With this large difference, the reduction in useful cache size due to data redundancy has only a limited negative performance impact on the low-level caches. In contrast, the sizes of buffer caches in the hierarchy do not follow this regularity: a client buffer cache could even be larger than the second-level cache.
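The size argument can be made concrete with a small Python calculation. It assumes the worst case for redundancy, in which the contents of the smaller cache are fully duplicated in the larger one; the function names and the example sizes are illustrative and do not come from the measurements in this chapter.

```python
def worst_case_unique_blocks(upper, lower):
    """Distinct blocks held by two levels when the smaller one is fully duplicated."""
    return max(upper, lower)


def redundant_fraction_of_lower(upper, lower):
    """Share of the lower-level cache that may only duplicate upper-level blocks."""
    return min(upper, lower) / lower


# Processor-style hierarchy: lower level ten times larger than the upper level.
print(redundant_fraction_of_lower(upper=1, lower=10))

# Buffer-cache hierarchy: client cache twice as large as the server cache.
print(redundant_fraction_of_lower(upper=20, lower=10))
print(worst_case_unique_blocks(upper=20, lower=10))
```

With a tenfold size gap, at most a tenth of the lower level is lost to duplication, whereas a client cache larger than the server cache can, in this worst case, leave the server cache holding no unique blocks at all.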

We assume ULC works in a trusted environment. Though it is a client-directed protocol, ULC does not increase the vulnerability of servers. This is because even with independent caching schemes, a client still has the opportunity to abuse server buffers by sending extra requests to servers to keep its blocks in the server.

The underlying algorithms in almost all existing file systems are LRU or its variants. ULC basically inherits their data structure, the LRU stack. The operations on the stacks cost O(1) time per reference request. Regarding the space cost of the stacks, we need 17 bytes per block in the client (8 bytes for the file identifier and block offset, 8 bytes for two pointers in a doubly linked list, and 1 byte for statuses), which represents only 0.2% of an 8 KB block. The metadata in the shared server cache needs one or two additional bytes to record the block owner. The stack sizes on levels other than the first are determined by their cache sizes. Thus a server with a 1 GB cache uses only 2.2 MB for its metadata. The first-level cache has to hold the uniLRU stack, whose actual size is determined by the working set size of the applications running on the client. If needed to save space, the relatively cold blocks (with low-level statuses) can be trimmed from the stack without compromising ULC's ability to distinguish locality. For example, 8.5 MB of metadata in the client can support a workload with a working set of up to 4 GB. This is highly affordable in a system striving for improved file I/O performance through large caches.
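The space figures above can be checked with a short Python calculation. It assumes binary units (8 KB = 8192 bytes, 1 GB = 2^30 bytes) and the 17-byte client entry; the function name `metadata_bytes` is hypothetical, and the server-side owner bytes are left out because the text gives them only as "one or two".

```python
BLOCK_SIZE = 8 * 1024       # 8 KB block, assuming binary units
ENTRY_BYTES = 8 + 8 + 1     # identifier and offset, two pointers, status byte
GIB = 2 ** 30
MIB = 2 ** 20


def metadata_bytes(cache_bytes, entry_bytes=ENTRY_BYTES, block_size=BLOCK_SIZE):
    """Stack metadata needed to track every block of a cache or working set."""
    return (cache_bytes // block_size) * entry_bytes


print(f"per-block overhead: {ENTRY_BYTES / BLOCK_SIZE:.2%}")
print(f"1 GB cache: {metadata_bytes(1 * GIB)} bytes")
print(f"4 GB working set: {metadata_bytes(4 * GIB) / MIB} MiB")
```

Under these assumptions the per-block overhead is about 0.2%, a 1 GB cache needs 2,228,224 bytes (roughly 2.2 MB in decimal units, before owner bytes), and a 4 GB working set needs exactly 8.5 MiB, in line with the figures quoted in the text.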

5.6 Summary

An effective caching scheme for a multi-level cache hierarchy is important to the performance of applications because more and more applications rely on the hierarchy for their file accesses. After carefully investigating the non-uniform locality strength quantifications in representative file access patterns, we propose the ULC caching protocol. Compared with the commonly used independent LRU scheme and other recently proposed schemes, the ULC protocol shows distinct performance merits: (1) it consistently and significantly reduces the average block access time perceived by applications; (2) it can be implemented efficiently with O(1) time complexity, with only a few stack operations associated with each reference.
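The constant-time claim rests on the LRU stack being a doubly linked list. The Python sketch below shows one way such a stack can support O(1) operations per reference. It is not the ULC implementation: the hash index used to locate an entry, the class names, and the use of a (file identifier, block offset) tuple as the key are assumptions made for this example, and the dictionary lookup is constant time only on average.

```python
class Node:
    """One stack entry: block key, status byte, and two list pointers."""
    __slots__ = ("key", "status", "prev", "next")

    def __init__(self, key, status=0):
        self.key = key
        self.status = status
        self.prev = None
        self.next = None


class LRUStack:
    def __init__(self):
        self.index = {}          # key -> node (assumed lookup structure)
        self.head = Node(None)   # sentinel; head.next is the stack top
        self.tail = Node(None)   # sentinel; tail.prev is the stack bottom
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_top(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def touch(self, key):
        """Record a reference: move the entry to the top, creating it if new."""
        node = self.index.get(key)
        if node is None:
            node = Node(key)
            self.index[key] = node
        else:
            self._unlink(node)
        self._push_top(node)

    def pop_bottom(self):
        """Remove and return the least recently used key, or None if empty."""
        node = self.tail.prev
        if node is self.head:
            return None
        self._unlink(node)
        del self.index[node.key]
        return node.key

    def keys_top_down(self):
        """List keys from most to least recently used (linear, for inspection)."""
        keys, node = [], self.head.next
        while node is not self.tail:
            keys.append(node.key)
            node = node.next
        return keys


stack = LRUStack()
for ref in [("f", 0), ("f", 1), ("f", 2), ("f", 0)]:
    stack.touch(ref)
print(stack.keys_top_down())
```

Each reference touches a fixed number of pointers regardless of stack length, which is why the per-reference cost does not grow with the cache size.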

Chapter 6

In document Efficient caching algorithms for memory management in computer systems (Page 189-192)