Related work - On the design of efficient caching systems

We already provided an extensive overview of previous work regarding content caching in Chapter 2. It is however worthwhile discussing two further streams of work related to the contributions of this chapter. These concern hash-routing protocols for content caching and caching systems designed to optimise operator-wide performance metrics.

4.7.1 Hash-routing

Hash-routing is a well-known technique widely used in Web caches, albeit only in the context of co-located caches. The classic approach consists of applying modulo hashing to a content identifier to map a content item to a server. However, this approach has the major drawback that adding or removing a server causes a considerable remapping of contents among servers, leading to a transient drop in performance. This problem was originally addressed by Thaler and Ravishankar [163], who proposed a mapping algorithm named Highest Random Weight (HRW). According to HRW, each content is mapped to an ordered list of servers. Requests are routed to the first server of the list and, in case of high load or failure, to the next server of the list until an available server is found. This approach was later adopted in CARP [173].
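The difference between modulo hashing and HRW can be illustrated with a minimal Python sketch. It is not taken from [163] or [173]: the function names, the use of SHA-256 as hash function, the server names and the `is_available` callback are all assumptions made for illustration only.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """64-bit hash of the given strings (SHA-256 is an arbitrary, assumed choice)."""
    digest = hashlib.sha256("|".join(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_server(content: str, servers: list[str]) -> str:
    """Classic modulo hashing: the hash indexes into the server list."""
    return servers[stable_hash(content) % len(servers)]


def hrw_ranking(content: str, servers: list[str]) -> list[str]:
    """HRW: order servers by decreasing weight hash(server, content)."""
    return sorted(servers, key=lambda s: stable_hash(s, content), reverse=True)


def hrw_server(content: str, servers: list[str], is_available=lambda s: True) -> str:
    """Return the first available server in the HRW ordering."""
    for server in hrw_ranking(content, servers):
        if is_available(server):
            return server
    raise RuntimeError("no server available")


def remapped_fraction(mapper, contents, before, after) -> float:
    """Fraction of contents whose server changes between two server sets."""
    moved = sum(mapper(c, before) != mapper(c, after) for c in contents)
    return moved / len(contents)


# Hypothetical workload: 10,000 content names and 10 servers, one of which is removed.
contents = [f"/video/{i}" for i in range(10_000)]
servers = [f"cache-{i}" for i in range(10)]
survivors = servers[:-1]  # cache-9 is removed

print("modulo:", remapped_fraction(modulo_server, contents, servers, survivors))
print("HRW:   ", remapped_fraction(hrw_server, contents, servers, survivors))
```

With a well-mixed hash function, going from ten to nine servers should remap roughly nine contents out of ten under modulo hashing, but only about one in ten under HRW, namely those whose first-ranked server was the removed one.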

A major breakthrough came with the invention of consistent hashing [100], which guarantees that when a server is added or removed, only the contents mapped to the added/removed server are remapped.
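The remapping guarantee can be checked with a small hash-ring sketch in Python. This is an illustrative implementation, not the construction of [100]: the `HashRing` class, the number of virtual nodes per server and the use of SHA-256 are assumptions.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Position of a key on the ring (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (hypothetical helper class)."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed at `vnodes` pseudo-random points on the ring.
        self._ring = sorted(
            (ring_hash(f"{server}#{v}"), server)
            for server in servers
            for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, content: str) -> str:
        """Map a content to the first server point clockwise from its hash."""
        i = bisect.bisect_right(self._points, ring_hash(content)) % len(self._ring)
        return self._ring[i][1]


servers = [f"cache-{i}" for i in range(10)]
before = HashRing(servers)
after = HashRing(servers[:-1])  # cache-9 is removed

contents = [f"/video/{i}" for i in range(10_000)]
moved = [c for c in contents if before.lookup(c) != after.lookup(c)]

# Every remapped content was previously mapped to the removed server.
assert all(before.lookup(c) == "cache-9" for c in moved)
print(f"{len(moved)} of {len(contents)} contents remapped")
```

Virtual nodes are used here only to spread each server's share of the ring more evenly; the remapping property holds without them.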

Other noteworthy contributions on hash-routing come from Ross [145], who investigated the performance of hash-routing protocols in comparison to other cooperative caching schemes, and Tang and Chanson [161], who proposed adaptive load balancing techniques for hash-routing by tuning hashing weights.

All these hashing techniques can be applied to our proposed framework as well, since it does not mandate any specific hashing scheme.

Subsequent to the publication of the contributions presented in this chapter, further proposals have investigated the application of hash-routing techniques in the context of ICN. Among those is CoRC [40], which, like our framework, proposes to use hash-routing techniques among the ICN routers of a domain, but with the main objective of reducing the size of the forwarding table that each router needs to store by partitioning it across routers. Saha et al. [150] also proposed the use of hash-routing for cooperative caching, but focused on the specific use case of inter-AS caching without taking into account intra-AS design considerations, such as load balancing and optimisation aspects, which our proposal addresses. Their approach is therefore complementary to ours, which focuses on intra-AS caching.

4.7.2 Distributed caching to optimise operator-wide performance metrics

Although considerable research effort has focused on the design of cooperative caching schemes, most work addresses the general case of caches not necessarily managed by an ISP. Our approach, by specifically addressing the case of operator-managed content caching, enables the optimisation of ISP-wide performance metrics. Two recent pieces of work also addressed the optimisation of operator-wide metrics, albeit providing solutions completely different from ours. Pacifici and Dan [136] investigate the case of content-level peering, in which ASes cooperatively cache contents in their networks. Using game-theoretical concepts, they propose caching algorithms enabling a stable cache allocation. Araldo et al. [9] propose an on-path meta-caching policy driven by the economic cost of retrieving content items from provider ISPs.

4.8 Conclusion

In this chapter we presented hash-routing techniques for use in geographically distributed environments. We showed that these hash-routing schemes provide several advantages in comparison to state-of-the-art techniques.

They provide an excellent cache hit ratio, hence reducing transit costs. They also contribute to reducing latency. Most importantly, and differently from other schemes, hash-routing techniques have a set of key advantages. First, they are extremely robust against rapid variations of traffic patterns and traffic spikes, whether regarding specific locations or contents. Second, they evenly distribute traffic across a network domain, de facto eliminating the possibility of hot spots. Third, they are easier to model and, as a result, provide predictable performance and make cache placement easier. In addition, their robustness to peaks makes them better candidates for emerging ISP-CDN cooperative caching schemes, given that they do not require adaptive changes, which also makes it easier to keep SLAs under control. Fourth, and of key importance, they provide several knobs to fine-tune performance metrics depending on the target topology, deployment models and expected workloads.

On the design and implementation of high-speed caching nodes

5.1 Introduction

After discussing modelling and design aspects of distributed caching, focusing on how a set of caching nodes can interact to optimise performance, we now investigate how to improve the design and implementation of a specific caching node of a networked system.

In particular, we address the problem of how to optimise the design of read-through caching nodes, which are a fundamental building block of key networked systems such as CDNs and future ICN architectures such as CCN [89] and NDN [183]. A read-through caching node receives queries for specific content items in the form, for example, of an HTTP GET request or a CCN/NDN Interest. If it currently stores the requested content, it serves it to the requester from its caching space, otherwise it fetches the requested content from the origin or an upstream cache, stores it and finally serves it to the requester.
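The read-through behaviour described above can be summarised with a minimal Python sketch. The class name, the `fetch` callback standing in for the origin or upstream cache, and the LRU replacement policy are assumptions made for illustration; the text above does not prescribe a replacement policy.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy read-through cache with LRU replacement (LRU is an assumed policy)."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity          # maximum number of cached items
        self.fetch = fetch                # retrieves content from origin/upstream
        self.store: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self.store:
            # Cache hit: serve from the caching space and refresh recency.
            self.hits += 1
            self.store.move_to_end(name)
            return self.store[name]
        # Cache miss: fetch upstream, store, then serve.
        self.misses += 1
        data = self.fetch(name)
        self.store[name] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used item
        return data


origin_requests = []


def fetch_from_origin(name: str) -> bytes:
    """Hypothetical origin: records the request and returns dummy content."""
    origin_requests.append(name)
    return f"content of {name}".encode()


cache = ReadThroughCache(capacity=2, fetch=fetch_from_origin)
for name in ["a", "a", "b", "c", "a"]:
    cache.get(name)
print(cache.hits, cache.misses, origin_requests)
```

Note that every miss triggers both a read from upstream and a write into the cache, which is why such a node must handle concurrent reads and writes efficiently.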

We aim to improve the state of the art by designing and implementing a cost-effective caching node capable of sustaining a high cache hit ratio and line-speed operations. Our approach relies on using inexpensive commodity hardware together with highly optimised software. However, achieving these objectives is challenging. In fact, reactive read-through caches, unlike proactive content caches, store content items as they serve them. Therefore, they need to support concurrent reads and writes efficiently and make intelligent caching decisions to maximise the cache hit ratio while being able to sustain line-speed operations.

Arguably, the most important performance metric for a read-through cache is the throughput of traffic served from its cache memory, i.e., the throughput originated by cache hits. In this respect, there are two key hardware properties limiting the extent of software optimisations. The first is the capacity of the cache memory, as it directly impacts the number of distinct content objects that can be cached and therefore the cache hit ratio. The second is the external data rate or bandwidth of the memory, since it determines the maximum read throughput that the cache memory can attain. However, building a system with large and fast memory using a single technology is very expensive. We address this problem by exploiting heterogeneous technologies, specifically DRAM and Flash-based SSD drives, which operate at different points of the cost/performance tradeoff. We devise novel techniques that efficiently take advantage of specific characteristics of the heterogeneous memory available to improve performance.

A large amount of work has already focused on designing key-value stores using a combination of DRAM and Flash memories (see for example [5], [7], [48], [50], [119]). However, none of these designs is suitable for our purposes. In fact, the vast majority of them only support store semantics and cannot be efficiently used for caching. Even the few designs that do support caching (e.g., [5], [49]) are not optimised for high-speed caching and primarily target applications requiring very large datasets, such as data deduplication. As such, they focus on hardware configurations that minimise the cost/capacity ratio of the memory used, and all assume a case in which DRAM is very small in comparison to Flash memory, to the point of being barely sufficient to store the index of the contents stored in Flash. As a result, all these designs use DRAM only to store the index and as a write buffer for the SSD, while all content items are stored exclusively in SSD. A substantial part of that work focuses on devising novel ways to compress the index or to store part of it on SSD while keeping read and write amplification under control.

Our key intuition is that in high-speed caching applications, a hardware configuration that primarily maximises the number of stored items is not always beneficial if the cache is not fast as well. In fact, a large memory can store many items and therefore lead to a higher cache hit ratio. However, if the memory cannot sustain the throughput required to serve all hits, the large amount of available space is underutilised. On the other hand, a small but fast memory might not be able to operate at its full bandwidth, because its limited space does not allow it to store enough content to generate a high cache hit ratio.
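This intuition can be made concrete with a simple back-of-the-envelope model, sketched below in Python. The model and all numbers are illustrative assumptions, not measurements: the throughput served from cache is taken to be the smaller of the hit traffic offered to the memory and the bandwidth the memory can sustain.

```python
def hit_throughput(demand_gbps: float, hit_ratio: float, memory_bw_gbps: float) -> float:
    """Throughput served from cache under a simplified, assumed model:
    hits are limited either by the hit ratio or by the memory bandwidth."""
    return min(demand_gbps * hit_ratio, memory_bw_gbps)


# Hypothetical configurations facing 10 Gbps of request traffic.
large_slow = hit_throughput(demand_gbps=10, hit_ratio=0.8, memory_bw_gbps=4)
small_fast = hit_throughput(demand_gbps=10, hit_ratio=0.2, memory_bw_gbps=100)

print("large but slow memory:", large_slow, "Gbps")  # bandwidth-bound
print("small but fast memory:", small_fast, "Gbps")  # hit-ratio-bound
```

In the first configuration half of the potential hit traffic cannot be served because of the bandwidth limit, while in the second most of the available bandwidth remains idle because too few requests hit the cache.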

Table 5.1 shows the average three-year Total Cost of Ownership (TCO) of DRAM and SSD technologies normalised by capacity and bandwidth as of 2015. As can be observed, while it is more cost-effective to use SSDs to achieve a large capacity, DRAM is considerably more cost-effective to achieve high throughput. Fortunately, the power-law distribution of content popularity, typical of content delivery workloads [25], makes it possible and very effective to combine a small and fast memory like DRAM, storing a few very popular contents, with a slow and large memory like SSD, storing a larger number of less popular contents.

Table 5.1: Three-year TCO of DRAM and SSD memory technologies, including cost of hardware and energy at 0.10 USD/kWh

Memory    Cost/capacity (USD/GB)    Cost/bandwidth (USD/Gbps)
DRAM      6.72                      0.63
SSD       1.1                       56.21
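The asymmetry in Table 5.1 can be quantified with a few lines of Python. The cost figures are copied from the table; the provisioning targets (1000 GB and 40 Gbps) and the assumption that cost scales linearly with capacity and bandwidth are hypothetical and used only for illustration.

```python
# Figures copied from Table 5.1 (three-year TCO, as of 2015).
TCO = {
    "DRAM": {"usd_per_gb": 6.72, "usd_per_gbps": 0.63},
    "SSD": {"usd_per_gb": 1.1, "usd_per_gbps": 56.21},
}


def capacity_cost(tech: str, gigabytes: float) -> float:
    """Cost of provisioning a given capacity, assuming linear scaling."""
    return TCO[tech]["usd_per_gb"] * gigabytes


def bandwidth_cost(tech: str, gbps: float) -> float:
    """Cost of provisioning a given bandwidth, assuming linear scaling."""
    return TCO[tech]["usd_per_gbps"] * gbps


capacity_ratio = TCO["DRAM"]["usd_per_gb"] / TCO["SSD"]["usd_per_gb"]
bandwidth_ratio = TCO["SSD"]["usd_per_gbps"] / TCO["DRAM"]["usd_per_gbps"]
print(f"DRAM costs {capacity_ratio:.1f}x more than SSD per GB")
print(f"SSD costs {bandwidth_ratio:.1f}x more than DRAM per Gbps")

# Hypothetical targets: 1000 GB of capacity and 40 Gbps of read bandwidth.
for tech in TCO:
    print(tech, round(capacity_cost(tech, 1000)), "USD for capacity,",
          round(bandwidth_cost(tech, 40)), "USD for bandwidth")
```

According to these figures, DRAM is about 6 times more expensive than SSD per unit of capacity, whereas SSD is about 89 times more expensive than DRAM per unit of bandwidth.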

Based on this tradeoff, we design a system that uses both DRAM and SSD to store content in a way that efficiently exploits the key properties of both. Differently from previous work, we actively use DRAM to store hot content items (in addition to the index data structures) to take advantage of its low cost/bandwidth. We disregard HDDs in our design because of their very low throughput and high random access latency, which make them unsuitable for our purpose. In addition, SSD prices are dropping much faster than HDD prices and, according to market forecasts [60], the launch of 3D NAND SSD drives scheduled for Q3 2016 will drive the cost of SSDs further down, to the point that both the cost/capacity and the cost/bandwidth of SSDs will be lower than those of HDDs by the end of 2016.
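To see why a small DRAM layer holding hot items can absorb a large share of requests under power-law popularity, consider the following Python sketch. It assumes a Zipf popularity distribution and an idealised placement in which each layer holds exactly the most popular items it has room for; the exponent, the catalogue size and the layer sizes are hypothetical and do not describe the system designed in this chapter.

```python
def zipf_request_share(first_rank: int, last_rank: int,
                       catalogue_size: int, alpha: float) -> float:
    """Share of requests addressed to items ranked first_rank..last_rank
    (1-based, inclusive) when the item of rank r has weight r**(-alpha)."""
    total = sum(r ** -alpha for r in range(1, catalogue_size + 1))
    tier = sum(r ** -alpha for r in range(first_rank, last_rank + 1))
    return tier / total


# Hypothetical scenario: 100,000 items, Zipf exponent 1.0.
N, ALPHA = 100_000, 1.0
DRAM_ITEMS = 1_000   # top 1% of the catalogue kept in DRAM
SSD_ITEMS = 9_000    # next 9% of the catalogue kept in SSD

dram_share = zipf_request_share(1, DRAM_ITEMS, N, ALPHA)
ssd_share = zipf_request_share(DRAM_ITEMS + 1, DRAM_ITEMS + SSD_ITEMS, N, ALPHA)

print(f"requests served from DRAM: {dram_share:.1%}")
print(f"requests served from SSD:  {ssd_share:.1%}")
```

Under these assumptions, the DRAM layer serves roughly 60% of all requests while storing only 1% of the catalogue, and the nine times larger SSD layer serves a further 19% or so. Most of the read bandwidth is therefore demanded from the layer where bandwidth is cheapest.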

This chapter provides two main contributions. First, we examine a number of design options and derive general principles applicable to the design and implementation of caches for content distribution purposes, which could be adopted in backend key-value caching systems (e.g., memcached [130]), Content Delivery Networks and the packet-level caches envisaged by ICN proposals. Second, based on these general principles, we design and implement H2C, a hierarchical two-layer line-speed packet cache that we use as a content store for CCN/NDN routers, and evaluate its performance with trace-driven experiments. We focus on the ICN use case because of its highly demanding technical challenges and also because it is the application that has received the least attention from the research community so far. Its technical challenges mainly stem from caching at packet-level granularity, which requires storing a very large number of small objects and supporting a frequency of read operations considerably higher than that of the object-level caches of CDNs. It should however be noted that most of the findings of this work are applicable to other use cases as well.

The remainder of this chapter is organised as follows. In Sec. 5.2 we provide general guidelines for the design of two-layer DRAM/SSD caches. Based on these findings, we describe the design and implementation of H2C in Sec. 5.3. In Sec. 5.4 we evaluate the performance of H2C integrated with a CCN/NDN content router using trace-driven experiments. In Sec. 5.5 we survey related work and highlight the unique contributions of this chapter. Finally, we summarise our findings and present our conclusions in Sec. 5.6.
