6.2 Toy-Train Disk Logging
6.3.4 Comparison with SSD-based Logging
To compare the performance of the Beluga prototype with that of SSD-based logging, we measured the average latency of 1 million logging operations against an SSD-based device with the on-disk cache turned off. The SSD used in this test is a 64-Gbyte SSD based on the JMicron JM612F flash controller and Samsung’s SLC flash memory chips. The result, shown in Table 6.9, indicates that the Beluga prototype’s average logging latency is slightly better than that of SSD-based logging. Admittedly, the device-based logging implementation on the SSD is not as extensively optimized, and we believe the streamlined disk write pipeline described in this work would be equally effective for SSDs. Nonetheless, this result demonstrates that with proper structuring and tuning, hard disk-based logging can be as performant as SSD-based logging. In fact, for MLC SSDs, which have a limited write count (around 10,000), a high-performance, low-latency hard disk logging technique such as Beluga may be a useful complement for handling sequential logging workloads.
6.4 Summary
The disk access pattern of logging is arguably the most straightforward because it is sequential in nature, and yet it is surprisingly difficult to achieve both high logging throughput and low logging latency, especially for fine-grained logging operations. The main reason is that modern I/O stacks and disk drives incorporate redundant request merging and scheduling functionalities that may get in each other’s way. Moreover, although careful control of disk access timing is crucial to delivering high disk I/O performance, there is typically little coordination between the I/O stack and the underlying disks. As a consequence, the latency and throughput of vanilla file-based or device-based logging implementations are far from optimal. Building on our understanding of the root causes of the observed performance problems, we devised a novel logging system architecture called Beluga, which features the following innovations:
• A logging API that supports fine-grained logging (i.e., logging payload size is smaller than a disk sector) with minimum metadata manipulation and data copying,
• A streamlined disk write pipeline that moves fixed-sized disk write requests at a constant rate while minimizing the pipeline cycle time, and
• A low-power sparse-mode logging scheme that achieves low logging latency without requiring disk head position prediction.
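One plausible way to reconcile sub-sector payloads with fixed-size disk write requests is to pack length-prefixed records into a request-sized buffer and issue the buffer once it is full. The sketch below illustrates only that packing idea. It is not Beluga's API: the class name `LogPacker`, the 512-byte sector, the 8-sector request size, and the 2-byte length prefix are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

SECTOR_BYTES = 512                  # assumed sector size
REQUEST_SECTORS = 8                 # assumed fixed request size (8 sectors)
REQUEST_BYTES = SECTOR_BYTES * REQUEST_SECTORS
PREFIX_BYTES = 2                    # assumed length prefix per record


@dataclass
class LogPacker:
    """Hypothetical packer: coalesces small records into fixed-size requests."""
    buffer: bytearray = field(default_factory=bytearray)

    def append(self, payload: bytes) -> List[bytes]:
        """Add one record; return the fixed-size requests that became ready."""
        if not payload or len(payload) + PREFIX_BYTES > REQUEST_BYTES:
            raise ValueError("payload must be non-empty and fit in one request")
        record = len(payload).to_bytes(PREFIX_BYTES, "little") + payload
        ready: List[bytes] = []
        # If the record does not fit, close the current request first.
        if len(self.buffer) + len(record) > REQUEST_BYTES:
            ready.append(self.flush())
        self.buffer += record
        # An exactly full buffer can be issued immediately.
        if len(self.buffer) == REQUEST_BYTES:
            ready.append(self.flush())
        return ready

    def flush(self) -> bytes:
        """Zero-pad the buffer to the fixed request size and reset it."""
        request = bytes(self.buffer).ljust(REQUEST_BYTES, b"\x00")
        self.buffer = bytearray()
        return request


packer = LogPacker()
requests: List[bytes] = []
for _ in range(16):                 # sixteen 256-byte logging operations
    requests += packer.append(b"x" * 256)
print(len(requests), len(requests[0]))
```

Under these assumptions, fifteen 256-byte records fit in one request, so the sixteenth `append` call closes the first request and starts a new one. A real pipeline would also need a timeout-driven `flush` so that a partially filled buffer does not hold records indefinitely.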
Measurements on a fully operational Beluga prototype that embodies all three innovations demonstrate that, using three commodity disks, the Beluga architecture can deliver close to 1.2 million 256-byte logging operations per second while keeping each logging operation’s end-to-end latency below 1 msec. We believe this is the best empirical disk logging performance reported in the open literature to date. With such a high-performing disk logging solution, DISCO is able to integrate Beluga with its various data structures to ensure better disk I/O responsiveness and guaranteed data persistence.
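To put the headline figure in perspective, the following back-of-the-envelope calculation converts it into payload bandwidth. It assumes that the 1.2 million figure is a per-second rate, that the load is spread evenly over the three disks, and that only the 256-byte payloads are counted (metadata and padding are ignored). Megabytes are decimal (10^6 bytes).

```python
# Rough payload bandwidth implied by the reported logging rate.
OPS_PER_SECOND = 1_200_000   # assumed per-second rate ("close to 1.2 million")
PAYLOAD_BYTES = 256          # payload size per logging operation
NUM_DISKS = 3                # commodity disks in the prototype


def payload_mb_per_s(ops_per_s: int, payload_bytes: int) -> float:
    """Return payload bandwidth in decimal megabytes per second."""
    return ops_per_s * payload_bytes / 1_000_000


total_mb_s = payload_mb_per_s(OPS_PER_SECOND, PAYLOAD_BYTES)
per_disk_mb_s = total_mb_s / NUM_DISKS  # assumes an even split across disks

print(f"aggregate: {total_mb_s:.1f} MB/s, per disk: {per_disk_mb_s:.1f} MB/s")
```

Under these assumptions the aggregate payload bandwidth is roughly 307 MB/s, or about 102 MB/s per disk.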
Chapter 7
Quality of Service Guarantee for Software-Defined Distributed Storage Systems
7.1 SDDS System Architecture in the Context of Managing QoS Functionality
A cloud storage system manages the storage requirements of its tenants’ applications, freeing tenants from managing their applications’ storage resources. Although a tenant benefits greatly from this flexibility, the advantages of a cloud storage system are negated if the tenant does not receive satisfactory performance. For example, a cloud storage system could have a large-scale array of advanced flash-based SSDs that can process I/O requests at a high rate of 10,000 I/O operations per second (IOPS). Suppose a tenant has a real-time application that requires each I/O request to be processed within 1 ms. Even though, on average, a large majority of the application’s I/O requests are processed well within the latency requirement, some of them could still fail to be processed within 1 ms because the cloud storage system is not configured to handle strict latency requirements. As a result, the tenant’s application fails, and the tenant is dissatisfied with the performance offered by the cloud storage system. It is therefore common practice to bind performance to quality of service (QoS). QoS in a storage system can be specified by various metrics, such as bandwidth, in terms of either megabytes per second (MBPS) or IOPS, and latency, in terms of the maximum time (in microseconds) to process an I/O request. Tenants specify these QoS metrics in service level agreements (SLAs) with the cloud storage service provider at the time of purchasing their storage services. The cloud storage service provider either accepts or rejects an SLA, depending on the feasibility of guaranteeing its QoS requirements. Since the most important aspect of a cloud storage system is to offer efficient software services over the physical hardware resources so as to ensure satisfactory storage performance, such a system is commonly referred to as a software-defined distributed storage (SDDS) system.

Figure 7.1: Detailed overview of the components of a VDC
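The real-time example above, where the average latency looks healthy but a strict per-request bound is still violated, can be made concrete with a small sketch. The names `LatencySLA` and `sla_report` are hypothetical, and the latency samples are made up purely for illustration; they are not measurements from this work.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class LatencySLA:
    """Hypothetical SLA: every I/O request must finish within max_latency_us."""
    max_latency_us: float


def sla_report(latencies_us: List[float], sla: LatencySLA) -> Dict[str, object]:
    """Summarize observed latencies against a strict per-request bound."""
    violations = [t for t in latencies_us if t > sla.max_latency_us]
    return {
        "mean_us": mean(latencies_us),
        "max_us": max(latencies_us),
        "violations": len(violations),
        "met": not violations,      # strict: one slow request breaks the SLA
    }


# Made-up sample: 998 fast requests (100 us) and 2 slow ones (5 ms).
samples = [100.0] * 998 + [5000.0] * 2
report = sla_report(samples, LatencySLA(max_latency_us=1000.0))
print(report)
```

In this sample the mean latency is far below the 1 ms bound, yet two requests exceed it, so a strict latency SLA is not met. This is why a latency guarantee has to be stated and enforced per request (or per percentile) rather than inferred from average behavior.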
There are various challenges in designing QoS for an SDDS system. In this work we propose a QoS model called Cheetah, which uses several novel techniques to enable the SDDS system to provide storage virtualization with accurate QoS guarantees. The challenges, design objectives, and novel techniques involved in Cheetah are better conveyed once the SDDS system’s architecture is clearly defined. Section 1.2 defines the SDDS system’s architecture at a much higher level and is intended specifically to introduce the context in which the pieces of this dissertation fit together. In the following subsections, we describe the system model and service model of the SDDS system architecture in greater detail, which helps in understanding the challenges involved in designing QoS features for the SDDS system.