Summary - Erasure coding for distributed storage systems

In this chapter, we compare three linear coding schemes offering different locality properties: Ham, a locally repairable code with all-blocks locality r = 3, rs(4,3), a Reed-Solomon code representing the worst-case locality r = k = 4, and Pyr(4, 3, ⟨2⟩), a Pyramid code with information locality ⟨r⟩ = 2.

We experimentally evaluate Ham using simulations and a real-world fault trace. Our results show the benefits of relying on efficient XOR operations for encoding, as well as increased performance at repairing faulty systems.

We also present an efficient method for repairing Ham based on designs. When compared to rs(4,3) and Pyr(4, 3, ⟨2⟩), Ham shows lower latency, thanks to the repairing procedure. Similarly, using a real-world trace, Ham consumes less bandwidth thanks to its all-blocks locality property.

Our results also highlight how information locality and locality should not be confused: we show that a code offering information locality can outperform standard MDS codes in term of repair bandwidth when few erasures occur, and that such a restriction does not apply to the code with all-blocks locality.

Chapter 6

Block placement for fault

resilient distributed tuple

spaces

6.1 Introduction

We are currently observing a deluge of data originated by our personal devices. Distributed applications must be able to efficiently collect, store, process and expose data. When dealing with such applications, developers need to settle on a specific programming model, to i) facilitate the implementation of such systems and ii) retain user-friendliness and ability to scale, both horizontally and geographically. Distributed storage systems are one prominent example of such applications. They are typically operated across wide area networks, such as Amazon AWS, which currently comprises 15 geographical regions.1 _In

such deployment scenarios, applications must transparently tolerate faults, a common threat for distributed systems.

The basic strategy to tolerate faults is to rely on block replication, which entails a huge storage overhead. A state-of-the-art solution to decrease such overhead while providing the same level of fault tolerance is to use erasure coding techniques. With a systematic linear code of length n and dimension k, each codeword consists of n blocks: k source blocks for the original data, and n − k redundant blocks. The storage overhead is n−k_k , and if the code is MDS, any k of the n blocks are necessary and sufficient to recover the original data, as detailed is Section 2.1.

From a fault tolerance point of view, it is optimal to place the n blocks of a codeword on different logical units (with respect to failures), so that the MDS code can tolerate up to n − k failures. A logical unit can be a single node (in this case for the optimum it is sufficient to place different blocks of a codeword on different nodes), but it can also be a cluster of nodes (e.g., a set of machines physically hosted in a single room can go down at the same moment if the cooling system of the room fails). In this second scenario, one is tempted to spread different blocks of a codeword into separate and faraway

clusters. Although being optimal with respect to fault tolerance, this solution affects negatively the latency to fetch the blocks.

The case of distributed tuple spaces A programming model consists of two separate pieces: the computation model and the coordination model. The computation model allows the programmer to build a single computational unit, while the coordination model is the glue that binds separate activities into an ensemble [78]. The tuple space paradigm, based on this idea, offers a flexi- ble technique to program parallel and distributed systems, by providing the abstraction of a shared space to which all the processes have access. In this model, communication between processes is indirect and anonymous as it is done through the shared (distributed) space. Moreover, data exist in a tuple space and do not belong to any process. Despite the simplicity of the model, very few implementations of tuple spaces offer fault tolerant facilities usually in the form of data replication ([79, 80]), with the drawbacks of space overhead and consistency maintenance. In this chapter, we consider an extended, distributed tuple space system with erasure coding capabilities. A tuple to be inserted in the tuple space is erasure coded and its blocks are placed across the nodes joining the tuple space group.

Chapter organization First, we study how to distribute the encoded blocks of single codewords over a large-scale network, in order to decrease the fetch latency. We do so by designing and evaluating several block placement heuristics, over synthetic and real-world network topologies. Second, we evaluate how the proposed heuristics behave with respect to data loss when injecting faults into the topology. Third, we leverage the results of our simulations to identify two suitable placement strategies that we deploy atop a simple distributed tuple space system with the aim of evaluating their performance in a practical setting. This chapter is organized as follows. We survey the related work in Sec- tion 6.2 and introduce the tuple space paradigm in Section 6.3. In Section 6.4 we describe our block placement heuristics, which we evaluate in Section 6.5. We leverage the results to drive the prototype implementation described in Sec- tion 6.6. The implementation supports both erasure coding techniques and a pluggable mechanism to choose among the different placement strategies. We test the prototype in Section 6.7 and conclude in Section 6.8.

In document Erasure coding for distributed storage systems (Page 64-66)