Cloud Storage Servers Need Transactional Isolation

2.1.2 Cloud Storage Servers Need Transactional Isolation

Transactional isolation support from the cloud storage server would benefit most applications and systems running on top of the server, because the cloud inherently entails concurrent data accesses. Cloud storage servers host a variety of applications and systems, but most of them implement transactional isolation on their own. Thus, transactional isolation has become a redundantly implemented feature: it is used by many but not supported by the cloud storage server itself. This redundancy arises because the storage stack is traditionally designed to place sophisticated functionalities, including transactions, in the high layers of the stack [79, 108]. However, the cloud opens up new opportunities to question the traditional storage stack design. The logical block device layer is the lowest common software layer of the storage stack; it has been kept simple, but it is used directly or indirectly by most systems and applications. To support transactional isolation (in addition to atomicity and durability) as a feature that the cloud storage server provides to all systems and applications, we reason about supporting transactions from the block layer and explore the potential benefit.

End-to-End Argument

The end-to-end argument [108] is a system design guideline that helps the designer decide where to place a particular functionality in a layered system. The argument advocates placing functionalities in the end application or the high layers of the software stack in two cases. The first is when application-specific care or information is necessary even after the low layer has processed the functionality. Exceptions can be made when there is a performance or utility reason to place the functionality down the stack. The second is when placing the functionality in the low layer incurs unnecessary overhead for applications that do not use the functionality. To summarize, if a functionality is not complete by itself in the low layer, is not usable by most applications in the system, or gains no performance benefit from being placed in the low layer, it should be located in the high layers of the stack. We carefully review the end-to-end argument to investigate the soundness of transactional isolation support from the block layer.

Although our approach may not fully comply with the first part of the end-to-end argument, the exception to the first clause makes transactional isolation in the block layer a viable approach, especially in cloud storage servers. Transactional isolation has been implemented in high layers of the storage stack, partially because of the first part of the end-to-end argument: transactional isolation requires information about which parts of the data the application has accessed in a transaction. However, transactional isolation in the block layer of cloud storage servers passes the first clause of the end-to-end argument through its utility exception, because block layer support is useful for most applications running on a cloud storage server. Applications in the cloud require transactional isolation or concurrency control by default, and once the necessary information for a transaction becomes available to the block layer, applications do not have to handle transactions redundantly. One of our goals is to provide a general block-level API for transactions, so that applications can easily adopt transactional features from the block layer.

Considering the second part of the end-to-end argument, transactional isolation can be implemented efficiently at a low layer of the stack, with negligible performance overhead, using the flash-based storage devices, terabytes of RAM, and tens to hundreds of cores that already exist in cloud storage servers. Moreover, the fact that most applications require transactional isolation in the cloud eliminates the concern of imposing unnecessary performance overhead on applications that do not use the functionality.

Other Needs and Benefits

In addition to examining the end-to-end argument, we investigate the advantages of, and other goals for, supporting transactional isolation from the block layer:

Overcoming the complexity of locks: Storage systems typically implement pessimistic concurrency control via locks, opening the door to a wide range of aberrant behavior such as deadlocks. A deadlock is a state in which a program cannot make any progress because its processes or threads have each locked different data and are waiting indefinitely for the others to release their locks. This problem is exacerbated when developers attempt to extract more parallelism via fine-grained locks, and add more complexity by incorporating mechanisms for atomicity and durability [87]. Transactions can simplify storage system design by supplying isolation, atomicity, and durability at the same time through a single abstraction.
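The deadlock described above can be reproduced in a few lines. The following is a minimal sketch, not taken from any storage system: two threads acquire the same pair of locks in opposite order. The names `worker` and `run_opposite_order` are hypothetical, and a timeout replaces the indefinite wait only so that the example terminates.

```python
import threading

def worker(first, second, rendezvous, outcome, name):
    """Take `first`, then try to take `second` while still holding `first`."""
    with first:
        rendezvous.wait()                       # both threads now hold one lock each
        acquired = second.acquire(timeout=0.2)  # a plain acquire() would wait forever
        if acquired:
            second.release()
        outcome[name] = acquired
        rendezvous.wait()                       # keep `first` held until both attempts end

def run_opposite_order():
    lock_a, lock_b = threading.Lock(), threading.Lock()
    rendezvous = threading.Barrier(2)
    outcome = {}
    threads = [
        threading.Thread(target=worker, args=(lock_a, lock_b, rendezvous, outcome, "t1")),
        threading.Thread(target=worker, args=(lock_b, lock_a, rendezvous, outcome, "t2")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome

if __name__ == "__main__":
    print(run_opposite_order())
```

Neither thread obtains its second lock, because each one is held by the other thread. With finer-grained locks the number of possible acquisition orders grows, which is why this class of bug becomes harder to rule out by inspection.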

Supporting a generic transaction: Storage systems often provide concurrency control APIs over their high-level storage abstractions; for example, NTFS, a Windows filesystem, offers transactions over files, while Linux provides file-level locking. Unfortunately, these high-level concurrency control primitives often have complex, weakened, and idiosyncratic semantics [98]; for instance, NTFS provides transactional isolation for accesses to the same file, but not for directory modifications, while a Linux lock acquired with fcntl commands is released, without an explicit unlock, as soon as the owning process closes any file descriptor that refers to the file [5]. The complex semantics are typically a reflection of a complex implementation, which has to operate over high-level constructs such as files and directories. In addition, if each storage system implements isolation independently, transactions cannot span different systems: for example, it is impossible to run a single transaction over a file on NTFS and an arbitrary database system. One of our goals in exploring block-level transactions is to support transactions over multiple systems that work on different data constructs.
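The fcntl behavior mentioned above can be observed directly. The sketch below is illustrative only and assumes a Unix-like system where `os.fork` is available. It uses Python's standard `fcntl.lockf`, which wraps the fcntl locking calls; the helper names `another_process_can_lock` and `demo_implicit_unlock` are hypothetical.

```python
import fcntl
import os
import tempfile

def another_process_can_lock(path):
    """Fork a child and report whether it can take an exclusive lock on `path`."""
    pid = os.fork()
    if pid == 0:                                  # child process
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            os._exit(0)                           # the lock was free
        except OSError:
            os._exit(1)                           # the lock is held elsewhere
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

def demo_implicit_unlock():
    fd, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)            # lock taken through `fd`
        held_before = not another_process_can_lock(path)
        unrelated = os.open(path, os.O_RDONLY)    # e.g. a library that reads the file
        os.close(unrelated)                       # no unlock is ever requested
        held_after = not another_process_can_lock(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)

if __name__ == "__main__":
    print(demo_implicit_unlock())
```

On Linux the lock is held before the unrelated descriptor is closed and gone afterwards, even though `fd` itself stays open and no unlock was issued. Any code in the same process that opens and closes the file can therefore drop the lock silently.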

Efficient transactions using multiversion concurrency control: Pessimistic concurrency control with locks is slow and prone to bugs; for example, when locks are exposed to end applications directly or via a transactional interface, an application could hang while holding a lock. Optimistic concurrency control [74] works well in this case, ensuring that other transactions can proceed without waiting for the hung process. Multiversion concurrency control works even better. Multiversion concurrency control (MVCC) is an optimistic concurrency control mechanism that maintains multiple versions of data to serve users with different snapshots. Transactions with stable, consistent snapshots (a key property for arbitrary applications that can crash if exposed to inconsistent snapshots [59]) allow read-only transactions to always commit [38] and enable weaker but more performant isolation levels such as snapshot isolation [36].
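The claim that a hung process does not block others under optimistic concurrency control can be illustrated with a small sketch. The code below is a simplified, single-threaded model of commit-time validation, not the mechanism of any cited system; the names `VersionedStore`, `Transaction`, and `ConflictError` are hypothetical, and a real implementation would need to make `commit` atomic.

```python
class ConflictError(Exception):
    """Raised when a value read by the transaction changed before commit."""

class VersionedStore:
    def __init__(self):
        self.data = {}                      # key -> (version, value)

    def begin(self):
        return Transaction(self)

class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}             # key -> version observed on first read
        self.writes = {}                    # buffered until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        version, value = self.store.data.get(key, (0, None))
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value            # no lock is taken

    def commit(self):
        # Validation: abort if anything we read has been overwritten since.
        for key, seen in self.read_versions.items():
            current, _ = self.store.data.get(key, (0, None))
            if current != seen:
                raise ConflictError(key)
        for key, value in self.writes.items():
            version, _ = self.store.data.get(key, (0, None))
            self.store.data[key] = (version + 1, value)

def demo_stalled_transaction():
    store = VersionedStore()
    stalled = store.begin()
    stalled.read("balance")                 # reads, then stops making progress
    active = store.begin()
    active.write("balance", 100)
    active.commit()                         # proceeds without waiting for `stalled`
    stalled.write("balance", 1)
    try:
        stalled.commit()
        aborted = False
    except ConflictError:
        aborted = True
    return store.data["balance"], aborted
```

In `demo_stalled_transaction`, the active transaction commits while the stalled one is still open, and the stalled transaction is the one that aborts when it eventually tries to commit. Under locking, the active transaction would have waited instead.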

However, implementing MVCC can be difficult for storage systems due to its inherent need for multiversion states. High-level storage systems are not always intrinsically multiversioned, making it difficult for developers to switch from pessimistic locking to an MVCC scheme. Multiversioning can be particularly difficult to implement for complex data structures like B-trees (balanced trees commonly used in databases and filesystems to index data blocks), which require explicit markers for deleted data, known as tombstones [53, 103].

In contrast, multiversioning is relatively easy to implement over the static address space provided by a block store (for example, no tombstones are required since addresses can never be deleted). Additionally, many block stores are already multiversioned in order to obtain write sequentiality: examples are shingled drives [26], SSDs, and log-structured disk stores. Thus, as an efficient implementation strategy for transactions, we investigate pushing MVCC into the block layer as well.
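To make the argument about a static address space concrete, the following is a minimal sketch of a multiversioned block store built on an append-only log. It is an illustration under simplifying assumptions (in-memory, single-threaded, no garbage collection) and is not the design presented in Chapter 4; the class name `MultiversionBlockStore`, its methods, and `BLOCK_SIZE` are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only

class MultiversionBlockStore:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.log = []                                    # append-only block data
        self.versions = [[] for _ in range(num_blocks)]  # per address: (timestamp, log index)
        self.commit_ts = 0

    def snapshot(self):
        """Return a timestamp naming the current consistent state."""
        return self.commit_ts

    def write_batch(self, blocks):
        """Install all blocks in `blocks` (address -> bytes) as one new version."""
        for address in blocks:
            if not 0 <= address < self.num_blocks:
                raise IndexError(address)
        self.commit_ts += 1
        for address, data in blocks.items():
            self.log.append(data)                        # sequential write
            self.versions[address].append((self.commit_ts, len(self.log) - 1))
        return self.commit_ts

    def read(self, address, snapshot_ts):
        """Return the newest version of `address` that is visible at `snapshot_ts`."""
        for ts, index in reversed(self.versions[address]):
            if ts <= snapshot_ts:
                return self.log[index]
        return bytes(BLOCK_SIZE)                         # never-written blocks read as zeros

def demo_snapshots():
    store = MultiversionBlockStore(num_blocks=4)
    store.write_batch({0: b"a", 1: b"b"})
    old = store.snapshot()
    store.write_batch({0: b"c"})
    new = store.snapshot()
    return store.read(0, old), store.read(0, new), store.read(1, new)
```

Every address in the fixed range always resolves to some version, so a reader holding an old snapshot needs no tombstone to interpret the store, and overwrites never disturb what that reader sees. Because new versions are only appended, the write pattern is sequential, which matches the log-structured stores mentioned above.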

Chapter 4 details how we design a transactional block layer using a new multiversion concurrency control method and demonstrates how this facilitates cloud application designs.

In document Isolation in Cloud Storage (Page 47-52)