3.5.1
Distributed Hash Table
A Distributed Hash Table (DHT) is a decentralized distributed system that supports accessing the data in a way similar to hash table. Same as the traditional hash table, data is organized as key-value pairs in DHT. The most distinguished feature of DHT is that all the key-value pairs are stored among multiple nodes and changing the set of the participating nodes will not significantly impact the system.
The foundation of DHT is an abstract key-space which will be split to all the participants according to specific key-space partitioning scheme. An overlay network is designed to describe the network topology and manage the key-space. Different implementations of DHT can have the same network topology, but they must have varying key-space partitioning schemes.
The typical procedure of storing and retrieving data in DHT is depicted in figure 3.17. The first step of saving and finding a key-value pair (k,v ) is to generate the hash value of k using a designated hashing function. After that, a mapping function will try to ask a set of nodes according to the key-space partitioning scheme to find out the node which is responsible for hosting k. Saving and Finding request will be then directly sent to the node [67].
DHT provides a way to make a distributed system become scalable, autonomic, and resilient. The most popular designs of DHT is consistent hashing.
3.5.2
Consistent Hashing
Consistent Hashing was first introduced by MIT when developing a distributed caching solution that supports dynamic joins and leaves of servers. A common way for the key-space partitioning mechanism to calculate the destination of a given key is to calculate the result of key%n (key mod n). However, such mechanism
Figure 3.17: Procedure for Saving and Finding data in DHT
might cause severe performance problems because all keys need to be remapped when n changes. Consistent Hashing provides a way to guarantee that if the size of the hash table changes, then only k/n keys need to be re-shuffled, where k is the number of keys and n is the size of the hash table. Therefore, Consistent Hashing is remarkably suitable for the mobile environment where nodes are frequently connecting and disconnecting [68].
Consistent Hashing maps each key to a point on an abstract circle called the consistent hashing ring, and each available nodes will be randomly mapped to the ring as well. When locating a key, the system will first find the location of the key in the consistent hashing ring by calculating the result of hash(key). Then the system will travel the ring clockwise until it encounters an available node. This node will then be selected as the desired destination for the given key. When a node becomes unavailable, it will be removed from the consistent hashing ring and only the keys in the lost node need to be remapped to the next available node in the ring. A similar process will be applied when a new node joins the system. The new node will be mapped to the consistent hashing ring and the keys stored in its next clockwise available node will be recalculated
and the keys whose position are at the counterclockwise direct of the newly joining node will be moved to the new node.
How Consistent Hashing Works
Consistent Hashing is based on evenly mapping each object to a point on the edge of a circle. The system maps each available node to many pseudo-randomly distributed points on the edge of the same circle. The remainder of this section will illustrate the detail of Consistent Hashing.
Figure 3.18: Consistent Hashing Illustration
• The Ring
As shown in the first part of figure 3.18, consistent hashing will map a traditional hash table space with a specified table size, typically 232, to a closure ring.
• Mapping Objects to the Ring
As shown in the middle of figure 3.18, objects will be mapped to some of the slots in the hash ring via a specially designed hash function.
• Mapping Nodes to the Ring
The same hash function will be applied to place the actual nodes evenly on the ring. Each object will be stored in the node that is closest to it in its clockwise direction.
Figure 3.19 explains how consistent hashing deals with the addition and removal of nodes. If a new node has been added to the ring, like Node 4 in the above figure, it will first be mapped to the ring according to the same hash algorithm for mapping objects. All the objects between it and the nearest node in its counterclockwise direction will be transferred to it. Take the figure 3.19 for example, object 2 will be transferred from node 3 to node 4. When a node is removed from the ring, take Node 2 for example, all the objects it currently holds will be automatically transferred to its successor, i.e. Node 4.
Figure 3.20: Consistent Hashing Ring Balanced by Virtual Nodes
However, when there are only a few real nodes in the hash ring, the system might be out of balance. For instance, after node 4 has been removed from the system, there will be three objects stored in node 3 while node 1 now hosts only one object, making node 3 a hot spot. Node 3 is made the hot spot because it now contains significantly more data, or ”objects”, compared to other nodes in the hashing ring. However, we do not want a hot spot because it contains an inconsistent amount of data compared to other nodes, causing the system to route a significantly higher amount of requests to the hot spot. The hot spot will not be capable of efficiently processing this massive amount of requests, making it a bottleneck within the system, while other nodes are left in idle wasting their valuable resources. This problem is solved by introducing the concept of virtual node. As shown in figure 3.20, an actual node will be divided into several virtual nodes. Each node, including the original node, will be labeled by its real name and an incremental suffix number. All the virtual nodes along with the actual nodes will be put together and evenly distributed to the hash ring. Therefore, the system is balanced again [68].