4.2 DiSTM Generic Design
4.3.3 Distributed Atomic Collection Classes
This section describes in detail the distributed atomic collection classes implemented while porting the benchmarks (see Section 5.2.2) to DiSTM. The description of each class is split into two parts: the first covers the behavior of the class in centralized mode, and the second its behavior in decentralized mode.
4.3.3.1 Distributed Atomic Singleton Objects
Centralized Mode: The implementation of singleton objects is fairly straightforward in centralized mode. Upon initialization, the master node creates the object and assigns its OID. The object is then copied to the worker nodes, where locally running threads can access it.
Decentralized Mode: In decentralized mode, the node that first accesses (initializes) the object assigns its OID and becomes its HOMENODE. Any subsequent request for that object from a remote node results in the object being cached from its home node.
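The decentralized behavior of a singleton can be illustrated with a minimal sketch. Everything named here (SingletonCluster, SingletonNode, the "node:counter" OID format) is hypothetical and not taken from DiSTM; an in-memory map stands in for the network, and transactional conflict handling is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the cluster; no real networking is modelled.
class SingletonCluster {
    // OID -> node that first initialized the object (its home node).
    private final Map<String, Integer> homeNodeOf = new HashMap<>();
    // OID -> copy of the object held at its home node.
    private final Map<String, Object> homeCopy = new HashMap<>();

    void publish(String oid, int homeNode, Object value) {
        homeNodeOf.put(oid, homeNode);
        homeCopy.put(oid, value);
    }

    int homeNode(String oid) { return homeNodeOf.get(oid); }

    Object fetch(String oid) { return homeCopy.get(oid); }
}

class SingletonNode {
    private final int id;
    private final SingletonCluster cluster;
    private final Map<String, Object> cache = new HashMap<>();
    private int created = 0;   // per-node counter used to build OIDs (assumption)
    int remoteFetches = 0;     // counts requests that had to go to a home node

    SingletonNode(int id, SingletonCluster cluster) {
        this.id = id;
        this.cluster = cluster;
    }

    // First access: this node assigns the OID and becomes the object's home node.
    String initialize(Object value) {
        String oid = id + ":" + (created++);   // assumed OID format, for illustration only
        cluster.publish(oid, id, value);
        cache.put(oid, value);
        return oid;
    }

    // Later access: serve from the local cache, fetching from the home node once if needed.
    Object access(String oid) {
        if (!cache.containsKey(oid)) {
            cache.put(oid, cluster.fetch(oid));
            remoteFetches++;
        }
        return cache.get(oid);
    }
}
```

In this sketch a node that initializes an object never issues a remote fetch for it, while any other node fetches it once and then reads its cached copy.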
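[Figure 4.5 panels: HORIZONTAL, VERTICAL and BLOCK partitioning. In the HORIZONTAL and VERTICAL panels the array is divided among Node 0, Node 1, Node 2, ..., Node N; the BLOCK panel shows blocks owned by Node 0, Node 1, Node 2 and Node 3.]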
Figure 4.5: Partitioning types of distributed atomic arrays
4.3.3.2 Distributed Atomic Arrays
Centralized Mode: As with singleton objects, the master node creates and initializes the whole array, assigning an OID to each array element. The array is cached on the worker nodes, and threads can access it normally.
Decentralized Mode: In this mode the array is partitioned amongst the nodes of the cluster. The partitioning can be achieved in several ways such as HORIZONTAL, VERTICAL or BLOCK; see Figure 4.5.
The home node of each array partition initializes it and assigns OIDs to the elements it contains. If a thread attempts to access an array element that lies outside the boundaries of the partition owned by the node on which the thread resides, a remote request is broadcast and the corresponding array element is fetched and cached on the local node.
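The ownership test implied above can be sketched as follows. The exact layout of each scheme is an assumption made for illustration: HORIZONTAL is taken to mean bands of rows, VERTICAL bands of columns, and BLOCK a square grid of tiles numbered row by row, with the node count assumed to be a perfect square. The names Partitioning and ArrayPartitioner are hypothetical.

```java
// Hypothetical sketch: maps an element of a 2-D array to the node assumed to own it.
enum Partitioning { HORIZONTAL, VERTICAL, BLOCK }

class ArrayPartitioner {
    private final int rows;
    private final int cols;
    private final int nodes;
    private final Partitioning type;

    ArrayPartitioner(int rows, int cols, int nodes, Partitioning type) {
        this.rows = rows;
        this.cols = cols;
        this.nodes = nodes;
        this.type = type;
    }

    // Returns the id of the node whose partition contains element (row, col).
    int homeNode(int row, int col) {
        switch (type) {
            case HORIZONTAL:
                return bandOf(row, rows, nodes);      // assumed: bands of rows
            case VERTICAL:
                return bandOf(col, cols, nodes);      // assumed: bands of columns
            case BLOCK: {
                // Assumed: nodes form a grid x grid layout, numbered row by row.
                int grid = (int) Math.round(Math.sqrt(nodes));
                return bandOf(row, rows, grid) * grid + bandOf(col, cols, grid);
            }
            default:
                throw new IllegalStateException("unknown partitioning: " + type);
        }
    }

    // True when the element can be served locally; otherwise a remote request is needed.
    boolean isLocal(int nodeId, int row, int col) {
        return homeNode(row, col) == nodeId;
    }

    // Index of the band containing position pos when length is cut into equal bands
    // (band size rounded up, so the last band may be shorter).
    private static int bandOf(int pos, int length, int bands) {
        int size = (length + bands - 1) / bands;
        return pos / size;
    }
}
```

A thread on node 1 accessing element (5, 0) of an 8 x 8 array split into four blocks would find isLocal(1, 5, 0) false under these assumptions, which is the case where the remote request described above is issued.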
4.3.3.3 Distributed Atomic LinkedList
The type of linked list required for the implemented benchmarks prohibits duplicates. Hence, a lookup phase precedes every addition in order to discover whether the element to be inserted already exists in the list. The following descriptions concern the list as implemented, which is not a generic linked list, although it would be possible to implement a generic one.
Centralized Mode: During the initialization phase, the head of the linked list is created at the master node and cached on the worker nodes. In this way, all worker nodes can access the head of the list and perform operations on it. During an insertion, the previous value as well as the new value of the head pointer are sent to the master node in order to commit the data to the linked list residing on the master node (which always holds the consistent view). The master node, being responsible for keeping the cached versions of the list on the worker nodes consistent, multicasts the new values to them, aborting any conflicting transactions. Removals and retrievals are performed similarly to insertions.
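The commit step of a centralized insertion can be sketched as below. The source states only that the previous and new head values are sent to the master; the validation rule used here (commit only if the previous value still matches the master's head) is an assumption, as are the names HeadCell, MasterList and WorkerView. A loop over registered workers stands in for the multicast.

```java
import java.util.ArrayList;
import java.util.List;

// Immutable list cell; a new head is created for every insertion.
class HeadCell {
    final int value;
    final HeadCell next;

    HeadCell(int value, HeadCell next) {
        this.value = value;
        this.next = next;
    }
}

// Hypothetical master node: holds the consistent view of the list head.
class MasterList {
    private HeadCell head;                                   // always the consistent view
    private final List<WorkerView> workers = new ArrayList<>();

    synchronized void register(WorkerView worker) {
        workers.add(worker);
        worker.cachedHead = head;
    }

    // Assumed validation: commit only if the sender's previous head is still current.
    synchronized boolean commit(HeadCell previousHead, HeadCell newHead) {
        if (head != previousHead) {
            return false;                                    // conflicting transaction aborts
        }
        head = newHead;
        for (WorkerView worker : workers) {
            worker.cachedHead = newHead;                     // stands in for the multicast
        }
        return true;
    }
}

// Hypothetical worker node: operates on its cached copy of the head.
class WorkerView {
    HeadCell cachedHead;
    private final MasterList master;

    WorkerView(MasterList master) {
        this.master = master;
        master.register(this);
    }

    // Returns true if the value was inserted and committed at the master.
    boolean insert(int value) {
        HeadCell seen = cachedHead;
        for (HeadCell cell = seen; cell != null; cell = cell.next) {
            if (cell.value == value) {
                return false;                                // duplicates are prohibited
            }
        }
        return master.commit(seen, new HeadCell(value, seen));
    }
}
```

Under these assumptions, two commits that both name the same previous head cannot both succeed: the second finds that the master's head has moved and is rejected.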
Decentralized Mode: In this mode the situation differs significantly from that of centralized mode. Each node now has its own head of the list. The list is fully distributed amongst the nodes, and operations may involve requests to remote nodes. Figure 4.6 presents the state diagrams of the three major operations on a list (Insertion, Removal and Lookup).
Upon an insertion, the list is traversed in order to discover whether the element already exists. If the element is present, it is returned and the operation completes. If the element is not found, a remote request is broadcast. If the element exists on a remote node, it is fetched and cached on the caller node, and the operation completes by returning the fetched element. If the remote request fails (the requested element does not exist in the cluster), the caller node inserts the element into the list and completes the operation by returning the value.
The removal operation, as shown in Figure 4.6, differs slightly from the insert operation: upon a failed lookup and a successful remote request, an insert operation is simulated before the element is removed.
Finally, the lookup operation has two phases. First, the part of the distributed list residing on the local node is checked; if the element exists, it is returned. Otherwise, an insertion operation is simulated in order to fetch the element from the remote nodes. If the simulated insertion succeeds (i.e. the element exists in the cluster), the element is fetched, cached on the local node, and returned (and finally removed from the list). Otherwise, a null value is returned.
[Figure 4.6 panels, each starting with an Initial Lookup that leads to a Remote Request when the element is Not Found:
Insertion (Insert Element): Initial Lookup Found: Return Element. Remote Request Found: Add Fetched Element, Return. Remote Request Not Found: Add Element, Return.
Removal (Remove Element): Initial Lookup Found: Remove, Return true. Remote Request Found: Add Fetched, Remove it, Return true. Remote Request Not Found: Return false.
Lookup (Lookup Element): Initial Lookup Found: Return Element. Remote Request Found: Add Fetched Element, Return. Remote Request Not Found: Return null.]
Figure 4.6: State diagrams of distributed linked list operations (decentralized mode).
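The three state diagrams can be traced with a minimal sketch. It rests on loudly stated assumptions: an in-memory list of peers stands in for the broadcast, elements are integers, and a successful fetch moves the element to the caller so that it lives on exactly one node. The source does not say whether the remote copy is kept, and the transactional protocol is not modelled. DListNode and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical node holding its own part of the distributed list.
class DListNode {
    private final LinkedList<Integer> local = new LinkedList<>();
    private final List<DListNode> cluster;

    DListNode(List<DListNode> cluster) {
        this.cluster = cluster;
        cluster.add(this);
    }

    // Remote request: ask every other node for the element.
    // Assumption: a hit moves the element from the peer to this node.
    private boolean fetchRemote(Integer element) {
        for (DListNode peer : cluster) {
            if (peer != this && peer.local.remove(element)) {
                local.add(element);
                return true;
            }
        }
        return false;
    }

    // Insertion: local hit, else remote hit, else add the element locally.
    Integer insert(Integer element) {
        if (local.contains(element)) {
            return element;                 // Found locally: Return Element
        }
        if (fetchRemote(element)) {
            return element;                 // Found remotely: Add Fetched Element, Return
        }
        local.add(element);                 // Not found anywhere: Add Element, Return
        return element;
    }

    // Removal: a remote hit is first added locally, then removed.
    boolean remove(Integer element) {
        if (local.remove(element)) {
            return true;                    // Found locally: Remove, Return true
        }
        if (fetchRemote(element)) {
            local.remove(element);          // Add Fetched, Remove it, Return true
            return true;
        }
        return false;                       // Not found anywhere: Return false
    }

    // Lookup: local hit, else remote hit, else null.
    Integer lookup(Integer element) {
        if (local.contains(element)) {
            return element;
        }
        return fetchRemote(element) ? element : null;
    }

    boolean holdsLocally(Integer element) {
        return local.contains(element);
    }
}
```

Creating two nodes over a shared `new ArrayList<DListNode>()`, inserting 7 on the first and looking it up on the second leaves 7 held by the second node in this sketch; a lookup of an element that exists nowhere returns null.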
4.3.3.4 Distributed Atomic HashMap
The Distributed HashMap has been modeled as a non-atomic array of distributed atomic linked lists. Therefore, its operation is similar to that of the distributed atomic linked list. Figure 4.7 illustrates the structure of the hashmap. The bucket[] array holds the heads of the distributed atomic linked lists. When a transaction accesses an element of the hashmap, the key is hashed and the corresponding bucket index is retrieved in order to perform the operation.
Centralized Mode: In this mode, the master node, knowing the size of the hashmap in advance, creates and initializes the bucket[] array as well as the heads of the atomic lists. Those values are copied to the worker nodes, and hence the application threads can operate on the hashmap. The transactions' commit procedures follow the same strategy as the corresponding centralized versions of the collection classes described above.
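The indexing step can be sketched as follows. Plain java.util.LinkedList instances stand in for the distributed atomic linked lists, a List replaces the bucket[] array to avoid generic array creation, and the reduction of the hash code with Math.floorMod is an assumption; the source does not state how the hash is mapped to a bucket. BucketedMap is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: an array of buckets, each the head of a (here, plain) linked list.
class BucketedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stands in for bucket[]; the size is known in advance.
    private final List<LinkedList<Entry<K, V>>> bucket = new ArrayList<>();

    BucketedMap(int size) {
        for (int i = 0; i < size; i++) {
            bucket.add(new LinkedList<>());
        }
    }

    // Hash the key and reduce it to a valid bucket index (assumed scheme).
    int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), bucket.size());
    }

    // Inserts or updates; returns the previous value, or null if the key was absent.
    V put(K key, V value) {
        LinkedList<Entry<K, V>> list = bucket.get(indexFor(key));
        for (Entry<K, V> entry : list) {
            if (entry.key.equals(key)) {
                V old = entry.value;
                entry.value = value;
                return old;
            }
        }
        list.add(new Entry<>(key, value));
        return null;
    }

    // Returns the value for the key, or null if the key is absent.
    V get(K key) {
        for (Entry<K, V> entry : bucket.get(indexFor(key))) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null;
    }
}
```

Once the bucket index is known, every operation reduces to an operation on one list, which is why the hashmap can reuse the list behavior described in the previous subsection.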
CHAPTER 4. DISTM ARCHITECTURE 57
[Figure 4.7 labels: Buckets; Distributed Atomic Linked Lists.]
Figure 4.7: Distributed atomic hashmap structure.
Decentralized Mode: [...] privately owned bucket[] arrays, assigning distinct OIDs per list head. This scheme can be regarded as a collection of distributed atomic linked lists running in decentralized mode. After the key is indexed (by hashing the key object associated with the value) and the corresponding bucket[] index is retrieved, the operations adhere to the state diagrams of Figure 4.6.