Load Balancing in Distributed Systems: A survey

(1)

Load Balancing in Distributed Systems:

A survey

Amit S Hanamakkanavar^* and Prof. Vidya S.Handur^#

*([email protected]) Dept of Computer Science & Engg, B.V.B.College of Engg. & Tech, Hubli

# ([email protected]) Dept of Computer Science & Engg, B.V.B.College of Engg. & Tech, Hubli.

Abstract: In real world, computer server load balancing is the process of distributing service requests across a group of servers. Users of one workstation are not obstructing by the intensive applications run on a different workstation. When there are large number of machines on the network are idle, then the efficiency of computation process is decreased. The efficient sharing of computing resources in a distributed system is a more complex compared to centralized system. Resources are fragmented and distributed over a set of autonomous and physically separate hosts. Load balancing promises to reduce the average response time of processes by sharing the workload of heavily loaded workstations with lightly loaded workstations.

Even though a lot of study has happened in balancing the load across the network, still the issue needs to be addressed from performance perspective.

In this survey an attempt is made to review the representative studies on load balancing and summarize the existing studies. The paper is a brief discussion on Load Balancing in Distributed systems using the following approaches: Distributed Hash Tables, Clustered Approach, Adaptive and Decentralized Load Estimation for Computational Grid Environments. The paper also describes some of the challenges in load balancing. Through the survey, the related studies in distributed systems can be well understood based on how they can satisfy the general characteristics of distributed systems. Survey extends to elaborate the Game Theory approach for load balancing and future research studies in the distributed systems.

Keywords: Distributed system, Load balancing, Grid environment, Game theory, Hash table

I. INTRODUCTION

Distributed systems [1] in which computational units are distributed over the network are connected and organized by network to handle the requests from heterogeneous users around the world.

Distributed systems offers potential for sharing and combining of different resources such as computer, database, etc. Resources are distributed and may be owned by different organizations or agents.

Distributed systems are viewed as a collection of computing and communicating resources shared by multiple users.

Application areas of distributed systems are various but the way of handling the user request may remain similar. Applications in distributed systems can be divided into number of tasks and be executed on different nodes. When the demand for resources increase, load balancing of the system becomes significant. Therefore it is very essential to have a management system that makes decision to balance the load of a node.

Load balancing in distributed system is broadly classified as static and dynamic load balancing.

Static load balancing mostly suitable for homogeneous and stable environment, will produce good results in these environments. But they are not flexible and cannot adapt to dynamically changing structure of the system. Dynamic load balancing is flexible and can provide good results in heterogeneous and dynamic environments. But however these two could be inefficient in producing good results for balancing the load and may be overall degradation of the service performance due to their own limitations.

A. Static load balancing

Static load balancing is based on predetermined system information and is used when system will be stable and the number of node participation is fixed and known at compile time. Balancing decision is made based on the average workload of the system. Static load balancing is well suitable when system structure will not change frequently.

Static load balancing algorithms distribute load based on fixed set of predetermined rules.

B. Dynamic load balancing

Load balancing is based on the current state of the system. Dynamic load balancing is performed at run time when system load and number of nodes participating are likely to change. This increases the overhead of monitoring the system over the time and leaves system in more complex state.

Decision of distributing load across the system is based on the whole system load. Here, as the participating nodes increase system becomes more complex to handle the load across each node.

(2)

In this paper we present a survey of the current load balancing on Peer-2-Peer systems (P2P), Grid environment, Clustered Heterogeneous Computational Systems, Computational Grid Systems and challenges in distributed system. After that we discuss about game theory approach for load balancing and future research studies in the distributed systems.

II. PEER TO PEER SYSTEMS P2P system [2] in which interconnected nodes (peers) share resources among each other without the use of centralized administrative system to handle the requests to the resources. P2P systems partitions task or work and distribute between the different nodes. Nodes are said to form a peer-to- peer network of nodes. Peers contain their resources, such as processing power, storage and directly available to other nodes in the network without the need of centralized management. Peers are considered as the suppliers and consumers of the resources.

A P2P system with Distributed Hash Tables (DHT) provides unique identifier that is associated with each data item (that is object) and each node in the distributed system. The identifier space is distributed among the nodes that form the P2P systems. Each node stores the item that is identified by the identifiers which are stored in its identifier space. To provide reliability, robustness and scalability system requires load balancing algorithms to fairly distribute the load among all the participating peers.

The load is related to links between nodes, objects which are information stored in the system and its frequency of access. Each node in the system is limited by the processing time, access bandwidth, storage capacity. Load contains the request rate, computational power spent on the request processing.

When the node issues query to retrieve the object, redirection of query to the node which is responsible for the object is done by routing function. Depending on the number of links between the peers, the request traverses multiple nodes on its way to the object.

In DHT each node maintains list of outgoing links to its neighbours in routing table. Forwarding node executes the routing algorithm and selects a next neighbour from its routing table which is closest to destination node. Nodes with smaller number of incoming links will receive fever request than node with many incoming links. To fairly share the traffic load, routing table must be recognized in such a way that incoming links per node must be balanced.

TABLE I. SUMMARIZATION OF WELL- KNOWN DHT DESIGNS.

Technique Description

CHORD [3]

Is a distributed lookup protocol, which locates the node containing the particular data item. Supports one operation: It maps given key onto the node. Routing table is distributed over the different nodes.

PASTRY [4]

It performs object location and application-level routing in very large P2P systems. Each node has unique identifier. When message is sent with the node id, PASTRY routes the message to the node with the node id matching. Each node contains its immediate neighbour’s node id in its node id space.

CAN [5]

It is a Content Addressable Network protocol. Which efficiently maps keys onto values in large scale distributed systems.

Here central sever stores the key value of the node and user queries this central server with required file name and obtains key value on the node containing requested file.

It uses two-dimension table to store the neighbour nodes’ key value that is IP address of that node.

KAD [6]

Kademlia: This is a distributed hash table with the minimum number of messages sent to neighbours to know about each other. Nodes are identified by the KAD ID, which is of 128 bit long random number generated by cryptographic hash function.

Routing is done based on the prefix matching strategy. Entries of the routing table are called contacts and structured like unbalanced routing tree.

(3)

A. Issues in P2P systems

1. Securities will be the main concern in P2P system. Attackers may add malware to P2P system as an attempt to take control of other nodes in the system.

2. Usage of bandwidth will be heavy.

3. If distribution hash table fails, then nodes fail to obtain the current information about the neighbor nodes.

III. CLUSTERED HETEROGENEOUS COMPUTATIONAL SYSTEMS Clusters contain heterogeneous [7] nodes to execute parallel applications those require considerable amount of computational resources or data storage. Load balancing is decentralized where decision is taken by communicating with each node of the cluster. Heuristic neighbor selection approach reduces number of communication messages sent to neighbor nodes by sending load information only on imbalance of workload at its neighbors. Each cluster contains load balancer (LB) and nodes which are interconnected. LB performs resource management and scheduling user jobs. LB examines arriving jobs and depending on load information of its neighbor nodes and self, it decides which node is assigned for the job execution.

Initially the load information about neighbors will not be known by the load balancer, so load information is assumed to be less at the beginning. Over the time, LB gathers load information about its neighbors by exchanging workload messages. Exchange of these messages is carried out by node, when heavy loaded node in a cluster wants to transfer job to other cluster with light load or upon completion of job execution.

Fig 1. Cluster Model for Distributed Systems [7].

A. Sharing load information of cluster

Sharing of load information is the core operation of dynamic load balancing for determining the distribution of jobs to different clusters. As long as job distribution process is carried on arrival of new job,

exchange of load information will be performed by nodes. Load information is sent to acknowledge the receiver about the load of other cluster in the distributed system.

Along with the load information, time stamp will be sent to the receiver and receiver compares time stamp information with its own time. If update about load has to be successful then time value received. If update about load has to be successful then time value received must be greater than time value of receiver.

So it informs received load knowledge is recent load status of the sending nodes cluster.

B. Selection of neighbors for job distribution

Load balancer contains its neighbors list for allocation of jobs [7]. LB selects the neighbor node which will takes less load for redistribution of jobs. A node with highest processing load for redistribution of job will face problem due to list with empty neighbors and it enqueues arrival of jobs even if it is overloaded with many jobs. This forces jobs in queue to wait for longer time until previous job finishes its execution. To overcome this problem of enqueueing arrival jobs, LB must make use of arriving load information from its neighbors to choose nodes for job processing.

A. Issues in Clustered systems

1. If components are heterogeneous in terms of software from each other, then there may be issues when combining all of them together as a single entity.

2. Problem may rise when finding out fault that which of the component has some problem associated with it.

3. Cluster computing involves merging different or same components together with different heterogeneity, and then a non-professional person may find it difficult to manage system.

IV. DECENTRALIZED LOAD ESTIMATION FOR COMPUTATIONAL GRID SYSTEMS.

Grid [8] is growing as wide area distributed computing structure that supports resource sharing and load balancing in a distributed system and provides user to access resources which are locally unavailable for job execution. In distributed systems resources are distributed over the network and there is large number of resources available. [9] Computational Grid provides cooperation among the distributed computer systems and allows user jobs to execute on either locally or on remote computer systems. With multiple heterogeneous resources it will improve the performance of the system by proper efficient load balancing and scheduling of jobs across the Grid environment of the system.

(4)

Many policies are used by load balancing algorithm, basically they are classified as two types: Location policy and Transfer policy. The location policy locates the under loaded node and performs the sending/receiving of load to/from the nodes under loaded/overloaded to improve the system performance.

The transfer policy, by using load information across the system determines action of the node to act as sender- “to transfer the job to under loaded node” and receiver- “to receive job from overloaded node”.

In grid environment, processing capacity of each node differs because of processor heterogeneity and underlying network connection ID also heterogeneous.

Network topology used to connect resources is different. To address this dynamically changing nature of grid environment, use of arbitrary topology to connect resources will help to improve the performance of grid. In [8] they have proposed two adaptive, dynamic and decentralized load balancing algorithms that are applicable in balancing of loads in computational Grid environments depending on the underlying Grid infrastructure size.

When Grid size is small Load Balancing on Arrival (LBA) is more efficient and if Grid size is large then MELISA (Modified Estimated Load Information Scheduling Algorithm) provides efficient load balancing. In large Grid environments resources are distributed over large network and latency of communication between these resources will be very huge due to network interconnection between resources. Therefore job distribution cost based on the traffic between the nodes and loading conditions becomes an important factor for load balancing in large scale environments. LBA and MELISA will take into account of job distribution cost due to available bandwidth between the sending and receiving nodes for making load balancing decisions.

TABLE II. SUMMARIZATION OF MELISA AND LBA ALGORITHMS [9].

Algorithm Description

MELISA

In MELISA each Pi (node) estimates job arrival rate, service rate and the load at each load status exchange instant. On every estimation instant Pi

(node]) calculates the load value on its all buddy P’s. After calculating buddy load, each P calculates the average load on its buddy set. If load is greater than the average load of its buddy set then Pi will make a decision of job migration and distribution of its load will be in such a way that, the load on all buddy processors will be

finished at considerably same

time. Here node’s

heterogeneity is considered as processor speed.

LBA

LBA is a load balancing on arrival technique, load balancing is done by transferring a job on its arrival, rather than waiting for the next transfer instant as in MELISA.

LBA responds very fast to higher arrival rates in smaller computational Grid systems. In LBA algorithm estimation of expected finish time of a job will be calculated on each arrival of job to processor.

Estimation of finish time of a job will be done periodically and also the job migration. LBA will perform job migration when load is not distributed equally across all processors and performing job migration to lightly loaded processors will be much faster in LBA than in MELISA.

A. Issues in computational Grid systems

1. Grid system is not stable compared to other systems like cluster, P2P. Because of its geographically dispersed.

2. Gathering and assembling various resources from geographically dispersed sites require high internet connection which results in high monetary cost.

3. Sometimes issues will arise when sharing resources among different nodes. Additional tools are required for having proper syncing and managing among different nodes.

V. COOPERATIVE LOAD BALANCING IN DISTRIBUTED SYSTEM USING GAME

THEORY.

In static load balancing main objective is to minimize overall expected response time. For modern distributed systems fair job allocation is the main concern. But static load balancing provided little attention towards the load balancing in dynamic networks. To address this problem cooperative load balancing game approach is focused [10]. Here only single class of job distribution system is used to examine how system accurately handles the load across

(5)

all the nodes and this approach can be applied to heterogeneous class of jobs.

System consisting of n heterogeneous computers and each computer as a player, expected execution time must be minimized for jobs. If expected execution time of jobs at computer i is denoted as Ei (βi). The game can be expressed as follows,

min Ei (βi), i= 1 , . . . , n (5.1) Where βi is thejobs average arrival rate at computer i and queuing system for each computer can be modelled as,

Ei (βi) = (5.2)

Where μi is the average service rate of computer i, so min , i= 1 , . . . , n (5.3)

Here,

βi < μi i= 1 , . . . , n (5.4)

= φ (5.5) βi ≥ 0 i= 1 , . . . , n (5.6)

Where φ is systems total job arrival rate.

Cooperative load balancing game consist of

 n players (computers).

 The set of strategies X.

 For each computer i , the objective function fi

(βi ) = - βi , βi is a subset of X., then the goal is to minimize all fi (βi ).

 For each computer i , the initial performance u⁰i = - μi . This value required by computer i for entering the game without any cooperation.

Game theory can also be applied to non-cooperative systems. But in cooperative systems load balancing will be easier and it will be efficient. So it is better to use cooperative distributed node structure. Application of game theory approaches are discussed by Riky Subrata, Albert Y. Zomaya, Bjorn Landfeldt in [11].

VI. CHALLENGES

 Current distributed systems are composed of several subsystems and each subsystem has its unique control model. So there is more than one type of control model present in the overall system. Therefore, coordination of different control models of subsystems will be challenging task.

 In some current distributed systems the control model may also be dynamically customized depending on the requirements. Here satisfying requirements of users will be challenging.

 Resource optimization in distributed large systems by predicting usage of resource by concurrent tasks will be challenging.

 Ensuring the reliability while task allocation will be challenging.

 Providing coordination among tasks at allocation time for nodes will be challenging.

VII. FUTURE RESEARCH DIRECTIONS

 Implementation of self-adaptation and evolution of control models for dynamic distributed systems can be done.

 In future, schemes used for resource optimization should adapt to the dynamic resource distribution.

 Future research implementation can focus on resource optimization for assigning and reassigning of virtual resources to applications for execution will be on-demand.

 Real large distributed systems may be hybrid network structure, effect of these structures on load balancing will be a research area in future.

VIII. CONCLUSION

Load balancing is the core part of the distributed systems to provide efficient execution of jobs in heterogeneous structured large scale network. There are many types of distributed systems and load balancing models for each system is different. In this survey we performed a systematic review of load balancing in different structures of distributed systems.

In this survey we have summarized the following general approaches for load balancing in distributed systems: 1) CHORD; 2) PASTRY; 3) CAN; 4) KAD;

and some of the algorithm: 1) MELISA, 2) LBA. We also discussed about game theory approach for load balancing for cooperative systems. Survey presents some of challenges and research directions in distributed system. Throughout this survey we tried to summarize the related studies on load balancing strategies that satisfy the characteristics of distributed systems. Current distributed systems are growing rapidly, some of the areas are big data and cloud computing. These new areas require high throughput, security, data privacy and interoperability between heterogeneous clouds. Distributed systems always be dynamic and require much faster execution of queries to satisfy the use needs.

(6)

TABLE III. COMPARISON OF P2P, CLUSTER AND GRID SYSTEMS

Parameter P2P System Cluster system Grid system

Coupling Nodes are loosely coupled. Nodes are tightly coupled. Nodes are loosely coupled.

Scheduling Distributed job management and scheduling.

Centralized job

management and

scheduling.

Distributed job management and scheduling.

Physical location

Node does not have to be in the same physical relation and can be operated using central system.

Branch of similar nodes are hooked up locally to operate as a single computer.

Node does not have to be in the same physical relation and can be operated independently.

Scalability Scalability depends on the size of

the system. Scalability is in 100s. Scalability is in 1000s.

REFERENCES

[1] Yichuan Jiang, Senior Member, IEEE, “A Survey of Task Allocation and Load Balancing in Distributed Systems “,DOI 10.1109/TPDS.2015.2407900, IEEE Transactions on Parallel and Distributed Systems.

[2] Pascal Felber, Peter Kropf, Eryk Schiller, and Sabina Serbu, “Survey on Load Balancing in Peer-to-Peer Distributed Hash Tables”, IEEE Communications Surveys & Tutorials, VOL. 16, NO. 1, FIRST QUARTER 2014.

[3] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, “Chord: A Scalable Peer to peer Lookup Service for Internet Applications”.

[4] Antony Rowstron1 and Peter Druschel, “Pastry:

Scalable, decentralized object location and routing for large-scale peer to peer systems”.

[5] Sylvia Ratnasamy, Paul Francis_ Mark Handley_

Richard Karp, Scott Shenker, “A Scalable ContentAddressable Network “.

[6] Petar Maymounkov and David Mazieres, “Kademlia : A peer to peer information system based on XOR matrix “, New York University.

[7] “ Heuristic Neighbor Selection Algorithm for Decentralized Load Balancing in Clustered Heterogeneous Computational Environment “, Jay W.Y. Lim, Poo Kuan Hoong, Eng-Thiam Yeoh System Innovation Group, Faculty of Information Technology, Multimedia University, Cyberjaya, Malaysia.

[8] Joshua Samuel Raj, Rex Fiona, “ Load Balancing Technique in Grid Environment: A survey “.2013 International Conference on Computer Communication and Informatics (ICCCI -2013), Jan.

04 – 06, 2013, Coimbatore, INDIA

[9] Ruchir Shah, Bhardwaj Veeravalli, and Manoj Misra,

“On the Design of Adaptive and Decentralized Load- Balancing Algorithms with Load Estimation for Computational Grid Environments “, IEEE

TRANSACTIONS ON PARALLEL AND

DISTRIBUTED SYSTEMS, VOL. 18, NO. 12, DECEMBER 2007.

[10] Dr. Anthony T. C, Dr. M-Y Leung, Dr. Rajendra B, Dr. Turgay, Dr. Chia-Tien Dan Lo, “Load balancing in distributed systems: A game theoretic approach” A survey paper.

[11] Riky Subrata, Albert Y. Zomaya, Bjorn Landfeldt,

“Game-Theoretic Approach for Load Balancing in Computational Grids “, IEEE Transactions On Parallel And Distributed Systems, VOL. 19, NO. 1, JANUARY 2008.