A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS
Mihai Horia ZAHARIA
1, Florin LEON
2,
Dan GÂLEA
3"Gh. Asachi" University of Iasi Department of Computer Engineering
Bd-ulD. Mangeron Nr. 53 A
1
[email protected]
2
[email protected]
3
[email protected]
Abstract. There is a lot of research in the area of load balancing at any level in distributed systems. Unfortunately most models take into account homogenous clusters. This approach makes the station and communication model more simple, so the area of possible proposed algorithms for load balancing is increased and sometimes their simplicity. The latest Internet technology development drives to the necessity that the cluster must be dynamic and also heterogeneous. In this paper a simulator for a heterogeneous cluster used in distributed computing is presented.
Keywords: distributed systems, network topologies, load balancing, simulator.
1. Introduction
Nowadays in accordance with advanced research needs the required computer power is always insufficient [3,5]. Until last decades there have been two clear differences at computing level. One is parallel computing and the other is represented by distributed systems.
Due to massive development acquired both at the computer power and communication levels a new direction appeared. High Performance Computing represents a mixture of parallel and distributed computing. This means the use of heterogeneous dynamic clusters [2].
There are two dominant different approaches in accordance with the operating systems market.
One is the DotNet framework introduced by Microsoft in order to create a background for distributed applications and language independent development. On the other hand there is the grid computing direction derived from GLOBUS international project.
This approach is usual for UNIX/LINUX based computers or supercomputers and has application level granularity. Of course there are soft producers that offer the grid style approach under Microsoft platforms but they are yet at the beginning [1,4].
There are some approaches that use JAVA in order to create the required core for Grid. No matter what approach is preferred the main idea is to use various architectures distributed in the net as a parallel supercomputer.
Unfortunately, most of the research and provided solutions are in the area of homogeneous clusters. That is justified by the simplicity of the required model in comparison with heterogeneous ones.
That drives us to propose a general station model from a heterogeneous cluster. A graphical
simulator was created in order to analyze how the system will work under various loads and various
types of node connections.
2. Cluster Model
One disadvantage of present approaches in distributed systems is the oversimplification due to the assumption that all the stations in the network are homogenous [6]. In a real cluster the workstations have different computing power and capabilities. The model we used considered a station as characterized by the following elements:
! unique ID;
! computing power;
! availability;
! task list.
The unique ID can have different meanings: an arbitrarily assigned number, the IP address, or the network interface MAC.
We supposed that the computing power of a machine could range from 1 to 10. This doesn’t cause a loss of generality, because the computing power can be expressed as a comparison with a standard machine, using different benchmark tests. For simulation purposes, we used a uniform distribution of random numbers to generate the computing power of the workstations in the network.
The availability represents the amount of free resources. It is obvious that a computer running cannot be entirely free; usually the operating system itself consumes memory and hard-disk space.
However, for a particular OS, these can be considered as constants (with a certain approximation).
Therefore, we focused on the remaining free resources. A computer with no user tasks was considered to have an availability of 100. As more tasks arrive, from the user or from other stations, the availability decreases until the computer becomes overloaded.
If a workstation is overloaded, it can send tasks to its neighbors. We made an explicit model of the tasks. Thus we assumed that a task is characterized by the physical space needed for temporary storage or transport in the network, the computing power of a machine necessary to actually solve the task (power request) and the time to solve.
In order to maintain consistency, these values are expressed using the same convention as that used for stations. The values are stated for the unit station, the reference. If a task with a certain power request arrives on a computer with a computing power of 10, the actual power request will be the default one divided by 10. This ratio is the availability difference needed to solve the task:
r putingPowe StationCom
equest TaskPowerR eed
onibilityN
ActualDisp = (1) These identifiers were randomly generated using different distributions. The task space was:
+ + <
= otherwise
ExpNeg if
ExpNeg TaskSpace
, 500
500 ) 500 , 3 ( 1
), 500 , 3 (
1 (2)
where ExpNeg is a function for creating random numbers with an exponential negative distribution:
e
xx
ExpNeg ( λ , ) = λ ⋅
−λ⋅(3) The power request was provided as a uniform random number between 1 and 800. The time to solve was generated using the absolute value of a normal (Gaussian) distribution:
) 3 , 50 ( Normal e
TimeToSolv = , (4)
where Normal is a function for creating random Gaussian numbers with a given mean and standard deviation:
2 2
2 ) (
2 ) 1 ,
(
σω µ
σ π
µ = ⋅ e− −
Normal (5)
In our simulator, we explicitly modeled the network connections. They can have a broadband of 10Mb/s or 100Mb/s, like in reality. The space required to transfer a task is taken into consideration where it must be send from a station to another through a network connection. A bigger task will take longer to transfer. Also, the available broadband of the connection will have a significant influence on the performance of routing.
3. Simulator overview
In this paragraph, several network topologies will be presented in the simulator, along with a description of system behavior when tasks are injected into the network.
The simulation platform we considered had 64 workstations. Their parameters and the connections between them can be customized. The simulation is time-based. Every second, the simulator dynamically recalculates the loads on stations and network connections.
A station first tries to solve its task in a FIFO order. When its availability is exceeded, it becomes overloaded and tries to move the tasks it cannot process to its neighbors. If all the neighbors are overloaded, too, the tasks are put on a waiting queue. If there are free neighbors, it chooses one neighbor for every extra task and sends it. The corresponding network connection begins its own processing, transporting a part of the task every second, until sending is complete.
In the graphical simulation, we used the following coloring conventions:
! a blue station has a availability over 80;
! a yellow station has a availability between 0 and 80;
! a red station is overloaded (has a negative availability);
! a black network connection is free or still available;
! a magenta network connection is full.
Figure 1 displays a mesh network, where each station is connected to all its neighbors. The running scenario is to have the first station (in the upper left corner of the LAN) receive a certain number of tasks and distribute them in the network. An interesting phenomenon takes place: at the beginning, the overloading of stations propagates in waves.
Figure 1. Mesh network Figure 2. Tree network
On receiving all tasks, the first station immediately becomes overloaded. Then it overloads its
immediate neighbors by sending the majority of its tasks to them. After that, it becomes again
available, but it in turn receives tasks from its overloaded neighbors. The same repeats for stations situated at equal distances from the initiator, thus giving the wave impression.
In figure 2, a tree network is displayed. Here, tasks are injected on the first station, but then they are transmitted to four other stars. One can notice that nodes from upper levels are often overloaded than others. This simulation result is in good agreement with theoretical considerations, where a well-known disadvantage of the tree topology is the single point of failure also known as root bottleneck. In fact the overload appears onto the higher levels in the tree with a maximum on the root.
The topology of FAT tree may decrease the overload in tight-coupled architectures but in the cluster appears another problem [7]. A simple workstation or dedicated server it is simply not enough to solve the overload. That’s why powerful wired structures are used in communications centers management.
Unfortunately in a distributed system where the cluster is dynamic elected by any self-elected workstation the static structure of Internet may or may not be helpful depending on the relative position of the initiator.
A very popular network topology is the hyper-cube. In figure 3, a 5D hyper-cube is shown. It has only 32 vertices, therefore half of the stations are not connected. The topology shows in a more intuitive way the distribution of tasks in a cube configuration.
Figure 3. 5D hyper-cube network Figure 3. 6D hyper-cube network
Having 64 stations, a 6D hyper-cube is a more efficient approach (figure 4). In this case, the graphical configuration is not as obvious as the one in figure 3. To generate this topology, each station received an ID from 0 to 63 and then connections were automatically generated between every two stations whose ID’s in binary format differed only by one bit.
Following the same scenario, one can see the speed with which tasks are distributed in the whole hyper-cube. Practically, no station remains free in a very short amount of time. This proves the high efficiency of this topology.
4. Performance study for different network topologies
Next, we studied the performance of task distribution in three scenarios, for three network topologies: mesh, trees and hyper-cube. The first scenario was the one presented above: 500 tasks were introduced in the LAN through only one station. The simulation results for 5 trials are listed in table 1.
The tree configuration proves to be the least efficient. When there are only a few big tasks left
to solve, they have little chance to exit their star and to enter another, because they must traverse the
first next two nodes beginning with root.
Table 1. First scenario: single task generator
Mesh Tree 6D Hyper-cube
266 574 348
215 508 280
320 249 478
332 113 360
491 711 242
Figures 4 and 5 display system performance for a mesh topology in this scenario. At the beginning, all tasks are simultaneously injected into one station. Therefore, at the beginning the system is overloaded and many tasks are transmitted through network connections. By the end, the majority of tasks have been solved and only a few big tasks are still finding a workstation powerful enough to solve them.
Total Disponibility
-200000 -150000 -100000 -50000 0 50000
1 24 47 70 93 116 139 162 185 208 231 254 277
seconds
disponibility
Total Network Load
0 1000 2000 3000 4000 5000
1 23 45 67 89 111 133 155 177 199 221 243 265
seconds
load
Figure 4. Mesh topology performance for Figure 5. Mesh topology performance for single task generator: total availability single task generator: total network load
In the second scenario, we assumed that stations had users with generated local tasks. For 50 seconds, on every station 10 tries were attempted to generate a task, each with a probability of 0.5%
(table 2).
Table 2. Second scenario: multiple task generators
Mesh Tree 6D Hyper-cube
302 407 105
503 157 147
216 296 371
714 113 142
106 277 235
In this scenario, the hyper-cube seems to have best performance. Although each station has six
connections to their neighbors and the locally generated tasks can be optimally distributed in a
acceptably small vicinity of the initial vertex in order to be solved. The system as a whole is never
overloaded (figures 6 and 7).
Total Disponibility
0 10000 20000 30000 40000
1 15 29 43 57 71 85 99 113 127 141 155
seconds
disponibility
Total Network Load
0 100 200 300 400 500 600
1 15 29 43 57 71 85 99 113 127 141 155
seconds
load
Figure 6. Hyper-cube topology performance for Figure 7. Hyper-cube topology performance for multiple task generators: total availability multiple task generators: total network load
In the third scenario, only one huge task was injected, with a power request of 950, a needed space of 50 and a time to solve of 100. Obviously, only a powerful computer (with a computing power of 10) can solve such a task, therefore it moves through the system until it reaches a powerful station.
Table 3. Third scenario: single huge task
Mesh Tree 6D Hyper-cube
25 118 15
26 246 38
18 46 10
197 15 10
21 18 41
Again, the hyper-cube seems to have best performance, followed by the mesh topology.
Figures 8 and 9 show the worst case, when a tree configuration is used, where the task is transported through the network for a long time.
Every computer that receives it becomes overloaded and has to send it further to its neighbors.
Network connections can transport it faster or slower, depending on their own speed (10Mb/s or 100Mb/s).
Finally a powerful workstation receives it and is able to solve it. A solution to speed up this process is to introduce an intelligent routing algorithm so that such a task could easily find the type of computer it requires.
Total Disponibility
30500 31000 31500 32000 32500
1 12 23 34 45 56 67 78 89 100 111 122
seconds
disponibility
Total Network Load
0 2 4 6 8 10 12
1 10 19 28 37 46 55 64 73 82 91 100 109 118
seconds
load