Available at http://www.Jofcis.com
A Method Based on the Combination of Dynamic and
Static Load Balancing Strategy in Distributed Rendering
Systems ⋆
Wei YAO, Huawei PAN, Chunming GAO∗
College of Information Science and Engineering, Hunan University, Changsha 410082, China
Abstract
For lacking of effective load balancing strategy to improve the performance of distributed rendering systems or poor result of load balancing strategy, this paper proposes a load balancing method based on the combination of dynamic and static load balancing strategy. At first, static load balancing strategy is used in the initial stage of assigning task. Then, using dynamic load balancing strategy balance the load of each node during the process of rendering. It can reduce the processing time of the whole system and improve system performance effectively. Experiment results show that the method can give better balance and higher processing capacity than traditional load balancing methods.
Keywords: Distributed Rendering System; Dynamic Load Balancing; Static Load Balancing; Processing
Capacity
1
Introduction
Load balancing problem is also called task scheduling problem, which directly affects the efficiency and performance of distributed systems. It can be divided into two categories: static load balanc-ing [1] and dynamic load balancbalanc-ing [2]. Usbalanc-ing static load balancbalanc-ing means we can determine load partition in advance. While using dynamic load balancing means we can detect load of each node and divide load during run phase. The standard of classifying load balancing strategies, Wang et al., [3] proposed, was to judge the sponsor between sender and receiver. The standard took into account trigger position of strategy and dependent information. The classification proposed by Casavant et al., [4] was not unique, in other word, the classification existed overlaps. Baumgartner et al., [5] presented a unique and non-overlapping classification. Yang et al., [6] divided dynamic load balancing strategies into distributed, centralized and hybrid/hierarchical strategies according to the characteristics of controlling position. Distributed strategies contained diffusion method, dimension exchange method, gradient method; centralized strategies [7] contained RandCentLB,
⋆Project supported by the Science and Technology Plan of Science and Technology Agency of Hunan Province
(No. 2013WK3023).
∗Corresponding author.
Email address: [email protected] (Chunming GAO).
1553–9105 / Copyright © 2014 Binary Information Press DOI: 10.12733/jcis9566
RefineLB, RefineKLB [8] and so on; hybrid/hierarchical strategies mainly contained HBM [9] and HybridLB [7], HBM used the hierarchy tree to balance multilevel load, HybridLB divided processors into some independent and autonomous groups, and organized these groups according to hierarchical structure.
Currently, some common dynamic load balancing methods [10, 11] in distributed systems were fastest response method, least connection method, polling method and so on. The disadvantage of fastest response method was that the responding time of each node interconnecting via Gigabit Ethernet was almost same. Least connection method did not take into account the intensity of task request and server performance, because consumption and utilization of system resource for different task were very different and the number of connection could not really reflect load condition. Polling method was the simplest and easiest method that every node was selected equally, but it needed the same node environment to run better.
In this paper, we propose a load balancing method based on the combination of dynamic and static load balancing strategy, which can balance load of each node of distributed rendering system, thus system needs less time to render a certain amount of data. Meanwhile, we put forward load index to estimate the system performance and load balancing condition.
2
Distributed Rendering System
This paper presents a distributed rendering system which adopts Client/Server mode. Fig. 1 is the flow chart of the system whose nodes connect server via 1000Mbit Ethernet, server submits task and uses static load balancing strategy to divide and assign task, so that every node receives task with the same size. When some nodes turn into idle state, server uses dynamic load balancing strategy to redistribute task through load transferring to balance load of each node, so as to improve resource utilization and the performance of the distributed rendering system.
Distributing render task
Rendering
Sending the request of load transferring
Receiving request of load transference
Completion of rendering Sending the command of
load transference and receiving request
Receiving information of load transference and
sending response Rendering command response response command information command information Rendering Receiving assignment Static load balancing Dynamic load balancing Server Nodes Start End sending command of distributing the task
__ file j rest
F Ffile_j_rest
Server is the central part of the system and all nodes connect it via high-speed Ethernet, its stability directly affects system performance. The main work of server can be summarized as follows.
(1) Task Allocation: According to frame-to-frame coherence, server uses average cycle division method to divide all frames of every render file.
(2) Load Transferring and Monitoring: According to the size of load, sever decides and executes load transferring, and monitors running status and the load condition of every node.
Nodes, the real executor of system, are responsible for rendering task. Meanwhile, according to load condition, they send request of load transferring to server or receive command of load transferring from server.
3
Load Balancing Method
Task transferring of dynamic load balancing strategy will increase communication time and wait-ing time, sometimes decrease system performance, even its effect is worse than static load bal-ancing strategy [12]. To solve the problem, static load balbal-ancing strategy is used to average task and assign them to each node when server submits task. When nodes are rendering, using dynamic load balancing strategy balance load by transferring a part of load of heavy-load nodes to light-load nodes.
3.1
Static load balancing strategy
When server submits task, average cycle division method (ACDM) is used to divide task according to frame-to-frame coherence and assign them to each node. We set the set of n nodes and the set of s render files as N ode = {N1, N2,· · · , Nn}, file = {file1, f ile2,· · · , files}, respectively. For
conveniently expressing our method, we set sequence number of the start frame of every render file as 1, and the corresponding frame number of render files is Ff ile 1, Ff ile 2,· · · , Ff ile s, respectively.
ACDM is to assign all frames of f ilej(j = 1, 2,· · · , s) to N1, N2,· · · , Nnin rotation, the number
of frames for every node is Fj =⌊Ff ile j/n⌋, and ⌊x⌋ indicates rounding down x, the set of frames
for Ni is sFi,j = {k ∗ n + i|k = 0, 1, · · · , Fj − 1}. We call the frames allocating by ACDM as
allocated frames, and call the rest of frames as unallocated frames, the number of unallocated frames is Ff ile j rest = Ff ile j− n ∗ ⌊Ff ile j/n⌋. We define a mark of having completed task for Ni
as ri, if Ni has completed task, we set ri = 1, otherwise ri = 0.
3.2
Dynamic load balancing strategy
When some nodes turn into idle state, we use dynamic load balancing strategy to improve the processing efficiency of system. The rendering process of f ilej is as follows.
When some nodes complete the assigned task at the moment, we set the subscript set of nodes as L = {li|rli = 1, i = 1, 2,· · · , nl}, and set the subscript set of the rest of nodes as M = {mi|rmi = 0, i = 1, 2,· · · , nm}. We record the time which each node complete a frame,
and compute the average rendering time Tli of the corresponding nodes of L. We sort nodes
in ascending order according to the value of Tli, and reset the subscript set of these nodes
L′ ={li′|rli′ = 1, i = 1,· · · , nl}.
When nl ≥ 1 and Ff ile j rest > 0, server will assign unallocated frames to the corresponding
nodes of L′. If Ff ile j rest ≤ nl, as shown in Fig. 2(a), server distributes unallocated frames to
the first Ff ile j rest corresponding nodes of L′ one by one, we use F (Ff ile j rest)→ F (L′, Ff ile j rest)
to express this procedure; then update node state rli′ = 0, where i = 1,· · · , Ff ile j rest, and set
Ff ile j rest = 0, L = L
∩
CLA and M = M
∪
A, where A = {li′|rli′ = 1, i = 1,· · · , Ff ile j rest}.
Otherwise, as shown in Fig. 2(b), server distributes the first nlframes of unallocated frames to the
all corresponding nodes of L′ one by one, we use F (nl)→ F (L′, nl) to express this procedure; then
update node state rli = 0, where i = 1,· · · , nl, and set Ff ile j rest = Ff ile j rest− nl, M = M
∪ L′ and L =∅. (a) unallocated frames
then*êëFfile_j núû+1-th frame
_ 2 filej n*êëF núû+ Ă _ 1 filej F -_ file j F 1 l N¢ 2 l N¢ 1 _ _ F file j rest l N¢ -_ _ F file j rest l N¢ Ă _ _ file j rest F the -th frame the -th frame the -th frame (b) _ 1 filej n*êëF núû+ Ă 1 l N¢ nl l N¢ Ă _ _ file j rest F _ filej l n*êëF núû+n Ă the -th frame the -th frame _ file j F the -th frame unallocated frames
Fig.2: The procedure of assigning unallocated frames
If Ff ile j rest = 0, card(M ) ≥ 1 and card(L) ≥ 1(card(X) indicates the number of set X),
system will transfer load. We discuss load transferring process from the following two cases.
(1) When card(L) ≥ card(M), server assigns the last frame of the corresponding nodes of
M to the first card(M ) corresponding nodes of L′ one by one, using F (M, card(M )) →
F (L′, card(M )) to express the procedure, then update rli, L and M .
(2) When card(L) < card(M ), server assigns the last frame of the first card(L)
correspond-ing nodes of M to the all correspondcorrespond-ing nodes of L′ one by one, using F (M, card(L)) →
F (L′, card(L)) to express the procedure, then update rli, L and M .
4
Experimental Results
When the distributed rendering system uses our method, this experiment is used to estimate system performance from two aspects: the time of finishing task and the load balancing condition of system.
4.1
Experimental environment
Our system is built by two different computers, PC1 and PC2, whose hardware configuration respectively are: Inter(R) Core(TM) i3 CPU @2.93GHz, 4GB RAM, Quad-Core Processor,
In-ter(R) HD Graphics Card; InIn-ter(R) Core(TM) i7 @2.8GHz, 4GB RAM, Eight-Core Processor, NVIDIA GeForce GTX 460 Graphics Card; Operating System of all computer is Windows XP. We divide experiments into three groups that they are consist of 2 nodes, 4 nodes and 8 nodes, where 2 nodes are divided into 1 PC1 + 1 PC2 and 2 PC1s two cases; 4 nodes are divided into 2 PC1s + 2 PC2s and 4 PC1s; 8 nodes represents 6 PC1s + 2 PC2s. Table 1 shows the detailed information of five render files: polygon number, file size, the number of frames, output image resolution and others.
Table 1: Detailed information of five render files
– f ile1 f ile2 f ile3 f ile4 f ile5
Polygon number 130944 12985 46022 5212 10996
Size(MB) 36 3 3.51 36 64
Ff ile j 18 23 18 31 11
Renderer mental ray mental ray mental ray mental ray mental ray Output image format ∗.tga ∗.tga ∗.tga ∗.tga ∗.tga Image resolution(pixel∗pixel) 4098∗2304 1024∗768 648∗480 3072∗3072 4096∗4096
4.2
The time of finishing task
Table 2 shows the time needed for the system to finish the five files when we use no load balancing, least connection method (LCM), polling method (PM) and our method for every experiment, and unit of time is second (s). Obviously, when system does not use load balancing method, namely no load balancing, the time needed is the maximum; even we increase the number of nodes, it does not improve system performance effectively. When system uses the same load balancing method and the same number of nodes, except 2 nodes system using least connection method, compared with system using the same configured computers, system using the different configured computers needs less time to complete the five files, because the hardware configuration of PC2 is better than which of PC1. We know from Table 2, when the system uses the same number of nodes and the same configured computers, the time needed for system using our method is the least. Experiments show that our method can decrease time for system to complete a certain amount of task.
Table 2: The time (s) for the system finishing the five files
Experiment nodes PC1 PC2 no load balancing LCM PM Our method One 2 1 1 5622.594(s) 3467.516(s) 3386.375(s) 3309.265(s) Two 2 2 0 6336.744(s) 3368.203(s) 3746.063(s) 3347.265(s) Three 4 2 2 4349.500(s) 2084.328(s) 1857.5(s) 1729.453(s) Four 4 4 0 5177.094(s) 2131.953(s) 1983.313(s) 1865.717(s) Five 8 6 2 5140.828(s) 1192.422(s) 1098.531(s) 995.623(s)
Fig. 3 indicates the average time for 2 nodes, 4 nodes and 8 nodes system using least connection method, polling method and our method to complete the five files, and vertical axis indicates time, horizontal axis indicates the number of nodes. Red solid line indicates the time for system using our method, and it is the least. As the number of nodes increases, rendering time decreases
gradually for system using any one method. When nodes increases exponentially, rendering time does not decrease strictly exponentially, because the more the nodes, the more overhead and waiting time for server assigning task and transferring load.
2 3 4 5 6 7 8 500 1000 1500 2000 2500 3000 3500 4000 node number time/s Our method Least connection method Polling method
Fig. 3: Variation of the average time for 2 nodes, 4 nodes and 8 nodes system
4.3
The load balancing condition of system
We use load index to estimate the system performance. We set the load of node Ni as loadi at
the present moment that system is running.
First, we compute the ratio of the load of node Ni and the average load of all nodes: ratioi =
loadi/µload, (µload ̸= 0), where µload =
∑n
i=1loadi/n, n is the number of nodes.
Then, we set ratio = {ratio1, ratio2,· · · , ration}, assume the probability of each node is 1/n,
and compute the variance D(ratio) of ratio.
D(ratio) = E{(ratio − µratio)2} = n1
∑n
i=1(ratioi)2− (µratio)2 = n1[
∑n
i=1(ratioi)2− n]
The value range of D(ratio) is [0, n-1]. When the value of D(ratio) decreases, the load tends to be more balanced, and system performance increases; and vice versa.
Fig. 4 indicates the load balancing condition of 2 nodes, 4 nodes and 8 nodes system, where (a) and (b) denote the load balancing condition of 2 nodes system using the same configured computers and the different configured computers, respectively; (c) and (d) denote the load balancing condition of 4 nodes system using the same configured computers and the different configured computers, respectively; (e) denotes the load balancing condition of 8 nodes system using 6 PC1s + 2 PC2s. It can be seen from the following five graphs, the fluctuation of 2 nodes system is large and frequent, because load of whichever node changes, the load balancing condition will change a lot. When there are a large number of nodes, changing the load of a few nodes have a little influence on the balance, therefore, the more the nodes, the smaller the fluctuation, and the more stable the system performance. Compare with (b) and (d) respectively, we can know
D(ratio) and fluctuation of (a) and (c) using any load balancing method are slightly smaller,
which indicates when the hardware configuration is same, the processing capacity is almost same, the time for nodes rendering adjacent frames is also similar and load tends to be more balanced. The red solid line in Fig. 4 denotes the load balancing condition of system using our method; similarly, the green dotted line denotes using least connection method and blue dot and dash line denotes using polling method. Experiment results show that our method is more stable,
thus the system performance is better. Especially, for 4 nodes and 8 nodes system, D(ratio) is 0 or slightly greater than 0 during intermediate period, which indicates that load of each node tends to balance. While the fluctuation of D(ratio) which the blue dot and dash line represents is frequent. It indicates when system assigns task to each node in rotation, load of each node changes fast and frequently, resulting in poor overall balance. The fluctuation and D(ratio) which green dotted line indicates are smaller than which of polling method, but a little bit bigger than which of our method.
0 500 1000 1500 2000 2500 3000 3500 0 0.2 0.4 0.6 0.8 1 1.2 1.4 time/s (a) variance Our method Least connection method Polling method 0 500 1000 1500 2000 2500 3000 3500 0 0.2 0.4 0.6 0.8 1 1.2 1.4 time/s (b) variance Our method Least connection method Polling method 0 500 1000 1500 2000 0 0.5 1 1.5 2 2.5 3 time/s (c) variance Our method Least connection method Polling method 0 200 400 600 800 1000 1200 1400 1600 1800 2000 0 0.5 1 1.5 2 2.5 3 time/s (d) variance Our method Least connection method Polling method 0 200 400 600 800 1000 1200 0 1 2 3 4 5 6 7 time/s (e) variance Our method Least connection method Polling method
Fig. 4: Load balancing condition of different system: (a) 2 PC1s system, (b) 1 PC1 + 1 PC2 system, (c) 4 PC1s system, (d) 2 PC1s + 2 PC2s system, (e) 6 PC1s + 2 PC2s system
We can see from Fig. 4, the value of D(ratio) instantly rises to the maximum for all load balancing methods at the last moment, because there is only one node which owns load, so
D(ratio) = 1n[∑ni=1(ratioi)2 − n] =
(n2−n)
n = n− 1. Similarly, the value of D(ratio) is n − 1
during the initial stage of assigning task, because every node receives task in order. As every node receives or completes task, the value of D(ratio) will change a lot, so the fluctuation is large during the early period and the late period. In general, our method is better than least connection method and polling method on improving system performance.
5
Conclusions and Future Work
In order to solve load balancing problem in distributed rendering systems, we propose this method based on the combination of dynamic and static load balancing strategy, it can effectively balance load, reduce time for processing a certain amount of task, and improve system performance. Meanwhile, we propose load index to estimate the load balance of system, we can judge the overall balance at every moment. The disadvantage of this method is that it limits to be used in distributed rendering systems, and we cannot guarantee its good performance when it is used in other distributed systems. The future work is to generalize this method so that it can be applied to different distributed systems, and achieve good balance and high treatment efficiency.
References
[1] H.P. Chen, H. Li and G.L. Chen, Heuristic task scheduling in parallel distributed computing, Computer Research and Development 34 (1997) 74-78.
[2] R.Diekmann, A. Frommer and B. Monien, Efficient schemes for nearest neighbor load balancing, Parallel Compute 25 (1999) 789-812.
[3] Y.T. Wang, et al., Load sharing in distributed system, IEEE Transactions on Computers 34 (1985) 204-217.
[4] T.L. Casavant, J.G. Kuhl, A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Transactions on Software Engineering 14 (1988) 141- 154.
[5] K.M. Baumgartner, et al., A global load balancing strategy for a distributed computer system, in: Proc of the 1988 International Conference on Distributed Computer Systems, 1988, pp. 93-102. [6] J.X. Yang, G.Z. Tan and R.S. Wang, A Survey of Dynamic Load Balancing Strategies for Parallel
and Distributed Computing, Acta Electronica Sinica 38 (2010) 1122-1130.
[7] G.B. Zheng, Achieving high performance on extremely large parallel machines: performance pre-diction and load balancing, Urbana: UIUC, 2005.
[8] T. Agarwal, Strategies for topology-aware task mapping and for rebalancing with bounded migra-tions, Urbana: UIUC, 2005.
[9] M.H. Willebeek-LeMair, A.P. Reeves, Strategies for dynamic load balancing on highly parallel computers, IEEE Transactions on Parallel and Distributed Systems 4 (1993) 979-993.
[10] Y.L. Wang, B.L. Ye, Research on dynamic load balancing of parallel field, Science Technology and Engineering 5 (2005) 572-578.
[11] G. Pei, W.D. Zeng, et al, Research of network load balancing in distributed systems, Computer CD Software and Applications 6 (2010).
[12] S. Penmatsa, A.T., Chronopoulos, Dynamic Multi-user Load Balancing in Distributed Systems, in: Proc of the International Conference on Parallel and Distributed Processing Symposium, 2007, pp. 1-10.