Adaptive Parameter Setting for QoS Aware Load Balancing Algorithm
KIMMO KAARIO
Honeywell Industrial Control, Control System Development
Ohjelmakaari 1, FIN-40500 Jyväskylä
FINLAND
TIMO HÄMÄLÄINEN
University of Jyväskylä, Faculty of Information Technology
Department of Mathematical Information Technology, Telecommunications
P.O.Box 35, FIN-40351 Jyväskylä
FINLAND
PERTTI RAATIKAINEN
VTT Information Technology, Telecommunications
P.O.Box 1202, FIN-02044 VTT
FINLAND
Abstract: The swift growth of the Internet has boosted the use of Web-based services and in some practical cases has led to overwhelming request bursts to servers. Relational database queries, image storage/retrieval and other new types of application transactions have become increasingly popular. Their coexistence in commercial parallel and distributed systems has generated some new loading problems. For example, a constantly increasing request rate eventually leads to a processing power requirement that exceeds the capacity of the accessed server. As a consequence, response times increase and some portion of the requests is lost. Clustering of servers to meet the growing demand for server processing capacity, especially in web-based service supply, has created the need for intelligent switching at front-end devices. As a consequence of clustering, multilayer switching schemes have been developed to enable optimum loading of the individual servers in a cluster. In this paper, we formulate the load balancing problem taking QoS into consideration and introduce a QoS aware load balancing algorithm (QoS-LB). The performance of the algorithm is simulated, and results indicating its load balancing capability are presented. The overall idea of this paper is to describe an algorithm that provides Class of Service based differentiated access to server clusters and offers a better playground for QoS mechanisms in client-server environments. The engineering task of offering QoS guarantees with such a differentiation tool is outside the scope of this paper.
1 Introduction
The server capacity problem has normally been solved by implementing a cluster of servers having identical or partly identical content, which in turn has created the problem of balancing load between the clustered servers. Front-end devices, supporting various kinds of load balancing methods, have been developed to direct requests to the servers. Most experimental as well as implemented load balancing schemes employ fairly simple algorithms that have been developed for a specialized hardware and software architecture. They usually do not take Quality of Service (QoS) issues into account. The simplest ones share the load uniformly between the servers by using algorithms like round-robin [2]. Some systems consider the processing power of the servers and utilize the weighted round-robin scheme [3]. More intelligent systems take response times into account and try to optimize system performance, e.g. by maximizing cache hit rate [7]. The most advanced of these systems are called web switches. They operate at layer 5 of the TCP/IP protocol stack, i.e. the application layer for the Web, and use the content of the IP packets in making the load balancing decisions [1].
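To make the two baseline schemes concrete, the following Python sketch shows plain and weighted round-robin dispatch. It is only an illustration: the server names and weights are hypothetical, and the code is not taken from [2] or [3].

```python
from itertools import cycle


def round_robin(servers):
    """Yield servers in a fixed rotation, ignoring load and QoS class."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in rotation, repeating each one `weight` times per cycle.

    `weights` maps a (hypothetical) server name to a positive integer that
    stands for its relative processing power.
    """
    schedule = [name for name, w in weights.items() for _ in range(w)]
    return cycle(schedule)


rr = round_robin(["s1", "s2", "s3"])
first_rr = [next(rr) for _ in range(5)]

wrr = weighted_round_robin({"s1": 1, "s2": 2})
first_wrr = [next(wrr) for _ in range(6)]
```

Neither variant looks at the QoS class of a request, which is the gap the algorithm in this paper addresses.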
Our goal was to develop a load balancing scheme that fits a wide class of distributed computer system architectures and takes the required QoS into account. Incoming requests are directed to servers based on the QoS needs of the requested services, and the loading level of the individual servers is tuned to support the required QoS levels. This means that more processing power is reserved for high priority requests than for lower priority ones. The algorithm was introduced in [4], and it was optimized for high cache hit-rates in [5]. This paper modifies the algorithm by introducing an adaptive tuning method for its parameters. The algorithm is being implemented on Media Switch [12], which is a Linux based programmable switch.
The rest of this paper is organized as follows. Chapter 2 introduces the QoS based load balancing problem and Chapter 3 presents the developed algorithm. Chapter 4 presents the performance evaluation results and Chapter 5 summarizes the main results and outlines the future work.
2 Problem Formulation
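Let $N$ be the number of servers in a server cluster, $M$ the number of served QoS classes, and $T_j^{\mathrm{high}}$ the desired upper limit of load in server $j$, $j = 1, \ldots, N$. Let row $i$, $i = 1, \ldots, M$, of the matrix

$$L = \begin{pmatrix} L_1 \\ \vdots \\ L_M \end{pmatrix} \tag{1}$$

indicate the current load of QoS class $i$ in the cluster at time $t$. Now, we can describe the connections of each QoS class in the cluster by a connection matrix

$$C = \begin{pmatrix} c_{11} & \cdots & c_{1N} \\ \vdots & \ddots & \vdots \\ c_{M1} & \cdots & c_{MN} \end{pmatrix} \tag{2}$$

where $c_{ij} \in [0, 1]$ indicates the fraction of traffic in QoS class $i$ that is currently served by server $j$.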
The following three rules have to be satisfied:
1. The load on each server is preferred to be less than $T_j^{\mathrm{high}}$, which means that there should be some penalty when

$$L^T c_j \le T_j^{\mathrm{high}} \tag{3}$$

is not valid. Here $c_j$ denotes column $j$ of $C$.

2. Every customer must be served, i.e.

$$\sum_{j=1}^{N} c_{ij} = 1, \quad i = 1, \ldots, M. \tag{4}$$

3. The requests for a certain QoS class should be served by the same server if possible. With this rule we may assume that rule 4 is well-formed. This requirement can be achieved, e.g. by minimizing the product

$$\prod_{i=1}^{M} c_{ij} \tag{5}$$

By using this approach, the minimization procedure tries to find situations where some of the elements in each row of the connection matrix $C$ are close to 0. This leads to a state where the load for that traffic type concentrates on a single server (with $c_{ij}$ close to 1). If any of the elements in that row equals 1, the other elements must equal zero. This would be the ideal case if the following rule (rule 4) is minimized at the same time.
4. "QoS awareness". In the following, we assume that the most important QoS class is class 1 and the lowest QoS class gets the highest class number. It is also assumed that

$$T_j^{\mathrm{high}} \le T_{j+1}^{\mathrm{high}} \tag{6}$$

for $1 \le j \le N - 1$. With these assumptions we can minimize

$$\sum_{i=1}^{M} \sum_{j=1}^{N} \left| j - \frac{i}{M} N \right| c_{ij} \tag{7}$$

to prefer serving the higher QoS classes in the less loaded nodes (see rule 1).
Equation (7) behaves as a kind of "pointer" to a relevant server for each QoS class: if the chosen server $j$ is close to ideal, then $\left| j - \frac{i}{M} N \right|$ is small and element $c_{ij}$ can be large. When the "distance" $\left| j - \frac{i}{M} N \right|$ gets larger, the minimization tries to find smaller elements $c_{ij}$ for these servers.
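The "pointer" behaviour can be illustrated with a small Python sketch. It assumes that Eq. (7) is read as the double sum of $|j - (i/M) N| \, c_{ij}$ over all classes $i$ and servers $j$, with 1-based indices. The function names and the two example matrices are hypothetical.

```python
def pointer_distance(i, j, M, N):
    """Distance |j - (i / M) * N| between server j and the ideal server for class i."""
    return abs(j - (i / M) * N)


def qos_awareness(C):
    """Evaluate the double sum of Eq. (7) for a connection matrix C.

    C[i - 1][j - 1] is the fraction of class-i traffic served by server j.
    """
    M, N = len(C), len(C[0])
    return sum(
        pointer_distance(i, j, M, N) * C[i - 1][j - 1]
        for i in range(1, M + 1)
        for j in range(1, N + 1)
    )


# Two classes, four servers: the ideal servers are j = 2 (class 1) and j = 4 (class 2).
aligned = [[0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
swapped = [[0.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 0.0]]
```

For these inputs `qos_awareness(aligned)` evaluates to 0 and `qos_awareness(swapped)` to 4, so the minimization favours the aligned assignment.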
Now, the problem is to minimize the function

$$f = \sum_{j=1}^{N} \left( \prod_{i=1}^{M} c_{ij} \right) + \sum_{j=1}^{N} w_j \left( \sum_{i=1}^{M} L_i c_{ij} - T_j^{\mathrm{high}} \right) + \sum_{i=1}^{M} w_i \sum_{j=1}^{N} \left| j - \frac{i}{M} N \right| c_{ij} \tag{8}$$

with constraints

$$\sum_{j=1}^{N} c_{ij} = 1, \quad i = 1, \ldots, M \tag{9}$$

and

$$L^T c_j \le T_j^{\mathrm{max}}, \quad j = 1, \ldots, N \tag{10}$$

where $T_j^{\mathrm{max}}$ is the maximal processing capacity of server $j$.
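As a rough illustration of how a candidate connection matrix could be scored, the sketch below evaluates a three-term objective of the kind given in Eq. (8) and checks constraints (9) and (10). It assumes the reading $f = \sum_j \prod_i c_{ij} + \sum_j w_j (\sum_i L_i c_{ij} - T_j^{high}) + \sum_i w_i \sum_j |j - (i/M) N| c_{ij}$. The two weight vectors `w_server` and `w_class`, and treating them as plain multiplicative weights, are assumptions made for this example.

```python
from math import prod


def objective(C, L, T_high, w_server, w_class):
    """Evaluate the three terms of Eq. (8) for an M x N connection matrix C."""
    M, N = len(C), len(C[0])
    # Term 1: per-server product of the class fractions.
    concentration = sum(prod(C[i][j] for i in range(M)) for j in range(N))
    # Term 2: weighted difference between each server's load and its limit.
    overload = sum(
        w_server[j] * (sum(L[i] * C[i][j] for i in range(M)) - T_high[j])
        for j in range(N)
    )
    # Term 3: weighted QoS "pointer" distances (the formula uses 1-based indices).
    pointer = sum(
        w_class[i] * sum(abs((j + 1) - ((i + 1) / M) * N) * C[i][j] for j in range(N))
        for i in range(M)
    )
    return concentration + overload + pointer


def feasible(C, L, T_max, tol=1e-9):
    """Check that every class is fully served (9) and no server exceeds T_max (10)."""
    M, N = len(C), len(C[0])
    rows_ok = all(abs(sum(C[i]) - 1.0) <= tol for i in range(M))
    caps_ok = all(
        sum(L[i] * C[i][j] for i in range(M)) <= T_max[j] + tol for j in range(N)
    )
    return rows_ok and caps_ok


# Hypothetical example: two classes, two servers, each class on its ideal server.
C = [[1.0, 0.0], [0.0, 1.0]]
value = objective(C, L=[0.3, 0.5], T_high=[0.4, 0.6],
                  w_server=[1.0, 1.0], w_class=[1.0, 1.0])
ok = feasible(C, L=[0.3, 0.5], T_max=[1.0, 1.0])
```

In this example the first and third terms vanish and only the load term contributes, giving a value of about -0.2.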
The problem gets a lot more complicated when we consider the requirement of high cache hit rates and try to serve the same kind of requests by known servers. The problem of assigning the servers only by the content of each request is analogous to the introduced QoS based scheduling problem, but combining these two different (and in some ways contradicting) approaches leads to the load matrix (compare to Eq. (1))

$$L = \begin{pmatrix} L_{11} & L_{12} & \cdots & L_{1K} \\ L_{21} & L_{22} & \cdots & L_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ L_{M1} & L_{M2} & \cdots & L_{MK} \end{pmatrix} \tag{11}$$

where $M$ is the number of QoS classes and $K$ is the number of services (or content types, depending on the application of the algorithm). Element $L_{ik}$ now indicates the current load in QoS class $i$ that is produced by requests of type $k$.

Now, if $N$ is the number of servers, the connection matrix $C$ is an $M \times K \times N$ matrix, where

$$C_j = \begin{pmatrix} c_{11j} & c_{12j} & \cdots & c_{1Kj} \\ c_{21j} & c_{22j} & \cdots & c_{2Kj} \\ \vdots & \vdots & \ddots & \vdots \\ c_{M1j} & c_{M2j} & \cdots & c_{MKj} \end{pmatrix} \tag{12}$$

and $j = 1, \ldots, N$.
If we extend the previous problem formulation (Equation (8)), we are faced with the problem of minimizing the following four objective functions

$$f_1 = \sum_{j=1}^{N} \sum_{i=1}^{M} \prod_{k=1}^{K} c_{ikj} \tag{13}$$

$$f_2 = \sum_{j=1}^{N} \sum_{k=1}^{K} \prod_{i=1}^{M} c_{ikj} \tag{14}$$

$$f_3 = \sum_{j=1}^{N} \sum_{i=1}^{M} \sum_{k=1}^{K} \left( L_{ik} c_{ikj} - T_j^{\mathrm{high}} \right) \tag{15}$$

and

$$f_4 = \sum_{i=1}^{M} \sum_{k=1}^{K} \sum_{j=1}^{N} \left| j - \frac{i}{M} N \right| c_{ikj} \tag{16}$$

at the same time.

We also have two constraints

$$\sum_{j=1}^{N} c_{ikj} = 1 \tag{17}$$

for all $i = 1, \ldots, M$, $k = 1, \ldots, K$, and

$$\sum_{i=1}^{M} \sum_{k=1}^{K} L_{ik} c_{ikj} \le T_j^{\mathrm{max}} \tag{18}$$

for all $j = 1, \ldots, N$.
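A minimal Python sketch of the four objective functions follows. It assumes that the connection matrix is stored as a nested list `C[i][k][j]` (class, service, server) and that Eqs. (13) to (16) are read as $f_1 = \sum_j \sum_i \prod_k c_{ikj}$, $f_2 = \sum_j \sum_k \prod_i c_{ikj}$, $f_3 = \sum_j \sum_i \sum_k (L_{ik} c_{ikj} - T_j^{high})$ and $f_4 = \sum_i \sum_k \sum_j |j - (i/M) N| c_{ikj}$. The function name and the example data are hypothetical.

```python
from math import prod


def objectives(C, L, T_high):
    """Return (f1, f2, f3, f4) for C[i][k][j], L[i][k] and T_high[j]."""
    M, K, N = len(C), len(C[0]), len(C[0][0])
    # f1: product over services for every (server, class) pair.
    f1 = sum(prod(C[i][k][j] for k in range(K)) for j in range(N) for i in range(M))
    # f2: product over classes for every (server, service) pair.
    f2 = sum(prod(C[i][k][j] for i in range(M)) for j in range(N) for k in range(K))
    # f3: load contribution minus the server limit, summed over all index triples.
    f3 = sum(L[i][k] * C[i][k][j] - T_high[j]
             for j in range(N) for i in range(M) for k in range(K))
    # f4: QoS "pointer" distance weighted by the connection fractions.
    f4 = sum(abs((j + 1) - ((i + 1) / M) * N) * C[i][k][j]
             for i in range(M) for k in range(K) for j in range(N))
    return f1, f2, f3, f4


# One QoS class, two services, two servers: service 1 on server 1, service 2 on server 2.
C = [[[1.0, 0.0], [0.0, 1.0]]]
f1, f2, f3, f4 = objectives(C, L=[[0.2, 0.3]], T_high=[0.5, 0.5])
```

Because the four values generally pull in different directions, they are returned separately rather than summed, which matches the multiobjective treatment described in the text.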
The introduced problem is well suited to interactive analysis, e.g. with WWW-NIMBUS (Nondifferentiable Interactive Multiobjective BUndle-based optimisation System) [10]. This tool allows the user to choose the importance of each objective function during the optimization process, and makes it possible to "surf" between the Pareto optimal solutions of the problem. The user (or decision maker) is also needed to check whether the solutions are relevant. Each of the four objective functions may, on its own, achieve optimal solutions that have no practical value from the original point of view.
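For readers unfamiliar with Pareto optimality, the following sketch filters a list of objective vectors down to the non-dominated ones, assuming all objectives are minimized. It is a brute-force illustration only and does not reflect how WWW-NIMBUS works internally. The candidate vectors are made up.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
front = pareto_front(candidates)
```

Here `(3, 3)` is dropped because `(2, 2)` is better in both objectives, while the other three candidates represent different trade-offs that a decision maker would choose between.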
The analytical study of the problem is a topic for another paper. This paper introduces an algorithm that is based on our proposal in [4]. The main merit of this paper compared to [4] is the adaptive parameter assignment that is introduced in the following chapter. Without this feature the algorithm would have been useless in practical implementations.
3 QoS Aware Load Balancing Algorithm
In [4] we introduced a QoS Aware Load Balancing Algorithm (QoS-LB) that scheduled requests to servers well enough to achieve the QoS based goals described in the previous section. That version of the algorithm, however, required some knowledge from the user to tune the parameters to give good results. This is not workable in practice, and therefore the algorithm has been developed a bit further. Now, the most important parameters of the algorithm are adaptive (step 4 in the following) and there is no need for user intervention.

Let $\mathcal{N} \subset \mathbb{N}$ be the set of servers in a server cluster, $\mathcal{S} \subset \mathbb{N}$ the set of supported services (note that each service must be identified uniquely by a number in this set), and $\mathcal{Q} \subset \mathbb{N}$ the set of supported QoS classes in the server cluster (again, QoS classes must be identified by these numbers). If $N$ is the number of servers, $K$ the number of services, and $M$ the number of QoS classes, then $|\mathcal{N}| = N$, $|\mathcal{S}| = K$, and $|\mathcal{Q}| = M$.
The algorithm goes as follows:
0. Initialize the sets of preferred QoS classes of the servers.
An example [4] of an initialization that achieves lower loads in the servers that will preferably serve the most important customers (i.e. the traffic in the best QoS class):
We have a function $f(m)$ that assigns each server $m$ a maximal QoS class that it is preferred to serve. By using this function, the set of preferred QoS classes in server $m$ becomes
=? 8
Now, we can define 8 , e.g. by using a linear ap-proach 8= 2 w¡ ¢*£¤¥ ¦ ¡ ¢ ¤ § when©¨ª 2 ¥ ¦ ¡ ¢ ¤ §= when©«ª (19)
Here, $\lceil x \rceil$ means the ceiling of $x$, and $\lfloor x \rfloor$ is the floor of $x$. The linear approach means that the number of preferred QoS classes decreases roughly linearly as the server number increases. In some cases, however, a different weighting between the QoS classes may be needed. In that case, we can try a slightly different version of $f(m)$, for example
8/ *±² " ¯ ³"´µ ¶ ¤ ° ¤ ¬(¡ ¢ ® Go to step 1x .
1. Try to find a server that is included in the set of preferred servers for the requested service $s$ and its QoS class $q$. If such a server is found and its load is under the limit $T_{\mathrm{high}}(q, s)$, choose it as the server for the requested connection and go to step 4. Otherwise, go to step 2.
2. Try to find a server that has a load less than $T_{\mathrm{low}}$. The limit $T_{\mathrm{low}}$ must satisfy the rule

$$T_{\mathrm{low}} \le \min \{ T_{\mathrm{high}}(q, s) \mid q \in \mathcal{Q}, \; s \in \mathcal{S} \} \tag{20}$$

where $\mathcal{Q}$ is the set of all supported QoS classes in the server cluster, and $\mathcal{S}$ is the set of all the supported services in the cluster. If such a server is found, choose it as the server for the requested connection and go to step 4. Otherwise, go to step 3.
3. Choose the least loaded node of the cluster as the server for the requested connection. Go to step 4.
4. Update $T_{\mathrm{high}}$ dynamically.
In this paper, this is done as follows:
Let $\mathcal{P}$ be a subset of $\mathcal{N}$ that includes the servers that are the preferred servers for the most critical QoS classes. In these simulations, the most critical QoS classes mean the most important half of the classes. If the load in any server in the server set $\mathcal{P}$ (let this server be $p$) is more than
"±² »Ä Å(ÆÇ Ä È¹Mº »¼ º É Ê , set ¹Mº »¼ ºw¸ · / "±² ¹ ´=Ë Ì (21) ¹Mº »¼ ºw¸ · "±² ¹Mº »¼ º ¸ · ¤4Í Î
for all $q \in \mathcal{Q}$ and for all $s \in \mathcal{S}$, and go to step 1.
Otherwise, set
¹(º »¼ ºw¸ · =ª2ÉÐ/¹ ´Ë Ì
¹Mº »¼ ºw¸ · 8³ Ñw (22)
for all $q \in \mathcal{Q}$ and for all $s \in \mathcal{S}$. Go to step 1.
In the simulations, fixed values were used for the parameters $\alpha$, $\beta$, and $\gamma$. The bigger the value of $\alpha$ is, the faster $T_{\mathrm{high}}$ decreases when there is a period of low load. Correspondingly, the bigger the values of $\beta$ and $\gamma$ are, the faster the algorithm reacts to traffic bursts. Here, $T_{\mathrm{max}}$ means the maximum processing capacity of the servers, and $T_{\mathrm{high}}(q, s)$ the user defined limit for the relevant maximum load in the server for a request of service $s$ that belongs to QoS class $q$.
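The server selection in steps 1 to 3 can be sketched in Python as follows. This is a simplified illustration and not the authors' implementation. The function name, the argument layout and the tie-breaking rule (first match wins) are assumptions, and the dynamic update of step 4 is left out.

```python
def choose_server(load, preferred, t_high, t_low):
    """Pick a server index for one request following steps 1-3.

    load      -- current normalized load of each server
    preferred -- indices of the servers preferred for the request's (q, s) pair
    t_high    -- limit T_high(q, s) for this request
    t_low     -- cluster-wide limit T_low, assumed <= every T_high(q, s)
    """
    # Step 1: a preferred server whose load is under T_high(q, s).
    for m in preferred:
        if load[m] < t_high:
            return m
    # Step 2: any server whose load is under T_low.
    for m, value in enumerate(load):
        if value < t_low:
            return m
    # Step 3: fall back to the least loaded server.
    return min(range(len(load)), key=load.__getitem__)


# Hypothetical loads for a three-server cluster; server 0 is the preferred one.
via_step1 = choose_server([0.5, 0.2, 0.4], preferred=[0], t_high=0.8, t_low=0.3)
via_step2 = choose_server([0.9, 0.2, 0.5], preferred=[0], t_high=0.8, t_low=0.3)
via_step3 = choose_server([0.9, 0.6, 0.5], preferred=[0], t_high=0.8, t_low=0.3)
```

The three calls exercise one step each: the preferred server is picked while it is below its limit, a lightly loaded server is picked when it is not, and the least loaded server is the last resort.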
In the algorithm, the maximal number of concurrent connections to the back-end nodes is limited to
¹(Ȩ
¢
ÕÖ ×
¹
4 Simulation Results
In most of the Matlab [11] simulations, inter-arrival times for the requests were created by using a simple Poisson process. The service times were usually created as a Pareto process.
In order to create self-similar traffic patterns, using heavy-tailed service time distributions is sufficient. However, in some sets of the simulations, we also used self-similar inter-arrival times. The reason for this was to get more information on the behaviour of the algorithm under a quite heterogeneous set of load patterns. The existence of self-similarity in network traffic is discussed in more detail in, e.g., [6, 8] and [9].
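The traffic model described above can be sketched in a few lines of Python using only the standard library. The original simulations were run in Matlab, so this is an illustrative equivalent and not the authors' code. The arrival rate, Pareto shape and minimum service time are hypothetical values.

```python
import random


def generate_requests(n, arrival_rate, pareto_shape, min_service, seed=0):
    """Return n (arrival_time, service_time) pairs.

    Inter-arrival times are exponential (a Poisson arrival process); service
    times are Pareto distributed, which is heavy tailed for small shapes.
    """
    rng = random.Random(seed)
    clock = 0.0
    requests = []
    for _ in range(n):
        clock += rng.expovariate(arrival_rate)
        # paretovariate returns values >= 1, so min_service sets the scale.
        service = min_service * rng.paretovariate(pareto_shape)
        requests.append((clock, service))
    return requests


requests = generate_requests(n=1000, arrival_rate=2.0, pareto_shape=1.5, min_service=0.1)
```

With a shape parameter below 2 the Pareto distribution has infinite variance, which is why sample statistics from such runs converge slowly.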
Simulations revealed that the problem of mapping all the $(q, s)$ pairs of the requests optimally to the servers is quite difficult if we always want to have high cache hit-rates and, at the same time, to treat QoS classes unequally (i.e. use the servers with lower response times for the best customers). The problem is interesting enough that intensive research is still going on to improve the mapping scheme so that it satisfies these contradicting goals in a way that is close to the optimal solution. Some considerable improvements have been obtained since [4], and the "only QoS based" assignment already works quite well. This can clearly be seen in Figures 1 and 2, which illustrate the performance of a cluster of four servers. Figure 2 demonstrates how QoS-LB works, i.e. assigns unequal load to servers according to QoS classes. As a reference, Figure 1 shows similar performance curves for a round-robin scheme. In QoS-LB, server number 4 is assigned as the preferred server for the most important customers, and as the QoS class gets worse, the number of the preferred server gets smaller. When looking at the figures, note that the curves start from an empty system and that the load is not stable in the first half of the simulation run (confidence problems get bigger when using heavy-tailed distributions).
[Plot: "Normalized load of the servers using Round-Robin". Four panels (Server 1 to Server 4), normalized load from 0 to 1 against time from 1000 to 7000.]
Figure 1: Normalized load of the servers using Round-Robin.
[Plot: "Normalized load of the servers using our algorithm (linear f(m), dynamic Thigh)". Four panels (Server 1 to Server 4), normalized load from 0 to 1 against time from 1000 to 7000.]
Figure 2: Normalized load of the servers using QoS-LB with dynamic $T_{\mathrm{high}}$ (Eq. (19) was used in the initialization).
A major improvement towards real implementations of the algorithm was that the introduced dynamic setting of $T_{\mathrm{high}}$ seemed to work as expected ($T_{\mathrm{high}}$ follows the actual load). This cannot be seen from the figures as clearly as the CoS differentiation. When developing the dynamics of $T_{\mathrm{high}}$, the main issue was to find relevant values for $\alpha$, $\beta$, and $\gamma$ (see Eqs. (21) and (22) in step 4 of the algorithm). As mentioned before, we finally settled on fixed values for these three parameters. As the need for this kind of dynamics differs depending on the application of the algorithm, it may be worth trying different values for them. This is one topic for further study.
5 Conclusions
Due to the fast growing demand for web-based services, methods to increase server processing capacity have been studied intensively. Clustering of servers has been a straightforward way to go, but scheduling of the incoming requests between the servers has not been an equally clear task. A front-end device, often called a multilayer or content-based switch, is needed to carry out the request scheduling. A number of different scheduling schemes have been developed, and the most novel of them utilize information from the link layer up to the application layer.
In this paper, we have studied the issue of load sharing within a cluster of servers and introduced an adaptive tuning method for a QoS aware load balancing scheme (QoS-LB). First, a mathematical formulation of the load balancing problem has been developed to optimize system performance as a function of the required QoS classes and the content of the requested services. Second, an algorithm that tunes server loading levels to gain QoS aware total system performance has been introduced. Results obtained by running a number of simulation cases have shown that the developed algorithm works as planned.
However, the developed algorithm does not consider the cache hit rates well enough. New versions of the algorithm [5] experiment with some advanced features to address this weakness, but the main work to be done is to find and implement an algorithm that more closely fulfills the optimality requirements set in Chapter 2.
It is left for further study to tune the QoS-LB algorithm to give performance closer to the optimal one under highly varying real-life load. The performance of the introduced algorithm comes quite close to the optimum, but there is still room for improvement. The problem has so many input parameters that the theoretical optimum can never be reached in practice. It has to be noted that including some knowledge of the requested services in the problem increases its complexity and may lead to an algorithm that is not straightforward enough to be implemented in practice. Our goal is to keep the algorithm simple enough to enable concrete implementations by keeping the required processing capacity within reasonable limits in the front-end switches.
The introduced algorithm does not directly offer any provable quality of service bounds. From this point of view, the algorithm would be better referred to as a Class of Service algorithm. The overall idea of this paper is to describe an algorithm that provides differentiated access to different types of customers or requests, and offers a better playground for "real" QoS mechanisms. The engineering task of offering QoS with such a differentiation tool is outside the scope of this paper.
References
[1] G. Apostolopoulos, D. Aubespin, V. Peris, P. Pradhan, D. Saha: "Design, Implementation and Performance of a Content-Based Switch". Proceedings of IEEE INFOCOM 2000, pp. 1117-1126.
[2] T. Brisco: "DNS Support for Load Balancing", RFC 1794, April 1995.
[3] A. Fox, S. D. Gribble, Y. Chawathe, E. A. Brewer, and P. Gauthier: "Cluster-based scalable network services". In Proceedings of the Sixteenth ACM Symposium on Operating System Principles, Saint-Malo, France, Oct. 1997, pp. 1019-1027.
[4] K. Kaario, T. Hämäläinen, J. Zhang: "Tuning of QoS Aware Load Balancing Algorithm (QoS-LB) for Highly Loaded Server Clusters". Proceedings of IEEE International Conference on Networking 2001 (ICN'01), July 11-13, 2001, Colmar, France.
[5] K. Kaario, T. Hämäläinen and M. Wikström: "Method for Improving Cache Hit-Rates in QoS-Aware Load Balancing Algorithm (QoS-LB)". Proceedings of IEEE Globecom 2001, Volume 4, pp. 2321-2325, November 2001, USA.
[6] W. Leland, M. Taqqu, W. Willinger and D. Wilson: "On the self-similar nature of Ethernet traffic (extended version)". IEEE/ACM Transactions on Networking, Vol. 2, 1994, pp. 1-15.
[7] J. Liedtke, V. Panteleenko, T. Jaeger, and N. Islam: "High-performance caching with the Lava hit-server". In Proceedings of the USENIX 1998 Annual Technical Conference, New Orleans, LA, June 1998.
[8] V. Paxson and S. Floyd: "Wide Area Traffic: The Failure of Poisson Modeling". IEEE/ACM Transactions on Networking, Vol. 3, No. 3, June 1995, pp. 226-244.
[9] B. Tsybakov, N. D. Georganas: "On Self-Similar Traffic in ATM Queues: Definitions, Overflow Probability Bound, and Cell Delay Distribution". IEEE/ACM Transactions on Networking, Vol. 5, No. 3, June 1997, pp. 397-409.
[10] WWW-NIMBUS, Nondifferentiable Interactive Multiobjective BUndle-based optimisation System, http://nimbus.mit.jyu.fi
[11] http://www.mathworks.com/