Middleware Services for Dynamic Clustering of
Application Servers (Ph.D. Thesis)
Giorgia Lodi
Technical Report UBLCS-2006-06
March 2006
Department of Computer Science
University of Bologna
Mura Anteo Zamboni 7
40127 Bologna (Italy)
The University of Bologna Department of Computer Science Research Technical Reports are available in PDF and gzipped PostScript formats via anonymous FTP from the area ftp.cs.unibo.it:/pub/TR/UBLCS or via WWW at URL http://www.cs.unibo.it/. Plain-text abstracts organized by year are available in the directory ABSTRACTS.
Abstract
Nowadays, middleware platforms are well-established technologies developed to ease the implementation of distributed applications. Among these applications, this thesis focuses on so-called enterprise applications.
Usually, these applications exhibit stringent Quality of Service (QoS) requirements, which are to be met in order to enable them to carry out their tasks effectively.
QoS has been widely defined in the literature; for the purposes of this thesis it is intended to be a set of non-functional application requirements that include availability, scalability, reliability and timeliness.
In current industry practice, these requirements are usually specified within so-called Service Level Agreements (SLAs) that, in the context of this work, are contracts used to regulate the relationships between application owners and middleware platform providers.
Application owners own applications that can be deployed, run and maintained using component-based technologies termed application servers. These technologies support clustering of application server instances for scalability, load balancing and fault-tolerance purposes; however, current clustering mechanisms can only partially meet the above-mentioned non-functional (i.e., QoS) requirements of the applications they host, as these mechanisms are not designed to be QoS-aware. In this thesis, QoS-aware application servers (QaASs) are component-based technologies whose clustering mechanisms are capable of meeting QoS requirements that are specified within SLAs. Hence, the thesis proposes the design, implementation and experimental evaluation of an open source middleware architecture, constructed out of QoS-aware middleware services, that extends current application server technology so as to create QaASs.
Specifically, the thesis focuses on three principal middleware services, termed Configuration Service (CS), Monitoring Service (MS) and Load Balancing Service (LBS). According to the QoS specifications included in SLAs, the CS, MS and LBS are responsible for (i) configuring clusters of application servers in order to meet the SLA requirements; (ii) monitoring and adapting those clusters in case the QoS delivered by the resources of the clusters deviates from that required by the applications; and (iii) distributing the client requests among application server instances in the clusters so as to honor SLAs.
Experimental evaluations of the QoS-aware middleware services described in this thesis show that these services can be used effectively to extend current application server technology so as to enable that technology to meet its SLAs.
Part of the work described in this thesis has been developed within the context of the EU-funded project TAPAS [TAP] and deployed in production by the German company Adesso AG [ade], which participated in the TAPAS project.
Chapter 1
Introduction
This thesis proposes the design, implementation, and experimental evaluation of a collection of QoS-aware middleware services, named Configuration Service (CS), Monitoring Service (MS) and Load Balancing Service (LBS). These services are intended to be part of a generic open source middleware architecture, termed QoS Management subsystem, whose design and implementation this thesis describes. The principal objective of this middleware architecture is to extend current application server technologies in order to make them QoS-aware, that is, to make application servers capable of honoring SLAs.
The contribution of this work is then to demonstrate that the aforementioned services can configure, monitor and dynamically adapt clusters of application servers, in order to provide distributed applications with such QoS guarantees as scalability, availability, reliability and timeliness. The applications considered in this thesis, and generally hosted by application servers, are enterprise applications; usually, enterprise applications are responsible for carrying out such business functions as managing supply chains, managing customer relationships, and so forth. Examples of these types of applications include e-commerce, automated stock-trading, and bank asset management systems.
The motivations for this work can be summarized as follows.
1 Motivations
A middleware platform is generally used as an architectural component for supporting the development and the execution of distributed applications. Its main role is to create a level of abstraction so as (i) to present a unified programming model to application developers and (ii) to mask out problems of system and network heterogeneity.
Middleware can be composed of multiple layers. Four principal levels can be identified [Sch02]:
• Host Infrastructure Middleware: it encapsulates and enhances native operating system communication and concurrency mechanisms to create portable and reusable network programming components;
• Distribution Middleware: it defines higher-level distributed programming models whose reusable APIs and mechanisms automate the native operating system network programming capabilities encapsulated by the previous level;
• Common Middleware Services: the services of this level are responsible for augmenting the distribution middleware layer by defining higher-level domain-independent components that allow the application designers to concentrate on the application logic only;
• Domain-specific Middleware Services: these services are tailored to the requirements of a specific application domain and embody knowledge of that domain.
Figure 1. Levels of QoS Integration (figure labels: Application Level, Middleware Level, Operating System and Communication Level, QoS facilities)
Nowadays middleware technology is widely adopted to ease the development of distributed applications; however, it is important that the middleware remains effective for such types of applications (e.g., enterprise applications) that can impose demands in terms of resource availability, adaptivity, reliability, scalability, and timeliness. In fact, these applications must operate under changeable environment conditions and they present stringent Quality of Service (QoS) requirements that are to be met in order to guarantee the correct behavior of the applications themselves.
QoS has been widely defined in the literature and it usually refers to the collective effects and performance of services that determine the degree of satisfaction of the end users in using those services. That satisfaction is generally associated with a set of non-functional requirements (i.e., QoS requirements) that include dependability, reliability, timeliness, throughput and security.
Issues of QoS have been principally addressed in the design of mechanisms that allow programmers to control communication parameters such as network throughput, packet delay, message loss and delay jitter (e.g., RSVP [ea97], IntServ [BCS94], DiffServ [ea98]) over QoS-enabled communication technologies (e.g., ATM [Vet95]) [FLPS03]. These parameters indeed affect the user-perceived QoS of distributed applications. However, further QoS requirements emerge at the application level and they cannot be fully met at the communication level, as they fall outside the responsibility of this level (rather, this level may provide support which can be crucial for meeting them) [Ghi01]. Thus, QoS can be thought of as a pervasive system property, which is to be preserved through a set of QoS functionalities (e.g., QoS negotiation, monitoring, adaptation, etc.). These functionalities are to be integrated in every subsystem of the software infrastructure, from the communication subsystem level up to the application level (end-to-end QoS).
Figure 1 depicts the levels of the software infrastructure in which a QoS management system should be provided. Thus, for example, at the operating system level, there should be mechanisms for reserving such resources as CPU, memory and threads; the communication level should provide applications with mechanisms for network monitoring and reservation; the middleware level should be constructed out of services for QoS negotiation, monitoring and adaptation; and finally QoS monitoring and adaptation can be applied at the application level as well, by allowing this level to monitor and adapt the QoS it may require.
This thesis focuses on the middleware level of the stack shown in Figure 1, so as to provide distributed applications that exhibit QoS requirements with a collection of QoS services embodied in that level.
1.1 Service Level Agreements and QoS Specifications
In current industry practice, QoS requirements are commonly specified within legally binding contracts termed Service Level Agreements (SLAs) [Lod04].
In particular, SLAs are used to define the service guarantees an application hosting environment has to provide to its hosted application, and the metrics to assess the quality of service delivered by that environment. The definition of such SLAs is a complex task, and is outside the scope of this thesis (relevant works include [SLE04], [MJSCG04], [Con]). However, it is worth mentioning here the principal requirements that can be defined in industry SLAs. These requirements are divided into seven principal categories that are summarized below.
• Performance: it defines a quantitatively characterized service. The quantitative parameters are typically arrival rate, service time, service rate and timeliness of the service;
• Availability: it defines the duration during which a service or application is guaranteed to be available;
• Reliability: it is expressed in terms of the mean time between two failures of the service;
• Maintainability: it defines the maximum time required to recover from a failure of the service;
• Security: it defines the level of security required;
• Monitoring: it is necessary in order to produce the service level statistics;
• Penalty: this parameter includes all the penalty clauses that apply in case one of the two parties involved in the SLA cannot fulfill the contract.
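As an illustration only, the seven categories above could be captured in a small record type. The sketch below is written in Python; the class name `HostingSLA`, its field names, the units and the sample values are all assumptions made for this example and are not taken from any SLA language cited in this thesis.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HostingSLA:
    """Hypothetical container for the seven SLA categories listed above."""
    max_response_time_ms: float      # Performance: timeliness bound
    max_request_rate: float          # Performance: arrival rate (requests/s)
    availability: float              # Availability: fraction of time, 0..1
    mtbf_hours: float                # Reliability: mean time between failures
    max_recovery_minutes: float      # Maintainability: time to recover
    security_level: str              # Security: required level (free-form here)
    monitoring_period_s: float       # Monitoring: statistics reporting period
    penalties: dict = field(default_factory=dict)  # Penalty: clause -> amount

    def validate(self) -> None:
        """Reject values that cannot describe a meaningful agreement."""
        if not 0.0 <= self.availability <= 1.0:
            raise ValueError("availability must be a fraction in [0, 1]")
        for name in ("max_response_time_ms", "max_request_rate",
                     "mtbf_hours", "max_recovery_minutes",
                     "monitoring_period_s"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


# Sample values are invented for the example.
sla = HostingSLA(1000.0, 90.0, 0.99, 720.0, 30.0, "basic", 60.0,
                 {"response_time_breach": 100.0})
sla.validate()
```

Keeping the record immutable reflects the idea that an SLA is a contract: a changed requirement would be represented as a new agreement rather than an in-place update.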
2 General Approach
Owing to the above observations, the principal aim of this work is to demonstrate that the middleware level of Figure 1 can be extended with a collection of QoS services capable of meeting the QoS requirements specified in SLAs and exhibited by distributed applications (e.g., enterprise applications).
It is worth observing that most current distributed applications are developed by following a component-based approach. By using this approach, applications can be constructed out of reusable software components (e.g., client, web, enterprise components), each of which is specialized to carry out some business functions. Usually, these component-based applications are deployed, run and maintained by means of specific middleware technologies, which assume a relevant role in the context of this thesis. These technologies, termed application servers, are indeed middleware architectures that consist of a collection of middleware services useful for the development and deployment of component-based applications [FR03]. There exists a variety of such middleware architectures; for example, those implementing the Java 2 Enterprise Edition (J2EE) specifications (e.g., JBoss [jbo], JOnAS [jon], WebSphere [web]), the CORBA Component Model (CCM) [WSO00] and .NET [Pro02]; however, among them, this thesis discusses those that implement the J2EE specifications.
Thus, J2EE application servers allow developers to construct distributed applications out of reusable and interoperable software components (e.g., commercial operating systems, communication protocols, middleware services). In addition, they support clustering of application server instances in order to host the aforementioned component-based applications. Therefore, they provide distributed applications with clustering solutions for scalability, load balancing and fault-tolerance purposes.
Figure 2. Scenario
However, these middleware technologies present a severe limitation: their clustering mechanisms can only partially meet the aforementioned non-functional requirements (i.e., QoS requirements) of the applications they host, as they are not fully instrumented for meeting those requirements (i.e., they are not designed to be QoS-aware).
Therefore, the principal objective of this thesis is to design, implement and evaluate a middleware architecture, constructed out of QoS-aware middleware services, that extends current application servers in order to create so-called QoS-aware application servers (QaASs).
QaASs are defined as standard application servers extended with clustering mechanisms that can meet QoS requirements specified with SLAs (i.e., clustering mechanisms capable of honoring SLAs).
The above QoS-aware middleware services can be located in the Common Middleware Services layer, according to the previous definition, and are capable of providing application servers with mechanisms for dynamic resource configuration, monitoring, adaptation and QoS-aware load balancing. They enforce strategies for the assessment of the correct amount (and the characteristics) of the resources necessary to meet the QoS application requirements included in SLAs. This assessment entails determining the amount of resources needed to deliver the required QoS levels, so as to honor those SLAs.
In general, applications hosted in application servers (e.g., web applications, web services) are characterized by high load variance; hence, the amount of resources needed to meet their SLAs may vary notably over time. Thus, in order to ensure that the SLA of an application is not violated, one can adopt a resource over-provision policy; this policy can be based on evaluating (e.g., via application modeling, via application benchmarking) the amount of resources the application may require in the worst case, and statically allocating these resources to that application. Typically, this policy may lead to a largely suboptimal utilization of the hosting environment resources, as allocated resources may remain unused at run time.
In the vast majority of state-of-the-art application servers, the hosting environment resources are represented by application server instances, deployed in several nodes (i.e., workstations connected through the same LAN) for scalability and fault tolerance purposes.
Figure 2 illustrates the scenario considered in this work. As shown in Figure 2, clients, typically connected to a network, use an application that can be deployed in a clustered application server.
Within such a cluster, one machine is dedicated to the load balancing process (LBS in the above Figure), that is, that machine receives the incoming client requests and dispatches them to the appropriate clustered machine that has been selected by the LBS through a predefined load balancing policy.
In the above scenario, the client requests are addressed to the LBS and filtered by means of a gateway, which is used in order to cope effectively with a possible crash of the LBS itself.
In this context, an optimal resource utilization can be achieved by allocating to an application, and maintaining at run time, the minimum number of clustered nodes required to meet the application SLA. To this end, a dynamic cluster configuration mechanism is required that can adapt the cluster configuration to possible variations of the application load by adding (or removing) nodes in the cluster at run time [LPRT05]. Specifically, in the scenario illustrated in Figure 2, nodes can be added to the cluster, in order to cope with unexpected client load, or released from the cluster when no longer necessary.
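The sizing idea in this paragraph can be sketched under a strong simplifying assumption: every node is taken to sustain the same fixed request rate while still meeting the SLA. The function names and the `per_node_capacity_rps` parameter below are hypothetical and serve only to make the "minimum number of nodes" rule concrete; they do not describe the mechanism implemented in this thesis.

```python
import math


def nodes_required(load_rps: float, per_node_capacity_rps: float) -> int:
    """Smallest node count whose aggregate capacity covers the load."""
    if per_node_capacity_rps <= 0:
        raise ValueError("per-node capacity must be positive")
    # At least one node is kept so the application stays deployed.
    return max(1, math.ceil(load_rps / per_node_capacity_rps))


def resize(current_nodes: int, load_rps: float,
           per_node_capacity_rps: float) -> int:
    """Nodes to add (positive) or release (negative) for the given load."""
    return nodes_required(load_rps, per_node_capacity_rps) - current_nodes
```

For example, with an assumed capacity of 100 requests per second per node, a load of 250 requests per second calls for three nodes, and a later drop to 90 requests per second allows two of them to be released.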
This behavior can take advantage of service provision models such as utility computing [IBM04] or on-demand computing (or even grid computing [FK98]), where resources are made available as needed. Hence, the new nodes can be acquired on demand from either data centers or specific clusters of spare resources that can be allocated for this purpose, as shown in Figure 2.
In this thesis, the QoS Management subsystem middleware architecture provides one such dynamic clustering technique, which enables effective resource management. Therefore, the designed middleware architecture enables a clustered QoS-aware environment, termed QoS-aware clustering, in which every clustered node is represented by an instance of the created QaAS.
The QaAS nodes of the cluster cooperate with one another, so as to avoid resource over-provision and provide applications with an optimal resource utilization.
2.1 QoS-aware Middleware Services
To this end, each QaAS in the cluster embodies a collection of QoS-aware middleware services. Specifically, this collection of services includes a Configuration Service, a Monitoring Service and a Load Balancing Service.
The principal functionalities carried out by these services are summarized as follows.
Configuration Service. The main responsibilities of the CS consist of (i) configuring the application server cluster at application deployment time, so as to meet an SLA, and (ii) possibly reconfiguring the cluster at run time, if the QoS the cluster delivers deviates from that specified in that SLA. The reconfiguration consists of varying dynamically the cluster configuration at run time, by adding or removing nodes.
Monitoring Service. The MS is in charge of monitoring the aforementioned QoS-aware clustering in order to detect variations of the QoS delivered by the clustered servers at run time. In case variations occur, the MS cooperates with the CS for cluster reconfiguration purposes. The MS carries out its own activity based on specific thresholds, termed warning and breaching points. Specifically, the breaching points can be derived from the requirements included in the SLA. In contrast, the warning thresholds can be calculated based on, for example, the expected load, or even specific policy decisions concerning the risks one wishes to assume as to the violation of an SLA.
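A minimal sketch of the two thresholds follows. It assumes that the monitored metric is a response time, that the breaching point is the bound taken from the SLA, and that the warning point is placed at a fixed fraction of it. The names `QoSStatus`, `classify` and `warning_ratio`, as well as the 0.8 default, are hypothetical policy choices for this example and are not values from this thesis.

```python
from enum import Enum


class QoSStatus(Enum):
    OK = "ok"
    WARNING = "warning"      # reconfiguration should be requested
    BREACH = "breach"        # the SLA bound has been exceeded


def classify(response_time_ms: float, breaching_point_ms: float,
             warning_ratio: float = 0.8) -> QoSStatus:
    """Compare a measured response time with the two thresholds.

    warning_ratio is an assumed policy knob: the warning point is placed
    at that fraction of the breaching point.
    """
    if not 0.0 < warning_ratio < 1.0:
        raise ValueError("warning_ratio must be in (0, 1)")
    warning_point_ms = warning_ratio * breaching_point_ms
    if response_time_ms > breaching_point_ms:
        return QoSStatus.BREACH
    if response_time_ms > warning_point_ms:
        return QoSStatus.WARNING
    return QoSStatus.OK
```

A lower `warning_ratio` triggers reconfiguration earlier, which trades additional reconfigurations for a smaller risk of violating the SLA.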
Load Balancing Service. The LBS described in this thesis is a service that has been included in J2EE application servers and implemented at the middleware level, in order to balance the load of HTTP client requests among the clustered QaASs. The motivation for implementing load balancing at the middleware level is twofold. First, implementing load balancing at this level of abstraction makes it independent of any underlying operating system. Second, the designed LBS can easily be made aware of specific application server conditions, such as server response time and throughput as well as the cluster membership configuration.
However, an HTTP load balancing service is not a standard component of the J2EE architecture. Hence, such an architecture relies on hardware or software solutions to address this problem.
Hardware load balancing services are implemented by network devices (e.g., the Cisco load balancing [Bala], F5/BIG-IP Traffic Management [hlb]) that use only static load balancing policies to distribute client requests.
Software load balancing services can be implemented in a variety of different ways, namely as (i) operating system components (e.g., the Linux Virtual Server [Ser] is a highly scalable and highly available server built on a cluster of real servers, with the load balancing running on the Linux operating system), (ii) stand-alone applications (e.g., Apache mod_jk [Mod], ZXTM Load Balancer [Balb]), and (iii) components integrated in the application server technology (e.g., IBM WebSphere Edge Components Load Balancer). All these software solutions are typically more flexible than their hardware counterparts and enable different load distribution policies.
However, these software (and also hardware) solutions, with some exceptions (e.g., IBM WebSphere), principally adopt static load balancing strategies, which do not consider the actual computational load of the clustered nodes in dispatching the client requests among those nodes. Static load distribution may affect the QoS delivered by clustered nodes; for this reason, the designed LBS embodies adaptive load balancing policies that can cope effectively with run time variations of both the computational load of the clustered nodes and the cluster configuration in dispatching the client requests.
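To make the contrast with static policies concrete, the sketch below shows one very simple adaptive rule: each request goes to the node that currently has the fewest pending requests, and the decision is recomputed against the current membership. This is an illustrative stand-in and not the adaptive policy designed in this thesis; the function name and the use of pending requests as the load indicator are assumptions.

```python
from typing import Dict


def pick_least_loaded(pending: Dict[str, int]) -> str:
    """Return the member with the fewest pending requests.

    `pending` maps each current cluster member to its number of
    in-flight requests (an assumed load indicator). Ties are broken by
    node name so the choice is deterministic.
    """
    if not pending:
        raise ValueError("the cluster has no members")
    return min(sorted(pending), key=lambda node: pending[node])
```

Because the mapping is passed in on every call, nodes added to or released from the cluster are taken into account as soon as the mapping changes.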
The architecture of the developed LBS can be thought of as a reverse proxy server which interfaces the clients of an application to the nodes hosting that application, and includes support for both (i) a request-based load balancing approach (each individual client request is intercepted by the LBS, and dispatched to an application server for processing) and (ii) a session-based load balancing approach (a specific client session is processed by the same node; this client-server correspondence is termed session affinity).
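The difference between the two approaches can be sketched with a small dispatcher. The example assumes a plain round-robin choice for new work and an in-memory table for session bindings; the class name `Dispatcher` and its methods are hypothetical and do not reproduce the LBS implementation.

```python
from itertools import count
from typing import Dict, List, Optional


class Dispatcher:
    """Round-robin dispatcher with optional session affinity (sketch)."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._next = count()                 # round-robin position
        self._sessions: Dict[str, str] = {}  # session id -> bound node

    def _round_robin(self) -> str:
        return self._nodes[next(self._next) % len(self._nodes)]

    def dispatch(self, session_id: Optional[str] = None) -> str:
        """Pick a node for one request.

        Without a session id the request is balanced individually
        (request-based). With a session id the first request binds the
        session to a node and later ones reuse it (session-based).
        """
        if session_id is None:
            return self._round_robin()
        if session_id not in self._sessions:
            self._sessions[session_id] = self._round_robin()
        return self._sessions[session_id]
```

A real service would also have to rebind sessions when their node leaves the cluster; that case is left out of this sketch.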
Thus, the LBS is mainly responsible for (i) intercepting each HTTP client request, (ii) selecting a target node that can serve that request, by using specific load balancing policies, and (iii) manipulating the client request appropriately, in order to forward it to the selected target node, and to enable it to return the reply to the LBS itself.
2.2 Middleware services evaluation
An experimental evaluation of a first prototype of the above middleware services has been carried out. This evaluation shows that the services enable the construction of a clustered application hosting environment that can effectively meet non-functional application requirements and optimize clustered resource utilization.
However, a precise analysis of the obtained results has led to the design and implementation of a second prototype of the middleware architecture, described in this thesis. The purpose has been to reduce as much as possible the number of reconfigurations carried out by the CS (i.e., adding/releasing clustered nodes), due to their cost in terms of performance, and to improve the preliminary results concerning the utilization of the clustered resources.
In addition, a further study of commercial SLAs has been carried out. From this study, it has emerged that in common industry practice the QoS requirements specified within SLAs are allowed to be breached a certain number of times over a predefined timeframe [BCL+04].
Hence, the first SLA management model proposed in this thesis, with which no SLA violations were permitted, has been relaxed so as to tolerate a limited number of violations of the QoS requirements during a certain time period.
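One way to picture the relaxed model is a counter over a sliding time window: individual violations are recorded, and the SLA counts as breached only when more than an agreed number fall within the window. The sketch below assumes timestamps in seconds that never decrease; the class name `ViolationWindow` and its parameters are hypothetical and are not the SLA monitoring design of this thesis.

```python
from collections import deque


class ViolationWindow:
    """Counts QoS violations inside a sliding time window (sketch)."""

    def __init__(self, max_violations: int, window_s: float) -> None:
        self.max_violations = max_violations
        self.window_s = window_s
        self._times = deque()   # timestamps of recent violations

    def record(self, timestamp_s: float) -> bool:
        """Register a violation; return True if the SLA is now breached."""
        self._times.append(timestamp_s)
        # Drop violations that fell out of the window ending at timestamp_s.
        while self._times and self._times[0] <= timestamp_s - self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_violations
```

With `max_violations` set to zero the sketch degenerates to the stricter first model, in which any single violation breaches the SLA.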
It is worth noticing that changes in the SLA requirements have caused changes in the design of the SLA Monitoring, which has therefore been re-developed and implemented. Moreover, as to the adaptive load balancing policy of the LBS, different techniques have been evaluated in order to find an easier and more precise estimation of the parameters used to adaptively dispatch the HTTP client requests.
The second prototype has been assessed through a further experimental evaluation. This evaluation confirmed some of the positive results obtained by testing the first prototype and also made it possible to improve the usage of the clustered resources.
3 Thesis Organization
The remainder of this thesis is structured as follows.
Chapter 2 discusses in detail the designed middleware services, used to provide distributed applications with application servers capable of honoring SLAs.
Chapter 3 describes the implementation of a first prototype of the earlier mentioned middleware services.
Chapter 4 presents the principal results of an experimental evaluation of the first prototype of the middleware services. The obtained results show the effectiveness of the proposed services in terms of overhead introduced, capability of honoring SLAs and optimal resource utilization.
Chapter 5 describes the design, implementation and experimental evaluation of a new prototype of the middleware architecture. In addition, it discusses some experimental results obtained by deploying the clustered QaASs in a wide area network environment.
Chapter 6 presents the state of the art concerning end-to-end QoS architectures, architectures for resource clustering and load balancing. It compares and contrasts the state-of-the-art approaches to the problem faced in this thesis with the proposed solution.
Finally, Chapter 7 discusses some concluding remarks and future directions.
Chapter 2
QoS-aware Middleware Services
In order to construct a QoS-aware application server (QaAS), a standard application server (e.g., JBoss, JOnAS, WebSphere) has been extended with a collection of middleware services, which provide distributed applications with such QoS management mechanisms as resource configuration, QoS monitoring, adaptation and QoS-aware load balancing.
These services are part of a middleware architecture termed QoS Management subsystem that allows one to make standard application servers QoS-aware (Figure 1). Note that, without this subsystem, what remains is a standard application server constructed out of component middleware (e.g., J2EE). Moreover, from the applications' point of view, the QoS Management subsystem operates transparently, that is, no changes to the application level specifications are necessary in order to provide the applications with the above QoS management mechanisms.
The QoS Management subsystem is constructed out of three principal middleware services, termed Configuration Service (CS), Monitoring Service (MS) and Load Balancing Service (LBS), respectively.
This Chapter presents the design of each of those middleware services by describing their principal functionalities and the interactions between them.
1 General Architecture
Figure 2 depicts the overall system architecture that has been designed in order to honor SLAs. Such an architecture, termed QoS-aware application hosting environment (QoS-aware clustering for short), consists of a number of clustered QaASs.
Within the QoS-aware clustering, each clustered QaAS (i.e., node) incorporates the QoS Management subsystem illustrated in Figure 1. As the QoS Management subsystem is constructed out of the aforementioned CS, MS and LBS, these services are in turn replicated in each node of the QoS-aware clustering.
However, only one node in the cluster is actually responsible for the SLA enforcement, monitoring, and load balancing. This node, termed the Leader of the cluster (the bold circle of Figure 2), receives the client requests that, by means of the LBS of the Leader, are dispatched to the remaining nodes of the QoS-aware clustering. Hence, as depicted in Figure 2, there exists only one CS, MS and LBS Leader at a time. The remaining CS, MS and LBS, deployed in the other clustered nodes (i.e., slave nodes), are used as backup copies in case the Leader crashes; they receive from the Leader both the client requests for processing and the cluster configuration state as soon as an event causes modifications to that state (i.e., every slave node has a consistent view of the cluster configuration state).
The cluster configuration state maintained by the Leader consists of the list of QaAS nodes that are part of the QoS-aware clustering and the SLA related to the application that is currently deployed in the cluster.
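The following sketch illustrates, in a single process, how a Leader might push each change of the configuration state to its slave nodes so that all views stay consistent. It is only a toy model: the loop over `slaves` stands in for the reliable group communication layer, and the `version` counter, class names and method names are assumptions introduced for this example rather than parts of the protocol used in this thesis.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterState:
    """Configuration state: member nodes plus the deployed application's SLA."""
    version: int                 # assumed counter, used to order updates
    members: Tuple[str, ...]
    sla_id: str


class Node:
    def __init__(self, name: str, state: ClusterState) -> None:
        self.name = name
        self.state = state

    def install(self, state: ClusterState) -> None:
        # Ignore stale updates so views never move backwards.
        if state.version > self.state.version:
            self.state = state


class Leader(Node):
    def __init__(self, name: str, state: ClusterState,
                 slaves: List[Node]) -> None:
        super().__init__(name, state)
        self.slaves = slaves

    def add_member(self, member: str) -> None:
        """Change the state locally, then push the new view to every slave."""
        new_state = replace(self.state,
                            version=self.state.version + 1,
                            members=self.state.members + (member,))
        self.state = new_state
        for slave in self.slaves:   # stands in for a reliable multicast
            slave.install(new_state)
```

Because every slave holds the full state, any of them has the information needed to take over if the Leader crashes; how a new Leader is chosen is outside this sketch.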
The Leader keeps the state on the slave nodes consistent by carrying out a specific protocol (further details about this protocol are discussed in Section 3) that makes use of a collection of primitives, made available, in turn, by an underlying reliable group communication mechanism, as shown in Figure 2. Specifically, that mechanism provides the QaAS nodes with reliability properties that include lossless message transmission, message ordering, and atomicity.
Figure 1. The QoS-aware Application Server (QaAS)
In the design of the above middleware architecture, issues concerning such group communication mechanisms are outside the scope of this thesis. Hence, a reliable group communication protocol, already included in standard application servers, has been used (in the implementation within the JBoss application server the reliable group communication protocol is JGroups [jgr]).
The principal activities performed by the above three middleware services of the Leader can be summarized as follows.
The CS is responsible for configuring the QoS-aware clustering so that it effectively meets the hosting SLA of a customer application. Thus, in essence, as shown in Figure 3, the CS takes as input a customer application hosting SLA, and discovers the available system resources that can honor that SLA. Provided that the discovered resources are sufficient to meet that input SLA, the CS reserves those resources, generates a resource plan (see below) for the hosted application, and sets up the QoS-aware clustering for that application. In case the CS discovers that there are not sufficient resources to host that application, it returns an exception (typically, such an exception can be handled either by rejecting the application hosting request, or by offering a reduced service, for example, depending on the policy implemented by the owner of the hosting environment).
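The discover, reserve and fail-with-exception flow described above can be sketched as follows. The example reduces the SLA to a single required request rate and assumes each free node advertises the rate it can sustain; `configure`, `InsufficientResources` and the capacity figures are hypothetical names and simplifications, not the CS interface.

```python
from typing import Dict, List


class InsufficientResources(Exception):
    """Raised when the free nodes cannot satisfy the hosting SLA."""


def configure(required_rps: float, free_nodes: Dict[str, float]) -> List[str]:
    """Reserve nodes until their combined capacity covers required_rps.

    free_nodes maps a node name to the request rate it is assumed to
    sustain. Larger nodes are taken first to keep the cluster small.
    Reserved nodes are removed from free_nodes only on success.
    """
    chosen: List[str] = []
    capacity = 0.0
    for name in sorted(free_nodes, key=lambda n: (-free_nodes[n], n)):
        if capacity >= required_rps:
            break
        chosen.append(name)
        capacity += free_nodes[name]
    if capacity < required_rps:
        raise InsufficientResources(
            f"need {required_rps} req/s, only {capacity} available")
    for name in chosen:
        del free_nodes[name]
    return chosen
```

A caller that catches `InsufficientResources` can then apply whichever policy the hosting environment owner prefers, such as rejecting the request or offering a reduced service.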
In essence, the resource plan mentioned above specifies the resources that are to be used in order to construct the QoS-aware clustering capable of meeting the input hosting SLA. In addition, for each resource, the resource plan includes a positive value, termed load balancing factor, that is, a percentage used to dispatch the client requests during the load balancing process when adaptive load balancing is enabled.
The CS carries out its principal activities in cooperation with another service of the QoS Management subsystem termed Monitoring Service (MS). The MS is in charge of monitoring the QoS-aware clustering at application run time, so as to detect possible (i) variations of the cluster membership, (ii) variations of the load of each clustered node and (iii) violations of the hosting SLA. In order to prevent these latter violations, the MS is designed so that it takes appropriate actions if it discovers that the QoS-aware clustering is not able to cope effectively with new operational conditions (e.g., node overloading conditions, node crashes). Thus, for example, the MS can make use of an “overload” warning threshold in order to detect dangerous load conditions that may lead to node overloading. In case the input hosting SLA is close to being violated, the MS invokes its related CS, and requires that the QoS-aware clustering be reconfigured appropriately, so that it can adapt to the new load conditions, and continue to honor the application hosting SLA.
Finally, the LBS is used in order to (i) enable effective clustering of application servers and (ii) collect parameters that are useful for monitoring the cluster performance (monitoring of the cluster response time, for example); hence, the LBS operates strictly in conjunction with the above CS and MS. Typically, this service can contribute to meeting the hosting SLA both by preventing the occurrence of node overload and by avoiding the use of resources that have become unavailable (e.g., failed) at run time; it distributes the client request load appropriately amongst the clustered nodes according to some specified load balancing policy (see Section 5).
Figure 2. QoS-aware Clustering
[Figure 2 labels: Leader QaAS, slave QaASs, CS, MS and LBS instances on every node, Reliable Group Communication, Client requests]
[Figure 3 labels: QoS Management, Leader, SLA, CS, MS, LBS, Client requests; interactions: <<get SLA>>, <<set Resource Plan>>, <<get monitoring info>>, cooperate with]
The next Subsections describe in detail the concept of Service Level Agreement and the design and protocols of the CS and MS. Following the description of these two services, separate Subsections introduce both the design of the LBS and the adaptive load balancing policy.
2 Service Level Agreement
A Service Level Agreement (SLA) is a legally binding contract that specifies the QoS guarantees an application hosting environment, such as an application server, has to provide its hosted applications with, under specific load conditions, and the metrics to assess the QoS delivered by that environment [Con].
Although SLAs are widely used in industry practice, there is no standard for their definition; rather, service providers adopt their own methodology to specify the QoS requirements they commit to with their customers. Thus, defining an SLA is a complex task, and current research is investigating the definition of languages to represent it. Examples of this research include [SLE04, KL03]. The detailed description of these works is outside the scope of this thesis; however, it is worth mentioning here a language, named SLAng [SLE04], that has been used in order to derive some parts of the example SLAs of this thesis. With SLAng, a collection of contractual clauses has been represented in an XML form.
Specifically, as depicted in Figure 4, the SLA derived by using SLAng is an XML file termed hosting SLA. In the context of this thesis, hosting SLAs are defined as the SLAs that are used to bind a QoS-aware application server to the applications it hosts.
This hosting SLA, as illustrated in Figure 4, may consist of two principal parts, namely the Client Responsibilities and the Server Responsibilities parts, which define the obligations of the client and the server, respectively, that are to be respected in order to honor the SLA.
For both client and server, the SLA may include different levels of the required quality of service, each related to all (or some) of the operations of the application that is bound to the hosting environment. In the example, a bookshop application has been used, and the operations that can be performed by the clients include "login", "catalog", "bookDetails", "AddtoCart" and so on.
Hence, from a client point of view, the application operations can be classified according to the maximum number of requests a client is allowed to send to the application server within a specified interval of time (the XML requestRate attribute of Figure 4).
In contrast, from a server point of view, other responsibilities are to be fulfilled that concern the principal hosting obligations an application server should provide its applications with. Thus, as included in the example SLA of Figure 4, an application server has to guarantee a specified system availability; this parameter, expressed as a percentage, represents the proportion of time for which the hosted application is accessible with predictable response times (e.g., daily availability of the bookshop application will be no less than 70 percent).
Moreover, the application operations may again be classified according to some quality of service attribute. In the example of Figure 4, these attributes are the response time, i.e., the time elapsed between the delivery of a client request, for a specified operation of the application, to the server and the transmission of the reply from the server back to the client, and the throughput, i.e., the number of requests served by the application server per second.
3 Configuration Service
<?xml version="1.0" encoding="UTF-8"?>
<SLAng xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Parties>
    <Client>
      <Name>Facilitare Inc.</Name>
      <Address>Frankfurt</Address>
    </Client>
    <Server>
      <Name>Subito Inc.</Name>
      <Address>Stockholm</Address>
    </Server>
  </Parties>
  <SLS>
    <Hosting>
      <ClientResponsibilities>
        <ContainerServiceUsage clauseId="Login" requestRate="100/s">
          <Operations>
            <Operation name="login.jsp"/>
            <Operation name="login.do"/>
          </Operations>
        </ContainerServiceUsage>
        <ContainerServiceUsage clauseId="CreateAuction" requestRate="50/s">
          <Operations>
            <Operation name="visualizeAuction"/>
            <Operation name="createAuction"/>
          </Operations>
        </ContainerServiceUsage>
      </ClientResponsibilities>
      <ServerResponsibilities serviceAvailability="0.70">
        <OperationPerformance clauseId="Login" avgResponseTime="3.0s" avgThroughput="50/s">
          <Operations>
            <Operation name="LoginCtl"/>
          </Operations>
        </OperationPerformance>
        <OperationPerformance clauseId="catalog.jsp" avgResponseTime="2.0s" avgThroughput="50/s">
          <Operations>
            <Operation name="catalog"/>
          </Operations>
        </OperationPerformance>
        <OperationPerformance clauseId="Checkout" avgResponseTime="1.0s" avgThroughput="50/s">
          <Operations>
            <Operation name="CheckoutCtl"/>
          </Operations>
        </OperationPerformance>
        <OperationPerformance clauseId="AddToCart" avgResponseTime="2.0s" avgThroughput="50/s">
          <Operations>
            <Operation name="AddToCart"/>
          </Operations>
        </OperationPerformance>
        <OperationPerformance clauseId="RemoveFromCart" avgResponseTime="3.0s" avgThroughput="50/s">
          <Operations>
            <Operation name="RemoveFromCart"/>
          </Operations>
        </OperationPerformance>
      </ServerResponsibilities>
    </Hosting>
  </SLS>
</SLAng>
[Figure 4: the example hosting SLA]
The hosting SLA described above is the principal object of the system architecture that is used to trigger the execution of the Configuration Service (CS). Hence, the main responsibilities of the CS consist of (i) configuring the cluster at the time the hosting SLA is actually deployed in the QoS-aware clustering (i.e., at SLA deployment time), and (ii) possibly reconfiguring the cluster at run time, if the QoS the cluster delivers deviates from that specified in the hosting SLA [LP04]. The skeleton code in Figure 5 illustrates the protocol performed by the CS.
As shown in Figure 5, the CS is created and started at QaAS start up time. The actual cluster configuration initiates at hosting SLA deployment time.
Typically, the CS instance of the QaAS node where an SLA deployment occurs becomes the CS Leader. Note that the possible crash of the Leader during the configuration (or run-time reconfiguration) is detected by the backup CS instances, deployed in the slave nodes, through their MSs (see next Subsection). These MSs are notified of the Leader crash by the underlying group communication. Specifically, there exists an MS primitive named membershipChanged (see Section 4.1) that is invoked by the group communication on the MSs as soon as a change of the cluster membership configuration occurs.
The group communication’s reliability properties of lossless message transmission, message ordering and atomicity assure that each MS on the slave nodes, and consequently its local CS, has a consistent view of the membership changes. Hence, in case of Leader crash, the following simple recovery protocol is performed by each CS instance deployed in the slave nodes.
Every CS is identified by a unique identifier (ID), namely the IP address of the machine where the CS is deployed. Note that the CSs have a consistent cluster configuration state object that consists of the list of the IP addresses of the available clustered nodes.
When the crash of the Leader is detected by the MSs on the slave nodes, as previously described, all the available CSs are informed by their related MSs so as to enable the CSs to start the election of the new Leader. This election consists of taking the IP addresses of the available nodes from the cluster configuration state, interpreting them as strings and computing the minimum of these strings (by means of a lexicographical order of the strings).
It is worth noticing that, owing to the aforementioned reliability properties of the group communication paradigm in use, all the available clustered nodes agree on electing the CS instance that is deployed in the machine with the smallest IP address as the new cluster Leader.
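The election rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name elect_leader and the sample addresses are hypothetical. It also shows a consequence of comparing addresses as strings: the elected node is not necessarily the numerically smallest address.

```python
def elect_leader(node_ips):
    """Return the new Leader: the smallest IP address under string ordering."""
    node_ips = list(node_ips)
    if not node_ips:
        raise ValueError("no available nodes in the cluster configuration state")
    # The addresses are compared as plain strings (lexicographical order),
    # not as numbers, as the election rule states.
    return min(node_ips)


# Every surviving node holds the same membership list, so every node
# computes the same result without exchanging further messages.
view = ["192.168.0.12", "192.168.0.9", "192.168.0.101"]
leader = elect_leader(view)  # "192.168.0.101": "101" sorts before "12" and "9"
```

Because the result depends only on the agreed membership list, the order in which a node stores the addresses does not affect the outcome.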
This CS Leader takes as input the SLA, and parses it in order to extract from it the relevant QoS parameters that guide and determine the required cluster configuration (e.g., in the above SLA example, these parameters are such SLA attributes as the client request rate, the service availability, and so on).
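As an illustration of this parsing step, the following Python sketch extracts the QoS parameters from a shortened version of the example hosting SLA. It is a sketch only: the function name parse_hosting_sla, the dictionary layout, and the interpretation of "100/s" as requests per second and "3.0s" as seconds are assumptions made here, and the thesis implementation may proceed differently.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the example hosting SLA (one clause per part).
SLA_XML = """<SLAng>
  <SLS>
    <Hosting>
      <ClientResponsibilities>
        <ContainerServiceUsage clauseId="Login" requestRate="100/s">
          <Operations><Operation name="login.jsp"/></Operations>
        </ContainerServiceUsage>
      </ClientResponsibilities>
      <ServerResponsibilities serviceAvailability="0.70">
        <OperationPerformance clauseId="Login" avgResponseTime="3.0s" avgThroughput="50/s">
          <Operations><Operation name="LoginCtl"/></Operations>
        </OperationPerformance>
      </ServerResponsibilities>
    </Hosting>
  </SLS>
</SLAng>"""


def parse_hosting_sla(xml_text):
    """Collect the QoS parameters that drive the cluster configuration."""
    hosting = ET.fromstring(xml_text).find("./SLS/Hosting")
    server = hosting.find("ServerResponsibilities")
    sla = {
        "service_availability": float(server.get("serviceAvailability")),
        "request_rates": {},   # clauseId -> maximum client requests per second
        "performance": {},     # clauseId -> response time and throughput bounds
    }
    for usage in hosting.findall("./ClientResponsibilities/ContainerServiceUsage"):
        rate = usage.get("requestRate").split("/")[0]       # "100/s" -> "100"
        sla["request_rates"][usage.get("clauseId")] = float(rate)
    for perf in server.findall("OperationPerformance"):
        sla["performance"][perf.get("clauseId")] = {
            "avg_response_time_s": float(perf.get("avgResponseTime").rstrip("s")),
            "avg_throughput_per_s": float(perf.get("avgThroughput").split("/")[0]),
        }
    return sla


sla = parse_hosting_sla(SLA_XML)
```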
The cluster configuration mainly consists of finding the other QaAS peers in the cluster (i.e., discovering the initial cluster configuration) in order to check whether or not that initial configuration is suitable for meeting the input hosting SLA. The initial cluster configuration is obtained by invoking the MS deployed in the Leader node, as depicted in Figure 3.
Once the initial cluster configuration has been obtained, a specific SLA parameter, the serviceAvailability attribute of Figure 4, is examined so as to verify whether further nodes are to be added to the initial cluster configuration or excluded from it. This can be crucial in the light of the principal objective of this work of providing an optimal resource utilization: only the minimum number of clustered nodes that can contribute to meeting the input hosting SLA should be used.
Thus, the CS computes the system availability at the time the cluster configuration starts, and compares the resulting value with the serviceAvailability of the SLA. If the current system availability is less than that specified in the SLA, new nodes are added to the cluster.
In contrast, if the current system availability is higher than that of the SLA, the CS attempts to exclude one node at a time, and computes the remaining system availability (i.e., the availability after the exclusion of the node). If the remaining system availability is higher than that specified in the SLA, the node is actually excluded from the cluster configuration; that node is then included in the pool of spare resources. Otherwise, the node remains part of the initial cluster.
Hence, letting N be the number of nodes that form the initial cluster configuration, the system availability is computed as follows:
\[ \text{System Availability} = 1 - \prod_{i=1}^{N} (1 - A_i) \qquad (1) \]
// QaAS START UP TIME
create-CS();
start-CS();

// SLA DEPLOYMENT TIME
enableConfiguration(SLA) {
    // Elect cluster leader: the node with smallest ID
    leaderElection(SLA);
    // Configure cluster
    clusterConfiguration(SLA, getMembership);
    // Set new state on slave nodes
    setClusterState();
}

clusterConfiguration(SLA, membership) {
    // Compute system availability
    systemAvail = computeSystemAvailability();
    if (systemAvail < SLAAvailability) {
        addNewInstancesReconfiguration();
    } else {
        // Check whether or not to exclude instances
        if (excludeNodes())
            excludeNodesReconfiguration();
    }
}

// RUN TIME
addNewInstancesReconfiguration() {
    addNode();
}

excludeNodesReconfiguration() {
    // find node and exclude it
}
[Figure 5: CS skeleton code]
Thus, assuming independent repair of the clustered machines, the combined availability of the system is calculated as 1 minus the product, over all the clustered machines, of the probability that each machine is unavailable. The motivations for using formula (1) are as follows.
A cluster of nodes can be considered completely failed if and only if all its nodes fail. In contrast, the cluster is operational if at least one of its nodes is available. Hence, a cluster of nodes can be modeled as a system constructed out of distinct parallel components. Thus, from dependability theory [Tri02, Bob03], the availability of such a system can be computed by making use of equation (1).
A_i in the formula above is the estimated steady state availability of the single machine i, which deploys the clustered QaAS instance i; this availability value is obtained by the CS from a specific configuration file of that machine.
Note that issues concerning the per-machine estimated steady state availability are outside the scope of this thesis. However, it is worth observing that this value can be either included in the technical machine specifications, provided by the machine manufacturer, or computed via benchmarking.
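The availability check and the resulting inclusion or exclusion of nodes can be sketched as follows. This Python fragment is an illustration under assumptions rather than the CS implementation: the function names, the dictionaries mapping node names to availability values, and the order in which nodes are added or tried for exclusion are all choices made here, since the text does not fix them.

```python
def system_availability(availabilities):
    """Equation (1): 1 minus the probability that every node is unavailable."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def configure(initial, spares, sla_availability):
    """Return (cluster, spare pool) after the add/exclude step.

    `initial` and `spares` map node names to the per-node availability A_i.
    """
    cluster, pool = dict(initial), dict(spares)
    if system_availability(cluster.values()) < sla_availability:
        # Too little availability: add spare nodes until the SLA value is met.
        while pool and system_availability(cluster.values()) < sla_availability:
            name, a = pool.popitem()
            cluster[name] = a
    else:
        # Try to exclude one node at a time; keep the exclusion only if the
        # remaining availability is still higher than the SLA value.
        for name in list(cluster):
            rest = [a for n, a in cluster.items() if n != name]
            if rest and system_availability(rest) > sla_availability:
                pool[name] = cluster.pop(name)
    return cluster, pool


# Three nodes at 0.5 each give 1 - 0.5**3 = 0.875; two give 0.75; one gives 0.5.
cluster, pool = configure({"n1": 0.5, "n2": 0.5, "n3": 0.5}, {}, 0.70)
```

With a serviceAvailability of 0.70, one of the three nodes can be released (0.75 is still higher than 0.70), whereas releasing a second one would drop the availability to 0.5.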
In constructing the cluster configuration, the CS produces the resource plan object that includes, for each clustered node of the built cluster configuration, the above mentioned load balancing factor. As stated before, the load balancing factor is a positive value computed by using the server response time, the server throughput, and the size of the free memory in (the JVM of) each QaAS node of the resource plan object (further details about the load balancing factor are described in Section 5).
Once the cluster configuration is completed, the CS protocol is executed at run time whenever the MS requires a reconfiguration of the cluster. The reconfiguration consists of either (i) adding new nodes to the cluster, or (ii) releasing clustered nodes, and (iii) updating the aforementioned load balancing factor.
The activity (i) can be necessary both in case a cluster is to be augmented with additional resources (e.g., in order to cope with dynamically increasing load), and in case a clustered node fails and is to be replaced by one (or more) operational node(s). In both these cases, it may be convenient to maintain a pool of spare servers available for inclusion in the cluster, at any time, so as to eliminate the overhead induced by bootstrapping a new application server.
The activity (ii) is carried out in order to optimize the resource utilization. Specifically, this entails that, if the load for a hosted application decreases significantly (below predefined warning thresholds), the resources allocated to that application can be deallocated dynamically.
Finally, as to activity (iii), the load balancing factor associated with each clustered node is used by the LBS in order to distribute the load among the clustered nodes when an adaptive load balancing policy is deployed.
4 Monitoring Service
The architecture of the MS is depicted in Figure 6. As shown in Figure 6, the MS consists of two principal components, namely the Membership Interceptor and the SLA Monitoring, described in the following separate Subsections.
4.1 Membership Interceptor
The Membership Interceptor is an MS component used to retrieve information about the cluster membership. It uses underlying group communication primitives in order to detect new members that join the cluster, and dead members that leave the cluster. In addition, it saves these data in a file whenever the membership changes, for logging purposes; to this end, another component of the MS, termed Measurement Service, is invoked by the Membership Interceptor as depicted in Figure 6.
The pseudo code in Figure 7 illustrates the above activities performed by this monitoring component. As shown in Figure 7, at QaAS start up time, the Membership Interceptor is started up by running the monitoring thread. This thread is in charge of monitoring the cluster membership and saving the obtained data in the stable storage. Note that these activities are repeated by the thread at a predefined time interval termed monitoring frequency, which has been fixed during the experimental evaluation of the QoS-aware middleware services.
Figure 6. The MS architecture
[Figure 6 labels: Monitoring Service; SLA Monitoring (Request Interceptor, Cluster Performance Monitor, Evaluation and Violation Detection Service); Membership Interceptor; Measurement Service; Client Requests; interactions: <<send request>>, <<evaluate SLA>>, <<save SLA monitoring data>>, <<save membership data>>, access]
At run time, the Membership Interceptor operates as a listener in order to detect changes to the cluster membership. To this end, as discussed earlier, the Membership Interceptor makes available a primitive, named membershipChanged, for use by the underlying group communication. This primitive allows the Membership Interceptor to detect changes to the cluster membership, as it is invoked as soon as those changes occur in the cluster.
It is worth observing that, in case of a member crash, the Membership Interceptor checks whether or not the dead member is the QaAS Leader of the cluster. If the dead member is not the Leader, the Membership Interceptor deployed in the Leader node invokes the SLA Monitoring (the SLAEvaluation method of Figure 8), in order to check the adherence to the hosting SLA after the membership change.
In contrast, if the dead member is the Leader of the cluster, the recovery protocol, discussed in the previous Section, is carried out so as to elect the new Leader of the cluster, responsible at that point for monitoring the adherence to the hosting SLA of the new cluster configuration.
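The handling of a membership change described above can be summarized by the following Python sketch. It is illustrative only: the function signature loosely mirrors the membershipChanged primitive, but the returned action strings, the explicit leader and local_node parameters, and the assumption that the Leader is the node that transfers the state to new members are choices made here for the sake of the example.

```python
def membership_changed(dead_members, new_members, all_members, leader, local_node):
    """Return (current leader, actions taken by the local node)."""
    actions = []
    if leader in dead_members:
        # Recovery protocol: every surviving node elects the smallest IP string.
        leader = min(all_members)
        actions.append("elect new leader " + leader)
    elif dead_members and local_node == leader:
        # A slave crashed: the Leader checks the SLA on the new membership.
        actions.append("evaluate SLA")
    if new_members and local_node == leader:
        # Assumption of this sketch: the Leader sets the state on new members.
        actions.append("set state on " + ", ".join(sorted(new_members)))
    return leader, actions


# The Leader 10.0.0.1 crashes; the two survivors agree on 10.0.0.2.
new_leader, taken = membership_changed(
    {"10.0.0.1"}, set(), ["10.0.0.2", "10.0.0.3"],
    leader="10.0.0.1", local_node="10.0.0.3")
```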
4.2 SLA Monitoring
The SLA Monitoring component of the Monitoring Service is responsible for monitoring the cluster performance. It consists of three additional components named Client Request Interceptor, Cluster Performance Monitor, and Evaluation and Violation Detection Service. These components are described below, in isolation.
The Client Request Interceptor. This interceptor is used to intercept the client requests in order to evaluate the cluster performance for specific application requests; the cluster performance consists of the set of throughput and response time values that allow one to detect whether or not the nodes of the cluster are overloaded. This interceptor is also used in the Load Balancing architecture described in Section 5.3, where it is named Request Interceptor.
// QaAS START UP TIME
start-MS();

// SLA DEPLOYMENT TIME AND RUN-TIME
run() {
    while (true) {
        // Get cluster membership
        getMembership();
        // Save logs in stable storage
        MeasurementService.saveMembership();
        sleep(monitoringFrequency);
    }
}

membershipChanged(deadMembers, newMembers, allMembers) {
    // Record deadMembers, newMembers and allMembers
    MeasurementService.saveMembership();
    getDeadMembers();
    getNewMembers();
}

getDeadMembers() {
    if (Leader is dead) {
        // Elect new leader
    } else {
        // Leader calls SLA evaluation with EVDS
        EVDS.SLAEvaluation(allMembers);
    }
}

getNewMembers() {
    // set state to the new member
}
Figure 7. Membership Interceptor skeleton code
The Cluster Performance Monitor. This component cooperates with the Evaluation and Violation Detection Service in order to assess the actual response time and the throughput delivered by the application server cluster, and to detect possible SLA violations. Specifically, the Cluster Performance Monitor (i) obtains the data required to assess response time and throughput from the Client Request Interceptor (e.g., the number of received requests and the number of served requests, for specific application operations), (ii) computes the average request response time and throughput, and (iii) sends the results of its computations to the Evaluation and Violation Detection Service. This latter service is invoked in order to check the adherence of those values to the SLA requirements.
The Evaluation and Violation Detection Service. This service is responsible for checking whether the QoS delivered by the clustered application servers meets the hosting SLA requirements. Specifically, this component detects, at run time, variations in the operational conditions of the clustered nodes, which may affect the QoS delivered by the cluster, and triggers the cluster reconfiguration, if necessary.
The skeleton code of the above components is depicted in Figure 8. As shown in Figure 8, the SLA Monitoring records and computes, through the Cluster Performance Monitor, the cluster performance each time an incoming client request is intercepted during the monitoring frequency interval. When the monitoring frequency interval expires, the Cluster Performance Monitor invokes the Evaluation and Violation Detection Service in order to check the adherence to the hosting SLA of both the response time and the throughput exhibited by the cluster.
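The bookkeeping carried out by the Cluster Performance Monitor over one monitoring interval can be sketched as follows. This is a minimal Python illustration, not the thesis code: the class and method names, the per-operation dictionaries, and the definition of throughput as served requests divided by the interval length are assumptions made here.

```python
class ClusterPerformanceMonitor:
    """Accumulates per-operation samples over one monitoring interval."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.arrived = {}     # operation -> number of received requests
        self.latencies = {}   # operation -> response times of served requests (s)

    def record_arrival(self, operation):
        self.arrived[operation] = self.arrived.get(operation, 0) + 1

    def record_served(self, operation, response_time_s):
        self.latencies.setdefault(operation, []).append(response_time_s)

    def summarize(self, operation):
        """Average response time (s) and throughput (served requests per s)."""
        served = self.latencies.get(operation, [])
        avg_rt = sum(served) / len(served) if served else None
        return avg_rt, len(served) / self.interval_s


# Four "login" requests arrive and are served within a 10 second interval.
monitor = ClusterPerformanceMonitor(interval_s=10.0)
for rt in (1.0, 2.0, 3.0, 2.0):
    monitor.record_arrival("login")
    monitor.record_served("login", rt)
avg_rt, throughput = monitor.summarize("login")  # 2.0 s and 0.4 requests/s
```

The two values returned by summarize are the ones that would be handed to the Evaluation and Violation Detection Service when the interval expires.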
Hence, in either case, if the computed value does not respect the related SLA bound (in Figure 8, a response time higher than its bound or a throughput lower than its bound), the SLA is violated and an exception is raised at the application level. In contrast, if both the computed response time and throughput respect the values specified in the SLA, the Evaluation and Violation Detection Service checks whether or not an adaptation is to be applied, which either adds new QaAS instances to the cluster or releases nodes that are no longer necessary.
To this end, predefined thresholds are used in order to compute so-called warning points. There exist two types of warning points, namely a high load warning point and a low load warning point. The former is used in order to verify whether or not the cluster is overloaded; in this case, in order to avoid SLA violations, the CS is invoked so as to add new instances to the cluster. In contrast, the latter indicates that the cluster is positively responding to the injected client load and clustered nodes can be released as no longer necessary.
Note that in the design of the SLA Monitoring a number of warning points (e.g., a throughput warning point, a response time warning point) are used, as illustrated in Figure 8. However, it is also worth observing that the Evaluation and Violation Detection Service invokes a CS reconfiguration that adds new instances to the cluster if there exists at least one computed value (either response time or throughput) that is equal to or higher than the related high load warning point. In contrast, the Evaluation and Violation Detection Service invokes a CS reconfiguration that releases nodes from the cluster if all the values computed during the monitoring frequency interval are equal to or lower than the related low load warning points.
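The combination rule for the warning points can be expressed compactly, as in the Python sketch below. The names are hypothetical, and the warning points are taken as given inputs because the way they are derived from the predefined thresholds is not detailed here; the sample numbers are invented for illustration.

```python
ADD_NODES, RELEASE_NODES, NO_ACTION = "add", "release", "none"


def response_time_violated(computed_rt, sla_rt):
    """Figure 8 treats a response time above the SLA bound as a violation."""
    return computed_rt > sla_rt


def reconfiguration(samples):
    """Decide the reconfiguration for one monitoring interval.

    `samples` is a list of (computed value, high load warning point,
    low load warning point) tuples, one per monitored quantity.
    """
    # At least one value at or above its high load warning point: add nodes.
    if any(value >= high for value, high, low in samples):
        return ADD_NODES
    # All values at or below their low load warning points: release nodes.
    if samples and all(value <= low for value, high, low in samples):
        return RELEASE_NODES
    return NO_ACTION


rt = (2.5, 2.4, 0.9)      # response time in seconds, below an SLA bound of 3.0
th = (30.0, 40.0, 10.0)   # throughput in requests per second
decision = reconfiguration([rt, th])  # "add": the response time reached its high point
```

Note the asymmetry of the rule: a single quantity is enough to trigger the addition of nodes, whereas nodes are released only when every quantity signals low load.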
5 Load Balancing Service
In the design of the Load Balancing Service (LBS) for clustered QoS-aware application servers, two load balancing approaches have been evaluated, namely a “request-based” (or “per-request”) load balancing approach, and a “session-based” (or “per-session”) load balancing approach.
5.1 Per-request Load Balancing
In “request-based” load balancing, each individual client request is processed by the LBS, and dispatched to an application server, according to some specific load distribution policy (see below). Thus, two consecutive requests from the same client may be dispatched to two different servers for processing.
// RUN TIME (Cluster Performance Monitoring)
// Get client requests via client request interceptor
getClientRequests();
/* Invoke Cluster Performance Monitor to compute RT and Throughput */
ClusterPerformanceMonitor.computeRespTime();
ClusterPerformanceMonitor.computeThroughput();

class ClusterPerformanceMonitor {
    while (true) {
        // Record number of arrived requests
        // Record number of served requests
        // Compute response time
        // Compute throughput
        /* Evaluate response time via EvaluationViolationDetection Service (EVDS) */
        EVDS.SLAEvaluationRT(computedRT);
        // Evaluate throughput via EVDS
        EVDS.SLAEvaluationTh(computedTh);
        sleep(monitoringFrequency);
    }
}

class EvaluationViolationDetectionService {
    SLAEvaluation(membership) {
        if (membership.size < membershipRequiredSLA) {
            // invoke CS addNewInstancesReconfiguration
        } else {
            // No need to add replica; invoke CS rearrangeAgreedQoS
        }
    }

    SLAEvaluationRT(computedRT) {
        getRTThresholds();
        if (computedRT > SLART) {
            // SLA VIOLATED; return exception
        } else {
            if (computedRT >= HLWPRT)
                /* RespTime warning point reached; invoke CS that adds new nodes */
            else if (computedRT <= LLWPRT)
                /* RespTime low load warning point reached; invoke CS that releases nodes */
            else
                // No need to reconfigure
        }
    }

    SLAEvaluationTh(computedTh) {
        getThThresholds();
        if (computedTh < SLATh) {
            // SLA VIOLATED
        } else {
            if (computedTh >= HLWPTH)
                /* Throughput warning point reached; invoke CS that adds new nodes */
            else if (computedTh <= LLWPTH)
                /* Throughput low load warning point reached; invoke CS that releases nodes */
            else
                // No need to reconfigure
        }
    }
}
Figure 8. SLA MS skeleton code
In principle, the “request-based” load balancing is used when state replication among the clustered servers is enabled. This replication mechanism allows one to construct a hosting environment that can support highly available applications. Hence, provided that the application state is maintained mutually consistent across the clustered application servers, the crash of a server can be masked by routing the client requests to a different server in the same cluster, transparently to the client program. In practice, the cost of maintaining consistency among clustered servers can be very high, in terms of both performance and memory overheads caused by the consistency algorithms (note that, in this case, state replication among the clustered servers is required, as any of these servers can be used, without distinction, to serve incoming client requests at any time) [RT04].
5.2 Per-session Load Balancing
In contrast, using “session-based” load balancing, a specific client session (i.e., a sequence of client requests) is created in one of the clustered application servers at the time a client program requires access to the application hosted by that server; every future request from that client, within a specified timeframe, will be processed by that server (this binding of a client session to a server is termed session affinity). Thus, the LBS intercepts each client request and, depending on the session affinity, dispatches it to the appropriate server.
“Session-based” load balancing can be implemented so as to impose no consistency requirements among the clustered servers, as all the client requests belonging to the same session are served by the same server. However, if a server crashes while it is serving a client session, that session cannot be recovered at the application server level. Thus, either the client program implements its own recovery mechanisms, or this program may have to start its session again (at the risk of causing multiple executions of earlier requests). It is worth mentioning that session replication across multiple replica servers may well be implemented at the application server level, and used to provide clients with transparent fault tolerant support to server crashes. However, the cost of maintaining mutually consistent client session replicas across multiple servers may make this solution as impractical as that mentioned above, in the case of request-based load balancing.
5.3 The Load Balancing Architecture
In view of the above observations, the architecture of the LBS includes support for both request-based load balancing and session-based load balancing; this architecture is depicted in Figure 9, and summarized in the following.
The designed LBS can be thought of as a reverse proxy server (the LBS deployed in the Leader node of the cluster) which interfaces the clients of an application to the clustered nodes hosting that application. This LBS can accommodate a number of load balancing policies; typically, this Service must be configured at cluster configuration time so as to use, at run time, one specific load balancing policy out of those it incorporates.
The architecture of the LBS consists of the following four principal components: namely, the Request Interceptor, the Load Balancing Scheduler (LBScheduler), the HTTP Request Manager, and the Sticky Session Manager, illustrated in Figure 9. Note that the LBScheduler component in Figure 9 embodies three load balancing policies, namely the Round Robin, the Random and the Workload. Only one of these policies can be selected at cluster configuration time; that policy will be used at run time for load balancing purposes. It may be worth mentioning that the load balancing policies shown in Figure 9 are those actually incorporated in the first implementation prototype of the LBS; however, the Service is structured so that the set of policies embodied by the LBScheduler can be extended easily with additional ones.
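The pluggable structure of the LBScheduler can be illustrated with the Python sketch below. It is not the prototype code: the class names are hypothetical, and the Workload policy is rendered here as a weighted random choice driven by the load balancing factors of the resource plan, which is an assumption of this sketch (the adaptive policy itself is described elsewhere in the thesis).

```python
import itertools
import random


class RoundRobinPolicy:
    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def select(self):
        return next(self._cycle)


class RandomPolicy:
    def __init__(self, nodes, rng=None):
        self._nodes = list(nodes)
        self._rng = rng or random.Random()

    def select(self):
        return self._rng.choice(self._nodes)


class WorkloadPolicy:
    """Assumed rendering: weighted choice using the load balancing factors."""

    def __init__(self, resource_plan, rng=None):
        self._nodes = list(resource_plan)              # node -> factor (percentage)
        self._weights = [resource_plan[n] for n in self._nodes]
        self._rng = rng or random.Random()

    def select(self):
        return self._rng.choices(self._nodes, weights=self._weights, k=1)[0]


class LBScheduler:
    """Holds the single policy chosen at cluster configuration time."""

    def __init__(self, policy):
        self._policy = policy

    def target_node(self):
        return self._policy.select()


scheduler = LBScheduler(RoundRobinPolicy(["node-a", "node-b", "node-c"]))
targets = [scheduler.target_node() for _ in range(4)]
# targets == ["node-a", "node-b", "node-c", "node-a"]
```

Any object exposing a select method can be passed to the scheduler, which is one simple way to obtain the extensibility mentioned above.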
As mentioned earlier, the LBS can balance the client request load either on a per-request basis or on a per-session basis. Yet again, the decision as to which “granularity” (i.e., either per-request, or per-session) of load balancing to use is to be taken at cluster configuration time.
If per-request load balancing has been enabled, the Request Interceptor intercepts each client request, and invokes the HTTP Request Manager. This Manager first queries the LBScheduler in order to obtain the address of a target node that can serve that request. The LBScheduler returns the requested address, based on the specific load balancing policy it has been configured to use. Once the HTTP Request Manager obtains the target node address, it manipulates the client request appropriately in order to forward it to this target node, and to enable that node to return the reply to the Load Balancing Service itself. When the Load Balancing Service receives a response to a client request, it forwards it to that client. The next request from the same client will be dealt with in the same way; hence, there is no guarantee that the same target node will be selected for serving it.
Figure 9. Load Balancing Service Architecture
In contrast, if the LBS has been configured to use per-session load balancing, the Sticky Session Manager cooperates with the LBScheduler to identify a target node that will serve the client requests for the entire session. To this end, a unique cookie is created and included in the HTTP requests. This cookie identifies the selected node; it is generated at the time a client session starts, and maintained until that session ends (see below). The use of cookies in HTTP requests is a common mechanism for managing HTTP state information [RFC97]. In fact, by means of cookies, it is possible to (i) carry state information (a sequence of client requests, for example) between the client and a clustered node and (ii) easily identify the client who originated the request. Hence, owing to these advantages, the above cookie-based solution has been favored.
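The cookie-based session affinity can be sketched as follows, using the cookie support of the Python standard library. The cookie name QAAS_NODE, the class layout, and the use of a simple round-robin cycle in place of the LBScheduler are assumptions made for illustration; the sketch also assumes that a cookie naming a node that is no longer in the cluster leads to a fresh node selection.

```python
import itertools
from http.cookies import SimpleCookie

COOKIE_NAME = "QAAS_NODE"  # hypothetical cookie name


class StickySessionManager:
    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = itertools.cycle(self._nodes)  # stand-in for the LBScheduler

    def route(self, cookie_header):
        """Return (target node, Set-Cookie value or None) for one request."""
        cookie = SimpleCookie(cookie_header or "")
        morsel = cookie.get(COOKIE_NAME)
        if morsel is not None and morsel.value in self._nodes:
            return morsel.value, None        # existing session: same node
        node = next(self._next)              # new session: select a node
        out = SimpleCookie()
        out[COOKIE_NAME] = node
        return node, out[COOKIE_NAME].OutputString()


manager = StickySessionManager(["node-a", "node-b"])
first_node, set_cookie = manager.route(None)   # no cookie yet: session starts
second_node, _ = manager.route(set_cookie)     # cookie present: same node
```

A client that does not return the cookie is treated as starting a new session on every request, which is why the browser must accept cookies when per-session load balancing is used.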
A few observations concerning the design of the LBS are in order. Firstly, it is worth mentioning that an LBS architecture based on a reverse proxy may show some limitations in terms of, for example, scalability and robustness.
However, as to robustness, in order to overcome the single point of failure shortcoming, the LBS has been replicated in each slave node (as described in Section 1); this allows a replica to take over in case of failure of the LBS Leader (i.e., in case of failure of the reverse proxy). Nevertheless, it is worth observing that this mechanism is effective because the implementation assumes the use of dynamic IP address translation (or IP takeover), which preserves transparency with respect to the clients accessing the hosted application. Hence, if the LBS crashes, a new node in the cluster is elected as LBS Leader, by following the protocol described earlier, and the IP address of the crashed node is assigned to the new one. Then, a proper hardware device, such as the gateway illustrated in Figure 2, is used in order to redirect the incoming client requests toward the new LBS Leader. This mechanism totally masks the failures from the end-users, for whom the application is still available.
In addition, such a reverse proxy architecture has the merit of enabling the development of a Load Balancing Service which is fully transparent to both the client and the clustered QaASs. The client need only enable its browser to accept cookies, in case per-session load balancing is used. Hence, on balance, the reverse proxy based architecture introduced above has been favored.
Secondly, it is worth observing that the LBS has been designed so as to be completely independent of the policy used to select the target node. In fact, the Service architecture developed allows developers to plug into it the load balancing policies of their choice.
Finally, note that the LBScheduler itself does not monitor the membership of the cluster. Rather, the LBScheduler receives the list of available clustered nodes from the CS (i.e., the LBScheduler receives from the CS the resource plan object, as depicted in Figure 3).
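The last two observations, policy independence and an externally supplied node list, can be illustrated with a small sketch. All names here (`LoadBalancingPolicy`, `RoundRobinPolicy`, `RandomPolicy`, `update_nodes`, `next_target`) are hypothetical, and the resource plan is reduced to a plain list of node identifiers; the actual interfaces of the LBS are not given in the text.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Optional


class LoadBalancingPolicy(ABC):
    """Hypothetical plug-in point: a policy only has to pick one node."""

    @abstractmethod
    def select(self, nodes: List[str]) -> str:
        ...


class RoundRobinPolicy(LoadBalancingPolicy):
    def __init__(self) -> None:
        self._next = 0

    def select(self, nodes: List[str]) -> str:
        node = nodes[self._next % len(nodes)]
        self._next += 1
        return node


class RandomPolicy(LoadBalancingPolicy):
    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()

    def select(self, nodes: List[str]) -> str:
        return self._rng.choice(nodes)


class LBScheduler:
    """Delegates node selection to whichever policy has been plugged in."""

    def __init__(self, policy: LoadBalancingPolicy) -> None:
        self.policy = policy
        self.nodes: List[str] = []

    def update_nodes(self, nodes: List[str]) -> None:
        # The scheduler does not monitor membership; it is handed the node list.
        self.nodes = list(nodes)

    def next_target(self) -> str:
        if not self.nodes:
            raise RuntimeError("no clustered nodes available")
        return self.policy.select(self.nodes)
```

Swapping the policy object changes the selection behavior without touching the scheduler, and the scheduler learns about membership changes only through `update_nodes`.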
5.4 The Load Balancing Policies: The WorkLoad policy
Typically, open-source J2EE platforms use software load balancing services that embody either “Random” or “Round Robin” load balancing policies for balancing the incoming client requests within a cluster of application server instances. However, these policies are all non-adaptive, i.e., they are unable to adapt to variations of the QoS delivered by the clustered application servers. As one of the focuses of this thesis was to evaluate adaptive load balancing policies, a WorkLoad policy has been developed for this purpose.
This policy enables the LBScheduler to select the most lightly loaded node among the clustered nodes, in order to serve either a request or a client session. To this end, the LBScheduler uses the above-mentioned load balancing factor and selects a clustered QaAS with a probability that is proportional to the load balancing factor value. For each QaAS, this value is computed by considering three QoS parameters: the server response time, i.e., the time elapsed between the delivery of a client request to an application server and the transmission of the reply from that server; the server throughput, i.e., the number of client requests served by the application server within a specific time interval; and the server available memory, i.e., the size of the free memory in (the JVM of) each application server.
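Selecting a node with a probability proportional to its load balancing factor can be sketched as a weighted random draw. The function name `pick_node` and the dictionary layout are assumptions made for illustration; the draw itself uses the standard library call `random.choices`.

```python
import random
from typing import Dict, Optional


def pick_node(lbf: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick a node with probability proportional to its load balancing factor."""
    rng = rng or random.Random()
    nodes = list(lbf)
    weights = [lbf[n] for n in nodes]
    if not nodes or sum(weights) <= 0:
        raise ValueError("need at least one node with a positive factor")
    # random.choices performs a weighted draw with replacement; k=1 yields one node.
    return rng.choices(nodes, weights=weights, k=1)[0]
```

With this scheme a node whose factor is three times that of another is expected to receive roughly three times as many requests (or sessions) over a long run, while less favored nodes still receive some load.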
Therefore, the load balancing factor of a node i is computed by taking into account three principal factors (i.e., the so-called RespTimeFactor, ThFactor and MemFactor below) that represent the share of the total cluster response time, throughput and JVM memory, respectively, that node i is capable of offering within the cluster.
Hence, let N be the number of clustered QaASs. At cluster configuration time, for each node i, the percentage of the total computational load that i is supporting, in terms of i's response time, throughput, and available memory, is calculated as follows.
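$$\mathit{RespTimeFactor}_i = \frac{\mathit{RespTime}_i^{-1} \cdot 100}{\mathit{TotRespTime}} \quad \text{where} \quad \mathit{TotRespTime} = \sum_{j=1}^{N} \mathit{RespTime}_j \tag{2}$$
$$\mathit{ThFactor}_i = \frac{\mathit{Th}_i \cdot 100}{\mathit{TotalTh}} \quad \text{where} \quad \mathit{TotalTh} = \sum_{j=1}^{N} \mathit{Th}_j \tag{3}$$
$$\mathit{MemFactor}_i = \frac{\mathit{Mem}_i \cdot 100}{\mathit{TotalMem}} \quad \text{where} \quad \mathit{TotalMem} = \sum_{j=1}^{N} \mathit{Mem}_j \tag{4}$$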
Thus, in essence, the higher the response time of a node i (i.e., RespTime_i in formula (2)), the lower that node's percentage of the total response time (i.e., the above RespTimeFactor_i; note that this is captured by the use of the inverse of the node response time).
In contrast, the larger the throughput (or the available memory) of a node i, the larger the percentage of the total cluster throughput (i.e., the above ThFactor_i) that node delivers (or the percentage of the total cluster memory, i.e., the above MemFactor_i, that node offers).
The load balancing factor (i.e., lbf) can then be obtained as the arithmetic mean of the three factors above, assuming that each QoS parameter has the same weight in the construction of this factor.
$$\mathit{lbf}_i = \frac{\mathit{RespTimeFactor}_i + \mathit{ThFactor}_i + \mathit{MemFactor}_i}{3} \tag{5}$$
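The computation in formulas (2) to (5) can be sketched in a few lines. The names `NodeQoS` and `load_balancing_factors` are hypothetical. One point is an assumption of this sketch rather than something the text states: formula (2) as printed divides by the plain sum of the response times, whereas the sketch normalizes the inverse response times by their own sum, so that all three factors lie on the same 0 to 100 scale. The sketch also assumes strictly positive measurements.

```python
from typing import Dict, NamedTuple


class NodeQoS(NamedTuple):
    resp_time: float   # server response time (any unit, must be > 0)
    throughput: float  # requests served in the observation interval
    free_mem: float    # free JVM memory (any unit)


def load_balancing_factors(qos: Dict[str, NodeQoS]) -> Dict[str, float]:
    """Return the load balancing factor (a percentage) of every node."""
    # Assumption of this sketch: the response-time total is the sum of the
    # inverse response times, which keeps RespTimeFactor a true percentage.
    inv = {n: 1.0 / q.resp_time for n, q in qos.items()}
    tot_inv = sum(inv.values())
    tot_th = sum(q.throughput for q in qos.values())
    tot_mem = sum(q.free_mem for q in qos.values())

    factors = {}
    for n, q in qos.items():
        resp_factor = inv[n] * 100 / tot_inv      # cf. formula (2)
        th_factor = q.throughput * 100 / tot_th   # formula (3)
        mem_factor = q.free_mem * 100 / tot_mem   # formula (4)
        # Formula (5): equal-weight mean of the three factors.
        factors[n] = (resp_factor + th_factor + mem_factor) / 3
    return factors
```

As a worked case, take two nodes where node a has response time 100, throughput 30 and free memory 300, and node b has response time 300, throughput 10 and free memory 100. Each of the three factors is then 75 for a and 25 for b, so the load balancing factors are 75 and 25 respectively.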
In conclusion, it is worth observing that the load balancing factor is initialized to a default value in each QaAS, and dynamically adjusted at cluster configuration and reconfiguration time. The dynamic variations of the load balancing factor reflect the changes in the operational conditions of the QoS-aware clustering. Hence, the WorkLoad policy enables adaptation of the hosting environment as variations of the QoS delivered by this environment occur, causing cluster reconfiguration.
In essence, the use of the load balancing factor in the load balancing process can help both to optimize the user-perceived response time and to increase the number of requests served by the cluster per second (i.e., the throughput of the QoS-aware clustering environment).
6 Application Deployment
Issues of application deployment and application component replication in a clustered hosting environment can play a relevant role in the design of the CS described earlier. This section introduces these two issues, and examines three alternative approaches to the design of the CS.
As to application deployment, there exists a Homogeneous Application Deployment (HOAD), which entails that each clustered application server runs identical services and application components. In contrast, there exists a Heterogeneous Application Deployment (HEAD), which entails that each application server in the cluster may run a different set of services and application components.
It is worth observing that, in practice, the use of this latter form of deployment proves particularly expensive, as it requires a number of issues to be further investigated and carefully dealt with, such as state propagation, cluster-wide configuration management and so on. However, I believe that heterogeneous deployment, in contrast with the homogeneous one, can be particularly attractive when applied to component-based applications, as it enables the distribution of the application components across clusters of machines in a controlled manner.
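The difference between the two deployment forms can be illustrated with a small sketch that models a deployment as a mapping from node names to the set of components each node runs. The representation and the function names `is_homogeneous` and `nodes_hosting` are assumptions made purely for illustration.

```python
from typing import Dict, Set


def is_homogeneous(deployment: Dict[str, Set[str]]) -> bool:
    """True when every node runs the same set of components (HOAD)."""
    distinct = {frozenset(components) for components in deployment.values()}
    return len(distinct) <= 1


def nodes_hosting(deployment: Dict[str, Set[str]], component: str) -> Set[str]:
    """Nodes that can serve a request for the given component."""
    return {node for node, components in deployment.items()
            if component in components}


# Hypothetical example deployments.
hoad = {"n1": {"web", "ejb"}, "n2": {"web", "ejb"}}
head = {"n1": {"web"}, "n2": {"ejb"}, "n3": {"web", "ejb"}}
```

Under HOAD any node can serve any request, so `nodes_hosting` always returns the whole cluster. Under HEAD the set of candidate nodes depends on the component, which hints at why cluster-wide configuration management becomes more demanding.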
Typically, enterprise applications adopt a distributed multi-tier paradigm, in which the application consists of separate components; namely, client, web and business components. General