High Availability with Clusters of Web Services


Julio Fernández Vilas¹, José Pazos Arias², Ana Fernández Vilas²

¹ CPD - Caixa Galicia. 15190 A Coruña, Spain

² Universidad de Vigo. 36200 Vigo, Spain

{Jose.Pazos, Ana.Vilas}@det.uvigo.es

Abstract. The Internet is open 24 hours a day, 7 days a week, so hardware, software, and communications must always be online. Additionally, the number of users and the workload they generate are unpredictable. Standard web services technology, used on its own, offers no solution to these problems. This paper presents a technique that allows web services to be deployed with high availability features using clustering. The technique is based on virtualizing the real web services that serve client requests: new virtual web services are created, and these are the ones that clients invoke. At the back end, the implementation web services (the real ones) are invoked inside a cluster.

1 Introduction

Typically, availability problems (also scalability, continuity...) have mainly two solutions: Fault Tolerance and Clustering. Fault tolerance [1, 2] means that all components inside a single computer are redundant. In clustering [3, 4], complete computers are redundant. When clustering is used, all cluster components (nodes) run at the same time, so, in addition to availability, other properties like scalability or load balancing are obtained.

How can we deploy web services with high availability? A cluster in which nodes have some fault-tolerant features should be the best hardware infrastructure. The software infrastructure should provide solutions to three main problems: identify cluster nodes in error; provide mechanisms to perform cluster and node maintenance; and modify cluster structure as needed (adding and removing nodes). To solve these problems we propose a virtualization model.

After introducing the problems, the remainder of the paper is structured as follows: section 2 introduces virtualization and its components, while its implementation is discussed in section 3. The details about cluster implementation are explained in section 4, and section 5 shows the solution to the previously identified problems.

2 Virtualization

The technique we propose, virtualization, is based on grouping one or more web services inside a unique wrapper, which is then published as a standard web service, so clients use the new virtual web service as a standard web service; that is, there is no difference between real and virtual services from the client’s point of view. With virtualization, some additional logic (error management, provider selection...) can be moved out of the client applications, which reduces their complexity, since developers need only worry about business logic.

Wrapping a group of web services has nothing to do with the well-known web service wrappers, which are software components used to isolate the communication layer (SOAP, HTTP…) of the web services from the web services implementation. The wrapper term is used here to refer to a virtual view of a set of web services, that is, clients have a unique view of that group. The virtual view is in fact a standard WSDL [5] document.

Virtualizing a web service requires a change in the standard web services architecture, since a virtual web service must reside in an intermediate element different from the client and the provider.

2.1 Architectural Change

Let us think about the way web services are created and used. First, a provider entity builds a piece of software (a component) that implements the business logic. Then, using a development tool, the WSDL document and the wrapper classes for the component are created. Finally, the provider deploys the web service to an application server and publishes the WSDL in order to make it accessible to clients. On the client side, a client entity finds the WSDL document. Then, using a development tool, the client creates the proxy classes that will be invoked by the client’s business logic. Because proxies and wrappers are developed this way, there is a strong dependence between clients and servers. That dependence has its origin in the SOAP messages.

Let us suppose that the provider modifies the parameters sent/received at the provider component. After modifying the business logic, the provider should rebuild the wrapper classes and the WSDL document. A simple change in the name or the type of a parameter will cause a change in the way SOAP messages are produced. The invocations will not run properly if client proxies and client applications remain unchanged, since SOAP messages sent from client to server and back do not follow the same schema. From this point of view, clients and servers are strongly coupled (like a method or a function inside a program).

This strong coupling is caused by the nature of invocations. In the standard web services architecture, clients use “direct invocations” to invoke web services. Other problems can arise due to the use of the direct invocation model:

• Adding more web service providers to a client application implies rewriting client proxies and/or client applications.

• If a provider is unavailable, client entities do not have a way to specify alternative providers inside their client applications. Ad-hoc programming code must be used.

What we propose here is to change the architecture, moving the invocation model from direct to indirect. We also suggest the use of intermediate elements inside the standard architecture (as proposed in [6]) that can receive, process, and re-route SOAP messages.


2.2 VWS Components

Our proposed architecture for the use of Virtual Web Services (VWS) requires at least three components: a client, a service provider, and a VWS engine. The client is the one that needs a service and performs an invocation. The service provider is the one that publishes and offers a web service. In our VWS architecture, invocations are not performed directly from the client to the server (provider); rather, the client performs an invocation to the VWS engine, which acts as an intermediary, and the engine performs an invocation to the provider.

The VWS engine can be something as simple as adding some kind of decision capabilities to the client proxies (Fig. 1). Alternatively, it can be something as complex as a dedicated server (Fig. 2). This document explains how to deploy web services with high-availability features using a complex VWS engine.

Fig. 1. Simple VWS Engine.

Fig. 2. Complex VWS Engine.

The VWS engine is not a virtual server. It is a standard server, and it should be implemented in the same way as a server used to process standard web service invocations. The term “VWS engine” refers to the capability of a server to understand virtual services definitions. That is, it can receive, process, and respond to standard web service invocations. In order to process a request, the engine uses VWS descriptions to select and invoke the most suitable web service provider.

3 Implementing Virtualization

Virtualization will lead us to use a new kind of service: virtual web services (VWS). Any application that is able to use a standard web service can be bound to a virtual web service. Our virtualization technology provides an XML-based definition language, VWSDL (VWS Definition Language), which is used to write VWS documents. Clients do not use VWS documents, since these documents are just a definition of an implementation of a virtual service, and this definition is only useful to the VWS engine. Our proposed language has been defined as a stand-alone language, that is, not as a WSDL extension, since its intended use is completely different. Even though the main objective of WSDL and VWSDL is the same (to describe a web service), the way that services are described is completely different.

VWS documents are used to describe virtual web services, and they must contain, at least, a list of methods provided by the service (using the method elements, as shown in Fig. 3). In addition, for each method published inside a service, we must specify the input and output parameters, and their corresponding data types.

As shown in Fig. 3, each method element has its own name attribute and another attribute called type. The type attribute is used to specify the type of implementation that is being defined. For instance, the equivalent value states that, in order to execute a virtual method, a set of “equivalent” services is available to accomplish the execution. The content of a method element is also used to specify a list of web services and methods that will be invoked in order to complete the execution of a method. All those web services and their methods represent in fact the implementation of a virtual method, and they are specified (inside the method element) using as many invoke elements as needed (Fig. 3).

Usually, we want web services to be deployed to all nodes in a cluster, so we should add as many invoke elements as there are nodes in the cluster. It is important to note that deploying a web service to a node does not mean that it will be invoked. If a web service has been deployed to a given node, and we want it to be invoked, the provider (a cluster node) must be specified inside an invoke element (in a VWS document).

<method name="getPrice" type="equivalent"
        select="0.7*adjust(availability)+0.3*reverseAdjust(responseTime)">
  <input>
  </invoke>
  <...>
</method>

Fig. 3. Sample method declaration.

3.1 Parameter Handling

In WSDL/SOAP, parameters are nominal; that is, input parameters sent to methods are identified by the names of the XML elements that carry them. The same happens with return parameters. Let us suppose that we build a virtual method that receives a parameter called “P1” and returns a parameter called “RP”. This imposes a small restriction on the way we write the method and invoke elements, since the names of the parameters used in the virtual method must match the ones used in the real methods. If a match can be found, the invoke elements can be used without a problem. When such a match cannot be found, a map element should be used. The map element can contain a set of in and out elements that specify the correspondence between virtual parameters and real parameters. Inside each sub-element (in or out), we must write an origin parameter name and a destination one, so the engine knows how to map the parameters before and after each invocation. An example can be examined in Fig. 3.
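
The renaming performed before and after each invocation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine's implementation: the dictionary representation of the map element, the helper names (map_parameters, invoke_with_mapping), and the real-side parameter names (productId, amount) are hypothetical; only “P1” and “RP” come from the text.

```python
def map_parameters(values, mapping):
    """Rename parameters according to an origin -> destination mapping.

    Parameters with no entry in the mapping keep their name, which
    corresponds to the case where virtual and real names already match.
    """
    return {mapping.get(name, name): value for name, value in values.items()}


# Hypothetical content of a map element: "in" entries translate virtual
# input names to real ones, "out" entries translate real return names
# back to virtual ones.
IN_MAP = {"P1": "productId"}
OUT_MAP = {"amount": "RP"}


def invoke_with_mapping(real_method, virtual_inputs):
    real_inputs = map_parameters(virtual_inputs, IN_MAP)
    real_outputs = real_method(**real_inputs)
    return map_parameters(real_outputs, OUT_MAP)


def real_get_price(productId):
    # Stand-in for a real (implementation) web service method.
    return {"amount": 10 * productId}


result = invoke_with_mapping(real_get_price, {"P1": 4})
```

Here the client only ever sees the virtual names P1 and RP, so the real service could rename its parameters and only the two mapping tables would need to change.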

It is possible to find equivalent services that use similar parameters with different data types; a price is a typical example that can be handled using integer types or float types. In situations like this, a conversion process must be performed. This conversion is expressed via the type attribute. It is also possible to map a simple type to an element contained inside a complex one using XPath [7] expressions, that is, assign a simple value extracted from a DOM node [8], for example.

The VWS engine should check all type assignments described inside the VWS documents, and this type-checking process should be made only once per VWS document: when a document is first deployed to the engine.

The use of the map element brings much new functionality to the virtualization technique, since parameter mapping between virtual and real services lessens the coupling between client applications and service implementations. Some benefits we obtain are: new versions of the implementation services can be deployed behind the same interface; new interface versions can be published over the same web services; and services can be tested inside the cluster by deploying different versions of the real web services.

4 Cluster Implementation

Our virtualization model establishes the existence of two types of nodes inside a cluster. The first type is the principal node. This node will be in charge of receiving service invocation requests. We must place the VWS documents inside the principal node. The second type is the non-principal node or provider node, and it is in charge of executing requests received from principal nodes.

Using the method and invoke elements of the VWS documents we can establish a relation between real and virtual services. This way, we can create a cluster architecture like the one depicted in Fig. 2, where the cluster’s principal node (a VWS engine) will be the one in charge of receiving client requests and distributing the workload across cluster nodes (web service providers).

According to this structure, if a provider (a cluster node) fails, the VWS engine will redistribute pending invocation requests, and the operative nodes in the cluster should take charge of the unassigned workload. This way, the continuity of the whole cluster can be guaranteed. However, there are still some problems to solve. How can we accomplish such a load balancing system? How can we deal with a planned or unplanned node outage? How can we select the most suitable provider at each moment? The next sections provide answers to these questions.

4.1 Building a Web-Services-Based Cluster

Let us suppose that we have a web service (WS1), with a method (M1). If we want WS1 to be a highly available web service, we must deploy it to a cluster. The deployment process requires the web service to be deployed to each one of the non-principal nodes in the cluster. This way, more than one instance of the web service can be used, and these instances can be executed at different nodes. All of them are said to be equivalent.

To get our VWS engine up and running we must create a VWS document to define a virtual web service (VWS1) that contains, at least, one virtual method (VM1). (It is important to note that clients do not use the VWS documents to perform the binding; they use a standard WSDL document derived from the VWS document.) Inside the VWS document, we must specify how the virtual method execution should be accomplished. We need to include, at least, three XML elements inside the VWS document: a service element describing the service; a method element that describes the method we want to publish (including its input and output elements); and one or more invoke elements. These invoke elements are the ones in charge of describing how and where the method implementation must be carried out.

When a method execution request for the VM1 method arrives at the VWS engine, the engine must select the most suitable provider in order to complete the request. Once a provider has been selected, a real service (implementation service) will be invoked, sending it input parameters as needed. After service execution, the engine will receive the return parameters from the real service and send them back to the client application. For the whole cluster to run properly, the VWS engine has to decide which provider node is the most suitable to perform an execution. The engine should use some selection criteria in order to maximize cluster performance.

4.2 Node Selection

After the VWS engine receives a request, it must select a cluster node that can accomplish the invocation request. To do so, the engine examines the content of the method element included in the VWS document describing the virtual service. Among the providers detailed in the invoke elements (included in the method element), the VWS engine chooses the best one prior to each real service invocation. Which provider is the best can change over time.

Our virtualization model proposes the use of expressions to select the most appropriate provider prior to each invocation. Each method element inside a VWS document should include an expression. This expression must reflect what the priorities are when a provider has to be selected. Let us consider expression (1):

0.7*A + 0.3*R    (1)

where A is the availability and R is the response time, both representing historical data about a given web service. Prior to a web service invocation, the VWS engine must compute the value of the expression for every service specified in the invoke elements inside a method element, then compare all the resulting values and, finally, invoke the service with the highest score.
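
The compute, compare, and invoke-the-highest procedure can be illustrated with a short Python sketch. It assumes that A and R are already available per provider as values in [0, 1] for which higher is better; the provider names, the sample figures, and the function name select_provider are hypothetical and are not part of VWSDL.

```python
def select_provider(history, weights):
    """Return the provider with the highest weighted score.

    history maps a provider name to its variables (here A and R);
    weights maps a variable name to its coefficient in the expression.
    """
    def score(variables):
        return sum(weights[name] * variables[name] for name in weights)

    return max(history, key=lambda provider: score(history[provider]))


# Hypothetical historical data, already scaled to [0, 1] so that a
# higher value is better for both variables.
HISTORY = {
    "node1": {"A": 0.99, "R": 0.40},
    "node2": {"A": 0.95, "R": 0.90},
    "node3": {"A": 0.80, "R": 0.95},
}
WEIGHTS = {"A": 0.7, "R": 0.3}  # expression (1): 0.7*A + 0.3*R

best = select_provider(HISTORY, WEIGHTS)
```

With these sample values node2 is selected: its availability is slightly lower than that of node1, but its much better response-time score outweighs the difference.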

In order to unify scales, ranges and units for all variables used in an expression, the function adjust can be used. In addition, for variables whose minimum values represent a maximum score, the function reverseAdjust is available. An example (just a fragment of it) is shown in Fig. 3 (select attribute).
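
The paper does not define how adjust and reverseAdjust are computed. One plausible reading, shown here purely as an assumption, is a min-max normalization over the values of the candidate providers: adjust maps the largest value to 1, and reverse_adjust maps the smallest value to 1. The sample availability and response-time figures are hypothetical.

```python
def adjust(values):
    """Scale values to [0, 1]; the largest value maps to 1 (assumed min-max)."""
    low, high = min(values), max(values)
    if high == low:
        return [1.0] * len(values)  # all candidates are tied
    return [(v - low) / (high - low) for v in values]


def reverse_adjust(values):
    """Scale values to [0, 1]; the smallest value maps to 1 (assumed min-max)."""
    low, high = min(values), max(values)
    if high == low:
        return [1.0] * len(values)  # all candidates are tied
    return [(high - v) / (high - low) for v in values]


# Hypothetical measurements, one entry per provider.
availability = [0.999, 0.990, 0.950]   # fraction of successful invocations
response_ms = [120.0, 80.0, 200.0]     # mean response time in milliseconds

# The select expression of Fig. 3, applied provider by provider.
scores = [
    0.7 * a + 0.3 * r
    for a, r in zip(adjust(availability), reverse_adjust(response_ms))
]
best_index = scores.index(max(scores))
```

Under this reading, availability and response time become comparable even though one is a ratio and the other is measured in milliseconds, which is the purpose the text gives for the two functions.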

4.3 Complex Expressions

Two additional mechanisms allow writing complex expressions: complex variables and time-variables. A complex variable is a new variable created from a combination of simple ones. In expression (2), a variable G is defined to hold a value that represents a global score for a provider, using A, R, and another variable C that represents the current number of pending requests at the provider.

G = 0.1*reverseAdjust(C) + 0.3*reverseAdjust(R) + 0.6*A    (2)

On the other hand, time-variables allow writing expressions that involve present and past values of a variable. For example, expression (3) can be used to obtain a global score for a provider, where time is taken into account by using two global-score values (G_{i-1} and G_i), and a weighting factor (W) is used as a stabilization mechanism.

adjust((W*G_{i-1} + G_i) / (W + 1))    (3)
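
The stabilizing effect of W can be seen in a small worked example. The Python sketch below evaluates expression (2) and then the weighted combination inside expression (3); the outer adjust call is left out, and the inputs are assumed to be already scaled to [0, 1]. The function names and the sample values are hypothetical.

```python
def global_score(a, r_score, c_score):
    """Expression (2), with R and C already passed through reverseAdjust."""
    return 0.1 * c_score + 0.3 * r_score + 0.6 * a


def smoothed_score(previous, current, w):
    """Inner part of expression (3): (W*G_{i-1} + G_i) / (W + 1)."""
    return (w * previous + current) / (w + 1)


# A provider that was performing perfectly and then slows down sharply.
g_prev = global_score(a=1.0, r_score=1.0, c_score=1.0)
g_now = global_score(a=1.0, r_score=0.0, c_score=0.0)
g = smoothed_score(g_prev, g_now, w=3)
```

With W = 3 the previous score counts three times as much as the current one, so a single bad measurement lowers the score from 1.0 to about 0.9 instead of dropping it to 0.6. A larger W makes the ranking more stable and slower to react.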

4.4 External Variables

Our virtualization model does not impose any kind of constraint when writing expressions or using variables, and so we can use external variables managed by an external entity. Let us suppose that we have a virtual service (VS1) and three providers (P1, P2, and P3), and they are selected using expression (4). To force the use (or non-use) of a given provider, we can use an expression like (5).

0.4*reverseAdjust(R) + 0.6*adjust(A)    (4)

upDown*(0.4*reverseAdjust(R) + 0.6*adjust(A))    (5)

The upDown variable is an external variable, and so an external entity should provide as many values as there are providers detailed inside the virtual method definition. At first, the upDown variable should be created with value 1 for all the providers. If we want to stop using provider P2 as an implementer of VS1, we only need to set the value of upDown for P2 to zero.
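
The effect of upDown on selection can be sketched as follows. The Python fragment is illustrative only: the a_score and r_score fields stand for the already-adjusted values of A and R, and their sample values are hypothetical.

```python
def score(provider):
    """Expression (5): upDown * (0.4*reverseAdjust(R) + 0.6*adjust(A))."""
    base = 0.4 * provider["r_score"] + 0.6 * provider["a_score"]
    return provider["upDown"] * base


def best(providers):
    return max(providers, key=lambda name: score(providers[name]))


# Hypothetical state for the three providers of VS1.
providers = {
    "P1": {"a_score": 0.8, "r_score": 0.5, "upDown": 1},
    "P2": {"a_score": 1.0, "r_score": 1.0, "upDown": 1},
    "P3": {"a_score": 0.6, "r_score": 0.9, "upDown": 1},
}

before = best(providers)          # P2 has the highest score
providers["P2"]["upDown"] = 0     # the external entity disables P2
after = best(providers)           # the choice falls to another provider
```

Multiplying by zero only ranks a provider last. An engine would presumably also need to reject the selection when every candidate scores zero, a case this sketch does not handle.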

5 Cluster Management

5.1 Cluster Maintenance

To achieve a 100% available system, methods that allow performing maintenance on the cluster must be available. Any kind of maintenance requires stopping part of the system. If we want to stop a node without interrupting activity, we should use the upDown variable.

If we want to remove a node from the cluster, we must set the value of the upDown variable to zero, and the node will stop receiving new execution requests. That is, the node enters a draining state, and when there are no pending requests, the node can be removed from the cluster.
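
The draining procedure can be summarized as a two-step protocol: stop new assignments first, then remove the node once its pending work reaches zero. The Python sketch below models it; the Node and Cluster classes and their attributes are hypothetical bookkeeping structures, not part of the VWS engine described here.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up_down = 1   # 1: eligible for new requests, 0: draining
        self.pending = 0   # requests currently in progress


class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def eligible(self):
        """Nodes that may receive new execution requests."""
        return [n for n in self.nodes.values() if n.up_down == 1]

    def start_drain(self, name):
        self.nodes[name].up_down = 0

    def remove_if_drained(self, name):
        """Remove the node only when it is draining and has no pending work."""
        node = self.nodes[name]
        if node.up_down == 0 and node.pending == 0:
            del self.nodes[name]
            return True
        return False


cluster = Cluster([Node("node1"), Node("node2")])
cluster.nodes["node2"].pending = 2           # two requests still running
cluster.start_drain("node2")
removed_early = cluster.remove_if_drained("node2")
cluster.nodes["node2"].pending = 0           # the requests have completed
removed_late = cluster.remove_if_drained("node2")
```

The first removal attempt is refused because requests are still pending; the second succeeds, and at no point was a running request interrupted.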

5.2 Node Error Detection

During normal cluster operation, errors can appear that can cause two different effects: increments in response time, and fatal errors like a node outage. The VWS engine is the element in charge of dealing with those kinds of errors.

If the response time of a given node increases more than usual, the normal operation of the engine should cause that node to stop being used. This can be accomplished by adding a variable called R (response time) to all expressions. For example, using expression (4), providers with the lowest response time will always get a high score, while providers with a high response time will stop being selected for an invocation. If a node outage is detected, then the VWS engine should stop invoking services on that node. To deal with this situation the VWS engine must use the upDown variable, as in expression (5).

When can the provider be used again? A simple procedure for dealing with this situation would consist of using a polling technique (PING or repetitive TCP-open). A more appropriate method consists of having a mechanism that allows the provider to send a notification to the engine, in order to notify its new state. VWS engines must provide a web service including a specific method (let us name it udPort). Using udPort, providers can notify the engine of their current state. Moreover, udPort can also be used to stop a cluster node (by modifying the upDown variable value).
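
The notification mechanism can be pictured as a small piece of engine state updated from two directions: by the engine itself when an invocation fails, and by providers through udPort. The Python sketch below is a guess at the shape of such an operation; the paper names udPort but does not give its signature, so the parameters (provider, state), the state values "up" and "down", and the class name EngineState are assumptions.

```python
class EngineState:
    """Tracks the upDown value of each provider known to the engine."""

    def __init__(self, providers):
        self.up_down = {name: 1 for name in providers}

    def report_failure(self, provider):
        # Called by the engine when an invocation on the provider fails.
        self.up_down[provider] = 0

    def ud_port(self, provider, state):
        """Hypothetical udPort operation: a provider announces its state."""
        if provider not in self.up_down:
            raise KeyError(f"unknown provider: {provider}")
        if state not in ("up", "down"):
            raise ValueError(f"unknown state: {state}")
        self.up_down[provider] = 1 if state == "up" else 0

    def usable(self):
        return sorted(name for name, value in self.up_down.items() if value == 1)


engine = EngineState(["P1", "P2", "P3"])
engine.report_failure("P2")
after_outage = engine.usable()     # P2 is excluded
engine.ud_port("P2", "up")
after_recovery = engine.usable()   # P2 is eligible again
```

Compared with polling, this design lets a recovered provider rejoin the cluster as soon as it announces itself, without the engine having to probe nodes that are down.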

5.3 Scalability

Scalability problems are usually solved using two different types of solutions: vertical scalability, achieved by improving the hardware configuration of the cluster nodes, and horizontal scalability, where new nodes are added to the cluster. If we decide to use vertical scalability, we will find no problems when implementing virtualization, since web services are completely independent of the hardware infrastructure. If we decide to use horizontal scalability as a way to extend the whole cluster capacity, we must search for alternative cluster structures when building virtual services.

Our virtualization model sees the cluster as a tree, in which the root node is the VWS engine, and the provider nodes are leaves in that tree. The first way in which we can extend a cluster consists of adding leaves to the tree (Fig. 4).

Fig. 4. Horizontal Scalability.

Fig. 5. Hierarchical Scalability.

The virtualization model also proposes another type of scalability: hierarchical scalability. With VWS, we can publish virtual web services and use them as if they were “traditional” (standard) web services. A virtual service implementation can be done using another virtual service, and so we can build cluster structures that contain intermediate nodes. According to this, the root node of the cluster would be a VWS engine, leaf nodes would be provider nodes, and intermediate nodes should be implemented as a mixture of both a root node and a leaf node. Intermediate nodes should behave as a root node in order to send requests to leaf nodes under it, and as a leaf node that receives requests from a root node. In Fig. 5 an expansion of “Node 2” has caused the creation of an intermediate node and the addition of two new nodes.
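
The dual role of an intermediate node can be sketched in Python. The sketch assumes only that engines and providers expose the same invocation operation, which is what allows one engine to appear as a child of another. The class names and the trivial selection policies (first and last child) are placeholders for the scoring expressions of section 4.

```python
class Provider:
    """Leaf node: executes the real service."""

    def __init__(self, name):
        self.name = name

    def invoke(self, request):
        return f"{self.name} handled {request}"


class Engine:
    """Root or intermediate node: forwards each request to one child.

    Because Engine exposes the same invoke() operation as Provider, an
    Engine can be listed as a child of another Engine.
    """

    def __init__(self, name, children, choose):
        self.name = name
        self.children = children
        self.choose = choose  # selection policy, e.g. a scoring expression

    def invoke(self, request):
        return self.choose(self.children).invoke(request)


def first(children):
    return children[0]


def last(children):
    return children[-1]


# "Node 2" expanded into an intermediate engine with two new leaves.
node2 = Engine("Node 2", [Provider("Node 2a"), Provider("Node 2b")], choose=last)
root = Engine("VWS Engine", [node2, Provider("Node 1"), Provider("Node 3")], choose=first)
reply = root.invoke("getPrice")
```

The root engine does not need to know that Node 2 is itself an engine: it selects it as it would any provider, and Node 2 applies its own selection among Node 2a and Node 2b.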

6 Related Work

Highly available web services are currently being implemented using application server clusters (IIS+DCOM [9], WAS+EJB [10, 11]), but these clusters do not benefit from using web services inside the cluster itself, and so all nodes and all web service instances must have the same software infrastructure. There are also solutions like HAWS [12, 13] that focus mainly on traditional fault-tolerant hardware architectures (like zSeries). All these solutions reach high availability by protecting only the service implementation, instead of protecting the whole service itself.

There are other architectures comparable to our virtualization model. They are centered on solving specific problems like providing centralized security mechanisms (WS-DBC [14]), or achieving protocol independence between client and provider (WS-Gateway, now part of WAS [10]). In all cases, the web services are treated as indivisible units, while our model allows managing methods (WSDL ports), parameters, and parameter types independently.

7 Conclusions and Further Work

Using clusters of VWS, a high-availability-web-services infrastructure can be built. In addition to cluster benefits, we get additional benefits like building heterogeneous clusters (different node implementation), service implementation flexibility (independence between real and virtual), and the ability to build Internet-wide clusters. Some issues of our virtualization model are being studied: security issues in public (i.e. Internet) clusters and improvement of the node selection algorithms using aging models in order to improve the workload distribution mechanism.

The VWS engine introduces some overhead inside the execution architecture, since requests must be received, processed, and routed to a cluster node. This overhead is not important compared with the benefits obtained with the cluster, since the main goal when using clusters is to achieve the highest availability, while achieving the lowest response time is a secondary goal.

Although it has not been discussed in this paper, since it is out of scope, we wish to note that the availability of a principal node inside our virtualization model must be ensured using redundant VWS engines and a network load balancer (like MNLB from CISCO [15], for instance).

The virtualization technology has been developed as a part of a larger project called INES. Internet ENhanced Services (INES) is a project that defines an application architecture that will be the basis for a large-scale production environment for web services. The VWS technology is the base for other works that extend the use of our model. Regarding these other features of our model:

• We can use the VWS documents to build composite web services. This work is in progress, and we are defining a set of different types of invocations. Our goal is to develop a web services programming language (WSPL, as an extension of VWSDL) that supports basic programming structures (if-then-else, while-do, etc.) and advanced issues like callback, transaction support, or synchronism.

• VWS architecture enables building QoS-driven invocations, using a variable-based model that enables qualifying services and providers.

• Our proposed architecture also provides a way to implement Service Level Agreement support over the standard web services architecture. Proposals like [16] (where measurements must be taken on the client or on the provider) or like [17] (where a central measurement element cannot be implemented using the current architecture) can be implemented using our intermediary-based architecture.


• Unlike BPEL4WS [18], WSCI [19], or BPML [20], we propose VWSDL and a new architecture as a mechanism to give the web services technology new features like QoS, high availability, etc., not only process management.

All these issues are solved using the same VWS technique, whose main feature is the compatibility with existing web services standards. That is, the virtualization technique and the new architecture can be implemented while maintaining compatibility with existing infrastructure and using the same resources.

References

1. Steve Russell et al., “High Availability without Clustering”, IBM, 2001, http://www.redbooks.ibm.com/redbooks/pdfs/sg246216.pdf

2. Gray, J. et al., “High-availability computer systems”, Computer, Volume 24, Issue 9, Sept. 1991, pp. 39-48

3. Rajkumar Buyya, “High Performance Cluster Computing: Architectures and Systems”, Prentice Hall, 1999

4. Armando Fox et al., “Cluster-Based Scalable Network Services”, Symposium on Operating Systems Principles, 1997, pp. 78-91

5. Roberto Chinnici et al., “Web Services Description Language (WSDL) Version 1.2”, World Wide Web Consortium, 2002, http://www.w3.org/TR/wsdl12

6. Booth et al., “Web Services Architecture”, W3C Working Draft, 2003, http://www.w3.org/TR/ws-arch

7. J. Clark, S. DeRose, “XML Path Language (XPath) Version 1.0”, W3C, 1999, http://www.w3.org/TR/xpath

8. V. Apparao et al., “Document Object Model (DOM) Level 2 Core Specification Version 1.0”, W3C Recommendation, 1998, http://www.w3.org/TR/DOM-Level-2-Core

9. Distributed Component Object Model, http://www.microsoft.com/com/tech/DCOM.asp

10. WebSphere Application Server, http://www.ibm.com/software/webservers/appserv/

11. Enterprise JavaBeans, http://java.sun.com/products/ejb

12. S/390 Division, “High Availability Web Services”, IBM and CISCO, 2000, http://www-1.ibm.com/servers/eserver/zseries/networking/haws.html

13. “High Availability with QoS”, IBM and CISCO, 2000, http://www-1.ibm.com/servers/eserver/zseries/library/specsheets/high_availability_qos.html

14. “Securing Web Services with the Xtradyne WS-Domain Boundary Controller”, 2003, http://www.xtradyne.de/documents/whitepapers/WS-DBC-WhitePaper.pdf

15. “MultiNode Load Balancing”, Cisco Systems Inc., 2000, http://www.cisco.com/warp/public/cc/pd/ibsw/mulb/tech/mnlb_wp.pdf

16. Sahai, A. et al., “Automated SLA Monitoring for Web Services”, 13th IFIP/IEEE International Workshop on Distributed Systems (DSOM 2002), pp. 28-41

17. Ludwig, H. et al., “Web Service Level Agreement (WSLA) Language Specification”, International Business Machines, August 2002

18. Curbera, F. et al., “Business Process Execution Language for Web Services”, May 2003, http://www-106.ibm.com/developerworks/webservices/library/ws-bpel/

19. Arkin, A. et al., “Web Service Choreography Interface 1.0 Specification”, BEA, Intalio, SAP and Sun, June 2002, http://ftpna2.bea.com/pub/downloads/wsci-spec-10.pdf

20. Arkin A. et al., “Business Process Modelling Language (BPML) specification,” BPMI,