A Distributed Grid Service Broker for Web-Services Based Grid
Applications
Dr. Yih-Jiun Lee Mr. Kai-Wen Lien
Dept. of Information Management, Chien Kuo Technology University, Taiwan
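Abstract (translated from the Chinese)
Grid computing has become one of the most important innovations and applications in distributed computing within information science in the twenty-first century, and its importance equals that of Web-Service as a new programming architecture. Most grid systems (including NASA's IDG, EURO's Data Grid, and GT4 from Argonne National Laboratory in the USA) have already been built on top of Web-Services, or plan to be, in order to meet the requirement for light-weight systems. However, although a Service-Oriented architecture greatly increases system reliability, finding an appropriate service from which to request support becomes a major problem. This paper proposes a Web-Services based service broker architecture and describes its availability and accessibility in detail.
Keywords: grid computing, web services, service-oriented architecture.
Abstract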
Grid computing enables the cooperation of virtual organizations (Foster, 2003). It involves the coordinated sharing of distributed resources. A grid user can submit tasks to access resources at different locations. This research is based on a fully distributed grid system composed of nodes, in which users or their proxies may have difficulty finding a suitable space in which to execute tasks. The aim of this research is to provide a grid service broker that finds appropriate spaces for users and tasks without compromising security. A distributed grid portal is also provided as a byproduct.
Keywords: grid computing, web services, service-oriented architecture
1 INTRODUCTION
Script-I is a service-oriented grid computing system that consists of a set of web services and is composed of nodes (Lee, 2005.1). Script-I is based on Node-to-Node computing, in which a server is referred to as a node. A node is a resource provider and the place where computation happens; it has the Script-I web services installed to support grid execution and serves many users. The virtual organizations in a Script-I system are constructed from nodes and users in a many-to-many relation. Because every node is independent and the system is fully distributed, finding a suitable execution space is a problem. Most current brokers are assumed to be trustworthy, so users can pass tasks through the broker without worrying about task falsification. Moreover, most of them act as central components that allocate tasks to job executors. The challenge of this research is to introduce a broker component without compromising the security and privacy of users, while keeping the system distributed.
1.1 THE EVOLUTION OF DISTRIBUTED COMPUTING
The predecessor of distributed computing was the client-server architecture, first used in the 1980s (Schussel, 1995). Client-server architecture sorts all the participants (computers) in the system into two groups: clients and servers. A server is a computer with much higher power, better performance and better connectivity, and it can run over a long period of time. A server can serve more than one client at a time, responding to their requests. A client (which might be a personal computer), on the other hand, usually has a single processor, is less powerful, and has fewer resources. Clients send requests to servers, which process them without any further outside assistance (at least from the point of view of the requesters). The job of the client (the requester) is to pre-process and prepare a task and to send a request to the server for service. This architecture is very useful for two-tier or three-tier business applications. However, it frequently suffers a performance and reliability bottleneck when requests peak.
With the evolution of network infrastructures and the growing computational power of personal computers, Peer-to-Peer (P2P) computing provides another model of distributed computing, in which the computing devices (computers, servers, or any other devices) can link to each other easily and directly. Each device is called a “Peer”, and communication or sharing occurs between two peers. A peer can play the role of both client and server, unlike in server-centric computing (.NET Glossary of Terms, Microsoft glossary, 2004). This is termed an asymmetric client-server system (Foster and Kesselman, 2003). P2P computing is typically used for connecting hosts and sharing resources (in particular, various types of files). The main purpose of P2P is that every peer can both provide and consume resources. Compared with a client-server system, in which only servers provide resources, P2P is more robust and reliable. However, a P2P computing system cannot differentiate between the privileges of different users, so control is lacking.
The idea of Node-to-Node computing (N2N) (Lee, 2005.2; Lee, 2006) is different. A node denotes a server and is the basic unit of the execution environment. A node is also an individual environment in which computation occurs. Services run on nodes, and each service might serve different tasks. Services can communicate with each other through message passing or service invocations, so communication occurs between two services or two nodes. In contrast to P2P computing, N2N computing can provide wider usage, more functionality and different services.
2 CURRENT SYSTEM STATUS
This section describes the current state of the system, including its security mechanisms.
2.1 SYSTEM INTRODUCTION
WSGrid is built on the idea of Node-to-Node computing (abbreviated to N2N computing), a new architecture for distributed computing derived from Peer-to-Peer computing. The structure of N2N computing, shown in Figure 1, is very similar to that of P2P computing, but N2N provides greater controllability: in P2P computing every participant holds an equal position, whereas in N2N computing the trust status of each participant can be configured separately.
Figure 1. The structure of Node-to-Node Computing
Figure 1 also shows every node (participant) connected to every other node, so the graph is fully connected. In practice, however, a node can connect only to those nodes that the current invoker has the privilege to access. The connections may therefore differ from user to user.
The virtual organization in Script-I is formed by nodes and users or, more precisely, by the relation between them. For instance, Alice and Bob are users on NodeA; Bob and Charlie are users on NodeB; and Alice, Bob and Charlie are all able to access NodeC. Thus, Alice can access NodeA and NodeC; Bob can access NodeA, NodeB and NodeC; and Charlie can access NodeB and NodeC. When Alice delegates part of her rights to Charlie, Charlie may gain the privilege to request a service or resource on NodeA, even though he is not a legitimate member of NodeA. Alice can disable this delegation whenever she sees fit. The virtual organization can therefore be formed and changed dynamically.
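The Alice, Bob and Charlie example can be sketched as a small in-memory model. This is only an illustration of the many-to-many relation and of revocable delegation; the class and method names (`VirtualOrganization`, `delegate`, `revoke`, `accessible_nodes`) are hypothetical and are not part of Script-I.

```python
from collections import defaultdict

# Node membership taken from the example in the text.
MEMBERS = {
    "NodeA": {"Alice", "Bob"},
    "NodeB": {"Bob", "Charlie"},
    "NodeC": {"Alice", "Bob", "Charlie"},
}


class VirtualOrganization:
    """Hypothetical model of the relation between nodes and users."""

    def __init__(self, members):
        self.members = {node: set(users) for node, users in members.items()}
        # (delegator, delegatee) -> nodes the delegatee may use via delegation
        self.delegations = defaultdict(set)

    def delegate(self, delegator, delegatee, node):
        # A user can only delegate rights that he or she actually holds.
        if delegator not in self.members.get(node, set()):
            raise PermissionError(f"{delegator} is not a member of {node}")
        self.delegations[(delegator, delegatee)].add(node)

    def revoke(self, delegator, delegatee, node):
        # The delegator may withdraw the delegation at any time.
        self.delegations[(delegator, delegatee)].discard(node)

    def accessible_nodes(self, user):
        direct = {n for n, users in self.members.items() if user in users}
        delegated = {
            n
            for (_, delegatee), nodes in self.delegations.items()
            if delegatee == user
            for n in nodes
        }
        return sorted(direct | delegated)


vo = VirtualOrganization(MEMBERS)
vo.delegate("Alice", "Charlie", "NodeA")
print(vo.accessible_nodes("Charlie"))
```

Because access is computed from the current membership and delegation sets, the set of nodes reachable by a user changes as soon as a delegation is granted or revoked, which mirrors the dynamic nature of the virtual organization.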
2.2 CURRENT METHOD OF LOCATING SERVICES
In the current system, a user might own different spaces on different nodes, and any workflow might require file access or collection. Knowing where to locate files and services is therefore an issue. File locations must be supplied manually by the user, while services can be discovered by reading the “available services list” of the web-services container. Because the services provided by different nodes might change dynamically, the user must check the lists of the different nodes before each submission. Even then, the current execution state of a service cannot be determined, so the user might wait for a result from a service that is already deadlocked.
2.3 CURRENT SECURITY PROCEDURES
Authentication and authorization are always primary concerns in any distributed system. In Script-I, one node may serve more than one user, so the node administrator allocates a personal workspace to every user, accessible only to its owner. A user is authorized to access the workspace only after his credentials (identity) have been authenticated. The credential, which has to be attached to each access, is a security token that may take various forms. The invoked service serves the user only if the token is validated.
Resource sharing and Single Sign-On (SSO) in Script-I are achieved by “GateService”. To enable partial delegation of rights, the delegator issues a shorter-lived token, containing the permitted actions and the effective domain, and passes it to the delegatee. The SSO service lets users who hold identities on several nodes pass automatically through the authentications that occur within one transaction or within a reasonable period.
2.4 PROBLEM SCOPE AND EVALUATION RESULT
The fully distributed Script-I environment has no central component or portal. Task distribution and the choice of execution location depend on the user’s manual configuration. However, it is difficult to determine where services are located and what their current execution states are. Thus, a user or proxy might send a task to an unavailable job executor, causing the program to “starve”.
The aim of this research is to produce a grid service broker for a distributed system without affecting system flexibility. The grid broker is normally suspended to reduce computation cost and is awoken on user demand. The functions of the broker are (1) to retrieve the nodes available to the requester, (2) to compile a list of the available services on those nodes according to user requirements, (3) to request the current state of the appropriate services and nodes, (4) to decide, or advise the submitter, where to execute, (5) to send the tasks to those available services, and (6) to notify the requester about his submission or result. Moreover, to enhance user security, different security levels will be provided for code authentication to prevent code falsification.
In addition, a grid service portal is provided. Unlike in other grid systems, the portal does not run on a particular node as a central component; instead, each user selects it. With the SSO service in GateService and the broker service, a user can designate a suitable node as his grid portal. Since the portals chosen by different users may be located on different nodes, the system keeps its flexibility (no centralized component) and reduces the likelihood of a bottleneck.
3 THE SYSTEM AND SERVICES
Script-I is a set of web services built on top of Tomcat as the web container and AXIS as the web-services container. It is implemented in Java with JAX-RPC, and uses SOAP and XML to format messages. This section introduces the services and their functionalities.
3.1 THE BASIC FUNCTIONALITIES
a. FileService
FileService is used to move objects from one location to another. IndiGrid allows remote access only to certain locations in order to prevent possible attacks. The owner can move his objects to any location he has the privilege to access. To maintain the consistency of distributed objects, moved objects can be marked with a lifetime, and out-of-date objects can be erased to limit the number of duplicates. FileService usually cooperates with other services. There is only one precondition: to access a location, the requester must show that he has the privilege to do so, and therefore attaches his certificate to the request to prove his identity.
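The lifetime marking described above can be illustrated with a short sketch. It assumes that each moved copy records an absolute expiry time; the `Replica` type, its fields and the `purge_expired` helper are hypothetical names introduced for illustration, not the FileService interface.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    location: str
    expires_at: float  # seconds since the epoch (assumed representation)


def purge_expired(replicas, now):
    """Split replicas into those still valid at `now` and those to erase."""
    kept = [r for r in replicas if r.expires_at > now]
    erased = [r for r in replicas if r.expires_at <= now]
    return kept, erased


replicas = [
    Replica("input.dat", "NodeA", expires_at=1_000.0),
    Replica("input.dat", "NodeB", expires_at=2_000.0),
]
kept, erased = purge_expired(replicas, now=1_500.0)
print([r.location for r in kept], [r.location for r in erased])
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the sketch deterministic and easy to check.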
b. JobService
JobService is a task submission service that allows an owner to submit a task to another device. It aims to balance the computational load and to make use of resources at particular locations. The user sends a job together with its requirements so that the remote service can execute it. Post-execution processing varies according to what is specified in the job description file. Allowing a remote task to run on a server always carries a security risk. To protect the system, a submission is allowed to access only the storage space specified in the certificate.
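One way to picture the storage restriction is a path check that rejects any request escaping the permitted workspace. The sketch below is an assumption about how such a check could look, not the JobService implementation; `within_workspace` is a hypothetical helper, and the workspace is assumed to be an absolute POSIX-style path taken from the certificate.

```python
from pathlib import PurePosixPath


def _normalize(path):
    """Resolve '..' components lexically, never climbing above the root."""
    parts = []
    for part in PurePosixPath(path).parts:
        if part == "..":
            if len(parts) > 1:  # keep the leading "/" in place
                parts.pop()
        else:
            parts.append(part)
    return tuple(parts)


def within_workspace(workspace, requested):
    """Return True if `requested` stays inside the absolute `workspace`."""
    base = _normalize(workspace)
    # Joining with an absolute `requested` discards the workspace prefix,
    # so absolute paths outside the workspace are rejected as well.
    target = _normalize(PurePosixPath(workspace) / requested)
    return target[: len(base)] == base


print(within_workspace("/grid/alice", "data/in.txt"))
print(within_workspace("/grid/alice", "../bob/secret"))
```

The check is purely lexical; a real service would also need to consider symbolic links and the file system of the hosting node.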
c. LoggingService
LoggingService is used to record the status of executions and servers. It can also be used as a task-status query service. LoggingService can be invoked either from the job description file or from within the job itself. For instance, a user might specify that a log entry be sent to record the time and place of the submission; he might also want to know when execution finishes, in which case LoggingService is invoked at the end of the task.
d. DelegationService
DelegationService allows a user to “issue” a temporary passport to someone he trusts. This service follows the idea of GateService (Lee, 2005), but provides a more powerful and useful description to protect users’ privacy. To allow a trusted user to access his resources, the user (delegator) must issue a temporary token (a short-lived certificate), which must be unique to the issuer. The delegatee must present the token, which specifies the actions and domains he is allowed to access, along with each service request. The lifetime of the token is flexible and is set by the issuer. A delegation can be withdrawn automatically (when the expiry date has passed) or manually (when the delegator believes the delegation is no longer necessary).
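The properties of the delegation token (unique, scoped to actions and domains, time-limited, and revocable) can be sketched as a plain data object. The `DelegationToken` class and its fields are hypothetical; in particular, the sketch omits the cryptographic signature that a real short-lived certificate would carry.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class DelegationToken:
    issuer: str
    delegatee: str
    actions: frozenset   # permitted actions, e.g. {"read"}
    domains: frozenset   # nodes or workspaces the token is valid for
    expires_at: float    # lifetime chosen by the issuer
    # Random identifier standing in for the "unique" property of the token.
    token_id: str = field(default_factory=lambda: secrets.token_hex(16))
    revoked: bool = False

    def revoke(self):
        """Manual withdrawal by the delegator."""
        self.revoked = True

    def permits(self, user, action, domain, now):
        """Check a request; expiry acts as the automatic withdrawal."""
        return (
            not self.revoked
            and now < self.expires_at
            and user == self.delegatee
            and action in self.actions
            and domain in self.domains
        )


token = DelegationToken(
    issuer="Alice",
    delegatee="Charlie",
    actions=frozenset({"read"}),
    domains=frozenset({"NodeA"}),
    expires_at=100.0,
)
print(token.permits("Charlie", "read", "NodeA", now=50.0))
```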
3.2 BROKERSERVICE
BrokerService is the last service in Script-I, and also the most important. Since the whole virtual community is fully distributed, there is no central component to manage the status of services or nodes. How can a user know which services are available, and where? How can a user determine where to submit a task? BrokerService makes the suggestion. It collects the current service status of the nodes, together with any other necessary information, and makes a decision. The related information is exchanged in XML messages called “status request” and “status response”.
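The paper does not give the schema of the status messages, so the following is only a guess at what a “status response” might contain. The element and attribute names (`statusResponse`, `state`, `queueLength`, `services`) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_status_response(node, state, queue_length, services):
    """Serialize a node's status as an XML string (hypothetical schema)."""
    root = ET.Element("statusResponse", {"node": node})
    ET.SubElement(root, "state").text = state
    ET.SubElement(root, "queueLength").text = str(queue_length)
    services_el = ET.SubElement(root, "services")
    for name in services:
        ET.SubElement(services_el, "service", {"name": name})
    return ET.tostring(root, encoding="unicode")


def parse_status_response(xml_text):
    """Parse the XML produced above back into a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "node": root.get("node"),
        "state": root.findtext("state"),
        "queue_length": int(root.findtext("queueLength")),
        "services": [s.get("name") for s in root.iter("service")],
    }


message = build_status_response("NodeB", "normal", 2, ["FileService", "JobService"])
print(parse_status_response(message))
```

In Script-I such a payload would travel inside a SOAP message; the sketch covers only the payload itself.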
Two message sequences, push and pull, are used. In the push sequence, every member node automatically broadcasts a status response message to every “friend” it knows. Push is used at “service initiation”, “service shutdown”, “too busy” and “back to normal status”, so that the receiver can decide whether the sender should be added to or withdrawn from its available list. In the pull sequence, a node requests the status of its friends, or of a specific node, when a particular task (a computation-based process) is initiated. For a long-running task, up-to-date service status is very important, so the leader node must request a status update. When a node receives a status request, it checks its waiting list and its own status, then generates a status response message and sends it back. The pull method is usually used when the current status list is suspect (for instance, it has not been updated for a long time) or when a long-running task is about to be submitted.
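The interplay of the two sequences can be sketched as follows. The four push events come from the text; the function names, the `max_age` staleness threshold and the rule that a long-running task always triggers a pull from every friend are assumptions made for this illustration.

```python
PUSH_EVENTS = {
    "service initiation",
    "service shutdown",
    "too busy",
    "back to normal status",
}


def apply_push(available, last_seen, sender, event, now):
    """Update the receiver's available list after a pushed status message."""
    if event not in PUSH_EVENTS:
        raise ValueError(f"unknown push event: {event}")
    if event in ("service shutdown", "too busy"):
        available.discard(sender)   # withdraw the sender
    else:
        available.add(sender)       # add (or re-add) the sender
    last_seen[sender] = now


def nodes_to_pull(friends, last_seen, now, max_age, long_running=False):
    """Return the friends whose status should be requested explicitly."""
    if long_running:
        return sorted(friends)      # always refresh before a long task
    never = float("-inf")           # friends that have never reported
    return sorted(f for f in friends if now - last_seen.get(f, never) > max_age)


available, last_seen = set(), {}
apply_push(available, last_seen, "NodeA", "service initiation", now=0.0)
apply_push(available, last_seen, "NodeB", "too busy", now=90.0)
print(nodes_to_pull({"NodeA", "NodeB", "NodeC"}, last_seen, now=100.0, max_age=60.0))
```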
Whichever method is used, the post-processing is the same. After receiving the responses and parsing the information in the status response messages, the broker marks the nodes that did not reply as unavailable and compiles a new available list from the information in the messages. The most available node can then be chosen for the task.
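The post-processing step can be sketched under one explicit assumption: the text does not define “most available”, so the example ranks nodes by the length of their waiting list and breaks ties by node name. The `choose_executor` function and the response fields are hypothetical.

```python
def choose_executor(friends, responses, service):
    """Pick a node for `service` from parsed status responses.

    `responses` maps node name -> dict with "state", "queue_length" and
    "services" keys (a hypothetical parsed form of a status response).
    Returns (best_node_or_None, nodes_marked_unavailable).
    """
    # Nodes that did not reply are marked as unavailable.
    unavailable = sorted(n for n in friends if n not in responses)
    candidates = [
        (r["queue_length"], node)
        for node, r in responses.items()
        if node in friends and r["state"] == "normal" and service in r["services"]
    ]
    # Assumption: the shortest waiting list means "most available".
    best = min(candidates)[1] if candidates else None
    return best, unavailable


responses = {
    "NodeA": {"state": "normal", "queue_length": 3, "services": ["JobService"]},
    "NodeB": {"state": "normal", "queue_length": 1, "services": ["JobService"]},
    "NodeC": {"state": "too busy", "queue_length": 0, "services": ["JobService"]},
}
print(choose_executor({"NodeA", "NodeB", "NodeC", "NodeD"}, responses, "JobService"))
```

Other ranking criteria, such as processor load or network distance, could replace the waiting-list length without changing the overall shape of the step.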
4 CONCLUSION
A recurring question is how frequently synchronization should be performed. If synchronization happens often, network bandwidth is consumed by system messages and the available lists are recompiled frequently, which may degrade the performance of the whole system. On the other hand, if synchronization is seldom performed, the recorded status of nodes may be incorrect and the broker may choose an inappropriate executor. BrokerService tries to balance up-to-date status against overhead by using both the push and pull methods. However, job holders may want to specify where to submit their jobs. The system offers high flexibility for this kind of requirement, which should be specified in the job description file. Therefore, the place of execution can be chosen either by the broker or by the submitter himself.
Our service broker has a different architecture from that of other projects such as the broker of the Gridbus project, which “hides the complexity of grids by transforming user requirements into a set of jobs that are scheduled on the appropriate resources, managing them and collecting results when they are finished” (Venugopal, 2004). Our broker is a web service that executes on behalf of users but does not directly access submissions. The submitter can choose among different security levels to protect his tasks and data against eavesdropping and falsification. Finally, a distributed grid portal is provided as a byproduct. Unlike MyProxy (Novotny, 2001), the portal is not a particular node or a central component. It is a service that can run anywhere, yet it acts as a gateway: a user can access all resources (with permission) through it without being troubled by repeated authentication, because Single Sign-On is used. Thus, the system remains distributed and retains its flexibility.
References
1. Novotny (2001). MyProxy, the tenth HPDC, August
2. Foster, I., & Kesselman, C. (2003). The Grid 2: Blueprint for a New Computing Infrastructure, Morgan Kaufmann
3. Lee, Yih-Jiun. (2005.1). A Security Solution for Web-Services Based Virtual Organizations, IRMA 2005, San Diego, USA, May
4. Venugopal, S., Buyya, R., & Winton, L. (2004). A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids, Technical Report, Grid Computing and Distributed Systems, University of Melbourne, Australia
5. Lee, Yih-Jiun. (2005.2). A dynamic virtual organization solution for web-services based grid middleware, in The NBiS workshop, in conjunction with 2005 DEXA International Conference, Copenhagen, Denmark, Aug. 2005
6. Schussel, G. (1995). Client/server past, present, and future, http://news.dci.com/geos/dbsejava.htm
7. Lee, Yih-Jiun. (2006). PhD Thesis: Models of Workflow in Grid Systems – with applications to security and mobile code, University of Southampton, Southampton, UK, June 2006