5.3 Cache Model and GCS
5.3.1 Storage layer
The GCS is deployed to manage the content of a local storage resource. A storage resource denotes components, such as disk file systems or archival storage systems, that store the data entities.
The GCS uses the I/O facilities of the underlying storage resources to support the access operations: it uses the interfaces of the underlying system and exposes them as the access operations defined in Section 4.4.2. The GCS implementation translates the SetData() and GetData() operations into calls to the access API of the particular storage resource.
Our GCS prototype uses a Unix file system partition as its storage resource and translates access operation requests into local I/O system calls of the operating system. The GCS maintains information about the configuration and characteristics of specific resources and exposes this resource description using the cache entity information (see Appendix 4.6.1).
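The Java sketch below shows one possible way to map the storage side of SetData() and GetData() onto a directory of a local file system partition. The names `StorageResource` and `LocalFileSystemResource` are hypothetical and are not the prototype's classes. The I/O calls themselves are standard `java.nio.file` methods.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical storage-resource abstraction: the GCS would map the
// storage side of SetData()/GetData() onto these three calls.
interface StorageResource {
    void store(String entityId, InputStream data) throws IOException;
    InputStream retrieve(String entityId) throws IOException;
    long size(String entityId) throws IOException;
}

// Sketch of a storage resource backed by one directory of a local
// file system partition; every call becomes a java.nio.file I/O call.
class LocalFileSystemResource implements StorageResource {
    private final Path root;

    LocalFileSystemResource(Path root) throws IOException {
        this.root = Files.createDirectories(root).toAbsolutePath().normalize();
    }

    // Maps an entity id to a file under root, rejecting ids that escape it.
    private Path resolve(String entityId) {
        Path p = root.resolve(entityId).normalize();
        if (!p.startsWith(root)) {
            throw new IllegalArgumentException("invalid entity id: " + entityId);
        }
        return p;
    }

    @Override
    public void store(String entityId, InputStream data) throws IOException {
        Files.copy(data, resolve(entityId), StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public InputStream retrieve(String entityId) throws IOException {
        return Files.newInputStream(resolve(entityId));
    }

    @Override
    public long size(String entityId) throws IOException {
        return Files.size(resolve(entityId));
    }
}
```

Putting the resource behind an interface keeps the rest of the GCS independent of the storage technology. An archival system could be supported by another implementation of the same three calls.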
5.3.2 Control layer
The key function of the GCS is to register the access actions performed on data entities held in the underlying storage resources. The main functions performed by the control layer are:
• execute the cache replacement policies applied to data entities held in the storage resources
• register information about data entity actions
• check the time to live established for data entities
• register data access requests and data access transfers using the defined cache information
• manage and expose the cache information
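The sketch below illustrates two of these functions, registering data entity actions and checking time to live, using an in-memory structure. It is a hypothetical stand-in. The prototype keeps this information in a database, and every class and method name below is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the control layer's register: it
// records actions on data entities and the expiry time implied by each TTL.
class ActivityRegister {
    // One registered action, e.g. ("e1", "GetData", t).
    static final class Action {
        final String entityId;
        final String operation;
        final Instant at;

        Action(String entityId, String operation, Instant at) {
            this.entityId = entityId;
            this.operation = operation;
            this.at = at;
        }
    }

    private final List<Action> actions = new ArrayList<>();
    private final Map<String, Instant> expiry = new TreeMap<>(); // sorted for deterministic output

    void registerAction(String entityId, String operation, Instant at) {
        actions.add(new Action(entityId, operation, at));
    }

    void setTimeToLive(String entityId, Instant storedAt, Duration ttl) {
        expiry.put(entityId, storedAt.plus(ttl));
    }

    // TTL check: ids of entities whose time to live has elapsed at 'now'.
    List<String> expiredEntities(Instant now) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> e : expiry.entrySet()) {
            if (!now.isBefore(e.getValue())) {
                expired.add(e.getKey());
            }
        }
        return expired;
    }

    // Simple monitoring query over the registered actions.
    long countActions(String entityId, String operation) {
        return actions.stream()
                .filter(a -> a.entityId.equals(entityId) && a.operation.equals(operation))
                .count();
    }
}
```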
The GCS registers individual cache and data activity using data action elements (see Appendix C.1.2). This information is managed in a MySQL database; in fact, a mapping of the XML Schema of the data information model (see Appendix 4.6.1) is stored in the DBMS.
Monitoring operations are internally translated into queries to the database that registers the data activity information of the GCS. This makes the data and cache activity information available online while the cache service is running.
5.3.3 Collaboration layer
To support collaboration in a system composed of a group of caches, the GCS is implemented as a grid service that provides cache capabilities to a wide range of clients, mainly other GCSs. These capabilities are supported by the cache operations defined in Section 4.3.
WSDL [32] is used to describe and expose the cache operations in the grid. The cache service is built and deployed using software components and predefined services provided by the Globus Toolkit middleware version 4.1 [54]: it uses Delegation [63] for authorisation and GridFTP [4] to transfer data between cache locations.
The operations are implemented following a protocol based on the exchange of XML request-response messages. Each operation is invoked using a request element that contains the parameters of the invocation, and the response is returned in a response element as defined in Appendix D.
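The sketch below illustrates this message style by building and reading a request element and a response element with the standard Java DOM API. The element names (`SetDataRequest`, `entityId`, `size`, `SetDataResponse`, `approved`) are hypothetical placeholders. The actual definitions are those of Appendix D.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlMessages {
    // Builds a request element carrying the invocation parameters.
    // Element names are placeholders for the Appendix D definitions.
    static String setDataRequest(String entityId, long sizeBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element request = doc.createElement("SetDataRequest");
        request.appendChild(textElement(doc, "entityId", entityId));
        request.appendChild(textElement(doc, "size", Long.toString(sizeBytes)));
        doc.appendChild(request);
        return serialize(doc);
    }

    // Builds the matching response element, approving or rejecting the request.
    static String setDataResponse(boolean approved) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element response = doc.createElement("SetDataResponse");
        response.appendChild(textElement(doc, "approved", Boolean.toString(approved)));
        doc.appendChild(response);
        return serialize(doc);
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Text of the first descendant element with the given name.
    static String childText(Document doc, String name) {
        return doc.getDocumentElement().getElementsByTagName(name).item(0).getTextContent();
    }

    private static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    private static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```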
The GCS implementation translates access operation requests into invocations on the underlying resources. Each grid operation is regulated and registered by the control layer implementation. Likewise, the registered information is exposed through cache operations to support monitoring of the service operation. For example, the SetData() operation processes request messages to store a data entity in the GCS; the request contains a description of the data entity defined by the cache information schema described in Appendix 4.6. The operation returns a response message that approves or rejects the request. If the request is approved, the data transfer is started using GridFTP.
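The approve-or-reject step can be sketched as follows. The text does not specify the approval criterion, so this sketch assumes a simple capacity check. It also replaces the GridFTP transfer and the control-layer register with hypothetical stand-ins: a callback and an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of SetData() request handling: decide, register the
// decision, and start the transfer only for approved requests.
class SetDataHandler {
    private final long capacityBytes;
    private long reservedBytes = 0;
    private final List<String> register = new ArrayList<>(); // stand-in for the control-layer register
    private final Consumer<String> startTransfer;            // stand-in for launching a GridFTP transfer

    SetDataHandler(long capacityBytes, Consumer<String> startTransfer) {
        this.capacityBytes = capacityBytes;
        this.startTransfer = startTransfer;
    }

    // Returns true if the request is approved. Assumed criterion: the
    // entity must fit in the space not yet reserved.
    synchronized boolean handle(String entityId, long sizeBytes) {
        boolean approved = sizeBytes >= 0 && sizeBytes <= capacityBytes - reservedBytes;
        register.add((approved ? "APPROVED " : "REJECTED ") + entityId);
        if (approved) {
            reservedBytes += sizeBytes; // reserve space before the transfer begins
            startTransfer.accept(entityId);
        }
        return approved;
    }

    synchronized List<String> registeredActions() {
        return List.copyOf(register);
    }
}
```

The space is reserved at approval time rather than after the transfer completes. This prevents two concurrent requests from both being approved for the same free space.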
5.3.4 Coordination layer
The GCS prototype does not implement the coordination layer, but it provides the essential information and capabilities for its implementation. The coordination layer can be implemented by an external, specialised module or service; for example, Chapter 6 describes a system that organises the collective work of a group of GCSs. This high-level coordination layer is built on the cache operations supported by each GCS. Organisation and coordination consist in composing collective capabilities from the basic operations provided by each GCS. Monitoring operations make it possible to observe and evaluate the individual operation of each cache, and configuration operations allow its operational parameters to be changed.
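The following is a minimal sketch of how an external coordinator might compose collective capabilities from per-cache operations. It reads one monitoring value from each member to choose a placement, and it applies one configuration value to all members. `CacheEndpoint`, `freeSpaceBytes()` and `setDefaultTimeToLive()` are hypothetical names standing for whatever monitoring and configuration operations a GCS exposes. `FixedEndpoint` is an in-memory test double.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical view of two basic operations a GCS offers a coordinator.
interface CacheEndpoint {
    String name();
    long freeSpaceBytes();                   // answered by a monitoring operation
    void setDefaultTimeToLive(long seconds); // performed by a configuration operation
}

// Collective capabilities composed from the per-cache operations above.
final class PlacementCoordinator {
    // Picks the member with the most free space that can hold the entity.
    static Optional<CacheEndpoint> choosePlacement(List<CacheEndpoint> group, long sizeBytes) {
        return group.stream()
                .filter(c -> c.freeSpaceBytes() >= sizeBytes)
                .max(Comparator.comparingLong(CacheEndpoint::freeSpaceBytes));
    }

    // Applies the same configuration value to every member of the group.
    static void applyDefaultTtl(List<CacheEndpoint> group, long seconds) {
        group.forEach(c -> c.setDefaultTimeToLive(seconds));
    }
}

// In-memory test double standing in for a remote GCS.
final class FixedEndpoint implements CacheEndpoint {
    private final String name;
    private final long free;
    long ttlSeconds = -1;

    FixedEndpoint(String name, long free) { this.name = name; this.free = free; }
    public String name() { return name; }
    public long freeSpaceBytes() { return free; }
    public void setDefaultTimeToLive(long seconds) { ttlSeconds = seconds; }
}
```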
5.4 GCS Prototype Implementation
In this section we describe some details of the GCS prototype implementation, which serves as a demonstration or “proof of concept” of the approach developed in Chapter 4. The main function of the GCS prototype is to process cache operations at the storage resource location. In particular, the GCS prototype mainly processes cache information about data and cache instance activity.
This thesis aims to specify, design and implement a software component, the GCS. The GCS is the basic collaborative building block from which temporary data management systems can be built. The goal of the prototype is not to implement an effective operational GCS system but to check that it is functional, i.e. that its operations work.
Similarly, in this prototype we do not aim to implement an optimal caching system. Therefore, as a proof of concept, no extensive study of the GCS configuration and operational parameters was carried out, and these parameters are kept very basic. These interesting issues are left as future work.
5.4.1 Grid Platform
We developed the prototype for the Globus Toolkit 4 [63] platform. Globus Toolkit is a suite of tools to develop and deploy grid systems and applications.
The Globus Toolkit supports the standard Open Grid Services Architecture (OGSA) [55] for building grid systems. OGSA defines a standard open architecture for grid-based applications and supports service-oriented distributed computing with Web services. The Web Services Resource Framework (WSRF) [5] is a specification for developing service-oriented applications with Web services. OGSA/WSRF have been standardized by the Open Grid Forum (OGF) [62].
Figure 5.3: GCS Prototype Architecture
The use of grid services requires the Web Services Description Language (WSDL) [32], an XML-based language for describing the interfaces of Web services in a standardised way, and a protocol for exchanging Web service requests and responses. The most frequently used protocol for Web service communication is SOAP [132], which exchanges XML-encoded messages over the HTTP communication protocol. The prototype was developed in the Java programming language and runs in the Globus WS Java container.
5.4.2 Prototype Architecture
The GCS prototype implementation is composed of five main modules. Figure 5.3 shows the general architecture of the prototype.
The Client invokes a cache operation of the Service Interface, sending the XML OperationRequest element as a parameter; an example using the SetDataRequest element is presented later in Section 5.5.1. The request is enclosed in a SOAP message. The Service Interface receives the cache operation invocations. This module is implemented as a grid service using the tools and libraries provided by the Globus Toolkit. A WSDL file describes the cache operations introduced in Section 4.3 as portTypes, using the operation definitions of Appendix D as parameters.
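The enclosing step can be sketched with the standard DOM API. The request element is placed inside the Body of a SOAP envelope, and the receiving side extracts the first element child of the Body. This sketch assumes the SOAP 1.1 envelope namespace. In the prototype, this work is done by the Globus container's SOAP stack, not by hand-written code, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SoapWrapper {
    // SOAP 1.1 envelope namespace (assumed protocol version).
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Encloses an operation request element (given as XML text) in Envelope/Body.
    static Document wrap(String requestXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document request = f.newDocumentBuilder().parse(new InputSource(new StringReader(requestXml)));
        Document env = f.newDocumentBuilder().newDocument();
        Element envelope = env.createElementNS(SOAP_ENV, "soapenv:Envelope");
        Element body = env.createElementNS(SOAP_ENV, "soapenv:Body");
        body.appendChild(env.importNode(request.getDocumentElement(), true));
        envelope.appendChild(body);
        env.appendChild(envelope);
        return env;
    }

    // Returns the operation request element carried in the SOAP Body.
    static Element unwrap(Document envelope) {
        Element body = (Element) envelope.getElementsByTagNameNS(SOAP_ENV, "Body").item(0);
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("empty SOAP Body");
    }
}
```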
The Service Interface is deployed into the Globus WS Java Core container. The container takes responsibility for many of the underlying logistic issues related to communication, messaging, logging, and security [63].
The Cache Manager module processes the cache operations using cache information elements as data structures. This module is mainly implemented by the class CacheImpl, which implements the cache replacement mechanisms and one method for each cache operation. An example of the execution of the SetData operation is presented later in Section 5.5.1.
The Cache Manager implements three basic replacement methods: LRU, LFU and SIZE. It uses the activity information registered in activity information elements (see Appendix C.1) and entity information elements (see Appendix 4.6.1) to support the replacement method functions. Section 5.4.4 describes the replacement method implementation. The Cache Manager uses the Java I/O system as the resource interface to handle files in the local file system, and it invokes the Data and Cache Activity Register module to query and update cache information about the executed actions.
The Data and Cache Activity Register module manages the data and cache activity catalogue; it uses activity information elements (see Appendix C.1) to administer activity information. These elements are mapped to tables handled by the database system.
The Collaboration Extension represents extension modules that execute collaboration procedures (implemented by operations and specific to collaboration requirements). Currently, the prototype implements simple extensions for the GetData and SetData operations that forward the original request to remote caches. These extensions select the remote GCS in a round-robin way; that is, remote GCSs are selected in the order in which they appear in the “members” element of the Cache Group entity information element defined in Appendix 4.6.1. The GCS holds an instance of this element for this purpose, which is configurable with the SetCacheGroup() operation (see Appendix D.3.3). Later, in Section 6.2, an application scenario is described where the GCS uses a specific extension module: it invokes a special service for coordinating collective operations between caches.
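The three methods differ mainly in the order in which entities are chosen for eviction. The following hypothetical sketch assumes that the activity and entity information has been reduced to three values per entity: a last-access time, an access count and a size. It reads SIZE as “evict the largest entity first”, which is the usual meaning of that policy. The class names are not those of the prototype.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

enum ReplacementMethod { LRU, LFU, SIZE }

// Hypothetical per-entity summary of the registered activity information.
final class EntityStats {
    final String id;
    final long lastAccessMillis; // time of the most recent access
    final long accessCount;      // number of registered accesses
    final long sizeBytes;

    EntityStats(String id, long lastAccessMillis, long accessCount, long sizeBytes) {
        this.id = id;
        this.lastAccessMillis = lastAccessMillis;
        this.accessCount = accessCount;
        this.sizeBytes = sizeBytes;
    }
}

final class ReplacementPolicy {
    // Eviction order: the first element in this order is evicted first.
    static Comparator<EntityStats> evictionOrder(ReplacementMethod m) {
        switch (m) {
            case LRU:  return Comparator.comparingLong(e -> e.lastAccessMillis); // least recently used
            case LFU:  return Comparator.comparingLong(e -> e.accessCount);      // least frequently used
            case SIZE: return Comparator.comparingLong((EntityStats e) -> e.sizeBytes).reversed(); // largest
            default:   throw new IllegalArgumentException(m.toString());
        }
    }

    // Selects victims in eviction order until at least 'needed' bytes are
    // freed (or every entity has been selected).
    static List<String> selectVictims(List<EntityStats> entities, ReplacementMethod m, long needed) {
        List<EntityStats> sorted = new ArrayList<>(entities);
        sorted.sort(evictionOrder(m));
        List<String> victims = new ArrayList<>();
        long freed = 0;
        for (EntityStats e : sorted) {
            if (freed >= needed) {
                break;
            }
            victims.add(e.id);
            freed += e.sizeBytes;
        }
        return victims;
    }
}
```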
The Collaboration Extension modules use an instance of the GCS client, implemented with the GCS API (described below), to invoke cache operations on other GCSs.
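The round-robin selection over the “members” list can be sketched as below. The class name and the use of plain strings as member identifiers are assumptions. In this sketch, a SetCacheGroup() call would correspond to building a new selector from the new member list.

```java
import java.util.List;

// Hypothetical round-robin selector over the members of a Cache Group,
// in the order in which they appear in the "members" element.
final class RoundRobinSelector {
    private final List<String> members;
    private int next = 0;

    RoundRobinSelector(List<String> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("cache group has no members");
        }
        this.members = List.copyOf(members);
    }

    // Returns the next remote GCS, wrapping around after the last member.
    synchronized String nextMember() {
        String member = members.get(next);
        next = (next + 1) % members.size();
        return member;
    }
}
```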
Instance Configuration
To permit flexible deployment in the Globus container, the GCS is configured through the SetCache() configuration operation (see Appendix D.3.1) using a Cache entity information element as defined in Appendix 4.6.1. This operation sets key operational parameters, such as the replacement method and the default time-to-live, and complementary information, such as the owner organisation and geographical location.
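The following sketch shows how the parameters carried by a SetCache() request might be held and validated. The field set follows the examples named above. The class name, the default values and the string-based parsing of the replacement method are assumptions.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Objects;

// Hypothetical holder for the parameters a SetCache() request carries.
final class CacheConfiguration {
    enum Replacement { LRU, LFU, SIZE }

    // Assumed defaults, used until the first SetCache() call.
    private Replacement replacement = Replacement.LRU;
    private Duration defaultTimeToLive = Duration.ofHours(24);
    private String ownerOrganisation = "";
    private String location = "";

    // Validates every value first, so a bad request leaves the configuration unchanged.
    synchronized void apply(String replacementMethod, long defaultTtlSeconds,
                            String owner, String geoLocation) {
        Replacement r = Replacement.valueOf(replacementMethod.trim().toUpperCase(Locale.ROOT));
        if (defaultTtlSeconds <= 0) {
            throw new IllegalArgumentException("default time-to-live must be positive");
        }
        Objects.requireNonNull(owner, "owner");
        Objects.requireNonNull(geoLocation, "location");
        replacement = r;
        defaultTimeToLive = Duration.ofSeconds(defaultTtlSeconds);
        ownerOrganisation = owner;
        location = geoLocation;
    }

    synchronized Replacement replacement() { return replacement; }
    synchronized Duration defaultTimeToLive() { return defaultTimeToLive; }
    synchronized String ownerOrganisation() { return ownerOrganisation; }
    synchronized String location() { return location; }
}
```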
5.4.3 GCS API
A Java API is provided to use the GCS prototype implementation. The API is composed of two groups or packages of classes:
Cache Information package This package, called gridcaching.generated.ci, is composed of classes for the Cache Information Element definitions (Section 4.5 and Appendix 4.6) and the Cache Operations element definitions (Section 4.3 and Appendix D).
This package consists of the classes that handle the element definitions as Java objects. A Java-to-XML binding is used to convert Java objects to XML documents; it enables the programmer to handle the data defined in the cache XML elements through an object model that represents that data. The equivalent Java classes are generated automatically from the XML Schema that defines the cache operations and cache information, using the Castor library [50].
Cache Implementation package This package, called gridcaching.services.lcs.impl, is composed of classes that implement the cache “business logic”, mainly by handling the information represented in classes of the Cache Information package. These classes implement the operational modules of the prototype architecture described above (see Section 5.4.2).
Section 5.5.1 presents an example of how an application client uses the GCS API. Figure 5.4 shows the UML class diagram of the GCS prototype, generated from the source code.