Requirements - Grid Caching: Specification and Implementation of Collaborative Cache Services f

This section presents the general requirements for grid caching. They are motivated by the use-cases presented earlier. We divide these requirements into two groups: Local and Collective Operation requirements. Several of the requirements mentioned here are not addressed directly in this thesis. In Section 2.5 we mention the particular requirements that are addressed in this work.

2.3.1 Local Operation Requirements

The requirements for local operation of temporary data in grid environments are:

• Delegation of temporary data operation

From users and applications’ points of view, it should be possible to delegate the responsibility of the operation of temporary data to a specialised entity that applies sophisticated strategies for determining which data should be kept in the storage resource. At the same time, it must be possible to regulate the access to data and storage as shared resources. In the first use-case, applications and users delegate the complex task of operating temporary data to specialised grid components. These grid components control individual storage resources on behalf of users and applications. Furthermore, these components implement the mechanisms of interaction in order to work together.

• Accessibility by different types of clients

Entities and capabilities related to temporary data access must be available to the grid environment. These capabilities must be available to different kinds of users, applications and services. In the first use-case, different tasks are carried out by diverse applications in multiple locations. In each location, applications need to execute access operations to manipulate the temporary data. The differences between the particular technologies must be transparent to clients.

• The uniformity of operations and interfaces

The operations and interfaces necessary to handle temporary data must be uniform. The components that operate temporary data must support a standard set of operations and interfaces. In the first use-case, the same action to find enough storage space is launched at several locations. At each location, the request must be interpreted in the same way. In the second use-case, the resource administrator gathers information about the level of resource utilisation. Here, monitoring information must be represented in a common form.

• The capacity to gather resources on demand

Storage space for temporary data must be provided incrementally, following the availability of shared resources. This need also arises from the unpredictable nature of temporary data. Meeting requirements on demand becomes more important in grid environments because of the dynamic nature of the resources. In the first use-case, users try to gather storage capacity from different, distributed sources and locations. Additionally, each location tries to obtain the maximum storage capacity. In this case, different strategies can be applied to obtain the desired capacity from the shared asset.

• Optimization for efficient use of resources

Users expect to get unlimited storage resources for storing their data. Resources, however, are finite and their capacity is limited. Sharing strategies should provide enough resources for the enormous storage space requested. Within this context, it is necessary to use resources with maximal efficiency. In our second use-case, the resource administrator needs to optimise the utilisation of the group of available resources. The administrator needs to collect information that helps to recognise situations where the resources are not used in the best way, so that appropriate corrective actions can be taken.
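The local operation requirements above can be illustrated with a small sketch. It assumes a hypothetical `CacheStore` interface; the names `CacheStore`, `InMemoryStore`, `free_space`, `reserve` and `gather_space` are illustrative and do not come from this thesis. Every location exposes the same operations regardless of its underlying technology, and a client gathers storage incrementally from several locations until its request is met or the shared resources are exhausted.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CacheStore(ABC):
    """Hypothetical uniform interface that every location would expose."""

    name: str  # identifier of the location (assumed for illustration)

    @abstractmethod
    def free_space(self) -> int:
        """Return the number of bytes currently available."""

    @abstractmethod
    def reserve(self, size: int) -> int:
        """Reserve up to `size` bytes and return the amount actually granted."""


class InMemoryStore(CacheStore):
    """Toy implementation standing in for one particular storage technology."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.used = 0

    def free_space(self) -> int:
        return self.capacity - self.used

    def reserve(self, size: int) -> int:
        # Grant as much as is available, never more than requested.
        granted = max(0, min(size, self.free_space()))
        self.used += granted
        return granted


def gather_space(stores: List[CacheStore], needed: int) -> Dict[str, int]:
    """Collect space from each location in turn until `needed` bytes are met.

    Returns a mapping of location name to bytes granted. The total may be
    smaller than `needed` when the shared resources are exhausted.
    """
    grants: Dict[str, int] = {}
    remaining = needed
    for store in stores:
        if remaining <= 0:
            break
        granted = store.reserve(remaining)
        if granted > 0:
            grants[store.name] = granted
            remaining -= granted
    return grants
```

The in-order, take-what-is-available strategy used by `gather_space` is only one of the possible strategies for obtaining capacity from shared resources; because the clients depend only on `CacheStore`, a different strategy or a different storage technology could be swapped in without changing them.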

2.3.2 Collective Operation Requirements

This section presents the requirements for a proposed system in relation to some general aspects of collective temporary data operation. These requirements are motivated by the use-cases presented in Section 2.1.1.

• Accounting data activity

It is necessary to collect information about resource usage and data actions performed by the system. Accounting processes require that resource consumption be measured, rated, assigned, and registered between participants. Tracing makes it possible to establish how efficiently resources are used. Furthermore, it makes it possible to establish the behaviour, state, and activity of individual components and to evaluate their effect on the collective system.

In the second use-case, it is necessary to establish how the level of resource utilisation evolves over a specific time period. In the same way, that information can be compared with the traces of similar components. This makes it possible to establish whether there is a trend of unbalanced resource utilisation between locations. Data activity accounting is highly important for a general analysis and evaluation of the system's functioning. Finally, it is also necessary to establish usage trends and access patterns.

• Flexibility to choose schemes and strategies

Many questions must be raised in every situation related to the management of distributed temporary data. Each situation has particular characteristics and needs that make it difficult to propose a universal solution. A temporary data management system must be able to deal with a large number of strategies, parameters and options. Depending upon the choices made in each situation, a variety of schemes and strategies are available for the effective and efficient control and operation of temporary data.

In the second use-case, when the relationships among components and locations change the distribution of workload, the administrator must change the operation of the system to get the expected effect. This configuration capability permits a choice to be made among the schemes and strategies that affect the system's functioning.

• Performance Monitoring

An absolute prerequisite for the management of temporary data is the ability to measure the performance of data operations: we cannot hope to manage and control a system or an activity unless we can monitor its performance. One of the difficulties of performance monitoring is obtaining and using the appropriate information that describes performance. This information must be fully specified.

In the second use-case, the resource administrator requests appropriate information in order to determine whether the main cause of a long response time is the excessive level of utilisation of some resources in comparison to others. Detailed information makes it possible to decide on the appropriate correction. In this way, performance monitoring must provide information about

– availability, percentage of time that components are available

– response time, time to execute an action

– effectiveness, percentage of successful operations

– throughput, rate of operations processed

– utilisation, percentage of capacity used

• Enabling the control of storage resource

The infrastructure to operate on temporary data in the grid is gathered from diverse resources provided by multiple locations. Each location must permit its partners to use its resources automatically to store data. This capacity must be available to the grid and is operated by the temporary data management system. In the second use-case, the resource administrator modifies some operations and control parameters of the components of the system. This includes remote components that provide functions to command their individual operations. These functions must be available to the collective system.

• Coordination for effective operation

The use of distributed resources and components requires proper interaction between them in order to act together effectively. This requires that components support operations that make possible common organisational actions aimed at achieving a global effect on the functional system. In the second use-case, changing the distribution of workload across grid locations requires that coherent actions regulate work distribution between components.

• Detection of faults

The coordination of temporary data between different locations requires maintenance of the proper operation of the system as a whole. Typically, several resources and components work in parallel; if a fault occurs, it is important to determine, as rapidly as possible, exactly where the fault is, and to reconfigure or modify the components in such a way as to minimise the impact on operation. In the second use-case, the resource administrator needs to get timely and detailed information about the failure. This information must help to determine the problem. With this information, the resource administrator must be able to restore the system to a functioning state.
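The performance indicators listed under Performance Monitoring (availability, response time, effectiveness, throughput and utilisation) can be made concrete with a short sketch. The record layout `OperationRecord`, the function `summarise` and all of its parameters are assumptions made for illustration; this section does not prescribe how the indicators are represented or computed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OperationRecord:
    """One observed data operation (hypothetical record layout)."""
    duration_s: float  # time taken to execute the action, in seconds
    succeeded: bool    # whether the operation completed successfully


def summarise(
    records: List[OperationRecord],
    window_s: float,
    uptime_s: float,
    used_bytes: int,
    capacity_bytes: int,
) -> Dict[str, float]:
    """Compute the five indicators for one component over one time window."""
    if window_s <= 0 or capacity_bytes <= 0:
        raise ValueError("window_s and capacity_bytes must be positive")
    count = len(records)
    successes = sum(1 for r in records if r.succeeded)
    total_time = sum(r.duration_s for r in records)
    return {
        # percentage of the window during which the component was available
        "availability_pct": 100.0 * uptime_s / window_s,
        # mean time to execute an action (0.0 when nothing was observed)
        "mean_response_time_s": total_time / count if count else 0.0,
        # percentage of successful operations
        "effectiveness_pct": 100.0 * successes / count if count else 0.0,
        # rate of operations processed
        "throughput_ops_per_s": count / window_s,
        # percentage of capacity used
        "utilisation_pct": 100.0 * used_bytes / capacity_bytes,
    }
```

Computing the same summary for each location over the same window would give the resource administrator of the second use-case a common form in which to compare components, for example to check whether a long response time coincides with a much higher utilisation at one location than at the others.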

In document Grid Caching: Specification and Implementation of Collaborative Cache Services for Grid Computing (Page 58-62)