
GridICE: monitoring the user/application activities on the grid



Cristina Aiftimiei1, Sergio Andreozzi2, Guido Cuscela3, Stefano Dal Pra1, Giacinto Donvito3, Vihang Dudhalkar3, Sergio Fantinel4, Enrico Fattibene2, Giorgio Maggi3, Giuseppe Misurelli2 and Antonio Pierro3

1 INFN-Padova - Ist. Naz. di Fisica Nucleare, via F. Marzolo, 8 - 35131 Padova - ITALY

2 INFN-CNAF - Viale Berti Pichat, 6/2 40126 Bologna (Italy)

3 INFN-Bari - Bari University, Via Orabona 4, Bari 70126, Italy

4 INFN-Legnaro - Lab. Naz. di Legnaro, viale dell’Università n. 2, 35020 Legnaro (PD) - ITALY

E-mail: cristina.aiftimiei@pd.infn.it, sergio.andreozzi@cnaf.infn.it, guido.cuscela@ba.infn.it, stefano.dalpra@pd.infn.it, giacinto.donvito@ba.infn.it, vihang007@gmail.com, sergio.fantinel@lnl.infn.it, enrico.fattibene@cnaf.infn.it, giorgio.maggi@ba.infn.it, giuseppe.misurelli@cnaf.infn.it, antonio.pierro@ba.infn.it

Abstract. Monitoring grid user activity and application performance is extremely useful for planning resource usage strategies, particularly in the case of complex applications. Large VOs, like the LHC ones, do their monitoring by means of dashboards. Other VOs or communities, such as BioinfoGRID, are characterized by a greater diversification of application types, so the effort needed to provide a dashboard-like monitor is particularly heavy. In this paper we present the improvements introduced in GridICE, a general grid monitoring tool, to provide reports on resource usage with the details of VOMS groups, roles and users.

By accessing the GridICE web pages, grid users can get all the information they need to keep track of their activity on the grid. In the same way, the activity of a VOMS group can be distinguished from the activity of the entire VO. In this paper we briefly present the features and advantages of this approach and, after discussing the requirements, describe the software solutions, the middleware and the prerequisites needed to manage and retrieve the user’s credentials.

1. Introduction

In the Grid environment, heterogeneous resources are put together and made available to users for a variety of different applications. In this complex environment it is essential to monitor resources and applications in order to help detect faulty situations, contract violations and user-defined events. GridICE [1] is an open source distributed monitoring tool designed for Grid systems to provide both fabric and application monitoring. It promotes the adoption of the standard Grid Information Service interfaces, protocols and data models. The project was started in late 2002 within the EU-DataTAG [2] project and is evolving in the context of EU-EGEE [3] and related projects (BioinfoGRID [4], EUChinaGrid [5], EUMedGRID [8], etc.). GridICE is fully integrated with the gLite middleware [10]; in fact, its metering and publishing services can be configured via the gLite installation mechanisms.


Figure 1. Bari Farm usage monitoring provided by GridICE (proof of concept of a monitoring with the group-role details)

Figure 2. GridICE architecture

One of the main issues for Grid users is monitoring their own applications. Moreover, different types of users need different data aggregation views depending on their Grid role. As a first solution to these requirements, a new group-level data aggregation, based on the user privileges provided by VOMS (Virtual Organization Membership Service), has recently been added to GridICE.

The paper is organized as follows: section 2 presents the main features of GridICE and its synergies with LEMON; section 3 explains how GridICE allows users (on the basis of their privileges) to monitor "their" resources and applications; finally, section 4 presents the conclusions.


2. GridICE architecture and features

GridICE is structured in three main components (see Figure 2):

• the sensors that perform the measurement of the monitored entities;

• the site collector, which aggregates the information produced by the different sensors installed within a site domain and publishes it using the Grid Information Service based on the Lightweight Directory Access Protocol (LDAP);

• the server, which performs several scheduled functions such as discovering new sites, gathering the measured data and storing them in a relational database, and presenting the results through a web interface and in XML format. The GridICE server periodically queries a set of BDII nodes (chosen by the server administrator) containing information about a number of source nodes. An up-to-date list of active source nodes is created by comparing the list retrieved from the BDII with the one already known to GridICE. In this way the server configuration is updated so that data are collected from all the sources, including the newly discovered ones. Data collection is performed by periodically running different types of plug-ins, with a frequency that depends on the type of service monitored.
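The discovery and collection cycle of the server can be sketched as follows. This is a minimal illustration, not GridICE code: the BDII query itself is omitted, source names are plain strings, and the `Plugin` class, its `period_s` values and the policy of keeping sources that disappear from the BDII are all assumptions made for the example.

```python
from dataclasses import dataclass


def update_sources(known: set, from_bdii: set) -> tuple:
    """Compare the BDII list with the known one; return (all_sources, new_sources).

    Assumption: sources that disappear from the BDII are kept, not dropped.
    """
    new_sources = from_bdii - known
    return known | from_bdii, new_sources


@dataclass
class Plugin:
    service_type: str                 # e.g. "CE", "SE" (hypothetical labels)
    period_s: float                   # hypothetical polling period in seconds
    last_run: float = float("-inf")   # never run yet

    def due(self, now: float) -> bool:
        return now - self.last_run >= self.period_s


def run_due_plugins(plugins: list, now: float) -> list:
    """Run (here: only record) every plug-in whose period has elapsed."""
    ran = []
    for p in plugins:
        if p.due(now):
            p.last_run = now          # a real server would query its sources here
            ran.append(p.service_type)
    return ran
```

Keeping the schedule per plug-in type, rather than per source, mirrors the statement that the collection frequency depends on the type of service monitored.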

2.1. GridICE and LEMON

LEMON [13] is a site monitoring system based on a server/client architecture. On every monitored node, a monitoring agent launches sensors to retrieve monitoring information and communicates with them using a push/pull protocol. The extracted monitoring samples are stored in a local cache and forwarded to a central Measurement Repository over UDP or TCP, with or without authentication/encryption of the data samples. LEMON provides complete monitoring of a site: sensors are available to collect information from remote entities such as switches or electrical power distribution systems. The Measurement Repository can be interfaced to a relational database or a flat-file backend for storing the received samples. A web-based interface is provided for visualizing the data.
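The agent behaviour described above (sensor sampling, local cache, forwarding to the Measurement Repository) can be illustrated with the following sketch. It does not reproduce LEMON's implementation or wire format: the `Sample` and `Agent` classes, the numeric `metric_id` and the pluggable `transport` callable, which stands in for the UDP or TCP sender, are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sample:
    node: str
    metric_id: int          # hypothetical numeric metric identifier
    timestamp: float
    value: object


@dataclass
class Agent:
    node: str
    sensors: dict                   # metric_id -> zero-argument callable returning a value
    transport: Callable             # sends a list of samples; raises OSError on failure
    cache: list = field(default_factory=list)

    def poll(self, now=None):
        """Pull one value from every sensor and append it to the local cache."""
        now = time.time() if now is None else now
        for metric_id, read in self.sensors.items():
            self.cache.append(Sample(self.node, metric_id, now, read()))

    def flush(self) -> int:
        """Forward cached samples; keep them cached if forwarding fails."""
        if not self.cache:
            return 0
        batch = list(self.cache)
        try:
            self.transport(batch)
        except OSError:
            return 0                # samples stay in the cache for the next flush
        del self.cache[:len(batch)]
        return len(batch)
```

Retaining unsent samples on a transport error is one plausible use of the local cache; whether LEMON retries in exactly this way is an assumption of the sketch.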

GridICE uses LEMON not only for the sensors it provides for site fabric monitoring, but also to transport the data to the central node (collector), where the data are collected and published after translation into the LDAP Data Interchange Format (LDIF [12]).
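The LDIF translation step can be illustrated with a small renderer that follows the value-encoding rules of RFC 2849 [12]: values that are not "safe strings" (non-ASCII, containing NUL, CR or LF, or starting with a space, ':' or '<') are base64-encoded and written with '::'; as the RFC recommends, values ending with a space are encoded too. The attribute names in the example are hypothetical and do not reflect the actual GridICE schema, and line folding is omitted.

```python
import base64


def _is_safe(value: str) -> bool:
    """RFC 2849 SAFE-STRING check (plus the trailing-space recommendation)."""
    if value == "":
        return True
    if value[0] in " :<" or value.endswith(" "):
        return False
    return all(0 < ord(c) < 128 and c not in "\r\n" for c in value)


def to_ldif(dn: str, attributes: dict) -> str:
    """Render one entry as LDIF; unsafe values are base64-encoded with '::'."""
    lines = []
    for name, values in [("dn", [dn])] + list(attributes.items()):
        items = values if isinstance(values, list) else [values]   # multi-valued attrs
        for v in items:
            v = str(v)
            if _is_safe(v):
                lines.append(f"{name}: {v}")
            else:
                encoded = base64.b64encode(v.encode("utf-8")).decode("ascii")
                lines.append(f"{name}:: {encoded}")
    return "\n".join(lines) + "\n"
```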

In addition to the LEMON sensors, GridICE provides its own sensors for specific measurements:

• the job monitoring sensor, built to provide information related to jobs submitted to a CE (Computing Element). The latest version of the job monitoring sensor collects the VOMS attributes of each monitored job in order to correlate the job with a given user, a particular VO or even a specific group inside a VO.

• the LRMSinfo sensor, which gives a measurement of resource usage in a site.
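Correlating a job with a VO, group and role essentially means parsing the VOMS FQANs (Fully Qualified Attribute Names) attached to the job's proxy, which have the form /vo/group/subgroup/Role=role/Capability=cap. The sketch below assumes that the sensor already has the list of FQANs for each job and, as is customary with VOMS proxies, treats the first FQAN as the primary one; the function names are illustrative.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Fqan(NamedTuple):
    vo: str
    group: str                 # full group path, e.g. "/bio/analysis"
    role: Optional[str]        # None when the role is absent or "NULL"


def parse_fqan(fqan: str) -> Fqan:
    """Split an FQAN such as '/bio/analysis/Role=production/Capability=NULL'."""
    group_parts, role = [], None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            role = None if part[5:] == "NULL" else part[5:]
        elif part.startswith("Capability="):
            continue                   # capabilities are ignored in this sketch
        else:
            group_parts.append(part)
    return Fqan(vo=group_parts[0], group="/" + "/".join(group_parts), role=role)


def jobs_per_group(job_fqans: list) -> Counter:
    """Count jobs per (group, role), using each job's primary (first) FQAN."""
    counts = Counter()
    for fqans in job_fqans:
        f = parse_fqan(fqans[0])
        counts[(f.group, f.role)] += 1
    return counts
```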

The current GridICE monitoring system still uses an old version of LEMON (2.5.4), which is no longer supported by the LEMON authors. The integration of the new LEMON version into GridICE was recently completed, and a test phase has started with the aim of including the new LEMON in the official GridICE release. The latest LEMON version has several new features that can also benefit GridICE. Apart from the intuitive web-based interface for checking the local farms, one important new feature is the sensorAPI, which local administrators can use to create their own sensors for specific information they are interested in. This feature greatly simplifies the creation of new GridICE sensors for grid-specific tasks. Another new feature worth mentioning is the modular configuration: the LEMON client/server supports small configuration files for its various parts, which makes the integration with GridICE fairly easy.
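As an illustration of why small configuration fragments ease integration, the sketch below merges conf.d-style fragments in file-name order, so that a GridICE-specific fragment can override a base setting without editing it. The `key = value` syntax, the file names and the keys are hypothetical and are not LEMON's actual configuration format.

```python
from pathlib import Path


def parse_fragment(text: str) -> dict:
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def merge_fragments(fragments: dict) -> dict:
    """Merge fragments in file-name order; later files override earlier ones."""
    merged = {}
    for name in sorted(fragments):
        merged.update(parse_fragment(fragments[name]))
    return merged


def load_fragments(directory: str) -> dict:
    """Read every *.conf file of a conf.d-style directory (hypothetical layout)."""
    return {p.name: p.read_text() for p in Path(directory).glob("*.conf")}
```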


Figure 3. List of roles for an authenticated user

3. Grid monitoring from the different users’ perspectives

The presentation of the data collected by GridICE is designed around the requirements of different types of users, each dealing with a different abstraction level of a Grid: the Virtual Organization level, the Grid Operation Center level, the Site Administration level and the End-User level.

• Virtual Organization managers may wish to observe and analyze the performance of the "actual" system they are using (which can change dynamically over time).

• Both site administrators and grid operation center managers can require performance analysis and fault detection of the resources for which they are responsible.

• Grid Service developers may require the possibility of analyzing the behavior of their applications (e.g., how a resource broker dispatches jobs over a set of available resources).

GridICE automatically recognizes the user from the personal certificate installed in the user's browser, in particular from its distinguished name, presented over HTTPS during the authentication phase. It then retrieves the user's role (VO manager, site manager, etc.) from a local database filled with user credentials and maintained by the GridICE development team (a more general solution could be provided in the future). The GridICE server also allows unauthenticated access in parallel with authenticated access. However, authenticated users automatically get access to information selected on the basis of their role, without having to enter the appropriate query in the GridICE web pages. Moreover, authenticated users with a more specific role or privilege can retrieve sensitive information, such as the names of the users submitting jobs, which is not shown to unauthenticated or standard users. In fact, a standard user (i.e. an authenticated user with no privileges) can only see information (see Table 1) about his own jobs (see figures 4, 6).
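The role-based selection described above can be sketched as follows. Here an in-memory SQLite table stands in for the local credentials database, the DN is assumed to have already been extracted from the browser certificate by the HTTPS server, each DN is given a single role (whereas Figure 3 shows that a user may hold several), and the `SITE` key, the role names and the choice of hiding `SUBJECT` and `USER` from unauthenticated users are hypothetical. The other field names follow Table 1.

```python
import sqlite3

SENSITIVE = ("SUBJECT", "USER")   # fields that identify the submitter (Table 1)


def open_role_db() -> sqlite3.Connection:
    """In-memory stand-in for the local credentials database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE roles (dn TEXT, role TEXT, scope TEXT)")
    return db


def lookup_role(db, dn):
    """Return (role, scope) for a certificate DN; no DN means anonymous access."""
    if dn is None:
        return ("anonymous", None)
    row = db.execute("SELECT role, scope FROM roles WHERE dn = ?", (dn,)).fetchone()
    return tuple(row) if row else ("user", None)   # unknown DN: standard user


def visible_jobs(jobs, dn, role, scope):
    """Filter job records according to the caller's role."""
    if role == "site_manager":
        return [j for j in jobs if j["SITE"] == scope]
    if role == "vo_manager":
        return [j for j in jobs if j["VO"] == scope]
    if role == "user":
        return [j for j in jobs if j["SUBJECT"] == dn]
    # anonymous: keep the records but strip the submitter's identity
    return [{k: v for k, v in j.items() if k not in SENSITIVE} for j in jobs]
```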

On the other hand, user identification in GridICE (see figure 3) offers new possibilities to registered users with specific privileges or roles (Site Manager, VO Manager, etc.). For example, a Site Manager can now identify all the users submitting jobs to his site in order to recognize misuse of the resources by a particular user.

The same feature is also useful to ROC managers, who can track the users submitting jobs to the resources they are responsible for. The same holds for a VO Manager, who may be interested in following the jobs submitted by members of his VO or even of a particular VOMS group in his VO (see figure 1), or in the farm usage, calculated as the number of computing hours provided by each site (see figure 7).
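The site-level views mentioned above reduce to simple aggregations over job records. In this sketch the computing hours of a site are assumed to be the sum of job wall-clock times given in seconds (CPU time could be used instead), and the `SITE` key is hypothetical.

```python
from collections import Counter, defaultdict


def computing_hours_per_site(jobs):
    """Sum job wall-clock time per site and convert seconds to hours."""
    hours = defaultdict(float)
    for j in jobs:
        hours[j["SITE"]] += j["WALLTIME"] / 3600.0
    return dict(hours)


def jobs_per_user(jobs, site):
    """Count jobs per submitter DN at one site (e.g. to spot resource misuse)."""
    return Counter(j["SUBJECT"] for j in jobs if j["SITE"] == site)
```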

In summary, most of the information shown to authenticated users, in particular the non-sensitive part, is also available in the current GridICE version; however, to reach this information the user has to go through several pages and use the appropriate search fields. With authenticated access, only the relevant information is displayed.


Field         Description
NAME          Job name
JOB ID        Local LRMS job id
GRID ID       Grid job unique id
USER          Local mapped username
VO            User VO name
QUEUE         Queue
QTIME         Job creation time
START         Job start time
END           Job end time
STATUS        Job status
CPUTIME       Job CPU usage time
WALLTIME      Job walltime
MEMORY        Job memory usage
VMEMORY       Job virtual memory usage
EXEC HOST     Execution host (WN)
EXIT STATUS   Batch system exit status
SUBJECT       User DN
VOMS ROLES    User VOMS roles

Table 1. Attributes measured by the Job Monitoring sensor
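For illustration, a job record carrying the attributes of Table 1 could be represented as below. The Python types and units (seconds, kB) are assumptions, as is the `cpu_efficiency` helper, which is not a metric defined in this paper.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class JobRecord:
    """One row of Table 1; types and units are assumptions."""
    name: str
    job_id: str                       # local LRMS job id
    grid_id: Optional[str]            # None for jobs not submitted through the Grid
    user: str                         # local mapped username
    vo: str
    queue: str
    qtime: datetime                   # job creation time
    status: str
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    cputime_s: float = 0.0
    walltime_s: float = 0.0
    memory_kb: int = 0
    vmemory_kb: int = 0
    exec_host: Optional[str] = None   # worker node
    exit_status: Optional[int] = None
    subject: str = ""                 # user DN
    voms_roles: List[str] = field(default_factory=list)

    def cpu_efficiency(self) -> Optional[float]:
        """CPU time over wall time (illustrative metric, not from the paper)."""
        if self.walltime_s <= 0:
            return None
        return self.cputime_s / self.walltime_s
```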

Figure 4. Table showing the user's own jobs; the user is identified by means of his browser certificate

3.1. Improving GridICE server performance

Several components can be tuned to obtain a faster GridICE server. One of them is the PostgreSQL database. A first-level improvement comes from better-tuned database installation settings, such as placing database tables or other components on separate disks (where possible). Similar “installation tips” are described in the appendix of the GridICE installation manual.
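In PostgreSQL, placing tables on separate disks is done with tablespaces (CREATE TABLESPACE ... LOCATION and ALTER TABLE ... SET TABLESPACE). The sketch below only generates such statements from a placement map; the tablespace, directory and table names are hypothetical and are not taken from the GridICE schema or installation manual. The target directory must already exist, be empty and be owned by the PostgreSQL system user, and moving a table locks it while its data files are copied.

```python
def tablespace_ddl(placement: dict) -> list:
    """Generate PostgreSQL DDL that moves tables onto tablespaces on other disks.

    placement maps a tablespace name to (absolute directory, [table names]).
    Names are interpolated unquoted, so they must come from trusted admin input.
    """
    statements = []
    for ts, (directory, tables) in placement.items():
        statements.append(f"CREATE TABLESPACE {ts} LOCATION '{directory}';")
        for table in tables:
            statements.append(f"ALTER TABLE {table} SET TABLESPACE {ts};")
    return statements
```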

Moreover, as part of the GridICEdb schema and software, ad hoc maintenance tools have been developed and schema components have been added in order to give faster responses to a set of common client operations, such as graph generation. Many of these operations are now much faster thanks to recently introduced aggregation tables. These tables provide daily aggregated statistics for jobs, computed with respect to both the end date and the start date (the latter optionally allows compliance with the conventions adopted by other tools).
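A daily aggregation table of the kind described above can be sketched with SQLite standing in for PostgreSQL (where `end_time::date` would replace SQLite's `date()`). Table and column names are hypothetical; the `by_column` field records whether a row was aggregated by end date or by start date, so that both variants live in one table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE job (id INTEGER PRIMARY KEY, vo TEXT, start_time TEXT, end_time TEXT,
                  walltime_s REAL);
CREATE TABLE job_daily (day TEXT, by_column TEXT, vo TEXT, njobs INTEGER,
                        walltime_s REAL, PRIMARY KEY (day, by_column, vo));
"""


def refresh_daily(db: sqlite3.Connection) -> None:
    """Rebuild daily statistics keyed both by end date and by start date."""
    db.execute("DELETE FROM job_daily")
    for column in ("end_time", "start_time"):   # fixed names, safe to interpolate
        db.execute(
            "INSERT INTO job_daily "
            f"SELECT date({column}), '{column}', vo, COUNT(*), SUM(walltime_s) "
            f"FROM job WHERE {column} IS NOT NULL GROUP BY date({column}), vo"
        )
    db.commit()
```

Graphs can then read a handful of pre-aggregated rows per day instead of scanning the whole job table.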

Apart from adding elements to the existing schema, altering and optimizing it is also ongoing work, currently under test. The main goal is to reduce the size and number of tables involved in a particular set of queries, and to reduce the number of queries needed to perform a given task.
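One common way to reduce the number of queries needed for a task is to replace a per-item loop of queries with a single grouped query. The sketch below, again using SQLite and a hypothetical `job` table, shows two equivalent ways of counting jobs per VO: the first issues one query to list the VOs plus one query per VO, the second a single GROUP BY query.

```python
import sqlite3


def totals_one_query_per_vo(db: sqlite3.Connection) -> dict:
    """Naive approach: one query to list VOs, then one query per VO."""
    vos = [r[0] for r in db.execute("SELECT DISTINCT vo FROM job")]
    return {vo: db.execute("SELECT COUNT(*) FROM job WHERE vo = ?", (vo,)).fetchone()[0]
            for vo in vos}


def totals_single_query(db: sqlite3.Connection) -> dict:
    """Same result with a single grouped query."""
    return dict(db.execute("SELECT vo, COUNT(*) FROM job GROUP BY vo"))
```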


Figure 5. Job’s exit status per Site.

Figure 6. Job exit status monitoring for a specific user (browser user certificate needed).

Figure 7. Farm usage, calculated as the number of computing hours provided by each site.

4. Conclusions

GridICE, with its security layer and strict authorization mechanism, allows users, VO Managers and Site Administrators to obtain detailed usage information about their jobs while respecting their privacy. The security layer and authorization mechanism have been briefly described in this document. Future work aims to improve performance, features and reliability. A performance improvement has already been achieved by including aggregated and summarized data and by tuning the PostgreSQL [15] installation and configuration. To test and improve the reliability of the GridICE job monitoring data, a comparison with the batch-system logs and with the data collected by DGAS [14] is in progress. Further improvements will concern extending the job monitoring sensor to observe local jobs (jobs submitted without using the Grid layer), and extending the notification system so that events occurring very close together in time are compacted into a single message.
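The planned compaction of notifications could work as a time-window grouping: events from the same source are merged into one message as long as each follows the previous one within a given window. The sketch below is only one possible design; the `Event` fields, the grouping key (source only) and the message format are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float      # seconds
    source: str           # e.g. host name (hypothetical field)
    text: str


def _summarize(group):
    first = group[0]
    if len(group) == 1:
        return f"{first.source}: {first.text}"
    return f"{first.source}: {first.text} (+{len(group) - 1} more)"


def compact(events, window_s: float):
    """Merge events of the same source that arrive within window_s of the
    previous event of that burst; return one message per burst."""
    messages = []
    open_groups = {}        # source -> events of its current burst
    for ev in sorted(events, key=lambda e: e.timestamp):
        group = open_groups.get(ev.source)
        if group and ev.timestamp - group[-1].timestamp <= window_s:
            group.append(ev)
        else:
            if group:
                messages.append(_summarize(group))   # burst closed by a gap
            open_groups[ev.source] = [ev]
    for group in open_groups.values():               # flush still-open bursts
        messages.append(_summarize(group))
    return messages
```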


5. Acknowledgments

We would like to thank the funding projects BioinfoGRID [4], EGEE [3], EUChinaGrid [5], EU-IndiaGrid [6], EU-MedGrid [8], LIBI [7] and OMII-Europe [9] for supporting our work. Many thanks also to the LEMON team for their fruitful collaboration and prompt support.

This work makes use of results produced by the Enabling Grids for E-sciencE (EU-EGEE) project, a project co-funded by the European Commission (under contract number INFSO-RI-031688) through the Sixth Framework Programme. EGEE brings together 91 partners in 32 countries to provide a seamless Grid infrastructure available to the European research community 24 hours a day.

References

[1] GridICE Website. http://grid.infn.it/gridice.

[2] European DataTAG project. http://datatag.web.cern.ch/datatag/.

[3] Enabling Grids for E-sciencE, 2007. http://www.eu-egee.org.

[4] BioinfoGRID: Bioinformatics Grid Application for life science. http://www.bioinfogrid.eu/.

[5] EUChinaGrid project. http://www.euchinagrid.org/.

[6] EU-IndiaGrid project. http://www.euindiagrid.eu/.

[7] International Laboratory on Bioinformatics. http://www.libi.it/.

[8] EUMedGrid project. http://www.eumedgrid.org/.

[9] OMII-Europe. http://omii-europe.org.

[10] E. Laure et al. Programming the Grid with gLite. Technical Report EGEE-TR-2006-001, CERN, 2006.

[11] LEMON - LHC Era Monitoring. http://cern.ch/lemon/.

[12] G. Good. The LDAP Data Interchange Format (LDIF). IETF RFC 2849, Jun 2000.

[13] LEMON - LHC Era Monitoring. http://cern.ch/lemon/.

[14] DGAS Web Page. http://www.to.infn.it/grid/accounting/.

[15] PostgreSQL Web Page. http://www.postgresql.org/.
