Using FroNTier/Squid for distributed caching

4.4 The LCG-3D project

4.4.2 Using FroNTier/Squid for distributed caching

The LCG-3D project is evaluating FronNTier [61] as a data distribution technique for Tier-1 to Tier-2 replication. Distributed caching using FronNTier/Squid has been designed and implemented within the CDF (the Collider Detector at Fermilab) experiment. Its purpose is to deliver read-only data stored in a database to multiple users distributed world-wide. The cache system allows users to retrieve data even when they are decoupled from the central database. The generic architecture of a FroNTier/Squid installation is shown in Figure 4.6.

Client Database Cache Conversion layer HTTP HTTP JDBC FroNTier Servlet running under Tomcat

Squid Server FroNTier Client API

Figure 4.6: The FroNTier/Squid architecture

This architecture provides no single point of failure, scales easily to thousands of clients, and it significantly decreases the load on the central database.

Client queries to the database are translated by the FronNTier Client API library into HTTP requests that are delivered to a Squid web cache. Only in case the data of interest are not found in the cache, the query is sent to the database. Results of the query are then put into XML files by the FroNTier server and stored into the Squid cache so that the same query, the next time it is issued, can find the data in the cache.

4.4. The LCG-3D project 55

This scenario works well when clients use repeatedly the same queries on the same read-only database tables, which is often the case in high energy Physics applications.

4.4.2.1 Consistency issues in FroNTier/Squid

One of the drawbacks of this solution is that the consistency among the central database and the distributed caches is not enforced. A squid cache server is not aware of possible database updates, so that applications running in a FroNTier/Squid environment need to be carefully designed and tested to avoid possibly subtle consistency issues caused by stale cached data. Periodically or on demand, cached objects must be refreshed.

The CMS experiment, which is one of the main user of the FroNTier/Squid solution, has agreed to a policy of never changing objects that are stored into the central database, and ultimately other cache refresh options will be implemented [50]. A mechanism that provides periodic cache refresh is implemented as an expiration time included as a meta tag in the cached XML file, which causes the file to expire the next day. This is an adequate solution for the short term. However, periodically reload- ing every cached object at every participating site will have significant performance implications.

4.4.3 Applications managed by the LCG-3D replicated environment As we have already seen in the previous sections, data in the LHC experiments are mainly event and non-event data. However, the set of data used either directly or indirectly by users and services includes also different types of metadata. Several middleware services, for example the Replica Catalogue, use relational databases to store data. These databases need also to be replicated in order to obtain the advan- tages already explained in Chapter 3. In this section we briefly review the applications whose multi-tier replication is based on the LCG-3D replicated environment.

ATLAS Conditions Databases Tier-0 on-line to off-line and Tier-0 off-line to Tier-

1 sites replication of ATLAS conditions databases is achieved with two Oracle Streams deployments (see Section 4.3). Data in conditions databases are in- serted and retrieved using the COOL API.

AMI The Atlas Metadata Interface (AMI) is an ATLAS middleware service for

dataset selection. AMI stores metadata about logical Physics datasets allowing users to choose the data they are interested in. AMI provides capabilities that

are complementary to the ones provided by DonQuijote-2 (DQ2), the ATLAS Data Management framework. AMI can be deployed using Oracle and MySQL databases.

CMS Conditions Databases In CMS, Oracle Streams is used to replicate the con-

ditions database from the on-line system to the Tier-0 off-line system. Two levels of FroNTier/Squid caches are then used to provide read-only facilities at Tier-1 and Tier-2 sites.

LHCb Conditions Databases The LHCb experiment uses COOL to store and man-

age conditions data. Tier-0 to Tier-1 replication is done using Oracle Streams.

AMGA AMGA is a metadata catalogue used to store metadata associated to logical

file names. AMGA is used in the WLCG project and the replication is done between Tier-0 and Tier-1 sites.

LFC LFC is the LCG File Catalogue (replica catalogue) used in the WLCG project.

Its main purpose is to store the association between logical and physical file names (see Section 5.2.1.4). This catalogue is the results of a re-engineering process done on a previous catalogue developed within the European DataGrid project, that had serious performance issues.

Chapter 5

Grid Computing

Driven by the need of more computational power and storage capacity to solve com- plex data analyses, Grid computing is becoming a widely used computing model in many research fields. As an example, in Chapter 4 we presented the scientific chal- lenge that physicists are facing at CERN, where the LHC will produce unprecedented amounts of data to analyse in order to investigate on high energy Physics. A Grid infrastructure will be used to store and analyse data coming from the LHC.

In this chapter we better define the characteristics of a Grid and we introduce the software used to manage a Grid, what we call middleware. We focus on the aspects that make the Grid original, even if, as we will see, it would be more appropriate to talk about an evolution more than a revolution in distributed and parallel computing. In many cases, in fact, well known standards and software are used to create services that are part of the middleware. We discuss more in detail data management and replication issues, where we also highlight the lack of a synchronisation service for replicated data.

The structure of this chapter is the following. In Section 5.1 we introduce the main concepts that help to define Grid computing and differentiate it from other meth- ods of distributed and high performance computing. To give a practical example of Grid Computing we introduce the EGEE project, that manages the main European Grid infrastructure for e-Science applications. We also give an idea of what research communities are using the EGEE Grid. In Section 5.2 we explain what a middleware is and what are the main services that it uses to manage the resources and to execute user tasks. Two middleware examples are reviewed, the gLite middleware used in EGEE, and the Globus Toolkit which is widely used in many Grid infrastructures. We put more emphasis on data management topics, especially on services offered for

data replication, and we investigate on the support provided for replica consistency. In Section 5.3 we talk about Service Oriented Architectures and the Open Grid Ser- vice Architecture, which defines a model for creating and connecting Grid Services used in the Globus Toolkit starting from version 3. Tools for Data Access and In- tegration are introduced. In Section 5.4 we draw some conclusions about replica synchronisation in Grid computing, and state the main reasons for which we believe it is important to do research in this topic. The main issues in developing a Grid Replica Consistency Service are analysed, and the few previous and current efforts in this sector are cited.

In document The Replica Consistency Problem in Data Grids (Page 72-76)