Cloud Computing. Up until now

Academic year: 2021

Cloud Computing

Lecture 6

Grid Case Studies

2010-2011

Up until now…

• Introduction.

• Definition of Cloud Computing.

• Grid Computing:

• Schedulers

• Globus Toolkit

Summary

• Grid Case Studies:

• Monitoring: TeraGrid IS

• Data Transfer: LIGO

• Task Distribution: GEO600

TeraGrid (National Science Foundation)

• TeraGrid DEEP: integrates NSF’s 60 largest computers (more than 60 TF):

• More than 2 PB of online storage.

• National data visualization infrastructure.

• World’s most powerful computing network.

• TeraGrid WIDE Science Portals:

• Integration of scientific communities.

• More than 90 community data collections.

• Cooperation with other grid projects in Europe and Asia/Pacific.

• Provide a mechanism that allows all participants to publish and discover information about available capabilities:

• What are TeraGrid’s computing resources?

• Which features are provided by each resource?

• Where are the login services?

• Where can I access a particular data collection?

• Who has a weather forecast service?

• Provide a mechanism adapted to the TeraGrid open community:

Technical Issues

• Information is stored in legacy systems:

• Databases (different types, restricted access).

• Static and dynamic web interfaces.

• Multiple and varied database schemas:

• It is very complex to design an integrated database that supports all data types and relations.

• Many types of clients (browsers, SOAP, REST).

• Service availability is critical:

• Both TeraGrid management (testing, documentation and planning) and its users and partners depend on this service.

• Therefore it has a 99.5% availability goal.
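
To put the 99.5% goal in concrete terms, the short calculation below converts an availability target into a downtime budget. It is only an illustration: the function name, the 365-day year and the 30-day month are assumptions, not figures from TeraGrid.

```python
def downtime_budget_hours(availability: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime allowed by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


# 99.5% is the goal quoted above; the periods are illustrative assumptions.
yearly = downtime_budget_hours(0.995)
monthly = downtime_budget_hours(0.995, period_hours=30 * 24)
print(f"per year: {yearly:.1f} h, per 30-day month: {monthly:.1f} h")
# prints "per year: 43.8 h, per 30-day month: 3.6 h"
```

Under these assumptions the goal leaves less than two days of outage per year.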

TeraGrid Information Service Architecture

[Figure: TeraGrid Information Service architecture. Labels: Clients; Cache; WS/REST (HTTP GET); WS/SOAP; WS MDS4; Tomcat; WebMDS; Apache 2.0; TeraGrid Central Services; Resources.]

Registry

• Editors register available content:

• The local service maintains a registration in the central registry.

• Entries expire automatically, so they must be refreshed periodically.

• Editors keep ownership of their information systems (they can even participate in other grids).

• Indexing services pull content:

• Registry entries have access control.

• Registry entries have a link to the source.

• The cache tolerates service faults, etc.
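
The expiring registrations described above follow a soft-state pattern, which can be sketched as below. This is a minimal illustration, not the MDS4 implementation: the class, the ten-minute lifetime, the entry names and the example.org URLs are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    source_url: str    # link back to the service that owns the information
    expires_at: float  # time after which the entry is no longer valid


class SoftStateRegistry:
    """Hypothetical central registry whose entries vanish unless refreshed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}

    def register(self, name, source_url):
        """Create or refresh an entry; local services call this periodically."""
        self.entries[name] = Entry(source_url, self.clock() + self.ttl)

    def live_entries(self):
        """Drop expired entries and return the ones that are still valid."""
        now = self.clock()
        self.entries = {n: e for n, e in self.entries.items() if e.expires_at > now}
        return dict(self.entries)


# A fake clock keeps the example deterministic.
now = [0.0]
registry = SoftStateRegistry(ttl_seconds=600, clock=lambda: now[0])
registry.register("site-a-queues", "https://site-a.example.org/info")
registry.register("site-b-queues", "https://site-b.example.org/info")
now[0] = 500.0
registry.register("site-a-queues", "https://site-a.example.org/info")  # refresh
now[0] = 700.0
print(sorted(registry.live_entries()))  # prints ['site-a-queues']
```

Because a silent service simply ages out, the central registry never needs to be told that a resource has gone away.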

High Availability Architecture

[Figure: high availability architecture. Labels: info.dyn.teragrid.org; info.teragrid.org; TeraGrid Dynamic DNS; Clients; Resources and Partners.]

TGUP Batch Load & Queue Data

IIS provides queue and batch load information from all RP sites for TGUP to use in its system monitor.

<LoadRP xmlns="">
  <ComputeResourceLoad xmlns="">
    <ResourceID>pople.psc.teragrid.org</ResourceID>
    <SiteID>psc.teragrid.org</SiteID>
    <LoadInfo hostname="tg-login1.pople.psc.teragrid.org" timestamp="2009-11-11T13:46:19Z">
      <Load>
        <Type>queue</Type>
        <Value>98</Value>
      </Load>
      ...
    </LoadInfo>
  </ComputeResourceLoad>
</LoadRP>

http://portal.teragrid.org/
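
A client such as a portal could read this kind of document with a few lines of standard-library Python. The sketch below is an assumption about how one might consume it, not TGUP's actual code: the sample is the excerpt above with the elided part closed off and the empty xmlns attributes left out, and the function name is hypothetical. A document that uses real namespaces would need namespace-qualified tag names.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the excerpt; real documents may carry more elements.
SAMPLE = """\
<LoadRP>
  <ComputeResourceLoad>
    <ResourceID>pople.psc.teragrid.org</ResourceID>
    <SiteID>psc.teragrid.org</SiteID>
    <LoadInfo hostname="tg-login1.pople.psc.teragrid.org" timestamp="2009-11-11T13:46:19Z">
      <Load>
        <Type>queue</Type>
        <Value>98</Value>
      </Load>
    </LoadInfo>
  </ComputeResourceLoad>
</LoadRP>
"""


def resource_loads(xml_text):
    """Return {resource_id: {load_type: value}} for every resource listed."""
    root = ET.fromstring(xml_text)
    result = {}
    for resource in root.iter("ComputeResourceLoad"):
        loads = {}
        for load in resource.iter("Load"):
            loads[load.findtext("Type")] = int(load.findtext("Value"))
        result[resource.findtext("ResourceID")] = loads
    return result


loads = resource_loads(SAMPLE)
print(loads)  # prints {'pople.psc.teragrid.org': {'queue': 98}}
```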

TeraGrid Results

• Does not require deep modification or loss of ownership of legacy systems.

• Simple and consistent access mechanism.

• Integrates:

• Description of computing services and queue state.

• Registry of service and software availability.

• Centralized documentation.

• Test, validation and verification service: INCA, testing and execution portal.

[Figure: Globus Toolkit (GT4) components. Category labels: Security, Execution, Monitoring, Base, Data. Components: GridWay, GridFTP, Reliable File Transfer, GRAM, MDS4, CAS, Data Rep, Delegation, Replica Location, Java Runtime, C Runtime, Python Runtime, GSI-OpenSSH, MyProxy.]

LIGO

LIGO: Laser Interferometer Gravitational-Wave Observatory

• Goal: observe gravitational waves.

• Three physical detectors in two locations (plus the GEO detector in Germany).

• More than 10 centres for data analysis.

• Collaborators in more than 40 institutions.

LIGO

• LIGO records thousands of data channels, generating 1 TB/day of data during test periods:

• The resulting data are published, and data centres subscribe to the parts that local users want for analysis or storage.

• The data analysis produces further derived data:

• About 30% of LIGO’s total data.

• These are also published and replicated.

• More than 35 million files on the grid:

• More than 6 million unique files.

The Challenge

Replicate more than 1 TB/day of data to more than 10 locations.

Solution:

• A publish/subscribe model.

• Let scientists specify and discover data using application-level criteria (metadata).

• Let scientists locate copies of the data.

Technical Constraints

• Efficiency

• Avoid underusing bandwidth during transfers, especially on high-bandwidth (10 Gbps) links.
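
A rough calculation shows what the replication target means in terms of sustained bandwidth. It uses the figures quoted in the slides (1 TB/day, 10 locations, 10 Gbps links); treating a terabyte as 10^12 bytes and counting exactly 10 sites are assumptions.

```python
def sustained_rate_mbps(bytes_per_day: float) -> float:
    """Average rate in megabits per second needed to move a daily volume."""
    seconds_per_day = 24 * 60 * 60
    return bytes_per_day * 8 / seconds_per_day / 1e6


TB = 1e12  # decimal terabyte (assumption; the slides do not say which unit is meant)
per_site = sustained_rate_mbps(1 * TB)  # one copy of each day's data
all_sites = per_site * 10               # the same data sent to 10 locations
link_share = per_site / 10_000          # fraction of one 10 Gbps link

print(f"{per_site:.1f} Mbps per site, {all_sites:.1f} Mbps in total, "
      f"{link_share:.2%} of a 10 Gbps link")
# prints "92.6 Mbps per site, 925.9 Mbps in total, 0.93% of a 10 Gbps link"
```

On these numbers the average demand is a small fraction of link capacity, which suggests the efficiency concern is about how well individual transfers fill the link rather than about total volume.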

Lightweight Data Replicator

Combines three basic Globus services:

1. Metadata service (MDS): “What files are available?”

• Information about files such as size, md5, date, …

• Metadata propagation.

2. Globus Replica Location Service (RLS): “Where are the files?”

• Catalog service translates filenames into URLs.

• Maps files to locations.

3. GridFTP Service: “How can we copy files?”

• Server plus an adapted client.

• Used to replicate data between locations.
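
The way the three services answer their three questions in turn can be sketched with in-memory stand-ins. Nothing below calls a real Globus API: the catalogs, file names, hosts and function names are hypothetical, and only the gsiftp:// URL scheme is borrowed from GridFTP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    name: str
    size: int
    md5: str


# Hypothetical stand-ins for the metadata catalog and the replica catalog.
METADATA = [
    FileMetadata("H-R-0001.gwf", 1024, "aa11"),
    FileMetadata("H-R-0002.gwf", 2048, "bb22"),
    FileMetadata("L-R-0001.gwf", 4096, "cc33"),
]
REPLICAS = {
    "H-R-0001.gwf": ["gsiftp://site-a.example.org/data/H-R-0001.gwf"],
    "H-R-0002.gwf": [
        "gsiftp://site-a.example.org/data/H-R-0002.gwf",
        "gsiftp://site-b.example.org/data/H-R-0002.gwf",
    ],
}


def find_files(predicate):
    """Metadata service: which files match the scientist's criteria?"""
    return [m for m in METADATA if predicate(m)]


def locate(name):
    """Replica location: which URLs hold a copy of this file?"""
    return REPLICAS.get(name, [])


def plan_transfers(predicate, local_names):
    """Pick one source URL for every wanted file that is missing locally."""
    plan = {}
    for meta in find_files(predicate):
        urls = locate(meta.name)
        if meta.name not in local_names and urls:
            plan[meta.name] = urls[0]  # the copy itself would be done by GridFTP
    return plan


plan = plan_transfers(lambda m: m.name.startswith("H-"), local_names={"H-R-0001.gwf"})
print(plan)
# prints {'H-R-0002.gwf': 'gsiftp://site-a.example.org/data/H-R-0002.gwf'}
```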

LIGO Data Replicator Architecture

• Each participant has a machine dedicated to transferring files requested by local clients.

• The scheduler queries the metadata and replica catalogs to identify missing files, which are added to a priority list.

• The transfer daemon checks the list, transfers the files and updates the LRC.

• If a transfer fails it remains on the
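
One pass of a transfer daemon over a priority list might look like the sketch below. It assumes that a failed transfer simply stays queued for a later pass, which is a guess at how the truncated bullet above ends; the function names, the use of OSError to signal failure and the example URLs are hypothetical.

```python
import heapq


def run_transfer_pass(pending, copy_file, local_catalog):
    """Try every queued transfer once, lowest priority number first.

    pending is a heap of (priority, filename, source_url) tuples.
    Transfers that fail are returned in a new heap for the next pass.
    """
    retry = []
    while pending:
        priority, name, url = heapq.heappop(pending)
        try:
            copy_file(url, name)
        except OSError:
            heapq.heappush(retry, (priority, name, url))  # assumed: failures stay queued
        else:
            local_catalog.add(name)  # stands in for updating the LRC
    return retry


def flaky_copy(url, name):
    """Pretend copy that fails for one host, to exercise the retry path."""
    if "site-b" in url:
        raise OSError("connection refused")


pending = []
heapq.heappush(pending, (2, "f2", "gsiftp://site-b.example.org/f2"))
heapq.heappush(pending, (1, "f1", "gsiftp://site-a.example.org/f1"))
catalog = set()
remaining = run_transfer_pass(pending, flaky_copy, catalog)
print(sorted(catalog), [name for _, name, _ in remaining])  # prints ['f1'] ['f2']
```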

LIGO Results

• Complete LIGO/GEO experiment:

• Replicated 30 TB in 30 days.

• MTBF: 1 month.

• More than 35 million files on the LDR network.

• Performance limited by the chosen programming language (Python).

• Partnership with the Globus Alliance to include a version of the LDR in the Globus Toolkit.

Data Replication Service

• Data Replication Service (DRS)

• Reimplementation of LDR publish/subscribe capabilities.

• Uses Java-based WS-RF services.

• Uses the RLS and RFT services from GT4.

GEO600

GEO600 Observatory

• Goal (same as LIGO): observe gravitational waves.

• 600 m laser interferometer close to Hannover.

• Members of LIGO as well.

The Challenge

• Scan the data sets produced by GEO600, looking for gravitational waves:

• Very complex signal processing.

• Very large amounts of data to process.

• D-Grid (the state-owned German grid) and the Open Science Grid are available.

Two-Pronged Approach

Einstein@Home

• Shared with the LIGO community.

• Runs on volunteers’ PCs.

• Uses the BOINC network.

• Running since mid-2006.

• More than 70,000 computers/week.

• About 19,000 units/day.

AstroGrid-D

• Same application as Einstein@Home.

• Uses D-Grid and OSG.

• Running since October 2007.

• Distributes tasks using GRAM.

• Averages 4,000 units/day.

Traditional Approach to Resource Management

• Accessing multiple sites:

• Accounts, permissions, etc.

• Use a metascheduler to decide on resource selection:

• GridWay

• Metascheduler uses GRAM to contact different job submission sites.

GEO600 Approach

• The submission node has a list of all the GEO600 resources, with minimum and maximum job capacities for each.

• The list is reviewed hourly and jobs are dispatched.

• Whenever a resource has fewer than its minimum number of jobs, more jobs are submitted until it reaches its maximum, and the corresponding input files are transferred asynchronously.

• The status of each job is maintained on the submission machine.

• Job output files are sent back to the submission system asynchronously.

• GEO600 processes 4000 jobs/day this way.
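
The min/max refill rule described above can be sketched as a single review pass. This is an illustration under assumptions, not the GEO600 submission code: the class, the function, the resource names and the capacities are hypothetical, and the actual job submission and file staging are left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    min_jobs: int
    max_jobs: int
    active_jobs: int = 0


def refill(resources, job_queue):
    """One review pass: top up every resource that fell below its minimum.

    Returns the (resource_name, job) dispatch decisions, in order.
    """
    dispatched = []
    for res in resources:
        if res.active_jobs >= res.min_jobs:
            continue  # still busy enough; leave it alone until the next review
        while res.active_jobs < res.max_jobs and job_queue:
            job = job_queue.popleft()
            res.active_jobs += 1
            dispatched.append((res.name, job))
    return dispatched


resources = [
    Resource("cluster-a", min_jobs=2, max_jobs=4, active_jobs=1),
    Resource("cluster-b", min_jobs=3, max_jobs=5, active_jobs=3),
]
jobs = deque(f"job-{i}" for i in range(10))
decisions = refill(resources, jobs)
print(decisions)
# prints [('cluster-a', 'job-0'), ('cluster-a', 'job-1'), ('cluster-a', 'job-2')]
print(len(jobs))  # prints 7
```

Filling to the maximum only when a resource drops below its minimum means each resource is topped up in batches instead of one job at a time.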

D-GRID Approach

[Figure: D-Grid approach. Labels: GRAM4 Service; Scheduler (e.g. Condor); Scheduler (e.g. SGE); Local jobs; GEO600 jobs; Other GRAM4 tasks.]

Next time…

• Cycle Sharing

• Edge Computing
