Cloud Computing. Lecture 5 Grid Case Studies

Academic year: 2021

(1)

Cloud Computing

Lecture 5

Grid Case Studies

2014-2015

(2)

Up until now…

• Introduction.

• Definition of Cloud Computing.

• Grid Computing:

• Schedulers

(3)

Summary

• Grid Case Studies:

• Monitoring: TeraGrid IS

• Data Transfer: LIGO

(4)
(5)

TeraGrid (National Science Foundation)

• TeraGrid DEEP: integrates NSF’s 60 largest computers (more than 60 TF):

• More than 2 PB of online storage.

• National data visualization infrastructure.

• The world's most powerful computing network.

• TeraGrid WIDE Science Portals:

• Integration of scientific communities.

• More than 90 community data collections.

•Cooperation with other grid projects in Europe and

(6)

• Provide a mechanism that allows all participants to publish and discover information about available capabilities:

• What are TeraGrid’s computing resources?

• Which features are provided by each resource?

• Where are the login services?

• Where can I access a particular data collection?

• Who has a weather forecast service?

• Provide a mechanism adapted to the TeraGrid open community:

• Editors record information (instead of submitting to a centralized database).

• A central index allows for aggregation and discovery.

• Multiple access interfaces (WS/SOAP, WS/ReST, browser).

(7)

Technical Issues

• Information is stored in legacy systems:

• Databases (different types, restricted access).

• Static and dynamic web interfaces.

• Multiple and varied database schemas:

• It is very complex to design an integrated database that supports all data types and relations.

• Many types of clients (browsers, SOAP, ReST).

• Service availability is critical:

• Both TeraGrid management (testing, documentation and planning) and its users and partners depend on this service.

(8)

TeraGrid Information Service Architecture

[Diagram: clients, some behind caches, access the TeraGrid central services (Apache 2.0, Tomcat with WebMDS, WS MDS4) over WS/REST (HTTP GET) or WS/SOAP; the resources run WS MDS4 and are connected over WS/SOAP.]

(9)

Registry

• Editors record available content:

• The local service maintains a registration in the central registry.

• Entries expire automatically, so they must be refreshed periodically.

• Editors keep ownership of their information systems (they can even participate in other grids).

• Indexing services pull content:

• Registry entries have access control.

• Registry entries have a link to the source.

• A cache covers for service faults, etc.
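The registry behaviour described on this slide (entries that expire unless refreshed, an index that pulls content, a cache that covers for faults) can be sketched as follows. This is only an illustration: the class names, the `fetch` callable and the TTL value are hypothetical and are not part of TeraGrid's software.

```python
import time


class SoftStateRegistry:
    """Central registry whose entries expire unless the publisher refreshes them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # service name -> (source URL, expiry time)

    def register(self, name, source_url):
        # Registering again simply pushes the expiry time forward.
        self.entries[name] = (source_url, self.clock() + self.ttl)

    def live_entries(self):
        # Drop expired entries, then return name -> source URL.
        now = self.clock()
        self.entries = {n: e for n, e in self.entries.items() if e[1] > now}
        return {n: url for n, (url, _) in self.entries.items()}


class CachingIndex:
    """Index that pulls content from registered sources and keeps the last good copy."""

    def __init__(self, registry, fetch):
        self.registry = registry
        self.fetch = fetch  # hypothetical callable: URL -> content, raises OSError on failure
        self.cache = {}

    def refresh(self):
        for name, url in self.registry.live_entries().items():
            try:
                self.cache[name] = self.fetch(url)
            except OSError:
                pass  # source unavailable: keep serving the cached copy
        return dict(self.cache)
```

The publisher stays the owner of its data: the registry stores only a link, and the index holds a copy that can be rebuilt at any time.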

(10)

High Availability Architecture

info.dyn.teragrid.org

info.teragrid.org

TeraGrid Dynamic DNS

Clients

Resources and Partners

(11)

TGUP Batch Load & Queue Data

IIS provides queue and batch load information from all RP sites for TGUP to use in its system monitor.

<LoadRP xmlns="">
  <ComputeResourceLoad xmlns="">
    <ResourceID>pople.psc.teragrid.org</ResourceID>
    <SiteID>psc.teragrid.org</SiteID>
    <LoadInfo hostname="tg-login1.pople.psc.teragrid.org" timestamp="2009-11-11T13:46:19Z">
      <Load>
        <Type>queue</Type>
        <Value>98</Value>
      </Load>
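A client such as a system monitor could read this document with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the slide's excerpt is cut off, so the closing tags in `SAMPLE` are an assumption added only to make it parse, and `read_loads` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The slide's excerpt, with closing tags added (an assumption) so that it parses.
SAMPLE = """\
<LoadRP xmlns="">
  <ComputeResourceLoad xmlns="">
    <ResourceID>pople.psc.teragrid.org</ResourceID>
    <SiteID>psc.teragrid.org</SiteID>
    <LoadInfo hostname="tg-login1.pople.psc.teragrid.org" timestamp="2009-11-11T13:46:19Z">
      <Load>
        <Type>queue</Type>
        <Value>98</Value>
      </Load>
    </LoadInfo>
  </ComputeResourceLoad>
</LoadRP>
"""


def read_loads(xml_text):
    """Return one dict per <Load> element found in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for resource in root.iter("ComputeResourceLoad"):
        resource_id = resource.findtext("ResourceID")
        for info in resource.iter("LoadInfo"):
            for load in info.iter("Load"):
                rows.append({
                    "resource": resource_id,
                    "timestamp": info.get("timestamp"),
                    "type": load.findtext("Type"),
                    "value": int(load.findtext("Value")),
                })
    return rows


for row in read_loads(SAMPLE):
    print(row["resource"], row["type"], row["value"])
```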

(12)

TeraGrid Results

• Does not require deep modification or loss of

ownership of legacy systems.

• Simple and consistent access mechanism.

• Integrates:

• Description of computing services and queue state. • Registry of service and software availability.

• Centralized documentation.

• Test, validation and verification service: INCA, testing and execution portal.

(13)

GT4: Base

[Diagram: Globus Toolkit components under the headings Security, Execution, Monitoring, Base and Data: GridWay, GridFTP, Reliable File Transfer, GRAM, MDS4, CAS, Data Rep, Delegation, Replica Location, Java Runtime, C Runtime, Python Runtime, GSI-OpenSSH, MyProxy.]

(14)
(15)

LIGO: Laser Interferometer Gravitational-Wave Observatory

• Goal: observe gravitational waves.

• Three physical detectors in two locations (plus the GEO detector in Germany).

• More than 10 centres for data analysis.

• Collaborators in more than 40 institutions.

(16)

LIGO: Laser Interferometer Gravitational-Wave Observatory

• LIGO records thousands of data channels, generating 1 TB/day of data during test periods:

• The result data are published and data centres subscribe to the parts that local users want for analysis or storage.

• The data analysis results in more derived data:

• About 30% of LIGO's total data.

• They are also published and replicated.

• More than 35 million files on the grid:

• More than 6 million unique files

(17)

The Challenge

Replicate more than 1 TB/day of data to more than 10 locations. Solution:

• A publish/subscribe model.

• Let scientists specify and discover data using application-level criteria (metadata).

(18)

Technical Constraints

• Efficiency

• Avoid under-using bandwidth during transfers, especially on high-bandwidth (10 Gbps) links.

(19)

Lightweight Data Replicator

Combines three basic Globus services:

1. Metadata service (MDS):

• “What files are available?”

• Information about files, such as size, MD5, date, …

• Metadata propagation.

2. Globus Replica Location Service (RLS):

• “Where are the files?”

• Catalog service translates filenames into URLs.

• Maps files to locations.

3. GridFTP Service:

• “How can we copy files?”

• Server plus an adapted client.
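The three questions above chain together naturally: select files by metadata, look up where replicas live, then hand a source URL to the copy tool. The sketch below models the two catalogs as plain dictionaries; `plan_transfers`, the `wanted` predicate and the example hostnames are hypothetical and do not reflect the real MDS or RLS interfaces.

```python
from urllib.parse import urlparse


def plan_transfers(metadata, replicas, local_host, wanted):
    """Pick a remote source URL for every wanted file that has no local replica.

    metadata: filename -> attribute dict (size, md5, ...)
    replicas: filename -> list of replica URLs
    wanted:   predicate over the attribute dict
    """
    plan = []
    for name, attrs in metadata.items():        # 1. What files are available?
        if not wanted(attrs):
            continue
        urls = replicas.get(name, [])           # 2. Where are the files?
        hosts = {urlparse(u).hostname for u in urls}
        if not urls or local_host in hosts:
            continue                            # nothing to copy from, or already here
        plan.append((name, urls[0]))            # 3. URL to pass to a GridFTP client
    return plan


metadata = {
    "H-R-1.gwf": {"size": 100, "run": "S5"},
    "H-R-2.gwf": {"size": 120, "run": "S5"},
    "L-R-1.gwf": {"size": 90, "run": "S4"},
}
replicas = {
    "H-R-1.gwf": ["gsiftp://ldr.site-a.example.org/data/H-R-1.gwf"],
    "H-R-2.gwf": [
        "gsiftp://ldr.site-a.example.org/data/H-R-2.gwf",
        "gsiftp://ldr.site-b.example.org/data/H-R-2.gwf",
    ],
}
plan = plan_transfers(metadata, replicas, "ldr.site-b.example.org",
                      wanted=lambda a: a["run"] == "S5")
print(plan)
```

Here only `H-R-1.gwf` ends up in the plan, because `H-R-2.gwf` already has a replica at the local site and `L-R-1.gwf` does not match the metadata criterion.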

(20)

LIGO Data Replicator Architecture

• Each participant has a machine dedicated to transferring files requested by local clients.

• The scheduler queries the metadata and replica catalogs to identify missing files, which are added to a priority list.

• The transfer daemon checks the list, transfers files, and updates the LRC.

• If a transfer fails it remains on the
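A rough sketch of a transfer daemon working through a priority list is shown below. The slide's text on failed transfers is cut off, so putting a failed entry back on the list with a bounded number of attempts is an assumption of this sketch, as are the function name, the `transfer` callable and the use of a set to stand in for the local replica catalog.

```python
import heapq


def run_transfer_daemon(queue, transfer, local_catalog, max_attempts=3):
    """Drain a heap of (priority, attempts, filename, url) entries.

    transfer:      hypothetical callable that copies one URL, raising OSError on failure
    local_catalog: set of filenames now held locally (stands in for the LRC)
    Returns the filenames that still failed after max_attempts tries.
    """
    gave_up = []
    while queue:
        priority, attempts, name, url = heapq.heappop(queue)
        try:
            transfer(url)
        except OSError:
            # Assumption: a failed transfer goes back on the list for a later try.
            if attempts + 1 < max_attempts:
                heapq.heappush(queue, (priority, attempts + 1, name, url))
            else:
                gave_up.append(name)
            continue
        local_catalog.add(name)  # record the new local replica
    return gave_up
```

Lower numbers are served first, so a scheduler would push the most urgently requested files with the smallest priority value.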

(21)

LIGO Results

• Complete LIGO/GEO experiment:

• Replicated 30 TB in 30 days (~700 MB/minute).

• MTBF: 1 month.

• More than 35 million files on the LDR network.

• Performance limited by chosen programming language (Python).

• Partnership with Globus Alliance to include a version of the LDR in Globus Toolkit.
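The quoted rate of roughly 700 MB/minute follows directly from 30 TB in 30 days. A quick check, assuming decimal units (1 TB = 1,000,000 MB):

```python
def mb_per_minute(terabytes, days):
    """Average transfer rate in MB/minute, using decimal units (1 TB = 1,000,000 MB)."""
    minutes = days * 24 * 60
    return terabytes * 1_000_000 / minutes


rate = mb_per_minute(30, 30)
print(round(rate))  # 694
```

That is an average of about 11.6 MB/s, sustained over the whole month.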

(22)

Data Replication Service

• Data Replication Service (DRS)

• Reimplementation of LDR publish/subscribe

capabilities.

• Uses Java-based WS-RF services.

• Uses the RLS and RFT services from GT4.

(23)

GT4: Base

[Diagram: Globus Toolkit components under the headings Security, Execution, Monitoring, Base and Data: GridWay, GridFTP, Reliable File Transfer, GRAM, MDS4, CAS, Data Rep, Delegation, Replica Location, Java Runtime, C Runtime, Python Runtime, GSI-OpenSSH, MyProxy.]

(24)
(25)

GEO600 Observatory

• Goal: observe gravitational waves.

• 600 m laser interferometer close to Hannover.

• Its members belong to LIGO as well.

(26)

The Challenge

• Scan the packets of data produced by GEO600 for gravitational waves:

• Very complex signal processing.

• Very large amounts of data to process.

• D-Grid (the state-owned German grid) and Open Science Grid are available.

(27)

Two Pronged Approach

Einstein@Home

• Shared with the LIGO community.

• Runs on volunteers' PCs.

• Uses the BOINC network.

• Running since mid-2006.

• More than 70,000 computers/week.

• ~19,000 units/day.

AstroGrid-D

• Same application as Einstein@Home.

• Uses D-Grid and OSG.

• Running since October 2007.

• Distributes tasks using GRAM.

• Averages 4,000 units/day.

(28)

Traditional Approach to Resource Management

• Accessing multiple sites:

• Accounts, permissions, etc.

• Use a metascheduler to decide on resource selection:

• GridWay

• Metascheduler uses GRAM to contact different job submission sites.

(29)

GEO600 Approach

• The submission node keeps a list of all the GEO600 resources, with a minimum and maximum job capacity for each.

• The list is reviewed hourly and jobs are dispatched.

• Whenever a resource has fewer than its minimum number of jobs, more jobs are submitted, up to its maximum, and the corresponding input files are transferred asynchronously.

• The status of each job is maintained on the submission machine.

• Job output files are sent back to the submission system asynchronously.
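The hourly review amounts to a simple top-up rule per resource. The sketch below shows one review pass; the `refill` function, the dictionary layout and the resource names are illustrative assumptions, and the asynchronous file transfers are left out.

```python
def refill(resources, pending):
    """One review pass: top up every resource that has dropped below its minimum.

    resources: name -> {"min": int, "max": int, "running": int}
    pending:   list of job ids waiting on the submission node (consumed in order)
    Returns name -> list of job ids dispatched in this pass.
    """
    dispatched = {}
    for name, res in resources.items():
        if res["running"] >= res["min"]:
            continue  # still busy enough, leave it alone until a later review
        room = res["max"] - res["running"]
        batch = pending[:room]
        del pending[:room]
        if batch:
            res["running"] += len(batch)
            dispatched[name] = batch
    return dispatched


resources = {
    "site-a": {"min": 2, "max": 5, "running": 1},
    "site-b": {"min": 2, "max": 4, "running": 3},
}
pending = ["job1", "job2", "job3", "job4", "job5", "job6"]
print(refill(resources, pending))
```

In this run only `site-a` is below its minimum, so it receives four jobs to reach its maximum of five, and two jobs stay pending for the next review.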

(30)

D-GRID Approach

[Diagram: Resource A and Resource B each run a GRAM4 service in front of a local scheduler (e.g. Condor on one, SGE on the other) and its computing nodes. Each scheduler also receives local jobs, while GEO600 jobs and other GRAM4 tasks arrive through the GRAM4 service.]

(31)

Next time…

• Cycle Sharing

• Edge Computing
