
Grids: Concepts, Technologies and Applications




(1)

Grids: Concepts, Technologies and Applications

Geoffrey Fox

Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401

April 25 2005

(2)

So what is a Grid?

• Supporting human decision making with a network of at least four large computers, perhaps six or eight small computers, and a great assortment of disc files and magnetic tape units - not to mention remote consoles and teletype stations - all churning away. (Licklider 1960)
• Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations
• Infrastructure that will provide us with the ability to dynamically link together resources as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications.
• Realizing the thirty-year dream of science fiction writers who have spun yarns featuring worldwide networks of interconnected computers that behave as a single entity.

(3)

Internet Scale Distributed Services

• Grids use Internet technology and are distinguished by managing or organizing sets of network-connected resources
– Classic Web allows independent one-to-one access to individual resources
– Grids integrate together and manage multiple Internet-connected resources: people, sensors, computers, data systems
• Organization can be explicit as in
– TeraGrid which federates many supercomputers;
– Deep Web Technologies IR Grid which federates multiple data resources;
– CrisisGrid which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data
• Organization can be implicit as in Internet resources such as curated databases and simulation resources that “harmonize a community”

(4)

Different Visions of the Grid

• Grid just refers to the technologies
– Or Grids represent the full system/Applications
• DoD’s vision of Network Centric Computing is just a Grid (linking sensors, warfighters, commanders, backend resources) and they are building the GIG (Global Information Grid)
• Utility Computing or X-on-demand (X=data, computer ..) is the major computer Industry interest in Grids
• e-Science or Cyberinfrastructure are virtual organization Grids supporting global distributed science (note sensors, instruments and people are all distributed)
• Skype (Kazaa) VOIP system is a Peer-to-peer Grid (and VRVS/GlobalMMCS like Internet A/V conferencing are Collaboration Grids)
• Commercial 3G Cell-phones and DoD ad-hoc network initiative are forming mobile Grids

(5)

e-moreorlessanything and the Grid

• e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
– The growing use of outsourcing is one example
• e-Science is the similar vision for scientific research with international participation in large accelerators, satellites or distributed gene analyses.
• The Grid integrates the best of the Web, traditional enterprise software, high performance computing and Peer-to-peer systems to provide the information technology e-infrastructure for e-moreorlessanything.
• A deluge of data of unprecedented and inevitable size must be managed and understood.
• People, computers, data and instruments must be linked.
• On demand assignment of experts, computers, networks and storage resources must be supported

(6)

More Broad Classes of Grid Applications

• Enterprise Grid supports information system for an organization; includes “university computer center”, “(digital) library”, sales, marketing, manufacturing …
• Outsourcing Grid links different parts of an enterprise together (Gridsourcing)
– Manufacturing plants with designers
– Animators with electronic game or film designers and producers
– Coaches with aspiring players (e-NCAA or e-NFL etc.)
• Customer Grid links businesses and their customers as in many web sites such as amazon.com
• e-Multimedia can use secure peer-to-peer Grids to link creators, distributors and consumers of digital music, games and films respecting rights
• Distance education Grid links teacher at one place, students all over the place, mentors and graders; shared curriculum, homework, live classes …

(7)

e-Defense and e-Crisis

• Grids support Command and Control and provide Global Situational Awareness
– Link commanders and frontline troops to themselves and to archival and real-time data; link to what-if simulations
– Dynamic heterogeneous wired and wireless networks
– Security and fault tolerance essential
• System of Systems; Grid of Grids
– The command and information infrastructure of each ship is a Grid; each fleet is linked together by a Grid; the President is informed by and informs the national defense Grid
– Grids must be heterogeneous and federated
• Crisis Management and Response enabled by a Grid linking sensors, disaster managers, and first responders with decision support

(8)

Types of Computing Grids

• Running “Pleasingly Parallel Jobs” as in United Devices, Entropia (Desktop Grid) “cycle stealing systems”
• Can be managed (“inside” the enterprise as in Condor) or more informal (as in SETI@Home)
• Computing-on-demand in Industry where jobs spawned are perhaps very large (SAP, Oracle …)
• Support distributed file systems as in Legion (Avaki), Globus with (web-enhanced) UNIX programming paradigm
– Particle Physics will run some 30,000 simultaneous jobs
• Linking Supercomputers as in TeraGrid
• Pipelined applications linking data/instruments, compute, visualization
• Seamless Access where Grid portals allow one to choose one of multiple resources with a common interface

(9)

Utility and Service Computing

• An important business application of Grids is believed to be utility computing
• Namely support a pool of computers to be assigned as needed to take up extra demand
– Pool shared between multiple applications
• Natural architecture is not a cluster of computers connected to each other but rather a “Farm of Grid Services” connected to Internet and supporting services such as
– Web Servers
– Financial Modeling
– Run SAP
– Data-mining
– Simulation response to crisis like forest fire or earthquake
– Media Servers for Video-over-IP
• Note classic Supercomputer use is to allow full access to do “anything” via ssh etc.
– In service model, one pre-configures services for all programs and you access a portal to run a job with fewer security issues

(10)

Some Important Styles of Grids

• Computational Grids were origin of concepts and link computers across the globe – high latency stops this from being used as parallel machine
• Knowledge and Information Grids link sensors and information repositories as in Virtual Observatories or BioInformatics
– More detail on next slide
• Education Grids link teachers, learners, parents as a VO (virtual organization) with learning tools, distant lectures etc.
• e-Science Grids link multidisciplinary researchers across laboratories and universities
• Community Grids focus on Grids involving large numbers of peers rather than focusing on linking major resources – links Grid and Peer-to-peer network concepts
• Semantic Grid links Grid and AI communities with Semantic Web (ontology/meta-data enriched resources) and Agent concepts

(11)

Information/Knowledge Grids

• Distributed (10’s to 1000’s) of data sources (instruments, file systems, curated databases …)
• Data Deluge: 1 (now) to 100’s of petabytes/year (2012)
– Moore’s law for Sensors
• Possible filters assigned dynamically (on-demand)
– Run image processing algorithm on telescope image
– Run Gene sequencing algorithm on compiled data
• Needs decision support front end with “what-if” simulations
• Metadata (provenance) critical to annotate data
• Integrate across experiments as in multi-wavelength astronomy

(12)

[Figure: SERVOGrid diagram with labels Database, Repositories, Federated Databases, Field Trip Data, Streaming Data, Sensors, Data Filter Services, Analysis and Visualization Portal, Discovery Services, Research Simulations, Customization Services, “From Research to Education”, Education Grid, Computer Farm]

(13)

iSERVO in a nutshell

• Designed to link data-sets (repositories and real time), computations and earthquake scientists in ACES (Asia Pacific) Cooperation
– Australia China Japan USA
• Exemplified by SERVOGrid in USA led by JPL
• Supports simulation and datamining as services
• Adopts conservative WS-I+ Web Service Interoperability standards
• Builds full “Grid” in a library fashion as a Grid of Grids
– GIS (Geographic Information System) Grid built as a set of OGC compatible Web Services “talking” GML
– iSERVO federates separate Grids in each country/organization/function
– A Grid is “just” a collection of Services aka distributed programs
• Multi-scale simulations supported by Grid workflow
• Portals based on NSF Middleware Initiative NMI Open Grid Computing Environment OGCE

(14)

DAME: Distributed Aircraft Maintenance Environment

Rolls Royce and UK e-Science Program

[Figure: diagram linking In flight data, Airline, Maintenance Centre, Ground Station, Global Network (such as SITA), Internet/e-mail/pager, and Engine Health (Data) Center]

~ Gigabyte per aircraft per engine per transatlantic flight

~5000 engines

(15)

NASA Aerospace Engineering Grid

(16)

Virtual Observatory Astronomy Grid

Integrate Experiments

[Figure: sky images labelled Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map]

(17)

e-Chemistry Laboratory

Experiments-on-demand

Grid Resources

Grid-enabled Output Streams

(18)

CERN LHC Data Analysis Grid

(19)

SERVOGrid (Complexity) Computing Model

[Figure: an HPC Simulation connected through several Data Filter services to Grid data sources (OGSA-DAI Grid Services) and to other Grid and Web Services, with Analysis, Control and Visualize components; labelled Grid Data Assimilation]

• Distributed filters massage data for simulation
• This type of Grid integrates with parallel computing
• Multiple HPC facilities but only use one at a time
• Many simultaneous data sources and sinks

(20)

Sources of Grid Technology

• Grids support distributed collaboratories or virtual organizations integrating concepts from
– The Web
– Agents
– Distributed Objects (CORBA Java/Jini COM)
– Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities
– Peer-to-peer Networks
• With perhaps the Web and P2P networks being the most important for “Information Grids” and Globus for “Compute Grids”

(21)

The Essence of Grid Technology?

• We will start from the Web view and assert that basic paradigm is
• Meta-data rich Web Services communicating via messages
• These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3/4 (Globus Toolkit 3/4)
– These are the distributed equivalent of operating system functions as in UNIX Shell
– Called Hosting Environment or platform
• W3C standard WSDL defines IDL (Interface standard) for Web Services

(22)

Meta-data

• Meta-data is usually thought of as “data about data”
• The Semantic Web is at its simplest considered as adding meta-data to web pages
• For example, the hospital web-page has meta-data telling you its location, phone-number, specialties
– which can be used to automate Google-style searches to allow planning of disease/accident treatment from web
• Modern trend (Semantic Grid) is meta-data about web-services e.g. specify details of interface and usage
– Such as that a bioinformatics service is free or bandwidth input is of limited amount
• Provenance – history and ownership – of data very important

(23)

A typical Web Service

• In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
• The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: a Portal Service with Security and Catalog services, connected through WSDL interfaces to Web Services for Payment (Credit Card) and Warehouse (Shipping control)]

(24)

Typical Grid Architecture

Each Blob is a Computer Program!

[Figure: blobs labelled Raw (HPC) Resources, Database, Middleware, System Services (“Core Grid”), Application Service, User Services, Portal Services]

(25)

Classic Grid Architecture

[Figure: tiered diagram with labels Resources (Database, NetSolve, Computing), Security, Collaboration, Composition, Content Access, Middle Tier (Brokers, Service Providers), Clients (Users and Devices)]

Middle Tier becomes Web Services

(26)

Peer to Peer Grid

A democratic organization

[Figure: Peers and Databases connected through User Facing and Service Facing Web Service Interfaces to several Event/Message Brokers]

(27)

What is Happening?

• Grid ideas are being developed in (at least) four communities
– Web Service – W3C, OASIS, (DMTF)
– Grid Forum (High Performance Computing, e-Science)
– Enterprise Grid Alliance (Commercial “Grid Forum” with a near term focus)
• Service Standards are being debated
• Grid Operational Infrastructure is being deployed
• Grid Architecture and core software being developed
– Apache has several important projects as do academia; large and small companies
• Particular System Services are being developed “centrally” – OGSA framework for this in GGF; WS-* for OASIS/W3C/Microsoft-IBM
• Lots of fields are setting domain specific standards and building domain specific services
• USA started but now Europe is probably in the lead and Asia will soon catch USA if momentum (roughly zero for USA) continues

(28)

Technical Activities of Note

• Look at different styles of Grids such as Autonomic (Robust Reliable Resilient)
• New Grid architectures hard due to investment required
• Program the Grid – Workflow
• Access the Grid – Portals, Grid Computing Environments
• Critical Services such as
– Security – build message based not connection based
– Notification – event services
– Metadata – Use Semantic Web, provenance
– Fabric and Service Management
– Databases and repositories – instruments, sensors
– Computing – Submit job, scheduling, distributed file systems
– Visualization, Computational Steering
– Network performance

Low Level WS-*

High Level

(29)

Web services

Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.

Web Services interact by exchanging messages in SOAP format.

The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
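As a rough illustration of "exchanging messages in SOAP format", the sketch below builds and reads a SOAP 1.1 envelope using only the Python standard library. The application namespace `urn:example:earthquake`, the operation name `RunSimulation` and its parameters are hypothetical and not taken from the slides; a real service would take them from its WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:earthquake"  # hypothetical application namespace


def build_request(operation, params):
    """Wrap an operation name and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Recover (operation, params) from an envelope on the receiving side."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]  # first child of Body
    operation = call.tag.split("}", 1)[1]          # strip the namespace part
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


# Hypothetical request: sender and receiver share only the message format.
message = build_request("RunSimulation", {"region": "California", "years": 10})
operation, params = read_request(message)
```

The two functions never call each other directly; they agree only on the message layout, which is the loose coupling the slide describes.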

(30)

Philosophy of Web Service Grids

• Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
• This leads to the distributed object (DO) model represented by Java and CORBA
– RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
• Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
– Distributed Objects Replaced by Services
• Note CORBA was considered too complicated in both organization and proposed infrastructure
– and Java was considered as “tightly coupled to Sun”
– So there were other reasons to discard
• Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages

(31)

Plethora of Standards

• Java is very powerful partly due to its many “frameworks” that generalize libraries e.g.

– Java Media Framework

– Java Database Connectivity JDBC

• Web Services have a corresponding collection of specifications that represent critical features of the distributed operating system for “Grids of Simple Services”

– About 60 WS-* specifications introduced in last 2-3 years

– These are low level with higher level standards such as access database (OGSA-DAI) or “Submit a job” built on top of these

• Many battles both between standard bodies and between companies as each tries to set standards they consider best; thus there are multiple standards for many key Web Service functionalities

• Microsoft a key player and stands to benefit as Web Services open up enterprise software space to all participants

– e.g. MQSeries (IBM) and Tibco have to change their messaging systems to support new open standards

(32)

WS-I Interoperability

• Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles
• Web Services Interoperability (WS-I) Interoperability Profile 1.0a (http://www.ws-i.org) gives us XSD, WSDL1.1, SOAP1.1, UDDI in basic profile and parts of WS-Security in their first security profile.
• We imagine the “60 Specifications” being checked out and evolved in the cauldron of the real world and occasionally best practice identifies a new specification to be added to WS-I which gradually increases in scope
– Note only 4.5 out of 60 specifications have “made it” in this definition

(33)

Layered Architecture for Web Services and Grids

[Figure: layered stack, listed here from top to bottom]

Application Specific Grids
Generally Useful Services and Grids
Workflow WSFL/BPEL
Service Management (“Context etc.”)
Service Discovery (UDDI) / Information
Service Internet Transport Protocol
Service Interfaces WSDL
(the layers above form the Service Internet)

Base Hosting Environment

Bit level Internet (OSI Stack):
Protocol HTTP FTP DNS …
Presentation XDR …
Session SSH …
Transport TCP UDP …
Network IP …
Data Link / Physical

(34)

WS-* implies the Service Internet

• We have the classic (CISCO, Juniper ….) Internet routing the flood of ordinary packets in OSI stack architecture
• Web Services build the “Service Internet” or IOI (Internet on Internet) with
– Routing via WS-Addressing not IP header
– Fault Tolerance (WS-RM not TCP)
– Security (WS-Security/SecureConversation not IPSec/SSL)
– Data Transmission by WS-Transfer not HTTP
– Information Services (UDDI/WS-Context not DNS/Configuration files)
– At message/web service level and not packet/IP address level
• Software-based Service Internet possible as computers “fast”
• Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN)

• SOAP Header contains all information needed for the “Service Internet”
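To make "routing via WS-Addressing not IP header" concrete, here is a minimal sketch in which an intermediary picks the next hop by reading only the SOAP header. It assumes the W3C WS-Addressing 1.0 namespace (deployments contemporary with these slides used earlier draft namespaces). The logical address `urn:grid:simulation`, the action URI, the `payload` element and the routing table are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1
WSA_NS = "http://www.w3.org/2005/08/addressing"        # WS-Addressing 1.0 (assumed version)


def make_message(to, action, payload_text):
    """Build an envelope whose header carries the logical destination."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{WSA_NS}}}To").text = to
    ET.SubElement(header, f"{{{WSA_NS}}}Action").text = action
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, "payload").text = payload_text  # hypothetical body element
    return ET.tostring(envelope, encoding="unicode")


def route(xml_text, routing_table):
    """Choose the next hop from the header alone; the Body is not inspected."""
    header = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Header")
    destination = header.find(f"{{{WSA_NS}}}To").text
    return routing_table[destination]


# Hypothetical overlay routing table: logical service address -> physical endpoint.
routing_table = {"urn:grid:simulation": "http://node-a.example.org/sim"}
msg = make_message("urn:grid:simulation", "urn:grid:simulation:run", "fault-model-7")
next_hop = route(msg, routing_table)
```

Because the destination is a logical name resolved in software, the overlay can move a service to another host by changing one table entry, without touching the sender.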

(35)

Consequences of Rule of the Millisecond

• Useful to remember critical time scales
– 1) 0.000001 ms – CPU does a calculation
– 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency
– 2b) 0.001 to 0.01 ms – Overhead of a Method Call
– 3) 1 ms – wake-up a thread or process
– 4) 10 to 1000 ms – Internet delay
• 2a), 4) implies geographically distributed metacomputing can’t in general compete with parallel systems
• 3) << 4) implies a software overlay network is possible without significant overhead
– We need to explain why it adds value of course!
• 2b) versus 3) and 4) describes regions where method and message based programming paradigms are important
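The arithmetic behind these consequences can be checked directly from the slide's own numbers. The sketch below is only a worked calculation; the dictionary keys and function name are illustrative.

```python
# Time scales from the slide, in milliseconds, as (low, high) ranges.
TIME_SCALES_MS = {
    "cpu_calculation": (1e-6, 1e-6),   # 1)
    "mpi_latency": (1e-3, 1e-2),       # 2a)
    "method_call": (1e-3, 1e-2),       # 2b)
    "thread_wakeup": (1.0, 1.0),       # 3)
    "internet_delay": (10.0, 1000.0),  # 4)
}


def ratio_range(slow, fast):
    """Smallest and largest possible ratio slow/fast over the two ranges."""
    slow_lo, slow_hi = TIME_SCALES_MS[slow]
    fast_lo, fast_hi = TIME_SCALES_MS[fast]
    return slow_lo / fast_hi, slow_hi / fast_lo


# 2a) versus 4): how many times slower Internet messaging is than MPI latency.
internet_vs_mpi = ratio_range("internet_delay", "mpi_latency")

# 3) versus 4): one thread wake-up as a fraction of the Internet delay,
# i.e. the relative cost added by one software overlay hop.
wakeup_ms = TIME_SCALES_MS["thread_wakeup"][0]
overlay_overhead = tuple(wakeup_ms / d for d in TIME_SCALES_MS["internet_delay"])
```

With these figures Internet latency is roughly a thousand to a million times the MPI latency, while a one-millisecond software hop adds between 0.1% and 10% to the Internet delay.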

(36)

Linking Modules

• From method based to RPC to message based to event-based publish-subscribe Message Oriented Middleware

[Figure: three ways of linking modules]
– Module A and Module B linked by Method Call: .001 to 1 millisecond; closely coupled Java/Python …
– Service A and Service B linked by Messages: 0.1 to 1000 millisecond latency; Coarse Grain Service Model
– Service A as Publisher posts events; Service B as “Listener” subscribes to events

Message Queue in the
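The last step in that progression, event-based publish-subscribe, can be sketched with a small in-process broker. This is an assumption-laden toy, not any particular Message Oriented Middleware product: the `Broker` class, its method names and the `sensor/seismic` topic are all hypothetical, and a real broker would run as a separate networked service.

```python
from collections import defaultdict, deque


class Broker:
    """Toy message broker: publishers and listeners know only topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks
        self._queue = deque()                  # pending (topic, event) pairs

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Publishing only enqueues; the publisher never waits for listeners.
        self._queue.append((topic, event))

    def deliver(self):
        """Drain the queue, handing each event to every listener on its topic."""
        delivered = 0
        while self._queue:
            topic, event = self._queue.popleft()
            for callback in self._subscribers[topic]:
                callback(event)
                delivered += 1
        return delivered


broker = Broker()
received = []
broker.subscribe("sensor/seismic", received.append)       # Service B listens
broker.publish("sensor/seismic", {"magnitude": 4.2})      # Service A posts
broker.publish("sensor/weather", {"temp_c": 18})          # no listener on this topic
count = broker.deliver()
```

Service A never holds a reference to Service B; adding a second listener needs one more `subscribe` call and no change to the publisher.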

(37)

What is a High Performance Computer?

• We might wish to consider three classes of multi-node computers
• 1) Classic MPP with microsecond latency and scalable internode bandwidth (tcomm/tcalc ~ 10 or so)
• 2) Classic Cluster which can vary from configurations like 1) to 3) but typically have millisecond latency and modest bandwidth
• 3) Classic Grid or distributed systems of computers around the network
– Latencies of inter-node communication – 100’s of milliseconds but can have good bandwidth
• All have same peak CPU performance but synchronization costs increase as one goes from 1) to 3)
• Cost of system (dollars per gigaflop) decreases by factors of 2 at each step from 1) to 2) to 3)
• One should NOT use classic MPP if class 2) or 3) suffices unless some security or data issues dominate over cost-performance
• One should not use a Grid as a true parallel computer – it can link parallel computers together for convenient access etc.
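The selection rule on this slide (use the cheapest class whose latency suffices) can be written as a short worked example. The representative latencies below (0.001 ms, 1 ms, 100 ms) are assumed round numbers read off the slide's "microsecond", "millisecond" and "100's of milliseconds"; the relative costs follow the stated factor of 2 per step. Security and data constraints, which the slide notes can override cost, are not modelled.

```python
# (name, representative inter-node latency in ms, relative cost per gigaflop)
MACHINE_CLASSES = [
    ("MPP", 0.001, 1.0),     # class 1: microsecond latency, reference cost
    ("Cluster", 1.0, 0.5),   # class 2: millisecond latency, half the cost
    ("Grid", 100.0, 0.25),   # class 3: 100's of ms latency, a quarter of the cost
]


def cheapest_class(max_latency_ms):
    """Return the cheapest class whose latency meets the application's need."""
    candidates = [
        (cost, name)
        for name, latency_ms, cost in MACHINE_CLASSES
        if latency_ms <= max_latency_ms
    ]
    if not candidates:
        raise ValueError("no machine class meets this latency requirement")
    return min(candidates)[1]


tightly_coupled = cheapest_class(0.01)   # fine-grain parallel code
loosely_coupled = cheapest_class(5.0)    # tolerates a few milliseconds
independent_jobs = cheapest_class(500.0) # jobs that barely communicate
```

Only the first application justifies the MPP; the other two get the same peak CPU performance at half or a quarter of the cost.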

(38)

What is a Simple Service?

• Take any system – it has multiple functionalities

– We can implement each functionality as an independent distributed service

– Or we can bundle multiple functionalities in a single service

• Whether functionality is an independent service or one of many method calls into a “glob of software”, we can always make them as Web services by converting interface to WSDL

• Simple services are obtained by taking functionalities and making them as small as possible subject to “rule of millisecond”

– Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds to use message rather than method call

– Use scripting or compiled integration of functionalities ONLY when they require <1 millisecond interaction latency

• Apache web site has many projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services

– Makes it hard to integrate sharing common security, user profile, file access .. services
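A minimal sketch of the "rule of millisecond" as a partitioning decision: functionalities that must interact in under 1 ms stay bundled behind method calls, and everything else becomes its own simple service. The function name and the example functionalities with their latency needs are hypothetical.

```python
RULE_OF_MILLISECOND_MS = 1.0  # threshold stated on the slide


def partition(functionalities):
    """Split {name: required interaction latency in ms} into two groups.

    Returns (bundled, services): names that need method-call speed and so
    stay inside one program, and names that can be separate services.
    """
    bundled = sorted(
        name for name, need_ms in functionalities.items()
        if need_ms < RULE_OF_MILLISECOND_MS
    )
    services = sorted(
        name for name, need_ms in functionalities.items()
        if need_ms >= RULE_OF_MILLISECOND_MS
    )
    return bundled, services


# Hypothetical system: latency each functionality needs when called by the others.
system = {
    "matrix_kernel": 0.005,  # called inside inner loops
    "security": 50.0,
    "user_profile": 200.0,
    "file_access": 20.0,
}
bundled, services = partition(system)
```

Here only the inner-loop kernel stays a method call; security, user profile and file access become shared services that other systems can reuse.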

(39)

Grids of Grids of Simple Services

• Link via methods → messages → streams

• Services and Grids are linked by messages

• Internally to service, functionalities are linked by methods

• A simple service is the smallest Grid

• We are familiar with method-linked hierarchy

– Lines of Code → Methods → Objects → Programs → Packages

[Figure: “Overlay and Compose Grids of Grids”, showing the hierarchies Methods, Services, Component Grids; CPUs, Clusters, MPPs, Compute Resource Grids; Databases, Federated Databases; Sensor, Sensor Nets; Data]

(40)

Component Grids?

• So we build collections of Web Services which we package as component Grids
– Visualization Grid
– Sensor Grid
– Utility Computing Grid
– Person (Community) Grid
– Earthquake Simulation Grid
– Control Room Grid
– Crisis Management Grid
• We build bigger Grids by composing component Grids using the Service Internet

(41)

Critical Infrastructure (CI) Grids built as Grids of Grids

[Figure: Critical Infrastructure Grids (Flood CIGrid, Electricity CIGrid, Gas CIGrid) with their own services and filters (Flood Service and Filters, Gas Service and Filters), composed from component Grids (Visualization Grid, Collaboration Grid, Sensor Grid, Compute Grid, GIS Grid) and Core Grid Services (Physical Network, Registry, Metadata, Data Access/Storage, Security, Notification, Workflow, Messaging, Portals)]

(42)

Two-level Programming I

• The paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services assuming we have solved problem of programming the service

[Figure: a Service with Data flowing in and out]

(43)

Two-level Programming II

• The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
• Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
• Such interpretative environments are the single processor analog of Grid Programming
• Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately

[Figure: Service 1, Service 2, Service 3 and Service 4 linked together]
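The two levels can be sketched in a few lines: each service is written with conventional code, and a separate composition step chains them by passing messages, much as a shell pipeline chains programs. The three services, the message keys and the `compose` helper are all hypothetical stand-ins; in a real Grid each step would be a remote Web Service invoked by a workflow engine.

```python
# Level 1: individual services, each an ordinary function from message to message.
def data_filter(message):
    """Drop obviously bad (negative) sensor readings."""
    good = [r for r in message["readings"] if r >= 0]
    return {**message, "readings": good}


def simulate(message):
    """Stand-in for a simulation: just average the readings."""
    readings = message["readings"]
    return {**message, "mean": sum(readings) / len(readings)}


def visualize(message):
    """Stand-in for visualization: produce a text report."""
    return {**message, "report": f"mean={message['mean']}"}


# Level 2: composition, which knows nothing about how each service works inside.
def compose(*services):
    def workflow(message):
        for service in services:
            message = service(message)
        return message
    return workflow


pipeline = compose(data_filter, simulate, visualize)
result = pipeline({"readings": [1.0, -5.0, 3.0]})
```

Reordering or swapping a step changes only the `compose` call, which is the separation between service level and composition level that the slide describes.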

(44)

What Should One Do?

• Grids and Service Oriented Architectures will
– Change landscape in mature areas like enterprise software
– Support new distributed applications in Science, Government, Education, Business and Community areas
– Encourage trends like outsourcing and globalization in all activities
• Web Service/Grid standards and infrastructure are still in their infancy but broad principles reasonably clear
• Many large scale software development activities are inconsistent with modern architectures
• Development of Application specific (XML-based) standards is an important “safe” area
