
Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CRESIS


Academic year: 2020


(1)

Cyberinfrastructure to integrate simulation, data and sensors for collaborative eScience in CRESIS

CERSER and CRESIS

http://nia.ecsu.edu/

Elizabeth City State University

October 19 2006

Geoffrey Fox

Computer Science, Informatics, Physics

Pervasive Technology Laboratories

Indiana University Bloomington IN 47401

(2)

Abstract

- Cyberinfrastructure supports eScience or collaborative science with distributed scientists, computers, data repositories and sensors.
- We describe the emerging Grid software for eScience and the underlying Cyberinfrastructure such as the TeraGrid.
- We give one example in detail: iSERVO – the International Solid Earth Research Virtual Organization supporting Earthquake Science
- This illustrates Computing Grids, Geographical Information System Grids and Sensor Grids
- We suggest implications for CReSIS – Center for

(3)

Why Cyberinfrastructure Useful

- Supports distributed science – data, people, computers
- Exploits Internet technology (Web2.0) adding management, security, supercomputers etc.
- It has two aspects: parallel – low latency (microseconds) between nodes and distributed – highish latency (milliseconds) between nodes
- Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
- Distributed aspect integrates already distinct components
- Cyberinfrastructure is in general a distributed collection of parallel systems
- Grids are made of services that are “just” programs or data sources packaged for distributed access

(4)

e-moreorlessanything and the Grid

- ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
- e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
- Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world. The growing use of outsourcing is one example
- The Grid provides the information technology e-infrastructure for e-moreorlessanything.
- A deluge of data of unprecedented and inevitable size must be managed and understood.
- People, computers, data and instruments must be linked.
- On demand assignment of experts, computers, networks and

(5)

TeraGrid: Integrating NSF Cyberinfrastructure

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.

Today 100 teraflops; tomorrow a petaflop; Indiana 20 teraflops today.

[Figure: map of TeraGrid sites. Labels: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo]

(6)

Virtual Observatory Astronomy Grid

Integrate Experiments

[Figure: sky images from different experiments. Panel labels: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map]

(7)

Grid Capabilities for Science

- Open technologies for any large scale distributed system that is adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
  - Security, Reliability, Management and state standards
- Service and messaging specifications
- User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc.
  - ~20 TeraGrid Science Gateways (their name for portals)
  - OGCE Portal technology effort led by Indiana
- Uniform approach to access distributed (super)computers supporting single (large) jobs and spawning lots of related jobs
- Data and meta-data architecture supporting real-time and archives as well as federation
  - Links to Semantic web and annotation
- Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
- Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment
  - http://www.nsf.gov/od/oci/ci-v7.pdf

(8)

APEC Cooperation for Earthquake Simulation

- ACES is a seven year-long collaboration among scientists interested in earthquake and tsunami prediction
  - iSERVO is Infrastructure to support work of ACES
  - SERVOGrid is (completed) US Grid that is a prototype of iSERVO
  - http://www.quakes.uq.edu.au/ACES/
- Chartered under APEC

(9)

[Figure: SERVOGrid architecture diagram. Labels: Database; Database; Analysis and Visualization Portal; Repositories, Federated Databases; Data Filter Services; Field Trip Data; Streaming Data; Sensors; Discovery Services; SERVOGrid; Research Simulations; Research; Education; Customization Services; From Research to Education; Education Grid; Computer Farm]

Grid of Grids: Research Grid and Education Grid

(10)

SERVOGrid and Cyberinfrastructure

- Grids are the technology based on Web services that implement Cyberinfrastructure i.e. support eScience or science as a team sport
  - Internet scale managed services that link computers, data repositories, sensors, instruments and people
- There is a portal and services in SERVOGrid for
  - Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs …
  - Job management and monitoring web services for running the above codes.
  - File management web services for moving files between various machines.
  - Geographical Information System services
  - Quaketables earthquake specific database
  - Sensors as well as databases
  - Context (dynamic metadata) and UDDI system long term metadata services

(11)

[Figure: map panels of geophysical measurements. Labels: Topography 1 km; Stress Change; Earthquakes; PBO; Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets; Volcanoes; Long Valley, CA; Northridge, CA; Hector Mine, CA; Greenland]

(12)

Some Grid Concepts I

- Services are “just” (distributed) programs sending and receiving messages with well defined syntax
- Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
  - Services can be any language from Fortran, Shell scripts, C, C#, C++, Java, Python, Perl – your choice!!
  - Web Services supported by all vendors (IBM, Microsoft …)
- Service overhead will be just a few milliseconds (more now) which is < typical network transit time
  - Any program that is distributed can be a Web service
  - Any program taking execution time ≥ 20ms can be an

(13)

Web services

- Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
- Web Services interact by exchanging messages in SOAP format
- The contracts for the message exchanges that implement those interactions are described via WSDL
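
The message exchange described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SERVOGrid code: the application namespace `urn:example:servo`, the operation name and the parameter names are all hypothetical, and only the SOAP 1.1 envelope namespace is taken from the standard. A real service would also publish a WSDL contract describing these messages.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:servo"  # hypothetical application namespace (assumption)


def build_request(operation, params):
    """Serialize one operation call as a SOAP 1.1 envelope (bytes)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message):
    """Recover the operation name and its parameters from an envelope."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP_NS}}}Body")
    call = list(body)[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params
```

The sender and receiver share nothing except the message format, which is the loose coupling the slide refers to: either side could be rewritten in another language without the other noticing.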

(14)

A typical Web Service

- In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
- The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: example Web service system. Labels: Payment, Credit Card; Warehouse, Shipping; WSDL interfaces; WSDL interfaces; Security; Catalog; Portal Service]

(15)

Some Grid Concepts II

- Systems are built from contributions from many different groups – you do not need one “vendor” for all components as Web services allow interoperability between components
  - One reason DoD likes Grids (called Net-Centric computing)
- Grids are distributed in services and data allowing anybody to store their data and to produce “their” view
  - Some think that University Library of future will curate/store data of their faculty
- “2 level programming model”: Classic programming of services and services are composed using workflow consistent with industry standards (BPEL)
- Grid of Grids: (System of Systems) Realistically Grid-like systems will be built using multiple technologies and “standards” – integrate separate Grids for Sensors, GIS, Visualization, computing etc. with OGSA (Open Grid Service Architecture from OGF) system Grid (Security, registry) into a single Grid
- Existing codes UNCHANGED; wrap as a service with metadata
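
The last point, wrapping an unchanged code with metadata, might look roughly like the following. This is a sketch under stated assumptions: the `AppDescriptor` fields and the `invoke` helper are hypothetical names chosen for illustration, and the "existing code" is stood in for by a Python one-liner so that the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class AppDescriptor:
    """Metadata describing an existing program; the program is not modified."""
    name: str
    executable: str                                  # where the program lives
    fixed_args: list = field(default_factory=list)   # how to invoke it
    working_dir: str = "."                           # where its data is located


def invoke(descriptor, user_args):
    """Run the described program with user-supplied arguments, return stdout."""
    cmd = [descriptor.executable, *descriptor.fixed_args, *user_args]
    result = subprocess.run(
        cmd, cwd=descriptor.working_dir,
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Stand-in for a legacy Fortran or C++ binary: echoes its arguments.
echo_demo = AppDescriptor(
    name="echo-demo",
    executable=sys.executable,
    fixed_args=["-c", "import sys; print(' '.join(sys.argv[1:]))"],
)
```

A service front end would then only need to accept `user_args` as a message and call `invoke`; swapping in a different code means writing a new descriptor, not new service logic.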

(16)
(17)

LEAD Gateway Portal

NSF Large ITR and TeraGrid Gateway

- Adaptive Response to Mesoscale weather events
- Supports Data exploration, Grid Workflow

(18)

Grid Workflow Data Assimilation in Earth Science

- Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

(19)

SERVOGrid has a portal

The Portal is built from portlets – providing user interface fragments for each service that are composed into the full interface – uses OGCE technology as does planetary science VLAB portal with University of Minnesota

(20)

GIS and Sensor Grids

- OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
- GML Geography Markup language defines specification of geo-referenced data
- SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors
- Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information
- Grid workflow links services that are designed to support streaming input and output messages
- We built Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
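
As an illustration of what an OGC service interface looks like from the client side, the sketch below builds a Web Map Service `GetMap` request using the key-value parameters of WMS 1.1.1. The endpoint URL and layer names are hypothetical; the slides do not say which WMS version or request encoding the SERVOGrid implementation uses, so treat this as a generic OGC example rather than a description of that system.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    query = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the default style for each layer
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(query)}"


# Hypothetical endpoint and layer names, roughly covering California.
url = getmap_url(
    "http://example.org/wms",
    layers=["faults", "seismicity"],
    bbox=(-125.0, 32.0, -114.0, 42.0),
    width=800,
    height=600,
)
```

Because the request is just a URL, any client that can issue an HTTP GET can ask the service for a rendered map image.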

(21)

Grid Workflow Datamining in Earth Science

- Work with Scripps Institute
- Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California

[Figure: streaming workflow pipeline. Labels: Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS); NASA GPS; Earthquake]
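
The chain of stages in this workflow (transformation, data checking, datamining) can be mimicked with Python generators, where each stage consumes a stream and yields a new one. This is only a sketch of the streaming-filter idea: the function names, units and thresholds are assumptions, and the final stage is a plain jump detector standing in for the Hidden Markov datamining, which is not reproduced here.

```python
def to_millimetres(stream):
    """Transformation stage: convert displacement from metres to millimetres."""
    for t, metres in stream:
        yield t, metres * 1000.0


def drop_bad(stream, limit_mm=500.0):
    """Data-checking stage: discard physically implausible samples."""
    for t, mm in stream:
        if abs(mm) <= limit_mm:
            yield t, mm


def flag_jumps(stream, threshold_mm=5.0):
    """Stand-in for the datamining stage: report sample-to-sample jumps
    larger than the threshold (NOT a Hidden Markov Model)."""
    previous = None
    for t, mm in stream:
        if previous is not None and abs(mm - previous) > threshold_mm:
            yield t, mm - previous
        previous = mm


def pipeline(raw_samples):
    """Compose the stages; samples flow through one at a time."""
    return flag_jumps(drop_bad(to_millimetres(raw_samples)))
```

Each stage sees one sample at a time, so the same composition works for a live feed or an archived file.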

(22)

Earth/Atmosphere Grids built as Grids of (library) Grids

[Figure: layered Grid of Grids diagram. Labels: Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations; Physical Network; Registry; Metadata; Earthquake Data, Filters & Simulation Services; Earthquake SERVOGrid; Tornado Grid; Ice Sheet PolarGrid; Data Access/Storage; Security; Notification; Workflow; Messaging; Portals; Visualization Grid; Collaboration Grid; Sensor Grid; Compute Grid; GIS Grid]

(23)

CReSIS PolarGrid

- Important CReSIS-specific Cyberinfrastructure components include
  - Managed data from sensors and satellites
  - Data analysis such as SAR processing – possibly with parallel algorithms
  - Electromagnetic simulations (currently commercial codes) to design instrument antennas
  - 3D simulations of ice-sheets (glaciers) with non-uniform meshes
  - GIS Geographical Information Systems
- Also need capabilities present in many Grids
  - Portal i.e. Science Gateway
  - Submitting multiple sequential or parallel jobs

(24)

What should we do?

- Identify existing programs that should be wrapped as Grid services
  - One can do this even for commercial codes as one keeps existing codes (Fortran, C++) unchanged and constructs a “metadata” wrapper defining where the program and its data are located and how to invoke it.
- Identify where parallel versions are needed and if help is needed in creating these
  - Parallel codes can be Grid services
  - Electromagnetic codes are commercial – in principle parallel
  - Ice sheet models can be parallelized for high resolution simulations
- Scope out system; Computational needs – Identify value of TeraGrid; data storage needs; network requirements
- Examine data model and produce a data Grid architecture
  - Use databases? Distributed? Metadata? Files? What are key performance issues?
- Examine integration of GIS with Grid Services
- Design and implement Science Gateway
- Are there important visualization requirements outside GIS?
- Are there key issues from security?
- Bring up core services such as registries

(25)

Benefits of CReSIS PolarGrid

- Shared resources support collaboration among CReSIS scientists
- Integration of Polar related data with appropriate compute resources enabling research on specific topics and studies across topics
- Polar Science Gateway accessing common services (programs), data and their integration as workflow
- Access to TeraGrid with same interface for large scale simulations
- Can share common capabilities (SAR analysis, GIS) with related Grids such as SERVOGrid, GEON, LEAD etc.
- Modular Grid services allow exchange of new capabilities preserving systems
  - e.g. Change EM Simulation service
- Management of dynamic heterogeneous data

(26)

Service: Description

Job Management: SERVO wraps Apache Ant as a web service and uses it to launch jobs. For a particular application, we design a build.xml template. The interface is simply a string array of build properties called for by the template. We’ve also built a simple generic “template engine” version of this.

Specific Applications (Virtual California, Geofest, Park, RDAHMM ..): These can be all launched by a single Job Management service or by custom instances of this with metadata preset to a particular application

Context Data Service: We store information gathered from users’ interactions with the portal interface in a generic, recursively defined XML data structure. Typically we store input parameters and choices made by the user so that we can recover and reload these later. We also use this for monitoring remote workflows. We have devoted considerable effort into developing WS-Context to support the generalization of this initial simple service.

Application and Host Metadata Service: We have an Application and a Host Descriptor service based on XML schema descriptors. Portlet interfaces allow code administrators to make applications available through the browser.

File Services: We built a file web service that could do uploads, downloads, and crossloads between different services. Clearly this supports specific operations such as file browsing, creation, deletion and copying.

Portal: We use an OGCE based portal based on portlet architecture

Authentication and Authorization: This uses capabilities built into portal. Note that simulations are typically performed on machines where user has accounts while data services are shared for read access

Information Service: We have built data model extensions to UDDI to support XPath queries over Geographical Information System capability.xml files. This is designed to replace OGC (Open Geospatial Consortium) Web registry service

Web Map Service: We built a Web Service version of this Open Geospatial Consortium specification. The WMS constructs images out of abstract feature descriptions.
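
The Job Management entry describes an interface that is "simply a string array of build properties" applied to a per-application build.xml template. A rough sketch of that idea follows. Everything concrete here is assumed for illustration: the `name=value` encoding of each property string, the property names, the paths and the template contents are not taken from SERVO. The template uses Ant's standard `<exec>` task; Python's `string.Template` stands in for whatever substitution mechanism the real service uses.

```python
from string import Template

# Hypothetical per-application template; $-placeholders are the build
# properties "called for by the template".
BUILD_TEMPLATE = Template("""<project name="$app" default="run">
  <target name="run">
    <exec executable="$executable" dir="$workdir">
      <arg value="$input"/>
    </exec>
  </target>
</project>
""")


def render_build_file(properties):
    """Fill the template from an array of "name=value" strings.

    Raises KeyError if the template needs a property that was not supplied.
    Values are inserted verbatim, so XML-special characters are not escaped.
    """
    values = dict(item.split("=", 1) for item in properties)
    return BUILD_TEMPLATE.substitute(values)


build_xml = render_build_file([
    "app=geofest",
    "executable=/opt/geofest/bin/geofest",  # hypothetical install path
    "workdir=/tmp/run1",
    "input=fault.inp",
])
```

The appeal of this design is that the service interface stays the same for every application; only the template changes.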

(27)

Service Eye Chart Continued

Service: Description

Workflow/Monitoring/Management Services: The HPSearch project uses HPSearch Web Services to execute JavaScript workflow descriptions. It has more recently been revised to support WS-Management and to support both workflow (where there are many alternatives) and system management (where there is less work). Management functions include life cycle of services and QoS for inter-service links

Sensor Grid Services: We are developing infrastructure to support streaming GPS signals and their successive filtering into different formats. This is built over NaradaBrokering (see messaging service). This does not use Web Services as such at present but the filters can be controlled by HPSearch services.

Messaging Service: This is used to stream data in workflow fed by real-time sources. It is based on NaradaBrokering which can also be used in cases just involving archival data

Notification Service: This supplies alerts to users when filters (data-mining) detect features of interest

QuakeTables Database Services: The USC QuakeTables fault database project includes a web service that allows you to search for Earthquake faults.

Scientific Plotting Services: We are developing Dislin-based scientific plotting services as a variation of our Web Map Service: for a given input service, we can generate a raster image (like a contour plot) which can be integrated with other scientific and GIS map plot images.

Data Tables Web Service: We are developing a Web Service based on the National Virtual Observatory’s VOTables XML format for tabular data. We see this as a useful general format for ASCII data produced by various application codes in SERVO and other projects.

Key interfaces/standards/software Used: GML, WFS, WMS; WSDL, XML Schema with pull parser XPP, SOAP with Axis 1.; UDDI, WS-Context, JSR-168, JDBC, Servlets; WS-Management, VOTables in Research

Key interfaces/standards/software NOT Used (often just for historical reasons as project predated standard): WS-Security, JSDL, WSRF, BPEL, OGSA-DAI

(28)

Key GIS and Related Services

Component: Description

Sensor Grid: Publish/subscribe system allows data streams to be reorganized using topics.

Web Map Services: Supports integration of local and remote map services; treats Google maps as an OGC-compliant map server;

Web Feature Service: Supports both streaming and non-streaming returns of query results.

WS-Context: Contexts can be used to hold arbitrary content (XML, URIs, name-value pairs); can be used to support distributed session state as well as persistent data; currently researching scalability.

Support for streaming data between services; supports scriptable workflows so not limited to DAGs;
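
The Sensor Grid row says that publish/subscribe lets data streams be reorganized using topics. A minimal in-memory sketch of that pattern is below. It is not NaradaBrokering and does not use its API; the `TopicBroker` class, the topic naming scheme and the station identifier are all hypothetical.

```python
from collections import defaultdict


class TopicBroker:
    """Toy in-process broker: subscribers register callbacks per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def route_by_station(broker, raw_topic="gps/raw"):
    """Reorganize one mixed stream into per-station topics."""
    def forward(message):
        broker.publish(f"gps/station/{message['station']}", message)
    broker.subscribe(raw_topic, forward)
```

A consumer interested in a single station subscribes to that station's topic and never sees the rest of the raw feed, which is the kind of reorganization the table entry describes.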
