
(1)

Complexity Computational Environment, integrating data and simulation on the Grid: Multiscale computing

JP

June 18, 2003

Geoffrey Fox, Marlon Pierce

Community Grids Lab

Indiana University

[email protected]

http://academia.web.cern.ch/academia/lectures/grid/

(2)

Grid Backdrop from CT Project

• Grid Computational Environment (GCE) for SERVOGrid based on Web services (WS)

• Job submission, job management, simple security (to be addressed), file processing

• Support key simulation and pattern recognition codes as WS (DISLOC*, SIMPLEX*, VC, PARK, GEOFEST*, DAHMM, PDPC)

– * = current

• Support databases and visualization

• Simple workflow, notification, metadata services

• Initial Schema for GEM specific (meta-)data

• Portlet based interfaces

• Extend to ACES (Japan, Australia) for distributed computers, software, databases, clients

• Collaboration and other useful portlets

(3)

AIST Additions

• Compatibility with Grid Services

• Use of OGSA-DAI XML and SQL database standards

– Including extensions for streaming (sensor) data

– Including extensions for integration with simulations

• Optimization for parallel simulations (e.g. parallel IO) (?)

• Better workflow, notification, metadata services

– openGIS/GML compatibility (fault and other Schemas)

– Semantic Grid

• Autonomic (Robust Reliable Resilient) services (?)

• Support multi-scale simulations and data assimilation

• ServoPSE Problem Solving Environments (?)

– GeoLanguage (ServoML specializing CCEML) integrating workflow and multi-scale support

(4)

Figure: databases and federated databases, sensor nets producing streaming data, loosely coupled filters, closely coupled compute nodes, and analysis and visualization repositories.

(5)

Sources of Grid Technology?

• Grids support distributed collaboratories or virtual organizations that support people, computers, observational data and the results of thought and data processing

• The Web and Web Services

– Most important for Information Grids, as these are naturally service-based

• Distributed Objects (CORBA, Java/Jini, COM)

– A distributed object is essentially the same as a service

• Globus, Legion, Condor, NetSolve, Ninf and other high performance computing activities

– Compute/File Grids that need to be made into services (Globus GT3) and integrated with Information Grids for Geocomplexity

(6)

Taxonomy of Grid Functionalities

• Enterprise Grid: Grid supporting a company’s enterprise infrastructure

• Campus Grid: Grid supporting university community computing

• Complexity or Hybrid Grid: hybrid combination of Information and Compute/File Grid emphasizing integration of experimental data, filters and simulations (data assimilation)

• Information Grid or Data Service Grid: Grid service access to distributed information, data and knowledge repositories

• Desktop Grid (e.g. SETI@Home): “Internet Computing” and “Cycle Scavenging” with a secure sandbox on large numbers of untrusted computers

• Compute/File Grid or Data File Grid: run multiple jobs with distributed compute and data resources (global “UNIX Shell”)

(7)

Approach

• Build on e-Science methodology and Grid technology

• Geocomplexity (and Biocomplexity) applications with multi-scale models, scalable parallelism and data assimilation as key issues

– Data-driven models for earthquakes

• Use existing code/database technology (SQL/Fortran/C++) linked to “Application Web/OGSA services”

– XML specification of models, computational steering and scale is supported at the “Web Service” level, as “high performance” is not needed there

– Allows use of Semantic Grid technology

• AIST builds on CT

Figure: typical codes as WS, linking to the user and to other WS (data sources).

(8)

Figure: SERVOGrid (Complexity) Computing Model: an HPC simulation fed by many distributed data filters, which massage data for the simulation, linked through OGSA-DAI Grid services and other Grid and Web services to analysis, control and visualization.

This type of Grid integrates with parallel computing:

• Multiple HPC facilities, but only one is used at a time

• Many simultaneous data sources and sinks

(9)

Data Assimilation

• Data assimilation implies one is solving some optimization problem, which might have a Kalman Filter like structure

• As discussed by DAO at the Earth Science meeting, one will become more and more dominated by the data ($N_{\text{obs}}$ much larger than the number of simulation points)

• The natural approach is to form, for each local (position, time) patch, the “important” data combinations, so that the optimization doesn’t waste time on large-error or insensitive data

• Data reduction is done in a natural distributed fashion, NOT on the HPC machine, as distributed computing is most cost effective if the calculations are essentially independent
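The “Kalman Filter like structure” mentioned above can be illustrated with a single linear analysis step. The sketch below is a generic textbook Kalman update written with NumPy, not the DAO’s or SERVOGrid’s assimilation code; the function name and the toy numbers are assumptions for illustration. With three observations of one unknown ($N_{\text{obs}} > N_{\text{unknowns}}$), the data pull the estimate toward the observed value and shrink its error variance.

```python
import numpy as np


def kalman_analysis(x, P, z, H, R):
    """One linear Kalman-filter analysis (update) step.

    x: prior state estimate, shape (n,)
    P: prior error covariance (symmetric), shape (n, n)
    z: observations, shape (m,)
    H: linear observation operator mapping state to observations, shape (m, n)
    R: observation error covariance (symmetric), shape (m, m)
    """
    innovation = z - H @ x                   # misfit between data and forecast
    S = H @ P @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ P).T          # gain K = P H^T S^-1 (P and S symmetric)
    x_post = x + K @ innovation              # updated state
    P_post = (np.eye(x.size) - K @ H) @ P    # updated error covariance
    return x_post, P_post


# Toy example (hypothetical numbers): one unknown, three unit-variance observations.
x0 = np.array([0.0])
P0 = np.array([[1.0]])
z = np.array([1.0, 1.0, 1.0])
H = np.ones((3, 1))
R = np.eye(3)
x1, P1 = kalman_analysis(x0, P0, z, H, R)
# x1 ≈ [0.75], P1 ≈ [[0.25]]
```

Large-error observations enter through large diagonal entries of `R` and therefore receive small weight in the gain `K`, which is one way an assimilation avoids spending effort on them.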

(10)

Distributed Filtering

Figure: an HPC machine sends the needed filter to data filters on a distributed machine; each data filter reduces the $N_{\text{obs}}$ observations of its geographically distributed sensor patch (patch 1, patch 2, …) to $N_{\text{filtered}}$ values, and the HPC machine receives the filtered data.

$N_{\text{obs}}^{\text{local patch}} \gg N_{\text{filtered}}^{\text{local patch}} \sim N_{\text{unknowns}}^{\text{local patch}}$

In the simplest approach, the filtered data are obtained by linear transformations of the original data, based on the Singular Value Decomposition (SVD) of the least squares matrix.
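A minimal sketch of this simplest approach, assuming each local patch has a known linear least squares matrix `A` (rows = observations, columns = unknowns) that the HPC side can compute and ship as the filter. The filter is the transpose of the leading left singular vectors of `A`. Applying it at the sensor patch turns $N_{\text{obs}}$ numbers into at most $N_{\text{unknowns}}$ numbers, from which the least squares solution can still be recovered exactly. The function names, the truncation tolerance `rtol` and the random test data are illustrative assumptions.

```python
import numpy as np


def build_filter(A, rtol=1e-10):
    """Derive a linear data filter from the SVD of a least-squares matrix.

    A has shape (N_obs, N_unknowns) and maps the unknowns of one local patch
    to predicted observations. Returns (F, s, Vt), where F has shape
    (k, N_obs) with k <= N_unknowns: applying F reduces N_obs numbers to k.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > rtol * s[0]))   # numerical rank: drop insensitive directions
    return U[:, :k].T, s[:k], Vt[:k]


def solve_from_filtered(d, s, Vt):
    """Least-squares estimate of the unknowns using only the filtered data d = F @ b."""
    return Vt.T @ (d / s)


# Hypothetical patch: 1000 observations constraining 3 unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=1000)

F, s, Vt = build_filter(A)                 # computed once, sent to the patch
d = F @ b                                  # applied next to the sensors: 3 numbers
x_hat = solve_from_filtered(d, s, Vt)      # solved on the HPC side
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

Weighting the rows of `A` and `b` by inverse observation errors before the SVD would additionally down-weight large-error data; that step is omitted here for brevity.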

(11)

Grid Politics

• The Global Grid Forum meets 3 times per year, with about 700 attendees per meeting

– It exchanges information and defines standards for “everything” not done in W3C and OASIS

– e.g. Grid services, security, what a job, database or computer is, how to build portals ….

• There is a large project called Globus developing software largely for “compute/file” Grids

• There are some 50 Grid projects (mainly in Europe and USA) developing software and applications as well as installing infrastructure

– Some are “deployment” projects: EDG, NMI, VDT …..

• There are related initiatives called CyberInfrastructure (NSF, USA) and e-Science (UK)

• There is a proposed OMII (Open Middleware Infrastructure Institute)

(12)

OGSA, OGSI & Hosting Environments

• Start with Web Services in a hosting environment

• Add OGSI to get a Grid service and a component model

• Add OGSA to get an interoperable Grid, “correcting” differences in the base platform and adding key functionalities

Figure: layers of the OGSA environment: a hosting environment for WS; OGSI on Web services; models for resources and other entities; broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.); domain-specific services. Other models and the network lie outside, marked “possibly OGSA” or “not OGSA”.

(13)

OGSI: Open Grid Services Infrastructure

• http://www.gridforum.org/ogsi-wg

• It is a “component model” for Web services.

• It defines a set of behavior patterns that each OGSI service must exhibit.

• Every “Grid Service” portType extends a common base type.

– Defines an introspection model for the service

– You can query it (in a standard way) to discover

• What methods/messages a port understands

• What other port types the service provides

• If the service is “stateful”, what its current state is

• Factory model

• A set of standard portTypes for

– Message subscription and notification

– Service collections

• Each service is identified by a URI called the “Grid Service Handle” (GSH)

• GSHs are bound dynamically to Grid Service References (GSRs, typically WSDL documents)

– A GSR may be transient; GSHs are fixed.
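To make the handle/reference distinction concrete, here is a small in-memory sketch. The classes are hypothetical stand-ins loosely named after OGSI concepts (a base “GridService” port type, a factory, service data queried by introspection, a handle resolver); they are not the OGSI WSDL definitions or the Globus GT3 API, and the example.org URIs are placeholders. The point is that a client keeps the fixed GSH while the transient GSR it resolves to can change, for example when an instance migrates.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class GridServiceReference:
    """Transient binding for an instance: here just the URL of its WSDL."""
    wsdl_url: str


@dataclass
class GridServiceInstance:
    handle: str                                        # GSH: fixed URI naming the instance
    port_types: tuple                                  # includes the common base port type
    service_data: dict = field(default_factory=dict)   # queryable state

    def find_service_data(self, name):
        """Introspection: query port types or named state in one standard way."""
        if name == "portTypes":
            return self.port_types
        return self.service_data.get(name)


class HandleResolver:
    """Maps each fixed GSH to its current (possibly transient) GSR."""

    def __init__(self):
        self._bindings = {}

    def bind(self, gsh, gsr):
        self._bindings[gsh] = gsr   # rebinding replaces the GSR; the GSH is unchanged

    def resolve(self, gsh):
        return self._bindings[gsh]


class Factory:
    """Creates instances, assigns each a GSH and registers its first GSR."""

    BASE_PORT_TYPE = "GridService"

    def __init__(self, base_uri, resolver):
        self._base_uri = base_uri
        self._resolver = resolver
        self._ids = itertools.count(1)

    def create_service(self, port_types, host):
        n = next(self._ids)
        gsh = f"{self._base_uri}/instance/{n}"
        instance = GridServiceInstance(gsh, (self.BASE_PORT_TYPE, *port_types))
        self._resolver.bind(gsh, GridServiceReference(f"http://{host}/services/{n}?wsdl"))
        return instance


resolver = HandleResolver()
factory = Factory("http://servogrid.example.org/disloc", resolver)
svc = factory.create_service(["NotificationSource"], host="node1.example.org")
first_gsr = resolver.resolve(svc.handle)

# The instance moves to another host: only the GSR changes; clients keep the same GSH.
resolver.bind(svc.handle, GridServiceReference("http://node2.example.org/services/1?wsdl"))
second_gsr = resolver.resolve(svc.handle)
```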

(14)

OGSA-DAI

(Malcolm Atkinson, Edinburgh) UK e-Science Grid Core Programme

• Development of Data Access and Integration Services for OGSA

• http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI

• Access to XML databases

• Access to relational databases

(15)

OGSA-DAI Key Services

• GridDataService (GDS): access to data and DB operations

• GridDataServiceFactory (GDSF): makes GDS and GDSF

• GridDataServiceRegistry (GDSR): discovery of GDS(F) and data

• GridDataTranslationService (GDTS): translates or transforms data

• GridDataTransportDepot (GDTD): data transport with persistence

Integrated structured data transport; relational and XML models supported; role-based authorisation.
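The division of labour among the first three services (registry for discovery, factory for creation, data service for operations) can be sketched in a few lines of Python, with the standard `sqlite3` module standing in for a relational resource. All class and method names here are hypothetical and only mirror the roles in the list above. They are not the OGSA-DAI interfaces, and the fault table is made-up illustrative data.

```python
import sqlite3


class GridDataService:
    """Hypothetical GDS: access to one data resource and operations on it."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()


class GridDataServiceFactory:
    """Hypothetical GDSF: describes a data resource and makes GDS instances for it."""

    def __init__(self, metadata, connect):
        self.metadata = metadata   # e.g. {"model": "relational", "content": "faults"}
        self._connect = connect    # callable returning a DB-API connection

    def create_service(self):
        return GridDataService(self._connect())


class GridDataServiceRegistry:
    """Hypothetical GDSR: discovery of factories (and hence of data) by metadata."""

    def __init__(self):
        self._factories = []

    def register(self, factory):
        self._factories.append(factory)

    def find(self, **criteria):
        return [f for f in self._factories
                if all(f.metadata.get(k) == v for k, v in criteria.items())]


# Illustrative relational resource with made-up fault slip values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faults (name TEXT, slip_m REAL)")
conn.executemany("INSERT INTO faults VALUES (?, ?)",
                 [("F1", 0.5), ("F2", 2.0), ("F3", 3.5)])

registry = GridDataServiceRegistry()
registry.register(GridDataServiceFactory({"model": "relational", "content": "faults"},
                                         lambda: conn))

(factory,) = registry.find(content="faults")   # discovery
gds = factory.create_service()                  # creation
rows = gds.perform("SELECT name FROM faults WHERE slip_m > ? ORDER BY name", (1.0,))
```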

(16)

Figure: clients accessing a relational database, an XML database and a directory/file system through a Grid Data Service.

(17)

Integration of Data and Filters

• One has the OGSA-DAI data repository interface combined with the WSDL of the (Perl, Fortran, Python …) filter

• The user only sees WSDL, not the data syntax

• There are some non-trivial issues as to where the filtering compute power is

– Microsoft says filter next to data

Figure: a filter placed next to the database (DB), accessed through the WSDL of the filter.
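A minimal sketch of “filter next to data”, assuming the filter is a simple geographic patch test. The data service registers the filter inside the database engine (here SQLite, via the standard `sqlite3.Connection.create_function`), so only matching rows leave the service. The client calls one method rather than writing SQL, mirroring the point that the user sees only the filter’s interface. The class, method, table and station values are hypothetical, and in-process SQLite only approximates a remote database.

```python
import sqlite3


def within_patch(lat, lon, lat0, lon0, half_width):
    """Hypothetical filter: 1 if (lat, lon) lies in a square patch around (lat0, lon0), else 0."""
    return int(abs(lat - lat0) <= half_width and abs(lon - lon0) <= half_width)


class FilteredDataService:
    """Data service that applies the filter where the data live.

    Clients call readings_in_patch(); they never see the table layout or SQL.
    """

    def __init__(self, conn):
        self._conn = conn
        # Register the filter inside the database engine.
        self._conn.create_function("within_patch", 5, within_patch)

    def readings_in_patch(self, lat0, lon0, half_width):
        return self._conn.execute(
            "SELECT station, value FROM readings "
            "WHERE within_patch(lat, lon, ?, ?, ?) ORDER BY station",
            (lat0, lon0, half_width),
        ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, lat REAL, lon REAL, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", [
    ("A", 34.0, -118.0, 1.2),
    ("B", 34.1, -118.1, 0.8),
    ("C", 36.0, -120.0, 2.5),
])
service = FilteredDataService(conn)
nearby = service.readings_in_patch(34.0, -118.0, 0.2)   # station C is filtered out
```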

(18)

Figure: Information Grid, multiscale, parallel computing, experiments, GeoInformatics, extended/integrated VA+PARK+GEOFEST, large system simulations, general complex systems simulations and load balancing algorithms.

(19)

Figure: SERVOGrid Complexity Computing Environment: users reach a CCE control portal (aggregation), which connects through a middle tier with XML interfaces to database, compute, sensor, parallel simulation, visualization and application services (Application Service-1, -2, -3) and an XML meta-data service.

(20)

SERVOGrid Requirements

• Seamless access to data repositories and large scale computers

• Integration of multiple data sources, including sensors, databases and file systems, with the analysis system

– Including filtered OGSA-DAI

• Rich meta-data generation and access, with a SERVOGrid-specific Schema extending openGIS standards and using the Semantic Grid

• Portals with a component model for user interfaces and web control of all capabilities

(21)

Figure: Grid Computing or Programming Environments built on “Core” Grid resources: a portal such as “Jetspeed”; an application/user framework supporting development and deployment of OGSI-compliant AWS (Application Web Services); database Web services, resource Grid services and generic application services in a hosting environment; and “sophisticated” system services, linked through OGSA interoperability layers.

(22)

Taxonomy of Grid Operational Style

• R3 or Autonomic Grid: fault tolerant and self-healing Grid (Robust, Reliable, Resilient: R3)

• Collaboration Grid: Grid supporting collaborative tools like the Access Grid, whiteboard and shared applications

• Lightweight Grid: Grid designed for rapid deployment and minimum life-cycle support costs

• Peer-to-peer Grid: Grid built with peer-to-peer mechanisms

• Semantic Grid: integration of Grid and Semantic Web meta-data and ontology technologies

(23)

Paradigms, Protocols, Platforms and Hosting

• We can start from the Web view, where the basic Grid paradigm is meta-data rich Web Services communicating via messages

• These have some basic support from some runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)

• These runtimes are the distributed equivalent of operating system functions, as in the UNIX Shell

(24)

Permeating Principles and Policies

• Meta-data rich, message-linked Web Services as the permeating paradigm

• “User” component model such as “Enterprise JavaBean (EJB)” or .NET

• Service management framework, including a possible Factory mechanism

• High level invocation framework describing how you interact with system components

– This could, for example, be used to allow the system to be built from either W3C or GGF style (OGSI) Web Services and to protect the user from changes in their specifications.

• Security is a service, but the need for fine grain selective authorization encourages

• Policy context that sets the rules for each particular Grid

– Currently OGSA supports policies for routing, security and resource use.

• The Grid Fabric, or set of resources, needs mechanisms to manage it. This includes automatic recording of meta-data and configuration of software.

• Quality of service (QoS) for the network, which implies performance monitoring and bandwidth reservation services

– Challenging, as end-to-end and not just backbone QoS is needed.

• Messaging systems like MQSeries from IBM provide robustness from asynchronous delivery, can abstract the destination and allow customization of content, such as converting between different interface specifications.

(25)

Virtualization

The Grid could, and sometimes does, virtualize various concepts:

• Location: URI (Uniform Resource Identifier) virtualizes URL

• Replica management (caching) virtualizes file location, generalized by the GriPhyN virtual data concept

• Protocol: message transport and WSDL bindings virtualize the transport protocol as a QoS request

• P2P or publish-subscribe messaging virtualizes the matching of source and destination services

• Semantic Grid virtualizes knowledge as a meta-data query

• Brokering virtualizes resource allocation
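As an illustration of how publish-subscribe messaging virtualizes the matching of sources and destinations, the toy broker below lets publishers send to a topic name without knowing who (if anyone) is listening; the broker does the matching. It is a single-process, synchronous sketch with shell-style topic patterns (via `fnmatch`, where `*` also matches `/`), chosen for brevity. It is not the API of any Grid messaging system, and the topic names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class Broker:
    """Toy topic-based publish-subscribe broker (single process, synchronous)."""

    def __init__(self):
        self._subscriptions = defaultdict(list)   # topic pattern -> callbacks

    def subscribe(self, pattern, callback):
        """Register interest in all topics matching a shell-style pattern."""
        self._subscriptions[pattern].append(callback)

    def publish(self, topic, message):
        """Deliver message to every matching subscriber; return how many received it."""
        delivered = 0
        for pattern, callbacks in self._subscriptions.items():
            if fnmatchcase(topic, pattern):
                for callback in callbacks:
                    callback(topic, message)
                    delivered += 1
        return delivered


broker = Broker()
received = []
broker.subscribe("sensors/gps/*", lambda topic, msg: received.append((topic, msg)))

n_gps = broker.publish("sensors/gps/station42", {"east_mm": 1.5})
n_seismic = broker.publish("sensors/seismic/station7", {"pga_g": 0.02})   # no subscriber
```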

(26)

Interfaces and Functionality and Semantics I

• The Grid platform tries to minimize detail in protocols and maximize detail in interfaces to enhance scaling

• However, rich meta-data and semantics are critical for correct and interesting operation

– Put as much semantic interpretation as you can into specific services

– Lack of semantic interoperation is in fact the main weakness of today’s Grids and Web services

• Everything becomes a service, whether at system or application level

• There are some very important “Global Services”

– Discovery (look up) and registration of service metadata

– Workflow

(27)

Interfaces and Functionality and Semantics II

• There are many other generally important services

• OGSA-DAI The Database Service

• Portal Service linked to by WSRP (Web services for Remote Portals)

• Notification of events

• Job submission

• Provenance – interpret meta-data about the history of data

• File Interfaces

• Sensor service – satellites …

• Visualization

(28)

Categories of Worldwide Grid Services to be exploited by SERVOGrid

• 1) Types of Grid

– R3

– Lightweight

– P2P

– Federation and Interoperability

• 2) Core Infrastructure and Hosting Environment

– Service Management

– Component Model

– Service wrapper/Invocation

– Messaging

• 3) Security Services

– Certificate Authority

– Authentication

– Authorization

– Policy

• 4) Workflow Services and Programming Model

– Enactment Engines (Runtime)

– Languages and Programming

– Compiler

– Composition/Development

• 5) Notification Services

• 6) Metadata and Information Services

– Basic including Registry

– Semantically rich Services and meta-data

– Information Aggregation (events)

– Provenance

• 7) Information Grid Services

– OGSA-DAI/DAIT

– Integration with compute resources

– P2P and database models

• 8) Compute/File Grid Services

– Job Submission

– Job Planning, Scheduling, Management

– Access to Remote Files, Storage and Computers

– Replica (cache) Management

– Virtual Data

– Parallel Computing

• 9) Other services including

– Grid Shell

– Accounting

– Fabric Management

– Visualization, Data-mining and Computational Steering

– Collaboration

• 10) Portals and Problem Solving Environments

• 11) Network Services

(29)

Two-level Programming I

• The paradigm implicitly assumes a two-level programming model

• We make a service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies

– C++, Java or Fortran Monte Carlo module

– Data streaming from a sensor or satellite

– Specialized (JDBC) database access

• Such nuggets accept and produce data from users, files and databases

• The Grid is built by coordinating such nuggets, assuming we have solved the problem of programming the nugget

Figure: a single nugget.

(30)

Two-level Programming II

• The Grid is concerned with the linkage and distribution of the nuggets, the only addition being runtime interfaces to the Grid as opposed to a UNIX data stream

• Familiar from the use of UNIX Shell, Perl or Python scripts to produce real applications from core programs

• Such interpretative environments are the single processor analog of Grid programming, and this tends to be called workflow

• Workflow is the composition of multiple services (programs) together to make a new service

– Includes “Software Bus”, “Application Integration”, “Co-ordination Languages” etc.

Figure: nuggets (Nugget1, Nugget2, …) linked together into a new service.
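The shell analogy can be sketched directly: each nugget is a function from an input data stream to an output stream, and the coordinating layer only composes them, as `a | b | c` does in a UNIX shell. In this sketch the nuggets are local Python functions with hypothetical names; in the Grid setting each would instead be a remote service, with Grid runtime interfaces replacing the UNIX pipe.

```python
from functools import reduce


def pipeline(*nuggets):
    """Compose nuggets left to right, like `a | b | c` in a UNIX shell."""
    def run(stream):
        return reduce(lambda data, nugget: nugget(data), nuggets, stream)
    return run


# Hypothetical nuggets; in a Grid each would be a remote service.
def drop_missing(values):
    return (v for v in values if v is not None)   # remove gaps in a sensor record


def metres_to_mm(values):
    return (1000.0 * v for v in values)


def mean(values):
    values = list(values)
    return [sum(values) / len(values)]


workflow = pipeline(drop_missing, metres_to_mm, mean)
result = workflow([0.5, None, 1.5])   # -> [1000.0]
```

The composed `workflow` is itself just another nugget, which is the sense in which workflow “makes a new service” from existing ones.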

(31)

Workflow

• Workflow has at least 4 parts

– “Programming Environment”: typically a GUI to drag and drop services and their linkages (familiar from AVS etc., which was workflow for visualization)

– Language: from XML to extended Python

– Compiler: converting the language into an executable

– Runtime: controlling the flow of information and notification events

• Can use Python, Mathematica, Matlab, JavaSpaces, IBM BPEL4WS, DoE CCA etc.

– We don’t think current systems are very near “what we will want”, but we expect much progress over the next 3 years and plenty of systems to work with
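The Language, Compiler and Runtime parts listed above can be shown in miniature. In this sketch (an assumption, not any of the systems named), the “language” is a Python dictionary mapping each step to a function and the names of its inputs. The “compiler” is a topological sort from the standard `graphlib` module (Python 3.9+), and the “runtime” executes the steps in that order and emits a notification event after each one. The step names and functions are hypothetical.

```python
from graphlib import TopologicalSorter


def compile_workflow(spec):
    """'Compiler': turn {step: (function, [input steps])} into an execution order."""
    graph = {step: set(inputs) for step, (_, inputs) in spec.items()}
    return list(TopologicalSorter(graph).static_order())


def run_workflow(spec, notify=print):
    """'Runtime': run each step after its inputs, passing results along and emitting events."""
    results = {}
    for step in compile_workflow(spec):
        function, inputs = spec[step]
        results[step] = function(*(results[name] for name in inputs))
        notify(f"completed {step}")
    return results


# 'Language': a hypothetical four-step workflow.
spec = {
    "observations": (lambda: [1.0, 2.0, 3.0], []),
    "filtered": (lambda obs: [x for x in obs if x > 1.0], ["observations"]),
    "model": (lambda: 10.0, []),
    "misfit": (lambda data, prediction: sum(data) - prediction, ["filtered", "model"]),
}
events = []
results = run_workflow(spec, notify=events.append)   # results["misfit"] == -5.0
```

A real runtime would also run independent steps (here `observations` and `model`) concurrently on different resources; the sequential loop keeps the sketch short.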

(32)

Workflow, GCEs and Problem Solving Environments (PSEs)

• There is some confusion between the fields of workflow (Grid Computing Environments, GCE) and PSEs

• To the extent PSEs “just” allow manipulation of “nuggets”, they are indistinguishable from a domain specific GCE

• They are distinct if they support intra-nugget operations such as

– Integration of mesh and simulation

– Closely coupled code linkage

– Generation of code from a high level interface like Mathematica

(33)

Figure: SERVOGrid complexity simulation service with a database and an XML meta-data service (job metadata, tool metadata, multiscale ontologies), jobs and tools, SERVOPSE programs using CCEML (SERVOML), complexity scripts and selected GeoInformatics data.

Importance of Metadata Service; how should this be implemented?

(34)

Metadata Approaches

• Specialized services like UDDI and MDS (Globus)

– Nobody likes UDDI

– MDS uses LDAP

– RGMA is MDS with a relational database backend

• “By hand”, as in the current GEM Portal, which is roughly the same as using service-stored SDEs (Service Data Elements) as in OGSI

• Some new MDS coming from Globus GT3?

– Current MDS has both a Schema (insufficient for us) and a “database technology”

• Semantic Grid technologies

• Some basic XML database (Oracle, Xindice …)
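As a sketch of the simplest option in the list above (metadata kept as XML and queried directly), the code below stores service descriptions in one XML document and finds services by element values using Python’s standard `xml.etree.ElementTree`. The schema (`service`, `type`, `input`, `output`) and the entries for DISLOC, SIMPLEX and GEOFEST are illustrative assumptions, not the GEM or SERVOGrid schema. A real deployment would use an XML database or a Semantic Grid store.

```python
import xml.etree.ElementTree as ET

# Illustrative metadata document; the schema and entries are assumptions.
METADATA = """<services>
  <service name="DISLOC">
    <type>simulation</type>
    <input>fault</input>
    <output>displacement</output>
  </service>
  <service name="SIMPLEX">
    <type>inversion</type>
    <input>displacement</input>
    <output>fault</output>
  </service>
  <service name="GEOFEST">
    <type>simulation</type>
    <input>mesh</input>
    <output>displacement</output>
  </service>
</services>"""


def find_services(xml_text, **criteria):
    """Return names of services whose child elements match every criterion."""
    root = ET.fromstring(xml_text)
    return [
        service.get("name")
        for service in root.iter("service")
        if all(service.findtext(tag) == value for tag, value in criteria.items())
    ]


simulations = find_services(METADATA, type="simulation")   # ['DISLOC', 'GEOFEST']
inverters = find_services(METADATA, input="displacement")  # ['SIMPLEX']
```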

(35)

Workflow and SERVOGrid CCE

• SERVOGrid should use workflow technology to support both

– “Code and data coupling” (DISLOC with SIMPLEX etc.)

– Multiscale features

• Implementing a multiscale model requires

– Building Web services for each model

– Describing each model with metadata

– Describing the linkage of models (linkage of ports on Web services)

– Describing when to use which scale model

• So workflow and multiscale depend on Web services described by rich metadata

• This analysis isn’t correct if scales must be “tightly coupled”, as current workflow won’t support this (CCA from DoE claims to address this, but it is not clear if it is general)

– We should focus on multiscale models with loose “nugget” coupling
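The requirements above (a Web service per model, metadata per model, port linkage, and when to use which scale) can be sketched with plain data classes. Everything here is hypothetical: the model names, the length-scale ranges in kilometres, and the rule of preferring the narrowest valid range (that is, the most specialized model). The sketch only shows how metadata can drive model choice and check port compatibility for loose “nugget” coupling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetadata:
    service: str          # hypothetical Web service name for the model
    min_scale_km: float   # smallest length scale the model is valid for
    max_scale_km: float   # largest length scale (exclusive)
    in_port: str          # kind of data the model consumes
    out_port: str         # kind of data the model produces


def select_model(models, scale_km):
    """Pick the most specialized model (narrowest valid range) for a length scale."""
    candidates = [m for m in models if m.min_scale_km <= scale_km < m.max_scale_km]
    if not candidates:
        raise LookupError(f"no model covers {scale_km} km")
    return min(candidates, key=lambda m: m.max_scale_km - m.min_scale_km)


def can_link(upstream, downstream):
    """Loose 'nugget' coupling: an output port may feed an input port of the same kind."""
    return upstream.out_port == downstream.in_port


MODELS = [
    ModelMetadata("fault_patch_model", 0.0, 10.0, "fault", "stress"),
    ModelMetadata("regional_model", 10.0, 1000.0, "stress", "displacement"),
    ModelMetadata("coarse_model", 0.0, 1000.0, "fault", "displacement"),
]

small = select_model(MODELS, 5.0)     # fault_patch_model
large = select_model(MODELS, 500.0)   # regional_model
```

Keeping this logic in metadata rather than in the model codes is what lets a workflow engine, rather than a tightly coupled program, decide which scale model to invoke.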

(36)

Technologies under development at Indiana

• Portal infrastructure and portlets integrating with the rest of the Globus/OGSA-DAI community

– Including job submission, management of modest meta-data and linkage to databases

– Should package as an “application web service toolkit” and test on the ACES worldwide iSERVOGrid

• “Some” core portal metadata (Semantic Grid) services

• A messaging system between Web services that is useful for

– “Service Management”/Autonomic Grids

– Security

– Notification service

(37)

Web Services as a Portlet

• Each Web Service naturally has a user interface specified as “just another port”

– Customizable for universal access

• This gives each Web Service a portlet view specified (in XML, as always) by WSRP (Web services for Remote Portals)

• So a component model for resources “automatically” gives a component model for user interfaces

– When you build your application, you define the portlet at the same time

Figure: an application or content source exposed as a Web service (WSDL). Its general application ports interface with other Web services, while its WSRP ports (the user face of the Web service) define the WS as a portlet; Web services have other ports (Grid Service) to be …

(38)

Online Knowledge Center built from Portlets

• Web Services provide a component model for the middleware (see the large “common component architecture” effort in the Dept. of Energy)

• We should match each WSDL component with a corresponding user interface component

• Thus one “must use” a component model for the portal, with again an XML specification (portalML) of the portal component

(39)

Sample page with several portlets:

(40)

Figure: annotated portlet screenshot: “Provide information about application and host parameters”; “Select application to edit”.
