(1)

Lessons on Process and Standards in other science communities

IMAG Model Sharing Strategies Workshop

NIH April 10 2007

Geoffrey Fox

Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401

http://grids.ucs.indiana.edu/ptliupages/presentations/

(2)

What is a Model Electronically?

• This should have a label – a URI
• It should have a collection of data or metadata defining it
• It might have some way of building composite models by joining multiple smaller models together
  – Need to be able to define connections
• Maybe there are also “mechanisms” to manipulate the model or evolve it in time
• A computer program defines the data as values and the mechanisms as subroutines/methods
  – Programs can be Fortran, Python, C#, Prolog
  – Declarative or Imperative; Scripted or Compiled
• However in spite of software engineering, computer programs
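As a rough illustration of the ingredients listed on this slide (a URI label, defining data and metadata, connections for building composites, and a mechanism to evolve the model in time), here is a minimal Python sketch. The class names, the `urn:example:` URIs, and the connection format are all hypothetical; nothing here is taken from an IMAG standard.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
StepFn = Callable[[State, float], State]


def hold_state(state: State, dt: float) -> State:
    """Default mechanism (assumption): the model does not change in time."""
    return dict(state)


@dataclass
class Model:
    uri: str                                               # the label
    metadata: Dict[str, str] = field(default_factory=dict)
    state: State = field(default_factory=dict)             # the defining data
    step_fn: StepFn = hold_state                           # the "mechanism"

    def step(self, dt: float) -> None:
        """Evolve this model by one time step of length dt."""
        self.state = self.step_fn(self.state, dt)


# A connection copies one variable of a source model into a variable of a
# destination model: (source uri, source variable, dest uri, dest variable).
Connection = Tuple[str, str, str, str]


@dataclass
class CompositeModel:
    uri: str
    parts: Dict[str, Model]
    connections: List[Connection]

    def step(self, dt: float) -> None:
        # First pass values along the declared connections, then step each part.
        for src, src_var, dst, dst_var in self.connections:
            self.parts[dst].state[dst_var] = self.parts[src].state[src_var]
        for part in self.parts.values():
            part.step(dt)
```

A composite is built by handing `CompositeModel` a dictionary of parts keyed by URI plus a list of connections; each call to `step` then moves data across the connections before advancing every part.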

(3)

What are Questions?

• What are the models we are trying to define?
• What is Process to decide on needed standards and their Syntax?
• Are we mainly concerned about data defining the model and/or the programs that build the model?
• Where are overlaps between IMAG requirements and other computer science or science fields?
• Is the barrier to sharing models “science” (i.e. it is not clear what the common interfaces are) or

(4)

Some Examples

• There are many examples of relevant efforts to encourage sharing of models
• DMSO (Defense Modeling and Simulation Office) produced HLA (High Level Architecture) as a (pre-CORBA/Web Service) way of defining military models as discrete event simulations
  – Good but out of date
• The Open Geospatial Consortium OGC http://www.opengeospatial.org/ is a consortium of 339 organizations setting excellent standards for Geographical Information Systems
  – We could develop a BIS Biological Information System?
• International Virtual Observatory Alliance IVOA

(5)

Virtual Observatory Astronomy Grid

Integrate Experiments

[Figure: sky images labeled Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map]

(6)

OGC Standards I

Web Feature Service (WFS)
OGC 04-094, Date: 2005-05-03, Version: 1.1.0, Pages: 131
WFS allows a client to retrieve and update geospatial data encoded in GML from multiple Web Feature Services. The specification defines interfaces for data access and manipulation operations on geographic features, using HTTP as the distributed computing platform. Via these interfaces, a Web user or service can combine, use and manage geodata (the feature information behind a map image) from different

Sensor Model Language (SensorML)
OGC 05-086, Date: 2005-10-05, Version: 1.0, Pages: 110
The general models and XML encodings for sensors.

Observations and Measurements (O&M)
OGC 05-087r3, Date: 2006-02-24, Version: 0.13.0, Pages: 136
The general models and XML encodings for observations and measurements, including but not restricted to those using sensors. Based on GML.

GML
ISO/TC 211/WG 19136, OGC 03-105r1, Date: 2004-02-07, Version: 3.1.0, Pages: 601
GML is an XML grammar written in XML Schema for the modeling, transport, and storage of geographic information. GML provides a variety of kinds of objects for describing geography including features, coordinate reference systems, geometry, topology, time, units of measure and generalized values.
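Since WFS uses HTTP as its distributed computing platform, a request for features can be expressed as a plain URL. The sketch below builds a key-value GetFeature request in the style of WFS 1.1.0. The endpoint `http://example.org/wfs` and the feature type `fault` are assumptions for illustration, and a real server may require further parameters.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_getfeature_url(
    endpoint: str,
    type_name: str,
    bbox: Optional[Sequence[float]] = None,
    max_features: Optional[int] = None,
) -> str:
    """Build a WFS 1.1.0 style GetFeature URL (key-value pair encoding)."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.1.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    if bbox is not None:
        # Bounding box as comma-separated coordinates; axis order depends on
        # the coordinate reference system the server uses.
        params["BBOX"] = ",".join(str(v) for v in bbox)
    if max_features is not None:
        params["MAXFEATURES"] = str(max_features)
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and feature type, for illustration only.
url = build_getfeature_url(
    "http://example.org/wfs", "fault", bbox=(-119.0, 34.0, -118.0, 35.0)
)
```

The server's response to such a request would be a GML document, which the client parses like any other XML.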

(7)

OGC Standards II

Filter Encoding
OGC 04-095, Date: 3 May 2005, Version: 1.1.0, Pages: 40
Filter Encoding defines an XML encoding for filter expressions. A filter expression constrains property values to create a subset of a group of objects. The goal, typically, is to operate on just those objects by, for example, rendering them in a different color or saving them to another format.

Catalogue Services
OGC 02-087r3, Date: 2002-12-13, Version: 1.1.1, Pages: 239
Catalogue Service Implementation Specification defines a common interface that enables diverse but conformant applications to perform discovery, browse and query operations against distributed heterogeneous catalog servers.

Web Coverage Service (WCS)
OGC 03-065r6, Date: 2003-08-27, Version: 1.0.0, Pages: 67
WCS extends the WMS interface to allow access to geospatial “coverages” (raster data sets) that represent values or properties of geographic locations, rather than WMS generated maps (pictures).

Web Map Service (WMS)
OGC 06-042, Date: 2006-03-15, Version: 1.3.0, Pages: 85
A Web Map Service (WMS) produces maps of spatially
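The Filter Encoding entry above describes a filter expression as a constraint on property values that selects a subset of objects. A minimal in-memory Python analogue is sketched below. The comparison names follow Filter Encoding element names (`PropertyIsEqualTo` and so on), but the `PropertyFilter` class and the feature dictionaries are illustrative assumptions, not an implementation of the specification.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

Feature = Dict[str, Any]

# Comparison operators keyed by the Filter Encoding element they mimic.
COMPARISONS = {
    "PropertyIsEqualTo": operator.eq,
    "PropertyIsLessThan": operator.lt,
    "PropertyIsGreaterThan": operator.gt,
}


@dataclass(frozen=True)
class PropertyFilter:
    op: str             # one of the keys of COMPARISONS
    property_name: str  # which property of the feature to test
    literal: Any        # value to compare against

    def matches(self, feature: Feature) -> bool:
        # A feature without the property never matches.
        if self.property_name not in feature:
            return False
        return COMPARISONS[self.op](feature[self.property_name], self.literal)


def apply_filter(features: Iterable[Feature], flt: PropertyFilter) -> List[Feature]:
    """Return the subset of features whose property satisfies the filter."""
    return [f for f in features if flt.matches(f)]
```

The selected subset can then be rendered differently or written to another format, which is the typical goal the specification mentions.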

(8)

WMS uses WFS that uses data sources

<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          118.72,34.243 -118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
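To show how a client might consume a feature like the one above, here is a small Python sketch using only the standard library. It assumes the `gml` prefix is bound to the GML namespace `http://www.opengis.net/gml`, which the slide's fragment does not declare, so the declaration is added here to make the fragment parseable. The coordinate values are copied from the slide unchanged.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # assumed binding for the gml prefix

FEATURE_XML = f"""
<gml:featureMember xmlns:gml="{GML_NS}">
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          118.72,34.243 -118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
"""


def parse_fault(xml_text: str) -> dict:
    """Extract the name, author and line coordinates of one fault feature."""
    root = ET.fromstring(xml_text.strip())
    fault = root.find("fault")
    coords_text = fault.find(f".//{{{GML_NS}}}coordinates").text
    # gml:coordinates separates points by whitespace and ordinates by commas.
    points = [
        tuple(float(value) for value in pair.split(","))
        for pair in coords_text.split()
    ]
    return {
        "name": fault.findtext("name").strip(),
        "author": fault.findtext("author").strip(),
        "points": points,
    }


fault = parse_fault(FEATURE_XML)
```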

(9)

OGC Standards

• Typify a common competition – there is a similar effort by a Technical Committee tasked by the International Standards Organization (ISO/TC211).
• Are very complex – GML specification itself is over 600 pages
• Underlie the success of GIS, enabled first through ESRI (ArcInfo) and Minnesota Map Server and now through Google Maps
• Are built in XML (as they should be) but for efficiency one
  – Transmits through binary XML
  – Stores in SQL databases not in XML databases
• Define some things (catalog) which are unnecessary as provided by a broader community
• Observations and Measurements work for any time series and

(10)

OGC Standards Structure

• Have a language GML that defines the field – this would be CellML and SBML in the case of Biology and CML for ChemInformatics
• Have a user interface (the Map) captured as a Web Map Service
• Have a “pixel data” service WCS, the Web Coverage Service
• Have a “vector” (feature, property) data service WFS, the Web Feature Service
  – Note any Earth Science simulation or data analysis can be

(11)

Grid Workflow Datamining in Earth Science

• Work with Scripps Institute
• Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California

[Figure: workflow stages labeled Streaming Data Support, Transformations, Data Checking, Hidden Markov Datamining (JPL), Display (GIS); other labels: NASA GPS, Earthquake]

(12)

Data Federation

• The IVOA activities are aimed largely at supporting interoperable data repositories that can feed into the image processing filtering needed to extract signals
  – There is not so much simulation
• ChemInformatics has most data in NIH’s PubChem but will need to federate additional repositories such as those produced by individual Chemistry groups and the raw data from NIH screening centers
• Every county (total 92) in Indiana has its own GIS and something equivalent to a WFS holding information not yet known to Google! (e.g. our house pinpoint address and assessment)
  – Need to federate all these to support state agencies
• So federation of distributed resources a major issue and WFS

(13)

GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature) Servers from different vendors. Grids federate different data repositories (cf Astronomy VO federating different observatory collections)

Indiana County Map Grid

(14)

[Figure: components labeled Browser + Google Map API, Cache Server, Tile Server, three Adapters, Cass County Map Server (OGC Web Map Server), Hamilton County Map Server (AutoDesk), Marion County Map Server (ESRI ArcIMS)]

• Browser client fetches image tiles for the bounding box using Google Map API.
• Tile Server requests map tiles at all zoom levels with all layers. These are converted to uniform projection, indexed, and stored. Overlapping images are combined.
• Must provide adapters for each Map Server type.
• The cache server fulfills Google map calls with cached tiles that fill the requested bounding box.
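The adapter and cache arrangement described on this slide can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: tiles are addressed by integer `(zoom, x, y)` indices rather than geographic coordinates, each adapter is a plain callable standing in for a vendor-specific map server client, and joining bytes stands in for reprojecting and combining overlapping images. None of these names come from the Indiana system itself.

```python
from typing import Callable, Dict, Tuple

TileKey = Tuple[int, int, int]                 # (zoom, x, y)
Adapter = Callable[[int, int, int], bytes]     # fetches one layer of one tile


class TileCache:
    """Caches combined tiles fetched through per-server adapters."""

    def __init__(self, adapters: Dict[str, Adapter]) -> None:
        self._adapters = adapters
        self._tiles: Dict[TileKey, bytes] = {}
        self.fetch_count = 0  # number of adapter calls made so far

    def get_tile(self, zoom: int, x: int, y: int) -> bytes:
        key = (zoom, x, y)
        if key not in self._tiles:
            # Ask every map server for its layer, then combine the results.
            layers = [fetch(zoom, x, y) for fetch in self._adapters.values()]
            self.fetch_count += len(layers)
            self._tiles[key] = b"|".join(layers)
        return self._tiles[key]

    def tiles_for_bbox(
        self, zoom: int, x_min: int, y_min: int, x_max: int, y_max: int
    ) -> Dict[Tuple[int, int], bytes]:
        """Return all tiles covering an inclusive tile-index bounding box."""
        return {
            (x, y): self.get_tile(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)
        }
```

Adding a new county map server then means writing one more adapter; the cache and the browser-facing interface stay unchanged.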

(15)

Searched on Transit/Transportation

(16)

Service or Web service Approach

• One uses GML, CML etc. to define the data in a system and one uses services to capture “methods” or “programs”
• In eScience, important services fall in three classes
  – Simulations
  – Data access, storage, federation, discovery
  – Filters for data mining and manipulation
• Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!)
• WSDL establishes a “contract” independent of implementation between two services or a service and a client
• Services should be loosely coupled which normally means they are coarse grain
• Services will be composed (linked together) by mashups (typically scripts) or workflow (often XML – BPEL)
• Software Engineering and Interoperability/Standards are closely

(17)

Philosophy of Web Service Grids

• Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
• This leads to the distributed object (DO) model represented by Java and CORBA
  – RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
• Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
  – Distributed Objects Replaced by Services
• Note CORBA was considered too complicated in both organization and proposed infrastructure
  – and Java was considered as “tightly coupled to Sun”
  – So there were other reasons to discard
• Thus replace distributed objects by services connected by

(18)

Web services

• Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
• Web Services interact by exchanging messages in SOAP format
• The contracts for the message exchanges that implement those interactions are described via WSDL
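To make the message exchange concrete, the sketch below builds a SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace `urn:example:model-service`, the operation `GetModel`, and its parameter are hypothetical stand-ins for whatever a service's WSDL contract would actually declare.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:model-service"                   # hypothetical service

ET.register_namespace("soap", SOAP_NS)


def build_soap_request(operation: str, params: Dict[str, object]) -> str:
    """Wrap one operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The operation element and its children live in the service's namespace.
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_soap_request("GetModel", {"uri": "urn:example:model:42"})
```

The resulting string would typically be sent as the body of an HTTP POST to the service endpoint; the receiving service only needs to agree on the message shape, not on the sender's implementation language.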

(19)

A typical Web Service

• In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
• The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python

[Figure: services labeled Portal Service, Security, Catalog, Payment Credit Card, Warehouse Shipping control, joined by WSDL interfaces]

(20)

CICC Web Service Infrastructure

Portal Services: RSS Feeds; User Profiles; Collaboration as in Sakai

Grid Services: Service Registry; Job Submission and Management

Local Clusters; IU Big Red; TeraGrid, Open

Varuna.net: Quantum Chemistry

OSCAR Document Analysis; InChI Generation/Search

(21)

Where Does The Functionality Come From?

• Indiana University
  – VOTables
  – NCI DTP predictions
  – Database services
• Cambridge University
  – InChI generation / search
  – OSCAR
• OpenEye
  – Docking
• DigitalChemistry
  – BCI fingerprints
  – DivKMeans
• CDK
  – Cheminformatics
• University of Michigan
  – PkCell
• R Foundation
  – R package
• NIH
  – PubChem
  – PubMed
• gNova Consulting
• European Chemicals Bureau

(22)

Service Modeling Language (SML)

• Submitted to W3C by industry giants 21 March 2007
• A model in SML is realized as a set of interrelated XML documents. The XML documents contain information about the parts of an IT service, as well as the constraints that each part must satisfy for the IT service to function properly. Constraints are captured in two ways:
  – Schemas – these are constraints on the structure and content of the documents in a model. SML uses a profile of XML Schema 1.0 as the schema language. SML also defines a set of extensions to XML Schema to support inter-document references.
  – Rules – are Boolean expressions that constrain the structure and content of documents in a model. SML uses a profile of

(23)

Models in SML

• Models focus on capturing all invariant aspects of a service/system that must be maintained for the service/system to be functional.
• Models are units of communication and collaboration between designers, implementers, operators, and users; and can easily be shared, tracked, and revision controlled. This is important because complex services are often built and maintained by a variety of people playing different roles.
• Models drive modularity, re-use, and standardization. Most real-world complex services and systems are composed of sufficiently complex parts. Re-use and standardization of services/systems and their parts is a key factor in reducing overall production and operation cost and in increasing reliability.
• Models represent a powerful mechanism for validating changes before applying the changes to a service/system. Also, when changes happen in a running service/system, they can be validated against the intended state described in the model. The actual service/system and its model together enable a self-healing service/system – the ultimate objective. Models of a service/system must necessarily stay decoupled from the live service/system to create the control loop.
• Models enable increased automation of management tasks. Automation

(24)

Structured v Unstructured Metadata

• The schemas that are defined by GML etc. are structured definitions
• The traditional semantic web approach is largely based on structured metadata (OWL) that one can analyze precisely
• UML was for example used by OGC in developing standards
• In the “real world”, unstructured annotation has been

(25)

How to set standards

• If one is Google, you can just define the standard and not bother to discuss it!
  – Google maps does not support OGC standards
• The growth in distributed computing has spurred a great deal of standards work as we need the different parts of system built by different people
• Often meet every few weeks to build a standard in 12 months
• OASIS defines a process and doesn’t define an architecture
• W3C is most prestigious
• OGF Open Grid Forum has an eScience section that is currently led by me
• Or do it outside any standards body as in fact most domain specific standards are done
  – Note IVOA has meetings from time to time at OGF to coordinate their

(26)

The Grid and Web Service Institutional Hierarchy

Must set standards to get interoperability

4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile”
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job”
2: System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)

[Figure labels: OGSA GS-* and some WS-* GGF/W3C/…; XGSP (Collab); WS-* from OASIS/W3C/Industry; Apache Axis, .NET etc.; XBM, XTCE, VOTABLE, CML, CellML]

(27)

The Ten areas covered by the 60 core WS-* Specifications

WS-* Specification Area – Examples

1: Core Service Model – XML, WSDL, SOAP
2: Service Internet – WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification – WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions – BPEL, WS-Choreography, WS-Coordination
5: Security – WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery – UDDI, WS-Discovery
7: System Metadata and State – WSRF, WS-MetadataExchange, WS-Context
8: Management – WSDM, WS-Management, WS-Transfer
9: Policy and Agreements – WS-Policy, WS-Agreement
10: Portals and User Interfaces – WSRP (Remote Portlets)

(28)

Activities in Global Grid Forum Working Groups

GGF Area – GS-* and OGSA Standards Activities

1: Architecture – High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications – Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval,
3: Compute – Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data – Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure – Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management – Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security – Authorization, P2P and Firewall Issues, Trusted Computing

(29)

Two-level Programming I

• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
  – C++ Java or Fortran Monte Carlo module
  – Data streaming from a sensor or Satellite
  – Specialized (JDBC) database access
• Such services accept and produce data from users files and database
• The Grid is built by coordinating such services assuming we have solved problem of programming the service

[Figure: Service, Data]

(30)

Two-level Programming II

• The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
• Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
• Such interpretative environments are the single processor analog of Grid Programming
• Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately

[Figure: Service 1, Service 2, Service]
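The single-processor analog mentioned on this slide, where a script chains core programs the way a UNIX pipe does, can be sketched as follows. Each "service" here is just a local Python generator function, an assumption made so the example is self-contained; in a Grid setting each stage would be a remote service and the composition layer would be a workflow engine. The stage names and thresholds are invented for illustration.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

Service = Callable[[Iterable[float]], Iterable[float]]


def compose(*services: Service) -> Service:
    """Chain services so each one consumes the previous one's output."""
    def pipeline(data: Iterable[float]) -> Iterable[float]:
        return reduce(lambda stream, service: service(stream), services, data)
    return pipeline


def calibrate(readings: Iterable[float]) -> Iterator[float]:
    """Subtract a fixed (made-up) sensor offset from every reading."""
    for reading in readings:
        yield reading - 0.5


def drop_outliers(readings: Iterable[float]) -> Iterator[float]:
    """Discard readings whose magnitude is implausibly large."""
    for reading in readings:
        if abs(reading) < 100:
            yield reading


def summarize(readings: Iterable[float]) -> Iterator[float]:
    """Reduce the stream to its mean, if any readings remain."""
    values = list(readings)
    if values:
        yield sum(values) / len(values)


workflow = compose(calibrate, drop_outliers, summarize)
```

The composition level (`compose`) knows nothing about what each stage does internally, which mirrors the separation between programming a service and coordinating services.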

(31)

Grid Workflow Data Assimilation in Earth Science

• Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts

[Figure: Typical graphical interface to service]

(32)

3 Layer Programming Model

Level 1 Programming: Application (MPI, Fortran, C++ etc.)
Level 2 “Programming”: Application Semantics (Metadata, Ontology), Semantic Web
Level 3 Programming: Workflow (BPEL)

Basic Web Service Infrastructure: Web Service 1, WS 2, WS 3, WS 4

(33)

[Figure: a Grid of interconnected services linking a Database, a Portal, Another Grid and Another Service. Legend: SS = Sensor Service, FS = Filter Service, OS = Other Service, MD = MetaData. Stages: Raw Data → Data → Information → Knowledge → Wisdom → Decisions]

(34)

Information Management/Processing

• SOAP messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides Data → Information → Knowledge transformation
  – Semantic Web technologies like RDF and OWL help us have rich expressivity
• We build application specific information management/transformation systems ASIS for each application domain
• One special domain is the system itself where the metadata

(35)

Generalizing a GIS

• Geographical Information Systems GIS have been hugely successful in all fields that study the earth and related worlds
  – They define Geography Syntax (GML) and ways to store, access, query, manipulate and display geographical features
  – In SOA, GIS corresponds to a domain specific XML language and a suite of services for different functions above
• However such a universal information model has not been developed in other areas even though there are many fields in which it appears possible
  – BIS Biological Information System
  – MIS Military Information System
  – IRIS Information Retrieval Information System
  – PAIS Physics Analysis Information System

(36)

ASIS Application Specific Information System I

• a) Discovery capabilities that are best done using WS-* standards
• b) Domain specific metadata and data including search/store/access interface (cf WFS). Let's call generalization ASFS (Application Specific Feature Service)
  – Language to express domain specific features (cf GML). Let's call this ASL (Application Specific Language)
  – Tools to manipulate information expressed in language and key data of application (cf coordinate transformations). Let's call this ASTT (Application Specific Tools and Transformations)
  – ASL must support Data sources such as sensors (cf OGC metadata and data sensor standards) and repositories. Sensors need (common across applications) support of streams of data
  – Queries need to support archived (find all relevant data in past) and streaming (find all data in future with given properties)
  – Note all AS Services behave like Sensors and all sensors are wrapped as services
  – Any domain will have “raw data” (binary) and that which has been
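The two query styles named on this slide, archived (search the past) and streaming (watch the future), differ mainly in when results become available. A minimal Python sketch of that difference follows. Features are plain dictionaries and the predicate is an ordinary function, both assumptions made for illustration; a real ASFS would express features in the ASL and queries in something like Filter Encoding.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Feature = Dict[str, float]
Predicate = Callable[[Feature], bool]


def archived_query(archive: Iterable[Feature], predicate: Predicate) -> List[Feature]:
    """Find all matching features already stored: the full answer is
    available as soon as the archive has been scanned."""
    return [feature for feature in archive if predicate(feature)]


def streaming_query(stream: Iterable[Feature], predicate: Predicate) -> Iterator[Feature]:
    """Watch a (possibly unbounded) stream and yield each matching feature
    as it arrives: results trickle out over time and may never end."""
    for feature in stream:
        if predicate(feature):
            yield feature


def is_strong(feature: Feature) -> bool:
    """Hypothetical property test: keep events of magnitude 5 or more."""
    return feature["magnitude"] >= 5.0
```

Because the same predicate serves both functions, a sensor wrapped as a service and a repository can be queried through one interface, which is the point of treating all AS Services like sensors.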

(37)

ASIS Application Specific Information System II

• Let's call this ASVS (Application Specific Visualization Services) generalizing WMS for GIS
• The ASVS should both visualize information and provide a way of navigating (cf GetFeatureInfo) database (the ASFS)
• The ASVS can itself be federated and presents an ASFS output interface
• d) There should be application service interface for ASIS from which all ASIS service inherit
• e) There will be other user services interfacing to ASIS
• All user and system services will input and output data in ASL using filters to cope with ASBD

[Figure: components labeled AS “Sensor”, AS Repository, AS Tool (generic), AS Service (user defined), ASVS Display; Messages using ASL; Filter, Transformation, Reasoning, Data-mining, Analysis]

(38)

Mashups v Workflow?

• Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
• Workflow Tools are reviewed by Gannon and Fox http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
• Both include scripting in PHP, Python, sh etc. as both implement distributed programming at level of services
• Mashups use all types of service interfaces and do not have the potential robustness (security) of Grid service approach
• Typically “pure”

(39)

Web 2.0 APIs

• http://www.programmableweb.com/apis currently (March 3 2007) lists 388 Web 2.0 APIs with GoogleMaps the most used in Mashups
• This site acts as a “UDDI” or “OGC Catalog” for Web

(40)

The List of Web 2.0 APIs

• Each site has API and its features
• Divided into broad categories
• Only a few used a lot (34 APIs used in more than 10 mashups)
• RSS feed of new

(41)

• 3 more Mashups each day
• For a total of 1609 March 3 2007
• Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)
• Some Mashup types: aggregators, search aggregators, visualizers, mobile, maps, games

(42)

APIs/Mashups per Protocol Distribution

[Figure: chart of Number of Mashups per API, broken down by protocol. Protocol legend: REST; SOAP; XML-RPC; REST, XML-RPC; XML-RPC,REST,; REST,SOAP; JS; Other. APIs shown: google maps, netvibes, live.com, virtual earth, google search, amazon S3, amazon ECS, flickr, ebay, youtube, 411sync, del.icio.us, yahoo! search, yahoo! geocoding, technorati, yahoo! images, trynt, yahoo! local]
