Lessons on Process and Standards in other science communities
IMAG Model Sharing Strategies Workshop
NIH April 10 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
What is a Model Electronically?
n This should have a label – a URI
n It should have a collection of data or metadata defining it
n It might have some way of building composite models by joining
multiple smaller models together
• Need to be able to define connections
n Maybe there are also “mechanisms” to manipulate model or
evolve it in time
n A computer program defines the data as values and the
mechanisms as subroutines/methods
• Programs can be Fortran, Python, C#, Prolog
• Declarative or Imperative; Scripted or Compiled
n However in spite of software engineering, computer programs
What are Questions?
n What are the models we are trying to define?
n What is Process to decide on needed standards and their Syntax
n Are we mainly concerned about data defining the model and/or the programs that build the model
n Where are overlaps between IMAG requirements and other computer science or science fields
n Is the barrier to sharing models “science” (i.e. it is not clear what the common interfaces are) or
Some Examples
n There are many examples of relevant efforts to encourage
sharing of models
n DMSO (Defense Modeling and Simulation Office) produced
HLA (High Level Architecture) as a (pre-CORBA/Web Service) way of defining military models as discrete event simulations
• Good but out of date
n The Open Geospatial Consortium OGC http://www.opengeospatial.org/ is a consortium of 339 organizations setting excellent standards for Geographical Information Systems
• We could develop a BIS Biological Information System?
n International Virtual Observatory Alliance IVOA
Virtual Observatory Astronomy Grid
[Figure: Integrate Experiments. Panels: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map]
OGC Standards I
Web Feature Service (WFS): OGC 04-094, Date: 2005-05-03, Version: 1.1.0, Pages: 131
WFS allows a client to retrieve and update geospatial data encoded in GML from multiple Web Feature Services. The specification defines interfaces for data access and manipulation operations on geographic features, using HTTP as the distributed computing platform. Via these interfaces, a Web user or service can combine, use and manage geodata -- the feature information behind a map image -- from different
Sensor Model Language (SensorML): OGC 05-086, Date: 2005-10-05, Version: 1.0, Pages: 110
The general models and XML encodings for sensors.
Observations and Measurements (O&M): OGC 05-087r3, Date: 2006-02-24, Version: 0.13.0, Pages: 136
The general models and XML encodings for observations and measurements, including but not restricted to those using sensors. Based on GML.
GML: ISO/TC 211/WG 19136, OGC 03-105r1, Date: 2004-02-07, Version: 3.1.0, Pages: 601
GML is an XML grammar written in XML Schema for the modeling, transport, and storage of geographic information. GML provides a variety of kinds of objects for describing geography including features, coordinate reference systems, geometry, topology, time, units of measure and generalized values.
OGC Standards II
Filter Encoding: OGC 04-095, Date: 2005-05-03, Version: 1.1.0, Pages: 40
Filter Encoding defines an XML encoding for filter expressions. A filter expression constrains property values to create a subset of a group of objects. The goal, typically, is to operate on just those objects by, for example, rendering them in a different color or saving them to another format.
Catalogue Services: OGC 02-087r3, Date: 2002-12-13, Version: 1.1.1, Pages: 239
Catalogue Service Implementation Specification defines a common interface that enables diverse but conformant applications to perform discovery, browse and query operations against distributed heterogeneous catalog servers.
Web Coverage Service (WCS): OGC 03-065r6, Date: 2003-08-27, Version: 1.0.0, Pages: 67
WCS extends the WMS interface to allow access to geospatial “coverages” (raster data sets) that represent values or properties of geographic locations, rather than WMS generated maps (pictures).
Web Map Service (WMS): OGC 06-042, Date: 2006-03-15, Version: 1.3.0, Pages: 85
A Web Map Service (WMS) produces maps of spatially
WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          118.72,34.243 -118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
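As a rough illustration of how a client might consume a feature like the one above, the following Python sketch parses the fragment with the standard library. The wrapping `<features>` root element and the function name `parse_faults` are assumptions made for this example; the slide's fragment does not declare the `gml` prefix, so the sketch declares it on the wrapper.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # GML namespace URI

# The slide's fragment, wrapped in a hypothetical <features> root that
# declares the gml prefix (the fragment itself omits the declaration).
DOC = f"""
<features xmlns:gml="{GML_NS}">
  <gml:featureMember>
    <fault>
      <name> Northridge2 </name>
      <segment> Northridge2 </segment>
      <author> Wald D. J.</author>
      <gml:lineStringProperty>
        <gml:LineString srsName="null">
          <gml:coordinates>
            118.72,34.243 -118.591,34.176
          </gml:coordinates>
        </gml:LineString>
      </gml:lineStringProperty>
    </fault>
  </gml:featureMember>
</features>
"""


def parse_faults(xml_text):
    """Return one dict per fault feature: name, author, list of (x, y) points."""
    ns = {"gml": GML_NS}
    root = ET.fromstring(xml_text.strip())
    faults = []
    for member in root.findall("gml:featureMember", ns):
        fault = member.find("fault")
        coords_text = fault.find(".//gml:coordinates", ns).text
        # GML coordinates: whitespace separates points, commas separate axes.
        points = [tuple(float(v) for v in pair.split(","))
                  for pair in coords_text.split()]
        faults.append({
            "name": fault.findtext("name").strip(),
            "author": fault.findtext("author").strip(),
            "points": points,
        })
    return faults


if __name__ == "__main__":
    for fault in parse_faults(DOC):
        print(fault["name"], fault["points"])
```

The coordinate values are copied from the slide unchanged, including the sign of the first value.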
OGC Standards
n Typify a common competition: there is a similar effort by a Technical Committee of the International Organization for Standardization (ISO/TC211).
n Are very complex: the GML specification itself is over 600 pages
n Underlie the success of GIS, enabled first through ESRI (ArcInfo) and Minnesota Map Server and now through Google Maps
n Are built in XML (as they should be) but for efficiency one
• Transmits through binary XML
• Stores in SQL databases not in XML databases
n Define some things (catalog) which are unnecessary as provided by a broader community
n Observations and Measurements work for any time series and
OGC Standards Structure
n Have a language GML that defines the field – this would be CellML and SBML in the case of Biology and CML for ChemInformatics
n Have a user interface (the Map) captured as a Web Map Service
n Have a “pixel data” service WCS the Web Coverage Service
n Have a “vector” (feature, property) data service WFS the Web Feature Service
• Note any Earth Science simulation or data analysis can be
Grid Workflow Datamining in Earth Science
n Work with Scripps Institute
n Grid services controlled by workflow process real time
data from ~70 GPS Sensors in Southern California
[Figure: workflow stages: Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS). Labels: NASA GPS, Earthquake]
Data Federation
n The IVOA activity is aimed largely at supporting interoperable data repositories that can feed into the image processing filtering needed to extract signals
• There is not so much simulation
n ChemInformatics has most data in NIH’s PubChem but will
need to federate additional repositories such as those produced by individual Chemistry groups and the raw data from NIH screening centers
n Every county (total 92) in Indiana has its own GIS and
something equivalent to a WFS holding information not yet known to Google! (e.g. our house pinpoint address and
assessment)
• Need to federate all these to support state agencies
n So federation of distributed resources is a major issue and WFS
GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature) Servers from different vendors. Grids federate different data repositories (cf Astronomy VO federating different observatory collections)
Indiana County Map Grid
Browser + Google Map API: the browser client fetches image tiles for the bounding box using the Google Map API.
Cache Server: fulfills Google Map calls with cached tiles that fill the requested bounding box.
Tile Server: requests map tiles at all zoom levels with all layers. These are converted to a uniform projection, indexed, and stored. Overlapping images are combined.
Adapters: one adapter must be provided for each Map Server type.
County Map Servers: Cass County Map Server (OGC Web Map Server), Hamilton County Map Server (AutoDesk), Marion County Map Server (ESRI ArcIMS)
Searched on Transit/Transportation
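The adapter-plus-cache arrangement described for the county map grid can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `MapServerAdapter`, `StubAdapter` and `TileCache` are hypothetical, the stub returns a string instead of contacting a real map server, and the cache fills lazily on first request, whereas the slide describes a tile server that prefetches all zoom levels.

```python
from abc import ABC, abstractmethod


class MapServerAdapter(ABC):
    """Uniform tile interface over one vendor's map server (hypothetical)."""

    @abstractmethod
    def fetch_tile(self, zoom, x, y):
        """Return the tile at (zoom, x, y) in the grid's common projection."""


class StubAdapter(MapServerAdapter):
    """Stand-in for a vendor-specific adapter; no network access."""

    def __init__(self, vendor):
        self.vendor = vendor
        self.calls = 0  # counts how often the backing server is hit

    def fetch_tile(self, zoom, x, y):
        self.calls += 1
        return f"{self.vendor}:{zoom}/{x}/{y}"


class TileCache:
    """Answers tile requests from memory, calling an adapter only on a miss."""

    def __init__(self, adapters):
        self.adapters = adapters  # county name -> adapter
        self._tiles = {}

    def get_tile(self, county, zoom, x, y):
        key = (county, zoom, x, y)
        if key not in self._tiles:
            self._tiles[key] = self.adapters[county].fetch_tile(zoom, x, y)
        return self._tiles[key]


if __name__ == "__main__":
    cache = TileCache({
        "Cass": StubAdapter("OGC WMS"),
        "Marion": StubAdapter("ESRI ArcIMS"),
    })
    print(cache.get_tile("Cass", 3, 1, 2))
    print(cache.get_tile("Cass", 3, 1, 2))  # second call is served from cache
```

The point of the design is that the browser-facing code sees one tile interface, while vendor differences stay inside each adapter.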
Service or Web service Approach
n One uses GML, CML etc. to define the data in a system and one
uses services to capture “methods” or “programs”
n In eScience, important services fall in three classes
• Simulations
• Data access, storage, federation, discovery
• Filters for data mining and manipulation
n Services use something like WSDL (Web Services Description Language) to define interoperable interfaces (see OPAL talk!)
n WSDL establishes a “contract” independent of implementation
between two services or a service and a client
n Services should be loosely coupled which normally means they
are coarse grain
n Services will be composed (linked together) by mashups
(typically scripts) or workflow (often XML – BPEL)
n Software Engineering and Interoperability/Standards are closely
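The idea of a contract that is independent of implementation can be illustrated without any Web service machinery. In the Python sketch below, a `typing.Protocol` plays the role that WSDL plays for real services; all names (`DataService`, `InMemoryData`, `CsvTextData`, `mashup`) are hypothetical and chosen only for this example.

```python
from typing import List, Protocol


class DataService(Protocol):
    """The 'contract': any data-access service must offer fetch()."""

    def fetch(self, query: str) -> List[float]: ...


class InMemoryData:
    """One implementation: values held in a dictionary."""

    def __init__(self, table):
        self.table = table

    def fetch(self, query):
        return list(self.table[query])


class CsvTextData:
    """Another implementation: values parsed from 'key:v1,v2,...' lines."""

    def __init__(self, text):
        self.rows = dict(line.split(":") for line in text.splitlines())

    def fetch(self, query):
        return [float(v) for v in self.rows[query].split(",")]


def threshold_filter(values, limit):
    """A 'filter' service: keep values at or above the limit."""
    return [v for v in values if v >= limit]


def mashup(source: DataService, query, limit):
    """A script-level composition of a data service and a filter."""
    return threshold_filter(source.fetch(query), limit)


if __name__ == "__main__":
    print(mashup(InMemoryData({"gps": [0.1, 2.5, 3.0]}), "gps", 1.0))
    print(mashup(CsvTextData("gps:0.1,2.5,3.0"), "gps", 1.0))
```

Because `mashup` depends only on the contract, either implementation can be swapped in without changing the composing script.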
Philosophy of Web Service Grids
n Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
n This leads to the distributed object (DO) model represented by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
n Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
• Distributed Objects Replaced by Services
n Note CORBA was considered too complicated in both organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
n Thus replace distributed objects by services connected by
Web services
n Web Services build loosely-coupled, distributed applications, (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
n Web Services interact by exchanging messages in SOAP format
n The contracts for the message exchanges that implement those interactions are described via WSDL
A typical Web Service
n In principle, services can be in any language (Fortran .. Java ..
Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
n The simplest implementations involve XML messages (SOAP) and
programs written in net friendly languages like Java and Python
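To make the "XML messages (SOAP)" point concrete, here is a minimal Python sketch that builds and reads a SOAP 1.1 envelope using only the standard library. The operation name `GetCatalogItem` and the parameter `itemId` are invented for illustration; a real service would define them in its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(operation, params):
    """Wrap an operation and its parameters in a SOAP Envelope/Body."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Return (operation name, parameter dict) from a SOAP message."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    op = body[0]  # first child of Body is the operation element
    return op.tag, {child.tag: child.text for child in op}


if __name__ == "__main__":
    message = build_request("GetCatalogItem", {"itemId": 42})
    print(message)
    print(read_request(message))
```

The receiver only needs the message format, not the sender's language or runtime, which is the loose coupling the slide refers to.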
[Figure: a typical Web Service. Components: Portal Service, Security, Catalog, Payment (Credit Card), Warehouse (Shipping control), connected through WSDL interfaces]
CICC Web Service Infrastructure
Portal Services: RSS Feeds, User Profiles, Collaboration as in Sakai
Grid Services: Service Registry, Job Submission and Management
Local Clusters, IU Big Red, TeraGrid, Open
Varuna.net: Quantum Chemistry
OSCAR Document Analysis, InChI Generation/Search
Where Does The Functionality Come From?
Indiana University: VOTables, NCI DTP predictions, Database services
Cambridge University: InChi generation / search, OSCAR
OpenEye: Docking
DigitalChemistry: BCI fingerprints, DivKMeans
CDK: Cheminformatics
University of Michigan: PkCell
R Foundation: R package
NIH: PubChem, PubMed
gNova Consulting
European Chemicals Bureau
Service Modeling Language (SML)
n Submitted to W3C by industry giants 21 March 2007
n A model in SML is realized as a set of interrelated XML documents. The XML documents contain information about the parts of an IT service, as well as the constraints that each part must satisfy for the IT service to function properly. Constraints are captured in two ways:
n Schemas – these are constraints on the structure and content of
the documents in a model. SML uses a profile of XML Schema 1.0 as the schema language. SML also defines a set of extensions to XML Schema to support inter-document references.
n Rules – are Boolean expressions that constrain the structure and
content of documents in a model. SML uses a profile of
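The two kinds of constraint can be illustrated with a toy model. This Python sketch is not SML: plain Python predicates stand in for both the schema language and the rule language, and the document names, the `service` element and the `dependsOn` reference are all invented for the example.

```python
import xml.etree.ElementTree as ET

# A toy "model": a set of interrelated XML documents (hypothetical vocabulary).
DOCS = {
    "web.xml": '<service name="web" instances="2"><dependsOn ref="db.xml"/></service>',
    "db.xml": '<service name="db" instances="1"/>',
}


def load_model(docs):
    """Parse every document in the model."""
    return {name: ET.fromstring(text) for name, text in docs.items()}


def schema_ok(doc):
    """Schema-style constraint: structure and content of a single document."""
    return (doc.tag == "service"
            and "name" in doc.attrib
            and doc.get("instances", "").isdigit())


def references_resolve(model):
    """Rule-style constraint: every inter-document reference has a target."""
    return all(dep.get("ref") in model
               for doc in model.values()
               for dep in doc.iter("dependsOn"))


def at_least_one_instance(model):
    """Rule-style constraint: a Boolean expression over document content."""
    return all(int(doc.get("instances")) >= 1 for doc in model.values())


def validate(docs):
    """A model is valid when all schema checks and all rules hold."""
    model = load_model(docs)
    if not all(schema_ok(doc) for doc in model.values()):
        return False
    return references_resolve(model) and at_least_one_instance(model)


if __name__ == "__main__":
    print(validate(DOCS))
```

Removing `db.xml` from the set makes the reference rule fail, which is the kind of cross-document check a single-document schema cannot express.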
Models in SML
n Models focus on capturing all invariant aspects of a service/system that
must be maintained for the service/system to be functional.
n Models are units of communication and collaboration between designers,
implementers, operators, and users; and can easily be shared, tracked, and revision controlled. This is important because complex services are often built and maintained by a variety of people playing different roles.
n Models drive modularity, re-use, and standardization. Most real-world
complex services and systems are composed of sufficiently complex
parts. Re-use and standardization of services/systems and their parts is a key factor in reducing overall production and operation cost and in
increasing reliability.
n Models represent a powerful mechanism for validating changes before
applying the changes to a service/system. Also, when changes happen in a running service/system, they can be validated against the intended state described in the model. The actual service/system and its model together enable a self-healing service/system – the ultimate objective. Models of a service/system must necessarily stay decoupled from the live service/system to create the control loop
n Models enable increased automation of management tasks. Automation
Structured v Unstructured Metadata
n The schemas that are defined by GML etc. are structured definitions
n The traditional semantic web approach is largely based on structured metadata (OWL) that one can analyze precisely
n UML was for example used by OGC in developing standards
n In the “real world”, unstructured annotation has been
How to set standards
n If one is Google, you can just define the standard and not bother
to discuss it!
• Google maps does not support OGC standards
n The growth in distributed computing has spurred a great deal of standards work as we need the different parts of a system built by different people to interoperate
n Groups often meet every few weeks to build a standard in 12 months
n OASIS defines a process and doesn’t define an architecture
n W3C is most prestigious
n OGF Open Grid Forum has an eScience section that is currently
led by me
n Or do it outside any standards body as in fact most domain
specific standards are done
• Note IVOA has meetings from time to time at OGF to coordinate their
The Grid and Web Service Institutional Hierarchy
4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile” (XBM, XTCE, VOTABLE, CML, CellML)
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job” (OGSA GS-* and some WS-* from GGF/W3C/…; XGSP (Collab))
2: System Services and Features (WS-* from OASIS/W3C/Industry): Handlers like WS-RM, Security, UDDI Registry
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
Must set standards to get interoperability
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area | Examples
1: Core Service Model | XML, WSDL, SOAP
2: Service Internet | WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification | WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions | BPEL, WS-Choreography, WS-Coordination
5: Security | WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery | UDDI, WS-Discovery
7: System Metadata and State | WSRF, WS-MetadataExchange, WS-Context
8: Management | WSDM, WS-Management, WS-Transfer
9: Policy and Agreements | WS-Policy, WS-Agreement
10: Portals and User Interfaces | WSRP (Remote Portlets)
Activities in Global Grid Forum Working Groups
GGF Area | GS-* and OGSA Standards Activities
1: Architecture | High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications | Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval,
3: Compute | Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data | Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure | Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management | Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security | Authorization, P2P and Firewall Issues, Trusted Computing
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services, assuming we have solved the problem of programming the service
[Figure: Service, Data]
Two-level Programming II
n The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
n Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
n Such interpretative environments are the single processor analog of Grid Programming
n Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
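The two levels can be shown side by side in a single-process Python sketch, in the spirit of the UNIX pipe analogy. The three "services" below are ordinary functions with invented names and made-up sample values; in a Grid they would be remote services, and `compose` would be replaced by a workflow engine or script.

```python
from functools import reduce


# Level 1: each "service" is programmed separately with conventional tools.
def read_sensor(_):
    """Stand-in for a data source; ignores its input and returns sample values."""
    return [3.0, -1.0, 4.0, -1.5]


def drop_negative(values):
    """Stand-in for a filter service."""
    return [v for v in values if v >= 0]


def mean(values):
    """Stand-in for an analysis service."""
    return sum(values) / len(values)


# Level 2: composition, the analog of a shell pipeline `a | b | c`.
def compose(*services):
    """Return a workflow that feeds each service's output into the next."""
    def workflow(data=None):
        return reduce(lambda acc, service: service(acc), services, data)
    return workflow


if __name__ == "__main__":
    pipeline = compose(read_sensor, drop_negative, mean)
    print(pipeline())
```

Nothing in `compose` knows how any service is implemented, which mirrors the slide's point that the two levels are usually treated separately.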
[Figure: Service1 and Service2 linked together]
Grid Workflow Data Assimilation in Earth Science
n Grid services triggered by abnormal events and controlled by workflow process real time data from radar and high resolution simulations for tornado forecasts
[Figure: Typical graphical interface to service]
3 Layer Programming Model
Level 1 Programming: Application (MPI, Fortran, C++ etc.)
Level 2 “Programming”: Application Semantics (Metadata, Ontology); Semantic Web
Level 3 Programming: Workflow (BPEL) linking Web Service 1, WS 2, WS 3, WS 4
Basic Web Service Infrastructure
Database
[Figure: grids of services transforming Raw Data into Data, Information, Knowledge, Wisdom and Decisions. Labels: Portal, MetaData (MD), Filter Service (FS), Sensor Service (SS), Other Service (OS), Another Service, Another Grid]
Information Management/Processing
n SOAP messages transport information expressed in a
semantically rich fashion between sources and services that enhance and transform information so that complete system provides
• Semantic Web technologies like RDF and OWL help us have
rich expressivity
n Data to Information to Knowledge transformation
n We build application specific information management/transformation systems ASIS for each application domain
n One special domain is the system itself where the metadata
Generalizing a GIS
n Geographical Information Systems GIS have been hugely successful in all fields that study the earth and related worlds
• They define Geography Syntax (GML) and ways to store, access, query, manipulate and display geographical features
• In SOA, GIS corresponds to a domain specific XML language and a suite of services for different functions above
n However such a universal information model has not been developed in other areas even though there are many fields in which it appears possible
• BIS Biological Information System
• MIS Military Information System
• IRIS Information Retrieval Information System
• PAIS Physics Analysis Information System
ASIS Application Specific Information System I
n a) Discovery capabilities that are best done using WS-*
standards
n b) Domain specific metadata and data including search/store/access interface (cf WFS). Let's call the generalization ASFS (Application Specific Feature Service)
• Language to express domain specific features (cf GML). Let's call this ASL (Application Specific Language)
• Tools to manipulate information expressed in language and key data of application (cf coordinate transformations). Let's call this ASTT (Application Specific Tools and Transformations)
• ASL must support Data sources such as sensors (cf OGC metadata
and data sensor standards) and repositories. Sensors need
(common across applications) support of streams of data
• Queries need to support archived (find all relevant data in past)
and streaming (find all data in future with given properties)
• Note all AS Services behave like Sensors and all sensors are
wrapped as services
• Any domain will have “raw data” (binary) and that which has been
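The distinction between archived and streaming queries can be sketched with a small in-memory feature service. This is an assumption-laden illustration, not an ASFS definition: the class `FeatureService`, its method names, and the dictionary-shaped features are all hypothetical.

```python
class FeatureService:
    """Toy ASFS-like store supporting archived and streaming queries."""

    def __init__(self):
        self._archive = []          # every feature published so far
        self._subscriptions = []    # (predicate, callback) pairs

    def publish(self, feature):
        """Store a feature and push it to matching streaming queries."""
        self._archive.append(feature)
        for predicate, callback in self._subscriptions:
            if predicate(feature):
                callback(feature)

    def query_archived(self, predicate):
        """Archived query: find all relevant data in the past."""
        return [f for f in self._archive if predicate(f)]

    def subscribe(self, predicate, callback):
        """Streaming query: deliver all future data with given properties."""
        self._subscriptions.append((predicate, callback))


if __name__ == "__main__":
    service = FeatureService()
    service.publish({"station": "A", "value": 1.0})

    seen = []
    service.subscribe(lambda f: f["value"] > 2, seen.append)
    service.publish({"station": "B", "value": 3.0})
    service.publish({"station": "C", "value": 0.5})

    print(seen)
    print(service.query_archived(lambda f: f["value"] < 2))
```

The same predicate style serves both query kinds; only the time direction differs, which matches the slide's remark that services behave like sensors.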
ASIS Application Specific Information System II
n Let's call this ASVS (Application Specific Visualization Services) generalizing WMS for GIS
n The ASVS should both visualize information and provide a way of
navigating (cf GetFeatureInfo) database (the ASFS)
n The ASVS can itself be federated and presents an ASFS output
interface
n d) There should be an application service interface for ASIS from which all ASIS services inherit
n e) There will be other user services interfacing to ASIS
n All user and system services will input and output data in ASL using
filters to cope with ASBD
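A common service interface plus a filter for binary data might look roughly like the Python sketch below. Everything here is an assumption for illustration: the base class `ASISService`, the binary record layout (two little-endian doubles), and the use of a plain dictionary as a stand-in for an ASL message.

```python
import struct
from abc import ABC, abstractmethod


class ASISService(ABC):
    """Hypothetical common interface: ASL message in, ASL message out."""

    @abstractmethod
    def process(self, asl_message: dict) -> dict:
        """Transform one ASL message into another."""


def binary_to_asl(raw: bytes) -> dict:
    """Filter: decode a hypothetical binary record into an ASL-like message."""
    lat, lon = struct.unpack("<dd", raw)  # two little-endian doubles
    return {"type": "position", "lat": lat, "lon": lon}


class RoundingService(ASISService):
    """Example user service that inherits the common interface."""

    def process(self, asl_message):
        return {
            **asl_message,
            "lat": round(asl_message["lat"], 1),
            "lon": round(asl_message["lon"], 1),
        }


if __name__ == "__main__":
    raw = struct.pack("<dd", 34.243, -118.591)
    print(RoundingService().process(binary_to_asl(raw)))
```

Keeping the binary decoding in a separate filter means services only ever see messages in the shared language.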
[Figure: AS Tool (generic), AS “Sensor”, AS Repository, AS Service (user defined), ASVS, Display, AS Tool (generic)]
Messages using ASL
Filter, Transformation, Reasoning, Data-mining, Analysis
Mashups v Workflow?
n Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
n Workflow Tools are reviewed by Gannon and Fox http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
n Both include scripting in PHP, Python, sh etc. as both implement distributed programming at level of services
n Mashups use all types of service interfaces and do not have the potential robustness (security) of Grid service approach
n Typically “pure”
Web 2.0 APIs
http://www.programmableweb.com/apis currently (March 3 2007) 388 Web 2.0 APIs with GoogleMaps the most used in Mashups
This site acts as a “UDDI” or “OGC Catalog” for Web
The List of Web 2.0 API’s
Each site has API and its features
Divided into broad categories
Only a few used a lot (34 API’s used in more than 10 mashups)
RSS feed of new
3 more Mashups each day, for a total of 1609 (March 3 2007)
Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)
Some Mashup types: aggregators, search aggregators, visualizers, mobile, maps, games
APIs/Mashups per Protocol Distribution
[Charts: distribution by protocol (REST, SOAP, XML-RPC, combinations of these, JS, Other) and Number of Mashups per API for google maps, netvibes, live.com, virtual earth, google search, amazon S3, amazon ECS, flickr, ebay, youtube, 411sync, del.icio.us, yahoo! search, yahoo! geocoding, technorati, yahoo! images, trynt, yahoo! local]