Services and the Semantic Grid
SKG2005 Beijing China November 28 2005
Geoffrey Fox
Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401
Data Deluged Science
• In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new science and new ways of computing
• Data assimilation was not central to HPCC
• DoE ASCI was set up because they didn’t want test data!
• Now particle physics will get 100 petabytes from CERN
  • Nuclear physics (Jefferson Lab) in same situation
  • Use around 30,000 CPUs simultaneously 24x7
• Weather, climate, solid earth (EarthScope)
• Bioinformatics curated databases (Biocomplexity only 1000’s of data points at present)
• Virtual Observatory and SkyServer in Astronomy
• Environmental Sensor nets
Information/Knowledge Grids
• Distributed (10’s to 1000’s) of data sources (instruments, file systems, curated databases …)
• Data Deluge: 1 (now) to 100’s petabytes/year (2012)
  • Moore’s law for Sensors
• Possible filters assigned dynamically (on-demand)
  • Run image processing algorithm on telescope image
  • Run Gene sequencing algorithm on compiled data
• Needs decision support front end with “what-if” simulations
• Metadata (provenance) critical to annotate data
• Integrate across experiment as in multi-wavelength astronomy
Semantically Rich Services with a Semantically Rich Distributed Operating Environment
[Figure: Database, Portal, Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData (MD)]
Semantic Grid and Services
• Implications of SOA (Service Oriented Architectures) for SG (Semantic Grid)
  • Build services to implement SG
• Implications of SG for SOA
  • Build metadata rich systems of services using SG
• Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
• Meta-data is carried in SOAP messages
• Meta-data controls processing and transport of SOAP Messages
• Knowledge is created from data by services
• The Grid enhances Web services with semantically rich system and application specific management
• One must exploit and work around the different approaches to meta-data and their manipulation in Web Services
Structure of SOAP Messages
• SOAP Messages have System information in the header including WS-Policy based meta-data defining processing options
  • Processed by Handlers
• Application data and meta-data are in the body (controversies here!)
  • Processed by the Service itself
• Some meta-data like WS-RF is logically “only in messages”
• Other meta-data, like that in WS-Context or the SRB, is stored in the logical equivalent of XML databases
• We only need to preserve semantic structure (XML/SOAP Infoset), so we can transport in fast XML and store in efficient relational databases
[Figure: SOAP message with headers H1 to H4 and a Body; Container Handlers F1 to F4 (Container Workflow) and the Service]
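The split between header processing by handlers and body processing by the service can be sketched in a few lines of Python. This is only an illustration: the `urn:example` namespace, the `Reliability`, `Security` and `Temperature` elements, and the `container` function are hypothetical names invented here, not part of any WS-* specification or SOAP toolkit.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
EX_NS = "urn:example"  # hypothetical namespace for this sketch

MESSAGE = f"""
<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:ex="{EX_NS}">
  <soap:Header>
    <ex:Reliability sequence="7"/>
    <ex:Security user="alice"/>
  </soap:Header>
  <soap:Body>
    <ex:Temperature units="C">21.5</ex:Temperature>
  </soap:Body>
</soap:Envelope>
"""

def reliability_handler(header, log):
    """Handler: reads system meta-data (a sequence number) from the header."""
    elem = header.find(f"{{{EX_NS}}}Reliability")
    log.append(("reliability", int(elem.get("sequence"))))

def security_handler(header, log):
    """Handler: reads system meta-data (a user name) from the header."""
    elem = header.find(f"{{{EX_NS}}}Security")
    log.append(("security", elem.get("user")))

def temperature_service(body):
    """The service itself: application data in, transformed data out."""
    celsius = float(body.find(f"{{{EX_NS}}}Temperature").text)
    return celsius * 9 / 5 + 32

def container(message, handlers, service):
    """Run each handler over the header, then hand the body to the service."""
    envelope = ET.fromstring(message)
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    log = []
    for handler in handlers:
        handler(header, log)
    return log, service(body)

log, fahrenheit = container(
    MESSAGE, [reliability_handler, security_handler], temperature_service
)
```

The handlers never look at the body and the service never looks at the header, which mirrors the division of labour described on the slide; a real container would of course do far more (policy, faults, ordering).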
What Type of Services are there?
• There is a horde of support services supplying security, collaboration, database access, user interfaces
• The support services are either associated with system or application
  • We will study the WS-* and GS-* which implicitly or explicitly define many support services
• There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in input
  • Simulations (including PDE’s and reactive systems)
  • Data-mining
  • Transformations
  • Agents
  • Reasoning
  • These are all termed filters here
• There are services like “author ontology”, “parse RDF” or “attach provenance” that directly support Semantic Grid
• But all services and their interactions are bathed in a sea of meta-data and so implicitly need and support the Semantic Grid
It’s a Composite Hierarchical World
• Filters can be a workflow, which means they are “just collections of other simpler services”
  • One needs meta-data to control the workflow
• Services are programs that accept messages and produce messages
• Grids are a distributed collection of services supporting managed shared resources
  • Management requires meta-data
• Grids are distributed systems that accept distributed messages and produce distributed result messages
  • Can always talk about Grids and view a service or a workflow as a special case of a Grid
• It just requires meta-data to send a message to a Grid and have it routed to the “correct computer” holding the “requested service”
  • Meta-data allows mapping of virtual to real addresses
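As a rough illustration of the last two points, the sketch below treats a Grid as something that accepts a message addressed to a virtual service name and uses meta-data (a registry) to route it to a real address. The `Grid` class, its methods and the host names are all hypothetical, chosen only for this example.

```python
class Grid:
    """A Grid viewed as one service: messages in, messages out."""

    def __init__(self):
        self.registry = {}  # meta-data: virtual service name -> real address
        self.services = {}  # real address -> callable implementing the service

    def deploy(self, virtual_name, real_address, service):
        self.registry[virtual_name] = real_address
        self.services[real_address] = service

    def send(self, message):
        # Meta-data maps the virtual address in the message to a real one.
        real_address = self.registry[message["to"]]
        reply = self.services[real_address](message["body"])
        return {"from": message["to"], "handled_at": real_address, "body": reply}

# An inner Grid holding one simple service.
inner = Grid()
inner.deploy("unit-conversion", "node17.example.org:8080", lambda km: km * 1000)
reply = inner.send({"to": "unit-conversion", "body": 3})

# A Grid is itself a service, so it can be deployed inside another Grid.
outer = Grid()
outer.deploy("inner-grid", "gateway.example.org:9090",
             lambda msg: inner.send(msg)["body"])
nested = outer.send({"to": "inner-grid",
                     "body": {"to": "unit-conversion", "body": 3}})
```

The caller only ever names `"unit-conversion"` or `"inner-grid"`; which computer holds the service is a detail kept in the registry.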
Semantically Rich Services with a Semantically Rich Distributed Operating Environment
[Figure: Grids of Grids Architecture. Database, Portal, Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData (MD), linked by SOAP Message Streams to Another Service and Another Grid. Data flows as Raw Data → Data → Information → Knowledge → Wisdom → Decisions. A truncated label reads “… is same as outward facing applicatio…”]
The Grid and Web Service Institutional Hierarchy
1: Container and Run Time (Hosting) Environment
2: System Services and Features: Handlers like WS-RM, Security; Programming Models like BPEL; or Registries like UDDI
3: Generally Useful Services and Features, such as “Access a Database” or “Submit a Job” or “Semantic Grid” or “Support a Portal” or “Collaborative Visualization”
4: Application or Community of Interest Specific Services, such as “Run BLAST” or “Look at Houses for sale”
Side labels: OGSA and other GGF/W3C/…; WS-* from OASIS/W3C/Industry; Apache Axis, .NET etc.
The WS-* Infrastructure
• Core Grid Services build on and/or extend the 60 or so WS-* Infrastructure specifications which define
  • 1. Container Model, XML, WSDL …
  • 2. Service Internet ((Reliable) Messaging, Addressing) including extensions for high performance transport and representation. This is natural basis for streaming applications
  • 3. Notification
  • 4. Workflow and Transactions
  • 5. Security
  • 6. Service Discovery
  • 7. Metadata and State including lifetime
  • 8. Management (service interactions)
  • 9. Policy, Agreements
  • 10. Portals and User Interfaces
These categories are directly connected to metadata
A List of Web Services 6
• 6) Service Discovery
  • UDDI (Broadly Supported OASIS Standard) V3 August 2003
  • WS-Discovery Web services Dynamic Discovery (Microsoft, BEA, Intel …) February 2004
  • WS-IL Web Services Inspection Language (IBM, Microsoft) November 2001
  • Note WS-Context as a metadata catalog and WS-Management Catalog are examples of related services
  • There are many UDDI extensions such as Grimoires from UK OMII which often are essentially providing semantic enrichment
Discovery is just accessing part of meta-data defining a Grid
A List of Web Services 7
• 7) Metadata and State
  • RDF Resource Description Framework (W3C) Set of recommendations expanded from original February 1999 standard
  • DAML+OIL combining DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer) (W3C) Note December 2001
  • OWL Web Ontology Language (W3C) Recommendation February 2004
  • WS-MetadataExchange Web Services Metadata Exchange (BEA, IBM, Microsoft, SAP, Sun …) September 2004
  • ASAP Asynchronous Service Access Protocol (OASIS) with V1.0 working draft 2B December 11 2004
  • WS-GAF Web Service Grid Application Framework (Arjuna, Newcastle University) August 2003
  • WBEM Web-Based Enterprise Management including CIM (Common Information Model) from DMTF (Distributed Management Task Force) 2004-2005
A List of Web Services 7
• 7) Metadata and State: Resource Framework
  • WS-RF Web Services Resource Framework (OASIS) including
    • WS-Resource Framework Web Services Resource 1.2 (OASIS) Public Review Draft 01, 10 June 2005
    • WS-ResourceProperties Web Services Resource Properties V1.2 Public Review Draft 01, 10 June 2005
    • WS-ResourceLifetime Web Services Resource Lifetime V1.2 Public Review Draft 01, 13 June 2005
    • WS-ServiceGroup Web Services Service Group V1.2 Public Review Draft 01, 10 June 2005
    • WS-BaseFaults Web Services Base Faults V1.2 Public Review Draft 01, 13 June 2005
These WS-* define syntax of Meta-data (RDF, OWL, CIM) and how to use it in system
Metadata and Service Context
• Consider a collection of services working together
  • Workflow tells you how to specify service interaction but more basically there is shared information or context specifying/controlling collection
• WS-RF and WS-GAF have different approaches to contextualization – supplying a common “context” which at its simplest is a token to represent state
• More generally core shared information includes dynamic service metadata and the equivalent of configuration information.
• Two services linked by a stream are perhaps simplest example of a collection of services needing context
• Note that there is a tension between storing metadata in messages and services.
  • This is shared versus distributed memory debate in parallel computing
Stateful Interactions
• There are (at least) four approaches to specifying state
  • OGSI uses factories to generate separate services for each session in standard distributed object fashion
  • Globus GT-4 and WSRF use metadata of a resource to identify state associated with particular session
  • WS-GAF uses WS-Context to provide abstract context defining state. Has strength and weakness that reveals less about nature of session
  • WS-I+ “Pure Web Service” leaves state specification to the application – e.g. put a context in the SOAP body
• I think we should smile and write a great metadata (semantic) service hiding all these different models for state and metadata
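To make the contrast concrete, here is a loose Python sketch of two of these styles: a factory that creates one stateful object per session, and a stateless service that finds its state through a context token carried in each message. It is an analogy only; `CounterService`, `ContextStore` and the message fields are invented for this example and do not reproduce the OGSI, WS-RF or WS-Context interfaces.

```python
import itertools

# Style A (factory per session): each session gets its own stateful service.
class CounterService:
    def __init__(self):
        self.count = 0

    def handle(self, message):
        self.count += message["increment"]
        return {"count": self.count}

def counter_factory():
    """Create a fresh service instance for a new session."""
    return CounterService()

# Style B (context token): one stateless service; the state lives in a
# shared store and is located through a token carried in every message.
class ContextStore:
    def __init__(self):
        self._ids = itertools.count(1)
        self._state = {}

    def new_context(self):
        token = f"ctx-{next(self._ids)}"
        self._state[token] = {"count": 0}
        return token

    def get(self, token):
        return self._state[token]

def stateless_counter(message, store):
    state = store.get(message["context"])
    state["count"] += message["increment"]
    return {"context": message["context"], "count": state["count"]}

# Style A in use: state is implied by which service instance you talk to.
session_a = counter_factory()
session_a.handle({"increment": 2})
reply_a = session_a.handle({"increment": 3})

# Style B in use: state is implied by the token inside the message.
store = ContextStore()
first, second = store.new_context(), store.new_context()
stateless_counter({"context": first, "increment": 2}, store)
stateless_counter({"context": second, "increment": 10}, store)
reply_b = stateless_counter({"context": first, "increment": 3}, store)
```

Both styles give the same answer for the same session; what differs is where the state is kept and how much the token reveals about it, which is the trade-off the slide points at.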
Role of WS-Context
• There are many WS-* specifications addressing meta-data and both many approaches and many trade-offs
• We hear about Distributed Hash Tables (Chord) to achieve scalability in large scale networks
• Managed dynamic workflows as in sensor integration and collaboration require
  • Fault-tolerance and ability to support dynamic changes with few millisecond delay
  • But only a modest number of involved services (up to 1000’s in a session)
  • Need Session NOT Service/Resource meta-data so don’t use WS-RF
• We are building a WS-Context compliant metadata catalog supporting distributed or central paradigms – see later talk by Mehmet Aktas
• Use for OGC Web catalog service with UDDI for slowly varying meta-data
A List of Web Services 8
• 8) Management
  • WS-DistributedManagement Web Services Distributed Management Framework with MUWS and MOWS below (OASIS)
  • WSDM-MUWS Web Services Distributed Management: Management Using Web Services (OASIS) OASIS Standard March 9 2005
  • WSDM-MOWS Web Services Distributed Management: Management of Web Services (OASIS) OASIS Standard March 9 2005
A List of Web Services 8- Contd
• 8) Management: Microsoft Stack
  • WS-Management Web Services for Management (Microsoft, Intel, Sun …) August 2005
  • WS-Management Catalog The WS-Management Catalog (Microsoft, Intel, Sun …) August 2005
  • WS-Transfer Web Service Transfer (Microsoft, BEA, Sonic Software etc.) September 2004
  • WS-Enumeration Web Service Enumeration (Microsoft, BEA, Sonic Software etc.) September 2004
These WS-* define exchange of data and meta-data between services
A List of Web Services 9
• 9) General Service Characteristics
  • WS-PolicyFramework Web Services Policy Framework (BEA, IBM, Microsoft, SAP …) September 2004
  • WS-PolicyAttachment Web Services Policy Attachment (BEA, IBM, Microsoft, SAP …) September 2004
  • WS-PolicyAssertions Web Services Policy Assertions Language (BEA, IBM, Microsoft, SAP) 18 December 2002 (Superseded by WS-PolicyFramework)
  • WS-Agreement Web Services Agreement Specification (GGF under development) 9 August 2004
These WS-* define syntax of Meta-data defining structure of distributed System
Grids are managed (meta-data enhanced)
Activities in Global Grid Forum Working Groups
GGF Area | Standards Activities
1: Architecture | High Level Resource/Service Naming (level 2 of fig. 1), Integrated Grid Architecture
2: Applications | Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
3: Compute | Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data | Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure | Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management | Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security | Authorization, P2P and Firewall Issues, Trusted Computing
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
  – C++ Java or Fortran Monte Carlo module
  – Data streaming from a sensor or Satellite
  – Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services assuming we have solved problem of programming the service
[Figure: Service and Data]
Two-level Programming II
• The Grid is discussing the composition of distributed services with the runtime interfaces to Grid in analogy to UNIX pipes/data streams
• Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
• Such interpretative environments are the single processor analog of Grid Programming
• Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
[Figure: Service 1, Service 2, Service 3, Service 4; Web Service 1, WS 2, …, WS N-1, Web Service N]
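The pipe analogy can be illustrated on a single processor with a few lines of Python: each “service” is an ordinary function written at the first level, and the second level only chains them. The three services and the `compose` helper are hypothetical examples, not part of any Grid toolkit.

```python
from functools import reduce

def compose(*services):
    """Chain services so each one's output message is the next one's input."""
    return lambda message: reduce(lambda msg, service: service(msg),
                                  services, message)

# Level 1: programming inside each service.
def parse(message):
    """Service 1: turn a comma-separated text message into numbers."""
    return [float(x) for x in message.split(",")]

def remove_outliers(values):
    """Service 2: drop values with magnitude of 100 or more."""
    return [v for v in values if abs(v) < 100]

def mean(values):
    """Service 3: reduce the remaining values to their average."""
    return sum(values) / len(values)

# Level 2: composing services, much like `parse | remove_outliers | mean`.
workflow = compose(parse, remove_outliers, mean)
result = workflow("1,2,3,1000")
```

In a Grid the arrows between stages would be message streams between remote services rather than function calls, but the composition level looks much the same.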
3 Layer Programming Model
Level 1 Programming: inside services
  Application expressed in Java, Fortran, C++, MPI etc.
Level 2 Programming: choosing services by virtualization
  Application Semantics (Metadata, Ontology), Semantic Grid
Level 3 Grid Programming: composing multiple services
  Service Workflow, Transactions, Mediation, WS-* Infrastructure
Substantial work in UK e-Science program, international semantic web community
Information Architecture and Semantic Grid
• WS-* provides key low level capability but deliberately does not define an information (data) architecture and leaves this to domain specific specification activities such as CellML/SBML for biology, WFS/GML for GIS and XGSP for Collaboration
• WS-* does define a primitive service discovery (UDDI) and meta-data capabilities including Context, WS-RF, RDF and WS-MetadataExchange already discussed.
• GGF defines Grid data capabilities including info-D (publish/subscribe) and OGSA-DAI for data repositories
• Semantic Grid uses WS-* and GS-* extending meta-data and service discovery with data-mining and reasoning
3 XML Databases of Importance
• WS-Context controlling a workflow
• (Extended) UDDI supporting semantic service discovery
• WFS or ASFS (see later) providing an application specific data/meta-data repository
• These have different performance, scalability and data unit size requirements
• In our implementation, each is currently “just an Oracle/MySQL” database front ended by filters that convert between XML (GML for WFS) and object-relational Schema
  • Example of Semantics (XML) versus representation (SQL) difference
• OGSA-DAI offers Grid interface to databases – we could use it but don’t, as we only need to expose WFS and not MySQL to the Grid
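The idea of keeping XML semantics at the interface while storing rows in a relational database can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Python’s built-in `sqlite3` in place of Oracle/MySQL, and a made-up `<feature>` vocabulary instead of real GML; the table layout and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_features(conn, xml_text):
    """Input filter: XML message in, relational rows stored."""
    root = ET.fromstring(xml_text)
    rows = [
        (f.get("id"), f.findtext("name"),
         float(f.findtext("lat")), float(f.findtext("lon")))
        for f in root.findall("feature")
    ]
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", rows)

def query_features(conn, min_lat):
    """Output filter: SQL query run, result returned as an XML message."""
    root = ET.Element("features")
    cursor = conn.execute(
        "SELECT id, name, lat, lon FROM feature WHERE lat >= ? ORDER BY id",
        (min_lat,),
    )
    for fid, name, lat, lon in cursor:
        feature = ET.SubElement(root, "feature", id=fid)
        ET.SubElement(feature, "name").text = name
        ET.SubElement(feature, "lat").text = str(lat)
        ET.SubElement(feature, "lon").text = str(lon)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature (id TEXT PRIMARY KEY, name TEXT, lat REAL, lon REAL)"
)
store_features(conn, """<features>
  <feature id="f1"><name>Station A</name><lat>39.9</lat><lon>116.4</lon></feature>
  <feature id="f2"><name>Station B</name><lat>34.0</lat><lon>-118.2</lon></feature>
</features>""")
xml_out = query_features(conn, 35.0)
```

Clients only ever see XML in and XML out; the SQL schema is a private representation that could be swapped without changing the service interface.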
Information Management/Processing
• SOAP messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that the complete system provides a Data → Information → Knowledge transformation
  • Semantic Web technologies like RDF and OWL help us have rich expressivity
• We build application specific information management/transformation systems (ASIS) for each application domain
• One special domain is the system itself, where the metadata associated with services, sessions, Grids, messages, streams and workflow is itself managed and supported by an SIIS
Generalizing a GIS
• Geographical Information Systems (GIS) have been hugely successful in all fields that study the earth and related worlds
  • They define Geography Syntax (GML) and ways to store, access, query, manipulate and display geographical features
  • In SOA, GIS corresponds to a domain specific XML language and a suite of services for different functions above
• However such a universal information model has not been developed in other areas even though there are many fields in which it appears possible
  • BIS Biological Information System
  • MIS Military Information System
  • IRIS Information Retrieval Information System
  • PAIS Physics Analysis Information System
  • SIIS Service Infrastructure Information System
ASIS Application Specific Information System I
• a) Discovery capabilities that are best done using WS-* standards
• b) Domain specific metadata and data including search/store/access interface (cf WFS). Let’s call the generalization ASFS (Application Specific Feature Service)
  • Language to express domain specific features (cf GML). Let’s call this ASL (Application Specific Language)
  • Tools to manipulate information expressed in language and key data of application (cf coordinate transformations). Let’s call this ASTT (Application Specific Tools and Transformations)
  • ASL must support Data sources such as sensors (cf OGC metadata and data sensor standards) and repositories. Sensors need (common across applications) support of streams of data
  • Queries need to support archived (find all relevant data in past) and streaming (find all data in future with given properties)
  • Note all AS Services behave like Sensors and all sensors are wrapped as services
  • Any domain will have “raw data” (binary) and that which has been filtered to ASL. Let’s call the raw data ASBD (Application Specific Binary Data)
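The two query styles required of an ASFS, archived and streaming, can be sketched with a small in-memory service. The `FeatureService` class and its method names are hypothetical and stand in for whatever interface a real ASFS would expose; features are plain dictionaries rather than documents in an ASL.

```python
class FeatureService:
    """Hypothetical stand-in for an ASFS holding application features."""

    def __init__(self):
        self.archive = []        # everything published so far
        self.subscriptions = []  # (predicate, callback) pairs for future data

    def query_archive(self, predicate):
        """Archived query: find all relevant data in the past."""
        return [f for f in self.archive if predicate(f)]

    def subscribe(self, predicate, callback):
        """Streaming query: be told about all future data with given properties."""
        self.subscriptions.append((predicate, callback))

    def publish(self, feature):
        """A sensor (or any service acting like one) delivers a new feature."""
        self.archive.append(feature)
        for predicate, callback in self.subscriptions:
            if predicate(feature):
                callback(feature)

def is_strong(feature):
    return feature["magnitude"] >= 6.0

service = FeatureService()
service.publish({"type": "quake", "magnitude": 4.1})
service.publish({"type": "quake", "magnitude": 6.3})

past = service.query_archive(is_strong)  # archived query

future = []
service.subscribe(is_strong, future.append)  # streaming query
service.publish({"type": "quake", "magnitude": 5.0})
service.publish({"type": "quake", "magnitude": 7.2})
```

The same predicate serves both query styles, so a client can ask for past and future data with one description of what it wants.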
ASIS Application Specific Information System II
• Let’s call this ASVS (Application Specific Visualization Services) generalizing WMS for GIS
• The ASVS should both visualize information and provide a way of navigating (cf GetFeatureInfo) database (the ASFS)
• The ASVS can itself be federated and presents an ASFS output interface
• d) There should be an application service interface for ASIS from which all ASIS services inherit
• e) There will be other user services interfacing to ASIS
• All user and system services will input and output data in ASL using filters to cope with ASBD
[Figure: AS Tools (generic), AS “Sensor”, AS Repository, AS Service (user defined) and ASVS Display exchanging Messages using ASL; Filter, Transformation, Reasoning, Data-mining, Analysis]
Everything Is a Service or a Message
[Figure: Military Information Management System. Information Nugget or Military Information (MI) Object: Unit of Managed Information expressed in ASL. Other labels: Directly GS-* WS-*; ASVS; Filters/ASTT; OGSA-DAI and Sensor Standards; Info-WS-Notification WS-Eventing; ASFS]
[Figure: Information Resource with labels Receive Request/Select, Get Status, ASL Data, Get]
IS = Information Service (Sensor Service or Repository)
BFS = Basic Filter Service
FS = Filter Service, drawn as a set of linked BFS
A Filter Service is a general workflow (the microscopic workflow) of Basic Filter Services
A transport link supports asynchronous publish/subscribe semantics and Web Service Reliable messaging fault tolerance
Transport links can be multicast to support collaboration (typically for last link before or after Presentation Service) or replication for fault tolerance.
The output of a Filter Service is
indistinguishable from that of an IS
[Figure: Gridlet = linked Filter Services (FS) and Information Services (IS)]
Top IS could be produced by a Filter Service
The basic unit (Gridlet) transforms and aggregates application specific information
Gridlets are composed using Grid of Grids concept
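One way to picture why the output of a Filter Service is indistinguishable from that of an IS is to give both the same interface, so that gridlets nest freely. The sketch below does this with a single `get_data()` method; the class names and that method are assumptions made for illustration, not an interface defined in the talk.

```python
class SensorService:
    """An Information Service (IS) backed by raw readings."""

    def __init__(self, readings):
        self._readings = list(readings)

    def get_data(self):
        return list(self._readings)

class FilterService:
    """Reads from one or more IS inputs and exposes the same get_data()."""

    def __init__(self, transform, *sources):
        self._transform = transform
        self._sources = sources

    def get_data(self):
        # Aggregate the inputs, then transform them.
        merged = [item for source in self._sources for item in source.get_data()]
        return self._transform(merged)

sensors = [SensorService([1, 5, 3]), SensorService([8, 2])]

# A gridlet: aggregates two sensors and keeps only the largest reading.
largest = FilterService(lambda xs: [max(xs)], *sensors)

# A gridlet built on a gridlet: it cannot tell `largest` from a sensor.
doubled = FilterService(lambda xs: [2 * x for x in xs], largest)
```

Because `FilterService` both consumes and provides `get_data()`, any depth of composition remains just another information source to whatever sits above it.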
[Figure: Gridlets and Information Services (IS) composed into a Macroscopic Workflow, with labels Search, Planning, Construction, Management, Portal, Presentation, Federation and ASVS. General System Services: Messaging/Data transport, Notification, Security, Fault Tolerance, Metadata, Directory, Collaboration, Replica Management, Session Management]
Summary
• Virtualization everywhere
• Focus on semantics not representation to get performance combined with expressivity for transport and data access
• All this enabled by powerful meta-data services
• Grids add management to rich but potentially chaotic set of Web Services;
  • management and coherence enabled by meta-data
• Can define general information architectures (ASIS, GIS, SIIS) for both applications and system
• Knowledge from filters that span simulations, data-mining, reasoning and agents