Web Service Grids for iSERVO
International Workshop on Geodynamics: Observation,
Modeling and Computer Simulation
University of Tokyo Japan
October 14 2004
Geoffrey Fox
Community Grids Lab Indiana University
e-Infrastructure
e-Infrastructure builds on the inevitably increasing performance
of networks and computers, linking them together to support new flexible linkages between computers, data systems and people
• Grids and peer-to-peer networks are the technologies that build e-Infrastructure
• e-Infrastructure is called Cyberinfrastructure in the USA
We imagine a sea of conventional local or global connections
supported by the “ordinary Internet”
• Phones, web page accesses, plane trips, hallway conversations
• Conventional Internet technology manages billions of
low multiplicity (one client to server) or broadcast links
On this we superimpose high value multi-way organizations
(linkages) supported by Grids with optimized resources and system support and supporting virtual (electronic) enterprises
• Low multiplicity fully interactive real-time sessions
Web Services
• Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
• Web Services interact by exchanging messages in SOAP format
• The contracts for the message exchanges that implement those
What is a Grid?
• You won’t find a clear description of what a Grid is and how it differs from a collection of Web Services
– I see no essential reason that Grid Services have different requirements than Web Services
– Geoffrey Fox, David Walker, e-Science Gap Analysis, June 30 2003. Report UKeS-2003-01,
http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html.
– Notice “service-building model” is like programming language – very personal!
• Grids were once defined as “Internet Scale Distributed Computing” but this isn’t a good definition, as Grids depend as much, if not more, on data as on simulations
Community Resources
Grid Community databases are analogous to Television and the
News Web, which allow individuals to communicate instantly with each other via Web Pages and Headline News acting as proxies
N resources deposit information and N can view – Complexity
Large and Small Grids
• N resources in a community (N is billions for the world and 1000-10000 for many scientific fields)
• Communities are arranged hierarchically with real work being done in “groups” of M resources
– M could be 10-100 in e-Science
• Metcalfe’s law: value of network grows like square of number of nodes M – we call Grids where this is true Metcalfe or M² Grids
Nature of Interaction depends on size of M or N
• Shared Information O(N) Complexity Grids for largish N
• Complexity M² Metcalfe Grids for smaller M < N
Grids must merge with peer-to-peer networks
to
M² Interactions
• Superimpose M² “Grids” on the sea (heatbath) of O(N) “ordinary”
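The difference between the two regimes described above, shared-information Grids of complexity O(N) and Metcalfe Grids of complexity M², can be made concrete with a small count. The following is a minimal sketch in Python; the function names are hypothetical, and counting one deposit per resource and one link per pair of resources is an assumption made for illustration only.

```python
def shared_information_cost(n: int) -> int:
    """O(N) case: each of the N resources deposits once into a shared store."""
    return n


def metcalfe_links(m: int) -> int:
    """M^2 case: distinct pairwise links among M resources, M(M-1)/2."""
    return m * (m - 1) // 2


# Sizes quoted on the slides: groups of 10-100, communities of 1000-10000.
for size in (10, 100, 10_000):
    print(size, shared_information_cost(size), metcalfe_links(size))
```

For a group of M = 100 the pairwise count is 4950, which is still manageable; for a community of N = 10000 it is close to 50 million, which suggests why large communities fall back on shared information.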
[Figure: Geoscience Research and Education Grids. Labels: SERVOGrid (Research), Education Grid, GIS Grid, Sensor Grid, Database Grid; Databases, Repositories, Federated Databases; Field Trip Data; Streaming Data; Sensors; Data Filter Services; Research Simulations; Analysis and Visualization Portal; Discovery Services; Customization Services (From Research to Education); Computer Farm]
Grids and Earthquake Science
• Complexity N ≈ 1000 to 10000 Community resources building
– Thousands of Data Servers of raw and curated data
– Services filtering and mining data
– Simulation Services
– Visualization Services
– Geographical Information Services
– Registry and metadata Services
• These services can support several communities
– National and International earth science researchers
– Emergency response and critical infrastructure planning and management
• Web Services will harmonize different countries (SERVO to iSERVO)
• Web Services will harmonize members of a community and between communities with common resources
– Curation will bring data to interoperable certified form
• National and International research collaborations analyzing particular ideas with many M² Complexity Grids
(i)SERVO Web (Grid) Services
• Programs: All applications wrapped as Services using proxy strategy
• Job Submission: supports remote batch and shell invocations
– Used to execute simulation codes (VC suite, GeoFEST, etc.), mesh generation (Akira/Apollo) and visualization packages (RIVA, GMT).
• File management:
– Uploading, downloading, backend crossloading (i.e. move files between remote servers)
– Remote copies, renames, etc.
• Job monitoring
• Workflow: Apache Ant-based remote service orchestration (NCSA)
– Move towards a BPEL framework (can still implement with ANT)
• Database services: support SQL queries
– Expect Simpler version of OGSA-DAI (“Web Service-DAI”) Grid Database
• Data services: support interactions with XML-based fault and surface observation data.
– For simulation generated faults (i.e. from Simplex)
– XML data model being adopted for common formats with translation services to “legacy” formats.
Integration of Services
• Use OGCE Grid Portal Architecture to allow importing of existing Grid Services and their user interfaces
• Can expect GGF activities like OGSA to define/refine interfaces and projects around the world to produce more powerful services which can easily be added replacing existing services
• Geoscience Education Grid by transformations on research grid
• Emergency Response and Planning Grids by adding real-time control/collaboration and GIS tools
– These additions common to all crises
[Figure: OGCE Consortium portal screenshot. Labels: Aggregation Portal, Service-1, Service-N, GUI-1. Callouts: individual portlet for the Proxy Manager; use tabs or choose different portlets to navigate through interfaces to different services; 2 other portlets]
Key Grid Features of iSERVO
• The service model avoids a lot of the security complications that have caused trouble in other simulation based Grids
– We don’t support general computer logins from the portal – you can run GeoFEST but not rm -r *
• Geographical Information Systems is a key set of generally useful services
• Currently largely file based but streams will become more important
– Data moves directly between services and is not necessarily written to and read from files
– Must support high performance (fast) streams
[Figure: Data Deluged Science Computing Architecture. Labels: File based, Filter Service, HPC Simulation, Data Filter (repeated), Distributed Filters massage data for simulation, Other Grid and Web Services, Analysis, Control, Visualize, Grid, OGSA-DAI Grid Services, Grid Data Assimilation]
Geographical Information Service
(GIS) Data Formats and Services
• OpenGIS Consortium (OGC) is an international group for defining GIS data formats and services.
• Main data format language is the XML-based GML.
– Subdivided into schemas for drawing maps, representing features, observations, …
• First Step: design GML schemas and build specialized Web Services for GPS and Earthquake data.
• OGC also defines services.
– Services include Web Feature Services, Web Map Services, …
• Next Step: Implement OGC compatible Web Services for this problem i.e. build a GIS Grid
Intend to build OGC compatible map and feature services supporting high performance simulations
Grid Information Service Integrating
GIS Web and Feature Services
[Figure labels: WMS; IS (UDDI); three WFS instances: california fault data @gridnode1, california river data @gridnode2, california boundary data @gridnode3]
• Need to support dynamic feature services with
different access restrictions (especially in
Different Performance Issues for iSERVO
• All systems are built of interlinked entities
– Nature, Society, Grids and Parallel computing all link entities by messages
• Most(all) complex systems have a hierarchical architecture
– Grids link large macroscopic systems including sensors, databases, parallel computers
– Parallel Computers consist of many desktop size nodes
– Nodes have hierarchical memory structure with many cache levels
• Systems have dimension d ≈ 2 to 3
• Communication bandwidth into a system of complexity C is proportional to $C^{1-1/d}$ ($\text{Bandwidth}/C \propto C^{-1/d}$)
– C(Grid Resource) = M C(Desktop) where M ≈ 1 to 1000 is typical number of nodes in simulation resource
• Parallel Computers need gigabit or better internal node bandwidth and node to node latency of around a microsecond
• Grids will have terabit bandwidth but latency is AT BEST a millisecond (nodes next to each other) and is better considered as 100 milliseconds or greater across countries
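The scaling rule above, bandwidth proportional to C^(1-1/d), can be evaluated for a few resource sizes. This is a minimal sketch in Python that takes the constant of proportionality to be 1 and measures complexity in units of one desktop; both choices, and the function names, are assumptions for illustration.

```python
def bandwidth_into(complexity: float, d: float) -> float:
    """Relative bandwidth into a system of complexity C: C**(1 - 1/d)."""
    return complexity ** (1.0 - 1.0 / d)


def bandwidth_per_unit_complexity(complexity: float, d: float) -> float:
    """Bandwidth/C, which falls off as C**(-1/d)."""
    return complexity ** (-1.0 / d)


if __name__ == "__main__":
    # C(Grid Resource) = M * C(Desktop) with M from 1 to 1000, d between 2 and 3.
    for m in (1, 10, 1000):
        for d in (2.0, 3.0):
            print(m, d, bandwidth_into(m, d), bandwidth_per_unit_complexity(m, d))
```

With d = 3 and M = 1000, the bandwidth into the resource grows by a factor of about 100 while the bandwidth available per node drops to about one tenth, the usual surface-to-volume effect.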
Two ways of Linking Modules
• Method based linkage of classic programming
• Message based Grid and Service linkage
[Figure: Module linked to Module by Method Call, .001 to 1 millisecond; Service linked to Service by Messages]
Grid Programming Model
All SERVOGrid capabilities are built as Web Services with 3 level programming model
Application (level 1 Programming)
Application Semantics (Metadata, Ontology) Level 2 “Programming”
Systems Metadata (Context, State)
Basic WS-* Infrastructure
Web Service 1
Workflow (level 3) Programming Of Services AND Streams
WS 2 WS 3 WS 4
Fortran, C++, Java (Method based)
Semantic Web (Message based)
What is a Simple Service?
• Take any system – it has multiple functionalities
– We can implement each functionality as an independent distributed service
– Or we can bundle multiple functionalities in a single service
• Whether functionality is an independent service or one of many method calls into a “glob of software”, we can always make them Web services by converting the interface to WSDL
• Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of millisecond”
– Distributed services incur a messaging overhead of one (local) to 100’s (far apart) of milliseconds when using a message rather than a method call
– Use compiled integration of functionalities ONLY when you require <1 millisecond interaction latency
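The “rule of millisecond” above amounts to a simple decision procedure. The sketch below is one possible reading in Python; the names `Linkage`, `MESSAGE_OVERHEAD_MS` and `choose_linkage` are hypothetical, and the threshold is the slide's lower bound for a local message.

```python
from enum import Enum


class Linkage(Enum):
    METHOD = "method call (compiled integration)"
    MESSAGE = "message (independent service)"


# Lower bound quoted on the slide for a local message, in milliseconds.
MESSAGE_OVERHEAD_MS = 1.0


def choose_linkage(required_latency_ms: float) -> Linkage:
    """Bundle functionalities only when the interaction needs < 1 ms latency."""
    if required_latency_ms < MESSAGE_OVERHEAD_MS:
        return Linkage.METHOD
    return Linkage.MESSAGE


# An inner loop needing 0.01 ms stays a method call;
# a step that tolerates 50 ms can be split out as its own service.
print(choose_linkage(0.01).value)
print(choose_linkage(50.0).value)
```

Services far apart pay 100's of milliseconds, so a real decision would also consider where the services are deployed; this sketch only encodes the local threshold.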
Grids of Grids of Simple Services
• Link via methods, messages, streams
• Services and Grids are linked by messages
• Internally to service, functionalities are linked by methods
• A simple service is the smallest Grid
• We are familiar with method-linked hierarchy
Lines of Code, Methods, Objects, Programs, Packages
[Figure: Overlay and Compose Grids of Grids. Labels: Methods, Services, Component Grids; CPUs, Clusters, MPPs, Compute Resource Grids; Databases, Federated Databases; Sensor, Sensor Nets; Data]
Component Grids?
• So we build collections of Web Services which we package as component Grids
– Visualization Grid
– Sensor Grid
– Utility Computing Grid
– Person (Community) Grid
– Earthquake Simulation Grid
– Control Room Grid
– Crisis Management Grid
• We build bigger Grids by composing component Grids using the Service Internet and Service
Critical Infrastructure (CI) Grids built as Grids of Grids of Services
[Figure: Critical Infrastructure Grids. Labels: Earthquake CIGrid, Flood CIGrid, Electricity CIGrid, …; Earthquake Services; Flood Service and Filters; Visualization Grid, Collaboration Grid, Sensor Grid, Compute Grid, GIS Grid; Registry, Metadata, Data Access/Storage, Security, Notification, Workflow, Messaging, Portals; Physical Network]
iSERVO Strategy
• Agree on what (type of) resources and capabilities need to be put on the iSERVO Grid
– Computers, instruments, databases, visualization, maps, job submittal ….
• Agree on interfaces to resources from OGSA-DAI (databases) to particular data structures (GML/OpenGIS) – specify in XML
• Implement Resources and Capabilities as Services
– User Interface should be a portlet that can be integrated by the portal into web interface
• Make certain overarching Grid capabilities such as workflow,
federation and metadata are sufficient
• SERVO Grid is a prototype of this strategy using several US sites rather than several countries
– Can be naturally extended to iSERVO, education, emergency response by extending resources
Further iSERVO Challenges
• Make everything a Service
• Understand algorithms and implementation for data assimilation
• Agree on security and access control policies
• Think about Data Curation
– Set up policies for observational data and criteria for inclusion in iSERVO data repositories
• Think about Data Provenance
– Generate and maintain metadata describing ownership, origins and transformations
– Applies to both “experimental data” and results from simulations (visualizations)
• Curation and Provenance are a change in research methodologies and require funding!
Architecture of (Web Service) Grids
Grids built from Web Services communicating through an overlay network built in SOFTWARE on the “ordinary internet” at the application level
• A new Internet built with SOAP messages replacing TCP packets
Grids provide the special quality of service (security, performance, fault-tolerance) and customized services needed for “distributed complex enterprises”
• Developing Web Service compatible high bandwidth streaming transports
We need to work with Web Service community as they
debate the 60 or so proposed Web Service specifications
• Use Web Service Interoperability WS-I as “best practice”
• Must add further specifications to support high performance
• Database “Grid Services” for N plus N case
Importance of SOAP
• SOAP defines a very obvious message structure with a header and a body
• The header contains information used by the “Internet operating system”
– Destination, Source, Routing, Context, Sequence Number …
• The message body is only used by the application and will never be looked at by “operating system” except to encrypt, compress it etc.
• Much discussion in field revolves around what is in header!
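The header/body split described above can be illustrated by building a small SOAP 1.1 envelope with Python's standard library. This is only a sketch: the header field names (`Destination`, `Source`, `SequenceNumber`) and the `RunSimulation` payload are hypothetical placeholders and are not taken from any WS-* specification. Real header blocks would also be namespace-qualified.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(header_fields: dict, payload: ET.Element) -> ET.Element:
    """Wrap an application payload in Envelope/Header/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    for name, value in header_fields.items():
        # Hypothetical, unqualified header entries used for illustration only.
        ET.SubElement(header, name).text = str(value)
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


def read_header(envelope: ET.Element) -> dict:
    """What the 'Internet operating system' needs: header only, body untouched."""
    header = envelope.find(f"{{{SOAP_NS}}}Header")
    return {child.tag: child.text for child in header}


payload = ET.Element("RunSimulation")  # hypothetical application message
payload.set("code", "GeoFEST")
envelope = build_envelope(
    {"Destination": "service-b", "Source": "service-a", "SequenceNumber": 1},
    payload,
)
print(ET.tostring(envelope, encoding="unicode"))
print(read_header(envelope))
```

An intermediary can route on `read_header` alone and never parse the body, which is the division of labour the slide describes.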
Web Services
• Java is very powerful partly due to its many “frameworks” that generalize libraries e.g.
– Java Media Framework
– Java Database Connectivity JDBC
• Web Services have a corresponding collection of specifications that represent critical features of the distributed operating systems for “Grids of Simple Services”
– Some 60 active WS-* specifications for areas such as
– a. Core Infrastructure Specifications
– b. Service Discovery
– c. Security
– d. Messaging
– e. Notification
– f. Workflow and Coordination
– g. Characteristics
– h. Metadata and State
WS-I Interoperability
• Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles
• Web Services Interoperability (WS-I) Interoperability Profile 1.0a (http://www.ws-i.org) gives us XSD, WSDL1.1, SOAP1.1, UDDI in basic profile and parts of WS-Security in their first security profile.
• We imagine the “60 Specifications” being checked out and evolved in the cauldron of the real world and occasionally best practice identifies a new specification to be added to WS-I which gradually increases in scope
Web Services Grids and
WS-I+
• WS-I Interoperability doesn’t cover all the capabilities needed to support Grids
• WS-I+ is designed as a minimal extension of WS-I to support “most current” Grids: it adds support for
– Enhanced SOAP Addressing (WS-Addressing)
– Fault tolerant (reliable) messaging
– Workflow as in IBM-Microsoft standard BPEL
• Security and Notification best practice and support will probably get added soon
– There are Web Service frameworks here but various IBM v Microsoft v Globus differences to be resolved
• UK OMII Open Middleware Infrastructure Institute is adopting this approach to support UK e-Science program
Layered Architecture for Web Services and Grids
[Figure: layered stack.
Higher Level: Application Specific Grids; Generally Useful Services and Grids; Workflow WSFL/BPEL; Service Management (“Context etc.”); Service Discovery (UDDI) / Information
Service Internet: Service Context; Service Interfaces WSDL; Service Internet Transport Protocol
Base Hosting Environment
Bit level Internet (OSI Stack): Protocol HTTP FTP DNS …; Presentation XDR …; Session SSH …; Transport TCP UDP …; Network IP …; Data Link / Physical]
Working up from the Bottom
We have the classic (CISCO, Juniper ….) Internet routing the
flood of ordinary packets in OSI stack architecture
Web Services build the “Service Internet” or IOI (Internet on
Internet) with
• Routing via WS-Addressing not IP header
• Fault Tolerance (WS-RM not TCP)
• Security (WS-Security/SecureConversation not IPSec/SSL)
• Information Services (UDDI/WS-Context not DNS/Configuration files)
• At message/web service level and not packet/IP address level
Software-based Service Internet possible as computers “fast”
Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN)
SOAP Header contains all information needed for the “Service
[Figure: NaradaBrokering Broker Network. Labels: Minicomputer, Firewall, Computer, Server, PDA, Modem, Laptop computer, Workstation, Peers, Audio/Video Conferencing Client, Web Service B, Stream, Server-enhanced Messaging]
NB supports messages and streams
NaradaBrokering and IOI
• “Software Overlay Network” features
• Support for Multiple Transport protocols
• Support for multiple delivery mechanisms
– Reliable Delivery
– Exactly-once Delivery
– Ordered Delivery
– Optional Delivery optimization modules for different modes
• Compression/Decompression of payloads with optional module
• Coalescing/Fragmentation of payloads with optional module
• NTP Time Service
• Security Service
• Performance Monitoring
• Performance optimized routing with optional module
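Two of the payload features listed above, compression/decompression and coalescing/fragmentation, can be sketched generically. The Python below is not the NaradaBrokering API; `fragment` and `coalesce` are hypothetical helper names, and zlib stands in for whatever compression module a broker might plug in.

```python
import zlib
from typing import List


def fragment(payload: bytes, max_size: int) -> List[bytes]:
    """Compress a payload, then split it into fragments of at most max_size bytes."""
    compressed = zlib.compress(payload)
    return [compressed[i:i + max_size] for i in range(0, len(compressed), max_size)]


def coalesce(fragments: List[bytes]) -> bytes:
    """Reassemble ordered fragments and decompress back to the original payload."""
    return zlib.decompress(b"".join(fragments))


data = b"fault data " * 100
parts = fragment(data, max_size=16)
assert coalesce(parts) == data
print(len(data), "bytes sent as", len(parts), "fragments")
```

The sketch assumes fragments arrive complete and in order, which is exactly what the reliable and ordered delivery mechanisms in the list are meant to guarantee.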
Virtualizing Communication
• Communication specified in terms of user goal and Quality of Service – not in choice of port number and protocol
• Bit Internet Protocols have become overloaded e.g. MUST use UDP for A/V latency requirements but CAN’T use UDP as firewall will not support ………
• A given “Service Internet” communication can involve multiple transport protocols and multiple destinations – the latter possibly determined dynamically
[Figure: A communicating with B1, B2, B3. Labels: Dial-up, Filter, Satellite, UDP, Firewall, HTTP, Hand-Held Protocol, Fast Link, Software Multicast]
Performance Monitoring
• Every broker incorporates a Monitoring service that monitors links originating from the node.
• Every link measures and exposes a set of metrics
– Average delays, jitters, loss rates, throughput.
• Individual links can disable measurements for individual or the entire set of metrics.
• Measurement intervals can also be varied
• Monitoring Service returns measured metrics to
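The per-link metrics named above (average delay, jitter, loss rate, throughput) can be computed from simple per-interval samples. The following Python sketch is illustrative only: the `LinkMetrics` record and `measure_link` function are hypothetical, and jitter is taken as the mean absolute difference between consecutive delays, which is one common definition and not necessarily the one the brokers use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class LinkMetrics:
    average_delay_ms: float
    jitter_ms: float
    loss_rate: float
    throughput_bytes_per_s: float


def measure_link(delays_ms: List[float], sent: int,
                 bytes_received: int, interval_s: float) -> LinkMetrics:
    """Summarise one measurement interval for a single link."""
    received = len(delays_ms)
    # Assumed jitter definition: mean absolute change between consecutive delays.
    if received > 1:
        jitter = mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))
    else:
        jitter = 0.0
    return LinkMetrics(
        average_delay_ms=mean(delays_ms),
        jitter_ms=jitter,
        loss_rate=1.0 - received / sent,
        throughput_bytes_per_s=bytes_received / interval_s,
    )


metrics = measure_link([10.0, 12.0, 11.0, 13.0], sent=5,
                       bytes_received=4000, interval_s=2.0)
print(metrics)
```

Varying `interval_s` corresponds to the adjustable measurement interval mentioned on the slide.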
Fast Web Service Communication I
• IOI Application level Internet allows one to optimize message streams at the cost of “startup time”, Web Services can deliver the fastest possible interconnections with or without reliable messaging
• Typical results from Grossman (UIC) comparing slow SOAP over TCP with binary and UDP transport (latter gains a factor of 1000)
[Figure: Pure SOAP, SOAP over UDP, Binary over UDP]