Semantic and Streaming Grids
Chinese Academy of Sciences Dec 6 2005
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
Four Data Streaming Application Areas
▪ Data Assimilation applied to link the data deluge (satellites, sensors, seismometers) in real time to small and large scale parallel simulations
  • Use in Earthquake Science
▪ The Department of Defense (and Homeland Security) have built the Global Information Grid with a target architecture NCOW (Network Centric Operations and Warfare)
  • They submit no jobs; rather, they stream data to brokers, from which it is filtered and distributed
  • Includes their rather dated distributed simulation framework HLA (High Level Architecture)
▪ Audio-video conferencing implemented with services and Grid messaging
▪ Hand-held Grid linking PDAs/cell phones to Grids
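The Global Information Grid pattern above can be illustrated with a toy in-memory topic broker. In that pattern no jobs are submitted, and data is streamed to brokers that filter and distribute it. This is a minimal sketch under stated assumptions, not NaradaBrokering or any DoD software. The `Broker` class, the topic name and the magnitude threshold are all hypothetical, and a real broker would route messages between processes over a network.

```python
from collections import defaultdict
from typing import Callable

Message = dict
Filter = Callable[[Message], bool]
Handler = Callable[[Message], None]


class Broker:
    """Toy topic-based broker: publishers stream data in, and each
    subscriber receives only the messages that pass its filter."""

    def __init__(self) -> None:
        # topic -> list of (filter, handler) pairs
        self._subs: dict[str, list[tuple[Filter, Handler]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler,
                  flt: Filter = lambda m: True) -> None:
        self._subs[topic].append((flt, handler))

    def publish(self, topic: str, message: Message) -> int:
        """Deliver message to every subscriber whose filter accepts it;
        return the number of deliveries."""
        delivered = 0
        for flt, handler in self._subs.get(topic, []):
            if flt(message):
                handler(message)
                delivered += 1
        return delivered


# Usage: the topic name and magnitude threshold are hypothetical
alerts = []
broker = Broker()
broker.subscribe("seismic/station-1", alerts.append,
                 lambda m: m["magnitude"] >= 4.0)
broker.publish("seismic/station-1", {"magnitude": 2.1})
broker.publish("seismic/station-1", {"magnitude": 4.7})
print(alerts)  # [{'magnitude': 4.7}]
```

The filter belongs to the subscription rather than to the publisher. A new consumer can therefore choose what it sees without changing the data source.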
Data Deluged Science
▪ In the past, we worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it an enabler of new science and new ways of computing
▪ Data assimilation was not central to HPCC
▪ DoE ASCI was set up because it didn’t want test data!
▪ Now particle physics will get 100 petabytes from CERN
  • Nuclear physics (Jefferson Lab) is in the same situation
  • Uses around 30,000 CPUs simultaneously, 24x7
▪ Weather, climate, solid earth (EarthScope)
▪ Bioinformatics curated databases (Biocomplexity has only 1000’s of data points at present)
▪ Virtual Observatory and SkyServer in astronomy
▪ Environmental sensor nets
Information/Knowledge Grids
▪ Distributed (10’s to 1000’s) data sources (instruments, file systems, curated databases …)
▪ Data deluge: 1 (now) to 100’s of petabytes/year (2012)
  • Moore’s law for sensors
▪ Possible filters assigned dynamically (on-demand)
  • Run image processing algorithm on telescope image
  • Run gene sequencing algorithm on compiled data
▪ Needs decision support front end with “what-if” simulations
▪ Metadata (provenance) critical to annotate data
▪ Integrate across experiments as in multi-wavelength astronomy
[Figure: a Database, Sensor Services, Filter Services (F), Other Services (O), MetaData services (MD), a Portal and Another Grid/Another Service connected as a network of services (S); processing flows Raw Data → Data → Information → Knowledge → Wisdom → Decisions]
Semantic Grid and Services
▪ Implications of SOA (Service Oriented Architectures) for SG (Semantic Grid)
  • Build services to implement SG
▪ Implications of SG for SOA
  • Build metadata-rich systems of services using SG
▪ Services receive data in SOAP messages, manipulate it and produce transformed data as further messages
▪ Meta-data is carried in SOAP messages
▪ Meta-data controls processing and transport of SOAP messages
▪ Knowledge is created from data by services
▪ The Grid enhances Web services with semantically rich system and application specific management
▪ One must exploit and work around the different approaches to meta-data and their manipulation in Web Services
Structure of SOAP Messages
▪ SOAP messages have system information in the header, including WS-Policy based meta-data defining processing options
  • Processed by handlers
▪ Application data and meta-data are in the body (controversies here!)
  • Processed by the service itself
▪ Some meta-data, like WS-RF, is logically “only in messages”
▪ Other meta-data, like that in WS-Context or the SRB, is stored in the logical equivalent of XML databases
▪ We only need to preserve semantic structure (XML/SOAP Infoset), so we can transport in fast XML and store in efficient relational databases
[Figure: SOAP message with header blocks H1–H4 and a Body, processed by handlers F1–F4 in the service container (container workflow) and by the service itself]
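The header/body split can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes a SOAP 1.1 envelope. The `urn:example` header blocks, the handler table and `reading_service` are hypothetical stand-ins for WS-Policy driven handlers and a real service. A real container's handler chain is considerably richer; the point is only that handlers consume header meta-data while the service sees the body.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
EX = "{urn:example}"  # hypothetical namespace for this sketch's own blocks

ENVELOPE = f"""<soap:Envelope xmlns:soap="{SOAP_NS}" xmlns:ex="urn:example">
  <soap:Header>
    <ex:Priority>high</ex:Priority>
    <ex:Trace>hop-1</ex:Trace>
  </soap:Header>
  <soap:Body>
    <ex:Reading station="S1">3.2</ex:Reading>
  </soap:Body>
</soap:Envelope>"""


def priority_handler(block, ctx):
    ctx["priority"] = block.text


def trace_handler(block, ctx):
    ctx.setdefault("trace", []).append(block.text)


HANDLERS = {EX + "Priority": priority_handler, EX + "Trace": trace_handler}


def reading_service(body, ctx):
    """The service itself only looks at application data in the body."""
    reading = body.find(EX + "Reading")
    return {"station": reading.get("station"),
            "value": float(reading.text), **ctx}


def process(envelope_xml, handlers, service):
    root = ET.fromstring(envelope_xml)
    header = root.find(f"{{{SOAP_NS}}}Header")
    body = root.find(f"{{{SOAP_NS}}}Body")
    ctx = {}
    # Container handlers consume header blocks (system meta-data) in order
    if header is not None:
        for block in header:
            handler = handlers.get(block.tag)
            if handler is not None:
                handler(block, ctx)
    return service(body, ctx)


result = process(ENVELOPE, HANDLERS, reading_service)
print(result)  # {'station': 'S1', 'value': 3.2, 'priority': 'high', 'trace': ['hop-1']}
```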
What Types of Services Are There?
▪ There is a horde of support services supplying security, collaboration, database access and user interfaces
▪ The support services are associated with either the system or the application
  • We will study the WS-* and GS-* specifications, which implicitly or explicitly define many support services
▪ There are generalized filter services, which are applications that accept messages and produce new messages with some data derived from that in the input. The following are all termed filters here:
  • Simulations (including PDEs and reactive systems)
  • Data-mining
  • Transformations
  • Agents
  • Reasoning
▪ There are services like “author ontology”, “parse RDF” or “attach provenance” that directly support the Semantic Grid
▪ But all services and their interactions are bathed in a sea of meta-data and so implicitly need and support the Semantic Grid
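A generalized filter in the sense above takes a message and emits a new message containing derived data. The minimal sketch below assumes that messages are plain dictionaries with hypothetical `header` and `body` fields. It combines a trivial data-mining step with an “attach provenance” step, so the output records where it came from.

```python
import statistics
from datetime import datetime, timezone


def mining_filter(message: dict) -> dict:
    """Generalized filter: derive new data from the input message and
    attach provenance meta-data recording how it was produced."""
    values = message["body"]["readings"]
    derived = {
        "mean": statistics.fmean(values),
        "max": max(values),
        "count": len(values),
    }
    # Copy, rather than mutate, the incoming provenance chain
    provenance = list(message["header"].get("provenance", []))
    provenance.append({
        "filter": "mining_filter",
        "input_id": message["header"]["id"],
        "time": datetime.now(timezone.utc).isoformat(),
    })
    return {
        "header": {"id": message["header"]["id"] + "/summary",
                   "provenance": provenance},
        "body": derived,
    }


# Usage with a hypothetical sensor message
msg = {"header": {"id": "sensor-7/batch-1"},
       "body": {"readings": [1.0, 2.0, 6.0]}}
out = mining_filter(msg)
print(out["body"])  # {'mean': 3.0, 'max': 6.0, 'count': 3}
```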
It’s a Composite Hierarchical World
▪ Filters can be workflows, which means they are “just collections of other simpler services”
  • One needs meta-data to control the workflow
▪ Services are programs that accept messages and produce messages
▪ Grids are distributed collections of services supporting managed shared resources
  • Management requires meta-data
▪ Grids are distributed systems that accept distributed messages and produce distributed result messages
  • One can always talk about Grids and view a service or a workflow as a special case of a Grid
▪ It just requires meta-data to send a message to a Grid and have it routed to the “correct computer” holding the “requested service”
  • Meta-data allows mapping of virtual to real addresses
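Routing a message to the “correct computer” can be pictured as a meta-data lookup from a virtual service name to a real endpoint. This is an illustrative sketch only. The `grid://` names, the hosts and the round-robin choice among replicas are assumptions, not part of any Grid specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int
    path: str

    def url(self) -> str:
        return f"http://{self.host}:{self.port}{self.path}"


class ServiceRegistry:
    """Meta-data store mapping virtual service names to real endpoints."""

    def __init__(self) -> None:
        self._table: dict[str, list[Endpoint]] = {}
        self._next: dict[str, int] = {}

    def register(self, virtual_name: str, endpoint: Endpoint) -> None:
        self._table.setdefault(virtual_name, []).append(endpoint)

    def resolve(self, virtual_name: str) -> Endpoint:
        """Map a virtual name to one real endpoint, rotating round-robin."""
        endpoints = self._table.get(virtual_name)
        if not endpoints:
            raise LookupError(f"no service registered as {virtual_name!r}")
        i = self._next.get(virtual_name, 0) % len(endpoints)
        self._next[virtual_name] = i + 1
        return endpoints[i]


# Usage: two hypothetical replicas behind one virtual name
registry = ServiceRegistry()
registry.register("grid://gis/wfs", Endpoint("host-a.example.org", 8080, "/wfs"))
registry.register("grid://gis/wfs", Endpoint("host-b.example.org", 8080, "/wfs"))
for _ in range(3):
    print(registry.resolve("grid://gis/wfs").url())
```

The sender only ever names `grid://gis/wfs`. Moving or replicating the service changes the registry's meta-data, not the sender.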
Semantically Rich Services with a Semantically Rich Distributed Operating Environment
[Figure: Grids of Grids architecture. Services (S), filter services (F), other services (O) and metadata services (MD) exchange SOAP message streams, linking a Database, Sensor Services, a Portal and other Grids and services; information flows Raw Data → Data → Information → Knowledge → Wisdom → Decisions]
GIS Grids and Sensor Grids
▪ OGC has defined a suite of data structures and services to support Geographical Information Systems and sensors
▪ GML (Geography Markup Language) defines the specification of geo-referenced data
▪ SensorML and O&M (Observations and Measurements) define meta-data and data structures for sensors
▪ Services like Web Map Service, Web Feature Service and Sensor Collection Service define service interfaces to access GIS and sensor information
▪ Grid workflow links services that are designed to support streaming input and output messages
▪ We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
A Screen Shot From the WMS Client
WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          -118.72,34.243 -118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
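A WFS client could pull the fault geometry out of a feature like this with `xml.etree.ElementTree`. The sketch binds the `gml` prefix to the GML namespace so the fragment parses on its own. It assumes the first longitude is -118.72 (west), consistent with the second point. `parse_fault` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# The fault feature, with the gml prefix bound so it parses standalone
FEATURE = f"""<gml:featureMember xmlns:gml="{GML_NS}">
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          -118.72,34.243 -118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>"""


def parse_fault(xml_text: str) -> dict:
    fault = ET.fromstring(xml_text).find("fault")
    coords = fault.find(f".//{{{GML_NS}}}coordinates").text
    # GML 2 <coordinates>: tuples separated by whitespace, values by commas
    points = [tuple(float(v) for v in pair.split(","))
              for pair in coords.split()]
    return {"name": fault.findtext("name").strip(),
            "author": fault.findtext("author").strip(),
            "points": points}


print(parse_fault(FEATURE))
```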
[Screenshot: WMS client showing Electric Power and Natural Gas data from LANL for Interdependent Critical Infrastructure Simulations; client tools include zoom-in, zoom-out, FeatureInfo mode, measure-distance mode, clear distance, drag-and-drop mode and refresh to initial map]
Typical use of Grid Messaging in NASA
[Figure: a Sensor Grid, a Datamining Grid (filter or data-mining services) and a GIS Grid (Web Feature Service providing geographical GIS data, backed by a Grid database and archives) linked by Grid eventing through NaradaBrokering, with subscribe/notify and data posted before and after processing; HPSearch provides management and WS-Context stores dynamic data]
Real Time GPS and Google Maps
Subscribe to a live GPS station. Position data from SOPAC is combined with Google Maps clients.
Select and zoom to a GPS station location; click icons for more information.
Google Maps can be integrated with Web Feature Service archives to filter and browse seismic records.
[Screenshots: Integrating Archived Web Feature Services; Google Maps as a service accessed from our WMS]
3 XML Databases of Importance
▪ WS-Context controlling a workflow
▪ (Extended) UDDI supporting semantic service discovery
▪ WFS or ASFS (see later) providing an application specific data/meta-data repository
▪ These have different performance, scalability and data unit size requirements
▪ In our implementation, each is currently “just an Oracle/MySQL” database front-ended by filters that convert between XML (GML for WFS) and an object-relational schema
  • An example of the difference between semantics (XML) and representation (SQL)
▪ OGSA-DAI offers a Grid interface to databases; we could use it but don’t, as we only need to expose WFS, and not MySQL, to the Grid
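The “filters that convert between XML and an object-relational schema” can be sketched with SQLite standing in for Oracle/MySQL. Everything here is an assumption for illustration: the `fault` and `fault_point` tables, the function names and the choice of SQLite. The inbound filter shreds a GML fault into rows. The outbound filter rebuilds equivalent GML, so the semantics survive while the representation changes.

```python
import sqlite3
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
G = f"{{{GML_NS}}}"
ET.register_namespace("gml", GML_NS)

SCHEMA = """
CREATE TABLE fault(name TEXT PRIMARY KEY, author TEXT);
CREATE TABLE fault_point(fault TEXT REFERENCES fault(name),
                         seq INTEGER, lon REAL, lat REAL);
"""


def store_fault(conn: sqlite3.Connection, fault_xml: str) -> None:
    """Inbound filter: GML <fault> feature -> relational rows."""
    fault = ET.fromstring(fault_xml)
    name = fault.findtext("name").strip()
    conn.execute("INSERT INTO fault(name, author) VALUES (?, ?)",
                 (name, fault.findtext("author").strip()))
    pairs = fault.find(f".//{G}coordinates").text.split()
    conn.executemany(
        "INSERT INTO fault_point(fault, seq, lon, lat) VALUES (?, ?, ?, ?)",
        [(name, i, *map(float, p.split(","))) for i, p in enumerate(pairs)])


def load_fault(conn: sqlite3.Connection, name: str) -> str:
    """Outbound filter: relational rows -> GML <fault> feature."""
    (author,) = conn.execute(
        "SELECT author FROM fault WHERE name = ?", (name,)).fetchone()
    rows = conn.execute(
        "SELECT lon, lat FROM fault_point WHERE fault = ? ORDER BY seq",
        (name,))
    fault = ET.Element("fault")
    ET.SubElement(fault, "name").text = name
    ET.SubElement(fault, "author").text = author
    line = ET.SubElement(ET.SubElement(fault, G + "lineStringProperty"),
                         G + "LineString", srsName="null")
    ET.SubElement(line, G + "coordinates").text = " ".join(
        f"{lon},{lat}" for lon, lat in rows)
    return ET.tostring(fault, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_fault(conn, f"""<fault xmlns:gml="{GML_NS}">
  <name>Northridge2</name><author>Wald D. J.</author>
  <gml:lineStringProperty><gml:LineString srsName="null">
    <gml:coordinates>-118.72,34.243 -118.591,34.176</gml:coordinates>
  </gml:LineString></gml:lineStringProperty>
</fault>""")
print(load_fault(conn, "Northridge2"))
```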
Information Management/Processing
▪ SOAP messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information, so that the complete system provides Data → Information → Knowledge transformation
  • Semantic Web technologies like RDF and OWL help us have rich expressivity
▪ We build application specific information management/transformation systems (ASIS) for each application domain
▪ One special domain is the system itself, where the metadata associated with services, sessions, Grids, messages, streams and workflow is itself managed and supported by an SIIS (Service Infrastructure Information System)
Generalizing a GIS
▪ Geographical Information Systems (GIS) have been hugely successful in all fields that study the earth and related worlds
  • They define geography syntax (GML) and ways to store, access, query, manipulate and display geographical features
  • In SOA, a GIS corresponds to a domain specific XML language and a suite of services for the different functions above
▪ However, such a universal information model has not been developed in other areas, even though there are many fields in which it appears possible
  • BIS: Biological Information System
  • MIS: Military Information System
  • IRIS: Information Retrieval Information System
  • PAIS: Physics Analysis Information System
  • SIIS: Service Infrastructure Information System
ASIS: Application Specific Information System I
▪ a) Discovery capabilities that are best done using WS-* standards
▪ b) Domain specific metadata and data, including a search/store/access interface (cf. WFS). Let’s call the generalization ASFS (Application Specific Feature Service)
  • Language to express domain specific features (cf. GML). Let’s call this ASL (Application Specific Language)
  • Tools to manipulate information expressed in the language and key data of the application (cf. coordinate transformations). Let’s call this ASTT (Application Specific Tools and Transformations)
  • ASL must support data sources such as sensors (cf. OGC metadata and data sensor standards) and repositories. Sensors need support (common across applications) for streams of data
  • Queries need to support archived (find all relevant data in the past) and streaming (find all data in the future with given properties) modes
  • Note that all AS services behave like sensors, and all sensors are wrapped as services
  • Any domain will have “raw data” (binary) and data that has been filtered to ASL. Let’s call the raw data ASBD (Application Specific Binary Data)
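The two query modes for an ASFS can be contrasted in a small in-memory sketch. An archived query scans stored data. A streaming query is a standing predicate applied to every new arrival. `FeatureService`, its methods and the `dz_mm` field are hypothetical.

```python
from typing import Callable

Feature = dict
Predicate = Callable[[Feature], bool]


class FeatureService:
    """Hypothetical ASFS-style store answering both query styles."""

    def __init__(self) -> None:
        self._archive: list[Feature] = []
        self._standing: list[tuple[Predicate, Callable[[Feature], None]]] = []

    def query_archive(self, pred: Predicate) -> list[Feature]:
        """Archived query: all matching data seen in the past."""
        return [f for f in self._archive if pred(f)]

    def subscribe(self, pred: Predicate,
                  callback: Callable[[Feature], None]) -> None:
        """Streaming query: all matching data that arrives in the future."""
        self._standing.append((pred, callback))

    def ingest(self, feature: Feature) -> None:
        """A sensor (wrapped as a service) pushes a new feature."""
        self._archive.append(feature)
        for pred, callback in self._standing:
            if pred(feature):
                callback(feature)


svc = FeatureService()
svc.ingest({"station": "A", "dz_mm": 0.4})
live = []
svc.subscribe(lambda f: f["dz_mm"] > 1.0, live.append)
svc.ingest({"station": "B", "dz_mm": 2.5})
svc.ingest({"station": "C", "dz_mm": 0.1})
print(live)  # [{'station': 'B', 'dz_mm': 2.5}]
print(svc.query_archive(lambda f: f["dz_mm"] < 1.0))
```

The same `ingest` path feeds both modes. This matches the point that every AS service behaves like a sensor.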
ASIS: Application Specific Information System II
▪ Let’s call this ASVS (Application Specific Visualization Services), generalizing WMS for GIS
▪ The ASVS should both visualize information and provide a way of navigating the database (the ASFS) (cf. GetFeatureInfo)
▪ The ASVS can itself be federated and present an ASFS output interface
▪ d) There should be an application service interface for ASIS from which all ASIS services inherit
▪ e) There will be other user services interfacing to ASIS
▪ All user and system services will input and output data in ASL, using filters to cope with ASBD
[Figure: generic AS Tools, a “Sensor”, a Repository, user-defined AS Services and an ASVS Display exchanging messages using ASL, with filters for transformation, reasoning, data-mining and analysis]
[Figure: Everything is a service or a message/information nugget. A Military Information Management System built from GS-*/WS-* services used directly, ASVS, Filters/ASTT, an ASFS (with OGSA-DAI and sensor standards) and Info-WS-Notification/WS-Eventing; a Military Information Object is a unit of managed information expressed in ASL]
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level programming model
• We make a service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
  – C++, Java or Fortran Monte Carlo module
  – Data streaming from a sensor or satellite
  – Specialized (JDBC) database access
• Such services accept and produce data from users’ files and databases
• The Grid is built by coordinating such services, assuming we have solved the problem of programming the service
[Figure: a Service and its Data]
Two-level Programming II
▪ The Grid is about the composition of distributed services, with runtime interfaces to the Grid, as opposed to UNIX pipes/data streams
▪ This is familiar from the use of UNIX shell, Perl or Python scripts to produce real applications from core programs
▪ Such interpretative environments are the single-processor analog of Grid programming
▪ Some projects like GrADS from Rice University are looking at integration between the service and composition levels, but the dominant effort looks at each level separately
[Figure: Service1 to Service4 composed together; a chain Web Service 1 → WS 2 → … → WS N-1 → Web Service N]
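The UNIX-pipe analogy can be made concrete in a few lines. Conventional code (the service level) is wrapped as a service, and composition is a separate, script-like step. This is a local, single-process sketch with hypothetical service names. Real Grid composition would pass messages between remote services.

```python
from typing import Any, Callable


class Service:
    """Conventional code (any Python function) wrapped as a service."""

    def __init__(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.name = name
        self.fn = fn

    def __call__(self, message: Any) -> Any:
        return self.fn(message)

    def __or__(self, other: "Service") -> "Service":
        """Compose like a UNIX pipe: (a | b)(m) == b(a(m)).
        The result is itself a Service, so workflows nest."""
        return Service(f"{self.name} | {other.name}",
                       lambda m: other(self(m)))


parse = Service("parse", lambda text: [float(x) for x in text.split()])
calibrate = Service("calibrate", lambda xs: [2.0 * x for x in xs])
summarize = Service("summarize", lambda xs: sum(xs) / len(xs))

workflow = parse | calibrate | summarize
print(workflow.name)      # parse | calibrate | summarize
print(workflow("1 2 3"))  # 4.0
```

Because `workflow` is itself a `Service`, it can be reused inside a larger composition. This mirrors the composite, hierarchical view of services and workflows.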
3 Layer Programming Model
▪ Level 1: programming inside services
  • Application expressed in Java, Fortran, C++, MPI etc.
▪ Level 2: programming choosing services by virtualization
  • Application semantics (metadata, ontology): the Semantic Grid
▪ Level 3: Grid programming composing multiple services
  • Service workflow, transactions, mediation: WS-* infrastructure
▪ Substantial work in the UK e-Science program and the international semantic web community
Consequences of the Rule of the Millisecond
▪ It is useful to remember the critical time scales
  • 1) 0.000001 ms – CPU does a calculation
  • 2a) 0.001 to 0.01 ms – parallel computing MPI latency
  • 2b) 0.001 to 0.01 ms – overhead of a method call
  • 3) 1 ms – wake up a thread or process
  • 4) 10 to 1000 ms – Internet delay: workflow
▪ So use pointers and the computer memory system when latencies of ≤ 1 millisecond are needed, but use a URI looked up in a context store when longer delays are allowed
▪ Transfer data when it is read-only and long latency is allowed
▪ Always choose the slowest allowed methodology, and remember that, when in doubt, Moore’s law favors computer performance, while systems always get more complex and harder to maintain
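One way to read these rules as code is sketched below. The thresholds come from the slide. The function names, the `ctx://` URI scheme and the dictionary standing in for a context store are hypothetical, and the mapping of rules to branches is an interpretation.

```python
import copy

# Critical time scales from the slide, in milliseconds (low, high)
TIME_SCALES_MS = {
    "cpu_operation": (1e-6, 1e-6),
    "mpi_latency": (0.001, 0.01),
    "method_call": (0.001, 0.01),
    "thread_wakeup": (1.0, 1.0),
    "internet_workflow": (10.0, 1000.0),
}

# Hypothetical stand-in for a WS-Context-like context store
context_store: dict[str, object] = {}


def choose_mechanism(latency_budget_ms: float, read_only: bool) -> str:
    """Interpretation of the slide's rules:
    budget <= 1 ms      -> share memory and pass a pointer/reference
    longer, read-only   -> transfer (copy) the data itself
    longer, mutable     -> pass a URI looked up in the context store
    """
    if latency_budget_ms <= TIME_SCALES_MS["thread_wakeup"][1]:
        return "pointer"
    return "transfer" if read_only else "uri"


def hand_off(value, latency_budget_ms: float, read_only: bool, uri: str):
    mechanism = choose_mechanism(latency_budget_ms, read_only)
    if mechanism == "pointer":
        return value                 # same object, no copy
    if mechanism == "transfer":
        return copy.deepcopy(value)  # stands in for shipping the bytes
    context_store[uri] = value       # consumer dereferences the URI later
    return uri


print(choose_mechanism(0.01, read_only=False))  # pointer
print(choose_mechanism(50, read_only=True))     # transfer
print(choose_mechanism(50, read_only=False))    # uri
```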
GlobalMMCS Web Service Architecture
[Figure: SIP, H.323, AccessGrid, native XGSP and Admire clients; gateways convert to uniform XGSP messaging. NaradaBrokering carries all messaging, both high performance (RTP) and XML/SOAP, connecting media servers, filters, a session server and XGSP-based control. Multiple media servers are used to scale to many codecs and many versions of audio/video mixing; NaradaBrokering scales as distributed Web Services.]
GlobalMMCS Architecture
[Figure: Event Messaging Service (NaradaBrokering) connecting the XGSP Conference Control Service, Audio Video Web Service, Instant Messaging Web Service, Shared Display Web Service and other shared Web Services]
▪ Non-WS collaboration control protocols are “gatewayed” to XGSP
▪ NaradaBrokering supports TCP (chat, control, shared display, PowerPoint etc.) and UDP (audio-video conferencing)
XGSP Example: New Session
<CreateAppSession>
  <ConferenceID> GameRoom </ConferenceID>
  <ApplicationID> chess </ApplicationID>
  <AppSessionID> chess-0 </AppSessionID>
  <AppSession-Creator> John </AppSession-Creator>
  <Private> false </Private>
</CreateAppSession>
<SetAppRole>
  <AppSessionID> chess-0 </AppSessionID>
  <UserID> Bob </UserID>
  <RoleDescription> black </RoleDescription>
</SetAppRole>
<SetAppRole>
  <AppSessionID> chess-0 </AppSessionID>
  <UserID> Jack </UserID>
  <RoleDescription> white </RoleDescription>
</SetAppRole>
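Messages like these could be generated programmatically rather than written by hand. This sketch only builds the XML elements of the example with `xml.etree.ElementTree`. The helper names are hypothetical, and how XGSP messages are enveloped and sent over NaradaBrokering is not shown.

```python
import xml.etree.ElementTree as ET


def _message(root_tag, fields):
    """Build <root_tag> with one child element per (tag, text) pair."""
    msg = ET.Element(root_tag)
    for tag, value in fields:
        ET.SubElement(msg, tag).text = value
    return msg


def create_app_session(conference, application, session_id, creator,
                       private=False):
    return _message("CreateAppSession", [
        ("ConferenceID", conference),
        ("ApplicationID", application),
        ("AppSessionID", session_id),
        ("AppSession-Creator", creator),
        ("Private", "true" if private else "false"),
    ])


def set_app_role(session_id, user, role):
    return _message("SetAppRole", [
        ("AppSessionID", session_id),
        ("UserID", user),
        ("RoleDescription", role),
    ])


messages = [create_app_session("GameRoom", "chess", "chess-0", "John"),
            set_app_role("chess-0", "Bob", "black"),
            set_app_role("chess-0", "Jack", "white")]
for m in messages:
    print(ET.tostring(m, encoding="unicode"))
```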
XGSP AV Signaling Protocol with H.323
[Figure: signaling sequence between an H.323 terminal and the H.323 gateway. Messages include H225.Setup and H225.Connect; JoinAVSession and JoinAVSession OK; Terminal Capability Set and ACK (in both directions); OpenLogicChannel (Video) and ACK; JoinAVSession (Video), acknowledged with the video RTPLink <IP Addr, Port>; OpenLogicChannel (Audio) and ACK; JoinAVSession (Audio), acknowledged with the audio RTPLink <IP Addr, Port>; OpenLogicChannel messages carry the RTPLinks <IP Addr, Port> and a capability description]
NaradaBrokering 2003-2006
▪ Messaging infrastructure for collaboration, peer-to-peer and Grids
▪ Implements JMS and native high-performance protocols (message transit time of 1 to 2 ms per hop)
▪ Order-preserving message transport with QoS and security profiles
▪ Support for different underlying transports such as TCP, UDP, Multicast and RTP
▪ SOAP message support and WS-Eventing, WS-RM and WS-Reliability
  • WS-Notification when the specification is agreed
▪ Active replay support: pause and replay live streams
▪ Stream linkage: can permanently link multiple streams; used in annotation of real-time video streams
▪ Replicated storage support for fault tolerance and resiliency to storage failures
▪ Management: HPSearch scripting interface to streams and brokers (uses WS-Management)
▪ Broker topics and message discovery: locate appropriate …
▪ Integration with Axis2 Web Service container (?)
▪ High-performance transport supporting the SOAP Infoset
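Order-preserving transport and replay can be illustrated with sequence numbers and a reorder buffer. This is a single-process sketch, not NaradaBrokering’s actual protocol. The class names are hypothetical, and network transport, QoS and security are omitted.

```python
class OrderedTopic:
    """Broker side: stamp each event with a sequence number and keep a
    log so the stream can be paused and replayed."""

    def __init__(self) -> None:
        self.log: list[tuple[int, object]] = []

    def publish(self, payload: object) -> tuple[int, object]:
        event = (len(self.log), payload)
        self.log.append(event)
        return event

    def replay(self, from_seq: int = 0) -> list:
        """Re-read the stored stream starting at a given sequence number."""
        return [payload for seq, payload in self.log if seq >= from_seq]


class InOrderReceiver:
    """Subscriber side: buffer early arrivals, drop duplicates and release
    events strictly in sequence order."""

    def __init__(self) -> None:
        self._expected = 0
        self._pending: dict[int, object] = {}
        self.delivered: list = []

    def receive(self, event: tuple[int, object]) -> None:
        seq, payload = event
        if seq < self._expected or seq in self._pending:
            return  # duplicate of something already delivered or buffered
        self._pending[seq] = payload
        while self._expected in self._pending:
            self.delivered.append(self._pending.pop(self._expected))
            self._expected += 1


topic = OrderedTopic()
events = [topic.publish(f"frame-{i}") for i in range(4)]
rx = InOrderReceiver()
# Simulated network reordering, including one duplicate
for e in (events[2], events[0], events[3], events[0], events[1]):
    rx.receive(e)
print(rx.delivered)     # ['frame-0', 'frame-1', 'frame-2', 'frame-3']
print(topic.replay(2))  # ['frame-2', 'frame-3']
```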
Average Video Delays for One Broker – performance scales in proportion to the number of brokers
[Chart: latency (ms) versus number of receivers at 30 frames/sec, for one session and for multiple sessions]
GlobalMMCS SWT Client
[Screenshot: GlobalMMCS SWT client with chat, TV, webcam video, video mixer and GIS components, plus e-Annotation tools: archived and real-time stream lists, archived and real-time stream players, annotated stream player, e-Annotation player and e-Annotation whiteboard]
Location of Software for Grid Projects in Community Grids Laboratory
▪ http://www.naradabrokering.org provides Web service (and JMS) compliant distributed publish-subscribe messaging (software overlay network)
▪ http://www.globalmmcs.org is a service oriented (Grid) collaboration environment (audio-video conferencing)
▪ http://www.crisisgrid.org is an OGC (Open Geospatial Consortium) Geographical Information System (GIS) compliant GIS and Sensor Grid (with the POLIS Center)
▪ http://www.opengrids.org has WS-Context, Extended UDDI etc.
▪ The work is still in progress, but the core part of NaradaBrokering is quite mature
▪ All software is open source and freely available
Summary
▪ Virtualization everywhere
▪ Focus on semantics, not representation, to get performance combined with expressivity for transport and data access
▪ All this is enabled by powerful meta-data services
▪ Grids add management to a rich but potentially chaotic set of Web Services
  • Management and coherence are enabled by meta-data
▪ Can define general information architectures (ASIS, GIS, SIIS) for both applications and the system
▪ Knowledge comes from filters that span simulations, data-mining, reasoning and agents