Grids for GeoSensors,
GeoScience and GeoScientists
PTLIU Laboratory for Community Grids
Geoffrey Fox
Computer Science, Informatics, Physics
Indiana University, Bloomington IN 4740
http://grids.ucs.indiana.edu/ptliupages/presentations/earthscopemar02
gcf@indiana.edu
Trends of Importance
n
Resources
of increasing performance or functionality
•
Computers (ASCI, Earth Simulator to TeraGrid),
storage, sensors, networks, PDAs
n
Applications
of increasing sophistication
•
Size, multi-scales, multi-disciplines
n
New
algorithms
and mathematical techniques
n
Computer science
•
Compilers, Parallelism, Objects, Components
n
Grid
and
Internet
Concepts and Technologies
•
Enabling new applications
Projected Top 500 Until Year 2009
n
First, Tenth, 100th, 500th, SUM of all 500 Projected in Time
Earth Simulator from Japan
PACI 13.6 TF Linux TeraGrid
[Figure: network diagram of the four TeraGrid sites joined through the Chicago & LA DTF core switch/routers, with external links to ESnet, HSCC, MREN/Abilene, Starlight, vBNS, Calren and NTON]
NCSA: 500 nodes, 8 TF, 4 TB memory, 240 TB disk
SDSC: 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk
Caltech: 32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk
Argonne: 64 nodes, 1 TF, 0.25 TB memory, 25 TB disk
Small Devices Increasing in Importance
n
There is growing
interest in wireless
portable displays in
the
confluence of cell
phone and personal
digital assistant
markets
n
By 2005,
60 million
internet ready cell
phones sold each
year
n
65%
of all
Broadband Internet
accesses via non
desktop
appliances
The HPCC Track
n
The
1990 HPCC 10 year initiative
was largely aimed at
enabling large scale simulations for a broad range of
computational science and engineering problems
n
It was in many ways a success and we have methods and
machines that can (begin to)
tackle most 3D simulations
•
ASCI simulations particularly impressive
•
DoE still putting substantial resources into basic software
and algorithms from adaptive meshes to PDE solver
libraries
n
Machines are still increasing in performance exponentially
and should achieve
petaflops
in next 7-10 years
n
EarthScope
community needs to harness these capabilities
•
Japan’s
Earth Simulator
activity major effort with large
Some HPCC Difficulties
n
An Intellectual failure
: we never produced a better
programming model than message passing
•
HPCC coding is hard work
•
Successes of ASCI software are like “Grid FTP” – not
parallelizing compilers
n
An institutional problem
: we do not have a way to produce
complex sustainable software for a niche (1%) market like
HPCC.
•
POOMA support just disappeared one day (foundation of
first proposal GEM wrote)
•
One must adopt commodity standards and produce
“small” sustainable modules.
•
Note distributed memory becoming dominant again with
HPCC Advice to EarthScope
n
KISS: Keep it Simple and Sustainable
n
Use MPI and OpenMP if needed for performance on shared memory nodes
n
Adaptive Meshes
n
Load Balancing
n
PDE Solvers including
fast multipoles
n
Particle dynamics
n
Other areas such as datamining, visualization
and data assimilation quite advanced but still
significant research
Adaptive meshes, load balancing, PDE solvers and particle dynamics are well understood for getting high-performance parallel simulation
Use of Object Technologies I
n
The claimed commercial success in using
Object and
component technology
has not
yet
been a clear success in
HPCC
•
Object technologies
do not naturally support either
high performance or parallelism
•
C++
can be high performance but
Java (as a language)
is not uniformly so (it is improving)
•
Web Services
could change this
n
Fortran
(including Fortran90) will continue to decline in
importance and interest – the community should prefer
not to use it
•
Its use will not attract the best students
n
Not essential
to write modules in
object oriented language
Use of Object Technologies II
n
There is
emerging HPCC component architecture
allowing
production of more modern libraries (integration
Infrastructure)
•
DoE has very large
CCA
– Common Component
Architecture – effort
•
Package software (“system and applications”)
as
distributed objects
– not as traditional libraries
n
CORBA Java
and
Web Services
are
not
naturally
high
performance as
component models
•
High performance
often
not essential
for
coarse grain
objects
•
Web Services
support multiple implementations
allowing
Application Structure
n
Earth Science applications
are typically multi-scale and
multi-disciplinary
•
i.e. a given simulation is made of multiple components with
either different time/length scales and/or multiple authors
from possibly multiple fields
n
I am not aware of a systematic “Computational renormalization
group” – a methodology that links different scales together
n
However
composition of modules
is an area where (component)
technology of growing sophistication is becoming available
•
Needed commercially to integrate corporate functions
•
Easiest for large coarse grain components
•
Integration of data and simulation is one example of fine-scale
Object Size & Distributed/Parallel Simulations
n
All
interesting systems
consist of
linked entities
•
Particles, grid points, people or groups thereof
n
Linkage translates into
message passing
•
Cars on a freeway
•
Phone calls
•
Forces between particles
n
Amount of communication
tends to be proportional to
surface area of entity whereas simulation time proportional
to volume
n
So
communication/computation
is surface/volume and
decreases
in importance as
entity size increases
n
In parallel computing, communication synchronized; in
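The surface/volume argument on this slide can be sketched numerically. The sketch assumes a cubic entity with `side` grid points per edge, communication counted as the 6·side² surface points and computation as the side³ volume points; the function name and the cube geometry are illustrative assumptions, not from the talk.

```python
def comm_to_comp_ratio(side: int) -> float:
    """Communication/computation ratio for a cubic block of side**3 points.

    Assumption (illustrative only): communication is proportional to the
    surface (6 * side**2 points) and computation to the volume (side**3).
    """
    surface = 6 * side ** 2
    volume = side ** 3
    return surface / volume


if __name__ == "__main__":
    # The ratio falls as 6/side, so larger entities spend relatively
    # less effort on communication.
    for side in (10, 100, 1000):
        print(side, comm_to_comp_ratio(side))
```

Under these assumptions the ratio is 6/side, which is why coarse grain objects tolerate slower links than fine grain ones.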
Some Problem Classes
n
Classic HPCC:
synchronized objects with regular time
structure (communication overhead decreases as
problem size increases)
•
Includes PDE and interacting particle based applications
•
Give
scaling parallelism on large MPPs
n
Internet Technology and Commercial Application
Integration:
Large objects with modest communications
and without difficult time synchronization
•
Compose
as independent (pipelined)
services
•
Includes some approaches to multi-disciplinary simulation
linkage
n
Hardest:
smallish objects with irregular time
synchronization
What is a Grid or Web Service?
n
There are generic
Grid system services
: security, collaboration,
persistent storage, universal access
•
OGSA (Open Grid Service Architecture) is implementing these as
extended Web Services
n
An
Application Web Service
is a capability used either by another
service or by a user
•
It has input and output ports – data is from sensors or other
services
n
Consider
Satellite-based Sensor Operations
as a Web Service
•
Satellite management
(with a web front end)
•
Each
tracking station
is a service
•
Image Processing
is a pipeline of filters – which can be grouped
into different services
•
Data storage
is an important system service
•
Big services built hierarchically from “basic” services
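A minimal sketch of "image processing is a pipeline of filters" built hierarchically from basic services. Plain Python functions stand in for Web Services here; the filter names, the list-of-lists image type and the `compose` helper are all hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]            # hypothetical stand-in for sensor image data
Filter = Callable[[Image], Image]  # one input port, one output port


def threshold(level: int) -> Filter:
    """Basic service: zero out pixels below `level`."""
    def apply(image: Image) -> Image:
        return [[p if p >= level else 0 for p in row] for row in image]
    return apply


def invert(max_value: int = 255) -> Filter:
    """Basic service: invert pixel intensities."""
    def apply(image: Image) -> Image:
        return [[max_value - p for p in row] for row in image]
    return apply


def compose(*filters: Filter) -> Filter:
    """Bigger service: the output port of each filter feeds the next input port."""
    def pipeline(image: Image) -> Image:
        for f in filters:
            image = f(image)
        return image
    return pipeline


if __name__ == "__main__":
    process = compose(threshold(100), invert())
    print(process([[10, 200], [50, 255]]))
```

Because `compose` returns something with the same interface as a single filter, a composed pipeline can itself be grouped into a larger service.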
Sensor Web Service
Distributed Sensor Web
Service
Out Web Service port
Universal sensor access
for people/computers
In Web Service port
Application Web Services
n
Note Service model integrates sensors, sensor analysis, simulations and people
n
An
Application Web Service
is a capability used either by another service or
by a user
•
It has input and output ports – data is from users, sensors or other services
•
Big services built hierarchically from “basic” services
[Figure: Sensor Data as a Web Service (WS) linked to Data Analysis WS, Sensor Management WS, Visualization WS and Simulation WS; one stage is built as multiple Filter Web Services, alongside program (Prog) WS components]
The Application Service Model
n
As bandwidth of communication (between) services increases
one can support smaller services
n
A service “is a
component
” and is a replacement for a
library in case where performance allows
n
Services (components)
are a sustainable model of software
development – each service has documented capability with
standards compliant interfaces
•
XML
defines interfaces at several levels
•
WSDL
at Service interface level and
XSIL
or equivalent
for scientific data format
n
A service can be written as Perl, Python, Java Servlet,
Enterprise Javabean, CORBA (C++ or Fortran) Object …
n
Communication
protocol can be RMI (Java), IIOP
Services support Communities
n
Grid Communities
(Earth Science, SCEC, DoD, Earth
Science, High School Classes) are groups of
communicating individuals sharing resources
implemented as Web Services
n
Access Grid
from Argonne/NCSA is high-end
Audio/Video conferencing technology
n
Peer to Peer networking
describes a set of technologies
supporting community building with an emphasis on
less structured groups than classic “users of a
supercomputer”
n
Peer to peer Grids
combine the technologies and support
e-Science is just a pile of XML
n
Each leaf is a piece of XML either defining a nugget of
information and/or containing links to other XML or “raw
resources”
Biased History of Computing
n
In almost the beginning, there was
Fortran
and formats
(6I5, 5F10.4) for data
n
………..
n
1993-1997:
HTML
came along for Web Pages
n
1998-…:
XML
was developed to define information in
documents while HTML defining rendering
•
But soon it became used for specifying all data and their
format
n
2001:
Web Services
allowed XML to specify
methods
(subroutines) as well as data
n
Java, C++, Python, Perl, .. Fortran
are now “just” the
XML (RSS) Specification of Information Nuggets
<item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
  <title>Processing Inclusions with XSLT</title>
  <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link>
  <description>
    Processing document inclusions with general XML tools can be
    problematic. This article proposes a way of preserving inclusion
    information through SAX-based processing.
  </description>
</item>
<item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
  <title>Putting RDF to Work</title>
  <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link>
  <description>
    Tool and API support for the Resource Description Framework
    is slowly coming of age. Edd Dumbill takes a look at RDFDB,
    one of the most exciting new RDF toolkits.
  </description>
</item>
</rdf:RDF>
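A short sketch of reading such information nuggets with Python's standard library. The slide omits the opening rdf:RDF element, so the wrapper and namespace declarations below are assumed (they are the standard RSS 1.0 and RDF namespaces); the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper: the slide excerpt does not show the opening <rdf:RDF> tag.
RSS_TEXT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
    <title>Processing Inclusions with XSLT</title>
    <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link>
  </item>
  <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <title>Putting RDF to Work</title>
    <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link>
  </item>
</rdf:RDF>"""

NS = {"rss": "http://purl.org/rss/1.0/"}


def list_nuggets(text):
    """Return (title, link) for every item element in the document."""
    root = ET.fromstring(text)
    return [
        (item.findtext("rss:title", namespaces=NS).strip(),
         item.findtext("rss:link", namespaces=NS).strip())
        for item in root.findall("rss:item", NS)
    ]


if __name__ == "__main__":
    for title, link in list_nuggets(RSS_TEXT):
        print(title, "->", link)
```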
What is a Web Service I
n
A
web service is a computer program
running on either the local
or remote machine with a set of well defined interfaces (ports)
specified in XML (WSDL)
n
In principle, computer program can be in any language
(Fortran .. Java .. Perl .. Python) and the interfaces can be
implemented in any way whatsoever
•
Interfaces can be method calls, Java RMI Messages, CGI Web
invocations, totally compiled away (inlining) but
n
The simplest implementations involve
XML messages (SOAP)
and programs written in net friendly languages like Java and
Python
n
Web Services separate the
meaning of a port (message) interface
from its implementation
n
Enhances/Enables Re-usable component model of ANY
etc.
[Figure: Raw Data and Raw Resources are wrapped by Web Services (WS) behind a (virtual) XML data interface; WS-to-WS interfaces are XML; clients use a (virtual) XML knowledge (user) interface, with results rendered to an XML display format]
Classic Grid Architecture
[Figure labels: Database, Database, Netsolve, Computing, Security, Collaboration, Composition, Content Access; tiers: Resources, Middle Tier Brokers, Service Providers]
What is a Web Service II
n
Web Services have important implication that
ALL
interfaces are XML message based.
In contrast
n
Most Windows programs have interfaces defined as
interrupts due to user inputs
n
Most software has interfaces defined as methods which
might be implemented as a message but this is often
NOT explicit
[Figure labels: Security, Catalog, Payment, Credit Card, Warehouse, Shipping, WSDL]
What is a Web Service III
n
“Everything electronic” is a
resource
•
Computers; Programs; People
•
Data (from sensors to this presentation to email to
databases)
n
“Everything electronic” is a
distributed object
n
All
resources have interfaces
which are defined in
XML
for
both
properties
(data-structure) and
methods
(service,
function, subroutine) (
Resources
are
Services
)
•
We can assume that a data-structure property has
getproperty()
and
setproperty(value)
methods to act as
interface
n
All resources are linked by
messages
with structure, which
must be specifiable in XML
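A minimal sketch of a resource whose data-structure properties are reached only through getproperty/setproperty and whose state can be described in XML. The slide gives `getproperty()` and `setproperty(value)` per property; passing a property name as the first argument, the class name and the XML element names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


class Resource:
    """A resource whose properties are reachable only via get/set methods."""

    def __init__(self, name, **properties):
        self.name = name
        self._properties = dict(properties)

    def getproperty(self, key):
        return self._properties[key]

    def setproperty(self, key, value):
        self._properties[key] = value

    def to_xml(self):
        """Describe the properties as XML (element names are made up here)."""
        root = ET.Element("resource", {"name": self.name})
        for key, value in sorted(self._properties.items()):
            ET.SubElement(root, "property", {"name": key}).text = str(value)
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sensor = Resource("seismometer", rate=40)
    sensor.setproperty("rate", 100)
    print(sensor.to_xml())
```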
WSDL Abstractions
n
WSDL
abstracts a program
as an entity that does
something given one or more inputs with its results
defined by streams on one or more outputs.
n
Functions are defined by method name and
parameter
methodname(parm1,parm2, … parmN)
•
Where parameters are “Input” “Output” or both
n
In WSDL, we will have a
Web Service
which like a
(Java or CORBA Program) can be thought of as a
(distributed) object with many methods
•
Instead of a function call, the “calling routine” sends an
XML message to the Web Service specifying
methodname
and values of the parameters
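The idea that a function call becomes an XML message naming the method and its parameter values can be sketched as follows. The message layout (`call` and `parm` elements), the `StrainService` class and its `scale` method are hypothetical; real WSDL/SOAP messages are richer than this.

```python
import xml.etree.ElementTree as ET


def build_call(methodname, **params):
    """Caller side: encode the method name and parameter values as XML."""
    call = ET.Element("call", {"method": methodname})
    for name, value in params.items():
        ET.SubElement(call, "parm", {"name": name}).text = str(value)
    return ET.tostring(call, encoding="unicode")


class StrainService:
    """Hypothetical service; only the names in `exposed` are callable ports."""
    exposed = {"scale"}

    def scale(self, value, factor):
        return value * factor


def dispatch(service, message):
    """Service side: decode the message and invoke the named method."""
    call = ET.fromstring(message)
    name = call.get("method")
    if name not in service.exposed:
        raise ValueError(f"unknown method: {name!r}")
    # This sketch assumes every parameter is a floating-point number.
    kwargs = {p.get("name"): float(p.text) for p in call.findall("parm")}
    return getattr(service, name)(**kwargs)


if __name__ == "__main__":
    message = build_call("scale", value=2.0, factor=3.0)
    print(message)
    print(dispatch(StrainService(), message))
```

The caller never touches the service object directly; it only needs the message format, which is the separation of interface from implementation described above.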
Details of WSDL Protocol Stack
n
UDDI
finds where programs are
•
remote (distributed) programs
are just Web Services
n
WSFL
links programs together
(under revision?)
n
WSDL
defines interface (methods,
parameters, data formats)
n
SOAP
defines structure of message
including serialization of information
n
HTTP
is negotiation/transport
protocol
n
TCP/IP
is layers 3-4 of OSI
n
Physical Network
is layer 1 of OSI
UDDI or WSIL
WSFL
WSDL
SOAP or RMI
HTTP or SMTP
or IIOP or RMTP
TCP/IP
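As a rough illustration of the SOAP layer in this stack, the sketch below wraps an application payload in a SOAP 1.1 Envelope/Body and unwraps it again, leaving transport (HTTP, SMTP, ...) out entirely. The payload element `getStatus` and its `station` attribute are invented for the example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def wrap_in_soap(payload: ET.Element) -> str:
    """Place an application payload inside a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="unicode")


def unwrap_soap(message: str) -> ET.Element:
    """Return the first payload element found in the SOAP Body."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    return body[0]


if __name__ == "__main__":
    # Hypothetical payload: a status request for a made-up station label.
    request = ET.Element("getStatus", {"station": "hypothetical-01"})
    print(wrap_in_soap(request))
```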
Examples of Web Services I
n
OGSA (Open Grid Service Architecture)
•
Integrates Web Service and Grid Concepts and allows Globus
to be implemented as Web Services
n
Audio-Video Conferencing
as a Web Service
•
Integrates H323, SIP, JXTA (etc.) protocols by mapping to
single XML Interface
•
Provides VRVS reflector model from Messaging Web Service
n
Messaging or Event Web Service
provides intelligent routing
and buffering of messages
n
Computing
as a Web service
•
Job submittal, status, composition, data services, visualization
•
Performance WS
allows access to distributed monitoring
Examples of Web Services II
n
Education
as a Web Service
•
One of easiest to do as object standards well defined (IMS)
and few performance issues
•
Grading, Homework submission, registration, assessment etc.
n
Universal Access
and Web Services
•
As Web Services allow multiple implementation of a
particular interface, one can adjust to needs of particular
clients (PDAs, impaired sight, etc.)
•
Can build custom implementations of certain web services for
particular communities but re-use others
n
Collaborative Web Services
•
As interfaces all message based,
much easier to share Web
Education as a Web Service
n
Can link to Science as a Web Service and substitute educational
modules
n
“
Learning Object
” XML standards already exist from IMS/ADL
http://www.adlnet.org –
need to update architecture
n
Web Services for
virtual university
include:
n
Registration
n
Performance
(grading)
n
Authoring
of Curriculum
n
Online laboratories
for real and virtual instruments
n
Homework submission
n
Quizzes
of various types (multiple choice, random parameters)
n
Assessment
data access and analysis
n
Synchronous Delivery
of Curricula
n
Scheduling
of courses and mentoring sessions
Distributed Information
Actually the XML is
distributed
Structured (XML) Information
earthscope://root/one/two/bottom
[Figure: tree with nodes root, one, two, bottom]
Note XML specifies both internal an
Matching Information/Service
Providers and Consumers I
n
Classic Centralized Approach
n
Those with services
publish information
as to location –
this is percolated
up and down the tree of brokers
n
At simplest, publish location; better publish location
and meta-data allowing easier discovery of value
n
Those wanting service, look it up using either
•
Some search of information registered with brokers
•
A search using a system like Google
•
Because they were told some key
n
Like using an
encyclopedia
; very
reliable
and
fast for
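The centralized approach can be sketched as a tree of brokers. In this simplified version registrations percolate up only, and lookups climb toward the root; the slide also mentions percolating down, which is not modelled. The class, method names and the example URL are hypothetical.

```python
class Broker:
    """A node in a tree of brokers; registrations percolate up to the root."""

    def __init__(self, parent=None):
        self.parent = parent
        self.registry = {}  # service name -> (location, metadata)

    def publish(self, name, location, metadata=None):
        """Register a service here and with every ancestor broker."""
        self.registry[name] = (location, metadata or {})
        if self.parent is not None:
            self.parent.publish(name, location, metadata)

    def lookup(self, name):
        """Look up by key: try this broker, then ask the parent."""
        if name in self.registry:
            return self.registry[name][0]
        if self.parent is not None:
            return self.parent.lookup(name)
        return None

    def search(self, **wanted):
        """Search registered metadata rather than relying on a known key."""
        return sorted(
            name for name, (_, meta) in self.registry.items()
            if all(meta.get(k) == v for k, v in wanted.items())
        )


if __name__ == "__main__":
    root = Broker()
    left, right = Broker(root), Broker(root)
    left.publish("fem-solver", "http://example.org/fem", {"kind": "simulation"})
    print(right.lookup("fem-solver"), root.search(kind="simulation"))
```

Publishing metadata along with the location is what makes `search` possible; with location alone, only `lookup` by exact key works.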
Unstructured and Structured XML
earthscope://root/one/two/mess
[Figure: tree with nodes root, one, two, mess]
“mess” can be multiple levels of tree
Hoosier National Forest showin
Peer to Peer Grid
[Figure labels: Database, Database, JXTA, JXTA, Web Service Interfaces, Event/Message Brokers; caption: Integrate P2P and Grid/WS]
Matching Information/Service
Providers and Consumers II
n
Peer-to-peer Approach (or how to search the “mess”)
n
Those with services
publish XML advertisements to their
friends
; their friends
may
forward it to other friends
n
Those wanting a service, publish an XML request to a chosen set
of friends
n
Friends use their
personal idiosyncratic approach
to matching
requests with advertisements and to choosing who else should be
asked
n
Analogous to way
communities exchange information
as in a
meeting like this
n
Uncertain reliability but
scales well
(communities intra-exchange
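A toy sketch of the peer-to-peer approach: advertisements are forwarded to friends for a limited number of hops, and requests are passed from friend to friend until a match is found or the hop budget runs out. Exact keyword matching stands in for each friend's "personal idiosyncratic approach", plain Python values stand in for XML messages, and all names are hypothetical.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.friends = []
        self.adverts = {}  # keyword -> name of the peer offering the service

    def advertise(self, keyword, provider, hops=1):
        """Store an advertisement and forward it to friends for `hops` more steps."""
        if keyword in self.adverts:
            return  # already known; do not forward again
        self.adverts[keyword] = provider
        if hops > 0:
            for friend in self.friends:
                friend.advertise(keyword, provider, hops - 1)

    def request(self, keyword, hops=2, seen=None):
        """Answer from local advertisements, else ask friends within `hops`."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return None
        seen.add(self.name)
        if keyword in self.adverts:
            return self.adverts[keyword]
        if hops == 0:
            return None
        for friend in self.friends:
            found = friend.request(keyword, hops - 1, seen)
            if found is not None:
                return found
        return None


def befriend(a, b):
    a.friends.append(b)
    b.friends.append(a)


if __name__ == "__main__":
    a, b, c, d = (Peer(n) for n in "abcd")
    befriend(a, b); befriend(b, c); befriend(c, d)
    a.advertise("fault-sim", "a", hops=1)   # reaches a and b only
    print(d.request("fault-sim", hops=2))   # d asks c, c asks b
    print(d.request("fault-sim", hops=1))   # hop budget too small
```

The second request fails even though the service exists, which illustrates the "uncertain reliability" noted on the slide.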
[Figure: resources, databases and software, each wrapped in an XML skin and joined by message- or event-based interconnection]
e-Science is XML Specified Resources
connected by XML specified messages
Technology Trends and Principles
n
All performance and capability measures of infrastructure
continue to improve
n
Gilder’s law
says that network bandwidth increases 3 times
faster than CPU Performance (
Moore’s Law
)
n
The
Telecosm
eclipses the
Microcosm
(but don’t look at Wall
Street) ….
George Gilder
Telecosm : How Infinite
Bandwidth Will
Revolutionize Our World
(September 2000, Free
Press; ISBN:
Grid/P2P Use of Internet I
ROBERT B. COHEN, PH.D.
COHEN COMMUNICATIONS GROUP
Cohen’s
Rival Estimate
Mainl
Grid/P2P Use of Internet II
S2S Server to Server
Meta-Data and Web Services
n
Enriching resources with meta-data is critical idea
•
Enables one to identify and link resources around the globe
•
Allows one to find out “meaning” of a Web service not just
syntax of interface
n
Semantic Grid
implies linkage of Grid/Web services
enabled by meta-data leading to “
digital brilliance
”
phase transition
n
We can experiment with Semantic Web techniques for
specifying meta-data
RDF DAML OIL
n
These encompass both straightforward enriched data
Semantic Grid & Digital Brilliance I
n
The (XML) advertisement-request matching provides a
publish-subscribe linkage
between resources – these are
people, computers and raw/processed data
n
The richer the meta-data, the more precise the linkage
•
This is spirit of
Semantic Web
– RDF/DAML/OIL
metadata enables meaningful linkage
n
In a physics analogy,
resources
can be thought of as
spins
and the
meta-data
induced linkage as
interactions
n
Phase transitions
will occur when “enough” resources
are linked – one will get associated spins to align in the
direction of
new knowledge
Semantic Grid & Digital Brilliance II
n
This suggests ways of quantifying value of
metadata
induced linkages
and ways of identifying where one
“should” add more resource specifications
n
Note that related
resources
are
not
necessarily
directly
connected
but rather messages are forwarded through
friends
n
Study of Peer to Peer
networks teaches us that we can
build “
small worlds
” where distance between resources
is logarithmic in number of nodes
n
This physics based picture provides an interesting
underlying formalism to give a
theory of e-Science
….
•
All you need to do is to
build
a
lot
of
XML Meta-data
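The "small worlds" remark can be illustrated with a toy experiment: start from a ring where each resource knows only its two neighbours, add a few random long-range "friend" links, and compare average distances. This follows the spirit of ring-plus-shortcut small-world models, which is an assumption here (the talk names no specific model); the sketch shows distances shrinking but does not prove logarithmic scaling.

```python
import random
from collections import deque


def average_distance(neighbors):
    """Mean shortest-path length over all ordered node pairs, via BFS."""
    total = pairs = 0
    for start in neighbors:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring(n):
    """Each node is linked only to its two nearest neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_shortcuts(neighbors, count, seed=0):
    """Add `count` random long-range links (seeded for repeatability)."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    for _ in range(count):
        a, b = rng.sample(nodes, 2)
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors


if __name__ == "__main__":
    plain = average_distance(ring(200))
    small_world = average_distance(add_shortcuts(ring(200), 40))
    print(plain, small_world)
```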
Semantic Grid & Digital Brilliance III
n
EarthScope Collaboratory
consists of a set of connected “spins”
(being a physicist; resources if I was W3C)
n
Resources are anything with a digital signature
•
Raw data, Analysers, Simulators, Simulations, Processed
Information, Extracted Knowledge, Scientists ….
n
The linkage of
Earthquake Fault Simulator Web Service
to the
Greens Function Solver Web Service
is as program to
subroutine; must have
agreement
on both
syntax
and
Semantics
n
The linkage of
Granular Physics model
to (my) remark that Los
Alamos has interesting
new simulation technology
is
less precise
n
So linkages with very precise ontologies and those which are
Web (Unstructured) mode for Google
Portals and Web Services
n
Web Services
allow us to build a
component model
(see
CCA) for resources.
n
Each resource
naturally has a
user interface
(which
might be customized for user)
n
Web Service <--> Portlet
n
Natural to use a component model for portal building
displayed web page from collection of portlets
•
So can customize each portlet and customize which portlets
you want
n
Apache Jetspeed
seems good open source technology
supporting this model
•
JSP model
is better than say a client-side Java integration in
Jetspeed Computing Portal: Choose Portlets
4 available portlets
Choose Portlet Layout
Choose 1-column Layout
Online Knowledge Center for DoD HPCMO
n
Web Services
provide a
component model
for the
middleware (see large “
common component
architecture
” effort in Dept. of Energy)
n
Should match each WSDL component with a
corresponding user interface component
n
Thus one “must use” a
component model for the portal
with again an XML specification (
portalML
) of portal
EarthScope CSIT Strategy
n
Make a list of
resources
with a hierarchical
arrangement
•
People, Places, Results (Publications, meeting archives,
Simulation Output), Activities, Sensors (Instruments), Data
(raw and processed), Earth features, Computers, Software
n
Decide on component (Web Service) model and
URI
labelling (
earthscope://devices/satellites/year/label
…)
•
Respect
performance
requirements
•
Design so modules can be re-used, re-arranged and replaced
for outreach (
education
)
n
Study related CSIT architectures of other fields
•
Grid Forum, PACI, ASCI for
computing issues
•
W3C Web Consortium for
basic IT infrastructure
•
openGIS XMML for
related fields
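The hierarchical URI labelling suggested on this slide can be sketched as a small parser that turns `earthscope://` labels into a nested resource tree. The example labels (`2002`, `sat-a`, `stn-1`, `seismometers`) are placeholders invented for illustration; only the `earthscope://devices/satellites/year/label` pattern comes from the slide.

```python
def parse_resource_uri(uri, scheme="earthscope"):
    """Split a hierarchical resource URI into its path components."""
    prefix = scheme + "://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an {scheme} URI: {uri!r}")
    return [part for part in uri[len(prefix):].split("/") if part]


def insert(tree, uri):
    """Add a resource to a nested-dict tree keyed by URI components."""
    node = tree
    for part in parse_resource_uri(uri):
        node = node.setdefault(part, {})
    return tree


if __name__ == "__main__":
    tree = {}
    # Placeholder labels; a real scheme would be agreed by the community.
    insert(tree, "earthscope://devices/satellites/2002/sat-a")
    insert(tree, "earthscope://devices/seismometers/2002/stn-1")
    print(tree)
```

Keeping the hierarchy in the label itself means the list of resources and its arrangement can be rebuilt from the URIs alone.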
EarthScope HPCC Strategy
n
Decide what services are well enough understood and useful
enough to be encapsulated as
application Web Services
•
Parallel FEM Solvers
•
Visualization
•
Parallel Particle Dynamics
•
Access to Sensor Data
•
Image Processing
n
Make services as
small
as possible – smaller is simpler and more
sustainable but with higher communication needs
•
Compose large services from smaller ones
n
Design
Portals
and portal components that allow one to
manipulate services –
set parameters, compose, invoke
n
Implement chosen System Web Services (job submit,
performance, queue) on
central machines
and
local clusters
•
Make certain infrastructure supports compute, data,
middleware needs
EarthScope IT Strategy
n
Design an internal
EIF
(EarthScope Internal
Framework) defining architecture and interface
standards of internal Web Services and data structures
n
Design
EEF
(EarthScope External Framework) which
maps external raw data into sensor web services
n
Choose some appropriate (mix of)
middleware
frameworks
•
.net, IBM, BEA, Sun, Oracle
n
Look at special requirements for key
system services
•
Hardware/Data systems
(new and legacy issues)
•
Security
•
Collaboration
including Audio/Video conferencing
•
Peer-to-peer
networking
EarthScope Peer to Peer Grid Community
Gateway and Web Services
n
We will use the Gateway Computing Portal as an
example (
http://www.gatewayportal.org)
•
It is largely built using CORBA with a Java Server Pages
front end
•