Grids for GeoSensors,
GeoScience and GeoScientists
PTLIU Laboratory for Community Grids
Geoffrey Fox
Computer Science, Informatics, Physics Indiana University, Bloomington IN 4740
http://grids.ucs.indiana.edu/ptliupages/presentations/earthscopesmallmar02
Trends of Importance
n Resources of increasing performance or functionality
• Computers (ASCI, Earth Simulator to TeraGrid), storage, sensors, networks, PDA’s
n Applications of increasing sophistication
• Size, multi-scales, multi-disciplines
n New algorithms and mathematical techniques
n Computer science
• Compilers, Parallelism, Objects, Components
n Grid and Internet Concepts and Technologies
• Enabling new applications
Projected Top 500 Until Year 2009
n First, Tenth, 100th, 500th, SUM of all 500 Projected in Time
Earth Simulator from Japan
http://geofem.tokyo.rist.or.jp/
PACI 13.6 TF Linux TeraGrid
[Figure: TeraGrid site and network diagram. Labels include Chicago & LA DTF Core Switch/Routers, Cisco 65xx Catalyst Switch (256 Gb/s Crossbar), Juniper M160 and M40 routers, Myrinet Clos Spine, quad-processor McKinley servers, and links to ESnet, HSCC, MREN/Abilene, Starlight, vBNS, Calren and NTON]
NCSA: 500 Nodes, 8 TF, 4 TB Memory, 240 TB disk
SDSC: 256 Nodes, 4.1 TF, 2 TB Memory, 225 TB disk
Caltech: 32 Nodes, 0.5 TF, 0.4 TB Memory, 86 TB disk
Argonne: 64 Nodes, 1 TF, 0.25 TB Memory, 25 TB disk
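As a quick consistency check, the per-site figures quoted for NCSA, SDSC, Caltech and Argonne can be summed. The sketch below uses only those four entries; the table layout and function name are illustrative and not from the talk.

```python
# Per-site TeraGrid figures as quoted on the slide:
# (nodes, teraflops, memory in TB, disk in TB).
sites = {
    "NCSA":    (500, 8.0, 4.0, 240),
    "SDSC":    (256, 4.1, 2.0, 225),
    "Caltech": (32, 0.5, 0.4, 86),
    "Argonne": (64, 1.0, 0.25, 25),
}

def totals(site_table):
    """Sum each column across all sites (rounded to hide float noise)."""
    nodes = sum(s[0] for s in site_table.values())
    tflops = round(sum(s[1] for s in site_table.values()), 2)
    memory_tb = round(sum(s[2] for s in site_table.values()), 2)
    disk_tb = sum(s[3] for s in site_table.values())
    return nodes, tflops, memory_tb, disk_tb

nodes, tflops, memory_tb, disk_tb = totals(sites)
print(f"{nodes} nodes, {tflops} TF, {memory_tb} TB memory, {disk_tb} TB disk")
```

The teraflops column adds up to 13.6 TF, which matches the slide title.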
The HPCC Track
n The 1990 HPCC 10 year initiative was largely aimed at enabling large scale simulations for a broad range of computational science and engineering problems
n It was in many ways a success and we have methods and machines that can (begin to) tackle most 3D simulations
• ASCI simulations particularly impressive
• DoE still putting substantial resources into basic software and algorithms from adaptive meshes to PDE solver libraries
n Machines are still increasing in performance exponentially and should achieve petaflops in next 7-10 years
n EarthScope community needs to harness these capabilities
• Japan’s Earth Simulator activity major effort with large hardware and software (GEOFEM) efforts
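The "petaflops in next 7-10 years" estimate can be reproduced with a back-of-envelope calculation. The sketch below rests on two assumptions that are not stated on the slide: a starting point of roughly 36 TF (about the Earth Simulator's Linpack figure in 2002) and a performance doubling time of 18 to 24 months.

```python
import math

def years_to_reach(target_tflops, start_tflops, doubling_years):
    """Years of steady exponential growth needed to go from start to target."""
    doublings = math.log2(target_tflops / start_tflops)
    return doublings * doubling_years

# Assumption: start from roughly 36 TF, the fastest machine of 2002.
START_TFLOPS = 36.0
PETAFLOP_IN_TFLOPS = 1000.0

# Assumption: performance doubles every 18 months (fast) to 24 months (slow).
fast = years_to_reach(PETAFLOP_IN_TFLOPS, START_TFLOPS, 1.5)
slow = years_to_reach(PETAFLOP_IN_TFLOPS, START_TFLOPS, 2.0)
print(f"petaflops in about {fast:.1f} to {slow:.1f} years")
```

Under these assumptions the range falls inside the 7-10 year window quoted above; a different starting point or doubling time would shift it.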
Some HPCC Advice to EarthScope
n Important to build Sustainable modular software
n Use MPI and OpenMP if needed for performance on shared memory nodes
n Adaptive Meshes
n Load Balancing
n PDE Solvers including fast multipoles
n Particle dynamics
(The techniques above are well understood ways to get high performance parallel simulation)
n Other areas such as datamining, visualization and data assimilation quite advanced but still significant research
Use of Object Technologies
n There is emerging HPCC component architecture allowing
production of more modern libraries (integration Infrastructure)
• DoE has very large CCA – Common Component Architecture
– effort
• Package software (“system and applications”) as distributed
objects – not as traditional libraries
n CORBA, Java and Web Services are not naturally high performance as component models but OK for coarse grain objects (“full programs”)
n As a language, C++ can be high performance but Java is not uniformly so (it is improving)
• Fortran (including Fortran90) will continue to decline in importance and
interest – the community should prefer not to use it
n Not essential to write modules in object oriented language
• It is essential to package modules in object framework
What is a Web Service I
n A web service is a computer program running on either the local or remote machine with a set of well defined interfaces (ports) specified in XML (WSDL)
n In principle, the computer program can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be implemented in any way whatsoever
• Interfaces can be method calls, Java RMI Messages, CGI Web
invocations, totally compiled away (inlining) but
n The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
n Web Services separate the meaning of a port (message) interface
from its implementation so CAN get high performance in spite of voluminous XML format
What is a Web Service II
n Web Services have important implication that ALL interfaces are XML messages based. In contrast
n Web Services in some sense replace distributed object paradigms such as CORBA and Java but can wrap these other technologies as Web Services
• We wrapped our CORBA + Java Computing Portal Gateway as Web services straightforwardly
[Figure: Security, Catalog, Payment (Credit Card) and Warehouse (shipping) services, each exposed through WSDL interfaces]
[Figure: Web Services (WS) placed between Clients and Raw Data / Raw Resources. Labels: (Virtual) XML Data Interface; XML WS to WS Interfaces; (Virtual) XML Knowledge (User) Interface; (Virtual) XML Rendering Interface; Render to XML Display Format]
Classic Grid Architecture
[Figure: Clients (Users and Devices), Middle Tier (Brokers, Service Providers) and Resources. Labelled boxes include Database, Netsolve, Computing, Security, Collaboration, Composition and Content Access]
Middle Tier becomes Web Services
Examples of System Web Services I
n OGSA (Open Grid Service Architecture)
• Integrates Web Service and Grid Concepts and allows Globus to be implemented as Web Services
n Audio-Video Conferencing as a Web Service
• Integrates H323, SIP, JXTA (etc.) protocols by mapping to
single XML Interface
• Provides VRVS reflector model from Messaging Web Service
n Messaging or Event Web Service provides intelligent routing and buffering of messages
n Computing as a Web service
• Job submittal, status, composition, data services, visualization
• Performance WS allows access to distributed monitoring
EarthScope Peer to Peer Grid Community
Distributed Scientists using Collaboration Web Services to access/use Application Web Services
“Everything” (people/sensors/applications) connected by XML messages
Gateway and Web Services
n We can use the Gateway Computing Portal as an example (http://www.gatewayportal.org)
• It is largely built using CORBA with a Java Server Pages front end
n Several capabilities have been interfaced using WSDL
• Job Submission (11 Methods including execute local and remote command, copy files etc. as well as Submit Job)
• Manage WebFlow Session (67 Methods)
• Generate Batch Script (just 1 method but two implementations developed – one at SDSC and one at Indiana – with UDDI to manage)
• Each is one service – could have used finer grain services
• Sample files are a
WSDL Abstractions
n WSDL abstracts a program as an entity that does something given one or more inputs with its results defined by streams on one or more outputs.
n Functions are defined by method name and parameters: methodname(parm1,parm2, … parmN)
• Where parameters are “Input” “Output” or both
n In WSDL, we will have a Web Service which like a (Java or CORBA Program) can be thought of as a (distributed) object with many methods
• Instead of a function call, the “calling routine” sends an XML message to the Web Service specifying methodname and values of the parameters
• Note name of function is just another parameter
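The idea that the function name is just another parameter in the message can be sketched in a few lines of Python. The `<call>`/`<param>` message layout and the `JobService` class below are hypothetical stand-ins chosen for illustration; they are neither SOAP nor part of the Gateway portal.

```python
import xml.etree.ElementTree as ET

class JobService:
    """Toy stand-in for a Web Service: one object with several methods."""
    def add(self, a, b):
        return str(int(a) + int(b))

    def echo(self, text):
        return text

def handle_message(service, xml_text):
    """Read the method name and parameters from an XML message and invoke it."""
    call = ET.fromstring(xml_text)
    method = getattr(service, call.get("method"))   # the method name is just data
    params = [p.text for p in call.findall("param")]
    result = ET.Element("result")
    result.text = method(*params)
    return ET.tostring(result, encoding="unicode")

request = '<call method="add"><param>2</param><param>40</param></call>'
print(handle_message(JobService(), request))
```

A real service would check the requested name against a list of published operations rather than calling `getattr` on whatever arrives.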
WSDL Message Example
<message name="submitRequest">
  <part name="xmljob" type="xsd:string"/>
</message>
<message name="submitResponse">
  <part name="response" type="xsd:string"/>
</message>
For the batch script service, we pass the XML description of the job as a string and get back the script as a string. In general, any XML primitive or complex types can be used in messages.
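A minimal sketch of such a string-in, string-out service is given below. The job XML layout (`queue`, `walltime`, `command`) and the PBS-style output are assumptions made for illustration; the actual Gateway job schema and the SDSC and Indiana script generators are not shown in these slides.

```python
import xml.etree.ElementTree as ET

def submit(xmljob: str) -> str:
    """Mirror submitRequest/submitResponse: XML job string in, script string out."""
    job = ET.fromstring(xmljob)
    lines = [
        "#!/bin/sh",
        f"#PBS -q {job.findtext('queue')}",              # hypothetical job field
        f"#PBS -l walltime={job.findtext('walltime')}",  # hypothetical job field
        job.findtext("command"),
    ]
    return "\n".join(lines)

xmljob = (
    "<job><queue>batch</queue><walltime>00:10:00</walltime>"
    "<command>./simulate</command></job>"
)
response = submit(xmljob)
print(response)
```

Because both messages are plain strings, a second implementation for another batch system could replace `submit` without changing the interface.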
SOAP and Gateway Portal I
n Having specified service in WSDL, the run-time is implemented in SOAP which is “just” an XML header (info needed by transport – empty here) and body
n Here is SOAP transported by HTTP message
n This is execLocalCommand WSDL operation to run one particular command (ls) on current WebFlow directory
[Figure: SOAP message carried over HTTP, annotated with: HTTP Header; SOAP Envelope and body; ls specified as argument of operation]
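The shape of such a message can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the Gateway portal's actual traffic: the `command` element name, the host `example.org` and the path `/soap` are made up, and only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
ET.register_namespace("SOAP-ENV", SOAP_NS)

def build_soap_request(operation, argument, host, path):
    """Return (http_headers, payload) for a SOAP call with one string argument."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")        # empty header
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    ET.SubElement(op, "command").text = argument           # hypothetical element name
    payload = ET.tostring(envelope, encoding="unicode")
    headers = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        'Content-Type: text/xml; charset="utf-8"\r\n'
        f"Content-Length: {len(payload.encode('utf-8'))}\r\n"
        f'SOAPAction: "{operation}"\r\n\r\n'
    )
    return headers, payload

headers, payload = build_soap_request("execLocalCommand", "ls", "example.org", "/soap")
print(headers + payload)
```

The HTTP header block and the SOAP envelope stay separate, so the same envelope could be carried by a different transport.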
Examples of System Web Services II
n Education as a Web Service
• One of easiest to do as object standards well defined (IMS) and few performance issues
• Grading, Homework submission, registration, assessment etc.
n Universal Access and Web Services
• As Web Services allow multiple implementations of a particular interface, one can adjust to needs of particular clients (PDA v. versus, impaired sight etc.)
• Can build custom implementations of certain web services for particular communities but re-use others
n Collaborative Web Services
• As interfaces all message based, much easier to share Web
Education as a Web Service
n Can link to Science as a Web Service and substitute educational modules
n “Learning Object” XML standards already exist from IMS/ADL http://www.adlnet.org – need to update architecture
n Web Services for virtual university include:
• Registration
• Performance (grading)
• Authoring of Curriculum
• Online laboratories for real and virtual instruments
• Homework submission
• Quizzes of various types (multiple choice, random parameters)
• Assessment data access and analysis
• Synchronous Delivery of Curricula
• Scheduling of courses and mentoring sessions
• Asynchronous access, data-mining and knowledge discovery
• Learning Plan agents to guide students and teachers
Sensor Web Service
[Figure: Distributed Sensor Web Service. In Web Service port: different formats. Out Web Service port: universal sensor access for people/computers]
Application Web Services
n Note Service model integrates sensors, sensor analysis, simulations and people
n An Application Web Service is a capability used either by another service or by a user
• It has input and output ports – data is from users, sensors or other services
• Big services built hierarchically from “basic” services
[Figure: Sensor Data as a Web service (WS), Sensor Management WS, Data Analysis WS, Visualization WS, Simulation WS]
[Figure: chain of Filter WS boxes. Build as multiple Filter Web Services]
[Figure: chain of Prog WS boxes. Build as multiple interdisciplinary Programs: Data Analysis WS, Simulation WS, Visualization WS]
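Building a big service from "basic" filter services amounts to function composition. The sketch below shows the idea in plain Python, with ordinary functions standing in for Web Services; the three filters and their names are hypothetical.

```python
from functools import reduce

def compose(*services):
    """Build one larger service by feeding each service's output to the next."""
    return lambda data: reduce(lambda value, service: service(value), services, data)

# Hypothetical "basic" filter services operating on a list of sensor readings.
def drop_missing(readings):
    return [r for r in readings if r is not None]

def remove_offset(readings):
    mean = sum(readings) / len(readings)
    return [r - mean for r in readings]

def clip(readings, limit=2.0):
    return [max(-limit, min(limit, r)) for r in readings]

analysis_service = compose(drop_missing, remove_offset, clip)
print(analysis_service([1.0, None, 3.0, 8.0]))
```

Each filter only needs to agree with its neighbours on the message format, so filters can be re-ordered or swapped without touching the others.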
[Figure: Resources, Databases and Software, each wrapped in an XML Skin and joined by Message or Event Based Interconnection]
e-Science is XML Specified Resources connected by XML specified messages
e-Science is just a pile of XML
n Each leaf is a piece of XML either defining a nugget of
information and/or containing links to other XML or “raw resources”
Database
XML (RSS) Specification of Information Nuggets
<item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
  <title> Processing Inclusions with XSLT </title>
  <link> http://xml.com/pub/2000/08/09/xslt/xslt.html </link>
  <description>
    Processing document inclusions with general XML tools can be
    problematic. This article proposes a way of preserving inclusion
    information through SAX-based processing.
  </description>
</item>
<item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
  <title> Putting RDF to Work </title>
  <link> http://xml.com/pub/2000/08/09/rdfdb/index.html </link>
  <description>
    Tool and API support for the Resource Description Framework
    is slowly coming of age. Edd Dumbill takes a look at RDFDB,
    one of the most exciting new RDF toolkits.
  </description>
</item>
</rdf:RDF>
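Reading such nuggets back out takes only a few lines. The sketch below assumes the items sit inside an RSS 1.0 document whose opening `rdf:RDF` element declares the usual RDF and RSS 1.0 namespaces; that opening element is not shown on the slide, so it is supplied here as an assumption.

```python
import xml.etree.ElementTree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed wrapper: one item from the slide inside an RSS 1.0 root element.
feed = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{RSS}">
  <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <title> Putting RDF to Work </title>
    <link> http://xml.com/pub/2000/08/09/rdfdb/index.html </link>
  </item>
</rdf:RDF>"""

def nuggets(xml_text):
    """Return (title, link) pairs for every item in an RSS 1.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext(f"{{{RSS}}}title").strip(),
            item.findtext(f"{{{RSS}}}link").strip(),
        )
        for item in root.iter(f"{{{RSS}}}item")
    ]

print(nuggets(feed))
```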
Distributed Information
Actually the XML is distributed
all around in a dynamic Grid
Structured (XML) Information
earthscope://root/one/two/bottom
[Figure: tree with nodes root > one > two > bottom]
Note XML specifies both internal and
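One way to read a label such as earthscope://root/one/two/bottom is as a path through an XML tree. The sketch below resolves such a URI against a small in-memory document; the tree contents and the `resolve` helper are illustrative assumptions, since the talk does not define how the earthscope scheme is processed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Illustrative document whose element names follow the URI on the slide.
tree = ET.fromstring(
    "<root><one><two><bottom>nugget of information</bottom></two></one></root>"
)

def resolve(uri, root):
    """Walk the XML tree along the path named by an earthscope:// URI."""
    parts = urlsplit(uri)
    names = [parts.netloc] + [p for p in parts.path.split("/") if p]
    if parts.scheme != "earthscope" or names[0] != root.tag:
        raise LookupError(uri)
    node = root
    for name in names[1:]:
        node = node.find(name)          # direct child with this tag, or None
        if node is None:
            raise LookupError(uri)
    return node

print(resolve("earthscope://root/one/two/bottom", tree).text)
```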
Matching Information/Service
Providers and Consumers I
n Classic Centralized Approach
n Those with services publish information as to location – this is percolated up and down the tree of brokers
n At simplest, publish location; better publish location and meta-data allowing easier discovery of value
n Those wanting service, look it up using either
• Some search of information registered with brokers
• A search using a system like Google
• Because they were told some key
n Like using an encyclopedia; very reliable and fast for well established information
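The centralized approach can be sketched as a tree of registries. This is a simplified illustration: the `Broker` class, its method names and the example URLs are hypothetical, and entries only percolate up the tree here (not down), which is enough to show lookup by meta-data.

```python
class Broker:
    """A registry node; entries published here percolate up to the parent."""
    def __init__(self, parent=None):
        self.parent = parent
        self.registry = {}          # service name -> (location, metadata)

    def publish(self, name, location, **metadata):
        broker = self
        while broker is not None:   # percolate up the tree of brokers
            broker.registry[name] = (location, metadata)
            broker = broker.parent

    def lookup(self, **wanted):
        """Return locations whose metadata matches every requested key."""
        return sorted(
            location
            for location, metadata in self.registry.values()
            if all(metadata.get(k) == v for k, v in wanted.items())
        )

root = Broker()
west, east = Broker(root), Broker(root)
west.publish("fem", "http://west.example/fem", kind="solver")
east.publish("seis", "http://east.example/seis", kind="sensor")
print(root.lookup(kind="solver"))
```

Publishing only a location would still work, but the `kind` meta-data is what lets a consumer find a service without already knowing its name.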
Unstructured and Structured XML
earthscope://root/one/two/mess
[Figure: tree with nodes root > one > two > mess]
“mess” can be multiple levels of tree
[Figure: Hoosier National Forest]
Peer to Peer Grid
[Figure: Peer to Peer Grid. Databases and JXTA peers reached through Web Service Interfaces and linked by Event/Message Brokers]
Integrate P2P and Grid/WS
Matching Information/Service
Providers and Consumers II
n Peer-to-peer Approach (or how to search the “mess”)
n Those with services publish XML advertisements to their friends; their friends may forward it to other friends
n Those wanting a service, publish an XML request to a chosen set of friends
n Friends use their personal idiosyncratic approach to matching requests with advertisements and to choosing who else should be asked
n Analogous to way communities exchange information as in a meeting like this
n Uncertain reliability but scales well (communities intra-exchange information independently) and supports rapidly varying information (Web Services)
n Allows many different approaches – EarthScope imposes
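The friend-to-friend search can be sketched as a request forwarded with a hop budget. The `Peer` class, the exact-match rule and the `ttl` parameter below are illustrative assumptions; real peers would apply their own idiosyncratic matching and forwarding choices.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.friends = []
        self.adverts = set()        # services this peer has advertised

    def request(self, service, ttl=3, seen=None):
        """Ask this peer, then its friends, until the hop budget runs out."""
        seen = set() if seen is None else seen
        if self.name in seen:       # do not ask the same peer twice
            return None
        seen.add(self.name)
        if service in self.adverts:
            return self.name
        if ttl == 0:
            return None
        for friend in self.friends:
            found = friend.request(service, ttl - 1, seen)
            if found is not None:
                return found
        return None

a, b, c, d = (Peer(n) for n in "abcd")
a.friends, b.friends, c.friends = [b], [c], [d]
d.adverts.add("greens-function-solver")
print(a.request("greens-function-solver", ttl=3))   # reaches d in three hops
print(a.request("greens-function-solver", ttl=2))   # gives up before reaching d
```

The second call shows the "uncertain reliability" trade-off: the service exists, but a request with too small a hop budget never finds it.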
Grid/P2P Use of Internet I
Source: Robert B. Cohen, Ph.D., Cohen Communications Group, Global Grid Forum, Toronto, Feb 18 2002
[Figure: Cohen’s Rival Estimate, Mainly Digital Video]
Grid/P2P Use of Internet II
S2S Server to Server
Semantic Grid & Digital Brilliance I
n The (XML) advertisement-request matching provides a publish-subscribe linkage between resources – these are people, computers and raw/processed data
n The richer the meta-data, the more precise the linkage
• This is spirit of Semantic Web – RDF/DAML/OIL metadata enables meaningful linkage
n In a physics analogy, resources can be thought of as spins and the meta-data induced linkage as forces or interactions
n Phase transitions will occur when “enough” resources are linked – one will get associated spins to align in the direction of new knowledge
• Term this digital brilliance
Semantic Grid & Digital Brilliance II
n This suggests ways of quantifying value of metadata induced linkages and ways of identifying where one “should” add more resource specifications
n Note that related resources are not necessarily directly connected but rather messages are forwarded through friends
n Study of Peer to Peer networks teaches us that we can build “small worlds” where distance between resources is logarithmic in number of nodes
n This physics based picture provides an interesting underlying formalism to give a theory of e-Science ….
• All you need to do is to build a lot of XML Meta-data
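The claim that a few well-chosen links make distance logarithmic in the number of nodes can be checked numerically. The sketch below is a deterministic stand-in, not a random small-world model: it compares a plain ring with a ring where each node also links 2, 4, 8, ... positions ahead (a structured-overlay style layout chosen here purely for illustration).

```python
from collections import deque

def max_hops(n, links_of):
    """Longest shortest-path (in hops) from node 0, found by breadth-first search."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for nxt in links_of(node, n):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())

def ring(node, n):
    """Each node knows only its two neighbours."""
    return [(node - 1) % n, (node + 1) % n]

def ring_with_shortcuts(node, n):
    """Next neighbour plus links 2, 4, 8, ... ahead, and the previous neighbour."""
    jumps, step = [], 1
    while step < n:
        jumps.append((node + step) % n)
        step *= 2
    return jumps + [(node - 1) % n]

for n in (64, 1024):
    print(n, max_hops(n, ring), max_hops(n, ring_with_shortcuts))
```

On the plain ring the worst case grows as n/2, while with the power-of-two links it is bounded by log2(n), at the cost of each node keeping about log2(n) links.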
Semantic Grid & Digital Brilliance III
n EarthScope Collaboratory consists of a set of connected “spins” (being a physicist; resources if I was W3C)
n Resources are anything with a digital signature
• Raw data, Analysers, Simulators, Simulations, Processed
Information, Extracted Knowledge, Scientists ….
n The linkage of Earthquake Fault Simulator Web Service to the Green’s Function Solver Web Service is as program to subroutine; must have agreement on both syntax and semantics
n The linkage of Granular Physics model to (my) remark that Los Alamos has interesting new simulation technology is less precise
n So linkages with very precise ontologies and those which are more qualitative are both part of Semantic Grid
Portals and Web Services
n Web Services allow us to build a component model (see CCA) for resources.
n Each resource naturally has a user interface (which might be customized for user)
n Web Service <--> Portlet
n Natural to use a component model for portal building displayed web page from collection of portlets
• So can customize each portlet and customize which portlets you want
n Apache Jetspeed seems good open source technology supporting this model
• JSP model is better than say a client-side Java integration in
Jetspeed Computing Portal: Choose Portlets
[Screenshot: 4 available portlets linking to Web Services; I choose two]
Choose Portlet Layout
[Screenshot: Choose 1-column Layout; Two Computing Portlets]
EarthScope CSIT Strategy
n Make a list of resources with a hierarchical arrangement
• People, Places, Results (Publications, meeting archives, Simulation Output), Activities, Sensors (Instruments), Data (raw and processed), Earth features, Computers, Software
n Decide on component (Web Service) model and URI labeling (earthscope://devices/satellites/year/label …)
• Respect performance requirements
• Design so modules can be re-used, re-arranged and replaced for outreach (education)
n Study related CSIT architectures of other fields
• Grid Forum, PACI, ASCI for computing issues
• W3C Web Consortium for basic IT infrastructure
• openGIS XMML for related fields
EarthScope HPCC Strategy
n Decide what services are well enough understood and useful enough to be encapsulated as application Web Services
• Parallel FEM Solvers
• Visualization
• Parallel Particle Dynamics
• Access to Sensor Data
• Image Processing
n Make services as small as possible – smaller is simpler and more sustainable but with higher communication needs
• Compose large services from smaller ones
n Design Portals and portal components that allow one to manipulate services – set parameters, compose, invoke
n Install chosen System Web Services (job submit, performance, queue) on central machines and local clusters
• Make certain infrastructure supports compute, data,
middleware needs
• Set necessary hardware/software meta-data
EarthScope IT Strategy
n Design an internal EIF (EarthScope Internal Framework)
defining architecture and interface standards of internal Web Services and data structures
n Design EEF (EarthScope External Framework) which maps external raw data into sensor web services
n Support diverse set of explorations as many new approaches to Earth Science enabled by EarthScope
n Choose some appropriate (mix of) middleware frameworks
• .net, IBM, BEA, Sun, Oracle
n Look at special requirements for key system services
• Hardware/Data systems (new and legacy issues)
• Security
• Collaboration including Audio/Video conferencing
• Peer-to-peer networking