Algorithms and the Grid
Geoffrey Fo
Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401
March 18 2005
Trends in Simulation Research
n 1990-2000 the HPCC High Performance Computing and
Communication Initiative
• Established Parallel Computing
• Developed wonderful algorithms – especially in partial
differential equation and particle dynamics areas
• Almost no useful software except for MPI – messaging
between parallel computer nodes
n 1995-now Internet explosion and development of Web Service
distributed system model
• Replaces CORBA, Java RMI, HLA, COM etc.
n 2000- now: almost no USA academic work in core simulation • Major projects like ASCI (DoE) and HPCMO (DoD) thrive n 2003-? Data Deluge apparent and Grid links Internet and HPCC
e-Business e-Science and the Grid
n
e-Business
captures an emerging view of corporations as
dynamic
virtual organizations
linking employees, customers
and stakeholders across the world.
n
e-Science
is the similar vision for scientific research with
international participation in large accelerators, satellites or
distributed gene analyses.
n
The
Grid
or
CyberInfrastructure
integrates the best of the
Web, Agents, traditional enterprise software, high
performance computing and Peer-to-peer systems to provide
the information technology
e-infrastructure
for
e-moreorlessanything
.
n
A
deluge of data
of unprecedented and inevitable size must
be managed and understood.
n
People
,
computers
,
data
and
instruments
must be linked.
Some Important Styles of Grids
n Computational Grids were origin of concepts and link
computers across the globe – high latency stops this from being used as parallel machine
n Knowledge and Information Grids link sensors and information
repositories as in Virtual Observatories or BioInformatics
• More detail on next slide
n Collaborative Grids link multidisciplinary researchers across
laboratories and universities
n Community Grids focus on Grids involving large numbers of
peers rather than focusing on linking major resources – links Grid and Peer-to-peer network concepts
n Semantic Grid links Grid, and AI community with Semantic web
(ontology/meta-data enriched resources) and Agent concepts
n Grid Service Farms supply services-on-demand as in
Information/Knowledge Grids
n
Distributed
(10’s to 1000’s) of
data sources
(instruments,
file systems, curated databases …)
n
Data Deluge
: 1 (now) to 100’s
petabyte
s/year (2012)
• Moore’s law for Sensors
n
Possible
filters
assigned dynamically (
on-demand
)
•
Run image processing algorithm on telescope image
•Run Gene sequencing algorithm on compiled data
n
Needs
decision support
front end with “what-if”
simulations
n
Metadata
(
provenance
)
critical to annotate data
n
Integrate
across experiment
Virtual Observatory Astronomy Gri
Integrate Experiments
Radio Far-Infrared Visible
Visible + X-ray
Dust Map
Galaxy Density
e-Business and (Virtual) Organizations
n Enterprise Grid supports information system for an
organization; includes “university computer center”, “(digital) library”, sales, marketing, manufacturing …
n Outsourcing Grid links different parts of an enterprise together Manufacturing plants with designers
• Animators with electronic game or film designers and
producers
• Coaches with aspiring players (e-NCAA or e-NFL etc.) • Outsourcing will become easier ……..
n Customer Grid links businesses and their customers as in many web sites such as amazon.com
n e-Multimedia can use secure peer-to-peer Grids to link
creators, distributors and consumers of digital music, games and films respecting rights
In flight data Airline Maintenance Centre Ground Station Global Network Such as SITA
Internet, e-mail, pager
Engine Health (Data) Center
DAME
Rolls Royce and UK e-Science Progra
Distributed Aircraft Maintenance
Environment
~ Gigabyte per aircraft per Engine per transatlantic
flight
~5000 engines
e-Defense and e-Crisis
n Grids support Command and Control and provide Global
Situational Awareness
• Link commanders and frontline troops to themselves and to archival and
real-time data; link to what-if simulations
• Dynamic heterogeneous wired and wireless networks • Security and fault tolerance essential
n System of Systems; Grid of Grids
• The command and information infrastructure of each ship is a Grid; each
fleet is linked together by a Grid; the President is informed by and informs the national defense Grid
• Grids must be heterogeneous and federated
n Crisis Management and Response enabled by a Grid linking
sensors, disaster managers, and first responders with decision support
n Define and Build DoD relevant Services – Collaboration,
Large Scale Parallel Computers
Old Style Metacomputing Grid
Analysis and Visualization
Classes of Computing Grid Applications
n
Running “
Pleasing Parallel Jobs
” as in United Devices,
Entropia (Desktop Grid) “cycle stealing systems”
n
Can be managed (“inside” the
enterprise
as in Condor)
or more informal (as in SETI@Home)
n
Computing-on-demand
in Industry where jobs spawned
are perhaps very large (SAP, Oracle …)
n
Support
distributed file systems
as in Legion (Avaki),
Globus with (web-enhanced) UNIX programming
paradigm
• Particle Physics will run some 30,000 simultaneous jobs this
way
n
Pipelined
applications linking data/instruments,
compute, visualization
n
Seamless Access
where Grid portals allow one to choose
What is Happening?
n Grid ideas are being developed in (at least) two communities • Web Service – W3C, OASIS
• Grid Forum (High Performance Computing, e-Science) • Open Middleware Infrastructure Institute OMII currently
only in UK but maybe spreads to EU and USA
n Service Standards are being debated
n Grid Operational Infrastructure is being deployed n Grid Architecture and core software being developed
n Particular System Services are being developed “centrally” –
OGSA framework for this in
n Lots of fields are setting domain specific standards and building
domain specific services
n Grids are viewed differently in different areas
• Largely “computing-on-demand” in industry (IBM, Oracle,
HP, Sun)
A typical Web Service
n In principle, services can be in any language (Fortran .. Java ..
Perl .. Python) and the interfaces can be method calls, Java RMI Messages, CGI Web invocations, totally compiled away (inlining)
n The simplest implementations involve XML messages (SOAP) and
programs written in net friendly languages like Java and Python Paymen Credit Card Warehous e Shipping control WSDL interfaces WSDL interfaces Securit
y Catalog
Porta Service
Services and Distributed Objects
n A web service is a computer program running on either the local
or remote machine with a set of well defined interfaces (ports) specified in XML (WSDL)
n Web Services (WS) have many similarities with Distributed
Object (DO) technology but there are some (important) technical and religious points (not easy to distinguish)
• CORBA Java COM are typical DO technologies
• Agents are typically SOA (Service Oriented Architecture)
n Both involve distributed entities but Web Services are more
loosely coupled
• WS interact with messages; DO with RPC (Remote Procedure Call) • DO have “factories”; WS manage instances internally and
interaction-specific state not exposed and hence need not be managed
• DO have explicit state (statefull services); WS use context in the messages to
link interactions (statefull interactions)
n Claim: DO’s do NOT scale; WS build on experience (with
Grid impact on Algorithms I
n
Your favorite parallel algorithm will often run
untouched
on a Grid node linked to other simulations
using traditional algorithms
n
Algorithms
tolerant of high latency
n
Algorithms for
new applications
enabled by the Grid
nData assimilation
for data-deluged science generalizing
data mining
• Where and how to process data
• Incorporation of data in simulation
n
Complex Systems algorithms for non traditional
simulations as in biology, social systems
Grid impact on Algorithms II
n MPI software model not suited for Grid; use SOAP andpublish/subscribe
• Microseconds and milliseconds Latency
n Grid workflow needs “integration algorithms”
• Multidisciplinary algorithms for loose code coupling • Workflow scheduling algorithms (data oriented)
• Data caching algorithms
n Algorithms like distributed hash tables for distributed storage
and look up of data
n Algorithms for Grid security
• Efficient support of group keys for multicast • Detection of Denial of Service attacks
n Much better software available for building toolkits and
Data Deluged Science
n In the past, we worried about data in the form of parallel I/O or
MPI-IO, but we didn’t consider it as an enabler of new algorithms and new ways of computing
n Data assimilation was not central to HPCC
n DoE ASCI set up because didn’t want test data!
n Now particle physics will get 100 petabytes from CERN • Nuclear physics (Jefferson Lab) in same situation
• Use around 30,000 CPU’s simultaneously 24X7 n Weather, climate, solid earth (EarthScope)
n Bioinformatics curated databases (Biocomplexity only 1000’s of
data points at present)
Data
Information
Ideas Simulation
Model
Assimilation
Reasoning
Datamining
Computationa Science
Informatics
Data Deluge
Scienc
a
Topography 1 km
Stress Change
Earthquakes
PBO
Site-specific Irregular
Scalar Measurements Constellations for Plate Boundary-Scale Vector Measurements
a
a
Ice Sheets Volcanoes
Long Valley, CA
Northridge, CA
Hector Mine, CA Greenland
Database Database
Researc Simulation
s Analysis and
Visualizatio Repositorie Federated Databases Data Filte Services
Field Trip Data
SERVOGrid Requirements
n
Seamless Access
to Data repositories and large scale
computers
n
Integration
of
multiple data sources
including sensors,
databases, file systems with analysis system
• Including filtered OGSA-DAI (Grid database access)
n
Rich meta-data
generation and access with
SERVOGrid specific Schema
extending openGIS
(Geography as a Web service) standards and using
Semantic Grid
n
Portals
with component model for user interfaces and
web control of all capabilities
HPC Simulation Data Filter Data Filter Filter Data Filt er Data Filter Distributed Filters massage data For simulation Other Gri
Data Assimilation
n Data assimilation implies one is solving some optimization problem which might have Kalman Filter like structur
n Due to data deluge, one will become more and more dominated
by the data (Nobs much larger than number of simulation
points).
n Natural approach is to form for each local (position, time)
patch the “important” data combinations so that optimization doesn’t waste time on large error or insensitive data.
n Data reduction done in natural distributed fashion NOT on HPC machine as distributed computing most cost effective if calculations essentially independent
Distributed Filtering
HPC Machine Distribute
Machine
Data Filter
Nobslocal patch 1
Nfilteredlocal patch 1
Nobslocal patch 2
Nfilteredlocal patch 2 Geographicall
y
Distribute
Sensor patches
N
obslocal patch>> N
filteredlocal patch≈
Number_of_Unknowns
local patchSend needed Filter Receive filtered data In simplest approach, filtered data gotten by linear transformations on original data based on Singular Value Decomposition of Least
squares matrix
Factorize Matri to product of
Non Traditional Applications: Critical
Infrastructure Simulations
n
These include
electrical/gas/water
grids and
Internet
,
transportation
, cell/wired
phone
dynamics.
n
One has some “classic
SPICE
style” network
simulations in area like power grid (although load and
infrastructure data incomplete)
• 6000 to 17000 generators
• 50000 to 140000 transmission lines • 40000 to 100000 substations
n
Need algorithms both fo
Non Traditional Applications: Critical
Infrastructure Simulations
n
Activity data
for people/institutions essential for
detailed dynamics but again these are not “classic” data
but need to be “fitted” in
data assimilation
style in
terms of some assumed lower level model.
• They tell you goals of people but not their low level movement
n
Disease
and
Internet virus
spread and
social network
simulations can be built on dynamics coming from
infrastructure simulations
• Many results like “small world” internet connection structure
are qualitative and unclear if they can be extended to detailed simulations
(Non) Traditional Structure
n 1) Traditional: Known equations plus boundary values
n 2) Data assimilation: somewhat uncertain initial conditions and
approximations corrected by data assimilation
n 3) Data deluged Science: Phenomenological degrees of freedom
swimming in a sea of data
Known Data
Predictio n
Known Equations on
Agreed DoF
Phenomenologica Degrees of Freedom Swimming in a Sea of
Some Questions for Non Traditional
Applications
n No systematic study of how best to represent data deluged
sciences without known equations
n Obviously data assimilation very relevant
n Role of Cellular Automata (CA) and refinements like the New
Kind of Science by Wolfram
• Can CA or Potts model parameterize any system?
n Relationship to back propagation and other neural network
representations
n Relationship to “just” interpolating data and then extrapolating
a little
n Role of Uncertainty Analysis – everything (equations, model,
data) is uncertain!
n Relationship of data mining and simulation
n A new trade-off: How to split funds between sensors and
When is a High Performance Computer?
n We might wish to consider three classes of multi-node computers n 1) Classic MPP with microsecond latency and scalable internodebandwidth (tcomm/tcalc ~ 10 or so)
n 2) Classic Cluster which can vary from configurations like 1) to 3)
but typically have millisecond latency and modest bandwidth
n 3) Classic Grid or distributed systems of computers around the
network
• Latencies of inter-node communication – 100’s of milliseconds
but can have good bandwidth
n All have same peak CPU performance but synchronization costs
increase as one goes from 1) to 3)
n Cost of system (dollars per gigaflop) decreases by factors of 2 at
each step from 1) to 2) to 3)
n One should NOT use classic MPP if class 2) or 3) suffices unless
some security or data issues dominates over cost-performance
n One should not use a Grid as a true parallel computer – it can
Building PSE’s with th
Rule of the Millisecond I
n Typical Web Services are used in situations with interaction
delays (network transit) of 100’s of milliseconds
n But basic message-based interaction architecture only incurs fraction of a millisecond delay
n Thus use Web Services to build ALL PSE components
• Use messages and NOT method/subroutine call or RPC
Interaction
Nugget
1 Nugget2
Nugget
Building PSE’s with th
Rule of the Millisecond II
n Messaging has several advantages over scripting languages
• Collaboration trivial by sharing messages
• Software Engineering due to greater modularity
• Web Services do/will have wonderful support
n “Loose” Application coupling uses workflow technologies
n Find characteristic interaction time (millisecond programs; microseconds
MPI and particle) and use best supported architecture at this level
• Two levels: Web Service (Grid) and C/C++/C#/Fortran/Java/Python
n Major difficulty in frameworks is NOT building them but rather in supporting them
• IMHO only hope is to always minimize life-cycle support risks
• Simulation/science is too small a field to support much!
n Expect to use DIFFERENT technologies at each level even though possible to do everything with one technology
• Trade off support versus performance/customization
Requirements for MPI Messaging
n MPI and SOAP Messaging both send data from a source to a
destination
• MPI supports multicast (broadcast) communication;
• MPI specifies destination and a context (in comm parameter) • MPI specifies data to send
• MPI has a tag to allow flexibility in processing in source processor • MPI has calls to understand context (number of processors etc.) n MPI requires very low latency and high bandwidth so that
tcomm/tcalc is at most 10
• BlueGene/L has bandwidth between 0.25 and 3
Gigabytes/sec/node and latency of about 5 microseconds
Latency determined so Message Size/Bandwidth > Latency
tcomm
Requirements for SOAP Messaging
n Web Services has much of the same requirements as MPI withtwo differences where MPI more stringent than SOAP
• Latencies are inevitably 1 (local) to 100 milliseconds which is
200 to 20,000 times that of BlueGene/L
n 1) 0.000001 ms – CPU does a calculation n 2) 0.001 to 0.01 ms – MPI latency
n 3) 1 to 10 ms – wake-up a thread or process n 4) 10 to 1000 ms – Internet delay
• Bandwidths for many business applications are low as one
just needs to send enough information for ATM and Bank to define transactions
n SOAP has MUCH greater flexibility in areas like security,
fault-tolerance, “virtualizing addressing” because one can run a lot of software in 100 milliseconds
• Typically takes 1-3 milliseconds to gobble up a modest
Structure of SOAP
n
SOAP defines a very obvious message structure with a
header
and a
body
just like email
n
The
header
contains information used by the “
Internet
operating system
”
• Destination, Source, Routing, Context, Sequence Number …
n
The
message body
is partly further information used by
the operating system and partly information for
application
when it is not looked at by “operating
system” except to encrypt, compress it etc.
• Note WS-Security supports separate encryption for different
parts of a document
n
Much discussion in field revolves around
what is
referenced in header
MPI and SOAP Integration
n
Note SOAP Specifies format and through WSDL
interfaces
n
MPI only specifies interface and so interoperability
between different MPIs requires additional work
• IMPI http://impi.nist.gov/IMPI/
n
Per
vasive networks can support high bandwidth
(Terabits/sec soon) but latency issue is not resolvable in
general way
n
Can combine MPI interfaces with SOAP messaging but
I don’t think this has been done
n
Just as walking, cars, planes, phones coexist with
NaradaBrokering
n
http://www.naradabrokering.org
n
W
e have built a messaging system that is designed to
support traditional Web Services but has an
architecture that allows it to support high performance
data transport as required for Scientific applications
• We suggest using this system whenever your application can
tolerate 1-10 millisecond latency in linking components
• Use MPI when you need much lower latency
n
Use SOAP approach when MPI interfaces required but
latency high
• As in linking two parallel applications at remote sites
n
Technically it forms an overlay network supporting in
Pentium-3, 1GHz, 256 MB RAM
100 Mbps LAN
JRE 1.3 Linux
hop-3 0 1 2 3 4 5 6 7 8 9 100 1000 Transit Delay (Milliseconds)
Message Payload Size (Bytes)
Mean transit delay for message samples in NaradaBrokering: Different communication hops
Fast Web Service Communication I
• Internet Messaging systems allow one to optimize message
streams at the cost of “startup time”,
• Web Services can deliver the fastest possible
interconnections
with or without reliable messaging
• Typical results from Grossman (UIC) comparing Slow
SOAP over TCP with binary and UDP transport (latter
gains a factor of
1000
)
Pure SOAP SOAP over UDP Binary over UDP
Fast Web Service Communication II
• Mechanism only works for
streams
– sets of related
messages
• SOAP header in streams is constant
except for
sequence number (Message ID), time-stamp ..
• One needs two types of new Web Service
Specification
• “
WS-StreamNegotiation
” to define how one can use
WS-Policy to send messages at start of a stream to
define the methodology for treating remaining
messages in stream
• “
WS-FlexibleRepresentation
” to define new
Fast Web Service Communication III
• Then use “WS-StreamNegotiation” to negotiate stream in TortoiseSOAP – ASCII XML over HTTP and TCP –
–
Deposit basic SOAP header through connection – it is
part of context for stream (linking of 2 services)
–
Agree on firewall penetration, reliability mechanism,
binary representation and fast transport protocol
–
Naturally transport UDP plus WS-RM
• Use “WS-FlexibleRepresentation” to define encoding of a Fast
transport (On a different port) with messages just having
“FlexibleRepresentationContextToken”, Sequence Number, Time stamp if needed
–
RTP packets have essentially this structure
–
Could add stream termination status
• Can monitor and control with original negotiation stream
• Can generate different streams optimized for different end-points
Role of Workflow
n
Programming SOAP and Web Services (the Grid)
:
Workflow describes linkage between services
n
As distributed,
linkage must be by messages
n
Linkage is two-way and has both control and data
nApply to disciplinary, scale linkage,
multi-program linkage, link
visualization to simulation
, GIS to
simulations and visualization filters to each other
n
Microsoft-IBM specification
BPEL
is current preferred
Web Service XML specification of workflow
Service-1 Service-3
Example workflow
Here a sensor feeds a
data-mining application
(We are extending
data-mining in DoD applications
with Grossman from UIC)
The data-mining
application drives a
visualization
SERVOGrid Codes, Relationships
Elastic Dislocation
Pattern Recognizers
Fault Model BEM
Viscoelastic Layered BEM
Viscoelastic FEM Elastic Dislocation Inversion
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
two-level Programming Model
• We make a
Service
(same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module – Data streaming from a sensor or Satellite – Specialized (JDBC) database access
• Such
services
accept and produce data from users files and
database
• The Grid is built by coordinating such services assuming
we have solved problem of programming the service
Servic
Two-level Programming II
n
The Grid is discussing the composition of distributed
services
with the runtime
interfaces to Grid as
opposed to UNIX
pipes/data streams
n
Familiar from use of UNIX Shell, PERL or Python
scripts to produce real applications from core programs
n
Such interpretative environments are the single
processor analog of
Grid Programming
n
Some projects like GrADS from Rice University are
looking at integration between service and composition
levels but dominant effort looks at each level separately
Service
1 Service2
Service
3 Layer Programming Model
Application (level 1 Programming)
Application Semantics (Metadata, Ontology) Level 2 “Programming”
Basic Web Service Infrastructure
Web Service 1
Workflow (level 3) Programming BPEL
WS 2 WS 3 WS 4
MPI Fortran C++ etc.
Conclusions
n
Grids
are inevitable and
pervasive
n
Can expect Web
Services
and
Grids
to merge with a
common set of general principles but different
implementations with different scaling and
functionality trade-offs
n
We will be
flooded
with
data
, information and
purported knowledge
n
Develop
algorithms
that exploit and support the
data
deluge
n