Grids Challenged by a Web 2.0 and Multicore Sandwich
CCGrid 2007
Windsor Barra Hotel, Rio de Janeiro, Brazil
May 15, 2007
Geoffrey Fox
Computer Science, Informatics, Physics; Pervasive Technology Laboratories; Indiana University, Bloomington IN 47401
Abstract
• Grids provide managed support for distributed Internet-scale services, and although this is clearly a broadly important capability, adoption of Grids has been slower than perhaps expected.
• Two important trends are Web 2.0 and Multicore, which have tremendous momentum and large natural communities, and which both overlap in important ways with Grids.
• Web 2.0 has services but does not require some of the strict protocols needed by Grids or even Web Services. Web 2.0 offers new approaches to composition and portals with a rich set of user-oriented services.
• Multicore anticipates 100s of cores per chip and the ability and need to "build a Grid on a chip". This will use functional parallelism that is likely to derive its technologies from parallel computing and not the Grid realm.
• We discuss a Grid future that embraces Web 2.0 and multicore, and suggest how Grids might need to change.
• Virtual Machines (Virtualization) are another important development.
e-moreorlessanything is an Application
• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ (from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology)
• e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
• Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• This generalizes to e-moreorlessanything
• A deluge of data of unprecedented and inevitable size must be managed and understood.
• People (see Web 2.0), computers, data and instruments must be linked.
• On demand assignment of experts, computers, networks and
Role of Cyberinfrastructure
• Supports distributed science: data, people, computers
• Exploits Internet technology (Web 2.0), adding (via Grid technology) management, security, supercomputers etc.
• It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
• Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
• Distributed aspect integrates already distinct components
• Cyberinfrastructure is in general a distributed collection of parallel systems
• Cyberinfrastructure is made of services (often Web services) that are "just" programs or data sources packaged for
Not so controversial Ideas
• Distributed software systems are being "revolutionized" by developments from e-commerce, e-Science and the consumer Internet. There is rapid progress in technology families termed "Web services", "Grids" and "Web 2.0"
• The emerging distributed system picture is of distributed services with advertised interfaces but opaque implementations, communicating by streams of messages over a variety of protocols
  • Complete systems are built by combining either services or predefined/pre-existing collections of services together to achieve new capabilities
• Note messaging (MPI and some thread systems) is interesting in parallel computing to support either "safe concurrency without side effects" or distributed memory
• We can use the term Grids strictly (Narrow or even more strictly OGSA Grids) or just call any collection of services a "Broad Grid", which is actually quite often done; in this talk Grid
Web 2.0 and Web Services I
• Web Services have clearly defined protocols (SOAP) and a well defined mechanism (WSDL) to define service interfaces
  • There is good .NET and Java support
  • The so-called WS-* specifications provide a rich, sophisticated but complicated standard set of capabilities for security, fault tolerance, metadata, discovery, notification etc.
• "Narrow Grids" build on Web Services and provide a robust managed environment with growing adoption in Enterprise systems and distributed science (so-called e-Science)
• Web 2.0 supports a similar architecture to Web Services but has developed in a more chaotic but remarkably successful fashion, with a service architecture using a variety of protocols including those of Web and Grid services
  • Over 400 interfaces defined at http://www.programmableweb.com/apis
• Web 2.0 also has many well known capabilities, with Google Maps and Amazon Compute/Storage services of clear general relevance
• There are also Web 2.0 services supporting novel collaboration
Web 2.0 and Web Services II
• I once thought Web Services were inevitable, but this is no longer clear to me
• Web Services are complicated, slow and non-functional
  • WS-Security is unnecessarily slow and pedantic (canonicalization of XML)
  • WS-RM (Reliable Messaging) seems to have poor adoption and doesn't work well in collaboration
  • WSDM (distributed management) specifies a lot
• There are de facto standards like Google Maps, and powerful suppliers like Google which "define the rules"
• One can easily combine SOAP (Web Service) based services/systems with HTTP messages, but the "lowest common denominator" suggests additional
Applications, Infrastructure, Technologies
• The discussion is confused by inconsistent use of terminology; this is what I mean:
• Multicore, Narrow and Broad Grids, and Web 2.0 (Enterprise 2.0) are technologies
• These technologies combine and compete to build infrastructures termed e-infrastructure or Cyberinfrastructure
  • Although multicore can and will support "standalone" clients, probably the most important client and server applications of the future will be internet enhanced/enabled, so the key aspect of multicore is its role and integration in e-infrastructure
• e-moreorlessanything is an emerging application area of broad importance that is hosted on these infrastructures (e-infrastructure)
Attack of the Killer Multicores
• Today commodity Intel systems are sold with 8 cores spread over two processors
• Specialized chips such as GPUs and the IBM Cell processor have substantially more cores
• Moore's Law implies, and will be satisfied by, an exponentially increasing number of cores, doubling every 1.5-3 years
  • Modest increase in clock speed
  • Intel has already prototyped an 80 core server chip, ready in 2011?
• Huge activity in parallel computing programming (recycled from the past?)
  • Some programming models and application styles similar to Grids
• We will have a Grid on a chip ……….
PC07Intro gcf@indiana.edu 10
IBM Cell Processor
• This supports pipelined (through 8 cores) or data parallel operations distributed on 8 SPEs
• Applications running well on Cell or AMD GPU should run scalably on future mainline multicore chips
• Focus on memory bandwidth key (dataflow not
Grids meet Multicore Systems
• The expected rapid growth in the number of cores per chip has important implications for Grids
• With 16-128 cores on a single commodity system 5 years from now, one will both be able to build a Grid-like application on a chip and indeed must build such an application to get the Moore's Law performance increase
  • Otherwise you will "waste" cores …..
• One will not want to reprogram as you move your application from a 64 node cluster or transcontinental implementation to a single chip Grid
• However multicore chips have a very different architecture from Grids
  • Shared not Distributed Memory
  • Latencies measured in microseconds not milliseconds
• Thus Grid and multicore technologies will need to "converge", and the converged technology model will have different requirements from current Grid assumptions
Grid versus Multicore Applications
• It seems likely that future multicore applications will involve a loosely coupled mix of multiple modules that fall into three classes:
  • Data access/query/store
  • Analysis and/or simulation
  • User visualization and interaction
• This is precisely the mix that Grids support, but Grids of course involve distributed modules
• Grids and Web 2.0 use service oriented architectures to describe systems at the module level; is this an appropriate model for multicore programming?
• Where do multicore systems get their data from?
Pradeep K. Dubey, pradeep.dubey@intel.com 13
[Figure: Intel's RMS (Recognition, Mining, Synthesis) taxonomy. Today: model-less, real-time streaming and transactions on static, structured datasets; very limited realism. Tomorrow: model-based multimodal recognition ("What is …?"), mining that finds model instances ("Is it …?") with real-time analytics on dynamic, unstructured, multimodal datasets, and synthesis that creates model instances ("What if …?") with photo-realism and physics-based animation.]
RMS example: What is a tumor? (Recognition) Is there a tumor here? (Mining) What if the tumor progresses? (Synthesis) It is all about dealing efficiently with complex multimodal datasets.
Role of Data in Grid/Multicore I
• One typically is told to place compute (analysis) at the data, but most of the computing power is in multicore clients on the edge
• These multicore clients can get data from the internet, i.e. distributed sources
  • This could be personal interests of the client, used by the client to help the user interact with the world
  • It could be cached or copied
  • It could be a standalone calculation or part of a distributed coordinated computation (SETI@Home)
• Or they could get data from a set of local sensors (video-cams and environmental sensors), naturally stored on the client or locally to the client
Role of Data in Grid/Multicore II
• Note that as you increase the sophistication of data analysis, you increase the ratio of compute to I/O
  • A typical modern datamining approach like the Support Vector Machine is sophisticated (dense) matrix algebra and not just text matching
  • http://grids.ucs.indiana.edu/ptliupages/presentations/PC2007/PC07BYOPA.ppt
• The time complexity of sophisticated data analysis will make it more attractive to fetch data from the Internet and cache/store it on the client
  • It will also help with memory bandwidth problems in multicore chips
• In this vision, the Grid "just" acts as a source of data and the Grid application runs locally
Three styles of Multicore “Jobs”
• Totally independent or nearly so (B, C, E, F); this used to be called embarrassingly parallel and is now pleasingly so
  – This is the preserve of the job scheduling community, and one gets efficiency by statistical mechanisms with (fair) assignment of jobs to cores
  – "Parameter Searches" generate this class, but these are often not the optimal way to search for "best parameters"
  – "Multiple users" of a server is an important class of this type
  – No significant synchronization and/or communication latency constraints
• Loosely coupled (D) is a "Metaproblem" with several components orchestrated with pipeline, dataflow or not very tight constraints
  – This is the preserve of Grid workflow or mashups
  – Synchronization and/or communication latencies in the millisecond to second or more range
• Tightly coupled (A) is the classic parallel computing program with components synchronizing often and with tight timing constraints
  – Synchronization and/or communication latencies around a microsecond
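The "totally independent" class maps directly onto simple job scheduling. A minimal sketch using Python's multiprocessing pool; the objective function and parameter grid are invented for illustration:

```python
from multiprocessing import Pool

def objective(params):
    """Hypothetical objective: score one parameter setting, independently of all others."""
    a, b = params
    return (a - 3) ** 2 + (b + 1) ** 2  # lower is better

def parameter_search(grid, workers=4):
    """Pleasingly parallel: each setting is an independent job, so a pool can
    assign jobs to cores with no synchronization or communication between them."""
    with Pool(workers) as pool:
        scores = pool.map(objective, grid)
    return min(zip(scores, grid))  # (best score, best parameters)

if __name__ == "__main__":
    grid = [(a, b) for a in range(6) for b in range(-3, 3)]
    print(parameter_search(grid))  # → (0, (3, -1))
```

As the slide notes, a grid scan like this is statistically fair to schedule but often not the smartest way to find the best parameters.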
Multicore Programming Paradigms
• At a very high level, there are three broad classes of parallelism
• Coarse grain functional parallelism, typified by workflow and often used to build composite "metaproblems" whose parts are also parallel
  – This area has several good solutions getting better
  – Pleasingly parallel applications can be considered special cases of functional parallelism
• Large scale loosely synchronous data parallelism, where dynamic irregular work has clear synchronization points, as in most large scale scientific and engineering problems
• Fine grain thread parallelism, as used in search algorithms, which are often data parallel (over choices) but don't have universal synchronization points
Programming Models
• So the fine grain thread parallelism and large scale loosely synchronous data parallelism styles are distinctive to parallel computing, while
• Coarse grain functional parallelism of multicore overlaps with workflows from Grids and mashups from Web 2.0
• It seems plausible that a more uniform approach will evolve for the coarse grain case, although this is the least constrained of the programming styles, as typically latency issues are not critical
  – Multicore would have the strongest performance constraints
  – Web 2.0 and Multicore the most important usability constraints
• A possible model for broad use of multicores is that the difficult parallel algorithms are coded as libraries (fine grain thread parallelism and large scale loosely synchronous data parallelism
Google MapReduce
Simplified Data Processing on Large Clusters
• http://labs.google.com/papers/mapreduce.html
• This is a dataflow model between services where services can do useful document oriented data parallel applications including reductions
• The decomposition of services onto cluster engines is automated
• The large I/O requirements of datasets changes efficiency analysis in favor of dataflow
• Services (count words in example) can obviously be extended to general parallel applications
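The word-count example from the paper can be sketched in pure Python. This illustrates only the programming model (map, shuffle, reduce), not Google's automated decomposition of services onto cluster engines:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the runtime does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the grid meets the web", "the web wins"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
print(counts)  # → {'the': 3, 'grid': 1, 'meets': 1, 'web': 2, 'wins': 1}
```

The reduction here is a sum, but as the slide says, the same skeleton extends to more general data parallel operations.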
Old and New (Web 2.0) Community Tools
• e-mail and list-serves are the oldest and best used
• Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P collaboration: text, audio-video conferencing, files
• del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage shared bookmarks
• MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks
  • http://en.wikipedia.org/wiki/List_of_social_networking_websites
• Writely, Wikis and Blogs are powerful specialized shared document systems
• ConferenceXP and WebEx share general applications
• Google Scholar tells you who has cited your papers, while publisher sites tell you about co-authors
  • Windows Live Academic Search has similar goals
• Note sharing resources creates (implicit) communities
“Best Web 2.0 Sites” -- 2006
• Extracted from http://web2.wsj2.com/
• Social Networking
• Start Pages
• Social Bookmarking
• Peer Production News
• Social Media Sharing
• Online Storage
Web 2.0 Systems are Portals, Services, Resources
• Captures the incredible development of interactive
Mashups v Workflow?
• Mashup tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
• Workflow tools are reviewed by Gannon and Fox: http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
• Both include scripting in PHP, Python, sh etc., as both implement distributed programming at the level of services
• Mashups use all types of service interfaces and do not have the potential robustness (security) of the Grid service approach
• Typically "pure"
Grid Workflow Datamining in Earth Science
• Work with Scripps Institute
• Grid services controlled by workflow process real time data from ~70 GPS sensors in Southern California
[Figure: workflow pipeline: NASA GPS earthquake data → streaming data support → transformations → data checking → Hidden Markov datamining (JPL) → display (GIS).]
Web 2.0 uses all types of Services
• Here a Gadget Mashup uses a 3 service workflow with Web 2.0 APIs
• http://www.programmableweb.com/apis has (May 14, 2007) 431 Web 2.0 APIs, with GoogleMaps the most often used in Mashups
• This site acts as a "UDDI"
• The list of Web 2.0 APIs: each site has its API and its features, divided into broad categories
• Only a few are used a lot (42 APIs used in more than 10 mashups)
• RSS feed of new APIs
• Amazon S3 growing
APIs/Mashups per Protocol Distribution
[Chart: distribution over REST, SOAP, XML-RPC, combinations (REST+XML-RPC; XML-RPC+REST+SOAP; REST+SOAP), JS, and Other.]
• 4 more mashups each day, for a total of 1906 on April 17, 2007 (4.0 a day over the last month)
• Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)
• Some mashup types: aggregators, search aggregators, visualizers, mobile, maps, games
Mash Planet Web 2.0 Architecture http://www.imagin33
Browser + Google Map API
• Map servers: Cass County Map Server (OGC Web Map Server), Hamilton County Map Server (AutoDesk), Marion County Map Server (ESRI ArcIMS)
• The browser client fetches image tiles for the bounding box using the Google Map API.
• The Tile Server requests map tiles at all zoom levels with all layers. These are converted to a uniform projection, indexed, and stored. Overlapping images are combined. Adapters must be provided for each Map Server type.
• The Cache Server fulfills Google Map calls with cached tiles at the requested bounding box that fill the bounding box.
• Google Maps Server
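The Tile Server's indexing "at all zoom levels" rests on a fixed tiling of the projected map. A sketch of the standard Web Mercator tile indexing (the scheme used by Google Maps style tile servers); the function name and the Indianapolis example are mine:

```python
import math

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) index of the 256-pixel Web Mercator tile
    containing the given point at the given zoom level."""
    n = 2 ** zoom  # number of tiles per side at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y

# Indianapolis (Marion County) at zoom level 10
print(latlon_to_tile(39.77, -86.16, 10))
```

Because every server's output is re-projected into this one tiling, a single cache can serve tiles that were originally rendered by OGC, AutoDesk or ESRI back ends.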
A “Grid” Workflow
GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature) Servers from different vendors. Grids federate different data
repositories (cf Astronomy VO federating different observatory collections)
Now to Portals
Grid-style portal as used in Earthquake Grid
Portlets v. Google Gadgets
• Portals for Grid systems are built using portlets, with software like GridSphere integrating these on the server side into a single web page
• Google (at least) offers the Google sidebar and Google home page, which support Web 2.0 services and do not use a server side aggregator
• Google is more user friendly!
• The many Web 2.0 competitions are an interesting model for promoting development in the world-wide distributed collection of Web 2.0 developers
• I guess the Web 2.0 model will win!
Typical Google Gadget Structure
… Lots of HTML and JavaScript </Content> </Module>
• Portlets build user interfaces by combining fragments in a standalone Java server
• Google Gadgets build user interfaces by combining fragments with JavaScript on the client
• Google Gadgets are an example of Start Page technology
Web 2.0 v Narrow Grid I
• Web 2.0 allows people to nurture the Internet Cloud, and such people got Time's Person of the Year award
• Whereas Narrow Grids support Internet scale distributed services with a similar architecture
• Maybe Narrow Grids focus on the (number of) services (there aren't many scientists) and Web 2.0 focuses on the number of people
• Both agree on service oriented architectures but have different emphases
• Narrow Grids have a strong emphasis on standards and structure; Web 2.0 lets a thousand flowers (protocols) and a million developers bloom and focuses on functionality, broad usability and simplicity
  • Semantic Web/Grid has structure to allow reasoning
  • Annotation in sites like del.icio.us and uploading to
Web 2.0 v Narrow Grid II
• Web 2.0 has a set of major services like GoogleMaps or Flickr, but the world is composing mashups that make new composite services
  • End-point standards are set by end-point owners
  • Many different protocols covering a variety of de-facto standards
• Narrow Grids have a set of major software systems like Condor and Globus, and a different world is extending them with custom services and linking them with workflow
• Popular Web 2.0 technologies are PHP, JavaScript, JSON, AJAX and REST, with "Start Page" (e.g. Google Gadgets) interfaces
• Popular Narrow Grid technologies are Apache Axis, BPEL, WSDL and SOAP, with portlet interfaces
• Robustness of Grids demanded by the Enterprise?
• Not so clear that Web 2.0 won't eventually dominate other application areas, and with Enterprise 2.0 it's invading Grids
Implication for Grid Technology of Multicore and Web 2.0 I
• Web 2.0 and Grids are addressing a similar application class, although Web 2.0 has focused on user interactions
  • So the technology has similar requirements
• Multicore differs significantly from Grids in component location, and this seems particularly significant for data
  • Not clear therefore how similar the applications will be
  • The Intel RMS multicore application class is pretty similar to Grids
• Multicore has more stringent software requirements than Grids, as the latter has intrinsic network overhead
Implication for Grid Technology of Multicore and Web 2.0 II
• Multicore chips require low overhead protocols to exploit low latency, which suggests simplicity
  • We need to simplify MPI AND Grids!
• Web 2.0 chooses simplicity (REST rather than SOAP) to lower the barrier to everyone participating
• Web 2.0 and Multicore tend to use traditional (possibly visual) (scripting) languages for the equivalent of workflow, whereas Grids use a visual interface backend recorded in BPEL
  • Google MapReduce illustrates a popular Web 2.0 and Multicore approach to dataflow
Implication for Grid Technology of Multicore and Web 2.0 III
• Web 2.0 and Grids both use SOA (Service Oriented Architectures)
  • It seems likely that Multicore will also adopt it, although a more conventional object oriented approach is also possible
  • Services should help multicore applications integrate modules from different sources
  • Multicore will use fine grain objects but coarse grain services
• "System of Systems": Grids, Web 2.0 and Multicore are likely to build systems hierarchically out of smaller systems
  • We need to support Grids of Grids, Webs of Grids, Grids of Multicores etc., i.e. systems of systems of all sorts
Implication for Grid Technology of Multicore and Web 2.0 IV
• Portals are likely to feature both Web and "desktop client" technology, although it is possible that the Web approach will be adopted more or less uniformly
• Web 2.0 has a very active portal activity which has a similar architecture to Grids
  • A page has multiple user interface fragments
• Web 2.0 user interface integration is typically client side, using Gadgets, AJAX and JavaScript, while
  • Grids are in a special JSR 168 portal, server side, using Portlets, WSRP and Java
• Multicore doesn't put special constraints on portal technology, but it could tend to favor non-browser clients or client side Web browser integrated portals
The “Momentum” Effects
• Web 2.0 has momentum as it is driven by the success of social web sites, and its user friendly protocols attract many developers of mashups
• Grids' momentum is driven by the success of eScience and the commercial web service thrusts largely aimed at the Enterprise
  • Enterprise software area not quite as dominant as in the past
  • Grid technical requirements are a bit soft and could be compromised if sandwiched by Web 2.0 and Multicore
  • Will commercial interest in Web Services survive?
• Multicore is driven by the expectation that all servers and clients will have many cores
  • Multicore latency requirements imply it cannot compromise in some technology choices
• Simplicity, supporting many developers and stringent multicore
The Ten areas covered by the 60 core WS-* Specifications
1: Core Service Model -- XML, WSDL, SOAP
2: Service Internet -- WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MOTM
3: Notification -- WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions -- BPEL, WS-Choreography, WS-Coordination
5: Security -- WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery -- UDDI, WS-Discovery
7: System Metadata and State -- WSRF, WS-MetadataExchange, WS-Context
8: Management -- WSDM, WS-Management, WS-Transfer
9: Policy and Agreements -- WS-Policy, WS-Agreement
10: Portals and User Interfaces -- WSRP (Remote Portlets)
WS-* Areas and Web 2.0
1: Core Service Model -- XML becomes optional but still useful; SOAP becomes JSON, RSS, ATOM; WSDL becomes REST with API as GET, PUT etc.; Axis becomes XmlHttpRequest
2: Service Internet -- No special QoS. Use JMS or equivalent?
3: Notification -- Hard with HTTP without polling; JMS perhaps?
4: Workflow and Transactions (no Transactions in Web 2.0) -- Mashups, Google MapReduce; scripting with PHP, JavaScript ….
5: Security -- SSL, HTTP Authentication/Authorization; OpenID is Web 2.0 Single Sign-on
6: Service Discovery -- http://www.programmableweb.com
7: System Metadata and State -- Processed by application; no system state; Microformats are a universal metadata approach
8: Management == Interaction -- WS-Transfer style protocols: GET, PUT etc.
9: Policy and Agreements -- Service dependent; processed by application
10: Portals and User Interfaces -- Start Pages, AJAX and Widgets (Netvibes), Gadgets
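The "SOAP becomes JSON" shift in the Core Service Model row is easy to make concrete. Below, the same notional payload as a SOAP envelope and as the JSON body a Web 2.0 style REST service would return; both messages are invented for illustration:

```python
import json

# What a SOAP response carrying one field might look like on the wire
soap = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <GetTemperatureResponse>
      <Celsius>21.5</Celsius>
    </GetTemperatureResponse>
  </soap:Body>
</soap:Envelope>"""

# The same information as the JSON body of a REST GET response
rest = json.dumps({"celsius": 21.5})

# One json.loads call replaces an XML parser plus schema machinery
payload = json.loads(rest)
print(payload["celsius"], len(soap), len(rest))
```

The JSON message is a fraction of the size and parses in one library call, which is much of why Web 2.0 clients (XmlHttpRequest in the browser) prefer it.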
WS-* Areas and Multicore
1: Core Service Model -- Fine grain Java, C#, C++ objects and coarse grain services as in DSS. Information passed explicitly or by handles. MPI needs to be updated to handle non-scientific applications, as in CCR
2: Service Internet -- Not so important intrachip
3: Notification -- Publish-Subscribe for events and interrupts
4: Workflow and Transactions -- Many approaches; scripting languages popular
5: Security -- Not so important intrachip
6: Service Discovery -- Use libraries
7: System Metadata and State -- Environment variables
8: Management == Interaction -- Interaction between objects is a key issue in parallel programming, trading off efficiency versus performance
9: Policy and Agreements -- Handled by application
10: Portals and User Interfaces -- Web 2.0 technology popular
CCR as an example of a Cross Paradigm Run Time
• Naturally supports fine grain thread switching with message passing, with around 4 microsecond latency for 4 threads switching to 4 others on an AMD PC with C#. Threads spawned; no rendezvous
• Has around 50 microsecond latency for coarse grain service interactions with the DSS extension, which supports Web 2.0 style messaging
• MPI Collectives (Shift and Exchange) vary from 10 to 20 microsecond latency in rendezvous mode
• Not as good as the best MPIs, but managed code, and supports Grids, Web 2.0 and parallel computing ……
• See
Microsoft CCR
• Supports exchange of messages between threads using named ports
• FromHandler: Spawn threads without reading ports
• Receive: Each handler reads one item from a single port
• MultipleItemReceive: Each handler reads a prescribed number of items of a given type from a given port. Note items in a port can be general structures, but all must have the same type.
• MultiplePortReceive: Each handler reads one item of a given type from multiple ports.
• JoinedReceive: Each handler reads one item from each of two ports. The items can be of different types.
• Choice: Execute a choice of two or more port-handler pairings
• Interleave: Consists of a set of arbiters (port-handler pairs) of 3 types: Concurrent, Exclusive or Teardown (called at the end for clean up). Concurrent arbiters are run concurrently but exclusive handlers are
[Plot: Overhead (latency) in microseconds versus stages (millions) on an AMD 4-core PC with 4 execution threads, for MPI style rendezvous messaging: Shift, and Exchange implemented either as two shifts or as a custom CCR pattern. Compute time is 10 seconds divided by the number of stages.]
[Plot: Overhead (latency) in microseconds versus stages (millions) on an INTEL 8-core PC with 8 execution threads, for MPI style rendezvous messaging: Shift, and Exchange implemented either as two shifts or as a custom CCR pattern. Compute time is 15 seconds divided by the number of stages.]
Timing of HP Opteron Multicore as a function of the number of simultaneous two-way service messages processed (November 2006 DSS Release)
• CGL measurements of Axis 2 show about 500 microseconds; DSS is 10 times better
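The style of measurement behind these latency numbers can be reproduced crudely with two Python threads exchanging messages through queues. The absolute figures will be far worse than CCR's C# results because of the interpreter and OS-level blocking, but the methodology (total elapsed time divided by message count) is the same:

```python
import queue
import threading
import time

def echo(inbox, outbox, n):
    """Echo thread: bounce n messages straight back to the sender."""
    for _ in range(n):
        outbox.put(inbox.get())

def pingpong_latency(n=10000):
    """Measure the mean one-way message latency over n round trips."""
    ping, pong = queue.Queue(), queue.Queue()
    t = threading.Thread(target=echo, args=(ping, pong, n))
    t.start()
    start = time.perf_counter()
    for i in range(n):
        ping.put(i)   # send
        pong.get()    # wait for the echo (rendezvous)
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed / (2 * n)  # seconds per one-way message

print(f"{pingpong_latency() * 1e6:.1f} microseconds per message")
```

Dividing out a known compute time per stage, as the plots above do, isolates the messaging overhead from the useful work.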