e-Business, e-Science and the Grid
Geoffrey Fox
Professor of Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
Chief Technologist for Anabas Corporation
Grid Computing: Making The Global Infrastructure a Reality
■ Based on work done in preparing the book edited with …
e-Business e-Science and the Grid
■ e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
• The growing use of outsourcing is one example.
■ e-Science is the similar vision for scientific research, with international participation in large accelerators, satellites or distributed gene analyses.
■ The Grid integrates the best of the Web, traditional enterprise software, high performance computing and peer-to-peer systems to provide the information technology infrastructure for e-moreorlessanything.
■ A deluge of data of unprecedented and inevitable size must be managed and understood.
■ People, computers, data and instruments must be linked.
■ On-demand assignment of experts, computers, networks and …
So what is a Grid?
■ Supporting human decision making with a network of at least four large computers, perhaps six or eight small computers, and a great assortment of disc files and magnetic tape units – not to mention remote consoles and teletype stations – all churning away. (Licklider 1960)
■ Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations
■ Infrastructure that will provide us with the ability to dynamically link together resources as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications.
■ Realizing the thirty-year dream of science fiction writers who have spun yarns featuring worldwide networks of …
e-Science
■ e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it. This is a major UK Program.
■ e-Science reflects the growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teams.
■ CyberInfrastructure is the analogous US initiative.
Grid Technology supports e-Science and …
Global Terabit Research Network
■ The Grid software and resources run on top of high …
Resources-on-demand
■ Computing-on-demand uses a dynamically assigned (shared) pool of resources to support excess demand in a flexible, cost-effective fashion.
[Figure: static assignment with redundancy, with Programs A to Z on Program Computers 1 to 52 plus spares, compared with a shared pool of Computers 1 to N, N < 52]
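The slide contrasts two approaches. In static assignment, each program holds dedicated computers plus redundant spares. In a dynamically assigned shared pool, computers are handed out only while demand lasts. Below is a minimal Python sketch of the pool idea. The `SharedPool` class and its `acquire`/`release` methods are hypothetical. Reading the figure as 26 programs (A to Z) with two computers each is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPool:
    """Hypothetical shared pool of interchangeable computers (illustrative only)."""
    size: int
    allocations: dict[str, int] = field(default_factory=dict)

    def free(self) -> int:
        # Computers not currently assigned to any program.
        return self.size - sum(self.allocations.values())

    def acquire(self, program: str, count: int) -> bool:
        # Grant the request only if enough computers are idle right now.
        if count > self.free():
            return False
        self.allocations[program] = self.allocations.get(program, 0) + count
        return True

    def release(self, program: str) -> None:
        # Return all of a program's computers to the pool when its burst of demand ends.
        self.allocations.pop(program, None)


def static_computers(programs: int, per_program: int) -> int:
    # Static assignment: every program keeps its own dedicated (redundant) machines.
    return programs * per_program


# Assumed reading of the figure: 26 programs with 2 dedicated computers each.
print(static_computers(26, 2))

pool = SharedPool(size=10)
print(pool.acquire("A", 4), pool.acquire("B", 4), pool.acquire("C", 4))
pool.release("A")
print(pool.acquire("C", 4), pool.free())
```

The sketch grants requests first-come, first-served and refuses them when the pool is exhausted. A production scheduler would more likely queue, prioritize or preempt.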
e-Business and (Virtual) Organizations
■ Enterprise Grid supports the information system for an organization; includes “university computer center”, “(digital) library”, sales, marketing, manufacturing …
■ Outsourcing Grid links different parts of an enterprise together (Gridsourcing)
• Manufacturing plants with designers
• Animators with electronic game or film designers and producers
• Coaches with aspiring players (e-NCAA or e-NFL etc.)
■ Customer Grid links businesses and their customers, as in many web sites such as amazon.com
■ e-Multimedia can use secure peer-to-peer Grids to link creators, distributors and consumers of digital music, games and films, respecting rights
■ Distance education Grid links a teacher at one place and students all …
e-Defense and e-Crisis
■ Grids support Command and Control and provide Global Situational Awareness
• Link commanders and frontline troops to each other and to archival and real-time data; link to what-if simulations
• Dynamic heterogeneous wired and wireless networks
• Security and fault tolerance essential
■ System of Systems; Grid of Grids
• The command and information infrastructure of each ship is a Grid; each fleet is linked together by a Grid; the President is informed by and informs the national defense Grid
• Grids must be heterogeneous and federated
■ Crisis Management and Response enabled by a Grid
Some Important Classes of Grids
■ Computational Grids were the origin of the concepts and link computers across the globe; high latency stops them from being used as a parallel machine
■ Knowledge and Information Grids link sensors and information repositories, as in Virtual Observatories or BioInformatics
• More detail on next slide
■ Education Grids link teachers, learners and parents as a virtual organization (VO) with learning tools, distant lectures etc.
■ e-Science Grids link multidisciplinary researchers across laboratories and universities
■ Community Grids focus on Grids involving large numbers of peers rather than on linking major resources; they link Grid and peer-to-peer network concepts
■ Semantic Grid links the Grid and AI communities with the Semantic Web
Information/Knowledge Grids
■ Distributed (10’s to 1000’s of) data sources (instruments, file systems, curated databases …)
■ Data Deluge: 1 (now) to 100’s of petabytes/year (2012)
• Moore’s law for sensors
■ Possible filters assigned dynamically (on-demand)
• Run an image processing algorithm on a telescope image
• Run a gene sequencing algorithm on compiled data
■ Needs a decision support front end with “what-if” simulations
■ Metadata (provenance) critical to annotate data
■ Integrate across experiments, as in multi-wavelength astronomy
[Figure: Information/Knowledge Grid components: databases, repositories, federated databases, sensor nets with streaming data, loosely coupled filters, closely coupled compute nodes, analysis and visualization]
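The dynamically assigned filters and the provenance metadata fit together: filters are chosen per request, and each applied step is recorded as metadata on the result. This is a sketch only. It assumes a hypothetical `Dataset` record and toy numeric filters standing in for image processing or gene sequencing. The sensor label is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

Filter = Callable[[list[float]], list[float]]


@dataclass
class Dataset:
    values: list[float]
    # Provenance metadata: where the data came from and which filters touched it.
    provenance: list[str] = field(default_factory=list)


def apply_filters(data: Dataset, filters: dict[str, Filter], chosen: list[str]) -> Dataset:
    """Run the filters chosen at request time, appending each step to the provenance."""
    result = Dataset(list(data.values), list(data.provenance))  # leave the input untouched
    for name in chosen:
        result.values = filters[name](result.values)
        result.provenance.append(name)
    return result


# Hypothetical filters standing in for real analysis steps.
FILTERS: dict[str, Filter] = {
    "remove_negative": lambda xs: [x for x in xs if x >= 0],
    "scale_by_2": lambda xs: [2 * x for x in xs],
}

raw = Dataset([1.0, -3.0, 2.5], provenance=["sensor:unit-7"])  # hypothetical source label
out = apply_filters(raw, FILTERS, ["remove_negative", "scale_by_2"])
print(out.values, out.provenance)
```

Copying the input before filtering keeps the raw data and its provenance intact, so the same source can be reprocessed with a different filter choice.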
SERVOGrid – Solid Earth Research Virtual …
DAME: Distributed Aircraft Maintenance Environment (Rolls Royce and UK e-Science Program)
[Figure: DAME components: in-flight data, airline, ground station, global network such as SITA, Internet/e-mail/pager, Engine Health (Data) Center, maintenance centre]
~1 gigabyte per aircraft per engine per transatlantic flight
NASA Aerospace Engineering Grid
Virtual Observatory Astronomy Grid
[Figure: integrating experiments: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map]
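To integrate radio, infrared, visible and X-ray observations, the same object must be matched across catalogs. The usual approach matches sources by sky position. Below is a brute-force positional cross-match sketch. The catalog layout (name, RA, Dec in degrees), the source names and the match radius are all hypothetical. Real Virtual Observatory services use spatial indexing rather than comparing every pair.

```python
import math


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


def cross_match(catalog_a, catalog_b, radius_arcsec: float):
    """Pair each source in A with its nearest source in B lying within the radius."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in catalog_a:
        best = None  # (name, separation) of the closest candidate so far
        for name_b, ra_b, dec_b in catalog_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0]))
    return matches


# Hypothetical catalogs from two wavelengths: (name, RA deg, Dec deg).
radio = [("r1", 10.0, 20.0), ("r2", 50.0, -5.0)]
xray = [("x1", 10.0001, 20.0), ("x2", 120.0, 0.0)]
print(cross_match(radio, xray, radius_arcsec=1.0))
```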
e-Chemistry Laboratory
[Figure: experiments-on-demand using Grid resources]
SERVOGrid Requirements
■ Seamless access to data repositories and large-scale computers
■ Integration of multiple data sources, including sensors, databases and file systems, with the analysis system
• Including filtered OGSA-DAI (Grid database access)
■ Rich metadata generation and access with a SERVOGrid-specific schema extending OpenGIS (Geography as a Web service) standards and using the Semantic Grid
■ Portals with a component model for user interfaces and web control of all capabilities
Sources of Grid Technology
■ Grids support distributed collaboratories or virtual organizations, integrating concepts from:
• The Web
• Agents
• Distributed Objects (CORBA, Java/Jini, COM)
• Globus, Legion, Condor, NetSolve, Ninf and other high performance computing activities
• Peer-to-peer networks
■ With perhaps the Web and P2P networks being the most important for “Information Grids” and Globus for …
The Essence of Grid Technology?
■ We will start from the Web view and assert that the basic paradigm is
■ Metadata-rich Web Services communicating via messages
■ These have some basic support from a runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)
• These are the distributed equivalent of operating system functions, as in the UNIX shell
• Called the Hosting Environment or platform
A typical Web Service
■ In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining)
■ The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
[Figure: a typical Web Service: Portal Service, Security, Catalog, Payment (Credit Card), Warehouse and Shipping control services, linked through WSDL interfaces]
Services and Distributed Objects
■ A web service is a computer program running on either the local or a remote machine with a set of well-defined interfaces (ports) specified in XML (WSDL)
■ Web Services (WS) have many similarities with Distributed Object (DO) technology, but there are some (important) technical and religious points of difference (not easy to distinguish)
• CORBA, Java and COM are typical DO technologies
• Agents are typically SOA (Service Oriented Architecture)
■ Both involve distributed entities, but Web Services are more loosely coupled
• WS interact with messages; DO with RPC (Remote Procedure Call)
• DO have “factories”; WS manage instances internally, and interaction-specific state is not exposed and hence need not be managed
• DO have explicit state (stateful services); WS use context in the messages to link interactions (stateful interactions)
■ Claim: DO’s do NOT scale; WS build on experience (with …
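A toy counter makes the stateful-service versus stateful-interaction distinction concrete. In this sketch, `CounterObject` stands in for a distributed object whose state lives on the server. Its constructor plays the role of a factory. `counter_service` is a hypothetical web-service-style handler that keeps no per-client instance. Instead, it returns context that the client sends back with its next message. Messages here are plain Python dicts rather than SOAP, and both names are illustrative assumptions.

```python
import uuid


class CounterObject:
    """Distributed-object style: each client gets an instance holding explicit state."""

    def __init__(self) -> None:
        self.count = 0  # state lives inside this (notionally remote) object

    def increment(self) -> int:
        self.count += 1
        return self.count


def counter_service(message: dict) -> dict:
    """Web-service style: no per-client instance; state travels as context in messages."""
    context = message.get("context", {"conversation": str(uuid.uuid4()), "count": 0})
    new_context = {**context, "count": context["count"] + 1}
    # The reply carries the updated context; the client must echo it next time.
    return {"result": new_context["count"], "context": new_context}


obj = CounterObject()  # the "factory" step
obj.increment()
print(obj.increment())

first = counter_service({})
second = counter_service({"context": first["context"]})
print(second["result"])
```

Because the service side holds nothing between calls, any replica can answer the second message. This is one reason the loosely coupled style is argued to scale better.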
Details of Web Service Protocol Stack
■ UDDI finds where programs are
• remote (distributed) programs are just Web Services
• (not a great success)
■ WSFL links programs together (under revision as BPEL4WS)
■ WSDL defines the interface (methods, parameters, data formats)
■ SOAP defines the structure of a message, including serialization of information
■ HTTP is the negotiation/transport protocol
■ TCP/IP is layers 3-4 of OSI
■ Physical Network is layer 1 of OSI
[Figure: protocol stack, top to bottom: UDDI or WSIL; WSFL; WSDL; SOAP or RMI; HTTP or SMTP or IIOP or RMTP; TCP/IP]
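SOAP's role in the stack is to fix the structure of the message: an Envelope containing a Body that holds the serialized call. The sketch below builds and parses a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree`. The `SubmitHomework` operation and the `urn:example:homework` namespace are hypothetical, loosely borrowed from the virtual-university services. No HTTP transport is shown. Real toolkits such as Axis or .NET generate this code from a WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:homework"  # hypothetical application namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("hw", APP_NS)


def build_request(student: str, assignment: str) -> bytes:
    """Serialize a hypothetical SubmitHomework call inside a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}SubmitHomework")
    ET.SubElement(call, f"{{{APP_NS}}}student").text = student
    ET.SubElement(call, f"{{{APP_NS}}}assignment").text = assignment
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(payload: bytes) -> dict:
    """Receiver side: locate the call in the Body and read back its parameters."""
    root = ET.fromstring(payload)
    call = root.find(f"{{{SOAP_ENV}}}Body/{{{APP_NS}}}SubmitHomework")
    # Strip the "{namespace}" prefix from each child tag to get the parameter name.
    return {child.tag.split("}")[1]: child.text for child in call}


msg = build_request("s123", "hw4")
print(msg.decode())
print(parse_request(msg))
```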
Education as a Web Service
■ “Learning Object” XML standards already exist
■ Web Services for a virtual university include:
• Registration
• Performance (grading)
• Authoring of curriculum
• Online laboratories for real and virtual instruments
• Homework submission
• Quizzes of various types (multiple choice, random parameters)
• Assessment data access and analysis
• Synchronous delivery of curricula, including audio/video conferencing and other synchronous collaborative tools as Web Services
• Scheduling of courses and mentoring sessions
Classic Grid Architecture
[Figure: classic Grid architecture with Clients (users and devices), a Middle Tier of brokers and service providers, and Resources; labelled services include Security, Collaboration, Composition, Content Access, Computing, NetSolve and databases]
Some Observations
■ “Traditional” Grids manage and share asynchronous resources in a rather centralized fashion
■ Peer-to-peer networks are “just like” Grids, with different implementations of message-based services like registration and look-up
■ Collaboration systems like WebEx/Placeware (application sharing) or Polycom (audio/video conferencing) can be viewed as Grids
■ Computers are fast and getting faster. One can afford many strategies that used to be unrealistic, including rich, usually XML-based, messaging
■ Web Services interact with messages
• Everything (including applications like PowerPoint) will be a Web Service?
Peer to Peer Grid
[Figure: Peer to Peer Grid, “a democratic organization”: peers and databases connected through event/message brokers, with user-facing and service-facing Web Service interfaces]
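In the peer-to-peer Grid picture, peers do not call each other directly. They exchange messages through event/message brokers. Below is a minimal in-process, topic-based publish/subscribe sketch of that decoupling. The `EventBroker` class and the topic names are hypothetical. A real broker network would be distributed, persistent and secured.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, dict], None]


class EventBroker:
    """Minimal topic-based publish/subscribe broker (in-process, illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A peer registers interest in a topic without knowing who will publish to it.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber of the topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, event)
        return len(handlers)


broker = EventBroker()
received: list[dict] = []
broker.subscribe("sensor/temperature", lambda topic, event: received.append(event))
print(broker.publish("sensor/temperature", {"value": 21.5}))
print(broker.publish("sensor/pressure", {"value": 1013}))  # no subscribers yet
print(received)
```

Publishers and subscribers share only a topic name, so either side can be replaced or moved without the other noticing.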
System and Application Services?
■ There are generic Grid system services: security, collaboration, persistent storage, universal access
• OGSA (Open Grid Service Architecture) is implementing these as extended Web Services
■ An Application Web Service is a capability used either by another service or by a user
• It has input and output ports; data comes from sensors or other services
■ Consider Satellite-based Sensor Operations as a Web Service
• Satellite management (with a web front end)
• Each tracking station is a service
• Image processing is a pipeline of filters, which can be grouped into different services
• Data storage is an important system service
• Big services are built hierarchically from “basic” services
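Two ideas from the satellite example can be sketched as function composition: filters grouped into services, and big services built hierarchically. The sketch assumes each service has one input port and one output port carrying a non-empty list of pixel values. The filter and service names are hypothetical.

```python
from typing import Callable

Service = Callable[[list[float]], list[float]]  # one input port, one output port


def compose(*stages: Service) -> Service:
    """Build a bigger service by chaining smaller ones; the result is itself a Service."""
    def composite(data: list[float]) -> list[float]:
        for stage in stages:
            data = stage(data)
        return data
    return composite


# Hypothetical basic filters from an image-processing pipeline.
def calibrate(pixels: list[float]) -> list[float]:
    return [p - 1.0 for p in pixels]  # subtract an assumed constant bias


def clip(pixels: list[float]) -> list[float]:
    return [max(0.0, p) for p in pixels]  # drop negative values


def normalize(pixels: list[float]) -> list[float]:
    peak = max(pixels) or 1.0  # avoid dividing by zero when all pixels are 0
    return [p / peak for p in pixels]


# Filters grouped into a service, then that service grouped again (hierarchy).
cleanup_service = compose(calibrate, clip)
image_processing_service = compose(cleanup_service, normalize)
print(image_processing_service([1.0, 3.0, 5.0]))
```

Since a composite has the same shape as a basic service, it can be nested again. This mirrors building big services hierarchically from basic ones.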
What is Happening?
■ Grid ideas are being developed in (at least) two communities
• Web Service: W3C, OASIS
• Grid Forum (High Performance Computing, e-Science)
■ Service standards are being debated
■ Grid operational infrastructure is being deployed
■ Grid architecture and core software are being developed
■ Particular system services are being developed “centrally”; OGSA framework for this in …
■ Lots of fields are setting domain-specific standards and building domain-specific services
■ There is a lot of hype
■ Grids are viewed differently in different areas
• Largely “computing-on-demand” in industry (IBM, Oracle, HP, Sun)
OGSA OGSI & Hosting Environments
• Start with Web Services in a hosting environment
• Add OGSI to get a Grid service and a component model
• Add OGSA to get an interoperable Grid, “correcting” differences in the base platform and adding key functionalities
[Figure: layered OGSA environment: Network; Hosting Environment for WS; OGSI on Web Services; broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.); domain-specific services; layers labelled OGSA, Possibly OGSA and Not OGSA]
Technical Activities of Note
• Look at different styles of Grids, such as Autonomic (Robust, Reliable, Resilient)
• New Grid architectures are hard due to the investment required
• Critical services, such as
– Security: build message-based, not connection-based
– Notification: event services
– Metadata: use Semantic Web, provenance
– Databases and repositories: instruments, sensors
– Computing: submit job, scheduling, distributed file systems
– Visualization, computational steering
– Fabric and service management: network performance
• Program the Grid: workflow
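Programming the Grid as a workflow usually means expressing steps and their data dependencies as a directed acyclic graph. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). It groups steps into stages whose members have no mutual dependencies and so could run concurrently. The step names are hypothetical, and dispatching steps to Grid resources is left out.

```python
from graphlib import TopologicalSorter


def workflow_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group workflow steps into stages; steps within a stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are now available
        stages.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return stages


# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "fetch_sensor_data": set(),
    "query_database": set(),
    "filter_images": {"fetch_sensor_data"},
    "run_simulation": {"filter_images", "query_database"},
    "visualize": {"run_simulation"},
}
print(workflow_stages(workflow))
```

A real enactment engine would dispatch each stage's steps to remote services and call `done` as results arrive, rather than marking a whole stage at once.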
Issues and Types of Grid Services
• 1) Types of Grid
– R3
– Lightweight
– P2P
– Federation and Interoperability
• 2) Core Infrastructure and Hosting Environment
– Service Management
– Component Model
– Service wrapper/Invocation
– Messaging
• 3) Security Services
– Certificate Authority
– Authentication
– Authorization
– Policy
• 4) Workflow Services and Programming Model
– Enactment Engines (Runtime)
– Languages and Programming
– Compiler
– Composition/Development
• 5) Notification Services
• 6) Metadata and Information Services
– Basic including Registry
– Semantically rich Services and meta-data
– Information Aggregation (events)
– Provenance
• 7) Information Grid Services
– OGSA-DAI/DAIT
– Integration with compute resources
– P2P and database models
• 8) Compute/File Grid Services
– Job Submission
– Job Planning Scheduling Management
– Access to Remote Files, Storage and Computers
– Replica (cache) Management
– Virtual Data
– Parallel Computing
• 9) Other services including
– Grid Shell
– Accounting
– Fabric Management
– Visualization, Data-mining and Computational Steering
– Collaboration
• 10) Portals and Problem Solving Environments
• 11) Network Services
Technology Components of (Services in) a Computing Grid
[Figure: components numbered as on the slide: 1: Job Management Service (Grid Service interface to user or program client); 1: Plan Execution; 2: Schedule and control Execution; 3: Access to Remote Computers; 4: Job Submittal; 5: Data Transfer; 6: File and Storage Access; 7: Cache Data Replicas; 8: Virtual Data; 10: Job Status; remote Grid services and data]
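The job management, planning, submission and status components imply a job lifecycle. Below is a sketch of that lifecycle as a small state machine. The `JobState` names, the allowed transitions and the `Job` class are assumptions for illustration, not taken from any particular Grid toolkit.

```python
from enum import Enum


class JobState(Enum):
    PLANNED = "planned"
    SUBMITTED = "submitted"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Allowed transitions (hypothetical; real schedulers define their own state sets).
TRANSITIONS = {
    JobState.PLANNED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = JobState.PLANNED
        self.history = [JobState.PLANNED]  # what a job status query could report

    def advance(self, new_state: JobState) -> None:
        # Reject transitions the lifecycle does not allow, e.g. planned -> done.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.job_id}: cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

    def status(self) -> str:
        return self.state.value


job = Job("job-42")  # hypothetical job identifier
for step in (JobState.SUBMITTED, JobState.RUNNING, JobState.DONE):
    job.advance(step)
print(job.status())
```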
Conclusions
■ Grids are inevitable and pervasive
■ Can expect Web Services and Grids to merge, with a common set of general principles but different implementations with different scaling and functionality trade-offs
■ Enough is known that one can start today
■ We will be flooded with data, information and purported knowledge
■ One should be preparing Grid strategies, understanding relevant Web and Grid standards and developing new domain-specific standards
■ Note many existing (standards) efforts assume …