Grid Technology Implications for ACES and SERVOGrid
Brisbane, Australia
June 5, 2003
Geoffrey Fox Marlon Pierce
Community Grids Lab Indiana University
http://academia.web.cern.ch/academia/lectures/grid /
What is Grid Technology?
• Grids support distributed collaboratories or virtual organizations, integrating concepts from
• The Web
• Distributed Objects (CORBA, Java/Jini, COM)
• Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities
• Peer-to-peer Networks
• With perhaps the Web being the most important for “Information Grids” and Globus for “Compute Grids”
– Information Grids are the basis of SERVOGrid
Paradigms, Protocols, Platforms and Hosting
• We can start from the Web view, where the basic Grid paradigm is meta-data rich Web Services communicating via messages
• These have basic support from a runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)
– These are the distributed equivalent of operating system functions as in UNIX Shell
Taxonomy of Grid Functionalities
Name of Grid Type | Description of Grid Functionality
Enterprise Grid | Grid supporting a company’s enterprise infrastructure
Campus Grid | Grid supporting University community computing
Complexity or Hybrid Grid | Hybrid combination of Information and Compute/File Grid emphasizing integration of experimental data, filters and simulations
Information Grid | Grid service access to distributed information, data and knowledge repositories
Desktop Grid | “Internet Computing” and “Cycle Scavenging” with secure sandbox on large numbers of untrusted computers
Compute/File Grid | Run multiple jobs with distributed compute and data resources (Global “UNIX Shell”)
[Figure: SERVOGrid (Complexity) Computing Model. An HPC simulation on closely coupled compute nodes, analysis and visualization, repositories, federated databases, sensor nets and streaming data, linked by loosely coupled distributed data filters that massage data for simulation, together with OGSA-DAI Grid services and other Grid and Web services for analysis, control and visualization.]
This type of Grid integrates with parallel computing:
• Multiple HPC facilities, but only one used at a time
• Many simultaneous data sources and sinks
Taxonomy of Grid Operational Style
Name | Description of Grid Operational or Architectural Style
R3 or Autonomic Grid | Fault-tolerant and self-healing Grid: Robust, Reliable, Resilient (R3)
Collaboration Grid | Grid supporting collaborative tools like the Access Grid, whiteboard and shared applications
Lightweight Grid | Grid designed for rapid deployment and minimum life-cycle support costs
Peer-to-peer Grid | Grid built with peer-to-peer mechanisms
Semantic Grid | Integration of Grid and Semantic Web meta-data and ontology technologies
SERVOGrid Grid Requirements
• Seamless access to data repositories and large-scale computers
• Integration of multiple data sources, including sensors, databases and file systems, with analysis systems
– Including filtered OGSA-DAI
• Rich meta-data generation and access, with SERVOGrid-specific schemas extending industry standards
• Portals with component model for user interfaces and web control of all capabilities
What is a Web Service I
• A web service is a computer program running on either a local or a remote machine, with a set of well-defined interfaces (ports) specified in XML (WSDL)
• In principle, the computer program can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be implemented in any way whatsoever
– Interfaces can be method calls, Java RMI messages, CGI Web invocations, or even totally compiled away (inlining), but
• The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
• Web Services separate the meaning of a port (message) interface from its implementation
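To make the separation of interface and implementation concrete, here is a minimal Python sketch of the “simplest implementation” route: a call serialized as a SOAP 1.1 envelope using only the standard library. The operation name `getDisplacement`, its parameters and the `urn:example:servogrid` namespace are hypothetical; a real service would take them from its WSDL, and a toolkit such as Axis or .NET would normally generate this code.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:servogrid"                         # hypothetical application namespace


def build_request(operation, **params):
    """Serialize a call as a SOAP 1.1 envelope; the receiver only needs the XML."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """What any implementation would do on receipt: recover operation and arguments."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                                   # first child of Body is the call
    operation = call.tag.split("}", 1)[1]            # strip the "{namespace}" prefix
    args = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, args


request = build_request("getDisplacement", station="PIN1", days=30)  # hypothetical call
print(parse_request(request))  # ('getDisplacement', {'station': 'PIN1', 'days': '30'})
```

Because the receiver sees only XML, the parsing side could equally be written in Java, Perl or a Fortran wrapper. All arguments arrive as text here, since the type information that WSDL and XML Schema would normally supply is omitted from the sketch.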
[Figure: Clients reach Web Services (WS) through a (virtual) XML knowledge (user) interface and a (virtual) XML rendering that renders to display format; Web Services talk to each other through XML WS-to-WS interfaces and reach raw data resources through a (virtual) XML data interface.]
What are System and Application Services?
• There are generic Grid system services: security, collaboration, workflow, notification
– OGSA (Open Grid Service Architecture) is implementing these as extended Web Services
• An Application Web Service is a capability used either by another service or by a user
– It has input and output ports – data is from sensors or other services
• Consider Satellite-based Sensor Operations as a Web Service
– Satellite management (with a web front end)
– Each tracking station is a service
– Image Processing is a pipeline of filters – which can be grouped into different services
– Data storage is an important system service
– Big services built hierarchically from “basic” services
Application Web Services
• Note the Service model integrates sensors, sensor analysis, simulations and people
• An Application Web Service is a capability used either by another service or by a user
– It has input and output ports – data is from users, sensors or other services
– Big services built hierarchically using workflow from “basic” services
[Figure: Sensor Data as a Web Service (WS), linked to a Sensor Management WS, Data Analysis WS, Simulation WS and Visualization WS. The workflow builds either as multiple Filter Web Services or as multiple program (Prog) WS.]
Grid Politics
• There is a Global Grid Forum meeting 3 times per year with about 700 attendees per meeting
– Exchange information and define standards for “everything” not done in W3C and OASIS
– e.g. Grid Service, Security, What is a Job, Database, Computer, How to build portals ….
• There is a large project called Globus developing software largely for “compute/file” Grids
• There are some 50 Grid projects (mainly in Europe and USA) developing software and applications as well as installing infrastructure
– Some are “deployment”: EDG NMI VDT …..
• There are related initiatives called CyberInfrastructure (NSF USA) and e-Science (UK)
• There is a proposed OMII (Open Middleware Infrastructure Institute)
OGSA/OGSI Top Level View
• OGSA is the set of “core” Grid services
– Stuff you can’t live without
– If you built a Grid you would need to invent these things
http://www.gridforum.org/Meetings/ggf7/docs/default.htm http://www.globusworld.org/globusworld_web/jw2_program_tut.htm
[Figure: Web Services and OGSI. Layered diagram showing a hosting environment; Web Services and OGSI; models for resources and other entities; broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.); domain-specific services; other models.]
OGSI Open Grid Service Interface
• http://www.gridforum.org/ogsi-wg
• It is a “component model” for web services.
• It defines a set of behavior patterns that each OGSI service must exhibit.
• Every “Grid Service” portType extends a common base type.
– Defines an introspection model for the service
– You can query it (in a standard way) to discover
• What methods/messages a port understands
• What other port types does the service provide?
• If the service is “stateful”, what is the current state?
• A set of standard portTypes for
– Message subscription and notification
– Service collections
• Each service is identified by a URI called the “Grid Service Handle”
• GSHs are bound dynamically to Grid Service References (GSRs), typically WSDL documents
– A GSR may be transient. GSHs are fixed.
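The split between fixed handles and transient references can be sketched as a small lookup table. This is an illustrative Python model under our own assumptions (the handle URI, host names and expiry times are made up), not the operations of the OGSI HandleResolver portType itself: a client keeps the GSH and re-resolves it whenever its current GSR has expired or the service has moved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class GridServiceReference:
    """Transient binding information, e.g. the location of a WSDL document."""
    wsdl_location: str
    expires: datetime


class HandleResolver:
    """Maps permanent Grid Service Handles (GSHs) to their current, expiring GSRs."""

    def __init__(self):
        self._bindings = {}

    def bind(self, gsh, gsr):
        # Re-binding replaces the reference; the handle itself never changes.
        self._bindings[gsh] = gsr

    def resolve(self, gsh, now=None):
        if now is None:
            now = datetime.now(timezone.utc)
        gsr = self._bindings.get(gsh)
        if gsr is None or gsr.expires <= now:
            raise LookupError(f"no valid Grid Service Reference for {gsh}")
        return gsr


# Usage: the service moves from host A to host B; clients keep the same handle.
GSH = "http://gsh.example.org/servogrid/fault-model"  # hypothetical handle
t0 = datetime(2003, 6, 5, tzinfo=timezone.utc)
resolver = HandleResolver()
resolver.bind(GSH, GridServiceReference("http://hostA.example.org/fault.wsdl", t0 + timedelta(hours=1)))
print(resolver.resolve(GSH, now=t0).wsdl_location)
resolver.bind(GSH, GridServiceReference("http://hostB.example.org/fault.wsdl", t0 + timedelta(days=1)))
print(resolver.resolve(GSH, now=t0 + timedelta(hours=2)).wsdl_location)
```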
Two-level Programming I
• The paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++, Java or Fortran Monte Carlo module
– Data streaming from a sensor or satellite
– Specialized (JDBC) database access
• Such nuggets accept and produce data from users, files and databases
• The Grid is built by coordinating such nuggets, assuming we have solved the problem of programming each nugget
Two-level Programming II
• The Grid is about the linkage and distribution of the nuggets, with the only additions being runtime interfaces to the Grid, as opposed to UNIX data streams
• Familiar from use of UNIX Shell, Perl or Python scripts to produce real applications from core programs
• Such interpretative environments are the single-processor analog of Grid Programming, and this tends to be called workflow
• Workflow is the composition of multiple services (programs) together to make a new service
– Includes “Software Bus”, “Application Integration”, “Co-ordination Languages” etc.
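The two-level model can be illustrated with ordinary Python functions standing in for nuggets. In this sketch the nugget names and the 100 m outlier limit are hypothetical; `compose` plays the role of the shell pipe, linking existing programs without changing them, and its result is itself a service that can be composed again, which is the essence of workflow as described above.

```python
from functools import reduce


def compose(*services):
    """Chain services left to right, like `prog1 | prog2 | prog3` in a UNIX shell.
    The result is itself a service, so it can be composed again."""
    def composite(stream):
        return reduce(lambda data, service: service(data), services, stream)
    return composite


# Hypothetical "nuggets": each is an ordinary, separately written program.
def remove_outliers(stream, limit=100.0):
    """Drop readings whose magnitude exceeds `limit` metres."""
    return (x for x in stream if abs(x) <= limit)


def to_millimetres(stream):
    """Convert readings from metres to millimetres."""
    return (x * 1000.0 for x in stream)


def running_mean(stream):
    """Yield the mean of all readings seen so far."""
    total = 0.0
    for count, x in enumerate(stream, start=1):
        total += x
        yield total / count


clean = compose(remove_outliers, to_millimetres)  # a new service built from nuggets
smoothed = compose(clean, running_mean)           # composed again, hierarchically
print(list(smoothed([0.5, 250.0, -0.25])))        # [500.0, 125.0]
```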
Workflow
• Workflow has at least 4 parts
– “Programming Environment” – typically GUI to drag and drop services and their linkages (familiar from AVS etc. which was workflow for visualization)
– Language – from XML to extended Python
– Compiler – converting Language into executable
– Runtime controlling flow of information and notification events
• Can use Python, Mathematica, Matlab, JavaSpaces, IBM BPEL4WS, DoE CCA etc.
– We don’t think current systems are very near “what we will want”, but expect much progress over the next 3 years and plenty of systems to work with
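The four parts can be seen in miniature in the Python sketch below, assuming Python 3.9+ for `graphlib`. The workflow “language” is just a dictionary of steps and their inputs (the step names and functions are hypothetical), the “compiler” checks the dependency graph and produces an execution order, and the “runtime” runs the steps and emits a notification event after each one. A GUI programming environment would generate the dictionary; it is omitted here.

```python
from graphlib import TopologicalSorter

# "Language": a hypothetical workflow as data, step -> (function, names of input steps)
WORKFLOW = {
    "fetch": (lambda: [3.0, 1.0, 2.0], []),
    "sort": (lambda xs: sorted(xs), ["fetch"]),
    "maximum": (lambda xs: max(xs), ["sort"]),
    "summary": (lambda xs, m: f"{len(xs)} readings, max {m}", ["sort", "maximum"]),
}


def compile_workflow(spec):
    """'Compiler': turn the spec into an executable order (raises CycleError on cycles)."""
    graph = {name: set(inputs) for name, (_, inputs) in spec.items()}
    return list(TopologicalSorter(graph).static_order())


def run(spec, plan, notify=print):
    """'Runtime': execute steps in order, pass outputs along, emit an event per step."""
    results = {}
    for name in plan:
        func, inputs = spec[name]
        results[name] = func(*(results[i] for i in inputs))
        notify(f"completed {name}")
    return results


plan = compile_workflow(WORKFLOW)
results = run(WORKFLOW, plan)
print(results["summary"])  # 3 readings, max 3.0
```

A real engine would run independent steps concurrently and across hosts; `TopologicalSorter` also supports that style through its `prepare()`, `get_ready()` and `done()` methods.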
e-Science and the Data Deluge
Particle Physics
• 2006/7: First pp collisions at TeV energies at the Large Hadron Collider at CERN in Geneva
• ATLAS/CMS Experiments involve 2000 physicists from 200 organizations in US, EU, Asia
• Need to store, access, process and analyse 10 Petabytes/yr with 200 Teraflop/s of distributed computation
• Building hierarchical Grid infrastructure to distribute data and computation
• Many tens of millions of dollars of funding for the global particle physics Grid – GriPhyN, PPDataGrid, iVDGL, EU DataGrid, EU DataTag, UK GridPP projects
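As a rough sense of scale, the 10 Petabytes/yr figure corresponds to the following sustained average rate, assuming decimal units (1 PB = 10^15 bytes) and data arriving uniformly over a 365-day year. This is only an average, before any copying of data to other sites in the hierarchy.

```latex
\[
\frac{10\ \text{PB/yr}}{365 \times 86\,400\ \text{s/yr}}
= \frac{10^{16}\ \text{B}}{3.1536 \times 10^{7}\ \text{s}}
\approx 3.2 \times 10^{8}\ \text{B/s}
\approx 320\ \text{MB/s}
\approx 2.5\ \text{Gbit/s}
\]
```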
Astronomy and its Data Deluge
• Virtual Observatories
– NVO, AVO, AstroGrid
– Store all wavelengths, need distributed joins
– NVO 500 TB/yr from 2004
• Laser Interferometer Gravitational-Wave Observatory (LIGO)
– Search for direct evidence for gravitational waves
– LIGO 250 TB/yr, random streaming from 2002
• VISTA Visible and IR Survey Telescope in 2004
– 250 GB/night, 100 TB/yr, Petabytes in 10 yrs
• New phase of astronomy, storing, searching and analysing Petabytes of data
[Figure: The total area of astronomical telescopes in m², and CCDs measured in …]
Engineering, Chemistry, Environmental, BioInformatics and Medical Applications
• Real-Time Industrial Health Monitoring
– UK DAME project for Rolls-Royce aero engines
– 1 GB sensor data/flight, 100,000 engine hours/day
• Combinatorial Chemistry – experiments on demand
• Earth Observation
– ESA satellites generate 100 GB/day
– NASA 15 PB by 2007
• Bioinformatics
– Tens of TB of high value curated data
• Medical Images to Information
Importance of Metadata
• Metadata is ‘data about data’, e.g. catalogues, indices, directory structures
• Librarians work with books which have the same basic ‘schema’, e.g. title, author(s), publisher, date, etc.
• Need for hierarchical, community-based approach to defining metadata and schemas
– e.g. CML, SERVOGridML ……..
• Metadata is important for interoperability of databases/federated archives, and for construction of intelligent search agents
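A minimal sketch of why a shared schema matters for search agents: two archives publish XML metadata in the same hypothetical format, so one generic query function can search both. The element names and records are invented for illustration and are not the actual SERVOGridML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata from two archives sharing one (illustrative) schema.
ARCHIVE_A = """<catalog>
  <dataset id="a1"><title>Fault slip model</title><type>simulation</type><year>2002</year></dataset>
  <dataset id="a2"><title>GPS station velocities</title><type>sensor</type><year>2003</year></dataset>
</catalog>"""
ARCHIVE_B = """<catalog>
  <dataset id="b1"><title>Seismicity catalog</title><type>sensor</type><year>2001</year></dataset>
</catalog>"""


def search(archives, **criteria):
    """Return ids of datasets whose metadata elements match every criterion."""
    hits = []
    for xml_text in archives:
        for ds in ET.fromstring(xml_text).iter("dataset"):
            if all(ds.findtext(field) == value for field, value in criteria.items()):
                hits.append(ds.get("id"))
    return hits


print(search([ARCHIVE_A, ARCHIVE_B], type="sensor"))  # ['a2', 'b1']
```

The search function knows nothing about either archive beyond the common schema; an archive using different element names would need a translation step first, which is the interoperability problem community schemas aim to avoid.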
Simulation Output as Digital Library
• Digital Libraries are usually used for archiving text, audio and video data
• Scientific data require transformation, data-mining and visualisation tools
Emergence of a new research methodology?
• Traditional scientific methodologies are theory and experiment
• The last half of the 20th century saw the emergence of scientific simulation as a third methodology
• This century will see the emergence of a fourth methodology: collection-based research
OGSA-DAI
(Malcolm Atkinson Edinburgh) UK e-Science Grid Core Programme
Development of Data Access and Integration Services for OGSA
http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI
• Access to XML databases
• Access to relational databases
OGSA-DAI Key Services
• GridDataService (GDS): access to data and DB operations
• GridDataServiceFactory (GDSF): makes GDS and GDSF
• GridDataServiceRegistry (GDSR): discovery of GDS(F) and data
• GridDataTranslationService (GDTS): translates or transforms data
• GridDataTransportDepot (GDTD): data transport with persistence
• Integrated structured data transport
• Relational and XML models supported
• Role-based authorisation
[Figure: Multiple clients access a Grid Data Service that fronts a relational database, an XML database and a directory/file system.]
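The factory, registry and data-service roles listed above can be sketched in a few lines of Python, with an in-memory SQLite database standing in for a relational data resource. The class names mirror the OGSA-DAI roles, but the methods are our own simplification, not the OGSA-DAI API; the factory here only creates data services, and the earthquake rows are made-up sample data.

```python
import sqlite3


class GridDataService:
    """Illustrative stand-in for a GDS: wraps one data resource and runs queries on it."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()


class GridDataServiceRegistry:
    """Illustrative stand-in for a GDSR: clients discover services by what data they hold."""

    def __init__(self):
        self._entries = {}

    def register(self, description, service):
        self._entries[description] = service

    def lookup(self, description):
        return self._entries[description]


class GridDataServiceFactory:
    """Illustrative stand-in for a GDSF: creates GDS instances and registers them."""

    def __init__(self, registry):
        self._registry = registry

    def create(self, description, setup_sql):
        conn = sqlite3.connect(":memory:")
        conn.executescript(setup_sql)
        service = GridDataService(conn)
        self._registry.register(description, service)
        return service


registry = GridDataServiceRegistry()
factory = GridDataServiceFactory(registry)
factory.create("earthquake catalog", """
    CREATE TABLE quakes (id INTEGER PRIMARY KEY, magnitude REAL);
    INSERT INTO quakes (magnitude) VALUES (3.1), (5.4), (6.2);
""")
gds = registry.lookup("earthquake catalog")  # the client discovers the service
print(gds.perform("SELECT COUNT(*) FROM quakes WHERE magnitude >= ?", (5.0,)))  # [(2,)]
```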
Integration of Data and Filters
• One has the OGSA-DAI data repository interface combined with the WSDL of the (Perl, Fortran, Python …) filter
• The user only sees WSDL, not data syntax
• There are some non-trivial issues as to where the filtering compute power should be
– Microsoft says filter next to data
[Figure: A filter placed next to the database (DB), with users seeing only the WSDL of the filter.]
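The “filter next to data” argument can be made concrete by counting what crosses the wire. In this sketch an in-memory SQLite table of hypothetical readings stands in for the repository. Both functions offer the caller the same interface (threshold in, count out), but the first ships every row to the client while the second runs the filter inside the database and ships a single row.

```python
import sqlite3


def make_db(n):
    """Hypothetical repository: n readings spread over 10 stations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (station TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)",
                     ((f"S{i % 10}", float(i)) for i in range(n)))
    return conn


def filter_at_client(conn, threshold):
    """Ship all rows, then filter remotely. Returns (matches, rows transferred)."""
    rows = conn.execute("SELECT station, value FROM readings").fetchall()
    kept = [v for _, v in rows if v > threshold]
    return len(kept), len(rows)


def filter_next_to_data(conn, threshold):
    """Run the filter inside the database. Returns (matches, rows transferred)."""
    rows = conn.execute("SELECT COUNT(*) FROM readings WHERE value > ?",
                        (threshold,)).fetchall()
    return rows[0][0], len(rows)


db = make_db(1000)
print(filter_at_client(db, 989.0))     # (10, 1000)
print(filter_next_to_data(db, 989.0))  # (10, 1)
```

Both give the same answer; only the amount of data moved differs, which is why the placement of filtering compute power is a design decision rather than a detail hidden behind the WSDL.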
OGSA, OGSI & Hosting Environments
• Start with Web Services in a hosting environment
• Add OGSI to get a Grid service and a component model
• Add OGSA to get Interoperable Grid “correcting” differences in base platform and adding key functionalities
[Figure: OGSI on Web Services. Layered diagram showing a network; a hosting environment for Web Services; OGSI; models for resources and other entities; broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.); domain-specific services; other models. Regions are labelled OGSA Environment, Possibly OGSA and Not OGSA.]
Permeating Principles and Policies
• Meta-data rich, message-linked Web Services as the permeating paradigm
• “User” Component Model such as “Enterprise JavaBean (EJB)” or .NET
• Service Management framework including a possible Factory mechanism
• High level Invocation Framework describing how you interact with system components
– This could, for example, be used to allow the system to be built from either W3C or GGF style (OGSI) Web Services and to protect the user from changes in their specifications.
• Security is a service, but the need for fine-grain selective authorization encourages
• Policy context that sets the rules for each particular Grid
– Currently OGSA supports policies for routing, security and resource use.
• The Grid Fabric or set of resources needs mechanisms to manage them. This includes automatic recording of meta-data and configuration of software.
• Quality of service (QoS) for the Network, which implies performance monitoring and bandwidth reservation services
– Challenging, as end-to-end and not just backbone QoS is needed
• Messaging systems like MQSeries from IBM provide robustness from asynchronous delivery, and can abstract destinations and allow customization of content, such as converting between different interface specifications
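The messaging properties in the last bullet (asynchronous delivery, abstract destinations, content conversion) can be illustrated with a toy in-process broker. This is a sketch under our own assumptions, not the MQSeries API: senders address a named destination rather than a receiver, messages wait in a queue until someone drains it, and an optional converter rewrites content, here from a hypothetical old message format to a new one.

```python
from collections import defaultdict, deque


class MessageBroker:
    """Toy store-and-forward broker: senders name a destination, not a receiver."""

    def __init__(self):
        self._queues = defaultdict(deque)
        self._converters = {}

    def add_converter(self, destination, convert):
        """Register a function that rewrites message content for one destination."""
        self._converters[destination] = convert

    def send(self, destination, message):
        # Delivery is asynchronous: the message is queued even if no receiver exists yet.
        convert = self._converters.get(destination, lambda m: m)
        self._queues[destination].append(convert(message))

    def receive_all(self, destination):
        """Drain and return all messages waiting at a destination."""
        queue = self._queues[destination]
        drained = list(queue)
        queue.clear()
        return drained


broker = MessageBroker()
# Hypothetical interface change: old senders use lat/lon, new receivers expect "position".
broker.add_converter("quake-events", lambda m: {"position": (m["lat"], m["lon"])})
broker.send("quake-events", {"lat": 34.0, "lon": -118.2})  # sent before any receiver
print(broker.receive_all("quake-events"))  # [{'position': (34.0, -118.2)}]
```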
Virtualization
• The Grid could and sometimes does virtualize various concepts
• Location: URI (Uniform Resource Identifier) virtualizes URL
• Replica management (caching) virtualizes file location, generalized by the GriPhyN virtual data concept
• Protocol: message transport and WSDL bindings virtualize transport protocol as a QoS request
• P2P or Publish-subscribe messaging virtualizes matching of source and destination services
• Semantic Grid virtualizes Knowledge as a meta-data query
• Brokering virtualizes resource allocation
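Replica management as virtualization can be sketched as a catalog lookup: clients ask for a logical name and the system chooses a physical copy. The `lfn://` naming, the site URLs and the transfer-cost numbers below are hypothetical; a real replica manager might estimate such costs from network monitoring.

```python
# Hypothetical catalog: logical file name -> physical copies (URL, estimated transfer cost)
REPLICA_CATALOG = {
    "lfn://servogrid/faults/socal.dat": [
        ("gsiftp://siteA.example.org/data/socal.dat", 40.0),
        ("gsiftp://siteB.example.org/cache/socal.dat", 5.0),
    ],
}


def resolve_replica(logical_name, catalog=REPLICA_CATALOG):
    """Pick the cheapest physical copy; callers only ever see the logical name."""
    replicas = catalog.get(logical_name)
    if not replicas:
        raise FileNotFoundError(logical_name)
    url, _cost = min(replicas, key=lambda replica: replica[1])
    return url


print(resolve_replica("lfn://servogrid/faults/socal.dat"))
```

Adding or removing a copy changes only the catalog; code that uses the logical name is unaffected, which is the sense in which file location is virtualized.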
Interfaces and Functionality and Semantics I
• The Grid platform tries to minimize detail in protocols and maximize detail in interfaces to enhance scaling
• However rich meta-data and semantics are critical for correct and interesting operation
– Put as much semantic interpretation as you can into specific services
– Lack of semantic interoperation is in fact the main weakness of today’s Grids and Web services
• Everything becomes a service whether system or application level
• There are some very important “Global Services”
– Discovery (look up) and Registration of service metadata
– Workflow
Interfaces and Functionality and Semantics II
• There are many other generally important services
• OGSA-DAI The Database Service
• Portal Service linked to by WSRP (Web services for Remote Portals)
• Notification of events
• Job submission
• Provenance – interpret meta-data about the history of data
• File Interfaces
• Sensor service – satellites …
• Visualization
Web Services as a Portlet
• Each Web Service naturally has a user interface specified as “just another port”
– Customizable for universal access
• This gives each Web Service a Portlet view specified (in XML as always) by WSRP (Web services for Remote Portals)
• So a component model for resources “automatically” gives a component model for user interfaces
– When you build your application, you define portlets at the same time
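To illustrate how a service description can drive its user interface, this sketch renders an HTML form fragment from a hypothetical description of one operation. It is not WSRP, which defines its own markup and interaction protocol; it only shows the idea that the portlet view can be generated from the same component description as the service.

```python
from html import escape

# Hypothetical description of one Web Service operation, as might be derived from WSDL
OPERATION = {
    "service": "FaultModelService",
    "operation": "runSimulation",
    "inputs": [("faultFile", "string"), ("timeSteps", "int")],
}


def _field(name, xsd_type):
    """One labelled input; XML Schema ints become numeric inputs, others text."""
    input_type = "number" if xsd_type == "int" else "text"
    return (f'  <label>{escape(name)} ({escape(xsd_type)}) '
            f'<input name="{escape(name)}" type="{input_type}"></label>')


def render_portlet(op, action_url):
    """Render a user-interface fragment (an HTML form) for one service operation."""
    fields = "\n".join(_field(name, xsd_type) for name, xsd_type in op["inputs"])
    return (f'<form class="portlet" action="{escape(action_url)}" method="post">\n'
            f'  <h3>{escape(op["service"])}: {escape(op["operation"])}</h3>\n'
            f"{fields}\n"
            '  <button type="submit">Submit</button>\n'
            "</form>")


print(render_portlet(OPERATION, "/portal/fault-model"))  # hypothetical portal URL
```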
[Figure: Application as a WS. An application or content source is wrapped as a WSDL Web Service. General application ports interface with other Web Services; WSRP ports form the user face of the Web Service and define the WS as a portlet. Web Services also have other (Grid Service) ports.]
Online Knowledge Center built from Portlets
• Web Services provide a component model for the middleware (see the large “common component architecture” effort in the Dept. of Energy)
• Should match each WSDL component with a corresponding user interface component
• Thus one “must use” a component model for the portal, with again an XML specification (portalML) of portal components
[Figure: Sample page with several portlets: provide information about application and host parameters; select application to edit.]
Categories of Worldwide Grid Services to be exploited by SERVOGrid
• 1) Types of Grid
– R3
– Lightweight
– P2P
– Federation and Interoperability
• 2) Core Infrastructure and Hosting Environment
– Service Management
– Component Model
– Service wrapper/Invocation
– Messaging
• 3) Security Services
– Certificate Authority
– Authentication
– Authorization
– Policy
• 4) Workflow Services and Programming Model
– Enactment Engines (Runtime)
– Languages and Programming
– Compiler
– Composition/Development
• 5) Notification Services
• 6) Metadata and Information Services
– Basic including Registry
– Semantically rich Services and meta-data
– Information Aggregation (events)
– Provenance
• 7) Information Grid Services
– OGSA-DAI/DAIT
– Integration with compute resources
– P2P and database models
• 8) Compute/File Grid Services
– Job Submission
– Job Planning, Scheduling, Management
– Access to Remote Files, Storage and Computers
– Replica (cache) Management
– Virtual Data
– Parallel Computing
• 9) Other services including
– Grid Shell
– Accounting
– Fabric Management
– Visualization, Data-mining and Computational Steering
– Collaboration
• 10) Portals and Problem Solving Environments
• 11) Network Services
What should SERVOGrid do ?
• Make use of Grid technologies and architecture from around the world
• Coordinate with broad community through Global Grid Forum and OMII
• Decide on domain-specific standards (SERVOGridML)
• Agree on a particular approach among the choices in the international suite (use GT3 or not? use portlets or not? which meta-data technology?) and define SERVOGrid community practice
• Develop software system infrastructure and applications specific to solid earth science
Proposed OMII Activities:
Central Gaps
Gaps in Grid Styles and Execution Environment
• Need for both robust (fault tolerant) and lightweight (suitable for small groups) Grid styles identified
– Peer-to-peer style supports smaller decentralized virtual organizations
• Note opportunities for modern middleware ideas to be used – lightweight, message-based
• Note that Enterprise JavaBeans are not optimized for science, which has high-volume dataflow
• Federated Grid Architecture is natural for integration of heterogeneous functionality, style and security
[Figure: Overlapping heterogeneous Grids (Information, Enterprise, Compute and Campus Grids, with resources R1 and R2); a dynamic lightweight peer-to-peer Collaboration Training Grid linking a teacher and students; and (a) Layered OGSA Grid: Grid-1 and Grid-2, each with core services and application services behind an OGSA or non-OGSA interface (Interface-1, Interface-2), joined by OGSA mediation.]
Many Gaps in Generic Services
• Some gaps, like Workflow and Notification, are about making production versions of current projects
– Just in the UK, workflow from DAME, DiscoveryNet, EDG, Geodise, ICENI, myGrid, Unicore plus Cardiff, NEReSC ….
• RGMA and Semantic Grid offer improved meta-data and Information services compared to UDDI and MDS (Globus)
– Need comprehensive federated Information service
• Security requires an architecture supporting dynamic fine-grain authorization
• UK e-Science has pioneered Information Grids, but the gap is the continuation of OGSA-DAI, integration with other services and P2P decentralized models
Gaps in Other Grid services
• Portals and User Interfaces – Noted gap: many are not using Grid Computing Environment “best practice” with component-based user interfaces matching component-based middleware
• Programming Models (using workflow runtime)
• Fabric Management (should be integrated with central service management and Information system), Computational Steering, Visualization, Datamining, Accounting, Gridmake, Debugging, Semantic Grid tools (consistent with Information system), Collaboration, Provenance
• Application-specific services