JIM BASNEY1, STUART MARTIN2, JP NAVARRO2, MARLON PIERCE3, TOM SCAVO1, LEIF STRAND4, TOM URAM2,5, NANCY WILKINS-DIEHR6, WENJUN WU2, CHOONHAN YOUN6
1NATIONAL CENTER FOR SUPERCOMPUTING APPLICATIONS, UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN
2ARGONNE NATIONAL LABORATORY
3INDIANA UNIVERSITY
4CALIFORNIA INSTITUTE OF TECHNOLOGY
5UNIVERSITY OF CHICAGO
6SAN DIEGO SUPERCOMPUTER CENTER, UNIVERSITY OF CALIFORNIA AT SAN DIEGO
The Problem Solving Environments
of TeraGrid, Science Gateways, and
TeraGrid, what is it
A unique combination of fundamental CI components
Gateways, what are they
Problem Solving Environments for Science
Portal or client-server interfaces to high end resources
Web developments, explosion of digital data lead to the increased
importance of the internet and the web for science
Only 16 years since the availability of web browsers
Developments in web technology
• From static HTML to CGI forms to the wikis and social web pages of today
Full impact on science yet to be felt
Web usage model resonates with scientists
But persistence is needed if the Web is to have a profound impact on science (this is key for all PSEs)
TeraGrid provides common infrastructure for gateway
developers
TeraGrid’s Infrastructure for Gateways
Problem
Local compute resources are typically not enough for Gateways
Goal
Make it easy to use any TeraGrid site from a Gateway
Approach
Provide a set of client APIs and command line tools for use in
Gateways/portals
Maintain and deploy a set of common services on each site
Infrastructure Capabilities
Information Discovery
Find deployed services
Get details about the compute resources
Data Management
Move data to and from compute resources
Execution Management
Submit and monitor remote computational jobs
Security
Security
Based on Grid Security Infrastructure (GSI)
Uses X509 PKI
End entity certificates (e.g. issued to a person or host)
User proxy certificates (valid for a limited period of time)
Enables single sign-on to all TG resources
Enables delegation
Users/clients can disconnect and let services perform actions securely on their behalf
Integrated in grid middleware services
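The relationship between end entity certificates, short-lived proxies, and delegation can be sketched with a toy model. This is an illustration only: it does no cryptography and is not the GSI API. The `Credential` class, the subject strings, and the "/CN=proxy" naming are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """Toy stand-in for an X.509 credential; no real cryptography is done."""
    subject: str
    not_after: datetime
    issuer: Optional["Credential"] = None  # None marks the end entity certificate

    def delegate(self, now: datetime, lifetime: timedelta) -> "Credential":
        """Issue a proxy that can never outlive the credential that signed it."""
        expiry = min(now + lifetime, self.not_after)
        return Credential(f"{self.subject}/CN=proxy", expiry, issuer=self)

    def is_valid(self, now: datetime) -> bool:
        """Valid only if every credential in the chain is still unexpired."""
        cred: Optional["Credential"] = self
        while cred is not None:
            if now >= cred.not_after:
                return False
            cred = cred.issuer
        return True


# Hypothetical user and lifetimes, chosen only for illustration.
now = datetime(2008, 12, 12, 9, 0)
user = Credential("/O=Example/CN=Jane Doe", now + timedelta(days=365))
proxy = user.delegate(now, timedelta(hours=12))           # single sign-on: created once
service_proxy = proxy.delegate(now, timedelta(hours=48))  # delegation to a service
```

The user signs in once to obtain `proxy`. A service holding `service_proxy` can keep acting after the user disconnects, but only until the 12-hour proxy it was derived from expires.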
GSI in Action
[Figure: a GT4 client (Globus WS client) uses grid-proxy-init to create a proxy credential and key from the end entity credential and key; the GT4 server runs a Globus Web Service in a Java WS container with a gridmap.]
Gateway Workflow with GSISSH
[Figure: a gateway sends jobs through the GSISSH service on Resource A (scheduler, e.g., PBS) and Resource B (scheduler, e.g., LSF); each scheduler also runs local jobs on its compute nodes.]
Client does:
• myproxy-logon (once)
• Move files with gsiscp
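A gateway's client-side steps can be sketched as a list of command lines. The code only builds the commands and does not run them. The host names, user name, and job file names are hypothetical, and the remote submission step through gsissh (qsub for PBS, bsub for LSF) is an assumption about how such a workflow would typically continue.

```python
from typing import List, Tuple


def myproxy_logon_cmd(server: str, username: str, hours: int = 12) -> List[str]:
    """Retrieve a proxy credential from a MyProxy server (done once per session)."""
    return ["myproxy-logon", "-s", server, "-l", username, "-t", str(hours)]


def gsiscp_cmd(local_path: str, host: str, remote_path: str) -> List[str]:
    """Copy a file to the remote resource using the proxy credential."""
    return ["gsiscp", local_path, f"{host}:{remote_path}"]


def gsissh_cmd(host: str, remote_command: str) -> List[str]:
    """Run a command (here, a scheduler submission) on the remote resource."""
    return ["gsissh", host, remote_command]


def gateway_plan(server: str, username: str,
                 resources: List[Tuple[str, str, str]]) -> List[List[str]]:
    """One logon, then a copy and a submit per (host, job_file, submit_command)."""
    plan = [myproxy_logon_cmd(server, username)]
    for host, job_file, submit_command in resources:
        plan.append(gsiscp_cmd(job_file, host, job_file))
        plan.append(gsissh_cmd(host, submit_command))
    return plan


# Hypothetical hosts and job files: Resource A runs PBS, Resource B runs LSF.
plan = gateway_plan("myproxy.example.org", "gateway_user", [
    ("resource-a.example.org", "job.pbs", "qsub job.pbs"),
    ("resource-b.example.org", "job.lsf", "bsub < job.lsf"),
])
```

Note that the gateway still has to know each site's scheduler syntax in this approach.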
Remote Execution Management
Grid Resource Allocation and Management (GRAM)
Provide an abstraction layer on top of various local
resource managers (PBS, Condor, LSF, SGE, …)
Defines a common job description language
Client API and command line tools to asynchronously access remote
LRMs
Fault tolerant
GSI Security
“Job” workflow
File staging before and after job execution
File cleanup as the final step
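The idea of one job description sitting above several local resource managers can be sketched as follows. This is not the GRAM job description language or the GRAM implementation. `JobDescription` and both translator functions are hypothetical, and the mapping from a process count to PBS nodes and LSF slots is deliberately simplified.

```python
from dataclasses import dataclass


@dataclass
class JobDescription:
    """Hypothetical scheduler-neutral job description."""
    name: str
    executable: str
    processes: int
    walltime_minutes: int


def to_pbs(job: JobDescription) -> str:
    """Render the job as a PBS batch script (walltime as HH:MM:SS)."""
    hours, minutes = divmod(job.walltime_minutes, 60)
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N {job.name}",
        f"#PBS -l nodes={job.processes}",  # simplified; real layouts are site-specific
        f"#PBS -l walltime={hours:02d}:{minutes:02d}:00",
        job.executable,
    ])


def to_lsf(job: JobDescription) -> str:
    """Render the same job as an LSF batch script (run limit as hours:minutes)."""
    hours, minutes = divmod(job.walltime_minutes, 60)
    return "\n".join([
        "#!/bin/sh",
        f"#BSUB -J {job.name}",
        f"#BSUB -n {job.processes}",
        f"#BSUB -W {hours}:{minutes:02d}",
        job.executable,
    ])


# The client writes one description; the layer picks the translator per resource.
TRANSLATORS = {"pbs": to_pbs, "lsf": to_lsf}
job = JobDescription("quake-run", "./simulate", processes=4, walltime_minutes=90)
scripts = {lrm: render(job) for lrm, render in TRANSLATORS.items()}
```

The client code never mentions a scheduler; supporting another LRM means adding one translator.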
Traditional LRM Interaction
[Figure: local jobs go to the scheduler (e.g., PBS) on Resource A, which runs them on the compute nodes.]
Satisfies many users and use cases
TACC’s Ranger (62976 cores!) is the Costco of HTC ;-), one
[Figure: a GRAM4 service sits in front of the scheduler (e.g., PBS) on Resource A; remote GRAM jobs arrive through the gramJob API alongside local jobs.]
Adds remote execution capability
Enable clients/devices to manage
jobs from off of the cluster (Gateways!)
GRAM Benefit
[Figure: GRAM4 services front the schedulers on Resource A (e.g., PBS) and Resource B (e.g., LSF); GRAM jobs reach both through the gramJob API, alongside local jobs.]
Gateway Perspective
GRAM4 jobs
Scalable jo
Data Management - GridFTP
GridFTP
High-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
GSI Security
Third-party transfers
Parallel transfers
Striping
Lots of small files (LOSF)
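The idea behind parallel transfers can be illustrated by dividing a file into contiguous byte ranges, one per stream. This is a conceptual sketch, not the GridFTP wire protocol. The `split_ranges` helper is hypothetical.

```python
from typing import List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous (offset, length) blocks."""
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams must be >= 1")
    base, extra = divmod(size, streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # fewer bytes than streams: leave the remaining streams idle
        ranges.append((offset, length))
        offset += length
    return ranges


# Each (offset, length) pair could be handed to its own TCP stream.
blocks = split_ranges(10, 4)
```

Because every block carries its own offset, the receiver can write blocks in whatever order they arrive.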
Data Management - RFT
Reliable File Transfer
Adds reliability on top of GridFTP
GSI Security
Throttles requests
Retries non-fatal transfer errors
Resumes transfers from the last known position
Requires delegation in order to contact GridFTP servers on the user’s behalf
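The retry-and-resume behavior can be sketched as a small loop. This is an illustration of the idea, not the RFT service. `TransientError`, `reliable_transfer`, and the simulated sender are hypothetical names introduced for the example.

```python
from typing import Callable, List, Tuple


class TransientError(Exception):
    """A non-fatal failure that records how many bytes had arrived."""
    def __init__(self, position: int) -> None:
        super().__init__(f"transfer interrupted at byte {position}")
        self.position = position


def reliable_transfer(send: Callable[[int], int], total_bytes: int,
                      max_retries: int = 3) -> int:
    """Call send(offset) until total_bytes have arrived; return the retry count.

    send(offset) transfers from `offset` onward and returns the new position.
    Any exception other than TransientError is treated as fatal and propagates.
    """
    offset = 0
    retries = 0
    while offset < total_bytes:
        try:
            offset = send(offset)
        except TransientError as err:
            retries += 1
            if retries > max_retries:
                raise
            offset = max(offset, err.position)  # resume from the last known position
    return retries


def make_flaky_sender(total: int, fail_at: List[int]) -> Tuple[Callable[[int], int], List[int]]:
    """Simulated link that drops once at each position listed in fail_at."""
    pending = list(fail_at)
    calls: List[int] = []

    def send(offset: int) -> int:
        calls.append(offset)
        if pending and pending[0] >= offset:
            raise TransientError(pending.pop(0))
        return total

    return send, calls


send, calls = make_flaky_sender(100, fail_at=[40, 75])
retries = reliable_transfer(send, 100)
```

After each interruption the next attempt starts at the recorded position rather than at byte zero.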
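[Figure: Web authentication with a community credential. A web browser authenticates (Web Authn) to the web interface of the Science Gateway webapp; the gateway's WS GRAM client holds the community credential and key and a proxy certificate; at the resource provider, the WS GRAM service runs in a Java WS container with a proxy credential and key and maps to the community account.]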
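[Figure: GridShib SAML Tools. On the GT4 client, the Globus WS client uses the GridShib SAML Tools with the end entity credential and key to produce a proxy certificate carrying SAML; on the GT4 server, a Java WS container (with GridShib for GT) hosts the Globus Web Service, a SAML PIP, policy, and logs.]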
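[Figure: Web authentication with GridShib. A web browser authenticates (Web Authn) to the Science Gateway web interface; the webapp's security context holds the username and attributes; the GridShib SAML Tools use the community credential and key to produce a proxy certificate carrying SAML; the WS GRAM client presents it to the WS GRAM service in the resource provider's Java WS container (with GridShib for GT), which has a SAML PIP, policy, logs, and its own proxy credential and key.]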
Information Management
TeraGrid’s Integrated Information Services are a network of web services responsible for aggregating the availability of TeraGrid capability kits, software, and services across all the infrastructure providers
Where are the job submission, file-transfer, and login services needed by Gateways?
What is the queue status and estimated delay for each resource?
High-Availability Design
[Figure: clients reach the service provider information services over dynamic and static paths, using info.dyn.teragrid.org, info.teragrid.org, and TeraGrid Dynamic DNS.]
Server failover propagates globally in 15 minutes
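From the client side, a high-availability design can be paired with a simple fallback loop. The sketch below assumes a client that tries one host name and falls back to the next on a connection failure; whether TeraGrid clients do this is not stated on the slide. The `fetch` callable is injected so that no real network access happens, and the simulated outage is invented for the example.

```python
from typing import Callable, Iterable, List, Tuple


def query_with_failover(hosts: Iterable[str],
                        fetch: Callable[[str], str]) -> Tuple[str, str]:
    """Return (host, response) from the first host that answers."""
    errors: List[str] = []
    for host in hosts:
        try:
            return host, fetch(host)
        except ConnectionError as exc:
            errors.append(f"{host}: {exc}")  # remember the failure, try the next host
    raise ConnectionError("all hosts failed: " + "; ".join(errors))


def fake_fetch(host: str) -> str:
    """Stand-in for an HTTP query; pretends the first host name is unreachable."""
    if host == "info.dyn.teragrid.org":
        raise ConnectionError("simulated outage")
    return "<services/>"


host, body = query_with_failover(
    ["info.dyn.teragrid.org", "info.teragrid.org"], fake_fetch)
```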
Today, there are approximately 29 gateways
using the TeraGrid
Selected Highlights from the PSE08 paper
The Social Informatics Data (SID) Grid
The Geosciences Network (GEON)
QuakeSim
Computational Infrastructure for Geodynamics (CIG)
Social Informatics Data Grid
Heavy use of “multimodal” data.
A subject might be viewing a video while a researcher collects heart rate and eye movement data.
Events must be synchronized for analysis, and large datasets result.
Extensive analysis capabilities are not something that each researcher should have to create for themselves.
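Synchronizing events across streams can be illustrated by matching each video event to the nearest physiological sample in time. This is a generic sketch and not SIDGrid code. The helper names and the sample data are invented for the example.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def nearest_sample(timestamps: Sequence[float], t: float) -> int:
    """Index of the sample closest in time to t (timestamps sorted ascending)."""
    if not timestamps:
        raise ValueError("no samples to align against")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    # Ties go to the earlier sample.
    return i if (after - t) < (t - before) else i - 1


def align(events: Sequence[Tuple[float, str]], timestamps: Sequence[float],
          values: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each (time, label) event with the nearest recorded value."""
    return [(label, values[nearest_sample(timestamps, t)]) for t, label in events]


# Invented data: heart rate sampled once per second, two events from a video.
heart_t = [0.0, 1.0, 2.0, 3.0]
heart_bpm = [62, 64, 71, 69]
video_events = [(0.4, "clip starts"), (2.2, "face appears")]
aligned = align(video_events, heart_t, heart_bpm)
```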
NSF Program Officers, September 10, 2008
How does SIDGrid use the TeraGrid?
Computationally intensive tasks
Speech, gesture, facial expression, and physiological measurements
Media transcoding for pitch analysis of audio tracks
Once stored in raw form, data streams converted to formats
compatible with software for annotation, coding, integration, analysis
fMRI image analysis
Workflows for massive job submissions and data transfers using Virtual Data System (VDS)
Workflows converted to concrete execution plan via Pegasus Grid planner
TeraGrid information service (MDS)
Replica location service (RLS)
The goal of GEON is to advance the field of geoinformatics and to prepare and train current and future generations of geoscience researchers, educators, and practitioners in the use of cyberinfrastructure to further their research, education, and professional goals.
GEON is providing several key features
data access, computational simulations, personal work spaces and analysis environments
identifying best practices with the objective of dramatically advancing geoscience research and
How does GEON use the TeraGrid?
Computationally intensive tasks
Ability to speedily construct earth models, access observed earthquake recordings, and simulate them efficiently to understand the subsurface structure and characteristics of seismic wave propagation
SYNSEIS (SYNthetic SEISmogram generation tool) provides access to seismic waveform data and simulates seismic records using 2D and 3D models.
Conduct advanced calculations for simulating seismic waveforms of either earthquakes or explosions at regional distances (< 1000 km).
GSI (security), GAMA (account management), GridFTP (data transfer), GRAM (job submission), MyWorkspace (job monitoring)
Account management for classroom use, MyProjects
QuakeSim - Some Design Choices
Build portals out of portlets (Java Standard)
Reuse capabilities from our Open Grid Computing Environments (OGCE) project, the REASoN GPS Explorer project, and many TeraGrid Science Gateways.
Decorate with Google Maps, Yahoo UI gadgets, etc.
Use Java Server Faces to build individual component portlets.
Build standalone tools, then convert to portlets at the very end.
Use simple Web Services for accessing codes and data.
Keep It Stateless …
Use Condor-G and Globus job and file management services for interacting with high performance computers.
TeraGrid
Favor Google Maps and Google Earth for their simplicity, interactivity and open APIs.
Generate KML and GeoRSS
Use Apache Maven based build and compile system, SVN
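Generating KML for Google Maps or Google Earth can be as small as the sketch below. It is shown in Python for brevity and is not QuakeSim code. The `placemark_kml` helper, the station name, and the coordinates are invented for the example; the namespace is the KML 2.2 one.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name: str, lon: float, lat: float) -> str:
    """Return a KML document containing a single point placemark."""
    ET.register_namespace("", KML_NS)  # serialize with a default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude,altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")


# Invented station name and location.
doc = placemark_kml("Example GPS station", -118.25, 34.05)
```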
[Figure: QuakeSim architecture. A browser interface talks to portlets and client stubs, which call WSDL-described services over SOAP/HTTP: a DB service (JDBC to a DB) on Host 1 (Quaketables), job submission/monitoring and file services over the operating and queuing systems on Host 2 (Grid), and a visualization or map service on Host 3 (G Maps).]
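Two Approaches to the Middle Tier
[Figure: fat client vs. thin client. Fat client: the portal component contains a grid client that talks to the grid service and backend resource over a grid protocol (SOAP). Thin client: the portal component talks over HTTP + SOAP to a web service, whose grid client reaches the grid service and backend resource.]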
Disloc output converted to KML and
“SWARM: Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters,” S. L. Pallickara and M. E. Pierce, Friday, December 12, 2 p.m. to 2:30 p.m.
http://escience2008.iu.edu/sessions/SWARM.shtml
[Figure: SWARM architecture. Behind a standard web service interface: request manager, resource ranking manager, data model manager, QBET web service, fault manager, and job execution manager (Condor-G with Birdbath). Per user: User A’s job board, User A’s resource pool (tokens for resources X, Y, Z), and User A’s job queue, feeding the job distributor; also an RDBMS and a MyProxy server. Targets are high performance computing clusters: grid-style clusters and Condor computing nodes.]
Membership-governed organization
40 institutional members, 9 foreign affiliates
Supports and promotes Earth science by developing and maintaining software for computational geophysics
How does CIG use the TeraGrid?
Seismograms allow scientists to understand the ground motion
Computationally intensive simulations run on TeraGrid, using an assortment of 3D and 1D earth models, produce synthetic seismograms
Necessary input datasets provided via the portal
Daemon (Python, Pyre) constantly polls the web site looking for work to do
Uses GSI-OpenSSH and MyProxy credentials to submit jobs, monitor jobs, and transfer output back to the portal
Posts status updates to the web site using HTTP POST
Users can download results in ASCII and Seismic Analysis Code (SAC) format
Visualizations include "beachball" graphics depicting the earthquake's source mechanism, and maps showing the locations of the earthquake and the seismic stations using GMT (http://gmt.soest.hawaii.edu/)
Researchers quickly receive results and can concentrate on the scientific aspects of the output rather than on the details of running the analysis on a supercomputer
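The poll-run-report cycle of such a daemon can be sketched as follows. This is not the CIG daemon and does not use Pyre. `fetch_work`, `run_job`, and `post_status` are hypothetical callables standing in for the web site query, the GSI-OpenSSH job handling, and the HTTP POST status update.

```python
import time
from typing import Callable, Optional


def run_daemon(fetch_work: Callable[[], Optional[dict]],
               run_job: Callable[[dict], None],
               post_status: Callable[[int, str], None],
               poll_seconds: float = 30.0,
               max_polls: Optional[int] = None,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll for work, run each job found, and report status after every step."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_work()
        if job is None:
            sleep(poll_seconds)  # nothing to do: wait before asking again
            continue
        post_status(job["id"], "running")
        try:
            run_job(job)
        except Exception as exc:  # report the failure and keep the daemon alive
            post_status(job["id"], f"failed: {exc}")
        else:
            post_status(job["id"], "done")


# Dry run with in-memory stand-ins: one queued job, then an empty queue.
queue = [{"id": 1, "model": "3D"}]
statuses = []
run_daemon(
    fetch_work=lambda: queue.pop(0) if queue else None,
    run_job=lambda job: None,
    post_status=lambda job_id, status: statuses.append((job_id, status)),
    max_polls=2,
    sleep=lambda seconds: None,
)
```

Because the daemon initiates every exchange, this pattern works with outbound connections only.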
Future Directions
Parameter explorations
Conclusions
Technical requirements of some PSEs dictate seamless access to high-end compute and data resources
A robust, flexible and scalable infrastructure can provide a foundation for many PSEs
PSEs themselves must be treated as sustainable infrastructure