Building Science Gateways
Marlon Pierce
Tutorial Overview
Type | Title | Presenter
Talk | Gateways overview | Marlon
Talk | OGCE overview | Marlon
Talk | TeraGrid: Resources Overview | Simms
Break
Demo | LEAD Portal and workflows | Suresh
Demo | GridChem Workflow | Suresh
Demo | OGCE and TGUP Portals | Marlon
There’s More
Type | Title | Presenter
Hands-On | OGCE, LEAD, and TGUP portals and workflows | Marlon, Suresh
Talk/HO | Building the OGCE Portal | Marlon
Talk/HO | Building gadgets with GTLAB | Marlon
Break (2:00-2:30)
Talk | Web 2.0 for Science Gateways (Optional) | Marlon
Slides and Demo Site
• Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials
• We run a permanent demo portal at https://community.ucs.indiana.edu:8443/gridsphere/
– Also aliased as https://ogceportal.iu.teragrid.org:8443/gridsphere
• Portal accounts train01-train30 have been created for the workshop. Password is the same as the account name.
– Also train31-train49 from TG08 workshop.
• We also have TeraGrid training accounts with names train01-train30 that can be used to retrieve TG proxy credentials. These should be active all week.
Concept #1: Web Portal
• Web container that aggregates content from multiple sources into a single display.
o “Start Pages”
• Typically consume RSS/Atom news feeds.
• More powerful versions these days support Flickr, calendars, games, etc.
o Gadgets, widgets
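As a rough illustration of the aggregation idea above, the following Python sketch merges items from two RSS documents into one list of display rows, much as a start page would. The feed contents, the example.org links, and the function names `parse_feed` and `aggregate` are all made up for illustration; a real portal would fetch the feeds over HTTP and render HTML.

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 2.0 documents standing in for remote feeds (hypothetical content).
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Job queue status</title><link>http://example.org/a/1</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>New data set released</title><link>http://example.org/b/1</link></item>
<item><title>Maintenance window</title><link>http://example.org/b/2</link></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return (channel title, [(item title, item link), ...]) for one RSS document."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


def aggregate(feeds):
    """Merge several feeds into a single list of rows for one display."""
    rows = []
    for xml_text in feeds:
        source, items = parse_feed(xml_text)
        for title, link in items:
            rows.append({"source": source, "title": title, "link": link})
    return rows


page = aggregate([FEED_A, FEED_B])
for row in page:
    print(f"[{row['source']}] {row['title']} <{row['link']}>")
```

Each gadget or portlet on a start page plays roughly the role of one `parse_feed` call, and the container plays the role of `aggregate`.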
Concept #2: Grid Computing
Grid computing software is designed to integrate large supercomputing facilities.
TeraGrid, Open Science Grid, EGEE, etc.
This is done via network services
Software providers in the US include Globus and Condor
Key Service Components (and example services)
Authentication and authorization framework (MyProxy)
Remote process access and control (GRAM, Condor)
Remote file, I/O access (GridFTP, SRB, RFT)
Additional Services
Information services, replica management, database federation, storage management, schedulers, etc.
Example Grid Software Stacks: CTSS and VDT
For TeraGrid and Open Science Grid, respectively
Science Portals and Gateways
• Science Gateways adapt Web portal technology to build user interfaces to the Grid.
• Science portals resemble standard portals, but must also
– Support access to computing and storage resources.
– Allow users remote, direct access to these resources.
• You often want to run applications and access data that you own directly.
– Provide access to science applications and data sets.
• And we must provide value added services as
Example Science Gateways
• Many listed here:
– http://www.teragrid.org/programs/sci_gateways/
• Cover many different scientific fields:
– Atmospheric science, geophysics, computational chemistry, bioinformatics, etc.
• See also GCE08 workshop at SC08 and earlier proceedings
– http://www.collab-ogce.org/gce08/index.php/Main_Page
TeraGrid Science Gateways Program
Slides courtesy of Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways
Does a gateway have to use TeraGrid to be a gateway?
• No, but the TeraGrid does fund the development and support of these gateways
– Using high end resources is more work and is not recommended unless it serves a demonstrated need
• Gateways are an excellent way to extend the impact of high-end resources
• Are they all funded by TeraGrid?
– Can TeraGrid claim success for all gateways?
• No, we don’t make the gateways you use, we make the gateways you use better
– TeraGrid does fund a small number of developers to provide advanced support.
Why are gateways worth the effort?
• Increasing range of expertise needed to tackle the most challenging scientific problems
– How many details do you want each individual scientist to need to know?
• PBS, RSL, Condor
• Coupling multi-scale codes
• Assembling data from multiple sources
• Collaboration frameworks
#! /bin/sh
#PBS -q dque
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:02:00
#PBS -o pbs.out
#PBS -e pbs.err
#PBS -V
cd /users/wilkinsn/tutorial/exercise_3
../bin/mcell nmj_recon.main.mdl

+(
  &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
   (executable="/users/birnbaum/tutorial/bin/mcell")
   (arguments=nmj_recon.main.mdl)
   (count=128)
   (hostCount=10)
   (maxtime=2)
   (directory="/users/birnbaum/tutorial/exercise_3")
   (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
   (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
)

=======
# Full path to executable
executable=/users/wilkinsn/tutorial/bin/mcell
# Working directory, where Condor-G will write
# its output and error files on the local machine.
initialdir=/users/wilkinsn/tutorial/exercise_3
# To set the working directory of the remote job, we
# specify it in this globus RSL, which will be appended
# to the RSL that Condor-G generates
globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')
# Arguments to pass to executable.
arguments=nmj_recon.main.mdl
# Condor-G can stage the executable
transfer_executable=false
# Specify the globus resource to execute the job
globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs
# Condor has multiple universes, but Condor-G always uses globus
universe=globus
# Files to receive stdout and stderr.
output=condor.out
error=condor.err
# Specify the number of copies of the job to submit to the condor queue.
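The job descriptions above are the kind of detail a gateway can keep away from the scientist. As a minimal sketch, assuming a gateway that collects a few values from a web form, the Python function below renders a PBS script like the first one shown. The function name `render_pbs_script` and its parameters are hypothetical, and the example values are simply those from the slide.

```python
def render_pbs_script(queue, nodes, ppn, walltime, workdir, command):
    """Build the text of a PBS batch script from a handful of form values.

    Only the directives that appear in the slide's example are emitted.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -q {queue}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        "#PBS -o pbs.out",
        "#PBS -e pbs.err",
        "#PBS -V",
        f"cd {workdir}",
        command,
    ]
    return "\n".join(lines) + "\n"


# Values a user might have typed into a gateway form (taken from the slide).
script = render_pbs_script(
    queue="dque",
    nodes=1,
    ppn=2,
    walltime="00:02:00",
    workdir="/users/wilkinsn/tutorial/exercise_3",
    command="../bin/mcell nmj_recon.main.mdl",
)
print(script)
```

The user supplies the science inputs; the queue name, directive syntax, and paths stay inside the gateway. The same idea would apply to generating an RSL string or a Condor-G submit file.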
Not just ease of use
What can scientists do that they couldn’t do previously?
• LEAD - access to radar data
• NVO – access to sky surveys
• OOI – access to sensor data
• PolarGrid – access to polar ice sheet data
• SIDGrid – analysis tools
• GridChem – developing multiscale coupling
Gateways Greatly Expand Access
• Almost anyone can investigate scientific questions using high-end resources
– Not just those in the research groups of those who request allocations
– Gateways allow anyone with a web browser to explore
• Opportunities can be uncovered via Google
– Nancy’s 11-year-old son discovered nanoHUB.org himself while his class was studying Bucky Balls
• Fosters new ideas, cross-disciplinary approaches
• Encourages students to experiment
• But used in production too
– Significant number of papers resulting from gateways including GridChem, nanoHUB
TeraGrid Pathways Activities
• Program funding to involve MSI communities
• 2 Gateway components
– Adapt gateways for educational use by underrepresented communities
• GEON – SDSC, Navajo Tech
– Teach participants from underrepresented communities how to build gateways
Navajo Technical College and gateways
• Incorporating the use of gateways in their curricula
PolarGrid
• Cyberinfrastructure Center for Polar Science (CICPS)
– Experts in polar science, remote sensing and cyberinfrastructure
– Indiana, ECSU, CReSIS
• Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland
– Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes
http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v
• Components of PolarGrid
– Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster
– Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State (ECSU) split between research, education and training.
– Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system
• Access to expensive data
• High-end resources for analysis
• MSI student involvement
Recent Gateways using TeraGrid Significantly
• SCEC
• SIDGrid
SCEC using gateway to produce hazard map
• PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway
• Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.
Social Informatics Data Grid
• Heavy use of “multimodal” data.
– Subject might be viewing a video, while a researcher collects heart rate and eye movement data.
• Events must be synchronized for analysis; large datasets result.
• Extensive analysis capabilities are not something that each researcher should have to create for themselves.
• Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.
• SIDGrid enables a number of capabilities.
– Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact.
– Geographically distant researchers can collaborate on the analysis of the same data set.
– Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts.
– All researchers now have access to the highest quality computational resources
• SIDGrid uses TeraGrid resources for computationally-intensive tasks such as media transcoding algorithms for pitch analysis of audio tracks and fMRI image analysis
• SIDGrid is unique among social science data archive projects
– Focused on streaming data which change over time
– Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously
• 40 institutional members
– 9 foreign affiliates
• Researchers request synthetic seismograms for any given earthquake
– Allows scientists to understand the ground motion associated with any given earthquake
Talks at E-Science
• See the PSE Workshop: http://escience2008.iu.edu/workshops/innovative/index.shtml
– Friday, 10:00 am-4:30 pm
• Nancy Wilkins-Diehr will have more to say about some of these gateways.
• See also Rich Wolski’s keynote on cloud computing. Next generation gateways will (need to) support cloud computing and virtual machine-based backends.
Getting Started Building a Gateway
When might a gateway be appropriate?
• Researchers using defined sets of tools in different ways
– Same executables, different input
• GridChem, CHARMM
– Creating multi-scale or complex workflows
– Datasets
• Common data formats
– National Virtual Observatory
– Earth System Grid
– Some groups have invested significant efforts here
• caBIG, extensive discussions to develop common terminology and formats
• BIRN, extensive data sharing agreements
• Difficult to access data/advanced workflows
– Sensor/radar input
Advanced support for OCI resource
Including gateway integration
• Same peer review process used to request resources
– 30,000 CPUs
– + 6 months of Nancy
• Reviews based on appropriate use of resources; science is not reviewed if already funded
• Petascale
• Multisite workflows
• Gateways
• Domain expertise
Support is Very Targeted
• Start with well-defined objectives
– Focus on efficient or novel use of OCI resources
• Access to minimum 0.25 FTE for months to a year
– Enough investment to really understand and help solve complex problems
• Must have commitment from PIs
– Want to make sure work is incorporated into production codes and gateways
• Good candidates for targeted support include:
– Large, high impact projects
– Ability to influence new communities
[Figure: three-tier service architecture spread across Host 1, Host 2, and Host 3. Labels: Browser Interface; Portlets + Client Stubs; DB Service, JDBC, DB; Job Sub/Mon and File Services; Operating and Queuing Systems; Visualization Service; WSDL interfaces on each service. Links: HTTP(S) and SOAP/HTTP.]
My 2002 “octopus” SOA diagram, from the archives.
Terminology
• Portlet: this is a standard Java component that generates HTML and can also act as a client to a remote service.
– Lives in a portal container.
– I will also use this term generically.
• Web Service: a remotely invocable function on the Internet.
– SOAP: the XML message envelope for carrying commands over HTTP.
– WSDL: describes the service’s API in XML.
– REST: A variation of this approach.
• Lots more info:
But Why?
• Three-tiered Service Oriented Architecture is the network equivalent of the famous Model-View-Controller design pattern.
– View: the user interface components.
– Controller: Web service middleware
– Model: the backend resources.
• Independence of tiers gives flexibility
– Services can be reused with alternative user interfaces
• Workflow composers like Taverna, Xbaya, Kepler
– User interfaces can work with different service implementations.
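The independence of tiers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`JobService`, `BatchQueueService`, `VirtualMachineService`, `portlet_view`, `workflow_view`) are hypothetical, and the "services" are in-process stand-ins rather than real Web services.

```python
from abc import ABC, abstractmethod


class JobService(ABC):
    """Middle tier (controller): the only thing a user interface sees."""

    @abstractmethod
    def submit(self, executable, arguments):
        """Start a job and return an identifier for it."""


class BatchQueueService(JobService):
    """Hypothetical implementation in front of a batch queue (model tier)."""

    def __init__(self):
        self._count = 0

    def submit(self, executable, arguments):
        self._count += 1
        return f"batch-{self._count}"


class VirtualMachineService(JobService):
    """Hypothetical alternative implementation in front of a VM backend."""

    def __init__(self):
        self._count = 0

    def submit(self, executable, arguments):
        self._count += 1
        return f"vm-{self._count}"


def portlet_view(service, executable, arguments):
    """View #1: a portlet-style component that renders an HTML fragment."""
    job_id = service.submit(executable, arguments)
    return f"<p>Submitted job {job_id}</p>"


def workflow_view(service, steps):
    """View #2: a workflow-composer-style client reusing the same service API."""
    return [service.submit(exe, args) for exe, args in steps]


html = portlet_view(BatchQueueService(), "mcell", "nmj_recon.main.mdl")
ids = workflow_view(VirtualMachineService(), [("filter", "in.sdf"), ("dock", "out.sdf")])
print(html)
print(ids)
```

Either view works with either service because both depend only on the `JobService` interface, which is the flexibility the bullets above describe.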
Two Approaches to the Middle Tier
[Figure: fat client versus thin client. Labels: Portal Component, Grid Client, Web Service, Grid Service, Backend Resource. Links: Grid Protocol (SOAP) and HTTP + SOAP.]
Managing Scientific Workflows
Scientific Workflows
• Portal interfaces encode scientific use cases.
• If you have a rich set of services, it is a lot of work to make portlets for all possible use cases.
• And power users will always want something more.
• Example: our CICC project has dozens of chemical informatics Web services.
– http://www.chembiogrid.org.wiki
• Workflow composers can simplify this.
– Allow users to encode and execute their own use
Web Services and Workflows
• Perform a similarity search on the NIH DTP Human Tumor data.
• Filter the results based on Pharmacokinetic properties (FILTER)
• Convert to 3D (OMEGA)
• Docking into a pre-defined protein (FRED)
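The four steps above form a linear workflow in which each service consumes the previous one's output. The Python sketch below shows that chaining pattern only. Every function is a placeholder that returns invented data; none of them calls FILTER, OMEGA, FRED, or any real service, and the compound records and field names are assumptions made for illustration.

```python
def similarity_search(query):
    """Placeholder for the similarity search service (invented results)."""
    return [
        {"id": "cmpd-1", "similarity": 0.91, "passes_filter": True},
        {"id": "cmpd-2", "similarity": 0.85, "passes_filter": False},
        {"id": "cmpd-3", "similarity": 0.78, "passes_filter": True},
    ]


def filter_properties(compounds):
    """Placeholder for the property filtering step."""
    return [c for c in compounds if c["passes_filter"]]


def convert_to_3d(compounds):
    """Placeholder for the 2D-to-3D conversion step."""
    return [dict(c, has_3d=True) for c in compounds]


def dock(compounds):
    """Placeholder for the docking step; only 3D structures can be docked."""
    return [dict(c, docked=c["has_3d"]) for c in compounds]


def run_workflow(steps, data):
    """Feed the output of each step into the next and record a short trace."""
    trace = []
    for step in steps:
        data = step(data)
        trace.append((step.__name__, len(data)))
    return data, trace


WORKFLOW = [similarity_search, filter_properties, convert_to_3d, dock]
results, trace = run_workflow(WORKFLOW, "query-structure")
for name, count in trace:
    print(f"{name}: {count} compounds")
```

A workflow composer lets the user assemble and reorder a list like `WORKFLOW` graphically instead of waiting for a portlet to be written for each combination.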
OGCE’s XBaya Workflow
Updating the Octopus
[Figure: three-tier service architecture spread across Host 1, Host 2, and Host 3. Labels: Browser Interface; Social Gadgets + AJAX; DB Service, JDBC, DB; Job Sub/Mon and File Services; Operating and Queuing Systems; Visualization Service; REST interfaces on the services, with one WSDL label remaining. Links: HTTP(S) and RSS, JSON/HTTP.]
Enterprise Approach | Web 2.0 Approach
JSR 168 Portlets | Gadgets, Widgets
Server-side integration and processing | AJAX, client-side integration and processing, JavaScript
SOAP | RSS, Atom, JSON
WSDL | REST (GET, PUT, DELETE, POST)
Portlet Containers | Open Social Containers (Orkut, LinkedIn, Shindig); Facebook; StartPages
User Centric Gateways | Social Networking Portals
Workflow managers (Taverna, Kepler, etc) | Mash-ups
Grid computing: Globus, Condor, etc | Cloud computing: Amazon WS Suite, Xen Virtualization
Semantic Web: RDF, OWL, | Microformats, KML, and GeoRSS feeds used to
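To make the SOAP versus REST/JSON rows of the comparison concrete, here is a small Python sketch that expresses one request in both styles. The operation name `getJobStatus`, the `jobId` parameter, the example.org URL, and the JSON reply are all hypothetical; only the SOAP 1.1 envelope namespace is a real, fixed value. Nothing is sent over the network.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def soap_request(operation, params):
    """Enterprise style: wrap the call in an XML envelope that would be POSTed."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(base_url, resource, params):
    """Web 2.0 style: the HTTP verb and URL carry the whole request."""
    return "GET", f"{base_url}/{resource}?{urlencode(params)}"


soap_xml = soap_request("getJobStatus", {"jobId": 42})
method, url = rest_request("http://example.org", "jobs", {"id": 42})

# A REST service would typically answer with JSON that a browser-side
# gadget can consume directly (hypothetical reply shown here).
reply = json.loads('{"id": 42, "status": "DONE"}')

print(soap_xml)
print(method, url)
print(reply["status"])
```

The SOAP form needs an XML toolkit, and usually WSDL-generated stubs, on the client; the REST form can be issued from JavaScript in the browser, which is what moves integration to the client side.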
More Information
• Contact me: mpierce@cs.indiana.edu
• See what I’m up to: http://communitygrids.blogspot.com/
• OGCE software: http://collab-ogce.org/
Tremendous Opportunities Using the Largest Shared Resources - Challenges too!
• What’s different when the resource doesn’t belong just to me?
– Resource discovery
– Accounting
– Security
– Proposal-based requests for resources (peer-reviewed access)
• Code scaling and performance numbers
• Justification of resources
• Gateway citations
• Tremendous benefits at the high end, but even more work for the developers
• Potential impact on science is huge
– Small number of developers can impact thousands of scientists
Gateways can further investments in other projects
• Increase access
– To instruments
• Increase capabilities
– To analyze data
• Improve workforce development
– For underserved populations