Building Science Gateways


Marlon Pierce

(2)

Tutorial Overview

Type    Title                           Presenter
Talk    Gateways overview               Marlon
Talks   OGCE overview                   Marlon
Talk    TeraGrid: Resources Overview    Simms
Break
Demo    LEAD Portal and workflows       Suresh
Demo    GridChem Workflow               Suresh
Demo    OGCE and TGUP Portals           Marlon

(3)

There’s More

Type       Title                                         Presenter
Hands-On   OGCE, LEAD, and TGUP portals and workflows    Marlon, Suresh
Talk/HO    Building the OGCE Portal                      Marlon
Talk/HO    Building gadgets with GTLAB                   Marlon
Break (2:00-2:30)
Talk       Web 2.0 for Science Gateways (Optional)       Marlon

(4)

Slides and Demo Site

• Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials

• We run a permanent demo portal at https://community.ucs.indiana.edu:8443/gridsphere/

– Also aliased as https://ogceportal.iu.teragrid.org:8443/gridsphere

• Portal accounts train01-train30 have been created for the workshop. Password is the same as the account name.

– Also train31-train49 from TG08 workshop.

• We also have TeraGrid training accounts with names train01-train30 that can be used to retrieve TG proxy credentials.

– These should be active all week.

(5)

Concept #1: Web Portal

• Web container that aggregates content from multiple sources into a single display.

o “Start Pages”

• Typically consume RSS/Atom news feeds.

• More powerful versions these days support Flickr, calendars, games, etc.

o Gadgets, widgets
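To make the aggregation idea concrete, here is a minimal sketch of a "start page" that merges items from several RSS feeds into one display. The two inline feeds, their titles, and the example.org links are made-up stand-ins for real remote feeds; a real portal would fetch them over HTTP and render HTML rather than plain text.

```python
import xml.etree.ElementTree as ET

# Two tiny inline RSS 2.0 documents stand in for remote feeds (made-up content).
FEED_A = """<rss version="2.0"><channel><title>Gateway News</title>
<item><title>Portal released</title><link>http://example.org/a1</link></item>
<item><title>Workshop announced</title><link>http://example.org/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Cluster Status</title>
<item><title>Queue maintenance</title><link>http://example.org/b1</link></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return (channel_title, [(item_title, link), ...]) for one RSS 2.0 document."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


def render_start_page(feeds):
    """Aggregate several feeds into a single plain-text display."""
    lines = []
    for xml_text in feeds:
        title, items = parse_feed(xml_text)
        lines.append(f"[{title}]")
        lines.extend(f"  - {text} <{link}>" for text, link in items)
    return "\n".join(lines)


page = render_start_page([FEED_A, FEED_B])
print(page)
```

Each feed becomes one titled section of the page, which is roughly what a gadget or portlet does for its slice of the display.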

(6)

Gadget

(7)

Concept #2: Grid Computing

• Grid computing software is designed to integrate large supercomputing facilities.

– TeraGrid, Open Science Grid, EGEE, etc.

– This is done via network services.

• Software providers in the US include Globus and Condor.

• Key service components (and example services)

– Authentication and authorization framework (MyProxy)

– Remote process access and control (GRAM, Condor)

– Remote file, I/O access (GridFTP, SRB, RFT)

• Additional services

– Information services, replica management, database federation, storage management, schedulers, etc.

• Example Grid software stacks: CTSS and VDT

– For TeraGrid and Open Science Grid, respectively
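The three key service roles can be illustrated with a small sketch. Everything here is hypothetical: the class names, method names, the /bin/simulate path, and the string "proxy" are in-memory stand-ins that only mirror the roles listed on the slide. None of this is the actual MyProxy, GRAM, or GridFTP API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCredentialService:
    """Stand-in for the authentication role (slide example: MyProxy)."""
    passwords: dict

    def get_proxy(self, user, password):
        if self.passwords.get(user) != password:
            raise PermissionError(f"bad credentials for {user}")
        return f"proxy-for-{user}"  # a real service returns a short-lived credential


@dataclass
class FakeFileService:
    """Stand-in for remote file access (slide examples: GridFTP, SRB, RFT)."""
    files: dict = field(default_factory=dict)

    def put(self, proxy, remote_path, data):
        self.files[remote_path] = data


@dataclass
class FakeJobService:
    """Stand-in for remote process control (slide examples: GRAM, Condor)."""
    jobs: list = field(default_factory=list)

    def submit(self, proxy, executable, arguments):
        self.jobs.append((proxy, executable, tuple(arguments)))
        return len(self.jobs)  # job id


def run_via_gateway(auth, files, jobs, user, password, input_name, input_data):
    """Authenticate, stage the input file, then submit the job."""
    proxy = auth.get_proxy(user, password)
    files.put(proxy, input_name, input_data)
    return jobs.submit(proxy, "/bin/simulate", [input_name])  # hypothetical executable


auth = FakeCredentialService({"train01": "train01"})
files = FakeFileService()
jobs = FakeJobService()
job_id = run_via_gateway(auth, files, jobs, "train01", "train01", "input.dat", b"data")
print(job_id, jobs.jobs)
```

The point of the sketch is the ordering: a credential is obtained first and then presented to the file and job services.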

(8)
(9)

Science Portals and Gateways

Science Gateways adapt Web portal technology to build user interfaces to the Grid.

Science portals resemble standard portals, but must also:

– Support access to computing and storage resources.

– Allow users remote, direct access to these resources.

• You often want to run applications and access data that you own directly.

– Provide access to science applications and data sets.

And we must provide value added services as

(10)

Example Science Gateways

Many listed here:

– http://www.teragrid.org/programs/sci_gateways/

Cover many different scientific fields:

– Atmospheric science, geophysics, computational chemistry, bioinformatics, etc.

See also GCE08 workshop at SC08 and earlier proceedings

– http://www.collab-ogce.org/gce08/index.php/Main_Page

(11)

TeraGrid Science Gateways Program

Slides courtesy of Nancy Wilkins-Diehr TeraGrid Area Director for Science Gateways

(12)
(13)

Does a gateway have to use TeraGrid to be a gateway?

No, but the TeraGrid does fund the development and support of these gateways

– Using high end resources is more work and is not recommended unless it serves a demonstrated need

Gateways are an excellent way to extend the impact of high-end resources

Are they all funded by TeraGrid?

– Can TeraGrid claim success for all gateways?

No, we don’t make the gateways you use, we make the gateways you use better

– TeraGrid does fund a small number of developers to provide advanced support.

(14)

Why are gateways worth the effort?

• Increasing range of expertise needed to tackle the most challenging scientific problems

– How many details do you want each individual scientist to need to know?

• PBS, RSL, Condor

• Coupling multi-scale codes

• Assembling data from multiple sources

• Collaboration frameworks

#! /bin/sh
#PBS -q dque
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:02:00
#PBS -o pbs.out
#PBS -e pbs.err
#PBS -V
cd /users/wilkinsn/tutorial/exercise_3
../bin/mcell nmj_recon.main.mdl

+( &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
   (executable="/users/birnbaum/tutorial/bin/mcell")
   (arguments=nmj_recon.main.mdl)
   (count=128)
   (hostCount=10)
   (maxtime=2)
   (directory="/users/birnbaum/tutorial/exercise_3")
   (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
   (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
)

=======

# Full path to executable
executable=/users/wilkinsn/tutorial/bin/mcell

# Working directory, where Condor-G will write
# its output and error files on the local machine.
initialdir=/users/wilkinsn/tutorial/exercise_3

# To set the working directory of the remote job, we
# specify it in this globus RSL, which will be appended
# to the RSL that Condor-G generates
globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')

# Arguments to pass to executable.
arguments=nmj_recon.main.mdl

# Condor-G can stage the executable
transfer_executable=false

# Specify the globus resource to execute the job
globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs

# Condor has multiple universes, but Condor-G always uses globus
universe=globus

# Files to receive stdout and stderr.
output=condor.out
error=condor.err

# Specify the number of copies of the job to submit to the condor queue.
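One way a gateway can hide these details is to keep a single job description and generate each format from it. The sketch below is an illustration under assumptions: the JobSpec class and the three to_* functions are hypothetical helpers, and they emit only a subset of the directives shown above (walltime formatting assumes less than 60 minutes). The field values are copied from the slide.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical single description of a job, as a gateway might store it."""
    executable: str
    argument: str
    directory: str
    contact: str           # Globus resource manager contact string
    walltime_minutes: int  # assumed to be under 60 in this sketch


def to_pbs(job):
    """Render a PBS batch script (subset of the directives on the slide)."""
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -l walltime=00:{job.walltime_minutes:02d}:00",
        "#PBS -o pbs.out",
        "#PBS -e pbs.err",
        f"cd {job.directory}",
        f"{job.executable} {job.argument}",
    ])


def to_rsl(job):
    """Render a Globus RSL string (subset of the attributes on the slide)."""
    return (f'&(resourceManagerContact="{job.contact}")'
            f'(executable="{job.executable}")'
            f'(arguments={job.argument})'
            f'(maxtime={job.walltime_minutes})'
            f'(directory="{job.directory}")')


def to_condor_g(job):
    """Render a Condor-G submit description (subset of the slide's entries)."""
    return "\n".join([
        f"executable={job.executable}",
        f"arguments={job.argument}",
        f"globusrsl=(directory='{job.directory}')",
        f"globusscheduler={job.contact}",
        "universe=globus",
        "output=condor.out",
        "error=condor.err",
    ])


job = JobSpec(
    executable="/users/wilkinsn/tutorial/bin/mcell",
    argument="nmj_recon.main.mdl",
    directory="/users/wilkinsn/tutorial/exercise_3",
    contact="tg-login1.sdsc.teragrid.org/jobmanager-pbs",
    walltime_minutes=2,
)
for render in (to_pbs, to_rsl, to_condor_g):
    print(render(job), end="\n\n")
```

The scientist fills in the fields once through a web form; the gateway decides which backend syntax to produce.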

(15)

Not just ease of use

What can scientists do that they couldn’t do previously?

LEAD – access to radar data

NVO – access to sky surveys

OOI – access to sensor data

PolarGrid – access to polar ice sheet data

SIDGrid – analysis tools

GridChem – developing multiscale coupling

(16)

Gateways Greatly Expand Access

• Almost anyone can investigate scientific questions using high end resources

– Not just those in the research groups of those who request allocations

– Gateways allow anyone with a web browser to explore

• Opportunities can be uncovered via google

– Nancy’s 11-year-old son discovered nanoHUB.org himself while his class was studying Bucky Balls

• Fosters new ideas, cross-disciplinary approaches

• Encourages students to experiment

• But used in production too

– Significant number of papers resulting from gateways including GridChem, nanoHUB

(17)

TeraGrid Pathways Activities

Program funding to involve MSI communities

2 Gateway components

– Adapt gateways for educational use by underrepresented communities

• GEON – SDSC, Navajo Tech

– Teach participants from underrepresented communities how to build gateways

(18)

Navajo Technical College and gateways

• Incorporating the use of gateways in their curricula

(19)

PolarGrid

• Cyberinfrastructure Center for Polar Science (CICPS)

– Experts in polar science, remote sensing and cyberinfrastructure

– Indiana, ECSU, CReSIS

• Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland

– Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes

http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v

(20)

• Components of PolarGrid

– Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster

– Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State University (ECSU), split between research, education and training.

– Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system

• Access to expensive data

• High-end resources for analysis

• MSI student involvement

(21)

Recent Gateways using TeraGrid Significantly

SCEC

SIDGrid

(22)

SCEC using gateway to produce hazard map

• PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway

• Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.

(23)

Social Informatics Data Grid

• Heavy use of “multimodal” data.

– Subject might be viewing a video, while a researcher collects heart rate and eye movement data.

• Events must be synchronized for analysis; large datasets result.

• Extensive analysis capabilities are not something that each researcher should have to create for themselves.

(24)

• Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.

• SIDGrid enables a number of capabilities.

– Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact.

– Geographically distant researchers can collaborate on the analysis of the same data set.

– Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts.

– All researchers now have access to the highest quality computational resources

• SIDGrid uses TeraGrid resources for computationally-intensive tasks such as media transcoding algorithms for pitch analysis of audio tracks and fMRI image analysis

• SIDGrid is unique among social science data archive projects

– Focused on streaming data which change over time

– Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously

(25)

• 40 institutional members

– 9 foreign affiliates

• Researchers request synthetic seismograms for any given earthquake

– Allows scientists to understand the ground motion associated with any given earthquake

(26)

Talks at E-Science

See the PSE Workshop: http://escience2008.iu.edu/workshops/innovative/index.shtml

– Friday, 10:00 am-4:30 pm

Nancy Wilkins-Diehr will have more to say about some of these gateways.

See also Rich Wolski’s keynote on cloud computing.

Next generation gateways will (need to) support cloud computing and virtual machine-based backends.

(27)

Getting Started Building a Gateway

(28)

When might a gateway be appropriate?

• Researchers using defined sets of tools in different ways

– Same executables, different input

• GridChem, CHARMM

– Creating multi-scale or complex workflows

– Datasets

• Common data formats

– National Virtual Observatory

– Earth System Grid

– Some groups have invested significant efforts here

• caBIG, extensive discussions to develop common terminology and formats

• BIRN, extensive data sharing agreements

• Difficult to access data/advanced workflows

– Sensor/radar input

(29)

Advanced support for OCI resource

Including gateway integration

Same peer review process used to request resources

– 30,000 CPUs

– + 6 months of Nancy

Reviews based on appropriate use of resources; science is not reviewed if already funded

Petascale

Multisite workflows

Gateways

Domain expertise

(30)

Support is Very Targeted

• Start with well-defined objectives

– Focus on efficient or novel use of OCI resources

• Access to minimum 0.25 FTE for months to a year

– Enough investment to really understand and help solve complex problems

• Must have commitment from PIs

– Want to make sure work is incorporated into production codes and gateways

• Good candidates for targeted support include:

– Large, high impact projects

– Ability to influence new communities

(31)
(32)

[Figure: My 2002 “octopus” SOA diagram, from the archives. A browser interface talks HTTP(S) to portlets and client stubs; these talk SOAP/HTTP to WSDL-described services spread over Host 1, Host 2, and Host 3: a DB service (JDBC to a DB), job submission/monitoring and file services (over operating and queuing systems), and a visualization service.]

Terminology

Portlet: this is a standard Java component that generates HTML and can also act as a client to a remote service.

– Lives in a portal container.

– I will also use this term generically.

Web Service: a remotely invocable function on the Internet.

– SOAP: the XML message envelope for carrying commands over HTTP.

– WSDL: describes the service’s API in XML.

– REST: A variation of this approach.
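As a rough illustration of the difference, the sketch below builds the same request two ways: wrapped in a SOAP 1.1 envelope, and as a REST-style GET URL. The operation name getJobStatus, the jobId parameter, and the example.org namespace and endpoint are hypothetical; only the SOAP 1.1 envelope namespace is standard. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/jobs"                 # hypothetical service namespace


def soap_request(operation, params):
    """Wrap an operation call in a SOAP envelope; the result would be POSTed over HTTP."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def rest_request(base_url, resource, params):
    """Express the same call as an HTTP verb plus a URL."""
    return "GET", f"{base_url}/{resource}?{urlencode(params)}"


soap_xml = soap_request("getJobStatus", {"jobId": 42})
rest_call = rest_request("http://example.org/jobs", "status", {"jobId": 42})
print(soap_xml)
print(rest_call)
```

In the SOAP case the operation lives inside the message body, and a WSDL document would describe it; in the REST case the HTTP verb and URL carry the same information.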

Lots more info:

(34)

But Why?

• Three-tiered Service Oriented Architecture is the network equivalent of the famous Model-View-Controller design pattern.

– View: the user interface components.

– Controller: Web service middleware

– Model: the backend resources.

• Independence of tiers gives flexibility

– Services can be reused with alternative user interfaces

• Workflow composers like Taverna, XBaya, Kepler

– User interfaces can work with different service implementations.
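The independence of tiers can be sketched in a few lines. All class and function names below are hypothetical, and the "backend" is an in-memory dictionary rather than a real queuing system; the sketch only shows one service being reused by two different user interfaces.

```python
class QueueBackend:
    """Model tier: stands in for a backend resource such as a batch queue."""

    def __init__(self):
        self._jobs = {1: "RUNNING", 2: "DONE"}

    def status(self, job_id):
        return self._jobs[job_id]


class JobStatusService:
    """Controller tier: the middleware that user interfaces call."""

    def __init__(self, backend):
        self._backend = backend

    def get_status(self, job_id):
        return {"job": job_id, "status": self._backend.status(job_id)}


def html_view(service, job_id):
    """View tier, variant 1: a portlet-style HTML fragment."""
    info = service.get_status(job_id)
    return f"<p>Job {info['job']}: {info['status']}</p>"


def text_view(service, job_id):
    """View tier, variant 2: a plain-text client, e.g. for a workflow tool."""
    info = service.get_status(job_id)
    return f"job={info['job']} status={info['status']}"


service = JobStatusService(QueueBackend())
print(html_view(service, 1))
print(text_view(service, 2))
```

Swapping QueueBackend for another object with a status method would leave both views untouched, which is the second half of the flexibility argument.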

(35)

Two Approaches to the Middle Tier

[Figure: fat client vs. thin client. Labels recoverable from the diagram: portal components, Grid client, Grid protocol (SOAP), Web service, HTTP + SOAP, Grid services, backend resources.]

(36)

Managing Scientific Workflows

(37)

Scientific Workflows

Portal interfaces encode scientific use cases.

If you have a rich set of services, it is a lot of work to make portlets for all possible use cases.

And power users will always want something more.

Example: our CICC project has dozens of chemical informatics Web services.

http://www.chembiogrid.org.wiki

Workflow composers can simplify this.

Allow users to encode and execute their own use cases.

(38)

Web Services and Workflows

• Perform a similarity search on the NIH DTP Human Tumor data.

• Filter the results based on pharmacokinetic properties (FILTER)

• Convert to 3D (OMEGA)

• Docking into a pre-defined protein (FRED)
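The four steps above form a linear pipeline, which can be sketched as a chain of functions. This is only an illustration of the composition: the compound records, the logp cutoff, and the scoring formula are made up, and the stub functions are placeholders for the real Web services, not the interfaces of FILTER, OMEGA, or FRED.

```python
from functools import reduce

# Made-up compound library standing in for the search target.
LIBRARY = [
    {"name": "cmpd-A", "similarity": 0.91, "logp": 2.1},
    {"name": "cmpd-B", "similarity": 0.85, "logp": 6.3},
    {"name": "cmpd-C", "similarity": 0.40, "logp": 1.0},
    {"name": "cmpd-D", "similarity": 0.78, "logp": 3.4},
]


def similarity_search(threshold):
    """Step 1 stub: return library entries at or above a similarity threshold."""
    return [dict(c) for c in LIBRARY if c["similarity"] >= threshold]


def property_filter(compounds):
    """Step 2 stub: keep compounds under an arbitrary property cutoff."""
    return [c for c in compounds if c["logp"] <= 5.0]


def convert_to_3d(compounds):
    """Step 3 stub: attach a placeholder 3D conformer label."""
    return [{**c, "conformer": f"{c['name']}-3d"} for c in compounds]


def dock(compounds):
    """Step 4 stub: assign a fake docking score and sort best (lowest) first."""
    scored = [{**c, "score": round(-10 * c["similarity"], 1)} for c in compounds]
    return sorted(scored, key=lambda c: c["score"])


def run_workflow(initial_input, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, initial_input)


hits = run_workflow(0.7, [similarity_search, property_filter, convert_to_3d, dock])
for hit in hits:
    print(hit["name"], hit["conformer"], hit["score"])
```

A workflow composer lets the user assemble and reorder such a list of steps graphically instead of waiting for someone to write a dedicated portlet.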

(39)

OGCE’s XBaya Workflow

(40)

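Updating the Octopus

[Figure: the same three-host layout (Host 1, Host 2, Host 3) with Web 2.0 substitutions: social gadgets + AJAX in the browser interface over HTTP(S), RSS and JSON over HTTP in place of SOAP, and REST interfaces in place of most WSDL ones, in front of the DB service (JDBC to a DB), job submission/monitoring and file services (over operating and queuing systems), and a visualization service.]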

(41)

Enterprise Approach                        Web 2.0 Approach
JSR 168 Portlets                           Gadgets, Widgets
Server-side integration and processing     AJAX, client-side integration and processing, JavaScript
SOAP                                       RSS, Atom, JSON
WSDL                                       REST (GET, PUT, DELETE, POST)
Portlet Containers                         Open Social Containers (Orkut, LinkedIn, Shindig); Facebook; StartPages
User Centric Gateways                      Social Networking Portals
Workflow managers (Taverna, Kepler, etc)   Mash-ups
Grid computing: Globus, Condor, etc        Cloud computing: Amazon WS Suite, Xen Virtualization
Semantic Web: RDF, OWL,

(42)
(43)

Microformats,

KML, and GeoRSS feeds used to

(44)

More Information

Contact me: mpierce@cs.indiana.edu

See what I’m up to: http://communitygrids.blogspot.com/

OGCE software: http://collab-ogce.org/

(45)

Tremendous Opportunities Using the Largest Shared Resources

-Challenges too!

• What’s different when the resource doesn’t belong just to me?

– Resource discovery

– Accounting

– Security

– Proposal-based requests for resources (peer-reviewed access)

• Code scaling and performance numbers

• Justification of resources

• Gateway citations

• Tremendous benefits at the high end, but even more work for the developers

• Potential impact on science is huge

– Small number of developers can impact thousands of scientists

(46)

Gateways can further investments in other projects

Increase access

– To instruments

Increase capabilities

– To analyze data

Improve workforce development

– For underserved populations

Increase outreach

Increase public awareness
