
(1)

Cyberinfrastructure across the Globe

Indiana University
Computer Science Undergraduate Honors Seminar
January 8, 2007

Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401

(2)

Abstract

We discuss the role of Cyberinfrastructure (also called e-infrastructure and implemented by Grid technology) in a variety of global activities. These include the linking of researchers and data worldwide in many fields; new generations of digital libraries and tools like Google Scholar; the study of ice sheets at the poles and the dramatic impact of global warming; the study of earthquakes across the Pacific Ocean; the linking of apparel manufacturers in Asia to designers on different continents; and the command and control system for the Department of Defense. We discuss these applications and their associated technology.

(3)

Why Cyberinfrastructure Is Useful

• Supports distributed science: data, people, computers
• Exploits Internet technology (Web 2.0), adding management, security, supercomputers etc.
• It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
• Parallel is needed to get high performance on individual 3D simulations, data analysis etc.; the problem must be decomposed
• The distributed aspect integrates already distinct components
• Cyberinfrastructure is in general a distributed collection of parallel systems
• Grids are made of services that are “just” programs or data

(4)

e-moreorlessanything and the Grid

• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ from its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
• e-Science is about developing tools and technologies that allow scientists to do ‘faster, better or different’ research
• Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
  - The growing use of outsourcing is one example
• The Grid provides the information technology e-infrastructure for e-moreorlessanything.
• A deluge of data of unprecedented and inevitable size must be managed and understood.
• People, computers, data and instruments must be linked.
• On demand assignment of experts, computers, networks and storage resources must be supported

(5)

TeraGrid: Integrating NSF Cyberinfrastructure

TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.

[Figure: map of sites labelled SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC-ISI, Utah, Iowa, Cornell, Buffalo]

(6)

Virtual Observatory Astronomy Grid

Integrate Experiments

[Figure panels: Radio, Far-Infrared, Visible, Visible + X-ray, Dust Map, Galaxy Density Map]

(7)

Grid Capabilities for Science

• Open technologies for any large scale distributed system, adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
  - Security, Reliability, Management and state standards
• Service and messaging specifications
• User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc.
  - ~20 TeraGrid Science Gateways (their name for portals)
  - OGCE Portal technology effort led by Indiana
• Uniform approach to accessing distributed (super)computers, supporting single (large) jobs and spawning lots of related jobs
• Data and meta-data architecture supporting real-time and archives as well as federation
  - Links to Semantic web and annotation
• Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
• Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment

(8)

eApparel

• Much of the world’s manufacturing industry is globalized and the apparel/textile industry is typical
• We are working with the Hong Kong textile industry to link the Asian manufacturers with design/marketing/purchase functions elsewhere (USA, Europe)
• Need to exchange designs, available fabrics and discussions
• Good example of e-infrastructure enabling specialization in one geographical area to thrive
• Software and digital animation outsourcing are good examples

(9)

APEC Cooperation for Earthquake Simulation

• ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
  - iSERVO is Infrastructure to support work of ACES
  - SERVOGrid is (completed) US Grid that is a prototype of iSERVO
  - http://www.quakes.uq.edu.au/ACES/
• Chartered under APEC

(10)

Grid of Grids: Research Grid and Education Grid

[Figure labels: Database; Federated Databases; Repositories; Field Trip Data; Streaming Data; Sensors; Data Filter Services; Discovery Services; Analysis and Visualization; Portal; Research Simulations; Computer Farm; SERVOGrid; Research; Customization Services; From Research to Education; Education Grid]

(11)

SERVOGrid and Cyberinfrastructure

• Grids are the technology based on Web services that implement Cyberinfrastructure, i.e. support eScience or science as a team sport
  - Internet scale managed services that link computers, data repositories, sensors, instruments and people
• There is a portal and services in SERVOGrid for
  - Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs …
  - Job management and monitoring web services for running the above codes
  - File management web services for moving files between various machines
  - Geographical Information System services
  - Quaketables earthquake specific database
  - Sensors as well as databases
  - Context (dynamic metadata) and UDDI system long term metadata services

(12)

[Figure labels: Topography 1 km; Stress Change; Earthquakes; PBO; Site-specific Irregular Scalar Measurements; Constellations for Plate Boundary-Scale Vector Measurements; Ice Sheets; Volcanoes; Long Valley, CA; Northridge, CA; Hector Mine, CA; Greenland]

(13)

Some Grid Concepts I

• Services are “just” (distributed) programs sending and receiving messages with well defined syntax
• Interfaces (input-output) must be open; innards can be open source (allowing you to modify) or proprietary
  - Services can be in any language from Fortran, Shell scripts, C, C#, C++, Java, Python, Perl: your choice!!
  - Web Services supported by all vendors (IBM, Microsoft …)
• Service overhead will be just a few milliseconds (more now) which is < typical network transit time
  - Any program that is distributed can be a Web service
  - Any program taking execution time ≥ 20ms can be an

(14)

Web services

• Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
• Web Services interact by exchanging messages in SOAP format
• The contracts for the message exchanges that implement those interactions are described via WSDL
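
To make the message-exchange idea concrete, here is a minimal sketch of building and reading a SOAP 1.1 style envelope with Python's standard library. The application namespace `urn:example:quake`, the operation name `GetFaults` and its parameters are hypothetical and not taken from any SERVOGrid service; a real service would publish its operations in WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:quake"  # hypothetical application namespace (assumption)


def build_request(operation, params):
    """Wrap an operation name and its parameters in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        child = ET.SubElement(op, f"{{{APP_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover (operation, params) from a message built by build_request."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    op = list(body)[0]  # first child of Body is the operation element
    operation = op.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return operation, params


message = build_request("GetFaults", {"region": "SoCal", "minMagnitude": 5})
operation, params = parse_request(message)
print(operation, params)
```

The sender and receiver share only the message syntax, which is the sense in which such services are loosely coupled: either side could be rewritten in another language without the other noticing.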

(15)

Some Grid Concepts II

• Systems are built from contributions from many different groups: you do not need one “vendor” for all components, as Web services allow interoperability between components
  - One reason DoD likes Grids (called Net-Centric computing)
• Grids are distributed in services and data, allowing anybody to store their data and to produce “their” view
  - Some think that the University Library of the future will curate/store data of their faculty
• “2 level programming model”: classic programming of services, and services are composed using workflow consistent with industry standards (BPEL)
• Grid of Grids (System of Systems): realistically, Grid-like systems will be built using multiple technologies and “standards”; integrate separate Grids for Sensors, GIS, Visualization, computing etc. with the OGSA (Open Grid Service Architecture from OGF) system Grid (Security, registry) into a single Grid
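
The "2 level programming model" can be illustrated with a small sketch. This is not BPEL: plain Python functions stand in for services, and an ordered list stands in for the workflow description. All function names and the sample data are hypothetical.

```python
from statistics import mean


# Level 1: each "service" is an ordinary program; plain functions stand in here.
def drop_missing(values):
    """Hypothetical filter service: discard missing readings."""
    return [v for v in values if v is not None]


def to_millimetres(values):
    """Hypothetical conversion service: metres to millimetres."""
    return [v * 1000 for v in values]


def summarize(values):
    """Hypothetical analysis service: count and mean of the readings."""
    return {"count": len(values), "mean": mean(values)}


# Level 2: the workflow is data, an ordered list of named services.
WORKFLOW = [
    ("filter", drop_missing),
    ("convert", to_millimetres),
    ("summarize", summarize),
]


def run_workflow(workflow, payload):
    """Pass the payload through each service in order, recording the steps run."""
    trace = []
    for name, service in workflow:
        payload = service(payload)
        trace.append(name)
    return payload, trace


result, trace = run_workflow(WORKFLOW, [2, None, 4])
print(result, trace)
```

Because the workflow is held as data, steps can be reordered or replaced without touching the services themselves, which is the separation the two levels are meant to give.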

(16)

TeraGrid User Portal

(17)

LEAD Gateway Portal

NSF Large ITR and TeraGrid Gateway

- Adaptive Response to Mesoscale weather events

(18)

Grid Workflow Data Assimilation in Earth Science

• Grid services, triggered by abnormal events and controlled by workflow, process real time data from radar and high resolution simulations for tornado forecasts
  - Use a Portlet-based user portal to access and control services and workflow

(19)

SERVOGrid has a portal

(20)

Portlets v. Google Gadgets

• Portals for Grid Systems are built using portlets, with software like GridSphere integrating these on the server side into a single web page
• Google (at least) offers the Google sidebar and Google home page, which support Web 2.0 services and do not use a server side aggregator
• Google is more user friendly!
• The many Web 2.0 competitions are an interesting model for promoting development in the world-wide distributed collection of Web 2.0 developers
• I guess Web 2.0 model will win!

(21)

GIS and Sensor Grids

• OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors
• GML (Geography Markup Language) defines specification of geo-referenced data
• SensorML and O&M (Observations and Measurements) define meta-data and data structure for sensors
• Services like Web Map Service, Web Feature Service, Sensor Collection Service define service interfaces to access GIS and sensor information
• Grid workflow links services that are designed to support streaming input and output messages
• We built Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
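
As an illustration of what a Web Map Service interface looks like from the client side, the sketch below assembles a WMS 1.1.1 `GetMap` request URL using the standard query parameters. The endpoint `http://example.org/wms` and the layer names are hypothetical placeholders, not SERVOGrid addresses.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def getmap_url(endpoint, layers, bbox, width, height,
               srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL; bbox is (minx, miny, maxx, maxy)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests the server's default styles
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and layers covering roughly Southern California.
url = getmap_url("http://example.org/wms", ["faults", "counties"],
                 (-120.0, 32.0, -114.0, 36.0), 800, 600)
query = parse_qs(urlparse(url).query)  # decode the query string back for inspection
print(url)
```

Any server that implements the same interface can answer this request, which is what lets map servers from different vendors sit behind one Grid.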

(22)

Grid Workflow Datamining in Earth Science

• Work with Scripps Institute
• Grid services controlled by workflow process real time data from ~70 GPS Sensors in Southern California

[Figure labels: Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS); NASA; GPS; Earthquake]
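
The chain of stages named on this slide (transformation, data checking, datamining) can be sketched as a streaming pipeline of Python generators. This is only an illustration under stated assumptions: a simple jump detector stands in for the Hidden Markov datamining step, and the readings, reference value and thresholds are invented.

```python
import math


def transform(readings, reference):
    """Transformation stage: express each reading relative to a reference value."""
    for t, value in readings:
        yield t, value - reference


def check(readings, limit):
    """Data checking stage: drop non-finite or implausibly large values."""
    for t, value in readings:
        if math.isfinite(value) and abs(value) <= limit:
            yield t, value


def detect_jumps(readings, threshold):
    """Stand-in for the datamining stage: flag times where the value jumps."""
    previous = None
    for t, value in readings:
        if previous is not None and abs(value - previous) > threshold:
            yield t
        previous = value


def pipeline(stream, reference=10.0, limit=100.0, threshold=1.0):
    """Compose the stages; each reading flows through as soon as it arrives."""
    return detect_jumps(check(transform(stream, reference), limit), threshold)


# Invented (time, position) samples with one dropout and one sudden jump.
raw = [(0, 10.0), (1, 10.1), (2, float("nan")), (3, 10.2), (4, 12.5)]
events = list(pipeline(iter(raw)))
print(events)
```

Each stage consumes and produces a stream, so no stage needs the whole data set in memory, which matches the slide's emphasis on streaming input and output.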

(23)

Earth/Atmosphere Grids built as Grids of (library) Grids

[Figure labels: Earthquake (SERVOGrid): Earthquake Data, Filters & Simulation Services; Tornado Grid; Ice Sheet (PolarGrid): Ice Sheet Sensors, SAR, Filters, EM, Glacier Simulations; Visualization Grid; Collaboration Grid; Sensor Grid; Compute Grid; GIS Grid; Data Access/Storage; Security; Notification; Workflow; Messaging; Portals; Registry; Metadata; Physical Network]

(24)

Community Tools

e-mail and list-serves are oldest and best used

Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration – text, audio-video conferencing, files

del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage shared bookmarks

MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks

http://en.wikipedia.org/wiki/List_of_social_networking_websites

Writely, Wikis and Blogs are powerful specialized shared document systems

ConferenceXP and WebEx share general applications

Google Scholar tells you who has cited your papers while publisher sites tell you about co-authors

Windows Live Academic Search has similar goals

Note sharing resources creates (implicit) communities

Social network tools study graphs to both define communities and extract their properties

(25)

Mashups and Grids

http://www.programmableweb.com

There are 281 “commodity” service Web 2.0 APIs on October 1 06 (356 Jan 9 07)

Mashups are composed from JavaScript, AJAX and REST and not usually BPEL, WSDL and SOAP; Google Gadgets not portlets

Architecture of Mashups and Grids “identical”

See Amazon S3 Storage and EC2 Elastic Computing services

Mashups enable everybody to

(26)

Mashup Matrix

(27)

Indiana Map Mash-up

GIS Grid of “Indiana Map” and ~10 Indiana counties with accessible Map (Feature) Servers from different vendors. Grids federate different data repositories (cf. Astronomy VO federating different observatory collections).

(28)

eSports?

• YouTube illustrates asynchronous video sharing, and video conferencing illustrates synchronous video sharing
• One can link trainers (or spectators) and athletes globally with real time video supporting video and text annotation
• Technically hard due to network issues and allowing real-time playing of annotated video
• Exploring with China
• Note IU could export coaching in Soccer, Basketball etc.
• Example of Cyberinfrastructure supporting geographically distributed specialization

(29)

Minority Serving Institutions and the Grid

• Historically the R1 Research University powerhouses dominated research due to their concentration of expertise
• Cyberinfrastructure allows others to participate in the same way it supports distributed open source software and distributed Web 2.0
• Navajo Nation (Colorado Plateau covering over 25,000 square miles in northeast Arizona, northwest New Mexico, and southeast Utah) with 110 communities and over 40% unemployment. Building a wireless grid for education, healthcare
• http://www.win-hec.org/ World Indigenous Nations Higher Education Consortium
• Cyberinfrastructure allows Nations to preserve their geographical identity but participate fully with world class jobs and research
• Some 335 MSIs in Alliance have similar hopes for Cyberinfrastructure to jump start their advancement!

(30)

Example: Setting up a Polar CI-Grid

• The North and South poles are melting with potentially huge environmental impact
• As a result of MSI meetings, I am working with MSI ECSU in North Carolina and Kansas University to design and set up a Polar Grid (Cyberinfrastructure)
• This is a network of computers, sensors (on robots and satellites), data and people aimed at understanding the science of ice sheets and the impact of global warming
• We have changed the 100,000 year Glacier cycle into a ~50 year cycle; the field has increased dramatically in importance and interest
• Good area to get involved in as not so much established work

Typical Illustration of effect of Climate Change on Greenland:

(31)
(32)
(33)

PolarGrid

• Important Polar Grid Cyberinfrastructure components include
  - Managed data from sensors and satellites
  - Data analysis such as SAR processing, possibly with parallel algorithms
  - Electromagnetic simulations (currently commercial codes) to design instrument antennas
  - 3D simulations of ice-sheets (glaciers) with non-uniform meshes
  - GIS Geographical Information Systems
• Also need capabilities present in many Grids
  - Portal i.e. Science Gateway
  - Submitting multiple sequential or parallel jobs

(34)

Prototype Base/Field Grid

[Figure labels: F and B nodes; Real Time Monitor; Archival, High Latency, Low Bandwidth; Other Polar Sensors and Sensor Aggregators (Non-polar and Polar Sites); Polar Expeditions; IU; Field; Base Camps]

(35)

Document-enhanced Cyberinfrastructure

[Figure labels: Existing User Interface; Google Scholar, Manuscript Central, Science.gov, Windows Live Academic Search, Citeseer, CMT Conference Management etc.; Existing Document; Web service; New Document-enhanced Integration; Enhancement; User Interface; Community Tools; Generic Document Tools; MyResearch Database; Bibliographic Database; Export RSS, Bibtex, Endnote etc.; CiteULike]

(36)

Delicious Semantic Web/Grid

• http://del.icio.us purchased by Yahoo for ~$30M
• http://www.CiteULike.org
• http://www.connotea.org (Nature)
• Associate metadata with Bookmarks specified by URLs, DOIs (Digital Object Identifiers)
• Users add comments and keywords (called tags)
• Users are linked together into groups (communities)
• Information such as title and authors extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.)
• Bibtex like additional information in CiteULike
• This is perhaps de facto Semantic Web, remarkable for its simplicity
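
The tagging model described here is simple enough to sketch in a few lines. The example below assumes a hypothetical `Bookmark` record (user, URL, tags) and shows how shared tags yield implicit communities; it is not the data model of del.icio.us, CiteULike or Connotea, and the users and URLs are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    """Hypothetical bookmark record: who saved which URL under which tags."""
    user: str
    url: str
    tags: frozenset


def users_by_tag(bookmarks):
    """Index users by every tag they have applied."""
    index = defaultdict(set)
    for bookmark in bookmarks:
        for tag in bookmark.tags:
            index[tag].add(bookmark.user)
    return index


def implicit_communities(bookmarks, min_users=2):
    """A tag used by at least min_users people defines an implicit community."""
    return {
        tag: sorted(users)
        for tag, users in users_by_tag(bookmarks).items()
        if len(users) >= min_users
    }


bookmarks = [
    Bookmark("alice", "http://example.org/paper1", frozenset({"grid", "workflow"})),
    Bookmark("bob", "http://example.org/paper2", frozenset({"grid"})),
    Bookmark("carol", "http://example.org/paper3", frozenset({"gis"})),
]
communities = implicit_communities(bookmarks)
print(communities)
```

No ontology or schema is agreed in advance; the communities fall out of the tags users happen to choose, which is the simplicity the slide remarks on.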

(37)
(38)

Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid I

• Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata
  - Title, author and institution of documents
  - Citations with their own metadata allowing one to match to other documents
• Science.gov extracts metadata from lots of US Government databases
• These capabilities are sure to become more powerful and to be extended
  - Give “Citation Index” in real time
  - Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis)
  - Tell you all citations of all papers in a workshop
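
The link analysis mentioned above ("all authors of all papers that cite a paper that cites you") amounts to a bounded walk over the citation graph. The sketch below assumes the graph is already available as a dictionary mapping each paper to the papers it cites; the paper identifiers and author names are invented, and `max_hops` limits the depth, in keeping with the small-world caution.

```python
def build_cited_by(cites):
    """Invert a paper -> references mapping into paper -> citing papers."""
    cited_by = {}
    for paper, references in cites.items():
        for ref in references:
            cited_by.setdefault(ref, set()).add(paper)
    return cited_by


def citing_papers(paper, cites, max_hops):
    """Papers that reach `paper` through at most max_hops citation links."""
    cited_by = build_cited_by(cites)
    seen = set()
    frontier = {paper}
    for _ in range(max_hops):
        # Step one citation link outward, skipping papers already visited.
        frontier = {p for f in frontier for p in cited_by.get(f, ())} - seen - {paper}
        seen |= frontier
    return seen


def citing_authors(paper, cites, authors, max_hops=2):
    """Sorted authors of every paper found by citing_papers."""
    found = citing_papers(paper, cites, max_hops)
    return sorted({a for p in found for a in authors.get(p, ())})


# Invented graph: B and E cite A, C cites B, D cites C.
cites = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["A"]}
authors = {"A": ["You"], "B": ["Lee"], "C": ["Kim", "Lee"], "D": ["Wu"], "E": ["Patel"]}
result = citing_authors("A", cites, authors, max_hops=2)
print(result)
```

With `max_hops=2` the walk stops at papers that cite a paper citing A, so the author of D (three links away) is not reported.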

(39)

Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid II

• It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet
  - As just submitted to a conference perhaps
• These tools can help form useful lists such as authors of all cited or submitted papers to a journal
• OSCAR2/3 (from Peter Murray-Rust’s group at Cambridge) augment the application independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms
  - This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion
  - Other fields have natural application specific metadata and OSCAR like tools can be developed for them
• Such high value tools could appear on “publisher” sites of future
