Grids and Parallel Computing in iSERVO
International Solid Earth Research Virtual Organization
Chinese Earthquake Authority Beijing
July 28, 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
APEC Cooperation for Earthquake Simulation
• ACES is a seven-year-long collaboration among scientists interested in earthquake and tsunami prediction
  – iSERVO is the infrastructure to support the work of ACES
  – SERVOGrid is a (completed) US Grid that is a prototype of iSERVO
  – http://www.quakes.uq.edu.au/ACES/
• Chartered under APEC –
Participating Institutions
• CSIRO Australia
• Monash University Australia
• University of Western Australia, Perth, Australia
• University of Queensland Australia
• University of Western Ontario Canada
• University of British Columbia Canada
• China National Grid
• Chinese Academy of Sciences
• China Earthquake Administration
• China Earthquake Network Center
• Brown University
• Boston University
• Jet Propulsion Laboratory
• Cal State Fullerton
• San Diego State University
• UC Davis
• UC Irvine
• UC San Diego
• University of Southern California
• University of Minnesota
• Florida State University
• US Geological Survey
• Pacific Tsunami Warning Center (PTWC), Hawaii
• National Central University, Taiwan (Taiwan Chelungpu-fault Drilling Project)
• University of Tokyo
• Tokyo Institute of Technology (Titech)
• Sophia University
• National Research Institute for Earth Science and Disaster Prevention (NIED), Japan
Grids v Parallel Computing
[Diagram labels: Computers; Networks; Sensors; C1; C2; Time?]
Requirements for MPI Messaging
• MPI and SOAP Messaging both send data from a source to a destination
  – MPI supports multicast (broadcast) communication
  – MPI specifies destination and a context (in comm parameter)
  – MPI specifies data to send
  – MPI has a tag to allow flexibility in processing in source processor
  – MPI has calls to understand context (number of processors etc.)
• MPI requires very low latency and high bandwidth so that tcomm/tcalc is at most 10
  – BlueGene/L has bandwidth between 0.25 and 3 Gigabytes/sec/node and latency of about 5 microseconds
  – Latency requirement is determined so that Message Size/Bandwidth > Latency
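The latency condition on this slide can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the usual linear cost model (time = latency + size/bandwidth), which the slide does not state explicitly; the function names are hypothetical. It uses the BlueGene/L figures quoted on the slide, expressed in integer units (microseconds and bytes per microsecond, with 1 Gigabyte taken as 10^9 bytes).

```python
# Illustrative only: a linear cost model for one point-to-point message.
# Assumption (not from the slide): time = latency + size / bandwidth.

LATENCY_US = 5          # BlueGene/L latency from the slide, in microseconds
BANDWIDTH_LOW = 250     # 0.25 Gigabytes/sec = 250 bytes per microsecond
BANDWIDTH_HIGH = 3000   # 3 Gigabytes/sec = 3000 bytes per microsecond


def break_even_bytes(latency_us: int, bytes_per_us: int) -> int:
    """Message size at which transfer time (size/bandwidth) equals latency."""
    return latency_us * bytes_per_us


def message_time_us(size_bytes: int, latency_us: int, bytes_per_us: int) -> float:
    """Estimated delivery time of one message under the linear model."""
    return latency_us + size_bytes / bytes_per_us


if __name__ == "__main__":
    for bandwidth in (BANDWIDTH_LOW, BANDWIDTH_HIGH):
        size = break_even_bytes(LATENCY_US, bandwidth)
        print(f"{bandwidth} bytes/us: latency dominates below {size} bytes")
```

Under these assumptions, messages smaller than roughly 1 to 15 kilobytes are latency-bound on such a machine, which is why the slide ties the latency requirement to message size.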
Requirements for SOAP Messaging
• Web Services have many of the same requirements as MPI, with two differences where MPI is more stringent than SOAP
  – Latencies are inevitably 1 (local) to 100 milliseconds, which is 200 to 20,000 times that of BlueGene/L
      1) 0.000001 ms – CPU does a calculation
      2) 0.001 to 0.01 ms – MPI latency
      3) 1 to 10 ms – wake-up a thread or process
      4) 10 to 1000 ms – Internet delay
  – Bandwidths for many business applications are low, as one just needs to send enough information for ATM and Bank to define transactions
• SOAP has MUCH greater flexibility in areas like security, fault-tolerance, and “virtualizing addressing” because one can run a lot of software in 100 milliseconds
  – Typically takes 1-3 milliseconds to gobble up a modest
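The ratios quoted on this slide follow directly from the latency figures. A short Python check, using only numbers from the two messaging slides (the helper names are hypothetical), might look like this:

```python
# Latency figures from the slides, in integer units to avoid rounding.
BLUEGENE_LATENCY_US = 5        # about 5 microseconds
SOAP_LATENCY_MIN_US = 1_000    # 1 millisecond (local)
SOAP_LATENCY_MAX_US = 100_000  # 100 milliseconds
CPU_CALC_NS = 1                # 0.000001 ms = 1 nanosecond per calculation


def latency_ratio(soap_us: int, mpi_us: int) -> int:
    """How many times larger the SOAP latency is than the MPI latency."""
    return soap_us // mpi_us


def calculations_in(milliseconds: int) -> int:
    """Number of 1 ns CPU calculations that fit in the given time."""
    return milliseconds * 1_000_000 // CPU_CALC_NS


if __name__ == "__main__":
    print(latency_ratio(SOAP_LATENCY_MIN_US, BLUEGENE_LATENCY_US))
    print(latency_ratio(SOAP_LATENCY_MAX_US, BLUEGENE_LATENCY_US))
    print(calculations_in(100))
```

The last figure illustrates the point about flexibility: at one calculation per nanosecond, 100 milliseconds leaves room for on the order of a hundred million calculations per message.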
Grid of Grids: Research Grid and Education Grid
[Diagram labels: Repositories; Federated Databases; Database; Sensors; Streaming Data; Field Trip Data; Data Filter Services; Analysis and Visualization Portal; Research Simulations; Discovery Services; SERVOGrid; Research; Education; Customization Services; From Research to Education; Education Grid; Computer Farm; GIS Grid; Sensor Grid; Database Grid]
What is e-Science?
‘e-Science is about global collaboration in key
areas of science, and the next generation of
infrastructure that will enable it.’
John Taylor
Director General of Research Councils
UK, Office of Science and Technology
• e-Science is about developing tools and
DAME: Operational Scenario
Rolls Royce and UK e-Science Program
Distributed Aircraft Maintenance Environment
~ Gigabyte per aircraft per engine per transatlantic flight
[Diagram labels: Engine flight data; Airline office; Maintenance Centre; European data centre; London Airport; New York Airport; American data center; Grid; Diagnostics Centre]
UK National Grid Service
Grid Operation Support Centre
Web Services based
[Diagram labels: US TeraGrid; UK NGS; Starlight (Chicago); Netherlight (Amsterdam); UKLight; Leeds; Manchester; Oxford; RAL; UCL; PSC; SDSC; NCSA; SC05; Computation; Network PoP; Service Registry; Steering clients; Local laptops in Seattle and UK]
All sites connected by production network (not all shown)
Towards an
The Data Deluge
• In the next 5 years e-Science projects will produce more scientific data than has been collected in the whole of human history
• Some normalizations:
  – The Bible = 5 Megabytes
  – Annual refereed papers = 1 Terabyte
  – Library of Congress = 20 Terabytes
  – Internet Archive (1996 – 2002) = 100 Terabytes
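To make the normalizations easier to compare, the sketch below expresses each one in units of "Bibles". It is only a worked conversion in Python, assuming decimal units (1 Megabyte = 10^6 bytes, 1 Terabyte = 10^12 bytes), which the slide does not specify; the dictionary and function names are hypothetical.

```python
# Assumption: decimal units, since the slide does not say which convention it uses.
MEGABYTE = 10**6
TERABYTE = 10**12

SIZES_BYTES = {
    "The Bible": 5 * MEGABYTE,
    "Annual refereed papers": 1 * TERABYTE,
    "Library of Congress": 20 * TERABYTE,
    "Internet Archive (1996-2002)": 100 * TERABYTE,
}


def in_bibles(name: str) -> int:
    """Size of the named collection as a multiple of the Bible's 5 Megabytes."""
    return SIZES_BYTES[name] // SIZES_BYTES["The Bible"]


if __name__ == "__main__":
    for name in SIZES_BYTES:
        print(f"{name}: {in_bibles(name):,} Bibles")
```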
• In many fields new high throughput
UNIVERSITY OF CALIFORNIA, SAN DIEGO SAN DIEGO SUPERCOMPUTER CENTER
Fran Berman
Hubble Telescope
Palomar Telescope
Sloan Telescope
“The Universe is now being explored systematically, in a panchromatic way, over a range of spatial and
temporal scales that lead to a more complete, and less biased understanding of its constituents, their evolution, their origins, and the
physical processes governing them.”
Towards a National Virtual Observatory
Virtual Observatory Astronomy Grid
Integrate Experiments
Radio Far-Infrared Visible
Visible + X-ray
Dust Map
eDiaMoND Project
Mammograms have different
appearances, depending on image settings and acquisition systems
Standard Mammo Format
Temporal
mammography
Computer Aided Detection
Topography 1 km
Stress Change
Earthquakes
PBO
Site-specific Irregular Scalar Measurements
Constellations for Plate Boundary-Scale Vector Measurements
Ice Sheets Volcanoes
Long Valley, CA
Northridge, CA
Grid Workflow Datamining in Earth Science
• Work with Scripps Institute
• Grid services controlled by workflow process real-time data from ~70 GPS sensors in Southern California
[Diagram labels: NASA GPS; Streaming Data Support; Transformations; Data Checking; Hidden Markov Datamining (JPL); Display (GIS)]
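The stages named on this slide (streaming data, transformations, data checking, data mining, display) can be pictured as a chain of filters over a stream. The following is a minimal sketch in Python under stated assumptions: the `Sample` type, station names, values, and reference are all hypothetical, and the mining stage is a plain threshold used as a stand-in, not the hidden Markov analysis run at JPL.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    station: str                      # hypothetical station identifier
    displacement_mm: Optional[float]  # None models a missing reading


def transform(samples: Iterable[Sample], reference_mm: float) -> Iterator[Sample]:
    """Transformation stage: express each reading relative to a reference."""
    for s in samples:
        if s.displacement_mm is None:
            yield s  # leave gaps for the checking stage to handle
        else:
            yield Sample(s.station, s.displacement_mm - reference_mm)


def check(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Data checking stage: drop samples with missing readings."""
    for s in samples:
        if s.displacement_mm is not None:
            yield s


def flag_changes(samples: Iterable[Sample],
                 threshold_mm: float) -> Iterator[Tuple[Sample, bool]]:
    """Stand-in for the data mining stage: a threshold, NOT a hidden Markov model."""
    for s in samples:
        yield s, abs(s.displacement_mm) > threshold_mm


def run_pipeline(stream: Iterable[Sample], reference_mm: float = 10.0,
                 threshold_mm: float = 3.0) -> list:
    """Chain the stages lazily, each one feeding the next."""
    return list(flag_changes(check(transform(stream, reference_mm)), threshold_mm))


if __name__ == "__main__":
    stream = [Sample("STN1", 10.0), Sample("STN2", None), Sample("STN1", 14.5)]
    for sample, changed in run_pipeline(stream):  # display stage
        print(sample.station, sample.displacement_mm, changed)
```

Generators are used so that each stage consumes samples as they arrive, which is the property a streaming workflow needs; in the Grid setting each stage would be a separate service rather than a local function.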
SERVOGrid has a portal
Background: Earthquake Forecast – Published Feb 19, 2002, in PNAS.
(JB Rundle et al., PNAS, v99, Suppl. 1, 2514-2521, Feb 19, 2002; KF Tiampo et al., Europhys. Lett., 60, 481-487, 2002; JB Rundle et al., Rev. Geophys. Space Phys., 41(4), DOI 10.1029/2003RG000135, 2003. http://quakesim.jpl.nasa.gov)
Color Scale Decision Threshold
D.T. => “false alarms” vs. “failures to predict”
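The trade-off behind the decision threshold can be illustrated with a small count. The sketch below, in Python, uses purely synthetic scores and outcomes invented for illustration (they are not taken from the forecast or any catalog), and the function name is hypothetical. Raising the threshold reduces false alarms but increases failures to predict.

```python
from typing import Sequence, Tuple


def count_errors(scores: Sequence[float], occurred: Sequence[bool],
                 threshold: float) -> Tuple[int, int]:
    """Return (false alarms, failures to predict) for a given decision threshold.

    An alarm is raised for a cell when its score is at or above the threshold.
    """
    false_alarms = sum(1 for s, o in zip(scores, occurred) if s >= threshold and not o)
    failures = sum(1 for s, o in zip(scores, occurred) if s < threshold and o)
    return false_alarms, failures


if __name__ == "__main__":
    # Synthetic example data, NOT real seismic potentials or events.
    scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
    occurred = [True, False, True, False, True, False]
    for threshold in (0.2, 0.5, 0.85):
        print(threshold, count_errors(scores, occurred, threshold))
```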
CL#03-2015
Plot of Log10 (Seismic Potential)
Increase in Potential for significant events, ~ 2000 to 2010
Eighteen significant earthquakes (M > 4.9; blue circles) have occurred in Central or Southern California. Margin of error of the anomalies is +/- 11 km; Data from S. CA. and N. CA catalogs:
After the work was completed:
1. Big Bear I, M = 5.1, Feb 10, 2001
2. Coso, M = 5.1, July 17, 2001
After the paper was in press (September 1, 2001):
3. Anza I, M = 5.1, Oct 31, 2001
After the paper was published (February 19, 2002):
4. Baja, M = 5.7, Feb 22, 2002
5. Gilroy, M = 4.9 - 5.1, May 13, 2002
6. Big Bear II, M = 5.4, Feb 22, 2003
7. San Simeon, M = 6.5, Dec 22, 2003
8. San Clemente Island, M = 5.2, June 15, 2004
9. Bodie I, M = 5.5, Sept. 18, 2004
10. Bodie II, M = 5.4, Sept. 18, 2004
11. Parkfield I, M = 6.0, Sept. 28, 2004
12. Parkfield II, M = 5.2, Sept. 29, 2004
13. Arvin, M = 5.0, Sept. 29, 2004
14. Parkfield III, M = 5.0, Sept. 30, 2004
15. Wheeler Ridge, M = 5.2, April 16, 2005
16. Anza II, M = 5.2, June 12, 2005
17. Yucaipa, M = 4.9 - 5.2, June 16, 2005
18. Obsidian Butte, M = 5.1, Sept. 2, 2005
Grid Workflow Data Assimilation in Earth Science
• Grid services triggered by abnormal events and controlled by workflow process real-time data from radar and high-resolution simulations for tornado forecasts
TeraShake
Simulation area: 600 km by 300 km by 80 km, dx = 200 m
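The size of the computational problem follows from these dimensions. The sketch below is a simple worked count in Python, assuming a uniform grid with the same spacing in all three directions and counting cells rather than nodes (both assumptions; the slide gives only the extents and dx). The function name is hypothetical.

```python
from typing import Tuple

EXTENT_M = (600_000, 300_000, 80_000)  # 600 km by 300 km by 80 km, in metres
DX_M = 200                             # grid spacing from the slide, in metres


def cells_per_axis(extent_m: Tuple[int, int, int], dx_m: int) -> Tuple[int, int, int]:
    """Number of grid cells along each axis for a uniform spacing."""
    return tuple(length // dx_m for length in extent_m)


def total_cells(extent_m: Tuple[int, int, int], dx_m: int) -> int:
    """Total number of cells in the simulation volume."""
    nx, ny, nz = cells_per_axis(extent_m, dx_m)
    return nx * ny * nz


if __name__ == "__main__":
    print(cells_per_axis(EXTENT_M, DX_M))
    print(f"{total_cells(EXTENT_M, DX_M):,} cells")
```

Under these assumptions the volume holds 3000 by 1500 by 400 cells, or 1.8 billion in total, which indicates why a simulation of this kind needs a machine of TeraGrid scale.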
Peak Velocity
NW-SE Rupture SE-NW rupture
Grids and Cyberinfrastructure
• Grids are the technology based on Web services that implement Cyberinfrastructure, i.e. support eScience or science as a team sport
  – Internet-scale managed services that link computers, data repositories, sensors, instruments and people
• There is a portal and services in SERVOGrid for
  – Applications such as GeoFEST, RDAHMM, Pattern Informatics, Virtual California (VC), Simplex, mesh generating programs …
  – Job management and monitoring web services for running the above codes
  – File management web services for moving files between various machines
  – Geographical Information System services
  – Quaketables earthquake-specific database
  – Sensors as well as databases
  – Context (dynamic metadata) and UDDI system long-term metadata services
Table columns: Country and/or Economies | Data (shared as part of collaboration) | Earthquake Forecast/Model | Wave Motion
Australia: Seismic data, fault database, GPS | PANDAS | Finley, LSM | prototype
Canada: Polaris, Radarsat | Pattern Informatics
P.R. China: Seismic, GPS | LURR | CAS
Japan: Daichi (InSAR), Seismic, GPS | GeoFEM | JST-CREST
Chinese Taipei: FORMOSAT-3/COSMIC (F/C)
U.S.A.: Seismic, InSAR, PBO (GPS), QuakeTables | GeoFEST, PARK, VirtualCalifornia, ALLCAL, Pattern Informatics | TeraShake
International: IMS
Institutions, Infrastructure, Access: Pacific Rim Universities (APRU), PRAGMA, SERVOGrid, GEON, SCECGrid, Vlab, Earth Simulator, Naregi, China National Grid
Current PTWC Network of Seismic Stations
National/Earthquake Grids of Relevance
• APAC – GT2, GT4, gLite
• ACcESS – Some link to SERVOGrid
• China National Grid – GOS, GT3, GT4
• ChinaGrid – CGSP built on GT4
• CNGI – China’s Next Generation Internet has significant earthquake data component
• Naregi – Uses GT4 and Unicore with many enhancements
• Japanese Earthquake Simulation Grid – unclear
• K*Grid (Korea) – Enhanced SRB, GT2 to GT4
• TIGER (Taiwan Integrated Grid for Education and Research) – unclear technology and unclear earthquake relevance
• SERVOGrid – Uses WS-I+ simple Web Services
• TeraGrid – Uses GT4 but not a clear model except for core job
TeraGrid: Integrating NSF Cyberinfrastructure
TeraGrid is a facility that integrates computational, information, and analysis resources at the San Diego Supercomputer Center, the Texas Advanced Computing Center, the University of Chicago / Argonne National Laboratory, the National Center for Supercomputing Applications, Purdue University, Indiana University, Oak Ridge National Laboratory, the Pittsburgh Supercomputing Center, and the National Center for Atmospheric Research.
Today 100 Teraflop; tomorrow a petaflop; Indiana 20 teraflop today.
[Map labels: SDSC; TACC; UC/ANL; NCSA; ORNL; PU; IU; PSC; NCAR; Caltech; USC-ISI; Utah; Iowa; Cornell; Buffalo]
APAC National Grid
Core Grid Services
Network: GrangeNet / AARNet; APAC Private Network (AARNet)
Security: APAC CA, MyProxy, VOMRS
Portal Tools: GridSphere
Info Services: APAC Registry, INCA2?
Systems: Gateways, Partners’ systems
[Diagram labels: QPSF; ANU; VPAC; ac3; TPAC; CSIRO; IVEC; SAPAC; APAC National Facility; QPSF (JCU)]
National “Grid Projects” in China
[Diagram labels: China e-Nation Strategy (2006-2020); State Council; National Planning Commission; MoST; MoE; CAS; NSF; China National Grid; Edu. & Res. Grid; Next-Generation Network Initiative; CAS e-Science; Science and Technology R&D Assets Foundation Platform; Semantic Grid; Net-based Res. Env.; Virtual Comp. Env.; Plan; Research; Develop; Production; Procure; Deploy; Operate; Manage]
CNGrid (2006-2010)
• HPC Systems
  – 100 Tflop/s by 2008, Pflop/s by 2010?
• Grid Software Suite: CNGrid GOS
  – Merge with international efforts
  – Emphasize production
• CNGrid Environment
  – Nodes, Centers, Policies
• Applications
  – Science
  – Resource & Environment
  – Manufacturing
  – Services
Cyber Science Infrastructure toward Petascale Computing (planned 2006-2011)
Cyber-Science Infrastructure(CSI)
(IT Infra. for Academic Research and Education)
[Diagram labels: NII Collaborative Operation Center; Networking Infrastructure (Super-SINET); Operation/Maintenance (Middleware); Networking; Contents; UPKI, CA; NAREGI Site; Research; Dev. (β ver., V1.0, V2.0); Middleware; CA; Univ./National Supercomputing VO; Domain Specific VO (e.g. ITBL); Domain Specific VOs; Project-Oriented VO; Peta-scale System VO; Core Site; Industrial Projects; Joint Project (Bio), Osaka-U; Joint Project, AIST; Nano Proof of Concept, Eval., IMS; International Collaboration (EGEE, UNIGRIDS, TeraGrid, GGF, etc.); R&D Collaboration; Operational Collaboration; Customization; Delivery; Feedback]
(Note: names of VO are tentative)
JST-CREST Integrated Predictive Simulation System
[Diagram labels: GONET; Hi-net; K-NET; Database for Model Construction; Plate Motion; GIS Urban Information; Platform for Integrated Simulation (Data Processing, Visualization, Linear Solvers); Earthquake Generation; Tectonic Loading; Earthquake Rupture; Strong Motion and Tsunami Generation; Wave Propagation; Tsunami Generation; Artificial Structure Oscillation; Structure Oscillation; Simulation Output; PC clusters for small-intermediate problems; Earth Simulator for large-scale problems]