• No results found

Building Secure and Transparent Inter-Cloud Infrastructure for Scientific Applications

N/A
N/A
Protected

Academic year: 2021

Share "Building Secure and Transparent Inter-Cloud Infrastructure for Scientific Applications"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

Building Secure and Transparent Inter-Cloud

Infrastructure for Scientific Applications

Yoshio Tanaka

Information Technology Research Institute, National Institute of AIST, Japan

(2)

We could gain a lot of insights through Grid experiments!!

• Building Grid

– Communities

• ApGrid (2002~), PRAGMA (2003~) • APGrid PMA (2004~), IGTF(2005~)

• Using Grid

– Programming

• Ninf-G: GridRPC-based programming middleware (1994~)

– Applications

• Hybrid QM/MD Simulation (2004~) • Grid FMO (2005~)

• Experiments

– Run Grid applications on large-scale Grid infrastructure

• ApGrid + TeraGrid (2003~2008) • PRAGMA + OSG (2007)

(3)

Multi-Scale Simulation using

GridRPC+MPI

• Simulation of Hydrogen Diffusion Process of Gammma-alumina

– Executed on the Pan-pacific Grid Testbed (~ 1000 CPUs) – More than 5000 QM simulations during 50 hours

– Implemented based on GridRPC + MPI Programming model.

MPI QM QM QM QM MPI QM QM QM QM MPI QM QM QM QM MPI QM QM QM QM RPC scheduler Purdue SDSC AIST MPI MD MD MD MD RPC RPC RPC NEB

Dual Xeon (3.1 GHz) 41×1 CPUs for MD

Dual Xeon (3.1GHz) 32×6 CPUs for QM

Dual Opteron (2.0 GHz) 32×8 CPUs for QM AIST Clusters

Dual ItaniumII (3.1 GHz) 32×8 CPUs for QM

Dual ItaniumII (3.1GHz) 32×6 CPUs for QM

Dual Xeon (3.2 GHz) 32×8 CPUs for QM TeraGrid Clusters

NCSA USC

(4)

Phase 1 Phase 2 Phase 3 Phase 4

• Experiment Time: 18. 97 days • Simulation steps: 270 (~ 54 fs) • Total: 150K CPU x hour

QM1 P32 P32 P32 P32 P32 USC USC USC ISTBS ISTBS

QM2 P32 P32 NCSA NCSA NCSA USC USC USC Presto Presto

QM3 M64 M64 M64 M64 M64 M64 M64 M64 M64 M64

QM4 P32 P32 TCS TCS TCS USC USC USC P32 P32

QM5 P32 P32 TCS TCS TCS USC USC USC P32 P32

Reserve F32 F32 P32 P32 P32 P32 P32 P32 F32 F32 P32 M64 F32 HPC TCS DTF PreSTo III ISTBS

(5)

GridFMO - The First Principle Calculations on the Grid (2007)

Empowered by Grid technology, large scale fragment molecular orbital (FMO) calculations were conducted making use of computational resources on the Grid. The resources were collected interoperably from PRAGMA and OSG domains, via ordinary queue systems on each machine. Without special arrangements, a series of the calculations were carried out for about 3 months using maximum of 240 CPUs.

● Utilize vast computational resources on the Grid

● Flexible resource management to exploit ad-hoc available resources ● Full electron calculation of bio

molecules by using FMO method ● Long-term calculations supported

(6)

Grid experiments were exciting and successful, but…

• Using heterogeneous and distributed

resources was not easy.

• Heterogeneity did exist in various places

– OS, Library version

– in more details of the system configuration

• Configuration of queuing systems

– e.g. max wall clock time

• disk quota limit

• Firewall configuration

• We need to install and test applications at

each site one by one.

– We should not expect end users to do the same!

(7)

Goals of and approaches by PRAGMA

• Enable Specialized Applications to run easily

on distributed resources

– Build once, run everywhere!!

• Investigate Virtualization as a practical

mechanism

– Supporting Multiple VM Infrastructures (Xen, KVM, OpenNebula, Rocks, WebOS, EC2)

• Share VM images in PRAGMA VM repository

so that we can boot our application VMs at

any site by any PRAGMA colleagues.

– Discussed in PRAGMA 20 workshop @ HK, March 3rd and 4th, 2011, 1 week before big

(8)

2011 Tohoku Earthquake

(9)

Satellite Data Flow and Services Prior to March 11 9 NASA AIST ERSDAC JAXA Terra/ASTER

- Archive (tape, B-ray) - Archive (on-Disk) - Processing - WMS - Data providing - Portal ALOS/PALSAR 70 GB/day (ASTER) 360 GB/day (PALSAR)

ASTER data: NASA→ERSDAC→AIST

PALSAR data: JAXA→ERSDAC→AIST

(processing, WMS, portal site, and data providing by AIST)

(10)

Data Flow and Services from March 11 till April 20 10 NASA (AIST) ERSDAC JAXA Terra/ASTER - Processing - WMS ALOS/PALSAR

ASTER data: NASA→ERSDAC→(AIST)→

PALSAR data: JAXA→ERSDAC→(AIST)→

(processing and WMS by Orkney, portal site by Google)

Orkney

Google

TDRS

(11)

Data Flow and Services from April 21 NASA (AIST) ERSDAC JAXA TDRS Terra/ASTER - Processing - WMS ALOS/PALSAR

ASTER data: NASA→ERSDAC→(AIST)→

PALSAR data: JAXA→ERSDAC→(AIST)→ (processing by NCHC, SDSC,

and OCCI, WMS by NCHC, portal site by Google)

Google - Portal UCSD - Processing - Processing OCCI NCHC

(12)

Insights

Fortunately, we already had VM images for satellite

data processing.

– We have prepared for using cloud.

Need to make it routine use!

PRAGMA members had disasters/accidents.

– Japan earthquake – Thailand flooding

– California power outage

PRAGMA members has common interests/needs to

build a sustainable infrastructure which could be used

to support each other in case of emergency.

– We accelerated the development/deployment of PRAGMA

(13)

PRAGMA Grid v.s. PRAGMA Cloud

• PRAGMA Grid

– Each site

• Installs common software stack (e.g. Globus, Condor)

– It’s not easy to synchronize version up…

• If the application needs additional libraries and

configuration, etc., they must be installed / configured.

– Application drivers

• Install, test, and run applications at each site.

• PRAGMA Cloud

– Each site

• Deploy cloud hosting environments.

– Application drivers

• Build VM image for their applications. • Boot the VM at each site.

(14)

PRAGMA Grid/Clouds

26 institutions in 17 countries/regions,23 compute sites,

UZH Switzerland NECTEC KU Thailand UoHyd India MIMOS USM Malaysia HKU HongKong ASGC NCHC Taiwan HCMUT HUT IOIT-Hanoi IOIT-HCM Vietnam AIST OsakaU UTsukuba Japan MU Australia KISTI KMU Korea JLU China SDSC USA UChile Chile CeNAT-ITCR Costa Rica BESTGrid New Zealand CNIC China LZU China UZH Switzerland LZU China ASTI Philippines IndianaU USA UValle Colombia

(15)

Deploy Three Different Software Stacks on the PRAGMA Cloud

• QuiQuake

– Simulator of ground motion map when earthquake occurs

– Invoked when big earthquake occurs

• HotSpot

– Find high temperature area from Satellite

– Run daily basis (when ASTER data arrives from NASA)

• WMS server

– Provides satellite images via WMS protocol

– Run daily basis, but the number of requests is not stable.

(16)

Gfarm file server

Gfarm file server Gfarm file server

Gfarm file server Gfarm file server

Gfarm Client Gfarm meta-server

Gfarm file server

PRAGMA vM RePositoRy

QuickQuake Geogrid + Bloss HotSpot Fmotif Gfarm Client

Gfarm is used for

sharing VM images

(17)

AIST HotSpot + Condor gFS gFS gFS gFS gFS SDSC (USA) Rocks Xen NCHC (Taiwan) OpenNebula KVM LZU (China) Rocks KVM AIST (Japan) OpenNebula Xen IU (USA) Rocks Xen Osaka (Japan) Rocks Xen gFC gFC gFC gFC gFC gFC gFS gFS gFS gFS gFS

GFARM Grid File System (Japan)

AIST QuickQuake + Condor

AIST Geogrid + Bloss

AIST Web Map Service + Condor UCSD Autodock + Condor

NCHC Fmotif = VM deploy Script VM Image copied from gFarm VM Image copied from gFarm VM Image copied from gFarm VM Image copied from gFarm VM Image copied from gFarm Condor Master VM Image copied from gFarm S S S S S S S gFC gFS

= Grid Farm Client = Grid Farm Server

slave

slave slave

slave slave

slave

Put all together

Store VM images in Gfarm systems Run vm-deploy scripts at PRAGMA Sites Copy VM images on Demand from gFarm Modify/start VM instances at PRAGMA sites

(18)

What are the Essential Steps

1. AIST/GEO Grid creates their VM image

2. Image made available in “centralized”

storage (currently Gfarm is used)

3. PRAGMA sites copy GEO Grid images to

local clouds

1. Assign IP addresses

2. What happens if image is in KVM and site is Xen?

4. Modified images are booted

(19)

Basic Operation

• VM image authored locally, uploaded into

VM-image repository (Gfarm from U.

Tsukuba)

• At local sites:

– Image copied from repository

– Local copy modified (automatic) to run on specific infrastructure

– Local copy booted

• For running in EC2, adapted methods

automated in Rocks to modify, bundle,

and upload after local copy to UCSD.

(20)

PRAGMA Compute Cloud

UoHyd India MIMOS Malaysia NCHC Taiwan AIST OsakaU Japan SDSC USA CNIC China LZU China LZU China ASTI Philippines IndianaU USA JLU China

Cloud Sites Integrated in GEO Grid

Execution Pool

(21)

Next step: Network virtualization

• Some motivations

– Need to change configuration of firewall and condor master when a new VM launches

– Network isolation for security

(22)

OpenFlow

A centralized programmable remote controller dictates the forwarding behavior to multiple OpenFlow switches. This architecture separating forwarding plane from control plane allows flexible management to network operator.

Utilized software tools for this demo:

Trema (http://trema.github.com/trema/)

A framework for developing OpenFlow controllers • Open vSwitch (http://openvswitch.org/)

A software based implementation of OpenFlow switch

OpenFlow Controller OpenFlow Protocol Flow control programming Open vSwitch OpenFlow Switch Network Operator forwarding plane control plane

(23)

Openflow network environment Open vSwitch Open vSwitch Open vSwitch Open vSwitch Open vSwitch Open vSwitch Open vSwitch Open vSwitch Open vSwitch Openflow Controller Trema

(Sliceable routing switch)

Osaka Univ. AIST UCSD Openflow network GRE GRE GRE GRE GRE GRE VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM VM

Virtual network slice A

Virtual network slice B

(24)

Comparison between the past and the present approach

Condor Master

The past The present

Condor master Admin VM VM VM VM VM Site A

admin admin Site B admin Site C Assign a new

global IP for each VM launching

Configure firewall policy to pass the

communication for Gfarm, condor and so on

Register the new launched VM’s IP to the condor configuraion Condor Master Condor master Admin VM VM VM VM VM Site A

admin admin Site B admin Site C Dedicated isolated

virtual L2 network Request IP via

DHCP

Register the IP to the condor pool

automatically

(25)

Summary

• We learned a lot through Grid

experiments.

• Migrating from Grid to Cloud

– Virtualization technologies is useful for making distributed infrastructure easy to use.

• Still have many research issues.

– Data

– Network virtualization – Resource managements – Security

(26)

Acknowledgements

NARL|NCHC UCSD|SDSC

Open Cloud Consortium ERSDAC

Orkney

NTT-data-CCS CTC

Université Lille 1

(Science and technology)

ITT

The most of the work was done in collaboration with

PRAGMA members, especially UCSD, NCHC, Osaka U.

GEO Grid Disaster Response work was supported by

References

Related documents

Reciprocal F1-crosses of Arabidopsis thaliana accessions C24 and Columbia showed a significant increase in nonacclimated and acclimated freezing tolerance and in their content

The residents of South Cheyenne believe that future plan- ning and construction regarding the South Greeley Highway needs to be consistent with the improvements that have been

Using an interdisciplinary approach, partnered with arts and sciences, medicine, education, business, communication and law, the School of Health Sciences and Human Services

Many other related projects in the area of distance education exist (Bacher et al. Howerver in contrast to the other projects, our approach integrates synchronous and asynchronous

Candidate prodromal symptoms were Western Ontario & McMaster Universities Osteoarthritis Index (WOMAC) and Knee injury and Osteoarthritis Outcome Score (KOOS) subscale scores

(d) Our investigations do not confirm the previous finding that the CSGT method requires larger basis sets than the GIAO method to achieve results of similar accuracy: Quite good

Attending sows at farrowing decreases the number of “stillborn” pigs that die during birth or within the first few hours afterwards; pigs can be freed from membranes, weak

Table 5.1: Number of litters, means for within-litter birth weight variation (CVB), within- litter weaning weight variation (CVW), mean weaning weight (MWWT),