What FutureGrid Can Do for You
TeraGrid’11 BOF Session
Agenda
• FutureGrid from User’s Perspective, Geoffrey Fox
• How to Access FutureGrid, Gregor von Laszewski
• HPC on FutureGrid, Warren Smith
• Cloud Computing on FutureGrid, Kate Keahey
• Training, Education and Outreach, Renato Figueiredo
• Experimental Framework Support, Warren Smith
FutureGrid BOF
Overview
TG 11
Salt Lake City
July 18 2011
Geoffrey Fox
http://www.infomall.org https://portal.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute
FutureGrid key Concepts I
• FutureGrid supports Computer Science and Computational Science research in cloud, grid and parallel computing (HPC)
• The FutureGrid testbed provides to its users:
– An interactive development and testing platform for middleware and application users looking at interoperability, functionality, performance or evaluation with or without virtualization
– A rich education and teaching platform for advanced cyberinfrastructure (computer science) classes
• FutureGrid has a complementary focus to both the Open Science Grid and the other parts of XSEDE.
FutureGrid key Concepts II
• Rather than loading images onto VMs, FutureGrid supports Cloud, Grid and Parallel computing environments by dynamically provisioning software as needed onto “bare-metal” using Moab/xCAT
– Image library for MPI, OpenMP, MapReduce (Hadoop, Dryad, Twister), gLite, Unicore, Xen, Genesis II, ScaleMP (distributed shared memory), Nimbus, Eucalyptus, OpenNebula, OpenStack, KVM, Windows …
• Growth comes from users depositing novel images in library
• FutureGrid has ~4300 (will grow to ~5000) distributed cores
FutureGrid Partners
•
Indiana University
(Architecture, core software, Support)
•
Purdue University
(HTC Hardware)
•
San Diego Supercomputer Center
at University of California San Diego
(INCA, Monitoring)
•
University of Chicago
/Argonne National Labs (Nimbus)
•
University of Florida
(ViNE, Education and Outreach)
•
University of Southern California Information Sciences (Pegasus to manage
experiments)
•
University of Tennessee Knoxville (Benchmarking)
•
University of Texas at Austin
/Texas Advanced Computing Center (Portal)
•
University of Virginia (OGF, Advisory Board and allocation)
•
Center for Information Services and GWT-TUD from Technische Universität
Dresden (VAMPIR)
FutureGrid:
a Grid/Cloud/HPC Testbed
Compute Hardware
Name | System type | # CPUs | # Cores | TFLOPS | Total RAM (GB) | Secondary Storage (TB) | Site | Status
india | IBM iDataPlex | 256 | 1024 | 11 | 3072 | 339 + 16 | IU | Operational
alamo | Dell PowerEdge | 192 | 768 | 8 | 1152 | 30 | TACC | Operational
hotel | IBM iDataPlex | 168 | 672 | 7 | 2016 | 120 | UC | Operational
sierra | IBM iDataPlex | 168 | 672 | 7 | 2688 | 96 | SDSC | Operational
xray | Cray XT5m | 168 | 672 | 6 | 1344 | 339 | IU | Operational
foxtrot | IBM iDataPlex | 64 | 256 | 2 | 768 | 24 | UF | Operational
Bravo* | Large Disk & memory | 32 | 128 | 1.5 | 3072 (192GB per node) | 144 (12 TB per Server) | IU | Early user; Aug. 1 general
Delta* | Large Disk & memory With Tesla GPU’s | 16 (16 GPU’s) | 96 | ? 3 | 1536 (192GB per node) | 96 (12 TB per Server) | IU | ~Sept 15
Total | | 1064 | 4288 | 45 | 16TB | | |
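The Total row can be cross-checked against the per-system rows. The sketch below copies the CPU, core and RAM figures from the Compute Hardware table (reading Bravo's RAM as 3072 GB, which is an interpretation of a garbled cell) and sums them; the names `SYSTEMS` and `totals` are illustrative only.

```python
# Per-system figures copied from the Compute Hardware table:
# (name, CPUs, cores, total RAM in GB)
SYSTEMS = [
    ("india", 256, 1024, 3072),
    ("alamo", 192, 768, 1152),
    ("hotel", 168, 672, 2016),
    ("sierra", 168, 672, 2688),
    ("xray", 168, 672, 1344),
    ("foxtrot", 64, 256, 768),
    ("bravo", 32, 128, 3072),   # assumed reading of the garbled RAM cell
    ("delta", 16, 96, 1536),
]


def totals(systems):
    """Return (total CPUs, total cores, total RAM in GB)."""
    cpus = sum(s[1] for s in systems)
    cores = sum(s[2] for s in systems)
    ram_gb = sum(s[3] for s in systems)
    return cpus, cores, ram_gb


if __name__ == "__main__":
    cpus, cores, ram_gb = totals(SYSTEMS)
    print(f"CPUs={cpus} cores={cores} RAM={ram_gb} GB")
```

The sums agree with the 1064 CPUs and 4288 cores in the Total row; the RAM sum of 15648 GB is what the table rounds to 16TB.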
Storage Hardware
System Type | Capacity (TB) | File System | Site | Status
DDN 9550 (Data Capacitor) | 339 shared with IU + 16 TB dedicated | Lustre | IU | Existing System
DDN 6620 | 120 | GPFS | UC | New System
SunFire x4170 | 96 | ZFS | SDSC | New System
Dell MD3000 | 30 | NFS | TACC | New System
Network Impairment Device
Spirent XGEM Network Impairments Simulator for
jitter, errors, delay, etc
Full Bidirectional 10G w/64 byte packets
up to 15 seconds introduced delay (in 16ns
increments)
0-100% introduced packet loss in .0001%
increments
Packet manipulation in first 2000 bytes
up to 16k frame size
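To make the stated granularity concrete, here is a minimal sketch of the arithmetic behind the delay and loss settings. It is not the Spirent device's API; the function names are hypothetical, and rounding a requested delay down to the nearest step is an assumption.

```python
DELAY_STEP_NS = 16               # delay granularity from the slide
MAX_DELAY_NS = 15 * 10**9        # up to 15 seconds of introduced delay
LOSS_STEPS_PER_PERCENT = 10_000  # 0.0001% increments


def quantize_delay_ns(requested_ns: int) -> int:
    """Round a requested delay down to a multiple of 16 ns, capped at 15 s."""
    if requested_ns < 0:
        raise ValueError("delay must be non-negative")
    capped = min(requested_ns, MAX_DELAY_NS)
    return (capped // DELAY_STEP_NS) * DELAY_STEP_NS


def loss_steps(percent: float) -> int:
    """Express a packet-loss percentage as a count of 0.0001% steps."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("loss must be between 0 and 100 percent")
    return round(percent * LOSS_STEPS_PER_PERCENT)


if __name__ == "__main__":
    print(quantize_delay_ns(1000))        # a 1 microsecond request
    print(MAX_DELAY_NS // DELAY_STEP_NS)  # number of distinct delay steps
    print(loss_steps(0.5))
```

With these figures the delay range spans 937,500,000 steps of 16 ns, and the loss range spans 1,000,000 steps of 0.0001%.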
5 Use Types for FutureGrid
•
122
approved projects July 17 2011
–
https://portal.futuregrid.org/projects
•
Training Education and Outreach (13)
–
Semester and short events; promising for small universities
•
Interoperability test-beds (4)
–
Grids and Clouds;
Standards
; from Open Grid Forum OGF
•
Domain Science applications (42)
–
Life science highlighted (21)
•
Computer science (50)
–
Largest current category
•
Computer Systems Evaluation (35)
–
TeraGrid (TIS, TAS, XSEDE), OSG, EGI
Selected Current Education projects
•
System Programming and Cloud Computing,
Fresno
State, Teaches system programming and cloud
computing in different computing environments
•
REU: Cloud Computing,
Arkansas, Offers hands-on
experience with FutureGrid tools and technologies
•
Workshop: A Cloud View on Computing,
Indiana
School of Informatics and Computing (SOIC), Boot
camp on MapReduce for faculty and graduate students
from underserved ADMI institutions
Selected Current Interoperability
Projects
•
SAGA,
Louisiana State, Explores use of
FutureGrid components for extensive
portability and interoperability testing of
Simple API for Grid Applications, and scale-up
and scale-out experiments
Selected Current Bio Application
Projects
•
Metagenomics Clustering,
North Texas,
Analyzes metagenomic data from samples
collected from patients
•
Genome Assembly,
Indiana SOIC, De novo
Selected Current Non-Bio
Application Projects
•
Physics: Higgs boson,
Virginia, Matrix Element
calculations representing production and
decay mechanisms for Higgs and background
processes
•
Business Intelligence on MapReduce,
Cal
State - L.A., Market basket and customer
Selected Current Computer
Science Projects
•
Data Transfer Throughput,
Buffalo, End-to-end
optimization of data transfer throughput over
wide-area, high-speed networks
•
Elastic Computing,
Colorado, Tools and technologies
to create elastic computing environments using IaaS
clouds that adjust to changes in demand automatically
and transparently
Selected Current Technology
Projects
•
ScaleMP for Gene Assembly,
Indiana Pervasive
Technology Institute (PTI) and Biology,
Investigates distributed shared memory over 16
nodes for SOAPdenovo assembly of Daphnia
genomes
•
XSEDE,
Virginia, Uses FutureGrid resources as a
testbed for XSEDE software development
•
Globus Online,
Indiana PTI, Chicago, Investigates
the feasibility of providing DemoGrid and its
Typical FutureGrid Performance Study
ADMI Cloudy View on
Computing Workshop
June 2011
• ADMI: Association of Computer and Information Science/Engineering Departments at Minority Institutions
• Offered on FutureGrid
• 10 Faculty and Graduate Students from ADMI Universities
• The workshop covered topics ranging from cloud programming models to case studies of scientific applications on FutureGrid.
• At the conclusion of the workshop, the participants indicated that they would incorporate cloud computing into their courses and/or research.
• Concept and Delivery by Jerome Mitchell: Jerome took two courses from IU in this area, in Fall 2010 and Spring 2011, on FutureGrid
FutureGrid Viral Growth Model
•
Users apply for a project
•
Users improve/develop some software in project
•
This project leads to new images which are placed in
FutureGrid repository
•
Project report and other web pages document use
of new images
•
Images are used by other users
•
And so on ad infinitum ………
Elementary FG Access Services
FG Portal
• Coordination of Projects and users
– Project management
• Membership
• Results
– User Management
• Contact Information
• Keys, OpenID
• Coordination of Information
– Manuals, tutorials, FAQ, Help
– Status
• Resources, outages, usage, …
• Coordination of the Community
– Information exchange: Forum, comments, community pages
– Feedback: rating, polls
• Technology has been established
• Transition technical development to TACC as much as possible so we can focus on other areas at IU
Check your Account Status
•
Go to:
– Accounts-My Portal Account
•
Check if the account status
bar is green
– Errors will indicate an issue or a task that requires waiting
•
Since you are already here:
– Upload a portrait
Get access
Project Lead
•
Create a portal account
•
Create a project
•
Add project members
Project Member
•
Create a portal account
•
Ask your project lead to
add you to the project
Once the project you participate in is approved
Apply for an HPC & Nimbus account
• You will need an SSH key
Services
Offered
ViNe can be installed on the other resources via Nimbus
Access to the resource is
requested through the portal
Pegasus available via Nimbus and Eucalyptus
HPC on FutureGrid
Warren Smith
HPC on FutureGrid
•
HPC-style usage is supported
•
Many of the clusters have an HPC partition
•
Clusters well suited to HPC
–
Infiniband networks
HPC Access
•
ssh to login nodes
–
alamo.futuregrid.org, hotel.futuregrid.org, …
–
Uses the public key you’ve uploaded to the portal
•
Modules to manage your environment
•
Intel and Gnu compilers (others wanted?)
•
MPI, OpenMP
•
Torque and Moab to schedule access to compute
nodes
–
Reservations?
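As an illustration of the Torque/Moab step, the sketch below assembles a batch script from standard `#PBS` directives. The helper `make_job_script`, the module name `openmpi`, and the executable `./hello` are hypothetical; the modules and queues actually available on a given FutureGrid cluster are not stated in these slides.

```python
def make_job_script(job_name, nodes, ppn, walltime, modules, command, queue=None):
    """Build the text of a Torque/PBS batch script.

    nodes/ppn map to '-l nodes=N:ppn=P'; walltime is 'HH:MM:SS'.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")
    # Torque starts jobs in the home directory; move to the submit directory.
    lines.append("cd $PBS_O_WORKDIR")
    # Load the environment modules the job needs (names are site-specific).
    lines += [f"module load {m}" for m in modules]
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = make_job_script(
        "hello-mpi", 2, 8, "00:10:00",
        ["openmpi"],                  # hypothetical module name
        "mpirun -np 16 ./hello",      # hypothetical MPI program
    )
    print(script)
```

The resulting text would be saved to a file on a login node and submitted with `qsub`.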
Performance Tools
•
Provide a number of tools to analyze
performance
•
Full support of partner tools
What is Nimbus?
Nimbus Infrastructure: Enable providers to build IaaS clouds
– Workspace Service, Cumulus
Nimbus Platform: Enable users to use IaaS clouds
– Context Broker, Cloudinit.d, Elastic Scaling Tools, Gateway
High-quality, extensible, customizable, open source implementation
Enable developers to extend,
Using Nimbus Infrastructure
[Figure: pools of nodes managed by Nimbus]
Nimbus publishes information about each
VM
Users can find out information about their
VM (e.g. what IP the VM was bound to)
Users can interact directly with their VM in the same
way they would with a physical machine.
Nimbus on FutureGrid
•
Hotel
(University of Chicago) -- Xe
41 nodes, 328 cores
•
Foxtrot
(University of Florida) -- Xe
26 nodes, 208 cores
•
Sierra
(SDSC) -- Xe
18 nodes, 144 cores
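A quick check of the node and core counts listed above, as a minimal sketch; the dictionary and function names are illustrative only.

```python
# (nodes, cores) per Nimbus cloud, copied from the list above
NIMBUS_CLOUDS = {
    "hotel": (41, 328),
    "foxtrot": (26, 208),
    "sierra": (18, 144),
}


def summarize(clouds):
    """Return (cores per node for each cloud, total nodes, total cores)."""
    per_node = {name: cores // nodes for name, (nodes, cores) in clouds.items()}
    total_nodes = sum(nodes for nodes, _ in clouds.values())
    total_cores = sum(cores for _, cores in clouds.values())
    return per_node, total_nodes, total_cores


if __name__ == "__main__":
    per_node, total_nodes, total_cores = summarize(NIMBUS_CLOUDS)
    print(per_node, total_nodes, total_cores)
```

Each cloud works out to 8 cores per node, for 85 nodes and 680 cores in total across the three sites.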
Sky Computing
•
Sky Computing = a Federation of
Clouds
•
Approach:
– Combine resources obtained in
multiple Nimbus clouds in FutureGrid and Grid’5000
– Combine Context Broker, ViNe, fast image deployment
– Deployed a virtual cluster of over 1000 cores on Grid’5000 and
FutureGrid – largest ever of this type
•
Grid’5000 Large Scale Deployment
Challenge award
•
Demonstrated at OGF 29 06/10
•
TeraGrid ’10 poster
• More at:
www.isgtw.org/?pid=1002832
Work by Pierre Riteau et al,
University of Rennes 1
“Sky Computing”
Backfill: Lower the Cost of Your Cloud
•
Challenge:
utilization, catch-22 of
on-demand computing
•
Solution: new instances
– Backfill
•
Bottom line:
up to 100%
utilization
•
Who decides what backfill VMs
run?
•
Spot pricing
•
Research by Paul Marshall,
University of Colorado
•
Open Source community
contributions via Google Summer
of Code (GSoC), Paolo Gomez
•
Nimbus release 2.7
•
Paper @ CCGrid 2011
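The backfill idea can be illustrated with a toy model. This is not the Nimbus implementation: it assumes that every idle node immediately runs a preemptible backfill VM and that eviction costs nothing, which is why real deployments reach "up to" 100% and not exactly 100%. The function name and the sample load are hypothetical.

```python
def utilization(total_nodes, on_demand_load, backfill=False):
    """Average fraction of busy nodes over a sequence of time steps.

    on_demand_load: nodes requested by on-demand users at each step.
    With backfill=True, every idle node runs a preemptible backfill VM
    that is evicted as soon as an on-demand request needs the node.
    """
    if total_nodes <= 0 or not on_demand_load:
        raise ValueError("need at least one node and one time step")
    busy_total = 0
    for demand in on_demand_load:
        on_demand = min(demand, total_nodes)  # requests beyond capacity wait
        idle = total_nodes - on_demand
        busy_total += on_demand + (idle if backfill else 0)
    return busy_total / (total_nodes * len(on_demand_load))


if __name__ == "__main__":
    load = [2, 5, 10, 3]  # hypothetical on-demand requests on a 10-node cloud
    print(utilization(10, load))                 # without backfill
    print(utilization(10, load, backfill=True))  # with backfill
```

In this idealized model the sample load uses half of the cloud on average without backfill, and all of it with backfill.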
•
BaBar Experiment at SLAC
in Stanford, CA
•
Using clouds to simulate
electron-positron collisions
in their detector
•
Exploring virtualization as a
vehicle for data
preservation
•
Approach:
– Appliance preparation and management
– Distributed Nimbus clouds
– Cloud Scheduler
•
Running production BaBar
workloads
UVIC Efforts
Cloud Computing on FutureGrid
•
Several Infrastructure-as-a-Service clouds
–
Nimbus, Eucalyptus, OpenStack (experimental)
•
Supported patterns
–
Experimenting with middleware on top of
infrastructure clouds
–
Modifying and experimenting with infrastructure
clouds
Overview
•
Traditional ways of delivering hands-on training and
education in parallel/distributed computing have
non-trivial dependences on the environment
• Difficult to replicate same environment on different resources (e.g. HPC clusters, desktops)
• Difficult to cope with changes in the environment (e.g. software upgrades)
TEO Infrastructure - guiding principles
•
Fidelity
: TEO activities should use full-fledged,
executable software: education/training modules
–
Learn using the proper tools
•
Reproducibility:
Creators of content should be able
to install, configure, and test their modules once,
and be assured of the same functional behavior
regardless of where the module is deployed
TEO Infrastructure - guiding principles
•
Deployability:
Students and users should be
able to deploy modules in a simple manner,
and in a variety of resources
–
Reduce barriers to entry; avoid dependences
upon a particular infrastructure
•
Community-oriented
: Modules should be
simple to share, discover, reuse, and expand
Towards this vision in FutureGrid
•
Executable modules –
virtual appliances
–
Deployable on FutureGrid resources
–
Deployable on other cloud platforms, as well as
on virtualized desktops
•
Community sharing – Web 2.0 portal,
appliance image repositories
Virtual appliances
•
Leverage existing virtual networking software and
virtual appliance images used in other projects
•
Focus: integration with FutureGrid resources
–
Leverage network virtualization software
• FutureGrid includes ViNe and GroupVPN
–
Image deployment, testing, documentation, tutorials
• KVM/Xen, Nimbus/Eucalyptus
Virtual appliance clusters
•
Same image, different VPNs
[Figure: copy and instantiate a Hadoop + Virtual Network appliance image; a Hadoop worker; another Hadoop worker]
University of Arkansas, Indiana University, University of California at Los Angeles, Penn State, Iowa State, Univ. Illinois at Chicago, University of Minnesota, Michigan State, Notre Dame, University of Texas at El Paso, IBM Almaden Research Center, Washington University, San Diego Supercomputer Center, University of Florida, Johns Hopkins
July 26-30, 2010 NCSA Summer School Workshop
http://salsahpc.indiana.edu/tutorial
300+ Students (200 on sites from 10 institutes; 100 online)
IU MapReduce and UF Virtual Appliance technologies are supported by FutureGrid.
Activities: Courses
•
Graduate-level “Cloud computing for
Data-Intensive Sciences” (Judy Qiu, Fall 2010)
–
Virtualization technologies and tools
–
Infrastructure as a service
–
Parallel programming (MPI, Hadoop)
•
FutureGrid supported activities in a new
semester-long class offered Fall 2010 at LSU
(Gabrielle Allen, Shantenu Jha)
Activities: ADMI Workshop
•
Cloudy View on Computing workshop
–
10 faculty members and graduate students from HBCUs
interested in cloud computing.
–
Cloud programming models, case studies of scientific
Experiment Management on
FutureGrid
Warren Smith
Experiment Management
Goals
•
Support rigorous experimentation
–
Define experiments in detail
–
Record experimental results
• User-specified measurements (placement and granularity)
–
Share experiment information
• Experiments can be repeated and verified
• Variations on experiments can be performed
•
Convenient execution of experiments
–
FutureGrid has distributed resources and services
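One way to picture the "define, record, share" goals is a small experiment record that can be serialized and reloaded for a repeat run. This is a hypothetical sketch, not the FutureGrid experiment management format; every class, field and resource name here is invented for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExperimentRecord:
    """Hypothetical record of one experiment: definition plus results."""
    name: str
    resources: list            # e.g. cluster names used by the experiment
    parameters: dict           # inputs that define the experiment
    measurements: list = field(default_factory=list)

    def record(self, metric, value, unit):
        """Append one user-specified measurement."""
        self.measurements.append({"metric": metric, "value": value, "unit": unit})

    def to_json(self):
        """Serialize the record so it can be shared or stored."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a record, for example to repeat or vary the experiment."""
        return cls(**json.loads(text))


if __name__ == "__main__":
    exp = ExperimentRecord("mpi-latency", ["india", "sierra"], {"message_bytes": 64})
    exp.record("latency", 2.5, "us")
    print(exp.to_json())
```

Keeping the definition and the measurements in one serializable object is what makes a later repeat or variation straightforward: reload the record, change a parameter, and run again.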
Experiment Management
Approach
•
Provide tools to execute distributed experiments
– Access (potentially many) resources
– Interact with a number of services
– Support execution of experiment plans
•
Support several usage models
– Workflow (often large, automatic, batched, unattended)
– Interactive (attended)
– Hybrid
•
Store experiment information for later use
Experiment Management
Available Components
•
Pegasus
– Workflow-based experiment management
– Builds on existing Pegasus software
• Kickstart to record job execution and its environment
• Details of Pegasus presented elsewhere
•
TakTuk
– Basic interactive experiment management
– Reuse tool deployed on Grid’5000
•
Host List Manager
– Organize provisioned systems into groups, generate host lists for TakTuk
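The Host List Manager's role can be sketched as grouping provisioned hosts and emitting a plain host list. The code below is an illustration, not the actual tool: the function names, the host names, and the one-host-per-line output format are assumptions.

```python
from collections import defaultdict


def group_hosts(hosts):
    """Group (hostname, group) pairs into {group: sorted unique hostnames}."""
    groups = defaultdict(set)
    for hostname, group in hosts:
        groups[group].add(hostname)
    return {g: sorted(names) for g, names in groups.items()}


def host_list(groups, *selected):
    """Return a one-host-per-line list for the selected groups.

    With no groups selected, all groups are included in sorted order.
    """
    names = selected or sorted(groups)
    lines = []
    for g in names:
        lines.extend(groups[g])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    provisioned = [                      # hypothetical provisioned systems
        ("node2.example.org", "hadoop"),
        ("node1.example.org", "hadoop"),
        ("node3.example.org", "mpi"),
    ]
    groups = group_hosts(provisioned)
    print(host_list(groups, "hadoop"), end="")
```

A list like this could then be handed to TakTuk as its set of target machines.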
Experiment Management
Planned Components
•
Messaging-based Execution and Monitoring System (MEMS)
– More sophisticated interactive experiment management
– Integrated message streams for commands, results, and monitoring
•
Pegasus provisioning workflows
– Include resource provisioning into workflow
•
Experiment Repository
– Store and retrieve information about experiments
– Uses the FG Image Repository as a component