S06: Open-Source Stack for Cloud Computing
Milind Bhandarkar (Yahoo!), Michael Ryan (Intel), Michael Kozuch (Intel), Richard Gass (Intel)
Agenda
Sessions:
(A) Introduction 8.30-9.00
(B) Hadoop 9.00-10.00
Break 10.00-10.15
Hadoop 10.15-11.30
Lunch 11.30-12.30
(C) Pig 12.30-1.30
Break 1.30-1.45
(D) Tashi 1.45-3.30
I. Speaker intros
II. Motivation
III. Open Cirrus
IV. Open Cirrus software stack
V. Getting involved
Session A: Introduction
Michael Kozuch (Intro)
• Michael Kozuch is a Principal Engineer with Intel Labs Pittsburgh (ILP) and manager of the ILP Systems Research and Engineering group
– Manages the Intel Open Cirrus cluster and is the PI for the Tashi research project
• Michael is a 12-year veteran of Intel and contributed to the development of Intel’s VT and TXT technologies
Milind Bhandarkar (Hadoop)
• Has led the Yahoo! Grid Solutions Team since June 2005
• Contributor to Hadoop since January 2006
• Trained 1000+ Hadoop users at Yahoo! and elsewhere
• 20+ years of experience in parallel programming
Michael Ryan (Tashi)
• Michael is currently a research engineer with Intel Labs Pittsburgh
– Lead developer for Tashi
– Serves as sysadmin for the Intel Open Cirrus site
– Coordinates the Global Monitoring service for Open Cirrus
Richard Gass (PRS)
• Richard is currently a research engineer with Intel Labs Pittsburgh
– Lead developer for PRS
– Serves as sysadmin for the Intel Open Cirrus site
• Richard has published 9+ scientific papers and is also an (imminent) PhD candidate with
Why Open and Cloud Make Sense
• Cloud Computing is a new, critical technology
– Efficiency: Admin costs aggregated
– Scalability: From 1 to 1000 servers in 10 sec. flat
– Empowerment: Anyone can buy a cluster
• Open Communities enable rapid innovation
– Exchange of ideas: Knowledge grows
– Constructive Darwinism: Best tools survive/evolve
– Empowerment: Anyone can build a LAMP stack
Rapidly developing and deploying innovative computing technologies
Research Interest: Big Data
• Interesting applications are data hungry
• The data grows over time
• The data is immobile
– 100 TB @ 1Gbps ~= 10 days
• Compute comes to the data
• Big Data clusters are the new libraries
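The "100 TB @ 1Gbps ~= 10 days" figure can be checked with a few lines of arithmetic. This is a minimal sketch assuming decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s) and an idealized link running at full line rate with no protocol overhead; the function name is illustrative only.

```python
# Idealized transfer time behind the "100 TB @ 1Gbps ~= 10 days" estimate.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bit/s),
# the link runs at full line rate, and protocol overhead is ignored.

SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, gbps: float) -> float:
    """Return the idealized time in days to move `terabytes` over a `gbps` link."""
    bits = terabytes * 10**12 * 8      # bytes -> bits
    seconds = bits / (gbps * 10**9)    # bits / (bits per second)
    return seconds / SECONDS_PER_DAY   # seconds -> days


if __name__ == "__main__":
    for rate in (1, 10):
        print(f"100 TB over {rate} Gbps: {transfer_days(100, rate):.2f} days")
```

At 1 Gbps this gives about 9.3 days (8 x 10^5 seconds), consistent with the slide's rough figure; real transfers with protocol overhead and shared links would take longer, which is why moving compute to the data is attractive.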
Open Cirrus™ Cloud Computing Testbed
[Figure: map of Open Cirrus sites; labels include MIMOS*, ETRI*, ISPRAS*, KIT*, UIUC*, IDA*]
• Collaboration between industry and academia, sharing hardware infrastructure
Open Cirrus
• Objectives
– Foster systems research around cloud computing
– Vendor-neutral open-source stacks and APIs for the cloud
– Expose research community to enterprise level requirements
– Provide realistic traces of cloud workloads
• How we are unique
– Support for systems research and applications research
– Federation of heterogeneous datacenters
– Collection of interesting data sets
Independently managed sites…
User Access to Open Cirrus
• User access is organized around Research Projects
– Led by Principal Investigator (PI)
• Project PIs apply to each site separately
– Identifying additional team members
• Contact information for applications to each site is available on the Open Cirrus Web site
Open Cirrus* Research Projects
Example research areas of interest:
• Datacenter federation
• Datacenter management
• Web services
• Data-intensive systems
Projects typically not of interest:
• Traditional HPC app development
• Production apps looking for “free” cycles
• Closed-source system development
Open Cirrus* Software Components
• Physical Machine Allocation (PRS)
• Cluster Storage (HDFS)
• Virtual Machine Allocation (AWS*-compatible, e.g., Tashi or Eucalyptus)
• Application Services (Hadoop)
• Compute Node Services, Global Services, and Site Services: Single Sign-On, Global Monitoring, Global User Directories, Data Location, Resource Telemetry, Billing/Accounting
Physical Machine Allocation: PRS
• PRS dynamically divides compute nodes into isolated subdomains
– Provides each project with a mini-datacenter
– Isolation of experiments
[Figure: example subdomains: open service research, Tashi development, apps running in a VM management infrastructure (e.g., Tashi, Eucalyptus), production storage service]
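The idea of carving one physical pool into per-project mini-datacenters can be illustrated with a toy allocator. This is a minimal sketch, not the PRS interface: `NodeAllocator`, `Subdomain` and the per-subdomain `vlan` tag are hypothetical names, and "isolation" is modelled only as disjoint node sets plus a distinct network tag per subdomain.

```python
# Toy model of dividing a pool of physical nodes into isolated per-project
# subdomains. NodeAllocator, Subdomain and the `vlan` tag are hypothetical
# illustrations, not the PRS interface; "isolation" here just means that
# subdomains never share a node and each gets its own network tag.
from dataclasses import dataclass, field


@dataclass
class Subdomain:
    project: str
    vlan: int                                   # hypothetical per-subdomain network tag
    nodes: set = field(default_factory=set)


class NodeAllocator:
    def __init__(self, nodes, first_vlan=100):
        self.free = list(nodes)                 # nodes not assigned to any subdomain
        self.domains = {}                       # project name -> Subdomain
        self._next_vlan = first_vlan

    def allocate(self, project, count):
        """Move `count` free nodes into `project`'s subdomain, creating it if needed."""
        if count > len(self.free):
            raise ValueError(f"only {len(self.free)} free nodes, {count} requested")
        if project not in self.domains:
            self.domains[project] = Subdomain(project, self._next_vlan)
            self._next_vlan += 1
        domain = self.domains[project]
        for _ in range(count):
            domain.nodes.add(self.free.pop(0))  # a node leaves the free pool exactly once
        return domain

    def release(self, project):
        """Tear down a subdomain and return its nodes to the free pool."""
        domain = self.domains.pop(project)
        self.free.extend(sorted(domain.nodes))


if __name__ == "__main__":
    pool = NodeAllocator([f"node{i:02d}" for i in range(8)])
    dev = pool.allocate("tashi-dev", 3)
    storage = pool.allocate("storage-service", 2)
    print(dev.vlan, sorted(dev.nodes))
    print(storage.vlan, sorted(storage.nodes))
    pool.release("tashi-dev")
    print(len(pool.free), "nodes free")
```

Because a node is popped from the free list before it joins a subdomain, two projects can never hold the same machine, which is the property the slide's "isolation of experiments" relies on; a real system would also have to reconfigure networking and boot images.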
Cluster Storage: HDFS
• Storage system aggregating standard devices
– High-performance, parallel access
– High data reliability through replication
• Exposing location information enables intelligent placement of computation
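The two HDFS properties above, replication for reliability and exposed block locations for placing computation, can be sketched in a few functions. This is a simplified model under stated assumptions: the placement policy (consecutive nodes, wrapping around) is chosen for readability and is not HDFS's actual rack-aware policy, and the function names are hypothetical.

```python
# Toy model of two HDFS ideas: replicating each block on several nodes for
# reliability, and exposing block locations so tasks can run where the data is.
# The placement policy (consecutive nodes, wrapping around) is a simplification
# chosen for readability; it is not HDFS's actual rack-aware placement policy.


def place_blocks(num_blocks, datanodes, replication=3):
    """Return {block_id: [datanode, ...]} with `replication` distinct nodes per block."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    n = len(datanodes)
    return {b: [datanodes[(b + r) % n] for r in range(replication)]
            for b in range(num_blocks)}


def surviving_blocks(locations, failed):
    """Blocks that still have at least one replica on a node outside `failed`."""
    return {b for b, hosts in locations.items() if any(h not in failed for h in hosts)}


def schedule_tasks(locations, workers):
    """Assign one task per block, preferring the least-loaded worker holding a replica."""
    load = {w: 0 for w in workers}
    assignment = {}
    for block, hosts in sorted(locations.items()):
        local = [h for h in hosts if h in load]
        candidates = local or list(workers)          # fall back to a remote read
        chosen = min(candidates, key=lambda w: (load[w], w))
        load[chosen] += 1
        assignment[block] = (chosen, chosen in hosts)  # (worker, data-local?)
    return assignment


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    locations = place_blocks(6, nodes, replication=3)
    print(len(surviving_blocks(locations, {"dn1"})), "of 6 blocks survive losing dn1")
    for block, (worker, is_local) in schedule_tasks(locations, nodes).items():
        print(f"block {block} -> {worker} ({'local' if is_local else 'remote'})")
```

With three replicas per block, losing any single node leaves every block readable, and because the scheduler can see which nodes hold each block, it can run every task on a node that already stores its input instead of pulling the data over the network.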
Virtual Machine Allocation: Tashi
• An open-source Apache Software Foundation incubator project
– Infrastructure for cloud computing on Big Data
– http://incubator.apache.org/projects/tashi
– Support for AWS* interface
– OS, FS, and VMM agnostic
• Research focus:
Application Service: Hadoop
• An open-source Apache Software Foundation project sponsored by Yahoo!
– http://hadoop.apache.org
• Provides a scalable, parallel programming model (MapReduce) and the associated runtime
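The shape of the MapReduce programming model can be shown with the classic word-count example. This is a single-process sketch, not Hadoop's API: Hadoop's native interface is Java (Mapper and Reducer classes), and it runs the map and reduce phases in parallel across the cluster, whereas here the map, shuffle, and reduce steps run sequentially in memory.

```python
# Single-process sketch of the MapReduce programming model (word count).
# Hadoop runs the map and reduce phases in parallel across a cluster and
# moves data between them; here all three phases run sequentially in memory.
from collections import defaultdict


def mapper(line):
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                      # map phase: records are independent
        for key, value in mapper(line):
            groups[key].append(value)       # shuffle: group intermediate values by key
    # reduce phase: each key is processed independently, in sorted key order
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(docs))
```

Because `mapper` sees one record at a time and `reducer` sees one key at a time, neither needs global state; that independence is what lets the runtime spread both phases over many machines and restart failed tasks.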
Summary
• Open Communities can shape the development of Cloud Computing
• Open Cirrus* is a multi-partner testbed for research in Cloud Computing
• The Open Cirrus software stack provides a good starting point for open-source cloud computing software development
Getting Involved
• Contact Open Cirrus* with research proposals