S06: Open-Source Stack for Cloud Computing

(1)

Milind Bhandarkar (Yahoo!), Michael Ryan (Intel), Michael Kozuch (Intel), Richard Gass (Intel)

(2)

Agenda

Sessions:

(A) Introduction 8.30-9.00
(B) Hadoop 9.00-10.00
Break 10.00-10.15
Hadoop 10.15-11.30
Lunch 11.30-12.30
(C) Pig 12.30-1.30
Break 1.30-1.45
(D) Tashi 1.45-3.30

I. Speaker intros
II. Motivation
III. Open Cirrus
IV. Open Cirrus software stack
V. Getting involved

(3)

Session A: Introduction

(4)

Michael Kozuch (Intro)

• Michael Kozuch is a Principal Engineer with Intel Labs Pittsburgh and manager of the ILP Systems Research and Engineering group
– Manages the Intel Open Cirrus cluster and is the PI for the Tashi research project
• Michael is a 12-year veteran of Intel and contributed to the development of Intel’s VT and TXT technologies

(5)

Milind Bhandarkar (Hadoop)

• Lead Yahoo! Grid Solutions Team since June 2005
• Contributor to Hadoop since January 2006
• Trained 1000+ Hadoop users at Yahoo! & elsewhere
• 20+ years of experience in Parallel Programming

(6)

Michael Ryan (Tashi)

• Michael is currently a research engineer with Intel Labs Pittsburgh
– Lead developer for Tashi
– Serves as sysadmin for the Intel Open Cirrus site
– Coordinates the Global Monitoring service for Open Cirrus

(7)

Richard Gass (PRS)

• Richard is currently a research engineer with Intel Labs Pittsburgh
– Lead developer for PRS
– Serves as sysadmin for the Intel Open Cirrus site
• Richard has published 9+ scientific papers and is also an (imminent) PhD candidate with

(8)

(9)

Why Open and Cloud Make Sense

• Cloud Computing is a new, critical technology

– Efficiency: Admin costs aggregated

– Scalability: From 1 to 1000 servers in 10 sec. flat

– Empowerment: Anyone can buy a cluster

• Open Communities enable rapid innovation

– Exchange of ideas: Knowledge grows

– Constructive Darwinism: Best tools survive/evolve

– Empowerment: Anyone can build a LAMP stack

Rapidly developing and deploying innovative computing technologies

(10)

Research Interest: Big Data

• Interesting applications are data hungry

• The data grows over time

• The data is immobile

– 100 TB @ 1Gbps ~= 10 days

• Compute comes to the data

• Big Data clusters are the new libraries
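The "100 TB @ 1Gbps ~= 10 days" estimate above can be checked with a short calculation. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s) and a link running at full line rate with no protocol overhead; the function name `transfer_days` is illustrative only.

```python
def transfer_days(size_tb: float, link_gbps: float) -> float:
    """Days needed to move size_tb terabytes over a link_gbps link.

    Assumes decimal units and an ideal link (no overhead, no contention).
    """
    bits = size_tb * 1e12 * 8               # terabytes -> bits
    seconds = bits / (link_gbps * 1e9)      # bits / (bits per second)
    return seconds / 86_400                 # seconds -> days


if __name__ == "__main__":
    print(f"{transfer_days(100, 1):.1f} days")  # prints "9.3 days"
```

With real-world overhead the figure only gets worse, which is consistent with the slide's rounding to about 10 days and with the conclusion that compute should come to the data.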

(11)

(12)

Open Cirrus™ Cloud Computing Testbed

MIMOS* ETRI* ISPRAS* KIT* UIUC* IDA*

Collaboration between industry and academia, sharing

• hardware infrastructure

(13)

Open Cirrus

• Objectives

– Foster systems research around cloud computing

– Vendor-neutral open-source stacks and APIs for the cloud

– Expose research community to enterprise level requirements

– Provide realistic traces of cloud workloads

• How we are unique

– Support for systems research and applications research

– Federation of heterogeneous datacenters

– Collection of interesting data sets

Independently-managed sites…

(14)

User Access to Open Cirrus

• User access is organized around Research Projects

– Led by Principal Investigator (PI)

• Project PIs apply to each site separately

– Identifying additional team members

• Contact information for applications to each site is available on the Open Cirrus Web site

(15)

Open Cirrus* Research Projects

Example research areas of interest
• Datacenter federation
• Datacenter management
• Web services
• Data-intensive systems

Projects typically not of interest
• Traditional HPC app development
• Production apps looking for “free” cycles
• Closed-source system development

(16)

(17)

Open Cirrus* Software Components

Physical Machine Allocation (PRS)

Cluster Storage (HDFS)

Virtual Machine Allocation (AWS* Compatible, e.g. Tashi or Eucalyptus)

Application Services (Hadoop)

[Figure: software stack diagram. Service tiers: Compute Node Services, Global Services, Site Services. Labeled services: Single Sign-On, Global Monitoring, Global User Directories, Data Location, Resource Telemetry, Billing/Accounting]

(18)

Physical Machine Allocation: PRS

• PRS dynamically divides compute nodes into isolated subdomains
– Provides each project with a mini-datacenter
– Isolation of experiments

[Figure: example subdomains: open service research; Tashi development; apps running in a VM mgmt infrastructure (e.g., Tashi, Eucalyptus); production storage service]
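The idea of dividing compute nodes into isolated subdomains can be illustrated with a toy allocator. This is not PRS code or its API: the class `NodePool`, its methods, and the project names are hypothetical, and the sketch only models the bookkeeping (each node belongs to at most one project at a time), not network or boot-level isolation.

```python
from typing import Dict, List, Set


class NodePool:
    """Toy model: physical nodes are handed out to projects in disjoint sets."""

    def __init__(self, nodes: List[str]) -> None:
        self.free: Set[str] = set(nodes)
        self.domains: Dict[str, Set[str]] = {}

    def allocate(self, project: str, count: int) -> List[str]:
        """Move `count` free nodes into the project's subdomain."""
        if count > len(self.free):
            raise ValueError("not enough free nodes")
        chosen = sorted(self.free)[:count]      # deterministic choice
        self.free.difference_update(chosen)
        self.domains.setdefault(project, set()).update(chosen)
        return chosen

    def release(self, project: str) -> None:
        """Return all of a project's nodes to the free pool."""
        self.free |= self.domains.pop(project, set())


if __name__ == "__main__":
    pool = NodePool(["n1", "n2", "n3", "n4"])
    print(pool.allocate("tashi-dev", 2))   # ['n1', 'n2']
    print(pool.allocate("storage", 1))     # ['n3']
```

Because nodes leave the free set when allocated, two projects can never hold the same machine, which is the property each "mini-datacenter" relies on.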

(19)

Cluster Storage: HDFS

• Storage system aggregating standard devices
– High-performance, parallel access
– High data reliability through replication
• Exposing location information enables intelligent placement of computation
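How exposed location information can guide placement of computation is sketched below. This is a simplified illustration, not the HDFS or Hadoop scheduler API: the block-to-replica map, node names, and the functions `pick_node` and `place_tasks` are all assumptions made for the example. Each task is assigned to a free node that already stores a replica of its block when one exists, and to any other free node otherwise.

```python
from typing import Dict, List, Optional, Set


def pick_node(replicas: List[str], free_nodes: Set[str]) -> Optional[str]:
    """Prefer a free node holding a replica; otherwise fall back to any free node."""
    for node in replicas:
        if node in free_nodes:
            return node                      # data-local placement
    return min(free_nodes) if free_nodes else None


def place_tasks(block_locations: Dict[str, List[str]],
                nodes: List[str]) -> Dict[str, Optional[str]]:
    """Assign one task per block, at most one task per node."""
    free = set(nodes)
    placement: Dict[str, Optional[str]] = {}
    for block, replicas in block_locations.items():
        node = pick_node(replicas, free)
        placement[block] = node
        if node is not None:
            free.discard(node)
    return placement


if __name__ == "__main__":
    locations = {
        "blk_1": ["n1", "n2", "n3"],
        "blk_2": ["n1", "n4", "n5"],
        "blk_3": ["n1", "n2", "n6"],
    }
    print(place_tasks(locations, ["n1", "n2", "n3", "n4"]))
    # {'blk_1': 'n1', 'blk_2': 'n4', 'blk_3': 'n2'}
```

In this example every task runs on a node that already holds its block, so no block has to cross the network before processing starts.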

(20)

Virtual Machine Allocation: Tashi

• An open source Apache Software Foundation incubator project
– Infrastructure for cloud computing on Big Data
– http://incubator.apache.org/projects/tashi
– Support for AWS* interface
– OS, FS, and VMM agnostic
• Research focus:

(21)

Application Service: Hadoop

• An open-source Apache Software Foundation project sponsored by Yahoo!
– http://hadoop.apache.org
• Provides a scalable, parallel programming model (MapReduce) and the associated runtime
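The MapReduce programming model mentioned above can be illustrated with a single-process word count. This sketch does not use the Hadoop API and runs nothing in parallel; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative. It only shows the three conceptual steps: map emits key/value pairs, the shuffle groups values by key, and reduce combines each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["open cirrus", "open source cloud"]))
    # {'open': 2, 'cirrus': 1, 'source': 1, 'cloud': 1}
```

In a real runtime the map and reduce functions are the parts a user writes, while the runtime handles distribution, grouping, and fault tolerance across the cluster.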

(22)

(23)

Summary

• Open Communities can shape the development of Cloud Computing
• Open Cirrus* is a multi-partner test bed for research in Cloud Computing
• The Open Cirrus software stack provides a good starting point for open-source cloud computing software development

(24)

Getting Involved

• Contact Open Cirrus* with research proposals

University Pierre and Marie Curie LIP6 in Paris

http://opencirrus.org

http://incubator.apache.org/projects/tashi