(1)

Infrastructure Clouds for Science and Education: Platform Tools

Kate Keahey, Renato J. Figueiredo, John Bresnahan, Mike Wilde, David LaBissoniere

Argonne National Laboratory

Computation Institute, University of Chicago

University of Florida

(2)

www.nimbusproject.org

The Power of Infrastructure Clouds

Virtualization opens the floodgates

7/16/2012 2

• Outsourcing

• Virtual appliances

– Freeze your stack in time

– Run it anywhere

• Multi-cloud applications

– Run many copies all over the world!

• Elasticity

(3)

Harnessing The Power

• Organization tools and techniques

(4)

Towards a Power Adapter

(5)

What Needs To Be Harnessed

• VM (appliance) creation and development

– configuration management tools (chef, puppet)

• VM hypervisors

– Infrastructure-as-a-Service (IaaS)

• Cloud applications

– virtual clusters, cloudinit.d, CloudFormation

• Elasticity

– Auto-scaling tools, Phantom

• Workflow

– Swift, etc.

(6)


What Needs To Be Organized?

• VM (appliance) creation and development

– configuration management tools (chef, puppet)

• VM hypervisors

– Infrastructure-as-a-Service (IaaS)

• Cloud applications

– virtual clusters, cloudinit.d, CloudFormation

• Elasticity

– Auto-scaling tools, Phantom

• Workflow

– Swift, etc.


(7)

VM Applications

• An entire system frozen in time

– Full software stacks (versions)

– Configuration files

– Important for science!

• A dedicated modular service

– Web service, database, AMQP node, etc.

• Demos

• A single binary file (or set of files)

– Easy to freeze

(8)


Developing Appliances

• A single binary image?

– Many developers?

– Version control?

– Merge conflicts?

• Base image with a description

– Ex: Ubuntu 11.04 base images plus a set of scripts

• Configuration Management Software

– Chef, Puppet, FG Rain, etc.


(9)

Chef

• Software stack description

– Ruby and JSON

• A library of cookbooks

• Cookbooks contain recipes

– Ex: apache2 server with php4

• Attributes to customize each recipe

– Ex: the port on which apache will listen

• Templates for configuration files

• Appliance developers make recipes

– Version control can be done with git/svn/cvs…

(10)


Example Recipe


app_dir = node[:appdir]
ve_dir = node[:virtualenv][:path]

# Check out the application source at the configured branch.
git app_dir do
  repository node[:autoscale][:git_repo]
  reference node[:autoscale][:git_branch]
  action :sync
  user node[:username]
  group node[:groupname]
end

# Install the application from the checked-out source tree.
execute "run install" do
  cwd app_dir
  user node[:username]
  group node[:groupname]
  command "python setup.py install"
end

(11)

Example Template

Template:

phantom:
  system:
    type: epu
    rabbit: <%= node[:autoscale][:rabbit_host] %>
    rabbit_port: <%= node[:autoscale][:rabbit_port] %>
    rabbit_ssl: False
    rabbit_user: <%= node[:autoscale][:rabbit_username] %>
    rabbit_pw: <%= node[:autoscale][:rabbit_password] %>
    rabbit_exchange: <%= node[:autoscale][:rabbit_exchange] %>
  authz:
    type: sqldb
    dburl: <%= node[:autoscale][:dburl] %>

Rendered result:

phantom:
  system:
    type: epu
    rabbit: vm-102.uc.futuregrid.org
    rabbit_port: 5672
    rabbit_ssl: False
    rabbit_user: XXX
    rabbit_pw: PPPPPP
    rabbit_exchange: default_dashi_exchange
  authz:
    type: sqldb
    dburl: mysql://nimbus:[email protected]/testphantom
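The way node attributes fill a template can be sketched in Python. This is only an illustration: Chef renders ERB templates in Ruby, and `string.Template` is used here as a stand-in. The attribute names follow the slide, while the `render` helper and the attribute values are hypothetical.

```python
from string import Template

# Stand-in for the ERB template: $name placeholders play the role of
# the <%= node[:autoscale][...] %> expressions on the slide.
TEMPLATE = Template(
    "rabbit: $rabbit_host\n"
    "rabbit_port: $rabbit_port\n"
    "rabbit_exchange: $rabbit_exchange\n"
)


def render(attributes):
    """Fill the template from a dict of node attributes.

    substitute() raises KeyError when an attribute is missing, so an
    incomplete attribute set fails loudly instead of producing a
    half-filled configuration file.
    """
    return TEMPLATE.substitute(attributes)


# Hypothetical attribute values for one deployment.
attrs = {
    "rabbit_host": "rabbit.example.org",
    "rabbit_port": 5672,
    "rabbit_exchange": "default_dashi_exchange",
}
rendered = render(attrs)
```

Keeping deployment-specific values in attributes means the same template can be reused unchanged across deployments.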

(12)


(13)

Cloud Applications

• More than 1 VM needed for the job

• Information exchange is needed

– Manual information exchange 

• Multi-cloud

– Cloud independence required

[Figure: an application composed of several web servers, nginx, and a database]

(14)


Cloud Management Tools

• Architecture description

– VM type, location, count

– Volumes

– Networks

– Other services

• Contextualization

– Exchange dynamically determined information

• IP addrs, security information.

– Bootstrap component connections

• Ex: mount NFS, connect to DB, etc


(15)

A Simplified Deployment Scenario

(16)

A Grid in Your Pocket…

[Figure, built up over three slides: users Pierre, Jamie, and David; resources on EC2, the OOI private cloud, and FutureGrid]

(19)

CloudFormation

• Assemble AWS services

– Run AMIs

– Connect EBS volumes to AMIs

– Associate an SQS queue, etc.

• JSON descriptions

• AWS only

• No configuration management software integration

– Manual integration with Chef

(20)


cloudinit.d

• Multicloud VM dependency management

– Uses the libcloud abstraction library

• Integrated with Chef Solo

• ini file format descriptions

– Coupled with any executable script

• Launch plan end-users/operators:

– Lightweight

– Copy launch plan and “one click” action

– Easily reconfigured for various clouds

• Launch plan/application developers:

– Minimal software assumptions (ssh)

– “Stem cell” deployment approach

– Incremental launch plan development


[svc-alamoHTTP]
iaas_key: XXXXXX
iaas_secret: XXXX
iaas_host: alamo.futuregrid.org
iaas_port: 8443
iaas: Nimbus
image: ubunut10.10
ssh_username: ubuntu
localsshkeypath: ~/.ssh/fg.pem
readypgm: http-test.py
bootpgm: http-boot.sh
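Because the launch plan uses the ini format, a service section like the one above can be read with Python's standard `configparser`. This is a sketch for inspecting a plan, not a description of how cloudinit.d loads it internally; the `load_service` helper is hypothetical and the credentials are the slide's placeholders.

```python
import configparser

# Service section as shown on the slide; key and secret are placeholders.
LAUNCH_PLAN = """
[svc-alamoHTTP]
iaas_key: XXXXXX
iaas_secret: XXXX
iaas_host: alamo.futuregrid.org
iaas_port: 8443
iaas: Nimbus
image: ubunut10.10
ssh_username: ubuntu
localsshkeypath: ~/.ssh/fg.pem
readypgm: http-test.py
bootpgm: http-boot.sh
"""


def load_service(text, section):
    """Parse ini text and return one section as a plain dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # ':' is accepted as a key/value delimiter
    return dict(parser[section])


service = load_service(LAUNCH_PLAN, "svc-alamoHTTP")
port = int(service["iaas_port"])  # values are strings until converted
```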

(21)

cloudinit.d Overview

• Services

• Run Levels

– Collections of services without dependencies on each other

• Launch Plan

– An ordered set of run levels
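The relationship between services, run levels, and a launch plan can be sketched as follows. This is an assumed model rather than cloudinit.d's implementation: the plan is represented as a list of lists, and `launch` is a hypothetical callable that starts one service and returns once it is ready.

```python
from concurrent.futures import ThreadPoolExecutor


def run_launch_plan(run_levels, launch):
    """Launch run levels in order; services within a level run concurrently.

    run_levels: list of lists of service names (hypothetical structure).
    launch: callable that starts one service and returns when it is ready.
    Returns the service names grouped in the order their levels completed.
    """
    completed = []
    for level in run_levels:
        # Services in one level do not depend on each other, so they
        # can be started in parallel.
        with ThreadPoolExecutor(max_workers=max(1, len(level))) as pool:
            # list() waits for every launch in this level (and re-raises
            # any exception) before the next level is started.
            list(pool.map(launch, level))
        completed.extend(level)
    return completed


# Hypothetical plan: a database first, then three web servers.
plan = [["database"], ["web1", "web2", "web3"]]
started = []
order = run_launch_plan(plan, started.append)
```

Because a level only starts after the previous one has finished, a later service can rely on everything in earlier levels already being up.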

(22)

Cloudinit.d Features

• Repeatability: write a launch plan once, deploy many times

• Deploy on cloud and non-cloud resources

• Coordination of interdependent launches

• User-defined launch tests

• Test-based monitoring and repair

[Figure: a launch plan with two run levels deploying a database and three web servers]

(28)


A Single Service Application Boot

Here we show how cloudinit.d automatically creates an HTTP server from a simple distribution base image:

• Request a new VM from the infrastructure cloud through the IaaS interface

• Check status: poll the IaaS service to determine when the VM is running…

• The VM is running; verify ssh works (sshd needs to start up and be accessible on the new VM)

• scp over the boot contextualization program (bootpgm)…

• Run the boot program… Now the VM has been contextualized to be a web server

• scp over the ready program (readypgm)

• Run the ready program… If the ready program has a successful exit code (0), then the new simple cloud application is set to go!
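A ready program only has to report success through its exit code. The slides do not show the contents of `http-test.py`, so the following is a hypothetical sketch of such a check for an HTTP server; the function names and the default URL are assumptions.

```python
import urllib.error
import urllib.request


def http_ready(url, timeout=5.0):
    """Return 0 if the URL answers with HTTP 200, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return 1


def main(argv):
    """Exit-code style entry point: 0 means the service is ready."""
    # Hypothetical default: test the web server on the VM itself.
    target = argv[1] if len(argv) > 1 else "http://localhost/"
    return http_ready(target)
```

A script built on this sketch would finish with `sys.exit(main(sys.argv))`, so that the exit code carries the test result.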

(29)


(30)

Escalation Pattern

Operational Units

User Domain (configuration and security)

Domain Management: monitor and regulate domain properties based on system-specific and application-specific metrics

• Challenge: leverage on-demand, large but unreliable provider pool

– Applications that absorb resources

– Applications that tolerate failures

(31)

Scaling Considerations

• Reasons to scale

– Business vs science

• Cost vs quota

• Lossy environment

– VMs fail more often than bare metal

– N preserving

• Spot instances

– If the price is right

• Backfill

– If resources are idle

(32)


Amazon Auto Scaling and CloudWatch

• Auto Scaling in EC2

– Policies to scale up and down servers

• Min, Max, and desired size

• Integrated with AWS CloudWatch Sensors

– Triggers

– CPU load, disk capacity, load balancer load, etc.

– Custom sensors

• No contextualization

• REST API

• AWS only
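The interplay of a trigger with the min, max, and desired sizes can be sketched as a single policy step. This is a generic illustration, not the AWS API: the function name and the CPU thresholds are hypothetical.

```python
def desired_size(current, cpu_load, min_size, max_size,
                 scale_up_at=0.8, scale_down_at=0.2):
    """One step of a threshold policy, clamped to [min_size, max_size].

    cpu_load is an average utilization between 0.0 and 1.0; the two
    thresholds are illustrative defaults, not values from any service.
    """
    if cpu_load > scale_up_at:
        target = current + 1      # trigger fired: add a server
    elif cpu_load < scale_down_at:
        target = current - 1      # mostly idle: remove a server
    else:
        target = current          # within the comfort band
    # The group never shrinks below min_size or grows beyond max_size.
    return max(min_size, min(max_size, target))
```

For instance, `desired_size(4, 0.9, 1, 4)` stays at 4 because the group is already at its maximum.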


(33)

Phantom Scaling Services

• Multi-cloud

– Fail-over and even distribution policies

• Monitor scaling factors and failures

– Generic/system qualities: deployment status, load, bank account, etc.

– Application-specific qualities, e.g., a workload queue for AliEn, PBS, AMQP, and others

• Evaluate against policies

• Scale and/or recover

– For user components

– For system components

– Across different cloud providers

• Release as a Service

• 0.1 running on FutureGrid now

– Initially available as a service on FutureGrid resources

– Provides high availability

Sensor information

Reliably provision, manage and contextualize resources

Apply Policy

(34)


Infrastructure Platform Goals

• Multi-cloud

– Work across private, community and commercial clouds

• Any Scale

– Scale in response to a diverse set of sensors/triggers

– Both system and application sensors

• High Availability

– “Any VM can die”: system or user VMs

– Minimizing time to recovery (TTR)

• Your Policies, Our Enactment

– User-defined sensors/triggers and policies

• Engineered from the ground up to work with infrastructure clouds

• Easy on the user


(35)

How Can Science Plug Into This Power

Example Embarrassingly Parallel Scientific Application

Demonstration

(36)

Using Nimbus Domains

[Figure labels: Application, Start the workers, M subtask messages, Task Queue]

(37)

Using Nimbus Domains

[Figure labels: Application, Start the workers, M subtask messages, Message Queue, “N preserving” policy, Preserve N worker VMs, Infrastructure Compute Cloud, Get task, Results/Checkpoints, Cumulus/S3]
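An “N preserving” policy can be sketched as a comparison between the desired worker count and the workers that are still alive. This is an assumed reading of the policy, not Phantom's code: the function name is hypothetical, and `FAILED` is a made-up state name used only for illustration alongside the PENDING, STARTED, and RUNNING states.

```python
# States that count towards N: the VM is on its way up or already working.
LIVE_STATES = ("PENDING", "STARTED", "RUNNING")


def n_preserving(desired_n, vm_states):
    """Return how many VMs to launch (positive) or terminate (negative).

    vm_states is the list of current states, one entry per worker VM.
    VMs in any other state (for example a hypothetical FAILED) are
    treated as lost and get replaced.
    """
    live = sum(1 for state in vm_states if state in LIVE_STATES)
    return desired_n - live


# One of four workers has failed, so one replacement is needed.
to_launch = n_preserving(4, ["RUNNING", "RUNNING", "FAILED", "PENDING"])
```

Since the remaining subtasks stay on the queue, replacement workers can pick up where failed ones left off.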

(38)

Phantom Architecture

[Figure labels: Web Application (nginx, HTTPS), REST Service, MySQL, RabbitMQ, EPUM, Provisioner, DTRS, Zookeeper Cluster, IaaS Clouds, FutureGrid Clouds]

(39)

Adventures in Availability

• Time to scale (TTS)

– PENDING (request)

– STARTED (deployment)

– RUNNING (contextualization)

TTS: preliminary results for 2,000 VMs provisioned on AWS EC2
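One way to read the breakdown above is that the time spent in each state corresponds to one phase of the time to scale. The sketch below follows that assumption; the `READY` timestamp (contextualization finished) and the function name are hypothetical, and the numbers are made up rather than taken from the measurements.

```python
def time_to_scale(ts):
    """Split the time to scale into phases from state-entry timestamps.

    ts maps a state name to the time (in seconds) at which the VM
    entered it. READY is a hypothetical marker for the moment
    contextualization has finished.
    """
    phases = {
        "request": ts["STARTED"] - ts["PENDING"],          # time in PENDING
        "deployment": ts["RUNNING"] - ts["STARTED"],       # time in STARTED
        "contextualization": ts["READY"] - ts["RUNNING"],  # time in RUNNING
    }
    phases["total"] = ts["READY"] - ts["PENDING"]
    return phases


# Made-up timestamps for a single VM, in seconds since the request.
example = time_to_scale({"PENDING": 0, "STARTED": 12, "RUNNING": 75, "READY": 110})
```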

(40)

Applications: Schedulers, Elastic MapReduce, Workflow Systems (Swift), Data Transfer Systems, Science Gateways, Custom Applications (OOI)

Application adaptation: library of generic sensors, application-specific sensors, policies, decision engine

Infrastructure Platform: contextualization, multi-cloud bridge, repeatable launches, scaling, elasticity and high availability
