Clusters in the Cloud
Dr. Paul Coddington, Deputy Director
Dr. Shunde Zhang, Computing Specialist
eResearch SA
Use Cases
• Make the cloud easier to use for compute jobs
– Particularly for users familiar with HPC clusters
• Personal, on-demand cluster in the cloud
– A private cluster only available to a research group
• Cluster in the cloud
– A shared Node-managed cluster
• Preferably dynamic/elastic
– “Cloudbursting” for HPC
• Dynamically (and transparently) add extra compute nodes from the cloud to an existing HPC cluster
Cluster infrastructure
Hardware, Network
Software layer:
– Local Resource Management System / Queueing system
– Monitoring
– Shared File System
– Configuration Management System
– Application Distribution
Traditional Static Cluster
• Hardware/Network
– Dedicated hardware
– Long process to get new hardware
– Static, not elastic
• Software
– Assumes a fairly static environment (IPs etc.)
– Not cloud-friendly
– Some systems need a restart if the cluster is changed
– Not adaptable to changes
Cluster in the cloud
• Hardware / Network
– Provisioned by the cloud (OpenStack)
– Get new resources in minutes
– Remove resources in minutes
– Elastic/scalable on demand
• Software
– Dynamic
Possible solutions
• Condor for high-throughput computing
– Cloud Scheduler working for a CERN LCG node
– Recent versions of Condor support cloud execution
• Torque/PBS static cluster in cloud
– Works, but painful to set up and maintain
• Dynamic Torque/PBS cluster
– No existing dynamic/elastic solution
• StarCluster for personal cluster
– Automates setup of VMs in the cloud, including cluster configuration
– Can add/remove worker nodes manually
Our work
• Condor for high-throughput computing
– Cloud Scheduler for Australian CERN LCG node
• Torque/PBS static cluster in cloud
– Set up large cluster in cloud for CoEPP
– Scripts to automate setup and monitoring
• Dynamic Torque/PBS cluster
– Created Dynamic Torque system for OpenStack
• StarCluster for personal cluster
– Ported to OpenStack and added a Torque plugin
– Add-ons to make it easier to use for eRSA users
Application software
• Want familiar HPC applica:ons to be available to cloud VMs
– And we don’t want to install and maintain software twice, in HPC and cloud
• But limit on size of VM images in the cloud
• Want to avoid making lots of custom images
• We use CVMFS
– Read-only distributed file system, HTTP based
– Used by CERN LHC Grid for distributing software
– One VM image, with CVMFS client (setup sketched below)
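Client-side CVMFS setup is small enough to bake into the one shared VM image. A minimal sketch, assuming a hypothetical repository name and proxy URL (eRSA's actual values would differ):

# Minimal sketch of CVMFS client setup on a worker VM.
# Repository name and proxy URL are placeholders.
$ sudo yum install -y cvmfs                 # or apt-get install cvmfs
$ sudo tee /etc/cvmfs/default.local <<'EOF'
CVMFS_REPOSITORIES=apps.ersa.example.org
CVMFS_HTTP_PROXY=http://cvmfs-proxy.example.org:3128
EOF
$ sudo cvmfs_config setup                   # configure autofs/FUSE for /cvmfs
$ cvmfs_config probe                        # check the repository mounts on demand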
HTC Cluster in the Cloud
• NeCTAR eResearch Tools project for high-throughput computing in the cloud
• ARC Centre of Excellence in Experimental Particle Physics (CoEPP)
• Needed a large cluster for CERN ATLAS data analysis and simulation
• Tier 2 (global) and Tier 3 (local) jobs
• Augment existing small physical clusters at multiple sites, running Torque
Static Cluster in the Cloud
• Built a large Torque cluster using cloud VMs
• A challenging exercise!
• Reliability issues; needed a lot of scripts to automate setup, monitoring, recovery, etc.
• Some types of usage are bursty, but cluster resources were static
Dynamic Torque
• Static/dynamic worker nodes
– Static: stays up all the time
– Dynamic: up and down according to workload
• Independent of Torque/MAUI
– Runs as a separate process
• Only add/remove worker nodes
• Query Torque and MAUI scheduler periodically
• Still up to the MAUI scheduler to decide where to run jobs (core loop sketched below)
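Dynamic Torque itself is a Python daemon (see the link under Resources); the shell pseudo-loop below only illustrates the idea, with placeholder image and flavor names: poll the queue, boot or release cloud VMs, and leave job placement to MAUI.

# Illustrative sketch only: the core loop of a dynamic provisioner.
while true; do
    queued=$(qstat -Q batch | awk 'NR==3 {print $6}')   # jobs waiting in the queue
    free=$(pbsnodes -l free | wc -l)                    # idle worker nodes
    if [ "$queued" -gt 0 ] && [ "$free" -eq 0 ]; then
        # boot an extra worker; on start-up it registers with the Torque server
        nova boot --image worker-image --flavor m1.large "wn-$(date +%s)"
    elif [ "$queued" -eq 0 ] && [ "$free" -gt 0 ]; then
        idle=$(pbsnodes -l free | head -1 | awk '{print $1}')
        pbsnodes -o "$idle"                             # drain the node first
        nova delete "$idle"                             # then release the VM
    fi
    sleep 60
done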
Dynamic Torque for CoEPP
[Architecture diagram: Torque/MAUI plus Dynamic Torque, with LDAP, NFS, Puppet, Ganglia, Nagios, and CVMFS services; worker nodes in SA, Melbourne, and Monash; interactive nodes in Melbourne]
CoEPP Outcomes
• Three large clusters in use for over a year
– Hundreds of cores in each
• Condor and CloudScheduler for ATLAS Tier 2
• Dynamic Torque for ATLAS Tier 3 and Belle
• LHC ATLAS experiment at CERN
– 530,000 Tier 2 jobs
– 325,000 CPU hours for Tier 3 jobs
• Belle experiment in Japan
Private Clusters
• Good for building a shared cluster for a large research group with good IT support who can set up and manage a Torque cluster
• What about the many individual researchers or small groups who also want a private cluster using their cloud allocation?
• But have no dedicated IT staff and very basic Unix skills?
StarCluster
• Generic setup
– Create security group for the cluster
– Launch VMs (master, node01, node02 …)
– Set up public key for password-less SSH
– Install NFS on master and share scratch space to all nodeXX
– Can use EBS (Cinder) volumes as scratch space
• Queuing system setup (plugins)
StarCluster for OpenStack
[Architecture diagram: StarCluster drives OpenStack through the EC2 API; it builds a head node (NFS, Torque server, MAUI) with a CVMFS proxy and an attached volume, plus worker nodes (Torque MOM); application software comes from the eRSA App Repository (CVMFS server)]
StarCluster - configuration
• Availability zone
• Image
• (optional) Image for master
• Flavor
• (optional) Flavor for master
• Number of nodes
• Volume
• Username
• User ID
• Group ID
• User shell
• Plugins
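Putting these options together, the cluster section of ~/.starcluster/config looks roughly like the sketch below. Key names follow stock StarCluster (the OpenStack port maps flavors onto the instance-type settings, and the user/group ID options may be port-specific); all values are illustrative.

$ cat >> ~/.starcluster/config <<'EOF'
[cluster mycluster]
KEYNAME = mykey
CLUSTER_SIZE = 4                  # 1 master + 3 workers
CLUSTER_USER = researcher
CLUSTER_SHELL = bash
NODE_IMAGE_ID = ami-00000001      # image, via its EC2-style ID
NODE_INSTANCE_TYPE = m1.medium    # flavor
MASTER_INSTANCE_TYPE = m1.large   # optional: different flavor for master
AVAILABILITY_ZONE = sa
VOLUMES = scratch
PLUGINS = torque
EOF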
Start a cluster with StarCluster
# fire up a new cluster (from your desktop)
$ starcluster start mycluster
# log in to the head node (master) to submit jobs
$ starcluster sshmaster mycluster
# copy files to/from the cluster
$ starcluster put mycluster /path/to/local/file/or/dir /remote/path/
$ starcluster get mycluster /path/to/remote/file/or/dir /local/path/
# add two compute nodes to the cluster
$ starcluster addnode -n 2 mycluster
# terminate it after use
$ starcluster terminate mycluster
Other options for Personal Cluster
• ElastiCluster
– Python code to provision VMs
– Ansible to configure them
– Ansible playbooks for Torque/SGE/…, NFS/PVFS/…
• Heat
– Everything in HOT template
– Earlier versions had limitations that made it hard to implement everything (minimal template sketched below)
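For comparison, a minimal HOT template that boots a single worker node; a real cluster template would add the head node, security groups, and software configuration, and the image, flavor, and key names here are placeholders.

$ cat > worker.yaml <<'EOF'
heat_template_version: 2013-05-23
description: Minimal sketch - one cluster worker node
resources:
  worker:
    type: OS::Nova::Server
    properties:
      image: worker-image
      flavor: m1.medium
      key_name: mykey
EOF
$ heat stack-create -f worker.yaml worker-stack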
Private Cluster in the Cloud
• Can use your personal or project cloud allocation to start up your own personal cluster in the cloud
– No need to share! Except among your group.
• Can use the standard PBS/Torque queueing system to submit jobs (or not)
– Only your jobs in the queue
• But you have to set up and manage the cluster
– Straightforward if you have good Unix skills (unless things go wrong…)
• Several groups now using this
Emu Cluster in the Cloud
• Emu is an eRSA cluster that runs in the cloud
• Aimed to be like an old cluster (Corvus)
– 8-core compute nodes
• But a bit different
– Dynamically created VMs in the cloud
– Can have private compute nodes
Emu
• eRSA-managed dynamic cluster in the cloud
• Shared by multiple cloud tenants and eRSA users
• All nodes in SA zone
• eRSA cloud allocation contributes 128 cores
• Users can bring in their own cloud allocation to launch their worker nodes in Emu
• Users don’t need to build and look after their own personal cluster
• It can also mount users’ Cinder volume storage to their own worker nodes via NFS
Using your own cloud allocation
• Users add our sysadmins to their tenant
– So we can launch VMs on their behalf
– Will look at Trusts in Icehouse
• Add some configs to Dynamic Torque
– Number of static/dynamic nodes, size of nodes, etc.
• Add a group of user accounts allowed to use it
• Create a reservation for users’ worker nodes in MAUI
• A special ‘account string’ needs to be put in the job
– To match users’ jobs to their group’s reserved nodes
• A qsub filter to check if the ‘account string’ is valid (sketched below)
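Torque lets a site define a submit filter (named via the SUBMITFILTER entry in torque.cfg) that receives each job script on stdin and can veto submission with a non-zero exit. A hedged sketch of the account-string check; the valid-accounts file path is hypothetical.

#!/bin/sh
# Hypothetical qsub submit filter: reject jobs whose '#PBS -A <account>'
# string is not in the site's list of valid account strings.
script=$(cat)
account=$(printf '%s\n' "$script" | awk '/^#PBS/ {for(i=1;i<NF;i++) if($i=="-A") print $(i+1)}' | head -1)
if [ -n "$account" ] && ! grep -qx "$account" /etc/torque/valid_accounts; then
    echo "Invalid account string: $account" >&2
    exit 1                    # non-zero exit makes qsub refuse the job
fi
printf '%s\n' "$script"       # otherwise pass the script through to qsub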
Emu
[Architecture diagram: Torque/MAUI plus Dynamic Torque, with LDAP, NFS, Salt, Sensu, and CVMFS services; shared worker nodes (eRSA donated) alongside worker nodes belonging to Tenant1 and Tenant2]
Static cluster vs dynamic cluster
                  Static Cluster     Dynamic Cluster
Hardware          Physical Machines  Virtual Machines
LRMS              Torque             Torque with Dynamic Torque
CMS               Puppet             Salt Stack
Monitoring        Nagios, Ganglia    Sensu, Graphite, Logstash
App Distribution  NFS Mount          CVMFS
Future Work
• Better reporting and usage graphs
• More monitoring checks
• Queueing system
– Multi-node jobs don’t work because a new node is not trusted by existing nodes
– Trust list is only updated when the PBS server is started
– Could hack the Torque source code
– Or maybe use SLURM or SGE
• Better way to share user credentials
– Trusts in Icehouse?
Future Work
• Spot instance queue
• Distributed file system
– NFS is the static component
• Cannot add storage to NFS without stopping it
• Cannot add new nodes to the allow list dynamically
• Instead, need to use iptables and update iptables rules at runtime (example below)
– Investigation of a dynamic and distributed FS
• One FS for all tenants
• Alternatives to StarCluster
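The iptables workaround mentioned above looks roughly like this sketch (subnet and node addresses are placeholders): export the scratch space to the whole tenant subnet once, then gate per-node access with firewall rules, which can change at runtime.

# One-time export to the whole subnet, so /etc/exports never changes again:
$ echo '/scratch 192.168.0.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
$ sudo exportfs -ra
# Default-deny NFS for the subnet, then punch a hole per worker at runtime:
$ sudo iptables -A INPUT -p tcp --dport 2049 -s 192.168.0.0/24 -j DROP
$ sudo iptables -I INPUT -p tcp --dport 2049 -s 192.168.0.37 -j ACCEPT   # a new worker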
Resources
• Cloud Scheduler – http://cloudscheduler.org/
• StarCluster – http://star.mit.edu/cluster
– OpenStack version: https://github.com/shundezhang/StarCluster/
• Dynamic Torque – https://github.com/shundezhang/dynamictorque
Nagios vs Sensu
• Nagios
– First designed in the last century for a static environment
– Needs local configuration updates and a service restart if a remote server is added or removed
– The server performs all checks, so it is not scalable
• Sensu
– Modern design with AMQP as the communication layer
– Local agent runs checks
– Loose coupling between clients and server, so it is scalable
Imagefactory
• Template in XML
– Packages
– Commands
– Files
• Can back up an ‘image’ (its template) in GitHub (sketch below)
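A sketch of what such a template looks like, assuming the Oz TDL schema that Imagefactory consumes; names, versions, and contents are placeholders, and a full template would also specify install media under <os>. Being a small XML file, it can be version-controlled like any other code.

$ cat > worker-node.tdl <<'EOF'
<template>
  <name>worker-node</name>
  <os>
    <name>Ubuntu</name>
    <version>12.04</version>
    <arch>x86_64</arch>
  </os>
  <packages>
    <package name='nfs-common'/>
    <package name='torque-mom'/>
  </packages>
  <commands>
    <command name='motd'>echo 'cluster worker' > /etc/motd</command>
  </commands>
  <files>
    <file name='/etc/cvmfs/default.local'>CVMFS_REPOSITORIES=apps.ersa.example.org</file>
  </files>
</template>
EOF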
Emu Monitoring
• Sensu
– Run health checks
• Logstash
– Collect PBS server/MOM/accounting logs
• Collectd
Salt Stack vs Puppet
               Salt Stack        Puppet
Architecture   Server-Client     Server-Client
Working Model  Push              Pull
Communication  ZeroMQ + msgpack  HTTP + text
Language       Python            Ruby