(1)

Clusters in the Cloud

Dr. Paul Coddington, Deputy Director
Dr. Shunde Zhang, Computing Specialist

eResearch SA

(2)

Use Cases

•  Make the cloud easier to use for compute jobs

– Particularly for users familiar with HPC clusters

•  Personal, on-demand cluster in the cloud
•  Cluster in the cloud

– A private cluster only available to a research group
– A shared Node-managed cluster

•  Preferably dynamic/elastic

– "Cloudbursting" for HPC

•  Dynamically (and transparently) add extra compute nodes from the cloud to an existing HPC cluster

(3)

Cluster infrastructure

[Layer diagram] Hardware and network at the bottom, with a software layer on top consisting of:

•  Local Resource Management System / queueing system
•  Shared file system
•  Monitoring
•  Configuration management system
•  Application distribution

(4)

Traditional Static Cluster

•  Hardware/Network

– Dedicated hardware
– Long process to get new hardware
– Static, not elastic

•  Software

– Assumes a fairly static environment (IPs etc.)
– Not cloud-friendly
– Some systems need a restart if the cluster is changed
– Not adaptable to changes

(5)

Cluster in the cloud

•  Hardware/Network

– Provisioned by the cloud (OpenStack)
– Get new resources in minutes
– Remove resources in minutes
– Elastic/scalable on demand

•  Software

– Dynamic

(6)

Possible solutions

•  Condor for high-throughput computing

– Cloud Scheduler working for a CERN LCG node
– Recent versions of Condor support cloud execution

•  Torque/PBS static cluster in cloud

– Works, but painful to set up and maintain

•  Dynamic Torque/PBS cluster

– No existing dynamic/elastic solution

•  StarCluster for personal cluster

– Automates setup of VMs in the cloud, including the cluster
– Can add/subtract worker nodes manually

(7)

Our work

•  Condor for high-throughput computing

– Cloud Scheduler for the Australian CERN LCG node

•  Torque/PBS static cluster in cloud

– Set up a large cluster in the cloud for CoEPP
– Scripts to automate setup and monitoring

•  Dynamic Torque/PBS cluster

– Created the Dynamic Torque system for OpenStack

•  StarCluster for personal cluster

– Ported to OpenStack and added a Torque plugin
– Add-ons to make it easier to use for eRSA users

(8)

Application software

•  Want familiar HPC applications to be available to cloud VMs

– And we don't want to install and maintain software twice, in HPC and cloud

•  But there is a limit on the size of VM images in the cloud
•  Want to avoid making lots of custom images
•  We use CVMFS (client setup sketched below)

– Read-only distributed file system, HTTP-based
– Used by the CERN LHC Grid for distributing software
– One VM image, with the CVMFS client
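A minimal sketch of how the CVMFS client might be set up inside that single VM image; the repository name, proxy address and package source are assumptions for illustration, not the actual eRSA configuration.

# install the CVMFS client (assumes the CernVM-FS yum repository is already configured)
yum install -y cvmfs

# minimal client configuration; repository and proxy names are hypothetical
cat > /etc/cvmfs/default.local <<'EOF'
CVMFS_REPOSITORIES=apps.ersa.edu.au
CVMFS_HTTP_PROXY=http://cvmfs-proxy.example.org:3128
EOF
# a non-CERN repository also needs its public key and CVMFS_SERVER_URL under /etc/cvmfs/

# configure autofs mounting and check that the repository is reachable
cvmfs_config setup
cvmfs_config probe apps.ersa.edu.au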

(9)

HTC Cluster in the Cloud

•  NeCTAR eResearch Tools project for high-throughput computing in the cloud

•  ARC Centre of Excellence in Experimental Particle Physics (CoEPP)

•  Needed a large cluster for CERN ATLAS data analysis and simulation

•  Tier 2 (global) and Tier 3 (local) jobs

•  Augment existing small physical clusters at multiple sites – running Torque

(10)
(11)
(12)

Static Cluster in the Cloud

•  Built a large Torque cluster using cloud VMs
•  A challenging exercise!
•  Reliability issues; needed a lot of scripts to automate setup, monitoring, recovery, etc.
•  Some types of usage are bursty, but cluster resources were static

(13)

Dynamic Torque

•  Static/dynamic worker nodes

– Static: stay up all the time
– Dynamic: come up and go down according to workload

•  Independent of Torque/MAUI

– Runs as a separate process

•  Only adds/removes worker nodes
•  Queries Torque and the MAUI scheduler periodically (see the sketch below)
•  Still up to the MAUI scheduler to decide where to run jobs
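A rough shell sketch of the idea behind the polling loop, not the actual Dynamic Torque implementation: periodically inspect the queue and node state, then ask OpenStack for more or fewer worker VMs. The image name, flavor, thresholds and node naming are invented.

# hypothetical polling loop (illustrative only)
while true; do
  queued=$(qstat | awk '$5 == "Q"' | wc -l)      # jobs waiting in the queue
  free=$(pbsnodes -l free | wc -l)               # idle worker nodes

  if [ "$queued" -gt 0 ] && [ "$free" -eq 0 ]; then
    # boot an extra dynamic worker via OpenStack (image/flavor names are placeholders)
    nova boot --image worker-image --flavor m1.medium dynamic-wn-$(date +%s)
  elif [ "$queued" -eq 0 ] && [ "$free" -gt 2 ]; then
    # remove one idle dynamic worker; choosing which node to drop is omitted here
    nova delete dynamic-wn-oldest
  fi

  sleep 60   # MAUI still decides where jobs actually run
done

The real Dynamic Torque also has to register and deregister the nodes with the Torque server and respect the static/dynamic node split described above.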

(14)
(15)

Dynamic Torque for CoEPP

[Architecture diagram: Torque/MAUI and Dynamic Torque, supported by LDAP, NFS, Puppet, Ganglia, Nagios and CVMFS; interactive nodes in Melbourne; worker nodes in SA, Melbourne and Monash.]

(16)
(17)

CoEPP Outcomes

•  Three large clusters in use for over a year

– Hundreds of cores in each

•  Condor and Cloud Scheduler for ATLAS Tier 2
•  Dynamic Torque for ATLAS Tier 3 and Belle
•  LHC ATLAS experiment at CERN

– 530,000 Tier 2 jobs
– 325,000 CPU hours for Tier 3 jobs

•  Belle experiment in Japan

(18)

Private Clusters

•  Good for building a shared cluster for a large research group with good IT support who can set up and manage a Torque cluster

•  What about the many individual researchers or small groups who also want a private cluster using their cloud allocation?

•  But who have no dedicated IT staff and only very basic Unix skills?

(19)

StarCluster

•  Generic setup

– Create a security group for the cluster
– Launch VMs (master, node01, node02, …)
– Set up a public key for password-less SSH
– Install NFS on the master and share scratch space to all nodeXX
– Can use EBS (Cinder) volumes as scratch space

•  Queueing system setup (plugins)

(20)

StarCluster for OpenStack

[Architecture diagram: StarCluster drives OpenStack via the EC2 API to launch a head node (NFS, Torque server, MAUI) with an attached volume and several worker nodes (Torque MOM); a CVMFS proxy serves applications to the nodes from the eRSA app repository (CVMFS server).]

(21)

StarCluster – configuration

•  Availability zone
•  Image
•  (optional) Image for master
•  Flavor
•  (optional) Flavor for master
•  Number of nodes
•  Volume
•  Username
•  User ID
•  Group ID
•  User shell
•  Plugins
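For illustration, a cut-down sketch of what a cluster template in the StarCluster config file might look like for the OpenStack (EC2 API) port; the section and option names follow upstream StarCluster conventions, but the endpoint, credentials, image, flavor and volume IDs are placeholders rather than eRSA values.

# hypothetical ~/.starcluster/config fragment (all values are placeholders)
cat >> ~/.starcluster/config <<'EOF'
[aws info]
aws_access_key_id = <EC2 access key for your tenant>
aws_secret_access_key = <EC2 secret key>
aws_region_name = nova
aws_region_host = ec2.cloud.example.org

[key mykey]
key_location = ~/.ssh/mykey.pem

[cluster mycluster]
keyname = mykey
cluster_size = 4
availability_zone = sa
node_image_id = ami-00000001
node_instance_type = m1.medium
volumes = scratch
cluster_user = myuser
cluster_shell = bash
plugins = torque

[volume scratch]
volume_id = vol-00000001
mount_path = /scratch

[plugin torque]
# the class name of the Torque plugin in the OpenStack port is assumed here
setup_class = torque.TorqueSetup
EOF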

(22)

Start a cluster with StarCluster

# fire up a new cluster (from your desktop)
$ starcluster start mycluster

# log in to the head node (master) to submit jobs
$ starcluster sshmaster mycluster

# copy files
$ starcluster put /path/to/local/file/or/dir /remote/path/
$ starcluster get /path/to/remote/file/or/dir /local/path/

# add compute nodes to the cluster
$ starcluster addnode -n 2 mycluster

# terminate it after use
$ starcluster terminate mycluster

(23)

Other options for Personal Cluster

•  Elasticluster

– Python code to provision VMs
– Ansible to configure them
– Ansible playbooks for Torque/SGE/…, NFS/PVFS/…

•  Heat

– Everything in a HOT template (minimal example below)
– Earlier versions had limitations that made it hard to implement everything
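As a rough illustration of the Heat approach, a minimal HOT template that launches a single worker node; the resource name, image, flavor and key are invented, and a real cluster template would also describe the head node, volumes and software configuration.

# hypothetical single-node HOT template (placeholder values)
cat > worker.yaml <<'EOF'
heat_template_version: 2013-05-23
description: single Torque worker node (sketch only)
resources:
  worker:
    type: OS::Nova::Server
    properties:
      image: worker-image
      flavor: m1.medium
      key_name: mykey
EOF

# launch the stack with the Heat CLI of that era
heat stack-create -f worker.yaml worker-stack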

(24)

Private Cluster in the Cloud

•  Can use your personal or project cloud allocation to start up your own personal cluster in the cloud

– No need to share! Except among your group.

•  Can use the standard PBS/Torque queueing system to submit jobs (or not)

– Only your jobs in the queue

•  But you have to set up and manage the cluster

– Straightforward if you have good Unix skills (unless things go wrong…)

•  Several groups are now using this

(25)

Emu Cluster in the Cloud

•  Emu is an eRSA cluster that runs in the cloud
•  Aimed to be like an old cluster (Corvus)

– 8-core compute nodes

•  But a bit different

– Dynamically created VMs in the cloud
– Can have private compute nodes

(26)

Emu

•  eRSA-managed dynamic cluster in the cloud
•  Shared by multiple cloud tenants and eRSA users
•  All nodes in the SA zone
•  The eRSA cloud allocation contributes 128 cores
•  Users can bring in their own cloud allocation to launch their worker nodes in Emu
•  Users don't need to build and look after their own personal cluster
•  It can also mount users' Cinder volume storage to their own worker nodes via NFS (see the sketch below)
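A sketch of how a user's Cinder volume might be attached and then exported over NFS to that tenant's worker nodes; the instance name, volume ID, device path and subnet are placeholders, and in Emu these steps are handled by eRSA's automation rather than by hand.

# attach the user's Cinder volume to the node that will serve it (IDs are placeholders)
nova volume-attach emu-head 11111111-2222-3333-4444-555555555555 /dev/vdc

# on that node: create a filesystem (first time only), mount it,
# and export it to the tenant's worker-node subnet
mkfs.ext4 /dev/vdc
mkdir -p /mnt/tenant1
mount /dev/vdc /mnt/tenant1
echo "/mnt/tenant1 192.168.1.0/24(rw,no_root_squash)" >> /etc/exports
exportfs -ra

# on each of that tenant's worker nodes
mount -t nfs emu-head:/mnt/tenant1 /mnt/tenant1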

(27)

Using your own cloud allocation

•  Users add our sysadmins to their tenant

– So we can launch VMs on their behalf
– Will look at Trusts in Icehouse

•  Add some config to Dynamic Torque

– Number of static/dynamic nodes, size of nodes, etc.

•  Add a group of user accounts allowed to use it
•  Create a reservation for the users' worker nodes in MAUI
•  A special 'account string' needs to be put in the job

– To match users' jobs to their group's reserved nodes

•  A qsub filter checks that the 'account string' is valid (sketched below)
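To make the last two steps concrete, a hedged sketch of a MAUI standing reservation and a Torque submit filter; the tenant names, host list and account strings are invented, and the real eRSA filter may check more than this.

# maui.cfg: reserve a tenant's worker nodes for its account string (names are placeholders)
SRCFG[tenant1] PERIOD=INFINITY HOSTLIST=tenant1-wn01,tenant1-wn02 ACCOUNTLIST=tenant1

# qsub submit filter (referenced by SUBMITFILTER in torque.cfg): it receives the job
# script on stdin and must echo it back; a non-zero exit makes qsub reject the job
#!/bin/sh
VALID_ACCOUNTS="tenant1 tenant2 ersa"              # account strings with reservations
script=$(cat)
account=$(echo "$script" | awk '/^#PBS[ \t]+-A/ {print $3; exit}')
if [ -n "$account" ] && ! echo "$VALID_ACCOUNTS" | grep -qw "$account"; then
  echo "unknown account string: $account" >&2
  exit 1
fi
echo "$script"                                     # forward the unchanged script to qsub
# accounts passed on the qsub command line with -A are not checked in this simple sketch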

(28)

Emu

[Architecture diagram: Torque/MAUI and Dynamic Torque together with LDAP, NFS, Salt, Sensu and CVMFS; shared worker nodes donated by eRSA alongside worker nodes belonging to Tenant1 and Tenant2, with NFS exports to the tenant nodes.]

(29)

Static cluster vs dynamic cluster

                   Static Cluster       Dynamic Cluster
Hardware           Physical machines    Virtual machines
LRMS               Torque               Torque with Dynamic Torque
CMS                Puppet               Salt Stack
Monitoring         Nagios, Ganglia      Sensu, Graphite, Logstash
App Distribution   NFS mount            CVMFS

(30)

Future Work

•  Better reporting and usage graphs
•  More monitoring checks
•  Queueing system

– Multi-node jobs don't work because a new node is not trusted by existing nodes
– The trust list is only updated when the PBS server is started
– Could hack the Torque source code
– Or maybe use SLURM or SGE

•  Better way to share user credentials

– Trusts in Icehouse?

(31)

Future Work

•  Spot instance queue
•  Distributed file system

– NFS is the static component

•  Cannot add storage to NFS without stopping it
•  Cannot add new nodes to the allow list dynamically; have to use iptables and update the iptables rules instead

– Investigation of a dynamic and distributed FS

•  One FS for all tenants

•  Alternatives to StarCluster

(32)

Resources

•  Cloud Scheduler

– http://cloudscheduler.org/

•  StarCluster

– http://star.mit.edu/cluster
– OpenStack version: https://github.com/shundezhang/StarCluster/

•  Dynamic Torque

– https://github.com/shundezhang/dynamictorque

(33)
(34)

Nagios vs Sensu

•  Nagios

– First designed in the last century, for a static environment
– Need to update the local configuration and restart the service if a remote server is added or removed
– The server performs all checks, so it does not scale well

•  Sensu

– Modern design with AMQP as the communication layer
– A local agent runs the checks (see the example below)
– Weak coupling between clients and server, so it scales well
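For flavour, a minimal Sensu (0.x-era) check definition of the kind a local agent would run; the check name, plugin path, subscription and interval are made up.

# hypothetical check definition for nodes subscribed to 'emu-workers'
cat > /etc/sensu/conf.d/check_pbs_mom.json <<'EOF'
{
  "checks": {
    "check_pbs_mom": {
      "command": "/etc/sensu/plugins/check_pbs_mom.sh",
      "subscribers": ["emu-workers"],
      "interval": 60
    }
  }
}
EOF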

(35)

Imagefactory

•  Template in XML

– Packages
– Commands
– Files

•  Can back up an 'image' in GitHub
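A skeletal example of what such an XML template can look like in Imagefactory's TDL format, with the packages, commands and files sections the slide lists; the OS details, package names and file contents are placeholders.

# hypothetical worker-node template kept under version control (values are placeholders)
cat > emu-worker.tdl <<'EOF'
<template>
  <name>emu-worker</name>
  <os>
    <name>CentOS-6</name>
    <version>6</version>
    <arch>x86_64</arch>
    <install type='url'>
      <url>http://mirror.example.org/centos/6/os/x86_64/</url>
    </install>
  </os>
  <packages>
    <package name='torque-client'/>
    <package name='cvmfs'/>
  </packages>
  <commands>
    <command name='cvmfs-setup'>cvmfs_config setup</command>
  </commands>
  <files>
    <file name='/etc/cvmfs/default.local'>CVMFS_REPOSITORIES=apps.ersa.edu.au</file>
  </files>
</template>
EOF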

(36)

EMU Monitoring

•  Sensu

– Runs health checks

•  Logstash

– Collects PBS server/MOM/accounting logs (example config below)

•  Collectd
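A hedged sketch of a Logstash (1.x-style) input for the Torque logs mentioned above; the log paths and the Elasticsearch host are placeholders, and the real Emu pipeline may parse and tag the events differently.

# hypothetical Logstash config for shipping PBS logs
cat > /etc/logstash/conf.d/pbs.conf <<'EOF'
input {
  file {
    path => ["/var/spool/torque/server_logs/*",
             "/var/spool/torque/server_priv/accounting/*"]
    type => "pbs_server"
  }
}
output {
  elasticsearch {
    host => "logs.example.org"
  }
}
EOF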

(37)

Salt Stack vs Puppet

                 Salt Stack          Puppet
Architecture     Server-client       Server-client
Working model    Push                Pull
Communication    ZeroMQ + msgpack    HTTP + text
Language         Python              Ruby
