Institut für Experimentelle Kernphysik (Institute of Experimental Nuclear Physics), Fakultät für Physik (Faculty of Physics)
Integration of Virtualized Workernodes in Batch Queueing Systems – The ViBatch Concept
(Computer) Virtualization
Sharing the resources of one physical machine between independent operating systems (OS) running in virtual machines (VMs)
VMs are decoupled from the underlying hardware, so (almost) arbitrary operating systems can be installed
Different virtualization techniques are provided by various vendors and open-source communities
[Figure: physical host machine with a virtualization layer running VM server 1 (Workernode, OS 1), VM server 2 (Proxy server, OS 2) and VM server 3 (User portal, OS 3)]
Why Virtualization?
Offers independence from host systems and encapsulation of user interaction.
Enables the use of specially validated operating systems for high energy physics (HEP) analysis
Enables the use of Virtual Appliances, e.g. CernVM (see below)
Allows dynamic partitioning of a shared HPC cluster:
Different setups for different user groups
No incompatibilities between setups have to be considered
KVM – Kernel-based Virtual Machine
KVM is implemented as a Linux kernel module
The Linux kernel itself acts as the virtual machine monitor
VMs run as normal processes
Supports the hardware-assisted virtualization techniques AMD-V and Intel VT-x => very good performance!
[Figure: KVM architecture: hardware, Linux kernel with the KVM kernel module, VM 1 (Debian) and VM 2 (SuSE) running alongside normal user processes]
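Whether a worker node can use the hardware-assisted mode described above can be read from the CPU flags. A minimal sketch, assuming a Linux host where Intel VT-x is advertised as the `vmx` flag and AMD-V as the `svm` flag in /proc/cpuinfo; the function name `hw_virt_flag` is hypothetical.

```shell
# Print which hardware virtualization extension the CPU advertises:
# "vmx" (Intel VT-x), "svm" (AMD-V) or "none".
# Takes an optional cpuinfo-format file (default: /proc/cpuinfo) so it can be tested.
hw_virt_flag() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -qw vmx "$cpuinfo"; then
        echo vmx
    elif grep -qw svm "$cpuinfo"; then
        echo svm
    else
        echo none
    fi
}
```

On a node where the `kvm_intel` or `kvm_amd` module is loaded, the device /dev/kvm also exists, which is a quick second check.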
libvirt
Interface to common VMMs/hypervisors such as KVM, Xen, VMware and UML
(Remote) management of virtual machines and storage
More information: http://libvirt.org
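Remote management with libvirt works by pointing its tools at a connection URI. A small sketch, assuming the KVM/QEMU driver (local URI `qemu:///system`, remote access over SSH as `qemu+ssh://HOST/system`) and the standard `virsh` client; the helper names and the host name are hypothetical.

```shell
# Build a libvirt connection URI for the KVM/QEMU driver:
# the local hypervisor is qemu:///system, a remote one is reached over SSH
# via qemu+ssh://HOST/system.
libvirt_uri() {
    if [ -z "${1:-}" ]; then
        echo "qemu:///system"
    else
        echo "qemu+ssh://$1/system"
    fi
}

# List all VMs (running and shut off) on a local or remote host.
list_vms() {
    virsh -c "$(libvirt_uri "${1:-}")" list --all
}
```

For example, `list_vms wn042` would run `virsh -c qemu+ssh://wn042/system list --all` against a (hypothetical) worker node wn042.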
Dynamic Virtualization Project at KIT:
HPC Cluster Models
Isolated Computing Cluster
Each group/institution has a separate cluster
Administration overhead
Cannot cover peak loads
Shared Computing Cluster
All groups share one cluster
Setup compromise not always possible
Load-balancing by fair-share
Dynamic Partitioned Cluster
Configure cluster in real-time with VMs
Allows any software/OS configuration
Virtualization layer is hidden
Load-balancing by fair-share
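Both the shared and the dynamically partitioned cluster rely on fair-share load balancing. As a deliberately simplified illustration (not the policy of the batch system used at KIT, whose fair-share also weights and decays past usage), the next free slot can go to the group whose recent usage is lowest relative to its target share. The input format and group names are hypothetical.

```shell
# Simplified fair-share decision: read lines "GROUP SHARE USAGE" from stdin
# (share and usage in any consistent units) and print the group whose
# usage/share ratio is lowest, i.e. the one furthest below its fair share.
# Lines with a non-positive share are ignored.
next_fairshare_group() {
    awk '$2 > 0 {
             ratio = $3 / $2
             if (best == "" || ratio < best_ratio) {
                 best = $1
                 best_ratio = ratio
             }
         }
         END { print best }'
}
```

With shares 50/30/20 and usages 400/100/200, the ratios are 8, 3.3 and 10, so `group_b` gets the next slot even though its absolute usage is not the smallest.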
ViBatch
Lightweight tool enabling virtualization of job environments
Can be integrated into arbitrary batch systems
The batch system is not aware of the virtualization – no code modification needed (only the configuration is adapted)
The virtual environment is determined per job solely by the queue the job is sent to:
qsub -q [normal_queue] job1.sh
qsub -q [virtual_queue] job1.sh
Job submission: only the queue changes!
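Since only the queue name changes, the job environment has to be derived from it on the worker node. A minimal sketch of such a lookup, assuming a hypothetical two-column mapping file (queue name, VM image path) rather than ViBatch's actual config format; the function name and paths are illustrative.

```shell
# Print the VM image configured for a batch queue.
# Mapping file: one "QUEUE IMAGE" pair per line.
# Returns 1 (and prints nothing) if the queue is not virtualized,
# i.e. the job should run natively on the host.
queue_to_image() {
    queue=$1
    map=$2
    image=$(awk -v q="$queue" '$1 == q { print $2; exit }' "$map")
    [ -n "$image" ] || return 1
    echo "$image"
}
```

In a prologue, the queue name would come from the batch system itself (Torque/PBS, for instance, passes it to prologue scripts as one of their arguments), so the user-facing `qsub` call stays unchanged.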
ViBatch - Lightweight
Core components: just bash scripts (prologue, epilogue and remoteshell)
Additional scripts for (almost) automatic installation on arbitrary clusters
Cluster information and preferences in one config file
Log files enable debugging and workload statistics
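The roles of the three core scripts can be sketched as follows. This is a hypothetical illustration, not ViBatch's code: it assumes the VM is described by a libvirt domain XML file, started as a transient domain with `virsh create`, reachable via SSH under a known host name, and torn down with `virsh destroy`. All function names, variables and the log format are invented for the example; `RUN=echo` turns every command into a dry run.

```shell
# Settings a real installation would read from its single config file.
LOGFILE=${LOGFILE:-/var/log/vibatch.log}
LIBVIRT_URI=${LIBVIRT_URI:-qemu:///system}
RUN=${RUN:-}   # set RUN=echo to only print the commands

# One tab-separated record per hook call: timestamp, job id, message.
# Such records can be grepped for debugging or aggregated for statistics.
vb_log() {
    printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$1" "$2" >> "$LOGFILE"
}

# prologue: boot a transient VM from a domain XML before the job starts.
vb_prologue() {   # $1 = job id, $2 = domain XML file
    vb_log "$1" "prologue: create $2"
    $RUN virsh -c "$LIBVIRT_URI" create "$2"
}

# remoteshell: run the job's command inside the VM instead of on the host.
vb_remoteshell() {   # $1 = VM host name, remaining args = command
    vm=$1
    shift
    $RUN ssh "$vm" "$@"
}

# epilogue: tear the VM down after the job has finished.
vb_epilogue() {   # $1 = job id, $2 = libvirt domain name
    vb_log "$1" "epilogue: destroy $2"
    $RUN virsh -c "$LIBVIRT_URI" destroy "$2"
}
```

Because the batch system only calls its usual hooks, it never needs to know that a VM sits between the host and the job.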
ViBatch - Virtual Appliances: CernVM-FS
Our VM image includes CernVM-FS, an HTTP-based remote file system developed within the CernVM software appliance project
http://cernvm.cern.ch/portal
Provides LHC software installations (various VOs: CMS, ATLAS, ...), including the most common versions of the experiment software
We don't have to maintain our own installations!
A simple Squid HTTP proxy server does the caching
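On the client side, the repositories and the caching proxy are set in a few shell-style variables. A configuration sketch using standard CernVM-FS client parameters; the proxy host, repository list and cache size are example values, not our production settings.

```shell
# /etc/cvmfs/default.local (example values)
# Repositories to mount under /cvmfs
CVMFS_REPOSITORIES=cms.cern.ch,atlas.cern.ch
# Site Squid proxy (hypothetical host; 3128 is Squid's default port)
CVMFS_HTTP_PROXY="http://squid.example.org:3128"
# Soft limit of the local cache in MB
CVMFS_QUOTA_LIMIT=10000
```

After changing the file, `cvmfs_config probe` checks whether the configured repositories can be mounted.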
[Figure: benchmarks, native vs. virtual: Monte Carlo sim. (vbfnlo), CPU benchmark (whetstone), CMSSW physics analysis; annotated differences +17 % and +12 %; native CMSSW: not available]
[Figure: Load of ViBatch (last 6 weeks)]
ViBatch in Operation at EKP, KIT
ViBatch has already been used at EKP for several HEP analyses:
Data skims for the Higgs TauTau analysis (see talk A. Burgmeier, T49.7)
Monte Carlo generation for studies in the Higgs search (C. Hackstein, T49.1)
Running on the EKP production cluster in parallel with native job submission
Performance
Depends on KVM tuning and host setup; currently being investigated and tuned (KSM, ...)
Native CMSSW benchmark not available (SLE11 not binary compatible)
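KSM (Kernel Samepage Merging) lets the host kernel merge identical memory pages, for example between VMs booted from the same image. It is enabled by writing 1 to /sys/kernel/mm/ksm/run (as root), and the kernel's `pages_sharing` counter indicates roughly how many pages are saved. A small helper that turns that counter into MiB; the function name is hypothetical and the directory argument exists only to make it testable.

```shell
# Estimate the memory saved by KSM in MiB.
# pages_sharing counts the additional mappings that now share an already
# merged page, i.e. approximately the number of pages saved.
# Arguments (optional): KSM sysfs directory, page size in bytes.
ksm_saved_mib() {
    dir=${1:-/sys/kernel/mm/ksm}
    page_bytes=${2:-$(getconf PAGESIZE)}
    sharing=$(cat "$dir/pages_sharing")
    echo $(( sharing * page_bytes / 1048576 ))
}
```

With 4 KiB pages, 262144 sharing pages correspond to 1024 MiB saved.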
Our setup – characteristics & problems
Memory consumption: ~2 GB RAM per VM
No InfiniBand driver for our VMs yet => no native use of the Lustre file system possible
Storage is mounted via an NFS export instead
Shared Institutscluster IC1 at KIT
Workernodes (total / EKP)   200 / 25
CPU                         8 × 2.66 GHz Intel Xeon
Memory                      2 GB RAM per core
Disk space                  750 GB per node
Storage                     350 TB Lustre FS
Network                     40 Gbit/s InfiniBand
Compatibility problems between the kernel-space NFS daemon and the Lustre driver: unstable, a few nodes crashed
Currently solved by using a user-space NFS daemon
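The cluster figures determine how many VM slots a node can offer. Assuming the "8 × 2.66 GHz" entry means 8 cores per node, a node has 8 × 2 GB = 16 GB RAM; at about 2 GB per VM both limits allow 8 VMs, so 200 nodes give the 1600 VM slots named in the outlook. A sketch of that arithmetic (integer GB only, ignoring memory the host itself needs; function names hypothetical):

```shell
# VM slots one node can offer: limited by its cores and by its memory.
# Arguments: cores per node, RAM per node in GB, RAM per VM in GB (integers).
vm_slots_per_node() {
    cores=$1
    mem_gb=$2
    vm_mem_gb=$3
    by_mem=$(( mem_gb / vm_mem_gb ))
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Total VM slots of a cluster of identical nodes.
# Arguments: number of nodes, then the three vm_slots_per_node arguments.
cluster_vm_slots() {
    nodes=$1
    shift
    per_node=$(vm_slots_per_node "$@")
    echo $(( nodes * per_node ))
}
```

`cluster_vm_slots 200 8 16 2` prints 1600, and `cluster_vm_slots 25 8 16 2` prints 200 for the EKP share of the nodes. Note that at ~2 GB per VM, eight VMs use a node's entire 16 GB.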
Conclusion and outlook
Extend operation to the whole cluster (200 nodes, 1600 VM slots)
Provide detailed documentation
Further simplify installation
Burst into the cloud:
Connect with ROCED (see talk S. Riedel, T77.3)
[Figure: Cloud + ViBatch]