Using WestGrid
Patrick Mann, Manager, Technical Operations
January 15, 2014
Winter 2014 Seminar Series
For more information on these and other seminars see https://www.westgrid.ca/support/training
Date | Speaker | Topic
5 February | Gino DiLabio | Molecular Modelling Using HPC and Gaussian
26 February | Jonathan Dursi | Responding to Canada’s Research Computing Needs
12 March | Scott Northrup | Introduction to GPU Computing Using CUDA
26 March | Humaira Kamal and Alan Wagner |
User Basics
To use WestGrid systems effectively you will need to know:
❖ Where to get help and information
❖ Which systems are suited to your project
❖ How to log on
❖ Basic Linux commands
Help and Support
WestGrid website: www.westgrid.ca
❖ Technical Specifications, QuickStart Guides, Software
❖ System status and notices
❖ Events, colloquia, news, ...
WestGrid Support: [email protected]
❖ Novice to expert
❖ Logon issues to in-depth parallelization
❖ No question too big or too small
WestGrid Cluster Schematic
[Figure: a user desktop connects by SSH over the Internet to the login node(s); the login node(s), scheduler node(s) and the cluster (compute nodes) are joined by the internal cluster network and share a disk system holding /home and /global/scratch, with /home backed up.]
Cluster Compute Nodes
[Figure: each compute node is a Linux box with several CPUs, each CPU holding several cores that share the node's RAM; nodes are linked by an interconnect (usually InfiniBand). Shared memory: 1 node (multicore).]
❖ Nodes usually have 2 CPUs, with 6 or 8 cores/CPU.
❖ Usually 12-24 GB/node (2 GB/core).
❖ Hundreds of nodes in one cluster.
❖ InfiniBand interconnects (with varying bandwidth and latency).
❖ Specialty systems with MUCH more memory/node.
❖ Specialty systems that look like a single node with lots of cores.
System Selection 1
Aim: Optimally match software requirements and characteristics with systems
➔ Fast turnaround (Users!)
➔ Efficient use of resources (Systems Management!)
https://www.westgrid.ca/support/quickstart/new_users#choosing_system
Software: packaged or homegrown, parallelizability, scalability, memory, output
System: architecture, size, memory, interconnects, storage, batch policy
System Selection 2
❖ Software? (off-the-shelf, licensed, homegrown)
❖ Memory requirements?
❖ Parallelization?
➢ Scalability
➢ Shared or Distributed memory (or both)
❖ Research Program Characteristics?
➢ Lots of little jobs (parameter-space exploration and optimization)
➢ A few really big jobs (simulations)
➢ Code development
➢ ...
System Selection 3
Requirement | Suggested systems
Small-memory serial, undemanding parallel | Hermes, Bugaboo, Jasper, Orcinus
Shared memory (OpenMP) | Breezy, Hungabee (larger memory)
Distributed memory (MPI parallel) | Bugaboo, Grex, Jasper, Lattice, Nestor, Orcinus, Parallel
● Bugaboo, Nestor: large associated storage
● Lattice: small memory (1.5 GB/core)
● Grex: large memory (4 GB/core)
Graphics, visualization or GPU acceleration | Parallel
Gaussian | Grex (licensed)
Other special software (MATLAB, …) | Check the QuickStart and software guides
System Selection 4
Lots of systems, some special purpose, some general purpose.
❖ Each has its own software set.
➢ Lots of generic software, but some packages are only on specific machines (see the software pages).
❖ Users may work on multiple systems.
Hard to choose.
Connecting to Cluster
The login nodes (and all nodes) run Linux.
❖ Command-line shell for typing text commands
❖ So you need to log in via a standard terminal
We use SSH (as does most of the world):
❖ Linux and MacOS have built-in clients
❖ Windows: various packages, e.g. PuTTY
Linux
You do need to know the basics of Linux and the command line.
Lots of tutorials and books out there.
❖ See the New Users QuickStart guide:
Graphical Applications
Editors, Visualization and other Graphical Interfaces
❖ X-Windows is the Linux windowing system
❖ Linux editors, visualization packages and anything graphical use X
❖ Available on MacOS, and can be installed on Windows:
➢ http://sourceforge.net/projects/xming (free)
❖ Linux: ssh -X [email protected]
File Transfer
❖ Linux, MacOS: built-in
❖ Windows: WinSCP, Filezilla
❖ Lots of beautiful graphical front-ends out there!
❖ Annoying issue with line endings in files from Windows
Standard tools based on SSH transport:
scp Secure copy
sftp Secure file transfer protocol
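The line-ending issue comes from Windows ending each line with a carriage return plus line feed (CRLF), which Linux tools treat as an extra character; a script run directly may fail with `/bin/bash^M: bad interpreter`. A minimal sketch using only GNU `grep` and `sed` as found on Linux systems; the function names are hypothetical, and many systems also provide a `dos2unix` utility for the same job.

```shell
#!/bin/bash
# Hypothetical helpers for spotting and fixing Windows (CRLF) line endings.
# Uses GNU grep and GNU sed, as found on Linux systems.

# has_crlf FILE: succeed (exit status 0) if FILE contains any carriage return.
has_crlf() {
    grep -q $'\r' "$1"
}

# strip_crlf FILE: delete the carriage return at the end of each line, in place.
strip_crlf() {
    sed -i 's/\r$//' "$1"
}

# Example: clean a job script copied from a Windows machine before submitting.
#   if has_crlf hello.pbs; then strip_crlf hello.pbs; fi
```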
Inter-Site File Transfer
WestGrid Core Network: Very Fast
❖ Internal network connecting all sites
❖ CANARIE national network connecting Compute Canada sites (and all universities and institutions)
❖ Especially to the Silo backup/archival system
Powerful Grid tools and Globus Online
Useful Linux Software
Many useful, standard software packages included on all WestGrid systems:
❖ Programming editors (nedit, emacs, vi, …)
❖ Compilers (Intel and GNU, for Fortran, C++, …)
❖ Scripting (Python has become a common scientific language)
❖ Parallel programming (OpenMP, Open MPI)
❖ Base scientific libraries (BLAS, LAPACK, …)
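For OpenMP programs, a common pattern is to set the thread count from the cores the job actually received rather than hard-coding it. A minimal sketch, assuming a Torque/PBS system where the job requests all its cores on one node (e.g. `#PBS -l nodes=1:ppn=4`) and PBS lists one line per allocated core in the file named by `PBS_NODEFILE`; check each system's documentation, since request styles differ. The helper and program names are hypothetical.

```shell
#!/bin/bash
# Sketch: match an OpenMP program's thread count to the cores PBS allocated.
# Assumes a single-node request such as:   #PBS -l nodes=1:ppn=4
# PBS writes one hostname line per allocated core to the file in $PBS_NODEFILE.

# Hypothetical helper name.
omp_threads_from_pbs() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Count the allocated cores (strip any padding wc adds).
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        # Not inside a PBS job: fall back to a single thread.
        echo 1
    fi
}

OMP_NUM_THREADS=$(omp_threads_from_pbs)
export OMP_NUM_THREADS
echo "Running with OMP_NUM_THREADS=${OMP_NUM_THREADS}"
# ./my_openmp_program   # hypothetical program name
```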
Job Basics
❖ Login nodes:
➢ Data management
➢ Editing and compiling code
➢ Quick tests
➢ Job management
❖ “Real” work done on the worker (compute) nodes
➢ Jobs submitted to the batch system (queued)
➢ Jobs dispatched as fairly as possible to worker nodes
Batch Jobs
A batch job is defined by a Linux shell script with directives that tell the scheduler what resources the job needs:
❖ memory, cores, walltime
❖ (and lots of fine-detail settings)
Jobs exceeding these pre-defined resource limits may be terminated (e.g., when they reach the walltime limit)!
Jobs with incompatible requirements (e.g., more cores per node than any node has) may be queued, but will never run.
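A sketch of a job script that states memory, cores and walltime explicitly with standard Torque/PBS `-l` resources. All values, and the job name `resource_demo`, are placeholders to size for your own work. It assumes the usual Torque behaviour of starting the job in your home directory, hence the `cd` to `PBS_O_WORKDIR` (the directory `qsub` was run from).

```shell
#!/bin/bash
# Sketch of a job script that states its resource needs up front.
# All values are placeholders; size them to your own job.
# Cores (processors) requested:
#PBS -l procs=4
# Memory per processor:
#PBS -l pmem=2gb
# Maximum run time (walltime), HH:MM:SS; the job may be killed when it runs out:
#PBS -l walltime=01:30:00
# Job name shown by qstat (hypothetical), and standard error joined into standard output:
#PBS -N resource_demo
#PBS -j oe

# Move to the directory qsub was run from.
# (Falls back to the current directory if run outside the batch system.)
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Job ${PBS_JOBID:-not-in-batch} started on $(uname -n) at $(date)"
# Replace the next line with the real work, e.g. ./my_program (hypothetical).
echo "Working in $(pwd)"
```

Asking for somewhat more walltime than a test run needed leaves a margin, but very large requests can wait longer in the queue.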
Job Management
Submit a job qsub <job script>
Status of jobs qstat [-f] <job id>
Delete a job (queued or running) qdel <job id>
Predicted start time showstart <job id>
Check scheduling showq [--help] -u <user name>
❖ Linux command-line utilities
❖ Run them as usual Linux commands
man qsub Standard Linux manual page.
Sample Job Script “hello.pbs”
#!/bin/bash
# (The line above is the standard Linux first line.)
# Scheduling directive (there are lots of them):
#PBS -l procs=1
# Join standard output and standard error:
#PBS -j oe
date
echo "Hello World."
echo "This job is running on $(/bin/hostname)."
Submit the job:
qsub hello.pbs
https://www.westgrid.ca/support/running_jobs#sample
https://www.westgrid.ca/support/running_jobs#directives
Job Submission
pjmann@bugaboo ~/PresentationTests$ qsub hello.pbs
15298317.b0
The response gives the job id: 15298317
pjmann@bugaboo ~/PresentationTests$ qstat 15298317
Job ID        Name       User    Time Use  S  Queue
------------  ---------  ------  --------  -  -----
15298317.b0   hello.pbs  pjmann  0         Q  q1
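Because `qsub` prints the job id on standard output, a submission can be captured in a shell variable and reused with the other job-management commands. A minimal sketch; the helper name is hypothetical, and it assumes the `<number>.<server>` form shown above (e.g. `15298317.b0`).

```shell
#!/bin/bash
# Hypothetical helper: strip the server suffix from a qsub response,
# e.g. "15298317.b0" -> "15298317".
job_number() {
    local response="$1"
    echo "${response%%.*}"
}

# Typical use on a login node:
#   response=$(qsub hello.pbs)
#   id=$(job_number "$response")
#   qstat "$id"
#   showstart "$id"
#   qdel "$id"        # only if you want to cancel the job
```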
Job Results
… the run completes (try a few qstat and/or showstart commands)
pjmann@bugaboo ~/PresentationTests$ ls
hello.pbs  hello.pbs.e15298317  hello.pbs.o15298317
pjmann@bugaboo ~/PresentationTests$ cat hello.pbs.o15298317
Thu Jan 9 12:03:57 PST 2014
Hello World.
Starting Out
https://www.westgrid.ca/support/quickstart/new_users
Recommendations:
❖ Run lots of small example test jobs.
❖ Get a simple one working, and build up from there.
➢ We all know the debugging 80:20 rule (or 90:10, or 99:1)
➢ Build up iteratively
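One way to build up iteratively is to resubmit the same short test job at increasing core counts and check each result before scaling further. A sketch under assumptions: the script name `test_job.pbs` and the helper name are hypothetical, resources given to `qsub -l` on the command line are assumed to take precedence over the script's own `#PBS -l` lines, and setting `QSUB=echo` prints the commands instead of submitting them (a dry run).

```shell
#!/bin/bash
# Sketch: submit one short test job per core count, smallest first.
# QSUB defaults to the real qsub; set QSUB=echo for a dry run.
QSUB="${QSUB:-qsub}"

# Hypothetical helper: submit_scaling_tests SCRIPT CORES...
submit_scaling_tests() {
    local script="$1"; shift
    local cores
    for cores in "$@"; do
        # Short walltime: these are quick tests, not production runs.
        "$QSUB" -l "procs=${cores},walltime=00:15:00" "$script"
    done
}

# Example (hypothetical script name):
#   submit_scaling_tests test_job.pbs 1 2 4 8
```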
Debugging
❖ Job output can show lots of information
❖ Have job completion information mailed to you (lots there; set with a #PBS directive)
❖ Explicitly request the information you need (lots of detailed PBS directives)
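A sketch of directives and commands that make job output easier to debug. `-m abe` asks PBS to send mail when the job aborts, begins and ends; `-M` sets the address, shown here as a placeholder. `PBS_JOBID`, `PBS_O_WORKDIR` and `PBS_NODEFILE` are variables PBS sets inside a job; the fallbacks only matter if the script is run outside the batch system. The program name is hypothetical.

```shell
#!/bin/bash
# Ask PBS to email you when the job aborts (a), begins (b) and ends (e).
#PBS -m abe
# Placeholder address: replace with your own.
#PBS -M your.name@example.com
#PBS -l procs=1
#PBS -l walltime=00:10:00
#PBS -j oe

# Record where and when the job ran; this ends up in the job's output file.
echo "Job id:      ${PBS_JOBID:-not-in-batch}"
echo "Host:        $(uname -n)"
echo "Submit dir:  ${PBS_O_WORKDIR:-unknown}"
echo "Start time:  $(date)"
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Cores:       $(wc -l < "$PBS_NODEFILE" | tr -d ' ')"
fi

# Print each command before running it, so failures are easy to locate.
set -x
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
# ./my_program input.dat   # hypothetical program and input
set +x
echo "End time:    $(date)"
```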
Interactive Jobs
Some nodes are reserved for interactive use.
Use them for larger or longer test jobs and interactive work (under 3 hours).
https://www.westgrid.ca/support/running_jobs#interactive
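Interactive sessions are requested with `qsub -I`, which waits for resources and then opens a shell on a compute node. A minimal sketch; the helper name is hypothetical, the 3-hour ceiling mirrors the limit quoted above (check the running-jobs page for each system's actual limit), and `QSUB=echo` turns it into a dry run that only prints the command.

```shell
#!/bin/bash
# Sketch: request an interactive session shorter than 3 hours.
# QSUB defaults to the real qsub; set QSUB=echo for a dry run.
QSUB="${QSUB:-qsub}"

# Hypothetical helper: start_interactive MINUTES [CORES]
start_interactive() {
    local minutes="$1" procs="${2:-1}"
    # Reject non-numbers (and leading zeros) and anything at or over 180 minutes.
    if ! [[ "$minutes" =~ ^[1-9][0-9]*$ ]] || (( minutes >= 180 )); then
        echo "minutes must be between 1 and 179" >&2
        return 1
    fi
    local walltime
    walltime=$(printf '%02d:%02d:00' $((minutes / 60)) $((minutes % 60)))
    # -I makes the job interactive; the shell opens once resources are free.
    "$QSUB" -I -l "procs=${procs},walltime=${walltime}"
}

# Example: a 90-minute, 1-core interactive session
#   start_interactive 90
```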
Job scheduling is a complex and difficult task.
❖ Each site schedules its own jobs
❖ Moab fair-share scheduling
Fair-share Targets
System utilization targets are set for projects (groups) and their members.
❖ Fair-share allocates job priority depending on these targets.
❖ Dependent on resource availability and characteristics.
Base metric: usage over the last couple of weeks (system dependent)
❖ If Usage > Target: priority is decreased proportionally
❖ If Usage < Target: priority is increased proportionally
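The proportional rule can be illustrated with a little shell arithmetic. This is a simplified sketch, not Moab's exact formula: it assumes usage and target are whole percentages of the system and uses a hypothetical weight to scale the difference into a priority adjustment. Moab's real calculation also involves configurable weights and the decay of older usage.

```shell
#!/bin/bash
# Simplified fair-share sketch (not Moab's exact algorithm).
# fairshare_adjust TARGET USAGE WEIGHT   (hypothetical helper)
#   TARGET, USAGE: whole-number percentages of the system
#   WEIGHT: hypothetical scaling factor
# Prints WEIGHT * (TARGET - USAGE): negative when a group is over its
# target (priority decreased), positive when under (priority increased).
fairshare_adjust() {
    local target="$1" usage="$2" weight="$3"
    echo $(( weight * (target - usage) ))
}

# A group with a 20% target that recently used 30%:
#   fairshare_adjust 20 30 10   ->  -100
# The same group having used only 10%:
#   fairshare_adjust 20 10 10   ->  100
```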
Resource Allocation
The usage targets are defined by the Resource Allocation process (RAC = “Resource Allocation Committee”):
❖ Compute Canada annual process (October)
❖ Projects (PIs) complete an application
❖ Reviewed by technical and scientific panels
❖ Decisions in December
❖ Targets (allocations) entered into systems Jan. 10
Default allocation available for projects which do not have a
Visualization and Software
You can install software/packages.
But the analysts know about optimization, hardware details, system details, … so ASK!
https://www.westgrid.ca/support/visualization
Visualization and Graphics (including GPUs)
[Image: Mercury solar wind visualization, Jan Paral, UAlberta]
Asking for Help
It helps the analysts if you can include information:
1. The name of the system (lots of folks forget this!).
2. The job id.
3. Your WestGrid user id (especially if you’re using a different email address).
4. Location of the script/job/datafiles/…
5. And of course, details of the errors or issues.
Email: [email protected]
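A small helper can collect several of these items before you write to support. A minimal sketch; the function name is hypothetical, it only calls standard commands (`uname`, `id`, `pwd`), it should be run on the system where the problem occurred, and you still need to describe the error yourself.

```shell
#!/bin/bash
# Hypothetical helper: print the details support analysts ask for.
# Usage: support_info [JOB_ID]
support_info() {
    local job_id="${1:-<none given>}"
    echo "System:    $(uname -n)"
    echo "Job id:    ${job_id}"
    echo "User id:   $(id -un)"
    echo "Directory: $(pwd)"
}

# Example: paste the output into your message, then describe the problem.
#   support_info 15298317
```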