Using WestGrid. Patrick Mann, Manager, Technical Operations Jan.15, 2014

(1)

Using WestGrid

Patrick Mann,

Manager, Technical Operations Jan.15, 2014

(2)

Winter 2014 Seminar Series

For more information on these and other seminars see https://www.westgrid.ca/support/training

Date          Speaker                          Topic
5 February    Gino DiLabio                     Molecular Modelling Using HPC and Gaussian
26 February   Jonathan Dursi                   Responding to Canada’s Research Computing Needs
12 March      Scott Northrup                   Introduction to GPU Computing Using CUDA
26 March      Humaira Kamal and Alan Wagner

(3)

User Basics

To use WestGrid systems effectively you will need to know:

❖  Where to get help and information

❖  Which systems are suited to your project

❖  How to log on

❖  Basic Linux commands

(4)

Help and Support

WestGrid website: www.westgrid.ca

❖  Technical Specifications, QuickStart Guides, Software, ...

❖  System status and notices

❖  Events, colloquia, news, ...

WestGrid Support: [email protected]

❖  Novice to expert

❖  Logon issues to in-depth parallelization

❖  No question too big or too small

(5)

WestGrid Cluster Schematic

[Figure: cluster schematic. The user desktop connects by SSH over the Internet to the Linux login node(s). The login node(s), scheduler node(s) and compute nodes (the cluster) sit on the internal cluster network and use a shared disc system holding /home and /global/scratch; /home is backed up.]

(6)

Cluster Compute Nodes

❖  Nodes usually have 2 CPUs, with 6 or 8 cores/CPU.

❖  Usually 12-24 GB/node (2 GB/core).

❖  100s of nodes in one cluster.

❖  InfiniBand interconnects (with varying bandwidth and latency)

❖  Specialty systems with MUCH more memory/node.

❖  Specialty systems that look like a single node with lots of cores.

[Figure: two nodes (Linux boxes), each with CPUs made up of several cores and shared RAM, joined by an interconnect (usually InfiniBand). Shared memory: 1 node (multicore).]
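As a quick check of the figures on this slide, the per-core memory can be worked out from the node description. This is only a sketch: it assumes a node with 2 CPUs, 6 cores per CPU and 24 GB of RAM, which is one of the combinations mentioned above, not a specification of any particular WestGrid system.

```shell
# Worked example: cores and memory per core for one assumed node layout.
cpus_per_node=2
cores_per_cpu=6
mem_per_node_gb=24

cores_per_node=$(( cpus_per_node * cores_per_cpu ))
mem_per_core_gb=$(( mem_per_node_gb / cores_per_node ))

echo "cores per node:  ${cores_per_node}"
echo "memory per core: ${mem_per_core_gb} GB"
```

With these assumed values the node has 12 cores and 2 GB per core. A job that needs more memory per core than that has to leave some cores idle or use a larger-memory system.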

(7)

System Selection 1

Aim: Optimally match software requirements and characteristics with systems

➔  Fast turnaround (Users!)

➔  Efficient use of resources (Systems Management!)

https://www.westgrid.ca/support/quickstart/new_users#choosing_system

Software: Packaged, Homegrown, Parallelizability, Scalability, Memory, Output

System: Architecture, Size, Memory, Interconnects, Storage, Batch Policy

(8)

System Selection 2

❖  Software? (off-the-shelf, licensed, homegrown)

❖  Memory requirements?

❖  Parallelization?

➢  Scalability

➢  Shared or Distributed memory (or both)

❖  Research Program Characteristics?

➢  Lots of little jobs (parameter space and optimization)

➢  A few really big jobs (simulations)

➢  Code development

➢  ...

(9)

System Selection 3

Small-memory serial; undemanding parallel:
    Hermes, Bugaboo, Jasper, Orcinus

Shared memory (OpenMP):
    Breezy, Hungabee (larger memory)

Distributed memory (MPI parallel):
    Bugaboo, Grex, Jasper, Lattice, Nestor, Orcinus, Parallel
    ●  Bugaboo, Nestor: large associated storage
    ●  Lattice: small memory (1.5 GB/core)
    ●  Grex: large memory (4 GB/core)

Graphics, visualization or GPU acceleration:
    Parallel

Gaussian:
    Grex (licensed)

Other special software (MATLAB, ..):
    Check the QuickStart and software guides

(10)

System Selection 4

Lots of systems, some special purpose, some general purpose.

❖  Each has its own software set.

➢  Lots of generic software, but some packages are only on specific machines (see software pages).

❖  Users may work on multiple systems.

Hard to choose.

(11)

Connecting to Cluster

The login nodes (and all nodes) run Linux.

❖  Command-line shell for typing text commands

❖  So you need to log in via a standard terminal

We use SSH (as does most of the world)

❖  Linux and MacOS have built-in clients

❖  Windows: various packages, e.g. PuTTY

(12)

Linux

You do need to know the basics of Linux and the command line.

Lots of tutorials and books out there.

❖  See the New Users QuickStart guide:

(13)

Graphical Applications

Editors, Visualization and other Graphical Interfaces

❖  X-Windows is the Linux windowing system

❖  Linux editors, visualization packages and anything graphical use X

❖  Used by MacOS, and can be installed in Windows.

➢  http://sourceforge.net/projects/xming (free)

❖  Linux: ssh -X [email protected]
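Once logged in with X forwarding, a quick way to see whether graphical programs will be able to open a window is to look at the DISPLAY variable, which ssh sets on the remote side when forwarding is active. The helper name `check_x_forwarding` is made up for this sketch.

```shell
# Report whether an X display is available in the current session.
# check_x_forwarding is a hypothetical helper, not a WestGrid tool.
check_x_forwarding() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X forwarding looks active (DISPLAY=${DISPLAY})"
    else
        echo "No X display found; reconnect with: ssh -X <user>@<host>"
        return 1
    fi
}

# Run the check; "|| true" keeps a failed check from aborting a script.
check_x_forwarding || true
```

If the check fails on Windows or MacOS, the local X server (for example Xming) is probably not running.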

(14)

File Transfer

❖  Linux, MacOS: built-in

❖  Windows: WinSCP, Filezilla

❖  Lots of beautiful graphical front-ends out there!

❖  Annoying issue with line-endings in files from Windows

Standard tools based on SSH transport.

scp     Secure copy

sftp    Secure file transfer protocol
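The line-ending issue usually shows up when a script edited on Windows is copied to the cluster: each line carries an extra carriage return that Linux tools do not expect. A minimal sketch of one way to clean such a file before copying it is below. The file names and the function name `to_unix_line_endings` are made up for illustration, and the scp line is left commented out because the user and host are placeholders.

```shell
# Remove every carriage return (CR) so lines end with a plain LF.
# Writes a converted copy and leaves the original file untouched.
to_unix_line_endings() {
    local src="$1" dst="$2"
    tr -d '\r' < "$src" > "$dst"
}

# Demo: build a small file with Windows (CRLF) endings, then convert it.
workdir=$(mktemp -d)
printf 'date\r\necho "Hello"\r\n' > "$workdir/job_windows.pbs"
to_unix_line_endings "$workdir/job_windows.pbs" "$workdir/job_unix.pbs"

# Copy the cleaned file to the cluster (placeholder user and host):
# scp "$workdir/job_unix.pbs" <user>@<host>:~/
```

Many systems also provide a `dos2unix` utility that does the same job; check whether it is installed before relying on it.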

(15)

Inter-Site File Transfer

WestGrid Core Network: Very Fast

Internal network connecting all sites

❖  CANARIE National Network connecting Compute Canada sites (and all universities and institutions)

❖  Especially to Silo backup/archival system

Powerful Grid tools and Globus Online

(16)

Useful Linux Software

Many useful, standard software packages included on all WestGrid systems:

❖  Programming Editors (nedit, emacs, vi, …)

❖  Compilers (Intel, GNU, Fortran, C++, ..)

❖  Scripting (Python has become a common scientific language)

❖  Parallel programming (OpenMP, Open MPI)

❖  Base scientific libraries (BLAS, LAPACK, ..)

(17)

Job Basics

❖  Login nodes:

➢  Data management

➢  Editing and compiling code

➢  Quick tests

➢  Job management

❖  “Real” work done on the worker (compute) nodes

➢  Jobs submitted to batch system (queued)

➢  Jobs dispatched as fairly as possible to worker nodes

(18)

Batch Jobs

A batch job is defined by a Linux shell script with directives that tell the scheduler what resources the job needs:

❖  memory, cores, walltime

❖  (and lots of fine detail stuff)

Jobs exceeding these pre-defined resource limits may be terminated (e.g. walltime limit)!

Jobs with incompatible requirements (e.g. cores/node) may be queued, but never run.
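To make the idea of resource directives concrete, here is a sketch that writes a small job script requesting cores, walltime and memory, then checks its shell syntax locally. The values (1 core, 10 minutes, 2000 MB per process) are illustrative assumptions, and the resource names accepted can differ between systems, so check the directives page for the system you use.

```shell
# Write a small job script that states its resource needs up front.
# All requested values below are examples only.
jobfile=$(mktemp)
cat > "$jobfile" <<'EOF'
#!/bin/bash
# Number of cores
#PBS -l procs=1
# Maximum run time (hh:mm:ss); the job may be terminated beyond this
#PBS -l walltime=00:10:00
# Memory per process
#PBS -l pmem=2000mb
# Join standard and error outputs
#PBS -j oe

echo "Started on $(uname -n) at $(date)"
EOF

# Check the shell syntax locally before submitting.
bash -n "$jobfile" && echo "syntax OK: $jobfile"

# On a cluster, submit with: qsub "$jobfile"
```

Asking for a realistic walltime and memory helps the scheduler fit the job in sooner. Asking for too little risks the job being killed partway through.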

(19)

Job Management

Submit a job                        qsub <job script>
Status of jobs                      qstat [-f] <job id>
Delete a job (queued or running)    qdel <job id>
Predicted start time                showstart <job id>
Check scheduling                    showq [--help] -u <user name>

❖  Linux command-line utilities

❖  Run them as usual Linux commands

man qsub    Standard Linux manual page.

(20)

Sample Job Script “hello.pbs”

    #!/bin/bash
    # The line above is the standard Linux first line.
    # Scheduling directive (lots more are available!)
    #PBS -l procs=1
    # Join standard and error outputs
    #PBS -j oe
    date
    echo "Hello World."
    echo "This job is running on $(/bin/hostname)."

Submit the job

    qsub hello.pbs

https://www.westgrid.ca/support/running_jobs#sample

https://www.westgrid.ca/support/running_jobs#directives

(21)

Job Submission

    pjmann@bugaboo ~/PresentationTests$ qsub hello.pbs
    15298317.b0

The response gives the job id: 15298317

    pjmann@bugaboo ~/PresentationTests$ qstat 15298317
    Job ID          Name        User     Time Use  S  Queue
    --------------  ----------  -------  --------  -  -----
    15298317.b0     hello.pbs   pjmann   0         Q  q1

(22)

Job Results

… run completes (try a few qstat’s and/or showstart)

    pjmann@bugaboo ~/PresentationTests$ ls
    hello.pbs  hello.pbs.e15298317  hello.pbs.o15298317
    pjmann@bugaboo ~/PresentationTests$ cat hello.pbs.o15298317
    Thu Jan 9 12:03:57 PST 2014
    Hello World.
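When jobs are submitted from a script it is handy to capture the job id and work out where the output will land. The sketch below assumes the qsub response has the form `<number>.<server>` and that output files follow the `<script>.o<job id>` pattern seen in the listing. The function names are made up for illustration, and the response is hard-coded so the example runs without a batch system.

```shell
# Extract the numeric job id from a qsub response such as "15298317.b0".
job_id_from_response() {
    local response="$1"
    echo "${response%%.*}"
}

# Build the output file name following the "<script>.o<job id>" pattern.
output_file_for() {
    local script="$1" job_id="$2"
    echo "${script}.o${job_id}"
}

# On a cluster this would be: response=$(qsub hello.pbs)
response="15298317.b0"
job_id=$(job_id_from_response "$response")
echo "job id: ${job_id}"
echo "output: $(output_file_for hello.pbs "$job_id")"
```

The captured id can then be passed straight to `qstat`, `showstart` or `qdel`.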

(23)

Starting Out

https://www.westgrid.ca/support/quickstart/new_users

Recommendations:

❖  run lots of small example test jobs.

❖  get a simple one working, and build up from there

➢  We all know the debugging 80:20 (or 90:10, or 99:01)

➢  build up iteratively

(24)

Debugging

❖  Job output can show lots of information

❖  Mail job completion info (lots there, #PBS directive)

❖  Explicitly define information requirements (Lots of detailed PBS directives)
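A sketch of a job script set up for debugging is shown below. It asks for mail when the job begins, ends or aborts, gives the job a name, and prints where it ran. The job name and mail address are placeholders. The PBS_* variables are set by the batch system inside a job, so the script falls back to a marker when run by hand.

```shell
# Write a job script that reports as much as possible about its own run.
jobfile=$(mktemp)
cat > "$jobfile" <<'EOF'
#!/bin/bash
#PBS -N debug_test
#PBS -l procs=1
#PBS -l walltime=00:05:00
# Mail on begin (b), end (e) and abort (a); the address is a placeholder.
#PBS -m bea
#PBS -M user@example.com

# Record where and how the job ran.
echo "Job id:     ${PBS_JOBID:-not in a batch job}"
echo "Submit dir: ${PBS_O_WORKDIR:-not in a batch job}"
echo "Host:       $(uname -n)"
echo "Started:    $(date)"
EOF

# Running it directly is a cheap first test before queueing it.
bash "$jobfile"
```

Having the host name and job id in the output file makes it much easier to match a failure to a particular run.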

(25)

Interactive Jobs

Some nodes are reserved for interactive use

Larger/Longer test jobs and interactive work (< 3 hours)

https://www.westgrid.ca/support/running_jobs#interactive

(26)

Job scheduling is a complex and difficult task.

❖  Each site schedules its own jobs

❖  MOAB fair-share scheduling

(27)

Fair-share Targets

System utilization targets are set for projects (groups) and their members.

❖  Fair-share allocates job priority depending on these targets.

❖  Dependent on resource availability and characteristics.

Base Metric: Usage over the last couple of weeks (system dependent)

❖  If Usage > Target: Priority is decreased proportionally

❖  If Usage < Target: Priority is increased proportionally
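The proportional rule can be pictured with a toy calculation. This is an illustration only and not the scheduler's actual formula: it assumes target and usage are given as percentages of the system and that the adjustment is simply a weight times their difference. The function name and the weight of 100 are made up.

```shell
# Illustrative only: a linear priority adjustment, NOT the real scheduler formula.
# target and usage are percentages; weight is a hypothetical scale factor.
fairshare_adjustment() {
    local target="$1" usage="$2" weight="${3:-100}"
    echo $(( weight * (target - usage) ))
}

# Group with a 10% target that has recently used 15%: priority goes down.
fairshare_adjustment 10 15    # prints -500

# Same target, but only 4% recent usage: priority goes up.
fairshare_adjustment 10 4     # prints 600
```

The point is the sign and the proportionality: the further recent usage is above target, the larger the penalty, and the reverse when usage is below target.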

(28)

Resource Allocation

The Usage Targets are defined by the Resource Allocation Process (RAC = “Resource Allocation Committee”)

❖  Compute Canada annual process (October)

❖  Projects (PIs) complete an application

❖  Reviewed by Technical and Scientific panels

❖  Decisions in December

❖  Targets (allocations) entered into systems Jan.10

Default allocation available for projects which do not have a

(29)

Visualization and Software

You can install software/packages.

But analysts know about optimization, hardware details, systems details, … ASK!

https://www.westgrid.ca/support/visualization

Visualization and Graphics (including GPUs)

Jan Paral, UAlberta, Mercury Solar Wind

(30)

Asking for Help

It helps the analysts if you can include information:

1.  The name of the system (lots of folks forget this!).

2.  The job id.

3.  Your WestGrid user id (especially if you’re using a different email address).

4.  Location of the script/job/datafiles/…

5.  And of course details of the errors or issues.

mailto:[email protected]
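The items in the list can be gathered in one go with a few standard commands. The sketch below prints them as a block of text to paste into a support request. The function name `support_report`, the job id and the script path are examples only.

```shell
# Gather the details analysts ask for into one block of text.
# Arguments: job id, path to the job script (both supplied by the user).
support_report() {
    local job_id="$1" script_path="$2"
    echo "System:      $(uname -n)"
    echo "Job id:      ${job_id}"
    echo "User id:     ${USER:-$(id -un)}"
    echo "Job script:  ${script_path}"
    echo "Working dir: $(pwd)"
    echo "Errors:      (paste the error messages and describe the issue)"
}

# Example call with made-up values.
support_report 15298317 "$HOME/PresentationTests/hello.pbs"
```

Run it on the system where the problem occurred so that the host name and working directory are the right ones.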

(31)

Conclusion

❖  Support

❖  System selection

❖  Connecting

❖  Linux

❖  Jobs

www.westgrid.ca

[email protected]
