High-Performance Computing
Windows, Matlab and the HPC
Dr. Leigh Brookshaw
Dept. of Maths and Computing, USQ
The HPC
Architecture
- 30 Sun boxes, or “nodes”.
- Each node has 2 x 2.4GHz AMD CPUs with 4 cores each and 16GB RAM.
- Theoretically possible to run 240 independent simultaneous jobs (30 nodes x 8 cores).
- One extra node is the controller or administration node. Access to the HPC is via the administration node.
- One extra node is the Input/Output node: it is the disk controller for the HPC. All disk space (5 TB) is controlled by this node.
- One extra node for Matlab clients.
- The disk space is visible to all nodes.
- CPU speed is equivalent to that of a workstation.
- Computational power is derived from using more than one node/core.
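The figure of 240 simultaneous jobs follows directly from the node counts quoted in the architecture list. A minimal sketch of the arithmetic, using only the numbers from the slides (the variable names are illustrative):

```matlab
% Figures quoted in the architecture list.
nodes = 30;          % compute nodes
cpusPerNode = 2;     % CPUs per node
coresPerCpu = 4;     % cores per CPU
ramPerNodeGB = 16;   % RAM per node in GB

coresPerNode = cpusPerNode * coresPerCpu;
totalCores = nodes * coresPerNode;
ramPerCoreGB = ramPerNodeGB / coresPerNode;   % share if every core runs a job

fprintf('%d cores in total, %g GB RAM per core\n', totalCores, ramPerCoreGB);
```

The per-core RAM figure is worth keeping in mind when one job per core is requested.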
Connecting to the HPC
- habeus: only for code testing, some interactive jobs and some batch jobs.
- usqhpcio: normally there is no need to connect to it. Controls the disks; jobs should never be run on this node.
- usqhpc: main access point to the HPC. Contains the queues for submitting jobs.
- usqhpcm: for Matlab jobs using the Parallel Toolbox.
- The only access to the HPC is via a “Secure Shell” (SSH) from the USQ network. This ensures all communication to the HPC is encrypted and secure.
- The HPC uses RedHat Linux.
- Most interaction with the HPC does not require a detailed knowledge of the Unix/Linux command line; about 10 commands are enough.
Windows’ Utilities
- PuTTY creates an SSH connection and provides a command line interface to a remote machine.
- WinSCP provides a traditional Windows interface for copying files between the local machine and the remote machine. It uses SSH to ensure all communication is encrypted.
- Notepad++ is a good all-purpose text/code editor that recognises Windows, Unix and MacOS text files.
PuTTY
Main Window
- You need the name of the machine you wish to connect to.
First Connect
- The first time PuTTY connects to a machine it will ask whether it should accept and cache the remote host’s identifying key; answer Yes.
Command Line Window
- You need to enter your HPC username and password.
WinSCP
Connecting to remote host
Preferences
- In the preferences you can specify which editor to use.
- Remote files are downloaded, edited locally, then uploaded.
- Downloading and uploading of files is done automatically.
Types of Jobs
Two basic types of jobs are run on multiple computing nodes:
- Distributed Jobs:
  - Also called “Coarse Grained” jobs.
  - Each process is completely independent of the others.
  - Little or no communication between processes.
  - Examples: parameter space search, running the same program repeatedly with different input parameters, &c.
- Parallel Jobs:
  - Also called “Fine Grained” jobs.
  - Each process deals with a part of the problem.
  - Communication and synchronisation are required between processes.
  - Examples: processing of large data sets that will not fit on one node, computational domains that need to be split across nodes, CFD, &c.
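The defining property of a distributed job is that no task needs another task's result. The following sketch illustrates this with a hypothetical parameter sweep in plain Matlab; the function simulate and the parameter values are stand-ins invented for the example, not part of the HPC setup.

```matlab
% Hypothetical parameter sweep: each "task" evaluates the same function
% with a different input and never looks at any other task's result.
simulate = @(p) sum((1:100) .* p);    % stand-in for a real simulation
params = [0.5, 1.0, 1.5, 2.0];

% Run the tasks in forward order...
forward = zeros(size(params));
for k = 1:numel(params)
    forward(k) = simulate(params(k));
end

% ...and again in reverse order. Because the tasks are independent, the
% order in which they run cannot change the answers.
backward = zeros(size(params));
for k = numel(params):-1:1
    backward(k) = simulate(params(k));
end

disp(isequal(forward, backward));
```

A loop whose iterations can be reordered like this is a candidate for a distributed job; one where iteration k needs the result of iteration k-1 is not.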
Distributed Job Example Paradigm
Parallel Job Example Paradigm
Peer-Peer processing
HPC Job Submission
- Jobs are run via a batch system: PBS (Portable Batch System).
- A batch system requires jobs to run unsupervised!
- Jobs are submitted to a batch queue; the batch system starts a job running when the requested resources are available.
- Requested resources:
  - Number of nodes
  - Number of cores per node
  - Memory per process
  - Total amount of memory for the job
  - Maximum amount of time
  - . . .
- Jobs are submitted via a shell script or via a Matlab script.
- Shell script examples are available on the HPC web site.
Shell Script Example
##### Select resources #####
#PBS -N Test-Matlab
#PBS -l nodes=7:ppn=3
##### Queue #####
#PBS -q standard
##### Mail Options #####
#PBS -m bea
#PBS -M [email protected]
##### Change to current working directory #####
cd /home/mcsci/leighb/test
##### Execute Program #####
/usr/local/bin/matlab -nodisplay -nodesktop -nosplash < driver.m
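The script feeds driver.m to Matlab on standard input, so the driver has to run from start to finish without prompting the user. The contents of driver.m are not shown in these slides; the following is only a hypothetical sketch of the shape such a file might take, with an invented computation and an invented output file name.

```matlab
% driver.m (hypothetical contents; the real driver is not shown here)
nsim = 1000;                       % assumed problem size
x = linspace(0, 1, nsim);
estimate = mean(x .^ 2);           % stand-in computation, close to 1/3

% Write results to a file: nobody is watching the screen in a batch job.
save('driver_results.mat', 'nsim', 'estimate');
fprintf('nsim = %d, estimate = %.4f\n', nsim, estimate);
```

Anything the job needs to keep should be saved explicitly, since a batch job has no interactive session to inspect afterwards.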
Matlab’s Parallel Toolbox
- Provides the infrastructure scripts/commands for Parallel or Distributed computing:
  - Ability to create Matlab “Workers” (a Worker is a running instance of Matlab).
  - Ability to assign tasks to Workers (pass a script to run on a Worker).
  - Ability to communicate between Workers (running scripts can communicate with each other).
- Provides a Local Scheduler that will allow you to start one Worker per core on one node. The maximum number of Workers is 8.
Matlab’s Distributed Computing Server
- Provides the Scheduler to create Workers on other nodes.
- Accessed via the Parallel Toolbox when requesting Workers.
- Currently the HPC has a license for a total of 64 simultaneous
Matlab Client Script
- Uses one Matlab license!
- Uses one Distributed Computing Toolbox license!
- Requests resources from the Scheduler: the number of Workers required, &c.
- The Client script distributes tasks to the Workers.
- The Client script can run on the HPC (usqhpcm) or on your own machine.
- The Client machine must be able to connect to the HPC.
Matlab Distributed Processing
Minimalist Example running on USQHPCM
% ntasks (number of tasks) and nsim (argument passed to distance) must be defined beforehand
sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb')
set(sched, 'RshCommand', 'ssh');
job = createJob(sched);
set(job, 'PathDependencies', {'/home/mcsci/leighb/test'});
for i = 1:ntasks
    createTask(job, @distance, 3, {nsim});
end
submit(job);
waitForState(job);
- “torque” is the name of the Scheduler to use with the HPC’s PBS batch system.
- createJob(): create a new distributed job to submit to the Scheduler.
- createTask(): specify the tasks for the job. One task to one Worker. A task is a Matlab function to run.
- submit(): queue the job on the PBS batch system using the 'torque' Scheduler. Each Worker appears as a separate queued job on the PBS “default” queue.
- waitForState(): wait for the job to complete. Can time out or wait for specific tasks to finish.
- getAllOutputArguments(): get the return values from the job.
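Once the job has finished, the return values have to be unpacked. The sketch below does not call the toolbox; it mocks the cell array that getAllOutputArguments is assumed to return (one row per task, one column per requested output) with made-up values, and shows how individual results can be pulled out.

```matlab
% Mock of what getAllOutputArguments(job) is assumed to return for
% 4 tasks that each request 3 outputs: one row per task, one column per
% output. The values are made up for illustration only.
results = {1.0, 10, 'a';
           2.0, 20, 'b';
           3.0, 30, 'c';
           4.0, 40, 'd'};

ntasks = size(results, 1);            % number of tasks in the job
firstOut = cell2mat(results(:, 1));   % first output of every task, as a column
secondOfTask3 = results{3, 2};        % second output of task 3
```

With the createTask call used in the minimalist example, which requests 3 outputs from distance, the real array would have 3 columns as well.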
Required Settings
- “HasSharedFilesystem”: all the nodes can see the user’s home folder on the HPC. Set to “true”.
- “DataLocation”: Matlab’s book-keeping location. The place where Task input/output can be stored by Workers.
- “RshCommand”: the command Matlab must use to communicate between the Client and the Workers.
- “PathDependencies”: the paths to all the scripts used in this job, so that all the Workers can find them.
Comments
- Each Worker appears as a submitted job on the PBS queue.
- Workers will be distributed by the PBS system.
- set(sched, 'SubmitArguments', '-q long');
  Additional arguments to use when submitting Tasks to the PBS queue. The most common use is to change the queue.
Distributed versus Parallel
Distributed Job | Parallel Job
Matlab sessions called “Workers”. | Matlab sessions called “Labs”.
Workers cannot communicate with each other. | Labs can communicate with each other.
Define any number of tasks (different or the same) in a job. | Define one task for the job; duplicates are run on all Labs requested.
Each Task is queued on the PBS system. | The Job is queued on the PBS system.
Tasks need not run simultaneously; they are assigned to Workers as they become available. Workers can run several tasks in a job. | Tasks run simultaneously, on as many Labs as are available at runtime. The start of the job may have to wait until the requested number of Labs is available.
Matlab Parallel Processing
Explicit Example
sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb')
set(sched, 'RshCommand', 'ssh');
pjob = createParallelJob(sched);
set(pjob, 'PathDependencies', {'/home/mcsci/leighb/test'});
set(pjob, 'MaximumNumberOfWorkers', 30)
set(pjob, 'MinimumNumberOfWorkers', 20)
t = createTask(pjob, @distance, 3, {nsim});
submit(pjob);
waitForState(pjob);
results = getAllOutputArguments(pjob);
Comments on Explicit example
- One task only: it is repeated on all Labs!
- Only one job is submitted on the PBS queue.
- Matlab defaults to requesting from PBS one node for each Lab!
- If you need more than 30 Labs you must explicitly specify the resources required:
set(sched, 'ResourceTemplate', '-l nodes=30:ppn=2');
set(pjob, 'MaximumNumberOfWorkers', 60)
set(pjob, 'MinimumNumberOfWorkers', 60)
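The worker limits should match what the resource request can actually host. As a small check in plain Matlab (no toolbox calls), the resource string from the example can be parsed and multiplied out; the variable names are illustrative.

```matlab
% Resource string taken from the ResourceTemplate example.
template = '-l nodes=30:ppn=2';

% Pull out the two integers: nodes and processes per node (ppn).
vals = sscanf(template, '-l nodes=%d:ppn=%d');
nodes = vals(1);
ppn = vals(2);

% Number of Labs this request can host, one Lab per core requested.
nlabs = nodes * ppn;
fprintf('%d nodes x %d cores per node = %d Labs\n', nodes, ppn, nlabs);
```

This is the value passed as both the maximum and minimum number of workers in the example.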
Explicit Lab Communication and Synchronisation
- numlabs: returns the number of Labs in the current job.
- labindex: returns the index of the Lab. The value will be different for each Lab.
- labSend: send data to the specified Lab.
- labReceive: block and read data from a specific Lab.
- labProbe: check if data is available from a specific Lab.
- labBarrier: block execution until all Labs reach this call.
- . . .
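A common use of these calls is to pass data around a ring of Labs, which requires each Lab to work out its neighbours from numlabs and labindex. The sketch below shows only that index arithmetic, in plain Matlab: nLabs and me are ordinary variables standing in for numlabs and labindex, so nothing here talks to the toolbox.

```matlab
% Stand-ins for the toolbox values: inside a parallel job these would come
% from numlabs and labindex. Here they are plain variables so the index
% arithmetic can be checked without the toolbox.
nLabs = 4;                                 % hypothetical number of Labs
right = zeros(1, nLabs);
left  = zeros(1, nLabs);
for me = 1:nLabs                           % "me" plays the role of labindex
    right(me) = mod(me, nLabs) + 1;        % Lab that would receive this Lab's data
    left(me)  = mod(me - 2, nLabs) + 1;    % Lab whose data this Lab would read
end
disp([1:nLabs; right; left]);
```

In a real parallel job, each Lab would pass its own right and left indices to labSend and labReceive respectively.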
Letting Matlab do the work!
sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb')
set(sched, 'RshCommand', 'ssh');
set(sched, 'RcpCommand', 'scp');
matlabpool(sched, 24);
parfor i = 1:64
    result(i,:) = distance(100000);
end
matlabpool close;
Points to note
- matlabpool: start 24 Labs for this job. One job appears in the PBS queue with requested resources of 24 nodes.
- parfor: distribute the iterations of the loop to the Labs in the pool. Each Lab works on one iteration of the loop! The first 24 iterations are calculated simultaneously, one on each Lab.
- Loop iterations are not done in loop order but in parallel; results will appear out of order unless stored in an explicitly indexed array!
- One iteration of a loop cannot depend on a previous iteration.
- The pool of Labs remains running and available for tasks until the pool is explicitly closed.
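The point about explicitly indexed arrays can be seen in a loop small enough to check by hand. This is a minimal sketch with an invented computation (squares of the loop index); the loop body is the same whether or not a pool is open.

```matlab
n = 8;
squares = zeros(n, 1);        % preallocate the indexed result array
parfor i = 1:n
    % Each iteration writes only to its own slot, so it does not matter
    % which Lab runs it or when it finishes.
    squares(i) = i^2;
end
disp(squares');
```

Because every iteration stores into squares(i), the results end up in loop order even if the iterations finish in a different order.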
Single Program Multiple Data (spmd)
- Interleaving of serial and parallel computing in the one client script.
- Use Matlab spmd . . . end blocks.
- Parallel computing within the spmd block, serial outside!
- Identical code runs on each Lab, with different data.
- Useful for running the same program on different data sets when communication and synchronisation are required!
- The Lab data sets may be part of a large distributed data set!
SPMD Example
matlabpool(sched, 24);
spmd
    R1 = rand(240);
    Z1 = zeros(240);
    Z2 = codistributed(Z1);
    Z3 = getLocalPart(Z2);
    Z4 = codistributed.rand(100000, 24);
    Z5 = gather(Z2, 1);
end
matlabpool close;
SPMD Example . . .
- R1 is a different random array on each Lab.
- Z1 is an identical array replicated on each Lab.
- Z2 is a codistributed array: one segment of Z1 goes to each Lab. Default segmentation is by the last non-singleton dimension, columns in this case.
- Z3 contains the part of Z2 local to the Lab.
- Z4 is created directly as a codistributed array; use this if the array is too large to replicate on each Lab.
- Z5 in Lab 1 contains the reconstructed Z2. Without the 1, all Labs contain the reconstructed array.
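To make the column segmentation concrete, the sketch below splits a 240-by-240 array into 24 column blocks in plain Matlab. It is a stand-in only: no toolbox calls are used, and it assumes an even split, which holds here because 240 divides exactly by 24.

```matlab
% Plain Matlab stand-in for a 240-by-240 array split by columns over
% 24 Labs. The even split is an assumption that holds for these sizes.
nLabs = 24;
Z = reshape(1:240*240, 240, 240);           % any 240-by-240 array will do
colsPerLab = size(Z, 2) / nLabs;            % 10 columns per Lab

% One cell per Lab, each holding that Lab's block of columns.
parts = mat2cell(Z, size(Z, 1), repmat(colsPerLab, 1, nLabs));

localSize = size(parts{1});                 % size of the piece held by Lab 1
rebuilt = [parts{:}];                       % concatenating the pieces, like gather
```

Each cell plays the role of one Lab's local part (Z3 in the example), and the concatenation plays the role of the gathered array (Z5).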
SPMD . . .
- Many Matlab functions are capable of working with codistributed arrays:
  - Elementary array operations: +, -, *, /, \, dot variants, &c.
  - Elementary matrix operations: find, diag, reshape, size, sort, is*, &c.
  - Matrix functions: eigenvalues, inverse, LU factorization, SVD, norms, &c.
  - Elementary trig., log, hyperbolic functions, &c.
  - help codistributed/functionname
- For-loops on codistributed arrays can only loop over the parts local to each Lab!