High-Performance Computing

Academic year: 2021

Windows, Matlab and the HPC

Dr. Leigh Brookshaw

Dept. of Maths and Computing, USQ

The HPC

Architecture

I 30 Sun boxes or “nodes”

I Each node has 2 x 2.4GHz AMD CPUs with 4 Cores each and 16GB RAM.

I Theoretically possible to run 240 independent simultaneous jobs.

I One extra node is the controller or administration node. Access to the HPC is via the administration node.

I One extra node is the Input/Output node: it is the disk controller for the HPC. All disk space (5 TB) is controlled by this node.

I One extra node for Matlab clients.

I The disk space is visible to all nodes.
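
The figure of 240 simultaneous jobs follows from multiplying out the node, CPU and core counts listed above. A quick check of that arithmetic in Matlab is sketched below; the variable names are illustrative only, and the calculation assumes one single-core job per core on the 30 compute nodes.

```matlab
% Capacity arithmetic for the 30 compute nodes described above.
% Variable names are illustrative, not part of any HPC tool.
nodes        = 30;   % compute nodes
cpusPerNode  = 2;    % CPUs per node
coresPerCpu  = 4;    % cores per CPU
ramPerNodeGB = 16;   % RAM per node in GB

totalCores   = nodes * cpusPerNode * coresPerCpu;          % one job per core
totalRamGB   = nodes * ramPerNodeGB;                       % aggregate RAM
ramPerCoreGB = ramPerNodeGB / (cpusPerNode * coresPerCpu); % share per core

fprintf('%d cores, %d GB RAM in total, %.1f GB RAM per core\n', ...
        totalCores, totalRamGB, ramPerCoreGB);
```

With every core on a node busy, each job has roughly 2 GB of RAM to itself, which is worth keeping in mind when requesting memory per process.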

I CPU speed is equivalent to a workstation

I Computational power is derived from using more than one node/core.

Connecting to the HPC

I habeus—only for code testing, some interactive jobs and some batch jobs.

I usqhpcio—normally don’t need to connect to it. Controls the disks—should never run jobs on this node.

I usqhpc—main access point to the HPC. Contains the queues for submitting jobs.

I usqhpcm—for Matlab jobs using the Parallel Toolbox.

I The only access to the HPC is via a “Secure Shell” from the USQ network. This ensures all communication to the HPC is encrypted and secure.

I The HPC uses RedHat Linux.

I Most interaction with the HPC does not require a detailed knowledge of the Unix/Linux command line; about 10 commands are enough.

Connecting to the HPC

Windows’ Utilities

I PuTTY creates an SSH connection and provides a command line interface to a remote machine.

I WinSCP provides a traditional Windows interface for copying files between the local machine and the remote machine. Uses SSH to ensure all communication is encrypted.

I Notepad++ is a good all-purpose text/code editor that recognises Windows, Unix and MacOS text files.

PuTTY

Main Window

I Need the name of the machine you wish to connect to

PuTTY

First Connect

I The first time PuTTY connects to a machine it will ask whether it should store the remote host’s identifying key. Answer Yes.

PuTTY

Command Line Window

I Need to enter your HPC username and password.

WinSCP

Connecting to remote host

WinSCP

Preferences

I In the preferences you can specify which editor to use.

I Remote files are downloaded, edited locally, then uploaded.

I Downloading and uploading of files is done automatically.

Types of Jobs

Two basic types of jobs are run on multiple computing nodes:

I Distributed Jobs:

  I Also called “Coarse Grained” jobs

  I Each process is completely independent of the others

  I Little or no communication between processes

  I Examples: Parameter space search, running the same program repeatedly with different input parameters, &c.

I Parallel Jobs:

  I Also called “Fine Grained” jobs

  I Each process deals with a part of the problem

  I Communication and synchronisation required between processes.

  I Examples: Processing of large data sets that will not fit on one node, computational domains that need to be split across nodes, CFD &c.

Distributed Job Example Paradigm

Parallel Job Example Paradigm

Peer-Peer processing

HPC Job Submission

I Jobs are run via a Batch System—PBS (Portable Batch System)

I A batch system requires jobs to run unsupervised!

I Jobs are submitted to a batch queue; the batch system starts a job running when the requested resources are available.

I Requested Resources:

  I Number of nodes

  I Number of cores per node

  I Memory per process

  I Total amount of memory for the job

  I Maximum amount of time

  I . . .

I Jobs are submitted via a Shell Script or via a Matlab script.

I Shell Script examples are available on the HPC web site.

Shell Script Example

##### Select resources #####
#PBS -N Test-Matlab
#PBS -l nodes=7:ppn=3

##### Queue #####
#PBS -q standard

##### Mail Options #####
#PBS -m bea
#PBS -M [email protected]

##### Change to current working directory #####
cd /home/mcsci/leighb/test

##### Execute Program #####
/usr/local/bin/matlab -nodisplay -nodesktop -nosplash < driver.m

Matlab’s Parallel Toolbox

I Provides the infrastructure scripts/commands for Parallel or Distributed computing:

  I Ability to create Matlab “Workers” (a running instance of Matlab).

  I Ability to assign tasks to Workers (pass a script to run on a worker).

  I Ability to communicate between Workers (running scripts can communicate with each other).

I Provides a Local Scheduler that will allow you to start one worker per core on one node. Maximum number of workers is 8.

Matlab’s Distributed Computing Server

I Provides the Scheduler to create Workers on other nodes.

I Accessed via the Parallel Toolbox when requesting Workers.

I Currently the HPC has a license for a total of 64 simultaneous

Matlab Client Script

I Uses one Matlab license!

I Uses one Distributed Computing Toolbox license!

I Requests resources from the Scheduler—the number of Workers required &c.

I Client script distributes tasks to the Workers.

I Client script can run on the HPC (usqhpcm) or on your own machine

I Client machine must be able to connect to the HPC.

Matlab Distributed Processing

Minimalist Example running on USQHPCM

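% Locate the scheduler that fronts the HPC's PBS batch system
sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb')
set(sched, 'RshCommand', 'ssh');

% Create the job and tell the Workers where the scripts live
job = createJob(sched);
set(job, 'PathDependencies', {'/home/mcsci/leighb/test'});

% One task per loop pass; max and nsim must be defined before this point
for i=1:max
    createTask(job, @distance, 3, {nsim} );
end

% Queue the job and wait for it to complete
submit(job);
waitForState(job);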
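
The slides do not show the contents of distance.m. As a purely hypothetical stand-in with the same calling pattern as the example (one input, three outputs), the sketch below estimates the mean distance between two random points in the unit square. The function body, the output names and the choice of problem are all assumptions made for illustration; only the name and the input/output counts come from the example above.

```matlab
function [meanDist, stdDist, n] = distance(nsim)
% DISTANCE  Hypothetical task function, invented for illustration only.
% Estimates the mean distance between two uniformly random points in the
% unit square from nsim Monte Carlo samples.
    p = rand(nsim, 2);                  % first point of each pair
    q = rand(nsim, 2);                  % second point of each pair
    d = sqrt(sum((p - q).^2, 2));       % Euclidean distance for each pair
    meanDist = mean(d);                 % sample mean of the distances
    stdDist  = std(d);                  % sample standard deviation
    n        = nsim;                    % number of samples actually used
end
```

A function like this would be saved as distance.m in a folder listed in PathDependencies so that every Worker can find it. Each task is self-contained and needs no data from any other task, which is what makes it suitable for a distributed job.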

Matlab Distributed Processing

I “torque” is the name of the Scheduler to use with the HPC’s PBS batch system.

I createJob(): creates a new distributed job to submit to the scheduler.

I createTask(): specify the tasks for the job. One task to one worker. A task is a Matlab function to run.

I submit(): queue the job on the PBS batch system using the ’torque’ scheduler. Each worker appears as a separate queued job on the PBS “default” queue.

I waitForState(): wait for the job to complete. Can timeout or wait for specific tasks to finish.

I getAllOutputArguments(): get the return values from the job.
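
The calls listed above can be tried on a single machine before going near the PBS queue. The sketch below assumes the toolbox's local scheduler is available and swaps it in for 'torque'; the task function (rand), the number of tasks and the 120 second timeout are arbitrary choices made for illustration.

```matlab
% Sketch using the toolbox's local scheduler instead of 'torque', so that it
% can be run on one machine. Task function and timeout are example values.
sched = findResource('scheduler', 'type', 'local');
job   = createJob(sched);

% Three independent tasks; each returns one output, a 2-by-2 random matrix.
for k = 1:3
    createTask(job, @rand, 1, {2});
end

submit(job);
waitForState(job, 'finished', 120);         % stop waiting after 120 seconds

if strcmp(get(job, 'State'), 'finished')
    results = getAllOutputArguments(job);   % cell array, one row per task
    disp(size(results{1}));                 % size of the first task's output
else
    disp('Job did not finish within the timeout.');
end

destroy(job);                               % remove the job's stored data
```

Checking the job state after the wait separates a job that completed from one that is still queued or running when the timeout expires.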

Matlab Distributed Processing

Required Settings

I “HasSharedFilesystem” — All the nodes can see the user’s home folder on the HPC. Set to “true”

I “DataLocation” — Matlab’s book-keeping location. Place where Task output/input can be stored by Workers.

I “RshCommand” — The command Matlab must use to communicate between the Client and the Workers.

I “PathDependencies” — The paths to all the scripts used in this job—so all the Workers can find them.

Matlab Distributed Processing

Comments

I Each Worker appears as a submitted job on the PBS queue.

I Workers will be distributed by the PBS system.

I set(sched, 'SubmitArguments', '-q long');

Additional arguments to use when submitting Tasks to the PBS queue. Most common use is to change the queue.

Distributed versus Parallel

| Distributed Job | Parallel Job |
| --- | --- |
| Matlab sessions called “Workers” | Matlab sessions called “Labs” |
| Workers cannot communicate with each other. | Labs can communicate with each other. |
| Define any number of tasks (different or the same) in a job. | Define one task for the job; duplicates are run on all Labs requested. |
| Each Task is queued on the PBS system. | The Job is queued on the PBS system. |
| Tasks need not run simultaneously; they are assigned to Workers as they become available. Workers can run several tasks in a job. | Tasks run simultaneously, on as many Labs as are available at runtime. The start of the job may have to wait until the requested number of Labs is available. |

Matlab Parallel Processing

Explicit Example

sched = findResource('scheduler', 'type', 'torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb')
set(sched, 'RshCommand', 'ssh');

pjob = createParallelJob(sched);
set(pjob, 'PathDependencies', {'/home/mcsci/leighb/test'});
set(pjob, 'MaximumNumberOfWorkers', 30)
set(pjob, 'MinimumNumberOfWorkers', 20)

% The single task is attached to the parallel job pjob
t = createTask(pjob, @distance, 3, {nsim} );
submit(pjob);
waitForState(pjob);
results = getAllOutputArguments(pjob);

Matlab Parallel Processing

Comments on Explicit example

I One task only: it is repeated on all Labs!

I Only one job is submitted on the PBS queue.

I Matlab defaults to requesting from PBS one node for each lab!

I If you need more than 30 Labs you must explicitly specify the resources required:

set(sched, 'ResourceTemplate', '-l nodes=30:ppn=2');
set(pjob, 'MaximumNumberOfWorkers', 60)
set(pjob, 'MinimumNumberOfWorkers', 60)
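
The resource request and the worker limits have to agree: 30 nodes at 2 processes per node gives 60 Labs. One way to keep them consistent is to derive all three settings from the same two numbers, as in the sketch below. The variable names are illustrative, and the set calls are left as comments because they need the scheduler and job objects from the HPC session.

```matlab
% Build the PBS resource request from node and core counts so that the
% template and the worker limits cannot drift apart. Names are illustrative.
nNodes = 30;             % nodes to request
ppn    = 2;              % processes (Labs) per node
nLabs  = nNodes * ppn;   % total number of Labs the request can host

template = sprintf('-l nodes=%d:ppn=%d', nNodes, ppn);
disp(template);
disp(nLabs);

% On the HPC these values would then be applied with, for example:
%   set(sched, 'ResourceTemplate', template);
%   set(pjob,  'MaximumNumberOfWorkers', nLabs);
%   set(pjob,  'MinimumNumberOfWorkers', nLabs);
```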

Explicit Lab Communication and Synchronisation

I numlabs – returns the number of Labs in the current job.

I labindex – returns the index of the Lab. Value will be different for each Lab.

I labSend – send data to the specified Lab.

I labReceive – block and read data from a specific Lab.

I labProbe – check if data is available from a specific Lab.

I labBarrier – block execution until all Labs reach this call.

I . . .
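
A minimal sketch of how these calls fit together is given below: every Lab computes a value, Labs 2 and up send theirs to Lab 1, and Lab 1 adds them up. It assumes a pool of 4 Labs started with the toolbox's local configuration rather than on the HPC; the pool size and the values exchanged are arbitrary choices for illustration.

```matlab
matlabpool open local 4     % assumes a local configuration with 4 workers

spmd
    partial = labindex^2;                      % each Lab computes its own value
    if labindex == 1
        total = partial;
        for src = 2:numlabs
            total = total + labReceive(src);   % blocks until Lab src has sent
        end
    else
        labSend(partial, 1);                   % send this Lab's value to Lab 1
    end
    labBarrier;                                % all Labs wait here before leaving
end

sumOnLab1 = total{1};       % fetch the value held by Lab 1 into the client
disp(sumOnLab1);            % 1 + 4 + 9 + 16 = 30 with 4 Labs
matlabpool close
```

Having a single receiver avoids the deadlock that can occur when every Lab waits to receive before any Lab has sent.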

Matlab Parallel Processing

Letting Matlab do the work!

sched = findResource('scheduler','type','torque');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'DataLocation', '/sandisk1/leighb')
set(sched, 'RshCommand', 'ssh');
set(sched, 'RcpCommand', 'scp');

matlabpool(sched,24);
parfor i=1:64
    result(i,:) = distance(100000);
end
matlabpool close;

Matlab Parallel Processing

Points to note

I matlabpool – start 24 Labs for this job. One job appears in the PBS queue with requested resources: 24 nodes.

I parfor – distribute the contents of the loop to the Labs in the pool. Each Lab works on one iteration of the loop! The first 24 iterations are calculated simultaneously, one on each Lab.

I Loop iterations are not done in loop order but in Parallel; results will appear out of order unless stored in an explicitly indexed array!

I One iteration of a loop cannot depend on a previous iteration.

I The Pool of Labs remains running and available for tasks until the Pool is explicitly closed.
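
The two rules above, indexed storage and independent iterations, can be seen in a small loop that needs no cluster at all. The sketch below is an illustration only: it uses a trivial calculation in place of real work, and without an open pool the parfor loop simply runs on the client.

```matlab
n = 64;
squares = zeros(n, 1);      % preallocated, indexed by the loop variable
total   = 0;                % reduction variable

parfor i = 1:n
    squares(i) = i^2;       % stored in slot i, whatever order iterations run
    total = total + i;      % reduction: the order of additions does not matter
end

disp(total);                % 2080, the sum 1 + 2 + ... + 64
disp(squares(n));           % 4096
```

A statement such as squares(i) = squares(i-1) + 1 inside the loop would be rejected, because it makes one iteration depend on another.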

Matlab Parallel Processing

Single Program Multiple Data (spmd)

I Interleaving of serial and parallel computing in the one client script

I Use Matlab spmd . . . end blocks

I Parallel computing within the spmd block, serial outside!

I Identical code runs on each Lab, with different data

I Useful for running the same program on different data sets when communication and synchronisation is required!

I The Lab data sets may be part of a large distributed data set!

Matlab Parallel Processing

SPMD Example

matlabpool(sched,24);
spmd
    R1 = rand(240);
    Z1 = zeros(240);
    Z2 = codistributed(Z1);
    Z3 = getLocalPart(Z2);
    Z4 = codistributed.rand(100000,24);
    Z5 = gather(Z2,1);
end
matlabpool close;

Matlab Parallel Processing

SPMD Example . . .

I R1 is a different (random) array on each Lab.

I Z1 is an array replicated on each Lab.

I Z2 is a codistributed array: one segment of Z1 to each Lab. Default segmentation is by the last non-singleton dimension, columns in this case.

I Z3 contains the part of Z2 local to each Lab.

I Z4 is created directly as a codistributed array; use this if the array is too large to replicate on each Lab.

I Z5 in Lab 1 contains the reconstructed Z2. Without the 1 all Labs contain the reconstructed array.

Matlab Parallel Processing

SPMD . . .

I Many Matlab functions are capable of working with codistributed arrays:

  I Elementary Array operations: +, -, *, /, \, dot variants, &c.

  I Elementary Matrix operations: find, diag, reshape, size, sort, is*, &c.

  I Matrix functions: Eigen, inverse, LU factorization, SVD, Norms, &c.

  I Elementary trig., log, hyperbolic functions, &c.

I help codistributed/functionname

I For-loops on codistributed arrays can only loop over the parts local to each Lab!
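
The last point can be sketched as follows: each Lab loops over its own columns only, and the per-Lab results are then combined. The example assumes a pool of 4 Labs from the toolbox's local configuration; the test array and the pool size are arbitrary choices for illustration.

```matlab
matlabpool open local 4     % assumes a local configuration with 4 workers

spmd
    D = codistributed(magic(8));    % 8-by-8 array, columns split over the Labs
    L = getLocalPart(D);            % the columns held by this Lab
    localSum = 0;
    for j = 1:size(L, 2)            % loop only over the local columns
        localSum = localSum + sum(L(:, j));
    end
    globalSum = gplus(localSum);    % add the per-Lab sums; result on every Lab
end

s = globalSum{1};                   % copy Lab 1's result into the client
disp(s);                            % 2080, since magic(8) holds 1 to 64
matlabpool close
```

The loop never indexes D directly, so no Lab needs data held by another Lab until the final gplus call.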
