
Using Parallel Computing to Run Multiple Jobs


Academic year: 2021


(1)

Beowulf Training

Using Parallel Computing to Run Multiple Jobs

Jeff Linderoth

(2)

Outline

Introduction to Scheduling Software

¦ The Wonderful World of PBS

¦ The Equally Wonderful World of Condor

Lab Time.

(3)

Resource Scheduling

So people don't fight over the resources! Schedulers...

¦ Locate appropriate resources,

¦ Manage resources, so multiple processes don't conflict over the same processor,

¦ Ensure a fairness policy,

¦ Are integrated with accounting software.

(4)

Mmmmmmmmmmmmmm. Pie

Our first computational task will be to estimate π by numerical integration. Everyone knows...

$$\int_0^1 \frac{1}{1+x^2}\,dx = \arctan(x)\Big|_{x=0}^{1} = \arctan(1) = \frac{\pi}{4}.$$

(5)

The Rectangle Rule

[Figure: the rectangle rule applied to 4/(1+x*x) on 0 ≤ x ≤ 1]
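The rectangle rule splits [0, 1] into n rectangles of width h = 1/n, so that π = 4∫₀¹ 1/(1+x²) dx ≈ h Σ 4/(1+x_i²). As a sketch only (pi1.c itself is not shown in these slides, so its exact rule may differ), the hypothetical shell function pi_rect below takes each rectangle's height at its midpoint, x_i = (i + 1/2)h, and uses awk for the floating-point arithmetic:

```shell
#!/bin/bash
# pi_rect N: estimate pi with N rectangles on [0,1] (midpoint variant).
# Hypothetical helper; not the code in pi1.c.
pi_rect() {
    awk -v n="$1" 'BEGIN {
        h = 1.0 / n                    # width of each rectangle
        sum = 0.0
        for (i = 0; i < n; i++) {
            x = (i + 0.5) * h          # midpoint of rectangle i
            sum += 4.0 / (1.0 + x * x) # height of the curve 4/(1+x*x)
        }
        printf "%.10f\n", sum * h      # total area approximates pi
    }'
}

pi_rect 1000    # prints an estimate close to 3.14159
```

Taking midpoints rather than left endpoints is a design choice: for a smooth integrand the midpoint error shrinks like 1/n² rather than 1/n.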

(6)

A Program to Estimate π

I've written a π-calculator for you.

cd
mkdir compute-pi
cd compute-pi
cp /tmp/Training/Session2/pi1.c .
gcc pi1.c -lm -o pi1
./pi1 1000

This is not a parallel program, just a simple (single-process) program.

Nevertheless, we must submit it through a scheduling system.

(7)

Running with PBS

A simple four-step process:

(1) Create a PBS submission script.

(2) Submit the script to the PBS system using the command qsub.

(3) PBS runs the script on the first available resources.

(4) PBS collects the output for the user's inspection.

(8)

The PBS Submission Script—Overview

(1) You make a request for resources,

(2) PBS will allocate a “node pool” to fulfill your request.

(3) Now you have to tell the node pool what to do!

Both steps (1) and (3) are accomplished through the PBS submission script.

The script contains

¦ PBS request statements

¦ Shell commands that will run your job on the allocated resources.

¦ The shell commands are executed on the first node in your node pool.

(9)

Our First PBS Submission Script

#PBS -q small
#PBS -l nodes=1:public
#PBS -l cput=00:05:00
#PBS -V

echo "The PBS job ID is: ${PBS_JOBID}"
echo "The PBS Node File is"
cat $PBS_NODEFILE

(10)

Format of the PBS Submission Script

Lines that begin with #PBS are PBS directives. Everything else is a shell command.

¦ Shell commands are just things that you would type at the regular login prompt.

¦ But you can also do fancy looping and conditions.

– http://www.gnu.org/manual/bash/html_chapter/bashref_toc.html

After the PBS directives, you put any commands you would like.

– The command to run your program is usually a good one to include. :-)
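Because bash treats every #PBS line as a comment, a submission script can use ordinary loops and conditionals after the directives. The sketch below walks the node file that PBS names in $PBS_NODEFILE; the node-counting logic and the fallback for running outside PBS are illustrative assumptions, not part of the original script:

```shell
#!/bin/bash
#PBS -q small
#PBS -l nodes=1:public
#PBS -l cput=00:05:00
#PBS -V

# Everything below the directives is ordinary bash (to bash, the #PBS
# lines above are just comments), so loops and conditionals work here.

# Conditional: fall back to an empty node list when not running under PBS.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    nodefile=$PBS_NODEFILE
else
    echo "PBS_NODEFILE not set; not running under PBS?"
    nodefile=/dev/null
fi

# Loop: the node file has one line per allocated processor slot.
count=0
while read -r node; do
    echo "Allocated: $node"
    count=$((count + 1))
done < "$nodefile"
echo "Node file entries: $count"
```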

(11)

Breaking It Down. PBS Directives

-q – Specifies the queue in which to place the job. We have two queues, small and large:

¦ small—Max CPU time 20 minutes/process.

¦ large—Lower priority than jobs in small queue

-l – Defines the resources that are required by the job and establishes a limit to the amount of resource that can be consumed.

-V – Declares that all environment variables in the qsub command's environment are to be exported to the batch job.

¦ If you would like the PBS job to inherit the same environment as the one you are currently running in (same PATH variable, etc.), you should include this directive.

(12)

The -l Story

For resources, you will typically only need to declare

¦ the number of nodes,

¦ which class of nodes you request

#PBS -l nodes=4:public

¦ the maximum cpu time

#PBS -l cput=00:15:00

For the truly brave and curious the command is

(13)

PBS—The Big Three

qsub

¦ Submit a PBS job

qstat

¦ Check the status of a PBS job

qdel

¦ Delete a PBS job

(14)

Let's do it!

[jtl3@fire1 compute-pi-1]$ qsub run.pbs
5972.fire1
[jtl3@fire1 compute-pi-1]$ qstat -a

fire1:
                                                  Req'd  Req'd   Elap
Job ID      Username Queue Jobname SessID NDS TSK Memory Time  S Time
----------- -------- ----- ------- ------ --- --- ------ ----- - -----
5972.fire1  jtl3     small run.pbs  27018   1  --     -- 00:20 E    --

¦ Note that the job ID is printed for you when you submit the job.

¦ qstat -a: Shows the status of all jobs.
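Instead of rerunning qstat by hand, a script can poll it. The sketch below defines a hypothetical helper, wait_for_job, and assumes that qstat JOBID exits with a nonzero status once the server no longer knows the job; sites that keep finished jobs visible in qstat for a while would need a different test:

```shell
#!/bin/bash
# wait_for_job JOBID [SECONDS]: block until qstat no longer reports JOBID.
# Assumption: qstat exits nonzero once the server has forgotten the job.
wait_for_job() {
    local job=$1 interval=${2:-30}
    while qstat "$job" > /dev/null 2>&1; do
        sleep "$interval"    # check again after a pause
    done
    echo "Job $job has left the queue"
}

# Usage (not run here):
#   id=$(qsub run.pbs)
#   wait_for_job "$id" 60
```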

(15)

Looking at the Output

By default “standard output” goes to <scriptname>.o<job number>

By default “standard error” goes to <scriptname>.e<job number>

[jtl3@fire1 compute-pi-1]$ cat run.pbs.o5972
The PBS job ID is: 5972.fire1
The PBS Node File is
fire34
pi is about 3.1614997369512658487167300
Error is 1.9907083361472733e-02

Note how the PBS environment variables are interpreted in the
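The default output names can be computed from the script name and the job ID that qsub prints. A minimal sketch, assuming the job was not renamed with #PBS -N and that job IDs look like 5972.fire1; pbs_output_files is a hypothetical name:

```shell
#!/bin/bash
# pbs_output_files SCRIPT JOBID: print the default stdout and stderr file
# names, <scriptname>.o<job number> and <scriptname>.e<job number>.
pbs_output_files() {
    local script=${1##*/}    # the job name defaults to the script's base name
    local number=${2%%.*}    # strip the server suffix, e.g. ".fire1"
    echo "$script.o$number"
    echo "$script.e$number"
}

pbs_output_files run.pbs 5972.fire1
# run.pbs.o5972
# run.pbs.e5972
```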

(16)

Other Cool PBS Stuff You May Want To Do

#PBS -N <Name> : Name your job

#PBS -o <File.out> : Redirect standard output to File.out

#PBS -e <File.err> : Redirect standard error to File.err

#PBS -m abe : Send mail when the job aborts (a), begins (b), or ends (e)

#PBS -M <address> : Address to send the mail to

Job dependencies

For a list of all PBS command file options...

¦ man qsub
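Job dependencies are declared with qsub's -W depend=... option. A minimal sketch, assuming a PBS/Torque qsub that accepts the afterok dependency type (the second job starts only if the first finishes successfully); submit_chain and the stage script names are hypothetical:

```shell
#!/bin/bash
# submit_chain FIRST SECOND: submit FIRST, then submit SECOND so that it
# runs only after FIRST completes without error.
submit_chain() {
    local first_id
    first_id=$(qsub "$1") || return 1    # qsub prints the new job ID
    echo "Submitted $1 as $first_id"
    qsub -W depend=afterok:"$first_id" "$2"
}

# Usage (not run here):
#   submit_chain stage1.pbs stage2.pbs
```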

(17)

Condor

For purposes of this discussion, think of Condor as a “different” scheduler.

¦ Condor is a bit fancier.

– Often used for nondedicated resources (it runs jobs only when no one else would be using the machine).

– Checkpointing/Migration

– Remote I/O

Jobs submitted to the Condor scheduler will likely incur a lower accounting charge.

http://www.cs.wisc.edu/condor

(18)

Checkpointing/Migration

[Figure: timeline of a job checkpointed to a Checkpoint Server and migrated between the Professor's Machine and the Grad Student's Machine. Labels: Grad Student Leaves, Professor Arrives, Grad Student Arrives; times 5am, 8am, 8:10am, 12pm; 5 min.]

(19)

Condor Universes

Condor jobs are submitted to a specific Condor Universe.

Standard – Has cool features like checkpointing and migration of jobs

¦ Requires special linking of your program

Vanilla – No cool Condor features (regular)

MPI/PVM

(20)

Compiling for Condor

Standard Universe

¦ Put the command condor_compile in front of your normal link line.

¦ [jtl3@fire1 condor]$ condor_compile gcc pi1.c -o pi1-standard -lm

Vanilla Universe

¦ Do nothing

Now Condor submission is like PBS submission

¦ Different command (job description) file

(21)

A Sample Condor Submission File

universe = standard
executable = pi1-standard
arguments = 1000000000
output = pi1.out
error = pi1.err
notification = Complete
notify_user = [email protected]
getenv = True
rank = kflops
queue

(22)

The Big Four

condor_submit <job.condor>

¦ Submit a job to the Condor scheduler

condor_q

¦ Check the status of the queue of Condor jobs

condor_status

¦ Check the status of the Condor pool

condor_rm <jobid>

¦ Remove a job from the Condor queue

(23)

Let's Do It!

[jtl3@fire1 condor]$ condor_submit run.condor
Submitting job(s).
1 job(s) submitted to cluster 16.
[jtl3@fire1 condor]$ condor_q

-- Submitter: fire1.cluster : <192.168.0.1:32777> : fire1.cluster

ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

16.0 jtl3 8/4 11:22 0+00:00:16 R 0 3.4 pi1-standard 1000000000

[jtl3@fire1 condor]$ cat pi1.out

pi is about 3.1415926555921398488635532
Error is 2.0023467328655897e-09

I could remove the job with condor_rm 16.0. Any Condor questions?
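To remove jobs from a script rather than by hand, you need the cluster number that condor_submit reports. A minimal sketch; cluster_of is a hypothetical helper that relies on the "submitted to cluster N." message format shown in the session above:

```shell
#!/bin/bash
# cluster_of: read condor_submit output on stdin and print the cluster
# number, e.g. "1 job(s) submitted to cluster 16."  ->  "16".
cluster_of() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Typical use (not run here): remember the cluster, then remove its jobs later.
#   cluster=$(condor_submit run.condor | cluster_of)
#   condor_rm "$cluster"
echo "1 job(s) submitted to cluster 16." | cluster_of    # prints 16
```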

(24)

Quit Wasting My Time!

OK, Linderoth, I thought today was supposed to be about parallel computing!

¦ That will be the focus of the next section(s).

¦ For now, let's do some simple “parallel computing”.

Suppose I'd like to run the same executable, pi1, but with many different input files or parameters.

(25)

Running Many Jobs

We need a way to easily submit many different jobs. We will use the shell's scripting capabilities.

¦ PBS

– Use a template command file and the sed utility

¦ Condor

– Use the -a option of condor_submit

(26)

PBS—Run Multiple Jobs. Step #1

Create a template submission file.

#!/bin/bash
#PBS -q small
#PBS -l nodes=1:public
#PBS -l walltime=00:05:00
#PBS -V

echo "The PBS job ID is: ${PBS_JOBID}"
echo "The PBS Node File is"
cat $PBS_NODEFILE

(27)

PBS—Run Multiple Jobs. Step #2

Create a shell script to do the multiple submission

#!/bin/bash

for n in 100 1000 10000 100000 1000000
do
    sed s/XXX_N_XXX/$n/g run.pbs.template > run.pbs.tmp
    qsub run.pbs.tmp
    rm run.pbs.tmp
done

The sed command replaces all occurrences of the pattern XXX_N_XXX in the template with the current value of $n.
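The template shown is cut off before the line that runs the program, so the dry-run sketch below assumes it ends with ./pi1 XXX_N_XXX, preceded by a cd to $PBS_O_WORKDIR (the directory qsub was run from). It writes one script per value of n into a temporary directory instead of calling qsub, so you can check the substitution before anything is submitted:

```shell
#!/bin/bash
# Dry run: build one PBS script per value of n from a template.
# Assumption: the template's last line runs "./pi1 XXX_N_XXX".
workdir=$(mktemp -d)
cat > "$workdir/run.pbs.template" <<'EOF'
#!/bin/bash
#PBS -q small
#PBS -l nodes=1:public
#PBS -l walltime=00:05:00
#PBS -V
cd "$PBS_O_WORKDIR"
./pi1 XXX_N_XXX
EOF

for n in 100 1000 10000; do
    # Replace every occurrence of the placeholder with the current n
    sed "s/XXX_N_XXX/$n/g" "$workdir/run.pbs.template" > "$workdir/run.pbs.$n"
    # In a real session you would now run: qsub "$workdir/run.pbs.$n"
    echo "Generated $workdir/run.pbs.$n"
done
```

Quoting the here-document delimiter ('EOF') keeps $PBS_O_WORKDIR literal in the template, so it is expanded on the compute node when the job runs rather than when the template is written.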

(28)

PBS—Run Multiple Jobs.

[jtl3@fire1 pbs]$ sh run-many.sh
5989.fire1
5990.fire1
5991.fire1
5992.fire1
5993.fire1

Run the script you created with sh.

(29)

Condor—Run Multiple Jobs Example

condor_submit allows the user to add or override statements in the submission file from the command line.

¦ Use the -a flag

(30)

Condor —Run Multiple Jobs. Step #1

Create the Condor submission file.

Note: no arguments or output lines!

executable = pi1-standard
universe = standard
notification = Complete
notify_user = [email protected]
getenv = True
rank = kflops
queue

(31)

The Condor Multiple Job Submission Script

Create the Condor multiple-job submission script. Note the use of the -a option!

#!/bin/bash

for n in 100 1000 10000 100000 1000000
do
    condor_submit -a "arguments = $n" -a "output = pi.$n.out" \
        run.condor.many
done
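The loop can also be wrapped in a reusable function that takes the submit file and the list of values as arguments. A minimal sketch; the function name submit_sweep and the extra per-job error file are illustrative additions, not part of the original script:

```shell
#!/bin/bash
# submit_sweep SUBMIT_FILE N...: submit one Condor job per value of N,
# adding "arguments", "output" and "error" lines with condor_submit -a.
submit_sweep() {
    local submit_file=$1
    shift
    local n
    for n in "$@"; do
        condor_submit -a "arguments = $n" \
                      -a "output = pi.$n.out" \
                      -a "error = pi.$n.err" \
                      "$submit_file" || return 1    # stop on first failure
    done
}

# Usage (not run here):
#   submit_sweep run.condor.many 100 1000 10000 100000 1000000
```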

(32)

Multiple Condor Submission Example

[jtl3@fire1 condor]$ sh run-many.sh
Submitting job(s).
1 job(s) submitted to cluster 32.
Submitting job(s).
1 job(s) submitted to cluster 33.
Submitting job(s).
1 job(s) submitted to cluster 34.
Submitting job(s).
1 job(s) submitted to cluster 35.
Submitting job(s).
1 job(s) submitted to cluster 36.
[jtl3@fire1 condor]$ condor_q

-- Submitter: fire1.cluster : <192.168.0.1:32777> : fire1.cluster

ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD

33.0 jtl3 8/4 12:16 0+00:00:01 R 0 3.4 pi1-standard 1000

34.0 jtl3 8/4 12:16 0+00:00:00 R 0 3.4 pi1-standard 10000

35.0 jtl3 8/4 12:16 0+00:00:00 I 0 3.4 pi1-standard 10000

36.0 jtl3 8/4 12:16 0+00:00:00 I 0 3.4 pi1-standard 10000

(33)

The End!

Schedulers are required for use in a parallel computing environment.

PBS and Condor are cool.

You can do “parallel computing” even without MPI.

¦ The Beowulf cluster can be a CPU cycle server for your
