Sun Powers the Grid SUN GRID ENGINE

(1)

Sun Powers the Grid

SUPERCOMPUTING 2001

(2)

Grid Software Stack

Platform: Solaris
Development Tools: Forte
Infrastructure Administration: iPlanet, JXTA
Systems Administration: SunMC, iChange, Solaris Management Console
Distributed Resource Management: Sun Grid Engine
Distributed Process Management: ClusterTools
Storage Management: JIRO, QFS, SAM-FS, HPC-SAN
Campus Grid: SGE Broker, SGE Enterprise Edition
Global Grid: Avaki, Cactus, Globus, PUNCH

Technical Comput

(3)

Sun Grid Engine

http://www.sun.com/gridware
● Product Overview, FAQs, Web Based Training

http://supportforum.sun.com/gridengine
● Supportforum on Installation/Compute Farms/HPC
● Application Notes, FAQs

http://gridengine.sunsource.net
● Open Source Project
● Mailing Lists
● Courtesy Binaries

(4)

Sun Grid Engine Focus:

Large, long running batch jobs

[Chart: # of users (10 to 10m) versus job duration (mSec, Secs, Mins, Hours, Days); regions: Transaction Processing, Batch Processing]

(5)

Overview

[Diagram labels: User, Jobs, Sun Grid Engine, Resource-Selection, Dispatch, Results, CS, TCF, WS]

● Resource allocation policy

● Job and resource priority

● Systems' characteristics

(6)

Design Approach: Bank Analogy

[Diagram labels: Tellers (Loans, Mortgages), Waiting area, Customers, lower, higher]

● Waiting area = Pending list
● Counters = Queues
● Customers = Jobs

(7)

Simple Installation

Master host
$CODINE_ROOT/
  \_ default/
  \_ (directories)

Exec hosts
Submit host
Admin host

(8)

High Availability Installation

HA File server
$CODINE_ROOT/
  \_ default/
  \_ (other directories)

Master host
Shadow master host
Exec host
Submit host
Admin host

(9)

Architecture

[Diagram: qmaster, schedd, commd and execd daemons; hosts run Solaris, Linux, etc.]

(10)

Host Types and Daemons

[Table: daemons per host type (Master-Host, Exec-Host, Submit-Host, Admin-Host), each marked Mandatory or Optional. Daemons listed: commd, schedd, masterd, execd, shadowd]

(11)

Queues

Queue: job container

Jobs acquire attributes of the queue, e.g.:
● CPU, memory limits
● Priority
● Suspension (scheduled, manual)
● Auxiliary scripts
● Interactive, batch, parallel

Job Slots: allow multiple jobs with same characteristics on same host

[Diagram: Host containing Queues, each with Slots]

(12)

Information Flow

[Diagram: qsub, qmaster, Schedd, Execd (x3), accounting, Job. Numbered arrows: 1) Submit, 2), 3), 4) Dispatch, 5) Load Report, 6) Con, 7) Inform when done, 8) Record]

(13)

Scheduling

[Diagram: job script, qmaster, Schedd]

Example job script:

    #
    # STARHPC
    # Version 1
    #
    #$ -l cdlic=1
    #
    cp xyz abc

job selection
● user sort
● first come, first serve

queue selection
● resource match
● sequence number
● load formula

load formula (configurable by admin), e.g.
● load_avg
● slots
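The queue selection steps listed above (resource match, then ordering by sequence number or by load) can be illustrated with a small Python model. This is a sketch under stated assumptions, not Sun Grid Engine's scheduler: the `Queue` class, its fields and `select_queue` are hypothetical names, resource matching is reduced to exact equality, and the load formula is assumed to be plain `load_avg`.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    # Hypothetical, simplified view of a queue instance.
    name: str
    seq_no: int
    load_avg: float
    free_slots: int
    resources: dict = field(default_factory=dict)


def select_queue(queues, request, sort_by="load"):
    """Return the preferred queue for a job, or None if no queue matches."""
    # Step 1: resource match. Keep queues with a free slot that satisfy
    # every requested attribute (exact equality only in this sketch).
    candidates = [
        q for q in queues
        if q.free_slots > 0
        and all(q.resources.get(k) == v for k, v in request.items())
    ]
    if not candidates:
        return None
    # Step 2: order by sequence number or by load; lower is better.
    if sort_by == "seqno":
        return min(candidates, key=lambda q: (q.seq_no, q.load_avg))
    return min(candidates, key=lambda q: (q.load_avg, q.seq_no))


queues = [
    Queue("hostA.q", 1, 1.5, 2, {"arch": "solaris64"}),
    Queue("hostB.q", 2, 0.2, 2, {"arch": "solaris64"}),
    Queue("hostC.q", 3, 0.1, 2, {"arch": "glinux"}),
]
by_load = select_queue(queues, {"arch": "solaris64"})
by_seqno = select_queue(queues, {"arch": "solaris64"}, sort_by="seqno")
```

With these made-up queues, sorting by load prefers the least loaded matching queue, while sorting by sequence number prefers the lowest-numbered one regardless of load.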

(14)

Job Execution Event Chain

[Flow diagram. Events: START, SUSPEND, MIGRATE, DELETE, EXIT, END. Steps: queue/host prolog, parallel start, job script, parallel stop, queue/host epilog, suspend method, resume method, terminate method, checkpoint command, run at specified, migration command, clean command, requeue job]

(15)

Complexes

complex: a related group of resources

● Host: num_proc, load_avg, mem_total, swap_free
● Queue: mem limit, tmp dir, slots, susp sched
● Global: floating licenses
● user-defined

Resource types (examples)
● consumable (swap_free) or fixed (num_proc)
● requestable (license, mem limit)

qsub -l job_type=synth,license=1,mem_limit=1G

(16)

(17)

Complexes: Resource Attributes

variable   comment
name       must be unique across all complexes
shortcut   unique, may be used as a replacement everywhere
type       STRING, INT, BOOL, MEMORY, DOUBLE, TIME, CSTRING, HOST
value      default value, may be overridden by actual value or load report
relop      relational operator: <= < == > >= !=
request    if YES, user can request; if FORCED, user MUST request
consum     attribute is consumable
default    default request if resource is marked consumable

(18)

Load Sensors: tracking arbitrary resources

● Host-specific & cluster-wide information. Examples:
  ● "free disk space on arbitrary partition" (host-specific)
  ● "number of licenses of a particular SW in use" (cluster-wide)
● Custom script/program queried periodically
● Information used for
  ● resource requests
  ● load thresholds
  ● load formula

[Diagram: cluster of compute hosts]

(19)

Resource Matching

Can a job run in a particular queue?

if (user-request RELOP actual-value) then YES

Example 1:

qsub -l arch=solaris64 myjob.sh
● actual arch=glinux
● RELOP for arch is ==

if (solaris64 == glinux) then schedule job.

Result: job does NOT get scheduled

Example 2:

qsub -l h_vmem=256m myjob.sh
● actual h_vmem=512m
● RELOP for h_vmem is <=

if (256m <= 512m) then schedule job.

Result: job gets scheduled
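The matching rule above, `user-request RELOP actual-value`, can be written out as a short Python sketch that reproduces both examples. The names `RELOPS`, `parse_memory` and `matches` are hypothetical, and the memory suffix multipliers (powers of 1024) are an assumption made only so that the two sides can be compared.

```python
import operator

# Relational operators as they appear in a complex definition.
RELOPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Assumed multipliers; only the relative size matters for the comparison.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_memory(text):
    """Turn a string such as '256m' into a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def matches(requested, relop, actual, parse=str):
    """Evaluate 'requested RELOP actual' after parsing both sides."""
    return RELOPS[relop](parse(requested), parse(actual))


example_1 = matches("solaris64", "==", "glinux")         # arch request
example_2 = matches("256m", "<=", "512m", parse_memory)  # h_vmem request
```

`example_1` is false, so the job is not scheduled to that queue; `example_2` is true, so it is.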

(20)

Inheritance of Resources

● global: resA=val1 (value everywhere)
● host: resB=val2 (value on all queues on this host)
● queue: resC=val3 (value for all jobs on this queue)
● user-defined: resD=val4 (value wherever attached)
● host: resA=val6 (overrides global value)
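The override behaviour above can be modelled as a layered lookup. The sketch below uses Python's `collections.ChainMap`; the host names and the helper `effective_resources` are hypothetical, and the precedence of queue over host is an assumption (the slide only states that a host value overrides the global one).

```python
from collections import ChainMap

# Values taken from the slide; host names are made up for illustration.
global_values = {"resA": "val1"}
host1_values = {"resB": "val2"}
host2_values = {"resA": "val6"}   # overrides the global resA on this host
queue_values = {"resC": "val3"}


def effective_resources(queue, host, cluster):
    """Layered view: the first mapping that defines a name wins."""
    return ChainMap(queue, host, cluster)


on_host1 = effective_resources(queue_values, host1_values, global_values)
on_host2 = effective_resources(queue_values, host2_values, global_values)
```

A queue on the first host sees the global `resA=val1`, while a queue on the second host sees `resA=val6`.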

(21)

Consumables

SPECIAL TYPE OF RESOURCE

● Can use user-defined resource, or built-in load values, or value from load sensor
  ● "free memory"
  ● "amount of space in a scratch directory"
  ● "number of licenses"
● Consumption of resources determined in two ways:
  ● If REQUESTABLE is YES or FORCED, individual jobs will "consume" the specified amount of resources (either amount requested or DEFAULT amount)
  ● If REQUESTABLE is NO, amount "consumed" must be given by built-in load value or load sensor
● Jobs requesting unavailable resources will wait until they are freed
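The bookkeeping described above for the requestable case can be sketched in a few lines of Python. The `Consumable` class and its method names are hypothetical and only illustrate the idea: a job books either the requested amount or the default, and a job that asks for more than is available has to wait.

```python
class Consumable:
    """Toy model of a consumable resource with a fixed capacity."""

    def __init__(self, name, capacity, default_request=1):
        self.name = name
        self.capacity = capacity
        self.default_request = default_request
        self.in_use = 0

    @property
    def available(self):
        return self.capacity - self.in_use

    def try_consume(self, amount=None):
        """Book the amount (or the default). False means the job must wait."""
        amount = self.default_request if amount is None else amount
        if amount > self.available:
            return False
        self.in_use += amount
        return True

    def release(self, amount):
        """Give back an amount when a job finishes."""
        self.in_use = max(0, self.in_use - amount)


licenses = Consumable("license", capacity=2)
first = licenses.try_consume()     # uses the default request of 1
second = licenses.try_consume(1)
third = licenses.try_consume(1)    # nothing left: this job stays pending
licenses.release(1)                # one job finishes
fourth = licenses.try_consume(1)   # the pending job can now start
```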

(22)

Example: Consumable

[Screenshots: initial global, queue complex values; jobs requesting resources; updated global, queue complex values]

(23)

Parallel and Checkpointing Environments

OVERVIEW

Environment: a set of queues that is used to support parallel or checkpointing jobs

[Diagram: queues q1 to q7 grouped into environments; labels: Env A, Env]

(24)

PE Configuration

(25)

Parallel Environment

codine_pe(5)    qconf -sp <pe name>

allocation rule

setting        comment
<integer>      allocate exactly this many slots per host
$pe_slots      allocate as many slots on single host as stated on command line: qsub -pe <pe name> <slot range>
$fill_up       fill up one host, move to another, continue until range filled
$round_robin   do round-robin allocation over all suitable hosts until range filled
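The difference between `$fill_up` and `$round_robin` is easiest to see on a small example. The Python functions below are a sketch of the two rules as the table describes them, not the product's implementation; the function names, the `free_slots` mapping and the host names are hypothetical, and a single slot count stands in for the slot range.

```python
def fill_up(free_slots, wanted):
    """Fill one host completely, then move on to the next."""
    allocation = {}
    remaining = wanted
    for host, free in free_slots.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take:
            allocation[host] = take
            remaining -= take
    # None means the request cannot be satisfied right now.
    return allocation if remaining == 0 else None


def round_robin(free_slots, wanted):
    """Hand out one slot per host in turn until the request is filled."""
    left = dict(free_slots)
    allocation = {}
    remaining = wanted
    while remaining > 0:
        progressed = False
        for host in left:
            if remaining == 0:
                break
            if left[host] > 0:
                left[host] -= 1
                allocation[host] = allocation.get(host, 0) + 1
                remaining -= 1
                progressed = True
        if not progressed:
            return None  # every host is exhausted
    return allocation


free = {"hostA": 4, "hostB": 4, "hostC": 4}
packed = fill_up(free, 6)
spread = round_robin(free, 6)
```

For six slots on three hosts with four free slots each, `fill_up` packs the job onto two hosts (4 + 2), while `round_robin` spreads it evenly (2 + 2 + 2).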

(26)

Checkpoint Configuration

(27)
(28)

Load Sensor: Free disk space

    #!/bin/sh

    myhost=`$CODINE_ROOT/utilbin/solaris64/gethostname -name`

    end=false
    # loop until told to quit
    while [ $end = false ]; do
        # block until a line arrives on stdin; stop on EOF or "quit"
        read input
        result=$?
        if [ $result != 0 ]; then
            end=true
            break
        fi
        if [ "$input" = "quit" ]; then
            end=true
            break
        fi

        # report free space on /var in the host:name:value format
        echo "begin"
        dfoutput=`df -k /var | tail -1`
        varfree=`echo $dfoutput | awk '{ print $4}'`
        echo "$myhost:varfree:${varfree}k"
        echo "end"
    done

● script OR binary
● host-specific
● loop until told to quit
● output information in specific format
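Since a load sensor may be a script or a binary, the same request/report loop can be written in other languages. The Python version below is a sketch that mirrors the shell script above: it reads lines until `quit` or end of input and answers each request with a `begin` / `host:varfree:<n>k` / `end` block. The names `free_kb` and `run_sensor` are hypothetical, and `shutil.disk_usage` stands in for the `df -k` call.

```python
import io
import shutil


def free_kb(path="/var"):
    """Free space on the file system holding `path`, in kilobytes."""
    return shutil.disk_usage(path).free // 1024


def run_sensor(inp, out, host, probe=free_kb):
    """Answer each input line with one load report; stop on 'quit' or EOF."""
    for line in inp:
        if line.strip() == "quit":
            break
        out.write("begin\n")
        out.write(f"{host}:varfree:{probe()}k\n")
        out.write("end\n")
        out.flush()


# Demonstration with in-memory streams and a fixed probe value.
demo_out = io.StringIO()
run_sensor(io.StringIO("\nquit\n"), demo_out, "host1", probe=lambda: 123)
```

To run it as a real sensor, one would call `run_sensor(sys.stdin, sys.stdout, socket.gethostname())` from a small wrapper script.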

(29)

Consumables: License management

Problem:

For a particular application, there are four floating licenses: two for long jobs, two for short jobs

Requirement:

● Application should only run on two particular hosts out of the whole cluster
● jobs should only run if a license is available.
● at any one time there can only be four instances of the job running: two short jobs and two long.

(30)

Consumables: License management

global: long=2, short=2

Host1
  queueA: long=0 (short jobs run here)
  queueB: short=0 (long jobs run here)

Host2
  queueA: long=0 (short jobs run here)
  queueB: short=0 (long jobs run here)

All other hosts: long=0, short=0 (neither run here)

● create global consumables
● set them to zero explicitly wherever they're not desired
● jobs naturally go to whatever queues remain
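The layout above can be checked with a small Python model that looks up each consumable at queue level first, then host level, then globally, and lists the queues where a job type is still allowed. This is a sketch: the dictionaries, the function names, the lookup order and `Host3` (a stand-in for "all other hosts") are assumptions made for illustration.

```python
GLOBAL = {"long": 2, "short": 2}

# Host-level overrides; Host3 stands in for every other host in the cluster.
HOST_OVERRIDES = {
    "Host1": {},
    "Host2": {},
    "Host3": {"long": 0, "short": 0},
}

# Queue-level overrides, keyed by (host, queue).
QUEUE_OVERRIDES = {
    ("Host1", "queueA"): {"long": 0},
    ("Host1", "queueB"): {"short": 0},
    ("Host2", "queueA"): {"long": 0},
    ("Host2", "queueB"): {"short": 0},
    ("Host3", "queueA"): {},
}


def capacity(host, queue, resource):
    """Most specific definition wins: queue, then host, then global."""
    for level in (QUEUE_OVERRIDES[(host, queue)], HOST_OVERRIDES[host], GLOBAL):
        if resource in level:
            return level[resource]
    return 0


def eligible_queues(resource):
    """Queues where at least one unit of the consumable can be booked."""
    return sorted(hq for hq in QUEUE_OVERRIDES if capacity(*hq, resource) > 0)


long_queues = eligible_queues("long")
short_queues = eligible_queues("short")
```

Long jobs end up with only `queueB` on the two licensed hosts and short jobs with only `queueA`, while the global counts of two each still cap the number of running jobs.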

(31)

Queues: Resource sharing

Problem: set base configuration to match needs

Hardware
● 24 x Sun Fire 280R, 2 CPU, 1GB RAM

Requirements
● interactive + fast jobs, no more than 16 at a time, runtime <=1hr, mem<=256M, up to 2 jobs per CPU
● long jobs, no time limit, mem<=512M, only one job per CPU
● suspend long jobs, if needed, to run short jobs, but try to avoid suspending whenever possible

(32)

Queues: Resource sharing

Complexes: add a new queue complex resource

name      type     value   relop   req   cons.   default
jobtype   string   NONE    ==      YES   NO      long

Queues

on both the two hosts, set up two queues: long and short
● batch only queue; 2 slots, set h_vmem to 512M, set jobtype=long
● batch + interactive queue; 4 slots, make other queue subordinate, set h_rt to 600 seconds, h_vmem to 256M, set jobtype=short
● on all machines, number the queues in opposite ascending order, e.g.

queue        hostA   hostB   hostC   hostD
batch only   1       2       3       4
batch+int    4       3       2       1

Cluster
● set user-sort
● set queue_sort_method to seqno
● disable batch+int queues on all but 4 systems (enable on others, e.g., if a system goes down)

to run short job: qsub -l jobtype=short shortjob.sh
to run interactive job: qrsh /usr/local/bin/myjob
