• No results found

The glite Workload Management System

N/A
N/A
Protected

Academic year: 2021

Share "The glite Workload Management System"

Copied!
48
0
0

Loading.... (view fulltext now)

Full text

(1)

FESR

The gLite Workload

Management System

Annamaria Muoio

INFN Catania Italy

[email protected]

Tutorial per utenti e sviluppo di

applicazioni in Grid

(2)

This presentation will cover the following arguments:

• Overview of the gLite WMS Architecture

• Job Description Language Overview

- Principal Attributes

(3)
(4)

Workload

Workload

Management System

Management System

(WMS) comprises a set of

Grid middleware components responsible for distribution

and management of tasks across Grid resources.

Purpose of

Workload Manager

(WM) is accept and

satisfy requests for job management coming from its

clients meaning of the submission request is

to pass the responsibility of the job to the WM.

WM will pass the job to an appropriate CE for

execution

taking into account requirements and the

preferences expressed in the job description.

The decision of which resource should be used is the

outcome of a

matchmaking

matchmaking

process.

(5)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

CE characts

& status SE characts& status

LFC

Information

System

Logging &

Logging &

Book

Book

-

-

keeping

keeping

Network

Server

(6)

Logging &

Logging &

UI

Job Contr.

-CondorG

Computing

Element

Storage

Element

CE characts

& status SE characts& status

UI: allows users to

access the functionalities

of the WMS

(via command line, GUI,

C++ and Java APIs)

WMS: Workload Management

System

LFC

Information

System

Network

Server

(7)

Network

Server

Logging &

Logging &

Book

Book

-

-

keeping

keeping

Information

System

LFC

UI

Job Contr.

-CondorG

Computing

Element

Storage

Element

CE characts

& status SE characts& status

myjob.jdl

JobType = “Normal”;

Executable = "$(CMS)/exe/sum.exe";

InputSandbox = {"/home/user/WP1testC","/home/file*”};

OutputSandbox = {“sim.err”, “test.out”, “sim.log"};

Requirements = other. GlueHostOperatingSystemName ==

“linux" && other.GlueCEPolicyMaxCPUTime > 10000;

Rank = other.GlueCEStateFreeCPUs;

submitted

Job Description Language

(JDL) to specify job

characteristics and

requirements

Jo

b

St

at

us

(8)

UI

Job Contr.

-CondorG

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

Input Sandbox files Job

submitted

incoming requests

LFC

Information

System

Logging &

Logging &

Jo

b

St

at

us

Network

Server

(9)

Logging &

Logging &

Book

Book

-

-

keeping

keeping

UI

Job Contr.

-CondorG

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

waiting

submitted

WM: responsible to take

the appropriate actions to

satisfy the request

Job

Where must this job be executed ?

Match-Maker/

Broker

LFC

Information

System

Jo

b

St

at

us

Network

Server

(10)

Network

Server

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

waiting

submitted

Match-Maker/

Broker

What is the status of the Grid ?

Matchmaker: responsible

to find the “best” CE

where to submit a job

LFC

Information

System

Logging &

Logging &

Jo

b

St

at

us

(11)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

waiting

submitted

Match-Maker/

Broker

CE choice

LFC

Information

System

Logging &

Logging &

Book

Book

-

-

keeping

keeping

Jo

b

St

at

us

Network

Server

(12)

Jo

b

St

at

us

Logging &

Logging &

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

waiting

submitted

Job

Adapter

JA: responsible for the final “touches”

to the job before it’s passed to Condor

(e.g. creation of wrapper script, etc.)

LFC

Information

System

Network

Server

(13)

Jo

b

St

at

us

Logging &

Logging &

Book

Book

-

-

keeping

keeping

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

JC: responsible for the

actual job management

operations (done via

CondorG)

Job

submitted

waiting

ready

LFC

Information

System

Network

Server

(14)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

CE characts

& status SE characts& status

WMS

storage

Job Input Sandbox files

submitted

waiting

ready

scheduled

LFC

Information

System

Logging &

Logging &

Jo

b

St

at

us

Network

Server

(15)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

WMS

storage

Input Sandbox

submitted

waiting

ready

scheduled

running

“Grid enabled” data transfers/ accesses

LFC

Information

System

Logging &

Logging &

Book

Book

-

-

keeping

keeping

Jo

b

St

at

us

Network

Server

(16)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

WMS

storage

Output Sandbox files

submitted

waiting

ready

scheduled

running

done

LFC

Information

System

Logging &

Logging &

Jo

b

St

at

us

Network

Server

(17)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

WMS

storage

Output Sandbox

submitted

waiting

ready

scheduled

running

done

edg-job-get-output <dg-job-id>

LFC

Information

System

Logging &

Logging &

Book

Book

-

-

keeping

keeping

Jo

b

St

at

us

Network

Server

(18)

UI

Job Contr.

-Condor

Computing

Element

Storage

Element

WMS

storage

Output Sandbox files

submitted

waiting

ready

scheduled

running

done

cleared

LFC

Information

System

Logging &

Logging &

Jo

b

St

at

us

Network

Server

(19)

Flag

Meaning

SUBMITTED

submission logged in the LB

WAIT

job match making for resources

READY

job being sent to executing CE

SCHEDULED

job scheduled in the CE queue manager

RUNNING

job executing on a WN of the selected CE queue

DONE

job terminated without grid errors

CLEARED

job output retrieved

(20)

LFC

LFC

Catalogue

Catalogue

Logging &

Logging &

Resource Broker

Resource Broker

(

(

WorkLoad

WorkLoad

Mgr.)

Mgr.)

Storage

Storage

Element

Element

Computing

Computing

Information

Information

Service

Service

Job Status

DataSets info

Author.

&Authen.

Job

Sub

mit Event

Job

Query

Jo

b

St

at

us

Input “sandbox”

In

p

u

t “

sa

n

d

b

ox

+

B

ro

k

er

In

fo

O

u

tp

u

t “

sa

n

d

b

ox

Output “sandbox”

Publish

SE & C

E info

User

User

interface

interface

(21)

--vo

<vo name> : perform submission with a different

VO than the UI default one.

--output, -o

<output file> save jobId on a file.

--resource, -r

<resource value> specify the resource

for execution.

--nomsgi

neither message nor errors on the stdout will

be displayed.

Job Submission

(22)

typical output that you can get:

edg-job-submit test.jdl

====================edg-job-submit Success =====================

The job has been successfully submitted to the Network Server.

Use edg-job-status command to check job current status.

Your job identifier (edg_jobId) is:

-

https://lxshare0234.cern.ch:9000/rIBubkFFKhnSQ6CjiLUY8Q

==============================================================

In case of failure, an error message will be displayed

instead, and an exit status different form zero will be

retured.

(23)

**** Error: API_NATIVE_ERROR ****

Error while calling the "NSClient::multi" native api

AuthenticationException: Failed to establish security context...

**** Error: UI_NO_NS_CONTACT ****

Unable to contact any Network Server

it means that there are authentication problems

between the UI and the

Network Server

(check your

proxy or contact the site administrator).

(24)

specified by a given JDL file using the command

edg-job-list-match test.jdl

Connecting to host lxshare0380.cern.ch, port 7772

Selected Virtual Organisation name (from UI conf file): dteam

*********************************************************************

COMPUTING ELEMENT IDs LIST

The following CE(s) matching your job requirements have been found:

adc0015.cern.ch:2119/jobmanager-lcgpbs-infinite

adc0015.cern.ch:2119/jobmanager-lcgpbs-long

adc0015.cern.ch:2119/jobmanager-lcgpbs-short

(25)

using the glite-job-status command.

edg-job-status

https://lxshare0234.cern.ch:9000/X-ehTxfdlXxSoIdVLS0L0w

*************************************************************

BOOKKEEPING INFORMATION:

Printing status info for the Job:

https://lxshare0234.cern.ch:9000/X-ehTxfdlXxSoIdVLS0L0w

Current Status: Scheduled

Status Reason: Job successfully submitted to Globus

Destination: lxshare0277.cern.ch:2119/jobmanager-pbs-infinite

reached on: Fri Aug 1 12:21:35 2003

(26)

output can be copied to the UI

edg-job-get-output

https://lxshare0234.cern.ch:9000/snPegp1YMJcnS22yF5pFlg

Retrieving files from host lxshare0234.cern.ch

*****************************************************************

JOB GET OUTPUT OUTCOME

Output sandbox files for the job:

- https://lxshare0234.cern.ch:9000/snPegp1YMJcnS22yF5pFlg

have been successfully retrieved and stored in the directory:

/tmp/jobOutput/larocca_snPegp1YMJcnS22yF5pFlg

*****************************************************************

By default, the output is stored under /tmp/jobOutput, but

it is possible to specify in which directory to save the

(27)

command edg-job-cancel.

edg-job-cancel

https://lxshare0234.cern.ch:9000/dAE162is6EStca0VqhVkog

Are you sure you want to remove specified job(s)? [y/n]n :y

=================== edg-job-cancel Success====================

The cancellation request has been successfully submitted for the following

job(s)

- https://lxshare0234.cern.ch:9000/dAE162is6EStca0VqhVkog

(28)

describe jobs for execution on Grid.

The JDL adopted within the gLite middleware is

based upon Condor

CLASSified Advertisement

CLASSified Advertisement

language (ClassAd)

language (ClassAd)

.

A ClassAd is a record-like structure composed of a finite

number of attributes separated by a semi-colon (;)

A ClassAd is highly flexible and can be used to represent

arbitrary services

The JDL is used in gLite to specify the job’s

characteristics and constrains, which are used

during the

match

match

-

-

making process

making process

to select the best

resources that satisfy job’s requirements.

(29)

Attribute = value;

Attribute = value;

Comments must be preceded by a sharp character

(

#

#

)

or have to follow the C++ syntax

WARNING: The JDL is sensitive to blank

characters and tabs. No blank characters

or tabs should follow the

semicolon at the end of a line.

(30)

are optional.

An “essential” JDL is the following:

Executable = “test.sh”;

StdOutput = “std.out”;

StdError = “std.err”;

InputSandbox = {“test.sh”,”input.dat”};

OutputSandbox = {“std.out”,”std.err”};

If needed, arguments to the executable can be

passed:

(31)

must be escaped with a backslash

e.g. Arguments = “\”Hello World!\“ 10”;

Special characters such as

&

,

|

,

>

,

<

are only allowed if

specified inside a quoted string or preceded by triple

\

(e.g. Arguments = "-f file1\\\&file2";)

The

backtick

character

`

cannot be specified in the

JDL.

(32)
(33)

Normal

(simple, sequential job),

Interactive

,

MPICH

,

Checkpointable

,

Partitionable,

Parametric

Or combination of them

ƒ

Checkpointable, Interactive

ƒ

Checkpointable, MPI

E.g.

JobType =

JobType =

Interactive

Interactive

;

;

JobType

JobType

= {

= {

Interactive

Interactive

,

,

Checkpointable

Checkpointable

};

};

(34)

This is a string representing the

executable/command name.

The user can specify an executable which is

already on the remote CE

Executable = {

Executable = {

/opt/EGEODE/GCT/egeode.sh

/opt/EGEODE/GCT/egeode.sh

};

};

The user can provide a local executable name

which will be staged from the UI to the WN

Executable = {

Executable = {

egeode.sh

egeode.sh

};

};

InputSandbox = {

InputSandbox = {

/home/larocca/egeode/

/home/larocca/egeode/

egeode.sh

(35)

This is a string containing all the job command line

arguments.

E.g.:

If your executable sum has to be started as:

$ sum N1 N2 –out result.out

Executable =

Executable =

sum

sum

;

;

Arguments =

(36)

List of environment settings needed by the job to

run properly

E.g.

Environment = {“

Environment = {

JAVABIN=/usr/local/java

JAVABIN=/usr/local/java

};

};

InputSandbox

InputSandbox

(optional)

List of files on the UI local disk needed by the job

for running

The listed files will automatically staged to the

remote resource

(37)

List of files, generated by the job, which have to

be retrieved

E.g.

OutputSandbox =

OutputSandbox =

{

{

“std.out

std.out

”,

,”

std.err”

std.err

,

,

image.png”

image.png

};

(38)

Job requirements on computing resources

Specified using attributes of resources published

in the Information Service

If not specified, default value defined in UI

config\uration file is considered

Default.

Requirements =

Requirements =

other.GlueCEStateStatus ==

other.GlueCEStateStatus ==

"Production

"Production

;

;

Requirements = other.GlueCEInfoLRMSType == “PBS” &&

other.GlueCEInfoTotalCPUs > 2 &&

Member (“ALICE-2.1.7”,

(39)

Floating-point expression used to rank CEs that

have already fulfill the

Requirements

expression.

The Rank expression can contain attributes that

describe the CE in the Information System (IS).

The evaluation of the rank expression is

performed by the Resource Broker (RB) during

the match-making phase.

A higher numeric value equals a better rank.

E.g.:

Rank =

Rank =

other.GlueCEStateFreeCPUs

other.GlueCEStateFreeCPUs

;

;

(40)

This is a string or a list of strings representing the

Logical File Name (LFN)

or

Grid Unique Identifier

(GUID)

needed by the job as input.

The list is used by the RB to find the CE from

which the specified files can be better accessed

and schedules the job to run there.

InputData

InputData

= {

= {

lfn:cmstestfile

lfn:cmstestfile

,

,

guid:135b7b23

guid:135b7b23

-

-

4a6a

4a6a

-

-

11d7

11d7

-

-

87e7

87e7

-

-

9d101f8c8b70

9d101f8c8b70

};

(41)

specified)

The protocol or the list of

protocols

which the

application is able to “speak” with for accessing

files listed in

InputData

on a given SE.

Supported protocols in gLite are currently

gsiftp

gsiftp

, and

file

file

.

DataAccessProtocol = {

(42)

This string representing the URI of the

Storage

Element (SE)

where the user wants to store the

output data.

This attribute is used by the Resource Broker to

find the bestCE “

close

” to this SE and schedule

the job there.

OutputSE =

(43)

This attribute allows the user to ask for the

automatic upload and registration of datasets

produced by the job on the

Worker Node (WN)

.

This attribute contains the following three

attributes:

OutputFile

StorageElement

LogicalFileName

(44)

NodeNumber attribute is an integer specifying

the number of nodes needed for a MPI job.

The RB uses this attribute during the

matchmaking for selecting those CE having a

number of CPUs equals or greater the one

specified in NodeNumber.

NodeNumber

(45)

partitionable jobs)

JobSteps attribute can be either an integer

representing the number of steps for a

checkpointable or partitionable job e.g.:

JobSteps

JobSteps

= 100000;

= 100000;

or a list of strings representing labels associated

to the steps of a checkpointable or partitionable

job e.g.:

JobSteps

(46)

partitionable jobs)

CurrentStep attribute used to indicate the initial

step when submitting a checkpointable or

partitionable job.

CurrentStep

(47)

JDL (sottomissione via WMS Netrwork Server)

https://edms.cern.ch/file/555796/1/EGEE-JRA1-TEC-555796-JDL-Attributes-v0-7.doc

https://grid.ct.infn.it/twiki/bin/view/GILDA/SimpleJobS

ubmissionWithRB

https://grid.ct.infn.it/twiki/bin/view/GILDA/MoreOnJDL

-withedgcommands

Remember to initialize the proxy

before to interact with the WMS!

(48)

your

attention

References

Related documents

pharmaceuticals that do not meet the criteria of RCRA and related items. The regulated medical waste vendor that currently incinerates pathology waste and trace chemotherapy waste

How likely is your pharmacy to take steps to implement e‐prescribing if you received 

At least 30 days prior to commencement of construction activities for each contract or phase, the Permittee shall submit the contractor's plan which details the

Among the 33 - item preliminary scale, 18 items were derived from the above three instruments, and 15 items (patient monitoring, other nurse - specific technical skills,

Additionally, Strong/Weak networks exhibit higher values of the Synchrony Measure and lower values of the Variability Measure in the regime of high E-I synaptic weight and low

Data on chemical composition of various cassava fractions, including leaf meal concentrate, leaves (dried, fresh, ensiled), peels, and roots used in livestock feeding, is

the ARB suggested that an innocent violation of an SEC rule may give rise to jurisdiction under SOX if an employee were retaliated against for reporting it. While it was merely