The SUN ONE Grid Engine Batch System
Juan Luis Chaves Sanabria
Centro Nacional de Cálculo Científico (CeCalCULA)
Latin American School in HPC on Linux Cluster
October 27 – November 07, 2003
What is SGE?
SGE is cluster resource management software.
It accepts jobs submitted by users and schedules them for execution on the cluster according to resource management policies (who gets how many resources, and when).
Jobs are distributed in a way that keeps the workload uniform across the cluster.
ELCAR en Clusters de Linux. 2003
Who develops SGE?
SGE is developed by Sun Microsystems
http://www.sun.com/gridware http://gridengine.sunsource.net
Sun acquired Gridware, a developer of Distributed Resource Management (DRM) software, in July 2000.
Sun releases SGE as a free downloadable binary for the Solaris and Linux operating systems to facilitate the deployment of compute farms.
The source code is available: an open source project to enable the Grid Computing model.
SGE 5.3 supported platforms
Compaq Tru64 Unix 5.0, 5.1
Hewlett-Packard HP-UX 10.20, 11.00
IBM AIX 4.3.x
Linux x86, kernel 2.4, glibc ≥ 2.2
Linux Alpha/AXP, kernel 2.2, glibc ≥ 2.2
SGI IRIX 6.2 – 6.5
Sun Solaris (SPARC) 2.6, 7, 8, 9, 32-bit
Sun Solaris (SPARC) 2.6, 7, 8, 9, 64-bit
Sun Solaris (x86) 8
How Does the System Operate?
SGE accepts job requests for computing resources (each job carries a requirement profile).
Job requests are placed in a holding area until they can be executed.
When a request is ready to be executed, it is forwarded to an adequate execution device (or devices).
SGE manages the execution of the request and logs a record of the execution when it has finished.
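This life cycle can be followed from the command line; a typical session (the job script name is illustrative) is:

```
$ qsub -cwd job.sh        # the request enters the pending (holding) area
$ qstat                   # watch the job wait, then run
$ qacct -j <job_id>       # after it finishes, read the logged accounting record
```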
SGE Components
Hosts:
Master (sge_qmaster and sge_schedd): controls all the SGE components and the overall cluster activity
Execution (sge_execd): authorized to execute jobs through SGE
Administration: designated to carry out any kind of administrative task for the SGE system
Submit: for submitting (qsub) and controlling jobs
SGE Components (2)
Queues:
A queue is a container for a class of jobs (batch/parallel/interactive/checkpointing) allowed to execute concurrently on a particular host.
Commands applied to a queue affect all jobs associated with it.
SGE Components (3)
Queues (2):
Properties:
name: the queue's name
hostname: the machine that hosts the queue
processors: on a multiprocessor system, the processors the queue has access to
qtype: the types of jobs permitted to run in this queue (interactive, batch, parallel, checkpointing)
slots: the number of jobs that can run concurrently in the queue
SGE Components (4)
Queues (3):
Properties (2):
owner_lists: the queue's owners
user_lists: user or group IDs of those who may access the queue
xuser_lists: user or group IDs of those who may not access the queue
complex_list: the complexes associated with the queue
complex_values: assigns capacities, as provided for this queue, for certain complex attributes
SGE Components (5)
Complex: a set of features (resources) associated with a queue, a host, or the entire cluster that are known to SGE.
Cell: each loosely separated SGE cluster, with its own configuration and master machine. The SGE_CELL environment variable selects which cell the SGE commands operate on.
SGE Functionality
SGE is controlled by four daemons:
sge_qmaster: controls all the cluster's management and scheduling activities
  Receives scheduling decisions from sge_schedd
  Requests actions from sge_execd on the execution hosts
  Maintains tables about the cluster status
sge_shadowd: daemon used if a backup host (shadow master host) exists to take over the functionality of sge_qmaster
SGE Functionality (2)
sge_schedd: maintains an up-to-date view of the cluster's status with the data provided by the sge_qmaster daemon. It:
  Decides which jobs are forwarded to which queues
  Communicates these decisions to sge_qmaster, which initiates the appropriate actions
SGE Functionality (3)
sge_execd: is responsible for the queues on its host and for the execution of the jobs in those queues.
It sends information to the master host (sge_qmaster) about job status and the load on its host.
sge_commd: all the daemons communicate with each other through the communication daemons (one per host).
SGE Functionality (4)
[Diagram: the master host runs sge_qmaster, sge_schedd, and sge_commd; execution hosts, each running sge_execd and sge_commd and holding queues q1–q5, are connected to it through a switch.]
Using SGE
What a user may do with an SGE command depends on the user type executing it.
SGE defines four types of users:
Managers: have full capabilities to manipulate SGE
Operators: can execute all the commands that managers can, except making configuration changes to SGE
Owners: are defined per queue and can manipulate the queues they own, or the jobs within them
Users: can only manage their own jobs, and can only use the queues or parallel environments where they are authorized
Using SGE (2)
Full Full Full Full qlogin Full Full Full Full qhostOwn jobs only Own jobs only
Full Full
qhold
Own jobs only Own jobs only
Full Full qdel Shown only Shown only No system setup changes Full qconf
Own jobs only Own jobs only
Full Full
qalter
Own jobs only Own jobs only
Full Full qacct User Owner Operator Manager Command
Using SGE (3)

    Command  Manager  Operator                 Owner                           User
    qrls     Full     Full                     Own jobs only                   Own jobs only
    qstat    Full     Full                     Full                            Full
    qsh      Full     Full                     Full                            Full
    qselect  Full     Full                     Full                            Full
    qmon     Full     No system setup changes  No configuration changes        No configuration changes
    qmod     Full     Full                     Own jobs and owned queues only  Own jobs only
Submitting Jobs
Prerequisites: ensure that no commands that need a terminal (tty) are executed in your .[t]cshrc or .bashrc:

bash, sh or ksh:
    tty -s
    if [ $? = 0 ]; then
        stty erase ^H
    fi

csh or tcsh:
    tty -s
    if ( $status = 0 ) then
        stty erase ^H
    endif
Submitting Jobs (2)
Prerequisites (2): ensure that your .[t]cshrc or .bashrc sets the executable search path and the other SGE environment settings:
csh or tcsh:
    source <sge_root_dir>/default/common/settings.csh
bash, sh or ksh:
    . <sge_root_dir>/default/common/settings.sh
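For a shared login script, the sourcing can be guarded so it stays harmless on machines where SGE is not installed. A minimal sketch (the default SGE root path below is an assumption; adjust it to your site):

```shell
# Guarded sourcing of the SGE settings file (sh/bash/ksh form).
# /usr/local/sge as the SGE root is an assumed, site-specific path.
SGE_ROOT=${SGE_ROOT:-/usr/local/sge}
settings="$SGE_ROOT/default/common/settings.sh"
if [ -f "$settings" ]; then
    . "$settings"           # sets SGE_CELL, extends PATH, etc.
    sge_available=yes
else
    sge_available=no        # not an SGE machine: do nothing
fi
```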
Submitting Jobs (3)
Specify which script should be executed:
    qsub -cwd job_script
-cwd: run the job from the current working directory (default: $HOME).
In the simplest case the job script contains one line: the name of the executable.
Various examples are available in <sge_root_dir>/examples/jobs/.
Many options are available for qsub; see man qsub.
Submitting Jobs (4)
Example of a script file #!/bin/csh WORKDIR=/tmp/scratch/$USER DATADIR=$HOME/data mkdir -p $WORKDIR cp $DATADIR/input_data $WORKDIR cd $WORKDIR
executable < input_data > out_executable cp out_executable $DATADIR
rm –rf $WORKDIR
Submitting Jobs (5)
Output and error redirection:
Default standard output filename: <Job_name>.o<Job_id>
  Can be changed with the -o option
Default standard error filename: <Job_name>.e<Job_id>
  Can be changed with the -e option
Active SGE comments in script files: lines beginning with #$ are read by qsub as embedded command-line options.
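As a sketch of the active-comment mechanism (the job name and file names are hypothetical): every #$ line below is an option qsub would pick up, while the shell treats it as an ordinary comment, so the script also runs unchanged outside SGE.

```shell
#!/bin/sh
# Active comments: qsub reads the #$ lines as embedded options;
# to the shell they are plain comments.
#$ -N demo_job
#$ -o demo_job.out
#$ -e demo_job.err
#$ -cwd
result="computation finished"
echo "$result"
```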
Submitting Jobs (6)
Array Jobs:
Are parameterized executions of the same script. SGE views them as an array of independent tasks joined into a single job.
task_id is the array job task index number. Each task can use the environment variable $SGE_TASK_ID to retrieve its own task index number and use it to access the input data set arranged for that task_id.
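A minimal array-task script might look as follows (the input file naming is hypothetical). Under SGE each task receives its index in $SGE_TASK_ID; outside SGE the variable is unset, so the script defaults it to 1 to stay runnable standalone:

```shell
#!/bin/sh
# Each array task selects its own input data set from its task index.
# Outside SGE, SGE_TASK_ID is unset: default it for standalone runs.
: "${SGE_TASK_ID:=1}"
input="input.${SGE_TASK_ID}"       # e.g. task 4 would process input.4
echo "task ${SGE_TASK_ID} would process ${input}"
```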
Submitting Jobs (7)
Array Jobs (2):
Example:
    qsub -l h_cpu=0:30:0 -t 2-10:2 script.sh input.data
Default standard output filename: <Job_name>.o<Job_id>.<Task_id>
Default standard error filename: <Job_name>.e<Job_id>.<Task_id>
Array jobs can be monitored and controlled as a whole or by individual task.
Submitting Jobs (8)
Interactive Jobs:
Are executed on interactive queues. Three ways are available:
qlogin: starts a telnet-like session on a host chosen by SGE
qrsh: is like the rsh or rlogin UNIX commands
qsh: is an xterm brought up with the display set according to the DISPLAY environment variable. If this variable is not set, the xterm is directed to the 0.0 screen of the X server on the host from which the interactive job was submitted. DISPLAY can be set with the -display option.
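Typical invocations of the three commands (the command to run and the display name are illustrative):

```
$ qlogin                      # interactive login session on a host SGE chooses
$ qrsh uname -a               # run a single command remotely, rsh-style
$ qsh -display myws:0.0       # xterm on an execution host, display sent to myws
```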
Monitoring and Controlling Jobs
qstat: shows job/queue status
  Without arguments, shows running/pending jobs
  -j shows detailed information on running/pending jobs
  -f shows submitted jobs and a full listing of all queues
qhost: shows job/host status
  Without arguments, shows all execution hosts and their configuration
  -q shows detailed information on the queues at each host
Monitoring and Controlling Jobs (2)
qdel: cancels jobs submitted through SGE
    qdel <job_id>
qmod: suspends/unsuspends running jobs
    qmod -s <job_id>     (suspend)
    qmod -us <job_id>    (unsuspend)
qhold: holds back pending jobs from execution
qrls: releases jobs from holds previously assigned to them
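Put together, a control session might look like this (the job ID 42 is illustrative):

```
$ qsub -h job.sh              # submit with a user hold: the job stays pending
$ qrls 42                     # release the hold so the job can be scheduled
$ qmod -s 42                  # suspend the job once it is running
$ qmod -us 42                 # resume it
$ qdel 42                     # or cancel it altogether
```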
Parallel Jobs
Are submitted to run on parallel environments.
Parallel environments are procedures to meet the requirements needed to run a specific parallel application.
One parallel environment is configured in the cluster for each class or type of parallel application.
Parallel Jobs (2)
qconf -ap <parallel environment name>
  creates a new parallel environment
qconf -spl
  lists all defined parallel environments
qconf -sp <parallel environment name>
  shows detailed information on the specified parallel environment
Parallel Jobs (3)
Parallel environment example:

    $ qconf -sp mpich
    pe_name            mpich
    queue_list         all
    slots              8
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /usr/local/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
    stop_proc_args     /usr/local/sge/mpi/stopmpi.sh
    allocation_rule    $round_robin
    control_slaves     TRUE
    job_is_first_task  FALSE
Parallel Jobs (4)
Script example:

    #!/bin/csh
    #
    # (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
    #
    # our name
    #$ -N MPI_calc_PI_Job
    #
    # pe request
    #$ -pe mpich 2-6
    #
    #$ -v MPIR_HOME=/usr/local/mpich
    #
    # needs in
    #   $NSLOTS          the number of tasks to be used
    #   $TMPDIR/machines a valid machine file to be passed to mpirun
    #
    echo "Got $NSLOTS slots."
    #
    $MPIR_HOME/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines $HOME/MPI/cpi
Checkpointing
SGE supports two classes of checkpointing:
  User-level checkpointing
  Operating-system-level checkpointing
A checkpointing environment must be defined for each type of application with this support.
When a checkpointing job is launched, this must be indicated with the -ckpt option of the qsub command.
Checkpointing (2)
Checkpointing environments are defined in configuration files.
These define the operations to:
  initiate a checkpoint generation
  migrate a checkpointed job to another host
  restart a checkpointed application
as well as the list of queues which are eligible for a checkpointing method.
Checkpointing (3)
Checkpoint environment file format:

    ckpt_name        <name>
    interface        user defined or os provided
    ckpt_command     command to initiate the checkpoint
    migr_command     command used during migration of a checkpointing job from one host to another
    restart_command  command used to restart a previously checkpointed application
    clean_command    command used to clean up after a checkpointed application has finished
    ckpt_dir         where checkpoint files should be stored
    queue_list       all, or a comma-separated list of queues
    signal           Unix signal to be sent to a job to initiate a checkpoint generation
    when             when to generate the checkpoints:
                     s (at shutdown of the node)
                     m (periodically, at the min_cpu_interval defined by the queue)
                     x (when the job gets suspended)
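A concrete (entirely hypothetical) user-level checkpointing environment in this format could look as follows; the script paths, directory, and environment name are all assumptions for illustration:

```
ckpt_name        demo_ckpt
interface        userdefined
ckpt_command     /usr/local/ckpt/checkpoint.sh $job_id
migr_command     /usr/local/ckpt/migrate.sh $job_id
restart_command  /usr/local/ckpt/restart.sh $job_id
clean_command    /usr/local/ckpt/clean.sh $job_id
ckpt_dir         /var/spool/ckpt
queue_list       all
signal           SIGUSR1
when             sx
```

A job would then request this environment at submission time with qsub -ckpt demo_ckpt job.sh.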
SGE Administration
All administration activities on SGE can be carried out through the qmon graphical interface, or from the command line with qconf.
Basically:
    qconf -a<h|q|s|…> <associated arguments>           (add)
    qconf -d<h|q|e|conf|s|…> <associated arguments>    (delete)
    qconf -m<q|conf|…> <associated arguments>          (modify)
    qconf -s<h|s|sel|conf|…> <associated arguments>    (show)