KISTI Supercomputer TACHYON
Scheduling scheme & Sun Grid Engine
Supercomputing Infrastructure Support Office
윤 준 원 ([email protected])
2014.07.15
Scheduling (batch job processing)
Distributed resource management
Features of job schedulers (SW)
Broad scope
Support for algorithms
Capability to integrate with standard resource manager
Sensitivity to compute node and interconnect architecture
Scalability
Fair-Share capability
Efficiency
Dynamic capability
Support for preemption
Sun Grid Engine
Open source batch-queuing system, developed and supported by Sun Microsystems (Oracle)
SGE History
CODINE (Computing in Distributed Networked Environments) - 1991
GRD (Global Resource Director) - 1996
Merged with GridWare - 1999
Acquired by Sun Microsystems - August 2000
Sun renamed the product Grid Engine and released a free version - 2001
Oracle acquired Sun in January 2010
By the end of 2010, Oracle had closed the open source community,
stopped shipping source code, and increased the license fees
In January of 2011, Univa announced that it had hired the core Grid
Engine development team who had worked on Grid Engine for several years.
Job scheduling in SGE
Tachyon2 - SGE 6.2u6 / Tachyon1 - SGE 6.1u5
Before SGE 6.2 the scheduler ran as a separate daemon (sge_schedd); since 6.2 it runs as a thread inside the qmaster
Scheduling a job has two distinct stages
Job selection
Job scheduling
Queue
A logical abstraction that aggregates a set of job slots across one or more execution hosts.
Slots
A container for jobs that execute on a single host
Default queue configuration: slot count set equal to CPU count
Standard Job Types
Batch, Interactive, Parallel, Checkpoint
Terminology
“cluster queue” all.q
“queue instance” all.q@node004
Host Group & Queue Configuration in SGE
Host Group mgt.
qconf -ahgrp, -mhgrp, -dhgrp, -shgrp
Q mgt.
qconf -[aq, mq, dq, sq] queuename
// Create, modify, delete, or show a queue. Edit the host group, PE, and user set lists; when user_lists is NONE (the default), every user can submit.
User sets are managed per queue group under qmaster/usersets (# qconf -[au, mu, du, su] user1,user2, .. user_lists)
Edit qtype, slots, shell, shell_start_mode, prolog, epilog, complex_values, resources, and so on.
On Tachyon 1st, h_rt (wall-clock time limit) is set to 168 hours for the long queue and 48 hours for the normal queue.
The long queue requires at least 1 CPU and the normal queue at least 17 CPUs; jobs requesting fewer cannot run.
qconf -[ahgrp, mhgrp] @hostgroup, qconf -shgrpl // create, modify, or list host groups
※ qconf -m{q,e,p,ckpt} <filename> ; -m: opens a text editor to modify the configuration, q: queue, e: execution host, p: parallel environment, ckpt: checkpointing environment
※ Switch options: a: add, m: modify, d: delete, r: replace, s: show
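The queue limits stated above for Tachyon 1st can be sketched as a small request check. This is only an illustration of the stated rules, not part of SGE: the names QUEUE_RULES and can_run are hypothetical, and the real limits are enforced by the queue configuration itself.

```python
# Limits as stated for Tachyon 1st; all names here are hypothetical.
QUEUE_RULES = {
    "long": {"max_hours": 168, "min_cpus": 1},
    "normal": {"max_hours": 48, "min_cpus": 17},
}


def can_run(queue, cpus, hours):
    """Return True if a request fits the CPU minimum and h_rt limit of a queue."""
    rule = QUEUE_RULES[queue]
    return cpus >= rule["min_cpus"] and hours <= rule["max_hours"]


print(can_run("normal", 16, 10))   # below the 17-CPU minimum
print(can_run("normal", 17, 48))   # exactly at both limits
print(can_run("long", 1, 169))     # exceeds the 168-hour wall-clock limit
```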
Sun Grid Engine Scheduler
Grid Engine Tickets
All policies are defined using “tickets”
Jobs get tickets from all the various policies
Jobs with more tickets are more important
The administrator controls the total number of tickets in the system
The number of tickets assigned to each policy determines how important that policy is relative to the others
Three Classes of Policies
Ticket Policies (Entitlement)
  Share Tree (or Fair-share)
  Functional Ticket
  Override Ticket
Urgency Policies
  Deadline
  Wait time
  Resource urgency
Custom Policies
  POSIX Priority
Administrator to push a particular job to the front of the
Entitlement – Share tree
Ticket Policies (Job Selection)
Share Tree (fair-share) Policy
Start with N tickets
Divvy up across tree
Job sorting based on ticket count
Memory (historical) of past usage
Leaf nodes must be project or user nodes
[root@sge03qs pe]# qconf -ssconf | grep weight_tickets*
weight_tickets_functional 0
weight_tickets_share 100000
weight_ticket 0.010000
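The top-down division of tickets across a share tree can be sketched as follows. This is a simplified illustration, not SGE's implementation: it ignores the usage history that the real policy takes into account, and the tree layout, share numbers, and the function name distribute are hypothetical.

```python
def distribute(node, tickets, out=None, path=""):
    """Split `tickets` among a node's children in proportion to their shares."""
    if out is None:
        out = {}
    name = f"{path}/{node['name']}"
    children = node.get("children", [])
    if not children:  # leaf: a user or project node
        out[name] = tickets
        return out
    total_shares = sum(child["shares"] for child in children)
    for child in children:
        distribute(child, tickets * child["shares"] / total_shares, out, name)
    return out


# Hypothetical tree; 100000 matches the weight_tickets_share value shown above.
tree = {
    "name": "root",
    "children": [
        {"name": "projA", "shares": 60, "children": [
            {"name": "user1", "shares": 1},
            {"name": "user2", "shares": 3},
        ]},
        {"name": "projB", "shares": 40},
    ],
}

leaf_tickets = distribute(tree, 100000)
# Jobs would then be sorted by ticket count, highest first.
for leaf, count in sorted(leaf_tickets.items(), key=lambda kv: -kv[1]):
    print(leaf, count)
```

In this sketch projA receives 60% of the tickets and passes them on to its two users in a 1:3 ratio, while projB keeps its 40% as a leaf.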
Entitlement – Functional Ticket
Ticket Policies (Job Selection)
Functional Ticket Policy
Start with N tickets
Divide into four categories: Users, Dept, Projects, Jobs
By default all categories have equal weight
Divide within category among all jobs
Sum ticket count for each job within each category; highest count wins
No memory (historical) of past usage
Leaf nodes must be project or user nodes
By default, the functional ticket policy is inactive
weight_tickets_functional 0
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
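The category split described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not SGE's actual algorithm: every user, project, and department is assumed to hold equal functional shares, an entity's tickets are divided evenly among its jobs, and the names WEIGHTS and functional_tickets as well as the 1200-ticket total are hypothetical.

```python
from collections import Counter

# Default category weights from the scheduler configuration shown above.
WEIGHTS = {"user": 0.25, "project": 0.25, "department": 0.25, "job": 0.25}


def functional_tickets(jobs, total_tickets):
    """Sum, per job, the tickets it receives from each of the four categories."""
    tickets = {job["id"]: 0.0 for job in jobs}
    for category, weight in WEIGHTS.items():
        pool = total_tickets * weight
        if category == "job":
            # In the job category every job is its own entity.
            for job in jobs:
                tickets[job["id"]] += pool / len(jobs)
            continue
        jobs_per_entity = Counter(job[category] for job in jobs)
        per_entity = pool / len(jobs_per_entity)  # assumption: equal shares
        for job in jobs:
            tickets[job["id"]] += per_entity / jobs_per_entity[job[category]]
    return tickets


jobs = [
    {"id": 1, "user": "alice", "project": "p1", "department": "d1"},
    {"id": 2, "user": "alice", "project": "p1", "department": "d1"},
    {"id": 3, "user": "bob", "project": "p2", "department": "d1"},
]
result = functional_tickets(jobs, 1200)
print(result)
```

Job 3 ends up with the most tickets in this sketch because bob and p2 each have only one pending job, so their per-entity tickets are not diluted.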
Entitlement – Override Ticket
Ticket Policies (Job Selection)
Override Policy
Used to make temporary changes
– Override tickets disappear with job exit
Admin can assign extra tickets
– User, project, department or job
– Can also use qalter to add override tickets to a pending job
share_override_tickets
– Controls whether the job count dilutes the override ticket count
– Default is TRUE
[root@sge03 pe]# qconf -ssconf | grep share*
weight_tickets_share 100000
share_override_tickets TRUE
Urgency – Wait Time Policy
As a job remains in the pending queue, the wait time
policy increases the urgency for that job.
It can be useful for preventing job starvation
weight_waiting_time 100.000000
weight_urgency 0.100000
Uwait = Twait × Wwait
Uwait : wait-time urgency
Twait : the time spent since being submitted
Wwait : wait-time weighting factor (weight_waiting_time)
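As a worked example, the wait-time urgency is the waiting time in seconds multiplied by weight_waiting_time. The sketch below assumes that form; the function name wait_time_urgency and the timestamps are hypothetical.

```python
def wait_time_urgency(submit_time, now, weight_waiting_time):
    """Urgency contribution from waiting: seconds waited times the weight."""
    waited = max(0, now - submit_time)  # never negative
    return waited * weight_waiting_time


# A job submitted one hour ago (times in Unix seconds).
submitted = 1_405_382_400
now = submitted + 3600

print(wait_time_urgency(submitted, now, 100.0))  # 3600 s * 100 = 360000.0
print(wait_time_urgency(submitted, now, 0.0))    # a weight of 0 disables the policy
```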
Urgency – Deadline Policy
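The deadline is the time by which the job must be scheduled.
In order to submit a job with a deadline, a user must be a member of the deadlineusers group.
weight_deadline 3600000.000000
weight_urgency 0.100000
Udeadline = Wdeadline / (Tdeadline - Tcurrent)
Udeadline : deadline urgency
Wdeadline : deadline weighting factor (weight_deadline)
Tdeadline : deadline time
Tcurrent : current time
Both times are given in Unix time (in seconds)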
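A minimal sketch of the deadline urgency, assuming the form given in the SGE scheduler documentation: weight_deadline divided by the seconds remaining until the deadline. The function name deadline_urgency is hypothetical, and clamping the remaining time to at least one second once the deadline has passed is an assumption of this sketch.

```python
def deadline_urgency(deadline, now, weight_deadline=3_600_000.0):
    """Urgency grows as the deadline approaches; both times are Unix seconds."""
    remaining = max(1, deadline - now)  # assumption: clamp at 1 s after the deadline
    return weight_deadline / remaining


now = 1_405_382_400
print(deadline_urgency(now + 3600, now))  # one hour left: 3600000 / 3600 = 1000.0
print(deadline_urgency(now + 60, now))    # one minute left: 60000.0
```

With the default weight of 3600000, a job one hour from its deadline gets an urgency of 1000, and the value rises sharply in the final minutes.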
Urgency – Resource Policy
If some resources in a cluster are particularly
valuable, it might be advantageous to make sure
those resources stay as busy as possible.
Combining Policies
The final dispatch priority assigned to each pending job is
determined by combining the contributions of the
entitlement, urgency, and custom policies
P = Ne × We + Nu × Wu + Nc × Wc
Ne : entitlement priority
We : entitlement weighting factor # weight_ticket 0.010000
Nu : urgency priority
Wu : urgency weighting factor # weight_urgency 0.100000
Nc : custom priority
Wc : custom weighting factor # weight_priority 1.000000
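The weighted sum above can be tried out with a short sketch. The default weights are the values quoted in the comments above; the function name dispatch_priority and the sample normalized inputs are hypothetical, and the normalization of each contribution is assumed to have been done already.

```python
def dispatch_priority(n_e, n_u, n_c, w_e=0.01, w_u=0.1, w_c=1.0):
    """P = Ne * We + Nu * Wu + Nc * Wc, with normalized inputs."""
    return n_e * w_e + n_u * w_u + n_c * w_c


# Two hypothetical pending jobs: (entitlement, urgency, custom), all normalized.
job_a = dispatch_priority(n_e=1.0, n_u=1.0, n_c=0.5)
job_b = dispatch_priority(n_e=0.0, n_u=0.0, n_c=0.7)

print(job_a, job_b)
print("job_b dispatched first:", job_b > job_a)
```

With these weights the custom (POSIX) priority dominates: job_b wins despite having no tickets or urgency, because its normalized POSIX priority is higher by more than the other two terms can contribute together.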
Scheduler weighting factors
Reference in Text  Weighting Factor      Parameter Name             Tachyon1  Tachyon2
Wdeadline          Deadline              weight_deadline            3600000   3600000
Wwait              Wait-time             weight_waiting_time        0         100
We                 Entitlement (Ticket)  weight_ticket              0.01      0.01
Wu                 Urgency               weight_urgency             0.1       0.1
Wc                 Custom (POSIX)        weight_priority            1         1
                                         weight_tickets_share       100000    100000
                                         weight_tickets_functional  0         0
share_override_tick
ref. ) Job Priorities and Tickets
- urg = rrcontr + wtcontr + dlcontr
- tckts = ftckt + otckt + stckt
- job_priority = weight_urgency * normalized_urgency_value + weight_ticket * normalized_ticket_value +
weight_priority * normalized_POSIX_priority_value
ntckts The total number of tickets in normalized fashion.
tckts The total number of tickets assigned to the job currently.
ovrts The override tickets as assigned by the -ot option of qalter.
otckt The override portion of the total number of tickets assigned to the job currently.
ftckt The functional portion of the total number of tickets assigned to the job currently.
stckt The share portion of the total number of tickets assigned to the job currently.
share The share of the total system to which the job is entitled currently.
nurg The job's total urgency value in normalized fashion.
urg The job's total urgency value.
rrcontr The urgency value contribution that reflects the urgency related to the job's overall resource requirement.
wtcontr The urgency value contribution that reflects the urgency related to the job's waiting time.
dlcontr The urgency value contribution that reflects the urgency related to the job's deadline initiation time.