
To change the value of a limit, click the button at the right of the field whose value you want to change. A dialog box appears where you can type either Memory or Time limit values. When a hard limit is exceeded, the running job in the queue is stopped immediately. When a soft limit is exceeded, a signal is sent that the job can intercept before the job is stopped. The Limits tab is shown in the following figure.

5. Click Ok to save your changes. Click Cancel to leave the dialog box without saving your changes.

2.1.2.12 How to Configure Subordinate Queues From the Command Line

To modify a cluster queue, type the following command:

qconf -mq <cluster_queue>

The -mq option (modify cluster queue) modifies the specified cluster queue. The -mq option displays an editor containing the configuration of the cluster queue to be changed. The editor is either the default vi editor or an editor defined by the EDITOR environment variable. Modify the cluster queue by changing the configuration and then saving your changes.

You can configure the following parameter from within the editor:

■ subordinate_list - A list of queues that are subordinated to the configured queue.

Subordinate relationships are in effect only between queue instances residing on the same host.

You can also use slotwise preemption to control job queues. See How To Use Slotwise Preemption for more information.

2.1.2.13 How To Use Slotwise Preemption

Slotwise preemption provides a means to ensure that high-priority jobs get the resources they need while low-priority jobs on the same host are not preempted unnecessarily, which maximizes host utilization. Slotwise preemption is designed to support different preemption actions, but the current implementation provides only suspension. As with queuewise subordination, a subordination relationship is defined between queues. However, when the suspend threshold is exceeded, the whole subordinated queue is not suspended; only individual tasks running in individual slots are suspended. If a queue instance is limited in such a way that no more jobs can run there immediately, the queue instance is automatically disabled, which is indicated by the P(reempted) state. The scheduler avoids scheduling more jobs to a queue instance than can run there immediately.

As with queuewise subordination, the subordination relationships are in effect only between queue instances residing on the same host. The relationship does not apply and is ignored when jobs and tasks are running in queue instances on other hosts. The syntax is:

slots=<threshold>(<queue_list>)

where

■ <threshold> - a positive integer

■ <queue_list> - <queue_def>[,<queue_list>]

■ <queue_def> - <queue>[:<seq_no>][:<action>]

■ <queue> - a xxQS_NAMExx queue name as defined for queue_name in sge_types(1).

■ <seq_no> - the sequence number among all subordinated queues of the same depth in the tree. The higher the sequence number, the lower the priority of the queue. The default is 0, which is the highest priority.

■ <action> - the action to be taken if the threshold is exceeded. The supported actions are:

  ■ "sr": Suspend the task with the shortest run time. This is the default.

  ■ "lr": Suspend the task with the longest run time.
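The grammar above can be made concrete with a small parser. The following Python sketch is illustrative only and is not part of the product: the names `QueueDef` and `parse_slotwise` are hypothetical, and the sketch assumes that `<seq_no>` and `<action>` may each be omitted independently, as the square brackets in the syntax suggest.

```python
import re
from typing import List, NamedTuple, Tuple


class QueueDef(NamedTuple):
    """One <queue_def> entry (hypothetical helper type)."""
    queue: str
    seq_no: int = 0      # default 0 = highest priority
    action: str = "sr"   # default: suspend shortest-running task


def parse_slotwise(value: str) -> Tuple[int, List[QueueDef]]:
    """Parse 'slots=<threshold>(<queue_list>)' into (threshold, queue defs)."""
    match = re.fullmatch(r"\s*slots=(\d+)\((.*)\)\s*", value)
    if match is None:
        raise ValueError(f"not a slotwise subordinate_list value: {value!r}")
    threshold = int(match.group(1))
    if threshold <= 0:
        raise ValueError("threshold must be a positive integer")

    queue_defs = []
    for item in match.group(2).split(","):
        parts = [part.strip() for part in item.split(":")]
        queue, seq_no, action = parts[0], 0, "sr"
        if not queue:
            raise ValueError("empty queue name in queue list")
        for part in parts[1:]:
            if part.isdigit():
                seq_no = int(part)
            elif part in ("sr", "lr"):
                action = part
            else:
                raise ValueError(f"unknown field {part!r} in {item!r}")
        queue_defs.append(QueueDef(queue, seq_no, action))
    return threshold, queue_defs


threshold, queue_defs = parse_slotwise("slots=2(B.q:1, C.q:2:lr)")
```

Here `threshold` is 2, and `queue_defs` holds one entry for "B.q" (sequence number 1, default action "sr") and one for "C.q" (sequence number 2, action "lr").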

Some examples of possible configurations and their functionalities:

2.1.2.13.1 The simplest configuration

subordinate_list slots=2(B.q)

This means that the queue "B.q" is subordinated to the current queue (let's call it "A.q"), the suspend threshold for all tasks running in "A.q" and "B.q" on the current host is two, the sequence number of "B.q" is "0", and the action is "suspend task with shortest run time first". This subordination relationship looks like this:

A.q
 |
B.q

This could be a typical configuration for a host with a dual-core CPU. This subordination configuration ensures that tasks that are scheduled to "A.q" always get a CPU core for themselves, while jobs in "B.q" are not preempted as long as there are no jobs running in "A.q".

If there is no task running in "A.q", two tasks are running in "B.q", and a new task is scheduled to "A.q", the sum of tasks running in "A.q" and "B.q" is three. Three is greater than two, which triggers the defined action: the task with the shortest run time in the subordinated queue "B.q" is suspended. After suspension, there is one task running in "A.q", one task running in "B.q", and one task suspended in "B.q".
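The threshold check in this scenario can be sketched in a few lines of Python. This is a simplified model, not the scheduler's implementation: the `Task` type, the `enforce_threshold` function, the task names, and the run times are all hypothetical and chosen only to mirror the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    queue: str
    run_time: int            # seconds running so far (illustrative values)
    suspended: bool = False


def enforce_threshold(tasks: List[Task], threshold: int,
                      subordinate_queue: str, action: str = "sr") -> List[str]:
    """Suspend tasks in the subordinated queue until the number of
    running tasks on the host no longer exceeds the threshold."""
    suspended_names = []
    while True:
        running = [t for t in tasks if not t.suspended]
        if len(running) <= threshold:
            break
        candidates = [t for t in running if t.queue == subordinate_queue]
        if not candidates:
            break  # nothing left to suspend in the subordinated queue
        # "sr" picks the shortest run time, "lr" the longest
        chooser = min if action == "sr" else max
        victim = chooser(candidates, key=lambda t: t.run_time)
        victim.suspended = True
        suspended_names.append(victim.name)
    return suspended_names


# Two tasks already running in B.q, one new task scheduled to A.q.
tasks = [
    Task("b1", "B.q", run_time=600),
    Task("b2", "B.q", run_time=120),
    Task("a1", "A.q", run_time=0),
]
suspended = enforce_threshold(tasks, threshold=2, subordinate_queue="B.q")
```

With these inputs, `suspended` contains only "b2", the task in "B.q" with the shortest run time, leaving one running task in each queue.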

2.1.2.13.2 A simple tree

subordinate_list slots=2(B.q:1, C.q:2)

This defines a small tree that looks like this:

   A.q
  /   \
B.q   C.q

A use case for this configuration could be a host with a dual-core CPU and queues "B.q" and "C.q" for jobs with different requirements, for example "B.q" for interactive jobs and "C.q" for batch jobs. Again, the tasks in "A.q" always get a CPU core, while tasks in "B.q" and "C.q" are suspended only if the threshold of running tasks is exceeded. Here the sequence number among the queues of the same depth comes into play. Tasks scheduled to "B.q" cannot directly trigger the suspension of tasks in "C.q", but if there is a task to be suspended, "C.q" is searched first for a suitable task.

If there is one task running in "B.q", one in "C.q", and a new task is scheduled to "A.q", the threshold of "2" for "A.q", "B.q" and "C.q" is exceeded. This triggers the suspension of one task in either "B.q" or "C.q". The sequence number gives "B.q" a higher priority than "C.q", so the task in "C.q" is suspended. After suspension, there is one task running in "A.q", one task running in "B.q", and one task suspended in "C.q".
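The role of the sequence number can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the product's algorithm: the function `pick_victim` and the data layout are hypothetical, and the sketch only considers subordinated queues of a single depth.

```python
from typing import Dict, List, Optional, Tuple

# queue name -> list of (task name, run time in seconds); values are illustrative
Running = Dict[str, List[Tuple[str, int]]]
# subordinated queue name -> (seq_no, action)
Subordinates = Dict[str, Tuple[int, str]]


def pick_victim(running: Running, subordinates: Subordinates,
                threshold: int) -> Optional[Tuple[str, str]]:
    """Return (queue, task) to suspend, or None if the threshold holds."""
    total = sum(len(tasks) for tasks in running.values())
    if total <= threshold:
        return None
    # Higher sequence number = lower priority, so search those queues first.
    by_priority = sorted(subordinates,
                         key=lambda q: subordinates[q][0], reverse=True)
    for queue in by_priority:
        tasks = running.get(queue, [])
        if tasks:
            chooser = min if subordinates[queue][1] == "sr" else max
            name, _ = chooser(tasks, key=lambda t: t[1])
            return queue, name
    return None


# slots=2(B.q:1, C.q:2): one task each in B.q and C.q, a new one in A.q.
subordinates = {"B.q": (1, "sr"), "C.q": (2, "sr")}
running = {
    "A.q": [("a1", 0)],
    "B.q": [("b1", 300)],
    "C.q": [("c1", 900)],
}
victim = pick_victim(running, subordinates, threshold=2)
```

With these inputs, `victim` is the task "c1" in "C.q", because "C.q" has the higher sequence number and therefore the lower priority.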

2.1.2.13.3 More than two levels

Configuration of A.q: subordinate_list slots=2(B.q)

Configuration of B.q: subordinate_list slots=2(C.q)

This looks like this:

A.q
 |
B.q
 |
C.q

These are three queues with high, medium, and low priority. If a task is scheduled to "C.q", the subtree consisting of "B.q" and "C.q" is checked first: the number of tasks running there is counted. If the threshold defined in "B.q" is exceeded, the task in "C.q" is suspended. Then the whole tree is checked: if the number of tasks running in "A.q", "B.q" and "C.q" exceeds the threshold defined in "A.q", the task in "C.q" is suspended. This means that the effective threshold of any subtree is not higher than the threshold of the root node of the tree. If, in this example, a task is scheduled to "A.q", the number of tasks running in "A.q", "B.q" and "C.q" is immediately checked against the threshold defined in "A.q".

2.1.2.13.4 Any tree

        A.q
       /   \
    B.q     C.q
    /      /   \
 D.q    E.q     F.q
                  \
                  G.q

The computation of the tasks that are to be suspended or unsuspended always starts at the queue instance that is modified, that is, the queue instance where a task is scheduled, a task ends, the configuration is modified, or a manual or other automatic (un)suspend is issued. The exception is a leaf node, like "D.q", "E.q" and "G.q" in this example: in that case, the computation starts at its parent queue instance (like "B.q", "C.q" or "F.q" in this example). From there, all running tasks in the whole subtree of this queue instance are counted first. If the sum exceeds the threshold configured in the subordinate_list, a task to be suspended is searched for in this subtree. Then the algorithm proceeds to the parent of this queue instance, counts all running tasks in the whole subtree below the parent, and checks whether the number exceeds the threshold configured in the parent's subordinate_list. If so, it searches for a task to suspend in the whole subtree below the parent. This continues until the computation has been done for the root node of the tree.
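The order of these checks can be sketched in Python for the example tree. This is only a model of the walk from the modified queue instance up to the root: the function names, the thresholds, and the running-task counts are hypothetical, and the sketch reports where a threshold is exceeded without selecting or suspending a task.

```python
from typing import Dict, List, Optional, Tuple

# Child -> parent relationships of the example tree.
PARENT: Dict[str, str] = {
    "B.q": "A.q", "C.q": "A.q",
    "D.q": "B.q",
    "E.q": "C.q", "F.q": "C.q",
    "G.q": "F.q",
}


def children(queue: str) -> List[str]:
    return [child for child, parent in PARENT.items() if parent == queue]


def subtree(queue: str) -> List[str]:
    """The queue itself plus every queue below it."""
    nodes = [queue]
    for child in children(queue):
        nodes.extend(subtree(child))
    return nodes


def threshold_checks(modified: str, running: Dict[str, int],
                     thresholds: Dict[str, int]) -> List[Tuple[str, int, bool]]:
    """Return (queue, running tasks in subtree, threshold exceeded?) for
    each queue instance checked, from the starting point up to the root."""
    # A leaf node has no subordinate_list, so start at its parent instead.
    node: Optional[str] = modified if children(modified) else PARENT.get(modified)
    results = []
    while node is not None:
        count = sum(running.get(q, 0) for q in subtree(node))
        results.append((node, count, count > thresholds[node]))
        node = PARENT.get(node)
    return results


# Illustrative numbers only: thresholds per non-leaf queue, running tasks per queue.
thresholds = {"A.q": 6, "B.q": 2, "C.q": 3, "F.q": 1}
running = {"A.q": 1, "C.q": 1, "E.q": 1, "F.q": 1, "G.q": 1}
checks = threshold_checks("G.q", running, thresholds)
```

For a task scheduled to the leaf "G.q", the checks run at "F.q", then "C.q", then "A.q". With the numbers above, the thresholds at "F.q" and "C.q" are exceeded and the one at "A.q" is not. In the described algorithm a suspension at a lower level would reduce the counts seen further up; the sketch leaves that out to keep the traversal order visible.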

2.1.2.14 How to Configure Subordinate Queues With QMON

1. On the QMON Main Control window, click the Queue Control button. The Cluster Queues dialog box appears.

2. Select a queue and then click Modify. The Queue Configuration dialog box appears.

3. To configure subordinate queues, click the Subordinates tab. Use the subordinate