Scheduling. key concepts: round robin, shortest job first, MLFQ, multi-core scheduling, cache affinity, load balancing.

(1)

Scheduling

key concepts: round robin, shortest job first, MLFQ, multi-core scheduling, cache affinity, load balancing

Lesley Istead

David R. Cheriton School of Computer Science University of Waterloo

Spring 2021

(2)

Simple Scheduling Model

We are given a set of jobsto schedule.

Only one job can run at a time.

For each job, we are given job arrival time (ai) job run time (r_i) For each job, we define

response time: time between the job’s arrival and when the job starts to run

turnaround time: time between the job’s arrival and when the job finishes running.

We must decide when each job should run, to achieve some goal, e.g., minimize average turnaround time, or minimize average response time.

2 / 26

(3)

First Come, First Served

jobs run in order of arrival simple, avoids starvation

time J4

J2 J1

0 4 8 12 16 20

J3

Job J1 J2 J3 J4

arrival (a_i) 0 0 0 5 run time (r_i) 5 8 3 2

(4)

Round Robin

preemptive FCFC OS/161’s scheduler

time J4

J3 J2 J1

0 4 8 12 16 20

Job J1 J2 J3 J4

arrival (a_i) 0 0 0 5 run time (ri) 5 8 3 2

4 / 26

(5)

Shortest Job First

run jobs in increasing order of runtime minimizes average turnaroundtime starvation is possible

time J4

J3 J2 J1

0 4 8 12 16 20

Job J1 J2 J3 J4

arrival (a_i) 0 0 0 5

(6)

Shortest Remaining Time First

preemptive variant of SJF; arriving jobs preempt running job select one with shortest remaining time

starvation still possible

time J4

J3 J2 J1

0 4 8 12 16 20

Job J1 J2 J3 J4

arrival (a_i) 0 0 0 5 run time (ri) 5 8 3 2

6 / 26

(7)

CPU Scheduling

In CPU scheduling, the “jobs” to be scheduled are the threads.

CPU scheduling typically differs from the simple scheduling model:

the run times of threads are normally not known

threads are sometimes not runnable: when they are blocked threads may have different priorities

The objective of the scheduler is normally to achieve a balance between

responsiveness (ensure that threads get to run regularly), fairness,

efficiency

How would FCFS, Round Robin, SJF, and SRTF handle blocked threads? Priorities?

(8)

Multi-level Feedback Queues

the most commonly used scheduling algorithm in modern times

objective: good responsiveness for interactivethreads, non-interactive threads make as much progress as possible

key idea: interactive threads are frequently blocked, waiting for user input, packets, etc.

approach: given higher priority to interactive threads, so that they run whenever they are ready.

problem: how to determine which threads are interactive and which are not?

MLFQ is used in Microsoft Windows, Apple macOS, Sun Solaris, and many more. It was used in Linux, but no longer is.

8 / 26

(9)

MLFQ Algorithm

...

Qn, qn

Qn-1, qn-1

Qn-2, qn-2

Q1, q1

highest priority

lowest priority shortest quantum

longest quantum

on preempt

n round-robin ready queues where the priority of Q_i > Q_j if i > j

threads in Q_i use quantum q_i and q_i ≤ q_j if i > j scheduler selects a thread from the highest priority queue to run

threads in Q_n−1 are only selected if Q_n is empty preempted threads are put onto the back of the next lower-priority queue

a thread from Qnis preempted, it is pushed onto Qn−1

when a thread wakes after blocking, it is put onto the highest-priority queue

Since interactive threads tend to block frequently, they will ”live” in higher-priority queues while non-interactive threads sift down to the bottom.

(10)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

When do threads in Q1 run if Q3 is never empty?

To prevent starvation, all threads are periodically placed in the highest-priority queue.

10 / 26

(11)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake up

run

preempt preempt

preempt

T1 T2

Two threads, T1 and T2, start in Q3.

(12)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

T1 T2

T1 is selected to run.

12 / 26

(13)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

T1

T2

T1 is preempted and pushed onto the back of Q2. T2 is selected to run.

(14)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake up

run

preempt preempt

preempt

T1

T2 T3

While T2 is running a new thread, T3, arrives.

14 / 26

(15)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

T1

T3

T2 terminates. T3 is selected.

(16)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

T1 T3

T3 blocks. T1 is selected.

16 / 26

(17)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake up

run

preempt preempt

preempt

T1

T3

T1 is preempted, it is pushed onto Q1.

(18)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wak

e up

run

preempt preempt

preempt

T1 T3

T1 is selected.

18 / 26

(19)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

T1 T3

T3 is woken by T1 causing T1 to be preempted. Many variants of MLFQ will preempt low-priority threads when a thread wakes to ensure a fast response to an event.

(20)

3-Queue MLFQ Example

blocked

ready (Q3)

ready (Q2)

ready (Q1)

run

run sleep wake u

p

run

preempt preempt

preempt

T1

T3

T3 is selected.

20 / 26

(21)

Linux Completely Fair Scheduler (CFS) - Main Ideas

each thread can be assigned a weight

the goal of the scheduler is to ensure that each thread gets a

“share” of the processor in proportion to its weight basic operation

track the “virtual” runtime of each runnable thread always run the thread with the lowest virtual runtime virtual runtime is actual runtime adjusted by the thread weights

suppose wi is the weight of the i th thread actual runtime of i th thread is multiplied by

P

jw_j w_i

virtual runtime advances slowly for threads with high weights, quickly for threads with low weights

In MLFQ the quantum depended on the thread priority. In CFS, the quantum is the same for all threads and priorities.

(22)

CFS Example

Suppose the total weight of all threads in the system is 50 and the quantum is 5.

Time Thread Weight Actual Runtime Virtual Runtime

t 1 25 5

2 20 5

3 5 5

t + 5 1 25

2 20

3 5

Which thread is selected at t? Which thread at t + 5?

22 / 26

(23)

CFS Example

Suppose the total weight of all threads in the system is 50 and the quantum is 5.

Time Thread Weight Actual Runtime Virtual Runtime

t + 5 1 25 5 5 ∗ 50/25 = 10

2 20 5 5 ∗ 50/20 = 12.5

3 5 5 5 ∗ 50/5 = 50

T1 is selected

t + 5 1 25 10 10 ∗ 50/25 = 20

2 20 5 12.5

3 5 5 50

T2 is selected Which thread is selected at t? Which thread at t + 5?

(24)

Scheduling on Multi-Core Processors

core

core core

core

per core ready queue vs. shared ready queue

Which offers better performance? Which one scales better?

24 / 26

(25)

Scalability and Cache Affinity

Contention and Scalability

access to shared ready queue is a critical section, mutual exclusion needed

as number of cores grows, contention for ready queue becomes a problem

per core design scales to a larger number of cores CPU cache affinity

as thread runs, data it accesses is loaded into CPU cache(s) moving the thread to another core means data must be reloaded into that core’s caches

as thread runs, it acquires an affinity for one core because of the cached data

per core design benefits from affinity by keeping threads on the same core

shared queue design does not

(26)

Load Balancing

in per-core design, queues may have different lengths this results in load imbalance across the cores

cores may be idle while others are busy

threads on lightly loaded cores get more CPU time than threads on heavily loaded cores

not an issue in shared queue design

per-core designs typically need some mechanism for thread migration to address load imbalances

migration means moving threads from heavily loaded cores to lightly loaded cores

26 / 26