Process Scheduling in Linux

(1)

The Gate of the AOSP #4: Gerrit, Memory & Performance

Process Scheduling in Linux

2013. 3. 29

(2)

Outline

1 Process scheduling

2 SMP scheduling

(3)

Process scheduling

• scheduler basics

• O(1) scheduler

• CFS

(4)

Terminology

• UTS

• Unix Time-sharing System

• task

• process or thread

• runqueue (rq)

• per-cpu list of runnable tasks

• latency

• time delay between stimulus and response

• throughput

(5)

Scheduler basics

• scheduler

• CPU resource manager

• central part of the kernel

• controls time slices

• schedules tasks (processes)

• decision maker

• when? who? how long?

• latency/throughput

• fairness

(6)

Task types

• interactive task

• I/O bound (e.g. editor)

• sleeps most of the time

• wants minimal latency

• batch task

• CPU bound (e.g. compiler)

• wants maximal throughput

(7)

Task state

(8)

Context switch

• happens when a task is scheduled

• voluntary (sleep)

• involuntary (preemption)

• save/restore task information

• hw registers

• memory address space

• overhead

• cache eviction

• TLB flush

(9)

Preemption

• switch out a running task (at any time?)

• configurable at compile time

• CONFIG_PREEMPT{,_NONE,_VOLUNTARY}

(10)

Kernel preemption points

• return to user (syscall, irq, exception)

• PREEMPT_NONE

• might_sleep()

• PREEMPT_VOLUNTARY

• return from irq (preempt_count == 0)

• preempt_enable()/spin_unlock()

• PREEMPT

(11)

Time slice

• round-robin time sharing system (UTS)

• a time unit allowed for a task at a given time

• can be affected by timer frequency (HZ)

• hard to optimize since

• it should be small for lower latency

• it should be large for better throughput

(12)

POSIX scheduling policy

• SCHED_FIFO

• SCHED_RR

• SCHED_OTHER

• Linux-specific

• SCHED_NORMAL

• SCHED_BATCH

• SCHED_IDLE

(13)

Linux scheduling class

• RT class

• SCHED_FIFO

• SCHED_RR

• FAIR class

• SCHED_NORMAL

• SCHED_BATCH

• SCHED_IDLE

(14)

Scheduling priority

• Root of Evil(tm)

• but we need it anyway

• for less latency and higher throughput

Changing priority

$ chrt -m

SCHED_OTHER min/max priority: 0/0
SCHED_FIFO min/max priority: 1/99
SCHED_RR min/max priority: 1/99
SCHED_BATCH min/max priority: 0/0
SCHED_IDLE min/max priority: 0/0
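The same ranges can be queried from a program. The sketch below uses the `os.sched_get_priority_min()` and `os.sched_get_priority_max()` calls from the Python standard library; the helper name `priority_ranges` is made up for this example, and policies that the platform does not define are simply skipped.

```python
import os

# Policy names as listed by `chrt -m`; each is looked up on the os module
# and skipped when the platform does not define it.
POLICY_NAMES = ["SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE"]


def priority_ranges():
    """Return {policy name: (min priority, max priority)} for known policies."""
    ranges = {}
    if not hasattr(os, "sched_get_priority_min"):
        return ranges  # scheduling API not available on this platform
    for name in POLICY_NAMES:
        policy = getattr(os, name, None)
        if policy is None:
            continue
        ranges[name] = (os.sched_get_priority_min(policy),
                        os.sched_get_priority_max(policy))
    return ranges


if __name__ == "__main__":
    for name, (lo, hi) in priority_ranges().items():
        print(f"{name} min/max priority: {lo}/{hi}")
```

On a Linux system the printed lines should resemble the `chrt -m` output above.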

(15)

Ideal CPU model

• share cpu resources among all tasks

• run tasks simultaneously

• each task owns its share

• currently impossible

(16)

O(1) Scheduler

• used for RT tasks

• used for normal tasks (prior to 2.6.23)

• simple and fast algorithm

• fixed number of bitmaps and arrays

• statically assigned time slice
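A minimal sketch of the bitmap-plus-array idea, not the kernel code: one bit per priority level says whether that level's queue is non-empty, so picking the next task is a find-first-set on the bitmap followed by a dequeue, regardless of how many tasks are queued. The class name `PrioArray` and the count of 140 priority levels are assumptions made for this illustration.

```python
from collections import deque

NUM_PRIOS = 140  # assumed number of priority levels (0 = highest)


class PrioArray:
    """One bitmap plus one FIFO queue per priority level."""

    def __init__(self):
        self.bitmap = 0  # bit p is set exactly when queue p is non-empty
        self.queues = [deque() for _ in range(NUM_PRIOS)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def pick_next(self):
        """Pop the first task of the highest-priority non-empty queue."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; its index is the best priority.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task
```

Tasks at the same priority come out in FIFO order, and the work per pick does not grow with the number of runnable tasks.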

(17)

Completely-Fair Scheduler

• used for normal tasks

• treats all tasks fairly

• unless they have different priorities

• vruntime accounts for priority
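How vruntime can fold priority into fairness is easiest to see in a toy simulation. The sketch below is an illustration under simplifying assumptions, not the kernel implementation: every slice has the same fixed length, a binary heap stands in for the ordered structure that holds runnable tasks, and the weights and the name `simulate_cfs` are hypothetical. The task with the smallest vruntime always runs next, and its vruntime advances in inverse proportion to its weight.

```python
import heapq

NICE_0_WEIGHT = 1024  # assumed reference weight for a default-priority task


def simulate_cfs(weights, ticks, slice_ns=1_000_000):
    """Run `ticks` fixed slices; return the number of slices each task received.

    weights: {task name: weight}; a larger weight means a higher priority.
    """
    # Heap of (vruntime, name); the smallest vruntime is always on top.
    heap = [(0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    received = {name: 0 for name in weights}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)
        received[name] += 1
        # Virtual time advances more slowly for heavier tasks.
        vruntime += slice_ns * NICE_0_WEIGHT // weights[name]
        heapq.heappush(heap, (vruntime, name))
    return received
```

With equal weights the tasks alternate; when one task has twice the weight of the other, it receives twice as many slices.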

(18)

SMP scheduling

• cpu load tracking

• scheduler domain

(19)

CPU affinity

• set of cpus allowed to run a given task

• all cpus are allowed by default

• must be considered when scheduling

• each task remembers the cpu it is running on

Setting cpu affinity

# taskset 0xff <command>
# taskset -c 0-7 <command>
# taskset -p [mask] <pid>
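The mask form (`0xff`) and the list form (`0-7`) describe the same set of cpus: bit n of the mask stands for cpu n. A small sketch of the conversion, with helper names invented for this example:

```python
def mask_to_cpus(mask):
    """Expand an affinity bitmask into a sorted list of CPU numbers."""
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]


def cpus_to_mask(cpus):
    """Pack an iterable of CPU numbers into an affinity bitmask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask
```

For instance, `mask_to_cpus(0xff)` yields cpus 0 through 7, and `cpus_to_mask(range(8))` gives back `0xff`.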

(20)

CPU load tracking

• CPU load = number of tasks running

• TASK_RUNNING + TASK_UNINTERRUPTIBLE

• global load average

• system load information (1, 5, 15 min)

• cpu load average

• for load balancing

Global load average

$ uptime

18:19:26 up 1 day, 7:51, 2 users, load average: 0.02, 0.03, 0.05

(21)

Moving average

• calculates the average of a continuous data stream

• smooths out short-term fluctuations

• highlights longer-term trends

• kernel uses EMA (exponential moving average) for cpu load tracking

• past terms decrease exponentially

• this process is called “decay”

• http://en.wikipedia.org/wiki/Moving_average
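A generic exponential moving average can be sketched in a few lines. This is the textbook floating-point form, not the kernel's own arithmetic; the function name `ema` and the parameter `alpha` (the weight given to the newest sample) are choices made for this example.

```python
def ema(samples, alpha):
    """Exponential moving average of `samples`.

    alpha in (0, 1] is the weight of the newest sample; the previous
    average keeps weight (1 - alpha), so older samples decay geometrically.
    """
    average = None
    history = []
    for sample in samples:
        if average is None:
            average = float(sample)  # seed with the first sample
        else:
            average = alpha * sample + (1 - alpha) * average
        history.append(average)
    return history
```

Feeding a burst followed by idle samples, such as `ema([4, 0, 0], 0.5)`, shows the decay: the average halves at every step instead of dropping to zero at once.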

(22)

Scheduler domain

• abstraction layer of hardware topology

• a domain consists of groups

• a cpu resides in multiple hierarchies

• a domain is a group in a higher domain
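The nesting described above can be modelled with a small data structure. The sketch below is a simplified illustration, not the kernel's `struct sched_domain`; the `Domain` class, the `build_domains` helper, and the level names are hypothetical. Each domain of a cpu spans a set of cpus and is split into groups, and the span of a lower domain appears as one group of the domain above it.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str        # hypothetical level name, e.g. "core" or "package"
    span: frozenset  # all cpus covered by this domain
    groups: list     # disjoint cpu sets that together cover the span


def build_domains(cpu, levels):
    """Return the domains `cpu` belongs to, lowest level first.

    levels: list of (name, partition) pairs ordered from the lowest level
    up, where each partition is a list of cpu sets at that level.
    """
    domains = []
    lower = None  # partition of the level below; None means single cpus
    for name, partition in levels:
        span = next(frozenset(s) for s in partition if cpu in s)
        if lower is None:
            groups = [frozenset([c]) for c in sorted(span)]
        else:
            groups = [frozenset(s) for s in lower if frozenset(s) <= span]
        domains.append(Domain(name, span, groups))
        lower = partition
    return domains
```

For an assumed four-cpu system described by `[("core", [{0, 1}, {2, 3}]), ("package", [{0, 1, 2, 3}])]`, cpu 0 gets a "core" domain with groups {0} and {1}, and a "package" domain whose groups are {0, 1} and {2, 3}.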

(23)

Load balancing

• migrate tasks according to each cpu’s load

• spread or pack

• chances to balance

• fork & exec

• wake up

• idle

• periodic

(24)

Load balancing strategy

• fork & exec

• spread to the idlest cpu

• wake up

• keep prev cpu or migrate to current cpu

• or its idle sibling

• idle

• migrate to current cpu

• periodic
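The fork & exec case, combined with cpu affinity, can be sketched as a search for the least loaded cpu among those the task may run on. This is an illustration only: the function name `find_idlest_cpu`, the load values, and the tie-breaking rule (lowest cpu number wins) are assumptions, and the real balancer walks scheduler domains rather than a flat table.

```python
def find_idlest_cpu(cpu_load, allowed):
    """Return the allowed cpu with the lowest load (lowest cpu number on ties).

    cpu_load: {cpu: load} for online cpus; allowed: the task's affinity set.
    """
    candidates = [cpu for cpu in sorted(allowed) if cpu in cpu_load]
    if not candidates:
        raise ValueError("no allowed cpu is online")
    return min(candidates, key=lambda cpu: cpu_load[cpu])
```

With loads `{0: 3, 1: 0, 2: 0, 3: 5}` and all four cpus allowed, cpu 1 is chosen; restricting the affinity set to `{0, 3}` selects cpu 0 instead.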

(25)

Scheduler domain fields

• for fine-tuning the scheduler

• SD_BALANCE_* flags

• various indexes

• balance interval

• imbalance percent

(26)

(H)MP scheduling

• started with the per-entity load tracking patchset

• calculates each sched entity’s load

• based on runnable time

• Linaro is trying to upstream the code for ARM big.LITTLE
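A load "based on runnable time" can be sketched as a decayed average of how much of each recent period an entity was runnable. The code below is a floating-point illustration, not the patchset's fixed-point code; the function name `runnable_fraction` is made up, and the decay constant (a period's contribution halves after 32 periods) is stated here as an assumption.

```python
HALF_LIFE = 32              # assumed: a period's contribution halves after 32 periods
Y = 0.5 ** (1 / HALF_LIFE)  # per-period decay factor, so Y ** 32 is about 0.5


def runnable_fraction(history):
    """Decayed fraction of time an entity was runnable.

    history: per-period runnable fractions in [0, 1], oldest first.
    """
    weighted = 0.0
    total = 0.0
    for runnable in history:
        # Age everything seen so far by one period, then add the new period.
        weighted = weighted * Y + runnable
        total = total * Y + 1.0
    return weighted / total if total else 0.0
```

Recent periods count more than old ones: an entity that was idle and then runnable scores higher than one that was runnable and then idle.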

(27)

Group scheduling

• control groups (cgroups)

• task group (cpu cgroup)

• group scheduling

(28)

Control groups

• way of grouping arbitrary tasks

• each group can implement what it wants

• cpu, memcg, block, device, perf, ...

• usually used for resource management

(29)

Cgroup filesystem

• maintain group hierarchy in a pseudo fs

Using cgroupfs

# mount -t cgroup -o cpu nodev /sys/fs/cgroup/cpu
# cd /sys/fs/cgroup/cpu

# ls

cgroup.clone_children cgroup.event_control cgroup.procs cpu.shares notify_on_release release_agent tasks

# mkdir aaa

# echo $$ > aaa/tasks

(30)
(31)

Scheduling entity

• abstraction of scheduling unit

• traditionally same as a task

• with group scheduling, it can be a task group

(32)

Task group

• abstraction of group of tasks

• viewed as a sched entity

(33)

Task group + SMP

• tasks in a group can be distributed on multiple CPUs

• each cpu sees a portion of a task group

• need to distribute the group’s share as well

• proportional to the number of the group’s tasks on each cpu

• non-group entity (task) uses its share/load by itself
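The proportional split can be written down directly. This sketch assumes, as the slide does, that the share is divided by task count alone; the function name `split_group_shares` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def split_group_shares(shares, tasks_per_cpu):
    """Split a group's shares across cpus in proportion to its tasks there.

    tasks_per_cpu: {cpu: number of the group's runnable tasks on that cpu}.
    """
    total = sum(tasks_per_cpu.values())
    if total == 0:
        return {cpu: Fraction(0) for cpu in tasks_per_cpu}
    return {cpu: Fraction(shares * n, total) for cpu, n in tasks_per_cpu.items()}
```

A group with 1024 shares and three tasks on cpu 0 plus one task on cpu 1 would contribute 768 to cpu 0 and 256 to cpu 1.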

(34)

Example of group scheduling (1)

• please refer to page 30

• all tasks have the same load

• “professors” group

• shares: 1024

• two running tasks: A and B

• “students” group

• shares: 512

• no running task

• “system_tasks” group

• shares: 512

(35)

CPU load in example 1

(36)

Example of group scheduling (2)

• same conditions as example 1, but ...

• in the “students” group

• “Electronics students” group

• shares: 3072

• three running tasks: f, g, h

• “Computers students” group

• shares: 3072

• two running tasks: u, v

• “Other students” group

• shares: 1024
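The way shares nest can be worked through on a single cpu with a short recursive calculation. The sketch below is a model under stated assumptions rather than a reproduction of the slide's figure: every task weighs 1024, groups without running tasks are ignored, the "Other students" group is assumed to have no running task, and the single task `s` in "system_tasks" is hypothetical because the slides do not list that group's tasks.

```python
from fractions import Fraction

TASK_WEIGHT = 1024  # assumed weight of every task (all tasks have the same load)


def has_tasks(node):
    """A node is a task name (str) or a (shares, children) tuple for a group."""
    if isinstance(node, str):
        return True
    return any(has_tasks(child) for child in node[1])


def cpu_fractions(children, budget=Fraction(1)):
    """Divide `budget` among active children by weight; return {task: fraction}."""
    active = [c for c in children if has_tasks(c)]
    total = sum(TASK_WEIGHT if isinstance(c, str) else c[0] for c in active)
    result = {}
    for child in active:
        weight = TASK_WEIGHT if isinstance(child, str) else child[0]
        part = budget * weight / total
        if isinstance(child, str):
            result[child] = part
        else:
            result.update(cpu_fractions(child[1], part))
    return result


# Example 2 from the slides; task "s" in system_tasks is hypothetical.
root = [
    (1024, ["A", "B"]),           # professors
    (512, [                       # students
        (3072, ["f", "g", "h"]),  # Electronics students
        (3072, ["u", "v"]),       # Computers students
        (1024, []),               # Other students (assumed idle)
    ]),
    (512, ["s"]),                 # system_tasks
]
```

Under these assumptions `cpu_fractions(root)` gives A and B 1/4 each, f, g and h 1/24 each, u and v 1/16 each, and s 1/4, which sums to the whole cpu.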

(37)

CPU load in example 2

(38)
(39)

Auto group

• applies when a task sits in the (default) root group

• not used if it moves to another group

• creates a new group for each new session

• new terminal/login

Check autogroup enabled

# cat /proc/sys/kernel/sched_autogroup_enabled
1

(40)

Group scheduling in Android

• root group

• system tasks

• foreground group

• currently running (user-visible) app

• shares 95% of cpu

• background group

• inactive apps

• shares 5% of cpu

(41)

Thanks!

Q & A
