RED HAT CONFIDENTIAL | JEREMY EDER
Performance Tuning and
Analysis of Red Hat
Enterprise Linux 6 and 7
Jeremy Eder
Principal Software Engineer
March 13, 2014
Agenda: Performance Analysis of RHEL6/7
● Performance Engineering Overview
● Performance Analysis Utilities
● Tuned
● NUMA Tuning
Performance Engineering
Overview
Micro-Benchmarks
Applications/Benchmarks
Red Hat Performance Engineering
● Benchmarks – code path coverage
● CPU – linpack, lmbench
● Memory – lmbench, McCalpin Streams
● Disk IO – IOzone, aiostress – SCSI, FC, iSCSI
● Filesystem – IOzone, postmark – ext3/4, XFS, GFS2, Gluster
● Network – Netperf – 10 Gbit, 40 Gbit IB, PCI3
● Bare Metal, RHEL6/7 KVM
Performance Projects / Tooling
● RHEL6.5: “numad”, “tuna”, and “tuned”
● Tuna used to bind IRQs / provide real-time-like isolation
● Profiling challenges
−Data address profiling (cache-to-cache detection),
providing:
• the hottest contended cachelines
• the process names, addresses, PIDs, TIDs causing that contention
• the cpus they ran on,
Performance Optimization
Out-of-the-box
Performance Tuning Automation
Automatic Tuning                     Manual Tuning
Tuned                                N/A
Transparent Hugepages                Static Hugepages
numad (RHEL7: numa_balancing)        NUMA Pinning (numactl)
irqbalance                           IRQ Pinning (tuna)
But... what if we have a problem?
● Automatic tuning is not enough...
● Need to eke out the last X percent
● Need determinism
Overview of Performance
Analysis Utilities
perf
Userspace tool to read CPU
counters and kernel tracepoints
perf list
List counters/tracepoints available
on your system
perf list
grep for something interesting,
e.g. to see what numabalance is
doing?
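A minimal sketch of that grep, assuming perf is installed (tracepoints generally need root to list). The `numa_events` name is hypothetical, and tracepoint names such as `sched:sched_move_numa` or `migrate:mm_migrate_pages` differ between kernel versions, so the filter matches loosely:

```shell
#!/bin/sh
# Hypothetical helper: keep only NUMA-related lines from 'perf list' output.
# It reads stdin, so it works on live output or on a saved copy.
numa_events() {
    # Case-insensitive match on "numa" or on page-migration events
    grep -iE 'numa|migrat'
}

# Usage:
#   perf list 2>/dev/null | numa_events
```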
perf top
System-wide 'top' view of busy
functions
perf record
● Record system-wide (-a)
● A single command
● An existing process (-p)
● Add call-chain recording (-g)
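The four options above combine into a handful of perf record invocations. A sketch under stated assumptions: the `perf_rec` wrapper, its mode names and the `DRYRUN` switch are hypothetical, and the trailing `sleep` only bounds how long system-wide or per-PID sampling runs:

```shell
#!/bin/sh
# Hypothetical wrapper around 'perf record' for the modes listed above.
#   perf_rec system SECONDS       sample every CPU (-a)
#   perf_rec cmd PROG [ARGS...]   sample one command until it exits
#   perf_rec pid PID SECONDS      attach to an existing process (-p)
# Call-chain recording (-g) is always added; samples go to ./perf.data.
# With DRYRUN=1 the command line is printed instead of executed.
perf_rec() {
    if [ $# -lt 1 ]; then
        echo "usage: perf_rec system|cmd|pid ..." >&2
        return 2
    fi
    mode=$1
    shift
    case $mode in
        system) set -- -a -g -- sleep "${1:-10}" ;;
        cmd)    set -- -g -- "$@" ;;
        pid)    set -- -g -p "$1" -- sleep "${2:-10}" ;;
        *)      echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    if [ "${DRYRUN:-0}" = 1 ]; then
        echo perf record "$@"
    else
        perf record "$@"
    fi
}

# Then inspect the samples, e.g.:  perf report --stdio
```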
perf report
[Figure: perf report output, /dev/zero example]
perf diff
perf probe (dynamic tracepoints)
Insert a tracepoint on any function...
Try 'perf probe -F' to list possibilities
Overview of Performance
Analysis Utilities
Performance Co-Pilot (PCP)
(Multi) system-level performance
monitoring and management
pmchart – graphical metric plotting tool
● Can plot myriad performance statistics
● Recording mode allows for replay
  ● e.g. on a different system
● Record in GUI, then open the recording with pmafm:
# pmafm $recording.folio
● Ships with many pre-cooked “views”... for example:
  ● ApacheServers: CPU%/Net/Busy/Idle Apache Servers
  ● Overview: CPU%/Load/IOPS/Net/Memory
Performance Co-Pilot Demo Script
# CPU
/root/pig -s 5
# DISK
dd if=/dev/zero of=/root/2GB count=2048 bs=1M oflag=direct
# NETWORK
netperf -H lab7 -l 5
# MEMORY
/root/pig -m 16384 -l sleep -s 5
[Figure: pmchart Overview view: CPU %, Load Avg, IOPS, Network, Memory Allocated]
collectl mode
[Figure: collectl mode view with IOPS, MEM, NET and CPU panels]
NUMA Tuning
Discovery
Visualize NUMA Topology: lstopo
How can I visualize my system's NUMA topology in Red Hat Enterprise Linux?
https://access.redhat.com/site/solutions/62879
[Figure: lstopo output with PCI devices highlighted]
Visualize NUMA Topology: lscpu
# lscpu
Architecture:          x86_64
...
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
...
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
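Logical cores / HT: CPU(s) counts logical CPUs; Thread(s) per core greater than 1 means Hyper-Threading is enabled (here it is 1).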
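To feed this topology into a pinning command, the per-node CPU list can be read straight from lscpu. A minimal sketch, assuming the `NUMA nodeN CPU(s):` line format shown above; the `node_cpus` helper and `./app` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print one NUMA node's CPU list from lscpu output.
# Usage: lscpu | node_cpus 1      (prints e.g. "8-15")
node_cpus() {
    awk -F: -v want="NUMA node$1 CPU(s)" '
        $1 == want {
            gsub(/[ \t]+/, "", $2)    # strip the column padding
            print $2
        }'
}

# Example uses with a hypothetical ./app:
#   taskset -c "$(lscpu | node_cpus 1)" ./app       # CPUs only
#   numactl --cpunodebind=1 --membind=1 ./app       # CPUs and memory
```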
NUMA Topology and PCI Bus
● Install adapters “close” to the CPU that will run the
performance critical application.
● When BIOS reports locality, irqbalance handles
NUMA/IRQ affinity automatically.
# lstopo-no-graphics | egrep 'NUMA|eth4'
NUMANode L#0 (P#0 144GB)
NUMANode L#1 (P#1 144GB)
Net L#10 "eth4"
eth4 is listed after NUMANode L#1, i.e. it is attached to node 1.
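Besides lstopo, the kernel exposes an adapter's node in sysfs as `/sys/class/net/<iface>/device/numa_node`, where `-1` means no locality was reported. A minimal sketch; the `nic_node` function and the `SYSFS` override (there only so it can be tested against a fake tree) are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report the NUMA node a network adapter hangs off.
# SYSFS defaults to /sys; point it elsewhere for testing.
nic_node() {
    f="${SYSFS:-/sys}/class/net/$1/device/numa_node"
    if [ ! -r "$f" ]; then
        # Virtual interfaces (lo, bridges, ...) have no PCI device entry
        echo "$1: no device/numa_node entry" >&2
        return 1
    fi
    node=$(cat "$f")
    if [ "$node" = "-1" ]; then
        echo "$1: no NUMA locality reported"
    else
        echo "$1: NUMA node $node"
    fi
}

# Usage: nic_node eth4
```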
RHEL NUMA Scheduler
● RHEL6
● numactl, numastat enhancements
● numad – usermode tool, dynamically monitor, auto-tune
● RHEL7 – numabalance
● Enable / Disable
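On RHEL7 the enable/disable switch is the `kernel.numa_balancing` sysctl (0 = off, non-zero = on), also visible as `/proc/sys/kernel/numa_balancing`. A minimal sketch that reports its state; the function name and the `PROCSYS` override (used so the logic can be tested without root) are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: is automatic NUMA balancing on?
# PROCSYS defaults to /proc/sys; override it for testing.
numa_balancing_state() {
    f="${PROCSYS:-/proc/sys}/kernel/numa_balancing"
    if [ ! -r "$f" ]; then
        # Kernel without automatic NUMA balancing: numad is the alternative
        echo "unsupported"
        return 1
    fi
    case $(cat "$f") in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

# To change it (as root):
#   sysctl -w kernel.numa_balancing=0    # disable
#   sysctl -w kernel.numa_balancing=1    # enable
```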
How to manage NUMA manually - Checklist
Checklist                  Tool
Research Topology          lstopo/lscpu
Make a resource plan       cgroups, numactl
Consider I/O               irqbalance/PCI Bus
Virtualization             numatune/numad
NUMA Tools: numastat
Completely rewritten for RHEL6.4
Per-node /proc/meminfo
Backwards compatible
numastat: compatibility mode (old)
# numastat
                           node0           node1
numa_hit                77587739       131990042
numa_miss                      0               0
numa_foreign                   0               0
interleave_hit             30254           30099
local_node              69302710       129511360
other_node               8285029         2478682
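As the kernel accounts them, `local_node` counts pages a node handed out to processes running on that node and `other_node` counts pages it handed out to processes running elsewhere, so local_node / (local_node + other_node) is a quick per-node locality figure. A minimal sketch over the compatibility-mode output above; the `numastat_locality` name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: per-node locality from classic 'numastat' output.
# For each node: local_node / (local_node + other_node).
numastat_locality() {
    awk '
        $1 ~ /^node[0-9]+$/ {                  # header row: node0 node1 ...
            for (i = 1; i <= NF; i++) name[i + 1] = $i
            next
        }
        $1 == "local_node" { for (i = 2; i <= NF; i++) loc[i] = $i }
        $1 == "other_node" { for (i = 2; i <= NF; i++) oth[i] = $i }
        END {
            for (i = 2; (i in loc); i++) {
                total = loc[i] + oth[i]
                printf "%s %.3f\n", name[i], (total > 0 ? loc[i] / total : 0)
            }
        }'
}

# Usage: numastat | numastat_locality
```

Fed the numbers above, it reports roughly 0.893 for node0 and 0.981 for node1.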
numastat: per-node meminfo (new)
# numastat -mczs
                  Node 0  Node 1   Total
                  ------  ------  ------
MemTotal           65491   65536  131027
MemFree            60366   59733  120099
MemUsed             5124    5803   10927
Active              2650    2827    5477
FilePages           2021    3216    5238
Active(file)        1686    2277    3963
Active(anon)         964     551    1515
AnonPages            964     550    1514
Inactive             341     946    1287
Inactive(file)       340     946    1286
Slab                 380     438     818
SReclaimable         208     207     415
SUnreclaim           173     230     403
AnonHugePages        134     236     370
NUMA Tuning
numad
NUMA: Process Scheduler Behavior
● Scheduler distributes load evenly across all cores
● Maintains responsiveness
● Optimizing for CPU utilization
● Tries to use idle CPUs, regardless of where process
memory is located
BUT!
Using remote memory degrades
performance!
How to manage NUMA manually - Checklist
Short Term (RHEL6.4): Userspace solution – numad
Long Term (RHEL7): In-Kernel solution – numabalance
Effect of Automatic NUMA Balancing Software
Before numad:
# numastat -c pig    (default scheduler – non-optimal)
PID           Node 0  Node 1
2578 (pig)      2123   11878
2579 (pig)      1988   12013
2580 (pig)     14000       1
2581 (pig)      1981   12020
After numad:
PID           Node 0  Node 1
2578 (pig)     14000       0
2579 (pig)         0   14000
2580 (pig)     14000       0
2581 (pig)         0   14000
Effect of numad/numabalance
[Figure: Automatic NUMA Balancing – NUMAD. Megabytes per NUMA node (NODE-0-MB, NODE-1-MB) over time (seconds), annotated “numad begins” and “numad done”.]
What is “tuned”?
Tuning profile delivery mechanism
Red Hat ships tuned profiles that
improve performance for many
workloads...hopefully yours!
Tuned: Storage Performance Boost
Tuned: Network Latency Performance Boost
[Figure: Max latency (microseconds) for C6, C3, C1 and C0.]
C-state lock improves determinism, reduces jitter
tuned Profile Summary: RHEL6
Profiles: default, enterprise-storage, virtual-host, virtual-guest, latency-performance, throughput-performance
sched_min_granularity_ns: 4ms, 10ms, 10ms, 10ms, 10ms
sched_wakeup_granularity_ns: 4ms, 15ms, 15ms, 15ms, 15ms
dirty_ratio: 20% RAM, 40%, 10%, 40%, 40%
dirty_background_ratio: 10% RAM, 5%
swappiness: 60, 10, 30
I/O Scheduler (Elevator): CFQ, deadline, deadline, deadline, deadline, deadline
Filesystem Barriers: On, Off, Off, Off
CPU Governor: ondemand, performance, performance, performance
Disk Read-ahead: 4x
Disable THP: Yes
Tuned: Updates for RHEL7
● Installed by default!
● Profiles automatically set based on install type:
  ● Desktop/Workstation: balanced
● Rewritten for maintainability and extensibility
● Configuration is now consolidated into a single tuned.conf file
● Optional hook/callout capability
● Adds concept of inheritance (just like httpd.conf)
● Profiles updated for RHEL7 features and
Tuned: Profile Inheritance (Parents → Children)
● latency-performance → network-latency
● throughput-performance → network-throughput, virtual-host, virtual-guest
● balanced → desktop
● Your own profiles (e.g. Your-DB, Your-Web, Your-Middleware) can be added as further children of these parents
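A child profile on RHEL7 is a directory under `/etc/tuned/` containing a `tuned.conf` whose `[main]` section names its parent with `include=`; anything the child does not override comes from the parent. A minimal sketch that writes one; the helper name, the `TUNED_DIR` override and the two sysctl values are illustrative assumptions, not tuning advice, and `myprofile` simply mirrors the name in the profile list:

```shell
#!/bin/sh
# Hypothetical helper: create a tuned child profile that inherits from
# a parent profile and overrides a couple of sysctls.
# TUNED_DIR defaults to /etc/tuned; override it to write elsewhere.
make_child_profile() {
    name=$1 parent=$2
    dir="${TUNED_DIR:-/etc/tuned}/$name"
    mkdir -p "$dir" || return 1
    cat > "$dir/tuned.conf" <<EOF
[main]
include=$parent

# Illustrative overrides only; everything else comes from $parent.
[sysctl]
vm.swappiness=5
vm.dirty_ratio=30
EOF
}

# Usage (as root):
#   make_child_profile myprofile throughput-performance
#   tuned-adm profile myprofile
```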
Tunable Units Balanced throughput-performance network-throughput
Inherits From/Notes: network-throughput inherits from throughput-performance
sched_min_granularity_ns nanoseconds auto-scaling 10000000
sched_wakeup_granularity_ns nanoseconds 3000000 15000000
dirty_ratio Percent 20 40
dirty_background_ratio Percent 10 10
swappiness Weight 1-100 60 10
I/O Scheduler (Elevator) deadline
Filesystem Barriers Boolean Enabled
CPU Governor ondemand performance
Disk Read-ahead KB 128 4096
Disable THP Boolean Enabled
Energy Perf Bias normal performance
kernel.sched_migration_cost_ns nanoseconds 500000
min_perf_pct (intel_pstate only) Percent auto-scaling 100
tcp_rmem Bytes auto-scaling Max=16777216
tcp_wmem Bytes auto-scaling Max=16777216
udp_mem Pages auto-scaling Max=16777216
Tunable Units Balanced latency-performance network-latency
Inherits From/Notes: network-latency inherits from latency-performance
sched_min_granularity_ns nanoseconds auto-scaling 10000000
sched_wakeup_granularity_ns nanoseconds 3000000 10000000
dirty_ratio percent 20 10
dirty_background_ratio percent 10 3
swappiness Weight 1-100 60 10
I/O Scheduler (Elevator) deadline
Filesystem Barriers Boolean Enabled
CPU Governor ondemand performance
Disable THP Boolean N/A No Yes
CPU C-States N/A Locked @ 1
Energy Perf Bias normal performance
kernel.sched_migration_cost_ns nanoseconds N/A 5000000
min_perf_pct (intel_pstate only) percent 100
net.core.busy_read microseconds 50
net.core.busy_poll microseconds 50
net.ipv4.tcp_fastopen Boolean Enabled
kernel.numa_balancing Boolean Disabled
Tunable Units throughput-performance virtual-host virtual-guest
Inherits From/Notes: virtual-host and virtual-guest inherit from throughput-performance
sched_min_granularity_ns nanoseconds 10000000
sched_wakeup_granularity_ns nanoseconds 15000000
dirty_ratio percent 40 30
dirty_background_ratio percent 10 5 30
swappiness Weight 1-100 10
I/O Scheduler (Elevator)
Filesystem Barriers Boolean
CPU Governor performance
Disk Read-ahead KB 4096
Energy Perf Bias performance
kernel.sched_migration_cost_ns nanoseconds 5000000
min_perf_pct (intel_pstate only) percent 100
RHEL “tuned” package
Available profiles:
- balanced
- desktop
- latency-performance
- myprofile
- network-latency
- network-throughput
- throughput-performance
- virtual-guest
- virtual-host
CPU Tuning: C-states (idle states)
[Figure: C-state Impact on Jitter. Max latency (microseconds) over time (1-sec intervals) for C6, C3, C1 and C0.]
Power Consumption RHEL6 vs RHEL6@C0
● C-state lock increases power draw over “out of the box”
● Use cron to set the latency-performance tuned profile only when necessary
● Or use BUSY_POLL
● Set tuned profile in application init script
Test              Efficiency [Wh] % Diff
Kernel Build      +12.5%
Disk Read         +32.2%
Disk Write        +25.6%
Unpack tar.gz     +23.3%
Active Idle       +41%
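For the init-script approach, a wrapper can switch to latency-performance only while the application runs and then restore the previous profile, confining the extra power draw to that window. A minimal sketch; `run_low_latency`, `current_profile`, the `TUNED_ADM` override and `/usr/local/bin/myapp` are hypothetical, and it relies on `tuned-adm active` printing a `Current active profile: <name>` line:

```shell
#!/bin/sh
# Hypothetical init-script helper: run an application under the
# latency-performance tuned profile, then restore the previous profile.
# TUNED_ADM may point at another binary (e.g. a stub when testing).
TUNED_ADM=${TUNED_ADM:-tuned-adm}

current_profile() {
    # 'tuned-adm active' prints a line like "Current active profile: balanced"
    "$TUNED_ADM" active | sed -n 's/^Current active profile: //p'
}

run_low_latency() {
    prev=$(current_profile)
    "$TUNED_ADM" profile latency-performance || return 1
    "$@"                      # the application itself
    rc=$?
    # Put the previous profile back so the extra power draw stops here
    if [ -n "$prev" ]; then
        "$TUNED_ADM" profile "$prev"
    fi
    return "$rc"
}

# Usage (as root): run_low_latency /usr/local/bin/myapp
```

The cron variant is the same two `tuned-adm profile` calls scheduled around the busy window.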
Default:
pk cor CPU   %c0   GHz   TSC    %c1   %c3   %c6    %c7
 0   0   0  0.24  2.93  2.88   5.72  1.32  0.00  92.72
 0   1   1  2.54  3.03  2.88   3.13  0.15  0.00  94.18
 0   2   2  2.29  3.08  2.88   1.47  0.00  0.00  96.25
 0   3   3  1.75  1.75  2.88   1.21  0.47  0.12  96.44
latency-performance:
pk cor CPU   %c0   GHz   TSC    %c1   %c3   %c6    %c7
 0   0   0  0.00  3.30  2.90 100.00  0.00  0.00   0.00
 0   1   1  0.00  3.30  2.90 100.00  0.00  0.00   0.00
 0   2   2  0.00  3.30  2.90 100.00  0.00  0.00   0.00
 0   3   3  0.00  3.30  2.90 100.00  0.00  0.00   0.00
Turbostat shows P/C-states on Intel CPUs
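When comparing profiles like the two runs above, averaging one residency column per run is often enough. A minimal sketch, assuming a header row that names the columns as shown (`%c0`, `%c1`, ...); column sets differ across turbostat versions and CPUs, so the helper looks columns up by name. The `turbo_avg` name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: average one turbostat column (e.g. %c1, %c7)
# over the per-CPU rows. Lines before the header are ignored, repeated
# header rows are skipped, and if a CPU column exists, rows whose CPU
# field is not a number (summary rows) are skipped too.
turbo_avg() {
    awk -v col="$1" '
        idx == 0 {
            # Still looking for the header row that names the column
            c = 0
            for (i = 1; i <= NF; i++) {
                if ($i == col) idx = i
                if ($i == "CPU") c = i
            }
            if (idx) cpu = c
            next
        }
        $idx == col { next }
        (cpu && $cpu !~ /^[0-9]+$/) { next }
        $idx ~ /^[0-9.]+$/ { sum += $idx; n++ }
        END { if (n) printf "%.2f\n", sum / n }'
}

# Usage (as root; older turbostat writes to stderr):
#   turbostat sleep 5 2>&1 | turbo_avg %c6
```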
Profiling cpuidle and cpufreq
● Fixed upstream cpuidle regression in June
● Future CPUs such as Haswell add more C-states: C8, C9, C10
● Turbostat display is a bit awkward for 40+ cores
● The future direction is to shrink C-state exit latencies
● http://www.breakage.org/2012/11/processor-max_cstat
Helpful Utilities
● Supportability: redhat-support-tool, sos, kdump, perf, psmisc, strace, sysstat, systemtap, trace-cmd, util-linux-ng
● NUMA: hwloc, Intel PCM, numactl, numad, numatop (01.org)
● Power/Tuning: cpupowerutils (R6), kernel-tools (R7), powertop, tuna, tuned
● Networking: dropwatch, ethtool, netsniff-ng (EPEL6), tcpdump, wireshark/tshark
● Storage: blktrace, iotop, iostat
Helpful Links
● Red Hat Low Latency Performance Tuning Guide
● Optimizing RHEL Performance by Tuning IRQ Affinity
● Red Hat Performance Tuning Guide
● Red Hat Virtualization Tuning Guide
● STAC Network I/O SIG
● Finteligent Low Latency Tuning w/KVM
● Perf