• No results found

<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly

N/A
N/A
Protected

Academic year: 2021

Share "<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

<Insert Picture Here>

(2)

Categories of Tracing Tools

Kernel Tracing

Userspace Applications Tracing

Static Tracing

Dynamic Tracing

Sampling

Continuous

(3)

A Brief History

LTT (Linux Trace Toolkit) (1998) replaced in 2006Kprobes (2004):

http://www.ibm.com/developerworks/library/l-kprobes/index.html

Dtrace for Solaris (2003) released on Solaris 10 in November 2005Systemtap for Linux (2005) at OLS

LTTng (LTT Next Generation) (2006): https://lkml.org/lkml/2006/1/9/223Ftrace: (2008): https://lkml.org/lkml/2008/1/3/26

(4)

Currently Active Projects

Ftrace

Perf

Systemtap

(5)

Ftrace

In kernel with portions outside

Main Maintainer: Steven Rostedt (2008)

CLI: /sys/kernel/debug/tracing as interface (control and output)GUI for data visualization (KernelShark)

http://people.redhat.com/srostedt/kernelshark/HTML/

Trace-cmd: user space tool with subcommands

http://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git

Fundamental mechanism: gprof hooks. Adds mcount() call at entry of each

kernel function. Requires kernel to be compiled with -pg option for some functionality

User: Root privilegesNo user space tracing

(6)

Information collected: ftrace plugins

Function: trace entry of all kernel functions

Function-graph: traces on both entry and exit of the functions. It

then provides the ability to draw a graph of function calls

Wakeup: max latency that it takes for the highest priority task to

get scheduled after it has been woken up

Irqsoff: areas that disable interrupts and saves the trace with the

longest max latency

Preemptoff: amount of time for which preemption is disabledPreemptirqsoff: largest time for which irqs and/or preemption is

disabled

Nop: trace nothing

(7)

Ftrace: trace-cmd

User space toolRequires root

Some commands are:

Record: start collection of events into file trace.dat

Can use ftrace plugins (-p) or static kernel events (-e)

Report: display content of trace.dat

Start: start tracing but keep data in kernel ring buffer

Stop: stop collecting data into the kernel ring buffer

Extract: extract data from ring buffer into trace.dat

List: list available plugins (-p) or events (-e)

Many options, very flexible

(8)

Ftrace: Original Interface

# echo sched_switch > current_tracer # echo 1 > tracing_on # sleep 1 # echo 0 > tracing_on # cat trace   # tracer: sched_switch #       TASK­PID   CPU#    TIMESTAMP  FUNCTION #      | |      |      |         |       bash­3997  [01]   240.132281:   3997:120:R   +  4055:120:R       bash­3997  [01]   240.132284:   3997:120:R ==>  4055:120:R        sleep­4055  [01]   240.132371:   4055:120:S ==>  3997:120:R       bash­3997  [01]   240.132454:   3997:120:R   +  4055:120:S       bash­3997  [01]   240.132457:   3997:120:R ==>  4055:120:R        sleep­4055  [01]   240.132460:   4055:120:D ==>  3997:120:R

(9)

Ftrace: Trace-cmd

# trace­cmd record ­e sched_switch sleep 5  # trace­cmd report version = 6  cpus=2  sleep­29568 [000] 2930217.458882:sched_switch:trace­cmd:29568 [120] R ==> evince:28664 [120]  evince­28664 [000] 2930217.458977:sched_switch:evince:28664 [120] S ==> trace­cmd:29568 [120]  sleep­29568 [000] 2930217.459349:sched_switch:sleep:29568 [120] R ==> pulseaudio:1909 [120]  pulseaudio­1909 [000] 2930217.459516:sched_switch:pulseaudio:1909 [120] S ==> sleep:29568 [120]  sleep­29568 [000] 2930217.460463:sched_switch: sleep:29568 [120] S ==> trace­cmd:29566 [120]  trace­cmd­29566 [000] 2930217.460510:sched_switch:trace­cmd:29566 [120] S ==> trace­cmd:29567  [120]  trace­cmd­29567 [000] 2930217.460543:sched_switch:trace­cmd:29567 [120] S ==> swapper:0 [120]  <idle>­0  [000] 2930217.460710:sched_switch:swapper:0 [120] R ==> Xorg:1675 [120] 

(10)
(11)

Ftrace: Dynamic Tracing

Added by Masami Hiramatsu (end of 2009)Real dynamic tracing

Use /sys/kernel/debug/tracing/kprobe_events to control from command

line

Flexible to an extent, but hard to use because of syntaxRead more: Documentation/trace/kprobetrace.txt

LWN article: http://lwn.net/Articles/343766/

echo 'r:myretprobe do_sys_open $retval' >  /sys/kernel/debug/tracing/kprobe_events

(12)

Perf

In kernel (tools/perf directory) userspace tool (started in 2008)Main contributors : Thomas Gleixner, Ingo Molnar, Frederic

Weisebecker, Arnaldo Charvalo De Melo, Peter Zijlstra and others

Started as Hardware performance counters interface, initially called

perf counters.

Originally on top of the hardware-based performance monitoring

interface for Linux by Stephane Eranian. Now available as libpfm version 4 http://perfmon2.sourceforge.net/docs_v4.html

Has grown into all encompassing tracing systemVery active at the moment

No user space tracing yet

Documentation: tools/perf/design.txt and tools/perf/DocumenationPerf wiki: https://perf.wiki.kernel.org/index.php/Main_Page

(13)

Perf subcommands

Perf stat: collects and display events data during a command

execution

Perf record: run a command, store its profiling (sampling mode)

in output file (perf.data)

Perf report: display data previously recorded in output file

(perf.data)

Perf diff: diff between two perf.data files

Perf top: performance counter profile in real time

Perf probe: define dynamic tracepoints (more on this later)

Perf kmem: show statistics on kernel mem (from perf.data), or

record kernel mem usage for a command

Perf trace-perl (trace-python): process previously recorded trace

(14)

Perf subcommands (continued)

Perf list: list available symbolic events types (in exadecimal

encoding)

Perf annotate: display perf.data with assembly or source code of

the command executed

Perf lock: lock statistics

Perf sched: scheduler behaviour

Perf kvm: perform same as above but on a kvm guest

environment with subcommands:

Perf kvm top

Perf kvm record

Perf kvm report

(15)

Perf: Types of events

Perf list shows the list of predefined events that are availableUse one or more of the events in perf stat commands:

perf stat -e instructions sleep 5

Hardware: cpu cycles, instructions, cache references, cache misses, branch

instructions, branch misses, bus cycles. Uses libpfm. CPU model specific.

Hardware cache: for instance L1-dcache-loads, L1-icache-prefetchesHardware breakpoint

Raw: hexadecimal encoding provided by harware vendor

Software: in kernel events. Such as: cpu clock, task clock, page faults, context

switches, cpu migrations, page faults, alignment faults, emulation faults.

Static tracepoints: needs the ftrace event tracer enabled. Events are listed in

/sys/kernel/debug/tracing/events/* (locations defined with TRACE_EVENT() macros)

(16)

Other Control Parameters

Process specific: default mode, perf follows fork() and

pthread_create() calls

Thread specific: use - - no-inherit (or -i) optionSystem wide: use -a option

CPU specific: use -C with list of CPU numbersFor running process: -p <pid>

To find what binaries were running , from a perf.data file, use Elf

unique identifier inserted by linker:

perf buildid-list

User, Kernel, Hypervisor modes: count events when the CPU is

in user, kernel and/or hypervisor mode only. Use with perf stat

(17)

Dynamic Tracing in Perf

Contributed by Masami Hiramatsu

Probes can be inserted on the fly using perf probeDefined on the command line only

Syntax can be simple or complex

Can use dwarf debuginfo to show source code (--line option)

before setting probe (uses elfutils libdw like systemtap), or to use function name, line number, variable name

Abstraction on ftrace kprobes, alleviates usage problems,

supports also same syntax to specify probes

Some options: --add, --del, --line, --list, --dry-run, --verbose,

--force

(18)

Example

Add probe, run a command, and display the data:

# ./perf probe -P 'p:myprobe @fs/read_write.c:285 file buf count'

Adding new event: p:myprobe vfs_read+57 file=%bx buf=%si count=%ax # ./perf list

...

rNNN [raw hardware event descriptor] kprobes:myprobe [Tracepoint event]

# ./perf record -f -e kprobes:myprobe:record -F 1 -a ls #./perf trace

[...]

perf-11445 [000] 95862.048894383: myprobe: (c04bbed5) file=dae15e80 buf=b78b2000 count=400 perf-11445 [000] 95862.049066533: myprobe: (c04bbed5) file=dae15d80 buf=b78b2000 count=400 [...]

(19)

LTTng

External to kernel tree

Maintainer Mathieu Desnoyers

http://lttng.org/

Userspace tracing via markers and tracepoints (static

instrumentation). Need to link with special library.

kprobes/tracepoints support for kernel tracing

GUI to view events (LTTV)

Eclipse integration

Included in some embedded Linux distributions

(20)

Systemtap

External to kernel tree (started in 2005)Multiple maintainers: Red Hat, IBM, othershttp://sourceware.org/systemtap/wiki

Kernel tracing using kprobes

Userspace tracing requires in kernel infrastructure that is not in

mainline (only Red Hat based kernels) (utrace/uprobes)

Dynamic probes and tracepoints

Has well defined rich scripting language GUI available

Safety issues around modifying memory locations (guru mode) Can use debuginfo (seen as drawback)

(21)

Kprobes: Dynamic Tracing

Added to the Linux kernel in 2005

Similar to breakpoints: place breakpoint instruction at

desired code location and execute actions when hit.

Optimizations to kprobes using Jumps instead of

exceptions.

(22)

Events Markers

Static probe points in kernel code

Independent of users (ftrace, perf, Lttng, systemap)Many uses in the kernel (a few hundreds and growing)TRACE_EVENT() macro with 6 arguments

Definitions in include/linux/tracepoint.h

Define characteristics of tracing actions (probe) using

TRACE_EVENT() in include/trace/events/*.h

Mark tracing locations with function call trace_my_event_name()

• e.g. trace_sched_switch() in sched.c and

TRACE_EVENT(sched_switch,...) defined in sched.h

Read the 3-article series:

• http://lwn.net/Articles/379903/

(23)

Utrace

Systemtap provides userspace tracing based on utrace. Code in

RHEL/Fedora.

Utrace: goal was to be replacement for ptrace. Status: development

abandoned. Fedora still ships it, used by a few other tools.

• See mailing list: https://www.redhat.com/mailman/listinfo/utrace-devel

• See code: http://redhat.com/~roland/utrace/

Maintained by Oleg Nesterov

Patch kept up to date with recent kernels, most recent is for kernel 3.1

(24)

Uprobes

Uprobes: handle userspace breakpoints in kernelAnalogous to kprobes

Uses breakpoint instructionPosted many revisions No longer based on utraceNo signals

Multiple tracers allowed

Ptrace replacement eventually?

Still not approved

Latest version of the patchset https://lkml.org/lkml/2011/9/20/123

Integration of previous version of the patch with latest kernel F15 and F16

(25)

Tracing Infrastructure Unification?

Goal: unify the kernel infrastructure for tracing

There are separate ring buffers used by perf and ftrace

• Need more complex interfaces

• Need to be fast

Simpler ring buffers in the kernel (kfifo) for other users also existUnification effort restarted many times:

http://lkml.org/lkml/2010/7/9/368

http://lkml.org/lkml/2010/8/17/485

Some old benchmarks:

http://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-ringbuffer.git

Status quo with multiple buffers is likely to persist. Too many differing needs.Possible unification of ftrace and perf? Proposal of “trace” but stalled:

(26)

Dtrace

A Solaris tool, available since 2005

Ported to Linux by Oracle, initial release October 2011, version 0.1, still WIPCode posted here http://oss.oracle.com/git/

• linux-2.6-dtrace-modules-beta.git

• linux-2.6-dtrace-unbreakable-beta.git

Dtrace provider, syscall provider

Next: static probes, then tackle userspace tracing

Integrated with Oracle Unbreakable Enterprise Kernel, available on UEK

2.6.32 right now (0.1 version), moving to UEK2 version 2.6.39 from now on

x86_64 only

Kernel changes are GPLKernel Module is CDDL

(27)

In Summary

Systemtap: Moving forward, but on its own. Provides userspace tracing.

Resistance by kernel community around architectural design decisions. Integration as is with upstream kernel unlikely?

LTTng: Developed actively, but standalone. Provides userspace tracing. Code in

its own repo. Not in mainline kernel.

Ftrace: continuing, thanks to trace-cmd and kernelshark. Kernel focusedPerf: most active project at this time. Kernel focused

Unification of Buffers in kernel: stalled

Scalability: in progress. See google's comments: http://lwn.net/Articles/464268/Utrace: development stalled/abandoned, maintenance only

Uprobes: hopeful

In kernel interpreter: maybe?

tracepoints/static events: KABI status? Integration of ftrace and perf? trace?

(28)

Thank You

Ananth Mavinakayanahalli (IBM)

Srikar Dronamraju (IBM)

Masami Hiramatsu (Hitachi)

Steven Rostedt (Red Hat)

(29)

The preceding is intended to outline our general

product direction. It is intended for information

purposes only, and may not be incorporated into any

contract. It is not a commitment to deliver any

material, code, or functionality, and should not be

relied upon in making purchasing decisions.

The development, release, and timing of any

features or functionality described for Oracle’s

products remains at the sole discretion of Oracle.

References

Related documents

Metal oxides are commonly used to support polyoxometalates exhibit distinctive functionalities due to large surface area and catalytic properties of

Based on this understanding, this work focuses on the effect of the inclusion of shallow donor dopant such as gallium into the normal cadmium chloride post-growth treatment as

[r]

‘Good’ and higher reporting disclosure levels were obtained for the profit strategy elements and Natural Resources in the strategic architecture baseline by the Mining and

For measures that have building-wide impacts or impacts across a number of systems, engineering modeling using generally accepted public domain software is acceptable to

acts only as an agent in securing hotel, transportation and other travel services and in no event shall be held responsible for the failure by any person or

The proportion of news users in Northern Ireland who said they used BBC One, UTV, the BBC website or app, or other local commercial radio stations to access news.. about their

The paper discusses the implementation of paint draw- ing on computer screen in real time using a vision based method, where touch-enabled device is not required.. Here hand finger