Overview
Perf testing framework
Future work
Performance Testing for GDB
Yao Qi
[email protected]
CodeSourcery/MentorGraphics
2014-09-13
Overview
Perf testing framework
Future work
1
Overview
Introduction
Goals
2
Perf testing framework
Requirements
Design and implementation
Test case example
Overview
Perf testing framework
Future work
Introduction
Goals
Introduction
We need something to test and
measure GDB perf first,
No standard mechanism to show the
performance,
Proposed the perf testing framework
in Aug and Sep 2013,
Write some perf test cases and
patches for perf improvements from
Sep to Nov.
source:accurate-measurement.jpg
“if you cannot measure it,
you cannot improve it.”
Lord Kelvin
Thank Doug and Sanimir
for the comments and
patient review.
Overview
Perf testing framework
Future work
Introduction
Goals
Goals
Know the new perf testing
framework in GDB and how
to write perf test cases on
top of it,
Demonstrate how do they
show the perf improvements,
0 5 10 15 20 25 30 35 40 10000 20000 30000 40000 CPUT time (s)
Number of single steps original
patched
Show how do we use it to
track GDB performance
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Requirements
easy to write test case for a certain
aspect, i.e. symbol lookup or single
step
basic measurements are provided (such
as cpu time and memory usage), but
test specific measurement can be
added too.
source:shelter1.jpg
both native debugging
and remote debugging
are supported,
test case has warm up
optionally,
save results in different
formats,
the executable in test
case can be
pre-compiled,
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Design and implementation
Use
DejaGNU
to invoke compiler and start GDB
(and/or GDBserver),
GDB loads a python script in which some
operations are performed,
The framework collects the performance data
and save them in the desired format,
Detect performance regressions,
compile test case
gdb.perf/FOO.c
lib/gdb.exp
lib/perftest.exp
start gdb and
gdbserver
load program
gdb.perf/FOO.exp
run
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Test case example: skip-prologue.exp
1
PerfTest:: a ss e mb l e {
2
if
{ [ gdb_compile
" $srcdir / $subdir / $srcfile "
$ { binfile }
executable
{ debug } ] ! =
" "
} {
3
return
-1
4
}
5
6
return
0
7
} {
8
clean_restart $binfile
9
10
if
![ runto_main ] {
11
fail
" Can ’ t run to main "
12
return
-1
13
}
14
} {
15
global
SKIP_PROLO G UE _ CO U NT
16
17
set
test
" run "
18
gdb_test_mu lt i pl e
" python SkipPrologue \( $SKIP_PRO L OG U E_ C O UN T \) .run () "
$test {
19
-re
" Breakpoint $decimal at \[^\ n \] * \ n "
{
20
# GDB prints some messages on breakpoint creation.
21
# Consume the output to avoid internal buffer full.
22
exp_continue
23
}
24
-re
" .*$gdb_prompt $ "
{
25
pass $test
26
}
27
}
28
}
COMPILE
START UP
RUN
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Test case example: skip-prologue.py
1
from
perftest
import
perftest
2
3
class
SkipPrologue ( perftest . T e s t C a s e W i t h B a s i c M e a s u r e m e n t s) :
4
def
__init__ ( self , count ):
5
super
( SkipPrologue , self ). __init__ (
" skip - prologue "
)
6
self . count = count
7
8
def
_test ( self ):
9
for
_
in range
(1 , self . count ):
10
# Insert breakpoints on function f1 and f2 .
11
bp1 = gdb . Breakpoint (
" f1 "
)
12
bp2 = gdb . Breakpoint (
" f2 "
)
13
# Remove them .
14
bp1 . delete ()
15
bp2 . delete ()
16
17
def
warm_up ( self ):
18
self . _test ()
19
Extend from
TestCaseWithBasicMeasurements
Test
Measure the test
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 9 10 TestCase name measure exuecte testcase () run () warm up () TestCaseWithBasicMeasurements SkipPrologue warm up () execute test ()
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Cache code access for skip-prologue
Before
After
read some bytes target parse the instruction is it part of prologue? stop yes no RSP read somebytes target dcache target
parse the instruction is it part of prologue? stop yes no read RSP 0 1 2 3 4 5 6 1000 2000 3000 4000
CPU time (s)
SKIP_PROLOGUE_COUNT
Original Patchednote that only x86 and x86 64 make use that.
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Cache code access for disassemble
0 10000 20000 30000 40000 50000 60000 70000 80000 90000 evaluate_subexp_standardhandle_inferior_event c_parse_internal
Number of m packets
Original Patched 0 0.5 1 1.5 2 2.5 3 3.5evaluate_subexp_standard handle_inferior_event c_parse_internal
CPU time (s)
Original Patched
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
My workflow of improving perf
Examine the RSP log and locate
some unnecessary packets
(gdb) si
Sending packet: $qTStatus#49...
Sending packet: $Z0,80483c7,1#b4...Packet received: OK Sending packet: $Z0,4ce5b6b0,1#6e...Packet received: OK Sending packet: $QPassSignals:...
Sending packet: $vCont;s:p1b15.1b15;c#20...
Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $mbfffef40,40#c0...
Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $z0,80483c7,1#d4...Packet received: OK
Sending packet: $z0,4ce5b6b0,1#8e...Packet received: OK
Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01
Investigate and hack the code,
Show the perf improved by the
patch,
0 5 10 15 20 25 30 35 40 10000 20000 30000 40000 CPUT time (s)Number of single steps
original patched
Typical profling doesn’t help
-
2.91%
gdbserver
libc-2.14.90.so
[.] __strcmp_sse4_2
__strcmp_sse4_2
-
1.92%
gdbserver
libc-2.14.90.so
[.] vfprintf
- vfprintf
- 74.54% _IO_vsprintf
52.60% handle_tracepoint_query
+ 47.40% 0x1
+ 24.26% _IO_vsnprintf
-
1.39%
gdbserver
gdbserver
[.] find_regno
+ find_regno
Overview
Perf testing framework
Future work
Requirements
Design and implementation
Test case example
Perf improvements
Read multiple cache lines in dcache xfer memory
Before
After
300000 400000 500000 600000 Original Patched 15 20 25 30 35 Original PatchedOverview
Perf testing framework
Future work
Track and compare GDB performance
Still have no way to track and compare perf data
A web application to monitor and analyze the performance,
Written by python and easy to set up,
It doesn’t meet our needs perfectly, and it is not actively
maintained,
Overview
Perf testing framework
Future work
Observations and future work
observations
Command
thread
doesn’t scale,
0 5 10 15 20 25 30 35 40 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 CPU Time (s) Number of threads Switch thread by command thread N
Actual
1static int
2 d o _ c a p t u r e d _ t h r e a d _ s e l e c t (structui_out * uiout ,void* tidstr )
3{
4 ...
5 /* Since the current thread may have changed , see if there is any 6 exited thread we can now delete . */
7 prune_thread s ();
8
dcache is conservative for multi-threaded code,
1 enumtarget_xfer _ st a tu s
2 d c a c h e _ r e a d _ m e m o r y _ p a r t i a l (structtarget_ops * ops , DCACHE * dcache ,
3 CORE_ADDR memaddr , gdb_byte * myaddr ,
4 ULONGEST len , ULONGEST * xfered_len )
5 {
6 /* If this is a different inferior from what we ’ ve recorded , 7 flush the cache . */
8
9 if( ! ptid_equal ( inferior_ptid , dcache - > ptid ) )
10 {
11 dcache_inva li d at e ( dcache );
12 dcache - > ptid = inferior_ptid ;
13 } Remove? 10 15 20 25 30 CPU Time (s)
Disassemble across threads
Current(remote) Patched(remote)