Performance Testing for GDB

(1)

Overview

Perf testing framework

Future work

Performance Testing for GDB

Yao Qi

[email protected]

CodeSourcery/MentorGraphics

2014-09-13

(2)

Overview

Perf testing framework

Future work

1 Overview

Introduction

Goals

2 Perf testing framework

Requirements

Design and implementation

Test case example

(3)

Overview

Perf testing framework

Future work

Introduction

Goals

Introduction

We need something to test and

measure GDB perf first,

No standard mechanism to show the

performance,

Proposed the perf testing framework

in Aug and Sep 2013,

Write some perf test cases and

patches for perf improvements from

Sep to Nov.

source:accurate-measurement.jpg

“if you cannot measure it,

you cannot improve it.”

Lord Kelvin

Thank Doug and Sanimir

for the comments and

patient review.

(4)

Overview

Perf testing framework

Future work

Introduction

Goals

Know the new perf testing

framework in GDB and how

to write perf test cases on

top of it,

Demonstrate how do they

show the perf improvements,

0 5 10 15 20 25 30 35 40 10000 20000 30000 40000 CPUT time (s)

Number of single steps original

patched

Show how do we use it to

track GDB performance

(5)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Requirements

easy to write test case for a certain

aspect, i.e. symbol lookup or single

step

basic measurements are provided (such

as cpu time and memory usage), but

test specific measurement can be

added too.

source:shelter1.jpg

both native debugging

and remote debugging

are supported,

test case has warm up

optionally,

save results in different

formats,

the executable in test

case can be

pre-compiled,

(6)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Design and implementation

Use

DejaGNU

to invoke compiler and start GDB

(and/or GDBserver),

GDB loads a python script in which some

operations are performed,

The framework collects the performance data

and save them in the desired format,

Detect performance regressions,

compile test case

gdb.perf/FOO.c

lib/gdb.exp

lib/perftest.exp

start gdb and

gdbserver

load program

gdb.perf/FOO.exp

run

(7)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Test case example: skip-prologue.exp

1 PerfTest:: a ss e mb l e {

2 if

{ [ gdb_compile

" $srcdir / $subdir / $srcfile "

$ { binfile }

executable

{ debug } ] ! =

" "

} {

3 return

-1

4 }

5

6 return

0

7 } {

8 clean_restart $binfile

9

10 if

![ runto_main ] {

11 fail

" Can ’ t run to main "

12 return

-1

13 }

14 } {

15 global

SKIP_PROLO G UE _ CO U NT

16

17 set

test

" run "

18 gdb_test_mu lt i pl e

" python SkipPrologue $ $SKIP_PRO L OG U E_ C O UN T $ .run () "

$test {

19 -re

" Breakpoint $decimal at \[^\ n \] * \ n "

{

20 # GDB prints some messages on breakpoint creation.

21 # Consume the output to avoid internal buffer full.

22 exp_continue

23 }

24 -re

" .*$gdb_prompt $ "

{

25 pass $test

26 }

27 }

28 }

COMPILE

START UP

RUN

(8)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Test case example: skip-prologue.py

1 from

perftest

import

perftest

2

3 class

SkipPrologue ( perftest . T e s t C a s e W i t h B a s i c M e a s u r e m e n t s) :

4 def

init ( self , count ):

5 super

( SkipPrologue , self ). init (

" skip - prologue "

)

6 self . count = count

7

8 def

_test ( self ):

9 for

_

in range

(1 , self . count ):

10 # Insert breakpoints on function f1 and f2 .

11 bp1 = gdb . Breakpoint (

" f1 "

)

12 bp2 = gdb . Breakpoint (

" f2 "

)

13 # Remove them .

14 bp1 . delete ()

15 bp2 . delete ()

16

17 def

warm_up ( self ):

18 self . _test ()

19 Extend from

TestCaseWithBasicMeasurements

Test

Measure the test

0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 9 10 _TestCase name measure exuecte testcase () run () warm up () TestCaseWithBasicMeasurements SkipPrologue warm up () execute test ()

(9)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Cache code access for skip-prologue

Before

After

read some bytes target parse the instruction is it part of prologue? stop yes no RSP read some

bytes target dcache target

parse the instruction is it part of prologue? stop yes no read RSP 0 1 2 3 4 5 6 1000 2000 3000 4000

CPU time (s)

SKIP_PROLOGUE_COUNT

Original Patched

note that only x86 and x86 64 make use that.

(10)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Cache code access for disassemble

0 10000 20000 30000 40000 50000 60000 70000 80000 90000 evaluate_subexp_standardhandle_inferior_event c_parse_internal

Number of m packets

Original Patched 0 0.5 1 1.5 2 2.5 3 3.5

evaluate_subexp_standard handle_inferior_event c_parse_internal

CPU time (s)

Original Patched

(11)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

My workflow of improving perf

Examine the RSP log and locate

some unnecessary packets

(gdb) si

Sending packet: $qTStatus#49...

Sending packet: $Z0,80483c7,1#b4...Packet received: OK Sending packet: $Z0,4ce5b6b0,1#6e...Packet received: OK Sending packet: $QPassSignals:...

Sending packet: $vCont;s:p1b15.1b15;c#20...

Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $mbfffef40,40#c0...

Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $z0,80483c7,1#d4...Packet received: OK

Sending packet: $z0,4ce5b6b0,1#8e...Packet received: OK

Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01 Sending packet: $qXfer:traceframe-info:read::0,fff#0b...Packet received: E01

Investigate and hack the code,

Show the perf improved by the

patch,

0 5 10 15 20 25 30 35 40 10000 20000 30000 40000 CPUT time (s)

Number of single steps

original patched

Typical profling doesn’t help

-

2.91%

gdbserver

libc-2.14.90.so

[.] __strcmp_sse4_2

__strcmp_sse4_2

-

1.92%

gdbserver

libc-2.14.90.so

[.] vfprintf

- vfprintf

- 74.54% _IO_vsprintf

52.60% handle_tracepoint_query

+ 47.40% 0x1

+ 24.26% _IO_vsnprintf

-

1.39%

gdbserver

[.] find_regno

+ find_regno

(12)

Overview

Perf testing framework

Future work

Requirements

Design and implementation

Test case example

Perf improvements

Read multiple cache lines in dcache xfer memory

Before

After

300000 400000 500000 600000 Original Patched 15 20 25 30 35 Original Patched

(13)

Overview

Perf testing framework

Future work

Track and compare GDB performance

Still have no way to track and compare perf data

A web application to monitor and analyze the performance,

Written by python and easy to set up,

It doesn’t meet our needs perfectly, and it is not actively

maintained,

(14)

Overview

Perf testing framework

Future work

Observations and future work

observations

Command

thread

doesn’t scale,

0 5 10 15 20 25 30 35 40 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 CPU Time (s) Number of threads Switch thread by command thread N

Actual

1static int

2 d o _ c a p t u r e d _ t h r e a d _ s e l e c t (structui_out * uiout ,void* tidstr )

3{

4 ...

5 /* Since the current thread may have changed , see if there is any 6 exited thread we can now delete . */

7 prune_thread s ();

8

dcache is conservative for multi-threaded code,

1 enumtarget_xfer _ st a tu s

2 d c a c h e _ r e a d _ m e m o r y _ p a r t i a l (structtarget_ops * ops , DCACHE * dcache ,

3 CORE_ADDR memaddr , gdb_byte * myaddr ,

4 ULONGEST len , ULONGEST * xfered_len )

5 {

6 /* If this is a different inferior from what we ’ ve recorded , 7 flush the cache . */

8

9 if( ! ptid_equal ( inferior_ptid , dcache - > ptid ) )

10 {

11 dcache_inva li d at e ( dcache );

12 dcache - > ptid = inferior_ptid ;

13 } Remove? 10 15 20 25 30 CPU Time (s)

Disassemble across threads

Current(remote) Patched(remote)

(15)

Overview

Perf testing framework

Future work

Framework