Software profiling for an FPGA-based CPU core.


University of Windsor

Scholarship at UWindsor

Electronic Theses and Dissertations

Theses, Dissertations, and Major Papers

1-1-2007

Software profiling for an FPGA-based CPU core.

Jason G. Tong

University of Windsor

Follow this and additional works at: https://scholar.uwindsor.ca/etd

Recommended Citation

Tong, Jason G., "Software profiling for an FPGA-based CPU core." (2007). Electronic Theses and Dissertations. 6963.

https://scholar.uwindsor.ca/etd/6963

Software Profiling For An FPGA-Based CPU Core

by

Jason G. Tong

A Thesis

Submitted to the Faculty of Graduate Studies and Research through Electrical and Computer Engineering in Partial Fulfillment of the Requirements for the Degree of Master of Applied Science at the University of Windsor

Library and Archives Canada

Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-34988-5

NOTICE:

The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.

© 2007 Jason G. Tong

Abstract

Acknowledgments

The day is finally here! I have successfully completed one of my life-time achievements, a Master’s Degree in Electrical and Computer Engineering. There are several people who I would like to acknowledge in this dissertation.

First and foremost, I would like to give my sincerest thanks to my supervisor, Professor Mohammed A. S. Khalid. I am indebted for his invaluable advice, encouragement, moral support and guidance throughout my Master’s research. His professionalism, knowledge and expertise will never be forgotten. I will always value our research discussions that we had over the last few years. Next, I would like to thank my thesis committee members: Professors Narayan Kar and Nader Zamani, for their invaluable suggestions, and support throughout this project. Special thanks to Professor Huapeng Wu for his valuable time chairing the M.A.Sc. Defence. Also, I would like to give a very special thank you to Lesley Shannon and Blair Fort from University of Toronto for their invaluable advice, time and assistance in this project.

... Kevin Biswas, Harb Abdul-Hamid, Matthew Meloche, Ashkan Hosseinzadeh Namin, Mitra Mirhassani, Mahzad Azarmehr, Ali Bidabadi, Josh Daniel and Natalia Salgo for their friendship and support during my stay.

My heartfelt thanks go out to Lisa Price, for her editing skills and great patience in revising a majority of my papers over the years, including this thesis. Also for her continuing friendship and support she has given to me.

To Ralene Marcoccia, the Altera University Program, and the Altera Corporation, I thank you for providing the Nios II Development FPGA boards and the full licenses for the development software.

Finally and most importantly, I am indebted to my parents Yim and May Tong for their everlasting love, understanding and moral support throughout my Master’s journey. This voyage would not have been easy to embark on without them.

Contents

Abstract
Dedication
Acknowledgments
List of Figures
List of Tables
List of Abbreviations

1 Introduction
  1.1 Profiling Tools for FPGA-Based Embedded Systems
  1.2 Thesis Objectives
  1.3 Thesis Organization

2 Design Methodologies for Embedded Systems
  2.1 Traditional Design Methodology
  2.2 Hardware-Software Co-Design Methodology
  2.3 Function-Architecture Co-Design
  2.4 Platform-Based Design
  2.5 Summary

3 Profiling Tools
  3.1 Profiling Tools and the Software Profiling Methodology
  3.2 Software Based Profiling (SBP) Tools
    3.2.1 Instruction Set Simulator
    3.2.2 GNU’s gprof
    3.2.3 Intel’s VTune
    3.2.4 Summary of SBP Tools
  3.3 Software Based Memory Profilers (SBMP)
    3.3.1 Valgrind
    3.3.2 Rational Software’s Purify
    3.3.3 Summary of SBMP Tools
  3.4 Hardware-Counter Based Profiling (HCBP) Tools
    3.4.1 Hardware Counters Approach
    3.4.2 Page Migration Approach
    3.4.3 Desktop Processor Profiling Counters
    3.4.4 Summary of HCBP Tools
  3.5 FPGA-Based Profiling (FPGA-BP) Tools
    3.5.1 SnoopP
    3.5.2 Frequent Loop Analysis Tool (FLAT)
    3.5.3 WoODSToCK
  3.6 Qualitative Comparison of Profiling Tools

  ...
  4.2 Airwolf Profiling Counter
  4.3 Airwolf’s Software Drivers
  4.4 Summary

5 Experimental Results
  5.1 The Nios II Profiling Environment
  5.2 FPGA Development Board and Design CAD Tools
  5.3 Profiling Tools Setting
  5.4 Profiling Software Benchmarks
  5.5 Comparison of Profiled Results
    5.5.1 Dijkstra
    5.5.2 Fibo_Matrix_Mult
    5.5.3 Game of Life
    5.5.4 BitCount
    5.5.5 Dhrystone
    5.5.6 Summary
  5.6 Performance Overhead Analysis
    5.6.1 Dijkstra
    5.6.2 Fibo_Matrix_Mult
    5.6.3 Game of Life
    5.6.4 BitCount
    5.6.5 Dhrystone
    5.6.6 Summary

6 Conclusions and Future Work
  6.1 Research Contributions
  6.2 Future Work

VITA AUCTORIS


List of Figures

2.1 The Traditional Design Methodology
2.2 The Hardware-Software Co-Design Methodology
2.3 The Function-Architecture Co-Design Methodology
2.4 Design Space Exploration
2.5 Platform Based Design
3.1 Software Profiling Methodology
3.2 Profiling Tool Classification
3.3 Rational Purify’s Memory Profiling Colour Code
3.4 Page Migration Approach
3.5 SnoopP’s Profiling Architecture
3.6 SnoopP’s Profiling Counter
3.7 Frequent Loop Analysis Tool
3.8 Watching Over Data Streaming on Computing Element Links
4.1 The Airwolf Profiler
4.2 The Airwolf Profiling Counter
4.3 An Example of Airwolf’s Software Drivers


List of Tables

3.1 Comparison of Profiling Tools
5.1 Nios Development Board Components
5.2 Benchmark Descriptions
5.3 Profiled Results for Dijkstra
5.4 Profiled Results for Fibo_Matrix_Mult
5.5 Profiled Results for Game of Life using Nios2-gprof
5.6 Profiled Results for Game of Life using Airwolf
5.7 Profiled Results for BitCount using Nios2-gprof
5.8 Profiled Results for BitCount using Airwolf
5.9 Profiled Results for Dhrystone
5.10 Performance Overhead Analysis for Dijkstra
5.11 Performance Overhead Analysis for Fibo_Matrix_Mult
5.12 Performance Overhead Analysis for Game of Life
5.13 Performance Overhead Analysis for BitCount

List of Abbreviations

Abbreviation  Definition
AIB           Avalon Interface Bus
AMD           Advanced Micro Devices
API           Advanced Programming Interface
ASIC          Application Specific Integrated Circuit
CAD           Computer Aided Design
CE            Counter Enable
CPE           Computing Processor Element
CPU           Central Processing Unit
D$            Data Cache
DSP           Digital Signal Processing
DTLB          Data Translation Lookaside Buffer
FCN           Function
FLAT          Frequent Loop Analysis Tool
FLC           Frequent Loop Cache
FPGA          Field Programmable Gate Array
FPGA-BP       Field Programmable Gate Array-Based Profiling
FSL           Fast Simplex Link
HCBP          Hardware-Counter Based Profiling
HCEL          Hits Counter Enable Line
HDL           Hardware Description Language
I$            Instruction Cache
IC            Integrated Circuit
IDE           Integrated Development Environment
IP            Intellectual Property
ISS           Instruction Set Simulator
LSW           Least Significant Word
MSW           Most Significant Word
Nios-II-PE    Nios II Profiling Environment
PAPI          Performance Advanced Programming Interface
PBD           Platform Based Design
PC            Program Counter
PMA           Page Migration Approach
RAM           Random Access Memory
SBB           Short Backwards Branch
SBMP          Software-Based Memory Profiling
SBP           Software-Based Profiling
SOF           Static-RAM Object File
SOPC          System On Programmable Chip
SOT           Sampling Over Time
SPM           Software Profiling Methodology
TCE           Time Counter Enable
TCEL          Time Counter Enable Line
UART          Universal Asynchronous Receiver Transmitter

Chapter 1

Introduction

1.1 Profiling Tools for FPGA-Based Embedded Systems

In recent years, embedded systems have grown in popularity due to their increased processing power. They are prevalent in our modern society, where these systems are used in a wide variety of applications ranging from the performance of simple everyday tasks to product manufacturing. Commonly used embedded systems include cell phones, electronic pagers, television remote controls, digital cameras, personal data assistants, DVD players, HDTV and much more. In large industrial companies, embedded systems are used as programmable controllers for manufacturing, nuclear power generation, transportation and medical instrumentation.

... input/output interfaces. 99% of the current microprocessors produced are used for embedded systems applications [67]. The purpose of these systems is to execute software application code that is stored in memory. Due to the limitations in the hardware resources of these systems, they cannot be as flexible and reprogrammable as a desktop computer. Desktop computers are general-purpose computers containing various hardware components which can be programmed to implement any application or function. Embedded systems have dedicated and limited hardware resources that are designed specifically for performing the tasks that are specific to a particular application.

The continuing advancement and innovation of embedded systems, resulting in increased complexity, has led designers to significantly intensify their development efforts during the design process. In addition to the added difficulty, consumer demand for these devices continues to rise, which has helped to shorten design cycles and tighten time-to-market deadlines. The design of embedded systems is becoming significantly more difficult without the use of computer-aided design (CAD) tools that can effectively partition the components into the hardware or software domains. There are other added constraints that designers must consider, such as the reduction of Integrated Circuit (IC) chip area and system power consumption while sustaining maximum performance [70].

... determine which components are the performance bottlenecks and which components meet the timing requirements.

Profiling tools are CAD tools that measure the performance of a software or hardware system based on the time needed to perform certain functions. They also help in detecting problems such as communication bottlenecks in a system, cache misses and other important measurable performance metrics. They allow early detection of performance bottlenecks and help the embedded system designers to optimize their designs in order to meet system performance constraints [60, 51].

There are several profiling tools available today that can be used to profile software code running on a target processor. These tools provide different profiling information that can benefit embedded designers so that they can optimize the software code. Despite the variety of profiling tools that are available, many of them use different measuring techniques that can potentially provide inaccurate feedback. The majority of the profiling tools used are software-based, which require the designer to compile their software programs to include instrumentation code at the binary level. This is not desirable since it is very intrusive to the original program and can cause unpredictable execution behaviour of the software. Sampling techniques are also used in a variety of profiling tools and can provide varying results depending on the sampling frequency of the profiler. This consequently affects the accuracy of the profiled results, which can potentially lead embedded designers to implement the wrong software functions in hardware. It is imperative that profiling tools minimally disturb the original program binary file and have the ability to provide accurate results in order to create an effective hardware-software partition of the embedded system.
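
The effect of sampling frequency on accuracy can be illustrated with a small simulation. The sketch below is not taken from any of the profilers discussed in this thesis; the function names, the trace format and the tick values are assumptions made purely for illustration. It compares exact per-function tick counts with the estimate produced by a profiler that inspects the running function once every `period` ticks.

```python
from collections import Counter

# Hypothetical execution trace: (function name, ticks spent in it), repeated
# as a loop would repeat it. "long" truly dominates the run time.
trace = [("short", 3), ("long", 17)] * 5


def exact_ticks(trace):
    """Sum the true number of ticks spent in each function."""
    totals = Counter()
    for name, ticks in trace:
        totals[name] += ticks
    return dict(totals)


def sampled_ticks(trace, period):
    """Estimate per-function ticks by sampling once every `period` ticks.

    Each sample credits a full period to whichever function is running
    at the sampling instant (tick 0, period, 2 * period, ...).
    """
    estimates = Counter()
    next_sample = 0
    start = 0
    for name, ticks in trace:
        end = start + ticks
        # Credit every sampling instant that falls inside [start, end).
        while next_sample < end:
            estimates[name] += period
            next_sample += period
        start = end
    return dict(estimates)
```

With a period of 20 ticks, which happens to match the loop length of this hypothetical trace, every sample lands in `short`, and the function that actually dominates execution is never observed. Sampling every tick reproduces the exact counts, at the cost of far more frequent interruption.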

1.2 Thesis Objectives


1. To create a minimally intrusive profiler that does not require the insertion of instrumentation code into a software program’s binary file. This profiler should be able to accurately measure the amount of time a software function has taken to execute on a target processor.

2. To use the developed profiler to profile several common software benchmarks running on an FPGA-based soft-core processor system.

To satisfy the first objective, a Field Programmable Gate Array (FPGA)-based on-chip profiler, called the Airwolf profiler, was developed. This profiler contains twenty profiling counters that can measure the performance of up to twenty different software functions. It is minimally intrusive and collects profiling information by measuring the number of system clock ticks that each software function takes to execute on a soft-core processor. For the second objective, a profiling environment was developed that is based on the Altera Nios II soft-core processor [32]. This environment was used to execute several software benchmarks and to profile them using the Airwolf profiler. The results obtained using the Airwolf profiler were compared against those obtained from the GNU’s gprof [36] software-based profiler. The results collected using the Airwolf profiler show a significant increase in profiling accuracy over those of the gprof profiler.
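
As a rough behavioural illustration of counter-based profiling, the following sketch models a bank of twenty counters that accumulate clock ticks while enabled. Only the number of counters and the use of system clock ticks come from the text above; the class, its method names and the 50 MHz clock frequency are hypothetical and do not describe the actual Airwolf hardware or its software drivers.

```python
NUM_COUNTERS = 20  # the profiler is stated to contain twenty counters


class CounterBank:
    """Software model of a bank of tick counters, one per profiled function."""

    def __init__(self, num_counters=NUM_COUNTERS):
        self.ticks = [0] * num_counters
        self.enabled = [False] * num_counters

    def enable(self, index):
        """Start counting for the function assigned to this counter."""
        self.enabled[index] = True

    def disable(self, index):
        """Stop counting; the accumulated tick count is retained."""
        self.enabled[index] = False

    def clock(self, cycles=1):
        """Advance the system clock; every enabled counter accumulates."""
        for i, is_on in enumerate(self.enabled):
            if is_on:
                self.ticks[i] += cycles


def ticks_to_seconds(ticks, clock_hz):
    """Convert a tick count to seconds for a given clock frequency."""
    return ticks / clock_hz
```

In this model, a function that keeps its counter enabled for 120 ticks on an assumed 50 MHz clock would be reported as taking 120 / 50,000,000 s, that is 2.4 microseconds. The resolution of such a measurement is one clock period, independent of any sampling interval.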

1.3 Thesis Organization

Chapter 2

Design Methodologies for Embedded Systems

The development of embedded systems involves the combination of hardware and software components together to meet the requirements of a specific application. There are several design methodologies that can help embedded designers to coordinate different design tasks in order to meet tight time-to-market deadlines and to fulfill all the specified performance requirements. These are:

• Traditional Design Methodology

• Hardware-Software Co-Design

• Function-Architecture Co-Design

In this chapter a brief introduction to these methodologies is provided so that the reader is able to understand the different approaches that are used in the design of embedded systems.

2.1 Traditional Design Methodology

The Traditional Design Methodology [39] is a set of design approaches that are commonly used in the automotive industry [54]. This approach usually follows a waterfall model of system development [69].

Figure 2.1 shows a flowchart for the traditional methodology for the design of embedded systems. Initially a set of specifications is defined which describes the system’s operations and the performance requirements that the system must satisfy. After this initial step, the hardware and software components are designed independently. Usually a group of hardware and software engineers develop these components distant from each other and at different times during the design process. There is very minimal interaction between these groups as the hardware architecture is being built and the software code is written. It is usually presumed that these components can be combined together without any incompatibility issues. Once the components are fully synthesized and functional, the system’s components are integrated together, during what is known as the system integration stage. Following this stage is the verification and prototyping stage, during which designers verify and test the prototype. Lastly, the design is sent for fabrication.

This design methodology is suitable for smaller and simpler designs, but is not feasible for complex embedded systems. It introduces many problems and causes ...

[Figure 2.1: The Traditional Design Methodology. Flowchart boxes: System Specification; Hardware Model; Software Model; Hardware Components; Software Components; Hardware Synthesis; Code Generation; System Integration; System Verification; Fabrication.]


2. D E S IG N M E T H O D O L O G IE S F O R E M B E D D E D S Y S T E M S

com ponents, w hich were b u ilt in a different design tim e-fram e, rely on an u n su p p o rte d h ard w are fu n ctio n (or a rc h ite c tu re ) in ord er to execute properly. U sing th e tra d itio n a l design m ethodology, designers use m ost of th e ir tim e on in terface debugging ta sk s an d have less tim e for o th e r im p o rta n t ta sk s such as overall system verification, te stin g an d o p tim iza tio n . In som e cases, m any design ite ra tio n s m ay be re q u ired to m eet design goals an d c o n stra in ts. T h is m ay lead to m issed tim e -to -m a rk e t deadlines an d design obsolescence.

2.2 Hardware-Software Co-Design Methodology

The Co-Design methodology for embedded systems enables the hardware and software components to be designed concurrently. It allows designers to find an efficient and balanced hardware-software partition of the components of the embedded system, while maintaining compatibility. This methodology ensures the hardware platform is able to execute the software components (or supporting application software) and has the necessary computing resources for proper execution.

One of the main advantages of the co-design methodology is the ability to detect compatibility issues early in the design. When problems are detected earlier in the design stage, they are easier and less expensive to fix [55].

There are many proposed co-design methodologies and the majority of them have focused on the implementation of digital signal processing algorithms or embedded systems design [25]. Most of these methodologies share common design stages that eventually lead to a system that performs a specific function or application. A flowchart for the hardware-software co-design methodology is shown in Figure 2.2 [30].

[Figure 2.2: The Hardware-Software Co-Design Methodology. Flowchart boxes: System Specification; Partitioning; Hardware Synthesis; Interface Synthesis; Software Generation; Verification; Acceptable? (YES/NO); END.]


2. D E S IG N M E T H O D O L O G IE S F O R E M B E D D E D S Y S T E M S

hardw are-softw are p a rtitio n in g stag e d eterm in es w hich functions o r co m p o n en ts are to be placed in th e h ard w are d o m ain a n d w hich are h an d led by softw are. T h e th ird a n d m o st im p o rta n t sta g e is synthesis, in w hich th e hardw are, softw are an d interface c o m p o n en ts are synthesized concurrently. H ard w are an d softw are engineers co n tin u ­ ously in te ra c t w ith each o th e r by exchanging perfo rm an ce in fo rm a tio n a n d fu n ctio n al re q u irem en ts of all th e com ponents. T h is ensures t h a t th e h ard w are a rc h ite c tu re an d th e softw are p ro g ram can execute to g e th e r w ith o u t difficulty. F inally, th e verification stag e determ in es if th e designed sy stem m eets th e design req u irem en ts an d p erfo r­ m ance co n stra in ts. If th e design fails to m eet th e requirem ents, ite ra tio n is needed, w hich leads back to th e review of th e specifications. T h e n u m b er of ite ra tio n s de­ p en d s on th e design size a n d com plexity. T h e hardw are-softw are co-design process helps m inim ize th e n u m b er of ite ra tio n s an d th e design tim e re q u ired to im plem ent a com plete system .

2.3 Function-Architecture Co-Design

Another methodology used in the design of embedded systems is the Function-Architecture Co-Design [54]. In this approach the embedded system is built at a higher abstraction level, which allows designers to focus on the design of the system’s functionality without having to be concerned with how that functionality is implemented. The hardware-software co-design puts emphasis on interfacing the hardware and software components together. This process, however, does not focus on the design tasks at the system-level, which often leads to extended time in reaching the target design.

[Figure 2.3: The Function-Architecture Co-Design Methodology. Flowchart boxes: Function Description; Architectural Description; Mapping; Performance Simulation; Communication Refinement; HW/SW Co-Design; Prototype Verification; Acceptable? (YES/NO); Fabrication.]


2. D E S IG N M E T H O D O L O G IE S F O R E M B E D D E D S Y S T E M S

• F u n ctio n al D efinition: th e specific fu n ctio n or ap p lica tio n t h a t th e sy stem will provide

• A rc h ite c tu re D efinition: a c a n d id a te a rc h ite c tu re t h a t co n tain s all th e IP cores, h ard w are an d softw are co m ponents t h a t im plem ent th e specified function.

Following th e specification stag e is th e m ap p in g stage, in w hich th e sy ste m ’s functions are p a rtitio n e d an d d irectly m a p p e d to th e chosen sy stem arc h ite c tu re . In ad d itio n , th e h a rd w a re a n d softw are interfaces are also m ap p ed o n to th e a r c h ite c tu re ’s resources. T h e p erfo rm an ce sim u latio n stag e is n ex t, which involves ca rry in g o u t all of th e sim u latio n s for each co m ponent, an d perfo rm in g various verification techniques on th e m a p p e d h ard w are a n d softw are com ponents. T his is done to verify th a t th e m ap p ed system is fu n c tio n a l a n d is cap ab le of m eetin g th e design co n stra in ts. T h e nex t stag e is th e com m u n icatio n refinem ent stag e, in which th e in ter-c o m m u n ica tio n betw een th e various sy stem fu nctions are defined [57], O nce th ese m odelling stages are com pleted, th e system design goes in to a h ardw are-softw are co-design synthesis w here th e com p o n en ts of th e system are synthesized tog eth er. A t th is stage, th e p ro to ty p e of th e em b ed d ed system h as been co n stru c te d , an d th e n goes in to th e verification stage. F u rth e r design ite ra tio n s are p erform ed if th e sy stem does no t m eet th e specified design requirem ents. F a b ric a tio n is th e last stage, in which th e verified system is ta k e n a n d sent off for p ro d u c tio n .

2.4 Platform-Based Design

The Platform-Based Design (PBD) methodology emphasizes the use of reusable IP cores as a platform upon which designs are constructed [54]. This involves a design- ...

[Figure 2.4: Design Space Exploration. Labels: Application Space; Platform Specification; Platform Design-Space Exploration; Architectural Space.]

... middle approach” [26] as shown in Figure 2.4 [56].

[Figure 2.5: Platform Based Design. Flowchart boxes: Platform Derivation; Platform Instance; Application; Mapping/Compiling; Simulation; Performance Numbers.]


2. D E S IG N M E T H O D O L O G IE S F O R E M B E D D E D S Y S T E M S

F ig u re 2.5 describes th e P la tfo rm -B a se d D esign m ethodology of em b ed d ed sys­ tem s [54]. T h e designer s ta r ts by specifying th e p la tfo rm arc h ite c tu re , which o u tlin e s th e p erfo rm an ce co n stra in ts an d th e fu n c tio n a lity of th e en tire sy stem based on th e in te n d e d ap p licatio n . T h is includes th e specification of th e required speed of th e m i­ croprocessor, m em ory capacity, cache m em ories, etc. From th e defined req u irem en ts, a p la tfo rm in sta n c e is m ad e w hich co n tain s all of th e in s ta n tia te d h ard w are com po­ n e n ts a n d softw are p ro g ram s required to execute a specific ap p lica tio n . Following th is stag e is th e m ap p in g a n d com piling of th e system , w hich includes h ard w are p la tfo rm synthesis an d th e p ro g ram code generatio n . N ext, th e com piled system goes in to th e sim u la tio n stage, w hen designers te s t all of th e co m ponents to ensure th a t th e y are fu n c tio n in g correctly an d m eetin g th e design co n stra in ts. B ased on th e perfo rm an ce n u m b ers re trie v ed from th e sim u latio n stag e, th e designer can d eterm in e if th e system has satisfied th e specified requ irem en ts. If no t, th e system goes into a n o th e r design ite ra tio n cycle u n til it h as fully m e t all of th e co n stra in ts.

2.5 Summary

Chapter 3

Profiling Tools

There is a wide variety of profiling tools available that measure different performance metrics and retrieve diverse sets of profiling information. Section 3.1 discusses profiling tools and a proposed software profiling methodology for the design of embedded systems. The subsequent sub-sections classify the different types of profilers available as follows: Software-Based Profiling (SBP) Tools, Software-Based Memory Profiling (SBMP) Tools, Hardware-Counter Based Profiling (HCBP) Tools and FPGA-Based Profiling (FPGA-BP) Tools. In each of these categories, a brief survey of these existing tools is presented.

3.1 Profiling Tools and the Software Profiling Methodology

... specification stage in which all the functionalities of the system and the supporting architecture to implement that function are defined. Usually embedded designers have two options for the initial implementation of their design based on the specifications. For the first option, the embedded system can be entirely implemented in hardware while moving certain components to the software domain, depending on the execution performance of those functions [42]. The second option is to have the entire embedded system implemented in software [35] and invoke a profiler that measures the performance of the software program. The information provided by the profiler is used by designers to help them choose which software functions are more desirable for hardware implementation.

Profiling tools are used to measure the performance of a program that is running on the target processor of an embedded hardware platform. These tools provide useful information for designers so that they can identify certain software hot-spots that are causing a performance bottleneck. Designers can choose either to optimize the software code to alleviate the performance issue or implement the computationally intensive function in the hardware domain in order to achieve a speed-up in performance of the entire system. It is imperative that profilers provide accurate results and properly detect these hot-spots. This can lead to the creation of a balanced partition between the hardware and software components. The quality of the embedded system is entirely dependent on the efficiency and the effectiveness of the hardware-software partition of the system’s components. The application of profiling tools has led to a proposed Software Profiling Methodology (SPM) as shown in Figure 3.1 [60].


3. PROFILING TOOLS

[Figure 3.1: Software Profiling Methodology (SPM). Flowchart blocks: Software Implementation of Embedded System; Profiling; Functional Verification; decision "Meet Requirements?" (YES / NO); Software Modification; Hardware Implementation; END.]


return feedback and performance statistics to the designer. The designer analyzes the results and determines if the software code meets the specified performance constraints. That same profiling information can be used by an automated hardware-software partitioning CAD tool [63]. If the system fails to meet the requirements, the designer will try to optimize the code or move certain computationally intensive functions into the hardware domain as a hardware accelerator. If necessary, the entire methodology starts again until the designer is satisfied with the performance.

Existing profiling tools offer different types of profiling capabilities and support different programming languages. C/C++ profiling tools are common, but there are also tools available that can profile programs written in Java [38, 37]. The Mentor Seamless Co-verification environment provides a profiler that takes a design written in SystemC [13] and measures its performance based on processor utilization, cache efficiency, memory hotspots, bus utilization and bus master contention [12].

Currently, there are many different kinds of profiling tools that are used to retrieve a variety of profiled information about a program. The most common is function-level profiling, which measures the amount of time needed for a function to execute on the processor. Another type is memory-level profiling, which determines which function, data variable type or instruction is causing memory related problems: excessive memory references, cache misses, heavy pointer dereferencing, branching and looping instructions. Figure 3.2 depicts the proposed classification of profiling tools. There are three main categories: software-based, hardware-based and FPGA-based. We describe each of these in detail in the following sections.

3.2 Software Based Profiling (SBP) Tools


Profiling Tools
Software-Based: GNU's gprof, Valgrind, VTune Performance Analyzer
Hardware-Based: Hardware Counters, Page Migration Approach
FPGA-Based: SnoopP, Frequent Loop Analysis Tool, WOoDSToCK, Airwolf

Figure 3.2: Profiling Tool Classification

insertion of instrumentation code. Simulations take place in virtual environments that simulate the behaviour of a microprocessor while the software code runs on it. The insertion of instrumentation code allows an SBP tool to attach itself to the binary file and collect performance information during the execution of a program on the processor. In this section, we describe an ISS, GNU's gprof [36] and Intel's [11] VTune [45].

3.2.1 Instruction Set Simulator

Instruction Set Simulators (ISS) are one of the SBP tools used for profiling software code running in a simulated environment. One popular ISS is the SimpleScalar Toolset, which simulates application code running on the SimpleScalar computer architecture [29]. The advantage of using an ISS for profiling is that the designer is able to view the entire data flow movement inside the microprocessor's registers during the simulation. It keeps track of all of the execution processes, the current instruction in execution, data manipulations, cache accesses and other reportable events. This does not require the software code to be modified, therefore intrusiveness to the binary file is non-existent.


on-a-chip designs since they can be very slow to simulate [51]. This could lead to very inaccurate profiles of the execution times of each function. Simulations can have varying times to complete depending on the complexity of the software code. It may take several hours to run an entire simulation which may only cover a few seconds of real-time, thus misrepresenting the entire execution time. Due to the increasing complexity of embedded systems designs, constructing complex models of the system's components and other external environments may not be possible.

3.2.2 GNU's gprof

gprof [36] is an open-source profiling tool that is used on Linux [5] and Unix [6] workstations to profile C and C++ application code. It provides two types of profiled outputs: the flat profile and the call graph. The flat profile is a report of how much time the program spent in each function and the number of times that function was called. The call graph displays each function, its calling function and other functions called within that function. To utilize this profiler, the designer is required to compile the code with profiling instrumentation enabled (the -pg option in GCC). This option inserts additional instrumentation code into the binary executable file, as required by gprof.

During program execution, gprof utilizes the inserted instrumentation code to monitor the performance of the program running on the Central Processing Unit (CPU). The instrumentation code allows gprof to count the precise number of function calls and generate the appropriate number of interrupts to sample the program counter (PC) of the CPU. It is capable of generating a profile that accurately counts the number of times each function has been called; however, the reported execution time of each function may be somewhat inaccurate.


executed on the processor. Based on this value, gprof increments the execution time counter of the function that is currently executing by its sampling period. This can create inaccurate timing results for each function called and the execution time of the entire program [68]. The accuracy of the profiled execution time is entirely dependent on the sampling frequency of the PC.
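The effect of the sampling period on accuracy can be illustrated with a small simulation. The sketch below is not gprof's implementation; it assumes a hypothetical execution trace given as (function name, cycles) pairs and a sampler that credits one full sampling period to whichever function is running at each sample point. The names exact_profile and sampled_profile are made up for this illustration.

```python
def exact_profile(timeline):
    """Sum the true number of cycles spent in each function."""
    exact = {}
    for name, cycles in timeline:
        exact[name] = exact.get(name, 0) + cycles
    return exact


def sampled_profile(timeline, period):
    """Estimate time per function by sampling the 'PC' every `period` cycles.

    Each sample credits a whole sampling period to the function that
    happens to be executing at that cycle (assumed sampling model).
    """
    # Expand the trace so that index i holds the function running at cycle i.
    per_cycle = [name for name, cycles in timeline for _ in range(cycles)]
    estimate = {}
    for cycle in range(0, len(per_cycle), period):
        name = per_cycle[cycle]
        estimate[name] = estimate.get(name, 0) + period
    return estimate


# Hypothetical trace: 20 cycles in total.
timeline = [("main", 3), ("short_fn", 2), ("long_fn", 15)]
print(exact_profile(timeline))
print(sampled_profile(timeline, period=10))
```

With a coarse period, a short function that falls between two samples is never observed, while the functions that are observed are credited with whole periods. Shrinking the period brings the estimate closer to the exact profile at the cost of more sampling interrupts.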

3.2.3 Intel's VTune

Intel's VTune Performance Analyzer is an SBP tool that profiles C/C++ code that is executed on Intel processors [45, 47, 11]. The VTune analyzer features three profiling modes: Sampling Over Time (SOT), Call Graph and Counter Monitor. Each of these modes is discussed briefly in the following paragraphs.

There are two sampling methods that are used by VTune: Sampling Over Time (SOT) and the Pause/Resume Application Programming Interface (API) [24]. SOT profiles the software code and shows the performance results specified "over time" of each thread, function and instruction until the program has completed execution. In addition, it can detect when the processor is in an idle state. This allows designers to optimize the application code to execute other threads when the processor is not executing any threads.

Sampling using the Pause/Resume API [24] requires the user to insert certain functions into various parts of the software code. Such functions are VTPause(), VTResume(), VTPauseSampling(), VTResumeSampling(), CMPause() and CMResume(). These functions are used to select certain code regions for profiling.


3.2.4 Summary of SBP Tools

The use of the sampling technique in common software-based profilers helps to reduce the run-time overhead during profiling. Nevertheless, this can produce inaccurate profiled results which can potentially create a sub-optimal partition of the embedded system. The use of an ISS can also produce inaccurate results since simulators are only as good as the system model that is being simulated. Also, the simulation time may not accurately match the actual run-time execution of the program. Certain SBP tools require the designer to link their program with instrumentation code which is inserted at the binary level. This can lead to an excessive number of interrupt calls which may cause unpredictable behaviour of the software code running on the embedded hardware platform. Additionally, the instrumentation code can lead to an increase in code size and may potentially change the behaviour and the performance of the software system.

3.3 Software Based Memory Profilers (SBMP)


to retrieve instructions from its own cache memory. This is due to mispredicted branching instructions, heavily nested dereferencing of memory pointers and looping instructions.

Memory profilers are needed to detect the problems listed above, so that they can be resolved by the designer. They provide detailed information about which function call in the application code is producing memory leaks, cache misses and high memory referencing. Reducing the number of memory accesses can improve performance and minimize performance overhead [50]. In this section, the following memory profiling tools are described: Valgrind [14] and Purify [44].

3.3.1 Valgrind

Valgrind is an open-source profiling tool for Linux systems, distributed under the GNU General Public License [14]. This profiler can check reads and writes to memory, as well as the allocation and freeing of memory through mechanisms such as the C++ operators new and delete. The major advantage of Valgrind is its capability for cache memory profiling. It simulates the CPU's Level 1 data and instruction caches as well as the Level 2 cache. Valgrind determines a cache hit count for every line of the program that is being traced and analyzed. It can profile applications of various sizes, from small functions to complex application systems.
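The idea of attributing simulated cache hits and misses to individual source lines can be sketched as follows. This is not Valgrind's cache model; it assumes a simple direct-mapped cache and a hypothetical trace of (source line, address) pairs, and the names DirectMappedCache and profile_cache_trace are invented for the example.

```python
class DirectMappedCache:
    """A minimal direct-mapped cache model (illustrative assumption)."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one stored tag per cache line

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag
        return False


def profile_cache_trace(trace, cache):
    """Attribute hits and misses to the source line that issued each access."""
    per_line = {}
    for source_line, address in trace:
        stats = per_line.setdefault(source_line, {"hits": 0, "misses": 0})
        if cache.access(address):
            stats["hits"] += 1
        else:
            stats["misses"] += 1
    return per_line


# Hypothetical trace: (source line number, memory address).
trace = [(10, 0), (10, 4), (11, 64), (10, 0)]
print(profile_cache_trace(trace, DirectMappedCache(num_lines=4, line_size=16)))
```

In this trace, addresses 0 and 64 map to the same cache line, so the access from line 11 evicts the data used by line 10 and the final access misses again. A per-line report of this kind is what lets a designer locate the statement responsible for the conflict.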


[State diagram. Red memory and blue memory: illegal to read, write or free. Yellow memory: allocated, uninitialized memory; legal to write or free, but illegal to read. Green memory: legal to read and write (or free if allocated by malloc). Transitions labelled Malloc, Write and Free.]

Figure 3.3: Rational Purify's Memory Profiling Colour Code

3.3.2 Rational Software's Purify

Rational Software's Purify [44] is a software-based memory profiler that can be used on Microsoft Windows [7], Unix [6] and Linux [5] operating environments. The tool helps in solving memory problems and determines the exact code location that is causing the error. The kinds of problems the program detects are memory leaks, reading and writing beyond the bounds of an array in memory, attempts to free unallocated memory and using uninitialized memory. Purify uses a four colour scheme to represent memory problems as shown in Figure 3.3 [44]: red, yellow, green and blue.


is allocated by the program. It is not legal to read from it because it is not initialized or does not contain any valid data. The green zone is memory that has been written into and is available for reading and writing data. The blue zone is memory that is freed by the program and is no longer accessible.
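The colour scheme can be read as a small state machine over a block of memory. The following sketch is only an interpretation of Figure 3.3, not Purify's implementation; the class TrackedBlock and its error messages are hypothetical.

```python
RED, YELLOW, GREEN, BLUE = "red", "yellow", "green", "blue"


class TrackedBlock:
    """Tracks the colour of one memory block and records illegal operations."""

    def __init__(self):
        self.colour = RED  # unallocated memory
        self.errors = []

    def malloc(self):
        self.colour = YELLOW  # allocated but uninitialized

    def write(self):
        if self.colour in (RED, BLUE):
            self.errors.append("write to " + self.colour + " memory")
        else:
            self.colour = GREEN  # now holds valid data

    def read(self):
        if self.colour != GREEN:
            self.errors.append("read of " + self.colour + " memory")

    def free(self):
        if self.colour in (RED, BLUE):
            self.errors.append("free of " + self.colour + " memory")
        else:
            self.colour = BLUE  # freed, no longer accessible


block = TrackedBlock()
block.malloc()
block.read()   # uninitialized read: reported
block.write()
block.read()   # legal
block.free()
block.free()   # double free: reported
print(block.errors)
```

Each illegal transition is recorded rather than raised, which mirrors how a memory profiler reports the offending operation and lets the program continue.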

3.3.3 Summary of SBMP Tools

Memory profiling tools are essential for detecting memory leaks, allocation and de-allocation errors, as well as instructions that cause cache read/write misses. They give the designer more options to analyze and optimize the software code prior to porting it to the target architecture. In addition, they provide more detailed performance information than function-level profilers. The problem with the current memory profiling tools is that they use the same measuring techniques as SBP tools. Some memory profilers require that the designer include instrumentation code in their application at the binary level. This introduces the issue of large code sizes and runtime overhead. Some memory profilers use sampling techniques to sample the hardware counters and retrieve their values. As discussed in the case of software-based profiling, sampling techniques can produce inaccurate results and may potentially mislead the designer to improperly implement certain functions in the hardware or software domains.

3.4 Hardware-Counter Based Profiling (HCBP) Tools

Hardware-Counter Based Profiling (HCBP) tools utilize on-chip hardware counters that are available on advanced processors such as Sun UltraSPARC [64], Intel Pentium


accesses, cache misses, pipeline stalls, types of instructions executed, etc. HCBP tools do not require the use of instrumentation code since these counters are designed to collect performance information of the software program. In addition, very little performance overhead is introduced during runtime execution.

Accessing these counters requires a unique instruction. The Performance Application Programming Interface (PAPI) [28] provides users with a high level interface to access these counters and supports many different processors [62]. Intel's VTune counter monitor provides an interface for accessing and utilizing the hardware counters to profile application code executing on Pentium-based processors [46].

3.4.1 Hardware Counters Approach

Itzkowitz et al. from Sun Microsystems have described a software profiling tool that utilizes the hardware counters in an UltraSPARC-III microprocessor [48]. Originally this profiling tool was built as an extension of the Sun ONE Studio [4] compilers and performance tools, which are used for measuring the performance of software code. These hardware counters are included in the architecture and contain different types of event counters such as Instructions Completed, Instruction-cache (I$) Misses, Data-cache (D$) Read Misses, Data-translation-lookaside-buffer (DTLB) Misses, External-cache (E$) References, E$ Read Misses, E$ Stall Cycles, and many others.


rect address value, due to the possibility that the previous instruction was a branch call. Instead of relying on the value of the PC, the profiling tool tries to find the proper values in other registers to calculate the effective address of the instruction that caused the overflow event. It is not guaranteed to succeed in finding the address since the value of the registers may have changed once other overflow signals have been delivered to other hardware counters. Despite these drawbacks, the tool has managed to find the proper instruction 99% of the time. The MCF benchmark was profiled and the feedback provided enabled a 20% performance improvement.

3.4.2 Page Migration Approach

The Page Migration Approach (PMA), developed by Tikir et al., utilizes hardware counters for profiling memory with memory page-migrating capabilities [65]. The profiler was used on a multi-processor system based on Sun's Sun Fire Server as shown in Figure 3.4. Each system board contained several processors and memory. The Sun Fire Link hardware counters are used to sample the frequency with which each processor "touches" a page of memory that is remote from the on-board local memory hardware. At a certain number of counts specified by the user for remote touching of memory pages, the profiler halts the execution. It then migrates that particular memory page to the processor that accesses it most frequently for read and write operations. PMA has demonstrated 90% speed improvement when certain memory pages are placed closest to the processor that requires data from that page.
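The migration policy described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the PMA implementation: remote touches are given as a hypothetical list of (processor, page) pairs, a page is migrated once its remote-touch count reaches a user-specified threshold, and the function name migrate_pages is invented for the example.

```python
def migrate_pages(touches, home, threshold):
    """Migrate pages towards the processor that touches them remotely most often.

    touches:   list of (processor, page) accesses in time order (hypothetical trace)
    home:      dict mapping page -> processor whose local memory holds the page;
               updated in place when a page migrates
    threshold: number of remote touches of a page that triggers a migration
    Returns a list of (page, old_home, new_home) migrations.
    """
    remote_counts = {}  # page -> {processor: number of remote touches}
    migrations = []
    for cpu, page in touches:
        if home[page] == cpu:
            continue  # local access, nothing to count
        per_page = remote_counts.setdefault(page, {})
        per_page[cpu] = per_page.get(cpu, 0) + 1
        if sum(per_page.values()) >= threshold:
            # Move the page to its most frequent remote user and restart counting.
            target = max(per_page, key=per_page.get)
            migrations.append((page, home[page], target))
            home[page] = target
            remote_counts[page] = {}
    return migrations


home = {0: 1}  # page 0 initially lives with processor 1
touches = [(2, 0), (3, 0), (2, 0), (1, 0), (2, 0)]
print(migrate_pages(touches, home, threshold=3))
print(home)
```

After the third remote touch, page 0 moves to processor 2, which accounted for two of the three touches, so that processor's later accesses become local.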

3.4.3 Desktop Processor Profiling Counters


[Block diagram labels: Software Application; Sun Fire Link Hardware Counters; Page Migration; Processors #1 to #4, each with Physical Memory (Physical Page shown for Processors #1 and #3).]

Figure 3.4: Page Migration Approach

certain event occurs or they can measure the duration of an event that is currently taking place on the processor. Intel Pentium microprocessors also contain a set of performance hardware counters [46]. They are also event or timing driven and are accessible through Intel's VTune [45] profiling tool.

3.4.4 Summary of HCBP Tools

Using hardware counters for profiling software code is beneficial since it does not introduce any instrumentation code, leaving the compiled application code untouched. Additionally, they do not add any performance overhead since the data collection of these counters occurs during runtime execution of the software. However, there are drawbacks when using HCBP tools. First, some HCBP tools may require the user to reconfigure and reprogram the counters to detect different events, which can lead to the addition of certain functions at the source code level. Secondly, they use the sampling method to sample the hardware counters, which leads back to the problems that were introduced with SBP tools. Thirdly, handling of interrupts affects the gathered data since the interrupt service routines (ISR) used add to the


monitoring events [62].

3.5 FPGA-Based Profiling (FPGA-BP) Tools

FPGAs are user programmable integrated circuits that offer a reasonably high level of integration, negligible prototyping cost and instantaneous manufacturing capability. Riding on Moore's law [52], FPGAs have grown in logic capacity while maintaining an affordable cost for many applications [31]. Embedded development kits that utilize FPGAs contain an abundance of on-board resources such as clock multipliers, fast memory chips, math co-processors, etc. This makes them an attractive alternative for rapid prototyping of large embedded system designs due to the reconfigurability and flexibility that they offer to the designer.

Researchers today are developing profiling tools that can help designers working on embedded system designs using FPGAs. The two major FPGA vendors, Altera Corporation [17] and Xilinx Incorporated [72], provide embedded system development kits which use the Nios II [32] and MicroBlaze [73] soft-core processors, respectively. These soft-core processors are instantiated on the FPGA and used as basic building blocks for designing embedded systems [66].

FPGA-based profiling (FPGA-BP) tools also utilize these soft-core processors for profiling. In FPGA-BP tools, the designer executes the software on the soft-core processor and collects the performance data provided by the on-chip profiling hardware. These tools have provided improved results compared to the profiling tools described earlier. They keep latency and performance overhead at a minimum, because they are non-intrusive and require negligible instrumentation. They do not use the sampling technique and require very minimal processor computation. These


[Block diagram labels: MicroBlaze CPU; PC; System Clock; Segment Counters (up to #N).]

Figure 3.5: SnoopP's Profiling Architecture

3.5.1 SnoopP

SnoopP [60] is an on-chip function-level profiler that was implemented on the Xilinx Virtex-II 2000 FPGA board. This board is used to implement designs based on the Xilinx MicroBlaze [73] soft processor. The on-chip profiler utilizes the MicroBlaze as a target processor. SnoopP uses a hardware profiling architecture that is non-intrusive to the code, such that any additional instructions, commands or other flags are not necessary. Figure 3.5 depicts the hardware architecture for the SnoopP profiler.

SnoopP consists of a variable number of segment counters that are user specified and define the address of instructions to be analyzed. The number of segment counters


[Block diagram labels: PC; comparators "PC >= low address" and "PC <= high address"; Counter EN; 64-bit Time Counter; SYSTEM CLOCK; OUTPUT BUS; READ BUS.]

Figure 3.6: SnoopP's Profiling Counter

address is in the range of memory addresses in which the binary code corresponding to the function resides. This is determined by the comparators inside each segment counter. If this condition is true, the comparator sends an enable signal to the hardware counter, which utilizes the processor's system clock to count the number of clock cycles the function has used. This gives the designer the precise number of clock cycles that the particular function needs to execute on the processor. SnoopP's and gprof's results were compared, and it was shown that SnoopP was significantly more accurate. Additionally, SnoopP does not slow down the performance of either the software or the profiling process.
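The behaviour of a segment counter can be modelled in software to make the mechanism concrete. The sketch below is a behavioural model only, not SnoopP's hardware description; it assumes a hypothetical PC trace with one entry per clock cycle, and the names SegmentCounter and count_segment_cycles are made up for the illustration.

```python
class SegmentCounter:
    """Counts clock cycles during which the PC lies inside [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.cycles = 0

    def clock(self, pc):
        # The two comparators enable the counter only inside the address range.
        if self.low <= pc <= self.high:
            self.cycles += 1


def count_segment_cycles(pc_trace, segments):
    """pc_trace: PC value at every clock cycle; segments: name -> (low, high)."""
    counters = {name: SegmentCounter(low, high) for name, (low, high) in segments.items()}
    for pc in pc_trace:
        for counter in counters.values():
            counter.clock(pc)
    return {name: counter.cycles for name, counter in counters.items()}


# Hypothetical address ranges for two functions.
segments = {"f": (0x100, 0x1FF), "g": (0x200, 0x2FF)}
pc_trace = [0x100, 0x104, 0x104, 0x200, 0x204, 0x108, 0x400]
print(count_segment_cycles(pc_trace, segments))
```

Because every clock cycle is observed, a stalled instruction (the repeated 0x104 above) is charged to its function for each cycle it occupies, which is the source of the cycle-accurate counts that sampling cannot provide.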

3.5.2 Frequent Loop Analysis Tool (FLAT)

Frequent Loop Analysis Tool (FLAT) is a tool that detects functions in software that heavily use loops [40]. In most cases, loops use 90% of the execution time while constituting only 10% of the entire software code. FLAT searches for these critical regions and records the execution frequency of each loop-intensive function into a cache-like hardware architecture that is implemented in an FPGA. A block diagram of the FLAT architecture is shown in Figure 3.7.

A loop instruction is typically denoted as a Short Backwards Branch


[Block diagram labels: MicroBlaze CPU; Frequent Loop Cache Controller; Frequent Loop Cache; signals Read/Write, Address, Data, SBB and Increment.]

Figure 3.7: Frequent Loop Analysis Tool

value of the SBB is a negative address offset. The Frequent Loop Cache (FLC) stores the execution frequency of each loop function at the index memory location that is based on the SBB value. A cache controller, called the Frequent Loop Cache Controller, keeps the data updated with the latest values. FLAT does not require the use of instrumentation code or any sampling techniques. Nonetheless, the accuracy of the loop detection relies on the size of the on-chip cache in the FPGA.
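A rough software model shows why the cache size limits accuracy. The sketch below is not the FLAT design; it assumes a hypothetical trace of taken branches as (branch address, target address) pairs, treats a branch as an SBB when its offset is negative and small, and indexes a direct-mapped table by the low bits of the branch address. The indexing scheme, the offset limit and the name count_frequent_loops are all assumptions made for illustration.

```python
def count_frequent_loops(branches, max_offset=256, cache_entries=16):
    """Count executions of short backwards branches in a small direct-mapped table.

    branches:      list of (branch_address, target_address) for taken branches
    max_offset:    largest backwards distance (bytes) still considered 'short'
    cache_entries: number of entries in the modelled frequent loop cache
    Returns {branch_address: count} for the entries that survive in the table.
    """
    cache = [None] * cache_entries  # each entry is [branch_address, count]
    for address, target in branches:
        offset = target - address
        if not (-max_offset <= offset < 0):
            continue  # forward or long branch: not treated as a loop
        index = (address >> 2) % cache_entries  # assumed indexing scheme
        entry = cache[index]
        if entry is not None and entry[0] == address:
            entry[1] += 1
        else:
            cache[index] = [address, 1]  # conflict: the old loop's count is lost
    return {entry[0]: entry[1] for entry in cache if entry is not None}


trace = [(0x120, 0x100)] * 5 + [(0x200, 0x300), (0x400, 0x100)]
print(count_frequent_loops(trace))
```

With a single-entry table, two loops that alternate keep evicting each other and neither accumulates a meaningful count, which illustrates the dependence on cache size noted above.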

3.5.3 WOoDSToCK

WOoDSToCK [59] (Watches Over Data STreaming On Computing element linKs) is a profiling tool that monitors the communication dataflow between Computing Processor Elements (CPEs) as shown in Figure 3.8.

WOoDSToCK monitors the data flow between each CPE by adding monitors to the circuit which run in real time. The data link between each element of the system is created by Fast Simplex Links (FSLs) [71], available in Xilinx's MicroBlaze [73] soft-core processor. FSLs allow streaming and buffering of data between the hardware components of the system. The profiler utilizes the links to measure the stream of data between each CPE. It measures the number of run-time execution clock cycles to see which CPE is stalled or starved for data.
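One way to picture such a link monitor is as a pair of counters attached to a FIFO. The sketch below is an assumed behavioural model, not the WOoDSToCK circuit: each clock cycle is described by whether the producer wants to write and whether the consumer wants to read, a full FIFO stalls the producer, and an empty FIFO starves the consumer. The function name monitor_link and the event format are hypothetical.

```python
def monitor_link(events, depth):
    """Count stalled and starved cycles on one FIFO link.

    events: one (producer_wants_write, consumer_wants_read) pair per clock cycle
    depth:  number of words the link can buffer
    """
    fill = 0
    stalled = 0  # cycles in which the producer could not write (FIFO full)
    starved = 0  # cycles in which the consumer could not read (FIFO empty)
    for wants_write, wants_read in events:
        # In this model the read is served before the write within a cycle.
        if wants_read:
            if fill > 0:
                fill -= 1
            else:
                starved += 1
        if wants_write:
            if fill < depth:
                fill += 1
            else:
                stalled += 1
    return {"producer_stalled": stalled, "consumer_starved": starved}


events = [(False, True), (True, False), (True, False),
          (True, True), (False, True), (False, True)]
print(monitor_link(events, depth=1))
```

A link with many stalled cycles points to a slow consumer, while many starved cycles point to a slow producer, which is the kind of information a designer needs to rebalance the computing elements.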
