
Software Development Kit for Multicore Acceleration Version 3.0

Programmer's Guide


Note: Before using this information and the product it supports, read the general information in Appendix D, “Notices,” on page 101.

Edition Notice

This edition applies to version 3, release 0 of the IBM Software Development Kit for Multicore Acceleration (Product number 5724-S84) and to all subsequent releases and modifications until otherwise indicated in new editions.

This edition replaces SC33-8325-01.


Contents

Preface
  About this book
  What's new in this book
  Supported platforms
  Supported languages
  Beta-level (unsupported) environments
  Getting support
  Related documentation

Chapter 1. SDK 3.0 overview
  GNU tool chain
  IBM XL C/C++ compiler
  IBM Full-System Simulator
  System root image for the simulator
  Linux kernel
  Cell BE libraries
  SPE Runtime Management Library Version 2.2
  SIMD math libraries
  Mathematical Acceleration Subsystem (MASS) libraries
  ALF library
  DaCS library
  Prototype libraries
  Fast Fourier Transform (FFT) library
  Monte Carlo libraries
  Code examples and example libraries
  Performance support libraries and utilities
  SPU timing tool
  OProfile
  SPU profiling restrictions
  SPU report anomalies
  Cell-perf-counter tool
  IBM Eclipse IDE for the SDK
  Hybrid-x86 programming model overview

Chapter 2. Programming with the SDK
  System root directories
  Running the simulator
  The callthru utility
  Read and write access to the simulator sysroot image
  Enabling Symmetric Multiprocessing support
  Enabling xclients from the simulator
  Specifying the processor architecture
  PPE address space support on SPE
  SDK programming examples and demos
  Overview of the build environment
  Changing the default compiler
  Building and running a specific program
  Compiling and linking with the GNU tool chain
  Support for huge TLB file systems
  SDK development best practices
  Using a shared development environment
  Performance considerations
  NUMA
  Preemptive context switching

Chapter 3. Debugging Cell BE applications
  Overview
  GDB for SDK 3.0
  Compiling with GCC or XLC
  Using the debugger
  Debugging PPE code
  Debugging SPE code
  Source level debugging
  Assembler level debugging
  How spu-gdb manages SPE registers
  SPU stack analysis
  SPE stack debugging
  Overview
  Stack overflow checking
  Stack management strategies
  Debugging in the Cell BE environment
  Debugging multithreaded code
  Debugging architecture
  Switching architectures within a single thread
  Viewing symbolic and additional information
  Using scheduler-locking
  Using the combined debugger
  Setting pending breakpoints
  Using the set spu stop-on-load command
  Disambiguation of multiply-defined global symbols
  New command reference
  info spu event
  info spu signal
  info spu mailbox
  info spu dma
  info spu proxydma
  Setting up remote debugging
  Remote debugging overview
  Using remote debugging
  Starting remote debugging

Chapter 4. Cell BE Performance Debugging Tool
  Introduction
  Components High Level Description
  Tracing Facility
  Trace Processing
  Visualization
  Setting up the PDT tracing facility
  Configuring the PDT kernel module
  PDT example usage
  Enabling the PDT tracing facility for a new application
  Compilation and application building
  SPE compilation
  Running a trace-enabled program using the PDT libraries
  Running a program using SPE profiling
  Configuring the PDT for an application run
  Using the Tracing API
  Essential definitions
  Application programmer API
  User-defined events
  Dynamic trace control
  Library developer API
  Trace facility control
  Events recording
  Restrictions
  Installing and using the PDT trace facility on Hybrid-x86
  PDT on Hybrid-x86 example usage
  Using the PDTR tool (pdtr command)

Chapter 5. Analyzing Cell BE SPUs with kdump and crash
  Installation requirements
  Production system
  Analysis system

Chapter 6. Feedback Directed Program Restructuring (FDPR-Pro)
  Introduction
  Input files
  Instrumentation and profiling
  Optimizations
  Instrumentation and optimization options
  Profiling SPE executable files
  Processing PPE/SPE executable files
  Integrated mode
  Standalone mode
  Human-readable output
  Running fdprpro from the IDE
  Cross-development with FDPR-Pro

Chapter 7. SPU code overlays
  What are overlays
  How overlays work
  Restrictions on the use of overlays
  Planning to use overlays
  Overview
  Sizing
  Scaling considerations
  Overlay tree structure example
  Length of an overlay program
  Segment origin
  Overlay processing
  Call stubs
  Segment and region tables
  Overlay graph structure example
  Specification of an SPU overlay program
  Coding for overlays
  Migration/Co-Existence/Binary-Compatibility Considerations
  Compiler options (XLC only)
  SDK overlay examples
  Simple overlay example
  Overview overlay example
  Large matrix overlay example
  Using the GNU SPU linker for overlays

Appendix A. Changes to SDK for this release
  Changes to the directory structure
  Selecting the compiler
  Synching code into the simulator sysroot image

Appendix B. PDT troubleshooting

Appendix C. Related documentation

Appendix D. Notices
  Trademarks

Glossary


Preface

The IBM Software Development Kit for Multicore Acceleration Version 3.0 (SDK 3.0) is a complete package of tools to enable you to program applications for the Cell Broadband Engine™ (Cell BE) processor. The SDK 3.0 is composed of development tool chains, software libraries and sample source files, a system simulator, and a Linux® kernel, all of which fully support the capabilities of the Cell BE.

About this book

This book describes how to use the SDK 3.0 to write applications. How to install SDK 3.0 is described in a separate manual, Software Development Kit for Multicore Acceleration Version 3.0 Installation Guide, and there is also a programming tutorial to help get you started.

Each section of this book covers a different topic:

• Chapter 1, “SDK 3.0 overview,” on page 1 describes the components of the SDK 3.0
• Chapter 2, “Programming with the SDK,” on page 17 explains how to program applications for the Cell BE platform
• Chapter 3, “Debugging Cell BE applications,” on page 29 describes how to debug your applications
• Chapter 4, “Cell BE Performance Debugging Tool,” on page 51 describes how to use the performance debugging tool
• Chapter 5, “Analyzing Cell BE SPUs with kdump and crash,” on page 65 describes a means of debugging kernel data related to SPUs through specific crash commands, by using a dumped kernel image
• Chapter 6, “Feedback Directed Program Restructuring (FDPR-Pro),” on page 69 describes how to use the FDPR-Pro tool to optimize your applications
• Chapter 7, “SPU code overlays,” on page 75 describes how to use overlays

What's new in this book

This book includes information about the new functionality delivered with the SDK 3.0, and completely replaces the previous version of this book. This new information includes:

• PPE address space support on SPE
• SPU stack analysis
• How to optimize code using FDPR-Pro, see Chapter 6, “Feedback Directed Program Restructuring (FDPR-Pro),” on page 69
• How to use the performance debugging tool, see Chapter 4, “Cell BE Performance Debugging Tool,” on page 51
• Enhancements to GDB, see “Switching architectures within a single thread” on page 39 and “Disambiguation of multiply-defined global symbols” on page 44
• How to debug kernel data related to SPUs, see Chapter 5, “Analyzing Cell BE SPUs with kdump and crash,” on page 65

For information about differences between SDK 3.0 and previous versions, see Appendix A, “Changes to SDK for this release,” on page 93.

Supported platforms

Cell BE applications can be developed on the following platforms:

• x86
• x86-64
• 64-bit PowerPC® (PPC64)
• BladeCenter® QS20
• BladeCenter QS21

Supported languages

The supported languages are:

• C/C++
• Assembler
• Fortran
• ADA (Power Processing Element (PPE) only)

Note: Although C++ and Fortran are supported, take care when you write code for the Synergistic Processing Units (SPUs) because many of the C++ and Fortran libraries are too large for the 256 KB local storage memory available.

Beta-level (unsupported) environments

This publication contains documentation that may be applied to certain environments on an "as-is" basis. Those environments are not supported by IBM, but wherever possible, workarounds to problems are provided in the respective forums.

Getting support

The SDK 3.0 is available through Passport Advantage® with full support at:

http://www.ibm.com/software/passportadvantage

You can locate documentation and other resources on the World Wide Web. Refer to the following Web sites:

• IBM BladeCenter systems, optional devices, services, and support information at http://www.ibm.com/bladecenter/
  For service information, select Support.
• developerWorks® Cell BE Resource Center at http://www.ibm.com/developerworks/power/cell/
  To access the Cell BE forum on developerWorks, select Community.
• The Barcelona Supercomputing Center (BSC) Web site at http://www.bsc.es/projects/deepcomputing/linuxoncell
• There is also support for the Full-System Simulator and XL C/C++ Compiler through their individual alphaWorks® forums. If in doubt, start with the Cell BE architecture forum.
• The GNU Project debugger, GDB, is supported through many different forums on the Web, but primarily at the GDB Web site http://www.gnu.org/software/gdb/gdb.html

This version (SDK 3.0) of the SDK supersedes all previous versions of the SDK.

Related documentation

For a list of documentation referenced in this Programmer's Guide, see Appendix C, “Related documentation.”


Chapter 1. SDK 3.0 overview

This section describes the contents of the SDK 3.0, where it is installed on the system, and how the various components work together. It covers the following topics:

• “GNU tool chain”
• “IBM XL C/C++ compiler” on page 2
• “IBM Full-System Simulator” on page 3
• “System root image for the simulator” on page 4
• “Linux kernel” on page 5
• “Cell BE libraries” on page 5
• “Prototype libraries” on page 8
• “Performance support libraries and utilities” on page 11
• “IBM Eclipse IDE for the SDK” on page 14
• “Hybrid-x86 programming model overview” on page 14

GNU tool chain

The GNU tool chain contains the GCC C-language compiler (GCC compiler) for the PPU and the SPU. For the PPU it is a replacement for the native GCC compiler on PowerPC (PPC) platforms and it is a cross-compiler on x86. The GCC compiler for the PPU is the default and the Makefiles are configured to use it when building the libraries and samples.

The GCC compiler also contains a separate SPE cross-compiler that supports the standards defined in the following documents:

• C/C++ Language Extensions for Cell Broadband Engine Architecture V2.5. The GCC compiler shipped in SDK 3.0 supports all language extensions described in the specification except for the following:
  – The GCC compilers currently do not support alignment of stack variables greater than 16 bytes as described in section 1.3.1.
  – The GCC compilers currently do not support the optional alternate vector literal format specified in section 1.4.6.
  – The GCC compilers currently support mapping between SPU and VMX intrinsics as defined in section 5 only in C++ code.
  – The recommended vector printf format controls as specified in section 8.1.1, due to library restrictions.
  – The GCC compiler does not support the optional Altivec style of vector literal construction using parentheses ("(" and ")"). The standard C method of array initialization using curly braces ("{" and "}") should be used.
  – The C99 complex math library as specified in section 8.1.1, due to library restrictions.
• SPU Application Binary Interface (ABI) Specification V1.8
• SPU Instruction Set Architecture V1.2
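The curly-brace initialization recommended in the list above can be illustrated with a host-only sketch. It assumes GCC's generic vector extension (the vector_size attribute) as a stand-in for an SPU vector type so that it compiles on an ordinary workstation; the type name v4si_sketch and both helper functions are hypothetical and are not part of the SDK.

```c
#include <assert.h>

/* Hypothetical stand-in for a four-lane integer vector, built with GCC's
 * generic vector extension so the sketch compiles on a host machine. */
typedef int v4si_sketch __attribute__((vector_size(16)));

/* Build a vector with the standard C curly-brace initializer, the form the
 * text recommends instead of the parenthesized Altivec style. */
static v4si_sketch make_ramp(int start)
{
    v4si_sketch v = { start, start + 1, start + 2, start + 3 };
    return v;
}

/* Sum the four lanes; element subscripting is a GCC extension. */
static int sum_lanes(v4si_sketch v)
{
    assert(sizeof(v) == 4 * sizeof(int));
    return v[0] + v[1] + v[2] + v[3];
}
```

On the SPU the same initializer form would be applied to the vector types defined by the language extensions specification; only the brace syntax is the point here.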

The associated assembler and linker additionally support the SPU Assembly Language Specification V1.6. The assembler and linker are common to both the GCC compiler and the IBM XL C/C++ compiler.

GDB support is provided for both PPU and SPU debugging, and the debugger client can be in the same process or a remote process. GDB also supports combined (PPU and SPU) debugging.

On a non-PPC system, the install directory for the GNU tool chain is /opt/cell/toolchain. There is a single bin subdirectory, which contains both PPU and SPU tools.

On a PPC64 or BladeCenter QS21, both tool chains are installed into /usr. See “System root directories” on page 17 for further information.

IBM XL C/C++ compiler

IBM XL C/C++ for Multicore Acceleration for Linux is an advanced, high-performance cross-compiler that is tuned for the CBEA. The XL C/C++ compiler, which is hosted on an x86, IBM PowerPC technology-based system, or a BladeCenter QS21, generates code for the PPU or SPU. The compiler requires the GCC tool chain for the CBEA, which provides tools for cross-assembling and cross-linking applications for both the PPE and SPE.

IBM XL C/C++ supports the revised 2003 International C++ Standard ISO/IEC 14882:2003(E), Programming Languages -- C++ and the ISO/IEC 9899:1999, Programming Languages -- C standard, also known as C99. The compiler also supports:

• The C89 Standard and K&R style of programming
• Language extensions for vector programming
• Language extensions for SPU programming
• Numerous GCC C and C++ extensions to help users port their applications from GCC.

The XL C/C++ compiler available for the SDK 3.0 supports the language extensions as specified in the IBM XL C/C++ Advanced Edition for Multicore Acceleration for Linux V9.0 Language Reference.

The XL compiler also contains a separate SPE cross-compiler that supports the standards defined in the following documents:

• C/C++ Language Extensions for Cell Broadband Engine Architecture V2.5. The XL compiler shipped in SDK 3.0 supports all language extensions described in the specification except for the following:
  – The XL compilers currently do not support alignment of stack variables greater than 16 bytes as described in section 1.3.1
  – The XL compilers currently do not support Operator Overloading for Vector Data Types as described in section 10
  – The XL compilers currently do not support VMX functions vec_extract, vec_insert, vec_promote, and vec_splats as described in section 7
  – The XL compilers currently do not support PPE address space support on SPE as described in “GNU tool chain” on page 1
  – The XL compilers currently do not support the __builtin_expect_call builtin function call
  – The XL compilers currently support mapping between SPU and VMX intrinsics as defined in section 5 only in C++ code
  – The recommended vector printf format controls as specified in section 8.1.1
  – The C99 complex math library as specified in section 8.1.1, due to library restrictions
• SPU Application Binary Interface (ABI) Specification Version 1.8
• SPU Instruction Set Architecture Version 1.2

For information about the XL C/C++ compiler invocation commands and a complete list of options, refer to the IBM XL C/C++ Advanced Edition for Multicore Acceleration for Linux V9.0 Compiler Reference.

Program optimization is described in IBM XL C/C++ Advanced Edition for Multicore Acceleration for Linux V9.0 Programming Guide.

The XL C/C++ for Multicore Acceleration for Linux compiler is installed into the /opt/ibmcmp/xlc/cbe/<compiler version number> directory. Documentation is located on the following Web site:

http://publib.boulder.ibm.com/infocenter/cellcomp/v9v111/index.jsp

IBM Full-System Simulator

The IBM Full-System Simulator (referred to as the simulator in this document) is a software application that emulates the behavior of a full system that contains a Cell BE processor. You can start a Linux operating system on the simulator and run applications on the simulated operating system. The simulator also supports the loading and running of statically-linked executable programs and standalone tests without an underlying operating system.

The simulator infrastructure is designed for modeling processor and system-level architecture at levels of abstraction, which vary from functional to performance simulation models with a number of hybrid fidelity points in between:

• Functional-only simulation: Models the program-visible effects of instructions without modeling the time it takes to run these instructions. Functional-only simulation assumes that each instruction can be run in a constant number of cycles. Memory accesses are synchronous and are also performed in a constant number of cycles.
  This simulation model is useful for software development and debugging when a precise measure of execution time is not significant. Functional simulation proceeds much more rapidly than performance simulation, and so is also useful for fast-forwarding to a specific point of interest.
• Performance simulation: For system and application performance analysis, the simulator provides performance simulation (also referred to as timing simulation). A performance simulation model represents internal policies and mechanisms for system components, such as arbiters, queues, and pipelines. Operation latencies are modeled dynamically to account for both processing time and resource constraints. Performance simulation models have been correlated against hardware or other references to acceptable levels of tolerance.
  The simulator for the Cell BE processor provides a cycle-accurate SPU core model that can be used for performance analysis of computationally-intense applications. The simulator for SDK 3.0 provides additional support for performance simulation. This is described in the IBM Full-System Simulator Users Guide.

The simulator can also be configured to fast-forward the simulation, using a functional model, to a specific point of interest in the application and to switch to a timing-accurate mode to conduct performance studies. This means that various types of operational details can be gathered to help you understand real-world hardware and software systems.

See the /opt/ibm/systemsim-cell/doc subdirectory for complete documentation including the simulator user’s guide. The prerelease name of the simulator is “Mambo” and this name may appear in some of the documentation.

The simulator for the Cell BE processor is also available as an independent technology at

http://www.alphaworks.ibm.com/tech/cellsystemsim

System root image for the simulator

The system root image for the simulator is a file that contains a disk image of Fedora 7 files, libraries, and binaries that can be used within the system simulator. This disk image file is preloaded with a full range of Fedora 7 utilities and also includes all of the Cell BE Linux support libraries described in “Performance support libraries and utilities” on page 11.

This RPM file is the largest of the RPM files and when it is installed, it takes up to 1.6 GB on the host server’s hard disk. See also “System root directories” on page 17.

The system root image for the simulator must be located either in the current directory when you start the simulator or the default /opt/ibm/systemsim-cell/images/cell directory. The cellsdk script automatically puts the system root image into the default directory.

You can mount the system root image to see what it contains. Assuming a mount point of /mnt/cell-sdk-sysroot, which is the mount point used by the cellsdk_sync_simulator script, the command to mount the system root image is:

mount -o loop /opt/ibm/systemsim-cell/images/cell/sysroot_disk /mnt/cell-sdk-sysroot/

The command to unmount the image is:

umount /mnt/cell-sdk-sysroot/

Do not attempt to mount the image on the host system while the simulator is running. You should always unmount the system root image before you start the simulator. You should not mount the system root image to the same point as the root on the host server because the system can become corrupted and fail to boot.

You can change files on the system root image disk in the following ways:

• Mount it as described above. Then change directory (cd) to the mount point directory or below and use host system tools, such as vi or cp, to modify the file. Do not attempt to use the RPM utility on an x86 platform to install packages to the sysroot disk, because the RPM database formats are not compatible between the x86 and PPC platforms.
• Use the /opt/cell/cellsdk_sync_simulator command to synchronize the system root image with the /opt/cell/sysroot directory for libraries and samples (see “System root directories” on page 17) that have been cross-compiled and linked on a host system and need to be copied to the target system.
• Use the callthru mechanism (see “The callthru utility” on page 20) to source or sink the host system file when the simulator is running. This is the only method that can be used while the simulator is running.

Linux kernel

For the BladeCenter QS21, the kernel is installed into the /boot directory, yaboot.conf is modified and a reboot is required to activate this kernel. The cellsdk install task is documented in the SDK 3.0 Installation Guide.

Note: The cellsdk uninstall command does not automatically uninstall the kernel. This avoids leaving the system in an unusable state.

Cell BE libraries

The following libraries are described in this section:

• “SPE Runtime Management Library Version 2.2” on page 5
• “SIMD math libraries” on page 5
• “Mathematical Acceleration Subsystem (MASS) libraries” on page 6
• “ALF library” on page 6
• “DaCS library” on page 7

SPE Runtime Management Library Version 2.2

The SPE Runtime Management Library (libspe) constitutes the standardized low-level application programming interface (API) for application access to the Cell BE SPEs. This library provides an API to manage SPEs that is neutral with respect to the underlying operating system and its methods. Implementations of this library can provide additional functionality that allows for access to operating system or implementation-dependent aspects of SPE runtime management. These capabilities are not subject to standardization and their use may lead to non-portable code and dependencies on certain implemented versions of the library.

The elfspe is a PPE program that allows an SPE program to run directly from a Linux command prompt without needing a PPE application to create an SPE thread and wait for it to complete.

For the BladeCenter QS21, the SDK installs the libspe headers, libraries, and binaries into the /usr directory and the standalone SPE executive, elfspe, is registered with the kernel during boot by commands added to /etc/rc.d/init.d using the binfmt_misc facility.

For the simulator, the libspe and elfspe binaries and libraries are preinstalled in the same directories in the system root image and no further action is required at install time.

SPE Runtime Management Library version 2.2 is an upgrade to version 2.1. For more information, see the SPE Runtime Management Library Reference.

SIMD math libraries

The traditional math functions are scalar instructions, and do not take advantage of the powerful Single Instruction, Multiple Data (SIMD) vector instructions available in both the PPU and SPU in the Cell BE Architecture. SIMD instructions perform computations on short vectors of data in parallel, instead of on individual scalar data elements. They often provide significant increases in program speed because more computation can be done with fewer instructions.

The SIMD math library provides short vector versions of the math functions. The MASS library provides long vector versions. These vector versions conform as closely as possible to the specifications set out by the scalar standards.

The SIMD math library is provided by the SDK as both a linkable library archive and as a set of inline function headers. The names of the SIMD math functions are formed from the names of the scalar counterparts by appending a vector type suffix to the standard scalar function name. For example, the SIMD version of the absolute value function abs(), which acts on a vector of long integers, is called absi4(). Inline versions of functions are prefixed with the character "_" (underscore), so the inline version of absi4() is called _absi4().
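The naming convention can be illustrated with a host-only sketch. It assumes a plain struct in place of the hardware vector type, and the names vec_int4_sketch, inline_absi4_sketch, and absi4_sketch are hypothetical; the real library uses the names absi4() and _absi4() and operates on true vector registers. The loop below only mimics the lane-by-lane result that a single SIMD instruction sequence would produce.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical four-lane integer vector; a plain struct stands in for the
 * hardware vector type so the sketch compiles on any host. */
typedef struct { int lane[4]; } vec_int4_sketch;

/* Inline flavor: the SDK would name this with a leading underscore
 * (_absi4); a different prefix is used here because file-scope names that
 * begin with an underscore are reserved in portable C. */
static inline vec_int4_sketch inline_absi4_sketch(vec_int4_sketch v)
{
    for (int i = 0; i < 4; i++) {
        /* -INT_MIN is not representable, so this sketch rejects it. */
        assert(v.lane[i] != INT_MIN);
        if (v.lane[i] < 0)
            v.lane[i] = -v.lane[i];
    }
    return v;
}

/* Linkable flavor: "abs" plus the vector type suffix "i4". In this sketch
 * it simply forwards to the inline flavor. */
vec_int4_sketch absi4_sketch(vec_int4_sketch v)
{
    return inline_absi4_sketch(v);
}
```

Keeping both flavors lets a caller choose between smaller code (the linkable function) and avoiding call overhead (the inline header), which matters on an SPU with limited local storage.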

For more information about the SIMD math library, refer to SIMD Math Library Specification for Cell Broadband Engine Architecture Version 1.1.

Mathematical Acceleration Subsystem (MASS) libraries

The Mathematical Acceleration Subsystem (MASS) consists of libraries of mathematical intrinsic functions, which are tuned specifically for optimum performance on the Cell BE processor. Currently the 32-bit, 64-bit PPU, and SPU libraries are supported.

These libraries:

• Include both scalar and vector functions
• Are thread-safe
• Support both 32- and 64-bit compilations
• Offer improved performance over the corresponding standard system library routines
• Are intended for use in applications where slight differences in accuracy or handling of exceptional values can be tolerated

You can find information about using these libraries on the MASS Web site:

http://www.ibm.com/software/awdtools/mass

ALF library

The Accelerated Library Framework (ALF) provides a programming environment for data and task parallel applications and libraries. The ALF API provides library developers with a set of interfaces to simplify library development on heterogeneous multi-core systems. Library developers can use the provided framework to offload computationally intensive work to the accelerators. More complex applications can be developed by combining several function offload libraries. Application programmers can also choose to implement their applications directly to the ALF interface.

ALF supports the multiple-program-multiple-data (MPMD) programming model, where multiple programs can be scheduled to run on multiple accelerator elements at the same time.

The ALF functionality includes:

• Data transfer management
• Parallel task management
• Double buffering

With the provided platform-independent API, you can also create descriptions for multiple compute tasks and define their execution order by defining task dependencies. Task parallelism is accomplished by having tasks without direct or indirect dependencies between them. The ALF runtime provides an optimal parallel scheduling scheme for the tasks based on the given dependencies.

From the application or library programmer’s point of view, ALF consists of the following two runtime components:

• A host runtime library
• An accelerator runtime library

The host runtime library provides the host APIs to the application. The accelerator runtime library provides the APIs to the application’s accelerator code, usually the computational kernel and helper routines. This division of labor enables programmers to specialize in different parts of a given parallel workload.

The ALF design enables a separation of work. There are three distinct types of task within a given application:

Application
    You develop programs only at the host level. You can use the provided accelerated libraries without direct knowledge of the inner workings of the underlying system.

Accelerated library
    You use the ALF APIs to provide the library interfaces to invoke the computational kernels on the accelerators. You divide the problem into the control process, which runs on the host, and the computational kernel, which runs on the accelerators. You then partition the input and output into work blocks, which ALF can schedule to run on different accelerators.

Computational kernel
    You write optimized accelerator code at the accelerator level. The ALF API provides a common interface for the compute task to be invoked automatically by the framework.

The runtime framework handles the underlying task management, data movement, and error handling, which means that the focus is on the kernel and the data partitioning, not the direct memory access (DMA) list creation or the lock management on the work queue.

The ALF APIs are platform-independent and their design is based on the fact that many applications targeted for Cell BE or multi-core computing follow the general usage pattern of dividing a set of data into self-contained blocks, creating a list of data blocks to be computed on the SPE, and then managing the distribution of that data to the various SPE processes. This type of control and compute process usage scenario, along with the corresponding work queue definition, are the fundamental abstractions in ALF.
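The usage pattern described above (divide the data into self-contained blocks, list the blocks, then hand each block to a compute kernel) can be sketched in plain C. This is a host-only illustration under stated assumptions: it does not use the ALF API, the blocks are processed sequentially rather than scheduled onto accelerators, and every name (work_block_sketch, partition_into_blocks, scale_block, run_all_blocks) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one self-contained block of the data set. */
typedef struct {
    size_t offset;  /* index of the first element in the block */
    size_t count;   /* number of elements in the block */
} work_block_sketch;

/* Control-process step: split `total` elements into blocks of at most
 * `block_size` elements and return how many descriptors were written. */
size_t partition_into_blocks(size_t total, size_t block_size,
                             work_block_sketch *blocks, size_t max_blocks)
{
    assert(block_size > 0);
    size_t n = 0;
    for (size_t off = 0; off < total; off += block_size) {
        assert(n < max_blocks);
        blocks[n].offset = off;
        blocks[n].count = (total - off < block_size) ? total - off : block_size;
        n++;
    }
    return n;
}

/* Computational kernel: touches only the elements of its own block, so
 * blocks have no dependencies on one another. */
void scale_block(float *data, const work_block_sketch *wb, float factor)
{
    for (size_t i = 0; i < wb->count; i++)
        data[wb->offset + i] *= factor;
}

/* Stand-in for the runtime: walks the work queue in order. A real
 * framework would distribute the blocks across accelerators instead. */
void run_all_blocks(float *data, size_t total, size_t block_size, float factor)
{
    work_block_sketch blocks[64];
    size_t n = partition_into_blocks(total, block_size, blocks, 64);
    for (size_t b = 0; b < n; b++)
        scale_block(data, &blocks[b], factor);
}
```

Because each block is independent, the order in which the blocks run does not affect the result, which is the property that lets a runtime schedule them in parallel.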

DaCS library

The Data Communication and Synchronization (DaCS) library provides a set of services for handling process-to-process communication in a heterogeneous multi-core system. In addition to the basic message passing service these include:

• Mailbox services
• Resource reservation
• Process and data synchronization
• Remote memory services
• Error handling

The DaCS services are implemented as a set of APIs providing an architecturally neutral layer for application developers. They structure the processing elements, referred to as DaCS Elements (DE), into a hierarchical topology. This includes general purpose elements, referred to as Host Elements (HE), and special processing elements, referred to as Accelerator Elements (AE). Host elements usually run a full operating system and submit work to the specialized processes which run in the Accelerator Elements.

Prototype libraries

This section provides an overview of the following prototype libraries, which are shipped with SDK 3.0:

• “Fast Fourier Transform (FFT) library”
• “Monte Carlo libraries”

Fast Fourier Transform (FFT) library

This prototype library handles a wide range of FFTs, and consists of the following:

• API for the following routines used in single precision:
  – FFT Real->Complex 1D
  – FFT Complex-Complex 1D
  – FFT Complex->Real 1D
  – FFT Complex-Complex 2D for frequencies (from 1000x1000 to 2500x2500)
  The implementation manages sizes up to 10000 and handles multiples of 2, 3, and 5 as well as powers of those factors, plus one arbitrary factor. User code running on the PPU makes use of the CBE FFT library by calling either the 1D or the 2D streaming functions.
• Power-of-two-only 2D FFT code for complex-to-complex single and double precision processing.

Both parts of the library run using a common interface that contains an initialization and termination step, and an execution step which can process “one-at-a-time” requests (streaming) or entire arrays of requests (batch).

Enter the following to view additional documentation for the prototype FFT library:

man /opt/cell/sdk/prototype/usr/include/libfft.3

Monte Carlo libraries

The Monte Carlo libraries are a Cell BE implementation of Random Number Generator (RNG) algorithms and transforms. The objective of this library is to provide functions needed to perform Monte Carlo simulations.

The following RNG algorithms are implemented:

• Hardware-based
• Kirkpatrick-Stoll
• Mersenne Twister
• Sobol

The following transforms are provided:

• Box-Mueller
• Moro's Inversion
• Polar Method
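To show how a random number generator feeds a Monte Carlo simulation, the following host-only sketch estimates pi by sampling points in the unit square. It is an illustration under stated assumptions, not the library's API: the generator is a simple 64-bit linear congruential generator chosen for brevity (it is not one of the algorithms listed above), no transform is applied because the estimate needs only uniform samples, and the names lcg_sketch, lcg_next_unit, and estimate_pi are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator state: a 64-bit linear congruential generator,
 * used here only as a placeholder for a production-quality RNG. */
typedef struct { uint64_t state; } lcg_sketch;

/* Advance the generator and return a uniform value in [0, 1). */
static double lcg_next_unit(lcg_sketch *g)
{
    g->state = g->state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Keep the top 53 bits and scale by 2^-53. */
    return (double)(g->state >> 11) * (1.0 / 9007199254740992.0);
}

/* Monte Carlo estimate of pi: the fraction of random points in the unit
 * square that fall inside the quarter circle approaches pi / 4. */
double estimate_pi(uint64_t seed, unsigned long samples)
{
    assert(samples > 0);
    lcg_sketch g = { seed };
    unsigned long inside = 0;
    for (unsigned long i = 0; i < samples; i++) {
        double x = lcg_next_unit(&g);
        double y = lcg_next_unit(&g);
        if (x * x + y * y < 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

The statistical error of such an estimate shrinks only with the square root of the sample count, which is why fast generators and transforms are worth accelerating. A simulation that needs normally distributed inputs would pass the uniform samples through one of the listed transforms first.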

Code examples and example libraries

The example libraries package provides a set of optimized library routines that greatly reduce the development cost and enhance the performance of Cell BE programs.

To demonstrate the versatility of the Cell BE architecture, a variety of application-oriented libraries are included, such as:

• Fast Fourier Transform (FFT)
• Image processing
• Software managed cache
• Game math
• Matrix operation
• Multi-precision math
• Synchronization
• Vector

Additional examples and demos show how you can exploit the on-chip computational capacity.

The binaries and the source code are shipped in separate RPMs. The RPM names are:

• cell-libs
• cell-examples
• cell-demos
• cell-tutorial

For each of these, there is one RPM that has the binaries (already built versions), which are installed into /opt/cell/sdk/usr, and one RPM that has the source in a tar file. For example, cell-demos-source-3.0-1.rpm has demos_source.tar and this tar file contains all of the source code.

The default installation process installs the binaries and installs the source tar files. You need to decide into which directory you want to untar those files, either into /opt/cell/sdk/src, or into a 'sandbox' directory.

The libraries and examples RPMs have been partitioned into the following subdirectories.

Table 1. Subdirectories for the libraries and examples RPM

Subdirectory
    Description

/opt/cell/sdk/buildutils
    Contains a README and the make include files (make.env, make.header, make.footer) that define the SDK build environment.

/opt/cell/sdk/docs
    Contains all documentation, including information about SDK 3.0 libraries and tools.

/opt/cell/sdk/usr/bin
/opt/cell/sdk/usr/spu/bin
    Contains executable programs for that platform. On an x86 system, this includes the SPU Timing tool. On a PPC system, this also includes all of the prebuilt binaries for the SDK examples (if installed). In the SDK build environment (that is, with buildutils/make.footer) the $SDKBIN_<target> variables point to these directories.

/opt/cell/sdk/usr/include
/opt/cell/sdk/usr/spu/include
    Contains header files for the SDK libraries and examples on a PPC system. In the SDK build environment (that is, with the buildutils/make.footer) the $SDKINC_<target> variables point to these directories.

/opt/cell/sdk/usr/lib
/opt/cell/sdk/usr/lib64
/opt/cell/sdk/usr/spu/lib
    Contains library binary files for the SDK libraries on a PPC system. In the SDK build environment (that is, with the buildutils/make.footer) the $SDKLIB_<target> variables point to these directories.

/opt/cell/sdk/src
    Contains the tar files for the libraries and examples (if installed). The tar files are unpacked into the subdirectories described in the following rows of this table. Each directory has a README that describes its contents and purpose.

/opt/cell/sdk/src/lib
    Contains a series of libraries and reusable header files. Complete documentation for all library functions is in the /opt/cell/sdk/docs/lib/SDK_Example_Library_API_v3.0.pdf file.

/opt/cell/sdk/src/examples
    The examples directory contains examples of Cell BE programming techniques. Each program shows a particular technique, or set of related techniques, in detail. You can review these programs when you want to perform a specific task, such as double-buffered DMA transfers to and from a program, performing local operations on an SPU, or providing access to main memory objects to SPU programs.
    Some subdirectories contain multiple programs. The sync subdirectory has examples of various synchronization techniques, including mutex operations and atomic operations.
    The spulet model is intended to encourage testing and refinement of programs that need to be ported to the SPUs; it also provides an easy way to build filters that take advantage of the huge computational capacity of the SPUs, while reading and writing standard input and output.
    Other samples worth noting are:
    • Overlay samples
    • SW managed cache samples

/opt/cell/sdk/src/demos
    The demo directory provides a handful of examples that can be used to better understand the performance characteristics of the Cell BE processor. There are sample programs, which contain insights into how real-world code should run.
    Note: Running these examples using the simulator takes much longer than on the native Cell BE-based hardware. The performance

characteristicsinwall-clocktimeusingthesimulatorareextremely inaccurate,especiallywhenrunningonmultipleSPUs.Youneedto examinetheemulatorCPUcyclecountsinstead.

Forexample,thematrix_mulprogramletsyouperformmatrix multiplicationsononeormoreSPUs.Matrixmultiplicationisagood exampleofafunctionwhichtheSPUscanacceleratedramatically. Unlikesomeoftheotherexampleprograms,theseexampleshave beentunedtogetthebestperformance.Thismakesthemharderto readandunderstand,butitgivesanideaforthetypeofperformance codethatyoucanwritefortheCellBEprocessor.

/opt/cell/sdk/src/benchmarks Thebenchmarksdirectorycontainssamplebenchmarksforvarious operationsthatarecommonlyperformedinCellBEapplications.The intentofthesebenchmarksistoguideyouinthedesign,

development,andperformanceanalysisofapplicationsforsystems basedontheCellBEprocessor.Thebenchmarksareprovidedin sourceformtoallowyoutounderstandindetailtheactualoperations thatareperformedinthebenchmark.Thisalsoprovidesyouwitha basisforcreatingyourownbenchmarkcodestocharacterize performanceforoperationsthatarenotcurrentlycoveredinthe providedsetofbenchmarks.

/opt/cell/sdk/prototype/src Containsthetarfilesforexamplesanddemosforvariousprototype packagesthatshipwiththeSDK.EachhasaREADMEthatdescribes theircontentsandpurpose.

/opt/cell/sysroot Containstheheaderfilesandlibrariesusedduringcross-compiling andcontainsthecompiledresultsofthelibrariesandexamplesonan x86system.Thecompiledlibrariesandexamples(everythingunder

/opt/cell/sysroot/opt/cell/sdk)canbesynchedupwiththe simulatorsystemrootimagebyusingthecommand:

/opt/cell/cellsdk_sync_simulator.

Performance support libraries and utilities

The following support libraries and utilities are provided by the SDK to help you with developing and performance testing your Cell BE applications.

SPU timing tool

The SPU static timing tool, spu_timing, annotates an SPU assembly file with scheduling, timing, and instruction issue estimates assuming a straight, linear execution of the program. The tool generates a textual output of the execution pipeline of the SPE instruction stream from this input assembly file. Run spu_timing --help to see its usage syntax.
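A typical workflow, sketched below, is to have the compiler emit assembly and then pass that file to the timing tool. The helper names run and spu_timing_report are hypothetical, the file name kernel.c is a placeholder, and invoking spu_timing with only the assembly file as its argument is an assumption; check spu_timing --help for the options your version accepts. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: spu_timing_report <source.c>
spu_timing_report() {
    src=$1
    asm=${src%.c}.s                       # assembly file that spu_timing annotates
    run spu-gcc -O2 -S -o "$asm" "$src" &&   # -S stops after generating assembly
    run spu_timing "$asm"
}

# Example: DRY_RUN=0 spu_timing_report kernel.c
```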


OProfile

OProfile is a tool for profiling user and kernel level code. It uses the hardware performance counters to sample the program counter every N events. You specify the value of N as part of the event specification. The system enforces a minimum value on N to ensure the system does not get completely swamped trying to capture a profile.

Make sure you select a large enough value of N to ensure the overhead of collecting the profile is not excessively high.

The opreport tool produces the output report. Reports can be generated based on the file names that correspond to the samples, symbol names or annotated source code listings.

How to use OProfile and the postprocessing tool is described in the user manual available at:

http://oprofile.sourceforge.net/doc/

The current SDK 3.0 version of OProfile for Cell BE supports profiling on the POWER™ processor events and SPU cycle profiling. These events include cycles as well as the various processor, cache and memory events. It is possible to profile on up to four events simultaneously on the Cell BE system. There are restrictions on which of the PPU events can be measured simultaneously. (The tool now verifies that multiple events specified can be profiled simultaneously. In the previous release it was up to you to verify that.) When using SPU cycle profiling, events must be within the same group due to restrictions in the underlying hardware support for the performance counters. You can use the opcontrol --list-events command to view the events and which group contains each event.

There is one set of performance counters for each node that are shared between the two CPUs on the node. For a given profile period, only half of the time is spent collecting data for the even CPUs and half of the time for the odd CPUs. You may need to allow more time to collect the profile data across all CPUs.

Notes:

1. Before you issue an opcontrol --start, you should issue the following command:

opcontrol --start-daemon

2. To produce a report with Linux kernel symbol information you should install the corresponding kernel debuginfo RPM.
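The order of operations in a profiling session can be sketched as follows. This is not taken from the SDK: the helper names run and profile_once are hypothetical, the event is configured before the daemon is started on the assumption that the daemon reads its settings at start-up, and the count of 100000 in the usage comment is an arbitrary value for N. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: profile_once <event> <count> <program> [args...]
#   <event>  a name reported by "opcontrol --list-events"
#   <count>  N, the number of events between samples
profile_once() {
    event=$1; count=$2; shift 2
    run opcontrol --event="$event:$count" &&   # choose the event and sampling interval
    run opcontrol --start-daemon &&            # start the daemon before profiling
    run opcontrol --start &&                   # begin collecting samples
    run "$@" &&                                # run the program being profiled
    run opcontrol --dump &&                    # flush collected samples
    run opcontrol --shutdown &&                # stop profiling and the daemon
    run opreport --symbols                     # per-symbol report
}

# Example: DRY_RUN=0 profile_once SPU_CYCLES 100000 ./matrix_mul
```

A larger count lowers the sampling overhead at the cost of a coarser profile, which is the trade-off described above.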

SPU profiling restrictions

When SPU cycle profiling is used, the opcontrol command is configured for separating the profile based on SPUs and on the library. This corresponds to specifying --separate=CPU and --separate=lib. The separate CPU is required because it is possible to have multiple SPU binary images embedded into the executable file or into a shared library. So for a given executable, the various SPUs may be running different SPU images.

With --separate=CPU, the image and corresponding symbols can be displayed for each SPU. The user can use the opreport --merge command to create a single report for all SPUs that shows the counts for each symbol in the various embedded SPU binaries. By default, opreport does not display the app name column when it reports samples for a single application, such as when it profiles a single SPU application. For opreport to attribute samples to a binary image, the opcontrol script defaults to using --separate=lib when profiling SPU applications so that the image name column is always displayed in the generated reports.

SPU report anomalies

The report file uses the term CPUs when the event is SPU_CYCLES. In this case, CPUs actually refer to the various SPUs in the system. For all other events, the CPU term refers to the virtual PPU processors.

With SPU profiling, opreport’s --long-filenames option may not print the full path of the SPU binary image for which samples were collected. Short image names are used for SPU applications that employ the technique of embedding SPU images in another file (executable or shared library). The embedded SPU ELF data contains only the file name and no path information to the SPU binary file being embedded because this file may not exist or be accessible at runtime. You must have sufficient knowledge of the application’s build process to be able to correlate the SPU binary image names found in the report to the application’s source files.

Tip

Compile the application with -g and generate the OProfile report with -g to facilitate finding the right source file(s) to focus on.

Generally, when the report contains information about a single application, opreport does not include the report column for the application name. It is assumed that the performance analyst knows the name of the application being profiled.

Cell-perf-counter tool

The cell-perf-counter (cpc) tool is used for setting up and using the hardware performance counters in the Cell BE processor. These counters allow you to see how many times certain hardware events are occurring, which is useful if you are analyzing the performance of software running on a Cell BE system. Hardware events are available from all of the logical units within the Cell BE processor, including the PPE, SPEs, interface bus, and memory and I/O controllers. Four 32-bit counters, which can also be configured as pairs of 16-bit counters, are provided in the Cell BE performance monitoring unit (PMU) for counting these events.

The cpc tool also makes use of the hardware sampling capabilities of the Cell BE PMU. This feature allows the hardware to collect very precise counter data at programmable time intervals. The accumulated data can be used to monitor the changes in performance of the Cell BE system over longer periods of time.

The cpc tool provides a variety of output formats for the counter data. Simple text output is shown in the terminal session, HTML output is available for viewing in a Web browser, and XML output can be generated for use by higher-level analysis tools such as the Visual Performance Analyzer (VPA).

You can find details in the documentation and manual pages included with the cellperfctr-tools package, which can be found in the /usr/share/doc/cellperfctr-<version>/ directory after you have installed the package.

IBM Eclipse IDE for the SDK

IBM Eclipse IDE for the SDK is built upon the Eclipse and C Development Tools (CDT) platform. It integrates the GNU tool chain, compilers, the Full-System Simulator, and other development components to provide a comprehensive, Eclipse-based development platform that simplifies development. The key features include the following:

v A C/C++ editor that supports syntax highlighting, a customizable template, and an outline window view for procedures, variables, declarations, and functions that appear in source code

v A visual interface for the PPE and SPE combined GDB (GNU debugger)

v Seamless integration of the simulator into Eclipse

v Automatic builder, performance tools, and several other enhancements

v Remote launching, running and debugging on a BladeCenter QS21

v ALF source code templates for programming models within IDE

v An ALF Code Generator to produce an ALF template package with C source code and a readme.txt file

v A configuration option for both the Local Simulator and Remote Simulator target environments that allows you to choose between launching a simulation machine with the Cell BE processor or an enhanced CBEA-compliant processor with a fully pipelined, double precision SPE processor

v Remote Cell BE and simulator BladeCenter support

v SPU timing integration

v Automatic makefile generation for both GCC and XLC projects

For information about how to install and remove the IBM Eclipse IDE for the SDK, see the SDK 3.0 Installation Guide.

For information about using the IDE, a tutorial is available. The IDE and related programs must be installed before you can access the tutorial.

Hybrid-x86 programming model overview

The Cell Broadband Engine Architecture (CBEA) is an example of a multi-core hybrid system on a chip. That is to say, heterogeneous cores integrated on a single processor with an inherent memory hierarchy. Specifically, the synergistic processing elements (SPEs) can be thought of as computational accelerators for a more general purpose PPE core. These concepts of hybrid systems, memory hierarchies and accelerators can be extended more generally to coupled I/O devices, and examples of those systems exist today, for example, GPUs in PCIe slots for workstations and desktops. Similarly, the Cell BE processor is being used in systems as an accelerator, where computationally intensive workloads well suited for the CBEA are off-loaded from a more standard processing node. There are potentially many ways to move data and functions from a host processor to an accelerator processor and vice versa.

In order to provide a consistent methodology and set of application programming interfaces (APIs) for a variety of hybrid systems, including the Cell BE SoC hybrid system, the SDK has implementations of the Cell BE multi-core data communication and programming model libraries, Data and Communication Synchronization and Accelerated Library Framework, which can be used on x86/Linux host process systems with Cell BE-based accelerators. A prototype implementation over sockets is provided so that you can gain experience with this programming style and focus on how to manage the distribution of processing and data decomposition. For example, in the case of hybrid programming when moving data point to point over a network, care must be taken to maximize the computational work done on accelerator nodes potentially with asynchronous or overlapping communication, given the potential cost in communicating input and results.

For more information about the DaCS and ALF programming APIs, refer to Data and Communication Synchronization Library for Hybrid-x86 Programmer's Guide and API Reference and Accelerated Library Framework for Hybrid-x86 Programmer's Guide and API Reference.

Chapter 2. Programming with the SDK

This section is a short introduction about programming with the SDK. It covers the following topics:

v “System root directories”

v “Running the simulator”

v “Specifying the processor architecture”

v “PPE address space support on SPE”

v “SPU stack analysis”

v “SDK programming examples and demos”

v “Support for huge TLB file systems”

v “SDK development best practices”

v “Performance considerations”

Refer to the Cell BE Programming Tutorial, the Full-System Simulator User’s Guide, and other documentation for more details.

System root directories

Because of the cross-compile environment and simulator in the SDK, there are several different system root directories. Table 2 describes these directories.

Table 2. System root directories

Directory name    Description

Host
    The system root for the host system is “/”. The SDK is installed relative to this host system root.

GCC Toolchain
    The system root for the GCC tool chain depends on the host platform. For PPC platforms including the BladeCenter QS21, this directory is the same as the host system root. For x86 and x86-64 systems this directory is /opt/cell/sysroot. The tool chain PPU header and library files are stored relative to the GCC Toolchain system root in directories such as usr/include and usr/lib. The tool chain SPU header and library files are stored relative to the GCC Toolchain system root in directories such as usr/spu/include and usr/spu/lib.

Simulator
    The simulator runs using a 2.6.22 kernel and a Fedora 7 system root image. This system root image has a root directory of “/”. When this system root image is mounted into a host-based directory such as /mnt/cell-sdk-sysroot. This directory is the

Examples and Libraries
    The Examples and Libraries system root directory is /opt/cell/sysroot. When the samples and libraries are compiled and linked, the resulting header files, libraries and binaries are placed relative to this directory in directories such as usr/include, usr/lib, and /opt/cell/sdk/usr/bin. The libspe library is also installed into this system root.

    After you have logged in as root, you can synchronize this sysroot directory with the simulator sysroot image file. To do this, use the cellsdk_sync_simulator script with the synch task. The command is:

    /opt/cell/cellsdk_sync_simulator

    This command is very useful whenever a library or sample has been recompiled. This script reduces user error because it provides a standard mechanism to mount the system root image, rsync the contents of the two corresponding directories, and unmount the system root image.

Running the simulator

To verify that the simulator is operating correctly and then run it, issue the following commands:

export PATH=/opt/ibm/systemsim-cell/bin:$PATH
systemsim -g

The systemsim script found in the simulator’s bin directory launches the simulator. The -g parameter starts the graphical user interface.

Note: It is no longer necessary to have a local copy of .systemsim.tcl. The simulator looks in the local directory first (as it always did), but if it is not there, it uses the systemsim.tcl (no leading dot) in lib/cell of the

Notes:

1. You must be on a graphical console, or at least have the DISPLAY environment variable pointed to an X server to run the simulator's graphical user interface (GUI).

2. If an error message about libtk8.4.so is displayed, you must load the TK package as described in SDK 3.0 Installation Guide.

When the GUI is displayed, click Go to start the simulator.

Note: To make the simulator run in fast mode, you can click Mode and then Fast Mode. This forces the simulator to bypass its standard analysis and statistics collection features. Fast mode is useful if you want to advance the simulator through setup or initialization functions that are not the focus of analysis, such as the Linux boot processing. You should disable fast mode when you reach the point at which you wish to do detailed analysis or debug the application. You can also select Simple Mode or Cycle Mode.

You can use the simulator's GUI to get a better understanding of the Cell BE architecture. For example, the simulator shows two sets of PPE state. This is because the PPE processor core is dual-threaded and each thread has its own registers and context. You can also look at the state of the SPEs, including the state of their Memory Flow Controller (MFC).

[Figure: Full System Simulator GUI and the console window for the system running on the simulator. The console shows the tutorial program simple, run from /opt/cell/sdk/usr/bin/tutorial, printing a series of "Hello Cell" lines followed by "The program has successfully executed."]

The systemsim command syntax is:

systemsim [-f file] [-g] [-n]

where:

Parameter    Description

-f <filename>
    specifies an initial run script (TCL file)

-g
    specifies GUI mode, otherwise the simulator starts in command-line mode

-n
    specifies that the simulator should not open a separate console window

You can find documentation about the simulator including the user’s guide in the /opt/ibm/systemsim-cell/doc directory.

The callthru utility

The callthru utility allows you to copy files to or from the simulated system while it is running. The utility is located in the simulator system root image in the /usr/bin directory.

If you call the utility as:

v callthru sink <filename>, it writes its standard input into <filename> on the host system

v callthru source <filename>, it writes the contents of <filename> on the host system to standard output.

Redirecting appropriately lets you copy files to and from the host. For example, when the simulator is running on the host, you could copy a Cell BE application into /tmp:

cp matrix_mul /tmp

Then, in the console window of the simulated system, you could access it as follows:

callthru source /tmp/matrix_mul > matrix_mul
chmod +x matrix_mul
./matrix_mul

The /tmp directory is shown as an example only.

The source files for the callthru utility are in /opt/ibm/systemsim-cell/sample/callthru. The callthru utility is built and installed onto the sysroot disk as part of the SDK installation process.
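The two directions can be combined into a round trip: fetch an input file from the host, process it inside the simulated system, and write the result back to the host. The sketch below is meant for the console of the simulated system; the function name roundtrip and the file names in the usage comment are hypothetical.

```shell
# Usage: roundtrip <host_input> <host_output> <command> [args...]
# The command reads the fetched file on standard input and writes its
# result to standard output.
roundtrip() {
    host_in=$1; host_out=$2; shift 2
    if ! command -v callthru >/dev/null 2>&1; then
        echo "callthru not found (run this inside the simulated system)" >&2
        return 1
    fi
    work=$(mktemp -d) || return 1
    callthru source "$host_in" > "$work/input" || return 1   # host -> simulator
    "$@" < "$work/input" > "$work/output" || return 1        # process locally
    callthru sink "$host_out" < "$work/output"               # simulator -> host
}

# Example: roundtrip /tmp/data.txt /tmp/sorted.txt sort
```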

Read and write access to the simulator sysroot image

By default the simulator does not write changes back to the simulator system root (sysroot) image. This means that the simulator always begins in the same initial state of the sysroot image. When necessary, you can modify the simulator configuration so that any file changes made by the simulated system to the sysroot image are stored in the sysroot disk file so that they are available to subsequent simulator sessions.

To specify that you want to update the sysroot image file with any changes made in the simulator session, change the newcow parameter on the mysim bogus disk init command in .systemsim.tcl to rw (specifying read/write access) and remove the last two parameters. The following is the changed line from .systemsim.tcl:

mysim bogus disk init 0 $sysrootfile rw

When running the simulator with read/write access to the sysroot image file, you must ensure that the file system in the sysroot image file is not corrupted by incomplete writes or a premature shutdown of the Linux operating system running in the simulator. In particular, you must be sure that Linux writes any cached data out to the file system before exiting the simulator. To do this, issue "sync ; sync" in the Linux console window just before you exit the simulator.

Enabling Symmetric Multiprocessing support

By default the simulator provides an environment that simulates one Cell BE processor. To simulate an environment where two Cell BE processors exist, similar to a BladeCenter QS21, you must enable Symmetric Multiprocessing (SMP) support. A tcl run script, config_smp.tcl, is provided with the simulator to configure it for SMP simulation. For example, the following sequence of commands starts the simulator configured with a graphical user interface and SMP:

export PATH=$PATH:/opt/ibm/systemsim-cell/bin
systemsim -g -f config_smp.tcl

When the simulator is started, it has access to sixteen SPEs across two Cell BE processors.

Enabling xclients from the simulator

To enable xclients from the simulator, you need to configure BogusNet (see the BogusNet HowTo), and then perform the following configuration steps:

1. Enable ip-forward:

echo 1 > /proc/sys/net/ipv4/ip_forward

2. Configure IPTABLES:

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth0 -o tap0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i tap0 -o eth0 -j ACCEPT

Notes:

1. The IPTABLES commands need to use the correct tap# interface configured with BogusNet.

2. The first iptables command fails unless the Linux kernel was configured to allow the NAT feature. To enable your kernel for the NAT feature, you need to rebuild the kernel and reboot your simulator host system.

Specifying the processor architecture

Many of the tools provided in SDK 3.0 support multiple implementations of the CBEA. These include the Cell BE processor and a future processor. This future processor is a CBEA-compliant processor with a fully pipelined, enhanced double precision SPU.

The future processor supports five optional instructions to the SPU Instruction Set Architecture. These include:

v DFCEQ

v DFCGT

v DFCMEQ

v DFCMGT

v DFTSV

Detailed documentation for these instructions is provided in version 1.2 (or later) of the Synergistic Processor Unit Instruction Set Architecture specification. The future processor also supports improved issue and latency for all double precision instructions.

The SDK compilers support compilation for either the Cell BE processor or the future processor.

Table 3. spu-gcc compiler options

Option    Description

-march=<cpu type>
    Generate machine code for the SPU architecture specified by the CPU type. Supported CPU types are either cell (default) or celledp, corresponding to the Cell BE processor or future processor, respectively.

-mtune=<cpu type>
    Schedule instructions according to the pipeline model of the specified CPU type. Supported CPU types are either cell (default) or celledp, corresponding to the Cell BE processor or future processor, respectively.

Table 4. spu-xlc compiler options

Option    Description

-qarch=<cpu type>
    Generate machine code for the SPU architecture specified by the CPU type. Supported CPU types are either spu (default) or edp, corresponding to the Cell BE processor or future processor, respectively.

-qtune=<cpu type>
    Schedule instructions according to the pipeline model of the specified CPU type. Supported CPU types are either spu (default) or edp, corresponding to the Cell BE processor or future processor, respectively.

The simulator also supports simulation of the future processor. The simulator installation provides a tcl run script to configure it for such simulation. For example, the following sequence of commands starts the simulator configured for the future processor with a graphical user interface:

export PATH=$PATH:/opt/ibm/systemsim-cell/bin
systemsim -g -f config_edp_smp.tcl

The static timing analysis tool, spu_timing, also supports multiple processor implementations. The command line option -march=celledp can be used to specify that the timing analysis be done corresponding to the future processor's enhanced pipeline model. If the architecture is unspecified or invoked with the command line option -march=cell, then analysis is done corresponding to the Cell BE processor's pipeline model.
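Because the two compilers spell the same choice differently, a build script may find it convenient to map a compiler family and a target to the options from Tables 3 and 4. The sketch below assumes nothing beyond those tables; the function name spu_arch_flags and the target labels cell and future are invented for this illustration.

```shell
# Usage: spu_arch_flags <gcc|xlc> <cell|future>
# Prints the architecture and tuning options for the chosen compiler family.
spu_arch_flags() {
    case "$1:$2" in
        gcc:cell)   echo "-march=cell -mtune=cell" ;;
        gcc:future) echo "-march=celledp -mtune=celledp" ;;
        xlc:cell)   echo "-qarch=spu -qtune=spu" ;;
        xlc:future) echo "-qarch=edp -qtune=edp" ;;
        *)
            echo "usage: spu_arch_flags <gcc|xlc> <cell|future>" >&2
            return 1
            ;;
    esac
}

# Example: spu-gcc $(spu_arch_flags gcc future) -O2 -c kernel.c
```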

PPE address space support on SPE

When you develop SPE programs using the SDK, you may wish to reference variables in the PPE address space from code running on an SPE. This is achieved through an extension to the C language syntax.

It might be desirable to share data in this way between an SPE and the PPE. This extension makes it easier to pass pointers so that you can use the PPE to perform certain functions on behalf of the SPE. You can readily share data between all SPEs through variables in the PPE address space.

The compiler recognizes an address space identifier __ea that can be used as an extra type qualifier like const or volatile in type and variable declarations. You can qualify variable declarations in this way, but not variable definitions.

The following are examples.

/* Variable declared on the PPE side. */
extern __ea int ppe_variable;

/* Can also be used in typedefs. */
typedef __ea int ppe_int;

/* SPE pointer variable pointing to memory in the PPE address space */
__ea int *ppe_pointer;

Pointers in the SPE address space can be cast to pointers in the PPE address space. Doing this transforms an SPE address into an equivalent address in the mapped SPE local store (in the PPE address space). The following is an example.

int x;
__ea int *ppe_pointer_to_x = &x;

These pointer variables can be passed to the PPE process by way of a mailbox and used by PPE code. With this method, you can perform operations in the PPE execution context such as copying memory from one region of the SPE local store to another.

In the same way, these pointers can be converted to and from the two address spaces, as follows:

int *spe_x;
spe_x = (int *) ppe_pointer_to_x;

References to __ea variables cause decreased performance. The implementation performs software caching of these variables, but there are much higher overheads when the variable is accessed for the first time. Modifications to __ea variables are also cached. The writeback of such modifications to PPE address space may be delayed until the cache line is flushed, or the SPU context terminates.

GCC for the SPU provides the following command line options to control the runtime behavior of programs that use the __ea extension. Many of these options specify parameters for the software-managed cache. In combination, these options cause GCC to link your program to a single software-managed cache library that satisfies those options. Table 5 describes these options.

Table 5. Options

Option    Description

-mea32
    Generate code to access variables in 32-bit PPU objects. The compiler defines a preprocessor macro __EA32__ to allow applications to detect the use of this option. This is the default.

-mea64
    Generate code to access variables in 64-bit PPU objects. The compiler defines a preprocessor macro __EA64__ to allow applications to detect the use of this option.

-mcache-size=8
    Specify an 8 KB cache size.

-mcache-size=16
    Specify a 16 KB cache size.

-mcache-size=32
    Specify a 32 KB cache size.

-mcache-size=64
    Specify a 64 KB cache size.

-matomic-updates
    Use DMA atomic updates when flushing a cache line back to PPU memory. This is the default.

-mno-atomic-updates
    This negates the -matomic-updates option.
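The options in Table 5 are normally used together, one from each group. As an illustration, the sketch below assembles a consistent option string and rejects values that the table does not list; the function name ea_flags and its argument convention are invented for this example.

```shell
# Usage: ea_flags <32|64> <8|16|32|64> [atomic|no-atomic]
# Prints the spu-gcc options for the requested PPU object size, software
# cache size in KB, and cache line writeback mode (atomic is the default).
ea_flags() {
    ptr=$1; cache=$2; atomic=${3:-atomic}
    case $ptr in
        32|64) ;;
        *) echo "PPU object size must be 32 or 64" >&2; return 1 ;;
    esac
    case $cache in
        8|16|32|64) ;;
        *) echo "cache size must be 8, 16, 32 or 64" >&2; return 1 ;;
    esac
    case $atomic in
        atomic)    upd=-matomic-updates ;;
        no-atomic) upd=-mno-atomic-updates ;;
        *) echo "third argument must be atomic or no-atomic" >&2; return 1 ;;
    esac
    echo "-mea$ptr -mcache-size=$cache $upd"
}

# Example: spu-gcc $(ea_flags 64 32) -o quicksort_spu quicksort_spu.c
```

The file names in the example comment are placeholders.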

Accessing an __ea variable from an SPU program creates a copy of this value in the local storage of the SPU. Subsequent modifications to the value in main storage are not automatically reflected in the copy of the value in local store. It is your responsibility to ensure data coherence for __ea variables that are accessed by both SPE and PPE programs.

A complete example using __ea qualifiers to implement a quick sort algorithm on the SPU accessing PPE memory can be found in the examples/ppe_address_space directory provided by the SDK 3.0 cell-examples tar ball.

SDK programming examples and demos

Each of the examples and demos has an associated README.txt file. There is also a top-level readme in the /opt/cell/sdk/src directory, which introduces the structure of the example code source tree.

Almost all of the examples run both within the simulator and on the BladeCenter QS21. Some examples include SPU-only programs that can be run on the simulator in standalone mode.

The source code, which is specific to a given Cell BE processor unit type, is in the corresponding subdirectory within a given example’s directory:

v ppu for code compiled to run on the PPE

v ppu64 for code specifically compiled for 64-bit ABI on the PPE

v spu for code compiled to run on an SPE

v spu_sim for code compiled to run on an SPE under the system simulator in standalone environment

Overview of the build environment

In /opt/cell/sdk/buildutils there are some top level Makefiles that control the build environment for all of the examples. Most of the directories in the libraries and examples contain a Makefile for that directory and everything below it. All of the examples have their own Makefile but the common definitions are in the top level Makefiles.

The build environment Makefiles are documented in /opt/cell/sdk/buildutils/README_build_env.txt.

Changing the default compiler

Environment variables in the /opt/cell/sdk/buildutils/make.* files are used to determine which compiler is used to build the examples.

The /opt/cell/sdk/buildutils/cellsdk_select_compiler script can be used to switch the compiler. The syntax of this command is:

where the xlc flag selects the XL C/C++ compiler and the gcc flag selects the GCC compiler. The default, if unspecified, is to compile the examples with the GCC compiler.

After you have selected a particular compiler, that same compiler is used for all future builds, unless it is specifically overwritten by shell environment variables, SPU_COMPILER, PPU_COMPILER, PPU32_COMPILER, or PPU64_COMPILER.

Building and running a specific program

You do not need to build all the example code at once; you can build each program separately. To start from scratch, issue a make clean using the Makefile in the /opt/cell/sdk/src directory or anywhere in the path to a specific library or sample.

If you have performed a make clean at the top level, you need to rebuild the include files and libraries first before you compile anything else. To do this run a make in the src/include and src/lib directories.
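The rebuild order described above can be sketched as follows. The helper names run and rebuild_example are hypothetical, and the example subdirectory in the usage comment is an assumption about where a particular sample lives. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: rebuild_example <src_root> <example_subdir>
rebuild_example() {
    root=$1; example=$2
    run make -C "$root" clean &&       # start from scratch at the top level
    run make -C "$root/include" &&     # rebuild the include files first
    run make -C "$root/lib" &&         # then the libraries
    run make -C "$root/$example"       # finally the program of interest
}

# Example: DRY_RUN=0 rebuild_example /opt/cell/sdk/src examples/sync
```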

Note: In SDK 3.0, the top-level Makefiles for Cell BE applications have been moved into the subdirectory buildutils under the main SDK directory /opt/cell/sdk. If you developed Makefiles using previous versions of the SDK, you may need to modify them to reference this new location for the top-level Makefiles.

Compiling and linking with the GNU tool chain

This release of the GNU tool chain includes a GCC compiler and utilities that optimize code for the Cell BE processor. These are:

v The spu-gcc compiler for creating an SPU binary

v The ppu32-embedspu tool

v The ppu-gcc compiler

v The ppu-embedspu tool which enables an SPU binary to be linked with a PPU binary into a single executable program

v The ppu32-gcc compiler for compiling the PPU binary and linking it with the SPU binary

The example below shows the steps required to create the executable program simple which contains SPU code, simple_spu.c, and PPU code, simple.c.

1. Compile and link the SPE executable.

/usr/bin/spu-gcc -g -o simple_spu simple_spu.c

2. Optionally run embedspu to wrap the SPU binary into a CESOF (CBE Embedded SPE Object Format) linkable file. This contains additional PPE symbol information.

/usr/bin/ppu32-embedspu simple_spu simple_spu simple_spu-embed.o

3. Compile the PPE side and link it together with the embedded SPU binary.

/usr/bin/ppu32-gcc -g -o simple simple.c simple_spu-embed.o -lspe

4. Or, compile the PPE side and link it directly with the SPU binary. The linker will invoke embedspu, using the file name of the SPU binary as the name of the program handle struct.

Notes:

1. This section only highlights 32-bit ABI compilation. To compile for 64-bit, use ppu-gcc (instead of ppu32-gcc) and use ppu-embedspu (instead of ppu32-embedspu).

2. You are strongly advised to use the -g switch as shown in the examples. This embeds extra debugging information into the code for later use by the GDB debuggers supplied with the SDK. See Chapter 3, “Debugging Cell BE applications,” on page 29 for more information.
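Steps 1 to 3, together with the tool substitution from note 1, can be collected into one helper that selects the tool names by ABI. This is a sketch only: the names run and build_simple are hypothetical, the tools are assumed to be on PATH rather than invoked through /usr/bin, and the 64-bit command lines simply apply the substitution from note 1 without any further options. With DRY_RUN left at its default of 1, the commands are only printed.

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: build_simple <32|64>
build_simple() {
    case $1 in
        32) cc=ppu32-gcc; embed=ppu32-embedspu ;;
        64) cc=ppu-gcc;   embed=ppu-embedspu ;;
        *)  echo "ABI must be 32 or 64" >&2; return 1 ;;
    esac
    run spu-gcc -g -o simple_spu simple_spu.c &&                 # step 1: SPE executable
    run "$embed" simple_spu simple_spu simple_spu-embed.o &&     # step 2: wrap as CESOF
    run "$cc" -g -o simple simple.c simple_spu-embed.o -lspe     # step 3: PPE side and link
}

# Example: DRY_RUN=0 build_simple 32
```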

Support for huge TLB file systems

The SDK supports the huge translation lookaside buffer (TLB) file system, which allows you to reserve 16 MB huge pages of pinned, contiguous memory. This feature is particularly useful for some Cell BE applications that operate on large data sets, such as the FFT16M workload sample.

To configure the BladeCenter QS21 for 20 huge pages (320 MB), run the following commands:

mkdir -p /huge
echo 20 > /proc/sys/vm/nr_hugepages
mount -t hugetlbfs nodev /huge

If you have difficulties configuring adequate huge pages, it could be that the memory is fragmented and you need to reboot.

You can add the command sequence shown above to a startup initialization script, such as /etc/rc.d/rc.sysinit, so that the huge TLB file system is configured during the system boot.

To verify the large memory allocation, run the command cat /proc/meminfo. The output is similar to:

MemTotal: 1010168 kB
MemFree: 155276 kB
. . .
HugePages_Total: 20
HugePages_Free: 20
Hugepagesize: 16384 kB

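Huge pages are allocated by invoking mmap of a /huge file of the specified size. For example, the following code sample allocates 32 MB of private huge paged memory:

int fmem;
char *mem_file = "/huge/myfile.bin";

if ((fmem = open(mem_file, O_CREAT | O_RDWR, 0755)) == -1) {
    perror("open");   /* could not create the file in the huge TLB file system */
    exit(1);
}
remove(mem_file);     /* remove the name; the open descriptor keeps the file alive */

ptr = mmap(0, 0x2000000, PROT_READ | PROT_WRITE, MAP_PRIVATE, fmem, 0);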

mmap succeeds even if there are insufficient huge pages to satisfy the request. On first access to a page that cannot be backed by the huge TLB file system, the application is "killed". That is, the process is terminated and the message "killed" is emitted. You must ensure that the number of huge pages requested does not exceed the number available. Furthermore, on a BladeCenter QS20 and BladeCenter QS21, the huge pages are equally distributed across both

restrict memory allocation to a specific node find that the number of available huge pages for the specific node is half of what is reported in /proc/meminfo.
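Because mmap does not report a shortage of huge pages, it can help to compare the request against HugePages_Free before starting the application. The following sketch does that arithmetic; the function names are hypothetical, and the default page size of 16384 kB is taken from the Hugepagesize line shown earlier. It does not account for the per-node split described above.

```shell
# Number of huge pages needed to hold <bytes>, rounded up.
# Usage: hugepages_needed <bytes> [page_size_kb]
hugepages_needed() {
    bytes=$1
    page_kb=${2:-16384}
    page_bytes=$((page_kb * 1024))
    echo $(( (bytes + page_bytes - 1) / page_bytes ))
}

# Read HugePages_Free from a meminfo-style file (default /proc/meminfo).
hugepages_free() {
    awk '/^HugePages_Free:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeeds only if enough free huge pages exist for <bytes>.
# Usage: hugepages_available_for <bytes> [meminfo_file]
hugepages_available_for() {
    need=$(hugepages_needed "$1")
    free=$(hugepages_free "$2")
    [ "${free:-0}" -ge "$need" ]
}

# Example: the 32 MB mapping above (0x2000000 bytes) needs two 16 MB pages.
#   hugepages_available_for 33554432 || echo "not enough huge pages" >&2
```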

SDK development best practices

This section documents some best practices in terms of developing applications using the SDK. See also developerWorks articles about programming tips and best practices for writing Cell BE applications at

http://www.ibm.com/developerworks/power/cell/

Using a shared development environment

Multiple users should not update the common simulator sysroot image file by mounting it read-write in the simulator. For shared development environments, the callthru utility (see “The callthru utility” on page 20) can be used to get files in and out of the simulator. Alternatively, users can copy the sysroot image file to their own sandbox area and then mount this version with read/write permissions to make persistent updates to the image.

If multiple users need to run Cell BE applications on a BladeCenter QS21, you need a machine reservation mechanism to reduce collisions between two people who are using SPEs at the same time. This is because SPE threads are not fully preemptable in this version of the SDK.

Performance considerations

The following sections discuss performance considerations that you should take into account when you are developing applications:

v “NUMA”

v “Preemptiveco
