• No results found

Data-parallel agent-based microscopic road network simulation using graphics processing units

N/A
N/A
Protected

Academic year: 2019

Share "Data-parallel agent-based microscopic road network simulation using graphics processing units"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

Contents lists available at ScienceDirect

Simulation

Modelling

Practice

and

Theory

journal homepage: www.elsevier.com/locate/simpat

Data-parallel

agent-based

microscopic

road

network

simulation

using

graphics

processing

units

Peter

Heywood

a,∗

,

Steve

Maddock

a

,

Jordi

Casas

b

,

David

Garcia

b

,

Mark

Brackstone

b

,

Paul

Richmond

a

aDepartmentofComputerScience,TheUniversityofSheffield,UnitedKingdom bTransportSimulationSystemsLtd(TSS),Spain

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Availableonlinexxx

Keywords:

Agent-basedsimulation GPU

Simulationframework Transportmicrosimulation

a

b

s

t

r

a

c

t

Roadnetworkmicrosimulationiscomputationallyexpensive,andexistingstateoftheart commercialtoolsusetaskparallelismand coarse-graineddata-parallelismformulti-core processorsto achieveimprovedlevels ofperformance.Analternativeisto useGraphics ProcessingUnits(GPUs)andfine-graineddataparallelism.ThispaperdescribesaGPU ac-celeratedagentbasedmicrosimulationmodelofaroadnetworktransportsystem.The per-formanceforaprocedurallygeneratedgridnetworkisevaluatedagainstthatofan equiv-alentmulti-coreCPUsimulation.InordertoutiliseGPUarchitectureseffectivelythe pa-perdescribesanapproachforgraphtraversalofneighbouringinformationwhichisvital toprovidinghighlevelsofcomputationalperformance.Thegraphtraversalapproachhas beenintegratedwithinaGPUagentbasedsimulationframeworkasageneralisedmessage traversaltechniqueforgraph-basedcommunication.Speed-upsofupto43× are demon-stratedwithincreasedperformancescalingbehaviour.Simulationofoverhalfamillion ve-hiclesandnearlytwomilliondetectorsatarateof25× fasterthanreal-timeisobtained onasingleGPU.

© 2017TheAuthors.PublishedbyElsevierB.V. ThisisanopenaccessarticleundertheCCBYlicense. (http://creativecommons.org/licenses/by/4.0/)

1. Introduction

Simulationsofroadnetworksareusedduringthedevelopmentandmanagementoftransportnetworksaroundtheglobe. Microscopic roadnetwork simulationsare fine-grainedsimulations which simulateindividual vehicles within thesystem, capturinglowlevelbehaviours.AgentBasedModelling(ABM)isonemicroscopicapproach,whererelativelysimple individ-ualbehavioursaredefined,whichcombinedwithinteractionsbetweenagentsandtheenvironment,allows theemergence ofcomplex behaviours. However, microscopic simulations are much more computationally expensive than the more tra-ditionalhigher-level macroscopicsimulations, which use a higherlevel ofabstraction consistingof network flows rather thanindividualvehicles. Nonetheless,thelevelofdetailcapturedbymicroscopicsimulations ismuchgreaterthan thatof macroscopicandmesoscopicsimulations,includingtheemergentbehavioursenabledbytheuseofABM.

Correspondingauthor.

E-mailaddresses:p.heywood@sheffield.ac.uk(P.Heywood),p.richmond@sheffield.ac.uk(P.Richmond).

https://doi.org/10.1016/j.simpat.2017.11.002

1569-190X/© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)

(2)

As microscopic simulations are computationally expensive, current commercial and open-source tools such as Aim-sun [1], Vissim [2], MATsim [3] and SUMO [4] can have considerable execution run-times, especially for large scale simulations. This limits the overall effectiveness and uptake of microscopic simulation within the transport sector [5]. To increase simulator performance, existing simulation tools use parallel processing applied to multi-core processors to reduce therun-timeof simulations.Task-parallelandcoarse-graineddata-parallel approachesaretypically applied within existing tools. Task-parallelism distributes independent processing tasks to separate processing threads. Coarse-grained data-parallelism applies the same algorithms to differentunits of data,where the individual unitsof data are relatively large.Theseapproachesare well-suitedto multi-coreCentralProcessing Units(CPUs), butcan resultinpoorperformance scalability.

Many-core architecturessuch as Graphics Processing Units(GPUs) offersignificantly greater levels of parallelism than multi-core architectures, andsignificantlygreater levelsofraw computeperformance. Toaccessthehighlevels of perfor-mance, algorithms anddata structures must expose high levels of parallelism andenable goodmemory-access patterns. Typicallyfine-graineddata-parallelismisused,wherethesamealgorithmsareappliedtorelativelysmallindividualunitsof data.

Todemonstratethesimulation performanceimprovements whichcould beachievedby usingmany-coreGPUs for mi-croscopicroadnetworksimulations,asingle-lanetransportmodelwithstop-sign-basedyellow-boxjunctionshasbeen im-plemented using FLAME GPU (Flexible Large Scale Agent Modelling Environment forGraphics Processing Units). FLAME GPU is the only general-purpose GPU-accelerated ABM framework [6]. The performance of the simulation is bench-marked using a procedurally generated artificial grid-based road network, and the simulation performance is compared toahigh-performancemulti-core CPUmicroscopicroadnetworksimulationtool,Aimsun 8.1,whichimplementsthesame models.

This paper presents two contributions:(i) a GPU accelerated data-parallel agent-based road network microsimulation modelisevaluatedagainstanequivalentmodelinacommercialmulti-core CPUsoftwaretool,demonstratingconsiderable improvementstosimulationperformanceandperformancescalability;and(ii)ageneral-purposegraph-based communica-tionstrategyispresentedforhighperformanceagentcommunicationforfine-graineddata-parallelagentbasedsimulations, implemented forthe FLAMEGPU ABMframework which enableshigh performance agentbasedsimulations of transport networksonGPUs.

Section2 providesasummary ofrelatedwork.Section3 detailstheimplementedmodel,thebenchmarknetworkused andtheFLAMEGPUimplementation.Section4 describesthegraph-basedcommunicationstrategytoenablethehighlevels ofperformanceinthismodel.Section5 providestheresultsofasetofapplicationbenchmarksusedtoassessthe perfor-manceimpactofGPUsonmicroscopicroadnetworksimulationusingABM.Section6 concludesthepaper.

2. Relatedwork

Microscopicsimulationoftransportsimulationinvolvesthesimulationofindividualvehiclesandpiecesofroadnetwork infrastructuretopredict theeffectsofchanges invehiclebehaviour orchanges intheroadnetworkinfrastructure.Thisis used asatool inthe planningandmanagement oftransport networkstoimprovethe effectivenessofinfrastructure and minimisetheimpactofchangesinconditionsonthetransportationnetwork.Microscopicsimulations arecomputationally expensive, whichhaslimitedtheadoptionofmicrosimulationcomparedtohigherlevelmesoscopicandmacroscopic sim-ulations [5]. Amicro-scale simulation must includebehavioural models for vehicles to followas well asmodels for any dynamicinfrastructuresuch astrafficsignals andvehicle detectors,whichattemptto accuratelycapturethe behaviourof therealworld.ABMisapowerfulapproachtodefiningmicroscopicmodels,whichprovidesanaturalmethodofdescribing individual behaviouralmodelsandthensimulatingthe interactionbetweenindividualsinthesimulation[7].Someofthe mostimportantvehicle behaviouralmodels are: (i)Car Following Models[8,9];(ii) LaneChangingBehaviour[10,11]; and (iii)GapAcceptanceModelling[1,12].

Thesebehaviouralmodelsmakeuseofseveralsetsofinputdata,including:theattributesandstructureofthetransport network;transportdemandwhichisusedtopopulatethesimulationandallowalternatescenariostobesimulatedfor pre-dictiveuse;andsimulationparameterswhichareusedtomodifythemodelswithinthesimulator.Parameterscaninclude distributionsfromwhichvehiclepropertiescan besampled,such asvehiclelength,rateofaccelerationorevenproperties suchaspollutionemissionrates.Theseparameterscanbemanipulatedtoreplicateobservedbehaviour[13,14].

(3)

Fig.1. Theaveragerun-timefrom3repetitionsfora60minuteAimsun8.1simulationcontainingapproximately25,000vehicles,fordifferentprocessor threadcounts.Asprocessorthreadcountincreases,totalsimulationtimedecreases,butwithdiminishingreturns.NotethattheIntelXeonE5-2643isa dual-socketsystem.Additionally,noperformanceimprovementswereobservedwhenusingmoreprocessingthreadsthanphysicalcores(Hyper-threading).

Incontrasttomulti-coreprocessorarchitectures,highlyparallelmany-coreprocessingarchitecturessuchasGPUsenable theuseoffine-graineddata-parallelismtooffer highlevels ofperformance. Originallydevelopedfor2Dand3Dcomputer graphicsapplications,GPUsarenowcommonlyusedtoaccelerategeneralpurposecomputingthroughdata-parallelism,as they offersignificant computational power. Forinstance, a singlestate ofthe artNVIDIA P100GPU hasa peak theoreti-cal performance of 5.3TFLOPS (trillion floating point operations per second)in double precision [17], compared to Intel Broadwell(22 core)Xeon CPUswhichare capable oflessthan 1.0TFLOPS [18], withevengreater levels ofrawcompute performanceforsingleandhalfprecision floating pointoperations. From adevice perspective,GPUs useaformofSingle Instruction,ManyData(SIMD)parallelarchitecture,wherethesameinstructionsareappliedtomanyitemsofdata concur-rently.Additionally,alarge portionoftheGPUdiespaceisused forthe manyprocessingcores,leavinga relativelysmall portionofdiespaceavailableforon-chipcachesotoachievehighlevelsofperformancetheuseofmemorymustbe care-fullyconsidered. Alongwithdataaccess patterns,theSIMDmodel usedby GPUs oftenrequiresalternate algorithms and datastructures tosoftwaredesignedformulti-coreCPUs,inordertoleveragethehighlevelsofperformance required. Ad-ditionally,GPUsare typicallymoreenergy-efficientwhencomparedtoCPUs,offering greaterlevelsofperformanceforthe samepowerusage,i.e.GPUs typicallyofferahighernumberofFLOPSperwatt.Thisisdemonstratedbythehigh propor-tionofGPUenabledsuper-computersintheupperranksoftheGreen500,therankingoftheworld’smostpower-efficient super-computers[19].

AlthoughcurrentcommercialandopensourcemicroscopicroadnetworksimulatorssuchasVissimorAimsunuse multi-coreCPUparallelism toincrease simulatorperformance,GPUshaveyettobeadoptedwithintransportmodellingsoftware asawhole.GPUshavebeenusedpreviouslyinmicroscopictransportnetworksimulation,asbothcellularautomata[20] and alsousingmulti-agentsystems[21].Additionallyothertasksassociatedwithmicroscopicsimulationhavebeenaccelerated usingGPUs,suchasthegenerationofdemanddatausingactivityplangeneration[22],traffic signaloptimisation[23] and dynamicrouteassignment[24].Recently,outsideofmicroscopicsimulation,GPUshavebeenutilisedwithinbothmesoscopic andmicroscopicroad networksimulation toolsandperformance improvementshave beendemonstrated[25–27], further highlightingthedemandforreducedsoftwarerun-timeswithinthetransportmodellingsector.

(4)

Fig.2. FLAMEGPUState-basedrepresentationforvehicleagentswithinthesimulation,illustratingthelogicalcontrol-flowoftheagentswithinasingle iteration.Interactionwithdetectoragentshasbeenomittedforclarity.

enablinghighperformancesimulations withoutexplicitknowledgeofthe parallelprocessingmodel[6,31,32].FLAMEGPU hassuccessfullybeenusedinseveralmodellingdomainssuchaspedestriancrowdsimulation[33] andcellularsimulations

[31,34],however the useof FLAME GPUfor road network simulation hasbeen limited, due to the unsuitabilityof fixed radiuscommunicationforusewithintransportnetworks[35].

3. Mulit-agentGPUmicrosimulation

ToevaluatethepotentialperformanceimpactofGPUacceleratedABMonmicroscopicroadnetworksimulationa single-lane road network model was implemented. The implemented models include: car following model [1,15,36] (based on Gipps’carfollowingmodel[8]);constantvehiclearrival[15,36];virtualqueues[15,36];detectors[15,36];ayellow-boxmodel

[15,36] andagapacceptancegive-waymodel[1,15,36] combinedwithstop-signsforjunctions.Thesefeaturesandmodels wereselectedtoproducecomparablebehaviourfortheprocedurallygeneratedgridroadnetworkdescribedinSection5.3, withoutthecomplexitiesofmodellingthebroadrangeofroadnetworkinfrastructurepresentinrealworldroadnetworks. FLAMEGPUsimulationsmakeuseofastate-basedrepresentationofagentdynamicstosimplifydevelopmentandenable keyperformance optimisationson behalfofthemodeller.Fig.2 showsthestate machineforthevehicularagentsusedto implementthismodel;itrepresentsthelogicalprocessfollowedbyeachagentateachiterationofthesimulation,showing therelationshipbetweenagentstates,functionsandmessage-lists.Agentsenterthestatemachineforthecurrentiteration inoneofthepossibleinitialstates.Agentfunctionsallowagentstotransitionfromonestatetoanother,withtheorderof thesefunctionscontrolledbyexecutionlayers.Communicationbetweenagentsisintheformofmessagelists,whichareused tohighlightdependenciesinthestate-basedrepresentation.Theuseofindirectcommunicationpreventsraceconditionsin theparallelprocessingenvironment.Agentfunctionscanoutputmessagestotherelevantmessagelistdatastructure,which isthenusedasinputbyadifferentagentfunction.Themessagelistsareaccessedusingspecialisedmessageaccesspatterns toreduce theperformance impactofmessagelistparsing,anewgraph-basedcommunicationpatternisusedforagentto agentcommunication,asdescribedinSection4.

Forinstance,inthefirstexecutionlayer,theagentsineachofthethreepossiblestatesforvehiclesoutputamessagetoa messagelist.Thosemessagelistsaretheniteratedbythedifferentagentstatesinlayerstwoandthree,dependantuponthe agentstate.Vehicleagentswhicharedeemedtohaveleft thesimulationareidentifiedasdeadandsubsequentlyexcluded fromfurtheriterationsofthesimulation.

(5)

stateimplementtheconstantorexponentialvehiclearrivalalgorithmfromAimsun[15],ensuringthatthesameconditions requiredforanagenttobeinsertedintothesimulationsuchasorderofarrivalandsufficientheadwayaremet.

TheRoadandJunctionstatesareusedtorepresentvehicleswhichusethetwographsusedtoimplementtheroad net-workdatastructure,withvehiclesintheRoadstateexhibitingprimarilythecarfollowingbehaviouralmodel,andinteraction withstopsigns.AgentsintheJunctionstateexecutethegapacceptancegive-waymodelandyellow-boxmodelusedforthis study,andsubsequentlyifmovementisallowedthevehiclewillusethecarfollowingmodeltotraversethejunction.

Additionaltothestatemachineforcars,alinearstatemachineexistsfordetectoragents.DetectorAgentsrepresent real-worldvehicledetectionsystems suchasinductionloopsplacedwithin roadnetworkinfrastructure.Thesedetectionloops areusedtomeasuretraffic conditions,theoutputofwhichcanbe usedforcalibration,validationorinteractivebehaviour. Thedetectorstate machineislinear,consistingoffouragentfunctionsinseparatelayers.Thesefunctionseach iterateone ofthemessagelists produced bythe vehicleagents,andcaptureanystatistics relevantto theindividual detectorforthe currentinteraction.

Thetransportnetworkisrepresentedusinga pairofgraphs,representedusingtheCompressed SparseRow(CSR)data structure.Thefirstgraphrepresentstheroadsofthetransportnetworkasedges,andthejunctionsasvertices.Thesecond graphmodelstheconnectionsbetweenroadswithineachjunction.Separategraphsareusedtosimplifythespecificationof theroadnetwork.

4. Networkcommunication

WithinFLAMEGPU, communicationbetweenagentsusesmessage-listtraversalalgorithmsprovidingspecialised access patternstoenableefficient,indirect,data-parallelagentcommunication.Thealgorithmsanddata-structuresusedcanhave considerableimpact ontheperformanceofamodel,andthedevelopmentofspecialisedhigh-performanceaccessto mes-sages is vital to achieve highlevels of performance forlarge-scale ABMs. Thissection describes a newspecialisation for graph-basedmessageswithinFLAMEGPU,whichenableshighperformancefornetwork-basedcommunicationpatterns.The performanceofthecommunicationstrategyisevaluatedcomparedtoexistingFLAMEGPUcommunicationpatterns.

4.1. FLAMEGPUmessage-basedcommunication

ThecommunicationmethodusedinaFLAMEGPUmodelcanbespecialisedtoofferincreasedlevelsofperformancefor differentcommunicationpatterns.Existingtechniquesformessage-basedcommunicationavailablewithinFLAMEGPUallow globalcommunication(all-to-all)orpartitionedcommunicationbasedonspatiallocality[6].All-to-allmessagespecialisation enablesindirectglobalcommunicationbetweenagents,whilespatiallypartitionedmessagingrestrictsthecommunicationto theimmediateregionsurroundingtheagentin2Dor3Dspace.Thesecommunicationpatternsarenon-optimalforseveral corevehiclemodelsforroadnetworkmicrosimulation,wherethecommunicationbetweenagentsisrestrictedtotheroad networkinfrastructure.Typicallymodelssuchascarfollowingandlanechangingmodelsonlyrequirecommunicationwith other agents on the same road section ordirectly connected road sections within the transport network. Other models suchasgive waymodellingforjunctionsmayrequirecommunicationwithin thejunctionandwithother vehicles onthe connectedroad sections.Inthesecases both all-to-allorspatially partitioned communicationstrategiescan resultin the iterationoflargemessageliststofindtherelevantmessageormessages.Spatiallypartitionedcommunicationwouldoffer an improvementover all-to-allcommunication, howeverin areaswitha large numberof agents,orin dense sectionsof roadnetworksuchasurbancitycentres,thenumberofmessageseach agentmustiteratecanstillbe significantforeven relativelysmallcommunicationradii,resultinginanegativeimpactonperformance[35].

4.2.Graph-basedmessaging

Ideallycommunicationshouldonlyneedtooccur withintheconstraintsofthenetwork,whichistypicallyrepresented usingagraphorseriesofgraphs.Forthepurposesofthisbenchmarkmodel,whichconsistsofsinglelanesectionsofroads andstop-sign-based,yellow-boxjunctions,communicationislimitedtothecurrentroadnetworkedge,thenextroad net-workedge,orthecurrentjunction(graphvertex).Existingtechniquesforgraph-basedcommunicationexistwithinparallel anddistributedagentbasedframeworkssuch asD-Mason andRepast HPC,howeverthesestrategiesuseindividual agents astheverticesofthegraphandtherelationshipsbetweenagentsaretheedgesofthegraph[29,37].

(6)

Fig.3. TheeffectsofalternateFLAMEGPUcommunicationstrategiesoncarfollowingmodelsisshownforasectionofagrid-basedroadnetwork,fora singleagentrepresentedinwhite.All-to-allcommunicationresultsin42messagesbeingparsedbythesingleagent.Spatiallypartitionedmessagingresults inthe18messagesfromtheagentsintheblueshadedregion,whilethegraph-basedcommunicationstrategyresultsinonly5messagesfromtheorange borderedagentsbeingprocessed.Thespatiallypartitionedradiususedforthisillustrationwouldbeinsufficientforaccuratemodelling,andisonlyused forillustrativepurposes.

vertexwithinthenetwork.Ascancanthenbeperformedonthehistogramtoprovidetheoutputpositionforthemessages, allowing messages to be sortedinparallel. The histogramwhich isconstructed duringthe sortingalgorithm canalso be usedasapartofthedatastructureformessageaccess.Thenetwork-basedcommunicationstrategyisgeneralenoughtobe appliedto anyGPUmulti-agentsimulation,wherecommunicationalonga graph-baseddatastructureisrequired,such as socialnetworkmodelling.

4.3. Applicationtoroadnetworkmicrosimulation

(7)

Fig.4. Theaveragetimetakenforvehicleagentstooutputroadmessagesareshownforthreecommunicationstrategiesanddifferentscalesofmodel. Thisshowstheoverheadcostofeachcommunicationspecialisation.TheapplicationwasbenchmarkedonaNvidiaTitanX(Pascal).

Fig.5. TheaveragetimetakenforvehicleagentstoexecutetheGPUkernelwhichincludesthecarfollowingmodelisshownforthreecommunication strategiesanddifferentscalesofmodel.Thiskerneliteratesallmessagesintheappropriatelisttofindtheleadvehicle.Graph-basedcommunicationshows thehighestlevelofperformance.TheapplicationwasbenchmarkedonaNvidiaTitanX(Pascal).

4.4.Performanceimpactofthecommunicationstrategyonthecarfollowingmodel

Toevaluatetheperformanceimprovementprovidedbygraph-basedcommunicationforagent-basedroadnetwork mod-els,aset ofroadnetworkswerebenchmarkedusingthree communicationstrategies.Themessagelistoutput byvehicles travellingalongtheroadsectionswithintheroadnetworkwerespecialisedusingall-to-all,spatiallypartitionedand graph-basedmessaging.TheperformanceoftherelevantGPUkernelsisextractedandcompared.

(8)

Table1

ComputationalhardwareusedforbenchmarkingAimsunandFLAMEGPUsimulations.

Identifier CPU CPUcores CPUfrequency Memory GPU

System#1 Intelcorei74770k 4 3.5GHz 16GBDDR3 NVIDIATitanX(Pascal)(TCC)& NVIDIAGeForceGTX1080 (WDDM)

System#2 2× IntelXeonE52698v4 40(20) 2.20GHz 512GBDDR4 8× NVIDIATeslaP100(linux)

Table2

Parametersusedforvalidationandallbenchmarksimulations.

Parametername Gridsizeexperiment Inputflowexperiment Units

Timestep 0.8 0.8 seconds

Reactiontime 0.8 0.8 seconds

Stopreactiontime 1.2 1.2 seconds

Detectorperiod 600 600 seconds

Inputflowscalefactor 1.0 1.0 %

Seed Randomvalue Randomvalue Longinteger

Sectionlength 1000 1000 m

Junctionlength 10 10 m

Gridsize 2–576 64,128&256

Inputflow 600 100–1000

Detectorspersection 3 3

5. Experimentalresults

This section summarises thecross-validation ofthe FLAMEGPU basedsimulation against the existing multi-core CPU road network microsimulation tool Aimsun. It provides details of the procedurally generated Manhattan style grid road networkusedforperformancebenchmarkingandtwosetsofbenchmarkexperimentsarediscussed.

5.1. Validation

Fortheperformanceevaluationofthetwosimulatorstobecomparableitisimportantthatsoftwareproduces compara-bleresults.Asthecomplexsystemisstochastic theresultswillneedto bestatisticallyequivalent.TheFLAMEGPU imple-mentationwascross-validatedagainst Aimsun8.1foraseriesofmodelstovalidatemodelbehaviours,such asthevehicle arrival algorithm[15],carfollowing[15] andstop signinteraction [15].Additionally theoverallbehavioursandpopulation sizesfortheprocedurallygeneratedroadnetworkareevaluated.Thesamesetofparameters areusedforbothsimulators. Thesetofvalidationmodelsshowedthatvehicleparametersampling,thevehiclearrivalalgorithm,thecarfollowingmodel andthejunctionmodelswereallcorrectlyimplemented,withnodevianceobserved.Theprocedurallygeneratednetworks showedstatisticallysimilarpopulationsizesandturningbehavioursoveranaverageofmultiplereplications.

5.2. Benchmarkingmethodology

Multipleconfigurations ofprocessinghardwarewereused toobservetheperformanceimpact ofthehardwareon per-formance.Table 1 describesthe twohardwaresystems usedfortheseexperiments.The Aimsun benchmarkmodels were executedusingSystem#1,adesktopcomputer.FLAMEGPUsimulationswerecarriedoutusingarangeofGPUsinSystem #1,andalsoonan NVIDIADGX-1 (System#2),a state-of-the-artmulti-GPUserver containingNVIDIATesla P100 acceler-ators. Allsimulationswere executedusingasingleCPU andsingleGPUandthe sameparameters(Table2)were usedfor bothsimulationsofonehourofvehicledemand.

5.3. Procedurallygeneratedbenchmarkroadnetwork

(9)

Fig.6. A5×5exampleoftheprocedurally-generatedartificialgridnetwork,showingtheoverallstructureofthenetworkandthearrangementofturning sectionswithinajunction.Thenetworkcanbescaledtoanysize,withnetworksofupto576×576usedduringbenchmarking.

Table3

ParametersforgenerationoftheprocedurallygeneratedManhattan-stylegridroadnetwork.

ParameterName Description

Gridsize(G) Thenumberofjunctionsforeachrowandcolumnofthegrid Sectionlength Thelengthofeachroadsectionbetweenjunctionsinmetres Junctionlength Thelengthordiameterofjunctionswithinthenetwork

Exitturnproportion Forjunctionswith1exitdestinationedge,theproportionofvehiclesturningontotheexitisreducedto maintainhighervehiclepopulationswithinthesimulatedroadnetwork.

5.4.Networkscaleexperiment

Toevaluatetheperformanceimpactoftotalproblemscale,theGridSizeoftheprocedurallygeneratednetworkwas var-iedfrom2 to576.Aimsun simulations wereonly carriedout up toa gridsize of512 dueto thetime takenforasingle simulationtocomplete.Asthegridsizeisincreased,thesize ofthetransport networkincreases,thenumberofdetectors agentsinthesimulationincreasesandthevehiclepopulationsizeincreases.Fig.7 showtheaveragesimulationtimefor3 repetitionsofthesimulation.ThisshowsthatforlargerscaleproblemstheGPUacceleratedFLAMEGPUsimulation outper-formsAimsun8.1,byupto37.2× usingSystem#1and43.8× usingSystem#2.TheRealtimeratio(RTR)ofasimulationis theratioofexecutionrun-timetosimulatedtime. ThelargestsimulationexecutedinFLAMEGPUof576,000vehiclesand 1,994,112detectorsshowsaRTRof10.5× usingaNVIDIAGeForceGTX1080insystem#1,16.6× usingsystem#1NVIDIA TitanX(Pascal)and25.5× usingtheNVIDIATeslaP100insystem#2.Forverysmallsimulations,withgridsize8andbelow containinglessthan7396concurrentvehicles,Aimsun8.1showsgreaterlevelsofperformancethanFLAMEGPU.

5.5.Inputflowexperiment

Demandontransportationnetworksisincreasingglobally[39].Toevaluatetheimpactofthisonmicroscopictransport networksimulation,thedemandonthetransportnetworkwasvariedforseveralfixedsizesofnetwork.Duetothe charac-teristicsofourprocedurallygeneratedtransportnetwork,thevehiclepopulationsizeincreasesnon-linearly,asvehiclescan onlyenterfromtheedgesofthenetwork.ThemodelparametersusedaredefinedinTable2.

Figs.8–10 showtheaveragesimulationtimefor3repetitionsofeachsimulation.Thesimulationsforagridsizeof64, FLAMEGPUshowsaveragespeed-upsofbetween3.8× and18.9× comparedtoAimsun 8.1. Speed-upsofbetween9.3× and36.6× areshownforagridsizeof128,andbetween8.1× and75.3× foragridsizeof256.

5.5.1. Discussionofapplicationbenchmarkresults

(10)

Fig.7. Totalsimulationperformanceasthetotalscaleofthesimulationisincreased,foreachsimulator.Valuesshownaretheaveragefrom3repetitions. Alogarithmicscaleisusedtoimprovevisibilityofsimilartotalsimulationtimes.

Fig.8. Totalsimulation timeagainstinputflow foraprocedurallygeneratedroadnetworkofgridsize64. Runtimesshownaretheaveragefrom3 repetitions.Alogarithmicscaleisused.

of parallelism andtherefore GPUutilisation increases.Additionally, GPUacceleration hasassociated overhead costs. Data andcommandsmustbetransferredfromthehostcomputertotheGPUonwhichcodeisexecuted.Thisisarelativelyslow process whichmust be minimisedwhere possibleto achieve highlevels ofperformance. Largermodels quicklyamortize thiscostthroughperformancegains.

Theperformanceofindividualsimulationiterations,showninFig.11 showsthatper-iterationsimulationtimeincreases asthesimulationprogresses.Thisisduetotheincreasingnumberofvehicleswithinthesimulation.Therateofincreaseof per-iterationsimulationtimegrowsatahigherratewithinAimsunthanFLAMEGPU,demonstratingtheincreasedscalability oftheGPUacceleratedsimulationwithrespecttopopulationsize.ForbothAimsun andFLAMEGPUthetime tosimulate asingleiterationpeaksevery750iterations.Thiscanbe accountedforbythedetectorandstatisticalsummarydatabeing processedandwrittenouttodiskevery600simulatedseconds(withasimulationstepof0.8s).

(11)

Fig.9. Totalsimulationtimeagainst inputflowforaprocedurallygeneratedroadnetworkofgridsize128.Runtimesshownaretheaveragefrom3 repetitions.Alogarithmicscaleisused.

Fig.10. Totalsimulationtimeagainstinputflowforaprocedurallygeneratedroadnetworkofgridsize256.Runtimesshownaretheaveragefrom3 repetitions.Alogarithmicscaleisused.

Table4

GPUspecifications[40–42].

GPU Corecount Corefrequency(MHz) Memory Memorybandwidth(GB/s)

NVIDIAGeforceGTX1080 2560 1607(1733) 8GBGDDR5X 320

NVIDIATitanX(Pascal) 3584 1417(1531) 12GBGDDR5X 480

NVIDIATeslaP100(SXM2) 3584 1328(1480) 16GBCoWoSHBM2 732

highpopulations of agents,the NVIDIATesla P100 in System #2showed thegreatest levels of performance. This canbe attributedtothelarger numberofprocessingcores,andthe greaterlevels ofmemorybandwidth providedby the HBM2 memory,compared toGDDR5X andGDDR5.The CPUsinSystem #1& #2also differ,howeverthe majorityofprocessing timewithintheapplicationisexecutedontheGPU.

6. Conclusionandfuturework

(12)

multi-Fig.11. Per-Iterationsimulationtimeforamodelwithgridsize128andinputflow500vehiclesperhour.

coreCPUbasedsimulationsoftwaretool commonlyusedwithinthetransportplanningindustry.Toensurethatsimulator behaviour is comparableandenable simulatorperformance to be compared the FLAMEGPU basedsimulation wascross validated against Aimsun using a series of models. As this is a stochastic model and additional features are presentin Aimsunwhichcouldnotbefullymitigatedtheresultswerenon-exactbutwerestatisticallysimilar.Thelargestsimulation executed usingbothsimulators,aone hoursimulationcontainingupto512,000vehiclesand1,575,936detectors,showed a speed-upof 43.8× for theGPU acceleratedsimulation,with a real-time-ratioof 28.9. TheGPU acceleratedsimulation alsodemonstrated greatlyimprovedperformance scaling behaviour,withthe largestsimulationofup to576,000vehicles and1,994,112 detectorscompleting in 49.5%of thetime ofa much smaller(up to64,000vehicles and24,960 detectors) simulationinAimsun8.1.

CriticaltotheperformanceoftheGPUacceleratedagentbasedmicrosimulationwastheuseofdataparallelalgorithms anddatastructuresemployedbyFLAMEGPUforagentmanagement,abstractingthecomplexitiesoftheGPUprogramming modelaway fromthe modeller.The proposed graph-basedcommunicationstrategy fordata-parallel agentsisalso vitally important,asexistingmethodsforagentcommunicationwithin FLAMEGPUwerenon-optimal,limitingsimulation perfor-mance.Theproposedgraphbasedmethodsalsohavemoregeneralimplications,asothermodelssuchassocialinteraction modelsrelyongraph-basedcommunication.Theincreasedmemory-bandwidthoftheNVIDIATeslaP100GPU,comparedto theNVIDIATitanX(Pascal) andNVIDIAGeForceGTX1080provedtohaveadramaticincrease ontheperformanceofthe application,astheapplicationismemorybound.

6.1. Futurework

OurFLAMEGPUbasedsimulatorcurrentlysupportsasingletypeofjunctiontosupportthebenchmarkmodelused.To enablethissoftwaretobeusedforreal-worldstudiesoftransportsystems,additionalfeaturesmustbeimplemented,such astheinclusionofmultiplelanes,trafficsignalsandalternativetypesofjunction.Theimpactoftheseadditionalbehaviours wouldrequirefurtherperformanceevaluationusingprocedurallygeneratedandreal-worldtransportnetworks.

The proposed graph-basedcommunication strategy is currently limited to communicationalong the currentedge, or around asinglevertexwithin thegraph.Althoughthiscanbe usedto achievesignificant performance improvementsthe strategyistoofferhighperformancecommunicationformultiple-edgesforgraphexploration.Thiswouldaidininthewider applicabilityofthegeneralcommunicationstrategyandofferperformanceimprovementstoothersimulationdomains. Fur-thermore,the existing agentbased simulationuses multiplegraphs to representthe transport network,combiningthese graphsintoasinglegraphwouldsimplifythedevelopmentofadditionalmodels,andallow amore-generalised communi-cationstrategytoyieldfurtherperformanceincreases.

(13)

Acknowledgement

Thiswork wassupported byaDepartmentforTransport TransportTechnologyResearchInnovationGrant“Accelerating TransportMicrosimulation: Demonstrating theimpact offuturemanycoresimulations” (T-TRIGJuly2016) andadditional supportthroughEPSRC fellowship“AcceleratingScientificDiscoverywithAcceleratedComputing” (EP/N018869/1).

References

[1] J.Barceló,J.Casas,DynamicnetworksimulationwithAimsun,in:SimulationApproachesinTransportationAnalysis,Springer,2005,pp.57–98. [2] M.Fellendorf,Vissim:amicroscopicsimulationtooltoevaluateactuatedsignalcontrolincludingbuspriority,in:64thInstituteofTransportation

EngineersAnnualMeeting,Springer,1994,pp.1–9.

[3] M.Balmer,M.Rieser,K.Meister,D.Charypar,N.Lefebvre,K.Nagel,Matsim-t:architectureandsimulationtimes,in:Multi-agentSystemsforTraffic andTransportationEngineering,IGIGlobal,2009,pp.57–78.

[4] D.Krajzewicz,J.Erdmann,M.Behrisch,L.Bieker,Recentdevelopmentand applicationsofSUMO-simulationofurbanmobility,Int.J.Adv.Syst. Measur.5(3&4)(2012)128–138.

[5] C.Antoniou,J.Barcelò,M.Brackstone,H.Celikoglu,B.Ciuffo,V.Punzo,P.Sykes,T.Toledo,P.Vortisch,P.Wagner,Trafficsimulation:caseforguidelines, 2014URLhttp://publications.jrc.ec.europa.eu/repository/bitstream/JRC88526/2014_multitude_guidelines_on-line.pdf,doi:10.2788/11382.

[6] P.Richmond,FLAMEGPUTechnicalReportand UserGuide(CS-11-03),TechnicalReport,DepartmentofComputerScience,UniversityofSheffield, 2011.

[7] G.Eliasson,Modelingtheexperimentallyorganizedeconomy,1991.doi:10.1016/0167-2681(91)90047-2.

[8] P.Gipps,Abehaviouralcar-followingmodelforcomputersimulation,Transp.Res.PartB15(2)(1981)105–111,doi:10.1016/0191-2615(81)90037-0. [9] M.Treiber,A.Hennecke,D.Helbing,Congestedtrafficstatesinempiricalobservationsandmicroscopicsimulations,Phys.Rev.E62(2)(2000)1805. [10] P.Gipps,Amodelforthestructureoflane-changingdecisions,Transp.Res.PartB20(5)(1986)URLhttp://www.sciencedirect.com/science/article/pii/

0191261586900123.

[11]M.Brackstone, M.McDonald,J.Wu,Lanechangingon themotorway:factorsaffectingitsoccurrence,and theirimplications,in:RoadTransport InformationandControl,1998.9thInternationalConferenceon(Conf.Publ.No.454),IET,1998,pp.160–164.

[12] R.Liu,D.VanVliet,D.Watling,Microsimulationmodelsincorporatingbothdemandandsupplydynamics,Transp.Res.PartA40(2)(2006)125–150. [13] R.Dowling,A.Skabardonis,J.Halkias,G.McHale,G.Zammit,Guidelinesforcalibrationofmicrosimulationmodels:frameworkandapplications,Transp.

Res.Rec.(1876)(2004)1–9.

[14] S.-J.Kim,W.Kim,L.Rilett,Calibrationofmicrosimulationmodelsusingnonparametricstatisticaltechniques,TranspResRec.(1935)(2005)111–119. [15] TransportSimulationSystems,Aimsun8UsersManual,2014.

[16] A.PTV,Vissim5.40usermanual,Karlsruhe,Germany(2011). [17]NVIDIACorporation,NVIDIATeslaP100(whitepaper),2016.

[18] Microway, Detailed specifications of the intel xeon e5-2600v4-broadwell-ep-processors, 2016. URL https://www.microway.com/ knowledge-center-articles/detailed-specifications-of-the-intel-xeon-e5-2600v4-broadwell-ep-processors/.

[19] Green500,Green500website,2016.(https://www.top500.org/green500/).LastAccessed2016-08-01

[20]M.Zamith,M.Joselli,J.R.S.Junior,R.C.P.Leal-Toledo,E.Clua,I.MediaLab,E.Soluri,N.Tecnologia,AnapproachfortrafficforecastwithGPUcomputing &cellularautomatamodel.

[21]D.Strippgen,K.Nagel,Multi-agenttrafficsimulationwithcuda,in:HighPerformanceComputing&Simulation,2009.HPCS’09.International Confer-enceon,IEEE,2009,pp.106–114.

[22]K.Wang,Z.Shen,AGPU-basedparallelgeneticalgorithmforgeneratingdailyactivityplans,IEEETrans.Intell.Transp.Syst.13(3)(2012)1474–1480. [23]K.Wang,Z.Shen, ArtificialsocietiesandGPU-basedcloudcomputingforintelligenttransportationmanagement,IEEEIntell.Syst.26(4)(2011)22–28. [24]Y.Sano,N.Fukuta,AGPU-basedframeworkforlarge-scalemulti-agenttrafficsimulations,in:AdvancedAppliedInformatics(IIAIAAI),2013IIAI

Inter-nationalConferenceon,IEEE,2013,pp.262–267.

[25]Y.Xu,G.Tan,X.Li,X.Song,MesoscopictrafficsimulationonCPU/GPU,in:Proceedingsofthe2ndACMSIGSIMConferenceonPrinciplesofAdvanced DiscreteSimulation,ACM,2014,pp.39–50.

[26]X.Song,Z.Xie,Y.Xu,G.Tan,W.Tang,J.Bi,X.Li,Supportingreal-worldnetwork-orientedmesoscopictrafficsimulationongpu,Simul.Modell.Practice Theory74(2017)46–63.

[27]R.Bradley,I.Wright,D.Swain,R.Himlin,P.Heywood,P.Richmond,M.Mawson,G.Fletcher,R.Guichard,AcceleratingtrafficmodelsusingGPU-based technology,in:EuropeanTransportConference2016AssociationforEuropeanTransport(AET),2016.

[28]G.Cordasco,R.D.Chiara,A.Mancuso,D.Mazzeo,V.Scarano,C.Spagnuolo,Aframeworkfordistributingagent-basedsimulations,in:Euro-Par Work-shops(1)’11,2011,pp.460–470.

[29]N.Collier,M.North,RepastHPC:APlatformforLarge-ScaleAgentBasedModeling,Wiley,2011.

[30]L.Chin,D.Worth,C.Greenough,S.Coakley,M.Holcombe,M.Gheorghe,Flame-ii:aredesignoftheflexiblelarge-scaleagent-basedmodelling envi-ronment,TechnicalReport,2012.

[31] P.Richmond,D.Walker,S.Coakley,D.Romano,Highperformancecellularlevelagent-basedsimulationwithFLAMEfortheGPU.,BriefingsBioinf.11 (3)(2010)334–347URLhttp://www.ncbi.nlm.nih.gov/pubmed/20123941,doi:10.1093/bib/bbp073.

[32]P.Richmond,D.Romano, Template-drivenagent-basedmodelingandsimulationwithcuda,in:GPUComputingGemsEmeraldEdition,in:Applications ofGPUComputingSeries,2011,pp.313–324.

[33]T.Karmakharm,P.Richmond,D.M.Romano,Agent-basedlargescalesimulationofpedestrianswithadaptiverealisticnavigationvectorfields.,TPCG 10(2010)67–74.

[34]S.Tamrakar,P.Richmond,R.M.DSouza,Pi-flame:aparallelimmunesystemsimulatorusingtheflamegraphicprocessingunitenvironment, SIMULA-TION(2016).0037549716673724

[35]P.Heywood,P.Richmond,S.Maddock,RoadnetworksimulationusingflameGPU,in:Euro-Par2015:ParallelProcessingWorkshops,Springer,2015, pp.430–441,doi:10.1007/978-3-319-27308-2.

[36]TransportSimulationSystems,Aimsun8DynamicSimulatorsUsersManual,2014.

[37]G.Cordasco,C.Spagnuolo,V.Scarano,Towardthenewversionofd-mason:efficiency,effectivenessandcorrectnessinparallelanddistributed agen-t-basedsimulations,in:ParallelandDistributedProcessingSymposiumWorkshops,2016IEEEInternational,IEEE,2016,pp.1803–1812.

[38]S.Green,Particlesimulationusingcuda,NVIDIAWhitepaper6(2010)121–128.

[39]Departmentfor Transport,Roadtrafficforecasts2015,2015,(https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/260700/ road-transport-forecasts-2013-extended-version.pdf).

[40]NVIDIA Corporation, Nvidia geforce gtx 1080 whitepaper: Gaming perfected, 2016. URL http://international.download.nvidia.com/geforce-com/ international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf.

[41] NVIDIACorporation,Nvidiatitanx(pascal)webpage,2016.URLhttp://www.geforce.co.uk/hardware/10series/titan-x/.

References

Related documents

TICKET VALIDITY ON FAST SERVICES (Sr): to travel by fast services, passengers must pay the connected extra charge, excluding the ENTIRE NETWORK daily circulation tickets..

These issues are: balancing the tension between the contractual rights of creditors and the need for maintaining public services in the event of financial distress and default, a

Conclusions: In this study, the new dithiophosphonate ligands were synthesized and utilized in the preparation of copper(I) and silver(I) complexes with ferrocenyldithiophosphonate

Representing six states, the institutions included in the study were the University of West Virginia at Parkersburg, Utah Valley State and Dixie State College in Utah, Dalton

Our analysis enables us to provide a simple rule for evaluating government provision of public goods which reinstates the Pigou-Harberger-Browning conclusion that the costs of

Moreover, conditional on trade, FDI, corruption, banking sector liberalization, real interest rate differential and distance, we find that SEE countries had attracted more

Also, in the previous developments, two key assumptions explain why the economy is subject to banking crises: the productivity shock and the fact that entrepreneurs must receive

It is cultivated for its fruit (dates) which generates a significant amount of seeds, and this work dealt with the valorization of this matter. The study investigated the