Contents lists available at ScienceDirect
Simulation
Modelling
Practice
and
Theory
journal homepage: www.elsevier.com/locate/simpat
Data-parallel
agent-based
microscopic
road
network
simulation
using
graphics
processing
units
Peter
Heywood
a,∗,
Steve
Maddock
a,
Jordi
Casas
b,
David
Garcia
b,
Mark
Brackstone
b,
Paul
Richmond
aaDepartmentofComputerScience,TheUniversityofSheffield,UnitedKingdom bTransportSimulationSystemsLtd(TSS),Spain
a
r
t
i
c
l
e
i
n
f
o
Articlehistory:
Availableonlinexxx
Keywords:
Agent-basedsimulation GPU
Simulationframework Transportmicrosimulation
a
b
s
t
r
a
c
t
Roadnetworkmicrosimulationiscomputationallyexpensive,andexistingstateoftheart commercialtoolsusetaskparallelismand coarse-graineddata-parallelismformulti-core processorsto achieveimprovedlevels ofperformance.Analternativeisto useGraphics ProcessingUnits(GPUs)andfine-graineddataparallelism.ThispaperdescribesaGPU ac-celeratedagentbasedmicrosimulationmodelofaroadnetworktransportsystem.The per-formanceforaprocedurallygeneratedgridnetworkisevaluatedagainstthatofan equiv-alentmulti-coreCPUsimulation.InordertoutiliseGPUarchitectureseffectivelythe pa-perdescribesanapproachforgraphtraversalofneighbouringinformationwhichisvital toprovidinghighlevelsofcomputationalperformance.Thegraphtraversalapproachhas beenintegratedwithinaGPUagentbasedsimulationframeworkasageneralisedmessage traversaltechniqueforgraph-basedcommunication.Speed-upsofupto43× are demon-stratedwithincreasedperformancescalingbehaviour.Simulationofoverhalfamillion ve-hiclesandnearlytwomilliondetectorsatarateof25× fasterthanreal-timeisobtained onasingleGPU.
© 2017TheAuthors.PublishedbyElsevierB.V. ThisisanopenaccessarticleundertheCCBYlicense. (http://creativecommons.org/licenses/by/4.0/)
1. Introduction
Simulationsofroadnetworksareusedduringthedevelopmentandmanagementoftransportnetworksaroundtheglobe. Microscopic roadnetwork simulationsare fine-grainedsimulations which simulateindividual vehicles within thesystem, capturinglowlevelbehaviours.AgentBasedModelling(ABM)isonemicroscopicapproach,whererelativelysimple individ-ualbehavioursaredefined,whichcombinedwithinteractionsbetweenagentsandtheenvironment,allows theemergence ofcomplex behaviours. However, microscopic simulations are much more computationally expensive than the more tra-ditionalhigher-level macroscopicsimulations, which use a higherlevel ofabstraction consistingof network flows rather thanindividualvehicles. Nonetheless,thelevelofdetailcapturedbymicroscopicsimulations ismuchgreaterthan thatof macroscopicandmesoscopicsimulations,includingtheemergentbehavioursenabledbytheuseofABM.
∗ Correspondingauthor.
E-mailaddresses:p.heywood@sheffield.ac.uk(P.Heywood),p.richmond@sheffield.ac.uk(P.Richmond).
https://doi.org/10.1016/j.simpat.2017.11.002
1569-190X/© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license. (http://creativecommons.org/licenses/by/4.0/)
As microscopic simulations are computationally expensive, current commercial and open-source tools such as Aim-sun [1], Vissim [2], MATsim [3] and SUMO [4] can have considerable execution run-times, especially for large scale simulations. This limits the overall effectiveness and uptake of microscopic simulation within the transport sector [5]. To increase simulator performance, existing simulation tools use parallel processing applied to multi-core processors to reduce therun-timeof simulations.Task-parallelandcoarse-graineddata-parallel approachesaretypically applied within existing tools. Task-parallelism distributes independent processing tasks to separate processing threads. Coarse-grained data-parallelism applies the same algorithms to differentunits of data,where the individual unitsof data are relatively large.Theseapproachesare well-suitedto multi-coreCentralProcessing Units(CPUs), butcan resultinpoorperformance scalability.
Many-core architecturessuch as Graphics Processing Units(GPUs) offersignificantly greater levels of parallelism than multi-core architectures, andsignificantlygreater levelsofraw computeperformance. Toaccessthehighlevels of perfor-mance, algorithms anddata structures must expose high levels of parallelism andenable goodmemory-access patterns. Typicallyfine-graineddata-parallelismisused,wherethesamealgorithmsareappliedtorelativelysmallindividualunitsof data.
Todemonstratethesimulation performanceimprovements whichcould beachievedby usingmany-coreGPUs for mi-croscopicroadnetworksimulations,asingle-lanetransportmodelwithstop-sign-basedyellow-boxjunctionshasbeen im-plemented using FLAME GPU (Flexible Large Scale Agent Modelling Environment forGraphics Processing Units). FLAME GPU is the only general-purpose GPU-accelerated ABM framework [6]. The performance of the simulation is bench-marked using a procedurally generated artificial grid-based road network, and the simulation performance is compared toahigh-performancemulti-core CPUmicroscopicroadnetworksimulationtool,Aimsun 8.1,whichimplementsthesame models.
This paper presents two contributions:(i) a GPU accelerated data-parallel agent-based road network microsimulation modelisevaluatedagainstanequivalentmodelinacommercialmulti-core CPUsoftwaretool,demonstratingconsiderable improvementstosimulationperformanceandperformancescalability;and(ii)ageneral-purposegraph-based communica-tionstrategyispresentedforhighperformanceagentcommunicationforfine-graineddata-parallelagentbasedsimulations, implemented forthe FLAMEGPU ABMframework which enableshigh performance agentbasedsimulations of transport networksonGPUs.
Section2 providesasummary ofrelatedwork.Section3 detailstheimplementedmodel,thebenchmarknetworkused andtheFLAMEGPUimplementation.Section4 describesthegraph-basedcommunicationstrategytoenablethehighlevels ofperformanceinthismodel.Section5 providestheresultsofasetofapplicationbenchmarksusedtoassessthe perfor-manceimpactofGPUsonmicroscopicroadnetworksimulationusingABM.Section6 concludesthepaper.
2. Relatedwork
Microscopicsimulationoftransportsimulationinvolvesthesimulationofindividualvehiclesandpiecesofroadnetwork infrastructuretopredict theeffectsofchanges invehiclebehaviour orchanges intheroadnetworkinfrastructure.Thisis used asatool inthe planningandmanagement oftransport networkstoimprovethe effectivenessofinfrastructure and minimisetheimpactofchangesinconditionsonthetransportationnetwork.Microscopicsimulations arecomputationally expensive, whichhaslimitedtheadoptionofmicrosimulationcomparedtohigherlevelmesoscopicandmacroscopic sim-ulations [5]. Amicro-scale simulation must includebehavioural models for vehicles to followas well asmodels for any dynamicinfrastructuresuch astrafficsignals andvehicle detectors,whichattemptto accuratelycapturethe behaviourof therealworld.ABMisapowerfulapproachtodefiningmicroscopicmodels,whichprovidesanaturalmethodofdescribing individual behaviouralmodelsandthensimulatingthe interactionbetweenindividualsinthesimulation[7].Someofthe mostimportantvehicle behaviouralmodels are: (i)Car Following Models[8,9];(ii) LaneChangingBehaviour[10,11]; and (iii)GapAcceptanceModelling[1,12].
Thesebehaviouralmodelsmakeuseofseveralsetsofinputdata,including:theattributesandstructureofthetransport network;transportdemandwhichisusedtopopulatethesimulationandallowalternatescenariostobesimulatedfor pre-dictiveuse;andsimulationparameterswhichareusedtomodifythemodelswithinthesimulator.Parameterscaninclude distributionsfromwhichvehiclepropertiescan besampled,such asvehiclelength,rateofaccelerationorevenproperties suchaspollutionemissionrates.Theseparameterscanbemanipulatedtoreplicateobservedbehaviour[13,14].
Fig.1. Theaveragerun-timefrom3repetitionsfora60minuteAimsun8.1simulationcontainingapproximately25,000vehicles,fordifferentprocessor threadcounts.Asprocessorthreadcountincreases,totalsimulationtimedecreases,butwithdiminishingreturns.NotethattheIntelXeonE5-2643isa dual-socketsystem.Additionally,noperformanceimprovementswereobservedwhenusingmoreprocessingthreadsthanphysicalcores(Hyper-threading).
Incontrasttomulti-coreprocessorarchitectures,highlyparallelmany-coreprocessingarchitecturessuchasGPUsenable theuseoffine-graineddata-parallelismtooffer highlevels ofperformance. Originallydevelopedfor2Dand3Dcomputer graphicsapplications,GPUsarenowcommonlyusedtoaccelerategeneralpurposecomputingthroughdata-parallelism,as they offersignificant computational power. Forinstance, a singlestate ofthe artNVIDIA P100GPU hasa peak theoreti-cal performance of 5.3TFLOPS (trillion floating point operations per second)in double precision [17], compared to Intel Broadwell(22 core)Xeon CPUswhichare capable oflessthan 1.0TFLOPS [18], withevengreater levels ofrawcompute performanceforsingleandhalfprecision floating pointoperations. From adevice perspective,GPUs useaformofSingle Instruction,ManyData(SIMD)parallelarchitecture,wherethesameinstructionsareappliedtomanyitemsofdata concur-rently.Additionally,alarge portionoftheGPUdiespaceisused forthe manyprocessingcores,leavinga relativelysmall portionofdiespaceavailableforon-chipcachesotoachievehighlevelsofperformancetheuseofmemorymustbe care-fullyconsidered. Alongwithdataaccess patterns,theSIMDmodel usedby GPUs oftenrequiresalternate algorithms and datastructures tosoftwaredesignedformulti-coreCPUs,inordertoleveragethehighlevelsofperformance required. Ad-ditionally,GPUsare typicallymoreenergy-efficientwhencomparedtoCPUs,offering greaterlevelsofperformanceforthe samepowerusage,i.e.GPUs typicallyofferahighernumberofFLOPSperwatt.Thisisdemonstratedbythehigh propor-tionofGPUenabledsuper-computersintheupperranksoftheGreen500,therankingoftheworld’smostpower-efficient super-computers[19].
AlthoughcurrentcommercialandopensourcemicroscopicroadnetworksimulatorssuchasVissimorAimsunuse multi-coreCPUparallelism toincrease simulatorperformance,GPUshaveyettobeadoptedwithintransportmodellingsoftware asawhole.GPUshavebeenusedpreviouslyinmicroscopictransportnetworksimulation,asbothcellularautomata[20] and alsousingmulti-agentsystems[21].Additionallyothertasksassociatedwithmicroscopicsimulationhavebeenaccelerated usingGPUs,suchasthegenerationofdemanddatausingactivityplangeneration[22],traffic signaloptimisation[23] and dynamicrouteassignment[24].Recently,outsideofmicroscopicsimulation,GPUshavebeenutilisedwithinbothmesoscopic andmicroscopicroad networksimulation toolsandperformance improvementshave beendemonstrated[25–27], further highlightingthedemandforreducedsoftwarerun-timeswithinthetransportmodellingsector.
Fig.2. FLAMEGPUState-basedrepresentationforvehicleagentswithinthesimulation,illustratingthelogicalcontrol-flowoftheagentswithinasingle iteration.Interactionwithdetectoragentshasbeenomittedforclarity.
enablinghighperformancesimulations withoutexplicitknowledgeofthe parallelprocessingmodel[6,31,32].FLAMEGPU hassuccessfullybeenusedinseveralmodellingdomainssuchaspedestriancrowdsimulation[33] andcellularsimulations
[31,34],however the useof FLAME GPUfor road network simulation hasbeen limited, due to the unsuitabilityof fixed radiuscommunicationforusewithintransportnetworks[35].
3. Mulit-agentGPUmicrosimulation
ToevaluatethepotentialperformanceimpactofGPUacceleratedABMonmicroscopicroadnetworksimulationa single-lane road network model was implemented. The implemented models include: car following model [1,15,36] (based on Gipps’carfollowingmodel[8]);constantvehiclearrival[15,36];virtualqueues[15,36];detectors[15,36];ayellow-boxmodel
[15,36] andagapacceptancegive-waymodel[1,15,36] combinedwithstop-signsforjunctions.Thesefeaturesandmodels wereselectedtoproducecomparablebehaviourfortheprocedurallygeneratedgridroadnetworkdescribedinSection5.3, withoutthecomplexitiesofmodellingthebroadrangeofroadnetworkinfrastructurepresentinrealworldroadnetworks. FLAMEGPUsimulationsmakeuseofastate-basedrepresentationofagentdynamicstosimplifydevelopmentandenable keyperformance optimisationson behalfofthemodeller.Fig.2 showsthestate machineforthevehicularagentsusedto implementthismodel;itrepresentsthelogicalprocessfollowedbyeachagentateachiterationofthesimulation,showing therelationshipbetweenagentstates,functionsandmessage-lists.Agentsenterthestatemachineforthecurrentiteration inoneofthepossibleinitialstates.Agentfunctionsallowagentstotransitionfromonestatetoanother,withtheorderof thesefunctionscontrolledbyexecutionlayers.Communicationbetweenagentsisintheformofmessagelists,whichareused tohighlightdependenciesinthestate-basedrepresentation.Theuseofindirectcommunicationpreventsraceconditionsin theparallelprocessingenvironment.Agentfunctionscanoutputmessagestotherelevantmessagelistdatastructure,which isthenusedasinputbyadifferentagentfunction.Themessagelistsareaccessedusingspecialisedmessageaccesspatterns toreduce theperformance impactofmessagelistparsing,anewgraph-basedcommunicationpatternisusedforagentto agentcommunication,asdescribedinSection4.
Forinstance,inthefirstexecutionlayer,theagentsineachofthethreepossiblestatesforvehiclesoutputamessagetoa messagelist.Thosemessagelistsaretheniteratedbythedifferentagentstatesinlayerstwoandthree,dependantuponthe agentstate.Vehicleagentswhicharedeemedtohaveleft thesimulationareidentifiedasdeadandsubsequentlyexcluded fromfurtheriterationsofthesimulation.
stateimplementtheconstantorexponentialvehiclearrivalalgorithmfromAimsun[15],ensuringthatthesameconditions requiredforanagenttobeinsertedintothesimulationsuchasorderofarrivalandsufficientheadwayaremet.
TheRoadandJunctionstatesareusedtorepresentvehicleswhichusethetwographsusedtoimplementtheroad net-workdatastructure,withvehiclesintheRoadstateexhibitingprimarilythecarfollowingbehaviouralmodel,andinteraction withstopsigns.AgentsintheJunctionstateexecutethegapacceptancegive-waymodelandyellow-boxmodelusedforthis study,andsubsequentlyifmovementisallowedthevehiclewillusethecarfollowingmodeltotraversethejunction.
Additionaltothestatemachineforcars,alinearstatemachineexistsfordetectoragents.DetectorAgentsrepresent real-worldvehicledetectionsystems suchasinductionloopsplacedwithin roadnetworkinfrastructure.Thesedetectionloops areusedtomeasuretraffic conditions,theoutputofwhichcanbe usedforcalibration,validationorinteractivebehaviour. Thedetectorstate machineislinear,consistingoffouragentfunctionsinseparatelayers.Thesefunctionseach iterateone ofthemessagelists produced bythe vehicleagents,andcaptureanystatistics relevantto theindividual detectorforthe currentinteraction.
Thetransportnetworkisrepresentedusinga pairofgraphs,representedusingtheCompressed SparseRow(CSR)data structure.Thefirstgraphrepresentstheroadsofthetransportnetworkasedges,andthejunctionsasvertices.Thesecond graphmodelstheconnectionsbetweenroadswithineachjunction.Separategraphsareusedtosimplifythespecificationof theroadnetwork.
4. Networkcommunication
WithinFLAMEGPU, communicationbetweenagentsusesmessage-listtraversalalgorithmsprovidingspecialised access patternstoenableefficient,indirect,data-parallelagentcommunication.Thealgorithmsanddata-structuresusedcanhave considerableimpact ontheperformanceofamodel,andthedevelopmentofspecialisedhigh-performanceaccessto mes-sages is vital to achieve highlevels of performance forlarge-scale ABMs. Thissection describes a newspecialisation for graph-basedmessageswithinFLAMEGPU,whichenableshighperformancefornetwork-basedcommunicationpatterns.The performanceofthecommunicationstrategyisevaluatedcomparedtoexistingFLAMEGPUcommunicationpatterns.
4.1. FLAMEGPUmessage-basedcommunication
ThecommunicationmethodusedinaFLAMEGPUmodelcanbespecialisedtoofferincreasedlevelsofperformancefor differentcommunicationpatterns.Existingtechniquesformessage-basedcommunicationavailablewithinFLAMEGPUallow globalcommunication(all-to-all)orpartitionedcommunicationbasedonspatiallocality[6].All-to-allmessagespecialisation enablesindirectglobalcommunicationbetweenagents,whilespatiallypartitionedmessagingrestrictsthecommunicationto theimmediateregionsurroundingtheagentin2Dor3Dspace.Thesecommunicationpatternsarenon-optimalforseveral corevehiclemodelsforroadnetworkmicrosimulation,wherethecommunicationbetweenagentsisrestrictedtotheroad networkinfrastructure.Typicallymodelssuchascarfollowingandlanechangingmodelsonlyrequirecommunicationwith other agents on the same road section ordirectly connected road sections within the transport network. Other models suchasgive waymodellingforjunctionsmayrequirecommunicationwithin thejunctionandwithother vehicles onthe connectedroad sections.Inthesecases both all-to-allorspatially partitioned communicationstrategiescan resultin the iterationoflargemessageliststofindtherelevantmessageormessages.Spatiallypartitionedcommunicationwouldoffer an improvementover all-to-allcommunication, howeverin areaswitha large numberof agents,orin dense sectionsof roadnetworksuchasurbancitycentres,thenumberofmessageseach agentmustiteratecanstillbe significantforeven relativelysmallcommunicationradii,resultinginanegativeimpactonperformance[35].
4.2.Graph-basedmessaging
Ideallycommunicationshouldonlyneedtooccur withintheconstraintsofthenetwork,whichistypicallyrepresented usingagraphorseriesofgraphs.Forthepurposesofthisbenchmarkmodel,whichconsistsofsinglelanesectionsofroads andstop-sign-based,yellow-boxjunctions,communicationislimitedtothecurrentroadnetworkedge,thenextroad net-workedge,orthecurrentjunction(graphvertex).Existingtechniquesforgraph-basedcommunicationexistwithinparallel anddistributedagentbasedframeworkssuch asD-Mason andRepast HPC,howeverthesestrategiesuseindividual agents astheverticesofthegraphandtherelationshipsbetweenagentsaretheedgesofthegraph[29,37].
Fig.3. TheeffectsofalternateFLAMEGPUcommunicationstrategiesoncarfollowingmodelsisshownforasectionofagrid-basedroadnetwork,fora singleagentrepresentedinwhite.All-to-allcommunicationresultsin42messagesbeingparsedbythesingleagent.Spatiallypartitionedmessagingresults inthe18messagesfromtheagentsintheblueshadedregion,whilethegraph-basedcommunicationstrategyresultsinonly5messagesfromtheorange borderedagentsbeingprocessed.Thespatiallypartitionedradiususedforthisillustrationwouldbeinsufficientforaccuratemodelling,andisonlyused forillustrativepurposes.
vertexwithinthenetwork.Ascancanthenbeperformedonthehistogramtoprovidetheoutputpositionforthemessages, allowing messages to be sortedinparallel. The histogramwhich isconstructed duringthe sortingalgorithm canalso be usedasapartofthedatastructureformessageaccess.Thenetwork-basedcommunicationstrategyisgeneralenoughtobe appliedto anyGPUmulti-agentsimulation,wherecommunicationalonga graph-baseddatastructureisrequired,such as socialnetworkmodelling.
4.3. Applicationtoroadnetworkmicrosimulation
Fig.4. Theaveragetimetakenforvehicleagentstooutputroadmessagesareshownforthreecommunicationstrategiesanddifferentscalesofmodel. Thisshowstheoverheadcostofeachcommunicationspecialisation.TheapplicationwasbenchmarkedonaNvidiaTitanX(Pascal).
Fig.5. TheaveragetimetakenforvehicleagentstoexecutetheGPUkernelwhichincludesthecarfollowingmodelisshownforthreecommunication strategiesanddifferentscalesofmodel.Thiskerneliteratesallmessagesintheappropriatelisttofindtheleadvehicle.Graph-basedcommunicationshows thehighestlevelofperformance.TheapplicationwasbenchmarkedonaNvidiaTitanX(Pascal).
4.4.Performanceimpactofthecommunicationstrategyonthecarfollowingmodel
Toevaluatetheperformanceimprovementprovidedbygraph-basedcommunicationforagent-basedroadnetwork mod-els,aset ofroadnetworkswerebenchmarkedusingthree communicationstrategies.Themessagelistoutput byvehicles travellingalongtheroadsectionswithintheroadnetworkwerespecialisedusingall-to-all,spatiallypartitionedand graph-basedmessaging.TheperformanceoftherelevantGPUkernelsisextractedandcompared.
Table1
ComputationalhardwareusedforbenchmarkingAimsunandFLAMEGPUsimulations.
Identifier CPU CPUcores CPUfrequency Memory GPU
System#1 Intelcorei74770k 4 3.5GHz 16GBDDR3 NVIDIATitanX(Pascal)(TCC)& NVIDIAGeForceGTX1080 (WDDM)
System#2 2× IntelXeonE52698v4 40(20) 2.20GHz 512GBDDR4 8× NVIDIATeslaP100(linux)
Table2
Parametersusedforvalidationandallbenchmarksimulations.
Parametername Gridsizeexperiment Inputflowexperiment Units
Timestep 0.8 0.8 seconds
Reactiontime 0.8 0.8 seconds
Stopreactiontime 1.2 1.2 seconds
Detectorperiod 600 600 seconds
Inputflowscalefactor 1.0 1.0 %
Seed Randomvalue Randomvalue Longinteger
Sectionlength 1000 1000 m
Junctionlength 10 10 m
Gridsize 2–576 64,128&256
Inputflow 600 100–1000
Detectorspersection 3 3
5. Experimentalresults
This section summarises thecross-validation ofthe FLAMEGPU basedsimulation against the existing multi-core CPU road network microsimulation tool Aimsun. It provides details of the procedurally generated Manhattan style grid road networkusedforperformancebenchmarkingandtwosetsofbenchmarkexperimentsarediscussed.
5.1. Validation
Fortheperformanceevaluationofthetwosimulatorstobecomparableitisimportantthatsoftwareproduces compara-bleresults.Asthecomplexsystemisstochastic theresultswillneedto bestatisticallyequivalent.TheFLAMEGPU imple-mentationwascross-validatedagainst Aimsun8.1foraseriesofmodelstovalidatemodelbehaviours,such asthevehicle arrival algorithm[15],carfollowing[15] andstop signinteraction [15].Additionally theoverallbehavioursandpopulation sizesfortheprocedurallygeneratedroadnetworkareevaluated.Thesamesetofparameters areusedforbothsimulators. Thesetofvalidationmodelsshowedthatvehicleparametersampling,thevehiclearrivalalgorithm,thecarfollowingmodel andthejunctionmodelswereallcorrectlyimplemented,withnodevianceobserved.Theprocedurallygeneratednetworks showedstatisticallysimilarpopulationsizesandturningbehavioursoveranaverageofmultiplereplications.
5.2. Benchmarkingmethodology
Multipleconfigurations ofprocessinghardwarewereused toobservetheperformanceimpact ofthehardwareon per-formance.Table 1 describesthe twohardwaresystems usedfortheseexperiments.The Aimsun benchmarkmodels were executedusingSystem#1,adesktopcomputer.FLAMEGPUsimulationswerecarriedoutusingarangeofGPUsinSystem #1,andalsoonan NVIDIADGX-1 (System#2),a state-of-the-artmulti-GPUserver containingNVIDIATesla P100 acceler-ators. Allsimulationswere executedusingasingleCPU andsingleGPUandthe sameparameters(Table2)were usedfor bothsimulationsofonehourofvehicledemand.
5.3. Procedurallygeneratedbenchmarkroadnetwork
Fig.6. A5×5exampleoftheprocedurally-generatedartificialgridnetwork,showingtheoverallstructureofthenetworkandthearrangementofturning sectionswithinajunction.Thenetworkcanbescaledtoanysize,withnetworksofupto576×576usedduringbenchmarking.
Table3
ParametersforgenerationoftheprocedurallygeneratedManhattan-stylegridroadnetwork.
ParameterName Description
Gridsize(G) Thenumberofjunctionsforeachrowandcolumnofthegrid Sectionlength Thelengthofeachroadsectionbetweenjunctionsinmetres Junctionlength Thelengthordiameterofjunctionswithinthenetwork
Exitturnproportion Forjunctionswith1exitdestinationedge,theproportionofvehiclesturningontotheexitisreducedto maintainhighervehiclepopulationswithinthesimulatedroadnetwork.
5.4.Networkscaleexperiment
Toevaluatetheperformanceimpactoftotalproblemscale,theGridSizeoftheprocedurallygeneratednetworkwas var-iedfrom2 to576.Aimsun simulations wereonly carriedout up toa gridsize of512 dueto thetime takenforasingle simulationtocomplete.Asthegridsizeisincreased,thesize ofthetransport networkincreases,thenumberofdetectors agentsinthesimulationincreasesandthevehiclepopulationsizeincreases.Fig.7 showtheaveragesimulationtimefor3 repetitionsofthesimulation.ThisshowsthatforlargerscaleproblemstheGPUacceleratedFLAMEGPUsimulation outper-formsAimsun8.1,byupto37.2× usingSystem#1and43.8× usingSystem#2.TheRealtimeratio(RTR)ofasimulationis theratioofexecutionrun-timetosimulatedtime. ThelargestsimulationexecutedinFLAMEGPUof576,000vehiclesand 1,994,112detectorsshowsaRTRof10.5× usingaNVIDIAGeForceGTX1080insystem#1,16.6× usingsystem#1NVIDIA TitanX(Pascal)and25.5× usingtheNVIDIATeslaP100insystem#2.Forverysmallsimulations,withgridsize8andbelow containinglessthan7396concurrentvehicles,Aimsun8.1showsgreaterlevelsofperformancethanFLAMEGPU.
5.5.Inputflowexperiment
Demandontransportationnetworksisincreasingglobally[39].Toevaluatetheimpactofthisonmicroscopictransport networksimulation,thedemandonthetransportnetworkwasvariedforseveralfixedsizesofnetwork.Duetothe charac-teristicsofourprocedurallygeneratedtransportnetwork,thevehiclepopulationsizeincreasesnon-linearly,asvehiclescan onlyenterfromtheedgesofthenetwork.ThemodelparametersusedaredefinedinTable2.
Figs.8–10 showtheaveragesimulationtimefor3repetitionsofeachsimulation.Thesimulationsforagridsizeof64, FLAMEGPUshowsaveragespeed-upsofbetween3.8× and18.9× comparedtoAimsun 8.1. Speed-upsofbetween9.3× and36.6× areshownforagridsizeof128,andbetween8.1× and75.3× foragridsizeof256.
5.5.1. Discussionofapplicationbenchmarkresults
Fig.7. Totalsimulationperformanceasthetotalscaleofthesimulationisincreased,foreachsimulator.Valuesshownaretheaveragefrom3repetitions. Alogarithmicscaleisusedtoimprovevisibilityofsimilartotalsimulationtimes.
Fig.8. Totalsimulation timeagainstinputflow foraprocedurallygeneratedroadnetworkofgridsize64. Runtimesshownaretheaveragefrom3 repetitions.Alogarithmicscaleisused.
of parallelism andtherefore GPUutilisation increases.Additionally, GPUacceleration hasassociated overhead costs. Data andcommandsmustbetransferredfromthehostcomputertotheGPUonwhichcodeisexecuted.Thisisarelativelyslow process whichmust be minimisedwhere possibleto achieve highlevels ofperformance. Largermodels quicklyamortize thiscostthroughperformancegains.
Theperformanceofindividualsimulationiterations,showninFig.11 showsthatper-iterationsimulationtimeincreases asthesimulationprogresses.Thisisduetotheincreasingnumberofvehicleswithinthesimulation.Therateofincreaseof per-iterationsimulationtimegrowsatahigherratewithinAimsunthanFLAMEGPU,demonstratingtheincreasedscalability oftheGPUacceleratedsimulationwithrespecttopopulationsize.ForbothAimsun andFLAMEGPUthetime tosimulate asingleiterationpeaksevery750iterations.Thiscanbe accountedforbythedetectorandstatisticalsummarydatabeing processedandwrittenouttodiskevery600simulatedseconds(withasimulationstepof0.8s).
Fig.9. Totalsimulationtimeagainst inputflowforaprocedurallygeneratedroadnetworkofgridsize128.Runtimesshownaretheaveragefrom3 repetitions.Alogarithmicscaleisused.
Fig.10. Totalsimulationtimeagainstinputflowforaprocedurallygeneratedroadnetworkofgridsize256.Runtimesshownaretheaveragefrom3 repetitions.Alogarithmicscaleisused.
Table4
GPUspecifications[40–42].
GPU Corecount Corefrequency(MHz) Memory Memorybandwidth(GB/s)
NVIDIAGeforceGTX1080 2560 1607(1733) 8GBGDDR5X 320
NVIDIATitanX(Pascal) 3584 1417(1531) 12GBGDDR5X 480
NVIDIATeslaP100(SXM2) 3584 1328(1480) 16GBCoWoSHBM2 732
highpopulations of agents,the NVIDIATesla P100 in System #2showed thegreatest levels of performance. This canbe attributedtothelarger numberofprocessingcores,andthe greaterlevels ofmemorybandwidth providedby the HBM2 memory,compared toGDDR5X andGDDR5.The CPUsinSystem #1& #2also differ,howeverthe majorityofprocessing timewithintheapplicationisexecutedontheGPU.
6. Conclusionandfuturework
multi-Fig.11. Per-Iterationsimulationtimeforamodelwithgridsize128andinputflow500vehiclesperhour.
coreCPUbasedsimulationsoftwaretool commonlyusedwithinthetransportplanningindustry.Toensurethatsimulator behaviour is comparableandenable simulatorperformance to be compared the FLAMEGPU basedsimulation wascross validated against Aimsun using a series of models. As this is a stochastic model and additional features are presentin Aimsunwhichcouldnotbefullymitigatedtheresultswerenon-exactbutwerestatisticallysimilar.Thelargestsimulation executed usingbothsimulators,aone hoursimulationcontainingupto512,000vehiclesand1,575,936detectors,showed a speed-upof 43.8× for theGPU acceleratedsimulation,with a real-time-ratioof 28.9. TheGPU acceleratedsimulation alsodemonstrated greatlyimprovedperformance scaling behaviour,withthe largestsimulationofup to576,000vehicles and1,994,112 detectorscompleting in 49.5%of thetime ofa much smaller(up to64,000vehicles and24,960 detectors) simulationinAimsun8.1.
CriticaltotheperformanceoftheGPUacceleratedagentbasedmicrosimulationwastheuseofdataparallelalgorithms anddatastructuresemployedbyFLAMEGPUforagentmanagement,abstractingthecomplexitiesoftheGPUprogramming modelaway fromthe modeller.The proposed graph-basedcommunicationstrategy fordata-parallel agentsisalso vitally important,asexistingmethodsforagentcommunicationwithin FLAMEGPUwerenon-optimal,limitingsimulation perfor-mance.Theproposedgraphbasedmethodsalsohavemoregeneralimplications,asothermodelssuchassocialinteraction modelsrelyongraph-basedcommunication.Theincreasedmemory-bandwidthoftheNVIDIATeslaP100GPU,comparedto theNVIDIATitanX(Pascal) andNVIDIAGeForceGTX1080provedtohaveadramaticincrease ontheperformanceofthe application,astheapplicationismemorybound.
6.1. Futurework
OurFLAMEGPUbasedsimulatorcurrentlysupportsasingletypeofjunctiontosupportthebenchmarkmodelused.To enablethissoftwaretobeusedforreal-worldstudiesoftransportsystems,additionalfeaturesmustbeimplemented,such astheinclusionofmultiplelanes,trafficsignalsandalternativetypesofjunction.Theimpactoftheseadditionalbehaviours wouldrequirefurtherperformanceevaluationusingprocedurallygeneratedandreal-worldtransportnetworks.
The proposed graph-basedcommunication strategy is currently limited to communicationalong the currentedge, or around asinglevertexwithin thegraph.Althoughthiscanbe usedto achievesignificant performance improvementsthe strategyistoofferhighperformancecommunicationformultiple-edgesforgraphexploration.Thiswouldaidininthewider applicabilityofthegeneralcommunicationstrategyandofferperformanceimprovementstoothersimulationdomains. Fur-thermore,the existing agentbased simulationuses multiplegraphs to representthe transport network,combiningthese graphsintoasinglegraphwouldsimplifythedevelopmentofadditionalmodels,andallow amore-generalised communi-cationstrategytoyieldfurtherperformanceincreases.
Acknowledgement
Thiswork wassupported byaDepartmentforTransport TransportTechnologyResearchInnovationGrant“Accelerating TransportMicrosimulation: Demonstrating theimpact offuturemanycoresimulations” (T-TRIGJuly2016) andadditional supportthroughEPSRC fellowship“AcceleratingScientificDiscoverywithAcceleratedComputing” (EP/N018869/1).
References
[1] J.Barceló,J.Casas,DynamicnetworksimulationwithAimsun,in:SimulationApproachesinTransportationAnalysis,Springer,2005,pp.57–98. [2] M.Fellendorf,Vissim:amicroscopicsimulationtooltoevaluateactuatedsignalcontrolincludingbuspriority,in:64thInstituteofTransportation
EngineersAnnualMeeting,Springer,1994,pp.1–9.
[3] M.Balmer,M.Rieser,K.Meister,D.Charypar,N.Lefebvre,K.Nagel,Matsim-t:architectureandsimulationtimes,in:Multi-agentSystemsforTraffic andTransportationEngineering,IGIGlobal,2009,pp.57–78.
[4] D.Krajzewicz,J.Erdmann,M.Behrisch,L.Bieker,Recentdevelopmentand applicationsofSUMO-simulationofurbanmobility,Int.J.Adv.Syst. Measur.5(3&4)(2012)128–138.
[5] C.Antoniou,J.Barcelò,M.Brackstone,H.Celikoglu,B.Ciuffo,V.Punzo,P.Sykes,T.Toledo,P.Vortisch,P.Wagner,Trafficsimulation:caseforguidelines, 2014URLhttp://publications.jrc.ec.europa.eu/repository/bitstream/JRC88526/2014_multitude_guidelines_on-line.pdf,doi:10.2788/11382.
[6] P.Richmond,FLAMEGPUTechnicalReportand UserGuide(CS-11-03),TechnicalReport,DepartmentofComputerScience,UniversityofSheffield, 2011.
[7] G.Eliasson,Modelingtheexperimentallyorganizedeconomy,1991.doi:10.1016/0167-2681(91)90047-2.
[8] P.Gipps,Abehaviouralcar-followingmodelforcomputersimulation,Transp.Res.PartB15(2)(1981)105–111,doi:10.1016/0191-2615(81)90037-0. [9] M.Treiber,A.Hennecke,D.Helbing,Congestedtrafficstatesinempiricalobservationsandmicroscopicsimulations,Phys.Rev.E62(2)(2000)1805. [10] P.Gipps,Amodelforthestructureoflane-changingdecisions,Transp.Res.PartB20(5)(1986)URLhttp://www.sciencedirect.com/science/article/pii/
0191261586900123.
[11]M.Brackstone, M.McDonald,J.Wu,Lanechangingon themotorway:factorsaffectingitsoccurrence,and theirimplications,in:RoadTransport InformationandControl,1998.9thInternationalConferenceon(Conf.Publ.No.454),IET,1998,pp.160–164.
[12] R.Liu,D.VanVliet,D.Watling,Microsimulationmodelsincorporatingbothdemandandsupplydynamics,Transp.Res.PartA40(2)(2006)125–150. [13] R.Dowling,A.Skabardonis,J.Halkias,G.McHale,G.Zammit,Guidelinesforcalibrationofmicrosimulationmodels:frameworkandapplications,Transp.
Res.Rec.(1876)(2004)1–9.
[14] S.-J.Kim,W.Kim,L.Rilett,Calibrationofmicrosimulationmodelsusingnonparametricstatisticaltechniques,TranspResRec.(1935)(2005)111–119. [15] TransportSimulationSystems,Aimsun8UsersManual,2014.
[16] A.PTV,Vissim5.40usermanual,Karlsruhe,Germany(2011). [17]NVIDIACorporation,NVIDIATeslaP100(whitepaper),2016.
[18] Microway, Detailed specifications of the intel xeon e5-2600v4-broadwell-ep-processors, 2016. URL https://www.microway.com/ knowledge-center-articles/detailed-specifications-of-the-intel-xeon-e5-2600v4-broadwell-ep-processors/.
[19] Green500,Green500website,2016.(https://www.top500.org/green500/).LastAccessed2016-08-01
[20]M.Zamith,M.Joselli,J.R.S.Junior,R.C.P.Leal-Toledo,E.Clua,I.MediaLab,E.Soluri,N.Tecnologia,AnapproachfortrafficforecastwithGPUcomputing &cellularautomatamodel.
[21]D.Strippgen,K.Nagel,Multi-agenttrafficsimulationwithcuda,in:HighPerformanceComputing&Simulation,2009.HPCS’09.International Confer-enceon,IEEE,2009,pp.106–114.
[22]K.Wang,Z.Shen,AGPU-basedparallelgeneticalgorithmforgeneratingdailyactivityplans,IEEETrans.Intell.Transp.Syst.13(3)(2012)1474–1480. [23]K.Wang,Z.Shen, ArtificialsocietiesandGPU-basedcloudcomputingforintelligenttransportationmanagement,IEEEIntell.Syst.26(4)(2011)22–28. [24]Y.Sano,N.Fukuta,AGPU-basedframeworkforlarge-scalemulti-agenttrafficsimulations,in:AdvancedAppliedInformatics(IIAIAAI),2013IIAI
Inter-nationalConferenceon,IEEE,2013,pp.262–267.
[25]Y.Xu,G.Tan,X.Li,X.Song,MesoscopictrafficsimulationonCPU/GPU,in:Proceedingsofthe2ndACMSIGSIMConferenceonPrinciplesofAdvanced DiscreteSimulation,ACM,2014,pp.39–50.
[26]X.Song,Z.Xie,Y.Xu,G.Tan,W.Tang,J.Bi,X.Li,Supportingreal-worldnetwork-orientedmesoscopictrafficsimulationongpu,Simul.Modell.Practice Theory74(2017)46–63.
[27]R.Bradley,I.Wright,D.Swain,R.Himlin,P.Heywood,P.Richmond,M.Mawson,G.Fletcher,R.Guichard,AcceleratingtrafficmodelsusingGPU-based technology,in:EuropeanTransportConference2016AssociationforEuropeanTransport(AET),2016.
[28]G.Cordasco,R.D.Chiara,A.Mancuso,D.Mazzeo,V.Scarano,C.Spagnuolo,Aframeworkfordistributingagent-basedsimulations,in:Euro-Par Work-shops(1)’11,2011,pp.460–470.
[29]N.Collier,M.North,RepastHPC:APlatformforLarge-ScaleAgentBasedModeling,Wiley,2011.
[30]L.Chin,D.Worth,C.Greenough,S.Coakley,M.Holcombe,M.Gheorghe,Flame-ii:aredesignoftheflexiblelarge-scaleagent-basedmodelling envi-ronment,TechnicalReport,2012.
[31] P.Richmond,D.Walker,S.Coakley,D.Romano,Highperformancecellularlevelagent-basedsimulationwithFLAMEfortheGPU.,BriefingsBioinf.11 (3)(2010)334–347URLhttp://www.ncbi.nlm.nih.gov/pubmed/20123941,doi:10.1093/bib/bbp073.
[32]P.Richmond,D.Romano, Template-drivenagent-basedmodelingandsimulationwithcuda,in:GPUComputingGemsEmeraldEdition,in:Applications ofGPUComputingSeries,2011,pp.313–324.
[33]T.Karmakharm,P.Richmond,D.M.Romano,Agent-basedlargescalesimulationofpedestrianswithadaptiverealisticnavigationvectorfields.,TPCG 10(2010)67–74.
[34]S.Tamrakar,P.Richmond,R.M.DSouza,Pi-flame:aparallelimmunesystemsimulatorusingtheflamegraphicprocessingunitenvironment, SIMULA-TION(2016).0037549716673724
[35]P.Heywood,P.Richmond,S.Maddock,RoadnetworksimulationusingflameGPU,in:Euro-Par2015:ParallelProcessingWorkshops,Springer,2015, pp.430–441,doi:10.1007/978-3-319-27308-2.
[36]TransportSimulationSystems,Aimsun8DynamicSimulatorsUsersManual,2014.
[37]G.Cordasco,C.Spagnuolo,V.Scarano,Towardthenewversionofd-mason:efficiency,effectivenessandcorrectnessinparallelanddistributed agen-t-basedsimulations,in:ParallelandDistributedProcessingSymposiumWorkshops,2016IEEEInternational,IEEE,2016,pp.1803–1812.
[38]S.Green,Particlesimulationusingcuda,NVIDIAWhitepaper6(2010)121–128.
[39]Departmentfor Transport,Roadtrafficforecasts2015,2015,(https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/260700/ road-transport-forecasts-2013-extended-version.pdf).
[40]NVIDIA Corporation, Nvidia geforce gtx 1080 whitepaper: Gaming perfected, 2016. URL http://international.download.nvidia.com/geforce-com/ international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf.
[41] NVIDIACorporation,Nvidiatitanx(pascal)webpage,2016.URLhttp://www.geforce.co.uk/hardware/10series/titan-x/.