Applied Soft Computing

(1)

ContentslistsavailableatScienceDirect

Applied

Soft

Computing

jo u r n al h om e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / a s o c

A

survey

on

parallel

ant

colony

optimization

Martín

Pedemonte, Sergio

Nesmachnow

∗

,

Héctor

Cancela

FacultaddeIngeniería,UniversidaddelaRepública,Uruguay

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received8November2010

Receivedinrevisedform26April2011 Accepted23May2011

Availableonline31May2011

Keywords:

Antcolonyoptimization Parallelimplementations Taxonomy

a

b

s

t

r

a

c

t

Antcolonyoptimization(ACO)isawell-knownswarmintelligencemethod,inspiredinthesocial behav-iorofantcoloniesforsolvingoptimizationproblems.Whenfacinglargeandcomplexprobleminstances, parallelcomputingtechniquesareusuallyappliedtoimprovetheefﬁciency,allowingACOalgorithms toachievehighqualityresultsinreasonableexecutiontimes,evenwhentacklinghard-to-solve opti-mizationproblems.Thisworkintroducesanewtaxonomyforclassifyingsoftware-basedparallelACO algorithmsandalsopresentsasystematicandcomprehensivesurveyofthecurrentstate-of-the-arton parallelACOimplementations.Eachparallelmodelreviewediscategorizedinthenewtaxonomy pro-posed,andaninsightontrendsandperspectivesintheﬁeldofparallelACOimplementationsisprovided. ©2011ElsevierB.V.Allrightsreserved.

1. Introduction

In the last twenty years, the research community hasbeen searchingfornewoptimizationtechniquesthatareabletoimprove overthetraditionalexactones,whoselargecomputational require-ments often make them useless for solving complex real-life optimizationproblemsinacceptabletimes.Inthiscontext, nature-inspired metaheuristic methods have emerged as ﬂexible and robusttoolsforsolvingNP-hardoptimizationproblems,exploiting theirability tocompute accuratesolutions in moderate execu-tion times [13,49]. Ant colony optimization (ACO) is a swarm intelligencepopulation-basedmetaheuristicinspiredinthesocial behavior of ant colonies, which applies the key concepts of distributed collaboration,self-organization,adaptation, and dis-tributionfoundinantcommunities,inordertoefﬁciently solve real-lifeoptimizationproblems[41].

Parallelimplementationsbecamepopularinthelastdecadein ordertoimprovetheefﬁciencyofpopulation-basedmetaheuristics. Bysplittingthepopulationintoseveralprocessingelements, paral-lelimplementationsofmetaheuristicsallowreachinghighquality resultsinareasonableexecutiontime,evenwhenfacing hard-to-solveoptimizationproblems[2].Parallelalgorithmsnotonlytake beneﬁtofusingseveralcomputingelementstospeedupthesearch, theyalsointroduceanewexplorationpatternthatisoftenusefulto improveovertheresultqualityofthesequentialimplementations. Manypaperscanbefoundintherelatedliteraturestatingthat parallelimplementations areuseful toimprovetheACO explo-rationpattern;Fig.1showsthenumberofpublicationsperyear

∗Correspondingauthor.

E-mailaddress:sergion@ﬁng.edu.uy(S.Nesmachnow).

inthisarea.However,researchersoftenlackageneralizedpointof view,sincetheyusuallytackleauniqueimplementationtosolvea speciﬁcproblem.

Dorigo[39,40]firstsuggestedtheapplicationofparallel com-puting techniques to enhance both the ACO search and its computationalefficiency,whileRandallandLewis[84]proposed thefirstclassificationofACOparallelizationstrategies.Thebook chapterbyJansonetal.[57]andthearticlebyEllabibetal.[46]are theonlypreviousworksthathavecollectedbibliographyof pub-lishedpapersproposingparallelACOimplementations.Jansonetal. reviewedparallelACOproposalspublishedupto2002,focusingon comparing“parallelized”standardACOalgorithms,specific paral-lelACOmethods,andhardwareparallelization;althoughtheydid notincludeanexplicitalgorithmictaxonomy.Ellabibetal.briefly commentedparallelACOimplementationsupto2004,focusingin describingtheapplications,andtheyonlydistinguishedbetween coarse-grainandfine-grainmodelsforparallelACO.

Theclassic proposalsof parallelACOs focusedontraditional supercomputersandclustersofworkstations.Nowadays,thenovel emergentparallelcomputingarchitecturessuchasmulticore pro-cessors,graphicsprocessingunits(GPUs),andgridenvironments providenewopportunitiestoapplyparallelcomputingtechniques toimprovetheACOsearchresultsandtolowertherequired com-putationtimes.

Inthislineofwork,themaincontributionsofthisarticleare: (i)tointroduceanewtaxonomytoclassifysoftware-based paral-lelACOalgorithms,(ii)topresentasystematicandcomprehensive surveyofthecurrentstate-of-the-artonparallelACO implemen-tations,and (iii)toprovideaninsight ofthecurrenttrendsand perspectivesintheﬁeld.Thesurveyfocusesmainlyontheparallel models,addressingtheprincipalcharacteristicsofeachproposal, theexperimentalanalysis–includingtheoptimizationproblems 1568-4946/$–seefrontmatter©2011ElsevierB.V.Allrightsreserved.

(2)

Fig.1.Numberofreviewedpublicationsbyyear.

faced,thetestcasesandtheparallelplatformusedinthe exper-iments,andthereportedresults–,andthemaincontributionsof thereviewedworks.EachparallelACOproposaliscategorizedin thenewtaxonomyproposed.

Themanuscriptisstructuredasfollows.Nextsectiondescribes theresearchmethodologyusedinthereview.Section3describes themainfeaturesoftheACOtechniqueandbrieflyintroducesthe mostpopularACOvariants.Section4presentsthegenericconcepts ofthestrategiesforACOparallelizationandcommentsprevious classificationcriteria.Italsodescribesthenewtaxonomyproposed inthisworktocategorizeparallelACOs.Section5reviewsthe pre-viousworkonparallelACOimplementationsandcategorizeseach proposalusingthenewtaxonomy.Acomparativeanalysis regard-ingthecomputationalefficiencyandqualityofresultsisofferedin Section6.Section7presentsthetrendsandperspectivesinthefield ofparallelACOimplementations,beforestatingtheconclusionsof thesurveyinSection8.

2. Methodology

Theresearchmethodologyusedinthereviewinvolved search-ing and reviewing papers from soft computing/computational intelligenceconferences,journals,andbooks,wherethe applica-tionofparallelprocessingtechniquestoACOhavebeenproposed. 2.1. Sourcesandsearchmethods

Acomprehensivesearchwasperformedinconferences, jour-nalsandbooksaboutmetaheuristicsandparallelism.Thedatabases searched in this study include ScienceDirect, Scopus, Thomson Reuters(formerlyISI) Web of Knowledge, ACM DigitalLibrary, IEEEExplore,Elsevier,SpringerLink,Citeseer,aswellasmany oth-ersOpenAccessPublishingdatabases.Thereviewedpaperscome outfromleadingconferencesandjournalsaboutsoftcomputing, suchasInternationalConferenceonParallelProblemSolvingfrom Nature, Conference on Genetic and Evolutionary Computation, Conference onEvolutionary Computation, Journal of Heuristics, InformationSciences, Lecture NotesinComputingScience,IEEE TransactionsonEvolutionaryComputation,AppliedSoft Comput-ing,FutureGenerationComputerSystems,andJournalofArtiﬁcial IntelligenceResearch,amongothers.

Thesearchofrelatedpapersinspeciﬁcdatabaseswasdoneusing agroupofkeywordsthatincludeantcolonyoptimization, paral-lel,distributed,parallelism,andsoftcomputing.Additionally,the referencesectionofeachpaperfoundwasreviewedtolocate addi-tionalstudiesofinterest.Asaresult,theﬁnalreferencesconsist of69papers:19publishedinjournals,44inreferredconferences,

3inbooks,and3M.Sc./Ph.Dthesis.Theanalysisofrelatedworks wasmainlyfocusedonthefeaturesoftheparallelmodels, describ-ingthedistinctive characteristicsof eachparallelACOproposal, theoptimizationproblemtackled,thetestcasesandtheparallel platformusedintheexperiments,andtheefﬁciencyandquality resultsreported.EachparallelACOproposaliscategorizedinthe newtaxonomyproposedinthiswork.Inordertostudytherecent contributionsaboutparallelantcolonyoptimization,thosepapers publishedinthelastﬁveyears(2005–2010)werefurtherstudied toanalyzethemaintrendsandperspectivesaboutparallelACO implementations.

2.2. Scope

Thereviewfocusesinpapersthathaveproposedexplicitly par-allelimplementationsofACO,disregardingthoseproposalsusing implicitparallelismoradistributed-agent-basedsearch.A com-pletedescriptionoftheimplicit-parallelandotherACOcategories thathavebeenleftapartfromthereviewaresummarizedinSection

4.3.

Thereviewedworksfaceoptimizationproblemsfromalarge spectrumofapplicationdomains.Onlysingle-objective,static opti-mizationproblemsarecoveredinthissurvey,sincetheyarethe largeclassofproblemsfrequentlysolvedusingparallelACO.The algorithmicstructureofACOtosolvemulti-objectiveanddynamic optimizationproblemsisdifferenttothetraditionalACOalgorithm, sotheyhavenotbeenincludedinthescopeofthisreview 3. Antcolonyoptimization

Ant Colony Optimization [45] is a population-based meta-heuristicfor solvingoptimizationproblems,originallyproposed byDorigoandDiCaro[41].ACOusesartiﬁcialantstoconstruct solutions by incrementallyadding componentsthat are chosen consideringheuristicinformationoftheproblemandpheromone trailsthatreﬂecttheacquiredsearchexperience.

Algorithm1presentstheskeletonofanACOalgorithmapplied toacombinatorialoptimizationproblemforminimizingone objec-tivefunction.Atﬁrst,theACOsets theinitialpheromone trails values (T) and the heuristic value of thesolution components (H,knownasvisibility).Afterthat,thealgorithmiterates untila given stopconditionis reached. Everyiteration step is divided in fourstages. First, each antof thecolony concurrently, inde-pendently,andasynchronouslyconstructsasolutionbyselecting componentsusingaprobabilisticrulethatconsidersboththe expe-rienceacquiredduringthesearch(throughthetraceofpheromone deposited)andheuristicinformationoftheconsideredcomponents

(3)

(throughthevisibility).Thenextstageoptionallyappliesalocal searchmethodtoimprovethesolutions.In thethirdstage,the pheromonetrailsareupdated:thetracevaluesaredecreasedby evaporationandincreasedbydepositingpheromoneinthe compo-nentsusedtoconstructsolutions;thenetchangeinthepheromone valuedependsonthecontributionsofthesetwoupdateprocesses. Finally,inthelaststage,thebestsolutionfoundsincethestartof thealgorithm(thebest-so-farsolution)isupdatedifabetter solu-tionhasbeenfound.ACOreturnsthebest-so-farsolution,whenthe stopcriteriaisaccomplished.

Algorithm1.

ACOappliedtoastaticcombinatorialoptimizationproblem T= initializePheromoneTrails()

H= initilizeVisibilities()

sbest= s|f(s)=+∞

WhilenotstopCriteria()do

pop = constructAntsSolutions(T,H)

pop’ = applyLocalSearch(pop) % optional T =updatePheromones(T,pop’)

s =selectBestOfPopulation(pop’)

iff(s)<f(sbest)then%update best-so-far solution

sbest=s

endif endwhile returnsbest

Thescientiﬁccommunityhasproposedmultiplevariantsthat instantiatethegeneralschemeshowninAlgorithm1.Someofthe mostpopularvariantsinclude:

•AntSystem(AS)[44],theclassicmethodthatusesarandom pro-portionalstatetransitionrule,whilethepheromoneisdeposited byallantsproportionallytotheirsolutionqualityandis evapo-ratedinallthecomponents.

•AntColonySystem(ACS)[43],whichemploysapseudo-random statetransitionrule,andthepheromoneisonlydepositedand evaporatedonthecomponentsofthebestsolution.ACS incorpo-ratesalocalpheromoneupdateduringthesolutionconstruction, allowingtheexplorationofunusedcomponents.

•MAX−MINAnt System (MMAS) [91] that includes explicit lower and upper limits on the pheromone, which is only depositedonthecomponentsofthebestsolution.

ACO methods have also been proposed for solving multi-objective and dynamic optimization problems. However, these methodshaveparticularfeatures(suchasPareto-basedevaluation ofsolutions,elitepopulationsorcolonies,alternativepheromone updatingandevaporationrules,multiplepheromones,etc.)that makethemdifferentfromthetraditionalACOschemapresented inAlgorithm1,sotheyhavenotbeenincludedinthescopeofthis review.

4. ACOparallelizationstrategies

Thesystematicstudyoftheapplicationofparallelcomputing techniquestoACOalgorithmsisrecent.Severalauthorsagreethat exhaustive work still needsto be donein this subject [45,57]. There are few articles that specifically discuss possible strate-gies to implement parallel ACO; and no recent work features a complete state-of-the-artreview onthis subject. Thissection brieflyintroducesthemetricstoevaluatetheperformanceof par-allel algorithms,which will beusedin the literaturereview in Section5.Later,itpresentsdifferentstrategiestoimplement par-allel ACOalgorithms. The currentlyproposed classificationsfor software-based parallelACOare discussedbeforeintroducing a newtaxonomy,which aimstoextendthepreviousonesandto overcomesomeoftheirshortcomingsandomissions.

4.1. Parallelperformancemetrics

Severalmetrics have been proposedto evaluatethe perfor-manceofparallelalgorithms.Themostcommonmetricsusedby theresearchcommunityare thespeedupandtheefﬁciency.The speedupevaluateshowmuchfasteraparallelalgorithmisthana correspondingsequentialalgorithm,anditiscomputedastheratio betweentheexecutiontimeofthesequentialalgorithm(T1)and theexecutiontimeoftheparallelversionusingmprocessors(Tm)

(Eq.(1)).Whenevaluatingtheperformanceofnon-deterministic algorithms,thespeedupshouldcomparethemeanvaluesofthe sequentialandparallelexecutiontimes(Eq.(2))[1].Theprevious deﬁnitionallowsdistinguishingamongsublinearspeedup(Sm<m),

linearspeedup(Sm=m),andsuperlinearspeedup(Sm>m).Theideal

caseforaparallelalgorithmistoachievelinearspeedup,although themostcommonsituationistoachievesublinearspeedupvalues duetothetimesrequiredtocommunicateandsynchronizethe par-allelprocesses.Whenlinearoralmost-linearspeedupisachieved, theparallelalgorithmissaidtohaveagoodscalabilitybehavior(i.e. thetimerequiredtoperformitdiminishesproportionallywiththe numberofprocessingelementsused).Thecomputationalefficiency (simplynamedefficiency)isthenormalizedvalueofthespeedup, regardingthenumberofprocessorsusedtoexecuteaparallel algo-rithm(Eq.(3)).Theefficiencymetricallowstocomparedifferent algorithms,eventuallyexecutedinnon-identicalcomputing plat-forms.Thelinearspeedupcorrespondstoem=1,andinthemost

commonsituationsem<1. Sm= _TT1 m (1) Sm= _EE_[[_TT1] m] (2) em= S_mm (3)

AccordingtotheAmdahl’slaw[10],theperformanceofany par-allelapplicationislimitedbythesequentialpartofthecode,that dependsonthechoiceoftheparallelizationstrategy.Amdahl’slaw canbeusedinparallelcomputingtopredictthetheoretical maxi-mumspeedupwhenusingmultiplecomputingresources.Faraway fromapessimisticviewofparallelcomputing,theGustafson’sLaw

[52]indicatesthatparallelimplementationareusefulforsolving largerprobleminstances inareasonableamountoftime,while achievingalmostlinearspeedupvalues.Infact,inthe optimiza-tionfield,parallelimplementationsofmetaheuristicsareknownto beenabletoefficientlysolvehardproblems,evenachieving super-linear(Sm>m)insomespecificsituations,bytakingadvantageof

particularhardwareoralgorithmicdesignissues[7].

4.2. PreviousparallelACOclassiﬁcations

Although the research community has proposed many par-allel ACO implementations, there do not exist standardized taxonomies for classifying theparallelization strategies applied toACO.Researchersusuallymerelypresentoneora few paral-lelACOimplementations,buttheyoftendonotputmucheffort infollowingamethodologytosystematicallycategorizethe paral-lelapproaches.Themostusualcriterionwhenclassifyingparallel ACOsimplydistinguishestwowidecategories:ﬁne-grainedand coarse-grainedmodels[46],alsocalledant-basedandcolony-based models[76].

The proposal by Randall and Lewis in 2002 [84] has been the unique attempt to provide a classiﬁcation of parallel ACO approaches. This categorization distinguished ﬁve parallel ACO models. Three of them used a hierarchical master-slave paradigm and the other two categories follow an

(4)

indepen-dent executions model and a synchronous cooperative model, respectively.

SeveralaspectstoconsiderwhenimplementingparallelACO algorithmswerediscussedbyJansonetal.[57].Theworkdidnot provideacomprehensivetaxonomy,butusedtwocriteriato clas-sifyparallel ACOs.Thefirstcriterion differentiates“parallelized” standardACO,designed todecreasetheexecutiontimewithout changingthesequentialalgorithmicmodel,andspecificallydesigned parallelACO,aimedatimprovingtheresultsqualityandthe compu-tationalefficiency,followingadifferentalgorithmicbehavior.The secondcriteriondistinguishesbetweencentralizedand decentral-izedparallelACOmodels,regardingwhetheracentralprocessthat collectsboththesolutionsandthepheromoneinformationexists ornot.

Otherhigh-leveltaxonomiesfor parallelmetaheuristicshave beensporadicallyusedtoclassifyparallelACOapproaches.The par-allelmetaheuristicsclassiﬁcationbyCrainicandNourredine[28]

distinguishesthreecategorizationlevels,regardingthesearch con-trolcardinality,controlandcommunications,anddifferentiation; butit is a genericclassification that hasnotgained popularity. Inthetaxonomyfor parallelmetaheuristicsbyTalbi[92],three hierarchicallevels (algorithmic, iteration, and solution)are used toclassifyparallelimplementationsofmetaheuristics,regarding thegranularityoftheparallelapproach.TheparallelACOmodel thatuses severalcoloniesis introducedasan exampleof algo-rithmiclevelparallelization,butothermodelsforparallelACOdo notget a mention. Anotherapproach by Cunget al. [31] iden-tifiedthesinglewalkand themultiplewalk–withindependent andcooperativesearchthreads–parallelstrategiesof metaheuris-tics. Up to now, this classification has been only used in the survey of pioneering parallel ACO proposals presented in that samepaper.

The research on parallel metaheuristics has significantly advanced in the last decade, thus the taxonomy presented by RandallandLewisin2002isnolongeraccuratetodescribeand classifytheparallelACOmodelsproposedbytheresearch com-munity.OnlytenproposalsofparallelACOimplementationswere proposedupto2002,andthebulkofworksaboutparallelACO hasbeendoneintheperiodfrom2005to2010,soanew taxon-omyisrequiredtocapturethemainfeaturesofnowadaysparallel ACOproposals.Ontheotherhand,thegenericclassificationsfor parallelmetaheuristicshave notbeenproposedtoconsiderthe ACOfeatures,andtheyoftenprovideanabstractviewthatdonot helptounderstandthespecificdetailsofparallelACO.To over-comethis lackofa standardizedtaxonomyfortheclassification ofparallelACOalgorithms,thisworkintroducesanewtaxonomy forsoftware-basedparallel ACO,conceived totakeintoaccount the particular features of all the proposed strategies for ACO parallelization.

Nextsubsectionpresentstheproposalofacomprehensiveand speciﬁcnewtaxonomyforcategorizingparallelACO implementa-tions.

4.3. AnewtaxonomyforparallelACO

Thissubsectionpresentsanewtaxonomicproposalfor paral-lelACOalgorithms.Thecategorizationtakessomebasicconcepts identifiedbyRandallandLewis[84],butitexpandsthe classifi-cationinordertointroduce somemissing categories,toextend otherones,andalsotoincludegeneralideasfromtheworkby Jan-sonetal.[57]andfromtheevolutionaryalgorithmsliterature.Two maincriteriarelatedtothepopulationorganizationareusedto dis-criminatethecategoriesinthetaxonomy:thenumberofcolonies andthecooperation.Theamountofworkthatisperformedin par-allelis usedtorefine theclassification withinthemaster-slave category.

Themaincontributionsofthenewtaxonomy,whichhavenot beenproposedinpreviousattemptstoclassifyparallelACO imple-mentations,are:

•Threesubcategories wereincludedinthemaster-slavemodel, regarding the amount of work that is performed in parallel: coarse-grain,medium-grain,andﬁne-grain.Themedium-grain master-slaveisanoriginalproblemdecomposition-basednew category that includes those works that apply a hierarchical master-slave model using a domain decomposition approach

[36,38,76].

•Anewcellularmodelcategory–whereasinglecolonyis struc-tured in small neighborhoods with limited interactions– is included,basedonthesimilarclassfoundinthemostwidely acceptedtaxonomiesofparallelevolutionaryalgorithms[7,17]. Thismodel doesnotappear in previousparallelACO classiﬁ-cations,andone implementationof thisnewmodel hasbeen recentlyproposed[78].

•Awidercategory–farmorecomprehensivethanthosepreviously usedinothertaxonomies– wasadoptedforcooperativeparallel ACOmethods thatusemorethan onecolony(themulticolony model).Thiscategoryallowsgroupingalargernumberofparallel ACOproposalsthanotherpreviouslydeﬁnedmulticolonyclasses.

•The taxonomy also incorporates a category including hybrid models,whichcomprehendsthoseproposalsthatfeature char-acteristicsofmorethanoneparallelmodel.

Thefullproposalofanewtaxonomyofstrategiesforparallel implementationsofACOincludesthefollowingcategories:

•Master-slavemodel.Thiscategoryappliesahierarchicalparallel model,whereamasterprocessmanagestheglobalinformation (i.e.pheromonematrix,best-so-farsolution,etc.)anditalso con-trolsagroupofslaveprocessesthatperformsubordinatedtasks, relatedtotheACOsearchspaceexploration.Themodelincludes threedistinguishedsubcategoriesregardingthegranularity(i.e., theamountofworkperformedbyeachslaveprocess):

Coarse-grain master-slave model. The master manages the pheromonematrixandtheinteractionwiththeslavesisbasedon completesolutions.Thetasksdelegatedtotheslavesmay corre-spondtooneormoreants,andtheycomprisebuilding,improving and/orevaluatingoneormorefullsolutions,andcommunicating backtheresulttothemaster.Thissubcategoryismore compre-hensivethantheparallelantsmodelbyRandallandLewis[84], sinceitallowsgroupingseveralantsinthesameslaveprocess. Medium-grainmaster-slavemodel.Adomaindecompositionof theproblemisapplied.Theslaveprocessessolveeachsubproblem independently,whereasthemasterprocessmanagestheoverall probleminformationandconstructsacompletesolutionfromthe partialsolutionsreportedbytheslaves.

Fine-grain master-slavemodel. Theslavesperformminimum granularitytasks,suchasprocessingsinglecomponentsusedto constructsolutions,and frequentcommunicationsbetweenthe masterandtheslavesareusuallyrequired.Themodelincludesthe parallelevaluationofsolutionelementscategoryoriginallyproposed byRandallandLewis[84],butitalsoincorporatesotherproposals thatfrequentlycommunicatecomponentsorinformationabout thecomponents.

•Cellularmodel. A singlecolony isstructured in small neigh-borhoods,each onewithitsownpheromonematrix.Eachant isplaced in a cell ina toroidal grid,and thetrailpheromone updateineachmatrixconsidersonlythesolutionsconstructed bythe antsin its neighborhood. Themodel uses overlapping neighborhoods,so theeffectof ﬁnding high-qualitysolutions

(5)

Fig.2.MaincategoriesinthenewtaxonomyforparallelACO.

Fig.3. AhierarchicalviewofthenewtaxonomyforparallelACO.

gradually spreads to other neighborhoods using the diffusion modelemployedincellularevolutionaryalgorithms[3,81].

•Parallelindependentrunsmodel.SeveralsequentialACO,using identicalordifferentparameters,areconcurrentlyexecutedona setofprocessors.Theexecutionsarecompletelyindependent, withoutcommunicationamongtheACOs,thereforethemodel doesnotconsidercooperationbetweencolonies.

•Multicolony model. In this model, several colonies explore the search space using their own pheromone matrices. The cooperation is achieved by periodically exchanging informa-tion among the colonies. The parallel interacting ant colonies model previously deﬁned by Randall and Lewis [84] is a particular case of multicolony that communicates the full pheromonematrixamongcolonies,thusitiscomprisedinthis category.

•Hybridmodels.Thiscategoryincludesthoseproposalsthat fea-ture characteristics from more than one parallel model. The categoryparallelcombinationof antsandevaluationofsolution elementsbyRandalland Lewis[84] isaspecial caseofhybrid thatcombinestwomaster-slavemodels.However,severalother modelsarealsoincludedinthiscategory.

Fig.2presentsthemaincategoriesinthenewtaxonomyfor parallelACO,andFig.3showsahierarchicalviewofthecategories regardingthecriteriaconsideredintheclassiﬁcation.

TheproposedtaxonomyfocusesonexplicitlyparallelACO algo-rithms,disregardingthoseapproachesthatuseimplicitparallelism oradistributed-agent-basedsearch.Thefollowingmethodshave beenleftapartfromthecategorization:

1.implicit-parallelACOs:theACOconstructionprocessisinherently parallel, sinceants buildsolutionsin a concurrentand inde-pendentway.SomeACOvariantsuseinherentparallelfeatures withoutaparallelimplementation,e.g.:severalcolonies,each onewithitsown pheromonematrix[9,32,53,60];onecolony withtwotypesofantsandasinglepheromonematrix[61];and onecolonywithseveralpheromonematrices[12,77],

2.distributed-agent-based ACOs: this kind of (non-parallel) dis-tributedACOisusuallyemployedtosolvedistributedproblems suchasdynamicnetworkrouting(e.g:AntNet[18]),buttheyare notspeciﬁcallydesignedtotakeadvantageofparallelcomputing architectures,

3.AntBasedOptimization(ABO):despitethenamesimilarity,ABO hasadifferentbehaviorthanACO(itusesantstoreducethe searchspacebyidentifyingareasthatpotentiallycontainagood solution). Fewparallel ABOimplementations havebeen pro-posed,thougharecentpaperdiscussedstrategiesforparallel ABOinsharedmemorycomputers[15],

4.hardware-parallelACOs:thiskindofACOimplementedin hard-ware[57,72,88]areplatform-dependent(e.g.,theydependon theflowofinformationbetweenthehardwareprocessingunits), and sotheiralgorithmicbehavior differsfromthetraditional ACO.Thus,hardware-parallelACOsarenotclassifiableina tax-onomyforsoftware-basedACO,suchastheoneproposedinthis work,andtheyhavebeenalsoleftapartfromthecategorization. Table 1 summarizes the main features of the parallel ACO modelsidentifiedinthenewtaxonomy(forthehybridscategory, D/P standsfor “dependsontheproposal”).Thedatain Table1

Table1

Characteristicsofthemodelsinthenewtaxonomy.

Model Populationorganization #Colonies #Pheromonematrices Communicationfrequency

Coarse-grainmaster-slave Hierarchical,non-cooperative One One Medium

Medium-grainmaster-slave Hierarchical,non-cooperative One One Medium-high

Fine-grainmaster-slave Hierarchical,non-cooperative One One High

Cellular Structured,cooperative One Many Medium

Parallelindependentruns Distributed,non-cooperative Several Several Zero

Multicolony Distributed,cooperative Several Several Low

(6)

correspondtopureimplementationsofthemodels,althoughthe boundarybetweensomemodelsmaybediffuse.Forexample,a coarse-grainmaster-slavewithseveralantsperslave coulduse localpheromonematricesateachslavetoreducethe communi-cations,therefore improvingthecomputationalefﬁciency.Such acoarse-grainmaster-slaveimplementationisquitesimilartoa multicolonymodelinwhichpheromonematricesareperiodically synchronized.

Thepreviouslypresentedtaxonomywasconceivedto compre-hendalltheparallelACOproposalsfoundintherelatedliterature. Thenextsectionreviewstheworksthathavepresentedparallel ACOimplementationsand italsoclassiﬁeseachproposalinthe correspondingcategoryofthenewtaxonomy.

5. CategorizingparallelACOimplementations

ThissectionpresentsacomprehensivereviewofparallelACO proposalsintherelatedliterature,describingthemainfeaturesof theparallelimplementationand thedetailsoftheexperimental analysis.Thefirstsubsectionintroducesthepioneeringworkson parallelACOandthenextsubsectionscategorizetheworks pro-posedsince1998,followingthenewtaxonomy.Whenaspecific workproposestwoormore parallelACOimplementations,it is classifiedinthecategorythatcorrespondstothemethodwhich obtainedthebestresults,regardingboththesolutionqualityand theperformance.Onlyasinglereviewisincludedforcellular par-allelACO,sincethismodelhasbeenrecentlypresentedbytwoof theauthorsofthepresentwork.

5.1. Pioneeringworks

ThepioneeringworksonparallelACOimplementationsdate fromtheearly1990s.Theprimaryproposalsweremostlyfocused ininvestigatingthebeneﬁtsoftheavailableparallelarchitectures forspeedinguptheACOsearch.Itishardtoclassifytheminthe pro-posedtaxonomy,sincethebasicconceptsonparallelACOmodels werenotformulatedatthattime.

InhisPh.D. thesis,Dorigo[40]first suggestedusinga paral-lelversionofACOinordertoimprovethequalityofthesearch and itscomputational efficiency. Thefirst implementation of a parallelASisattributedtoBolondiandBondaza[14],who pre-senteda veryfine-grain implementationthat placed oneantin eachavailableprocessorofaConnectionMachine(CM-2) super-computertosolvetheTravelingSalesmanProblem(TSP).Although it was an innovative proposal, the fine grain parallel ACOdid not scale due to the high communication overhead required in the synchronization and the pheromone updating phases

[42,75].

BolondiandBondazaachievedbetterresultswhensolvingthe TSPusingahierarchicalpanmicticparallelACOimplementedin anetworkoftransputers.Severalgroupsofantsexecuteda stan-dardASalgorithm–eachoneinanavailableprocessor–,anda synchronousupdateofthepheromonetrailswasperformedafter hierarchicallybroadcastingalltheinformation.TheparallelACO showedgoodscalability:almostlinearspeedupwasachievedwhen increasingthenumberofprocessors,nomatterthesizeofthe prob-leminstancesfaced[42].

Bullnheimeret al. [16] studiedthe communications in syn-chronousandpartiallyasynchronousmaster-slaveparallelACOs.In thesynchronousversion,eachslavelocallyupdateditspheromone matrix independently and the pheromone trials were globally synchronizedwithagivenfrequency.Nocertainconclusionscan bedrawnfromtheexperimentalanalysis,sinceit didnotsolve anyconcreteproblem:itonlysimulatedthecommunicationson TSPinstanceswithupto500cities.Theasynchronousversionhad

betterspeedupandefﬁciencyvaluesthanthesynchronousversion, buttheimprovementsdiminishedforthelargestinstances. 5.2. Master-slavemodel

Master-slave parallelACO implementations have been quite popularin theresearchcommunity,mainlydue tothefactthat thismodelisconceptuallysimpleandeasytoimplement. 5.2.1. Coarse-grainmaster-slave

Thestandardimplementationofcoarse-grainmaster-slaveACO assignsoneanttoaslaveprocessthatisexecutedonanavailable processor.Themasterprocessgloballymanagestheglobal infor-mation(i.e.thepheromonematrix,thebest-so-farsolution,etc.), andeachslavebuilds,optionallyappliesthelocalsearch,and evalu-atesasinglesolution.Thecommunicationbetweenthemasterand theslavesusuallyfollowsasynchronousmodel.

TheﬁrstproposalofthisimplementationwasANTabu,amethod combiningACOandTabuSearch(TS),appliedtosolvetheQuadratic AssignmentProblem(QAP)byTalbietal.[93].TheTSmethodwas usedasalocalsearchtoimprovethesolutionsineachslave.The parallelversionwascomparedagainstasequentialACO,a paral-lelTS,geneticalgorithms(GA)andvariableneighborhoodsearch. Regardingthesolutionquality,parallelANTabuwasoneofthetwo bestmethodsstudiedwhensolvingQAPLIBinstanceswithupto 256locationsinaclusterof10SGIIndyworkstations.The compu-tationalefﬁciencyoftheparallelmethodswasnotreported.

Synchronous and asynchronous versions of a coarse-grain master-slaveAS-likemethodwerestudiedbyCatalanoand Malu-celli[20]tosolvetheSetCoveringProblem(SCP).Bothversions hadsimilarspeedupbehaviorandefﬁciencyvalueswhensolving OR-Libraryinstanceswithupto400elementsand650subsetsina CrayT3Dwith64processors,althoughthequalityofsolutionswas notstudiedinthescalabilityanalysis.

Delisleetal.[35]solvedanindustrialschedulingprobleminan aluminumcastingcenter,usingamultithreadcoarse-grain master-slaveACOimplementedwithOpenMPonaSiliconGraphicsOrigin 2000computer.Themasterspawnsseveralthreads(oneforeach ant),whichgenerateandevaluatethesolutions.Specific considera-tionsaboutloadbalancingandinformationupdatewerepresented. Theexperimentalanalysissolvedprobleminstanceswith50and80 jobsusingupto16processors.Significantspeedupswereobtained (upto5.45whenusing16processors),butthecomputational effi-ciencyvaluesdegradedasthenumberofprocessorsgrew.

Twoapplicationsfollowingthestandardimplementationwere presentedbyPengetal.[79,80]tosolveapackingproblemand theimageregistrationproblem, respectively. Theanalysiswere mainlyfocusedonthesolutionquality:inboth casesthe paral-lelACOquicklyconvergedtobettersolutionsthanevolutionary algorithms,butthecomputationalefﬁciencywasnotstudied.

Lietal.[64]appliedastandardcoarse-grainmaster-slaveACS tothevectorquantizationcodebookdesign.Themethodobtained highefficiencyvalues(>0.8)whenworkingon2–16processorsofa DeepSuper-21Ccluster(P4Xeon),anditcomputedbettersolutions thanbothasequentialACSandaparallelindependentruns. Group-ingfourantsoneachprocessorandusingamodifiedpheromone updatingruleimprovedtheefficiencyoftheparallelmodel.

Ibrietal.[55]proposedanhybridACS/TStosolveadispatching andcoveringproblemforemergencyvehicleﬂeets.Thestandard coarse-grain master-slaveparallelization wasused for theACS, wheretheslaves(implementedbythreads)constructsthe solu-tions.AsecondparallelstageisappliedtoperformtheTSoperator andtheneighborhoodevaluation.Probleminstanceswithupto100 vehicles,23stations,20zonesand30emergenciesweresolvedona IntelCore2Duo,comparingsynchronizationstrategiesbetweenthe parallelprocessesandtheimpactoftheinformationexchangeon

(7)

thecomputationalefﬁciency.Theparallelimplementations com-putedbettersolutionsthanthesequentialone.Sublinearspeedup valueswerefoundwhenusingmorethan twothreads,andthe asynchronousmethod signiﬁcantly reduced the execution time whencomparedwiththesynchronousimplementation.

Someresearchershaveproposedassigningseveralantsper pro-cessorincoarse-grainmaster-slaveACOimplementations.

Doerneretal.[37]usedthisideainasynchronousmaster-slave implementationofASrank (avariantofASwherethepheromone

deposit is weighted according toa rank of the best solutions) tosolvetheVehicleRoutingProblem(VRP). SeveralclassicVRP instanceswithupto199clientsweresolvedinaBeowulf clus-ter with up to 32 processors. A sub-linear speedup behavior wasdetected.The bestvalues of computationalefﬁciency (0.7) wereobtainedwhenusing8processors,andthentheefﬁciency decreasedwhenthenumberofusedprocessorsincreased.

Lvetal.[70]proposedaparallelACOusingPgroupsofants dis-tributedinPprocessors,whichsharedonepheromonematrixina symmetricmultiprocessingcomputer.Thisapproachcorresponds tothecoarse-grainmaster-slavemodel,wheretheshared mem-oryactslikeanimplicit masterandeachslaveholdsagroupof ants.Parallelversionsof_MMASand ACSwerestudiedtosolve TSPLIBinstanceswithupto15,915citiesonanIBMp-serverwith twoPower5processors.Theparallelalgorithms achievedbetter solutionsthanthesequentialversions,buttheircomputational efﬁ-ciencywasnotstudied.ThesameideawasappliedbyGuoetal.[51]

totheproteinstructurepredictionproblem, implementingeach groupofantswithadifferentthreadandusinganasynchronous access tothe sharedpheromone matrix. Similarresults than a previoussequentialACOwereobtainedona8CPUIBMpServer. Reductionsbetween10and50timeswerereportedinthe execu-tiontime,butthecomparisonisunfairsincethesequentialACO wasexecutedonaslowercomputerandbothmethodsuseda dif-ferentnumberofants.Themaster-slaveACOwasbetween2and 10timesfasterwhenusingmorethanonegroupofants.

Chintalapatietal.[25]groupedseveralantsinasameprocessor intheircoarse-grainmaster-slaveACOappliedtothediscoveryof classificationrules.Eachgroupdiscoversrulesandsendthemto themaster,whomanagesthepheromonematrix.Standardcancer datasetswithupto168featuresweresolvedintheexperimental analysisperformedinanheterogeneouscluster.Thespeedup val-uesdependedonthedatasetfeaturesandthenumberofants,and thebestefficiencyvalues(0.95)wereobtainedwhenusing128ants executingon8CPUstosolvethelargestprobleminstancestudied. Recently,multithreadingprogramming havebeenappliedto coarse-grainmaster-slaveACO, byassigningonethreadtoeach ant,andexecutingseveralthreadsonthesameprocessor.Tsutsui andFujimoto[97]studiedsynchronousandasynchronousvariants of parallel cunning AS (cAS) to solve the TSP. The experimen-tal analysis performedon a i7 965(4 cores, 3.2 GHz)showed thataroughasynchronousimplementationobtainedsuperlinear speedupbyintroducingadifferentalgorithmicbehaviorthanthe synchronouscAS.IntheparallelACObyGaoetal.[48]tosolvethe TargetAssignmentProblem,themaintasktoexecuteinparallelis theconstructionofsolutions.Theexperimentalanalysiscompared multithreading implementations using OpenMP and Threading BuildingBlock(TBB)inaPentiumDualCore(2.4GHz).Almost lin-earspeedupvalueswereobtainedforprobleminstanceswith100 targets,andtheOpenMPvariantwasmoreefficientthantheTBB implementation.Thesameideawasappliedinthecoarse-grainTBB implementationbyLietal.[63]tosolvetheTSP.Speedupvaluesup to1.72wereobtainedwhenusing400antstosolveaTSPinstance with500citiesonaPentiumDualCore(3.0GHz).

Inotherimplementationsthemaintaskstoperforminparallel areonlyrelatedtotheevaluationofsolutionsortheapplication ofthelocalsearch,mainlyduetotheinherentcomplexityofthe

problemfaced.Theseimplementationsarecategorizedwithinthe coarse-grainmaster-slavemodel,consideringthegranularityofthe workperformedbyeachslaveprocess.

Tsutsui[95]solvedtheQAPusingaparallelcASconceivedto speedupalocalsearchmethodperformedbytheslaveson solu-tionspreviouslybuiltbythemaster.Amultithreadingapproach wasadopted,andthecommunicationoverheadwasreducedby usingthesharedmemoryparadigm.SeveralQAPLIBinstanceswith up to 150locations were solved using two quadcore PCs, and signiﬁcantimprovements inthe execution time wereobtained. Later[96],thecoarse-grainmaster-slaveoutperformedbotha syn-chronousmulticolonyandaparallelindependentrunsmodelin experimentsperformedintwodual-coreOpteronmachines.The optimalpumpschedulinginwaterdistributionnetworkswas tack-ledwithacoarse-grainmaster-slaveACObyLópez-Ibá ˜nezetal.

[68].Multithreadingprogrammingtechniqueswereusedto imple-ment the parallel evaluation of solutions, and a dynamic load balancingschedulingforassigningsolutionstothethreadswasalso included.Theexperimentalanalysissolvedawell-knownproblem instance-alreadytackledwithasequentialACO-ina computer with2dual-coreAMD64Opteronprocessors.Accuratesolutions werecomputedbytheparallelACO,whichalsoobtainedincreasing speedupvalueswhenusingahighernumberofants.

TheparallelevaluationofsolutionswasalsousedintheACS by Weis and Lewis [99], applied to the design of a radio fre-quencyantenna structure. Anad-hoc gridcomputing approach wasimplementedinaclusterof47non-dedicatedPCs,usingan instantmessagingprotocolforthecommunications.The experi-mentalresultsdemonstratedthattheparallelACSobtainedhigh speedupvalueswithreducedoverheadwhenincreasingthesizeof theprobleminstances.

Workingina higherlevelofabstraction,Crausand Rudeanu

[29] implemented a reusableframework for executing master-slaveparallelapplications.Checkpointswereusedtooptimizethe communicationbetweenthemasterandtheslaves,which asyn-chronouslyrequestedexchange informationby turns.Then, the masteronlysentthemodiﬁedinformationforeachslavesinceits lastcheckpoint. AparallelACOwasusedtotesttheframework bysolving a TSPLIBinstance with229citiesona SunFire15K with48processors,achievingalmostlinearspeedupwhenusing upto25processors,buttheefﬁciencydecreasedwhenusingmore resources.Later[30],theauthorsimplementedapyramidal frame-workfollowingamaster-slavemodelwhichincludessubmasters thatmanagedtheslavesundertheircontrol.Thehierarchical orga-nizationallowedtoreducethecommunications,soitwasableto achievealmostlinearspeedupvalueswhenusingmorethan25 processors.

Recently, thenovelGPU platformshaveprovide an efﬁcient hardwaretoimplementcoarse-grainvariantsofmaster-slaveACO. Cataláetal.[19]solvedtheOrienteeringProblem(OP)witha coarse-grainmaster-slaveinwhichthesolutionswerebuiltonthe GPUandthemasterexecutedintheCPU.Theexperimental anal-ysissolvedOPinstanceswithupto3000nodes,comparingthe GPUparallelACOexecutedinaPCwithanVidiaGeForce 6600 GTgraphiccardwith8pixelshaderprocessorsagainstadomain decompositionmethod.Accuratesolutionswereachievedbythe GPUimplementationusingfewants,butthequalityofresultsdid notfurtherimprovewhenusingadditionalants.Theexecutiontime fortheGPUcomputingwaslinearwithrespecttothenumberof ants.

ZhuandCurry[106]implementedaGPUcoarse-grain master-slave ACO with a local search to solve bound-constrained continuousoptimizationproblems.Thesolutionswerebuilt, evalu-ated,andimprovedusingalocalsearchmethodontheGPU,while theremainingtaskswereexecutedontheCPU.The experimen-talevaluationstudiedtwelvebenchmarkfunctionsonaPCwith

(8)

anVidiaGeForceGTX280with240streamingprocessors.Given aﬁxedexecutiontime,theGPUimplementationobtainedbetter solutionsthanafull CPUimplementation.Theauthorsreported speedupvaluesrangingbetween128to403whenusing15360 threads.

5.2.2. Medium-grainmaster-slave

Themaster-slave ACOproposedtosolvetheOP byMocholí etal.[76]wasconceivedtoexecuteina gridenvironment.The mastersplitstheprobleminclusters,andtheslavesﬁndpartial solutionsforeachclusterusingindependentgroupsofantswith non-overlapping pheromone matrices. Then, the master builds a complete solution for the problem by combining thepartial solutions.Thecommunicationwasprovidedbyhighlevelgrid ser-vicesimplementedusingwebservices.Severalrandomgenerated instanceswithupto10,000nodesweresolvedinaclusterwith32 PCs,showingthatwhenusingupto32groupsofants(thenumberof availableprocessors)theexecutiontimesexponentiallydecreased, andthequalityofsolutionsimproved.

AsimilardecompositionstrategywasappliedintheD-ant algo-rithmtosolve theVRP byDoerner et al. [38]. D-antsplitsthe problemand uses severalslaves to computea partialsolution foreach subproblem.Then,thepartialsolutions aremergedby amasterprocess,whichalsoperformedthepheromone evapora-tionanddepositoncomponentsofthebest-so-farsolution.Unlike thepreviouswork,D-antgloballymanagesthepheromonematrix forthewholeproblem, andeach slaveuses itsownsubmatrix. D-antwasabletocomputeaccuratesolutions for classicalVRP instanceswithupto199clientsonaclusterwith16processors, whilereducingtheexecutiontime.However,theefﬁciencyvalues deterioratedwhenusingmorethantwoprocessors,suggestinga poorscalabilitybehavior.Later[36],animprovedimplementation ofthemedium-grainmaster-slaveD-antoutperformeda coarse-grainmaster-slave,amulticolony,andanhybridcombiningthese twomethods.ClassicVRPinstanceswithupto480clientswere solvedonaIBM1350clusterwith32processors.D-antobtained superiorefﬁciencyvalues(upto0.75 with8processors) witha verysmalldegradationinthequalityoftheobtainedsolutions. 5.2.3. Fine-grainmaster-slave

RandallandLewis[84]tackledtheTSPwithafine-grain master-slaveACS.Theslaveprocessessenteachnewcomponentincluded inthesolutiontothemaster,whichperformedthelocalupdateof pheromonetraces.WhensolvingTSPLIBinstanceswithupto657 citiesinaclusterwith8processors,thespeedupwasfarbelow lin-ear,andthemaximumefficiency(0.83)wasobtainedwhenusing onlytwoprocessors.Thesepoorresultssuggestedthatthe fine-grainapproachwithlocalupdateofpheromoneisnotanefficient ideaforACOparallelization,duetothehighfrequencyof commu-nications.Thequalityoftheobtainedsolutionswerenotreported. An improved fine-grain master-slave implementation in a sharedmemorycomputerwasproposedbyDelisleetal.[33].The masterprocesswasreplacedbyaglobalmemorythatstoredthe globalpheromonematrixandthebest-so-farsolution,andcritical regionswereusedtoavoidthemutualupdateofglobal informa-tion.Eachslaveupdatedthelocalinformationperiodically,butnot ineveryiteration.TSPLIBinstanceswithupto657citiesweresolved onaIBM/P1600NH2with16Power3processors.Byspacingout thelocalinformationupdates,theproposedmethodachieved bet-terperformancethantheimplementationbyRandallandLewis:it maintainedthequalityofsolutionswhileavoidingthescalability degradationupto8processors.Increasingthenumberofantsper processorandreducingthenumberofiterationsraisedthe per-formance,butreducedthequalityofsolutions.Theexperimental analysiswaslaterextendedtoincludeaSGIOrigin3800computer andalsotouseaRegattanodewithPower4processors[34].Better

efﬁciencyvaluesweresystematicallyobtainedintheIBMRegatta machine,suggestingthattechnologicalevolutioncanimprovethe efﬁciencyofparallelACO.

Arecentﬁne-grainimplementationofMMASonGPUwas pro-posedbyFuetal.[47]tosolvetheTSP.Unlikeotherapproaches,the GPUholdsthepheromonematrix,whichisactualizedafterevery step.So,theinformationofthemasterprocessispartlystoredin CPUandpartlyintheglobalmemoryoftheGPU.Ineachstep,the GPUisusedtogeneraterandomnumbersandtocomputethenext cityforeachant,andtheCPUonlymanagessmallpiecesofdata (vis-itedcitiesandroutes).TheexperimentalevaluationsolvedTSPLIB instanceswithupto1000citiesonai7(3.3Ghz)withaNvidia TeslaC1060(240cores).Thespeedupvalueswereupto30onthe largestinstances,andabottleneckinthecommunicationbetween CPUandGPU–whichdemandedmorethan20%oftheexecution time-wasdetected.

5.2.4. Summary:master-slaveparallelACO

Master-slaveimplementationshavebeenextensivelyusedto design parallelACOs(see Table2for a summaryof therelated publications).Themodel providesaneasyand effectivewayto takebenefitoftheadditionalprocessingpowerofparallel com-putersforsolvingcomplexproblems.Manyproposalshaveused thecoarse-grainsubmodel,sinceitsuppliesaconceptually sim-pleschemathatachievesgoodspeedupandscalabilitybehavior. Themedium-grainsubmodelwasincorporatedinthetaxonomyin ordertoincludethoseworksthatproposea divide-and-conquer-likeapproachformaster-slaveparallelization.Thefirstproposalsof fine-grainmodelsshowedpoorefficiencyduetothelargeamount ofcommunicationsrequired,soinnovativeimplementationswere devised in order to overcome this problem by exploiting fast communicationparadigmssuchassharedmemoryparallel archi-tectures.

5.3. Cellularmodel

ThecellularmodelforparallelACOisa genericproposal fol-lowingthecellularmodelforparallelevolutionaryalgorithms.So, itdiffersfrompreviousproposalsofcellular-likemodelssuchas thehardware-parallelACOsbyMiddendorfetal.(usingprocessor arrays[72]andFPGA[89])andotherso-calledcellular implemen-tations,mainlybecausetheyuseacellularautomatamodel[8,103], butwithoutproposingaparallelimplementation.

The single one implementation of a parallel cellular ACO was presented by Pedemonte and Cancela [78], who proposed adistributed-memory implementationof thecellularmodel for solving a reliable network design problem. The cellular model wasthebestparallelmethodamong thestudiedones, improv-ingovertheresultsobtainedusingpreviousparallelevolutionary algorithms(upto31%foraspecificprobleminstance),andalso showinghighvaluesofcomputationalefficiency(over0.9usinga clusterwithfourprocessors).However,theresultsslightly deteri-oratedwithrespecttothoseachievedbyasequentialACO.Several improvementsarereportedtobecurrentlyinvestigatedinorderto overcometheloseofqualitywhenusingthecellularmodel. Imple-mentationsusingbothmulticoreandGPUparallelarchitectures couldfurtherimprovethecomputationalefficiencyofthismodel. 5.4. Parallelindependentrunsmodel

Theparallel independentrunsisa straightforward approach that applies a multi-start search using several processes that executesthe sameACOalgorithm. The modeldoesnot involve communicationsbetweenprocesses,soitssearchmechanismis identicalto the one resulting fromapplying several sequential ACOs.

(9)

Table2

Summaryofmaster-slaveparallelACOproposals.

Author Year Algorithm Problem Computationalplatform

Coarse-grain

Talbietal.[93] 2001 ANTabu QAP SGIIndycluster

CatalanoandMalucelli[20] 2001 AS SCP CrayT3D

Delisleetal.[35] 2001 ACO Industrialscheduling SGIOrigin1000

CrausandRudeanu[29] 2004 ACO TSP SunFire15K

CrausandRudeanu[30] 2004 ACO TSP SunFire15K

Doerneretal.[37] 2004 ASrank VRP Cluster

Pengetal.[80] 2005 ACO Packingproblem PC

Pengetal.[79] 2006 ACO Imagerestoration PC

Lvetal.[70] 2006 MMAS,ACS TSP IBMp-servermultiprocessor

Lietal.[64] 2007 ACS Codebookdesign DeepSuper-21C

Cataláetal.[19] 2007 ACO OP GeForce6600GTGPU

Tsutsui[95] 2007 cAS QAP PCs(dualcore,quadcore)

Tsutsui[96] 2008 cAS QAP PCs(dualcore,quadcore)

López-Ibá ˜nezetal.[68] 2009 ACO Pumpscheduling DualcorePC

Guoetal.[51] 2009 ACO Proteinstructureprediction IBMp-servermultiprocessor

ZhuandCurry[106] 2009 ACO Boundconstrainedoptimization GeForceGTX280GPU

WeisandLewis[99] 2009 ACS Antennadesign Non-dedicatedcluster

Ibrietal.[55] 2010 ACS Emergencyﬂeetdispatching IntelCore2Duo

Chintalapatietal.[25] 2010 ACO Classiﬁcationrulesdiscovery Cluster

TsutsuiandFujimoto[97] 2010 cAS TSP i7965

Gaoetal.[48] 2010 ACO Targetassignment PentiumDualCore

Lietal.[63] 2010 ACO TSP PentiumDualCore

Medium-grain

Mocholíetal.[76] 2005 ACO OP Cluster,grid

Doerneretal.[38] 2005 D-ant VRP Cluster

Doerneretal.[36] 2006 D-ant VRP IBM1350cluster

Fine-grain

RandallandLewis[84] 2002 ACS TSP IBMSP-2cluster

Delisleetal.[33] 2005 ACS TSP IBM/P1600

Delisleetal.[34] 2005 ACS TSP SGIOrigin3800,IBMRegatta

Fuetal.[47] 2010 MMAS TSP i7,NvidiaTeslaC1060

Stützle [90] studied the parallel independent execution of

MMASwithalocalsearchfortheTSP.Theevaluationanalyzed thebeneﬁtsofusingaparallelmodelwithrespecttoasequential ACO,comparingthequalityofthesolutionsforTSPLIBinstances withupto1173citiesinaUltraSparcIIworkstation.Theresults showedthattheparallelexecutionsobtainedbettersolutionsthan thesequentialalgorithminallthestudiedinstances.

Later,aparallelindependentrunsimplementationofASwith alocalsearchdemandinghighexecutiontimeswaspresentedby Rahoualetal.[83]totackletheSCP.TheASwascomparedagainst acoarse-grainmaster-slaveparallelACOforsolvingSCPinstances withupto500elementsand5000subsetsinaclusterof40PCs. TheparallelindependentrunsASobtainednearidealefﬁciency valuesduetothenegligibleprocessescommunication,anditalso improvedthesolutionsqualitywithrespecttothesequential algo-rithm.Ontheotherhand,thecoarse-grainmaster-slaveefﬁciency values stronglydependedonthesizeoftheprobleminstances, rangingfrom0.21forthesmallerinstancesto0.83forthelarger ones.

Albaetal.[4]comparedthreeparallelACS(parallel indepen-dentruns,multicolony, and coarse-grain master-slave)to solve theMinimumTardyTaskProblem(MTTP)instancesinacluster with only 3 PCs. The three methods achieved similar quality of solutions. Mixed results were reportedwhen increasingthe problemsizeusingaﬁxednumberofprocessors,buttheparallel

independentrunsmodelgenerallyobtainedthehighestefﬁciency values. Later [5], the parallel independent runs was compared againstasynchronousmulticolony modelsusingstarand unidi-rectionalring topologiesina cluster of8 PCs.The modelsthat involve communication foundbetter solutions, but theparallel independentrunsmodelhadbetterefﬁciencyvalues.

Bai et al. [11] implemented a parallel independent runs of

MMASonGPU.Eachthreadexecutedoneantandeach thread blockwasusedforanindependentexecution.Themainalgorithm runsonGPU,while theCPUis only usedtoinitializethe solu-tionsandtocontroltheiterationprocess.SixTSPLIBinstanceswith upto400citiesweresolvedinaPCwithanVidiaGeForce8800 GTXwith128streamprocessors.Regardingthesolutionsquality, theGPU-parallelimplementationoutperformedthreesequential

MMASversions,whileaccelerationvaluesbetween2and32were obtained.

Table3summarizestheproposalsofparallelindependentruns ACOimplementations.The modelhasbeenseldomused.It has beenemployedinworksmainlyfocusedonachievingspeedupand scalabilityimprovements.Thesearchmechanismissimilartothe oneemployedbysequentialACOs,eventhoughthemultistarting approachissometimesusefultoavoidstagnation.Parallel indepen-dentrunsACOimplementationsfrequentlyobtainsimilarresults qualitythansequentialACOs,andtheyareoftenoutperformedby parallelmodelsthatusecommunication.

Table3

SummaryofparallelindependentrunsACOproposals.

Stützle[90] 1998 MMAS TSP UltraSparcIIworkstation

Rahoualetal.[83] 2002 AS SCP Cluster

Albaetal.[4] 2005 ACS MTTP Cluster

Albaetal.[5] 2007 ACS MTTP Cluster

(10)

5.5. Multicolonymodel

ParallelACOimplementationsfollowingthemulticolonymodel havebeenextensivelyused.Thisapproachprovidesacooperative searchmechanismthatoftenallowsobtainingsuperiorresultsthan thesequentialmodelaswellasoutperformingotherparallelACOs. Multicolonyalsoadmitsa simpleimplementationin distributed memoryplatformssuchasclustersofcomputers.

Themain features tobe consideredwhen designing a mul-ticolony ACO were summarized by Janson et al. [57]: the communication frequency and the neighborhood topology; the typeofinformationthatisexchangedbetweenthecolonies;how theinformationreceivedfromothercoloniesisused;and homo-geneousversusheterogeneousapproaches.

MichelandMiddendorf[73,74]introducedaconﬁgurationthat becamethestandardformulticolonyACO,sinceitwaslaterused bymanyotherauthors:asynchronousalgorithmwithastructured neighborhoodandaﬁxedcommunicationfrequencyforsharingthe bestsolutions,eachoneofwhom,ifbetter,substitutesthe best-so-farsolutioninthedestinationcolony.

MichelandMiddendorfsolvedtheShortestCommon Superse-quenceProblem(SCSP),aproblemwithapplicationin bioinformat-ics(DNAsequencing).Theirmulticolonymodelobtainedslightly bettersolutionsthanasinglecolonyACOforstringswithlengthup to160,butnoefficiencyanalysiswereperformed.Later,four differ-entexchangestrategieswerestudiedbyMiddendorfetal.[75]to solvetheTSPandtheQAP.Themulticolonymodeloutperformed asequential ACOina TSPLIBinstancewith101cities,whileno definitiveconclusionsweredrawnforaQAPLIBinstancewith60 locations.Thebestresultswereobtainedwhensendingthebest localsolutionusinganunidirectionalringconnectiontopologyand substitutingthebest-so-farsolutionaccordingly.Theauthorsalso concludedthattheexchangefrequencyshouldbesettoavoidthe degradationofeitherthesolutionqualityortheACOcomputational efficiency.

PiriyakumarandLevi[82]solvedtheTSPwithastandard multi-colonyACO.TheexperimentalevaluationsolvedaTSPLIBinstance with 52 cities on a Cray T3E multiprocessor, studying several qualityandefﬁciencymetrics:thecostofthebestsolution,the totalcomputingtimeandthesingleprocessortime,andtherates betweencommunication/idletimesandtotalcomputation time. Theanalysisshowedthatgoodqualityresultscanbeobtainedwhile maintainingboundedvaluesforidleandcommunicationtimes.A speedupstudywasnotincluded.

Aweapon-targetassignmentproblemwasfacedbyLeeetal.

[62]usingamulticolonyACOthatappliedalocalsearchmethod tothebestsolutionfromallcolonies,andtheimprovedsolution wasconsideredforanadditionalpheromoneupdateprocess.The proposaloutperformedaGAandothersequentialandparallelACOs regardingboththesolutionqualityandtheexecutiontimewhen solvingprobleminstanceswith120weaponsand100targets.

PACS,themulticolonyACStosolvetheTSPbyChuetal.[27], fol-lowedthestandardapproachbutitincludedanadditionalupdate everytimethatacolonyreceivesasolution.Theexperimental eval-uationonlystudiedthesolutionqualityforthreeTSPLIBinstances withupto225cities,omittinganefﬁciencyanalysis.Several con-nection topologies were evaluated, and all the proposed PACS variantsobtainedbettersolutionsthansequentialASandACS algo-rithms.

ChuandZomaya[26]evaluatedtwomulticolonymodels–using acircularexchangeofsolutionsandasharedpheromonematrix, respectively– forsolvingapredictionofproteinstructureproblem. Theexperimentalevaluationwasunconventional,sinceitusedas theperformancemetricthenumberofCPUticksneededtoﬁnd thebestsolutioninaclusterwith5processors.Bothmulticolony methodsoutperformedacoarse-grainmaster-slaveapproach,and

themulticolonywithcircularexchangeofsolutionsachievedthe bestperformancevalues.

Yangetal.[104]appliedatraditionalmulticolonyACOusingan unidirectionalringtopologytomaximizethedensityofthedemand coveredbydirecttravelsonabusnetwork.Theexperimental evalu-ationstudiedthesolutionqualityandtheexecutiontimeinacluster of8PCs.BothmetricsimprovedintheparallelACOwhencompared withasinglecolonymodel.

The standard approach was also appliedin the multicolony

MMASbyXiongetal.[100]tosolvetheTSP.Theexperimental analysissolvedclassicalTSPLIBinstanceswithupto15915cities inaDawn4000Lmassivelyparallel processing(MPP)computer with64nodes.TheparallelMMASgotbettersolutionsthanthe sequentialoneastheproblemsizeincreased,but theefﬁciency signiﬁcantlydecreasedwhenusingmorethan8nodes.

HongweiandYanhua[54]solvedtheDNAsequence determina-tionproblemapplyingamulticolonyMMASthatusedthestrategy ofsortingandexchanginginformationforpheromoneupdating. WhenexecutedinaDawnTC4000MPPsupercomputer,the par-allelMMASachievedbettersolutionsthanasequentialACO,aTS method,andaGA.Theauthorsclaimedtohaveasublinear scalabil-itybehaviorduetothecommunications,butnospeedupanalysis waspresented.

Jovanovic et al. [58,59]solved theMinimum Weight Vertex CoverProblem(MWVCP)withastandardmulticolonyACO imple-mentedwiththreadsinaIntelCore2PC.Severalinterconnection topologiesandpoliciesonhowtousetheexchangedinformation werestudiedforinstanceswithupto150nodes.Themulticolony modelcomputedbettersolutionsthanbothaparallelindependent runsandasequentialACO.Theauthorsfoundoutthatincreasing thenumberofcoloniesisnotalwaysagoodstrategyinorderto improvetheresults.Noefﬁciencyanalysiswascarriedout.

Ta˘skovaetal.[94]solvedafiniteelementmeshdecomposition problemusingamethodthatcombinesaparallelmulticolonyACO witharefinementalgorithm.Astandardmulticolonyapproachis used,whichexchangesthebestsolutionsfoundoneachrefinement level.Theexperimentalanalysiswasperformedinaclusterwith8 nodes,eachonewithtwoAMDOpteron1.6GHzprocessors.The distributedimplementationobtainedthesamequalityofsolutions thanthesequential one,butlowspeedupvalues wereachieved (upto2.98whenusing8processes),mainlyduetotheinherent sequentialfeaturesofthemethod.

Recently,themulti-depot VRPwassolvedbyYu etal. [105]

applyingamulticolonyACOwhereoutstandingantsareexchanged atcertainintervalsusingaringinterconnectiontopology.Instances withupto360customersweresolvedinaclusterwith8PCs.The multicolonyACOachievedcompetitivesolutionswhencompared withothermethods(lessthan3%farofthebestknownsolutions), butnoefﬁciencyanalysiswasperformed.

Researchershavealsointroducedothervariantsofmulticolony that differ from the standard implementation. Among several differentapproaches,asynchronousmulticolonyalgorithmshave often been proposed, and another frequent idea is to include dynamicoradaptivemethodsforthecommunicationfrequency and/ortopology.

ThemulticolonyACObyChenandZhang[23] usedadaptive methodsforexchangingthebestsolutionsandadjustingthe fre-quency of information exchanges. TSPLIB instances withup to 318citiesweresolvedinaDawn2000MPPcomputer,comparing thesolutionsqualityandtheexecutiontimeagainstasequential ACOanda multicolonywithcircular exchangeof thebestlocal solution.Theadaptive multicolonyperformedthebest,andthe adaptivefrequencyallowedreachingbetterresultsthanusingﬁxed intervals,thoughitrequiredlargerexecutiontimes.Asub-linear speedupbehaviorwasdetectedupto35processors,andthevalues improvedwhenfacingTSPinstancesofincreasingsize.Efﬁciency

(11)

valuesupto0.8wereobtainedwhenusing25processorstosolve thelargestinstance.

Chenetal.[22]alsoproposedanovelstrategyforinformation exchange,whereeachcolonydynamicallydeterminesa destina-tioncolonytosenditsbestsolutionusinganadaptivemethod.The frequencyofinformationexchangedependedonthediversityof solutions,andthepheromonematrixwasupdatedconsideringthe bestsolutionofthecolonyandthereceivedsolution.AShenteng 1800MPPsupercomputerwasusedtosolveTSPLIBinstanceswith upto318cities.TheresultsimprovedoverasequentialACO,and theadaptiveinformationexchangestrategyallowedkeepinga bal-ancebetweentheconvergenceandthediversityofthesolutions. Thespeedupdidnotlinearlyincreasewhenaddingprocessors,but goodefﬁciencyvalueswereobtainedforlargesizedproblemswhen usingupto25processors.

Inthesamelineof work,Ellabib etal.[46] proposedMACS, asynchronousmulticolonyACSwithanadaptivemechanismfor theglobalpheromoneupdate.InMACS,eachcolonyworked inde-pendently,sharingthebestsolutionsthroughanexchangemodule thatencapsulatedallthecommunicationissues.Star,hypercube, andunidirectionalringinterconnectiontopologieswerecompared intheexperimentalevaluation ona clusterof8PCsforsolving standardmedium-sizedinstancesoftheVRPwithtimewindows (VRPTW)andtheTSP.Theanalysisstudiedtheeffectofthedifferent topologiesonthesolutionsquality,showingthatthebestresults wereobtainedwiththestartopology.Inaddition,star-MACSalso outperformedapreviousACSfortheVRPTW,andaPACSforthe TSP.Noefﬁciencyanalysiswasreported.

Manfrinetal.[71]madeanin-depthanalysisofparallelACOs tosolvetheTSP.Multicolonywasidentifiedasapromising paral-lelmodelappliedtoMMAStosolveTSPLIBinstanceswithupto 2392citiesinaclusterof8PCs.Betterresultswereachievedwhen reducingthefrequencyofthecommunicationbetweencolonies. So,theauthorssuggestedusingsophisticatedcommunication pat-terns,dependantonthesizeoftheinstancesandtheexecution time, and othercomplex mechanismssuchas reinitializingthe pheromonetrailsanddividingthesearchspacetoavoidpremature convergence.Recentexperimentsonmulticolonyconfigurationby Twomeyetal.[98]showedthatthebestcommunication strate-giesdependonwhetheralocalsearchmethodisusedornot,thus aspecificanalysisshouldbeperformedtoavoidasubparsearch behavior. Theanalysisconcluded that preventinghigh commu-nicationratesmakesmorelikelythatcoloniesfocusondifferent regionsofthesearchspace,emphasizingtheexplorationand pos-siblyimprovingtheresults.Thetwopreviousworksfocusedonthe solutionquality,andtheydidnotpresentefficiencyanalysis.

Lucka and Piecka [69] solved the VRP with a multicolony savings-basedACOinamulticorearchitecture,usingadifferent threadforeachcolony.Thecoloniesasynchronouslyexchangethe bestsolutionsusingthesharedmemorywithinthesamenodeand usingsharedﬁlesacrossdifferentnodes.Theexperimental anal-ysissolvedVRPinstances withupto420customersinacluster with72SunX4100,eachonewithtwodualcoreprocessors.When increasingthenumberofcoloniesupto32,thequalityofthe solu-tionsimprovedand theexecutiontimesreduced.Moreover,the communicationtimeremainedboundedevenforthelargestVRP instancessolved.

TheworksbyXuetal.[102]andXiongetal.[101]introduceda multicolonyACSwithdynamictransitionprobability.The infor-mationexchange occurs witha ﬁxedfrequency, but theglobal bestsolutionand thefullpheromonematrixarecommunicated indifferentiterations.Apartiallyasynchronousimplementation wasstudiedtosolvetheTSPinaDawn4000L,achievingsimilar resultqualitythanasequentialACS.TheparallelACSscaledpoorly, sincethebestefﬁciencyvalueswereobtainedwhenusingonlytwo nodes,anditquicklydropswhenincreasingthenumberofnodes

upto16.Later,Xiongetal.studiedapolymorphicversionofthe previousmethod(i.e.usingdifferentkindofantsandpheromone), andsimilarresultswereobtained.

Samehetal.[87]alsostudiedamulticolonyACSthatexchanges thefullpheromonematrix,placingeachcolonyinadifferent pro-cessor.Unlikethepreviousapproach,thebest-so-farsolutionand the fullpheromone matrix wereexchangedbetween thesame pairsof colonies,andthereceivedinformationwasusedin the pheromone update process.The experimental resultsfor a TSP instancewith318citiesdemonstratedthatthetimerequiredto findanoptimalsolutiondecreaseswhenusingmorecolonies,and thattheinformationexchangefrequencyaffectsthetimerequired tofindanoptimalsolution.Noefficiencyanalysiswascarriedout. A differentapproachhasbeen appliedtosolveoptimization problemswithmulticolonyACOsusingaproblem-decomposition strategy. The multicolony ACO by Chen et al. [24] solved the classificationrulediscoveryproblemusingseveralcoloniesto inde-pendentlysearchfortheantecedentpartofaninputsetofrules andbroadcastingthetrainingseteachtimethatacolonyupdated it.FourdifferentdatasetsweresolvedinaDawn2000MPP com-puter.Themulticolonyfoundsimplerandmoreaccuraterulesthan bothasequentialACOandadecision-tree-basedalgorithm,butno efficiencystudywasreported.

AnoriginalmulticolonyACOtosolvelarge-dimension decom-posableproblemswaspresentedbyLinetal.[65].Themodelused twocoupledcoloniestooptimizedifferentpartsoftheobjective functionapplyinglocalproceduresfortheconstructionphaseand thepheromoneupdateprocess.Astagnation-basedasynchronous migrationexchangedinformationbetweencolonies.Theproposed methodgotbettersolutionsandconvergencespeedthana sequen-tialACOwhensolvingeightcontinuousproblems.Linetal.claimed thatthisparallelmodelshouldbeusefulforoptimizationinhigh dimensionalspaces,butnoefﬁciencyanalysiswasperformed. 5.5.1. Summary:multicolonyparallelACO

Multicolony modelshavebeen widelyusedin parallelACOs as they provide an accurate search exploration pattern based on the cooperative behavior of many ant colonies. Indeed, Table 4 summarizes theproposals of multicolony parallel ACO algorithms.Manyauthorshavefollowedthetraditional conﬁgu-rationbyMichelandMiddendorf[73,74],butsomeothervaluable variantshavebeenproposed,includingasynchronous implemen-tations, dynamic and adaptive models for the communication frequencyand/ortheneighborhoodtopology, andmulticolonies followinga problem-decompositionapproach.Mostofthe mul-ticolony implementationsweredeveloped usingthedistributed memoryparadigminclusterplatforms.

Multicolony ACOs showed promising results since the first implementation in the last years of the 1990’s, but the model hasbeencontinuouslyimproved,payingspecialattentiontothe mechanisms used for exchanging information among colonies. Recent studiesconfirmed thata trade-offbetweentheisolated explorationwithineachcolonyandthecooperationusing infor-mationexchangeisdesirableinordertoachieveaccurateresults andperformance.Increasingthefrequencyofcommunicationputs emphasisin exploitinggoodsolutions,sothis strategyis useful whensolvinginstancesofincreasingsize.However,italsocould haveanegativeimpactintheparallelACOefficiencyduetothe costofthecommunications.

5.6. Hybridmodels

Therehavebeenmanyproposalsfordesigninghybridparallel ACOsthatcombinethecharacteristicsofmorethanoneparallel model.However,inpractice,onlyafewhybridparallelACO meth-odshavebeenimplemented.

(12)

Table4

SummaryofmulticolonyparallelACOproposals.

MichelandMiddendorf[73] 1998 EAS SCSP Multiprocessor

MichelandMiddendorf[74] 1999 EAS SCSP Multiprocessor

Middendorfetal.[75] 2002 EAS QAP PC

PiriyakumarandLevi[82] 2002 ACO TSP CrayT3E

Leeetal.[62] 2002 ACO Weapon-targetassignment PCs

Chuetal.[27] 2004 ACS TSP N/D

ChenandZhang[23] 2005 ACO TSP Dawn2000

ChuandZomaya[26] 2006 ACO Proteinstructureprediction Cluster

Chenetal.[24] 2006 ACO Classiﬁcationrulediscovery Dawn2000

Manfrinetal.[71] 2006 MMAS TSP Cluster

Yangetal.[104] 2007 ACO busnetworkdesign Cluster

Ellabibetal.[46] 2007 ACS VRPTW,TSP Cluster

Chenetal.[22] 2008 ACO TSP Shenteng1800

Xiongetal.[100] 2008 MMAS TSP Dawn4000L

Linetal.[65] 2008 ACO Decomposableproblems PC

HongweiandYanhua[54] 2009 MMAS DNAsequencedetermination DawnTC4000

LuckaandPiecka[69] 2009 ACO VRP Cluster

Xuetal.[102] 2009 ACS TSP Dawn4000L

Jovanovicetal.[58] 2009 ACO MWVCP IntelCore2

Jovanovicetal.[59] 2010 ACO MWVCP IntelCore2

Ta˘skovaetal.[94] 2010 ACO Finiteelementmeshdecomposition Cluster

Twomeyetal.[98] 2010 MMAS TSP Cluster

Xiongetal.[101] 2010 ACS TSP Dawn4000L

Yuetal.[105] 2010 ACO Multi-depotVRP Cluster

Samehetal.[87] 2010 ACS TSP Cluster

Theparallelcombinationofantsandevaluationofsolution ele-mentswasincludedasoneofthecategoriesofthetaxonomyby RandallandLewis[84].Thismodelusedtwolevelsofmaster-slave parallelism;onebetweenthecolonyandtheants,andanotherone betweeneachantandtheevaluationofthesolutioncomponents. Later,Delisleetal.[33]suggestedanhierarchicalhybridin two levels.Theupperlevelwasamulticolony modelthatemployed message passing for the intercolony communications, and the lower level wasa fine-grain master-slavein which themaster processwasreplacedbyaglobalsharedmemory.Nospecific imple-mentationshavebeenpresentedforthefirsttheoreticalproposal, whilethesecondonehasbeenrecentlyadoptedintheworkbyLiu etal.[66],whereitwasusedtosolvearoutingprobleminmobile networks.

Iimuraetal.[56]studiedaparallelQueenAntStrategy(ASqueen),

amethodusingagroupofagentstobuildsolutionsandaqueenant givingdirectivestotheagentsaboutwhentodiversifythesearch ortoexploitgoodsolutions.ThisparallelACOfollowedanhybrid approachthatcombinedamaster-slavemodelwithamulticolony: thequeenantwasthemaster,whichcontrolledthesearch,andeach slaveheldacolonythatworkedindependentlyonthesame prob-lem,periodicallysendingitsbestsolutiontothemaster.ATSPLIB instancewith76citieswassolvedintheexperimentalevaluation inanheterogeneousclusterwith9PCs.Theparallelmodelslightly improvedthesolutions qualitywhen considering anincreasing numberofagents,whilealsoreducingtheexecutiontime.

AnoriginalparallelACOimplementationwasproposedbyLiu etal.[67]forsolvingmulti-stagedecisionproblemsfollowing a decompositionstrategyusingaconstructiongraph.Eachprocess performedtheACOsearchinasubproblemandthewhole solu-tionsearchwasaccomplishedcooperativelybythesetofprocesses. Severalprocessorswerearrangedinapipeline-likestructurethat allowedhavingmanyantsconcurrentlysolvingsubproblemsatthe sametime,andthepheromoneupdatewasperformedaftereach antbuiltawholesolution.Theexperimentalevaluationsolvedan ad-hocexampleproblemusingaclusterofPCs,andtheefﬁciency analysisshowedthatthespeedupvaluesincreasedwhensolving largeproblemsorwhenusingmanyants.

RoozmandandZamanifar[86]improvedthepreviousproposal from Chen et al. [24] tosolve the classiﬁcation rule discovery

problemwithanhybridparallelACS.Inthenewmethodseveral coloniesworkedindependentlyondifferentcategorieswithallthe inputset,andacoarse-grainmasterslavemodel–implemented withmultithreading– wasusedforconstructingandevaluating thesolutions.Anadditionalpheromoneupdateisappliedusing thevaluesofthebestcolonyeachtimethataruleisfound.Five public-domaindeseasedatasetswereusedintheexperimental evaluation.Themethodwasabletodiscoversimplerulesthanthe parallelACObyChenetal.[24],withhigherpredictiveaccuracy thanthoseobtainedbyotherpreviousACO-basedmethods.The authorsclaimedthat“byusingefﬁcientcommunicationmethods” themulticolonymodelwasabletoreducethespeedofconvergence toalocaloptimum(thusachievingmoreaccurateresults),butno furtherdetailswereprovidedaboutthecomputationalefﬁciency.

Following a different lineof work, the parallel ACO by Liu etal.[66]implementedamodelsimilartothetwo-level hierar-chicalhybridproposalbyDelisleetal.[33].Theupperlevelused astandardmulticolonyapproachemployingtheMessagePassing Interfaceforcommunication;andthelowerlevelusedaﬁne-grain master-slavemodeltomanagethesharedpheromonematrixusing multithreading(themasterprocessisreplacedbyaglobalshared memory).Thishybridmodelwasappliedtosolveamulti-path rout-ingprobleminMobileAd-hocNetworks(MANETs),andtheparallel ACOachievedbettersolutionsthanboththeAODVandDSR pro-tocolsregardingthepacketdeliveryratioandaverageend-to-end packetdelay.Noefﬁciencyanalysiswasreported.

AsummaryofthehybridproposalsforparallelACOispresented inTable5.

6. Comparativeanalysis

This section presents a comparative analysis of the paral-lelACO modelsin thenewtaxonomy, regarding thetwo main goalswhen usingaparallel implementation:thecomputational efficiencyandthequalityofresults.Thestudyisaimedatoffering aninsight onthe mainresultsfrom therelated works, provid-ingspecificcontributionstotheresearchcommunityinorderto evaluatethebenefitsofeach parallelmodel,and possibly help-ingthemtoselectoneofthemforsolvingagivenoptimization problem.