Neuroevolution of a multi-agent system for the dynamic pickup and delivery problem

(1)

Neuroevolution of a multi-agent system for the

dynamic pickup and delivery problem

Jonathan Merlevede, Rinde R.S. van Lon, and Tom Holvoet iMinds-DistriNet, KU Leuven, 3001 Leuven, Belgium.

{firstname.lastname}@cs.kuleuven.be

Abstract Automated transportation and logistics create particularly challenging problems for planners and schedulers. Besides their com-putational hardness, such systems need to cope with the dynamic and distributed nature of the problems. This article describes an agent-based approach to the dynamic pickup and delivery problem (PDP). We inves-tigate the feasibility of using the neuroevolution of augmenting topolo-gies (NEAT) algorithm to create and optimize a multi-agent system (MAS) for dynamic PDPs. A thorough feasibility study requires a sig-niﬁcant eﬀort since a platform is required that facilitates a comparison of MAS and centralized algorithms. We implemented an existing bench-mark dataset for the dynamic pickup and delivery problem with time windows (PDPTW) in the RinSim multi-agent simulator. We supplied the NEAT algorithm with training data derived from this dataset and we deployed the resulting neural network in a homogenous MAS that uses a blackboard coordination model. Our preliminary results show that our approach is a double-edged sword, the resulting MAS responds in real-time (response real-time in ms) but the solution quality is worse compared to that of the benchmark dataset.

Keywords: pickup and delivery problems with time windows (PDPTW), artiﬁcial neural networks (ANN), neuroevolution of augmenting topolo-gies (NEAT), multi-agent systems (MAS), distributed systems, decen-tralized, optimization, logistics, transportation, dynamic, real-time

1 Introduction

In 2012, total U.S. business logistics cost $1.28 trillion dollar, a 6.6 percent increase from the previous year and accounting for 8.5 percent of the U.S. gross domestic product [42]. Of these costs, 52% were due to transportation by truck or airplane. Clearly, the calculation of transportation routes is of tremendous economic value.

Constructing transportation routes is an intractable problem that has been the topic of intensive research for over half a century. In part due to compu-tational limitations, many logistics companies impose constraints on their cus-tomers. For example, it is common practice to make customers order before a certain deadline and to process their requests only after this deadline expires.

(2)

In essence, this transforms an inherently dynamic problem into a static prob-lem suitable for ordinary routing algorithms [25,26]. Transforming a dynamic problem into a static one is not always appropriate or possible, e.g. in the case of taxi services, emergency services and specialized courier services. For these dynamic problems most existing techniques involve adapting an algorithm that creates routes for the static version of the problem [4,9,27]. Since solving a static problem each time new information is revealed is computationally intensive, the usual approach is to start by computing a good set of routes. As new information becomes available, these routes are updated using heuristic methods, for example by inserting, deleting or interchanging destinations in the existing routes.

In our research we investigate whether using a multi-agent system (MAS) poses a viable alternative to conventional centralized algorithms by embracing both the dynamism and the distributed nature of a problem. More speciﬁcally, in this paper we apply neuroevolution of augmenting topologies (NEAT) to cre-ate a MAS for the uncapacitcre-ated dynamic pickup and delivery problem with time windows (PDPTW). In order to evaluate the feasibility of this approach a thorough evaluation with state-of-the-art centralized algorithms is essential, as is argued in [18]. In the dynamic PDPTW there are two algorithmic properties of interest, (1) the solution quality according to the objective function and (2) the reaction time of the algorithm. Currently there are very few MAS papers that make an eﬀort to compare with state-of-the-art centralized algorithms. An example of a paper with a good evaluation is the paper by Máhr et al. [21]. The reason for this lack of focus on good evaluation may well be due to its time consuming nature and an apparent reluctance from the MAS community in evaluation beyond a prototype.

This paper presents our ﬁrst steps towards a NEAT and MAS based approach and a thorough evaluation.We start with an overview of related work (Sect.2). We then give a precise deﬁnition of the dynamic PDPTW we are addressing (Sect.3) and detail our NEAT and MAS approach (Sect.4). We evaluate our approach by using the open-source RinSim simulator [17] and a dataset cre-ated by Gendreau et al. [9] (Sect.5). Section6 concludes and provides possible directions for future work.

2 Related work

We divide our discussion of related work into two groups: literature on MASs for dynamic pickup and delivery problems (PDPs) (Sect.2.1) and literature on NEAT (Sect. 2.2). For a literature survey on the PDP we refer to [4,25,26].

2.1 Multi-agent systems

Many MAS-based approaches to the dynamic PDP exist in the literature. The common approach is to model each vehicle as an agent. Although other kinds of agents representing e.g. distribution companies are often used, we use the terms ‘vehicle’ and ‘agent’ interchangeably.

(3)

Applicability The influential work of Fischer et al. [7,8] states the applicabil-ity of MAS-based solutions to dynamic PDPs. They argue that many of the difficulties of the transportation domain (inherently distributed, highly dy-namic, complex and the existence of cooperation) are naturally captured and alleviated by a MAS. More recently, Máhr et al. [21] have shown that even fully decentralized (‘flat’) MAS-based based approaches to the PDPTW can perform as well as state-of-the-art centralized algorithms based on mixed integer programming, especially in highly dynamic scenarios.

Coordination When a group of agents needs to make a collective decision such as who gets to serve a new request, a coordination mechanism such as bidding, auctioning or voting is typically used. This is the case in [8], which extends the contract-net protocol using a global auction mechanism that the authors call simulated trading. The authors claim that global information in the form of auctioning can signiﬁcantly improve the allocation of goods to vehicles.

Evolution Beham et al. [3] were among the ﬁrst to combine a MAS-based approach to the uncapacitated dynamic PDPTW with an evolutionary al-gorithm (EA) [6]. They deﬁne vehicle behavior using two heuristic functions or agent heuristics: one determines where agents travel to, the other what goods to pick up upon arrival at a pickup or delivery point. Their agent heuristics are weighted sums of hand-constructed heuristic functions, where an evolution strategy is used to evolve the weights. They use randomly gen-erated scenarios to show that their approach produces valid routes and that evolved heuristics can, to a limited extent, be used to produce routes for other scenarios than the one used for training. They do not compare their results with those of other algorithms.

Van Lon et al. [19] use a similar approach, but use only a single agent heuris-tic instead of two, and geneheuris-tic programming (GP) instead of an evolution strategy. Their agent heuristic assigns a priority to all unhandled requests, which is either a pickup or a delivery of a parcel that is already in the agent’s cargo. Agents continuously re-evaluate their agent heuristic on all parcels, allowing for diversion. Their results indicate that the evolved agent heuris-tic outperforms the hand constructed centralized hyper-heurisheuris-tic by a large margin.

The paper of Vonolfen et al. [40] continues in this line of research. They start from [3] but use GP with a large number of variables, including variables that give information about other agent’s distances and destinations. They slightly outperform [3]. They also compare with an unspeciﬁed tabu-search algorithm which produces slightly better routes.

In [3,19,40], there is no explicit coordination. There is implicit coordination only through the hand-designed input heuristics and a common collection of open pickup requests. In [19], agents change this collection only after performing a pickup, which means that several agents may be driving to the same pickup or delivery point at the same time. Beham et al. [3] do not specify when agents are informed.

(4)

2.2 NEAT

NEAT is an evolutionary algorithm [6] technique that evolves both the weights and topology of artificial neural networks (ANNs) [34,38]. NEAT initializes a uniform population of ANNs without hidden neurons and a fully connected input and output layer. It gradually complexifies these ANNs by introducing additional neurons and connections when it finds that they increase performance.

NEAT has proven that it is capable of devising complex behavior on nu-merous occasions [14,29,37–39,43] and has become popular as a benchmark algorithm in the field of neuroevolution [15,23,32,33]. The large number of algorithms that have NEAT at their core [5,12,30,35,36] illustrate its flexibility. Feature-selective NEAT (FS-NEAT) is a minor variation of NEAT introduced in [41]. FS-NEAT differs from NEAT only in the initialization of the population: instead of a homogeneous population of networks with fully connected input and output layers, FS-NEAT uses a population of ANNs that contain only a single connection between an arbitrary input and an arbitrary output. In this way, FS-NEAT implicitly performs feature selection. FS-NEAT outperforms NEAT in the presence of inputs that are irrelevant to the task, and its performance remains nearly constant with increasing numbers of irrelevant inputs. NEAT and FS-NEAT have similar performance in the absence of irrelevant inputs.

To the best of our knowledge, NEAT has never been applied to a transporta-tion or logistics problem before.

3 Problem deﬁnition

This section gives a precise deﬁnition of the dynamic PDPTW that we investigate (Sect.3.1), as well as a description of the scenarios that we use for training (Sect.3.2).

3.1 Pickup and Delivery Problem

In the dynamic PDPTW a ﬂeet of vehicles handles a set of transportation re-quests R that arrive over time while vehicles are working. We use the deﬁni-tion from [9] which is a dynamic version of the uncapacitated static PDPTW from [31].

In this deﬁnition, nothing is known about the size and distribution ofR. A request consists of pickup and delivery locations and corresponding time windows and an ‘announce time’, which is the point in time when the request is made known to the system. A request is consideredsatisﬁedorhandledwhen a parcel is transported from its pickup to its delivery location. Time windows are half-open: vehicles cannot service a request before the beginning of the appropriate time window, but can still service a request after the end of that window. Other properties are as follows.

– Vehicles always travel at the same speed.

(5)

– Vehicles can travel from point to point in a straight line.

– Vehicles start at the same depot at the beginning of the scenario. – Vehicles are expected back at that depot at the end of the scenario. – Vehicles are not allowed to divert from their destination once they have

started driving.

Allowing diversion can increase the quality of routes [13], but is not allowed in the deﬁnition of [9]. The goal of the dynamic PDPTW is to minimize a sum of three criteria: total travel time, the sum of tardiness over all pickup and delivery locations and the sum of overtime over all vehicles. Equation1gives the resulting cost function. ∑ k∈M Tk+∑ v∈V max{0, tv−lv}+∑ k∈M max{0,_tk¯ ₋_l₀_} (1) In this equation,M is the set of vehicles,Tk is the total travel time by vehiclek,

V is the set of all pickup and delivery locations,tvis the time at which the time

window for service at pickup or delivery point v ends, lv is the time at which

pickup or delivery point v is serviced,¯_t_k _{is the time at which vehicle} _k _arrives

back at the depot and l0 is the time at which the scenario ends.1 The problem

ends only afterall requests are handled and all vehicles are back at the central depot.

3.2 Scenario generation

A specific problem instance or scenarioconsists of a collection of requests. In order to be able to run an EA without overfitting we need a separate dataset for training. To create such a dataset, we implemented a scenario generator that follows the specification in [9]. The intention is for generated scenarios to be feasible and more or less realistic.2_{The probability of a request appearing varies} over both space and time as defined by inputs to the generator. We say that scenarios that are generated by the scenario generator using same parameters belong to the samescenario class. Since scenario generation is a stochastic pro-cess, individual scenarios belonging to the same class can be very different, and scenarios belonging to different classes can be very similar.

4 Approach

This section presents the MAS-based approach to the dynamic PDPTW that we use in our study of using NEAT in MASs. Like [3,19,40], we use an EA to combine the information from several hand-constructed heuristics into an agent heuristic that determines agent behavior. We ﬁrst describe how the agent heuristic determines agent behavior (Sect.4.1). We then present the set of hand-constructed heuristics that we use in our evaluation (Sect.4.2).

1 _{Although [9] deﬁnes a more general cost function that allows the diﬀerent}

compo-nents to have diﬀerent weights, it only reports on experiments with all weights equal to one. We discarded the weights to allow a direct comparison of results.

2 _{The generator can create scenarios that are infeasible (impossible to complete before}

(6)

4.1 Agent behavior

Whenever requests are announced, they are placed in a collection of requestsL stored on a central blackboard (i.e. a list that can be accessed and modiﬁed by all agents). Each agent aalso has a private collection of requestsRa thatahas

picked up but not yet delivered. Wheneverais idle, it constantly checks to see if L∪Ra is empty. If it is not, it applies its agent heuristic to all requests in

L∪Ra and selects requestrwith the lowest value.3 Ifacan reachr4 early, i.e.

before the beginning of the (pickup or delivery) time window associated with r, it waits until either a new request is added to Lor it can no longer reach r early. It then again becomes idle. Ifa cannot reach rearly, astarts driving to r. Upon arrival, a picks up or deliversr’s parcel and then again becomes idle. Additionally, before starting to drive,amovesr out ofL orRa. If r∈L, then

r has not yet been picked up, anda claimsr by movingr from L toRa. This

prevents other agents from also driving tor’s pickup destination. Ifr∈Ra, then

the parcel associated withr is already ina’s cargo. Sinceais in the process of delivering the parcel and cannot be interrupted, the request is handled and a removes r from Ra. Whenever a is idle, L∪Ra is empty anda can no longer

reach the central depot before the end of the scenario,adrives to the depot. This procedure implements the wait-earliest waiting strategy, which performs well without requiring large ﬂeets of vehicles [22,28]. As in [3,19], there is only implicit coordination through the shared collectionL.

4.2 Evolving heuristics

We evolve the agent heuristic using NEAT, which requires a fixed number of inputs and outputs. We define only a single output, which serves as the heuristic value and defines vehicle behavior as described in the previous subsection.

Because the number of inputs is fixed, we cannot use inputs like the locations of all open requests. Instead, we define a fixed number of simple input heuristics that summarize the information on a specific agent-request pair. These input heuristics should capture as much of the available information as possible. Any uncaptured information is not accessible to NEAT and therefore not used by any agent. We use the following values as inputs for requestrfor agenta; some of these are taken from [3].

Waiters The number of truck agents that are waiting on r. CargoSize The number of parcels in the cargo of a. IsInCargo Whether or notris in the cargo ofa.

TimeUntilAvailable Time remaining until r can be served. This is equal to the time remaining until the beginning of the p/d (pickup or delivery) time window, minus the time required forato travel tor’s p/d point. We do not allow this value to become negative.

3 _{Ties are broken on a ﬁrst-come, ﬁrst-served basis where parcels in}

L take priority over parcels inRa.

4 _{The pickup location of}

(7)

Ado The average distance between r’s p/d point and the delivery locations of the parcels thatais carrying.

Mido The minimum of the distance values betweenr’s p/d point and the de-livery locations of the parcels thatais carrying.

Mado The minimum of the distances values between r’s p/d point and the delivery locations of the parcels thatais carrying.

Dist The distance fromator’s p/d point.

Urge The urgency of a waiting order, deﬁned as the diﬀerence between the end of r’s p/d time window and the current time. This value can become negative.

Est The diﬀerence between the start of the p/d time window and the current time. This value can become negative.

Ttl The time to live, which we deﬁne as the time that is left until the end of the scenario. This value can become negative.

Adc The average distance ofr’s p/d location to all agents excludinga. Midc The distance ofr’s p/d location to the closest vehicle excludinga. Madc The distance ofr’s p/d location to the farthest vehicle excludinga. We do not know whether all these heuristics are relevant for solving the problem, and therefore use FS-NEAT instead of regular NEAT.

Note that like in [3,19,40], the calculation of the input heuristics requires knowledge of the location of all agents. Because of this and because of the com-mon collection of open requests, our MAS is not fully decentralized or ‘ﬂat’.

4.3 Simulation-based ﬁtness evaluation

NEAT requires a fitness or quality measure of the heuristics, that it evolves. We cannot analytically deduce this fitness, and therefore assign fitness values based on the performance of the heuristic in a number of simulated scenarios.

Simulations can end in one of two ways. Either the scenario has ended and all vehicles are back at the depot, or the simulation time has exceeded a predefined limit (which is far greater than the length of the scenario). In the latter case, we assign it a fitness of zero. In the first case, we use (1) as a measure of the quality of an individual. Because NEAT only supports maximizing positive fitness values, we subtract (1) from a large positive constant. Preliminary experiments indicate that this way of transforming the cost outperforms inversion and linear ranking. We take three measures to make sure that fitness values are accurate measures of the relative quality of heuristics. Firstly, we compute fitness as the average performance over a number of simulations to prevent overfitted heuristics from having high fitness. Secondly, we fix the random seed of our simulator to ensure each heuristic is evaluated on a problem of the same difficulty. Thirdly and lastly, we forego a common optimization and re-evaluateallheuristics in the population each generation, including elite heuristics (i.e. heuristics that survived from the previous generation). This prevents elites from changing only in generations where evaluation is based on easy scenarios, and prevents lucky individuals from having a large influence on the evolutionary run.

(8)

5 Evaluation

This section evaluates the approach described above. It sketches the evalua-tion setup (Sect.5.1), describes the performed evolutionary runs (Sect.5.2) and presents our results together with the results reported in [9] (Sect.5.3). It con-cludes with an analysis of these results (Sect.5.4).

5.1 Evaluation setup

We perform simulations using RinSim, a high-quality, open source simulator written in Java that speciﬁcally targets the family of transportation and lo-gistics problems [19]. We also use SharpNEATv2 [11], an implementation of NEAT in C#. SharpNEAT is a mature project and has been used in the litera-ture [2,16,20,29], even though it diﬀers from the original FS-NEAT in some ways such as speciation [10]. The Apache Thrift [1] framework organises communica-tion between SharpNEAT and RinSim. All of our data and code is available on Figshare.5_{Because of the need to evaluate millions of simulations that generally} take a few seconds to evaluate, we use the grid computing framework JPPF [24] to distribute simulation tasks over 75 computers with a combined number of over 250 cores.

5.2 Evolutionary runs

We examined the performance of our approach for three scenario classes (Tab.1). The characteristics of these classes are taken from [9], who report results for ﬁve scenarios per class.

Table 1.Characteristics of the three investigated scenario classes Scenario class Duration Average request intensity (on average) Fleet size

4h-24 4 hours 24 requests per hour 10 vehicles

4h-33 4 hours 33 requests per hour 10 vehicles

7.5h-24 7.5 hours 24 requests per hour 20 vehicles

We generated a training set of 1000 scenarios and a test set of 500 scenarios for each class. We then performed four evolutionary runs, each time using a population size of 600. Finally, we simulated each input- and champion heuristics on all scenarios in each of the test sets and on the 15 scenarios of [9].

In run one we always perform ﬁtness evaluation using the same single sce-nario, also used by [9].6 _{This scenario is four hours long and has an average} request intensity of 21 requests per hour (i.e. precisely 84 requests). In runs two

5 _{http://dx.doi.org/10.6084/m9.figshare.956301}

(9)

Table 2. Means and standard deviations of the costs of diﬀerent heuristics over test sets of 500 scenarios.

4h-24test set 4h-33test set 7.5h-24test set Dist 1873±793 (196%) 6794±2416 (297%) 2142±230 (155%) Est 1809±769 (189%) 6768±2424 (296%) 2034±197 (147%) Champ. run 1 1183±376 (124%) 3958±1733 (173%) 1711±209 (124%) Champ.4h-24 956±282 (100%) 2653±974 (116%) 1437±113 (104%) Champ.4h-33 1078±251 (113%) 2287±655 (100%) 1691±168 (122%) Champ.7.5h-24 1292±450 (135%) 4150±1655 (181%) 1385±174 (100%)

to four we perform fitness evaluation using the training scenarios of class4h-24, 4h-33and 7.5h-24.7 _{We refer to these runs as run}_{4h-24, run} _4h-33_{and run} 7.5h-24. We always perform three simulations (using three different scenarios) for each fitness evaluation.

We monitor the evolutionary process while it is running and stop evolution when there has been no improvement for a large number of generations. Run one to four took 2503, 600, 632 and 159 generations. Therefore, for example evolving the champion of class 4h-33 took 600·632·3 = 1137600 simulations. On our grid this took two hours and 55 minutes, giving an average simulation time of approximately 2.3 seconds per core. Scenarios in 7.5h-24 take 4.4 seconds on average: they are longer and require more work per time unit because of the larger ﬂeet size. Scenarios in 4h-24take only 1.3 seconds to simulate.

5.3 Results

We evaluated every primitive and champion heuristic on all test sets (Tab.2). Perhaps surprisingly, Est always constructs better routes than all other primitive input heuristics including Dist.8 _{The evolved heuristics easily outperform the} input heuristics. As expected, the best routes are created by the heuristic that was evolved using training scenarios of the same class (e.g. the champion of run 4h-24is best for creating routes for scenarios in4h-24). Looking at performance on test set 7.5h-24, the routes of Est are 47% more expensive than the routes constructed by the champion of run 7.5h-24. For run 4h-24, this increases to 89% and for run4h-33to 196%.

The overﬁtted champion heuristic constructs a route with a cost of 756, which is 63% more than the cost of 465 reported in [9] and 70 better than the routes created by the non-overﬁtted champion heuristic.

The behavior of NEAT in terms of fitness progression, genome growth and species sizes is very similar when overfitting and not overfitting. This shows that NEAT has no problem with nondeterministic fitness functions. The neural

7 _{In generation}

g, we use scenarios numbered3(g−1) + 1 mod1000to3(g−1) + 3 mod1000.

8 _{This cannot be deduced from Tab.}₂_{, where we left out the results of all primitive}

(10)

Table 3. Costs of routes by champion heuristics of run two to four compared to the costs reported in [9].

Scenario number Run 4h-24 Run 4h-33 Run 7.5h-24 [9] Ours [9] Ours [9] Ours

1 336a _{533 (159%)} ₄₇₃ _{736 (156%) 539 1021} _(189%) 65b _{215 (330%) 4392 4596 (104%)} ₁ ₁₀₈ _(108%) 55c _{78 (142%)} ₆₉₉ _{708 (101%)} ₀ ₃₄ _(n.a.) 465d _{826 (177%) 5564 6039 (109%) 540 1162} _(215%) 2 386 579 (150%) 402 684 (170%) 614 1125 ( 183%) 68 235 (346%) 867 990 (114%) 3 125 (4167%) 53 83 (157%) 297 272 ( 92%) 1 41 (4100%) 506 896 (177%) 1566 1946 (124%) 618 1291 ( 209%) 3 352 601 (170%) 455 710 (156%) 629 1215 ( 193%) 139 240 (172%) 853 1417 (166%) 2 163 (8150%) 98 89 (90% ) 303 352 (116%) 1 53 (5300%) 589 930 (158%) 1611 2479 (154%) 632 1431 ( 226%) 4 359 575 (160%) 434 770 (177%) 700 1324 ( 189%) 38 224 (589%) 1104 1459 (132%) 6 152 (2533%) 31 76 (245%) 348 354 (102%) 5 62 (1240%) 428 875 (204%) 1886 2522 (134%) 711 1538 ( 216%) 5 348 529 (152%) 495 1049 (212%) 694 1255 ( 181%) 75 225 (300%) 4121 4463 (108%) 0 152 (n.a.) 52 78 (150%) 687 680 ( 99%) 0 49 (n.a.) 476 832 (175%) 5303 5913 (112%) 694 1455 ( 210%) Average 357 563 (158%) 452 722 (160%) 635 1188 ( 187%) 78 228 (292%) 2267 2585 (114%) 2 140 (7000%) 58 81 (140%) 467 473 (101%) 1 48 (4800%) 493 872 (177%) 3186 3780 (119%) 638 1376 ( 216%)

a _{Travel time,}b_Lateness,c _Overtime,c_Total

nets of champions contain in between 20 and 100 connections. We observed that performance usually stops increasing after the number of connections reaches approximately 20. After that, genome sizes keep on increasing without gains in performance. Although we only report on four runs, our experience with other runs is that NEAT always ﬁnds heuristics of approximately the same quality.

Table3compares the costs of the routes created using our mechanism with the 15 costs reported in [9]. We use the champion heuristics of run two to four to construct routes. These heuristics have been trained on scenarios of the correct class, but were not trained on the speciﬁc scenarios used by [9]. The diﬀerence in cost is smallest for the scenarios in4h-33, where our routes are only 19% more expensive. For scenarios in 7.5h-24, our routes are 116% more expensive than those of [9].

(11)

5.4 Analysis

We identify two trends in the results of the previous section. After arguing for these trends, we compare our approach to the one of [9]. This section concludes by identifying possible improvements. We want to stress that results are still preliminary, and that more experiments are required to support most of the points below.

Trends The evolved heuristics perform better when ﬂeet sizes are smaller, both relative to the input heuristics and the routes constructed by [9]. This can indi-cate a need for better coordination between vehicles.

The evolved heuristics also perform better when request intensity is higher, again both relative to the input heuristics and relative to [9]. The latter is not surprising, since [9] uses an interruptible anytime algorithm that optimizes routes until new requests arrive. In contrast, our approach does not use the extra avail-able computation time. The first can be an indication that there is more room to optimize when the number of open requests is larger. We hypothesize that this is mostly due to a lack of the ability to evolve cooperative behavior. For example, when many vehicles are idle and the list of open requests contains only a single request, the first vehicle to get the opportunity to claim this request will always do so; this is part of the hardwired agent behavior (Sect.4.1) on which evolu-tion has no influence. Such situaevolu-tions occur less frequently with higher request intensity per vehicle.

Comparison The comparison with [9] shows that our system cannot yet compete with the state-on-the-art on the given problem set. Improvements are necessary to make our system competitive; we already identiﬁed some possible issues above. However, there are also several points to be made in favor of our approach.

First we note that the performance of the champion of run 4h-24 is only 9% worse than the performance of the overfitted heuristic. This indicates that heuristics generalize well within their scenario class. Although our mechanism is very simple, we seem to have successfully evolved something akin to instinct for our agents: it allows them to choose one request out of hundreds in mere milliseconds in a reasonable way. We see it as a huge accomplishment that our system allows to shift much of the computational effort off-line, requiring very little online computation time. Also, that we currently require so little online computation time can indicate that there is still much room for improvement.

Another important point is that we compared the performance of our al-gorithm to that of [9] in a setting for which their alal-gorithm is optimized. As an illustration, the artiﬁcial constraint that vehicles may not divert was part of the problem description and is crucial for [9]. Supporting diversion requires only minimal change of our algorithm and almost certainly improves performance. Also, although requests arrive during the day, most of them only have to be picked up and delivered by the end of the day (Fig.1). This gives [9] plenty of time to optimize their routes without incurring a tardiness penalty. We hy-pothesize that our approach would perform relatively better in more ‘dynamic’ settings that require a fast response time.

(12)

... 0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 0. 0.2 . 0.4 . 0.6 . 0.8 . 1 . Time [h] . ECDF . . .._{Announce time}

. .._{End of pickup time window} . .._{End of delivery time window}

Figure 1.Empirical cumulative distribution function (ECDF) of the announce time and the end of pickup and delivery time windows for a typical instance of 7.5h-24.10

Improvements The most pressing improvement is to add some form of coordina-tion to the mechanism. We expect this to lead to much better results, especially because all well-performing MASs in the PDP-domain that we know of coordi-nate [8,21]. There are several ways to incorporate coordination in our evolved heuristic approach. For example we can try to combine it with an auction mech-anism, or we could add information about other agent’s intentions into the set of inputs.

6 Conclusion

We investigated the feasibility of using NEAT to create a MAS for the dynamic PDPTW and compared our approach to a specialized tabu search heuristic by Gendreau [9]. Our NEAT and MAS based approach produces routes that are between 19% and 116% worse compared to the results reported by Gendreau. However, our approach uses only very little computation time and presumably performs relatively better in highly dynamic situations such as those encountered by emergency services. Results indicate that NEAT is working well.

Although our approach is very similar to that of [3,19,40], we are the ﬁrst to compare our results with those of a state-of-the art algorithm. Unfortunately we therefore are also the ﬁrst with a worse result, which is unexpected based on earlier work [3,19,40]. Based on these preliminary results we conclude that a thorough, fair and independent evaluation is necessary to gain insight in the performance of algorithms and MASs. In future work we plan to increase the number of evolutionary runs and to compare the performance of NEAT to that of GP.

Acknowledgements

This research is partially funded by the Research Fund KU Leuven.

(13)

References

1. Apache Software Foundation: Apache Thrift. URL http://thrift.apache.org [Accessed: 2013-04-30]

2. Baker, R.J.S., Cowling, P.I., Randall, T.W.G., Jiang, P.: Can Opponent Models Aid Poker Player Evolution? In: Hingston, P., Barone, L. (eds.) 2008 IEEE Symposium on Computational Intelligence and Games (Perth, Australia, 15-18 Dec. 2008), pp. 23–30. IEEE, s.l. (2008)

3. Beham, A., Koﬂer, M., Wagner, S., Aﬀenzeller, M.: Agent-Based Simulation of Dispatching Rules in Dynamic Pickup and Delivery Problems. In: 2009 2nd Inter-national Symposium on Logistics and Industrial Informatics (Linz, Austria, 10-11 September 2009), pp. 1–6. IEEE, s.l. (2009)

4. Berbeglia, G., Cordeau, J.F., Laporte, G.: Dynamic Pickup and Delivery Problems. European Journal of Operational Research 202(1), 8–15 (Apr 2010)

5. D’Ambrosio, D., Stanley, K.O.: A Novel Generative Encoding for Exploiting Neural Network Sensor and Output Geometry. In: Lipson, H. (ed.) Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation - GECCO ’07, London, England UK - July 7-11, 2007, pp. 974–981. ACM, New York, NY (2007) 6. Eiben, A.E., Smith, J.: Introduction to Evolutionary Computing. Natural

Com-puting Series, Springer, Berlin/Heidelberg/New York, NY (2003)

7. Fischer, K., Müller, J.P., Pischel, M.: Cooperative Transportation Scheduling: An Application Domain for DAI. Applied Artiﬁcial Intelligence 10(1), 1–33 (Feb 1996) 8. Fischer, K., Müller, J.P., Pischel, M., Schier, D.: A Model for Cooperative Trans-portation Scheduling. In: Proceedings of the 1st International Conference on Multi-Agent Systems (ICMAS-95), pp. 109–116. AAAI Press/MIT Press, San Francisco, CA (1995)

9. Gendreau, M., Guertin, F., Potvin, J.Y., Séguin, R.: Neighborhood Search Heuris-tics for a Dynamic Vehicle Dispatching Problem with Pick-ups and Deliveries. Transportation Research Part C: Emerging Technologies 14(3), 157–174 (Jun 2006) 10. Green, C.: Speciation by K-means Clustering. URLhttps://sites.google.com/ site/sharpneat/speciation/speciation-by-k-means-clustering [Accessed: 2013-05-01]

11. Green, C.: SharpNEAT. URL http://sharpneat.sourceforge.net/ [Accessed: 2013-05-29]

12. Hastings, E.J., Guha, R.K., Stanley, K.O.: Evolving Content in the Galactic Arms Race Video Game. In: Lanzi, P.L. (ed.) 2009 IEEE Symposium on Computational Intelligence and Games (Sept. 7-10, 2009) – Politecnico di Milano, Milano, Italy, pp. 241–248. IEEE, s.l. (2009)

13. Ichoua, S., Gendreau, M., Potvin, J.Y.: Diversion Issues in Real-Time Vehicle Dispatching. Transportation Science 34(4), 426–438 (Nov 2000)

14. Jang, S.H., Yoon, J.W., Cho, S.B.: Optimal Strategy Selection of Non-Player Char-acter on Real Time Strategy Game Using a Speciated Evolutionary Algorithm. In: Lanzi, P.L. (ed.) 2009 IEEE Symposium on Computational Intelligence and Games (Sept. 7-10, 2009) – Politecnico di Milano, Milano, Italy, pp. 75–79. IEEE, s.l. (2009)

15. Kassahun, Y., Sommer, G.: Eﬃcient Reinforcement Learning through Evolutionary Acquisition of Neural Topologies. In: Verleysen, M. (ed.) 13th European Sympo-sium on Artiﬁcial Neural Networks (ESANN) - Bruges (Belgium), 27-29 April 2005, vol. 2005, pp. 259–266. d-Side Publications, Evere

(14)

16. Lockett, A.J., Chen, C.L., Miikkulainen, R.: Evolving Explicit Opponent Models in Game Playing. In: Lipson, H. (ed.) Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation GECCO ’07, London, England UK -July 7-11, 2007, pp. 2106–2113. ACM, New York, NY (2007)

17. van Lon, R.R.S., Holvoet, T.: RinSim: A Simulator for Collective Adaptive Systems in Transportation and Logistics. In: 2012 IEEE Sixth International Conference on Self-Adaptive and Self-Organizing Systems. Lyon, France, 10-14 September, 2012, pp. 231–232. IEEE, Los Alamitos, CA / Piscataway, NJ / Tokyo (2012)

18. van Lon, R.R.S., Holvoet, T.: Evolved Multi-Agent Systems and Thorough Evalu-ation are Necessary for Scalable Logistics. In: 2013 IEEE Workshop on Computa-tional Intelligence in Production and Logistics Systems (CIPLS). Signapore, 16–19 April, 2013, pp. 48–53 (2013)

19. van Lon, R.R.S., Holvoet, T., Vanden Berghe, G., Wenseleers, T., Branke, J.: Evolutionary Synthesis of Multi-Agent Systems for Dynamic Dial-a-Ride Problems. In: Soule, T. (ed.) Proceedings of the 14th International Conference on Genetic and Evolutionary Computation. Conference Companion - GECCO ’12, Philadelphia, PA, USA - July 7-11, 2012, pp. 331–336. ACM, New York, NY (2012)

20. Lowell, J., Grabkovsky, S., Birger, K.: Comparison of NEAT and HyperNEAT Per-formance on a Strategic Decision-Making Problem. In: Watada, J., Grabkovsky, S., Birger, K. (eds.) Fifth International Conference on Genetic and Evolutionary Com-puting, ICGEC 2011, Kinmen, Taiwan / Xiamen, China, 29 August - 1 September, 2011, pp. 102–105. IEEE, Piscataway, NJ (2011)

21. Máhr, T., Srour, J., de Weerdt, M., Zuidwijk, R.: Can Agents Measure Up? A Comparative Study of an Agent-Based and On-Line Optimization Approach for a Drayage Problem with Uncertainty. Transportation Research Part C: Emerging Technologies 18(1), 99–119 (Feb 2010)

22. Mitrović-Minić, S., Laporte, G.: Waiting Strategies for the Dynamic Pickup and Delivery Problem with Time Windows. Transportation Research Part B: Method-ological 38(7), 635–655 (Aug 2004)

23. Moriguchi, H., Honiden, S.: CMA-TWEANN: Eﬃcient Optimization of Neural Networks via Self-Adaptation and Seamless Augmentation. In: Soule, T. (ed.) Pro-ceedings of the 14th International Conference on Genetic and Evolutionary Com-putation - GECCO ’12, Philadelphia, PA, USA - July 7-11, 2012, pp. 903–910. ACM, New York, NY (Jul 2012)

24. Parallel Matters: Java Parallel Processing Framework. URL http://www.jppf. org/[Accessed: 2013-05-29]

25. Parragh, S.N., Doerner, K.F., Hartl, R.F.: A Survey on Pickup and Delivery Prob-lems, Part I: Transportation between Customers and Depot. Journal für Betrieb-swirtschaft 58(1), 21–51 (Mar 2008)

26. Parragh, S.N., Doerner, K.F., Hartl, R.F.: A Survey on Pickup and Delivery Prob-lems, Part II: Transportation between Pickup and Delivery Locations. Journal für Betriebswirtschaft 58(2), 81–117 (May 2008)

27. Pillac, V., Gendreau, M., Guéret, C., Medaglia, A.L.: A Review of Dynamic Vehicle Routing Problems. European Journal of Operational Research 225(1), 1–11 (Feb 2013)

28. Pureza, V., Laporte, G.: Waiting and Buﬀering Strategies for the Dynamic Pickup and Delivery Problem with Time Windows. Information Systems and Operational Research 46(3), 165–176 (Aug 2008)

29. Randall, T., Cowling, P., Baker, R., Jiang, P.: Using Neural Networks for Strategy Selection in Real-Time Strategy Games. In: AISB 2009 Symposium: AI and Games,

(15)

Edinburgh, Scotland, 8th-9th April, 2009. The Society for the Study of Artiﬁcial Intelligence, s.l. (2009)

30. Risi, S., Lehman, J., Stanley, K.O.: Evolving the Placement and Density of Neurons in the Hyperneat Substrate. In: Pelikan, M. (ed.) Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation - GECCO ’10, Portland, OR, USA - July 7-11, 2010, pp. 563–570. ACM, New York, NY (Jul 2010) 31. Savelsbergh, M.W.P., Sol, M.: The General Pickup and Delivery Problem.

Trans-portation Science 29(1), 17–29 (Feb 1995)

32. Siebel, N.T., Krause, J., Sommer, G.: Eﬃcient Learning of Neural Networks with Evolutionary Algorithms. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) Pat-tern Recognition: 29th DAGM Symposium, Heidelberg, Germany, September 12-14, 2007. Proceedings, Lecture Notes in Computer Science, vol. 4713, pp. 466–475. Springer, Heidelberg (2007)

33. Siebel, N., Sommer, G.: Evolutionary Reinforcement Learning of Artiﬁcial Neural Networks. International Journal of Hybrid Intelligent Systems 4(3), 171–183 (Oct 2007)

34. Stanley, K.O.: Eﬃcient Evolution of Neural Networks through Complexiﬁcation. Ph.D. thesis, The University of Texas at Austin (2004)

35. Stanley, K.O., Bryant, B.D., Miikkulainen, R.: Real-Time Neuroevolution in the NERO Video Game. IEEE Transactions on Evolutionary Computation 9(6), 653– 668 (Dec 2005)

36. Stanley, K.O., D’Ambrosio, D.B., Gauci, J.: A Hypercube-based Encoding for Evolving Large-Scale Neural Networks. Artiﬁcial Life 15(2), 185–212 (Jan 2009) 37. Stanley, K.O., Kohl, N., Sherony, R., Miikkulainen, R.: Neuroevolution of an

Au-tomobile Crash Warning System. In: Beyer, H.G. (ed.) Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, Washington, DC, USA -June 25-29, 2005. pp. 1977–1984. ACM, New York, NY (2005)

38. Stanley, K.O., Miikkulainen, R.: Evolving Neural Networks through Augmenting Topologies. Evolutionary Computation 10(2), 99–127 (Jan 2002)

39. Stanley, K.O., Miikkulainen, R.: Evolving a Roving Eye for Go. In: Deb, K., Poli, R., Banzhaf, W., Beyer, H.G. (eds.) Genetic and Evolutionary Computation -GECCO 2004: Genetic and Evolutionary Computation Conference, Seattle, WA, USA, June 2004: Proceedings. Part II, Lecture Notes in Computer Science, vol. 3103, pp. 1226–1238. Springer, Berlin/Heidelberg (2004)

40. Vonolfen, S., Beham, A., Kommenda, M., Aﬀenzeller, M.: Structural Synthesis of Dispatching Rules for Dynamic Dial-a-Ride Problems. In: Moreno-Díaz, R., Pich-ler, F., Quesada-Arencibia, A. (eds.) Computer Aided Systems Theory - EURO-CAST 2013. 14th International Conference, Las Palmas de Gran Canaria, Spain, February 10–15, 2013. Revised selected papers, Part 1, Lecture Notes in Computer Sciences, vol. 8111, pp. 276–283. Springer, Berlin/Heidelberg (2013)

41. Whiteson, S., Stone, P., Stanley, K.O., Miikkulainen, R., Kohl, N.: Automatic Feature Selection in Neuroevolution. In: Beyer, H.G. (ed.) Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, Washington, DC, USA -June 25-29, 2005, pp. 1225–1232. ACM, New York, NY (2005)

42. Wilson, R.: 23rd Annual State of Logistics Report. Tech. rep., CSCMP (Jun 2012) 43. Wittkamp, M., Barone, L., Hingston, P.: Using NEAT for Continuous Adaptation and Teamwork Formation in Pacman. In: Hingston, P., Barone, L. (eds.) 2008 IEEE Symposium on Computational Intelligence and Games (Perth, Australia, 15-18 Dec. 2008), pp. 234–242. IEEE, s.l. (2008)