Ant Colony Optimisation (ACO) - Ant Colony Optimisation for Dynamic and Dynamic Multi-objective

ACO is a metaheuristic algorithm often applied to combinatorial optimisation problems. A metaheuristic is a high-level algorithmic framework that can be used to define heuristic methods applicable to a wide range of different problems. Combinatorial optimisation problems involve finding values for discrete variables that give the optimal solution with respect to a given objective function. A discrete variable is a variable that can only take values from a finite set. The railway rescheduling problems in this thesis are combinatorial optimisation problems.

A combinatorial optimisation problem can be defined as a triple (S, f, Ω), where S is a set of candidate solutions, f is the objective function and Ω is a set of constraints. A solution s in the feasible solution set S̃ (S̃ ⊆ S) is a solution that satisfies all the constraints in Ω. The objective function f assigns an objective value, f(s), to each candidate solution s ∈ S. The aim is to find a globally optimal feasible solution s∗ ∈ S̃, that is, one that gives the best objective value. The work in this thesis concentrates on minimisation problems, where the best solution is the one with the lowest cost in terms of the objective function, such that f(s∗) ≤ f(s) for all s ∈ S̃.
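As a minimal sketch of the triple (S, f, Ω), the following Python fragment enumerates a tiny candidate set, filters it by a constraint and picks the feasible solution of lowest cost. The jobs, weights and constraint are hypothetical and chosen only for illustration; they are not taken from the railway problems in this thesis.

```python
from itertools import permutations

# Hypothetical toy instance: choose an order for three jobs.
jobs = ("A", "B", "C")
S = list(permutations(jobs))  # S: the set of candidate solutions

WEIGHT = {"A": 3, "B": 1, "C": 2}  # illustrative cost weights


def f(s):
    """Objective function: weighted sum of positions (lower is better)."""
    return sum(WEIGHT[job] * (pos + 1) for pos, job in enumerate(s))


def satisfies_constraints(s):
    """Omega: a single illustrative constraint, job B must not come first."""
    return s[0] != "B"


# The feasible set (S-tilde) is the subset of S that satisfies Omega.
feasible = [s for s in S if satisfies_constraints(s)]

# s_star minimises f over the feasible set, so f(s_star) <= f(s) for all feasible s.
s_star = min(feasible, key=f)
```

Exhaustive enumeration is only workable for very small instances, which is why metaheuristics such as ACO are used for problems of realistic size.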

2.2.1 The Basic ACO Algorithm

As described in [1], ACO is an optimisation algorithm inspired by the ability of ants to follow pheromone trails laid down by other ants to discover food [22]. As ants move backwards and forwards between the nest and a food source, they lay down pheromones on the ground which can be sensed by other ants. Ants choosing the shortest path to the food source return more quickly, which ensures that the shortest path accumulates more pheromone. Ants tend to probabilistically choose paths with the strongest pheromone concentration, which means that a path with high pheromone levels will attract more ants and accumulate even more pheromone. In this way, the shortest path to a food source is marked by the strongest pheromone trail. However, if this trail were to persist after the food source was depleted, it would seriously hamper the ants’ ability to find food. Therefore, pheromone trails evaporate over time to allow old decisions to be forgotten.

Figure 2.1 illustrates this principle. In Figure 2.1(a), ants search for food. The ant choosing the shortest path to the food will return quicker and lay down more pheromone, thus reinforcing the shortest path. By the time the rest of the ants have returned to the nest, the ant using the shortest path has fetched food again and laid down more pheromone (Figure 2.1(b)). When the ants leave in search of food again, they are more likely to follow the path with the highest pheromone level, which reinforces the path even more (Figure 2.1(c)). Over time, old, unreinforced pheromone trails evaporate so that old solutions can be forgotten (Figure 2.1(d)).

Figure 2.1: The principle behind ACO
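The reinforcement loop described for Figure 2.1 can be illustrated with a deliberately simplified model of two paths. The sketch below is an assumption made for illustration, not the algorithm used in this thesis: it tracks expected pheromone levels rather than individual ants, and the function name, the path lengths and the evaporation rate `rho` are hypothetical.

```python
def shorter_path_share(len_short=1.0, len_long=2.0, rho=0.5, iterations=50):
    """Return the expected fraction of ants choosing the shorter of two paths."""
    tau_short = tau_long = 1.0  # both paths start with equal pheromone
    for _ in range(iterations):
        # Ants choose a path with probability proportional to its pheromone.
        p_short = tau_short / (tau_short + tau_long)
        p_long = 1.0 - p_short
        # Evaporate old pheromone, then deposit: a shorter path is completed
        # more often per unit time, so it receives more pheromone per ant.
        tau_short = (1.0 - rho) * tau_short + p_short / len_short
        tau_long = (1.0 - rho) * tau_long + p_long / len_long
    return tau_short / (tau_short + tau_long)
```

Under these assumptions the share of ants on the shorter path starts at one half and grows with each iteration, while evaporation steadily removes the pheromone on the longer path.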

To apply ACO to an optimisation problem, the problem first has to be decomposed into a fully connected weighted graph G = (V, E), where V is a set of vertices or nodes, and E is a set of edges or connections between the nodes. The ants move along the edges of the graph from node to node, recording the nodes visited. This list of visited nodes, sometimes called the ant’s tour, is one possible solution to the optimisation problem. Pheromones are deposited on the edges of the graph by the ants according to how good an ant’s solution is in terms of the optimisation objective. On the next iteration, the updated pheromone levels help to guide the ants to choose better nodes. Pheromones can be decreased as well as increased to model evaporation. In addition to the pheromone, the edges may also be associated with a heuristic value, which is based on problem-specific knowledge and provides additional guidance to the ants.

An ant, say ant k, when at node i, chooses the next node j in its neighbourhood $N_i^k$ probabilistically as follows:

$$p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in N_i^k} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}}, \quad \text{if } j \in N_i^k, \tag{2.1}$$

where τij is the pheromone information and ηij is the heuristic information, and α and β are constants which determine the relative influence of the pheromone and the heuristic values respectively. An ant chooses the next node in this way with a probability of 1 − q0; otherwise, it chooses the best next node in terms of the pheromone and heuristic values.
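The probabilistic choice in Eq. (2.1) can be sketched in Python as follows. The function names, the nested-dictionary layout of `tau` and `eta`, and the default values of `alpha` and `beta` are assumptions made for this illustration.

```python
import random


def transition_probabilities(i, neighbours, tau, eta, alpha=1.0, beta=2.0):
    """Return {j: p_ij} for every node j in the neighbourhood of node i (Eq. 2.1)."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in neighbours}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def sample_next_node(i, neighbours, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Draw the next node at random according to the probabilities of Eq. (2.1)."""
    probs = transition_probabilities(i, neighbours, tau, eta, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]
```

For example, with equal pheromone on two edges and heuristic values of 1 and 2, setting β = 2 gives the second edge four times the weight of the first, so the probabilities are 0.2 and 0.8.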

2.2.2 Population-Based ACO (P-ACO)

As described in [1], the basic algorithm above does not provide any mechanism for allowing the ants to adapt to a change in the environment. Once the ants have converged on a solution, the resulting loss in diversity makes it difficult for them to adapt to a change in the problem. In addition, the pheromone trails laid down for the previous environment may not provide any useful guidance to the ants in the new environment [26]. One option is to restart the algorithm after a change, but this is not only computationally wasteful but also results in the loss of information that has the potential to be useful in the new environment.

To address this problem, Guntsch and Middendorf [36] introduced a Population-based ACO (P-ACO) algorithm. In this algorithm, the best ant found at each iteration is stored in a memory, called the population-list, and only the ants in this list are used to update the pheromone levels. When the population-list reaches its designated limit, an ant is removed and the pheromone trail for that ant is negatively updated. This provides a mechanism for allowing previous bad decisions to be forgotten. To prevent the pheromone levels from building up to the point where all ants follow the same path, the amount of pheromone on each edge is bounded between a minimum value and a maximum value.

This memory of best iteration ants means that solutions made before a change can be retained to provide valuable information for the new environment. However, to make the ants suitable for the new environment, they may have to undergo a repair operation. Once repaired, the pheromone information for the new environment can be computed from the tours of the fittest ants created before the change, thus ensuring that information from the previous environment can be passed over into the new environment. Guntsch and Middendorf [36] found P-ACO to perform better than restarting the algorithm when the environment change was small and frequent and comparable with restart when the change was large and slow.
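The population-list mechanism can be sketched as below. Several details are assumptions made for illustration rather than taken from [36]: the class name, the removal of the oldest ant first, the constant deposit per stored tour (chosen so that a full list cannot push an edge above `tau_max`), and the treatment of tours as closed and directed. The repair operation applied after a change is not shown.

```python
from collections import deque


class PopulationList:
    """Fixed-size memory of iteration-best tours that drives the pheromone matrix."""

    def __init__(self, n_nodes, capacity, tau_min=0.1, tau_max=1.0):
        self.capacity = capacity
        # Constant amount added per stored tour (assumption for this sketch).
        self.delta = (tau_max - tau_min) / capacity
        self.tau = [[tau_min] * n_nodes for _ in range(n_nodes)]
        self.tours = deque()

    def _update(self, tour, amount):
        # Edges of a closed tour: consecutive nodes plus the return to the start.
        for i, j in zip(tour, tour[1:] + tour[:1]):
            self.tau[i][j] += amount

    def add(self, tour):
        """Store an iteration-best tour (a list of nodes), evicting the oldest if full."""
        if len(self.tours) == self.capacity:
            oldest = self.tours.popleft()
            self._update(oldest, -self.delta)  # negative update for the removed ant
        self.tours.append(tour)
        self._update(tour, self.delta)  # positive update for the new ant
```

Because every deposit is later undone by an equal negative update, the pheromone on each edge stays between `tau_min` and `tau_max` without any explicit clamping in this sketch.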

2.2.3 Max-Min Ant System (MMAS)

One of the most popular ACO algorithms is the Max-Min Ant System (MMAS) [37]. As described in [38], in this algorithm, all of the pheromone trails are initialised to a maximum value. After each iteration, all pheromone trails are evaporated as in Eq. (2.2).

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij}, \quad \forall (i, j) \in L, \tag{2.2}$$

where L = E is the set of all edges carrying pheromone trails and 0 < ρ ≤ 1 is the pheromone evaporation rate [22], which is a constant parameter of the algorithm.

After each iteration, the pheromone trails are updated to correspond to the tour $T^{best}$ of either the best-so-far ant or the best iteration ant as in Eq. (2.3).

$$\tau_{ij} \leftarrow \tau_{ij} + \Delta\tau_{ij}^{best}, \quad \forall (i, j) \in T^{best}, \tag{2.3}$$

The update value $\Delta\tau_{ij}^{best}$ is 1/S, where S is the fitness of the best ant. In a minimisation problem, as the fitness of the ant improves (that is, as S decreases), the value of the pheromone update increases correspondingly.

In MMAS, an ant chooses the next node as in Eq. (2.1). The pheromone trails in MMAS are bounded between a minimum value τmin and a maximum value τmax. The reason for this is to counteract the increased possibility of stagnation that may occur as a result of allowing only the best ant to deposit pheromone. In addition, stagnation is addressed by reinitialising all trails to τmax when the algorithm shows stagnation behaviour or there has been no change in the best fitness for a set number of iterations. MMAS is unusual in that all pheromone trails are initialised to the maximum value; together with a small evaporation rate, this increases the exploration of the search space at the start of the search [37].

The pheromone bounds are given as follows: τmax = 1/S and τmin = τmax/a, where S is the fitness of the best ant and a is a parameter of the algorithm. Each time a new best ant is found, the values for τmin and τmax are updated. In a minimisation problem, this means that as the fitness of the best ant, i.e. S, improves, the values for both τmin and τmax increase.
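One MMAS pheromone update, combining the evaporation of Eq. (2.2), the deposit of Eq. (2.3) and the bounds τmax = 1/S and τmin = τmax/a, can be sketched as follows. The function name, the dictionary keyed by edge and the default parameter values are assumptions made for this illustration. For simplicity the same cost S is used for the deposit and for the bounds, which corresponds to updating with the best-so-far ant.

```python
def mmas_update(tau, best_tour_edges, best_cost, rho=0.1, a=10.0):
    """Apply one MMAS update in place and return the bounds (tau_min, tau_max)."""
    tau_max = 1.0 / best_cost  # upper bound, 1/S
    tau_min = tau_max / a      # lower bound, tau_max / a
    deposit = 1.0 / best_cost  # update value, 1/S

    # Eq. (2.2): evaporate every pheromone trail.
    for edge in tau:
        tau[edge] *= (1.0 - rho)

    # Eq. (2.3): only the edges of the best tour receive pheromone.
    for edge in best_tour_edges:
        tau[edge] += deposit

    # Keep every trail within [tau_min, tau_max].
    for edge in tau:
        tau[edge] = min(tau_max, max(tau_min, tau[edge]))

    return tau_min, tau_max
```

With this choice of bounds, an edge on the best tour that is already at τmax is pushed above the bound by the deposit and then clamped back, so repeated reinforcement cannot drive its selection probability to one.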

2.2.4 Ant Colony System (ACS)

ACS was developed by Dorigo and Gambardella [39, 40] as a solution to the Travelling Salesman Problem (TSP). In ACS, an ant decides which node to visit next using the pseudorandom proportional rule given in Eq. (2.4). Let q be a random variable uniformly distributed in [0, 1]. If q ≤ q0, an ant k on node i chooses the next node j to be the node with the highest combined pheromone and heuristic value; otherwise, it chooses according to the probability distribution given in Eq. (2.1):

$$j = \begin{cases} \arg\max_{l \in N_i^k} \{\tau_{il}\,[\eta_{il}]^{\beta}\}, & \text{if } q \le q_0, \\ J, & \text{otherwise,} \end{cases} \tag{2.4}$$

where J is a node selected according to the probability distribution in Eq. (2.1).

The effect of this rule is that with probability q0 (0 ≤ q0 ≤ 1) the ant exploits the learned knowledge of the colony; otherwise, with probability (1 − q0), it performs a biased exploration of the search space, in which it is biased towards choosing the node with the best pheromone and heuristic values but may not necessarily do so.
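The pseudorandom proportional rule of Eq. (2.4) can be sketched as follows. The function name, the nested-dictionary layout of `tau` and `eta`, and the default values of `beta` and `q0` are assumptions made for this illustration. In the exploration branch the sketch samples according to Eq. (2.1) with α = 1, an assumption chosen to match the unexponentiated pheromone term in Eq. (2.4).

```python
import random


def acs_choose_next(i, neighbours, tau, eta, beta=2.0, q0=0.9, rng=random):
    """Choose the next node from node i using the rule of Eq. (2.4)."""
    # Combined pheromone and heuristic value of each candidate node.
    scores = {l: tau[i][l] * (eta[i][l] ** beta) for l in neighbours}

    q = rng.random()
    if q <= q0:
        # Exploitation: take the node with the highest combined value.
        return max(scores, key=scores.get)

    # Biased exploration: sample in proportion to the combined values.
    nodes = list(scores)
    return rng.choices(nodes, weights=[scores[l] for l in nodes], k=1)[0]
```

Setting `q0=1.0` makes the choice fully greedy, whereas a value close to 0 makes nearly every choice a random draw from the distribution, so q0 controls the balance between exploitation and exploration.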

In ACS, two types of pheromone trail updates take place: global pheromone update and local pheromone update.

Global Pheromone Trail Update

The global pheromone trail update takes place after all the ants have constructed their solutions. Its purpose is to reward the edges belonging to the best tour. In this phase, the best-so-far ant updates the pheromone trails on all of the edges in its tour as in Eq. (2.5), where ρ (0 ≤ ρ ≤ 1) is a pheromone decay parameter. The update incorporates both evaporation, (1 − ρ)τij, and pheromone deposit, $\rho\,\Delta\tau_{ij}^{bs}$, where $\Delta\tau_{ij}^{bs}$ is 1/Sbs and Sbs is the fitness of the best-so-far solution $T^{bs}$. The better the fitness of the solution, the more pheromone is deposited. The outcome of the formula is that the new pheromone value is a weighted average of the old pheromone value and the amount of pheromone deposited [22].

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \rho\,\Delta\tau_{ij}^{bs}, \quad \forall (i, j) \in T^{bs}. \tag{2.5}$$

Local Pheromone Trail Update

The purpose of the local pheromone trail update is to encourage exploration. It is performed by every ant immediately after adding a node to its tour. The pheromone update is performed using Eq. (2.6), where ξ (0 < ξ < 1) and τ0 are two parameters of the algorithm. The value of τ0 is the initial value for the pheromone trail, which is based on a tour made by always choosing the nearest node in terms of the heuristic. The effect of the local update is to reduce the pheromone concentration along the edge that the ant has just travelled. This makes the edge less desirable to the following ant, which encourages it to explore other edges and discourages stagnation, where all ants follow the same trail. For this to work, all ants must move in parallel, one step at a time.

$$\tau_{ij} \leftarrow (1 - \xi)\,\tau_{ij} + \xi\,\tau_{0}. \tag{2.6}$$

In ACS, pheromone trail limits are not set explicitly but are implicit within the update formulas in Eqs. (2.5) and (2.6). The trails can never drop below τ0 because each update is a weighted average of the current value and a quantity that is no smaller than τ0. In addition, they can never rise above 1/Sbs, the value towards which Eq. (2.5) pulls the edges of the best-so-far tour [22].
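The two ACS updates and the implicit limits they produce can be sketched as follows. The function names, the dictionary keyed by edge and the default values of `rho` and `xi` are assumptions made for this illustration, and the sketch assumes that τ0 is no larger than 1/Sbs.

```python
def acs_global_update(tau, best_tour_edges, best_cost, rho=0.1):
    """Eq. (2.5): applied to the edges of the best-so-far tour only."""
    delta = 1.0 / best_cost  # deposit for the best-so-far solution, 1/S_bs
    for edge in best_tour_edges:
        tau[edge] = (1.0 - rho) * tau[edge] + rho * delta


def acs_local_update(tau, edge, tau0, xi=0.1):
    """Eq. (2.6): applied by an ant to the edge it has just travelled."""
    tau[edge] = (1.0 - xi) * tau[edge] + xi * tau0


# Usage: a single edge starts at tau0 and is repeatedly reinforced, then
# repeatedly crossed by ants. It stays within [tau0, 1/S_bs] throughout.
tau0 = 0.01
trail = {(0, 1): tau0}
for _ in range(100):
    acs_global_update(trail, [(0, 1)], best_cost=10.0)
after_global = trail[(0, 1)]  # approaches 1/S_bs = 0.1 from below
for _ in range(100):
    acs_local_update(trail, (0, 1), tau0)
after_local = trail[(0, 1)]   # decays back towards tau0 from above
```

Both formulas are weighted averages, so each update moves the trail part of the way towards its target value (1/Sbs or τ0) without overshooting it, which is what makes explicit limits unnecessary.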

2.2.5 ACO for DOPs

In a dynamic optimisation problem (DOP), the optimal solution may have changed after a dynamic change. There are two approaches to dealing with this. The first is to restart the algorithm, but this leads to the loss of potentially useful information and may increase the time taken to find a new optimal solution. The second approach is to carry information from the old environment to the new environment. This information may be very useful in guiding the ants to find an optimal solution in the new environment and may speed up the search process, saving time and computational effort.

ACO is well suited to DOPs because it has the ability to retain useful information from one change period to the next. This information is encoded in the pheromone trails and can be thought of as the ants’ ‘memory’ of previous good solutions. If the information is no longer useful, it will evaporate over time and will be removed.

In addition, one variation of the ACO algorithm, P-ACO, holds an explicit memory of good ant solutions that can be used to initialise the pheromone trails after a change to provide useful information in the new environment.
