Genetic Algorithms - Network Protocols Performance


11.3 Genetic Algorithms

The field of Genetic Algorithms (GAs) has been growing since the early 1970s, but only recently have GAs been applied to real-world problems in ways that demonstrate their commercial effectiveness.

Holland (1975) invented GAs to mimic some of the processes observed in natural evolution, such as natural selection and reproduction. GAs have since been investigated as a way to solve many search and optimization problems the way nature does, i.e., through evolution. Evolution operates on chromosomes, i.e., the organic elements that encode the structure of living beings. Natural selection causes the chromosomes that encode successful individuals to reproduce more frequently than those of less fit individuals. In biological reproduction, the chromosomes of the parents are combined to generate new individuals through crossover and mutation mechanisms.

In the GA field, biological concepts are mapped as follows. Each solution (individual) is represented as a string (chromosome) of elements (genes); each individual is assigned a fitness value, based on the result of an evaluation function. The evaluation function measures the chromosome’s performance on the problem to be solved. The way of encoding solutions to the problem on chromosomes and the definition of an evaluation function are the mechanisms that link a GA to the problem to be solved. As the technique for encoding solutions may vary from problem to problem, a certain amount of art is involved in selecting a good encoding technique.

A set of individuals constitutes a population that evolves from one generation to the next through the creation of new individuals and the deletion of some old ones. The process starts with an initial population created in some way, e.g. randomly. A GA is basically an algorithm which manipulates strings of digits: like nature, GAs solve the problem of finding good chromosomes by blindly manipulating the material in the chromosomes themselves.

[Figure: genetic algorithm, network simulator, TCL script, relevant data, background traffic pattern, network topology.]

11.3.1 The Genetic Algorithm in Nepal

The pseudocode of the GA adopted within Nepal is shown in Figure 11.2. When the GA starts, N_p individuals are randomly generated. In our approach, an individual represents a background traffic pattern. The background traffic corresponding to the initial population is generated according to a Poisson process whose inter-arrival time between packets is exponentially distributed.

At each generation, the GA creates N_o new individuals, each by applying a crossover operator to two parents. Parents are chosen randomly, with a selection probability that increases linearly from the worst individual (smallest fitness) to the best one (highest fitness). A mutation operator is applied to each new individual with probability p_m. At the end of each generation, the N_p individuals with the highest fitness are selected for survival (elitism), and the worst N_o are deleted from the population.
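The generational step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Nepal's Perl implementation: the selection weight of each individual is assumed to be exactly its rank (1 for the worst, N_p for the best), the operators and the fitness function are passed in as parameters, and the toy bit-string problem at the end (`rand_bits`, `one_cut_bits`, `flip_one`) is hypothetical and unrelated to network traffic.

```python
import random


def rank_weights(scores):
    """Selection weights growing linearly with rank: worst -> 1, best -> len(scores)."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])  # worst first
    weights = [0] * len(scores)
    for rank, k in enumerate(order, start=1):
        weights[k] = rank
    return weights


def evolve(random_individual, fitness, crossover, mutate,
           n_p=50, n_o=20, p_m=0.5, generations=500, seed=None):
    """Elitist GA: each generation adds n_o offspring, then keeps the n_p fittest."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(n_p)]
    scores = [fitness(ind) for ind in population]
    for _ in range(generations):
        weights = rank_weights(scores)
        offspring = []
        for _ in range(n_o):
            mom, dad = rng.choices(population, weights=weights, k=2)
            child = crossover(mom, dad, rng)
            if rng.random() <= p_m:
                child = mutate(child, rng)
            offspring.append(child)
        # A = P_i plus the offspring; survivors keep their fitness, so only
        # the new individuals are evaluated here.
        pool = list(zip(scores, population))
        pool += [(fitness(child), child) for child in offspring]
        pool.sort(key=lambda pair: pair[0], reverse=True)  # elitism
        scores = [f for f, _ in pool[:n_p]]
        population = [ind for _, ind in pool[:n_p]]
    best = max(range(n_p), key=lambda k: scores[k])
    return population[best], scores[best]


# Hypothetical toy problem: maximise the number of 1s in a 20-bit string.
def rand_bits(rng):
    return [rng.randint(0, 1) for _ in range(20)]


def one_cut_bits(a, b, rng):
    c = rng.randint(1, len(a))
    return a[:c] + b[c:]


def flip_one(ind, rng):
    ind = list(ind)
    ind[rng.randrange(len(ind))] ^= 1
    return ind


best, score = evolve(rand_bits, sum, one_cut_bits, flip_one,
                     n_p=10, n_o=4, generations=50, seed=1)
```

The sketch evaluates only the offspring at each generation, since the survivors keep their fitness values; when every evaluation is a full network simulation, this is where most of the run time goes.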

The fitness function is based on the probe connections' throughput, i.e., the performance of the probe TCP connections as perceived by end users during the simulation experiment. Bytes that are successfully received at the TCP level but not delivered to end users, such as duplicate packets received because of the protocol's retransmission mechanism, are therefore not counted.

P_0 = random_population(N_p);
compute_fitness(P_0);
i = 0;                                /* current generation number */
while (i < max_generation) {
    A = P_i;
    for j = 1 to N_o {                /* new element generation */
        s'  = select_an_individual();
        s'' = select_an_individual();
        s_j = cross_over_operator(s', s'');
        if (rand() <= p_m)
            s_j = mutation_operator(s_j);
        A = A ∪ {s_j};
    }
    compute_fitness(A);
    i++;
    P_i = {the N_p best individuals ∈ A};
}
return best_individual(P_i);

Figure 11.2 Pseudocode of the genetic algorithm in Nepal.

The fitness function should increase with the goodness of a solution, and in Nepal a solution is good when the background traffic pattern is critical. Therefore, the fitness function we defined is inversely proportional to the total number of bytes perceived by the end users: $f = 1/B_{TCP}$, where $B_{TCP}$ is the total number of bytes delivered to the TCP users on all probe connections.

This simple measure already delivers satisfactory results on simple network topologies.

For different problems and different goals, additional parameters can easily be included in the fitness function.
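A minimal sketch of the fitness $1/B_{TCP}$, assuming a hypothetical trace format in which each segment arriving at a probe receiver is logged as a `(connection_id, seq, n_bytes)` tuple. Duplicates (for example, retransmitted segments that arrive twice) are counted once, as only one copy reaches the user. The sketch ignores segments still waiting for reordering when the simulation ends, and returning infinity when no byte gets through is also an assumption; `nepal_fitness` is an illustrative name.

```python
def nepal_fitness(received_segments):
    """Fitness 1 / B_TCP, where B_TCP counts bytes delivered on all probe connections.

    received_segments: iterable of (connection_id, seq, n_bytes) tuples, one per
    segment arriving at a TCP receiver (a hypothetical trace format). A segment
    received twice, e.g. after a retransmission, is delivered to the user only
    once, so it is counted once.
    """
    delivered = {}
    for conn, seq, n_bytes in received_segments:
        delivered[(conn, seq)] = n_bytes
    b_tcp = sum(delivered.values())
    # Assumption: a pattern that blocks every probe byte is maximally critical.
    return float("inf") if b_tcp == 0 else 1.0 / b_tcp
```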

11.3.2 Encoding

Each individual encodes the description of the traffic generated by all the background connections for the whole duration of the simulation. A connection is specified by a UDP source and destination pair, while the route followed by packets depends upon the network topology and on the routing mechanism. We assume that all the packets sent on the network are of the same size, hence individuals do not need to encode this information for each packet.

Individuals are encoded as strings of N genes. Each gene represents a single background packet and N is constant during the whole experiment. Genes are composed of two parts:

TAG, which indicates the background connection the packet will be sent on;
DELAY, which represents how long the source will wait, after sending the current packet, before sending a new one.

A model commonly adopted for background traffic in statistical network analysis is a Poisson process, whose inter-arrival times are negative-exponentially distributed (Hui, 1990), and we want the delays in randomly generated individuals to be exponentially distributed. For this reason, the DELAY field is an index into an array of values that are exponentially distributed from 0 to K (K is given in milliseconds in Table 11.1), with an average value of 0.4K, where K is called the Poisson distribution's parameter. Note that we make no assumptions about the number of genes with a given TAG; however, N is usually large enough to make the initial random background traffic almost equally distributed between sources.
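The text does not say how the DELAY table is built, so the following is only one possible construction. It assumes an exponential distribution truncated to [0, K], with its rate found by bisection so that the mean is 0.4K, and table entries placed at evenly spaced quantiles. The table size of 256 and the names `solve_rate` and `delay_table` are hypothetical. Because the table is sorted, neighbouring DELAY indices map to neighbouring delays.

```python
import math


def solve_rate(k, mean_fraction=0.4):
    """Rate of an exponential truncated to [0, k] whose mean is mean_fraction * k.

    Valid for 0 < mean_fraction < 0.5 (0.5 * k is the limit as the rate -> 0).
    """
    def truncated_mean(rate):
        return 1.0 / rate - k / math.expm1(rate * k)

    lo, hi = 1e-6 / k, 100.0 / k  # the truncated mean falls as the rate grows
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) > mean_fraction * k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delay_table(k, size=256, mean_fraction=0.4):
    """Sorted table of `size` delays in [0, k] following a truncated exponential.

    Entry i is the quantile at (i + 0.5) / size, so picking an index uniformly
    at random gives (approximately) exponentially distributed delays.
    """
    rate = solve_rate(k, mean_fraction)
    mass = -math.expm1(-rate * k)  # P(X <= k) for the untruncated exponential
    return [-math.log1p(-mass * (i + 0.5) / size) / rate for i in range(size)]


K = 6.67  # ms, the value of K in Table 11.1
table = delay_table(K)
```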

The crossover operators must be able to combine good properties from different individuals. In Nepal, each individual embeds two different characteristics: the background traffic as a function of time (time domain) and the background traffic as a function of the background connection (space domain). It is important to keep these two characteristics conceptually separate because, for instance, an individual could represent good background traffic during a given period of a simulation experiment, or good traffic for a given background connection, or both.

It is possible that the genes with a given TAG are not enough to generate traffic for the whole experiment; in this case, the connection remains idle in the final part of the experiment. Conversely, the simulation may end before all the packets are sent, and the final genes are then ignored.
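Both effects fall out naturally when a chromosome is decoded into a packet schedule. In the sketch below, genes are `(tag, delay_index)` tuples and `delays` is the DELAY lookup table. The assumption that every source sends its first packet at time 0, the function name `decode`, and the toy delay values are all hypothetical.

```python
def decode(individual, delays, duration):
    """Turn a chromosome into the background packets actually sent.

    individual: list of (tag, delay_index) genes; delays: DELAY lookup table.
    Assumption: every source sends its first packet at time 0.
    Returns a time-ordered list of (send_time, tag) pairs.
    """
    next_send = {}  # tag -> time at which that source may send its next packet
    packets = []
    for tag, d in individual:
        t = next_send.get(tag, 0.0)
        if t >= duration:
            continue  # simulation already over: this gene is ignored
        packets.append((t, tag))
        next_send[tag] = t + delays[d]
    # A source whose genes run out simply stays idle until `duration`.
    return sorted(packets)


delays = [1.0, 2.0, 5.0]  # toy DELAY table (hypothetical values)
genes = [(0, 0), (1, 1), (0, 2), (1, 0), (0, 0)]
print(decode(genes, delays, duration=4.0))
# [(0.0, 0), (0.0, 1), (1.0, 0), (2.0, 1)]
```

Here the last gene of source 0 is ignored (its turn would come at time 6.0), while source 1 runs out of genes and stays idle from time 3.0 until the end.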

Figure 11.3 shows, on the left-hand side, a possible individual and, on the right-hand side, the corresponding background traffic generated during the simulation experiment. The table mapping the DELAY field to exponential delays is reported in the middle of the figure.

In the individual shown in Figure 11.3, packet number 9 is not sent because the simulation is over before the delay associated with gene number 6 expires. Conversely, one gene with the third tag is missing, since the delay associated with the last packet on that connection (packet number 7) expires before the end of the simulation.

Figure 11.3 Individual encoding.

Figure 11.4 Crossover operators

11.3.3 Crossover Operators

The GA in Nepal implements four types of crossover: the widely adopted one-cut, two-cuts and uniform, plus a peculiar one called mix-source (Figure 11.4):

One-cut crossover: let one cut point c be a randomly selected point between 1 and N. Nepal generates a new individual copying the first c genes from the first parent and the subsequent N – c ones from the second parent.

Two-cuts crossover: let two cut points c₁ and c₂ be randomly selected between 1 and N, with c₁ < c₂. Nepal generates a new individual copying the first c₁ and the last N – c₂ genes from the first parent; the middle c₂ – c₁ genes are copied from the second parent.


These two crossover operators generate background traffic that is equal to the traffic generated by one parent for a given time, and then suddenly changes and follows the traffic generated by the other parent (strictly speaking, the cut points may not correspond exactly to the same time instants in the two parents; however, N is large enough for this to be neglected).

Uniform crossover: Nepal generates a new individual taking each gene from the first or the second parent with the same probability.

Mix-source crossover: each TAG (i.e., each background source) is randomly associated with one of the two parents. Nepal generates a new individual copying all the genes with a specific TAG from the associated parent. The new individual can be longer or shorter than N: in the former case, the exceeding genes are dropped; in the latter, the individual is filled with random genes.

The mix-source crossover aims at putting together good background sources from different individuals.

We can say that the mix-source crossover acts in the space (or source) domain, while the one-cut and two-cuts ones act in the time domain. The uniform crossover is hybrid and acts in both domains.
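A minimal sketch of the four crossover operators, assuming genes are `(tag, delay_index)` tuples and `rng` is a `random.Random` instance. The text does not specify which genes mix-source drops when the child is too long. The sketch walks both parents position by position, which keeps the child roughly in time order, and drops the trailing (latest) genes. `random_gene` is a hypothetical caller-supplied helper. The offspring must still be sorted in time afterwards (Section 11.3.5).

```python
import random

# A gene is a (tag, delay_index) pair; an individual is a list of N genes.


def one_cut(p1, p2, rng):
    """First c genes from p1 and the remaining N - c from p2, c drawn in 1..N."""
    c = rng.randint(1, len(p1))
    return p1[:c] + p2[c:]


def two_cuts(p1, p2, rng):
    """First c1 and last N - c2 genes from p1, middle c2 - c1 genes from p2."""
    c1, c2 = sorted(rng.sample(range(1, len(p1) + 1), 2))
    return p1[:c1] + p2[c1:c2] + p1[c2:]


def uniform(p1, p2, rng):
    """Each gene comes from either parent with probability 1/2."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def mix_source(p1, p2, tags, random_gene, rng):
    """Every TAG inherits all of its genes from one randomly chosen parent."""
    owner = {t: rng.randrange(2) for t in tags}  # 0 -> p1, 1 -> p2
    child = []
    for g1, g2 in zip(p1, p2):  # walk both parents in step to keep time order
        if owner[g1[0]] == 0:
            child.append(g1)
        if owner[g2[0]] == 1:
            child.append(g2)
    n = len(p1)
    del child[n:]  # longer than N: drop the exceeding (latest) genes
    child += [random_gene(rng) for _ in range(n - len(child))]  # shorter: pad
    return child


rng = random.Random(7)
p1 = [(0, i) for i in range(10)]  # parent whose genes all carry TAG 0
p2 = [(1, i) for i in range(10)]  # parent whose genes all carry TAG 1


def random_gene(r):  # hypothetical helper: TAG in {0, 1}, DELAY index in 0..9
    return (r.randrange(2), r.randrange(10))


child = mix_source(p1, p2, [0, 1], random_gene, rng)
```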

11.3.4 Mutations

Nepal implements three different mutation operators: speed-up, slow-down and random.

Mutation applies to N_m < N contiguous genes, where N_m is a constant. The first gene is randomly selected between 1 and N, and the gene list is considered circular.

Speed-up mutation: decreases the DELAY field by one, without modifying the TAG field. DELAY values are considered circular.

Slow-down mutation: increases the DELAY field by one, without modifying the TAG field. DELAY values are considered circular.

Random mutation: substitutes each gene with a new random one.

After each mutation, genes in the individual need to be sorted to fulfill the ‘sorted in time’ property.
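The three mutations can be sketched as a single function. This sketch assumes that DELAY is an index into a table of `n_delays` entries sorted in ascending order, so that speed-up shortens the wait except when it wraps around. `random_gene` is a hypothetical helper. The caller is expected to sort the result in time afterwards.

```python
import random


def mutate(individual, kind, n_m, n_delays, random_gene, rng):
    """Apply one mutation to n_m contiguous genes; the gene list wraps around."""
    child = list(individual)
    n = len(child)
    start = rng.randrange(n)  # first mutated gene
    for offset in range(n_m):
        i = (start + offset) % n  # circular gene list
        tag, d = child[i]
        if kind == "speed-up":
            child[i] = (tag, (d - 1) % n_delays)  # DELAY - 1, circular; TAG unchanged
        elif kind == "slow-down":
            child[i] = (tag, (d + 1) % n_delays)  # DELAY + 1, circular; TAG unchanged
        elif kind == "random":
            child[i] = random_gene(rng)
        else:
            raise ValueError(f"unknown mutation: {kind}")
    return child


rng = random.Random(3)
genes = [(0, 0), (1, 5), (0, 9), (1, 3)]
fast = mutate(genes, "speed-up", 2, 10, None, rng)  # N_m = 2 < N = 4
changed = [i for i in range(len(genes)) if fast[i] != genes[i]]
```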

11.3.5 Univocal Representation

A general guideline for GAs states that different chromosomes should represent different solutions. Otherwise, a GA would waste time generating and evaluating identical solutions and, even worse, the population could prematurely converge to a state where chromosomes are different but solutions are equal, and where genetic operators have no effect.

To ensure that a given background traffic pattern is determined by a unique chromosome, genes are sorted to guarantee that the packet associated with gene i+1 is never sent before the packet associated with gene i. We say that the genes in the individual are ‘sorted in time’.

Otherwise, it would be possible to exchange adjacent genes with different TAGs without modifying the represented background traffic (e.g. packet number 2 and packet number 3 in Figure 11.3).

Moreover, some crossover operators aim at combining good characteristics in the time domain. If an individual did not fulfil the ‘sorted in time’ property, it would be hard for such operators to preserve properties related to that domain, and they would act almost like the common uniform crossover. Our experiments showed that, without this property, the performance of the GA is significantly reduced.

Chromosomes need to be ‘sorted in time’ after crossover and mutation operators.
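A minimal sketch of the ‘sorted in time’ normalization. It assumes genes are `(tag, delay_index)` tuples, that each source starts sending at time 0, and that all delays are non-negative. Breaking ties between sources by TAG is an assumption added so that the mapping from traffic to chromosome is strictly one-to-one. The helper names are hypothetical.

```python
def send_times(individual, delays):
    """Send time of every gene, assuming each source starts at time 0."""
    next_send, times = {}, []
    for tag, d in individual:
        t = next_send.get(tag, 0.0)
        times.append(t)
        next_send[tag] = t + delays[d]
    return times


def sort_in_time(individual, delays):
    """Reorder genes so that the packet of gene i+1 is never sent before that of gene i.

    Genes of one source keep their relative order (Python's sort is stable and
    their send times never decrease), so every packet keeps its send time; ties
    between different sources are broken by TAG so that each background traffic
    pattern maps to exactly one chromosome.
    """
    times = send_times(individual, delays)
    order = sorted(range(len(individual)),
                   key=lambda i: (times[i], individual[i][0]))
    return [individual[i] for i in order]


delays = [1.0, 2.0, 5.0]  # toy DELAY table (hypothetical values)
a = [(0, 2), (0, 0), (1, 0), (1, 0)]
b = [(1, 0), (0, 2), (1, 0), (0, 0)]  # same traffic, genes in another order
```

Both `a` and `b` describe the same traffic (source 0 sends at times 0 and 5, source 1 at times 0 and 1), and both normalize to the same chromosome.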

11.4 Experimental Results

We developed a prototype implementation of the algorithm in Perl 5 (Wall et al., 1996). The GA implementation is fewer than 500 lines long and can be ported to different simulators and platforms with limited effort.

The source code of the simulator, insane, consists of 38,370 lines of C++. We needed to modify insane to handle externally generated background traffic; the modifications to the original code are limited to about 100 lines and allow the traffic sources to be completely defined by the user.

11.4.1 Network Topology

Figure 11.5 shows the topology of the IP network used in the experiments. Three TCP connections span from the transmitters TX_i to the receivers RX_i through three IP routers.

Each TCP connection performs long file transfers generating a series of 1024 Kbyte messages at a maximum mean rate of 1.33 Mb/s. Thus, the three TCP sources can generate an overall load of 4 Mb/s, if this is allowed by the TCP flow control mechanism.

Acknowledgments to each transmitter are carried by messages flowing in the reverse direction. These TCP connections are the probe connections of the experiment.

Two sources (BS_i) generate background UDP traffic directed to their respective destinations (BD_i) over the same links traversed by the TCP connections. The timing of background packets is controlled by the GA, as described earlier. The background traffic is generated only in the forward direction of the TCP connections, so only data messages, not acknowledgments, can be delayed or discarded due to network overload.

In this topology, the link between the right-most two routers represents a bottleneck for the network since it is traversed by all the traffic generated by both the background sources and the TCP connections.

Each link shown in Figure 11.5 has a capacity of 10 Mb/s in each direction and introduces a fixed 10 µs delay. For example, such links can be considered as dedicated 10 Mb/s Ethernet trunks, the 10 µs latency accounting for the propagation delay on the wires and the switching delay introduced by an Ethernet switch.

Figure 11.5 Topology of the network (nodes: TX1, TX2, TX3, RX1, RX2, RX3, BS1, BS2, BD1, BD2).

Routers introduce a fixed 0.1 ms delay, which accounts for the processing required on each packet and adds to the queuing delay. The size of the output queues modeled by insane is given as a number of packets (we used 64-packet queues in our experiments), whereas the buffering capacity of real routers is given as a byte volume, i.e., the number of packets that can be stored in a buffer depends on their size. We chose to use fixed-size packets to work around this limitation of the simulator.

11.4.2 Parameter Values

All the parameters used by the GA are summarized in Table 11.1, together with the values used in our experiments. The parameters K, N and T are tuned to model a 6 Mb/s background traffic level because, in this situation, the probe connections' traffic (4 Mb/s) plus the background traffic is nearly equal to the links' bandwidth (10 Mb/s). With a smaller load, the background traffic would be unable to interfere significantly with the probe connections, while with a larger one the whole network would be congested and the TCP throughput would be very small, even with completely random traffic.

Table 11.1 Genetic algorithm parameter values.

Parameter | Symbol | Value
simulation duration [s] | T | 3
maximum number of generations | G | 500
population size | N_p | 50
offspring size | N_o | 20
number of genes in each individual | N | 4,500
Poisson distribution's parameter [ms] | K | 6.67
mutation probability | p_m | 50%
number of genes affected by a mutation | N_m | 100

Figure 11.6 Throughput of probe connections (X axis: generation; Y axis: probe connections' throughput; curves: statistical and Nepal).

11.4.3 Results

We let the GA run for G generations on a Sun SPARCstation 5 with 64 MB of memory.

The GA required about 17,000 seconds of CPU time (mostly spent sorting in time the genes of newly generated individuals), and insane used about 53,000 seconds of CPU time to run all the required simulations.

Figure 11.6 shows the throughput of the probe connections, i.e., the connection bandwidth delivered to users, with the generation number of the GA on the X axis. These values are compared with those of a standard statistical approach, in which the background traffic is randomly generated according to an equivalent Poisson process. For the statistical approach, we report the lowest throughput obtained after simulating as many random patterns as the number of simulations required by Nepal up to the given generation.

This value does not change significantly as new traffic patterns are randomly generated.

During the experiment, Nepal managed to dramatically degrade the probe connections' throughput, from 1.66 Mb/s to 0.48 Mb/s, with only a small increase in the background traffic bandwidth. Thus, the genetic algorithm leads the background traffic sources to generate a traffic pattern that the TCP protocol cannot easily deal with.

Because some genes may remain unused at the end of a simulation run, the background bandwidth actually generated varies from pattern to pattern, and the bandwidth of a critical background traffic pattern generated by Nepal is slightly larger than that of the starting one. To eliminate this bias from the results, we defined a disturbance efficacy parameter at generation i as $DE_i = (T^* - T_i)/B_i$, where $T^*$ is the throughput of the TCP probe connections without the background noise traffic, $T_i$ is the lowest throughput of the TCP probe connections achieved up to generation i, and $B_i$ is the corresponding background traffic bandwidth. In $DE_i$, the effects of the traffic are normalized with respect to the varying background traffic bandwidth $B_i$. We experimentally examined the DE of the statistical approach with different background traffic loads, and found that the GA reaches a higher DE even when the statistical traffic is tuned to provide the same load as the GA.
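The bookkeeping behind $DE_i = (T^* - T_i)/B_i$ can be sketched as follows. The sketch assumes a hypothetical log in which `history[i]` lists the `(throughput, background_bandwidth)` pairs simulated in generation i, and that every generation has at least one entry. Since throughput and bandwidth share the same unit, DE is dimensionless. The numbers in the example are made up.

```python
def disturbance_efficacy(t_star, history):
    """DE_i = (T* - T_i) / B_i for each generation i.

    history[i] lists the (throughput, background_bandwidth) pairs simulated in
    generation i (a hypothetical log format). T_i is the lowest probe throughput
    seen up to generation i and B_i the bandwidth of the pattern that caused it.
    """
    worst = None  # (lowest throughput so far, its background bandwidth)
    de = []
    for generation in history:
        for throughput, bandwidth in generation:
            if worst is None or throughput < worst[0]:
                worst = (throughput, bandwidth)
        t_i, b_i = worst
        de.append((t_star - t_i) / b_i)
    return de


# Made-up numbers, only to show the bookkeeping.
print(disturbance_efficacy(3.0, [[(2.0, 5.0), (2.5, 4.0)], [(1.0, 5.0)], [(1.5, 4.0)]]))
# [0.2, 0.4, 0.4]
```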

Figure 11.7 Disturbance efficacy (X axis: generation; Y axis: disturbance efficacy; curves: statistical and Nepal).

Figure 11.7 plots $DE_i$ as a function of the GA generation i and clearly shows that the genetic evolution identifies a critical traffic pattern that cannot easily be found with standard statistical methods. Performance evaluated with statistical methods can therefore be much better than the performance experienced in the field under equivalent operating conditions. Thus, the GA allows the performance of the protocol to be assessed without making unrealistic assumptions about the probability distribution of the traffic generated by background sources.

Moreover, the GA proves able to find a critical pattern while treating the whole network almost as a black box, with little information about what is happening inside and only a few control knobs for influencing the system. This is an important point, because Nepal does not rely on knowledge of a specific problem, but on the strength of a potentially generic GA.

11.4.4 Analysis

In our case study, the genetic algorithm exposes the weaknesses of the TCP congestion reaction mechanism. TCP transmitters infer network congestion from missing acknowledgments and react to it by shrinking the transmission control window to 1 TCP message. This forces the transmitter to slow down its transmission rate, thus helping to eliminate network congestion. The window then grows by one message for each acknowledgment received, which roughly doubles it every round-trip time, and the transmission rate increases accordingly.

The time required to regain the transmission speed reached before the window was shrunk depends on the round-trip delay of a packet on the connection. As a consequence, the longer the connection (and hence its round-trip delay), the smaller the average throughput the connection achieves during congested periods.

The background traffic generated by the GA causes routers to discard packets in a way that makes TCP transmitters shrink the window before it can grow large enough to actually exploit all the available bandwidth. As a consequence, TCP connections achieve low throughput even though the network is only slightly overloaded. When the traffic is generated according to a statistical model, the performance limitations due to the shrinking of the transmission control window are less evident. This is shown by Figure 11.8, which plots the size of the control window of a TCP transmitter over time when the background traffic is created by the GA and when an equal amount of noise is generated statistically. The graph shows that, in the former case, the maximum size reached by the window when the network is overloaded is smaller. Thus, the throughput of the TCP connections with GA-controlled background traffic is lower than with statistical background traffic.
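The effect of loss timing can be illustrated with a deliberately idealised toy model. The window restarts at 1 message after each loss and doubles once per round-trip time up to a cap; congestion avoidance, fast retransmit and timeouts are ignored. Throughput is proportional to the average window divided by the round-trip time, so the model also reflects the dependence on connection length. The function names and parameter values are hypothetical, and this is not a model of insane's TCP.

```python
import math


def rtts_to_recover(target_window):
    """Round trips for slow start to grow the window from 1 to target_window
    messages, in an idealised model where the window doubles once per RTT."""
    return math.ceil(math.log2(target_window))


def mean_window(loss_period, max_window):
    """Average window (messages per RTT) when a loss shrinks it back to 1
    every loss_period round trips; growth doubles per RTT up to max_window."""
    window, total = 1, 0
    for _ in range(loss_period):
        total += window
        window = min(2 * window, max_window)
    return total / loss_period


print(rtts_to_recover(64))  # 6
print(mean_window(4, 64))   # 3.75: losses every 4 RTTs keep the window small
print(mean_window(20, 64))  # 47.95: same cap, but losses are rare
```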

[Figure 11.8: control window of a TCP transmitter over time; curve labelled GENETIC.]

In document Telecommunications Optimization - Heuristic and Adaptive Techniques (Page 175-184)