A BASIC EVOLUTIONARY ALGORITHM FOR THE PROJECT STAFFING PROBLEM

(1)

GHENT UNIVERSITY

FACULTY OF ECONOMICS AND BUSINESS

ADMINISTRATION

YEARS 2013 – 2014

A BASIC EVOLUTIONARY

ALGORITHM FOR THE PROJECT

STAFFING PROBLEM

Master thesis presented in order to acquire the degree of Master of Science in Applied Economics: Business Engineering

Piet Peene

(2)

(3)

(4)

GHENT UNIVERSITY

FACULTY OF ECONOMICS AND BUSINESS

ADMINISTRATION

YEARS 2013 – 2014

A BASIC GENETIC ALGORITHM FOR

THE PROJECT STAFFING PROBLEM

Master thesis presented in order to acquire the degree of Master of Science in Applied Economics: Business Engineering

Piet Peene

Under the management of Prof. Dr. Broos Maenhout

(5)

Permission

Undersigned declares that the content of this master thesis may be consulted and reproduced, when referencing to it.

Piet Peene

(6)

I. Preface

This master thesis denotes the end of my studies in Business Engineering, Operations Management at the University of Ghent. It is the conclusion of a fascinating road through the fields of knowledge in operations management. However, exploring these paths sometimes presented unforeseen challenges. Being able to overcome these challenges will only strengthen motivation and courage towards future trials. May that be an important lesson I have learned in the process. In my opinion, writing a thesis is a long-haul task in a subject of personal interest. My spark of interest in mathematical modelling was ignited when taking a 3rd Bachelor class in Operations Research taught by Prof. Dr. Broos Maenhout. Further classes in the master’s degree that broadened my interest in the subject of planning and scheduling included Project Management and Applied Operations Research, taught by Prof. Dr. Mario Vanhoucke. A thesis on the project scheduling and staffing problem is therefore a perfect match with my interests.

This thesis required a lot of effort, but it would not be accomplished without the help and support of others. Special thanks go to my promoter Prof. Dr. Broos Maenhout, for guiding me through the process, offering advice and working material, not to mention his flexibility in scheduling of consultation meetings, even outside regular working hours.

Furthermore I owe my parents, Annie Lips and Yves Peene, and my sister Tine and her boyfriend Geert Depuydt many thanks for the love and support they provided, not only during the making of this thesis but throughout the completion of my higher education. I also want to thank my friends, especially Annelies Deleersnyder, Arno Wallays and Michelle Vu, whom I got to know during my time at the university. They were always available for mental support and distractions.

(7)

II. Table Of Content

1. Introduction ... 1

2. Problem description and model formulation ... 4

2.1. Project Scheduling Problem Area... 4

2.2. Problem description ... 6

2.2.1. Project scheduling problem description ... 6

2.2.2. Project staffing problem description ... 8

2.3. Mathematical model formulation... 10

3. Methodology ... 14

4. Literature Overview ... 15

4.1. Genetic algorithms ... 15

4.1.1. What are genetic algorithms... 15

4.1.2. Genetic algorithm framework ... 16

4.2. Data representation ... 19

4.2.1. Project Schedule representation ... 19

4.2.2. Project Staffing representation ... 24

4.3. Solution methods... 26 4.3.1. Initialization ... 26 4.3.2. Selection ... 29 4.3.3. Operation ... 32 4.3.4. Local Optimization ... 37 4.3.5. (Partial) Evaluation ... 41 4.3.6. Reinsert ... 44 4.3.7. Ending condition ... 47 5. The algorithm ... 49 6. Computational experiments ... 50

6.1. Observation link AVGSQDEV – Total cost ... 50

6.2. Benchmark ... 51

6.3. Results ... 54

6.3.1. Basic cycles ... 55

6.3.2. Stage contributions ... 58

6.3.3. Sensitivity Analysis ... 59

(8)

III. List of figures

Figure 1 Extended Project Management triangle ... 1

Figure 2 Flowchart Genetic Algorithm, framework 1 ... 16

Figure 4 Activity-on-the-node project schedule activity network (AN2) ... 20

Figure 5 Example project schedule, PS2 (based on AN2) ... 23

Figure 7 Labor supply and demand for AN2, PS2 ... 25

Figure 8 Pseudo code Initialization methods ... 28

Figure 9 Example Roulette Wheel ... 31

Figure 10 Pseudo code Roulette Wheel Selection ... 31

Figure 11 Pseude code Tournament Selection ... 32

Figure 12 Pseudo code Blend CrossOver ... 35

Figure 13 Local search simplified ... 38

Figure 14 Pseudo code LS1 Burgess and Killebrew simplified RLP ... 39

Figure 15 Pseude code Double-Justification ... 40

Figure 16 Pseudo code 2-exchange neighborhood ... 41

Figure 17 Average squared deviation of resource consumption ... 43

Figure 18 AVGSQDEV - Total Cost dispersion for all project lengths ... 50

Figure 19 AVGSQDEV - Total Cost dispersion for project length of 11 days ... 51

Figure 20 Benchmark setup... 52

Figure 21 Lower bound benchmark in function of project duration ... 53

Figure 22 Total Cost evolution basic cycle AN2 Framework1 ... 56

Figure 23 Total Cost evolution basic cycle AN2 Framework2 ... 56

Figure 24 Total cost Vs number of operations for different population sizes ... 60

Figure 25 Total cost Vs population size for different number of operations ... 61

Figure 26 Trade-Off Total Cost Vs. Execution Time ... 62

Figure 27 Efficient Frontier Total Cost Vs. Execution Time ... 63

Figure 28 Efficient Frontier Total Cost Vs. Execution Time, with / without doubles .. 65

Figure 29 Total Cost Vs. Number of Iterations... 65

Figure 30 Total Cost Vs. Mutation Percentage ... 66

Figure 31 BSP Vs number of operations ... 67

Figure 32 Total cost Vs ending condition ... 68

(9)

IV. List of tables

Table 1 Example Duration Vector ... 21

Table 2 Absolute Starting Times ... 21

Table 3 Relative Starting Times ... 22

Table 4 Resulting Starting Times ... 22

Table 5 Work patterns forming labor supply ... 24

Table 6 Total Cost evolution basic cycle AN2 Framework1 ... 55

Table 7 Total Cost evolution basic cycle AN2 Framework2 ... 56

Table 8 Best solution methods, total cost and lower bound per basic cycle ... 57

(10)

V. Abbreviations

AN Activity network

AVG Average

AVGSQDEV Average squared deviation

BSP Best schedule percentage

CP Critical path

GA Genetic algorithm

PS Project schedule

RACP Resource availability cost problem

RCPSP Resource-constrained project scheduling problem RLP Resource levelling problem

RRP Resource renting problem

SPI Serial parallel indicator

(11)

1. Introduction

This thesis deals with a project scheduling and staffing problem. It fits in the functional area of project management. Many people and institutions have tried to give a meaningful definition of project management. The Association of Project Management (APM) came up with an apt definition stating project management is “the planning, organisation, monitoring and control of all aspects of a project and the motivation of all involved to achieve the project objectives safely and within agreed time, cost and performance criteria. The project manager is the single point of responsibility for achieving this.” (APM BOK, 1995) A project is “a temporary endeavour undertaken to create a unique product, service or result”. (PMBOK, 2004) This very short definition of a project is very meaningful in two of its words, i.e. temporary and unique. ‘Temporary’ indicates that there is a well-defined start and end. ‘Unique’ signifies that there is no predefined scheme for executing the project. However there may be similarities to previous projects. A project consists of multiple tasks or activities that need to be executed in a certain order, this order and the precedence relations between the activities are defined in an activity network. The actual timing of all activities is defined in a project schedule.

A classic combination of criteria, measuring success or failure of a project is depicted in the iron triangle or the project

management triangle. (Atkinson, 1999) We made an extended version of the triangle including the link of project management with project staffing and project scheduling in figure 1. The triangle has a performance measure on each of its corner-points. The scope contains the content of the project, what should be done. The cost refers to the budget of the project. Time refers to the amount of time to complete the whole project. It is often

(12)

altered, it will have an impact on the other two. For example, when extending the scope of a project, the cost and time are likely to increase as well. However, these relations are not necessarily strict meaning that a decrease in time does not necessarily mean an increase or decrease in cost (cfr. non-regular objectives of performance).

The goal of the thesis is to find an intelligent way to construct a schedule of activities that has the lowest staffing cost. This construction of a schedule is called project scheduling. Every activity in the schedule has a certain need for resources. In our research, these resources are labor. Project scheduling creates a demand of labor over the span of the project and results in a project make span (time criterion in figure 1).

However this demand cannot always be met exactly by the supply of labor, which is defined by project staffing. Once the schedule and resource needs are known, the project staffing can be executed. This project staffing results in the supply of resources and gives the eventual staffing cost. A close match between supply and demand of resources is more likely to result in a lower staffing cost.

This matching of supply and demand and its link to project scheduling and project staffing is shown in the bottom part of figure 1.

In order to attain the goal of constructing a good schedule, we implement a basic evolutionary algorithm, coded in C++. This research and its coded implementation are subject to various limitations. Since the focus here is mainly on the quantitative aspects of project staffing and project scheduling, the qualitative aspects are being neglected. An example of the qualitative aspect is job satisfaction or the loss of it resulting from irregular or acyclic working patterns, including overtime and idle time. Besides the lack of qualitative information, also some quantitative aspects show deficiencies. These deficiencies are mainly due to the many assumptions included into the modelling. Examples include the assumption that the time and resource consumption of each activity is exact and known a priori, and costs assigned to the different types of labor time is only an estimation.

However these assumptions and estimations are vital to the construction of a mathematical model and are being set to resemble reality as close as possible.

(13)

There are also practical limitations towards the execution of the algorithm, in the sense that available computing power sets a boundary to this research. Although computing power is ever increasing, computationally testing the possible combinations to construct the algorithm is still limited.

To conclude this introductory chapter, a brief overview of the structure of this thesis is given.

Chapter two will dig deeper into project staffing, project scheduling and the interactions between them. The chapter concludes with an unambiguous definition of both problems and their mathematical model.

Chapter three presents the methodology, i.e. the way in which the research is conducted. Chapter four provides a literature overview on the different types of solution algorithms and its building blocks. Chapter five describes the algorithm that proves to be the most performant. In chapter six, the computational results are given in combination with general observations and calculation of a benchmark. A conclusion and recommendations for further research are presented in chapter seven.

(14)

2. Problem description and model formulation

The first chapter situates our problem in the functional area of project management. The second chapter will define the problem more in detail. In the first section, the problem is put into the bigger picture by describing similar problems. The second section gives the specific problem description of our problem that will be used throughout the remainder of this thesis. The third and final section translates the problem description into a mathematical model.

2.1. Project Scheduling Problem Area

In project scheduling, a set of activities need to be scheduled meaning a start time has to be assigned to all activities. The project scheduling problem has been widely researched. Overviews and classification methods for the scheduling problem are given by Icmeli et al. (1993), Elmaghraby (1995), Herroelen et al (1997, 1998, 1999), Brucker et al. (1999) and Hartmann and Briskorn (2010). Different classifications can be made based upon the difference in characteristics between the problems. The use of a different type of resource, i.e. renewable or non-renewable leads to a different kind of problem as well as the activity characteristics and type of scheduling objective. Examples of objectives for the project scheduling problem are minimization of the duration of the project, levelling resources over the course of the project, minimizing resource idle time and maximizing the net present value. The most popular problems are the resource-constrained project scheduling problem or RCPSP. The goal of this problem is to minimize the total length of the project taking into account a certain renewable resource constraint. Other similar problems are the resource availability cost problem (RACP), the resource levelling problem (RLP), the time-constrained project scheduling problem (TCPSP) and resource renting problem (RRP). The RACP aims at minimizing the total cost of the unlimited renewable resources required to complete the project before a certain deadline. The RLP has the objective to schedule the activities such that the resulting resource demand over the span of the project is as levelled as possible. The TCPSP aims at meeting project deadlines, starting with a fixed capacity of resources. In order to meet the deadlines, decisions have to be made concerning working overtime and hiring additional

(15)

resources to enlarge the existing fixed capacity. The RLP has as objective to minimize the renting costs incurred by renewable resource, these costs concern both fixed and variable renting costs.

After the project activities are scheduled, the staffing needs to foresee sufficient labor resources to carry out the resource demand of the schedule.

The main objective is to minimize total staffing cost of the project. Total costs are the sum of the cost of regular personnel, cost of overtime, cost of idle time and cost of temporal personnel. This problem objective is often referred to as ‘the deadline problem’ (brucker et al, 1999), meaning that there is a given deadline on the makespan of the project and the goal is to find a feasible schedule that minimizes the costs. This is opposed to ‘the budget problem’ where one is given a certain budget and needs to find a feasible schedule that minimizes the makespan.

The staffing problem has already been solved deterministically by Maenhout and Vanhoucke (2014). This thesis will focus on the project scheduling problem.

The goal of this thesis is to develop an algorithm that generates a project schedule that minimizes the staffing costs. However, it is important to note that a shorter make span or more levelled resource usage does not necessarily mean a lower total cost of the project. Therefore, translating the global objective into an intermediate objective for the scheduling problem is not straightforward.

(16)

2.2. Problem description

This section handles the problem description of both the project scheduling and project staffing problem. There is not a single formulation of these problems. The basic idea is always common, but subtle deviations can be made to the goal or the constraints of the problem, making it seemingly result in a totally different problem.

2.2.1. Project scheduling problem description

The basic idea of project scheduling is to determine a start time for each activity in the project activity network. The assignment of these start times are not random but should serve a goal. The overall goal is to produce a schedule that provides the lowest personnel staffing cost. This cost determination is not part of the scheduling process but part of the staffing process. There is no direct translation of this staffing cost goal to a goal for the scheduling problem. As an intermediate goal that gives an approximation for the staffing goal, we set a resource levelling objective for the scheduling problem. This makes the project scheduling problem resemble the resource levelling problem (RLP) as discussed in section 2.1.1. The RLP has a non regular measurement of performance; it has no early completion measure. (Neumann and Zimmermann, 1999)

Besides the objective of the problem under consideration, the scheduling constraints that are active have an influence on the problem definition. These constraints can be derived from either activity characteristics or resource characteristics. (Herroelen et al., 1997)

Activity characteristics: • No pre-emption (1)

• Finish-start precedence relations (2) • Fixed and discrete duration per activity (3) • Predefined project deadline (4)

• Acitivity resource needs: constant and discrete (5) • Single execution mode (6)

(17)

Pre-emption or splitting of an activity is not allowed.(1) Pre-emption means that if an activity is started, it can be interrupted at some point in time, to resume later. Pre-emption brings more flexibility in the schedule and thus adds complexity. The scheduling of the activity is constrained to precedence relations. (2) This means that a certain order of execution of the activities needs to be maintained. This order is determined by the activity network. The only type of precedence relation used, is the basic PERT/CPM finish-start precedence relation. This means that the previous activity in the network has to finish first before the next activity can be started. Other precedence relations, referred to as generalized precedence relations, such as start-start, start-finish and finish-finish precedence relations are not used. Also the use of minimal and maximal time lags is omitted, for the sake of complexity. A start-start precedence relationship with a minimal time lag of three days means that the next activity can start three days or later than the start of the previous activity. The duration of an activity is known in advance and has an integer value. This means that the duration does not depend on a stochastic process or events on prior activities. Forcing the activities to have integer durations simplifies the calculations. A predefined project deadline of 21 days is applied, for technical reasons not functional. (4) We did not use a deadline as a relative percentage of the critical path since the critical paths of the activity networks under consideration differ heavily. This would result in strangling the solution space for the activity network with small critical path and creating an abundant solution space size for the activity network with a large critical path. The activity resource needs are constant, meaning that over the course of an activity, the resource demand for each time unit is equal. (5) The activity resource demand is integer for the same reason the activity durations are integer values. Contrary to discrete and constant, resource needs could be continuous and the amount necessary could be a function of the duration. There is only a single execution mode for the activities. (6) Multiple activity modes would imply the possibility of executing one activity or a subset of activities to be executed in different ways, possibly incurring different costs.

Resource characteristics: • Single resource (7)

(18)

The resource used for executing the activities is labor. Every unit of labor is assumed to be equal, the labor units do not require different skill levels.(7)

Concerning the resource constraints, we consider only one resource type in our problem. This resource type is a renewable resource. (8) A renewable resource is a resource that gets renewed from period to period. Besides labor, machines are another example of a renewable resource. Examples of non-renewable resources are materials, energy and money; once they are used, they are gone. The availability of resources is variable, and defined by the staffing problem. (9) The amount of labor available at each time unit depends on the number of workers employed and their different working patterns.

2.2.2. Project staffing problem description

The basic idea of project staffing is to find the combination of working patterns that covers the resource needs of an activity schedule. Each work pattern is a serial string of work days and days off and is executed by a single worker. The work patterns are non-cyclic, meaning that there is no predefined and reoccurring pattern of days off and days on. Not all patterns are allowed however, there are minimum and maximum constraints defined on the number of consecutive days off and consecutive days on. Minimum consecutive days on 2

Maximum consecutive days on 6

Minimum consecutive days off 1 (does not result in an actual constraint) Maximum consecutive days off 2

The goal of the staffing problem is to find the combination of working patters that satisfies the resource needs of the activity schedule and minimizes the labor costs incurred by the staffing. An overview of the different costs and their weights are presented below. (Maenhout & Vanhoucke, 2014)

• Regular personnel time units 2

• Overtime units 3

• Temporal personnel time units 4

(19)

The cost of regular personnel time units is a variable cost, depending on the project makespan. It does not take into account the actual number of days worked. Each project day incurs a cost of 2. A work pattern is subdivided into several periods, each containing seven days. A regular period of seven days has five days on and two days off. If a work pattern has an extra day on, on top of these five days, an overtime unit cost is incurred. For every work pattern, its cost can be calculated based upon the regular time units and the overtime units. The cost of a work pattern is thus known a priori, before the actual staffing takes place. The other two costs, i.e. temporal personnel units and idle time units are costs that come as a result of the combination of several work patterns. If the combination of work patterns does not supply enough labor on a certain day, external labor has to be hired for that day, incurring a temporal cost per extra unit. If however the combination of work patterns supply excess labor on a certain day, a penalty cost per idle time unit is added.

(20)

2.3. Mathematical model formulation

A mathematical formulation of the scheduling and staffing problem leaves no room for misconception and has the potency of representing the problem concisely. The notation will be explained in the form of sets, input data and decision variables (Maenhout and Vanhoucke, 2010). Sets are well-defined groups of elements that have common characteristics. An individual element in a set is recognized by its index. A set will be denoted by a capital letter. If W is the set of workers, w1 represents the first individual worker in the set. Input data is static data and known on beforehand. It is important that this data is as close to reality as possible since it will have a great impact on the behavior of the algorithm and ultimately on the results. Decision variables are the unknown factor in the model. Solving the mathematical problem means determining the value of these decision variables.

Sets

W set of workers (index i)

T set of days in the scheduling horizn (index t) A set of activities in the project (index j)

Input Data

cr cost per worker per day

co cost per worker per day of overtime

cx cost per worker per day of outsourced labor cl cost per worker per day of idle time

dj duration of activity j

ppj 1 if p is a predecessing activity of activity j

rj number of resources necessary to execute activity j PD project deadline

DOmin minimum consecutive days off DOmax maximum consecutive days off DWmin minimum consecutive days working DWmax maximum consecutive days working

(21)

Decision Variables

stj starting time of activity j

resulting parameter: ajt, 1 if activity j is performed on day t PL project schedule length

witr 1 if worker i works a regular shift on day t wito 1 if worker i works an overtime shift on day t wtx number of workers outsourced externally on day t wtl number of workers in excess on day t

The actual model consists of an objective function and constraints. The objective function represents the ultimate goal. This could be the minimization of the make span, the levelling of work load, the maximization of profits etc. In this case the objective is to minimize the total personnel staffing costs. Underneath the objective function, a number of constraints are formulated. These constraints find their origin in a regulatory domain, (e.g. the number of allowed consecutive working days) or are driven by feasibility boundaries (e.g. activity 1 needs to be completed before activity 2 can start). Objective function (1) Subject to constraints (2) (3) (4) (5) (6)

(22)

(9a) (9b) (10) (11) (12) (13) (14)

The objective function (1) represents the total personnel cost of the project. It can be broken down into 4 parts. The first part is a cost incurred for each worker in a regular schedule. The cost is independent of the actual number of days worked but is completely dependent of the length of the project schedule. It represents a fixed cost per hired worker. The second part of the cost calculation accounts for the overtime units. A worker is supposed to work a normal schedule of five days per week. However when he works more, an extra cost is incurred on top of the regular cost. The third part is a cost related to outsourcing extra workers. Sometimes, the regular hires can’t carry all the workload thus extra workers will be sourced from outside. This has the advantage that there is no fixed cost over the whole length of the project. However these units of work are usually more expensive than regular or overtime work units. The last part of the cost function is a cost that represents the excess supply of workers. When there are more workers available than necessary for the amount of work, an extra cost is incurred.

The first constraint equation (2) shows the connection between the project scheduling and the project staffing. The project scheduling results in a demand of labor for each day of the project, which is represented on the left-hand side of the equation. The right-hand side of the equation represents the supply of labor for each

(23)

day of the project, which is the result of the project staffing. Supply and demand of labor need to be in balance on a daily basis. If no balance can be found between the demand of labor and the supply generated by regular and overtime work units of the hired workers, extra outsourced labor or excess labor will rectify the total balance.

Equations (3) - (6) are constraints exclusively related to the project scheduling problem. (3) is the mathematical representation of the finish-start precedence relations. The fourth equation forces the non-preemptive nature of the activities in the project schedule. The project length (5) is defined as the end of the last activity. This project length is bound to a certain predetermined project deadline (6).

Equations (7) - (12) are constraints exclusively related to the project staffing problem. The seventh equation enforces that a worker can execute a regular work unit or an overtime work unit but never both. On one day, either extra workers get outsourced or there is excess labor, or none of both. (8) It does not make any sense to attract external workforce while there are still regular workers available. Constraints (9) – (12) represent working agreements between the employees and the employer. Constraint (9) and (10) ensures the minimum and maximum number of consecutive days off work, respectively while (11) and (12) ensures the minimum and maximum number of consecutive days a worker is allowed to work.

Equation (13) limits certain variables to a binary value and other variables to positive integers (14).

(24)

3. Methodology

The problem explained in the previous chapters shows a complex interaction of two subproblems. A structural approach towards the solution of the problem is essential, and basic assumptions are necessary to limit its complexity. As mentioned before, the project staffing problem is perceived as a given and focus will go almost entirely to the project scheduling problem.

First, the project activity network dataset will be described in more depth. This is important to show that we are solving a real life problem and not an abstract theoretical problem. Furthermore, it must be noted that there is no single best method to solve all kinds of different project activity networks. The characteristics of these networks might play an important role in the final selected solution method.

In a second phase, existing basic genetic algorithm methods from the literature will be discussed. These methods can be grouped into generation methods, selection methods, operation methods, optimisation methods, reinsert methods and population management methods. These are picked from a broad range of applications, not limited to the project scheduling problem. This phase is concluded by placing these methods into a framework of a genetic algorithm.

The third phase consists of programming the methods in C++ and to connect them in a logical way. The resulting program will then be executed in several runs on three prototype datasets. Every run will narrow the number of methods by either excluding the weakest methods or by withholding only the best methods. For each prototype dataset, a genetic algorithm will be formulated.

As genetic algorithms do not promise an optimal solution, the goal is to reach a good solution. It is possible to determine an upper and lower bound for the cost objective function. These two values will then be used for benchmarking purposes of the proposed genetic algorithms. Besides benchmark testing, we will also perform tests to determine the effectivity of each method. General parameters will be changed to check the influence of these parameters on the applied algorithm and its results.

(25)

4. Literature Overview

In chapter four, the literature overview is given. The first section gives an introduction on genetic algorithms and presents the genetic algorithm frameworks. The second section shows how the data is represented. The third and last section will elaborate in depth on the different solution methods that are applied within the genetic algorithm framework.

4.1. Genetic algorithms

4.1.1. What are genetic algorithms

A genetic algorithm (Holland, 1975) is a heuristic that imitates human nature of evolution to find good (but often suboptimal) solutions to a problem in a reasonable amount of time. To the contrary there are exact solution methods which will always come up with the optimal solution. The genetic algorithm however intelligently exploits the random search.

It has been proven that genetic algorithms, in combination with local search, simulated annealing or tabu searchn provide very good solutions among Heuristics. (Brucker, 1999)

We will add local search to genetic algorithm framework. Genetic algorithms are population based algorithms, i.e. they work on a set of solutions.

The general procedure is described below.

Firstly, an initial population of member solutions is generated (1). Out of this population, ‘parents’ will be selected for mating(2). The parents will be combined in a certain way to generate new solutions (=mating), called the ‘children’ (3). The children will enter the population and a new cycle can start from the selection procedure. Step 2 and 3 will be repeated until an ending condition is reached (4). The underlying principal is survival of the fittest. This means that the stronger members of the population will survive while the inferior members will be eliminated. The next sections will go deeper into this general setup.

(26)

4.1.2. Genetic algorithm framework

This subsection discusses the integration of the project scheduling and project staffing problem and will mold it into a genetic algorithm framework.

The project scheduling problem is leading throughout the execution of the genetic algorithm. The staffing problem is called at the appropriate time when evaluation of the resulting schedule is needed.

The integration of the scheduling and staffing problem is molded into two slightly different forms of genetic algorithms, presented in two diagrams in figure 2 and figure

3.

The first framework consists of an initialisation phase, a selection phase and an operation phase. The obtained child after the operation phase will undergo local optimization where the algorithm will look for incremental improvements in the neighborhood of the child. This local optimization comes with partial evaluation to evaluate every instance of the explored neighborhood. We call it partial evaluation because the actual objective function is never calculated in this phase. Another measurement, which is a close approximation of the objective function under certain conditions, will be calculated in this phase. The reason for this partial evaluation is that the regular evaluation, which includes calculating the objective function through calling the staffing algorithm, consumes a considerable amount of time. This would extend the execution time of the algorithm unnecessarily. After the local optimization, only one schedule which is assumed to be the best, based upon the partial evaluation, will undergo the complete evaluation phase, including the staffing part. After the evaluation, a decision has to be made whether the newly generated and optimized

(27)

child can enter the population. This decision is made in the reinsert phase. The last phase checks an ending condition. If a certain ending condition is reached, the execution will stop and the best schedule found up to that moment will be the output of the algorithm. If the ending condition is not yet reached, the phases described above will be repeated starting from the selection phase.

The advantage of this form of the algorithm is that it does not allow deterioration of the resulting schedule, i.e. the outcome of the schedule at the end. This is because at the end of each cycle, the objective function is calculated and the best schedule is stored.

The biggest disadvantage of this form of the algorithm is that still an evaluation is being performed during each cycle, which consumes a considerate amount of time.

This is the reason why an alternative framework is formulated, represented in figure 3. The only difference between the second and the first framework is that the second framework postpones the evaluation phase until the very end. This second framework will not perform the evaluation in every cycle and thus save lots of execution time. In this case the quality of the schedules in the population is entirely controlled by the partial evaluation. When the ending condition is reached, the evaluation will be executed on every schedule in the population. The biggest advantage of this framework is the amount of time that can be saved in each cycle. The disadvantage however is that it is possible that the best schedule present in the population gets replaced by another one during the execution of the agorithm. Because only at the

(28)

probably be good, and thus should not be replaced and which schedule is bad and thus should be replaced. It is expected that the second framework yields an inferior quality of the schedules.

(29)

4.2. Data representation

The way in which the data is represented can have a great influence on the range of methods that can be applied. In this section the data representation for both scheduling and staffing problem will be shown.

4.2.1. Project Schedule representation

Good project management starts with a solid representation of the project schedule. A good tool for this is PERT, the Project Evaluation and Review Technique. (Cottrell, 1999) It is used for analyzing and representing activities in a project and was first developed in the late 1950s by the U.S. Navy as a tool for measuring and controlling the development progress for the Polaris Fleet Ballistic Missile program. (Malcolm, 1959) The method perceives a project as a network of activities and events.

An activity network shows the activities and the relations between them, often referred to as precedence relations. There are two types of activity networks, an activity-on-the-node (AON) network and an activity-on-the-arc (AOA) network representation.

Activity-on-the-arc

In this representation, each arc or arrow represents an activity or a task. The nodes define a milestone which is achieved when all activities on the arrows leading up to this node are completed. Dummy arcs can be introduced to enforce additional precedence relations.

Activity-on-the-node

In this representation, each node represents an activity or a certain task that has to be executed. The arcs or arrows represent the precedence relation. Figure 4 shows an example of such a network. Each node gets an activity number inside the node, the duration of the node is put on top and the necessary labor to execute the activity is put below the node. The network clearly visualizes that activity five can only be executed when both activity three and four have been executed. Activity five is called the successor of activities three and four, activities three and four are a predecessor

(30)

and end the network, they don’t consume any time nor resources. Their sole function is to have a clear single node at the beginning and end of the network. The network in Figure 4 will be used as an example in the next chapters.

In order to limit the complexity, it is assumed that the durations of the activities are deterministic. PERT often takes a certain variance on the duration of an activity into account when analyzing project schedules. Another complexity limiting factor is the type of relationships that are used. In this thesis, only finish-start relationships are considered. This means that the successor can only start when the predecessor has finished. When using generalized precedence relations, also start-start, finish-finish and start-finish relationships can be defined.(Dawson, 1995) Their respective meanings are: the successor can start when the predecessor has started, the successor can finish when the predecessor is finished and the successor finishes when the predecessor starts.

In combination with PERT, CPM or critical path method is often used. The critical path represents the group of activities that cannot be delayed without increasing the length of the project. It thus is the chain of activities that result in the minimal length of the project.

Throughout this thesis we have used the activity-on-the-node network representation method since this emphasizes the activities rather than the milestones. Furthermore it is easier to interpret at first sight and there is no need to define any dummy

(31)

activities besides the start and end activity. Other advantages identified by Turner include the ease of drawing activity-on-the-node networks and the ability to write network software more easily and the independency. (Turner, 1993)

Programming data representation

The network can be translated or decoded into static and dynamic data. The static data includes the duration of the activities, the necessary resources for the execution of each activity and the precedence relations of the activities. These will remain identical throughout the scheduling process. The dynamic data are the starting times of each activity, these will change throughout the process and are the eventual outcome.

Both the aforementioned static and dynamic data will be saved into vectors. i.e. a vector of durations, a vector of resource usages, a vector of successors and a vector of starting times. For example figure 4, this results into a vector represented in table 1.

a1 a2 ... a12

d[aj] 0 5 1 4 2 2 2 2 3 4 4 0

Table 1 Example Duration Vector

For the decoding of the activity starting times, there are two options. The first one consists of the absolute starting time of the activities and the second considers a relative starting time. (Wall, 1996) The first method is straightforward and states an exact starting time, independent of the starting time of other activities. Table 2 shows an example of a vector with absolute starting times.

a1 a2 ... a12

st[aj] 1 4 6 8 9 14 10 18 8 17 19 19

Table 2 Absolute Starting Times

Activity one starts at day one, activity two starts at day four and activity three starts at day six.

The second method does not state an absolute starting time but rather a relative starting time of the activity i.e. the relative starting time indicates how many days there are between the start of an activity and the end of the latest predecessor. The vector of these relative starting times will be referred to as the float vector in the remainder of the thesis.

(32)

Table 3 shows how a float vector is represented.

a1 a2 ... a12

fl[aj] 0 4 6 8 4 2 6 3 8 5 5 0

Table 3 Relative Starting Times

This float vector has to be interpreted in combination with the precedence relations to result into the actual starting times of the activities. If you combine this with the network of figure 4, the resulting starting times are calculated in table 4.

a1 a2 ... a12

st[aj] 0 4 15 8 20 11 6 19 30 27 27 33

Table 4 Resulting Starting Times

The start time of activity three is the end time of the latest predecessor (activity two) which is nine. Add to this number the float value of six and you get the starting time for activity three, i.e. fifteen.

We opted to go for the relative time representation for the simple reason that it is impossible to break any precedence relations constraint since it is imbedded into the definition of the float vector. If the absolute starting times are used, every schedule that is produced needs to be tested if any precedence relation is violated and if necessary repair these violations. An example of this is activity nine in table 2, which starts at day eight while its predecessing activity seven starts at day 10. table 2 is thus an example on an infeasible schedule.

Dataset

We execute the algorithm on three prototype activity network datasets. These activity networks contain the same activities, i.e. twelve activities including a dummy start and dummy end activity. Even the activity characteristics, concerning duration and resource demand are identical. The only aspect that is different between the three activity networks under research is the order in which the activities should be executed. This order is visually represented by the arrows as the precedence relations in the activity networks. Appendix A shows all three activity networks, further denoted as activity network AN1, AN2 and AN3. Note that AN2 is discussed previously in this section. These three networks are not chosen at random but have a distinctive topological structure. We want to test how the algorithm reacts to activity networks with a tendency to a very parallel structure compared to activity networks

(33)

with a tendency to a very serial structure. This is done by measuring the serial or parallel indicator as a topological indicator to measure the network structure. (Vanhoucke et al., 2008) This indicator has a value ranging from 0 to 1, with 0 meaning a complete parallel network structure and 1 meaning a complete serial network structure. This indicator (I) is calculated using the formula below.

In this calculation, n indicates the number of activities excluding the dummy start and end node and m denotes the maximum progressive level of the network. (Elmaghraby, 1977) AN1, AN2 and AN3 have 0.11, 0.33 and 0.44 as respective values for the serial parallel indicator (SP indicator). These values seem very low. However when setting higher values, the tendency towards a serial structure is so overwhelming that there is very limited scheduling flexibility. When the SP indicator is 0, which means that all activities are in a parallel structure, the scheduling flexibility is maximal since there is not strict order in the activities. When the SP indicator is 1, which means that all activities are in a serial structure, there is no scheduling flexibility since the order of the activities is completely fixed.

We assume that parallel networks have a broader solution space and offer more possibilities to the staffing of a project, possibly resulting in lower staffing costs.

When the scheduling problem is solved, all activities received a start time and a project schedule can be printed. (Figure 5) The red line represents the labor demand per day, this is the demand that needs to be covered by the project staffing.

(34)

4.2.2. Project Staffing representation

The project staffing representation revolves around the representation of the work pattern. This work pattern is a binary vector indicating whether a day in the pattern is a working day or a day off. An example of a work pattern and a combination of work patterns to form the labor supply is shown below.

Pattern Workers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 w1 1 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 w2 1 1 1 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 1 1 w3 2 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 1 1 w4 2 1 1 1 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 w5 3 1 1 1 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 Staffing (Supply) 8 8 8 8 5 6 6 6 6 6 5 5 7 9 8 9 3 9 9 Scheduling (Demand) 8 8 8 8 5 6 6 6 6 6 5 5 9 9 9 9 10 10 10 Ext / Idle (Difference) 0 0 0 0 0 0 0 0 0 0 0 0 2 0 1 0 7 1 1 Table 5 Work patterns forming labor supply

Table 5 shows an example of staffing executed on AN2 and PS2. Five work patterns are distinguished to carry out the labor defined by PS2. Nine workers are hired for the project, work pattern one and two are executed by one worker each, work pattern three and four are executed by two workers each and 3 workers have working pattern five. The center of the table indicates whether a work pattern is on or off work on a certain day. Work patterns three and four have an overtime unit on day four; their first week contains six days of work in stead of the regular five days. In the bottom part of the table, the total labor supply per day is calculated as the sum of the individual active work patterns. The row below you can find the labor demand per day as defined by PS2. The lowest row indicates whether external labor units (positive value) should be hired or idle time (negative value) occurs for every day. Towards the end of the project, external workers are hired. The difference of labor supply and demand can also be shown in the project schedule graph. Below, you can find an extended version of figure 5. The red line still shows the labor demand while the green dotted line indicates the labor supply as defined by the staffing. Demand exceeds supply towards the end of the project.

(35)

(36)

4.3. Solution methods

This section contains an overview of all the methods that were taken into consideration for the algorithm. The structure of this section is guided by the flowchart in figure 6. The flowchart represents the different phases in the algorithm. Each phase contains multiple methods that contribute to the solution of the problem. The topics will be discussed in this order: Initialization, Selection, Operation, Local Optimization, Partial Evaluation, Reinsert, Ending Condition an Evaluation. Figure 6 indicates the sections and subsections in which the different methods are discussed.

4.3.1. Initialization

To start the algorithm, we need to initialize a population of schedules in the form of float vectors. Although research has often neglected the importance of the initialization phase, a bad initial population can lead to increased time-to-solution or even getting trapped into local optima. A minimum of diversity in the population is necessary to avoid premature convergence of the solutions towards suboptimal regions of the solution space. To initialize our population, simple constructive heuristics will be used.

To construct the schedule float vectors, we distinguish three groups of initialization methods, i.e. random, uniform and Gaussian initialization.

(37)

I1 Random Initialization

In the random initialization method, a maximum float value (MFV) is determined. Then a value is randomly generated in [0, MFV]. The parameterisation for MFV leads to the following methods.

I1a MFV = 1 x AVG duration of activities I1b MFV = 2 x AVG duration of activities I1c MFV = 3 x AVG duration of activities

I2 Uniform Initialization

In the uniform initialization method, a central value (CV) and a deviation value (DV) is determined. Then a value is uniformly generated in [CV-DV, CV+DV]. The biggest difference with the random initialization method is that this method does not necessarily include the value 0. If CV-DV would return a negative value, it is automatically initiated with a 0. If a large amount of float values are generated using this method, you will notice that they follow a uniform distribution. The parameters for CV and DV led to the following methods.

I2a CV = 1 x AVG duration of activities DV= 0,5 x AVG duration of activities I2b CV = 2 x AVG duration of activities DV = 0,5 x AVG duration of activities I2c CV = 2 x AVG duration of activities DV = 1 x AVG duration of activities Note that there is no method where CV = DV = AVG duration of activities since this method is identical to method I1b.

I3 Gaussian Initialization

In the Gaussian method, a central value (CV) and a standard deviation value (SDV) is determined. Then a value is generated conform the gaussian distribution with a mean CV and a standard deviation SDV. This method differs from the two previous ones by the fact that it allows more extreme values sine it has no maximum value, i.e. there is no closed upper end.

If this method would return a negative value, it is automatically initiated with 0. The parameters for CV and SDV led to the following methods.

I3a CV = 1 x AVG duration of activities SDV= 0,5 x AVG duration of activities I3b CV = 1 x AVG duration of activities SDV= 1 x AVG duration of activities

(38)

I4 Combined Initialization

This method makes a combination of the previous methods I1, I2 and I3 to become the initial population.

The pseudo code for the initialization phase can be found in figure 8. Pick an Initialization method I1, I2, I3 or I4 (I)

Determine initialization characteristics MFV, CV, DV, SDV While population not entirely filled

Create new empty float vector

While there are activities without float value Select a random activity a

If activity a has no float value

Initialize activity a using initialization method I Endif

Endwhile Endwhile

Figure 8 Pseudo code Initialization methods

Other Initialization method

Another initialization method that was considered finds its background in RACP. It is based of the maximum / minimum bounding strategy for determining the cheapest resource availability levels for a project. (Demeulemeester, 1995)

The general idea is to calculate a resource availability constraint and use this constraint in combination with a scheduling rule to generate initial schedules.

In a first step, calculate the minimum possible resource usage that would be necessary if there would be a constant availability of resources over the span of the project. This in fact, is equal to the resource usage of the most needy activity in the project. Applying this to our example in figure 4, the minimum possible resource usage is 10. This corresponds to the resource usage of activity 9. Secondly, calculate the maximum possible resource usage that would be necessary if there would be a constant availability of resources over the span of the project. For our example, this resource usage is 19 and is the maximum resource usage that is possible if activities 9, 10 and 11 would occur simultaneously. In a third step, the

(39)

activities will be scheduled using a basic priority rule and schedule generation scheme. Step 3 is repeated with different priority rules and a resource constraint ranging from 10 to 19. Each time step 3 is executed, this results in a schedule that is put in the initial population.

4.3.2. Selection

Once an initial population of schedules is available, we need to find a way to select one or more schedules on which a certain operation will be performed later on. These selected schedules are called parents. The key idea of this selection phase is to select good parents, in order to give them an opportunity to pass on their good genes onto the next generation. Likewise, this phase should also prevent the worst solutions from passing on their inferior genes onto the next generation. (Sivaraj, 2011)

A distinction can be made between two types of selection methods or schemes, a proportionate scheme and an ordinal-based scheme. (Sastry and Goldberg, 2001) Using an ordinal-based scheme, the chance of an individual to be selected depends on the ranking of the individual in the population based on a fitness measure.

With a proportionate scheme, the chance of an individual to be selected depends on the relative fitness of the individual in comparison with the other individuals in the population. In other words, with an ordinal-based scheme, the chance of being selected merely depends on the fitness rank of the individual in the population, while a proportionate scheme also takes into account how much one solution is better than the other to determine the selection likelihood. The latter not only implies an order of the individuals in the population but also a scaling measure to determine the relative superiority of one individual to another.

In this section, we will take a closer look at three selection methods, a random selection method, a roulette wheel selection method (proportionate scheme) and a tournament selection (ordinal-based scheme).

(40)

S1 Random Selection

This selection method does not embed any intelligence. It merely selects 2 individuals randomly. This method does not give preference to individuals that are more fit than others and therefore this method is perceived to be inferior compared to other selection methods that make use of more intelligent criteria.

S2 Roulette Wheel Selection

As stated in the introduction, the roulette wheel selection method is a proportional scheme to select individuals out of a population. The first step is to calculate a fitness value for each of the individuals in the population. Depending on the algorithm being used, this fitness value could be either the total cost (figure 2) or the average squared deviation (figure 3). The second step assigns a probability to each individual based on the fitness value. These probabilities are set out on a roulette wheel, the bigger the probability of the individual to be taken, the bigger the circumference of that individual on the roulette wheel will be. In the third step, the wheel gets spun and wherever the wheel stops, this individual will be taken. To select another individual, repeat the procedure starting from step two.

A small example will illustrate this method.

Assume five individuals and their respective fitness values. An increasing value indicates a better individual. The circumference of the roulette wheel gets divided into five parts, each part belonging to the selection of one individual. An example of the division of the wheel is given in figure 9. The wheel gets spun and the individual is chosen where the roulette wheel comes to a standstill.

(41)

Figure 9 Example Roulette Wheel

The pseudo code for the roulette wheel selection can be found in figure 10. Calculate the fitness value of each individual

While necessary to select additional individual

Determine selection probability for each individual based on fitness value Generate a random number, mimicking the spinning of a roulette wheel Translate the outcoming number into the underlying individual

Endwhile

Figure 10 Pseudo code Roulette Wheel Selection

S3 Tournament Selection

The tournament selection method is an ordinal-based scheme to select individuals out of a population. The first step consists of determining the rank-order of the individual based on a fitness value. This value will again be the total cost or the average squared deviation. This ranking will determine which individual wins in a tournament. In the second step, two individuals are selected randomly. Thirdly, the individual with the highest ranking wins the tournament and survives the selection stage. Repeat steps two and three until enough individuals are selected.

(42)

Calculate the fitness value of each individual

Make a ranking of the individuals based on fitness value While necessary to select additional individual

Select 2 individuals randomly

Determine the winner based on the ranking Endwhile

Figure 11 Pseude code Tournament Selection

This tournament selection method can be extended by either adding additional tournament rounds, such that an individual has to win 2 or more rounds before it is selected or by adding more individuals competing in each round.

A big advantage of this method over the roulette wheel is that there are no scaling issues. Since the tournament selection merely uses a rank, there is no need to translate the difference in fitness in a different selection probability. (Whitley, 1989)

S4 Combined Selection

Selection method S4 uses all three aforementioned methods to select individuals. Which method is used for each selection is determined on a random basis.

4.3.3. Operation

On the (pair of) parents, operators will be executed in order to generate different solutions, called ‘children’. The solutions can change drastically or just slightly. The two main types of operations are crossover and mutation. (Luke and Spector, 1998) Crossover relies on the hypothesis that highly fit individuals in the population consist of fit building blocks that can be mixed in order to become even more fit individuals. It will push the population to converge into one or more local optima. This process of convergence is often called intensification or exploitation. Mutation on the other hand serves the goal of maintaining genetic diversity in the population. Mutation thus fulfils the task of exploring the solution space. The aspects of mutation and crossover are antagonists but are both equally important. On the one hand, we need to be sure to have covered the solution space as much as possible through genetic diversity. But on the other hand, we also want to make sure that we find the best solution in the researched solution space through convergence into the best areas.

(43)

The occurrence of crossover and mutation are reflected by their crossover rate and mutation rate respectively, indicating the chance this operation will be executed on a selected pair of parents. Fixing these rates to a general optimal value is very difficult since they are very problem specific and they even depend on the stage of the genetic algorithm. Research has been done on the determination of these values for mutation and crossover both as a static constant and as a dynamic value, changing over the course of the execution of the algorithm. (Lin et al, 2003)

In this section the following crossover operators will be discussed, including 1-point crossover, 2-point crossover, blend crossover, mean crossover, extrapolation crossover, uniform crossover and mutation.

C1 1-Point Crossover

This crossover operator heavily relies on the building block hypothesis as it cuts the parents into two halves or blocks in order to recombine these blocks into the children. The point where the parents should be cut is determined randomly. (Spears and Anand, 1991)

An example is presented, 2 parents as float vectors containing the float value for 12 activities. The cut-off point is after the fourth activity.

a1 a2 ... a12

Parent 1 0 4 6 8 4 2 6 3 8 5 5 0

a1 a2 ... a12

Parent 2 0 2 1 4 3 9 5 7 3 10 2 0

These parents swap their float values after the cur-off point to create 2 children.

a1 a2 ... a12

Child 1 0 4 6 8 3 9 5 7 3 10 2 0

a1 a2 ... a12

Child 2 0 2 1 4 4 2 6 3 8 5 5 0

Assuming that parent 1 has a very fit first part and parent 2 has a very fit second part, child 1 has a high probability of outperforming both parents.

(44)

C2 2-Point Crossover

This crossover operator is identical to the previous one besides the fact that there is not a single cut-off point but 2 cut-off points that will chop the parent into 3 blocks to be recombined. An example with cut-off points after the third and seventh float value.

a1 a2 ... a12

Parent 1 0 4 6 8 4 2 6 3 8 5 5 0

a1 a2 ... a12

Parent 2 0 2 1 4 3 9 5 7 3 10 2 0

Resulting children are constructed by swapping the middle block.

a1 a2 ... a12

Child 1 0 4 6 4 3 9 5 3 8 5 5 0

a1 a2 ... a12

Child 2 0 2 1 8 4 2 6 7 3 10 2 0

C3 Blend Crossover

Using the blend crossover operator, you do not copy and recombine genetic material but you blend the corresponding genetic material, based on the distance between them. (Eshelman and Schaffer, 1992) (Takahashi and Kita, 2001) The blending will result into two values which are the boundaries for the newly generated child value. The steps to be undertaken will be illustrated with an example.

a1 a2 ... a12

Parent 1 0 4 6 8 4 2 6 3 8 5 5 0

a1 a2 ... a12

Parent 2 0 2 1 4 3 9 5 7 3 10 2 0

Since the first activity has an identical float value, we will take the second activity as an example. Firstly the distance between the two parents is calculated as the difference between the respective float values, i.e. distance d = |4-2|.

(1)

In the second step, 2 boundary values (X1 and X2) are determined taking the lowest float value, highest float value and the distance into account.

(45)

This would result in the following for our example.

(3)

In the third step, a new value gets generated randomly in the interval [X1,X2].

In our example, the interval is [1,5]. Do this for all activities until a entire child is produced. The pseudo code for the blend crossover operator can be found in figure 12.

For every activity

Calculate the distance d between the two parents

Calculate the lower boundary, using distance d

Calculate the upper boundary, using distance d

Randomly generate a new float value between the boundaries

Endfor

Figure 12 Pseudo code Blend CrossOver

C4 Mean Crossover

The mean crossover operator will calculate the mean value of the two parents for each activity and take this mean value as the new float value for the child (Wall, 1996). In case a non-integer value gets generated, the value will be randomly rounded up or down. An example is shown below.

a1 a2 ... a12

Parent 1 0 4 6 8 4 2 6 3 8 5 5 0

a1 a2 ... a12

Parent 2 0 2 1 4 3 9 5 7 3 10 2 0

Applying the mean crossover operator will result in this child.

a1 a2 ... a12

(46)

C5 Uniform Crossover

The uniform crossover is a very straightforward operator. To construct the child, for each activity it will randomly take the float value of either parent. Extra intelligence could be added in the sense that there is no random selection of the float value for each activity but the fittest parent gets a higher probability. (Magalhães-Mendes, 2013)

Applying uniform crossover can result in he child shown below.

a1 a2 ... a12 Parent 1 0 4 6 8 4 2 6 3 8 5 5 0 a1 a2 ... a12 Parent 2 0 2 1 4 3 9 5 7 3 10 2 0 a1 a2 ... a12 Child 0 2 6 8 3 2 6 7 3 10 5 0 C6 Combined Crossover

This crossover method combines the five aforementioned crossover operators. Every cycle, a new crossover operator gets chosen randomly.

Mutation

Mutation will be executed on a single individual. It does not combine 2 solutions but merely alters an individual in a certain spot.

Mutation of an activity float value can be done neglecting the current value, meaning that a reinitialization occurs. Another option is to take the current value into account and mutate that value by adding or subtracting some value. Our mutation operator neglects the current value.

This operator can be useful after a high number of generations, since at that point solutions can converge. As mentioned in the introduction, mutation will maintain some diversity that hinders the converging behavior of the population.

As the algorithm proceeds, it can be interesting to let the mutation rate evolve as well. Modifying this rate inversely (proportional) to the population diversity could prevent convergence. (Bäck, 1993)