NP-hardness Proof - First Phase: Initial Study and Algorithms Screening

Optimisation and Metaheuristic Algorithms

3.3 First Phase: Initial Study and Algorithms Screening

3.3.4 NP-hardness Proof

According to [88], any decision problem to which an NP-complete problem can be reduced in polynomial time, whether or not it is a member of NP, cannot be solved in polynomial time unless P = NP, since it is at least as hard as the NP-complete problem. To prove the NP-hardness of the fleet placement problem (FPP), its computational complexity is analysed through its decision counterpart, denoted FPP–D. FPP–D inherits all parameters from FPP.

In this section, the NP-hardness of FPP is demonstrated by proving the NP-completeness of FPP–D. For FPP–D, the question is whether there exists a solution with f station(s) such that all buildings are covered.

Proposition 1. The FPP is NP-hard in the strong sense even if there is only one user in each building.

Proof. A polynomial-time transformation from the strongly NP-complete “Set Cover Problem (Minimum Cover Problem)” to FPP–D is introduced [89, 88].

The Set Cover Problem (SCP) can be defined as follows: given a universe U of R elements, a collection of subsets of U, G = {g_1, g_2, g_3, ..., g_L}, and a positive integer K ≤ |G|, the question is “Does G contain a cover for U of size K or less, i.e., a subset G' ⊆ G with |G'| ≤ K such that every element of U belongs to at least one member of G'?”

Given an instance of SCP, the following instance of FPP–D is introduced. Firstly, let the buildings in B be the equivalent of the universe U in SCP, with m = |B| = R. Then, let S be the direct transformation of collection G, where S' = {s'_1, s'_2, s'_3, ..., s'_L} such that s'_l = g_l, l = 1, 2, 3, ..., L; hence n = L. In addition, assume w = 1 and r = 0, so that a building is covered if it is connected to the location s_l. Under this assumption, the entries of matrix D are one if the location is a 1-hop neighbour of the building and zero otherwise. Therefore, matrix D reflects the membership of S and is used to constitute the membership of collection S in FPP–D. Next, let p_j = 1 for j ∈ {1, 2, 3, ..., |B|}, which means there is only one user in building j.

Finally, the threshold value f = K is set.

Table 3.2: Spearman correlation between two objectives. Objective 1 is user coverage maximisation. Objective 2 is global walking distance minimisation.

Pair                      | Correlation Coefficient | p-Value
Objective 1 - Objective 2 | 0.977                   | 2.2e-16

Let X be a solution to SCP. A solution for FPP–D is constructed in which the buildings in B (U) are covered by f stations, where s'_l = g_l, such that x_l = s'_l if s'_l ∈ X and x_l = ∅ if s'_l ∉ X. Since X is a cover of U (in SCP), all buildings in B are covered and the number of stations in the corresponding solution (for FPP–D) is f = |X|.

Conversely, assume that there exists a solution Y to FPP–D with |Y| ≤ K; |Y| cannot exceed K, since otherwise the threshold condition would not hold. Therefore, there are at most K station(s) with y_i ≠ ∅. Since every building in B belongs to at least one member of Y, the selected stations with y_i ≠ ∅ represent a solution to SCP. This gives a polynomial transformation from SCP to FPP–D. Since all input numbers in the FPP–D instance have size at most polynomial in the size of the input, FPP is strongly NP-hard.
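The construction used in the proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `scp_to_fppd` and `fppd_has_solution` and the small SCP instance are hypothetical, and the brute-force check is exponential and serves only to confirm the equivalence on a toy example, not as a solution method.

```python
from itertools import combinations

def scp_to_fppd(universe, subsets, k):
    """Build the FPP-D instance of the proof from an SCP instance (U, G, K)."""
    buildings = sorted(universe)             # B corresponds to U, m = |B| = R
    locations = range(len(subsets))          # s'_l = g_l, hence n = L
    # D[l][j] = 1 if location l is a 1-hop neighbour of building j, else 0
    D = [[1 if b in subsets[l] else 0 for b in buildings] for l in locations]
    p = [1] * len(buildings)                 # p_j = 1: one user per building
    f = k                                    # threshold on the number of stations
    return buildings, D, p, f

def fppd_has_solution(D, f):
    """Brute-force check (toy sizes only): can at most f locations cover all buildings?"""
    n = len(D)
    m = len(D[0]) if D else 0
    for size in range(1, f + 1):
        for chosen in combinations(range(n), size):
            if all(any(D[l][j] for l in chosen) for j in range(m)):
                return True
    return False

# Hypothetical SCP instance: {1,2,3} together with {4,5} covers U, no single subset does.
U = {1, 2, 3, 4, 5}
G = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
_, D2, _, f2 = scp_to_fppd(U, G, 2)
_, D1, _, f1 = scp_to_fppd(U, G, 1)
print(fppd_has_solution(D2, f2), fppd_has_solution(D1, f1))
```

The transformation itself only copies the membership relation into D, which is why it runs in time polynomial in the size of the SCP instance.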

SCP (as an optimisation problem) was proved to be polynomially non-approximable within the ratio c· ln |G|, for some constant c > 0 [90]. Therefore, the following statement is proposed.

Statement 1. There exists no polynomial (c· ln n)-approximation algorithm for the FPP, where n is the input size, unless P = NP.

3.3.4.1 Objectives Correlation

The conflict between the two objectives of FPP, namely (1) maximising user coverage (Objective 1) and (2) minimising global walking distance (Objective 2), is demonstrated in this section using the Spearman rank correlation coefficient [91], a non-parametric rank statistic, at a confidence level of 95%. The p-value is used to decide whether to reject the null hypothesis, as it reflects the probability of obtaining the observed results of a test, assuming that the null hypothesis is correct. Hence, at the 95% confidence level, the null hypothesis is rejected if the observed p-value is less than 0.05. The Spearman coefficient lies in [−1, 1], where −1 indicates a negative correlation between two variables, meaning that as x increases, y decreases and vice versa (conflicting); 0 means there is no association between the two variables; and 1 means there is a positive correlation, i.e., x and y increase or decrease together. The closer the coefficient is to either end, the stronger the correlation.

Table 3.2 shows that the coefficient is 0.977, which indicates a very strong positive correlation between the two objectives: as the user coverage increases, the global walking distance also increases. In the context of FPP, this indicates a conflict between the two objectives, since the aim is to find a set of station locations with high user coverage that still yields a low global walking distance. This correlation analysis therefore shows that the two objectives in FPP are conflicting.
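For readers unfamiliar with the statistic, a minimal pure-Python sketch of the Spearman coefficient is given here, computed as the Pearson correlation of the rank vectors. The helper names and the five data points are hypothetical and purely illustrative; they are not the data behind Table 3.2, and the p-value computation is omitted.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical objective values of five candidate solutions (illustration only).
coverage = [10, 20, 30, 40, 50]
distance = [1.0, 2.5, 2.0, 4.0, 5.5]
print(round(spearman(coverage, distance), 3))  # 0.9
```

A coefficient close to 1 on such data means that solutions with higher coverage also tend to have a larger walking distance, which is the situation reported in Table 3.2.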

3.3.5 Methodology

The location problem has been studied for several decades; hence, there is a plethora of existing algorithms.

For this initial evaluation of algorithms, one representative from each category (exact methods, heuristic and metaheuristic algorithms) is selected, namely PolySCIP [51], the greedy and iterative heuristics [17], and NSGA-II [60]. PolySCIP is the only exact multiobjective solver on the market. The iterative heuristic algorithm, even though it was proposed decades ago, is still used in many recent works. A simple greedy algorithm is also introduced to show the difference between complex and simple behaviours. Finally, NSGA-II is selected since it was shown to be more efficient than the Simple Evolutionary Algorithm for Multi-objective Optimisation (SEAMO) [92], the Strength Pareto Evolutionary Algorithm II (SPEA-II) [70] and the Pareto Envelope-based Selection Algorithm (PESA) [93] on an MCLP variant [26]. NSGA-III is not considered in this case because it can only be applied to problems that have three or more objectives [72]. The details and pseudocode of the selected algorithms can be found in Section 2.2.

3.3.5.1 Script for PolySCIP

In order to use PolySCIP, an optimisation model must be transformed into a program accepted by PolySCIP. This can be done through Zimpl [94], which translates an optimisation model into an integer program that can be solved with PolySCIP, CPLEX and AMPL variants as well. For this research, the optimisation models are expressed in the Zimpl script shown in Listing 3.1. Some additional parameters are introduced to overcome the limitations in the script.

3.3.5.2 Weighted Sum for Heuristic Algorithms

Two heuristic algorithms are employed in this stage: Greedy and Iterative. In order to use both with the multiobjective fleet placement problem, the weighted sum approach is used. Three weight vectors are tested in this phase, namely [1.0, 0.0], [0.5, 0.5] and [0.0, 1.0], where the first weight applies to the user coverage objective and the second weight to the global distance objective. With these weights, three solutions are expected from each heuristic algorithm.
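The weighted sum approach can be sketched as follows. This is a minimal illustration and not the heuristics of [17]: the greedy loop, the toy evaluation function, the data in `COVER` and `WALK`, and the absence of any normalisation between the two objectives are all assumptions made for the example.

```python
WEIGHTS = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]   # weight vectors tested in this phase

# Hypothetical toy instance: buildings covered by each location and a
# per-location walking cost (both invented for illustration).
COVER = {1: {"a", "b", "c"}, 2: {"c", "d"}, 3: {"e", "f"}}
WALK = {1: 5.0, 2: 1.0, 3: 2.0}

def evaluate(solution):
    """Return (coverage, distance) of a list of locations on the toy instance."""
    covered = set().union(*(COVER[s] for s in solution))
    distance = sum(WALK[s] for s in solution)
    return len(covered), distance

def weighted_score(objectives, weights):
    """Scalarise the objectives: coverage is maximised, distance is minimised."""
    coverage, distance = objectives
    w_cov, w_dist = weights
    return w_cov * coverage - w_dist * distance

def greedy(candidates, k, evaluate_fn, weights):
    """Add, k times, the location that yields the best weighted score."""
    chosen = []
    for _ in range(k):
        remaining = [c for c in candidates if c not in chosen]
        best = max(remaining,
                   key=lambda c: weighted_score(evaluate_fn(chosen + [c]), weights))
        chosen.append(best)
    return chosen

for w in WEIGHTS:
    print(w, greedy([1, 2, 3], 2, evaluate, w))
```

Each weight vector turns the two-objective problem into a single-objective one, so one run per vector yields the three solutions mentioned above.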

3.3.5.3 Solution Encoding and Operators in NSGA-II

For this phase of the study, the focus is on explaining the solution encoding and the genetic operators employed in NSGA-II. The solution encoding, population initialisation, selection process, crossover and mutation are described, as they are important components of NSGA-II.

Solution Encoding: Due to the large number of fleet locations in a city (easily reaching 100,000 locations in big cities), binary encoding is not efficient. An integer encoding based on the street node ID (unique to each location) is proposed instead. The length of a solution is equal to the desired number of stations to be opened. The order of the locations carries no meaning in this encoding; hence, the locations are sorted in descending order in the solution to reduce the search in the solution space. To elaborate, suppose there are two solutions [3,2,1,4] and [1,2,3,4]: they contain the same locations and therefore have the same fitness score. Sorting maps both to a single representation, which prevents such duplicates.
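The effect of sorting can be shown with a short sketch; the function name `canonical` is hypothetical.

```python
def canonical(solution):
    """Sort street node IDs in descending order so that solutions containing
    the same locations share a single encoding."""
    return sorted(solution, reverse=True)

# Both orderings from the example collapse to the same encoding.
print(canonical([3, 2, 1, 4]), canonical([1, 2, 3, 4]))
```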

Population Initialisation: A solution is initialised by choosing street nodes uniformly at random, without replacement, from all street nodes in a problem instance, meaning that each location in the solution is unique at initialisation.
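A minimal sketch of this initialisation, assuming the node IDs are available as a Python sequence and that solutions are kept in the descending order described for the encoding; the function names and the `seed` parameter are hypothetical.

```python
import random

def init_solution(node_ids, n_stations, rng=random):
    """Draw n_stations distinct street node IDs uniformly at random
    (sampling without replacement), stored in descending order."""
    return sorted(rng.sample(list(node_ids), n_stations), reverse=True)

def init_population(node_ids, n_stations, pop_size, seed=None):
    """Create pop_size random solutions; seed is only for reproducibility."""
    rng = random.Random(seed)
    return [init_solution(node_ids, n_stations, rng) for _ in range(pop_size)]

population = init_population(range(1, 101), n_stations=4, pop_size=5, seed=1)
print(population)
```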

set street := {1..N_s};
set building := {1..N_b};

param Cover[street*building] := [...];
param p[building] := [...];
param Distance[street*building] := [...];

var st[street] binary;
var oc[building] integer >= 0 <= card(street);
var cb[building] binary;
var a[street*building] binary;
var z real >= 0 <= 500;

maximize obj1: sum <i> in building: cb[i]*p[i];
obj2: -1*z;

# c1: exactly N_st stations are opened
subto c1:
    sum <i> in street: st[i] == N_st;

# c2: oc[i] counts the open stations that cover building i
subto c2:
    forall <i> in building:
        oc[i] == sum <j> in street: Cover[j,i]*st[j];

# c3: building i is covered iff at least one open station covers it
subto c3:
    forall <i> in building:
        vif oc[i] >= 1 then cb[i] == 1
        else cb[i] == 0 end;

# c4: a building can only be assigned to an open station that covers it
subto c4:
    forall <i,j> in street*building:
        a[i,j] <= st[i]*Cover[i,j];

# c5: every covered building is assigned to exactly one station
subto c5:
    forall <j> in building:
        sum <i> in street: a[i,j] == cb[j];

# c6: z is an upper bound on every assigned distance
subto c6:
    forall <i,j> in street*building:
        z >= a[i,j]*Distance[i,j];

# c7: at least one assignment is made
subto c7:
    sum <i,j> in street*building: a[i,j] >= 1;

Listing 3.1: Optimization models in Zimpl language.

Selection: The selection process for crossover is based on the tournamentDCD operator proposed in [60], which uses the crowding distance to break the tie when both candidates are non-dominated.
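The idea behind such a dominance-and-crowding tournament can be sketched as follows. This is an illustrative approximation, not the exact operator of [60]: the dictionary layout of an individual, the convention that both objectives are stored so that larger is better (coverage, negated distance), and the random choice on equal crowding distance are assumptions.

```python
import random

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better
    in at least one (objectives stored so that larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def tournament_dcd(ind1, ind2, rng=random):
    """Binary tournament: dominance decides first, crowding distance breaks ties."""
    if dominates(ind1["fitness"], ind2["fitness"]):
        return ind1
    if dominates(ind2["fitness"], ind1["fitness"]):
        return ind2
    if ind1["crowding"] != ind2["crowding"]:
        # Prefer the candidate located in the less crowded region.
        return ind1 if ind1["crowding"] > ind2["crowding"] else ind2
    return rng.choice([ind1, ind2])

# Hypothetical individuals: fitness = (coverage, -distance).
a = {"fitness": (10, -5.0), "crowding": 0.1}
b = {"fitness": (8, -6.0), "crowding": 0.9}
c = {"fitness": (12, -9.0), "crowding": 0.5}
print(tournament_dcd(a, b) is a, tournament_dcd(a, c) is c)
```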

Crossover: The recombination process in this work is based on a two-point crossover. The process randomly selects two points in both solutions as the starting and ending points of the portions to exchange, and recombines these portions to create two new solutions, as shown in Figure 3.3.

Figure 3.3: A two-point crossover process.
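A two-point crossover on the integer encoding can be sketched as below. The function name is hypothetical, and the sketch does not re-sort the children or handle a street node ID that ends up duplicated when both parents share it, since the text does not specify either step.

```python
import random

def two_point_crossover(parent1, parent2, rng=random):
    """Exchange the segment between two random cut points of two parents."""
    size = min(len(parent1), len(parent2))
    cut1, cut2 = sorted(rng.sample(range(size + 1), 2))   # cut1 < cut2
    child1 = parent1[:cut1] + parent2[cut1:cut2] + parent1[cut2:]
    child2 = parent2[:cut1] + parent1[cut1:cut2] + parent2[cut2:]
    return child1, child2

# Hypothetical parents encoded as street node IDs.
c1, c2 = two_point_crossover([9, 7, 5, 3], [8, 6, 4, 2], random.Random(0))
print(c1, c2)
```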

Mutation: Uniform mutation is applied in this work, with the condition that the replacement comes from the pool of all vehicle locations (defined by street node IDs). Figure 3.4 shows the mutation process, where Sample is a function that randomly picks one location from the pool and 1, 2, 3, ..., n denote all street node IDs. If the replacement already exists in the current solution, the process is repeated until a valid replacement is found.

Figure 3.4: A uniform mutation process.
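The mutation with resampling can be sketched as follows. The per-gene mutation probability `prob` and the function name are assumptions, as the text does not state them, and the pool is assumed to contain more IDs than the solution so that a valid replacement always exists.

```python
import random

def uniform_mutation(solution, node_ids, prob, rng=random):
    """Replace each gene, with probability prob, by a street node ID sampled
    from the pool; resample while the ID is already in the solution."""
    mutant = list(solution)
    for i in range(len(mutant)):
        if rng.random() < prob:
            replacement = rng.choice(node_ids)        # the Sample step
            while replacement in mutant:              # repeat until valid
                replacement = rng.choice(node_ids)
            mutant[i] = replacement
    return mutant

pool = list(range(1, 21))                             # hypothetical node IDs 1..20
print(uniform_mutation([20, 15, 10, 5], pool, prob=0.5, rng=random.Random(0)))
```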

In document Graph-based Algorithms for Smart Mobility Planning and Large-scale Network Discovery (Page 38-42)