Contents lists available at ScienceDirect
J. Parallel Distrib. Comput.
journal homepage: www.elsevier.com/locate/jpdc

Optimizing server placement in distributed systems in the presence of competition
Jan-Jan Wu a,b,∗, Shu-Fan Shih a,c, Pangfeng Liu c, Yi-Min Chung c
a Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC
b Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan, ROC
c Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, ROC
Article info
Article history:
Received 17 February 2010
Received in revised form 30 July 2010
Accepted 18 August 2010
Available online 16 September 2010
Keywords:
Competition-aware server placement; Resource allocation; Distributed system; Grid and Cloud; Maximizing benefit; Minimizing construction cost; Optimal algorithms; Heuristic algorithms
Abstract
Although the problem of data server placement in parallel and distributed systems has been studied extensively, most of the existing work assumes there is no competition between servers. Hence, their goal is to minimize read, update and storage cost. In this paper, we study the server placement problem in which a new server has to compete with existing servers for user requests. Therefore, in addition to minimizing cost, we also need to maximize the benefit of building a new server.
Our major results include three parts. First, for tree-structured systems, we propose an $O(|V|^3k)$ time dynamic programming algorithm to find the optimal placement of $k$ extra servers that maximizes the benefit in a tree with $|V|$ nodes. We also propose an $O(|V|^3)$ time dynamic programming algorithm to find the optimal placement of extra servers that maximizes the benefit, without any constraint on the number of extra servers. Second, for general connected graphs, we prove that the server placement problems are NP-complete, and present three greedy heuristic algorithms, called Greedy Add, Greedy Remove and Greedy Add-Remove, to solve them. Third, we show that if the number of requests a server can handle (i.e., server capacity) is bounded, the server placement problem is NP-complete even for tree networks. We then derive a variation of the same set of greedy heuristic algorithms that takes the server capacity constraint into account.
Our experimental results demonstrate that the greedy algorithms achieve good results when compared with the upper bounds found by a linear programming algorithm. Greedy Add performs best in the unconstrained model, yielding a benefit within 12% of the theoretical upper bound on average. For the constrained model, Greedy Remove performs best for smaller network sizes, while Greedy Add-Remove performs best for larger network sizes. On average, the heuristic algorithms yield a benefit within 13% of the theoretical upper bound in the constrained model.
©2010 Elsevier Inc. All rights reserved.
1. Introduction
Although the problem of data server placement in parallel and distributed systems has been studied extensively, most of the existing work assumes there is no competition between servers. Hence, its goal is to minimize read, update and storage costs. In this paper, we study the server placement problem in which a new server has to compete with existing servers for user requests. Such competition-aware resource placement is important in many application areas, such as planning the construction of new business service sites and allocating resources in Grid and Cloud computing environments, which involve purchasing and consuming between providers and users. Therefore, in addition to minimizing cost, competition-aware server placement also needs to maximize the benefit of building a new server. For example, suppose that there are a number of McDonald's restaurants in a city, but no Kentucky Fried Chicken (KFC) outlets. If we decide to set up a number of KFC restaurants in the same city, where should we place them? We need to determine the locations for the KFC outlets so that they can compete with McDonald's and maximize their profits.

∗ Corresponding author at: Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC.
We denote the servers to be set up as extra servers, and the existing (competitor) servers as original servers. Thus, in the above example, KFC restaurants are the extra servers and McDonald's restaurants are the original servers.
We use a graph to model the locations of the servers and users. A node in the graph represents a geographic location, and an edge represents a path between two locations. Building servers at these locations enables users at a node to request services from the servers. Each edge has a communication cost. The distance between two nodes is the length of the shortest path that connects them.
For efficient server lookup, and because no node has global knowledge, we assume that requests from users are always sent to the nearest server. A node can easily compute its nearest server by itself, so each site chooses its server independently. For simplicity, we also assume that the distances from a node to all the other nodes are different, so the nearest server is uniquely defined.
After extra servers have been established, users that previously went to McDonald's may now choose to go to KFC. We define the benefit of an extra server placement to be the profit derived from user requests made to the extra servers, minus the cost of setting up those servers. The cost may vary depending on the location of each extra server. This paper considers the following two placement problems, which concern the placement of extra servers in the presence of competition from original servers.
1. Given the network layout and a number $k$, where should we locate $k$ extra servers such that they will earn the most profit?
2. Given the network layout, where should we locate extra servers such that they will earn the most profit, without any constraint on the number of extra servers?
We also use data center location selection to motivate our work. Cloud computing service providers are building data centers globally and the locations of these servers are important. For example, users of Amazon’s Elastic Compute Cloud (EC2) [1] can choose servers either in Europe or in the United States. This indicates that users of cloud computing do care about distance to servers, which directly relates to service quality. When building new data centers, the service provider must build them in strategic locations so that users, while choosing closer servers for better network performance, will choose the new data centers instead of existing ones. Consequently, locations of existing data centers are crucial to the planning for new ones.
In the first part of the paper, we consider the two placement problems for distributed systems in which a server can service an unlimited number of requests. We call this the ‘‘Unconstrained model’’. We consider two network topologies: tree networks and general graphs. For tree networks, we develop dynamic programming techniques that solve the two problems in $O(|V|^3k)$ and $O(|V|^3)$ time, respectively. For general graphs, we show that the two problems are intractable (NP-complete) and propose greedy heuristics to solve them. We also run experiments that compare the results of the heuristics with the theoretical upper bounds found by a linear programming algorithm. The greedy heuristics achieve good results, yielding a benefit within 12% of the theoretical upper bound on average.
In the second part of the paper, we address the same problem under the ‘‘Constrained model’’, in which the number of user requests a server can handle is bounded. We show that, with bounded server capacity, the server placement problem is NP-complete even for tree networks. We propose three heuristic algorithms to approximate the optimal solution. Our experimental results indicate that they are effective: on average, they yield a benefit within 13% of the theoretical upper bound.
Similar server placement issues, such as I/O or video server placement problems [6,9,8,12,17,25,30,34,37,40,41], data replica placement problems [13,20,19,22,26,38], p-medians [3,21], k-medians [4,31,39], and resource location problems [2,28,33], have been studied in the literature. Our extra server model and solutions differ from these previous efforts in that we introduce the concept of competition. In other words, to maximize their profits, the extra servers must compete with the original servers for user requests. The number of extra servers established is controlled by the building costs, which differ according to the location.
The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 describes the unconstrained model and defines two server placement problems. For trees, we introduce dynamic programming methods that find the optimal extra server placement for both placement problems. For general graphs, we prove that the two placement problems are both NP-complete, present heuristic algorithms to solve them, and report experimental results for the heuristics. Section 4 studies the server placement problem under the constrained model, presents three effective heuristics, and reports the experimental results for the constrained model. Section 5 contains our conclusions.
2. Related works
2.1. Server/replica placement
The models for the server/replica placement problem can be classified into two categories. The first set of models allows a request to travel up and down the tree to reach the nearest replica. For example, Wolfson and Milo [43] suggested a model in which no limit is set on server capacity. The read cost is the number of hops from a request to its server. The update cost is proportional to the size of the subtree that spans all replicas. The goal is to minimize the sum of read and update costs. Kalpakis et al. [20] suggested a model in which each server has a capacity limit and each site has a different building cost. The read cost is defined as the product of the amount of data transferred and the path length. The goal is to minimize the sum of read, update and site building costs. Unger and Cidon [38] suggested a similar model but without a server capacity limit. Guha et al. [16] suggested a model in which there is a known number of servers in the tree, each with equal capacity. There are no read, write, or site building costs, and the goal is to assign each request to a server (not necessarily the nearest one) so that the maximum distance from a client to its assigned server is minimized.
The second set of models only allows the request to search for the replica towards the root of the tree. For example, Jia et al. [19] suggested a model in which no server capacity or site building cost is set. The read cost is defined as the product of access path length and the amount of data. The update cost is defined as the sum of the link cost of the size of the subtree from root to all replicas. The goal is to minimize the sum of read and update costs. Cidon et al. [11] suggested a similar model in which a replica is associated with a site building cost, but there is no update cost. The goal is to minimize the sum of read cost and storage cost. Tang and Xu [36] later proposed a model in which there is a range limit on the number of hops between a request and its assigned replica. Their goal is to minimize the sum of update and storage costs.
2.2. p-medians and k-medians
The p-medians (or k-medians) problem is defined as follows. Given a connected graph with non-negative weights associated with the nodes and lengths associated with the edges, compute the locations of p facilities in the graph so as to minimize the sum, over all nodes, of each node's weight multiplied by its shortest distance to the nearest of the p facilities.
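To make the contrast with our objective concrete, the following minimal sketch evaluates the p-median objective by brute force. It assumes a precomputed all-pairs distance matrix `dist` and a list of node weights `weight`; both are illustrative inputs, and `p_median` is a hypothetical helper name. The sketch enumerates every set of p facilities, so it only suits tiny instances and is not one of the algorithms cited below.

```python
from itertools import combinations


def p_median(dist, weight, p):
    """Return (cost, facilities) minimizing sum_v weight[v] * d(v, nearest facility)."""
    n = len(dist)
    best = None
    for facilities in combinations(range(n), p):
        # Each node pays its weight times the distance to its closest facility.
        cost = sum(weight[v] * min(dist[v][f] for f in facilities) for v in range(n))
        if best is None or cost < best[0]:
            best = (cost, set(facilities))
    return best


# Path 0-1-2-3 with edge lengths 1, 2, 4 and unit node weights.
dist = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
print(p_median(dist, [1, 1, 1, 1], 2))  # (3, {1, 3})
```

The objective contains no building cost and no competing servers. Both are central to the model defined in Section 3.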
The general case is known to be NP-hard. The case in which the graph is a tree has also been studied extensively. Kariv and Hakimi [21] presented a dynamic programming algorithm for this problem that runs in $O(k^2n^2)$ time. Hsu [18] proposed an algorithm with running time $O(kn^3)$. Benkoczi et al. [4] proposed an algorithm based on a combination of divide-and-conquer and dynamic programming techniques, and showed that its running time is bounded by $O(n\,\mathrm{polylog}(n))$. These p-median formulations differ from our work in that they do not take building costs into account, and they aim at minimizing access costs instead of maximizing profits. In 1996, Tamir [35] described a dynamic programming algorithm that solves p-median problems on trees and incorporates both building and access costs. The algorithm assumes that the cost for a client to request a service is an increasing function of the distance between the client and the server. If the benefit function in our model is a decreasing function of the distance between the client and the server, our placement problem can be transformed into a p-median problem and solved by the dynamic programming method described in [35]. However, our method has the advantage that it can handle arbitrary benefit functions and still obtain the optimal solution for trees.
Our dynamic programming technique on tree networks is similar to that proposed by Kalpakis et al. [20]. They suggested a model in which each server has a capacity limit and each site has a different building cost. Each node requests data from the nearest server, so the data cost is defined as the product of the amount of data transferred and the path length. As a result, the data cost of a node is proportional to its distance to the nearest server. The goal is to minimize the sum of data cost and site building cost. If we regard the data cost as a negative profit, then the goal is to maximize the difference between profit (minuend) and building cost (subtrahend), which is the same objective function as in our model. Nevertheless, our model emphasizes competition, and this demands different proof techniques. In addition, our method must handle arbitrary benefit functions, which need not be functions of the distance to the nearest server, and still obtain the optimal solution.
This paper also addresses the issues of server capacity. To the best of our knowledge, the only works that take the bounded server capacity into consideration are by Lee et al. [24] which focuses on server selection for on-line games, and by Kalpakis et al. [20], which focuses on data access in tree networks. Neither work considers the influence of competition.
In our previous work [10], we addressed the same problem for the unconstrained model. This paper substantially extends that work. The extensions include (1) a proof of optimality and a time complexity analysis of the dynamic programming solution; (2) a dynamic programming solution for the extra-server problem in tree networks, in which there is no constraint on the number of extra servers, with the method, its proof of optimality and its time complexity analysis described in detail; (3) detailed proofs of the NP-completeness of the k-extra-server and extra-server problems for general graph networks, greedy heuristic algorithms for solving the problems, and new experimental results; and (4) a study of the extra-server problem under the constrained model, in which the server capacity is bounded (Section 4), including an NP-completeness proof, three greedy heuristic algorithms for solving the problem, and new experimental results.
3. Extra-server problem without server capacity constraint
We formulate the problem using a connected graph $G=(V,E)$, where $V$ is the set of nodes and $E$ is the set of edges. Each edge $(u,v)\in E$ has a positive integer distance denoted by $d(u,v)$. For any two nodes $u,v\in V$, $d(u,v)$ denotes the length of the shortest path between them. For ease of representation, we also let $d(v,S)=\min_{u\in S} d(v,u)$ be the length of the shortest path between $v$ and any node in $S$, where $S\subseteq V$.

We consider servers that provide a service to nodes in the graph. Every node $v$ must go to the nearest server $u$ for service. Hence, if a server is located at node $v$, then $v$ will be serviced by that server. To simplify the concept of the ‘‘nearest server’’, we assume that, for every node $v$, the distances to all other nodes are different, i.e., $d(v,u)\neq d(v,w)$ for $u\neq w$. As a result, the nearest server for every node is uniquely defined.

Fig. 1. An instance of the extra-server problem with nine nodes and two original servers. A shaded node indicates an original server.

We assume there are several original servers $O\subseteq V$ in $G$, and we would like to add a number of extra servers $X\subseteq V$, where $X\cap O=\emptyset$, to $G$ so as to maximize their net profit. By serving a client $v$, a server node $u\in X$ makes a profit of $b(v,u)$. Note that the function $b$ can be arbitrary. For example, unlike [35], we do not assume that, for the same client node $v$, the function value must be monotonic with respect to the distance between $v$ and the server node $u$. Let $c(v)$ be the cost of building a server at node $v\in V$, and let $X$ be the set of new servers we would like to add to the system. A node $v\in V$ can go to either $O$ or $X$ for service: it goes to $X$ when $d(v,X)<d(v,O)$, and to $O$ when $d(v,X)>d(v,O)$. Let $V_X$ denote the set of nodes that go to $X$ for service, and let $V_O=V-V_X$ be the set of nodes that go to $O$.
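The nearest-server rule above can be evaluated for all nodes at once with a multi-source Dijkstra search started from every server in $O\cup X$. The sketch below is an illustration rather than the paper's own code. It assumes an adjacency dictionary `adj` that maps each node to `(neighbor, length)` pairs, and node labels that can be compared with each other. The helper names `nearest_servers` and `partition` are hypothetical. Under the distinct-distance assumption, the server whose search first settles a node is that node's unique nearest server.

```python
import heapq


def nearest_servers(adj, servers):
    """Multi-source Dijkstra: map every reachable node to its nearest server."""
    settled = {}
    heap = [(0, s, s) for s in servers]   # (distance, node, server it came from)
    heapq.heapify(heap)
    while heap:
        d, v, s = heapq.heappop(heap)
        if v in settled:
            continue                      # v was already reached by a closer server
        settled[v] = s
        for w, length in adj[v]:
            if w not in settled:
                heapq.heappush(heap, (d + length, w, s))
    return settled


def partition(adj, O, X):
    """Split the nodes into V_X (served by X) and V_O (served by O)."""
    ns = nearest_servers(adj, set(O) | set(X))
    V_X = {v for v, s in ns.items() if s in X}
    return V_X, set(ns) - V_X


# Path 0-1-2-3 with edge lengths 1, 2, 4; original server at 0, extra server at 2.
adj = {0: [(1, 1)], 1: [(0, 1), (2, 2)], 2: [(1, 2), (3, 4)], 3: [(2, 4)]}
print(partition(adj, {0}, {2}))  # ({2, 3}, {0, 1})
```

A single search costs $O(|E|\log|V|)$, which avoids computing all-pairs distances just to evaluate one placement.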
We assume that a newly placed extra server cannot be at the same location as an original server because, in many cases, the space for a service site is exclusive; for example, two restaurants cannot share the same space. Our model discretizes the available locations so that servers can be placed at nodes that are next to each other, but not at the same node.
We denote the nearest server that $v$ uses as $NS(v)$. Consequently, $NS(v)\in O$ if $v\in V_O$, and $NS(v)\in X$ if $v\in V_X$. We can now define the benefit function of adding the servers $X$ as follows:

$$B(X)=\sum_{v\in V_X} b\big(v,NS(v)\big)-\sum_{v\in X} c(v). \tag{1}$$

Next, we define the problems.
k-extra-server. Given a graph $G$, a set of original servers $O$, and an integer $k$, $1\le k\le|V-O|$, our objective is to find the optimal placement of $k$ extra servers $X\subseteq(V-O)$ such that the following benefit function is maximized:

$$\max_{X\subseteq(V-O),\ |X|=k} B(X). \tag{2}$$

Extra-server. Given a graph $G$ and a set of original servers $O$, our objective is to find the optimal placement of extra servers $X\subseteq(V-O)$ such that the following benefit function is maximized:

$$\max_{X\subseteq(V-O)} B(X). \tag{3}$$

Fig. 1 shows an instance of the problem. There are nine nodes and two original servers in the graph. The number next to a link is the distance between two nodes, and the number within a node is the construction cost of that node.
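For small instances, Eqs. (2) and (3) can be checked by exhaustive search, which gives a reference for testing the algorithms that follow. The sketch below is an illustration that rests on our own assumptions. Nodes are numbered $0,\dots,n-1$, edges are `(u, v, length)` triples, and $O$ is non-empty. The callables `b` and `c` stand in for $b(v,u)$ and $c(v)$, and all function names are hypothetical. The running time is exponential in $|V-O|$.

```python
from itertools import combinations


def all_pairs(n, edges):
    """Floyd-Warshall shortest-path lengths d(u, v)."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


def benefit(d, O, X, b, c):
    """B(X) of Eq. (1): every node goes to its nearest server in O ∪ X."""
    total = -sum(c(s) for s in X)
    for v in range(len(d)):
        ns = min(O | X, key=lambda s: d[v][s])
        if ns in X:
            total += b(v, ns)
    return total


def best_placement(n, edges, O, b, c, k=None):
    """Exhaustive search for Eq. (2) (fixed k) or Eq. (3) (k=None)."""
    d = all_pairs(n, edges)
    free = [v for v in range(n) if v not in O]
    sizes = [k] if k is not None else range(len(free) + 1)
    best = (float("-inf"), None)
    for size in sizes:
        for X in combinations(free, size):
            val = benefit(d, O, set(X), b, c)
            if val > best[0]:
                best = (val, set(X))
    return best


edges = [(0, 1, 1), (1, 2, 2), (2, 3, 4)]
print(best_placement(4, edges, {0}, lambda v, u: 10, lambda v: 4, k=1))  # (26, {1})
```

In this small path, a single extra server at node 1 captures nodes 1, 2 and 3. Adding a second server only adds construction cost, so the unconstrained optimum of Eq. (3) is also $\{1\}$.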
Fig. 2 shows the placement of extra servers for the problem in Fig. 1; the extra servers are placed at nodes $u$ and $v$, i.e., $X=\{u,v\}$. According to our model, nodes $z$, $u$, and $w$ will go to extra server $u$ for service, while nodes $v$ and $y$ will go to extra server $v$. All the other nodes will go to the original servers $x$ and $t$ for service. The benefit of this assignment is $B(X)=b(u,u)+b(z,u)+b(w,u)+b(v,v)+b(y,v)-c(u)-c(v)$. Note that node $w$ goes to extra server $u$ for service, even when the benefit of going to $v$ is greater, i.e., $b(w,v)>b(w,u)$. This is because a node always goes to the nearest server, which in our model is not necessarily the server that would yield the maximum benefit. We do not assume that the benefit is a monotonic function of the distance between a client and a server; instead, we assume that the benefit is an arbitrary function of a client and its server.

Fig. 2. The placement of extra servers for the problem illustrated in Fig. 1. The extra servers are placed at nodes $u$ and $v$.

Fig. 3. An illustration of $T_v^{(i)}$, the subtree consisting of $v$ and the first $i$ subtrees of $v$.

3.1. Finding extra server locations in tree networks
This section describes our dynamic programming methods for solving the k-extra-server problem and the extra-server problem. We deal with the k-extra-server problem first. We focus on the case where the graph $G=(V,E)$ is a tree. Let $T$ be the tree and $r$ be the root of $T$. For each node $v\in V$, let $T_v$ be the subtree of $T$ rooted at $v$. If $v$ is an internal node, then we use $child(v)=\{v_1,v_2,\dots,v_{|child(v)|}\}$ to denote the children of $v$. Following the notation used in [20], let $T_v^{(i)}$ be the subtree of $T$ that consists of $v$ and the subtrees rooted at the first $i$ children of $v$, i.e., $T_v^{(i)}=\{v\}\cup\bigcup_{j=1}^{i}T_{v_j}$, as shown in Fig. 3.
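The subtrees $T_v$ and $T_v^{(i)}$ depend on a fixed ordering of each node's children. The sketch below shows one way to obtain such an ordering and the corresponding node sets. It assumes an unweighted edge list over nodes $0,\dots,n-1$ and uses breadth-first discovery order as the child order $v_1, v_2, \dots$. Any fixed order would do, and the helper names are hypothetical.

```python
def rooted_children(n, edges, root):
    """Root an undirected tree; children[v] lists v_1, v_2, ... in a fixed order."""
    adj = [[] for _ in range(n)]
    for a, z in edges:
        adj[a].append(z)
        adj[z].append(a)
    children, order, seen = [[] for _ in range(n)], [root], {root}
    for x in order:                        # breadth-first traversal from the root
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                children[x].append(y)
                order.append(y)
    return children


def subtree(children, v):
    """T_v: v together with all of its descendants."""
    nodes, stack = set(), [v]
    while stack:
        x = stack.pop()
        nodes.add(x)
        stack.extend(children[x])
    return nodes


def partial_subtree(children, v, i):
    """T_v^(i) = {v} ∪ T_{v_1} ∪ ... ∪ T_{v_i}."""
    nodes = {v}
    for child in children[v][:i]:
        nodes |= subtree(children, child)
    return nodes


children = rooted_children(6, [(0, 1), (0, 2), (1, 3), (2, 4), (2, 5)], root=0)
print(partial_subtree(children, 0, 1))  # {0, 1, 3}
```

Note that $T_v^{(0)}=\{v\}$ and $T_v^{(|child(v)|)}=T_v$. These are the two ends of the recursion over $i$ used below.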
3.1.1. k-extra-server solution
We first define a key function of our dynamic programming method.
Definition 1 (Benefit Function, B). For nodes $v,u\in V$, an integer $k$, and an integer $i$ between 0 and $|child(v)|$, we define $B^{u}_{v,k,i}$ to be the maximum benefit derived by placing $k$ extra servers in $T_v^{(i)}$, under the condition that $u=NS(v)$. Consequently, $u$ can be either an original server or an extra server.

We now consider the benefit $B^{u}_{v,k,i}$ obtained by placing $X$ in $T_v^{(i)}$. We define $X$ as the set of $k$ extra servers that maximizes the following benefit function (recall that $O$ is the set of original servers):

$$B^{u}_{v,k,i}=\max_{X}\Big(\sum_{w\in T_v^{(i)},\ NS(w)\in X\cup\{u\}} b\big(w,NS(w)\big)-\sum_{s\in X}c(s)\Big),\quad u\notin O,$$

$$B^{u}_{v,k,i}=\max_{X}\Big(\sum_{w\in T_v^{(i)},\ NS(w)\in X} b\big(w,NS(w)\big)-\sum_{s\in X}c(s)\Big),\quad u\in O.$$

The definition indicates that the benefit includes the nodes that go either to the extra servers $X$ or to $u$ (when $u\notin O$) for service, minus the construction cost of the extra server set $X$. When $u$ is not in $O$, $u$ is an extra server because, by definition, $u$ is $v$'s nearest server. However, since $u$ may lie outside $T_v^{(i)}$, it is not necessarily in $X$, because $X$ is a subset of $T_v^{(i)}$. We still need to add the benefit that $u$ collects from $T_v^{(i)}$, because we assume that an extra server is placed at $u$.
To derive the benefit function B, we need the following ‘‘isolation’’ lemma, which guides the search for the nearest server in an isolated area so that dynamic programming is possible.
Lemma 1. For every node $v\in V$ and every child $v_i$ of $v$, if $u\in T_{v_i}$ is the nearest server to $v$, then $u$ is also the nearest server to $v_i$.

Proof. We prove this lemma by contradiction. Assume that the nearest server to $v_i$ is $u'$, not $u$. Since $u'$ is the nearest server to $v_i$, the distance $d(v_i,u')$ must be strictly smaller than $d(v_i,u)$. Because $u\in T_{v_i}$, the path from $v$ to $u$ passes through $v_i$, so $d(v,u)=d(v,v_i)+d(v_i,u)$. The length of the shortest path between $v$ and $u'$ therefore satisfies $d(v,u')\le d(v,v_i)+d(v_i,u')<d(v,v_i)+d(v_i,u)=d(v,u)$, which implies that $u'$ is closer to $v$ than $u$; this contradicts the assumption that $u$ is $v$'s nearest server.

For ease of discussion of the following lemma, we define a node set $V_{v,u,i}$. The set contains those nodes in $T_{v_i}$ that could be the nearest server to $v_i$, under the condition that $u$ is the nearest server to $v$ but not to $v_i$, i.e., $NS(v)=u$ and $NS(v_i)\neq u$. Intuitively, the set $V_{v,u,i}$ contains those nodes in $T_{v_i}$ that are far enough from $v$ that none of them could be the nearest server to $v$ (when compared with $u$), but close enough to $v_i$ that one of them could be the nearest server to $v_i$.

Definition 2 ($V_{v,u,i}$). Let $u$ be the nearest server to $v$ and $i$ be an integer between 1 and $|child(v)|$. $V_{v,u,i}$ is the set of nodes $u'$ in $T_{v_i}$ that could be the nearest server to $v_i$ but not the nearest server to $v$. That is,

$$V_{v,u,i}=\{\,u' \mid u'\in T_{v_i},\ d(v_i,u')<d(v_i,u),\ d(v,u)<d(v,u')\,\}.$$
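Definition 2 is a direct filter over $T_{v_i}$. The sketch below computes it from a tree distance table. It assumes nodes $0,\dots,n-1$ and edges given as `(a, z, length)` triples, and the names `tree_dist` and `candidate_set` are hypothetical. In the example, the child $v_i=2$ itself is excluded because it is closer to $v=1$ than $u=0$ is, while the deeper node 3 qualifies.

```python
def tree_dist(n, edges):
    """All-pairs path lengths in a weighted tree (one DFS per source)."""
    adj = [[] for _ in range(n)]
    for a, z, w in edges:
        adj[a].append((z, w))
        adj[z].append((a, w))
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        stack = [(s, -1, 0)]
        while stack:
            x, p, dx = stack.pop()
            dist[s][x] = dx
            stack.extend((y, x, dx + w) for y, w in adj[x] if y != p)
    return dist


def candidate_set(dist, subtree_vi, v, vi, u):
    """V_{v,u,i}: nodes of T_{v_i} closer to v_i than u, but farther from v than u."""
    return {w for w in subtree_vi
            if dist[vi][w] < dist[vi][u] and dist[v][u] < dist[v][w]}


# Path 0-1-2-3 with edge lengths 2, 1, 2; v = 1, its child v_i = 2, NS(v) = u = 0.
dist = tree_dist(4, [(0, 1, 2), (1, 2, 1), (2, 3, 2)])
print(candidate_set(dist, {2, 3}, v=1, vi=2, u=0))  # {3}
```

The second condition guarantees that choosing a candidate as $NS(v_i)$ never contradicts the assumption $NS(v)=u$.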
Lemma 2. For every node $v\in V$ and every child $v_i$ of $v$, if $u\notin T_{v_i}$ is the nearest server to $v$, then either $u$ is the nearest server to $v_i$, or there exists a node $u'\in V_{v,u,i}$ that is the nearest server to $v_i$.

Proof. If $u$ is the nearest server to $v_i$, the lemma follows. Otherwise, the nearest server to $v_i$ must be in $T_{v_i}$, since any path from $v_i$ to a node not in $T_{v_i}$ must pass through $v$, which already has $u$ as its nearest server. The lemma then follows from the definition of $V_{v,u,i}$.
We now derive a key property required to locate the nearest server.
Theorem 1. For every node $v\in V$ and an integer $i$ between 1 and $|child(v)|$, if $u$ is the nearest server to $v$, then for every node $w$ in $T_{v_i}$, we can find the nearest server to $w$ in $T_{v_i}\cup\{u\}$.

Proof. A shortest path from a node $w$ in $T_{v_i}$ to any server $s$ outside $T_{v_i}$ must pass through the edge $(v_i,v)$, so its length is $d(w,v)+d(v,s)$. Since $u$ is the nearest server to $v$, $d(v,u)<d(v,s)$ for every other server $s$, and hence $d(w,u)\le d(w,v)+d(v,u)<d(w,s)$. Otherwise, we would be able to find a server closer to $v$ than $u$, which would contradict the fact that $NS(v)=u$.

Terminal conditions. We first derive two terminal conditions for $k=0$. When $k$ equals 0, we do not place any extra servers in $T_v^{(i)}$. If $u$ is an original server in $O$, every node in $T_v^{(i)}$ will go to $O$ for service, so the benefit is 0. If $u$ is not in $O$, we consider two cases. First, if $u$ is not in $T_v^{(i)}$, every node in $T_v^{(i)}$ will go either to an original server or to $u$ for service; thus, the benefit can be determined by Eq. (4):

$$B'=\sum_{w\in T_v^{(i)},\ d(w,u)<d(w,O)} b(w,u). \tag{4}$$

In the second case, $u$ is not an original server, but it is in $T_v^{(i)}$, which means that there is at least one extra server in $T_v^{(i)}$. This contradicts the assumption that $k$ is 0. For the purpose of dynamic programming, we define the benefit to be $-\infty$.
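Eq. (4) is a single pass over the nodes of $T_v^{(i)}$. The sketch below assumes a precomputed distance table `dist`, a set `O` of original servers, and a callable `b`; the name `b_prime` is a hypothetical label. The same sum, minus $c(u)$, also gives the $k=1$ terminal value discussed next, where $u$ lies inside the node set.

```python
import math


def b_prime(nodes, u, dist, O, b):
    """Eq. (4): benefit that extra server u collects from `nodes`, with no other extra servers."""
    total = 0
    for w in nodes:
        d_orig = min((dist[w][o] for o in O), default=math.inf)   # d(w, O)
        if dist[w][u] < d_orig:           # w prefers u over every original server
            total += b(w, u)
    return total


# Path 0-1-2-3 with edge lengths 1, 2, 4; original server at node 0.
dist = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
# k = 0 case: T_2 = {2, 3} with u = 1 outside it.
print(b_prime({2, 3}, 1, dist, {0}, lambda w, u: 10))  # 20
```

For the $k=1$ case with $u=2$ inside $\{1,2,3\}$, node 1 still prefers the original server, so the terminal value would be $20-c(2)$.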
$k=1$, $u\notin O$, $u\in T_v^{(i)}$. When $k$ equals 1 and $u$ is in $T_v^{(i)}$, $u$ is not an original server, and it is the only extra server in $T_v^{(i)}$. Every node in $T_v^{(i)}$ will go either to $O$ or to $u$ for service; thus, the benefit can be calculated as $B'-c(u)$, with $B'$ computed as in Eq. (4). Note that, since $u$ is now in the $X$ that maximizes the benefit of $T_v^{(i)}$, $c(u)$ must be deducted from the benefit.
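Recursion. Next, we derive the recursion for $B^{u}_{v,k,i}$. For ease of explanation, we define $E^{u}_{v,k,i}$ in Definition 3, when we discuss the case where $u\notin T_{v_i}$.

$$B^{u}_{v,k,i}=\begin{cases}0, & \text{if } k=0 \text{ and } u\in O,\\ B', & \text{if } k=0,\ u\notin O, \text{ and } u\notin T_v^{(i)},\\ B'-c(u), & \text{if } k=1,\ u\notin O, \text{ and } u\in T_v^{(i)},\\ B'', & \text{if } u\in T_{v_i},\\ \max\{B'',B'''\}, & \text{if } u\notin T_{v_i},\\ -\infty, & \text{otherwise,}\end{cases} \tag{5}$$

where

$$B''=\max_{0\le j\le k}\Big(B^{u}_{v,k-j,i-1}+B^{u}_{v_i,j,|child(v_i)|}\Big), \tag{6}$$

and

$$B'''=\max_{0\le j\le k}\Big(B^{u}_{v,k-j,i-1}+E^{u}_{v,j,i}\Big). \tag{7}$$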
The first three cases were discussed under the terminal conditions, so we only need to consider the rest.

$u\in T_{v_i}$. If $u\in T_{v_i}$, then by Lemma 1, $u$ will also be the nearest server to $v_i$, since $u$ is the nearest server to $v$. Then, by Theorem 1, every node in $T_{v_i}$ will go either to a server in $T_{v_i}$ or to $u$ for service. In addition, since $u$ is the nearest server to $v$, by Theorem 1 all nodes in $T_v^{(i-1)}$ obtain service from $u$ or from a server in $T_v^{(i-1)}$, as shown in Fig. 4.

Assume that there are $j$ extra servers in $T_{v_i}$; then there will be $k-j$ extra servers in $T_v^{(i-1)}$, where $0\le j\le k$. To obtain the $X$ that maximizes the benefit, we need to consider all possible values of $j$, as formulated in Eq. (6). The recursion follows.
$u\notin T_{v_i}$. If $u$ is not in $T_{v_i}$, we need to consider two sub-cases.

Case 1. If $u$ is the nearest server to $v_i$, the value of $B^{u}_{v,k,i}$ is defined by Eq. (6), because we can isolate the two subtrees, as we did in the case where $u\in T_{v_i}$.

Case 2. If the nearest server to $v_i$ is not $u$, then by Lemma 2, we can find the nearest server $u'$ to $v_i$ in $T_{v_i}$. We formulate the benefit as $B'''$ in Eq. (7), and illustrate it in Fig. 5.

Considering these two sub-cases, if $u\notin T_{v_i}$, $B^{u}_{v,k,i}$ is formulated as $\max\{B'',B'''\}$. Now, the only element needed to finish the recursion is the auxiliary function $E^{u}_{v,k,i}$.
Fig. 4. An illustration of $B''$ when $j$ extra servers are placed in $T_{v_i}$.

Fig. 5. An illustration of $B'''$ when $j$ extra servers are placed in $T_{v_i}$.
Definition 3 ($E^{u}_{v,k,i}$). For nodes $v,u\in V$, an integer $k$, and the $i$-th child of node $v$ (denoted by $v_i$), we define $E^{u}_{v,k,i}$ as the maximum benefit derived by placing $k$ extra servers in the subtree $T_{v_i}$, where $u\notin T_{v_i}$ is the nearest server to $v$ but not the nearest server to $v_i$. Instead, the nearest server to $v_i$ is some $u'$ in $T_{v_i}$. The benefit is defined similarly in Eq. (8):

$$E^{u}_{v,k,i}=\max_{X}\Big(\sum_{w\in T_{v_i},\ NS(w)\in X} b\big(w,NS(w)\big)-\sum_{s\in X}c(s)\Big). \tag{8}$$

Based on the above discussion, the maximum benefit $E^{u}_{v,k,i}$ can be derived by Eq. (9). That is, we enumerate all possible instances of $u'$ and use the one that maximizes $B^{u'}_{v_i,k,|child(v_i)|}$; the set of candidates for $u'$ is exactly $V_{v,u,i}$:

$$E^{u}_{v,k,i}=\max_{u'\in V_{v,u,i}} B^{u'}_{v_i,k,|child(v_i)|}. \tag{9}$$
The final solution. Finally, the maximum benefit of locating $k$ extra servers in the tree $T$ can be calculated by Eq. (10):

$$\max_{u\in T} B^{u}_{r,k,|child(r)|}. \tag{10}$$

The possible candidates for $u$ are subject to the following constraints: (1) if $u$ is an original server, $d(r,u)$ must equal $d(r,O)$, i.e., $u$ is the nearest original server to the root; and (2) if $u$ is not an original server, the distance $d(r,u)$ must be smaller than $d(r,O)$ to ensure that $u$ is the nearest server to the root.

Proof of optimality. Next, we show that the formulation of $B^{u}_{v,k,i}$ in Eq. (5) computes the optimal placement correctly. Let $B^{u}_{v,X,i}$ be the benefit of placing an extra server set $X$ in $T_v^{(i)}$, where $u$ is the nearest server to $v$ and $X$ is a subset of $T_v^{(i)}$. We want to show that $B^{u}_{v,k,i}=B^{u}_{v,X_{opt},i}$ when $X_{opt}$ is an optimal placement of $k$ extra servers in $T_v^{(i)}$.

Theorem 2. Let $X_{opt}$ be an optimal placement of $k$ extra servers in $T_v^{(i)}$; then $B^{u}_{v,k,i}=B^{u}_{v,X_{opt},i}$.

Proof. The first three cases of the $B^{u}_{v,k,i}$ recursion in Eq. (5) are terminal cases, so we only consider Eqs. (6) and (7).
In both cases, we prove the theorem by induction on the number of nodes in $T_v^{(i)}$. The induction hypothesis is that $B^{u}_{v,k,i}=B^{u}_{v,X_{opt},i}$ holds whenever $T_v^{(i)}$ has at most $n-1$ nodes. Our objective is to show that the claim holds for all $n$.

First, we consider the case where $u$ is in $T_{v_i}$ (Eq. (6)). Since $u$ is the nearest server to $v$, by Lemma 1 it is also the nearest server to $v_i$. By Theorem 1, the benefit $B^{u}_{v,X_{opt},i}$ can be separated into two independent parts:

$$B^{u}_{v,X_{opt},i}=B^{u}_{v,\,X_{opt}\cap T_v^{(i-1)},\,i-1}+B^{u}_{v_i,\,X_{opt}\cap T_{v_i},\,|child(v_i)|}. \tag{11}$$

Since $X_{opt}$ is an optimal solution only for $T_v^{(i)}$, $X_{opt}\cap T_v^{(i-1)}$ may not be an optimal solution for $T_v^{(i-1)}$. Similarly, $X_{opt}\cap T_{v_i}$ may not be an optimal solution for $T_{v_i}$ either. If $|X_{opt}\cap T_v^{(i-1)}|$ is $k-j$ and $|X_{opt}\cap T_{v_i}|$ is $j$, then by the induction hypothesis, the optimal solutions for $T_v^{(i-1)}$ and $T_{v_i}$ accrue benefits $B^{u}_{v,k-j,i-1}$ and $B^{u}_{v_i,j,|child(v_i)|}$, respectively, and we have

$$B^{u}_{v,X_{opt},i}\le B^{u}_{v,k-j,i-1}+B^{u}_{v_i,j,|child(v_i)|}. \tag{12}$$

By definition, $B''$ maximizes the benefit by adjusting the number of extra servers in $T_v^{(i-1)}$ and $T_{v_i}$, i.e., $B''=\max_{0\le j\le k}\{B^{u}_{v,k-j,i-1}+B^{u}_{v_i,j,|child(v_i)|}\}$. Therefore,

$$B^{u}_{v,X_{opt},i}\le B''=B^{u}_{v,k,i}. \tag{13}$$

Conversely, $B''$ is attained by a feasible placement of $k$ extra servers in $T_v^{(i)}$, so it cannot exceed the optimum $B^{u}_{v,X_{opt},i}$. We therefore conclude that $B^{u}_{v,X_{opt},i}$ is indeed $B^{u}_{v,k,i}$, and the theorem follows.

The proof for the case where $u$ is not in $T_{v_i}$ (Eq. (7)) is similar. We can separate $T_v^{(i-1)}$ and $T_{v_i}$ by Theorem 1, and argue that $E^{u}_{v,k,i}$ correctly computes the maximum benefit by the definition of $V_{v,u,i}$.
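The recursion of Eqs. (4)–(10) can be transcribed almost line by line. The Python sketch below is our own illustrative transcription, not the authors' implementation. It makes the following assumptions:

- Nodes are numbered $0,\dots,n-1$, and the tree is given as `(a, z, length)` triples.
- `b(w, u)` and `c(u)` are callables.
- $O$ is a set, and child order follows breadth-first discovery.
- $B^{u}_{v,k,i}$ and $E^{u}_{v,k,i}$ are memoized with `functools.lru_cache`.

For readability, it recomputes node sets rather than meeting the $O(|V|^3k)$ bound of Theorem 3. It also includes an exhaustive search of Eq. (2) so that the two can be compared on small trees.

```python
import math
from functools import lru_cache
from itertools import combinations

NEG = float("-inf")


def tree_distances(n, edges):
    """Adjacency lists and all-pairs path lengths of a weighted tree."""
    adj = [[] for _ in range(n)]
    for a, z, w in edges:
        adj[a].append((z, w))
        adj[z].append((a, w))
    dist = [[0] * n for _ in range(n)]
    for s in range(n):                       # one DFS per source node
        stack = [(s, -1, 0)]
        while stack:
            x, p, dx = stack.pop()
            dist[s][x] = dx
            stack.extend((y, x, dx + w) for y, w in adj[x] if y != p)
    return adj, dist


def k_extra_brute(n, edges, O, b, c, k):
    """Eq. (2) by exhaustive search (reference for tiny trees)."""
    _, dist = tree_distances(n, edges)
    best = NEG
    for X in combinations([v for v in range(n) if v not in O], k):
        total = -sum(c(s) for s in X)
        for v in range(n):
            ns = min(O | set(X), key=lambda s: dist[v][s])
            if ns in X:
                total += b(v, ns)
        best = max(best, total)
    return best


def k_extra_dp(n, edges, root, O, b, c, k):
    """Tree DP of Eqs. (4)-(10) for the k-extra-server problem."""
    adj, dist = tree_distances(n, edges)
    children, order, parent = [[] for _ in range(n)], [root], {root: None}
    for x in order:                          # BFS; fixes the child order v_1, v_2, ...
        for y, _ in adj[x]:
            if y != parent[x]:
                parent[y] = x
                children[x].append(y)
                order.append(y)
    sub = [None] * n                         # sub[x] = node set of T_x
    for x in reversed(order):
        sub[x] = {x}.union(*(sub[y] for y in children[x]))
    dO = [min((dist[w][o] for o in O), default=math.inf) for w in range(n)]

    def b_prime(nodes, u):                   # Eq. (4)
        return sum(b(w, u) for w in nodes if dist[w][u] < dO[w])

    @lru_cache(maxsize=None)
    def B(v, kk, i, u):                      # B^u_{v,kk,i}, Eq. (5)
        Ti = {v}.union(*(sub[y] for y in children[v][:i]))
        if kk == 0:
            if u in O:
                return 0
            return NEG if u in Ti else b_prime(Ti, u)
        if kk == 1 and u not in O and u in Ti:
            return b_prime(Ti, u) - c(u)
        if i == 0:                           # no child subtree left to split off
            return NEG
        vi = children[v][i - 1]
        best = NEG
        for j in range(kk + 1):              # j extra servers go into T_{v_i}
            right = B(vi, j, len(children[vi]), u)       # B'' term, Eq. (6)
            if u not in sub[vi]:
                right = max(right, E(v, j, i, u))        # B''' term, Eq. (7)
            best = max(best, B(v, kk - j, i - 1, u) + right)
        return best

    @lru_cache(maxsize=None)
    def E(v, kk, i, u):                      # E^u_{v,kk,i}, Eq. (9)
        vi = children[v][i - 1]
        cands = [w for w in sub[vi]          # the set V_{v,u,i}
                 if dist[vi][w] < dist[vi][u] and dist[v][u] < dist[v][w]]
        return max((B(vi, kk, len(children[vi]), w) for w in cands), default=NEG)

    best = NEG
    for u in range(n):                       # Eq. (10) with constraints (1) and (2)
        if u in O and dist[root][u] != dO[root]:
            continue
        if u not in O and dist[root][u] >= dO[root]:
            continue
        best = max(best, B(root, k, len(children[root]), u))
    return best
```

When testing this sketch, use edge lengths that are distinct powers of two. That choice makes every node's distances to other nodes pairwise distinct, as the model requires. On such small random trees, `k_extra_dp` and `k_extra_brute` should agree.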
Time complexity analysis. We now analyze the time complexity of our solution.
Theorem 3. Given a tree $T=(V,E)$ and a set $O\subseteq V$ of original servers, the k-extra-server problem for $T$ can be solved in $O(|V|^3k)$ time, where $0\le k\le|V-O|$ is an integer.

Proof. The problem can be solved by Eqs. (4)–(10). The time required for dynamic programming is derived by calculating all the entries in $B^{u}_{v,k,i}$ and $E^{u}_{v,k,i}$. Consider each pair of $v$ and $i$; there are in total $\sum_{v\in V}|child(v)|=|V|-1$ pairs. Thus, the number of entries in $B^{u}_{v,k,i}$ is $(k+1)\cdot|V|\cdot(|V|-1)=O(|V|^2k)$, and it takes $O(|V|)$ time to calculate each entry; hence, the time required to calculate all the entries in $B^{u}_{v,k,i}$ is bounded by $O(|V|^3k)$. Similarly, there are $O(|V|^2k)$ entries in $E^{u}_{v,k,i}$, and it takes $O(|V|)$ time to calculate each entry; therefore, the time required to calculate all the entries in $E^{u}_{v,k,i}$ is $O(|V|^3k)$. The total time required is therefore $O(|V|^3k)$.

Table 1 shows the execution time of the dynamic programming for different tree network sizes when the numbers of extra servers and original servers are both set to $0.3|V|$. The processing time remains below 0.3 s for trees with up to 350 nodes.

3.1.2. Extra-server solution
We now consider the extra-server problem. Placing more extra servers captures more user requests and hence more profit; however, the net benefit may not increase, because each extra server also incurs a construction cost. For this reason, we do not know in advance how many extra servers we should place in order to maximize the benefit.
Table 1
Average processing time of the dynamic programming for tree networks.
|V|    Average execution time (s)
100    0.0072
150    0.0225
200    0.0540
250    0.1091
300    0.1813
350    0.2769
A simple approach is to search over the parameter $k$ in Eq. (10) by using the dynamic programming of the k-extra-server problem. To do this, we compute
$$\max_{u \in T,\; 0 \le k \le |V - O|} B^{r,u}_{k,|child(r)|}, \qquad (14)$$
where the node $u$ is the nearest server to the root.
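For very small instances, the outer maximization over $k$ in Eq. (14) can be sanity-checked by exhaustive search. The sketch below is not the paper's dynamic programming: it enumerates every placement of $k$ extra servers, so it is exponential and only useful as a reference. It assumes a precomputed distance matrix `D`, benefit matrix `b`, cost list `c`, and original server set `O` (hypothetical names), that each node is served by its nearest server, and that distance ties go to original servers (consistent with the strict inequality $d(r,u) < d(r,O)$ required of extra servers), with any remaining tie going to the lower-numbered node.

```python
from itertools import combinations


def net_benefit(X, D, b, c, O):
    """Benefit of extra-server set X minus its construction cost (assumed model)."""
    servers = set(O) | set(X)
    if not servers:
        return 0
    total = -sum(c[u] for u in X)
    for v in range(len(D)):
        # Nearest server; ties favour original servers, then the smaller index.
        s = min(servers, key=lambda u: (D[v][u], u not in O, u))
        if s not in O:
            total += b[v][s]  # only clients of extra servers earn benefit
    return total


def best_for_k(k, D, b, c, O):
    """Exhaustive optimum of the k-extra-server problem (tiny inputs only)."""
    candidates = [u for u in range(len(D)) if u not in O]
    return max(net_benefit(X, D, b, c, O) for X in combinations(candidates, k))


def best_over_k(D, b, c, O):
    """Naive outer search over k, mirroring the maximization in Eq. (14)."""
    n_candidates = len(D) - len(O)
    return max(best_for_k(k, D, b, c, O) for k in range(n_candidates + 1))


# Path 0-1-2-3 with unit edges, an original server at node 0,
# a flat benefit of 4 per client and a construction cost of 3 per extra server.
D = [[abs(i - j) for j in range(4)] for i in range(4)]
b = [[4] * 4 for _ in range(4)]
c = [0, 3, 3, 3]
O = {0}
print([best_for_k(k, D, b, c, O) for k in range(4)])  # [0, 9, 6, 3]
print(best_over_k(D, b, c, O))  # 9
```

In this toy instance a second or third extra server lowers the net benefit, which is why the number of extra servers has to be searched rather than fixed.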
The time complexity of this naive approach is $O(|V|^5)$, which is too expensive for large cases. In the following, we formulate a much more efficient dynamic programming approach with a time complexity of $O(|V|^3)$. We begin by defining a key function of our dynamic programming technique.
Definition 4 (Benefit Function). For nodes $v, u \in V$ and an integer $i$ between 0 and $|child(v)|$, we define $B^{v,u}_{i}$ as the maximum benefit derived by placing extra servers in $T_v^{(i)}$ without any constraint on the number of extra servers, subject to the condition that $u$ is the nearest server for node $v$, i.e., $u = NS(v)$. Consequently, $u$ is either an original server or an extra server.
The benefit function $B^{v,u}_{i}$ is similar to $B^{v,u}_{k,i}$ defined in Section 3.1.1, except that there is no constraint on the number of extra servers.
Terminal conditions. We first derive two terminal conditions for the recursion of the $B$ function.
$i = 0$, $u \in O$: When $i$ equals 0, there is only one node (i.e., $v$) in $T_v^{(0)}$. If $u \in O$, the benefit is obviously 0.
$i = 0$, $u \notin O$: Since there is only one node $v$ in $T_v^{(0)}$, the benefit is $b(v, u)$. In addition, if $u$ is $v$ itself, i.e., $v$ is an extra server, $c(v)$ must be deducted from the benefit. The benefit $B'$ is therefore
$$B' = \begin{cases} b(v,u), & v \ne u,\\ b(v,u) - c(v), & v = u. \end{cases} \qquad (15)$$
Recursion. Now, we can derive the recursion for $B^{v,u}_{i}$. The maximum benefit $B^{v,u}_{i}$ is given by the following:
$$B^{v,u}_{i} = \begin{cases} 0, & \text{if } i = 0 \text{ and } u \in O,\\ B', & \text{if } i = 0 \text{ and } u \notin O,\\ B'', & \text{if } u \in T_{v_i},\\ \max(B'', B'''), & \text{if } u \notin T_{v_i}, \end{cases} \qquad (16)$$
where
$$B'' = B^{v,u}_{i-1} + B^{v_i,u}_{|child(v_i)|}, \qquad (17)$$
and
$$B''' = B^{v,u}_{i-1} + E^{v,u}_{i}. \qquad (18)$$
The first two cases were discussed as terminal conditions. The cases where $u$ is in or not in $T_{v_i}$ are similar to the discussion of Eq. (5). Now, to finish the recursion, we only need the new function $E^{v,u}_{i}$.
Definition 5 ($E^{v,u}_{i}$). For nodes $v, u \in V$ and the $i$-th child $v_i$ of node $v$, $E^{v,u}_{i}$ is the maximum benefit derived by placing extra servers in the subtree $T_{v_i}$ without any constraint on the number of extra servers, where $u \notin T_{v_i}$ is the nearest server to $v$; however, $u$ is not the nearest server to $v_i$. Instead, the nearest server to $v_i$ is a node $u'$ in $T_{v_i}$. The maximum benefit $E^{v,u}_{i}$ is given by the following:
$$E^{v,u}_{i} = \max_{u' \in V^{v,u,i}} B^{v_i,u'}_{|child(v_i)|}. \qquad (19)$$
The final solution. Finally, the maximum benefit of locating extra servers in the tree $T$ is calculated by Eq. (20):
$$\max_{u \in T} B^{r,u}_{|child(r)|}. \qquad (20)$$
The possible candidates for $u$ are subject to the following constraints: (1) if $u$ is an original server, $d(r,u)$ must be $d(r,O)$, i.e., $u$ is the nearest original server to the root; and (2) if $u$ is not an original server, the distance $d(r,u)$ must be smaller than $d(r,O)$ to ensure that $u$, an extra server, is the nearest server to the root.
Proof of optimality. We now show that the formulation of $B^{v,u}_{i}$ in Eq. (16) computes the optimal placement correctly. Let $B^{v,u}_{X,i}$ be the benefit of placing an extra server set $X$ in $T_v^{(i)}$, where $u$ is the nearest server to $v$ and $X$ is a subset of $T_v^{(i)}$. Our objective is to show that $B^{v,u}_{i} = B^{v,u}_{X_{opt},i}$ when $X_{opt}$ is an optimal placement of extra servers in $T_v^{(i)}$.
Theorem 4. Let $X_{opt}$ be an optimal placement of extra servers in $T_v^{(i)}$; then $B^{v,u}_{i} = B^{v,u}_{X_{opt},i}$.
Proof. The proof is similar to the proof of Theorem 2.
Time complexity analysis.
Theorem 5. Given a tree $T = (V, E)$ in which $O \subseteq V$ are the original servers of $T$, the extra-server problem for $T$ can be solved in $O(|V|^3)$ time.
Proof. The proof is similar to that of Theorem 3. There are $O(|V|^2)$ entries in $B^{v,u}_{i}$ and $O(|V|^2)$ entries in $E^{v,u}_{i}$, and the calculation of each entry requires at most $O(|V|)$ time. Hence, the problem can be solved in $O(|V|^3)$ time.
3.2. Finding extra server locations in general graphs
In this section, we show that the k-extra-server and extra-server problems are NP-complete when the network topology is a general graph, and then propose heuristic solutions.
3.2.1. NP-completeness
The NP-completeness proof is a reduction from the dominating set problem [15]. A subset $V' \subseteq V$ is a dominating set if for all $u \in V - V'$, there is a $v \in V'$ such that the edge $(u, v)$ is in $E$. The decision version of the dominating set problem can be formulated as follows: given a graph $G = (V, E)$ and a positive integer $K \le |V|$, is there a dominating set of size $K$ or less?
k-extra-server. We now consider the k-extra-server problem and define the corresponding decision problem as follows: in an instance of a k-extra-server problem, is there a placement of $k$ extra servers such that the benefit is at least $B$?
Extra-server. Similarly, we define the extra-server decision problem as follows: in an instance of an extra-server problem, is there a placement of extra servers such that the benefit is at least $B$?
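Both decision problems ask whether a placement reaches the benefit goal $B$, and checking one proposed placement takes polynomial time, which is the fact the NP-membership arguments rely on. The sketch below illustrates such a check under stated assumptions: the graph is given as `(u, v, length)` triples, each node is served by its nearest server, and distance ties favor original servers and then the lower node index. It computes all-pairs shortest paths with Floyd–Warshall in $O(|V|^3)$ time and then evaluates the placement in $O(|V| \cdot (|O| + k))$ time. The function and parameter names are hypothetical.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall over an undirected graph of (u, v, length) triples."""
    D = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        D[u][v] = D[v][u] = min(D[u][v], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    return D


def verify_certificate(n, edges, O, X, k, b, c, B):
    """Accept iff X places exactly k extra servers outside O with benefit >= B."""
    X = set(X)
    if len(X) != k or X & set(O):
        return False
    D = all_pairs_shortest_paths(n, edges)
    servers = set(O) | X
    if not servers:
        return 0 >= B
    total = -sum(c[u] for u in X)
    for v in range(n):
        # Assumed tie-breaking: original servers first, then smaller index.
        s = min(servers, key=lambda u: (D[v][u], u not in O, u))
        if s in X:
            total += b[v][s]
    return total >= B


# Path 0-1-2-3, original server at 0, benefit 4 per client, cost 3 per server.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
b = [[4] * 4 for _ in range(4)]
c = [0, 3, 3, 3]
print(verify_certificate(4, edges, {0}, [1], 1, b, c, B=9))   # True
print(verify_certificate(4, edges, {0}, [1], 1, b, c, B=10))  # False
```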
Theorem 6. The k-extra-server problem is NP-complete.
Proof. The k-extra-server problem is in NP because a non-deterministic Turing machine can non-deterministically place the $k$ servers and verify that the total benefit is at least $B$ in polynomial time with respect to the input size.
We first define an at-most-k-extra-server problem and show that the dominating set problem is a special case of it. The at-most-k-extra-server problem is similar to the k-extra-server problem, except that we can place no more than $k$ servers.
Given a dominating set problem, we construct an at-most-k-extra-server problem as follows. We use the graph $G = (V, E)$ from the dominating set problem. For each node $v \in V$, we set $c(v) = 0$. For each edge $(u, v) \in E$, we set $d(u, v) = 1$ and $b(v, u) = 1$. If $(v, u)$ is not in $E$, we set $b(v, u)$ to 0. We also set $b(v, v)$ to 1 for all $v$. Finally, we set the original server set $O$ to be empty, $B$ to $|V|$, and $k$ to $K$. The construction can be done in polynomial time since the constructed graph has the same size as the original graph $G$, and the benefit function, the distance function, and the parameters can be computed directly from $G$.
In this at-most-k-extra-server problem instance, the optimal benefit is at most $B = |V|$, which can only be achieved when every node is either an extra server or adjacent to an extra server. Consequently, an extra server set achieves benefit $B$ if and only if it is a dominating set, so the graph has a dominating set of size at most $K$ if and only if we can find a set of at most $K$ extra servers whose benefit is $B = |V|$. It is obvious that the at-most-k-extra-server problem is in NP. As a result, the problem is NP-complete.
Now, let us consider the at-most-k-extra-server and k-extra-server problems under the restriction that the construction cost of every node is 0. Clearly, there is no distinction between these two decision problems: in the constructed instance, construction is free, so a placement of fewer than $k$ servers that achieves the benefit of $|V|$ can be extended to exactly $k$ servers without losing that benefit. Therefore, both problems are NP-complete. Since the k-extra-server problem without construction costs is already NP-complete, its general form, the k-extra-server problem with construction costs, is also NP-complete.
Theorem 7. The extra-server problem is NP-complete.
Proof. The proof is similar to that of Theorem 6. The extra-server problem is in NP because a non-deterministic Turing machine can non-deterministically place servers and verify that the total benefit is at least $B$ in polynomial time with respect to the input size.
Now, given a dominating set problem, we translate it into an extra-server problem as follows. The graph $G = (V, E)$ remains the same. The benefit function $b(v, u)$ is set to $2|V|$ if the edge $(v, u)$ is in $E$ or $v = u$; otherwise, $b(v, u)$ is set to 0. The construction cost at each node is set to 1, and the benefit goal $B$ is set to $2|V|^2 - K$. The construction can be done in polynomial time since the constructed graph has the same size as the original graph $G$, and the benefit function, the construction cost function, and the parameters can be computed directly from $G$.
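We argue that the dominating set problem instance has a solution if and only if the constructed extra-server problem instance has a solution. Since $b(v, u)$ is set to a large value, every node must either host a server or be adjacent to one for the placement to reach a benefit of $2|V|^2 - K$. Concretely, a placement of $s$ servers that dominates every node has benefit $2|V|^2 - s$, which reaches the goal exactly when $s \le K$. If some node neither hosts a server nor is adjacent to one, the placement loses at least $2|V|$ in benefit, and this loss cannot be compensated for by the saving in construction costs, which is at most $|V|$: its benefit is at most $2|V|(|V| - 1) < 2|V|^2 - K$ because $K \le |V|$. Therefore, a server placement can achieve at least $2|V|^2 - K$ if and only if there is a dominating set of size at most $K$. The theorem follows.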
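The counting argument for Theorem 7 can be checked mechanically on small graphs. The following sketch builds the Theorem 7 instance from a graph, assuming unit edge lengths (as in the construction for Theorem 6) and an empty original server set, finds the best extra-server placement and the minimum dominating set by brute force, and asserts that the goal $2|V|^2 - K$ is reachable exactly when a dominating set of size at most $K$ exists. It is exponential and intended only as an illustration; all names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hop_distances(n, edges):
    """BFS distances, i.e. shortest paths with unit edge lengths (assumed)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    D = []
    for s in range(n):
        dist = [float("inf")] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if dist[y] == float("inf"):
                    dist[y] = dist[x] + 1
                    queue.append(y)
        D.append(dist)
    return D


def theorem7_instance(n, edges):
    """b(v,u) = 2|V| on edges and for v == u, 0 otherwise; c(v) = 1."""
    E = {frozenset(e) for e in edges}
    b = [[2 * n if v == u or frozenset((v, u)) in E else 0 for u in range(n)]
         for v in range(n)]
    return b, [1] * n


def net_benefit(X, D, b, c):
    """O is empty, so every node is a client of its nearest extra server."""
    if not X:
        return 0
    total = -sum(c[u] for u in X)
    for v in range(len(D)):
        s = min(X, key=lambda u: (D[v][u], u))
        total += b[v][s]
    return total


def dominates(S, n, edges):
    covered = set(S)
    for u, v in edges:
        if u in S:
            covered.add(v)
        if v in S:
            covered.add(u)
    return len(covered) == n


def check_reduction(n, edges):
    D = hop_distances(n, edges)
    b, c = theorem7_instance(n, edges)
    subsets = [X for r in range(n + 1) for X in combinations(range(n), r)]
    best = max(net_benefit(X, D, b, c) for X in subsets)
    gamma = min(len(S) for S in subsets if dominates(S, n, edges))
    for K in range(1, n + 1):
        # Goal 2|V|^2 - K reachable  <=>  a dominating set of size <= K exists.
        assert (best >= 2 * n * n - K) == (gamma <= K)
    return best, gamma


print(check_reduction(4, [(0, 1), (1, 2), (2, 3)]))  # path: (30, 2)
print(check_reduction(4, [(0, 1), (0, 2), (0, 3)]))  # star: (31, 1)
```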
3.2.2. Greedy heuristics for the unconstrained model
Since the k-extra-server problem and the extra-server problem are both NP-complete, we propose three greedy heuristics, called Greedy Add, Greedy Remove, and Greedy Add-Remove, to solve these problems. Here, we only describe heuristics for the k-extra-server problem because the method for the extra-server problem is very similar: we can record the results from the intermediate stages of these heuristics and select the best $k$ value.
Greedy Add. The heuristic Greedy Add works in iterations. In each iteration, it places the extra server that maximizes the benefit, treating all original servers and previously added extra servers as competitors. The selected extra server is added to the extra server set. The process repeats until $k$ extra servers have been placed.
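A minimal sketch of Greedy Add follows, under assumptions that are not taken from the paper: the inputs are a precomputed distance matrix `D`, benefit matrix `b`, cost list `c`, and original server set `O` (hypothetical names); each node is served by its nearest server, with ties going to original servers and then to the lower node index; and "maximizes the benefit" is read as maximizing the net benefit of the enlarged extra server set, which is equivalent to picking the largest marginal gain. Ties between candidates go to the first one examined.

```python
def net_benefit(X, D, b, c, O):
    """Net benefit of extra-server set X under an assumed nearest-server model."""
    servers = set(O) | set(X)
    if not servers:
        return 0
    total = -sum(c[u] for u in X)
    for v in range(len(D)):
        # Ties favour original servers, then the smaller node index (assumption).
        s = min(servers, key=lambda u: (D[v][u], u not in O, u))
        if s not in O:
            total += b[v][s]
    return total


def greedy_add(k, D, b, c, O):
    """Add, one at a time, the extra server giving the highest net benefit."""
    X = []
    candidates = [u for u in range(len(D)) if u not in O]
    for _ in range(min(k, len(candidates))):
        remaining = [u for u in candidates if u not in X]
        best = max(remaining, key=lambda u: net_benefit(X + [u], D, b, c, O))
        X.append(best)
    return X


# Path 0-1-2-3, original server at 0; the benefit decays with distance.
D = [[abs(i - j) for j in range(4)] for i in range(4)]
b = [[12 // (D[v][u] + 1) for u in range(4)] for v in range(4)]
c = [0, 5, 5, 5]
print(greedy_add(2, D, b, c, {0}))  # [1, 2]
```

Each iteration re-evaluates every remaining candidate, so this sketch trades speed for simplicity; an incremental gain computation would be the natural optimization.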
Greedy Remove. The algorithm Greedy Remove starts by placing an extra server on every node that is not an original server. Then, Greedy Remove removes unnecessary extra servers iteratively. In each iteration, it removes the extra server that has the highest cost. The process is repeated until the number of extra servers equals $k$.
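The text does not spell out how the cost of an extra server is measured, so the sketch below makes an explicit assumption: the "cost" of a server is the net benefit lost by keeping it, and each iteration removes the server whose removal leaves the highest net benefit. Ranking servers by construction cost $c(u)$ alone would be another reading. The inputs `D`, `b`, `c`, and `O` are hypothetical, each node is assumed to be served by its nearest server, and distance ties go to original servers and then to the lower node index.

```python
def net_benefit(X, D, b, c, O):
    """Net benefit of extra-server set X under an assumed nearest-server model."""
    servers = set(O) | set(X)
    if not servers:
        return 0
    total = -sum(c[u] for u in X)
    for v in range(len(D)):
        # Ties favour original servers, then the smaller node index (assumption).
        s = min(servers, key=lambda u: (D[v][u], u not in O, u))
        if s not in O:
            total += b[v][s]
    return total


def greedy_remove(k, D, b, c, O):
    """Start with an extra server on every non-original node, then drop servers."""
    X = [u for u in range(len(D)) if u not in O]
    while len(X) > max(k, 0):
        # Assumed reading of "highest cost": drop the server whose removal
        # leaves the largest net benefit for the remaining set.
        worst = max(X, key=lambda u: net_benefit([x for x in X if x != u], D, b, c, O))
        X.remove(worst)
    return X


# Path 0-1-2-3, original server at 0; the benefit decays with distance.
D = [[abs(i - j) for j in range(4)] for i in range(4)]
b = [[12 // (D[v][u] + 1) for u in range(4)] for v in range(4)]
c = [0, 5, 5, 5]
print(greedy_remove(2, D, b, c, {0}))  # [1, 3]
```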
Greedy Add-Remove. The heuristic Greedy Add-Remove works in two phases: Greedy Add and then Greedy Remove. The first phase, Greedy Add, adds extra servers, without regard to the limit $k$ on the number of extra servers, until no more benefit can be gained. The second phase, Greedy Remove, takes the extra server set determined by the first phase as input and repeatedly removes extra servers until the number of extra servers equals $k$.
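A minimal sketch of the two phases follows. Its assumptions are not from the paper: the inputs `D`, `b`, `c`, and `O` are hypothetical; each node is served by its nearest server, with distance ties going to original servers and then to the lower node index; phase two removes the server whose removal leaves the highest net benefit; and, since the text does not say what happens when phase one ends with $k$ or fewer servers, that set is returned unchanged.

```python
def net_benefit(X, D, b, c, O):
    """Net benefit of extra-server set X under an assumed nearest-server model."""
    servers = set(O) | set(X)
    if not servers:
        return 0
    total = -sum(c[u] for u in X)
    for v in range(len(D)):
        s = min(servers, key=lambda u: (D[v][u], u not in O, u))
        if s not in O:
            total += b[v][s]
    return total


def greedy_add_remove(k, D, b, c, O):
    candidates = [u for u in range(len(D)) if u not in O]
    X = []
    current = net_benefit(X, D, b, c, O)
    # Phase 1 (Greedy Add without the limit k): add while the net benefit grows.
    while True:
        rest = [u for u in candidates if u not in X]
        if not rest:
            break
        u = max(rest, key=lambda w: net_benefit(X + [w], D, b, c, O))
        value = net_benefit(X + [u], D, b, c, O)
        if value <= current:
            break
        X.append(u)
        current = value
    # Phase 2 (Greedy Remove): drop servers until exactly k remain.
    # If phase 1 produced k or fewer servers, X is returned as is (assumption).
    while len(X) > max(k, 0):
        u = max(X, key=lambda w: net_benefit([x for x in X if x != w], D, b, c, O))
        X.remove(u)
    return X


# Path 0-1-2-3, original server at 0; the benefit decays with distance.
D = [[abs(i - j) for j in range(4)] for i in range(4)]
b = [[12 // (D[v][u] + 1) for u in range(4)] for v in range(4)]
c = [0, 5, 5, 5]
print(greedy_add_remove(2, D, b, c, {0}))  # [1, 3]
```

Here phase one keeps adding all three candidates (each one still raises the net benefit), and phase two then trims the set back to $k$; with higher construction costs phase one stops earlier.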
3.3. Experimental results for the unconstrained model
We conduct simulations to compare the performance of the greedy heuristics with the linear programming solutions derived by using GLPK (GNU Linear Programming Kit) [27] for the k-extra-server problem. GLPK is a set of routines designed for solving large-scale linear programming (LP) problems, mixed integer programming (MIP) problems, and other related problems.
We now derive the linear program for the k-extra-server problem. For ease of explanation, we first derive an integer programming formulation of the problem, and then translate it into a linear programming problem by relaxing two of its constraints.
Let the 0–1 variable $X_u$ denote whether there is an extra server on $u \in V - O$, and let the 0–1 variable $Z_{uv}$, $u \in V - O$, $v \in V$, denote whether $v$ is a client of $u$. The integer program for the k-extra-server problem is formulated as follows:
maximize
$$\sum_{u \in V - O} \sum_{v \in V} Z_{uv}\, b(v, u) - \sum_{u \in V} X_u\, c(u), \qquad (21)$$
subject to
$$X_u \in \{0, 1\}, \quad \text{for } u \in V, \qquad (22a)$$
$$Z_{uv} \in \{0, 1\}, \quad \text{for } u \in V,\ v \in V, \qquad (22b)$$
$$X_u = 0, \quad \text{for } u \in O, \qquad (22c)$$
$$\sum_{u \in V} X_u = k, \qquad (22d)$$
$$\sum_{u \in V} Z_{uv} = 1, \quad \text{for } v \in V, \qquad (22e)$$
$$X_u - Z_{uv} \ge 0, \quad \text{for } u \in V - O,\ v \in V, \qquad (22f)$$
$$Z_{uv} = 0, \quad \text{for } u, v \in V \text{ with } d(v, u) > d(v, O), \qquad (22g)$$
$$Z_{uv} - Z_{wv} \ge 0, \quad \text{for } u, w, v \in V \text{ with } d(v, u) < d(v, w). \qquad (22h)$$
Constraint (22c) prohibits the placement of an extra server on $O$, constraint (22d) ensures that exactly $k$ extra servers are placed on $V$, and constraint (22e) ensures that each node $v \in V$ is a client of exactly one server (either an extra server or an original server). Constraint (22f) requires that an extra server be placed on $u$ if $v$ is a client of $u$, where $u \in V - O$ and $v \in V$. Constraint (22g) ensures that no node $v \in V$ can be a client of a server $u$ if $d(v, u) > d(v, O)$. In addition, constraint (22h) ensures that a node $v \in V$ cannot be a client of a server $w$ if $d(v, u) < d(v, w)$, where $u, w \in V$.
Consider the 0–1 variables $X_u$ and $Z_{uv}$ in constraints (22a) and (22b), respectively. We replace them with constraints (23a) and (23b), respectively, to obtain a linear programming formulation:
$$0 \le X_u \le 1, \quad \text{for each } u \in V, \qquad (23a)$$
$$0 \le Z_{uv} \le 1, \quad \text{for each } u \in V,\ v \in V. \qquad (23b)$$
This relaxation allows a fractional number of extra servers to be placed on a node $u \in V$; likewise, a node $v$ may be served fractionally by $u$, contributing $Z_{uv}\, b(v, u)$ to the benefit. Consequently, the linear program can be solved to optimality, but its optimal benefit serves only as an upper bound on the benefit of the k-extra-server problem, since it allows fractional placements. However, in our experiments, we find that, in most cases, linear programming produces integer solutions, i.e., $X_u$ and $Z_{uv}$ take values in $\{0, 1\}$.
3.3.1. Experiment setting
In the experiments, we use the GT-ITM generator [5] to generate random graphs based on Waxman's model [42]. GT-ITM and Waxman's model are widely used for generating random graph topologies, such as Internet topologies, for a broad class of optimization problems, such as measurement and dissemination of distance information on the global Internet [7,14], dynamic network routing schemes [32], provisioning of VPNs [23], and load-balanced virtual circuits [29]. In the Waxman model, each of the graphs is connected, and nodes are added randomly in an $s \times s$ square. The probability of an edge between $u$ and $v$ is given by $p(u, v) = \alpha e^{-d/(\beta L)}$, where $0 < \alpha, \beta \le 1$, $d$ is the Euclidean distance between $u$ and $v$, and $L = \sqrt{2}\, s$ is the largest possible distance between any two nodes.
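The edge probability above can be sketched directly in Python. This is not GT-ITM: it is a minimal stand-alone Waxman sampler, and it makes two assumptions of its own. It enforces connectivity by rejection sampling (resampling until the graph is connected), which may differ from how GT-ITM guarantees connected graphs, and it uses each edge's Euclidean length as its weight. The function names and the `max_tries` parameter are hypothetical.

```python
import math
import random
from collections import deque


def is_connected(n, edges):
    """BFS from node 0 over undirected (u, v, length) edges."""
    adj = [[] for _ in range(n)]
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == n


def waxman_graph(n, s=100.0, alpha=0.5, beta=0.5, seed=None, max_tries=1000):
    """Sample a connected Waxman graph; edges carry their Euclidean length."""
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must lie in (0, 1]")
    rng = random.Random(seed)
    L = math.sqrt(2) * s  # largest possible distance in the s x s square
    for _ in range(max_tries):
        pts = [(rng.uniform(0, s), rng.uniform(0, s)) for _ in range(n)]
        edges = []
        for u in range(n):
            for v in range(u + 1, n):
                d = math.dist(pts[u], pts[v])
                # Waxman edge probability p(u, v) = alpha * exp(-d / (beta * L)).
                if rng.random() < alpha * math.exp(-d / (beta * L)):
                    edges.append((u, v, d))
        if is_connected(n, edges):  # resample until connected (assumption)
            return pts, edges
    raise RuntimeError("no connected graph found; try larger alpha or beta")


pts, edges = waxman_graph(60, seed=7)
print(len(pts), len(edges))
```

Larger $\alpha$ raises the overall edge density, while larger $\beta$ makes long edges relatively more likely.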
In our experiments, we set $s$ to 100, $\alpha$ to 0.5, and $\beta$ to 0.5. The default number of nodes $|V|$ is 196, and the number of original servers $|O|$ and the number of extra servers $k$ are both set to $0.3|V|$. The original servers are placed randomly in the graph. For each node $v$, we draw a value $r(v)$ and a building cost $c(v)$ from uniform distributions over the ranges $[0, 2q_r]$ and $[0, 2q_c]$, respectively. The benefit function $b(v, u)$ is defined as $r(v)$ divided by the distance between $v$ and $u$, and the default values of $q_r$ and $q_c$ are 30,000 and 10,000, respectively.
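The following sketch draws one experiment instance with the default parameters described above. Several details are assumptions rather than statements from the paper: distances between nodes are taken as shortest-path lengths over the edge lengths (computed with Dijkstra's algorithm), the original servers are a uniform random sample of $\mathrm{round}(0.3|V|)$ nodes, and because $b(v, v)$ would divide by zero, distances below a hypothetical `min_dist` floor (default 1) are clamped to it. All names are illustrative.

```python
import heapq
import random


def shortest_paths(n, edges):
    """Dijkstra from every node over undirected (u, v, length) edges."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    D = []
    for src in range(n):
        dist = [float("inf")] * n
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, x = heapq.heappop(heap)
            if d > dist[x]:
                continue  # stale heap entry
            for y, w in adj[x]:
                if d + w < dist[y]:
                    dist[y] = d + w
                    heapq.heappush(heap, (dist[y], y))
        D.append(dist)
    return D


def make_instance(n, edges, q_r=30000, q_c=10000, frac=0.3, seed=None, min_dist=1.0):
    """Draw r(v), c(v), the original servers O, k, and b(v,u) = r(v) / d(v,u)."""
    rng = random.Random(seed)
    D = shortest_paths(n, edges)
    r = [rng.uniform(0, 2 * q_r) for _ in range(n)]
    c = [rng.uniform(0, 2 * q_c) for _ in range(n)]
    O = set(rng.sample(range(n), round(frac * n)))  # random original servers
    k = round(frac * n)
    # min_dist is a hypothetical floor that keeps b(v, v) finite.
    b = [[r[v] / max(D[v][u], min_dist) for u in range(n)] for v in range(n)]
    return D, r, c, O, k, b


edges = [(0, 1, 2.0), (1, 2, 2.0), (2, 3, 2.0)]
D, r, c, O, k, b = make_instance(4, edges, seed=1)
print(D[0][3], len(O), k)  # 6.0 1 1
```

With the default $|V| = 196$, this sketch would use $\mathrm{round}(58.8) = 59$ original servers and 59 extra servers.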
Since the solution of the linear program is an upper bound for this problem, we use this ‘‘super’’ optimal solution as the performance measure. We compare the solutions from our heuristics with this super optimal solution. For ease of presentation, we compare the normalized benefit, which is defined as the ratio of the benefit generated by the super optimal solution to the benefit generated by the heuristic algorithm.
3.3.2. Effect of α
We evaluate the performance of all heuristics by comparing their normalized benefits under different values of $\alpha$. In these experiments, we vary $\alpha$ from 0.1 to 0.5, and each data point is the average of 100 simulations. Fig. 6 shows the normalized benefit under various values of $\alpha$. The performance of Greedy Add and Greedy Add-Remove becomes stable when $\alpha$ is greater than 0.3, and the difference between their normalized benefits becomes smaller.
Greedy Add-Remove is the worst among the three heuristics, and
Fig. 6. Comparison of normalized benefit under different values of α when the value of β is 0.5.
Fig. 7. Comparison of normalized benefit under different network sizes.
3.3.3. Effect of network size
We compare the normalized benefits of all heuristics under different network sizes, from 100 to 900 nodes. The data for each network size is the average of 100 simulation runs.
Fig. 7 shows that Greedy Add and Greedy Add-Remove perform similarly and scale well with network size for networks of fewer than 400 nodes. However, Greedy Remove does not scale, and it is the worst among the three heuristics.
We also use a standard greedy algorithm to place servers without considering the locations of the original servers, and compare the results with our heuristics. Fig. 7 shows that Greedy Add and Greedy Add-Remove outperform the standard greedy algorithm by 30%.
3.3.4. Effect of k
We compare the normalized benefits of all heuristics under different numbers of extra servers $k$. In this experiment, we vary $k$ from 10% to 60% of the number of nodes in the network.
Fig. 8 shows that the performance of Greedy Add and Greedy Add-Remove decreases when $k$ exceeds 30% of the nodes. The reason is that these two heuristics place each extra server to maximize the benefit at that step without considering the overall situation; thus, the gap accumulates at each step, and more servers mean a larger gap between the heuristics and the upper bound.
Fig. 8. Comparison of normalized benefit under different numbers of extra servers.
Fig. 9. Comparison of normalized benefit under different numbers of original servers.
3.3.5. Effect of the number of original servers
Next, we compare the normalized benefits of all heuristics under different numbers of original servers. In this experiment, we vary the number of original servers $|O|$ from 10% to 50% of the number of nodes in the network. Fig. 9 shows that the normalized benefits of all heuristics increase as the number of original servers increases. The reason is that when the number of original servers increases, the number of candidate locations for extra servers decreases, so it is more likely that the heuristics find good locations for extra servers.
3.3.6. Effect of original server placement
We now observe the normalized benefits of all heuristics under a different method of placing the original servers. Instead of placing original servers randomly, we place them using the Greedy Add heuristic. Fig. 10 illustrates the normalized benefits of all heuristics under this original server placement. We observe that the behaviors of the heuristics are similar to those in Fig. 7. For example, the normalized benefits of Greedy Add and Greedy Add-Remove are very stable as the size of the network increases from 100 to 400. These similar behaviors indicate that the placement of original servers has little effect on the heuristics.
3.3.7. Effect of building cost
In this experiment, we compare the normalized benefit of the heuristics under different values of the building cost c