Approximation Algorithms for Data Distribution with Load Balancing of Web Servers


Li-Chuan Chen

Networking and Communications Department

The MITRE Corporation

McLean, VA 22102

Hyeong-Ah Choi

Department of Computer Science

The George Washington University

Washington, DC 20052

Abstract

Given the increasing traffic on the World Wide Web (Web), it is difficult for a single popular Web server to handle the demand from its many clients. By clustering a group of Web servers, it is possible to reduce the origin Web server’s load significantly and to reduce the user’s response time when accessing a Web document. A fundamental question is how to allocate Web documents among these servers in order to achieve load balancing. In this paper, we are given a collection of documents to be stored on a cluster of Web servers. Each of the servers is associated with resource limits in its memory and its number of HTTP connections. Each document has an associated size and access cost. The problem is to allocate the documents among the servers so that no server’s memory size is exceeded and the load is balanced as equally as possible. We show that most simple formulations of this problem are NP-hard, we establish lower bounds on the value of the optimal load, and we show that if there are no memory constraints for any of the servers, then there is an allocation algorithm that is within a factor 2 of the optimal solution. We show that if all servers have the same number of HTTP connections and the same memory size, then a feasible allocation is achieved within a factor 4 of the optimal solution using at most 4 times the optimal memory size. We also provide improved approximation results for the case where documents are relatively small.

1. Introduction

Internet and World Wide Web (WWW) traffic has grown explosively, and this growth is expected to continue. For a popular Web site, network congestion and server overloading may become serious problems in the future and would result in increased Web service delays. A number of approaches to solve this problem have been proposed recently, including mirroring, document caching, and clusters of Web servers.

The first is to mirror (replicate) popular Web sites in different locations throughout the world. The original Web site’s homepage would contain a list of mirror sites. This allows users to choose a site based upon their location. One drawback of mirroring the Web site is that the user does not typically have access to information about the underlying network and server load. This issue was considered in a number of papers [16, 11, 1, 14, 9] by taking network latency and server load into account, by basing decisions on prior performance, or by using erasure codes. Web caching is a mechanism to place copies of frequently accessed Web objects closer to the users. One difficulty in Web caching is the possibility of accessing stale Web objects. Cache coherence [10, 4] deals with the problem of keeping Web objects consistent with the original copy. Unlike traditional caches in computer memory systems, where a cache line is of a fixed size, Web objects vary in size. Replacement algorithms deal with this issue, since more than one object might have to be removed to make room for the current object [13, 6].

For the clustering of servers as a single Web server, Web documents are distributed among servers, and only one Universal Resource Locator (URL) is published to the clients. Since many servers are working together, load balance is the main issue. This has been studied elsewhere [2, 12, 15, 5, 8]. We will focus on this approach in this paper.

We will present a number of results on the complexity of solving the allocation optimization and decision problems, either exactly or approximately. We will consider a number of different formulations of the problem, since some seem to be easier to approximate than others. In addition to establishing that simple formulations of the allocation problem are NP-hard, we present approximation algorithms for the cases of no memory constraints, equal memory and HTTP constraints, and for allocation involving small document sizes.

2. Previous Work

There have been several studies of load balancing among a cluster of Web servers. These are usually broken into two broad categories: client-based load balancing and server-based load balancing.

Lewontin and Martin [9] implemented a client-side load balancing algorithm. Their method uses the past performance of requests to minimize response latency. The performance is measured by the number of bytes transmitted divided by the total time. A list of the replicated servers’ performances is maintained at the client’s proxy server, which then uses a directory service to map a URL onto one of the servers.

Many of the server-based load balancing systems are based on a 2-tier architecture. A front-end server is responsible for dispatching an incoming Web document request to one of the back-end document servers. In NCSA services [7], a round-robin Domain Name Service (DNS) is used for distributing client requests to one of the Web servers. The drawback of using DNS is that it does not provide load balance among the servers, due to the non-uniform document sizes and DNS name caching. DNS does not know the status of Web servers. When a server is down or busy (because it has been taking all the requests for large documents), DNS might still rotate the request to that server.

Garland et al. [5] overcome NCSA’s uneven load balancing problem by implementing a mechanism that monitors server load and selects the least loaded server for serving an incoming request. Their server’s load metric is determined by the number of Web document requests for the server plus the number of processes currently active in the server.

Narendran et al. [12] implemented a distributed Web server system based on the combination of DNS round-robin, HTTP redirection, and the documents’ access rates as a mechanism to balance the load. Our model is closely related to theirs, but includes server memory size limits.

Existing research has stressed practical approaches to the problem of achieving load balance, but there has not been a theoretical analysis of the performance of these algorithms, and very little work has been done in terms of how Web documents are allocated among servers. In this paper we approach this load balance problem from a more theoretical direction. We consider the allocation of Web documents among a cluster of Web servers in order to achieve load balance. Each of the servers is associated with resource limits in its memory and its number of HTTP connections. Each document has an associated size, $s_j$, and access rate, $r_j$. Following [12], we define the access rate to be the product of the time needed to access the document and the probability that the document is requested.

We prove that even simple formulations of this problem are NP-hard. We also provide simple approximation algorithms for a number of formulations of this problem. In each case we show that the approximation algorithm achieves a fixed performance ratio with respect to the optimum solution. Before presenting our results, we define our model in greater detail.

3. Problem Formulation

Our model is a generalization of one proposed by Narendran et al. [12]. It consists of $M$ servers and $N$ documents. Throughout we use the index $i$ when referring to servers and $j$ when referring to documents. Each server $i$ is associated with a memory size $m_i$ and a number of simultaneous HTTP connections $l_i$. Each document $j$ is associated with a document size $s_j$ and an access cost $r_j$ as defined earlier. The total access cost $\hat{r}$ is the sum of all documents’ access costs, $\hat{r} = \sum_{j=1}^{N} r_j$. The total number of HTTP connections $\hat{l}$ is the sum of all servers’ HTTP connections, $\hat{l} = \sum_{i=1}^{M} l_i$.

Let $r = (r_1, r_2, \ldots, r_N)$, $l = (l_1, l_2, \ldots, l_M)$, $s = (s_1, s_2, \ldots, s_N)$, $m = (m_1, m_2, \ldots, m_M)$. The input to the allocation problem is a quadruple $I = \langle r, l, s, m \rangle$. The output is an allocation (or access) matrix, which is an $M \times N$ matrix $a_{ij}$, where $0 \le a_{ij} \le 1$. If $a_{ij} \ne 0$, then document $j$ is allocated to server $i$. We permit a document to be allocated to more than one server, and we interpret $a_{ij}$ to be the probability that a request for document $j$ is to be processed by server $i$. Any allocation must satisfy the following allocation constraint:

$$\sum_{i=1}^{M} a_{ij} = 1, \quad \text{for } 1 \le j \le N.$$

A special case, called a 0-1 allocation, is one in which $a_{ij} \in \{0, 1\}$. In such an allocation each document appears in exactly one server.

Let $D_i$ denote the set of documents allocated to server $i$, that is,

$$D_i = \{\, j \mid a_{ij} \ne 0 \,\}.$$

The sum of document sizes in server $i$ cannot exceed the memory of this server. From this we have the following memory constraint:

$$\sum_{j \in D_i} s_j \le m_i, \quad \text{for } 1 \le i \le M.$$

An allocation satisfying these constraints is called a feasible allocation.
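The two constraints above can be checked mechanically. The following is a minimal sketch, assuming the allocation matrix is given as a list of $M$ rows of length $N$; the function name `is_feasible` and the tolerance `tol` are illustrative choices, not part of the paper.

```python
from typing import List


def is_feasible(a: List[List[float]], s: List[float], m: List[float],
                tol: float = 1e-9) -> bool:
    """Check the allocation and memory constraints for an M x N matrix a.

    a[i][j] is the probability that server i handles document j,
    s[j] is the size of document j, m[i] is the memory of server i.
    """
    M, N = len(m), len(s)
    # Allocation constraint: entries lie in [0, 1] and every column sums to 1.
    for j in range(N):
        column = [a[i][j] for i in range(M)]
        if any(x < -tol or x > 1 + tol for x in column):
            return False
        if abs(sum(column) - 1.0) > tol:
            return False
    # Memory constraint: every document with a[i][j] != 0 is stored on server i.
    for i in range(M):
        used = sum(s[j] for j in range(N) if a[i][j] != 0)
        if used > m[i] + tol:
            return False
    return True
```

Note that a fractional entry still costs the full document size in memory, since the document must be stored on every server that may serve it.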

Let $R_i$ denote the total access cost for server $i$, that is,

$$R_i = \sum_{j=1}^{N} a_{ij} r_j.$$

A server’s ability to respond to document requests is affected by two quantities. The total number of bytes this server must send is proportional to the server’s total access cost. As the number of HTTP connections increases, the server’s ability to satisfy multiple requests increases. Hence, we define the load of server $i$ per HTTP connection to be $R_i / l_i$. Define the objective function $f(a)$ to be

$$f(a) = \max_{1 \le i \le M} \frac{R_i}{l_i}.$$

Our goal is to balance the load by minimizing the maximum load over all servers.
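As an illustration of the objective, the sketch below computes the per-connection load $R_i / l_i$ of each server and the maximum load for a given allocation matrix. The function names `server_loads` and `objective` are hypothetical helpers introduced here for illustration.

```python
def server_loads(a, r, l):
    """Return R_i / l_i for every server i, where R_i = sum_j a[i][j] * r[j]."""
    return [
        sum(a[i][j] * r[j] for j in range(len(r))) / l[i]
        for i in range(len(l))
    ]


def objective(a, r, l):
    """Maximum load per HTTP connection over all servers."""
    return max(server_loads(a, r, l))


# Two servers with 2 and 1 connections, three documents with costs 6, 3, 3.
r = [6, 3, 3]
l = [2, 1]
print(objective([[1, 0, 0], [0, 1, 1]], r, l))  # loads are 3.0 and 6.0
print(objective([[1, 1, 0], [0, 0, 1]], r, l))  # loads are 4.5 and 3.0
```

In this small instance the second allocation has the lower maximum load, even though it places more access cost on the first server, because that server has more connections.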

Allocation Optimization Problem: Given input quadruple $I$, find a feasible allocation $a$ that minimizes $f(a)$. Call this optimum allocation $a^*_I$, and let $f^*_I = f(a^*_I)$ be its optimum value. When $I$ is clear from context, we will simply write $f^*$.

Allocation Decision Problem: Given input $I$ and value $f_0$, is $f^*_I \le f_0$? Our interest in the decision problem is that, given an algorithm for the decision problem, we may use it within the context of binary search to search for the optimum value for the optimization problem.

4. Contributions

First we establish a lower bound of $\hat{r}/\hat{l}$ on the value of the optimum load $f^*$. This can be achieved when memory is not a constraint, by allocating every document to every server. This immediately improves the results of Narendran et al. [12], since their results did not consider bounds on memory.

Our remaining results all involve 0-1 allocations (in which each document is assigned to exactly one server).

Hardness: We show that even without memory constraints and with all servers having an equal number of HTTP connections, the allocation optimization problem is NP-hard. We also show that if memory constraints are present, then even determining the existence of a feasible 0-1 allocation is NP-hard. This remains true even if all servers have the same memory size.

No memory constraints: We show that if there are no memory constraints for any of the servers, then there is a simple and efficient greedy allocation algorithm, which is within a factor 2 of the optimal solution.

Equal memory and HTTP constraints: We show that if all servers have the same number of HTTP connections and the same memory size, then a feasible allocation is achieved within a factor 4 of the optimal solution using 4 times the optimal memory size.

Small document sizes: The above hardness results rely on the fact that documents can be nearly as large as the memory sizes of the servers. However, in practice, document sizes are typically much smaller than the servers’ memory sizes. We show that if the memory sizes for all servers are equal to some value $m$, and the size of the largest document is at most $m/k$, then we can compute an allocation whose load is at most a factor of $2(1 + 1/k)$ times optimal.

All of our approximation algorithms are based on simple greedy approaches, and are easy to implement.

5. Lower Bounds

Consider an input $I = \langle r, l, s, m \rangle$ where there are no memory constraints; that is, $m_i = \infty$. Recall that $\hat{r} = \sum_{j=1}^{N} r_j$ and $\hat{l} = \sum_{i=1}^{M} l_i$. We begin by providing a lower bound on the optimal allocation cost $f^*$.

Lemma 1 Let $r_{\max} = \max_{1 \le j \le N} r_j$ and $l_{\max} = \max_{1 \le i \le M} l_i$. Then

$$f^* \ge \max\left( \frac{r_{\max}}{l_{\max}},\ \frac{\hat{r}}{\hat{l}} \right).$$

Proof: Assume that the memory of each server is large enough to hold all documents, so that the memory constraint is trivially satisfied. By the pigeonhole principle, there exists an HTTP connection on some server $i$ that has to provide service of at least the total access cost $\hat{r}$ divided by the total number of HTTP connections $\hat{l}$, that is, $f^* \ge \hat{r}/\hat{l}$. The document with the largest access rate must be assigned to some server, and in the best case it is assigned to the server with the largest number of HTTP connections, implying a cost of at least $r_{\max}/l_{\max}$. □

We will use the following alternative lower bound in the proof of Theorem 2 below.

Lemma 2 Assume $r_1 \ge r_2 \ge \ldots \ge r_N$ and $l_1 \ge l_2 \ge \ldots \ge l_M$. Then

$$f^* \ge \max_{1 \le j' \le \min(N, M)} \frac{\sum_{j=1}^{j'} r_j}{\sum_{i=1}^{j'} l_i}.$$

Proof: Fix $j'$ with $1 \le j' \le \min(M, N)$ and consider the optimal allocation of the first $j'$ documents to the $M$ servers. Let $S_{j'}$ denote the servers used in this allocation and let $f^*_{j'}$ denote the cost of this allocation. Clearly $|S_{j'}| \le j'$. Since we allocate a subset of the documents to all the servers, we have $f^*_{j'} \le f^*$.

We claim that if $i \in S_{j'}$ then we may assume that $i - 1 \in S_{j'}$. If not, then move all the documents from server $i$ to server $i - 1$. Since $i - 1$ is not in $S_{j'}$ it contains no documents. Since $l_{i-1} \ge l_i$, this can only decrease the overall cost. Based on this and the fact that $|S_{j'}| \le j'$ it follows that we may assume that $S_{j'} \subseteq \{1, 2, \ldots, j'\}$. For $1 \le i \le j'$, let $R_i$ be the sum of $r_j$ for the documents assigned to server $i$. By definition

$$f^*_{j'} = \max_{1 \le i \le j'} \frac{R_i}{l_i}.$$

Thus for $1 \le i \le j'$,

$$f^*_{j'}\, l_i \ge R_i, \qquad f^*_{j'} \sum_{i=1}^{j'} l_i \ge \sum_{i=1}^{j'} R_i = \sum_{j=1}^{j'} r_j.$$

Therefore,

$$f^* \ge f^*_{j'} \ge \frac{\sum_{j=1}^{j'} r_j}{\sum_{i=1}^{j'} l_i}.$$

□
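Both lower bounds are straightforward to evaluate. The sketch below computes them with exact rational arithmetic, assuming integer access costs and connection counts; the function names are illustrative. Note that the bound of Lemma 2 relies on each document residing on a single server, so it should be read as a bound for 0-1 allocations.

```python
from fractions import Fraction


def lower_bound_lemma1(r, l):
    """max(r_max / l_max, r_hat / l_hat) for integer inputs."""
    return max(Fraction(max(r), max(l)), Fraction(sum(r), sum(l)))


def lower_bound_lemma2(r, l):
    """max over j' of (sum of j' largest r) / (sum of j' largest l)."""
    r_sorted = sorted(r, reverse=True)
    l_sorted = sorted(l, reverse=True)
    best = Fraction(0)
    r_prefix = 0
    l_prefix = 0
    for k in range(min(len(r), len(l))):
        r_prefix += r_sorted[k]
        l_prefix += l_sorted[k]
        best = max(best, Fraction(r_prefix, l_prefix))
    return best


r = [5, 5, 1]
l = [3, 1, 1]
print(lower_bound_lemma1(r, l))  # 11/5
print(lower_bound_lemma2(r, l))  # 5/2
```

In this instance the prefix bound of Lemma 2 is the stronger of the two, because the two heaviest documents must share the connections of at most two servers.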

Narendran et al. [12] present an allocation algorithm under a similar model to ours, but without memory constraints. Now we show that it is trivial to achieve an optimal allocation by selecting $a_{ij}$ appropriately.

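Theorem 1 If for all $i$, $m_i \ge \sum_{j=1}^{N} s_j$, then an optimal allocation is achieved by setting $a_{ij} = l_i / \hat{l}$, for all $i, j$.

Proof: Since $a_{ij} > 0$, each server must have copies of all documents, which the memory assumption permits. Thus

$$\begin{aligned}
f(a) &= \max_{1 \le i \le M} \frac{R_i}{l_i}
      = \max_{1 \le i \le M} \frac{\sum_{j=1}^{N} r_j a_{ij}}{l_i} \\
     &= \max_{1 \le i \le M} \frac{(l_i / \hat{l}) \sum_{j=1}^{N} r_j}{l_i}
      = \max_{1 \le i \le M} \frac{\hat{r}}{\hat{l}}
      = \frac{\hat{r}}{\hat{l}} \le f^*,
\end{aligned}$$

where the last inequality is Lemma 1. Therefore, $a$ is an optimal solution. □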
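The calculation in the proof can be checked numerically. The following sketch builds the proportional allocation $a_{ij} = l_i/\hat{l}$ with exact fractions and confirms that every server ends up with the same load $\hat{r}/\hat{l}$; the helper names are assumptions made for this illustration.

```python
from fractions import Fraction


def proportional_allocation(l, n_docs):
    """Return the M x N matrix with a[i][j] = l[i] / sum(l) for integer l."""
    total = sum(l)
    return [[Fraction(li, total)] * n_docs for li in l]


def loads(a, r, l):
    """Per-connection load R_i / l_i of each server."""
    return [
        sum(a[i][j] * r[j] for j in range(len(r))) / l[i]
        for i in range(len(l))
    ]


r = [6, 3, 3]   # access costs, r_hat = 12
l = [2, 1]      # HTTP connections, l_hat = 3
a = proportional_allocation(l, len(r))
print(loads(a, r, l))  # every entry equals r_hat / l_hat = 4
```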

6. NP-Completeness

We observe that even simple formulations of this optimization problem are NP-hard. Consider the following problems:

0-1 Allocation: Given an input quadruple $I = \langle r, l, s, m \rangle$, does there exist a feasible 0-1 allocation?

This problem is NP-complete even if we set all memory sizes equal, $m_1 = m_2 = \ldots = m_M = m$. The reason is that satisfying the memory constraints is equivalent to the bin packing problem, where $s$ denotes the sizes of the objects and the bins are of size $m$.

If we choose to ignore memory constraints altogether then the problem is still NP-hard for 0-1 allocations as we show below.

0-1 Allocation with No Memory Constraints: Given an input quadruple $I = \langle r, l, s, m \rangle$ with $m_i = \infty$ for $1 \le i \le M$, does there exist an allocation with load value $f \le 1$?

We show that this problem is NP-complete even if all servers have an equal number of HTTP connections, $l_1 = l_2 = \ldots = l_M = l$. As before we may reduce the bin packing problem to this problem by letting $l$ be the bin size and letting $r$ denote the sizes of the objects to be packed. A 0-1 allocation of value at most 1 is equivalent to a bin packing into $M$ bins, since for each server $1 \le i \le M$ we have $R_i / l \le 1$, implying that the total size of objects $R_i$ assigned to bin $i$ is at most the bin size of $l$.
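To make the correspondence with bin packing concrete, the sketch below decides the problem by exhaustive search over all $M^N$ assignments. It is only meant for tiny instances and is an illustration of the problem statement, not an algorithm from the paper; the function name is hypothetical and $M \ge 1$ is assumed.

```python
from itertools import product


def has_allocation_with_load_at_most_one(r, M, l):
    """Brute force: is there a 0-1 allocation of the documents with costs r
    onto M servers, each with l connections, such that R_i / l <= 1 for all i?

    Equivalently: can items of sizes r be packed into M bins of size l?
    """
    for assignment in product(range(M), repeat=len(r)):
        R = [0] * M
        for j, i in enumerate(assignment):
            R[i] += r[j]
        if max(R) <= l:
            return True
    return False


print(has_allocation_with_load_at_most_one([4, 3, 3, 2], M=2, l=6))  # True
print(has_allocation_with_load_at_most_one([4, 4, 4], M=2, l=6))     # False
```

The first instance packs as {4, 2} and {3, 3}; in the second, any two documents on the same server already exceed the capacity of 6.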

These results imply that the problem is only interesting when there are memory constraints or limits on the number of servers to which a document can be allocated. Henceforth we only consider 0-1 allocations.

7. Approximation Algorithms

Throughout the remainder of the paper we consider only 0-1 allocations.

7.1. No Memory Constraint

Consider an instance of the document allocation problem in which there are no memory constraints, that is, $m_i = \infty$, $1 \le i \le M$.

Consider Algorithm 1 shown in Fig. 1. We will show that it produces an allocation that is within a factor 2 of the optimal solution. Note that for a 0-1 allocation with no memory constraints we may assume that $N \ge M$, since otherwise the optimal assignment is achieved by placing one document in each of the $N$ servers with the largest values of $l_i$.

Theorem 2 Let $f_1$ be the objective function value for Algorithm 1 for no memory constraint. Then $f_1 \le 2f^*$.

Proof: Suppose not. Let $j'$ be the first document allocated to some server $i'$ such that $R_{i'} / l_{i'} > 2f^*$.

Algorithm 1

Input: A quadruple $I = \langle r, l, s, m \rangle$, where $m_i = \infty$ for $1 \le i \le M$.

Output: A 0-1 allocation of documents to servers.

1. Sort documents by decreasing access cost, $r_j$.
2. Sort servers by decreasing port connections, $l_i$.
3. for $1 \le i \le M$ do {
4.     set $R_i = 0$; }
5. for $1 \le j \le N$ do {
6.     Choose $i$ that minimizes $(R_i + r_j) / l_i$ for $1 \le i \le M$.
7.     Allocate document $j$ to server $i$.
8.     $R_i$ += $r_j$. }

Figure 1. The 0/1 approximation algorithm for no memory constraint.

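Here $R_i = \sum_{j=1}^{j'} a_{ij} r_j$. Let $S \subseteq \{1, 2, \ldots, M\}$ denote the set of servers which have received at least one of the first $j'$ documents by the algorithm. Note that $|S| \le j'$. By line 2 of the algorithm, servers are sorted in descending order by $l_i$. We claim that if $i \in S$ then $i - 1 \in S$. To see this, suppose $i - 1 \notin S$ and consider the first document $j''$ allocated to $i$. Since line 6 chose server $i$ rather than server $i - 1$ (assuming ties are broken in favor of the lower index), just prior to the allocation of $j''$ we have

$$\frac{R_i + r_{j''}}{l_i} < \frac{R_{i-1} + r_{j''}}{l_{i-1}}.$$

By lines 4 and 8 of the algorithm, if $i - 1 \notin S$ then $R_{i-1} = 0$, and since this is the first document allocated to server $i$, $R_i = 0$. Thus

$$\frac{r_{j''}}{l_i} < \frac{r_{j''}}{l_{i-1}},$$

which implies $l_i > l_{i-1}$, a contradiction.

Thus it follows that we have $S \subseteq \{1, 2, \ldots, j'\}$. Because each of the first $j'$ documents has been allocated to one server in $S$, we have $\sum_{i=1}^{j'} a_{ij} r_j = r_j$ for $1 \le j \le j'$. Consider the situation just after the allocation of document $j'$. By the choice of $i$ in line 6 of the algorithm we have, for $1 \le i \le M$,

$$\frac{R_i + r_{j'}}{l_i} \ge \frac{R_{i'} + r_{j'}}{l_{i'}}.$$

By our hypothesis,

$$\frac{R_i + r_{j'}}{l_i} \ge \frac{R_{i'} + r_{j'}}{l_{i'}} \ge \frac{R_{i'}}{l_{i'}} > 2f^*,$$

which implies, for every server $i$,

$$R_i + r_{j'} > 2 l_i f^*.$$

Summing over $1 \le i \le j'$ we have

$$\begin{aligned}
\sum_{i=1}^{j'} \left( R_i + r_{j'} \right) &> \sum_{i=1}^{j'} 2 l_i f^*, \\
\sum_{i=1}^{j'} R_i + j' r_{j'} &> 2f^* \sum_{i=1}^{j'} l_i, \\
\sum_{i=1}^{j'} \sum_{j=1}^{j'} a_{ij} r_j + j' r_{j'} &> 2f^* \sum_{i=1}^{j'} l_i, \\
\sum_{j=1}^{j'} r_j + j' r_{j'} &> 2f^* \sum_{i=1}^{j'} l_i.
\end{aligned}$$

Since $r_{j'} \le r_j$ for $1 \le j \le j'$, we have $\sum_{j=1}^{j'} r_j + j' r_{j'} \le \sum_{j=1}^{j'} r_j + \sum_{j=1}^{j'} r_j$. This implies

$$2 \sum_{j=1}^{j'} r_j > 2f^* \sum_{i=1}^{j'} l_i, \qquad f^* < \frac{\sum_{j=1}^{j'} r_j}{\sum_{i=1}^{j'} l_i}.$$

However, this contradicts the lower bound of Lemma 2. □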
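A direct transcription of Algorithm 1 into Python might look as follows. This is a sketch of the straightforward $O(N \log N + NM)$ version; the function name and the return format are illustrative choices, and ties in line 6 are broken in favor of the server that comes first in the sorted order.

```python
def greedy_allocate(r, l):
    """Greedy 0-1 allocation without memory constraints (Algorithm 1 sketch).

    r[j] is the access cost of document j, l[i] the connections of server i.
    Returns (assignment, max_load) where assignment[j] is the chosen server.
    """
    docs = sorted(range(len(r)), key=lambda j: -r[j])      # line 1
    servers = sorted(range(len(l)), key=lambda i: -l[i])   # line 2
    R = [0] * len(l)                                       # lines 3-4
    assignment = [None] * len(r)
    for j in docs:                                         # line 5
        # line 6: min() returns the first minimizer in the sorted server order
        best = min(servers, key=lambda i: (R[i] + r[j]) / l[i])
        assignment[j] = best                               # line 7
        R[best] += r[j]                                    # line 8
    max_load = max(R[i] / l[i] for i in range(len(l)))
    return assignment, max_load


assignment, f1 = greedy_allocate([6, 3, 3], [2, 1])
print(assignment, f1)  # [0, 1, 0] 4.5
```

For this instance the lower bound of Lemma 1 is $\hat{r}/\hat{l} = 12/3 = 4$, so the greedy value 4.5 is well within the factor 2 guaranteed by Theorem 2.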

It is easy to see that a straightforward implementation of Algorithm 1 runs in $O(N \log N + NM)$ time, where lines 1 and 6 dominate the total time. If there are $L$ distinct values of $l_i$ it is possible to achieve a running time of $O(N \log N + NL)$, which is no worse since $L \le M$. To do this we partition the servers into $L$ groups according to the value of $l_i$. For each group we maintain a binary heap [3] which is sorted by the value $R_i$. For each group we can determine the minimum $R_i$ value in $O(1)$ time and hence can determine the server $i$ on line 6 in $O(L)$ time by inspecting each heap. For the selected heap we update the value of $R_i$ in $O(\log N)$ time. Thus each iteration of the loop of line 5 takes $O((\log N) + L)$ time, for a total running time of $O(N \log N + LN)$.
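The grouped variant can be sketched with Python's standard `heapq` module, keeping one heap of `(R_i, i)` pairs per distinct value of $l_i$. The function name, the data layout, and the tie-breaking toward the group with more connections are assumptions made for this illustration.

```python
import heapq
from collections import defaultdict


def greedy_allocate_grouped(r, l):
    """Algorithm 1 with servers grouped by connection count (sketch).

    Each group keeps a min-heap of (R_i, i), so the least-loaded server of a
    group is at index 0 and line 6 only inspects one server per group.
    """
    groups = defaultdict(list)
    for i, li in enumerate(l):
        groups[li].append((0, i))
    for heap in groups.values():
        heapq.heapify(heap)
    order = sorted(groups, reverse=True)  # group keys, largest l first
    assignment = [None] * len(r)
    for j in sorted(range(len(r)), key=lambda j: -r[j]):
        # Inspect the root of each of the L heaps: O(L) per document.
        best = min(order, key=lambda li: (groups[li][0][0] + r[j]) / li)
        load, i = groups[best][0]
        # Replace the root with the updated load: O(log M) per document.
        heapq.heapreplace(groups[best], (load + r[j], i))
        assignment[j] = i
    max_load = max(R / li for li, heap in groups.items() for R, _ in heap)
    return assignment, max_load


print(greedy_allocate_grouped([7, 4, 3, 2], [2, 2, 1]))  # ([0, 1, 2, 1], 3.5)
```

Within a group all servers share the same $l_i$, so the server minimizing $(R_i + r_j)/l_i$ in that group is simply the one with the smallest $R_i$, which is why inspecting only the heap roots suffices.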

7.2. Equal Memory and Load Constraints

In this section we will show how to relax the assumption on memory size made in the previous section. Recall that we are given an input quadruple $I = \langle r, l, s, m \rangle$, $N = |r| = |s|$, $M = |l| = |m|$. Assume that we have homogeneous servers, with all servers having the same number of HTTP connections and equal memory sizes, that is, $l_i = l$ and $m_i = m$ for $1 \le i \le M$. Assume there exists a 0-1 allocation $a^*$ with an objective function value $f^*$ such that both memory and load balancing constraints are satisfied for all $i$, that is,

$$\sum_{j=1}^{N} s_j a^*_{ij} \le m, \qquad \sum_{j=1}^{N} \frac{r_j a^*_{ij}}{l} \le f^*.$$

For $1 \le j \le N$, we normalize each document’s access cost $r_j$ and each document’s size $s_j$ as follows:

$$r'_j = \frac{r_j}{l f^*}, \qquad s'_j = \frac{s_j}{m}.$$

This implies that $\sum_{j=1}^{N} r'_j a^*_{ij} \le 1$ and $\sum_{j=1}^{N} s'_j a^*_{ij} \le 1$. In general we do not know $f^*$, so our approach will be to conduct a binary search to find the smallest value of $f^*$ such that we can allocate all the documents into servers with memory $4m$ with total cost at most $4f^*$. This will provide us with the desired approximation bounds.

Algorithm 2

Input: A quadruple $I = \langle r, l, s, m \rangle$, where $l_i = l$, $m_i = m$ for $1 \le i \le M$, and target cost $f^*$.

Output: A 0-1 allocation of documents to servers and an indication of success.

1. /* Initialization */ $L^1_i, L^2_i, M^1_i, M^2_i = 0$ for $1 \le i \le M$; normalize $r_j, s_j$: $r'_j = r_j / (l f^*)$, $s'_j = s_j / m$, for all $j$.
2. Split the documents into two sets, $D^1, D^2$, where $D^1 = \{ j \mid r'_j \ge s'_j \}$, $D^2 = \{ j \mid r'_j < s'_j \}$.
3. call Algorithm 3;
4. if all documents have been assigned to some server then output yes, else output no.

Figure 2. The 0/1 approximation algorithm for both memory and load constraints.

We split the documents into two sets, $D^1, D^2$. $D^1$ consists of the documents whose (normalized) access cost is at least as large as their (normalized) document size, and $D^2$ consists of the documents whose document size is bigger than their access cost. Let $L^1_i$ denote the cumulative load for documents that are in $D^1$ and are assigned to server $i$. Let $M^1_i$ denote the cumulative memory for documents that are in $D^1$ and are assigned to server $i$. Define $L^2_i$ and $M^2_i$ similarly for $D^2$.

Algorithm 3, shown in Figure 3, consists of two phases. We try to assign as many documents of $D^1$ as possible in phase 1 and then assign the remaining documents, which are in $D^2$, in phase 2. The first phase guarantees that servers are well utilized with respect to access cost and the second phase guarantees utilization with respect to size.

Claim 1 At any time in the execution of Algorithm 2, $M^1_i \le L^1_i$ and $L^2_i \le M^2_i$.

Proof: This follows from the definitions of $D^1$ and $D^2$. □

Algorithm 3 Subroutine used in Algorithm 2

// Phase 1: Assign documents of $D^1$ to
// servers such that $R_i \le l f^*$ and $M_i \le m$.

1. $j = 1$;
2. for ($i = 1$ to $M$ and $j \le N$) do {
3.     while ($j \le |D^1|$ and $L^1_i < 1$) do {
4.         Allocate document $j$ (the $j$-th document of $D^1$) to server $i$.
5.         $L^1_i$ += $r'_j$;
6.         $M^1_i$ += $s'_j$;
7.         $j$++; } }

// Phase 2: Assign documents of $D^2$ to
// servers such that $R_i \le l f^*$ and $M_i \le m$.

1. $j = 1$;
2. for ($i = 1$ to $M$ and $j \le N$) do {
3.     while ($j \le |D^2|$ and $M^2_i < 1$) do {
4.         Allocate document $j$ (the $j$-th document of $D^2$) to server $i$.
5.         $L^2_i$ += $r'_j$;
6.         $M^2_i$ += $s'_j$;
7.         $j$++; } }

Figure 3. The 0/1 approximation algorithm for both memory and load constraints (cont.).
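Algorithms 2 and 3 can be combined into one short routine. The following is a sketch under the stated assumptions of equal $l$ and $m$; the function name `two_phase_allocate`, the parameter `f_target` (the target cost called $f^*$ in Figure 2), and the return format are illustrative and not taken from the paper.

```python
def two_phase_allocate(r, s, l, m, M, f_target):
    """Sketch of Algorithms 2 and 3 for M identical servers.

    r[j], s[j]: access cost and size of document j; l, m: connections and
    memory of every server; f_target: the target cost.
    Returns (success, assignment); assignment[j] is a server index or None.
    """
    N = len(r)
    r_norm = [rj / (l * f_target) for rj in r]   # r'_j
    s_norm = [sj / m for sj in s]                # s'_j
    D1 = [j for j in range(N) if r_norm[j] >= s_norm[j]]
    D2 = [j for j in range(N) if r_norm[j] < s_norm[j]]
    assignment = [None] * N

    def fill(docs, weight):
        """One phase: fill servers in order until their tracked weight reaches 1."""
        k = 0
        for i in range(M):
            filled = 0.0   # L^1_i in phase 1, M^2_i in phase 2
            while k < len(docs) and filled < 1:
                j = docs[k]
                assignment[j] = i
                filled += weight[j]
                k += 1

    fill(D1, r_norm)   # Phase 1 tracks normalized load
    fill(D2, s_norm)   # Phase 2 tracks normalized memory
    success = all(i is not None for i in assignment)
    return success, assignment


print(two_phase_allocate([6, 5, 4, 1, 1], [1, 1, 1, 6, 5],
                         l=1, m=10, M=2, f_target=10))
```

Each phase restarts from the first server, so a server may receive documents from both $D^1$ and $D^2$; this is why the bounds for the two sets are added together in the analysis.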

Claim 2 At any point in the algorithm,

$$\max_{i} \left( \max(L^1_i, L^2_i, M^1_i, M^2_i) \right) \le 2,$$

where the max is over $1 \le i \le M$.

Proof: The proof is by induction on the number of documents. Initially this is clearly satisfied. Suppose that the claim holds just prior to the insertion of document $j'$. Let $i$ be the server to which $j'$ is allocated.

Case 1: $j' \in D^1$. Prior to insertion, $L^1_i \le 1$ (for otherwise $j'$ would not be placed here). So after insertion its load is $L^1_i + r'_{j'} \le 1 + r'_{j'} \le 2$ (since $r'_j \le 1$, $\forall j$). Thus after the insertion of $j'$, we have $L^1_i \le 2$, and by the previous claim, $M^1_i \le 2$.

Case 2: $j' \in D^2$. Prior to insertion, $M^2_i \le 1$ (for otherwise $j'$ would not be placed here). So after insertion its memory is $M^2_i + s'_{j'} \le 1 + s'_{j'} \le 2$ (since $s'_j \le 1$, $\forall j$). Thus after the insertion of $j'$, we have $M^2_i \le 2$, and by the previous claim, $L^2_i \le 2$.

□

Claim 3 If there exists an optimal allocation $a^*$ with value $f^*$ satisfying both the memory constraint $\sum_{j=1}^{N} s_j a^*_{ij} \le m$ and the load balance constraint $\sum_{j=1}^{N} \frac{r_j a^*_{ij}}{l} \le f^*$, then Algorithm 2 succeeds in assigning all documents.

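Proof: Suppose not. Let $j'$ be the first document which fails to fit. Consider $L_i, M_i$ just prior to the insertion of $j'$.

• Case 1: $j' \in D^1$

Just prior to the insertion of $j'$, we claim that $L^1_i > 1$, $\forall i$. If this were not so for some $i$, that is, $L^1_i \le 1$, then we would have assigned $j'$ to server $i$. From this, we have

$$M < \sum_{i=1}^{M} L^1_i = \sum_{i=1}^{M} \sum_{j=1}^{N} r'_j a_{ij} < \sum_{j=1}^{N} r'_j,$$

since each document is assigned to at most one server. This implies that the number of servers satisfies $M < \sum_{j=1}^{N} r'_j$. On the other hand, summing the normalized load constraint $\sum_{j=1}^{N} r'_j a^*_{ij} \le 1$ over all $M$ servers gives $\sum_{j=1}^{N} r'_j \le M$. This contradicts the existence of an allocation of value $f^*$.

• Case 2: $j' \in D^2$

Just prior to the insertion of $j'$, we claim that $M^2_i > 1$, $\forall i$. If this were not so for some $i$, that is, $M^2_i \le 1$, then we would have assigned $j'$ to server $i$. From this, we have

$$M < \sum_{i=1}^{M} M^2_i = \sum_{i=1}^{M} \sum_{j=1}^{N} s'_j a_{ij} < \sum_{j=1}^{N} s'_j,$$

since each document is assigned to at most one server. This implies that $M < \sum_{j=1}^{N} s'_j$, whereas summing the normalized memory constraint over all servers gives $\sum_{j=1}^{N} s'_j \le M$, contradicting the existence of a feasible allocation.

□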

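Theorem 3 Under the assumptions of Claim 3, the allocation given by Algorithm 2 assigns all documents, and its cost is at most 4 times that of the optimal allocation in both memory and load constraints, that is,

$$\sum_{j=1}^{N} \frac{r_j a_{ij}}{l} \le 4f^*, \qquad \sum_{j=1}^{N} s_j a_{ij} \le 4m, \qquad \text{for } 1 \le i \le M.$$

Proof: By Claim 2 and Claim 3 we have, for $1 \le i \le M$,

$$\begin{aligned}
\sum_{j=1}^{N} r'_j a_{ij} &= \sum_{j \in D^1} r'_j a_{ij} + \sum_{j \in D^2} r'_j a_{ij} = L^1_i + L^2_i \le 2 + 2 = 4, \\
\sum_{j=1}^{N} s'_j a_{ij} &= \sum_{j \in D^1} s'_j a_{ij} + \sum_{j \in D^2} s'_j a_{ij} = M^1_i + M^2_i \le 2 + 2 = 4.
\end{aligned}$$

By normalization, $r'_j = r_j / (l f^*)$ and $s'_j = s_j / m$, and so returning to the original formulation we have

$$\begin{aligned}
\sum_{j=1}^{N} \frac{r_j a_{ij}}{l} &= \frac{\sum_{j=1}^{N} r'_j a_{ij}\, l f^*}{l} = \sum_{j=1}^{N} r'_j a_{ij} f^* \le 4f^*, \\
\sum_{j=1}^{N} s_j a_{ij} &= \sum_{j=1}^{N} s'_j a_{ij} \cdot m \le 4m.
\end{aligned}$$

□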

Now we describe the complete algorithm. Recall the lower bound from Lemma 1,

$$f^* \ge \frac{\hat{r}}{\sum_{i=1}^{M} l_i}.$$

Since $l_i = l$ here, we have $f^* \ge \hat{r} / (lM)$. We can derive an easy upper bound by observing that in the worst case all documents are allocated to a single server, and hence

$$\frac{\hat{r}}{lM} \le f^* \le \frac{\hat{r}}{l}.$$

Assuming all input quantities are integers, observe that $lMf^*$ is an integer in the interval $[\hat{r}, \hat{r}M]$. By applying binary search to this interval we can determine a minimum value of $lMf^*$, and hence a minimum value for $f^*$, such that Algorithm 2 succeeds in allocating all the documents. This involves $O(\log(\hat{r}M))$ calls to Algorithm 3. It is straightforward to show that Algorithm 3 runs in $O(N + M)$ time. (The key observation is that each iteration of the loop of line 3 either finishes a document or finishes a server.) Thus the total running time is $O((N + M) \log(\hat{r}M))$. Note that since the input size (in bits) is at least $\Omega(N + M + \log \hat{r})$, the algorithm runs in $O(n \log n)$ time, where $n$ is the input size.
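The search over the integer interval can be sketched independently of the allocation routine. The function below takes the test as a callable; `succeeds` is a hypothetical stand-in for "Algorithm 2 assigns all documents when the target is $t/(lM)$", and the sketch assumes that this predicate is monotone (once it succeeds it keeps succeeding for larger targets), which standard binary search requires.

```python
def smallest_passing_target(lo, hi, succeeds):
    """Smallest integer t in [lo, hi] with succeeds(t) True, or None.

    Assumes succeeds is monotone: False ... False True ... True.
    Uses O(log(hi - lo + 1)) evaluations of succeeds.
    """
    if not succeeds(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid        # mid works, the answer is at mid or below
        else:
            lo = mid + 1    # mid fails, the answer is above mid
    return lo


# Illustrative values only: r_hat = 12 and M = 4 give the interval [12, 48].
r_hat, M = 12, 4
calls = []
t = smallest_passing_target(r_hat, r_hat * M,
                            lambda t: calls.append(t) or t >= 37)
print(t, len(calls))  # 37 and the number of predicate evaluations
```

With the value `t` returned for the interval $[\hat{r}, \hat{r}M]$, the corresponding target cost would be `t / (l * M)`.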

The previous sections allowed documents to be as large as the server memories. In practice, document sizes are typically much smaller than the server’s memory. The following theorem shows that if each server can hold at least $k$ documents, then we can achieve a better result, namely $2(1 + 1/k)$ times the optimal solution.

Theorem 4 If there exists an optimal allocation $a^*$ of value $f^*$ satisfying both the memory constraint, $\sum_{j=1}^{N} s_j a^*_{ij} \le m$, and the load balancing constraint, $\sum_{j=1}^{N} \frac{r_j a^*_{ij}}{l} \le f^*$, then the allocation given by Algorithm 2 is at most $2(1 + \frac{1}{k})$ times the optimal solution, where $k$ is the number of documents that a server can hold.

For example, if $r'_j \le 1/4$, we have $2(1 + 1/4) = 5/2$ times the optimal solution.

Proof: The proof is based on the proofs of Claim 2 and Theorem 3. From Claim 2,

$$\max_{i} \left( \max(L^1_i, L^2_i, M^1_i, M^2_i) \right) \le 1 + r'_{j'},$$

where the max is over $1 \le i \le M$. If $r'_{j'} \le 1/k$, then

$$\max_{i} \left( \max(L^1_i, L^2_i, M^1_i, M^2_i) \right) \le 1 + \frac{1}{k}.$$

From Theorem 3, the allocation of Algorithm 2 in Fig. 2 is at most $L^1_i + L^2_i$ and $M^1_i + M^2_i$ times the optimal allocation. Therefore, the allocation given by the 0/1 approximation Algorithm 2 is at most $2(1 + \frac{1}{k})$ times the optimal solution. □
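To see how the bound behaves, the factor $2(1 + 1/k)$ can be evaluated for a few values of $k$; these are plain arithmetic instances of the expression in Theorem 4, not additional results.

$$\begin{aligned}
k = 1 &: \quad 2\left(1 + 1\right) = 4, \\
k = 2 &: \quad 2\left(1 + \tfrac{1}{2}\right) = 3, \\
k = 4 &: \quad 2\left(1 + \tfrac{1}{4}\right) = \tfrac{5}{2}, \\
k = 10 &: \quad 2\left(1 + \tfrac{1}{10}\right) = \tfrac{11}{5}, \\
k \to \infty &: \quad 2\left(1 + \tfrac{1}{k}\right) \to 2.
\end{aligned}$$

The case $k = 1$ recovers the factor 4 of Theorem 3, and as documents become small relative to the server memory the factor approaches 2.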

8. Conclusions

We have considered the problem of balancing the load among a group of Web servers. We showed that even without memory constraints and with all servers having an equal number of HTTP connections, the allocation optimization problem is NP-hard. We also showed that if memory constraints are present, then even determining the existence of a feasible 0-1 allocation is NP-hard. We have presented a number of approximation algorithms, including the cases where there are no memory constraints for any of the servers and where servers have equal memory and HTTP constraints. All of our approximation algorithms are based on simple greedy approaches, and are easy to implement.

References

[1] J. Byers, M. Luby, and M. Mitzenmacher. Accessing multiple mirror sites in parallel: Using tornado codes to speed up downloads. In INFOCOM ’99, volume 1, pages 275–283, March 1999.

[2] V. Cardellini, M. Colajanni, and P. S. Yu. DNS dispatching algorithms with state estimators for scalable Web-server clusters. World Wide Web Journal, pages 101–113, 1999.

[3] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. McGraw-Hill Book Company, New York, 1990.

[4] A. Dingle and T. Partl. Web cache coherence. Computer Networks and ISDN Systems, 28(7):907–920, 1996.

[5] Michael Garland, Sebastian Grassia, Robert Monroe, and Siddhartha Puri. Implementing distributed server groups for the World Wide Web. Technical Report MU-CS-97-114, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 1995.

[6] Sandy Irani. Page replacement with multi-size pages and applications to Web caching. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 701–710, May 1997.

[7] Eric Dean Katz, Michelle Butler, and Robert McGrath. A scalable HTTP server: The NCSA prototype. In Computer Networks and ISDN Systems, volume 27, pages 155–164, 1994.

[8] Dias Kish, Mukherjee, and Renu Tewari. A scalable and highly available Web server. In COMPCON’96, pages 85–92, 1996.

[9] Steve Lewontin and Elizabeth Martin. Client side load balancing for the Web. In 6th International World Wide Web Conference, 1997.

[10] Chengjie Liu and Pei Cao. Maintaining strong cache consistency in the World-Wide Web. IEEE Transactions on Computers, 47(4):445–457, April 1998.

[11] A. Myers, P. Dinda, and H. Zhang. Performance characteristics of mirror servers on the Internet. In INFOCOM ’99, pages 304–312, March 1999.

[12] B. Narendran, Sampath Rangarajan, and Shalini Yajnik. Data distribution algorithms for load balanced fault-tolerant Web access. In Proc. 16th IEEE Sympos. Reliable Distributed Systems, pages 97–106, 1997.

[13] L. Rizzo and L. Vicisano. Replacement policies for a proxy cache. IEEE/ACM Transactions on Networking, 8(2):158–170, April 2000.

[14] M. Sayal, Y. Breitbart, P. Scheuermann, and R. Vingralek. Selection algorithms for replicated Web servers. ACM Performance Evaluation Review, 26(3):44–50, December 1998.

[15] Rahul Simha, B. Narahari, H.-A. Choi, and Li-Chuan Chen. File allocation for a parallel Webserver. In IEEE Int. Conf. High Performance Computing, pages 16–21, December 1996.

[16] E. W. Zegura, H. Ammar, Z. Fei, and S. Bhattacharjee. Application-layer anycasting: A server selection architecture and use in a replicated Web service. IEEE/ACM Transactions on Networking, 8(4):455–
