Model-driven Server Allocation in Distributed Enterprise Systems


James W.J. Xue, Adam P. Chester, Ligang He and Stephen A. Jarvis

Department of Computer Science, University of Warwick,

Coventry CV4 7AL, United Kingdom

Email: {xuewj2, apc, liganghe, saj}@dcs.warwick.ac.uk

Abstract:

Internet service providers (ISPs) usually use several server pools to host different web applications, to ensure smooth system management and minimum interference between applications. The workload demand in each of the pools can vary dramatically due to a number of factors, including timing and the types of the hosted applications. It is therefore desirable that servers be able to switch between pools to optimise resource usage and maximise company revenue. Internet applications can be modelled as multi-tier queueing networks, with each network station corresponding to one application tier. The advantage of using an analytical model is that performance metrics can be easily computed, and potential system bottlenecks can be identified without running the actual system. In this paper, an analytical model is used to assist dynamic resource allocation in server pools. In addition, an admission control scheme is used to deal with system overloading. Performance evaluation is conducted via simulation, and the experimental results show the benefits of our approach for various workload scenarios.

1. Introduction

As Internet computing infrastructure becomes more complex, IT outsourcing continues to grow. Many companies focus on their core business and transfer responsibility for service provision to ISPs to reduce operational costs. ISPs provide such services under so-called Service Level Agreements (SLAs). To make a profit, ISPs have to use resources efficiently while providing the agreed services for the agreed charges. It is common for ISPs to use several server pools to host different web applications. This approach has several benefits, including separation of concerns, smooth system administration, and less interference between applications.

Workload demand for Internet services is usually very bursty [1][3][32], so it is difficult to predict the workload level at a given point in time. It is not uncommon for the load in server pools to follow an alternative pattern: workload demand for one application increases while it decreases for another application in a different server pool (see Fig. 1). A static server configuration then wastes resources in one pool while there is a resource shortage in the other. To deal with workload variation, and therefore to optimise resource usage, some ISPs transfer part of this responsibility back to their customers via pricing models [11][13][20]. Some of these models are range-based, meaning that IT companies need to absorb risk by predicting their

resource usage; they will have to pay a high price for sudden changes in workload. In a competitive market, if ISPs cannot manage resources properly, they will likely charge a higher price for using the resource, and their customers will likely switch to other ISPs. Alternatively, some ISPs use approaches such as resource provisioning and overbooking, or a resource "peak plant", to accommodate workload variation. However, overbooking and peak-plant methods can waste resources or become too costly to operate. It is therefore desirable that resources in a shared hosting environment can be switched between server pools to accommodate workload fluctuation.

Fig. 1. An illustration of the alternative workload pattern in server pools (load against time t for pool 1 and pool 2).

When server switching is considered, two fundamental questions need to be answered: a) which servers should be switched; b) how many servers should be switched. With an analytical model, both questions can be answered easily. The main advantage of using an analytical model is that system bottlenecks can be identified and different performance metrics computed without running the actual system. Enterprise systems are typically multi-tiered, consisting of web servers, application servers and database servers. Such systems can therefore be modelled using queueing networks, with each network station corresponding to one application tier (see Fig. 2). Once a model is built, system run-time parameters (e.g. from monitoring tools or system logs) can be fed into it, and dynamic server switching decisions can be made to optimise pre-defined performance metrics (e.g. to minimise the mean response time or maximise site revenue).

Server switching can bring performance benefits; however, it is not cost-free. It requires supporting system operations, and run-time data must be transferred between servers. Switching an application server, for example, involves deploying and/or undeploying applications, re-establishing network connections from the web servers, and possibly transferring session data between servers. Switching costs therefore need to be considered carefully before switching decisions are made.


Fig. 2. The model of a typical configuration of a multi-tiered Internet service. C represents clients; WS, AS and DS represent web servers, application servers and database servers, respectively.

Moreover, during special events, some Internet services are subject to huge increases in demand, which in extreme cases can lead to system overloading. During an overload period, the service's response time may grow to an unacceptable level, and the exhaustion of resources may cause the service to behave erratically or even crash [29]. The ratio of peak to light load is usually several orders of magnitude. Neither overprovisioning of server resources nor server switching can help in this situation. To better deal with system overloading, many services employ admission control schemes [7][10][29]. By rejecting less important requests when the system is overloaded, admission control schemes can guarantee the performance of specific requests.

Our earlier work [30] has shown the performance benefits of our server switching policy for mixed and random workloads. In this paper, we use the newly established server switching policy to address the resource allocation issue for other workload scenarios, namely the alternative workload and workloads generated from Internet traces. The system works as follows: when there is a system state change, the run-time parameters are fed into the queueing network model, so that new system bottlenecks can be identified and new performance metrics computed. Before making a switching decision, a local search algorithm is used to search for the best system configuration as the basis for the switching. Furthermore, a simple admission control scheme is used to keep the number of simultaneous jobs in an enterprise system at an appropriate level. We evaluate the switching policy via simulation and compare the results with those from a proportional switching policy and a non-switching policy.

The remainder of this paper is organised as follows. Section 2 reviews related work; section 3 describes the modelling of multi-tiered enterprise systems; section 4 introduces the concept of system bottlenecks and an identification methodology; section 5 describes the new server switching policy; in section 6, we briefly describe how admission control is achieved in this framework; the experimental results in section 7 demonstrate the quality of the combination of the admission control scheme and the server switching policy; section 8 concludes the paper.

2. Related Work

Research on revenue maximisation has attracted great interest in recent years. The work in [19] studies methods for maximising the profits of best-effort and QoS-demanding jobs in a web server farm. [22] provides differentiated services to different jobs using priority queues to maximise a service provider's revenue. In [6][13], the authors use an economic approach to manage resources in hosting centres. [23] addresses the issue of server switching between different queues and tries to optimise the total profit by solving a dynamic programming equation, taking into account various rewards and penalties. In [12], the authors also try to maximise the total revenue by partitioning servers into logical pools and switching servers between pools at run-time. This paper differs from [12][23] in the following respects: a) this work addresses the same server switching issue, but for multi-tiered enterprise system architectures, where servers can be switched between pools within the same tier; b) a multi-class closed queueing network model is employed, as opposed to a single M/M/m queue, for computing the various performance metrics; c) this work also captures the notion of bottlenecks and uses an established identification method to guide server switching; d) we deal with system overloading by using a supporting admission control scheme.

Table 1. Notation used in this paper

Symbol   Description
S_ir     Service time of class-r jobs at station i
v_ir     Visiting ratio of class-r jobs at station i
N        Number of service stations in QN
K        Number of jobs in QN
R        Number of job classes in QN
K_ir     Number of class-r jobs at station i
m_i      Number of servers at station i
φ_r      Revenue of each class-r job
π_i      Marginal probability at centre i
T        System response time
D_r      Deadline for class-r jobs
E_r      Exit time for class-r jobs
P_r      Probability that a class-r job stays
X_r      Class-r throughput before switching
X'_r     Class-r throughput after switching
U_i      Utilisation at station i
t_s      Server switching time
t_d      Switching decision interval time

3. Modelling of Multi-tiered Internet Services

3.1 The Model

A tiered Internet service can be modelled using a multi-class closed queueing network [28][31]. Figure 2 shows a model for a typical configuration of such applications. In the model, C refers to the clients; WS, AS and DS refer to the web server, application server and database server respectively. The queueing network is solved using the MVA (Mean Value Analysis) algorithm [24], which is based on Little's law [18] and the Arrival Theorem [24][27] from standard queueing theory. In this section, we briefly describe how different performance metrics can be derived from the closed queueing network model. Table 1 summarises the notation used throughout this paper.

Consider a product-form closed queueing network with N load-independent service stations, where $\mathcal{N} = \{1, 2, \ldots, N\}$ is the set of station indexes. Suppose there are K customers, partitioned into R classes according to their service request patterns; customers grouped in a class are assumed to be statistically identical. $\mathcal{R} = \{1, 2, \ldots, R\}$ is the set of class indexes. The service time, $S_{ir}$, in a multi-class closed queueing network is the average time spent by a class-r job during a single visit to station i. The service demand, denoted $D_{ir}$, is the total service requirement, i.e. the average amount of time that a class-r job spends in service at station i during execution. It can be derived from the Service Demand Law [21] as $D_{ir} = S_{ir} \cdot v_{ir}$, where $v_{ir}$ is the visiting ratio of class-r jobs to station i. $K_r$ is the total population of class r, so the total population of the network is $K = \sum_{r=1}^{R} K_r$. The vector $\vec{K} = (K_1, K_2, \ldots, K_R)$ represents the population of the network.
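As a small illustration of the Service Demand Law, the sketch below computes $D_{ir} = S_{ir} \cdot v_{ir}$ for pool 1, using the service times and visiting ratios listed for that pool in Table 2 (silver and gold columns). The dictionary layout and the function name `service_demands` are assumptions made for this example only.

```python
# Service Demand Law: D_ir = S_ir * v_ir.
# Pool 1 values from Table 2 (columns: silver, gold).
SERVICE_TIME = {  # seconds per visit, S_ir
    "WS": {"silver": 0.07, "gold": 0.1},
    "AS": {"silver": 0.03125, "gold": 0.1125},
    "DS": {"silver": 0.05, "gold": 0.025},
}
VISIT_RATIO = {  # visits per job, v_ir
    "WS": {"silver": 1.0, "gold": 0.6},
    "AS": {"silver": 1.6, "gold": 0.8},
    "DS": {"silver": 1.2, "gold": 0.8},
}


def service_demands(service_time, visit_ratio):
    """Return D[station][job_class] = S * v for every station/class pair."""
    return {
        station: {cls: s * visit_ratio[station][cls] for cls, s in classes.items()}
        for station, classes in service_time.items()
    }


if __name__ == "__main__":
    for station, row in service_demands(SERVICE_TIME, VISIT_RATIO).items():
        print(station, {cls: round(d, 4) for cls, d in row.items()})
```

For pool 1 this gives silver demands of about 0.07, 0.05 and 0.06 seconds and gold demands of about 0.06, 0.09 and 0.02 seconds at the web, application and database tiers.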

In modern enterprise systems, clusters of servers are commonly used in each application tier to improve processing capability. Thus, when modelling these applications, we need to consider both -/M/1-FCFS and -/M/m-FCFS stations. Suppose there are K jobs in the queueing network. For $i = 1, \ldots, N$ and $r = 1, \ldots, R$, the mean response time of a class-r job at station i can be computed as follows:

$$T_{ir}(\vec{k}) = \begin{cases} D_{ir}\left[1 + \sum_{s=1}^{R} K_{is}(\vec{k}-\vec{1}_r)\right], & m_i = 1 \\[4pt] \dfrac{D_{ir}}{m_i}\left[1 + \sum_{s=1}^{R} K_{is}(\vec{k}-\vec{1}_r) + \sum_{j=0}^{m_i-2}(m_i-j-1)\,\pi_i(j \mid \vec{k}-\vec{1}_r)\right], & m_i > 1 \end{cases} \qquad (1)$$

where $\vec{k}-\vec{1}_r = (k_1, \ldots, k_r - 1, \ldots, k_R)$ is the population vector with one class-r job fewer in the system. The mean system response time is the sum of the mean response times of the tiers.

For multi-server nodes ($m_i > 1$), it is necessary to compute the marginal probabilities. The marginal probability that there are j jobs ($j = 1, \ldots, m_i - 1$) at station i, given that the network is in state $\vec{k}$, is given by [4]:

$$\pi_i(j \mid \vec{k}) = \frac{1}{j}\left[\sum_{r=1}^{R} v_{ir} S_{ir} X_r(\vec{k})\, \pi_i(j-1 \mid \vec{k}-\vec{1}_r)\right] \qquad (2)$$

Applying Little's law [18], the throughput of class-r jobs can be calculated as

$$X_r(\vec{k}) = \frac{k_r}{\sum_{i=1}^{N} T_{ir}(\vec{k})} \qquad (3)$$

Applying Little's law again with the Forced Flow Law [21], we derive the mean queue length $K_{ir}$ of class-r jobs at station i:

$$K_{ir}(\vec{k}) = X_r(\vec{k}) \cdot T_{ir}(\vec{k}) \qquad (4)$$

The recursion starts from $K_{ir}(0, 0, \ldots, 0) = 0$, $\pi_i(0 \mid \vec{0}) = 1$ and $\pi_i(j \mid \vec{0}) = 0$ for $j \ge 1$; after K iterations, the system response time, throughput and mean queue length in each tier can be computed.
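The recursion of Eqs. (1) to (4) can be sketched in Python as follows. This is a minimal, unoptimised illustration rather than the solver used in this work: it enumerates every population vector up to $\vec{K}$ (in an order that solves $\vec{k}-\vec{1}_r$ before $\vec{k}$), and it obtains $\pi_i(0 \mid \vec{k})$ from the standard normalisation for multi-server stations (the expected number of busy servers equals $\sum_r D_{ir} X_r$), a step the text does not spell out and which is our assumption. The function name `mva_multiserver` and its argument layout are hypothetical.

```python
from itertools import product


def mva_multiserver(D, m, K):
    """Exact multi-class MVA for a closed product-form network, Eqs. (1)-(4).

    D[i][r] -- service demand D_ir = S_ir * v_ir of class r at station i
    m[i]    -- number of servers at station i
    K       -- per-class populations (K_1, ..., K_R)
    Returns (T, X, Q) at population K: response times T[i][r],
    throughputs X[r] and mean queue lengths Q[i][r].
    """
    N, R = len(m), len(K)
    zero = (0,) * R
    Q = {zero: [[0.0] * R for _ in range(N)]}                    # K_ir(0) = 0
    pi = {zero: [[1.0] + [0.0] * (m[i] - 1) for i in range(N)]}  # pi_i(j | 0)
    T, X = [[0.0] * R for _ in range(N)], [0.0] * R

    def minus_one(k, r):                                          # k - 1_r
        return tuple(k[s] - (1 if s == r else 0) for s in range(R))

    # itertools.product yields k - 1_r before k, so each step can reuse it.
    for k in product(*(range(kr + 1) for kr in K)):
        if k == zero:
            continue
        T, X = [[0.0] * R for _ in range(N)], [0.0] * R
        for r in range(R):
            if k[r] == 0:
                continue
            prev = minus_one(k, r)
            for i in range(N):
                queue = sum(Q[prev][i])
                if m[i] == 1:                                     # Eq. (1), m_i = 1
                    T[i][r] = D[i][r] * (1 + queue)
                else:                                             # Eq. (1), m_i > 1
                    correction = sum((m[i] - j - 1) * pi[prev][i][j]
                                     for j in range(m[i] - 1))
                    T[i][r] = D[i][r] / m[i] * (1 + queue + correction)
            X[r] = k[r] / sum(T[i][r] for i in range(N))          # Eq. (3)
        Q[k] = [[X[r] * T[i][r] for r in range(R)] for i in range(N)]  # Eq. (4)
        pi[k] = []
        for i in range(N):
            p = [0.0] * m[i]
            for j in range(1, m[i]):                              # Eq. (2)
                p[j] = sum(D[i][r] * X[r] * pi[minus_one(k, r)][i][j - 1]
                           for r in range(R) if k[r] > 0) / j
            busy = sum(D[i][r] * X[r] for r in range(R))
            # Assumed normalisation: m_i - sum_j (m_i - j) pi_i(j) = busy servers
            p[0] = 1 - (busy + sum((m[i] - j) * p[j] for j in range(1, m[i]))) / m[i]
            pi[k].append(p)
    return T, X, Q[tuple(K)]
```

For example, with the pool 1 demands arranged as `D = [[0.07, 0.06], [0.05, 0.09], [0.06, 0.02]]` (rows WS, AS, DS; columns silver, gold) and `m = [4, 10, 2]`, a call such as `mva_multiserver(D, [4, 10, 2], (30, 20))` would evaluate one candidate configuration for an arbitrary population of 30 silver and 20 gold jobs. The loop visits $\prod_r (K_r + 1)$ population vectors, so its cost grows quickly with the number of classes.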

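In multi-class product-form queueing networks, per-class station utilisation can be computed using the following equation [24]:

$$U_{ir}(\vec{k}) = \frac{k_r D_{ir}}{\sum_{j=1}^{N} D_{jr}\left[1 + K_j(\vec{k}-\vec{1}_r)\right]} \qquad (5)$$

where $K_j(\cdot) = \sum_{s=1}^{R} K_{js}(\cdot)$ is the total mean queue length at station j. The total station utilisation $U_i(\vec{k})$ is the sum of the per-class station utilisations, $U_i(\vec{k}) = \sum_{r=1}^{R} U_{ir}(\vec{k})$. The above is the exact solution for multi-class product-form queueing networks.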
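Given the throughputs produced by MVA, utilisations follow from the Utilisation Law $U_{ir} = X_r D_{ir}$. The helper below is a hypothetical sketch; in a network where every station has a single server it coincides with Eq. (5), while its scaling by $m_i$ for multi-server stations (so that $U_i$ is a per-server fraction) is our assumption, not part of Eq. (5).

```python
def utilisation(D, X, m):
    """Per-class utilisation U_ir and per-station totals U_i.

    D[i][r] -- service demand of class r at station i
    X[r]    -- class-r throughput (e.g. from an MVA solution)
    m[i]    -- number of servers at station i
    U_ir = X_r * D_ir / m_i; dividing by m_i is an assumption of this sketch.
    """
    per_class = [[X[r] * D[i][r] / m[i] for r in range(len(X))]
                 for i in range(len(m))]
    totals = [sum(row) for row in per_class]  # U_i = sum_r U_ir
    return per_class, totals
```

Comparing the resulting per-tier totals across two pools is the kind of utilisation comparison that the local search of Section 5.2.2 relies on.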

Exact solutions and approximations trade accuracy against speed. We use exact solutions to guide server switching decisions, as a higher degree of accuracy is believed to be important here. To offset their higher computational cost, the switching system itself can run on a dedicated machine, which addresses speed and storage issues and reduces interference with the servers themselves.

4. Bottleneck Identification

In [2], it is shown that multi-class models can exhibit multiple simultaneous bottlenecks, and the dependency of the bottleneck set on the workload mix is derived. In an enterprise system there are normally different classes of jobs, and the class mix can change at run-time. This suggests that there may be several bottlenecks at the same time, and that bottlenecks can shift from tier to tier over time.

4.1 Identification Methods

In [8], it is shown that the bottleneck of a single-class queueing network is the station i with the largest service demand $S_i v_i$, under the assumption that the service time $S_i$, the visiting ratio $v_i$ and the routing frequencies are invariant. Considerable research exists [2][8][16][17][26] on bottleneck identification for multi-class closed product-form queueing networks as the population grows to infinity. For a finite population, the results in [9][15] can be used. In this paper we use the approach developed in [5], which uses convex polytopes for bottleneck identification in multi-class queueing networks. This method can compute the set of potential bottlenecks in a network with one thousand servers and fifty customer classes in just a few seconds. Figures 3 and 4 show the bottleneck identification results using convex polytopes for our chosen configurations of pool 1 and pool 2.

Fig. 3. Bottleneck of the two-class QN in pool 1 (axes: gold class jobs (%) and silver class jobs (%); regions: WS tier, AS tier).

Fig. 4. Bottleneck of the two-class QN in pool 2 (axes: gold class jobs (%) and silver class jobs (%); regions: WS tier, AS tier, DS tier).

Figure 3 shows that in pool 1, when the percentage of gold class jobs is less than 46.2%, the web server tier is the bottleneck; when it is between 46.2% and 61.5%, the system enters a crossover-point region, where the bottleneck changes; when the percentage of gold class jobs in pool 1 exceeds 61.5%, the application server tier becomes the bottleneck.
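To make the single-class criterion of [8] concrete, the sketch below applies it to each class of pool 1 in isolation, using demands $D_{ir} = S_{ir} v_{ir}$ computed from the pool 1 columns of Table 2. This is only the single-class rule, not the convex-polytope method of [5] that produced Figures 3 and 4, so it says nothing about the crossover region; the function name is hypothetical.

```python
def single_class_bottleneck(demands):
    """Return the station with the largest service demand D_i = S_i * v_i.

    This is the single-class asymptotic bottleneck criterion of [8]; it
    assumes that service times, visit ratios and routing are fixed.
    """
    return max(demands, key=demands.get)


# Pool 1 demands per class, D = S * v, computed from Table 2.
POOL1 = {
    "silver": {"WS": 0.07 * 1.0, "AS": 0.03125 * 1.6, "DS": 0.05 * 1.2},
    "gold": {"WS": 0.1 * 0.6, "AS": 0.1125 * 0.8, "DS": 0.025 * 0.8},
}

if __name__ == "__main__":
    for job_class, demands in POOL1.items():
        print(job_class, single_class_bottleneck(demands))
```

With only silver jobs the rule selects the web tier, and with only gold jobs it selects the application tier, which agrees with the two ends of Fig. 3 as described above.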

Figure 4 shows the bottleneck identification in pool 2. It is more complex and is a good example of multiple bottlenecks and bottleneck shifting. In this case, when the percentage of silver class jobs is less than 16.7%, the web server tier is the bottleneck; when it is between 16.7% and 33.3%, both the web server tier and the database tier are in the crossover region; if the percentage of silver class jobs lies between 33.3% and 50.0%, the database tier becomes the bottleneck; when it is between 50.0% and 75.0%, the system enters another crossover region, where the application server tier and the database server tier dominate; and finally, if the percentage of silver class jobs exceeds 75.0%, the application server tier is the bottleneck.
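The regions read off Figures 3 and 4 can be encoded as a simple lookup table that a switching engine might consult when the monitored class mix changes. The sketch below is a hypothetical helper: the thresholds come from the text, while the region labels and the assignment of exact boundary points to the upper region are our assumptions.

```python
from bisect import bisect_right

# Bottleneck regions reported for Figures 3 and 4.
# Pool 1 is indexed by the percentage of gold jobs, pool 2 by silver jobs.
REGIONS = {
    1: ([46.2, 61.5], ["WS", "WS/AS crossover", "AS"]),
    2: ([16.7, 33.3, 50.0, 75.0],
        ["WS", "WS/DS crossover", "DS", "AS/DS crossover", "AS"]),
}


def bottleneck_region(pool, percent):
    """Map a class-mix percentage to the bottleneck region of a pool.

    A value exactly on a threshold is placed in the region above it
    (an assumption; the figures do not settle the boundary points).
    """
    thresholds, labels = REGIONS[pool]
    return labels[bisect_right(thresholds, percent)]
```

For instance, `bottleneck_region(2, 40.0)` returns `"DS"`, matching the statement that the database tier is the bottleneck of pool 2 when 33.3% to 50.0% of the jobs are silver.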

5. Server Switching

As previously highlighted, the workload in enterprise systems can vary significantly, which makes it very difficult to predict in advance. A one-time system configuration is therefore no longer effective, and it is desirable that servers be able to switch from one pool to another, depending on the load conditions. However, server switching is not cost-free, since the servers being switched cannot serve jobs during the switching period. A decision therefore has to be made as to whether switching is worthwhile and, if so, how much resource should be switched [12]. Figure 5 illustrates the switching system.

Fig. 5. Illustration of the server switching system (components: Workload, Admission Control, Enterprise System, System Monitoring, Switching Engine, Performance Model, Workload Model, SLA).

First, a workload model is built from the load that enters through the admission control component. Based on the workload model and system hardware information, a performance model can be built. The system monitoring facility collects run-time system data and communicates with the switching engine. If the job class mix changes, the monitoring tool should detect the change and pass it to the server switching engine. The engine then solves the performance model and compares the benefits and penalties of all possible switches before making the final decision.

5.1 Revenue Function

For a typical Internet service, a user normally issues a sequence of requests (referred to as a session) during a visit to the service site. Under most SLAs between ISPs and content providers, the revenue contribution of a request is closely related to its QoS (e.g., the response time of the request): a request contributes full revenue if it is processed before its deadline $D_r$. When a request r misses its deadline, it still waits for execution with probability $P(T_r)$, and credit is still due for late, yet successful, processing. When the response time $T_r < D_r$, then $P(T_r) = 1$, which means that the request contributes full revenue and the user will send another request. Suppose $E_r$ is the time point at which the request is dropped from the system. It is assumed in this paper that when $D_r \le T_r \le E_r$, the request remains in the system with probability $P(T_r)$, which decreases linearly from 1 to 0 (equivalently, the client's abandonment time is uniformly distributed over $[D_r, E_r]$). If $T_r > E_r$, then $P(T_r) = 0$, which means that the request quits the system without contributing any revenue. The following equation is used to calculate $P(T_r)$:

$$P(T_r) = \begin{cases} 1, & T_r < D_r \\[4pt] \dfrac{E_r - T_r}{E_r - D_r}, & D_r \le T_r \le E_r \\[4pt] 0, & T_r > E_r \end{cases} \qquad (6)$$

Eq. (6) means that the further the completion time of a job r exceeds its deadline, the more likely it is that the client will quit the system, which approximates real-world client behaviour.
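Eq. (6) translates directly into code. The sketch below is illustrative; the function names are hypothetical, and the usage values are the deadline and exit point of pool 1 silver jobs from Table 2 (20 s and 30 s).

```python
def stay_probability(t, deadline, exit_time):
    """Eq. (6): probability that a request with response time t stays.

    Assumes exit_time > deadline, as for every class in Table 2.
    """
    if t < deadline:
        return 1.0
    if t > exit_time:
        return 0.0
    return (exit_time - t) / (exit_time - deadline)  # linear decay on [D_r, E_r]


def expected_revenue(t, deadline, exit_time, unit_revenue):
    """Expected revenue of one request, phi_r * P(T_r)."""
    return unit_revenue * stay_probability(t, deadline, exit_time)
```

For example, `stay_probability(25, 20, 30)` returns `0.5`: a silver request in pool 1 that takes 25 seconds is as likely to be abandoned as to complete.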

Based on the revenue function, the revenue gained and lost by server switching can be calculated. Suppose some servers need to be switched from pool i to pool j. We use $V^i_{loss}$ to represent the revenue loss in pool i. From the moment switching begins, the service capacity offered by server pool i starts to degrade. Using Eq. (6), the revenue loss in pool i over the decision interval can be derived as

$$V^i_{loss} = \sum_{r=1}^{R} X^{i}_{r}(\vec{k}^i)\,\phi^i_r\,P(T_r)\,t_d - \sum_{r=1}^{R} X'^{i}_{r}(\vec{k}^i)\,\phi^i_r\,P(T_r)\,t_d \qquad (7)$$

The server switching itself takes time, during which neither pool i nor pool j can use the servers being switched. Only after the switching time $t_s$ does pool j benefit from the switched servers. During the switching decision interval $t_d$, the revenue gain $V^j_{gain}$ can be calculated as

$$V^j_{gain} = \sum_{r=1}^{R} X'^{j}_{r}(\vec{k}^j)\,\phi^j_r\,P(T_r)\,(t_d - t_s) - \sum_{r=1}^{R} X^{j}_{r}(\vec{k}^j)\,\phi^j_r\,P(T_r)\,(t_d - t_s) \qquad (8)$$

where it is assumed that the decision interval $t_d > t_s$.

Our goal in this paper is to maximise the ISP's total revenue contributed by both pool i and pool j. In other words, when deciding whether to switch servers, we compare the revenue gain and loss caused by server switching, and switching is done only when $V^j_{gain} > V^i_{loss}$. Here, we only consider switching servers between pools in the same tier (e.g., switching web servers from pool i to the web server tier in pool j), although, given proper configuration, switching is also possible between tiers (e.g., switching web servers in pool i to the application tier in pool j).
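The switching test of Eqs. (7) and (8) can be sketched as follows. The function names and the dictionary keys are hypothetical; evaluating $P(T_r)$ separately at the response times before and after the switch is our reading of the equations, which write $P(T_r)$ in both terms.

```python
def revenue_rate(throughputs, unit_revenue, stay_prob):
    """Sum over classes of X_r * phi_r * P(T_r): revenue earned per second."""
    return sum(x * phi * p for x, phi, p in zip(throughputs, unit_revenue, stay_prob))


def should_switch(pool_i, pool_j, t_d, t_s):
    """Apply Eqs. (7) and (8): switch only if V_gain(j) > V_loss(i).

    Each pool is a dict with keys "X", "X_new" (throughputs before/after),
    "phi" (revenue units) and "P", "P_new" (stay probabilities before/after).
    Returns (decision, v_gain, v_loss).
    """
    if t_d <= t_s:
        raise ValueError("the decision interval t_d must exceed t_s")
    v_loss = (revenue_rate(pool_i["X"], pool_i["phi"], pool_i["P"])
              - revenue_rate(pool_i["X_new"], pool_i["phi"], pool_i["P_new"])) * t_d
    v_gain = (revenue_rate(pool_j["X_new"], pool_j["phi"], pool_j["P_new"])
              - revenue_rate(pool_j["X"], pool_j["phi"], pool_j["P"])) * (t_d - t_s)
    return v_gain > v_loss, v_gain, v_loss
```

In a toy case using the revenue units of Table 2 (pool 1: silver 2, gold 10; pool 2: gold 20, silver 4), made-up throughputs, $t_d = 30$ s and $t_s = 5$ s, the donor pool loses about 42 revenue units while the receiving pool gains about 270, so the switch would be made.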


5.2 Server Switching Policies

In this section, several switching policies are introduced (described as algorithms). In all algorithms, superscripts represent pools, and subscripts 0, 1 and 2 represent the web tier, application tier and database tier respectively.

5.2.1 Proportional Switching Policy

First, we consider a naïve policy called the proportional switching policy (PSP). The policy switches servers between pools based on the workload proportion in both pools. Performance criteria for server switching are computed using the queueing network model; if the performance of the new configuration is better than that of the current one, server switching is done; otherwise the server configuration remains the same. Algorithm 1 describes the operation of this policy.

Algorithm 1 Proportional switching policy
Input: N, m_i, R, K_ir, S_ir, v_ir, φ_r, t_s, t_d
Output: server configuration
for each i in N do
    m_i^1 / m_i^2 = K^1 / K^2
end for
calculate V_loss and V_gain using eq. 7 and eq. 8
if V_gain > V_loss then
    do switching according to the calculations; S_ir ← S'_ir
else
    server configuration remains the same
end if
return current configuration

Algorithm 1 is simple, as it only considers the workload proportion. In fact, the workload mix and the revenue contribution of individual classes in different pools also affect the total revenue. We therefore introduce a new switching policy which takes these factors into account.
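The proportional rule $m_i^1 / m_i^2 = K^1 / K^2$ in Algorithm 1 needs integer server counts. The sketch below shows one way to obtain them; rounding to the nearest integer and keeping at least one server per pool in every tier are assumptions, since Algorithm 1 leaves them unspecified, and the subsequent $V_{gain} > V_{loss}$ check is omitted. The function names are hypothetical.

```python
def proportional_split(total_servers, load_1, load_2):
    """Split one tier's servers so that m^1 / m^2 is close to K^1 / K^2.

    Rounding to the nearest integer and keeping at least one server in
    each pool are assumptions of this sketch.
    """
    if total_servers < 2:
        raise ValueError("need at least one server per pool")
    share = total_servers * load_1 / (load_1 + load_2)
    m1 = min(max(round(share), 1), total_servers - 1)
    return m1, total_servers - m1


def proportional_configuration(servers_per_tier, load_1, load_2):
    """Apply the proportional split to every tier (web, application, database)."""
    return [proportional_split(total, load_1, load_2) for total in servers_per_tier]
```

With the Table 2 totals of 9 web, 25 application and 5 database servers and a load of (100, 110), this returns `[(4, 5), (12, 13), (2, 3)]`. The rule ignores class mix and revenue per class, which is exactly the weakness noted above.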

5.2.2 Bottleneck-aware Switching Policy

Algorithm 2 The bottleneck-aware switching policy
Input: N_r, m_i, R, K_ir, S_ir, v_ir, φ_r, t_s, t_d
Output: new configuration
while bottleneck saturation found in one pool do
    if found at same tier in the other pool then
        return
    else
        switch servers to the bottleneck tier: m_i ← m'_i and S_ir ← S'_ir
    end if
end while
search configurations using Algorithm 3
return current configuration

Here we describe a more sophisticated server switching policy called the bottleneck-aware switching policy (BSP), given in Algorithm 2. BSP works in two phases:

1) Bottleneck identification. It first checks for bottleneck saturation in both pools. If both pools have bottlenecks at the same tier, two cases are considered: a) if both of them are saturated, then no server is switched; b) if a bottleneck is saturated in one pool but not in the other, then the algorithm incrementally switches servers between the same tiers and compares the new revenue with the value from the current configuration. If a potential configuration results in more revenue, the configuration is stored. The process continues until no bottleneck saturation exists in either pool, or no more switching can be done from the other pool. Note that when bottleneck saturation is found, server switching in the other tiers has little or no effect, so it can safely be neglected.

2) Local search. If there is no bottleneck saturation in either pool, the algorithm computes the server utilisation at all tiers in both pools and switches servers from low-utilisation tiers to high-utilisation tiers using a local search algorithm (Algorithm 3).

Algorithm 3 The configuration search algorithm
Input: N_r, m_i, R, K_ir, S_ir, v_ir, φ_r, t_s, t_d
Output: best configuration
Initialisation: compute U_i^1, U_i^2
while U_0^1 > U_0^2 do
    if m_0^2 > 1 then
        m_0^2 ↓, m_0^1 ↑; S^2_0r ← S'^2_0r
        while U_1^1 > U_1^2 do
            if m_1^2 > 1 then
                m_1^2 ↓, m_1^1 ↑; S^2_1r ← S'^2_1r
                while U_2^1 > U_2^2 do
                    if m_2^2 > 1 then
                        m_2^2 ↓, m_2^1 ↑; S^2_2r ← S'^2_2r; compute V_loss using eq. 7
                        S^1_2r ← S'^1_2r; compute V_gain using eq. 8
                        if V_gain > V_loss then
                            store current configuration
                        end if
                        compute new U_i^1, U_i^2
                    end if
                end while
                similar steps for U_2^1 < U_2^2
                S^1_1r ← S'^1_1r; compute new U_i^1, U_i^2
            end if
        end while
        similar steps for U_1^1 < U_1^2
        S^1_0r ← S'^1_0r; compute new U_i^1, U_i^2
    end if
end while
similar steps for U_0^1 < U_0^2
return best configuration

Algorithm 3 uses nested loops to search for possible server switches, starting from the web tier and continuing to the database tier. It tries to explore as many switching configurations as possible. However, it does not guarantee that the best switching result (the global optimum) will be found, so it is a best-effort algorithm. If we use $m_0$, $m_1$, $m_2$ to represent the total number of web servers, application servers and database servers in both pools respectively, then in the worst case the total number of searches made by Algorithm 3 is $(m_0-2)\times(m_1-2)\times(m_2-2)$, so the time complexity is $O(m_0 \cdot m_1 \cdot m_2)$. For typical server configurations, $m_0$, $m_1$ and $m_2$ are not large, so Algorithm 3 is feasible in practice. The time for each search iteration depends on the complexity of the underlying queueing network model, which in turn depends on the number of stations and the number of job classes (the dominant factor, as shown in [17]). Enterprise systems are normally three-tiered (N = 3), and the number of job classes is normally small, depending on the classification criteria. Solving such a multi-class closed queueing network model is therefore very quick, and the same applies to each iteration of the search algorithm. For our configuration, the average runtime of the algorithm was observed to be less than 200 milliseconds on a 2.2 GHz server, which is considered acceptable.
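The sketch below is a deliberately simplified, greedy variant of the idea behind Algorithm 3, not a transcription of it: it visits the web, application and database tiers in turn and keeps moving single servers from the less utilised pool to the more utilised pool while the move has a positive net benefit, whereas Algorithm 3 nests the tiers to explore combinations. The callbacks `utilisation` and `net_benefit` are hypothetical hooks where a queueing model (e.g. MVA together with Eqs. 7 and 8) would be plugged in.

```python
from typing import Callable, List, Tuple

Config = Tuple[List[int], List[int]]  # servers per tier in pool 1 and pool 2


def greedy_tier_search(
    config: Config,
    utilisation: Callable[[Config], Tuple[List[float], List[float]]],
    net_benefit: Callable[[Config, Config], float],
) -> Config:
    """Greedy, utilisation-guided server moves, one tier at a time.

    utilisation(cfg) -> (U^1, U^2), per-tier utilisations of both pools.
    net_benefit(old, new) -> V_gain - V_loss of moving from old to new.
    """
    best = (list(config[0]), list(config[1]))
    for tier in range(len(best[0])):  # web (0), application (1), database (2)
        u1, u2 = utilisation(best)
        if u1[tier] == u2[tier]:
            continue
        # Move servers from the less utilised pool to the more utilised one.
        donor, receiver = (1, 0) if u1[tier] > u2[tier] else (0, 1)
        while best[donor][tier] > 1:  # every pool keeps at least one server
            u = utilisation(best)
            if u[receiver][tier] <= u[donor][tier]:
                break  # the imbalance has been removed
            candidate = (list(best[0]), list(best[1]))
            candidate[donor][tier] -= 1
            candidate[receiver][tier] += 1
            if net_benefit(best, candidate) <= 0:
                break  # switch only when V_gain > V_loss
            best = candidate
    return best
```

Because each tier is visited once and every accepted move takes a server from the donor pool, this variant makes at most $m_0 + m_1 + m_2$ moves, far fewer than the $(m_0-2)\times(m_1-2)\times(m_2-2)$ worst case of Algorithm 3, at the price of exploring fewer configurations.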

5.3 Proactive and Reactive Switching

In our switching system, two approaches to server switching can be used: proactive switching and reactive switching. Proactive switching is motivated by identifying similar workload patterns over time (hours, days, weeks etc.). Most Internet services have cyclical patterns. For instance, for real-time financial applications, the peak load normally appears at the opening and the close of the market, and the load is lower during the remainder of the trading hours; Mondays and Fridays are also busier than other weekdays. Based on historical workload patterns, and by applying workload prediction techniques such as those introduced in [25], the server switching engine can re-allocate resources before the expected heavy workload arrives, and can thus avoid the costs of server switching during a heavily loaded period. However, due to uncertainties, workload demand can vary greatly, and the workload predictor can introduce inaccuracies, which are then passed to the switching engine and can lead to inappropriate or wrong decisions. Proactive switching is therefore not perfect; at best it can improve overall performance over long periods.

Reactive switching is more dynamic: it is based on run-time system parameters and can respond to system state changes quickly. The run-time data is collected via system monitoring tools, reformatted, and fed into the analytical model. The model is then solved and alternative switching decisions are compared. The proactive and reactive switching approaches can of course work together to optimise overall system performance.

6. Admission Control

In this paper, we also use a simple admission control (AC) scheme, in addition to the server switching policy, to keep the number of concurrent jobs in the system at an appropriate level. When the workload is high, which in turn drives up the overall system response time, less important requests are rejected first. If requests in this category have been rejected but the overall response time still remains high, the AC scheme continues to reject jobs until the response time decreases to an acceptable level.
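The scheme can be sketched as a loop that sheds load from the least important class first. The function `admit` and its `response_time` estimator are hypothetical; in the framework described here the estimate would come from the queueing model, and the one-job-at-a-time rejection granularity is our assumption.

```python
def admit(population, priority_order, response_time, target):
    """Reject jobs, least important class first, until the estimate is acceptable.

    population     -- dict: job class -> number of concurrent jobs
    priority_order -- classes from least to most important, e.g. ["silver", "gold"]
    response_time  -- callable estimating the overall response time of a population
    target         -- acceptable response time
    Returns the admitted population and the number of rejected jobs per class.
    """
    admitted = dict(population)
    rejected = {cls: 0 for cls in population}
    for cls in priority_order:
        # Only move on to a more important class if this one is exhausted.
        while admitted[cls] > 0 and response_time(admitted) > target:
            admitted[cls] -= 1
            rejected[cls] += 1
    return admitted, rejected
```

With a toy estimator `lambda p: sum(p.values()) / 10` and a target of 5.0, a population of 30 gold and 50 silver jobs is reduced to 30 gold and 20 silver: only silver jobs are rejected, because the target is met before any gold job has to be turned away.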

7. Performance Evaluation

7.1 Experimental Setup

We design and develop a simulator to evaluate the server switching approach in this paper. Two applications are simulated, running on two logical pools (1 and 2). Each application has two classes of job (gold and silver), which represent the importance of these jobs. Both applications are multi-tiered and run on a cluster of servers. The service time $S_{ir}$ and the visiting ratio $v_{ir}$ are chosen based on realistic values or from those supplied in supporting literature. Based on a real test-bed to which we have access, application server switching is known to take less than five seconds. Web server switching is relatively straightforward, but database server switching is more complex. In this paper, we assume for simplicity that the switching cost is the same for web servers, application servers and database servers. However, different switching costs do not affect the principle of the policies. Experimental parameters can be found in Table 2.

Table 2. Experimental setup.

                         Pool 1              Pool 2
                         Silver    Gold      Gold      Silver
Service time (sec)  WS   0.07      0.1       0.05      0.025
                    AS   0.03125   0.1125    0.01      0.06
                    DS   0.05      0.025     0.0375    0.025
Visiting ratio      WS   1.0       0.6       1.0       0.8
                    AS   1.6       0.8       2.0       1.0
                    DS   1.2       0.8       1.6       1.6
Deadline (sec)           20        15        6         8
Exit point (sec)         30        20        10        12
Revenue unit             2         10        20        4
Number of servers   WS   4                   5
                    AS   10                  15
                    DS   2                   3

7.2 Evaluation Results

Experiments have been conducted for an alternative workload and workloads generated from real-world Internet traces. For each of these cases, we compare the results from our bottleneck-aware server switching policy (BSP) with those from the proportional server switching policy (PSP) and the non-switching policy (NSP).

7.2.1 Alternative Workload

In a web hosting centre, it is not uncommon that during certain periods the workload for one application increases while it decreases for another. Such an alternative workload can affect overall system performance. In this section, we evaluate performance for two cases: 1) when the workload increases in pool 1 and decreases in pool 2; 2) when the workload increases in pool 2 and decreases in pool 1. In both cases, the workload mix of silver and gold class jobs in both pools is constant, and the total number of concurrent users is fixed. During evaluation, admission control is applied when necessary. Both sets of experiments are run for 570 seconds, during which 19 switching decisions are made. Table 3 and Table 4 list the results for both sets of experiments.

Table 3. Load in pool 1 increases while it decreases in pool 2.

Load        NSP     Without AC         With AC
(P1, P2)            PSP      BSP       PSP      BSP
(20, 190)   2418    403      5916      403      5916
(30, 180)   2429    2429     2569      2429     2569
(40, 170)   2429    2429     6134      2429     6134
(50, 160)   2425    2425     2619      2425     2619
(60, 150)   2420    2420     7175      2420     7175
(70, 140)   2415    2415     3458      2415     3385
(80, 130)   2410    2410     15097     2410     15097
(90, 120)   3827    3827     10389     3827     9288
(100, 110)  3459    3459     11014     3459     3837
(110, 100)  6374    6374     11872     6374     16510
(120, 90)   5244    5526     11189     5526     16497
(130, 80)   5557    6923     7963      6923     16233
(140, 70)   4761    6255     13367     6255     16151
(150, 60)   6735    3780     13408     3780     16038
(160, 50)   6834    6905     13461     6905     15877
(170, 40)   6944    6273     13532     6273     15639
(180, 30)   7068    6478     13632     6478     15264
(190, 20)   7201    7012     13752     7012     14604
(200, 10)   7143    7387     13487     7387     13114
Total       88093   85130    190034    85130    211947
Improvement (%)     -3.4     115.7     -3.4     140.6

In Table 3, the workload in pool 1 increases in steps of 10 from 20 to 200, while it decreases in steps of 10 from 190 to 10 in pool 2. The total revenue from NSP is 88,093. If AC is not applied, the total revenues from PSP and BSP are 85,130 and 190,034, representing a -3.4% and a 115.7% improvement, respectively. When AC is applied, the total revenues from PSP and BSP are 85,130 and 211,947, representing a -3.4% and a 140.6% improvement, respectively. The negative impact of PSP is expected, as PSP is a naïve switching policy which simply allocates servers based on the workload proportion, regardless of the performance results. Moreover, each server switch has an associated cost. Although in some runs PSP yields higher revenue than NSP, its overall improvement over the whole experiment can be negative (note that PSP does not switch servers in every run). In this set of experiments, there is also a performance improvement when admission control is applied. In Table 4, the workload in pool 1 decreases in steps of 10 from 190 to 10, while it increases in steps of 10 from 20 to 200 in pool 2.
The total revenue from NSP is 83,289. Without AC, the total revenues from PSP and BSP are 105,698 and 127,469, representing improvements of 26.9% and 53.0%, respectively. With AC, the total revenues are 105,698 and 117,808, representing improvements of 26.9% and 41.4%. Note that with AC the total revenue from BSP is lower than without AC. This is plausible under light load (such as the workload chosen in this case), because the AC acts before the BSP: if the workload saturates a system bottleneck, the AC simply rejects requests. However, the saturation under the current configuration may be resolved by a new configuration returned by the BSP, in which case the rejected requests represent lost revenue. We believe that when the workload is high, due to the switching costs, the overall revenue without AC will be lower than with AC. To confirm this, we set the total number of users in both pools to 250. The total revenue from NSP is then 84,170. Without AC, it is 127,918 using PSP and 158,487 using BSP, representing improvements of 52.0% and 88.3%. With AC, the total revenues from PSP and BSP are 127,918 and 161,550, representing improvements of 52.0% and 91.9%. In conclusion, in terms of total revenue, BSP outperforms PSP in all of these experiments. AC does not always improve performance; whether it does depends on the workload intensity and workload mix.

Table 4. Load in pool 1 decreases while it increases in pool 2.

Load            NSP      Without AC          With AC
(P2, P1)                 PSP      BSP        PSP      BSP
(20,190)        7201     6227     14492      6227     14492
(30,180)        7068     6862     11895      6862     11895
(40,170)        6944     5584     9865       5584     9865
(50,160)        6834     5576     14617      5576     14617
(60,150)        6735     5569     14787      5569     14787
(70,140)        4761     5560     14917      5560     14917
(80,130)        5557     5551     1790       5551     1790
(90,120)        5244     5540     4567       5540     4742
(100,110)       6374     5528     7562       5528     5238
(110,100)       3459     5515     9442       5515     3321
(120,90)        3827     5499     12455      5499     4233
(130,80)        2410     5482     586        5482     2028
(140,70)        2415     5461     1792       5461     7181
(150,60)        2420     5436     1346       5436     1346
(160,50)        2425     5405     1468       5405     1468
(170,40)        2429     5367     1469       5367     1469
(180,30)        2429     5314     1471       5314     1471
(190,20)        2418     5229     1474       5229     1474
(200,10)        2339     4993     1474       4993     1474
Total           83289    105698   127469     105698   117808
Improvement (%)          26.9     53.0       26.9     41.4
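The improvement figures quoted for Tables 3 and 4 are relative gains in total revenue over the no-switching baseline, that is (R_policy - R_NSP) / R_NSP x 100. As a quick arithmetic check, the Python sketch below recomputes them from the table totals; the helper name improvement_pct and the dictionary layout are ours, for illustration only.

```python
# Relative revenue improvement of a switching policy over the
# no-switching policy (NSP), recomputed from the totals in Tables 3 and 4.
def improvement_pct(policy_revenue: float, nsp_revenue: float) -> float:
    """Return (R_policy - R_NSP) / R_NSP * 100, rounded to one decimal."""
    return round((policy_revenue - nsp_revenue) / nsp_revenue * 100, 1)


# Total revenues copied from the "Total" rows of Tables 3 and 4.
totals = {
    "Table 3": {"NSP": 88093, "PSP": 85130, "BSP": 190034,
                "PSP+AC": 85130, "BSP+AC": 211947},
    "Table 4": {"NSP": 83289, "PSP": 105698, "BSP": 127469,
                "PSP+AC": 105698, "BSP+AC": 117808},
}

results = {}
for table, revenue in totals.items():
    nsp = revenue["NSP"]
    # Compare every switching policy (with and without AC) against NSP.
    results[table] = {policy: improvement_pct(r, nsp)
                      for policy, r in revenue.items() if policy != "NSP"}

for table, row in results.items():
    print(table, row)
```

The printed values reproduce the "Improvement (%)" rows of both tables.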

7.2.2 Workloads From Internet Traces

In this section, the workload used for our simulation is generated from real-world Internet traces. Two Internet traces [14] supply the workloads for the two server pools. The EPA-HTTP trace contains a day's worth of HTTP requests to the EPA WWW server located at Research Triangle Park, NC. The SDSC-HTTP trace contains a day's worth of HTTP requests to the SDSC WWW server located at the San Diego Supercomputer Center in California. Workload characteristics (in terms of the number of requests in the system) are extracted from both traces every five minutes. Two switching decision intervals are considered: 1) a short interval of 30 seconds; and 2) a long interval of 60 seconds. Whereas Section 7.2.1 uses a fixed five-second server switching time, here we use different switching times (5, 10 and 15 seconds) and evaluate the impact of the switching cost on total revenue under the three switching policies. Each case is evaluated with and without the admission control scheme. Tables 5 and 6 list the results for the short and long switching decision intervals, respectively. As both tables show, for all switching times both PSP and BSP perform better than NSP in terms of revenue contribution, with and without AC, with one exception: BSP without AC in the short-interval case with a 15-second switching time. The improvement for PSP ranges from 5.6% to 16.4%, whereas for BSP it ranges from 103.5% to 136.0%, apart from this one exception (-7.3%).

Table 5. Short decision interval for workload from traces.

Switching time (sec)            5                    10                   15
Policy                          NSP   PSP    BSP     NSP   PSP    BSP     NSP   PSP    BSP
Without AC
  No. of switches               0     18     5       0     14     5       0     13     16
  Total revenue (x1000)         614   683.5  1374    614   715    1370    614   648.7  569.1
  Improvement (%)               0     11.3   123.7   0     16.4   123.2   0     5.6    -7.3
With AC
  No. of switches               0     18     7       0     14     5       0     13     24
  Total revenue (x1000)         614   683.2  1447    614   714.7  1370    614   648.7  1250
  Improvement (%)               0     11.3   135.6   0     16.4   123.2   0     5.6    103.5
Improvement over non-AC (pp)    0     0      11.9    0     0      0       0     0      110.8

pp = percentage points: the improvement with AC minus the improvement without AC. The same convention applies in Table 6.

Table 6. Long decision interval for workload from traces.

Switching time (sec)            5                    10                   15
Policy                          NSP   PSP    BSP     NSP   PSP    BSP     NSP   PSP    BSP
Without AC
  No. of switches               0     18     5       0     18     5       0     16     5
  Total revenue (x1000)         1228  1369   2750    1228  1367   2747    1228  1420   2744
  Improvement (%)               0     11.5   123.9   0     11.3   123.7   0     15.6   123.4
With AC
  No. of switches               0     18     7       0     18     7       0     16     7
  Total revenue (x1000)         1228  1369   2899    1228  1366   2894    1228  1420   2890
  Improvement (%)               0     11.4   136.0   0     11.3   135.6   0     15.6   135.3
Improvement over non-AC (pp)    0     -0.1   12.1    0     0      11.9    0     0      11.9
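Extracting workload characteristics every five minutes amounts to counting requests in consecutive 300-second windows. The sketch below is a minimal illustration, assuming each trace has already been parsed into request timestamps in seconds; parsing the EPA-HTTP and SDSC-HTTP log formats is not shown, and the function requests_per_window is hypothetical rather than the tooling used for these experiments.

```python
from collections import Counter
from typing import Iterable, List

WINDOW_SECONDS = 300  # five-minute extraction interval used for both traces


def requests_per_window(timestamps: Iterable[float],
                        window: int = WINDOW_SECONDS) -> List[int]:
    """Count requests falling in each consecutive window of `window` seconds.

    Hypothetical helper: timestamps are seconds relative to any origin,
    windows are aligned to the earliest request, and windows with no
    requests are reported as 0.
    """
    ts = sorted(timestamps)
    if not ts:
        return []
    start = ts[0]
    # Map each request to the index of the window it falls into.
    counts = Counter(int((t - start) // window) for t in ts)
    n_windows = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical example: five requests spread over roughly 11 minutes.
sample = [0, 10, 299, 300, 650]
print(requests_per_window(sample))  # [3, 1, 1]
```

Keeping empty windows as zeros preserves the time axis, so each returned list could be replayed against a pool at a fixed five-minute step.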

Table 6 shows that when the number of switches is the same (as it is for BSP across the different switching times), the longer the server switching time, the smaller the performance improvement. Table 6 also shows that the number of PSP switches is the same for switching times of 5 and 10 seconds, but when the switching time increases to 15 seconds the number of switches decreases by 2 to 16, which results in a slight performance improvement. In Table 5, when the server switching time increases from 5 to 10 seconds, the number of PSP switches drops from 18 to 14, which also results in a slight performance improvement. When the switching time is increased to 15 seconds, the number of switches decreases by only 1; since the cost of each switch has increased by 50%, the total revenue falls. These results are intuitive. Server switching is not cost-free, so the performance improvement is closely related to how long a switch takes and how many switches are made. There is a trade-off between performance improvement and the number of server switches, and it depends on the decision interval and the server switching time. On the one hand, more switches offer more potential for performance improvement; on the other hand, because of the switching costs involved, too many switches can reduce the improvement or even make it negative. For the chosen workload, applying AC brings no performance improvement for PSP. For BSP, the gain from AC is approximately 12 percentage points in most cases; it is zero with a 10-second switching time in Table 5, and much larger in the last case of Table 5, where the improvement is 103.5% with AC but negative without AC. Figs. 6 and 7 illustrate the results presented in Tables 5 and 6: BSP contributes more revenue than NSP and PSP in most of the experiments.
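The cost side of this trade-off can be made concrete by multiplying the number of switches by the switching time. The sketch below does this for the PSP columns of Tables 5 and 6, under our simplifying assumption (not stated for these experiments) that a server earns no revenue for the whole time it is being switched; the names are illustrative.

```python
# Server-seconds spent switching for PSP, from the switch counts in
# Tables 5 and 6. Assumption (ours): a server in transit serves no requests.
SWITCH_TIMES = (5, 10, 15)  # seconds

psp_switches = {
    "Table 5 (short interval)": (18, 14, 13),
    "Table 6 (long interval)": (18, 18, 16),
}


def switching_downtime(switch_counts, switch_times=SWITCH_TIMES):
    """Return total server-seconds lost to switching for each switching time."""
    return [n * t for n, t in zip(switch_counts, switch_times)]


downtime = {name: switching_downtime(counts)
            for name, counts in psp_switches.items()}
for name, secs in downtime.items():
    print(name, secs)
# Table 5 (short interval) [90, 140, 195]
# Table 6 (long interval) [90, 180, 240]
```

In Table 5, moving from 10 to 15 seconds raises the downtime from 140 to 195 server-seconds (about 39%) even though one fewer switch is made, which is consistent with the revenue drop. Table 6 shows that downtime alone does not determine revenue: the 15-second PSP case has the largest downtime yet the highest improvement, presumably because the timing of the switches also matters.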

In summary, for certain workload scenarios there is a trade-off between performance improvement and the number of server switches for different server switching times and switching decision intervals. The number of server switches depends on workload characteristics. Admission control does not improve performance in every workload scenario.

Fig. 6. Total revenue for short switch decision interval. [Plot: total revenue (x1000) against switch time (5, 10, 15 sec) for NSP, PSP and BSP, each with and without AC.]

Fig. 7. Total revenue for long switch decision interval. [Plot: total revenue (x1000) against switch time (5, 10, 15 sec) for NSP, PSP and BSP, each with and without AC.]
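The "Improvement over non-AC" rows in Tables 5 and 6 match the difference, in percentage points, between the with-AC and without-AC improvement rows. The sketch below recomputes that difference from the tabulated values (the names are illustrative), which makes the mixed effect of admission control easy to see.

```python
# Improvement over NSP (%) for switching times of 5, 10 and 15 s,
# copied from Tables 5 and 6 as (without AC, with AC).
improvements = {
    ("Table 5", "PSP"): ((11.3, 16.4, 5.6), (11.3, 16.4, 5.6)),
    ("Table 5", "BSP"): ((123.7, 123.2, -7.3), (135.6, 123.2, 103.5)),
    ("Table 6", "PSP"): ((11.5, 11.3, 15.6), (11.4, 11.3, 15.6)),
    ("Table 6", "BSP"): ((123.9, 123.7, 123.4), (136.0, 135.6, 135.3)),
}


def ac_gain_pp(without_ac, with_ac):
    """Percentage-point change in improvement when AC is enabled."""
    return tuple(round(w - wo, 1) for wo, w in zip(without_ac, with_ac))


gains = {key: ac_gain_pp(*rows) for key, rows in improvements.items()}
for key, g in gains.items():
    print(key, g)
```

The result is zero or -0.1 for PSP; for BSP it is about 12 points in four of the six cases, zero in one (Table 5, 10 seconds) and 110.8 points in the remaining one (Table 5, 15 seconds), where BSP without AC loses revenue.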

8. Conclusions

In this paper we use a model-driven server switching policy to dynamically allocate server resources in enterprise systems. Such systems are normally multi-tiered, and in each tier a cluster of servers is commonly used to improve processing capability. We model the tiered architecture as a multi-class closed queueing network, with each network station corresponding to one application tier. The multi-tiered architecture can introduce bottlenecks, which limit the overall system performance. In this paper, we use a convex-polytope-based approach to identify bottlenecks in the multi-class closed queueing network. The new switching policy responds to identified bottlenecks and switches servers between pools when necessary. In addition, we use an admission control scheme to deal with system overloading, which ensures that the underlying system can still respond to specific customers. Performance evaluation has been carried out via simulation for an experimental workload and for workloads generated from real-world Internet traces. The results are compared with those from a naïve switching policy and from a system that implements no switching. Our experimental results show that the combination of the admission control scheme and the proposed switching policy performs substantially better than the other two policies in terms of revenue contribution.

Acknowledgment

This work is supported in part by the UK Engineering and Physical Sciences Research Council (EPSRC) contract number EP/C538277/1.

References

[1] M Arlitt and T Jin, A Workload Characterization Study of the 1998 World Cup Web Site. IEEE Network, Vol. 14, No. 3, 2000, pp. 30–37.

[2] G Balbo and G Serazzi, Asymptotic Analysis of Multiclass Closed Queueing Networks: Multiple Bottlenecks. Performance Evaluation, Vol. 30, No. 3, 1997, pp. 115–152.

[3] P Barford and M Crovella, Generating Representative Web Workloads for Network and Server Performance Evaluation. ACM SIGMETRICS Performance Evaluation Review, Vol. 26, No. 1, 1998, pp. 151–160.

[4] G Bolch, S Greiner, H de Meer and K S Trivedi, Queueing Networks and Markov Chains: modelling and performance evaluation with computer science applications, 2nd ed., Wiley, 2006.

[5] G Casale and G Serazzi, Bottlenecks Identification in Multiclass Queueing Networks Using Convex Polytopes. Modelling, Analysis, and Simulation of Comp. and Telecommunication Systems (MASCOTS) 2004.

[6] J S Chase and D C Anderson, Managing Energy and Server Resources in Hosting Centers. 18th ACM Symposium on Operating Systems Principles 2001.

[7] L Cherkasova and P Phaal, Session Based Admission Control: a Mechanism for Peak Load Management of Commercial Web Sites. IEEE Transactions on Computers, Vol. 51, No. 6, 2002.

[8] P J Denning and J P Buzen, The Operational Analysis of Queueing Network Models. ACM Computing Surveys, Vol. 10, No. 3, 1978, pp. 225–261.

[9] D L Eager and K C Sevcik, Bound Hierarchies for Multiple-class Queueing Networks. Journal of the ACM, Vol. 33, No. 1, 1986, pp. 179–206.

[10] S Elnikety, E Nahum, J Tracey and W Zwaenepoel, A Method for Transparent Admission Control and Request Scheduling in e-Commerce Web Sites. International WWW Conference, New York, USA 2004.

[11] P C Fishburn and A M Odlyzko, Dynamic Behavior of Differential Pricing and Quality of Service Options for the Internet. Proceedings of the 1st International Conference on Information and Computation Economies, 1998.

[12] L He, J W J Xue and S A Jarvis, Partition-based Profit Optimisation for Multi-class Requests in Clusters of Servers. IEEE International Conference on e-Business Engineering 2007.

[13] B A Huberman and S H Clearwater, Swing Options: A Mechanism for Pricing Peak IT Demand. Computing in Economics and Finance, HP Labs, 2005.

[14] Internet Trace, Internet Traffic Archive Hosted at Lawrence Berkeley National Laboratory. http://ita.ee.lbl.gov/html/traces.html 2008.

[15] T Kerola, The Composite Bound Method for Computing Throughput Bounds in Multiple Class Environments. Performance Evaluation, Vol. 6, No. 1, 1986, pp. 1–9.

[16] C Knessl and C Tier, Asymptotic Approximations and Bottleneck Analysis in Product Form Queueing Networks with Large Populations. Performance Evaluation, Vol. 33, No. 4, 1998, pp. 219–248.

[17] M Litoiu, A Performance Analysis Method for Autonomic Computing Systems. ACM Transactions on Autonomous and Adaptive Systems, Vol. 2, No. 1, 2007, p. 3.

[18] J Little, A Proof of the Queueing Formula L = λW. Operations Research, Vol. 9, No. 3, 1961, pp. 383–387.

[19] Z Liu, M Squillante and J Wolf, On Maximizing Service-level-agreement Profits. ACM SIGMETRICS Performance Evaluation, Vol. 29, 2001, pp. 43–44.

[20] J K MacKie-Mason and H R Varian, Pricing Congestible Network Resources. IEEE Journal on Selected Areas in Communications, Vol. 13, No. 7, 1995, pp. 1141–1149.

[21] D A Menasce and V A F Almeida, Capacity Planning for Web Performance: metrics, models, and methods, Prentice Hall PTR, 1998.

[22] D A Menasce, V A F Almeida, R Fonseca and M A Mendes, Business-oriented Resource Management Policies for e-Commerce Servers. Performance Evaluation, Vol. 42, 2000, pp. 223–239.

[23] J Palmer and I Mitrani, Optimal and Heuristic Policies for Dynamic Server Allocation. Journal of Parallel and Distributed Computing, Vol. 65, No. 10, 2005, pp. 1204–1211.

[24] M Reiser and S Lavenberg, Mean-value Analysis of Closed Multi-Chain Queueing Networks. Journal of the Association for Computing Machinery, Vol. 27, 1980, pp. 313–322.

[25] J Rolia, X Zhu, M Arlitt and A Andrzejak, Statistical Service Assurances for Applications in Utility Grid Environments. Technical Report HPL-2002-155, HP Labs, 2002.

[26] P J Schweitzer, A Fixed-point Approximation to Product-form Networks with Large Populations. 2nd ORSA Telecommunication Conference 1992.

[27] K Sevcik and I Mitrani, The Distribution of Queueing Network States at Input and Output Instants. Journal of the Association for Computing Machinery, Vol. 28, No. 2, 1981.

[28] B Urgaonkar, G Pacifici, P J Shenoy, M Spreitzer and A Tantawi, An Analytical Model for Multi-tier Internet Services and its Applications. ACM SIGMETRICS Performance Evaluation Review, 2005, pp. 291–302.

[29] M Welsh and D Culler, Adaptive Overload Control for Busy Internet Servers. USENIX Symposium on Internet Technologies and Systems 2003.

[30] J W J Xue, A P Chester, L He and S A Jarvis, Dynamic Resource Allocation in Enterprise Systems. 14th International Conference on Parallel and Distributed Systems (ICPADS’08) 2008.

[31] A Zalewski and A Ratkowski, Evaluation of Dependability of Multi-tier Internet Business Applications with Queueing Networks. International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX’06) 2006.

[32] J Y Zhou and T Yang, Selective Early Request Termination for Busy Internet Services. 15th International Conference on World Wide Web, Edinburgh, Scotland 2006.