Service Level Agreement Based Distributed Resource Allocation
for Streaming Hosting Systems
∗Yun Fu, Amin Vahdat
Department of Computer Science, Box 90129 Duke University, Durham, NC 27708
{fu, vahdat}@cs.duke.edu
Abstract
The trend toward outsourcing network services to third parties in a utility model has resulted in a new distributed application model where the hosting service, service providers, and end clients constitute a co-dependent profit ecosystem. SLAs provide a means for the service providers to specify their target levels of performance and reliability and for the hosting service to arbitrate among competing services under resource constraints. In this paper, we propose a number of design principles for specifying and enforcing SLAs that allow service providers to obtain expected throughput from hosting systems. We propose and evaluate an algorithm, Squeeze, based on pricing and penalties that allows the hosting service to maximize its profits while flexibly allocating available system resources among competing services based on pre-specified SLAs and dynamically changing client access characteristics.
1 Introduction
Burgeoning distributed hosting services introduce a new model for distributed applications. A hosting service establishes a distributed network where many edge servers are deployed on the edges of the Internet to provide low-latency and high-bandwidth network services to end clients. Service providers deliver the content to the hosting service, which publishes the data to a subset of available edge servers. The revenue of the hosting service comes from the service providers, who in turn derive their revenue from end clients. However, the popularity of individual services and the quality of services delivered by the hosting service determine how much each service provider is willing to pay. Thus, the hosting service, service providers and end clients constitute a profit ecosystem. In this ecosystem, efficiently allocating limited resources to benefit all three components is a challenging problem. Since the service providers cannot directly control resource allocation on the hosting service, Service Level Agreements (SLAs) contracted between the service providers and the hosting service are required to control the hosting service's resource allocation and client request processing. Given a set of SLAs, the hosting service requires a dynamic and adaptive resource allocation algorithm to maximize its own revenue from all service providers. Further, in the case where service resources are spread across the wide area, the system requires a distributed resource allocation algorithm that can provide globally maximal revenue based on local resource usage information at each site.

∗ The research is supported in part by the National Science Foundation (EIA-9972879), Hewlett-Packard, IBM, and Microsoft. Vahdat is also supported by an NSF CAREER award (CCR-9984328).
Distributed hosting systems have many practical applications, e.g., Content Delivery Networks (CDNs) [1] and utility computing systems [5]. We propose two resource allocation models for hosting systems. In the first model, each service provider not only delivers data to the hosting service, but also provides service applications, e.g., web servers or streaming servers, to be executed on the edge servers of the hosting service. In this case, the service applications determine the amount of resources required for the current workload and request those resources from the hosting service. The resource allocation mechanism of the hosting service is therefore relatively simple in this model: it can simply allocate resources to the service provider with the highest bid while considering the SLAs contracted with other service providers. This model is more suitable for a general utility data center. In the second model, the service providers only deliver data to the hosting service and utilize service applications supplied by the hosting service. The hosting service can thus fully control the resource usage of each service provider within the internals of the service applications and use resources more efficiently. This model can be adopted by CDNs. In this model, the hosting service needs to decide not only how to allocate resources but also how to utilize them more efficiently. Without direct resource control from service applications, the service providers can only specify their resource requirements for target levels of performance and reliability by carefully designing SLAs.
In this paper, we investigate resource allocation for a distributed hosting system for streaming multimedia content based on the second resource allocation model. We consider multimedia services because they are more challenging than standard web services. A streaming request can occupy system resources for an indefinite period of time, as opposed to a web object request that typically lasts for milliseconds. Thus, carefully reserving and allocating limited resources for streaming requests is critical for a streaming hosting system. We propose a Service Level Agreement based Streaming Hosting (SLASH) system as a solution to resource allocation in a distributed streaming service. SLASH can effectively adjust resource allocation for all streaming content it serves based on predefined SLAs.
Before discussing how to define SLAs, we must first determine what resources should be controlled and specified in SLAs. We assume one major reason that service providers would outsource to SLASH is that they cannot economically satisfy their customers' bandwidth requirements. Thus, intuitively, SLAs can be specified in terms of how much bandwidth or how many concurrent connections SLASH should provide to the service providers. However, if an SLA directly specifies how much bandwidth should be reserved for a service provider, it is difficult to accurately estimate the revenue loss due to a shortage of the reserved bandwidth, since the loss depends on the request load and the scheduling algorithm. Moreover, streams with different lengths occupy system bandwidth for different periods of time. Therefore, the resources specified in SLAs should reflect not only space issues (bandwidth), but also time issues (how long the bandwidth is occupied). Thus, we use shares, a bandwidth unit used during a system scheduling epoch, as the target resource in SLAs.
Given a set of SLAs specified by all service providers, a hosting center must determine how many resources should be deployed to satisfy each customer's request load, and how to allocate and control resource usage among all hosted services. Obviously, it is not wise to overprovision for the maximum amount of resources required by the peak request load from all hosted services. Sharing resources among hosted services allows for better resource utilization and in turn results in larger profits. Further, since streaming requests occupy system resources for an indefinite period of time, sharing without resource partitioning may cause the hosting service to behave as a work-conserving scheduler, where low-priority services may consume more resources than high-priority ones. For example, in a First Come First Serve (FCFS) system, the resources may be mostly occupied by the service with the highest request load, which may not be the service that would pay the most. Meanwhile, due to the specified SLAs, a penalty may be charged to the hosting service when a minimum level of resources is not delivered to a given service. In some cases, the corresponding profit may outweigh this penalty, leading the hosting service to temporarily accept the penalty.
In this paper, we propose a resource allocation algorithm, Squeeze, to efficiently and dynamically allocate resources for hosted services according to SLAs. Since the revenue that a hosting service can obtain is the major concern of the hosting service, we use revenue as the primary criterion for evaluating different resource allocation mechanisms. We will show that the Squeeze algorithm can effectively maximize the revenue of a hosting service by flexibly allocating resources among competing services.
Section 2 describes related work. Section 3 introduces the architecture of SLASH. Section 4 describes our considerations on the design of SLAs and the Squeeze resource allocation algorithm. Finally, Section 5 presents the experimental results to show the correctness and efficiency of the SLASH implementation with respect to resource control.
2 Related Work
James Kurose and Rahul Simha [10] proposed a microeconomic approach to allocating distributed resources for file allocation problems (FAP). In their system, a number of computers are fully connected into a communication network. Each node stores a part of the entire file system and can generate file access requests, which can be satisfied on either the local node or a remote node. For a given communication cost and request load assignment on each node, the system determines an optimal allocation of the portion of the file system that should be placed on each node. They define a utility function, which only considers the communication cost of the system, as the target optimization goal. They propose two gradient-based algorithms, which can converge to an optimal solution for file allocation. They also present a pair-wise interaction algorithm to implement the resource allocation in a distributed manner.
Muse [6] is a recent work considering resource allocation in hosting centers. Muse can dynamically allocate an active server set for a service based on negotiated SLAs and cost, specifically the cost of power. The MSRP resource allocation algorithm proposed in Muse is also a gradient-based algorithm, which depends on a concave utility function. Starting from an initial resource allocation, Muse reassigns one unit of resource at each iteration in a greedy manner. Eventually, the algorithm stops at an optimal solution. A similar incremental algorithm for optimally allocating discrete resources was previously discussed by Toshihide Ibaraki and Naoki Katoh [7]. In our work, we show how to define SLAs that cause the revenue function of each service to be a concave function, which is then utilized by the Squeeze algorithm to compare and select the best candidate among all services to allocate resources.
In this paper, we focus on bandwidth as the target resource for allocation. For other hosting services with more complicated resources, such as application hosting services, some existing techniques can be utilized to isolate the resource usage of cohosted services on a single node [4] or within a cluster [3]. The hosted services can even run on a virtual host [11] where resource partitioning is completely transparent.
3 System Architecture
Figure 1 depicts the architecture of SLASH, consisting of a set of distributed edge servers and switches dispersed across the edges of the Internet. Each edge server is intended to serve a group of nearby clients. A number of edge servers are grouped together and managed by a nearby switch. Switches are interconnected as a front-end interface of the system to process client requests.
Using DNS server selection techniques or client customized preferences, client requests can be routed to a nearby switch, which redirects the clients to access appropriate edge servers. The switch identifies the appropriate edge servers by, for example, utilizing existing client clustering technologies [9] or simply grouping clients by their ASes. To select edge servers for clients, the switch must collect the load status of the edge servers. In SLASH, edge servers regularly report their load status to their local switch. Furthermore, a switch must maintain the content of all the managed edge servers and inform the edge servers to retrieve stream files from original sources. To further utilize system resources, switches can also exchange information to shift request load among one another and to cooperate to manage distributed resource allocation to obtain globally optimal performance.

Figure 1: SLASH architecture
In SLASH, switches implement server selection by utilizing RTSP messages [13] to redirect client requests to appropriate servers. The streaming data transport is implemented by RTP [12]. When a client initiates a stream connection, it sends an RTSP DESCRIBE request to the closest switch. The switch selects an edge server for the client based on the resource control algorithm and sends back an RTSP response with status code 302, where the selected server is set in the Location field. The client stream viewer then establishes a new connection to the specified server to retrieve the data. To prevent clients from arbitrarily selecting an edge server, which can cause the switch to lose control over resource allocation, SLASH also encrypts the issue time and the client IP address and appends them to the end of the Location field. When the Location field is forwarded to the edge server, the server can verify the validity of the redirection. SLASH also adopts a communication mechanism for the switches to maintain a correct view of the availability, resource usage, and supplied content of each edge server. For example, the edge server should inform the switch about newly established or closed connections so that the switch can estimate the load on each server.
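The redirection check described above can be sketched as follows. This is only an illustration under stated assumptions, not the SLASH implementation: the text says the issue time and client IP address are encrypted, whereas this sketch authenticates them with an HMAC (a swapped-in technique that needs only the Python standard library). The shared key, the token layout, the 30-second validity window, and the names `make_token` and `verify_token` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret shared by a switch and its edge servers (assumption).
SECRET_KEY = b"switch-and-edge-shared-secret"
MAX_AGE_SECONDS = 30  # assumed validity window for a redirection


def make_token(client_ip: str, issued_at: int, key: bytes = SECRET_KEY) -> str:
    """Switch side: build the suffix appended to the Location field."""
    payload = f"{issued_at}|{client_ip}"
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{tag}"


def verify_token(token: str, client_ip: str, now: int,
                 key: bytes = SECRET_KEY) -> bool:
    """Edge-server side: accept only fresh tokens issued for this client."""
    try:
        issued_str, token_ip, tag = token.split("|")
        issued_at = int(issued_str)
    except ValueError:
        return False  # malformed token
    payload = f"{issued_at}|{token_ip}"
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison of the received and recomputed tags.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return False
    fresh = 0 <= now - issued_at <= MAX_AGE_SECONDS
    return fresh and token_ip == client_ip
```

In this sketch a client that copies a redirect issued to another address, or replays an old one, fails verification at the edge server, which is the property the text asks of the Location suffix.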
4 Resource Allocation
To efficiently utilize limited resources in SLASH, we must consider all factors that can affect system performance, such as resource allocation, data placement and request routing. SLASH can simultaneously serve many service providers. Each service provider has many stream files with different content or different qualities, which in turn have different popularities. Through resource allocation, SLASH can reserve some amount of resources for each service provider or even for different quality streams of one service provider to maximize the global performance or the revenue of the entire system. We will show that by allocating resources according to content popularities and SLA specifications, the system can obtain higher overall revenue from customers.
4.1 Price and Penalty Design
Currently, the only distributed resource we consider is bandwidth. We assume each server has limited bandwidth to serve all service providers. We also assume streaming connections are rigid connections, which cannot be preempted by other new connections after they are established. In this paper, we do not consider variable bitrate streams. So once a connection is established, it can neither be disconnected by the server nor dynamically adjust its bitrate.
Based on SLAs contracted with service providers, SLASH should provide quality-guaranteed services and obtain revenue from the service providers. Since SLASH simultaneously hosts multiple services, it always attempts to utilize available resources to serve as many requests as possible. Thus, the resources reserved for a service provider according to SLAs may be utilized by SLASH to serve other service providers if the system can obtain more profit. However, if the quality of the service is affected or the throughput is reduced because of a shortage of deserved resources, a penalty must be charged to the hosting service. Relative to existing hosting techniques where resource allocation is implemented by reserving a set of machines for a given service, SLASH can efficiently multiplex competing hosted services and allocate resources for them at the granularity of shares.
SLAs are specified from many perspectives. One major consideration is that the revenue and penalties described by SLAs can be accurately calculated and are fair to both service providers and hosting services. The service providers can calculate the revenue and penalties off-line based on system logs and verify the computation based on statistical results provided by third-party monitors. Meanwhile, revenue and penalties must also be easy to estimate online, since SLASH needs to estimate the tradeoff between the revenue gained from one service and the possible penalty owed to another service when adjusting resource allocation between the two services.
We define SLAs to direct SLASH to provide necessary resources for all services, and to stimulate SLASH to allocate spare resources for the services with high request load or high priorities. Two major metrics in an SLA are revenue prices and penalty prices.
The revenue price of a stream file can be defined as the profit earned per time unit for a connection of the stream file. For the same stream file, if the system can serve it longer, the service provider should pay more for it. Streams with different qualities can have different prices. For example, higher quality streams deserve higher prices. Thus, the revenue prices can stimulate SLASH to serve more high-quality streams for customers if there are sufficient resources.
Revenue prices can be fixed (flat) prices or a function of a set of input arguments [6]. One possible argument of this function is the throughput for a service. For example, suppose a service provider desires to sustain 1000 requests per minute. If the actual throughput is larger than this, the service provider can reduce the price to avoid being overcharged for unexpectedly high request load. On the other hand, it may be willing to pay much more for load beyond a given level because that load signifies a flash crowd or some unexpected event (e.g., a major breaking news story). In this paper, we use fixed prices to simplify our discussion.
Compared to revenue prices, penalty prices are more difficult to define since they are based on an estimate of the possible profit loss due to the shortage of deserved resources at any time. Generally, it is not easy to accurately and fairly calculate the loss. For a given request load and a certain amount of resources, the throughput that can be generated highly depends on the scheduling algorithm. For example, assume that in Figure 2 the system should allocate 300Kbps for a service. There are four requests, Req1, Req2, Req3 and Req4, for a stream of the service. Req1 and Req3 are for the low-quality (100Kbps) version of the stream. Req2 and Req4 are for the high-quality (300Kbps) version of the stream. The profit obtained by providing the high-quality stream is higher than that of the low-quality stream. In Figure 2, the system rejects Req1 and Req3 and accepts Req2 and Req4 to maximize overall profit. Obviously, if the system accepts Req1 and Req3, it cannot process Req2 and Req4 because the bandwidth is occupied at that moment. But, since the system does not serve Req1 and Req3, some resources guaranteed by the SLA are not allocated at the moment, causing a penalty to be charged. However, this is "unfair" since the hosting service provides a better service for the service provider and the service provider does not lose any of its deserved profit (assuming that end clients are paying it).
Figure 2: Req1 and Req3 for a low-quality (100Kbps) stream file are rejected. Req2 and Req4 for a high-quality (300Kbps) stream file are accepted. (Axes: time in seconds; throughput in Kbps.)
Even if both the service provider and the hosting service agree that a penalty should be paid in this case, we still need to consider whether the penalty should cover the entire loss of Req1 (or Req3), which can last up to 40 seconds, or only the loss before Req2 (or Req4) begins to be served, which is only 10 seconds. While the latter appears to be fairer, it means that we need to simulate the scheduling of rejected requests to estimate how much penalty should be paid for a given request load. Thus, it is not practical to decide from the resource allocation at any particular moment whether a single rejected request should incur a penalty. Penalties should instead be estimated based on the offered load and the profits generated during an interval.
Another consideration for SLAs is the amount of time an individual request occupies while consuming resources at an edge server. Considering that profit is proportional to stream length, the loss from rejecting a short stream is less than the loss from rejecting a long stream if their revenue prices are the same. Thus, the specification of penalties should also include stream lengths.
We assume a limited number of encodings of the same stream, with bitrates that are all multiples of a bandwidth unit, say 100Kbps, e.g., 100Kbps, 300Kbps, 500Kbps. We express the stream length in units of a specified time interval, an epoch. Epochs should be long enough to contain enough data samples to measure the throughput and penalties. When SLASH accepts a stream request whose bitrate is one bandwidth unit and whose duration is 1 epoch, we say the system sells 1 share at that moment. Thus, the number of shares of a stream is its bitrate in bandwidth units multiplied by its length in epochs. We define the trading volume of an epoch as the number of shares SLASH sells in that epoch. A trading volume can be compared directly with the request load in terms of shares. For example, assume an epoch is 10 seconds and the bandwidth unit is 100Kbps. In Figure 3, the system receives one 600Kbps request that is 1 epoch long in each epoch. In Figure 4, the system receives one 100Kbps request that is 6 epochs long in each epoch. If the SLA specifies that the system should serve 6 shares per epoch, serving either of the request loads in Figure 3 or 4 can satisfy the SLA. Thus, service providers do not contract with SLASH directly by bandwidth, but by shares. Notice that the trading volume of an epoch is different from the throughput. For example, in the first epoch of Figure 4, the throughput is only 100Kbps × 1 epoch = 1 Mbit, but the trading volume is 6 shares (6 Mbit).
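The share arithmetic in this example can be sketched in a few lines. The function names and the representation of a request as a `(bitrate_kbps, length_epochs)` pair are assumptions made for illustration; the epoch length and bandwidth unit are the ones used in the example above.

```python
EPOCH_SECONDS = 10          # epoch length from the example in the text
BANDWIDTH_UNIT_KBPS = 100   # bandwidth unit from the example in the text


def stream_shares(bitrate_kbps: int, length_epochs: int) -> int:
    """Shares of one stream: bandwidth units times length in epochs."""
    units = bitrate_kbps // BANDWIDTH_UNIT_KBPS
    return units * length_epochs


def trading_volume(accepted) -> int:
    """Shares sold in an epoch: total over the requests accepted in it.

    `accepted` is a list of (bitrate_kbps, length_epochs) pairs.
    """
    return sum(stream_shares(b, n) for b, n in accepted)


def first_epoch_throughput_kbit(accepted) -> int:
    """Data actually sent during the admitting epoch, in Kbit."""
    return sum(b * EPOCH_SECONDS for b, _ in accepted)


figure3_epoch = [(600, 1)]  # one 600Kbps request, 1 epoch long
figure4_epoch = [(100, 6)]  # one 100Kbps request, 6 epochs long
print(trading_volume(figure3_epoch), trading_volume(figure4_epoch))
print(first_epoch_throughput_kbit(figure4_epoch))
```

Both request loads sell 6 shares per epoch, while the load of Figure 4 transfers only 1000 Kbit (1 Mbit) in its first epoch, which is the distinction between trading volume and throughput drawn in the text.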
Shares form the basis for SLAs between service providers and hosting centers. With shares and epochs, we can define the price of a stream as the profit in an epoch per connection ($/epc) or directly as the profit for each share ($/share).
We now formally define penalties by shares. To obtain a certain quality of service, a service provider pays extra money to reserve a number of shares, Σ. Assume the offered load of the service during an epoch is Θ, and the trading volume during that epoch is Λ, where Λ ≤ Θ.
Figure 3: 1 request for a 600Kbps stream, 1 epoch long, in each epoch. (Axes: time in seconds; throughput in Kbps.)

Figure 4: 1 request for a 100Kbps stream, 6 epochs long, in each epoch. (Axes: time in seconds; throughput in Kbps.)
Then, we can define the penalty as:
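\[
\mathrm{penalty}(\Theta, \Lambda) =
\begin{cases}
0 & \text{if } \Lambda \ge \min(\Sigma, \Theta),\\
P \cdot (\min(\Sigma, \Theta) - \Lambda) & \text{if } \Lambda < \min(\Sigma, \Theta),
\end{cases}
\]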
where P is the SLA specified penalty price for a share. It could be the same as the price that the service provider pays for reserving the Σ shares. Thus, the penalty can be exactly the compensation for the loss of reservation. The penalty price can also be higher than the price used to reserve resources. In this case, the penalty can refund the money used to reserve resources and compensate for the possible loss due to rejection of client requests. Since the penalty calculation does not distinguish which stream files the lost shares are for, the penalty price is the same for all stream files of a service, even for different bitrate versions.
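The penalty definition translates directly into code. The sketch below is a minimal illustration; the function and parameter names are assumptions, with `offered`, `sold`, `reserved` and `penalty_price` standing for Θ, Λ, Σ and P respectively.

```python
def penalty(offered: float, sold: float, reserved: float,
            penalty_price: float) -> float:
    """Penalty owed for one epoch.

    offered       -- offered load in shares (Theta)
    sold          -- trading volume in shares (Lambda), at most `offered`
    reserved      -- shares reserved by the service (Sigma)
    penalty_price -- SLA penalty price per share (P)
    """
    if sold > offered:
        raise ValueError("trading volume cannot exceed the offered load")
    # The service is only owed what it reserved AND actually asked for.
    owed = min(reserved, offered)
    shortfall = max(0, owed - sold)
    return penalty_price * shortfall


# 5 shares reserved at $1/share: selling 3 of 10 offered shares owes $2.
print(penalty(offered=10, sold=3, reserved=5, penalty_price=1))
```

Taking min(Σ, Θ) means that a lightly loaded service cannot collect a penalty for reserved shares it never requested.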
4.2 SLA Based Control
The reason we introduce prices and penalties into SLASH is to provide a means for service providers to control SLASH resource allocation under highly variable client request loads. Higher prices can cause SLASH to allocate more spare resources to a particular service. For a single service, higher prices for high-quality streams can stimulate SLASH to serve more high-quality streams if available bandwidth is sufficient. However, if the bandwidth is limited, the service provider may prefer to serve more connections with low-quality streams using the same amount of resources. Thus, the service provider can define the revenue price of each share for low-quality streams to be higher than for high-quality streams.
We use an example to illustrate some design rules for revenue prices and penalty prices. Assume we have only two services, service A (S_A) and service B (S_B), competing for available resources in a system. The system resources are divided into two parts and reserved by the two services. Both services have a low-quality stream and a high-quality stream, all of the same length, n epochs. The bitrate is β_l bandwidth units for the low-quality streams and β_h bandwidth units for the high-quality streams. We define the price for the high-quality stream of S_A as Pr(A, h) in $/epc. Correspondingly, we define Pr(A, l), Pr(B, h) and Pr(B, l). The penalty prices for S_A and S_B are Pe_A and Pe_B respectively. To stimulate SLASH to serve more high-quality streams, we set

\[
\begin{aligned}
\mathit{Pr}(A, l) &< \mathit{Pr}(A, h)\\
\mathit{Pr}(B, l) &< \mathit{Pr}(B, h)
\end{aligned}
\tag{1}
\]

Our first goal in this example is to allocate resources fairly between the two services. Thus, we set Pr(A, l) = Pr(B, l) and Pr(A, h) = Pr(B, h). Our second goal is for SLASH to serve more low-quality streams or downgrade high-quality stream requests to low bitrates when resources are scarce. So we define the following relationship:

\[
\begin{aligned}
\mathit{Pr}(A, h)/\beta_h &< \mathit{Pr}(A, l)/\beta_l\\
\mathit{Pr}(B, h)/\beta_h &< \mathit{Pr}(B, l)/\beta_l
\end{aligned}
\tag{2}
\]
To maintain fair allocation between the two services, we must ensure that one service does not occupy the other service's reserved resources even when its request load is high enough to force it to begin to downgrade high-quality streams to low-quality streams. It can be easily shown that SLASH will first use the resources reserved for the service itself to serve low-quality streams instead of using the other service's resources only if Pe_A ≠ 0 and Pe_B ≠ 0.
However, under circumstances where the request load is high enough such that all resources reserved for one service have been consumed by its low-quality streams, SLASH should utilize the resources reserved by the other service if the other service still reserves some resources for high-quality streams. Thus, we can set the prices as:

\[
\begin{aligned}
\frac{\mathit{Pr}(A, l)}{\beta_l} - \mathit{Pe}_B &> \frac{\mathit{Pr}(B, h)}{\beta_h}\\
\frac{\mathit{Pr}(B, l)}{\beta_l} - \mathit{Pe}_A &> \frac{\mathit{Pr}(A, h)}{\beta_h}
\end{aligned}
\tag{3}
\]
So the hosting service can earn more profit from each share by serving low-quality streams even if it has to pay a penalty for failure to provide enough resources for high-quality streams of the other service.
With the above constraints, the hosting service always attempts to provide enough resources for both services. If all available resources are allocated to the two services and one service experiences high request load, the hosting service first downgrades requests for high-quality streams to low-quality streams for that service to utilize resources more efficiently without paying a penalty. However, if the request load for this service becomes even higher and all resources for the service are consumed by low-quality streams, the hosting service starts to allocate more resources to this service if the other service still has some resources for high-quality streams. In this case, although the hosting service has to pay a penalty to the other service, it can still earn more profit by serving more low-quality requests for the busy service.

Equations (1) and (3) define a set of constraints for possible solutions. We use concrete numbers to show possible solutions for these constraints. For example, let β_h be 3 bandwidth units and β_l be 1 unit. Assume Pr(A, h) = Pr(B, h) = y and Pr(A, l) = Pr(B, l) = x. We want the penalty price to be at least high enough to compensate for the loss of a possible high-bitrate stream, so we set Pe_A = Pe_B = y/β_h = y/3. Constraint (1) gives y > x, and substituting into (3) gives x − y/3 > y/3, so x and y are constrained by y > x and y < 3x/2. As illustrated by Figure 5, point P(9, 12) is a possible solution for (x, y). So we can define Pr(A, l) = Pr(B, l) = 9, Pr(A, h) = Pr(B, h) = 12, and Pe_A = Pe_B = 12/3 = 4.

Figure 5: (x, y) is constrained by two lines: y > x and y < 3x/2. P(9, 12) is a possible solution.
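As a quick numerical check of this example, the sketch below tests a candidate price pair against constraints (1), (2) and (3) for the symmetric two-service case, with the penalty price set to y/β_h as in the text. The function name and its default arguments are assumptions made for illustration.

```python
def satisfies_constraints(x: float, y: float,
                          beta_l: float = 1, beta_h: float = 3) -> bool:
    """Check a symmetric price pair: x = Pr(., l), y = Pr(., h).

    The penalty price is set to y / beta_h, as in the worked example.
    """
    pe = y / beta_h
    c1 = x < y                               # (1): high quality earns more per connection
    c2 = y / beta_h < x / beta_l             # (2): low quality earns more per share
    c3 = x / beta_l - pe > y / beta_h        # (3): still worth paying the penalty
    return c1 and c2 and c3


print(satisfies_constraints(9, 12))   # the point P(9, 12) from Figure 5
print(satisfies_constraints(12, 12))  # violates y > x
print(satisfies_constraints(9, 14))   # violates y < 3x/2
```

Only the first call prints `True`; the other two fall outside the region bounded by y > x and y < 3x/2.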
4.3 Squeeze Resource Allocation Algorithm
Requests to streaming servers can last for many epochs and occupy system resources during this period. Thus, resource allocation must consider both space (bandwidth) and time limitations of the system. As mentioned above, shares are the resources that service providers reserve with SLAs. A system with β bandwidth units can serve β shares continuously in each epoch. Thus, the resources that SLASH can allocate in an epoch are β shares. Shares include both space and time considerations.
The resource allocation mechanism actually contains two steps: resource allocation and admission control. At the end of each epoch, the system adjusts the amount of resources allocated for each service. Based on this resource allocation, the system admits or rejects client requests to the service during the next epoch.
SLASH keeps track of the request load for each bitrate in every service. We assume the prices of streams with the same bitrate for one service are all the same in this paper. However, in practice, if the prices are different, we can still dynamically estimate the average price for a bitrate in one service. At the end of each epoch, we use an exponentially weighted moving average (EWMA) filter [8] to estimate the request load for every bitrate in each service.
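A per-bitrate load estimator of this kind might look like the sketch below. It uses the textbook EWMA update; the smoothing factor `alpha`, the choice to seed the estimate with the first observation, and the class and method names are assumptions, since the text does not give the filter parameters.

```python
class LoadEstimator:
    """EWMA estimate of the request load (in shares) per (service, bitrate)."""

    def __init__(self, alpha: float = 0.5):
        # alpha is an assumed smoothing factor; larger values react faster.
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.estimates = {}

    def update(self, key, observed_shares: float) -> float:
        """Fold in the load observed during the epoch that just ended."""
        previous = self.estimates.get(key)
        if previous is None:
            # First observation seeds the estimate (an assumption).
            estimate = float(observed_shares)
        else:
            estimate = (self.alpha * observed_shares
                        + (1 - self.alpha) * previous)
        self.estimates[key] = estimate
        return estimate


estimator = LoadEstimator(alpha=0.5)
for load in (8, 4, 6):  # shares requested in three consecutive epochs
    print(estimator.update(("A", 100), load))
```

Keys here are hypothetical `(service, bitrate_kbps)` pairs, so each bitrate of each service is smoothed independently, as the text requires.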
Before discussing our share allocation algorithm, we use an example to illustrate our principles for fully utilizing allocated resources. Figure 6 describes the example and one solution. Figure 7 describes the process for obtaining this solution.
Figure 6: A bandwidth unit is 100Kbps. The available bandwidth for the service is 1Mbps. To squeeze in three 100Kbps, two 200Kbps, and two 300Kbps requests, the system downgrades them to four 100Kbps and three 200Kbps streams.
Similar to our earlier example, for one service, suppose the request load is Θ_1, Θ_2, ..., Θ_m (shares) for bitrates β_1, β_2, ..., β_m respectively. To simplify our discussion, assume β_1 is the lowest bitrate the system can downgrade to; generally it is 1 bandwidth unit. We need to determine how many shares the system should allocate for each bitrate. Assume the revenue price of β_i is P_i ($/epc), and the penalty price of this service is Pe ($/share).

Figure 7: The system first estimates how much bandwidth is occupied when downgrading all streams to the lowest bitrate. Then the system upgrades those requests whose bitrates are higher than the lowest one to the second lowest bitrate to see how many requests can be upgraded. This process repeats until all bandwidth is consumed.

First, we need to determine the incremental profit the system can obtain when allocating one more share for a particular service. We can show that there are m + 1 (or m) critical points on share allocation for each service, which we should calculate before comparing this service with other services. Initially, we ignore the penalty price. The first critical point is the number of shares at which all streams must be downgraded to β_1 to fit the allocated shares:

\[
C_1 = \sum_{i=1}^{m} \frac{\Theta_i}{\beta_i} \text{ shares.}
\]

When the resources allocated for this service are less than or equal to C_1, the price of each share is P_1/β_1, or simply P_1. The second critical point is

\[
C_2 = \Theta_1 + \sum_{i=2}^{m} \frac{\beta_2 \Theta_i}{\beta_i} \text{ shares.}
\]

With C_2 shares, all β_1 streams are served at bitrate β_1, and all other streams are served at bitrate β_2. Correspondingly, the other critical points are

\[
C_i = \sum_{j=1}^{i-1} \Theta_j + \sum_{j=i}^{m} \frac{\beta_i \Theta_j}{\beta_j} \text{ shares.}
\]

When the system changes the resource allocation from C_i to C_{i+1} shares, the profit for each newly allocated share is

\[
P'_i = \frac{P_{i+1} - P_i}{\beta_{i+1} - \beta_i}.
\]
After obtaining all \(m\) critical points, we now consider the penalty. Assume the service's reserved resources are \(\Sigma\) shares; this introduces another critical point \(\Sigma\) (if \(C_i \neq \Sigma\) for all \(i\)). For all points with \(C_i < \Sigma\) (\(1 \le i \le t\), assuming there are \(t\) such points, where \(C_t\) is the highest point), if the system changes the allocation from \(C_i\) to \(C_{i+1}\) (or to \(\Sigma\), whichever comes first), the profit for each newly allocated share is no longer \(P'_i\) but \(P'_i + P^e\). This means the hosting service pays one share less penalty by allocating one more share to the service while the resources allocated to the service are less than the reserved resources \(\Sigma\).
With the request load in Figure 6, Figure 8 illustrates the revenue increase graph with share allocation. The gradient of each segment is the price for a newly allocated share. It can be proven that the revenue increase curve is concave if \(P_1 < P_2 < \ldots < P_m\) and \((P_1/\beta_1) > (P_2/\beta_2) > \ldots > (P_m/\beta_m)\), as defined in subsection 4.2.
[Figure 8 plot: x-axis Shares (0 to 14), y-axis Revenue ($) (0 to 60)]
Figure 8: For the request load in Figure 6, assume each request length is one epoch. Assume the service reserves 5 shares with a penalty price of 1$/share. The prices of 100Kbps, 200Kbps and 300Kbps are 7$, 9$ and 10$ respectively. We can observe that the price increase for each share is 8$ from 0 to 5 shares, 7$ from 5 to 7, 2$ from 7 to 11 and 1$ from 11 to 13.
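The construction of the revenue increase graph can be sketched in a few lines of Python. This is a minimal sketch rather than the SLASH implementation: the function name `revenue_segments` and its argument names are ours, bitrates are assumed to be sorted in ascending order, and a reservation above the last critical point is assumed to add no further segment. For one service it returns the length and the per-share price of each segment.

```python
def revenue_segments(loads, bitrates, prices, reserved, penalty):
    """Return [(shares, price_per_share), ...] for one service.

    loads[i]    : requested shares (Theta) at bitrates[i]
    bitrates[i] : bitrate (beta) in bandwidth units, ascending
    prices[i]   : revenue price (P) of bitrates[i]
    reserved    : reserved shares (Sigma) from the SLA
    penalty     : penalty price (P^e) per reserved share not allocated
    """
    m = len(bitrates)
    # Critical point for level i: streams at or above bitrates[i] are served
    # at bitrates[i]; lower-bitrate streams keep their own bitrate.
    critical = [
        sum(loads[:i])
        + sum(bitrates[i] * loads[j] / bitrates[j] for j in range(i, m))
        for i in range(m)
    ]
    # Price per share below the first critical point, then the incremental
    # prices (P[i+1] - P[i]) / (beta[i+1] - beta[i]) between later points.
    gradients = [prices[0] / bitrates[0]] + [
        (prices[i + 1] - prices[i]) / (bitrates[i + 1] - bitrates[i])
        for i in range(m - 1)
    ]
    segments = []
    start = 0
    for end, gradient in zip(critical, gradients):
        # Split the segment at the reservation if it falls strictly inside.
        cuts = [start, reserved, end] if start < reserved < end else [start, end]
        for lo, hi in zip(cuts, cuts[1:]):
            if hi > lo:
                # Below the reservation, each extra share also saves one penalty.
                bonus = penalty if hi <= reserved else 0
                segments.append((hi - lo, gradient + bonus))
        start = end
    return segments


# Request load of Figure 6 (3, 4 and 6 shares) with the Figure 8 prices.
figure8 = revenue_segments(
    loads=[3, 4, 6], bitrates=[1, 2, 3], prices=[7, 9, 10],
    reserved=5, penalty=1,
)
print(figure8)
```

For the Figure 8 inputs, the sketch yields segments of 5, 2, 4 and 2 shares priced at 8, 7, 2 and 1 per share, which matches the gradients listed in the caption.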
For each service, we have a graph similar to that of Figure 8, which shows \(m + 1\) (or \(m\)) critical segments. Thus, we know the price of each additional share for every service, which is the gradient of the corresponding segment. With this data, we introduce our resource allocation algorithm, Squeeze. Assume we have N services (\(S_1, \ldots, S_N\)) and the system's available resources are U shares. The Squeeze algorithm operates as follows:
• Step 1: Calculate the above revenue increase graph for each service so that we have \(M_i\) critical segments for service \(S_i\). We use a counter \(Pos_i\), initialized to 1, to keep track of the current segment for \(S_i\).
• Step 2: Compare the current segments of all services and select those services with the highest price, \(S_{t_1}, S_{t_2}, \ldots, S_{t_k}\). Assume the current segments of these services contain \(D_{t_1}, D_{t_2}, \ldots, D_{t_k}\) shares respectively. If \(\sum_{j=1}^{k} D_{t_j} \le U\), allocate \(D_{t_1}, D_{t_2}, \ldots, D_{t_k}\) shares to the services respectively. Otherwise, divide U proportionally to \(D_{t_1}, D_{t_2}, \ldots, D_{t_k}\) and allocate the resulting amounts to the services.
• Step 3: Subtract the shares allocated in this iteration from U. If U is equal to zero, the algorithm stops.
• Step 4: Increase \(Pos_{t_1}, Pos_{t_2}, \ldots, Pos_{t_k}\) by 1. If not every service's current position \(Pos_i\) has reached \(M_i + 1\), go to step 2. Otherwise, the algorithm stops.
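Steps 1 to 4 can be sketched as follows. This is an illustrative Python sketch under our own assumptions, not the SLASH switch code: the name `squeeze`, the list-of-segments input format and the zero-based position counters are ours, and each service's segments are assumed to be precomputed and ordered by non-increasing price (the concave case). Prices are compared with exact equality, which is adequate for a sketch but would need a tolerance with measured values.

```python
def squeeze(segments, available):
    """Allocate `available` shares among services.

    segments[i] is the list of (shares, price_per_share) segments of
    service i, ordered by non-increasing price.
    Returns (allocation per service, shares left unallocated).
    """
    pos = [0] * len(segments)          # Step 1: current segment per service
    allocation = [0.0] * len(segments)
    while available > 0:
        active = [i for i, p in enumerate(pos) if p < len(segments[i])]
        if not active:                 # Step 4: every service is exhausted
            break
        # Step 2: services whose current segment has the highest price.
        best = max(segments[i][pos[i]][1] for i in active)
        chosen = [i for i in active if segments[i][pos[i]][1] == best]
        demand = sum(segments[i][pos[i]][0] for i in chosen)
        if demand <= available:
            for i in chosen:
                allocation[i] += segments[i][pos[i]][0]
                pos[i] += 1            # Step 4: advance to the next segment
            available -= demand        # Step 3
        else:
            # Not enough left: split the remainder proportionally.
            for i in chosen:
                allocation[i] += available * segments[i][pos[i]][0] / demand
            available = 0              # Step 3: nothing left, stop
    return allocation, available


# Service A uses the Figure 8 segments; service B is a made-up example.
service_a = [(5, 8), (2, 7), (4, 2), (2, 1)]
service_b = [(6, 7), (3, 3)]
allocation, flexible = squeeze([service_a, service_b], available=15)
print(allocation, flexible)
```

With 15 shares, the sketch gives service A 7 shares and service B 8 shares and leaves nothing unallocated. If the load of all services is exhausted first, the second return value is the amount left over.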
The Squeeze algorithm allocates resources according to the request load. If a service does not have any request load, it cannot obtain any shares. The system can also still have some available resources when the algorithm quits at step 4. We call these available shares flexible shares; they can be used by any service that experiences a sudden request peak during the next epoch.
From Figure 8, we can also determine how many shares should be reserved for each bitrate of a service, given a certain amount of allocated shares. With the reserved shares for a service, SLASH can perform admission control during the next epoch. The admission control policy is that a stream with a certain bitrate for one service never uses more than the shares allocated for it plus the currently available flexible shares. If it uses some of the flexible shares, the system subtracts the used part from the available amount of flexible shares.
Although the resources allocated in SLASH are shares, our system also controls the bandwidth usage of services. As mentioned before, a system with β bandwidth units can serve β shares continuously in each epoch. Thus, if we allocate D shares for a service, the service should use at most D units of bandwidth at any instant of time. However, the bandwidth available when a switch processes requests is also determined by the request load in previous epochs. For example, if a service is initially allocated D shares and there are no requests for that service, it can use that capacity for requests of arbitrary length. For instance, it can serve D shares' worth of requests whose lengths are all 2 epochs. In this case, it uses D/2 bandwidth units in this epoch. Then, in the next epoch, the allocated resources are still D shares. Since the available bandwidth is only D/2 bandwidth units, the service should only select those stream requests whose length is greater than or equal to 2 epochs to fully utilize the D/2 available bandwidth and the D allocated shares. Generally, if a service bitrate is provided with D shares in an epoch, the bandwidth used by this bitrate should not exceed D units of bandwidth at any instant of time. If the currently available bandwidth is B, SLASH only admits those requests whose lengths are equal to or larger than D/B epochs. Notice that this D/B threshold is evaluated at any instant of time rather than at the beginning of an epoch, since bandwidth is occupied and released continuously.
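The admission policy and the D/B length threshold can be combined in a small sketch. This is only an illustration under our own assumptions: the names `Budget`, `FlexiblePool` and `admit` are hypothetical, D is taken to be the shares still unused by the service bitrate at the moment of the check, a request is charged bitrate times length shares on admission, and the release of bandwidth when a stream ends is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    shares: float     # D: shares still unused by this service bitrate
    bandwidth: float  # B: bandwidth units currently free for it


@dataclass
class FlexiblePool:
    shares: float     # flexible shares currently available to any service


def admit(budget, pool, bitrate, length):
    """Try to admit one stream of `bitrate` units lasting `length` epochs."""
    if bitrate <= 0 or bitrate > budget.bandwidth:
        return False                      # not enough free bandwidth
    if length < budget.shares / budget.bandwidth:
        return False                      # shorter than the D/B threshold
    cost = bitrate * length               # shares = bandwidth units x epochs
    if cost > budget.shares + pool.shares:
        return False                      # exceeds own plus flexible shares
    own = min(cost, budget.shares)
    budget.shares -= own
    pool.shares -= cost - own             # the rest comes from flexible shares
    budget.bandwidth -= bitrate
    return True


# D = 6 shares but only B = 3 free units, so the threshold is 2 epochs.
budget, pool = Budget(shares=6, bandwidth=3), FlexiblePool(shares=0)
short_ok = admit(budget, pool, bitrate=1, length=1)   # rejected: 1 < 6/3
long_ok = admit(budget, pool, bitrate=1, length=2)    # admitted
print(short_ok, long_ok, budget)
```

Because the threshold is recomputed from the current budget on every call, the check follows the "at any instant of time" rule rather than a per-epoch snapshot.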
5 Experiments
We implemented SLASH on Solaris and Linux. Our streaming code is based on the source code publicly provided by Live Networks, Inc. [2]. We added full support for the RTSP protocol and unicast streaming to implement video-on-demand and audio-on-demand (VOD/AOD) services. We implemented the SLASH switch, which dynamically estimates request load and allocates resources based on the Squeeze algorithm. The switch redirects client requests to available servers based on the determined resource allocation.
To simplify the model, we only consider a system where each edge server contains a fully replicated copy of all stream files that the system serves. Thus, there are no data placement problems in our study. In our experiment, we use a system that contains 1 switch and 4 servers. Each server's available bandwidth is 90 bandwidth units (each unit is 128Kbps). Thus, the entire system can handle 360 bandwidth units. All system resources are allocated to two services, service 1 and service 2. Each service reserves 180 shares. Each service has only two versions of a single stream, a low-bitrate version and a high-bitrate version, which have the same length, 4 epochs. The low-bitrate version is 1 bandwidth unit. The high-bitrate version is 3 bandwidth units. The prices for high-bitrate streams are the same for the two services: 12 per epoch. The prices for low-bitrate streams are 9 per epoch. The penalty prices are 4 per share. The system epoch is 10 seconds. In our experiments, the request load is only for high-bitrate versions. The system can continuously handle at most 30 requests for high-bitrate streams per epoch, which occupy 30∗3∗4 = 360 shares, or 90 requests for low-bitrate streams per epoch.
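The capacity figures in this setup follow from treating one share as one bandwidth unit used for one epoch. The short check below restates that arithmetic; the variable names are ours and the numbers are the ones given above.

```python
servers = 4
units_per_server = 90               # bandwidth units per server (128Kbps each)
stream_length = 4                   # epochs
high_bitrate, low_bitrate = 3, 1    # bandwidth units

total_units = servers * units_per_server
# A request costs bitrate x length shares, and the system supplies
# total_units shares in every epoch.
high_per_epoch = total_units // (high_bitrate * stream_length)
low_per_epoch = total_units // (low_bitrate * stream_length)
print(total_units, high_per_epoch, low_per_epoch)
```

The two reservations of 180 shares therefore cover the whole system, and each one corresponds to 15 high-bitrate requests per epoch.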
5.1 Resource Reservation
We first illustrate that SLASH can generate smoother throughput and more net profit by reserving a certain amount of resources for each service. In this experiment, we use 4 client machines to emulate 300 clients to access the 4 servers. The request load for service 1 is a constant request load, 15 requests for the high-quality stream per epoch. The request load for service 2 is also constant, 45 requests for the high-quality stream per epoch. If the system does not allocate resources fairly between the two services, a penalty may be charged.
Figure 9 shows the revenue of SLASH versus a First-Come-First-Serve (FCFS) solution. The FCFS solution admits all requests to the cluster and balances the workload among all the servers without any reservation for the next epoch. The FCFS solution cannot downgrade high-quality requests to low-quality requests. If the Squeeze algorithm downgraded high-quality requests to low-quality requests, it would obtain more revenue and the comparison would be unfair. So, we only provide high-quality streams in this experiment and use the Squeeze algorithm only for allocating resources according to the service workload and SLAs, without downgrading stream qualities. We can see that the revenue generated by the FCFS solution experiences periodic peaks and valleys because it does not reserve any resources for the next epoch.
Figure 9 does not yet account for penalties. Since the FCFS solution rejects many requests when all resources are consumed, it cannot provide the services with as many shares as the SLAs reserve. Thus its penalty is higher than that of SLASH, as illustrated in Figure 10. We notice that the revenue generated by SLASH does not decrease significantly when considering the penalty, since it reserves resources for each epoch.
Figure 11 shows the resources used by the two services when using the FCFS algorithm to allocate resources. The resources used by the two services are proportional to their workloads. Figure 12 shows the resources used by the two services when using the Squeeze algorithm. It shows that the resources used by the two services are usually balanced. Although the resources used by service 1 are sometimes lower than its SLA-specified resources (180 shares), we find that the resources reserved by SLASH, which the Squeeze algorithm calculates according to the two service workloads and the SLAs, are always exactly 180 shares for each service during the entire experiment. The skewed points in Figure 12 are caused by the time skew between the SLASH switch and the synthetic workload generator. Although the workload generator generates a constant request load of 15 requests per epoch for service 1, some of them may arrive early or late. The switch then counts them toward the previous or the next epoch, which leaves some epochs without enough requests.
As mentioned above, the total request load is 60 requests per epoch. The experiment is conducted for 50 epochs, which introduces 3000 requests in total. The FCFS algorithm accepts 1528 requests, which brings a revenue of 73344. However, it also has a total penalty of 29520. Thus, the net profit is 43824. For the same request load, SLASH only accepts 1480 requests. So the revenue is 71040, which is lower than that of the FCFS algorithm because SLASH may hold resources for future requests without serving requests, and because the offered load is skewed as mentioned above. But the penalty caused by SLASH is only 960. So the net profit of SLASH is 70080, which is 60% higher than that of the FCFS algorithm.
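These totals can be re-derived from the per-request revenue of a high-quality stream (12 per epoch over 4 epochs). The snippet below is only a consistency check of the reported numbers; the helper name `net_profit` is ours.

```python
PRICE_PER_EPOCH = 12    # high-quality stream price
STREAM_LENGTH = 4       # epochs per stream


def net_profit(accepted_requests, total_penalty):
    """Revenue of the accepted requests minus the penalty paid."""
    revenue = accepted_requests * PRICE_PER_EPOCH * STREAM_LENGTH
    return revenue - total_penalty


fcfs = net_profit(1528, 29520)
slash = net_profit(1480, 960)
gain_percent = round(100 * (slash / fcfs - 1))
print(fcfs, slash, gain_percent)
```

The check reproduces the reported net profits and the 60% difference, so the revenue advantage of SLASH comes entirely from the much smaller penalty.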
5.2 Dynamic Resource Adjustment
In this experiment, we compare the Squeeze algorithm with two other solutions. One solution is static reservation, which reserves resources for each service according to SLAs. In this scheme, one service does not use the other service's resources even if it experiences high request load. So this solution avoids paying penalties most of the time, except when the resources used by one service are not released in a timely fashion in following epochs. However, in this experiment, the request load is high enough for the two services to fully utilize their resources, so static reservation does not suffer significantly from the inability to adjust resources dynamically. The reason we use this solution is that it does not downgrade stream quality even when the request load is high. Another solution is a greedy algorithm. It always downgrades streams to the lowest quality to obtain the maximum benefit for each share. Besides this, its resource allocation mechanism is the same as that of the Squeeze algorithm. In other words, it is a special version of the Squeeze algorithm whose revenue increase graph for each service only increases to the first critical point, if the penalty point is not considered. Thus, this greedy algorithm can also dynamically adjust resource allocation among services according to their revenue prices and request loads.
Figure 9: Without considering the penalty, the Squeeze algorithm makes the revenue output smoother. The FCFS solution causes the system throughput to periodically reach peak and bottom values.
Figure 10: Considering the penalty, the revenue that the FCFS algorithm can obtain is lower compared to SLASH.
Figure 13 illustrates the request load of the two services. We intentionally assign a constant request load to service 1, which is 15 requests per epoch. For service 2, the request load is also 15 requests per epoch in the beginning. Then it increases by 10 requests/epoch every 4 epochs. In this experiment, we use 10 machines to emulate 650 clients to access the 4 servers.
As mentioned in Section 1, the major criterion for evaluating different resource allocation schemes is the revenue that hosting services can obtain. Figure 14 shows the revenue generated by each of the 3 solutions without considering any penalty. Static reservation keeps an average revenue of 1440 per epoch because it handles 30 high-quality requests per epoch (12 ∗ 30 requests ∗ 4 epochs). The greedy algorithm downgrades all requests to low quality. However, initially it does not have enough requests to downgrade. So its revenue stays at 1080 (9 ∗ 30 requests ∗ 4 epochs). As request load increases, it obtains enough requests to downgrade, and the system revenue increases correspondingly until it reaches its peak revenue, 3240 (9 ∗ 90 requests ∗ 4 epochs). We see that the Squeeze algorithm always maintains the optimal allocation between the other two solutions. Figure 15 shows the revenue with the penalty charge. The effect on the static solution is the smallest, since in this solution a service's resources are not used by the other services. The greedy algorithm's revenue decreases greatly in the beginning due to a shortage of the resources the services deserve, which the algorithm holds for possible requests to downgrade.
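The three revenue levels quoted for Figure 14 follow from a simple steady-state view: once streams admitted in 4 consecutive epochs overlap, the revenue per epoch is the price times the number of requests served per epoch times the stream length. The sketch below restates that arithmetic; the function name `steady_revenue` and the explicit capacity cap are our own simplification, not part of SLASH.

```python
def steady_revenue(price, requests_per_epoch, capacity_per_epoch, length=4):
    """Revenue per epoch once `length` epochs of admitted streams overlap."""
    served = min(requests_per_epoch, capacity_per_epoch)
    return price * served * length


# Static reservation: 30 high-quality requests per epoch fill the system.
static_revenue = steady_revenue(price=12, requests_per_epoch=30, capacity_per_epoch=30)
# Greedy: everything is served at low quality, up to 90 requests per epoch.
greedy_start = steady_revenue(price=9, requests_per_epoch=30, capacity_per_epoch=90)
greedy_peak = steady_revenue(price=9, requests_per_epoch=90, capacity_per_epoch=90)
print(static_revenue, greedy_start, greedy_peak)
```

The greedy scheme earns less than static reservation at low load and more at high load, which is why a scheme that chooses per segment can sit at or above both.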
Figure 16 shows the resource allocation by the Squeeze algorithm. We can see that the resources are fully allocated for high-quality streams (180 shares for each service) in the beginning. With the increase of the request load for service 2, more resources of service 2 are allocated for low-quality streams. Correspondingly, the resources for high-quality streams of service 2 decrease. However, the resource allocation for service 1 does not change. When the resources for service 2's low-quality streams reach 180 shares, there are no more resources to use for serving service 2's requests. Thus, the system begins to downgrade the requests for service 1 to free up more space for service 2. This leads to a decrease in the allocation of high-quality streams to service 1. The released resources are used for serving low-quality streams of both service 1 and service 2. Finally, all system resources are allocated for low-quality streams. This figure shows that Squeeze is able to allocate resources according to our price and penalty design. A key point is that the penalty is independent of the quality of the stream.
Figure 11: The number of shares consumed by the two services in each epoch when using the FCFS algorithm.
Figure 12: The number of shares consumed by the two services in each epoch when using the Squeeze algorithm.
6 Conclusion and Future Work
In this paper, we discuss principles for the design of SLAs for a distributed streaming hosting system, defining appropriate target resources, revenue prices and penalty prices. We then describe the Squeeze algorithm to allocate resources according to specified SLAs. Our price and penalty design enables each service to have a concave revenue increase graph for increased resource allocation. This is a necessary condition for our algorithm to determine the optimal resource allocation for each service. Unlike existing gradient-climbing solutions [6], the Squeeze algorithm can directly determine a set of critical points to allocate resources instead of increasing and decreasing units of resources among services until equilibrium is achieved. We present some initial experimental results and show that our algorithm operates as expected. The Squeeze algorithm can obtain better and smoother revenue than the FCFS solution, showing that resource reservation is critical for streaming services. Meanwhile, the Squeeze algorithm can dynamically adjust resource allocation and always maintain optimal revenue compared to static allocation and a greedy allocation algorithm.
Currently, we only consider the case where each edge server has unlimited storage capacity, so there are no data placement problems for this system. However, if the storage capacity of edge servers is limited, when a switch selects a server to serve a request and the server does not contain the requested stream, the switch must consider the cost to retrieve the stream. An intuitive solution is to introduce a cost function for retrieving stream objects. An optimal allocation should consider both the cost and the possible revenue.
Another aspect of future work is coordinating switches to obtain global optimality. Currently, we only define the resource allocation problem on one switch. SLASH is intended to be a scalable system where multiple switches can determine locally optimal resource allocations and communicate with each other in an asynchronous manner to maximize global revenue. An extended Squeeze algorithm for resource allocation among switches is required. Finally, coordinating data placement among switches is another interesting problem.
Figure 13: The request load for the two services. All requests are for high-quality streams (3 bandwidth units).
Figure 14: The revenues generated by the three solutions, without considering the penalty.
Acknowledgments
We thank Rebecca Braynard for her careful review of earlier drafts. The paper also benefited from discussions with Jeffrey Chase, Ludmila Cherkasova, Wenting Tang, Adolfo Rodriguez and Dejan Kostić.
References
[1] Akamai. http://www.akamai.com.
[2] Live Networks, Inc. http://www.live.com.
[3] Mohit Aron, Peter Druschel, and Willy Zwaenepoel. Cluster Reserves: A Mechanism for Resource Management in Cluster-based Network Servers. In Proceedings of the ACM SIGMETRICS Conference, June 2000.
[4] Gaurav Banga, Peter Druschel, and Jeffrey C. Mogul. Resource Containers: A New Facility for Resource Management in Server Systems. In Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation, February 1999.
[5] Rebecca Braynard, Dejan Kostić, Adolfo Rodriguez, Jeffrey Chase, and Amin Vahdat. Opus: an Overlay Peer Utility Service. In Proceedings of the 5th International Conference on Open Architectures and Network Programming (OPENARCH), June 2002.
[6] Jeffrey S. Chase, Darrell Anderson, Prachi Thakar, Amin Vahdat, and Ronald Doyle. Managing Energy and Server Resources in Hosting Centers. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, October 2001.
[7] Toshihide Ibaraki and Naoki Katoh. Resource Allocation Problems: Algorithmic Approaches. MIT Press, Cambridge, MA, 1988.
[8] Minkyong Kim and Brian Noble. Mobile Network Estimation. In Proceedings of the 7th ACM Conference on Mobile Computing and Networking, July 2001.
[9] Balachander Krishnamurthy and Jia Wang. Topology Modeling Via Cluster Graphs. In ACM SIGCOMM Internet Measurement Workshop, 2001.
Figure 15: With the penalty, the revenue generated by the three solutions.
Figure 16: Resource allocation among different quality streams by the Squeeze algorithm.
[10] James F. Kurose and Rahul Simha. A Microeconomic Approach to Optimal Resource Allocation in Distributed Computer Systems. IEEE Transactions on Computers, 38(5):705–717, May 1989.
[11] John Reumann, Ashish Mehra, Kang G. Shin, and Dilip Kandlur. Virtual Services: A New Abstraction for Server Consolidation. In Proceedings of the USENIX 2000 Technical Conference, June 2000.
[12] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. http://www.ietf.org/rfc/rfc1889.txt, January 1996.
[13] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming Protocol (RTSP). http://www.ietf.org/rfc/rfc2326.txt, April 1998.