Were he living today, Benjamin Franklin might say, “Nothing is certain, except death, taxes, and computer crashes.” Despite some claims, all cloud computing services will be subject to outages. No system, however large, nor process, however elaborate, nor support, however fanatical, can prevent computers from occasionally going down. The important point to keep in mind is the word “occasionally”. Occasional outages should be expected, planned for, and accepted as a cost of doing business. What counts as an occasional outage is defined by a negotiated agreement between two parties, called a service level agreement (or SLA).
SLAs really offer three things: a setting of expectations, a small measure of financial compensation in the event of more than occasional outages, and a means of comparing two similarly priced services. Customers of cloud computing should recall that SLAs have their origins in the telephone companies’ efforts to manage their risk. Like the airlines’ lost luggage policies, an SLA limits the service provider’s liability to a percentage of the service fee no matter how large the loss to the customer. No matter how carefully crafted, an SLA is just a piece of paper and cannot prevent an outage from occurring. Customers of cloud computing should also keep in mind that no matter how small or large the compensation, it will still be more than they will get from an internal organization under similar circumstances. So while SLAs are important, they should not provide a false sense of security.
THE FOUR STEP METHOD OF CLOUD SERVICE LEVEL AGREEMENTS
So as not to devote more time to an SLA than it is worth, we have developed a four step method for dealing with them in a straightforward, business-like manner. The four steps are to define, negotiate, measure, and report.
In this article we will demonstrate the definition step by examining the public cloud SLAs of Amazon EC2 from an industry insider’s perspective. In later articles we will examine the best practices of SLA negotiation, measurement, and reporting.
TO DEFINE the SLA means answering basic questions about uptime, maintenance windows, recovery options, reliability, and resilience, and determining basic thresholds.
TO NEGOTIATE the SLA means agreeing to compensation and penalties when the basic definitions are not met, all the while keeping in mind the cost involved in achieving higher levels of performance and uptime.
TO MEASURE the SLA means establishing concrete measurements to monitor performance.
TO REPORT is the process of sharing measurements, comparing them with expected results, settling accounts, and redefining and renegotiating if necessary.
AMAZON EC2 SERVICE LEVEL AGREEMENT
Effective Date: October 23, 2008
This Amazon EC2 Service Level Agreement (“SLA”) is a policy governing the use of the Amazon Elastic Compute Cloud (“Amazon EC2”) under the terms of the Amazon Web Services Customer Agreement (the “AWS Agreement”) between Amazon Web Services, LLC (“AWS”, “us” or “we”) and users of AWS’ services (“you”). This SLA applies separately to each account using Amazon EC2. Unless otherwise provided herein, this SLA is subject to the terms of the AWS Agreement and capitalized terms will have the meaning specified in the AWS Agreement. We reserve the right to change the terms of this SLA in accordance with the AWS Agreement.
SERVICE COMMITMENT
AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below.
DEFINITIONS
“Service Year” is the preceding 365 days from the date of an SLA claim.
“Annual Uptime Percentage” is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of “Region Unavailable.” If you have been using Amazon EC2 for less than 365 days, your Service Year is still the preceding 365 days but any days prior to your use of the service will be deemed to have had 100% Region Availability. Any downtime occurring prior to a successful Service Credit claim cannot be used for future claims. Annual Uptime Percentage measurements exclude downtime resulting directly or indirectly from any Amazon EC2 SLA Exclusion (defined below).
Outages are not cumulative. Once a claim has been resolved the clock starts ticking again...
Although 99.95% uptime is impressive, it still represents about 4 hours and 23 minutes of downtime in a year (0.05% of 8,760 hours is 4.38 hours), or 22 minutes a month, and is below the industry standard for private clouds. For comparison purposes it is standard in the business to convert the percentages to minutes and compare on a monthly basis, the typical billing cycle.
An outage is only an outage if it lasts for more than five minutes.
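The conversion from an uptime percentage to minutes of downtime is simple arithmetic, and a short sketch makes it easy to repeat for any SLA being compared. The function name below is our own and the monthly figure assumes an average month of one twelfth of a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day Service Year


def downtime_budget_minutes(uptime_percent: float) -> tuple[float, float]:
    """Return (minutes per year, minutes per month) of downtime that an
    uptime commitment still permits."""
    down_fraction = (100.0 - uptime_percent) / 100.0
    per_year = down_fraction * MINUTES_PER_YEAR
    per_month = per_year / 12  # average month, to match a monthly billing cycle
    return per_year, per_month


per_year, per_month = downtime_budget_minutes(99.95)
hours, minutes = divmod(round(per_year), 60)
print(f"{hours} h {minutes} min per year, {per_month:.1f} min per month")
# prints: 4 h 23 min per year, 21.9 min per month
```

Running the same function on two competing commitments gives figures that can be compared directly on a monthly basis.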
“Region Unavailable” and “Region Unavailability” means that more than one Availability Zone in which you are running an instance, within the same Region, is “Unavailable” to you.
“Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances.
The “Eligible Credit Period” is a single month, and refers to the monthly billing cycle in which the most recent Region Unavailable event included in the SLA claim occurred.
A “Service Credit” is a dollar credit, calculated as set forth below, that we may credit back to an eligible Amazon EC2 account.
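Taken together, these definitions describe a calculation that a customer can reproduce from their own monitoring data. The following is a minimal sketch of our reading of the definitions, not Amazon's implementation; the function names and the zone labels are hypothetical, and the count of unavailable periods is assumed to have excluded downtime already.

```python
PERIODS_PER_YEAR = 365 * 24 * 60 // 5  # 105,120 five-minute periods per Service Year


def region_unavailable(zones_unavailable: dict[str, bool]) -> bool:
    """zones_unavailable maps each Availability Zone in which you run an
    instance to True if that zone was Unavailable during the period.
    The Region is Unavailable only when more than one such zone is down."""
    return sum(zones_unavailable.values()) > 1


def annual_uptime_percentage(unavailable_periods: int) -> float:
    """Uptime over the trailing 365 days. Days before the account existed
    contribute no unavailable periods, so they count as 100% available."""
    if not 0 <= unavailable_periods <= PERIODS_PER_YEAR:
        raise ValueError("period count outside the Service Year")
    return 100.0 - 100.0 * unavailable_periods / PERIODS_PER_YEAR


def breaches_commitment(unavailable_periods: int, commitment: float = 99.95) -> bool:
    return annual_uptime_percentage(unavailable_periods) < commitment


print(region_unavailable({"zone-a": True, "zone-b": False}))  # False: only one zone down
print(breaches_commitment(52), breaches_commitment(53))       # False True
```

Under this reading, the commitment is breached at the 53rd five-minute period of Region Unavailability in a year, which is consistent with the roughly 263 minutes of permitted downtime.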
SERVICE COMMITMENTS AND SERVICE CREDITS
If the Annual Uptime Percentage for a customer drops below 99.95% for the Service Year, that customer is eligible to receive a Service Credit equal to 10% of their bill (excluding one-time payments made for Reserved Instances) for the Eligible Credit Period. To file a claim, a customer does not have to wait 365 days from the day they started using the service or 365 days from their last successful claim. A customer can file a claim any time their Annual Uptime Percentage over the trailing 365 days drops below 99.95%.
We will apply any Service Credits only against future Amazon EC2 payments otherwise due from you; provided that, we may issue the Service Credit to the credit card that you used to pay for Amazon EC2 for the billing cycle in which the error occurred. Service Credits shall not entitle you to any refund or other payment from AWS. A Service Credit will be applicable and issued only if the credit amount for the applicable monthly billing cycle is greater than one dollar ($1 USD). Service Credits may not be transferred or applied to any other account. Unless otherwise provided in the AWS Agreement, your sole and exclusive remedy for any unavailability or non-performance of Amazon EC2 or other failure by us to provide Amazon EC2 is the receipt of a Service Credit (if eligible) in accordance with the terms of this SLA or termination of your use of Amazon EC2.
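To see what the credit is worth in practice, the rule can be sketched in a few lines. This is an illustration under our own assumptions: the function and parameter names are hypothetical, and the bill is assumed to be a total from which the one-time Reserved Instance payments are subtracted before the 10% is applied.

```python
from decimal import Decimal

CREDIT_RATE = Decimal("0.10")     # 10% of the bill for the Eligible Credit Period
MINIMUM_CREDIT = Decimal("1.00")  # a credit is issued only if it exceeds $1


def service_credit(monthly_bill: Decimal,
                   reserved_one_time: Decimal = Decimal("0")) -> Decimal:
    """Credit for one Eligible Credit Period, or zero if it is not above $1."""
    eligible = monthly_bill - reserved_one_time
    credit = (eligible * CREDIT_RATE).quantize(Decimal("0.01"))
    return credit if credit > MINIMUM_CREDIT else Decimal("0.00")


print(service_credit(Decimal("1200.00"), Decimal("200.00")))  # 100.00
print(service_credit(Decimal("9.00")))                        # 0.00
```

A customer with a $1,000 usage bill in the affected month would therefore receive $100 against future payments, however large the business loss caused by the outage.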
Typically most SLAs provide that a certain number of outages result in the termination of the agreement with no future obligation to the client.
If you file a successful claim, what you get back is more service credits. Most private cloud agreements provide for cash to be returned.
This is within the range of industry standard compensation, although some SLAs go as high as 25%.
CREDIT REQUEST AND PAYMENT PROCEDURES
To receive a Service Credit, you must submit a request by sending an e-mail message to aws-sla-request @ amazon.com. To be eligible, the credit request must (i) include your account number in the subject of the e-mail message (the account number can be found at the top of the AWS Account Activity page); (ii) include, in the body of the e-mail, the dates and times of each incident of Region Unavailable that you claim to have experienced including instance ids of the instances that were running and affected during the time of each incident; (iii) include your server request logs that document the errors and corroborate your claimed outage (any confidential or sensitive information in these logs should be removed or replaced with asterisks); and (iv) be received by us within thirty (30) business days of the last reported incident in the SLA claim. If the Annual Uptime Percentage of such request is confirmed by us and is less than 99.95% for the Service Year, then we will issue the Service Credit to you within one billing cycle following the month in which the request occurred. Your failure to provide the request and other information as required above will disqualify you from receiving a Service Credit.
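The thirty business day window is the easiest of these requirements to miss, so it is worth computing the deadline as soon as an incident is logged. The sketch below counts Monday through Friday as business days and ignores public holidays, which is an assumption on our part since the SLA does not define the term; the function names are hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance `days` weekdays past `start`, skipping Saturdays and Sundays.
    Public holidays are ignored, which is an assumption of this sketch."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


def claim_is_timely(last_incident: date, received: date) -> bool:
    """True if the claim arrives within 30 business days of the last incident."""
    return last_incident <= received <= add_business_days(last_incident, 30)


deadline = add_business_days(date(2009, 6, 1), 30)  # incident on a Monday
print(deadline)                                     # 2009-07-13
```

Because the cutoff is measured from the last reported incident, a claim covering several incidents should use the date of the most recent one.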
AMAZON EC2 SLA EXCLUSIONS
The Service Commitment does not apply to any unavailability, suspension or termination of Amazon EC2, or any other Amazon EC2 performance issues: (i) that result from Service Suspensions described in Section 7.1 of the AWS Agreement; (ii) caused by factors outside of our reasonable control, including any force majeure event or Internet access or related problems beyond the demarcation point of Amazon EC2; (iii) that result from any actions or inactions of you or any third party; (iv) that result from your equipment, software or other technology and/or third party equipment, software or other technology (other than third party equipment within our direct control); (v) that result from failures of individual instances not attributable to Region Unavailability; or (vi) arising from our suspension and termination of your right to use Amazon EC2 in accordance with the AWS Agreement (collectively, the “Amazon EC2 SLA Exclusions”). If availability is impacted by factors other than those explicitly listed in this agreement, we may issue a Service Credit considering such factors in our sole discretion.
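Since excluded downtime does not count toward the Annual Uptime Percentage, each incident needs a recorded cause before it can be included in a claim. One way to keep such a record is sketched below; the category names are our own grouping of the six exclusions and the incident data is purely illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    PROVIDER = "provider"                 # counts toward Region Unavailable time
    SERVICE_SUSPENSION = "suspension"     # exclusions (i) and (vi)
    OUTSIDE_CONTROL = "outside_control"   # exclusion (ii)
    CUSTOMER_OR_THIRD_PARTY = "customer"  # exclusions (iii) and (iv)
    SINGLE_INSTANCE = "single_instance"   # exclusion (v)


@dataclass
class Incident:
    description: str
    five_minute_periods: int
    cause: Cause


def countable_periods(incidents: list[Incident]) -> int:
    """Sum only the periods whose documented cause is not an exclusion."""
    return sum(i.five_minute_periods for i in incidents if i.cause is Cause.PROVIDER)


log = [
    Incident("two zones unreachable", 40, Cause.PROVIDER),
    Incident("our own firewall misconfigured", 12, Cause.CUSTOMER_OR_THIRD_PARTY),
    Incident("one instance failed", 6, Cause.SINGLE_INSTANCE),
]
print(countable_periods(log))  # 40
```

Only the first incident in this hypothetical log would count toward a claim, which is why the cause of every outage should be documented at the time it happens.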
This is a catch all clause that could be a source of dispute about the cause of outage unless the causes are carefully documented.
The key to this clause is measurement and being able to demonstrate your outage was caused by Amazon and not by your own