2.6 Service Level Agreements
2.6.1 Definition of Service Level Agreements and Its Relevant Terms
2.6.1.1 Service Level Agreements
Many definitions exist for the term service level agreement (SLA) [35, 114]. According to Marilly et al. [33], an SLA can be defined as follows: “a service level agreement is a contract between providers and customers, usually in measurable terms, what services providers will furnish and what penalties provider will pay if he cannot meet the committed goals”. Hence, an SLA is a binding agreement that specifies what providers guarantee to deliver, and it can be offered to other providers or customers.
Gartner defines an SLA as an agreement that sets the expectations between providers and customers and describes the products or services to be delivered, the single point of contact for end users’ problems, and the metrics by which the effectiveness of the process is monitored and approved [115].
IBM states that “a service level agreement is a contract between a provider and a customer that specifies the expectations for the level of service concerning availability, performance, and other measurable objectives. SLA records a common understanding about services, priorities, responsibilities, guarantees, and warranties between parties. SLA can also specify levels of availability, serviceability, performance, operation, or other attributes of the service” [116].
The Information Technology Infrastructure Library (ITIL) has been an important factor in spreading SLAs. According to ITIL, an SLA is an agreement between a provider and a customer. It describes the IT service, documents service level targets, and specifies the responsibilities of providers and customers [117].
Based on these diverse definitions, this thesis treats SLAs as negotiated “agreements” between different parties or entities. As “agreements”, SLAs encapsulate several aspects of service provisioning: the agreed quality of service (QoS), captured through different terms; the service level objectives that the service must guarantee, expressed as constraints on QoS metrics; the responsibilities and obligations of the parties; and the penalties in cases of non-compliance with the agreed terms. An SLA thus specifies the levels of service that providers should deliver to customers with respect to objectives for different QoS aspects [35].
2.6.1.2 Service Level Objectives
Service Level Objectives (SLOs) refer to a set of formal expressions. These formal expressions have the well-known if...then structure. The antecedent (if) contains conditions and the consequent (then) contains actions. An action represents what a party has agreed to perform when the conditions are met [118].
SLOs are often quantitative and have related measurements. For customers to make informed decisions when choosing big data analytics as a service, the SLOs offered by providers of similar services should be easy to compare.
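The if...then structure of an SLO can be illustrated with a minimal Python sketch. The class name SLO, the metric name response_time_ms, the 500 ms threshold, and the action text are hypothetical and chosen only for illustration; they are not taken from [118].

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class SLO:
    """A service level objective as an if...then rule (illustrative only)."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # antecedent (if)
    action: str                                     # consequent (then)

    def evaluate(self, measurements: Dict[str, float]) -> Optional[str]:
        """Return the agreed action when the condition holds, else None."""
        return self.action if self.condition(measurements) else None


# Hypothetical SLO: if the measured response time exceeds 500 ms,
# then the provider has agreed to add capacity.
response_time_slo = SLO(
    name="response-time",
    condition=lambda m: m["response_time_ms"] > 500.0,
    action="provider adds capacity",
)

triggered = response_time_slo.evaluate({"response_time_ms": 620.0})
not_triggered = response_time_slo.evaluate({"response_time_ms": 180.0})
```

Keeping the condition and the action as separate fields mirrors the antecedent/consequent split and makes SLOs from different providers easier to compare side by side.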
2.6.1.3 SLA Metrics
SLA metrics represent measurement methods for calculating quality of service (QoS) values and define what services and guarantees a provider will deliver; they are often associated with quantitative service level objectives (SLOs) [119, 120]. SLA metrics may be categorized as functional and non-functional. Functional properties cover aspects like the number of arguments and the semantics of operations. Non-functional properties define the service capabilities and robustness, covering terms regarding the QoS, security, and remedies for performance failures [121]. A given SLO can often be measured by several different SLA metrics. It is therefore essential that an SLA make clear which metric(s) are used for each quantitative SLO [122].
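To see why the choice of metric matters, the following sketch evaluates one hypothetical availability SLO with two different metrics. The function names, the 97% target, and all measurement values are assumptions made for illustration, not figures from the cited works.

```python
def uptime_availability(total_minutes: float, downtime_minutes: float) -> float:
    """Time-based metric: fraction of the period the service was up."""
    return (total_minutes - downtime_minutes) / total_minutes


def request_availability(total_requests: int, failed_requests: int) -> float:
    """Request-based metric: fraction of requests served successfully."""
    return (total_requests - failed_requests) / total_requests


TARGET = 0.97  # hypothetical availability SLO

# Hypothetical 30-day month: 43,200 minutes, 432 of them down.
by_time = uptime_availability(43_200, 432)
# Same hypothetical month: 1,000,000 requests, 50,000 of them failed.
by_requests = request_availability(1_000_000, 50_000)

meets_by_time = by_time >= TARGET          # time-based metric meets the SLO
meets_by_requests = by_requests >= TARGET  # request-based metric misses it
```

In this made-up month the same SLO is met under the time-based metric (99%) and missed under the request-based one (95%), which is why the SLA has to name the metric.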
2.6.1.4 Service Levels and Guarantees
Service levels and guarantees represent promises with respect to graduated high/low ranges, e.g., an average availability range [low: 95%, median: 97%, high: 99%], so that it can be evaluated whether the measured metrics exceed, meet or fall below the defined service levels at a specific point in time or within a certain validity period. They can be informally represented as if-then rules, which may be chained to form graduations, complex policies and conditional guarantees, e.g., conditional rights and obligations with exceptions, violations and consequential actions: “If the average service availability during one month is below 95%, then the provider is obliged to pay a penalty of 20%.”
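The graduated range and the penalty rule quoted above can be sketched as chained if-then rules. The thresholds and the 20% rate come from the example in the text; the band labels, the function names, and the reading of the penalty as a share of a hypothetical monthly fee are assumptions.

```python
LOW, MEDIAN, HIGH = 95.0, 97.0, 99.0  # availability thresholds from the text (%)


def classify(availability: float) -> str:
    """Map a measured monthly average availability to a band (labels assumed)."""
    if availability < LOW:
        return "violation"
    if availability < MEDIAN:
        return "low"
    if availability < HIGH:
        return "median"
    return "high"


def penalty_rate(availability: float) -> float:
    """Rule from the text: below 95% the provider pays a 20% penalty."""
    return 0.20 if availability < LOW else 0.0


def penalty_amount(availability: float, monthly_fee: float) -> float:
    """Assumption: the penalty is a share of the monthly fee."""
    return monthly_fee * penalty_rate(availability)


band_bad = classify(94.2)                  # below the low threshold
band_ok = classify(97.5)                   # between median and high
owed_bad = penalty_amount(94.2, 1000.0)    # penalty applies
owed_ok = penalty_amount(97.5, 1000.0)     # no penalty
```

Further graduations, such as a smaller penalty in the low band, could be added as extra branches in penalty_rate without changing the callers.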
2.6.2 The Evolution of SLA
Over the last thirty years, the service level agreement (SLA) has evolved significantly, driven by advances in the distributed computing paradigm, in order to adapt to the changes and new challenges of different computing environments. We first briefly describe the evolution of distributed computing paradigms, and then explain our proposed SLA evolutionary stages over these years.
2.6.2.1 Distributed Computing Paradigm
According to the works in [123, 124, 125, 126, 127, 128, 129, 130], the distributed computing paradigm has evolved through a number of significant phases. The main phases are Internet Computing [131], Peer-to-Peer Computing [132], Cluster Computing [133], Grid Computing [134], Utility Computing [135], Cloud Computing (CC) [69] and Big Data (BD) [136].
The introduction of computer networks in the 1970s led to the development of distributed systems [137]. Then, the Internet (originally ARPAnet) was developed as a network between government research laboratories and participating university departments. Commercial Internet service providers began to emerge in the very late 1980s [138]. By that time, a few distributed-system technologies had emerged. The Peer-to-Peer network is one of the primary distributed systems, whose purpose is to enable the sharing of data, such as streaming audio or video [128]. In the 1980s, Cluster Computing emerged for high-performance computing tasks. Another well-known distributed computing paradigm is Grid Computing, which appeared in the mid-1990s as an evolution of Cluster Computing [128]. In the 2000s, Utility Computing was proposed, based on the idea of providing computing solutions in much the same way as traditional real-world public utilities (such as electricity, water, gas, and telephone) [124]. Utility Computing was the first step towards the pay-by-use philosophy. Around 2007, CC emerged as a popular distributed computing paradigm [139]. CC is linked with Utility Computing in that CC is generally based on a pay-per-use model in which guarantees are offered using customized SLAs [140]. Recently, further advances in computer technologies and distributed processing paradigms have generated a new paradigm over the cloud at the forefront of BD. A representative example of such a paradigm is the MapReduce [141] programming model, which is designed for distributed, data-intensive big data analytics applications (BDAAs) in the cloud. This has inspired an open source distributed computing framework called Apache Hadoop [94] and its ecosystem for cloud-hosted BDAAs.
2.6.2.2 SLA Evolutionary Stages
The SLA is an essential and efficient method of managing relationships between providers and customers and of guaranteeing the level of service. We observed that SLAs have been used successfully in all the aforementioned distributed computing environments over the last thirty years. Accordingly, the SLA has evolved remarkably as the distributed computing paradigm has advanced, in order to cater to the changes and new challenges of each distinct computing environment. Figure 2.3 presents a pictorial representation of SLA evolution and lists some representative references for each evolutionary stage.
Concretely, the main stages in the SLA evolutionary roadmap include SLAs for Internet Computing [142, 143, 144], SLAs for Peer-to-Peer Computing [145, 146, 147], SLAs for Cluster Computing [148, 149, 150, 151], SLAs for Grid Computing [152, 153, 154], SLAs for Utility Computing [155, 156, 157], SLAs for Cloud Computing [114, 118, 122, 158, 159, 160, 161, 162, 163] and SLAs for cloud-hosted BDAAs [164, 165, 166, 167]. For reasons of space, this thesis does not detail each stage. Instead, we briefly explain SLAs for Internet Computing, SLAs for Cloud Computing and SLAs for cloud-hosted BDAAs.
Figure 2.3: SLA evolution stages with representative references
Historically, SLAs originated with Internet service providers in the 1980s, which forms the first stage (i.e., SLAs for Internet Computing). Since the late 1980s, SLAs have been used by fixed-line telecom operators as part of their contracts with corporate customers [168]. Various providers and customers need SLAs in the telecommunication marketplace. Hence, Internet service providers and telecoms commonly include SLAs within the terms of their contracts with customers to define the level of service being sold in plain language. Further, in order to provide better practice advice, the TeleManagement Forum published the NGOSS SLA Management Handbook in 2001, which represents a milestone in the evolution of SLAs [169, 170, 171]. To date, the NGOSS SLA Management Handbook is the most comprehensive and voluminous published collection on the management of SLAs with a focus on the telecommunication industry [172]. This very early SLA stage has two drawbacks: the SLA metrics are limited to IP-based network performance measurements such as latency and packet loss, and the specification of SLA terms is rigid, as it was not possible to adapt the values of SLA terms once they were deployed.
Since then, SLA evolution has continued from the stage of SLAs for Peer-to-Peer Computing to the stage of SLAs for Utility Computing. Notably, the emergence of Grid Computing and Utility Computing triggered several important advancements in the specification of SLAs. This is because the openness and autonomy of grids and the utility-oriented service provisioning model required specification formats that were not restricted to the syntax or semantics of any single organization or application domain.
Further, the rapid growth of the cloud market has led to the emergence of new services, new ways of service provisioning, and new interaction and collaboration models among cloud providers and service ecosystems. This growth drives the extensive exploitation of SLAs for cloud computing. The continuous technological advancement of distributed computing has brought the application of SLAs to the current stage of SLAs for cloud-hosted BDAAs. In this stage, the importance and sophistication of SLAs are greater than ever before. However, the research on this particular stage is far from mature.
It is worth mentioning that although CC is widely applied, it still has some restrictions. The fundamental restriction lies in the correspondence between the cloud and the end devices. Such correspondence is not appropriate for a large set of cloud-based BDAAs such as latency-sensitive applications (e.g., disaster management, fire detection and firefighting, and content delivery applications). Hence, two new computing paradigms have been proposed to address such issues. They are Fog Computing [173, 174] and Edge Computing [175, 176]:
• Fog Computing (FC): refers to a novel architecture that extends the traditional CC architecture to the edge of the network. With fog, the processing of some application components (e.g., latency-sensitive ones) can take place at the edge of the network, while others (e.g., delay-tolerant and computationally intensive components) can take place in the Cloud.
• Edge Computing (EC): refers to the enabling technologies allowing computation to be performed at the edge of the network, on downstream data on behalf of cloud services and upstream data on behalf of IoT services. The “edge” is defined as any computing and network resources along the path between data sources and Cloud data centers.
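The split described for Fog Computing, with latency-sensitive components at the edge and the remaining components in the Cloud, can be sketched as a simple placement rule. The Component class, the place function, and the component names are hypothetical; real placement decisions would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """An application component with a single placement-relevant trait."""
    name: str
    latency_sensitive: bool


def place(component: Component) -> str:
    """Latency-sensitive components go to the edge, the rest to the cloud."""
    return "edge" if component.latency_sensitive else "cloud"


# Hypothetical components of a fire detection BDAA.
components = [
    Component("sensor-alerting", latency_sensitive=True),
    Component("historical-analytics", latency_sensitive=False),
]

placement = {c.name: place(c) for c in components}
```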
FC and EC are not substitutes for CC but powerful complements. As in CC, compute, storage, and network resources remain the building blocks in FC and EC environments. However, FC and EC extend CC by enabling the provisioning of resources and services outside the Cloud, at the edge of the network, closer to end devices or, eventually, at locations stipulated by SLAs [174]. Therefore, the SLA is still a fundamental element for managing the level of service of applications deployed in FC and EC environments, especially latency-sensitive ones [177, 178]. For FC- and EC-enabled BDAAs, the SLA is often influenced by many aspects (e.g., energy usage, application characteristics, service cost, network status, and data locations) [178]. Hence, SLA metrics covering these aspects should be a focal point for such applications.