
Towards SLA-oriented Cloud Computing

Sara Bouchenak
INSA Lyon, Sara.Bouchenak@insa-lyon.fr
3rd Franco-American Workshop on CyberSecurity, December 8-10, 2014, Lyon, France

Cloud computing

 The cloud as a pool of shared hardware and software resources

 The cloud as a means for distributed applications to:
 pick up required resources
 access "infinite" remote resources
 access on-demand resources (pay-as-you-go)
 get transparent resource management

[Figure: the cloud as a pool of resources: CPU, memory, disk; operating systems; virtual machines; middleware layers (e.g. application servers)]

December 10, 2014 2

© S. Bouchenak

QoS, SLOs, SLA

• QoS
 Many quality-of-service criteria
 Performance (e.g. service response time, service throughput)
 Dependability, Availability (e.g. service abandon rate), Reliability
 Security, etc.

• Costs
 Energy costs
 Financial costs

• SLOs
 Service Level Objective for a given QoS criterion/metric
 Examples:
 a target service level, e.g. a minimum service throughput, a maximum service abandon rate
 a service level interval
 service level maximization/minimization

• SLA (Service Level Agreement)
 A contract between the service provider and the service customer
 Ideally, a combination of a set of SLOs and cost constraints
 Example: "99% of service requests are processed within 1s with a minimal energy cost"
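The SLA structure just described, a set of SLOs combined with a cost constraint, can be sketched as follows. The class and field names are illustrative assumptions, not any standard SLA API:

```python
# Minimal sketch of an SLA as a set of SLOs plus a cost objective.
# All names here are illustrative, not part of any standard SLA API.
from dataclasses import dataclass

@dataclass
class SLO:
    metric: str          # e.g. "latency_s", "abandon_rate"
    target: float        # threshold the metric must stay under
    percentile: float    # fraction of requests that must meet the target

@dataclass
class SLA:
    slos: list
    cost_objective: str  # e.g. "minimize_energy"

    def is_met(self, measurements: dict) -> bool:
        """Check each SLO against measured per-request metric samples."""
        for slo in self.slos:
            samples = measurements[slo.metric]
            within = sum(1 for v in samples if v <= slo.target)
            if within / len(samples) < slo.percentile:
                return False
        return True

# The example SLA from the slide: 99% of requests processed within 1s,
# with energy cost to be minimized.
sla = SLA(slos=[SLO("latency_s", 1.0, 0.99)], cost_objective="minimize_energy")
```

Checking `sla.is_met(...)` against a batch of measured latencies then answers the "is the SLA guaranteed or violated?" question raised on the next slide.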


QoS and SLA in clouds?

• Some initiatives
 Amazon EC2, Rackspace, 3tera clouds

• Restricted to a single QoS criterion
 Service unavailability due to computer failures
 Other QoS aspects not tackled (performance, security, energy, financial costs, etc.)

• Ad-hoc and incomplete approaches
 Is the SLA guaranteed/violated by the cloud?
 E.g. Amazon EC2 customers must provide proof of cloud service unavailability: capture the failure, document it, and send the "proof" to Amazon within 30 days
 This goes against one of the main motivations of cloud computing: "hide the complexity of resource management and provide simple access to cloud services for the customer"


Open Challenges & Perspectives

 Multicriteria SLA: dependability, performance, cost, etc.
 Towards scalable and distributed SLA control
 Consider different applications and Big Data services


Towards SLA-aware clouds

Objective 1: Define a new cloud model, SLAaaS (SLA aware Service)
 Orthogonal to other cloud models (IaaS, PaaS, SaaS)
 A cloud presents, along with its service interface, a non-functional SLA interface
 Allow a customer to compare different cloud service providers regarding the provided SLOs

Objective 2: Autonomic reconfiguration of cloud services
 From the cloud provider perspective
 Multi-objective SLA between cloud provider and cloud customer
 Fully elastic cloud via dynamic resource re-allocation and reconfiguration
 Handle cloud dynamics and workload variations

Objective 3: SLA governance in the cloud
 From the cloud customer point of view
 Automatically notify the customer about SLA violations, energy footprint, etc.

Objective 4: Big Data services, benchmarking tools
 Stress/evaluate dependability and scalability of Big Data cloud services with real workloads

AMADEOS project: http://amadeos.imag.fr/
MyCloud project: http://mycloud.inrialpes.fr/


Towards SLA-aware clouds

Objective 1: Define a new cloud model, SLAaaS

Objective 2: Autonomic reconfiguration of cloud services
 Challenges
 Towards a control-theoretic approach [ACM OSR 2013]
 ConSer: Control of server systems [IEEE Trans. Comp. 2011]
 MoKa: Control of multi-tier distributed web systems [IGI Global, 2011]
 MoMap: Control of MapReduce systems [ACM CCGrid 2013]

Objective 3: SLA governance in the cloud

Objective 4: Big Data services, benchmarking tools
 MRBS: Benchmarking framework for Hadoop MapReduce [IEEE SRDS 2012]



Challenges in autonomic reconfiguration of cloud services

1) Complex SLOs
 Multiple service level objectives (SLOs): performance, availability, dependability, security, etc.
 Trade off antagonistic SLOs
 "at least 99% of client requests are admitted and processed within 1s, with a minimal financial cost"

2) From SLOs to resource allocation/configuration
 Nontrivial SLOs-to-resource allocations

 Amazon EC2 cloud resources:
 Small instance: $0.085 per hour
 Large instance: $0.34 per hour
 Extra large instance: $0.68 per hour
 Cost = number of instances x unitary price

 SLOs (e.g. for the bime cloud application):
 Availability level: 99% of requests are processed
 Performance level: requests processed within 1s
 Cost constraint: minimal cost
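The SLOs-to-resources mapping above can be illustrated with a small search over the EC2 prices from the slide. The per-instance capacity figures are illustrative assumptions (a real mapping would come from a measured or modeled capacity curve), which is precisely why the mapping is nontrivial:

```python
# Hedged sketch: pick the cheapest homogeneous allocation that sustains a
# given workload. Prices are from the slide; the requests/s capacities are
# illustrative assumptions, not measured EC2 figures.
INSTANCE_TYPES = {            # name: ($/hour, assumed requests/s capacity)
    "small":       (0.085, 50),
    "large":       (0.34,  220),
    "extra_large": (0.68,  470),
}

def cheapest_allocation(workload_rps, max_instances=20):
    """Exhaustively search homogeneous allocations meeting the demand."""
    best = None
    for name, (price, capacity) in INSTANCE_TYPES.items():
        for n in range(1, max_instances + 1):
            if n * capacity >= workload_rps:   # SLO: demand is covered
                cost = n * price
                if best is None or cost < best[2]:
                    best = (name, n, cost)
                break  # smallest feasible n for this type found
    return best
```

For example, under these assumed capacities a 100 requests/s workload is served most cheaply by two small instances, while heavier workloads shift the answer toward larger instance types.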


Challenges in autonomic reconfiguration of cloud services

3) Time-varying and nonlinear behavior
 Workload amount (#concurrent client requests)

[Figure: workload amount of the soccer World Cup web site [Arlitt et al., HP 99]]


A control-theoretic approach

Feedback control loop: the Controller receives the SLOs and drives the Target system through control knobs; exogenous inputs disturb the system, and the measured service levels and service costs are fed back to the controller.

Control knobs (i.e. resource allocations):
• Cache size
• Server admission control
• Server provisioning
• Content quality level
• …

System outputs:
• Cache hit ratio
• Service QoS
• Resource utilization
• Service differentiation ratio
• …

Exogenous inputs:
• Workload amount
• Workload mix
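The loop just described can be sketched in a few lines. The proportional-style update and its gain are illustrative assumptions for exposition; the talk's actual controllers (ConSer, MoKa, MoMap) derive their laws from explicit system models:

```python
# Minimal sketch of one feedback-control iteration: compare the measured
# service level against the SLO target and adjust a control knob.
# The proportional update and gain value are illustrative assumptions.
def control_step(knob, slo_target, measured, gain=0.1):
    """One iteration: adjust the knob proportionally to the SLO error."""
    error = slo_target - measured          # > 0: headroom, < 0: SLO violated
    return max(1.0, knob + gain * knob * error)

# Usage: shrink the knob (e.g. admitted concurrency) while latency exceeds
# the 1s target; each control period feeds the new measurement back in.
knob = 100.0
for measured_latency in [1.5, 1.3, 1.1]:
    knob = control_step(knob, slo_target=1.0, measured=measured_latency)
```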


Followed approach

 (1) Utility: State the objective and capture the trade-off
 Multicriteria utility function

 (2) Model: Describe the system behavior
 Relationship between allocated resources and service levels and costs

 (3) Control: Solve the system
 Calculate the (optimal) resource allocation
 Maximize the utility function
 Based on the model

 (4) Implement the solution
 Translate the theoretical optimal solution into a concrete implementation
 Not trivial: automatically (re)determine the model's parameters

[Figure: system model; inputs: exogenous variables, resource allocations; outputs: predicted service levels, service costs]


Towards SLA-aware clouds

Objective 1: Define a new cloud model, SLAaaS

Objective 2: Autonomic reconfiguration of cloud services
 Challenges
 Towards a control-theoretic approach [ACM OSR 2013]
 ConSer: Control of server systems [IEEE Trans. Comp. 2011], PhD L. Malrait
 MoKa: Control of multi-tier distributed web systems [IGI Global, 2011], PhD J. Arnaud
 MoMap: Control of MapReduce systems [ACM CCGrid 2013], PhD M. Berekmeri

Objective 3: SLA governance in the cloud

Objective 4: Big Data services, benchmarking tools
 MRBS: Benchmarking framework for Hadoop MapReduce [IEEE SRDS 2012], PhD A. Sangroya

Control of server systems

 Server admission control
 Prevents server thrashing and denial of service

 Multi-Programming Level (MPL)
 A classical configuration parameter in server systems
 e.g. the Apache web server's MaxClients, the MySQL database server's max_connections

[Figure: clients send requests to the server; requests beyond the MPL are rejected]
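MPL-based admission control, the mechanism MaxClients and max_connections bound, can be sketched as follows. The reject-on-full (queue-free) policy is an assumption for brevity; real servers typically queue some overflow first:

```python
# Illustrative sketch of MPL-based server admission control: at most `mpl`
# requests are admitted concurrently, the rest are rejected. The queue-free
# reject-on-full policy is a simplifying assumption.
class AdmissionController:
    def __init__(self, mpl):
        self.mpl = mpl        # Multi-Programming Level
        self.active = 0       # currently admitted requests

    def admit(self):
        """Return True if the request is admitted, False if rejected."""
        if self.active < self.mpl:
            self.active += 1
            return True
        return False

    def release(self):
        """Call when an admitted request completes."""
        self.active -= 1

# With MPL = 2, the third concurrent request is rejected.
ac = AdmissionController(mpl=2)
decisions = [ac.admit() for _ in range(3)]
```

Capping concurrency this way is what prevents thrashing: beyond some load, admitting more requests only degrades the latency of the ones already in progress.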


Trade-off between server performance and availability

Experiments conducted with the PostgreSQL database server, running the TPC-C benchmark

[Figure: performance (client request latency) and availability (client request abandon rate) as functions of the MPL]

How to configure the server's MPL, trading off performance and availability?


Related work: server admission control

• Ad-hoc techniques, heuristics
 [Menascé et al., EC'01]
 Best-effort behavior

• Linear models
 [Diao et al., NOMS'02] [Parekh et al., RTS'02]
 Cannot render the whole nonlinear behavior of server systems

• Nonlinear models based on queueing theory
 [Robertsson et al., CDC'04] [Tipper et al., JSAC'90] [Wang et al., INFOCOM'96]
 Multiple model parameters, hard to calibrate
 Do not tackle the full dynamics (workload types)
 Restricted to a single QoS aspect/SLO


ConSer*: Control of server systems

 (1) Utility: State the objective and capture the trade-off

 AM-C (availability-maximizing control objective):
 (P1) the average client request latency does not exceed a given Lmax
 (P2) and the abandon rate is made as small as possible

 Other control objectives: PM-C, PA-AM-C, AA-PM-C

[Figure: the AM-C Controller sets the controlled MPL of the target server; exogenous inputs: workload amount N and mix M; measured outputs: latency L and abandon rate α; SLOs: L ≤ Lmax, α minimized]

* L. Malrait, S. Bouchenak, N. Marchand. Experience with ConSer: A System for Server Control Through Fluid Modeling. IEEE Transactions on Computers, 60(7), 2011.

In collaboration with the NeCS INRIA research group on control theory.


ConSer modeling

(2) Nonlinear fluid modeling

 Server model:
 Exogenous inputs: workload amount N, workload mix M, incoming throughput Ti
 Control input: MPL
 State variables: admitted requests, request latency, request abandon rate, throughput of processed requests
 Outputs: L (latency), α (abandon rate)

[Figure: the AM-C Controller sets the controlled MPL from the latency and abandon rate SLOs (L ≤ Lmax, α minimized); the server model maps N, M and the MPL to L and α]


ConSer control

(3) Control the server's MPL: AM-C (availability-maximizing control)
 (P1) the average client request latency does not exceed a given Lmax
 (P2) and the abandon rate is made as small as possible

 If L > Lmax: ¬(P1); set the MPL to a decreased value of Ne
 If L < Lmax: (P1) and possibly ¬(P2); set the MPL to an increased value of Ne (with gain γ > 0)
• Efficient control: O(1)

[Figure: the AM-C Controller sets the controlled MPL of the target server under workload amount N and mix M; measured outputs L and α; SLOs: L ≤ Lmax, α minimized]
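The AM-C rule above can be sketched as a constant-time update. The multiplicative form and the gain are an illustrative reading of the slide; the ConSer paper derives the exact law from its fluid model:

```python
# Hedged sketch of the AM-C control law: decrease the MPL when the measured
# latency exceeds Lmax, increase it otherwise. The multiplicative update and
# gamma value are illustrative; ConSer derives the exact law from its model.
def am_c_step(mpl, latency, l_max, gamma=0.2):
    """One AM-C iteration over the server's MPL; O(1) as noted in the talk."""
    if latency > l_max:
        mpl = mpl * (1.0 - gamma)   # (P1) violated: admit fewer requests
    else:
        mpl = mpl * (1.0 + gamma)   # (P1) holds: admit more, shrinking the
                                    # abandon rate toward (P2)
    return max(1, round(mpl))
```

Each control period thus needs only one comparison and one multiplication, which is what makes the controller cheap enough to run continuously alongside the server.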


ConSer AM-C control evaluation

Experiments conducted with the PostgreSQL database server running the TPC-C benchmark, AM-C control law, Lmax = 8s

Performance improved by up to 30%



Big Data Systems: MapReduce

• MapReduce for Big Data applications
 A popular programming model
 A runtime environment on clusters of commodity computers

• Automatic
 Data partitioning
 Data replication
 Task scheduling
 Fault tolerance

• A wide range of applications
 log analysis, data mining, web search engines, scientific computing, business intelligence, etc.

• Big companies use it
 Amazon, eBay, Facebook, LinkedIn, Twitter, Yahoo!, etc.
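The programming model reduces to two user-supplied functions, as in the classic word-count example below. A real runtime such as Hadoop distributes the map and reduce tasks across the cluster and handles the partitioning, replication, scheduling, and fault tolerance listed above automatically:

```python
# Minimal single-process sketch of the MapReduce programming model
# (word count). A real runtime distributes these phases over a cluster.
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for each word in an input line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)           # shuffle: group values by key
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

result = mapreduce(["big data", "big clusters"])
```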


Motivation

• Lots of work to improve MapReduce dependability and performance
 New fault-tolerance models [Costa, CloudCom 11]
 Replication and partitioning policies [Ananthanarayanan, EuroSys 11] [Eltabakh, VLDB 11]
 Scheduling policies [Zaharia, OSDI 08] [Isard, SOSP 09] [Zaharia, EuroSys 10]
 Cost-based optimization [Herodotou, VLDB 11]
 Resource provisioning [Verma, Middleware 11]

• Most evaluations use micro-benchmarks
 Not representative of full distributed, concurrent applications
 Not representative of realistic workloads
 No dependability benchmarking


MRBS objectives

Empirical evaluation of the dependability and performance of MapReduce
 Fault tolerance
 Scalability

• Variety of application domains, workloads and dataloads
 Compute-oriented vs. data-oriented applications
 Batch applications vs. real-time applications

• Variety of Big Data workloads and faultloads
 Various workloads and dataloads
 Different fault models
 Different fault rates

• Portable and easy to use on a wide range of clouds
 Different cloud infrastructures

* A. Sangroya, D. Serrano, S. Bouchenak. Benchmarking Dependability of MapReduce Systems. The 31st IEEE Int. Symp. on Reliable Distributed Systems (SRDS 2012), Irvine, CA, Oct. 2012.
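Dependability benchmarking with a faultload, in the spirit of MRBS, can be sketched as a loop that injects faults at a given rate while running a workload and records job outcomes. All names and hooks here are hypothetical; MRBS itself drives real Hadoop clusters:

```python
# Illustrative sketch of faultload-driven dependability benchmarking:
# inject faults at a given rate while running jobs and record outcomes.
# The hook functions are hypothetical stand-ins for a real cluster harness.
import random

def run_benchmark(jobs, fault_rate, run_job, inject_fault, seed=42):
    """Run `jobs` jobs, injecting a fault before each one with probability
    `fault_rate`; return (successes, failures)."""
    rng = random.Random(seed)            # seeded for a reproducible faultload
    successes = failures = 0
    for _ in range(jobs):
        if rng.random() < fault_rate:
            inject_fault()               # e.g. kill a task or a node
        if run_job():
            successes += 1
        else:
            failures += 1
    return successes, failures

# Toy usage with stub hooks: a pending injected fault fails the next job.
state = {"faulty": False}
def inject_fault(): state["faulty"] = True
def run_job():
    ok = not state["faulty"]
    state["faulty"] = False              # the fault is "recovered" afterwards
    return ok

successes, failures = run_benchmark(100, 0.2, run_job, inject_fault)
```

Varying `fault_rate` and the fault model behind `inject_fault`, and comparing the resulting success ratios across frameworks, is exactly the kind of comparison the use-cases on the following slides perform for Hadoop 0.20 vs. 1.0.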


MRBS characteristics


Use-case: Comparing two MapReduce frameworks w.r.t. performance & dependability

How does Hadoop 1.0 compare to Hadoop 0.20 w.r.t. performance?

Experiments conducted on a ten-node Hadoop cluster
 Response time with Hadoop 1.0: up to +40%
 Throughput with Hadoop 1.0: up to -42%


Use-case: Comparing two MapReduce frameworks w.r.t. performance & dependability

How does Hadoop 1.0 compare to Hadoop 0.20 w.r.t. dependability?

Experiments conducted on a ten-node Hadoop cluster
 Fewer failed jobs with Hadoop 1.0


Use-case: Comparing two MapReduce frameworks w.r.t. performance & dependability

How does Hadoop 1.0 compare to Hadoop 0.20 w.r.t. dependability?

Experiments conducted on a ten-node Hadoop cluster
 Fewer I/O failures with Hadoop 1.0


Conclusion & Perspectives

 Multicriteria SLA by design

 Different applications

