Analysis of existing Cloud technologies and Cloud modelling concepts and prototype requirements


The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 318484 (MODAClouds).

Title: Analysis of existing Cloud technologies and Cloud modelling concepts and prototype requirements

Authors: Nicolas Ferry (SINTEF), Arnor Solberg (SINTEF), Alessandro Rossini (SINTEF), Santo Lombardo (POLIMI), Oscar Locatelli (POLIMI), Marco Brambilla (POLIMI), Marcos Almeida (SOFTEAM), Anthonin Abhervé (SOFTEAM)

Editor: Nicolas Ferry (SINTEF)

Reviewers: Tabassum Sharif (FLEXI) and Giuliano Casale (Imperial)

Identifier: Deliverable # D4.1

Nature: Report

Version: 1

Date: 1 April 2013

Status: Final

Diss. level: Public

Executive Summary

This deliverable presents an analysis of the state of the art in Cloud technologies and Cloud modelling concepts. The analysis is carried out at both the Cloud-enabled Computation Independent Model and the Cloud Provider Independent Model levels. Subsequently, it presents the WP4 requirements according to the requirements specification template provided in D3.1.1. The approach is use case based.


Members of the MODAClouds consortium:

Politecnico di Milano (Italy)

Stiftelsen SINTEF (Norway)

Institute E-Austria Timisoara (Romania)

Imperial College of Science, Technology and Medicine (United Kingdom)

SOFTEAM (France)

Siemens Program and System Engineering (Romania)

BOC Information Systems GMBH (Austria)

Flexiant Limited (United Kingdom)

ATOS Spain S.A. (Spain)

CA Technologies Development Spain S.A. (Spain)

Published MODAClouds documents


Contents

1 INTRODUCTION
1.1 CONTEXT AND OBJECTIVES
1.2 STRUCTURE OF THE DOCUMENT
2 SURVEY CLOUD TECHNOLOGIES AND CLOUD MODELLING CONCEPTS
2.1 KEY CHALLENGES FROM MODACLOUDML GENERAL OBJECTIVES AND CASE STUDIES
2.1.1 MODACloudML general objectives
2.1.2 Case study 1: Project management server
2.1.3 Case study 2: Business process modelling system
2.1.4 Case study 3: Health-care application
2.1.5 Case study 4: A smart city urban safety planner
2.1.6 Summary of MODACloudML challenges
2.2 STATE OF THE ART
2.2.1 Cloud-enabled Computation Independent Modelling concepts and technologies
2.2.2 Modelling concepts and technologies for the provisioning, deployment and adaptation of applications in the cloud (CPIM/CPSM)
2.2.3 Data persistence
2.3 DESIGN TIME SCHEMA TRANSFORMATION
2.3.1 Synthesis of the state of the art
3 REQUIREMENT SPECIFICATION
3.1 CONTEXT AND SYSTEM OVERVIEW
3.1.1 Context
3.1.2 System boundary model
3.2 USE CASE SPECIFICATION FOR THE CPIM LEVEL SPECIFICATION
3.2.1 Use case heading
3.2.2 Use case description
3.2.3 Use case scenarios
3.2.4 Information model
3.2.5 Interface specification
3.2.6 QoS requirements
3.3 USE CASE SPECIFICATION FOR THE CPSM DERIVATION
3.3.1
3.3.2
3.3.3
3.3.4
3.3.5
3.3.6
3.4 USE CASE SPECIFICATION FOR THE CLOUDAPP PROVISIONING AND DEPLOYMENT USE CASES
3.4.1
3.4.2
3.4.3
3.4.4
3.4.5
3.4.6
3.5 USE CASE SPECIFICATION FOR THE MODEL BASED RUNTIME MANAGEMENT AND ADAPTATION USE CASES
3.5.1
3.5.2
3.5.3
3.5.4
3.5.5
3.5.6
3.6 CIM MODELLING SUPPORT
3.6.1 Context and system overview
3.6.2 Use case specification for the Define Application Services use case
3.6.3 Use case specification for the Define Services Orchestration use case
3.6.4 Use case specification for the Define Service Requirements use case
4 ROADMAP
5 BIBLIOGRAPHY


Table of Figures

Figure 1 MODAClouds Architecture
Figure 2 The stack of cloud solutions
Figure 3 Anatomy of an application in Cloudify
Figure 4 Overview of the models@runtime approach
Figure 5 ORM layer
Figure 6 Key-Value Data Model
Figure 7 Document-based Data Model
Figure 8 Column-oriented Data Model
Figure 9 Graph-based Data Model
Figure 10 Scope with respect to the MODAClouds reference architecture; Run time adaptation, Data Synchronization and MODACloudML CIM, CPIM and CPSM modelling.
Figure 11 Overall MODAClouds approach, the scope of WP 4 is indicated by the blue squares, while the green squares are interacting elements.
Figure 12 System boundary model.
Figure 13 Main scenarios of the CPIM level specification.
Figure 14 Information model for PIM level specification
Figure 15 Main scenarios of the CPSM derivation.
Figure 16 Main scenarios of the CPSM derivation.
Figure 17 Exemplifying the vision of a provisioning and deployment wizard
Figure 18 Main scenarios of the model based management and adaptation use cases
Figure 19 CIM Model
Figure 20 System boundary model
Figure 21 Use Case Figure
Figure 22 Information: Define Application Services
Figure 23 Use Case diagram
Figure 24 Information: Define Services Orchestration
Figure 25 Requirements Use Case diagram


1 Introduction

1.1 Context and objectives

Cloud computing is a computing model enabling ubiquitous network access to a shared and virtualised pool of computing capabilities (e.g., network, storage, processing, and memory) that can be rapidly provisioned with minimal management effort [1]. The landscape of cloud computing encompasses a multitude of cloud providers, as well as several infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) [1] solutions. The ability to run, monitor and adapt multi-cloud systems (i.e., applications and services on multiple clouds) allows exploiting the peculiarities of each cloud solution and hence optimising performance, availability, and cost of the applications and services. However, these cloud solutions are typically heterogeneous and the provided features are often incompatible. This diversity is an obstacle with respect to demands such as promoting interoperability and preventing vendor lock-in. Indeed, it hinders the exploitation of the full potential of cloud computing by increasing the complexity of development and administration of multi-cloud systems. This challenge needs to be addressed.

Several academic and industrial projects aim at addressing this challenge (see [2], [3], [4], [5], to mention but a few) by providing seamless solutions for provisioning, deployment, monitoring and adaptation of cloud systems. The results of these projects are of paramount importance in promoting interoperability and preventing vendor lock-in, but they are not sufficient to properly manage the complexity of development and administration of multi-cloud systems [6].

MODAClouds proposes a model-driven approach for the design and execution of applications on multiple clouds. The model-driven approach, commonly summarised as “model once, generate anywhere”, is particularly relevant when it comes to provisioning and deployment of applications and services across multiple clouds, as well as migrating the source code from one cloud to another. The model-driven engineering (MDE) approach adopted allows developers to build the system at various levels of abstraction. As depicted in Figure 1, the three levels envisioned are: (i) the Cloud-enabled Computation Independent Model (CIM) to describe an application and its data, (ii) the Cloud-Provider Independent Model (CPIM) to describe cloud concerns related to the application in a cloud-agnostic way, and (iii) the Cloud-Provider Specific Model (CPSM) to describe the cloud concerns needed to deploy and provision the application on a specific cloud.
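As a minimal sketch of how the three levels relate, the following Python fragment refines one provider-agnostic node (CPIM) into provider-specific nodes (CPSM). All class names, the catalogue, the provider names and the instance types are hypothetical and chosen for illustration only; they are not part of MODACloudML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CimService:
    """CIM level: a service of the application, without any cloud concept."""
    name: str


@dataclass(frozen=True)
class CpimNode:
    """CPIM level: a cloud concern stated in a provider-agnostic way."""
    service: CimService
    min_cores: int
    min_ram_gb: int


@dataclass(frozen=True)
class CpsmNode:
    """CPSM level: the same concern bound to one provider's offering."""
    service: CimService
    provider: str
    instance_type: str


# Hypothetical offerings per provider: (instance type, cores, RAM in GB),
# listed from smallest to largest.
CATALOGUE = {
    "provider-a": [("a.small", 1, 2), ("a.large", 4, 16)],
    "provider-b": [("b.micro", 1, 1), ("b.medium", 2, 8), ("b.xl", 8, 32)],
}


def derive_cpsm(node: CpimNode, provider: str) -> CpsmNode:
    """Pick the first (smallest) offering that satisfies the CPIM constraints."""
    for instance_type, cores, ram_gb in CATALOGUE[provider]:
        if cores >= node.min_cores and ram_gb >= node.min_ram_gb:
            return CpsmNode(node.service, provider, instance_type)
    raise ValueError(f"{provider} has no offering for {node.service.name}")


# One CPIM node, refined once per target cloud.
cpim = CpimNode(CimService("repository"), min_cores=2, min_ram_gb=4)
targets = [derive_cpsm(cpim, provider) for provider in CATALOGUE]
```

The point of the sketch is that the CPIM node is written once, while the provider-specific choice is confined to the derivation step.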

On the basis of this architecture, this document presents the state of the art in cloud modelling concepts and environments at all levels, as well as requirements for the MODACloudML platform that will be developed within WP4. This platform provides methods and techniques for provisioning, deployment, and adaptation on multiple clouds. CPIM and CPSM can thus be regarded as the models that are manipulated by tools for provisioning, deployment and adaptation of cloud-based applications. At these levels, the deliverable presents a classification of the state of the art in tools for the provisioning and deployment of applications in the cloud and discusses their use of MDE. Since this discussion focuses on computational resources and elements of an application, the deliverable also presents the state of the art in data persistence. The latter helps in expressing preferences, for instance when choosing a database.

Then, in the rest of the deliverable, we specify the WP4 requirements at all levels (CIM, CPIM and the derivation of CPSM) and for the provisioning, deployment and adaptation features. These specifications are driven and illustrated by use cases.


Figure 1 MODAClouds Architecture

1.2 Structure of the document

The remainder of the document is organised as follows. Section 2 highlights some challenges related to the definition of MODACloudML, and presents an analysis of the state of the art of cloud technologies and cloud modelling concepts on the basis of these challenges. In particular, it outlines the state of the art in modelling concepts and technologies at the CIM level, and then it discusses cloud concerns at CPIM and CPSM levels. On this basis, Section 3 presents the specifications of WP4 requirements, based on use cases. Finally, Section 4 presents the roadmap for WP4.

[Figure 1 labels: Developer; IDE; CIM; CPIM; CPSM; DSS; Design-time; Run-time; New or legacy applications design; Code development; Decision making; Semi-automatic transformation; Automatic deployment; Monitoring & Data synchronization; Run-time adaptation; Service Operator Management; Goal: QoS assurance & costs minimization]


2 Survey cloud technologies and cloud modelling concepts

This Section presents an analysis of the state of the art in cloud technologies and cloud modelling concepts. The different works presented are discussed with respect to the challenges to be addressed within MODACloudML.

2.1 Key challenges from MODACloudML general objectives and case studies

This Section presents the general objectives of MODACloudML and the associated challenges. These challenges are extracted from the MODAClouds case studies.

2.1.1 MODACloudML general objectives

The MODACloudML platform will provide a software engineering methodology and a tool for the model-driven design of multi-cloud systems (i.e., applications and services on multiple clouds). The purpose of the proposed modelling approach is to hide the technical details of cloud providers from designers while helping them to fully exploit the peculiarities of each provider by capturing requirements on distribution, QoS, and other concerns that are important when deploying an application in the clouds. The scope of MODACloudML is to provide methods and techniques for provisioning, deployment, and adaptation on multiple clouds (please note that with this multi-cloud aspect the scope of the platform is broader than the scope of the runtime). The engineering of components, services or any other software artefact can be done using any of the many approaches that already exist for that purpose.

2.1.2 Case study 1: Project management server

The Modelio TeamWork server is a versioned repository that stores models defined within the Modelio CASE tool. A prototype of this server is currently deployed on top of Softeam's own experimental cloud infrastructure. In a production environment, some QoS requirements have to be respected; for instance, during peaks of demand on the server, it should be possible to move the server from one provider to another or to scale the system.

Extracted challenges: When migrating from one cloud to another, the model-driven approach adopted by MODACloudML should abstract from IaaS provider-specific concepts to ease the process and thereby decrease the cost of migration. However, this abstraction should not prevent the exploitation of provider peculiarities, for instance for scalability purposes. This scalability concern also requires that the models of the system can be adapted at runtime to scale up and down.

2.1.3 Case study 2: Business process modelling system

ADOxx is an application that supports business process modelling. From the end-user perspective, deploying this application on the cloud could bring benefits in terms of availability and performance. Report analysis is a complex activity that can be time consuming depending on several factors such as the size of the model repository or the complexity of the analysis. In such cases, the analysis is executed asynchronously as a batch job during the night. By using appropriate cloud providers and solutions (such as cloud elasticity) this time can be reduced by 50%.

Extracted challenges: A challenge for MODACloudML from this case study is the migration of the legacy application to the cloud. MODACloudML should be agnostic to any development paradigm and technology, meaning that developers can design and implement the applications and services based on their preferred paradigms and technologies. Another challenge is to provide a methodology and tools to fully exploit cloud peculiarities by focusing on cloud concerns rather than implementation details.

2.1.4 Case study 3: Health-care application

The Health-care application is an existing monolithic application for the home-based treatment of patients affected by dementia and for the formulation of a home care strategy by their care-givers. A high-level architecture of the Health-care MODAClouds case study includes: (i) The patients and carers data storage, which will benefit from the cloud in terms of scalability and performance and will be deployed in a private IaaS to better address privacy and security issues. (ii) A Server Application that implements the core functionalities of the platform, i.e. secure communication with client applications, risk assessment and analysis, and adverse event detection, plus a Web-based Graphical User Interface (GUI) for clinicians and platform administrators. This will benefit from the cloud in terms of scalability by taking advantage of the auto-scaling and pay-per-use features the cloud offers, and it will require the cloud to establish a high-level SLA oriented to the application and not to the infrastructure. (iii) A Carers Client Application used by carers and patients to access the services of the platform and deployed on the cloud as a Virtual Desktop. This requires allocating and de-allocating environments as carers come and go.

Extracted challenges: The described cloud-based applications will be deployed in a federated multi-cloud PaaS/IaaS Infrastructure that implements a hybrid cloud scenario. MODACloudML should provide modelling concepts for multiple private, public, or hybrid clouds at both IaaS and PaaS levels. MODACloudML should enable the deployment of such multi-cloud systems.

2.1.5 Case study 4: A smart city urban safety planner

The goal of this case study is to develop a city urban safety planner for the management of fire incidents. The scenario considers an area where a high-density population is served by an old, hard-to-maintain gas pipe network. Gas detectors, traffic sensors in the road, CCTV cameras and electricity circuit breakers are in place and are managed by an already existing Internet of Things (IoT) platform. The goal of the case study is to develop a city planner able to predict: (i) the potential failure of gas detector sensors by analysing data from gas sensors, and (ii) the impact of a fire by analysing the videos taken from nearby CCTV cameras. The planner should be deployed on PaaS with different characteristics. The processing power of the infrastructure also has to scale up and down very quickly to manage the peaks of data flows, which are not constant over the measuring areas. Finally, data replication and migration mechanisms have to be put in place to avoid loss of data in case of failure of one of the application instances.

Extracted challenges: MODACloudML should support the deployment of the same application on multiple clouds. Since the planner should be easily deployed on different PaaS, MODACloudML should enable the deployment of applications on PaaS in a cloud-agnostic way. Another challenge from this case study is related to data migration and replication. MODACloudML should enable data replication on multiple clouds.

2.1.6 Summary of MODACloudML challenges

On the basis of the challenges extracted from our four case studies we can summarize the challenges of the MODACloudML platform as below:

• Providing modelling concepts for the provisioning, deployment and adaptation of applications on multiple clouds, mixing both IaaS and PaaS levels.

• Providing, through the model-driven architecture, various levels of abstraction that allow developers to abstract from IaaS/PaaS provider-specific concepts while fully exploiting cloud peculiarities.

• Being agnostic to any development paradigm and technology, meaning that developers can design and implement applications and services based on their preferred paradigms and technologies.

• Providing an approach and concepts for data replication on multiple clouds at both IaaS and PaaS levels.

The analysis of the state of the art will focus on the ability of the various modelling concepts and technologies presented to address some of these challenges.

2.2 State of the art

The state of the art is organized on the basis of the MODAClouds architecture. We present modelling concepts and technologies first at the CIM level and then at the CPIM level. The latter is decomposed into two subsections on (i) provisioning, deployment and adaptation of software artefacts and (ii) data persistence.

2.2.1 Cloud-enabled Computation Independent Modelling concepts and technologies

The Cloud-enabled Computation Independent Model (CIM) describes cloud applications at the service level. Hence, for a given application, it contains the description of the services that compose it. Besides, it contains the public interfaces of each service, the business processes that describe their orchestration and the domain model of the data exchanged by them through their public interfaces. The CIM also supports requirements that define constraints associated with the application, and a model of the resource consumption associated with the use of its services. In this Section we describe the existing and related technologies that allow a developer to describe each part of the CIM.
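To make the parts of a CIM listed above concrete, the following Python sketch groups them into plain data classes and adds one consistency check. It is an illustration under our own assumptions: the class names, field names and example values are hypothetical and do not come from the MODACloudML metamodel.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    operations: list[str]  # public interface of the service
    # Resource consumption model: hypothetical CPU time per operation call.
    cpu_ms_per_call: dict[str, float] = field(default_factory=dict)


@dataclass
class Requirement:
    service: str
    constraint: str  # free-text constraint, e.g. a QoS bound


@dataclass
class Cim:
    services: list[Service]
    orchestration: list[tuple[str, str]]  # ordered (service, operation) calls
    requirements: list[Requirement]

    def dangling_calls(self) -> list[tuple[str, str]]:
        """Return orchestration steps that no declared interface provides."""
        declared = {(s.name, op) for s in self.services for op in s.operations}
        return [call for call in self.orchestration if call not in declared]


cim = Cim(
    services=[
        Service("Repository", ["commit", "checkout"], {"checkout": 12.0}),
        Service("Report", ["analyse"]),
    ],
    orchestration=[
        ("Repository", "checkout"),
        ("Report", "analyse"),
        ("Report", "export"),  # not declared in the Report interface
    ],
    requirements=[Requirement("Report", "availability >= 99.9%")],
)
print(cim.dangling_calls())
```

A check of this kind is one example of what becomes possible once services, interfaces and orchestration are captured in a single model rather than in separate documents.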


We start this Section by analysing approaches that are able to describe services and their interactions. This is the case of the technologies related to Service-Oriented Architecture (SOA), which define applications as sets of services that communicate through well-defined interfaces. SOA-enabled technologies may therefore be used to define cloud-enabled applications without going into the fine details of deployment. Most of the time services are simply modelled by means of general-purpose languages such as UML. For example, in [7], the generic UML concepts of class and interface are used to define services and their public interfaces respectively. Some service-specific languages have also been designed for SOA, for example SoaML [8] and SOMF [9]. SoaML defines a MOF metamodel and a UML profile, while SOMF defines a completely new language for defining service-related concepts. SoaML reuses and extends the UML concepts of components and ports to define the services and their interfaces respectively. Classes are used to define the entities in the domain model of the application that are manipulated by the services. SOMF also includes a sublanguage called Cloud Computing Modelling Notation (CCMN), whose concepts (such as IaaS, PaaS and SaaS clouds, and clouds of clouds) and service orchestration based on activity diagrams could be embedded into MODACloudML.

Apart from these industry-driven initiatives, academic works such as the Unified Services Description Language (USDL) [10] also provide the necessary abstraction to describe a service-oriented architecture. This language goes even further, by allowing designers to specify, besides services and their interfaces, non-functional aspects of these services (e.g. marketing, pricing, legal, certification, documentation, etc.). These aspects are however not cloud specific. This is a drawback that will be dealt with by MODACloudML.

The works outlined in the previous paragraphs deal with the generic concept of service. Other works address the specific concept of Web Service. These works can be separated into two groups. The first group is formed by languages like WSDL [11], which enable the specification of a list of services, interfaces, data types and orchestration processes at a syntactical level. The second group is formed by the so-called Semantic Web languages such as WSML [12] and OWL-S [13], which enable the specification of the semantics of the services, besides their syntax. The semantics is defined by means of logic formalisms with the objective of achieving high-level services such as automatic service selection, discovery and composition. This is however out of the scope of MODAClouds.

The main drawback of the approaches in the previous category is that they do not allow for the description of non-functional requirements and constraints. There are, however, other approaches that take this kind of constraint into consideration. For example, several extensions of WSDL add non-functional requirements to service interface descriptions [14] [15]. These extensions usually consist of logical languages that link each service to a list of so-called policy assertions that should be enforced at runtime. At the modelling level, the OMG UML profile for QoS [16] also allows a designer to specify QoS requirements and to connect them to service descriptions. Reusing and extending such languages is part of the scope of MODAClouds. For more details about modelling languages allowing the specification of QoS constraints and requirements, please refer to the deliverable D5.1.

Another important aspect of the CIM that is usually neglected by service specifications is the model of the resource consumption associated with the services that compose the application. In MODAClouds, this piece of information is used both to find a CPIM and CPSM combination that respects the constraints and requirements imposed by the CIM, and to provide a feedback loop in which this part of the model can be updated from monitoring information collected at runtime. The state of the art in the representation of resource consumption is however not in the scope of the present document. For more details, please refer to the deliverable D2.1.

Besides the service definitions, their associated requirements and resource consumption information, the CIM should also describe the interactions between services, which is called “service orchestration”. There are several languages for the definition of service orchestrations [17] [18] [19] [20] [21] [22]. In this Section we focus on standards such as the Business Process Execution Language (BPEL) [17] and the Web Services Choreography Description Language (WS-CDL) [18]. They have been proposed by OASIS and W3C respectively and focus on service orchestration in the implementation phase. Their specifications therefore have to be detailed and executable. The execution of such specifications is often delegated to execution engines. These engines do not target cloud computing environments. MODACloudML orchestration specifications at the CIM level do not target full executability since they are intended to be transformed into a CPIM by means of a semi-automated mapping.

Other languages, such as the one presented in [22], represent academic efforts to define high-level languages intended for the initial phases of a project, in which high-level declarative descriptions of processes are preferred to low-level imperative ones. In [22], message exchanges between services are used to describe “interactions”, which are then composed into “choreographies”. The main drawback of such choreographies is that they are intended only for the initial phases of the project, i.e. their mapping into lower-level orchestration descriptions is not supported. In the MODAClouds approach, the orchestration at the CIM level is semi-automatically mapped into a CPIM-level architecture that implements it.

Finally, the efforts in defining Enterprise Architecture Frameworks (EAF) may be useful in defining cloud-enabled computation independent models of applications. This is because EAFs are intended to represent the abstract working of an enterprise, from its interactions with external actors to its internal services and orchestration protocols. We can cite two important standards in this domain: RM-ODP [23] and TOGAF [24]. For example, TOGAF includes the description of business services and their interfaces, and uses data entities to specify the data handled by these services. Both standards define custom multi-view metamodels which are mapped into UML profiles. The main drawback of such metamodels is that they do not target cloud applications, e.g. cloud-specific requirements and QoS constraints cannot be defined and enforced.

The CIM is provided as an input to cloud solutions for the provisioning and deployment of applications on the cloud, together with resources such as the code of the application to be deployed (e.g., a war or jar file). Within MODAClouds this model will be semi-automatically translated and further refined at the CPIM level. CPIM and CPSM include cloud concepts such as IaaS, PaaS or SaaS elements. They are the models that can be manipulated by tools for provisioning, deployment and adaptation of cloud-based applications. The next Section presents the state of the art in such tools.

2.2.2 Modelling concepts and technologies for the provisioning, deployment and adaptation of applications in the cloud (CPIM/CPSM)

The cloud market counts numerous cloud solutions at different levels of the cloud stack, such as IaaS providers, IaaS/PaaS libraries, as well as PaaS frameworks. As mentioned, this diversity prevents interoperability and promotes vendor lock-in. In the following we are going to present each of these solutions and explain how they build upon each other to form a stack (see Figure 2). We will also discuss the representation and models used and sometimes provided by some of these solutions.

Figure 2 The stack of cloud solutions

2.2.2.1 Providers

There is nowadays a plethora of providers. The literature includes several taxonomies and surveys of providers [ProdanOstermann09,LiYangKZ10], but the cloud computing market has been evolving constantly in recent years, and data collected just a few years ago is already outdated.


Table 1 shows a classification that outlines current major public IaaS providers. The list of providers is by no means exhaustive, but it includes the ones that we believe are the current major players, at least in the European and North American markets. This classification is based on headquarters, data centres' location, uptime service level agreement (SLA), and IaaS stack.

Table 1 Providers

| Provider | Headquarters | Data centres’ location | Uptime SLA | IaaS stack |
|---|---|---|---|---|
| Amazon AWS | USA | USA, Brazil, Ireland, Japan, Singapore, Australia | 99.95% | Proprietary |
| AT&T Cloud Architect | USA | USA | 100.00% | OpenStack |
| Bit Refinery | USA | USA, UK | 100.00% | VMWare vCloud |
| GoGrid | USA | USA, Netherlands | 100.00% | Proprietary |
| Google Compute Engine | USA | USA, EU (Unspecified) | 99.95% | Proprietary |
| Hosting.com | USA | USA | 100.00% | VMWare vCloud |
| HP Cloud | USA | USA | 99.95% | OpenStack |
| IBM SmartCloud Enterprise | USA | USA, Germany, Japan | 99.90% | OpenStack |
| Microsoft Windows Azure | USA | USA, Ireland, Netherlands, Hong Kong, Singapore | 99.95% | Proprietary |
| Nephoscale | USA | USA | 99.90% | Proprietary, OpenStack (Storage only) |
| OpSource | USA | USA, France, UK | 100.00% | Proprietary |
| RackSpace | USA | USA, UK, Hong Kong | 100.00% | OpenStack |
| ReliaCloud | USA | USA | 100.00% | VMWare vCloud |
| Softlayer | USA | USA, Netherlands, Singapore | 100.00% | Proprietary, OpenStack (Storage only) |
| Terramark | USA | USA, Canada, Brazil, Colombia, Dominican Republic, Belgium, France, Germany, Ireland, Italy, Luxembourg, Netherlands, Spain, Sweden, Turkey, UK, China, Japan, Singapore, Australia | 100.00% | VMWare vCloud |
| Aruba Cloud | Italy | Italy | 99.95% | Proprietary |
| CloudSigma | Switzerland | Switzerland, USA | 100.00% | Proprietary |
| Gandi | France | France, USA | 99.95% | Proprietary |
| GreenQloud | Iceland | Iceland | 100.00% | CloudStack |
| Lunacloud | UK | France, Germany, Latvia, Portugal | 99.99% | Proprietary (AWS EC2/S3 compatible) |
| Memset | UK | UK | 99.99% | Proprietary, OpenStack (Storage only) |

The headquarters column shows that 15 providers are based in the USA while only six are based in Europe. However, the data centres' location column shows that 17 providers have data centres in the USA while 16 have data centres in Europe. This information is particularly relevant with respect to data protection laws and regulations, such as the EU data protection directive (Directive 95/46/EC) and the upcoming data protection regulation (to be adopted in 2014), which restricts the geographical locations where for instance the data of EU residents can be stored and processed.
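The counts quoted above can be recomputed from the rows of Table 1. The sketch below transcribes the headquarters and data centre columns into a Python dictionary; the grouping of location labels into a `EUROPE` set is our own assumption (in particular, the label "EU (Unspecified)" is counted as European).

```python
# Rows transcribed from Table 1: provider -> (headquarters, data centre locations).
PROVIDERS = {
    "Amazon AWS": ("USA", ["USA", "Brazil", "Ireland", "Japan", "Singapore", "Australia"]),
    "AT&T Cloud Architect": ("USA", ["USA"]),
    "Bit Refinery": ("USA", ["USA", "UK"]),
    "GoGrid": ("USA", ["USA", "Netherlands"]),
    "Google Compute Engine": ("USA", ["USA", "EU (Unspecified)"]),
    "Hosting.com": ("USA", ["USA"]),
    "HP Cloud": ("USA", ["USA"]),
    "IBM SmartCloud Enterprise": ("USA", ["USA", "Germany", "Japan"]),
    "Microsoft Windows Azure": ("USA", ["USA", "Ireland", "Netherlands", "Hong Kong", "Singapore"]),
    "Nephoscale": ("USA", ["USA"]),
    "OpSource": ("USA", ["USA", "France", "UK"]),
    "RackSpace": ("USA", ["USA", "UK", "Hong Kong"]),
    "ReliaCloud": ("USA", ["USA"]),
    "Softlayer": ("USA", ["USA", "Netherlands", "Singapore"]),
    "Terramark": ("USA", [
        "USA", "Canada", "Brazil", "Colombia", "Dominican Republic", "Belgium",
        "France", "Germany", "Ireland", "Italy", "Luxembourg", "Netherlands",
        "Spain", "Sweden", "Turkey", "UK", "China", "Japan", "Singapore", "Australia",
    ]),
    "Aruba Cloud": ("Italy", ["Italy"]),
    "CloudSigma": ("Switzerland", ["Switzerland", "USA"]),
    "Gandi": ("France", ["France", "USA"]),
    "GreenQloud": ("Iceland", ["Iceland"]),
    "Lunacloud": ("UK", ["France", "Germany", "Latvia", "Portugal"]),
    "Memset": ("UK", ["UK"]),
}

# Assumption: our own grouping of the location labels that count as "Europe".
EUROPE = {
    "Belgium", "EU (Unspecified)", "France", "Germany", "Iceland", "Ireland",
    "Italy", "Latvia", "Luxembourg", "Netherlands", "Portugal", "Spain",
    "Sweden", "Switzerland", "Turkey", "UK",
}

hq_usa = sum(hq == "USA" for hq, _ in PROVIDERS.values())
hq_europe = sum(hq in EUROPE for hq, _ in PROVIDERS.values())
dc_usa = sum("USA" in dcs for _, dcs in PROVIDERS.values())
dc_europe = sum(any(dc in EUROPE for dc in dcs) for _, dcs in PROVIDERS.values())

print(f"headquarters: USA={hq_usa}, Europe={hq_europe}")
print(f"data centres: USA={dc_usa}, Europe={dc_europe}")
```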

The uptime SLA column shows that all the providers promise at least 99.9% uptime. This indicates that for many applications there is no significant difference between providers in terms of uptime SLAs; for some types of applications, however, it is important to exceed 99.9% uptime and approach 100%. Moreover, the uptime SLA does not reflect the actual uptime, but rather a contract between the provider and its clients, and recent years have witnessed several severe outages at major providers [Jansen11]. The interested reader may use CloudSleuth's Global Provider View (see [25]) to understand the reliability and consistency of the most popular providers.
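To give a feeling for what the differences between these SLA levels mean in practice, the following sketch converts an uptime percentage into the maximum downtime it permits. It assumes, for simplicity, that the SLA is measured over a full non-leap year; actual SLAs define their own measurement window and exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def max_downtime_hours(uptime_percent: float) -> float:
    """Maximum yearly downtime (in hours) allowed by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


# SLA levels that appear in Table 1.
for sla in (99.9, 99.95, 99.99, 100.0):
    hours = max_downtime_hours(sla)
    print(f"{sla:6.2f}% uptime -> at most {hours:.2f} h/year of downtime")
```

Under this assumption, 99.9% uptime still allows roughly 8.76 hours of downtime per year, 99.95% about 4.38 hours, and 99.99% just under one hour.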

Public providers (see Section 2.2.2.1) have traditionally been offering a set of proprietary APIs for the provisioning, deployment, monitoring, and (partially) adaptation of cloud capabilities. Some minor providers have been implementing APIs which are compatible with the ones from leading providers, such as the Amazon AWS [26] APIs. This solution may increase the interoperability across some providers. However, it does not solve the vendor lock-in problem. As explained in Section 2.1.6, MODACloudML is an approach for the provisioning, deployment and adaptation of applications on multiple clouds; the vendor lock-in problem therefore remains an issue with respect to this multi-cloud concern.

2.2.2.2 Stacks

A first step towards solving this problem is provided by IaaS stacks such as OpenStack [27] and VMWare vCloud [28] for creating and managing infrastructure cloud services in private, public, and hybrid clouds. Table 2 shows a classification of these stacks based on the license, implementation languages, supported hypervisors, adopting providers, and main contributors of each stack.

Apache CloudStack [29] is free software that has been part of the Apache Incubator since 2012. It was originally developed by Citrix and is currently maintained by the Apache Software Foundation. CloudStack provides features such as resource management, user management, API, and user interface.

Eucalyptus [30] is a free software project initiated in 2008. It is developed and maintained by Eucalyptus Systems. Eucalyptus allows building Amazon AWS-compatible private and hybrid clouds.

OpenNebula [31] is a free software project initiated in 2008. It is sponsored by C12G, a cloud computing company associated with the Scientific Park of Madrid, and maintained by the OpenNebula Community. OpenNebula aims at developing the industry standard solution for creating and managing virtualised enterprise data centers and IaaS clouds.


OpenStack [27] is a free software project launched in 2010. It was originally developed by Rackspace and NASA and is currently maintained by the OpenStack Foundation with contributions from all the major players in cloud computing. OpenStack allows controlling pools of computing, storage, and networking resources throughout a datacentre. It provides an API and a dashboard that allow consumers to seamlessly provision resources.

Table 2 Stacks

| Stack | License | Implementation languages | Supported hypervisors | Adopted by | Main contributors |
|---|---|---|---|---|---|
| CloudStack | Apache License 2.0 | Java | KVM, Citrix Xen, VMWare vSphere | GreenQloud | Citrix, Apache Software Foundation |
| Eucalyptus | General Public License v3 | Java, C | KVM, Citrix Xen, VMWare vSphere | - | Eucalyptus Systems |
| OpenNebula | Apache License 2.0 | C++, C, Ruby, Java, Shell script, lex, yacc | KVM, Citrix Xen, Oracle VM, VMWare vSphere | - | OpenNebula Community |
| OpenStack | Apache License 2.0 | Python | KVM, Citrix Xen, VMWare vSphere | AT&T Cloud Architect, HP Cloud, IBM SmartCloud Enterprise, Nephoscale (storage), RackSpace, Softlayer (storage), Memset (storage) | RackSpace, NASA |
| VMWare vCloud | Commercial | - | VMWare vSphere | Bit Refinery, Hosting.com, ReliaCloud, Terremark | VMWare |

As depicted by the "Adopted by" column of Table 2, the cloud market seems to be consolidating at the IaaS level towards a few IaaS stacks. Of the 21 public providers listed in Table 1, seven adopt OpenStack (four fully, and three partially), four adopt VMWare vCloud, one adopts CloudStack, and the remaining nine adopt proprietary stacks (although one is compatible with the Amazon AWS APIs). This indicates that OpenStack has gained relatively wide acceptance across public providers, and that VMWare vCloud is also supported by several providers. This trend may increase the interoperability across providers adopting the same stack. However, it does not contribute significantly to addressing the challenge of supporting the development and management of multi-cloud systems.

2.2.2.3 Libraries

A second step towards supporting multi-cloud systems is provided by some IaaS/PaaS libraries such as jclouds [32], DeltaCloud [33], and Simple Cloud [34]. These libraries provide abstraction layers facilitating the provisioning and deployment of multi-cloud systems through a single interface. They support numerous IaaS providers as well as IaaS stacks (see Figure 2). These libraries are at the border between IaaS and PaaS levels since they allow, for instance, a developer to run scripts on the infrastructure or to deploy a load balancer that may rely on platform services.

Table 3 shows a classification of these libraries based on license, implementation languages, and supported providers/stacks of each library.

Table 3 Libraries

| Library | License | Implementation language | Supported providers/stacks |
|---|---|---|---|
| jclouds [32] | Apache License Version 2.0 | Java | http://www.jclouds.org/documentation/reference/supported-providers/ |
| libCloud [35] | Apache License 2.0 | Python | http://libcloud.apache.org/supported_providers.html |
| DeltaCloud [33] | Apache License 2.0 | Ruby | http://deltacloud.apache.org/supported-providers.html |
| fog [36] | MIT License | Ruby | http://fog.io/about/supported_services.html |
| Simplecloud [34] | BSD license | Php | - |

fog [36] is a Ruby API providing access to compute and storage facilities on multiple clouds. It helps developers in testing and simulating their deployment by providing an in-memory representation of cloud resources.

jclouds [32] is a Java and Clojure API delivering an abstraction layer over the APIs of IaaS providers and stacks. It helps developers describe generic virtual machines by means of templates. It also allows deploying multiple virtual machines and managing them as a group.

libCloud [35] is a Python API providing solutions for managing multiple clouds that are akin to the ones of jclouds.

Simple Cloud [34] is a PHP API delivering mechanisms for managing the life-cycle of a virtual machine on multiple clouds. It offers interfaces for data storage, document storage, and message queue services. It also provides mechanisms for monitoring a virtual machine (e.g., computing, memory, storage, and network usage).

Most of these libraries are language-dependent since they are designed to interface with a programming language such as Ruby, Java, or PHP. However, this is not the case for DeltaCloud [33], another API providing drivers for computing and storage facilities. It consists of a REST interface where clients send requests to a DeltaCloud server (on a local machine or on a public DeltaCloud instance) that wraps the drivers for the various cloud providers. Such an approach can, however, introduce a single point of failure.

IaaS libraries provide a common access to multiple clouds; however, they do not provide any mechanism for automatic provisioning and deployment of applications and services on the clouds. They do not rely on a classical Model-Driven Architecture (MDA) but provide, most of the time, a code-based model of the infrastructure. For instance, jclouds provides a POJO model of the infrastructure that includes concepts such as:

• NodeMetadata: description of a node with metadata such as imageId, CPU, RAM, security policy, etc.

• Template: an abstract representation of a node with parameters such as minCPU, OS type, etc.

• NodeInGroup: a set of nodes to be managed together

• Script: a set of commands to be executed on nodes

• Provider: information about the provider

Since jclouds is working at the IaaS level, applications and services are not modelled.
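To illustrate what a code-based model of the infrastructure can look like, the following minimal sketch mirrors the template, node description, and group concepts listed above. It is a hypothetical simplification written for this document: the class, record, and method names are assumptions and are not the actual jclouds classes or signatures.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified code-based infrastructure model; these are NOT the jclouds classes. */
public class InfrastructureModel {

    /** Abstract requirements for a node (compare the Template concept). */
    record NodeTemplate(int minCores, int minRamMb, String osFamily) {}

    /** Concrete description of a provisioned node (compare the NodeMetadata concept). */
    record NodeDescription(String id, String group, int cores, int ramMb, String osFamily) {
        boolean satisfies(NodeTemplate template) {
            return cores >= template.minCores()
                    && ramMb >= template.minRamMb()
                    && osFamily.equalsIgnoreCase(template.osFamily());
        }
    }

    /** Nodes of one group that satisfy the template; a group is meant to be managed as a unit. */
    static List<NodeDescription> matching(List<NodeDescription> nodes, String group,
                                          NodeTemplate template) {
        return nodes.stream()
                .filter(node -> node.group().equals(group))
                .filter(node -> node.satisfies(template))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<NodeDescription> nodes = List.of(
                new NodeDescription("n1", "web", 2, 4096, "ubuntu"),
                new NodeDescription("n2", "web", 1, 1024, "ubuntu"),
                new NodeDescription("n3", "db", 4, 8192, "ubuntu"));
        NodeTemplate template = new NodeTemplate(2, 2048, "ubuntu");
        matching(nodes, "web", template).forEach(node -> System.out.println(node.id()));
    }
}
```

Note that such a model only describes infrastructure elements; nothing in it captures the applications or services running on the nodes.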

2.2.2.4 Frameworks

The latest step towards supporting multi-cloud systems is provided by some specific PaaS frameworks. These frameworks aim at reducing the complexity of managing multi-cloud systems. They provide capabilities for the provisioning, deployment, monitoring, and adaptation of multi-cloud systems without being language-dependent. They partially reuse the IaaS and PaaS libraries (see Figure 2). As claimed in [37], two main types of PaaS can be distinguished. One type of PaaS, such as OpenShift [38], considers the underlying IaaS as a black box, i.e., it does not provide visibility and control over the underlying infrastructure. Another type of PaaS considers the same IaaS as a white box, i.e., it provides full visibility and control over the underlying infrastructure. Without visibility and control over the underlying infrastructure, developers cannot explicitly adapt the infrastructure to optimise performance, availability, and cost.

In this Section, we present PaaS frameworks that provide visibility of the IaaS level since this is an objective of the MODACloudML platform. Some of them rely on so-called “DevOps” tools such as Chef [39] and Puppet [40] that automate the deployment of applications and services, as well as the management of cloud capabilities. With visibility and control on both IaaS and PaaS levels, developers can exploit the peculiarities of cloud solutions at each level of the cloud stack.

These frameworks embed simple mechanisms to monitor the topology of the infrastructure and metrics about resource consumption (e.g., computing, memory, storage, and networking), in addition to feedback about the status of the application. They also offer cloud-specific adaptation mechanisms such as load balancing, auto scaling or automatic failure recovery.

Table 4 shows a classification of the latter type of frameworks based on license, implementation languages, interfaces, provisioning and deployment support, monitoring support, and adaptation support. More technical details about Cloudify and Cloud Foundry can be found in D6.1.

Table 4 Frameworks

| Tool | License | Implementation languages | Interface | Supported providers/stacks | Monitoring support | Adaptation capabilities |
|---|---|---|---|---|---|---|
| Cloudify [4] | Apache License 2.0 | Java, Groovy, JavaScript | CLI, web-based monitoring interface, REST API to Cloudify service | Amazon, OpenStack, Azure, HP cloud, RackSpace, your own local provider | Application and deployment status and logs; resources metrics | Auto-scaling based on metrics on resources and number of instances; automatic failure recovery |
| Scalr [5] | Apache License 2.0 | Python, PHP, JavaScript | REST API, web-based user interface | Amazon, OpenStack, RackSpace, Nimbula, Eucalyptus, IDC Frontier, CloudStack, Cloud Foundry | Application status and logs; load statistics; notification when anything happens to a farm | Auto-scaling of the infrastructure, including database, when overloaded (CPU, RAM, disk, network) or when scheduled thanks to the task manager; automatic failure recovery |
| Cloud Foundry [41] | Apache License 2.0 | Ruby, Java, JavaScript | REST API, CLI, Eclipse plug-in | Amazon, OpenStack, Rackspace, Piston, Eucalyptus, your own local provider | Application status; environment variables; application logs; resources metrics | Change the number of instances associated to an application; automatic failure recovery |


Cloudify [4] is an open-source project developed by GigaSpaces that focuses on the deployment and execution of applications on the cloud. It supports a large range of providers and offers basic scalability features. To deploy applications, Cloudify proposes a model inspired by Chef [39] involving the following concepts:

• Service recipe: describes general information about the service including its required infrastructure, how it should be used and the probes to monitor it.

• Service: a cluster of service instances that make up an application tier.

• Application recipe: describes the configuration (including provisioning and scaling rules) of an application and the services it is made of.

• Application: an application is a set of services working together and is described in an application recipe. The Cloudify manager deployed on a cloud allows cloud-operators to manage several applications in the same infrastructure.

• Probes: are used to monitor the status of the system; they can be built-in, scripted or plugin.

Figure 3 from [4] describes the anatomy of an application in Cloudify.

Figure 3 Anatomy of an application in Cloudify

Scalr [5] is also an open-source project with a specific focus on scalability with more advanced features in this area than the two other frameworks. It proposes a model that can be manipulated through a graphical user interface which involves the following concepts:

• Server farms: a set of components with specified roles to be deployed

• Components: types of elements to be deployed (e.g., databases, load balancers, application servers, etc.)

• Roles: describe the configuration of a component (e.g., scaling options, settings, parameters, load balancing options...)

• Auto-scaling rules: based on a metric, describe when to scale in or out

• Config templates: pre-defined configurations for a role

Cloud Foundry [41] is both a PaaS hosted by VMWare and an open-source project with a Micro version for local deployment. Concepts that can be manipulated through the Cloud Foundry API are:

• Resources: entities with metadata; they can be: Organization, User, Space, Application, Runtime, Framework, Service, ServicePlan, ServiceInstance, ServiceBinding, ServiceAuthToken.

• Associations: relations between entities

• Actions: to change the state of the system, e.g., start a resource with 5 instances and id 2

• Errors: HTTP response codes

These frameworks are important to optimise performance, availability, and cost of multi-cloud systems. However, they do not come with any structured approach, and the provided methods and tools are at a technical level. Thus, the developer will typically be left hacking at code level rather than engineering multi-cloud systems following a structured, tool-supported methodology.


2.2.2.5 EU projects

Several on-going European projects are providing stacks, libraries or frameworks for the provisioning, deployment, monitoring and adaptation of cloud-based systems at IaaS or PaaS levels. In this Section, we will present these projects with a focus on their ability to target multi-cloud systems and their use of model-driven techniques. More technical details about Cloud4SOA, mOSAIC and OPTIMIS can be found in D6.1.

Project Objective

REMICS [42] [43] The MODACloudML approach is based on the work done in REMICS [43], which provides modelling concepts enabling model-driven provisioning and deployment of cloud-based systems at the IaaS level. A domain-specific language called PIM4Cloud provides a first step for designers to model applications to be deployed on the cloud. The proposed approach is cloud-provider independent and inspired by component models. The language is implemented using Scala as a host language. Another mechanism focuses on the provisioning of computational resources at the IaaS level and manipulates concepts such as providers and nodes with properties such as CPU, RAM, etc. MODACloudML will extend it to the PaaS level.

4CaaSt [44] The 4CaaSt project delivers a solution for elastic and optimised hosting of Internet-scale multi-tier applications. This solution is based on Chef to monitor the execution and manage the life-cycle of applications and services [45].

ARTIST [46] ARTIST aims at providing MDE techniques for representing applications and services as well as cloud infrastructures and platforms. The expected outcomes of the project are a vendor- and platform-independent methodology and an automation-oriented toolset for re-engineering, migration, maintenance and evolution of cloud-based applications. Since ARTIST is also a project from call 8 of FP7-ICT only little information is available at this stage of the project.

CELAR (Cloud ELAsticity pRovisioning) [47] CELAR aims at delivering an automated and highly customisable system for elastic provisioning of resources in cloud computing platforms at the IaaS level. The expected outcomes of the project are a middleware for elastic provisioning that automatically manages and adapts cloud resources, an information system describing cloud resources and providing a search mechanism, and a scalable monitoring tool. Since CELAR is also a project from call 8 of FP7-ICT only little information is available at this stage of the project.

Cloud4SOA [48] The Cloud4SOA project supports cloud-based systems developers with multi-platform management, monitoring and migration by semantically interconnecting heterogeneous PaaS offerings. The deployment process can be done through the Cloud4SOA API exposed by Cloud4SOA PaaS platform adapters. The solution currently supports Cloud Foundry, OpenShift, and Amazon Elastic Beanstalk.

CloudScale [49] CloudScale aims at supporting scalable service engineering. The expected outcomes of the project are tools and methods that support the modelling of design alternatives and the analysis of their effect on scalability and cost, and that detect scalability problems by analysing code. The ScaleDL language will serve as a basis for these tools. Since CloudScale is also a project from call 8 of FP7-ICT only little information is available at this stage of the project.


both IaaS and PaaS levels to allow providers to integrate resources from other clouds. It also allows applications to seamlessly switch cloud providers. The solution requires an agreement on the adoption of a common technology stack among cloud providers.

mOSAIC [51] mOSAIC tackles the vendor lock-in problem by providing an open-source platform including an API for provisioning and deployment of applications on multiple clouds. The API allows developing cloud applications with abstraction of IaaS services that enables the migration of these applications from one cloud to another.

OPTIMIS [52] The OPTIMIS toolkit supports provisioning on multi-cloud and federated cloud infrastructures and optimizing the use of resources. The toolkit provides tools for IaaS providers and service providers and developers. The Service Deployer is responsible for the deployment of services while the Service Manager is responsible for the operation of the services by keeping track of all runtime data.

Reservoir [53] Reservoir has defined an architecture for future IaaS clouds. It provides solutions for the provisioning and scalability of resources on demand. An expected outcome of the project is to enable providers of cloud infrastructure to dynamically partner with each other. The description of how to manage an application on a cloud infrastructure is given in a Service Definition Manifest, which is a contract between the service and the infrastructure [54]. The abstract syntax of this language is defined using the Essential Meta-Object Facility (EMOF) in order to be independent of any specific implementation platform. Some constraints on the behaviour of the underlying infrastructure can be expressed in OCL. This abstract syntax is used to define the syntax of languages such as the application description language or the elasticity rules.

PaaSage [55] The main goal of PaaSage is to deliver an open and integrated platform to support both design and operation of cloud-based systems, together with an accompanying methodology that allows model-driven provisioning, deployment, and adaptation of these systems independently of the underlying cloud infrastructures. MODAClouds and PaaSage are collaborating on the research and development of what will be the core elements of MODACloudML (which are referred to in PaaSage as CloudML).

2.2.2.6 Discussion

The stacks, libraries and frameworks presented in this Section provide mechanisms to automate the provisioning and deployment of applications on multiple clouds. However, as explained in [6], there is a “... need for developers to be able to design their software systems for multiple Clouds and for operators to be able to deploy and re-deploy these systems on various Clouds depending on the convenience. The current Cloud literature, however, does not seem to pose attention to this issue as it is focused on considering the perspective of the Cloud providers, by offering mechanisms for auto scaling of Clouds and for interoperability and federation between Clouds.”

MDE is a well-known approach to tame the complexity of designing complex systems. Models enable developers to work at a high level of abstraction by focusing on cloud concerns rather than implementation details. Model transformations relieve developers of repetitive and error-prone tasks such as coding. The model-driven approach, commonly summarised as “model once, generate anywhere”, is particularly relevant when it comes to provisioning and deployment of applications and services across multiple clouds, as well as migrating them from one cloud to another. Even if none of the solutions presented in this state of the art fully relies on a model-driven approach at both IaaS and PaaS levels, some of the concepts to be modelled within MODACloudML can be expressed by these solutions, and they will be a source of inspiration during the design of the modelling language.

The frameworks presented in this state of the art also offer some cloud-specific mechanisms for adaptation and self-adaptation [56] such as load balancing, auto scaling or failure recovery. These adaptations are triggered when some of the constraints specified at design-time are not fulfilled any more. These constraints are related either to computing resources (e.g., the CPU usage should be below 75%) or to desired topologies (e.g., the service should be deployed and running on at least two virtual machines). Self-adaptive systems are generally based on a control loop like the well-known Monitor–Analyse–Plan–Execute from autonomic computing [56]. Inputs of the reasoning systems (Analyse and Plan) are observables describing the running system and its context. Outputs are a set of planned adaptation actions. However, the adaptation of multi-cloud systems is becoming ever more complex.
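The control loop described above can be sketched as follows. This is a minimal, hypothetical illustration rather than the mechanism of any of the surveyed frameworks: the class, the `Observation` record, the single `ADD_VM` action and the naive planning policy are assumptions made for the example; only the two constraints (CPU usage below 75%, at least two virtual machines) come from the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of one Monitor-Analyse-Plan-Execute iteration; not taken from any framework. */
public class MapeLoop {

    /** Observables produced by the Monitor step. */
    record Observation(double cpuUsagePercent, int runningVms) {}

    enum Action { ADD_VM, NONE }

    static final double MAX_CPU_PERCENT = 75.0; // design-time constraint on resources
    static final int MIN_VMS = 2;               // design-time constraint on topology

    /** Analyse: list the design-time constraints that the observation violates. */
    static List<String> analyse(Observation observation) {
        List<String> violations = new ArrayList<>();
        if (observation.cpuUsagePercent() >= MAX_CPU_PERCENT) {
            violations.add("cpu");
        }
        if (observation.runningVms() < MIN_VMS) {
            violations.add("topology");
        }
        return violations;
    }

    /** Plan: map violations to an adaptation action (here, a single naive policy). */
    static Action plan(List<String> violations) {
        return violations.isEmpty() ? Action.NONE : Action.ADD_VM;
    }

    /** Execute: enact the action on a simulated system and return the new VM count. */
    static int execute(Action action, int runningVms) {
        return action == Action.ADD_VM ? runningVms + 1 : runningVms;
    }

    public static void main(String[] args) {
        Observation monitored = new Observation(82.0, 2); // Monitor (hard-coded sample)
        Action action = plan(analyse(monitored));
        System.out.println(action + " -> " + execute(action, monitored.runningVms()) + " VMs");
    }
}
```

In a multi-cloud setting the Analyse and Plan steps would have to reason over several providers and many more observables, which is where the complexity mentioned above comes from.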

Models can also help in taming such complexity. The models at runtime [57] [58] paradigm proposes to leverage models during the execution of adaptive software systems to monitor and control the way they adapt. This way, adaptation mechanisms can benefit from MDE at runtime. The models@runtime layer can be applied as a pattern for the design of the Monitoring and Execution mechanisms of the loop (the latter being the enactment of the adaptation, either compositional or parametric [59]).

Models@runtime provide an abstract representation of the running system causally connected to the underlying state of the system which facilitates reasoning, simulation and enactment of adaptation actions. A change in the running system is automatically reflected in a model of the current system. Any modification applied to this model can be enacted on the running system on demand. A classical architecture to achieve this is depicted in Figure 4 from [60]. The current model of the running system can be used by a reasoning system that will produce the target model of the system. Before adapting the system, some validation process can be done on the target model (step 1). If passed, the difference between the target model and the current model of the system is computed (step 2). Then, the adaptation engine enacts the adaptation only on parts of the system which are included in this difference (step 3). Finally, the model of the current system is updated again (step 4).

Figure 4 Overview of the models@runtime approach
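The four steps of this cycle (validation, difference computation, enactment, model update) can be sketched as follows. This is a minimal illustration under strong assumptions: the model is reduced to a map from service name to number of instances, the validation rule is invented for the example, and enactment is only simulated; none of the names come from [60].

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of the validate / diff / enact / update cycle on a toy runtime model. */
public class ModelsAtRuntime {

    /** Step 1: validate the target model (assumed rule: no negative instance counts). */
    static boolean isValid(Map<String, Integer> target) {
        return target.values().stream().allMatch(count -> count >= 0);
    }

    /** Step 2: per-service change in instance count; unchanged services are omitted. */
    static Map<String, Integer> diff(Map<String, Integer> current, Map<String, Integer> target) {
        Set<String> services = new TreeSet<>(current.keySet());
        services.addAll(target.keySet());
        Map<String, Integer> delta = new TreeMap<>();
        for (String service : services) {
            int change = target.getOrDefault(service, 0) - current.getOrDefault(service, 0);
            if (change != 0) {
                delta.put(service, change);
            }
        }
        return delta;
    }

    /** Steps 3 and 4: enact only the differences, then return the updated current model. */
    static Map<String, Integer> adapt(Map<String, Integer> current, Map<String, Integer> target) {
        if (!isValid(target)) {
            return current; // rejected target: the running system is left untouched
        }
        Map<String, Integer> updated = new TreeMap<>(current);
        for (Map.Entry<String, Integer> change : diff(current, target).entrySet()) {
            // A real adaptation engine would start or stop instances here.
            updated.merge(change.getKey(), change.getValue(), Integer::sum);
            updated.remove(change.getKey(), 0); // drop services that no longer run
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = Map.of("web", 2, "db", 1);
        Map<String, Integer> target = Map.of("web", 3, "cache", 1);
        System.out.println("difference: " + diff(current, target));
        System.out.println("updated model: " + adapt(current, target));
    }
}
```

Because only the entries of the difference are enacted, services whose description is identical in both models are never touched by the adaptation.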

The models@runtime approach enables the continuous evolution of the system with no strict boundaries between design-time and runtime activities. Thanks to the use of models, it provides a well-defined interface to monitor and adapt the system. It also provides a way to measure the importance of changes in the system and to analyse the delay before their enactment on the running system. In general, the stacks, libraries and frameworks we have presented do not provide such an abstraction. These concerns will also be considered during the design of MODACloudML: a runtime model of the running system will be provided at the CPSM level. It will describe the actual topology and deployment of the system and provide cloud-specific information. On the one hand, any modification of the specification of the system at the CIM or CPIM level will be reflected at the CPSM level and then automatically on the running system. On the other hand, any change in the running system can be checked against the CPIM.


When designing and operating applications and services in the cloud, design decisions are not only related to computational entities but also to data representation and persistence. The next Section proposes a state of the art in mechanisms for data persistence.

2.2.3 Data persistence

This Section presents the state of the art in mechanisms for data persistence. This study can inform design decisions about the choice of such mechanisms on the basis of their properties.

2.2.3.1 Object-oriented mechanisms for data persistence

It is undisputed that the object-oriented (OO) paradigm is among the most widespread approaches to producing code. It encourages modularization and code reuse, and it is well suited to developing large and complex software systems. Still, complex systems need complex data models and complete storage solutions. To address the increasing need to manage data easily in OO languages, Object/Relational Mapping (ORM) has been proposed as a “good practice” for designing data and making them persistent.

The persistence concept refers to the characteristic of state that outlives the process that created it. It is obtained by storing data in non-volatile storage such as hard drives or databases [61].

Although persistence does not imply any particular kind of storage (file, database), it is quite common to use relational database management systems (RDBMSs) as non-volatile storage. As an alternative, object databases (ODBMSs) are often used since their data model is quite compatible with the OO data model.

In the ORM style, the developer declares data objects using the OO model and then enriches this model by marking the objects, or parts of them, that should be made persistent. Meta-models (which frequently consist of annotations mixed with OO code) are used to enrich the initial OO data model, as shown in the code below.

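    import javax.jdo.annotations.PersistenceCapable;
    import javax.jdo.annotations.Persistent;
    import javax.jdo.annotations.PrimaryKey;

    @PersistenceCapable            // marks the class as persistable
    public class ContactInfo {
        @PrimaryKey                // identity of the stored object
        private Key key;           // key type as in the original excerpt

        @Persistent                // field stored by the persistence layer
        private String streetAddress;
        …
    }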

Code 1 OO annotated class

The ORM framework is responsible for mapping and storing objects in the persistence layer. Furthermore, the framework provides an API to load and manipulate data.
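The mapping role of such a framework can be illustrated with a toy example based on reflection. The sketch below is not JDO, JPA or Hibernate: the `Stored` annotation, the `ToyOrm` class and the `toRow` method are hypothetical names introduced for illustration, and the "row" is simply a map from column name to value.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration of annotation-driven mapping; the annotation and mapper are hypothetical. */
public class ToyOrm {

    /** Marks a field that the mapper should persist (stand-in for a real ORM annotation). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Stored {}

    static class ContactInfo {
        @Stored private final String streetAddress;
        @Stored private final String city;
        private final String cachedLabel; // not annotated, so it is not persisted

        ContactInfo(String streetAddress, String city) {
            this.streetAddress = streetAddress;
            this.city = city;
            this.cachedLabel = streetAddress + ", " + city;
        }
    }

    /** Maps the annotated fields of an object to column/value pairs, as done before storing a row. */
    static Map<String, Object> toRow(Object entity) {
        Map<String, Object> row = new TreeMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Stored.class)) {
                field.setAccessible(true);
                try {
                    row.put(field.getName(), field.get(entity));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("cannot read field " + field.getName(), e);
                }
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow(new ContactInfo("1 Main Street", "Oslo")));
    }
}
```

A real ORM framework additionally generates the storage statements, manages identity and transactions, and performs the reverse mapping when objects are loaded.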

Figure 5 ORM layer depicts the concepts explained before.

There are some benefits in the usage of ORM for development of applications:

• The productivity improves because the framework is responsible for automatically generating the code for data management.

• The user accesses data by using specific query languages provided by ORM.

• ORM forces the developers to strongly decouple the data model domain from the business logic domain.

• ORM increases the amount of reusable code and enhances the application maintainability.


Several ORM-based tools have been released in recent years, but only a few of them are widely adopted.

Hibernate [61], for instance, is a de facto standard ORM for Java developers. It is a very mature ORM solution that is compliant with the Java Persistence API (JPA) [62] and Java Data Objects (JDO) [62]. Furthermore, it offers a proprietary SQL-like query language (HQL) to manage data objects [63].

JPA and JDO are two standard persistence technologies for Java. Based on the differences between them, JPA can be regarded as a subset of JDO. The reader will find complete documentation for both in [62].

The available RDBMSs and the OO programming paradigm have proved unsuitable when large quantities of data have to be handled. In this case, scalability and the ability to distribute data and to parallelize computations become critical issues. NoSQL databases and the Map-Reduce (MR) paradigm are the emerging solutions to deal with these issues. The next section focuses on the MR paradigm, which can be combined with NoSQL databases; NoSQL is then discussed in Section 2.2.3.3.

2.2.3.2 Map Reduce

MR is not new in the context of distributed computing, but it has been rediscovered and brought back to the scene thanks to Google. In [64], Google presents the Map Reduce idea and its own implementation, which is assumed to run on a large cluster of machines. In [64], we read “a typical MapReduce computation processes many terabytes of data on thousands of machines” and “Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day”.
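The programming model itself can be illustrated with the classic word-count example. The sketch below runs in a single process using Java streams and only mimics the map, shuffle, and reduce phases; it is neither Google's implementation nor the Hadoop API, and all names are chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Single-process illustration of the map, shuffle, and reduce phases of a word count. */
public class WordCountMapReduce {

    /** Map phase: emit a (word, 1) pair for every word of one input record. */
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1));
    }

    /** Shuffle groups the pairs by key; reduce sums the values of each group. */
    static Map<String, Integer> mapReduce(List<String> lines) {
        return lines.stream() // on a cluster, the records would be split across machines
                .flatMap(WordCountMapReduce::map)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(mapReduce(List.of("the cat", "The dog and the cat")));
    }
}
```

The point of the paradigm is that the map calls are independent of each other and the reduction is applied per key, so both phases can be distributed over many machines without changing the user-written functions.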

Google had the merit to demonstrate that a “simple” infrastructure composed of several clusters combined with a MR paradigm is the right way of looking at big data problems.

Influenced by the Google experience, an Apache project started with the idea to realise an open-source product similar to the Google one. As a result, Hadoop [65] has been released. It is a complete framework for distributed storage and processing of large data sets across clusters.

It is mainly composed of:

• Hadoop Distributed File System: a distributed file system

• Hadoop YARN: job scheduler and cluster resource manager

• Hadoop MapReduce: the Hadoop MR environment

as well as several other projects that are fully compliant with the basic architecture.

Nowadays, Hadoop has become the open-source solution used by private users as well as by big ICT companies. For instance, Amazon offers VMs with a complete Hadoop solution, Cloudera offers a Cloud solution based on Hadoop, and Microsoft discontinued Dryad [66], its research project for writing parallel and distributed programs, in order to support Hadoop. To conclude, Hadoop is a very complete open-source alternative for conducting experiments with scalable systems and the parallel programming paradigm.

2.2.3.3 NoSQL

For a long period, since 1970, Relational Database Management Systems (RDBMSs) have been largely adopted as storage solutions [67]. SQL was the main reason for the success of RDBMSs: it provides a comprehensive, ad hoc query language to manipulate data.

More recently, non-relational (or not-only-relational) databases, often termed NoSQL, have emerged [68], especially in the context of widely distributed systems, and have generated both interest and criticism.

NoSQL databases are not new: the term NoSQL was first used in 1998 for a relational database that omitted the use of SQL (No SQL) [69].

There are two schools of thought concerning the NoSQL meaning. The first one believes the term refers to relational databases without SQL support (No SQL), while the second one refers to non-relational databases (Not Only SQL). In this document we will refer to the term NoSQL with the meaning of distributed non-relational databases.

One of the major problems often mentioned is the heterogeneity of the languages and interfaces that NoSQL databases offer to developers and users. Different platforms and languages have been proposed, and applications developed for one system require significant effort to be migrated to another one. Furthermore, some crucial properties (such as transactionality) are missing in the typical NoSQL approaches.

From a theoretical point of view, the need for a uniform classification and principle generalization for NoSQL databases is widely recognized and was described by Cattell in [70], which reports a detailed characterization of non-relational systems. Stonebraker [71] highlights the absence of a consolidated standard for NoSQL models and the absence of a formal query language for those models. Kossmann and Kraska [72] analyze the offerings of the main PaaS storage providers (Amazon, Google, and Microsoft) and examine their common features and differences from a data perspective. Leymann et al. provide a taxonomy for Cloud Data Hosting Solutions in their survey [73]. On the transactionality aspect, some early studies have been done within the CumuloNimbo project [74] [75], which proposes some initial statements and visions on the topic. The most interesting features of NoSQL DBs, in our opinion, are their ability to scale based on the workload, their schema-less nature, and the fact that they do not guarantee ACID (Atomicity, Consistency, Isolation, Durability) properties (ACID properties are not always a system requirement).

[76] offers an even longer list of interesting characteristics:

• Avoidance of unneeded complexity.

• High throughput.

• Horizontal scalability and running on commodity hardware.

• Avoidance of expensive Object-Relational mapping: The NoSQLs are often designed to store complex data structures in a way similar to the Object-Oriented programming language than the relational databases.

• Complexity and cost of setting up database clusters: The NoSQLs are simple systems that are specifically designed to run in distributed environment and generally they do not require the administrator role.

• Possibility to establish a trade-off between reliability and performance: this is the case in which to share data is more important than persist data, for instance, because of performance. to clarify, we can think about the situation in which several web processes need to share the HTTP user session. In this case it may be more convenient do not store the session to the detriment of reliability.

• The current “One size fit’s it all” Databases Thinking Was and Is Wrong: in the past the trend for managing data was to adopt RDMS as unique solution for every domains problem. Nowadays, different NoSQLs are designed for different domain problems.

• The myth of effortless distribution and partitioning of centralized data models: [77] discusses the disadvantages of developing a data model for a centralized setting. Shalom suggests designing data models to fit a partitioned environment even if there is initially only one centralized database server. This approach avoids late and expensive changes to application code.

• Movements in programming languages and development frameworks: because NoSQL databases are not required to be general-purpose data stores, NoSQL offerings tend to focus on a specific technology, often tied to a specific programming language.

• Requirements of cloud computing: The paper refers to scalability and low administration overhead.

• The RDBMS plus caching-layer pattern/workaround vs. systems built from scratch with scalability in mind: This point is explained in [78]. In this blog post, Hoff reports the architecture of real systems that need scalability: “Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together.” […] “With a little perspective, it's clear the MySQL+memcached era is passing.”

• Yesterday’s vs. Today’s Needs: In the 1960s and 1970s, databases were designed for single, large, high-end machines. In contrast, many large companies now adopt hardware that will predictably fail. Consequently, applications are designed to adapt dynamically to failures.
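The “RDBMS plus caching-layer” workaround quoted from [78] in the list above can be made more concrete with a small sketch. The code below is only an illustration under stated assumptions: plain Python dictionaries stand in for the MySQL shards and for memcached, and the class name `ShardedStoreWithCache` and its methods are hypothetical, not taken from any of the cited systems. It shows the two pieces of “glue code” the quote alludes to: routing each key to a shard, and a cache-aside read path with invalidation on write.

```python
import hashlib


class ShardedStoreWithCache:
    """Toy stand-in for 'sharded RDBMS + cache + glue code' (hypothetical)."""

    def __init__(self, shard_count):
        # Each dict plays the role of one database shard (assumption).
        self.shards = [{} for _ in range(shard_count)]
        # A single dict plays the role of the caching layer (assumption).
        self.cache = {}
        self.cache_hits = 0

    def _shard_for(self, key):
        # Stable hash, so the same key is always routed to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def write(self, key, value):
        # Write to the owning shard, then invalidate any cached copy.
        self._shard_for(key)[key] = value
        self.cache.pop(key, None)

    def read(self, key):
        # Cache-aside read: try the cache first, fall back to the shard.
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        value = self._shard_for(key).get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = ShardedStoreWithCache(shard_count=4)
store.write("user:42", {"name": "Ada"})
first = store.read("user:42")   # served by the shard, then cached
second = store.read("user:42")  # served by the cache
```

Even in this reduced form, the application has to own shard routing and cache invalidation itself, which is the burden that systems built from scratch with scalability in mind aim to remove.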

2.2.3.3.1 Taxonomy

It is now clear that NoSQL databases are ad hoc solutions, in the sense that each has been designed with a particular problem space in mind, in order to achieve the best performance in its context of use. For this reason they differ deeply from one another, and categorizing them is not simple and often involves exceptions. Nevertheless, taxonomies based on the data model have been provided by Yen in [79] and by Cattell in [70]. Table 5 (NoSQL taxonomy) compares these two taxonomies. We extend them with a new category, the Graph-based category.

| Cattell Taxonomy | Yen Taxonomy | Our Taxonomy |
|---|---|---|
| Key-Value Store | Key-Value-Cache; Key-Value-Store; Eventually-Consistent K-V-S; Ordered-Key-Value-Store; Data-Structures Server | Key-Value |
| Document Stores | Document Store; Object Store | Document-based |
| Extensible Record Stores | Wide Columnar Store; Tuple-store | Column-oriented |
| | | Graph-based |
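To make the four categories of our taxonomy more tangible, the following sketch represents the same small piece of information in each data model. It is an illustration only: plain Python structures are used instead of any real NoSQL product, and all identifiers (`user:42`, `FOLLOWS`, the column family names, `followed_by`) are hypothetical.

```python
import json

# Key-Value: the store sees an opaque value addressed by a single key.
key_value = {"user:42": json.dumps({"name": "Ada", "city": "Oslo"})}

# Document-based: the value is a nested structure the store can inspect.
documents = {
    "user:42": {
        "name": "Ada",
        "address": {"city": "Oslo"},
        "follows": ["user:7"],
    }
}

# Column-oriented: each row holds a sparse set of columns grouped in families.
columns = {
    "user:42": {
        "profile": {"name": "Ada"},
        "location": {"city": "Oslo"},
    }
}

# Graph-based: nodes plus explicit, typed edges between them.
nodes = {"user:42": {"name": "Ada"}, "user:7": {"name": "Bob"}}
edges = [("user:42", "FOLLOWS", "user:7")]


def followed_by(user_id):
    """Return the names of the users that user_id follows (graph traversal)."""
    return [
        nodes[target]["name"]
        for source, label, target in edges
        if source == user_id and label == "FOLLOWS"
    ]
```

The difference lies in what the store can see: a key-value store can only return the whole value for a key, a document or column-oriented store can address parts of it, and a graph store treats the relationship itself as first-class data, which is why the Graph-based category is kept separate.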