An Architectural Model for Deploying Critical Infrastructure Services in the Cloud



Marcus Schöller

NEC Europe Ltd. Heidelberg, Germany Email: marcus.schoeller@neclab.eu

Roland Bless, Frank Pallas, Jens Horneber

Karlsruhe Institute of Technology Karlsruhe, Germany

Email: {bless, frank.pallas, horneber}@kit.edu

Paul Smith

Austrian Institute of Technology Vienna, Austria

Email: Paul.Smith@ait.ac.at

Abstract: The Cloud Computing operational model is a major recent trend in the IT industry, which has gained tremendous momentum. This trend will likely also reach the IT services that support Critical Infrastructures (CI), because of the potential cost savings and benefits of increased resilience due to elastic cloud behaviour. However, realizing CI services in the cloud introduces security and resilience requirements that existing offerings do not address well. For example, due to the opacity of cloud environments, the risks of deploying cloud-based CI services are difficult to assess, especially at the technical level, but also from legal or business perspectives. This paper discusses challenges and objectives related to bringing CI services into cloud environments, and presents an architectural model as a basis for the development of technical solutions with respect to those challenges.

I. INTRODUCTION

Cloud Computing is an operational model that provides on-demand elastic computing or storage resources in a cost-effective manner, often using virtualization as an enabling base technology. As one major recent trend in the IT industry, it has gained enormous momentum and started to revolutionise the way enterprises create and deliver IT solutions.

As more sectors utilise cloud-based services in their computing environment, Critical Infrastructure (CI) services are likely to adopt this paradigm. A first candidate is the telecommunications sector (ETSI ISG NFV was formed earlier this year with the purpose of developing pre-standards for moving telecommunications functions into the cloud), and other CI service operators from the traffic and transportation or the infrastructure surveillance systems domains are expected to follow soon. The promised advantages relate not only to cost reductions and increased flexibility, but also to new ways of improving the resilience and availability of the CI, e.g., through the use of abundant virtual resources.

However, realising CI services in the cloud implies security and resilience requirements that existing cloud offerings do not address well. Due to the opacity of cloud environments, the risks of deploying cloud-based CI services are difficult to assess, especially at the technical level, but also from legal or business perspectives. Existing security measures do not fully address related important issues (e.g., risk, trust, and resilience), resulting in uncertainty for operators and manufacturers of critical infrastructure IT systems. Therefore, our goal is to analyse and evaluate cloud computing technologies with respect to security risks in sensitive environments, and to develop methodologies, technologies, and best practices for creating a secure, trustworthy, and high assurance cloud computing environment for critical infrastructures.

© 2013 IEEE, published at IEEE CloudCom 2013

A. Objective and Challenges

In order to address the security and resilience [1] challenges associated with deploying high assurance services in the cloud for the critical infrastructure sector, one can identify a number of objectives. These can be summarised as follows:

• For the use of cloud services in the critical infrastructure sector, one needs to identify the relevant legal framework and the resulting obligations and constraints. Without thorough consideration of the legal givens, use of the cloud in this sector, in which there are often stringent regulatory and legal requirements, will be severely limited. Furthermore, it is important to have clear guidance on how to address liability issues in light of service failures.

• Related to this, and building on the aforementioned legal framework, we need to develop technical solutions for the provision of evidence and data protection for cloud services. This includes investigating solutions for digital forensics in cloud infrastructures.

• Since attacks and failures are inevitable, and attacks are possibly even more attractive in cloud environments, it is important to develop approaches to understanding cloud behaviour in the face of challenges and attacks. Attacking a cloud infrastructure is a potentially highly attractive goal, since the services of all tenants who run them in the cloud are affected. Clouds become an even more attractive target if attackers are aware that CI services are running inside the cloud environment. In this area, we will investigate where the appropriate points in a cloud architecture are for placing monitoring and detection functionality, and examine how robust state of the art detection algorithms are with respect to elastic cloud behaviour.

• A key challenge that potential critical infrastructure cloud service users face is to understand the risks that cloud use brings. Despite some work in this area, e.g., on the development of threat catalogues, there is a lack of suitable techniques and processes for understanding and managing risk associated with cloud environments.

• Building on all these activities, an objective is to establish a set of best practice guidelines for secure cloud service implementations, which can be used by various stakeholders in this area to ensure secure and resilient cloud services that make it possible to realise CI services in cloud environments.


Based on these higher level objectives, we can infer a number of objectives on a technical level, leading to the development of solutions for the following items:

• Cloud architectures that focus on security, trust, and resilience, that are suitable to deploy and run CI services.

• Process-oriented security guidelines, policies, and policy languages that allow the security requirements needed for CI services in clouds to be expressed.

• Anomaly-based techniques to discover deviations from expected system and network behaviour, in order to disclose attacks and to predict their impact.

• Context-aware policy enforcement technologies.

• Tools and technologies to increase the trustworthiness of cloud environments, e.g., monitoring and auditing tools that permit service audits in a lawful and privacy compliant manner. Creation of additional and corresponding cooperative interfaces to the cloud infrastructure should increase the transparency to allow for root-cause analysis, while not disclosing too many operational details.

In order to achieve these higher level objectives as well as the technical requirements, a first step is to create an architectural model that permits the development of the listed technical solutions that support running CI services fully or partially in cloud infrastructures.

B. A Use Case

To illustrate an application of our architectural model, we describe a scenario in which services and data from a video surveillance system are provisioned in the cloud. In this scenario, a security company (called CITYSEC) has outsourced the operation of a video surveillance system, which is used to monitor a critical infrastructure, to a third-party company (TENSYS). Appreciating the potential benefits of using the cloud, TENSYS has chosen to deploy aspects of the ICT services associated with the video surveillance system in the cloud. For instance, the TENSYS system makes use of automated detection services that generate alarms to inform security personnel when unusual movement is detected. Furthermore, TENSYS has an obligation to archive video footage, for which it chooses to use cloud services. To realise this cloud-based deployment, TENSYS purchases connectivity from the critical infrastructure site to the cloud from a telecommunications company (TELCOM) and cloud compute and storage services from a cloud provider (CLOUDCORP).

In this scenario, a number of undesirable outcomes could occur, which motivate the need for a specialised cloud architectural model for critical infrastructure services. A primary purpose of the video surveillance system is to alert security personnel to malicious activity, e.g., vandalism of the critical infrastructure. This is possible using the previously mentioned automated detection service. However, if an incident is not attended to by CITYSEC personnel, and damages occur, a question of liability arises: specifically, whether the failure occurred because of the use of cloud technologies. Similarly, privacy issues are prominent in this scenario; for example, video footage of public servant indiscretions could be leaked to the media (e.g., by unauthorised access or indiscretion of an employee). Again, the use of cloud technology could be brought into question, and suitable technical measures need to be available to address liability issues amongst the various stakeholders in the scenario. To support use cases such as this, our architectural model enables the fine-grained specification of privacy, security and resilience requirements, which are upheld by the cloud infrastructure. Furthermore, the model supports the monitoring of the cloud infrastructure to determine when problems arise and, if necessary, to provably indicate the root cause of a problem that led to a failure of the kinds previously discussed.

C. Legal Requirements and Challenges

From the legal perspective, even this comparably simple use case raises several as yet unsolved questions, which have to be addressed properly in order to render the use of Cloud Computing possible in the field of critical infrastructure IT. In particular, this refers to the two legal fields of data protection law and liability/evidence law.

Our use case, like many other possible applications of Cloud Computing in the field of critical infrastructures, involves the collection, processing and use of personal data. Consequently, any technical architecture that is to serve as the basis for a cloud-based critical infrastructure IT must provide the means necessary for reaching compliance with relevant data protection regulations. For instance, regulations are defined in Art. 2 of the data protection directive 95/46/EC [2] or in Art. 4 of the currently discussed draft for a forthcoming general data protection regulation [3].

This is definitely the case for legally implied security requirements. European data protection law strongly focuses on the role of the so-called “controller” of personal data, which is defined as “the natural or legal person, public authority, agency or any other body which alone or jointly with others determines the purposes and means of the processing of personal data” in [2]. In particular, the controller is required to implement “appropriate technical and organisational measures” [2] to prevent unauthorised access and disclosure and unlawful or accidental destruction, loss or alteration of personal data. In this regard, the appropriateness of a certain measure depends on the costs that would arise from its implementation and the kind or “sensitivity” of the data. As the “sensitivity” and the amount of personal data being collected, processed and used in critical infrastructure scenarios are in many cases substantial, this often implies that even sophisticated and expensive mechanisms become “appropriate” for various use cases of cloud-based infrastructure IT. This will have to be provided for in the underlying infrastructure.

Transparency is another core concept of European data protection law. It prescribes that any data subject has the right to inform herself about the precise circumstances under which her personal data are processed and stored. The controller, in turn, has the obligation to actually provide the data subject with this information. Under the specific givens of cloud computing – with dynamic (re-)allocation of resources being explicitly intended, for example – the fulfilment of this obligation becomes significantly harder than in traditional environments.


A cloud infrastructure that provides mechanisms for reliably tracing the current location(s) of data, and for gathering further dimensions regarding the precise circumstances of the processing and storage of data would be highly helpful in this regard. This is because it would allow the data controller to report the respective information to the data subject, and thereby fulfil their legal transparency obligations.

Furthermore, similar mechanisms and technologies would also prove highly valuable with regard to the monitoring and control of data transfers. It is, for example, illegal to transfer personal data from Europe to non-European countries that do not ensure an “adequate level of protection” [2]. The location of the physical cloud infrastructure on which personal data are processed and stored is thus of high relevance for the legitimacy of transferring personal data to the respective systems. Again, reliable tracing mechanisms would help the controller to ensure (and, just as importantly, to demonstrate) that such transfer restrictions are actually adhered to.

Last but not least, legal data protection requirements strongly depend on the different legal roles assumed by the different actors within a given setting. Generally speaking, legal data protection requirements have to be fulfilled by the party that is deemed the controller of the personal data (see above). With regard to cloud computing, it is usually assumed that the cloud service user (in our example, the security company CITYSEC) assumes this role, while the cloud provider (CLOUDCORP) is usually seen as merely “processing data on behalf of the controller”, thereby basically leaving the responsibility for ensuring compliance with data protection requirements with the cloud service user. In order to fulfil this duty, the cloud service user must be able to ensure and validate that the cloud provider actually behaves in conformance with the instructions given by him. As the cloud service user can hardly fulfil this obligation by inspecting the cloud provider’s data centre(s) locally, this requires powerful technical mechanisms and instruments for remote-auditing (including trusted audit trails) to be provided by the underlying architecture.

However, such mechanisms are not only required for reasons of data protection law, but also from the perspective of liability and evidence law. For example, if the cloud service user suffers damage resulting from a malfunctioning cloud service and wants the respective cloud provider to compensate him, he would have to prove that the failure actually resulted from the cloud provider not having fulfilled his duties. This, in turn, is not possible in current cloud settings, as the cloud service user typically has no access to any evidence which could irrefutably prove actual misconduct on the side of the cloud provider. In the end, this would result in situations where cloud service users have to bear the costs resulting from another party’s misconduct. Again, the above-mentioned technologies for remote auditing and for establishing trustworthy audit trails could provide means for addressing this as yet open challenge. An architecture that provides such trustworthy electronic evidence would prove highly valuable, not only with regard to critical infrastructure IT, but for nearly any advanced use of cloud computing in general.
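One common way to make an audit trail tamper-evident is to chain its entries by hash, so that altering or reordering an earlier entry invalidates everything after it. The following is a minimal sketch of that general technique and not a mechanism specified in this paper; the function names, the entry layout, and the example events are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with the event content."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(trail, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    trail.append(entry)
    return entry


def verify_trail(trail):
    """Return True if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS
    for entry in trail:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append_entry(trail, {"actor": "CLOUDCORP", "action": "vm_migrated"})
append_entry(trail, {"actor": "CLOUDCORP", "action": "snapshot_created"})
print(verify_trail(trail))
```

A chain like this only detects modification of recorded entries. It does not detect truncation of the most recent entries or a party that rewrites the whole chain, so a trusted trail would additionally need signatures or anchoring of the latest hash with an independent party.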

These are just some of the multiple legal requirements emerging in the context of cloud-based critical infrastructure IT. They are far from exhaustive, but give an impression of the challenges that the concept of cloud computing raises beyond the purely technical perspective. If taken seriously into account from the very beginning, many of these challenges can, however, be addressed during the design of the underlying cloud infrastructure, thereby rendering a legally compliant use of cloud computing significantly more achievable, or even possible at all, during later stages of development.

II. RELATED WORK

There exist a number of reference architectures for Cloud Computing that describe and identify technical, operational and security issues. All existing reference architectures are either layered, role-based or some combination thereof.

Layered cloud architectures structure their layers either by technical functionality, as is done by Cisco [4] and the Internet Engineering Task Force (IETF) [5], or by the provided cloud service model. This latter approach is taken by the Cloud Security Alliance (CSA) [6]. Common key components or layers that are shared in current reference architectures include:

• Physical hardware resources and their virtualized counterparts, often categorised as Network, Storage and Computing.

• The provided service model, describing the abstraction level of the provided services. The most common models, stacked on top of each other in this order, are Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).

• A service register or catalogue that is used to manage and provide the services and cloud capabilities to users and developers based on the service model.

• Components to provide and control user access via interfaces or portals. Users can specify the desired capabilities and requirements for their application either explicitly via an interface or implicitly by service selection.

• Management components to handle the complexity of clouds introduced, e.g., by elasticity and resource migration. Security is either handled by the management components or in additional specialised components. Other reference architectures, e.g., from IBM [7], further divide their management into a component handling business-related processes and issues like billing, accounting, contracting and SLAs, and an operational management component handling the technical aspects. Management can also be seen as a vertical layer, as modelled in the architectures from the Distributed Management Task Force (DMTF) [8], the T-Clouds project [9] and the CSA, providing support with specialised tasks ordered hierarchically but also spanning multiple other layers and components.

• APIs for cloud service development, deployment, and management used by external developers implementing their own new services or even service models.

• Additional interfaces to provide monitoring, reporting, performance metering or auditing data collected by a cloud provider to clients or third parties.


[Figure 1: diagram showing the Cloud Service Provider (Cloud Infrastructure, Service Model (IaaS, PaaS, SaaS), Service Repository), the Service User, who accesses services, and the Service Developer, who constructs services.]

Fig. 1. An abstract view of several architectural models

In summary, existing role-based architectures like those from the DMTF, IBM and the National Institute of Standards and Technology (NIST) [10] usually focus on the roles of the Cloud Service Provider, Cloud Service User and Cloud Service Developer, as shown in Fig. 1. The focus of most such architectures is the data centre, and thus the Cloud Infrastructure level, i.e., they consider methods and mechanisms regarding how to provide Cloud-based services to customers and users. This includes virtualization software and techniques, as well as management of physical hardware, network [5], and virtual entities.

A purpose of such models is to identify the flow of information and the structure of required interfaces between the involved roles. Regarding the Cloud Service Provider, a common view is that its role is very monolithic, with management responsibilities ranging from the cloud hardware platform up to service provisioning and user accounting. We argue that the role of Cloud Service Providers should be logically divided into different roles, which respect the technical and legal responsibilities in the context of Critical Infrastructures. The Open Security Architecture (OSA) [11] already uses a very fine-grained role model to identify and analyse the security issues that have to be considered in cloud infrastructures. Unfortunately, the architecture does not consider dividing the Cloud Service Provider responsibilities into different roles with individual security, management, monitoring, and audit tasks.

In Section I, we discussed the requirements needed to support critical infrastructures in cloud environments. Existing reference architectures for Cloud Computing focus on encapsulating the technical and administrative aspects within the given roles. The interfaces between different roles are reduced and simplified as much as possible to allow an isolated view on different components and to support interchangeable orthogonal developments.

Nevertheless, to support critical infrastructure services, current cloud architectures lack the items needed to address the following requirements:

• A retrospective or real-time root cause analysis function for malfunctions and security deficiencies is only feasible when the cloud architecture provides appropriate management and logging capabilities, which can be assigned to specific infrastructure levels and components. To analyse who is responsible for a given problem situation, the architecture has to be transparent with respect to data processing, storage, and communication; this is necessary even at the infrastructure level of cloud service providers. Consequently, cloud architectures for CI services need additional entities or components that are allowed to perform tracing across the whole infrastructure, with dedicated knowledge about the interface between two architectural layers. Current reference architectures usually consider a very encapsulated cloud service provider infrastructure, resulting in monolithic management modules, and lack root cause analysis capabilities inside this layer and in interaction with other layers.

• Partial transparency of storage, computation and communication processes, and of the involved physical locations, is also needed to validate the data protection requirements for CIs. Furthermore, a cloud architecture has to consider trusted remote audit trails, in order to empower cloud users to validate that the infrastructure is operating in conformance with current data protection laws. This requirement results in an architecture that permits tracing data or even entire information flows with all involved components and locations. For that purpose the architecture may provide an aggregated feedback channel to the involved stakeholders on the appropriate abstraction level. Most of the current reference architectures do not consider this additional information flow.

• Resilience requirements for CIs usually exceed the average demands for cloud services. Redundancy is typically desired for CIs, such that supporting architectures additionally have to deal with multi-tenancy and multi-homing aspects for CI services. Besides handling redundancy, a CI cloud service architecture also has to provide an adequate and fine-grained interface to specify service specific privacy, security, and resilience requirements for underlying infrastructure components. Also, interfaces should allow for auditing the fulfilment of such requirements. Most of the existing reference architectures already consider interfaces to specify service requirements. CI services need a refinement of those interfaces in order to support the necessary redundancy, service models and resource specific declaration of requirements.
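To make the idea of a fine-grained, auditable requirements interface more concrete, the sketch below declares privacy, security, and resilience requirements for one service component and checks reported placements against them. It is only an illustration under stated assumptions: the class names, field names, and the shape of the placement report are hypothetical and are not defined by the architectural model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequirements:
    """Hypothetical requirements a CI Service Provider declares per component."""
    component: str
    allowed_regions: frozenset  # privacy: where data may be stored or processed
    encrypt_at_rest: bool       # security
    min_replicas: int           # resilience: number of redundant instances
    min_providers: int = 1      # resilience: independent infrastructure providers


@dataclass(frozen=True)
class Placement:
    """Hypothetical report from the infrastructure about one running instance."""
    component: str
    region: str
    provider: str
    encrypted_at_rest: bool


def audit(req, placements):
    """Return a list of violation messages; an empty list means compliant."""
    mine = [p for p in placements if p.component == req.component]
    violations = []
    for p in mine:
        if p.region not in req.allowed_regions:
            violations.append(f"{p.component}: region {p.region} not allowed")
        if req.encrypt_at_rest and not p.encrypted_at_rest:
            violations.append(f"{p.component}: unencrypted storage at {p.provider}")
    if len(mine) < req.min_replicas:
        violations.append(f"{req.component}: {len(mine)} replicas, need {req.min_replicas}")
    if len({p.provider for p in mine}) < req.min_providers:
        violations.append(f"{req.component}: too few independent providers")
    return violations


req = ServiceRequirements("video-archive", frozenset({"EU"}), True, 2, 2)
report = [Placement("video-archive", "EU", "A", True),
          Placement("video-archive", "EU", "B", True)]
print(audit(req, report))
```

The point of the sketch is the separation between the declaration (what the service needs) and the audit (what the infrastructure reports), which mirrors the two interface functions named in the requirement above.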

III. ARCHITECTURAL MODEL

We need an architectural model in order to identify and locate the various mechanisms and potential new interfaces that are necessary to support critical infrastructure IT services. The proposed architectural model (cf. Fig. 2) aims at a more precise role distinction that allows for better security analysis, separation of responsibilities, identification of separate administrative interfaces, and for checking the influence and coverage of legal aspects. One important aspect to consider is that a single Cloud Infrastructure is typically used simultaneously by several tenants (i.e., customers, users), whose virtual resources are usually isolated from each other to some degree within the Cloud Infrastructure (similar to [5], which focuses on communication aspects). The distinction between the virtual Tenant Infrastructure and the physical Cloud Infrastructure is important, since a clear separation between responsibilities is necessary. In case a CI service fails, a root cause analysis should reveal the responsible party. Moreover, additional monitoring or logging mechanisms for auditing can be located within the Cloud Infrastructure, as well as within the virtual infrastructure of the tenant. In this context, it may be helpful to identify additional interfaces that allow for increasing the trust level between the tenant and the cloud infrastructure provider by permitting some level of audits.

Moreover, the architecture must clearly distinguish the service provider from the virtual tenant infrastructure. This allows two important aspects of any CI service to be separated: functional and behavioural features. The service layer deals with the functional features of the service: which components are required to compose the service and how these components need to be interconnected. The tenant infrastructure level deals with the behavioural features of the service: this includes elasticity features, component redundancy, and overload control. Failures on the physical infrastructure level can be made transparent to the service by self-healing mechanisms on the tenant infrastructure level, e.g., automatic fail-over to redundant components and auto-recovery by on-demand provisioning of virtual resources. In addition, geo-diversity can be realized on the tenant infrastructure level by requesting resources from multiple independent cloud infrastructure providers for increased dependability but also for higher privacy and security.

[Figure 2: diagram with columns Abstraction Level, Stakeholder and Resources. User Level: CI Service User, client devices. Critical Infrastructure (CI) Service Level: CI Service Provider, manages service resources (service components A, B, C) and provides the service (SaaS/PaaS). Tenant Infrastructure Level: Tenant Infrastructure Provider, manages virtual resources (virtual compute resources, virtual storage, virtual network) and provides the virtual infrastructure (IaaS/PaaS). Physical Cloud Infrastructure Level: Cloud Infrastructure Provider, manages cloud resources (compute, storage, network in the data centre) and provides virtual resources (IaaS). Stakeholders are linked by SLAs.]

Fig. 2. Different Considered Abstraction Levels

Consequently, Fig. 2 shows different levels of abstraction that we want to distinguish. The different abstraction levels correspond to different stakeholders and their view on the managed resources. Most existing architectures concentrate on the provisioning of resources at the Cloud Infrastructure level, i.e., how to provide and manage cloud resources within a data centre. A more overarching view covering different levels and stakeholders is needed:

• At the top level, i.e., the User Level, we have the Critical Infrastructure Service User who remotely accesses the Critical Infrastructure Service. For instance, an urban traffic management operator could observe and control the traffic flow within a city, using web-based interfaces as well as various distributed sensors that deliver measurement data as input to the next lower CI Service Level components. In the example of Section I-B it would be CITYSEC watching the output of the video surveillance system.

• The next lower level is controlled by the CI Service Provider, who manages the resources at Service Level. The CI Service is composed of several components that interact with each other in order to provide the actual service. The CI Service Provider monitors the service operation and performance at this level. The service components usually provide either the application or the platform that is required at user level. The service components are instantiated on the virtual infrastructure that is provided by the next lower level. In the use case of Section I-B it would be TENSYS, who uses and provides video archival and automated video analysis services as well as streaming and web servers that provide secured access to video camera live streams and stored footage.

• The Tenant Infrastructure Level provides a virtual infrastructure that consists of virtual compute resources, virtual storage, and virtual network resources. Those virtual resources are managed by the Tenant Infrastructure Provider. We distinguish this stakeholder from the service provider since they may be separate organisations. Several such tenants are typically hosted within one Cloud Infrastructure, which is the next lower level. The Tenant Infrastructure Provider may provide either the pure virtual infrastructure (IaaS) or some basic services as a platform (PaaS) to the CI Service Provider. In the former case the CI Service Provider may install complete VM images containing an operating system, middleware services, and application-oriented service components, including necessary configuration data. In the latter case, the Tenant Infrastructure Provider may provide and operate some pre-installed operating system images and middleware or supporting services. Aside from that, the Tenant Infrastructure Provider is not aware of which services and applications are running inside this virtual infrastructure. In the use case from Section I-B, it would also be TENSYS who operates the various virtual resources as a basis for their video surveillance service.

• The Physical Cloud Infrastructure Level at the bottom provides real physical (sometimes called ‘bare metal’) compute, storage, and network resources, which are hosted in a data centre and administered by the Cloud Infrastructure Provider. This level usually provides virtual resources (IaaS) to its upper level, i.e., the tenant. The virtualization solution usually provides (a certain degree of) isolation between the different tenants that are multiplexed onto the same physical infrastructure and thus permits the sharing of resources. To increase resource efficiency, the Cloud Infrastructure Provider can usually transparently move virtual resources across its physical infrastructure, unnoticed by the tenant. In the use case of Section I-B it would be CLOUDCORP who operates the physical Cloud Infrastructure.


It should be noted that several stakeholder roles may sometimes be realised by the same organisation. For instance, the Tenant Infrastructure Provider may also be the CI Service Provider (like TENSYS in our use case), the CI Service Provider may also be the Service User, or the Tenant Infrastructure Provider may be the same organisation as the Cloud Infrastructure Provider.
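The four abstraction levels and the way one organisation can hold several roles can be written down as a small data model. The sketch below uses the level and role names from the model and the organisations from the use case of Section I-B; the dictionary layout and the helper function are assumptions introduced only for illustration.

```python
from enum import Enum


class Level(Enum):
    """Abstraction levels, defined from top to bottom."""
    USER = "User Level"
    SERVICE = "CI Service Level"
    TENANT = "Tenant Infrastructure Level"
    PHYSICAL = "Physical Cloud Infrastructure Level"


# Stakeholder role responsible for each level.
ROLE_AT_LEVEL = {
    Level.USER: "CI Service User",
    Level.SERVICE: "CI Service Provider",
    Level.TENANT: "Tenant Infrastructure Provider",
    Level.PHYSICAL: "Cloud Infrastructure Provider",
}

# Organisations holding each role in the use case of Section I-B.
USE_CASE = {
    Level.USER: "CITYSEC",
    Level.SERVICE: "TENSYS",
    Level.TENANT: "TENSYS",
    Level.PHYSICAL: "CLOUDCORP",
}


def organisational_boundaries(assignment):
    """Return (upper, lower) organisation pairs where adjacent levels change hands."""
    levels = list(Level)  # Enum iteration preserves definition order
    pairs = []
    for upper, lower in zip(levels, levels[1:]):
        if assignment[upper] != assignment[lower]:
            pairs.append((assignment[upper], assignment[lower]))
    return pairs


for upper, lower in organisational_boundaries(USE_CASE):
    print(f"{upper} depends on {lower}")
```

In the use case, the service and tenant levels collapse into one organisation, so only two boundaries between organisations remain. These are the places where responsibility changes hands and where audit interfaces would matter most.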

The different stakeholders are usually bound to each other by bilateral Service Level Agreements (SLAs). In case of a CI service failure, it must be possible to investigate whose fault it actually was and maybe prove that an SLA violation has happened. Therefore, new interfaces are required to permit more transparency and security audits for specific functions. For instance, if the Cloud Infrastructure Provider moved a virtual machine from one physical host to a different physical host, it is still opaque to which particular physical machine the VM was actually moved, but the event as such may be indicated to the Tenant Infrastructure Provider, maybe with enough details so that co-location constraints can be verified. Such additional information would be helpful in auditing service execution. In case of an SLA violation, trusted audit trails may be useful to find the root cause.
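One possible way to signal a migration without revealing the physical machine is to give the tenant an opaque pseudonym for the host. The sketch below derives such a pseudonym with a keyed hash (HMAC), so the tenant can check whether two of its VMs share a host without learning which host it is. This is an illustrative assumption and not an interface defined by the model; the function names, the event fields, and the per-tenant key are hypothetical.

```python
import hashlib
import hmac


def opaque_host_token(tenant_key: bytes, physical_host_id: str) -> str:
    """Provider side: derive a per-tenant pseudonym for a physical host."""
    return hmac.new(tenant_key, physical_host_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def migration_event(tenant_key: bytes, vm_id: str, new_host_id: str) -> dict:
    """Provider side: notification sent to the Tenant Infrastructure Provider."""
    return {"type": "vm_migrated", "vm": vm_id,
            "host_token": opaque_host_token(tenant_key, new_host_id)}


def co_location_violations(current_tokens: dict, separated_groups: list) -> list:
    """Tenant side: return pairs of VMs that must be separated but share a token."""
    violations = []
    for group in separated_groups:
        seen = {}  # host token -> first VM observed on that host
        for vm in group:
            token = current_tokens[vm]
            if token in seen:
                violations.append((seen[token], vm))
            else:
                seen[token] = vm
    return violations


key = b"per-tenant-secret"  # hypothetical key held by the provider for this tenant
tokens = {}
for vm, host in [("archive-1", "rack3-host7"), ("archive-2", "rack5-host2")]:
    event = migration_event(key, vm, host)
    tokens[event["vm"]] = event["host_token"]
print(co_location_violations(tokens, [("archive-1", "archive-2")]))
```

Because the key differs per tenant, two tenants cannot compare tokens to learn that they share a machine. The tenant still has to trust that the provider computes the tokens honestly, which is where trusted audit trails would come in.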

Fig. 3 shows further important aspects that need to be considered: multi-tenancy and resilience by using resources from different data centres that may be operated by different Cloud Infrastructure Providers.

The aspect of multi-tenancy is illustrated by Tenant X and Tenant Y, which both use virtual resources hosted in Data Centre B.1 of Cloud Infrastructure Provider B. Tenant X is hosting a CI service, whereas Tenant Y is running a service that is not related to Critical Infrastructures. Since current systems do not allow for perfect isolation of virtual resources, services running in Tenant Y’s infrastructure may influence CI services running in Tenant X’s. For instance, this may be the case where virtual resources are co-located within the same physical machine, or if network traffic to Tenant Y adversely affects data traffic to Tenant X’s resources. This is especially likely if Tenant Y suffers from a (Distributed) Denial-of-Service attack, or if its virtual infrastructure is itself used as an attack source, while quality-of-service mechanisms that enforce proper network separation between tenants are missing.

Moreover, Tenants may explicitly request resources from different locations, i.e., either from different data centres of the same Cloud Infrastructure Provider (cf. Tenant Y’s resources hosted in Data Centres B.1 and B.2) or from different Cloud Infrastructure Providers (cf. Tenant X’s resources hosted in Data Centres A.1 and B.1). If the different Cloud Infrastructure Providers cooperate, the latter is also called Cloud Federation [12]. One objective for distributing virtual resources across different regions, availability zones, or even providers is to increase the availability of the CI service.
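The availability argument can be made concrete with a small calculation. The sketch below assumes that the sites fail independently, which does not hold when, for example, data centres share the same network access. The availability figures are illustrative only and are not measurements.

```python
from math import prod


def combined_availability(site_availabilities):
    """Availability of a service that is up as long as at least one site is up.

    Assumes independent site failures: the service is down only if every
    site is down at the same time.
    """
    return 1.0 - prod(1.0 - a for a in site_availabilities)


# Two sites with 99% availability each, e.g. Data Centres A.1 and B.1.
two_sites = combined_availability([0.99, 0.99])
print(f"{two_sites:.4f}")
```

Under the independence assumption, two sites with 99% availability each yield roughly 99.99%. Correlated failures reduce this gain, which is one reason why the disjointness of resources and network paths matters.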

Furthermore, the access to cloud resources in a data centre is a critical path for operating the CI service: if the Internet connection to the data centre(s) is unavailable, the service operation may fail, e.g., because the operating personnel cannot access the service or because critical input data such as telemetry data or camera streams are not delivered to the data centre. In the use case of Section I-B, this connection from service user CITYSEC to the data centre is provided by TELCOM. In that case, it may be necessary to provide a dedicated access path to the data centre, e.g., by providing an MPLS VPN that extends to the data centre. This requires, however, that TELCOM has enough control over the whole network path from the data centre to the service user CITYSEC. Alternatively, TELCOM must request corresponding services from other ISPs or must cooperate with them accordingly. There are many ways of providing the required resilience and redundancy in network access; using services from multiple providers is usually preferable to depending on a single provider.

[Figure 3 (diagram): CI Service Users and Other Service Users at the User Level; the CI Service and an Other Service with their components at the Service Level; Tenant X and Tenant Y at the Tenant Infrastructure Level; Data Centre A.1 of Cloud Infrastructure Provider A and Data Centres B.1 and B.2 of Cloud Infrastructure Provider B at the Physical Cloud Infrastructure Level.]

Fig. 3. Resilience and Multi-Tenancy Aspects

Moreover, site multi-homing of the data centre itself is an additional requirement, which may be inherently fulfilled when multiple Cloud Infrastructure Providers are used. Fig. 4 shows a site multi-homing scenario for a single Cloud Infrastructure Provider. The exemplary data centre network is hierarchically structured: Top-of-Rack Switches are aggregated by possibly several levels above (aggregation and core level), up to the WAN access level, which typically consists of IP routers using BGP (Border Gateway Protocol). We note that several other data centre network topologies are in use, especially increasingly meshed ones such as FatTree or BCube, which allow for multipath transport and more efficient solutions; this fact is, however, not relevant for our further considerations. In this example, the data centre is multi-homed to two different Internet Service Providers (ISP A and ISP B), so the Service User can probably still access the virtual resources hosted in the data centre if one of the two ISPs fails. However, redundant (i.e., site multi-homed) network access should be available at the client side, too; otherwise, a single point of failure exists on that side. Additionally, it should be verified that the redundant paths are really disjoint, also physically, especially on the last mile.
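The disjointness check mentioned above can be illustrated with a small sketch. It assumes that each access path is known as an ordered list of element names, including physical elements such as cable ducts. All names below are hypothetical, and obtaining this information from the involved providers is exactly the transparency problem discussed in this paper.

```python
def shared_elements(path_a, path_b, endpoints):
    """Return the nodes and links that both paths traverse.

    The common endpoints (client site and data centre) are excluded from the
    node comparison, since both paths necessarily start and end there.
    """
    nodes_a = set(path_a) - set(endpoints)
    nodes_b = set(path_b) - set(endpoints)
    # Links are undirected, so each one is stored as an unordered pair.
    links_a = {frozenset(pair) for pair in zip(path_a, path_a[1:])}
    links_b = {frozenset(pair) for pair in zip(path_b, path_b[1:])}
    return nodes_a & nodes_b, links_a & links_b


# Both paths use different ISPs but leave the client site through the same duct.
via_isp_a = ["client", "last-mile-duct", "isp-a", "data-centre"]
via_isp_b = ["client", "last-mile-duct", "isp-b", "data-centre"]
nodes, links = shared_elements(via_isp_a, via_isp_b, {"client", "data-centre"})
print(nodes, links)
```

In this example the two paths look redundant at the ISP level, yet they share the last-mile duct and the link leading to it, so a single physical fault could interrupt both.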

As mentioned earlier, there is a need to identify mechanisms and interfaces that can be used to increase the trust level between the tenant and the Cloud Infrastructure Provider. Fig. 5 shows a management-oriented view of the architectural model.


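[Figure 4 (diagram): Cloud Infrastructure (Data Centre) with Compute + Network Resources and Storage Resources attached to Top of Rack Switches, the Aggregation and Core Level, and the WAN Access Level; the data centre connects via ISP A and ISP B to the Internet, through which Client Devices reach it over the Client Internet Access.]

Fig. 4. Network Access to Multi-Homed CI services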

Starting at the Service Level, the CI Service Operator controls its service components by using an Operating Support System (OSS). To initialise an actual deployment, the OSS passes an Infrastructure Service Description to the Tenant Infrastructure Management System. The latter is responsible for requesting adequate resources, including the necessary constraints such as co-location restrictions and redundancy requirements, from the Cloud Infrastructure Management System by passing corresponding resource descriptions to it. Orchestration means that the Tenant Infrastructure Management System provides the required service components by instantiating and connecting the necessary virtual resources and instances; e.g., a web service with high reliability requirements could automatically be mapped onto a load balancing component and two or more backend web servers that have to be placed on disjoint physical hosts. Traditionally, such constraints have been expressed by affinity and anti-affinity rules. This is not sufficient for dependent components, e.g., stateful components in an active-standby protection use case. Provisioning for such use cases requires explicit parametrization of the deployment, e.g., the maximum tolerable delay between two instances for synchronisation (see [13] for details). Monitoring must also be performed at this level in order to verify fulfilment of the resilience requirements as well as enforcement of the requested policies.
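To illustrate how anti-affinity rules and a delay parameter could appear together in an Infrastructure Service Description, the following sketch checks a given placement against both kinds of constraint. The data model (`ServiceDescription`, its field names, the host names, and the delay table) is hypothetical and serves only as an example; it is not the description format of the architecture or of [13].

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    components: list
    # Pairs of components that must not share a physical host.
    anti_affinity: list = field(default_factory=list)
    # (component, component) -> maximum tolerable synchronisation delay in ms.
    max_sync_delay_ms: dict = field(default_factory=dict)


def violations(desc, placement, delay_ms):
    """List the constraint violations of a placement (component -> host).

    delay_ms maps an unordered pair of hosts to the measured delay between
    them; components on the same host are treated as having zero delay.
    """
    problems = []
    for a, b in desc.anti_affinity:
        if placement[a] == placement[b]:
            problems.append(f"anti-affinity: {a} and {b} share {placement[a]}")
    for (a, b), limit in desc.max_sync_delay_ms.items():
        hosts = frozenset((placement[a], placement[b]))
        measured = 0.0 if len(hosts) == 1 else delay_ms[hosts]
        if measured > limit:
            problems.append(
                f"sync delay: {a}<->{b} is {measured} ms, limit {limit} ms")
    return problems


desc = ServiceDescription(
    components=["lb", "db-active", "db-standby"],
    anti_affinity=[("db-active", "db-standby")],
    max_sync_delay_ms={("db-active", "db-standby"): 5.0},
)
placement = {"lb": "h1", "db-active": "h2", "db-standby": "h3"}
print(violations(desc, placement, {frozenset(("h2", "h3")): 8.0}))
```

The example shows why anti-affinity alone is insufficient for the active-standby case: the placement satisfies the disjointness rule, but the two hosts are too far apart for the assumed synchronisation limit.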

At the Cloud Infrastructure level, different kinds of virtual systems are supplied and managed: Virtual Machines (VMs), Virtual Storage (VS), or Virtual Network (VN) components. The virtual resources are provided by a virtualization software solution that usually includes a Cloud Infrastructure Management System, which manages individual resources within the physical machines by using Virtual Machine Monitors (VMMs, often also called Hypervisors), Virtual Storage Monitors, and Virtual Network Monitors, respectively. If the Cloud Infrastructure Provider also supplies preconfigured components, it may load preinstalled VM images from a repository into the allocated virtual resources. The Tenant Infrastructure Management System may automatically request the creation and instantiation of new virtual resources within the SLA bounds in order to utilise the elasticity of cloud resources.

At each level, corresponding monitoring is necessary: the CI Service Operator monitors its components at the service level (e.g., transaction latency, number of failed and successful requests, and so on). This monitoring is component-specific and requires service-internal functionality; it is often called the Element Management System (EMS). The EMS interfaces with the OSS for event reporting. At the OSS level, the events from several components are correlated to achieve a holistic view of the current service context. Based on this context, adaptive actions can be initiated, e.g., starting additional replicas to prevent overload situations. In addition to the service-specific monitoring, generic monitoring at the virtual resource level provides input to scale-out/scale-down and load balancing decisions. Such measurements are taken at the Tenant Infrastructure Operator level, monitoring the service’s own virtual resources (state of the virtual machines etc.), whereas the Cloud Infrastructure Operator monitors the physical infrastructure as well as all virtual resources of all tenants. The Cloud Infrastructure Provider can use these measurements to assess whether VMs should be consolidated onto fewer compute nodes so that some nodes can be shut down to save energy, or whether resource contention is increasing and VMs should be migrated to prevent service failures.
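A scaling decision based on generic virtual resource monitoring could look like the following sketch. The thresholds, the use of mean CPU utilisation as the only input, and the minimum replica count are assumptions chosen for illustration; a real deployment would also have to respect the SLA bounds and the placement constraints of the service.

```python
def scaling_decision(cpu_utilisation, replicas, high=0.8, low=0.3, min_replicas=2):
    """Return the desired replica count for one service component.

    cpu_utilisation holds one value in [0, 1] per running replica. The
    component is scaled out when the mean exceeds `high` and scaled down
    when it falls below `low`, but never below `min_replicas`, so that the
    redundancy needed for resilience is preserved.
    """
    mean = sum(cpu_utilisation) / len(cpu_utilisation)
    if mean > high:
        return replicas + 1
    if mean < low and replicas > min_replicas:
        return replicas - 1
    return replicas


print(scaling_decision([0.90, 0.85], replicas=2))        # overload: scale out
print(scaling_decision([0.10, 0.20, 0.15], replicas=3))  # idle: scale down
```

Keeping a lower bound on the replica count reflects the point that cost-driven scale-down must not undermine the redundancy requested for the CI service.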

[Figure 5 (diagram): the stakeholders (CI Service User, CI Service Operator, Tenant Infrastructure Operator, Cloud Infrastructure Operator) with their management systems (Operating Support System, Tenant Infrastructure Management System, Cloud Infrastructure Management System) and the managed elements: CI Service Components A to C, the Tenant Infrastructure, and the Cloud Infrastructure with VM, VS, and VN resources, their monitors (VMM, VSM, VNM), and a VImage DB. Infrastructure Service Descriptions and Resource Descriptions are passed between the management systems; annotated functions are Orchestration, Provisioning, Monitoring, and Policy Control.]

Fig. 5. A Management Oriented View

If a service failure happens, the root cause must be identified. The elasticity and dynamics of the cloud environment make it difficult to reconstruct what actually happened during a certain time period, since the state of the virtual topology and its components may change frequently. For instance, the Tenant Infrastructure Operator may have requested the installation of an additional virtual network component for resilience, which may have caused a temporary loop for a few seconds due to misconfiguration by the Tenant Infrastructure Operator. This raises several questions, e.g., how can the Cloud Infrastructure Provider prove that the additional component was correctly provided and that the failure was not on its side? Is it possible to prove that a certain topology was active at a specific point in time?
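If topology changes were logged as timestamped events, the topology active at a given time could be reconstructed by replaying the log. The sketch below assumes a hypothetical event format (`time`, `action`, `link`) and undirected links between two distinct virtual network elements. Whether such a log can be trusted by both parties is a separate question that the sketch does not answer.

```python
def topology_at(events, timestamp):
    """Replay add/remove events and return the set of links active at timestamp."""
    links = set()
    for event in sorted(events, key=lambda e: e["time"]):
        if event["time"] > timestamp:
            break
        link = frozenset(event["link"])
        if event["action"] == "add":
            links.add(link)
        elif event["action"] == "remove":
            links.discard(link)
    return links


def has_loop(links):
    """Detect a cycle in an undirected link set using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for link in links:
        a, b = tuple(link)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends were already connected: this link closes a loop
        parent[root_a] = root_b
    return False


events = [
    {"time": 0, "action": "add", "link": ("sw1", "sw2")},
    {"time": 0, "action": "add", "link": ("sw2", "sw3")},
    {"time": 10, "action": "add", "link": ("sw3", "sw1")},     # misconfigured link
    {"time": 14, "action": "remove", "link": ("sw3", "sw1")},  # corrected
]
print(has_loop(topology_at(events, 12)))
```

Replaying the example log shows a loop only between the times 10 and 14, which matches the scenario of a temporary loop lasting a few seconds.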

Moreover, more transparency from the Cloud Infrastructure Provider towards the tenant for otherwise opaque operations may help to increase the trust level between both parties, because auditing of security and related requirements may become possible. Usually, the tenant has no knowledge of where the virtual instances are actually placed within the Cloud Infrastructure or when they are migrated. For auditing purposes, however, it may be useful to provide some additional information, so that the tenant can verify that its co-location requirements are met or learn when a virtual machine was migrated. Log data and some operational data may be stored at a trusted third party in order to provide a solution that allows for non-repudiation.

While the objective of most other approaches is mainly to specify and standardise mechanisms for the cloud infrastructure level and how to realise and provide services on top of this cloud infrastructure, our architecture aims at covering the provision, deployment, and operation of critical infrastructure services in cloud environments. A separation of the different stakeholders and their responsibilities is important in order to address the technical and legal requirements that we identified.

IV. CONCLUSION

This paper addressed the special requirements that arise when critical infrastructure services are moved fully or partially into cloud environments. Understanding and managing the risk associated with such environments is important in order to eliminate the uncertainty for operators and providers of critical infrastructure IT systems. Realising such CI services in the cloud implies security and resilience requirements that existing cloud offerings do not address well. From a legal perspective, we derived a requirement for the provision of mechanisms and instruments that permit remote auditing (including trusted audit trails).

Separation of different stakeholders and their responsibilities is important in order to adequately address the technical and legal requirements that we have identified. Therefore, the presented architectural model aims at a more precise role distinction that allows for better security analysis, separation of responsibilities, identification of separate administrative interfaces, and for checking the influence and coverage of legal aspects. We identified different abstraction levels: the Service User level, the CI Service level, the Tenant Infrastructure level, and the Physical Cloud Infrastructure level.

This model constitutes a basis for the development of interfaces that permit the utilisation of monitoring and auditing tools that enable service audits in a lawful and privacy-compliant manner. Such cooperative interfaces should increase transparency in the cloud infrastructure to allow for root-cause analysis. This work will be complemented by the development of process-oriented security guidelines, policies, and policy languages that allow the expression of security requirements that are needed for realising CI services in clouds. Other work will concentrate on anomaly detection mechanisms that are used to discover deviations from expected system and network behaviour, in order to disclose attacks and to predict their impact. As a next step we are identifying and defining the mentioned interfaces and mechanisms along the architectural model required to support deployment of CI services.

ACKNOWLEDGMENT

The research presented in this paper has been funded by the European Commission in the context of the Research Framework Program Seven (FP7) project SECCRIT (Grant Agreement No. 312758).

REFERENCES

[1] J. P. G. Sterbenz, D. Hutchison, E. K. Çetinkaya, A. Jabbar, J. P. Rohrer, M. Schöller, and P. Smith, “Resilience and Survivability in Communication Networks: Strategies, Principles, and Survey of Disciplines,” Computer Networks: Special Issue on Resilient and Survivable Networks (COMNET), vol. 4, pp. 1245–1265, 2010.

[2] European Parliament and Council of Europe, “Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data (Data Protection Directive),” 1995.

[3] European Commission, “Proposal for a regulation of the European Parliament and of the Council on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation),” 2012.

[4] V. Josyula, M. Orr, and G. Page, Cloud Computing: Automating the Virtualized Data Center. Cisco Press Networking Technology, Cisco Systems, 2011.

[5] M. Lasserre, F. Balus, T. Morin, N. Bitar, and Y. Rekhter, “Framework for DC Network Virtualization.” Internet-Draft, Internet Engineering Task Force, July 2013. https://datatracker.ietf.org/doc/draft-ietf-nvo3-framework/.

[6] Cloud Security Alliance, “Quick Guide to the Reference Architecture - Trusted Cloud Initiative,” 2011.

[7] A. Kochut, Y. Deng, M. R. Head, J. Munson, A. Sailer, H. Shaikh, C. Tang, A. Amies, M. Beaton, D. Geiss, D. Herman, H. Macho, S. Pappe, S. Peddle, R. Rendahl, A. E. T. Reyes, H. Sluiman, B. Snitzer, T. Volin, and H. Wagner, “Evolution of the IBM cloud: enabling an enterprise cloud services ecosystem,” IBM J. Res. Dev., vol. 55, pp. 397–409, Nov. 2011.

[8] Distributed Management Task Force (DMTF), “Architecture for Managing Clouds Version 1.0.0: A White Paper from the Open Cloud Standards Incubator,” June 2010.

[9] I. M. Abbadi, “Clouds Infrastructure Taxonomy, Properties, and Management Services,” in Advances in Computing and Communications (A. Abraham, J. L. Mauri, J. F. Buford, J. Suzuki, and S. M. Thampi, eds.), vol. 193 of Communications in Computer and Information Science, pp. 406–420, Springer Berlin Heidelberg, 2011.

[10] R. B. Bohn, J. Messina, F. Liu, J. Tong, and J. Mao, “NIST Cloud Computing Reference Architecture,” in Proceedings of the 2011 IEEE World Congress on Services, SERVICES ’11, (Washington, DC, USA), pp. 594–596, IEEE Computer Society, 2011.

[11] Open Security Architecture, “SP-011: Cloud Computing Pattern.” http://www.opensecurityarchitecture.org/cms/library/patternlandscape/251-pattern-cloud-computing.

[12] B. Rochwerger, D. Breitgand, A. Epstein, D. Hadas, I. Loy, K. Nagin, J. Tordsson, C. Ragusa, M. Villari, S. Clayman, E. Levy, A. Maraschini, P. Massonet, H. Muñoz, and G. Tofetti, “Reservoir – When One Cloud Is Not Enough,” Computer, vol. 44, no. 3, pp. 44–51, 2011.

[13] M. Schöller, M. Stiemerling, A. Ripke, and R. Bless, “Resilient deployment of virtual network functions,” to appear in Fifth International Workshop on Reliable Networks Design and Modeling (RNDM 2013), 2013.