
5.1 Information flow in cloud infrastructures

5.1.1 Information flow in IaaS cloud computing

Figure 5.1: Information flow between legitimate actors (differentiated by virtual resources, processed data, and meta data). The depiction of information flow with authorised third parties is omitted since they can interact with any other actor.

[Figure 5.1 (image not reproduced). Legend: Processed Data, Virtual Resources, Meta Data. Actors: Private Customer, Corporate Customer, Service Provider, Software Vendor, Cloud Provider, Hardware Provider, Hardware Vendor, Authorised Third Party.]

Three types of information flow can be identified between the legitimate actors. Figure 5.1 illustrates the three types of information and the way they flow between the different actors. The types of information cover (1) the data processed within the cloud, (2) the virtual resources operated in the cloud, and (3) meta data of hardware and software, which are exchanged during their operation. For authorised third parties, the information flow depends on the basis of authorisation on which the third party acts and on the actors with whom it interacts. For example, in the case of a full inspection, all three types of information flow can occur. For that reason, the depiction of the information flow for authorised third parties is omitted in Figure 5.1. Further, it is assumed in this thesis that all three types of information flow are allowed for authorised third parties. For a more differentiated control of information flow to authorised third parties, it is necessary to extend the model of information flow by conditional constraints indicating under which circumstances a specific information flow is allowed. This requires additional research with a focus on information flow to authorised third parties, which is not part of this thesis. Possible extensions are discussed in Section 7.3. In the following, each type of information flow is discussed.

The first type of information flow (processed data) is related to the customers’ data that are processed within the cloud. The data are provided by the private and corporate customers and are processed on their behalf by the service and cloud providers. Further, the data can be exchanged with other customers, e.g., in a Business-to-Business (B2B) scenario between two corporate customers, and also between service providers when providing subcontracted cloud services, e.g., when using a virus scanning service for an email service. In IaaS, the customer has direct influence on the information flow of the processed data, since the data processing applications are under the customer’s control. The providers can also have an influence, since they operate the resources running the applications. However, except for the service provider, the providers do not directly interact with the data processing applications.

The second type of information flow (virtual resources) is related to the virtual resources that are operated in the cloud. The information flow of virtual resources differs from that of processed data, since virtual resources represent a part of the data processing system itself (cf. Section 4.1.2). Each virtual resource can contain customers’ data (i.e., processed data) mentioned in the first type of information flow. Without inspecting the virtual resources, the data processed within remain hidden inside the virtual resources and beyond the awareness of the cloud, hardware, and service providers. However, if the virtual resource is migrated, the data within are migrated too, resulting in an information flow of processed data. The same is true for the initial virtual resource placement, if the customers’ data are contained in a virtual resource (e.g., in the case of customer-defined virtual machine images). Also, the destruction of a virtual resource usually results in the deletion of the contained data.¹ Therefore, the information flow of virtual resources usually also covers the information flow of processed data. To conclude, virtual resources are operated under the control of a service provider, a cloud provider, and a hardware provider and are exchanged among them (e.g., due to resource migration). The customers have no direct influence on this type of information flow.
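The observation that a migrated virtual resource carries its contained data with it can be illustrated with a minimal Python sketch. The class and function names (`VirtualResource`, `flows_on_migration`) and the example identifiers are assumptions made for illustration; they are not part of the model defined in this thesis.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class FlowType(Enum):
    PROCESSED_DATA = "processed data"
    VIRTUAL_RESOURCE = "virtual resource"


@dataclass
class VirtualResource:
    name: str
    # Identifiers of customers' data held inside the resource (hypothetical).
    contained_data: set[str] = field(default_factory=set)


def flows_on_migration(vr: VirtualResource, target_host: str) -> list[tuple[FlowType, str, str]]:
    """Return (flow type, what flows, destination) triples caused by a migration."""
    flows = [(FlowType.VIRTUAL_RESOURCE, vr.name, target_host)]
    # The contained data move with the resource, even if no provider
    # ever inspects them.
    for item in sorted(vr.contained_data):
        flows.append((FlowType.PROCESSED_DATA, item, target_host))
    return flows


vm = VirtualResource("vm-1", {"customer-db"})
for flow in flows_on_migration(vm, "server-b"):
    print(flow)
```

In this sketch, a single migration yields one flow of the virtual resource plus one flow of processed data per contained data item, which mirrors the statement that the information flow of virtual resources usually also covers that of processed data.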

The third type of information flow (meta data) is related to different types of meta data that are exchanged with the software and hardware vendors. Meta data in this context are data that are directly related to the software and hardware itself and usually cover information on software execution and hardware operation. Examples of meta data are license information, usage statistics, and error reports on failure. Usually, meta data do not contain customers’ data. However, it is possible for meta data to contain customers’ data. For example, a memory snapshot after a hardware failure may contain customers’ data that were loaded into the memory. What types of data are transferred to the vendor can vary for different vendors and applications and has to be investigated for every single case. Often, the customers and providers can directly influence the transmission of meta data, for example, in the case of optional transmission of error reports or by filtering the transmitted data. If such options are consistently implemented, the information flow of processed data as a part of the meta data can be fully controlled by the customers and providers, and can therefore be handled like the information flow of processed data. In this thesis it is assumed that the information flow of meta data does not contain processed data, and it will therefore not be further investigated. Possible extensions of the information model for covering meta data are discussed in Section 7.3.

¹ In this thesis it is assumed that on destruction the allocated hardware resources are freed and can no longer be addressed by the corresponding customers. However, data are not necessarily deleted or overwritten immediately after freeing the hardware resource and might be recovered on physical access.

To summarize, the information flows of processed data and of virtual resources are most relevant for compliant data processing in the cloud, since both types of information flow contain customers’ data that have to be processed in a legally compliant manner and according to the customers’ mandate. In this thesis, the following definitions of the information flow of processed data and of virtual resources are made according to the observations above.

Definition 5.1 (Information flow of processed data) All types of data that are processed on behalf of a corporate customer within a virtual resource (of the cloud) are considered to be processed data. Further, processed data are associated with the corporate customer on whose behalf they are processed. The information flow of processed data is then considered to be the access of an actor (cf. Section 2.2.1), virtual resource (cf. Section 4.1.2), or hardware resource (cf. Section 4.1.3) to the processed data.

Definition 5.2 (Information flow of virtual resources) The term virtual resources is used according to the entity-relationship model on IaaS clouds defined in Section 4.1 and covers virtual machines, virtual storage, virtual links and virtual network services (cf. Section 4.1.2). The information flow of virtual resources is considered the access of an actor (cf. Section 2.2.1) or hardware resource (cf. Section 4.1.3) to the virtual resources, for example, due to virtual resource placement or migration to a specific hardware resource.
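As a reading aid, Definitions 5.1 and 5.2 can be sketched as simple record types in Python. All type and field names below (`Entity`, `ProcessedDataFlow`, `VirtualResourceFlow`, and so on) are hypothetical and chosen for illustration only; the sketch merely encodes which kinds of entities may act as the accessing party in each definition.

```python
from dataclasses import dataclass
from enum import Enum


class EntityKind(Enum):
    ACTOR = "actor"
    VIRTUAL_RESOURCE = "virtual resource"
    HARDWARE_RESOURCE = "hardware resource"


@dataclass(frozen=True)
class Entity:
    kind: EntityKind
    name: str


@dataclass(frozen=True)
class ProcessedData:
    name: str
    owner: str  # corporate customer on whose behalf the data are processed


@dataclass(frozen=True)
class ProcessedDataFlow:
    """Definition 5.1: an actor, virtual resource, or hardware resource
    accesses processed data."""
    accessor: Entity
    data: ProcessedData


@dataclass(frozen=True)
class VirtualResourceFlow:
    """Definition 5.2: an actor or hardware resource accesses a virtual resource."""
    accessor: Entity
    resource: Entity

    def __post_init__(self) -> None:
        if self.accessor.kind is EntityKind.VIRTUAL_RESOURCE:
            raise ValueError("accessor must be an actor or a hardware resource")
        if self.resource.kind is not EntityKind.VIRTUAL_RESOURCE:
            raise ValueError("resource must be a virtual resource")


server = Entity(EntityKind.HARDWARE_RESOURCE, "server-a")
vm = Entity(EntityKind.VIRTUAL_RESOURCE, "vm-1")
placement = VirtualResourceFlow(accessor=server, resource=vm)
read = ProcessedDataFlow(accessor=vm, data=ProcessedData("invoices", owner="corp-x"))
```

The asymmetry between the two record types reflects the definitions: a virtual resource may access processed data, but it does not appear as the accessing party of another virtual resource.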

Remark 5.1 (Accessing of vs. connecting with virtual resources) There is a difference between accessing a virtual resource (e.g., a server executes a virtual machine) and connecting with a virtual resource (e.g., a corporate customer has a network connection with a virtual machine). While the former is considered the information flow of virtual resources, the latter is considered the information flow of processed data. In this thesis, accessing and connecting are distinguished from one another in that accessing of virtual resources means that virtual resources are fully accessible on the virtualisation level and connecting with virtual resources means that a network connection is established with a network endpoint of the virtual machine.
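The distinction drawn in Remark 5.1 amounts to a two-way classification, which the following minimal Python sketch makes explicit. The enumeration and function names are assumptions introduced for illustration.

```python
from enum import Enum


class Interaction(Enum):
    ACCESSING = "accessing"    # full access on the virtualisation level
    CONNECTING = "connecting"  # network connection to a network endpoint


class FlowType(Enum):
    VIRTUAL_RESOURCE = "information flow of virtual resources"
    PROCESSED_DATA = "information flow of processed data"


def flow_type_of(interaction: Interaction) -> FlowType:
    """Map an interaction with a virtual resource to the flow type it implies."""
    if interaction is Interaction.ACCESSING:
        return FlowType.VIRTUAL_RESOURCE
    return FlowType.PROCESSED_DATA


# A server executing a virtual machine accesses it.
print(flow_type_of(Interaction.ACCESSING).value)
# A corporate customer with a network connection to the machine connects with it.
print(flow_type_of(Interaction.CONNECTING).value)
```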

Remark 5.2 (Equivalence of access on virtual resources) The legitimate actors access virtual resources via a hardware resource that hosts the virtual resources. For example, hardware providers access virtual machines when they are migrated to one of their servers. Another example is the duplication of virtual storage for a backup located at a third party cloud provider. Hardware resources access virtual resources when hosting them. In general, for every hardware resource there is a legitimate actor that is responsible for its operation. Since legitimate actors access virtual resources via a hardware resource, the access of a legitimate actor can always be described by an equivalent access of the involved hardware resource and vice versa. This equivalence makes it possible to model the information flow of virtual resources on the basis of hardware resources only.
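The equivalence stated in Remark 5.2 can be sketched in Python as a translation between hardware-level and actor-level accesses. The sketch assumes that each hardware resource has exactly one responsible actor, represented by a dictionary; the function names and all example identifiers are hypothetical.

```python
from __future__ import annotations


def actor_accesses(
    hardware_accesses: set[tuple[str, str]], operator_of: dict[str, str]
) -> set[tuple[str, str]]:
    """Translate (hardware resource, virtual resource) accesses into
    (actor, virtual resource) accesses via the responsible actor."""
    return {(operator_of[hw], vr) for hw, vr in hardware_accesses}


def hardware_accesses_of(
    actor: str,
    vr: str,
    hardware_accesses: set[tuple[str, str]],
    operator_of: dict[str, str],
) -> set[str]:
    """Return the hardware resources through which `actor` accesses `vr`."""
    return {hw for hw, v in hardware_accesses if v == vr and operator_of[hw] == actor}


# Hypothetical example data; all names are illustrative only.
operator_of = {
    "server-a": "hardware-provider-1",
    "server-b": "hardware-provider-2",
    "backup-store": "cloud-provider-2",
}
hosting = {("server-a", "vm-1"), ("server-b", "vm-2"), ("backup-store", "vstorage-1")}

print(sorted(actor_accesses(hosting, operator_of)))
print(hardware_accesses_of("hardware-provider-1", "vm-1", hosting, operator_of))
```

Because every actor-level access in this sketch is derived from a hardware-level access and can be traced back to one, it suffices to record the hardware-level accesses, which is the modelling simplification the remark describes.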

Further, it is observed that the different types of information flow cannot be controlled by every actor equally. While the information flow of processed data is primarily controlled by the customers, the information flow of virtual resources is primarily controlled by the cloud and the hardware providers. This has a significant impact on how information flow control has to be implemented in the scenario of IT outsourcing to the cloud, because the control mechanisms have to be operated by different parties. As a result, the actions of these parties have to be coordinated (to some extent) to achieve the overall security goal: dealing with the challenge of location inhomogeneity (cf. Def. 4.6). Section 5.1.2 investigates the impact of this separation of responsibility and control and how it has to be addressed in the information model.

In document Tackling cloud compliance through information flow control (Page 123-126)