Virtualization - Data Protection Issue - Grid Computing Security pdf

7.2 Data Protection Issue

7.2.2 Virtualization

A typical data center today hosts different applications on different servers, resulting in over-provisioning of resources and low utilization. Therefore, for some time there has been a move towards consolidating servers to increase the overall utilization of data centers. Research and development in the area of server consolidation has resulted in virtualization solutions. These solutions typically allow applications to run in self-contained environments called virtual machines (VMs). It is possible to create different instances of VMs on individual servers, resulting in a better provisioning environment and higher overall utilization. Not only can different VM instances run on the same server, but these instances can also host completely different operating systems. Therefore, virtualization techniques allow legacy systems to run on new systems seamlessly. In addition to these advantages, a by-product of virtualization is isolation: virtualization techniques allow the creation of secure environments and can be used as an isolation solution. It is to be noted that the main goal of virtualization solutions is to provide higher resource utilization and server consolidation; the ability to provide secure and isolated environments comes as a by-product. Therefore, there is a need to create flexible policies on the virtualized environment. Research is currently being carried out in this regard [122].

To provide virtualization, there is a need for a layer of software which provides the illusion of a real machine to multiple instances of virtual machines. This layer has traditionally been called the Virtual Machine Monitor (VMM). Two related concepts are the host operating system and the guest operating system. The former is the operating system (OS) which hosts the VMM, and the latter is the operating system which is hosted on top of the VMM. It is also possible for the VMM to run directly on the hardware. In that case, a host operating system is not required, and the VMM acts as a minimal OS. There are three popular virtualization technologies: hosted virtualization, para-virtualization, and shared kernel based virtualization.

• The Hosted Virtualization model is one where the VMM and the guest OS run in the user space of the host OS. The applications running on the host OS and the guest OS share the same user space. Generally, this model does not require any modification to the host OS. However, since there are multiple redirections, the performance of such a model suffers significantly. VMware® GSX Server is an example of a hosted virtualization system.

• The Para-Virtualization model is one where the operating systems are modified and recompiled so that the multiple redirections of the hosted model can be avoided. The performance of para-virtualization based systems is comparatively better than that of hosted virtualization based systems. Xen [123] and Virtuozzo® [124] are examples of para-virtualization systems.

• The Shared Kernel systems are those systems where the kernel is shared and the user space is partitioned to be used by different sets of applications. An example of shared kernel based virtualization systems is the Linux VServer [125].

Hosted Virtualization Model

The hosted virtualization model allows multiple guest Operating Systems (OS) to be run in the user space of the host OS. Figure 7.3 provides an overview of the hosted virtualization model. As shown in the figure, App1, App2, and the Guest OS share the user space. Applications running in the guest OS are sandboxed within the guest OS. The applications within the guest OS reach the hardware only by being redirected through the virtualization layer, which reduces the performance of the overall system.

Fig. 7.3. Overview of a hosted virtualization model

VMware® GSX Server – A Hosted Virtualization Solution: One of the most popular hosted virtualization solutions is VMware's® GSX Server. Similar to the other hosted virtualization models, VMware® GSX Server has a host operating system, and guest operating systems which run as applications on the host operating system. VMware® Workstation's hosted architecture also includes a user-level application (VMApp), a device driver (VMDriver) for the host system, and a virtual machine monitor (VMM) that is created by VMDriver as it loads. Thereafter, an execution context can be of two types, native or virtual. The former context belongs to the host, and the latter belongs to the virtual machine. The VMDriver is responsible for switching this context. I/O initiated by a guest system is trapped in the VMM and forwarded to the VMApp, which executes in the host's context and performs the I/O using the normal system calls. VMware® uses numerous optimizations that reduce various virtualization overheads. In spite of the optimizations, the overheads introduced by the GSX Server can be significantly high, depending on the application. This led to the development of another model of virtualization called the para-virtualization model. However, it is to be noted that the hosted virtualization model, in spite of its performance penalty, is extremely popular as it provides an easy solution to the virtualization problem.
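The I/O path described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: the class name HostedVM, its methods, and the counting of context switches are hypothetical illustrations and do not reflect VMware's actual code or interfaces. It shows why each guest I/O request costs two context switches (virtual to native and back) in a hosted design.

```python
from dataclasses import dataclass, field


@dataclass
class HostedVM:
    """Toy model of the hosted I/O path: guest -> VMM -> VMApp -> host OS."""
    context: str = "native"        # "native" (host) or "virtual" (guest)
    world_switches: int = 0        # number of context switches performed
    host_log: list = field(default_factory=list)

    def _switch(self, target: str) -> None:
        # Stands in for the VMDriver switching between the two contexts.
        if self.context != target:
            self.context = target
            self.world_switches += 1

    def run_guest(self) -> None:
        # Enter the virtual context so that guest code can execute.
        self._switch("virtual")

    def guest_io(self, request: str) -> str:
        # 1. The guest issues I/O in the virtual context; the VMM traps it.
        assert self.context == "virtual", "guest code runs in the virtual context"
        # 2. The request is forwarded to the VMApp, which runs in the host context.
        self._switch("native")
        # 3. The VMApp performs the I/O with a normal host system call (simulated).
        self.host_log.append(request)
        result = f"done:{request}"
        # 4. Control returns to the guest.
        self._switch("virtual")
        return result
```

For example, after run_guest() followed by three guest_io() calls, world_switches is 7: one switch to enter the guest plus two per request. The per-request cost is the redirection overhead that the text attributes to the hosted model.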

Para-virtualization System

The hosted virtualization model mentioned before is one of the easiest mechanisms to achieve isolation through virtualization. However, the ease of managing hosted operating systems comes at a price: performance. Since a hosted model involves multiple redirections of system calls, performance suffers. In addition, the virtualization layer must manage all the underlying hardware structures, such as DMA controllers, page tables, and I/O devices, to provide a consistent view to all the operating systems hosted by the model. Whenever the virtualization layer context switches between the different OS images, it first needs to preserve the current state of the hardware structures so that it can be restored when execution resumes. Managing these structures imposes an overhead on hosted virtualization models that is sometimes quite significant.

To counter the problems of hosted virtualization mentioned above, researchers have developed another virtualization model called the para-virtualization model. This model introduces the concept of an idealized hardware interface which completely abstracts the underlying hardware infrastructure. The virtualization layer, or hypervisor, is embedded in the address space of each guest OS, so whenever the guest OS needs to update a hardware structure, it makes an API call to the hypervisor. Therefore, the hypervisor is able to keep track of all changes to the hardware data structures, which helps it to make optimal decisions about updating the structures during context switching. It can also provide run-time specific information to the guest OSes, enabling them to make better scheduling decisions. Based on these, the para-virtualization solutions have several distinct benefits:

• Performance of the para-virtualization solutions is significantly better than that of the hosted virtualization model. Since the hypervisor is embedded with the guest operating systems, there are fewer redirections. In addition, information is exchanged much faster between the guest OS and the underlying hardware abstraction layer, resulting in better management of the guest OSes.

• Para-virtualization solutions provide significant benefits in terms of device drivers and device interfaces. Para-virtualization allows the virtualization of device drivers. It also helps to provide CPU resource guarantees and to port OS images across hardware.

• Para-virtualization offers better protection to the hypervisor compared to the hosted virtualization model. Since the hypervisor is run in a different protection domain compared to device drivers, it is protected from bugs and crashes of the device drivers.
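The explicit-call mechanism behind these benefits can be sketched as follows. This is a minimal illustration under stated assumptions, not Xen's or any other hypervisor's real interface: the class Hypervisor and the methods hypercall_update and context_switch_out are hypothetical names. The point is that because every update arrives as an explicit call, the hypervisor knows exactly which structures changed and needs to preserve only those at a context switch.

```python
class Hypervisor:
    """Toy hypervisor that guests call explicitly to update hardware structures."""

    def __init__(self):
        self.state = {}   # guest id -> {structure name: current value}
        self.dirty = {}   # guest id -> structures changed since the last switch

    def hypercall_update(self, guest, structure, value):
        # The guest asks the hypervisor to perform the update on its behalf,
        # so the hypervisor can record that this structure has changed.
        self.state.setdefault(guest, {})[structure] = value
        self.dirty.setdefault(guest, set()).add(structure)

    def context_switch_out(self, guest):
        # Only structures known to have changed need to be preserved.
        to_save = sorted(self.dirty.get(guest, set()))
        self.dirty[guest] = set()
        return to_save
```

If a guest updates its page table twice and a timer once, context_switch_out reports only those two structures, and an immediate second switch reports nothing. A layer without this knowledge would have to preserve every structure on every switch.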

In spite of the above benefits, the market share of the para-virtualization solutions is much smaller than that of the hosted virtualization solutions. Though the performance of para-virtualization solutions is better, they do not work across different platforms. The para-virtualization model requires the hypervisor to be embedded into the address space of the guest OS. To achieve this, the guest OS needs to be recompiled. This can be done for open source operating systems like Linux. However, for closed operating systems like Windows, para-virtualization solutions are currently not available. With chip vendors developing chips which support virtualization, like Intel's® Virtualization Technology (VT) [126] and AMD's® Pacifica [127], the applicability of the para-virtualization solutions will be greatly enhanced. Experts in the field of virtualization are confident that para-virtualization techniques will be the future of virtualization.

Xen – A Para-Virtualization Solution: One of the most popular para-virtualization solutions is Xen, which was initially developed at the University of Cambridge. Currently, Xen is marketed by a company called XenSource®, founded by the leader of the Xen project, Dr. Ian Pratt. The market share of the Xen software is increasing in the virtualization space, where VMware® has a significant presence.

Xen presents all the benefits of the para-virtualization systems. It is fast and has a very low overhead on the overall performance of the system. This is achieved by storing the hardware states in memory and managing them efficiently. Figure 7.4 shows the high level architecture of the Xen para-virtualization system. Because of its open-source nature and good performance, Xen is even witnessing commercial implementations. One of the biggest impediments to the wide-scale adoption of Xen and other para-virtualization solutions is the lack of virtualization capabilities in the popular IA-32 architecture. As mentioned in [128], the architecture has at least 17 instructions which make it "non-virtualizable." As a result, guest kernels must be recompiled to make them aware of the Xen virtualization features. However, Intel® is coming up with Intel VT, which will introduce virtualization features into the processors along with better management capabilities. The technology will push the adoption of Xen further.

Fig. 7.4. Architecture of Xen

Nova – On Demand Virtual Execution Environments: One of the main challenges in having virtualization solutions cater to the isolation needs of grid systems is to have a policy manager which interacts with the virtualized environment. One of the key requirements of such a policy manager would be to create a virtualized execution environment on demand, based on the policies and incoming job requests. Nova [129] provides such a facility for grid systems. The goals of Nova are (a) to reduce the time required to get a "working" virtual machine, (b) to ensure that the virtual machine allocated to the grid job has the necessary hardware and software resources to perform the job, (c) to perform effective clean-up of the virtual machine once the job is complete, and (d) to ensure that the effect of a completed job does not spill over to a future job. Nova addresses these goals by creating, in advance, virtual machines with configurations that consume very few resources, called "Tiny VMs". Nova has been built on top of the Xen system. The authors have shown that Nova is able to create virtual machines in a time on the order of a few milliseconds. It is to be noted that the solution is research in progress, and significant effort is needed before it can be deployed effectively in enterprises.
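The idea of preparing virtual machines in advance can be illustrated with a simple pool. This is a minimal sketch and not Nova's implementation: the class TinyVMPool, its methods, and the dictionary used to represent a VM are all hypothetical. It assumes that resizing a pre-created VM is cheap, and that a used VM is discarded and replaced rather than reused, which is one simple way to meet goals (c) and (d).

```python
from collections import deque


class TinyVMPool:
    """Toy pool of pre-created, minimally configured virtual machines."""

    def __init__(self, size):
        self._next_id = 0
        self._idle = deque(self._new_tiny_vm() for _ in range(size))

    def _new_tiny_vm(self):
        # A "Tiny VM" starts with almost no resources assigned (hypothetical model).
        vm = {"id": self._next_id, "cpus": 0, "memory_mb": 0, "job": None}
        self._next_id += 1
        return vm

    def allocate(self, job, cpus, memory_mb):
        # Goal (a): hand out a VM that already exists instead of building one.
        vm = self._idle.popleft() if self._idle else self._new_tiny_vm()
        # Goal (b): grow it to the resources the job requires.
        vm.update(cpus=cpus, memory_mb=memory_mb, job=job)
        return vm

    def release(self, vm):
        # Goals (c) and (d): the used VM is never handed out again; a fresh
        # Tiny VM takes its place so nothing spills over to a future job.
        self._idle.append(self._new_tiny_vm())

    def idle_count(self):
        return len(self._idle)
```

With a pool of two, allocating a VM for one job leaves one idle VM; releasing it restores the pool to two, and the next allocation returns a different VM from the one just released.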

Shared Kernel Systems

The third type of virtualization system is the shared kernel system. An example of such a system is the Linux VServer. The basic concept of the Linux VServer and other shared kernel systems is to divide the user space environment into distinctly separate units, also called Virtual Private Servers (VPS), in such a way that the processes within each VPS behave as if they were running on a separate kernel. The shared kernel systems are very efficient compared to the other virtualization technologies. However, flexibility is greatly reduced: since all the applications use a shared kernel, these systems tend to work with a single operating system. For example, the Linux VServer runs exclusively on Linux.

Shared kernel systems are able to achieve the following benefits:

• Higher Resource Utilization: One of the biggest advantages of the shared kernel systems is the increase in resource utilization. Proper allocation of resources across the partitions, together with the ability to share common resources across the partitions, helps in increasing the utilization levels. The Linux VServer implementation uses a token bucket scheme to achieve fairness across the different partitions or contexts.

• Security: The shared kernel based virtualization systems have high security as they can isolate the different contexts in an efficient and secure manner.

• Low Overhead: The overhead associated with the shared kernel systems is very low as they do not pass through multiple layers unlike the other virtualization systems. As mentioned in [130], the overhead of a Linux VServer system can be as low as 2%.
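The token bucket scheme mentioned under resource utilization can be sketched generically. This is a textbook token bucket driven by timer ticks, offered as a minimal sketch and not as the Linux VServer scheduler code; the class name and the parameters fill_rate, interval, and capacity are hypothetical. Each context owns a bucket, running for one tick costs one token, and tokens are refilled at a fixed rate up to a cap, so no context can monopolize the CPU for long.

```python
class TokenBucket:
    """Generic tick-driven token bucket for one context (partition)."""

    def __init__(self, fill_rate, interval, capacity, tokens=0):
        self.fill_rate = fill_rate   # tokens added every `interval` ticks
        self.interval = interval     # number of ticks between refills
        self.capacity = capacity     # maximum tokens the bucket can hold
        self.tokens = tokens         # tokens currently available
        self._ticks = 0

    def tick(self):
        # Advance one timer tick and refill when a full interval has elapsed.
        self._ticks += 1
        if self._ticks % self.interval == 0:
            self.tokens = min(self.capacity, self.tokens + self.fill_rate)

    def try_run(self):
        # Spend one token for one tick of CPU time; refuse if the bucket is empty.
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False
```

With fill_rate=1 and interval=4, a context is entitled to roughly a quarter of the CPU over the long run, while capacity bounds how large a burst it can accumulate while idle.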

