A Dynamic Virtual Machine Allocation Technique Using Network Resource Contention for a High-performance Virtualized Computing Cloud
Jung-Lok Yu¹, Chan-Ho Choi¹, Du-Seok Jin¹, Jongsuk Ruth Lee¹ and Hee-Jung Byun²*
¹National Institute of Supercomputing and Networking (NISN), KISTI, Daejeon, Korea
²Department of Information & Communications, Suwon University, Korea
¹{junglok.yu, chchoi, dsjin, jsruthlee}@kisti.re.kr, ²[email protected]
Abstract
Virtualized Computing Cloud (VCC) is a well-known resource-provisioning and computing environment because of advantages such as maximized resource utilization, isolated performance, and customizable runtimes. VCC has therefore been increasingly adopted in a broad spectrum of service domains, including Internet-based application services and computational science. However, efficient resource (i.e., virtual machine; VM) allocation techniques are required to deliver higher-quality Internet and/or computing services. In this paper, we propose a novel dynamic, self-adaptive VM allocation technique that considers network resource contention in a Xen virtualization environment. In addition, we provide a generic virtualized resources and job management framework to realize the proposed VM allocation technique. Using this framework, we also empirically analyze the impact of various system parameters and job characteristics on the performance of our technique. The results show that our approach outperforms the baseline allocation, reducing the average job response time by up to 37% and the average execution time by up to 22.3%.
Keywords: Virtualization, Computing platform, Virtual machine, Network resource contention, Performance analysis
1. Introduction
The Virtualized Computing Cloud (VCC), also known as the virtualization-based hosting platform, has been considered a promising computing environment mainly because of advantages such as high resource utilization, low cost, low energy consumption, customizable runtimes, and isolated performance [1, 2]. VCC has therefore been increasingly adopted across a broad spectrum of service domains, for example, Internet-based application services [3], education [4, 5], and computational science [6, 7].
In VCC environments, the server consolidation [8] technique is commonly used. Server consolidation means that multiple virtual machines (VMs) executing different service applications are placed together on a single physical server and share the same limited hardware resources, e.g., CPU, network, and storage. Moreover, service applications (e.g., web/database applications and computational science applications) in the consolidated server environment have different workload characteristics and, more specifically, different network communication behavior. Therefore, efficient VM allocation techniques are required to deliver higher-quality Internet and/or computing services for various application domains.
In this paper, we propose a novel, dynamic, self-adaptive VM allocation/scheduling technique that takes into account the network resource contention between physical nodes in the Xen [9, 10, 11] virtualization environment. Using a generic virtualized resources and job management framework, we also empirically analyze the impact of different system parameters and job characteristics on the performance of the proposed technique in an eight-node cloud test-bed. The results show that our approach outperforms the baseline allocation, significantly reducing the average job response and execution times.
The rest of the paper is organized as follows. The next section briefly overviews Xen virtualization and summarizes related works. In Section 3, we describe our proposed VM allocation technique as well as generic virtualized resources and the job management framework, which we have designed and implemented for performance measurement and analysis. Then, in Section 4, various performance results are analyzed. Finally, we conclude this paper in Section 5.
Figure 1. Overall Structure of Xen Virtualized Environment
2. Xen Virtualization and Related Works
VMware [12, 13] and KVM (Kernel-based Virtual Machine) [14] provide full virtualization [15] using binary code translation and emulation techniques, respectively. In general, these virtualization environments have the advantage that the OS kernel need not be modified, but their performance degrades because they must guarantee secure sharing of hardware resources among multiple VMs, which results in large virtualization overheads.
In contrast, Xen [9, 10, 11], a widely used open-source Virtual Machine Monitor (VMM), uses the para-virtualization technique to minimize these overheads through minimal OS modification. Figure 1 depicts the overall structure of a Xen virtualization environment, which consists of three components: 1) the Xen hypervisor, 2) Domain-U, and 3) Domain-0.
Figure 2. Xen’s Split Driver I/O Model
The Xen hypervisor has the ability to directly access hardware resources, such as the CPU, disk, and network, and it plays a critical role in scheduling VCPUs (i.e., virtual CPUs in VMs) onto physical CPUs and in allocating/releasing physical memory to/from VMs. The Xen hypervisor supports isolated execution among multiple VMs by executing hyper-call based trap handler code. It also provides communication mechanisms among VMs, such as event channels and I/O rings.
Domain-U (Dom-U) is a virtual machine that runs a guest OS and user applications. A Dom-U cannot directly control hardware resources; its access is permitted only through hyper-calls exported by the Xen hypervisor.
In the Xen environment, Domain-0 (Dom-0) is a specialized “control” virtual machine that supports network-I/O and disk-I/O operations in Dom-Us. Through Dom-0, Xen adopts a split driver I/O model to manage access to the network and disk devices (see Figure 2). The front-end driver of the model resides in the Dom-Us, while the back-end driver, its counterpart, resides in the control domain (Dom-0). The back-end driver forwards each device request from the front-end driver to the native device driver and returns the response to the front-end driver. Dom-0 can also (de-)multiplex device requests, such as permission to use the virtual network bridge. Although this model keeps the hypervisor as simple as possible, Dom-0 becomes a performance bottleneck because it is on the critical path of all I/O requests.
To overcome these drawbacks, [16] proposed a VMM-bypass I/O model to optimize I/O performance in VMs, together with various scheduling algorithms at the user level. In [17], workload-aware VM placement on multi-core systems is proposed and analyzed; it also uses VM migration for low power consumption whenever needed. [18] analyzed the pros and cons of VMM-bypass I/O for parallel applications running on VMs. In the next section, we describe our proposed VM (i.e., virtual cluster) allocation technique, which dynamically changes the set of physical nodes on the basis of the degree of network contention measured on both Dom-0 and Dom-Us to achieve a virtualized high-performance computing cloud.
Figure 3. Virtualized Resource and Job Management Framework
3. Network Contention-aware Virtual Machines Allocation
3.1. Virtualized Resources and Job Management Framework
We have designed and implemented a generic virtualized resources and job management framework for two purposes: to realize our proposed VM allocation technique, which is discussed in Section 3.2, and to empirically analyze the impact of different system parameters and job characteristics on the performance of the technique.
Figure 3 illustrates the virtualized resource and job management framework. It mainly consists of the following five components:
1) Host manager
The host manager is a component that manages physical servers where multiple virtual machines are provisioned.
2) VM manager
The VM manager manages VM lifecycles whenever needed, e.g., VM creation, VM destruction, and VM suspend/resume.
3) Job manager
The job manager controls job lifecycles running on a provisioned virtual cluster (i.e., a set of VMs); that is, it can submit a specific job, cancel the queued or running jobs, and check the status of jobs.
4) Job parser
The job parser is a component for analyzing job specifications requested by users. More specifically, it decides the specification of computing resources (e.g., number of nodes and memory size) to run a job and prepares the description to submit the job.
5) Scheduler (Scheduling Container)
The scheduler is a set of scheduling mechanism interfaces that allows anyone to implement and describe a detailed allocation algorithm. For each job, a scheduler must provide a {host: VMs} map.
In particular, we have separated mechanism from policy in the scheduling container layer. That is, we provide numerous mechanisms for collecting various performance factors (e.g., the CPU load and the network contention of each host) so that third-party developers can devise scheduling algorithms using these mechanism interfaces.
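The separation of mechanism and policy described above could look roughly like the following Python sketch. It is only an illustration, not the framework's actual interface: the names `HostMetrics`, `Scheduler`, `allocate`, and `LeastCpuLoadScheduler` are hypothetical, and the {host: VMs} map is assumed here to map a host name to the number of VMs placed on it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HostMetrics:
    """Per-host performance factors exposed by the (hypothetical) mechanism layer."""
    name: str
    cpu_load: float        # fraction in [0, 1]
    net_contention: float  # higher means more contention
    free_vm_slots: int     # how many more VMs the host can accept


class Scheduler(ABC):
    """Policy interface: a concrete scheduler only decides the placement."""

    @abstractmethod
    def allocate(self, num_vms: int, hosts: List[HostMetrics]) -> Dict[str, int]:
        """Return a {host: number of VMs} map that covers num_vms VMs."""


class LeastCpuLoadScheduler(Scheduler):
    """Example third-party policy: fill the hosts with the lowest CPU load first."""

    def allocate(self, num_vms: int, hosts: List[HostMetrics]) -> Dict[str, int]:
        placement: Dict[str, int] = {}
        remaining = num_vms
        for host in sorted(hosts, key=lambda h: h.cpu_load):
            if remaining == 0:
                break
            take = min(host.free_vm_slots, remaining)
            if take > 0:
                placement[host.name] = take
                remaining -= take
        if remaining > 0:
            raise RuntimeError("not enough free VM slots for the request")
        return placement
```

A different policy, such as one based on network contention, would only need to subclass `Scheduler` and read a different field of `HostMetrics`; the metric collection itself stays untouched.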
Figure 4. Network Contention-Aware Virtual Machines Allocation Technique
3.2. Network Contention-Aware Virtual Machine Allocation
As described in prior sections, it is important to understand the network resource contention of physical hosts for the enhanced performance of consolidated applications.
Figure 4 depicts the overall structure of our proposed network contention-aware VM allocation technique. For each host k in a cloud test-bed consisting of m hosts (1 <= k <= m), the degree of network contention observed at Dom-0 (NETCONT_k) is dynamically measured and periodically announced to the host manager in the virtualized resources and job management framework (see Figure 4 for more details). NETCONT_k is defined as follows:
$$\mathrm{NETCONT}_k = w \times \mathrm{CPU}_k + (1 - w) \times (1 - \mathrm{IDLE}_k), \qquad (1)$$
where CPU_k is the CPU utilization ratio of host k, which reflects how much work is needed to process network packets; IDLE_k is the idle ratio averaged over all VMs running on host k; and w is the weight factor.
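As a small worked illustration, the following Python sketch computes a contention value from the quantities defined above. It assumes one plausible reading of Eq. (1), namely a weighted sum of the CPU utilization ratio and the non-idle ratio (1 - IDLE_k), with all ratios expressed as fractions in [0, 1]. The function name `netcont` and the handling of a host without VMs are assumptions made for this example; the default w = 0.8 is the value used in Section 4.1.

```python
from typing import Sequence


def netcont(cpu_util: float, vm_idle_ratios: Sequence[float], w: float = 0.8) -> float:
    """Contention estimate for one host under the assumed weighted-sum form.

    cpu_util        -- CPU utilization ratio of the host, in [0, 1]
    vm_idle_ratios  -- idle ratio of each VM running on the host, each in [0, 1]
    w               -- weight factor, in [0, 1]
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Assumption: a host with no VMs is treated as fully idle.
    if vm_idle_ratios:
        idle = sum(vm_idle_ratios) / len(vm_idle_ratios)
    else:
        idle = 1.0
    return w * cpu_util + (1.0 - w) * (1.0 - idle)


# Host with 50% CPU utilization and two VMs that are 20% and 40% idle.
value = netcont(0.5, [0.2, 0.4])
```

Under this reading, a larger w makes the value track the packet-processing CPU cost more closely, while a smaller w gives more influence to how busy the guest VMs are.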
All the network contention information is collected and managed by the host manager. Upon a request to provision a set of VMs (i.e., a virtual cluster) and execute applications on it, our network contention-aware VM scheduler greedily selects the available physical hosts based on their NETCONT values. The VM manager handles the allocation and booting of VMs on the selected hosts, after which the requested application can be executed on the dynamically provisioned virtual cluster with minimized network contention. Specifically, we have modified the xentop [19] tool to measure NETCONT on each host. For VM provisioning and lifecycle management in the VM manager, we used the XML-RPC interfaces of the OpenNebula (version 3.2) [20] cloud management toolkit, and we used Torque (version 3.0.5), a derivative of OpenPBS [21], for executing and controlling the requested applications.
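The greedy selection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implemented scheduler: the function `select_hosts`, the notion of per-host free VM slots, and the tie-breaking by host name are all hypothetical.

```python
from typing import Dict


def select_hosts(netcont_by_host: Dict[str, float],
                 free_slots: Dict[str, int],
                 num_vms: int) -> Dict[str, int]:
    """Greedily place num_vms VMs on the hosts with the lowest contention.

    netcont_by_host -- latest contention value reported for each host
    free_slots      -- number of additional VMs each host can accept (assumed input)
    Returns a {host: number of VMs} map.
    """
    placement: Dict[str, int] = {}
    remaining = num_vms
    # Visit hosts from least to most contended; ties are broken by name
    # so that the result is deterministic.
    ordered = sorted(netcont_by_host, key=lambda h: (netcont_by_host[h], h))
    for host in ordered:
        if remaining == 0:
            break
        take = min(free_slots.get(host, 0), remaining)
        if take > 0:
            placement[host] = take
            remaining -= take
    if remaining > 0:
        raise RuntimeError("request exceeds the available capacity")
    return placement


# Three hosts; the request for three VMs avoids the most contended host "a".
plan = select_hosts({"a": 0.7, "b": 0.1, "c": 0.3}, {"a": 4, "b": 2, "c": 2}, 3)
```

Because the contention values are refreshed periodically, the same request may map to a different set of hosts at a later time, which is what makes the allocation dynamic.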
Figure 5. Experimental Cloud Testbed
4. Performance Evaluation
4.1. Experimental Environment
To measure and analyze the performance impact of the proposed technique, we built an experimental cloud test-bed. Figure 5 shows the test-bed, which consists of a front-end host, eight compute hosts, and an NFS (Network File System) server. The NFS server provides a storage volume (100GB) shared across all the hosts and another volume (200GB) for storing VM images and virtual block storage. The front-end host, on which the OpenNebula and Torque toolkits are installed and configured, has eight 2.5GHz Intel Xeon CPUs, 16GB of physical memory, and a 1Gbps Ethernet NIC (Network Interface Card). The eight compute hosts have the same hardware specifications as the front-end host and run Xen 4.0.1 to provide VM provisioning. Our virtualized resources and job management service runs on the front-end host and schedules VMs and jobs using the dynamically measured and collected NETCONT values. The weight factor w is set to 0.8.
4.2. Workloads
For experimental workloads, we generated synthetic workloads using the NAS Parallel Benchmark (NPB) [22] version 2.4 EP, BT, SP, MG, LU, FT, CG, and IS kernels (classes A and B) to mimic a set of consolidated applications that have massive inter-process communication with Nearest Neighbor (NN) and All-to-All (AA) communication patterns (see Table 1). The communication intensity of the NPB kernels is ordered as (EP < BT) < (SP < MG < LU < FT) < (CG < IS). Each job in the workloads requires a randomly chosen 2, 4, 8, 9, 16, 25, or 32 CPU cores. A hyper-Erlang distribution [23] is used to model job arrivals. As performance metrics, we used the average job response time (avg. job waiting time + avg. job execution time), the normalized execution time, and the normalized completion time.
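A synthetic workload of this kind can be generated along the following lines. The Python sketch below is illustrative only: the branch probabilities, stage counts, and rates in `DEFAULT_BRANCHES` are hypothetical values, not the parameters used in the experiments, and the `Job` record and function names are introduced for this example. A hyper-Erlang variate is drawn by first picking one Erlang branch at random and then summing that branch's exponential stages.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

KERNELS = ["EP", "BT", "SP", "MG", "LU", "FT", "CG", "IS"]
CORE_CHOICES = [2, 4, 8, 9, 16, 25, 32]

# Hypothetical parameters: (branch probability, number of stages, rate per stage).
DEFAULT_BRANCHES = [(0.7, 2, 0.05), (0.3, 4, 0.01)]


@dataclass
class Job:
    arrival: float  # seconds since the start of the workload
    kernel: str     # NPB kernel name
    cores: int      # requested CPU cores


def hyper_erlang(rng: random.Random,
                 branches: Sequence[Tuple[float, int, float]]) -> float:
    """Draw one inter-arrival time from a mixture of Erlang distributions."""
    u = rng.random()
    cumulative = 0.0
    # Fall back to the last branch in case of floating-point rounding.
    stages, rate = branches[-1][1], branches[-1][2]
    for prob, k, lam in branches:
        cumulative += prob
        if u < cumulative:
            stages, rate = k, lam
            break
    # An Erlang(k, lam) variate is the sum of k exponential variates with rate lam.
    return sum(rng.expovariate(rate) for _ in range(stages))


def generate_workload(num_jobs: int,
                      branches: Sequence[Tuple[float, int, float]] = DEFAULT_BRANCHES,
                      seed: int = 0) -> List[Job]:
    """Create num_jobs jobs with hyper-Erlang arrivals and random kernel/core choices."""
    rng = random.Random(seed)
    now = 0.0
    jobs = []
    for _ in range(num_jobs):
        now += hyper_erlang(rng, branches)
        jobs.append(Job(now, rng.choice(KERNELS), rng.choice(CORE_CHOICES)))
    return jobs


workload = generate_workload(100, seed=42)
```

Scaling the rates up or down changes the average inter-arrival time, which is one way to produce workloads with lighter or heavier system load from the same job mix.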
Table 1. 5 NPB Kernel Characteristics (CLASS=A, 16 Cores)
4.3. Performance Results
4.3.1. Performance Impact of Domain-0’s Load: As explained in Section 2, Domain-0 in Xen becomes a performance bottleneck because it is on the critical path of all the network and disk I/O requests of guest VMs.
Figure 6 shows the performance of the NPB applications under different Domain-0 loads. In this experiment, we compare the normalized execution times of NPB kernels (EP, MG, LU, FT, CG, IS) when Domain-0 is in one of three load states: DEDICATED (i.e., Domain-0 running no tasks), CPU_LOAD (i.e., Domain-0 running CPU-intensive tasks), and NETWORK_LOAD (i.e., Domain-0 running communication-intensive tasks). For the workload, we use CLASS B kernels and assume that every kernel requires eight CPU cores.
As shown in Figure 6, when Domain-0 runs CPU-intensive tasks, the normalized execution times of the NPB kernels are, as expected, almost the same as those of DEDICATED. With NETWORK_LOAD, however, the execution times of the NPB kernels increase significantly. This result clearly shows that Domain-0’s network load has a crucial impact on the performance of guest VMs’ I/O operations. We also observe that, for applications with low communication intensity (like EP), the execution time differs little across the Domain-0 load states. However, as the communication intensity of the kernels increases (EP < MG < IS), the performance degradation becomes more pronounced. For example, the normalized execution time of IS under NETWORK_LOAD is as much as 13 times that of DEDICATED.
Figure 6. NPB Performances with Different Domain-0 Loads
4.3.2. Performance Impact of System Loads: Figure 7 compares the performance of stripping (i.e., simply selecting nodes in round-robin fashion) and our proposed technique under various system loads. For this experiment, we ran five different types of workloads (LOW, LOW-MEDIUM, MEDIUM, MEDIUM-HIGH, and HIGH) with different average inter-arrival times, each consisting of a mix of 100 NPB kernel applications with the class A problem size.
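For reference, the stripping baseline can be sketched as a plain round-robin walk over the host list. The Python code below is an assumption-laden illustration: the function name `stripping_allocate`, the one-VM-per-step granularity, and the `start` cursor are hypothetical details, since the text only states that nodes are selected in round-robin fashion.

```python
from typing import Dict, List, Tuple


def stripping_allocate(hosts: List[str], num_vms: int,
                       start: int = 0) -> Tuple[Dict[str, int], int]:
    """Assign VMs to hosts one at a time in round-robin order.

    Returns the {host: number of VMs} map and the cursor position at which
    the next request would continue.
    """
    if not hosts:
        raise ValueError("host list must not be empty")
    placement: Dict[str, int] = {}
    for i in range(num_vms):
        host = hosts[(start + i) % len(hosts)]
        placement[host] = placement.get(host, 0) + 1
    return placement, (start + num_vms) % len(hosts)


# Two consecutive requests: the second continues where the first stopped.
first, cursor = stripping_allocate(["h1", "h2", "h3"], 4)
second, cursor = stripping_allocate(["h1", "h2", "h3"], 2, start=cursor)
```

Note that this policy never consults any load or contention metric, so a host whose Domain-0 is already busy with network traffic is chosen as readily as an idle one.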
As shown in Figure 7, the proposed technique significantly outperforms the stripping baseline, reducing both the average job response time (by up to 37%) and the average execution time (by up to 20%) at HIGH system load. At LOW system load, the performance gain of our approach over stripping is negligible, but the gain becomes more pronounced as heavier load is applied to the system. This is mainly because our technique steers allocation toward physical nodes with less network pressure or contention (see an allocation example in Figure 8), thereby reducing the message communication latency of the kernel applications.
Figure 8. An Example of VMs Allocation Considering Network Contention
4.3.3. Performance Impact of Different Job Characteristics: Next, we focus on how the performance of the different VM allocation techniques varies when the communication intensities and patterns of the workloads change.
Figure 9 compares the performance of the stripping approach and our proposed technique for different job characteristics. Note that EP, MG, and IS are different types of jobs in terms of computation granularity, communication pattern, and message size distribution. For this experiment, we ran three different types of workloads (i.e., a set of EPs, a set of MGs, and a set of ISs), each consisting of 50 identical jobs that require a randomly chosen 2, 4, 8, 9, 16, 25, or 32 CPU cores and arrive at time 0.
As shown in Figure 9, for an application with low communication intensity and an NN communication pattern (i.e., EP), the completion time differs little between the VM allocation approaches, although our technique performs marginally better. As the communication intensity of the workloads increases (EP < MG < IS), however, the performance benefit of our technique over stripping becomes more pronounced. For example, for IS, which has high communication intensity and an AA pattern, our approach outperforms stripping by up to 22.3%.
From all the results discussed in the prior subsections, we confirm that 1) differences in network contention among hosts are common in a virtualized computing cloud and should therefore be carefully handled when allocating and scheduling VMs to improve system utilization, and 2) significant performance improvement can be achieved with the proposed network contention-aware approach.
Figure 9. Performance Comparison with Different Job Characteristics
5. Conclusions and Future Work
We have proposed a novel VM allocation technique that considers network resource contention in a Xen virtualization environment. Using a unified virtualized resources and job management framework, we have also empirically analyzed the impact of various system parameters and job characteristics on the performance of our technique. From the results, we confirm that differences in network resource contention are common in a virtualized computing cloud and should be considered when scheduling VMs to improve system utilization, and that significant performance improvement can be achieved with the proposed network contention-aware approach. We plan to experiment with a large-scale cloud test-bed and to extend our work to VM co-scheduling [24, 25] toward high-performance clouds.
Acknowledgements
This research was supported by both the EDISON※ Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. NRF-2011-0020576) and the KISTI Program (No. K-14-L01-C02-S03).
※ EDISON: Education-research Integration through Simulation on the Net.
References
[1] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis and A. Vakali, IEEE Internet Computing, vol. 13, no. 5, (2009).
[2] A. Gupta, “Cloud computing growing interest and related concerns”, Proceedings of the 2nd International Conference of Computer Technology and Development, (2010) November 2-4, Cairo, Egypt.
[3] D. Sanderson, O'Reilly Media Publishers 2nd Edition, (2012).
[4] H.-G. Kim, J.-L. Yu, D.-S. Jin, H. Ryu and J. R. Lee, “A Scientific Workflow System for Computational Science and Engineering: EDISON SimFlow”, Proceedings of 2014 International Conference on Future Information and Communication Engineering, (2014) June 26-28, Kowloon, Hong Kong.
[5] J.-L. Yu, C.-H. Choi, D.-S. Jin, J. R. Lee and H.-J. Byun, “Network-aware Virtual Machines Allocation Technique for High Performance Cloud”, Proceedings of 2014 International Conference on Future Information and Communication Engineering, (2014) June 26-28, Kowloon, Hong Kong.
[6] G. Juve, M. Rynge, E. Deelman, J.-S. Vockler and G. B. Berriman, IEEE Computing in Science & Engineering, vol. 15, no. 4, (2013).
[7] P. Mehrotra, J. Djomehri, S. Heistand, R. Hood, H. Jin, A. Lazanoff, S. Saini and R. Biswas, “Performance Evaluation of Amazon EC2 for NASA HPC Applications”, (2012) June 18-22, Delft, Netherlands.
[8] Y. Song, Y. Zhang, Y. Sun and W. Shi, “Utility analysis for Internet-oriented server consolidation in VM-based data centers”, Proceedings of 2009 International Conference on Cluster Computing and Workshops, (2009) August 31, New Orleans, LA.
[9] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt and A. Warfield, “Xen and the Art of Virtualization”, Proceedings of the ACM Symposium on Operating Systems Principles, (2003) October 19-22, New York, USA.
[10] X. Pu, M. Liu, J. Jin and Y. Cao, “A modeling of network I/O efficiency in Xen virtualized clouds”, Proceedings of 2011 International Conference on Electronics, Communications and Control, (2011) September 9-11, Ningbo.
[11] “Xen”, http://www.xen.org.
[12] C. A. Waldspurger, “Memory Resource Management in VMware ESX Server”, Proceedings of the Fifth Symposium on Operating Systems Design and Implementation (OSDI), (2002) December 9-11, Boston, MA.
[13] K. Adams and O. Agesen, “A Comparison of Software and Hardware Techniques for x86 Virtualization”, Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems, (2006) October 21-25, San Jose, California, USA.
[14] A. Kivity, Y. Kamay, D. Laor, U. Lublin and A. Liguori, “kvm: the Linux Virtual Machine Monitor”, Proceedings of the Linux Symposium, (2007) June, Ottawa, Ontario, Canada.
[15] “VMware Editor, Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, VMware white paper, (2007).
[16] W. Huang, J. Liu, B. Abali and D. K. Panda, “A case for high performance computing with virtual machines”, Proceedings of the 20th annual international conference on Supercomputing, (2006) June 28-July 1, Queensland, Australia.
[17] I. Jo, I. Y. Jung and H. Y. Yeom, International Journal on Computer Science and Engineering (IJCSE), vol. 3, no. 11, (2011).
[18] Y. Mei, L. Liu, X. Pu and S. Sivathanu, “Performance Measurements and Analysis of Network IO Applications in Virtualized Cloud”, Proceedings of IEEE third International Conference on Cloud Computing, (2010) July 5-10, Miami, Florida, USA.
[19] “XenTop”, http://fossies.org/dox/xen-4.4.0/dir_072a5daea518cf86ca8ae3c46bc95e80.html.
[20] “OpenNebula”, http://opennebula.org/.
[21] “OpenPBS (Torque)”, http://www.adaptivecomputing.com/products/open-source/torque.
[22] “NAS Parallel Benchmarks”, http://www.nas.nasa.gov/publications/npb.html.
[23] Y. Fang, “Wireless Networks”, Springer, vol. 7, (2001).
[24] J.-L. Yu and H.-J. Byun, IEICE Transactions on Information and System, vol. 94-D, (2011).
[25] O. Sukwong and H. S. Kim, “Is co-scheduling too expensive for SMP VMs?”, Proceedings of the sixth conference on Computer systems, (2011) April 10-13, Salzburg, Austria.
Authors
Jung-Lok Yu received his Ph.D. degree from KAIST, Daejeon, Korea, in 2007. He was a senior engineer at Samsung Electronics from 2007 to 2010. He is currently a senior researcher with the Supercomputing Center at the Korea Institute of Science and Technology Information. His research interests include parallel processing, cluster computing, and cloud computing.
Chan-Ho Choi received his M.S. degree from Seoul National University, Seoul, Korea, in 2013. He is currently a researcher with the Supercomputing Center at the Korea Institute of Science and Technology Information. His research interests include distributed computing and cloud computing.
Du-Seok Jin is a senior researcher at the National Institute of Supercomputing and Networking at the Korea Institute of Science and Technology Information. He received his M.S. degree in Computer Science from Chonbuk National Univ., Republic of Korea, in 2001 and his Ph.D. degree in Computer Science from Paichai Univ., Republic of Korea, in 2011. His research interests include information retrieval systems, cloud storage, and distributed file systems.
Jongsuk Ruth Lee is a principal researcher at the National Institute of Supercomputing and Networking at the Korea Institute of Science and Technology Information and an adjunct faculty member at the Univ. of Science and Technology of Korea. Her research interests are smart learning, parallel/distributed computing, and big data handling. She received her Ph.D. degree in computer science from the Univ. of Canterbury, New Zealand.
Hee-Jung Byun received her Ph.D. degree from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2005. She was a senior engineer at Samsung Electronics, Ltd. from 2007 to 2010. She is currently an assistant professor with the Department of Information & Communications, Suwon University, Kyunggi-do, Korea. Her research interests include network modeling, controller design, and performance analysis.