While vFlood, vPRO, and vPipe provide effective solutions to mask the virtualization-induced overheads and latency in I/O processing of VMs, they each have limitations, which pave the way for many possible future extensions.
Interplay with Emerging Hardware
A few techniques have been proposed to give VMs direct access to specialized hardware (e.g., use of IOMMU-based SR-IOV in Xen 4.0 and VMDirectPath [91] in VMware vSphere [92]). These techniques lower the device virtualization overhead by bypassing the driver domain. However, the VMs are still subject to VM scheduling delays, which in turn delay I/O processing by the VMs.
In such settings, we envision implementing vPRO (except the vFlood VM module) in the hardware itself, thus eliminating the VMM overheads completely. We believe that the vPRO state machine described in Section 2.3 and Section 3.2 should lend itself to a scalable hardware implementation.
vPipe faces a different challenge in the presence of VMM-bypass I/O devices. These devices bypass the driver domain, which makes it difficult to route packets toward the shadow socket or to access the VM’s storage device interface at the driver domain to carry out the vPipe operations. Gordon et al. [59] also discuss the disadvantages of such direct-access devices when interposition by the driver domain is necessary to perform functions such as firewalling and rate limiting. One possible extension to vPipe for this situation would be to program the hardware to forward the packets of a matching connection to the driver domain, or to program the storage controller so that the driver domain has access to the storage device, even though current hardware does not support these features.
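The connection-matching idea can be illustrated with a minimal software sketch. This is only a model of the steering decision, not a description of any existing device interface: the names FlowKey, SteeringTable, register, unregister, and route are hypothetical, and the assumption is that a device could keep a small table of connections whose packets must be diverted to the driver domain while all other traffic continues to reach the VM directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical 4-tuple identifying one TCP connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SteeringTable:
    """Toy model of a per-connection steering table (an assumption,
    not a feature of current hardware)."""

    def __init__(self):
        self._offloaded = set()

    def register(self, key):
        # Called when an offloaded operation starts on this connection.
        self._offloaded.add(key)

    def unregister(self, key):
        # Called when the offloaded operation completes.
        self._offloaded.discard(key)

    def route(self, key):
        # Matching connections go to the driver domain; everything
        # else keeps the direct (bypass) path to the VM.
        return "driver_domain" if key in self._offloaded else "vm_direct"
```

In this sketch the default path stays the bypass path, so only connections that are explicitly registered pay the cost of going through the driver domain.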
Interacting with I/O Schedulers at VMM Layer
As we discussed in Chapter 5, some recent efforts have focused on improving the I/O scheduler in the VMM or the driver domain so that cloud providers can offer QoS guarantees on I/O to their customers. Current designs of both vFlood and vPRO are agnostic to the I/O scheduler running in the driver domain. This approach has both advantages and disadvantages. The advantage is that vFlood and vPRO are applicable to the virtualization platform regardless of the I/O scheduling algorithm. However, the actions taken by either vFlood or vPRO might violate QoS guarantees provided by the VMM because both of them are unaware of the QoS rules of the VMM. As an example, the buffer management policies used by vFlood and vPRO may not allow the VM to send more packets than the available space in the vFlood buffer; however, the VMM I/O scheduler may allow the VM to send more packets at that time in order to meet the QoS guarantees. Solving such issues requires coordination between vPRO and the VMM’s I/O scheduler.
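The mismatch in the example above can be made concrete with a small sketch. It is a toy model under stated assumptions, not the vFlood or vPRO buffer policy: the function names and the qos_allowance parameter are hypothetical, and the coordinated policy shown (treat the scheduler's allowance as the bound, offload what fits in the buffer, and send the rest through the regular path) is only one possible form of coordination.

```python
def admit_uncoordinated(requested, buffer_free):
    """Buffer-only policy: the VM may send no more packets than the
    free space in the offload buffer, regardless of QoS rules."""
    return min(requested, buffer_free)


def admit_coordinated(requested, buffer_free, qos_allowance):
    """Hypothetical coordinated policy.

    The I/O scheduler's allowance bounds the total. Packets that fit
    in the offload buffer are offloaded; the remainder of the
    allowance goes through the regular (non-offloaded) path.
    Returns (offloaded, passthrough).
    """
    total = min(requested, qos_allowance)
    offloaded = min(total, buffer_free)
    passthrough = total - offloaded
    return offloaded, passthrough
```

For instance, with 10 packets requested, 4 free buffer slots, and a scheduler allowance of 8, the buffer-only policy admits 4 packets, while the coordinated policy admits all 8 permitted by the scheduler (4 offloaded and 4 through the regular path).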
vPipe, on the other hand, takes into account the sharing of I/O resources and the driver domain’s CPU and implements its own fairness mechanism. However, we believe that there is more work to be done in this area, specifically on how vPipe can adhere to the QoS guarantees provided by the VMM I/O scheduler for the offloaded I/O operations.
LIST OF REFERENCES
[1] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Berkeley, CA, 2009.
[2] Amazon Elastic Compute Cloud (EC2). http://aws.amazon.com/ec2. Accessed July 2013.
[3] GoGrid. http://www.gogrid.com. Accessed March 2010.
[4] Windows Azure: Microsoft’s cloud platform. http://www.windowsazure.com/en-us/. Accessed August 2013.
[5] Amazon EC2 instance types. http://aws.amazon.com/ec2/instance-types/. Accessed July 2013.
[6] C. Clark, K. Fraser, S. Hand, and J. G. Hansen. Live migration of virtual machines. In Proceedings of the 2nd USENIX Conference on Networked Systems Design and Implementation, NSDI ’05, pages 273–286, 2005.
[7] Carl Waldspurger and Mendel Rosenblum. I/O virtualization. Communications of the ACM, 55(1):66–73, 2012.
[8] Jeremy Sugerman, Ganesh Venkitachalam, and Beng-Hong Lim. Virtualizing I/O devices on VMware Workstation’s hosted virtual machine monitor. In Proceedings of the USENIX Annual Technical Conference, ATC ’01, pages 1–14, 2001.
[9] Aravind Menon, Alan L. Cox, and Willy Zwaenepoel. Optimizing network virtualization in Xen. In Proceedings of the USENIX Annual Technical Conference, ATC ’06, pages 15–28, 2006.
[10] Aravind Menon and Willy Zwaenepoel. Optimizing TCP receive performance. In Proceedings of the USENIX Annual Technical Conference, ATC ’08, pages 85–98, 2008.
[11] Yaozu Dong, Zhao Yu, and Greg Rose. SR-IOV networking in Xen: architecture, design and implementation. In Proceedings of the 1st Workshop on I/O Virtualization, WIOV ’08, pages 10–10, 2008.
[12] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th USENIX Conference on Operating Systems Design and Implementation, OSDI ’04, pages 10–10, 2004.
[13] MPICH: High-performance portable MPI. http://www.mpich.org. Accessed March 2010.
[14] Diwaker Gupta, Ludmila Cherkasova, Rob Gardner, and Amin Vahdat. Enforcing performance isolation across virtual machines in Xen. In Proceedings of the ACM/IFIP/USENIX International Conference on Middleware, Middleware ’06, pages 342–362, 2006.
[15] Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif. Black-box and gray-box strategies for virtual machine migration. In Proceedings of the 4th USENIX Conference on Networked Systems Design and Implementation, NSDI ’07, pages 229–242, 2007.
[16] Carl A. Waldspurger. Memory resource management in VMware ESX server. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI ’10, pages 1–14, 2010.
[17] Diwaker Gupta, Sangmin Lee, Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey M. Voelker, and Amin Vahdat. Difference engine: Harnessing memory redundancy in virtual machines. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI ’08, pages 309–322, 2008.
[18] Ardalan Kangarlou, Sahan Gamage, Ramana Rao Kompella, and Dongyan Xu. vSnoop: Improving TCP throughput in virtualized environments via acknowledgement offload. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’10, pages 1–11, 2010.
[19] Ardalan Kangarlou-Haghighi. Improving the reliability and performance of virtual cloud infrastructures. PhD thesis, Purdue University, West Lafayette, 2011.
[20] Jon Oltsik and Mark Bowker. Server virtualization landscape. http://events.1105govinfo.com/events/vcg-summit-2010/information/~/media/GIG/GIG%20Events/2010%20Enterprise%20Architecture/Presentations 0/VCG10 3%201 Oltsik%20Bowker.ashx. Accessed April 2011.
[21] Grzegorz Milos, Derek G. Murray, Steven Hand, and Michael A. Fetterman. Satori: Enlightened page sharing. In Proceedings of the USENIX Annual Technical Conference, ATC ’09, pages 1–1, 2009.
[22] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, SOSP ’03, pages 164–177, 2003.
[23] Apache Olio. http://incubator.apache.org/olio/. Accessed April 2011.
[24] Will Sobel, Shanti Subramanyam, Akara Sucharitakul, Jimmy Nguyen, Hubert Wong, Arthur Klepchukov, Sheetal Patil, Armando Fox, and David Patterson. Cloudstone: Multi-platform, multi-language benchmark and measurement tools for Web 2.0. In Proceedings of the 1st Workshop on Cloud Computing and Its Applications, CCA ’08, 2008.
[25] Srikanth Kandula, Sudipta Sengupta, Albert Greenberg, Parveen Patel, and Ronnie Chaiken. The nature of data center traffic: Measurements and analysis. In Proceedings of the 9th ACM SIGCOMM Internet Measurement Conference, IMC ’09, pages 202–208, 2009.
[26] Theophilus Benson, Ashok Anand, Aditya Akella, and Ming Zhang. Understanding data center traffic characteristics. SIGCOMM Computer Communication Review, 40(1):92–99, 2010.
[27] Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan. Data center TCP (DCTCP). In Proceedings of the ACM SIGCOMM Conference, SIGCOMM ’10, pages 63–74, 2010.
[28] Sriram Govindan, Arjun R. Nath, Amitayu Das, Bhuvan Urgaonkar, and Anand Sivasubramaniam. Xen and co.: Communication-aware CPU scheduling for consolidated Xen-based hosting platforms. In Proceedings of the 3rd ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE ’07, pages 126–136, 2007.
[29] Diego Ongaro, Alan L. Cox, and Scott Rixner. Scheduling I/O in virtual machine monitors. In Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE ’08, pages 1–10, 2008.
[30] Alacritech corporation. http://www.alacritech.com. Accessed April 2011.
[31] Chelsio communications. http://www.chelsio.com. Accessed April 2011.
[32] Leah Shalev, Julian Satran, Eran Borovik, and Muli Ben-Yehuda. IsoStack: Highly efficient network processing on dedicated cores. In Proceedings of the USENIX Annual Technical Conference, ATC ’10, pages 5–5, 2010.
[33] Jeffrey C. Mogul. TCP offload is a dumb idea whose time has come. In Proceedings of the 9th USENIX Workshop on Hot Topics in Operating Systems, HotOS ’03, pages 5–5, 2003.
[34] Danhua Guo, Guangdeng Liao, and L.N. Bhuyan. Performance characterization and cache-aware core scheduling in a virtualized multi-core server under 10GbE. In Proceedings of the IEEE International Symposium on Workload Characterization, IISWC ’09, pages 168–177, 2009.
[35] VMware knowledge base article. http://kb.vmware.com/kb/1006143. Accessed April 2011.
[36] TOE: The Linux foundation. http://www.linuxfoundation.org/collaborate/workgroups/networking/toe. Accessed April 2011.
[37] Greg Regnier, Srihari Makineni, Ramesh Illikkal, Ravi Iyer, Dave Minturn, Ram Huggahalli, Don Newell, Linda Cline, and Annie Foong. TCP onloading for data center servers. IEEE Computer, 37:48–58, 2004.
[38] VMware ESX. http://www.vmware.com/products/esxi-and-esx/. Accessed August 2013.
[39] Microsoft Hyper-V server. http://www.microsoft.com/en-us/server-cloud/hyper-v-server/default.aspx. Accessed August 2013.
[40] Kernel-based Virtual Machine (KVM). http://www.linux-kvm.org. Accessed August 2013.
[41] L. S. Brakmo and L. L. Peterson. TCP Vegas: End to end congestion avoidance on a global internet. IEEE Journal on Selected Areas in Communications, 13(8):1465–1480, 1995.
[42] Sangtae Ha, Injong Rhee, and Lisong Xu. CUBIC: A new TCP-friendly high-speed TCP variant. ACM SIGOPS Operating System Review, 42(5):64–74, 2008.
[43] David X. Wei, Cheng Jin, Steven H. Low, and Sanjay Hegde. FAST TCP: Motivation, architecture, algorithms, performance. IEEE/ACM Transactions on Networking, 14(6):1246–1259, 2006.
[44] VMware tools. http://kb.vmware.com/kb/340. Accessed April 2011.
[45] Linux network emulation. http://www.linuxfoundation.org/collaborate/workgroups/networking/netem. Accessed April 2011.
[46] Abhijit K. Choudhury and Ellen L. Hahne. Dynamic queue length thresholds for shared-memory packet switches. IEEE/ACM Transactions on Networking, 6(2):130–140, 1998.
[47] CLOC: Count lines of code. http://cloc.sourceforge.net. Accessed April 2011.
[48] Linux Net:Bridge. http://www.linux-foundation.org/en/Net:Bridge. Accessed April 2011.
[49] The Iperf benchmark. https://code.google.com/p/iperf/. Accessed March 2010.
[50] Project Faban. https://java.net/projects/faban/. Accessed April 2011.
[51] Aravind Menon, Jose Renato Santos, Yoshio Turner, G. (John) Janakiraman, and Willy Zwaenepoel. Diagnosing performance overheads in the Xen virtual machine environment. In Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE ’05, pages 13–23, 2005.
[52] Intel MPI benchmark. http://software.intel.com/en-us/articles/intel-mpi-benchmarks/. Accessed March 2010.
[53] Ole Agesen, Jim Mattson, Radu Rugina, and Jeffrey Sheldon. Software techniques for avoiding hardware virtualization exits. In Proceedings of the USENIX Annual Technical Conference, ATC ’12, pages 35–35, 2012.
[54] Abel Gordon, Nadav Amit, Nadav Har’El, Muli Ben-Yehuda, Alex Landau, Assaf Schuster, and Dan Tsafrir. ELI: Bare-metal performance for I/O virtualization. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’12, pages 411–422, 2012.
[55] Sahan Gamage, Ardalan Kangarlou, Ramana Rao Kompella, and Dongyan Xu. Opportunistic flooding to improve TCP transmit performance in virtualized clouds. In Proceedings of the 2nd ACM Symposium on Cloud Computing, SoCC ’11, pages 1–14, 2011.
[56] Cong Xu, Sahan Gamage, Pawan N. Rao, Ardalan Kangarlou, Ramana Rao Kompella, and Dongyan Xu. vSlicer: Latency-aware virtual machine scheduling via differentiated-frequency CPU slicing. In Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, HPDC ’12, pages 3–14, 2012.
[57] Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’06, pages 2–13, 2006.
[58] Aravind Menon, Simon Schubert, and Willy Zwaenepoel. TwinDrivers: Semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’09, pages 301–312, 2009.
[59] Abel Gordon, Nadav Har’El, Alex Landau, Muli Ben-Yehuda, and Avishay Traeger. Towards exitless and efficient paravirtual I/O. In Proceedings of the 5th Annual International Systems and Storage Conference, SYSTOR ’12, pages 1–6, 2012.
[60] Cong Xu, Sahan Gamage, Hui Lu, Ramana Rao Kompella, and Dongyan Xu. vTurbo: Accelerating virtual machine I/O processing using designated turbo-sliced core. In Proceedings of the USENIX Annual Technical Conference, ATC ’13, pages 243–254, 2013.
[61] Tal Garfinkel and Mendel Rosenblum. A virtual machine introspection based architecture for intrusion detection. In Proceedings of Network and Distributed Systems Security Symposium, NDSS ’03, pages 191–206, 2003.
[62] Lookbusy: A synthetic load generator. http://www.devin.com/lookbusy/. Accessed April 2011.
[63] Lighttpd web server. http://www.lighttpd.net/. Accessed March 2013.
[64] Pound: Reverse-proxy and load-balancer. http://www.apsis.ch/pound. Accessed July 2013.
[65] GNUMP3d: GNU MP3/media streamer. http://www.gnu.org/software/gnump3d/. Accessed July 2013.
[66] MPlayer: The movie player. http://www.mplayerhq.hu/. Accessed July 2013.
[67] Kaushik Kumar Ram, Alan L Cox, Mehul Chadha, and Scott Rixner. Hyper-switch: A scalable software virtual switching architecture. In Proceedings of the USENIX Annual Technical Conference, ATC ’13, pages 13–24, 2013.
[68] Ben Pfaff, Justin Pettit, Keith Amidon, Martin Casado, Teemu Koponen, and Scott Shenker. Extending networking into the virtualization layer. In Proceedings of the Workshop on Hot Topics in Networks, HotNets ’09, pages 1–6, 2009.
[69] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. OpenFlow: Enabling innovation in campus networks. SIGCOMM Computer Communication Review, 38:69–74, 2008.
[70] Nadav Amit, Muli Ben-Yehuda, Dan Tsafrir, and Assaf Schuster. vIOMMU: Efficient IOMMU emulation. In Proceedings of the USENIX Annual Technical Conference, ATC ’11, pages 6–6, 2011.
[71] Alex Landau, Muli Ben-Yehuda, and Abel Gordon. SplitX: Split guest/hypervisor execution on multi-core. In Proceedings of the 3rd Workshop on I/O Virtualization, WIOV ’11, pages 1–7, 2011.
[72] Irfan Ahmad, Ajay Gulati, and Ali Mashtizadeh. vIC: Interrupt coalescing for virtual machine storage device IO. In Proceedings of the USENIX Annual Technical Conference, ATC ’11, pages 4–4, 2011.
[73] Xiaolan Zhang, Suzanne McIntosh, Pankaj Rohatgi, and John Linwood Griffin. XenSocket: A high-throughput interdomain transport for virtual machines. In Proceedings of the ACM/IFIP/USENIX International Conference on Middleware, Middleware ’07, pages 184–203, 2007.
[74] Jian Wang, Kwame-Lante Wright, and Kartik Gopalan. XenLoop: A transparent high performance inter-VM network loopback. In Proceedings of the 17th International Symposium on High Performance Distributed Computing, HPDC ’08, pages 109–118, 2008.
[75] Anton Burtsev, Kiran Srinivasan, Prashanth Radhakrishnan, Lakshmi N. Bairavasundaram, Kaladhar Voruganti, and Garth R. Goodson. Fido: Fast inter-virtual-machine communication for enterprise appliances. In Proceedings of the USENIX Annual Technical Conference, ATC ’09, pages 25–25, 2009.
[76] Kangho Kim, Cheiyol Kim, Sung-In Jung, Hyun-Sup Shin, and Jin-Soo Kim. Inter-domain socket communications supporting high performance and full binary compatibility on Xen. In Proceedings of the 4th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, VEE ’08, pages 11–20, 2008.
[77] Wei Huang, Matthew J. Koop, Qi Gao, and Dhabaleswar K. Panda. Virtual machine aware communication libraries for high performance computing. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’07, pages 1–12, 2007.
[78] Ajay Gulati, Irfan Ahmad, and Carl A. Waldspurger. PARDA: Proportional allocation of resources for distributed storage access. In Proceedings of the 7th Conference on File and Storage Technologies, FAST ’09, pages 85–98, 2009.
[79] Ajay Gulati, Arif Merchant, and Peter Varman. mClock: Handling throughput variability for hypervisor IO scheduling. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI ’10, pages 1–7, 2010.
[80] Ajay Gulati, Ganesha Shanmuganathan, Xuechen Zhang, and Peter Varman. Demand based hierarchical QoS using storage resource pools. In Proceedings of the USENIX Annual Technical Conference, ATC ’12, pages 1–1, 2012.
[81] Jean-Pascal Billaud and Ajay Gulati. hClock: Hierarchical QoS for packet scheduling in a hypervisor. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pages 309–322, 2013.
[82] Mukil Kesavan, Ada Gavrilovska, and Karsten Schwan. Differential virtual time (DVT): Rethinking I/O service differentiation for virtual machines. In Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC ’10, pages 27–38, 2010.
[83] Mukil Kesavan, Ada Gavrilovska, and Karsten Schwan. On disk scheduling in virtual machines. In Proceedings of the 2nd Workshop on I/O Virtualization, WIOV ’10, pages 6–6, 2010.
[84] Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, and Brian Mueller. Safe and effective fine-grained TCP retransmissions for datacenter communication. In Proceedings of the ACM SIGCOMM Conference, SIGCOMM ’09, pages 303–314, 2009.
[85] Luwei Cheng and Cho-Li Wang. vBalance: Using interrupt load balance to improve I/O performance for SMP virtual machines. In Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC ’12, pages 1–14, 2012.
[86] Hui Kang, Yao Chen, Jennifer L. Wong, Radu Sion, and Jason Wu. Enhancement of Xen’s scheduler for MapReduce workloads. In Proceedings of the 20th International Symposium on High Performance Distributed Computing, HPDC ’11, pages 251–262, 2011.
[87] Abel Gordon, Muli Ben-Yehuda, Dennis Filimonov, and Maor Dahan. VAMOS: Virtualization aware middleware. In Proceedings of the 3rd Workshop on I/O Virtualization, WIOV ’11, pages 3–3, 2011.
[88] Venkateswararao Jujjuri, Eric Van Hensbergen, and Anthony Liguori. VirtFS – A virtualization aware file system pass-through. In Proceedings of the Ottawa Linux Symposium, OLS ’10, pages 109–120, 2010.
[89] Murali Rangarajan, Aniruddha Bohra, Kalpana Banerjee, Enrique V. Carrera, Ricardo Bianchini, and Liviu Iftode. TCP servers: Offloading TCP processing in internet servers. Technical Report DCS-TR-481, Rutgers University Department of Computer Science, Piscataway, New Jersey, 2002.
[90] Broadcom corporation. http://www.broadcom.com. Accessed April 2011.
[91] VMDirectPath I/O. http://kb.vmware.com/kb/1010789. Accessed August 2013.
[92] VMware vSphere. http://www.vmware.com/products/vsphere/. Accessed August 2013.
VITA
Sahan Bamunavita Gamage was born and raised in the city of Galle, Sri Lanka.
He received a B.S. in computer science and engineering, with first class honors, from the University of Moratuwa, Sri Lanka in November 2004. He received an M.S. in computer science from Purdue University, West Lafayette, Indiana, USA in August 2010 and his Ph.D. in computer science from Purdue University, West Lafayette, Indiana, USA in December 2013 under the direction of Professor Dongyan Xu and Professor Ramana Kompella. His research interests are in the areas of virtualization, operating systems, and networking. Before coming to Purdue for graduate school, he was a software engineer with Millennium Information Technologies, Sri Lanka and a senior software engineer with WSO2 Inc., Sri Lanka. During his Ph.D. studies at Purdue University, he interned with NEC Laboratories USA in Princeton, New Jersey, USA, and VMware Inc. in Palo Alto, California, USA.