• No results found

The Network Hypervisor

N/A
N/A
Protected

Academic year: 2021

Share "The Network Hypervisor"

Copied!
27
0
0

Loading.... (view fulltext now)

Full text

(1)

1 © IBM 2010

Network Abstraction

The Network Hypervisor

David Hadas,

Haifa Research Lab,

Nov, 2010

[email protected]

IBM Research

(2)

Agenda

 New Requirements from DCNs

– Virtualization

– Clouds

 Our Approach: Building

Abstracted Networks

 Virtual Application Networks

1,2

(VANs)

– An Abstracted Network solution

Developed by HRL as part of Reservoir

– Abstracted Network Challenges

(1) D. Hadas, S. Guenender, and B. Rochwerger, “Virtual network services for federated cloud computing,” tech. rep., IBM, 2009. Research Report H-0269.

(2) A. Landau, D. Hadas, and M. Ben-Yehuda, “Plugging the hypervisor abstraction leaks caused by virtual networking,” in SYSTOR ’10: Proceedings of the 3rd Annual Haifa Experimental Systems Conference, pp. 1–9, ACM, 2010.

(3)

3 © IBM 2010

Traditional Network View

Network View

– DCs scale to 100,000 End Stations[1,2]

– Routers connect L2 domain, sub-divided to VLANs – Typically a LAN is up to 250 End Stations[1,2,4]

– Typically a L2 domain is up to 4K End Stations[1,2,3]

– Scale: 1000 LANs, 100 L2 Domain

Server View

– End-Stations have single IP and MAC and are “dumb” (know only a default GW)

– Network topology and routes are under the control of the network

Layer-2 domain

DC

Router

Internet

App App App App App App App App App App App App App App App App App App App App

Layer-2 domain

IP Cloud

T h e N e tw o rk C o m p u te

[1] Albert Greenberg , Parantap Lahiri , David A. Maltz , Parveen Patel , Sudipta Sengupta, Towards a next generation data center architecture: scalability and commoditization, Proceedings of the ACM workshop on Programmable routers for extensible services of tomorrow, August 22-22, 2008, Seattle, WA, USA [2] Greenberg, A., et al., “The Cost of a Cloud: Research Problems in Data Center Networks”, CCR, v39, n1, Jan’09.

[3] C. Kim and J. Rexford. Revisiting ethernet: Plug-and-play made scalable and efficient. In Proc. IEEE LANMAN Workshop, 2007. [4] P. Garimella, Y.-W. E. Sung, N. Zhang, and S. Rao. Characterizing vlan usage in an operational network. In INM ’07

(4)

The Future of DCNs

 # of Physical Hosts per DC is already over 100,000 Hosts

 # of VMs per Host will grow to 64 – 192 (32 core machine)

 1 or 2 virtual network interfaces per VM

– We believe the real number is 4, 8 or more since the cost of virtual interfaces is

lower than that of physical interfaces, leading to new use patterns

 Total: 10,000,000 – 100,000,000 virtual network interfaces in a large DC!

(5)

5 © IBM 2010

VM VM

VM VM VM

Network Virtualization – The Naive Way

New Network View

– The vSwitch is part of the Physical Network! – 64-192 VMs per Server (32 Core Machine)

• Multiple virtual interfaces per VM – Ratio: 1:1 LANs to Hosts,

10:1 L2 Domains to Hosts

– Scale: 100,000 LANs, 10,000 L2 Domain

New Server View

– VMs have single IP and MAC and are “dumb” (knows only a default GW)

– Network topology and routes are under the control of the network (now spanning into Hosts)

Layer-2 domain

DC

Router

Internet

Layer-2 domain

VM VM VM VM VM VM VM VM T h e N e w N e tw o rk ( T h e N a iv e W a y ) C o m p u te VM VM VM VM VM VM VM VM App App App App App App App App App App VM VM VM VM VM App App App App App App App App App App

IP Cloud

(6)

Wouldn’t L2 Domains of The Future be Larger?

Vendors have for many years been seeking ways to

grow L2 further with very partial success

– L2 Domain scaling is on the “most wanted” list for many years

Ethernet related research (See for example

[1,2]

)

concludes that scaling Ethernet requires fundamental

changes into the Ethernet protocol

(e.g. to eliminate broadcast services such as ARP)

– Standardization work has started on IETF and IEEE to address

Ethernet related issues (TRILL,

Layer 2 Multipathing

)

– Will L2 scale by a factor of 10, 100, 1,000, 10,000?

– Will L2 fit more complex environments with multiple sites?

[1] A. Myers, T.S.E. Ng, and H. Zhang, "Rethinking the Service Model: Scaling Ethernet to a Million Nodes", in Proceedings of ACM SIGCOMM HotNets, 2004.

[2] Changhoon Kim , Matthew Caesar , Jennifer Rexford, Floodless in seattle: a scalable ethernet architecture for large enterprises, Proceedings of the ACM SIGCOMM 2008 conference on Data communication, August 17-22, 2008, Seattle, WA, USA

(7)

7 © IBM 2010

Network Infrastructure for Clouds

 Clouds need to scale

– # of VMs,

– # of Vitual networks,

– # of Sites

 Clouds need to be agile

– Fully automated

– Enable auto-placement decisions

– Dynamic

– Self-managed

– Simple to use

– Efficient

 Clouds need to support establishing isolated virtual networks

– Private to the hosted customer and managed by it

(allowing customers to determine their own address space)

– Extend beyond the limits of a single data center (physical and administrative) without

requiring coordinated management and while maintaining data center independence and

security.

– Established ad-hoc without requiring pre-configuration and/or management control of the

physical network

A

ut

om

at

ed

M

ob

ili

ty

A

ny

w

he

re

(8)

Site B

Site B

Site A

Site A

Site

Site

Mobility Anywhere

No VM Mobility Local Mobility Mobility Anywhere

Today

Today

Today

Today

Today

Today

Today

Today

’’’’

’’’’

s Virtual

s Virtual

s Virtual

s Virtual

s Virtual

s Virtual

s Virtual

s Virtual

Network Solutions

Network Solutions

Network Solutions

Network Solutions

Network Solutions

Network Solutions

Network Solutions

Network Solutions

How to achieve this?

How to achieve this?

How to achieve this?

How to achieve this?

Mobility Anywhere Mobility Anywhere

Server Consolidation

Server Consolidation

Server Consolidation

Server Consolidation

Server Consolidation

Server Consolidation

Server Consolidation

Server Consolidation

Automation

Automation

Automation

(9)

9 © IBM 2010

Mobility Anywhere Challenges In Data Centers

 A solution that enables customers to freely mobile VMs across Data Centers is needed

A ‘Long Distance Virtual LAN’ technology

• Current approach: Extending local VLANs across subnets

A Revised Client/Server Routing technology

• Connecting Internet Clients to mobile Servers

1 2 2 1 [1] http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns836/white_paper_c11-557822.pdf “

“(This) is not possible with existing technologies…. (It) will require the redesign of the IP network

between the data centers involving the Internet.””[1][1]

Data Center Virtualization

Requires Remodeling Of The Network

IDC, Oct 2009: IDC, Oct 2009: IDC, Oct 2009:

IDC, Oct 2009: ‘The DC network industry is witnessing a dramatic shift in DC architecture. It is clear that the current network topology will not meet the requirements of the new DC. Server, storage, and network virtualization will be the overriding premise of the DC buildout.’

Data Center Virtualization

Data Center Virtualization

Requires Remodeling Of The Network

Requires Remodeling Of The Network

IDC, Oct 2009: IDC, Oct 2009: IDC, Oct 2009:

IDC, Oct 2009: ‘The DC network industry is witnessing a dramatic shift in DC architecture. It is clear that the current network topology will not meet the requirements of the new DC. Server, storage, and network virtualization will be the overriding premise of the DC buildout.’

(10)

Abstracted Networks – The Network Hypervisor

 Network virtualization should follow (well established) host virtualization

principles:

Host

virtualization should enable virtual machines

• To remain independent of physical location

• To remain independent of the

host

physical characteristics such as CPU, Memory, I/O,

etc.

• To form isolated

compute

environments on top of the shared physical

host

environment

Network

virtualization should enable virtual machines

• To remain independent of physical location

• To remain independent of the

network

physical characteristics such as topology,

protocols, addresses, etc.

• To form isolated

network

environments on top of the shared physical

network

environment serving the hosts

Such complete network virtualization can be achieved using

network abstraction – hence the term “

Abstracted Networks

(11)

11 © IBM 2010

Physical Network Abstraction

Host

Host

Host

Host

Host

Host

Host

Host

Host

Host

(12)

A New Way To Look At Virtual Networking,

Hypervisor Evolution

 In the 1970’s, IBM shipped the first hypervisors offering

a complete virtual replica of the hosting system

– Guest virtual machines are offered a fully protected and

isolated copy of the underlying physical host hardware

– Virtual replica can be hardware dependent (No abstraction)!

 Later, the hypervisor role evolved in order to support

advanced administrative functions

, such as

– Checkpoint/Restart

– VM Cloning

– Live-Migration (Mobility)

 The hypervisor needs to ensure that the guest application

and operating system are able to continue running

unchanged even when the compute, network and storage

environments drastically changes.

– Can be achieved by completely abstracting the environment

offered to guests and ensuring that guests become unaware

of any characteristic of the hosting platform.

HOST Virtual Machine Virtual I/O Virtual CPU Virtual Memory Virtual Network Virtual Storage

(13)

13 © IBM 2010

A New Way To Look At Virtual Networking

Current Network Abstraction Layers are Leaking

 Most current hypervisors offer incomplete network

abstraction. Common approaches include:

– Guest directly accessed a physical network adapter

• The hypervisor exposes direct references to unique platform hardware resources

– An emulated NIC or backend is connected to a local host

software bridge/router

• The hypervisor exposes direct references to network addresses with spatial meaning (may also be time-bounded).

– Use Network Address Translation

• Received packets continue to include references of physical addresses, causing a leak in the hypervisor abstraction layer.

 Current hypervisors tend to consider this incomplete

abstraction as a network problem rather then a problem

of the hypervisor.

– Solution: Restrict the mobility to the network segment

– Solution: Synchronize the mobility with a timely

reorganization of the network.

HOST Virtual Machine Virtual I/O Virtual CPU Virtual Memory Virtual Storage Physical Environment Information Network Abstraction Layer

Physical Information Leaks via The Abstraction Layer

Virtual Network

(14)

Plugging The Hypervisor Network Abstraction Layer Leaks

 Abstraction: VMs should not ‘talk’ to the hardware (physical network)

– Use separate addresses

– Hypervisor should provide mapping

 Isolation: VMs belonging to one virtual network should be isolated from

VMs not connected to that network

– Security by Isolation

– Isolated Performance

– Independent Address/Naming schemes

Host

VM

Switch, Router

Site Physical Hosts

Site Physical Network Devices

Host

VM

X

X

X

X

VM

X

X

Isolation Isolation Abstraction Abstraction Abstraction Abstraction

(15)

15 © IBM 2010

Virtualization – Abstracted Network

Abstracted Networks View

– Hosts hypervisors form an abstracted network between them

– Abstracted network topology and routes are under the control of the hosts

Physical Network View

– End Stations (Servers) have single IP and MAC in the physical network domain are “dumb” (knows only a default GW)

– Physical network design is unchanged – Same scale as today

Layer-2 domain

DC

Router

Internet

Layer-2 domain

T h e N e tw o rk C o m p u te

IP Cloud

VM VM VM VM VM VM VM VM VM Pseudo-wires VM

(16)

Virtual Application Networks

 A VAN is a virtual and distributed switching service connecting VMs. VANs revolutionize the way networks are organized by using an overlay network between hypervisors of different hosting platforms.

 The hypervisors’ overlay network decouple the virtual network services offered to VMs from the underlying physical network. As a result, the network created between VMs becomes independent from the topology of the physical network.

(17)

17 © IBM 2010

Example for Abstracted Networks

The HRL Virtual Application Networks (VAN) project

 What are VANs?

– A design following the Abstracted

Network principles

– A L2-alike service for VMs

using IP-based Pseudo-Wires

– A set of control protocols for

setting and maintaining

Pseudo-Wires

• Auto Discovery

• Dynamic Routing

– A set of methodologies for using

Abstracted Networks

Overlaid Virtual Network Service Overlaid Virtual Network ServiceOverlaid Virtual Network Service Overlaid Virtual Network Service

VM

Host Host

Standard IP Physical Network serving as the “Underlay”

Host

VM VM VM VM VM VM VM VM VM

network overlay

HYP HYP network HYP

overlay VAN

Enhanced EnhancedVAN EnhancedVAN VM VM VM VM VM VM VM VM

 Reservoir Mid 2008-2010

– An EU project for federated clouds

– Proof of Concept developed under system X

– Demonstrated to the EU at EOY-1 and EOY-2

(18)

Abstracted Networks Introduce Multiple Research Challenges

 I/O Performance  Routing  Traffic shaping  QoS  Multi-pathing  Scalability  Management (FCAPS)  Resiliency  Security

 Network aspects of VM Placement  Federation of Clouds

Host A

Host B

VM1 VM2 VM3 VM4 VM5 VM6 Red Gr Red Gr ‘B’ ‘A’ … ‘Red’ Application Data … M1 M5 Application Data … M1 M5 Ye Ye VM7 Pu VM8 M9 M1 M6 M5 M2 M3 M2 M1 M2 Virtual Interface Identifier (vMAC) E.g. IP Network Trusted Virtual Domain Auto discovery protocols Route Table for Green 1 2 3 0

(19)

19 © IBM 2010

Reservoir – Federated Clouds

 Cloud providers cannot be expected to coordinate their network maintenance, network

topologies, etc. with each other.

 A research into Federated Clouds suggests meeting these requirements by separating clouds

using VAN proxies, which acts as gateways between clouds.

– A VAN proxy hides the internal structure of the cloud from other clouds in a federation.

– The VAN proxies of different clouds communicate to ensure that VANs can extend across a cloud boundary while adhering to the privacy and security limitations of its members

(20)

QEMU

Guest

Guest Kernel VIRTIO Frontend VIRTIO Backend

QEMU

Guest

Guest application Guest Kernel Network Adapter Driver Emulated Network Adapter Guest Network Stack Guest Network Stack Socket Interface Guest application Socket Interface

Host

Kernel

Virtual Network Interface Host Network Services (E.g. Bridge or VAN central services) Virtual Network Interface

Packet flow today (in KVM)

(21)

21 © IBM 2010

Pseudo-Wires Between Hosts Results in a Dual Stack

Host

Kernel

QEMU

Guest

Guest Kernel VIRTIO Frontend Driver VIRTIO Backend

QEMU

Guest

Guest application Guest Kernel Network Adapter Driver Emulated Network Adapter Guest Network Stack Guest Network Stack Traffic Encapsulation Traffic Encapsulation Host Network Stack Socket Interface Socket Interface Guest application Socket Interface Guest Stack (Glue) Host Stack App. Driver Net Driver

Abstraction

(22)

Performance

 Path from guest to wire is long

– Latencies are manifested in the form of:

• Packet copies

• VM exits and entries

• User/Kernel mode switches

• Host QEMU process scheduling

 Performance Aspects

Increase Packet Size

Inhibit Checksum

CPU Affinity

(23)

23 © IBM 2010

Increase Packet Size

 Large Packets

– Transport and Network layers capable of up to 64KB packets

– Ethernet limit is 1500 bytes but there is no Ethernet wire between guest and

host!

– Set MTU to 64KB in guest

 Flow

– Application writes 64KB to TCP socket

– TCP, IP check MTU (=64KB) and create 1 TCP segment, 1 IP packet

– Guest virtual NIC driver copies entire 64KB frame to host

– Host writes 64KB frame into UDP socket

– Host stack creates 1 64KB UDP packet

– If packet destination = VM on local host

• Transfer 64KB packet directly on the loopback interface

– If packet destination = other host

• Host NIC segments 64KB packet in hardware

(24)

Inhibit Checksums

 VM to VM packets represent inner Host buffer transfer

– Inhibit TCP/UDP checksum calculation and verification

 VM to Network packets are protected by lower stack checksum

– Inhibit TCP/UDP checksum calculation and verification

(25)

25 © IBM 2010

CPU affinity and pinning

 QEMU process contains 2 threads

– CPU thread (actually, one CPU thread per guest vCPU)

– IO thread

 Linux process scheduler selects core(s) to run threads on

 Many times scheduler made wrong decisions

– Schedule both on same core

– Constantly reschedule (core 0 -> 1 -> 0 -> 1 -> …)

 Solution/workaround – pin CPU thread to core 0, IO thread to core 1

(26)

Flow control

 Guest does not anticipate flow control at Layer-2

 Thus, host should not provide flow control

– Otherwise, bad effects similar to TCP-in-TCP encapsulation will happen

 Lacking flow control, host should have large enough socket buffers

 Example:

– Guest uses TCP

– Host buffers should be at least guest TCP’s bandwidth x delay

(27)

27 © IBM 2010

Performance results

Throughput

Receiver CPU Utilization

References

Related documents

Factors such as maternal age, booking status, parity, multiple gestation, mode of delivery, sex of the babies and maternal disease in the current pregnancy and other

Hempel, 3-manifolds as viewed from the curve complex, Topology 40 (3) (2001) 631–657] used the curve complex associated to the Heegaard surface of a splitting of a 3-manifold to

Loss figures for each province and cereal crop are adjusted considering the relevant agro- ecological factors like climate and scale of farming (commercial or subsistence) and

• Otherwise, personal devices not allowed on the secure network at all, only a guest

Network bridging Hardware acceleration libvirt and virt-manager Guest installation.. L2 Guest OS L2

The guest mode usage (including both guest kernel and guest user mode) accounts for total CPU used by VM where as CPU usage of hypervisor consists of Qemu (host user mode) and

Further Work on the Topic.. Hypervisor Pain Points Hardware Tracing, Logging, Security functionality Kernel Layer KVM/QEMU incl. HW acceleration PIC Guest VM

There will be an icon for “Available Networks” on your toolbar at either the top or bottom of your screen (depends on the laptop make and model), Right click on this and from the