• No results found

Large-Scale Distributed Systems. Datacenter Networks. COMP6511A Spring 2014 HKUST. Lin Gu

N/A
N/A
Protected

Academic year: 2021

Share "Large-Scale Distributed Systems. Datacenter Networks. COMP6511A Spring 2014 HKUST. Lin Gu"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

[email protected]

COMP6511A Spring 2014 HKUST Lin Gu

Large-Scale Distributed Systems

(2)

Major Components of a Datacenter

• Computing hardware (equipment racks)

• Power supply and distribution hardware

• Cooling hardware and cooling fluid

distribution hardware

• Network infrastructure

• IT Personnel and office equipment

(3)

Growth Trends in Datacenters

• Load on network & servers continues to rapidly grow

– Rapid growth: a rough estimate of annual growth rate: enterprise datacenters: ~35%, Internet datacenters: 50% - 100%

– Information access anywhere, anytime, from many devices

• Desktops, laptops, PDAs & smart phones, sensor networks, proliferation of broadband

• Mainstream servers moving towards higher speed links

– 1-GbE to10-GbE in 2008-2009 – 10-GbE to 40-GbE in 2010-2012

• High-speed datacenter-MAN/WAN connectivity

– High-speed datacenter syncing for disaster recovery

(4)

• A large part of the total cost of the DC hardware

– Large routers and high-bandwidth switches are very expensive

• Relatively unreliable – many components may fail.

• Many major operators and companies design their

own datacenter networking to save money and

improve reliability/scalability/performance.

– The topology is often known – The number of nodes is limited

– The protocols used in the DC are known

• Security is simpler inside the data center, but

challenging at the border

• We can distribute applications to servers to distribute

load and minimize hot spots

(5)

Networking components (examples)

• High Performance & High Density Switches & Routers

– Scaling to 512 10GbE ports per chassis

– No need for proprietary protocols to scale

• Highly scalable DC

Border Routers

– 3.2 Tbps capacity in a single chassis

– 10 Million routes, 1 Million in hardware

– 2,000 BGP peers

– 2K L3 VPNs, 16K L2 VPNs – High port density for GE and

10GE application connectivity – Security

768 1-GE port Downstream 64 10-GE port Upstream

(6)

Common data center topology

Internet Servers Layer-2 switch Access Datacenter Layer-2/3 switch Aggregation Layer-3 router Core

Datacenter Networking

(7)

Data center network design goals

• High network bandwidth, low latency

• Reduce the need for large switches in the core

• Simplify the software, push complexity to the

edge of the network

• Improve reliability

• Reduce capital and operating cost

(8)

Avoid this…

Datacenter Networking

(9)

?

Can we avoid using high-end switches?

• Expensive high-end switches to scale up

• Single point of failure and bandwidth bottleneck

– Experiences from real systems

• One answer:

DCell

9

(10)

DCell Ideas

• #1: Use mini-switches to scale out

• #2: Leverage servers to be part of the routing

infrastructure

– Servers have multiple ports and need to forward

packets

• #3: Use recursion to scale and build complete

graph to increase capacity

(11)

One approach: switched network with

a hypercube interconnect

• Leaf switch: 40 1Gbps ports+2 10 Gbps ports.

– One switch per rack.

– Not replicated (if a switch fails, lose one rack of capacity)

• Core switch: 10 10Gbps ports

– Form a hypercube

• Hypercube – high-dimensional rectangle

(12)

Hypercube properties

• Minimum hop count

• Even load distribution for all-all communication.

• Can route around switch/link failures.

• Simple routing:

– Outport = f(Dest xor NodeNum)

– No routing tables

(13)

A 16-node (dimension 4) hypercube

0 3 2 1 0 0 1 2 3 3 1 1 3 0 2 0 2 1 5 4 6 7 3 2 10 11 15 14 8 9 13 12 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 3 3 3 3 3 3 3 3

Interconnect

(14)

64-switch Hypercube 63 * 4 links to other containers One container: 4 links

Level 0: 32 40-port 1 Gb/sec switches Level 1: 8 10-port 10 Gb/sec switches

64 10 Gb/sec links 16 10 Gb/sec links Level 2: 2 10-port 10 Gb/sec switches

1280 Gb/sec links 4X4 Sub-cube 4X4 Sub-cube 4X4 Sub-cube 4X4 Sub-cube 16 links 16 links 16 links 16 links

Interconnect

How many servers can be connected in this system? 81920 servers with 1Gbps bandwidth Core switch: 10Gbps port x 10

Leaf switch: 1Gbps port x 40 + 10Gbps port x 2.

(15)

The Black Box

(16)

Appendix

(17)

Typical layer 2 & Layer 3 in existing systems

• Layer 2

– One spanning tree for entire network

• Prevents looping

• Ignores alternate paths

• Layer 3

– Shortest path routing between source and

destination

– Best-effort delivery

(18)

Problem With common DC topology

• Single point of failure

• Over subscription of links higher up in the

topology

– Trade off between cost and provisioning

• Layer 3 will only use one of the existing equal

cost paths

• Packet re-ordering occurs if layer 3 blindly takes

advantage of path diversity

(19)

Fat-tree based Solution

Connect hosts together using a fat-tree topology

– Infrastructure consists of cheap devices

• Each port supports same speed as the end host

– All devices can transmit at line speed if packets are distributed along existing paths

– A k-ary fat-tree is composed of switches with k ports

• How many switches? … 5k2/4

• How many connected hosts? … k3/4

(20)

k-ary Fat-Tree (k=4)

Interconnect

k2/4 switches

Use the same type of switches in the core, aggregation, and edge, with each switch having k ports

k pods

k/2 switches/pod

k/2 switches/pod

(21)

Fat-tree Modified

• Enforce special addressing scheme in DC

– Allows host attached to same switch to route only through switch

– Allows inter-pod traffic to stay within pod – unused.PodNumber.switchnumber.Endhost

• Use two level look-ups to distribute traffic and maintain

packet ordering.

(22)

2-Level look-ups

• First level is prefix lookup

– Used to route down the topology to endhost

• Second level is a suffix lookup

– Used to route up towards core – Diffuses and spreads out traffic

– Maintains packet ordering by using the same ports for the same endhost

(23)

Comparison of several schemes

– Hypercube: high-degree interconnect for large net, difficult to scale incrementally

– Butterfly and fat-tree: cannot scale as fast as DCell – De Bruijn: cannot incrementally expand

– DCell: low bandwidth between two clusters (sub-DCells)

References

Related documents

verticillioides infection ( Table 2 ). verticillioides was considered as predominant species on maize with signifi-.. Represenation of phylogenetic tree for sequenced isolates of

A Core Component of the Data Center Strategy – the Government Private Cloud Project. To provide the State of Hawaii with a common platform for

Data Center Layer Data Center Layer Networking Layer Networking Layer Device Layer Device Layer Amazon Amazon Microsoft. Microsoft

Data Center Networks The Data Center Networks are  usually based on a fat‐tree  topology using commodity  switches Core Layer: Used to 

In the oneM2M architecture, the role of part of the Abstraction Layer and all of the original Semantic and Action Layers can be seen as a new CSF, called Semantic Rule and

In this paper we hope we demonstrated a basic description of the concept of early warning signs in projects and how performance measurement can be utilized as one of the tools which,

Our predictive variables include the elements of the application packet available to the admissions committee: undergraduate GPA, letters of reference, the admissions essay, and

• Enrollment information on page 1 represents all students enrolled in postsecondary education, regardless of institution type, as reported by the National Student Clearinghouse