COM 444 Cloud Computing

Academic year: 2021

Prof. Dr. Halûk Gümüşkaya

http://www.gumuskaya.com

Computing Engineering Department

COM 444 Cloud Computing

Lec 4: Cloud Platform Architecture over Virtualized Data Centers

Data Center Design and Networking

1. What is a Data Center?
2. What does a Data Center Look Like?
3. Warehouse-Scale Data Center Design
4. Power and Cooling Requirements
5. Data-Center Interconnection Networks
6. Design Considerations for WSC
7. Data Centers around the World

What is a Data Center (Cloud)?

• A single-site cloud (aka "datacenter") consists of
  - Compute nodes (grouped into racks)
  - Switches connecting the racks
  - A network topology, e.g., hierarchical
  - Storage (backend) nodes connected to the network
  - Front end for submitting jobs
  - Software services
• A geographically distributed cloud consists of
  - Multiple such sites
  - Each site perhaps with a different structure and services

What(’s new) in Today’s Clouds?

Four major features:

1. Massive scale
2. On-demand access
   - Pay-as-you-go, no upfront commitment
   - Anyone can access it
3. Data-intensive nature
   - What was MBs has now become TBs, PBs, and EBs.
4. New cloud programming paradigms
   - MapReduce/Hadoop, NoSQL/Cassandra/MongoDB, and many others
   - High in accessibility and ease of programmability


Servers on Clusters

Clusters: commodity computers connected by commodity Ethernet switches:

1. More scalable than conventional servers
2. Much cheaper than conventional servers
   - 20X for equivalent vs. largest servers
3. Dependability via extensive redundancy
4. Few operators for 1000s of servers
   - Careful selection of identical HW/SW
   - Virtual Machine Monitors simplify operation


What does a Datacenter Look Like?

[Photos: front and back views of datacenter racks; some datacenters are highly secure (e.g., financial info)]

• A single data center can easily contain 10,000 racks with 100 cores in each rack (1,000,000 cores total).

[Photo: Google data center in The Dalles, Oregon; labels: data centers (size of a football field), cooling plant]

Cloud is built on Massive Datacenters

[Photo caption: This data center is 11.5 times the size of a football field]

Datacenters range in size from "edge" facilities to "megascale" (100K to 1M servers).

Economies of Scale

Approximate costs for a small-sized center (1K servers) and a larger, 400K-server center:

Technology       Cost in small-sized data center   Cost in large data center      Ratio
Network          $95 per Mbps/month                $13 per Mbps/month             7.3
Storage          $2.20 per GB/month                $0.40 per GB/month             5.7
Administration   ~140 servers/administrator        >1000 servers/administrator    7.1

The larger the data center, the lower the operational cost.
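As a quick check of the economies-of-scale figures, the sketch below recomputes each ratio from the per-unit costs listed on the slide. The dictionary keys and function name are illustrative only. Note that the listed storage prices give about 5.5 rather than the quoted 5.7; the quoted figure presumably reflects unrounded source data.

```python
# Per-unit costs as listed on the slide: (small ~1K-server center, large ~400K-server center).
COSTS = {
    "network ($/Mbps/month)": (95.0, 13.0),
    "storage ($/GB/month)": (2.20, 0.40),
}
# Administration is listed as servers per administrator, so the advantage is large / small.
SERVERS_PER_ADMIN = (140, 1000)


def cost_ratio(small, large):
    """How many times cheaper the large center is per unit."""
    return small / large


ratios = {name: round(cost_ratio(s, l), 1) for name, (s, l) in COSTS.items()}
ratios["administration"] = round(SERVERS_PER_ADMIN[1] / SERVERS_PER_ADMIN[0], 1)

for name, r in ratios.items():
    print(f"{name}: {r}x advantage for the large center")
```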


What if even a Data Center is not Big Enough?

Network of Data Centers
• Build additional data centers
• Where? How many?
• ….

Global Distribution

• Data centers are often globally distributed
• Example above: Google data center locations (inferred)
• For more info: http://www.google.com/about/datacenters/
• Microsoft has about 100 data centers, large or small, which are distributed around the globe.
• Why?
  - Need to be close to users (physics!)
  - Cheaper resources
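The "physics" argument can be made concrete with a rough lower bound on round-trip time. This is a minimal sketch that assumes light travels through optical fiber at about 200,000 km/s (roughly two-thirds of its speed in vacuum); the distances are hypothetical, and real latencies are higher because of routing, queuing, and processing.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fiber, ignoring routing and queuing."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


for km in (100, 1_000, 10_000):  # hypothetical user-to-datacenter distances
    print(f"{km:>6} km -> at least {min_round_trip_ms(km):.1f} ms round trip")
```

Under this assumption, a user 10,000 km from the nearest site cannot see a round trip much below 100 ms, which is one reason providers place sites near their users.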

Trend: Modular Data Center:

Warehouse-Scale Computer

• Need more capacity?

Warehouse-Scale Computer (WSC)

Provides Internet services:
• Search, social networking, online maps, video sharing, online shopping, email, cloud computing, etc.

Differences with HPC "clusters":
• HPC clusters have higher-performance processors and network
• HPC clusters emphasize thread-level parallelism; WSCs emphasize request-level parallelism.

Differences with data centers:
• Datacenters consolidate different machines and software into one location
• Datacenters emphasize virtual machines and hardware heterogeneity in order to serve varied customers

Larger Datacenter Growth

One at a time:
• 1 system
• Racking & networking: 14 hrs ($1,330)

Rack at a time:
• ~40 systems
• Install & networking: 0.75 hrs ($60)

Container at a time:
• ~1,000 systems
• No packaging to remove
• No floor space required
• Power, network, & cooling only
• Weatherproof & easy to transport

Datacenter construction takes 24+ months
• Both new build & DC expansion require regulatory approval
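The labor figures above are easier to compare per installed system. The following sketch does that arithmetic; it assumes the rack figures (0.75 hrs, $60) cover the whole rack of about 40 systems, which the slide does not state explicitly, and the names are illustrative.

```python
# (systems installed, labor hours, labor cost in $) per deployment unit, from the slide.
# Assumption: the rack-at-a-time figures apply to the whole rack of ~40 systems.
UNITS = {
    "one at a time": (1, 14.0, 1330.0),
    "rack at a time": (40, 0.75, 60.0),
}


def per_system(systems, hours, dollars):
    """Return (hours per system, dollars per system)."""
    return hours / systems, dollars / systems


for name, unit in UNITS.items():
    hours, dollars = per_system(*unit)
    print(f"{name}: {hours:.3f} h and ${dollars:.2f} per system")
```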


Data Center Videos to Watch

• Inside Google's Data Center (CBS News, November 2012):

http://www.youtube.com/watch?v=PBx7rgqeGG8

• A virtual walk through Facebook's Datacenter in Prineville, Oregon (Facebook OpenCompute)

• Source: Gigaom article from 2012

- http://gigaom.com/cleantech/a-rare-look-inside-facebooks-oregon-data-center-photos-video/

• Microsoft GFS Datacenter Tour

http://www.youtube.com/watch?v=hOxA1l1pQIw

• Timelapse of a Datacenter Construction on the Inside (Fortune 500 company)

3. Warehouse-Scale Data Center Design

Racks

• Equipment (e.g., servers) is typically placed in racks.
• Equipment is designed in a modular fashion to fit into rack units (1U, 2U, etc.).
• A single rack can hold up to 42 1U servers.
• A blade server is a stripped-down computer with a modular design.

The Architecture of a Small Server Cluster (~1000 servers)

A small server cluster is interconnected by an Ethernet switch and housed in a warehouse or in a container environment.

Typical elements in warehouse-scale systems:
• Server in 1U or blade enclosure format
• 7' rack with Ethernet switch
• Small cluster with a cluster-level Ethernet switch/router
• Rack-level switch can use 1- or 10-Gbps links

Architecture of WSC

• WSCs often use a hierarchy of networks for interconnection.
• The networking fabric of WSCs is often organized as a 2-level hierarchy.
• 1-Gbps Ethernet switches with up to 48 ports are essentially a commodity component, costing less than $30/Gbps per server to connect a single rack.
• Each rack holds up to 42 1U servers connected to a rack switch.
• Rack switches are uplinked to a switch higher in the hierarchy.
  - Uplink has 48 / n times lower bandwidth, where n = # of uplink ports
  - Goal is to maximize locality of communication relative to the rack
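The 48 / n rule of thumb can be tabulated for a few uplink counts. This is a minimal sketch of the slide's formula only; the function name and the sample values of n are illustrative.

```python
def uplink_oversubscription(total_ports=48, uplink_ports=8):
    """Factor by which off-rack bandwidth is lower, using the 48 / n rule above."""
    if not 0 < uplink_ports <= total_ports:
        raise ValueError("uplink_ports must be between 1 and total_ports")
    return total_ports / uplink_ports


for n in (2, 4, 8):  # hypothetical numbers of uplink ports
    factor = uplink_oversubscription(48, n)
    print(f"{n} uplinks -> off-rack bandwidth is {factor:.0f}x lower")
```

The fewer ports a rack switch dedicates to uplinks, the more a workload benefits from keeping its communication inside the rack.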

Standard Data Center Networking for the Cloud to Access the Internet

Data Center Networking

[Figure: data center network. Labels: Internet, border router, access router, load balancers, tier-1 switches, tier-2 switches, TOR switches, server racks 1-8]

Load balancer: application-layer routing
• receives external client requests
• directs workload within data center
• returns results to external client (hiding data center internals from client)
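The three load-balancer duties listed above can be sketched in a few lines. This is an illustrative toy, not a description of any real product: the class name, the round-robin policy, and the backend names are all assumptions (the slides do not say how the workload is directed).

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical application-layer dispatcher; all names are illustrative."""

    def __init__(self, servers):
        # servers: mapping of internal server name -> request handler
        self._servers = servers
        self._next = cycle(servers)
        self.dispatched = []  # internal record only, never returned to clients

    def handle(self, request):
        server = next(self._next)        # direct workload within the data center
        self.dispatched.append(server)
        # Only the result goes back, so the client never learns which server ran it.
        return self._servers[server](request)


backends = {f"rack{r}-server": (lambda req: req.upper()) for r in (1, 2, 3)}
lb = RoundRobinBalancer(backends)
print([lb.handle(q) for q in ("a", "b", "c", "d")])
```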


Data Center Networking

[Figure: server racks 1-8 connected through TOR switches, tier-2 switches, and tier-1 switches]

• Rich interconnection among switches, racks:
  - increased throughput between racks (multiple routing paths possible)
  - increased reliability via redundancy

Storage and Array Switch

Storage options:
• Use disks inside the servers, or
• Network Attached Storage (NAS) through InfiniBand.
• WSCs generally rely on local disks.
• Google File System (GFS) uses local disks and maintains at least 3 replicas.

Array switch: a switch that connects an array of racks
• Array switch should have 10X the bisection bandwidth of a rack switch.
• Cost of an n-port switch grows as n^2.
• Often utilize content addressable memory chips and FPGAs.

(Courtesy of Hennessy and Patterson, 2012)
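To see what quadratic cost growth implies for array switches, the sketch below compares switches of different port counts against a 48-port baseline. It assumes cost is exactly proportional to n^2, which is only the slide's rule of thumb; the port counts are hypothetical.

```python
def relative_switch_cost(ports, baseline_ports=48):
    """Cost relative to a baseline switch, assuming cost grows as n**2."""
    return (ports / baseline_ports) ** 2


for n in (48, 96, 480):  # hypothetical port counts
    print(f"{n}-port switch: about {relative_switch_cost(n):.0f}x the 48-port cost")
```

Under this assumption, ten times the ports costs about a hundred times as much, which helps explain why large arrays are built from hierarchies of smaller switches.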

Memory and Storage Hierarchy of a WSC


A Programmer's View of Storage Hierarchy of a Typical WSC

• A server consists of
  - a number of processor sockets, each with a multicore CPU
  - internal cache hierarchy
  - local shared and coherent DRAM
  - a number of directly attached disk drives.
• The DRAM and disk resources within the rack are accessible through the first-level rack switches (assuming some sort of remote procedure call API to them).
• All resources in all racks are accessible via the cluster-level switch.

Bandwidth and Latency between these Layers

Performance Across Blades

Network is usually the bottleneck

Consider bandwidth and latency across blades

Example: Quantifying Latency, Bandwidth, and Capacity

• Assume a system with 2,000 servers, each with 8 GB of DRAM and four 1-TB disk drives.
• Each group of 40 servers is connected through a 1-Gbps link to a rack-level switch that has an additional eight 1-Gbps ports used for connecting the rack to the cluster-level switch.
• Network latency numbers assume a socket-based TCP/IP transport, and networking bandwidth values assume that each server behind an oversubscribed set of uplinks is using its fair share of the available cluster-level bandwidth.
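The capacity and bandwidth figures implied by this example follow directly from the stated parameters. The sketch below works them out; it assumes decimal units (1 TB = 1,000 GB) and an even split of uplink bandwidth among the servers in a rack, and the variable names are illustrative.

```python
# Parameters stated in the example above.
SERVERS = 2_000
DRAM_GB, DISKS, DISK_TB = 8, 4, 1
RACK_SERVERS, UPLINKS, LINK_GBPS = 40, 8, 1

racks = SERVERS // RACK_SERVERS
rack_dram_gb = RACK_SERVERS * DRAM_GB
rack_disk_tb = RACK_SERVERS * DISKS * DISK_TB
cluster_dram_tb = SERVERS * DRAM_GB / 1000           # assumes 1 TB = 1,000 GB
cluster_disk_pb = SERVERS * DISKS * DISK_TB / 1000   # assumes 1 PB = 1,000 TB

# Servers can offer 40 Gbps toward the rack switch, but only 8 Gbps leaves the rack.
oversubscription = (RACK_SERVERS * LINK_GBPS) / (UPLINKS * LINK_GBPS)
fair_share_mbps = UPLINKS * LINK_GBPS * 1000 / RACK_SERVERS

print(f"{racks} racks, {rack_dram_gb} GB DRAM and {rack_disk_tb} TB disk per rack")
print(f"cluster: {cluster_dram_tb:.0f} TB DRAM, {cluster_disk_pb:.0f} PB disk")
print(f"oversubscription {oversubscription:.0f}x, fair share {fair_share_mbps:.0f} Mbps per server")
```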


Latency, Bandwidth, and Capacity of a WSC


WSC Memory Hierarchy

• Servers can access DRAM and disks on other servers using a NUMA-style interface

(Courtesy of Hennessy and Patterson, 2012)

4. Power and Cooling Requirements

Typical Datacenter Layout


Power Consumption in Servers


Power and Cooling Requirements

• Cooling system also uses water (evaporation and spills)
  - E.g., 70,000 to 200,000 gallons per day for an 8 MW facility
• Power cost breakdown
  - Chillers: 30-50% of the power used by the IT equipment
  - Air conditioning: 10-20% of the IT power, mostly due to fans
• How many servers can a WSC support?
  - Each server:
    • "Nameplate power rating" gives maximum power consumption
    • To get actual, measure power under actual workloads
  - Oversubscribe cumulative server power by 40%, but monitor power closely
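One way to read the oversubscription guideline is sketched below: size the fleet by nameplate rating, then deploy 40% more servers on the expectation that real workloads draw less than nameplate. This interpretation, the function name, and the sample wattages are assumptions for illustration, not figures from the slides.

```python
def deployable_servers(it_power_budget_w, nameplate_w, oversubscription_pct=40):
    """Return (servers by nameplate rating, servers with oversubscription).

    Integer arithmetic avoids floating-point surprises when truncating.
    """
    by_nameplate = it_power_budget_w // nameplate_w
    oversubscribed = by_nameplate * (100 + oversubscription_pct) // 100
    return by_nameplate, oversubscribed


# Hypothetical inputs: an 8 MW IT power budget and a 500 W nameplate rating.
conservative, aggressive = deployable_servers(8_000_000, 500)
print(f"by nameplate: {conservative} servers; with 40% oversubscription: {aggressive}")
```

The extra servers only fit if measured power stays well under nameplate, which is why the slide pairs the guideline with close power monitoring.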

Measuring Efficiency of a WSC

• Power Utilization Effectiveness (PUE)
  - PUE = Total facility power / IT equipment power
  - Median PUE in a 2006 study was 1.69
• Performance
  - Latency is an important metric because it is seen by users
  - Bing study: users will use search less as response time increases
  - Service Level Objectives (SLOs)/Service Level Agreements (SLAs)
    • E.g., 99% of requests be below 100 ms
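Both metrics on this slide reduce to short calculations. The sketch below computes PUE from the definition given above and checks a latency sample against an SLO of the "99% below 100 ms" form. The function names and sample numbers are hypothetical.

```python
def pue(total_facility_kw, it_equipment_kw):
    """PUE = total facility power / IT equipment power."""
    return total_facility_kw / it_equipment_kw


def meets_slo(latencies_ms, threshold_ms=100, percent=99):
    """True if at least `percent`% of requests finished below `threshold_ms`."""
    within = sum(1 for t in latencies_ms if t < threshold_ms)
    return within * 100 >= percent * len(latencies_ms)


# Hypothetical facility: 1,000 kW of IT load drawing 1,690 kW in total.
print(f"PUE = {pue(1690, 1000):.2f}")

# Hypothetical sample: 990 fast requests and 10 slow ones.
sample = [20] * 990 + [250] * 10
print("SLO met:", meets_slo(sample))
```

A PUE of 1.69 means that for every watt delivered to IT equipment, another 0.69 W goes to cooling, power distribution, and other overhead.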

Efficiency of a WSC

Figure 4.9 The cooling system in a raised-floor data center with hot-cold air circulation supporting water heat exchange facilities

Green Cloud Data Centers

Keeping Computers Cool

5. Data-Center Interconnection Networks

Requirements of Interconnection Network

The data-center interconnection network design must meet 5 special requirements:
• Low latency
• High bandwidth
• Low cost
• Message-Passing Interface (MPI) communication support
• Fault tolerance

The design of an inter-server network must satisfy both point-to-point and collective communication.

Application Traffic Support

• The network topology should support all MPI communication patterns.
• Both point-to-point and collective MPI communications must be supported.
• The network should have high bisection bandwidth to meet this requirement.
  - For example, one-to-many communications are used for supporting distributed file access. One can use one or a few servers as metadata master servers, which need to communicate with slave server nodes in the cluster.
• To support the MapReduce programming paradigm, the network must be designed to perform the map and reduce functions at a high speed.
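To illustrate why MapReduce stresses the network, the toy word count below runs a map phase on each node, shuffles the intermediate pairs to reducers, and counts how many pairs have to cross between nodes. It is a single-process sketch under stated assumptions (one mapper and one reducer per node, a CRC-based partitioner), not Hadoop's actual implementation.

```python
import zlib
from collections import Counter, defaultdict


def map_phase(doc):
    """Emit (word, 1) for every word in a document."""
    return [(word, 1) for word in doc.split()]


def partition(key, n_reducers):
    # Deterministic partitioner; Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode()) % n_reducers


def run(docs_by_node):
    n = len(docs_by_node)  # assumption: one mapper and one reducer per node
    inbox = defaultdict(list)
    crossed = 0
    for node, doc in enumerate(docs_by_node):
        for key, value in map_phase(doc):
            target = partition(key, n)
            crossed += target != node  # this pair must travel over the network
            inbox[target].append((key, value))
    counts = Counter()
    for pairs in inbox.values():  # reduce phase: sum the values for each key
        for key, value in pairs:
            counts[key] += value
    return dict(counts), crossed


counts, crossed = run(["a b a", "b c"])
print(counts, f"{crossed} of 5 intermediate pairs crossed the network")
```

The shuffle step is effectively all-to-all traffic, which is where high bisection bandwidth pays off.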


Network Expandability

• Data centers are not built by piling up servers in multiple racks today.
• Instead, data-center owners buy server containers, where each container holds several hundred or even thousands of server nodes.
• The owners can just plug in the power supply, outside connection link, and cooling water, and the whole system will just work.
• This is quite efficient and reduces the cost of purchasing and maintaining servers.
• One approach is to establish the connection backbone first and then extend the backbone links to reach the end servers.

Google Container Based Data Center

http://www.youtube.com/watch?v=zRwPSFpLX8I


Fault Tolerance and Graceful Degradation

• The interconnection network should provide some mechanism to tolerate link or switch failures.
• In addition, multiple paths should be established between any two server nodes in a data center.
• Fault tolerance of servers is achieved by replicating data and computing among redundant servers.
• Similar redundancy technology should apply to the network structure.
• On the software side, the software layer should be aware of network failures. Packet forwarding should avoid using broken links.
• In case of failures, the network structure should degrade gracefully amid limited node failures.
• Hot-swappable components are desired.
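A forwarding layer that avoids broken links can be sketched as a path search that skips failed links. The example uses breadth-first search over a small hypothetical topology in which each server is dual-homed to two switches; the topology and all names are illustrative.

```python
from collections import deque


def find_path(links, src, dst, failed=frozenset()):
    """Breadth-first search over undirected links, skipping failed ones."""
    adj = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # destination unreachable


# Hypothetical topology: two servers, each connected to both switches.
LINKS = [("s1", "swA"), ("s1", "swB"), ("s2", "swA"), ("s2", "swB")]

print(find_path(LINKS, "s1", "s2"))
print(find_path(LINKS, "s1", "s2", failed={("s1", "swA")}))
```

With one link down, traffic shifts to the other switch; connectivity is lost only when both of a server's links fail, which is the graceful degradation the slide asks for.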


Two Approaches to Building Data-Center-Scale Networks

Switch-centric:
• The switches are used to connect the server nodes.
• It does not affect the server side.

Server-centric:
• The server-centric design does modify the operating system running on the servers.
• Special drivers are designed for relaying the traffic.
• Switches still have to be organized to achieve the connections.

A Fat-Tree Interconnection Network for Data Centers

The failure of an aggregation switch and core switch will not affect the connectivity of the whole network. The failure of any edge switch can only affect a small number of end server nodes.


A Fat-Tree Interconnection Network for Data Centers

• The topology is organized into two layers.
• Server nodes are in the bottom layer, and edge switches are used to connect the nodes in the bottom layer.
• The upper layer aggregates the lower-layer edge switches.
• A group of aggregation switches, edge switches, and their leaf nodes form a pod.
• Core switches provide paths among different pods.
• The fat-tree structure provides multiple paths between any two server nodes.
  - This provides fault-tolerance capability with an alternate path in case of some isolated link failures.
• The extra switches in a pod provide higher bandwidth to support cloud applications in massive data movement.
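The slides do not give sizing formulas, but the commonly cited k-ary fat-tree construction (built entirely from identical k-port switches) makes the pod structure concrete. The sketch below assumes that construction; it is offered as an illustration rather than as the exact topology in the figure.

```python
def fat_tree(k):
    """Component counts for a k-ary fat tree built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,          # k/2 per pod
        "aggregation_switches": k * half,   # k/2 per pod
        "core_switches": half * half,
        "servers": k * half * half,         # k**3 / 4
        "inter_pod_paths": half * half,     # one shortest path per core switch
    }


for k in (4, 48):
    print(k, fat_tree(k))
```

Under this construction, 48-port switches support 27,648 servers, and any two servers in different pods have 576 equal-length paths between them.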


Modular Data Center in Shipping Containers

• A modern data center is structured as a shipyard of server clusters housed in truck-towed containers.
• Inside the container, hundreds of blade servers are housed in racks surrounding the container walls.
• The SGI ICE Cube container van can house 46,080 processing cores or 30 PB of storage per container.
• Large-scale data centers built with modular containers appear as a big shipping yard of container trucks.

Motivations for Container-Based Data Center

This container-based data center was motivated by demand for
• Lower power consumption,
• Higher computer density, and
• Mobility to relocate data centers to better locations with lower electricity costs, better cooling water supplies, and cheaper housing for maintenance engineers.

Sophisticated Cooling Technology

• Enables up to 80% reduction in cooling costs compared with traditional warehouse data centers.
• Both chilled air circulation and cold water flow through the heat exchange pipes to keep the server racks cool and easy to repair.

Interconnection of Modular Data Centers

• Container-based data-center modules are meant for construction of even larger data centers using a farm of container modules.
• Some proposed designs of container modules:
  - Guo, et al. have developed a server-centric BCube network (next figure) for interconnecting modular data centers.
  - The servers are represented by circles, and switches by rectangles. The BCube provides a layered structure. The bottom layer contains all the server nodes and they form Level 0. Level 1 switches form the top layer of BCube 0.
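The layered BCube structure can be made concrete with its sizing and addressing rules. The sketch below follows the definition in Guo et al.'s paper (a BCube_k built from n-port switches), which the slides do not spell out, and shows a simplified single-path routing in which servers relay packets by fixing one address digit per hop. Treat it as an illustration of the server-centric idea, not as the paper's full routing protocol.

```python
def bcube_size(n, k):
    """Component counts for a BCube_k built from n-port switches."""
    return {
        "servers": n ** (k + 1),
        "switch_levels": k + 1,
        "switches_per_level": n ** k,
        "ports_per_server": k + 1,
    }


def bcube_route(src, dst):
    """Single-path routing sketch: fix one differing address digit per hop.

    Addresses are tuples of base-n digits (a_k, ..., a_0). Two servers whose
    addresses differ in exactly one digit share a switch, so every hop in the
    returned path is server -> switch -> server, with servers doing the relaying.
    """
    path, cur = [tuple(src)], list(src)
    for i, digit in enumerate(dst):
        if cur[i] != digit:
            cur[i] = digit
            path.append(tuple(cur))
    return path


print(bcube_size(4, 1))
print(bcube_route((0, 0), (1, 3)))
```

With 4-port switches and two levels, this yields 16 servers with 2 ports each, and any pair of servers is at most two relay hops apart.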

A Server-Centric Network for a Modular Data Center

Figure 4.12 BCube: A high performance, server-centric network for building modular datacenters. (Courtesy of C. Guo, et al, ACM SIGCOMM Computer Communication Review, Oct. 2009. [25]).

The BCube provides a kernel module in the server OS to perform routing operations. The kernel module supports packet forwarding while the incoming


Inter-Module Connection Networks

• The BCube is commonly used inside a server container.
• The containers are considered the building blocks for data centers.
• Thus, despite the design of the inner container network, one needs another level of networking among multiple containers.
• In the next figure, Wu, et al. have proposed a network topology for intercontainer connection using the aforementioned BCube network as building blocks.
• The proposed network was named MDCube (for Modularized Datacenter Cube).
• This network connects multiple BCube containers by using high-speed switches in the BCube.
• Similarly, the MDCube is constructed by shuffling networks with multiple containers.

Modularized Datacenter Cube

Figure 4.13 A 2-D MDCube is constructed from 9 BCube containers. (Courtesy of Wu, et al., ACM CoNEXT'09, Dec. 2009, [77]).


Design Considerations for WSC

• Cost-performance
  - Small savings add up
• Energy efficiency
  - Affects power distribution and cooling
  - Work per joule
• Dependability via redundancy
• Network I/O
• Interactive and batch processing workloads
• Ample computational parallelism is not important
  - Most jobs are totally independent
  - "Request-level parallelism"
• Operational costs count
  - Power consumption is a primary constraint when designing system
• Scale and its opportunities and problems
  - Can afford customized systems since WSCs require volume purchase

WSCs offer Economies

WSCs offer economies of scale that cannot be achieved with a datacenter:
• 5.7 times reduction in storage costs
• 7.1 times reduction in administrative costs
• 7.3 times reduction in networking costs

This has given rise to cloud services such as Amazon Web Services
• "Utility Computing"
• Based on using open source virtual machine and operating system software

(Courtesy of Hennessy and Patterson, 2012)

Data-Center Management Issues

Making common users happy
• The data center should be designed to provide quality service to the majority of users for at least 30 years??

Controlled information flow
• Information flow should be streamlined. Sustained services and high availability (HA) are the primary goals.

Multiuser manageability
• The system must be managed to support all functions of a data center, including traffic flow, database updating, and server maintenance.

Scalability to prepare for database growth
• The system should allow growth as workload increases. The storage, processing, I/O, power, and cooling subsystems should be scalable.

Data-Center Management Issues (cont.)

Reliability in virtualized infrastructure
• Failover, fault tolerance, and VM live migration should be integrated to enable recovery of critical applications from failures or disasters.

Low cost to both users and providers
• The cost to users and providers of the cloud system built over the data centers should be reduced, including all operational costs.

Security enforcement and data protection
• Data privacy and security defense mechanisms must be deployed to protect the data center against network attacks and system interrupts and to maintain data integrity from user abuses or network attacks.

Green information technology

Challenges in Cloud Computing (1)

Concerns from the Industry (Providers)

Replacement Cost
• Exponential increase in cost to maintain the infrastructure

Vendor Lock-in
• No standard API or protocol can be very serious

Standardization
• No standard metric for QoS is limiting the popularity

Security and Confidentiality
• Trust model for cloud computing

Control Mechanism
• Users do not have any control over infrastructures

Challenges in Cloud Computing (2)

Concerns from Research Community:
• Conflict with legacy programs
  - With difficulty in developing a new application due to lack of control
• Provenance
  - How to reproduce results in different infrastructures
• Reduction in Latency
  - No specially designed interconnect used
  - Very low controllability in layout of interconnect due to abstraction
• Programming Model
  - Hard to debug where programming is naturally error-prone
  - Details about infrastructure are hidden
• QoS Measurement
  - Especially for ubiquitous computing where context changes

7. Data Centers around the World

Colocation Data Centers

Currently there are 3056 colocation data centers from 95 countries in the index.

http://www.datacentermap.com/datacenters.html

Colocation Turkey

Currently there are 29 colocation data centers from 7 areas in Turkey (Türkiye).

http://www.datacentermap.com/datacenters.html

Data Center Map

• The data centers listed are just the ones updated by users and editors.
• In addition, corporate data centers are conspicuously missing from the list
  - For instance, the ones set up by multinationals like Google, Microsoft, and Intel.
• However, the site offers a comprehensive list of data centers grouped in a country-by-country list, which gives a clear picture of the distribution of datacenters globally.

Locations of Google Data Centers

http://www.google.com/about/datacenters/inside/locations/


Acknowledgements

The slides have been based in part upon original slides of a number of books and professors, including:

Distributed and Cloud Computing: From Parallel Processing to The Internet of Things, K. Hwang, G. Fox and J. Dongarra, Morgan Kaufmann Publishers, 2012.

The Datacenter as a Computer, An Introduction to the Design of Warehouse-Scale Machines, L. A. Barroso, U. Hölzle (Google Inc.), (Mark D. Hill, Series Editor), Morgan & Claypool, 2009.

High Performance Datacenter Networks, Architectures, Algorithms, and Opportunities, D. Abts, J. Kim, (Mark D. Hill, Series Editor), Morgan & Claypool, 2011.
