Last time. Data Center as a Computer. Today. Data Center Construction (and management)

Academic year: 2021

Data Center Construction (and management)

Johan Tordsson

Department of Computing Science

Last time

1. Common (Web) application architectures

– N-tier applications

• Load Balancers
• Application Servers
• Databases

2. Cloud application guidelines

– Scalability
– Fault-tolerance
– Some best practices
– A few Amazon examples

Today

• Data centers – how to build (and operate)

– Servers
– Network
– Storage
– Power
– Cooling
– Energy-efficiency

– … and a building to keep everything in…

• Conceptual overview only

– Details about these only relevant for those who actually build/operate data centers…

Data Center as a Computer

• The majority of cloud computing infrastructure consists of reliable services delivered through data centers

• Traditional co-location data centers

– Multiple servers and communications gear co-located due to common environmental & security needs

– Host a large number of relatively small or medium-sized applications, each running on a dedicated hardware infrastructure

• Data centers for cloud computing platforms

– Belong to a single organization

– Use a relatively homogeneous hardware and system software platform

– Common system management layer

Warehouse Scale Computers (WSC)

• Not just a collection of servers

– 100s to 1000s of coordinated servers
– Typically runs on a virtualized platform
– Fault behavior & energy considerations have significant impact

– Needs to be considered as a single unit

• Must be highly manageable

– Deployment of software updates
– Monitoring & system management

• Affordability

– WSCs currently power the public clouds of Google, Amazon, Yahoo, Microsoft, etc…

– Soon to be affordable for enterprises

• A rack of servers can easily have > 600 cores

What’s different about WSCs?

“As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC).”

Google “Warehouse Style Computer” Data Center

The New Data Center Industry

• Data centers replace servers

• Container computers for high efficiency and environmental conservation (Packaging, PUE, …)

• Bundled software for integrated service, high scalability, and availability

• Large enterprises will bypass traditional server channels (IBM, HP, Dell, …)

– Purchase of entire data center directly from manufacturers

• Significant cost reductions
• Horizontal scalability
• High availability

• Google already purchases directly from manufacturers in Taiwan

– Google is the 4th-largest server manufacturer, but does not sell…

– Facebook’s opencompute.org project

Container Computers

Data Center Architecture

• Treat the entire data center as a computer

- Air flow analysis
- Cooling architecture (thermal management)
- Power/energy management
- Focus on ease of system and network management
- What cannot be managed/monitored does not get deployed

• Modular and Scalable

- Card to Rack
- Rack to Container
- Container to Warehouse

• Explore low-power, commodity CPUs as a building block

Data center server hardware

• Standard servers
• Standard networks
• Standard storage
• But at a very large scale
• Comparison: Parallel computer

– Custom high-performance hardware (?)
– Fast interconnection networks

Design Motivation

• Multicore CPUs in mid-range servers typically carry a price/performance benefit

– 2-5 times cheaper than top-of-the-line systems

• Many services are memory-bound

– Faster CPUs do not scale well for large services
– Applications are larger-than-server anyway

• Slower CPUs are more power efficient

Cost comparison example

Server and network overview

• High-latency, low-price network

– Gigabit ethernet

• Hierarchy of commodity switches

Storage

• Capacity increases with distance from the CPU (local, rack, data center)
• Latency increases and bandwidth decreases with distance

Data center management tools

[Figure: data center management tool stack. Labels: Physical Cluster Deployment Tool; Virtual Cluster Provisioning; Power Management; Intra-Virtual-Cluster Load Balancing; Network/System Management; Security; Cloud Application Management Tool; Physical Compute Servers; Network]

Management (cont.)

• Virtualization Platform (virtualize everything)

– CPUs

– Storage (Filesystems)
– Network

• Resource Management

– Provisioning of virtual clusters
– Physical machine load balancing
– Network traffic load balancing

• Power Management

• Security

– Hypervisor protection
– Isolation between clusters

• System Management

• High Availability

– Physical component failure should not interrupt availability of virtual resources

• Cloud Applications management

• Unless a resource can be remotely managed, it should not be part of the data center…

Virtualization Platform

• Leverage existing hypervisors
– Allocation of virtual machine instances
– Monitor VM performance
– Virtual storage provisioning
– Intra-virtual-cluster load balancing
– Scalable data center network
– Isolation between virtual clusters
– Virtual machine migration

[Figure: virtual clusters (Mail, Bkup, HC, AppXYZ) mapped onto physical nodes and storage servers. Labels: Compute Nodes; Data Nodes; Service Nodes; System Service daemons; Cloud OS agents]

Virtual Machine Management

• Objective

– Power Management

– Physical Machine Load Balancing

• Monitor runtime VM statistics

– Heuristic calculation to predict workloads

• Determine power down/up of machines

– Multi-dimensional bin packing (knapsack)

• CPU, network, disk

– VM migration algorithm

• Physical machine load balancing

– Migration of VMs to other physical machines
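The bin-packing step above can be sketched in a few lines. This is only an illustration, not the algorithm used by any particular cloud platform: it uses the simple first-fit-decreasing heuristic, assumes each VM's demand is a (CPU, network, disk) vector normalized to the capacity of one physical machine, and all VM names and numbers are made up.

```python
from typing import Dict, List, Tuple

Demand = Tuple[float, float, float]  # (cpu, network, disk), as fractions of one host


def fits(load: Demand, demand: Demand, capacity: Demand) -> bool:
    """True if adding `demand` to `load` stays within `capacity` in every dimension."""
    return all(l + d <= c + 1e-9 for l, d, c in zip(load, demand, capacity))


def first_fit_decreasing(
    vms: Dict[str, Demand], capacity: Demand = (1.0, 1.0, 1.0)
) -> List[List[str]]:
    """Place VMs on as few hosts as this heuristic manages; returns VM names per host."""
    # Sort by the largest single-dimension demand, biggest first.
    order = sorted(vms, key=lambda name: max(vms[name]), reverse=True)
    loads: List[Demand] = []
    placement: List[List[str]] = []
    for name in order:
        demand = vms[name]
        if not fits((0.0, 0.0, 0.0), demand, capacity):
            raise ValueError(f"{name} does not fit on an empty host")
        for i, load in enumerate(loads):
            if fits(load, demand, capacity):
                loads[i] = tuple(l + d for l, d in zip(load, demand))
                placement[i].append(name)
                break
        else:
            # No existing host has room in all dimensions: power up another one.
            loads.append(demand)
            placement.append([name])
    return placement


# Hypothetical predicted workloads for four VMs.
vms = {
    "web": (0.5, 0.2, 0.1),
    "db": (0.4, 0.1, 0.6),
    "cache": (0.3, 0.3, 0.1),
    "batch": (0.5, 0.1, 0.2),
}
placement = first_fit_decreasing(vms)
print(f"{len(placement)} hosts needed: {placement}")
```

Hosts that end up empty after such a packing are candidates for power-down; moving the VMs to their new hosts is what the VM migration step does.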

Power

• To run servers
• To run data center

Power (cont.)

• Uninterruptible Power Supply (UPS)

– Detects power failure

– Batteries (for short-term outage + switch)
– (Diesel) generator (long-term outage)

• Power Distribution Unit (PDU)

– Fancy socket with power distribution and/or control

• Power usage breakdown

Power (cont.)

• Data centers are major power users

– Common claim: ~4% of world electricity use

• Example: Facebook in Luleå (120MW)

– ~1BSEK (1’000’000’000 SEK) / year (list prices)

• Exponential growth of data center capacity & cheaper server hardware

– Power costs (will) dominate. Exponential power use?

• Cost breakdown (examples):
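Returning to the Luleå example above, the ~1 BSEK/year figure can be checked with a short calculation. This is a rough sketch under two assumptions that are not stated on the slide: the facility draws its full 120 MW around the clock, and the list price of electricity is about 1 SEK/kWh.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(power_mw: float) -> float:
    """Energy drawn in one year at a constant load, in kWh."""
    return power_mw * 1000.0 * HOURS_PER_YEAR


def annual_cost_sek(power_mw: float, price_sek_per_kwh: float) -> float:
    """Yearly electricity bill in SEK at a flat price per kWh."""
    return annual_energy_kwh(power_mw) * price_sek_per_kwh


energy = annual_energy_kwh(120.0)
cost = annual_cost_sek(120.0, price_sek_per_kwh=1.0)  # assumed list price
print(f"{energy / 1e9:.2f} TWh/year, {cost / 1e9:.2f} BSEK/year")
```

Under these assumptions the bill lands at roughly one billion SEK per year, which matches the order of magnitude on the slide.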

Cooling

• Keep heat-generating servers cool

• Computer Room Air Conditioners (CRAC)

- Like room air conditioners, for server rooms

• Very complex to model and design

- Airflow is 3D and non-linear

Cooling (cont.)

• Cooling by water

– Water cooling very close to servers

• Cooling by sea water

– Practically unlimited availability of cool water

– Example: Google Finland

• Cooling by location

Energy-efficiency

• Not all power is used by servers…

• Power Usage Effectiveness (PUE)

– Total facility power / power used for computing:

• Typical: 2.0

• State-of-the-art: ~1.2

• Quite a few variants of the definition

– Many to make data centers look good

• Others look at power source

– Carbon vs. solar vs. ….
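A small worked example of the basic PUE ratio (total facility power divided by the power used for computing). The 1 MW IT load is an assumed figure for illustration only; the two PUE values are the "typical" and "state-of-the-art" numbers from the slide.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / power delivered to the computing equipment."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Power spent on everything except computing (cooling, power distribution, ...)."""
    return it_equipment_kw * (pue_value - 1.0)


it_load_kw = 1000.0  # assumed IT load of 1 MW
typical_overhead = overhead_kw(it_load_kw, 2.0)
best_overhead = overhead_kw(it_load_kw, 1.2)
print(f"PUE 2.0: {typical_overhead:.0f} kW overhead")
print(f"PUE 1.2: {best_overhead:.0f} kW overhead")
```

At PUE 2.0 every watt of computing costs another watt of overhead; at PUE 1.2 the overhead shrinks to a fifth of a watt.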

Energy-efficiency (cont.)

• Non-linear server power usage

– Performance/power ratio changes with load

• High server utilization beneficial

– But not common by default
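Why high utilization pays off can be seen with a toy power model. The sketch below assumes a server that draws 100 W when idle and 200 W at full load, with power rising linearly in between; these numbers and the linear shape are illustrative assumptions, not measurements.

```python
def server_power_w(utilization: float, idle_w: float = 100.0, peak_w: float = 200.0) -> float:
    """Assumed power model: a fixed idle draw plus a part proportional to load."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_w + (peak_w - idle_w) * utilization


def work_per_watt(utilization: float) -> float:
    """Useful work (as a fraction of peak throughput) delivered per watt."""
    return utilization / server_power_w(utilization)


for u in (0.1, 0.3, 0.5, 1.0):
    print(f"load {u:4.0%}: {server_power_w(u):5.1f} W, "
          f"efficiency relative to full load: {work_per_watt(u) / work_per_watt(1.0):.2f}")
```

Because the idle draw is paid regardless of load, a lightly loaded server in this model delivers only a fraction of the work per watt that a fully loaded one does.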

Energy efficiency (cont.)

• 5k Google servers (6 months)

Energy efficiency (cont.)

• Consolidate workloads

– Power servers off
– Or slow servers down

• Dynamic Voltage and Frequency Scaling (DVFS)

– Very hard to assess impact for bursty (rapidly changing) workloads

• Oscillations and unwanted correlations
• More next time…

• Consolidation requires software support

– Must be able to start/stop instances and autoscale

Costs for a Data Center

• How much performance is required?

– How many/fast servers, disks, networks etc.?
– Data center size is measured in power (watts)

• How much power is needed?

– PUE?

– How much cooling?
– Price of electricity?

• What additional physical equipment is needed?

– Redundancy of power and cooling

• Where to place it, given the above?

– Costs vs. location of users

– Very attractive to host data centers…

Cloud computing = cost cuts?

• Amazon EC2 examples

• Small VM, 3 years full use (est. server lifetime)

– Per h: $0.08*(24*365*3) -> ~$2100 (!)
– Reserved: $300 + $0.013*(24*365*3) -> ~$640

• Rough estimate of costs for Amazon (according to ”data center as a computer”)

– Assume server cost is 25% of total cost (TCO)
– Standard $2k (list price) 1U server today:

• 32 cores + memory, disk etc. Total cost (TCO): $2k / 25% = $8k
• Estimate: Can deliver 64 Small VMs

– Revenue: $2100*64 = ~$134k -> ~17 times the total cost!
– Amazon does not pay list prices

• 90% discount rumoured

• With 24/7 use, hourly prices are very high…
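The arithmetic above can be reproduced directly. All prices and ratios below are the ones quoted on the slide (they are old list prices, not current EC2 pricing); the variable names are just for illustration.

```python
HOURS_3_YEARS = 24 * 365 * 3  # estimated server lifetime


def on_demand_cost(rate_per_hour: float, hours: int = HOURS_3_YEARS) -> float:
    """Cost of running one VM around the clock at the hourly price."""
    return rate_per_hour * hours


def reserved_cost(upfront: float, rate_per_hour: float, hours: int = HOURS_3_YEARS) -> float:
    """Cost of a reserved VM: one-time fee plus a reduced hourly price."""
    return upfront + rate_per_hour * hours


on_demand = on_demand_cost(0.08)
reserved = reserved_cost(300.0, 0.013)

server_list_price = 2000.0   # $2k 1U server
server_share_of_tco = 0.25   # server assumed to be 25% of total cost
total_cost = server_list_price / server_share_of_tco
vms_per_server = 64

revenue = on_demand * vms_per_server
print(f"on-demand: ${on_demand:,.0f}, reserved: ${reserved:,.0f}")
print(f"total cost: ${total_cost:,.0f}, revenue: ${revenue:,.0f}, "
      f"ratio: {revenue / total_cost:.1f}x")
```

The ratio of roughly 17 compares three years of on-demand revenue with the estimated $8k total cost at list prices, before any hardware discount.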

Cloud cost life cycle

1. Develop service

– Run in-house for testing and very early use

2. Move to cloud-hosting

– To handle large scale-up of user base

3. Build own data center to cut hosting costs

– Once size of service is roughly known
– Unless major price cuts by IaaS providers, this will happen for more and more SaaS providers as server and data center costs drop…

Conclusions

• Data centers at warehouse scale

– More than just a group of servers
– Holistic management perspective needed

• Standard solutions superior

– Off-the-shelf servers, networks, disks, etc.
– Redundancy, scalability, etc. in software layer

Suggested reading

• ”Data center as a computer”

– Barroso & Hölzle (Google)

• Read (somewhat) carefully:

– Chapter 1, Chapters 3-5

• Focus on principles, ignore numbers (examples are a few years old...)

• Skim:

– Chapter 2

• Overlaps texts from last lecture + Data management lecture

Next time….

• Thursday:

Data center #2: Autonomic management

– Data centers are large
– Cloud services are complex
– How to make these

• Configure themselves?
• Optimize themselves?
• Heal themselves?

• Delay project demos???

– From Thursday (31 May) to Monday (June 4)?
– 3 hours, 13-16:

• 1h review + evaluation (me)
• 2h presentation + demo (you)
