(1)

Challenges in Modern

Data-Centers Management

(2)

Information provided in these slides is

for educational purposes only

(3)

Welcome

Hebrew - Shalom

Arabic - Ahlan'wa sahla

Bosnian - Dobrodošli

Chinese (Cantonese) - 歡迎 (fòonying)

Chinese (Mandarin) - 欢迎 [simplified], 歡迎 [traditional] (huānyíng)

Czech - Vítáme tě

Danish - Velkommen

Dutch - Welkom

French - Bienvenue

(4)

Outline

Introduction

• Administrative and academic stuff

Data centers

• History and facts

Our course

• Lecture by lecture – what will be covered in each?

(5)
(6)

Me (Edi)

With Intel since 2011

• Formerly with IBM (almost 17 years)

PhD in Computer Science from the Hebrew University

• Prof. Dror G. Feitelson

Interested in anything related to “Systems”

• OS, Virtualization, Storage, etc.

• Distributed systems – resource management & job scheduling

• Performance evaluation and modeling

• Etc.

(7)

Why the course?

Data centers are big businesses

• 50 years of technological evolution

• Special skills required to operate them (experience, legacy)

IT team in Intel (IDC-Haifa)

• Responsible for the data center facility, continuous operation, solutions development and deployment, user engagement, etc.

Huge experience (legacy...)

Goal is to expose some of this experience in a structured way

1. Challenges we face

2. Solutions e.g., technologies, algorithms used to address them

(8)

Administration

Edi (me) – Main instructor and responsible for the course

[email protected]

Jalil (him) – Our super-talented teaching assistant

[email protected]

Danny (not here) – Advisor and high-level supervisor

[email protected]

Important dates

• Lectures: Wednesdays 14:30-16:30, Taub 6

• Exams

• Moed A – 3/7/2015

• Moed B – 20/9/2015

(9)

Academic

Pre-requisites

• Basic knowledge of networking, computer systems, and distributed systems (e.g., clusters) should be enough

Requirements

1. Must attend 80% of the lectures

2. Must deliver homework assignments (30% of the grade)

• 4-5 assignments

3. Final exam (70% of the grade)

• 2-3 open questions + a few closed ones (multiple-choice)

• In the spirit of the homework assignments

Our site

(10)

Schedule (tentative)

(11)

Questions?

(12)

Data centers

(13)

Data center

Facility used to house computer systems and associated components

• Telecommunications, storage systems, etc.

“Production floor” of most modern companies, e.g.,

• Google – Information processing, etc.

• Amazon – Sales, Hosting (AWS), etc.

(14)

History

Started as a facility to house old complex computing systems

Challenges in Modern Data Centers Management, Spring 2015 14

(15)

History cont.

Big boost during the Dot-Com era (1997–2000)

• Companies emerged whose business solely surrounds the Web

• Requiring fast Internet connectivity and 24/7 non-stop operation

• Special facilities built to house such businesses

• Internet data centers (IDC)

• Leading to new technologies and practices

• Eventually migrated to the private data centers

• Grid-computing phenomenon

(16)

History cont.

Another big transformation as part of the Cloud era (2007+)

• New design and deployment philosophy

• Redundant (multiple copies), scalable (elasticity), high-availability (stateless)

• Technology makes “hosting” economically attractive

• Even for large-scale enterprises

• Environmental impact receives special attention

• Standards bodies specify requirements

• Huge effort to make data-centers appear “Green”

(17)

Some facts

A large data center can consume as much electricity as a small town

• In 2010, data centers accounted for 1.1%-1.5% of global electricity use

Electricity spending accounts for 25-30% of a data center's TCO

(18)

Some facts cont.

Average life of a data center is 9 years

• Older than 7 years considered out-of-date (Green-computing)

A minute of data center downtime may cost tens of thousands of dollars

• High availability is a critical component of the design

(19)

Facebook

(20)

IBM BlueGene/P

Originally posted to Flickr, CC BY-SA 2.0

(21)
(22)

Our course

Focuses on common management challenges

• Generic enough so they fit most usage models

Impossible to cover all challenges

• Filtered down to the ones the team has experience with

• Chose the ones we believe are most important

The team

• Responsible for the data center facility, continuous operation, solution development & deployment, etc.

Domain experts

• Facility, networking, resource management, storage, business intelligence and analytics, security, etc.

Lectures: Introduction, Facility basics, Networks, RM Part I, RM Part II, RM Part III, Data access, Business Intelligence, Predictive Analytics, DC visit, Security Part I, Security Part II, Summary

(23)

Facility basics

Building a data center is expensive

• Single rack location construction can cost up to $80K

• Total spend can reach hundreds-of-millions of $$

Four main elements of the facility

• Power, Cooling, Space, Networks

Total Cost of Ownership (TCO)

• Initial capital (CapEx)

• Long-term operational expenditures (OpEx)

Power Usage Effectiveness (PUE)

• Power efficiency performance indicator
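PUE is commonly defined as total facility power divided by the power delivered to IT equipment, so a value close to 1.0 means little overhead for cooling and power distribution. A minimal sketch, assuming made-up power figures (the function name and the numbers are illustrative, not from the course):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW reaching IT equipment.
# The remaining 500 kW goes to cooling, power conversion, lighting, etc.
example = pue(total_facility_kw=1500.0, it_equipment_kw=1000.0)
print(f"PUE = {example:.2f}")
```

A lower PUE reduces the electricity part of OpEx without changing the IT load itself.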

(24)

Facility – challenges

Optimizing cooling

• Hot aisle, hot/cold air containment, free cooling

Optimizing power feeding

• Redundancy dilemma

• AC vs. DC

Optimizing refresh rate

• 4-year optimal lifespan

(25)

Networks

Veins and arteries of the data center

• Play key role in its performance and high-availability

Ensuring adequate availability

• Redundancy at layer 2 (data-link)

• Spanning Tree Protocol (STP) & Rapid STP (RSTP)

• Per VLAN spanning tree (PVST)

• Multi-Switch Link aggregation (M-LAG)

• Redundancy at layer 3 (IP)

• Virtual Router Redundancy Protocol (VRRP)

(26)

Resource management I – III

A 5% resource waste in a 10K-server data center can cost $3K/day (about $1M/year)

• It is critical to utilize resources as efficiently as possible
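The yearly figure follows directly from the daily one. The sketch below reproduces that arithmetic and also derives the per-server daily cost the slide's numbers imply; that implied cost is a back-calculation, not a figure from the course.

```python
def yearly_waste_usd(daily_waste_usd: float, days_per_year: int = 365) -> float:
    """Scale a daily waste figure to a full year."""
    return daily_waste_usd * days_per_year


def implied_daily_cost_per_server(daily_waste_usd: float,
                                  servers: int,
                                  waste_fraction: float) -> float:
    """Cost per server per day implied by the waste figure (back-calculated)."""
    wasted_servers = servers * waste_fraction
    return daily_waste_usd / wasted_servers


# Figures from the slide: 10K servers, 5% waste, $3K/day.
print(yearly_waste_usd(3_000))                              # dollars per year
print(implied_daily_cost_per_server(3_000, 10_000, 0.05))   # dollars per server per day
```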

Resource management system (RMS) / Scheduler

1. Accepts requests from the users (millions per day)

• VMs (Amazon), Map-reduce (Hadoop), Chip simulations (Intel), etc.

2. Queues and prioritizes them (decides which job to execute next)

• Subject to constraints, e.g., ensuring shares, deadlines, etc.

3. Allocates resources and launches the jobs on selected resources

• Various heuristics
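The three steps can be sketched as a toy scheduler. This is only an illustration under simplifying assumptions: a single resource dimension (cores), a lower number meaning higher priority, and first-fit placement. `Job`, `TinyScheduler`, and the host names are hypothetical, and jobs that do not fit are simply skipped and kept in the queue.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumption: lower number = more urgent
    cores: int


class TinyScheduler:
    """Toy RMS: accept requests, queue by priority, allocate on the first host that fits."""

    def __init__(self, free_cores):
        self.free_cores = dict(free_cores)  # host name -> free cores
        self._queue = []                    # heap of (priority, arrival, job)
        self._arrivals = 0

    def submit(self, job):
        # Steps 1-2: accept the request and queue it by priority,
        # breaking ties by arrival order.
        heapq.heappush(self._queue, (job.priority, self._arrivals, job))
        self._arrivals += 1

    def dispatch(self):
        # Step 3: walk the queue in priority order and place each job
        # on the first host with enough free cores.
        placements, waiting = [], []
        while self._queue:
            entry = heapq.heappop(self._queue)
            job = entry[2]
            host = next((h for h, c in self.free_cores.items() if c >= job.cores), None)
            if host is None:
                waiting.append(entry)       # cannot run now, keep it queued
                continue
            self.free_cores[host] -= job.cores
            placements.append((job.name, host))
        for entry in waiting:
            heapq.heappush(self._queue, entry)
        return placements

    def pending(self):
        return [entry[2].name for entry in sorted(self._queue, key=lambda e: e[:2])]
```

For example, with hosts `{"h1": 4, "h2": 8}` and jobs needing 6, 4, and 16 cores, the two smaller jobs are placed and the 16-core job stays pending.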

(27)

Resource management I

Proportional-share scheduling

• A very common scheduling heuristic used in data centers

• Every entity (VO, project, user) should get its promised share of the resources

Challenges

1. How to measure resource consumption?

• 1 core × 4 GB vs. 3 cores × 1 GB

2. How to ensure fast ramp-up?

• Limits, logical and physical buffers

3. Considering history

• Is this really important?
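One possible way to make the first challenge concrete is sketched below. It assumes consumption is measured by the dominant resource (the largest fraction of any single resource a job uses, as in Dominant Resource Fairness) and that the next entity served is the one furthest below its promised share. Both choices are assumptions for illustration, not necessarily what the course presents; all names and numbers are hypothetical.

```python
def dominant_usage(cores, mem_gb, total_cores, total_mem_gb):
    """Consumption as the largest fraction of any one resource (assumed metric)."""
    return max(cores / total_cores, mem_gb / total_mem_gb)


def next_entity(shares, usage):
    """Pick the entity whose usage is lowest relative to its promised share.

    shares: entity -> promised fraction of the cluster (each > 0)
    usage:  entity -> fraction of the cluster currently consumed
    """
    return min(shares, key=lambda e: usage.get(e, 0.0) / shares[e])


# Hypothetical cluster: 100 cores, 400 GB of memory.
job_a = dominant_usage(cores=1, mem_gb=4, total_cores=100, total_mem_gb=400)
job_b = dominant_usage(cores=3, mem_gb=1, total_cores=100, total_mem_gb=400)
print(job_a, job_b)  # job_b counts as the larger consumer under this metric

shares = {"projA": 0.5, "projB": 0.3, "projC": 0.2}
usage = {"projA": 0.40, "projB": 0.30, "projC": 0.10}
print(next_entity(shares, usage))
```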

(28)

Resource management II

Matching the jobs with available resources

• Best-fit, worst-fit, first-fit, random, mix-fit, dynamic-programming, etc.

Challenges

1. Optimizing resource matching

• Single vs. multiple dimensions

• One job at a time vs. multiple jobs (look-ahead)

2. Dealing with jobs that cannot be scheduled

• Reservation (backfilling)
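The basic matching heuristics differ only in which feasible host they pick. A minimal single-dimension sketch (free cores per host), with hypothetical host names; multi-dimensional matching, look-ahead, and backfilling are not covered here:

```python
def first_fit(free, need):
    """First host (in iteration order) with enough free capacity."""
    return next((h for h, c in free.items() if c >= need), None)


def best_fit(free, need):
    """Feasible host that leaves the least capacity unused."""
    feasible = [h for h, c in free.items() if c >= need]
    return min(feasible, key=lambda h: free[h] - need, default=None)


def worst_fit(free, need):
    """Feasible host that leaves the most capacity unused."""
    feasible = [h for h, c in free.items() if c >= need]
    return max(feasible, key=lambda h: free[h] - need, default=None)


free_cores = {"h1": 8, "h2": 4, "h3": 16}
for heuristic in (first_fit, best_fit, worst_fit):
    print(heuristic.__name__, heuristic(free_cores, 3))
```

Best-fit tends to keep large holes available for large jobs, while worst-fit spreads load; which one works better depends on the workload.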

(29)

Resource management III

Going global (meta-scheduling)

• Ensuring QoS, Load balancing

Practical considerations

• Scalability, Robustness, Usability

(30)

Data Access

Jobs (VMs, map-reduce, simulations) use data

• Huge burden on the storage (DoS attacks)

Challenges

1. Avoiding DoS within a data center

• Storage-side: Scale-out storage, Parallel NFS, etc.

• Client-side: cacheFS, CaMA (RO)

2. Enabling remote data access (going global)

• Synchronous and asynchronous replications

• Site-level caching, etc.

Continuous Integration (CI) use case

• Know your workload…

(31)

Our course cont.

Lectures: Introduction, Facility basics, Networks, RM Part I, RM Part II, RM Part III, Data access, Business Intelligence, Predictive Analytics, DC visit, Security Part I

(32)

Business Intelligence (BI)

Goal is to provide insights on the data center to help optimize its operation

• E.g., statistics on resource usage to help decide which equipment to buy

Involves collecting, preparing, storing, analyzing, and accessing the data

• Challenges in each layer e.g., impact on source system, responsiveness, etc.

Focusing on data analysis

• Optimizing data queries (SQL)

Join-sort-aggregate implementations

• How to assemble them optimally using time and space considerations
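To illustrate the time/space trade-off, here are two textbook join implementations on in-memory rows. This is a sketch only: rows are plain dicts, and the table and column names in the usage are invented. A hash join builds a table on one input (extra memory, linear time), while a sort-merge join sorts both inputs first (more time, but output comes out ordered by the key, which a later sort or aggregate can reuse).

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Build a hash table on `left`, then probe it with each row of `right`."""
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def sort_merge_join(left, right, key):
    """Sort both inputs on `key`, then merge matching runs."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            out.extend({**l, **r} for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out


jobs = [{"job_id": 1, "user": "a"}, {"job_id": 2, "user": "b"}]
usage = [{"job_id": 1, "cores": 4}, {"job_id": 1, "cores": 2}, {"job_id": 3, "cores": 8}]
print(hash_join(jobs, usage, "job_id"))
print(sort_merge_join(jobs, usage, "job_id"))
```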

(33)

Predictive analytics

One of the important usages for BI in the data center

• Helps systems, e.g., the job scheduler, take data-driven actions in real time

Deep dive into one such use case

• Predicting jobs' resource usage to optimize resource allocation

Data-Stream Mining (DSM)

• Continuous (endless) rapid incoming data

• Machine-learning must be applied online

(34)

Predictive analytics – challenges

Performance (real-time)

• Impossible to store all data (train) – each sample must be processed once

Adaptability

• Non-stationary data – model must be adaptable (sliding windows)

Quality, availability

• Perform at least as well as “no-stream” models

• Prediction must be provided continuously

Cover well known algorithms

• Regression trees, Decision trees, etc.

• Multiple Sliding Windows (MSW)

• Mimran & Even, 2013
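The single-pass and adaptability constraints can be illustrated with a very simple online predictor: a mean over one fixed-size sliding window. This is not the MSW algorithm cited above, only a baseline sketch of the idea; the class name and window size are hypothetical.

```python
from collections import deque


class SlidingWindowMean:
    """Online predictor: mean of the last `window_size` samples.

    Each sample is processed once and only the window is kept in memory,
    so old behaviour is forgotten as the data drifts.
    """

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window = deque(maxlen=window_size)
        self.total = 0.0

    def update(self, value):
        # Drop the oldest sample from the running sum before it is evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value

    def predict(self, default=0.0):
        # A prediction is always available, even before any sample arrives.
        return self.total / len(self.window) if self.window else default


predictor = SlidingWindowMean(window_size=3)
for observed_mem_gb in [2.0, 4.0, 6.0, 8.0]:
    predictor.update(observed_mem_gb)
print(predictor.predict())  # mean of the last three samples
```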

(35)

Security

Securing the data center is complex

• A security breach can cause loss of money, reputation, and IP, and can lead to legal action

Security control model helps organize things

• Divide the data center into layers

• For each layer describe its attack vectors, vulnerabilities and controls

Challenges in two layers

1. Applications

• Web applications

• Code injections, e.g., SQL injection

• Web manipulations e.g., ClickJacking

• Web services

2. Identity and Access Management (IAM)

• Managing multiple identities (SAML)

• Authentication

• Knowledge factors: Passwords, Kerberos
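For the SQL injection item under web applications, the sketch below contrasts a query built by string concatenation with a parameterized one, using Python's standard `sqlite3` module and an in-memory database. The `users` table, its contents, and the function names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])


def find_user_unsafe(conn, name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # the injected condition matches every row
print(find_user_safe(conn, payload))    # no user has this literal name
```

Parameterized queries are one control for this attack vector; input validation and least-privilege database accounts are complementary.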

(36)

Visit to IDC data center (3/6)

(37)

Summary

Course is unique

• Covers actual challenges encountered in real environments

• Delivered by domain experts with huge experience in designing and deploying solutions

Interaction is important

• Don’t hesitate to ask (tough) questions

Enjoy
