Challenges in Modern Data Centers Management
Information provided in these slides is for educational purposes only
Welcome
•
Hebrew - Shalom
•
Arabic - Ahlan'wa sahla
•
Bosnian - Dobrodošli
•
Chinese (Cantonese) - (fòonying)
•
Chinese (Mandarin) - 欢迎 [simplified], 歡迎 [traditional] (huānyíng)
•
Czech - Vítáme tě
•
Danish - Velkommen
•
Dutch - Welkom
•
French - Bienvenue
Outline
•
Introduction
• Administrative and academic stuff
•
Data centers
• History and facts
•
Our course
• Lecture by lecture – what will be covered in each?
Me (Edi)
•
With Intel since 2011
• Formerly with IBM (almost 17 years)
•
PhD in Computer Science from the Hebrew university
• Prof. Dror G. Feitelson
•
Interested in anything related to “Systems”
• OS, Virtualization, Storage, etc.
• Distributed systems – resource management & job scheduling
• Performance evaluation and modeling
• Etc.
Why the course?
•
Data centers are big business
• 50 years of technological evolution
• Special skills required to operate them (experience, legacy)
•
IT team in Intel (IDC-Haifa)
• Responsible for the data center facility, continuous operation, solution development and deployment, user engagement, etc.
•
Huge experience (legacy…)
•
Goal is to expose some of this experience in a structured way
1. Challenges we face
2. Solutions, e.g., technologies and algorithms, used to address them
Administration
•
Edi (me) – Main instructor and responsible for the course
•
Jalil (him) – Our super-talented teaching assistant
•
Danny (not here) – Advisor and high-level supervisor
•
Important dates
• Lectures: Wednesdays 14:30-16:30, Taub 6
• Exams
• Moed A – 3/7/2015
• Moed B – 20/9/2015
Academic
•
Pre-requisites
• Basic knowledge of networking, computer systems, and distributed systems (e.g., clusters) should be enough
•
Requirements
1. Must attend 80% of the lectures
2. Must deliver homework assignments (30% of the grade)
• 4-5 assignments
3. Final exam (70% of the grade)
• 2-3 open questions + a few closed ones (multiple-choice)
• In the spirit of the homework assignments
•
Our site
Schedule (tentative)
Questions?
Data centers
Data center
•
Facility used to house computer systems and associated components
• Telecommunications, storage systems, etc.
•
“Production floor” of most modern companies, e.g.,
• Google – Information processing, etc.
• Amazon – Sales, Hosting (AWS), etc.
History
•
Started as a facility to house old complex computing systems
Challenges in Modern Data Centers Management, Spring 2015 14
History cont.
•
Big boost during the Dot-Com era (1997–2000)
• Companies emerged whose business revolved solely around the Web
• Requiring fast Internet connectivity and 24/7 non-stop operation
• Special facilities built to house such businesses
• Internet data centers (IDC)
• Leading to new technologies and practices
• Eventually migrated to the private data centers
• Grid-computing phenomenon
History cont.
•
Another big transformation as part of the Cloud era (2007+)
• New design and deployment philosophy
• Redundant (multiple copies), scalable (elasticity), high-availability (stateless)
• Technology makes “hosting” economically attractive
• Even for large-scale enterprises
• Environmental impact receives special attention
• Standards bodies specify requirements
• Huge effort to make data-centers appear “Green”
Some facts
•
Large data center can consume as much electricity as a small town
• In 2010 data centers accounted for 1.1%-1.5% of the global electricity use
•
Electricity spending accounts for 25-30% of a data center's TCO
Some facts cont.
•
Average life of a data center is 9 years
• Older than 7 years considered out-of-date (Green-computing)
•
A minute of data center downtime may cost tens of thousands of dollars
• High availability is a critical component of the design
IBM BlueGene/P
Originally posted to Flickr, CC BY-SA 2.0
Our course
•
Focuses on common management challenges
• Generic enough so they fit most usage models
•
Impossible to cover all challenges
• Filtered down to the ones the team has experience with
• Chose the ones we believe are most important
•
The team
• Responsible for the data center facility, continuous operation, solution development & deployment, etc.
•
Domain experts
• Facility, networking, resource management, storage, business intelligence and analytics, security, etc.
Introduction | Facility basics | Networks | RM Part I | RM Part II | RM Part III | Data access | Business Intelligence | Predictive Analytics | DC visit | Security Part I | Security Part II | Summary
Facility basics
•
Building a data center is expensive
• Single rack location construction can cost up to $80K
• Total spend can reach hundreds-of-millions of $$
•
Four main elements of the facility
• Power, Cooling, Space, Networks
•
Total Cost of Ownership (TCO)
• Initial capital (CapEx)
• Long-term operational expenditures (OpEx)
•
Power Usage Effectiveness (PUE)
• Power efficiency performance indicator
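A minimal sketch of the two indicators above. It assumes the standard definition of PUE as total facility power divided by IT equipment power, and a simplified TCO of CapEx plus accumulated yearly OpEx with no discounting. The function names and sample figures are illustrative, not taken from the course.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def tco(capex: float, opex_per_year: float, years: int) -> float:
    """Simplified TCO: initial capital plus yearly operational spend."""
    return capex + opex_per_year * years


# Hypothetical facility: 1,500 kW drawn in total, 1,000 kW reaching IT equipment.
print(pue(1500.0, 1000.0))
# Hypothetical budget: $100M to build, $10M per year to operate, over 9 years.
print(tco(100e6, 10e6, 9))
```

A PUE of 1.0 would mean every watt reaches the IT equipment; anything above that is overhead such as cooling and power distribution.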
Facility – challenges
•
Optimizing cooling
• Hot aisle, hot/cold air containment, free cooling
•
Optimizing power feeding
• Redundancy dilemma
• AC vs. DC
•
Optimizing refresh rate
• 4-year optimal lifespan
Networks
•
Veins and arteries of the data center
• Play key role in its performance and high-availability
•
Ensuring adequate availability
• Redundancy at layer 2 (data-link)
• Spanning Tree Protocol (STP) & Rapid STP (RSTP)
• Per-VLAN Spanning Tree (PVST)
• Multi-Switch Link aggregation (M-LAG)
• Redundancy at layer 3 (IP)
• Virtual Router Redundancy Protocol (VRRP)
Resource management I – III
•
5% resource waste in a 10K-server data center can cost $3K/day (about $1M/year)
• It is critical to utilize resources as efficiently as possible
•
Resource management system (RMS) / Scheduler
1. Accepts requests from the users (millions per day)
• VMs (Amazon), Map-reduce (Hadoop), Chip simulations (Intel), etc.
2. Queues and prioritizes them (decides which job to execute next)
• Subject to constraints, e.g., ensuring shares, deadlines, etc.
3. Allocates resources and launches the jobs on selected resources
• Various heuristics
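The cost figures on this slide can be checked with simple arithmetic. The per-server figure below is derived from the slide's numbers and is not stated in the slides themselves.

```python
servers = 10_000
waste_percent = 5
cost_per_day = 3_000  # USD per day, figure from the slide

# Number of servers' worth of capacity that sits wasted.
wasted_servers = servers * waste_percent // 100

# Implied daily cost of one wasted server (derived, not from the slide).
cost_per_wasted_server_day = cost_per_day / wasted_servers

# Yearly cost, which the slide rounds to $1M.
cost_per_year = cost_per_day * 365

print(wasted_servers, cost_per_wasted_server_day, cost_per_year)
```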
Resource management I
•
Proportional-share scheduling
• A very common scheduling heuristic used in data centers
• Every entity (VO, project, user) should get its promised share of the resources
•
Challenges
1. How to measure resource consumption?
• 1 core × 4 GB vs. 3 cores × 1 GB
2. How to ensure fast ramp-up?
• Limits, logical and physical buffers
3. Considering history
• Is this really important?
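A minimal sketch of proportional-share selection, under two assumptions that are not from the slides: consumption is measured by the most-used resource (one possible answer to the 1 core × 4 GB vs. 3 cores × 1 GB question), and the scheduler serves the entity furthest below its promised share. Cluster capacity, entity names, and numbers are hypothetical, and history is ignored.

```python
from dataclasses import dataclass

# Hypothetical cluster capacity used to normalize consumption.
TOTAL_CORES = 100
TOTAL_MEM_GB = 400


def dominant_usage(cores: float, mem_gb: float) -> float:
    """Fraction of the cluster used, measured by the most-consumed resource."""
    return max(cores / TOTAL_CORES, mem_gb / TOTAL_MEM_GB)


@dataclass
class Entity:
    name: str
    share: float          # promised fraction of the cluster
    cores: float = 0.0    # cores currently held
    mem_gb: float = 0.0   # memory currently held

    def usage_to_share(self) -> float:
        """Usage relative to the promised share; below 1.0 means under-served."""
        return dominant_usage(self.cores, self.mem_gb) / self.share


def next_entity(entities: list[Entity]) -> Entity:
    """Pick the entity furthest below its promised share."""
    return min(entities, key=lambda e: e.usage_to_share())


entities = [
    Entity("project-a", share=0.5, cores=10, mem_gb=40),  # ten 1-core x 4 GB jobs
    Entity("project-b", share=0.5, cores=30, mem_gb=10),  # ten 3-core x 1 GB jobs
]
print(next_entity(entities).name)
```

With these numbers both projects run ten jobs, yet project-b counts as using three times more of the cluster, so project-a is served next.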
Resource management II
•
Matching the jobs with available resources
• Best-fit, worst-fit, first-fit, random, mix-fit, dynamic-programming, etc.
•
Challenges
1. Optimizing resource matching
• Single vs. multiple dimensions
• One job at a time vs. multiple jobs (look-ahead)
2. Dealing with jobs that cannot be scheduled
• Reservation (backfilling)
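The first three matching heuristics can be sketched for the simplest case: a single resource dimension and one job at a time. The server capacities and the job size are hypothetical.

```python
from typing import Optional


def first_fit(free: list[int], demand: int) -> Optional[int]:
    """Index of the first server with enough free capacity, or None."""
    for i, capacity in enumerate(free):
        if capacity >= demand:
            return i
    return None


def best_fit(free: list[int], demand: int) -> Optional[int]:
    """Index of the fitting server that leaves the least capacity unused."""
    fitting = [i for i, capacity in enumerate(free) if capacity >= demand]
    return min(fitting, key=lambda i: free[i] - demand, default=None)


def worst_fit(free: list[int], demand: int) -> Optional[int]:
    """Index of the fitting server that leaves the most capacity unused."""
    fitting = [i for i, capacity in enumerate(free) if capacity >= demand]
    return max(fitting, key=lambda i: free[i] - demand, default=None)


free_cores = [8, 4, 16, 6]  # hypothetical free cores per server
job = 5                     # cores requested by the job
print(first_fit(free_cores, job), best_fit(free_cores, job), worst_fit(free_cores, job))
```

When every function returns `None` the job cannot be placed right now, which is the situation the reservation (backfilling) bullet addresses.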
Resource management III
•
Going global (meta-scheduling)
• Ensuring QoS, Load balancing
•
Practical considerations
• Scalability, Robustness, Usability
Data Access
•
Jobs (VMs, map-reduce, simulations) use data
• Huge burden on the storage (DoS attacks)
•
Challenges
1. Avoiding DoS within a data center
• Storage-side: Scale-out storage, Parallel NFS, etc.
• Client-side: cacheFS, CaMA (RO)
2. Enabling remote data access (going global)
• Synchronous and asynchronous replication
• Site-level caching, etc.
•
Continuous Integration (CI) use case
• Know your workload…
Our course cont.
Business Intelligence (BI)
•
Goal is to provide insights on the data center to help optimize its operation
• E.g., statistics on resource usage to help decide which equipment to buy
•
Involves collecting, preparing, storing, analyzing, and accessing the data
• Challenges in each layer, e.g., impact on source system, responsiveness, etc.
•
Focusing on data analysis
• Optimizing data queries (SQL)
•
Join-sort-aggregate implementations
• How to assemble them optimally using time and space considerations
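As a rough illustration of how join and aggregate operators trade time against space, the sketch below computes total CPU hours per project in two ways: with a hash join and with a sort-merge join. The tables, column names, and values are invented for the example, and job IDs are assumed to be unique in the `jobs` table.

```python
from collections import defaultdict

# Hypothetical tables: jobs(job_id, project) and usage(job_id, cpu_hours).
jobs = [(1, "sim"), (2, "sim"), (3, "ci")]
usage = [(3, 2.0), (1, 5.0), (2, 1.5), (1, 0.5)]


def hash_join_aggregate(jobs, usage):
    """Hash join: one pass over each table, but one table is held in memory."""
    project_of = dict(jobs)              # build phase
    totals = defaultdict(float)
    for job_id, cpu_hours in usage:      # probe phase, aggregating as we go
        if job_id in project_of:
            totals[project_of[job_id]] += cpu_hours
    return dict(totals)


def sort_merge_join_aggregate(jobs, usage):
    """Sort-merge join: pays for sorting, then walks both inputs in order."""
    left, right = sorted(jobs), sorted(usage)
    totals = defaultdict(float)
    i = 0
    for job_id, cpu_hours in right:
        while i < len(left) and left[i][0] < job_id:
            i += 1                       # advance the cursor over jobs
        if i < len(left) and left[i][0] == job_id:
            totals[left[i][1]] += cpu_hours
    return dict(totals)


print(hash_join_aggregate(jobs, usage))
print(sort_merge_join_aggregate(jobs, usage))
```

Both plans return the same totals; which one is cheaper depends on table sizes, available memory, and whether the inputs are already sorted.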
Predictive analytics
•
One of the important usages for BI in the data center
• Helps systems, e.g., the job scheduler, take data-driven actions in real time
•
Deep dive into one such use-case
• Predicting jobs' resource usage to optimize resource allocation
•
Data-Stream Mining (DSM)
• Continuous (endless) rapid incoming data
• Machine-learning must be applied online
Predictive analytics – challenges
•
Performance (real-time)
• Impossible to store all data (train) – each sample must be processed once
•
Adaptability
• Non-stationary data – model must be adaptable (sliding windows)
•
Quality, availability
• Perform at least as well as “no-stream” models
• Prediction must be provided continuously
•
Cover well known algorithms
• Regression trees, Decision trees, etc.
• Multiple Sliding Windows (MSW)
• Mimran & Even, 2013
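A minimal sketch of the constraints above, using a single sliding window: each sample is processed once, memory is bounded, old data is forgotten, and a prediction is always available. This is a deliberately simple mean predictor, not the Multiple Sliding Windows method cited above; the class name and the sample stream are hypothetical.

```python
from collections import deque


class SlidingWindowMean:
    """Predicts the next value as the mean of the last `window` samples."""

    def __init__(self, window: int):
        self.samples = deque(maxlen=window)  # old samples fall out automatically
        self.total = 0.0

    def update(self, value: float) -> None:
        """Consume one sample; each sample is seen exactly once."""
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]    # value about to be evicted
        self.samples.append(value)
        self.total += value

    def predict(self) -> float:
        """Always returns a prediction, even before any sample has arrived."""
        return self.total / len(self.samples) if self.samples else 0.0


model = SlidingWindowMean(window=3)
for mem_gb in [4.0, 4.0, 4.0, 8.0, 8.0, 8.0]:  # hypothetical job memory usage
    model.update(mem_gb)
print(model.predict())
```

After the stream shifts from 4 GB to 8 GB jobs, the window holds only recent samples, so the prediction follows the new behavior.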
Security
•
Securing the data center is complex
• A security breach can cause real monetary, reputational, and IP losses, as well as legal action
•
Security control model helps organize things
• Divide the data center into layers
• For each layer describe its attack vectors, vulnerabilities and controls
•
Challenges in two layers
1. Applications
• Web applications
• Code injections, e.g., SQL injection
• Web manipulations, e.g., clickjacking
• Web services
2. Identity and Access Management (IAM)
• Managing multiple identities (SAML)
• Authentication
• Knowledge factors: Passwords, Kerberos
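To make the SQL injection bullet concrete, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table, user names, and attack string are made up for illustration; it shows one common control (parameterized queries), not a complete defense.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

malicious = "nobody' OR '1'='1"  # attacker-controlled input

# Vulnerable: the input is pasted into the SQL text and changes the query logic.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a parameterized query treats the input strictly as a value.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?",
                         (malicious,)).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The string-built query returns every user, while the parameterized one returns none, because no user is literally named `nobody' OR '1'='1`.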
Visit to IDC data center (3/6)
Summary
•
Course is unique
• Covers actual challenges encountered in real environments
• Delivered by domain experts with huge experience in designing and deploying solutions
•
Interaction is important
• Don’t hesitate to ask (tough) questions