
Increasing Data Center Resilience While Lowering PUE


Academic year: 2021


(1)

Increasing Data Center Resilience While Lowering PUE

Nandini Mouli, Ph.D.
President/Founder, eSai LLC
[email protected]
www.esai.technology

(2)

Introduction – eSai LLC

• eSai LLC is a disadvantaged, woman-owned minority business focused on providing energy management solutions for federal and state government agencies

• Core Competencies:
  • Technical/Business Feasibility Studies, Energy Audits, Commissioning
  • Energy Conservation Measures
  • Evaluation, Validation and Measurement
  • Utility, Federal and State Grants
• Technologies:
  • Dynamic Pricing, Demand Response
  • Distributed Energy Services, Combined Heat and Power
  • Microgrid Integration
  • Building Management Systems

Experience in consulting on and implementing clean energy programs to meet DOE, EPA and FEMP policies and programs.

Currently leading multiple projects to bring resiliency and energy conservation to federal agencies and private corporations.

(3)

Topics for Discussion

• What is Resilience?
• Why is resilience critical for data centers?
• Dynamics of treating resilience
• Challenges to achieving data center resilience
• Some tools for achieving resilience
• What is DCIM?
• How is DCIM a resilience platform for:
  • Planning and implementation
  • Monitoring
  • Data Collection
  • Dashboard Visualization
• Getting the most out of DCIM tools
• Key Take-aways

(4)

What is Resilience?

• TechTarget’s Definition of Resilience: “the ability of a server, network, storage system, or an entire data center, to recover quickly and continue operating even when there has been an equipment failure, power outage or other disruption.”

• In the context of cyber security: “Resilience is the ability of a system to resist illegitimate activity and its ability to effect a speedy

(5)

Why is Resilience Critical for Data Centers?

• Forrester Research: Resilience is the #2 top priority for Facility Directors:
  • Carrier availability and density – 82%
  • Availability, resilience – 80%
  • Control over facility – 78%
  • Access to Cloud and other partners – 75%
• Lack of resilience is costly:
  • IBM Reputational Risk and IT Study: system outage is one of the top two IT risks that can harm an organization’s reputation.
  • 91% of data centers have experienced an unplanned data center outage in the past 24 months.
  • The average cost per minute of data center downtime has increased 38% from $7,908 in 2013 to $11,000 in 2015
  • Organizations which improve from “Laggard” to “Industry Average” levels of downtime can reduce losses ~$3 million/year.
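
As a rough illustration of the downtime figures above, the cost of a single outage can be estimated as minutes of downtime multiplied by cost per minute. The sketch below uses the $11,000-per-minute figure quoted on this slide; the 90-minute outage and the function name `outage_cost` are hypothetical, chosen only for the example.

```python
COST_PER_MINUTE_2015 = 11_000  # USD per minute, figure quoted on this slide

def outage_cost(minutes, cost_per_minute=COST_PER_MINUTE_2015):
    """Estimate the cost of one outage as duration x cost per minute."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute

# Hypothetical 90-minute outage (not a figure from the cited studies)
print(f"${outage_cost(90):,}")  # prints $990,000
```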

(6)

Dynamics in Treating Resilience

• Achieving resilience used to mean redundancy:
  • Two (or more) of everything: servers, power supplies, generators, and even whole data centers
  • But most of this duplicate equipment was never utilized.
  • Waste of space and energy = increased PUE
• Now, the trend: increase resilience without waste by selecting software instead of hardware
  • Fault tolerance built right into software
  • Improve resilience through load balancing, virtualization, prediction and other techniques.
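
PUE (Power Usage Effectiveness) is total facility power divided by the power delivered to IT equipment, so overhead from duplicated power and cooling gear pushes it up. A minimal sketch of that relationship follows; all kW values are hypothetical and do not come from the slides.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw

it_load_kw = 1000.0            # hypothetical IT load
overhead_redundant_kw = 800.0  # hypothetical power/cooling overhead with duplicated gear
overhead_trimmed_kw = 500.0    # hypothetical overhead after removing idle duplicates

before = pue(it_load_kw + overhead_redundant_kw, it_load_kw)
after = pue(it_load_kw + overhead_trimmed_kw, it_load_kw)
print(f"PUE before: {before:.2f}, after: {after:.2f}")  # PUE before: 1.80, after: 1.50
```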

(7)

Challenges To Achieving Data Center Resilience

Measuring how vulnerable the data center system is to failure, and fixing the potential problems, leads to increased uptime. However:

• Increase in the number of applications to be managed and backed up
• Organizations getting larger and more geographically dispersed
• Infrastructural ecosystems are more complex
• Decreasing hardware costs encourage organizations to maintain backup and recovery in house, which can be incompatible with other network software used to mitigate problems
• Increasing use of virtualization
• Frequency and intensity of natural disasters are increasing risks

(8)

What Are Some Traditional Ways To Achieve Resilience?

Current Methodologies

Conventional data centers rely on manual response plans and human teams:

• Design Failure: Competent design firm, integration firm, construction companies and commissioning team
• Catastrophic Failure: Comprehensive maintenance and operation program
• Compounding Failure: Paying more attention to details of each and every possible failure mode
• Human-error Failure: Having experienced staff and training all responsible. Continuous training and execution with pilot/co-pilot approach for operation.

(9)

Modern Tools for Achieving Resilience

• A modern data center needs the II dashboard. Due to the complexity of the operations, IT and Facility management cannot rely on just the human component to combat failures occurring from a combination of two or three faults

• IT/Facility Management have to align themselves in using predictive ways of disaster mitigation: DCIM

(10)

What is Data Center Infrastructure Management - DCIM?

• It is a software platform that helps operators safely manage the physical infrastructure and controls, with higher visibility and transparency of the IT and facilities operations and quick identification and resolution of problems before they happen
• Maximizes the efficient use of power, cooling, and space capacities now and in the future.
• Two core building blocks:
  • Asset Management
  • Monitoring

(11)

DCIM – A Resiliency Platform: Physical Infrastructure/Controls

From device-level monitoring in a traditional data center system to context-aware monitoring, so actions can be performed to mitigate a risk!
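
One way to picture the difference: a device-level view reports that a single unit is in alarm, while a context-aware view also knows what depends on that unit. The sketch below assumes a hand-written dependency map (`SERVES`) with made-up device and rack names; a real DCIM tool would derive this from its asset database.

```python
# Hypothetical dependency map: which racks each infrastructure device serves.
SERVES = {
    "CRAC-1": ["rack-A1", "rack-A2"],
    "PDU-3": ["rack-A2", "rack-B1"],
}

def affected_racks(failed_devices, serves=SERVES):
    """Return the sorted list of racks that depend on any failed device."""
    racks = set()
    for device in failed_devices:
        racks.update(serves.get(device, []))
    return sorted(racks)

# Device-level view: "CRAC-1 alarm". Context-aware view: which racks are at risk.
print(affected_racks(["CRAC-1"]))           # ['rack-A1', 'rack-A2']
print(affected_racks(["CRAC-1", "PDU-3"]))  # ['rack-A1', 'rack-A2', 'rack-B1']
```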

(12)

DCIM-Planning and Implementation Platform

Planning tools and functions:

• Display impact of pending moves on power capacity and cooling distribution
• Graphical representations of IT equipment and its location in the rack
• Proactively manage within rack and floor tile weight limits
• Correlate data between CRAC units, the PDUs, and the UPSs. The entire chain is monitored.
• Simulate consequences of power and cooling device failure on IT equipment through “What If?” scenarios
• Generate recommended installation locations for rack-mount IT equipment. The selection will be based on available power, cooling, space capacity, and network ports

(13)

DCIM – Monitoring and Automation Platform

• Alarming/Notification: DCIM sends out an alarm from the rack prior to a breaker tripping. Provides operator with the opportunity to make adjustments before shut-down
• Status: Notes are generated for minimum, maximum, and average usage over time for each rack
• Control: If a rack gets close to an overcapacity threshold, predictive simulation can be triggered to determine the best way to alleviate the situation.
• Reports and graphs are generated to help

(14)

DCIM – Monitoring and Automation Platform (contd.)

Comparison of Primary and Secondary Functions

Certain DCIM applications will take certain data center features as primary or secondary functions. Depending on the facility and need, care must be taken to select the right ones to include in the integrated platform suite.
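
The alarming and status functions of the monitoring platform can be sketched in a few lines. The example below is a minimal illustration: the sample readings, the 5 kW breaker limit and the 80% warning level are hypothetical values chosen for the example, not settings taken from the slides.

```python
from statistics import mean

def summarize(readings_kw):
    """Minimum, maximum and average draw over the observation window."""
    return {"min": min(readings_kw), "max": max(readings_kw), "avg": mean(readings_kw)}

def check_alarm(current_kw, breaker_limit_kw, warn_fraction=0.8):
    """Return True when the draw reaches the warning level, before the breaker limit."""
    return current_kw >= warn_fraction * breaker_limit_kw

readings = [3.1, 3.4, 4.2, 4.5]  # hypothetical rack power samples in kW
stats = summarize(readings)
print(stats["min"], stats["max"], round(stats["avg"], 2))  # 3.1 4.5 3.8
print(check_alarm(readings[-1], breaker_limit_kw=5.0))     # True (4.5 >= 4.0)
```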

(15)

DCIM – Data Collection Platform

The data collection subset represents devices such as meters, power protection devices, embedded cards, programmable logic controllers (PLCs), sensors and other such devices.

The devices perform the fundamental function of gathering data and forwarding it to management software for processing.
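
The gather-and-forward role described above can be sketched as a single polling pass. Everything here is a stand-in: the device names and values are made up, the `read` callables replace real meter or sensor drivers, and `forward` replaces whatever ingest interface the management software actually exposes.

```python
import time

def collect(devices, forward, now=time.time):
    """Poll each device once and forward a timestamped reading.

    `devices` maps a device id to a zero-argument callable returning a value;
    `forward` stands in for the management software's ingest endpoint.
    """
    for device_id, read in devices.items():
        forward({"device": device_id, "value": read(), "ts": now()})

# Hypothetical devices: stubs standing in for a meter and a temperature sensor.
devices = {
    "meter-1": lambda: 4.2,        # kW
    "temp-rack-A1": lambda: 24.5,  # degrees C
}
received = []
collect(devices, received.append)
print([r["device"] for r in received])  # ['meter-1', 'temp-rack-A1']
```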

(16)

DCIM – Dashboard Platform

Key performance indicators are at the operators’ fingertips with DCIM:

• When will I run out of power and what is the current cooling capacity?
• What is my current server utilization?
• Do I have any servers that can be retired and if so what are they?

The dashboard is the key centerpiece for aggregation of actionable data that can be shared quickly with decision-makers.

Sample dashboard collects data across OT subsets and centralizes information anytime, anywhere, and on any user interface: mobile, laptop, PC
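
The first dashboard question, "When will I run out of power?", can be approximated with a simple linear projection. This is a sketch under a strong assumption (constant monthly load growth); the capacity, load and growth figures are hypothetical.

```python
def months_until_full(capacity_kw, current_kw, growth_kw_per_month):
    """Linear projection of when load reaches capacity; None if load is not growing."""
    if growth_kw_per_month <= 0:
        return None
    remaining = capacity_kw - current_kw
    return max(remaining, 0) / growth_kw_per_month

# Hypothetical figures: 500 kW capacity, 380 kW in use, growing 10 kW per month.
print(months_until_full(500, 380, 10))  # 12.0
```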

(17)

DCIM – Dashboard Platform (Contd.): Another view

(18)

DCIM – Energy and Power Saving Platform

• DCIM provides an overview of facility energy use and cost and a complete kW breakdown per device

• Cost savings are realized from the servers up through rack, row, room, building and beyond
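
The per-device breakdown and the server-to-building hierarchy suggest a roll-up of device readings to each level. The sketch below assumes a flat list of readings tagged with room, row and rack; the names, kW values and the $0.10/kWh price are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-device readings: (room, row, rack, device, kW)
READINGS = [
    ("room-1", "row-A", "rack-A1", "srv-01", 0.5),
    ("room-1", "row-A", "rack-A1", "srv-02", 0.25),
    ("room-1", "row-A", "rack-A2", "srv-03", 0.75),
    ("room-1", "row-B", "rack-B1", "srv-04", 1.0),
]

def roll_up(readings, level):
    """Sum kW at a level: 1 = room, 2 = row, 3 = rack."""
    totals = defaultdict(float)
    for *path, _device, kw in readings:
        totals[tuple(path[:level])] += kw
    return dict(totals)

rate = 0.10  # hypothetical electricity price in USD per kWh
for key, kw in roll_up(READINGS, 3).items():
    # Daily cost assumes the reading holds constant for 24 hours.
    print("/".join(key), f"{kw} kW", f"${kw * rate * 24:.2f}/day")
```

Calling `roll_up(READINGS, 2)` or `roll_up(READINGS, 1)` gives the same totals at row or room level.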

(21)

DCIM Offerings in the Market: Suite and Non-Suite Providers

(23)

DCIM Market Trends

• Market is growing
  • From $240 million in 2011 to $1.2 billion in 2016
  • Growth in data centers is very high since facilities and IT meet to think about the business
• Inhibitors to adoption:
  • Cost and functionality issues
  • Difficulty of creating and maintaining asset databases
  • Blind belief that it is possible to manage a data center without software solutions
• Energy savings from well-managed data centers
  • Reduce operating expenses by 20%
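
For a sense of scale, the market figures above ($240 million in 2011 to $1.2 billion in 2016) imply a compound annual growth rate that can be worked out directly. The helper name `cagr` is just for this example.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1

# Market size figures from the slide, in millions of USD
growth = cagr(240, 1200, 2016 - 2011)
print(f"{growth:.1%}")  # 38.0%
```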

(24)

How To Get The Highest Benefit From DCIM?

• There are quite a variety of options. Care must be taken to ensure best fit
• Scalable, modular, standardized, pre-engineered, open communication architecture with a strong vendor support structure
• Agreement between facilities, IT, and management on operating parameters, metrics, and goals for the data center power and cooling systems and their management
• A review of existing processes and comparison to DCIM requirements
• New processes should be formally defined and resources committed and

(27)

Case Study Conclusions:

Data centers are complex systems, changing constantly over time

Monitoring and measurement of capacity is not enough

Much lost capacity can be reclaimed using predictive modeling and state-of-the-art tools with support of DCIM measurements

(28)

Key Take-Aways - DCIM Benefits

DCIM provides higher visibility, more control and improved automation:

• Decision Support and Information Management
• Asset Planning and Implementation
• Monitoring, Measuring and Alerting
• Management and Control
• Fault-tolerant (fail-over) Software Services

Final outcome: a more reliable and efficient data center, with higher resilience and decreased PUE.

(29)

THANK YOU !!!

Contact:

Nandini Mouli, Ph.D.

President/Founder

eSai LLC

www.esai.technology

[email protected]

(443) 691 7664
