
Defining and Improving IT Utilisation Efficiency through Holistic Data Centre Monitoring


Academic year: 2021



Michael Rudgyard CEO


• A spin-out company of a well-established UK SI

• Technology was developed for High Performance Computing

– Management of HPC resources needs to be ‘system-wide’

– Scalability (of both the architecture and the GUI) is paramount

• New company formed in March 2010

– Took on the product IP and existing HPC customer base

– Notable investment from the UK Carbon Trust

• Currently in ‘semi-stealth’ mode

– Have developed new features for the Data Centre market


How Efficient is your Data Centre?


• Most new data centres are being designed against PUE targets

– For a given IT hardware capacity, PUE is a good planning metric

– However, it is usually a poor operational metric

• Most importantly: what if the servers are not doing any useful work?

– The data centre may still have a ‘good’ PUE, but it would be very inefficient by any business metric

• We really need a measure of IT Usage Effectiveness

– i.e. how effectively the power is being used to deliver necessary IT
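
The point about PUE can be made concrete with a small sketch. PUE is total facility power divided by IT equipment power, so it does not change when the servers stop doing useful work. The two sites and their figures below are hypothetical, and `work_per_watt` is only an illustrative stand-in for an ITUE-style measure, not a definition taken from the slides.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def work_per_watt(work_per_second: float, it_kw: float) -> float:
    """Illustrative 'useful work per watt' figure (hypothetical metric)."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return work_per_second / (it_kw * 1000.0)


# Hypothetical figures: identical power draw, very different useful output.
busy_site = {"facility_kw": 1200.0, "it_kw": 1000.0, "transactions_per_s": 500_000.0}
idle_site = {"facility_kw": 1200.0, "it_kw": 1000.0, "transactions_per_s": 5_000.0}

for name, site in (("busy", busy_site), ("idle", idle_site)):
    site_pue = pue(site["facility_kw"], site["it_kw"])
    site_work = work_per_watt(site["transactions_per_s"], site["it_kw"])
    print(f"{name}: PUE={site_pue:.2f}, transactions/s per watt={site_work:.4f}")
```

Both sites report the same PUE, while the work delivered per watt differs by a factor of one hundred.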


• Unlike PUE, the concept of ITUE encompasses a family of performance metrics

• Some metrics may provide useful generic ITUE measurement

– MIPS/watt or CPU Utilisation/watt (for compute bound tasks)

– IOPS/watt or Bytes/watt (when I/O is predominant)

• Some end-users may be interested in application-related metrics:

– Database transactions/watt

– Page refresh/watt

– Search/watt

• Some may be business related:

– £s of products sold / watt; or £s of products sold / integrated IT cost
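
As a rough illustration of how such a family of metrics might be reported from one set of measurements, the sketch below divides whichever counters are available by the average power drawn. The `Sample` fields and the `itue_metrics` helper are hypothetical names chosen for this example; the slides do not prescribe a data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Sample:
    """One reporting interval for a server, service or customer (hypothetical)."""
    avg_power_w: float
    cpu_utilisation_pct: Optional[float] = None
    iops: Optional[float] = None
    db_transactions_per_s: Optional[float] = None
    sales_gbp: Optional[float] = None


def itue_metrics(sample: Sample) -> Dict[str, float]:
    """Return every per-watt metric for which a counter is available."""
    if sample.avg_power_w <= 0:
        raise ValueError("average power must be positive")
    counters = {
        "cpu_utilisation_per_watt": sample.cpu_utilisation_pct,
        "iops_per_watt": sample.iops,
        "db_transactions_per_watt": sample.db_transactions_per_s,
        "sales_gbp_per_watt": sample.sales_gbp,
    }
    return {
        name: value / sample.avg_power_w
        for name, value in counters.items()
        if value is not None
    }


# Example: a compute-bound server with no application or business counters.
print(itue_metrics(Sample(avg_power_w=250.0, cpu_utilisation_pct=50.0, iops=1000.0)))
```

Leaving counters optional reflects the point that different end-users care about different members of the family.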


• With few exceptions, the most successful methodology for improving energy conservation across all sectors is:

– Step 1: Identify who/what is responsible for significant energy waste

– Step 2: Drive behaviour to ‘encourage’ change

• What is the implication for the Data Centre?

• Need to monitor and report ITUE metrics by customer, department or end-user

– Who or what applications/services are the worst offenders?

– Management can use data to drive better practice (charge-back?)
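
A minimal sketch of the reporting step, assuming each power reading has already been attributed to an owner (customer, department or end-user). The reading format, the owner names and the tariff are all hypothetical; attribution itself is the hard part and is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (owner, average power in watts, duration in hours) - a hypothetical format.
Reading = Tuple[str, float, float]


def energy_by_owner(readings: Iterable[Reading]) -> Dict[str, float]:
    """Sum energy in kWh per owner."""
    totals: Dict[str, float] = defaultdict(float)
    for owner, power_w, hours in readings:
        totals[owner] += power_w * hours / 1000.0
    return dict(totals)


def charge_back(energy_kwh: Dict[str, float], price_per_kwh: float) -> List[Tuple[str, float, float]]:
    """Return (owner, kWh, cost) rows, largest consumer first."""
    rows = [(owner, kwh, kwh * price_per_kwh) for owner, kwh in energy_kwh.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


readings = [
    ("finance", 400.0, 24.0),
    ("web", 1000.0, 24.0),
    ("finance", 100.0, 24.0),
]
# The tariff of 0.12 per kWh is an assumption for illustration only.
for owner, kwh, cost in charge_back(energy_by_owner(readings), price_per_kwh=0.12):
    print(f"{owner}: {kwh:.1f} kWh, cost {cost:.2f}")
```

Sorting by consumption puts the worst offenders at the top of the report, which is the information management would need for charge-back.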


• Efficient DCs will need to monitor & manage both IT and Facilities systems in a coherent manner:

– Environmental systems (temperature, humidity, air-conditioning…)

– Power (at the distribution board, rack PDU and server PSU level…)

– IT equipment (using standard protocols such as IPMI and SNMP…)

– Operating systems & Virtual Machines (integrating with IT systems)

– … and perhaps applications themselves

• Software tools will need to integrate with multiple systems from multiple vendors (both hardware and software) in an agnostic manner


• Optimised environmental management to improve PUE (& ITUE)

• Identification of unused, under-used, inefficient or over-spec’ed IT equipment

• Using active power management during low utilisation periods

• Dynamic orchestration of virtual machines based on environmental, power and IT usage constraints

• Non-trivial energy savings through simple changes (20-25%)

• Opportunity for very significant savings in most DCs (25-75%)
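
One simple way to start identifying unused or under-used equipment is to compare average utilisation against thresholds and rank the flagged servers by the power they draw. The sketch below assumes average CPU utilisation is an adequate proxy for usage; the thresholds and the `ServerStats` type are hypothetical, not taken from the product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ServerStats:
    """Averages over a reporting period (hypothetical structure)."""
    name: str
    avg_cpu_pct: float
    avg_power_w: float


def classify(server: ServerStats, idle_below_pct: float = 2.0, low_below_pct: float = 15.0) -> str:
    """Label a server by average CPU utilisation; thresholds are assumptions."""
    if server.avg_cpu_pct < idle_below_pct:
        return "unused"
    if server.avg_cpu_pct < low_below_pct:
        return "under-used"
    return "ok"


def candidates(servers: List[ServerStats]) -> List[Tuple[str, str, float]]:
    """Flagged servers as (name, label, watts), highest power draw first."""
    flagged = [
        (s.name, classify(s), s.avg_power_w)
        for s in servers
        if classify(s) != "ok"
    ]
    return sorted(flagged, key=lambda row: row[2], reverse=True)


fleet = [
    ServerStats("db01", 55.0, 320.0),
    ServerStats("test07", 0.5, 180.0),
    ServerStats("web03", 9.0, 210.0),
]
print(candidates(fleet))
```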


• Consolidation of Data Centres is already happening

– Driven by economies of scale and the ‘Cloud’

– The trend is only likely to accelerate…

• Conversely, as Data Centres become bigger, energy management will become even more important

• The winners in the race for the clouds will be those who can operate the most efficiently…

– … but few know how efficient they are now!


• The largest data centres are owned by a handful of IT giants:

– Google, Amazon, Microsoft, Yahoo etc…

• These giants are very aware of Data Centre Efficiency

– Some have turned common perceptions on their heads

– Some even design their own servers

– All have developed their own systems and software


• Imagine a data-centre with 50,000-100,000 servers (cf. Google)

– i.e. 1,500-3,000 racks and a similar number of PDUs and sensors

– and up to (say) 16 VMs per server

• You might want to monitor (derive reports from & orchestrate…)

– 1,500-12,000 environmental sensors

– 20-30 data-points per server (IPMI, Power) = 1-3M points

– 20-100 data-points per OS/VM (e.g. SNMP, WMI) = 16-160M points

– … as well as user and application data.

• That’s a hell of a lot of information!

– But even scaling this back by an order of magnitude presents a challenge for software.
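
The totals above follow directly from the per-item ranges. The short calculation below reproduces them, taking 16 VMs per server at both ends of the range, as the slide's own 16-160M figure does. The `scale` helper is just a name used here.

```python
from typing import Tuple

Range = Tuple[int, int]  # (low, high)


def scale(count: Range, per_item: Range) -> Range:
    """Multiply a (low, high) item count by a (low, high) points-per-item range."""
    return (count[0] * per_item[0], count[1] * per_item[1])


servers: Range = (50_000, 100_000)

# 20-30 hardware data-points per server (IPMI, power).
server_points = scale(servers, (20, 30))

# Up to 16 VMs per server, each with 20-100 OS-level data-points.
vms = scale(servers, (16, 16))
vm_points = scale(vms, (20, 100))

print(server_points)  # (1000000, 3000000)
print(vm_points)      # (16000000, 160000000)
```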


Things that won’t work:

• Using a ‘single-instance’ software architecture

– Information will need to be processed in a distributed manner

• Putting unrefined data in a standard SQL database

– or you’ll need another data-centre to store, process & retrieve the data!

• Expecting simple GUIs (eg. lists and trees) to be effective

– Visualisation becomes a key aspect to usability
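
One common way to refine data before it is stored is to roll raw samples up into fixed time buckets, keeping only a minimum, mean and maximum per bucket. The sketch below is a generic illustration of that idea, run close to the data source; it is not a description of how the product processes data, and the five-minute bucket is an arbitrary assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, float]                 # (timestamp in seconds, value)
Rollup = Tuple[int, float, float, float]     # (bucket start, min, mean, max)


def downsample(samples: Iterable[Sample], bucket_s: int = 300) -> List[Rollup]:
    """Reduce raw samples to one (start, min, mean, max) row per time bucket."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in samples:
        start = int(timestamp // bucket_s) * bucket_s
        buckets[start].append(value)
    return [
        (start, min(values), sum(values) / len(values), max(values))
        for start, values in sorted(buckets.items())
    ]


raw = [(0, 100.0), (60, 110.0), (299, 120.0), (300, 90.0)]
print(downsample(raw))
# [(0, 100.0, 110.0, 120.0), (300, 90.0, 90.0, 90.0)]
```

If each distributed collector performs this step locally, only the rolled-up rows need to travel to, and be stored by, a central system.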


Concurrent Thinking’s Products


‘Command & Control’ Architecture

• 1U appliance

• Collates information from concurrentCONTROL devices

• Delivers highly polished, Web 2 GUI

– Manage anywhere from mobile, iPhone, PDA, PC etc…

• Built to be a scalable interface

• ‘Zero’ U, low-power appliance

• Monitors data from devices associated with local ‘racks’

– Power control and monitoring, Environmental information

• concurrentCOMMAND provides full system management GUI

• concurrentCONTROL devices act as slaves, and are designed to enable scalable and fault tolerant system monitoring and imaging


• Monitoring

– Power from power clamps, third party PDUs & PMBus PSUs

– Environmental sensors: wired (5V) and wireless (866 MHz)

– Server hardware - IPMI, DCMI and Intel Node Manager support

– SNMP & WMI support for OS and VM monitoring; optional in-band ‘daemon’

• Reporting

– Power charge-back and ITUE metrics by group/customer/user/application

– Scalable, ‘real-time’ data-centre views

– Extensive reporting of historical data

– Breach monitoring and reporting; Event data-base and visualisation

• Management

– Data Centre Inventory

– Power management (PDU and IPMI support)

– Event scheduling

– Serial-over-LAN & SSH terminals


Visualisation of real-time metrics - data centre view


Hardware repository: PDU association to servers


• Integration with third party VMs (VMware, KVM, Hyper-V…)

– Dynamic orchestration of virtual machines

• Support for multiple sites
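
To give a feel for what dynamic orchestration of virtual machines could involve, the sketch below consolidates VMs onto as few hosts as possible using a first-fit-decreasing heuristic, so that emptied hosts could be powered down. This is a generic bin-packing illustration under a single CPU-capacity constraint; it is an assumption for the example, not the product's method, and a real orchestrator would also weigh environmental and power constraints.

```python
from typing import Dict, List


def consolidate(vm_loads: Dict[str, float], host_capacity: float) -> List[List[str]]:
    """Pack VMs onto hosts with first-fit decreasing; returns VM names per host."""
    hosts: List[List[str]] = []
    spare: List[float] = []  # remaining capacity of each host in `hosts`
    # Place the largest VMs first, each on the first host with enough room.
    for vm, load in sorted(vm_loads.items(), key=lambda item: item[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{vm} does not fit on any host")
        for index, room in enumerate(spare):
            if load <= room:
                hosts[index].append(vm)
                spare[index] -= load
                break
        else:
            # No existing host has room, so bring another host into use.
            hosts.append([vm])
            spare.append(host_capacity - load)
    return hosts


# Hypothetical CPU loads, as a percentage of one host's capacity.
loads = {"a": 60.0, "b": 50.0, "c": 40.0, "d": 30.0, "e": 20.0}
print(consolidate(loads, host_capacity=100.0))
# [['a', 'c'], ['b', 'd', 'e']]
```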
