
Case study: End-to-end data centre infrastructure management


Academic year: 2021


Part 1: Cooling

Situation: A leading public sector organisation suspected that their air conditioning units were not cooling the data centre efficiently. Consequently, excessive power was being consumed and a number of servers were potentially at risk of overheating very quickly should more serious problems with the cooling system arise.

Solution: Concurrent COMMAND provided detailed energy monitoring and a visual temperature profile of the room. This allowed for the impact of various configurations of the air conditioning units and air flow conditions to be assessed and an optimal profile to be determined. Once the cooling system was functioning more effectively, the ambient temperature of the data centre was increased to deliver further reductions in power consumption and CO2 emissions, while ensuring that mission-critical IT systems were maintained within thermal bounds.

Part 2: IT utilisation

Situation: A leading public sector organisation identified the need to review their server estate with a view to optimising usage and potential system consolidation.

Solution: Concurrent COMMAND was used to determine the extent to which the estate was being used efficiently and which servers could be retired or converted to virtual machines in order to reduce energy consumption.


Part 1: Cooling

A leading public sector organisation suspected the cooling in their data centre could work significantly more effectively.

Initial investigations confirmed that the three air conditioning units in the data centre, while working hard and consuming significant amounts of power, were not working in an efficient manner.

The client suspected that there were multiple hot and cold spots in the data centre due to poor air-flow management and were concerned that this may pose a risk to the health of their server estate.

They realised they needed a Data Centre Infrastructure Management (DCIM) tool to help them mitigate the risk of fluctuating temperatures within the data centre, as well as to deliver meaningful energy and cost savings.

The first step was to provide accurate figures on the energy use across the data centre. This was achieved by using branch circuit monitoring hardware and Concurrent COMMAND’s energy management capabilities.

The result was detailed information on the energy use of individual air conditioning units. Combined with temperature information and air-flow measurements, this information allowed the university to understand their operating efficiency. The instantaneous power drawn by each rack of IT equipment was also monitored.

The next task was to develop a temperature profile of the room to identify any hot or cold spots. A hundred of Concurrent Thinking’s low-cost, one-wire temperature sensors were daisy-chained using Cat 5 cable along the front and back of the racks, which were then monitored continuously. Concurrent COMMAND also collected data from those computer systems that reported their own inlet and CPU temperatures over the SNMP and IPMI protocols. The result was a detailed visualisation of the temperature conditions across the room.

Various tests were undertaken and changes made in order to find the optimum configuration of the air conditioning units as well as to maximise air flow beneath the raised floor. With constant feedback from Concurrent COMMAND, the ambient air temperature was raised from 20˚C to 24˚C in steps of 1˚C. As the temperature was adjusted, the impact on the servers and the room as a whole was monitored in real time until an optimal temperature profile was found.
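The stepwise adjustment described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Concurrent COMMAND works internally: the `read_inlet_temps` callback, the sensor names and the 27˚C inlet limit are all assumptions made for the example, and a real procedure would wait for the room to settle after each step.

```python
from typing import Callable, Dict


def raise_setpoint(
    read_inlet_temps: Callable[[float], Dict[str, float]],
    start_c: float = 20.0,
    target_c: float = 24.0,
    step_c: float = 1.0,
    max_inlet_c: float = 27.0,  # assumed thermal bound, not a figure from the case study
) -> float:
    """Raise the setpoint one step at a time and return the highest value
    at which every monitored sensor stayed within the thermal bound."""
    accepted = start_c
    setpoint = start_c
    while setpoint + step_c <= target_c:
        setpoint += step_c
        # Hypothetical callback: returns sensor name -> temperature (deg C)
        # once the room has settled at the new setpoint.
        temps = read_inlet_temps(setpoint)
        if max(temps.values()) > max_inlet_c:
            break  # a hot spot exceeded the bound, keep the previous setpoint
        accepted = setpoint
    return accepted


# Toy stand-in for real sensor data: each sensor reads a fixed offset above the setpoint.
def fake_room(setpoint: float) -> Dict[str, float]:
    return {"rack01-front": setpoint + 1.0, "rack17-front": setpoint + 3.5}


print(raise_setpoint(fake_room))
```

In the toy room, the sensor that runs 3.5˚C above the setpoint is what stops the climb, which mirrors why hot spots have to be resolved before the ambient temperature can be raised safely.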

Baseline power monitoring

Key facts

• 70 m² floor space
• 38 standard racks
• Approximately 170 servers
• 3 air conditioning units

A visual temperature profile

20% power saving

Increased IT resilience

Not only does detailed temperature monitoring result in a meaningful energy, cost and CO2 reduction, but the client’s data centre and IT managers continue to benefit from timely warnings of any potentially critical events that relate to their cooling systems. With continuous monitoring and automated alarms, modest variations in both cooling efficiency and local temperature can now be identified and resolved.

Chart: 20% power reduction for air conditioning units

Graph: average room temperature over time

… and cold spots, while stabilising the temperature profile of the room as a whole. Previously, the room temperature had been found to vary continuously as the air conditioning units competed against each other, so adding to the drain on power consumption.


Conclusion

The use of Concurrent COMMAND allowed the public sector organisation to confirm that their data centre cooling systems were not functioning optimally.

Concurrent COMMAND provided them with the detailed, visual information that they needed to make significant improvements, with the on-going knowledge that their data centre continues to operate efficiently.

The results were a safe increase in the overall temperature profile of the room, with corresponding energy, CO2 and cost savings.

Temperature distribution in the data centre

Part 2: IT utilisation

Like most data centres that have grown naturally over time, the client suspected their IT systems were not operating very efficiently. As a public sector organisation, they are under considerable pressure to reduce both operating costs and CO2 emissions, so optimising energy use within their data centres was a high priority.

They decided to review server utilisation and consider consolidation strategies that could drive down energy costs and reduce the need for space and cooling infrastructure. To do this, they used Concurrent COMMAND, a robust Data Centre Infrastructure Management (DCIM) tool that can analyse the effectiveness of their servers in detail. Concurrent COMMAND uses protocols such as Modbus, SNMP and IPMI to monitor power usage at the distribution board, rack PDU and server level – or indeed wherever hardware support for remote monitoring is available. It can also use the SNMP and WMI protocols to obtain detailed information such as CPU, network and I/O usage by interrogating the operating system itself.

Data can be manipulated and presented in multiple ways using dashboard widgets, data centre plan and rack views, and historical graphs, both for individual devices and for groups of devices. This intuitive GUI allows the user to obtain high-level management information and then drill down to the detailed and highly granular technical information that is often needed to make operational decisions.

Concurrent COMMAND was used to monitor the CPU load of over 80 Windows and Linux systems, each a potential target for consolidation, over the course of a week. This was repeated three times, both during and outside term time, to ensure the findings were consistent. Underperforming systems were categorised as having moderate, low-to-moderate or low CPU usage.
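As a rough illustration of this kind of categorisation, the sketch below buckets a server by its average CPU load over a monitoring period. The case study does not state the thresholds that were used, so the percentage cut-offs here are purely assumed values, as are the function and constant names.

```python
from statistics import mean
from typing import List

# Assumed cut-offs (percent average CPU load); the case study gives no figures.
LOW_MAX = 5.0
LOW_TO_MODERATE_MAX = 15.0
MODERATE_MAX = 40.0


def categorise(cpu_samples: List[float]) -> str:
    """Return a usage category for one server from its CPU load samples (0-100)."""
    avg = mean(cpu_samples)
    if avg < LOW_MAX:
        return "low"
    if avg < LOW_TO_MODERATE_MAX:
        return "low-to-moderate"
    if avg < MODERATE_MAX:
        return "moderate"
    return "well utilised"


# Hypothetical week-long averages for three servers.
print(categorise([1.0, 2.0, 3.0]))
print(categorise([10.0, 12.0]))
print(categorise([30.0, 35.0]))
```

Repeating the measurement over several weeks, as the client did, guards against a server landing in the wrong bucket because of one unusually quiet or busy period.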

A powerful DCIM toolkit

Key facts

• 70 m² floor space
• 38 standard racks
• Over 170 servers

Monitoring system utilisation

CPU usage over 24 hours by category (moderate, low-to-moderate and low CPU usage)


While simple metrics such as CPU usage, CPU usage per watt, or CPU usage per £ of energy are useful indicators, they do not tell the whole story.

In particular, CPUs vary enormously in terms of application performance: a three-year-old CPU is likely to be significantly less efficient than a state-of-the-art CPU, and a modern server may have four or eight times as many CPU cores. For this reason, it is useful to combine usage data with information about particular servers from Concurrent COMMAND’s built-in asset database in order to make more meaningful comparisons.

In this study, publicly available benchmark figures were used and assigned to groups of servers of a particular type and manufacturer within the asset database. Normalised CPU usage metrics were then compared. Surprisingly, the total combined load of the servers within the low and low-to-moderate usage categories equated to just 1.7 times the peak performance of a modern CPU core, and yet they were drawing 3.9 kW of power.
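One plausible way to express such a normalisation is to convert each server's CPU load into "modern core equivalents" using a per-core benchmark score. The sketch below assumes that approach; the `Server` fields, the scores and the reference value are hypothetical and are not taken from the study or from Concurrent COMMAND's asset database.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Server:
    name: str
    cores: int
    per_core_score: float   # benchmark score of one core (hypothetical units)
    cpu_utilisation: float  # average utilisation as a fraction, 0.0 to 1.0


def modern_core_equivalents(servers: Iterable[Server], reference_core_score: float) -> float:
    """Sum the load of all servers, expressed as a number of fully busy
    reference ("modern") CPU cores."""
    return sum(
        s.cpu_utilisation * s.cores * s.per_core_score / reference_core_score
        for s in servers
    )


# Hypothetical estate: two older servers compared against a modern core scoring 40.
estate = [
    Server("legacy-web", cores=2, per_core_score=10.0, cpu_utilisation=0.05),
    Server("legacy-db", cores=4, per_core_score=20.0, cpu_utilisation=0.10),
]
print(round(modern_core_equivalents(estate, reference_core_score=40.0), 3))
```

Expressing load this way makes it possible to compare servers of different ages directly and to estimate how much modern hardware a consolidated workload would need.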

Asset management and comprehensive system monitoring

Weekly usage for a server in the low usage category

The data provided by Concurrent COMMAND showed that servers in the low and low-to-moderate usage categories were still consuming large amounts of power, but performing very little useful work. The client was also able to review system activity data and identify trends on a weekly and daily basis. For example, in the low usage group, a peak period of daily activity was identified but beyond this many of the servers were virtually idle.

Detailed, easy to access data


The client was able to use the information collated by Concurrent COMMAND to make decisions on how to best optimise their data centre assets.

Additional investigations into the role and workload of each machine are required before any action is taken, including their potential roles in fail-over and disaster recovery. However, through the use of Concurrent COMMAND, it is now clear that many of the servers in the low and low-to-moderate usage categories could be converted to virtual machines or retired.

Furthermore, with the detailed historical information provided by Concurrent COMMAND, the requirements of individual virtual machines and the servers that will be needed can be accurately scoped.

In a best-case scenario, with a combined peak load of less than 10 modern CPU cores and an average of 6 modern CPU cores, it is possible that all the servers in the two low-usage categories could be replaced by virtual machines on a single modern server.

This would significantly reduce the overall power needed to run these services from circa 9.3 kW to 0.3 kW, saving circa 30% of the power used by all the IT equipment in the data centre.

Such a reduction would also have a knock-on saving with respect to cooling requirements, resulting in a total annual saving of £14,000. When combined with savings made in part 1 to optimise stand-alone cooling costs, the total potential annual saving is nearly £20,000 or 35% of the initial total energy cost.
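The figures quoted above can be cross-checked with some simple arithmetic. The sketch below derives only what follows directly from the stated numbers (9.3 kW to 0.3 kW, 30%, nearly £20,000 and 35%); it assumes the servers run continuously all year, and the derived totals are approximations implied by the percentages rather than figures reported in the study.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the servers run continuously


def annual_kwh_saved(before_kw: float, after_kw: float) -> float:
    """Energy saved per year when a constant load drops from before_kw to after_kw."""
    return (before_kw - after_kw) * HOURS_PER_YEAR


def implied_total(saving: float, fraction_of_total: float) -> float:
    """Total implied when `saving` is stated to be `fraction_of_total` of it."""
    return saving / fraction_of_total


saved_kw = 9.3 - 0.3
print(f"IT power saved: {saved_kw:.1f} kW")
print(f"IT energy saved per year: about {annual_kwh_saved(9.3, 0.3):,.0f} kWh")
# A 9 kW saving described as circa 30% implies the total IT load.
print(f"Implied total IT load: about {implied_total(saved_kw, 0.30):.0f} kW")
# Nearly 20,000 pounds described as 35% implies the initial annual energy cost.
print(f"Implied initial annual energy cost: about {implied_total(20_000, 0.35):,.0f} pounds")
```

Because the source gives "circa" and "nearly" values, these results should be read as order-of-magnitude checks only.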

Informed decisions reduce costs and increase efficiency

The above graph shows CPU load in blue and power used in red. This information allowed the client to identify major performance spikes, which could most likely be attributed to specific tasks such as a scheduled virus check or system backup.

Daily usage chart of one server

Annual Savings

• Power: 30% reduction

• Costs: £14,000

Total annual savings

• Energy: 35% reduction


Contact us

To find out more about Concurrent COMMAND or to request a demo, contact us on [email protected].

Concurrent Thinking’s Data Centre Infrastructure Management (DCIM) product suite, Concurrent COMMAND, saves money by reducing risk, delivering significant operational efficiencies and cutting energy costs. It’s a unique, easy-to-use and modular DCIM solution that allows you to manage all your data centre facility and IT assets within a single framework.

Scalability in DCIM is a fundamental requirement; Concurrent COMMAND inherently manages hundreds of thousands of metrics every 15 seconds. This provides invaluable support to your business as you increase the number of sites, devices, racks, servers and virtual machines that you manage.

It supports your entire existing and future infrastructure and utilises industry standard protocols such as Modbus, SNMP, WMI, IPMI and 1-wire technology, as well as key vendor-specific protocols, such as Intel Node Manager.

It’s vendor neutral and truly customisable, allowing you to tailor Concurrent COMMAND to your specific needs and to monitor and control virtually all third-party devices through an extensible scripting interface.

Our customers measure their return on investment in months rather than years. Typically reported energy savings and improved operational efficiencies are over 20% and often significantly higher.

Our modular licensing approach caters for the DCIM needs of SMEs, corporate data centres, colocation providers and cloud providers alike. It allows you to choose the modules that meet your current budget and requirements while being able to scale up your service as required.

About Concurrent COMMAND

How is our DCIM solution different?

Drive savings of over 20%

A modular DCIM solution that
