Future Facilities Ltd.
Predictive Data Center Infrastructure Management (DCIM)
A toolset to maximize the return on capital expenditure in the physical data center
Table of Contents
Executive Summary – The Elephant in the Room
Data Center Capacity Fragmentation
Cooling Fragmentation - Just Because You Don’t See It Doesn’t Mean It’s Not There
You Can’t Manage if You Can’t Predict Cause and Effect
The Way Forward: Predictive DCIM with the Virtual Facility
Executive Summary – The Elephant in the Room
Data center capacity is the amount of IT equipment that the data center is intended to support, typically expressed in kW/sqft or kW/cabinet. This specification is derived from business-unit projections of the computing capacity required over the long term, and it is the primary cost driver for construction of the data center facility. Yet most data centers never achieve the capacity they were designed for. This is a financial disaster for owner/operators because a significant percentage of the original capital expenditure is wasted and additional capital must be spent years sooner than planned. The cost of this lost capacity dwarfs all other financial considerations for a data center owner/operator.
Often 30% or more of data center capacity is lost in operation.[1] On a global scale, out of 15.5 GW[2] of available data center capacity, a minimum of 4.65 GW is unusable. At industry averages, this amounts to about 31 million square feet of wasted data center floor space and $70B of unrealized capital expense in the data center. The losses are staggering.
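The arithmetic behind these figures can be checked with a short sketch. The four constants are taken from the paragraph above; the two "implied" averages are derived here from those figures and are not stated in the source, so treat them as a rough consistency check rather than as published industry data.

```python
# Figures quoted in the text above.
GLOBAL_CAPACITY_GW = 15.5      # available data center capacity worldwide
LOST_FRACTION = 0.30           # "30% or more" lost in operation (lower bound)
WASTED_FLOOR_SQFT = 31e6       # about 31 million square feet
UNREALIZED_CAPEX_USD = 70e9    # about $70B

lost_gw = GLOBAL_CAPACITY_GW * LOST_FRACTION
lost_watts = lost_gw * 1e9

# Averages implied by the quoted figures (derived here, not stated in the text).
implied_density_w_per_sqft = lost_watts / WASTED_FLOOR_SQFT
implied_capex_usd_per_watt = UNREALIZED_CAPEX_USD / lost_watts

print(f"Lost capacity: {lost_gw:.2f} GW")
print(f"Implied density: {implied_density_w_per_sqft:.0f} W/sqft")
print(f"Implied build cost: ${implied_capex_usd_per_watt:.2f} per W")
```

Under these numbers the quoted floor space and capital figures correspond to roughly 150 W per square foot and about $15 per watt of capacity.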
Gartner has found that “70% of data center facilities have failed to meet their capacity requirements without some level of renovation, expansion or relocation.”[3] In an Uptime Institute survey of 21,000 data center operators, 54% of respondents ranked “data center capacity” as the driver for their long-term approach to data center energy efficiency,[4] far outweighing all other drivers.
Given the stakes, why isn’t more being said about lost capacity? Because these losses are due to fragmentation of infrastructure resources – space, power, cooling and networking – that builds slowly and imperceptibly early in the data center life span. As resources fragment, the data center becomes less and less able to support the full, intended IT load. Only well into the operational life of the facility, when the margin on capacity has closed, is the problem discovered. Lack of visibility and the delay between cause and detection conceal the elephant in the room: Lost Capacity.
Data Center Capacity Fragmentation
Fragmentation occurs because a typical owner/operator will almost certainly break the original IT load assumptions made during the design phase of the facility. For example, owner/operators assume power density values for the IT equipment that will be installed in the future. Inevitably, after installation the owner/operator discovers that the assumptions were wrong. This error will fragment space or power, depending on whether the estimates were too high or too low. The end result is lost data center capacity.
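A minimal sketch of how a wrong power density assumption strands either space or power. The room size, power budget and densities below are hypothetical values chosen for illustration, and the function name is not taken from any tool.

```python
def stranded_resources(cabinet_positions, power_budget_kw, actual_kw_per_cabinet):
    """Return (cabinets_deployed, stranded_positions, stranded_power_kw)."""
    # Deployment stops at whichever runs out first: floor positions or power.
    limit_by_power = int(power_budget_kw // actual_kw_per_cabinet)
    deployed = min(cabinet_positions, limit_by_power)
    stranded_positions = cabinet_positions - deployed
    stranded_power_kw = power_budget_kw - deployed * actual_kw_per_cabinet
    return deployed, stranded_positions, stranded_power_kw


# Hypothetical design: 100 positions at an assumed 5 kW each = 500 kW budget.
POSITIONS, BUDGET_KW = 100, 500.0

for actual in (5.0, 8.0, 3.0):
    deployed, lost_space, lost_power = stranded_resources(POSITIONS, BUDGET_KW, actual)
    print(f"{actual} kW/cabinet: {deployed} cabinets, "
          f"{lost_space} positions stranded, {lost_power:.0f} kW stranded")
```

When the real equipment draws more than assumed, power runs out first and floor positions are stranded; when it draws less, the floor fills first and power is stranded. Either way, part of the built capacity cannot be used.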
A second common example is that data center designers and owner/operators will assume generic cooling and physical characteristics for the cabinets and IT equipment that will be installed in the future. These assumptions may yield a successful configuration during the design phase, but quickly break down as real IT devices and cabinets are introduced into the data center. For example, while the power draw of a blade server might be the same as standard server hardware, the cooling (airflow) utilization will be substantially different. This and other seemingly minor differences between the design assumptions and the actual IT technology, cabinets and IT layouts will fragment the cooling distribution, often dramatically. The result is lost data center capacity.
Data center capacity plot for a data center that is subjected to an idealized IT configuration
Data center capacity plot for the same data center subjected to an actual IT configuration
Data center capacity comparison between a typical, idealized IT configuration and a specific IT configuration made up of a mixed set of vendor supplied cabinets and IT equipment.
To better understand fragmentation, consider a computer hard drive. Given that you pay per unit of storage, your goal is to fully utilize the capacity you have before buying more. However, hard drive capacity will fragment incrementally as you load and delete programs and files. The amount of fragmentation that occurs depends on how the hard drive is used. Eventually, a point is reached at which the remaining available capacity is too fragmented to be of use. Only with defragmentation tools can you reclaim what has been lost and fully realize your investment in the device.
Fragmentation of hard drive storage capacity
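The hard drive analogy can be sketched in a few lines. This is a deliberately simplified model that assumes a file needs one contiguous run of free blocks; the block maps are made up for illustration.

```python
from itertools import groupby


def free_space_stats(blocks):
    """blocks is a string: '#' = used block, '.' = free block.

    Returns (total_free, largest_contiguous_free).
    """
    free_runs = [
        len(list(run))
        for is_free, run in groupby(blocks, key=lambda b: b == ".")
        if is_free
    ]
    return sum(free_runs), max(free_runs, default=0)


fresh = "########........"       # 8 free blocks in one run
fragmented = "#.#.#.#.#.#.#.#."  # 8 free blocks, none adjacent

for name, disk in (("fresh", fresh), ("fragmented", fragmented)):
    total, largest = free_space_stats(disk)
    print(f"{name}: {total} blocks free, largest usable run = {largest}")
```

Both drives report the same amount of free space, but on the fragmented one no request larger than a single block can be satisfied. Tracking only the total hides the loss.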
The concept of resource fragmentation also applies to the data center. Data center capacity, like hard drive storage capacity, will fragment through use, at a rate that depends on how it is used. The simple answer to fully realizing the capacity potential of the data center is continuous defragmentation. This, however, is where the similarities between hard drives and data centers end.
The first difference is that hard drive capacity is defined by space only, while data center capacity is defined by the combination of space, power, cooling and networking. This makes defragmentation of data center capacity significantly more complicated, as it requires coordinated management of four data center resources that are traditionally managed independently.
The second difference is that unlike hard drive capacity, data center capacity cannot be tracked by traditional means. This is because cooling – a component of capacity – is dependent on airflow that is invisible and impractical to monitor with sensors. Cooling problems therefore can be addressed only if airflow is made “visible”. A simulation technique called computational fluid dynamics (CFD) is the only way to make airflow visible. Therefore, the only means to defragment data center capacity that has been affected by cooling problems is through the use of CFD simulation.
To avoid capacity loss, fragmentation issues must be predicted and addressed before IT deployments are physically implemented.
These differences have a significant impact on the techniques required to protect against data center capacity fragmentation. Let’s take a closer look at a common fragmentation problem.
Cooling Fragmentation - Just Because You Don’t See It Doesn’t Mean It’s Not There
Early in the life of a data center, capacity management seems straightforward. Cabinets are loaded with IT equipment and placed. Space and power are widely available, and utilization is tracked with spreadsheets and DCIM tools. IT load growth and capacity utilization seem to be progressing as planned, and fragmentation is the furthest issue from your mind.
Later, as your data center approaches 50% of full load, fragmentation problems start to appear. With careful tracking, your teams have been able to keep space and power fragmentation under control – just barely. However, your temperature monitoring system is registering hot spots that are forming. This of course shouldn’t be happening. You have followed the deployment guidelines and best practices, and on paper have plenty of space, power and cooling capacity left!
Simple tracking of capacity-used overlooks the problem of capacity fragmentation
In reality, cooling has been fragmenting from day one because your IT configuration has been evolving away from the original design configuration. As a consequence, the cooling distribution (airflow) within the room and inside of the cabinets has been changing relative to the original design intent and separating from the space and power distributions.
Physical characteristics of IT equipment and cabinets are idealized for design
Actual physical characteristics of IT configurations are non-ideal
The physical characteristics of design IT loads and cabinets are always different from the actual IT loads. This is a common cause of cooling fragmentation and lost data center capacity.
The total amount of cooling hasn’t changed, but its distribution relative to the IT configuration has. This in turn has reduced data center capacity, because capacity is set by the least available infrastructure resource.
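The "least available resource" rule can be illustrated with a small sketch. The four-cabinet room and its kW values are hypothetical, and in practice the per-cabinet cooling figures would have to come from an airflow simulation rather than being typed in.

```python
def usable_capacity_kw(cabinets):
    """Sum, over cabinets, of the load each one can actually support.

    Each cabinet is a (power_available_kw, cooling_available_kw) pair; its
    supportable load is set by the scarcer of the two resources.
    """
    return sum(min(power_kw, cooling_kw) for power_kw, cooling_kw in cabinets)


# Hypothetical room: 5 kW of power at every cabinet, 20 kW of cooling in total.
design = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]  # cooling matches power
actual = [(5.0, 8.0), (5.0, 7.0), (5.0, 3.0), (5.0, 2.0)]  # same 20 kW, unevenly delivered

for name, room in (("design", design), ("actual", actual)):
    total_cooling = sum(cooling for _, cooling in room)
    print(f"{name}: total cooling {total_cooling:.0f} kW, "
          f"usable capacity {usable_capacity_kw(room):.0f} kW")
```

Both rooms have the same total cooling, yet the unevenly cooled one supports a quarter less load, because surplus cooling at one cabinet cannot make up for a shortfall at another.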
Data center capacity and cooling distribution for an idealized design IT configuration
Data center capacity for the actual IT configuration that the design was intended to represent
Cooling distribution and data center capacity are determined by the physical characteristics of the IT configuration. This effect can be seen only with a simulation model.
Legend: capacity lost; capacity available
If cooling (airflow) were visible like space and power, you would have seen it fragmenting as the IT evolved. You would also have seen that best practices and deployment guidelines are effective only under specific conditions, not universally applicable as many in the industry believe.
This brings you to today – where your fragmentation symptoms have stopped further IT deployments and you realize that you must be able to visualize cooling, and how it relates to space and power, in order to protect and optimize data center capacity.
You’re stuck at 60% of full capacity, and have lost 40% somewhere inside this data center. And you are not alone.
You Can’t Manage if You Can’t Predict Cause and Effect
Protecting and optimizing data center capacity requires the ability to simulate and visualize the fragmentation of infrastructure resources before IT deployments are made. This predictive capability brings to light fragmentation problems that would otherwise be discovered only after the IT equipment is deployed – when the risk of IT service downtime prevents them from being addressed. Simulation also makes the connection between decisions made today and the data center capacity available over the long term. This foresight prevents the owner/operator from accumulating fragmentation problems for years before the symptoms become severe enough to detect with a monitoring system. Today, simulation is common practice in data center design, where it is used to validate designs and enable innovative energy-saving design concepts. Given that the “design” will change many times during the lifespan of the data center, doesn’t it make sense to re-validate each change to ensure the original data center capacity specifications still hold?
The Way Forward: Predictive DCIM with the Virtual Facility
To this end, the Virtual Facility from Future Facilities is a predictive DCIM solution to identify and avoid the conditions that lead to lost data center capacity and IT service interruptions. The Virtual Facility is a dynamic, 3-dimensional, predictive model of the physical and logical data center. The inputs to the Virtual Facility model are:
data collected by the various real-time monitoring components of traditional DCIM solutions such as power consumption, IT asset tracking and environmental data
the continuously changing IT and facilities roadmap of future changes
Once the input data are consolidated, simulations are performed to predict the short-term and long-term consequences of future IT deployment decisions, so that utilization of rack and cooling resources is maximized.
The predictive capability of the Virtual Facility enables the owner/operator to weigh the short and long-term cost and performance consequences of IT deployments, facility upgrades and energy savings programs prior to implementing changes.
Increase IT service availability by addressing redundancy problems and environmental risks before changes are implemented in the data center
Make better operational decisions that extend the lifespan of the data center facility and offset the need for costly infrastructure upgrades
For more information, please visit www.futurefacilities.com
List of references:
[1] Future Facilities’ Data Center Lost Capacity Client Survey
[2] Global Data Center Energy Demand Forecasting - DataCenter Dynamics - September 2011