Chapter 5. Planning for data center availability
5.2 Data center planning
During the late 1990s e-commerce boom, several companies specialized in providing outsourced data center facilities for high-growth startup companies.
These data centers were built around design fundamentals that incorporated redundancy, resiliency, and high security. The selling point for this high-tech real estate was that companies could trust their hardware operations to a world-class data center specialist. Any organization can adopt the fundamental elements of data center design that created this market for world-class data center space.
5.2.1 Space and growth considerations
One of the most challenging elements of data center planning is space planning. Many organizations fill their space at two to three times the planned rate of consumption. Technology density (for example, of servers and storage) continues to improve, but space planning should still be the first element of design.
Data center space should accommodate 5-10 years of growth, depending on the business type and scale. The cost of building extra space typically pales in comparison to data center relocation costs and risks.
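To see why a 5-10 year horizon needs generous headroom, the following minimal sketch projects rack demand under compound annual growth. It is an illustration only: the function name `racks_needed`, the 40-rack starting point, and the 10% planned growth rate are hypothetical, and modeling the "2 or 3 times the planned rate" overrun as a multiplier on the growth rate is our assumption.

```python
import math

def racks_needed(current_racks: int, planned_growth: float, years: int,
                 overrun: float = 1.0) -> int:
    """Project rack demand after `years` of compound annual growth.

    planned_growth: planned yearly growth rate (0.10 means 10% per year).
    overrun: assumed multiplier on the planned rate, e.g. 2.0 or 3.0 to model
    space filling two or three times faster than planned (hypothetical model).
    """
    rate = planned_growth * overrun
    # Round up: a fraction of a rack still needs a whole rack position.
    return math.ceil(current_racks * (1 + rate) ** years)

if __name__ == "__main__":
    current = 40  # hypothetical number of racks in use today
    for overrun in (1.0, 2.0, 3.0):
        print(f"overrun x{overrun:g}: "
              f"{racks_needed(current, 0.10, 10, overrun)} racks in 10 years")
```

With these assumed inputs, the projection is 104 racks at the planned rate, 248 at twice the rate, and 552 at three times the rate, so a space plan sized only for the planned rate can be exhausted well before its intended horizon.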
5.2.2 Fundamental elements of data center design
From a structural standpoint, the data center must be designed to withstand many kinds of disruptions, ranging from building evacuations to regional seismic events. The Risk Analysis phase of the Disaster Recovery Planning process, described in 3.2.2, “Disaster Recovery Planning” on page 44, helps organizations understand the credible risks to the environment. The location of the data center is also extremely important, because building access can be controlled by external entities, which can either hinder or complement the recovery and security procedures for the data center.
Design fundamentals for data center components center on scalability, redundancy, and resiliency. For example, the power infrastructure must provide redundancy and scalability without disruption. Every power management device (transformers, UPS systems, and so on) must be built with redundancy in mind, just like a high availability systems architecture. Some organizations source power from separate power grids and suppliers to reduce points of infrastructure failure, extending redundancy all the way back to multiple power generation sources.
An example of infrastructure redundancy is shown in Figure 5-1.
Figure 5-1 Example of power systems design for availability
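The benefit of redundant power elements can be estimated with basic probability. The sketch below computes the availability of a k-of-n configuration, where the system is up if at least k of n identical units are up. It assumes the units fail independently, which is a simplification: a shared cause such as a common grid breaks that assumption, which is one reason some organizations source power from separate grids and suppliers. The function name and the 99% unit availability are hypothetical.

```python
from math import comb

def k_of_n_availability(a: float, k: int, n: int) -> float:
    """Probability that at least k of n identical units are available.

    Assumes each unit is independently up with probability a.
    """
    if not 0.0 <= a <= 1.0 or not 0 < k <= n:
        raise ValueError("need 0 <= a <= 1 and 0 < k <= n")
    # Sum the binomial probabilities of exactly i units being up, for i >= k.
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    unit = 0.99  # hypothetical availability of one UPS module
    print("single path: ", round(k_of_n_availability(unit, 1, 1), 6))  # 0.99
    print("2N (1 of 2): ", round(k_of_n_availability(unit, 1, 2), 6))  # 0.9999
    print("N+1 (2 of 3):", round(k_of_n_availability(unit, 2, 3), 6))  # 0.999702
```

Under these assumptions, duplicating a 99%-available unit (2N) raises availability to 99.99%, while an N+1 arrangement (any two of three units suffice) reaches about 99.97%, which is why redundancy is applied to every power management device rather than to a single point in the chain.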
Similar design methods can be applied to other infrastructure components, including cooling, HVAC, halon, fire prevention, networks, telecommunications, and fibre optics. Support in the design or retrofitting process is available from professional services groups, including the data center planning specialists within IBM Global Services.
5.2.3 Preventative controls
An additional element of data center design is the use of preventative controls. Redundancy and protection come at a cost that must be balanced against the risks identified in the business impact analysis (BIA) process. Common preventative controls include:
Uninterruptible power supplies (UPS) to provide short-term backup power to all system components (including environmental and safety systems)
Petroleum-powered generators to provide long-term backup power
Air conditioning systems with adequate excess capacity
Fire suppression systems
Fire and smoke detectors
Water detectors
Plastic tarpaulins to protect equipment from water damage
Preventative controls must be documented and integrated into the overall DRP.
Personnel must be generally aware of how these measures are used (for example, how to respond when a fire alarm sounds). This awareness can be instilled through drills, documentation provided to employees, and similar means, so that staff know what to do in a real disaster.
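One way to make "cost that must be balanced against risk" concrete is the common quantitative risk formula ALE = SLE × ARO (annualized loss expectancy equals single loss expectancy times annualized rate of occurrence). The document does not prescribe a specific BIA calculation; this is a swapped-in illustration. The sketch below treats a control as justified when the loss it prevents per year exceeds its annual cost. The class name, fields, and all dollar figures and rates are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PreventativeControl:
    """Hypothetical cost/benefit record for one preventative control."""
    name: str
    annual_cost: float       # yearly cost of the control (amortized purchase + upkeep)
    loss_per_event: float    # single loss expectancy (SLE) if the event occurs
    events_per_year: float   # annualized rate of occurrence (ARO)
    risk_reduction: float    # fraction of the expected loss the control prevents, 0..1

    def annual_loss_expectancy(self) -> float:
        # ALE = SLE x ARO: the expected yearly loss without the control.
        return self.loss_per_event * self.events_per_year

    def annual_benefit(self) -> float:
        # Portion of the expected yearly loss that the control removes.
        return self.annual_loss_expectancy() * self.risk_reduction

    def is_justified(self) -> bool:
        # Worth funding only if it prevents more loss per year than it costs.
        return self.annual_benefit() > self.annual_cost

if __name__ == "__main__":
    controls = [
        PreventativeControl("UPS", 20_000, 50_000, 2.0, 0.9),
        PreventativeControl("Second utility feed", 100_000, 200_000, 0.1, 1.0),
    ]
    for c in controls:
        print(f"{c.name}: benefit {c.annual_benefit():,.0f}/yr, "
              f"cost {c.annual_cost:,.0f}/yr, justified={c.is_justified()}")
```

With these invented inputs, the UPS prevents an expected 90,000 per year for a 20,000 annual cost, while the second feed prevents only 20,000 per year against a 100,000 cost. Real BIA results would replace these placeholders, and qualitative factors such as safety or regulatory obligations can justify a control that this arithmetic alone would not.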
5.2.4 Onsite parts maintenance locker
A simple and effective way to fortify site redundancy is to start an onsite parts inventory program. Components that frequently fail (network cables, connectors, disk drives, tapes, HBAs, and so on) can be stocked in the data center for fast access in the event of component-level failures. Some organizations compile and analyze component failure records to derive mean time between failures (MTBF) data, which helps them prepare cost-effectively for these basic kinds of service interruptions.
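MTBF data can be turned into a stocking level. The sketch below assumes a constant failure rate, so the number of failures across an installed base during a resupply window is approximately Poisson-distributed with mean N × t / MTBF. It returns the smallest spare count that covers the window with a chosen probability. The function names, the 500-drive population, the 1,000,000-hour MTBF, the 30-day (720-hour) window, and the 95% service level are all hypothetical.

```python
import math

def expected_failures(installed: int, window_hours: float, mtbf_hours: float) -> float:
    """Mean failures across the installed base during the window (constant failure rate)."""
    return installed * window_hours / mtbf_hours

def spares_needed(installed: int, window_hours: float, mtbf_hours: float,
                  service_level: float = 0.95) -> int:
    """Smallest stock s with P(failures <= s) >= service_level under a Poisson model."""
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must be strictly between 0 and 1")
    lam = expected_failures(installed, window_hours, mtbf_hours)
    s = 0
    term = math.exp(-lam)  # P(X = 0)
    if term == 0.0:
        raise ValueError("expected failures too large for this simple model")
    cumulative = term
    while cumulative < service_level:
        s += 1
        term *= lam / s    # P(X = s) computed from P(X = s - 1)
        cumulative += term
    return s

if __name__ == "__main__":
    # Hypothetical: 500 disk drives, 1,000,000-hour MTBF, 30-day resupply window
    print(expected_failures(500, 720, 1_000_000))  # 0.36
    print(spares_needed(500, 720, 1_000_000))      # 2
```

For these inputs, about 0.36 failures are expected per window. One spare covers the window with roughly 94.9% probability and two spares with about 99.4%, so the 95% target calls for two. Shortening the resupply window or lowering the service level reduces the stock required.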
5.2.5 Data center facilities security
Security controls for data center access are also extremely important. Access controls and procedures help fortify operations against malicious human behavior. Measures range from card-key access doors to fully supervised single-person entry/exit chambers that include visual recognition, armed guards, and two-stage points of clearance. At a minimum, data center access should be controlled and monitored.
Site locations can also be kept out of public awareness by using unmarked buildings and undisclosed data center locations. Some government and energy installations, for instance, limit the number of people who know the data center locations and access procedures. However paranoid they may seem, these measures provide an excellent level of protection and security.
5.2.6 Disaster Recovery Planning and infrastructure assessment
We realize that most organizations rarely have the opportunity to design a data center from the ground up. Best practices for infrastructure and data center design are applied retroactively at best, and in most cases affect only new systems. Even so, the DRP process provides significant value for mapping logical and physical environments and identifying contingencies within the infrastructure. We encourage readers to approach DRP with this frame of reference.