Independent analysis methods for
Data Centers
Max Altmeyer / Marvin Köhler
Bilfinger HSG Facility Management GmbH
Agenda
1. Weak point analysis
2. Energy efficiency analysis
3. Combined FMECA / RAM / ENERGY Analysis
4. Spare parts management
RAM – System modeling
Simulation of system availability and reliability within a defined period of observation
Optimization of maintenance strategy
Optimization of spare parts inventory
Quantitative assessment of optimization measures
SPOF Quick Check - Risk assessment using specific questions
Identification and assessment of weak points using a tailored questionnaire with regard to all relevant systems
Derivation of optimization measures
FME(C)A – Failure mode and effects analysis
Detailed system analysis
Identification of failure modes and investigation of their potential influences
Qualitative assessment with assistance of a risk matrix
Qualitative assessment of optimization measures
Identification of weak points
using a modular concept
Level of detail
The „SPOF Quick Check“ evaluates the technical infrastructure of
Data Centers using a tailored questionnaire
1 Identification of relevant risks using a tailored questionnaire
2 Risk assessment using a coordinated risk matrix
3 Define need for action
4 Create results report
Assessment of
Fire protection system
Air-conditioning/cooling system
Electrical system
as well as
Physical security
External risks
Energy efficiency
using a tailored questionnaire and a coordinated risk matrix to identify
Single Points Of Failure and deduce optimization measures.
SPOF Quick check methodology
Risk matrix: risk score S x O

Probability of occurrence (O), sum O of:
  Mean time between failures   ranking 1: > 50a   2: 10a < MTBF < 50a   3: 1a < MTBF < 10a   4: < 1a
  Maintenance strategy         proactive / reactive
  Age of the component         < life cycle / >= life cycle remaining

Severity (S), sum S of:
  Ranking   Redundancy   Spare parts availability
  0         2N
  1         N+1          On site / agreement with supplier
  2         N            Set-up time > 12h
  3         N-1          No availability
  8         IT outage

S x O     sum O
sum S      3    4    5    6    7    8
  1        3    4    5    6    7    8
  2        6    8   10   12   14   16
  3        9   12   15   18   21   24
  4       12   16   20   24   28   32
  5       15   20   25   30   35   40
  6       18   24   30   36   42   48
  9       27   36   45   54   63   72
 10       30   40   50   60   70   80
 11       33   44   55   66   77   88
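The scoring behind the risk matrix can be sketched in a few lines of Python. The redundancy, spare parts and MTBF rankings are taken from the matrix above. The scores of 1 and 2 for maintenance strategy and component age, and the handling of the exact class boundaries (10a, 50a), are assumptions chosen so that sum O covers the range 3 to 8 shown in the matrix. The function names are hypothetical.

```python
# Rankings read from the risk matrix.
REDUNDANCY_RANK = {"2N": 0, "N+1": 1, "N": 2, "N-1": 3, "IT outage": 8}
SPARE_PARTS_RANK = {"on site": 1, "set-up time > 12h": 2, "no availability": 3}

# ASSUMPTION: the matrix does not state these scores explicitly.
MAINTENANCE_RANK = {"proactive": 1, "reactive": 2}


def mtbf_rank(mtbf_years: float) -> int:
    """Ranking 1 to 4 by mean time between failures (boundary handling assumed)."""
    if mtbf_years > 50:
        return 1
    if mtbf_years > 10:
        return 2
    if mtbf_years > 1:
        return 3
    return 4


def age_rank(age_years: float, life_cycle_years: float) -> int:
    """ASSUMPTION: 1 while younger than the life cycle, 2 otherwise."""
    return 1 if age_years < life_cycle_years else 2


def risk_score(redundancy, spare_parts, mtbf_years, maintenance,
               age_years, life_cycle_years):
    """Return (sum S, sum O, S x O) for one component."""
    severity = REDUNDANCY_RANK[redundancy] + SPARE_PARTS_RANK[spare_parts]
    occurrence = (mtbf_rank(mtbf_years)
                  + MAINTENANCE_RANK[maintenance]
                  + age_rank(age_years, life_cycle_years))
    return severity, occurrence, severity * occurrence


# Hypothetical component: N+1 redundancy, spares on site, MTBF 60a,
# proactive maintenance, 5 years old with a 15 year life cycle.
print(risk_score("N+1", "on site", 60, "proactive", 5, 15))
```

With these assumptions every result lands on a cell of the matrix, so the sketch can be checked against the printed values row by row.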
FMECA is a qualitative risk assessment
of all system components
FMECA methodology
Risk matrix
FMECA = Failure Mode, Effects and Criticality
Analysis
Assessment of all system components with
regard to
Repair times
Spare parts availability
Redundancy concept
Maintenance strategy
Component age
Failure rate
Failure detection
… and their corresponding criticality to
answer the following questions:
What can fail?
What is the cause of the failure?
What are the effects of the failure?
What can be done in a preventive way?
Subsequently a catalog of measures for the
compensation of critical components will be prepared.
1 Breaking down the system into its components
2 Identification and assessment of potential failures
3 Inclusion of the operational employees
4 Define need for action
5 Create results report
RAM is a quantitative methodology to calculate the
reliability, availability and maintainability of a system
RAM methodology
RAM =
Reliability, Availability and Maintainability
The RAM analysis uses a realistic system
image (model) to determine reliability parameters such as
System availability and
Number of system failures
via a Monte Carlo simulation.
Process of a RAM analysis:
Modeling the DC – Mapping of all components as blocks within a Reliability
Block Diagram (RBD)
Definition of failure models for each block – including failure rate, repair times, maintenance activities etc.
Model simulation via Isograph Availability Workbench ©
Excerpt of an RBD
1 As-is system model based on the FMECA
2 Parameterization and simulation of the as-is model based on the FMECA
3 Definition, modeling and simulation of optimization measures
4 Comparison of measures with regard to availability and reliability
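To illustrate what a Monte Carlo simulation of a Reliability Block Diagram does, the following minimal Python sketch simulates a small hypothetical RBD: two redundant UPS blocks in parallel, in series with one switchgear block. It is not the Isograph Availability Workbench model. The block structure, the MTBF and repair time values, and the exponential failure and repair distributions are all assumptions for illustration.

```python
import random

# ASSUMPTION: hypothetical block parameters in hours (MTBF, mean repair time).
UPS = (50_000.0, 24.0)
SWITCHGEAR = (200_000.0, 8.0)


def down_intervals(mtbf_h, mttr_h, horizon_h, rng):
    """Sample alternating exponential up and repair times for one block."""
    t, intervals = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mtbf_h)  # time until the next failure
        if t >= horizon_h:
            return intervals
        end = min(t + rng.expovariate(1.0 / mttr_h), horizon_h)
        intervals.append((t, end))
        t = end


def is_down(intervals, t):
    return any(start <= t < end for start, end in intervals)


def simulate_once(horizon_h, rng):
    """One simulated period of observation: (system downtime, system failures)."""
    ups_a = down_intervals(*UPS, horizon_h, rng)
    ups_b = down_intervals(*UPS, horizon_h, rng)
    switchgear = down_intervals(*SWITCHGEAR, horizon_h, rng)
    # The system state can only change at block failure or repair times.
    points = sorted({0.0, horizon_h,
                     *(p for iv in ups_a + ups_b + switchgear for p in iv)})
    downtime, failures, was_down = 0.0, 0, False
    for start, end in zip(points, points[1:]):
        mid = (start + end) / 2.0
        # RBD logic: switchgear in series with the parallel UPS pair.
        down = is_down(switchgear, mid) or (is_down(ups_a, mid) and is_down(ups_b, mid))
        if down:
            downtime += end - start
            if not was_down:
                failures += 1
        was_down = down
    return downtime, failures


def simulate(runs=2000, horizon_h=8760.0, seed=1):
    """Mean system availability and mean number of system failures per run."""
    rng = random.Random(seed)
    results = [simulate_once(horizon_h, rng) for _ in range(runs)]
    availability = 1.0 - sum(d for d, _ in results) / (runs * horizon_h)
    return availability, sum(f for _, f in results) / runs


def analytic_availability():
    """Steady-state cross-check: A = MTBF / (MTBF + MTTR) per block."""
    a_ups = UPS[0] / (UPS[0] + UPS[1])
    a_sw = SWITCHGEAR[0] / (SWITCHGEAR[0] + SWITCHGEAR[1])
    return a_sw * (1.0 - (1.0 - a_ups) ** 2)


if __name__ == "__main__":
    print(simulate(), analytic_availability())
```

An optimization measure would be compared by changing the block parameters or the RBD logic in `simulate_once` and rerunning the simulation with the same number of runs.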
2. Energy efficiency analysis
Specific task in Data Centers:
Efficiency increase and thereby an improvement of PUE and other performance figures
An energy efficiency analysis tailored to data centers provides a structured identification of energy saving potentials based on the DC-specific Bilfinger Best Practice for energy efficiency.
The analysis follows the energy flow in the data center:
Starting at the grid connection via transformers, emergency power systems, UPS and PDU to the server, and from there via the CRAC unit, the piping system, the pumps, the heat exchangers, the coolers, the chillers or other heat sinks to the heat dissipation into the environment
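As a rough illustration of how the energy flow relates to PUE (total facility power divided by IT equipment power), the sketch below follows the electrical chain from the grid connection to the server and adds the cooling power. All efficiencies and loads are hypothetical example values, and the model is deliberately simplified: cooling and other loads are added as plain power figures rather than being routed through the transformer.

```python
def pue(it_load_kw, transformer_eff, ups_eff, pdu_eff, cooling_kw, other_kw=0.0):
    """PUE = total facility power / IT equipment power (simplified model)."""
    # Power drawn at the grid connection to deliver it_load_kw to the servers.
    electrical_input_kw = it_load_kw / (transformer_eff * ups_eff * pdu_eff)
    total_kw = electrical_input_kw + cooling_kw + other_kw
    return total_kw / it_load_kw


# ASSUMPTION: hypothetical as-is plant and one hypothetical measure
# (more efficient UPS plus extended free cooling).
as_is = pue(1000.0, transformer_eff=0.98, ups_eff=0.92, pdu_eff=0.98, cooling_kw=450.0)
with_measure = pue(1000.0, transformer_eff=0.98, ups_eff=0.96, pdu_eff=0.98, cooling_kw=300.0)
print(f"as-is PUE {as_is:.2f}, with measure {with_measure:.2f}")
```

Comparing the two calls shows how a single measure along the energy flow shifts the performance figure; a real analysis would use measured values for each stage.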
Our approach towards a better
energy efficiency in a Data Center
Bilfinger Best Practice for energy efficiency
in a Data Center
UPS-System
Highly efficient systems
Graduation / Shutdown
Use of modular systems
Alternative energy storage
In the server room
Cover plates
Raised floor sealing
Rack orientation
Hot and cold aisle containment
Management of perforated plates
Air cooling units
Retrofit of FC-controlled / EC fans
Increase of temperature difference (supply / return air)
Shutdown of excessively redundant plant
Air flow optimization
Self-actuating flaps
Outside the server room
Optimize the cooling medium temperature
Extension of free cooling
Efficiency increase at partial-load operation
Frequency converter or EC technology for
actuators
Alternative heat sinks
Thermal energy storage
Subsequent use of waste heat
General electrical supply
On site power generation
CHP unit with absorption chillers
3. Combined FMECA / RAM / ENERGY analysis
Identification of energy potentials
ROI calculation for identified measures
Detailed report illustrating the results
System modeling and calculation of current availability and reliability using a Monte Carlo simulation
Identification of existing risks
Identification of critical components / SPOFs
Employee training
The FMECA / RAM / ENERGY analysis considers availability and energy
efficiency to find the optimized solutions for your data center
Methodology
Availability analysis
Energy efficiency analysis
Measures
[Figure: process flow of the combined analysis. FMECA / RAM branch: actual availability, measures, availability with measures in place. Target: availability increase. ENERGY / RAM branch: actual energy consumption, energy measures, energy consumption with energy measures in place, availability with energy measures in place. Target: reduction of energy consumption.]
Modeling of measures which affect availability to quantify their impact on availability and reliability.
Modeling of energy efficiency measures to quantify their impact on availability and reliability.
Modeling and simulation of the most promising measures to find an optimal combination
Added value for all involved parties
Optimized operation of a Data Center
[Figure labels: Data Center Operator, Energy efficiency analysis, FMECA / RAM analysis]
Consideration of technical plant with a view to reliability and energy efficiency
Energy saving potentials additionally checked regarding their influence on availability
Quantification of the energy efficiency of existing technical plant
Optimized parameterization and operating mode of plant
Knowledge transfer in terms of legal requirements
Quantification of reliability and availability
Information about the criticality of all components
Improvement of the maintenance management
4. Spare parts management
An efficient spare parts management guarantees a minimum
total cost of ownership
[Figure: the weak point analysis (identification of critical plant, identification of critical components, excerpt of an RBD) feeds the spare parts analysis. Plot of TCO over number of spare parts with minimum Kopt at nopt. Inputs: setup times, repair times, storage costs, downtime costs, maintenance costs, hazard rate, observation time.]
1 System FMECA to identify critical plant
2 Plant FMECA to identify critical components of the previously identified plant
3 Modeling of the critical plant and parameterization using the FMECA
4 Comparison of different spare parts concepts to achieve a minimum TCO
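The trade-off between storage costs and downtime costs can be illustrated with a small Python sketch that searches for the number of spare parts with the lowest TCO. This is a simplified stand-in for the simulation-based comparison, not the method used in the analysis. It assumes a constant hazard rate (so that failures within the observation time follow a Poisson distribution), no replenishment of the stock during the observation time, and a fixed downtime cost per failure that cannot be covered from stock. All names and cost figures are hypothetical.

```python
import math


def expected_stockouts(n_spares, mean_failures, k_max=100):
    """Expected number of failures that exceed the stock, E[max(K - n, 0)], K ~ Poisson."""
    p = math.exp(-mean_failures)  # P(K = 0)
    total = 0.0
    for k in range(1, k_max + 1):
        p *= mean_failures / k  # P(K = k) from P(K = k - 1)
        if k > n_spares:
            total += (k - n_spares) * p
    return total


def tco(n_spares, mean_failures, storage_cost_per_spare, downtime_cost_per_stockout):
    """Storage cost plus expected downtime cost over the observation time."""
    return (n_spares * storage_cost_per_spare
            + expected_stockouts(n_spares, mean_failures) * downtime_cost_per_stockout)


def optimal_spares(mean_failures, storage_cost_per_spare,
                   downtime_cost_per_stockout, n_max=30):
    """Return (n_opt, K_opt): stock level with minimum TCO and that TCO."""
    costs = [tco(n, mean_failures, storage_cost_per_spare, downtime_cost_per_stockout)
             for n in range(n_max + 1)]
    n_opt = min(range(n_max + 1), key=costs.__getitem__)
    return n_opt, costs[n_opt]


# ASSUMPTION: hypothetical inputs.
hazard_rate_per_h = 5e-6       # constant hazard rate per installed unit
installed_units = 20
observation_time_h = 43_800.0  # five years
mean_failures = hazard_rate_per_h * installed_units * observation_time_h

n_opt, k_opt = optimal_spares(mean_failures,
                              storage_cost_per_spare=2_000.0,
                              downtime_cost_per_stockout=24_000.0)
print(n_opt, round(k_opt))
```

Setup times, repair times and maintenance costs are left out here; in a fuller model they would enter through the downtime cost per stockout and an additional cost term.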
Contact
Bilfinger HSG Facility Management GmbH
Max Altmeyer
An der Gehespitz 50
63263 Neu-Isenburg
Germany