
RELIABILITY CENTERED MAINTENANCE (RCM)

7. Determine what should be done if a suitable avoidance action is not practical. This may be a temporary measure used while a more permanent solution is found and implemented.

The classical RCM analysis process is illustrated schematically in Figure 5.10. The process blends history, risk analysis, and economic considerations with actual condition to estimate the risk and consequences of failure and identify optimal decisions.

Figure 5.10 Classical RCM Analysis Process

Professor Andrew K.S. Jardine, PhD, in various papers and presentations, recommends using Proportional Hazards Modeling, a sophisticated multivariate regression analysis procedure, to optimize the content of a maintenance program. (20)

A NASA version of RCM seeks to develop the optimum synthesis of corrective, preventive, condition based, and proactive maintenance strategies to provide required reliability at least cost.

An RCM strategy being pursued by the US Navy asks the following questions:

• Has a failure occurred? If not, could it occur? (This is the risk assessment step mentioned in an earlier paragraph.)

• Is the avoidance task applicable?

• Is the task worth doing: does it increase safety, reduce the risk of mission failure, and pay for itself?

RCM requires the knowledge and experience of a cross-functional analysis team led by a skilled facilitator or analyst. The facilitated team (team meetings) approach to analysis has been proven to work very well but is not the only way in which a valid result using the RCM methodology can be achieved. A skilled analyst can extract sufficient information from documentation and key subject matter expert interviews and validate the integrated results very effectively. Knowledge of current process performance, state of compliance, equipment history, and variability over time are valuable inputs to the RCM analysis. RCM analysis should also use (if it is available) resident information from a Computerized Maintenance Management System (CMMS), including PM routines and findings, field notes of conditions found and components affected, as well as observations on every piece of equipment to be addressed.(18)

Some of the elements that are needed, or highly desirable if available, to assure a successful RCM outcome include: (124)

• Knowledge of RCM methodology and documentation techniques and tools

• Organizational management support for the entire project

• Availability of the best and brightest cognizant plant personnel for the project

• Trained and disciplined RCM facilitators (or analysts)

• Cross-functional detailed equipment analyses

• Interaction with manufacturing Failure Modes, Effects and Criticality Analyses (FMECAs), if available

• Defined plans prepared (in advance of analysis) to implement recommendations

• Manpower and budgetary support for implementation

• Operator involvement with the project in as many ways as possible

• Periodic review of project progress and resulting benefits

RCM projects often fail because the results of analysis, while perfectly valid, are never implemented. There are many reasons for this, of which the two most important are:

1. Support (budget, manpower, management interest) for implementation not provided or not provided soon enough to assure project continuity

2. No “buy-in” by personnel not involved with the RCM project analysis, but whose support is needed at time of implementation

An RCM project should therefore be viewed in its entirety from the start to give it the best chance of success, as illustrated in Figure 5.11.

Figure 5.11 RCM Project Process Using Classical RCM Analysis and Including Task Comparison, Consolidation into an Optimum Plan, Implementation and Establishment of a Living Program and Establishment of Links to Other Asset Management Elements

Selection of tasks is often a stumbling point in an RCM project, especially if participants are not familiar with modern condition monitoring and predictive maintenance technologies. The tendency in such cases is to select time-directed tasks, even though there is no known or knowable basis for their periodicity. One way to arrive at the optimum maintenance plan for an asset is to follow the logic of the task selection process depicted in Figure 5.12. The logic considers on-condition tasks first, then time-directed (restoration, or discard and replace) tasks. All are evaluated on the basis of cost effectiveness, and the most cost-effective task is favored over all others. More often than not, on-condition tasks will be the most cost effective, since no action other than monitoring is required until the condition directs a repair action.

[Figure 5.12 flowchart. Decision sequence recovered from the figure:]

1. Can you effectively detect symptoms of a gradual loss of function? Is an on-condition task technically feasible and worth doing? YES: Perform the on-condition task at less than the warning interval if most cost effective. NO: go to question 2.

2. Can you repair and restore performance and will this reduce failure rate? Is a scheduled restoration (PM) task technically feasible and worth doing? YES: Perform the scheduled restoration task at less than the age limit if most cost effective. NO: go to question 3.

3. Can you replace the item and will this reduce failure rate? Is a scheduled discard (replacement) task technically feasible and worth doing? YES: Accomplish the scheduled replacement task at intervals less than the age limit if most cost effective. NO: Run-to-failure. Action depends on consequences.

Figure 5.12 Task Selection Logic to Arrive at the Optimum Plan for Maintenance

If no applicable and cost effective task can be found, the only immediate option may be run-to-failure. If the failure mode has safety or severe economic consequences, a re-design may be required to eliminate or mitigate the consequences of failure. Here, too, on-condition options should be considered. Often, a low cost modification to a component or system providing for installation of some monitoring instrumentation can give early indication of the onset of degradation well in advance of a condition that constitutes functional failure. This makes the failure mode manageable.
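The task selection logic of Figure 5.12 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the class `CandidateTask`, the function `select_task`, the use of an annualized cost as the measure of cost effectiveness, and the example figures are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateTask:
    """A candidate maintenance task for one failure mode (illustrative fields)."""
    name: str
    feasible: bool       # technically feasible for this failure mode
    worth_doing: bool    # reduces failure rate or risk enough to justify it
    annual_cost: float   # assumed annualized cost of doing the task


def select_task(candidates: List[CandidateTask], severe_consequences: bool) -> str:
    """Return the name of the favored task, or the fallback when none qualifies.

    `candidates` should be listed in the order the logic considers them:
    on-condition, scheduled restoration, scheduled discard.
    """
    viable = [t for t in candidates if t.feasible and t.worth_doing]
    if viable:
        # min() keeps the earliest entry on a tie, so on-condition wins ties.
        return min(viable, key=lambda t: t.annual_cost).name
    # No applicable, cost-effective task: action depends on consequences.
    return "redesign" if severe_consequences else "run-to-failure"


# Hypothetical example values, for illustration only.
options = [
    CandidateTask("on-condition", True, True, 1200.0),
    CandidateTask("scheduled restoration", True, True, 3000.0),
    CandidateTask("scheduled discard", False, True, 500.0),
]
print(select_task(options, severe_consequences=False))  # on-condition
```

The discard task is the cheapest in this example but is excluded because it is not technically feasible, which mirrors the rule that a task must be both applicable and worth doing before cost is compared.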

The rationale for adopting this approach is that the less intrusive a maintenance program is, the more reliably an asset will perform. Maintenance that is not done correctly often creates the basis for failure. As described in the following paragraphs, early study groups concluded that “infant failure” (failure shortly after manufacture, maintenance, or installation) was the dominant failure pattern in commercial aircraft. Observers in other venues have reached the same conclusion, albeit without statistical proof comparable to that from commercial aircraft fleets.

RCM Principles Applied to the Selection of Condition Measurement Technology and Condition Based Maintenance

A paper presented to the American Society of Naval Engineers (ASNE) described how the U.S. Navy applies RCM to the selection of appropriate tasks and enabling technologies. (114) Three of the major points from this paper are summarized as follows:

1. RCM’s rules are based on a realistic analysis of the failure process. The process of equipment deterioration is one element that is crucial to the appropriate selection of condition monitoring methods.

2. The effects of a failure are not always important enough to justify preventive action. For example, some systems are designed with redundant units, so that system functions will not fail when a single unit fails. If the failed unit can be identified quickly, repaired promptly and cheaply, devoting resources to prevent every failure may not be worthwhile.

3. When the effects of the failure are important enough to justify preventive efforts, the challenge is to predict failures with sufficient accuracy and precision to support scheduling appropriate repairs before the failure occurs. There are several ways to make such a prediction. One way is to base the decision on the item’s age. Before the days of RCM and CBM, this was often the only method used. However, age-based repair work has rightly lost favor as RCM and CBM have gained popularity, primarily because it tends to be appropriate (in RCM terms, “applicable”) for less than 25 percent of the failure modes that are encountered. While it should not be dismissed entirely, it cannot be the only tool in the toolkit of the maintenance program designer.

Figure 5.13, taken from the original 1978 report on RCM by United Airlines for the U.S. Department of Defense and enhanced, illustrates six common failure modes or conditional probability of failure profiles. Note that of the six only three, with rising failure probability at end of life, can be mitigated with active intervention. The total of these profiles having any “wearout” is 11 percent.

[Figure 5.13 plots six conditional probability of failure curves (vertical axis: Failures; horizontal axis: Time), with an intervention point marked. Percentage of individual equipment following each curve, from Nowlan & Heap:]

Intervention may improve reliability:
Bathtub: 4%
Increasing failure rate end of life: 2%
Steadily increasing failure rate: 5%

Intrusive intervention does not improve reliability:
Low early failure rate, constant after: 7%
Constant failure rate throughout life: 14%
High infant mortality, constant after: 68%

Figure 5.13 Six Common Failure Profiles and Percentage of Occurrence in Pre-Wide Body Commercial Aircraft
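The 11 percent figure quoted for profiles with any “wearout” can be checked directly from the percentages printed in Figure 5.13. The short Python sketch below does that arithmetic; the dictionary name `PROFILES` and the true/false wearout flag are conveniences introduced for this illustration only.

```python
# (share of equipment in percent, shows wearout?) as printed in Figure 5.13
PROFILES = {
    "Bathtub": (4, True),
    "Increasing failure rate end of life": (2, True),
    "Steadily increasing failure rate": (5, True),
    "Low early failure rate, constant after": (7, False),
    "Constant failure rate throughout life": (14, False),
    "High infant mortality, constant after": (68, False),
}

wearout_share = sum(pct for pct, wearout in PROFILES.values() if wearout)
no_wearout_share = sum(pct for pct, wearout in PROFILES.values() if not wearout)

print(f"Profiles with wearout: {wearout_share}%")        # 11%
print(f"Profiles without wearout: {no_wearout_share}%")  # 89%
```

The remaining 89 percent of equipment in that data set followed profiles for which an age-based intervention has no rising failure probability to act on.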

Three other statistically significant studies done using the same methodology over the past 40 years have reached virtually the same conclusion.³ Profiles containing a “wearout” characteristic totaled no more than 20 percent in any of these four comparable studies. Profiles exhibited a constant or slightly increasing conditional probability of failure characteristic in 80 to 90 percent of cases. What this means is that, where there is no “wearout,” time-directed tasks cannot be nearly as cost-effective or as applicable as condition monitoring tasks and condition-directed repairs. While you may not have such extensive failure profile data for the particular asset for which you are responsible, any RCM practitioner with broad exposure in many different venues can state with confidence that non-intrusive condition monitoring and on-condition tasking is the way to go wherever feasible and most cost effective. When constructing a Physical Asset Optimization program, these failure profile realities must be taken into consideration.

³ The four studies from which failure profiles and statistics are taken are: “UAL Study” - DOD Report on Reliability-Centered Maintenance by Nowlan & Heap of United Airlines, dated December 29, 1978, which used data from the 1960’s and 1970’s and earlier papers and studies referenced therein; the “Broberg Study” believed done under NASA sponsorship (reported in 1973) and cited in Failure Diagnosis & Performance Monitoring Vol. 11 edited by L.F. Pau, published by Marcel-Dekker, 1981; the “MSP Study” - long title “Age Reliability Analysis Prototype Study” - done by American Management Systems under contract to U.S. Naval Sea Systems Command Surface Warship Directorate, reported in 1993 but using 1980’s data from the Maintenance System (Development) Program; and the “SUBMEPP Study” reported in 2001, using data largely from the 1990’s, and summarized in a paper dated 2001, entitled “U.S. Navy Analysis of Submarine Maintenance Data and the Development of Age and Reliability Profiles” by Tim Allen, Reliability Analyst Leader at Submarine Maintenance Engineering, Planning and Procurement (SUBMEPP), a field activity of the Naval Sea Systems Command at Portsmouth NH.

Based on an appreciation of the failure process, RCM presents decision criteria for evaluating maintenance strategies: the presence of a dominant failure mode, task applicability, and task effectiveness.

Dominant Failure Mode

Maintenance focuses on dominant failure modes — failure modes that are specific and reasonably likely to occur. The likelihood will vary to some extent with the severity of the failure mode’s effects. For example, if the failure mode is potentially lethal but highly improbable — a typical finding due to emphasis on safe design — it may be less “dominant” compared to a less severe failure mode with a higher probability of occurrence.

For equipment that is already in service, the best way to find out whether a failure mode is reasonably likely to occur is to ask the people who operate and maintain the equipment. They are the people most likely to be aware of potential problems, such as failures that go unreported and failures that have been reported but cannot be found in history databases.

Failure histories only look backwards. They do not reveal “failures waiting to happen”. These include potential failures that either have been prevented by luck or haven’t yet occurred because the conditions necessary for the failure haven’t aligned. In either case, potential failures must be identified and the risk analyzed, because most will eventually take place when the “right” combination of circumstances occurs.

In order for a Condition Based Maintenance or monitoring technology to meet the goals of RCM, it must provide affirmative answers to the following questions:

• Does the technology monitor for condition leading to a specific failure mode?

• If so, what is the failure mode?

• Is it reasonable to expect that this failure mode will occur during the lifetime of the equipment?

If the technology is monitoring a parameter that cannot be correlated to a specific failure mode on that equipment, no one will know what failure the technology is intended to prevent. If the failure mode is not likely to occur, there is no need to acquire or apply the technology to prevent the failure.

Applicability

RCM has rules for “applicability” for each type of maintenance task: time directed, condition monitoring and condition directed, and failure finding. First, any task must be technically feasible, enabling a person to find, mitigate or prevent an actual failure or a degrading condition leading to failure. For time-directed tasks to be applicable (i.e., for them to work), the interval between failures (by calendar or operating hours) must be known with reasonable accuracy.

Quantifiable condition measurement parameters may offer an advantage if they can be measured with sufficient consistency (including both the inherent errors of the measurement technique and also the errors that may be introduced by the person or automated tool performing the technique). In practice, a parameter is not always quantifiable.

Condition monitoring technologies must meet the following applicability requirements:

• The measured parameters must correlate to deterioration and related failure modes previously identified.

• The parameters must be measurable. The measurements must be repeatable and sufficiently stable over time to serve as reliable triggers for corrective action. In addition, measurements must be sufficiently consistent over every unit in a specific population to assure that a given measurement more or less represents equivalent condition and severity of a problem every time.

• There must be sufficient time between the discovery of a potential failure and the onset of actual functional failure to take appropriate corrective (condition-based) action.

• If the shortest practical interval between periodic measurements is longer than the minimum time between warning and functional failure for a critical (e.g., safety-related) failure mode, on-line, continuous monitoring may be the best solution.
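The timing requirement can be illustrated with a small check. Periodic readings only help if at least one of them falls between the first detectable warning and functional failure; when the shortest practical measurement interval cannot guarantee that, continuous monitoring is the fallback. The Python sketch below is a hedged illustration: the function name `monitoring_mode`, its parameters, and the default of two readings inside the warning period (a common rule of thumb, not a requirement stated in this text) are assumptions.

```python
def monitoring_mode(warning_time_days: float,
                    shortest_practical_interval_days: float,
                    readings_in_warning_period: int = 2) -> str:
    """Suggest periodic or continuous monitoring for one failure mode.

    warning_time_days: minimum time between detectable warning and
        functional failure.
    shortest_practical_interval_days: how often a periodic measurement
        can realistically be taken.
    readings_in_warning_period: assumed number of readings wanted inside
        the warning period (rule-of-thumb default of 2).
    """
    if warning_time_days <= 0 or shortest_practical_interval_days <= 0:
        raise ValueError("times must be positive")
    if readings_in_warning_period < 1:
        raise ValueError("at least one reading is required")
    required_interval = warning_time_days / readings_in_warning_period
    if shortest_practical_interval_days <= required_interval:
        return "periodic"
    return "continuous"


print(monitoring_mode(30, 7))  # periodic: weekly readings fit a 30-day warning
print(monitoring_mode(2, 7))   # continuous: a 2-day warning is too short
```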

These points are related: condition measurements must be accurate, consistent and available at the correct intervals (or continuously) for a condition monitoring technology to be used reliably to predict failures.

Effectiveness

RCM’s rules for “effectiveness” are based on the consequences of the failure that the task is intended to address:

• For critical failures, the task must reduce the risk of failure to a tolerable level.

• For all other failures, the task must be cost-effective (e.g., cost in lost production to find a hidden failure).

• If mission or economics are involved, the investment required to execute the task (e.g., capital and operating cost for a condition monitoring technology, including manhours) should be less than the resources required to repair the failure after it has occurred.
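The effectiveness rules above reduce to two different tests, one based on risk and one based on cost. A minimal Python sketch follows; the function name `task_is_effective`, the expression of risk as an annual probability, and the use of annualized costs are assumptions made for illustration, and the numbers in the example are hypothetical.

```python
def task_is_effective(*, critical: bool,
                      residual_risk: float = 0.0,
                      tolerable_risk: float = 0.0,
                      annual_task_cost: float = 0.0,
                      annual_failure_cost_avoided: float = 0.0) -> bool:
    """Apply the effectiveness test that matches the failure consequence.

    Critical failures: the task must bring risk down to a tolerable level;
    cost is not the deciding factor.
    All other failures: the task must cost less than the failures it avoids.
    """
    if critical:
        return residual_risk <= tolerable_risk
    return annual_task_cost < annual_failure_cost_avoided


# Hypothetical critical failure mode: risk with the task in place is 1e-6
# per year against a tolerable level of 1e-5 per year.
print(task_is_effective(critical=True, residual_risk=1e-6, tolerable_risk=1e-5))

# Hypothetical economic failure mode: monitoring costs 8,000 per year and
# avoids an expected 20,000 per year in repair and lost production.
print(task_is_effective(critical=False, annual_task_cost=8_000,
                        annual_failure_cost_avoided=20_000))
```

Keeping the two tests separate reflects the point that a task addressing a critical failure is judged on risk reduction, not on whether it pays for itself.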

When failure consequences affect the mission of an asset, the alternative to technology may be redundancy (to cover the mission requirements of the unit that suffers unplanned failures). Here, the tradeoff is one investment (technology) vs. another (capital equipment). There is a third alternative — increased spare parts (or a whole spare unit). In any case, the decision between technology and capital equipment or spares must be based on risk, financial considerations and the confidence in the strategy chosen to avert functional failures and their consequences.

Streamlining RCM

Virtually everyone with RCM experience agrees that the Classical, or more rigorous, RCM approach is the best way to determine what maintenance must be performed, and that its results exceed those of any earlier approach to maintenance program formulation. However, after many years of extensive experience applying it to a large number of assets, many RCM practitioners and users (as well as some potential users) have concluded that in too many cases Classical RCM is too expensive and time consuming to justify the results gained for the majority of assets. By 2006, as this is written, this conclusion has become the general industry consensus: Classical RCM is too resource and time intensive to be applied cost effectively across a broad range of industrial equipment. It is justified economically when directed to the most critical, high-risk equipment and systems in many fixed or mobile assets.

To address this issue in commercial utility application, “Streamlined RCM,” illustrated schematically in Figure 5.14 (31), was developed in the late 1980’s and early 1990’s under Electric Power Research Institute (EPRI) sponsorship. The approach was to apply “templates” developed from nuclear powered generating plants to common systems in fossil plants. The EPRI contractor (Erin Engineering & Research, Inc., of Walnut Creek, California, now an SKF Group Company) later developed a broader set of templates for application beyond electric power generating utilities. Their version is called SRCM™.

EPRI’s Streamlined RCM differs from Classical RCM in three principal areas:

1. The RCM process is preceded by a risk ranking to assure resources are applied most effectively to equipment and systems with greatest opportunities for improvement.

2. General templates are utilized to make the most of broad knowledge regarding failure modes and maintenance actions.

3. Analysis results are compared with existing maintenance tasks to arrive at an optimum strategy. Existing maintenance tasks that do not clearly address a failure mode should be abandoned if no other rationale for their application can be established.

Templates or maintenance standards for specified components (motors, pumps, circuit breakers, transformers of specific design, manufacturer, application, capacity, etc.) include:

• Typical operating conditions

• Common functions

• Typical functional failures

• Alternative functional failure mitigation strategies, tasks and other items (e.g., redesign, changed

Figure 5.14 The Streamlined RCM Process and its Relationship to Classical RCM (31)

Risk ranking is designed to identify systems and assets that have greatest risk — threat to operational requirements and cost objectives — and hence opportunity for improvement. Applying Streamlined RCM to highest risk systems and assets in a sequential order assures that the time and resources available for RCM gain greatest value in terms of both availability and cost.
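A risk ranking of this kind can be sketched as a simple sort. The Python example below assumes, purely for illustration, that risk is scored as expected annual cost (failures per year multiplied by cost per failure); the text does not prescribe a scoring formula, and the system names and figures are hypothetical.

```python
from typing import List, NamedTuple


class SystemRisk(NamedTuple):
    name: str
    failures_per_year: float   # assumed likelihood measure
    cost_per_failure: float    # assumed consequence: repair plus lost production

    @property
    def annual_risk(self) -> float:
        return self.failures_per_year * self.cost_per_failure


def rank_by_risk(systems: List[SystemRisk]) -> List[SystemRisk]:
    """Order systems from highest to lowest assumed annual risk."""
    return sorted(systems, key=lambda s: s.annual_risk, reverse=True)


# Hypothetical plant systems, for illustration only.
systems = [
    SystemRisk("cooling water pump", 0.5, 40_000),
    SystemRisk("boiler feed pump", 2.0, 150_000),
    SystemRisk("instrument air compressor", 1.0, 60_000),
]

for s in rank_by_risk(systems):
    print(f"{s.name}: {s.annual_risk:,.0f} per year")
```

Working down such a list in order is one way to direct limited analysis time to the systems where Streamlined RCM has the most to offer.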

Stated another way, every facility has systems and assets that are behaving well and seldom, if ever, experience problems. Whether it is design, installation, the operating context, to use a term from RCM, or the current maintenance program, these systems and assets don’t need immediate attention. Scarce
