Can RCM (Reliability Centered Maintenance) and Streamlined RCM peacefully co-exist?

Extracted from Chapter 14 of Reliability-centered Knowledge
By Murray Wiseman
Optimal Maintenance Decisions (OMDEC) Inc.
www.omdec.com

Introduction

Religious or political zealots confront one another, often, not on the basis of the mores of their respective doctrines, but rather from superficial differences in the details surrounding each other’s cultural reference points. Mathematicians take pride in their ability to adopt a new set of definitions and symbols as effortlessly as they would don a fresh suit of clothes. Thus they proceed, unfettered by prior points of view, to build new theorems upon old. The world of maintenance has, not dissimilarly, spawned a multitude of cultures and languages for formulating solutions to real problems.

In the preceding chapters we conducted RCM on several diverse item types. We systematically answered each of the seven RCM questions about the item, in the order stipulated by the SAE JA1011 standard: 1) functions?, 2) failures?, 3) failure modes?, 4) failure effects?, 5) consequences?, 6) scheduled tasks?, and 7) default tasks? We entered the answers in an electronic worksheet (for example, an MS Excel spreadsheet or a database form) formatted as the RCM Worksheet illustrated in Figure 11-2 on page 138.

This chapter explores one of several streamlined RCM software programs. We begin with an examination of what is meant by “streamlining”. We illustrate the streamlined approach by describing a popular representative RCM software package called RCM Turbo[1]. We set up a cross-reference “dictionary” of terms describing similar sounding but, sometimes, differently applied concepts in the two “languages”. Finally, we summarize the relative advantages and potential drawbacks of the “streamlined” and the original RCM processes. Through this process, we discover how the juxtaposition of two approaches may enlighten the proponents of both.

 

Why streamline RCM?

Chapter 11 (page 137) cited the SAE Standard “Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes” that defines RCM as:

“… a specific process used to identify the policies which must be implemented to manage the failure modes which could cause the functional failure of any physical asset in a given operating context.”

It goes on to define the process by adding:

“…Any RCM process shall ensure that all the following seven questions are answered satisfactorily and are answered in the sequence shown as follows:
a. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
b. In what ways can it fail to fulfill its functions (functional failures)?
c. What causes each functional failure (failure modes)?
d. What happens when each failure occurs (failure effects)?
e. In what way does each failure matter (failure consequences)?
f. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
g. What should be done if a suitable proactive task cannot be found (default actions)?”
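The sequence requirement quoted above can be pictured as a simple ordered record. What follows is a minimal sketch only, not part of the SAE standard or of any RCM software: the short keys and the function names `first_unanswered` and `answered_out_of_sequence` are hypothetical labels chosen for illustration. The sketch lists the seven questions in the stipulated order and reports where a worksheet entry is incomplete or has been filled in ahead of an earlier question.

```python
from typing import Dict, List, Optional

# The seven RCM questions, in the order stipulated by the standard.
# The short keys are illustrative labels (an assumption), not standard terms.
RCM_QUESTIONS = [
    ("functions", "What are the functions and desired standards of performance?"),
    ("functional_failures", "In what ways can it fail to fulfill its functions?"),
    ("failure_modes", "What causes each functional failure?"),
    ("failure_effects", "What happens when each failure occurs?"),
    ("failure_consequences", "In what way does each failure matter?"),
    ("proactive_tasks", "What should be done to predict or prevent each failure?"),
    ("default_actions", "What should be done if no suitable proactive task is found?"),
]


def _is_answered(answers: Dict[str, str], key: str) -> bool:
    """An answer counts only if it contains non-blank text."""
    return bool(answers.get(key, "").strip())


def first_unanswered(answers: Dict[str, str]) -> Optional[str]:
    """Return the key of the first question, in sequence, that has no answer."""
    for key, _question in RCM_QUESTIONS:
        if not _is_answered(answers, key):
            return key
    return None


def answered_out_of_sequence(answers: Dict[str, str]) -> List[str]:
    """Return keys that are answered although an earlier question is not."""
    gap_seen = False
    out_of_sequence = []
    for key, _question in RCM_QUESTIONS:
        if not _is_answered(answers, key):
            gap_seen = True
        elif gap_seen:
            out_of_sequence.append(key)
    return out_of_sequence


if __name__ == "__main__":
    partial = {
        "functions": "To feed material 24 hours/day",
        "functional_failures": "Does not feed at all",
        "failure_effects": "Total stoppage",  # entered before the failure mode
    }
    print(first_unanswered(partial))
    print(answered_out_of_sequence(partial))
```

A check of this kind flags only the order in which a record was filled in. It says nothing about whether the answers themselves are satisfactory, which remains a matter for the team's judgment.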

Were we to consider the process (of answering the 7 RCM questions in the sequence stipulated) unacceptably resource intensive, then, understandably, we would seek to replace it with one that consumes less time and fewer resources but provides a no less responsible (sufficiently rigorous) analysis. We emphasize that the SAE JA1011 standard stipulates a minimal set of criteria for a process to be called “RCM”. Therefore, it is to be expected that most commercially packaged RCM software systems and methodologies will add a considerable number of features that enhance and facilitate the experience.

The original[2] as well as the various streamlined RCM methods all demand that the assembled team of analysts (operational, process, and maintenance specialists) possess, collectively, the knowledge necessary to make informed decisions regarding the maintenance characteristics of the item under scrutiny. The process chosen (either original or streamlined) must, therefore, encourage the maximum contribution by each participant so that RCM decisions will carry the force of all knowledge and experience available on the team. The success of any “RCM” methodology, therefore, depends heavily on its ability to gain true consensus throughout every stage of the analysis. The group, guided by a well-trained facilitator, exercises its best judgment when visualizing the typical worst case scenario (TWCS) surrounding each functional failure analyzed.

With these objectives in mind, we compare the two processes, by presenting a comparative lexicon of some of their respective terms of reference.

 

Table 14-1 RCM/RCM Turbo dictionary

(In the entries below, each RCM term or remark is followed by its RCM Turbo counterpart.)

Item: a collection of parts, or systems that is convenient to analyze as a group. It has been selected at a high enough level of indenture that its failure may easily be related to that of the equipment as a whole, but at a low enough level so that the analysis is of manageable size (i.e. having a manageable number of failure modes).

Maintainable item (MI): same meaning

No equivalent terminology is specified by the RCM minimum criteria standard. (Any convenient or existing equipment hierarchy naming system may be used.) Operating context is often recorded in a flexible text structure at the top of the RCM worksheet.

Productive unit (PU): A system that includes several maintainable items. It is a convenient place to record the operating context of the MI. A productive unit belongs to a “Major Unit”, and a “Plant” is the highest level in the RCM Turbo hierarchy.

Worksheet: A document (conveniently an electronic spreadsheet or simple database application) onto which the answers to the 7 RCM questions are recorded during the RCM team session.


The RCM Turbo software product is not meant to be populated during the sessions, but afterwards by the facilitator or other person trained in the use of the software. An MS Excel form (Figure 14-2) is provided for use during the sessions.

The RCM minimum criteria standard does not specify a criticality or priority scale with which to schedule the order of items to be analyzed. Nowlan and Heap developed a simple priority system for the aviation industry that has only two criticality ratings: 1) significant item[3], and 2) non-significant item. This classification system has proved useful in a variety of other industries. For structurally significant items (SSI), Nowlan and Heap apply a further rating of one to four in each of five categories: 1) Residual strength after failure, 2) Fatigue life, 3) Crack growth, 4) Corrosion, and 5) Accidental damage. The lowest rating among the applicable categories determines task frequency. There are two kinds of SSI: 1) Damage-tolerant and 2) Safe-life. Categories 1 to 5 apply to damage-tolerant items, but only categories 4 and 5 apply to safe-life items. (See Example 4 of Chapter 13 on page 178.)


Criticality/Priority: values used to set priorities for PUs and MIs. It is derived by question and answer sessions driven by the program. (Criticality calculations in no way detract from RCM. They merely add another dimension to the analysis.)

Failure: Describes the way in which a specified function no longer performs as required. It distinguishes (for example) “full” from “partial” failure of a function. The RCM Worksheet enforces a one-to-many integrity constraint between Function and Failure.

Failure: same basic definition. However, RCM Turbo does not enforce a one-to-many (software) relationship between Function and Failure.

Failure Mode: A reasonably likely cause of a specified failure. Consists of a noun, a verb (active or passive form) and a phrase such as “due to …”. For example “bolt cracks due to stress corrosion fatigue”. The number of failure modes to list and their “depth of causality” depend on operating context. RCM enforces a one-to-many integrity constraint between failure and failure mode. RCM Turbo does not.

Failure Mode: A superset of the RCM definition. Structured in 3 parts as follows: 1) a component reference, 2) a “Failure Mode & Effect” field (a single field that includes both RCM concepts, Failure Mode and Failure Effects), and 3) a “Root cause” reference. An example of an RCM Turbo failure mode is: “Bearings” + “wear between rolling elements and racers leading to increased vibration levels, localized heating and eventual seizure and total stoppage of process due to” + “normal wear and tear”.

Failure Mode: In RCM, the terms “Root Cause”, “Failure Mode”, “Failure Mechanism”, “Failure Reason”, etc are synonymous and represented by the term “Failure Mode”. It is an “event” in the causality chain that leads to the failed state. The “link” in the causality chain selected as the “Failure Mode” is the one that the organization can manage effectively and practically by whichever means (proactive, detective, or redesign).

Root cause: related to Failure Mode. Same definition. That is, “Root Cause” in RCM Turbo is equivalent to “Failure Mode” in RCM.

Failure Effects: Text answering the following:

• what sequence of events (considering a TWCS[4] in the component, in the system, organization wide, and in the external world) could be touched off by the failure mode?

• how does the failure make itself known? What observable events lead up to the failure?

• how is safety or the environment impacted? (without mentioning the words "safety" or "environment")

• how is production impacted? (quality, cost, customer service)

• is there any additional damage caused by the failure?

• how long will it take and what actions must be accomplished to correct the failure?

• How does the likelihood of this failure depend on deeper causes? Has it happened before? How often? Under what circumstances?

 

Same definition, but it is structurally embedded in the “Failure Mode & Effect” field. In addition, the following “Failure Mode” fields (with sample data) contribute to the “Effects” narrative:

Unit Output Reduction: Total stoppage,

PU Downtime Cost: $11,390 / hour,

MI Downtime Cost: $11,390 / hour

F/mode&Effects: Shaft failure-Chemical corrosion, overtorque, indicated by cracks, increase in vibration leading to shutdown of Brownstock washer

Characteristic: Definitive life / wear out characteristics

Measurability: Moderately easy to monitor

Category: Normal Operation

Typical Warn Time: 4 Weeks

Root cause: Normal wear & tear

MTBF: 5 years

Consequence: Total stoppage

Strategy: CBM

Hidden Function: A Function whose failure will not be detected under normal circumstances. Identified by RCM during functional analysis when examining each component (from schematics, P&IDs, photographs, and physical walkaround) and listing the functions they suggest. Code phrases (such as “able to”, “in the presence of”, etc.) are used to point out that a function is hidden or protected by a hidden function. Subsequent questions address the hidden function. The “hidden” consequence supplants the other (three) failure consequences in the RCM logic for determining a mitigating task.

Hidden Failure Mode: Same meaning as RCM’s “hidden function”. It is structured in the fields: Component, Failure Mode & Effects, Task Description, Frequency, Duration, Initiate Date, Job Group ID, Service Period, No. of Units in Service, No. of failures, and MTBF of the protective device (calculated).

RCM records this information in the free text answer to question 4, “Failure Effects”. However the JA1011 standard does not specify an explicit data field or structure for MTBF.

MTBF: related to the Failure Mode.

RCM records this information in the answers to questions 6 and 7, “Tasks”, when following one of the four branches (H, S, O, N) in the RCM decision logic tree.

Strategy: related to Failure Mode. Takes one of three possible values: 1) fixed time maintenance, 2) condition based maintenance, or 3) operate to failure

Same definition. RCM records this information in the free text answer to question 4, “Failure effects”.

P-F Interval: related to Failure Mode. Estimated interval (measured in working age units) between the appearance of a potential failure and a functional failure.

Potential failure: An indicator that a failure mode has initiated.

S/A (secondary action) Indicator: same meaning as “Potential failure” in RCM.

No equivalent concept in RCM. If a failure mode is due to design, lubrication, overload, or maintenance practices, they would each constitute a separate failure mode, and this information would be included in the failure mode description itself. The word “Safety” or “Environment” is not mentioned until the consequence phase of the RCM logic diagram.

Category: related to Failure Mode. Takes one of six possible values: 1) Design, 2) Lubrication, 3) Normal Operation, 4) Overload Condition, 5) Maintenance practices, or 6) Safety

RCM records this information in the free text answer to question 4, “Failure effects”. However no explicit data structure is specified by the JA1011 standard.

Characteristic: related to Failure Mode. Takes one of three possible values: 1) Definitive life/wearout, 2) General degradation, and 3) Random

Consequences: Question 5. Takes one of four possible values: 1) Hidden, 2) Safety/Environmental, 3) Operational, and 4) Non-operational.

RCM records RCM Turbo’s “Consequence” in the free text answer to question 4 “Failure effects”.

Consequence: related to Failure Mode. Takes one of four possible values: 1) Total stoppage, 2) Partial stoppage/quality, 3) No immediate effect, or 4) No effect. This information

RCM records this information both in the free text answer to Question 4 “Failure effects” and in the answer to Question 6 “Tasks”. Q6 asks whether there is an applicable CBM task. Once a (CBM or other) task is found to be applicable (practical) RCM then asks whether it will be effective. That is, will it sufficiently reduce or entirely avoid the consequences of failure at acceptable cost?

Measurability: related to Failure Mode. Takes one of three possible values: 1) Easy, 2) Moderate, or 3) Impossible

Redesign: RCM records this information in the free text answer to question 7, “Default Tasks”. Differs from RCM Turbo only in the sequence in which this question appears (i.e. following a determination that no proactive or failure finding task adequately mitigates the consequences of the failure).

Design Notes: related to the Failure Mode. Records the decision or recommendation to “design-out” the failure mode. (Strictly speaking, it is presented out of “RCM sequence”.)

RCM provides no specific field for this information, leaving its provision up to the implementer or commercial packager.

Strategy Notes: related to Failure Mode. A free text field used to store comments or notes on the chosen maintenance strategy. Useful where a second or alternative strategy has been considered and rejected.

RCM records this information in the free text answer to question 4, “Failure Effects”, but without an explicitly specified structure.

Breakdown Action: related to Failure Mode. Describes what must be done to repair the functional failure. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six steps), Assumptions, Materials, Spares.

RCM develops this information in the decision algorithm of question 5 (Is there an on-condition maintenance task that is both applicable and effective?) The RCM standard does not elaborate an explicitly specified structure for recording this information.

Primary Action: Related to the Failure mode. Describes what should be done to prevent the failure mode. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six steps), Assumptions, Materials, Spares.

RCM records this information in the free text answer to question 6, “Tasks”. The RCM standard does not elaborate an explicitly specified structure for recording this information.

Secondary Action: related to Failure Mode. Describes what must be done following the detection of a potential failure. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six steps), Assumptions, Materials, Spares.

RCM records this information in the free text answer to question 4, “Failure Effects”. The RCM standard does not elaborate an explicitly specified structure for recording this information.

Overhaul Action: related to Failure Mode. Records Overhaul Maintenance actions. For example, where the Secondary Action was the change-out of a rotable item which itself requires subsequent overhaul. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, O/H Venue, S/A Initiator, Resources (for up to six steps), Assumptions, Materials, Spares.

Not called a “library”. However, the records are accessible (structured as answers to the seven questions) in the RCM worksheets comprising the global RCM table. No corporate harmonizing process need be applied because every record is a “one-off” development. However, tools, training, supervision and support are required to validate, maintain, and update the knowledge base with day-to-day experience. “Templating” of an entire item is, nonetheless, possible by copying any or all records of an item after carefully comparing their respective operating context descriptions.

Failure Data Library: a table of “3 part” failure modes referenced by Machine Type. An administration process is used to control the quality of data from multiple sites and harmonize it for the purpose of providing “templates” where applicable in future analyses of other MIs or PUs. The focus on “templating” justifies the appellation “Streamlined” in the case of RCM Turbo.
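The one-to-many constraint mentioned in the Failure and Failure Mode entries of the dictionary can be illustrated with a short check. This is a sketch under stated assumptions, not a description of how either an RCM worksheet or RCM Turbo is implemented: each worksheet row is reduced to a hypothetical triple of text labels (function, failure, failure mode), and the function name `one_to_many_violations` is invented for the example. A failure that appears under two functions, or a failure mode that appears under two failures, is reported as a violation.

```python
from typing import Dict, List, Tuple

# One worksheet row, reduced to three text labels (an illustrative assumption):
# (function, failure, failure mode)
Row = Tuple[str, str, str]


def one_to_many_violations(rows: List[Row]) -> List[str]:
    """List every failure or failure mode that has more than one parent.

    One-to-many means a function may own many failures and a failure may own
    many failure modes, but each child belongs to exactly one parent.
    """
    function_of_failure: Dict[str, str] = {}
    failure_of_mode: Dict[str, str] = {}
    violations: List[str] = []
    for function, failure, mode in rows:
        # setdefault records the first parent seen and returns it afterwards.
        if function_of_failure.setdefault(failure, function) != function:
            violations.append(f"failure '{failure}' appears under more than one function")
        if failure_of_mode.setdefault(mode, failure) != failure:
            violations.append(f"failure mode '{mode}' appears under more than one failure")
    return violations


if __name__ == "__main__":
    rows = [
        ("To feed material 24 hours/day", "Does not feed at all", "Shaft fails"),
        # Hypothetical second row that reuses the same failure under another function.
        ("Hypothetical second function", "Does not feed at all", "Hypothetical other mode"),
    ]
    for message in one_to_many_violations(rows):
        print(message)
```

A tool that enforces the constraint would reject the second row at entry; a tool that does not would accept it and leave the check, if wanted, to the analyst.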

 

We may conclude from Table 14-1 that, although RCM Turbo refers to itself as a streamlined process and some of its terminology differs from that of RCM, it does not omit any vital knowledge element specified by the SAE RCM minimum criteria standard. RCM Turbo does deviate from the sequence stipulated in the standard. As pointed out in Chapter 11 (page 137), however, RCM in practice is not a strictly sequential process. RCM analysts anticipate the answers to subsequent questions while working the current question. Furthermore, the RCM process is iterative. That is, the analysts often return to a previous answer and adjust it in the light of revelations further on in the process. The iterative and non-sequential nature of the RCM process tends to render less important the differences between the two approaches.

 

The terminology comparisons of Table 14-1 show that RCM Turbo expands the information elements of RCM into greater structural detail. Such data structuring facilitates the post-RCM processes (included in the RCM Turbo software package) of workload smoothing, frequency calculations, and CMMS integration as well as integration with a spares optimization (optional) package.
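To make the idea of “greater structural detail” concrete, the enumerated fields listed in the dictionary can be written down as typed values. The sketch below is an assumption-laden illustration and not the actual RCM Turbo schema: the class and attribute names are hypothetical, while the permitted values and the sample data are those quoted in the dictionary entries.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    FIXED_TIME = "Fixed time maintenance"
    CONDITION_BASED = "Condition based maintenance"
    OPERATE_TO_FAILURE = "Operate to failure"


class Characteristic(Enum):
    DEFINITIVE_LIFE = "Definitive life/wearout"
    GENERAL_DEGRADATION = "General degradation"
    RANDOM = "Random"


class Category(Enum):
    DESIGN = "Design"
    LUBRICATION = "Lubrication"
    NORMAL_OPERATION = "Normal Operation"
    OVERLOAD_CONDITION = "Overload Condition"
    MAINTENANCE_PRACTICES = "Maintenance practices"
    SAFETY = "Safety"


class Measurability(Enum):
    EASY = "Easy"
    MODERATE = "Moderate"
    IMPOSSIBLE = "Impossible"


class TurboConsequence(Enum):
    TOTAL_STOPPAGE = "Total stoppage"
    PARTIAL_STOPPAGE_QUALITY = "Partial stoppage/quality"
    NO_IMMEDIATE_EFFECT = "No immediate effect"
    NO_EFFECT = "No effect"


@dataclass
class FailureModeRecord:
    """Hypothetical record holding the structured fields of one failure mode."""
    root_cause: str
    characteristic: Characteristic
    measurability: Measurability
    category: Category
    consequence: TurboConsequence
    strategy: Strategy
    mtbf_years: float
    warn_time_weeks: float


# Sample data taken from the dictionary entry for Failure Effects.
shaft_failure = FailureModeRecord(
    root_cause="Normal wear & tear",
    characteristic=Characteristic.DEFINITIVE_LIFE,
    measurability=Measurability.MODERATE,
    category=Category.NORMAL_OPERATION,
    consequence=TurboConsequence.TOTAL_STOPPAGE,
    strategy=Strategy.CONDITION_BASED,
    mtbf_years=5.0,
    warn_time_weeks=4.0,
)
```

Fields restricted to a fixed set of values, rather than free text, are what allow later calculations and CMMS transfers to be automated.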

 

Figure 14-1 of Example 1 shows how the RCM Worksheet of Chapter 11 (Figure 11-2 page 138) might be combined with the extended data fields of RCM Turbo.

 

Example 1

 

PU Code: Repulper, MI Code: Repulper screw

Function Statement (Q1): To feed material 24 hours/day

Failure (Q2): Does not feed at all

Failure mode (Q3): Shaft fails

Effects (Q4):
Unit Output Reduction: Total stoppage
PU Downtime Cost: $11,390 / hour
MI Downtime Cost: $11,390 / hour
F/mode&Effects: Shaft failure-Chemical corrosion, overtorque, indicated by cracks, increase in vibration leading to shutdown of Brownstock washer
Characteristic: Definitive life / wear out characteristics
Measurability: Moderately easy to monitor
Category: Normal Operation
Typical Warn Time: 4 Weeks
Root cause: Normal wear & tear
MTBF: 5 years
Consequence: Total stoppage
Strategy: CBM

Consequences and Results of Decision Algorithm (Q5, Q6, Q7): (no entry in this example)

Task: (no entry in this example)

Interval: (no entry in this example)

By: (no entry in this example)


Figure 14-1 RCM Worksheet applied to a RCM Turbo example

In the RCM worksheet of Figure 14-1 we note that most of the RCM Turbo “failure mode” fields (in bold) fall quite readily into the RCM Effects column, with the possible exception of the field “Strategy”. The latter appears to pre-empt the RCM decision logic of Questions 6 and 7. We view this, nonetheless, as an insignificant departure (from RCM), given that RCM analysts consider the mitigating task in the normal course of describing the effects of failure. It is essential, however, that the RCM consequences (H, S, O, or N) be determined and the meticulous decision logic of RCM (on page 171) be applied immediately following this RCM Turbo step.
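The determination of the RCM consequence category can be sketched as a short cascade. This is a simplified illustration only and does not reproduce the full decision logic referred to above: the function name and its three yes/no inputs are hypothetical. It encodes two points made in the dictionary, namely that Question 5 takes one of four values (Hidden, Safety/Environmental, Operational, Non-operational) and that the hidden consequence supplants the other three.

```python
def rcm_consequence(hidden: bool, safety_or_environment: bool, operational: bool) -> str:
    """Return the consequence branch letter for one failure mode.

    The inputs are the team's yes/no judgments (hypothetical parameter names):
    hidden                -- the failure is not evident under normal circumstances
    safety_or_environment -- the failure could affect safety or the environment
    operational           -- the failure affects output, quality, or service
    """
    if hidden:
        return "H"  # hidden supplants the other three categories
    if safety_or_environment:
        return "S"
    if operational:
        return "O"
    return "N"  # non-operational: only the cost of repair is at stake


if __name__ == "__main__":
    # The repulper shaft failure stops production and is evident to operators.
    print(rcm_consequence(hidden=False, safety_or_environment=False, operational=True))
```

The letter returned selects the branch of the decision tree along which the task questions are then asked; the questions themselves still have to be answered by the team.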

RCM Turbo facilitates data entry with a convenient Visual Basic MS Excel form illustrated in Figure 14-2.

Figure 14-2 MS Excel failure mode entry form in RCM Turbo

RCM Turbo will then perform a “primary” (i.e. a CBM) task frequency calculation (Figure 14-3) and display the result: a recommended task interval of 14 days (i.e. half the P-F interval). RCM Turbo calculates the annualized cost of the CBM program so that it may be justified by comparison with the annualized economic consequences (based on the MTBF and the average cost of a failure) avoided by the CBM program.
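The arithmetic behind such a comparison can be sketched as follows. This is not RCM Turbo's actual algorithm, which is not described here; it is a minimal illustration under simplifying assumptions: the task interval is taken as half the P-F interval, the annualized consequence of failure is taken as the average cost of a failure divided by the MTBF, and the CBM program is assumed to avoid every failure. The function names, the 8 hours of downtime per failure, and the 150 dollars per inspection are hypothetical; the 4-week warning time, the 5-year MTBF, and the downtime cost of $11,390 per hour come from Example 1.

```python
def cbm_task_interval(pf_interval_days: float, fraction: float = 0.5) -> float:
    """Task interval as a fraction (by default one half) of the P-F interval."""
    if pf_interval_days <= 0 or not 0 < fraction <= 1:
        raise ValueError("P-F interval must be positive and fraction in (0, 1]")
    return pf_interval_days * fraction


def annual_cbm_cost(interval_days: float, cost_per_inspection: float) -> float:
    """Yearly cost of inspecting once every interval_days."""
    return (365.0 / interval_days) * cost_per_inspection


def annual_failure_cost(mtbf_years: float, cost_per_failure: float) -> float:
    """Average yearly cost of failures if nothing is done (cost / MTBF)."""
    return cost_per_failure / mtbf_years


if __name__ == "__main__":
    pf_days = 4 * 7                         # Typical Warn Time: 4 weeks
    interval = cbm_task_interval(pf_days)   # half the P-F interval
    # Assumptions for illustration only: 8 hours of downtime per failure,
    # 150 dollars per inspection.
    cost_per_failure = 8 * 11_390.0
    cbm = annual_cbm_cost(interval, cost_per_inspection=150.0)
    avoided = annual_failure_cost(mtbf_years=5.0, cost_per_failure=cost_per_failure)
    print(f"interval: {interval:.0f} days")
    print(f"annual CBM cost: {cbm:.2f}, annual failure cost avoided: {avoided:.2f}")
    print("CBM justified" if cbm < avoided else "CBM not justified")
```

With these assumed figures the inspection program costs roughly a fifth of the consequences it is presumed to avoid. A more careful analysis would also allow for inspections that miss the potential failure and for the cost of the corrective work that a detection triggers.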

Figure 14-3 CBM Frequency and Cost optimizing calculation

For scheduled overhaul, discard, and failure finding tasks RCM Turbo performs analogous calculations by applying a recorded MTBF, a qualitatively estimated hazard function, and the recorded average economic consequences of failure. The complete set of RCM Turbo’s data fields is given in Appendix 12 on page 236.

 

Conclusions

1. Table 14-1 illustrates that streamlined RCM (as it is embodied in RCM Turbo) is not “streamlined” (i.e. in the sense of being “abridged” or “reduced”). Rather, it encompasses the principles of RCM, adding features that address CMMS integration, quantitative reliability assessment and task frequency calculations, spares, workload scheduling and balancing, and other considerations.
2. RCM Turbo does address the 7 RCM questions, although not in the sequence stipulated by the RCM Standard. The software expands the 7 information elements of RCM into various database fields. For example, MTBF, P-F Interval, and Repair time are explicit fields related to a Failure Mode.
3. An RCM Worksheet based on the SAE JA1011 standard will provide excellent team focus regardless of the software adopted. If populated (perhaps adapted as in Figure 14-1) with RCM Turbo's needs in mind, the worksheet (incorporating the RCM decision algorithm) will benefit both streamlined and original RCM users.
4. Both RCM and RCM Turbo demand that the persons (primarily maintainers and operators) directly impacted by maintenance decisions participate fully in the process. Indeed they must drive it. External consultants can only teach the principles and techniques of RCM. Regardless of the RCM software chosen, the organization must select its analysts from among its most experienced and competent operators and maintainers. It must choose a facilitator, from within, who will learn the RCM process fluently and will elicit and faithfully record the technical knowledge of the analysts. The facilitator must ask the 7 RCM questions and ensure that consensus has been reached. He or she must ask each of the questions along the appropriate branch of the RCM decision tree and ensure that each is rigorously answered by the team and duly recorded.
5. Finally, we emphasize that reliability-centered maintenance is not a software dominated process. Software records the results of RCM analysis in a convenient, accessible, and auditable format that traces every maintenance task back to a failure mode that the RCM team identified. Software enables integration with the CMMS and implementation therein of the RCM analysis results. As importantly, software, through regular feedback from the field, and integration with the CMMS, supports continuous “living” enhancement of the initial RCM analysis.

Do you have any comments on this article? If so send them to murray@omdec.com.


[1] Available from Strategic Corporate Assessment Systems, www.strategicorp.com.
[2] “Original” is meant here to refer to processes that conform closely to the RCM process developed by Nowlan and Heap as described in their 1978 report Reliability-centered Maintenance. Processes that conform to RCM as originally defined include: Ministry of Defence (UK) Defence Standard 02-45 Issue 2 CATEGORY 2 (NES 45 Issue 3 July 2000); John Moubray, “Reliability-centered Maintenance”; MSG3.2002, Air Transport Association, Washington DC; NAVAIR 00-25-403; and others.
 

[3] One whose failure has hidden, safety, environmental, or serious economic consequences.
[4] Typical worst case scenario. A collective judgement that balances the extent of detail recorded with the gravity and likelihood of the failure consequences.