Can RCM (Reliability Centered Maintenance) and Streamlined RCM peacefully co-exist?
Extracted from Chapter 14 of Reliability-centered Knowledge
By Murray Wiseman
Optimal Maintenance Decisions (OMDEC) Inc.
www.omdec.com
Religious or political zealots confront one another, often, not on the basis of the mores of their respective doctrines, but rather from superficial differences in the details surrounding each other’s cultural reference points. Mathematicians take pride in their ability to adopt a new set of definitions and symbols as effortlessly as they would don a fresh suit of clothes. Thus they proceed, unfettered by prior points of view, to build new theorems upon old. The world of maintenance has, not dissimilarly, spawned a multitude of cultures and languages for formulating solutions to real problems.
In the preceding chapters we conducted RCM on several diverse item types. We systematically answered each of the seven RCM questions about the item, in the order stipulated by the SAE JA1011 standard: 1) functions, 2) failures, 3) failure modes, 4) failure effects, 5) consequences, 6) scheduled tasks, and 7) default tasks. We entered the answers in an electronic spreadsheet or database form (for example, MS Excel) formatted as the RCM Worksheet illustrated in Figure 11-2 on page 138.
This chapter explores one of several streamlined RCM software programs. We begin with an examination of what is meant by “streamlining”. We illustrate the streamlined approach by describing a popular representative RCM software package called RCM Turbo[1]. We set up a cross-reference “dictionary” of terms describing similar-sounding but sometimes differently applied concepts in the two “languages”. Finally, we summarize the relative advantages and potential drawbacks of the “streamlined” and the original RCM processes. Through this process, we discover how the juxtaposition of two approaches may enlighten the proponents of both.
Chapter 11 (page 137) cited the SAE Standard “Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes” that defines RCM as:
“… a specific process used to identify the policies which must be implemented to manage the failure modes which could cause the functional failure of any physical asset in a given operating context.”
It goes on to define the process by adding:
“…Any RCM process shall ensure that all the following seven questions are answered satisfactorily and are answered in the sequence shown as follows:
a. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
b. In what ways can it fail to fulfill its functions (functional failures)?
c. What causes each functional failure (failure modes)?
d. What happens when each failure occurs (failure effects)?
e. In what way does each failure matter (failure consequences)?
f. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
g. What should be done if a suitable proactive task cannot be found (default actions)?”
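The sequence of seven questions quoted above can be pictured as an ordered checklist attached to each worksheet row. The following is only a minimal sketch under stated assumptions: the class `WorksheetRow` and its methods are hypothetical names invented for illustration, the question labels are the short forms given in parentheses in the quotation, and the sample answers are taken from the Repulper screw example later in this chapter. It is not part of the SAE standard or of any RCM software package.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Short labels for the seven questions, in the sequence quoted from the standard.
RCM_QUESTIONS = (
    "functions",
    "functional failures",
    "failure modes",
    "failure effects",
    "failure consequences",
    "proactive tasks and task intervals",
    "default actions",
)


@dataclass
class WorksheetRow:
    """Hypothetical worksheet row: maps a question label to the team's answer."""
    answers: Dict[str, str] = field(default_factory=dict)

    def record(self, question: str, answer: str) -> None:
        # Reject labels that are not one of the seven questions.
        if question not in RCM_QUESTIONS:
            raise ValueError(f"unknown RCM question: {question}")
        self.answers[question] = answer.strip()

    def next_unanswered(self) -> Optional[str]:
        """Return the first question, in sequence, that has no answer yet."""
        for question in RCM_QUESTIONS:
            if not self.answers.get(question):
                return question
        return None

    def is_complete(self) -> bool:
        return self.next_unanswered() is None


row = WorksheetRow()
row.record("functions", "To feed material 24 hours/day")
row.record("functional failures", "Does not feed at all")
row.record("failure modes", "Shaft fails")
print(row.next_unanswered())  # failure effects
```

A facilitator's tool built along these lines would report which question remains open without forbidding the team from revisiting an earlier answer.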
Were we to consider the process of answering the 7 RCM questions in the stipulated sequence unacceptably resource intensive, we would understandably seek to replace it with one that consumes less time and fewer resources but provides a no less responsible (sufficiently rigorous) analysis. We emphasize that the SAE JA1011 standard stipulates a minimal set of criteria for a process to be called “RCM”. It is therefore to be expected that most commercially packaged RCM software systems and methodologies will add a considerable number of features to enhance and facilitate the experience.
The original[2] as well as the various streamlined RCM methods all demand that the assembled team of analysts (operational, process, and maintenance specialists) possess, collectively, the knowledge necessary to make informed decisions regarding the maintenance characteristics of the item under scrutiny. The process chosen (either original or streamlined) must, therefore, encourage the maximum contribution by each participant so that RCM decisions will carry the force of all knowledge and experience available on the team. The success of any “RCM” methodology, therefore, depends heavily on its ability to gain true consensus throughout every stage of the analysis. The group, guided by a well-trained facilitator, exercises its best judgment when visualizing the typical worst case scenario (TWCS) surrounding each functional failure analyzed.
With these objectives in mind, we compare the two processes by presenting a comparative lexicon of some of their respective terms of reference.
RCM |
RCM Turbo |
Item: a collection of parts or systems that is convenient to analyze as a group. It has been selected at a high enough level of indenture that its failure may easily be related to that of the equipment as a whole, but at a low enough level that the analysis is of manageable size (i.e. having a manageable number of failure modes). |
Maintainable item (MI): same meaning |
No equivalent terminology is specified by the RCM minimum criteria standard. (Any convenient or existing equipment hierarchy naming system may be used.) Operating context is often recorded in a flexible text structure at the top of the RCM worksheet. |
Productive unit (PU): A system that includes several maintainable items. A convenient place to record the operating context of the MI. A productive unit belongs to a “Major Unit”, and a “Plant” is the highest level in the RCM Turbo hierarchy. |
Worksheet: A document (conveniently an electronic spreadsheet or simple database application) onto which the answers to the 7 RCM questions are recorded during the RCM team session. |
The RCM Turbo software product is not meant to be populated during the sessions, but afterwards by the facilitator or other person trained in the use of the software. An MS Excel form (Figure 14-2) is provided for use during the sessions. |
The RCM minimum criteria standard does not specify a criticality or priority scale with which to schedule the order of items to be analyzed. Nowlan and Heap developed a simple priority system for the aviation industry that has only two criticality ratings: 1) significant item[3], and 2) non-significant item. This classification system has proved useful in a variety of other industries. For structurally significant items (SSI) Nowlan and Heap apply a further classification of one to four for each of five rating categories: 1) Residual strength after failure, 2) Fatigue life, 3) Crack growth, 4) Corrosion, and 5) Accidental damage. The minimum class (for all 5) determines task frequency. There are two types of SSI: 1) Damage-tolerant and 2) Safe-life. Rating categories 1 to 5 apply to damage-tolerant items, but only rating categories 4 and 5 apply to safe-life items. (See Example 4 of Chapter 13 on page 178.) |
Criticality/Priority: values used to set priorities for PUs and MIs. They are derived by question and answer sessions driven by the program. (Criticality calculations in no way detract from RCM. They merely add another dimension to the analysis.) |
Failure: Describes the way in which a specified function no longer performs as required. It distinguishes (for example) “full” from “partial” failure of a function. The RCM Worksheet enforces a one-to-many integrity constraint between Function and Failure. |
Failure: same basic definition. However, RCM Turbo does not enforce a one-to-many (software) relationship between Function and Failure. |
Failure Mode: A reasonably likely cause of a specified failure. Consists of a noun, a verb (active or passive form) and a phrase such as “due to …”. For example “bolt cracks due to stress corrosion fatigue”. The number of failure modes to list and their “depth of causality” depend on operating context. RCM enforces a one-to-many integrity constraint between failure and failure mode. RCM Turbo does not. |
Failure Mode: A superset of the RCM definition. Structured in 3 parts as follows: 1) a component reference, 2) a “Failure Mode & Effect” field, a single field that includes both RCM concepts (Failure Mode and Failure Effects), and 3) a “Root cause” reference. An example of an RCM Turbo failure mode is: “Bearings” + “wear between rolling elements and races leading to increased vibration levels, localized heating and eventual seizure and total stoppage of process due to” + “normal wear and tear”. |
Failure Mode: In RCM, the terms “Root Cause”, “Failure Mode”, “Failure Mechanism”, “Failure Reason”, etc. are synonymous and represented by the term “Failure Mode”. It is an “event” in the causality chain that leads to the failed state. The “link” in the causality chain selected as the “Failure Mode” is the one that the organization can manage effectively and practically by whichever means (proactive, detective, or redesign). |
Root cause: related to Failure Mode. Same definition. That is, “Root Cause” in RCM Turbo is equivalent to “Failure Mode” in RCM. |
Failure Effects: Text answering the following: • what sequence of events (considering a TWCS[4] in the component, in the system, organization wide, and in the external world) could be touched off by the failure mode? • how does the failure make itself known? What observable events lead up to the failure? • how is safety or the environment impacted? (without mentioning the words "safety" or "environment") • how is production impacted? (quality, cost, customer service) • is there any additional damage caused by the failure? • how long will it take and what actions must be accomplished to correct the failure? • how does the likelihood of this failure depend on deeper causes? Has it happened before? How often? Under what circumstances? |
Same definition, but it is structurally embedded in the “Failure Mode & Effect” field. In addition, the following “Failure Mode” fields (with sample data) contribute to the “Effects” narrative: Unit Output Reduction: Total stoppage; PU Downtime Cost: $11,390 / hour; MI Downtime Cost: $11,390 / hour; F/mode&Effects: Shaft failure-Chemical corrosion, overtorque, indicated by cracks, increase in vibration leading to shutdown of Brownstock washer; Characteristic: Definitive life / wear out characteristics; Measurability: Moderately easy to monitor; Category: Normal Operation; Typical Warn Time: 4 Weeks; Root cause: Normal wear & tear; MTBF: 5 years; Consequence: Total stoppage; Strategy: CBM |
Hidden Function: A Function whose failure will not be detected under normal circumstances. Identified by RCM during functional analysis when examining each component (from schematics, P&IDs, photographs, and physical walkaround) and listing the functions they suggest. Code phrases (such as “able to”, “in the presence of”, etc.) are used to point out that a function is hidden or protected by a hidden function. Subsequent questions address the hidden function. The “hidden” consequence supplants the other (three) failure consequences in the RCM logic for determining a mitigating task. |
Hidden Failure Mode: Same meaning as RCM’s “hidden function”. It is structured in the fields: Component, Failure Mode & Effects, Task Description, Frequency, Duration, Initiate Date, Job Group ID, Service Period, No. of Units in Service, No. of failures, and MTBF of the protective device (calculated). |
RCM records this information in the free text answer to question 4, “Failure Effects”. However, the JA1011 standard does not specify an explicit data field or structure for MTBF. |
MTBF: related to the Failure Mode. |
RCM records this information in the answers to questions 6 and 7, “Tasks”, when following one of the four branches (H, S, O, N) in the RCM decision logic tree. |
Strategy: related to Failure Mode. Takes one of three possible values: 1) fixed time maintenance, 2) condition based maintenance, or 3) operate to failure |
Same definition. RCM records this information in the free text answer to question 4, “Failure effects”. |
P-F Interval: related to Failure Mode. Estimated interval (measured in working age units) between the appearance of a potential failure and a functional failure. |
Potential failure: An indicator that a failure mode has initiated. |
S/A (secondary action) Indicator: same meaning as “Potential failure” in RCM. |
No equivalent concept in RCM. If a failure mode is due to design, lubrication, overload, or maintenance practices, they would each constitute a separate failure mode, and this information would be included in the failure mode description itself. The word “Safety” or “Environment” is not mentioned until the consequence phase of the RCM logic diagram. |
Category: related to Failure Mode. Takes one of six possible values: 1) Design, 2) Lubrication, 3) Normal Operation, 4) Overload Condition, 5) Maintenance practices, or 6) Safety |
RCM records this information in the free text answer to question 4, “Failure effects”. However, no explicit data structure is specified by the JA1011 standard. |
Characteristic: related to Failure Mode. Takes one of three possible values: 1) Definitive life/wearout, 2) General degradation, and 3) Random |
Consequences: Question 5. Takes one of four possible values: 1) Hidden, 2) Safety/Environmental, 3) Operational, and 4) Non-operational. RCM records RCM Turbo’s “Consequence” in the free text answer to question 4, “Failure effects”. |
Consequence: related to Failure Mode. Takes one of four possible values: 1) Total stoppage, 2) Partial stoppage/quality, 3) No immediate effect, or 4) No effect. This information |
RCM records this information both in the free text answer to Question 4 “Failure effects” and in the answer to Question 6 “Tasks”. Q6 asks whether there is an applicable CBM task. Once a (CBM or other) task is found to be applicable (practical) RCM then asks whether it will be effective. That is, will it sufficiently reduce or entirely avoid the consequences of failure at acceptable cost? |
Measurability: related to Failure Mode. Takes one of three possible values: 1) Easy, 2) Moderate, or 3) Impossible |
Redesign: RCM records this information in the free text answer to question 7, “Default Tasks”. Differs from RCM Turbo only in the sequence in which this question appears (i.e. following a determination that no proactive or failure-finding task adequately mitigates the consequences of the failure). |
Design Notes: related to the Failure Mode. Records the decision/recommendation to “design-out” the failure mode. (Strictly speaking, it is presented out of “RCM sequence”.) |
RCM provides no specific field for this information, leaving its provision up to the implementer or commercial packager. |
Strategy Notes: related to Failure Mode. A free text field used to store comments or notes on the chosen maintenance strategy. Useful where a second or alternative strategy has been considered and rejected. |
RCM records this information in the free text answer to question 4, “Failure Effects”, but without an explicitly specified structure. |
Breakdown Action: related to Failure Mode. Describes what must be done to repair the functional failure. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six steps), Assumptions, Materials, Spares. |
RCM develops this information in the decision algorithm of question 6 (Is there an on-condition maintenance task that is both applicable and effective?). The RCM standard does not elaborate an explicitly specified structure for recording this information. |
Primary Action: related to Failure Mode. Describes what should be done to prevent the failure mode. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six steps), Assumptions, Materials, Spares. |
RCM records this information in the free text answer to question 6, “Tasks”. The RCM standard does not elaborate an explicitly specified structure for recording this information. |
Secondary Action: related to Failure Mode. Describes what must be done following the detection of a potential failure. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, S/A Initiator, Resources (up to six steps), Assumptions, Materials, Spares. |
RCM records this information in the free text answer to question 4, “Failure Effects”. The RCM standard does not elaborate an explicitly specified structure for recording this information. |
Overhaul Action: related to Failure Mode. Records Overhaul Maintenance actions. For example, where the Secondary Action was the change-out of a rotable item which itself requires subsequent overhaul. Also has the specific fields: Work Order No., SOP, Duration, Downtime, MI Status, O/H Venue, S/A Initiator, Resources (for up to six steps), Assumptions, Materials, Spares. |
Not called a “library”. However, the records are accessible (structured as answers to the seven questions) in the RCM worksheets comprising the global RCM table. No corporate harmonizing process need be applied because every record is a “one-off” development. However, tools, training, supervision and support are required to validate, maintain, and update the knowledge base with day-to-day experience. “Templating” of an entire item is nonetheless possible by copying any or all records of an item after carefully comparing their respective operating context descriptions. |
Failure Data Library: a table of “3 part” failure modes referenced by Machine Type. An administration process is used to control the quality of data from multiple sites and harmonize it for the purpose of providing “templates” where applicable in future analyses of other MIs or PUs. The focus on “templating” justifies the appellation “Streamlined” in the case of RCM Turbo. |
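The Hidden Failure Mode entry in the lexicon above lists a Service Period, a No. of Units in Service, a No. of failures, and a calculated MTBF of the protective device. The chapter does not describe how RCM Turbo performs that calculation, so the following is only a sketch under an assumption: it uses the common estimate of cumulative service time divided by the number of observed failures. The function name and the sample figures are hypothetical.

```python
def protective_device_mtbf(service_period_years: float,
                           units_in_service: int,
                           failures: int) -> float:
    """Estimate MTBF (in years) as cumulative service time / observed failures.

    This is an assumed, textbook-style estimate, not RCM Turbo's documented method.
    """
    if service_period_years <= 0 or units_in_service <= 0:
        raise ValueError("service period and units in service must be positive")
    if failures <= 0:
        raise ValueError("at least one observed failure is needed for this estimate")
    return service_period_years * units_in_service / failures


# Hypothetical figures: 10 devices in service for 4 years, 2 found failed.
print(protective_device_mtbf(4.0, 10, 2))  # 20.0
```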
We may conclude from Table 14-1 that, although RCM Turbo refers to itself as a streamlined process, and although some of its terminology differs from that of RCM, it does not omit any vital knowledge element specified by the SAE RCM minimum criteria standard. RCM Turbo does deviate from the sequence stipulated in the standard. As pointed out in Chapter 11 (page 137), in practice, however, RCM is not a sequential process. RCM analysts anticipate the answers to subsequent questions while working the current question. Furthermore, the RCM process is iterative. That is, the analysts often return to a previous answer and adjust it in the light of revelations further on in the process. The iterative and non-sequential nature of the RCM process tends to render less important the differences between the two approaches.
The terminology comparisons of Table 14-1 show that RCM Turbo expands the information elements of RCM into greater structural detail. Such data structuring facilitates the post-RCM processes (included in the RCM Turbo software package) of workload smoothing, frequency calculations, and CMMS integration as well as integration with an (optional) spares optimization package.
Figure 14-1 of Example 1 shows how the RCM Worksheet of Chapter 11 (Figure 11-2 page 138) might be combined with the extended data fields of RCM Turbo.
Example 1
PU Code: Repulper, MI Code: Repulper screw
Function Statement Q1 | Failure Q2 | Failure mode Q3 | Effects Q4 | Consequences and Results of Decision Algorithm Q5, Q6, Q7 | Task | Interval | By |
To feed material 24 hours/day | Does not feed at all | Shaft fails | Unit Output Reduction: Total stoppage; PU Downtime Cost: $11,390 / hour; MI Downtime Cost: $11,390 / hour; F/mode&Effects: Shaft failure-Chemical corrosion, overtorque, indicated by cracks, increase in vibration leading to shutdown of Brownstock washer; Characteristic: Definitive life / wear out characteristics; Measurability: Moderately easy to monitor; Category: Normal Operation; Typical Warn Time: 4 Weeks; Root cause: Normal wear & tear; MTBF: 5 years; Consequence: Total stoppage; Strategy: CBM | | | | |
Figure 14-1 RCM Worksheet applied to a RCM Turbo example
In the RCM worksheet of Figure 14-1 we note that most of the RCM Turbo “failure mode” fields (in bold) fall quite readily into the RCM Effects column, with the possible exception of the field “Strategy”. The latter appears to pre-empt the RCM decision logic of Questions 6 and 7. We view this, nonetheless, as an insignificant departure from RCM, given that RCM analysts consider the mitigating task in the normal course of describing the effects of failure. It is essential, however, that the RCM consequences (H, S, O, or N) be determined and the meticulous decision logic of RCM (on page 171) be applied immediately following this RCM Turbo step.
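The folding of RCM Turbo fields into the Effects column can be sketched in a few lines. This is an illustration only: the dictionary keys and values are the sample data of the Repulper screw example, while the function `fold_into_effects` and the set `DEFERRED_FIELDS` are hypothetical names. Holding “Strategy” aside is one possible way of making sure the RCM decision logic is still applied before a task is accepted; it is not a feature of either product.

```python
# Sample data copied from the Repulper screw example (Figure 14-1).
turbo_record = {
    "Unit Output Reduction": "Total stoppage",
    "PU Downtime Cost": "$11,390 / hour",
    "MI Downtime Cost": "$11,390 / hour",
    "F/mode&Effects": ("Shaft failure-Chemical corrosion, overtorque, indicated by "
                       "cracks, increase in vibration leading to shutdown of "
                       "Brownstock washer"),
    "Characteristic": "Definitive life / wear out characteristics",
    "Measurability": "Moderately easy to monitor",
    "Category": "Normal Operation",
    "Typical Warn Time": "4 Weeks",
    "Root cause": "Normal wear & tear",
    "MTBF": "5 years",
    "Consequence": "Total stoppage",
    "Strategy": "CBM",
}

# Fields held back until the RCM decision logic (Questions 6 and 7) has been applied.
DEFERRED_FIELDS = {"Strategy"}


def fold_into_effects(record):
    """Split a record into (effects_text, deferred_fields).

    effects_text joins every non-deferred field as "name: value" for the
    Effects (Q4) column; deferred_fields keeps the rest for later review.
    """
    effects_text = "; ".join(
        f"{name}: {value}"
        for name, value in record.items()
        if name not in DEFERRED_FIELDS
    )
    deferred = {name: value for name, value in record.items()
                if name in DEFERRED_FIELDS}
    return effects_text, deferred


effects_text, deferred = fold_into_effects(turbo_record)
print(deferred)  # {'Strategy': 'CBM'}
```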
RCM Turbo facilitates data entry with a convenient Visual Basic MS Excel form illustrated in Figure 14-2.
Figure 14-2 MS Excel failure mode entry form in RCM Turbo
RCM Turbo will then perform a “primary” (i.e. a CBM) task frequency calculation (Figure 14-3) and display the result: a recommended task interval of 14 days (i.e. half the P-F interval). RCM Turbo calculates the annualized cost of the CBM program so that it may be justified by comparison with the annualized economic consequences (based on the MTBF and the average cost of a failure) that the CBM program avoids.
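The arithmetic behind this comparison can be sketched as follows. The half-P-F rule, the 4-week warning time, the 5-year MTBF, and the $11,390 per hour downtime cost come from the example; the cost per inspection ($150) and the downtime per failure (8 hours) are hypothetical figures introduced only to make the sketch runnable. The function names are likewise illustrative, and the sketch makes the simplifying assumption that the CBM program avoids the full cost of each failure. It is not RCM Turbo's actual algorithm.

```python
def cbm_interval_days(pf_interval_days: float) -> float:
    """Recommended inspection interval: half the P-F interval."""
    return pf_interval_days / 2.0


def annual_cbm_cost(interval_days: float, cost_per_inspection: float) -> float:
    """Annualized cost of inspecting once every interval_days."""
    return (365.0 / interval_days) * cost_per_inspection


def annual_failure_cost(mtbf_years: float, cost_per_failure: float) -> float:
    """Annualized economic consequence of running without the CBM task."""
    return cost_per_failure / mtbf_years


pf_days = 4 * 7                       # Typical Warn Time: 4 weeks (from the example)
interval = cbm_interval_days(pf_days)
print(interval)                       # 14.0

cost_per_failure = 8.0 * 11390.0      # ASSUMED 8 hours downtime at $11,390 / hour
avoided = annual_failure_cost(5.0, cost_per_failure)   # MTBF: 5 years
print(avoided)                        # 18224.0

program = annual_cbm_cost(interval, 150.0)             # ASSUMED $150 per inspection
print(program < avoided)              # True: CBM is justified under these assumptions
```

A fuller treatment would also count the cost of the secondary action triggered by each detected potential failure, which this sketch leaves out.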
Figure 14-3 CBM Frequency and Cost optimizing calculation
For scheduled overhaul, discard, and failure-finding tasks, RCM Turbo performs analogous calculations by applying a recorded MTBF, a qualitatively estimated hazard function, and the recorded average economic consequences of failure. The complete set of RCM Turbo’s data fields is given in Appendix 12 on page 236.
1. Table 14-1 illustrates that streamlined RCM (as it is embodied in RCM Turbo) is not “streamlined” in the sense of being “abridged” or “reduced”. Rather, it encompasses the principles of RCM, adding features that address CMMS integration, quantitative reliability assessment and task frequency calculations, spares, workload scheduling and balancing, and other considerations.
2. RCM Turbo does address the 7 RCM questions, although not in the sequence stipulated by the RCM Standard. The software expands the 7 information elements of RCM into various database fields. For example, MTBF, P-F Interval, and Repair time are explicit fields related to a Failure Mode.
3. An RCM Worksheet based on the SAE JA1011 standard will provide excellent team focus regardless of the software adopted. If populated (perhaps adapted as in Figure 14-1) with RCM Turbo's needs in mind, the worksheet (incorporating the RCM decision algorithm) will benefit both streamlined and original RCM users.
4. Both RCM and RCM Turbo demand that the persons directly impacted by maintenance decisions (primarily maintainers and operators) participate fully in the process. Indeed they must drive it. External consultants can only teach the principles and techniques of RCM. Regardless of the RCM software chosen, the organization must select its analysts from among its most experienced and competent operators and maintainers. It must choose a facilitator, from within, who will learn the RCM process fluently, and who will elicit and faithfully record the technical knowledge of the analysts. The facilitator must ask the 7 RCM questions and ensure that consensus has been reached. He or she must ensure that each of the questions along the appropriate branch of the RCM decision tree is rigorously answered by the team and duly recorded.
5. Finally, we emphasize that reliability-centered maintenance is not a software dominated process. Software records the results of RCM analysis in a convenient, accessible, and auditable format that traces every maintenance task back to a failure mode that the RCM team identified. Software enables integration with the CMMS and implementation therein of the RCM analysis results. As importantly, software, through regular feedback from the field, and integration with the CMMS, supports continuous “living” enhancement of the initial RCM analysis.
Do you have any comments on this article? If so send them to murray@omdec.com.
References:
1. RCM Turbo Maintenance Plan Development System Quick Reference Guide
2. RCM Turbo V9.2 User Guide
3. RCM Turbo V9 desktop guide rev 2
4. RCMT92 Installation Instructions
[1] Available from Strategic Corporate Assessment Systems, www.strategicorp.com.
[2] “Original” is meant here to refer to processes that conform closely to the RCM process developed by Nowlan and Heap as described in their 1978 report Reliability-centered Maintenance. Processes that conform to RCM as originally defined include: Ministry of Defence (UK) Defence Standard 02-45 Issue 2 CATEGORY 2 (NES 45 Issue 3 July 2000); John Moubray, “Reliability-centered Maintenance”; MSG3.2002, Air Transport Association, Washington DC; NAVAIR 00-25-403; and others.