Condition Based Maintenance Course Speaker Notes
Presented at: ABB, Vancouver, June 21, 2005 (notes dated 2005-06-19)
Presenters: Murray Wiseman, OMDEC; Joe Van Dyke, ABB

Let us open the subject of CBM not in the usual way, with a discussion of the technical capabilities of this or that condition monitoring (CM) hardware and software, but by first stating the essential CBM question: “Given the current CM and operational data, which failure modes are causing deterioration in our asset, and what is the right maintenance decision to take at this time?” The three choices are:
1. continue operating;
2. plan to perform maintenance on component “x” within “y” working age units; or
3. intervene immediately.

Before answering the question, let us situate CBM in the grand scheme of maintenance. From an RCM viewpoint there are only five ways to manage the failure modes that stalk our physical assets. According to a study carried out by the late John Moubray, 33% of failure modes are appropriately dealt with by a strategy of detective (failure-finding) maintenance, and 25% by predictive maintenance (also called PdM, CBM, CM, on-condition maintenance, and other names). In general, only 5% of failure modes are effectively dealt with by some form of time-based renewal (replacement or overhaul). Another 33% are correctly managed by allowing them to fail, and the remaining 4% should be designed out.

Another way of phrasing the CBM question is to ask: “When shall we declare a potential failure?” That is, what patterns should we look for in our CM data that will tell us:
Our ability to answer the CBM question correctly each time a set of CM data is acquired depends on our knowledge, which in turn depends on our skill in organizing and analyzing maintenance information. We wish to make the most effective use of both the knowledge of others and our own experience.

The phrase “P-F interval” was coined by the late John Moubray. He used the term to highlight the requirements of a CBM program in this well-known diagram (on the left of the slide). However, this empirical diagram is deceptively simple, for at least two reasons. First, it assumes that the monitored data resembles the “ideal” graphs on the slide: monotonically increasing trend lines with the red alert limit set, presumably, at the level of the potential failure “P”. How many of us involved in CBM believe that data generally resembles these ideal plots? Are not the random fluctuations and contradictory trends of the “real” graphs (on the far right) more familiar? Most real CM data needs to be processed before we can apply the P-F model.

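As a minimal sketch of what such processing can look like, the Python below smooths a noisy CM trend with an exponentially weighted moving average (EWMA) before comparing it with the alert limit P. EWMA is only one illustrative choice, not a method prescribed by the P-F model; the smoothing constant `alpha`, the limit of 2.5, and the readings are all hypothetical.

```python
def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; alpha in (0, 1] sets how fast it reacts."""
    level = values[0]
    smoothed = []
    for v in values:
        level = alpha * v + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def first_potential_failure(ages, values, p_limit, alpha=0.3):
    """Return the first working age at which the smoothed trend reaches the
    alert limit P, or None if it never does."""
    for age, level in zip(ages, ewma(values, alpha)):
        if level >= p_limit:
            return age
    return None


# Hypothetical readings: a single outlier at age 40, then genuine deterioration.
ages = list(range(0, 100, 10))
readings = [1.0, 1.4, 0.9, 1.2, 3.5, 1.1, 1.6, 2.2, 2.9, 3.6]

raw_alarm = next(a for a, v in zip(ages, readings) if v >= 2.5)        # 40: false alarm on the outlier
smoothed_alarm = first_potential_failure(ages, readings, p_limit=2.5)  # 90
```

The trade-off is lag: heavier smoothing suppresses false alarms but delays the declaration of P, consuming part of the P-F interval.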
Recognizing the importance of historical information, many technology vendors are (rightly) focussing on the format of the information gathered through everyday maintenance and operational activities.

Will the modernization, the computerization, of maintenance information alone provide us with the wisdom to make the right decisions (CBM and other maintenance decisions) at the right time?

Here are our observations on the evolution of CMMSs over the past three decades, with regard to the quality of the historical information from which we are supposed to draw the knowledge needed to continuously improve our maintenance policies. Why haven’t we done better than this? In spite of excellent software, why haven’t we created effective knowledge bases within our existing maintenance management systems?

In seeking an answer, I asked myself and my colleagues some fundamental questions. What is the purpose of recording information in maintenance? What are the sources of historical information? What exactly do we want to do with it?

What tool do we possess for building an effective maintenance knowledge base? Our CMMS. The CMMS is our very own (the maintenance department’s) “physical asset”. Let us describe our knowledge requirements of this asset with the rigor of an RCM functional analysis. This functional analysis was extracted from Nowlan and Heap’s RCM report, issued on December 31, 1978.

Who among you knows of a currently operating CMMS that fulfills all of these functions to the satisfaction of its users?

Would we agree, then, that a legitimate objective of any physical asset management department is the one expressed in this slide: the conversion of the CMMS and its related processes into a true intellectual asset, one with which to continuously improve maintenance policies?

When we build any system, especially a complex one, we wish to communicate the details of that system to everyone involved. The system development community has agreed upon the Unified Modeling Language (UML)[1][3] to convey, through diagrams, the multitude of perspectives of an evolving business solution. The UML “context diagram” of this slide shows a proposed system and the actors who interact with it (performing “use cases”). Other actors, such as equipment vendors and maintenance or process specialists, may likewise appear on the context diagram. Even other systems and intelligent agents might be shown as interacting with our “reliability-centered knowledge system”.

This slide contains a UML class diagram of a work order. The work order is the “maintenance action form”, the fundamental record of what happened. The work order class diagram represents the work order class. A class is simply a specification of what a work order should be and do. Notice that the diagram has three parts: the top part holds the class name, the middle part specifies what it should be (its attributes), and the bottom part specifies what it should do (its operations).

For example, this work order class diagram indicates that a work order should have a number, refer to a particular piece of equipment, and expose the working age of that equipment. Among the things it might do are calculating its cost or estimating its time. Note that a UML class diagram never exposes everything there is to know about its entity, but only those aspects that are important to the discussion at hand. (In techno-speak, we might say that the diagram is an abstraction of the work order class.) Question: What attributes should a work order have to support reliability (OEE) analysis and improvement?

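The three compartments of the class diagram map directly onto a class in most programming languages. The Python dataclass below is a sketch: only the number, equipment, and working age come from the slide, while the labour and parts attributes are hypothetical, added so that the two operations have something to compute.

```python
from dataclasses import dataclass


@dataclass
class WorkOrder:
    # Attributes (middle compartment of the class diagram)
    number: str
    equipment_id: str
    working_age: float          # e.g. operating hours or tonnes processed
    labour_hours: float = 0.0   # hypothetical attribute
    labour_rate: float = 0.0    # hypothetical attribute, cost per labour hour
    parts_cost: float = 0.0     # hypothetical attribute

    # Operations (bottom compartment of the class diagram)
    def calculate_cost(self) -> float:
        """Labour cost plus parts cost."""
        return self.labour_hours * self.labour_rate + self.parts_cost

    def estimate_time(self, crew_size: int = 1) -> float:
        """Hypothetical estimate: elapsed hours if the labour is shared by the crew."""
        return self.labour_hours / crew_size


wo = WorkOrder("WO-1001", "PUMP-07", working_age=12450.0,
               labour_hours=6.0, labour_rate=80.0, parts_cost=350.0)
print(wo.calculate_cost())   # 830.0
print(wo.estimate_time(2))   # 3.0
```

As with the diagram, the class is an abstraction: it holds only what the current discussion needs.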
Answer: the seven knowledge elements (questions) of reliability-centered maintenance (RCM). Here we abstract the first five elements as work order class attributes. While the RCM question asks “What are the item’s functions?”, the work order asks “Which function was lost, compromised, or threatened in this instance?” The work order continues to ask: In what way was it compromised? What was the cause? What were the effects? And what were the consequences?

Good maintenance decisions are driven by good historical information. What information should be stored in the CMMS to support good maintenance decision making later? The slide illustrates a work order documentation “best practice”. Whenever a maintenance technician closes a work order, he records the five RCM knowledge elements: the function that was affected, the functional failure, the failure mode (cause), the failure effects, and the failure consequences. We may achieve this best practice using any CMMS. Even when no specific fields are available, the technician may enter key words followed by the description in a comment field. (The key words in this slide are the five knowledge elements in Indonesian.) This best practice, combined with powerful software tools (EXAKT, ExpertALERT, Asset Optimizer, and others), supports optimal maintenance decisions.

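When key words are entered in a comment field, the five elements can be recovered automatically for later analysis. A minimal sketch, assuming hypothetical English key words FUNCTION, FAILURE, MODE, EFFECT and CONSEQUENCE (the slide used Indonesian equivalents), each written as `KEYWORD: text`:

```python
import re

# Hypothetical English key words; the slide used Indonesian equivalents.
KEYWORDS = ("FUNCTION", "FAILURE", "MODE", "EFFECT", "CONSEQUENCE")
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r"):\s*")


def parse_knowledge_elements(comment: str) -> dict:
    """Split 'KEYWORD: text KEYWORD: text ...' into the five knowledge elements.
    Elements the technician did not record are returned as None."""
    elements = dict.fromkeys(KEYWORDS)
    # With one capturing group, re.split returns [prefix, kw1, text1, kw2, text2, ...]
    parts = PATTERN.split(comment)
    for keyword, text in zip(parts[1::2], parts[2::2]):
        elements[keyword] = text.strip().rstrip(";").strip()
    return elements


comment = ("FUNCTION: pump water at 200 l/min; FAILURE: flow below 150 l/min; "
           "MODE: impeller worn due to abrasion; EFFECT: tank level fell, operator alarm")
record = parse_knowledge_elements(comment)
# record["MODE"] == "impeller worn due to abrasion"; record["CONSEQUENCE"] is None
```

A missing element shows up as None, which makes incomplete work orders easy to flag before they are closed.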
In defining the knowledge functions in the RCM functional analysis, we stipulated the requirement to assess the effectiveness of our CBM programs. When a CMMS is configured and used as a reliability knowledge base, we may generate analyses such as this one, in which the conditional probability of failure is plotted against an asset's working age. Through the seven knowledge elements, the CMMS record distinguishes potential failures from functional failures. By definition, potential failures have no dire consequences. Note that in this case the functional failure (conditional probability) curve is low and flat, the characteristic shape for a well-maintained item. The difference between the “total removals” curve and the “functional failures” curve represents the value (effectiveness) of the CBM program for that item: CBM detects potential failures in order to avoid the consequences of functional failures.

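The two curves can be computed with a simple life-table (actuarial) estimate: for each age interval, divide the removals in that interval by the number of items still in service at its start. The sketch below assumes one record per item life, each ending in either a CBM-detected potential failure or a functional failure; it ignores suspensions (items still running), which a real analysis must include. The removal history is hypothetical.

```python
def conditional_probabilities(removals, width, n_intervals):
    """removals: one (age_at_removal, kind) per item life, kind being
    'potential' (CBM-detected) or 'functional'. Suspensions are ignored.
    Returns (interval_start, p_total_removal, p_functional_failure) rows."""
    rows = []
    for i in range(n_intervals):
        start, end = i * width, (i + 1) * width
        at_risk = sum(1 for age, _ in removals if age >= start)   # survivors entering the interval
        if at_risk == 0:
            break
        kinds = [kind for age, kind in removals if start <= age < end]
        rows.append((start, len(kinds) / at_risk, kinds.count("functional") / at_risk))
    return rows


# Hypothetical history of ten item lives (working age in hours)
removals = [(120, "potential"), (340, "potential"), (410, "functional"),
            (560, "potential"), (610, "potential"), (780, "potential"),
            (820, "functional"), (870, "potential"), (930, "potential"),
            (990, "potential")]

for start, p_total, p_ff in conditional_probabilities(removals, width=250, n_intervals=4):
    # p_total - p_ff: conditional probability of a CBM (potential failure) removal
    print(start, round(p_total, 3), round(p_ff, 3), round(p_total - p_ff, 3))
```

The gap between the two columns is the quantity the slide treats as the value of the CBM program.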
Here is a similar age-reliability analysis that discriminates among the failure modes of a given functional failure, which is possible because our CMMS now records the work order attribute “cause of failure”. Failure mode A displays infant mortality, failure mode B displays random failure[6] behaviour, and failure mode C displays wearout. Such an analysis may direct our managerial attention to training or quality problems in the case of failure mode A, or to a possible requirement for scheduled asset renewal in the case of failure mode C. A process or physical redesign might be appropriate to reduce the constant (random) conditional probability of failure mode B. Thus, a CMMS configured and operated as a reliability-centered knowledge base will support a variety of reliability analyses, allowing it to meet many of the desirable functions stipulated earlier.

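A common alternative summary of the same three patterns is the Weibull shape parameter β: β < 1 means a falling conditional probability (infant mortality), β ≈ 1 a constant one (random), and β > 1 a rising one (wearout). The sketch below is that swapped-in technique, not the analysis on the slide; it estimates β by median-rank regression, assuming complete (unsuspended) failure ages per mode, and the tolerance band and ages are hypothetical.

```python
import math


def weibull_shape(ages):
    """Estimate the Weibull shape parameter beta by median-rank regression,
    assuming complete (unsuspended) failure ages for one failure mode."""
    t = sorted(ages)
    n = len(t)
    xs = [math.log(a) for a in t]
    # Bernard's approximation of the median rank: F_i = (i - 0.3) / (n + 0.4)
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx   # slope of ln(-ln(1 - F)) against ln(t) is beta


def failure_pattern(beta, tolerance=0.2):
    """Map beta onto the three patterns discussed for failure modes A, B and C."""
    if beta < 1 - tolerance:
        return "infant mortality"   # conditional probability falls with age
    if beta > 1 + tolerance:
        return "wearout"            # conditional probability rises with age
    return "random"                 # roughly constant conditional probability


mode_c_ages = [410, 620, 700, 810, 880, 950, 1040, 1130]   # hypothetical ages, hours
print(failure_pattern(weibull_shape(mode_c_ages)))
```

With only a handful of failures per mode the estimate is rough, so the tolerance band around β = 1 should be generous.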
Data acquisition is
the first and, one might assert, the easiest of the three CBM sub-processes
to implement. Assisted by advanced sensor, signal transmission, and storage
technologies, we can, without too much effort, implement systems that collect
and store impressive amounts of data.

Signal processing in CBM is the filtering out, from the acquired data, of all information that pertains to the operation of the asset and its environment. In other words, the processed signal should not reflect changes in load or operating conditions, but should react only to real changes in asset health, that is, to deterioration by the failure mode that we are targeting with the CBM task. A variety of signal processing techniques have been (and continue to be) developed by industry and academic research organizations. We sometimes refer to signal processing, particularly in vibration analysis, as feature extraction: we process a raw time waveform signal with an algorithm in order to extract one or more features (condition indicators) that measure the evolution of particular conditions affecting or occurring in our physical asset. This and the following slides illustrate a small sample of the wide diversity of CBM signal processing techniques addressing specific failure modes.

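To make "feature extraction" concrete, the sketch below computes a few common textbook time-domain condition indicators (RMS, peak, crest factor, kurtosis) from one block of a raw waveform. These are generic illustrations, not the specific algorithms on the slides, and they do not remove load or operating-condition effects; that normalization would be a further step.

```python
import math


def condition_indicators(waveform):
    """Time-domain features of a raw vibration waveform (one block of samples)."""
    n = len(waveform)
    mean = sum(waveform) / n
    x = [s - mean for s in waveform]                     # remove any DC offset
    rms = math.sqrt(sum(s * s for s in x) / n)           # overall vibration energy
    peak = max(abs(s) for s in x)
    kurtosis = (sum(s ** 4 for s in x) / n) / rms ** 4   # about 3 for Gaussian noise, 1.5 for a pure sine
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms, "kurtosis": kurtosis}


sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]   # ten whole cycles
features = condition_indicators(sine)   # rms ~0.707, crest factor ~1.414, kurtosis ~1.5
```

Impulsive events such as impacts from a spalled bearing raise the crest factor and kurtosis well before they change the RMS much, which is why such features are trended separately.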
This slide illustrates that an effective CBM system may act as one half of an automatic control loop. Although most CBM programs operate in a manual control loop, by directing a maintenance renewal task, the continuous oil analysis and treatment (COAT) system uses CBM condition data in an automatic control system. First, it extracts features from a lubricating or cooling fluid’s infrared signature. The arrow on the left of the slide represents the signal processing algorithm that extracts the current additive level from the infrared spectrum. The additive level can then be tracked and trended over time. Other extracted features (condition indicators such as oxidation and contamination) can be used similarly. In this case, we portray the automated replenishment of depleted oil additives.

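The notes do not describe the COAT algorithm itself, so the following is only a sketch of the loop's shape under stated assumptions: the additive level is taken as the area under a hypothetical absorbance band relative to new oil, and a hypothetical proportional rule converts the shortfall below a setpoint into a dose. Band limits, reference area, setpoint, and gain are all illustrative.

```python
def band_area(spectrum, low, high):
    """Trapezoidal area under an absorbance spectrum between two wavenumbers.
    spectrum: list of (wavenumber_cm1, absorbance) sorted by wavenumber."""
    pts = [(w, a) for w, a in spectrum if low <= w <= high]
    return sum((w2 - w1) * (a1 + a2) / 2 for (w1, a1), (w2, a2) in zip(pts, pts[1:]))


def additive_control_step(spectrum, reference_area, band, setpoint_pct=90.0, ml_per_pct=2.0):
    """One pass of a hypothetical proportional replenishment loop:
    feature extraction (additive level, % of new oil) followed by a dosing command."""
    level_pct = 100.0 * band_area(spectrum, *band) / reference_area
    deficit = max(0.0, setpoint_pct - level_pct)
    return level_pct, deficit * ml_per_pct   # (condition indicator, dose in ml)


# Hypothetical in-service spectrum: flat absorbance 0.4 across the chosen band
spectrum = [(w, 0.4) for w in range(960, 1001, 10)]
level, dose = additive_control_step(spectrum, reference_area=20.0, band=(960, 1000))
# level == 80.0 (% of new-oil additive), dose == 20.0 ml
```

Closing the loop simply means repeating this step after each new spectrum, so the condition indicator drives the actuator without a work order in between.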
Here we describe a CBM signal processing algorithm that targets the failure mode “gear tooth fails due to fatigue crack”[2]. The photograph at the top left illustrates the development of a crack in tooth number 10 of the driven gear in a single-stage helical gear reducer. The time waveform signal covering one revolution of the driven gear appears at the top right. Note the amplitude and frequency modulation occurring 17 milliseconds into the revolution. This usually indicates gear tooth damage; however, some processing is required if this information is to be used in a practical CBM program for timing a proactive maintenance task. In this algorithm, a family of wavelets is constructed to decompose the gear motion error signal and to extract the residual error signal for gear fault detection. The bottom left displays the signal for a single gear revolution and shows that tooth number 10 has a motion pattern that deviates strongly from ideal motion and differs from that of the other teeth. Finally, the signal processing algorithm plots a single indicator, called the “fault growth parameter”, that is tracked over macro time (e.g. weeks, months, years). Although the algorithm accomplishes the objective of signal processing, namely a monotonically increasing condition indicator revealing failure development, one crucial question remains before the CBM process is complete.
The three lines and the question “Where?” on the fault growth graph illustrate that question: “When shall we intervene and perform a gearbox overhaul or change-out? At the first rise in value? At the second? Or at the 280 time-unit point, when a third leveling off occurs at a fault growth parameter (FGP) value of 18?” The answer to this question is at the heart of the third CBM sub-process: decision making.
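The wavelet decomposition itself is beyond a short example. As a much simpler stand-in (not the wavelet algorithm described above), the sketch below assumes a motion-error signal already sampled synchronously with the driven gear, compares each tooth's segment with the average tooth profile, and reports the tooth that deviates most. The gear geometry and the signal are hypothetical.

```python
import math


def tooth_deviation(revolution, n_teeth):
    """RMS deviation of each tooth's segment from the average tooth profile.
    revolution: one revolution of a synchronously sampled motion-error signal."""
    seg = len(revolution) // n_teeth
    teeth = [revolution[i * seg:(i + 1) * seg] for i in range(n_teeth)]
    mean_profile = [sum(tooth[k] for tooth in teeth) / n_teeth for k in range(seg)]
    return [math.sqrt(sum((tooth[k] - mean_profile[k]) ** 2 for k in range(seg)) / seg)
            for tooth in teeth]


def suspect_tooth(revolution, n_teeth):
    """Return (tooth number counted from 1, deviation relative to the mean deviation)."""
    dev = tooth_deviation(revolution, n_teeth)
    worst = max(range(n_teeth), key=lambda i: dev[i])
    return worst + 1, dev[worst] / (sum(dev) / n_teeth)


# Hypothetical 20-tooth gear, 16 samples per tooth; tooth 10 carries an extra bump
revolution = [math.sin(2 * math.pi * k / 16) + (0.5 if tooth == 9 and 6 <= k < 10 else 0.0)
              for tooth in range(20) for k in range(16)]
print(suspect_tooth(revolution, 20))   # tooth 10, ratio about 10
```

Tracking the returned ratio from one measurement to the next would give a single trendable indicator, similar in role (though not in method) to the fault growth parameter.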
This graph illustrates that we can be very conservative in our decisions by operating at the left side of the graph, or be very “adventurous” and operate at a high risk level on the right side. Or we may operate anywhere between the two extremes. Note how a very conservative policy carries a high risk of elevated cost and poor availability: in a conservative policy we tend to panic too quickly when we see some high values in our CBM data. At the other extreme, if we decide to “live dangerously”, near the right side of the graph, we also incur a high risk of low availability (if the MTTR for a failure is much higher than for a PM) and high cost. Hence we pose two additional questions:
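The U-shaped cost trade-off described in the preceding paragraph can be sketched with the classic age-replacement model, which is much simpler than a CBM policy because it uses working age only and no condition data. Assuming a Weibull life distribution, the expected cost per unit working age of renewing at age T (or at failure, if sooner) is the expected cost per renewal cycle divided by the expected cycle length. The Weibull parameters and costs below are hypothetical.

```python
import math


def reliability(t, beta, eta):
    """Weibull survival function R(t)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T, cp, cf, beta, eta, steps=2000):
    """Expected cost per unit working age when an item is renewed preventively
    at age T (cost cp) or at functional failure (cost cf), whichever comes first."""
    h = T / steps
    # Expected cycle length = integral of R(t) from 0 to T (trapezoidal rule, R(0) = 1)
    cycle = h * ((1.0 + reliability(T, beta, eta)) / 2
                 + sum(reliability(i * h, beta, eta) for i in range(1, steps)))
    r = reliability(T, beta, eta)
    return (cp * r + cf * (1 - r)) / cycle


# Hypothetical item: wearout (beta = 3), eta = 1000 h, a failure costs 5x a planned renewal
ages = range(100, 3001, 50)
costs = {T: cost_rate(T, cp=1000.0, cf=5000.0, beta=3.0, eta=1000.0) for T in ages}
best = min(costs, key=costs.get)
# Very early renewal (T = 100) and near run-to-failure (T = 3000) both cost more than T = best
```

Replacing the costs with downtime durations turns the same expression into an availability criterion, which is why both extremes score poorly on both measures.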
|
|
The slide illustrates an EXAKT decision policy. The green, yellow, and red graph summarizes the significant risk factors that must be interpreted by a CBM policy. The vertical axis is the risk-weighted sum of the monitored variables found to be significant risk factors. The horizontal axis is the item’s working age (measured in some appropriate engineering unit, e.g. tons of ore crushed). To the right of the graph are various optimizing objectives. Below the graph is a table indicating the optimal tradeoff between preventive maintenance and bottom-line cost. A preventive repair is (on average) less costly than a repair action provoked by a functional failure... These factors affect the shape and position of the decision graph boundaries.

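A decision graph of this kind can be illustrated with a Weibull proportional hazards model, in which the hazard is a Weibull baseline in working age multiplied by the exponential of the risk-weighted covariate sum. The sketch below assumes that model; it is not EXAKT's implementation, and its two hazard levels are fixed by hand, whereas the slide's boundaries come from a cost or availability optimization. All parameter values are hypothetical. For β > 1 the boundary falls with working age, so the same condition readings become more alarming as the item ages.

```python
import math


def weighted_sum(covariates, gammas):
    """Risk-weighted sum of the significant monitored variables (vertical axis)."""
    return sum(gammas[name] * value for name, value in covariates.items())


def hazard(age, covariates, beta, eta, gammas):
    """Weibull proportional hazards model: baseline hazard x exp(weighted sum)."""
    return (beta / eta) * (age / eta) ** (beta - 1) * math.exp(weighted_sum(covariates, gammas))


def boundary(age, hazard_level, beta, eta):
    """Weighted-sum value at which the hazard reaches hazard_level at this age."""
    return math.log(hazard_level) - math.log(beta / eta) - (beta - 1) * math.log(age / eta)


def decide(age, covariates, beta, eta, gammas, plan_level, intervene_level):
    z = weighted_sum(covariates, gammas)
    if z >= boundary(age, intervene_level, beta, eta):
        return "intervene immediately"          # red region
    if z >= boundary(age, plan_level, beta, eta):
        return "plan maintenance"               # yellow region
    return "continue operating"                 # green region


# Hypothetical model: beta = 2.5, eta = 10000 tonnes crushed, one covariate
model = dict(beta=2.5, eta=10000.0, gammas={"vib_rms": 0.5},
             plan_level=1e-4, intervene_level=5e-4)
for vib in (0.0, 2.0, 4.0):
    print(vib, decide(5000.0, {"vib_rms": vib}, **model))
```

The three outcomes correspond to the three choices of the essential CBM question: continue, plan, or intervene.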
Expert systems support maintenance decisions. This flow chart describes the operation of DLI Engineering’s ExpertALERT CBM decision making system. The diagram traces the flow of information through the signal processing steps (steps 1-5) and the decision making procedure (step 6) that uses a rule-based expert system.

Step 3 performs a cepstrum transformation[3] of the FFT spectrum. A cepstrum plot highlights series of evenly spaced spectral peaks. These are called harmonics. Harmonics can be synchronous (multiples of shaft speed) or non-synchronous. The ExpertALERT algorithm searches the spectrum for non-synchronous harmonics and any sidebands. If found, they are flagged as possible bearing tones, to be processed further in steps 5 and 6. The physics of each situation dictates the signal processing method selected. Non-synchronous peaks, such as those at 3.61 and 7.22 orders, are candidates for “bearing tones” that signal bearing faults. If, in addition, the non-synchronous peaks display sidebands spaced at the shaft speed (one order apart), an inner race defect is likely. The bottom schematic illustrates the physical explanation for bearing tones and the appearance of sidebands, with respect to an inner race spall.

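The screening logic just described can be sketched as a few rules over a list of spectral peaks expressed in orders of shaft speed. This is a hypothetical rule set for illustration, not ExpertALERT's rule base; the tolerance and the peak list are assumptions, although the 3.61 and 7.22 orders are the ones quoted above. Peaks are processed strongest first so that harmonics and sidebands are attributed to their parent tone rather than reported as separate tones.

```python
def is_synchronous(order, tol):
    """True if the peak lies on an integer multiple of shaft speed."""
    return abs(order - round(order)) <= tol


def near(target, orders, tol):
    """True if any peak lies within tol orders of target."""
    return any(abs(o - target) <= tol for o in orders)


def screen_bearing_tones(peaks, tol=0.02, max_harmonic=5):
    """peaks: list of (order, amplitude), order = frequency / shaft speed.
    Reports each family of non-synchronous peaks once, strongest peak first."""
    orders = [o for o, _ in peaks]
    explained = set()                      # peaks already assigned to a family
    findings = []
    for order, _amp in sorted(peaks, key=lambda pk: pk[1], reverse=True):
        if is_synchronous(order, tol) or order in explained:
            continue
        harmonics = [k for k in range(2, max_harmonic + 1) if near(k * order, orders, tol)]
        # Sidebands one shaft order either side of the tone suggest an inner race defect
        sidebands = near(order - 1.0, orders, tol) and near(order + 1.0, orders, tol)
        family = [order] + [k * order for k in harmonics]
        if sidebands:
            family += [order - 1.0, order + 1.0]
        explained.update(o for o in orders if near(o, family, tol))
        findings.append({"tone": order, "harmonics": harmonics, "inner_race_suspect": sidebands})
    return findings


# Hypothetical peak list using the orders quoted above (3.61 and 7.22)
peaks = [(1.0, 5.0), (2.0, 2.0), (3.61, 1.2), (7.22, 0.6), (2.61, 0.4), (4.61, 0.4)]
print(screen_bearing_tones(peaks))
# [{'tone': 3.61, 'harmonics': [2], 'inner_race_suspect': True}]
```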
Demodulation (also called “envelope detection”) is a signal processing technique used by ExpertALERT to supplement and verify the information drawn from the cepstrum and spectrum analyses. Demodulation provides an independent confirmation of bearing defects. If there is a spall on a bearing race, each time a ball passes over it, the impact “rings” the bearing, causing it to resonate at high frequencies. The resulting vibrations can be demodulated in order to extract the forcing frequency that is causing the ringing. The forcing frequencies appear as peaks in the demodulated spectrum (bottom left). If they match the bearing tones from the screening matrix and the cepstrum, they provide further confirmation of a bearing defect. A distinct advantage of demodulation is that high frequencies do not travel far in a machine, so the demodulation process can localize the defective bearing. For example, if you see bearing tones at the same frequency in the narrow band spectral data from two different locations on the machine, and the demodulated data has matching peaks at one location (but not the other), you can assume that the location where both measurements show the tones is the one with the bearing problem. The four spectra illustrate this point.
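The principle can be demonstrated with NumPy on a synthetic signal. The sketch below is a crude envelope detector (full-wave rectification followed by a moving average spanning one resonance period), not ExpertALERT's implementation; practical analyzers usually band-pass filter around the resonance first. It also simplifies the impact ringing to a resonance amplitude-modulated at the forcing frequency. The sample rate, resonance, and forcing frequency are hypothetical.

```python
import numpy as np


def envelope_spectrum(x, fs, smoothing_samples):
    """Crude envelope detection: full-wave rectify, smooth with a moving average
    that spans one period of the resonance, then take the envelope's spectrum."""
    envelope = np.convolve(np.abs(x), np.ones(smoothing_samples) / smoothing_samples, mode="same")
    envelope = envelope - envelope.mean()          # remove DC so it cannot dominate
    spectrum = np.abs(np.fft.rfft(envelope))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, spectrum


fs = 20_000                    # sample rate, Hz (hypothetical)
t = np.arange(fs) / fs         # one second of data, 1 Hz resolution
resonance_hz = 2_000           # resonance "rung" by each impact (hypothetical)
defect_hz = 87.0               # forcing frequency of the defect (hypothetical)
# Simplification: model the ringing as a resonance amplitude-modulated at the defect rate
x = (1 + 0.8 * np.cos(2 * np.pi * defect_hz * t)) * np.sin(2 * np.pi * resonance_hz * t)

freqs, spectrum = envelope_spectrum(x, fs, smoothing_samples=fs // resonance_hz)
peak_hz = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
print(peak_hz)                                  # about 87 Hz: the forcing frequency is recovered
```

The raw spectrum of `x` has energy only near 2000 Hz (at 2000 Hz and 2000 ± 87 Hz); the forcing frequency itself appears only after demodulation.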
The amounts by which the values in the CSDM (step three of slide 29) exceed the threshold values (set up in the rules on the basis of experience and knowledge) are scored and converted into a relative severity. This provides a normalized scale with which to judge the state of health of each component, so the relative severity of all components in the equipment can be trended on a single graph (top right). The graph provides a decision support tool for performing a corrective action on a component whose severity is high or has increased substantially. In the following section, we propose to extend the automated diagnosis one step further in order to estimate remaining life and provide an optimized repair decision.

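One hypothetical way to turn threshold exceedances into a normalized severity is to sum the fractional amounts by which each feature exceeds its rule threshold, then band the score. The feature names, thresholds, and band cut-offs below are illustrative assumptions, not ExpertALERT's scoring rules.

```python
# Hypothetical severity cut-offs, for illustration only
SEVERITY_BANDS = [(0.25, "slight"), (0.75, "moderate"), (1.5, "serious")]


def relative_severity(values, thresholds):
    """Score a component by summing the fractional amounts by which its
    features exceed their rule thresholds (features below threshold add 0)."""
    score = 0.0
    for feature, limit in thresholds.items():
        excess = values.get(feature, 0.0) - limit
        if excess > 0:
            score += excess / limit
    return score


def severity_label(score):
    if score <= 0:
        return "no fault"
    for upper, label in SEVERITY_BANDS:
        if score < upper:
            return label
    return "extreme"


# Hypothetical rule thresholds and one component's extracted features
thresholds = {"bearing_tone_amplitude": 10.0, "demod_peak_amplitude": 5.0, "harmonic_count": 3}
motor_outboard = {"bearing_tone_amplitude": 12.0, "demod_peak_amplitude": 6.0, "harmonic_count": 2}
score = relative_severity(motor_outboard, thresholds)
print(round(score, 2), severity_label(score))   # 0.4 moderate
```

Because every component is scored on the same dimensionless scale, scores from different components can share one trend graph.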
Following step 6, the automated diagnostic tools hand over their findings to the human decision makers. Can we process each diagnostic fault and its respective severity one step further, to recommend one of the following?
i. repair immediately;
ii. repair within a particular time period from the current time; or
iii. continue operation until the next inspection.
DLI and OMDEC have teamed up to provide this extension of a predictive system to a prognostic one.
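The notes do not describe how the DLI and OMDEC extension estimates remaining life, so the following is only a deliberately simple stand-in: fit a straight line to the relative-severity history, extrapolate it to a hypothetical failure level, and map the estimated remaining life onto options i, ii, and iii. The failure level, repair lead time, and inspection interval are all assumptions.

```python
def linear_trend(times, values):
    """Least-squares slope and intercept of values against times."""
    n = len(times)
    t_bar, v_bar = sum(times) / n, sum(values) / n
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sxy / sxx
    return slope, v_bar - slope * t_bar


def recommend(times, severities, failure_level, lead_time, inspection_interval):
    """Map an extrapolated remaining life onto the three options i, ii, iii."""
    slope, intercept = linear_trend(times, severities)
    if slope <= 0:
        return "continue operation until the next inspection", None
    remaining = (failure_level - intercept) / slope - times[-1]
    if remaining <= lead_time:
        return "repair immediately", remaining
    if remaining <= lead_time + inspection_interval:
        return f"repair within {remaining - lead_time:.0f} time units", remaining
    return "continue operation until the next inspection", remaining


# Hypothetical relative-severity history, one reading every 10 days
times, severity = [0, 10, 20, 30], [0.2, 0.4, 0.6, 0.8]
print(recommend(times, severity, failure_level=2.0, lead_time=5, inspection_interval=30))
print(recommend(times, severity, failure_level=2.0, lead_time=5, inspection_interval=60))
```

A straight-line fit ignores the step-like growth seen in the fault growth graph, which is exactly why a probabilistic model is preferable in practice.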
The photos illustrate the ABB demo fault simulator, in which a fault progresses to functional failure while being monitored by CBM decision agents such as EXAKT and ExpertALERT.[4]
What conclusions can we draw from the foregoing?

The slide illustrates the two feedback “control” loops of physical asset management. Their ultimate output is the achievement of the corporate vision. Each feedback arrow represents a management function:
|
|
Mastery of the processes represented by the arrows of the previous slide poses the greatest challenge to asset performance management. The next generation of maintenance performance management software will dissect every KPI into its constituent incidents and knowledge elements (as in this slide). The flow diagram illustrates that historical data (contained in plant systems) fuel reliability analyses such as Pareto analysis, age-reliability relationships, and optimal CBM decision graphs[5]. Those methodologies are “shells” that need to be populated systematically with knowledge and experience. That done, they steer us towards improved maintenance policies. The CMMS, the control system historian, CBM databases, and other plant systems feed information to the performance management system. The performance management system, in the hands of the physical asset manager, outputs continually improving physical asset management policies. Today, the maintenance world stands at the threshold of bridging two remaining gaps that impede “excellence” in asset performance management. They are:
1. CMMS work orders do not yet systematically record reliability-centered knowledge; and
2. the RCM knowledge base is not yet fully integrated with the CMMS, process historian, and CBM databases.
With these final capabilities in hand, we may anticipate rewarding times ahead for physical asset management. We have the skills, desire, and plan. Let us actually begin the journey to OEE improvement at lowest cost in each of our enterprises.
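As a small example of how work orders that record the failure mode can fuel one of these analyses, the sketch below performs a Pareto analysis of downtime by failure mode and returns the "vital few" modes that account for a chosen share of total downtime. The work order history and the 80% share are hypothetical; the failure modes are named noun + verb + "due to ..." as recommended in footnote [2].

```python
from collections import Counter


def pareto(work_orders, vital_share=0.8):
    """work_orders: iterable of (failure_mode, downtime_hours) from closed work orders.
    Returns (ranked, vital_few): ranked lists each failure mode with its downtime and
    cumulative share, most costly first; vital_few are the leading modes needed to
    account for vital_share of all downtime."""
    downtime = Counter()
    for mode, hours in work_orders:
        downtime[mode] += hours
    total = sum(downtime.values())
    ranked, vital_few, cumulative = [], [], 0.0
    for mode, hours in downtime.most_common():
        if cumulative < vital_share * total:
            vital_few.append(mode)
        cumulative += hours
        ranked.append((mode, hours, cumulative / total))
    return ranked, vital_few


# Hypothetical work order history
history = [("impeller worn due to abrasion", 20), ("seal leaks due to wear", 30),
           ("impeller worn due to abrasion", 30), ("bearing seizes due to lubricant starvation", 15),
           ("coupling misaligned due to foundation settlement", 5)]
ranked, vital_few = pareto(history)
print(vital_few)   # ['impeller worn due to abrasion', 'seal leaks due to wear']
```

The analysis is only as good as the consistency of the failure mode names entered on the work orders, which is the point of recording the knowledge elements systematically.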
[2] Failure modes should consist of a noun and a verb (possibly in the passive form), usually followed by a clause, such as “due to …”, describing the appropriate causality level for the failure in question.
[3] In a general sense, the more harmonics and sidebands present, the worse the condition of the bearing. Thus, not only does one wish to know whether a peak is part of a larger family of peaks, one also wants an idea of how much energy is contained in the series. Cepstrum analysis is used to automate this task. The (power) cepstrum is the power spectrum of the logarithm of the power spectrum of a waveform; therefore, any periodicities in the spectrum (such as harmonic series or sideband families) clearly appear as a peak in the cepstrum.
[4] See http://www.omdec.com/articles/p_CBMDecisionMakingwithExpertSystems.html for a description of the demo.
[5] These may be called age-reliability-significant factor relationships.