Condition Based Maintenance Course Speaker Notes
Presented at: ABB, Vancouver, June 21, 2005-06-19
Presenters: Murray Wiseman, OMDEC; Joe Van Dyke, ABB
Let us open the subject of CBM not in the usual way, that is, by discussing the technical capabilities of this or that condition monitoring (CM) hardware and software, but by first stating the essential CBM question: “Given the current CM and operational data, which failure modes are causing deterioration in our asset, and what is the right maintenance decision to take at this time?” There are three choices: 1. continue operating; 2. plan to perform maintenance on component “x” within “y” working age units; or 3. intervene immediately.
Before answering the question, let us situate CBM in the grand scheme of maintenance. From an RCM viewpoint there are really only five ways to manage the failure modes that stalk our physical assets. According to a study carried out by the late John Moubray, 33% of failure modes are appropriately dealt with by a strategy of detective maintenance, and 25% by predictive maintenance (also known as PdM, CBM, CM, on-condition maintenance, and other names). In general only 5% of failure modes are effectively dealt with by some form of time-based renewal (replacement or overhaul). Another 33% of failure modes are correctly managed by allowing them to fail, and the remaining 4% should be designed out.
Another way of phrasing the CBM question is to ask: “When shall we declare a potential failure?” That is, what patterns shall we look for in our CM data that will tell us which specific fault has occurred and is in a process of deterioration towards an impending functional failure, and how much time we have to act proactively? Our ability to answer the CBM question correctly, each and every time a set of CM data is acquired, will depend on our knowledge, which in turn will depend on our skill in organizing and analyzing maintenance information. We wish to make the most effective use of the knowledge of others and of our own experience.
The phrase “P-F interval” was coined by the late John Moubray. He used the term to highlight the requirements of a CBM program in this well-known diagram (on the left of the slide). However, this empirical diagram is deceptively simple, for at least two reasons. First, it assumes that the monitored data resembles the “ideal” graphs on the slide: monotonically increasing trend lines with the red alert limit set, presumably, at the level of the potential failure “P”. How many of us involved in CBM believe that data generally resembles these ideal plots? Are not the random fluctuations and contradictory trends of the “real” graphs (on the far right) more familiar? Most real CM data needs to be processed before we can apply the P-F model.
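To make the point concrete, here is a minimal Python sketch, assuming an exponentially weighted moving average (EWMA) as the processing step. EWMA is our illustrative choice, not a method taken from the slides, and `ewma` and `first_alert` are hypothetical names. A single spike trips a raw alert limit, whereas the smoothed trend crosses it only when the rise persists.

```python
import numpy as np


def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; alpha in (0, 1], 1.0 means no smoothing."""
    values = np.asarray(values, dtype=float)
    smoothed = np.empty_like(values)
    smoothed[0] = values[0]
    for i in range(1, len(values)):
        smoothed[i] = alpha * values[i] + (1.0 - alpha) * smoothed[i - 1]
    return smoothed


def first_alert(ages, values, alert_limit, alpha=0.3):
    """Return the first age at which the smoothed indicator reaches alert_limit, or None."""
    smoothed = ewma(values, alpha)
    hits = np.nonzero(smoothed >= alert_limit)[0]
    return ages[hits[0]] if hits.size else None
```

The smoothing constant trades noise rejection against detection delay: a smaller `alpha` suppresses more fluctuation but declares the potential failure later, eating into the P-F interval.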
Recognizing the importance of historical information, many technology vendors are (rightly) focussing on the format of the information gathered through everyday maintenance and operational activities.
Will the modernization, the computerization, of maintenance information alone provide us with the wisdom to take the right decisions (CBM and other maintenance decisions) at the right time?
Here are our observations of the evolution of CMMSs over the past 3 decades, with regard to the quality of historical information from which we are supposed to draw knowledge with which to continuously improve our maintenance policies. Why haven’t we done better than this? In spite of excellent software, why haven’t we created effective knowledge bases within our existing maintenance management systems?
In seeking an answer, I asked myself and my colleagues some elemental questions. What is the purpose of recording information in maintenance? What are the sources of historical information? What exactly do we want to do with it?
What tool do we possess for building an effective maintenance knowledge base? Our CMMS. The CMMS is our very own (the maintenance department’s) “physical asset”. Let us describe our knowledge requirements of this asset. Let’s do so using the rigor of an RCM functional analysis. This functional analysis was extracted from Nowlan and Heap’s RCM report issued on Dec 31, 1978.
Who, among you, knows of a currently operating CMMS that is fulfilling all of these functions to the satisfaction of its users?
Would we agree, then, that a legitimate objective of any physical asset management department is the one expressed in this slide? The conversion of the CMMS and its related processes into a true intellectual asset – one with which to continuously improve maintenance policies.
When we build any system, especially a complex one, we wish to communicate its details to everyone involved. The system development community has agreed upon the Unified Modeling Language (UML)[1][3] to convey, through diagrams, the multitude of perspectives of an evolving business solution. The UML “context diagram” on this slide shows a proposed system and the actors who interact with it (by performing “use cases”). Other actors, such as equipment vendors and maintenance or process specialists, may likewise appear on the context diagram. Even other systems and intelligent agents might be shown interacting with our “reliability-centered knowledge system”.
This slide contains the UML class diagram of a work order. The work order is the “maintenance action form”, the fundamental record of what happened. The diagram represents the work order class; a class is simply a specification of what a work order should be and do. Notice that the diagram has three parts: the top part holds the class name, the middle part specifies what it should be (its attributes), and the bottom part what it should do (its operations).
For example, this work order class diagram indicates that a work order should have a number, refer to a particular piece of equipment, and expose the working age of that equipment. One of the things it might do is calculate cost or, perhaps, estimate time. Note that a UML class diagram never exposes everything there is to know about its entity, only those aspects that matter to the discussion of the moment. (In techno-speak we might say that the diagram is an abstraction of the work order class.) Question: What attributes should a work order have to support reliability (OEE) analysis and improvement?
Answer: the seven knowledge elements (questions) of reliability-centered maintenance (RCM). Here we abstract the first five elements as work order class attributes. While the RCM question asks, “What are the item’s functions?”, the work order asks, “Which function was lost, compromised, or threatened in this instance?” The work order continues to ask: In what way was it compromised? What was the cause? What were the effects? What were the consequences?
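A work order class carrying these attributes might be sketched as follows. This is a minimal Python illustration, assuming a dataclass is an acceptable stand-in for the UML class; the field names and the labour and parts cost fields are hypothetical, not taken from any particular CMMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkOrder:
    # Attributes mentioned on the slide
    number: str
    equipment: str
    working_age: float  # in a unit suited to the asset, e.g. hours or tons crushed
    # First five RCM knowledge elements, asked of this particular event
    function: Optional[str] = None              # which function was lost, compromised or threatened?
    functional_failure: Optional[str] = None    # in what way (total, partial, potential)?
    failure_mode: Optional[str] = None          # what was the cause?
    failure_effects: Optional[str] = None       # what happened?
    failure_consequences: Optional[str] = None  # how did it matter?
    # Hypothetical cost fields used by calculate_cost()
    labour_hours: float = 0.0
    labour_rate: float = 0.0
    parts_cost: float = 0.0

    def calculate_cost(self) -> float:
        """Operation: total labour plus parts cost of this work order."""
        return self.labour_hours * self.labour_rate + self.parts_cost

    def is_reliability_complete(self) -> bool:
        """True only if all five knowledge elements were recorded at close-out."""
        return all((self.function, self.functional_failure, self.failure_mode,
                    self.failure_effects, self.failure_consequences))
```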
Good maintenance decisions are driven by good historical information. What information should be stored in the CMMS to support good maintenance decision making later? The slide illustrates a work order documentation best practice. Whenever a maintenance technician closes a work order, he records the five RCM knowledge elements: What function was lost, compromised, or threatened (potential failure)? In what way (e.g. total, partial, or potential failure)? Why? What happened? How did it matter? We may achieve this best practice with any CMMS. Even when no specific fields are available, the technician may enter key words, each followed by its description, in a comment field. (The key words on this slide are the five knowledge elements in Indonesian.) This best practice, combined with powerful software tools (EXAKT, ExpertALERT, Asset Optimizer, and others), supports optimal maintenance decisions.
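Where the CMMS offers only a comment field, the keyword convention can still be parsed automatically. A minimal sketch, assuming hypothetical English keywords (`FUNCTION:`, `FAILURE:`, `CAUSE:`, `EFFECT:`, `CONSEQUENCE:`) in place of the Indonesian ones on the slide:

```python
import re

# Hypothetical English keywords standing in for the five knowledge elements
KEYWORDS = ("FUNCTION", "FAILURE", "CAUSE", "EFFECT", "CONSEQUENCE")
_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r"):", re.IGNORECASE)


def parse_knowledge_elements(comment: str) -> dict:
    """Split a free-text comment into {KEYWORD: text} using 'KEYWORD:' markers."""
    elements = {}
    matches = list(_PATTERN.finditer(comment))
    for i, match in enumerate(matches):
        start = match.end()
        # Each element's text runs until the next keyword (or end of comment)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(comment)
        elements[match.group(1).upper()] = comment[start:end].strip(" ;,.\n")
    return elements


def missing_elements(elements: dict) -> list:
    """Knowledge elements the technician did not record."""
    return [k for k in KEYWORDS if k not in elements]
```

Flagging work orders with missing elements at close-out is one way to enforce the best practice before the data reaches a reliability analysis.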
In defining the knowledge functions in the RCM functional analysis, we stipulated the requirement to assess the effectiveness of our CBM programs. When a CMMS is used and configured as a reliability knowledge base, we may generate analyses such as this one, in which the conditional probability of failure is plotted against an asset's working age. Through the knowledge elements, the CMMS records and discriminates potential failures as well as functional failures. By definition, potential failures have no dire consequences. In this case, the functional failure (conditional probability) curve is low and flat, the characteristic shape of a well-maintained item. The difference between the “total removals” curve and the “functional failures” curve represents the value (effectiveness) of the CBM program for that item: CBM detects potential failures in order to avoid the consequences of functional failures.
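The two curves can be estimated from such records with a simple life-table calculation. The sketch below rests on stated assumptions: each record is a hypothetical `(age_at_removal, event)` pair, the conditional probability for an age interval is the number of events in the interval divided by the number of items still at risk at its start, and no correction is made for items suspended part-way through an interval. For the functional-failure curve, removals for potential failures are treated as suspensions, which is what makes the gap between the curves a measure of CBM effectiveness.

```python
def conditional_probability(lives, width, n_intervals, failure_events):
    """Simple life-table estimate of conditional probability of failure per age interval.

    lives: iterable of (age_at_removal, event) with event in
           {"functional", "potential", "suspended"} (hypothetical record format).
    failure_events: events counted as failures for this curve; every other
           event is treated as a suspension (the item left service unfailed).
    Returns a list of (interval_start_age, conditional_probability).
    """
    lives = list(lives)
    curve = []
    for i in range(n_intervals):
        lo, hi = i * width, (i + 1) * width
        at_risk = sum(1 for age, _ in lives if age >= lo)  # still in service at age lo
        failed = sum(1 for age, event in lives
                     if lo <= age < hi and event in failure_events)
        curve.append((lo, failed / at_risk if at_risk else float("nan")))
    return curve
```

Calling it once with `{"functional", "potential"}` gives the total removals curve, and once with `{"functional"}` gives the functional failures curve.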
Here is a similar age-reliability analysis that discriminates among the failure modes of a given functional failure, because our CMMS now records the work order attribute “cause of failure”. Failure mode A displays infant mortality, failure mode B displays random failure[6] behaviour, and failure mode C displays wearout. Such an analysis may direct managerial attention to training or quality problems in the case of failure mode A, or to a possible requirement for scheduled asset renewal in the case of failure mode C. A process or physical redesign might be appropriate to reduce the age-independent conditional probability of failure mode B. Thus a CMMS configured and operated as a reliability-centered knowledge base supports a variety of reliability analyses that allow it to meet many of the desirable functions stipulated earlier.
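One common way to put a number on these three patterns, not stated on the slide, is the Weibull shape parameter β estimated separately for each failure mode: β < 1 suggests infant mortality, β ≈ 1 random failure, and β > 1 wearout. A minimal sketch using median-rank regression on complete (uncensored) failure ages; the tolerance band around β = 1 is a hypothetical choice.

```python
import math

import numpy as np


def weibull_mrr(failure_ages):
    """Estimate Weibull (beta, eta) by median-rank regression on complete failure ages."""
    t = np.sort(np.asarray(failure_ages, dtype=float))
    n = t.size
    ranks = np.arange(1, n + 1)
    F = (ranks - 0.3) / (n + 0.4)           # Bernard's median-rank approximation
    x = np.log(t)
    y = np.log(-np.log(1.0 - F))            # Weibull: y = beta*ln(t) - beta*ln(eta)
    beta, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    eta = math.exp(-intercept / beta)
    return float(beta), eta


def pattern(beta, tol=0.2):
    """Classify the age-reliability pattern from beta (tolerance is hypothetical)."""
    if beta < 1 - tol:
        return "infant mortality"
    if beta > 1 + tol:
        return "wearout"
    return "random"
```

Real work order histories contain suspensions, which this sketch ignores; a censoring-aware estimator (e.g. maximum likelihood) is preferable in practice.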
Data acquisition is the first and, one might assert, the easiest of the three CBM sub-processes to implement. Assisted by advanced sensor, signal transmission, and storage technologies, we can, without too much effort, implement systems that collect and store impressive amounts of data.
Signal processing in CBM is the filtering out, from the acquired data, of all information that pertains to the operation of the asset and its environment. In other words, the processed signal should not reflect changes in load or operating conditions, but should react only to real changes in asset health, that is, to deterioration caused by the failure mode targeted by the CBM task. A variety of signal processing techniques have been (and continue to be) developed by industry and academic research organizations. We sometimes refer to signal processing, particularly in vibration analysis, as feature extraction: we process a raw time waveform with an algorithm in order to extract one or more features (condition indicators) that measure the evolution of particular conditions affecting or occurring in our physical asset. This and the following slides illustrate a small sample of the wide diversity of CBM signal processing techniques addressing specific failure modes.
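One simple way to remove an operational influence, sketched here under the assumption that a condition indicator varies smoothly with load on a healthy machine, is to fit that dependence on baseline data and trend only the residual. The polynomial model and the function names are illustrative assumptions, not one of the techniques on the slides.

```python
import numpy as np


def fit_load_model(load, indicator, degree=1):
    """Fit the indicator as a polynomial in load, using baseline data from a healthy machine."""
    return np.polyfit(np.asarray(load, float), np.asarray(indicator, float), degree)


def load_compensated(load, indicator, coeffs):
    """Residual indicator: what remains after removing the part explained by load."""
    return np.asarray(indicator, float) - np.polyval(coeffs, np.asarray(load, float))
```

A step in the residual then points to a change in asset health rather than a change in duty.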
This slide illustrates that an effective CBM system may act as one half of an automatic control loop. Although most CBM programs operate in a manual control loop, by directing a maintenance renewal task, the continuous oil analysis and treatment (COAT) system uses CBM condition data in an automatic control system. First, it extracts features from a lubricating or cooling fluid’s infrared signature. The arrow on the left of the slide represents the signal processing algorithm that extracts the current additive level from the infrared spectrum. The additive level can then be tracked and trended over time. Other extracted features (condition indicators such as oxidation and contamination) can be used similarly. In this case we portray the automated replenishment of depleted oil additives.
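The replenishment loop can be pictured as a simple on/off controller. The sketch below is hypothetical throughout: constant depletion per step, an instant and exact dose, and made-up limits. It only shows how a trended condition indicator can drive an actuator directly instead of generating a work order.

```python
def simulate_replenishment(initial, depletion_per_step, low_limit, target, steps):
    """Hypothetical on/off dosing loop driven by a measured additive level.

    Each step the level falls by depletion_per_step; when it drops below
    low_limit, the controller doses back up to target.
    Returns (levels after each step, steps at which a dose was applied).
    """
    level = initial
    levels, doses = [], []
    for step in range(steps):
        level -= depletion_per_step  # additive consumed during operation
        if level < low_limit:        # condition indicator triggers the actuator
            doses.append(step)
            level = target           # idealized: instant, exact replenishment
        levels.append(level)
    return levels, doses
```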
Here we describe a CBM signal processing algorithm that targets the failure mode “gear tooth fails due to fatigue crack”[2]. The photograph at the top left illustrates the development of a crack in tooth number 10 of the driven gear in a single-stage helical gear reducer. The time waveform signal covering one revolution of the driven gear appears at the top right. Note the amplitude and frequency modulation occurring 17 milliseconds into the revolution. This usually indicates gear tooth damage; however, some sort of processing is required if this information is to be used in a practical CBM program for timing a proactive maintenance task. In this algorithm, a family of wavelets is constructed to decompose the gear motion error signal and to extract the residual error signal for gear fault detection. The bottom left displays the signal for a single gear revolution and shows that tooth number 10 has a motion pattern that deviates strongly from ideal motion and differs from that of the other teeth. Finally, the signal processing algorithm plots a single indicator, called the “fault growth parameter” (FGP), which is tracked over macro time (e.g. weeks, months, years). Although the algorithm accomplishes the objective of signal processing, namely a monotonically increasing condition indicator revealing failure development, one crucial question remains before the CBM process is complete.
The three lines and the question “Where?” on the fault growth graph illustrate that question: “When shall we intervene and perform a gearbox overhaul or change-out? At the first rise in value? At the second? Or at the 280 time unit point, when a third leveling off occurs at an FGP value of 18?” The answer to this last question is at the heart of the third CBM sub-process: decision making.
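The slide's wavelet algorithm is not reproduced here. As a simpler stand-in, the sketch below uses the classical residual-signal idea: from one synchronously averaged revolution, remove the regular gear-mesh harmonics and look for the tooth whose angular segment carries the most residual energy. The function names are hypothetical, and the approach assumes the record spans exactly one revolution so that mesh harmonics fall on exact FFT bins.

```python
import numpy as np


def residual_signal(revolution, n_teeth):
    """Remove DC and all gear-mesh harmonics from one synchronously averaged revolution.

    With exactly one revolution in the record, FFT bin k is k cycles per revolution,
    so the mesh harmonics sit at bins n_teeth, 2*n_teeth, ...
    """
    x = np.asarray(revolution, dtype=float)
    spectrum = np.fft.rfft(x)
    spectrum[0] = 0.0                                           # remove DC
    spectrum[np.arange(n_teeth, spectrum.size, n_teeth)] = 0.0  # remove mesh harmonics
    return np.fft.irfft(spectrum, n=x.size)


def tooth_rms(residual, n_teeth):
    """RMS of the residual over the angular segment belonging to each tooth (0-based)."""
    segments = np.array_split(np.asarray(residual, dtype=float), n_teeth)
    return np.array([np.sqrt(np.mean(seg ** 2)) for seg in segments])
```

For example, a synthetic 20-tooth revolution of 1000 samples, with mesh tones plus a short local bump in segment 10, yields its largest tooth RMS at index 10. Trending that maximum over successive surveys would give a crude fault growth indicator, but it still leaves open the question of when to intervene.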
This graph illustrates that we can be very conservative in our decisions by operating at the left side of the graph, or very “adventurous” by operating at a high risk level on the right side, or anywhere in between the two extremes. Note how a very conservative policy carries a high risk of elevated cost and poor availability: we tend to panic too quickly when we see some high values in our CBM data. At the other extreme, if we decide to “live dangerously” near the right side of the graph, we also incur a high risk of low availability (if the MTTR for a failure is much higher than for a PM) and high cost. Hence we pose two additional questions: Where on that line do we want to operate? And how do we set our CM action limits to the appropriate level of risk?
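The same U-shaped tradeoff appears in the classical age-replacement model, used here as a swapped-in illustration (the slide concerns condition-based limits, not age limits). Replacing early (conservative) pays for many preventive renewals; replacing late (adventurous) pays for many failures. A minimal sketch assuming a Weibull life distribution and hypothetical costs:

```python
import numpy as np


def age_replacement_cost_rate(t_grid, beta, eta, cost_pm, cost_failure):
    """Long-run cost per unit working age for 'replace at age T or at failure':

        C(T) = [Cp * R(T) + Cf * (1 - R(T))] / integral_0^T R(u) du

    with Weibull reliability R(t) = exp(-(t/eta)**beta). t_grid must be increasing and > 0.
    """
    t = np.asarray(t_grid, dtype=float)
    R = np.exp(-(t / eta) ** beta)
    # Cumulative trapezoidal integral of R from 0, where R(0) = 1
    t_full = np.concatenate(([0.0], t))
    R_full = np.concatenate(([1.0], R))
    areas = 0.5 * (R_full[1:] + R_full[:-1]) * np.diff(t_full)
    expected_cycle_length = np.cumsum(areas)
    return (cost_pm * R + cost_failure * (1.0 - R)) / expected_cycle_length
```

With, say, β = 3, η = 1000 and a failure costing ten times a preventive renewal, the minimum falls well inside the range; both a very short and a very long replacement age cost more per unit of working age.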
The slide illustrates an EXAKT decision policy. The green, yellow, and red graph summarizes the significant risk factors that must be interpreted by a CBM policy. The vertical axis is the risk-weighted sum of the monitored variables found to be significant risk factors. The horizontal axis is the item’s working age (measured in some appropriate engineering unit, e.g. tons of ore crushed). To the right of the graph are various optimizing objectives. Below the graph is a table indicating the optimal tradeoff between preventive maintenance and bottom-line cost. A preventive repair is (on average) less costly than a repair action provoked by a functional failure... These factors affect the shape and position of the decision graph boundaries.
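EXAKT is generally described as being built on a proportional hazards model (PHM) with a Weibull baseline, in which the risk-weighted sum of the significant covariates scales the age-dependent hazard. The sketch below is a simplified illustration, not EXAKT's implementation: the hazard threshold `d`, the look-ahead `horizon`, and the green/yellow/red rule are hypothetical, and covariates are assumed to stay constant until the next inspection.

```python
import math


def hazard(age, z, beta, eta, gammas):
    """Weibull PHM hazard: h(t, z) = (beta/eta) * (t/eta)**(beta - 1) * exp(sum(gamma_i * z_i))."""
    risk = sum(g * zi for g, zi in zip(gammas, z))  # the risk-weighted sum of covariates
    return (beta / eta) * (age / eta) ** (beta - 1) * math.exp(risk)


def zone(age, z, beta, eta, gammas, d, horizon):
    """Hypothetical traffic-light rule.

    red:    the hazard already exceeds the threshold d
    yellow: it would exceed d within `horizon` age units if covariates stay constant
    green:  otherwise
    """
    if hazard(age, z, beta, eta, gammas) >= d:
        return "red"
    if hazard(age + horizon, z, beta, eta, gammas) >= d:
        return "yellow"
    return "green"
```

Because β > 1 in a wearout case, the same covariate level becomes riskier as the item ages, so in this model the boundary between zones slopes downward with working age.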
Expert systems support maintenance decisions. This flow chart describes the operation of DLI Engineering’s ExpertALERT CBM decision making system. The diagram traces the flow of information through the signal processing steps (steps 1-5) and the decision making procedure (step 6) that uses a rule-based expert system.
Step 3 performs a cepstrum transformation[3] of the FFT spectrum. A cepstrum plot highlights series of evenly spaced peaks in the spectrum. These are called harmonics. Harmonics can be synchronous (multiples of shaft speed) or non-synchronous. The ExpertALERT algorithm searches the spectrum for non-synchronous harmonics and any sidebands. If found, they are flagged as possible bearing tones, to be processed further in steps 5 and 6. The physics of each situation dictates the signal processing method selected. Non-synchronous peaks, such as those at 3.61 and 7.22 orders, are candidates for “bearing tones” that signal bearing faults. If, in addition, the non-synchronous peaks display sidebands spaced at shaft speed, an inner race defect is likely. The bottom schematic illustrates the physical explanation for bearing tones and the appearance of sidebands, with respect to an inner race spall.
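A real cepstrum can be computed in a few lines of NumPy. This minimal sketch assumes a synthetic signal with a family of ten harmonics spaced 64 Hz apart (standing in for a bearing-tone family) and shows that the cepstrum concentrates the whole family into one peak at the quefrency 1/64 s. The relative floor inside the logarithm, the search window, and the function names are illustrative choices.

```python
import numpy as np


def real_cepstrum(x, floor=1e-12):
    """Real (power) cepstrum: inverse FFT of the logarithm of the power spectrum."""
    X = np.fft.rfft(np.asarray(x, dtype=float))
    power = np.abs(X) ** 2
    log_power = np.log(power + floor * power.max())  # relative floor avoids log(0)
    return np.fft.irfft(log_power, n=len(x))


def peak_quefrency(ceps, fs, min_spacing_hz, max_spacing_hz):
    """Quefrency (s) of the largest cepstral peak for harmonic spacings in the given range."""
    q_lo = int(fs / max_spacing_hz)
    q_hi = int(fs / min_spacing_hz)
    idx = q_lo + int(np.argmax(ceps[q_lo:q_hi + 1]))
    return idx / fs
```

The reciprocal of the returned quefrency is the spacing of the harmonic family (64 Hz for the synthetic signal described above), which is what a rule base would compare with shaft speed to separate synchronous from non-synchronous families.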
Demodulation (also called “envelope detection”) is a signal processing technique used by ExpertALERT to supplement and verify the information drawn from the cepstrum and spectrum analyses; it provides an independent confirmation of bearing defects. If there is a spall on a bearing race, each time a ball passes over it, the impact “rings” the bearing, causing it to resonate at high frequencies. The resulting vibration can be demodulated to extract the forcing frequency that causes the ringing. The forcing frequencies appear as peaks in the demodulated spectrum (bottom left). If they match the bearing tones from the screening matrix and the cepstrum, they provide further confirmation of a bearing defect. A distinct advantage of demodulation is that high frequencies do not travel far in a machine, so the demodulation process can localize the defective bearing. For example, if you see bearing tones at the same frequency in the narrow band spectral data from two different locations on the machine, and the demodulated data has matching peaks at one location but not the other, you can assume that the location with the matching demodulated peaks is the one with the bearing problem. The four spectra illustrate this point precisely.
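Envelope detection can be sketched with an FFT-based band-pass and the analytic signal. This minimal illustration assumes an idealized amplitude-modulated carrier (a 2000 Hz resonance modulated at a hypothetical 87 Hz defect frequency) rather than true impact ringing; the band limits and the function name are illustrative.

```python
import numpy as np


def envelope_spectrum(x, fs, band):
    """Band-pass around a resonance (by FFT masking), take the envelope, return its spectrum.

    Keeping only positive frequencies inside `band`, doubled, and inverting the FFT
    gives the analytic signal of the band-passed vibration; its magnitude is the envelope.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    weights = np.where((freqs >= band[0]) & (freqs <= band[1]), 2.0, 0.0)
    envelope = np.abs(np.fft.ifft(X * weights))
    envelope -= envelope.mean()  # drop DC so the forcing frequency stands out
    spectrum = np.abs(np.fft.rfft(envelope)) / n
    return np.fft.rfftfreq(n, d=1.0 / fs), spectrum
```

The peak of the returned spectrum sits at the modulating (forcing) frequency, not at the carrier, which is why the demodulated spectrum can be compared directly with the bearing tones.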
The amounts by which the values in the CSDM (step three of slide 29) exceed the threshold values (set up in the rules on the basis of experience and knowledge) are scored and converted into a relative severity. This provides a normalized scale on which to judge the state of health of each component, so the relative severities of all components in the equipment can be trended on a single graph (top right). The graph provides a decision support tool for performing corrective action on a component whose severity is high or has increased substantially. In the following section, we propose to extend the automated diagnosis one step further in order to estimate remaining life and provide an optimized repair decision.
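The scoring step might look something like the following. Everything here is a hypothetical sketch: the rule names, the fractional-exceedance formula, the optional weights, and the grade bounds are our assumptions, not ExpertALERT's rules.

```python
def relative_severity(measurements, thresholds, weights=None):
    """Hypothetical score: weighted sum of fractional exceedances (value - limit) / limit
    over the rules whose threshold is exceeded."""
    weights = weights or {}
    score = 0.0
    for name, value in measurements.items():
        limit = thresholds[name]
        if value > limit:
            score += weights.get(name, 1.0) * (value - limit) / limit
    return score


# Hypothetical grade bounds on the normalized scale
GRADE_BOUNDS = ((0.0, "no fault"), (0.25, "slight"), (1.0, "moderate"),
                (2.0, "serious"), (4.0, "extreme"))


def grade(score):
    """Return the highest grade whose lower bound the score reaches."""
    label = GRADE_BOUNDS[0][1]
    for lower_bound, name in GRADE_BOUNDS:
        if score >= lower_bound:
            label = name
    return label
```

Because every component is scored on the same dimensionless scale, the scores from successive surveys can be plotted together as one trend.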
Following step 6, the automated diagnostic tools hand over their findings to the human decision makers. Can we process each diagnosed fault and its severity one step further to provide (a) a residual life estimate for each failure mode, and (b) an optimized decision whether i. to effect repair immediately, ii. to repair within a particular time period from the current time, or iii. to continue operation until the next inspection? DLI and OMDEC have teamed up to extend a predictive system into a prognostic one.
The photos illustrate the ABB demo fault simulator, in which a fault proceeds to functional failure while being monitored by CBM decision agents such as EXAKT and ExpertALERT.[4]
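A residual-life estimate of this kind can be sketched by assuming a Weibull proportional hazards model with the risk-weighted covariate sum held constant: solve for the age increment at which the conditional reliability drops to a chosen level, then map that increment onto the three CBM choices. The reliability target, lead time, inspection interval, and decision rule below are all hypothetical.

```python
import math


def residual_life(age, risk, beta, eta, reliability):
    """Age increment x such that P(survive to age + x | survived to age) = reliability.

    Weibull PHM with constant risk-weighted covariate sum `risk`:
        R(age + x | age) = exp(-[((age + x)/eta)**beta - (age/eta)**beta] * exp(risk))
    Solving for x gives the closed form used below (0 < reliability < 1).
    """
    cumulative = (age / eta) ** beta - math.log(reliability) * math.exp(-risk)
    return eta * cumulative ** (1.0 / beta) - age


def decide(x, lead_time, inspection_interval):
    """Map a residual life estimate onto the three CBM choices (hypothetical rule)."""
    if x <= lead_time:
        return "intervene immediately"
    if x <= inspection_interval:
        return "plan repair"
    return "continue to next inspection"
```

A higher risk-weighted covariate sum shortens the estimated residual life, which is how the diagnosed fault severity would feed the decision.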
What conclusions can we draw from the foregoing?
The slide illustrates the two feedback “control” loops of physical asset management. Their ultimate output achieves the corporate vision. Each feedback arrow represents a management function: (1) adjusting maintenance policy in response to KPI achievement gaps, and (2) adjusting KPI targets in response to vision achievement gaps.
Mastery of the processes represented by the arrows of the previous slide poses the greatest challenge to asset performance management. The next generation of maintenance performance management software will dissect every KPI into its constituent incidences and knowledge elements (as in this slide). The flow diagram illustrates that historical data (contained in plant systems) fuel reliability analyses such as Pareto, age-reliability relationships, and optimal CBM decision graphs[5]. Those methodologies are “shells” that need to be populated systematically with knowledge and experience. That done, they steer us towards improved maintenance policies. The CMMS, the control system historian, CBM databases, and other plant systems feed information to the performance management system. The performance management system, in the hands of the physical asset manager, outputs continually improving physical asset management policies. Today, the maintenance world stands at the threshold of bridging two remaining gaps that impede “excellence” in asset performance management:
1. CMMS work orders do not yet systematically record reliability-centered knowledge; and
2. the RCM knowledge base is not yet fully integrated with the CMMS, process historian, and CBM databases.
With these final capabilities in hand, we may anticipate rewarding times ahead for physical asset management. We have the skills, the desire, and the plan. Let us actually begin the journey to OEE improvement at lowest cost in each of our enterprises.
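Dissecting a KPI into its incidences can start with something as simple as a Pareto table. A minimal sketch, assuming hypothetical work order records reduced to `(failure_mode, downtime_hours)` pairs, with failure modes named in the noun-plus-verb style of footnote [2]:

```python
from collections import defaultdict


def pareto(work_orders):
    """Rank failure modes by total downtime, with cumulative percentage.

    work_orders: iterable of (failure_mode, downtime_hours) pairs (hypothetical format).
    Returns a list of (failure_mode, hours, cumulative_percent), largest first.
    """
    totals = defaultdict(float)
    for mode, hours in work_orders:
        totals[mode] += hours
    grand_total = sum(totals.values())
    rows, running = [], 0.0
    for mode, hours in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += hours
        rows.append((mode, hours, 100.0 * running / grand_total))
    return rows
```

The same grouping works for cost or lost production; the table is only as good as the failure mode field recorded on each work order.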

[2] Failure modes should consist of a noun and a verb (possibly in the passive form), usually followed by a clause, such as “due to …”, describing the appropriate causality level for the failure in question.

[3] One may say, in a general sense, that the more harmonics and sidebands present, the worse the condition of the bearing. Thus, not only does one wish to know whether a peak is part of a larger family of peaks, one also wants an idea of how much energy the series contains. Cepstrum analysis is used to automate this task. The cepstrum is the power spectrum of the logarithm of the power spectrum of a waveform; therefore, any periodicities in the spectrum (such as harmonic series or sideband families) appear clearly as peaks in the cepstrum.

[5] These may be called age-reliability-significant factor relationships