Interview with Dragan Banjevic
Q: Dr. Banjevic, I understand that EXAKT uses past failures to develop a predictive model. Does that mean that companies must have many catastrophic functional failures before they can use your program?
A: I think that you have expressed some of the usual and basic concerns - data availability and data quality. Here are two typical situations that we have encountered.
Case 1. You have a single asset, say a pump, that has been operating for 30 years without failure. For this pump you will probably have a lot of condition data (for example vibration, flow rates, motor current, etc.) taken at regular intervals, but no failure data. Alternatively, you have a brand new pump of a new design with which you have no experience at all.
Case 2. You have equipment and/or fleets of similar equipment. Over the years, you have accumulated large databases (or files of paper reports) containing condition monitoring data. During the same period you will have operated a CMMS, and you will have recorded (more or less accurately) the failures that occurred and the maintenance that has been performed on these assets.
Do these two cases just about cover the range of situations that you are familiar with?
Q: Yes, I think the second case is more common. Would you address Case 2 first?
A: Certainly. In most plants, some functional failures and numerous potential failures[1] do occur. The following would be a typical scenario for the development of one or more CBM optimization models:
1. You have a machine (or sometimes a fleet of machines).
2. Over time you record various measurements on a periodic (daily, weekly, monthly, etc.) basis. For example: load, vibration, amperage, phase, or whatever else may be appropriate. Those readings would also include working age measured in some service usage unit that describes the accumulating stress on the machine, say fuel consumed or widgets produced. In EXAKT we call each set of measurements, taken at more or less regular intervals, an Inspection.
3. Once in a while you see some anomaly in the data, and you feel that you should do a deeper (more intrusive) "Inspection". Or, you (i.e. the maintainer) perform a time-based maintenance task. In either case you physically inspect one or more components in the machine. You find that one of the components is in a failing state. You have, thus, discovered a potential failure.[2] You record this observation in the CMMS as an event which you might name "EFP1" (ending with potential failure type 1 – a potential failure of component X or of failure mode Y, for example).
4. You repeat steps 1 to 3 over time. That is how you normally accumulate a "sample" of condition and event data. (By the way, you are making use of an important function of your CMMS by populating it with this type of data. After all, you paid good money for the CMMS. Why not use its historical data recording capabilities to their fullest?[3])
5. Sometimes (as will happen) you will have missed detecting a potential failure soon enough, and you will experience a real (functional) failure. This, as well, becomes part of your historical database (i.e. your sample).
6. Over time you will have experienced several failure modes at the potential failure stage, and perhaps one or two actual functional failures. (Now, at last, you have a good sample). You analyze this sample in EXAKT and you build a model that can be used for automated prediction (residual life estimation) and optimal CBM decision making.
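The sample accumulated in steps 1 to 6 can be pictured as two kinds of records: Inspections and events. The following is only a minimal sketch in Python, not EXAKT's actual data schema. The class names, field names, asset identifiers, and readings are all hypothetical, and the event code "EF" for a functional failure is an assumption made for illustration (only "EFP1" appears in the discussion above).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Inspection:
    """One set of condition readings taken at a given working age."""
    asset_id: str
    working_age: float          # service usage unit, e.g. fuel consumed
    readings: Dict[str, float]  # e.g. {"vibration": 2.1, "amperage": 41.0}


@dataclass
class Event:
    """One CMMS event that ends a component's life in the sample."""
    asset_id: str
    working_age: float
    code: str  # "EFP1" as in step 3; "EF" is a hypothetical functional-failure code


def count_endings(events: List[Event]) -> Dict[str, int]:
    """Count potential-failure endings versus functional-failure endings."""
    counts: Counter = Counter()
    for e in events:
        if e.code.startswith("EFP"):
            counts["potential"] += 1
        elif e.code == "EF":
            counts["functional"] += 1
    return dict(counts)


# Illustrative (made-up) sample: mostly potential failures, one functional failure.
inspections = [
    Inspection("PUMP-01", 1200.0, {"vibration": 1.8, "amperage": 40.5}),
    Inspection("PUMP-01", 2400.0, {"vibration": 3.9, "amperage": 43.0}),
]
events = [
    Event("PUMP-01", 2450.0, "EFP1"),
    Event("PUMP-02", 3100.0, "EFP1"),
    Event("PUMP-03", 5200.0, "EF"),
]
print(count_endings(events))
```

A sample like this one, dominated by potential failures with perhaps one functional failure, is the kind described in step 6.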
The important point to note in this hypothetical sequence is that model building using EXAKT does not require you to have endured catastrophic or expensive functional failures. EXAKT was designed to extend current CBM decision making capability. The results of whatever current methods are being used to record condition data and event data may be analyzed by EXAKT in order to build an optimal CBM data interpretation model. That model can then be used as a policy (i.e. an alarm limit) for the future detection of a specific failure mode while it is in its “potential failure” stage.
Of course in the real world, maintainers have not recorded failures, potential failures, and other events as carefully as they perhaps would have, had they known about EXAKT's data analysis capabilities. Not to worry. EXAKT contains many data checking and validation procedures that help us "clean" our (less than meticulous) data. Usually, we are able to analyze that data and provide the maintenance department with a good predictive model. Or, at the very least, with some fresh new ideas on how to improve the effectiveness of their current CBM program. Tutorials 2, 3, and 4 on the OMDEC website[4] demonstrate some of our data cleansing techniques.
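As an illustration of what a data check can look like, here is a minimal Python sketch of one plausible rule: working age recorded for an asset should never decrease from one record to the next. This is an assumed example of a validation rule, not a description of EXAKT's own procedures; the function name and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple


def find_age_reversals(records: List[Tuple[str, float]]) -> List[int]:
    """Return the indices of records whose working age is lower than the
    last accepted working age for the same asset."""
    last_age: Dict[str, float] = {}
    suspect: List[int] = []
    for i, (asset_id, working_age) in enumerate(records):
        if asset_id in last_age and working_age < last_age[asset_id]:
            suspect.append(i)  # flag for review; keep the last accepted age
        else:
            last_age[asset_id] = working_age
    return suspect


# (asset_id, working_age) pairs in the order they were entered (made-up data).
records = [
    ("PUMP-01", 100.0),
    ("PUMP-01", 250.0),
    ("PUMP-01", 180.0),  # likely a keying error: age went backwards
    ("PUMP-01", 300.0),
    ("PUMP-02", 50.0),
]
print(find_age_reversals(records))  # [2]
```

Flagged records would be reviewed and corrected by someone who knows the equipment, rather than deleted automatically.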
Q: Nevertheless, building a database can take a long time.
A: Whatever you do, the clock will tick and years will elapse. Either, during that time, you use standard procedures[5] to record what happened, or you populate your CMMS history database haphazardly. Opting for the former adds negligible cost, but confers, in the short term, expanded awareness and better communication among your maintainers, operators, supervisors, and engineers. In the longer term, good historical information offers understanding through analysis.
Q: What about the first case you mentioned – when no failures have ever occurred on a piece of equipment?
A: EXAKT offers two solutions, one for each of two possible situations:
1. If you have some expert knowledge about the failure of the pump from the maintenance personnel or from the OEM, or you have some failure data from a similar pump (e.g., an earlier design of pump that you have used in the past), the Bayesian approach would be the most appropriate solution. EXAKT’s upcoming version implements Bayesian modeling. That is, it incorporates expert judgment of the relative risks associated with various condition indicators to build a prior model. EXAKT, subsequently and continuously, updates the model as actual failure or potential failure data accrues.
2. In a second situation, let’s assume that you know nothing about the failure of the pump. The Bayesian approach can still be applied by assuming a non-informative prior distribution for the CBM model parameters. As in the first situation, EXAKT continuously updates the model as operational, condition monitoring, and failure data accumulate. Of course, a model based on a non-informative prior distribution will initially have no predictive value. Until the model evolves, the best we can do is to apply statistical process control methods or judgment limits to certain “features” of vibration, oil analysis, or other CBM data. In other words, we do CBM in the usual, traditional way.
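To make the difference between the two situations concrete, here is a deliberately simplified Python sketch of Bayesian updating. It is not EXAKT's model: it assumes a constant failure rate with a gamma prior (a textbook conjugate pair) and ignores condition indicators altogether. The class name, the prior parameters, and the observed data are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about a constant failure rate per working-age unit."""
    shape: float
    rate: float

    def update(self, failures: int, exposure: float) -> "GammaBelief":
        """Conjugate update: add observed failure count and accumulated working age."""
        return GammaBelief(self.shape + failures, self.rate + exposure)

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def relative_uncertainty(self) -> float:
        """Standard deviation divided by mean, which is 1 / sqrt(shape)."""
        return 1.0 / math.sqrt(self.shape)


# Situation 1: expert judgment, assumed here to be "about 1 failure per
# 10,000 hours", weighted as if it were worth 2 observed failures.
expert_prior = GammaBelief(shape=2.0, rate=20_000.0)

# Situation 2: a vague (nearly non-informative) prior.
vague_prior = GammaBelief(shape=0.001, rate=0.001)

# Hypothetical new evidence: 1 failure or potential failure in 30,000 hours.
expert_post = expert_prior.update(failures=1, exposure=30_000.0)
vague_post = vague_prior.update(failures=1, exposure=30_000.0)

print(f"vague prior relative uncertainty: {vague_prior.relative_uncertainty:.1f}")
print(f"expert posterior mean rate: {expert_post.mean:.2e}")
print(f"vague posterior mean rate:  {vague_post.mean:.2e}")
```

Before any data arrive, the vague prior's uncertainty is many times larger than its mean, which is the sense in which such a model has no predictive value at first. As events accrue, both beliefs are pulled toward what the data show.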
Q: Are you saying that we must revert to our existing CBM procedures until data becomes available?
A: Not quite. The EXAKT approach provides two distinct advantages over previous CBM methods:
The first is that EXAKT measures, monitors, and reports on the effectiveness of the evolving predictive model. This provides maintenance managers with a clear picture of whether and how their CBM programs are improving.
Secondly, and even more importantly, the EXAKT methodology imposes a novel business discipline on the maintenance data acquisition process itself. Technicians, reliability engineers, and managers alike quickly experience the benefits of having understood and duly recorded the five RCM knowledge elements[6] prior to closing each work order.
Q: If operating conditions, rates, materials, and environmental factors all change from their values in the past, how good will the results of the model be when applied in the future?
A: A gut response to that question might be, “No good at all!” But if we stop to consider the nature of a model, we discover that the picture is not as black as that. Consider the internal indicators that we include in a model – vibration features, throughput, wear particle size and quantity, component temperature, and so on. Then consider the range of circumstances that occurred in the past with regard to these variables and their relationship to a targeted failure mode. Although external conditions may have changed, the internal physics associated with a failure mode, captured in the statistical model, are still valid. If, however, the new conditions provoke entirely new failure modes that have never occurred, the model cannot predict those new failure modes because the sample upon which it was built contains no failures or potential failures of that kind.
[1] A potential failure is an indication that a failure process is underway. Unlike a functional failure, a potential failure has no dire consequences, beyond those involved in applying a proactive fix.
[3] Interviewer’s note: The article “Data Strategy” describes how to use your CMMS in this way.
[4] Under menu item “CBM Optimization”
[5] See chapters 1, 2, and 3 of Reliability-centered Knowledge.
[6] The first five RCM knowledge elements (i.e. questions) are: “What function was lost or compromised?”, “In what way (e.g. full, partial, functional or potential failure) was it lost?”, “Why?”, “What happened?”, and “How did it m