The following questions appeared recently
(July 2004) on the PlantMaint list service in response to the "Elusive P-F Curve" article that was
published at Reliabilityweb.com. http://www.reliabilityweb.com/art04/p-f_curve.htm
Question 1. Referring to your graph on the right, does EXAKT ignore “high points” that occur at a lower age and therefore do not cross over the green-to-red threshold? For example let’s say you have a sudden high vibration reading but it is still in the green region. Isn’t it dangerously negligent to ignore such a signal?
Although EXAKT does not recommend a renewal for a
“high point” that is still in the green region, the
“abnormal” vibration reading may indeed be caused by problems that are either
not directly related to the risk of failure or not related to the failure mode
addressed by the model. For example, the item being monitored may be a complex
item, one that is subject to more than one failure mode. If that is the case,
the reliability engineer may elect to build a second decision model and arm the
EXAKT agent with it. The agent will then apply the second model in sequence.
That model targets a different failure mode, whose risk of failure may be more
dominantly reflected by vibration[1].
Alternatively, the reliability engineer may
manually add warning limits to the model to suggest, for example, a more
intrusive inspection of the unit, or a calibration check on the instrumentation
used in collecting the vibration data. In EXAKT, this feature sets up
descriptive warning limits that will be superimposed upon the optimal decision
graph. The warning limits give maintenance personnel additional guidance on
when to act[2].
A final point to note is that the CBM Lab
designed EXAKT to extend current methods in on-condition maintenance.
Thus, if human interpretation of monitored data reveals a new failure mode, not
addressed by the current EXAKT model, the human expert may elect to update or
add this ‘new’ experience to the intelligent decision agent’s ‘knowledge base’.[3]
The “cost comparison” function in EXAKT provides a means of monitoring how a
model’s effectiveness (towards the attainment of an objective) improves as
time proceeds and new experience accrues.
The shape of the boundary curve in the
decision graph depends on the value of the shape parameter[4]
in the PHM (proportional hazards model). If the shape parameter equals 1, the
boundary curve will be a horizontal line. The graph shown above represents the
situation when the shape parameter is much greater than 1. In this instance,
the working age dominates the other significant risk factors, such as vibration
level, that you refer to in your question.
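The dependence of the boundary shape on the shape parameter can be sketched numerically. The example below is only an illustration: it assumes the usual Weibull PHM form with a single covariate Z, and the names `beta` (shape), `eta` (scale), `gamma` (covariate coefficient, taken as positive) and `h_star` (threshold hazard), together with all numeric values, are assumptions rather than anything taken from an actual EXAKT model.

```python
import math

def boundary_z(age, beta, eta, gamma, h_star):
    """Covariate value Z at which the assumed Weibull PHM hazard equals h_star.

    Assumed hazard: h(t, Z) = (beta/eta) * (t/eta)**(beta - 1) * exp(gamma * Z).
    Setting h(t, Z) = h_star and solving for Z gives the expression below.
    """
    if age <= 0:
        raise ValueError("age must be positive")
    return (math.log(h_star * eta / beta)
            - (beta - 1.0) * math.log(age / eta)) / gamma

# Illustrative ages and parameters (hypothetical, not from a fitted model).
ages = [100.0, 500.0, 1000.0, 2000.0]
flat = [boundary_z(a, beta=1.0, eta=1000.0, gamma=0.5, h_star=0.002) for a in ages]
steep = [boundary_z(a, beta=3.0, eta=1000.0, gamma=0.5, h_star=0.002) for a in ages]

for a, f, s in zip(ages, flat, steep):
    print(f"age={a:7.1f}  Z_boundary(beta=1)={f:7.3f}  Z_boundary(beta=3)={s:7.3f}")
```

Under these assumptions the boundary for a shape parameter of 1 is the same at every age (a horizontal line), while for a shape parameter of 3 it falls as age increases, so a vibration level that is tolerated early in life can trigger a renewal later.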
To explain this more clearly, we can treat
the age itself as a risk factor (the actual variable is log(age)) in the model.
Now we have two risk factors: the age and the vibration level. If
we fit the model[5] with shape
parameter fixed to one[6],
the model will be the same as one fitted with only a single risk factor, the
vibration level. (Of course, in the retained model, the shape parameter is not
fixed to 1.) The advantage of using a model that treats age as a risk factor is
that we can see the effect of age (compared with other factors) on the risk of
failure.
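As a worked illustration of treating log(age) as a risk factor, assume the usual Weibull PHM form with one covariate Z. The symbols η (scale), β (shape) and γ (coefficient of Z) are notation introduced here for the sketch, not taken from the article:

```latex
\begin{align*}
h(t, Z) &= \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} e^{\gamma Z} \\
        &= \frac{\beta}{\eta^{\beta}} \exp\bigl((\beta-1)\ln t + \gamma Z\bigr)
\end{align*}
```

Written this way, ln t enters the exponent like any other covariate, with coefficient β − 1. Fixing β = 1 sets that coefficient to zero and leaves h(t, Z) = (1/η) exp(γZ), a model driven by the vibration level alone.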
Using the actual EXAKT model, however,
we observe (by noting the shape of the boundary curve) that the shape
parameter is, in fact, greater than 1. That says that the risk of failure is
dominated by the age factor. It does not say that the vibration level is
not important. Instead, it says that the vibration level is not as
important as the age factor. Hence, in this instance, where a high vibration
level does appear at an early age, the decision graph in EXAKT advises that there is no need to renew[7]
the unit.
As a general response, the statistical, mathematical, and logical robustness
of EXAKT’s algorithms passes muster in the open literature. Professional and
academic peer-reviewed journals have published many papers that describe the
CBM Lab’s theoretical and practical development work in EXAKT. Here are just a
few of those references:
- Jardine, A.K.S., Anderson, P.M. and Mann, D.S. (1987). “Application of the Weibull proportional hazard model to aircraft and marine engine failure data”, Quality & Reliability Engineering International, 5, pp. 77-82.
- Jardine, A.K.S., Banjevic, D., Wiseman, M., Buck, S. and Joseph, T. (2001). “Optimizing a mine haul truck wheel motor’s condition monitoring program: use of proportional hazards modeling”, Journal of Quality in Maintenance Engineering, Vol. 7, No. 4, pp. 286-301.
- Makis, V. and Jardine, A.K.S. (1992). “Optimal replacement in the proportional hazards model”, INFOR, Vol. 30, pp. 172-183.
- Vlok, P.J., Coetzee, J.L., Banjevic, D., Jardine, A.K.S. and Makis, V. (2002). “Optimal Component Replacement Decisions using Vibration Monitoring and the PHM”, Journal of the Operational Research Society, 53, pp. 193-202.
- Lin, D., Wiseman, M., Banjevic, D. and Jardine, A.K.S. (2004). “An approach to signal processing and condition-based maintenance for gearboxes subject to tooth failure”, Mechanical Systems and Signal Processing, Vol. 18, No. 5, September, pp. 993-1007.
As a more specific answer, we may discuss
some of the basic algorithms used in EXAKT. The first step in the development
of an EXAKT optimal decision model requires that we build a “proportional
hazards model”. The process consists of applying each variable monitored by a
CBM task to the EXAKT model creation process. In the procedure we test whether
a monitored variable does indeed correlate with past failures (functional or
potential). If it does, then that variable becomes a candidate for our
proportional hazards model. EXAKT uses an algorithm based on the Maximum
Likelihood Estimation method to fit the model to the available data. It then
uses extensive statistical testing to verify how good the resulting model
is. The figure below shows the window in EXAKT where these tests are initiated.
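To give a feel for what fitting by maximum likelihood involves, here is a much simplified sketch. It is not EXAKT's algorithm: it assumes a Weibull PHM with a single covariate that stays constant over each history, the histories are invented for illustration, and a coarse grid search stands in for a proper numerical optimizer. The names `log_likelihood`, `records`, `beta`, `eta` and `gamma` are all hypothetical.

```python
import math

def log_likelihood(records, beta, eta, gamma):
    """Log-likelihood of an assumed Weibull PHM with one time-constant covariate.

    Each record is (age, z, failed): the age at which the history ended, the
    covariate value, and True for a failure or False for a suspension.
    """
    total = 0.0
    for age, z, failed in records:
        if failed:
            # Log of the hazard at the failure age.
            total += (math.log(beta / eta)
                      + (beta - 1.0) * math.log(age / eta)
                      + gamma * z)
        # Minus the cumulative hazard accumulated up to the end of the history.
        total -= (age / eta) ** beta * math.exp(gamma * z)
    return total

# Hypothetical histories, invented purely for illustration.
records = [
    (800.0, 1.2, True),
    (1200.0, 0.4, True),
    (1500.0, 0.3, False),
    (600.0, 2.0, True),
    (1100.0, 0.9, True),
]

# Coarse grid of candidate (beta, eta, gamma) values.
candidates = [
    (b, e, g)
    for b in (1.0, 1.5, 2.0, 3.0, 4.0)
    for e in (1000.0, 1500.0, 2000.0, 3000.0)
    for g in (0.0, 0.5, 1.0, 1.5)
]
best = max(candidates, key=lambda p: log_likelihood(records, *p))
print("best (beta, eta, gamma) on the grid:", best)
```

In real condition monitoring data the covariates change between inspections, so the cumulative hazard term becomes an integral over each history rather than the closed form used here.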
Once the
PHM is built and tested, we proceed to build the decision model. Once that is
done, we invoke additional statistical methods for investigating and testing
the decision model. A snapshot of these features is shown below.
Finally,
once the reliability engineer is satisfied with the decision model, he or she
exports it onto the network, where it assumes the role of an “intelligent agent”
or “watchdog agent” that silently monitors new data as it arrives in one or
more databases[8]. The agent
generates its optimal interpretations as records in a database table. These
records may be accessed by the CMMS, ERP, or any other programs or humans
requiring the results of the CBM decision analysis.
Although
we have had several successful case studies with our industrial partners, the
final validation of the EXAKT methodology will be its wide acceptance and
implementation in industry. We are confident that this will come soon.
The boundary curve corresponds to the
optimal hazard level h*. In other words, all the points on the curve have the
same hazard level h*. You can imagine doing a hazard rate contour plot here.
The boundary curve is the contour with hazard level h*. In this example
decision graph, the smaller the age, the larger Z must be to reach
the same hazard level h*. This is why you observe a wider range of Z under the
boundary curve at earlier ages. At age zero, any finite value of Z leads to a
zero hazard level. So, to have the same hazard level h* at age zero, Z would have
to be infinite. According to the way the decision graph is meant to be read, this
does not mean that you are going to have infinite Z in reality. The boundary
curve just divides the whole space for (age, Z) into two parts: the area under
the curve and the area above the curve. If the point (age0, Z0) corresponding
to your readings lies in the area above (under) the curve, it means that the
current hazard level is above (below) the optimal hazard level h* and the
decision would be to replace (not to replace) the unit. For more detail on the
decision graph, please see “Optimal replacement policy and the structure of
software for condition-based maintenance”, Journal of Quality in Maintenance
Engineering, Vol. 3, No. 2, pp. 109-119.
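The way a reading (age0, Z0) is classified against the boundary can be sketched as follows. This is a minimal illustration that assumes a Weibull PHM hazard with one covariate and a shape parameter greater than 1. The function names `hazard` and `decide` and every parameter value are hypothetical, and treating a point exactly on the curve as "replace" is a choice made for the example.

```python
import math

def hazard(age, z, beta=3.0, eta=1000.0, gamma=0.5):
    """Assumed Weibull PHM hazard; all parameter values are illustrative."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return (beta / eta) * (age / eta) ** (beta - 1.0) * math.exp(gamma * z)

def decide(age, z, h_star=0.002):
    """Compare the current hazard with the optimal hazard level h_star."""
    return "replace" if hazard(age, z) >= h_star else "do not replace"

# The same covariate value leads to different decisions at different ages.
print(decide(age=200.0, z=4.0))
print(decide(age=1500.0, z=4.0))

# At age zero the hazard is zero for any finite Z when beta > 1.
print(decide(age=0.0, z=50.0))
```

With these assumed values the first reading falls below the curve, the second falls above it, and no finite Z can reach h* at age zero.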
[1] In which case that second model will have a boundary whose shape is closer to a horizontal straight line. Hence the output of EXAKT may consist of several optimal decision graphs, each monitoring a different component or failure mode. Each model will process the same (vibration) data, but in different ways. This is the correct way to treat complex items. (See the article “Using your CMMS for Reliability Improvement”.)
[2] For example, it may suggest a temporary increase in monitoring frequency
[3] This may be done either by incorporating the new “life cycle” into the current model, or by using the “marginal analysis” feature. In marginal analysis two or more models (corresponding to two or more respective failure modes) combine to cover the entire asset. Which of these two options (single or multiple model) to use depends on whether you consider the item “simple” or “complex”. If the item is complex, before updating the model, you need to judge whether the same or a different failure mode was active in ending the current life cycle (either by functional or potential failure).
[4] The shape parameter is estimated by the software using the Maximum Likelihood Estimation method. The shape parameter determines the influence of working age on failure risk. The other parameters estimated by the software determine the influence of the other factors (e.g. features extracted from a vibration signal) on risk of failure.
[5] By “fit” we mean estimate the parameters of the PHM so that the model fits the data with as “good” a fit as possible. (Later we test to see if the fit is “good enough”.)
[6] An EXAKT option that is used in the modeling process. We fix the shape parameter to 1 and test whether that hypothesis should be accepted or rejected. In the development process of an EXAKT CBM optimal model, this hypothesis, along with a number of others, is tested for statistical acceptability. Each hypothesis is either accepted or rejected.
[7] Repair or replace
[8] The decision model “knows” where (in which tables of which databases) to access the data it needs to run.