Daming Lin, Murray Wiseman, Dragan Banjevic, Andrew K. Jardine
CBM Lab, University of Toronto
Abstract
A Condition Based Maintenance (CBM) policy is a procedure used by maintenance personnel to interpret a set of measured machine condition indicators and decide whether or not to renew a physical asset at the current moment.
However, our ability to collect large amounts of condition data has continually outpaced our ability to define policies for interpreting them. Multiple measurement points in a process may be monitored, and these health indicators sometimes contradict one another. Upward or downward trends are frequently obscured by randomness in the data. In many instances, no clear set of limits or rules has been developed to indicate whether a failure process is underway and how much time remains before the physical asset can no longer perform one of its functions.
The EXAKT CBM optimizing methodology was applied to a set of experimental condition data measurements in order to develop an optimal interpretation process or policy. An optimal policy is a procedure for data interpretation, which, if applied consistently in a CBM program, minimizes the cost of maintenance of a physical asset or maximizes its availability.
1. Introduction
The MDTB makes a large number of condition indicators available for analysis, for example conventional vibration features such as acceleration amplitudes at various gear and bearing frequencies. In particular, the Fault Growth Parameter (FGP) was calculated from the residual error signal obtained by a signal processing algorithm developed at ARL (see Miller [4]), which we refer to henceforth as the ARL algorithm. In this algorithm, a family of wavelets is constructed to decompose the gear motion error signal and extract the residual error signal for gear fault detection. In addition, we proposed a modified version of FGP, called FGP1, which weights each point in the residual error signal spectrum in proportion to its deviation from a reference baseline. Besides FGP and FGP1, other useful indicators were extracted from the residual error signal. FGP1 was found to be superior to FGP and the other condition indicators for building the EXAKT model and making the replacement decision.
The eleven test runs were designated Test Run Numbers 05, 06, 07, 08, 09, 10, 11, 12, 13, 14 and 15. All gearboxes were run in at 540 in-lb torque for the first 96 hours of each test. Following the initial run-in period, the output torque was increased as follows:
Table 1
Vibration acceleration readings were taken at 8-hour intervals during the 96-hour run-in period and at 30-minute intervals during the high-load 'operational' phase. Each reading was 10 seconds long and sampled at 20 kHz. Accelerometers were located at various positions on the gearbox casing.
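The acquisition figures above fix the size of the raw data. The short Python sketch below only does the arithmetic: samples per 10-second record and readings per day in each phase. The constant and function names are ours, chosen for illustration.

```python
# Arithmetic implied by the acquisition protocol described above.
SAMPLE_RATE_HZ = 20_000        # 20 kHz sampling rate
RECORD_SECONDS = 10            # each reading lasts 10 seconds
RUN_IN_INTERVAL_H = 8.0        # one reading every 8 hours during run-in
OPERATIONAL_INTERVAL_H = 0.5   # one reading every 30 minutes afterwards


def samples_per_record(rate_hz: int, seconds: int) -> int:
    """Number of acceleration samples in one reading."""
    return rate_hz * seconds


def readings_per_day(interval_h: float) -> float:
    """Number of readings taken in 24 hours at a fixed interval."""
    return 24.0 / interval_h


if __name__ == "__main__":
    print(samples_per_record(SAMPLE_RATE_HZ, RECORD_SECONDS))  # 200000
    print(readings_per_day(RUN_IN_INTERVAL_H))                 # 3.0
    print(readings_per_day(OPERATIONAL_INTERVAL_H))            # 48.0
```

Each reading therefore holds 200,000 samples per accelerometer, and the operational phase yields sixteen times as many readings per day as the run-in phase.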
In this paper, we analyze the data set described above and apply the EXAKT CBM optimizing methodology to develop optimal maintenance policies for the gearboxes. The paper is organized as follows. In Section 2, a signal processing technique is used to extract useful information from the raw vibration signals; from this information we obtain the event and inspection data that are essential for applying EXAKT. Data cleaning and pre-processing are also covered in this section. In Section 3, the EXAKT software is used to analyze the data, build proportional hazards models (PHMs) for the gearboxes and develop optimal maintenance policies for them. Finally, in Section 4, the results are summarized and some concluding remarks are given.
2. Data pre-processing and analysis
2.1 Vibration Signal Processing
The goal of signal processing in CBM is to filter out of the signal as much of the operational and environmental influence as possible, so that the magnitude of the remaining signal reflects the “ground truth” state of deterioration for the targeted failure mode. We modified the definition of FGP by assigning weights to the residual error signal points that deviate from the baseline residual by more than three standard deviations, with each weight proportional to the magnitude of the deviation. This modified version of FGP is called FGP1 in this paper. The signal processing technique described above enables us to prepare a table of inspection data related to the degradation process of the failure mode “tooth fractured” and a table of “event” data (installations, failures, suspensions, adjustments, etc.). These tables are essential inputs to the EXAKT procedure for creating statistical models that support predictive maintenance decision making.
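The exact FGP definition belongs to the ARL algorithm and is not reproduced here, so the following Python sketch rests on stated assumptions: FGP is taken to be the fraction of residual points lying more than k = 3 baseline standard deviations from the baseline mean, and FGP1 weights each such point by its deviation divided by k, so a point just over the threshold counts about once and larger excursions count more. The function name and the normalization are hypothetical.

```python
import numpy as np


def fgp_and_fgp1(residual, baseline, k: float = 3.0):
    """Illustrative FGP / FGP1 values for one reading (hypothetical definitions).

    residual : residual error signal of the current reading
    baseline : residual error signal of a healthy reference reading
    k        : threshold in baseline standard deviations (3 in the text)
    """
    residual = np.asarray(residual, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    mu, sigma = baseline.mean(), baseline.std()
    z = np.abs(residual - mu) / sigma        # deviation in baseline std units
    exceed = z > k
    fgp = exceed.mean()                      # assumed FGP: fraction beyond k sigma
    # Assumed FGP1: exceeding points weighted in proportion to their deviation.
    fgp1 = np.where(exceed, z / k, 0.0).sum() / residual.size
    return float(fgp), float(fgp1)


if __name__ == "__main__":
    baseline = np.array([-1.0, 1.0] * 500)   # mean 0, standard deviation 1
    residual = np.zeros(100)
    residual[0] = 6.0                        # one point at 6 sigma
    print(fgp_and_fgp1(residual, baseline))  # (0.01, 0.02)
```

Under these assumptions FGP1 equals FGP when all exceedances sit near the threshold and grows faster than FGP once large deviations appear.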
2.2 Event Data
In the Events table (Figure 1), “Ident” refers to the Test Run number. “Event” refers to the event type: “B” designates the installation of the gearbox, “EF” the tooth failure event, and “ES” the end of the test due to shaft failure (treated as a suspension with respect to the “gear tooth fractured” failure mode). “Date” is the calendar date and time at which the event occurred. “WorkingAge” is the working age, a measure of service usage. Since in each test the gearbox operated under varying loads, with torques ranging from 540 in-lb up to 1620 in-lb, simple calendar running time would be an inappropriate measure of service usage because it ignores the different working conditions under which the gearbox operates. Instead, we used the time integral of the instantaneous torque over the actual running time as the working age, reflecting the accumulated stress on the gear teeth. The unit of working age, so defined, is the in-lb-day.
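When torque is held constant between load changes, the integral reduces to a sum of torque times running time over the constant-load segments. A minimal Python sketch, assuming such a piecewise-constant schedule (the segment format is our own choice); only the 96-hour, 540 in-lb run-in comes from the text, and the second segment in the example is hypothetical.

```python
from typing import Iterable, Tuple


def working_age(segments: Iterable[Tuple[float, float]]) -> float:
    """Working age in in-lb-day for a piecewise-constant load history.

    segments: (running_time_hours, torque_inlb) pairs, one per interval of
    constant output torque; idle time is assumed to be excluded already.
    """
    age = 0.0
    for hours, torque in segments:
        if hours < 0:
            raise ValueError("running time must be non-negative")
        age += torque * hours / 24.0   # in-lb x days
    return age


if __name__ == "__main__":
    run_in = [(96.0, 540.0)]                       # from the text
    print(working_age(run_in))                     # 2160.0
    print(working_age(run_in + [(24.0, 1620.0)]))  # 3780.0 (second segment hypothetical)
```

Evaluating the same sum up to each inspection timestamp gives the cumulative working age recorded alongside every reading.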
Figure 1: Events table for all eleven histories
2.3 Inspection Data
The signal processing technique is used to compile the table of inspection data related to the degradation associated with the targeted failure mode. Some tests ran for a period of time after failure occurred; only inspections prior to the detected tooth failure are included in the Inspections table. For purposes of comparison, some ‘conventional’ vibration features were also included in the table. For example, the maximum acceleration amplitudes in narrow frequency bands around the gear mesh frequency and its sidebands were tested as potential covariates in a proportional hazards model. The proportional hazards modeling analysis revealed, however, that these are not significant indicators of gear tooth failure.
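Restricting each history to inspections taken before the detected failure is a simple filter on working age. In this Python sketch the row layout (working age, indicator value) and the use of None for a suspension are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Row = Tuple[float, float]  # (working age in in-lb-day, indicator value)


def truncate_at_failure(rows: List[Row], failure_age: Optional[float]) -> List[Row]:
    """Keep only inspections taken strictly before the failure working age.

    failure_age is the working age of the 'EF' event, or None when the
    history ended as a suspension (nothing is removed in that case).
    """
    if failure_age is None:
        return list(rows)
    return [row for row in rows if row[0] < failure_age]
```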
Inspection data are summarized in Figure 2, which is a partial view of the entire Inspections table. “Ident”, “Date” and “WorkingAge” have the same meaning as in the Events table. The other variables in the Inspections table are condition indicators: in addition to FGP and FGP1, the indicators RFM, RFS, RTM and RTS were extracted from the residual error signal.
Figure 2: Inspections table
The data set contains several outliers that the ARL investigators reported as invalid data. Two significant outliers, one in Test Run 10 and the other in Test Run 14, were corrected by interpolating between the preceding and following values. The last three inspections in Test Run 11 have very high values. These abnormal readings may have been caused by contamination from other vibration sources as a result of the shaft failing, so the three inspections were removed from the table prior to analysis.
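The two corrections above can be sketched in Python as follows. We assume the interpolation is linear in working age between the preceding and following inspections; the text does not say whether time, working age or sample index was used, and the function names are hypothetical.

```python
from typing import List, Sequence


def interpolate_outlier(ages: Sequence[float], values: Sequence[float], idx: int) -> List[float]:
    """Replace values[idx] by linear interpolation (in working age) between its neighbours."""
    if not 0 < idx < len(values) - 1:
        raise IndexError("the outlier needs both a preceding and a following inspection")
    a0, a1 = ages[idx - 1], ages[idx + 1]
    v0, v1 = values[idx - 1], values[idx + 1]
    weight = (ages[idx] - a0) / (a1 - a0)   # position of the outlier between neighbours
    fixed = list(values)
    fixed[idx] = v0 + weight * (v1 - v0)
    return fixed


def drop_last(rows: Sequence, n: int) -> list:
    """Remove the final n inspections (e.g. readings contaminated by shaft failure)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return list(rows[:max(0, len(rows) - n)])
```

Interpolating on working age rather than on position keeps the correction sensible when inspections are unevenly spaced, as they are across the run-in and operational phases.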
2.4 Data Pre-processing
Figure 3: FGP and FGP1 vs Timestamp (Test Runs 5-11)
From Figures 3 and 4, we observe that FGP and FGP1 are almost identical for Test Runs 11, 13 and 15, which ended as suspensions, and that FGP1 takes larger values than FGP near the end of every test run that ended in gear tooth failure (Test Runs 5, 6, 7, 9, 12 and 14), although this is less clear for Test Run 9. The relatively low values of FGP and FGP1 (and the brief warning period) prior to failure in Test Run 9 might be explained by high variation in the reference baseline of the residual error signal. Using FGP alone, it is difficult or even impossible to distinguish a failure history from a suspension history (e.g., Test Run 14 versus Test Run 15). Hence, we may expect FGP1 to be a better indicator of gear tooth failure than FGP, and the modeling phase will show that it is.
Figure 4: FGP and FGP1 vs Timestamp (Test Runs 12-15)
Next, correlations among the covariates were investigated for three cases: data from Group A, data from Group B, and the entire data set (Group A + Group B). Correlation analysis of the covariates often helps with covariate selection when building a statistical (proportional hazards) model. Similar results were found in all three cases: FGP and FGP1 are highly correlated, with a correlation coefficient above 0.9, and the correlation coefficients among RFM, RFS, RTM and RTS are also above 0.9. However, the correlation between any covariate from the group {FGP, FGP1} and any covariate from the group {RFM, RFS, RTM, RTS} was relatively low (correlation coefficient below 0.5). It may be expected, then, that one representative from each group might be appropriate for inclusion as a covariate in the proportional hazards model.
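The screening step above can be reproduced with a Pearson correlation matrix. This Python sketch (using NumPy) flags covariate pairs above the 0.9 cut-off quoted in the text; the function name and the toy data are ours, not from the study.

```python
import numpy as np
from typing import List, Sequence, Tuple


def highly_correlated_pairs(data, names: Sequence[str],
                            threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """List covariate pairs whose absolute Pearson correlation exceeds threshold.

    data : 2-D array-like, one row per inspection, one column per covariate
    names: column names in the same order as the columns of data
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs


if __name__ == "__main__":
    # Toy data (not the study data): B is a linear function of A, C is not.
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [2 * x + 1 for x in a]
    c = [2.0, 1.0, 4.0, 3.0, 5.0]
    table = np.column_stack([a, b, c])
    print(highly_correlated_pairs(table, ["A", "B", "C"]))
```

Covariates that are flagged together, such as FGP and FGP1 in the text, are candidates for keeping a single representative in the model.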
3. Modeling and model analysis
The PHM technique determines how the risk of failure, or hazard, depends on the covariates. The influence of a covariate on the risk is expressed by its covariate parameter (covariate weight), and these weights are the main outcome of the PHM analysis. The mathematical formula for the hazard at working age t is:
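$$h(t, Z(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left(\sum_{i=1}^{m}\gamma_i Z_i(t)\right) \qquad (1)$$
where t is the working age, β and η are the shape and scale parameters of the Weibull baseline hazard, Z_i(t) (i = 1, ..., m) are the covariate values at working age t, and γ_i are the corresponding covariate weights.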
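For concreteness, the hazard of a PHM with a Weibull baseline, h(t, Z) = (β/η)(t/η)^(β−1) exp(Σ γ_i Z_i(t)), can be evaluated as below. This is a Python sketch assuming that Weibull form; the parameter values in the test are placeholders, not the fitted values for the gearbox models.

```python
import math
from typing import Sequence


def weibull_phm_hazard(t: float, z: Sequence[float], beta: float, eta: float,
                       gammas: Sequence[float]) -> float:
    """Hazard of a proportional hazards model with a Weibull baseline.

    t      : working age (must be positive here)
    z      : covariate values Z_i(t) at that working age
    beta   : Weibull shape parameter
    eta    : Weibull scale parameter (same units as t)
    gammas : covariate weights gamma_i
    """
    if t <= 0:
        raise ValueError("working age must be positive")
    if len(z) != len(gammas):
        raise ValueError("need one weight per covariate")
    baseline = (beta / eta) * (t / eta) ** (beta - 1.0)
    composite = sum(g * zi for g, zi in zip(gammas, z))  # weighted sum of covariates
    return baseline * math.exp(composite)
```

With beta = 1 the baseline is constant (1/eta), so working age drops out and only the covariates drive the hazard.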
The PHM is specific to the operating context. That is, if the physical asset’s operating context or mechanical configuration changes, a different failure risk model (with different covariate weights) may apply. In the following subsection, we investigate whether the PHM depends on gearbox geometry. If it does, we would be inclined to build two separate PHMs, one from the data of Group A and one from the data of Group B, rather than a single PHM from the combined data of all test runs.
3.1 The Effect of Gearbox Geometry on the PHM
3.2 Analysis of Group A
In the analysis of the Group A gearboxes, six different PHMs were investigated. The results for the six models with significant covariates are presented in Table 2. A model with both FGP and FGP1 as covariates was also analyzed and, as anticipated, FGP was not significant in that combination, although it is significant on its own. This indicates that FGP1 contains more useful information than FGP.
Equation 2
The optimal replacement decision policy (see [1]) was calculated for each of the six PHMs, using estimated costs of $5000 for a failure replacement and $1000 for a preventive replacement. Alternatively, if maximum asset availability were the required optimization objective, one might use mean times to return to service (MTTR) of 5 weeks for a failure and 1 week for a preventive replacement. Each policy has a corresponding optimal expected cost per unit of working age (given in the second column of Table 2). The column "Expected cost per in-lb-day" is the theoretical average cost (of both preventive and reactive maintenance) per unit of working age, read at the minimum of the graph of average cost versus risk. The column "Average cost per in-lb-day applying the EXAKT decision policy" is the actual average cost that would have been incurred had the optimal decision policy been in force over the sampled histories.
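The control-limit policy of Makis and Jardine [3] renews the asset once the hazard reaches a limit d and minimizes the expected cost per unit working age, Φ(d) = (C_p + K·Q(d)) / W(d), where K = C_f − C_p, Q(d) is the probability that failure occurs before the limit is reached, and W(d) is the expected working age at replacement. Computing Q(d) and W(d) requires a model of covariate evolution, which EXAKT estimates; this Python sketch assumes they are already available for a grid of limits and performs only the final minimization. The Q and W numbers in the example are invented for illustration, not values from this study.

```python
from typing import Sequence, Tuple


def cost_rate(q: float, w: float, cp: float = 1000.0, cf: float = 5000.0) -> float:
    """Expected cost per unit working age for one control limit.

    q : probability of failure before the limit is reached, Q(d)
    w : expected working age at replacement, W(d)
    """
    return (cp + (cf - cp) * q) / w


def optimal_limit(limits: Sequence[float], q: Sequence[float], w: Sequence[float],
                  cp: float = 1000.0, cf: float = 5000.0) -> Tuple[float, float]:
    """Return (best limit, its cost rate) over a grid of candidate limits."""
    rates = [cost_rate(qi, wi, cp, cf) for qi, wi in zip(q, w)]
    best = min(range(len(rates)), key=rates.__getitem__)
    return limits[best], rates[best]


if __name__ == "__main__":
    # Hypothetical Q(d) and W(d) values, for illustration only.
    limits = [1e-4, 2e-4, 5e-4]
    q_vals = [0.02, 0.05, 0.30]
    w_vals = [5000.0, 8000.0, 9000.0]
    print(optimal_limit(limits, q_vals, w_vals))
```

A larger failure cost relative to the preventive cost puts more weight on Q(d), which tends to move the optimum toward lower limits, i.e. earlier preventive renewal.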
Table 2:
Figure 5: Decision graphs for Test Runs 5-11
From Table 2, we see that the decision policy based on model “RFS” yields the lowest expected cost, followed by model FGP1. Which model should we choose for an optimal CBM data interpretation policy? In principle, the best model would be identified by applying all of them in practice and observing which gives the best results on average, but that would be impractical. Instead, the “cost comparison” function in the software may be used to investigate the relative merits of alternative policies conveniently. The cost comparison in EXAKT calculates the average cost per unit of working age that results when a policy is applied retroactively to the data used in the analysis. Its results are also summarized in Table 2. The cost comparison may be considered a final check of the statistical and decision models, reporting whether the decision model is useful, i.e. whether it improves on current practice.
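EXAKT's cost comparison is built into the software, and its exact bookkeeping is not described here. The Python sketch below shows the general idea under simplifying assumptions of our own: each history is replayed, renewal is taken at the first inspection whose hazard reaches the limit d (cost C_p), an unflagged failure costs C_f, an unflagged suspension costs nothing, and total cost is divided by total working age. The History class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class History:
    inspections: List[Tuple[float, float]]  # (working age, hazard) in time order
    end_age: float                           # working age at failure or suspension
    failed: bool                             # True for a failure, False for a suspension


def retro_cost_per_age(histories: List[History], d: float,
                       cp: float = 1000.0, cf: float = 5000.0) -> float:
    """Average cost per unit working age when the limit d is applied retroactively."""
    total_cost = 0.0
    total_age = 0.0
    for h in histories:
        # First inspection at which the hazard reaches the limit, if any.
        alarm = next((age for age, hz in h.inspections
                      if hz >= d and age < h.end_age), None)
        if alarm is not None:
            total_cost += cp          # preventive renewal at the alarm
            total_age += alarm
        else:
            total_cost += cf if h.failed else 0.0
            total_age += h.end_age
    return total_cost / total_age


if __name__ == "__main__":
    # Two hypothetical histories for illustration only.
    runs = [
        History([(100.0, 0.001), (200.0, 0.01)], end_age=250.0, failed=True),
        History([(100.0, 0.001)], end_age=300.0, failed=False),
    ]
    print(retro_cost_per_age(runs, d=0.005))  # 2.0
```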
From the cost comparison, models FGP1 and FGP1+RFM were found to be similarly good (average costs of $0.231 and $0.233 per in-lb-day respectively) and better than the other models. This could have been expected given how these variables are calculated and what they physically represent. The difference between the theoretical and retroactively calculated costs (columns 2 and 3 of Table 2) may be explained by inaccuracies in the model parameter estimates due to the small sample size. We may, nonetheless, consider models FGP1 and FGP1+RFM good models, useful for interpreting CBM inspection data. In model FGP1+RFM, working age appeared non-significant. We prefer FGP1 as the final model because it is simpler, having only one covariate.
Model FGP1 was applied to all seven histories from Group A, and the resulting decision graphs are presented in Figure 5. The optimal decision policy is applied by reading these graphs. If the point corresponding to the composite of the covariates lies in the green (lower) region of the graph, no maintenance action is recommended, and the expected remaining useful life (RUL), defined as the expected time to replacement due to either failure or preventive maintenance, is reported in the text box in the upper-right corner of the decision graph. If the point lies in the red (upper) region, the policy recommends immediate renewal. Figure 5 shows that applying the model would have resulted in a recommendation to renew the gearboxes that actually failed before their failure, and that no recommendation would have been made to remove the unfailed gearboxes unnecessarily.
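Assuming a Weibull PHM, h(t, Z) = (β/η)(t/η)^(β−1) exp(Σ γ_i Z_i), the boundary between the green and red regions can be written explicitly: the hazard reaches the optimal limit d when the composite covariate Σ γ_i Z_i is at least ln d − ln(β/η) − (β − 1) ln(t/η). The Python sketch below evaluates that boundary; it is our reconstruction of the logic, not EXAKT's code, it does not compute the RUL, and all parameter values are placeholders.

```python
import math
from typing import Sequence


def composite_boundary(t: float, d: float, beta: float, eta: float) -> float:
    """Composite covariate value at which the hazard reaches the limit d at working age t."""
    return math.log(d) - math.log(beta / eta) - (beta - 1.0) * math.log(t / eta)


def recommendation(t: float, z: Sequence[float], gammas: Sequence[float],
                   d: float, beta: float, eta: float) -> str:
    """'renew' if the point lies on or above the boundary, otherwise 'continue'."""
    composite = sum(g * zi for g, zi in zip(gammas, z))
    return "renew" if composite >= composite_boundary(t, d, beta, eta) else "continue"
```

For β > 1 the boundary falls as working age grows, so the same composite covariate value becomes more alarming in an older gearbox.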
3.3 Analysis of Group B
Again, the optimal expected costs per unit of working age and the results of the cost comparison for all models are summarized in Table 4.
Table 4: Optimal average maintenance costs for Group B
From Table 4, we see that model FGP1 has the lowest expected cost per unit of working age, followed by model RTS. The cost comparison, however, shows that model RTS is slightly better than model FGP1 (average costs of $0.306 and $0.311 per in-lb-day respectively). Since this difference is small and FGP1 gives the lowest expected cost, we conclude that model FGP1 may be deployed as the final model. The decision model FGP1 was applied to Test Runs 12-15, and the decision graphs are presented in Figure 6. In Test Run 15, FGP1 fluctuates slightly during the later part of the test in response to the ramping up and down of the load. To remove this fluctuation and improve the model, a smoothing function in EXAKT may be used. The model was rebuilt on the smoothed version of FGP1, and the resulting decision graphs are presented in Figure 7. Figure 7 again shows that applying the model would have resulted in a recommendation to renew the gearboxes that actually failed before their failure, and no recommendation to remove the unfailed gearboxes unnecessarily.
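The smoothing method EXAKT applies is not described here. As a stand-in, simple exponential smoothing illustrates the idea: each smoothed value blends the new reading with the previous smoothed value. In this Python sketch the smoothing constant alpha is a hypothetical tuning parameter; smaller values suppress load-induced fluctuation more strongly but react more slowly to genuine fault growth.

```python
from typing import List, Optional, Sequence


def exponential_smoothing(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Simple exponential smoothing of an indicator series (a stand-in method).

    alpha is in (0, 1]; alpha = 1 returns the series unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    level: Optional[float] = None
    for v in values:
        # Blend the new reading with the previous smoothed value.
        level = v if level is None else alpha * v + (1.0 - alpha) * level
        smoothed.append(level)
    return smoothed
```

Because it uses only past readings, this kind of filter can be applied online as inspections arrive, which matters if the smoothed indicator is to drive live renewal decisions.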
Figure 6: Decision graphs for Test Runs 12-15
Figure 7: Decision graphs for Test Runs 12-15 using smoothed FGP1
4. Conclusion
Acknowledgements
We are most grateful to the Applied Research Laboratory at Penn State University and the Department of the Navy, Office of the Chief of Naval Research (ONR), for providing the data used to develop this work. We also thank Bob Luby at PricewaterhouseCoopers for his support. This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, and the authors thank NSERC for its financial support.
References
[2]. Lin, J. and Qu, L. (2000). “Feature extraction based on Morlet wavelet and its application for mechanical fault diagnosis”, Journal of Sound and Vibration, Vol. 234, No. 1, pp.135-148.
[3]. Makis, V. and Jardine, A.K.S. (1992). “Optimal replacement in the proportional hazards model”, INFOR, Vol. 30, pp.172-183.
[4]. Miller, A.J. (1999). “A New Wavelet Basis for the Decomposition of Gear Motion Error Signals and Its Application to Gearbox Diagnostics”, M.Sc. Thesis, The Pennsylvania State University.
[5]. Miller, A.J. and Reichard, K.M. (1999). “A new wavelet basis for automated fault diagnostics of gear teeth”, Proceedings of Internoise 99, International Institute of Noise Control Engineering, Fort Lauderdale, FL.
[6]. Reichard, K.M. and Miller, A.J. (2000). “Wavelet-Based Filter Design for Gear Tooth Fault Diagnostics and Prognostics”, Improving Productivity Through Applications of Condition Monitoring, 54th Meeting of the Society for Machinery Failure Prevention Technology, Virginia Beach, VA, pp. 365-374.
[7]. Wang, W.J. and McFadden, P.D. (1995). “Decomposition of gear motion signals and its application to gearbox diagnostics”, Journal of Vibration and Acoustics, Vol. 117, pp. 363-369.
[8]. Wang, W.J. and McFadden, P.D. (1996). “Application of wavelets to gearbox vibration signals for fault detection”, Journal of Sound and Vibration, Vol. 192, No. 5, pp.927-939.
[9]. Young, R. (1993). Wavelet Theory and Its Applications, Kluwer Academic Publishers.