
4.4 Prognostic Algorithms

4.4.2 Prognostic Performance Metrics

Performance metrics are necessary to allow model developers to compare two or more competing models, to understand the validity of a prognostic estimate, and to characterize model performance over different operating regimes, fault modes, or systems. Performance metrics for monitoring, fault detection, and diagnostic systems are well established (Hines et al. 2008a), including accuracy, robustness or auto-sensitivity, spill-over or cross-auto-sensitivity, fault detectability, uncertainty measures, fault detection time, and false alarm/missed alarm rates. However, these conventional metrics fall short in characterizing prognostic model performance (Saxena et al. 2009a). Leão et al. (2008b) attempt to extend some of these metrics to prognostics, including false and missed alarm rates, but these prognostic metrics have seen limited application.

Generally, there are two classes of prognostic model performance analysis: offline and online.

Offline performance metrics evaluate the prognostic algorithm, as a whole, applied to a specific component or class of components. Offline analysis uses known ground truth, such as actual failure times, to evaluate model performance. This type of analysis can be useful in the development stage for selecting among competing prognostic models. Online analysis, by contrast, measures how well a prognostic model is performing in real time for a specific SSC that has not yet failed. Offline performance metrics are currently more commonly proposed and employed for evaluating prognostic

al. 2010; Leão et al. 2011) can be used to comprehensively evaluate prognostic algorithms and compare competing solutions. The badness indicator quantifies the deviation of the empirical cumulative distribution function (CDF) from the expected CDF in the PIT analysis. By establishing confidence bounds and critical values for PIT (Leão et al. 2011), traditional hypothesis testing can be used to determine how a prognostic algorithm performs given the amount of training and validation data available. Additionally, Leão et al. (2008b) suggest a prognostics extension to the receiver operating characteristic, called the progROC curve, to help evaluate the trade-off between two conflicting prognostics goals.
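The comparison of an empirical CDF against an expected CDF can be illustrated with a minimal sketch. It assumes that PIT refers to the probability integral transform, that each prediction is a Gaussian distribution over failure time, and that a well-calibrated model yields PIT values that are uniform on [0, 1]. The function names are hypothetical, and the maximum-deviation statistic below is a generic Kolmogorov-Smirnov-style stand-in, not the badness indicator defined by Leão et al.

```python
from statistics import NormalDist

def pit_values(predicted_means, predicted_stds, actual_failure_times):
    """Evaluate each predictive CDF at the observed failure time.

    Assumption: every prediction is Gaussian with the given mean and
    standard deviation. This is for illustration only.
    """
    return [
        NormalDist(mu, sigma).cdf(t)
        for mu, sigma, t in zip(predicted_means, predicted_stds, actual_failure_times)
    ]

def max_cdf_deviation(pit):
    """Largest gap between the empirical CDF of the PIT values and the
    uniform CDF expected for a calibrated model (KS-style statistic)."""
    u = sorted(pit)
    n = len(u)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))

# Hypothetical data: predicted failure-time distributions and observed failures.
pit = pit_values([100.0, 120.0, 90.0], [10.0, 15.0, 8.0], [104.0, 110.0, 95.0])
deviation = max_cdf_deviation(pit)
```

A deviation near zero suggests that the predicted distributions are consistent with the observed failure times; larger values indicate bias or poorly sized uncertainty bounds. A formal test would compare the statistic against critical values appropriate for the sample size.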

Prognostic algorithm performance metrics tend to characterize performance in terms of either accuracy (estimation error) or precision (uncertainty). The field, however, is plagued by a problem common to many areas of prediction: the more precise the prognostic estimate, the less likely it is that this estimate will be correct. In practice, there is a trade-off between RUL accuracy and RUL precision; therefore, both features should be considered simultaneously.

Prognostic models produce a time series of RUL estimates, and the performance requirements for these predictions vary throughout the life of the system. In general, we are willing to tolerate large errors and uncertainties early in life if the prognostic performance improves as the system approaches failure (Line and Clements 2006; Saxena et al. 2008b; Saxena et al. 2009a). Traditional error measures do not account for these progressively tightening accuracy and precision requirements. Saxena et al. (2008b) suggest several metrics to account for this, the most interesting of which are the α-λ performance metrics for accuracy and precision. The α-λ performance dictates that the accuracy (or uncertainty) should be within some specified α × 100% of the actual value at a relative distance, λ, to the actual failure, as shown in Figure 4.6. In this figure, one line marks the actual RUL, and the red and green lines represent two different RUL estimates. The shaded region indicates 20% error about the actual RUL. The α-λ performance is a binary true/false metric indicating whether the estimate is within the specified tolerance at a given fraction of life. For the case shown, both estimates have an α-λ accuracy of "true" for α of 0.2 and λ of 0.5; however, only the red estimate would have an α-λ performance of "true" for λ of 0.9.
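The α-λ accuracy check described above can be sketched in a few lines. This is a minimal illustration, not the reference implementation of Saxena et al.: the function name, the choice to evaluate at the prediction time closest to the λ point, and the sample trajectories are all assumptions. The two made-up estimates are chosen to mimic the behavior described for the red and green estimates.

```python
def alpha_lambda_accuracy(times, rul_estimates, end_of_life,
                          alpha=0.2, lam=0.5, t_start=0.0):
    """Return True if the RUL estimate at fraction `lam` of life lies within
    +/- alpha * (actual RUL) of the actual RUL.

    Assumption: the check uses the prediction whose time stamp is closest
    to t_lambda = t_start + lam * (end_of_life - t_start).
    """
    t_lambda = t_start + lam * (end_of_life - t_start)
    idx = min(range(len(times)), key=lambda i: abs(times[i] - t_lambda))
    actual_rul = end_of_life - times[idx]
    lower = (1.0 - alpha) * actual_rul
    upper = (1.0 + alpha) * actual_rul
    return lower <= rul_estimates[idx] <= upper

# Hypothetical example: failure at t = 100, predictions every 10 time units.
end_of_life = 100
times = list(range(0, 100, 10))
actual = [end_of_life - t for t in times]
red = [1.1 * r for r in actual]    # constant 10% relative overestimate
green = [r + 8 for r in actual]    # constant absolute overestimate of 8

red_mid = alpha_lambda_accuracy(times, red, end_of_life, alpha=0.2, lam=0.5)
green_mid = alpha_lambda_accuracy(times, green, end_of_life, alpha=0.2, lam=0.5)
red_late = alpha_lambda_accuracy(times, red, end_of_life, alpha=0.2, lam=0.9)
green_late = alpha_lambda_accuracy(times, green, end_of_life, alpha=0.2, lam=0.9)
```

With these inputs, both estimates pass at λ = 0.5, but only the red estimate passes at λ = 0.9, because a fixed absolute error becomes a larger relative error as the actual RUL shrinks.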

The Prognostic Horizon, also proposed in Saxena et al. (2008b), indicates the lead time between end of life (failure) and the point at which the prognostic model first predicts failure to some specified performance (accuracy and/or precision). These metrics are expanded in Saxena et al. (2009b, 2010a, 2010b) to incorporate predicted RUL distribution information. The proposed metrics are largely visual, requiring evaluation of a graph of model performance, similar to that in Figure 4.6. Development of a single value to quantify this performance has not yet been reported.
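A simple version of the Prognostic Horizon calculation might look like the sketch below. It assumes a constant accuracy band of ±α × (end of life) around the actual RUL and reports the lead time at the first prediction that falls inside the band; the function name and the data are hypothetical, and published definitions may add further conditions (for example, that predictions remain inside the band afterward).

```python
def prognostic_horizon(times, rul_estimates, end_of_life, alpha=0.1):
    """Lead time between end of life and the first prediction whose error is
    within +/- alpha * end_of_life of the actual RUL.

    Returns None if no prediction ever meets the requirement.
    Assumption: the tolerance band has constant width over the whole life.
    """
    band = alpha * end_of_life
    for t, estimate in zip(times, rul_estimates):
        actual_rul = end_of_life - t
        if abs(estimate - actual_rul) <= band:
            return end_of_life - t
    return None

# Hypothetical trajectory: large early errors that shrink toward failure.
times = list(range(0, 100, 10))
estimates = [140, 125, 105, 78, 62, 51, 41, 30, 20, 10]
horizon = prognostic_horizon(times, estimates, end_of_life=100, alpha=0.1)
```

In this example the estimate first enters the band at t = 30, so the horizon is 70 time units. A longer horizon means more lead time for maintenance planning.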

Beyond these time-dependent requirements, the direction of the error also matters: a prognostic model that predicts failure a short time before the actual failure is generally considered better than one that predicts failure the same short time after the actual failure.

RUL estimates greater than the actual remaining life leave room for unexpected failures and unplanned maintenance. Saxena et al. (2008b) suggest an exponentially weighted accuracy metric to account for this, which gives a larger penalty for late predictions than for early predictions of failure. This metric considers the RUL predictions made at one point in time across a population of systems, instead of the entire time series of predictions. A similar error metric was used in the 2008 PHM data challenge (Saxena et al. 2008a).
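An asymmetric, exponentially weighted score of this kind can be sketched as follows. The functional form and the default scale constants (13 for early and 10 for late predictions) are the ones commonly quoted for the 2008 PHM data challenge score; they are not given in this section and should be treated as assumptions, as should the function and parameter names.

```python
import math

def asymmetric_exp_score(actual_ruls, predicted_ruls, a_early=13.0, a_late=10.0):
    """Sum of exponential penalties over a population of systems.

    d = predicted RUL - actual RUL. A positive d means the failure was
    predicted too late, which is penalized more heavily (smaller scale).
    Assumption: default scales follow the commonly cited PHM08 values.
    """
    total = 0.0
    for actual, predicted in zip(actual_ruls, predicted_ruls):
        d = predicted - actual
        scale = a_late if d >= 0 else a_early
        total += math.exp(abs(d) / scale) - 1.0
    return total

# Same error magnitude, opposite direction (hypothetical values).
late_penalty = asymmetric_exp_score([50], [60])   # RUL overestimated by 10
early_penalty = asymmetric_exp_score([50], [40])  # RUL underestimated by 10
```

Lower scores are better, a perfect prediction contributes zero, and an overestimate of RUL costs more than an underestimate of the same size.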

Figure 4.6. α-λ Performance for Accuracy (Saxena et al. 2008b)

Research in health monitoring algorithms and applications is active in many areas outside of the nuclear industry. In fact, most advances in PHM have originated in other areas, although the results may be applicable to NPPs. The following section summarizes the recent research in a few key industries: electronics, defense, avionics, and wind turbines.