2.1 FAULT DIAGNOSIS METHODS
2.1.3 Pattern Recognition Methods
Pattern recognition has become increasingly important for FDD applications (Gottlieb, Arzhanov, Gudowski, & Garis, 2006; Zio & Gola, 2006; Moshkbar-Bakhshayesh & Ghofrani, 2013; Chiang et al., 2004; Widodo & Yang, 2007; Zhu & Song, 2011). Pattern recognition mainly deals with the categorization (classification or clustering) of objects into groups based on features extracted from related measurement data, so that objects in the same group are similar to one another in certain respects (Duda, Hart, & Stork, 2000; Jain, Duin, & Mao, 2000; Murty & Devi, 2011; Webb & Copsey, 2011). In fault diagnosis applications, measurements from a system are analyzed by pattern classification models to test hypotheses for different fault classes. Pattern recognition is closely related to the fields of artificial intelligence and machine learning (Bishop, 2006; Alpaydin, 2010). In fact, most methods used for FDD applications have been studied in all three fields. Many data-driven methods, such as PCA and KDA, can be used for pattern recognition as well.
If the class labels of the groups are unknown, pattern recognition is a clustering problem in which the objects are partitioned into clusters or groups whose labels are just the cluster identities. More often, pattern recognition is a pattern classification problem in which the class labels of the groups are known. Pattern classification methods used for fault diagnosis applications are mostly supervised models, in which the classifier is trained using labeled training data and then applied to new unlabeled data. However, where labeled training data are scarce but unlabeled data are abundant, semi-supervised pattern classification can be considered, in which both labeled and unlabeled data are used to train the classifier.
Clustering divides a set of objects into clusters, so that objects in the same cluster are more similar to each other than to those in another cluster (Jain & Dubes, 1988; Jain, Murty, & Flynn, 1999; Gan, Ma, & Wu, 2007; Webb & Copsey, 2011). Fuzzy c-means (FCM) clustering is one of the most widely used clustering algorithms for FDD applications (House, Lee, & Shin, 1999; Teppola, Mujunen, & Minkkinen, 1999; Zio & Baraldi, 2005a; Zio & Baraldi, 2005b; Aydin, Karakose, & Akin, 2008; Liu, Ma, & Mathew, 2009; Pan, Chen, & Li, 2010; Sun, Xue, Du, & Sun, 2010; Baraldi, Razavi-Fara, & Zio, 2011). Spectral clustering is a relatively new algorithm (Shi & Malik, 2000; Ng, Jordan, & Weiss, 2001; Von Luxburg, 2007). Although spectral clustering has not been applied to FDD, it is the basis of a semi-supervised classification model used in this research.
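To make the idea of fuzzy clustering concrete, the following is a minimal sketch of the standard FCM iteration (alternating center and membership updates) in plain Python. It is an illustration only, not the implementation used in any of the cited studies; the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, and the six toy measurement points are assumptions made for this example.

```python
import random

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / wsum
                for d in range(dim)))
        # Membership update from squared distances to every center.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - c[d]) ** 2 for d in range(dim)) + 1e-12
                  for c in centers]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d2]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u

# Hypothetical toy data: three points near (0, 0) and three near (5, 5).
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, memberships = fuzzy_c_means(points, n_clusters=2)
# Hard assignment: the cluster with the largest membership for each point.
hard_labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

Unlike hard clustering, each point keeps a graded membership in every cluster, which is why the hard labels are derived only at the end. The cluster indices themselves are arbitrary, consistent with the remark above that cluster labels are just cluster identities.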
In a supervised pattern classification model, a classifier is first trained using data whose class labels are known. The classifier is then applied to new measurement data to
estimate the class labels. Pre-processing is often applied to the raw input data to extract a vector of features. The classifier is trained and tested on these features so that the unique characteristics of different classes are better revealed. A great variety of pattern classification methods have been developed, such as k-nearest neighbor (k-NN), neural networks, the Naive Bayes classifier, hidden Markov models (HMM), SVM, logistic regression, fuzzy logic, decision trees and rules, random forests, and hybrids and ensembles of different models. Details regarding those models and their training processes can be found in the rich literature on pattern recognition and machine learning (Jain et al., 2000; Bishop, 2006; Murty & Devi, 2011; Webb & Copsey, 2011; Dougherty, 2013; Hsu & Lin, 2002). For an FDD application, the class labels correspond to specific fault hypotheses. The classifier is trained offline using training data with known fault classes. When new measurement data become available, their class labels are estimated by the classifier; thus, the current condition of the system is determined from the class label assignment. Applications of pattern classification models to process fault diagnosis have been extensive and are still growing fast. The growing research interest in this topic is reflected in a review devoted to FDD applications of the SVM algorithm alone (Widodo & Yang, 2007).
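The supervised workflow described above (feature extraction, offline training on labeled data, then classification of new measurements) can be sketched with a k-NN classifier in plain Python. This is a minimal illustration under stated assumptions: the feature extractor (window mean and standard deviation), the class names `"normal"` and `"sensor_drift"`, and all signal values are hypothetical and are not taken from the cited studies.

```python
import math
from collections import Counter

def extract_features(window):
    """Hypothetical pre-processing: mean and standard deviation of a window."""
    mean = sum(window) / len(window)
    var = sum((x - mean) ** 2 for x in window) / len(window)
    return (mean, math.sqrt(var))

def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training samples nearest to the query."""
    ranked = sorted(
        (math.dist(f, query), label)
        for f, label in zip(train_features, train_labels))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Offline "training": for k-NN this is just storing labeled feature vectors.
train_windows = [
    ([1.0, 1.1, 0.9, 1.0], "normal"),
    ([1.0, 0.9, 1.1, 1.0], "normal"),
    ([0.95, 1.05, 1.0, 1.0], "normal"),
    ([1.0, 1.5, 2.0, 2.5], "sensor_drift"),
    ([1.1, 1.6, 2.1, 2.6], "sensor_drift"),
    ([0.9, 1.4, 1.9, 2.4], "sensor_drift"),
]
train_features = [extract_features(w) for w, _ in train_windows]
train_labels = [label for _, label in train_windows]

# Online use: classify newly measured windows with the same features.
drifting = knn_predict(train_features, train_labels,
                       extract_features([1.0, 1.4, 2.1, 2.5]))
steady = knn_predict(train_features, train_labels,
                     extract_features([1.0, 1.0, 1.1, 0.9]))
```

The estimated class label is the diagnosis: each label stands for one fault hypothesis, so a new window is attributed to whichever hypothesis dominates among its nearest labeled neighbors.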
A supervised pattern classifier produces credible results only for scenarios covered by the training data. In some applications, reliable training data are very scarce because labeling the data is too expensive or acquiring the data in the first place is technically difficult. Unlabeled measurement data, however, are easily available. Semi-supervised classification (SSC) models have been developed for such situations. In an SSC model, both labeled and unlabeled data are used for model training. Additional information provided by the unlabeled data (e.g., data distribution and manifold structure) can help to achieve better performance than using the labeled data alone. SSC is generally based on the cluster assumption, which states that nearby data points are likely to belong to the same class, and on the manifold assumption, which states that data points on the same manifold structure are likely to be in the same class (Chapelle, Weston, & Schölkopf, 2002; Belkin, Niyogi, & Sindhwani, 2006; Niyogi, 2013). An SSC model can achieve superior performance because, with unlabeled data available, the classifier can be designed so that its decision boundary avoids cutting through high-density regions or manifolds. A number of SSC methods have been developed that realize these assumptions in different ways, such as transductive SVM, co-training, and various graph-based methods using manifold regularization, graph mincut, harmonic functions, local and global
consistency, and spectral graph transducer. More information about the methods can be found in the following research papers and surveys (Blum & Mitchell, 1998; Vapnik, 1998; Joachims, 1999; Blum & Chawla, 2001; Seeger, 2001; Zhu, Ghahramani, & Lafferty, 2003; Zhou, Bousquet, Lal, Weston, & Schölkopf, 2004; Chapelle, Schölkopf, & Zien, 2006; Azran, 2007; Camps-Valls, Marsheva, & Zhou, 2007; Zhu, 2008; Mallapragada, Jin, Jain, & Liu, 2009; Zhu & Goldberg, 2009). Superior performance of SSC has been demonstrated in various numerical studies. However, care must be exercised when applying it to specific applications (Singh, Nowak, & Zhu, 2008; Lu, 2009). Lu (2009) showed that it is important to ensure that a truly non-trivial relationship exists between the distribution of the unlabeled data and the class labels. SSC has not been tested for process fault diagnosis applications; however, it provides a promising tool
for fault diagnosis applications where acquiring training data under fault conditions is challenging but unlabeled data is readily accessible from the SCADA system. The reason is that correlations often exist in different variables of a process due to their physical and functional couplings. Therefore, data collected under the same fault condition tend to fall in the same high density region or on the same manifold structure.
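As an illustration of how unlabeled data can carry label information under the cluster assumption, the sketch below implements a simple graph-based label propagation scheme in plain Python: labels spread over a Gaussian-affinity graph while the labeled points stay clamped, in the spirit of the harmonic-function methods cited above. It is not the spectral-clustering-based model used in this research; the function name `label_propagation`, the kernel width `sigma`, and the toy data are assumptions made for this example.

```python
import math

def label_propagation(points, labels, n_classes, sigma=1.0, n_iter=200):
    """labels[i] is a class index for labeled points, None for unlabeled."""
    n = len(points)
    # Gaussian (RBF) affinities between all pairs; no self-loops.
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # Row-normalize the affinities into transition probabilities.
    p = [[wij / sum(row) for wij in row] for row in w]
    # One-hot scores for labeled points, uniform scores for unlabeled ones.
    f = [[1.0 / n_classes] * n_classes if labels[i] is None else
         [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]
         for i in range(n)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            if labels[i] is not None:
                new_f.append(f[i])  # clamp the known labels
            else:
                # Unlabeled score = affinity-weighted average of neighbors.
                new_f.append([sum(p[i][j] * f[j][c] for j in range(n))
                              for c in range(n_classes)])
        f = new_f
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]

# Hypothetical data: two dense regions, only one labeled point in each.
points = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (-0.2, 0.2),
          (3.0, 3.0), (3.2, 2.9), (2.8, 3.1), (3.1, 3.3)]
labels = [0, None, None, None, 1, None, None, None]
predicted = label_propagation(points, labels, n_classes=2)
```

With a single labeled point per class, the unlabeled points inherit the label of the dense region they fall in. This mirrors the argument above that data collected under the same fault condition tend to lie in the same high-density region.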
In summary, pattern recognition encompasses a great number of methods for clustering and classification. Many of these methods have been used as inference engines to diagnose problems in various engineering fields. Scientific studies on FDD applications have been extensive and continue to grow, particularly for supervised pattern classification methods. The performance of a supervised classifier can be degraded by the scarcity of training data in systems such as an NPP. SSC provides an interesting alternative when labeled data are rare but unlabeled data are easily available.