Data-driven fault detection using trending analysis


Louisiana State University

LSU Digital Commons

LSU Doctoral Dissertations Graduate School

2006

Data-driven fault detection using trending analysis

Min Luo

Louisiana State University and Agricultural and Mechanical College

Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations

Part of the Electrical and Computer Engineering Commons

This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please contact [email protected].

Recommended Citation

Luo, Min, "Data-driven fault detection using trending analysis" (2006). LSU Doctoral Dissertations. 3185.


DATA-DRIVEN FAULT DETECTION USING TRENDING ANALYSIS

A Dissertation

Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in

The Department of Electrical and Computer Engineering

by Min Luo

B.S., Taiyuan Tech University, 1996

M.S., Tennessee Tech University, 2001


Acknowledgements

First and foremost, I would like to express my sincere gratitude and appreciation to my advisor, Dr. Jorge Aravena, for his constant guidance toward a deeper understanding of the subject and for his invaluable comments throughout the research for this dissertation. He always pushed the limits of my research ability and never accepted less than my best effort. I also thank my minor professor, Dr. Jianhua Chen, and the other members of my defense committee, Dr. Kemin Zhou, Dr. Morteza Naraghi-Pour and Dr. Bahadir Gunturk.

Special thanks are given to Heath J Leblanc for making the software more user friendly.

I would like to dedicate my work to my parents, Dr. Jinquan Luo and Dr. Shurong Li for their constant encouragement throughout my life. Thanks for always telling me I might soar even higher until I spread my wings. I want to thank my husband Yongjun for his persistent understanding and support.

Last but certainly not the least, I thank Marcie Frank in Cleveland, OH for her grammar correction of my dissertation.


List of Tables

Table 4.1 The Comparison of Different Subsequence Lengths

Table 4.2 The Comparison of the Number of Support Vectors

Table 6.1 Fault Detection Result for Different Fault Severity and Normal Data Variation

Table 6.2 Fault Detection Result for Different Fault Severity and Normal Data Variation

Table 6.3 Comparison of Different Bins Number Subsequence Length

Table 6.4 Closed Loop Simulation Fault Detection Result with One-Class Classification

Table 6.5 Fault Isolation Performance with Different Multi-Class SVM Approach

Table 6.6 Fault Isolation with One against One Method


List of Figures

Figure 1.1 Multivariate vs. Univariate

Figure 2.1 Forms of Qualitative Knowledge

Figure 2.2 Triangular Episodic Representation

Figure 2.3 Example of Triangular Representation of Process Data

Figure 2.4 Fundamental Language: Primitive

Figure 3.1 The hypersphere containing the target data, described by the center a and radius R

Figure 3.2 Pitch Angle vs. Roll Angle under Normal Flying Conditions

Figure 3.3 Non-convex Set Vs Convex Set

Figure 3.4 Convex Hull

Figure 3.5 Simplex in n dimension

Figure 3.6 Visible Region of p_r on the CH(P_{r-1})

Figure 3.7 Visible Facet

Figure 3.8 Convex Hull CH(P_r)

Figure 3.9 Convex Hull for Different Sizes of Samples

Figure 3.10 Sphere Data Description with Uniformed Sample Size of 100

Figure 3.11 Sphere Data Description with Uniformed Sample Size of 500

Figure 3.12 Convex Hull and Sphere Data Distribution for Non-Convex Distribution

Figure 4.1 A Two-Class Classifier and One-Class Classifier

Figure 4.2 An Overtrained Classifier

Figure 4.3 Support Vector Machine

Figure 4.5 Histogram of Real-Valued Data

Figure 4.6 Non-uniform Quantization

Figure 4.7 Mean Square Distortion vs. Bits Number for Normal Data

Figure 4.8 Mean Square Distortion D vs. the Number of Bits for the Engine Fault

Figure 4.9 Mean Square Distortion D vs. the Number of Bits for the Aileron Fault

Figure 4.10 Mean Square Distortion D vs. the Number of Bits for the Elevator Fault

Figure 4.11 Mean Square Distortion D vs. the Number of Bits for the Rudder Fault

Figure 4.12 Piecewise Approximation of Normal Data

Figure 4.13 Piecewise Approximation of Engine Fault Data

Figure 4.14 Non-Uniform Histogram of Normal Data

Figure 4.15 Non-Uniform Histogram of Engine Fault Data

Figure 4.16 SVM Classifier

Figure 4.17 Majority Voting Schema

Figure 5.1 One against One Fault Isolation Concept

Figure 5.2 One against All Fault Isolation Concept

Figure 6.1 Open Loop Linear Simulink Model

Figure 6.2 Normal Class

Figure 6.3 Time Delay Vs Fault Severity For Five Faults

Figure 6.4 Distance of Test Data to Normal Convex Hull

Figure 6.5 Standard Deviation of Distance

Figure 6.6 Trend Indicator – Difference of Standard Deviation of Distance

Figure 6.7 Closed Loop Non-Linear Simulink Model

Figure 6.9 Aileron Failure

Figure 6.10 Rudder Failure

Figure 6.11 Engine Failure

Figure 6.12 Stabilizer Failure


Abstract

The objective of this research is to develop data-driven fault detection methods that do not rely on mathematical models yet are capable of detecting process malfunctions. Instead of using mathematical models for comparing performance, the methods developed rely on an extensive collection of data to establish classification schemes that detect faults in new data. The research develops two trending approaches. The first uses normal data to define a one-class classifier. The second uses a data mining technique, the support vector machine (SVM), to define multi-class classifiers. Each classifier is trained on a set of example objects.

The one-class classification assumes that only information of one of the classes, namely the normal class, is available. The boundary between the two classes, normal and faulty, is estimated from data of the normal class only. The research assumes that the convex hull of the normal data can be used to define a boundary separating normal and faulty data.

The multi-class classifier is implemented through several binary classifiers. It is assumed that data from two classes are available and that the decision boundary is supported from both sides by example objects. To detect significant trends in the data, the research implements a non-uniform quantization technique based on Lloyd’s algorithm and defines a special subsequence-based kernel. The effect of the subsequence length is examined through computer simulations and theoretical analysis.

The test bed used to collect data and implement the fault detection is a six-degree-of-freedom, rigid-body model of a B747 100/200, and only faults in the actuators are considered. To test the approach thoroughly, the tests use only sensor data that does not include manipulated variables. Even with this handicap the approach is effective, averaging 79.5% correct detections, 16.7% missed alarms and 3.9% false alarms over six different faults.


Chapter 1 Introduction

1.1 Fault Detection and Isolation

Any complex system is liable to faults. Although good design aims to minimize the occurrence of faults, recognizing that such events do occur enables system operators to respond so that their effect is minimized. Numerous applications of fault detection and isolation (FDI) are reported in the literature for aeronautical and aerospace systems, automotive and traffic systems, chemical processes, electrical and electronic systems, nuclear plants, power systems and transportation systems [1].

The following introduces some terms and concepts associated with fault detection and isolation. After this general introduction, specific fault detection problems in aircraft are described. Finally, the areas on which this research concentrates are discussed.

In general terms, a fault is any change in a system that prevents it from operating in the proper manner. Reliable detection and isolation of faults is an important part of maximizing productivity and safety. If a fault is to trigger some special behavior or a controller reconfiguration, then some method of determining what fault has occurred is required. This procedure is called fault detection and isolation (FDI). Fault detection is a binary decision-making process: either the system is functioning properly, or a fault has occurred. Fault isolation is the determination of the fault source. FDI is most often considered to be a two-stage process. First, the fact that a fault has occurred must be recognized. Second, the type (source) of the fault should be determined so that appropriate remedial action may be initiated. If further information on a fault is required after isolation, such as its magnitude, this may be found by fault identification. Once a fault has been detected and isolated, some action is required. The response ranges from triggering an alarm to notifying the operator of the fault condition so that operation can continue with minimal degradation in performance. The detection and compensation of faults is thus one of the critical issues in the operation of high-performance systems such as production equipment at power stations, chemical processes, and transportation vehicles including aircraft and space vehicles.

Fault detection and isolation schemes in plants detect and try to compensate for faults in one or more of three subsystems: the actuators, the plant and the sensors. Actuator faults are deviations between the intended control and its realization by the actuators. Plant faults are disturbances on the plant that shift the plant outputs independently of the measured inputs; they may describe leaks, overloads, broken components, etc. Sensor faults are discrepancies between the measured and true values of the plant’s output or input variables. By their nature, faults can be classified into three categories. Abrupt faults are dramatic and persistent, and cause significant deviations from steady-state operation. Intermittent faults are present only for very short periods of time, but usually exhibit a relatively high occurrence rate after their first appearance and tend to become permanent. Incipient faults develop slowly over time and are linked to wear and tear of components and drift in control parameters.

Fault detection and isolation systems have several major performance criteria, and the design of a fault detection system demands thorough consideration of these criteria and the tradeoffs among them. Based on [1], the most important criteria are the false alarm rate, the missed alarm rate and the time delay. There is a range of possible faults, and it is unlikely that any system will correctly identify and respond to all fault modes. The number of false alarms that a detection system generates is very important, because the response to a false alarm tends to degrade the performance of the monitored device when no fault is present. The missed alarm rate is even more critical, because failing to respond to a real fault can have disastrous consequences. The speed at which the detection system operates is also a critical consideration. It is measured by the time delay, the time difference between the actual fault occurrence and its detection. Ideally, remedial behavior should be triggered within a sufficiently short period so that the controlled system is never put at risk. The robustness of a fault detection scheme in the presence of modeling errors is also of major importance in most practical cases.
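The three criteria can be made concrete with a small computation. The following is a minimal sketch, assuming per-sample definitions of the rates and a constant sampling period; the function name `detection_metrics`, the 0/1 encoding of the sequences and the example record are illustrative assumptions, not definitions taken from [1].

```python
from typing import Optional, Sequence, Tuple


def detection_metrics(
    truth: Sequence[int], alarm: Sequence[int], dt: float = 1.0
) -> Tuple[float, float, Optional[float]]:
    """Return (false alarm rate, missed alarm rate, time delay).

    truth[k] is 1 when a fault is actually present at sample k, else 0.
    alarm[k] is 1 when the detector flags sample k, else 0.
    dt is the sampling period (assumed constant).
    """
    if len(truth) != len(alarm):
        raise ValueError("truth and alarm must have the same length")

    normal = [a for t, a in zip(truth, alarm) if t == 0]
    faulty = [a for t, a in zip(truth, alarm) if t == 1]

    # Fraction of fault-free samples that raised an alarm.
    false_alarm_rate = sum(normal) / len(normal) if normal else 0.0
    # Fraction of faulty samples that raised no alarm.
    missed_alarm_rate = sum(1 - a for a in faulty) / len(faulty) if faulty else 0.0

    # Time from fault onset to the first alarm at or after onset.
    delay = None
    if 1 in truth:
        onset = list(truth).index(1)
        for k in range(onset, len(alarm)):
            if alarm[k] == 1:
                delay = (k - onset) * dt
                break
    return false_alarm_rate, missed_alarm_rate, delay


# Hypothetical record: fault starts at sample 4, one spurious alarm at sample 1.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
alarm = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
far, mar, delay = detection_metrics(truth, alarm, dt=0.5)
```

In this hypothetical record one of four fault-free samples is flagged, two of six faulty samples are missed, and the first alarm after onset arrives two samples late. Lowering a detection threshold typically trades a shorter delay and fewer missed alarms for more false alarms, which is the tradeoff described above.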

This research is part of an ongoing NASA-sponsored research project with the goal of improving aircraft safety. Aircraft have a long-standing tradition of being the safest among all modes of transportation. Nevertheless, as aviation continues to grow, there are concerns that unless actions are taken to drastically reduce accident rates, increased flights will lead to more accidents. Aircraft hazards can be sorted into two major categories. The first is related to the consequences of intentional actions or failures. These hazards are usually addressed by aircraft security and are outside the scope of this dissertation. The second category refers to the undesirable consequences of unintentional actions or failures. These are addressed by aviation safety, which is our research focus. Hazards caused by component failures account for roughly 25 percent of fatal aircraft accidents. The increased complexity of aircraft will make it more difficult to identify all potential failure modes in the design phase. However, we assume that incipient hardware failures have characteristic signatures in their measurements. This is the premise of the data-driven approach to fault detection. By detecting these signatures, interpreting their significance, and alerting the flight and maintenance crew, the effect of a failure can be drastically reduced.

1.2 Research Approach

Fault detection methods can be broadly classified as model-based or data-driven. Loosely speaking, model-based fault detection uses prior knowledge of the system to develop mathematical models that can, in turn, be used as references to evaluate the current data. The data-driven approach relies primarily on observations that allow the definition of a normal condition. A detailed literature review is in Chapter 2.

In model-based approaches, there exist methods that do not rely on residuals to indicate faults. One representative example is based on the use of multiple models (MM) [3]. It runs a bank of filters in parallel, each based on a model matching a possible system structure under a different failure. Abrupt changes of the system are explicitly modeled by switching from one model to another in a probabilistic manner. The model probabilities are used as an indication of a failure because they provide a meaningful measure of how likely each fault mode is at a given time.

Most other model-based fault detection methods are residual-based [4][5]. In this approach, a mathematical model is built from knowledge of the input and output of the system. The actual output is compared with the nominal behavior produced by the model, and the differences form the residuals. A decision on fault detection is then made from the residuals. Model-based methods have been used in the production industry to overcome difficulties that arise with limit checking, and they claim advantages such as higher sensitivity: smaller faults can be detected and different faults can be isolated.

The obvious disadvantage of the model-based approach is the need for an accurate model of the process. When a suitable model cannot be derived from physical laws, a data-driven method can be applied. Such methods have gained increased acceptance in recent years, especially due to better techniques to define and detect patterns [6]. Unlike the model-based approach, the data-driven method does not require an accurate mathematical model. When the system is complex, it is difficult to develop an accurate mathematical model that represents the true system, and the approximations and assumptions made in modeling compromise accuracy. Model-based fault detection also requires knowledge of the input to the system, which in certain cases might not be available. In an aircraft system especially, it is difficult to always have access to the system’s input, whereas the output of the system is mostly available.

We are interested in data-driven methods that require a minimum amount of analytical information about the system. Our approach detects faults using data mining techniques alone, based on an extensive collection of both normal and faulty performance data. In this dissertation, trending analysis is used for fault detection.

In a majority of cases, process malfunctions leave a distinct trend in the monitored sensors. These distinct trends can be used to identify the underlying abnormality in the process. Trending analysis can be defined as a search for patterns over time in order to identify the ways in which they change and develop, veer in new directions or shift. Thus, a suitable classification and analysis of process trends can detect a fault earlier and lead to quicker control action [7]. Many papers and projects [8][9] have shown that trend modeling can be used to explain important events, diagnose malfunctions, and predict future states. The premise of fault trending analysis is that hardware failures leave characteristic signatures in the sensor data and that data processing tools can make the effect of the fault visible [9][11]. Instead of mathematical models that permit the determination of normal behavior, we propose to make use of an extensive collection of sensor data and create empirical descriptions of various conditions. Moreover, commercial aircraft have extensive instrumentation, usually with built-in redundancy that is both physical and analytical; hence sensor faults can be neglected and measurements can be considered reliable.

In this thesis, we develop two different trending approaches: one uses the convex hull and the other uses a data mining technique, the support vector machine (SVM). Both belong to the classification category, which assigns a new object to one of a set of classes that are known beforehand. The classifier that performs this operation is built from a set of example objects. Initially, we apply the convex hull to perform one-class classification. In one-class classification it is assumed that information is available for only one of the classes, which in our research is the normal class. The boundary between the two classes, normal and faulty, has to be estimated from data of the normal class only. The convex hull can be used to define a boundary around the normal class. Next, two-class classification with the aid of SVM is discussed. It is assumed that data from two classes are available and that the decision boundary is supported from both sides by example objects. Close attention is paid to how to build a string-based kernel that is more appropriate for fault detection purposes. Multiclass SVM is also explored for fault isolation. Simulation data is from NASA's model for a B747 aircraft. The test system is a closed-loop nonlinear model obtained from the software package FTLAB 747, which is simulated using Matlab Simulink. A variety of faults have been simulated to study the performance of the proposed fault detection and isolation methods.
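The idea of using the convex hull of normal data as a one-class boundary can be illustrated in two dimensions. The sketch below is only an illustration under stated assumptions: it uses Andrew's monotone chain algorithm for a planar hull, which is a substitute chosen for brevity and not necessarily the construction used in this dissertation, and the sample points are hypothetical two-feature measurements.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Planar convex hull (monotone chain), vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def is_normal(hull: List[Point], p: Point) -> bool:
    """One-class decision: True if p lies inside or on the hull of normal data."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hull: no enclosed region to test against
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


# Hypothetical normal-condition samples of two features.
normal_data = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (0.5, 1.5)]
hull = convex_hull(normal_data)
```

Here `is_normal(hull, (1, 1))` accepts a point inside the normal region and `is_normal(hull, (3, 1))` rejects a point outside it. Only normal data are needed to build the boundary, which is the defining property of one-class classification.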

Most current trend analysis is based on univariate trends, that is, on monitoring the trend of each process variable separately. Univariate trend analysis has two limitations: (1) the number of false alarms is increased, and (2) the dependence among process variables cannot be taken into account [12]. For the simplest case, we consider two variables. In Figure 1.1, two variables y1 and y2 are plotted against each other. They follow a multivariate distribution, and the ellipse represents the region containing normal measurements from this distribution. The same observations are also plotted as individual charts for y1 and y2. Suppose the point indicated by the symbol ⊗ is an abnormal measurement, which lies clearly outside the ellipse. Neither of the univariate charts gives any indication of a problem for point ⊗, because it is within limits in both charts. The individual univariate charts effectively create a joint acceptance region shaped like a square (shown with the ellipse). This leads not only to accepting a bad measurement as good (point ⊗) but also to rejecting a good measurement as bad (point ◊).

The univariate trending limitations mentioned above worsen as the number of variables increases. To avoid them, we provide an efficient fault detection scheme that looks at the trends of multiple variables together, i.e. multivariate trending. If multivariate trend analysis shows that the measurements tend to leave the joint region, the measurement can be labeled as a fault.

Figure 1.1 Multivariate vs. Univariate
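The two kinds of error illustrated in Figure 1.1 can be reproduced numerically. The following is a minimal sketch, assuming two standardized variables (zero mean, unit variance) with correlation 0.9, univariate limits of ±2 standard deviations, and an elliptical region given by a squared Mahalanobis distance below 5.991 (the 95% point of the chi-square distribution with two degrees of freedom). All of these values are illustrative assumptions and are not taken from the figure.

```python
def in_univariate_limits(y1: float, y2: float, limit: float = 2.0) -> bool:
    """Square acceptance region formed by two separate univariate charts."""
    return abs(y1) <= limit and abs(y2) <= limit


def mahalanobis_sq(y1: float, y2: float, rho: float) -> float:
    """Squared Mahalanobis distance for standardized variables with correlation rho."""
    return (y1 * y1 - 2.0 * rho * y1 * y2 + y2 * y2) / (1.0 - rho * rho)


def in_joint_region(
    y1: float, y2: float, rho: float = 0.9, threshold: float = 5.991
) -> bool:
    """Elliptical acceptance region that accounts for the correlation."""
    return mahalanobis_sq(y1, y2, rho) <= threshold


# Point that violates the correlation structure but stays within both limits.
abnormal = (1.5, -1.5)
# Point that follows the correlation structure but exceeds both limits.
good = (2.2, 2.2)

accepted_by_charts = in_univariate_limits(*abnormal)   # True: missed alarm
accepted_by_ellipse = in_joint_region(*abnormal)       # False: correctly flagged
rejected_by_charts = not in_univariate_limits(*good)   # True: false alarm
kept_by_ellipse = in_joint_region(*good)               # True: correctly accepted
```

The first point plays the role of ⊗ and the second the role of ◊: the univariate charts miss the former and raise a false alarm on the latter, while the joint region classifies both correctly.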

1.3 Organization of the Dissertation

This dissertation is organized into seven chapters. This chapter introduces fault detection and the objective of this research. The literature related to fault detection methodologies is reviewed in Chapter 2. Chapter 3 discusses one-class classification, which uses a convex hull to determine the closed boundary of the normal data class. Chapter 4 provides the details of binary classification with SVM, with a focus on how to define the string-based kernel. Fault isolation with the help of multiclass SVM is introduced in Chapter 5. In Chapter 6, we verify the two proposed approaches by testing their performance with data obtained from the B747 aircraft model. Finally, conclusions and recommendations for future work are given in Chapter 7.


Chapter 2 Literature Review

The purpose of this chapter is to review the common methods for fault detection. In the past decade, considerable research has been devoted to the area of system fault detection and isolation (FDI). There is an abundance of literature on process fault diagnosis, ranging from analytical methods to artificial intelligence and statistical approaches [13],[14]. As mentioned in Chapter 1, the fault detection approach can be either model-based or model-free (data-driven).

Model-based methods are also known as white box models. They are largely dependent on the laws of physics. The key feature is that a model is maintained that reflects the important structure and features of the system. In the basic structure of many model-based detection systems, the actual system and a model of that system run in parallel. When some unexpected behavior is observed in the real system, it produces symptoms, and possible explanations for these symptoms can be identified by comparing the predicted behavior of the model with the behavior of the faulty system. These methods have been used in the production industry to overcome difficulties that arise with limit checking, and they claim advantages such as higher sensitivity: smaller faults can be detected and different faults can be isolated. The obvious disadvantage is the need for an accurate model of the process. Model-free approaches rely primarily on observations (and experience) that allow the definition of a normal condition. They have gained increased acceptance in recent years, especially due to better techniques to define and detect patterns. The following section reviews model-based FDI methods. After that we focus on model-free techniques.


2.1 Model-Based Methods

Model-based fault detection can be broadly classified as qualitative or quantitative. The model is usually developed from some fundamental understanding of the physics of the process. In quantitative models this understanding is expressed as mathematical functional relationships between the inputs and outputs of the system. In qualitative models, in contrast, these relationships are expressed in terms of qualitative knowledge about the process. Typical qualitative models are causal models and abstraction hierarchies. Details are given below.

2.1.1 Quantitative Model-Based Methods

Most quantitative model-based methods are residual-based. Relying on an explicit model of the monitored plant, these model-based FDI methods require two steps. The first step generates inconsistencies between the actual and expected behavior. Such inconsistencies, also called residuals, reflect the potential faults of the system. The second step chooses a decision rule for diagnosis. There exists a wide variety of residual-based approaches for linear systems, e.g. the observer-residual-based approach, the parity space approach, and the parameter estimation approach.
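The two steps, residual generation followed by a decision rule, can be sketched for a toy system. The example below is a minimal illustration, assuming a noise-free first-order discrete-time model y[k+1] = a·y[k] + b·u[k], an actuator fault modeled as a loss of effectiveness, and a fixed-threshold decision rule. The model, its parameters and the threshold are hypothetical and are not drawn from the cited approaches.

```python
from typing import List, Optional


def simulate(
    a: float,
    b: float,
    inputs: List[float],
    gain_after_fault: float = 1.0,
    fault_at: Optional[int] = None,
) -> List[float]:
    """Simulate y[k+1] = a*y[k] + b*eff*u[k]; eff drops to gain_after_fault at fault_at."""
    y = 0.0
    outputs = []
    for k, u in enumerate(inputs):
        outputs.append(y)
        faulty = fault_at is not None and k >= fault_at
        eff = gain_after_fault if faulty else 1.0
        y = a * y + b * eff * u
    return outputs


def residuals(measured: List[float], predicted: List[float]) -> List[float]:
    """Step 1: inconsistency between actual and expected behavior."""
    return [m - p for m, p in zip(measured, predicted)]


def first_alarm(res: List[float], threshold: float) -> Optional[int]:
    """Step 2: simple decision rule; index of the first residual beyond the threshold."""
    for k, r in enumerate(res):
        if abs(r) > threshold:
            return k
    return None


u = [1.0] * 50
predicted = simulate(0.9, 0.1, u)                                    # nominal model
measured = simulate(0.9, 0.1, u, gain_after_fault=0.5, fault_at=20)  # faulty plant
res = residuals(measured, predicted)
alarm_index = first_alarm(res, threshold=0.02)
```

With these assumed values the residual is zero until the fault occurs at sample 20 and crosses the threshold one sample later, when the reduced actuator effectiveness first shows in the output. In practice the threshold has to be chosen against noise and modeling error, which this noise-free sketch ignores.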

There exist some model-based approaches that do not rely on residuals to indicate faults. One representative example is based on the use of multiple models (MM). It runs a bank of filters in parallel, each based on a model matching a possible system structure under a different failure. In noninteracting MM, the single-model-based filters run in parallel without mutual interaction. Such an approach is quite effective in handling problems with an unknown structure or parameter but without structural or parametric changes. However, the FDI problem does not fit well into such a framework because, in general, the system structure or parameters do change as a component or subsystem fails. A notable recent advance in MM is the development of the interacting multiple-model (IMM) estimator [3]. IMM overcomes the above weakness of the noninteracting MM approach by explicitly modeling the abrupt changes of the system as "switching" from one model to another in a probabilistic manner. In IMM, the model probabilities are used as an indication of a failure because they provide a meaningful measure of how likely each fault mode is at a given time.
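How model probabilities can serve as a fault indicator is illustrated by the sketch below. It is a deliberately reduced example and not the IMM estimator of [3]: the per-model filters and the state mixing step are omitted, each "model" is just a hypothesized actuator gain for a scalar measurement z = gain·u + noise, and switching is represented by a two-state Markov transition matrix. The gains, noise level, transition probabilities and measurement values are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a scalar Gaussian."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def update_model_probabilities(
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    likelihoods: Sequence[float],
) -> List[float]:
    """One Bayesian update of the mode probabilities.

    transition[i][j] is the assumed probability of switching from model i to model j.
    likelihoods[j] is the likelihood of the current measurement under model j.
    """
    n = len(prior)
    # Prediction: allow for a possible switch between models.
    predicted = [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight each model by how well it explains the measurement.
    unnormalized = [predicted[j] * likelihoods[j] for j in range(n)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


gains = [1.0, 0.5]                      # model 0: normal, model 1: half effectiveness
transition = [[0.99, 0.01], [0.01, 0.99]]
noise_std = 0.1
u = 1.0

# Hypothetical measurements: the actuator loses half its effectiveness after 5 samples.
measurements = [1.0, 1.02, 0.98, 1.0, 1.01, 0.5, 0.52, 0.49, 0.5, 0.51]

probs = [0.99, 0.01]
history = []
for z in measurements:
    likes = [gaussian_pdf(z, g * u, noise_std) for g in gains]
    probs = update_model_probabilities(probs, transition, likes)
    history.append(probs)
```

In this sketch the probability of the normal model stays close to one for the first five samples and the probability of the fault model rises sharply once the measurements match the reduced gain. The small switching probability is what lets the estimate recover from near-zero belief in the fault model.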

Quantitative model-based methods have some desirable characteristics. If one has complete knowledge of all inputs and outputs of the system, including all forms of interactions with the environment, fault diagnosis would be a well-defined problem regardless of the number of faults present. On the other hand, if there is only a single sensor indicating whether the system is normal or faulty, then nothing can be diagnosed including the proper functioning of the sensor itself. The effectiveness of any diagnostic procedure is limited by the availability of sensor information [11].

A crucial need in the model-based approach is to assess the significance of the observed changes with respect to noise and unknown inputs that cannot, in any reasonable way, be modeled as random processes with known statistics. This is a general limitation of all the model-based approaches developed so far. One popular way of addressing it is disturbance decoupling. In this approach, all uncertainties are treated as disturbances, and filters are designed to decouple the effects of faults and unknown inputs so that they can be differentiated [16], [17].

An alternative is to address the FDI problem from a statistical point of view, with faults modeled as deviations in the parameter vector of a stochastic system. Fault detection and isolation are then stated as hypothesis testing problems [18]. The key feature of this method is its ability to handle noise and uncertainties, to reject nuisance parameters and to select one among several hypotheses. First, FDI problems in dynamic systems are reduced to the universal static problem of monitoring the mean value of a Gaussian vector with the help of a suitable residual generation. Then different hypothesis testing methods are investigated for FDI [18].
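Monitoring the mean of a Gaussian residual vector can be sketched as a simple hypothesis test. The example below is a minimal illustration and not the procedure of [18]: it assumes independent residual components with known variances, tests the hypothesis of zero mean with the statistic T = n · Σ (mean_i² / var_i), and compares T against 9.210, the 99% point of the chi-square distribution with two degrees of freedom. The residual values are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_shift_statistic(
    residuals: List[Tuple[float, ...]], variances: Sequence[float]
) -> float:
    """Chi-square statistic for H0: residual mean is zero (known diagonal covariance)."""
    n = len(residuals)
    dim = len(variances)
    means = [sum(r[i] for r in residuals) / n for i in range(dim)]
    return n * sum(m * m / v for m, v in zip(means, variances))


def fault_detected(
    residuals: List[Tuple[float, ...]],
    variances: Sequence[float],
    threshold: float = 9.210,
) -> bool:
    """Reject H0 (declare a fault) when the statistic exceeds the threshold."""
    return mean_shift_statistic(residuals, variances) > threshold


variances = (0.25, 0.25)
# Hypothetical two-component residuals under normal and faulty operation.
normal_residuals = [(0.1, -0.1), (-0.1, 0.1), (0.05, -0.05), (-0.05, 0.05)]
faulty_residuals = [(1.1, 0.9), (0.9, 1.1), (1.0, 1.0), (1.0, 1.0)]

normal_flag = fault_detected(normal_residuals, variances)
faulty_flag = fault_detected(faulty_residuals, variances)
```

The normal residuals average to zero and give a statistic near zero, while the faulty residuals have a mean of about one in each component and give a statistic of about 32, well above the threshold. The threshold fixes the false alarm probability under the stated Gaussian assumptions.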

Moreover, the types of models the analytical approaches can handle are limited to linear and some very specific nonlinear models. For a general nonlinear model, linear approximations can prove to be poor, and the effectiveness of these methods might be greatly reduced. When a large-scale process is considered, the bank of filters can be very large, increasing the computational complexity.

2.1.2 Qualitative Model-Based Methods

Qualitative models can be developed either as qualitative causal models or as abstraction hierarchies. Figure 2.1 shows the taxonomy of domain knowledge based on these two broad categories. In causal models, the cause-effect relations can be represented in the form of signed digraphs. Causal models are a very good alternative when quantitative models are not available but the functional dependencies are understood. Another form of model knowledge is the development of abstraction hierarchies based on decomposition. The idea of decomposition is to be able to draw inferences about the behavior of the overall system solely from the laws governing the behavior of its subsystems.
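A signed digraph can be represented as a dictionary of signed edges, and the qualitative effect of a root cause can be predicted by multiplying signs along the paths. The sketch below is an illustration under stated assumptions: the process variables and edge signs are hypothetical, and a node reached with conflicting signs is marked 0 to flag an ambiguous prediction.

```python
from collections import deque
from typing import Dict, List, Tuple

Edges = Dict[str, List[Tuple[str, int]]]


def propagate(edges: Edges, root: str, root_sign: int) -> Dict[str, int]:
    """Predict the sign of deviation (+1, -1, or 0 for ambiguous) of each reachable node."""
    signs = {root: root_sign}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child, edge_sign in edges.get(node, []):
            s = signs[node] * edge_sign
            if child not in signs:
                signs[child] = s
                queue.append(child)
            elif signs[child] != s and signs[child] != 0:
                # Two paths disagree: mark as ambiguous and pass that on.
                signs[child] = 0
                queue.append(child)
    return signs


# Hypothetical causal model: +1 means "increases with", -1 means "decreases with".
EDGES: Edges = {
    "valve_opening": [("flow", +1)],
    "flow": [("tank_level", +1), ("upstream_pressure", -1)],
}

# Root cause: the valve is stuck partly closed (negative deviation).
predicted = propagate(EDGES, "valve_opening", -1)
```

For this hypothetical graph the prediction is a lower flow, a lower tank level and a higher upstream pressure. Comparing such predicted sign patterns with the observed deviations is one way a causal model can be used when no quantitative model is available.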


Figure 2.1 Forms of Qualitative Knowledge

Abstraction hierarchies help to quickly focus the attention of the diagnostic system to problem areas. One of the advantages of qualitative methods based on deep knowledge is that they can provide an explanation of a fault propagation path. This is indispensable when it comes to decision-support for operators. They can also guarantee completeness in that the actual fault will not be missed in the final set of faults identified. However, they suffer from the resolution problems resulting from the ambiguity in qualitative reasoning. When quantitative information is partially available, one could use the order-of-magnitude analysis or interval-calculus to improve the resolution of purely qualitative methods [11].

In the case of the qualitative model-based approaches, the combinatorial complexity is unavoidable and can only be partly alleviated with an efficient search [19]. Because of many multiple fault combinations, the search for multiple faults by specifying them explicitly as different classes and obtaining training patterns for them is not feasible.


From an industrial application viewpoint, the majority of fault diagnostic applications in process industries are based on model-free, or process-history-based, approaches. This is because process-history-based approaches are easy to implement, requiring very little modeling effort and a priori knowledge. Further, even for processes for which models are available, the models are usually steady-state models. It would require considerable effort to develop dynamic models specialized for fault diagnosis applications.

2.2 Model-Free Methods

Unlike the model-based approaches, where a priori knowledge about the system is needed, model-free methods need only the availability of a large amount of historical data. They are also known as black box approaches. In this research, our goal is to develop global vehicle health indicators that do not rely on mathematical models yet are capable of detecting process malfunctions. There are different ways in which data can be transformed and presented as a priori knowledge to a detection system. This is known as feature extraction. In terms of feature extraction, model-free methods can be either qualitative or quantitative in nature. Two of the major methods that extract qualitative history information are expert systems and trend modeling. Methods that extract quantitative information can be statistical or non-statistical. Neural networks are an important class of non-statistical classifiers. Data mining is currently one of the most active research fields. The key advantage of data-mining-based fault detection is that it can automatically generate concise and accurate detection models from large amounts of data.


2.2.1 Qualitative Feature Extraction

2.2.1.1 Expert Systems

The main advantages in the development of expert systems for diagnostic problem-solving are: ease of development, transparent reasoning, the ability to reason under uncertainty and the ability to provide explanations for the solutions provided.

A number of researchers have worked on the application of expert systems to diagnostic problems. Becraft has proposed an integrated framework comprising a neural network and an expert system [20]. A neural network is used as a first-level filter to diagnose the most commonly encountered faults in chemical process plants. Once the faults are localized within a particular process by the neural network, a deep knowledge expert system analyzes the result, and either confirms the diagnosis or offers an alternative solution. Tirifa has proposed a hybrid system that uses signed directed graphs (SDG) and fuzzy logic [21]. The SDG model of the process is used to perform qualitative simulation to predict possible process behaviors for various faults. Those predictions are used to generate (if-then) rules that are evaluated by an expert system using information about the actual process state and fuzzy logic.

There are two types of methods for modeling knowledge for an expert system: shallow knowledge and deep knowledge [22]. Shallow knowledge expert systems use (if-then) type rules as the primary means of knowledge representation. These rules are formulated based on a large collection of empirical observations. In cases where the failure modes are not well known (e.g., some faults are unanticipated and have a very low probability of occurring), these systems are inadequate, and deep knowledge systems are more appropriate. When confronted with an unfamiliar problem, an expert can resort to “first principles”. Through an in-depth understanding of the problem, an expert can resolve problems that have not been well documented by prior observation. In this situation, the knowledge used by the expert is referred to as “deep knowledge”. This approach provides a broader knowledge base, as well as modularity for incorporating new knowledge.

However, in all applications, the limitations of an expert system approach are obvious. An expert-based fault detection system fails to generalize and to detect new faults without known signatures. Knowledge-based systems developed from expert rules are very system-specific. Their representation power is quite limited, and they are difficult to update [23].

2.2.1.2 Trend Analysis

A second approach to qualitative feature extraction is the abstraction of trend information. For tasks such as diagnosis, qualitative trend representation often provides valuable information that facilitates temporal reasoning about the process behavior. In a majority of cases, process malfunctions leave a distinct trend in the monitored sensors. These distinct trends can be used to identify the underlying abnormality in the process. Thus, a suitable classification and analysis of process trends can detect a fault earlier and lead to quick control. Many papers and projects have shown that trend modeling can be used to explain the important events happening in the process, diagnose malfunctions and predict future states. The following is an overview of some trend analysis methods and applications.


Cheung has built a formal framework for the representation of process trends [24]. A language called triangular episodic representation is formulated and used in trend extraction. It is based on temporal episodes, modeled geometrically as triangles, that describe the local temporal patterns in the data, as illustrated in Figure 2.2. In this triangulation, each segment of a trend is represented by its initial slope, its final slope (at each critical point of the trend) and a line segment connecting the two critical points. A series of triangles constitutes a process trend. With this method, the actual trend always lies within the bounding triangle, which illustrates the maximum error in the representation of the trend. This triangular episode is similar to another trend representation language, primitives. Primitives are the fundamental elements of the trend description language, i.e. A(0,0), B(+,+), C(+,0), D(+,-), E(-,+), F(-,0), G(-,-), where the signs are those of the first and second derivatives, respectively (Figure 2.4).
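As an illustration of the primitive alphabet, the following minimal sketch labels the interior samples of a signal from the signs of its first and second derivatives. The use of central finite differences with unit sample spacing, the tolerance `tol`, and all function names are assumptions made for this example; they are not taken from [24]. Sign pairs that are not in the seven-letter alphabet, such as (0,+), return `None`.

```python
# Primitive alphabet quoted in the text: signs of (first, second) derivative.
PRIMITIVES = {
    (0, 0): "A", (1, 1): "B", (1, 0): "C", (1, -1): "D",
    (-1, 1): "E", (-1, 0): "F", (-1, -1): "G",
}


def sign(value, tol):
    """Return -1, 0 or +1, treating |value| <= tol as zero."""
    if value > tol:
        return 1
    if value < -tol:
        return -1
    return 0


def primitive(y_prev, y_curr, y_next, tol=1e-6):
    """Label the middle sample using central finite differences (unit spacing)."""
    d1 = (y_next - y_prev) / 2.0            # first derivative estimate
    d2 = y_next - 2.0 * y_curr + y_prev     # second derivative estimate
    return PRIMITIVES.get((sign(d1, tol), sign(d2, tol)))  # None if not in the alphabet


def label_trend(samples, tol=1e-6):
    """Label every interior sample of a sequence with a primitive."""
    return [primitive(a, b, c, tol)
            for a, b, c in zip(samples, samples[1:], samples[2:])]
```

With noisy data the tolerance would have to be chosen relative to the noise level, which is the trend extraction difficulty discussed later in this section.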

• Wavelets

Vedam proposed a wavelet theory based nonlinear adaptive system for identification of trends from sensor data, named W-ASTRA, and later proposed dyadic B-Splines-based trend analysis [8]. It uses the concept of multiresolution analysis in the neural network input. Sensor data is projected onto scaling functions at different levels. First, the coefficients from the highest level are used to identify the primitives. If a unique primitive identification is possible, then the next set of samples is collected; otherwise the coefficients from the next lower level are used. W-ASTRA then compares the sensor trends with their fault signatures, where a fault signature is the segment of a sensor trend that characterizes its behavior for a given fault class.

Figure 2.2 Triangular Episodic Representation

Figure 2.4 Fundamental Language: Primitive

• Qualitative Temporal Shape Analysis

Konstantinov proposed a generic methodology for qualitative analysis of the temporal shapes of process variables with the help of an expandable shape library that stores shapes like decreasing concavely, decreasing convexly and so on [8]. This procedure consists of three phases: analytical approximation of the process variable, its transformation into symbolic form based on the signs of the first and second derivatives of an analytical approximation function and a degree of certainty calculation.

The biggest challenge in applying trend analysis for FDI is how to extract trends automatically from noisy process data. To obtain a signal trend that is not too susceptible to momentary variations due to noise, some kind of filtering needs to be employed. One may simply use a filter (such as an auto-regressive filter) with a priori chosen filter coefficients (specifying the required degree of smoothing). However, these types of filters cannot distinguish well between a transient and a true instability [26], and they may distort the essential qualitative characteristics of the signal. Avoiding this problem requires that the trend be viewed from different time scales or different levels of abstraction. Dash proposed an interval-halving polynomial fit approach for automatic trend extraction from noisy process data [27]. This approach parameterizes the data as a sequence of primitives, with the “goodness of fit” determined with respect to noise. The interval-halving approach is a recursive method: initially a single primitive is sought to characterize the entire data record; if this fails, the interval is halved and the process is repeated on the halved length scale until success is achieved. The procedure is applied recursively until the entire data record is covered. Wavelet-based denoising is applied to remove noise. To determine the “goodness of fit”, i.e., the significance of the error, they use the estimate of noise provided by the wavelet analysis.
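The interval-halving idea can be sketched as follows. This is a simplified illustration, not the procedure of [27]: it fits a straight line instead of a general polynomial, it takes the noise standard deviation `noise_std` as a given number instead of estimating it by wavelet analysis, and the acceptance rule (RMS residual below `factor * noise_std`) and the minimum segment length are assumptions chosen for the example.

```python
import math


def fit_line(ts, ys):
    """Least-squares line through (ts, ys); returns (slope, intercept, rms_residual)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    if sxx == 0.0:  # a single sample: nothing to fit
        return 0.0, y_mean, 0.0
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sxx
    intercept = y_mean - slope * t_mean
    rms = math.sqrt(sum((y - (slope * t + intercept)) ** 2
                        for t, y in zip(ts, ys)) / n)
    return slope, intercept, rms


def interval_halving(ts, ys, noise_std, factor=2.0, min_len=3):
    """Cover the record with segments whose fit error is comparable to the noise."""
    segments = []
    start, n = 0, len(ys)
    while start < n:
        end = n  # first try to describe all the remaining data with one segment
        while True:
            slope, _, rms = fit_line(ts[start:end], ys[start:end])
            if rms <= factor * noise_std or end - start <= min_len:
                break
            end = start + (end - start) // 2  # fit rejected: halve the interval
        segments.append((start, end, slope))
        start = end  # continue with the data that is not yet covered
    return segments
```

Each returned tuple holds the start index, the end index (exclusive) and the fitted slope of one segment; the sign of the slope could then be mapped to a primitive.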

2.2.2 Quantitative Feature Extraction

2.2.2.1 Neural Network

In general, the learning strategy can be classified into supervised and unsupervised learning. In supervised learning strategies, by choosing a specific topology for the neural network, the network is parameterized in the sense that the problem at hand is reduced to the estimation of the connection weights. The connection weights are learned by explicitly utilizing the mismatch between the desired and actual values to guide the search. This gives supervised techniques the ability to correctly identify a known error for which the symptoms are not known. The most popular supervised learning strategy in a neural network has been back-propagation. A neural network that uses the unsupervised estimation technique is known as a self-organizing neural network, because its structure is adaptively determined based on the input to the network; thus unsupervised learning may be used to identify new classes of errors not previously considered. Ortega et al. [28] constructed a neural-based diagnostic system to inspect the defects of the ropes


backpropagation and momentum coefficient acquired the best results. A hierarchical neural network architecture for the detection of multiple faults was proposed by Watanabe [29]. Bakshi proposed Wavenet, a multi-resolution hierarchical neural network [30]. Wavenet is a neural network with one hidden layer whose basis functions are drawn from a family of orthonormal wavelets. There are also other architectures, such as self-organizing maps.

There are some limitations, however, to methods that are based solely on historic process data. The main one is their limited capability to generalize outside of the training data. This problem can be alleviated by radial and ellipsoidal units, which avoid a decision when there are no similar training patterns in that region. This allows the network to detect unfamiliar situations arising from novel faults. Besides their inability to generalize to unfamiliar regions of the measurement space, networks also have difficulty with multiple faults [15]. This brings out a crucial point of distinction between model-based approaches and classifiers based on historic process data.

2.2.2.2 Data Mining -- Classification

Data mining is concerned with uncovering patterns, associations, changes, anomalies, and statistically significant structures and events in data. Simply put, it is the ability to take data and pull from it patterns or deviations that may not be easily seen with the naked eye. Another term sometimes used is knowledge discovery.

The recent rapid development in data mining has made available a wide variety of algorithms drawn from the fields of statistics, pattern recognition, machine learning, and databases. The key advantage of data mining-based fault detection is that it can automatically generate concise and accurate detection models from large amounts of data.


The methodology itself is general, and therefore can be used to build fault detection systems for a wide variety of computing environments.

Data mining techniques such as Support Vector Machines (SVM) and association rules have been investigated in the context of fault detection. SVM is a relatively new type of learning algorithm. When used for classification, an SVM separates a given set of binary-labeled training data with a hyperplane that is maximally distant from them (known as the maximal margin hyperplane) [31]. For cases in which no linear separation is possible, the SVM can nonlinearly map the input vector into a high dimensional feature space where the data can be linearly classified. The hyperplane found by the SVM in feature space corresponds to a nonlinear decision boundary in the input space. Given a test instance, its distance from the hyperplane can be calculated and, following some threshold, it can be determined whether the instance is anomalous. Sample applications in detecting novel data can be found in [32][33]. However, as a classifier, the SVM needs prior knowledge of both the learned domain and the novel region to provide a learning basis.

There has been an increased interest in data mining-based approaches to build detection models for intrusion detection systems (IDS). These models generalize from both known attacks and normal behavior in order to detect unknown attacks. They can also be generated more quickly and in a more automated way than manually encoded models, which require difficult analysis of audit data by domain experts. Several effective data mining techniques for detecting intrusions have been developed [34][35][36], many of which perform close to or better than systems engineered by domain experts. In [37], the idea is to first compute the association rules and frequent episodes from audit data, which capture the intra- and inter-audit record patterns. These patterns are then utilized, with user participation, to guide the data gathering and feature selection processes.

In some cases, all positive examples are alike but each negative example is negative in its own way. Negative examples come from an unknown number of negative classes. In other cases, one class is sampled very well, while the other class is severely undersampled. The measurements on the undersampled class might be very expensive or difficult to obtain. The objective then becomes to make a description of a target set of objects and to detect which new objects resemble this training set. The difference from conventional classification is that in one-class classification only examples of one class are available. The objects from this class are called the target objects. All other objects are named the outlier objects. In the literature, a large number of different terms have been used for this problem. The term one-class classification originates from [38], but the terms outlier detection and novelty detection [39] are also used. One possible approach to one-class classification is to use a density method, which directly estimates the density of the target objects. By assuming a uniform outlier distribution and applying Bayes’ rule, the description of the target class is obtained. For instance, in [40] the density is estimated by a Parzen density estimator. In [39] not only the target density is estimated, but also the outlier density. Unfortunately, this procedure requires a complete density estimate in the complete feature space. Especially in a high dimensional feature space, this requires huge amounts of data. Furthermore, it assumes that the training data is a typical sample from the true data distribution. In most cases the user has to generate or measure training data and might not know beforehand what the true distribution is. This makes the application of the density methods problematic.


Alternatively, boundary methods have been developed which focus only on the boundary of the data. They try to avoid the estimation of the complete density of the data and can therefore work with an uncharacteristic training data set. For the boundary methods, it is sufficient that the user can indicate just the boundary of the target class by using examples. An attempt to train just the boundaries of a data set is made in [41], where neural networks are trained with extra constraints to give closed boundaries. In [42], a new type of one-class classifier is presented, the support vector data description. It models the boundary of the target data by a hypersphere with minimal volume around the data. The boundary is described by a few training objects, the support vectors.

We develop one-class classification with the convex hull concept and binary classification (normal and faulty) with SVM. Using the normal data collected we use a standard algorithm to define its convex hull. Measurements that fall outside the convex set are classified as indicating a fault. When both normal and faulty data are available, we consider using SVM for binary classification due to the excellent generalization performance (accuracy on test data) in practice.


Chapter 3 Fault Trending Analysis Part I – Convex Hull

3.1 One-Class Classification

Much effort has been expended to solve the fault detection problem with classification techniques. Although the problem of classification is far from solved in practice, one-class classification is also of interest. In the fault detection problem, all positive examples are alike but each negative example is negative in its own way. For instance, the different faults considered in this thesis are the failures in each of the four control surfaces (elevator, aileron, rudder, stabilizer) and in the engine. Not only are these five faults different from the normal data, they also differ considerably from each other. Therefore, the faulty data comes from a variety of faulty classes. In other cases, one class is sampled very well, while the other class is severely undersampled. The measurements on the undersampled class might be very expensive or difficult to obtain. For example, in a machine monitoring system where the current condition of a system is examined, an alarm is raised when the machine has a problem. Measurements on the normal working condition of a machine are very inexpensive and easy to obtain. On the other hand, measurements of faults would require the destruction of the machine in all possible ways. Therefore, it is rather expensive, if not impossible, to generate all faulty situations [43].

One possible solution is one-class classification [38]. The goal in one-class classification is to make a description of a target set of objects and to detect whether new objects resemble this training set. If they do, the new objects belong to the target class; otherwise, they are in the outlier class. In Chapter 2, we already introduced several different one-class classification methods. Here we are more interested in the boundary approach, which estimates the boundary of the target class, i.e. the normal class, with only the normal data available.

The most straightforward method to obtain a one-class classifier is to estimate the density of the training data and to set a threshold on this density [40]. Several distributions can be assumed, such as Gaussian or a Poisson distribution. When the sample size is sufficiently high and a flexible density model is used, this approach works very well. Unfortunately, this method requires a complete density estimate in the complete feature space. Especially in high dimensional feature space this method requires huge amounts of data. Furthermore, it assumes that the training data is a typical sample from the true data distribution. In most cases the user might not know beforehand what the true distribution might be. All these disadvantages make the application of the density methods problematic [43].
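A minimal one-dimensional sketch of such a density-based one-class classifier is given below. It uses a Parzen estimator with Gaussian kernels, the estimator mentioned in Chapter 2 in connection with [40]; the kernel width `h`, the threshold value and the function names are illustrative assumptions.

```python
import math


def parzen_density(z, train, h):
    """Parzen estimate at z: average of Gaussian kernels of width h (1-D data)."""
    norm = 1.0 / (len(train) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((z - x) / h) ** 2) for x in train)


def is_target(z, train, h, threshold):
    """Accept z as a target (normal) object when its estimated density is high enough."""
    return parzen_density(z, train, h) >= threshold


normal_samples = [0.0, 0.1, -0.1, 0.05]
accepted = is_target(0.0, normal_samples, h=0.1, threshold=0.5)   # close to the data
rejected = is_target(5.0, normal_samples, h=0.1, threshold=0.5)   # far from the data
```

Both `h` and the threshold have to be chosen from the data, and the number of samples needed for a reliable estimate grows quickly with the dimension, which is the drawback discussed in this paragraph.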

By comparison, boundary methods focus only on the boundary of the target data. They avoid estimating the complete density of the data and can therefore work with an uncharacteristic training data set. This not only gives an advantage when just a limited sample is available; it even makes it possible to learn from data when the exact target density distribution is unknown [38]. Although in principle the boundary methods are more efficient than density estimation, it is not directly clear how one should define a boundary around a target set, how to define the resemblance of an object to a target set and where to put the threshold. In most cases a distance to the target set is defined as a function of Euclidean distances between test objects, between the test object and target objects, and between the target objects themselves. However, this requires well-defined distances in the feature space and thus well-scaled features.


The first boundary method is the K-center method, which covers the dataset with k small balls with equal radii [45]. The ball centers $\mu_k$ are placed on training objects so that the maximum distance of all minimum distances between training objects and the centers is minimized. In the fitting of the method to the training data, the following error is minimized:

$$\varepsilon_{k\text{-center}} = \max_i \left( \min_k \| x_i - \mu_k \|^2 \right) \qquad (3.1)$$

The K-centers method uses a forward search strategy starting from a random initialization. The radius is determined by the maximum distance to the objects that the corresponding ball should capture. By this construction the method is sensitive to the outliers in the training set, but it will work well when clear clusters are present in the data. When the centers have been trained, the distance from a test object z to the target set can be calculated. This distance is now defined as:

$$d_{k\text{-center}}(z) = \min_k \| z - \mu_k \|^2 \qquad (3.2)$$
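The two quantities in (3.1) and (3.2) can be evaluated directly once the centers are fixed. The sketch below assumes the centers are already given (the forward search that places them is not shown), and the acceptance rule used in the example, comparing the distance of a test object with the fitted error, is an assumption for illustration.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def k_center_distance(z, centers):
    """Eq. (3.2): squared distance from z to the nearest ball center."""
    return min(sq_dist(z, mu) for mu in centers)


def k_center_error(train, centers):
    """Eq. (3.1): largest nearest-center squared distance over the training set."""
    return max(k_center_distance(x, centers) for x in train)


train = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]        # centers sit on training objects
radius_sq = k_center_error(train, centers)   # squared radius shared by all balls
near = k_center_distance((0.5, 0.0), centers) <= radius_sq   # inside a ball
far = k_center_distance((5.0, 5.0), centers) <= radius_sq    # between the clusters
```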

The other representative methodology is spherical data description [42]. A model that gives a closed boundary, a hypersphere around the data is defined. The sphere is characterized by a center a and radius R in Figure 3.1 and it demands that the sphere

Figure 3.1 The hypersphere containing the target data, described by the center a and radius R

They start by defining the error to minimize

$$\varepsilon(R, a) = R^2 + C \sum_i \xi_i \qquad (3.3)$$

with the constraints that all target objects lie within the hypersphere:

$$\| x_i - a \|^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0 \qquad (3.4)$$

Because they can give an expression for the center $a$ of the hypersphere, it is possible to test whether a new object $z$ is accepted by the description. A test object $z$ is accepted when its distance to the center is smaller than or equal to the radius $R$.

Although the volume is not always actively minimized in the boundary methods, most methods have a strong bias towards a minimal volume solution. How small the volume is depends on the fit to the data. Because the boundary methods rely heavily on the distance between objects, they tend to be sensitive to the scaling of the features. On the other hand, the number of objects required is smaller than in the case of density methods.


For data-driven fault detection it is essential to differentiate sensor data generated by normal operations from data arising from faulty conditions, and we need to do this without relying on detailed mathematical models. A simple approach for the characterization of normal behavior would be to use historical data to compute normal ranges for each measured variable, i.e. static "red-line" limits. For example, a fault in a heat regulator might be detected when a particular temperature gets higher than a given threshold. Such limits are popular because they are relatively easy to specify and use [46]. One could use the criterion that if all sensor readings are within their normal ranges then the situation is normal. This conventional approach is widely used in the process industry, where alarm panels provide a visible indication when variables go out of range.

However, red-line limits have numerous weaknesses, which are becoming increasingly significant as we move toward fault detection for aircraft, including:

1) Late or missed alarms --- red-lines are relatively weak (wide) bounds, detecting faults only once they become critical and often dangerous. Earlier detection would support a wider range of recovery procedures, including preventative maintenance that would extend mission life.

2) False alarms --- red-lines are traditionally made quite wide intentionally, in large part to avoid false alarms. Nevertheless, such false alarms still occur routinely, sometimes resulting in operators eventually ignoring red-line alarms in those troublesome sensors altogether.

3) Failure to track system changes --- predefined red-lines fail to capture changes during the gradual degradation of components.


It is our contention that, especially for aircraft, the “red-line” limits approach may not be completely safe, because simultaneous extreme values in some variables may lead to an unrecoverable condition. In Figure 3.2 we show a “normal” variation of pitch and roll angles for a B747. The variations have been obtained using a simulator and introducing zero-mean random fluctuations to all actuators.
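The red-line criterion and its blind spot for simultaneous extremes can be illustrated with a short sketch. It assumes that the limits are simply the per-variable minimum and maximum over historical normal data; the function names and the two-variable samples are hypothetical and are not taken from the B747 simulation.

```python
def red_line_limits(normal_data):
    """Per-variable (min, max) ranges computed from historical normal samples."""
    columns = list(zip(*normal_data))
    return [(min(c), max(c)) for c in columns]


def within_limits(sample, limits):
    """Red-line criterion: every variable lies inside its own range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(sample, limits))


# Hypothetical normalized (pitch, roll) samples: each extreme occurs on its own.
normal = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
limits = red_line_limits(normal)

# Both variables at their extreme simultaneously, a combination never seen in
# the normal data, still passes the red-line test.
passes = within_limits((1.0, 1.0), limits)
```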

We note that requiring each variable to be between given bounds is equivalent to assuming the normal set to be a hypercube (under suitable normalization). It is known that any linear constraint defines a convex set and that a set of N simultaneous linear constraints defines the intersection of N convex sets, which is also a convex set. Therefore, we can regard the normal data as lying in a convex set. In the case of commercial flights, most of the time an aircraft operates in the neighborhood of a trimming point, equivalent to the operating point in an industrial process. Hence, small variations can be well described by a linear model. Motivated by this heuristic justification, in this work we consider the assumption that normal operation sensor data should cluster in a convex set centered at the trimming values. The approach proposed here is to find a suitable enclosure for the conditions known to be normal, instead of the rigid hypersphere used in [42]. To minimize the chance of accepting faulty data, the volume of the normal convex set has to be minimized. In the next section, this minimization is achieved by using the convex hull, which is the smallest convex set containing the data.


How that set is determined and the effect on missed and false alarms are issues discussed here. The size of this normal convex set is critical and there is a clear tradeoff between its size and missed/false alarm rates. For size zero everything is abnormal so we have 0% missed alarms and 100% false alarms. If the normal convex set is extremely large, then everything is normal and one has 100% missed alarms and 0% false alarms. We also consider the curse of dimensionality as it applies to this case; testing if a point is in the inside of a high dimensional convex set can be computationally demanding hence one must seek ways to reduce the number of variables that are being considered.
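The two limiting cases can be made concrete with a small helper that computes both rates from detector decisions and ground truth. The function name and the labels are hypothetical; a missed alarm is taken to be a faulty sample classified as normal, and a false alarm a normal sample classified as faulty.

```python
def alarm_rates(flagged, faulty):
    """Return (missed_alarm_rate, false_alarm_rate) for paired boolean lists."""
    fault_flags = [f for f, truth in zip(flagged, faulty) if truth]
    normal_flags = [f for f, truth in zip(flagged, faulty) if not truth]
    missed = sum(1 for f in fault_flags if not f) / len(fault_flags)
    false = sum(1 for f in normal_flags if f) / len(normal_flags)
    return missed, false


truth = [True, True, False, False, False]
# Normal set of size zero: every sample is flagged as abnormal.
empty_set_rates = alarm_rates([True] * 5, truth)
# Extremely large normal set: no sample is ever flagged.
huge_set_rates = alarm_rates([False] * 5, truth)
```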

Figure 3.2 Pitch Angle vs. Roll Angle under Normal Flying Conditions

3.2 Convex Hull Classification

Since one always has only a finite number of data points, formal determination of a convex set of normal values cannot be done through experimental data collection. At any given time, we use instead the convex hull of the points “known to be normal.”


Therefore, it is necessary to also consider the possibility of modifying the set if additional normal data is received; i.e., one must include the option of recursively defining the convex hull. We first discuss the classification using convex hull and its numerical complexity. Then we discuss the method proposed to modify the hull if new data, known to be normal, falls outside the current hull.

3.2.1 Convex Hull Algorithm

A set is convex if every line segment connecting two points in the set is fully contained in the set (Figure 3.3).

Figure 3.3 Non-convex Set Vs Convex Set


The convex hull of a set of points is the smallest convex set that contains the points (Figure 3.4).

In this thesis, we use the Quickhull algorithm [47] to construct the normal convex set. The convex hull is represented by a set of facets and a set of adjacency lists giving the neighbors and vertices for each facet. The boundary elements of a facet are called ridges. Each ridge signifies the adjacency of two facets.
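For readers who want to experiment with the definition, the following two-dimensional sketch computes the hull vertices of a small point set. It deliberately uses Andrew's monotone chain algorithm, which is simpler to state than Quickhull and is not the algorithm of [47]; in two dimensions the facets are the hull edges, and the ridges are the vertices shared by neighboring edges.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_2d(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                       # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # drop the duplicated end points


hull = convex_hull_2d([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)])
```

The interior point (0.5, 0.5) is discarded because it is not a vertex of the smallest convex set containing the data.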

In the Quickhull algorithm, it is assumed that the normal input points are in general position (i.e., no set of d+1 points defines a (d−1)-flat), so that their convex hull is a simplicial complex. A simplicial complex is a space with a triangulation. Formally, a simplicial complex in $\mathbb{R}^n$ is a collection of simplices in $\mathbb{R}^n$. A simplex, sometimes called a hypertetrahedron, is the generalization of a tetrahedral region of space to n dimensions. In one dimension, the simplex is the line segment. In two dimensions, the simplex is the convex hull of the equilateral triangle. In three dimensions, the simplex is the convex hull of the tetrahedron. Figure 3.5 shows the graphs for the n-simplexes with n = 2 to 7.

Figure 3.5 Simplex in dimension n

The Quickhull algorithm can be extended to singular data by triangulating non-simplicial facets [47][48].


We represent a d-dimensional convex hull by its vertices and (d−1)-dimensional faces (the facets). Each facet includes a set of vertices, a set of neighboring facets, and a hyperplane equation. The (d−2)-dimensional faces are the ridges of the convex hull. Each ridge is the intersection of the vertices of two neighboring facets. For notational purposes, we let $P_r := \{p_1, \ldots, p_r\}$ be the set of r measurements and $CH(P_r)$ its convex hull. In the end, Quickhull returns the indices of the points (vertices) in $P_r$ that comprise the facets of the convex hull of $P_r$.

In d dimensions, a (d−1)-dimensional facet is a hyperplane. If the distance of a point to the hyperplane is positive, the point is above the hyperplane. A hyperplane can be represented by its unit normal and its offset from the origin. The Hessian normal form of the plane is

$$\hat{n} \cdot p = -s \qquad (3.5)$$

where $\hat{n} = \{n_x, n_y, n_z, \ldots\}$ is the unit normal vector. In N-dimensional geometry, the normal $n$ at the point $x_0$ is a generalized cross product of $(N-1)$ edge vectors:

$$
n = \begin{vmatrix}
\hat{x} & \hat{y} & \hat{z} & \cdots & \hat{w} \\
(x_1 - x_0) & (y_1 - y_0) & (z_1 - z_0) & \cdots & (w_1 - w_0) \\
(x_2 - x_0) & (y_2 - y_0) & (z_2 - z_0) & \cdots & (w_2 - w_0) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
(x_{N-1} - x_0) & (y_{N-1} - y_0) & (z_{N-1} - z_0) & \cdots & (w_{N-1} - w_0)
\end{vmatrix}
$$

As usual, we can form the normalized normal

$$\hat{n} = \frac{n}{\| n \|}$$

and $s$ is the offset from the origin. The distance to a point $p_r$ is

$$D = \hat{n} \cdot p_r + s \qquad (3.6)$$

If $D > 0$, $p_r$ is in the same direction as $\hat{n}$; otherwise, it is in the opposite direction of $\hat{n}$.

After the value of the unit normal vector $\hat{n}$ is determined by the cross product of edge vectors, we need to decide its direction. Since the convex hull is determined, we can use any point $p_0$ known to be inside the hull to determine the direction of the outward-pointing unit normal vector $\hat{n}$. The distance from $p_0$ to the hyperplane is calculated. If the distance is positive, the normal vector $\hat{n}$ of the hyperplane is in the same direction as $p_0$, which points to the inside of the convex hull. If the distance is negative, the normal vector $\hat{n}$ of the hyperplane is in the opposite direction of $p_0$, which points to the outside of the convex hull.
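In three dimensions the generalized cross product reduces to the ordinary cross product of two edge vectors, so the hyperplane of a triangular facet and its orientation can be sketched as below. The function names and the tetrahedron used in the example are assumptions for illustration; the sign convention follows (3.5) and (3.6), with the normal flipped whenever the interior point has a positive distance.

```python
import math


def signed_distance(n, s, p):
    """Eq. (3.6): D = n . p + s for unit normal n and offset s."""
    return sum(a * b for a, b in zip(n, p)) + s


def facet_plane(p1, p2, p3, interior):
    """Unit normal and offset of the plane through p1, p2, p3, pointing away from `interior`."""
    u = [b - a for a, b in zip(p1, p2)]     # first edge vector
    v = [b - a for a, b in zip(p1, p3)]     # second edge vector
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]         # cross product of the edge vectors
    length = math.sqrt(sum(c * c for c in n))
    n = [c / length for c in n]             # normalized normal
    s = -sum(a * b for a, b in zip(n, p1))  # offset so that n . p1 + s = 0
    if signed_distance(n, s, interior) > 0: # normal points inward: flip it
        n, s = [-c for c in n], -s
    return n, s


# Bottom facet of a tetrahedron with apex (0, 0, 1); (0.1, 0.1, 0.1) is inside.
n_hat, offset = facet_plane((0, 0, 0), (1, 0, 0), (0, 1, 0), (0.1, 0.1, 0.1))
below = signed_distance(n_hat, offset, (0.0, 0.0, -2.0))   # a point under the facet
```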

The fault detection decision is based on the following principle:

Principle: If the point $p_r$ is below all facets, it is inside the convex hull. Otherwise, the point $p_r$ is outside $CH(P_{r-1})$ [47].

We can use the sign of the distance of $p_r$ to all facets, as described above, to determine whether $p_r$ is inside or outside the convex hull.

The complete decision process is as follows. Procedure:

1) Calculate the signed distance $D_0 = \hat{n}_0 \cdot p_0 + s_0$ for any point $p_0$ which is inside $CH(P_{r-1})$.

2) If $D_0 > 0$, then $\hat{n} = -\hat{n}_0$, $s = -s_0$.

Else if $D_0 < 0$, then $\hat{n} = \hat{n}_0$, $s = s_0$.

Once the direction of the normal vector is defined, we can calculate the signed distance $D_i = \hat{n}_i \cdot p_r + s_i$ for every facet of $CH(P_{r-1})$. If any $D_i > 0$, the point $p_r$ is in the outside set, i.e., $p_r$ is a faulty point.
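The procedure can be sketched in two dimensions, where each facet is an edge of a convex polygon. The sketch assumes the hull vertices are already available in order around the polygon; the centroid is used as the interior point $p_0$, and the small tolerance and the function names are choices made for this example.

```python
import math


def edge_facets(vertices):
    """Facets (n, s) of a convex polygon, oriented using an interior point."""
    p0 = [sum(c) / len(vertices) for c in zip(*vertices)]  # centroid is inside
    facets = []
    for a, b in zip(vertices, vertices[1:] + vertices[:1]):
        n0 = [b[1] - a[1], a[0] - b[0]]            # perpendicular to the edge
        length = math.hypot(n0[0], n0[1])
        n0 = [c / length for c in n0]
        s0 = -(n0[0] * a[0] + n0[1] * a[1])        # offset: n0 . a + s0 = 0
        d0 = n0[0] * p0[0] + n0[1] * p0[1] + s0    # step 1: distance of p0
        if d0 > 0:                                 # step 2: flip if pointing inward
            n0, s0 = [-c for c in n0], -s0
        facets.append((n0, s0))
    return facets


def is_faulty(p, facets, tol=1e-12):
    """A point is faulty when it lies above (outside) at least one facet."""
    return any(n[0] * p[0] + n[1] * p[1] + s > tol for n, s in facets)


# Hypothetical normal set: hull of the four samples below (a diamond).
facets = edge_facets([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)])
inside_ok = is_faulty((0.3, 0.3), facets)     # inside the hull
corner = is_faulty((0.9, 0.9), facets)        # inside the per-variable ranges, outside the hull
```

The second test point lies within the range [-1, 1] of each variable, so a red-line check would accept it, while the convex hull test flags it.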

3.2.1.1 Convex Hull Computation Complexity

Theorem: [47] if an execution of Quickhull is balanced, its expected complexity is $O(n \log r)$ for $d \le 3$ and $O(n f_r / r + f_r)$ for $d \ge 4$.

Here $d$ is the dimension, $n$ is the number of input points, $r$ is the number of vertices, and $f_r$ is the maximum number of facets of $r$ vertices. Moreover, it is known that $f_r$ increases with $d$. For example, when $d = 4$, $f_r = 101$; $d = 5$, $f_r = 216$. In the convex hull classification we need to test every facet for each new incoming point, so a large $f_r$ increases the computational effort.

We already estimated the complexity of calculating a convex set as the number of measurements increases. For the case of the B747, the number of measurements can easily exceed 70 and even concentrating on the basic 12 state variables may make trending computationally intense. Therefore, one necessary step before applying the convex hull classification is to reduce dimensionality of the measurements.

3.2.1.2 Recursively Increasing Convex Hull

If a new point is known to be normal and it lies outside the convex hull, the convex hull should extend itself to include this new point, transforming from CH(P_r₋₁) to CH(P_r). To add a point to a convex hull, we first have to identify the facets below the point. These are the visible facets for the point, and their boundary is the point's horizon. Identifying the visible facets is facilitated by the direction decision described above, which is already carried out when detecting whether the new point is outside the convex hull.

Figure 3.6 shows the horizon of p_r on CH(P_r₋₁). In Figure 3.7, consider the plane h containing a facet f of CH(P_r₋₁); f is visible from a point if that point lies in the open half-space on the other side of h.

Figure 3.6 Visible Region of p_r on the CH(P_r₋₁)

Next, these visible facets are replaced by a cone of new facets. Each new facet is defined by the point p_r and one horizon edge. Figure 3.8 shows the new convex hull CH(P_r), obtained by connecting each horizon edge with p_r to create a new triangular facet.

Figure 3.8 Convex Hull CH(P_r)
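The update from CH(P_r₋₁) to CH(P_r) can be sketched in Python for the two-dimensional case, where facets are edges and the horizon reduces to two vertices. This is a simplified illustration under stated assumptions, not the Quickhull implementation of [47]: the hull is assumed to be stored as a list of vertices in counter-clockwise order, and the names cross and add_point are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative when b lies to the right of o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def add_point(hull, p):
    """Return the counter-clockwise hull that also contains p.

    hull: list of 2-D vertices in counter-clockwise order (assumed convex).
    Edge i runs from hull[i] to hull[(i + 1) % n] and is visible from p
    when p lies strictly on its outer (right-hand) side.
    """
    n = len(hull)
    visible = [cross(hull[i], hull[(i + 1) % n], p) < 0 for i in range(n)]
    if not any(visible):
        return list(hull)  # p is inside or on the boundary: nothing to do

    # First edge of the visible chain: visible, with a non-visible predecessor.
    start = next(i for i in range(n) if visible[i] and not visible[i - 1])
    end = start
    while visible[end % n]:
        end += 1  # edges start .. end-1 are visible

    # Keep the vertices from the far horizon vertex round to the near one,
    # dropping those strictly inside the visible chain, then close with p.
    new_hull = []
    i = end % n
    while True:
        new_hull.append(hull[i])
        if i == start:
            break
        i = (i + 1) % n
    new_hull.append(p)
    return new_hull


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(add_point(square, (2, 2)))      # vertex (1, 1) is replaced by (2, 2)
print(add_point(square, (0.5, 0.5)))  # interior point leaves the hull unchanged
```

In higher dimensions the same idea applies, with the two horizon vertices replaced by the horizon edges that bound the visible facets.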

3.3 Reduction in Dimensionality

In this section we discuss the sensitivity analysis used to reduce the dimensionality of the data.

Dimensionality reduction is needed for two reasons. The first is the computational effort mentioned in the last section. The second is the curse of dimensionality [49]: when a large number of features per object is used, severe overfitting can result, so increasing the number of features can deteriorate the classification performance.

However, sensitivity analysis is not the focus of this research. Details can be found in [50][51].
