Artificial Neural Networks - Data-driven Models

Chapter 2 Background and Literature Survey

2.3 Review of RUL Prediction Methods

2.3.3 Data-driven Models

2.3.3.3 Artificial Neural Networks

One of the most commonly used data-driven approaches in the prognostics literature is the artificial neural network (ANN) (Goebel et al., 2008b). ANNs are biologically inspired programs, loosely analogous to the behaviour of neural networks in the brain, and they are used as machine learning systems made up of data-processing neurons, the basic units of neural networks (Bishop, 1995; Kozlowski et al., 2001). The neurons establish a set of interconnected functional links between input series and a desired output, where the connections can be calculated and trained for optimal performance (Byington et al., 2004a). This is typically achieved by exposing the network to a set of input samples, training it, and re-adapting it to minimise errors (Bishop, 1995).

The computational model of an ANN consists of multiplication, summation and transfer functions (Krenker et al., 2011). Each neuron first multiplies its inputs by weights and then sums the weighted inputs; this sum is then passed through the transfer function. During training, the weights are adjusted automatically to improve the agreement between the model and the data (Bishop, 1995). These connections and the organisation of the ANN are the key features that establish a set of interconnected functional relationships between numerous input series and the desired outputs, and these relationships can be trained for optimal performance (Byington et al., 2004a).

Neural networks are practical tools for modelling engineering systems, and they encompass a broad category of models, including non-linear dynamical systems, dimension reduction methods, regression analysis and discriminant models (Sarle, 1994). In certain complex engineering applications, the observations from a system may not include precise data, and the desired results may not have a direct link with the input data; in such cases, ANNs can model the system without knowing the exact relationship between the input and output data series (Murata et al., 1994). ANNs are therefore well suited to prognostic algorithms for complex systems, and they are often faster and easier to compute than various other prognostic methods. For these reasons, ANNs have become one of the most popular data-driven prognostic methods, and a significant number of studies across different disciplines have reported their merits through numerous different methodologies.

Heimes (2008) designed an advanced neural network training scheme to distinguish between healthy and degraded condition monitoring time series of a complex system. The network uses back-propagation through time for gradient calculation and was developed to address adaptation, filtering and classification. First, a Multi-Layer Perceptron (MLP) neural network is designed to determine whether a healthy system and a failed system can be classified effectively; the MLP function predicts the number of cycles remaining before failure. It is assumed that the earliest samples in each time series represent healthy operation, while the latest samples correspond to degraded operation. The condition monitoring data are then separated into two classes, healthy and degraded, and the MLP-based classifier distinguishes these classes with a 1% error rate. To handle filtering in non-linear, time-domain dynamic system modelling, a recurrent neural network (RNN) structure with internal memory and feedback components is used to learn complex non-linear dynamic mappings. The RNN structure uses all sensor data and operating conditions as input series for estimation.

Peel (2008) constructed similar Multi-Layer Perceptron and Radial Basis Function networks as regression models for prognostics, combining a Kalman filter with an appropriate selection of these networks. The resulting algorithm provides a mechanism for fusing the Kalman filter with the predictions of multiple neural network models over time. Peel (2008) argued that data pre-processing and data exploration are essential initial stages of a successful prognostic framework for complex systems that operate under multiple operational conditions, and concluded that these stages allow different operating regimes to be identified from the sensor data.
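As a rough illustration of fusing several network predictions with a Kalman filter, the sketch below treats each model's RUL prediction at a time step as a noisy measurement of a scalar state that is assumed to fall by one cycle per step. The state model, the noise variances `q` and `r_list`, and the function name `fuse_predictions` are assumptions made for illustration; they are not Peel's (2008) actual formulation.

```python
import numpy as np

def fuse_predictions(preds, r_list, q=1.0, x0=None, p0=100.0):
    """Fuse RUL predictions from several models with a scalar Kalman filter.

    preds:  array of shape (T, M), the prediction of each of M models at each step.
    r_list: measurement-noise variance assumed for each model (length M).
    q:      process-noise variance of the assumed state model.
    Returns the fused estimate at every step and the final estimate variance.
    """
    preds = np.asarray(preds, dtype=float)
    r = np.asarray(r_list, dtype=float)
    x = preds[0].mean() if x0 is None else x0   # initial RUL estimate
    p = p0                                      # initial estimate variance
    fused = []
    for t, z in enumerate(preds):
        if t > 0:
            # Predict step: RUL is assumed to drop by one cycle per time step
            x = x - 1.0
            p = p + q
        # Update step: each model's prediction is treated as an independent measurement
        for zi, ri in zip(z, r):
            k = p / (p + ri)          # Kalman gain
            x = x + k * (zi - x)
            p = (1.0 - k) * p
        fused.append(x)
    return np.array(fused), p
```

With independent measurement noise, updating sequentially with each model is equivalent to a single joint update, so a model given a very large variance is effectively ignored.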

Peng et al. (2012a) and Rigamonti et al. (2016) proposed an echo state network-based prognostic model; the echo state network is an architecture and supervised learning principle for recurrent neural networks. This model drives a large, random and fixed recurrent neural network with the input signal, so that each neuron in the network produces a non-linear response signal; the desired output is then obtained as a trainable linear combination of all of these response signals. Abbas (2010) developed a multilayer feed-forward neural network trained with the error back-propagation algorithm, which was used to build creep damage prediction models for different operating phases, i.e., take-off, climb and cruise. Wang (2010) also extended the earlier trajectory similarity-based prediction method (Wang et al., 2008) with a Radial Basis Function Neural Network (RBFN) based RUL prediction method. The similarity-based model was reported to offer considerable advantages in predictive performance over ANN-based prediction methods.
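The echo state principle described above can be sketched in a few lines of NumPy: a fixed random reservoir is driven by the input, and only a linear readout is trained. This is a minimal sketch assuming a tanh reservoir with uniformly distributed weights rescaled to a chosen spectral radius and a ridge-regression readout; the reservoir size, scaling and function names are illustrative choices, not those of Peng et al. (2012a) or Rigamonti et al. (2016).

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Create the fixed, random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for obtaining the echo state property
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(u, w_in, w):
    """Drive the reservoir with an input sequence u of shape (T, n_in)."""
    x = np.zeros(w.shape[0])
    states = []
    for u_t in u:
        x = np.tanh(w_in @ u_t + w @ x)   # non-linear response of every neuron
        states.append(x)
    return np.array(states)

def train_readout(states, y, ridge=1e-6):
    """Train only the linear readout (ridge regression with a bias column)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)

def predict(states, w_out):
    """Linear combination of all response signals plus bias."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return X @ w_out
```

In use, the first few states are usually discarded as a washout period so that the arbitrary zero initial state does not affect the trained readout.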

In the field of complex systems with multi-dimensional condition monitoring data, Jianzhong et al. (2010) and Riad et al. (2010) extended the use of ANNs in an attempt to achieve better learning performance for RUL estimation of test trajectories. Javed et al. (2012) emphasised the role of neural networks in feature selection for prognostics, aiming to show that features should be selected according to their predictability. For multi-step-ahead predictions, Bektas and Jones (2016) introduced a non-linear autoregressive neural network prognostic model, a form of dynamic filtering in which past values of a time series are used to predict future values. However, this type of neural network was observed to have difficulty predicting the exponential behaviour of damage propagation; therefore, a recurrence relation model was used to transform the input and output data for network training.
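The idea of predicting future values from past values rests on a tapped-delay embedding of the series. The hypothetical helper below builds the lagged input matrix and one-step-ahead targets that a non-linear autoregressive network would be trained on; the number of lags is an assumed parameter, and the network itself and the recurrence-relation transform of Bektas and Jones (2016) are left out.

```python
import numpy as np

def lagged_pairs(series, n_lags):
    """Build (input, target) pairs for one-step-ahead autoregressive training.

    Row k of X holds [y(k), ..., y(k + n_lags - 1)]; its target is y(k + n_lags).
    """
    y = np.asarray(series, dtype=float)
    X = np.array([y[k:k + n_lags] for k in range(len(y) - n_lags)])
    targets = y[n_lags:]
    return X, targets
```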

Table 2.5 summarises ANN prognostic approaches applied in complex domains. Although the most attractive feature of these neural network applications is their learning ability, it is not always possible to train the network as desired. The networks used at different phases of prognostics may not be as effective as expected, particularly for time series showing complex degradation growth or decay. In such cases, the ANN structure behaves as an autonomous system that attempts to recursively imitate the dynamic system behaviour that generated the non-linear time series (Haykin and Li, 1995; Haykin and Principe, 1998). Multi-step-ahead prediction with ANNs can be quite challenging when only a few time series or little prior knowledge about the degradation process is available and the failure point is expected to occur in the longer term (Menezes and Barreto, 2008).

Table 2.5: ANN-based Prognostics in Complex Systems

Parker Jr et al. (1993): A neural network pattern recognition model that uses a constrained, minimum-logistic-loss criterion for multi-class problems.

Brotherton et al. (2000): A classification model for complex systems that are difficult to model physically. The approach provides novelty detection capability in sensor data and thereby statistical state modelling of the complex system with respect to known faults.

Bonissone and Goebel (2002): A systematic framework for building a model to estimate time-to-fail. Hybrid models of neural, fuzzy and evolutionary computation methods are applied to classification, prediction and control problems.

Heimes (2008): Applied to a multidimensional dataset. An MLP-based classifier and an RNN-based structure for filtering within the non-linear time domain of turbofan systems.

Goebel et al. (2008b): An ANN is applied to learn the damage state from relatively sparse training sets with very high noise content (rotating equipment in an aerospace setting).

Peel (2008): Estimation of the RUL of an unspecified complex system using a data-driven combination of Multi-Layer Perceptron and Radial Basis Function networks with a Kalman filter.

Baraldi et al. (2013): A trained bagged ensemble of Artificial Neural Networks is embedded in the Particle Filtering method as an empirical measurement model.

In ANNs designed for one-step-ahead prediction, the network inputs contain only actual sample points of the original time series, and the network estimates the next value of the series without its outputs being fed back externally to its inputs (Zemouri et al., 2010). Over a longer multi-step-ahead prediction horizon, the ANN model output must be fed back externally to the input for a fixed but finite number of steps; the regressor components of the input series, which initially consist of actual sample points of the original time series, are progressively replaced by previously predicted values (Sorjamaa et al., 2007). Such replacement can cause an imbalance in the prediction of exponential curves, and the multi-step predictions may imitate the training data too closely (Bektas and Jones, 2016).
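The recursive feedback scheme can be sketched generically: a one-step predictor is applied repeatedly, and each prediction is appended to the input window in place of the oldest actual sample. In this minimal sketch, `recursive_forecast` and the `one_step` callable are hypothetical stand-ins for a trained network's prediction function.

```python
import numpy as np

def recursive_forecast(one_step, history, n_lags, horizon):
    """Multi-step-ahead prediction by feeding predictions back as inputs.

    one_step: any callable mapping a window of n_lags values to the next value
              (for example, the predict function of a trained network).
    """
    window = list(np.asarray(history, dtype=float)[-n_lags:])  # actual samples
    preds = []
    for _ in range(horizon):
        y_next = float(one_step(np.array(window)))
        preds.append(y_next)
        # Slide the window: the oldest value is replaced by the new prediction
        window = window[1:] + [y_next]
    return np.array(preds)
```

After `n_lags` steps the window contains only predicted values, so any one-step error is carried into every later input, which is why errors tend to compound over long horizons.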

However, in data filtering models for prognostics, ANNs provide a robust computational mapping between the raw data and a desired output to be used in network prediction (Demuth et al., 2008). Such neural network filtering models have been applied to multidimensional data and have shown high prognostic performance (Parker Jr et al., 1993; Brotherton et al., 2000; Heimes, 2008). Since multi-step-ahead prediction and dynamic modelling are complex tasks that play a challenging and significant role in ANN structures (Principe et al., 1999; Menezes and Barreto, 2008), ANNs generally need to be combined with alternative methods to achieve better prognostic performance. Examples of such combinations can be found in Parker Jr et al. (1993); Brotherton et al. (2000); Bonissone and Goebel (2002); Peel (2008); Baraldi et al. (2013); Bektas and Jones (2016). Each combination in these applications can be regarded as a hybrid model for prognostic frameworks.

Learning in a typical network is associative, since the network learns an association between one type of data (input) and another (output) (Cross et al., 1995). When training is supervised by target data, the desired response is known and can be compared with the network's actual output. Such supervised classification techniques tend to be more accurate, since each classifier is trained on a representative data trajectory known as a corpus. In contrast, unsupervised learning does not use prior training on target data to perform the classification (Chaovalit and Zhou, 2005). In various prognostic models, such as Ramasso (2009); Ramasso and Gouriveau (2010); Sarkar et al. (2011); Xue et al. (2011); Yu (2013); Lin et al. (2013); Tamilselvan and Wang (2013); Bektas and Jones (2016); Mosallam et al. (2016), a supervised classification stage is used for health indicator detection.

The generalised time-varying health index equation introduced by Saxena et al. (2008b) is commonly used as an additive term to yield supervised classifications for complex systems:

$$ h(t) = 1 - d - \exp\left( a(t)\, t^{b(t)} \right) $$

where d is an arbitrary point in the wear-space, a and b are model parameters and t is time. This health index can also be applied to different phenomena within a system. For example, health can be described by separate trajectories for flow (f) and efficiency (e), which may differ between fault modes; these therefore need to be treated as separate health-related indices.

$$ e(t) = 1 - d_e - \exp\left( a_e(t)\, t^{b_e(t)} \right) \tag{2.32} $$

$$ f(t) = 1 - d_f - \exp\left( a_f(t)\, t^{b_f(t)} \right) \tag{2.33} $$

These terms are then aggregated to form the overall health index:

$$ H(t) = g\big( e(t), f(t) \big) \tag{2.34} $$

where the function g corresponds to the minimum of all operative margins.
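Equations (2.32) to (2.34) can be evaluated directly. The sketch below holds a(t) and b(t) constant for simplicity, which is an assumption, and takes g as the element-wise minimum of the two margins, as stated above; the function names and the parameter values in any usage are hypothetical.

```python
import numpy as np

def component_index(t, d, a, b):
    """Eq. (2.32)/(2.33): 1 - d - exp(a * t**b), with a and b held constant."""
    t = np.asarray(t, dtype=float)
    return 1.0 - d - np.exp(a * t ** b)

def overall_health(t, params_e, params_f):
    """Eq. (2.34), taking g as the element-wise minimum of the operative margins."""
    e = component_index(t, *params_e)   # efficiency margin e(t)
    f = component_index(t, *params_f)   # flow margin f(t)
    return np.minimum(e, f)
```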

The main challenge in supervised prognostic approaches (as well as in neural network fitting approaches) is identifying the data characteristics. When a common dataset is used for predictions, a predefined model may struggle to capture characteristics that differ between instances in the dataset. The most obvious examples are the initial wear levels and failure points. Since each operating trajectory has a case-specific initial performance level but a common threshold level, the operational health level should be standardised with respect to these levels.
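One simple way to standardise trajectories with respect to these two levels is a linear rescaling in which each trajectory's own initial level maps to 1 and the common failure threshold maps to 0. This is a hypothetical illustration of the idea, not a method prescribed by the cited works; other scalings are possible.

```python
import numpy as np

def standardise_trajectory(h, threshold):
    """Rescale one health trajectory so its own initial level maps to 1
    and the common failure threshold maps to 0 (assumed linear scaling)."""
    h = np.asarray(h, dtype=float)
    return (h - threshold) / (h[0] - threshold)
```

Two trajectories that start at different wear levels but fail at the same threshold then share a common 1-to-0 scale.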

Alternative unsupervised learning algorithms for classification are used in Wang et al. (2008); Wang (2010); Peng et al. (2012a); Sarkar et al. (2011); Javed et al. (2013); Lam et al. (2014); Mosallam et al. (2016); Rigamonti et al. (2016) to partition the dataset into operating regimes. These methods can be regarded as cluster analysis, used for exploratory regime analysis to identify hidden patterns in data trends. Because these methods take account of population parameters across the entire dataset, characteristic trajectory features such as the initial wear level and failure point can be standardised to a common scale. Since the supervised stage is trained on a representative data trajectory, the population parameters are not identified. However, unsupervised learning can be applied first to take the population parameters into account, and its outcomes can then be used as output data in a more effective supervised neural network filtering model.
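Regime partitioning of this kind can be sketched with plain k-means clustering on the operating-condition variables, followed by per-regime normalisation of the sensor readings. The choice of k-means, the farthest-point initialisation and the z-score normalisation are assumptions for illustration; the cited works use a variety of clustering algorithms.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means with a deterministic farthest-point initialisation."""
    X = np.asarray(X, dtype=float)
    centres = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centre if a cluster happens to become empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

def normalise_by_regime(sensors, labels):
    """z-score each sensor within its operating regime."""
    sensors = np.asarray(sensors, dtype=float)
    out = np.empty_like(sensors)
    for j in np.unique(labels):
        m = labels == j
        mu = sensors[m].mean(axis=0)
        sd = sensors[m].std(axis=0)
        out[m] = (sensors[m] - mu) / np.where(sd > 0, sd, 1.0)
    return out
```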

2.3.3.3.1 Neural Network Architecture:

The structure of neural networks covers a broad area of study, and it would be impractical to discuss all types of neural networks in this work (Hagan and Demuth, 1999). Instead, the following sections concentrate on the multilayer perceptron and the common neural network architecture used for filtering. This architecture yields a standard neural network fitting function with a hidden layer (Beale and Demuth, 1998). Similar neural network applications in prognostics can be found in the works of Greitzer et al. (1999); Fink et al. (2014); Wu et al. (2016); Loutas et al. (2017); Elforjani (2016); Yang et al. (2016); Zheng et al. (2017). Based on these works, this section defines a network model that uses the unsupervised HI outputs in the network training stage and generates a network function to fit the raw data trajectories.

The most essential building block in neural networks is the single-input neuron (Demuth et al., 2008), such as that shown in Figure 2.6. This neuron performs three distinct functional operations, namely the weight function, the net input function and the transfer function (Demuth et al., 2008). First, the scalar input x is multiplied by the scalar weight w to form the scalar product wx. Second, the weighted input wx is added to the bias b to form the net input n. The bias acts like a weight, except that its input is always the constant 1. Finally, n passes through the transfer function f, which produces the output y. The weight w and the bias b are the adjustable parameters of the neuron, so the network can be trained to perform a particular task by adjusting them until it exhibits the desired behaviour.

Figure 2.6: A single neuron

For the neurone output, the general equation can be denoted by the following formula (Lippmann, 1988):

$$ y = f(wx + b) \tag{2.35} $$
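Equation (2.35) maps directly onto code. The sketch below uses a log-sigmoid as the example transfer function; `neuron` and `logsig` are hypothetical names, and any scalar function can be passed as `f`.

```python
import math

def neuron(x, w, b, f):
    """Single-input neuron, Eq. (2.35): y = f(w*x + b)."""
    n = w * x + b          # weight function followed by the net input function
    return f(n)            # transfer function

def logsig(n):
    """Log-sigmoid transfer function 1 / (1 + e^-n)."""
    return 1.0 / (1.0 + math.exp(-n))
```

For example, with x = 1, w = 2 and b = -2 the net input is 0, so a log-sigmoid neuron outputs 0.5.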

As ANNs are complex structures, a neuron typically has more than one input. A neuron with multiple inputs is illustrated in Figure 2.7. The inputs x_1, x_2, x_3, ..., x_r are weighted by the corresponding elements w_{1,1}, ..., w_{1,r} of the weight matrix W (Hagan et al., 2002), so that the net input is

$$ n = w_{1,1} x_1 + w_{1,2} x_2 + \cdots + w_{1,r} x_r + b \tag{2.36} $$

Figure 2.7: A single neurone with multiple inputs

This equation, in matrix form, can be expressed as:

$$ n = Wx + b \tag{2.37} $$

where the weight matrix has a corresponding element for each input entering the network.

W=         w1,1 w1,2 · · · w1,r w2,1 w2,2 · · · w2,r .. . ... ... ws,1 ws,2 · · · ws,r         (2.38)

and the scalar output of a neurone can be formulated as

$$ y = f(Wx + b) \tag{2.39} $$

$$ y = f\left( \sum_{i=1}^{n} w_i x_i + b \right) \tag{2.40} $$

where the argument of f is the weighted sum of the input variables, x_i, with the addition of the bias, b. The neuron's output, y, is obtained by passing this sum through the transfer function of the neuron, f (Barad et al., 2012; Krenker et al., 2011).
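The matrix form (2.39) and the explicit sum (2.40) give the same result for a single neuron; the matrix form simply handles s neurons at once. A minimal NumPy sketch, with hypothetical function names, makes the equivalence concrete.

```python
import numpy as np

def layer_output(W, x, b, f):
    """Eq. (2.39): y = f(W x + b) for s neurons with r inputs (W has shape (s, r))."""
    return f(W @ x + b)

def neuron_sum_form(w, x, b, f):
    """Eq. (2.40): the output of one neuron written as an explicit weighted sum."""
    return f(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
```

With W = [[1, 2, 3]], x = [1, 0, -1] and b = 0.5, both forms compute f(-1.5).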

Figure 2.8: A multiple-layer neural network

Generally, a single neuron, even one with many inputs, is not sufficient for training, and multiple neurons operating in parallel within layers are necessary instead (Hagan and Demuth, 1999). Figure 2.8 shows a multiple-layer neural network model. Each layer has its own weight matrix, bias vector, transfer functions, inputs and outputs. The structure shown is a feed-forward model that takes a set of input vectors (raw data) as the columns of one matrix and a corresponding set of output vectors (HI) as the columns of a second matrix.

The neurons, as the building blocks of the network, evaluate these input state variables, and the resulting network output can be written as:

$$ y(t) = f\big(x(t)\big) = f_o\left( b + \sum_{h=1}^{n_h} w_h\, f_h\left( b_h + \sum_{i=1}^{n} w_{ih}\, x_i(t) \right) \right) \tag{2.41} $$

where the network used as the fitting function is a two-layer feed-forward network with a sigmoid transfer function, f_h, in the hidden layer and a linear transfer function, f_o, in the output layer (Demuth et al., 2008).

$$ f_h(n) = \frac{1}{1 + e^{-n}}, \qquad f_o(n) = n \tag{2.42} $$
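The forward pass of Eq. (2.41) with the transfer functions of Eq. (2.42) can be sketched as follows. Variable names mirror the equation symbols, and the function name is hypothetical; training of the weights is not shown.

```python
import numpy as np

def logsig(n):
    """Sigmoid hidden-layer transfer function f_h in Eq. (2.42)."""
    return 1.0 / (1.0 + np.exp(-n))

def two_layer_forward(x, W_ih, b_h, w_h, b):
    """Eq. (2.41): y = f_o(b + sum_h w_h f_h(b_h + sum_i w_ih x_i)).

    x:    input vector, shape (n,)
    W_ih: hidden-layer weights, shape (n_h, n); row h holds w_ih for hidden neuron h
    b_h:  hidden-layer biases, shape (n_h,)
    w_h:  output-layer weights, shape (n_h,)
    b:    output bias (scalar)
    """
    hidden = logsig(W_ih @ x + b_h)     # sigmoid hidden layer
    return b + w_h @ hidden             # linear output layer, f_o(n) = n
```

With all hidden weights and biases set to zero, every hidden neuron outputs 0.5, so four hidden neurons with unit output weights and an output bias of 1 give y = 3.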

2.3.3.3.2 Neural Network Regularisation:

In the training of multilayer neural networks, overfitting and computational overheads can lead to poor network performance, especially when the data are highly complex and the number of parameters is large relative to the number of observations (Srivastava et al., 2014). In such cases, the network may memorise the training samples but fail to generalise to unseen test cases (Lawrence et al., 1997).

One common training approach for avoiding overfitting is the Bayesian regularisation (BR) method. Well-known alternative training algorithms to this method are Levenberg-Marquardt and scaled conjugate gradient (Demuth et al., 2015). Levenberg-Marquardt training typically requires more memory but less time, and it terminates automatically when generalisation stops improving, as indicated by an increase in the mean squared error of the validation samples. The main drawback of the Levenberg-Marquardt algorithm is that it requires the storage of some matrices that can be quite large for certain problems.
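The core Levenberg-Marquardt iteration can be sketched for any least-squares problem given a residual function and its Jacobian. In this minimal sketch, the damping schedule (multiply by 0.1 or 10) and the stopping tolerance are assumed values, and the function name is hypothetical; in a neural network, the Jacobian would be computed by back-propagation.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, w0, mu=1e-2, n_iter=200, tol=1e-10):
    """Minimise the sum of squared residuals e(w) with Levenberg-Marquardt.

    Each step solves (J^T J + mu I) dw = -J^T e. A small mu gives a
    Gauss-Newton step; a large mu gives a short gradient-descent step.
    """
    w = np.asarray(w0, dtype=float)
    e = residual(w)
    sse = e @ e
    for _ in range(n_iter):
        J = jacobian(w)                  # N x p matrix of de/dw
        g = J.T @ e                      # half the gradient of the SSE
        if np.linalg.norm(g) < tol:
            break
        dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -g)
        e_new = residual(w + dw)
        sse_new = e_new @ e_new
        if sse_new < sse:                # accept the step, move towards Gauss-Newton
            w, e, sse = w + dw, e_new, sse_new
            mu *= 0.1
        else:                            # reject the step, move towards gradient descent
            mu *= 10.0
    return w
```

The N x p Jacobian and the p x p matrix J^T J are the matrices whose storage becomes costly when the number of weights p is large.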

The scaled conjugate gradient algorithm needs less memory and can be used when the training data is short. Bayesian regularisation typically requires more time than the other algorithms because of its adaptive weight minimisation, but it can provide satisfactory generalisation for difficult and/or noisy datasets (Demuth et al., 2008).
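The Bayesian regularisation idea can be sketched on a model that is linear in its parameters, so that the Jacobian is exact and each minimisation step is a single Newton step. The objective F = beta*E_D + alpha*E_W, the effective number of parameters gamma = p - 2*alpha*tr(H^-1) and the re-estimates alpha = gamma/(2*E_W), beta = (N - gamma)/(2*E_D) follow the MacKay (1992b) and Foresee and Hagan (1997) formulation; the linear basis model, the initial hyperparameters and the function name are assumptions for illustration. In a neural network, the weight update would instead be a damped Levenberg-Marquardt step with J computed by back-propagation.

```python
import numpy as np

def bayesian_regularisation(Phi, y, n_iter=50, alpha=0.01, beta=1.0):
    """Bayesian regularisation for a linear-in-parameters model y ~ Phi @ w.

    Minimises F(w) = beta*E_D + alpha*E_W, where E_D is the sum of squared
    errors and E_W the sum of squared weights, re-estimating alpha and beta.
    """
    N, p = Phi.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        # Gauss-Newton approximation to the Hessian of F (exact here, J = Phi)
        H = 2 * beta * Phi.T @ Phi + 2 * alpha * np.eye(p)
        grad = 2 * beta * Phi.T @ (Phi @ w - y) + 2 * alpha * w
        w = w - np.linalg.solve(H, grad)          # minimiser of F for current alpha, beta
        e = Phi @ w - y
        E_D, E_W = e @ e, w @ w
        # Effective number of parameters, then hyperparameter re-estimation
        gamma = p - 2 * alpha * np.trace(np.linalg.inv(H))
        alpha = gamma / (2 * E_W)
        beta = (N - gamma) / (2 * E_D)
    return w, alpha, beta, gamma
```

The effective number of parameters gamma always lies between 0 and p, which indicates how many weights the data actually support.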

The Bayesian regularisation method relies on the Gauss-Newton approximation to the Hessian matrix, and the algorithm updates the network weights and biases according to Levenberg-Marquardt optimisation. The regularisation minimises a combination of squared errors and weights, using this approximation to reduce the computational overhead, and then determines the correct combination so as to produce a network that generalises well. This formulation of the Bayesian regularised neural network is based on the extensive work of MacKay (1992b) and Foresee and Hagan (1997). Based on their work, the detailed application of the Bayesian rule to neural network train-

In document An adaptive data filtering model for remaining useful life estimation (Page 71-89)