A Fault Diagnostics Hybrid for Multivariate Normal Processes
5.2 Multivariate Fault Diagnostics Approaches
Several source identification approaches have been proposed for multivariate processes using multivariate quality control charts (Huda et al. (2014)). de-Felipe and Benedito (2017b)
developed a methodology that reduces the dimension of data in process capability analysis by restructuring multiple quality indicators obtained in quality tests, and that assists practitioners in identifying and ranking the quality characteristics responsible for poor performance; the aim is to forecast the potential capability loss and to compare the performance of different processes.
They argue that MPCIs are a more suitable tool for process monitoring and improvement than control charts. Venkatasubramanian et al. (2003) categorised source identification and ranking into two types of approach: machine learning and statistical. Meanwhile, Huda et al.
(2014) grouped the existing approaches for source identification into three categories: the wrapper model with expert knowledge; filter models with wrapper evaluation; and statistical approaches. In this study, machine learning, principal component analysis and the proposed impact-factor algorithm were used to carry out the task of fault diagnosis in multivariate normal process capability analysis.
5.2.1 Machine Learning
Selecting features exclusively based on a weight-based metric may yield unreliable outcomes.
Hsu et al. (2002) proposed an approach that incorporates a weight analysis-based heuristic called artificial neural net input gain measurement approximation (ANNIGMA) to direct the search in the wrapper model and allows effective feature selection for neural networks.
ANNIGMA is a feature-ranking approach that is mathematically derived from the backpropagation training formulation of an artificial neural network (ANN). This weight analysis-based approach can rank the features of a data set during ANN training by relating the weights associated with each input feature. In general, irrelevant or redundant features produce more error than relevant or significant features. ANNIGMA controls the weights of noisy features during training so that they contribute as little as possible to the output of the network. Consequently, the speed of the fault diagnostic task using ANNIGMA increases substantially (Hsu et al. (2002)).
For a two-layer neural network, let $i$, $j$ and $k$ represent the input, hidden and output layer node indexes, respectively; let $L_k$ be the second-layer linear multiplier value, $A_i$ the input node (feature), $W$ the weight between layers and $F$ the logistic activation function $F(x) = 1/(1 + \exp(-x))$. The output of the network, $O_k$, is then given by Equation (57).
$$O_k = L_k \times \sum_j F\left(\sum_i A_i \times W_{ij}\right) \times W_{jk} \tag{57}$$
where $W_{ij}$ and $W_{jk}$ are the network weights. The local gain can be defined as:
$$LG_{ik} = \left|\frac{\Delta O_k}{\Delta A_i}\right| \tag{58}$$
According to Hsu et al. (2002), the local gain, $LG_{ik}$, can be defined in terms of network weights by:
$$LG_{ik} = \sum_j \left|W_{ij} \times W_{jk}\right| \tag{59}$$
The ANNIGMA score for the $i$th input and $k$th node is defined as (Hsu et al. 2002):
$$\mathrm{ANNIGMA}_{ik} = \frac{LG_{ik}}{\max(LG_k)} \tag{60}$$
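The network output, local gain and ANNIGMA score (Equations 57, 59 and 60) can be illustrated with a minimal Python sketch. It assumes that the weights are stored as nested lists indexed `w_ih[i][j]` (input to hidden) and `w_ho[j][k]` (hidden to output), and that the maximum in Equation (60) is taken over all inputs for a given output node; the function names and the example weights are hypothetical and are not taken from Hsu et al. (2002).

```python
import math


def logistic(x):
    """Logistic activation F(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def network_output(a, w_ih, w_ho, l_mult):
    """Equation (57): O_k = L_k * sum_j F(sum_i A_i * W_ij) * W_jk."""
    n_hidden, n_out = len(w_ho), len(w_ho[0])
    hidden = [logistic(sum(a[i] * w_ih[i][j] for i in range(len(a))))
              for j in range(n_hidden)]
    return [l_mult[k] * sum(hidden[j] * w_ho[j][k] for j in range(n_hidden))
            for k in range(n_out)]


def local_gain(w_ih, w_ho):
    """Equation (59): LG_ik = sum_j |W_ij * W_jk|."""
    n_in, n_hidden, n_out = len(w_ih), len(w_ho), len(w_ho[0])
    return [[sum(abs(w_ih[i][j] * w_ho[j][k]) for j in range(n_hidden))
             for k in range(n_out)]
            for i in range(n_in)]


def annigma(w_ih, w_ho):
    """Equation (60): each local gain divided by the largest gain for the
    same output node k (assumed reading of max(LG_k))."""
    lg = local_gain(w_ih, w_ho)
    n_in, n_out = len(lg), len(lg[0])
    col_max = [max(lg[i][k] for i in range(n_in)) for k in range(n_out)]
    # Guard against an all-zero column to avoid division by zero.
    return [[lg[i][k] / col_max[k] if col_max[k] > 0 else 0.0
             for k in range(n_out)]
            for i in range(n_in)]


# Hypothetical weights: 3 input features, 2 hidden nodes, 1 output node.
w_ih = [[0.5, -1.0], [0.1, 0.2], [2.0, 0.0]]
w_ho = [[1.0], [-0.5]]
scores = annigma(w_ih, w_ho)
```

With these example weights the third feature has the largest local gain and therefore receives a score of 1, while the second feature receives the lowest score and would be the first candidate for removal.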
The input for ANNIGMA consists of the out-of-control and in-control samples from process capability analysis, together with two extra indicator columns. In the first column, “sample conforming to specification region” is represented by a “1” and “sample not conforming to specification region” is represented by a “0”. In the second column, the coding is reversed: “sample conforming to specification region” is represented by a “0” and “sample not conforming to specification region” is represented by a “1”. The output is the significant group of variables responsible for the process performance, determined as the subset with the highest accuracy, or with close to the highest accuracy using fewer variables.
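The two indicator columns described above can be constructed as in the following sketch. It assumes, for illustration only, that the specification region is rectangular, so that a sample conforms when every variable lies between its own lower and upper specification limit; the function name and the limits are hypothetical.

```python
def add_conformance_columns(samples, lower, upper):
    """Append the two indicator columns to each sample.

    Column 1: 1 if the sample conforms to the specification region, else 0.
    Column 2: the reverse coding (0 if conforming, 1 if not).
    Assumes a rectangular specification region (per-variable limits).
    """
    rows = []
    for x in samples:
        conforming = all(lo <= v <= hi for v, lo, hi in zip(x, lower, upper))
        rows.append(list(x) + [1 if conforming else 0, 0 if conforming else 1])
    return rows


# Hypothetical two-variable process with limits [0, 2] and [4, 6].
data = add_conformance_columns([[1.0, 5.0], [3.0, 5.0]], [0.0, 4.0], [2.0, 6.0])
```

The first sample lies inside both limits and is coded (1, 0); the second violates the limit on the first variable and is coded (0, 1).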
A large number of neural network training runs take place during each training cycle. It is important to adjust the neural net training parameters so that time is not wasted overtraining the nets
(Hsu et al. (2002)). They found that only a few (fewer than 10) epochs can generate usable ANNIGMA scores. After the configuration is determined, we estimate the error rate of the neural net with no feature selection by applying 10-fold cross-validation to train 10 sets of weights using the training set, then testing them against the hold-out set and averaging the resulting error rates. Next, we estimate the feature selection performance of ANNIGMA by applying each of the search strategies thirty times and reporting the average number of features selected and the average error rate. In each trial, the error rate is estimated after the final feature subset is selected by applying 10-fold cross-validation on the training set and then averaging the hold-out set errors. The third column lists the results with no feature selection and the next two columns compare the results using the search strategies.
The ANNIGMA training is based on a backward elimination process, starting with all variables and then removing the lowest-ranking variable with each iteration. The ranking is carried out using the network weights (Equation 57). Additionally, in each iteration, ANNIGMA calculates the corresponding accuracy; that is, the percentage of the process explained by the respective set. The process continues until only one variable is left. From the ANNIGMA output, the highest accuracy corresponding to the smallest number of variables is selected (Hsu et al. (2002)).
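The backward elimination procedure described above can be sketched as follows. Network training is not reproduced here: `rank_fn` and `accuracy_fn` are hypothetical callbacks standing in for, respectively, the ANNIGMA ranking and the cross-validated accuracy of a subset, and the `tolerance` parameter is an assumed way of expressing "close to the highest accuracy".

```python
def backward_elimination(features, rank_fn, accuracy_fn, tolerance=0.0):
    """Remove the lowest-ranked feature at each iteration until one is left.

    rank_fn(subset)     -> dict mapping each feature to its score
    accuracy_fn(subset) -> accuracy achieved with that subset
    Returns the smallest subset whose accuracy is within `tolerance` of the
    best accuracy seen, its accuracy, and the full elimination history.
    """
    remaining = list(features)
    history = []
    while True:
        history.append((list(remaining), accuracy_fn(remaining)))
        if len(remaining) == 1:
            break
        scores = rank_fn(remaining)
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
    best_acc = max(acc for _, acc in history)
    candidates = [(s, a) for s, a in history if a >= best_acc - tolerance]
    subset, acc = min(candidates, key=lambda sa: len(sa[0]))
    return subset, acc, history


# Toy stand-ins for the trained network (illustrative values only).
toy_scores = {"x1": 1.0, "x2": 0.1, "x3": 0.6, "x4": 0.3}
toy_accuracy = {4: 0.90, 3: 0.92, 2: 0.92, 1: 0.70}

subset, acc, history = backward_elimination(
    ["x1", "x2", "x3", "x4"],
    rank_fn=lambda s: {f: toy_scores[f] for f in s},
    accuracy_fn=lambda s: toy_accuracy[len(s)],
)
```

In this toy run the variables are removed in the order x2, x4, x3, and the two-variable subset (x1, x3) is selected because it attains the highest accuracy with the fewest variables.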
Artificial neural network weights give an estimate of the relative importance of input features. Several parameters have to be predetermined; these include the entire set of attributes available at initialisation, the number of wrapper cycles, the total number of cross-validation subsets used, and the weights and biases of a trained neural net. In this simulation, 10-fold cross-validation was used. The ANNIGMA values are weighted by their performance, which ensures that the better-performing neural nets have proportionately greater influence on the final ranking of attributes. Many neural net training runs take place during each cycle and the
neural net training parameters are adjusted such that time is not wasted over-training the nets.
Fewer than 10 epochs can generate adequate ANNIGMA results.
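The performance weighting of ANNIGMA values across the cross-validation nets can be illustrated with a short sketch. The exact weighting scheme is not stated here, so the code assumes the simplest possibility, namely an average of the per-net scores weighted by each net's accuracy; the function name and the numbers are hypothetical.

```python
def weighted_annigma(fold_scores, fold_accuracies):
    """Combine per-net ANNIGMA scores into one score per feature.

    Assumed scheme: accuracy-weighted average, so nets with higher
    accuracy contribute proportionately more to the final ranking.
    """
    total = sum(fold_accuracies)
    n_features = len(fold_scores[0])
    return [
        sum(acc * scores[i] for scores, acc in zip(fold_scores, fold_accuracies)) / total
        for i in range(n_features)
    ]


# Two nets, two features (illustrative values only).
combined = weighted_annigma([[1.0, 0.2], [0.5, 1.0]], [0.9, 0.6])
```

Here the first net is the more accurate of the two, so its preference for the first feature dominates the combined ranking.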
Input and feature selection are based on whether the observation lies inside the tolerance range. Therefore, without out-of-control signals, ANNIGMA is not suited to identifying and ranking the contributions of individual variables to the process’s poor performance. Consequently, the machine learning approach is an effective fault diagnostic tool when some products are outside their respective specification limits; however, when they are within the limits (a capable process), the machine learning approach is incapable of describing the process. Furthermore, machine learning cannot diagnose whether the behaviour has occurred due to a shift in the target mean or in the variance (Gunaratne et al. (2017)).
5.2.2 Principal Component Analysis
In the context of statistical approaches (Abdi and Williams (2010)), one can use source identification based on PCA, where the ratio of each eigenvalue to the summation of the eigenvalues is proportional to the variability attributed to each principal component, $P_{vi}$, defined as:
$$P_{vi} = \frac{\lambda_i}{\sum_{i=1}^{v} \lambda_i} \quad \text{for } i = 1, 2, 3, \ldots, v \tag{61}$$
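As a brief illustration of Equation (61), the following sketch computes the share of total variability attributed to each principal component from a list of eigenvalues. The eigenvalues are assumed to have been obtained beforehand from the covariance (or correlation) matrix; the function name and the values are hypothetical.

```python
def variance_proportions(eigenvalues):
    """Equation (61): each eigenvalue divided by the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


# Illustrative eigenvalues for a three-variable process.
proportions = variance_proportions([2.0, 1.5, 0.5])
```

For these values the first principal component accounts for half of the total variability, and the proportions sum to one.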
Abdi and Williams (2010) stated that an interpretation of loading requires that the principal component and its variables closely correspond to each other; that is, the angle between the vectors representing them in $\mathbb{R}^v$ is very small. The correlation between the $i$th variable and the $j$th principal component is given by:
$$r_{ij} = u_{ij}\left[\frac{\lambda_j}{s_{ii}}\right]^{1/2} \tag{62}$$
where $u_{ij}$ denotes the loading for the $i$th variable in the $j$th principal component, $\lambda_j$ represents the eigenvalue associated with the $j$th principal component and $s_{ii}$ is the variance for the $i$th variable in the $j$th principal component. The value of $r_{ij}$ is used to rank the contributions of the $i$th variable towards process performance. This can be estimated through the fviz_pca_contrib function in the R statistical package (Santos-Fernandez and Scagliarini (2012)).
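Equation (62) and the subsequent ranking can be sketched in a few lines of Python. The loadings, eigenvalues and variable variances are assumed to be available from a prior PCA; ranking by the absolute correlation with a single chosen component is an assumption made for illustration and is not the R function mentioned above. All names and numbers are hypothetical.

```python
import math


def loading_correlations(loadings, eigenvalues, variances):
    """Equation (62): r_ij = u_ij * sqrt(lambda_j / s_ii).

    loadings[i][j] is the loading of variable i on component j,
    eigenvalues[j] the eigenvalue of component j,
    variances[i] the variance s_ii of variable i.
    """
    return [
        [u * math.sqrt(eigenvalues[j] / variances[i]) for j, u in enumerate(row)]
        for i, row in enumerate(loadings)
    ]


def rank_variables(correlations, component=0):
    """Variable indexes ordered by |r_ij| on one component, largest first."""
    return sorted(
        range(len(correlations)),
        key=lambda i: abs(correlations[i][component]),
        reverse=True,
    )


# Covariance matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1 with
# eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
h = 1.0 / math.sqrt(2.0)
r = loading_correlations([[h, h], [h, -h]], [3.0, 1.0], [2.0, 2.0])
```

In this example each variable has a correlation of about 0.866 with the first component and a magnitude of 0.5 with the second, and the squared correlations of each variable sum to one, as expected when all components are retained.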
The PCA approach is an effective source identification tool only for capable processes. As with machine learning, PCA cannot classify whether the behaviour of a capable process is caused by a shift in mean or a shift in variance.
5.2.3 Proposed source identification algorithm based on mean and variance impact factors
To overcome the shortcomings of the machine learning and PCA approaches in classifying whether the shift was in mean or in variance, a source identification algorithm was developed, based on mean and variance impact factors (hereinafter referred to as the “impact factor” approach). This impact factor algorithm is able to assist the quality practitioner in identifying major sources of variability in multivariate processes with respect to shifts in mean or process spread and in ranking them accordingly. To the best of the author’s knowledge, none of the existing approaches is able to classify whether process behaviour is caused by a shift in mean or a change in variance with respect to the responsible variables. In this research, it is assumed that (1 - 1/D) ≤ 0.001 (Equation 37) indicates no significant shift in the process mean from the set target and that MCp ≥ 1.0 (Equation 46) denotes no significant shift in process spread.
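The two decision thresholds stated above can be expressed as a small helper. The sketch takes D (Equation 37) and MCp (Equation 46) as already computed inputs and does not reproduce those equations; the function name and the example values are hypothetical.

```python
def classify_shift(d, mcp, mean_tol=0.001, spread_min=1.0):
    """Apply the two thresholds assumed in this research.

    Returns (mean_shift, spread_shift):
      mean_shift   is True when (1 - 1/D) exceeds mean_tol,
      spread_shift is True when MCp falls below spread_min.
    """
    mean_shift = (1.0 - 1.0 / d) > mean_tol
    spread_shift = mcp < spread_min
    return mean_shift, spread_shift


# Illustrative values only.
on_target_capable = classify_shift(1.0, 1.2)
shifted_and_spread = classify_shift(1.5, 0.8)
```

A process with D = 1 and MCp = 1.2 is flagged for neither kind of shift, whereas D = 1.5 with MCp = 0.8 is flagged for both a mean shift and a change in spread.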