2.3 Digital Signal Processing

2.3.4 Feature Selection and Reduction

Feature selection and reduction can be applied to the features extracted from raw sensor data to reduce the size and dimensionality of the data set, so that it contains only the information of interest. The number of extracted features can be large, which adds complexity to subsequent data fusion, pattern processing, pattern recognition or decision-making algorithms. Feature selection and feature reduction techniques are therefore used to shrink the extracted data set, retaining data that contains useful information and removing data that is redundant or of little value. This reduces the complexity of subsequent computations, and it may also reduce the number of sensors required in the system.

Feature Selection

Feature selection is the process of selecting an optimal subset of the given signal features without further transformation of the data. The selection is made to optimise a function, such as the classification accuracy of a subsequent decision-making algorithm. Several techniques are available to determine which features contain the most information and which are highly redundant. There are normally two approaches to feature selection: filter methods and wrapper methods [103]. Filter methods use general characteristics of the features to evaluate their usefulness without involving any subsequent learning algorithm. Wrapper methods use the performance of a chosen learning algorithm to determine the optimum feature subset. Whilst wrapper methods can improve the output of the learning algorithm, they can be significantly slower to run.

Machining process monitoring literature has used relatively simple filter methods for feature selection, such as the correlation-based feature selection (CFS) used by Cho et al. [93], in which each candidate subset is given a ‘Merit’ that ranks its overall value. The heuristic scoring method is shown by Equation (7). Features with high correlation to a class (which may be time in cut or tool wear, for example) increase the Merit, whilst mutual correlations between features in the subset reduce it. The authors also concluded that CFS provided an improved feature set compared with an alternative chi-squared statistics-based feature selection. It is also noted that some full feature sets (such as spindle power) outperform the reduced set in classification accuracy, showing that the Merit ranking does not necessarily give the optimum selection.

$$\text{Merit} = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}} \qquad (7)$$

where $k$ is the number of features in the subset, $\bar{r}_{ff}$ is the mean feature-feature correlation and $\bar{r}_{cf}$ is the mean feature-class correlation.
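The Merit in Equation (7) can be computed directly from a feature matrix. The Python sketch below is a minimal illustration, not the implementation used by Cho et al. [93]: it assumes a continuous class variable, uses absolute Pearson correlation for both mean correlations, and the function name `cfs_merit` is hypothetical.

```python
import numpy as np

def cfs_merit(X, y):
    """CFS Merit of a feature subset, following Equation (7).

    X : (n_samples, k) array holding the k features in the subset.
    y : (n_samples,) array holding the class variable (e.g. tool wear).
    Assumption: absolute Pearson correlation is used for r_cf and r_ff.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    # Mean absolute feature-class correlation, r_cf
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in range(k)])
    if k == 1:
        return r_cf  # no feature-feature pairs, so Merit reduces to r_cf
    # Mean absolute feature-feature correlation over distinct pairs, r_ff
    corr = np.abs(np.corrcoef(X, rowvar=False))
    r_ff = corr[np.triu_indices(k, 1)].mean()
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

With a class `y = [1, 2, 3, 4]`, a single feature equal to `y` gives a Merit of 1. Adding a scaled copy of that feature leaves the Merit at 1, because the redundant feature contributes nothing new, whereas adding the feature `[1, -1, -1, 1]`, which is uncorrelated with the class, lowers it to 1/√2 ≈ 0.71.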

Jemielniak [96] ranked features suitable for monitoring remaining tool life by first applying a low-pass filter to each feature data set and then measuring how well the original feature data approximates the filtered data using the coefficient of determination (R-squared). Many assumptions underlie this approach. The filtered data has been chosen to represent the true model “to avoid any uncertain suppositions about the mathematical formula of this model”. A feature is deemed useful when the coefficient of determination between the filtered data and the original sensor data exceeds an arbitrary value of 0.4. The author does not explain how the delay introduced by the filter was dealt with.
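The ranking idea can be sketched as follows. The source does not state which low-pass filter was used, so a centred moving average is assumed here; being centred it introduces no delay, but it is non-causal and therefore only suits offline analysis. The function names, the window length and the dictionary input are hypothetical; only the 0.4 threshold comes from the text.

```python
import numpy as np

def trend_r2(x, window=5):
    """R-squared of the raw feature x against its low-pass filtered trend.

    Assumption: the low-pass filter is a centred moving average of odd length,
    and the trend is treated as the model that the raw data should follow.
    """
    x = np.asarray(x, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centred")
    h = (window - 1) // 2
    # 'valid' avoids zero-padded edges; trend[j] is centred on x[j + h]
    trend = np.convolve(x, np.ones(window) / window, mode="valid")
    obs = x[h:len(x) - h]
    ss_res = np.sum((obs - trend) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_features(features, window=5, threshold=0.4):
    """Return feature names with R-squared above threshold, best first.

    features : dict mapping a feature name to a 1-D array over time.
    """
    scores = {name: trend_r2(x, window) for name, x in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > threshold]
```

For a linear trend the centred average reproduces the data exactly, so R-squared is 1 and the feature is kept. For a sequence alternating between +1 and −1 with a five-sample window, R-squared is 0.36, below the 0.4 threshold, so the feature is rejected.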

Whilst the two methods discussed can rank the features most suitable for tracking a particular variable over time, they may overlook features that carry information about transient events or about variables other than the one correlated against. The feature selection method must therefore take account of the monitoring system objective(s), whilst remaining practical enough that feature subsets can be selected without extensive computation. Once a function that defines the value of any feature or feature set has been derived, a search for the feature subset that optimises this function is required. Given the large number of sensor signal features available in machining monitoring systems, testing every possible subset is impractical: $n$ features yield $2^n - 1$ non-empty subsets, so the computational expense grows exponentially. Greedy hill-climbing algorithms, such as that used by Cho et al. [93], provide an efficient alternative, though such methods do not consider interactions between features.
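A greedy hill-climbing search can be sketched as forward selection: start from the empty subset, add the single feature that most improves the score, and stop when no addition helps. This evaluates on the order of $n^2$ subsets rather than $2^n - 1$. The sketch is generic and is not the exact search used by Cho et al. [93]; the `score` callable is a hypothetical placeholder for whatever function values a subset. Because each step adds only one feature, a pair of features that is useful only in combination can be missed.

```python
from typing import Callable, Sequence

def greedy_forward_selection(n_features: int,
                             score: Callable[[Sequence[int]], float]) -> list[int]:
    """Greedy forward selection over feature indices 0..n_features-1.

    score(subset) returns the value of a candidate subset (higher is better).
    The search stops at the first local optimum.
    """
    selected: list[int] = []
    best = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every one-feature extension; sorted() makes ties deterministic
        candidate, cand_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1])
        if cand_score <= best:
            break  # no single addition improves the score
        selected.append(candidate)
        best = cand_score
        remaining.remove(candidate)
    return selected
```

Passing a filter criterion such as a CFS Merit as `score` gives a filter method; passing the cross-validated accuracy of a classifier trained on the subset turns the same search into a wrapper method, at a much higher cost per evaluation.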

It is also possible to select a feature set based on theoretical or practical engineering knowledge. A potential difficulty in deriving feature subsets from knowledge of the physical system is that the underlying physical effects of phenomena such as tool wear and chip formation are complex. Furthermore, the transmission of the signal from the source to the sensor, particularly for vibration and acoustic emission (AE), affects the signal noise. Jemielniak [96] stated that “it is impossible to predict which sensor signal features will be useful in any particular case”. Although rather pessimistic and arguably untrue, this statement reflects the impression that the physical mechanisms that generate sensor signals are complex and not well understood. An understanding of the physical system has been important in chatter detection, for example through process damping theory [94], so it may be worth pursuing for condition monitoring applications. No research was found in this field that compares practical-based feature selection with model-based selection.

Feature Reduction

Feature reduction is the process of reducing the dimensionality of a multi-feature (multivariate) data set; it is sometimes referred to as feature transformation. Possibly the most popular of these techniques is principal component analysis (PCA). PCA maps multivariate data onto a new set of orthogonal axes, the principal components, ordered by the variance they capture; the components containing the least variance are then discarded to give a reduced set. This also provides a valuable tool for visualising multivariate data sets in two- or three-dimensional space. Typically, a subset of features is chosen using feature selection techniques, and its dimensionality is then reduced further with feature reduction.
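A minimal PCA sketch is given below. It obtains the principal components from the singular value decomposition of the mean-centred feature matrix, which is one standard route (an eigendecomposition of the covariance matrix is equivalent). The function name and return values are illustrative assumptions. Setting `n_components` to 2 or 3 gives coordinates suitable for the visualisation described above.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its first n_components principal components.

    Returns (scores, explained_variance_ratio). Features are only mean-centred here;
    standardising them first is a common choice when they have different units.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (X.shape[0] - 1)  # variance captured by each component
    scores = Xc @ Vt[:n_components].T
    return scores, var[:n_components] / var.sum()
```

Because PCA maximises captured variance, features with large numerical ranges dominate the leading components unless the data is standardised. If three features are exact multiples of one another, the first component alone explains all of the variance.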

In document Unsupervised Monitoring of Machining Processes (Page 55-57)