Feature generation and selection - Elements of a pattern recognition system


4.3. Elements of a pattern recognition system

4.3.1. Feature generation and selection

The process of acquiring information from an underlying data set (measurement space), and the estimation of the inherent information content within the obtained feature space are called feature generation (sometimes feature extraction) and feature selection, respectively. Both steps are regarded as the most important part of a pattern recognition system (e.g. Schukat-Talamazzini, 1995, p. 75, Niemann, 1990, p. 9).

In the feature generation step, individual signal parameters are calculated from the raw measurements; these form the basis for the subsequent classification process. Consequently, the single features used for the data representation must contain valuable information for the discrimination of classes. If the underlying physical processes of the data set are well understood, a possible strategy is to derive the parametrization from the theoretical background. Alternatively, if the knowledge about the data production process is poor, a parametrization can be chosen by taking into account human expertise or by mimicking human perception principles.

In the present context of seismic signal classification, the measurements consist of evenly sampled, discrete time series. These represent recordings of the ground motion at a seismograph system, proportional to ground displacement, velocity or acceleration depending on the deployed instrument type. The seismogram contains information about the involved seismic source process, the propagation medium and the instrument response. Whereas the theory of seismic wave propagation is well developed, and the instrument response is a known quantity, the location and nature of the seismic source as well as the properties of the propagation medium are generally not well constrained. It is therefore difficult to derive an appropriate parametrization solely from theoretical considerations.

The experience from over 100 years of seismological observatory practice provides a good starting point for a reasonable choice of signal parameters for the classification of seismic signals. An important issue in the visual inspection of seismograms is the fact that an observer is trained to look at contextual information. Whereas detailed analysis of small seismogram portions provides information about short-term signal attributes, the classification of the waveform can only be performed by taking into account the variation of signal parameters over the whole duration of the signal. An example is given in Fig. 4.2: the short time windows on the left show similar signal characteristics, and both would be visually classified as a portion of seismic noise. However, viewing the same signal windows on a larger time scale (Fig. 4.2 on the right) reveals that one of the signals is actually part of a seismic event (MP-type signal recorded at 1.6 km distance at Merapi), whereas the other waveform sample belongs to the preceding seismic noise. Consequently, for the classification of seismic events it is important to include contextual information either in the signal representation process (feature generation step) or in the classifier approach.
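One way to carry contextual information in the signal representation can be sketched as follows. This is an illustration under stated assumptions, not the parametrization used in this work: the two short-term features (RMS amplitude and zero-crossing rate), the frame-to-frame differences used as context, the function names `short_time_features` and `add_context`, the sampling rate, and the synthetic trace are all hypothetical choices.

```python
import numpy as np

def short_time_features(trace, win_len, hop):
    """Return one feature vector [RMS amplitude, zero-crossing rate] per window."""
    feats = []
    for start in range(0, len(trace) - win_len + 1, hop):
        w = trace[start:start + win_len]
        rms = np.sqrt(np.mean(w ** 2))
        # Fraction of adjacent sample pairs whose sign differs.
        zcr = np.mean(np.signbit(w[:-1]) != np.signbit(w[1:]))
        feats.append([rms, zcr])
    return np.asarray(feats)

def add_context(feats):
    """Append frame-to-frame differences so each vector carries temporal context."""
    # The first frame has no predecessor, so its difference is set to zero.
    deltas = np.diff(feats, axis=0, prepend=feats[:1])
    return np.hstack([feats, deltas])

# Synthetic stand-in for a seismogram: noise followed by a stronger transient.
rng = np.random.default_rng(0)
fs = 100  # assumed sampling rate in Hz
noise = rng.normal(0.0, 1.0, 10 * fs)
event = rng.normal(0.0, 5.0, 5 * fs)
trace = np.concatenate([noise, event])

feats = short_time_features(trace, win_len=fs, hop=fs // 2)
feats_ctx = add_context(feats)
print(feats.shape, feats_ctx.shape)
```

A single short window only yields the first two columns; the appended difference columns let a classifier see how the attributes evolve from one window to the next.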


The choice of signal attributes for the purpose of detecting and classifying seismic events has been the subject of numerous scientific studies in the past. A review of the most commonly used features which have been proposed in earthquake research is provided in section 4.4. At this point it is sufficient to note that a variety of signal parameters can be derived from seismogram recordings, mostly based on knowledge sources from observatory practice as well as from considerations regarding the theory of wave propagation and the corresponding seismogram structure.

At first sight, any signal parameter estimated from the raw data streams can be used to parametrize the seismic data. Without a priori knowledge about the relevance of individual signal parameters for the given classification task, it is difficult to give preference to particular feature estimates. Hence, in a first step, it is common practice to include as many features as possible in the feature vector. However, the number of reasonable feature candidates may be high. In order to keep the computational complexity of the subsequent classifier design within tractable limits, the dimensionality of the feature vector space has to be restricted to some reasonable size.

The feature selection step of a pattern recognition system consequently aims to select an optimal subset of the previously acquired features for the classification task. One major difficulty in the feature selection stage is to define an optimality criterion. A common approach (see e.g. the discussion in Niemann, 1983, p. 108) is based intuitively on the criterion of class separability in the feature vector space, i.e. on evaluating the discriminative power of the feature vectors.
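To make the notion of class separability concrete, the sketch below ranks features by a Fisher-type ratio of between-class to within-class scatter. This particular score is an assumption chosen for illustration; the text does not prescribe a specific separability measure, and the function name `fisher_score` and the synthetic two-class data are hypothetical.

```python
import numpy as np

def fisher_score(X, labels):
    """Per-feature ratio of between-class to within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / within

rng = np.random.default_rng(1)
n = 200
labels = np.repeat([0, 1], n)
# Feature 0 separates the two classes, feature 1 is pure noise.
f0 = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.0, n)])
f1 = rng.normal(0.0, 1.0, 2 * n)
X = np.column_stack([f0, f1])

scores = fisher_score(X, labels)
ranking = np.argsort(scores)[::-1]  # feature indices, most discriminative first
print(scores, ranking)
```

A score of this kind evaluates each feature in isolation, so it ignores correlations between features; the linear transformations discussed next address the joint structure of the feature vector.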

A widely used method to reduce the dimensionality of the feature vector space while maintaining the discriminative power of the feature vectors relies on linear transformations. The Karhunen-Loeve (KL) expansion has been shown to be suitable for deriving an appropriate transformation with the desired properties (Kittler and Young, 1973). The KL expansion is based upon the eigenvector analysis of the sample covariance matrix built from a training set of feature vectors. The result of this analysis can be used to linearly transform the representation vectors into a new coordinate system in which the coordinate coefficients are mutually uncorrelated, and where the information of the original feature vectors is mapped onto the first few axes of the new coordinate system. It is then possible to use a new feature vector of reduced dimension, which approximates the original representation vectors in a least-squares sense.

FIGURE 4.2: Waveform example demonstrating the importance of contextual information in seismogram interpretation. In the left column two waveform samples are shown, which would be visually classified as seismic noise. The same waveform windows are shown on the right side on a larger time scale within their temporal context. From the contextual information, the lower seismogram sample is now clearly recognized as part of a seismic transient signal (MP-type signal at Merapi volcano). [Figure not reproduced; time axes in seconds, with tick ranges 0 to 5 s and 0 to 30 s.]

Consider the original feature vector $x$ of dimension $D$, and let $\Phi$ be a $D \times D$ matrix formed by row-vectors $\varphi_i \in \Re^D$, which build an orthonormal basis of the vector space $\Re^D$. Then any vector $x$ may be represented as an expansion of the form:

$$x = \sum_{i=1}^{D} y_i \varphi_i , \tag{4.1}$$

with coefficients $y_i = \varphi_i^T x$. Using the incomplete expansion formula:

$$x' = \sum_{i=1}^{d} y_i \varphi_i \quad \text{with } d \le D , \tag{4.2}$$

for representing $x$ by $x'$ will lead to the mean approximation error $\varepsilon_d$, expressed as:

$$\varepsilon_d = E\left[ \| x - x' \|^2 \right] = E\left[ \Big\| \sum_{i=d+1}^{D} y_i \varphi_i \Big\|^2 \right] = \sum_{i=d+1}^{D} \varphi_i^T E\left[ x x^T \right] \varphi_i = \sum_{i=d+1}^{D} \varphi_i^T S \varphi_i \tag{4.3}$$

Considering the feature vector $x$ as a random variable, the expression $E[x]$ is the expectation or the first statistical moment of the distribution function of the underlying random process (compare appendix A.2). Furthermore, the matrix $S$ in the rightmost term of EQ 4.3 is equivalent to the matrix of the second moments of the random distribution function (autocorrelation matrix). $S$ may be estimated from a set of training vectors $X = \{ x_j \mid x_j \in \Re^D \}$, $j = 1, \ldots, J$, like:

$$S' = \frac{1}{J} \sum_{j=1}^{J} x_j x_j^T . \tag{4.4}$$

The matrix $S = E[x x^T]$ equals the sample covariance matrix $C = E[(x - \mu)(x - \mu)^T]$ if the overall mean $\mu = E[x]$ of the training set is the null vector. Minimizing $\varepsilon_d$ in EQ 4.3 is achieved by solving the eigenvalue problem $S \varphi_i = \lambda_i \varphi_i$. The matrix of second moments $S$ can be written as:

$$S = \Phi^T \Lambda \Phi = \Phi^T \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_D \end{pmatrix} \Phi . \tag{4.5}$$
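The estimate in EQ 4.4 and the factorization in EQ 4.5 can be checked numerically. The following is a minimal sketch using NumPy; the synthetic training set, the dimensions `J` and `D`, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
J, D = 500, 4
# Hypothetical training set: J correlated feature vectors of dimension D (rows of X).
A = rng.normal(size=(D, D))
X = rng.normal(size=(J, D)) @ A

# EQ 4.4: estimate of the second-moment matrix, S' = (1/J) * sum_j x_j x_j^T.
S = X.T @ X / J

# eigh handles symmetric matrices; it returns eigenvalues in ascending order
# and eigenvectors as columns, so reorder and transpose to obtain rows phi_i.
lam, vecs = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam = lam[order]
Phi = vecs[:, order].T  # row i is the eigenvector phi_i

# EQ 4.5: S = Phi^T Lambda Phi.
S_rebuilt = Phi.T @ np.diag(lam) @ Phi
print(np.allclose(S, S_rebuilt))
```

Storing the eigenvectors as rows of `Phi` follows the convention stated in the text; many libraries return them as columns, which is why the transpose is needed here.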


As $S$ is a symmetric positive-definite matrix (compare EQ 4.4), all eigenvalues are real and positive. Sorting the eigenvalues $\lambda_i$, $i = 1, \ldots, D$, in descending order and inserting EQ 4.5 into EQ 4.3 minimizes the mean expected error $\varepsilon_d$ in a least-squares sense:

$$\varepsilon_d = \sum_{i=d+1}^{D} \lambda_i . \tag{4.6}$$
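The statement that the truncation error equals the sum of the discarded eigenvalues can be verified on sample data. The sketch below is an illustration under assumptions: the training set is synthetic, and the comparison basis (simply dropping the last coordinates of the original axes) is a hypothetical baseline added to show that the eigenvector basis does not do worse.

```python
import numpy as np

rng = np.random.default_rng(42)
J, D, d = 1000, 5, 2
# Hypothetical training set of J correlated feature vectors (rows of X).
X = rng.normal(size=(J, D)) @ rng.normal(size=(D, D))

S = X.T @ X / J                      # second-moment estimate (EQ 4.4)
lam, vecs = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]        # descending eigenvalues
lam, Phi = lam[order], vecs[:, order].T

Y = X @ Phi.T                        # coefficients y_i = phi_i^T x for every vector
X_approx = Y[:, :d] @ Phi[:d]        # incomplete expansion with d terms (EQ 4.2)

# Sample mean of the squared approximation error ||x - x'||^2 (EQ 4.3).
eps_d = np.mean(np.sum((X - X_approx) ** 2, axis=1))

# Baseline: keep the first d original coordinates and discard the rest.
eps_naive = np.trace(S[d:, d:])

print(eps_d, lam[d:].sum(), eps_naive)
```

Because `S` is estimated from the same vectors that are reconstructed, the agreement between `eps_d` and the eigenvalue sum holds up to floating-point rounding rather than only approximately.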

As a consequence, the linear transformation:

$$y = \Phi x , \tag{4.7}$$

where $\Phi^T$ contains as columns the ordered set of eigenvectors $\varphi_i$, results in the KL coordinate system as given in EQ 4.1. The coordinate coefficients $y_i$ are then mutually uncorrelated, and it has been shown that the components are sorted according to their degree of information about the random variable $x$ (e.g. Kittler and Young, 1973). A reduction of dimensionality is achieved by dropping components with index higher than $d$. An appropriate value of $d$ is usually found from arguments regarding the magnitude of the corresponding eigenvalues or by trial and error. In practice, the transformation matrix $\Phi$ is obtained from the eigenproblem solution of the sample covariance matrix $C$ for the centralized vector $x - \mu$.
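The practical procedure described above (centre the data, solve the eigenproblem of the sample covariance matrix, keep the leading components) might look as follows. This is a sketch, not the implementation used in this work: the function name `kl_transform`, the rule of retaining 95 % of the eigenvalue sum, and the synthetic data are assumptions; the text only says that the reduced dimension is chosen from the eigenvalue magnitudes or by trial and error.

```python
import numpy as np

def kl_transform(X, energy=0.95):
    """Project centred feature vectors onto the leading eigenvectors of C.

    `energy` is an assumed heuristic: the fraction of the eigenvalue sum to retain.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / len(X)                      # sample covariance matrix
    lam, vecs = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]
    lam, Phi = lam[order], vecs[:, order].T     # rows of Phi are eigenvectors
    # Smallest d whose leading eigenvalues hold at least `energy` of the total.
    d = int(np.searchsorted(np.cumsum(lam) / lam.sum(), energy) + 1)
    d = min(d, len(lam))
    Y = Xc @ Phi[:d].T                          # reduced KL coefficients
    return Y, Phi[:d], lam, mu

rng = np.random.default_rng(3)
# Hypothetical data: three strong directions embedded in six dimensions plus weak noise.
latent = rng.normal(size=(800, 3)) * [5.0, 3.0, 2.0]
Q, _ = np.linalg.qr(rng.normal(size=(6, 3)))    # orthonormal embedding directions
X = latent @ Q.T + 0.05 * rng.normal(size=(800, 6)) + 10.0

Y, Phi_d, lam, mu = kl_transform(X)
C_y = np.cov(Y, rowvar=False, bias=True)        # covariance of the new coefficients
print(Y.shape)
print(np.round(C_y, 3))
```

The off-diagonal entries of `C_y` vanish up to rounding, which is the de-correlation property, and its diagonal reproduces the retained eigenvalues.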

The de-correlation transformation in EQ 4.7 may be further modified in order to obtain a new coordinate system where the sample covariance matrix of the feature vectors equals the unity matrix $I$. This is an advantageous property if the subsequent classifier approach is based on the euclidean metric. The so-called prewhitening transformation is given by:

$$y = \Lambda^{-1/2} \Phi x . \tag{4.8}$$

This transformation normalizes the individual components in the transformed feature vector according to their respective standard deviation, which in turn allows the euclidean metric to be used as a proper distance measure in the reduced vector space (e.g. Deller et al., 1993, p. 62). The importance of the normalization property of the transformation given by EQ 4.8 will become evident in the following subsection.
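The effect of prewhitening can be illustrated numerically. The sketch below assumes the transformation scales each KL coefficient by the inverse square root of its eigenvalue after centring; the synthetic data and variable names are hypothetical. It also checks a standard consequence for the full-dimensional case: the squared euclidean distance between whitened vectors equals the Mahalanobis distance between the original ones.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical correlated feature vectors with a non-zero mean.
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3)) + [1.0, -2.0, 0.5]

mu = X.mean(axis=0)
Xc = X - mu
C = Xc.T @ Xc / len(X)                       # sample covariance matrix
lam, vecs = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, Phi = lam[order], vecs[:, order].T      # rows of Phi are eigenvectors

# Prewhitening matrix: Lambda^(-1/2) Phi, applied to the centred vectors.
W = np.diag(lam ** -0.5) @ Phi
Y = Xc @ W.T
C_y = Y.T @ Y / len(Y)                       # covariance after whitening

# Distances: euclidean in whitened space vs. Mahalanobis in the original space.
a, b = X[0], X[1]
d_white = np.sum((W @ (a - b)) ** 2)
d_mahal = (a - b) @ np.linalg.inv(C) @ (a - b)

print(np.round(C_y, 6))
print(d_white, d_mahal)
```

After whitening every component has unit variance, so no single feature dominates a euclidean distance merely because of its scale.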

In document Continuous Automatic Classification of Seismic Signals of Volcanic Origin at Mt. Merapi, Java, Indonesia (Page 33-36)