
3.5 Building the logistic regression model

3.5.2 Initial feature selection

Figure 3.4: Examples of known signals of freight and passenger railway vibration. [Two panels, Freight and Passenger: acceleration (m s⁻²) against time (s).]

One of the most important aspects of classification model design is the selection of features that will make up the model. Finding features that can be used to effectively differentiate between two classes not only results in a highly accurate classification model, but also provides information about the differences in the two classes. When optimising the features used in the logistic regression model for this work, over 130 features were initially introduced to the model. Through a combination of univariate and multivariate significance testing, testing of correlation between features and accuracy testing, the number of features was reduced to only 2. To avoid the model being over-fitted to a certain set of signals, each test was performed over 1000 randomised splits of training, cross-validation and test sets and decisions were made based on the mean value of these repeated tests. The process of selecting these features is described in this section.
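The repeated random splitting described above can be sketched as follows. This is a minimal illustration only: the `evaluate` callback and the 60/20/20 split proportions are hypothetical and are not taken from this work.

```python
import numpy as np

def repeated_split_mean(n_signals, evaluate, n_repeats=1000,
                        fractions=(0.6, 0.2, 0.2), seed=0):
    """Average a score over many random train / cross-validation / test splits.

    `evaluate(train_idx, cv_idx, test_idx)` is a hypothetical callback that
    fits a model on the given indices and returns one number (for example a
    test accuracy). The default 60/20/20 proportions are an assumption.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(fractions[0] * n_signals))
    n_cv = int(round(fractions[1] * n_signals))
    scores = np.empty(n_repeats)
    for i in range(n_repeats):
        # A fresh random ordering of the signals gives a new, disjoint split.
        order = rng.permutation(n_signals)
        train_idx = order[:n_train]
        cv_idx = order[n_train:n_train + n_cv]
        test_idx = order[n_train + n_cv:]
        scores[i] = evaluate(train_idx, cv_idx, test_idx)
    # Decisions are based on the mean over all repeats, not on a single split.
    return scores.mean()
```

Basing each decision on the mean over many splits reduces the chance that a feature is kept or discarded because of one fortunate or unfortunate partition of the signals.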

As was demonstrated in Section 3.2, the annoyance responses collected in the field study by Waddington et al. (2014) indicate that the annoyance response to freight railway vibration is different from that to passenger railway vibration. A sensible starting point for feature selection is therefore to consider vibration magnitude descriptors that have previously been shown by Waddington et al. (2014) to correlate well with annoyance. One such descriptor is the vibration dose value (VDV), recommended as a quantifier of vibration exposure by the British Standard BS 6472-1:2008. For more information on this metric, and the relevant British Standard, see Section 2.3.3. The vibration dose value is calculated as follows:

$$\mathrm{VDV} = \sqrt[4]{\int_0^T a(t)^4 \, dt} \tag{3.14}$$

where $a(t)$ is the vibration acceleration time history with total duration $T$. Another vibration magnitude descriptor that was shown to correlate with annoyance is the rms acceleration, as recommended by the standard BS ISO 2631-1:1997. Again, for more information on this metric, and the standard which recommends it, see Section 2.3.3. The rms acceleration is calculated as follows:

$$\mathrm{rms} = \sqrt{\frac{1}{T}\int_0^T a(t)^2 \, dt} \tag{3.15}$$
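For a sampled signal, Equations 3.14 and 3.15 can be approximated numerically. The sketch below assumes a uniformly sampled acceleration array `a` in m s⁻² with sampling rate `fs` in Hz (both names are illustrative), approximates the integrals by a rectangle rule, and applies no frequency weighting.

```python
import numpy as np

def vibration_dose_value(a, fs):
    """VDV (Eq. 3.14): fourth root of the time integral of a(t)^4.

    The integral is approximated by sum(a^4) * dt with dt = 1 / fs.
    """
    a = np.asarray(a, dtype=float)
    return (np.sum(a ** 4) / fs) ** 0.25

def rms_acceleration(a):
    """rms (Eq. 3.15): square root of the time average of a(t)^2.

    For uniform sampling, (1/T) * integral of a^2 reduces to mean(a^2).
    """
    a = np.asarray(a, dtype=float)
    return np.sqrt(np.mean(a ** 2))
```

Because the VDV integrates the fourth power without dividing by the duration, it grows with event length and emphasises peaks more strongly than the rms does.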

Other vibration magnitude descriptors include the equivalent continuous vibration level, $V_{eq}$, which is analogous to the equivalent continuous sound pressure level, $L_{eq}$, and is calculated as follows:

$$V_{eq} = 20 \log_{10}\left(\frac{\mathrm{rms}}{1 \times 10^{-6}}\right) \tag{3.16}$$

The vibration exposure level (VEL) is analogous to the sound exposure level, SEL, and is the vibration level of duration 1 second that would have the same energy content as the whole event. The vibration exposure level is calculated using the following equation:

$$\mathrm{VEL} = V_{eq} + 10 \log_{10}(T_s) \tag{3.17}$$
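A minimal numerical sketch of Equations 3.16 and 3.17 is given below. It assumes that $T_s$ is the event duration expressed in seconds, which is consistent with the 1 second definition above but is not stated explicitly; the function names are illustrative.

```python
import numpy as np

REFERENCE_ACCELERATION = 1e-6  # m/s^2, the reference value in Eq. 3.16

def equivalent_vibration_level(a):
    """Veq (Eq. 3.16): rms acceleration expressed in dB re 1e-6 m/s^2."""
    rms = np.sqrt(np.mean(np.asarray(a, dtype=float) ** 2))
    return 20.0 * np.log10(rms / REFERENCE_ACCELERATION)

def vibration_exposure_level(a, fs):
    """VEL (Eq. 3.17): Veq normalised to a 1 second duration.

    Assumption: T_s is the event duration in seconds, taken here as the
    number of samples divided by the sampling rate fs (Hz).
    """
    duration_s = len(a) / fs
    return equivalent_vibration_level(a) + 10.0 * np.log10(duration_s)
```

For example, a 10 s event with an rms acceleration of 1×10⁻³ m s⁻² has a $V_{eq}$ of 60 dB and a VEL of 70 dB.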

The above descriptors should sufficiently quantify any differences between freight and passenger vibration signals that may exist in exposure magnitude. As well as differences in exposure magnitude between these sources, there may be differences in the spectral content which could lead to differences in the annoyance response, since the perception of whole-body vibration has been shown to be influenced by vibration frequency (see Section 2.2.3). Therefore, it would be sensible to include some descriptors which are measures of the frequency content of the vibration signals in the initial feature selection. One such feature is the spectral centroid, which is a weighted mean of the frequency content in the signal, with the magnitudes of the Fourier transform coefficients as weights, defined as follows:

$$f_{sc} = \frac{\sum_n f(n)\, c_{mf}(n)}{\sum_n c_{mf}(n)} \tag{3.18}$$

where $f(n)$ is the central frequency of the $n$th spectral bin and $c_{mf}(n)$ is the magnitude Fourier coefficient of the $n$th spectral bin. As well as the spectral centroid, spectral energy is quantified by determining the proportional rms acceleration contained within all 1/3rd octave bands between 0.5 and 80 Hz. This proportional rms acceleration is defined as the rms acceleration within a particular 1/3rd octave band divided by the rms acceleration in the entire signal. Different spectral distributions may also be captured by the $W_b$ and $W_k$ frequency weightings that can be applied to calculations of VDV and rms respectively.
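The two spectral descriptors above can be sketched with a discrete Fourier transform. The following is an illustration under stated assumptions rather than the processing used in this work: band edges are taken as the centre frequency multiplied by 2^(±1/6), the band is an ideal brick-wall selection of FFT bins rather than a standardised 1/3rd octave filter, and no window is applied.

```python
import numpy as np

def spectral_centroid(a, fs):
    """Eq. 3.18: magnitude-weighted mean frequency of the signal."""
    a = np.asarray(a, dtype=float)
    magnitudes = np.abs(np.fft.rfft(a))            # c_mf(n)
    freqs = np.fft.rfftfreq(len(a), d=1.0 / fs)    # f(n)
    return np.sum(freqs * magnitudes) / np.sum(magnitudes)

def proportional_band_rms(a, fs, f_centre):
    """rms acceleration in one 1/3rd octave band divided by the total rms.

    Uses Parseval's relation on the one-sided spectrum: every bin except
    DC (and Nyquist, for even lengths) is counted twice.
    """
    a = np.asarray(a, dtype=float)
    freqs = np.fft.rfftfreq(len(a), d=1.0 / fs)
    weights = np.full(len(freqs), 2.0)
    weights[0] = 1.0
    if len(a) % 2 == 0:
        weights[-1] = 1.0
    power = weights * np.abs(np.fft.rfft(a)) ** 2
    # Assumed band edges: f_centre * 2^(-1/6) up to f_centre * 2^(1/6).
    lower = f_centre * 2.0 ** (-1.0 / 6.0)
    upper = f_centre * 2.0 ** (1.0 / 6.0)
    in_band = (freqs >= lower) & (freqs < upper)
    return np.sqrt(np.sum(power[in_band]) / np.sum(power))
```

For a pure 10 Hz sinusoid, the centroid is 10 Hz and essentially all of the rms acceleration falls in the band centred on 10 Hz.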

Freight railway passbys are typically longer in duration than passenger railway passbys, which may account for differences in the annoyance response, since the perception of whole-body vibration has also been shown to be influenced by duration (see Section 2.2.4). Descriptors of the duration of the signal will therefore also be included in the initial feature selection. These descriptors include the duration defined by the 3 dB downpoints of the signal ($T_{3dB}$), the duration defined by the 10 dB downpoints of the signal ($T_{10dB}$) and the “event signal duration” ($T_e$), defined here as the duration of the signal that exceeds the top 1/3rd of the signal’s dynamic range. All downpoints are defined as the “first” and “last” points of the signal which exceed the specified threshold, in order to avoid capturing only strong peaks in the signal. The event signal duration was defined to capture the duration of signals regardless of their dynamic range, since the $T_{3dB}$ and $T_{10dB}$ descriptors can under-estimate or over-estimate the duration of signals with high or low dynamic ranges respectively. This is important to consider, since the dynamic range of vibration signals will be affected by the propagation distance from the railway to the residence. Other temporal descriptors include the rise time, defined as the duration between the first 10 dB and 3 dB downpoints, and the fall time, defined as the duration between the last 3 dB and 10 dB downpoints.
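The downpoint-based descriptors can be sketched as follows. The sketch assumes that a level time history in dB (`level_db`, for example a running rms level) is already available; how that level is obtained is not specified here, and the function names are illustrative. The event signal duration $T_e$ is omitted because its threshold depends on how the dynamic range is measured.

```python
import numpy as np

def downpoints(level_db, drop_db):
    """Indices of the first and last samples within drop_db of the maximum.

    Taking the first and last exceedances, rather than every exceedance,
    avoids measuring only the strong peaks of the event.
    """
    level_db = np.asarray(level_db, dtype=float)
    above = np.flatnonzero(level_db >= level_db.max() - drop_db)
    return above[0], above[-1]

def temporal_descriptors(level_db, fs):
    """T3dB, T10dB, rise time and fall time in seconds (fs in Hz)."""
    first3, last3 = downpoints(level_db, 3.0)
    first10, last10 = downpoints(level_db, 10.0)
    return {
        "T3dB": (last3 - first3) / fs,
        "T10dB": (last10 - first10) / fs,
        "rise_time": (first3 - first10) / fs,   # first 10 dB to first 3 dB
        "fall_time": (last10 - last3) / fs,     # last 3 dB to last 10 dB
    }
```

As a usage example, `temporal_descriptors([-20, -8, -2, 0, -1, -5, -12, -20], fs=1.0)` gives a $T_{3dB}$ of 2 s, a $T_{10dB}$ of 4 s, and rise and fall times of 1 s each.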

Statistical parameters are also included in the initial feature selection. One of these is the crest factor, $C_r$, which is a function of the peak vibration acceleration and is a measure of the “peakiness” of the vibration signal:

$$C_r = \frac{a_{max}}{\mathrm{rms}} \tag{3.19}$$

where $a_{max}$ is the peak vibration acceleration. Another descriptor which quantifies the peakiness of the vibration signal is the kurtosis, $K_t$, which is defined as follows:

$$K_t = \frac{1}{T\sigma^4}\sum_{t=0}^{T}\left[a(t)-\bar{a}\right]^4 \tag{3.20}$$

where $\sigma$ is the standard deviation of the vibration acceleration (equivalent to the rms acceleration when the mean of the acceleration signal, $\bar{a}$, is zero). Another statistical descriptor considered by Waddington et al. (2014) as a potential metric to quantify exposure is the skewness, $S_k$, which is a measure of the temporal distribution of the acceleration signal envelope and is calculated as follows:

$$S_k = \frac{1}{T\sigma^3}\sum_{t=0}^{T}\left|a(t)-\bar{a}\right|^3 \tag{3.21}$$
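Equations 3.19 to 3.21 translate directly into sample statistics. The sketch below makes two assumptions: $a_{max}$ is taken as the largest absolute acceleration, and $\sigma$ is the population standard deviation. The skewness follows Equation 3.21 as printed, with the absolute value, so it is always non-negative, unlike the conventional signed skewness.

```python
import numpy as np

def crest_factor(a):
    """Eq. 3.19: peak acceleration divided by rms acceleration.

    Assumption: the peak is the largest absolute value of the signal.
    """
    a = np.asarray(a, dtype=float)
    return np.max(np.abs(a)) / np.sqrt(np.mean(a ** 2))

def kurtosis(a):
    """Eq. 3.20: mean fourth power of the deviation over sigma^4."""
    a = np.asarray(a, dtype=float)
    deviation = a - a.mean()
    return np.mean(deviation ** 4) / np.std(a) ** 4

def skewness_abs(a):
    """Eq. 3.21 as printed: mean absolute cubed deviation over sigma^3."""
    a = np.asarray(a, dtype=float)
    deviation = a - a.mean()
    return np.mean(np.abs(deviation) ** 3) / np.std(a) ** 3
```

A sinusoid sampled over whole periods gives a crest factor of √2 and a kurtosis of 1.5; more impulsive signals give larger values of both.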

Together, the above descriptors cover the exposure magnitude, spectral and temporal characteristics of the vibration signal. Their use in building the logistic regression model should allow any potential differences between freight and passenger vibration signals to be determined, and subsequently utilised in a classification algorithm for unknown vibration signals. Since features in a regression model may have vastly different magnitudes, it is often recommended that the features be normalised in order to make computation easier and the regression coefficients more interpretable. Due to potential differences in ground conditions and source-to-receiver distances between measurement positions, each signal feature was normalised against the mean value of the same feature of all event signals recorded at the same control position, with the same ground conditions and source-to-receiver distances.
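The per-position normalisation can be sketched as below. It assumes that "normalised against the mean" means division by the mean, and that each event carries a label identifying its control position; the array layout and names are illustrative.

```python
import numpy as np

def normalise_by_position(features, position_ids):
    """Divide each feature by its mean over events from the same position.

    features     : array of shape (n_events, n_features)
    position_ids : array of shape (n_events,), one label per event
    Assumption: normalisation is a division by the per-position mean.
    """
    features = np.asarray(features, dtype=float)
    position_ids = np.asarray(position_ids)
    normalised = np.empty_like(features)
    for position in np.unique(position_ids):
        rows = position_ids == position
        # Column-wise mean over all events recorded at this position.
        normalised[rows] = features[rows] / features[rows].mean(axis=0)
    return normalised
```

After this step, a value of 1 for any feature means "typical for this measurement position", so events recorded at different distances and on different ground can be compared on the same scale.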