A Multivariable Time Series Classification Approach Based on Improved Functional Echo State Network


2019 International Conference on Information Technology, Electrical and Electronic Engineering (ITEEE 2019) ISBN: 978-1-60595-606-0


Jian-xi YANG¹, Ying-ying HE¹, Zheng-wu LI², Ren LI¹,* and Jing-pei DAN³

¹ College of Information Science and Engineering, Chongqing Jiaotong University, China

² Highway Administration Bureau of Ningxia Hui Autonomous Region, Ningxia, China

³ College of Computer Science, Chongqing University, China

*Corresponding author

Keywords: Functional echo state network, Time series classification, Softmax regression, Genetic algorithm, Structural damage detection.

Abstract. The functional echo state network (FESN) is a recently proposed recurrent neural network that has been successfully used for time series classification. To make FESN more suitable for multivariable time series classification, we present a novel FESN model that replaces the output layer of the original FESN with softmax regression, and we employ the L-BFGS algorithm to train the proposed model. Moreover, a genetic algorithm is used to determine the hyper-parameters of the improved FESN. The experimental results show that the proposed approach achieves better accuracy than classical classifiers such as the support vector machine, the Long Short-Term Memory neural network and the original FESN on a multivariable time series classification task.

Introduction

Time series analysis has always been a hot topic and a challenging issue in the machine learning and data mining communities [1]. Especially in the era of the Internet of Things and big data, large amounts of sequential data are continuously collected as sensing technologies develop. Over the past decades, many approaches have been proposed for temporal data processing. For example, recurrent neural networks (RNN), especially the Long Short-Term Memory (LSTM) and the bidirectional LSTM, have been successfully applied in many domains [2].

It is noteworthy that the Echo State Network (ESN) has emerged as a novel recurrent neural network that can efficiently process the temporal dependencies of time series. Unlike a traditional RNN, an ESN is simple and fast to train: the reservoir weights are fixed, which avoids the vanishing and exploding gradient problems, and a simple linear regression algorithm can be employed to compute the weights of the linear readout layer. To date, ESN has been successfully applied to time series prediction [3]. Meanwhile, for the time series classification task, Ma et al. [4] introduced the Functional Echo State Network (FESN), which integrates a temporal aggregation operator into the reservoir units and replaces the numerical output weights of the ESN with time-varying functions. However, FESN has so far only been evaluated on univariate time series; how to employ FESN in multivariate time series classification has not yet been explored.

In this paper, we propose a novel FESN model that replaces the original output layer of FESN with softmax regression. The proposed model is evaluated on structural damage detection, which can be formulated as a supervised classification problem on multivariable time series.

This paper is organized as follows. We describe the improved FESN model in Section 2. In Section 3, the experimental setup and results are presented. The conclusion and some future work are discussed in Section 4.

The Improved FESN Model


... layer of FESN, and the numeric output weights of the ESN are replaced by time-varying functions. On this basis, FESN can be used as a classifier instead of a regression model.


To make the FESN model more suitable for the structural damage detection task [5], which is essentially a multivariable time series classification problem, we replace the output layer of the original FESN model with softmax regression. The structure of the new model is shown in Fig. 1.

Figure 1. The model structure of improved FESN with softmax regression output.

The softmax regression can be fitted with a gradient descent method or with an unconstrained optimization algorithm. Gradient descent methods include stochastic gradient descent, batch gradient descent, etc. Unconstrained optimization algorithms include the Newton method, the quasi-Newton method and the limited-memory quasi-Newton method [6]. Compared with gradient descent, these algorithms have two advantages. First, the step size does not need to be chosen manually. Second, they usually converge faster. The L-BFGS algorithm [7] is a limited-memory quasi-Newton algorithm. It converges quickly and does not need to store the Hessian matrix, and it also avoids the storage cost of the full BFGS matrix. When the optimization problem is large, storing and computing with a dense matrix becomes infeasible and slows training down, so L-BFGS is more suitable for large-scale numerical computation. Therefore, we adopt the L-BFGS algorithm for learning and optimizing the parameters of the softmax-based output layer.

Hyper Parameters Determination

Determining the hyper-parameters is one of the most important steps before training the improved FESN model. These hyper-parameters include the weight matrix W_in from the FESN input layer to the reservoir, the internal unit weight matrix W, the input unit scale and the sparsity degree of the reservoir, etc. The genetic algorithm is an adaptive global optimization search algorithm that simulates evolutionary processes in the natural environment [8]. For hyper-parameter selection, the genetic algorithm is used to minimize the objective function func = min f(error), where error = (x1, x2, x3, x4) and x1, x2, x3 and x4 represent the spectral radius of the internal connection weights, the scale, the input unit scale and the sparsity of the reservoir, respectively. The computational steps are as follows.

(1) Set the population size to M = 20 and the maximum number of generations to G = 20, and encode x1, x2, x3 and x4 to generate a random initial population.

(2) Calculate the fitness value of each individual as fitness(i) = R/f(error_i), where the constant R is set so that the reciprocal does not come too close to 0 when the error rate is large. The selection operator adopts the elite individual preservation strategy together with roulette-wheel selection; that is, the individual with the highest fitness is always selected. The selection probability P and the cumulative probability Q of each individual with respect to the overall population fitness are calculated as:

$$P_i = \frac{fitness(i)}{\sum_{j=1}^{M} fitness(j)}, \qquad Q_i = \sum_{j=1}^{i} P_j \tag{1}$$
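The selection probabilities of Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `selection_probabilities` and `roulette_pick`, the value R = 1.0 and the four error rates are hypothetical and do not come from the experiments.

```python
import random
from itertools import accumulate


def selection_probabilities(errors, R=1.0):
    """Return (fitness, P, Q) for a population, following Eq. (1)."""
    # fitness(i) = R / f(error_i); the errors must be strictly positive.
    fitness = [R / e for e in errors]
    total = sum(fitness)
    P = [f / total for f in fitness]   # selection probability of each individual
    Q = list(accumulate(P))            # cumulative probability
    Q[-1] = 1.0                        # guard against floating-point drift
    return fitness, P, Q


def roulette_pick(Q, rng):
    """Pick one index by roulette-wheel selection on the cumulative probabilities Q."""
    r = rng.random()
    for i, q in enumerate(Q):
        if r <= q:
            return i
    return len(Q) - 1


errors = [0.40, 0.20, 0.10, 0.05]      # hypothetical error rates of four individuals
fitness, P, Q = selection_probabilities(errors)
# Elite preservation: the individual with the highest fitness is always kept.
elite = max(range(len(errors)), key=lambda i: fitness[i])
chosen = roulette_pick(Q, random.Random(0))
```

Individuals with a lower error rate receive a larger share of the wheel, while the elite individual is carried over regardless of the random draw.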


Finally, when the termination condition is satisfied, the iterative computation stops and the best individual is taken as the optimal solution. The Fourier-varied feature matrix of the FESN is then obtained with the optimized parameters, and the softmax regression is trained on this feature matrix.
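The overall search loop might look like the following minimal sketch. It keeps M = 20 and G = 20 from the text, but several parts are assumptions made only for illustration: the stand-in `error` function (a real run would train the improved FESN and return its error rate), the value of R, the real-valued encoding in [0, 1], and the single-point crossover and Gaussian mutation operators, which are not specified here.

```python
import random

M, G = 20, 20                     # population size and number of generations (from the text)
R = 1.0                           # fitness constant; this value is an assumption
TARGET = (0.9, 0.3, 0.8, 0.1)     # hypothetical optimum of the stand-in error function


def error(x):
    """Stand-in for f(error); a real run would train the model and return its error rate."""
    return 1e-6 + sum((a - b) ** 2 for a, b in zip(x, TARGET))


def run_ga(seed=0, crossover_rate=0.8, mutation_rate=0.1):
    rng = random.Random(seed)
    # Each individual encodes (x1, x2, x3, x4) as four real numbers in [0, 1].
    pop = [[rng.random() for _ in range(4)] for _ in range(M)]
    history = []
    for _ in range(G):
        fit = [R / error(ind) for ind in pop]
        best = pop[max(range(M), key=lambda i: fit[i])]
        history.append(error(best))
        new_pop = [best[:]]                        # elite individual preservation
        while len(new_pop) < M:
            # Roulette-wheel selection of two parents, weighted by fitness.
            p1, p2 = rng.choices(pop, weights=fit, k=2)
            child = p1[:]
            if rng.random() < crossover_rate:      # single-point crossover (assumed)
                cut = rng.randint(1, 3)
                child = p1[:cut] + p2[cut:]
            for g in range(4):                     # Gaussian mutation (assumed)
                if rng.random() < mutation_rate:
                    child[g] = min(1.0, max(0.0, child[g] + rng.gauss(0, 0.1)))
            new_pop.append(child)
        pop = new_pop
    best = min(pop, key=error)
    history.append(error(best))
    return best, history


best, history = run_ga()
```

Because the elite individual is copied unchanged into each new generation, the best error recorded in `history` never increases from one generation to the next.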

Multivariable Time Series Classification

The training set consists of labeled samples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$ with $y^{(i)} \in \{1, 2, \ldots, k\}$, where $x$ denotes a multivariable time series. In softmax regression, the probability of classifying $x^{(i)}$ as category $j$ is:

$$p(y^{(i)} = j \mid x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \tag{2}$$

where $\theta$ is the model parameter and the factor $1 / \sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}$ normalizes the probability distribution so that all the probabilities sum to 1. The cost function of the softmax regression algorithm is:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] \tag{3}$$

The cost function is modified by adding a weight decay term that penalizes overly large parameter values, and the cost function becomes:

$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=0}^{n} \theta_{ij}^{2} \tag{4}$$

With this weight decay term (for any $\lambda > 0$), the cost function becomes strictly convex, which guarantees a unique solution. To use the iterative optimization algorithm L-BFGS, the derivative of this new function $J(\theta)$ is required. The derivative is as follows.

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta)\right)\right] + \lambda\theta_j \tag{5}$$

The softmax regression model is obtained by minimizing $J(\theta)$; to find the optimal parameters, we carry out this minimization with the L-BFGS algorithm.
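As a minimal sketch of Eqs. (2), (4) and (5), the probability, the weight-decayed cost and its gradient can be written in plain Python as follows. The function names, the toy data and the value of λ are assumptions chosen only for illustration, and class labels are 0-based here rather than 1-based as in the text.

```python
import math


def softmax_probs(theta, x):
    """p(y = j | x; theta) for every class j, Eq. (2); theta is a list of k weight vectors."""
    scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost_and_grad(theta, X, y, lam):
    """Weight-decayed cost J(theta), Eq. (4), and its gradient, Eq. (5)."""
    m, k, n = len(X), len(theta), len(theta[0])
    cost = 0.0
    grad = [[0.0] * n for _ in range(k)]
    for x, label in zip(X, y):
        p = softmax_probs(theta, x)
        cost -= math.log(p[label]) / m      # only the term with 1{y = j} = 1 survives
        for j in range(k):
            indicator = 1.0 if label == j else 0.0
            for d in range(n):
                grad[j][d] -= x[d] * (indicator - p[j]) / m
    for j in range(k):                      # weight decay term and its derivative
        for d in range(n):
            cost += 0.5 * lam * theta[j][d] ** 2
            grad[j][d] += lam * theta[j][d]
    return cost, grad


# Hypothetical toy data: 4 samples, 2 features plus a constant bias input, 3 classes.
X = [[1.0, 0.5, 1.0], [0.2, 1.5, 1.0], [-1.0, 0.3, 1.0], [0.8, -0.7, 1.0]]
y = [0, 1, 2, 0]
theta = [[0.1, -0.2, 0.0], [0.0, 0.3, 0.1], [-0.1, 0.1, 0.2]]
cost, grad = cost_and_grad(theta, X, y, lam=1e-2)
```

A finite-difference comparison of `cost` against `grad` is a convenient way to check such a gradient before handing both to an optimizer.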

L-BFGS automatically adjusts the learning rate to obtain an appropriate step size. The procedure of the algorithm is as follows:

Step 1: select the initial point $x_0$ and the tolerance $\varepsilon > 0$, and store the data of the most recent $m$ iterations;

Step 2: $k = 0$, $H_0 = I$, $r_0 = \nabla f(x_0)$;

Step 3: if $\|\nabla f(x_k)\| \le \varepsilon$, return the optimal solution $x_k$; otherwise move to Step 4;

Step 4: calculate the search direction of this iteration, $p_k = -r_k$;

Step 5: calculate the step size $\alpha_k > 0$ by line search, $f(x_k + \alpha_k p_k) = \min_{\alpha \ge 0} f(x_k + \alpha p_k)$;

Step 6: update the weights, $x_{k+1} = x_k + \alpha_k p_k$;

Step 7: if $k > m$, delete $(s_{k-m}, t_{k-m})$ so that only the $m$ most recent vector pairs are retained;

Step 8: compute and save $s_k = x_{k+1} - x_k$, $t_k = \nabla f(x_{k+1}) - \nabla f(x_k)$;

Step 9: use the two-loop recursion algorithm to find $r_{k+1} = B_{k+1} \nabla f(x_{k+1})$;

Step 10: $k = k + 1$, move to Step 3.
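The steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in the experiments: the names `lbfgs`, `two_loop` and `dot` are hypothetical, the exact line search of Step 5 is replaced by a simple backtracking (Armijo) search, pairs without positive curvature are skipped as an added safeguard, and the test function is a toy quadratic.

```python
from collections import deque


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def two_loop(grad, pairs):
    """Two-loop recursion: multiply grad by the inverse-Hessian estimate built from (s, t) pairs."""
    q = grad[:]
    alphas = []
    for s, t in reversed(pairs):                      # newest pair first
        a = dot(s, q) / dot(t, s)
        alphas.append(a)
        q = [qi - a * ti for qi, ti in zip(q, t)]
    # H_0 = I (Step 2), so the initial multiplication leaves q unchanged.
    for (s, t), a in zip(pairs, reversed(alphas)):    # oldest pair first
        b = dot(t, q) / dot(t, s)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return q


def lbfgs(f, grad_f, x0, m=5, eps=1e-8, max_iter=200):
    x, g = x0[:], grad_f(x0)
    pairs = deque(maxlen=m)                           # Step 7: the oldest pair drops out automatically
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= eps:                   # Step 3: stop on a small gradient norm
            break
        p = [-r for r in two_loop(g, list(pairs))]    # Steps 4 and 9: search direction
        step, fx, slope = 1.0, f(x), dot(g, p)
        # Step 5: backtracking line search (assumed in place of an exact search).
        while f([xi + step * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * pi for xi, pi in zip(x, p)]   # Step 6
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]         # Step 8
        t = [a - b for a, b in zip(g_new, g)]
        if dot(s, t) > 1e-12:                         # keep only pairs with positive curvature
            pairs.append((s, t))
        x, g = x_new, g_new
    return x


def f(x):
    """Toy convex quadratic with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def grad_f(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_min = lbfgs(f, grad_f, [0.0, 0.0])
```

Only the most recent m vector pairs are stored, so the memory cost grows linearly with the number of parameters instead of quadratically as with a full BFGS matrix.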


Experiment and Results

In this section, we empirically evaluate the performance of the proposed framework on the bookshelf dataset [9]. The vibration data, which have obvious multivariable time series features, were obtained from a three-story building structure constructed of Unistrut columns and aluminium floor plates. The structure is instrumented with 24 piezoelectric single-axis accelerometers. The bookshelf dataset covers several damage levels, such as “D00”, “DB0”, “DBB”, “DHT” and “D05”. 90% of the data are used as the training set and the remaining 10% as the test set. To evaluate the accuracy, four classifiers are compared: Support Vector Machine (SVM), LSTM [10], FESN and the proposed improved FESN (I-FESN).

First, following the genetic algorithm-based optimization steps, the hyper-parameters are set as follows: input unit scale 1 is 0.9, input unit scale 2 is 0.8, the reservoir sparsity degree is 0.01, the reservoir scale is 305, and the spectral radius of the reservoir internal connection weights is 0.9; the weight matrices W_in and W are randomly initialized. The detailed hyper-parameter settings of the FESN and I-FESN for the experimental structural damage detection task are shown in Table 1 and Table 2, respectively.

Table 1. The hyper-parameters of FESN used in the experimental models.

| Training and test set ratio | Input unit scale 1 | Input unit scale 2 | Reservoir sparsity | Reservoir scale | Reservoir internal connection weight spectral radius | Number of samples | W_in and W initialization |
|---|---|---|---|---|---|---|---|
| 9:1 | 0.9 | 0.8 | 0.01 | 305 | 0.9 | 5500 | Random initialization |

Table 2. The hyper-parameters of the proposed improved FESN used in the experimental models.

| Training and test set ratio | Input unit scale | Reservoir sparsity | Reservoir scale | Reservoir internal connection weight spectral radius | Number of samples | W_in and W initialization |
|---|---|---|---|---|---|---|
| 9:1 | 0.9 | 0.1 | 305 | 1 | 5500 | Random initialization |

Fig. 2 and Fig. 3 show the accuracy and loss curves of the improved FESN during training.

After the model parameters were set and the models were trained on the training set of the Bookshelf frame structure experiment, the damage identification models were evaluated on the test set. In the evaluation, classification accuracy is used as the evaluation index of structural damage identification. The classification error rate is calculated using the following formula:

$$error = \frac{\text{total number of misclassifications}}{\text{total number of test data}}$$
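The error rate formula can be illustrated with a short sketch. The function name `error_rate` and the ten labels below are hypothetical; only the damage-level names are taken from the dataset description.

```python
def error_rate(y_true, y_pred):
    """Fraction of test samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical true and predicted damage levels for ten test samples.
y_true = ["D00", "DB0", "DBB", "DHT", "D05", "D00", "DB0", "DBB", "DHT", "D05"]
y_pred = ["D00", "DB0", "DBB", "DHT", "D05", "D00", "DB0", "DB0", "DHT", "D05"]
err = error_rate(y_true, y_pred)       # one of the ten labels is wrong
accuracy = 1.0 - err
```

The accuracy reported for each classifier is the complement of this error rate.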


Figure 3. The loss of improved FESN during training and testing.


The accuracy on the Bookshelf frame structure is 58.42% for the traditional SVM-based method, 84.63% for the LSTM-based method, 69.27% for the FESN-based method, and 99.45% for the proposed improved FESN (I-FESN) based method. The experimental results are shown in Table 3.

Table 3. Performance comparison of the structural damage identification methods.

| Damage identification method | SVM-based damage detection | FESN-based damage detection | LSTM-based damage detection | I-FESN based damage detection |
|---|---|---|---|---|
| Accuracy/% | 58.42 | 69.27 | 84.63 | 99.45 |

The comparison in Table 3 shows that the traditional SVM-based damage identification method has the lowest accuracy and the method based on the improved FESN model has the highest. The proposed model outperforms the original FESN-based structural damage detection method by about 30 percentage points (99.45% versus 69.27%).

Conclusions

In this paper, we proposed a novel FESN model for the multivariable time series classification task. The output layer of the original functional echo state network is changed from linear regression to softmax regression and trained with the L-BFGS algorithm. The improved FESN model is evaluated on the bookshelf dataset and compared with traditional classifiers such as SVM, LSTM and FESN. The experimental results show that the improved FESN model outperforms these traditional damage detection methods and achieves the highest accuracy.

Acknowledgement


References

[1] J. Lines, A. Bagnall. Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery, 2015, 29(3):565-592.

[2] P. P. Barman, A. Boruah. A RNN based Approach for next word prediction in Assamese Phonetic Transcription. Procedia Computer Science, 2018, 143.

[3] H. Jaeger, H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 2004, 304(5667):78-80.

[4] Q. Ma, L. Shen, W. Chen et al. Functional Echo State Network for Time Series Classification. Information Sciences, 2016, 373:1-20.

[5] J. Guo, X. Xie, R. Bie, L. Sun. Structural health monitoring by using a sparse coding-based deep learning algorithm with wireless sensor networks. Pers. Ubiquit. Comput, 2014, 18:1977-1987.

[6] F. Rahpeymaii, M. Kimiaei, A. Bagheri. A limited memory quasi-Newton trust-region method for box constrained optimization. Journal of Computational & Applied Mathematics, 2016, 303:105-118.

[7] J. B. Erway, R. F. Marcia. Limited-memory BFGS Systems with Diagonal Updates. Linear Algebra & Its Applications, 2012, 437(1):333-344.

[8] G. M. Morris, D. S. Goodsell, R. S. Halliday, et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry, 2015, 19(14):1639-1662.

[9] Los Alamos National Laboratory (2000), http://insitute.lanl.gov/ei/software-and-data/