Comparing different models - Predicting building-automation time-series data with supervised methods

Given the input features, the next step is to create machine learning models that use these features to predict future CO2 values. Note, however, that these algorithms are not limited to predicting CO2 measurements: they apply to any real-valued time series, and CO2 values serve here only as one example.

First, we wanted the simplest possible model to use as a baseline. The model we chose is nothing more than an identity function: it takes the current CO2 measurement as input and returns that same value as output. In our case this model can be written as the following function:

$$f(X) = \begin{bmatrix} 0 & 0 & 1 & \cdots & 0 \end{bmatrix} \cdot [x_1, x_2, \cdots, x_n] = x_c \tag{36}$$

where $X$ is the input vector containing all features and $x_c$ is the CO2 value feature.

We call this the predict previous value (PPV) model. It worked surprisingly well, even outperforming some of the machine learning models. As mentioned earlier, we use T_forward = 10, which means the PPV model predicts that the CO2 value in 10 minutes will be the same as it is now. As a result, the PPV predictions lag slightly behind the actual outputs, and a higher T_forward value increases that lag further, making the model less accurate.
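The PPV baseline and the two scores reported later in Table 1 can be sketched in a few lines. This is a minimal illustration, not the code used in the study: the function names, the toy CO2 readings and the short horizon of 3 steps are all assumptions made for the example.

```python
import math

def ppv_predict(series, t_forward):
    """Identity baseline: the prediction for time t + t_forward is the value at time t."""
    if t_forward <= 0:
        raise ValueError("t_forward must be a positive number of steps")
    # The last t_forward values have no target to be compared against.
    return series[:-t_forward]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical CO2 readings in ppm, one per minute (not data from the study).
co2 = [400, 405, 415, 430, 450, 470, 480, 485, 480, 470, 455, 440]
t_forward = 3  # the text uses 10; a small value keeps the toy series short

predicted = ppv_predict(co2, t_forward)
actual = co2[t_forward:]
print(f"RMSE: {rmse(actual, predicted):.2f}, R2: {r2_score(actual, predicted):.3f}")
```

Because every prediction is simply an older measurement, the predicted curve is the actual curve shifted to the right by `t_forward` steps, which is the lag described above.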

Figure 16: Visualization of the PPV model predictions

Figure 16 shows the previously mentioned lag of the PPV model when T_forward = 10. After creating the PPV model we began creating supervised learning models. The algorithms used were linear regression (LR), decision tree (DT), random forest (RF), feedforward neural network (FFNN) and finally a long short-term memory neural network (LSTMNN). Table 1 lists the accuracy of the different models.

Table 1: Comparison of different models created for predicting CO2 values

Model    Training RMSE   Test RMSE   Training R² score   Test R² score
PPV      119.56          44.96       0.714               0.846
ARIMA    119.56          47.39       0.714               0.830
LR       112.77          48.77       0.659               0.786
DT       29.90           70.02       0.982               0.626
RF       43.15           54.66       0.960               0.719
FFNN     118.96          44.89       0.710               0.843
LSTMNN   24.66           9.09        0.988               0.944

In Table 1 the training scores are worse than the test scores for some of the models. This is due to a large peak in the CO2 values in the training data, which was difficult to model. This mismatch between the training data and the test data could have been addressed by using cross-validation. The first three machine learning models (LR, DT and RF), as well as the ARIMA model, used the input features defined in the previous section. The two remaining models, both neural networks, use only the CO2 values as inputs, since they perform well using only the original time series as input.
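The text does not say which form of cross-validation was meant. For time-ordered data, one common choice is a forward-chaining split, in which every training fold precedes its test fold in time. The sketch below illustrates that idea under this assumption; the function name and the fold sizes are hypothetical.

```python
def forward_chaining_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs where training data always precedes test data."""
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any leftover samples at the end of the series.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(forward_chaining_splits(n_samples=10, n_folds=4))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

With splits like these, a peak such as the one in the training data would be part of the test fold in some rounds and part of the training fold in others, so the averaged scores depend less on where a single split happens to fall.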

6.3.1 ARIMA model

The first model we created was an ARIMA model, to see how it compared to the supervised learning methods. As with our other models, it was created in Python, using the statsmodels package [SP10]. We created an ARIMA(5, 1, 0) model, meaning that the lag order of the autoregression is 5, the differencing is of order 1 (which makes the time series stationary) and the order of the moving-average part is 0. Compared to our machine learning models it performed quite well: it was only slightly worse than the PPV model and beat the LR, DT and RF models in accuracy. The only machine learning models it could not outperform were the neural network models.
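As a reading aid, the ARIMA(5, 1, 0) specification can be written out explicitly. The notation is assumed here and does not come from the original text: $y_t$ is the CO2 value at time $t$, $\phi_i$ are the autoregressive coefficients, $c$ is an optional constant (possibly zero) and $\varepsilon_t$ is the noise term.

$$\Delta y_t = y_t - y_{t-1}, \qquad \Delta y_t = c + \sum_{i=1}^{5} \phi_i \, \Delta y_{t-i} + \varepsilon_t$$

The first expression is the order-1 differencing, the sum over five lags is the autoregressive part, and no lagged noise terms appear because the moving-average order is 0.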

6.3.2 Linear regression model

The next model we created was a linear regression model. The LR model, as well as the DT and RF models, was written in Python using the scikit-learn machine learning library [PVG+11]. Compared to the PPV model, the LR model performed reasonably well, yet still worse than the baseline. Both the RMSE and the R² score were quite close to the PPV scores, being only about 8% worse, as can be seen in Table 1. Since its accuracy was worse than that of the PPV model, we cannot use it as our final model.
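The "about 8%" figure can be checked against the test-set columns of Table 1. The short calculation below is only a worked example; the helper name is hypothetical.

```python
def relative_change(value, baseline):
    """Change of `value` relative to `baseline`, as a fraction of the baseline."""
    return (value - baseline) / baseline

# Test-set scores copied from Table 1.
ppv = {"rmse": 44.96, "r2": 0.846}
lr = {"rmse": 48.77, "r2": 0.786}

rmse_increase = relative_change(lr["rmse"], ppv["rmse"])  # positive: LR error is higher
r2_decrease = -relative_change(lr["r2"], ppv["r2"])       # positive: LR R2 is lower
print(f"RMSE {rmse_increase:.1%} higher, R2 {r2_decrease:.1%} lower")
```

The test RMSE of the LR model comes out roughly 8.5% higher than that of the PPV model, and its test R² roughly 7.1% lower, which matches the figure quoted above.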

6.3.3 Decision tree and random forest models

The next two models are both based on decision trees: one is a single decision tree model and the other a random forest model. As discussed in Chapter 4, decision trees are not as well suited to regression tasks as to classification tasks. Our experiments support this, as seen in Table 1.

Figure 17: Comparison of the predictions of the decision tree model (left) and the random forest model (right)

Table 1 shows that both the DT model and the RF model performed quite poorly. The DT model had a test R² score of 0.626 and the RF model 0.719, both worse than the PPV model and the LR model. Figure 17 visualizes their predictions. The DT prediction clearly varies strongly; it loosely follows the actual CO2 value, but not nearly well enough to be considered as our final model. The RF model performs better than the DT model, as expected since it is an improvement upon the DT model, yet still not well enough to outperform the PPV model. This low accuracy may mean that decision tree models are not well suited to regression tasks such as this one, which is why we cannot use them here.

The poor test performance of the decision tree models compared to the baseline, despite their high training scores in Table 1, is an indication of overfitting. We tried to reduce the overfitting by training the models with only the current CO2 value as input. This did however not reduce the

6.3.4 Neural network models

The final models we consider are neural network models: one is a feedforward neural network and the other a recurrent neural network. As with the decision tree models, the feedforward neural network (a multilayer perceptron, MLP) was created using the scikit-learn library. This model performed much better than the previous ones, with performance similar to the PPV model. One could argue, however, that since its results are so similar to those of the PPV model and the PPV model is much simpler to create, there is no reason to use it over the PPV model.

The final model we created was a long short-term memory (LSTM) model. It was created using the Keras Python library [C+15], which in turn used TensorFlow [AAB+15]. With Keras we were able to define the network architecture manually. The network was not overly complex: it contained one LSTM layer and one dense layer, with MSE as the loss function. The very high test R² score in Table 1 shows that this model performed extremely well, beating all of the other models by a wide margin.

Figure 18: Comparison of the predictions of the feedforward neural network model (left) and the LSTM network model (right)

Figure 18 compares the FFNN model and the LSTM model. The two look quite similar, and both follow the actual output values very closely. Note, however, that the FFNN predictions lag behind the real outputs slightly more than the LSTM predictions do, which is why the LSTM model has better accuracy. We expected this result: we discussed earlier how RNNs are good at discovering patterns in time-series data, and here we can see how that works in practice.
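Since the neural network models take only the CO2 series as input, the series has to be turned into input/target pairs before training. The text does not describe how this was done; the sketch below shows one common sliding-window approach. The function name, the window length and the toy readings are assumptions made for illustration.

```python
def make_windows(series, window, t_forward):
    """Build supervised pairs from a univariate series.

    Each input holds `window` consecutive values; each target is the value
    t_forward steps after the last value of that input.
    """
    inputs, targets = [], []
    last_start = len(series) - window - t_forward
    for start in range(last_start + 1):
        end = start + window  # index just past the input window
        inputs.append(series[start:end])
        targets.append(series[end - 1 + t_forward])
    return inputs, targets

# Hypothetical CO2 readings in ppm (not data from the study).
co2 = [400, 402, 405, 410, 418, 425, 430, 428]
X, y = make_windows(co2, window=3, t_forward=2)
for x_row, target in zip(X, y):
    print(x_row, "->", target)
```

A recurrent model such as an LSTM would read each window as a short sequence, while a feedforward network would treat the same window as a flat feature vector.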
