Experiment Improvement of Restricted Boltzmann Machine Methods for Image Classification

Christine Dewi

Department of Information Management, Chaoyang University of Technology

Taichung, Taiwan, R.O.C.

Faculty of Information Technology, Satya Wacana Christian University

Salatiga, Indonesia

[email protected]

Rung-Ching Chen*

Department of Information Management, Chaoyang University of Technology

Taichung, Taiwan, R.O.C.

[email protected]

Hendry

Faculty of Information Technology, Satya Wacana Christian University

Salatiga, Indonesia

[email protected]

Hsiu-Te Hung

Department of Information Management, Chaoyang University of Technology

Taichung, Taiwan, R.O.C.

[email protected]

Received 22 April 2020 Accepted 20 November 2020

Published 19 January 2021

*Corresponding author.

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC BY) License which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Vietnam Journal of Computer Science Vol. 8, No. 3 (2021) 417–432

© The Author(s)

DOI:10.1142/S2196888821500184

Vietnam J. Comp. Sci. 2021.08:417-432. Downloaded from www.worldscientific.com by 134.122.89.123 on 02/18/22. Re-use and distribution is strictly not permitted, except for Open Access articles.

The restricted Boltzmann machine (RBM) plays an important role in current deep learning techniques, as most existing deep networks are based on or related to generative models and image classification. Many applications of RBMs have been developed for a large variety of learning problems. Recent developments have demonstrated the capacity of RBMs to be powerful generative models, able to extract useful features from input data or construct deep artificial neural networks. In this work, we propose a learning algorithm to find the optimal model complexity for the RBM by varying the size of the hidden layer (50–750 hidden nodes). We then compare and analyze in depth the classification performance of a regular RBM using the RBM() function, a classification RBM using the stackRBM() function, and a Deep Belief Network (DBN) using the DBN() function, with different hidden layer sizes. The results show that stacking RBMs and using a DBN improve classification performance compared with a regular RBM.

Keywords: Classification comparison; DBN; RBM; stackRBM.

1. Introduction

Deep neural networks use a large number of layers of neurons and rely on layer-wise unsupervised pre-training to learn a probabilistic model. Deep learning has recently gained popularity as a way of learning large, complex probabilistic models.

Furthermore, a deep neural network is typically constructed by stacking multiple Restricted Boltzmann Machines (RBMs) so that the hidden layer of one RBM becomes the visible layer of the next. The layer-wise pre-training of RBMs then facilitates finding a more accurate model for the data. An RBM is a two-layered stochastic recurrent neural network¹ capable of unsupervised learning and feature extraction. Its two layers are the visible layer, to which the training and test sets are applied, and the hidden layer, which acts as the feature extractor. Moreover, RBMs have been particularly successful in classification problems, either as feature extractors for text and image data² or as a good initial training phase for deep neural network classifiers.^3,4 However, in both cases, the RBM serves merely as a preprocessing stage for the learning algorithm of the neural network.

This work is an extended version of our ACIIDS 2020 conference paper.⁵ In that paper, we proposed a learning algorithm to find the optimal model complexity for the RBM by varying the hidden layer size (50–400 hidden nodes) for image classification. The main contributions of this work can be summarized as follows. First, we propose a learning algorithm to find the optimal model complexity for the RBM by varying the hidden layer size over a wider range (50–750 hidden nodes). Second, we compare and analyze the classification performance of a regular RBM using the RBM() function, a classification RBM using the stackRBM() function, and a Deep Belief Network (DBN) using the DBN() function, with different hidden layer sizes. The rest of the paper is organized as follows. Section 2 gives a brief explanation of the RBM. Section 3 describes the proposed experimental improvement for the RBM. Section 4 presents the experimental results, and finally, Sec. 5 concludes the paper and suggests future work.

2. Related Work

The RBM and its deep architectures have different kinds of applications,⁶ such as dimensionality reduction,⁷ classification,⁸ collaborative filtering,⁹ autoencoders,^10,11 feature learning,¹² and topic modeling.^3,14 Detailed knowledge of the RBM and its deep architectures can be found in Refs. 13, 15, and 16.


2.1. Restricted Boltzmann machines

An RBM is a Boltzmann machine that has a bipartite connectivity graph between the hidden units (vector h) and the visible units (vector v). Units within a layer have no connections between them and are connected to all units in the other layer. As a result, information flows in both directions, and the weights (matrix W) are the same in both directions. RBMs are probabilistic models that use a layer of hidden variables to model a distribution over visible variables.^16,17 Hence, RBMs are undirected graphical models belonging to the family of Boltzmann machines. They are used as generative data models.¹⁵ An RBM can be used for data reduction and can also be adjusted for classification purposes.¹⁸ It consists of only two layers of nodes, namely, a hidden layer with hidden nodes and a visible layer consisting of nodes that represent the data. The discriminative RBM proposed by Larochelle^18,19 uses class information as visible input so that the RBM can provide a self-contained framework for deriving a nonlinear classifier. The discriminative RBM models the joint distribution of the inputs and associated target classes; its graphical model is illustrated in Fig. 1.¹⁹

An RBM consists of visible units v, binary hidden units h, and symmetric connections between the visible and hidden units. The connections are represented by a weight matrix W. The RBM uses an energy function for its probabilistic semantics. The energy function is defined as follows^20–22:

$$E(v, h) = -\sum_{i}\sum_{j} v_i W_{ij} h_j - \sum_{j} b_j h_j - \sum_{i} a_i v_i, \tag{1}$$

where $W$ is the weight matrix, $v$ and $h$ represent the visible and hidden layers, $b_j$ are the biases of the hidden units, and $a_i$ are the biases of the visible units. This energy function is used to configure a probability model for the RBM. When the state of the visible units is determined, the activation state of each hidden unit is conditionally independent of the others. The activation probability of the $j$th hidden unit is defined as follows²³:

$$P(h_j = 1 \mid v) = S\Big(b_j + \sum_{i} v_i W_{ij}\Big), \tag{2}$$

where $S(\cdot)$ denotes the sigmoid function given in Eq. (4).

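Fig. 1. Discriminative RBM.

When the state of the hidden units is determined, the activation states of the visible units are likewise conditionally independent of each other. The activation probability of the $i$th visible unit is defined as follows²²:

$$P(v_i = 1 \mid h) = S\Big(a_i + \sum_{j} W_{ij} h_j\Big), \tag{3}$$

where $a_i$ is the bias of the $i$th visible unit and $S(\cdot)$ is the sigmoid function of Eq. (4).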
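As a worked check of Eqs. (1) and (2), the following minimal Python sketch evaluates the energy of a tiny RBM and confirms numerically that the sigmoid conditional follows from the energy function. The parameter values and the helper names (energy, p_hidden_on) are illustrative assumptions and are not taken from the experiments; a holds the visible biases and b the hidden biases.

```python
import math


def sigmoid(x):
    """Logistic sigmoid S(x) = 1 / (1 + exp(-x)), as in Eq. (4)."""
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, a, b):
    """Energy of a joint state (v, h), following Eq. (1).

    a holds the visible biases and b the hidden biases.
    """
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    hidden_bias = sum(b[j] * h[j] for j in range(len(h)))
    visible_bias = sum(a[i] * v[i] for i in range(len(v)))
    return -interaction - hidden_bias - visible_bias


def p_hidden_on(v, W, b, j):
    """P(h_j = 1 | v), following Eq. (2)."""
    return sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))


# Toy model with assumed values: 3 visible units, 2 hidden units.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
a = [0.0, 0.1, -0.1]
b = [0.2, -0.3]
v = [1, 0, 1]

# Two joint states that differ only in h_0 (h_1 is held at 1).
e_on = energy(v, [1, 1], W, a, b)
e_off = energy(v, [0, 1], W, a, b)

# Conditional probability obtained by normalizing exp(-E) over h_0.
p_from_energy = math.exp(-e_on) / (math.exp(-e_on) + math.exp(-e_off))
p_direct = p_hidden_on(v, W, b, 0)
print(p_from_energy, p_direct)
```

The two printed values agree up to floating-point error, because the energy difference between the two states is exactly the argument of the sigmoid in Eq. (2).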

2.2. Stack RBM and deep belief network

In most cases, stacking RBMs is only used as a greedy pre-training method for training a DBN, as the top layers of a stacked RBM have no influence on the lower-level model weights. The stackRBM() function stacks several RBMs, trained greedily by training an RBM (using the RBM() function) at each layer and then using the output of that RBM to train the next-layer RBM. Nevertheless, this model should still learn more complex features than a regular RBM. The stacked RBM architecture is shown in Fig. 2.

In this experiment, we stack several layers of RBMs with the stackRBM() function. This function calls the RBM() function to train each layer, so the arguments are largely the same, except for the added layers argument. With the layers argument, we can define how many RBMs to stack and how many hidden nodes each hidden layer should have.
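The role of a layers-style argument can be illustrated with a small Python sketch. It only shows how a list of layer sizes determines the shapes of the weight matrices and how the hidden output of one layer becomes the input of the next; training is omitted. The helpers init_stack and propagate are hypothetical and are not functions of the RBM package, and the input size of 784 assumes 28 × 28 pixel MNIST images.

```python
import math
import random


def init_stack(n_visible, layers, seed=0):
    """Create one (W, b) pair per stacked layer with small random weights.

    `layers` lists the number of hidden nodes of each stacked RBM.
    """
    rng = random.Random(seed)
    sizes = [n_visible] + list(layers)
    stack = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]
        b = [0.0] * n_out
        stack.append((W, b))
    return stack


def propagate(x, stack):
    """Feed x upward: the hidden output of each layer is the next layer's input."""
    for W, b in stack:
        x = [1.0 / (1.0 + math.exp(-(b[j] + sum(x[i] * W[i][j]
                                                 for i in range(len(x))))))
             for j in range(len(b))]
    return x


# Two stacked layers of 350 hidden nodes each (untrained, for shape illustration).
stack = init_stack(n_visible=784, layers=[350, 350])
shapes = [(len(W), len(W[0])) for W, _ in stack]
features = propagate([0.0] * 784, stack)
print(shapes, len(features))
```

In a greedy scheme, each (W, b) pair would be trained as a separate RBM on the output of the layer below before the next pair is added.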

A DBN is composed of multiple layers of RBMs. The input of the whole DBN corresponds to the input of the first-layer RBM, and the output corresponds to the output of the last-layer RBM. For every RBM after the first layer, the output of the previous RBM provides its input. The DBN, as shown in Fig. 3, is a deep architecture built upon the RBM to increase its representation power by increasing depth. In a DBN, two adjacent layers are connected in the same way as in an RBM. The network is trained in a greedy, layer-by-layer manner,²¹ where the bottom layer is trained alone as an RBM and then fixed to train the next layer.

Fig. 2. The Stack RBM architecture.

The DBN was originally developed by Hinton et al.^24,25 and was originally trained with the wake-sleep algorithm without pre-training. However, in 2006 Hinton et al. found a more efficient method for training a DBN: first train a stacked RBM and then use its parameters as good starting parameters for training the DBN.^26,27 The DBN then adds labels at the end of the model and uses either backpropagation or the wake-sleep algorithm to tune the system with the labels as the criterion. The DBN() function in the RBM package uses the backpropagation algorithm.^28,29 The backpropagation algorithm works as follows. (1) First, a feed-forward pass is made through all the hidden layers, ending at the output layer. (2) Then, the output is compared to the actual label, and (3) the error is used to adjust the weights in all the layers by going back through the whole system. This process is repeated until a stopping criterion is reached; in the DBN() function this is the maximum number of epochs, but it could also be the prediction error on a validation set.
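The three steps and the stopping criterion can be sketched in Python for the simplest possible case: a single sigmoid output unit, which is the one-layer special case of backpropagation (the delta rule). This is a toy illustration under assumed settings (a logical-OR dataset, learning rate 0.5, 2,000 epochs) and does not reproduce the internals of the DBN() function; a full implementation would propagate the error back through every hidden layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(samples, labels, max_epochs=2000, lr=0.5):
    """Train one sigmoid unit with per-sample gradient updates."""
    w = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):  # stopping criterion: maximum number of epochs
        for x, y in zip(samples, labels):
            # (1) feed-forward pass
            out = sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x)))
            # (2) compare the output with the actual label
            err = out - y
            # (3) use the error to adjust the weights and the bias
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias


# Toy dataset (assumed): logical OR of two binary inputs.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 1]

w, bias = train(samples, labels)
predictions = [1 if sigmoid(bias + sum(wi * xi for wi, xi in zip(w, x))) > 0.5 else 0
               for x in samples]
print(predictions)
```

Replacing the fixed epoch count with a check of the error on a held-out validation set would give the alternative stopping criterion mentioned above.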

3. Methodology

3.1. Dataset

Fig. 3. The DBN architecture.

In this section, we describe our dataset and experiments. This paper uses the Modified National Institute of Standards and Technology database (MNIST dataset). The MNIST dataset is an extensive database of handwritten digits that is commonly used for training various image processing systems (see Fig. 4). The database is also widely used for training and testing in the field of machine learning.^30–32 The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from the National Institute of Standards and Technology (NIST). The digits have been size-normalized and centered in a fixed-size image.

Fig. 4. MNIST dataset.

Various hidden layer configurations were used in the experiments. We increase the number of nodes in the hidden layer for each model; the configurations are 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750 nodes. This work also combines different numbers of layers to improve the classification performance of the RBM: we use 2 and 3 layers for the stacked RBM and the DBN. We compare the classification performance of a regular RBM using the RBM() function, a classification RBM using the stackRBM() function, and a DBN using the DBN() function, with different hidden layer sizes. The n.hidden argument defines how many hidden nodes the RBM will have, and size.minibatch is the number of training samples that will be used at every epoch. For each model, we use 1,000 as the number of iterations and 10 for the minibatch size. The workflow of this research is shown in Fig. 5.
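The experimental grid described above can be enumerated with a short Python sketch. The dictionary keys and the build_configs helper are hypothetical names chosen for illustration, and the sketch assumes that every hidden layer of a stacked model has the same number of nodes.

```python
from itertools import product

HIDDEN_SIZES = list(range(50, 751, 50))  # 50, 100, ..., 750 hidden nodes
N_ITER = 1000                            # number of iterations per model
SIZE_MINIBATCH = 10                      # training samples per iteration


def build_configs():
    """List every (model, layer sizes) combination used in the comparison."""
    configs = []
    # Regular RBM: a single hidden layer.
    for n_hidden in HIDDEN_SIZES:
        configs.append({"model": "RBM", "layers": [n_hidden],
                        "n_iter": N_ITER, "size_minibatch": SIZE_MINIBATCH})
    # Stacked RBM and DBN: 2 or 3 hidden layers of equal size (assumption).
    for model, depth, n_hidden in product(("stackRBM", "DBN"), (2, 3), HIDDEN_SIZES):
        configs.append({"model": model, "layers": [n_hidden] * depth,
                        "n_iter": N_ITER, "size_minibatch": SIZE_MINIBATCH})
    return configs


configs = build_configs()
print(len(HIDDEN_SIZES), len(configs))
```

Under these assumptions the grid contains 15 hidden layer sizes and 75 model configurations in total.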

After training the RBM, stackRBM, and DBN models, we can check how well each reconstructs the data with the ReconstructRBM function. The function outputs the original image with the reconstructed image next to it. If the model is good, the reconstructed image should look similar to the original, or even cleaner. The RBM is not only good at reconstructing data but can also make predictions on new data with the classification RBM. So, after training the regular RBM, classification RBM, and DBN, we can use them to predict the labels of unseen test data with the PredictRBM function. This function outputs a confusion matrix and the accuracy score on the test set.

3.2. Reconstruction RBM

An RBM is a stochastic neural network, which means that each neuron has some random behavior when activated. An RBM also has two layers of bias units (hidden bias and visible bias). This is what makes an RBM different from an autoencoder. The hidden bias helps the RBM produce the activations on the forward pass, and the visible bias helps the RBM reconstruct the input during the backward pass. The reconstructed input is always different from the actual input, as there are no connections between the visible units and therefore no way of transferring information among them. After training the RBM, stackRBM, and DBN models, we can check how well each reconstructs the data with the ReconstructRBM function. The function outputs the original image with the reconstructed image next to it. If the model is good, the reconstructed image should look similar to the original, or even cleaner. The RBM is not only good at reconstructing data but can also make predictions on new data with the classification RBM.

Fig. 5. The workflow of the research.

The first step in an RBM is training it with multiple inputs. The inputs are multiplied by the weights and then added to the bias. The result is then passed through a sigmoid activation function, and the output determines whether the hidden state is activated. The weights form a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns. The first hidden node receives the inner product of the input vector and the first column of the weight matrix, to which the corresponding bias term is added. The sigmoid function is given in Eq. (4).

$$S(x) = \frac{1}{1 + e^{-x}}. \tag{4}$$

Figure 6 represents the reconstruction process in the RBM, that is, the reverse or reconstruction phase. It is similar to the first pass but in the opposite direction.

Fig. 6. The reconstruction process.

$$v^{(1)} = S\big(h^{(1)} W^{T} + a\big), \tag{5}$$

where $v^{(1)}$ and $h^{(1)}$ are the vectors for the visible and the hidden layers (written as row vectors so that the product with $W^{T}$ is defined), the superscript denotes the iteration, $W$ is the weight matrix, and $a$ is the visible layer bias vector.
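The forward pass and the reconstruction pass of Eq. (5) can be traced with a minimal Python sketch. The weights, biases, and input below are assumed toy values, the helper names forward and reconstruct are hypothetical, and the sketch works with activation probabilities only, leaving out the stochastic sampling of unit states.

```python
import math


def sigmoid(x):
    """Sigmoid activation, Eq. (4)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(v, W, b):
    """Hidden activations: input times the j-th weight column, plus hidden bias."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def reconstruct(h, W, a):
    """Visible reconstruction, Eq. (5): the same weights used in reverse, plus visible bias."""
    return [sigmoid(a[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(a))]


# Toy model with assumed values: 3 visible units (rows), 2 hidden units (columns).
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
a = [0.0, 0.0, 0.0]   # visible bias
b = [0.0, 0.0]        # hidden bias

v0 = [1, 0, 1]
h1 = forward(v0, W, b)       # forward pass
v1 = reconstruct(h1, W, a)   # backward (reconstruction) pass
print(h1, v1)
```

The reconstruction v1 is a vector of probabilities rather than a copy of v0, which is consistent with the observation that the reconstructed input differs from the actual input.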

Reconstruction is different from regression or classification in that it estimates the probability distribution of the original input instead of associating a continuous value or a label with an input example. This means it is trying to guess multiple values at the same time. This is generative learning, in contrast to the discriminative learning performed in a classification problem, which maps inputs to labels.

3.3. Performance evaluation

Evaluation in this experiment is based on accuracy. Accuracy measures how often the trained model is correct and is derived from the confusion matrix. A confusion matrix is a summary of prediction results on a classification problem.³³ The key to the confusion matrix is the number of correct and incorrect predictions, summarized as count values and broken down by class.^34–36 The higher the accuracy value, the better the method. The confusion matrix is shown in Table 1.

Accuracy in this paper is calculated with the following formula³⁷:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{6}$$

where TP = true positive, FP = false positive, TN = true negative, and FN = false negative.
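Equation (6) can be checked on a small example with the following Python sketch. The label vectors are made-up illustrative data, and the helper names are hypothetical; rows of the matrix index the original class and columns the predicted class, with class index 0 playing the role of Class A.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count predictions: rows are original classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(actual, predicted):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up two-class example (0 = Class A, 1 = Class B).
actual = [0, 0, 1, 1, 1, 0, 1, 0]
predicted = [0, 1, 1, 1, 0, 0, 1, 0]

cm = confusion_matrix(actual, predicted, n_classes=2)
tp, fn = cm[0]
fp, tn = cm[1]
acc = accuracy(cm)
print(cm, acc)
```

For two classes the diagonal sum equals TP + TN, so accuracy(cm) coincides with Eq. (6); for the ten MNIST digit classes the same diagonal-over-total ratio applies.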

4. Experiments and Results

In this section, we describe the results of our experiments. We evaluate the performance of the proposed learning algorithm using the MNIST dataset.³⁰ In the classification results, we focus on which configuration of the improved RBM experiment obtains the best classification accuracy. We also compare RBMs with different numbers of hidden neurons. The classifier used in all the experiments is the Back-Propagation Network (BPN).

Table 1. Confusion matrix.

| Original class | Predicted Class A | Predicted Class B |
|---|---|---|
| Class A | TP | FN |
| Class B | FP | TN |

Table 2 shows the classification accuracy on the MNIST dataset for different hidden layer sizes using the RBM() function. In this experiment, we use 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750 nodes in the hidden layer. To train the RBM, we need to provide the function with training data, which should be a matrix of shape (samples × features); the other parameters have default settings. The number of iterations defines the number of training epochs; at each epoch, the RBM samples a new minibatch. When enough data are available, it is recommended to set the number of iterations to a high value, as this improves the model; the downside is that training takes longer. The n.hidden argument defines how many hidden nodes the RBM will have, and size.minibatch is the number of training samples used at every epoch. We use 1,000 as the number of iterations and 10 for the minibatch size. The highest accuracy is 86%, obtained with 300 and 350 nodes in the hidden layer.

After training the RBM model, we can check how well it reconstructs the data with the ReconstructRBM function. The function outputs the original image with the reconstructed image next to it. If the model is any good, the reconstructed image should look similar to the original, or even cleaner.

The reconstruction of the digit "8" using the RBM() function is shown in Figs. 7(a) and 7(b). With more hidden nodes, the reconstructed image improves, as shown by Fig. 7(a), which uses 50 hidden nodes, and Fig. 7(b), which uses 700. In addition, the reconstruction of the digit "3" using the RBM() function is shown in Fig. 8.

The model reconstructions look even more like an eight and a three than the original images. Thus, the RBM is not only good at reconstructing data but can also make predictions on new data with the classification RBM. After training the classification RBM, we can use it to predict the labels of unseen test data with the PredictRBM function, which outputs a confusion matrix and the accuracy score on the test set, as shown in Fig. 9.

Table 2. Classification accuracy on the MNIST dataset for different hidden layer sizes using the RBM function.

| No | n.iter | n.hidden | Accuracy |
|---|---|---|---|
| 1 | 1,000 | 50 | 0.8245 |
| 2 | 1,000 | 100 | 0.846 |
| 3 | 1,000 | 150 | 0.851 |
| 4 | 1,000 | 200 | 0.8585 |
| 5 | 1,000 | 250 | 0.859 |
| 6 | 1,000 | 300 | 0.86 |
| 7 | 1,000 | 350 | 0.86 |
| 8 | 1,000 | 400 | 0.815 |
| 9 | 1,000 | 450 | 0.8445 |
| 10 | 1,000 | 500 | 0.8075 |
| 11 | 1,000 | 550 | 0.7975 |
| 12 | 1,000 | 600 | 0.853 |
| 13 | 1,000 | 650 | 0.7985 |
| 14 | 1,000 | 700 | 0.8075 |
| 15 | 1,000 | 750 | 0.8355 |

Fig. 7. Reconstruction of the digit "8" using the RBM model with (a) 50 and (b) 700 hidden nodes.

Fig. 8. Reconstruction of the digit "3" using the RBM model with (a) 50 and (b) 700 hidden nodes.

Table 3 presents the classification accuracy on the MNIST dataset for different hidden layer sizes using the stackRBM() function. In this experiment, we use 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750 nodes in each hidden layer, with 2 and 3 layers. The highest accuracy for 2 layers is 91.4%, obtained with 500 nodes per hidden layer, and for 3 layers it is 91.65%, obtained with 350 nodes per hidden layer. As Table 3 shows, the 2-layer stacked RBM with 350 nodes per hidden layer reaches 90.9% accuracy, which is higher than the 86% of the regular RBM with 350 hidden nodes in Table 2. Based on this result, we conclude that stacking RBMs improves classification performance. However, stackRBM is not a very elegant method, as each RBM layer is trained on the output of the previous layer while all the other RBM weights are frozen. It is a greedy method that will not give the most optimal results for classification. After training the stackRBM model, we can check how well it reconstructs the data with the ReconstructRBM function. The reconstructions of the digits "8" and "3" are shown in Figs. 10 and 11. Figure 12 shows the confusion matrix for the MNIST dataset using the stackRBM() function with 2 hidden layers of 350 nodes each, which obtained 90.9% accuracy.

Fig. 9. Confusion matrix for the MNIST dataset using the RBM model.

Table 3. Classification accuracy on the MNIST dataset for different hidden layer sizes using the stackRBM function.

| No | n.iter | n.hidden | Accuracy (2 layers) | Accuracy (3 layers) |
|---|---|---|---|---|
| 1 | 1,000 | 50 | 0.846 | 0.827 |
| 2 | 1,000 | 100 | 0.8695 | 0.868 |
| 3 | 1,000 | 150 | 0.889 | 0.901 |
| 4 | 1,000 | 200 | 0.8885 | 0.897 |
| 5 | 1,000 | 250 | 0.9015 | 0.899 |
| 6 | 1,000 | 300 | 0.9015 | 0.8975 |
| 7 | 1,000 | 350 | 0.909 | 0.9165 |
| 8 | 1,000 | 400 | 0.907 | 0.906 |
| 9 | 1,000 | 450 | 0.8975 | 0.9145 |
| 10 | 1,000 | 500 | 0.914 | 0.911 |
| 11 | 1,000 | 550 | 0.9005 | 0.912 |
| 12 | 1,000 | 600 | 0.902 | 0.9075 |
| 13 | 1,000 | 650 | 0.9065 | 0.9075 |
| 14 | 1,000 | 700 | 0.9055 | 0.9125 |
| 15 | 1,000 | 750 | 0.891 | 0.897 |

Table 4 shows the classification accuracy on the MNIST dataset for different hidden layer sizes using the DBN() function. In this experiment, we use 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750 nodes in each hidden layer, with 2 and 3 layers. The highest accuracy for 2 layers is 90.25%, obtained with 600 nodes per hidden layer, and for 3 layers it is 90%, obtained with 400 nodes per hidden layer. Figure 13 shows the confusion matrix for the MNIST dataset using the DBN() function with 2 hidden layers of 350 nodes each and 1 label layer, which obtained 90.15% accuracy.

Fig. 10. Reconstruction of the digit "8" using the stackRBM model with (a) 50 and (b) 700 hidden nodes.

Fig. 11. Reconstruction of the digit "3" using the stackRBM model with (a) 50 and (b) 700 hidden nodes.

Fig. 12. Confusion matrix for the MNIST dataset using the stackRBM model.

Table 4. Classification accuracy on the MNIST dataset for different hidden layer sizes using the DBN function.

| No | n.iter | n.hidden | Accuracy (2 layers) | Accuracy (3 layers) |
|---|---|---|---|---|
| 1 | 1,000 | 50 | 0.8485 | 0.8215 |
| 2 | 1,000 | 100 | 0.8845 | 0.8665 |
| 3 | 1,000 | 150 | 0.8905 | 0.8865 |
| 4 | 1,000 | 200 | 0.8895 | 0.899 |
| 5 | 1,000 | 250 | 0.8675 | 0.889 |
| 6 | 1,000 | 300 | 0.902 | 0.8985 |
| 7 | 1,000 | 350 | 0.9015 | 0.8945 |
| 8 | 1,000 | 400 | 0.89 | 0.90 |
| 9 | 1,000 | 450 | 0.8785 | 0.8995 |
| 10 | 1,000 | 500 | 0.8865 | 0.88 |
| 11 | 1,000 | 550 | 0.891 | 0.865 |
| 12 | 1,000 | 600 | 0.9025 | 0.8965 |
| 13 | 1,000 | 650 | 0.8885 | 0.882 |
| 14 | 1,000 | 700 | 0.902 | 0.861 |
| 15 | 1,000 | 750 | 0.883 | 0.869 |

Fig. 13. Confusion matrix for the MNIST dataset using the DBN model.

Based on the experimental results in Fig. 14, accuracy initially tends to increase: when we use more nodes in the hidden layer, we obtain higher accuracy. Beyond that point, the accuracy gradually decreases. As we can see in Fig. 14 and Table 3, the 2-layer stackRBM reaches its optimum accuracy of 91.4% with 500 hidden nodes. It must be remembered that using multiple layers also requires much more processing time. Finally, in this work, the best overall performance, 91.65%, is obtained with 350 nodes per hidden layer using the stackRBM() function with 3 layers (Table 3).
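The comparison across Tables 2–4 can be cross-checked with a short Python sketch. The accuracy values below are transcribed from the three tables; the dictionary layout and the best helper are illustrative choices, not part of the original experiments.

```python
HIDDEN = list(range(50, 751, 50))  # n.hidden values 50, 100, ..., 750

# Accuracy per (model, number of layers), transcribed from Tables 2-4.
ACCURACY = {
    ("RBM", 1): [0.8245, 0.846, 0.851, 0.8585, 0.859, 0.86, 0.86, 0.815,
                 0.8445, 0.8075, 0.7975, 0.853, 0.7985, 0.8075, 0.8355],
    ("stackRBM", 2): [0.846, 0.8695, 0.889, 0.8885, 0.9015, 0.9015, 0.909, 0.907,
                      0.8975, 0.914, 0.9005, 0.902, 0.9065, 0.9055, 0.891],
    ("stackRBM", 3): [0.827, 0.868, 0.901, 0.897, 0.899, 0.8975, 0.9165, 0.906,
                      0.9145, 0.911, 0.912, 0.9075, 0.9075, 0.9125, 0.897],
    ("DBN", 2): [0.8485, 0.8845, 0.8905, 0.8895, 0.8675, 0.902, 0.9015, 0.89,
                 0.8785, 0.8865, 0.891, 0.9025, 0.8885, 0.902, 0.883],
    ("DBN", 3): [0.8215, 0.8665, 0.8865, 0.899, 0.889, 0.8985, 0.8945, 0.90,
                 0.8995, 0.88, 0.865, 0.8965, 0.882, 0.861, 0.869],
}


def best(series):
    """Return (n.hidden, accuracy) of the first maximum in a series."""
    top = max(series)
    return HIDDEN[series.index(top)], top


summary = {key: best(values) for key, values in ACCURACY.items()}
overall = max(summary.items(), key=lambda item: item[1][1])

for key, (n_hidden, acc) in summary.items():
    print(key, n_hidden, acc)
print("best overall:", overall)
```

Because best() reports only the first maximum, ties such as the two 0.86 entries in Table 2 appear once, at the smaller hidden layer size.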

5. Conclusions

In this paper, we presented a learning algorithm to find the optimal model complexity for the RBM by varying the hidden layer size (50–750 hidden nodes). Based on the experimental results in Tables 2, 3, and 4, the number of hidden units, a key parameter of the RBM, plays an important role in its modeling capability. Too many hidden units lead to a large model size and slow convergence, and may even cause overfitting, which results in poor generalization ability. Too few hidden units result in low accuracy and poor feature extraction. Stacking RBMs and using a DBN improve classification performance compared with a regular RBM. Our experiments focused on comparing the number of hidden neurons using the RBM(), stackRBM(), and DBN() functions. In future work, we will design a fully automated incremental learning algorithm that can be used in deep architectures, and we will use other advanced types of RBM, such as the Gaussian RBM and Deep Boltzmann Machines.

Fig. 14. Classification accuracy on the MNIST dataset for different hidden layer sizes.

Acknowledgments

This paper is supported by the Ministry of Science and Technology, Taiwan, under grant numbers MOST-107-2221-E-324-018-MY2 and MOST-109-2622-E-324-004. This research is also partially sponsored by Chaoyang University of Technology (CYUT) and the Higher Education Sprout Project, Ministry of Education (MOE), Taiwan, under the project name "The R&D and the cultivation of talent for health-enhancement products."

References

1. W. Yi, J. Park and J. J. Kim, GeCo: Classification restricted Boltzmann machine hardware for on-chip semisupervised learning and Bayesian inference. IEEE Trans. Neural Netw. Learning Syst. 31 (2020) 53–65.

2. P. V. Gehler, A. D. Holub and M. Welling, The rate adapting poisson model for information retrieval and object recognition, in ACM Int. Conf. Proceeding Series (2006), pp. 337–344.

3. J. Gu and V. O. K. Li, ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th Int. Joint Conf. Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (2015), pp. 162–167.

4. R. Salakhutdinov and G. Hinton, Replicated softmax: An undirected topic model, in Advances in Neural Information Processing Systems 22– Proc. 2009 Conference (2009), pp. 1607–1614.

5. C. Dewi, R. C. Chen, Hendry and H. Te Hung, Comparative analysis of restricted Boltzmann machine models for image classification. In Asian Conf. Intelligent Information and Database Systems ACIIDS 2020 (2020), pp. 285–296.

6. R. Salakhutdinov, J. B. Tenenbaum and A. Torralba, Learning with hierarchical-deep models, IEEE Trans. Patt. Anal. Mach. Intell. 35 (2013) 1958–1971.

7. G. E. Hinton and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (2006) 504–507.

8. D. Ciregan, U. Meier and J. Schmidhuber, Multi-column deep neural networks for image classification, in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2012), pp. 3642–3649.

9. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio and P. A. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learning Res. 11 (2010) 3371–3408.

10. J. Barbieri, L. G. M. Alvim, F. Braida and G. Zimbrão, Autoencoders and recommender systems: COFILS approach, Exp. Syst. Appl. 89 (2017) 81–90.

11. Edureka, Autoencoders Tutorial: A Beginner's Guide to Autoencoders.

12. Y. Bengio, A. Courville and P. Vincent, Representation learning: A review and new perspectives, IEEE Trans. Patt. Anal. Mach. Intell. 35 (2013) 1798–1828.

13. Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learning 2 (2009) 1–27.

14. The MIT Press: Scaling Learning Algorithms toward AI. In Large-Scale Kernel Machines (2019).

15. G. E. Hinton, A practical guide to training restricted Boltzmann machines. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), pp. 599–619.

16. A. Fischer and C. Igel, An introduction to restricted Boltzmann machines, in Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), pp. 14–36.


17. C. Gou, K. Wang, Y. Yao and Z. Li, Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines, IEEE Trans. Intell. Transp. Syst. 17 (2016) 1096–1107.

18. H. Larochelle, M. Mandel, R. Pascanu and Y. Bengio, Learning algorithms for the classification restricted Boltzmann machine. J. Mach. Learning Res. 13 (2012) 643–669.

19. H. Larochelle and Y. Bengio, Classification using discriminative restricted Boltzmann machines, in Proc. 25th Int. Conf. Machine Learning (2008), pp. 536–543.

20. Y. Jiang, J. Xiao, X. Liu and J. Hou, A removing redundancy Restricted Boltzmann Machine, in Proc. 2018 10th Int. Conf. Advanced Computational Intelligence, ICACI 2018 (2018), pp. 57–62.

21. J. Yu, J. Gwak, S. Lee and M. Jeon, An incremental learning approach for restricted Boltzmann machines, in ICCAIS 2015 - 4th Int. Conf. Control, Automation and Information Sciences (2015), pp. 113–117.

22. H. G. Kim, S. H. Han and H. J. Choi, Discriminative restricted Boltzmann machine for emergency detection on healthcare robot, in 2017 IEEE Int. Conf. Big Data and Smart Computing, BigComp 2017 (2017), pp. 407–409.

23. J. Yao, T. Sheng, J. Zhen and X. Bao, Fault prognosis based on restricted Boltzmann machine and data label for switching power amplifiers, in Proc. - 12th Int. Conf. Reliability, Maintainability, and Safety, ICRMS 2018 (2019), pp. 287–291.

24. G. E. Hinton, P. Dayan, B. J. Frey and R. M. Neal, The "wake-sleep" algorithm for unsupervised neural networks, Science 268 (1995) 1158.

25. C. Dewi, R.-C. Chen and S.-K. Tai, Evaluation of robust spatial pyramid pooling based on convolutional neural network for traffic sign recognition system. Electronics 9 (2020) 889.

26. G. E. Hinton, S. Osindero and Y. W. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18 (2006) 1527–1554.

27. S. Tai, C. Dewi, R. Chen, Y. Liu and X. Jiang, Deep learning for traffic sign recognition based on spatial pyramid pooling with scale analysis. Appl. Sci. 10 (2020) 6997.

28. S. Ravi and H. Larochelle, Optimization as a model for few-shot learning, in 5th Int. Conf. Learning Representations, ICLR 2017 - Conference Track Proceedings (2019).

29. C. Dewi, R. C. Chen and H. Yu, Weight analysis for various prohibitory sign detection and recognition using deep learning. Multim. Tools Appl. (2020) 1–21.

30. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86 (1998) 2278–2324.

31. R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang and R. X. Gao, Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Processing (2019).

32. S. Nasrin, J. L. Drobitch, S. Bandyopadhyay and A. R. Trivedi, Low power restricted Boltzmann machine using mixed-mode Magneto-tunneling junctions. IEEE Electron Dev. Lett. 40 (2019) 345–348.

33. C. Dewi and R.-C. Chen, Human activity recognition based on evolution of features selection and random forest. 2019 IEEE Int. Conf. Systems, Man and Cybernetics (SMC) (2019), pp. 2496–2501.

34. C. Dewi and R. C. Chen, Random forest and support vector machine on features selection for regression analysis. Int. J. of Innov. Comput. Inf. Control 15 (2019) 2027–2037.

35. R. C. Chen, C. Dewi, S. W. Huang and R. E. Caraka, Selecting critical features for data classification based on machine learning methods. J. Big Data 7 (2020) 1–26.

36. C. Dewi and R.-C. Chen, Decision making based on IoT data collection for precision agriculture, in Studies in Computational Intelligence (2020), pp. 31–42.

37. C. Dewi, R.-C. Chen, Y.-T. Liu, Y.-S. Liu and L.-Q. Jiang, Taiwan stop sign recognition with customize anchor. In ICCMS '20 (Brisbane, QLD, Australia, 2020), pp. 51–55.