Conclusion - Investigating Neural-Based Learning Algorithms for Control

In this paper, we proposed two recurrent neural network structures, namely Simple Gated Unit (SGU) and Deep Simple Gated Unit (DSGU). Both structures require fewer parameters than GRU and LSTM.

In experiments, we noticed that both DSGU and SGU are very fast and often more accurate than GRU and LSTM. However, unlike DSGU, SGU seems to sometimes lack the ability to characterize accurately the mapping between two time steps, which indicates that DSGU might be more useful for general applications.

Figure 26: Validation accuracy of DSGU, GRU and SGU and LSTM in terms of time in the text generation task. Each line is drawn by taking the mean value of 15 runs of each configuration.

one hand, with properly designed multiplication gate, RNN can learn faster than other models but become more fragile due to the fluctuation of data, on the other hand, adding a layer to the multiplication gate can make RNN more stable, while keeping the learning speed.

In the future, we would like to prove the usefulness of deep multiplication gate math- ematically, test the performance with a deeper gate as well as perform experiments on more tasks.

References

Ama98 Amari, S.-I., Natural gradient works efficiently in learning. Neural computation, 10,2(1998), pages 251–276.

Bak01 Bakker, B., Reinforcement learning with long short-term memory. NIPS, 2001, pages 1475–1482.

Bak02 Bakker, B., Reinforcement Learning with Long Short-Term Memory, 2002.

Bak03 Bakker, B., Reinforcement learning with long short-term memory. In Advances in Neural Information Processing Systems 15 (NIPS-2002, 2003, pages 1475–1482.

Bar98 Barto, A. G., Reinforcement learning: An introduction. MIT press, 1998.

BB01 Baxter, J. and Bartlett, P. L., Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, pages 319–350.

BBB+₁₀ _{Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Des-}

jardins, G., Turian, J., Warde-Farley, D. and Bengio, Y., Theano: a CPU and GPU math expression compiler. Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010. Oral Presen- tation.

BC13 Ba, L. and Caurana, R., Do Deep Nets Really Need to be Deep ? arXiv preprint arXiv:1312.6184, pages 1–6. URL http://arxiv.org/ abs/1312.6184.

BCB15 Bahdanau, D., Cho, K. and Bengio, Y., Neural machine translation by jointly learning to align and translate. ICLR.

BLP+12 Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N. and Bengio, Y., Theano: new features and speed improvements, Deep Learning and Unsupervised Feature Learn- ing NIPS 2012 Workshop, 2012.

BPP12 Bhatnagar, S., Prasad, H. and Prashanth, L., Stochastic recursive algorithms for optimization: simultaneous perturbation methods, volume 434. Springer, 2012.

Cas97 Cassandra, A. R., A Survey of POMDP Applications. Uncertainty in Artificial Intelligence, 1997, pages 472–480.

CGCB15 Chung, J., Gulcehre, C., Cho, K. and Bengio, Y., Gated feedback recurrent neural networks. arXiv preprint arXiv:1502.02367.

Cot89 Cotter, N. E., The stone-weierstrass theorem and its application to neural networks. IEEE transactions on neural networks/a publication of the IEEE Neural Networks Council, 1,4(1989), pages 290–295. CVMG+14a Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares,

F., Schwenk, H. and Bengio, Y., Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

CvMG+_{14b Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H.}

and Bengio, Y., Learning Phrase Representations using RNN Encoder- Decoder for Statistical Machine Translation. arXiv. URL http:// arxiv.org/abs/1406.1078.

DHS11 Duchi, J., Hazan, E. and Singer, Y., Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12, pages 2121–2159.

DKB14 Dinh, L., Krueger, D. and Bengio, Y., NICE: Non-linear Independent Components Estimation. 1,2(2014), page 11. URL http://arxiv.org/ abs/1410.8516.

DNP12 Daniel, C., Neumann, G. and Peters, J., Hierarchical Relative En- tropy Policy Search. Fifteenth International Conference on Artifi- cial Intelligence and Statistics (AISTATS 2012), 2012, pages 1–9, URL http://www.is.tuebingen.mpg.defileadmin/user\_upload/ files/publications/2012/AISTATS-2012-Daniel.pdf.

Fuk75 Fukushima, K., Cognitron: A self-organizing multilayered neural network. Biological cybernetics, 20,3-4(1975), pages 121–136.

GG16 Gao, Y. and Glowacka, D., Deep gate recurrent neural network, 2016. GJM13 Graves, A., Jaitly, N. and Mohamed, A. R., Hybrid speech recognition

Speech Recognition and Understanding, ASRU 2013 - Proceedings, 2013, pages 273–278.

GMH13 Graves, A., Mohamed, A.-r. and Hinton, G., Speech recognition with deep recurrent neural networks. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pages 6645–6649.

Gra13a Graves, A., Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, pages 1–43. URL http://arxiv.org/abs/ 1308.0850.

Gra13b Graves, A., Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.

GS04 Gomez, F. J. and Schmidhuber, J., Co-Evolving Recurrent Neurons Learn Deep Memory POMDPs. Technical Report, 2004.

GSC00 Gers, F. A., Schmidhuber, J. and Cummins, F., Learning to forget: Continual prediction with lstm. Neural computation, 12,10(2000), pages 2451–2471.

GU11 Guizzo, E. S. U. and Urmson, C., How Google’s self-driving car works, 2011. URL http://spectrum.ieee.org/automaton/robotics/ artificial-intelligence/how-google-self-driving-car-works. Gui11 Guizzo, E., How google’s self-driving car works. IEEE Spectrum Online,

October, 18.

GWD14 Graves, A., Wayne, G. and Danihelka, I., Neural Turing Machines. pages 1–26. URL http://arxiv.org/abs/1410.5401.

HGT+14 Hoffman, J., Guadarrama, S., Tzeng, E., Hu, R., Donahue, J., Girshick, R., Darrell, T. and Saenko, K., LSDA: Large Scale Detection Through Adaptation. pages 1–9. URL http://arxiv.org/abs/1407.5035. HOT06 Hinton, G. E., Osindero, S. and Teh, Y.-W., A fast learning algorithm

for deep belief nets. Neural computation, 18,7(2006), pages 1527–1554. HS97 Hochreiter, S. and Schmidhuber, J., Long short-term memory. Neural

HS06 Hinton, G. E. and Salakhutdinov, R. R., Reducing the dimensionality of data with neural networks. Science, 313,5786(2006), pages 504–507. HSW89 Hornik, K., Stinchcombe, M. and White, H., Multilayer feedforward networks are universal approximators. Neural networks, 2,5(1989), pages 359–366.

JY09 Ji, S. and Ye, J., An accelerated gradient method for trace norm min- imization. Proceedings of the 26th annual international conference on machine learning. ACM, 2009, pages 457–464.

KB14a Kingma, D. and Ba, J., Adam: A Method for Stochastic Optimization. pages 1–13. URL http://arxiv.org/abs/1412.6980.

KB14b Kingma, D. and Ba, J., Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

KBP13 Kober, J., Bagnell, J. a. and Peters, J., Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32, pages 1238–1274. URL http://ijr.sagepub.com/cgi/doi/10. 1177/0278364913495721.

KL14 Karpathy, A. and Leung, T., Large-scale Video Classification with Con- volutional Neural Networks. Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pages 1725–1732. KLM96 Kaelbling, L. P., Littman, M. L. and Moore, A. W., Reinforcement

learning: A survey. Journal of Artificial Intelligence Research, 4, pages 237–285.

KSH12 Krizhevsky, A., Sutskever, I. and Hinton, G. E., ImageNet Classifica- tion with Deep Convolutional Neural Networks. Advances In Neural Information Processing Systems, pages 1–9.

LBBH98 LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86,11(1998), pages 2278–2324.

LE07 Liu, Z. and Elhanany, I., A scalable model-free recurrent neural network framework for solving POMDPs. Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, ADPRL 2007, 2007, pages 119–126.

LGTB97 Lawrence, S., Giles, C. L., Tsoi, A. C. and Back, A. D., Face recognition: A convolutional neural-network approach. Neural Networks, IEEE Transactions on, 8,1(1997), pages 98–113.

LJH15 Le, Q. V., Jaitly, N. and Hinton, G. E., A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941. LLS13 Lenz, I., Lee, H. and Saxena, A., Deep Learning for Detecting Robotic Grasps. CoRR, abs/1301.3. URL http://arxiv.org/abs/1301.3592. MGW+06 Mayer, H., Gomez, F., Wierstra, D., Nagy, I., Knoll, A. and Schmid- huber, J., A system for robotic heart surgery that learns to tie knots using recurrent neural networks. IEEE International Conference on Intelligent Robots and Systems, 2006, pages 543–548.

MHG14 Mnih, V., Heess, N. and Graves, A., Recurrent Mod- els of Visual Attention. Advances in Neural Informa- tion . . . , pages 1–9. URL http://papers.nips.cc/paper/ 5542-recurrent-models-of-visual-attention.

Mil68 Miller, B. L., Finite state continuous time markov decision processes with an infinite planning horizon. Journal of Mathematical Analysis and Applications, 22,3(1968), pages 552–569.

MKS+13 Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

MKS+₁₅ _{Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Belle-}

mare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G. et al., Human-level control through deep reinforcement learning. Nature, 518,7540(2015), pages 529–533.

Mon82 Monahan, G. E., A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms. Management Science, 28, pages 1–16. URL http://www.jstor.org/stable/2631070.

MSM97 Matari, M. J., Systems, C. and Matarić, M., Reinforcement learning in the multi-robot domain. Autonomous Robots, 4, pages 73–83. URL http://www.springerlink.com/index/K662611455651Q42.pdf.

NW90 Nguyen, D. H. and Widrow, B., Neural networks for self-learning control systems. Control Systems Magazine, IEEE, 10,3(1990), pages 18–23. ONL+13 O’Connor, P., Neil, D., Liu, S. C., Delbruck, T. and Pfeiffer, M., Real-

time classification and sensor fusion with a spiking deep belief network. Frontiers in Neuroscience.

PMB12 Pascanu, R., Mikolov, T. and Bengio, Y., On the difficulty of training recurrent neural networks. arXiv preprint arXiv:1211.5063.

PS06a Peters, J. and Schaal, S., Policy gradient methods for robotics. Intel- ligent Robots and Systems, 2006 IEEE/RSJ International Conference on. IEEE, 2006, pages 2219–2225.

PS06b Peters, J. and Schaal, S., Policy gradient methods for robotics. IEEE In- ternational Conference on Intelligent Robots and Systems, 2006, pages 2219–2225.

RBK+₁₄ _{Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C. and}

Bengio, Y., FitNets: Hints for Thin Deep Nets. pages 1–12. URL http://arxiv.org/abs/1412.6550.

RBNP08 Raibert, M., Blankespoor, K., Nelson, G. and Playter, R., BigDog , the Rough-Terrain Quaduped Robot. Technical Report, 2008.

RGHL09 Riedmiller, M., Gabel, T., Hafner, R. and Lange, S., Reinforcement learning for robot soccer. Autonomous Robots, 27, pages 55–73.

Ros58 Rosenblatt, F., The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65,6(1958), page 386.

SB98a Sutton, R. S. and Barto, A. G., Reinforcement learning: an introduction. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council, 9, page 1054.

SB98b Sutton, R. S. and Barto, A. G., Reinforcement learning: an introduction. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council, 9, page 1054.

SBW+10 Schaul, T., Bayer, J., Wierstra, D., Sun, Y., Felder, M., Sehnke, F., Rückstieß, T. and Schmidhuber, J., PyBrain. Journal of Machine Learning Research, 11, pages 743–746.

Seh13 Sehnke, F., Efficient baseline-free sampling in parameter exploring policy gradients: Super symmetric pgpe. In Artificial Neural Networks and Machine Learning–ICANN 2013, Springer, 2013, pages 130–137. SH09 Salakhutdinov, R. and Hinton, G., Deep Boltzmann Machines. Artificial

Intelligence, 5, pages 448–455. URL http://www.cs.utoronto.ca/ ~rsalakhu/papers/dbm.pdf.

SLH+14 Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D. and Riedmiller, M., Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pages 387–395.

SMSM99 Sutton, R. S., Mcallester, D., Singh, S. and Mansour, Y., Policy Gradi- ent Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems 12, 1999, pages 1057–1063.

Son71 Sondik, E. J., The optimal control of partially observable markov processes. Technical Report, DTIC Document, 1971.

SOR+₀₈ _{Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J. and}

Schmidhuber, J., Policy gradients with parameter-based exploration for control. In Artificial Neural Networks-ICANN 2008, Springer, 2008, pages 387–396.

SS12 Stulp, F. and Sigaud, O., Path Integral Policy Improvement with Co- variance Matrix Adaptation. arXiv preprint arXiv:1206.4621. URL http://arxiv.org/abs/1206.4621.

SWSS09 Sun, Y., Wierstra, D., Schaul, T. and Schmidhuber, J., Efficient natural evolution strategies. Proceedings of the 11th Annual conference on Ge- netic and evolutionary computation GECCO 09, pages 539–545. URL http://portal.acm.org/citation.cfm?doid=1569901.1569976.

SZ06 Schäfer, A. M. and Zimmermann, H. G., Recurrent neural networks are universal approximators. In Artificial Neural Networks–ICANN 2006, Springer, 2006, pages 632–640.

SZR11 Schäfer, A., Zimmermann, H.-g. and Riedmiller, M., Reinforcement Learning with Recurrent Neural Networks. thesis, 9, pages 410–420. URL http://link.springer.com/10.1007/s11768-011-0177-1. TH12 Tieleman, T. and Hinton, G., Lecture 6.5-rmsprop: Divide the gradient

by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4.

Thr96 Thrun, S., Explanation-Based Neural Network Learning. Springer, 1996. VLL+10 Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. and Manzagol, P.-A., Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research, 11, pages 3371–3408.

Wer90 Werbos, P. J., Backpropagation through time: What it does and how to do it. Proceedings of the IEEE, 78, pages 1550–1560.

WF07 Wierstra, D. and Foerster, A., Solving deep memory POMDPs with recurrent policy gradients. Artificial Neural Networks- . . . , 1,1(2007). URL http://link.springer.com/chapter/10.1007/ 978-3-540-74690-4\_71.

Wil92 Williams, R. J., Simple statistical gradient-following algorithms for con- nectionist reinforcement learning. Machine learning, 8,3-4(1992), pages 229–256.

YLA15 Yang, Y., Li, Y. and Aloimonos, Y., Robot Learning Manipulation Ac- tion Plans by "Watching" Unconstrained Videos from the World Wide Web. Under Review. URL http://www.umiacs.umd.edu/~yzyang/ paper/YouCookMani\_CameraReady.pdf.

ZHNS11 Zhao, T., Hachiya, H., Niu, G. and Sugiyama, M., Analysis and improvement of policy gradient estimation. Advances in Neural Informa- tion Processing Systems, 2011, pages 262–270.

In document Investigating Neural-Based Learning Algorithms for Control (Page 55-64)