Conclusions - Towards Lightweight AI: Leveraging Stochasticity, Quantization, and Tensorization

CHAPTER 5. DISCUSSION & CONCLUSIONS

We provide an analytical rationale for the characteristics of deep ESN design that influence forecasting performance. Within the malleable Mod-DeepESN architecture, we show experimentally that networks perform optimally near the “edge of chaos.” Given constraints on model size or compute resources, we explore the effects of neuron allocation and reservoir placement on performance. We also demonstrate that network breadth plays a role in dictating the certainty of performance across instances. Redundancy through parallel pathways, extraction of nonlinear data regularities with depth, and discernibility of latent representations all appear to have a significant impact on Mod-DeepESN performance.

We also demonstrate that the recently proposed posit numerical system is well suited to deep neural network inference at ≤8-bit precision. The proposed posit hardware is shown to be competitive with its floating-point counterpart in terms of resource utilization and energy-delay product. Moreover, the posit EMAC offers a higher maximum operating frequency than that of floating point. With regard to performance degradation, direct quantization to ultra-low precision heavily favors posits, which vastly outperform fixed-point. Moreover, posits consistently match or surpass the performance of floating point across multiple datasets.

Lastly, tensorization, low-precision computation, and alternative training paradigms all demonstrably reduce model complexity by orders of magnitude. We show that the forecasting of data modalities that exhibit multi-scale and nonlinear dynamics can be achieved on resource-scarce platforms without sacrificing performance.

This work opens the door to many future directions. Tensorization can be further extended to the reservoir parameters of the Mod-DeepESN architecture. Furthermore, the quantization techniques explored are naive, which becomes a larger problem for recurrent neural networks, which propagate error at each timestep. Tensor regression should also be explored as a way to enhance predictive performance, as opposed to matricizing decomposed state tensors. We hope the methods presented in this work will aid in broadening the applications of forecasting models.
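The concern about naive quantization in recurrent networks can be illustrated with a minimal sketch. The code below is not the method used in this work: it assumes a toy tanh recurrence with randomly drawn weights, a sinusoidal input, and plain round-to-nearest fixed-point quantization of the state. The names `quantize`, `run_reservoir`, and `max_abs_error` are hypothetical helpers introduced only for this illustration. It records, at each timestep, the gap between a full-precision trajectory and a quantized one, so that the accumulation of rounding error through the recurrence can be inspected.

```python
import math
import random


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (naive fixed-point)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def run_reservoir(weights, inputs, frac_bits=None):
    """Iterate x[t] = tanh(W x[t-1] + u[t]); quantize the state if frac_bits is set."""
    n = len(weights)
    state = [0.0] * n
    trajectory = []
    for u in inputs:
        pre = [sum(w * s for w, s in zip(row, state)) + u for row in weights]
        state = [math.tanh(p) for p in pre]
        if frac_bits is not None:
            # The rounding error introduced here is fed back at the next step.
            state = [quantize(s, frac_bits) for s in state]
        trajectory.append(state)
    return trajectory


def max_abs_error(traj_a, traj_b):
    """Largest element-wise gap between two trajectories, one value per timestep."""
    return [max(abs(a - b) for a, b in zip(sa, sb)) for sa, sb in zip(traj_a, traj_b)]


# Assumed toy setup: 8 units, weights drawn uniformly from [-0.5, 0.5].
rng = random.Random(0)
n_units = 8
weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_units)] for _ in range(n_units)]
inputs = [math.sin(0.3 * t) for t in range(50)]

reference = run_reservoir(weights, inputs)               # full-precision states
quantized = run_reservoir(weights, inputs, frac_bits=4)  # 4 fractional bits
errors = max_abs_error(reference, quantized)
```

Printing `errors` for several values of `frac_bits` gives a rough picture of how the state divergence evolves over time. A quantization-aware scheme would replace the single rounding step inside `run_reservoir`.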
