HARDWARE EMULATION STUDY OF NEURONAL PROCESSING IN CORTEX FOR PATTERN RECOGNITION. A Thesis submitted to the Department of Computer Science


HARDWARE EMULATION STUDY OF NEURONAL PROCESSING IN CORTEX FOR PATTERN RECOGNITION

A Thesis submitted to the Department of Computer Science, African University of Science and Technology

in partial fulfillment of the requirements for the degree of Master of Computer Science

BY

Ogbodo Mark Ikechukwu

African University of Science and Technology www.aust.edu.ng

P.M.B 681, Galadimawa, Abuja F.C.T Nigeria

November 2017


CERTIFICATION

This is to certify that the thesis titled “Hardware emulation of neuronal processing in cortex for pattern recognition”, submitted to the School of Postgraduate Studies, African University of Science and Technology (AUST), Abuja, Nigeria, for the award of the Master’s degree, is a record of original research carried out by Ogbodo Mark Ikechukwu in the Department of Computer Science.


HARDWARE EMULATION STUDY OF NEURONAL PROCESSING IN CORTEX FOR PATTERN RECOGNITION

BY

OGBODO MARK IKECHUKWU

A THESIS APPROVED BY THE DEPARTMENT OF COMPUTER SCIENCE

RECOMMENDED: ……….

Supervisor Prof. Ben Abdallah

……….

Head, Department of Computer Science

APPROVED: ………

Chief Academic Officer


Ogbodo Mark Ikechukwu

ALL RIGHTS RESERVED


ABSTRACT

Artificial neural networks (ANNs) are a computing paradigm modeled on the neural networks of the biological brain. Over the last few decades, they have been applied with great success in areas such as business, medicine, industry, automotive engineering, astronomy and finance.

Since neural networks are inherently parallel architectures, several earlier research efforts built custom systems based on Application Specific Integrated Circuits (ASICs) that include multiple parallel processing units.

However, these ASIC-based systems suffered from several limitations, such as being able to run only specific algorithms and restrictions on network size. Recently, much work has focused on implementing artificial neural networks on reconfigurable computing platforms. Reconfigurable computing makes it possible to increase processing density beyond that provided by general-purpose computing systems. Field Programmable Gate Arrays (FPGAs) can be used for reconfigurable computing and offer design flexibility with performance close to that of ASICs.

This thesis presents a study of an FPGA-based acceleration solution and a performance exploration of a feedforward artificial neural network (FFANN). The architecture is described in the Very High Speed Integrated Circuit Hardware Description Language (VHDL), and is implemented and demonstrated on an FPGA board. Synthesis was performed with Quartus II and simulation with ModelSim. The system was trained and evaluated in hardware on a digit recognition application.


ACKNOWLEDGEMENT

First, I would like to express my sincere gratitude to God Almighty, who gave me the opportunity to undertake this program, took me through it and made it possible for me to complete it successfully.

I would like to express my appreciation to my supervisor, Prof. Ben Abdallah, for his scholarly assistance, constant support and words of encouragement, and to the Head of Department, Prof. Amos David, for his support in ensuring that this work went according to schedule.

In addition, I would like to thank the African Development Bank for the scholarship that enabled me to undertake the 18-month MSc program. I would also like to acknowledge Prof. Michael Cohen, Prof. Lehel Csato, Prof. Mohamed Hamada, Dr. Ekpe Okoroafor, Dr. V. Odumuyiwa and other faculty members at AUST. My appreciation also goes to David Clement, Nsima, Bolade, Paulina, Bobby and all the other staff of AUST.

I would also like to thank Dr. Ituma Chinagolum, Head of the Department of Computer Science, Ebonyi State University, and Engr. Emeka for their support during my MSc program. I am equally grateful to my pastors, Rev. Philip Chinagorom and Otubo Victor, for their spiritual support; to my friends Ezzaka, Ojiugwo, Okechukwu and others for their encouragement; to my roommate Nkoro; and to the entire Computer Science 2016/2017 stream.

Finally, I would like to specially appreciate my parents, Mr. and Mrs. Ogbodo Leonard, for their parental support, love, care and for training me up to this point in life. Many thanks also to my siblings Mandy, Kingsley, Collins and Cynthia for believing in me.


DEDICATION

I dedicate this thesis work to God Almighty and all my loved ones.


List of Figures

Figure 1: Mathematical Model of an Artificial Neuron

Figure 2: Artificial Neural Network (ANN)

Figure 3: Supervised Learning

Figure 4: Unsupervised Learning

Figure 5: Reinforcement Learning

Figure 6: ANN Implementation Models

Figure 7: System Architecture

Figure 8: Block Diagram of SRAM (Cs & We, n.d.)

Figure 9: SRAM Read/Write

Figure 10: Pattern Recognizer ANN

Figure 11: Block Diagram of a Single Neuron

Figure 12: Sigmoid Activation Function

Figure 13: Backward propagation (Yaldex, n.d.)

Figure 14: System schematic design

Figure 15: Schematic design of the ANN, SRAM, Controls and ALU

Figure 16: Schematic design of the SRAM

Figure 17: Schematic design of the ANN

Figure 18: Schematic design of the ALU

Figure 19: Schematic design of the Control Switches

Figure 20: Schematic design of the Display

Figure 21: Connecting the FPGA to the computer with USB blaster

Figure 22: Training the ANN

Figure 23: Recognizing Pattern for letter A

Figure 24: Relative Energy Cost (Horowitz, 2014)

Figure 25: Relative Area Cost (Dally, 2016)


List of Tables

Table 1: SRAM Read/Write Operation

Table 2: System evaluation

Table 3: Data representation format

Table 4: Energy and Area cost of Operation (Mark, 2014), (Song, 2016).


Chapter 1

1.1 Introduction

Moore’s law predicts that the number of transistors on a dense integrated circuit doubles about every two years (Moore, 1975). This has largely held so far, but the trend cannot continue indefinitely: adding ever more transistors to a chip increases its power consumption and heat until it becomes impossible to cool. Moreover, it is difficult to get conventional computers based on the von Neumann architecture to perform tasks such as understanding human language, recognizing objects or learning to dance, all of which the human brain does very easily. The human brain is not good at arithmetic, but it excels at processing continuous streams of data from the environment, and it does so very quickly. To build computers that can carry out such tasks, a computing paradigm that mimics the biological brain, the artificial neural network, was adopted (Abdallah, 2017).

An artificial neural network is a computing paradigm modeled after the neural network of the biological brain.

The biological brain is made up of billions of neurons that are interconnected to form a network.

The brain is fault tolerant, consumes an extremely low amount of power and can carry out massively parallel computations (Indiveri, Linares-Barranco, Legenstein, Deligeorgis, & Prodromakis, 2013). Artificial neural networks date back to 1943 (Macukow, 2016) and have continued to improve, finding applications in pattern recognition (a discipline aimed at classifying objects such as text, images and speech), image recognition, object classification and much more.

1.2 Statement of Problem

Neural networks have been implemented very successfully in software, partly because software offers flexibility. However, real-time applications such as autonomous vehicles, real-time surveillance cameras and air traffic control systems require high processing speed (Abdallah, 2016), and this speed can be better delivered by neural networks implemented in hardware than in software. Although less flexible, a hardware implementation gives the neural network more speed, more parallelism and better cost-effectiveness by reducing the number of components (Misra & Saha, 2010).

1.3 Biological Neuron

The artificial neuron models four features of the biological neuron: the dendrites, which receive input signals from other neurons and carry them into the cell body; the cell body (soma), which processes the input signals; the axon, which carries the result of this processing out of the cell body; and the synapse, which is the point of connection between two neurons and plays a part in transferring the output of one neuron to another (Hagan & Beale, n.d.).

Figure 1.1: Structure of a Biological Neuron

1.4 Artificial Neuron

An artificial neuron receives one or more numerical inputs. Each input is multiplied by the synaptic weight of its connection, which represents the strength of the connection between two neurons. The neuron then sums these weighted inputs, adds a bias to the sum, and finally applies an activation function to the result to determine its output (Hagan & Beale, n.d.).

Several activation functions can be used when designing a neural network, and the choice depends on the problem the designer wants the neuron to solve. Activation functions can be linear or piecewise linear (for example linear, saturating linear and symmetric saturating linear) or smoothly non-linear (for example log-sigmoid and hyperbolic tangent sigmoid) (Hagan & Beale, n.d.).

Figure 1: Mathematical Model of an Artificial Neuron

Where:

• x, a column vector, is the input.

• w, a matrix with one row, holds the synaptic weights.

• b is the bias.

• Σ is the summation function.

• f is the activation function.


1.5 Artificial Neural Network (ANN)

A single neuron can solve only very simple problems; larger problems require many interconnected neurons arranged in layers and working together. This interconnection forms what is called a neural network, and the way the neurons are arranged relative to each other is called its architecture. The arrangement is determined mainly by the direction of the synaptic connections between neurons. An artificial neural network typically has three kinds of layer: input, hidden and output. The input layer receives input from the environment, the hidden layer processes that input to identify patterns, and the output layer presents the result of the hidden layer's processing (da Silva, Hernane Spatti, Andrade Flauzino, Liboni, & dos Reis Alves, 2017).

Figure 2: Artificial Neural Network (ANN)

1.6 ANN Architectures

The neurons in a neural network can be connected in different ways, and these connection patterns are referred to as architectures. The most common is the feedforward neural network, which has an input layer, one or more hidden layers and an output layer. Data enters at the input layer and flows in one direction through the hidden layer(s) until it reaches the output layer.
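The one-directional flow of a feedforward network amounts to applying each layer in turn to the previous layer's output. The Python sketch below assumes fully connected layers with log-sigmoid activations and stores each layer as a (weights, biases) pair; the 2-3-1 network and its weights are hypothetical values chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, biases) for one layer


def logsig(n: float) -> float:
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-n))


def layer_forward(x: Sequence[float], W: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer: row j of W holds the weights into neuron j."""
    return [logsig(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def feedforward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Propagate x through the layers in order; data never flows backwards."""
    a = list(x)
    for W, b in layers:
        a = layer_forward(a, W, b)
    return a


# Hypothetical 2-3-1 network with hand-picked weights, for illustration only.
hidden: Layer = ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, -1.0, 0.5])
output: Layer = ([[1.0, -1.0, 0.5]], [0.1])
y = feedforward([1.0, 0.0], [hidden, output])
print(y)
```

Because each layer only reads the activations of the layer before it, all neurons inside one layer can be evaluated in parallel, which is the property hardware implementations exploit.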

Other neural network architectures include recurrent neural networks, in which data can flow around in cycles so that the network can remember information for a long time, and symmetrically connected networks, in which the connection between two units has the same weight in both directions (Hinton, Srivastava, & Swersky, 2012).

1.7 Learning Algorithms

For a neural network to solve problems accurately, it must first be trained; that is, it must learn how to solve them. Neural network learning algorithms are classified into three groups.

1.7.1 Supervised learning

In supervised learning, the network is given data in pairs: an input and a target result. The aim is for the network to extract information from this labeled dataset so that it can label new data. Supervised learning can also be viewed as function approximation.

Figure 3: Supervised Learning


1.7.2 Unsupervised learning

Here, the network is given only input data and is expected to derive structure from the relationships that exist within that data. This form of learning is concerned more with describing the data than with predicting target outputs.

Figure 4: Unsupervised Learning

1.7.3 Reinforcement learning

Reinforcement learning is based on the concept of reward: instead of being given target outputs, the network makes decisions that enable it to obtain the maximum reward.

Figure 5: Reinforcement Learning

1.8 Research Objectives

The goal of this thesis is to acquire a deep understanding of neuro-inspired computing and its applications, and to emulate neuronal processing in the cortex in hardware for pattern recognition.

The emulation is carried out on a Field Programmable Gate Array (FPGA), an integrated circuit that can be configured using a Hardware Description Language (HDL).

1.9 Organization of Work

This work is organized as follows. Chapter 2 gives a brief history of neural networks and reviews related work. Chapter 3 presents the general system architecture, describes the individual components that make up the system and explains the implementation process. Chapter 4 analyzes the implementation and the results achieved, evaluating power consumption and resource usage. Chapter 5 concludes the research and suggests possible future work.


Chapter 2

2.1 Brief History

Neural networks began as a concept and have since advanced, including in their practical implementation.

Early work in the field of neural networks dates back to the late 19th and early 20th centuries, with scientists such as Hermann von Helmholtz, Ernst Mach and Ivan Pavlov, whose work cut across disciplines such as physics, psychology and neurophysiology and addressed general theories of learning, vision and conditioning (Hagan & Beale, n.d.).

In 1943, Warren McCulloch and Walter Pitts published a research article showing that simple neural networks can, in principle, carry out any arithmetic or logical operation (Michael Marsalli, n.d.). Interest grew from there: in 1949, Donald Hebb published the book “The Organization of Behavior”, and in 1951, Marvin Minsky built the first neurocomputer. Between 1957 and 1958, Frank Rosenblatt, Charles Wightman and others developed the Mark I Perceptron neurocomputer. In the mid-1960s, setbacks halted progress, but by the 1970s researchers such as Fukushima, Grossberg and Klopf had begun to publish, and the 1980s brought a turnaround. Physicist John Hopfield published influential papers, other scientists took up the subject, and after the Parallel Distributed Processing (PDP) books appeared in 1986 the field became highly active. The International Neural Network Society was formed in 1988, the journal Neural Computation followed in 1989 and IEEE Transactions on Neural Networks in 1990 (Yadav, Yadav, & Kumar, 2015).

Within the last two decades, progress in neural networks and related artificial intelligence has been massive, with milestones such as IBM's Deep Blue in 1997 (Campbell, Hoane, & Hsu, 2002), the MNIST database from Yann LeCun and his team in 1998 (Deng, 2012), Torch in 2001 (Collobert, Kavukcuoglu, & Farabet, 2011), Kaggle in 2010, IBM's Watson in 2011, Facebook's DeepFace in 2014 (Taigman, Yang, & Ranzato, 2014), Google's Sibyl in 2014 (Canini et al., n.d.) and Google's AlphaGo in 2016, among many others.

2.2 Applications of ANN

Artificial neural networks have thrived over the last decade and are now applied across many fields. Some of these applications are:

1. Businesses

ANNs are used for target marketing, sales forecasting, shopping cart analysis, trading and financial forecasting, derivative securities, corporate bankruptcy prediction, credit scoring, fraud detection and premium pricing in insurance.

2. Telecommunication

In telecommunication, artificial neural networks are used to optimize the routing of services by analyzing network traffic in real time, and to analyze call data records in real time so that illegal behavior is identified immediately.

3. Sports

ANNs are used for top-level football analysis and for simulating team tactics and formations.

4. Healthcare

ANNs are used to predict patients' health conditions, possible complications and quality of care, to diagnose disease, and to analyze human DNA for personalized medicine.


2.3 Other Related Works

The hardware implementation of artificial neural networks is an active research field, and many works have addressed it.

In (Khalil & Al-kazzaz, n.d.), Khalil and Al-Kazzaz presented a digital implementation of the multiply-accumulate (MAC) circuit of an artificial neuron on a Xilinx FPGA (Field Programmable Gate Array), using the hardlims, satlins and tansig activation functions. Their design shows an advantage of FPGAs: multiple modules can work in parallel without much effect on performance, even as the number of interconnections increases.

In (Forssell, 2014), Forssell implements in hardware an artificial neural network for classifying vibroacoustic signals. His work focused on selecting a neural network architecture for classifying damage to a DMA-1 toothed gear from vibroacoustic signals. A neural classifier based on the LVQ (Learning Vector Quantization) neural network model was built to recognize the technical state of the gear, and it was finally implemented in hardware on an FPGA to take advantage of its parallelism and speed.

In (Lozito, Laudani, Riganti-Fulginei, & Salvini, 2014) and (Won, 2007), Esraa and Haitham implemented a feedforward multilayer neural network in hardware as a digital circuit on a Field Programmable Gate Array (FPGA), citing the FPGA's parallelism, design flexibility and cost-effectiveness as their reasons for choosing it.

(Aliaga et al., 2008) describes how a systolic array for a multilayer perceptron is implemented on a Virtex XCV400 FPGA using a pipelined on-line backpropagation algorithm. The authors also evaluate the performance of this algorithm, which addresses some of the challenges that traditional backpropagation faces when implemented in VLSI circuits.


(Savran & Ünsal, n.d.) implemented a neural network on an FPGA using a modular architecture that allows the number of neurons, as well as layers, to be increased or decreased.

(Girau & Torres-huitzil, 2007) implemented in digital hardware a spiking neural network for image segmentation based on the oscillatory correlation theory. Their aim was to show that flexible digital solutions can efficiently handle large spiking neural networks. This work was implemented on an FPGA.


Chapter 3

3.1 System Emulation

An artificial neural network can be implemented in hardware using an analog, a digital or a hybrid model. The analog model uses analog circuits that imitate the behavior of a biological neuron; the digital model replaces the analog neuron with a digital one (Fox, 2013); and the hybrid model combines analog and digital elements. Each approach has its strengths and weaknesses (Kakkar, 2009), so the choice depends on which approach the designer believes works best for the application. This design uses the digital approach, because digital circuits tend to scale more easily with new fabrication processes (Kakkar, 2009).

Figure 6: ANN Implementation Models

3.2 System Architecture

The system is made up of several components available on the FPGA board that work together to produce the desired result: the neural network itself, the SRAM interface, push buttons, toggle switches, the LCD module and on-chip memory. The main components of the system are the SRAM and the neural network.


Figure 7: System Architecture

3.2.1 SRAM

Computers store data on several kinds of device, and Static Random-Access Memory (SRAM) is one of them. Random-Access Memory (RAM) allows data to be read and written in any order, and each access takes the same amount of time regardless of where the data is physically located in the memory (Santhiya & Mathan, 2015). It is volatile memory: it retains data while powered and loses it as soon as power is removed. SRAM is a semiconductor memory that holds data in static form and, unlike Dynamic Random-Access Memory (DRAM), does not need to be refreshed. It stores each bit in a bistable latching circuit. In this work, the SRAM stores the synaptic weights and biases of the artificial neural network.


Figure 8: Block Diagram of SRAM (Cs & We, n.d.)


Figure 9: SRAM Read/Write

• Hi-Z (tristate output): an output that can be low, high or off (high impedance), which allows the outputs of several chips to be connected together.

• CE (Chip Enable): control input that enables the chip; it also affects the power dissipation of the circuit.

• OE (Output Enable): control input that permits or prevents data output.

• A (Address): selects one of the 2¹³ 8-bit locations.

• WR (Write): stores new data in the selected location.

• D (Data in/out): bidirectional data lines used during read and write cycles.


Table 1: SRAM Read/Write Operation

CE OE WR D0…D(b−1) Action

0 X X Hi-Z Disabled

1 0 0 Hi-Z Idle

1 1 0 Out Read

1 X 1 In Write

X = don't care.
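Table 1 is effectively a truth table, so its behavior can be captured in a small software model. The Python sketch below is hypothetical (the class and method names are not from this design); it assumes active-high CE, OE and WR as in the table and the 2¹³ × 8-bit organization from the pin list, and it uses None to stand for a Hi-Z data bus.

```python
from typing import Optional


class SRAMModel:
    """Behavioural model of the SRAM operations in Table 1.

    Hypothetical sketch: 2**13 locations of 8 bits each, active-high
    CE, OE and WR. A return value of None represents a Hi-Z data bus.
    """

    ADDR_BITS = 13
    WORD_MASK = 0xFF  # 8-bit data word

    def __init__(self) -> None:
        self.mem = [0] * (1 << self.ADDR_BITS)

    def cycle(self, ce: int, oe: int, wr: int, addr: int,
              data_in: int = 0) -> Optional[int]:
        """Apply one set of control inputs and return what the SRAM drives."""
        if not 0 <= addr < len(self.mem):
            raise ValueError("address out of range")
        if ce == 0:          # CE=0: chip disabled, nothing is read or written
            return None
        if wr == 1:          # CE=1, WR=1: write; OE is a don't-care
            self.mem[addr] = data_in & self.WORD_MASK
            return None      # the bus is driven by the writer, not the SRAM
        if oe == 1:          # CE=1, OE=1, WR=0: read the stored word
            return self.mem[addr]
        return None          # CE=1, OE=0, WR=0: idle


sram = SRAMModel()
sram.cycle(ce=1, oe=0, wr=1, addr=0x10, data_in=0xAB)  # store one weight byte
print(hex(sram.cycle(ce=1, oe=1, wr=0, addr=0x10)))    # read it back
```

Storing a weight is a single write cycle, and fetching it during inference is a read cycle at the same address; while CE is low the model ignores every request, mirroring the "Disabled" row.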

3.2.2 The Pattern Recognition Artificial Neural Network

The neural network used in this work is a feedforward artificial neural network that uses the sigmoid activation function. It is composed of 64 neurons: 16 in the input layer, 32 in the hidden layer and 16 in the output layer.
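The layer sizes determine how many parameters the SRAM must hold and how many multiply-accumulate (MAC) operations one inference needs. The Python sketch below counts them under two assumptions that the text does not state explicitly: the 16 input neurons only distribute the input values (no weights of their own), and each later layer is fully connected to the previous one with one bias per neuron. The function name is hypothetical.

```python
from typing import Dict, List


def mlp_cost(layer_sizes: List[int], bytes_per_param: int) -> Dict[str, int]:
    """Count weights, biases, MACs and storage for a fully connected MLP.

    Assumes the first entry is a pass-through input layer and every later
    layer is fully connected to the one before it, with one bias per neuron.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return {
        "weights": weights,
        "biases": biases,
        "macs_per_inference": weights,  # one multiply-accumulate per weight
        "storage_bytes": (weights + biases) * bytes_per_param,
    }


fp32 = mlp_cost([16, 32, 16], bytes_per_param=4)
int8 = mlp_cost([16, 32, 16], bytes_per_param=1)
print(fp32)
print(int8)
```

Under these assumptions the network has 1,024 weights and 48 biases; stored as 32-bit floating-point values they occupy 4,288 bytes, and as 8-bit integers 1,072 bytes, both within an 8 KB (2¹³ × 8-bit) memory.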


Figure 10: Pattern Recognizer ANN


Figure 11: Block Diagram of a Single Neuron

Figure 12: Sigmoid activation Function
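The sigmoid in Figure 12 is σ(n) = 1/(1 + e^(−n)), and its derivative σ(n)(1 − σ(n)) is the quantity backpropagation needs. Evaluating an exponential is costly in logic, and one common hardware approach is to approximate σ with a precomputed lookup table; the Python sketch below shows that idea. The table size (256 entries), input range ([−8, 8]) and nearest-entry lookup are illustrative assumptions, not parameters of this design, which the text does not report.

```python
import math
from typing import List


def sigmoid(n: float) -> float:
    return 1.0 / (1.0 + math.exp(-n))


def sigmoid_derivative(n: float) -> float:
    """d/dn sigmoid(n) = sigmoid(n) * (1 - sigmoid(n)), used by backpropagation."""
    s = sigmoid(n)
    return s * (1.0 - s)


def build_lut(n_min: float, n_max: float, entries: int) -> List[float]:
    """Precompute sigmoid at evenly spaced points, as a ROM might hold them."""
    step = (n_max - n_min) / (entries - 1)
    return [sigmoid(n_min + i * step) for i in range(entries)]


def sigmoid_lut(n: float, lut: List[float], n_min: float, n_max: float) -> float:
    """Nearest-entry lookup; inputs outside the table saturate to its ends."""
    if n <= n_min:
        return lut[0]
    if n >= n_max:
        return lut[-1]
    step = (n_max - n_min) / (len(lut) - 1)
    return lut[round((n - n_min) / step)]


LUT = build_lut(-8.0, 8.0, 256)
worst = max(abs(sigmoid_lut(k / 100, LUT, -8.0, 8.0) - sigmoid(k / 100))
            for k in range(-1000, 1001))
print(f"max abs error over [-10, 10]: {worst:.4f}")
```

The error is bounded by the sigmoid's maximum slope (0.25) times half the table spacing, so a larger table or linear interpolation between entries trades memory for accuracy.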


3.3 Learning Process

The artificial neural network was trained by supervised learning, using the backward propagation (backpropagation) algorithm to improve its accuracy. Backpropagation computes the gradient of the error function with respect to the network's weights, and gradient descent uses this gradient to update them. Consider a dataset of N input-target pairs

$$X = \{(\vec{x}_1, \vec{y}_1), \ldots, (\vec{x}_N, \vec{y}_N)\}$$

where $\vec{x}_i$ is an input and $\vec{y}_i$ is its target output. An error function $E(X, \theta)$ measures the error between the target output $\vec{y}_i$ and the computed output $\hat{\vec{y}}_i$. With learning rate $\alpha$, each iteration of gradient descent updates the weights and biases as

$$\theta^{t+1} = \theta^{t} - \alpha \frac{\partial E(X, \theta^{t})}{\partial \theta}$$

where $\theta^{t}$ denotes the parameters of the artificial neural network at iteration $t$ of gradient descent. Each weight is updated using the partial derivative $\frac{\partial E(X, \theta)}{\partial w_{ij}^{k}}$, where $w_{ij}^{k}$ is the weight for node $j$ in layer $l_k$ from incoming node $i$:

$$w_{ij}^{k} \leftarrow w_{ij}^{k} - \alpha \frac{\partial E(X, \theta)}{\partial w_{ij}^{k}}$$

Figure 13: Backward propagation (Yaldex, n.d.)
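The update rule above can be sketched for a network with one hidden layer and sigmoid activations. This Python sketch assumes the squared-error function E = ½ Σ (ŷ − y)², updates the parameters after every sample (stochastic gradient descent) rather than over the whole dataset X at once, and trains on a hypothetical toy task; the layer sizes, learning rate, initialization and number of epochs are illustrative choices, not values from the hardware implementation.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def sigmoid(n: float) -> float:
    return 1.0 / (1.0 + math.exp(-n))


class MLP:
    """One hidden layer, sigmoid activations, per-sample gradient descent."""

    def __init__(self, n_in: int, n_hid: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vec]:
            # Small random weights break the symmetry between hidden neurons.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W1, self.b1 = init(n_hid, n_in), [0.0] * n_hid
        self.W2, self.b2 = init(n_out, n_hid), [0.0] * n_out

    @staticmethod
    def _layer(x: Vec, W: List[Vec], b: Vec) -> Vec:
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(W, b)]

    def forward(self, x: Vec) -> Tuple[Vec, Vec]:
        h = self._layer(x, self.W1, self.b1)
        return h, self._layer(h, self.W2, self.b2)

    def train_step(self, x: Vec, y: Vec, alpha: float) -> float:
        """Apply theta <- theta - alpha * dE/dtheta for one sample; return E."""
        h, y_hat = self.forward(x)
        # Output deltas: dE/dn_k = (y_hat_k - y_k) * sigmoid'(n_k).
        d_out = [(yh - t) * yh * (1.0 - yh) for yh, t in zip(y_hat, y)]
        # Hidden deltas: propagate the output deltas back through W2 (before W2 changes).
        d_hid = [hj * (1.0 - hj) * sum(self.W2[k][j] * d_out[k] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.W2[k][j] -= alpha * dk * hj  # dE/dw_kj = delta_k * h_j
            self.b2[k] -= alpha * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.W1[j][i] -= alpha * dj * xi  # dE/dw_ji = delta_j * x_i
            self.b1[j] -= alpha * dj
        return 0.5 * sum((yh - t) ** 2 for yh, t in zip(y_hat, y))


def dataset_error(net: MLP, data: List[Tuple[Vec, Vec]]) -> float:
    return sum(0.5 * sum((yh - t) ** 2 for yh, t in zip(net.forward(x)[1], y))
               for x, y in data)


# Hypothetical toy task: reproduce each one-hot input at the output.
data = [([1.0 if i == k else 0.0 for i in range(4)],
         [1.0 if i == k else 0.0 for i in range(4)]) for k in range(4)]
net = MLP(4, 8, 4, seed=1)
before = dataset_error(net, data)
for epoch in range(2000):
    for x, y in data:
        net.train_step(x, y, alpha=0.5)
after = dataset_error(net, data)
print(f"error before: {before:.4f}, after: {after:.4f}")
```

The hidden deltas are computed before any weight is changed, so every gradient in one step refers to the same parameter values θ^t, as the update equation requires.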


3.4 Schematic design of the system

The schematic design of the entire system shows how components are interconnected.

Figure 14: System schematic design


3.5 Schematic design of individual components

Figure 15: Schematic design of the ANN, SRAM, Controls and ALU


Figure 16: Schematic design of the SRAM


Figure 17: Schematic design of the ANN


Figure 18: Schematic design of the ALU


Figure 19: Schematic design of the Control Switches


Figure 20: Schematic design of the Display


Chapter 4

4.1 System Testing

The design was tested using Quartus II, Altera's software for designing programmable logic devices.

Launch Quartus II, open the project files from within Quartus II and compile the design. After compilation, verify that it compiled correctly, connect the FPGA board to the computer using the USB-Blaster and load the design onto it.

Figure 21: Connecting the FPGA to the computer with USB blaster


Figure 22: Training the ANN


Figure 23: Recognizing Pattern for letter A

4.2 System Evaluation

The system is evaluated based on its power consumption and the number of resources used.

Table 2: System evaluation

Total logic elements 11,392 / 33,216 (34%)

Total combinational functions 10,989 / 33,216 (33%)

Dedicated logic registers 5,814 / 33,216 (18%)

Total registers 5,814

Total memory bits 4,956 / 483,840 (1%)

Embedded multiplier 9-bit elements 54 / 70 (77%)

Total thermal power dissipation 286.84 mW

Core dynamic thermal power dissipation 104.41 mW

Core static thermal power dissipation 80.52 mW

I/O thermal power dissipation 101.91 mW
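The percentages in Table 2 are used/available ratios, and the total thermal power is the sum of the core dynamic, core static and I/O components. The short Python check below recomputes both from the table's own figures; the names are illustrative.

```python
# Used and available counts copied from Table 2.
RESOURCES = {
    "Total logic elements": (11_392, 33_216),
    "Total combinational functions": (10_989, 33_216),
    "Dedicated logic registers": (5_814, 33_216),
    "Total memory bits": (4_956, 483_840),
    "Embedded multiplier 9-bit elements": (54, 70),
}
# Power components copied from Table 2, in milliwatts.
POWER_MW = {"core dynamic": 104.41, "core static": 80.52, "I/O": 101.91}


def utilization(used: int, available: int) -> float:
    """Percentage of a resource that the design occupies."""
    return 100.0 * used / available


for name, (used, available) in RESOURCES.items():
    print(f"{name}: {utilization(used, available):.1f}%")

total_mw = sum(POWER_MW.values())
print(f"Total thermal power: {total_mw:.2f} mW")
```

The check confirms that the three power components add up to the reported 286.84 mW, and it shows that the embedded multipliers, at 77%, are the most heavily used resource in this design.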


Chapter 5

5.1 Conclusion

Implementing ANNs in hardware significantly increases the speed and parallelism that can be harnessed from them, because specialized hardware provides high computational power and cost-effectiveness. In this work, a Field Programmable Gate Array (FPGA) was used to emulate the system. The emulated system uses several components available on the FPGA board: the SRAM, SDRAM, toggle switches, LCD and push-button switches. The neural network reads a pattern from 16 toggle switches on the board, which form a 4×4 grid, processes it and displays the letter that the pattern most closely represents.

The network has an input layer, one hidden layer and an output layer, with 16 neurons in the input layer, 32 in the hidden layer and 16 in the output layer. It has a feedforward architecture and is trained by supervised learning using backpropagation.
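The input and output stages described above can be sketched in Python as follows. The mapping of switch bits to grid cells (most significant bit at the top-left, row-major), the winner-take-all decoding of the output layer and the label list in the usage example are all assumptions made for illustration; the thesis does not specify them.

```python
from typing import List, Sequence


def switches_to_input(word: int) -> List[float]:
    """Unpack a 16-bit switch word into 16 inputs, MSB first (row-major 4x4)."""
    if not 0 <= word < 1 << 16:
        raise ValueError("expected a 16-bit value")
    return [float((word >> bit) & 1) for bit in range(15, -1, -1)]


def as_grid(x: Sequence[float]) -> str:
    """Render the 16 inputs as a 4x4 picture ('#' = switch on)."""
    rows = [x[r * 4:(r + 1) * 4] for r in range(4)]
    return "\n".join("".join("#" if v else "." for v in row) for row in rows)


def closest_label(outputs: Sequence[float], labels: Sequence[str]) -> str:
    """Winner-take-all: the output neuron with the largest activation wins."""
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return labels[best]


x = switches_to_input(0b0110_1001_1111_1001)  # a rough 4x4 letter 'A'
print(as_grid(x))
print(closest_label([0.1, 0.9, 0.3], ["A", "B", "C"]))  # hypothetical outputs
```

The 16-element vector produced this way is exactly the input width of the 16-32-16 network, so no further preprocessing is needed before the first layer.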

5.2 Future Work

The three layers of the ANN are implemented with 32-bit single-precision floating-point numbers.

To reduce area and energy costs, I propose using 8-bit integers (int8) instead.

Table 3: Data representation format

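Format Sign Exponent Mantissa/magnitude Range Accuracy

FP 32 1 8 23 10⁻³⁸ to 10³⁸ 0.000006%

FP 16 1 5 10 6 × 10⁻⁵ to 6 × 10⁴ 0.05%

Int 32 1 N/A 31 −2.1 × 10⁹ to 2.1 × 10⁹ ±0.5 LSB

Int 16 1 N/A 15 −32,768 to 32,767 ±0.5 LSB

Int 8 1 N/A 7 −128 to 127 ±0.5 LSB

The integer formats serve as fixed-point representations. Their ranges assume two's-complement encoding, and their accuracy is an absolute rounding error of half the least significant bit (LSB).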


Google's Tensor Processing Unit (TPU) uses 8-bit integers (int8) to represent fixed-point numbers. A fixed-point number has three parts: a sign, an integer part followed by a radix point, and a fractional part.

8-bit and 16-bit representations are preferred over 32-bit floating point because of their lower cost.
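As a sketch of how a real-valued weight could be stored in int8 fixed point, the Python below quantizes with rounding and saturation, converts back, and multiplies two fixed-point values. The split of 5 fractional bits (a Q2.5 format: sign, 2 integer bits, 5 fractional bits) is a hypothetical choice for illustration; it is not taken from the TPU or from this design.

```python
def _limits(total_bits: int):
    """Smallest and largest two's-complement integers of the given width."""
    return -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1


def to_fixed(x: float, frac_bits: int, total_bits: int = 8) -> int:
    """Quantize x to a signed fixed-point integer, rounding and saturating."""
    lo, hi = _limits(total_bits)
    q = round(x * (1 << frac_bits))
    return max(lo, min(hi, q))


def from_fixed(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int, total_bits: int = 8) -> int:
    """Multiply two fixed-point numbers with the same number of fractional bits."""
    # The raw product has 2*frac_bits fractional bits; shifting right rescales it.
    # Python's >> on negative ints floors toward minus infinity, like an
    # arithmetic shift in hardware.
    prod = (a * b) >> frac_bits
    lo, hi = _limits(total_bits)
    return max(lo, min(hi, prod))


FRAC = 5  # hypothetical Q2.5 split: 1 sign bit, 2 integer bits, 5 fractional bits
for w in (0.7071, -1.25, 3.9, 5.0):
    q = to_fixed(w, FRAC)
    print(f"{w:+.4f} -> {q:4d} -> {from_fixed(q, FRAC):+.5f}")
```

With 5 fractional bits the resolution is 1/32 = 0.03125 and the representable range is −4 to 3.96875, so a value such as 5.0 saturates to 127 (3.96875). Choosing the split is a trade-off between range and precision that depends on the magnitude of the trained weights and activations.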

Table 4: Energy and Area cost of Operation (Mark, 2014), (Song, 2016).

Operation Energy (pJ) Area (µm²)

8 bit Add 0.03 36

16 bit Add 0.05 67

32 bit Add 0.1 137

16 bit FP Add 0.4 1360

32 bit FP Add 0.9 4184

8 bit Mult 0.2 282

32 bit Mult 3.1 3495

16 bit FP Mult 1.1 1640

32 bit FP Mult 3.7 7700

32 bit SRAM Read (8KB) 5 N/A

32 bit DRAM Read 640 N/A
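Table 4 supports a rough estimate of what the proposed switch to int8 would save for the 16-32-16 network. The Python sketch below makes several assumptions: one MAC per weight (no weights in the input layer); an FP32 MAC modeled as one FP32 multiply plus one FP32 add; an int8 MAC modeled as one 8-bit multiply plus one 32-bit add, since a wider accumulator avoids overflow. It ignores memory accesses and the activation function, so the figures are indicative only.

```python
# Per-operation costs copied from Table 4.
ENERGY_PJ = {"add32": 0.1, "fp_add32": 0.9, "mul8": 0.2, "fp_mul32": 3.7}
AREA_UM2 = {"add32": 137, "fp_add32": 4184, "mul8": 282, "fp_mul32": 7700}

# Assumed composition of one multiply-accumulate (MAC) in each number format.
MAC_OPS = {"fp32": ("fp_mul32", "fp_add32"), "int8": ("mul8", "add32")}


def mac_energy_pj(fmt: str) -> float:
    mul, add = MAC_OPS[fmt]
    return ENERGY_PJ[mul] + ENERGY_PJ[add]


def mac_area_um2(fmt: str) -> int:
    mul, add = MAC_OPS[fmt]
    return AREA_UM2[mul] + AREA_UM2[add]


layers = [16, 32, 16]
macs = sum(a * b for a, b in zip(layers, layers[1:]))  # one MAC per weight

for fmt in MAC_OPS:
    print(f"{fmt}: {macs * mac_energy_pj(fmt):.1f} pJ per inference, "
          f"{mac_area_um2(fmt)} um^2 per MAC unit")
```

Under these assumptions the 1,024 MACs of one inference cost about 4,710 pJ in FP32 versus about 307 pJ with int8 operands, roughly 15 times less, and an int8 MAC unit needs about 28 times less area (419 µm² versus 11,884 µm²).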


Figure 24: Relative Energy Cost (Horowitz, 2014)


Figure 25: Relative Area Cost (Dally, 2016)


References

Abdallah, A. Ben. (2016). Adaptive SoCs for Smart Autonomous Systems.

Abdallah, A. Ben. (2017). Neuro-Inspired Adaptive Manycore SoCs and Applications.

Aliaga, R. J., Gadea, R., Colom, R. J., Monzó, J. M., Lerche, C. W., Martínez, J. D., … Mateo, F. (2008). Multiprocessor SoC implementation of neural network training on FPGA. Proceedings - International Conference on Advances in Electronics and Micro-Electronics, ENICS 2008, 149–154. https://doi.org/10.1109/ENICS.2008.22

Campbell, M., Hoane, A. J., & Hsu, F. (2002). Deep Blue. Artificial Intelligence, 134(1–2), 57–83. https://doi.org/10.1016/S0004-3702(01)00129-1

Canini, K., Chandra, T., Ie, E., Mcfadden, J., Goldman, K., Gunter, M., … Llinares, T. L. (n.d.). Sibyl: A system for large scale supervised machine learning.

Collobert, R., Kavukcuoglu, K., & Farabet, C. (2011). Torch7: A matlab-like environment for machine learning. BigLearn, NIPS Workshop, 1–6. Retrieved from http://infoscience.epfl.ch/record/192376/files/Collobert_NIPSWORKSHOP_2011.pdf

Cs, A., & We, O. E. (n.d.). Block Diagram of a Static RAM.

da Silva, I. N., Hernane Spatti, D., Andrade Flauzino, R., Liboni, L. H. B., & dos Reis Alves, S. F. (2017). Artificial Neural Networks, 21–29. https://doi.org/10.1007/978-3-319-43162-8

Dally, B. (2016). Hardware for Deep Learning.

Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6), 141–142. https://doi.org/10.1109/MSP.2012.2211477

Forssell, M. (2014). Hardware Implementation of Artificial Neural Networks for Vibroacoustic Signal Classification. Information Flow in Networks, 18(1), 1–4. https://doi.org/10.1063/1.1459605

Fox, P. J. (2013). Massively Parallel Neural Computation. University of Cambridge Computer Laboratory, (830), 1–105.

Girau, B., & Torres-huitzil, C. (2007). Massively distributed digital implementation of an integrate-and-fire LEGION network for visual scene segmentation, 70, 1186–1197. https://doi.org/10.1016/j.neucom.2006.11.009

Hagan, M. T., & Beale, M. H. (n.d.). Neural Network Design.

Hinton, G., Srivastava, N., & Swersky, K. (2012). Neural Networks for Machine Learning Lecture. 2a. An overview of the main types of neural network architecture Feed-forward neural networks. Machine Learning, 32.

Horowitz, M. (2014). 1.1 Computing’s energy problem (and what we can do about it). Digest of Technical Papers - IEEE International Solid-State Circuits Conference, 57, 10–14. https://doi.org/10.1109/ISSCC.2014.6757323

Indiveri, G., Linares-Barranco, B., Legenstein, R., Deligeorgis, G., & Prodromakis, T. (2013). Integration of nanoscale memristor synapses in neuromorphic computing architectures. https://doi.org/10.1088/0957-4484/24/38/384010

Kakkar, V. (2009). Comparative Study on Analog and Digital Neural Networks. International Journal of Computer Science and Network Security, 9(7), 14–21. Retrieved from http://paper.ijcsns.org/07_book/200907/20090703.pdf

Khalil, R. A., & Al-kazzaz, S. A. (n.d.). Digital Hardware Implementation of Artificial Neurons Models Using FPGA. University of Mosul, (2008), 12–24.

Lozito, G. M., Laudani, A., Riganti-Fulginei, F., & Salvini, A. (2014). FPGA implementations of feed forward neural network by using floating point hardware accelerators. Advances in Electrical and Electronic Engineering, 12(1), 30–39. https://doi.org/10.15598/aeee.v12i1.831

Macukow, B. (2016). Computer Information Systems and Industrial Management, 9842, 3–14. https://doi.org/10.1007/978-3-319-45378-1

Michael Marsalli. (n.d.). McCulloch-Pitts Neurons (Overview). Retrieved October 29, 2017, from http://www.mind.ilstu.edu/curriculum/modOverview.php?modGUI=212

Misra, J., & Saha, I. (2010). Artificial neural networks in hardware: A survey of two decades of progress. Neurocomputing, 74(1–3), 239–255. https://doi.org/10.1016/j.neucom.2010.03.021

Moore, G. E. (1975). Cramming more components onto integrated circuits, 38(8).

Santhiya, V., & Mathan, N. (2015). Review on Performance of Static Random Access Memory ( SRAM ), 4(2), 403–406. https://doi.org/10.17148/IJARCCE.2015.4291

Savran, A., & Ünsal, S. (n.d.). HARDWARE IMPLEMENTATION OF A FEEDFORWARD NEURAL NETWORK USING FPGAs, 3–6.

Taigman, Y., Yang, M., & Ranzato, M. A. (2014). DeepFace: Closing the gap to human-level performance in face verification. CVPR IEEE Conference, 1701–1708. https://doi.org/10.1109/CVPR.2014.220

Won, E. (2007). A Hardware Implementation of Artificial Neural Network Using Field Programmable Gate Arrays. Proceedings of the IEEE International Joint Conference on Neural Network, 41(3), 13. https://doi.org/10.7763/IJCTE.2013.V5.795

Yadav, N., Yadav, A., & Kumar, M. (2015). An Introduction to Neural Network Methods for Differential Equations, 13–16. https://doi.org/10.1007/978-94-017-9816-7

Yaldex. (n.d.). Game Development. Retrieved November 17, 2017, from http://www.yaldex.com/game-development/1592730043_ch20lev1sec5.html#ch20fig05

Adaptive Systems Lab, University of Aizu, Japan, URL: http://adaptive.u-aizu.ac.jp/

Ben Abdallah, A. (2017, April 22–24). Neuro-inspired adaptive manycore SoCs and applications [Invited talk]. International Conference on Control, Automation and Robotics, Nagoya, Japan.


Ben Abdallah, A. (2016, December 19–21). Adaptive SoCs for smart autonomous systems [Invited talk]. 17th International Conference on Sciences and Techniques of Automatic Control & Computer Engineering (STA2016), Sousse.

Ben Abdallah, A. (2014, December 21–23). Si-photonics technology towards fJ/bit optical communication in many-core chips [Invited talk]. 15th International Conference on Sciences and Techniques of Automatic Control & Computer Engineering, Hammamet.

Dang, K. N., Ben Ahmed, A., Okuyama, Y., & Ben Abdallah, A. (2017). Scalable design methodology and online algorithm for TSV-cluster defects recovery in highly reliable 3D-NoC systems. IEEE Transactions on Emerging Topics in Computing, Special Issue on Reliability-aware Design and Analysis Methods for Digital Systems: From Gate to System Level (in press). https://doi.org/10.1109/TETC.2017.2762407

Dang, K. N., Ben Ahmed, A., Tran, X.-T., Okuyama, Y., & Ben Abdallah, A. (2017). A comprehensive reliability assessment of fault-resilient network-on-chip using analytical model. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(11), 3099–3112. https://doi.org/10.1109/TVLSI.2017.2736004

Ben Ahmed, A. (2015). High-throughput architecture and routing algorithms towards the design of reliable mesh-based many-core network-on-chip systems (Ph.D. thesis). Graduate School of Computer Science and Engineering, University of Aizu.

Mori, K. (2012). OASIS network-on-chip prototyping on FPGA (Master’s thesis, Ref. 19KM-MT11). Graduate School of Computer Science and Engineering, The University of Aizu.

Ben Ahmed, A. (2012). On the design of a 3D network-on-chip for many-core SoC (Master’s thesis). The University of Aizu.


CERTIFICATION

Ogbodo Mark Ikechukwu

ALL RIGHTS RESERVED

ABSTRACT

ACKNOWLEDGEMENT

DEDICATION

Table of Contents

List of Figures

Figure 1: Mathematical Model of an Artificial Neuron ... 3

Figure 2: Artificial Neural Network (ANN) ... 4

Figure 3: Supervised Learning ... 5

Figure 4: Unsupervised Learning ... 6

Figure 5: Reinforcement Learning ... 7

Figure 6: ANN Implementation Models ... 12

Figure 7: System Architecture ... 13

Figure 8: Block Diagram of SRAM (Cs & We, n.d.) ... 14

Figure 9: SRAM Read/Write ... 15

Figure 10: Pattern Recognizer ANN ... 17

Figure 11: Block Diagram of a Single Neuron ... 18

Figure 12: Sigmoid Activation Function ... 18

Figure 14: System schematic design ... 20

Figure 15: Schematic design of the ANN, SRAM, Controls and ALU ... 21

Figure 16: Schematic design of the SRAM ... 22

Figure 17: Schematic design of the ANN ... 23

Figure 18: Schematic design of the ALU ... 24

Figure 19: Schematic design of the Control Switches ... 25

Figure 20: Schematic design of the Display ... 26

Figure 21: Connecting the FPGA to the computer with the USB-Blaster ... 27

Figure 22: Training the ANN ... 28

Figure 23: Recognizing Pattern for letter A ... 29

Figure 24: Relative Energy Cost (Horowitz, 2014) ... 32

Figure 25: Relative Area Cost (Dally, 2016) ... 33

List of Tables

Table 1: SRAM Read/Write Operation ... 16

Table 2: System evaluation ... 29

Table 3: Data representation format ... 30

Table 4: Energy and Area Cost of Operations (Mark, 2014), (Song, 2016) ... 31

Chapter 1

1.1 Introduction

1.2 Statement of Problem

1.3 Biological Neuron

1.4 Artificial Neuron

1.5 Artificial Neural Network (ANN)

1.6 ANN Architectures

1.7 Learning Algorithms

1.7.1 Supervised learning

1.7.2 Unsupervised learning

1.7.3 Reinforcement learning

1.8 Research Objectives

1.9 Organization of Work

Chapter 2

2.1 Brief History

2.2 Applications of ANN

2.3 Other Related Works

Chapter 3

3.1 System Emulation

3.2 System Architecture

3.2.1 SRAM

3.2.2 The Pattern Recognition Artificial Neural Network

3.3 Learning Process

3.4 Schematic design of the system

The schematic design of the entire system shows how its components (SRAM, ANN, ALU, control switches, and display) are interconnected.

3.5 Schematic design of individual components

Chapter 4

4.1 System Testing

4.2 System Evaluation

The system is evaluated on two metrics: power consumption and FPGA resource utilization.

Total logic elements: 11,392 / 33,216 (34%)
Total combinational functions: 10,989 / 33,216 (33%)
Dedicated logic registers: 5,814 / 33,216 (18%)
Total registers: 5,814
Total memory bits: 4,956 / 483,840 (1%)
Embedded multiplier 9-bit elements: 54 / 70 (77%)

Total thermal power dissipation: 286.84 mW
Core dynamic thermal power dissipation: 104.41 mW
Core static thermal power dissipation: 80.52 mW
I/O thermal power dissipation: 101.91 mW
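The short Python sketch below offers a quick consistency check on the evaluation figures above. It recomputes each utilization percentage from the used and available counts, then sums the three thermal power components. The dictionary names and helper functions are illustrative only and are not part of the design. The numbers are copied from the report. Rounding to the nearest whole percent reproduces the reported values: for example, 5,814 / 33,216 is about 17.5%, which is reported as 18%. The core dynamic, core static, and I/O components add up to the stated 286.84 mW total.

```python
# Used/available counts copied from the resource report above.
RESOURCES = {
    "Total logic elements": (11_392, 33_216),
    "Total combinational functions": (10_989, 33_216),
    "Dedicated logic registers": (5_814, 33_216),
    "Total memory bits": (4_956, 483_840),
    "Embedded multiplier 9-bit elements": (54, 70),
}

# Thermal power components in milliwatts, copied from the power report above.
POWER_MW = {
    "core_dynamic": 104.41,
    "core_static": 80.52,
    "io": 101.91,
}


def utilization(used: int, available: int) -> int:
    """Return utilization as a whole-number percentage (rounded to nearest)."""
    return round(100 * used / available)


def total_power(components: dict) -> float:
    """Sum the component dissipations, rounded to 0.01 mW."""
    return round(sum(components.values()), 2)


if __name__ == "__main__":
    for name, (used, available) in RESOURCES.items():
        print(f"{name}: {used:,} / {available:,} ({utilization(used, available)}%)")
    print(f"Total thermal power: {total_power(POWER_MW)} mW")
```

The check shows that the embedded 9-bit multipliers (77%) are the scarcest resource in this design. Logic elements stay near one third of the device.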

Chapter 5

5.1 Conclusion

5.2 Future Work

The three layers of the ANN are implemented with 32-bit single-precision floating-point numbers.

To reduce area and energy cost, I propose using 8-bit integers (int8) instead.
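A minimal sketch of a single neuron can illustrate the int8 proposal. It is written in Python for readability and is not a hardware description. The sketch assumes symmetric per-tensor linear quantization: each value v is mapped to round(v / scale), clamped to [-127, 127], with scale = max|v| / 127. This is one common scheme; the thesis does not specify which scheme it would use. The function names and the example inputs, weights, and bias are hypothetical. The weighted sum is computed with integer multiply-accumulate, rescaled once by the product of the two scales, and then passed through the sigmoid. The result can then be compared with the floating-point neuron.

```python
import math


def quantize_int8(values):
    """Symmetric per-tensor quantization: v ~= q * scale, q in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_float(x, w, b):
    """Floating-point reference neuron: sigmoid(w . x + b)."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)


def neuron_int8(x, w, b):
    """Same neuron with int8 inputs and weights and an integer accumulator."""
    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)
    acc = sum(a * c for a, c in zip(qx, qw))  # integer multiply-accumulate
    z = acc * sx * sw + b                     # one rescale back to real values
    return sigmoid(z)


# Hypothetical example: binary pixel inputs, arbitrary weights and bias.
EXAMPLE_X = [0.0, 1.0, 1.0, 0.0, 1.0]
EXAMPLE_W = [0.42, -0.73, 0.15, 0.91, -0.08]
EXAMPLE_B = 0.1

if __name__ == "__main__":
    print("float:", neuron_float(EXAMPLE_X, EXAMPLE_W, EXAMPLE_B))
    print("int8: ", neuron_int8(EXAMPLE_X, EXAMPLE_W, EXAMPLE_B))
```

In hardware, each int8 × int8 product fits in 16 bits. The running sum would typically be kept in a wider accumulator, for example 32 bits. The bias is left in floating point here only to keep the sketch short.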