

ISSN(Online): 2319-8753 ISSN (Print): 2347-6710

International Journal of Innovative Research in Science, Engineering and Technology

(A High Impact Factor, Monthly, Peer Reviewed Journal)

Visit: www.ijirset.com

Vol. 8, Issue 10, October 2019

Structure of Neural Network and Mathematics behind It

Mukul Kumar

B-Tech 3rd Year, Department of Civil Branch, National Institute of Technology Andhra Pradesh, Andhra Pradesh, India

ABSTRACT: An Artificial Neural Network (ANN) is designed to mimic the nervous system. ANNs are composed of multiple layers of perceptrons that work together to learn, recognize patterns and predict data.

Weights are assigned to the input values; they scale the strength of the connections between nodes. An activation function is then applied, which, in the output layer, can give the probability that the input belongs to a particular class. Feed forward propagation passes the signal on to the next layer. Back propagation finds the error, and gradient descent minimizes that error by finding new weights.

KEYWORDS: Neural Network, Artificial Neural Network, Weight, Activation Function, Sigmoid Function, Tanh, ReLU, Leaky ReLU, Softmax Function, Back Propagation.

I. INTRODUCTION

A neural network is a set of algorithms that mimics the way the human brain behaves; neural networks are designed to recognize patterns.

Today neural networks are used to solve many business problems, such as sales forecasting, data validation and risk management.

The history of neural networks begins in the early 1940s, nearly simultaneously with the history of programmable electronic computers. As early as 1943, WARREN McCULLOCH and WALTER PITTS introduced models of neurological networks, recreated threshold switches based on neurons, and showed that even simple networks of this kind are able to calculate nearly any logic or arithmetic function.

The first artificial neural network, called the perceptron, was invented in 1958 by psychologist FRANK ROSENBLATT. It was intended to model how the human brain processes visual data and learns to recognize objects.

1) Weight:


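There is no fixed rule for which nodes to connect, so every node is connected to every node of the next layer. During the learning process, unneeded connections are de-emphasized by reducing their weights to nearly zero. The outputs of the second layer can be calculated with matrix (dot) multiplication, where $w_{ij}$ is the weight on the link from node $i$ of layer 1 to node $j$ of layer 2: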

Output from the first node of layer 2,

x1w11 + x2w21 + x3w31 = y1

Output from the second node of layer 2,

x1w12 + x2w22 + x3w32 = y2

Output from the third node of layer 2,

x1w13 + x2w23 + x3w33 = y3

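The above equations in matrix form:

$$\begin{bmatrix} w_{11} & w_{21} & w_{31} \\ w_{12} & w_{22} & w_{32} \\ w_{13} & w_{23} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$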

fig. 1
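A minimal pure-Python sketch, assuming made-up input and weight values, that checks the three scalar equations and the matrix product give the same layer-2 values. The helper names (`weighted_sums`, `transpose`, `matvec`) are hypothetical and chosen only for illustration; NumPy is avoided to keep the example self-contained.

```python
# Layer-2 values computed two ways, using illustrative (made-up) inputs and weights.
# w[i][j] holds w_ij: the weight on the link from input node i to node j of layer 2.

def weighted_sums(x, w):
    """Scalar form: y_j = x_1*w_1j + x_2*w_2j + x_3*w_3j."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    """Matrix-vector product m . v (each output is one row of m dotted with v)."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]

x = [0.9, 0.1, 0.8]
w = [[0.9, 0.2, 0.1],   # w11 w12 w13
     [0.3, 0.8, 0.5],   # w21 w22 w23
     [0.4, 0.2, 0.6]]   # w31 w32 w33

# The matrix in the product has rows (w11 w21 w31), (w12 w22 w32), (w13 w23 w33),
# which is the transpose of the table w above.
W = transpose(w)

y_scalar = weighted_sums(x, w)
y_matrix = matvec(W, x)
print([round(v, 4) for v in y_scalar])  # prints [1.16, 0.42, 0.62]
print([round(v, 4) for v in y_matrix])  # prints [1.16, 0.42, 0.62]
```

Each row of the matrix collects the weights entering one node of layer 2, which is why the product reproduces $y_1 = x_1 w_{11} + x_2 w_{21} + x_3 w_{31}$ and so on.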


2) Activation function:

It introduces a non-linear property into the neural network: it squeezes the input into an output that has a non-linear relation to it, and it also decides whether a neuron should be activated or not.

There are many types of activation function, and one can be chosen according to need, but the sigmoid function is generally used.

2.1) Sigmoid Function:

The sigmoid function is also known as the logistic function. It is used in models where the output must lie between 0 and 1, such as models that need to predict a probability. When the input is less than zero, the output lies in the range (0, 0.5); when the input is greater than or equal to zero, the output lies in the range [0.5, 1). It never outputs exactly 0 or 1, because it only approaches 0 as x → −∞ and 1 as x → +∞.

Equation of sigmoid function,

$$y = \frac{1}{1 + e^{-x}}$$

where e is Euler's number (approximately 2.71828), x is the input value and y is the output value.

#code in python
import scipy.special
sigmoid = lambda x: scipy.special.expit(x)
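The equation can also be checked without SciPy. Below is a minimal standard-library sketch (the function name `sigmoid` is our own choice) that evaluates $1/(1+e^{-x})$ and confirms the range behaviour described above. Written naively, `math.exp(-x)` raises `OverflowError` for large negative x, so the sketch uses the algebraically equivalent form $e^{x}/(1+e^{x})$ when x is negative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid y = 1 / (1 + e^(-x)).

    Both branches compute the same value; the split only avoids calling
    math.exp on a large positive number, which would raise OverflowError."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)          # x < 0, so z is in (0, 1)
    return z / (1.0 + z)

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    # Negative inputs land in (0, 0.5), non-negative inputs in [0.5, 1)
    print(x, round(sigmoid(x), 4))
```

In floating point, very large |x| still rounds the result to exactly 0.0 or 1.0, even though the mathematical function never reaches those values.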


2.2) Tanh (Hyperbolic Tangent) Function:

Tanh is similar to the logistic sigmoid, but its range is from −1 to +1. It is mainly used for classification between two classes.

Equation of tanh function,

$$y = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

#code in python
import numpy as np
tanh = lambda x: np.tanh(x)
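To make the link between tanh and the logistic sigmoid concrete, here is a small standard-library sketch (helper names are hypothetical) that evaluates the exponential formula above, compares it with `math.tanh`, and checks the identity $\tanh(x) = 2\,\text{sigmoid}(2x) - 1$, i.e. tanh is a sigmoid stretched to the range (−1, 1).

```python
import math

def tanh_from_exp(x):
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x), written exactly as in the equation.
    Fine for moderate x; math.exp overflows for |x| above about 709."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # Columns: the formula, the library tanh, and 2*sigmoid(2x) - 1 (a rescaled sigmoid)
    print(x, round(tanh_from_exp(x), 4), round(math.tanh(x), 4), round(2 * sigmoid(2 * x) - 1, 4))
```

For real use, the library function (`math.tanh` or `np.tanh`) is preferable to the written-out formula because it handles large inputs without overflow.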

2.3) ReLU (Rectified Linear Unit):

ReLU is a half-rectified function. It neutralizes every input that is less than zero, giving an output of zero (y = 0), and for inputs greater than zero it passes the input through unchanged:

$$y = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases}$$

or, equivalently, $y = \max(0, x)$.
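The piecewise definition translates directly into code. This is a minimal sketch with hypothetical function names; the derivative is included because back propagation uses it, and its value of zero for every negative input is what can leave a ReLU unit permanently inactive. The value chosen at exactly x = 0 is a convention, not something the definition fixes.

```python
def relu(x):
    """ReLU: 0 for x <= 0, x for x > 0, i.e. max(0, x)."""
    return max(0.0, x)

def relu_derivative(x):
    """Slope used in back propagation: 0 for x <= 0 and 1 for x > 0
    (the value at exactly x = 0 is a convention; 0 is used here)."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(x) for x in inputs])             # prints [0.0, 0.0, 0.0, 0.5, 2.0]
print([relu_derivative(x) for x in inputs])  # prints [0.0, 0.0, 0.0, 1.0, 1.0]
```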


2.4) Leaky ReLU:

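Leaky ReLU is one attempt to fix the "dying ReLU" problem: a ReLU neuron whose input stays negative always outputs zero and receives a zero gradient, so it stops learning. Leaky ReLU instead provides a small non-zero slope for x < 0. Equation of the Leaky ReLU function:

$$y = \max(0, x) + \alpha \cdot \min(0, x)$$

where α is the negative slope.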

#code in python
import tensorflow as tf
y = tf.nn.leaky_relu(features, alpha=0.01, name=None)

Here features holds the x values and alpha is the negative slope.
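A framework-free sketch of the same function, written exactly as max(0, x) + α·min(0, x). The names are hypothetical; alpha = 0.01 is used only to match the TensorFlow call above (note that `tf.nn.leaky_relu` itself defaults to alpha = 0.2). The derivative shows why the unit keeps learning for negative inputs: its slope there is α rather than 0.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU written as in the equation: max(0, x) + alpha * min(0, x).
    alpha is the negative slope."""
    return max(0.0, x) + alpha * min(0.0, x)

def leaky_relu_derivative(x, alpha=0.01):
    """Slope used in back propagation: alpha for x < 0, 1 for x > 0
    (x = 0 is assigned alpha here, an arbitrary convention)."""
    return 1.0 if x > 0 else alpha

inputs = [-10.0, -1.0, 0.0, 1.0, 10.0]
print([leaky_relu(x) for x in inputs])
print([leaky_relu_derivative(x) for x in inputs])
```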

2.5) Softmax Function:

It is a generalization of the sigmoid function used for multiclass classification, and it is generally used in the output layer.

The softmax function transforms a set of arbitrarily large or small numbers into a valid probability distribution, meaning that the outputs sum to 1.

Each output of the softmax function lies in the range (0, 1). Unlike the functions above, it cannot be drawn as a single curve, because each output depends on all of the inputs. Equation of the softmax function:

$$\text{softmax}(a)_i = \frac{e^{a_i}}{\sum_j e^{a_j}}$$

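Let the input values to the output layer, before applying any activation function, be

[0.5, 1.5, −0.7]

Then

$$\sum_j e^{a_j} = e^{0.5} + e^{1.5} + e^{-0.7} \approx 6.63$$

and dividing each $e^{a_i}$ by this sum (for example, $e^{0.5}/6.63 \approx 1.65/6.63 \approx 0.25$) gives the output-layer values after the activation function: [0.25, 0.68, 0.07].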

#code in python
import numpy as np
import tensorflow as tf

#using numpy
y = np.exp(x) / np.sum(np.exp(x), axis=0)

#using tensorflow
y = tf.nn.softmax(x)
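The worked example can be reproduced with a short standard-library sketch (the function name is hypothetical). It subtracts the largest input before exponentiating, a common design choice: the constant cancels in the ratio, so the result is unchanged, but `exp` no longer overflows for large inputs.

```python
import math

def softmax(a):
    """Softmax: exp(a_i) / sum_j exp(a_j).
    Subtracting max(a) first does not change the result (it cancels in the ratio)
    but keeps exp() from overflowing for large inputs."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

a = [0.5, 1.5, -0.7]
print(round(sum(math.exp(v) for v in a), 2))  # denominator from the worked example; prints 6.63
p = softmax(a)
print([round(v, 2) for v in p])               # prints [0.25, 0.68, 0.07]
print(round(sum(p), 6))                       # prints 1.0
```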


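3) Feed Forward Propagation:

From fig. 1, let

$$X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad W_{\text{input\_hidden}} = \begin{bmatrix} w_{11} & w_{21} & w_{31} \\ w_{12} & w_{22} & w_{32} \\ w_{13} & w_{23} & w_{33} \end{bmatrix}$$

Then

$$X_{\text{hidden}} = W_{\text{input\_hidden}} \cdot X = \begin{bmatrix} x_1 w_{11} + x_2 w_{21} + x_3 w_{31} \\ x_1 w_{12} + x_2 w_{22} + x_3 w_{32} \\ x_1 w_{13} + x_2 w_{23} + x_3 w_{33} \end{bmatrix}$$

$$O_{\text{hidden}} = \text{activation function}(X_{\text{hidden}})$$

$O_{\text{hidden}}$, the output of the hidden layer, acts as the input to the output layer.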
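A minimal sketch of the full forward pass through one hidden layer and an output layer, assuming the sigmoid as the activation function (the one the paper says is generally used) and illustrative weight values. The function and variable names are hypothetical. Each matrix row holds the weights entering one node of the next layer, matching the layout of $W_{\text{input\_hidden}}$ above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Matrix-vector product m . v."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]

def feed_forward(x, w_input_hidden, w_hidden_output):
    """X_hidden = W_input_hidden . X, O_hidden = sigmoid(X_hidden);
    O_hidden then feeds the output layer in the same way."""
    x_hidden = matvec(w_input_hidden, x)
    o_hidden = [sigmoid(v) for v in x_hidden]
    x_output = matvec(w_hidden_output, o_hidden)
    o_output = [sigmoid(v) for v in x_output]
    return x_hidden, o_hidden, o_output

# Illustrative values. Row j of each matrix holds the weights entering node j of the
# next layer, e.g. the first row of w_input_hidden is (w11, w21, w31).
x = [0.9, 0.1, 0.8]
w_input_hidden = [[0.9, 0.3, 0.4],
                  [0.2, 0.8, 0.2],
                  [0.1, 0.5, 0.6]]
w_hidden_output = [[0.3, 0.7, 0.5],
                   [0.6, 0.5, 0.2],
                   [0.8, 0.1, 0.9]]

x_hidden, o_hidden, o_output = feed_forward(x, w_input_hidden, w_hidden_output)
print([round(v, 2) for v in x_hidden])  # prints [1.16, 0.42, 0.62]
print([round(v, 3) for v in o_output])
```

Note that the output layer receives $O_{\text{hidden}}$ (after the activation), not the raw weighted sums $X_{\text{hidden}}$.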

4) Back Propagation:

Error = target − actual output


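Each output error is split among the links that feed that output node, in proportion to their weights. For two hidden nodes and two output nodes with errors $e_1$ and $e_2$:

$$e_{\text{hidden}} = \begin{bmatrix} \frac{w_{11}}{w_{11}+w_{21}} & \frac{w_{12}}{w_{12}+w_{22}} \\ \frac{w_{21}}{w_{11}+w_{21}} & \frac{w_{22}}{w_{12}+w_{22}} \end{bmatrix} \cdot \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

The denominators in this weight matrix are just normalizing factors. Removing them does not change the idea of distributing the error according to the weights; only the scaling is lost.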

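Without the denominators, this weight matrix is the transpose of the feed forward weight matrix, where the feed forward matrix is

$$W_{\text{hidden\_output}} = \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{bmatrix}$$

so it can be written as

$$e_{\text{hidden}} = W_{\text{hidden\_output}}^{T} \cdot e_{\text{output}}$$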

Similarly,

$$e_{\text{input}} = W_{\text{input\_hidden}}^{T} \cdot e_{\text{hidden}}$$
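The two ways of sending the output error back to the hidden layer can be compared in a short sketch. The weights and errors are illustrative, the function names are hypothetical, and the normalized version assumes the weights entering each output node have a non-zero sum. Because each output error is split into fractions that add up to 1, the normalized version conserves the total error; dropping the denominators keeps the transpose structure but changes the scale.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matvec(m, v):
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]

def hidden_errors_normalized(w_hidden_output, e_output):
    """Split each output error among the links entering that output node,
    in proportion to the link weights (the matrix with denominators).
    w_hidden_output[k][j] is the weight from hidden node j to output node k."""
    e_hidden = [0.0] * len(w_hidden_output[0])
    for k, row in enumerate(w_hidden_output):
        total = sum(row)                  # e.g. w11 + w21 for output node 1
        for j, w in enumerate(row):
            e_hidden[j] += e_output[k] * w / total
    return e_hidden

def hidden_errors(w_hidden_output, e_output):
    """Denominators dropped: e_hidden = W_hidden_output^T . e_output."""
    return matvec(transpose(w_hidden_output), e_output)

# Illustrative 2x2 example: w11 = 2, w21 = 3, w12 = 1, w22 = 4; e1 = 0.8, e2 = 0.5.
w_hidden_output = [[2.0, 3.0],   # w11 w21 (weights entering output node 1)
                   [1.0, 4.0]]   # w12 w22 (weights entering output node 2)
e_output = [0.8, 0.5]
print(hidden_errors_normalized(w_hidden_output, e_output))
print(hidden_errors(w_hidden_output, e_output))
```

The same `hidden_errors` call with $W_{\text{input\_hidden}}$ and $e_{\text{hidden}}$ gives $e_{\text{input}}$.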

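The weights are not updated by brute force (trying every possible combination of weight values), which is impractical for a network with many weights; instead, they are updated by gradient descent.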

5) Gradient Descent:

Gradient descent is an optimization algorithm used to find the minimum of a function by moving iteratively in the direction of steepest descent. Here it is used to find the minimum error.

If the slope of the error curve is negative, the weight is increased; if the slope is positive, the weight is decreased. This moves the weight toward the bottom of the curve, where the error is minimum and

$$\frac{dE}{dw} = 0$$
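The slope rule can be seen on a one-weight example. The sketch below uses a hypothetical error curve $E(w) = (w - 3)^2$, whose minimum is at w = 3, and repeats $w_{\text{new}} = w_{\text{old}} - \alpha \, dE/dw$ from a starting point on each side of the minimum. The learning rate and step count are arbitrary choices for illustration.

```python
def gradient_descent_1d(dE_dw, w, alpha=0.1, steps=100):
    """Repeat w_new = w_old - alpha * dE/dw.
    A negative slope makes w larger and a positive slope makes w smaller,
    so w moves toward the bottom of the curve where dE/dw = 0."""
    for _ in range(steps):
        w = w - alpha * dE_dw(w)
    return w

# Hypothetical error curve E(w) = (w - 3)^2, minimum at w = 3, so dE/dw = 2 * (w - 3).
def slope(w):
    return 2.0 * (w - 3.0)

print(round(gradient_descent_1d(slope, w=0.0), 4))  # starts left of the minimum (negative slope); prints 3.0
print(round(gradient_descent_1d(slope, w=6.0), 4))  # starts right of the minimum (positive slope); prints 3.0
```

With too large a learning rate the updates can overshoot the minimum, which is why α is kept small.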

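The gradient descent equations for the sigmoid function are

$$\frac{\partial E}{\partial w} = -E_k \cdot \text{sigmoid}(O_k) \cdot \left(1 - \text{sigmoid}(O_k)\right) \cdot O_i$$

$$w_{\text{new}} = w_{\text{old}} - \alpha \, \frac{\partial E}{\partial w}$$

$$\Delta w = \alpha \cdot E_k \cdot \text{sigmoid}(O_k) \cdot \left(1 - \text{sigmoid}(O_k)\right) \cdot O_i^{T}$$

$$w_{\text{new}} = w_{\text{old}} + \Delta w$$

where $E_k$ is the error (target − actual output) at output node $k$, $O_k$ is the combined (weighted-sum) input to node $k$, $O_i$ is the output of node $i$ in the previous layer, and α is the learning rate. For a squared-error cost the derivative carries a constant factor of 2, which is absorbed into α.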
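A minimal sketch of this update rule for the weights entering a sigmoid output layer. The layer sizes, weights, targets and learning rate are illustrative assumptions, and the function names are hypothetical; `squared_error` is included only to show that the error goes down after each step.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_weights(w, o_prev, targets, alpha=0.1):
    """One gradient-descent step for the weights entering a sigmoid output layer.

    w[k][i]    weight from node i of the previous layer to output node k
    o_prev[i]  O_i, the output of node i in the previous layer
    Each weight changes by alpha * E_k * sigmoid(O_k) * (1 - sigmoid(O_k)) * O_i,
    where O_k is the weighted sum entering output node k and
    E_k = target_k - actual output_k."""
    new_w = []
    for k, row in enumerate(w):
        o_k = sum(w_ki * o_i for w_ki, o_i in zip(row, o_prev))
        s = sigmoid(o_k)                 # actual output of node k
        e_k = targets[k] - s             # E_k = target - actual output
        new_w.append([w_ki + alpha * e_k * s * (1.0 - s) * o_i
                      for w_ki, o_i in zip(row, o_prev)])
    return new_w

def squared_error(w, o_prev, targets):
    """Sum over output nodes of (target - actual output)^2, used only to watch progress."""
    outputs = [sigmoid(sum(w_ki * o_i for w_ki, o_i in zip(row, o_prev))) for row in w]
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Illustrative values: 3 previous-layer outputs, 2 output nodes.
w = [[0.3, 0.7, 0.5],
     [0.6, 0.5, 0.2]]
o_prev = [0.76, 0.60, 0.65]
targets = [0.99, 0.01]
for step in range(3):
    print(step, round(squared_error(w, o_prev, targets), 4))
    w = update_weights(w, o_prev, targets, alpha=0.1)
```

The sign follows the equations above: when the target exceeds the actual output ($E_k > 0$) and $O_i > 0$, the weight increases.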


II. CONCLUSION

The proposed neural network architecture is able to classify character patterns to the same degree, but it has difficulty classifying unknown samples.

REFERENCES

1. Bishop, C. (1999). Pattern recognition and feed-forward networks. In MIT Encyclopedia of the Cognitive Sciences, R. A. Wilson and F. C. Keil (eds), MIT Press, 629-632.

2. Ekpenyong, M. and Bello, M. (2002). A neural computing approach to pattern recognition of digits using bipolar inputs. Journal of Natural and Applied Science: 79-88.

3. Anastasio, T. (2002). Neural network learning. http://www.cs.rtu.lv/dssg/download/publications/2002/pchelkin-EROAT-2002.pdf

4. Gurney, K. (1997). An introduction to neural networks. CRC Press, London.

5. Kim, C.; Govindaraju, V. (1997). A lexicon driven approach to handwriting word recognition for real time application. IEEE PAMI 18, no. 4: 366-379.

6. Tang, Y.; Lee, S.; Suen, C. (1996). Automatic document processing: a survey. Pattern Recognition 29, 12: 1931-1952.

7. Machine learning: detecting malicious domains with TensorFlow. The coruscan project.

8. McCulloch, Warren; Pitts, Walter (1943). "A logical calculus of ideas immanent in nervous activity". Bulletin of Mathematical Biophysics. 5(4): 115-133. doi:10.1007/BF02478259.

9. Dreyfus, Stuart E. (1990). "Artificial neural networks, back propagation, and the Kelley-Bryson gradient procedure". Journal of Guidance, Control and Dynamics. 13(5): 926-928. Bibcode:1990JGCD...13...926D. doi:10.2514/3.25422. ISSN 0731-5090.

10. Li, Y.; Fu, Y.; Li, H.; Zhang, S.W.(2009).The improved training algorithm of Back propagation neural networks with self adaptive learning