ISSN(Online): 2319-8753 ISSN (Print): 2347-6710
International Journal of Innovative Research in Science, Engineering and Technology
(A High Impact Factor, Monthly, Peer Reviewed Journal)
Visit: www.ijirset.com
Vol. 8, Issue 10, October 2019
Structure of Neural Network and
Mathematics behind It
Mukul Kumar
B-Tech 3rd Year, Department of Civil Branch, National Institute of Technology Andhra Pradesh,
Andhra Pradesh, India
ABSTRACT: An Artificial Neural Network (ANN) is designed to mimic the nervous system. ANNs are composed of multiple layers of perceptrons that work together to learn, recognize patterns and predict data.
Weights are assigned to the input values; they amplify or dampen the connections between nodes. An activation function is then applied, which decides the probability that the input belongs to a particular class. Feed forward propagation passes the signal on to the next layer. Back propagation finds the error, and gradient descent minimizes this error by finding new weights.
KEYWORDS: Neural Network, Artificial Neural Network, Weight, Activation Function, Sigmoid Function, Tanh, ReLU, Leaky ReLU, Softmax Function, Back Propagation.
I. INTRODUCTION
A neural network is a set of algorithms, modelled on the way the human brain behaves, that is designed to recognize patterns.
Today neural networks are used to solve many business problems, such as sales forecasting, data validation and risk management.
The history of neural networks begins in the early 1940s, nearly simultaneously with the history of programmable electronic computers. As early as 1943, WARREN McCULLOCH and WALTER PITTS introduced models of neurological networks, recreated threshold switches based on neurons, and showed that even simple networks of this kind are able to calculate nearly any logic or arithmetic function.
The first artificial neural network, called the perceptron, was invented in 1958 by psychologist FRANK ROSENBLATT. It was intended to model how the human brain processes visual data and learns to recognize objects.
1)WEIGHT:
There is no fixed rule for which nodes should be connected, so each node is connected to every node of the next layer; during the learning process, unnecessary connections are de-emphasized by driving their weights close to zero. The output of the second layer can be calculated with matrix (dot) multiplication:
Output from the first node of layer 2,
x1w11 + x2w21 + x3w31 = y1
Output from the second node of layer 2,
x1w12 + x2w22 + x3w32 = y2
Output from the third node of layer 2,
x1w13 + x2w23 + x3w33 = y3
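The above equations in matrix form, where w_jk is the weight of the link from node j of layer 1 to node k of layer 2:
$$ \begin{bmatrix} w_{11} & w_{21} & w_{31} \\ w_{12} & w_{22} & w_{32} \\ w_{13} & w_{23} & w_{33} \end{bmatrix} \cdot \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} $$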
fig. 1
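As a quick check of the matrix form, the sketch below computes the three weighted sums y1, y2, y3 both term by term and with a single NumPy matrix product. The input and weight values are made up for illustration; `w[j, k]` holds the weight from input node j+1 to layer-2 node k+1 (0-based indexing). For y_k to match the equations, the matrix multiplying the input vector must have rows (w1k, w2k, w3k), i.e. it is the transpose of this table.

```python
import numpy as np

# Hypothetical example values: three inputs and a 3x3 table of weights
x = np.array([0.9, 0.1, 0.8])            # x1, x2, x3
w = np.array([[0.9, 0.2, 0.1],           # w11, w12, w13
              [0.3, 0.8, 0.5],           # w21, w22, w23
              [0.4, 0.2, 0.6]])          # w31, w32, w33

# Term by term, exactly as in the equations: y_k = x1*w1k + x2*w2k + x3*w3k
y_explicit = np.array([sum(x[j] * w[j, k] for j in range(3)) for k in range(3)])

# Matrix form: the matrix with rows (w1k, w2k, w3k) is the transpose of w
y_matrix = w.T @ x

print(y_explicit)
print(y_matrix)
```

Both lines print the same vector, with y1 = 0.9·0.9 + 0.1·0.3 + 0.8·0.4 = 1.16.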
2) Activation function:
An activation function introduces non-linearity into the neural network. It squashes the weighted input into an output that is a non-linear function of that input, and it decides whether (and how strongly) a neuron is activated.
There are many types of activation function, chosen according to the need; the sigmoid function is a common choice.
2.1)Sigmoid Function:
The sigmoid function is also known as the logistic function. It is used in models whose output must lie between 0 and 1, for example when the model has to predict a probability. For inputs less than zero it gives an output in the range (0, 0.5); for inputs greater than or equal to zero it gives an output in the range [0.5, 1). It never outputs exactly 0 or 1, because it only approaches 0 as x → −∞ and 1 as x → +∞.
Equation of the sigmoid function:
$$ y = \frac{1}{1 + e^{-x}} $$
where e is a constant with value 2.7182…, x is the input value and y is the output value.
#code in python
import scipy.special
sigmoid = lambda x: scipy.special.expit(x)  # expit(x) = 1 / (1 + exp(-x))
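To connect the formula with the library call, the sketch below writes the sigmoid out directly with NumPy, checks it against SciPy's `scipy.special.expit`, and confirms the output ranges stated above. The sample inputs are arbitrary.

```python
import numpy as np
from scipy.special import expit

def sigmoid(x):
    """Logistic sigmoid y = 1 / (1 + e^(-x)), written out from the formula."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
y = sigmoid(x)

# Agrees with SciPy's built-in logistic function
print(np.allclose(y, expit(x)))   # True
# Negative inputs land in (0, 0.5); non-negative inputs land in [0.5, 1)
print(y[:2], y[2:])
```

Written this way, `np.exp(-x)` can overflow (with a warning) for very large negative inputs, which is one reason a library routine such as `expit` is usually preferred.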
2.2) Tanh or Hyperbolic Tangent Function:
Tanh is similar to the logistic sigmoid, but its range is from -1 to +1. It is mainly used for classification between two classes.
Equation of the tanh function:
$$ y = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $$
#code in python
import numpy as np
x = np.array([-2.0, 0.0, 2.0])  # example input values
y = np.tanh(x)
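The tanh formula can be checked the same way. The sketch below evaluates it from the exponentials, compares it with `np.tanh`, and also checks the identity tanh(x) = 2·sigmoid(2x) - 1, which is the precise sense in which tanh is a rescaled logistic sigmoid with range (-1, 1). The helper names are illustrative.

```python
import numpy as np

def tanh_from_formula(x):
    """tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), straight from the equation."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)

# Matches NumPy's built-in tanh
print(np.allclose(tanh_from_formula(x), np.tanh(x)))           # True
# tanh is a shifted and stretched sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0))   # True
```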
2.3)ReLU(Rectified Linear Unit):
ReLU is a half-rectified function. It outputs zero for every input less than or equal to zero (y = 0), and for inputs greater than zero it passes the input through unchanged (y = x):
$$ y = \begin{cases} 0 & x \le 0 \\ x & x > 0 \end{cases} $$
or, equivalently,
$$ y = \max(0, x) $$
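A minimal NumPy sketch of ReLU (the function name `relu` is just illustrative), showing that `np.maximum(0, x)` gives the same values as the piecewise definition:

```python
import numpy as np

def relu(x):
    """ReLU: 0 for x <= 0 and x for x > 0, i.e. max(0, x) element-wise."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x).tolist())   # [0.0, 0.0, 0.0, 0.5, 2.0]
# Same result as the piecewise definition
print(np.array_equal(relu(x), np.where(x > 0, x, 0.0)))   # True
```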
2.4) Leaky ReLU:
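Leaky ReLU is one attempt to fix the "dying ReLU" problem: a ReLU neuron whose input stays negative always outputs zero and has zero slope, so its weights stop updating. Leaky ReLU instead provides a small negative-side slope for x < 0. Equation of the Leaky ReLU function:
Leaky ReLU = max(0, x) + negative slope * min(0, x)
#code in python
import tensorflow as tf
features = tf.constant([-3.0, 0.0, 2.0])     # features are the x values
y = tf.nn.leaky_relu(features, alpha=0.01)   # alpha is the negative slope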
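The sketch below writes Leaky ReLU directly from its equation in NumPy and compares its slope with that of plain ReLU. The slope is what back propagation passes back through the neuron: ReLU's slope is 0 for every negative input, so a neuron whose input stays negative receives no weight updates, whereas Leaky ReLU keeps a small slope alpha. The value 0.01 matches the TensorFlow call above; the function names and the convention used for the slope at x = 0 are assumptions made for illustration.

```python
import numpy as np

ALPHA = 0.01  # negative slope, same value as in the TensorFlow call above

def leaky_relu(x, alpha=ALPHA):
    """max(0, x) + alpha * min(0, x), exactly as in the equation."""
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

def relu_slope(x):
    """Derivative of ReLU (taken as 0 at x = 0): 1 for x > 0, else 0."""
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu_slope(x, alpha=ALPHA):
    """Derivative of Leaky ReLU (taken as alpha at x = 0): 1 for x > 0, else alpha."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -1.0, 2.0])
print(leaky_relu(x).tolist())        # [-0.02, -0.01, 2.0]
print(relu_slope(x).tolist())        # [0.0, 0.0, 1.0]   no gradient for negative inputs
print(leaky_relu_slope(x).tolist())  # [0.01, 0.01, 1.0] small but non-zero gradient
```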
2.5)Softmax Function:
It is a generalization of the sigmoid function, used for multiclass classification, and is generally used in the output layer.
The softmax function transforms a set of arbitrarily large or small numbers into a valid probability distribution, meaning every output is positive and the outputs sum to 1.
Each output lies in the range (0, 1). Unlike the functions above, softmax cannot be drawn as a single curve, because each output depends on all of the inputs together. Equation of the softmax function:
$$ \mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}} $$
Let the values in the output layer before applying any activation function be [0.5, 1.5, -0.7]. Then
$$ \sum_{j} e^{z_j} = e^{0.5} + e^{1.5} + e^{-0.7} \approx 6.63 $$
The output layer values after applying softmax are [0.25, 0.68, 0.07].
#code in python
import numpy as np
import tensorflow as tf
x = np.array([0.5, 1.5, -0.7])
y_np = np.exp(x) / np.sum(np.exp(x), axis=0)  # using numpy
y_tf = tf.nn.softmax(x)                       # using tensorflow
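The worked example can be reproduced in a few lines of NumPy. This sketch subtracts the largest input before exponentiating; that is a common numerical safeguard (an addition not in the text) and does not change the result, because the common factor e^(-max) cancels between numerator and denominator.

```python
import numpy as np

def softmax(z):
    """e^(z_i) / sum_j e^(z_j), computed after shifting z by its maximum
    so that np.exp cannot overflow; the shift cancels in the ratio."""
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

z = np.array([0.5, 1.5, -0.7])               # output-layer values from the example
print(round(float(np.sum(np.exp(z))), 2))    # 6.63, the denominator in the example
p = softmax(z)
print(np.round(p, 2).tolist())               # [0.25, 0.68, 0.07]
print(np.isclose(np.sum(p), 1.0))            # True
```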
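3)Feed Forward Propagation:
From fig. 1, let
$$ X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad W_{\text{input\_hidden}} = \begin{bmatrix} w_{11} & w_{21} & w_{31} \\ w_{12} & w_{22} & w_{32} \\ w_{13} & w_{23} & w_{33} \end{bmatrix} $$
Then
$$ X_{\text{hidden}} = W_{\text{input\_hidden}} \cdot X $$
$$ O_{\text{hidden}} = \text{activation function}(X_{\text{hidden}}) $$
where
$$ X_{\text{hidden}} = \begin{bmatrix} x_1 w_{11} + x_2 w_{21} + x_3 w_{31} \\ x_1 w_{12} + x_2 w_{22} + x_3 w_{32} \\ x_1 w_{13} + x_2 w_{23} + x_3 w_{33} \end{bmatrix} $$
The hidden-layer output O_hidden then acts as the input for the output layer, where the same two steps are repeated.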
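To see the two steps applied layer after layer, here is a minimal forward pass through a hypothetical network with three inputs, as in the equations above, and (as an assumption) three hidden and three output nodes, using the sigmoid as the activation function. All weight and input values are made up; row j of each weight matrix holds the weights of the links arriving at node j of the next layer, so the first hidden input is x1w11 + x2w21 + x3w31.

```python
import numpy as np
from scipy.special import expit as sigmoid   # logistic function 1 / (1 + e^(-x))

# Hypothetical 3-3-3 network; all numbers are illustrative only
X = np.array([[0.9], [0.1], [0.8]])           # column vector of inputs x1..x3

# Row j holds the weights arriving at hidden node j: (w1j, w2j, w3j)
W_input_hidden = np.array([[0.9, 0.3, 0.4],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.5, 0.6]])
# Row k holds the weights arriving at output node k from the hidden nodes
W_hidden_output = np.array([[0.3, 0.7, 0.5],
                            [0.6, 0.5, 0.2],
                            [0.8, 0.1, 0.9]])

def forward(X, W_input_hidden, W_hidden_output):
    X_hidden = W_input_hidden @ X          # weighted sums entering the hidden layer
    O_hidden = sigmoid(X_hidden)           # hidden-layer outputs
    X_output = W_hidden_output @ O_hidden  # weighted sums entering the output layer
    O_output = sigmoid(X_output)           # final network outputs
    return O_hidden, O_output

O_hidden, O_output = forward(X, W_input_hidden, W_hidden_output)
print(O_hidden.ravel())
print(O_output.ravel())
```

Here the first hidden input is 0.9·0.9 + 0.1·0.3 + 0.8·0.4 = 1.16, so the first hidden output is sigmoid(1.16).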
4)Back Propagation:
Error = target - actual output
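The error at the output layer is split among the hidden nodes in proportion to the weights of the links that carry it back. For two hidden nodes and two output nodes, with w_jk the weight of the link from hidden node j to output node k:
$$ e_{\text{hidden}} = \begin{bmatrix} \frac{w_{11}}{w_{11}+w_{21}} & \frac{w_{12}}{w_{12}+w_{22}} \\ \frac{w_{21}}{w_{11}+w_{21}} & \frac{w_{22}}{w_{12}+w_{22}} \end{bmatrix} \cdot \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} $$
The denominators in this weight matrix are just normalizing factors. Removing them does not change the fact that each output error is shared out in proportion to the link weights; only the scaling is lost:
$$ e_{\text{hidden}} = \begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix} \cdot \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} $$
This weight matrix is the transpose of the feed forward matrix, where the feed forward matrix is
$$ W_{\text{hidden\_output}} = \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \end{bmatrix} $$
so it can be written as
$$ e_{\text{hidden}} = W_{\text{hidden\_output}}^{T} \cdot e_{\text{output}} $$
Similarly,
$$ e_{\text{input}} = W_{\text{input\_hidden}}^{T} \cdot e_{\text{hidden}} $$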
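The sketch below applies both versions of the error split to a hypothetical layer with two hidden and two output nodes, using made-up weights and errors. The normalized split hands out exactly e1 + e2 among the hidden nodes, while the simplified W^T · e_output version still shares each output error in the same proportions (here 2:3 for e1), just without the normalization.

```python
import numpy as np

# Hypothetical layer: row k holds the weights entering output node k,
# i.e. the feed forward matrix [[w11, w21], [w12, w22]]
W_hidden_output = np.array([[2.0, 3.0],     # w11, w21
                            [1.0, 1.0]])    # w12, w22
e_output = np.array([0.8, 0.5])             # e1, e2 (target - actual output)

W_T = W_hidden_output.T                     # [[w11, w12], [w21, w22]]

# Normalized split: divide each column by its sum (w11 + w21, w12 + w22)
e_hidden_normalized = (W_T / W_T.sum(axis=0)) @ e_output

# Simplified version: drop the denominators
e_hidden = W_T @ e_output

print(e_hidden_normalized)   # total equals e1 + e2 = 1.3
print(e_hidden)
```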
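The weights must then be updated. Trying every combination of weights by brute force is impractical, so the weights are updated with gradient descent instead.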
5)Gradient Descent:
Gradient descent is an optimization algorithm that finds the minimum of a function by moving iteratively in the direction of steepest descent. Here it is used to find the weights that give the minimum error.
If the slope of the error with respect to a weight is negative, the weight is increased; if the slope is positive, the weight is decreased. Either way the weight moves towards the bottom of the error curve, where the error is minimum and
$$ \frac{\partial E}{\partial w} = 0 $$
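As a one-dimensional illustration of this rule, the sketch applies gradient descent to a made-up error curve E(w) = (w - 3)^2, whose minimum is at w = 3 where dE/dw = 0. The learning rate and number of steps are arbitrary choices.

```python
def error(w):
    """Toy error curve with its minimum at w = 3 (illustrative only)."""
    return (w - 3.0) ** 2

def slope(w):
    """dE/dw for the toy curve."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=50):
    """Repeatedly step against the slope, starting from weight w."""
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w

# Starting left of the minimum: the slope is negative, so w increases
print(gradient_descent(0.0))
# Starting right of the minimum: the slope is positive, so w decreases
print(gradient_descent(6.0))
```

Both runs end very close to w = 3, approaching it from opposite sides.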
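Equation of gradient descent for the sigmoid function, for a weight w_ik on the link from node i of one layer to node k of the next:
$$ \frac{\partial E}{\partial w_{ik}} = -E_k \cdot \mathrm{sigmoid}(O_k)\,\bigl(1 - \mathrm{sigmoid}(O_k)\bigr) \cdot O_i $$
$$ w_{\text{new}} = w_{\text{old}} - \alpha \frac{\partial E}{\partial w} $$
so, in matrix form,
$$ \Delta w = \alpha \, E_k \cdot \mathrm{sigmoid}(O_k)\,\bigl(1 - \mathrm{sigmoid}(O_k)\bigr) \cdot O_i^{T} $$
$$ w_{\text{new}} = w_{\text{old}} + \Delta w $$
Here E_k is the error (target - actual output) at node k, O_k is the weighted sum entering node k (so sigmoid(O_k) is its output), O_i is the output of node i in the previous layer, and α is the learning rate.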
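Putting the update rule into code, the sketch below trains only the hidden-to-output weights of a hypothetical layer (three hidden nodes, two output nodes) with the sigmoid activation, keeping the hidden outputs O_i fixed. The sizes, targets, random initial weights and learning rate alpha = 0.5 are assumptions for illustration. Each iteration computes E_k * sigmoid(O_k) * (1 - sigmoid(O_k)) element-wise and multiplies it by O_i^T, giving a weight change with the same shape as the weight matrix.

```python
import numpy as np
from scipy.special import expit as sigmoid

rng = np.random.default_rng(0)

# Hypothetical values for illustration: 3 hidden nodes feeding 2 output nodes
O_i = np.array([[0.6], [0.3], [0.9]])        # outputs of the hidden layer (column)
target = np.array([[0.01], [0.99]])          # desired outputs
W = rng.normal(0.0, 0.5, size=(2, 3))        # W_hidden_output: row k feeds output node k
alpha = 0.5                                  # learning rate (arbitrary choice)

def squared_error(W):
    """Sum of squared output errors, used only to monitor training."""
    return float(np.sum((target - sigmoid(W @ O_i)) ** 2))

error_before = squared_error(W)
for _ in range(100):
    O_k = W @ O_i                            # weighted sums entering the output nodes
    out = sigmoid(O_k)                       # sigmoid(O_k): the output values
    E_k = target - out                       # error = target - actual output
    delta_w = alpha * (E_k * out * (1.0 - out)) @ O_i.T   # alpha * E_k * s(1 - s) . O_i^T
    W = W + delta_w                          # w_new = w_old + delta_w
error_after = squared_error(W)

print(f"error before: {error_before:.4f}, after: {error_after:.4f}")
```

Because every update moves the weights against the gradient of the squared error, the error printed after training should be lower than before.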
II. CONCLUSION
The neural network architecture presented here is able to classify character patterns to a reasonable degree, but it has difficulty classifying unknown samples.