Two Solutions to the XOR Problem using minimum configuration MLP




Abstract— In this paper we present two possible solutions to the XOR problem, which is a nonlinearly separable problem. This work continues the author's previous work [18]. Two mathematical solutions to the XOR problem using a minimum configuration Multilayer Perceptron (MLP) are given; both differ from the solution presented in [18]. The activation function used for the nonlinear neurons in the network is the hyperbolic tangent function.

Index Terms—ANN, MLP, Nonlinearly Separable Problem, Activation Function, Signal Flow Graph, Architectural Graph.

I. INTRODUCTION

The Artificial Neural Network (ANN) is an emerging field in computer science and engineering. The ability of an ANN to deal with inconsistent and irrelevant data makes it suitable for a wide range of domains. As the name suggests, an ANN is a network of neurons, a term borrowed from the biological neuron. In the brain, neurons are the nerve cells responsible for all the activities of the nervous system. A biological neural network is a parallel and distributed processor; likewise, an ANN is a parallel and distributed processor.

A biological neural network operates on a timescale of milliseconds, whereas an ANN runs on a computer whose operations take nanoseconds. ANNs are used to solve various types of problems. An ANN has three basic elements: the individual neurons in the network, the network topology, and the learning algorithm.

The property of primary significance for a neural network is its ability to learn from its environment. This ability makes the ANN more knowledgeable about the environment in which it works. Different types of artificial intelligent systems are used to solve a variety of problems; one such system is the Multilayer Perceptron (MLP). In this paper we present solutions to the XOR problem using the MLP.

XOR is a nonlinearly separable problem. Linear separability is discussed in the next section, the Problem Statement. Problems that exhibit characteristics similar to XOR are called nonlinearly separable problems.

II. PROBLEM STATEMENT

Linear separability is defined as follows. Two sets of points $A$ and $B$ in an $n$-dimensional space are called linearly separable if $n+1$ real numbers $w_1, w_2, \ldots, w_{n+1}$ exist such that every point $(x_1, x_2, \ldots, x_n) \in A$ satisfies

$$\sum_{i=1}^{n} w_i x_i \geq w_{n+1}$$

and every point $(x_1, x_2, \ldots, x_n) \in B$ satisfies

$$\sum_{i=1}^{n} w_i x_i < w_{n+1}.$$

The perceptron model proposed by Rosenblatt [7], trained with the perceptron convergence algorithm, is able to solve linearly separable problems. The XOR problem [18] is nonlinearly separable and thus cannot be solved by this mechanism.
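The claim that XOR is not linearly separable can be checked directly for $n = 2$. The short derivation below is a sketch: the labels $A$ and $B$ for the two output classes and the weights $w_1, w_2, w_3$ are notation assumed here for illustration. It asks for $w_1 x_1 + w_2 x_2 \geq w_3$ on one class and $w_1 x_1 + w_2 x_2 < w_3$ on the other, and shows that either assignment of the classes leads to a contradiction.

```latex
\begin{align*}
&\textbf{Assignment 1: } A=\{(0,1),(1,0)\},\; B=\{(0,0),(1,1)\}. \\
&(0,1)\in A:\; w_2 \geq w_3, \qquad (1,0)\in A:\; w_1 \geq w_3, \\
&(0,0)\in B:\; 0 < w_3, \qquad\;\; (1,1)\in B:\; w_1 + w_2 < w_3. \\
&\text{Adding the first two: } w_1 + w_2 \geq 2w_3 > w_3 \text{ (since } w_3 > 0\text{)},
  \text{ which contradicts } w_1 + w_2 < w_3. \\[6pt]
&\textbf{Assignment 2: } A=\{(0,0),(1,1)\},\; B=\{(0,1),(1,0)\}. \\
&(0,0)\in A:\; 0 \geq w_3, \qquad (1,1)\in A:\; w_1 + w_2 \geq w_3, \\
&(0,1)\in B:\; w_2 < w_3, \qquad (1,0)\in B:\; w_1 < w_3. \\
&\text{Adding the last two: } w_1 + w_2 < 2w_3 \leq w_3 \text{ (since } w_3 \leq 0\text{)},
  \text{ which contradicts } w_1 + w_2 \geq w_3.
\end{align*}
```

In both cases no real numbers $w_1, w_2, w_3$ exist, so a single linear threshold unit cannot separate the XOR classes.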

III. LITERATURE SURVEY

In [1] McCulloch and Pitts gave a model of the neuron and the logical operations performed in it. The proposed model lacked a learning mechanism. In [2] Hebb gave a statement that was later rephrased into a two-part rule and gave rise to Hebbian learning, under which modification of the weights was proposed. Minsky [3,4] proposed the use of learning machines for the construction of Artificial Neural Networks. In [7] Rosenblatt proposed the perceptron brain model, together with the perceptron convergence algorithm for training it.

The perceptron was the first model to use supervised learning in its framework. In [8] Widrow and Hoff proposed the use of adaptive switching circuits in Artificial Neural Networks; they are also responsible for proposing the Adaline system. In [9] Widrow and Stearns proposed the adaptive signal processing mechanism for ANNs. In [5,6] Minsky and Papert introduced the concept of the Multilayer Perceptron (MLP) and discussed it for solving hard problems; they proposed the model, but the learning framework was not precise. In [10,11,12,13] Rumelhart et al. contributed to the formulation of the generalized delta rule, which is widely used for training neural networks. In [14,15] Werbos proposed error backpropagation. In [16] Hopfield gave a brief introduction to energy analysis in ANNs. In [17] Hinton and Sejnowski gave a note on the Boltzmann machine employing Boltzmann learning. In [18] V.K. Singh proposed one possible solution to the XOR problem using an Artificial Neural Network.

Vaibhav Kant Singh

Department of Computer Science & Engineering, Institute of Technology,

Guru Ghasidas Vishwavidyalaya, Central University, Bilaspur, Chhattisgarh, India, 495001

[email protected]

TABLE I. TRUTH TABLE FOR XOR.

x1    x2    y = x1 ⊕ x2

0     0     0

1     0     1

0     1     1

1     1     0

IV. PROPOSED WORK

A. First Proposed Solution

In this paper we present two solutions to the XOR problem using a minimum configuration MLP. In the first solution the MLP has one hidden layer containing two neurons. These two neurons are nonlinear: their nonlinearity comes from using the hyperbolic tangent function as the activation function. The output layer has one neuron with a linear characteristic, which it obtains by using the threshold function as its activation function. The first solution is proved below.

Fig. 1. Architectural graph representing the solution to the XOR problem using minimum configuration MLP having two neurons in the hidden layer.

Fig. 2. Signal flow graph representing the solution to the XOR problem using minimum configuration MLP having two neurons in the hidden layer.

SOLUTION 1 (From Fig. 2)

CASE 1:- x1=0 and x2=0

At node1 signal value will be

= × + × + × − . … … … …

= × + × − . = − . … … … … At node2 signal value will be

= × + × + × − . … … … …

= × + × − . = − . … … … …

Here, the activation function used for the nonlinear elements in the hidden layer is the hyperbolic tangent function, defined as

$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{e^{2x} - 1}{e^{2x} + 1} = \frac{1 - e^{-2x}}{1 + e^{-2x}} \qquad (5)$$

Here, $e \approx 2.718281828$ is the base of the natural logarithm, known as Euler's number, and $x$ is the induced local field value. The function takes values in the open interval $(-1, +1)$.
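The equivalent forms of the hyperbolic tangent can be checked numerically. The following is a minimal Python sketch; the function names and the sample inputs are assumptions chosen for illustration, and each form is compared against the standard library's `math.tanh`.

```python
import math


def tanh_ratio(x: float) -> float:
    """tanh written as sinh(x) / cosh(x)."""
    return math.sinh(x) / math.cosh(x)


def tanh_exp(x: float) -> float:
    """tanh written as (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def tanh_neg2x(x: float) -> float:
    """tanh written as (1 - e^(-2x)) / (1 + e^(-2x))."""
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))


if __name__ == "__main__":
    # Sample induced local field values (illustrative only).
    for x in (-1.5, -0.5, 0.5, 1.5):
        print(x, tanh_ratio(x), tanh_exp(x), tanh_neg2x(x), math.tanh(x))
```

All three forms agree up to floating-point rounding, and every value lies strictly between -1 and +1.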

At node3 the signal value will be (From Eq(2) & Eq(5))

� − . = − ^{− × − .5}

+ ^{− × − .5} =− .

= − . … … … ..

� − . = − ^{− × − .5} + ^{− × − .5} =− .

= − . … … … ….

= × − + × + × − . … … … . .

= − . × − + − . × − .

= − . … … … …

The value produced at node5 in Eq(9) is passed through the linear activation function, i.e. the threshold function, since in a minimum configuration MLP the neuron in the output layer exhibits linear characteristics. The threshold function is defined as

$$\varphi(x) = \begin{cases} 0, & x < \text{threshold} \\ 1, & x \geq \text{threshold} \end{cases} \qquad (10)$$

Here, threshold = 0.
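As a small illustration, the threshold activation can be written as a one-line function. This is a sketch under the assumption that the output is 0 below the threshold and 1 at or above it, which is what the four XOR cases require; the name `threshold_fn` and the parameter `theta` are hypothetical.

```python
def threshold_fn(x: float, theta: float = 0.0) -> int:
    """Return 1 when the induced local field x reaches theta, otherwise 0."""
    return 1 if x >= theta else 0


if __name__ == "__main__":
    # A negative field maps to 0 and a positive field maps to 1 when theta = 0.
    print(threshold_fn(-0.5), threshold_fn(0.5))
```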

From Eq(9) and Eq(10) the value of output y when x1=0 and

x2=0 will be = � − . = … … … …

Since,

− . < ℎ ℎ ℎ ℎ

CASE 2:- x1=1 and x2=0

At node1 the signal value will be from Eq(1)

= + − . = − . … … … . At node2 the signal value will be from Eq(3)


= + − . = . … … … …

At node3 the signal value will be from Eq(12) & Eq(5)

= � − . = − ^{− × − .5} + ^{− × − .5} =− .

= − . … … … ….

= � . = − ^{− × .5}

+ ^{− × .5} = .

= . … … … ….

At node5 the signal value will be from Eq(14),Eq(15) &

Eq(8) =

− . × − + . × − . ×

= . … … … .

Therefore the value of Output y for x1=1 and x2=0 from Eq(16) and Eq(10) is

= � . = … … … .

Since,

. > ℎ ℎ ℎ ℎ

CASE 3:- x1=0 and x2=1

= + − . = − . … … … . . At node2 the signal value will be from Eq(3)

= + − . = . … … … …

� − . = − ^{− × − .5} + ^{− × − .5} =− .

= − . … … … ….

� . = − ^{− × .5}

+ ^{− × .5} = .

= . … … … . ..

At node5 the signal value will be from Eq(20), Eq(21) &

Eq(8)=

− . × − + . × − . ×

= . … … … …

Therefore the value of output y for x1=0 and x2=1 from Eq(22) and Eq(10) is

= � . = … … … …

Since,

. > ℎ ℎ ℎ ℎ

CASE 4:- x1=1 and x2=1

= + − . = . … … … . At node2 the signal value will be from Eq(3)

= + − . = . … … … …

� . = − ^{− × .5} + ^{− × .5}= .

= . … … … . ..

At node4 the signal value will be from Eq(25) and Eq(5)

= � . = − ^{− × .5}

+ ^{− × .5} = .

= . … … … ..

At node5 the signal value will be from Eq(26), Eq(27) and Eq(8)=

= . × − + . × − .

= − . … … … …

Therefore, the value of y when x1=1 and x2=1 will be from Eq(28) and Eq(10)

= � − . = … … … …

Since,

− . < ℎ ℎ ℎ ℎ

From Eq(11), Eq(17), Eq(23), Eq(29) and Table-I it is concluded that the model proposed having minimum configuration MLP solves the XOR problem.
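The four cases can also be checked numerically. The sketch below builds a network of the same shape (two tanh hidden neurons, one threshold output neuron with threshold 0). The weight and bias values in `HIDDEN` and `OUTPUT` are assumptions chosen for illustration so that the network reproduces Table I; they are not taken from the derivation above, and the function name is hypothetical.

```python
import math

# Illustrative parameters only (assumed, not the values used in the paper).
HIDDEN = [
    (1.0, 1.0, -1.5),  # (weight for x1, weight for x2, bias) of hidden neuron 1
    (1.0, 1.0, -0.5),  # (weight for x1, weight for x2, bias) of hidden neuron 2
]
OUTPUT = (-1.0, 1.0, -0.5)  # (weight for hidden 1, weight for hidden 2, bias)
THETA = 0.0                 # threshold of the output neuron


def xor_mlp_two_hidden(x1: float, x2: float) -> int:
    """Forward pass: tanh hidden layer followed by a threshold output neuron."""
    hidden_out = [math.tanh(w1 * x1 + w2 * x2 + b) for w1, w2, b in HIDDEN]
    field = OUTPUT[0] * hidden_out[0] + OUTPUT[1] * hidden_out[1] + OUTPUT[2]
    return 1 if field >= THETA else 0


if __name__ == "__main__":
    for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(x1, x2, xor_mlp_two_hidden(x1, x2))
```

With these assumed parameters the output field is negative for (0,0) and (1,1) and positive for (1,0) and (0,1), so the network outputs 0, 1, 1, 0 in the row order of Table I.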

B. Second Proposed Solution

In this part of the section we give the proof for the second solution to the XOR problem. The second solution uses only two neurons. The hidden layer has one neuron, which is nonlinear because it uses the hyperbolic tangent function as its activation function. The output layer has one neuron with a linear characteristic, which it obtains by using a linear activation function, i.e. the threshold function.

Fig. 3. Architectural graph representing the solution to the XOR problem using minimum configuration MLP having one neuron in the hidden layer.


Fig. 4. Signal flow graph representing the solution to the XOR problem using minimum configuration MLP having one neuron in the hidden layer.

SOLUTION 2 (From Fig. 4)

CASE 1:- x1=0 and x2=0

At node1 the value of the signal will be

= × + × + × − . … … … . .

= × + × + − . = − . … … … …

The activation function used in this case for the nonlinear element in the hidden layer is the hyperbolic tangent function, defined as

$$\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \frac{e^{2x} - 1}{e^{2x} + 1} = \frac{1 - e^{-2x}}{1 + e^{-2x}} \qquad (c)$$

where $e \approx 2.718281828$ (a constant) is the base of the natural logarithm, also known as Euler's number, $x$ is the induced local field of the neuron, and the function takes values in the open interval $(-1, +1)$.

At node2 the value of the signal will be from Eq(b) & Eq(c)

= � − . = − ^{− × − .5}

+ ^{− × − .5} =− .

= − . … … … ..

At node3 the value of the signal will be from Eq(d)

= × + × + × − + × − . … . .

= × + × + − . × − + × − .

= . − . = . … … … … .

For the output neuron, which is a linear element, we use the threshold function as the activation function. The threshold value is 1.35. The threshold function is defined as

$$\varphi(x) = \begin{cases} 1, & x \geq \text{threshold} \\ 0, & x < \text{threshold} \end{cases} \qquad (g)$$

Here, $x$ is the induced local field value.

From Eq(f) and Eq(g) we get the value of y for x1=0 and x2=0 as

= � . = … … … ℎ

Since,

. < . ℎ . ℎ ℎ ℎ

CASE 2:- x1=1 and x2=0

At node1 value of the signal will be from Eq(a)

= + − . = − . … … … . .

At node2 value of the signal will be from Eq(i) & Eq(c)

= � − . = − ^{− × − .5} + ^{− × − .5} =− .

= − . … … … ..

At node3 value of the signal will be from Eq(j) & Eq(e)

= + + − × − . − .

= . … … … .

The value of y for x1=1 and x2=0 will be from Eq(k)&Eq(g)

= � . = … … … .

Since,

. > . ℎ . ℎ ℎ ℎ

CASE 3:- x1=0 and x2=1

At node1 the value of signal from Eq(a)

= + − . = − . … … … .

At node2 the value of signal from Eq(m) & Eq(c)

= � − . = − ^{− × − .5} + ^{− × − .5} =− .

= − . … … … ….

At node3 the value of the signal from Eq(n) and Eq(e)

= + + − × − . − .

= . … … … …

Therefore the value of y for the input x1=0 and x2=1 will be from Eq(o) and Eq(g)

= � . = … … … …

Since,

. > . ℎ . ℎ ℎ ℎ

CASE 4:- x1=1 and x2=1

At node1 the signal value will be from Eq(a)

= + − . = . … … … . .

At node2 the signal value will be from Eq(q) and Eq(c)

= � . = − ^{− × .5} + ^{− × .5}= .

= . … … … ….

At node3 the signal value will be from Eq(r) & Eq(e)

= + + (− × . ) − .

= − . − . = . … …

Therefore the value of y for x1=1 and x2=1 from Eq(s) &

Eq(g) will be


= � . = … … … . Since,

. < . ℎ . ℎ ℎ ℎ

From Eq(h), Eq(l), Eq(p), Eq(t) and Table-I we are able to conclude that the proposed minimum configuration MLP is able to solve the XOR problem.
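A numerical check of the second configuration is sketched below. It assumes that the output neuron receives the two inputs directly in addition to the hidden neuron's output; some such direct connection is needed, because a single hidden neuron feeding a threshold unit on its own cannot represent XOR. The threshold 1.35 is the value stated in the text. The weights and biases in `HIDDEN_2` and `OUTPUT_2` are assumptions chosen for illustration and are not confirmed by the derivation above; the function name is hypothetical.

```python
import math

# Illustrative parameters only (assumed, not the values used in the paper).
HIDDEN_2 = (1.0, 1.0, -1.5)        # (weight for x1, weight for x2, bias)
OUTPUT_2 = (1.0, 1.0, -2.0, -0.5)  # (weight for x1, weight for x2, weight for hidden, bias)
THETA_2 = 1.35                     # threshold value stated in the text


def xor_mlp_one_hidden(x1: float, x2: float) -> int:
    """Forward pass: one tanh hidden neuron, direct input links, threshold output."""
    hidden_out = math.tanh(HIDDEN_2[0] * x1 + HIDDEN_2[1] * x2 + HIDDEN_2[2])
    field = (OUTPUT_2[0] * x1 + OUTPUT_2[1] * x2
             + OUTPUT_2[2] * hidden_out + OUTPUT_2[3])
    return 1 if field >= THETA_2 else 0


if __name__ == "__main__":
    for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(x1, x2, xor_mlp_one_hidden(x1, x2))
```

Under these assumptions the output field stays below 1.35 for (0,0) and (1,1) and exceeds it for (1,0) and (0,1), which matches Table I.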

V. CONCLUSIONS

While deriving the solutions to the XOR problem in this work, we saw several instances where the Multilayer Perceptron is a very good solution for nonlinearly separable problems. We conclude that it is possible to construct a minimum configuration MLP that solves the XOR problem. We also conclude that, although an MLP should in general be constructed from nonlinear neurons, there exists a special class of MLP, called the minimum configuration MLP, in which the output layer contains linear elements.

ACKNOWLEDGMENT

I offer my sincere thanks to my mother, wife, brothers, sister and children for their support in carrying out this work.

REFERENCES

[1] W.S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity”, Bull. Math. Biophy., vol. 5, pp. 115-133, 1943.

[2] D.O. Hebb, The Organization of Behavior: A Neuropsychological Theory, New York:Wiley, 1949.

[3] M.L. Minsky, “Theory of neural-analog reinforcement systems and its application to the brain-model problem”, Ph.D. thesis, Princeton University, Princeton, NJ, 1954.

[4] M. Minsky, “Steps toward artificial intelligence”, Proceedings of the IRE, vol. 49, pp. 5-30, 1961.

[5] M.L. Minsky and S.A. Papert, Perceptrons, Cambridge, MA:MIT Press, 1969.

[6] M.L. Minsky and S.A. Papert, Perceptrons, expanded ed., Cambridge, MA:MIT Press, 1990.

[7] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain”, Psychological Review, vol. 65, pp. 386-408, 1958.

[8] B. Widrow and M.E. Hoff, “Adaptive switching circuits”, IRE WESCON Convention Record, vol. 4, pp. 96-104, Aug. 1960.

[9] B. Widrow and S.D. Stearns, Adaptive Signal Processing, Englewood Cliffs, NJ:Prentice Hall Inc., 1985.

[10] D.E. Rumelhart and D. Zipser, “Feature discovery by competitive learning”, Cognitive Sci., vol. 9, pp. 75-112, 1985.

[11] D.E. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, Cambridge, MA:MIT Press, 1986.

[12] D.E. Rumelhart, G.E. Hinton and R.J. Williams, “Learning internal representations by error backpropagation”, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, Cambridge, MA:MIT Press, pp. 318-362, 1986.

[13] D.E. Rumelhart, P. Smolensky, J.L. McClelland and G.E. Hinton, “Schemata and sequential thought processes in PDP models”, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 2, Cambridge, MA:MIT Press, pp. 7-57, 1986.

[14] P.J. Werbos, “Beyond regression: New tools for prediction and analysis in the behavioural sciences”, Ph.D. thesis, Harvard University, Cambridge, MA, 1974.

[15] P.J. Werbos, The roots of backpropagation: from ordered derivatives to neural networks and political forecasting, New York:John Wiley, 1994.

[16] J.J. Hopfield, “Neural Networks and physical systems with emergent collective computational capabilities”, Proceedings of the National Academy of Sciences (USA), vol. 79, pp. 2554-2558, Nov. 1982.

[17] G.E. Hinton and T.J. Sejnowski, “Learning and relearning in Boltzmann machines”, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, Cambridge, MA:MIT Press, pp. 282-317, 1986.

[18] V.K. Singh, “One solution to XOR problem using Multilayer Perceptron having minimum configuration”, Accepted for Publishing in International Journal of Science and Engineering, vol. 3, 2015.