A Stochastic Computing Method For Generating Activation Functions in Multilayer Feedforward Neural Networks


Content of this journal is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Corresponding Author: Durmuş Ersoy

E-mail: [email protected]

Received: April 21, 2021

Accepted: June 22, 2021

Available Online Date: August 23, 2021

DOI: 10.5152/electr.2021.21043

ORIGINAL ARTICLE


Durmuş Ersoy¹, Burcu Erkmen²

¹Department of Electrical and Electronics Engineering, İstanbul Esenyurt University, İstanbul, Turkey

²Department of Electronics and Communication Engineering, Yıldız Technical University, İstanbul, Turkey

Cite this article as: D. Ersoy and B. Erkmen. “A Stochastic Computing Method for Generating Activation Functions in Multilayer Feedforward Neural Networks.”, Electrica. August 23, 2021. DOI: 10.5152/electr.2021.21043.

ABSTRACT

Stochastic computing, which uses basic logic elements operating on stochastic bit sequences, offers significant advantages in speed and hardware cost relative to deterministic computation. The literature contains stochastic realizations of the tangent hyperbolic and exponential functions used to build activation functions in artificial neural networks. These techniques rely on state transitions in finite state machines and come in two forms: one-dimensional (linear) and two-dimensional. This study presents a two-dimensional finite state machine-based stochastic computing approach for the tangent hyperbolic and exponential functions that is advantageous in terms of both error rate and circuit cost. The approach is implemented on a Field Programmable Gate Array, and hardware simulation results are given. A dataset for stability classification in decentralized smart grid control was applied to a multilayer feedforward neural network, and the classification was carried out separately with deterministic computing, linear finite state machine-based stochastic computing, and the proposed 2D finite state machine-based stochastic computing.

Index Terms—Field Programmable Gate Array, finite state machine, multilayer feedforward neural networks, stochastic activation function, stochastic computing.

Electrica 2021; XX(XX): 1-13

I. INTRODUCTION

In recent years, new data processing methods have been reported in the literature for techniques such as deep learning and image processing, where data density is high (big data) and computing time is long. In applications with high data density, the circuit costs and speed are very significant [1]. Recently, stochastic-based computing techniques have become the preferred approaches for processing big data. The greatest benefit of stochastic computation (SC), which was first implemented in the 1960s, is that by using regular logic elements, it can execute arithmetic operations very rapidly and at a very low cost. SC calculates the probabilistic value of data by means of a randomly ordered binary representation [2].

Over the last ten years, stochastic computation has been applied to image processing [3,4], energy management systems [5], cloud computing [6], game strategies [7], and artificial intelligence applications [8]. SC methods are used in particular for the calculations that are widely employed during ANN training. The gradient descent optimization algorithm for ANN training has been applied to stochastic bit sequences [9]. In another work, a stochastic back propagation algorithm (SC-MLP) was suggested for updating the layer weights of the artificial neural network [10]. For convolutional neural networks (CNN), a stochastic ReLU function (sReLU) was designed using unipolar stochastic computing hardware [11]. Additionally, an automated SC-based architecture has been proposed for deep convolutional neural networks (DCNN), guided by circuit cost criteria and taking overall accuracy into account [12].

In Ji et al. [13], it has been shown that the error rate in neural network simulations decreases as the bit stream length increases in SC. As the length of the bit increases, it leads to a


of the absolute exponential function and the absolute value function was proposed. Additionally, an image processing application using this absolute value function was presented in Li et al. [15]. The Accumulative Parallel Counter (APC) was used to avoid the failure-prone scaled addition, which can lead to incorrect results [16]. For DNNs, pre-processed weights, an APC for the weighted sum, and linear FSM-based stochastic computing for activation function generation have been reported in the literature [17]. Integral Stochastic Computation (ISC) was introduced in Ardakani et al. [18], where it was demonstrated that the binary-type output of the APC is not suitable for the structure of Deep Belief Networks (DBN). In addition to the linear FSM architecture, different functions have been realized with the aid of Bernstein polynomials extended to a two-dimensional FSM (2D-FSM) [19]. The generation of a Gaussian function for radial basis function networks has been presented using linear and 2D-FSM-based SC [13, 20]. In [21], the softmax function in the output layer of Spiking Neural Networks (SNN) is designed using the 2D-FSM architecture. In that study, the state transitions of the 2D-FSM depend on the parameters of the SNN: the size of the FSM depends on the number of neurons, and the weighted sums of the node inputs control the state transitions.

Generally, we can split the SC literature studies into four classes.

The first category consists of research on the basic principles of probabilistic logic architecture after 1956 [22]. The second category is the concept of SC and the development of SC-related general purpose computers [2,23]. The third category involves the use of SC in special fields such as ANN and hybrid controllers [24–26]. As the last category, error correction studies and new general purpose architectures for SC are presented [27,28].

In this study, the 2D-FSM SC-based approach is presented for generating activation functions in multilayer feedforward neural networks. In comparison to other studies, the stochastic sequence was scaled and provided as an input to 2D-FSM.

Furthermore, the stochastic representation of the function was obtained from the output of 2D-FSM, without the need for any post-processing.

The stochastic activation functions based on the proposed 2D-FSM method and the stochastic weighted sum method were used in the feedforward process of multilayer perceptrons.

in Section 5.

II. STOCHASTIC COMPUTING

In stochastic computing, numbers are represented using arbitrary bit sequences. The real value of a stochastic sequence depends on the statistics of its bits. Let $X \in \{0, 1\}$ denote a random sequence in stochastic representation. To represent a real number in the range $x \in [0, 1]$, the bit sequence satisfies (1):

$\Pr(X) = x$ (1)

Here, $\Pr(X)$, the probability of a 1 in the sequence, gives the real value of the stochastic representation for every real number x. This is known as the unipolar form. The bipolar form is another widely used format, which represents values in the range $x \in [-1, 1]$, as stated in (2):

$\Pr(X) = \frac{x + 1}{2}$ (2)

During stochastic calculation, it is possible to obtain stochastic representations of the real number x at different bit lengths.

For instance, the stochastic bit representation sequences given as (1,0,0,0), (0,1,0,0), and (0,1,0,0,0,1,0,0) in unipolar format are probabilistic representations of 0.25. In addition, the predicted error in stochastic representation is minimized by increasing the length of bits.
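As a worked illustration of (1) and (2), consider the 4-bit sequence (0,1,0,0) from the example above. The decoded values below follow directly from the two definitions; the symbols $N_1$ (number of 1s) and $L$ (sequence length) are notation introduced here for convenience only.

```latex
\Pr(X) = \frac{N_1}{L} = \frac{1}{4} = 0.25
\quad\Rightarrow\quad
x_{\text{unipolar}} = \Pr(X) = 0.25,
\qquad
x_{\text{bipolar}} = 2\Pr(X) - 1 = -0.5
```

The same bit sequence therefore represents different real numbers depending on the format in which it is read.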

A. Addition in Stochastic Computing

The sum of the two numbers in the range [0–1] is in the range [0–2]. Therefore, scaled addition operations are used in SC to keep the addition results in the range [0–1]. Addition in SC is typically performed using scaled adders or OR gates [2].

In some systems, a multiplexer (MUX) is used to implement the adder. The output Y of the scaled adder is given in (3):

$Y = A \cdot S + B \cdot (1 - S)$ (3)

If the select input S is a stochastic bit sequence with a probability of 0.5, as seen in Fig. 1(a), the expected value of Y is (Pr(A) + Pr(B))/2. A two-input scaled adder thus scales the result down by a factor of two in either encoding format and restricts the output to the range [0–1]. For a MUX with M inputs, each input must be selected with probability 1/M. In this way, operations involving several inputs, such as the weighted sum used in ANN, can be performed effectively. The disadvantage of the M-input scaled adder is that, in each clock cycle, it does not take into account the data on the inputs that are not selected. A longer bit sequence must therefore be used to achieve the desired accuracy.
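The MUX-based scaled addition described above can be sketched in a few lines of Python. This is a minimal software model, not the hardware design: the helper names `bernoulli_stream`, `mux_add`, and `value` are hypothetical, and Python's `random` module stands in for a hardware random number source.

```python
import random

def bernoulli_stream(p, length, rng):
    """Unipolar stream: each bit is 1 with probability p (software stand-in)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def mux_add(a, b, s):
    """Two-input MUX: pass the bit of A when S is 1, otherwise the bit of B."""
    return [ai if si else bi for ai, bi, si in zip(a, b, s)]

def value(stream):
    """Unipolar value: fraction of 1s in the stream."""
    return sum(stream) / len(stream)

rng = random.Random(0)
L = 100_000
a = bernoulli_stream(0.2, L, rng)
b = bernoulli_stream(0.6, L, rng)
s = bernoulli_stream(0.5, L, rng)   # select stream with probability 0.5
y = mux_add(a, b, s)
print(value(y))  # close to (0.2 + 0.6) / 2 = 0.4
```

In each clock cycle only one of the two input bits reaches the output, which is why long streams are needed before the estimate settles near the scaled sum.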

OR gates can also be used as adders in the stochastic domain, as seen in Fig. 1(b). The output Y of the OR gate with stochastic inputs A and B is given in (4):

$Y = A + B - A \cdot B$ (4)

In (4), $\Pr(A \cdot B)$ must be close to 0 if only OR gates are used.

The inputs must then be scaled to ensure that this condition is satisfied. Adders constructed with OR gates in the stochastic domain also cause high errors due to the losses introduced by the scaling factor. To overcome these errors, the bit lengths of the stochastic representations of real numbers are kept as long as possible, which requires more storage space and often results in a longer processing time.

The APC is used to avoid the losses caused by high errors in scaled quantities and to avoid a decrease in processing speed when the bit length of the stochastic representations is increased [17]. The APC takes K parallel bits as input and adds them to a counter in each clock cycle. Because the addition is carried out in parallel, it results in low latency [17]. Since the APC output is a binary number rather than a stochastic sequence, this adder is restricted to situations where the addition may produce an intermediate result in binary format.
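The counting behavior of the APC can be illustrated with a small software model. This is a sketch under the assumption that all K input streams have the same length; the function name `apc` is hypothetical and the example streams are made up for illustration.

```python
def apc(streams):
    """Accumulate K parallel bit streams: in each clock cycle, the number
    of 1s across the K inputs is added to a running binary count."""
    count = 0
    for bits in zip(*streams):   # one tuple of K bits per clock cycle
        count += sum(bits)
    return count

streams = [
    [1, 0, 0, 0],   # unipolar 0.25
    [1, 1, 0, 1],   # unipolar 0.75
    [0, 1, 0, 1],   # unipolar 0.50
]
total = apc(streams)
length = len(streams[0])
print(total, total / length)   # 6 1.5, i.e. 0.25 + 0.75 + 0.50
```

Unlike the MUX adder, no input bit is discarded, so the result is the unscaled sum, delivered as a binary count.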

B. Multiplication in Stochastic Computing

As seen in Fig. 2, the multiplication of two stochastic bit sequences is performed using the AND and XNOR gates in unipolar and bipolar coding formats, respectively. The multiplication of the stochastic representations of inputs A and B in stochastic format is determined in the unipolar encoded format as in (5). Multiplication in the bipolar format is given in (6):

$Y = \mathrm{AND}(A, B) = A \cdot B$ (5)

$Y = \mathrm{XNOR}(A, B) = \mathrm{OR}\big(A \cdot B,\ (1 - A) \cdot (1 - B)\big)$ (6)

Real values of stochastic representations are obtained in (6) using (7):

$\Pr(Y) = \Pr(A \cdot B) + \Pr\big((1 - A) \cdot (1 - B)\big)$ (7)

Equations (8) and (9) are obtained when the input stochastic representation sequences are independent:

$\Pr(Y) = \Pr(A)\Pr(B) + \Pr(1 - A)\Pr(1 - B)$ (8)

$y = 2\Pr(Y) - 1 = \big(2\Pr(A) - 1\big)\big(2\Pr(B) - 1\big)$ (9)
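Both multiplication schemes can be checked numerically with a short Python sketch. The helper names (`stream`, `and_mul`, `xnor_mul`, `unipolar`, `bipolar`) are hypothetical, and independent software-generated streams are assumed in place of hardware sources.

```python
import random

def stream(p, length, rng):
    """Independent bit stream in which each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def and_mul(a, b):
    """Unipolar multiplication: bitwise AND."""
    return [x & y for x, y in zip(a, b)]

def xnor_mul(a, b):
    """Bipolar multiplication: bitwise XNOR."""
    return [1 - (x ^ y) for x, y in zip(a, b)]

def unipolar(s):
    return sum(s) / len(s)

def bipolar(s):
    return 2 * unipolar(s) - 1

rng = random.Random(1)
L = 100_000
# Unipolar: 0.5 * 0.4 = 0.2
u = and_mul(stream(0.5, L, rng), stream(0.4, L, rng))
# Bipolar: a = -0.5 (Pr = 0.25), b = 0.6 (Pr = 0.8), product = -0.3
v = xnor_mul(stream(0.25, L, rng), stream(0.8, L, rng))
print(unipolar(u), bipolar(v))
```

The bipolar result relies on the two input streams being independent; correlated streams would not satisfy the product relation.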

Fig. 1. Stochastic addition implemented using (a) MUX and (b) OR gate.

The basic components of SC are the circuits that convert binary numbers to stochastic sequences and stochastic sequences back to binary numbers.

In each clock cycle, a random number generator produces an L-bit random number, which is compared with the L-bit binary input to provide the stochastic representation. If the random number is less than the binary number, the output bit is 1; otherwise it is 0. Given that the random numbers are uniformly distributed over the [0–1] range, the probability that a 1 appears at the comparator output in each clock cycle is equal to the binary input of the converter, interpreted as a fraction. The translation of a stochastic number to a binary number is much simpler. The probabilistic value p of the stochastic number is the ratio of the number of 1s in the bitstream to the length of the stream. Therefore, counting these 1s is sufficient to find p.
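The two converters can be modeled in software as follows. This is a sketch only: the names `d2s` and `s2d` are hypothetical, and Python's `random` module is assumed as a stand-in for the hardware random number generator.

```python
import random

def d2s(value, n_bits, length, rng):
    """Binary-to-stochastic: in each clock cycle, compare a fresh uniform
    n_bits random integer with the binary input; emit 1 if it is smaller."""
    out = []
    for _ in range(length):
        r = rng.randrange(2 ** n_bits)
        out.append(1 if r < value else 0)
    return out

def s2d(bits):
    """Stochastic-to-binary: count the 1s and divide by the stream length."""
    return sum(bits) / len(bits)

rng = random.Random(2)
bits = d2s(value=64, n_bits=8, length=50_000, rng=rng)   # 64/256 = 0.25
print(s2d(bits))  # close to 0.25
```

The estimate returned by `s2d` approaches the encoded fraction only as the stream length grows, which mirrors the accuracy versus bit length trade-off discussed above.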

C. Generation of Functions in Stochastic Computing

Various functions can be generated in SC, in addition to arithmetic operations. In general, the FSM architecture-based computation methods fall into two classes: linear FSM and 2D-FSM architectures.

In the literature, linear FSM-based applications are commonly used to realize activation functions in ANN. The exponential, tangent hyperbolic, and linear gain functions were introduced with the linear FSM-based stochastic computing method in Li et al. [4] and Brown and Card [14]. The state transition diagram of a linear FSM for generating the Stochastic Tangent hyperbolic (SCTanh) and Stochastic Exponential (SCExp) functions, as defined in Brown and Card [14], is given in Fig. 3. The linear FSM-based tangent hyperbolic and exponential function expressions are given in (10) and (11). The bit sequence of the stochastic representation is applied to the input X, as shown in Fig. 3, while the bit sequence of the output is Y. The FSM changes from one state to another in response to the input bits. For each input bit, the output bit is determined by testing whether the state number exceeds the state assigned as the offset, and is set to 0 or 1 accordingly. For a linear FSM with instantaneous state $S_i$ and n states, the output bit of the SCTanh function is 0 when $0 \le i \le \frac{n}{2} - 1$ and 1 otherwise. For the SCExp function, the output bit is 1 when $0 \le i \le n - G - 1$. Here, G is used for fine tuning and stands for the linear gain:

$\mathrm{Tanh}\left(\frac{n}{2} x\right) \approx \mathrm{SCTanh}(n, X)$ (10)

$\mathrm{Exp}(-2Gx) \approx \mathrm{SCExp}(n, X, G)$ (11)
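The SCTanh rule described above (a saturating counter whose lower half of states outputs 0) can be sketched in Python. Several details here are assumptions rather than part of the description: the FSM starts in state n/2, states saturate at 0 and n − 1, and the output bit is read after each transition. The function name `sc_tanh` is hypothetical.

```python
import random

def sc_tanh(bits, n):
    """Linear FSM with n states: move up on a 1, down on a 0 (saturating);
    output 0 while the state index is in 0..n/2-1, otherwise 1."""
    state = n // 2                      # assumed initial state
    out = []
    for b in bits:
        state = min(state + 1, n - 1) if b else max(state - 1, 0)
        out.append(0 if state <= n // 2 - 1 else 1)
    return out

# Usage: encode x in bipolar format, run the FSM, decode in bipolar format.
rng = random.Random(0)
x, n, L = 0.3, 8, 4096
stream = [1 if rng.random() < (x + 1) / 2 else 0 for _ in range(L)]
y_bits = sc_tanh(stream, n)
y = 2 * sum(y_bits) / len(y_bits) - 1
print(y)  # a stochastic estimate; compare against tanh(n / 2 * x)
```

The printed value varies with the seed and stream length, so it is best compared with the target function over many runs rather than read as an exact result.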

Fig. 3. Linear FSM-based state transition diagrams. (a) SCExp (X, n, G), (b) SCTanh (X, n).

Fig. 2. Stochastic multiplication implemented using (a) AND and (b) XNOR gates.


Inputs and outputs for SCTanh and SCExp functions are in unipolar and bipolar format, respectively.

For functions that cannot be approximated by linear FSM-based stochastic computation, a 2D-FSM-based stochastic computation method using the Bernstein polynomial is presented in Li et al. [19]. The inputs consist of the stochastic sequence of the number whose function value is to be determined, the stochastic sequence of the bit added as an additional input, and the stochastic representations of the coefficients of the Bernstein polynomials. While the output of linear FSM-based systems is obtained by comparing the offset state with the present state [14], in 2D-FSM-based systems the polynomial coefficients are selected according to the present state number. The output is obtained by multiplying the input stochastic bit sequence with these selected coefficients [19,29].

III. A STOCHASTIC COMPUTING METHOD BASED ON 2D-FSM

A SC approach based on 2D-FSM is introduced in this work. The proposed approach is based on the 2D-FSM state transition diagram provided in Li et al. [19], for the realization of tangent hyperbolic and sigmoid functions used as activation functions in multilayer feedforward neural networks, in order to obtain fast and low-cost solutions. The stochastic sequence that forms the input was scaled and applied to 2D-FSM to generate activation functions. In the proposed form, the state transition happens in 2D-FSM according to the input bit values, and the output bit value is obtained after being compared with the offset state. In order to generate the bit sequence at the output, this process proceeds iteratively. In this way, an external stochastic representation denoted by K was not used, unlike the procedure in the literature, and the diagonal offset line for the 2D-FSM calculation was used in the decision boundary. The stochastic representation of the activation function is obtained without the need for any additional post-processing at the output of the 2D-FSM.

Here, the value of the real number is denoted by x and the sequence of its stochastic representation by X, an array composed of 0s and 1s. The stochastic sequence X of bit length L is split into two sequences, X₁ and X₂, each of length N = L/2: $X_1 = [X[0], X[1], \ldots, X[L/2 - 1]]$ and $X_2 = [X[L/2], X[L/2 + 1], \ldots, X[L - 1]]$.

The 2D-FSM state structure format can be shown as a matrix with the number of columns n, the number of rows m, and the total number of n × m states. The tangent-hyperbolic (S2Tanh) function using 2D-FSM is defined in (12) and the exponential function (S2Exp) is given in (13). The inputs and outputs for the S2Tanh and S2Exp functions must be in bipolar and unipolar format, respectively:

$\mathrm{Tanh}\left(\frac{6}{n} x\right) \approx \mathrm{S2Tanh}(X_1, X_2, n, m)$ (12)

$\mathrm{Exp}\big(-(n + 1) x\big) \approx \mathrm{S2Exp}(X_1, X_2, n, m)$ (13)

The states are arranged in a square matrix when defining the 2D-FSM architecture, and n = m is thus selected. The 2D-FSM state transition is modeled as a time-homogeneous Markov chain, as defined in the literature [19].

Based on Markov chain theory, the 2D-FSM has an equilibrium state distribution [30]. The present state is denoted by $S_t$, where $0 \le t \le m \cdot n - 1$. When the present state is $S_t$, the next state is $S_{t-1}$ in the case of X1=0, X2=0; $S_{t+1}$ in the case of X1=1, X2=1; $S_{t+n}$ in the case of X1=1, X2=0; and $S_{t-n}$ in the case of X1=0, X2=1.

The probability of state t is denoted by $P_{S_t}$. According to Markov chain theory, the transition probabilities between neighboring states are balanced at equilibrium. The state number is written as $t = j \cdot n + k$, where j and k represent the 2D-FSM row and column numbers, respectively.

The state transitions between the columns are expressed in (14) and the state transitions between the rows are expressed as in (15), if we describe the state placement in matrix form:

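$P_{S_{j \cdot n + k}} \cdot P_{X_1} \cdot P_{X_2} = P_{S_{j \cdot n + k + 1}} \cdot (1 - P_{X_1}) \cdot (1 - P_{X_2})$ (14)

$P_{S_{(j - 1) \cdot n + k}} \cdot P_{X_1} \cdot (1 - P_{X_2}) = P_{S_{j \cdot n + k}} \cdot (1 - P_{X_1}) \cdot P_{X_2}$ (15)

According to Markov chain theory, the sum of all the state probabilities in (14) and (15) is equal to 1, as defined in (16):

$\sum_{j=0}^{m-1} \sum_{k=0}^{n-1} P_{S_{j \cdot n + k}} = 1$ (16)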

The 2D-FSM states are used as a counter when evaluating the bits of the stochastic representation to be obtained from the 2D-FSM output, and are compared with the offset line in each row. By comparing the state with the offset value in the row to which it is related, the output bits are determined. If the present state is greater than the offset state of the row to which it is connected, the outputs for the S2Tanh and S2Exp functions are one and zero, respectively.

With $S_t$ the present state and t the present state number in the 2D-FSM architecture, the offset value is determined for each row as defined in (17) and (18):

$a = \lfloor t / n \rfloor$ (17)

$t_{\mathrm{offset}} = a \cdot (n + 1)$ (18)

The row number of the current state is obtained using (17) and the offset state is derived by (18). Here, a is the corresponding row, and n is the number of 2D-FSM columns. A pseudo code is provided for calculating the 2D-FSM-based S2Tanh function, given in Algorithm 1.


A. Generation of the S2Exp Function

Offset boundaries on 2D-FSM during S2Exp function generation are seen in Fig. 4. The output bits are obtained for each row according to (19). Inputs and outputs are set in the unipolar format for the S2Exp:

$Y = \begin{cases} 1, & S_t \le \mathrm{offset} \\ 0, & S_t > \mathrm{offset} \end{cases}$ (19)

B. Generation of the S2Tanh Function

The offset boundaries on the 2D-FSM for S2Tanh function are the same form as in the S2Exp function, and the state transition diagram is seen in Fig. 5. Using (20), the output bits for each row are obtained. Inputs and outputs are set in a bipolar format for the S2Tanh function:

$Y = \begin{cases} 0, & S_t \le \mathrm{offset} \\ 1, & S_t > \mathrm{offset} \end{cases}$ (20)
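The state-transition and offset-comparison loop for S2Tanh described in this section can be sketched as follows. This is an illustrative model, not the authors' Algorithm 1, and it rests on several assumptions: the FSM starts in the middle of the grid, transitions that would leave the grid are clamped at the edges, the output bit is read after each transition, and the offset of row a is taken as the diagonal state a·(n + 1). The function name `s2tanh` is hypothetical.

```python
import random

def s2tanh(bits, n, m=None):
    """Sketch of a 2D-FSM tanh generator; returns len(bits) // 2 output bits."""
    m = n if m is None else m
    half = len(bits) // 2
    x1, x2 = bits[:half], bits[half:2 * half]   # split X into X1 and X2
    row, col = m // 2, n // 2                   # assumed initial state
    out = []
    for b1, b2 in zip(x1, x2):
        if b1 == b2:
            # (1,1) -> t+1 and (0,0) -> t-1: move along the row (clamped)
            col = min(col + 1, n - 1) if b1 else max(col - 1, 0)
        else:
            # (1,0) -> t+n and (0,1) -> t-n: move between rows (clamped)
            row = min(row + 1, m - 1) if b1 else max(row - 1, 0)
        t = row * n + col           # present state number
        a = t // n                  # row of the present state
        t_offset = a * (n + 1)      # assumed diagonal offset state of row a
        out.append(1 if t > t_offset else 0)
    return out

# Usage: bipolar input stream for x, bipolar decoding of the output stream.
rng = random.Random(0)
x, L = 0.5, 1024
stream = [1 if rng.random() < (x + 1) / 2 else 0 for _ in range(L)]
y_bits = s2tanh(stream, n=3)
print(2 * sum(y_bits) / len(y_bits) - 1)
```

Swapping the two output values in the final comparison gives the corresponding S2Exp rule, with unipolar encoding at the input and output. The boundary handling and initial state noticeably affect the result, so they are the first things to revisit when matching a reference implementation.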

C. Generation of the S2Sig Function

As seen in (21), the sigmoid function can be obtained by using the hyperbolic tangent function. The S2Sig function is computed by (22):

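$\sigma(x) = \frac{1 + \mathrm{Tanh}\left(\frac{x}{2}\right)}{2}$ (21)

$\mathrm{S2Sig}(x) = \frac{1 + \mathrm{S2Tanh}\left(\frac{x}{2}\right)}{2}$ (22)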
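The identity between the sigmoid and the hyperbolic tangent used here can be verified numerically. The snippet below is a plain deterministic check using Python's `math` module; the function names are chosen for illustration only.

```python
import math

def sigmoid(x):
    """Logistic sigmoid computed directly."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_via_tanh(x):
    """Sigmoid computed from tanh: (1 + tanh(x / 2)) / 2."""
    return (1.0 + math.tanh(x / 2.0)) / 2.0

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(sigmoid(x), sigmoid_via_tanh(x), abs_tol=1e-12)
    print(x, sigmoid_via_tanh(x))
```

One consequence is that a bipolar stream representing tanh(x/2) has a probability of 1s equal to (1 + tanh(x/2))/2, so reading the same bits in unipolar format yields the sigmoid value directly.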

IV. EXPERIMENTAL STUDIES

In this section, the activation functions were simulated in the Python 3 programming language using the Google Colab interface. The hardware simulations were realized with the Xilinx ISE program, and the generated bit files were embedded into an FPGA Nexys DDR4 Artix-7 chip.

A. Simulation Results

The original function and the S2Tanh approximation results for Tanh(2x) are seen in Fig. 6. The 2D-FSM is set to a 3 × 3 state matrix. Outputs are obtained for stochastic representation sequences of 32, 64, and 128 bit lengths, respectively.

The function generation for S2Sig(x) using stochastic bit sequence inputs with lengths of 32, 64, and 128 bits is demonstrated in Fig. 7. The 2D-FSM state matrix dimensions are set to 12 × 12 in order to obtain the S2Tanh(x/2) function used during the generation of the S2Sig(x) function.

For inputs with 16, 32, and 64 bit lengths of stochastic representation to generate exponential functions, the outputs are obtained from (13), as demonstrated in Fig. 8. The 2D-FSM state matrix dimensions are set as 2 × 2.

Fig. 6. Tanh(2x) function and S2Tanh approximations.

Fig. 5. 2D-FSM architecture used for S2Tanh function.

Fig. 4. 2D-FSM architecture used for S2Exp function.


As can be seen in Figs. 6–8, the bit length is the main constraint on the output values. The effect of each bit on the value of the output stochastic representation is 0.125, 0.0625, and 0.03125 if the stochastic representation is designed for 8, 16, and 32 bit lengths, respectively.
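These step sizes follow from the fact that flipping a single bit of a unipolar sequence of length L changes its value by 1/L. The symbol $\Delta$ for this resolution is notation introduced here for illustration.

```latex
\Delta = \frac{1}{L}: \qquad
\frac{1}{8} = 0.125, \quad
\frac{1}{16} = 0.0625, \quad
\frac{1}{32} = 0.03125
```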

The mean square error (MSE) of the presented 2D-FSM-based method, evaluated over all possible input sequences for each input bit length, is presented in Table I. The total numbers of possible sequences are 2⁸, 2¹², 2¹⁶, and 2²⁴ for 8, 12, 16, and 24 bits, respectively. The dimensions of the 2D-FSM are set to 6 × 6 and 2 × 2 for S2Tanh(x) and S2Exp(−3x), respectively.

In Table I, the linear FSM-based tangent hyperbolic and exponential functions are computed using the methods provided in (10) and (11), as stated in [14,15]. The SCExp function in (11) is used with G = 1.

B. Hardware Simulation

A performance comparison of the FPGA implementations of linear FSM-based SC and 2D-FSM-based SC for Tanh(2x), considering power consumption, delay, and resource utilization, is given in Table II. For the 2D-FSM, the input bit length is 16 and the output bit length is 8. For the linear FSM, both the input and the output are 8 bits. The state layout dimensions are 3 × 3 for the 2D-FSM and 1 × 4 for the linear FSM.

V. STOCHASTIC COMPUTATIONAL MULTILAYER PERCEPTRONS

In this study, the classification performance of the multilayer feedforward neural network was investigated with deterministic and SC methods. The Electrical Grid Stability Simulated Data (EGSSD) dataset was used to analyze the accuracy of the MLP for the Decentralized Smart Grid Control (DSGC) model [31].

The 2D-FSM-based MLP architecture where SC is performed is seen in Fig. 9. The dataset is translated by the deterministic to stochastic number converter (D2S). Randomly sequenced bits and the probability matching values are generated by the D2S unit. As both negative and positive values are included, our conversion data are in a bipolar format. The mathematical expression of the conversion is given in (23). The number of 1s is calculated from (24) and the remaining elements are “0.” The output is achieved by arbitrarily adjusting the position of the sequence bits:

$x = 2\Pr(X) - 1$ (23)

Fig. 7. Approximation results using σ(2x) function and the proposed 2D-FSM-based SC method.

Fig. 8. Exp(−3x) function and S2Exp approximation results.

TABLE I. COMPARATIVE MSE RESULTS OF METHODS FOR VARIOUS BIT LENGTHS

| L | Tanh(x): [14] | Tanh(x): S2Tanh(x) | Exp(−3x): [14] | Exp(−3x): S2Exp(−3x) | σ(2x): S2Sig(2x) |
| 8 | 1.1027 | 0.4554 | 0.3873 | 0.1825 | 0.1224 |
| 12 | 0.7727 | 0.3673 | 0.3108 | 0.1729 | 0.1163 |
| 16 | 0.6241 | 0.3321 | 0.2687 | 0.1697 | 0.1149 |
| 24 | 1.0386 | 0.4649 | 0.3480 | 0.1636 | 0.1171 |

TABLE II. A COMPARATIVE FPGA IMPLEMENTATION PERFORMANCE FOR SC METHODS

| FPGA Performance Criteria | Linear FSM SCTanh(2x) | Proposed 2D-FSM S2Tanh(2x) |
| State architecture | 1 × 4 | 3 × 3 |
| Power consumption | 82 mW | 82 mW |
| Max delay | 7.75 ns | 4.73 ns |
| Number of slice LUTs | 1% utilization, 3 used | 1% utilization, 6 used |
| Number of slice registers | 1% utilization, 2 used | 1% utilization, 4 used |


$s = \mathrm{int}\left(\frac{x + 1}{2} \cdot N\right)$ (24)

The data obtained from the D2S unit is transferred to the MLP input layer. After all computations are done in stochastic form in the MLP, the stochastic output Y is sent to the S2D unit in order to convert it to deterministic values. The conversion process of the S2D is in unipolar format, and the conversion expression is given in (25). In this equation, y is the output of the feedforward neural network:

$y = 2\Pr(Y) - 1$ (25)
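The D2S conversion described in this section (a fixed number of 1s placed at random positions) can be sketched in Python. The function names `d2s_bipolar` and `s2d_bipolar` are hypothetical, `random.shuffle` is assumed as the mechanism for arranging the bit positions, and the decoder simply inverts the bipolar encoding.

```python
import random

def d2s_bipolar(x, n, rng):
    """Bipolar D2S: place int((x + 1) / 2 * n) ones in an n-bit sequence,
    fill the rest with zeros, then shuffle the positions."""
    ones = int((x + 1) / 2 * n)
    bits = [1] * ones + [0] * (n - ones)
    rng.shuffle(bits)
    return bits

def s2d_bipolar(bits):
    """Bipolar decode: x = 2 * Pr(X) - 1."""
    return 2 * sum(bits) / len(bits) - 1

rng = random.Random(3)
bits = d2s_bipolar(-0.5, 1024, rng)
print(sum(bits), s2d_bipolar(bits))   # 256 -0.5
```

Because the number of 1s is fixed rather than drawn at random, the encoded value is exact up to the rounding in `int`, and only the bit order is random.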

A. Electrical Grid Stability Dataset

Electrical grids need a balance between the supply and demand of electricity in order to be stable. The traditional schemes accomplish this equilibrium by the demand-driven production of electricity. The growth in the usage of renewable energy sources has made the production of electricity irregular.

In addition, there has been a change in demand for power due to the instantly changing application of electricity rates on the electrical grid.

The Electrical Grid Stability Simulated Data (EGSSD) dataset was used in this work to classify the reliability of the electricity grid. The EGSSD dataset comprises 10 000 samples. A four-node power grid with localized generation is simulated. The properties of the data, gathered over an average of two seconds, are explained in the following order.

tau[x]: Time of reaction of the participant.

p[x]: Rated power. Power consumed (negative)/power generated (positive). Here, p1 = |p2 + p3 + p4|, so p1 is a non-predictive feature.

g[x]: Coefficient proportional to price elasticity. g1 is the value for the electricity producer.

stabf: System stability label (categorical: stable/unstable).

In the dataset, tau1, tau2, tau3, tau4, p1, p2, p3, p4, g1, g2, g3, and g4 reflect the data of producers and consumers. The stabf_stable data are the target data that provide information on the reliability of the power grid system. While the p[x] input data include negative values, the other data are positive.

B. Stochastic Computational MLP Performance

The MLP was trained using three approaches: without cross validation (CV), and with three-fold and five-fold CV. All data were normalized within [–1, 1]. For weight updates during training, the stochastic gradient descent method was used in back propagation. The MLP network consists of an input layer, two hidden layers, and an output layer. For the deterministic computation, the tangent hyperbolic function was used as the activation function in the hidden layers and the sigmoid function in the output layer. For the 2D-FSM-based SC, the S2Tanh function in (12) was used for the hidden layers and the S2Sig function in (22) for the output layer.

Fig. 9. 2D-FSM-based MLP architecture.


Stochastic conversion was applied to the input of networks and output of networks using D2S and S2D units. Average error according to bit lengths was determined for the data in the range [–1, 1] for the S2Tanh function, as shown in Fig. 10. All test data were transformed into 1024-bit random sequences in the D2S unit. As shown in the figure, the lowest measured error value was 9% for 1024 bits.

AND gates were used for the multiplication. In order to determine the NET value of the neuron, the multiplication results were sent to the sum block. The integer addition method suggested in Ardakani et al. [18] was used for the addition operation.

When a MUX is used for addition in SC, the errors of the network are high. In the integer addition process, the addition is realized bit by bit: with a scaling factor m, the data in the series can take values in [0, m] in unipolar format and in [−m, m] in bipolar format.
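The weighted-sum step described above, AND-gate multiplication followed by bit-by-bit integer addition, can be sketched as follows. The function names and the example streams are hypothetical and are chosen only to keep the arithmetic easy to follow; a unipolar interpretation is assumed.

```python
def and_mul(a, b):
    """Multiply two unipolar streams with a bitwise AND."""
    return [x & y for x, y in zip(a, b)]

def integer_add(streams):
    """Bit-by-bit integer addition: each output element is the number of 1s
    across the m input streams in that clock cycle, a value in [0, m]."""
    return [sum(bits) for bits in zip(*streams)]

inputs  = [[1, 0, 1, 1], [1, 1, 1, 0]]
weights = [[1, 1, 0, 1], [1, 0, 1, 0]]
products = [and_mul(x, w) for x, w in zip(inputs, weights)]
net = integer_add(products)
print(products)              # [[1, 0, 0, 1], [1, 0, 1, 0]]
print(net)                   # [2, 0, 1, 1]
print(sum(net) / len(net))   # 1.0
```

With m = 2 streams, every element of `net` lies in [0, 2], and the mean of the integer stream equals the sum of the two product values (0.5 + 0.5).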

The NET values obtained from the weighted sum block were passed to the S2Tanh function, which is the hidden neuron activation function, as shown in Fig. 11. Here, the state matrix in the S2Tanh model was set to 6x6 for the Tanh(x) function.

The sequence at the output of the tanh activation function has length N/2. Reducing the sequence length increases the error further: because the length halves in each layer, it would fall to N/8 at the output layer, resulting in high errors. For this reason, the sequence obtained at the output of the S2Tanh activation function was duplicated to restore a sequence of length N.

In the output layer of the MLP, the sigmoid activation function was used. The state matrix for the S2Tanh(x/2) function was set to 12 × 12, using the transformation from the tangent hyperbolic function to the sigmoid function, as defined in (22).

In the S2D unit, the stochastic output bit sequence was converted to deterministic values. The multilayer feedforward neural network was trained using three training approaches: without CV, and with three-fold and five-fold CV. For the training without CV, 67% of the data was used for training and 33% for testing. Values of y less than 0.4 were assigned to class “0” (“unstable”); all other values were assigned to class “1” (“stable”).

The classification accuracy percentages for MLP performance using deterministic, linear FSM, and the proposed computation methods are shown in Table III.

Fig. 12 shows the ROC (Receiver Operating Characteristic) curve and AUC (area under the curve) for the classification performance of the MLP designed with the proposed 2D-FSM-based stochastic computation method.

Fig. 11. The stochastic computation for hidden neurons.

Fig. 10. S2Tanh function error graphic depending on bit length.


VI. CONCLUSIONS

In this study, a 2D-FSM-based stochastic computing approach is introduced for exponential and tangent hyperbolic functions. Four key issues are reported for current approaches in the literature: circuit cost, power consumption, error rate, and latency with increasing bit lengths. According to Table II, the presented method uses twice the resources and has 2.25 times more states than the linear FSM. In the same table, the presented method runs 1.63 times faster than the 1D-FSM-based SC architecture. In terms of error rate, this method reduced the errors arising from the stochastic computation of the tangent hyperbolic and exponential functions. Furthermore, the presented approach is more beneficial in terms of data processing than other 2D-FSM-based approaches, since the bit length is decreased by 50% relative to those methods. In addition, there is no need for an additional MUX block or for a coefficient supplied as an external input, in contrast to the existing 2D-FSM-based models in the literature. Furthermore, it is demonstrated that as the bit length increases, the mean square errors decrease.

The results show that the presented approach provides a useful alternative to the approaches in the literature in terms of error rate and speed when comparing the linear FSM and 2D-FSM-based stochastic computing methods. In addition to this, in terms of resource utilization on the hardware, the presented

The presented 2D-FSM approach has been evaluated for smart grid regulation using the multilayer feedforward neural network. Due to the randomness and the normalization process, this method obtained 12% less accurate results than deterministic methods. The presented method provides 18% more accuracy compared to linear FSM-based stochastic computing for the EGSSD dataset. As a consequence, 2D-FSM architecture can be used for real-time smart grid control, where data density is high and speed is important.

Peer-review: Externally peer-reviewed.

Author Contributions: Concept – D.E., B.E.; Design – D.E., B.E.; Supervision – D.E., B.E.; Materials – D.E., B.E.; Data Collection and/or Processing – D.E., B.E.; Analysis and/or Interpretation – D.E., B.E.; Literature Search – D.E., B.E.; Writing Manuscript – D.E., B.E.

Conflict of Interest: The authors have no conflicts of interest to declare.

Financial Disclosure: The authors declared that this study has received no financial support.

Fig. 12. ROC curve for the proposed SC-based MLP classifier.

Burcu Erkmen received the B.S., M.S., and Ph.D. degrees in Electronics and Communication Engineering from Yıldız Technical University, İstanbul, Turkey, in 1999, 2001, and 2007, respectively. From 1999 to 2009, she was a Research Assistant in the Department of Electronics and Communication Engineering, Yıldız Technical University, where she has been an Associate Professor since 2014. Her current research interests include FPGA-based system design, optimization techniques in electronic circuits, artificial neural networks, deep learning, and artificial intelligence in power converters.
