Greedy Cross-Weight Fixing - GraphDNN: DNN Dynamic Partitioning Optimization Algorithm Design

Chapter 5 DNN Dynamic Partitioning Optimization Algorithm Design

5.1 GraphDNN: DNN Dynamic Partitioning Optimization Algorithm Design

5.1.2 Greedy Cross-Weight Fixing

As clarified above, dynamic pruning can prune cross-partition connections whose weights have very small absolute values. To compensate for this pruning, other components of the neural network must increase their weight values, and some of those components may themselves be cross-partition synapses. Inspired by this compensation process, we would like to cut more cross-partition synapses and let synapses within the same partition compensate for the change. On the other hand, we cannot cut all cross-partition synapses, since doing so would damage the accuracy of the deep neural network (classification accuracy cannot be guaranteed when only non-consecutive, irregular parts of the input are given). We therefore propose a new dynamic optimization: Greedy Cross-Weight Fixing.

Figure 5.1 Weight changing process: (a) initial weights; (b) fixing the weight connection; (c) after retraining; (d) in the next iteration.

5.1.2.1 Background Learning Process

To understand our optimization, we first need the basic learning mechanism of neural networks: how the nearby part of the network reacts when we change the value of a synapse and fix it. We describe this in two parts: the changing process and the fixing process.

1) The Changing Process

Back-propagation with gradient descent is used to fine-tune the network while the selected cross-partition synapses are fixed. Using a simple neural network with one fully-connected layer, we now explain how the weights surrounding a fixed weight change during training; the same principle carries over to multiple layers. During back-propagation, the magnitude of the gradient step (how much a weight value changes) associated with a neuron is determined by the input values it receives from the neurons of the previous layer. Therefore, when we cut off (i.e., fix at zero) a cross-partition synapse with weight W_ij connecting input neuron i to neuron j, an error appears between the expected output value of j and its actual output value; this error comes from the missing contribution of i through W_ij. Fortunately, if other input neurons (e.g., i − 1 and i + 1) are correlated with i, they can compensate for this error during back-propagation: because the fixed weight W_ij is excluded from the update, gradient descent instead changes the weights of the synapses connecting i − 1 and i + 1 to j in order to minimize the error.

Figure 5.1 demonstrates this changing process. As shown in Figure 5.1(a), a group of neighboring input neurons P_{i−1}, P_i, and P_{i+1} have similar values. After training with gradient descent, the associated weights also have similar values.

[Note: why would they have similar values? In general, and in the experiments that follow, the input samples of a typical neural-network application are highly correlated. For example, in a vision application the input data are continuous, because image pixels in nearby regions have similar values. It is worth noting that the partitioning algorithm only partitions the network workload according to its computation demands; the original network topology and values keep their original meaning.]

In the figure, the weight W_{P_i O} belongs to the set of synapses selected to be as far from each other as possible; we therefore cut off this synapse and fix its weight value, as shown in Figure 5.1(b), which increases the error in the value of the output neuron. When the network is trained again, the local gradient that would have updated W_{P_i O} is, because W_{P_i O} is fixed, absorbed by W_{P_{i−1} O} and W_{P_{i+1} O} instead, so that the value of the output neuron returns to its original value, as shown in Figure 5.1(c). This process continues, as shown in Figure 5.1(d), until the accuracy change reaches our limit, for instance a 2% reduction.
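The compensation effect described above can be reproduced numerically. The following is a minimal sketch in plain Python, not the thesis implementation: the three input scalings, the initial weights of 0.5, the learning rate, and the helper name `retrain_with_mask` are all assumptions chosen for illustration. One output neuron receives three correlated inputs; the middle synapse is fixed at zero through a mask, and gradient descent retrains the other two.

```python
def retrain_with_mask(weights, mask, samples, targets, lr=0.1, steps=300):
    """Gradient descent on mean squared error; masked weights stay at zero."""
    w = [wk * mk for wk, mk in zip(weights, mask)]
    n = len(samples)
    for _ in range(steps):
        grads = [0.0] * len(w)
        for x, y in zip(samples, targets):
            err = sum(wk * xk for wk, xk in zip(w, x)) - y
            for k, xk in enumerate(x):
                grads[k] += err * xk / n
        # Multiplying by the mask removes the update of the fixed synapse.
        w = [wk - lr * g * mk for wk, g, mk in zip(w, grads, mask)]
    return w

# Assumed toy data: P_{i-1}, P_i, P_{i+1} carry nearly the same signal.
signals = [0.1 * (t + 1) for t in range(20)]
samples = [(0.9 * s, s, 1.1 * s) for s in signals]
original = [0.5, 0.5, 0.5]
targets = [sum(wk * xk for wk, xk in zip(original, x)) for x in samples]

mask = [1, 0, 1]  # cut the middle synapse and keep it fixed at zero
retrained = retrain_with_mask(original, mask, samples, targets)
max_err = max(
    abs(sum(wk * xk for wk, xk in zip(retrained, x)) - y)
    for x, y in zip(samples, targets)
)
```

In this toy setting the two neighboring weights grow above their initial value of 0.5 while the masked weight stays at zero, and the output error returns to nearly zero, mirroring panels (b) and (c) of Figure 5.1.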

2) The Fixing Process

This process describes how we decide which synapses to fix during the dynamic procedure. The overall principle is that, each time, the fixing process prefers to select and fix cross-partition synapses that are as far from each other as possible.

The Changing Process above describes how, when one synapse is fixed, its surrounding synapses adjust their weight values to compensate for the change. The Fixing Process focuses on how to select the cross-partition synapses to be pruned within the pre-stipulated accuracy limit.

Two considerations guide the choice of the next connection weight to fix. First, as explained in the Changing Process above, the continuity of the input data (which can be image pixels or video frames) is preserved in the distributed computing environment as well. Second, the weight compensation is carried out by the synapses surrounding the fixed synapse. Inspired by the k-means clustering algorithm, this paper applies the fixing operations to the network connections that are farthest from each other. The idea is intuitive for k-means applications, where distances between nodes are already measured by their synapse weights. For synapse distances in a neural network, we must take the fully-connected layer structure into account: as Figure 5.1 illustrates, the change introduced by fixing W_{P_i O} has to be absorbed by W_{P_{i−1} O} and W_{P_{i+1} O}. We therefore consider the output nodes to be the key measure for synapse distances in neural networks.
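One way to read "farthest from each other" is greedy farthest-point selection, sketched below in plain Python. This is an illustrative assumption, not the procedure of the thesis: the function name `pick_far_apart`, the choice of the first candidate as the starting point, and the use of integer positions with absolute difference as the distance are all hypothetical.

```python
def pick_far_apart(candidates, k, dist):
    """Greedily pick up to k candidates, each one maximizing its distance
    to the closest candidate already chosen."""
    if not candidates or k <= 0:
        return []
    chosen = [candidates[0]]  # assumed starting point: the first candidate
    remaining = list(candidates[1:])
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: min(dist(c, s) for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical positions of cross-partition synapses along a flattened layer.
positions = [1, 2, 4, 5, 11, 12]
selected = pick_far_apart(positions, 3, lambda a, b: abs(a - b))
```

The distance is passed in as a function so that any synapse distance can be substituted for the absolute difference used here.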

5.1.2.2 Our Algorithm: Greedy Cross-Weight Fixing

We now define synapse distances in fully-connected layers. Given a fully-connected layer with i input nodes and j output nodes, let P_k denote the k-th input node, Q_k the k-th output node, and W_{P_m Q_n} the weight of the synapse connecting the m-th input node to the n-th output node. The synapse distance is defined by the equation:

$$
\mathrm{Dist}_{ab} = \mathrm{Dist}_{P_u Q_v - P_m Q_n} = \left| (i - u) + (n - v - 1) \times i + m \right| = \left| (n - v) \times i + m - u \right| \tag{5-1}
$$

That is, the distance between synapse a (connecting P_m and Q_n) and synapse b (connecting P_u and Q_v) equals the absolute value of (i − u) + (n − v − 1) × i + m = (n − v) × i + m − u.
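As a worked check of Equation (5-1), the short Python sketch below evaluates both forms of the distance and confirms that they agree. The function names and the example layer size are assumptions for illustration. The simplified form can also be read as the difference of two flattened positions, where synapse P_m Q_n sits at position (n − 1) × i + m when the synapses are listed output node by output node; this reading is an observation about the formula, not a statement from the thesis.

```python
def synapse_distance(i, m, n, u, v):
    """Distance between synapse P_m-Q_n and synapse P_u-Q_v
    in a fully-connected layer with i input nodes (1-based indices)."""
    expanded = abs((i - u) + (n - v - 1) * i + m)
    simplified = abs((n - v) * i + m - u)
    assert expanded == simplified  # the two forms of Eq. (5-1) coincide
    return simplified

def flat_position(i, m, n):
    """Position of synapse P_m-Q_n when listed output node by output node."""
    return (n - 1) * i + m

layer_inputs = 4  # assumed example size
d = synapse_distance(layer_inputs, m=2, n=3, u=1, v=1)
same_by_position = abs(
    flat_position(layer_inputs, 2, 3) - flat_position(layer_inputs, 1, 1)
)
```

For a layer with four inputs, the synapses P_2 Q_3 and P_1 Q_1 are at distance |(3 − 1) × 4 + 2 − 1| = 9, and swapping the two synapses gives the same value because of the absolute value.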

We take the distance between synapses as the fixing measure and the pre-stipulated accuracy reduction as the stopping criterion for retraining. Together, the Changing Process and the Fixing Process constitute the following Greedy Fixing Optimization.

Furthermore, to combine the dynamic pruning process with the greedy cross-weight fixing process during retraining, two points should be followed. First, the order "dynamic pruning, then greedy cross-weight fixing" should be kept in every iteration. Dynamic pruning processes small blocks of synapses; if it is performed after greedy fixing, some of the synapse pruning operations done during fixing become meaningless. Second, one redundant cycle should be kept after each greedy weight fixing operation, which is used to adjust the values of the weight matrix.
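The ordering constraints above can be sketched as a small driver loop. This is a hedged illustration under stated assumptions, not the thesis code: the function name `run_schedule`, the four callbacks, the placement of one regular retraining step before the redundant cycle, and the 2% default limit (taken from the example limit mentioned earlier) are all hypothetical choices.

```python
def run_schedule(prune, fix, retrain, evaluate, baseline_acc,
                 max_drop=0.02, max_iters=100):
    """Run prune -> fix -> retrain -> extra retrain per iteration and stop
    once accuracy has dropped by at least max_drop. Returns the step log."""
    log = []
    for _ in range(max_iters):
        prune()    # dynamic pruning always comes first
        log.append("prune")
        fix()      # greedy cross-weight fixing follows pruning
        log.append("fix")
        retrain()
        log.append("retrain")
        retrain()  # redundant cycle: weights readjust, nothing new is cut
        log.append("extra_retrain")
        if baseline_acc - evaluate() >= max_drop:
            break
    return log

# Stub callbacks: accuracy falls from 0.90 to 0.87 over three iterations.
accuracies = iter([0.90, 0.89, 0.87])
steps = run_schedule(
    prune=lambda: None,
    fix=lambda: None,
    retrain=lambda: None,
    evaluate=lambda: next(accuracies),
    baseline_acc=0.90,
)
```

With these stub values the loop runs three iterations and stops when the accuracy reduction first reaches the limit.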

Algorithm 5: Dynamic Retraining Optimization 2: Greedy Cross-Weight Fixing

Input: Let P_{[1×(r×c)]} be the partition matrix, whose element equals one if the edge is a cross-partition edge and zero otherwise; let M_{[1×(r×c)]} be the mask matrix, whose element equals zero if the edge is cut off and one otherwise; let W_{[1×(r×c)]} be the weight matrix, each element of which is the corresponding edge weight; let Dist_ab be the distance between edge a and edge b; let k be the desired number of cross-partition edges to cut off in each iteration; ct records the total count of current cross-partition edges.

Output: Weight matrix W'_{[1×(r×c)]} and partition matrix P'_{[1×(r×c)]}.

while δ < EndCondition do
    for i in all r × c connecting edges do
        if P[i] = 1 then
            ct += 1;
        end
    end
    for i in all r × c connecting edges do
        if P[i] = 1 and tmp = ct/k then
            M[i] = 0, W[i] = 0, P[i] = 0;
        end
    end
end
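A minimal Python sketch of one fixing pass of Algorithm 5 follows. The pseudocode does not define `tmp`; the sketch assumes it is a running count of cross-partition edges and that an edge is fixed each time this count reaches a multiple of ct/k, which spreads the k fixed edges evenly over the flattened layer. This is an assumed reading, the function name `greedy_fix_step` is hypothetical, and the retraining between passes and the outer stopping test are omitted.

```python
def greedy_fix_step(P, M, W, k):
    """One fixing pass over flattened 1 x (r*c) lists, modified in place.
    Returns the indices of the edges that were cut."""
    cross = [idx for idx, flag in enumerate(P) if flag == 1]
    ct = len(cross)  # current number of cross-partition edges
    if ct == 0 or k <= 0:
        return []
    stride = max(ct // k, 1)  # assumed meaning of `tmp = ct/k`
    fixed = []
    for rank, idx in enumerate(cross, start=1):
        if rank % stride == 0 and len(fixed) < k:
            M[idx] = 0    # mask the edge out of future updates
            W[idx] = 0.0  # fix its weight at zero
            P[idx] = 0    # it no longer counts as a cross-partition edge
            fixed.append(idx)
    return fixed

# Hypothetical layer with eight edges, six of them cross-partition.
P = [1, 0, 1, 1, 0, 1, 1, 1]
M = [1] * 8
W = [0.3, 0.1, -0.2, 0.4, 0.5, -0.1, 0.2, 0.6]
cut = greedy_fix_step(P, M, W, k=2)
```

Here six cross-partition edges and k = 2 give a stride of three, so the third and sixth cross-partition edges (flattened indices 3 and 7) are cut.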

Figure 5.2 Zero Probability of CIFAR-10 DNN

Source document: Deep Neural Network Partitioning in Distributed Computing System, Shanghai Jiao Tong University Bachelor's Thesis (pages 41-46)