Deep Convolutional Neural Network for Motion Deblurring




Abstract: Removing blur caused by camera shake has long been a challenging problem in computer vision because it is ill-posed, which makes accurate prediction of blur kernels very difficult. In this paper, we propose deep convolutional neural networks for the deblurring task that avoid estimating blur kernels. Compared with other approaches, we add an L2 norm term to the loss function. L2 regularization is generally used as the regularization term of an objective function; here it safeguards the quality of the restored image and prevents the overfitting caused by too many parameters. The proposed model deblurs images directly by learning the intrinsic relationship between the blurred image and the target image.

I. INTRODUCTION

With the development of 4G technology and the arrival of the 5G era, images have become important carriers of information. In 2013, Facebook, the world's largest social networking site, revealed that its 1.15 billion users uploaded an average of 350 million photos per day and that the total number of uploaded photos had reached 250 billion. However, image quality is always degraded to some extent during acquisition by various factors. The most typical causes of image degradation are as follows [1-2]: atmospheric turbulence, solar radiation, and similar effects distort the image during imaging; relative motion between the camera and the target produces motion blur; and inaccurate focus, or targets at different depths in the imaging area, leads to defocus blur. Motion blur refers to the blurring of the captured image that results from relative motion between the target and the imaging device during exposure.

Inverse filtering began to be widely used in image restoration as early as the mid-1960s. The R-L algorithm, proposed by Richardson in 1972 [3] within a Bayesian framework, assumes that the blurred image follows a Poisson distribution and solves for the restored image iteratively by maximum likelihood. However, the R-L algorithm is sensitive to noise and produces ringing artifacts even when only a little noise is present. In 2008, Shan et al. [4] constructed a regularization constraint by locally limiting the gradients of the blurred image and the restored image. This method suppresses ringing well, but it loses detail and leaves noise. Fergus et al. [5] constructed a regularization term for image restoration by approximating the heavy-tailed gradient distribution of natural images with a Gaussian model, which reduces the loss of detail and the noise, but the approximation error makes the restored structure inaccurate. Although many traditional blind restoration algorithms have been proposed, accurately estimating blur kernels remains difficult. Recently, with the development of deep learning, Sun et al. [6] in 2015 first used a deep learning method to estimate motion blur kernel information in small regions of a given image and then attempted to recover the sharp image. This method identifies complex motion blur well and removes it effectively, but it constrains the form of the blur kernel. In 2017, Nah et al. [7] trained a multi-scale deep network to address these problems, recovering sharp images progressively through an end-to-end network, but residual motion blur remains. In 2018, Yu et al. [8] proposed using CNNs of different depths to process blurred images and achieved good results, but the edge features of the restored images were not distinct. To solve these problems, we propose a motion-blurred image restoration method based on a deep convolutional neural network that makes the edge features of the restored images more distinct and reduces the loss of detail.

Supported by the Foundation of Sichuan Science and Technology Department (2017GZ0331).

1, graduate student, Chengdu University of Information Technology, Chengdu, China (corresponding author; phone: 18408272762; e-mail: 18408272762@163.com).

2, professor, Chengdu University of Information Technology, Chengdu, China.

3, associate professor, Chengdu University of Information Technology, Chengdu, China.

II. OVERVIEW OF THE CONVOLUTIONAL NEURAL NETWORK

Convolutional neural networks (CNNs) are a very popular class of models in deep learning research and a type of artificial neural network.

CNNs are multi-layer neural networks that excel at image-related machine learning problems. Through a series of operations, a CNN reduces the dimensionality of the image data so that the network can be trained. The CNN was first proposed by Yann LeCun [9] and applied to handwritten digit recognition (MNIST). The network is called LeNet, and its structure is shown in Fig. 1.

4, graduate student, Chengdu University of Information Technology, Chengdu, China

Yu Zhou¹, Rui Fang², Peng Liu³, Kai Liu⁴

Downloaded 08/24/21 to 134.122.89.123. Redistribution subject to SIAM license or copyright; see https://epubs.siam.org/page/terms

Figure 1. LeNet

III. IMAGE DEBLURRING ALGORITHM BASED ON CONVOLUTIONAL NEURAL NETWORK

Following the general image deblurring process [10-11], the overall approach of this paper is shown in Fig. 2:

Figure 2. Overall approach

A blurred image $I_B$ is fed directly into the trained deep convolutional network model, and the output $I_S$ is the sharp result.

A. Deep convolutional network model structure design

We construct a deep convolutional network model to generate sharp images. A CNN includes convolutional layers, pooling layers, fully connected layers, and activation layers. A convolutional layer is composed of multi-channel convolution kernels. A convolution kernel is usually represented by a matrix whose elements are learnable weights. A pooling layer downsamples its input matrix; common pooling methods are mean pooling, max pooling, and stochastic pooling. A fully connected layer flattens the 2D feature maps output by the convolutional layers into a vector, which is convenient for subsequent classification or regression. The role of the activation layer is to introduce nonlinearity. The convolutional and fully connected layers described above are linear, so a network built only from them has limited expressive power. It is therefore necessary to add nonlinear activation layers to the network. Common activation functions include Sigmoid, Tanh, and ReLU.

Our model uses the ReLU activation function, which is cheap to compute. ReLU also makes some neurons output zero, which makes the network sparse and reduces the interdependence of parameters, thereby alleviating overfitting. We add a Dropout layer before the output to further reduce overfitting; it also automatically rescales the neuron output values.
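The sparsity and rescaling effects described here can be illustrated with a minimal sketch. It assumes the common "inverted dropout" formulation and a keep probability of 0.5; neither is stated in the paper, and the input values are made up for illustration.

```python
import random


def relu(values):
    """Element-wise ReLU: negative inputs become exactly zero."""
    return [max(0.0, v) for v in values]


def dropout(values, keep_prob, rng):
    """Inverted dropout (an assumption, not confirmed by the paper):
    zero each unit with probability 1 - keep_prob and divide the
    survivors by keep_prob so the expected output is unchanged."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


# Hypothetical pre-activation values of six neurons.
pre_activations = [-1.5, 0.3, -0.2, 2.0, 0.0, -0.7]
activations = relu(pre_activations)

# Fraction of neurons that output zero after ReLU.
sparsity = activations.count(0.0) / len(activations)
print(activations, sparsity)

rng = random.Random(0)  # fixed seed so the run is repeatable
dropped = dropout(activations, keep_prob=0.5, rng=rng)
print(dropped)
```

Four of the six hypothetical neurons are silenced by ReLU alone, and every unit that survives dropout is scaled by 1 / keep_prob.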

The model consists of 25 convolutional layers, 1 dense (fully connected) layer, and 1 Dropout layer. The structure is: convolutional layer 0 (first) → max pooling layer 0 → convolutional layer 1 (second) → max pooling layer 1 → ... → convolutional layer 24 (25th) → max pooling layer 24 → dense layer → Dropout layer → output.

Here the "→" symbol denotes a connection between network layers: "A→B" means that the output of layer A is the input of layer B (as shown in Fig. 3).

Figure 3. 25-layer CNN

For comparison, a second model consists of 20 convolutional layers, 1 dense layer, and 1 Dropout layer. The structure is: convolutional layer 0 (first) → max pooling layer 0 → convolutional layer 1 (second) → max pooling layer 1 → ... → convolutional layer 19 (20th) → max pooling layer 19 → dense layer → Dropout layer → output (as shown in Fig. 4).

Figure 4. 20-layer CNN
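The two layer orderings can be written down programmatically. The sketch below only enumerates layer names in the order given in the text; the helper `layer_sequence` and the name strings are hypothetical, and kernel sizes, channel counts, and pooling windows are omitted because the paper does not specify them.

```python
def layer_sequence(num_conv):
    """Return the ordered layer names for a model with num_conv
    convolution / max-pooling pairs, followed by dense, Dropout, output."""
    layers = []
    for i in range(num_conv):
        layers.append(f"conv{i}")
        layers.append(f"maxpool{i}")
    layers += ["dense", "dropout", "output"]
    return layers


deep = layer_sequence(25)     # the 25-layer structure
shallow = layer_sequence(20)  # the 20-layer structure

print(" -> ".join(deep[:4]), "...", " -> ".join(deep[-5:]))
print(len(deep), len(shallow))
```

Indexing from 0, as the text does, the last convolution/pooling pair of the 25-layer model is `conv24`/`maxpool24`, and that of the 20-layer model is `conv19`/`maxpool19`.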

B. Loss function

• Cross-entropy constraint

In theory, the generated image f(x) should be as close as possible to the sharp image y at every pixel. We therefore use a cross-entropy term that measures the error between the generated image f(x) and the sharp image y at each scale, and minimize it. It is expressed by the following formula:

$$L_1 = -\sum y \log(f(x))$$
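As a small numerical illustration of the formula above, the sketch below evaluates the cross-entropy term on flattened pixel lists. The pixel values are hypothetical, they are assumed to lie in [0, 1], and the `eps` clamp is an added safeguard against log(0) that the paper does not mention.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """L1 = -sum(y * log(f(x))) over all pixels.
    eps is an assumed guard that keeps log() away from zero."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(target, predicted))


sharp = [1.0, 0.0, 1.0, 0.5]  # hypothetical target pixels y
close = [0.9, 0.1, 0.8, 0.5]  # hypothetical output f(x) near the target
far = [0.2, 0.9, 0.1, 0.5]    # hypothetical output f(x) far from the target

print(cross_entropy(sharp, close), cross_entropy(sharp, far))
```

For these inputs the output that is closer to the target produces the smaller loss value.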

• L2 sparse regularization constraint

The training process consists of forward propagation and error back propagation, and error propagation is the most important part of weight adjustment. L2 regularization is generally used as the regularization term of an objective function. This paper adopts the L2 norm, which safeguards the quality of the restored image and keeps the parameters from becoming so complex that the model overfits. We therefore use the L2 norm between the output image and the ground-truth sharp image:

$$L_2 = \frac{1}{2} \sum \|f(x) - y\|_2^2$$

In the above formula, f(x) is the final output of forward propagation, and y is the target pixel value.
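The L2 term and the gradient that back propagation starts from can be sketched as follows. The pixel values are hypothetical; the gradient with respect to each output pixel, f(x) − y, follows directly from differentiating the formula above.

```python
def l2_loss(predicted, target):
    """L2 = 1/2 * sum((f(x) - y)^2) over all pixels."""
    return 0.5 * sum((p - y) ** 2 for p, y in zip(predicted, target))


def l2_grad(predicted, target):
    """Derivative of L2 with respect to each output pixel: f(x) - y."""
    return [p - y for p, y in zip(predicted, target)]


output = [0.9, 0.1, 0.8]  # hypothetical network output f(x)
target = [1.0, 0.0, 1.0]  # hypothetical sharp pixels y

print(l2_loss(output, target))
print(l2_grad(output, target))
```

The factor 1/2 cancels the exponent when differentiating, which is why the gradient is simply the per-pixel residual.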

• Overall model

The total loss function is a linear combination of the two parts. The final loss function of this paper is:

$$L = L_1 + L_2$$

The error given by the loss function is propagated back through the network layer by layer, and the weights are updated with the Adaptive Moment Estimation (Adam) optimization strategy. Adam is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure; it iteratively updates the neural network weights based on the training data. The main advantage of Adam is that, after bias correction, the learning rate in each iteration stays within a bounded range, which makes the parameter updates more stable. Adam was proposed by Diederik Kingma of OpenAI and Jimmy Ba of the University of Toronto [12] in their ICLR 2015 paper "Adam: A Method for Stochastic Optimization".

The specific implementation is as follows:

Requirements: step size $\epsilon$, initial parameters $\theta$, a small constant $\delta$ for numerical stability, first-order moment decay rate $\rho_1$, and second-order moment decay rate $\rho_2$.

Typical values are $\delta = 10^{-8}$, $\rho_1 = 0.9$, $\rho_2 = 0.999$.

Intermediate variables: first-order moment s and second-order moment r, both initialized to 0.

Each iteration process:

(1) Randomly draw a mini-batch of m samples $\{x_1, \dots, x_m\}$ from the training set, together with the associated targets $y_i$, and compute the gradient:

$$g \leftarrow \frac{1}{m} \nabla_\theta \sum_{i} L(f(x_i; \theta), y_i)$$



(2) Update the moment estimates s and r from the gradient:

$$s \leftarrow \rho_1 s + (1 - \rho_1) g$$

$$r \leftarrow \rho_2 r + (1 - \rho_2)\, g \odot g$$

(3) Correct the bias of s and r, where t is the index of the current iteration (starting from 1):

$$\hat{s} \leftarrow \frac{s}{1 - \rho_1^t}$$

$$\hat{r} \leftarrow \frac{r}{1 - \rho_2^t}$$

(4) Compute the parameter update $\Delta\theta$ from the bias-corrected estimates:

$$\Delta\theta = -\epsilon \frac{\hat{s}}{\sqrt{\hat{r}} + \delta}$$

(5) Apply the update $\Delta\theta$ to the current parameters $\theta$:

$$\theta \leftarrow \theta + \Delta\theta$$
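The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training code used in the paper: the helper `adam_minimize`, the toy quadratic objective, and the step size 0.05 are hypothetical, and the bias correction uses the iteration-dependent form with exponent t from the original Adam formulation.

```python
import math


def adam_minimize(grad_fn, theta, steps, eps, rho1=0.9, rho2=0.999, delta=1e-8):
    """Run `steps` Adam iterations starting from `theta`.
    grad_fn(theta) returns the gradient g as a list (step 1)."""
    s = [0.0] * len(theta)  # first-order moment, initialized to 0
    r = [0.0] * len(theta)  # second-order moment, initialized to 0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        # Step 2: update the moment estimates.
        s = [rho1 * si + (1 - rho1) * gi for si, gi in zip(s, g)]
        r = [rho2 * ri + (1 - rho2) * gi * gi for ri, gi in zip(r, g)]
        # Step 3: bias correction with the iteration index t.
        s_hat = [si / (1 - rho1 ** t) for si in s]
        r_hat = [ri / (1 - rho2 ** t) for ri in r]
        # Steps 4 and 5: compute the update and apply it.
        theta = [th - eps * sh / (math.sqrt(rh) + delta)
                 for th, sh, rh in zip(theta, s_hat, r_hat)]
    return theta


def toy_grad(theta):
    """Gradient of the toy objective (a - 3)^2 + (b + 1)^2."""
    return [2.0 * (theta[0] - 3.0), 2.0 * (theta[1] + 1.0)]


theta_final = adam_minimize(toy_grad, [0.0, 0.0], steps=2000, eps=0.05)
print(theta_final)
```

On this toy problem the iterate moves toward the minimizer (3, −1). In the first iteration the update has magnitude close to the step size regardless of the gradient scale, which reflects the bounded step range mentioned in the text.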

C. Model training process

• Forward propagation

The convolution in the forward pass is a typical valid convolution: the convolution kernel W is overlaid on the input map X, the values at corresponding positions are multiplied and summed, and the result is assigned to the corresponding position of the output map Y. The main task of this stage is forward feature extraction and classification.
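The overlay, multiply, and sum procedure can be sketched for a single channel as follows. The sketch assumes stride 1 and no kernel flipping (the sliding-window form used by most CNN frameworks); the input and kernel values are made up for illustration.

```python
def conv2d_valid(x, w):
    """Valid convolution of a 2D input x with a 2D kernel w, stride 1.
    The kernel is overlaid on x without padding; at each position the
    overlapping values are multiplied and summed."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    y = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            y[i][j] = sum(x[i + u][j + v] * w[u][v]
                          for u in range(kh) for v in range(kw))
    return y


# Hypothetical 4 x 4 input map X and 2 x 2 kernel W.
x = [[1, 2, 3, 0],
     [4, 5, 6, 1],
     [7, 8, 9, 2],
     [1, 0, 1, 3]]
w = [[1, 0],
     [0, -1]]

y = conv2d_valid(x, w)
for row in y:
    print(row)
```

Because no padding is used, a 4 × 4 input and a 2 × 2 kernel give a 3 × 3 output map.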

In the network computation, the input is multiplied by the weight matrix of each layer of the convolutional neural network and a bias is added; after this layer-by-layer transformation, the result is passed to the output layer, which produces the final output.

Input: one image; the number of CNN layers L; and the types of all hidden layers. For each convolutional layer, define the convolution kernel size K, the dimension F of the convolution kernel submatrix, the padding size P, and the stride S. For each pooling layer, define the pooling region size k and the pooling rule (max or average). For each fully connected layer, define its activation function (except for the output layer) and the number of neurons.

Output: the output $a^L$ of the CNN model.

(1) Pad the edges of the original image according to the padding size P of the input layer to obtain the input tensor $a^1$.

(2) Initialize the parameters W, b of all hidden layers.

(3) For l = 2 to L−1:

a) If layer l is a convolutional layer, the output is $a^l = \mathrm{ReLU}(z^l) = \mathrm{ReLU}(a^{l-1} * W^l + b^l)$.

b) If layer l is a pooling layer, the output is $a^l = \mathrm{pool}(a^{l-1})$, where pool reduces the input tensor according to the pooling region size k and the pooling rule.

c) If layer l is a fully connected layer, the output is $a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l)$.

(4) For the output layer L:

$$a^L = \mathrm{softmax}(z^L) = \mathrm{softmax}(W^L a^{L-1} + b^L)$$
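Two building blocks of this forward pass, max pooling and softmax, can be sketched as below, together with the standard output-size relation for a convolution with kernel size K, padding P, and stride S. The relation floor((n + 2P − K) / S) + 1 is a well-known formula that the paper does not state explicitly, and all function names and input values here are hypothetical.

```python
import math


def max_pool2d(a, k):
    """Non-overlapping k x k max pooling (stride k).
    Assumes both dimensions of a are divisible by k."""
    return [[max(a[i + u][j + v] for u in range(k) for v in range(k))
             for j in range(0, len(a[0]), k)]
            for i in range(0, len(a), k)]


def softmax(z):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def conv_output_size(n, k, p, s):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 1, 2, 3],
               [4, 5, 6, 7]]
pooled = max_pool2d(feature_map, 2)
probs = softmax([2.0, 1.0, 0.1])

print(pooled)
print(probs, sum(probs))
print(conv_output_size(32, 5, 0, 1), conv_output_size(32, 3, 1, 1))
```

Each 2 × 2 pooling step halves the spatial size of the feature map, while the softmax output is non-negative and sums to 1.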

• Back propagation

The main function of this phase is to propagate the error backward and update the weights.

Input: m images; the number of CNN layers L; and the types of all hidden layers. For each convolutional layer, define the convolution kernel size K, the dimension F of the convolution kernel submatrix, the padding size P, and the stride S. For each pooling layer, define the pooling region size k and the pooling rule (max or average). For each fully connected layer, define its activation function (except for the output layer) and the number of neurons. Also define the gradient step size α, the maximum number of iterations MAX, and the stopping threshold ε.

Output: W, b of each hidden layer and output layer of CNN model

(1) Initialize W and b of each hidden layer and of the output layer to random values.

(2) For iter = 1 to MAX:

(2-1) For i = 1 to m:

a) Set the CNN input $a^1$ to the tensor corresponding to $x_i$.

b) For l = 2 to L−1, perform forward propagation according to the following three cases:

b-1) If layer l is a fully connected layer: $a^{i,l} = \sigma(z^{i,l}) = \sigma(W^l a^{i,l-1} + b^l)$

b-2) If layer l is a convolutional layer: $a^{i,l} = \sigma(z^{i,l}) = \sigma(W^l * a^{i,l-1} + b^l)$

b-3) If layer l is a pooling layer: $a^{i,l} = \mathrm{pool}(a^{i,l-1})$, where pool reduces the input tensor according to the pooling region size k and the pooling rule.

c) For the output layer L: $a^{i,L} = \mathrm{softmax}(z^{i,L}) = \mathrm{softmax}(W^L a^{i,L-1} + b^L)$

d) Compute $\delta^{i,L}$ of the output layer from the loss function.

e) For l = L−1 down to 2, perform back propagation according to the following three cases:

e-1) If layer l is a fully connected layer: $\delta^{i,l} = (W^{l+1})^T \delta^{i,l+1} \odot \sigma'(z^{i,l})$

e-2) If layer l is a convolutional layer: $\delta^{i,l} = \delta^{i,l+1} * \mathrm{rot180}(W^{l+1}) \odot \sigma'(z^{i,l})$

e-3) If layer l is a pooling layer: $\delta^{i,l} = \mathrm{upsample}(\delta^{i,l+1}) \odot \sigma'(z^{i,l})$

(2-2) For l = 2 to L, update $W^l$ and $b^l$ of layer l according to the following two cases:

(a) If layer l is a fully connected layer: $W^l = W^l - \alpha \sum_{i=1}^{m} \delta^{i,l} (a^{i,l-1})^T$, $b^l = b^l - \alpha \sum_{i=1}^{m} \delta^{i,l}$

(b) If layer l is a convolutional layer, for each convolution kernel: $W^l = W^l - \alpha \sum_{i=1}^{m} \delta^{i,l} * \mathrm{rot180}(a^{i,l-1})$, $b^l = b^l - \alpha \sum_{i=1}^{m} \sum_{u,v} (\delta^{i,l})_{u,v}$

(2-3) If the changes in all W and b are smaller than the stopping threshold ε, exit the iteration loop and go to step (3).

(3) Output the weight matrix W and bias vector b of each hidden layer and of the output layer.
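The fully connected case of this procedure can be sketched for a single layer. The example is a simplification under stated assumptions: it uses a sigmoid activation and the loss 1/2‖a − y‖² at the output (rather than the softmax output layer described in the text), so that the output error is (a − y) ⊙ σ′(z); the data, the step size 0.5, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W, b, a_prev):
    """Fully connected layer: z = W a_prev + b, a = sigmoid(z)."""
    z = [sum(w * x for w, x in zip(row, a_prev)) + bj for row, bj in zip(W, b)]
    return z, [sigmoid(v) for v in z]


def train_step(W, b, samples, alpha):
    """One outer iteration: sum the per-sample gradients, then update
    W <- W - alpha * sum(delta * a_prev^T) and b <- b - alpha * sum(delta)."""
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_b = [0.0] * len(b)
    for a_prev, y in samples:
        _, a = forward(W, b, a_prev)
        # Output error for the loss 1/2 * ||a - y||^2, using sigmoid'(z) = a(1 - a).
        delta = [(aj - yj) * aj * (1.0 - aj) for aj, yj in zip(a, y)]
        for j, dj in enumerate(delta):
            grad_b[j] += dj
            for k, ak in enumerate(a_prev):
                grad_W[j][k] += dj * ak
    W = [[w - alpha * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad_W)]
    b = [bj - alpha * gj for bj, gj in zip(b, grad_b)]
    return W, b


def total_loss(W, b, samples):
    return sum(0.5 * sum((aj - yj) ** 2
                         for aj, yj in zip(forward(W, b, x)[1], y))
               for x, y in samples)


# Two hypothetical training pairs (input vector, target vector).
samples = [([0.0, 1.0], [1.0]), ([1.0, 0.0], [0.0])]
W, b = [[0.1, -0.2]], [0.0]

loss_before = total_loss(W, b, samples)
for _ in range(200):
    W, b = train_step(W, b, samples, alpha=0.5)
loss_after = total_loss(W, b, samples)
print(loss_before, loss_after)
```

The convolutional and pooling cases follow the same pattern, with the matrix product replaced by the convolution and upsampling operations listed above.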

IV. EXPERIMENTAL RESULTS AND ANALYSIS

To verify the feasibility of the algorithm, we apply it to the deblurring of motion-blurred license plates. The experiments were run on the Ubuntu operating system with Python 3 and the TensorFlow framework.

The dataset comes from the Imperial College London laboratory.

A. Comparison of network models with different parameter settings

In this paper, the hyperparameters $\alpha_1$ and $\alpha_2$ in the loss function were determined through several experiments: $\alpha_1$ is set to $10^{-2}$ and $\alpha_2$ to $10^{-4}$. The Adaptive Moment Estimation (Adam) optimization strategy is adopted with mini-batch training, and the mini-batch size is set to 600. After repeated training and tuning, the final learning rate is set to $10^{-4}$, and the total number of training iterations is 200K.
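The reported settings can be collected in a small configuration object. The class `TrainConfig` and its field names are hypothetical; only the numeric values come from the text. The last lines compute how many training samples are processed in total, assuming each iteration consumes one full mini-batch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in the paper; field names are illustrative."""
    alpha1: float = 1e-2          # loss hyperparameter alpha_1
    alpha2: float = 1e-4          # loss hyperparameter alpha_2
    batch_size: int = 600         # mini-batch size
    learning_rate: float = 1e-4   # final learning rate
    total_iterations: int = 200_000


config = TrainConfig()

# Total samples processed, assuming one full mini-batch per iteration.
samples_seen = config.batch_size * config.total_iterations
print(samples_seen)  # 120000000
```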

B. Experimental results

(1) Experiments show that the model converges after 60K training iterations on the training set. The deblurred license plates output by the model are shown in Fig. 5. Because the test set is large, only a few representative test images are shown here.

Figure 5. Some of the license plate images in the database

The output of the proposed model is compared with that of the model in [8]. The images in Fig. 6 are randomly picked from the dataset. From left to right: the blurred image, the deblurring result of the model in [8], and our result.

Figure 6. License plate deblurring results

The deblurring results of the model in [8] contain some uncertain information. For example, the license plate character "3" restored by the model in [8] is visibly distorted, and some blurred regions remain. In our method, the L2 norm is added to the loss function, which constrains the image edges effectively while preserving the quality of the restored image.

(2) We compare the 25-layer network structure with the 20-layer structure of this paper to verify the feasibility of the algorithm. The training losses of the two structures are shown in Fig. 7 and Fig. 8.

Figure 7. Loss in the 25-layer CNN training process

Figure 8. Comparison of losses during training between the 25-layer and 20-layer CNNs

As can be seen from Fig. 7, the loss decreases steadily, which indicates that the model is learning well. After 50K iterations, the loss curve begins to converge. At 70K iterations, the final loss of the 25-layer CNN model is 1.910. As can be seen from Fig. 8, after 14K training iterations the loss of the 20-layer CNN (purple) is 2.733, whereas the loss of the 25-layer CNN (blue) is only 1.774.
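To put the two loss values at 14K iterations in proportion, the short calculation below derives the absolute gap and the relative reduction from the reported numbers. It is only arithmetic on the figures quoted in the text, not an additional experimental result.

```python
loss_20_layer = 2.733  # reported loss of the 20-layer CNN at 14K iterations
loss_25_layer = 1.774  # reported loss of the 25-layer CNN at 14K iterations

absolute_gap = loss_20_layer - loss_25_layer
relative_reduction = absolute_gap / loss_20_layer

print(f"{absolute_gap:.3f} {relative_reduction:.1%}")  # 0.959 35.1%
```

At the same number of iterations, the deeper model's loss is therefore roughly one third lower than that of the shallower model.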

The experimental results show that, as the depth of the network increases, the loss curve converges somewhat faster and the loss is lower. The depth of the model strongly affects the final deblurring performance. At the same time, however, the training time becomes longer, which severely limits the practical application of CNNs.

V. CONCLUSION

The analysis above shows that the convolutional neural network achieves good results in image restoration. Our results show that our model and framework outperform the traditional restoration algorithm. However, the research in this paper is limited to deep convolutional neural networks. Many restoration algorithms exist besides those mentioned above, so a large number of comparisons are needed in follow-up research, in order to better

REFERENCES

[1] Andrews C, Hunt B R. Digital Image Restoration [M]. Englewood Cliffs, NJ: Prentice-Hall, 1977.


[2] Kenneth R. Castleman. Digital Image Processing [M]. Prentice Hall, 1998.

[3] W H Richardson. Bayesian-based iterative method of image restoration [J]. Journal of the Optical Society of America, 1972, 62(1): 55-59.

[4] Shan Q, Jia J, Agarwala A. High-quality motion deblurring from a single image [J]. ACM Transactions on Graphics, 2008.

[5] Krishnan D, Tay T, Fergus R. Blind deconvolution using a normalized sparsity measure[C]//IEEE Conference on Computer Vision and Pattern Recognition, 2011

[6] J. Sun, W. Cao, Z. Xu, and J. Ponce. Learning a convolutional neural network for non-uniform motion blur removal. In CVPR, pages 769–777. IEEE, 2015.

[7] S. Nah, T. H. Kim, and K. M. Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In CVPR, pages 3883–3891, 2017.

[8] Ke Yu, Chao Dong, et al. Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning. In CVPR. IEEE, 2018.

[9] Yann LeCun, Koray Kavukcuoglu and Clement Farabet. Convolutional Networks and Applications in Vision. IEEE,2010.

[10] P. Wieschollek, M. Hirsch, B. Schölkopf, and H. P. Lensch. Learning blind motion deblurring. In ICCV. IEEE, 2017.

[11] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf. Learning to deblur. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(7):1439–1451, 2016.

[12] Diederik Kingma. Jimmy Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015.