Video Copy Detection based on Uniform Local Binary Pattern


2017 International Conference on Computer Science and Application Engineering (CSAE 2017) ISBN: 978-1-60595-505-6


Yanyan Hou1,*, Xiuzhen Wang1, Sanrong Liu1 and Yu Zhang2

1 College of Information Science and Engineering, Zaozhuang University, 277160 Zaozhuang, China

2 Zaozhuang Branch, China Mobile Group Shandong Company, No. 1077, Guang’ming Road, 277160 Zaozhuang, China

ABSTRACT

Video copy detection techniques are widely used to detect copies of videos. This paper proposes a new copy detection algorithm based on spatiotemporal analysis and compares its detection precision and efficiency with those of existing algorithms. The descriptor encodes the structure of video key frames by computing the uniform Local Binary Pattern (LBP) with rotation invariance, and the chi-square test is employed to speed up the matching process. The proposed algorithm can deal with various kinds of video transformations, such as brightness conversion, sharpening, contrast change and gray-scale conversion, and in particular with video rotation, which is not well addressed by existing algorithms. Experiments on the TRECVID 2015 dataset indicate that precision and recall are improved and that the proposed algorithm has good robustness and discrimination accuracy.

INTRODUCTION

With the development of multimedia technology, video information has been widely used on the Internet, and illegal video copies, produced by editing or brightness changes, are transmitted over the network. The task of video copy detection is to determine whether a given video has a duplicate in a set of videos [1]. Video copy detection mainly measures video similarity by computing the distance between features extracted from the detection video and the reference video, so as to better protect the intellectual property of video content. Research on video copy detection focuses on feature extraction, including spatial, temporal and spatiotemporal feature extraction.

Spatial feature extraction is based on image feature extraction: shot segmentation is performed first, and key frame features are then extracted for video identification. Semin Kim [2] proposed using two complementary region binary patterns, computed from several rings in key frames, as a video fingerprint that is robust against rotation and flipping, but the algorithm is sensitive to sharpening. Teng Li [3] proposed a video copy detection method that jointly utilizes the temporal continuity and multi-modality of video and achieves high efficiency and robustness.


similarities between textural feature vectors which are extracted from videos; the proposed method is based on Weber binarized statistical image features.

Considering various attacks, many algorithms combine spatial and temporal features. Semin Kim [7] proposed adaptive weighted fusion of new spatial and temporal fingerprints for improved video copy detection; the method is robust against various attacks but increases computational complexity. Uniform Local Binary Pattern has low computational complexity and has been successfully applied in fingerprint recognition, face recognition and other fields. In this paper, we propose a video feature extraction algorithm based on uniform Local Binary Pattern (LBP) with rotation invariance, which adapts to different motion complexity and resists a variety of video attacks.

VIDEO COPY DETECTION

The proposed video copy detection is mainly divided into three stages. In the first stage, video shot segmentation is performed and key frames are extracted based on motion features. In the second stage, the uniform LBP of each video key frame is extracted. In the third stage, video similarity is obtained by matching the video feature descriptors.

Video Key Frame Extraction

As an image sequence, a video has not only the characteristics of images but also features of its own. A video copy detection system can be based on testing similarities between textural feature vectors extracted from videos [8]. A video is composed of a series of clips; video frames within a clip are similar, and near-duplicate videos have self-similarity [9]. A key frame is the representative of a clip and expresses its major video content.

In this paper, the key frames are obtained by calculating the maximum entropy:

$$H(p) = -\sum_{i,j} p(i,j)\,\ln p(i,j) \tag{1}$$

$$p(i,j) = \frac{x(i,j)}{\sum_{i,j} x(i,j)} \tag{2}$$

where $x(i,j)$ is the pixel value at position $(i,j)$. Video content differs greatly between clips, and the clip change information is calculated from key frame differences. Using the maximum entropy to detect key frames avoids selecting key frames that are too bright or too dark.
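The entropy-based selection in formulas (1) and (2) can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' implementation: the function names `frame_entropy` and `select_key_frame`, the representation of a frame as a list of rows of gray values, and the convention that zero-valued pixels contribute nothing to the sum are assumptions made here.

```python
import math

def frame_entropy(frame):
    """Entropy H(p) of a gray-level frame given as a list of rows."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        return 0.0  # an all-black frame carries no information
    entropy = 0.0
    for row in frame:
        for x in row:
            if x > 0:  # assumption: 0 * ln(0) is taken as 0
                p = x / total          # formula (2)
                entropy -= p * math.log(p)  # formula (1)
    return entropy

def select_key_frame(frames):
    """Index of the maximum-entropy frame within one clip."""
    return max(range(len(frames)), key=lambda k: frame_entropy(frames[k]))

flat = [[10, 10], [10, 10]]   # p = 1/4 everywhere, so H = ln 4
dark = [[0, 0], [0, 40]]      # all mass in one pixel, so H = 0
print(frame_entropy(flat))
print(select_key_frame([dark, flat]))
```

In this toy clip the evenly exposed frame is selected over the frame that is almost entirely dark, which matches the motivation given above.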

Video Feature Extraction

Video features include color, gradient, brightness, edge and shape structure [10]. LBP is mainly used for video texture extraction, and the statistical histogram of the LBP feature spectrum is used as the video feature vector. Because the LBP feature is related to location within the frame, directly extracting and comparing the LBP of two whole video frames produces large errors [11]. To extract video features better, each video frame is divided into sub-regions and LBP features are extracted from each sub-region: an LBP code is computed for each pixel to obtain the original LBP feature map, and a statistical histogram of LBP features is then computed for each sub-block. Each sub-block is thus described by one statistical histogram, and each video frame by a set of such histograms.


2) For each pixel in the cell, the center pixel is compared with its 8 adjacent pixels. If an adjacent pixel is greater than or equal to the center pixel, it is marked as 1; otherwise it is marked as 0. The eight adjacent pixels thus generate an 8-bit binary number; the LBP code is obtained by converting this binary number into a decimal number, and the LBP value represents the texture information. Figure 1 shows an example of LBP.

(a) Pixel example      (b) Threshold      (c) Weights
    7  8  3                1  1  0            1    2    4
    9  6  4                1  -  0          128    -    8
    7  8  3                1  1  0           64   32   16

Figure 1. LBP of an example.

Reading the thresholded neighbors from the lowest weight to the highest gives the bit string 11000111, i.e. the binary number 11100011; the corresponding decimal number (LBP code) is 1+2+32+64+128 = 227. The center pixel is replaced by 227, and the local contrast is C = (7+8+9+7+8)/5 - (3+4+3)/3 ≈ 4.5. The rotation invariant LBP operator is obtained by rotating the circular neighborhood: each rotation yields an initial LBP value, and the minimum of these values is taken as the LBP value of the neighborhood.
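The worked example of Figure 1 can be checked with a short sketch. It assumes the 3x3 patch with rows 7 8 3, 9 6 4, 7 8 3 and weights assigned clockwise from the top-left neighbor (1, 2, 4, 8, 16, 32, 64, 128); the function names `lbp_code` and `rotation_invariant` are illustrative, and the `>=` comparison is an assumption that makes no difference here because no neighbor equals the center value.

```python
def lbp_code(patch):
    """Basic LBP code of the center pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbors clockwise from the top-left; the k-th one has weight 2**k.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r][c] >= center else 0 for r, c in order]
    return sum(bit << k for k, bit in enumerate(bits))

def rotation_invariant(code, bits=8):
    """Minimum value over all circular bit rotations of `code`."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask
               for k in range(bits))

patch = [[7, 8, 3],
         [9, 6, 4],
         [7, 8, 3]]
code = lbp_code(patch)
print(code, format(code, "08b"))   # 227 11100011
print(rotation_invariant(code))    # 31, i.e. 00011111
```

The rotation invariant value 31 is shared by every rotated version of this neighborhood, which is what makes the descriptor insensitive to frame rotation.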

3) The histogram of each cell is obtained by counting the frequency of each LBP value (as a decimal number), and the histogram is then normalized.

4) The LBP texture feature vector of the frame is obtained by concatenating the statistical histograms of all cells. Note that the LBP codes obtained by applying the original LBP operator to every pixel of a frame still form an image.
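Steps 3) and 4) can be sketched as below. The cell size and the number of histogram bins are hypothetical parameters chosen for illustration, since the text does not state them, and the names `cell_histogram` and `frame_descriptor` are not taken from the paper.

```python
def cell_histogram(codes, bins=256):
    """Normalized histogram of the LBP codes in one cell (step 3)."""
    hist = [0.0] * bins
    for row in codes:
        for c in row:
            hist[c] += 1
    n = sum(hist)
    return [h / n for h in hist] if n else hist

def frame_descriptor(lbp_image, cell=16, bins=256):
    """Concatenate the cell histograms of an LBP image (step 4)."""
    height, width = len(lbp_image), len(lbp_image[0])
    feature = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            block = [row[left:left + cell]
                     for row in lbp_image[top:top + cell]]
            feature.extend(cell_histogram(block, bins))
    return feature

# Toy 4x4 "LBP image" with codes 0..3, split into four 2x2 cells.
lbp_image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [2, 3, 3, 3],
             [2, 3, 3, 3]]
desc = frame_descriptor(lbp_image, cell=2, bins=4)
print(len(desc))    # 16 = 4 cells x 4 bins
print(desc[8:12])   # histogram of the bottom-left cell
```

Keeping one histogram per cell preserves coarse location information that a single histogram over the whole frame would discard.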

The differences between LBP codes are calculated to obtain the differences between the pixels of LBP images. Each resulting decimal number is considered one type of macro-pattern, and the patterns are represented by a histogram in which each bin counts one type of pattern. LBP can also be computed on neighborhoods of different scales by combining a circular neighborhood with bilinear interpolation; $LBP_{P,R}$ denotes the operator with P sampling points on a circle of radius R.

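$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^p \tag{3}$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{4}$$

where $g_c$ is the gray value of the center pixel and $g_p$ ($p = 0, \dots, P-1$) are the gray values of the $P$ sampling points.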

A uniform LBP pattern is a binary sequence that changes from 0 to 1 or from 1 to 0 at most 2 times when read circularly. For example, 10100000 is not a uniform pattern because it contains 4 such changes. Among the 8-bit binary numbers there are 58 uniform patterns in total. Researchers found that most observed codes fall within these 58 uniform patterns, so all other codes are grouped into a 59th pattern. Uniform LBP codes are therefore divided into 59 patterns, which reduces the dimension of the feature vector. The improved uniform LBP has rotation invariance and is defined by formulas (5) and (6).

$$LBP_{P,R}^{riu2} = \begin{cases} \sum_{p=0}^{P-1} s(g_p - g_c), & U(G_P) \le 2 \\ P + 1, & U(G_P) > 2 \end{cases} \tag{5}$$

$$U(G_P) = \left| s(g_{P-1} - g_c) - s(g_0 - g_c) \right| + \sum_{p=1}^{P-1} \left| s(g_p - g_c) - s(g_{p-1} - g_c) \right| \tag{6}$$

$U(G_P)$ is the number of hops from 0 to 1 or from 1 to 0, and $LBP_{P,R}^{riu2}$ is the rotation invariant uniform pattern. $LBP_{P,R}^{riu2}$ has rotation invariance and gray-level invariance, and the number of LBP pattern categories is also substantially reduced. Figure 2 shows the uniform LBP of the BUS video (1st frame), Figure 3 shows the uniform LBP of the BUS video (150th frame), and Figure 4 shows the uniform LBP difference between the 1st frame and the 150th frame.
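The uniformity measure and the rotation invariant uniform mapping can be illustrated with a small sketch for P = 8. It operates on an already thresholded 8-bit code rather than on gray values, and the names `transitions` and `lbp_riu2` are illustrative only.

```python
def transitions(code, bits=8):
    """U measure: circular count of 0/1 changes in the bit pattern."""
    pattern = [(code >> k) & 1 for k in range(bits)]
    # pattern[k - 1] wraps around for k = 0, closing the circle.
    return sum(pattern[k] != pattern[k - 1] for k in range(bits))

def lbp_riu2(code, bits=8):
    """Number of 1 bits for uniform codes, bits + 1 for all others."""
    if transitions(code, bits) <= 2:
        return bin(code).count("1")
    return bits + 1

uniform = [c for c in range(256) if transitions(c) <= 2]
print(len(uniform))                          # 58 uniform patterns
print(transitions(0b10100000))               # 4, so not uniform
print(lbp_riu2(227))                         # 5
print(sorted({lbp_riu2(c) for c in range(256)}))  # labels 0..9
```

Enumerating all 256 codes confirms the count of 58 uniform patterns, and the rotation invariant uniform mapping reduces them further to P + 2 = 10 labels, since every rotation of a uniform code has the same number of 1 bits.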

Figure 2. Uniform LBP for BUS video (the 1st frame).

Figure 3. Uniform LBP for BUS video (the 150th frame).

Figure 4. The uniform LBP differences (the 1st frame and the 150th frame).

Video Matching

The chi-square test measures the deviation between actual observation data and theoretical data; the larger the deviation, the larger the chi-square value. The larger the chi-square value, the more inconsistent two video frames are; if the chi-square value is 0, the theoretical value is fully consistent with the observed value. Let $U_1$ be the uniform LBP histogram of the detection video and $U_2$ the uniform LBP histogram of the reference video. The chi-square test is used to measure the similarity of the histograms: if $d(U_1, U_2)$ is smaller than the specified threshold, the detection video is considered similar to the reference video.

$$d(U_1, U_2) = \sum_{I} \frac{\left(U_1(I) - U_2(I)\right)^2}{U_1(I) + U_2(I)} \tag{7}$$

where $I$ indexes the histogram bins.
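The matching step can be sketched as follows, assuming the symmetric chi-square form with the bin sum in the denominator and skipping bins that are empty in both histograms. The threshold value is a hypothetical placeholder, since the text does not give one, and the names `chi_square` and `is_copy` are illustrative.

```python
def chi_square(u1, u2):
    """Chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(u1, u2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return total

def is_copy(u1, u2, threshold=0.1):
    """Declare a match when the distance is below the (assumed) threshold."""
    return chi_square(u1, u2) < threshold

reference = [0.5, 0.5, 0.0]
near_copy = [0.4, 0.6, 0.0]
unrelated = [0.0, 0.0, 1.0]
print(chi_square(reference, reference))  # 0.0 for identical histograms
print(is_copy(reference, near_copy))
print(is_copy(reference, unrelated))
```

Identical histograms give a distance of 0, and the distance grows as the histograms diverge, so a smaller value indicates a more likely copy.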

EXPERIMENT

Precision and recall are used as evaluation criteria [11]. The experimental data include 200 videos from TRECVID 2015 as the database videos, each about 2 to 10 minutes long; BUS, BOR11_007, Moon009, Anni002, etc. were selected as detection videos. Copy attacks such as frame deletion, brightness conversion, reduction, cropping and sharpening are applied to the videos, and the attacked videos are stored in the database as video copies. The detection accuracy and time complexity are analyzed [12]. Comparisons of Local Phase Quantization (LPQ), Binarized Statistical Image Features (BSIF), LBP and the proposed uniform LBP (ULBP) are shown in Figure 5 (a).

A good video copy detection algorithm should have recall and precision as high as possible. It can be seen that the precision of the proposed algorithm (ULBP) is better than that of the LPQ, BSIF and LBP algorithms when the recall is 80%, while its precision is slightly lower than that of the OM algorithm when the recall is less than 60%. Since practical applications require a high recall rate, the proposed algorithm performs best in that regime.

Figure 5 (b) shows the performance of the system under different attacks; the system is stable and robust under multiple copy transformations. Detection under brightness conversion and rotation is slightly better than under other transformations, whereas the logo attack has a relatively large influence on the final detection results.

Figure 5. Comparison of detection effect: (a) comparison of different algorithms; (b) comparison of different attacks.

The F operator is used to evaluate the detection effect of several algorithms under several attacks; the experimental results are shown in TABLE I.

TABLE I. F operator comparison.

Attack type    LPQ      BSIF     LBP      ULBP
Gray scale     0.846    0.896    0.874    0.903
Brightness     0.853    0.879    0.881    0.896
Rotation       0.864    0.889    0.879    0.929
Contrast       0.851    0.854    0.847    0.887
Sharpen        0.823    0.836    0.829    0.846
Cropping       0.807    0.851    0.857    0.862
Reduction      0.821    0.881    0.891    0.902
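For reference, the evaluation criteria can be computed as in the sketch below. The paper does not define the F operator; the code assumes it is the balanced F-measure (harmonic mean of precision and recall), which is an assumption rather than something the text confirms. The video identifiers and function names are hypothetical.

```python
def precision_recall(detected, relevant):
    """Precision and recall of a set of detected copies."""
    detected, relevant = set(detected), set(relevant)
    hits = len(detected & relevant)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Balanced F-measure (assumed meaning of the F operator)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

detected = ["v1", "v2", "v3", "v4"]        # videos reported as copies
relevant = ["v1", "v2", "v3", "v5", "v6"]  # true copies in the database
p, r = precision_recall(detected, relevant)
print(p, r)             # 0.75 0.6
print(f_measure(p, r))
```

Under this assumption a single score close to 1 requires both high precision and high recall, which is why it is convenient for comparing algorithms per attack type.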


CONCLUSIONS

This paper makes two contributions toward video copy detection: a new video feature extraction method based on the uniform LBP feature with rotation invariance, and a video matching algorithm based on the chi-square test. The algorithm combines inter-frame and intra-frame features, which overcomes both the high time consumption and complexity of methods based on local features and the limitations of video frame content features. Experimental results show that the proposed algorithm can effectively improve video copy detection speed and accuracy. Future research includes characterizing the video feature for different types of content and exploring a modification that preserves spatial coherence, making the video feature intuitively more meaningful.

ACKNOWLEDGEMENT

This paper is supported by the Natural Science Foundation of China (No. 41204025). The authors would like to thank Professor Wang Hong Jun of Shandong University for the many valuable discussions concerning this work. The authors are also grateful to Professor Li Qing Hua of QILU University of Technology for their assistance with our experiments.

REFERENCES

1. M. Douze, H. Jegou and C. Schmid. 2010. “An Image-Based Approach to Video Copy Detection with Spatio-Temporal Post-Filtering,” IEEE Transactions on Multimedia, 12(4):257-266.

2. Semin Kim, Seung Ho Lee and Yong Man Ro. 2014. “Rotation and flipping robust region binary patterns for video copy detection,” Journal of Visual Communication & Image Representation, 25(2):373-383.

3. T. Li, F. Nian, X. Wu, Q. Gao and Y. Lu. 2016. “Efficient video copy detection using multi-modality and dynamic path search,” Multimedia Systems, 22(1):1-11.

4. X. Wu, J. Li, S. Tang and AG Junbo. 2010. “Video Copy Detection Based on Spatio-Temporal Trajectory Behavior Feature,” Journal of Computer Research & Development, 47(11):1871-1877.

5. A. Natsev, M. Hill and J. R. Smith. 2010. “Design and evaluation of an effective and efficient video copy detection system,” presented at IEEE International Conference on Multimedia & Expo, July 19-23, 2010.

6. A. Boukhari and A. Serir. 2014. “Weber Local Descriptor from Orthogonal Planes Based Video Copy Detection,” presented at 5th European Workshop on Visual Information Processing.

7. Semin Kim, Jae Young Choi, Seungwan Han and Yong Man Ro. 2014. “Adaptive weighted fusion with new spatial and temporal fingerprints for improved video copy detection,” Signal Processing Image Communication, 29(7):788-806.

8. Aissa Boukhari and Amina Serir. 2016. “Weber Binarized Statistical Image Features (WBSIF) based video copy detection,” Journal of Visual Communication and Image Representation, 34(1):50-64.

9. Zhipeng Wu and Kiyoharu Aizawa. 2014. “Self-similarity-based partial near-duplicate video retrieval and alignment,” International Journal of Multimedia Information Retrieval, 3(1):1-14.

10. F. Yuan, L.M. Po, M. Liu, X. Xu and W. Jian. 2016. “Shearlet Based Video Fingerprint for Content-Based Copy Detection,” Journal of Signal & Information Processing, 07(2):84-97.

11. J. Li, Q. Wu, X. Lian and J. Sun. 2016. “Real-time video copy detection based on Hadoop,” presented at Sixth International Conference on Information Science and Technology, May 6-8, 2016.