Kd-Tree Based Algorithm For Copy-Move Forgery Detection

Abdullah M. Moussa

Abstract: Image manipulation programs have become easier to use and more powerful than ever, and the detection of image forgeries has consequently attracted significant interest. Falsified images can raise serious legal concerns. One of the most widely used forgery techniques is copy-move forgery, in which a region of an image is copied and pasted at another location in the same image; a significant part of a digital image can be hidden or added in this way. In this paper, we propose an accurate algorithm for copy-move forgery detection. The block-based approach uses the KD-tree data structure and a simple yet efficient feature vector to detect possible forgeries. Its results are comparable to those of state-of-the-art methods while providing a significant speed-up.

Index Terms: Forgery detection, Block matching, Copy-move forgery, Feature extraction, Digital forensics, KD-tree, Image forensics.

1 INTRODUCTION

In the age of affordable computational power and widely available image processing software, easy-to-apply image falsification can lead to serious social problems. Many kinds of image forgery exist; copy-move forgery and image splicing are two examples. In the former, a portion of the image is copied and pasted at another position to add or cover significant details of the original image [1]. In the latter, a new image is composed from several images by segmenting and merging some of their parts. Copy-move forgery is effective and easy to apply, and it can be very hard for non-experts to recognize without computer software. Fig. 1 presents a sample copy-move forgery of a digital image. Several approaches to Copy-Move Forgery Detection (CMFD) have been proposed. They fall into two main categories: methods based on key-point matching and methods based on matching blocks of the suspicious image. Key-point matching techniques rely on locating high-entropy positions in the image under inspection and using them as feature vectors. Keypoints can be extracted with techniques such as SURF [3] and SIFT [4]. The methods in [5], [6], [7] and [8] are based on SIFT keypoints, while [9] and [10] rely on SURF. Key-point matching methods are generally fast. However, they share a common drawback: sensitivity to small modifications of pixel intensities. As a result, they may not be robust when the cloned regions are small.

Fig. 1 An image copy-move forgery sample [2]: (a) original image and (b) forged image

The other main class of CMFD techniques relies on block matching. These techniques split the analyzed image into many blocks, which can be overlapping or non-overlapping. Feature vectors are extracted from the blocks and compared in search of similar ones. Some algorithms in this class use moments as features: Ryu et al. [11], Xiong et al. [12] and Al-Qershi and Khoo [13] use features based on Zernike moments, and Hu moments are used in [14]. In another subset of methods, the features are obtained in the frequency domain, as in [15] and [16]: Haar wavelets are used in [15], and the Fourier-Mellin transform in [16]. Other methods derive features from local binary pattern variance [17], directional information of blocks [18] and block entropy [19]. Some approaches combine key-points and blocks [20], [21]. In general, block-matching algorithms are highly robust; their main drawback is their relative time inefficiency. In our previous work [22], we proposed a block-matching algorithm for CMFD that used sums of pixel intensities as features and a KD-tree data structure to store and retrieve the block features. The work proposed here extends [22]. We present a new version of the algorithm with several modifications that significantly improve time efficiency while maintaining very comparable results. In the proposed method, the image to be checked is segmented into square blocks with a predefined side length, and k square, equally spaced sub-blocks are extracted from each block. Using a sliding window, a k-dimensional vector is calculated from the sums of pixel intensities of the sub-blocks.
This vector serves as the feature vector of the block. The resulting vectors of all the blocks are stored in a KD-tree data structure. Near-duplicate nodes in the KD-tree that correspond to non-overlapping blocks within the image are found, and a block-matching operation is applied in search of a possible forgery. The current version of the algorithm has two main improvements. First, the previous version used a 9-dimensional feature vector; in this version, we found experimentally that a 5-dimensional vector is sufficient to obtain very accurate results. Second, while the version in [22] searches for possible duplicates across all the nodes of the KD-tree, which correspond to all the blocks of the image, the current version only checks the nodes that correspond to non-overlapping blocks in the image. As we will see, this significantly reduces the running time of the algorithm, because the time complexity improves from $O(NM)$ to approximately $O(N\sqrt{M})$, where $N$ is the number of pixels in the image and $M$ is the number of pixels in a block. We have tested the presented technique on the dataset proposed in [20] and compared its quality with [22] and with the state-of-the-art algorithm suggested in [20]. The experiments show that the suggested technique is effective and fast. The remainder of the paper is organized as follows. Section 2 presents the suggested method for copy-move forgery detection. Section 3 gives the complexity analysis of the proposed technique. Section 4 provides the experimental results. Finally, Section 5 summarizes the conclusions.

2 THE PROPOSED ALGORITHM

The suggested technique belongs to the block-matching class of CMFD methods. Its general methodology is: (1) segment the image to analyze into overlapping blocks; (2) calculate the feature vectors of all the blocks; (3) build a KD-tree from the resulting feature vectors; (4) identify near-duplicate nodes within the KD-tree that correspond to non-overlapping blocks in the image, and then match the blocks in search of possible forgeries. In the presented technique, the image to analyze, SI, is converted to grayscale and then split into overlapping square blocks with a prespecified side length, which is one of the parameters of the technique. For each pixel within SI, five square, equally spaced sub-blocks are specified: one in the center and one in each of the four corners of the block. We found experimentally that extracting these five sub-blocks is effective and simplifies the implementation. Using a sliding window, the sum of pixel intensities is computed for each sub-block, and the five resulting values are used as the feature vector of that block. This feature vector is inspired by the template matching technique proposed in [23] and [24]. The resulting vectors of all pixels of SI are saved in a five-dimensional tree T using a KD-tree data structure with the 1-norm distance. In the second stage of the algorithm, SI is segmented into non-overlapping blocks with the same side length. For each node in T that corresponds to a non-overlapping block, T is queried for the nearest neighbor node within a radius r (if any exists). Every pixel in the block associated with the current node is then compared with the corresponding pixel in the block associated with its nearest neighbor node in T, and the following condition should be satisfied:

$$C_f \le (\text{side length})^2 \, t_m \qquad (1)$$

where $C_f$ is the number of pixel pairs whose sum of absolute differences (SAD) between color channels exceeds a specific value α (α was set to ten in the conducted experiments), and $t_m$ is a problem-dependent parameter for identifying correct matches that ranges from zero to below one. The two blocks are regarded as duplicates if condition (1) is met. Algorithm 1 illustrates the whole procedure.
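The block comparison behind condition (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, blocks are taken to be square lists of rows of per-pixel channel tuples, and the condition is read as "C_f is at most t_m times the number of pixels in the block".

```python
def count_mismatches(block_a, block_b, alpha=10):
    """Return C_f: the number of pixel pairs whose SAD over channels exceeds alpha."""
    mismatches = 0
    for row_a, row_b in zip(block_a, block_b):
        for px_a, px_b in zip(row_a, row_b):
            # Sum of absolute differences between corresponding channels.
            sad = sum(abs(ca - cb) for ca, cb in zip(px_a, px_b))
            if sad > alpha:
                mismatches += 1
    return mismatches


def is_duplicate(block_a, block_b, t_m, alpha=10):
    """Condition (1), ASSUMED here to read C_f <= side^2 * t_m."""
    side = len(block_a)  # blocks are assumed square
    return count_mismatches(block_a, block_b, alpha) <= side * side * t_m


# Two 2x2 blocks: one pixel differs slightly (SAD 1), one strongly (SAD 60).
block = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
near = [[(11, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 180)]]
print(count_mismatches(block, near), is_duplicate(block, near, t_m=0.25))
```

With t_m = 0 only blocks without any mismatching pixel pair pass the test, and larger values of t_m tolerate a larger fraction of mismatching pixels.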

Algorithm 1 The proposed algorithm

1: Input: SI, the block side length, r, and $t_m$
2: Output: the final duplicate regions map M.
3: Split SI into overlapping blocks with the given side length and, for each block, specify the five square, equally spaced sub-blocks that are in the center and the four corners.
4: With the help of a sliding window, at each pixel in SI, calculate the sum of pixel intensities within each sub-block.
5: Save the resulting five values of each pixel in a feature vector.
6: Store the feature vectors in a five-dimensional tree T (KD-tree data structure) with 1-norm distance.
7: Split SI into non-overlapping blocks with the same side length.
8: For each node in T corresponding to each block specified in 7 do
9:     Determine (if any) the nearest neighbor node within the radius r.
10:    if $C_f \le (\text{side length})^2 \, t_m$ then
11:        Mark the corresponding blocks as duplicates.
12:    end
13: end
14: return M.
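Steps 3 to 9 of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation (which uses ALGLIB). The function names, the sub-block side `sub`, the placement of each block with its top-left corner at the pixel, and the rule that an accepted neighbor must not overlap the query block are all assumptions made for the example. A summed-area table stands in for the sliding window, and the toy image is built so that exactly one copied block exists.

```python
def integral_image(gray):
    """Summed-area table with a zero border: sat[y][x] = sum of gray[0:y][0:x]."""
    h, w = len(gray), len(gray[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, y, x, size):
    """Sum of the size x size square whose top-left pixel is (y, x)."""
    return sat[y + size][x + size] - sat[y][x + size] - sat[y + size][x] + sat[y][x]


def feature_vector(sat, y, x, side, sub):
    """Sums of the four corner sub-blocks and the centre sub-block of one block."""
    far = side - sub           # offset of the right/bottom sub-blocks
    mid = (side - sub) // 2    # offset of the centre sub-block
    offsets = [(0, 0), (0, far), (far, 0), (far, far), (mid, mid)]
    return tuple(box_sum(sat, y + dy, x + dx, sub) for dy, dx in offsets)


def build_kdtree(points, depth=0):
    """points: list of (vector, payload). Returns nested tuples, or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    m = len(points) // 2
    return (points[m], axis,
            build_kdtree(points[:m], depth + 1),
            build_kdtree(points[m + 1:], depth + 1))


def nearest_within(node, query, radius, accept, best=None):
    """Nearest accepted point within `radius` under the 1-norm, or None."""
    if node is None:
        return best
    (vec, payload), axis, left, right = node
    dist = sum(abs(a - b) for a, b in zip(vec, query))
    limit = radius if best is None else best[0]
    if dist <= limit and accept(payload):
        if best is None or dist < best[0]:
            best = (dist, payload)
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_within(near, query, radius, accept, best)
    limit = radius if best is None else best[0]
    # The 1-norm distance is at least |diff|, so the far side can be skipped otherwise.
    if abs(diff) <= limit:
        best = nearest_within(far, query, radius, accept, best)
    return best


def find_match(tree, sat, qy, qx, side, sub, radius):
    """Query the tree with the feature vector of the block at (qy, qx)."""
    query = feature_vector(sat, qy, qx, side, sub)
    # ASSUMPTION: a valid neighbour must not overlap the query block.
    apart = lambda pos: abs(pos[0] - qy) >= side or abs(pos[1] - qx) >= side
    return nearest_within(tree, query, radius, apart)


# Toy 12x12 grayscale image; the 4x4 block at (0, 0) is copied to (6, 7).
w = h = 12
side, sub = 4, 2
gray = [[y * w + x for x in range(w)] for y in range(h)]
for dy in range(side):
    for dx in range(side):
        gray[6 + dy][7 + dx] = gray[dy][dx]

sat = integral_image(gray)
points = [(feature_vector(sat, y, x, side, sub), (y, x))
          for y in range(h - side + 1) for x in range(w - side + 1)]
tree = build_kdtree(points)
match = find_match(tree, sat, 0, 0, side, sub, radius=0)
print(match)  # (distance, (row, column)) of the matched block
```

A matched pair found this way would still have to pass the pixel-wise test of condition (1) before being marked in the map M.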

3 COMPLEXITY ANALYSIS

For the complexity analysis of the proposed algorithm, let N be the number of pixels in the image to analyze, M the number of pixels in a block and K the number of sub-blocks. Computing the sum of pixel intensities of every sub-block with a sliding window has a complexity proportional to N times the block side length, that is, $O(N\sqrt{M})$, since the side length is, by definition, $O(\sqrt{M})$. Building the KD-tree requires $O(N \log N)$ operations, and searching for the nearest neighbor of every node in the KD-tree that corresponds to a non-overlapping block requires $O\left(\frac{N}{M}\log N\right)$ operations. Moreover, the technique needs $O\left(\frac{N}{M} M\right)$, or $O(N)$, operations to match the pixels of the candidate blocks. The algorithm therefore has a total complexity of $O\left(N\left(\sqrt{M} + 1 + \frac{M+1}{M}\log N\right)\right)$, or approximately $O(N\sqrt{M})$. Furthermore, the algorithm uses $O(NK)$ space to store the KD-tree.
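As a rough illustration of the gap between the approximate bound above and the $O(NM)$ bound quoted in the Introduction for the earlier version [22], consider a hypothetical image of $N = 10^6$ pixels and blocks of $16 \times 16$ pixels, so that $M = 256$. Both values are assumptions chosen for the example and are not taken from the experiments. Ignoring constant factors and lower-order terms:

```latex
\begin{align*}
NM &= 10^{6} \times 256 = 2.56 \times 10^{8},\\
N\sqrt{M} &= 10^{6} \times 16 = 1.6 \times 10^{7},\\
\frac{NM}{N\sqrt{M}} &= \sqrt{M} = 16.
\end{align*}
```

Under these assumptions the dominant term shrinks by a factor of $\sqrt{M}$. Measured speed-ups also depend on the constants hidden by the O-notation, so this figure is only indicative.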

4 EXPERIMENTAL ANALYSIS

To examine the quality of the presented technique, we used a dataset of twenty-three images proposed by Silva et al. [20]. A subset of these images was used in the first IEEE International Image Forensics Challenge. The ALGLIB library [25] was used to create the KD-tree and to match the nodes in our technique, while the OpenCV library [26] was used to convert images from RGB to grayscale. The PC used for the experiments had 4 GB of RAM and a 2.3-GHz Core i5 processor. The presented technique has been compared with the two methods suggested in [20] and [22], which we refer to as Algorithm I and Algorithm II, respectively. Three main tests have been performed. In the first, the three algorithms were compared on the dataset without modifications (Basic Test). In the second, JPEG compression with a quality factor of 100 was applied to the dataset (JPEG100 Test). In the third, a Gaussian blur filter with a radius of one was applied to the dataset (Blur Test). We computed the processing time and the effectiveness achieved in each test. Effectiveness was measured with the three metrics used in the first IEEE International Image Forensics Challenge, as follows:

True Positive Rate (TPR): gives the percentage of successfully located duplicate regions. It is computed as:

$$TPR = \frac{T_p}{C_T} \qquad (2)$$

where $T_p$ is the number of pixels successfully recognized as duplicates, and $C_T$ is the total number of real duplicate pixels.

False Positive Rate (FPR): this metric gives the percentage of the image wrongly reported as duplicate regions. It is computed as:

$$FPR = \frac{F_p}{C_N} \qquad (3)$$

where $F_p$ is the total number of pixels wrongly recognized as duplicates, and $C_N$ is the total number of pixels in the suspicious image that do not belong to the duplicate parts.

Accuracy (ACC): it indicates the overall performance of the technique using TPR and FPR as follows:

$$ACC = \frac{TPR + (1 - FPR)}{2} \qquad (4)$$
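The three metrics can be computed from a detection map and a ground-truth map with a short Python sketch. The function name and the mask representation (lists of rows of 0/1 values) are assumptions made for illustration, and ACC is read as the mean of TPR and 1 - FPR, which agrees with the values reported in Table 1 (for example, (0.954 + 0.974) / 2 = 0.964 for Algorithm I in the Basic test).

```python
def detection_scores(detected, truth):
    """Return (TPR, FPR, ACC) for two binary masks of equal size.

    Both masks are lists of rows of 0/1 values; the ground truth is assumed
    to contain at least one duplicate pixel and one non-duplicate pixel.
    """
    true_pos = false_pos = positives = negatives = 0
    for det_row, truth_row in zip(detected, truth):
        for d, t in zip(det_row, truth_row):
            if t:
                positives += 1      # contributes to C_T
                true_pos += 1 if d else 0
            else:
                negatives += 1      # contributes to C_N
                false_pos += 1 if d else 0
    tpr = true_pos / positives
    fpr = false_pos / negatives
    acc = (tpr + (1 - fpr)) / 2
    return tpr, fpr, acc


# Toy example: 4 duplicate pixels (3 found) and 4 clean pixels (1 false alarm).
truth = [[1, 1, 0, 0], [1, 1, 0, 0]]
detected = [[1, 1, 0, 0], [1, 0, 1, 0]]
print(detection_scores(detected, truth))
```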

Fig. 2 compares the computational time (in seconds) of the proposed algorithm with that of Algorithm I and Algorithm II in each experiment, and Table 1 reports the performance of the three techniques in terms of TPR, FPR and ACC.

Table 1. Average accuracy (ACC), true positive rate (TPR), false positive rate (FPR), time (in milliseconds) and speed-up ratio attained by the three tested techniques in all three experiments

Test      Algorithm      ACC     TPR     FPR      Time (ms)   Speed-up ratio
Basic     Algorithm I    0.964   0.954   0.026    17093       4.5
Basic     Algorithm II   0.968   0.940   0.003    9581        2.52
Basic     Proposed       0.953   0.910   0.004    3797        1
JPEG100   Algorithm I    0.927   0.882   0.027    20780       4.42
JPEG100   Algorithm II   0.912   0.872   0.049    10185       2.17
JPEG100   Proposed       0.928   0.902   0.0453   4697        1
Blur      Algorithm I    0.935   0.896   0.026    13124       3.18
Blur      Algorithm II   0.942   0.897   0.013    10677       2.59
Blur      Proposed       0.934   0.883   0.015    4123        1
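The speed-up ratio column appears to be the average time of each algorithm divided by the average time of the proposed algorithm in the same test. This reading is inferred from the numbers in Table 1 rather than stated in the text, and the short Python check below (with hypothetical names) reproduces the column under that assumption.

```python
# Average times in milliseconds, copied from Table 1.
times_ms = {
    "Basic": {"Algorithm I": 17093, "Algorithm II": 9581, "Proposed": 3797},
    "JPEG100": {"Algorithm I": 20780, "Algorithm II": 10185, "Proposed": 4697},
    "Blur": {"Algorithm I": 13124, "Algorithm II": 10677, "Proposed": 4123},
}


def speed_up(times):
    """Time of each algorithm relative to the proposed one (ASSUMED definition)."""
    base = times["Proposed"]
    return {name: round(t / base, 2) for name, t in times.items()}


for test_name, times in times_ms.items():
    print(test_name, speed_up(times))
```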

Fig. 2. Comparison of the performance of the proposed algorithm, Algorithm I and Algorithm II in each main test. (a), (b) and (c) are the comparisons for the Basic, JPEG100 and Blur tests, respectively. In each chart, green bars represent Algorithm I, red bars represent Algorithm II and blue bars represent the proposed algorithm.

As Table 1 shows, all three algorithms have very comparable detection performance, but the proposed algorithm is significantly faster than the other two in all three experiments. Note that Algorithm I is a scale- and rotation-invariant method. Hence, when forgeries differ in orientation and/or scale, Algorithm I may give better quality, while the presented technique is a better option when there are no variations in orientation or scale, owing to its speed. Also, one has to trade off accuracy against computation speed when choosing the block side length. A relatively small side length helps the technique handle the fine details of forgeries, but it slows down the algorithm because the number of tree nodes increases. A large side length was found to have the opposite effect. Fig. 3 illustrates some samples of the output of the presented technique. It shows that the proposed technique detects the duplicated parts accurately, as the detection maps and the ground truth are very similar to each other.

5 CONCLUSIONS

Digital image forgeries can cause severe social and legal issues, and copy-move forgery is a simple yet powerful way to produce them. Robust algorithms are therefore always needed to handle this problem. In this paper, an accurate and fast technique for copy-move forgery detection has been presented. To assess its efficiency, the proposed technique has been compared with two state-of-the-art methods. The experimental results show that the presented technique is effective and efficient. Our future work includes investigating parallelization of the algorithm, which may be of high significance, especially for large images.

Fig. 3. Some samples of the presented algorithm's output for images from the Basic test. In each row, from left to right: the image, the ground truth and the algorithm output.

REFERENCES

[1]. G. Birajdar, V. Mankar, Digital image forgery detection using passive techniques: A survey, Digital Investigation 10 (3) (2013) 226-245.

[2]. Amerini, L. Ballan, R. Caldelli, A. D. Bimbo, G. Serra, A sift-based forensic method for copy-move attack detection and transformation recovery, IEEE Transactions on Information Forensics and Security 6 (3) (2011) 1099-1110.

[3]. H. Bay, A. Ess, T. Tuytelaars, L. V. Gool, Speeded-up robust features (surf), Computer Vision and Image Understanding 110 (3) (2008) 346-359.

[4]. D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Computer Vision 60 (2) (2004) 91-110.

[5]. B. Yang, X. Sun, H. Guo, Z. Xia, X. Chen, A copy-move forgery detection method based on cmfd-sift, Multimedia Tools and Applications 77 (1) (2018) 837-855.

[6]. X. Pan, S. Lyu, Detecting image region duplication using sift features, Intl. Conference on Acoustics, Speech and Signal Processing (ICASSP) (2010) 1706-1709.

[7]. H. Huang, W. Guo, Y. Zhang, Detection of copy-move forgery in digital images using sift algorithm, Intl. Conference on Acoustics, Speech and Signal Processing (ICASSP) (2008) 272-276.

[8]. X. Pan, S. Lyu, Region duplication detection using image feature matching, Trans. Inform. Forensics Secur. (2010) 857-867.

[9]. B. L. Shivakumar, S. Baboo, Detection of region duplication forgery in digital images using surf, International Journal of Computer Science Issues 8 (4) (2011) 199-205.

[10]. X. Bo, W. Junwen, L. Guangjie, D. Yuewei, Image copy-move forgery detection based on surf, Multimedia Information Networking and Security (2010) 889-892.

[11]. S. Ryu, M. Lee, H. Lee, Detection of copy-rotate-move forgery using zernike moments, Information Hiding Conference (2010) 51-65.

[12]. C. Xiong, J. Zhu, Y. Li, R. Xiang, Image-based forgery detection using big data clustering, Multimedia Tools and Applications (2018) 1-8.

[13]. O. M. Al-Qershi, B. Khoo, Copy-move forgery detection using on locality sensitive hashing and k-means clustering, Information Science and Applications (ICISA) (2016) 663-672.

[14]. J. Wang, G. Liu, Z. Zhang, Y. Dai, Z. Wang, Fast and robust forensics for image region-duplication forgery, Acta Automatica Sinica 35 (12) (2009) 1488-1495.

[15]. M. Bashar, K. Noda, N. Ohnishi, K. Mori, Exploring duplicated regions in natural images, IEEE Transactions on Image Processing.

[16]. S. Bayram, H. Sencar, N. Memon, An efficient and robust method for detecting copy-move forgery, IEEE International Conference on Acoustics, Speech, and Signal Processing (2009) 1053-1056.

[18]. W. Luo, J. Huang, G. Qiu, Robust detection of region-duplication forgery in digital images, International Conference on Pattern Recognition 4 (2006) 746-749.

[19]. S. Bravo-Solorio, A. K. Nandi, Exposing duplicated regions affected by reflection, rotation and scaling, International Conference on Acoustics, Speech and Signal Processing (2011) 1880-1883.

[20]. E. Silva, T. Carvalho, A. Ferreira, A. Rocha, Going deeper into copy-move forgery detection: Exploring image telltales via multi-scale analysis and voting processes, J. Vis. Commun. Image R. 29 (2015) 16-32.

[21]. Ferreira, S. C. Felipussi, C. Alfaro, P. Fonseca, J. E. Vargas-Muñoz, J. A. dos Santos, A. Rocha, Behavior knowledge space-based fusion for copy-move forgery detection, IEEE Transactions on Image Processing 25 (10) (2016) 4729-4742.

[22]. M. Moussa, A fast and accurate algorithm for copy-move forgery detection, International Conference on Computer Engineering & Systems (ICCES) (2015) 281-285.

[23]. M. Moussa, M. Habib, R. Y. Rizk, FRoTeMa: Fast and robust template matching, International Journal of Advanced Computer Science and Applications (IJACSA) 6 (10).

[24]. M. Moussa, Robust template matching for grayscale images, Master thesis, Port-Said University, Egypt (2016).

[25]. S. Bochkanov, Alglib, Available from: http://alglib.net/.

[26]. Open computer vision library, Available from: