• No results found

Paper ID:

N/A
N/A
Protected

Academic year: 2021

Share "Paper ID:"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

Paper ID: 1570186597

imPlag: Detecting Image Plagiarism Using Hierarchical Near Duplicate Retrieval

Siddharth Srivastava, Prerana Mukherjee, Brejesh Lall

Indian Institute of Technology, Delhi

(2)

Introduction

The key characteristic of image plagiarism is that it may involve the reproduction of the original image using an entirely different mode such as hand made sketches. Image Plagiarism can be posed as a superset of image copy detection problems.

Fig. 1. (a) Original Image (b) Plagiarised image (reproduction of the source image) (c) Copied image (considered as strong attack by copy detection algorithms but an expected case for Image Plagiarism)

(3)

Problems ?

• Detection of similar images – Huge Databases, Interactive Time

• Plagiarism brings in innovation

• Hence involves both Research and Engineering Challenges

- Stitched from 3888 images

- One column/row pixel from each image

So knowing your limits is necessary

Image Courtesy:Eirik Solheim (Image has been used for demonstrating

the extent of deformation possible in images)

(4)

KEY CONTRIBUTIONS

Development of a hierarchical feature extraction and feature indexing technique.

Evaluation of recent feature extraction techniques against simple, moderate and extreme deformations.

Dataset construction for testing image plagiarism algorithms.

(5)

Dataset

• Natural Images – mountains, rivers, animals, birds etc.

• Actual scenario – too many images can be similar but might not be plagiarized (synthetically transformed)

• So for evaluation, dataset was created since detecting image plagiarism is not really only Content Based Image Retrieval

• Search for images on Flickr, ukbench dataset

• Find similar images using Google Reverse Image Search (Google doesn’t index Flickr !!)

• Transformed Images – Affine, Grayscale, Color channel separation

etc. (30 transformations)

(6)

Methodology

Heirarchical Feature Extraction

Feature Indexing

Search Query NN

Match/Exact Matching Ranking or

Verification

Fingerprint the image - Perceptual Hash

- SIFT >

SURF, ORB, FREAK, PCA-SIFT

Store for retrieval - Database

- Apache Lucene - Locality Sensitive Hashing

Search the index - Search LSH Index Relevant Results ranked at the top

- Bag of Visual Words

Histogram matching

(7)

Hierarchical Indexing

Lucene Index

Perceptual Hash

Locality Sensitive Hashing

SIFT Features

Bag of Visual Words

Images

Lucene Index Traditional

Database

(8)

Layered Retrieval

Input Image

Perceptual Hash Calculate of input Image

Search LSH Index for nearest neighbours

Rank images based on BoVW

histogram matching

(9)

Perceptual Hash

• Can be used for multimedia content (audio, video, images)

• Similar images have similar hash values

Scale image to

32x32

Convert to grayscale

Compute DCT

Keep first 8x8 coefficie

nts

Take Average

(no DC Term) Coeff > Avg

=> 1 Coeff <= Avg

=> 0 Flatten to

64bit vector

(10)

Bag of Visual Words

• SIFT features converted to Bag of Visual Words

• More efficient than direct keypoint matching

• Observations:

• Large vocabulary size may increase false negatives

• Small vocabulary size may increase false positives

• Though there is no definite pattern on what should the vocabulary

size be

(11)

Results

• Accuracy: 81%

• Scalability

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0 1000 2000 3000 4000 5000 6000 7000

Time(sec)

Number of images in the dataset

top-60 results

(12)

Conclusion

• We perform evaluations to choose best criteria and techniques for detecting image plagiarism.

• A method is proposed, consisting of perceptual hashing and SIFT with hierarchical approximate matching scheme.

• This scheme was able to maintain the tradeoff between time and

accuracy.

(13)

References

• E. Chalom, E. Asa, and E. Biton, “Measuring image similarity: an overview of some useful applications,”

Instrumentation & Measurement Magazine, IEEE, vol. 16, no. 1, pp. 24–28, 2013.

• C. Zauner, M. Steinebach, and E. Hermann, “Rihamark: perceptual image hash benchmarking,” in IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2011, pp. 78 800X–78 800X.

• V. Voronin, V. Frantc, V. Marchuk, and K. Egiazarian, “Fast texture and structure image reconstruction using the perceptual hash,” Image Processing: Algorithms and Systems XI, 2013.

• A. Kumar, A. Anand, A. Akella, A. Balachandran, V. Sekar, and S. Seshan, “Flexible multimedia content

retrieval using infonames,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 455–456, 2011

• A. Gionis, P. Indyk, R. Motwani et al., “Similarity search in high dimensions via hashing,” in VLDB, vol. 99, 1999, pp. 518–529.

• V. Christlein, C. Riess, J. Jordan, and E. Angelopoulou, “An evaluation of popular copy-move forgery detection approaches,” Information Forensics and Security, IEEE Transactions on, vol. 7, no. 6, pp. 1841– 1854, 2012.

• M. Lux and S. A. Chatzichristofis, “Lire: lucene image retrieval: an extensible java cbir library,” in Proceedings of the 16th ACM international conference on Multimedia. ACM, 2008, pp. 1085–1088.

(14)

THANKYOU

(15)

Appendix: Dataset Images

Perceptually Similar ?

(16)

Appendix: Nature is not always greenish

(17)

Appendix: Accuracy

0 10 20 30 40 50 60 70 80 90 100

60 100 150 200 250

Accuracy(%)

Number of Results(top-N)

Accuracy(%)

(18)

Appendix: Results

Fig 2. Comparison of Feature matching techniques Fig 3. Average time taken by SIFT, SURF and Perceptual Hash PH

SURF

SIFT

(19)

Appendix: Results

Fig 4. Comparison of ranked retrieval Fig 5. Ranked V/s Non Ranked Retrieval

(20)

Appendix: Results

Fig 6. Time vs Number of results Fig 7. Time vs Number of Images in the dataset

(21)

Appendix: Results

Fig 8. Lucene v/s Database Retrieval time

(22)

Locality Sensitive Hashing

• Similar features hashed to same hash values

• Parameters

• No of bits (k)

• No of tables (l)

• Maximum Bucket capacity (usually unlimited)

• Empirical Analysis needed for determining parameters as per the dataset

• varying number of bits, varies bucket size (small hash, more collisions

and vice versa)

(23)

Lucene

• Very efficient in document indexing and retrieval

• Bag of Visual words histograms are indexed

• Allows for random access of documents

• Histograms are fetched from Lucene index and ranked (Filtering)

References

Related documents

 select participants for testing where select participants for testing where there are there are indications of herbal incense use. indications of herbal

Perioperative in- formation collected prospectively included type of procedure performed, operative time (time from incision to closure), time for placing bag in the abdomen

A report by the Institute of Medicine#{176} of the National Academy of Sciences has estimated that current rates of low birth weight infants could be reduced by 15% among the

An example of practical implementation of three-level non-formal educational system in informatics with the help of which lessons in CCCP were organized (Children’s Computer Club

There is a direct link be- tween energy security and climate change mitigation actions that focus on fuel switch options, such as biofuels and electrification [ 40 , 46 , 66 ]