International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 5, May 2013)
Analysis of Existing CBIR Systems: Improvements and Validation Using Color Features
Fuhar Joshi
Student, Medicaps Institute of Technology and Management, Indore
Abstract— Content-based image retrieval is a technique that uses visual contents to search images in large-scale image databases according to user interest. Significant progress has been made in both theoretical research and system development. Early methods were generally based not on visual features but on the textual naming or tagging of images; that is, images were first tagged with text and then searched with a text-based approach using traditional database management systems. In this experimental paper, we propose an effective content-based image retrieval (searching) method based on the simplest feature that can be extracted from images, namely a combination of multi-resolution colour features. The purpose of this paper is to explore a new possibility that makes image search, which is otherwise cumbersome, an easy and user-friendly task. Most images are searched through their metadata, i.e. the user enters either the name of the image or relevant facts associated with it, which works well. However, the massive growth of image databases now makes it impossible to annotate images individually. The primary goal of this paper is to understand the basic image-processing concepts currently in use and to utilize the DCT algorithm to develop a GUI-based application. The paper is intended to pave a path for users to understand new breakthroughs in the field of image processing in particular and computer science at large.
Keywords— Colour, Digital image, Histogram Image, CBIR, Data Mining, Retrieval.
I. INTRODUCTION
Content-based image retrieval (CBIR), also known as query by image content (QBIC), is a procedure that uses visual contents to search images in large-scale image databases according to individual interest. Early techniques were usually based not on visual features but on the textual naming or tagging (labeling) of images. In other words, images were first tagged with text and then searched using a text-based approach from traditional database management systems. Text-based image retrieval uses conventional database techniques to manage images, whereas content-based image retrieval uses the visual contents of an image, such as texture, shape and color layout, to represent the image.
In typical content-based image retrieval systems, the visual contents of the images in the database are extracted and described by feature vectors. The feature vectors of the images in the database form a feature database.
Figure 1: A general Overview of CBIR System [13]
To retrieve images, users provide the retrieval system with example images (or sketches). The system then converts the query into its internal feature representation. The similarities/distances between the features of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed. Figure 1 shows a general overview of CBIR. Recent retrieval systems have incorporated users' relevance feedback to modify the retrieval process in order to generate perceptually and semantically more meaningful retrieval results.
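The query-by-example loop described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's Java implementation: the toy feature extractor `mean_rgb`, the helper names and the choice of Euclidean distance are all hypothetical; any extractor that returns a fixed-length vector (for example a colour histogram or a set of DCT coefficients) could be plugged in instead.

```python
import numpy as np

def mean_rgb(img):
    """Toy feature extractor (hypothetical): mean of each colour channel of an (H, W, 3) image."""
    return img.reshape(-1, 3).astype(float).mean(axis=0)

def build_feature_database(images, extract):
    """Extract one feature vector per database image and stack them into an (N, d) array."""
    return np.stack([extract(img) for img in images])

def retrieve(query_img, feature_db, extract, top_k=15):
    """Rank database images by Euclidean distance to the query's feature vector."""
    q = extract(query_img)
    dists = np.linalg.norm(feature_db - q, axis=1)  # one distance per database image
    order = np.argsort(dists, kind="stable")        # ascending: most similar first
    return order[:top_k], dists[order[:top_k]]

# Usage: three flat-colour "images" and a reddish query
red = np.full((4, 4, 3), [255, 0, 0], dtype=np.uint8)
green = np.full((4, 4, 3), [0, 255, 0], dtype=np.uint8)
dark_red = np.full((4, 4, 3), [200, 10, 10], dtype=np.uint8)
db = build_feature_database([red, green, dark_red], mean_rgb)
query = np.full((4, 4, 3), [250, 0, 0], dtype=np.uint8)
indices, distances = retrieve(query, db, mean_rgb, top_k=2)
print(indices)  # [0 2]
```

Sorting by ascending distance puts the most similar images first; the stable sort keeps tied images in database order.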
II. KEY CONCEPTS
The key concepts used in this paper are discussed below.
B) Digital image: A digital image is a numeric (normally binary) representation of a two-dimensional image. Depending on whether the image resolution is fixed, it may be of vector or raster type. The term "digital image" usually refers to raster images, also called bitmap images. [7][12]
C) CBIR: Content-based image retrieval (CBIR) is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases [7].
D) Data Mining: Data mining, an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets, involving methods at the intersection of database systems, statistics, machine learning and artificial intelligence. [12]
E) Colour: Colour (or color) is the visual perceptual property corresponding in humans to the categories called blue, red, pink, green and other colours. Colour derives from the spectrum of light interacting in the eye with the spectral sensitivities of the light receptors. [8]
F) Histogram: The histogram is one of the most basic image features. The baseline method of representing the color information of images in CBIR systems is the color histogram. A color histogram is a kind of bar graph in which each bar corresponds to a particular color (bin) of the color space being used. [9]
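A colour histogram of this kind can be computed by quantising each RGB channel and counting the pixels that fall into each joint bin. The sketch below is illustrative only: the 4 levels per channel (64 bars) and the use of histogram intersection as the similarity measure are assumptions, not settings reported in the paper.

```python
import numpy as np

def colour_histogram(img, bins_per_channel=4):
    """Normalised joint RGB histogram of an (H, W, 3) uint8 image.

    Each channel is quantised to bins_per_channel levels, giving
    bins_per_channel**3 bars; each bar holds the fraction of pixels
    whose quantised colour falls in that bin.
    """
    b = bins_per_channel
    q = (img.astype(np.int64) * b) // 256                # per-channel bin index in 0..b-1
    codes = (q[..., 0] * b + q[..., 1]) * b + q[..., 2]   # one joint bin index per pixel
    hist = np.bincount(codes.ravel(), minlength=b ** 3).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms: 1.0 means identical, 0.0 means disjoint."""
    return float(np.minimum(h1, h2).sum())

# Usage: a 2x2 image with three black pixels and one white pixel
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 255, 255]
h = colour_histogram(img)
print(h[0], h[63])  # 0.75 0.25
```

Coarser quantisation makes the histogram more tolerant of small colour shifts at the cost of discriminating power.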
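G) Texture: In computer graphics, texture is an image applied to a polygon to create the appearance of a surface; in CBIR, texture refers to the spatial arrangement of intensities or colours in an image region.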
III. RELATED WORK
Content-based image retrieval, a method that uses visual contents to search images in large-scale image databases according to users' interests, has been an active and fast-advancing research area since the 1990s. During the past decade, remarkable progress has been made in both theoretical research and system development. However, many challenging research issues remain, and they continue to attract researchers from multiple disciplines.
Nidhi Singhai and Shishir K. Shandilya [4], in their survey paper "A Survey On: Content Based Image Retrieval Systems", discuss, analyze and compare various content-based image retrieval techniques. The paper also introduces features such as color histograms, fuzzy techniques, texture and edge density for accurate and efficient content-based image retrieval.
Anlei Dong and Bir Bhanu [1] proposed an active concept learning approach, based on a mixture model, to deal with two basic aspects of a database system: the changing nature of the database and user queries. The authors proposed a new user-directed, semi-supervised expectation-maximization (EM) algorithm for mixture parameter estimation, and developed a novel model selection method based on Bayesian analysis that evaluates the consistency of hypothesized models with the available information in order to achieve concept learning. The key contributions of their paper are:
1) A novel semi-supervised EM algorithm for mixture model parameter estimation. By inserting a modification step between the E-step and the M-step, based on labeling information obtained from multiple users, they achieve reliable concept learning that is close to the ground-truth image distribution.
2) A novel semi-supervised model selection algorithm that can efficiently learn the number of components in the mixture model.
3) A concept knowledge transduction approach that uses the learned concept knowledge to improve retrieval performance on dynamic databases and can efficiently handle image insertion and query images that lie outside the database.
4) Experimental results on the Corel database that show the efficacy of the active concept learning approach and the improvement in retrieval performance achieved by concept transduction.
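The idea of inserting a label-driven modification step between the E-step and the M-step can be illustrated on a one-dimensional Gaussian mixture. This is a deliberately simplified sketch and not Dong and Bhanu's algorithm: it assumes a fixed number of components `k`, scalar features and a single labelling source (their method combines labels from multiple users), and it performs no Bayesian model selection. Labelled samples simply have their responsibilities clamped to their known component before each M-step; the function name is hypothetical.

```python
import numpy as np

def semi_supervised_em(x, labels, k, n_iter=100):
    """EM for a 1-D Gaussian mixture with partially labelled data.

    labels[i] is the known component of x[i], or -1 if x[i] is unlabelled.
    Returns mixture weights, means and variances.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    labelled = labels >= 0
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread initial means over the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # Modification step: labelled samples belong to their known component
        r[labelled] = 0.0
        r[labelled, labels[labelled]] = 1.0
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = r.sum(axis=0)
        w = nk / x.size
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # floor avoids zero variance
    return w, mu, var

# Usage: two clusters with five labelled samples each
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
labels = np.full(x.size, -1)
labels[:5] = 0
labels[200:205] = 1
w, mu, var = semi_supervised_em(x, labels, k=2)
print(np.round(mu, 1))
```

Clamping the labelled responsibilities anchors each component to the class its labels name, so the learned components stay aligned with the user's concepts rather than drifting to arbitrary clusters.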
Szabolcs Sergyán [2], in his paper "Color Histogram Features Based Image Classification in Content-Based Image Retrieval Systems", analyzed one of the preprocessing algorithms for image classification and introduced a novel approach based on low-level image histogram features.
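Low-level histogram features of this kind are typically first-order statistics of a channel's intensity histogram. The sketch below computes a common set (mean, variance, skewness, energy and entropy); the function name is hypothetical, and the exact feature set used in [2] may differ.

```python
import numpy as np

def histogram_statistics(channel, levels=256):
    """First-order statistics of the intensity histogram of one 2-D uint8 channel."""
    hist = np.bincount(channel.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                  # probability of each intensity level
    g = np.arange(levels, dtype=float)     # intensity levels 0..levels-1
    mean = (g * p).sum()
    var = ((g - mean) ** 2 * p).sum()
    std = np.sqrt(var)
    skew = ((g - mean) ** 3 * p).sum() / std ** 3 if std > 0 else 0.0
    energy = (p ** 2).sum()                # 1.0 for a single-level image
    nz = p[p > 0]
    entropy = -(nz * np.log2(nz)).sum()    # in bits
    return {"mean": float(mean), "variance": float(var), "skewness": float(skew),
            "energy": float(energy), "entropy": float(entropy)}

# Usage: a half-black, half-white 2x2 channel
channel = np.array([[0, 255], [0, 255]], dtype=np.uint8)
stats = histogram_statistics(channel)
print(stats["mean"], stats["entropy"])  # 127.5 1.0
```

Applied to each of the R, G and B channels, these statistics give a short, fixed-length vector that a classifier can use regardless of image size.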
John M. Zachary and Sitharama S. Iyengar [5] present the problem from the viewpoint of real-world system creation, discuss the main feature extraction techniques used in current CBIR systems, and summarize a number of CBIR system implementations.
Wei-Ying Ma and B. S. Manjunath [6] describe NeTra, a prototype image retrieval system developed at the University of California, Santa Barbara. The system uses a hybrid approach to feature extraction, incorporating color, texture and shape information from an image in its indexing method. A characteristic feature of NeTra is that it uses segmented local regions to index the images in the database; thus, both global and local characteristics are exploited.
IV. ISSUES UNADDRESSED IN THE EXISTING APPROACHES
With the growing scope of multimedia and the increasing popularity of digital equipment, users are no longer satisfied with conventional information (image) retrieval techniques and systems. Content-based image retrieval is therefore becoming a source of fast and accurate retrieval. One aim of this paper is to discuss the challenges and learning opportunities offered in the field of image processing and image data mining. The "content" in a CBIR system indicates that the search focuses on the contents of the image rather than on its metadata. The intention of the work presented in this paper is to explore new possibilities that make image search, which is otherwise cumbersome, an easy task. Existing approaches in this area are more cumbersome and complex, being based on a variety of different features. The ultimate objective of the current research is to provide an efficient solution that reduces the number of iterations required and improves overall retrieval performance and time complexity, using visual contents of an image such as RGB color features. Target search in content-based image retrieval (CBIR) systems refers to finding a specific (target) image, such as a particular registered logo or a specific historical photograph, within a specified time.
V. IMAGE RETRIEVAL BASED ON COLOR FEATURE USING DCT
The focus of this work is on utilizing a transform technique for searching, browsing and retrieving images from an image repository indexed by the user.
In the proposed work, the Discrete Cosine Transform (DCT) is used to generate the feature vector. The proposed system uses the simplest feature of image content (color) instead of more complex and advanced features, and the DCT of each of the R, G and B components of an image is considered. The application is based on the DCT using the JPEG coefficients of a compressed image [14][15].
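A per-channel DCT feature vector can be sketched in Python with NumPy. Two assumptions are made because the paper does not spell out these details: the DCT is computed on the decoded pixel planes of the whole image (rather than read from the JPEG 8×8 block coefficients), and only the top-left `keep` × `keep` low-frequency coefficients of each of the R, G and B planes are kept. With the default `keep=1`, only the DC coefficient of each channel is retained, giving a 3-dimensional vector; this is one plausible reading of the 3-dimensional feature vectors mentioned in the conclusion. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C of size n x n: C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalisation
    return c

def dct2(plane):
    """2-D DCT-II of a 2-D array: 1-D DCT down the columns, then along the rows."""
    h, w = plane.shape
    return dct_matrix(h) @ plane @ dct_matrix(w).T

def rgb_dct_feature(img, keep=1):
    """Concatenate the top-left keep x keep DCT coefficients of the R, G and B planes.

    keep=1 keeps only the DC coefficient of each channel (a 3-D vector);
    this choice is an assumption, not a documented setting of the paper.
    """
    img = img.astype(float)
    parts = [dct2(img[..., ch])[:keep, :keep].ravel() for ch in range(3)]
    return np.concatenate(parts)

# Usage: a flat-colour 4x8 image; for a constant plane only the DC term is non-zero
img = np.full((4, 8, 3), [10, 20, 30], dtype=np.uint8)
feature = rgb_dct_feature(img)
print(feature / np.sqrt(4 * 8))  # approximately the channel means 10, 20, 30
```

Building the full transform matrices keeps the sketch dependency-free; for real 384×256 images a fast transform such as SciPy's `scipy.fft.dctn(..., norm="ortho")` would be the more practical choice.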
A. Discrete Cosine Transform
A discrete cosine transform (DCT) expresses a function or signal as a sum of cosine functions with different frequencies and amplitudes. Like the discrete Fourier transform (DFT), a DCT operates on a function at a finite number of discrete data points. [16]
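For reference, the orthonormal DCT-II (the type used in JPEG) of a sequence x_0, ..., x_{N-1}, and its two-dimensional form for an M × N image f(m, n), can be written as follows. This is the standard textbook definition rather than a formula taken from the paper, and other normalisations are also in use. With this scaling, the DC term F(0, 0) equals the sum of all pixel values divided by sqrt(MN), so it is proportional to the mean intensity of the image.

```latex
X_k = \alpha_k(N) \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi (2n+1) k}{2N}\right],
  \qquad k = 0, \dots, N-1

F(u, v) = \alpha_u(M)\, \alpha_v(N) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m, n)
  \cos\left[\frac{\pi (2m+1) u}{2M}\right] \cos\left[\frac{\pi (2n+1) v}{2N}\right]

\alpha_0(N) = \sqrt{1/N}, \qquad \alpha_k(N) = \sqrt{2/N} \quad (k \ge 1)
```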
B. Methodology Involved
The methodology involved in the work is given below.
C. Development Environment
The functional code for the proposed prototype system was implemented in Sun Java 1.6. It runs on minimum hardware consisting of an Intel Pentium III processor, 1 GB of RAM, an 80 GB hard disk, a 15-inch monitor, a 104-key keyboard and a mouse, under the Windows XP operating system; the hardware and software used in this work were found to be technically feasible.
D. Database Preparation
The image database consists of different categories, such as Flag, Flowers, Gun, Speedway, textures, Regensburg, child, Food and Natural scenes, collected from open sources. Every category is used for retrieval. The images are stored in a few popular formats, with sizes of 384×256 and 256×384, in a CFS file system. All images are represented in the RGB color space.
VI. EXPERIMENTAL RESULTS
Figure 2: Representative sample of query images of different classes
VII. PERFORMANCE EVALUATION
To evaluate the performance of the image retrieval system, two measures, namely recall and precision [17], are borrowed from conventional information retrieval. After the feature vectors for all images in the repository have been generated, they are stored in a CFS file. The feature vector of a query image from each category is computed and used to search the feature database. The distances between the query image's feature vector and those of the database images are sorted in ascending order, and the resulting ranking is used to calculate the precision and recall that measure the retrieval performance of the algorithm.
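Precision is the fraction of retrieved images that are relevant, and recall is the fraction of all relevant images in the database that are retrieved. The sketch below computes both over the k images with the smallest distances. It assumes that an image counts as relevant exactly when it belongs to the query's class, which is how Table 1 appears to count relevant results; the helper name and the class labels in the usage example are hypothetical.

```python
import numpy as np

def precision_recall_at_k(distances, db_labels, query_label, k=15):
    """Precision and recall over the k database images closest to the query.

    distances[i] is the feature-space distance from the query to database image i;
    db_labels[i] is that image's class. An image counts as relevant when its class
    equals query_label (an assumption about how relevance is judged).
    """
    db_labels = np.asarray(db_labels)
    top_k = np.argsort(distances, kind="stable")[:k]        # ascending distance = most similar first
    hits = int((db_labels[top_k] == query_label).sum())     # relevant images retrieved
    total_relevant = int((db_labels == query_label).sum())  # relevant images in the database
    return hits / len(top_k), hits / total_relevant

# Usage with six hypothetical database images
dists = [0.1, 0.5, 0.2, 0.9, 0.3, 0.8]
labels = ["gun", "flag", "gun", "gun", "flag", "gun"]
p, r = precision_recall_at_k(dists, labels, "gun", k=3)
print(round(p, 3), r)  # 0.667 0.5
```

Precision at a fixed cut-off (k = 15 in Table 1) does not require knowing how many relevant images exist in total, whereas recall does, which is one reason precision alone is often reported when class sizes vary.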
Figure 3: Query Image from Gun Class
The proposed work produces good results, as can be seen in Figure 4, where six retrieved images belong to the same category as, and are relevant to, the query image in Figure 3. The very first retrieved image is the query image itself.
To demonstrate the performance and efficiency of the proposed work on color images, we assembled a collection of around 500 overlapping images drawn from images of Gun, Flag, Flower, Riegersburg, Baby, Texture and other classes. The image repository thus includes 8 different classes of images, and each class contains a variable number of images.
TABLE 1
AVERAGE PRECISION FOR DIFFERENT CLASSES OF IMAGES FOR THE FIRST 15 RESULT IMAGES
(Q1-Q4: total relevant images obtained among the 15 images retrieved for queries 1-4; Min, Max and Avg are taken over the four queries.)
Image class | Q1 | Q2 | Q3 | Q4 | Min | Max | Avg | Precision (%)
Gun | 10 | 11 | 14 | 9 | 9 | 14 | 11 | 73.3
Flag | 10 | 13 | 11 | 12 | 10 | 13 | 11.5 | 76.6
Flowers | 8 | 13 | 12 | 9 | 8 | 13 | 10.5 | 70
Riegersburg | 11 | 9 | 7 | 10 | 7 | 11 | 9.25 | 61.66
Baby | 15 | 10 | 10 | 11 | 10 | 15 | 11.5 | 76.6
Texture | 12 | 11 | 10 | 8 | 8 | 12 | 10.25 | 68.3
Graz | 6 | 6 | 8 | 10 | 6 | 10 | 7.5 | 50
Berta | 10 | 11 | 6 | 9 | 6 | 11 | 9 | 60
Average precision | | | | | | | | 67.05
Any retrieval system should return a majority of images similar to the query image. For each class, four original images were randomly selected from the image database as query images.
The overall retrieval performance of the implemented algorithm is summarized in Table 1. Precision (as a percentage) is calculated for each individual class as well as over all classes. The average precision over all categories is 67.05%, which is reasonably good.
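The precision figures in Table 1 follow from the per-query counts: each class's precision is its average number of relevant images divided by the 15 images retrieved per query. The short script below re-derives them from the reported counts (the variable and function names are illustrative). Averaging the exact per-class values gives about 67.08%, slightly above the reported 67.05%, which appears to come from averaging the truncated per-class percentages listed in the table.

```python
# Relevant images among the first 15 results for each of the four queries per class (Table 1)
counts = {
    "Gun": [10, 11, 14, 9],
    "Flag": [10, 13, 11, 12],
    "Flowers": [8, 13, 12, 9],
    "Riegersburg": [11, 9, 7, 10],
    "Baby": [15, 10, 10, 11],
    "Texture": [12, 11, 10, 8],
    "Graz": [6, 6, 8, 10],
    "Berta": [10, 11, 6, 9],
}
RESULTS_PER_QUERY = 15

def class_precision(relevant_counts, k=RESULTS_PER_QUERY):
    """Mean precision (in percent) over a class's queries, each returning k images."""
    return 100.0 * sum(relevant_counts) / (len(relevant_counts) * k)

per_class = {name: class_precision(c) for name, c in counts.items()}
overall = sum(per_class.values()) / len(per_class)  # every class has four queries

for name, p in per_class.items():
    print(f"{name:<12}{p:6.2f}")
print(f"{'Average':<12}{overall:6.2f}")  # overall is about 67.08
```

Because every class has the same number of queries, averaging the per-class precisions gives the same result as pooling all 32 queries.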
VIII. CONCLUSIONS
This paper presents a contribution towards an image search engine that searches not by text annotated to the image by an end user, but by the visual contents of the image itself. A pragmatic analysis of content-based image retrieval algorithms and methods is carried out. The visual features most extensively used in content-based image retrieval are color, texture and shape. Color is typically represented by the color histogram, color correlogram or color coherence vector in a certain color space. The proposed system can be categorized as a content-based image retrieval system and is uniformly efficient for both similar-image and sub-image searching. Although the required indexing time is slightly longer than that of existing algorithms, retrieval efficiency is fairly good. The work utilizes the Discrete Cosine Transform of each of the R, G and B components of an image individually to generate 3-dimensional feature vectors, which are considerably smaller than using the full transform as a feature vector. The end product is an easy-to-use GUI-based application that enables the end user to explore the system conveniently and efficiently.
REFERENCES
[1] Anlei Dong and Bir Bhanu, "Active Concept Learning in Image Databases," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 35, pp. 450-466.
[2] Szabolcs Sergyán, "Color Histogram Features Based Image Classification in Content-Based Image Retrieval Systems," 6th IEEE International Symposium on Applied Machine Intelligence and Informatics, pp. 221-224.
[3] Ritendra Datta, Dhiraj Joshi, Jia Li and James Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age," ACM Computing Surveys, Vol. 40, No. 2, pp. 5:1-5:60.
[4] Nidhi Singhai and Shishir K. Shandilya, "A Survey On: Content Based Image Retrieval Systems," International Journal of Computer Applications, Vol. 4, No. 2, pp. 22-26.
[5] John M. Zachary, Jr. and Sitharama S. Iyengar, "Content Based Image Retrieval Systems."
[6] Wei-Ying Ma and B. S. Manjunath, "NeTra: A toolbox for navigating large image databases," Multimedia Systems, Vol. 7, pp. 184-198, 1999.
[7] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd ed.
[8] Calvinnme, Digital Image Processing, 2nd ed.
[9] Earl Gose, Steve Jost and Richard Johnsonbaugh, Pattern Recognition and Image Analysis, 1st ed.
[10] Yasuhito Suenaga, Katsuhiko Sakaue and Kiyoharu Aizawa, Image Processing Technologies: Algorithms, Sensors, and Applications, 1st ed.
[11] Rajan Chattamvelli, Data Mining Algorithms, Narosa Publishing House, 2011.
[12] http://en.wikipedia.org/
[13] Abebe Rorissa, "Image Retrieval Benchmarking Visual Information Indexing and Retrieval Systems," The Information Association for the Information Age, Bulletin, February/March 2007.
[14] C. W. Ngo, T. C. Pong and R. T. Chin, "Exploiting Image Indexing Techniques in DCT Domain," Pattern Recognition, Vol. 34, pp. 1841-1851, 2001.
[15] Y. L. Huang and R. F. Chang, "Texture Features for DCT-Coded Image Retrieval and Classification," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Phoenix, AZ, USA, pp. 3013-3016.
[16] "The Discrete Cosine Transform (DCT)," http://www.cs.cf.ac.uk/Dave/Multimedia/node231.html