Automatic Image Annotation and Semantic based Image Retrieval System

(1)

1

Automatic Image Annotation and Semantic based

Image Retrieval System

Ismail PK, D. Ravi, R. Subathra, Kathir College of Engineering, Coimbatore

Abstract—Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to an image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from database. Most of the existing work such as Latent Semantic Indexing and Markovian Semantic Indexing (MSI) has been presented for online image retrieval system. By extending these methods, the proposed system uses Semantic annotated Markovian Semantic Indexing (SMSI) for retrieving the images. The proposed system automatically annotates the images using hidden Markov model. To annotate images, concepts are represented as states by using Hidden Markov model. The parameters of the model are estimated from a set of manually annotated (training) images. Each image in a large test collection is then automatically annotated with the a posteriori probability of concepts present in it. After annotating these images, semantic retrieval of images can be done. This can be done by using Natural Language processing tool namely WordNet and also measuring semantic similarity of annotated images in the database using MSI.

Index Terms — Image Retrieval, Hidden Markov model, Markovian Semantic Indexing. I. INTRODUCTION

Image retrieval procedures can be divided into two approaches: query-by-text and query-by-example. In query by text, queries are texts and targets are images. That is, query by text is a cross-medium retrieval. In query by example, queries are images and targets are images. Thus, query by text is a mono-medium retrieval. For practicality, images in query by text [19] retrieval are often annotated by words. When images are sought using these annotations, such retrieval is known as annotation-based image retrieval (ABIR). In contrast, annotations in a query by example, setting are not necessary, although they can be used. The retrieval is carried out according to the image contents. Such retrieval is known as content-based image retrieval (CBIR). A basic difference between ABIR and CBIR is related to the values of textual and visual information in image retrieval.

An image retrieval system is a computer system for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata such as captioning, keywords, or descriptions to the images so that retrieval can be performed over the annotated words. Manual image annotation is time consuming, laborious and expensive; to address this, there has been a large amount of research done on automatic image annotation. Additionally, the increase in social web applications and the semantic web have inspired the development of several web-based image annotation tools. Image search is a specialized data search used to find images. To search for images, a user may provide query terms such as keyword, image file/link, or click on some image, and the system will return images similar to the query. The similarity used for search criteria could be meta tags, color distribution in images, region per shape attributes, etc.

Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by

which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database. This method can be regarded as a type of multi class image classification with a very large number of classes as large as the vocabulary size. Typically, machine learning uses feature vectors extracted from images and the annotation words during the training phase to automatically apply annotations to new images. The other is that the search problem is far more tolerant of erroneous inferences than is usually assumed necessary for object recognition in individual images. For this first develop a generative stochastic model for images and their accompanying captions for Hidden Markov Model. Training material for such a model, is readily obtained from naturally occurring image plus caption pairs. The vocabulary of the words used in the captions is used to annotate the images that may not have surrounding text. If surrounding text is available, this model is capable of using it as a prior, and computing posterior probabilities of the words in the caption given the visual evidence in the image. The parameters of this model estimates from a manually annotated (training) collection of image plus caption pairs, and then automatically annotate a collection of test images using this generic vocabulary of keywords.

II. RELATED WORK

(2)

2

semantics through levels of generalization, as well as being a natural choice for browsing applications. An additional advantage of our approach is that since it is a generative model, it implicitly contains processes for predicting image components, words and features from observed image components. The Story Picturing Engine [2] presents an unsupervised approach to automated story picturing. Semantic keywords are extracted from the story; an annotated image database is searched. Thereafter, a novel image ranking scheme automatically determines the importance of each image. Both lexical annotations and visual content play a role in determining the ranks. Annotations are processed using the Wordnet. A mutual reinforcement based rank is calculated for each image.

Hidden Markov Models (HMM)

Hidden Markov Models for Automatic Annotation [4] introduces a novel method for automatic annotation of images with keywords from a generic vocabulary of concepts or objects for the purpose of content-based image retrieval. An image, represented as sequence of feature vectors characterizing low-level visual features such as color, texture or oriented-edges, is modeled as being stochastically generated by a hidden Markov model, whose states represent concepts. The parameters of the model are estimated from a set of manually annotated (training) images. Each image in a large test collection is then automatically annotated with a posteriori probability of concepts present in it. Annotation of Images Using Spatial Hidden Markov Model is a 2D generalization of the traditional HMM in the sense that both vertical and horizontal transitions between hidden states are taken into consideration. The two most crucial problems with this approach are how to build a statistical model[5] for each concept class, and how to propagate annotations from keywords associated with some specific classes. In this approach a new spatial-HMM is used to describe the spatial relationships of objects and investigate the semantic structures of concepts in natural scene images.

Latent Semantic Indexing (LSI)

Latent semantic indexing (LSI)[7] is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to

extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts. When Latent Semantic Indexing (LSI) is used for Image retrieval, Singular Value Decomposition (SVD) is used to decompose the Image-Name by Feature matrix into three matrices: T: an Image_Name by dimension matrix, S: a singular value matrix (dimension by dimension), and D: a Feature by dimension matrix. After computing the SVD using the original term by document matrix, we calculate image-to-image similarities. LSI provides two natural methods for describing image-to-image similarity. First, the Image-to-Image matrix can be created using TkSk. .This matrix can be used to compare images using a vector distance measure, such as cosine similarity. In this case, the cosine similarity is computed for each pair of rows in the TkSk matrix. After the image similarities are created, we need to determine the order of co-occurrence for each pair of images. The order of occurrence is computed by tracing the co-occurrence paths.

Markovian Semantic Indexing (MSI)

The Markovian Semantic Indexing (MSI)[8], a new method for automatic annotation and annotation based image retrieval. The properties of MSI make it particularly suitable for ABIR tasks when the per image annotation data is limited. The characteristics of the method make it also particularly applicable in the context of online image retrieval systems. The proposed approach (MSI) is presented in together with a proximity measure (distance). In geometric interpretation and the optimal properties of the proposed distance are examined a statistical model with hybrid text/visual characteristics also based on the aspect model. The approach proposed here in, while stochastic in nature, raises the reasoning aspect of probabilities, as it defines explicit relevance relationships between keywords. Using network representations for capturing semantics is common in AI systems.

III. PROPOSED SYSTEM

In proposed system, Semantic annotated Markovian Semantic Indexing (SMSI) is introduced. In the framework of an image retrieval system where users search for images by submitting queries that are made of keywords. This system carries out two major tasks.

1.Automatic image annotation 2.Annotation based image retrieval

3.1 System Architecture

(3)

3

generate an annotated image database. Annotation based image retrieval phase gets a user query, and then finds similar terms for the query with the help of WordNet. Also discover the similarity between the query and images in annotated image database. Then find the similarity between matching images. The architecture for the proposed system is shown in Figure 3.1. The cosine similarity is computed for each pair of rows in the matrix. After the term similarities are created, we need to determine the order of co-occurrence for each pair of terms. The order of co-occurrence is computed by tracing the co-occurrence paths.

The collection of valuable user generated posts and their summary generation process is done by dividing the task into four different modules. These modules are implemented [10] in the order as they are given in the following section.

3.2 System Modules

System implementation consists of two phases. Image annotation phase and Image retrieval phase.

Figure 3.2 Image Annotation

In the Image Annotation phase a test image is given as input to the system. Low level features like color, texture and shape are extracted from the test image. This phase makes use of a training set consisting of 1000 images manually annotated for the experimental purpose. The above said features were extracted from each image in the training set and stored in the database. HMM classifier is used to annotate the image using existing categories. The process is repeated for large number of test images such that this phase can be used to automatically generate classified image sets which can be used for image retrieval.

3.2.2 Image Retrieval Phase

In the image retrieval phase, a text query is given as user input. Annotation based image retrieval is carried out. Search results are obtained from the web in the case that, they are not present in the annotated category obtained from the first phase. From the search results thus obtained, low level features based on color and texture is extracted. The image by feature matrix is thereby generated. The search results thus obtained are indexed using LSI and MSI. And the results are displayed accordingly.

Recall

Recall rate is defined as the ratio of number of relevant images retrieved to the total number of

relevant images in the collection

.

Recall = ((relevant images) & (retrieved images)) / Relevant images

Figure 4.1: Annotated Images from automatic annotation system. It shows the output of three categories of input images are (a) car (b) person (c)

birds.

Figure 4.3: Precision-Recall comparison graph

IV. IMPLEMENTATION DETAILS

(4)

4

SMSI image retrieval system.

4.1 Performance Matrices

The proposed method is compared with the Markovian semantic indexing and Latent Semantic Indexing. Comparison with MSI and SMSI in the application area of Annotation Based Image Retrieval with Precision versus Recall diagrams on ground truth databases reveal that the proposed approach achieves better retrieval scores.

Precision

The precision rate is defined as the ratio of the number of relevant images retrieved to the total number of images in the collection.

Precision = ((relevant images) & (retrieved images)) / retrieved images.

The experimental results for proposed system are plotted on graphs based on these formulas. The graphs shown in the figure 4.2 gives better analysis perspective on the Automatic annotated online image retrieval task.

The above graph in figure 4.3 compare the Precision-Recall parameter between LSI(Latent Semantic Indexing), MSI(Markovian Semantic Indexing) and SMSI (Semantic annotated Markovian Semantic Indexing). These measures are mathematically calculated by using formula. From the graph can see that, Precision-Recall of the system is reduced somewhat in LSI, MSI than the SMSI (Semantic annotated Markovian Semantic Indexing).So it is concluded that the Precision-Recall rate of SMSI is increased which will be the best one.

V. CONCLUSION AND FUTURE WORK

The present work proposes Semantic annotated Markovian Semantic Indexing (SMSI) a semantic image retrieval is done and its performance improved by incorporating an automatic annotation system. Automatic annotation of images in database has been done by using a proposed Hidden Markov model which uses the extracted features (color and texture) where all states represent the concepts. Semantic similarity based image retrieval can be done with the use of Natural language processing tool namely WordNet where conceptual similarity between natural language terms were done. Comparative analysis of proposed Semantic annotated Markovian Semantic Indexing (SMSI) has been done with the LSI annotated image retrieval and MSI annotated image retrieval. Experimental results of Precision versus Recall for the proposed system achieves better scores than LSI and MSI.

There is still a long way to go. The ultimate goal of image annotation is to process tens of thousands of

images of various concepts precisely and efficiently, not a single query case. Hence in the future, we will work on reinforcing the labels of images inside a large scale database. Moreover, we are interested to tackle the problem of how to annotate query images without any associated keywords.

5. Acknowledgements

An endeavor over a long period may be successful only with advice and guidance of many well wishers. I take this opportunity to express my gratitude to all who encouraged me to complete this work. I would like to express my deep sense of gratitude to my respected Principal for his inspiration and for creating an atmosphere in the college to do the work. I would like to thank Head of the department, Computer Science and Engineering .

REFERENCES

[1] K. Barnard and D. Forsyth, \Learning the semantics of words and pictures," in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2. IEEE, 2001, pp. 408{415.

[2] D. Joshi, J. Z.Wang, and J. Li, \The story picturing engine|a system for automatic text illustration," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), vol. 2, no. 1, pp. 68{89, 2006. [3] J. Li and J. Z. Wang, \Real-time computerized annotation of pictures," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 6, pp. 985{ 1002, 2008.

[4] A. Ghoshal, P. Ircing, and S. Khudanpur, \Hidden markov models for automatic annotation and content-based retrieval of images and video," in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2005, pp. 544{551. [5] F. Yu and H. H.-S. Ip, \Automatic semantic annotation of images using spatial hidden markov model," in Multimedia and Expo, 2006 IEEE International Conference on. IEEE, 2006, pp. 305{308.

[6] S. Wan and R. A. Angryk, \Measuring semantic similarity using wordnet-based context vectors," in Systems, Man and Cybernetics, 2007. ISIC. IEEE International Conference on. IEEE, 2007, pp. 908{913.

[7] M. W. Berry, S. T. Dumais, and G. W. O'Brien, \Using linear algebra for intelligent information retrieval," SIAM review, vol. 37, no. 4, pp. 573{595, 1995.

(5)

5

2002, pp. 168{175.

[10] S. Santini and R. Jain, \Similarity measures," Pattern analysis and machine intelligence, IEEE transactions on, vol. 21, no. 9, pp. 871{883, 1999. [11] L. Rabiner, \A tutorial on hidden markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257{286, 1989.

[12] F. Monay and D. Gatica-Perez, \On image auto-annotation with latent space models," in Proceedings of the eleventh ACM international conference on Multimedia.ACM, 2003, pp. 275{278.

[13] D. M. Blei, A. Y. Ng, and M. I. Jordan, \Latent dirichlet allocation," the Journal of machine Learning research, vol. 3, pp. 993{1022, 2003. [14] M. Wallace, Jawbone Java WordNet API, 2007. [Online]. Available: http:

//mfwallace.googlepages.com/jawbone.html

[15] Z. Zivkovic and B. Krose, \An em-like algorithm for color-histogram-based object tracking," in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Pro- ceedings of the 2004 IEEE Computer Society Conference on, vol. 1. IEEE, 2004, pp. I{798.

[16] J. Han and K.-K. Ma, \Fuzzy color histogram and its use in color image retrieval," Image Processing, IEEE Transactions on, vol. 11, no. 8, pp. 944{952, 2002. [17] C. Mohankumar, J. Madhavan, P. Chauhan, G. Upadhyay, D. Shastri, M. Ibrahim, S. Raj, N. Jose, N. K. Srivastava, A. K. Jaiswal et al., \Content based image retrieval using 2-d discrete wavelet with texture feature with di_erent classi_ers," International Journal of Computer Science and Network Security, vol. 9, no. 5, 2009. Department of Computer Science & Engineering MEA Engineering College

[18] H. S. Bhatt, S. Bharadwaj, R. Singh, and M. Vatsa, \On matching sketches with digital face images," in Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010, pp. 1{7.

[19] M. Inoue, \On the need for annotation-based image retrieval," in Proceedings of the workshop on information retrieval in context (IRiX), She_eld, UK, 2004, pp.

44{46.

AUTHORS BIOGRAPHY

Ismail P.K: He Completed UGfrom Mohandas College of Engineering and Technology, Thiruvananthapuram, India. He is Currently doing PG in computer Science and Engineering from Kathir College of Engineering, Coimbature, India. He is an active member of IEEE student branch.

Attended various International, National Conferences and Attended various work shop based on Image processing and Data mining and participated on national workshop on android application development. His research interests are Image processing, Data mining.

Ravi.D: He received the MCA degree a from the IGNOU, New Delhi in the year 2003, the M.Phil from the Bhrathidasan University, Trichy in the 2005. the M.E (CSE) in Anna University of Technology, Coimbatore in the year 2011. He has 11 years of