IMAGE RETRIEVAL BY INFORMATION FUSION OF MULTIMEDIA RESOURCES


Ms. Vaishali B. Khobragade, Mrs. Leena H. Patil, Mrs. Uma Patel

Abstract - Internet-based image search using text queries has received considerable attention. However, current state-of-the-art image retrieval approaches require training models for every new query, and are therefore unsuitable for real-world web search applications. We combine textual features, based on the occurrence of query terms in the database and image metadata, with visual histogram representations of images. In this paper, the different multimedia resources, i.e. textual and visual, are matched with the help of the local binary pattern; later, by applying the Apriori algorithm, relevant as well as irrelevant images can be retrieved. Furthermore, based on the demands of different applications and current technology, a few promising research directions are suggested.

Keywords - feature extraction and matching, text and visual fields, local binary pattern (LBP), Apriori algorithm.

I. INTRODUCTION

The retrieval method utilizes multimodal fusion of the data; multimodal information (textual and visual) is a recent trend in most relevant image retrieval research. It combines two different techniques to retrieve semantically related images: clustering and the Apriori algorithm. Semantic association rule mining is performed in an offline phase, where association rules are discovered between the text semantic clusters and the visual clusters of the images, to be used later at the online phase by combining a textual pre-filtering with an image re-ranking in multimedia data (image) retrieval. Multimodal here refers to the ability to represent, process and analyze two data modalities simultaneously: textual keywords and unstructured images.

The image retrieval process has three steps, and a well-selected combination of visual and textual techniques helps the developed Multimedia Information Retrieval System overcome the semantic gap for a given query: content representation of each modality, information fusion, and multimodal retrieval algorithms. The proposed method is a multimodal fusion method based on the Apriori algorithm, considered a late fusion, which uses the Apriori algorithm to explore the relations between text semantic clusters and image visual feature clusters in order to obtain the most relevant images.

Because of the different information sources present in a multimedia resource (video, image, audio and text), multimedia fusion has become a very interesting field of research in recent times for Information Retrieval (IR) and search in multimedia databases or on the Web.

Ms. Vaishali B. Khobragade, CSE Dept., P.I.E.T., Nagpur, India

Prof. L. H. Patil, CSE Dept., P.I.E.T., Nagpur, India; Prof. Uma Patel, CSE Dept., P.I.E.T., Nagpur, India

In the particular case of image retrieval, both textual and visual features are usually provided: annotations or metadata as textual information, and low-level features (color, texture, etc.) as visual information [1]. The idea behind multimedia fusion is to exploit the individual advantages of each mode, and to use the different sources as complementary information to accomplish a particular search task. In an image retrieval task, multimedia fusion helps in solving the semantic-gap problem while obtaining accurate results.

Currently, most Web-based image search engines rely purely on textual metadata. That produces a lot of garbage in the results, because users usually enter that metadata manually, which is inefficient, expensive, and may not capture every keyword that describes the image. On the other hand, Content-Based Image Retrieval (CBIR) systems can filter images based on their visual contents, such as colors, shapes, sizes or any other information that can be derived from the image itself, which may provide better indexing and return more accurate results [2]. At the same time, the visual feature contents extracted by the computer may differ from the image contents that people understand. This requires the translation of high-level user perceptions into low-level image features, and is the so-called "semantic gap" issue. This issue is the reason why CBIR systems are not widely used for retrieving Web images. A lot of effort has been made to bridge this gap using different techniques; among the major categories of state-of-the-art techniques for narrowing the semantic gap is fusing the retrieval results of multimodal features. Fusion for image retrieval (IR) is considered a novel area, with very few achievements in the early days of research.

In the Web medium, the representation of images can be naturally split into two or more independent modalities, such as visual features (color, shape, size) and textual features. Truly, we live in a multimodal world, and as humans we always take the benefit of each medium for sensory interpretation. We narrow the semantic-gap problem and enhance retrieval performance by fusing the two basic modalities of Web images, i.e. textual and visual features, for retrieving relevant images.


Most of the fusion techniques in image retrieval are based on combining monomodal (textual and visual) results following symmetric schemas [2].

These strategies combine decisions coming from text-based and visual-based systems by means of aggregation functions or classical combination algorithms, which do not take into account the different semantic level of each modality. At best, some of them just use weighted factors to assign different levels of confidence to each mode. Ours is an asymmetric multimedia fusion strategy, which exploits the complementarity of each mode. The schema consists of a textual pre-filtering step, which semantically reduces the collection for the visual retrieval, followed by a monomodal results-fusion phase. Finally, we will show how retrieval performance is improved, while the task is made scalable thanks to the significant reduction of the collection.
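The asymmetric schema above can be sketched roughly as follows. This is an illustrative sketch only: text_score and visual_score are hypothetical per-image scoring functions standing in for the monomodal TBIR and CBIR systems, and the weighted combination is a simple example of a fusion rule, not necessarily the exact one used in the system.

```python
def asymmetric_fusion(query, collection, text_score, visual_score, k=100, w=0.5):
    """Textual pre-filtering to the top-k images, then fuse text and visual
    scores on that reduced subset. text_score/visual_score are hypothetical
    callables returning higher values for better matches."""
    # Step 1: textual pre-filtering semantically reduces the collection.
    scored = sorted(collection, key=lambda img: text_score(query, img), reverse=True)
    reduced = scored[:k]
    # Step 2: visual retrieval runs only on the reduced subset, which keeps
    # the expensive visual comparison scalable.
    # Step 3: monomodal results fusion, with a weighted factor per modality.
    fused = [
        (img, w * text_score(query, img) + (1 - w) * visual_score(query, img))
        for img in reduced
    ]
    fused.sort(key=lambda pair: pair[1], reverse=True)
    return [img for img, _ in fused]
```

Note that an image discarded by the textual pre-filter can never be recovered by the visual step, which is exactly the asymmetry the schema relies on.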

II. LITERATURE SURVEY

The information retrieval community discovered the power of fusing multiple information sources for increasing retrieval performance [3]. Information fusion has the potential of improving retrieval performance by relying on the assumption that the heterogeneity of multiple information sources allows cross-correction of some of the errors, leading to better results [4]. Depending on the available information in a certain field, different levels of fusion may be defined.

In [12], late fusion is specified for combining different rankings; it is also referred to as rank aggregation or data fusion. In information retrieval, data fusion merges the retrieval results of multiple systems and aims at achieving a performance better than all systems involved in the process. Several algorithms for combining rankings are well known in the information retrieval community, such as linear combination of rankings, summing all similarity scores for each document, and voting algorithms inspired by the social sciences, among others. Simple algorithms based on set operations to merge ranking lists have been evaluated for image retrieval, using a text search engine and a content-based image retrieval system. In addition, it was shown that linear combinations of text and visual rankings may lead to better results than each individual system.
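Two of the classic late-fusion schemes named above, summing all similarity scores per document (often called CombSUM) and a weighted linear combination of text and visual scores, can be sketched as follows; the function names and score dictionaries are illustrative assumptions, not an interface from any of the cited systems.

```python
def comb_sum(rankings):
    """rankings: list of {doc_id: score} dicts, one per retrieval system.
    Sums the similarity scores of each document across all systems and
    returns the doc_ids ranked by the summed score."""
    fused = {}
    for ranking in rankings:
        for doc, score in ranking.items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=fused.get, reverse=True)

def linear_combination(text_scores, visual_scores, alpha=0.6):
    """Weighted linear fusion of a text ranking and a visual ranking.
    alpha weights the textual modality; missing documents score 0."""
    docs = set(text_scores) | set(visual_scores)
    fused = {
        d: alpha * text_scores.get(d, 0.0) + (1 - alpha) * visual_scores.get(d, 0.0)
        for d in docs
    }
    return sorted(fused, key=fused.get, reverse=True)
```

In practice the monomodal scores would first be normalized to a common range, since text and visual similarity scores usually live on very different scales.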

In [5], early fusion is specified, with the aim of building an integrated representation of multimodal data to take advantage of implicit relationships and to normalize the feature-vector representations of each modality. For image retrieval this approach has also been evaluated and extended using Latent Semantic Indexing [11]. An image is considered a document, with text data in a vector-space model and visual patterns represented by a bag of features. Both representations are projected together into a latent space in which the search for similar images is performed. Canonical Correlation Analysis (CCA) has also been proposed to find relationships between visual patterns and text descriptions in a Web image collection, identifying links between visual and text representations in order to solve cross-modal queries. More recently, the problem of early fusion has been reformulated as a subspace-learning problem that offers both dimensionality reduction and feature fusion [5]. The general problem of feature fusion is of great interest in multimedia processing for applications in classification and retrieval tasks.

In [6], a trans-media fusion method is stated. The main idea behind it is to first use one of the modalities (say, the image) to gather relevant documents (nearest neighbors from a visual point of view), and then use the dual modality (text representations of the visually nearest neighbors) to perform the final retrieval. Most proposed methods under this category are based on adapted relevance-feedback or pseudo-relevance-feedback techniques. Pseudo-relevance feedback gathers the N most relevant documents from the dataset using some visual similarity measure with respect to the visual features of the query or, reciprocally, using a purely textual similarity with respect to the textual features of the query, and then aggregates these mono-modal similarities.

In [7], image re-ranking is specified, which is another level of fusing the visual and textual modalities. In image re-ranking, the search is first performed based on the text query; then the returned list of images is reordered according to visual-feature similarity. In [8], a cross-reference re-ranking strategy is proposed for the refinement of the initial search results of text-based video search engines. While the method of [6] deals with the clusters of the modalities, [7] proposed a method that constructs a semantic relation between text (words) and visual clusters using the ARM algorithm.

In [8], relevance feedback is stated to address an important gap between the low-level features that search engines use and human perception (the semantic gap) [2]; TBIR (text-based image retrieval) systems can better capture queries with a great semantic meaning, and the conceptual meaning of the question, than CBIR (content-based image retrieval) systems [5]. RF is an on-line process undertaken to reduce the gap between high-level semantic concepts and low-level image features. It refers to feedback from a user on specific image results regarding their relevance to a target image; the refined query is re-evaluated in each iteration. Basically, the image retrieval process with relevance feedback comprises four steps:

1. Showing a number of retrieved images to the user

2. User indication of relevant and non-relevant images

3. Learning the user's needs by taking into account his/her feedback

4. Selecting a new set of images.

This procedure is repeated until a satisfactory result is reached. The CBIR system then succeeds better on this semantically reduced collection, avoiding false positives: images visually similar in terms of low-level visual features but with a different conceptual meaning. Furthermore, the cost of the CBIR process is significantly reduced, both in time and computation.
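The four-step relevance feedback loop above can be sketched as follows; retrieve, get_user_feedback and refine_query are hypothetical callables standing in for the search engine, the user interface and the query-update rule (e.g. something Rocchio-like), respectively.

```python
def relevance_feedback_loop(query, retrieve, get_user_feedback, refine_query,
                            max_iterations=5):
    """Iterate the relevance feedback cycle until the user is satisfied
    (no non-relevant images marked) or the iteration budget runs out."""
    for _ in range(max_iterations):
        results = retrieve(query)                            # 1. show retrieved images
        relevant, non_relevant = get_user_feedback(results)  # 2. user marks relevance
        if not non_relevant:                                 # satisfied: stop iterating
            return results
        query = refine_query(query, relevant, non_relevant)  # 3. learn the user's needs
        # 4. the next iteration retrieves a new set of images with the refined query
    return retrieve(query)
```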


Annotation with hierarchical categories [7] is similar to classification by keywords, but the fact that the keywords belong to a hierarchy enriches the annotations. For example, it can easily be found that a car or a bus is a subclass of the class land vehicle, while car and bus have a disjoint relationship.

III. PROPOSED WORK

A three-subsystem architecture consists of: TBIR (Text-Based Image Retrieval), CBIR (Content-Based Image Retrieval), and a Fusion subsystem. A simple data-flow architecture is shown in the figure below, the data-flow diagram of multimodality fusion.

I. Text-Based Information Retrieval (TBIR) Sub-System

TBIR is in charge of retrieving relevant images for a given query, taking into account the textual information available in the collection. Different steps are required in order to accomplish this task: information extraction, textual preprocessing, indexing and retrieval.

II. Content-Based Information Retrieval (CBIR) Sub-System

The CBIR sub-system is in charge of retrieving the relevant images based on their visual content. Its two main steps are feature extraction and the similarity module.

Data flow diagram for multimodality fusion

I. Local Binary Pattern (LBP)

Typically, a gray-scale image of a subject is initially segmented into a number of uniform, evenly distributed patches that cover the entire image [11]. LBP is then applied to each pixel of a patch, resulting in a histogram representing the feature characteristics for that particular patch, as shown in Figure 1. A feature vector is created by simply concatenating all of the histograms associated with each patch.

Figure 1.Feature extraction with LBP

Consider a pixel surrounded by eight neighbors. The matrix represents the normal pixel values in an image [12]; the values in the pattern matrix and the weights give the worth of each value in the pattern matrix. To extract the LBP code for the center pixel C, the difference between it and each of its neighbor pixels is thresholded to a binary value and weighted; the histogram of the resulting codes is then calculated and matched, and the feature is extracted, as shown in Figure 2.
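The per-pixel thresholding and weighting just described can be sketched as follows, assuming a grayscale image stored as a list of lists of intensities. The clockwise neighbor ordering and the bit weights 1, 2, 4, ..., 128 are one common LBP convention, not necessarily the exact one used in this system.

```python
def lbp_code(image, r, c):
    """LBP code for the pixel at (r, c): each of the 8 neighbors is
    thresholded against the center (1 if >= center, else 0) and the
    resulting bits are weighted by powers of two."""
    center = image[r][c]
    neighbors = [
        image[r-1][c-1], image[r-1][c], image[r-1][c+1],  # top row
        image[r][c+1],                                    # right
        image[r+1][c+1], image[r+1][c], image[r+1][c-1],  # bottom row
        image[r][c-1],                                    # left
    ]
    code = 0
    for bit, n in enumerate(neighbors):
        if n >= center:              # threshold against the center pixel
            code += 1 << bit         # weights 1, 2, 4, ..., 128
    return code

def lbp_histogram(patch):
    """256-bin histogram of LBP codes over the interior pixels of a patch;
    border pixels are skipped because they lack a full 8-neighborhood."""
    hist = [0] * 256
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            hist[lbp_code(patch, r, c)] += 1
    return hist
```

Concatenating lbp_histogram over all patches of an image yields the feature vector described above; two images can then be matched by comparing these vectors point to point.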

II. Apriori Algorithm

Association rules are used to identify relationships among a set of items in a database. These relationships are not based on inherent properties of the items themselves, but on the co-occurrence of the items. Frequent itemset and association rule mining have become a widely researched area, and hence many algorithms have been proposed to prune the search space [13]. All nonempty subsets of a frequent itemset must also be frequent.

ISSN: 2278 – 1323 All Rights Reserved © 2015 IJARCET

The Apriori algorithm works on two concepts:

a) Self-join - two itemsets having one and only one differing item can be joined.

b) Pruning - candidates containing non-frequent subsets are removed.
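The two concepts above can be sketched as a single candidate-generation step of the level-wise Apriori search; the tuple-based itemset representation is an illustrative choice.

```python
from itertools import combinations

def apriori_gen(frequent_itemsets, k):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets,
    given as a set of sorted tuples."""
    prev = set(frequent_itemsets)
    candidates = set()
    # Self-join: two (k-1)-itemsets differing in exactly one item produce
    # a k-item union.
    for a in prev:
        for b in prev:
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k:
                candidates.add(union)
    # Pruning: by the Apriori property, every (k-1)-subset of a frequent
    # k-itemset must itself be frequent.
    return {
        c for c in candidates
        if all(sub in prev for sub in combinations(c, k - 1))
    }
```

A full Apriori run alternates this generation step with a pass over the database counting each candidate's support and keeping only those above the minimum-support threshold.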

RECALL is the ratio of the number of relevant records retrieved to the total number of relevant records in the database. It is usually expressed as a percentage.

PRECISION is the ratio of the number of relevant records retrieved to the total number of records retrieved, relevant and irrelevant. It is usually expressed as a percentage.
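For a single query, the two measures above can be computed directly from the sets of retrieved and relevant record identifiers:

```python
def recall(retrieved, relevant):
    """Relevant records retrieved / all relevant records in the database,
    as a percentage."""
    return 100.0 * len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Relevant records retrieved / all records retrieved, as a percentage."""
    return 100.0 * len(retrieved & relevant) / len(retrieved)
```

For example, retrieving records {1, 2, 3, 4, 5} when {2, 4, 6, 8} are relevant gives a recall of 50% and a precision of 40%.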

IV. EXPERIMENTAL RESULTS

In the first module, to demonstrate multimodality fusion, textual and visual data are fed into the system: a visual image is selected from the data-set collection through the browse button, and the corresponding textual data is entered into the system. After searching, the relevant image based on the textual entry and the visual query image are presented to the user for comparison. The comparison is done by local binary pattern (LBP) point-to-point matching: the image is divided into an equal number of patches, each patch is divided into 3x3 matrices, and a binary graph is prepared by thresholding each neighbouring pixel against the intensity of the centre pixel. The images from the textual and visual fields are then either matched or unmatched.


Figure 1.1 : Apriori module

Figure 1.2 : relevant output

Thus the user gets the most relevant output as well as irrelevant output; since obtaining exactly relevant results while searching over the Web is difficult, we work with approximation, i.e. near-about values related to the input values. Even after performing point-to-point matching through LBP, the user either gets a 100% relevant output image or no image in the database matches the input image. Thus there are two conditions, without Apriori and with Apriori: without Apriori, a single output is returned to the user, either a 100% relevant one or no match; with Apriori, relevant as well as irrelevant data can be captured. The calculations are based on Figure 1.1: for the query "duck water stone", there are 3 images that have all 3 contents present. Figure 1.3 states the most relevant one; thus the recall and precision values, which are inversely proportional to one another, are calculated.

Table 1. Results of the referred paper [8]

Category of image | Mode             | Recall | Precision
elephants         | Local histogram  | 0.22   | 0.04
horses            | Variance segment | 0.208  | 0.416
waterdiving       | Product algo     | 0.6280 | 0.4530


Table 2. Apriori algorithm results

Category of images | Recall  | Precision | Time (s)
duck               | 0.8     | 0.667     | 3.2
                   | 0.73333 | 0.6       | 2.1
                   | 0.3333  | 1         | 2.7
                   | 0.2     | 0.75      | 3.3
deer               | 1       | 0.6       | 1.65
                   | 1       | 0.5       | 1.04
Improvement        | 12.90%  | 14.88%    | less time taken

The above table represents the comparative analysis between the referred paper [8] and our work. After comparing several methods, Apriori gives a 12.90% improvement in recall and a 14.88% improvement in precision compared with the referred-paper values presented in Table 1. The graph in Figure 1.3 gives a graphical representation of the fusion data based on the Apriori algorithm.

Figure 1.3 : Graphical representation of fusion data based on apriori algorithm.

In the graph above, the two lines represent the performance of the search systems; the general inverse relationship between recall and precision remains. As recall increases, precision suffers: because synonyms may not be exact synonyms, the probability of retrieving irrelevant material increases. Records must be considered either relevant or irrelevant when calculating precision and recall, yet obviously records can exist which are marginally relevant or somewhat irrelevant, while others are very relevant and others completely irrelevant. This problem is complicated by individual perception: what is relevant to one person may not be relevant to another. Knowing the goal of the search - to find everything on a topic, just a few relevant images, or something in between - determines the strategies the searcher will use.

V. CONCLUSION


VI. ACKNOWLEDGMENT

Prof. L. H. Patil and Prof. Uma Patel have guided me along the way toward the completion of this paper and the project. With their support and expert advice on the subject matter, I was able to complete this paper successfully. I thank them for their valuable input and time.

REFERENCES

[1] J. A. Aslam and M. Montague, "Models for metasearch," in Proc. 24th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, New Orleans, LA, USA, 2001, pp. 276–284.

[2] G. Csurka, J. Ah-Pine, and S. Clinchant, "Unsupervised visual and textual information fusion in multimedia retrieval - a graph-based point of view," arXiv:1401.6891v1 [cs.IR], Jan. 2014.

[3] N. Rasiwasia, J. Costa Pereira, E. Coviello, G. Doyle, G. R. G. Lanckriet, R. Levy, and N. Vasconcelos, "A new approach to cross-modal multimedia retrieval," in Proc. ACM Multimedia, 2010.

[4] Wu Siqing, Xiong Gang, Wang Geng, and Zhang Guoping, "Digital-image retrieval based on shape-feature and MPEG-7 standard," in Proc. Int. Conf. Consumer Electronics, 2011.

[5] A. Tazaree, A.-M. Eftekhar-Moghadam, and S. Sajjadi-Ghaem-Maghami, "A semantic image classifier based on hierarchical fuzzy association rule mining," Springer Science+Business Media, LLC, 2012.

[6] S. Clinchant, G. Csurka, and J. Ah-Pine, "Semantic combination of textual and visual information in multimedia retrieval," in Proc. 1st ACM Int. Conf. Multimedia Retrieval, New York, NY, USA, 2011.

[7] G. Csurka, S. Clinchant, and A. Popescu, "XRCE and CEA LIST's participation at Wikipedia retrieval of ImageCLEF 2011," in CLEF 2011 Working Notes, V. Petras, P. Forner, and P. Clough, Eds., Amsterdam, The Netherlands, Sep. 2011.

[8] X. Benavent, A. Garcia-Serrano, R. Granados, J. Benavent, and E. de Ves, "Multimedia information retrieval based on late semantic fusion approaches: experiments on a Wikipedia image collection," IEEE Trans. Multimedia, vol. 15, no. 8, Dec. 2013.

[9] T. Ahonen, A. Hadid, and M. Pietikäinen, "Face description with local binary patterns: application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, 2006.

[10] J. Cao and C. Tong, "Facial expression recognition based on LBP-EHMM," in Proc. Congr. Image Signal Process., 2008.

[11] E. A. Fox and J. A. Shaw, "Combination of multiple searches," in Proc. 2nd Text Retrieval Conf., 1993.

[12] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," SIGMOD Rec., vol. 26, no. 2, pp. 255–264, 1997.

[13] N. Badal and S. Tripathi, "Frequent data itemset mining using VS_Apriori algorithms," Int. J. Computer Science and Engineering, vol. 02, no. 04, pp. 1111–1118, 2010.

[14] J. Han and M. Kamber, Data Mining: Concepts and Techniques.
