STATE-OF-THE-ART

CHAPTER 03 A FRAMEWORK FOR IMAGE ANNOTATION ENHANCEMENT & REFINING USING

3.2 STATE-OF-THE-ART

Most existing automatic image annotation algorithms fall into two categories. The first formulates automatic image annotation as a classification problem, treating each keyword (concept) as a distinct class of the classifier. Examples include SVM classifiers [Gao et al. 2006, Cusano et al. 2004 and Yang et al. 2006], the Gaussian Mixture Hierarchical Model [Carneiro et al. 2005a], [Carneiro et al. 2005b], Bayes Point Machines [Chang et al. 2003], the 2-dimensional Multi-resolution Hidden Markov Model [Li et al. 2003] and so on.

The second category covers the many statistical models that have been published for image annotation. [Mori et al. 1999] used a co-occurrence model, which estimates the probability of a word by counting the co-occurrences of words with image objects. [Wei-Chao Lin et al. 2010] used the Information Gain (IG) and AdaBoost learning algorithms to filter noise and outliers during the training stage, with the aim of improving image classification performance. [Duygulu et al. 2002] strove to map keywords to individual image objects. They treated keywords as one language and blob-tokens as another, so that the image annotation problem can be viewed as translation between two languages. Using classic machine translation models, they annotated a test set of images based on a large number of annotated training images.

Building on the translation model, [Pan et al. 2004] put forward various methods to discover correlations between image features and keywords. They applied correlation and cosine methods and introduced SVD as well, but the work is still based on a translation model, with the assumption that all features are equally important, and no knowledgebase (KB) is used. The problem with the translation model is that frequent keywords are associated with too many different image segments, while infrequent keywords have little chance of appearing in the annotation. To address this problem, [F. Kang et al. 2004] suggested two modified translation models for automatic image annotation and achieved better results.

[Jeon et al. 2003] introduced cross media relevance models (CMRM), in which the joint distribution of blobs and words is learned from a training set of annotated images. Unlike the translation model, CMRM assumes many-to-many correlations between keywords and blob tokens rather than one-to-one correlations, and therefore takes contextual information into account. Furthermore, [Lavrenko et al. 2004] proposed a continuous relevance model that divides an image into a fixed number of grid regions, avoiding the segmentation and clustering issues observed in previous models. [Guangyu Zhu et al. 2010] decomposed the user-provided tag matrix into a low-rank refined matrix and a sparse error matrix, targeting an optimality measure that combines low rank, content consistency, tag correlation and error sparsity. However, in all of this work the annotations contain many noisy keywords, and no attempt is made to push past this “limit” of the automatic image annotation problem.
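The co-occurrence counting attributed above to [Mori et al. 1999] can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: image regions are assumed to be already clustered into discrete blob identifiers, and the blob names, keywords and function name are hypothetical toy values.

```python
from collections import Counter, defaultdict


def cooccurrence_probabilities(training_images):
    """Estimate P(word | blob) by counting how often each word appears
    in the annotation of an image that contains the blob."""
    pair_counts = defaultdict(Counter)  # blob -> Counter of co-occurring words
    for blobs, words in training_images:
        for blob in blobs:
            for word in words:
                pair_counts[blob][word] += 1

    probabilities = {}
    for blob, counts in pair_counts.items():
        total = sum(counts.values())
        probabilities[blob] = {word: n / total for word, n in counts.items()}
    return probabilities


# Toy training set (hypothetical): each image is (blob ids, annotation words).
training = [
    (["b1", "b2"], ["sky", "water"]),
    (["b1"], ["sky"]),
    (["b2", "b3"], ["water", "mountain"]),
]
probs = cooccurrence_probabilities(training)
```

In this toy data, every word of an image is credited to every blob of that image, so a keyword that occurs in many annotations gains probability mass across many blobs. This is the same bias toward frequent keywords that the text describes for translation-style models.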

[Amjad et al. 2009] put forward a framework for video annotation enhancement and validation using WordNet and ConceptNet. They enhance the existing annotation by attaching a synonym set to each term and then validate each term using the ConceptNet “capableOf”, “usedFor” and “locationAt” relations. The main limitation of this approach is that it does not deal with the noisy keywords generated during the annotation process. To improve annotation, [Barrat et al. 2010] proposed a probabilistic graphical model to represent weakly annotated images, with which they classify images and extend existing annotations to new images by considering the semantic relations between keywords. [Yohan et al. 2005] introduced an approach that uses a semantic similarity measure among annotated keywords. They detect irrelevant keywords among the candidate annotation keywords by applying an evidence combination rule to semantic similarity in WordNet, using the Translational Model based Hybrid Dempster (TMHD) model. For instance, if an image has been annotated with 'sky', 'water', 'mountain' and 'door' by the TM model, the TMHD model computes the semantic similarity of one word (which [Yohan et al. 2005] called 'semantic dominance') over all other candidate words (e.g., 'sky' against the other keywords 'water', 'mountain' and 'door'). The TMHD model combines the semantic dominance scores from three different semantic similarity measurements (JNC, LIN, BNP) and keeps only the strong candidate annotation keywords whose scores are above the threshold. This approach reduces annotation diversity and hence lowers retrieval performance.
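The 'sky', 'water', 'mountain', 'door' example can be made concrete with a short sketch. Several parts are assumptions made for illustration only: the similarity values are invented toy numbers rather than WordNet scores, dominance is taken as the average similarity to the other candidates, and the measures are combined by a plain mean rather than by the Dempster-style evidence combination used in TMHD.

```python
def semantic_dominance(candidates, similarity):
    """Average similarity of each candidate keyword to all other candidates."""
    scores = {}
    for word in candidates:
        others = [w for w in candidates if w != word]
        scores[word] = sum(similarity(word, o) for o in others) / len(others)
    return scores


def refine(candidates, measures, threshold):
    """Keep keywords whose mean dominance over all measures meets the threshold.

    Assumption: a plain mean replaces TMHD's evidence combination.
    """
    if len(candidates) < 2:
        return list(candidates)  # nothing to compare against
    per_measure = [semantic_dominance(candidates, m) for m in measures]
    combined = {
        w: sum(s[w] for s in per_measure) / len(per_measure) for w in candidates
    }
    return [w for w in candidates if combined[w] >= threshold]


# Hypothetical symmetric similarity table standing in for one WordNet measure.
TOY_SIM = {
    frozenset(["sky", "water"]): 0.6,
    frozenset(["sky", "mountain"]): 0.5,
    frozenset(["water", "mountain"]): 0.7,
    frozenset(["sky", "door"]): 0.1,
    frozenset(["water", "door"]): 0.1,
    frozenset(["mountain", "door"]): 0.2,
}


def toy_similarity(a, b):
    return TOY_SIM[frozenset([a, b])]


kept = refine(["sky", "water", "mountain", "door"], [toy_similarity], threshold=0.3)
```

With these toy values 'door' is weakly related to every other candidate and falls below the threshold, while the three outdoor terms are retained. A real setup would pass three similarity functions in `measures`, one per measurement.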

To overcome the shortcomings of [Amjad et al. 2009, Barrat et al. 2010 and Yohan et al. 2005], we propose a new framework for annotation enhancement and refinement that expands the annotation lexically and commonsensically by utilizing well-known knowledgebases. The framework takes annotated datasets (generated either manually or by automatic means) and first performs data filtration on them, which includes redundancy control, stopword removal and unification of the different forms of words. Next, the terms are expanded lexically and commonsensically via the well-known knowledgebases WordNet and ConceptNet. This process generates a set of keywords in which some terms are related to the original keyword while several are irrelevant and need to be removed. To remove the irrelevant keywords, we apply a semantic similarity threshold between the original keyword and each generated keyword using WordNet: terms whose similarity is equal to or above the threshold are retained in the list, while the others are discarded. The output of the framework is an XML document for each image, based on the [LabelMe] annotation structure, which can be used for further processing and portability. The framework is designed to be flexible, so that it can not only be plugged easily into any image corpus but can also be integrated with other knowledgebases or domain ontologies. Moreover, the latest releases of WordNet and ConceptNet can be accommodated simply by updating their APIs.
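The stages of the proposed framework (filtration, expansion, threshold-based refinement, XML output) can be sketched end to end as follows. This is a minimal sketch under explicit assumptions and not the implementation described in this chapter: the small dictionaries are hypothetical stand-ins for WordNet and ConceptNet lookups and for a WordNet similarity score, and the XML layout is a simplified guess at a LabelMe-style structure that keeps only the file name and object names.

```python
import xml.etree.ElementTree as ET

# All tables below are hypothetical stand-ins, not real knowledgebase output.
STOPWORDS = {"a", "an", "the", "of", "in", "on"}
WORD_FORMS = {"trees": "tree", "cars": "car"}  # stand-in for word-form unification
EXPANSIONS = {  # stand-in for WordNet/ConceptNet expansion
    "car": ["automobile", "road", "banana"],
    "tree": ["forest", "leaf"],
}
SIMILARITY = {  # stand-in for a WordNet semantic similarity score
    ("car", "automobile"): 1.0, ("car", "road"): 0.6, ("car", "banana"): 0.1,
    ("tree", "forest"): 0.7, ("tree", "leaf"): 0.5,
}


def filter_keywords(raw):
    """Lowercase, unify word forms, drop stopwords and duplicates (order kept)."""
    seen, cleaned = set(), []
    for term in raw:
        term = term.strip().lower()
        term = WORD_FORMS.get(term, term)
        if term and term not in STOPWORDS and term not in seen:
            seen.add(term)
            cleaned.append(term)
    return cleaned


def expand_and_refine(keywords, threshold):
    """Add expansion terms whose similarity to their source keyword is at or
    above the threshold; discard the rest."""
    result = list(keywords)
    for keyword in keywords:
        for candidate in EXPANSIONS.get(keyword, []):
            score = SIMILARITY.get((keyword, candidate), 0.0)
            if score >= threshold and candidate not in result:
                result.append(candidate)
    return result


def to_xml(filename, keywords):
    """Serialize keywords as a simplified LabelMe-style annotation (assumed layout)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    for keyword in keywords:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = keyword
    return ET.tostring(root, encoding="unicode")


cleaned = filter_keywords(["Car", "the", "cars", "Trees"])
refined = expand_and_refine(cleaned, threshold=0.5)
xml_doc = to_xml("img_001.jpg", refined)
```

Because each stage depends only on lookup tables passed through module-level names, swapping the toy tables for calls to a real WordNet or ConceptNet client would leave the pipeline structure unchanged, which mirrors the flexibility claimed for the framework.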

In document Semantic multimedia modelling & interpretation for annotation (Page 102-104)
