International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013)
640
An Adaptive Fuzzy Logic Based Technique To Read User’s
Mind for Image Annotation
SHYNI.G
1, SHANMUGAPRIYA.K
21
PG Student (Final M.E Applied Electronics), Lord Jegannath College of Engineering and Technology, Ramanathichanputhur
2Assistant Professor of ECE Department, Lord Jegannath College of Engineering and Technology, Ramanathichanputhur
Abstract - Nowadays, Eye trackers are used in research on the visual system. Eye tracking is the process of measuring either the point of gaze or the motion of an eye relative to the head. In digital imaging technology, image annotation is an important and promising methodology for image retrieval. This image annotation can be used for indexing and understanding of large collections of image data. In this paper, fuzzy logic based genetic algorithm is used to provide high accuracy in image annotation. It gives the possible solutions for image annotation and retrieval by implicitly monitoring user attention via eye-tracking. We show in this paper that, the existing information in gaze patterns can be used to improve the machine’s judgment of image content.
Keywords- Gaze tracking, human computer interaction, image annotation, image retrieval, visual attention.
I. INTRODUCTION
Eye movement research and eye tracking flourished in the 1970s, with great advances in both eye tracking technology and psychological theory to link eye tracking data to cognitive processes. Eye tracking is a technique used in variety of areas, including neuroscience, psychology, cognitive science, human computer interaction, etc[2]. The most common form of eye tracking today is the desk-mounted tracker. These systems typically reside underneath the user's PC and track the user’s point of gaze on a standard monitor. They operate in real time and typically report the users gazepoint in screen coordinates. In this paper we discuss the use of eye-trackers for image annotation in the field of multimedia and vision to classify images as a function of the target template that guides user visual attention implicitly and accurately.
In general, an image annotation task consists to assign a set of semantic tags or labels to a novel mage based on some models learned from certain training data. Conventional image annotation approaches often attempt to detect semantic concepts with a collection of human-labeled training images [8]. Recently, with the prevalence of Web 2.0 applications and social games, more and more users contribute numerous tags to Web images, such as Flickr [9], ESP games [7], etc.
These tags provide the meaningful descriptors of images, which are especially important for those images containing little or no textual context. The success of Flickr proves that users are willing to provide this semantic context through manual annotations. Recent user studies on this topic reveal that users do annotate their photos with the motivation to make them better accessible to the general public[1].
Automated image annotation has been an active and challenging research topic in computer vision and pattern recognition for years. Automated image annotation is essential to make huge unlabeled digital photos index able by existing text based indexing and search solutions. In general, an image annotation task consists to assign a set of semantic tags or labels to a novel image based on some models learned from certain training data. Due to the long-standing challenge of object recognition, such approaches, though working reasonably well for small-sized test beds, often perform poorly on large dataset in real-world. Besides, it is often expensive and timeconsuming to collect the trainingdata.If the annotation is automatic [5], the machines conduct the process by inspection of the low
-level features of the images[4], classification of them and optimization of the classification results. However, regardless of how well the classification is performed, the semantic gap [6] remains a problem.
Kozmaet.al. [3] introduced GaZIR which is a gaze-based interface for image browsing and search. They used logistic regression classificationalgorithm.The system computes on-line predictions of relevance of images based on implicit feedback, and when the user zooms in, the images predicted to be the most relevant are brought out. Consequently many researchers have started to investigate semi-automatic image annotation and retrieval by the help of the implicit feedback acquired from eye-trackers..
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013)
641
We use the UILs to annotate images that the system has no information about by having some key information about the concept in their mind or by looking at the cluster of the images with high values of UIL.
II. METHODOLOGY
In this approach a new technique has been implemented. We use genetic algorithm for optimizing fuzzy rules. While optimizing fuzzy rules we can get more accurate image annotation than previous methods.
Figure: 1 Flow Diagram of System
2.1 Gaze Inference System
The location of face and eye should be known for trackingeye movements. We assume that this location informationhas already been obtained through extent techniques. Exact eye movements can be measured by special techniques.
Figure: 2 Face and Gaze detection
The camera with infra-red filters and the MATLAB software package were used as the eye-tracking technology. In this framework, video images are captured by camera and stamped by their corresponding capture time and sent to the processing software. Next eye positions are identified in every single image and two glints which are the brightest spots in the image are located in them. This glint is the reflection of an infra-red source by the eyes which is positioned between the cameras. By comparing the position of the glints to the position of the pupils in each video image, the software can estimate the direction vector of the user’s gaze.
2.2 Feature Extraction Unit
When the user finished exploring the page and requested a new page the data of every page of the scenario are sent to the FEU. The inputs of this FEU unit are three vectors of fixation coordinates, fixation duration and ID and spatial properties of the images on the screen. In this unit, two different feature vectors are extracted which are called the transition feature vector (TFV) and the image feature vector (IFV).
First, the IFV contains the features show how the user paid attention to an individual image. The length of feature variable of every feature in this FV is equal to the number of images that appeared on the screen. Second, the TFV contains the features show the properties of transition of gaze from one image to another. The final output of the system is the average of these two T-UIL and I-UIL values. We have developed two TSK-FIS, there are two sets of data pairs with IFV and TFV as inputs. If they belong to a TC+ image or a gaze transition that involves a TC+ image, then their corresponding CV is 1 otherwise it is zero.
Let { }be N data pairs for developing GIS’s TSK-FIS.
Video preprocessing
GIS
Feature Extraction
Genetic Algorithm
Fuzzy logic image annotation
T-FV I-FV
Average
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013)
642
For data pairs with IFV as input:
N=5x Number of Images on Each Page.
For data pairs with TFV as input:
N=Total number of times that the user moved from one image to another image.
The output of each rule of the TSK-FIS can be shown as
⁄ ̅ ⁄
Where shows the coefficient matrix
of the output membership function of the cth rule.
The final output of the TSK-FIS can be shown as follows:
[ ̅ ̅ ⁄ ̅ ̅ ⁄ ] [ ]
2.3 Feature Evaluation
[image:3.612.337.541.143.331.2]The effect of our features are calculated by the quality of the output of our framework and we tested the training data of every user by calculating the mutual information between the samples of every feature and their corresponding class variable, where the class variable shows the real state of an image with regards to the TC (e.g., if the TC is tiger and the image contains “tiger” as a concept, then its class variable is 1; otherwise, it is 0). This value reveals the reliability of the stored data in the extracted feature for classification.
Figure: 3 Framework performance
Figure: 4 accuracy measurement
Figure shows the reliable framework performance of the system. By implementing the genetic algorithm we can able to get high accurate image annotation than the previous methods and the execution time also reduced.
2.4 Genetic Algorithm
Genetic algorithm (GA) is now a very popular tool for solving optimization problems. It has been used to effectively find optimal solutions for a variety of problems (e.g. operations research, hybrid techniques, image processing, etc.). Genetic algorithms are based on the mechanics of natural selection and natural genetics. Genetic algorithms have been employed for generating and/or adjusting membership functions of fuzzy sets. Karr adjusted fuzzy membership functions and Nomura et al determined. Fuzzy partition of input spaces by genetic algorithms. In this paper, the authors use the genetic algorithm approach in fuzzy rules design. The genetic optimization replaces the tedious process of trial and error for better combination of fuzzy rules. Representative data are used from real world or reliable resources for genetic optimization. Studies are carried out based on fuzzy models.
Genetic optimization algorithm starts with static variable initialization. All the static and dynamic variables are initialized in the GOL constructor. The GOL uses bit-wise interpretation, which means, the length of a particular allele is expressed in terms of bit. If an allele carries a possible value from 0 to 7 (or 8 possible features), the length of the allele is then three. Calculations are then being carried out for some other static variables, such as the length of a chromosome. Initial individuals or chromosome of the members in a population have to be defined.
1 2 3 4
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Images E x e c u ti o n T im e Performance Existing Proposed
1 2 3 4
[image:3.612.64.267.474.661.2]International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013)
643
[image:4.612.74.248.299.443.2]They can be generated randomly, supplied by the user or partially generated randomly with partial user input. Fitness of initial chromosomes is calculated with the overridden fitness calculation function, including the penalty factor calculation. The Reproduction’s main function is to generate the new generation from the present generation. It consists of selection, mutation and crossover operations. With analogy to the biological world, stronger individual stands higher chance in natural selection for breeding. Thus, individual with better fitness will stand higher chance to be selected in the selection operation. Stochastic sampling with replacement method is used. Crossover and mutation may happen while breeding.
Figure: 5 Example UIL calculation of six images
The success of crossover and mutation process is depending on the pre-defined mutation probability and crossover probability, respectively. The breeding of new generation is continuing until the population size reaches its limit. All the new generation will replace the present generation in the generation change process. Then the global population information is updated. The whole processes are repeated for the desired number of generation.
Figure 5 shows the UIL of the particular user for the various six images. While optimizing the Fuzzy rules we can be able to get more accurate feature extraction.From these images we shall find the user interest level using the distance values. If the values of distance are 0 the user interest level is high or else user interest level is low.
III. CONCLUSION
In this paper, the solution for image annotation and retrieval by monitoring user attention via eye-tracking is proposed. Here, the features are extracted from the gaze trajectory of users to provide implicit information.
By applying the genetic algorithm we can be able to optimize Fuzzy rules so that we can process the gaze data more accurately than the previous methods. We use more number of features using genetic algorithm, so that the accuracy is increased. This algorithm is able to be trained for every user and to produce better results with high accuracy.
REFERENCES
[1 ] M. Ames and M. Naaman,” Why we tag: motivations for annotation in mobile and online media“ In Proc. of CHI ’07, pages 971–980, SanJose, California, USA, 2007.
[2 ] Andrew T. Duchowski. “Eye tracking methodology” Springer, 2003. [3 ] L. Kozma, A. Klami, and S. Kaski, “Gazir: Gaze-based zooming interface for image retrieval” in Proc. 2009 Int. Conf. Multimodal Interfaces, New York, 2009, pp. 305–312, ser. ICMI-MLMI’09, ACM.
[4 ] X. S. Zhou and T. S. Huang, “Relevance feedback in image retrieval: A Comprehensive review” Multimedia Syst., vol. 8, pp. 536–544, 2003.
[5 ] R. Shi, H. Feng, T.-S. Chua and C.-H Lee, “An adaptive image content representation and segmentation approach to automatic image annotation in Image and Video Retrieval” ser. Lecture Notes in Computer Science. Berlin, Germany: Springer, 2004, vol. 3115, pp. 1951–1951.
[6 ] H.Ma, J. Zhu, M.-T. Lyu, and I. King, “Bridging the semantic gap between image contents and tags” IEEE Trans. Multimedia, vol.12,no.5, pp.462–473,2010.
[7 ] ESP.http://www.espgame.org.
[8 ] J. Fan, Y. Gao, H. Luo, and R. Jain, “Mining multilevel image semantics via hierarchical classification” IEEE Transactions on Multimedia, 10 (2):167–187, 2008.
[9 ] Flicker.http://www.fickr.com.
[10 ]A. L. Yarbus, “Eye Movements and Vision” New York: Plenum, 1967, translated from Russian by Basil Haig,, Original Russian edition published in Moscow in 1965.
[11 ]“Gazir: Gaze-based Zooming Interface for Image Retrieval” by laszlokozma, Artokl, Samuelkaski, Department of Information and Computer science, Helsinki University of Technology Finland. [12 ]“Learning To Rank Images From Eye Movements” by Kituchart,
Pushpa, Craig J. Saunders, SandorSzedmak, School of Electronics & Computer Science University of Southampton SO17 1BJ, United Kingdom.
[13 ]“Implicit Emotional Tagging Of Multimedia Using EEG Signals And Brain Computer Interface” by AshkanYazdani, Jong-Seok Lee Touradj Ebrahimi, Institute of Electrical Engineering, Ecole Poly technique Federale de Lausanne (EPFL), EPFL/STI/IEL/GR-EB Station 11, CH-1015 Lausanne, Switzerland.
[14 ]“Optimization Of Fuzzy Rules Design Using Genetic Algorithm” by S.V. Wong, A.M.S. Hamouda, Department of Mechanical and Manufacturing Engineering, University Putra Malaysia, 43400 Serdang, Malaysia.
International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 3, March 2013)
644
[16 ]“Comparison Of Fitness Scaling Functions In Genetic AlgorithmsWith Applications To Optical Processing” by Farzad A. Sadjadi School of Physics and Astronomy, University of Minnesota, Twin Cities.
[17 ]“Application Of Genetic Algorithms In Fuzzy Rule Generation” Written by Masoud Makrehchi.