A REVIEW ON OBJECT RECOGNITION
TECHNIQUES FOR PRACTICAL APPLICATION
Sushree Dhyanimudra Nayak
1, Siddharth Swarup Rautaray
21, 2 School of Computer Engineering, KIIT University, Bhubaneswar(India)
ABSTRACT
Object recognition is an important part of computer vision because it is closely related to the success of many computer vision applications. A number of object recognition algorithms and systems have been proposed for a long time in order to address this problem. Yet, there still lacks a general and comprehensive solution. Recently, model-based object recognition approach, feature based object recognition approach and appearance-based object recognition approach have demonstrated good performance on a variety of object recognition problems.
In this paper, the most representative papers of each approach are first reviewed. After that, some possible extensions to the recently proposed view-based approach and feature-based approach and model-based approach are proposed and discussed in the paper.
Index Terms: Object Recognition, Model Based Object Recognition Approach, View Based Object Recognition Approach, Feature Based Object Recognition.
I.
INTRODUCTION
Object recognition - in computer vision is the task of finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, despite the fact that the image of the objects may vary somewhat in different viewpoints, in many different sizes / scale or even when they are translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades. It is an important part of computer vision because it is closely related to the success of many computer vision applications such as robotics, surveillance, registration and manipulation etc. A number of object recognition algorithms and systems have been proposed for a long time toward this problem. Yet, a general and comprehensive solution to this problem has not be made.
In model-based object recognition, a 3D model of the object being recognized is available. The 3D model contains detailed information about the object, including the shape of its structure, the spatial relationship between its parts and its appearance. This 3D model provides prior knowledge to the problem being solved.
There are two common ways to approach this problem. The first approach involves obtaining 3D information of an object from images and then comparing it with the object models. To obtain 3D information, specialized hardware, such as stereo vision camera, is required to provide the 3D information in some forms. The second approach requires less hardware support but is more difficult. It first obtains the 2D representation of the structure of the object and then compares it with the 2D projections of the generative model.
In view-based object recognition, 3D model of the object is not available. The only known information is a number of representations of the same object viewed at different angles and distances. The representations of the
99 | P a g e object can be obtained by taking a series of images of the same object in a panorama fashion. Most of these operate by comparing a 2D, image representation of object appearance against many representations stored in a memory and finding the closest match. Matching of this type of recognition is simpler but the space requirements for representing all the views of the object is large. Again, there are many ways to approach this problem. One of the common way is to extract salient information, such as corner points, edges and region etc, from the image and match to the information obtained from the image database. Another common approach extracts translation, rotation and scale invariant features, such as SIFT, GLOH and RIFT, from each image and compares them to the features in the feature database.
A search is used to find feasible matches between object features and image features. The primary constraint is that a single position of the object must account for all of the feasible matches. Methods extract features from the objects to be recognized and the images to be searched.
surface patches
corners
linear edge
The remainder of the paper is organized as follows. In section Ⅱ, a review of various techniques based on model based approach. A review of various techniques based on view based approach in Section Ⅲ. After that, a review of various techniques based on feature based approach in section Ⅳ, and we draw our conclusion in the last section.
II. MODEL-BASED OBJECT RECOGNITION
Generative models are the commonly used models in modeling object for matching. They are models described by mathematical functions and operators and is sufficiently complete to generate images of target objects. One of the common usage of model-based object recognition is face modeling and recognition. Generative face models can generate realistic images of a human face, with changeable facial expression and pose. The general steps to recognize an object using model is:
1. Locate the object,
2. locate and label its structure,
3. adjust the model's parameters until the model generates an image similar enough to the real object.
Generative models that are of particular interest are deformable models because objects in a class are often not identical. To be able to identify all objects within the class, we have to deal with variability. Deformable models are the models that capture the principle components of the class of objects and they can deform to fit any of the object in a class. A number of models have been proposed to address the problem, they include Object Recognition for Service Robots in the Smart Environment, Recognition Architecture using distributed and parallel computation.
2.1 Object Recognition for Service Robots in the Smart Environment
With the advent of the Internet, the amount of multimedia content has been increased dramatically and the need for the efficient content-based image retrieval (CBIR) has emerged accordingly. CBIR uses the visual contents of an image such as colour, shape, and texture to represent and index the image[1,2] . In typical CBIR systems, visual contents of an image are automatically extracted and described as multi-dimensional feature values.
A smart environment is developed for light-weight service robots, full of smart items with a radio frequency identification (RFID) tag and smart devices with a RFID reader. In the approach, object recognition is performed in collaboration with the environment to distribute computational workload and to ensure reliability in object recognition. This paper introduces a new object recognition system that is robust to those environmental constraints to be used for service robots in our smart environment constructed . The recognition system uses two visual descriptors of MPEG-7, specifically, dominant color descriptor (DCD) and edge histogram descriptor (EHD)[3].
Park J. et al. have proposed a new object recognition system for service robots in a smart environment full of RFID tags and connected by wireless networks. Object recognition is one of the basic functionalities a service robot should perform and many researchers have attempted to make the service robot recognize objects through vision processing in natural environments. However there is no conventional vision system that can recognize target objects robustly in a real-world workspace. In our smart environment, flexibility and robustness in object recognition are provided with the help of RFID tags and communication networks. RFID tags provide the presence and identity information of the object and the robot can recognize the object through vision processing based on a detected RFID code and downloaded visual descriptor information. For feature descriptors, we adopted MPEG 7 visual descriptors due to its concise and unambiguous description of the complex media contents. This paper focuses on developing our object recognition engine based on visual descriptors information in the smart environment. To this end, we propose a new object recognition architecture for service robots, and present a implemented matching algorithm on the basis of MPEG 7 visual descriptors such as colour and texture.
III. VIEW BASED OBJECT RECOGNITION
This type of object recognition is also known as appearance-based recognition. There are many object recognition approaches of this type. Correlation-based template matching is one of the approaches that is very commonly used in commercial object recognition system. It is simple to implement and effective for certain engineered environments. However, this type of method is not effective when the illumination of the environment, the object posture and the scale of the object are allowed to change. The result of this method is even more poor when occlusion may occur and image databases is large. An alternative approach is to use color histogram to locate and match image with the model. While this approach is robust to changing of viewpoint and occlusion, it requires good isolation and segmentation of objects from image. Another approach is to extract features from the image that are salient and match only to those features when searching all locations for matches. Many possible feature types have been proposed, they includes region , shape context , line segments and groupings of edges etc. Some of these features are view variant and resolution dependent. They have worked well for certain classes of object, but they are often not detected frequently enough for reliable recognition. Therefore, approach that matches these kinds of features generally has difficulty in dealing with partial visibility and extraneous features. A highly restricted set, such as corners, don't have these problems.
They are view invariant, local and highly informative.
3.1 Appearance-based Object Recognition using Multiple Views
Most object recognition systems determine the identity of an object on the basis of the information gathered from a single image. Typically, a set of features is extracted from an image and compared against the features of
101 | P a g e training images, subject to varying degrees of geometric constraint. There has been considerable research into what features are best suited for recognition, under what conditions, and to what degree geometric constraints can be usefully employed. Appearance based methods derived from this research have been among the most effective . Single view recognition fails when the features extracted from the image are not sufficient to confidently identify any object. This can happen either because a view of an object in the database looks very similar to a view of a different object in the database, or because a large number of the features have been corrupted due to clutter and occlusion.
One solution to this problem is to use information from several views of the object. There has been considerable research on active object recognition systems, where the camera is moved around the object to gather additional views until there is enough evidence to obtain a sufficient level of confidence in an object hypothesis. More recently, Schiele and Crowley[4] developed a method for detecting the most salient viewpoints of an object and used it to build an active object recognition system which moves the camera to the most discriminant viewpoint of an object in order to verify the presence of the hypothesized object. The Adore system developed by Draper [5] dynamically selects vision procedures based on a Markov decision process model. Borotschnig combined Eigen space object representations with probability distributions to obtain classifications that can be used as a gauge to perform view planning. While active object recognition systems obtain good results, they require a complicated and expensive setup that is not always easy to achieve. A setup where two or more fixed cameras are directed towards an object is much easier and can be a part of a larger surveillance system. There has been comparatively little work on combining the evidence from static cameras. One such system is the system developed by Mao et al., which uses a majority voting scheme that keeps track of the number of consistent votes cast by prototype hypotheses for particular object models. In this paper we investigate how a single view object recognition system can be used in a multiple camera setup to improve object recognition results. The cameras are presumed fixed, or at least not fully reorient able on-line, and information about their relative pose and calibration parameters may be limited. In our experiments, we use a previously developed object recognition system ; however the approach can be used in conjunction with any object recognition system that can return several hypotheses about object identity and pose.
The advantage of such a method over active object recognition systems is the simplicity of the setup, and the ability to use independently developed single-view recognition systems. The method can be used in conjunction with any single-view object recognition system that obtains several object and pose hypotheses.
The method can dramatically improve performance over the single-view level, with two important, and interesting limitations: First, use of multiple views does not significantly performance if single view performance is weak. Second, the benefits of imposing constraints between views plateau, so that using additional available constraint information, does not necessarily produce a concomitant improvement in performance. We argued that both limitations arise from a property common to many computer vision recognition systems, that we term a weak target error. It tends to fail when object types look very similar. This weakness is exacerbated when the number of object types becomes large, thus increasing the likelihood that two object types will look very similar.
Object recognition from a single view fails when the available features are not sufficient to determine the identity of a single object, either because of similarity with another object or because of feature corruption due to clutter and occlusion. Active object recognition systems have addressed this problem successfully, but they
require complicated systems with adjustable viewpoints that are not always available. In this paper Selinger A.
and Nelson R. have investigated the performance gain available by combining the results of a single view object recognition system applied to imagery obtained from multiple fired cameras. In particular they have addressed performance in cluttered scenes with varying degrees of information about relative camera pose. They have argued that a property common to many computer vision recognition systems, which we term a weak target error, is responsible for two interesting limitations of multi-view performance enhancement: the lack of significant improvement in systems whose single-view performance is weak, and the plateauing of performance improvement as additional multi-view constraints are added.
3.2 Moving Object Recognition Using Background Subtraction-Based Image Processing
Simulation of prosthetic vision offers an alternative way of adjusting implant designs and estimate the minimal information requirements for the prosthetic wearer. Currently used models of prosthetic vision simulate phosphene shape, intensity, size and regularity to simplify the possible range of precepts by normally sighted volunteers, and therefore, represent a best case scenario for a blind patient under these conditions . In this way, many research groups have effectively estimated the performance capacity of a visual prosthesis for such things as facial recognition and eccentric reading. Based on previous work, Boyle et al. applied several variations to region-of-interest (ROI) processing and edge detection for scene and face recognition. Results showed that presenting a digital zoom of a salient region within the generated ROI was preferable to presenting an entire ROI-processed image. Zhao et al. [6]used two kinds of image processing strategies (adaptive threshold- binarization and edge extraction) to investigate the effect of pixel shape and resolution in a recognition task of common objects or scenes. Results demonstrated that image mode had a significant impact on recognition accuracy near the threshold of recognition. van Rheede and colleagues[7] have implemented three techniques (Full-field representation, ROI and Fisheye) for optimizing the information content of a simulated prosthetic image in order to evaluate visual acuity, object recognition and manipulation, and navigation within the environment. Their experimental results suggested that changing the conditions of image presentation proved advantageous for different visual functional tasks.
BS regards the foreground as the difference between the current image and a reference background image, often called the „„background model‟‟. Several techniques have been used, such as the single Gaussian model or temporal median filtering that are simple models but extremely sensitive to changes in dynamic scenes resulting from lighting and/or extraneous events. A recent proposed universal BS technique called visual background extractor (ViBe) outperforms other techniques, not only in terms of computational speed and classification accuracy , but also the ability to solve problems arising from the nature of video surveillance . Due to its excellent performance and almost parameter less procedure, we used ViBe for foreground segmentation. ViBe compares current input pixels with a background model by randomly selecting neighbouring pixels.
The background-subtraction-based strategies are advantageous to recognition of a dynamic scene in low- resolution prosthetic vision. It have the ability to deal with the incoming object detection and enhancement. The BRFE strategy added more gray scale information and subjects attained higher recognition accuracy.
Although simulation provides a feasible way to approximate the experience of prosthesis wearers, it is still idealized relative to the actual visual percept achieved with an irregular retinotopic map and/or phosphene dropout such as reported by chronic implant recipients. The ability to recognize objects and faces in low-quality images is related to edge quality , enhancing contour information based on BS techniques is a promising method whereby perception of moving objects can be enhanced for prosthetic vision.
103 | P a g e Wang J. et al. have proposed a visual prosthesis that applies electrical stimulation to different parts of the visual pathway as a viable approach to restore functional vision. However, the created percept is currently limited due to the low-resolution images elicited from a limited number of stimulating electrodes. Thus, methods to optimize the visual percepts providing useful visual information are being considered. We used two image- processing strategies based on a novel background subtraction technique to optimize the content of dynamic scenes of daily life. Psychophysical results showed that background reduction, or background reduction with foreground enhancement, increased response accuracy compared with methods that directly merged pixels to lower resolution. By adding more gray scale information, a background reduction or foreground enhancement strategy resulted in the best performance and highest recognition accuracy. Further development of image- processing modules for a visual prosthesis based on these results will assist implant recipients to avoid dangerous situations and attain independent mobility in daily life.They have focused on exploring effective image processing strategies to optimize information content and to improve the perception of moving objects.
We demonstrated that adaptive post processing effectively improved foreground segmentation performance.
Importantly, results of psycho physiological experiments indicated that BR and BRFE strategies were advantageous to the perception of movement. The BRFE strategy added more gray scale information and subjects attained higher recognition accuracy.
3.3 Object Recognition From A 2D Image Using GA Algorithm
Object recognition technology can be roughly divided into two ways: matching with 3D model[8] and matching onto 2D space[9][10]. As for the method of matching with 3D model, matching is relatively easy because it does not heavily depend on a view point. However, restoring the 3D structure is difficult. As for the method of matching onto 2D space, it is unnecessary to extract the 3D features, however, it should be able to treat any 2D images with different viewpoints.
The genetic algorithms (GAS), which are optimizing techniques based on evolution, have received attention recently. we investigate an object recognition technique using GA. A ,3D object model which is prepared in advance is rotated, magnified or reduced in 3D space and is projected onto 2D space. Therefore the 3D object model Cain represent the 2D images observed from various points of view. The resultant 2D image can be matched (adequately with the input 2D image. Accordingly the problem of object recognition can be regarded as a kind of optimization problem, that is, to search for model providing the most similar image to the input image, and then, GA is applied to solve the optimization problem. In addition, hierarchical object recognition is employed: the coarse matching and the recognition of object. In the coarse matching, the input image is approximated by simple figures such as circles, triangles and rectangles. Then in the recognition of the object, the simplified image by the coarse matching is matched with the image projected a 3D fundamental object model. This process is based on such an idea that human beings coarsely grasp the locations and the sizes of objects at first, and then, recognize them.
ABE Y. and HAGIWARA M. have proposed a new approach for object recognition in this paper. Many methods to recognize objects have been studied, Most of them required one precise object model for recognizing only one object. Accordingly it is necessary to prepare a model for only one object in advance. Moreover it is difficult to make the precise model, and a long computational time is necessary to match it with an input image.
In this paper, a hierarchical object recognition method using a genetic algorithm (GA) is proposed. In the proposed method, at first the input image is simplified, and then the simplified image is matched with a
fundamental model. By means of the hierarchical method, a precise object model is not necessary, and only one fundamental model represents the objects which belong to the same category. In this paper, they have proposed a new approach for object recognition. The proposed method consists of the coarse matching and the recognition of the object. In the coarse matching, the input image is matched by the head-trunk model and the ribon using GA, and then, it is simplified. The trunk model represents a human head and a trunk, and can coarsely grasp the size and the location of the upper half of the human being in the input image. The ribon consists of two straight lines. It can seize such slender parts as human arms. In the recognition process of the object, the 3D skeleton model is matched with the simplified image by the coarse matching using GA, and the system recognizes whether the input image is a human being or not. By means of hierarchical approach, the proposed method does not need a precise model but a fundamental model (a 3D skeleton model in our experiments). Because the fundamental model is simplified, it is not difficult to prepare in advance as compared with a precise model.
3.4 Object Recognition Based on N-Gram Expression of Human Actions
In the field of computer vision, the mainstream of object recognition is appearance based approach including shape models, colors and so on . For these methods, however, it is difficult to identify objects whose shapes are not registered beforehand for unknown environment. To deal with this problem, human behaviour can be used as clues to infer the function and usage of the object categories. In the field of object recognition, most of previous works are based on appearance based features. In these method, large database is necessary to register various shapes of objects. It is not easy to collect dataset of shapes comprehensively. On the contrary, another approach have been attempted on object recognition by making use of human activities around objects. Moore et al. proposed a method of object recognition by integrating shape models and hand motions around objects complementally[11]. Higuchi el al. proposed a hierarchical models of objects and actions which interact each other in recognition process[12]. In addition, Gupta et al. proposed a method in which the interactions between objects and actions are represented as probabilistic networks [13]. In these methods, actions are classified to certain categories and then made correspondence with objects. Consequently the final results depend on the performance of intermediate action recognition. These action categories for action recognition are, however, not always effective in object recognition.
In this paper, a novel method is presented of object recognition from human actions directly. The proposed method is based on bag of-features approach using n-gram expression of human motions. An n-gram is a subsequence of n words/letters used in the field of document analysis. Similarly, the paper regards n-gram of primitive motions of human as features to characterize the object category which human take actions on. bag- of-features approach which was originally applied to document analysis have been introduced to video applications. Hamid et al. proposed an unsupervised method for detecting anomalous activities by using bags of event n-grams . In their method, human activities are represented as n-grams of actions. As for more primitive motions, Thurau et al. introduced n-grams of primitive level motion features . They captured change of poses as n-grams of HOG features from sequential frames. From this, we consider bag-of features is applicable to learn human activities. Therefore we adopt bag-of-features with n-grams for recognizing objects rather than human gestures.
Kojima A. et al. have proposed In this paper, we propose a novel method for recognizing bag-of-features. The key contribution of our method is that human actions are represented as n-grams of symbols and used to identify specific object categories. First, features of human actions taken on a object are extracted from video images and encoded to symbols. Then, n-grams are generated from the sequence of symbols and registered for
105 | P a g e corresponding object category. For recognition phase, actions taken on the object are converted into set of n- grams in the same way and compared with ones representing object categories. We performed experiments to recognize objects in an office environment and confirmed the effectiveness of our method. First, human actions on objects are captured from image sequences. In each sequence, successive actions taken on a object are recorded. For example, actions on a book shelf include opening, picking up or placing books and closing. Then, human motion features such as position of head, direction and distance from head to hand, and so on are extracted. These features are encoded into sequence of symbols. Finally, n-grams, i.e. sequential n symbols, are generated and registered to the object category. After a set of n grams is generated from query image sequence in the same way, it is compared with those of object categories. Thus the object is recognized as most fitted one.
IV. FEATURE BASED OBJECT RECOGNITION
A number of models have been proposed under this approach. These are as follows.
4.1 Local Features And Histogram Based Planar Object Recognition
Visual object recognition system recognizes an object in scene image. Although it has been researched over the past few decades, it‟s still a difficult problem because the object in scene image can be affected by a lot of variations including image pose, illumination changes, clutter background, etc. In general, there are two main approaches for object recognition including global and local features based methods. The global features are extracted from the whole content of object, while the local features are extracted from local pieces of object.
Global features based methods commonly use shape and colour features. However, these methods are sensitive to object occlusion and cluttering parts which disturb recognition. Moreover, using the global features for object recognition requires object segmentation from scene image which is still a challenging problem. Changes, object transformation and object variation, variety local features have been proposed and then they are applied to many applications such as object detection and recognition, image matching, image stitching, object tracking, etc. In 1999, D. Lowe proposed a Scale Invariant Feature Transformation (SIFT) feature detector and descriptor for object recognition [14]. SIFT is invariant to image translation, scaling, rotation and partially invariant to illumination changes and affine transformation but complexity and heavy. This work becomes the original inspiration for the most of local feature descriptors proposed later. Then, a Speed Up Robust Features (SURF) was reported as a more efficient substitution for SIFT by H. Bay [15] in 2006 because it produces smaller descriptor as well as speeds up matching step. However, these feature descriptors are still complicated and take much time for matching. For object recognition system, the goal is to make descriptors faster to compute, more compact while remaining robust. To address these requirements, many researchers tended to build a light weight descriptor based on binary string. In 2010, Calonder proposed a Binary Robust Independent Elementary Features (BRIEF) descriptor which relies on a relatively small number of intensity difference tests to represent an image patch as a binary string [16]. Global approach is simpler and faster than local approach but not robust under illumination, image pose changes and clutter images. Therefore, we propose an approach to combine the local and global feature to make a robust object recognition system. In our method, the object is detected first using local features matching, and then the background is removed. Next, we extract histogram feature on segmented image. To combine the spatial information into histogram feature, a local histogram features is proposed that describes the intensity distribution of different region in image. The goal in this approach is dealing with planar objects such as book cover, cd cover, painting, box, logo image, etc. So, a system is built
using start of- art binary features, the robust feature ORB combine with local histograms. Phan D. et al. have proposed an approach for planar object recognition using binary local invariant feature and local histogram features, in scene image. First, the object is detected based on a homograph which represents transformation of object in scene image from reference image, estimated from matching pairs of key points between two images.
Then, local intensity histograms are computed from blocks inside scene object. In order to locate these blocks, a reference object is divided into many blocks then the corresponding blocks of the scene object are located based on the homograph with assumption that object is a plane. To make the feature invariant to the illumination, the local histogram is computed from only intensity component (V) of HSV color of image. Similarity of two images is calculated from the similarity of their local histogram features and the matching ratio. The recognized object is the most similar reference object in data set. For evaluation, we experiment our method with a planar dataset from Stanford University which includes book cover, cd cover, painting, business card, etc.
4.2 Semantic-Feature-Based Object Recognition By Using Internet Data Mining
There are many successful algorithms that promise a high accuracy in recognizing objects, e.g. SIFT[17] , etc.
However, unfortunately they are only on image feature base, therefore could not establish relationships among images and the words inside. Moreover, the recognition itself is based on the given objects which may cause inconveniences in certain fields such as robotics. In the other hand, it is very difficult for the traditional image- processing-based methods to cluster objects, which function similarly but appear differently, correctly, because the image-based methods could not retrieve hidden property information that is beneath these objects‟
appearances.
To overcome these problems, in this paper a method is introduced that describes, identifies and clusters objects by measuring their semantic similarities. In order to detect text regions in an image, a text region classifier is trained with a SVM module in advance. Each image is processed by the SVM module to detect text regions . These text regions are then split out , and then binarized with thresholds calculated from their histograms. The binarized images are then passed into an Optical-Character-Recognition (OCR) module to read text labels out from images into text strings that are comprehensible to computers. The process is mentioned as Text Extraction . These text strings, been considered as keywords that are related to the image itself and the object that is inside, are searched one by one with a Google Web Search API to look for logically related information. With a number of web-search results in hand, these results are splitted into single words, which are considered as component words. These component words are definitely related to the corresponding keywords. This process is mentioned as Web Search and Component Words Extraction.
Xu J. et al. have considered a problem of automated object description and clustering. Because traditional image-processing based object recognition algorithms can only cluster objects in image-base, they have proposed a method to describe an object in human language and group similar objects together in text processing way. This paper describes a system that recognizes objects with text labels printed on the surface of objects themselves or their packing cases. By analyzing them, objects could be described in English words, and then be clustered into corresponding groups. we presented an algorithm that automatically read text labels that are printed on the surface of objects, and then search them through Internet, mining results to describe the object in human languages as feature words. Objects with feature words description are calculated similarities between each other to get them grouped into several groups.
107 | P a g e
V. DISCUSSION
In this paper, a survey of the recent researches on object recognition was given. Several recently proposed object recognition approaches are described and discussed. From the survey, we know that creating distinctive key point descriptors for object matching is a critical step to the whole object matching process because it is the invariance property of the intelligently made descriptors that make the object matching invariant to the scale.
While view-based object recognition approach is generally a bottom-up approach, it is possible to incorporate top-down approach technique into it to make the object recognition faster and more accurate.
REFERENCES
[1] R. S. Choraś, “Content-Based Image Retrival-A Survey,” Biometrics, Computer Security Systems and Artificial Intelligence Applications, Springer US, pp. 31-44, 2006.
[2] R.C. Veltkamp, M. Tanase, “Content-Based Image Retrieval Systems: A Survey”, Technical Report UU- CS-2000-34 (extended version), 2002.
[3] B. S. Manjunath et al., “Introduction to MPEG-7: Multimedia Content Description Interface,” John Wiley
& Sons, 2002.
[4] Bernt Schiele and James L. Crowley. Transinformation for active object recognition. In International Conference on Computer Vision, New Delhi, India, 1998.
[5] Bruce A. Draper, Jose Bins, and Kyungim Baek. Adore: Adaptive object recognition. In International Conference onVision Systems, Las Palmas de Gran Canaria, Spain, 1999.
[6] Y. Zhao, Y. Lu, Y. Tian, L. Li, Q. Ren, X. Chai, Image processing based recognition of images with a limited number of pixels using simulated prosthetic vision, Inform. Sci. 180 (16) (2011) 2915–2924.
[7] J.J. van Rheede, C. Kennard, S.L. Hicks, Simulating prosthetic vision: optimizing the information content of a limited visual display, J. Vis. 10 (14) (2010) 1–14.
[8] Yamamoto and H. Tamura: “A Model-Based Approach .o Localizing Stable Polyhedra by Using Geometric Conitraints”, The Trans. of IEICE(D-11), Vol.J73-D-II, No.2, pp.200-206, 1990-2.
[9] 4. Yamamoto, H. Okii, N. Ueda, K. Shimada, N. Kaneki md K. Ono: ‟„ Color Image Recognition Using Genetic 4lgorithm and Neural Network Model ” , Proceedings of ,he 1994 IEICE fall conference, SD-7-6, pp.468, 1994.
[10] 3runo Cemuschi-Frias and David B. Cooper: “3-D Space Location and Orientation Parameter Estimation of Lam- Dertian Spheres and Cylinders From a Single 2-D Image By Fitting Lines and Ellipses to Thresholded Data” , BEE Trans. on Pattern Analysis and Machine Intelligence, Vol.PAM1-6, No.4, pp.430-441, 19847.
[11] D. Moore, I. Essa and M. Hayes: “Exploiting Human Actions and Object Context for Recognition Tasks,”
Proc. of 7th Int. Conf. on Computer Vision, Vol. 1, pp. 80–86, 1999.
[12] M. Higuchi, S. Aoki, A. Kojima and K. Fukunaga: “Scene Recognition based on Relationship between Human Actions and Objects,” Proc. of 17th Int. Conf. On Pattern Recognition, Vol. 3, pp. 73–78, 2004.
[13] A. Gupta and L.S. Davis: “Objects in Action: An Approach for Combining Action Understanding and Object Perception,” Computer Vision and Pattern Recognition (CVPR2007), pp. 1–8, 2007.
[14] Lowe, D.G., "Object recognition from local scale-invariant features," Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on , vol.2, no., pp.1150,1157 vol.2, 1999.
[15] H. Bay, T. Tuytelaars, L.V. Gool, "SURF: Speeded Up Robust Features", European Conference on Computer Vision, May 2006.
[16] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features”, European Conference on Computer Vision, 2010.
[17] David G. Lowe, Distinctive Image Features from Scale- Invariant Keypoints. The Proceedings of the Seventh IEEE International Conference on Computer Vision: pp.1150-1157 vol.2, 1999.