Cloud-based object recognition: A system proposal
Daniel LORENČÍK1, Peter SINČÁK2
Abstract In this chapter, we present a proposal for a cloud-based object recognition system. The system will extract local features from an image and classify the object in the image using Membership Function ARTMAP (MF ARTMAP) or a Gaussian Markov Random Field model. Feature extraction will be based on the SIFT, SURF and ORB methods. The whole system will be built on a cloud architecture so that it is readily available for the needs of the emerging technological field of cloud robotics. Besides the system proposal, we specify research and technical goals for the subsequent research.
1 Introduction
Since the history of computers and computing began roughly 70 years ago, we have seen large-scale computers replaced by affordable personal computers. In recent years, we have witnessed another shift: personal computers have shrunk to tablets, netbooks and even smartphones, and heavy computational and storage tasks are offloaded to the cloud. Applications available in the cloud also have a high impact on productivity, as they make it easy to share data between several users, thereby promoting real-time collaboration and the aggregation of crowd knowledge; examples are the Google Apps suite [1] and Microsoft Office 365 [2].
With this knowledge in mind, it is possible to envision a similar system of applications available for use by robots. The obvious benefit is the possibility of creating small robots with longer battery life, since the heavy computation is done elsewhere. These robots will not have to be highly sophisticated. Therefore, they can be cheap or can be built from available resources such as smartphones combined with a wheeled chassis. More than that, the robots can benefit from the sharing of knowledge. This idea was presented by Professor James Kuffner in his talk “Robots with their heads in the Clouds” [3]. Knowledge sharing in real time has the potential to influence the ability of robots to exist in the real world, as the knowledge gained in the learning process is gained by all robots using the service. We provide more detailed information on cloud robotics in the next section.
1 Department of Cybernetics and Artificial Intelligence, Technical University of Košice
2 Department of Cybernetics and Artificial Intelligence, Technical University of Košice
With the availability of cloud computing, and of methods of artificial intelligence provided as a service in the cloud environment, the idea of the remote brain [4] resurfaces. It is true that the connection to the cloud is crucial and is therefore a weak link in the chain, but with the connection options available via WiFi and 3G and 4G networks, this is more of a technological problem.
The structure of the chapter is as follows: in the second section, we provide an introduction to cloud computing and define cloud robotics. In this section, we also present cloud robotics projects. In the third section, we provide an overview of three methods (SIFT, SURF, ORB) for feature extraction from the image, and two methods (Membership Function ARTMAP and the Gaussian Markov Random Field model) for classification of the objects in the image based on the extracted features. In the fourth section, we propose a cloud-based system for object recognition in images. The fifth section contains the conclusion of the chapter.
2 Cloud Robotics
Cloud computing can be viewed as grid computing with added concepts from utility, service and distributed computing [5]. The relationship between different distributed computing systems is shown in Fig. 1.
Fig. 1 Relationship between orientation and scale of different distributed computing systems
Cloud computing was defined by the National Institute of Standards and Technology as a “model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” [6].
The clouds are provided in four deployment models:
Private Cloud – cloud infrastructure is used exclusively by a single organization
Community Cloud – cloud infrastructure is used by a group of consumers with shared concerns
Public Cloud – cloud infrastructure is provided for public use
Hybrid Cloud – combines at least two of the previous models; the infrastructures remain clearly distinct, but applications can be ported from one model to another
Besides the deployment models, the cloud also provides three types of services:
Infrastructure as a Service (IaaS) – the user can create and manage virtual machines depending on his or her individual needs. The administration of the machines, networking and all settings is the responsibility of the user.
Platform as a Service (PaaS) – the user is provided with access to a high-level integrated environment to build, test and deploy applications. Part of the required settings is managed by the platform itself (scaling is also done automatically). However, this can impose some restrictions on the choice of programming language or tools.
Software as a Service (SaaS) – the software or application is provided to the end users. The benefits are instant updates of the application and its minimal footprint on the user's computer (it is usually used from an internet browser). Examples of cloud services are Google Apps [1], or storage-oriented services with automatic synchronization such as Dropbox [7] or SkyDrive [8].
Cloud robotics is based on the notion of the “cloud” and combines the computational power of the computer cloud with the availability of internet-connected devices. A device can be any hardware that is able to connect to the internet and can be programmed to use cloud services. It can be virtually any robot that has a wired or wireless connection, or it can be a smartphone or a small computer (NetDuino [9], Raspberry Pi [10]). Especially when using smartphones with connected actuators (Romo [11], SmartBot [12]) or low-cost small computers like the Raspberry Pi, it is possible to create affordable robots for which cloud robotics can provide the software needed. This software can be in the form of AI bricks [13].
Most of the projects in cloud robotics so far have focused on creating the cloud robotics infrastructure. In the process, by analogy with the services in cloud computing, Robot as a Service (RaaS) was defined [14]. A RaaS unit has to have the features of a service-oriented architecture, namely it has to be a service provider, a service broker and a service client. It makes available the actions it can perform, accepts connections, and is able to use other services as well.
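The three roles named above can be illustrated with a minimal in-process sketch. This is not the architecture from [14]; the class `RobotService` and all of its method names are hypothetical and chosen only to show one unit acting as provider, broker and client.

```python
class RobotService:
    """Hypothetical RaaS unit that can act as provider, broker and client."""

    def __init__(self, name):
        self.name = name
        self._actions = {}    # provider role: actions this unit can perform
        self._directory = {}  # broker role: which unit offers which action

    def provide(self, action, handler):
        """Provider role: publish an action this unit can perform."""
        self._actions[action] = handler
        self._directory[action] = self

    def register(self, action, provider):
        """Broker role: record that another unit offers an action."""
        self._directory[action] = provider

    def discover(self, action):
        """Broker role: return the unit offering the action, or None."""
        return self._directory.get(action)

    def perform(self, action, *args):
        """Provider role: execute one of the published actions."""
        return self._actions[action](*args)

    def request(self, broker, action, *args):
        """Client role: find a provider through a broker and call it."""
        provider = broker.discover(action)
        if provider is None:
            raise LookupError(f"no provider for {action!r}")
        return provider.perform(action, *args)


hub = RobotService("hub")
mapper = RobotService("mapper")
scout = RobotService("scout")

mapper.provide("area", lambda width, height: width * height)
hub.register("area", mapper)            # hub acts as broker
result = scout.request(hub, "area", 3, 4)  # scout acts as client
```

In a real deployment the calls would cross a network boundary; the sketch keeps everything in one process so that only the division of roles is visible.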
As noted above, several projects are concerned with cloud robotics:
DAvinCi is a cloud-based framework for service robots [15], which allows several robots to communicate and collaborate on the creation of an environment map using the FastSLAM algorithm.
MyRobots.com is a web-based project focused on “connecting all robots and intelligent devices to the Internet” [16]. It is promoted as a cloud service for robots, although currently only an app store and a basic monitoring service are available. It is possible to download applications for the device and also to upload user-created applications. The monitoring service allows remote monitoring of robot status and can send alerts if the robot encounters a problem.
ASORO is an acronym for A*Star Social Robotics [17]. The main goal of the project is to create and promote social robots. From the cloud robotics point of view, this project is intriguing because all the robots created use the Unified Robotics Framework (UROF), which is essentially an operating system that allows modules for robot functions to be connected. These modules are similar to the AI bricks already mentioned and are used as needed for tasks such as path planning, task planning, navigation control and others.
RoboEarth is a project whose goal has been to create a “World Wide Web for robots” [18]. RoboEarth is a collection of databases storing actions, object data and environment data. These databases are shared among connected robots. Therefore, if one robot has learned to identify a certain object, all other robots gain this knowledge as well. The same is true for actions (or action recipes), which describe how to do tasks, and for environments, which store information about objects and their locations. The data in the databases are encoded in the OWL semantic language, so it is possible to derive new knowledge from existing knowledge or to apply the same approach to a similar action. The actions are also fine-tuned through use. The action recipes are composed of atomic actions, which are again similar to the notion of AI bricks.
The proposal of the system is based on the knowledge gained from these projects and aims to deliver an AI brick that can be used in already available cloud robotics frameworks.
3 Image Processing and Object Classification
As our proposed system will provide a cloud-based service for classifying objects in images, this section gives an overview of the methods we will use. We start with methods for extracting local features from the image (SIFT, SURF and ORB) and continue with the classification methods based on Membership Function ARTMAP and the Gaussian Markov Random Field model.
Scale-Invariant Feature Transform (SIFT) was described in detail in [19]. SIFT extracts local features at scale-space extrema called key points. Key points are identified as minima or maxima of the difference of Gaussians occurring at multiple scales of an image pyramid. Next, unstable key points are removed and an orientation is assigned to each remaining key point. Computing over the image pyramid provides invariance to scale, assigning the orientation based on a peak in a local histogram provides invariance to rotation, and invariance to illumination is provided by thresholding the values of the descriptor vector, which is composed of the values of orientation histograms. The final descriptor has 128 dimensions. Experimental results from [19] suggest that SIFT can recognize even partially occluded objects, as only 3 descriptors are needed for object recognition. It is invariant to scale, rotation and translation, and partially invariant to affine transformation (up to 50 degrees). Several improved methods were proposed: Affine SIFT [20], which improves invariance to the tilt of the camera, and Principal Component Analysis SIFT (PCA-SIFT) [21], which creates the descriptor vector using PCA.
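The thresholding step that gives the descriptor its robustness to illumination can be sketched as follows. This is a minimal illustration rather than the implementation from [19]: the function name `normalize_descriptor` is hypothetical, and the clamp value of 0.2 is an assumption based on commonly cited descriptions of SIFT, not a value stated in this chapter.

```python
import math


def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalize, clamp large entries, then renormalize a descriptor.

    Assumes non-negative entries (orientation histogram values).
    """
    def unit(values):
        norm = math.sqrt(sum(x * x for x in values))
        return [x / norm for x in values] if norm > 0 else list(values)

    v = unit(vec)                   # cancels a uniform change in contrast
    v = [min(x, clamp) for x in v]  # limits the influence of large gradients
    return unit(v)                  # restores unit length


raw = [4.0, 1.0, 1.0, 0.5]          # toy 4-bin descriptor, not 128-D
brighter = [2.0 * x for x in raw]   # same patch under doubled contrast
same = all(
    abs(a - b) < 1e-12
    for a, b in zip(normalize_descriptor(raw), normalize_descriptor(brighter))
)
```

In this toy case the two descriptors agree after normalization, which is the effect the thresholding is meant to achieve for uniform contrast changes.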
Speeded-Up Robust Features (SURF) was described in [22]. It is inspired by SIFT, uses three stages (detection, description and matching) and is tailored to provide high speed with accuracy similar to SIFT. Detection is based on a basic Hessian matrix approximation and integral images. Interest points are blobs located at maxima of the determinant of the Hessian matrix. The search for interest points at different scales is done with gradually larger filters, as opposed to the resampling of the image in SIFT [22], [23]. Description is similar to the SIFT approach; it is based on Haar wavelet responses and produces a 64-dimensional vector (there are modifications with different descriptor lengths: SURF-36 and SURF-128). In the matching phase, the sign of the Laplacian is used to determine whether the interest point is a bright blob on a dark background or a dark blob on a bright background. Only features with the same sign are compared, which leads to a speed improvement over SIFT.
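The reason integral images make larger filters cheap can be shown with a short sketch: once the summed-area table is built, the sum over any rectangular box costs four lookups regardless of the box size. The function names below are illustrative and are not taken from [22].

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
whole = box_sum(table, 0, 0, 3, 3)         # sum of all nine pixels
lower_right = box_sum(table, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```

A box filter response at any scale is then a weighted combination of a few such box sums, so the cost per pixel does not grow with the filter size.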
Oriented FAST and Rotated BRIEF (ORB) is a newer approach proposed in [24]. It uses a modified version of the FAST corner detector (Features from Accelerated Segment Test, [25], [26]) and a modified version of the BRIEF feature point descriptor (Binary Robust Independent Elementary Features, [27]). FAST is used to detect corners, and to provide scale invariance it is run on a scale pyramid of the image. The orientation of a corner is found using the intensity centroid, which assumes that the intensity centroid of a corner is offset from its center. BRIEF was found to be comparable in performance to SIFT and SURF, and since it uses binary strings as descriptor vectors, it is computed faster [27]. As BRIEF is not rotation-invariant, the corners found by oriented FAST are normalized in orientation before the BRIEF descriptor is computed. Then uncorrelated binary tests are selected and used to construct the Rotated BRIEF descriptor. ORB was designed to be faster than the existing local feature methods SIFT and SURF. Experiments described in [24] suggest it is an order of magnitude faster than SURF and two orders of magnitude faster than SIFT, with similar recognition ability.
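Two ingredients of this description lend themselves to a short sketch: the intensity-centroid orientation and the comparison of binary descriptors by Hamming distance. The code below is a simplified illustration under stated assumptions: it uses a square patch where [24] uses a circular one, stores descriptors as Python integers, and the function names are hypothetical.

```python
import math


def centroid_orientation(patch):
    """Angle in radians from the patch center to its intensity centroid."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    m10 = m01 = 0.0
    for y in range(h):
        for x in range(w):
            m10 += (x - cx) * patch[y][x]  # first-order moment along x
            m01 += (y - cy) * patch[y][x]  # first-order moment along y
    return math.atan2(m01, m10)


def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


bright_right = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
bright_bottom = [[0, 0, 0], [0, 0, 0], [1, 1, 1]]
angle_right = centroid_orientation(bright_right)    # points along +x
angle_bottom = centroid_orientation(bright_bottom)  # points along +y (image rows)
distance = hamming(0b1011, 0b0010)
```

Because the distance is a bit count over an exclusive-or, matching binary descriptors needs no floating-point arithmetic, which is one source of the speed advantage described above.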
Membership Function ARTMAP (MF ARTMAP) is a classification tool [28], [29] based on Adaptive Resonance Theory [30], [31] and the theory of fuzzy sets. The knowledge representation is based on the hypothesis that input samples form fuzzy clusters in the feature space, which is the universe of the fuzzy sets. Therefore, it is possible to calculate the membership value of each point of the feature space in every fuzzy cluster defined in this space. The MF ARTMAP network consists of:
Input layer – normalizes the input and maps the input samples to the comparison layer; the number of neurons is the same as the number of dimensions of the feature space
Comparison layer – an n × m grid, where n is the dimension of the feature space and m is the number of neurons in the recognition layer; the size of the layer changes dynamically in the process of learning
Recognition layer – contains neurons representing the fuzzy clusters in the feature space; therefore, the number of neurons can change in the learning process
MapField layer – consists of neurons representing fuzzy classes; here the membership value of the sample in each fuzzy class is computed
The learning algorithm of MF ARTMAP is divided into two steps: structure adaptation, where new fuzzy clusters are created (therefore changing the recognition and comparison layers); and parameter adaptation, where the parameters of the membership functions stored in the connections between layers are changed.
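The interplay of fuzzy clusters, class membership and the two learning steps can be sketched in a few lines. This is a loose illustration and not the MF ARTMAP algorithm from [28]: the Gaussian-shaped membership function, the `vigilance` threshold, the fixed `width` and the update `rate` are all assumptions, input normalization is omitted, and the class name `FuzzyClusterClassifier` is hypothetical.

```python
import math


class FuzzyClusterClassifier:
    """Toy classifier with fuzzy clusters that are created and tuned while learning."""

    def __init__(self, vigilance=0.5, width=1.0, rate=0.2):
        self.vigilance = vigilance  # assumed threshold for creating a new cluster
        self.width = width          # assumed fixed width of every membership function
        self.rate = rate            # assumed step size for moving cluster centers
        self.clusters = []          # (center, class_label) pairs

    def membership(self, center, x):
        """Gaussian-shaped membership of sample x in one cluster (assumption)."""
        d2 = sum((a - b) ** 2 for a, b in zip(center, x))
        return math.exp(-d2 / (2.0 * self.width ** 2))

    def class_memberships(self, x):
        """Membership in each class: the maximum over that class's clusters."""
        out = {}
        for center, label in self.clusters:
            mu = self.membership(center, x)
            out[label] = max(mu, out.get(label, 0.0))
        return out

    def learn(self, x, label):
        best, best_mu = None, 0.0
        for cluster in self.clusters:
            if cluster[1] == label:
                mu = self.membership(cluster[0], x)
                if mu > best_mu:
                    best, best_mu = cluster, mu
        if best is None or best_mu < self.vigilance:
            self.clusters.append((list(x), label))  # structure adaptation
        else:
            center = best[0]
            for i, xi in enumerate(x):              # parameter adaptation
                center[i] += self.rate * (xi - center[i])

    def classify(self, x):
        scores = self.class_memberships(x)
        return max(scores, key=scores.get) if scores else None


net = FuzzyClusterClassifier()
for sample, label in [([0.0, 0.0], "cup"), ([0.2, 0.1], "cup"), ([5.0, 5.0], "ball")]:
    net.learn(sample, label)
prediction = net.classify([0.1, 0.1])
```

In this run the second "cup" sample falls inside the existing cluster and only shifts its center, while the "ball" sample triggers the creation of a new cluster.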
The Gaussian Markov Random Field (GMRF) model was investigated for the task of object classification on texture data in [32], [33], and for the task of image classification in [34]. The use of GMRF is interesting because any data distribution can be approximated by Gaussian mixtures, and many mathematical techniques make these data easy to work with [35]. In [32], the textures were modeled with a GMRF whose parameters were estimated from training samples observed at a given angle, as the GMRF is not invariant to scale and rotation. The classifier was based on a modified Bayes rule: the maximum likelihood estimate of the rotation and scale parameters is obtained for each class hypothesis, the results are compared, and the input sample is mapped to the class with the highest resulting likelihood. The GMRF model of the texture was parameterized, and the rotation and scale parameters were made a part of the model using the spectral density of the GMRF. The experimental results, which can be found in [32], show that this approach was successful.
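The decision rule described above can be written compactly. The notation is an assumption introduced here for illustration and is not taken from [32]: $\mathbf{y}$ denotes the observed texture sample, $c$ a class hypothesis, $\theta$ the rotation and $s$ the scale, and equal class priors are assumed.

```latex
\begin{align}
  (\hat{\theta}_c, \hat{s}_c) &= \arg\max_{\theta,\, s} \; p(\mathbf{y} \mid c, \theta, s)
    && \text{(per-class maximum likelihood estimate)} \\
  \hat{c} &= \arg\max_{c} \; p(\mathbf{y} \mid c, \hat{\theta}_c, \hat{s}_c)
    && \text{(class with the highest maximized likelihood)}
\end{align}
```

The inner maximization removes the dependence on the unknown rotation and scale, so the outer comparison between classes is made at each class's best-fitting pose.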
4 System Proposal for Cloud-based Object Classification
Based on the study of cloud robotics, we have identified a challenging research topic in this field: a cloud-based system for object classification from image data.
This system contributes to cloud robotics as a distributed vision system that can be available to any device capable of connecting to the internet and able to use a cloud service. It will be based on a shared knowledge base and will fulfill the criteria for becoming an AI brick as defined in [13]. The system should be usable in existing cloud robotics frameworks like RoboEarth. The high-level overview of the proposed system is shown in Fig. 2.
The system will accept an image from the device in the most commonly used image formats. Then, the image will be preprocessed, and features will be extracted using one of the local feature extraction methods already mentioned. The final decision on which method will be used will depend on the results of performance tests. By using a single feature extraction method, we will ensure that the feature space is normalized for all users. Clustering and classification will be done in the classification module, which will use shared knowledge from all users. The classification service will then send back the result of the classification, which will consist of at least the five most probable object classes. If the object in the image is not classified or is classified wrongly, the user will have the option to offer a better result.
Fig. 2 High level overview of architecture of the proposed system. Object classification module is shared between users and knowledge is stored in the structure of classifier
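The reply and feedback steps of the proposed service can be sketched as follows. This is only an illustration of the described behavior under assumptions: the JSON layout, the function names and the in-memory `training_queue` are hypothetical, and the class scores are taken as given rather than computed by MF ARTMAP or GMRF.

```python
import heapq
import json


def top_classes(scores, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


def build_response(scores, k=5):
    """Serialize the ranked candidates as a JSON reply for the device."""
    ranked = top_classes(scores, k)
    candidates = [{"label": label, "score": score} for label, score in ranked]
    return json.dumps({"candidates": candidates})


def apply_feedback(training_queue, features, corrected_label):
    """Queue a user-supplied correction for the shared classifier to learn from."""
    training_queue.append((features, corrected_label))


scores = {"mug": 0.61, "cup": 0.22, "bowl": 0.09, "vase": 0.04, "can": 0.03, "box": 0.01}
reply = json.loads(build_response(scores))

training_queue = []
apply_feedback(training_queue, [0.3, 0.7], "teapot")  # user offers a better label
```

Returning several ranked candidates instead of a single label leaves room for the device, or its user, to pick the correct class and feed the correction back into the shared knowledge.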
The main contribution of the system will be its availability for various devices; as a result of using a cloud computing platform, it should be available “everywhere and every time”. The second main contribution will be the use of shared knowledge. This will allow the knowledge base to be built at an increased rate and will provide a higher quality of service with a higher number of users. Knowledge sharing, easy availability and easy implementation are the characteristics of the proposed system.
The challenge here will be to adapt the MF ARTMAP and GMRF model for the cloud architecture. These two methods will be compared to their stand-alone versions.
To test this system, we will also create a test to verify its robustness and performance. We will use standard classification tests, and we will also test the classification of standard household objects.
The most decisive test will be the comparison of a system open to general use with a system open only to a handful of expert teachers. An open system can benefit from crowd knowledge, where every user of the system can also contribute to the learning process. The hypothesis is that the system open to general use will be faster in training, will have more object classes and will provide more accurate results in long-term use.
5 Conclusion
In this chapter, an overview of cloud robotics, of the local feature extraction methods SIFT, SURF and ORB, and of the classifiers Membership Function ARTMAP and the Gaussian Markov Random Field model was presented.
Based on the knowledge gained, the cloud-based object classification system was proposed.
The system was proposed as an AI brick and aims to provide powerful and easy-to-use object classification from image data for existing and future robots. The advantages of providing this system as a cloud service will be availability, both geographical and across various devices (the only requirements are the ability to connect to the internet and to use cloud services), instant sharing of gained knowledge between connected devices, easy rollout of new versions, scalability, reliability and offloading of heavy tasks to the cloud.
More importantly, the system will be created in a way that allows easy integration into existing cloud robotics frameworks like RoboEarth.
From the research point of view, since cloud robotics is a relatively new field of technology and research, there are many challenges associated with it. One of them is the question of whether cloud robotics will have an impact on the methods traditionally used in Artificial Intelligence, an example being the implementation of a neural network on the cloud with a structure shared by every user.
Acknowledgments Research supported by the "Center of Competence of knowledge technologies for product system innovation in industry and service", with ITMS project number: 26220220155 for years 2012-2015.
References
[1] “Google Apps.” [Online]. Available: http://www.google.com/enterprise/apps/. [Accessed: 05-Jun-2013].
[2] “Microsoft Office 365.” [Online]. Available: http://office.microsoft.com/en-001/. [Accessed: 24-Jul-2013].
[3] E. Guizzo, “Robots With Their Heads in the Clouds,” IEEE Spectrum, 28-Feb-2011.
[4] M. Inaba, “Remote-brained humanoid project,” Advanced Robotics, vol. 11, no. 6, pp. 605–620, 1996.
[5] I. T. Foster, Y. Zhao, I. Raicu, and L. Shiyong, “Cloud Computing and Grid Computing 360-Degree Compared,” in 2008 Grid Computing Environments Workshop, 2008, pp. 1–10.
[6] P. Mell and T. Grance, “The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology,” NIST Special Publication 800-145, 2011.
[7] “DropBox.” [Online]. Available: http://www.dropbox.com/. [Accessed: 03-Jun-2013].
[8] “SkyDrive.” [Online]. Available: https://skydrive.live.com/. [Accessed: 05-Jun-2013].
[9] “netduino.” [Online]. Available: http://www.netduino.com/. [Accessed: 22-Jul-2013].
[10] “Raspberry Pi.” [Online]. Available: http://www.raspberrypi.org/. [Accessed: 22-Jul-2013].
[11] “Romo.” [Online]. Available: http://romotive.com/. [Accessed: 25-Jul-2013].
[12] “SmartBot.” [Online]. Available: http://www.overdriverobotics.com/SmartBot/. [Accessed: 25-Jul-2013].
[13] T. Ferraté, “Cloud Robotics - new paradigm is near,” Robotica Educativa y Personal, 20-Jan-2013.
[14] Y. Chen, Z. Du, and M. García-Acosta, “Robot as a Service in Cloud Computing,” 2010 Fifth IEEE International Symposium on Service Oriented System Engineering, pp. 151– 158, Jun. 2010.
[15] R. Arumugam, V. R. Enti, K. Baskaran, and A. S. Kumar, “DAvinCi: A cloud computing framework for service robots,” in 2010 IEEE International Conference on Robotics and Automation, 2010, pp. 3084–3089.
[16] “MyRobots.com.” [Online]. Available: http://myrobots.com. [Accessed: 08-Jun-2013].
[17] H. Li, “A*Star Social Robotics.” [Online]. Available: http://www.asoro.a-star.edu.sg/index.html. [Accessed: 13-Jun-2013].
[18] “RoboEarth Project.” [Online]. Available: http://www.roboearth.org/. [Accessed: 03-Jun-2013].
[19] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1150–1157.
[20] G. Yu and J.-M. Morel, “A fully affine invariant image comparison method,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 1597–1600.
[21] Y. Ke and R. Sukthankar, “PCA-SIFT: a more distinctive representation for local image descriptors,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, vol. 2, pp. 506–513.
[22] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-Up Robust Features (SURF),” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, Jun. 2008.
[23] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” in European Conference on Computer Vision, 2006, pp. 404–417.
[24] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: an efficient alternative to SIFT or SURF,” in IEEE International Conference on Computer Vision, 2011, pp. 2564–2571.
[25] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in European Conference on Computer Vision, 2006, pp. 430–443.
[26] E. Rosten, R. Porter, and T. Drummond, “Faster and better: a machine learning approach to corner detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105–119, Jan. 2010.
[27] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary Robust Independent Elementary Features,” in European Conference on Computer Vision, 2010, pp. 778–792.
[28] P. Sinčák, M. Hric, and J. Vaščák, “Membership Function-ARTMAP Neural Networks,” TASK Quarterly, vol. 7, no. 1, pp. 43–52, 2003.
[29] P. Smolár, “Object Categorization using ART Neural Networks,” Technical University of Kosice, 2012.
[30] G. A. Carpenter and S. Grossberg, “The ART of adaptive pattern recognition by a self-organizing neural network,” Computer, vol. 21, no. 3, pp. 77–88, 1988.
[31] G. A. Carpenter and S. Grossberg, “Adaptive Resonance Theory,” MIT Press, Boston, 2003.
[32] F. S. Cohen, Z. Fan, and M. A. Patel, “Classification of rotated and scaled textured images using Gaussian Markov random field models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 2, pp. 192–202, 1991.
[33] G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, “Texture feature analysis using a gauss-Markov model in hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 7, pp. 1543–1551, Jul. 2004.
[34] M. Berthod, Z. Kato, S. Yu, and J. Zerubia, “Bayesian image classification using Markov random fields,” Image and Vision Computing, vol. 14, no. 4, pp. 285–295, May 1996.
[35] R. A. Gopinath, “Maximum likelihood modeling with Gaussian distributions for classification,” in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP ’98 (Cat. No.98CH36181), 1998, vol. 2, no. 914, pp. 661–664.