processes to Augmented Reality for Remote Access Laboratories. Applying reliable and efficient parametric methods supports real-time object tracking for the follow-on sub-systems.
10 Computer Vision Object Detection
This chapter describes the current Computer Vision models associated with extraction of key digital image attributes. Existing Computer Vision models are reviewed to assess their capabilities to operate within the Augmented Reality Remote Access laboratory environment, including methods developed as part of this research.
Object detection models within the CV field are built upon several models which rely on separating the signal from the noise. Computer Vision techniques such as edge or corner point detection find interest points associated with objects by detecting discontinuities in the image colour or intensity gradients. Instead of attempting to map discrete points to features of an object within an image, alternative methods for object detection classify portions of an image as belonging to a group. The testing infrastructure, testing regime and interfaces involved in object detection are explained in this chapter.
Augmented Reality systems require CV systems to gather information about the environment, in order to interface with it and successfully immerse the user in it. Object detection is a key requirement for AR RAL systems, and this chapter focusses on assessing the CV object detection models that are suitable for the application. The contributions of this chapter consist of the testing and assessment of several CV object detection models to ascertain their capability of achieving the goal of this research: to perform object detection and tracking in real-time, without the use of fiducial markers.
Effective learning-based object detection models, such as Convolutional Neural Networks (CNN) [229], rely upon comprehensive training sets. According to Girshick et al. [230], research into feature detection systems has failed to achieve improvements in performance in recent years, even considering the extensive penetration of CNN models into the field. While their recent derivative of a CNN system is very effective, it is still dependent upon training set data. All Artificial Intelligence (A.I.) systems, such as genetic algorithms or neural network systems, are precluded from this study because of their requirement for extensive training. Histogram of Oriented Gradients (HOG) [205, 231] is another approach to functional object detection. Based on both boundary gradient orientation and image histogram signatures, the system readily locates objects and features within an image. The system is effective at discovering key objects within an image, but to locate a specific object it also requires considerable training [205]. While these systems are very successful, the goal of this research, as stated previously, is to discover models that deduce all the necessary information directly from the live video stream. Some attributes of the HOG model have been retained for a unique SIFT-style implementation created as a result of this research, and described in Section 11.7, Markerless Tracking.
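As a point of reference for the gradient-orientation histograms mentioned above, the following is a minimal sketch of the histogram computed for a single image cell. It is an illustration only, under stated assumptions: the function name `orientation_histogram`, the nine 20-degree bins and the central-difference gradients are choices made here for the example, and the code is neither the full HOG descriptor of [205] nor the SIFT-style implementation developed in this research.

```python
import math

def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `patch` is a 2-D list of grey-level intensities (hypothetical input format).
    Orientations are folded into the range 0 to 180 degrees.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbours.
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]  # horizontal gradient
            gy = patch[r + 1][c] - patch[r - 1][c]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against floating-point results landing on 180.0.
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist

# A patch containing a single vertical edge: all gradient energy is horizontal.
edge_patch = [[0, 0, 100, 100] for _ in range(4)]
histogram = orientation_histogram(edge_patch)
```

For the vertical edge in `edge_patch`, all of the gradient magnitude falls into the first bin, which is the kind of signature a HOG-style detector compares against trained templates.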
Research contributions from this chapter involve methods to discover objects within the test image in a manner usable by Computer Vision object tracking systems. Additionally, a new histogram mode has been developed (see Chapter 8, Two-Dimensional Colour Histogram Object Signatures) which provides a medium for effective image segmentation and object matching.
The chapter is structured as follows: Section 10.1 explains the methodology employed to validate the various experiments. Section 10.2 defines segmentation CV models, while Section 10.3 explains parametric object detection methods. The results of experimentation are discussed in Section 10.4, and Section 10.5 summarises the outcomes and contributions.
10.1 Experimentation Methodology
Identifying an object within a digital image data set is non-trivial. Understanding the attributes that characterise an object, extracting the appropriate attributes and then matching them to a signature associated with the object is both complex and time-consuming. The type of CV object detection method determines the strategy used to validate the model’s capabilities.
All experiments were conducted on a dual Intel Quad Core i7-4790 CPU @ 3.6 GHz computer, with 8 GB of RAM, running the 64-bit Windows 8.1 (build 9200) operating system. The video card is an AMD Radeon R7 200 Series, with a CINEBENCH R15 OpenGL score of 38.97 fps and a CPU score of 709.
10.1.1 Segmentation Validation
Object detection by image segmentation functions by collecting homogeneous attributes within the image. Segmentation produces an array of non-standard shapes, which may or may not be linked with the object of interest. Evaluating the object shape becomes difficult, as metrics for non-standard shapes require a method to quantify the object’s attributes. In isolation, segmentation can indicate the location, shape and size of the required object, but provides very little other supporting information [232] that is useful for follow-on sub-processes such as object tracking. Because of the need to perform object tracking, this research developed an object detection method to locate objects between consecutive video frames. The detection method is called the Hotspot method, and it filters the image under test with the attributes of the specific segmentation method. This object detection method does not attempt to identify the object. That is to say, it is not necessary to know that the object of interest is a car or a kangaroo, only that the attributes associated with the object are able to create a hotspot within the image that can be found in consecutive video frames.
Segmentation attributes can be used to perform an additional (secondary) segmentation. Classification of each pixel is a binary test of the pixel attributes: matched attributes set the pixel classification to foreground, otherwise the pixel is classified as background. Hotspot detection is then easily able to locate the mass of foreground pixels associated with the object. Figure 10-1 demonstrates the effects of hotspot detection. The model image’s attributes (Figure 10-1 (a)) are used to segment the composition image (Figure 10-1 (b)) to produce the hotspot image shown in Figure 10-1 (c). The object of interest has a higher density of foreground pixels (the hotspot) than anywhere else within the composition image.
An object associated with a hotspot no longer consists of spatial data. Hotspot detection is only interested in locating the region of the image with the highest density of set pixels. As this work is focussed on Augmented Reality functionality in Remote Access Laboratories, hotspot detection for tracking becomes simpler. See Section 11.2.2, Segmentation Matching, for more details. For this chapter, the assessment of CV object detection models based on segmentation is concerned with the model’s ability to isolate sufficient object pixels to create a density structure.
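The binary classification and density search described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the Hotspot implementation used in this research: the pixel attribute is assumed to be a single grey-level intensity range, the hotspot is assumed to be the fixed-size square window containing the most foreground pixels, and the names `segment`, `integral_image` and `find_hotspot` are hypothetical.

```python
from itertools import accumulate

def segment(image, lo, hi):
    """Binary test per pixel: 1 (foreground) if lo <= value <= hi, else 0."""
    return [[1 if lo <= px <= hi else 0 for px in row] for row in image]

def integral_image(mask):
    """Summed-area table with a leading row and column of zeros."""
    width = len(mask[0])
    table = [[0] * (width + 1)]
    for row in mask:
        running = [0] + list(accumulate(row))  # row-wise cumulative sums
        table.append([above + cur for above, cur in zip(table[-1], running)])
    return table

def find_hotspot(mask, win):
    """Return (row, col, count) for the win x win window with most foreground pixels."""
    sat = integral_image(mask)
    best = (0, 0, -1)
    for r in range(len(mask) - win + 1):
        for c in range(len(mask[0]) - win + 1):
            # Foreground count inside the window, from four table lookups.
            count = (sat[r + win][c + win] - sat[r][c + win]
                     - sat[r + win][c] + sat[r][c])
            if count > best[2]:
                best = (r, c, count)
    return best

# Assumed example: a bright 2 x 2 object plus one isolated bright noise pixel.
image = [
    [10, 12, 11, 10, 13, 12],
    [11, 200, 210, 12, 11, 10],
    [12, 205, 198, 11, 201, 12],
    [10, 11, 12, 10, 11, 13],
]
mask = segment(image, 190, 220)
hotspot = find_hotspot(mask, 2)
```

In this example the isolated bright pixel is classified as foreground but does not form a dense region, so the search settles on the 2 x 2 object. The summed-area table is one convenient way to count foreground pixels per window in constant time; any other density estimate would serve the same illustrative purpose.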
10.1.2 Parametric Validation
Parametric style object detection models provide a unique set of attributes associated with the object of interest. The method of identifying the attributes and then measuring their kinship to the signature of the object of interest becomes the parametric model. Statistical analysis has provided a suite of data comparison tools. Analysing all potential tools is beyond the scope of this research, but the tools selected are common industry methods and fall inside the requirements of this research in providing fast processing for the data sets presented.
The majority of parametric object detection models provide data sets in a form that is suitable for testing using the least Sum of Squared Differences (SSD), shown in Equation 10-1. Comparing pairs of attributes from the template (prototype) and the
Figure 10-1. Segmentation example. (a) Target object; (b) composition image including the target object; (c) segmentation showing hotspot locations.
$$\mathrm{SSD} = \sum_{attrib_{xy}} \left(a_p - a_s\right)^2 \tag{10-1}$$