Equation 4-3. Bitonal pixel model

5.3 Validation and Verification

This research builds ground truth images using a novel approach that minimises the subjective nature of defining ground truth features. The approach applies multiple Computer Vision (CV) edge detection models to the image, building an extended edge map of potential interest points. A trend of edges (or votes) develops as each CV model is applied, and the raw trend data forms the basis of the ground truth image. Test image GT-03 is shown in Figure 5-3 as it accumulates votes from the fifteen edge detectors listed below.

• Laplacian 3 x 3 [13]

• Gradient Derivative

• Gradient Edge (First Derivative)

• Gradient Edge (Second Derivative)

• Laplacian 5 x 5 [13]

• Laplacian of Gaussian [14]

• Sobel 3 x 3 [10]

• Sobel (Absolute)

• Prewitt 3 x 3 [12]

• Kirsch 3 x 3 [11]

• Canny [15]

• Homogeneity

• Compass Sobel

• Compass Prewitt

• Compass Kirsch

Figure 5-2. Extreme edge detection results of Airplane Image (top) Left: Gradient (Second Derivative) Edge Detector

For each CV model, every pixel assigned to that model's edge map increments the corresponding pixel vote within the ground truth image. This effect is visible in the images shown in Figure 5-3. The clarity of the ground truth image also degrades as noise from each CV edge detection model builds up, as is apparent in the progression from the minimal detail of the first image (top-left) to the overexposed last image (bottom-right). However, accumulating pixel votes uncovers the primary concentration of commonly detected edge pixels.
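The vote accumulation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this research: it assumes each detector has already produced a binary edge map of identical size, represented here as plain nested lists, and the function name `accumulate_votes` and the sample data are hypothetical.

```python
def accumulate_votes(edge_maps):
    """Sum binary edge maps (lists of rows of 0/1) into a per-pixel vote map.

    Assumption: every edge map has the same height and width.
    """
    height = len(edge_maps[0])
    width = len(edge_maps[0][0])
    votes = [[0] * width for _ in range(height)]
    for edge_map in edge_maps:
        for y in range(height):
            for x in range(width):
                if edge_map[y][x]:
                    # This detector marked the pixel as an edge: one more vote.
                    votes[y][x] += 1
    return votes


# Three hypothetical 2 x 3 edge maps standing in for three detector outputs.
maps = [
    [[1, 0, 0], [0, 1, 0]],
    [[1, 0, 1], [0, 1, 0]],
    [[1, 0, 0], [0, 0, 0]],
]
votes = accumulate_votes(maps)
```

With fifteen detectors, each pixel would hold a vote count between 0 and 15. Pixels that most detectors agree on stand out from pixels marked by only one or two.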

As can be seen in Figure 5-3, the accumulated edge detection process produces significant noise. The distribution of votes across the ground truth image, shown in Figure 5-4, could be considered to follow a single-sided Gaussian distribution, so two methods were considered to improve the raw vote trend data: manually remove the noise, or apply a filter to the data. Filtering the accumulated edge votes by removing pixels with votes outside one standard deviation can minimise some of the subjective choices. Figure 5-5 compares the raw trend edge data with the Gaussian distribution filtered data. With both sets of data available, completing a ground truth image set is straightforward.
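One possible reading of the standard deviation filter is sketched below. The text does not state which reference point or which side of the distribution is used, so this sketch makes two explicit assumptions. First, the statistics are computed over pixels that received at least one vote. Second, a pixel is kept only if its vote count is at least `k` standard deviations above the mean. The function name `filter_votes` and the parameter `k` are hypothetical.

```python
from statistics import mean, pstdev


def filter_votes(votes, k=1.0):
    """Zero out pixels whose vote count falls below mean + k * std.

    Assumption: mean and standard deviation are taken over voted pixels only;
    the source does not specify the exact rule.
    """
    counts = [v for row in votes for v in row if v > 0]
    if not counts:
        # No detector voted anywhere: nothing to keep.
        return [[0] * len(row) for row in votes]
    threshold = mean(counts) + k * pstdev(counts)
    return [[v if v >= threshold else 0 for v in row] for row in votes]


# Hypothetical vote map: weak single-vote noise around one strong edge pixel.
raw_votes = [
    [1, 1, 1],
    [1, 5, 1],
]
filtered = filter_votes(raw_votes)
```

In this example only the strongly supported pixel survives. A different threshold rule, such as one measured from the mode of the distribution, would need a different cut-off expression but the same structure.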

Human intervention is still required to assess the trends and hardcode the ground truth regions, but working from the trends minimises the effects of human subjectivity. It was felt that combining multiple CV model trends with human decisions would provide the best of both capabilities for developing the ground truth images.

Figure 5-4. Edge Detector voting distribution

Figure 5-5. Ground Truth images, comparison against Gaussian weighted hits

Colour coding of the ground truth images allows classification of true and false positives, along with true and false negatives. Black dots or lines represent true positive (TP) pixels: valid corners, feature points or edges. Red areas are regions that should not receive any hits from a CV detector, so a false positive (FP) is recorded if a CV model places anything in the region. White represents regions where spurious results may appear but the data is of no interest. For example, white areas can be associated with textured backgrounds or inconsistent lighting, which are not of any interest for testing. Any point, line or feature point located within a don’t care region is not recorded. Human intervention was necessary to determine unimportant regions (don’t care areas), but equally important was the assessment of whether all important features were accounted for.

Within the ground truth image, pixels are allocated a membership to one of three possible classifications, as defined by their colour coding.

• Key Point: This is a pixel that indicates an edge or feature point that the model under review must detect.

• No Point: This is a pixel that is not an edge or feature point, and that the model under review must not detect.

• Don’t Care: Represents pixels that are not relevant to the detection process. In most situations this represents regions or features that are unimportant to the goals of the model.

Pixels classified as Don’t Care are primarily selected in regions of the image which are not analysed by the CV model. For example, a CV system monitoring road traffic does not care about edge detection of the nearby trees. While the CV models may detect the trees to varying degrees, the application of the model means that the region will not be considered as part of the effectiveness score.
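The three-way classification can be illustrated with a short scoring sketch. The labels `"K"`, `"N"` and `"D"` and the function name `score_detection` are hypothetical stand-ins for the black, red and white colour coding. The treatment of missed Key Point pixels as false negatives and undetected No Point pixels as true negatives is an assumption that follows the usual confusion matrix convention, not a rule stated in the text.

```python
KEY_POINT, NO_POINT, DONT_CARE = "K", "N", "D"


def score_detection(ground_truth, detected):
    """Compare a detector's binary output against a labelled ground truth.

    ground_truth: rows of "K", "N" or "D" labels (hypothetical encoding).
    detected: rows of 0/1 flags with the same shape as ground_truth.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for gt_row, det_row in zip(ground_truth, detected):
        for label, hit in zip(gt_row, det_row):
            if label == DONT_CARE:
                # Don't Care pixels never contribute to the score.
                continue
            if label == KEY_POINT:
                counts["TP" if hit else "FN"] += 1
            else:
                counts["FP" if hit else "TN"] += 1
    return counts


ground_truth = [
    ["K", "N", "D"],
    ["K", "N", "N"],
]
detected = [
    [1, 1, 1],
    [0, 0, 0],
]
counts = score_detection(ground_truth, detected)
```

The hit in the top-right Don’t Care pixel is ignored, which mirrors the road traffic example: detections on the nearby trees neither help nor penalise the model.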

5.4 Summary

Reducing the subjective decisions regarding appropriate features of an image enhances the reputation of the gold standard. By having Computer Vision image analysis models vote on the location of key features, reliable validation, verification and classification of model trials is possible with little human intervention. Some subjective decisions may still be made, if necessary.

Further improvement of the ground truth image may occur manually, if desired, depending upon the requirements of the work. Figure 5-6 demonstrates some additional subjective work on the intermediate composite statistically filtered image. Some lines have been completed, and regions have been marked as Don’t Care (in white) because, for the current tests, these regions were not important for object detection of an aeroplane. This method was employed to create all the ground truth CV image analysis files for this research. The original image and the matching ground truth images can be seen in Test Images and Ground Truth Images.

With access to reliable ground truth image files, edge detection models can be tested using binary performance classifiers to score measures such as accuracy, precision and sensitivity. This methodology is discussed in the relevant chapters.
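For reference, the three measures named above can be computed from confusion matrix counts using their standard definitions. The sketch below is illustrative only. The function name `classifier_metrics` and the sample counts are hypothetical, and returning 0.0 for an empty denominator is a convenience choice, not something taken from the text.

```python
def classifier_metrics(tp, fp, fn, tn):
    """Return accuracy, precision and sensitivity from confusion counts."""
    total = tp + fp + fn + tn
    return {
        # Fraction of all scored pixels classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of detected pixels that are genuine key points.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Fraction of genuine key points that were detected (recall).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for one detector on one test image.
metrics = classifier_metrics(tp=8, fp=2, fn=2, tn=8)
```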

6 Computer Vision Image Analysis for Augmented Reality Systems

This chapter describes the current Computer Vision models associated with extraction of key interest points such as edge, corner and feature points. Computer Vision models are assessed to measure their capabilities to operate within the Augmented Reality Remote Access Laboratory environment.

Digital images are analysed to gather knowledge from the image data set. This knowledge allows an understanding of the components within the image, for use by higher function processes. Within visual AR systems, image analysis interprets the raw information sets for follow-on processes such as object tracking or other AR sub-systems. The major contributions of this chapter comprise ascertaining the performance classifiers for edge detection models, and the validation and verification of object detection models within digital images, which support the requirements of AR RAL environments.

Recent advances in high definition video screens and monitors, plus the use of multi-chip digital image capture devices, have allowed vast improvements in Computer Vision systems, but at the cost of increased data set sizes. The distribution of pixel colours throughout an image is processed in an attempt to understand the scene, and to extract details of objects and their relationship to other objects. Three primary methods are employed as a first step towards information discovery: segmentation, edge/corner detection, and feature detection.

• Segmentation involves classifying pixels as either foreground or background, based on the criteria of the current model [7].

• Edge or corner detection locates boundaries associated with geometric discontinuities [188].

• Feature detection isolates key aspects of the image in order to identify objects or key reference points [189].

Many image analysis models are computationally expensive, and even with current high-end graphics workstations, real-time processing is difficult to achieve [190]. This research performs CV image analysis on real-world images to locate key interest points in support of follow-on CV object detection and tracking systems. Computer Vision models are also assessed to determine their ability to operate in real-time, a requirement of AR RAL systems.

This chapter is structured as follows: Section 6.1 explains the experimentation methodology, while sections 6.2, 6.3 and 6.4 define the edge, corner and feature point image analysis models. Section 6.5 provides the results of CV image analysis verification and validation, while section 6.7 summarises the research.
