3. Marker-based tracking
3.1 Marker detection
3.1.2 Pre-processing
Before the actual detection of the marker, the system needs to obtain an intensity image (a greyscale image). If the captured image is in another format, the system converts it; for example, an RGB image is converted into an intensity image using a well-known technique (see [73], for example). From now on, we assume that the marker detection system operates on a greyscale image.
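As an illustration of this conversion, the following minimal sketch computes a weighted sum of the colour channels. The function name rgb_to_grey and the ITU-R BT.601 luma weights are assumptions made for this example; they are one common choice and not necessarily the technique described in [73].

```python
def rgb_to_grey(image):
    """Convert rows of (r, g, b) tuples (0-255) into rows of intensities.

    Assumption: ITU-R BT.601 luma weights; other weightings are also in use.
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in image
    ]


if __name__ == "__main__":
    # A tiny 1x3 example image: white, black and pure green pixels.
    print(rgb_to_grey([[(255, 255, 255), (0, 0, 0), (0, 255, 0)]]))
```

The green channel receives the largest weight because the eye is most sensitive to green, so a plain average of the three channels would generally give a less natural intensity image.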
The first task of the marker detection process is to find the boundaries of the potential markers. Detection systems use two approaches: either they first threshold an image and search for markers from the binary image, or they detect edges from a greyscale image. These lower level image-processing tasks (thresholding, edge detection, line fitting, etc.) are well known and are therefore not discussed in detail here. For further information, consult [73, 74], for example.
Figure 24. On the left, the original image; on the right, the image after adaptive thresholding (example produced with ALVAR).
Marker detection systems using the threshold approach normally use an adaptive thresholding method (e.g. [75]) to cope with local illumination changes (see Figure 24). After thresholding, the system has a binary image consisting of a background and objects. All objects are potential marker candidates at this stage. The next step is usually to label all of them or otherwise keep track of the objects. Suitable labelling algorithms are presented, for example, in [73]. During the labelling process, the system may reject objects that are too small or are otherwise clearly something other than markers. We discuss these fast acceptance/rejection tests further in Section 3.1.3. Next, the edges of all potential markers are marked (Figure 25) and their locations are undistorted for line fitting (Figure 26). After line fitting, the system tests the potential markers again, checking whether they have exactly four straight lines and four corners. Finally, the system optimises the corner locations to sub-pixel accuracy; these optimised locations are used in further calculations. Figure 27 shows the marker coordinate system and an augmentation on top of the marker.
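To make the idea of adaptive thresholding concrete, the sketch below compares each pixel with the mean intensity of its local neighbourhood, computed with an integral image. This is only one simple variant, not necessarily the method of [75] or the one used in ALVAR; the function name and the parameters radius and offset are assumptions chosen for the example.

```python
def adaptive_threshold(grey, radius=1, offset=5):
    """Return a binary image: 1 where a pixel is darker than (local mean - offset).

    Assumption: a square window of side 2 * radius + 1, clipped at the borders.
    """
    height, width = len(grey), len(grey[0])
    # integral[y][x] holds the sum of all pixels above and to the left of (x, y).
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += grey[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Compare pixel * area with (mean - offset) * area to stay in integers.
            if grey[y][x] * area < total - offset * area:
                binary[y][x] = 1
    return binary
```

Because the comparison is made against a local mean rather than a single global value, a slow brightness gradient across the image does not by itself turn pixels into foreground, whereas a locally dark region such as a marker border does.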
Figure 25. An example of edge detection using ALVAR. On the left: the edge contours detected on the thresholded image (Figure 24) are superimposed onto the original image in red. On the right: the remaining edges after applying the four-corner test and size test.
Edge detection in greyscale images is time-consuming; therefore, marker detection systems using this approach generally use sub-sampling and detect edges only on a predefined grid [76]. As this approach results in separate line points, the marker detection system needs to link edge pixels into segments. Systems usually group these segments into longer lines using edge sorting. As the system samples the original points using a coarse grid, it needs to extend the lines to full length to find the exact corners of the marker. A common procedure is to use the gradient information of the original image to extend the edges to full length.
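The grid sampling step described above can be sketched as follows. The example only collects isolated edge points along every step-th row and column; linking them into segments and extending the lines are not shown. The function name, the simple intensity-difference test and the parameters step and min_gradient are assumptions for illustration and do not reproduce the method of [76].

```python
def grid_edge_points(grey, step=5, min_gradient=40):
    """Collect edge points along every `step`-th row and column of the image.

    Returns (x, y, direction) tuples: "h" marks a point found on a horizontal
    scanline, "v" a point found on a vertical scanline. A point is reported at
    the last pixel before the intensity changes by at least `min_gradient`.
    """
    height, width = len(grey), len(grey[0])
    points = []
    for y in range(0, height, step):          # horizontal scanlines
        for x in range(width - 1):
            if abs(grey[y][x + 1] - grey[y][x]) >= min_gradient:
                points.append((x, y, "h"))
    for x in range(0, width, step):           # vertical scanlines
        for y in range(height - 1):
            if abs(grey[y + 1][x] - grey[y][x]) >= min_gradient:
                points.append((x, y, "v"))
    return points
```

With step = 5, only about every fifth row and column is visited, which is where the time saving comes from; the price is that corners falling between scanlines are missed until the lines are extended to full length.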
Figure 26. An example of line fitting using ALVAR. On the left, line fitting in undistorted coordinates; deduced corner point locations are marked with circles. On the right, detected lines over the original image.
Applications use several methods for line detection, line fitting and line sorting. In general, methods based on edge sorting (such as the one in ARTag and the method presented in [76]) are robust against partial occlusion but are computationally more expensive, which makes them unsuitable for current mobile devices.
Traditionally in photogrammetry, the whole image is undistorted (commonly before pre-processing) using the inverse distortion function calculated during the camera calibration process. This is a suitable approach in off-line computer vision applications. In real-time augmented reality applications, systems typically undistort only the locations of feature points (e.g. the detected edges of a marker) to speed up the system. We discuss camera distortions further in Section 3.2 and explain them in detail in Appendix B.
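Undistorting individual points rather than the whole image can be sketched as below. The example assumes a purely radial distortion model with two coefficients, k1 and k2, applied to normalised image coordinates (principal point subtracted, divided by the focal length), and inverts it by fixed-point iteration. The model, the function names and the coefficient values are assumptions for illustration; the distortion model used in this work is described in Appendix B.

```python
def distort(xu, yu, k1, k2):
    """Apply the assumed radial model to an undistorted normalised point."""
    r2 = xu * xu + yu * yu
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return xu * factor, yu * factor


def undistort_point(xd, yd, k1, k2, iterations=20):
    """Invert the radial model for one point by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the distortion
    factor evaluated at the current estimate. This converges for moderate
    distortion; it is a sketch, not a general-purpose solver.
    """
    xu, yu = xd, yd
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / factor, yd / factor
    return xu, yu


if __name__ == "__main__":
    # Round trip with hypothetical coefficients: distort a point, then recover it.
    xd, yd = distort(0.3, 0.2, k1=-0.2, k2=0.05)
    print(undistort_point(xd, yd, k1=-0.2, k2=0.05))
```

The cost of this procedure grows with the number of feature points rather than with the number of pixels, which is why it suits real-time use when only a few hundred edge points per frame need correcting.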
Figure 27. A cube augmented on top of a detected marker. The marker coordinate system (X,Y,Z) is rendered with (red, green, blue). Example produced using ALVAR.
Even small errors in the detected 2D locations of edges and corners significantly affect the calculated pose of the camera [77–79]. Detection errors can be caused by pixel quantisation, an incorrect threshold value, motion blur, noise, etc. These errors cause annoying jitter in an object’s pose even if the camera hardly moves.
In order to increase accuracy, detection systems optimise the locations after initial detection.
For example, if the system detects a marker from a threshold image, it may use the greyscale image to find edges or corners with higher accuracy. The system may also use the detected corners as an initial estimate for a more accurate corner detection method than the one initially used. Sometimes systems estimate motion blur from detected marker edges and compensate for it to achieve higher accuracy [80, 81]. Pixel quantisation errors are especially obtrusive if a marker edge coincides with a pixel coordinate axis: the whole edge may jump from one pixel line to another. In diagonal edges, errors occur in several directions and edge fitting stabilises these.
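The stabilising effect of edge fitting can be illustrated with a small sketch that fits a line to the edge points of each marker side and takes the corner as the intersection of two fitted lines. The fit below is a total least squares fit (it minimises perpendicular distances); this choice and the function names fit_line and intersect are assumptions for the example, not the specific procedure of any particular detection system.

```python
import math


def fit_line(points):
    """Total least squares line fit to (x, y) points.

    Returns (cx, cy, dx, dy): the centroid of the points and a unit vector
    along the direction of largest variance.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return cx, cy, math.cos(theta), math.sin(theta)


def intersect(line_a, line_b):
    """Intersection of two lines given as (cx, cy, dx, dy); None if parallel."""
    ax, ay, adx, ady = line_a
    bx, by, bdx, bdy = line_b
    det = adx * bdy - ady * bdx
    if abs(det) < 1e-12:
        return None
    t = ((bx - ax) * bdy - (by - ay) * bdx) / det
    return ax + t * adx, ay + t * ady


if __name__ == "__main__":
    # Edge points of two marker sides, with the first side rounded to whole pixels.
    side_a = [(x, round(0.5 * x + 1.0)) for x in range(10)]
    side_b = [(4, y) for y in range(10)]
    print(intersect(fit_line(side_a), fit_line(side_b)))
```

Since each fitted line averages over many edge points, the rounding errors of individual points partly cancel, and the intersection is a real-valued corner location rather than a whole-pixel one.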
The number of edges and corners in the original image may be huge, and if all of them were analysed initially with high accuracy, the system would waste a lot of capacity in processing non-marker information. Therefore, these kinds of two-step approaches are common in real-time AR applications.