Detection of Content in the Viewfinder - Visual Recognition of an External Display

5.2 Visual Recognition of an External Display

5.2.1 Detection of Content in the Viewfinder

Before any matching can take place, the content needs to be identified and separated from the background. We assume a high contrast between the background and the foreground layer. This contrast makes the two layers easier to separate for the human eye and improves recognition [AHH+02]. Similar to the visual recognition process of the human eye, we use this contrast to separate the content from the background. As pictures taken by mobile devices usually have a lower quality than those taken by high-end digital cameras, the image needs to be preprocessed slightly to maximize the recognition rate. The first step is to reduce noise and further increase the contrast of the image, which later allows a clear identification of the content’s boundaries. The increased contrast then allows separating the content from any background using a binary filter (i.e., a pixel becomes black if its brightness is below a given value, and white otherwise). The image now contains only black (i.e., background) and white (i.e., content) regions. As we are only interested in the content, we can dismiss the background by finding connected components in the image. Each connected component describes the rectangle that encloses the item (i.e., its bounding box) and contains the binary sub-image. These so-called blobs are further filtered in two steps: first, blobs that are either too big (e.g., the screen in its entirety) or too small (e.g., remaining noise in the picture) are deleted. Second, blobs that lie entirely within another valid blob are also removed to avoid ambiguity. The remaining blobs now need to be matched to existing content on the external display. Figure 5.1 summarizes the image preprocessing.

Figure 5.1: Preprocessing of pictures taken by a mobile device: the contrast of the original image (a) is first increased (b) for better contour detection. Thresholding the image allows removing the background (c). A blob detection finds connected components (d). These blobs are then filtered by removing blobs that are too small (or too large) as well as dismissing blobs that are placed entirely inside others (e). The remaining blobs are then used to “cut” the regions of the original image (f).
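The thresholding, blob detection, and blob filtering steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation used in this work: the image is a nested list of brightness values, the noise reduction and contrast enhancement are assumed to have happened already, components are 4-connected, and the names `threshold`, `find_blobs`, `filter_blobs` as well as the `cutoff`, `min_area`, and `max_area` parameters are hypothetical.

```python
from collections import deque

def threshold(gray, cutoff=128):
    """Binary filter: 1 (white, content) if brightness >= cutoff, else 0 (black)."""
    return [[1 if px >= cutoff else 0 for px in row] for row in gray]

def find_blobs(binary):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected white components."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill one component and track its extent.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def box_area(box):
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

def contains(outer, inner):
    """True if `inner` lies entirely within the different box `outer`."""
    return (outer != inner and outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def filter_blobs(boxes, min_area, max_area):
    """Step 1: drop too small or too large blobs. Step 2: drop nested blobs."""
    sized = [b for b in boxes if min_area <= box_area(b) <= max_area]
    return [b for b in sized if not any(contains(o, b) for o in sized)]
```

Suitable values for `cutoff`, `min_area`, and `max_area` depend on the camera resolution and the expected size of the content, so they would have to be tuned for a concrete setup.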

If users hold the device exactly in front of the display without any rotation (i.e., the horizontal axes of both displays match), these blobs already give a good match between the taken picture and the external screen. This restriction is nearly impossible to maintain if we want to allow multiple users to interact with an external display simultaneously. Multiple users lead to arbitrary angles and distances between mobile device and target display, as they have at least slightly different viewing positions. Hence, the correct transformation between the mobile device’s image plane and the external display’s image plane needs to be calculated. Assuming that we have rectangular content, we first need to identify the corner points of each detected blob’s image. We chose to use the Hough Transform to detect the lines along each blob’s outline, which can then be intersected to identify the corner points. The processing time of the Hough Transform depends on the number of pixels that may lie on a line (here: each white pixel in the image). As we are only interested in the outline, we can dismiss all pixels inside the outline (i.e., turn them black). This is done by keeping the first and the last detected white pixel of each row and each column and turning all pixels in between black. Now, the Hough Transform only tests pixels that already describe the outline of each blob. It results in four lines, each described by its distance to the image’s center as well as the angle of the line’s normal. To find the corner points of each blob, the four detected lines are intersected. Figure 5.2 summarizes the process.
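The outline reduction can be sketched as follows. The sketch rests on one interpretation of the description above: the row scan and the column scan are both run on the original blob image and their results are combined, since running them one after the other would erase parts of the vertical edges. The function name `outline` is hypothetical.

```python
def outline(binary):
    """Keep only the first and last white pixel of every row and every column."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    # Row scan: leftmost and rightmost white pixel per row.
    for y in range(h):
        xs = [x for x in range(w) if binary[y][x]]
        if xs:
            out[y][xs[0]] = out[y][xs[-1]] = 1
    # Column scan: topmost and bottommost white pixel per column.
    for x in range(w):
        ys = [y for y in range(h) if binary[y][x]]
        if ys:
            out[ys[0]][x] = out[ys[-1]][x] = 1
    return out
```

For a filled blob of n × n pixels, this leaves roughly 4n pixels for the line detection to vote on instead of n², which is where the saving in processing time comes from.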

Figure 5.2: Detecting polygons in the captured image for a good (top) and bad (bottom) blob. From left to right: The contour of the detected blob is first calculated. A Hough transform calculates lines (denoted in red). Intersecting these lines leads to a polygon (depicted in green). The polygon is used to “cut” the respective portion out of the taken image.
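Intersecting the detected lines amounts to solving a small linear system. The following sketch assumes that each line is given as a pair (rho, theta) in normal form, x·cos(theta) + y·sin(theta) = rho, with coordinates measured from the image center. The helper names `intersect` and `quad_corners`, and the strategy of pairing the two most parallel lines as opposite edges, are illustrative assumptions rather than the procedure used in this work.

```python
import math

def intersect(line_a, line_b, eps=1e-9):
    """Intersection of two lines given as (rho, theta); None if (nearly) parallel."""
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    a1, b1 = math.cos(theta1), math.sin(theta1)
    a2, b2 = math.cos(theta2), math.sin(theta2)
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    # Cramer's rule for the 2x2 system a_i * x + b_i * y = rho_i.
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return (x, y)

def quad_corners(lines):
    """Corner points of the quadrilateral bounded by four lines.

    The line most parallel to the first one is treated as its opposite edge;
    the remaining two lines form the other pair of opposite edges.
    """
    first, rest = lines[0], list(lines[1:])
    partner = min(rest, key=lambda l: abs(math.sin(l[1] - first[1])))
    rest.remove(partner)
    b0, b1 = rest
    # Walking first -> b1 -> partner -> b0 visits the corners in circular order.
    return [intersect(first, b0), intersect(first, b1),
            intersect(partner, b1), intersect(partner, b0)]
```

If the blob is a poor one (bottom row of Figure 5.2), the detected lines may not bound a sensible quadrilateral, so a real implementation would also need to reject `None` results and implausible corner positions.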

Figure 5.3: Rectification of distorted content in the taken picture. Left shows the principle of the Direct Linear Transformation. The transformation matrix H is invertible. Center shows the rectification with a correct point matching. Right shows the same process if the points were detected in a different order.
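The transformation matrix H named in the caption of Figure 5.3 can be written out in the standard textbook form of a planar projective mapping. The notation below (points (x, y) in the taken picture, points (x', y') in the rectified image, entries h_ij, and the normalization h_33 = 1) is an assumption made for illustration and is not taken from the source.

```latex
\[
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
\sim
H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
H =
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & 1
\end{pmatrix}
\]
% Each point correspondence (x, y) -> (x', y') yields two linear equations:
\[
\begin{aligned}
h_{11} x + h_{12} y + h_{13} - x' \,(h_{31} x + h_{32} y) &= x' \\
h_{21} x + h_{22} y + h_{23} - y' \,(h_{31} x + h_{32} y) &= y'
\end{aligned}
\]
```

Four corner correspondences, no three of which are collinear, give eight equations for the eight unknown entries. Because H is invertible, the inverse matrix maps points of the rectified image back into the taken picture.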

The four detected corner points are used to create a polygon for each blob. This polygon can now be used to counteract the perspective distortion for the visual matching described in the next section. To do so, we apply the Direct Linear Transform to rectify the distorted image [AAK71]. Such methods have recently been used in the hardware of digital cameras (e.g., Casio Exilim H10) as well. This procedure allows the transformation of a perspectively distorted rectangle into an undistorted, two-dimensional one. As the content shown on the remote display is two-dimensional (in our scenario), applying this method to all recognized blobs allows matching two rectified images of the same size. The size of the actual content is still unknown, as no matching has taken place so far. Both the remote content and the distorted images therefore need to be scaled to the same predefined rectangle in order to match them pixel-wise. Choosing the size of this predefined rectangle is a trade-off between the details shown in an image (i.e., the larger the size, the more details are shown) and the processing time (i.e., the smaller the size, the faster the recognition process). As users are allowed to hold the device at arbitrary angles with respect to the display’s horizontal edge, the detected corner points are possibly not in the correct order. This means that the first corner point is not necessarily the top-left corner point of the content. Hence, three further rotated versions (i.e., by 90°, 180°, and 270°) of the rectified image are created. The entire process of rectifying the image and preparing it for image comparison is shown in Figure 5.3.
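The rectification and the handling of the unknown corner order can be sketched in plain Python. Several assumptions apply: the transformation is normalized so that its last entry is 1, the predefined rectangle is a square of side `size`, and instead of rotating the rectified image three times, the sketch computes one transformation per cyclic order of the corner points, which yields the same four candidates. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 matrix (last entry fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per point correspondence.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def warp_point(h, point):
    """Apply the projective transformation h to a single (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def candidate_homographies(corners, size):
    """One transformation per cyclic corner order (0, 90, 180, 270 degree variants)."""
    target = [(0, 0), (size, 0), (size, size), (0, size)]
    return [homography(corners[k:] + corners[:k], target) for k in range(4)]
```

To produce the rectified image itself, one would typically invert each matrix and, for every pixel of the target square, sample the taken picture at the back-projected position. A larger `size` preserves more detail at the cost of more pixels to sample and compare.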

In document Boring, Sebastian (2010): Interacting "Through the Display": A New Model for Interacting on and Across External Displays. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 111-114)