Touch/Hover Classification Algorithms - Runtime Implementation

CHAPTER 4: REALIZATIONS

4.1 Software

4.1.2 Runtime Implementation

4.1.2.2 Touch/Hover Classification Algorithms

Next, we cover our implementation of the two touch/hover classification algorithms in C++ and OpenCV.

4.1.2.2.1 Base Touch Detector

We created a generic base touch detector class that handles components common to both the projection space and plane sweep touch detection algorithms, such as managing cameras and projectors, region of interest masks, background image models, and network communication. Additionally, this class provides helper structures for interfacing with the various lookup tables. Historical information about a touch, such as the assignment of a consistent label and the number of tracked frames (see Section 3.2.2.4), is also maintained by this class, as this information is independent of the specific detection algorithm used. This class is designed to be easily extensible: our implementations of the two touch detection algorithms inherit from it, and other lookup-table-based algorithms could likewise extend it.
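As an illustration of how the algorithm-independent bookkeeping might live in such a base class, the following is a minimal sketch under stated assumptions. All names (TouchDetectorBase, TrackedTouch, updateHistory) are hypothetical, and the nearest-neighbour label matching with a fixed radius is an assumed stand-in for the labeling procedure of Section 3.2.2.4, not a reproduction of it.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D point in projector space.
struct Point2f { float x, y; };

// Per-touch history kept by the base class, independent of the detection algorithm.
struct TrackedTouch {
    int label;          // consistent label carried across frames
    int framesTracked;  // number of consecutive frames this touch has been seen
    Point2f position;   // latest position
};

// Hypothetical base class: subclasses implement detectTouches(), while the base
// class owns the algorithm-independent label bookkeeping.
class TouchDetectorBase {
public:
    explicit TouchDetectorBase(float matchRadius) : matchRadius_(matchRadius) {}
    virtual ~TouchDetectorBase() = default;

    // Algorithm-specific step (projection space or plane sweep).
    virtual std::vector<Point2f> detectTouches() = 0;

    // Assumed nearest-neighbour matching: each new detection inherits the label of
    // the closest unmatched previous touch within matchRadius_, else gets a new label.
    const std::vector<TrackedTouch>& updateHistory(const std::vector<Point2f>& detections) {
        std::vector<TrackedTouch> next;
        std::vector<bool> used(tracked_.size(), false);
        for (const Point2f& p : detections) {
            int best = -1;
            float bestDist = matchRadius_;
            for (std::size_t i = 0; i < tracked_.size(); ++i) {
                if (used[i]) continue;
                float d = std::hypot(p.x - tracked_[i].position.x, p.y - tracked_[i].position.y);
                if (d <= bestDist) { bestDist = d; best = static_cast<int>(i); }
            }
            if (best >= 0) {
                used[best] = true;
                next.push_back({tracked_[best].label, tracked_[best].framesTracked + 1, p});
            } else {
                next.push_back({nextLabel_++, 1, p});
            }
        }
        tracked_ = next;
        return tracked_;
    }

private:
    float matchRadius_;
    int nextLabel_ = 0;
    std::vector<TrackedTouch> tracked_;
};
```

A derived detector would call updateHistory with the touches it finds each frame; a touch keeps its label as long as it moves less than the match radius between frames.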

Depending on the desired semantic interactivity and on whether the touch surface is planar or non-planar, the touch detectors must consider one of three possible lookup table types:

1. Full lookup tables support the entire set of graphical responses described previously, for use with Unity-based semantic content engines.

2. Limited lookup tables support touch detection and response when three-dimensional graphical content is not desired—that is, when interactions occur entirely within projection space. For limited lookup tables, the touch detectors need not consider localization on a graphics mesh G or any semantic regions (as discussed in Section 3.3.1).

3. Interpolated lookup tables support planar touch interactions. Planar surfaces require separate lookup table construction, as described in Section 4.1.1.1.

Both algorithms share the same basic structure: segmentation of the camera images, detection of touches, assignment of a label to each touch, and transmission of touch messages to the semantic content engine. Our implementations of the two algorithms largely follow the theoretical presentation of Section 3.2.2; below, we highlight notable aspects of our implementation, including specific functionality and performance optimizations.
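The shared stage order can be expressed with the template method pattern, in which the base class fixes the sequence and each algorithm overrides the individual stages. This is a hypothetical sketch: the type names Frame, Touch, and DetectorPipeline and the string-returning send stage are illustrative assumptions, not the classes of the original implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical per-frame data passed between stages.
struct Frame { std::vector<int> cameraPixels; };
struct Touch { int label; float x, y; };

class DetectorPipeline {
public:
    virtual ~DetectorPipeline() = default;

    // Fixed stage order shared by both algorithms (template method pattern):
    // segment -> detect -> label -> send.
    std::vector<std::string> processFrame(const Frame& frame) {
        Frame foreground = segment(frame);
        std::vector<Touch> touches = detect(foreground);
        label(touches);
        std::vector<std::string> messages;
        for (const Touch& t : touches) messages.push_back(send(t));
        return messages;
    }

protected:
    // Each concrete detector supplies its own version of every stage.
    virtual Frame segment(const Frame& frame) = 0;
    virtual std::vector<Touch> detect(const Frame& foreground) = 0;
    virtual void label(std::vector<Touch>& touches) = 0;
    virtual std::string send(const Touch& touch) = 0;
};
```

Keeping the order in one non-virtual function means a new lookup-table-based algorithm only has to fill in the stages, not re-implement the loop.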

4.1.2.2.2 Projection Space Touch Detector

In practice, the conversion of camera imagery to projector space dominates the runtime of this touch detection algorithm, especially as the number of cameras and projectors increases. As such, we incorporated several optimizations to reduce computation time.

Using OpenCV, it is possible to transform the pixels of a source image into a destination image using a predefined matrix of pixel mappings (the cv::remap function). Our earliest implementation of the projection space touch detection algorithm relied on this remap functionality. However, it is inefficient in this case because it remaps the entire camera images, even though only a small number of pixels represent a potential touch event. Moreover, remapping works in reverse: each pixel of the destination image is filled by looking up its source coordinates in the mapping and sampling the source image there. This adds computation time, as the destination projector images have significantly higher resolutions than the source camera images. Instead, we convert only the candidate touch contours themselves, mapping each contour pixel in camera space to its corresponding pixel in projector space via sequential lookup table conversions.
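The contour-only conversion can be sketched in plain C++ as follows. The table layout (one projector coordinate per camera pixel, row-major, with (-1, -1) marking camera pixels that map to no projector pixel) and all names are assumptions made for illustration; in an OpenCV implementation the contour would typically come from cv::findContours as a std::vector<cv::Point>.

```cpp
#include <cstddef>
#include <vector>

struct Pixel { int x, y; };

// Hypothetical dense camera-to-projector lookup sub-table: one projector
// coordinate per camera pixel, stored row-major; (-1, -1) marks "no mapping".
struct CamToProjTable {
    int width, height;
    std::vector<Pixel> entries;
    Pixel lookup(Pixel c) const { return entries[static_cast<std::size_t>(c.y) * width + c.x]; }
};

// Convert only the contour pixels (not the full camera image) to projector space.
std::vector<Pixel> contourToProjector(const std::vector<Pixel>& contour, const CamToProjTable& lut) {
    std::vector<Pixel> out;
    out.reserve(contour.size());
    for (const Pixel& c : contour) {
        // Ignore points outside the camera image.
        if (c.x < 0 || c.y < 0 || c.x >= lut.width || c.y >= lut.height) continue;
        Pixel p = lut.lookup(c);
        if (p.x >= 0 && p.y >= 0) out.push_back(p);  // skip unmapped camera pixels
    }
    return out;
}
```

The cost here grows with the contour length rather than with the projector resolution, which is the point of avoiding a full-image remap.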

Furthermore, we improved the efficiency of many projection space operations by performing them on specific rectangular regions of interest rather than on the entire projection space images. For example, projector response masks can be naively computed by thresholding and summing each full camera-to-projector image. However, by computing the bounding box of each camera-to-projector contour, we can reduce execution time by performing these threshold and sum operations only in regions with nonzero pixels. Likewise, when evaluating the number of cameras that contributed to a response mask contour, we can limit processing to the overall contour bounding box. In practice, these regions of interest are on the order of a few hundred pixels, significantly smaller than the full 1920 × 1080 pixel projector images. Such region of interest processing is directly supported in OpenCV, requiring only a few bounding box computations, and results in a substantial speedup.
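The following sketch shows the idea of restricting threshold-and-sum work to a contour's bounding box, written in plain C++ so that it is self-contained. In OpenCV the same steps map onto cv::boundingRect, a cv::Mat region of interest taken with a cv::Rect, cv::threshold, and cv::add or cv::countNonZero; the Image and Rect types and the function names here are hypothetical, and the region of interest is assumed to lie inside every image.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel { int x, y; };
struct Rect { int x, y, width, height; };

// Axis-aligned bounding box of a non-empty contour (analogous to cv::boundingRect).
Rect boundingBox(const std::vector<Pixel>& pts) {
    int minX = pts.front().x, maxX = minX, minY = pts.front().y, maxY = minY;
    for (const Pixel& p : pts) {
        minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
    }
    return {minX, minY, maxX - minX + 1, maxY - minY + 1};
}

// Single-channel 8-bit image stored row-major.
struct Image {
    int width, height;
    std::vector<std::uint8_t> data;
    std::uint8_t at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// Threshold-and-sum restricted to roi: each camera-to-projector image whose pixel
// exceeds `thresh` adds 1 to that pixel's count, i.e. the number of responding cameras.
std::vector<int> responseCountsInRoi(const std::vector<Image>& camToProj, Rect roi, std::uint8_t thresh) {
    std::vector<int> counts(static_cast<std::size_t>(roi.width) * roi.height, 0);
    for (const Image& img : camToProj)
        for (int y = 0; y < roi.height; ++y)
            for (int x = 0; x < roi.width; ++x)
                if (img.at(roi.x + x, roi.y + y) > thresh)
                    ++counts[static_cast<std::size_t>(y) * roi.width + x];
    return counts;
}
```

Only roi.width × roi.height pixels are visited per camera, instead of every pixel of each projector-sized image.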

4.1.2.2.3 Plane Sweep Touch Detector

Our implementation for back-projecting camera pixels to rays and determining their intersections on a set of planes closely matches the theoretical presentation of Section 3.2.2.2. The singular value decomposition used for the initial best fit plane, the convex hull operations, and the planar area calculations are all facilitated by OpenCV routines. To compute intersections between the plane normal vectors and the touch and graphics meshes (S and G, respectively), we implemented the Möller-Trumbore ray-triangle intersection algorithm in C++ [107].
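For reference, a compact version of the Möller-Trumbore ray-triangle test [107] is given below. It is a generic textbook formulation, not the original code: the vector helpers, the epsilon value, and the choice to reject hits behind the ray origin are assumptions. When sweeping along a plane normal, a caller could test both the normal and its negation if intersections on either side are of interest.

```cpp
#include <cmath>
#include <optional>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Möller-Trumbore: returns the distance t such that origin + t * dir lies on the
// triangle (v0, v1, v2), or std::nullopt if the ray misses it.
std::optional<double> intersectRayTriangle(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
    const double eps = 1e-9;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < eps) return std::nullopt;  // ray parallel to triangle plane
    double inv = 1.0 / det;
    Vec3 s = sub(origin, v0);
    double u = dot(s, p) * inv;                     // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return std::nullopt;
    Vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv;                   // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return std::nullopt;
    double t = dot(e2, q) * inv;
    if (t < eps) return std::nullopt;               // intersection behind the origin
    return t;
}
```

The barycentric coordinates u and v fall out of the same computation, which is convenient if the hit point must later be interpolated across the face.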

In general, our initial, straightforward implementation of the plane sweep touch detector was already significantly faster than even our optimized projection space touch detector implementation.

As one optimization, when computing mesh intersections, we use the camera-to-S and camera-to-G lookup sub-table correspondences to reduce the search to a set of candidate mesh faces, similar to the discussion of our implementation of preprocessing phase step PP8.
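A possible shape for the candidate-face filtering is sketched below. Storing the camera-to-S or camera-to-G sub-table as a map from a row-major camera pixel index to face indices is an assumption (the preprocessing of step PP8 is not reproduced here), and the names are hypothetical; ray-triangle tests would then run only over the returned faces.

```cpp
#include <cstddef>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical camera-to-mesh sub-table: for each camera pixel (row-major index),
// the indices of mesh faces (of S or G) associated with it during preprocessing.
using CameraToFaces = std::unordered_map<std::size_t, std::vector<int>>;

// Collect the sorted, de-duplicated candidate faces for a set of camera pixels,
// so intersection tests visit this subset instead of every face in the mesh.
std::vector<int> candidateFaces(const std::vector<std::size_t>& cameraPixels, const CameraToFaces& table) {
    std::set<int> faces;
    for (std::size_t px : cameraPixels) {
        auto it = table.find(px);
        if (it != table.end()) faces.insert(it->second.begin(), it->second.end());
    }
    return std::vector<int>(faces.begin(), faces.end());
}
```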

4.1.2.2.4 Sending Touch Messages

Transmitting information about a detected touch from the touch sensing system to a semantic content engine depends on the type of semantic content. For simpler projection-based content engines implemented in C++ and OpenCV, the internal detection structure, which stores the coordinates of the touch in projection space and on the touch mesh S, already holds all the information needed to determine a touch response, so these structures are simply passed as function arguments where needed.
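To illustrate passing the detection structure directly, here is a hypothetical sketch of a simple projection-based response: a rectangular hotspot in projection space is triggered when a detection falls inside it. The Detection and Hotspot fields and the respondTo function are assumptions for illustration only.

```cpp
#include <string>

// Hypothetical detection record mirroring the text: projection-space and touch-mesh coordinates.
struct Detection {
    int label;
    float projX, projY;         // coordinates in projection space
    float meshX, meshY, meshZ;  // corresponding point on the touch mesh S
};

// Hypothetical rectangular hotspot in projection space.
struct Hotspot { std::string name; float x, y, w, h; };

// A projection-based content engine can respond directly from the detection it is given:
// returns the hotspot name if the touch falls inside it, else an empty string.
std::string respondTo(const Detection& d, const Hotspot& h) {
    bool inside = d.projX >= h.x && d.projX < h.x + h.w &&
                  d.projY >= h.y && d.projY < h.y + h.h;
    return inside ? h.name : std::string();
}
```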

However, for more sophisticated semantic content engines created in Unity, we use the UDP-based networking capabilities provided by the Boost C++ libraries [18] to send touch messages from our C++/OpenCV touch detector implementations to Unity. Internally, the touch messages (as described in Section 3.3.2) are created as strings storing the relevant information about the detection. As these messages are independent of the specific touch detection algorithm used, the base detector class handles the creation and transmission of touch messages.
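The message-building step might look like the sketch below. The field subset, the semicolon-delimited layout, and the names TouchMessage and serialize are assumptions; the actual message contents are defined in Section 3.3.2. The resulting string could then be sent as a datagram, for example with Boost.Asio's ip::udp::socket::send_to and boost::asio::buffer, which is omitted here to keep the sketch free of external dependencies.

```cpp
#include <sstream>
#include <string>

// Hypothetical subset of touch-message fields (the real layout is given in Section 3.3.2).
struct TouchMessage {
    int label;
    int framesTracked;
    float projX, projY;         // projection-space coordinates
    float meshX, meshY, meshZ;  // point on the touch mesh S
};

// Serialize to a single delimited string; the format is assumed for illustration.
std::string serialize(const TouchMessage& m) {
    std::ostringstream os;
    os << "touch;" << m.label << ';' << m.framesTracked << ';'
       << m.projX << ',' << m.projY << ';'
       << m.meshX << ',' << m.meshY << ',' << m.meshZ;
    return os.str();
}
```

For example, a touch with label 3, tracked for 12 frames, at projector pixel (640.5, 360) and mesh point (0.1, 0.25, -0.5) serializes to `touch;3;12;640.5,360;0.1,0.25,-0.5`.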

4.1.2.2.5 Lookup Table Accesses

Finally, we wrote a small library comprising functions to load the lookup tables into OpenCV matrix structures to allow for indexing operations. This library also includes various routines that process lookup table data, such as converting camera contours to projector space. Each touch detection algorithm implementation uses a subset of these loading and indexing routines specific to the lookup sub-tables it requires to function.

In document Multi-touch Detection and Semantic Response on Non-parametric Rear-projection Surfaces (Page 153-157)