6.1.2 Vision-Based Multi-Touch Sensing

A second strategy for implementing multi-touch capable hardware is to use cameras and computer vision techniques to detect fingertips and other objects. Vision-based systems can loosely be grouped into two categories: “direct” systems, where cameras are aimed directly at the user’s hands, and “indirect” systems, where cameras (usually behind a projection screen) observe changes in an image taken from the surface itself. Vision-based approaches have been tremendously popular and have proved powerful in many projects proposed in the literature. One reason is that the sensor, a digital camera, is readily procurable and yields results quickly. A second compelling advantage is the flexibility and power of the computer vision techniques that can be harnessed in vision-based sensing.

Hand and Gesture Recognition

Among the earliest examples of computer vision techniques in natural hand-gesture human-computer interaction are several projects by Myron Krueger and colleagues. In VIDEOPLACE [KGH85] users stand in front of a blank, wall-sized screen (see Figure 6.3a). Camera images are segmented by the system to obtain a silhouette of the user’s hand or even the entire body (see Figure 6.3b + c). The outlines of captured objects are then projected back onto the

6.1 Multi-touch Input Technologies 79

Figure 6.3: (a) The VIDEOPLACE [KGH85] installation. (b) Users could interact with virtual objects with their entire body. (c) Outlines of hands were segmented from the background and used in the interaction.

screen and augmented with computer graphics or used to interact with virtual objects. This technique and similar ones have been demonstrated in purely artistic contexts and in applications such as remote communication between two VIDEOPLACE installations.

Wellner’s DigitalDesk [Wel93] is an early example of a vision-based tabletop system. The system uses frame differencing of thresholded images to segment fingers from a cluttered background (a regular work desk, see Figure 6.4a). A microphone is then used to determine when users tap the surface of the desk to indicate a “click”. Several interesting applications have been shown that merge the physical items on the desk with virtual objects projected onto the table from above, such as a virtual calculator displayed next to a printed spreadsheet. Figures from the spreadsheet may be used as input to the digital calculator by simply pointing with a pen. The DigitalDesk still serves as a source of inspiration for many recent tabletop projects.
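The idea of frame differencing on thresholded images can be illustrated with a minimal sketch. It is not Wellner’s implementation: the image representation (lists of rows of grey values), the threshold level and all function names are assumptions chosen for illustration only.

```python
def threshold(image, level):
    """Binarize a grayscale image (list of rows of 0..255 values)."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def frame_difference(prev_bin, curr_bin):
    """Mark pixels whose binary value changed between two frames."""
    return [
        [p ^ c for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_bin, curr_bin)
    ]


def changed_pixels(mask):
    """Return (x, y) coordinates of all pixels set in a binary mask."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]


# Static clutter (bright pixel at x=0, y=0) is present in both frames;
# a moving finger appears at x=2, y=1 in the second frame only.
prev_frame = [[200, 10, 10], [10, 10, 10]]
curr_frame = [[200, 10, 10], [10, 10, 180]]

LEVEL = 128  # assumed threshold, would be tuned per setup
moving = changed_pixels(
    frame_difference(threshold(prev_frame, LEVEL), threshold(curr_frame, LEVEL))
)
```

Because static clutter is identical in both binarized frames, it cancels out in the difference, and only the moving finger remains in the mask.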

Figure 6.4: (a) Wellner’s DigitalDesk setup on the left. Example applications include a projected calculator alongside a printed spreadsheet and copying printed information through simple gestures. (b) PlayAnywhere’s compact and mobile setup. Bimanual map manipulations enabled by optical flow motion estimation.

Wilson demonstrates several interesting advancements to the direct vision-based approach in the PlayAnywhere [Wil05] system, a compact, portable, front-projected and vision-based tabletop

80 6. Technical Foundation

prototype. A commercially available short-throw projector is augmented with an IR illuminant and an off-axis mounted camera equipped with a matching IR-pass filter. Several computer vision techniques are demonstrated: detecting touch/no-touch events, detecting and tracking visual barcodes (position and orientation), tracking pages of regular paper, and a novel algorithm for interacting with on-screen objects through the calculation of motion flow rather than the tracking of individual fingertips (Figure 6.4b). Touch detection is accomplished through IR shadow tracking: changes in the shape of the shadows cast by fingers are evaluated heuristically. As a finger approaches the surface, the distance between finger and shadow decreases; ultimately the finger occludes its own shadow. This approach only works for one finger per hand. To enable more fluid interaction with virtual information using multiple fingers or even whole hands, a novel algorithm that calculates optical flow from the captured image is detailed. Instead of tracking discrete contact points, simple statistics about changes in the image are calculated, and areas of motion at the location of virtual objects are integrated into a continuous position and orientation transform.
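How flow vectors under a virtual object might be integrated into a position and orientation update can be sketched as follows. This is not the algorithm from [Wil05]; it is a small-angle least-squares estimate under the assumption that flow samples (x, y, dx, dy) inside the object's region are already available. The function name and data layout are hypothetical.

```python
def integrate_flow(samples):
    """Estimate a rigid motion from flow samples inside an object's region.

    samples: list of (x, y, dx, dy) tuples, i.e. a position and the flow
    vector observed there. Returns (tx, ty, dtheta): the mean translation
    and a small-angle rotation (radians) about the centroid of the samples.
    """
    n = len(samples)
    if n == 0:
        return 0.0, 0.0, 0.0
    cx = sum(s[0] for s in samples) / n
    cy = sum(s[1] for s in samples) / n
    tx = sum(s[2] for s in samples) / n
    ty = sum(s[3] for s in samples) / n

    num = 0.0
    den = 0.0
    for x, y, dx, dy in samples:
        rx, ry = x - cx, y - cy      # position relative to the centroid
        vx, vy = dx - tx, dy - ty    # flow with the translation removed
        num += rx * vy - ry * vx     # 2D cross product r x v
        den += rx * rx + ry * ry
    dtheta = num / den if den else 0.0
    return tx, ty, dtheta


# Four samples that rotate by 0.1 rad about the origin while
# translating by (2, 3).
samples = [
    (1.0, 0.0, 2.0, 3.1),
    (-1.0, 0.0, 2.0, 2.9),
    (0.0, 1.0, 1.9, 3.0),
    (0.0, -1.0, 2.1, 3.0),
]
tx, ty, dtheta = integrate_flow(samples)
```

Applying the returned translation and rotation to the object on every frame yields continuous motion without ever identifying individual fingertips, which is the property the text highlights.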

The Visual Touchpad [ML04] system follows a similar idea but features more robust finger tracking and an extensive set of gestures on and above the touchpad. The disparity between two off-axis cameras is used to track fingers and calculate their height above a low-cost “touchpad” (basically a black cardboard rectangle). After background subtraction to isolate hand contours, the system uses heuristics to label fingers and calculate their 2D orientation, allowing for multi-touch input as well as multi-finger gesture recognition. Several interaction techniques are demonstrated that enable fluid, bimanual interaction in photo manipulation, finger painting and text input scenarios.
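The relation between stereo disparity and height can be illustrated with the textbook pinhole model. The sketch assumes an idealized, rectified camera pair looking straight down at the pad, which is a simplification and not necessarily the calibration used in [ML04]. Focal length, baseline and disparity values are made up for the example.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def height_above_pad(focal_px, baseline_m, finger_disparity_px, pad_disparity_px):
    """Height of a fingertip above the pad plane, in metres.

    The pad is farther from the cameras than a hovering finger, so it has
    the smaller disparity; the height is the difference of the two depths.
    """
    pad_depth = depth_from_disparity(focal_px, baseline_m, pad_disparity_px)
    finger_depth = depth_from_disparity(focal_px, baseline_m, finger_disparity_px)
    return pad_depth - finger_depth


# Assumed values: 800 px focal length, 10 cm baseline.
hover = height_above_pad(800.0, 0.1, finger_disparity_px=200.0, pad_disparity_px=160.0)
touch = height_above_pad(800.0, 0.1, finger_disparity_px=160.0, pad_disparity_px=160.0)
```

A finger whose disparity equals that of the pad plane has height zero, which could serve as a touch criterion once a small tolerance is added.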

Indirect Multi-Touch Recognition

In the last few years a second approach to vision-based sensing has enjoyed great popularity in the research community and in commercially available products: “indirect” vision systems. The key aspects here are an IR illumination scheme and a camera pointed at the projection screen from

Figure 6.5: (a) HoloWall illumination and sensing scheme (top). Bimanual and multi-user interaction (bottom). (b) FTIR screen as seen by the camera. Two-handed multi-touch interaction enabled by the approach. (c) The approach scales to large wall-sized displays.


behind or underneath. Touch events are not detected by directly segmenting hands and fingertips against the background and using secondary measurements (microphones, stereo disparity). Instead, these systems use a diffuse projection screen and the reflectivity of skin in the IR spectrum to detect when the surface is being touched.

Matsushita and Rekimoto’s HoloWall [MR97] is an early and typical example of this approach. The HoloWall is a wall-sized interactive display. It consists of a regular sheet of glass coated with a semi-opaque, diffuse projection screen. Behind the wall, an IR light source and a video camera equipped with an IR-pass filter (letting only light above a wavelength of 840 nm pass) are installed. The diffuse projection screen shields objects in front of the wall from the camera’s view unless an IR-reflective object (human skin is roughly 20% IR reflective) is brought into close vicinity of the glass. Depending on the selected threshold, objects close enough (0 to 30 cm) reflect some of the IR light back through the screen and therefore become visible to the camera (Figure 6.5a). Not only fingers but any IR-reflective object can be sensed; for example, objects tagged with a 2D barcode can be located and identified. The system can detect many contact points simultaneously, and because of the camera’s location occlusion problems are mitigated.
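Detecting several contact points in such an IR image amounts to thresholding followed by connected-component labelling. The following is a minimal sketch, not the HoloWall implementation; the 4-connectivity, the threshold level and the function name are assumptions.

```python
def find_blobs(image, level):
    """Find connected bright regions in a grayscale IR image.

    image: list of rows of brightness values. Pixels with a value >= level
    count as IR reflections. Returns a list of (cx, cy, size) tuples with
    the centroid and pixel count of each 4-connected region, in scan order.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or image[y][x] < level:
                continue
            # Flood fill from this seed pixel using an explicit stack.
            stack = [(x, y)]
            seen[y][x] = True
            pixels = []
            while stack:
                px, py = stack.pop()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and image[ny][nx] >= level):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            cx = sum(p[0] for p in pixels) / len(pixels)
            cy = sum(p[1] for p in pixels) / len(pixels)
            blobs.append((cx, cy, len(pixels)))
    return blobs


frame = [
    [0, 0, 0, 0, 0],
    [0, 90, 90, 0, 0],
    [0, 0, 0, 0, 80],
    [0, 0, 0, 0, 80],
]
blobs = find_blobs(frame, level=60)
strict = find_blobs(frame, level=85)
```

Raising the level keeps only brighter reflections, that is, objects closer to the screen, which mirrors the dependence on the selected threshold described above.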

Also relying on sensing in the IR domain, Han demonstrated a simple yet robust scheme based on frustrated total internal reflection [Han05] (FTIR). When light travelling in one medium encounters an interface to a medium with a lower index of refraction at an angle of incidence beyond a critical angle, it is reflected completely and remains within the medium with the higher index of refraction. This phenomenon is the basis for all optical lightguides. However, another material in contact with the interface can change the local index of refraction and frustrate the total internal reflection, causing light to escape the waveguide at this point.
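The condition for total internal reflection follows from Snell's law: the critical angle, measured from the surface normal, is arcsin(n_outside / n_inside). The sketch below uses approximate textbook indices (about 1.49 for an acrylic-like waveguide, 1.0 for air) and uses water (about 1.33) as a rough stand-in for a moist fingertip; these values are assumptions, not figures from [Han05].

```python
import math


def critical_angle_deg(n_inside, n_outside):
    """Critical angle (degrees from the surface normal) for light inside
    a medium with index n_inside hitting a medium with index n_outside."""
    if n_outside >= n_inside:
        raise ValueError("total internal reflection needs n_outside < n_inside")
    return math.degrees(math.asin(n_outside / n_inside))


def is_totally_reflected(angle_deg, n_inside, n_outside):
    """True if a ray at angle_deg from the normal stays inside the medium."""
    if n_outside >= n_inside:
        return False
    return angle_deg > critical_angle_deg(n_inside, n_outside)


N_WAVEGUIDE = 1.49  # assumed, approximate index of acrylic
N_AIR = 1.0
N_MOIST = 1.33      # assumed, water as a stand-in for moist skin

theta_air = critical_angle_deg(N_WAVEGUIDE, N_AIR)
trapped_in_air = is_totally_reflected(50.0, N_WAVEGUIDE, N_AIR)
trapped_at_touch = is_totally_reflected(50.0, N_WAVEGUIDE, N_MOIST)
```

Under these assumed values a ray travelling at 50 degrees is trapped where the surface borders on air but escapes where a moist material touches it, which is the frustration effect that touch sensing relies on.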

Figure 6.6: (a) Multiple tradeshow visitors interacting with a custom-made, multi-touch-enabled tourist guide application. (b) The table in the showroom at IFA 2007 in Berlin. (c) A virtual tourist guide for the city of Berlin has been developed and deployed on the multi-touch table.

The approach uses an IR edge-lit sheet of acrylic as the waveguide. Fingers touching the panel cause IR light to escape and scatter it back through the screen, where a camera equipped with an IR-pass filter detects the scattered light (Figure 6.5b + c). Simple computer vision techniques are used to detect fingertips, and multiple contact points can be tracked at interactive rates across


video frames. This system caught the attention of many researchers and DIY enthusiasts alike. Due to its relative simplicity and ease of construction, a multitude of devices in various form factors and for different application domains have been built based on this approach. For example, we have built our own multi-touch table based on FTIR together with a Berlin-based company (Foresee2). The table was exhibited in a public showroom at the IFA tradeshow in September 2007 (see Figure 6.6). One limitation of the FTIR approach is that touching objects must be soft and slightly wet, such as skin. Therefore the system cannot detect objects made from hard material (e.g., a coffee mug) or visual markers such as barcodes. Furthermore, the approach requires some form of compliant surface that allows virtual imagery to be projected onto the waveguide without interfering with the FTIR effect.
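Tracking contact points across video frames, as mentioned earlier in this section, can be sketched with greedy nearest-neighbour matching. This is a simple illustrative scheme and not necessarily the one used in [Han05] or in our table; the function name, the data layout and the distance limit are assumptions.

```python
import math


def track(prev_tracks, detections, max_dist, next_id):
    """Associate detections in the current frame with existing tracks.

    prev_tracks: dict mapping track id -> (x, y) from the previous frame.
    detections:  list of (x, y) contact points found in the current frame.
    max_dist:    largest movement (pixels) accepted between two frames.
    next_id:     first unused track id.
    Returns (new_tracks, next_id). Tracks without a match are dropped
    (finger lifted), unmatched detections start new tracks (finger down).
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.hypot(px - dx, py - dy), tid, i)
        for tid, (px, py) in prev_tracks.items()
        for i, (dx, dy) in enumerate(detections)
    )
    new_tracks = {}
    used_tracks, used_dets = set(), set()
    for dist, tid, i in pairs:
        if dist > max_dist:
            break
        if tid in used_tracks or i in used_dets:
            continue
        new_tracks[tid] = detections[i]
        used_tracks.add(tid)
        used_dets.add(i)
    for i, det in enumerate(detections):
        if i not in used_dets:
            new_tracks[next_id] = det
            next_id += 1
    return new_tracks, next_id


previous = {0: (10, 10), 1: (50, 50)}
current = [(52, 49), (11, 12), (90, 90)]
tracks, next_id = track(previous, current, max_dist=10, next_id=2)
```

In this example the two existing contacts keep their ids although the detections arrive in a different order, and the third detection is registered as a new touch.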