Algorithm - Projection Space Touch Detection

CHAPTER 2: RELATED WORK

2.1 Touch Sensing Technology

3.2.2 Runtime Touch Detection

3.2.2.1 Projection Space Touch Detection

3.2.2.1.1 Algorithm

Suppose that C cameras capture images of an actual touch or hover event (i.e. not noise or erroneous contours), but it is unknown whether they correspond to a touch or to a hover. Let E denote this event, and let ECi _{= {x}Ci

1 , x Ci

2 , . . . , xCni} denote the pixels in the camera contour of camera Ci,

where 1 ≤ i ≤ C (Figure 3.7). Via the lookup table, each camera pixel xCi

j can be converted to

a 3D point XTCH3

Ci,j on the touch surface S; let X

Ci _{= {X}TCH3 Ci,1 , X TCH3 Ci,2 , . . . , X TCH3 Ci,n } be the set of

3D points on S from camera Ci’s contour. Furthermore, for the present discussion, we assume

that the lookup table contains perfect correspondence information. If E represents a touch, then the actual 3D points corresponding to the user’s finger are located on the touch surface S, which implies that the sets {XC1_{, X}C2_{, . . . , X}CC_{} of the 3D correspondences of observed camera pixels}

comprise 3D points on the same 3D region of S. If E instead represents a hover, then these 3D user input points are located off the surface. However, by definition, all of the camera-to-3D points XTCH3

Ci,j are constrained to the surface, and so the sets X

different (but potentially overlapping) 3D regions ofS. This is exactly equivalent to the behavior shown in Figures 3.8 and 3.9.

To simplify determining whether and to what extent the sets XCi _{overlap, let us instead consider}

the conversions of ECi _{from camera space to projector space: that is, let U}P Ci = {u

P 1, u

2, . . . , uPn}

be the projector contour pixels in projector P corresponding to the camera contour pixels of camera Ci. Since the camera pixel xCji corresponds to the 3D touch surface point XTCH

Ci,j , and x Ci j

corresponds to projector pixel uP_j, it follows that uP_j corresponds to XTCH3

Ci,j . The same conclusion

about the contour overlap for touch events versus hovers applies here, but the advantage is that this holds in a two-dimensional coordinate space (PRO2), where such computations are more direct.

In other words, if E represents a touch, the sets {U_CP

1, U P

C2, . . . , U P

CC} will comprise the same 2D

regions in the coordinate space of projectorP, whereas if E represents a hover, these sets will comprise different (but potentially overlapping) 2D regions. This holds for the entire set of projectors {P1, P2, . . . , PP}.

Compared to the analysis of the 3D regions XCi_{, this is a much simpler operation that can be}

performed directly on the pixels of images. To compute the amount of overlap, we create the projector response mask RP _{with the same dimensions as the projector P, where R}P

u,v is set to be

the number of camera-to-projector contours containing pixel (u, v)P. More formally,

RP_u,v = C X i=1      1 if (u, v)P ∈ UP Ci 0 if (u, v)P _{6∈ U}P Ci (3.3)

If camera Ci contains some pixel (x, y)Ci that corresponds to projector P pixel (u, v)P, so that

(u, v)P _{∈ U}P

Ci, we say that camera Ci contributes(u, v)

P_{. Thus, informally, the projector response}

mask RP encodes the number of cameras that contributed each projector pixel through its camera- to-projector contour. An example is shown in Figure 3.12.

Based on this formulation, RP

u,v ≥ 1 for every projector pixel for which at least one camera-

to-projector contour contains the pixel (u, v)P_{, and R}P

u,v = 0 if no camera-to-projector contour

contains this pixel. Thus, we can represent the aforementioned conclusion regarding touch versus hover imagery as the following: E contains a touch if RP_u,v 6= 0 =⇒ RP

u,v = C for all pixels

(u, v)P. This means that all camera-to-projector contours are identical, which implies that the corresponding 3D points on the touch surface S exist on the same 3D region of S. If this statement does not hold—that is, ∃(u, v)P | (RP

u,v ≥ 1 ∧ RPu,v < C)—then these 3D points occupy distinct

(but potentially overlapping) regions of S and thus represent a hover.

1 1 1 (a) 1 1 1 2 2 2 3 (b) 1 1 1 2 2 2 3 (c)

Figure 3.12: Example projector response masks, consisting of potentially overlapping camera-to- projector contours. In each example, the number of cameras contributing to each contour is shown. (a) The three camera-to-projector contours show no overlap, suggesting a hover. (b) The contours show some degree of overlap. (c) The contours show a high degree of overlap, suggesting a touch.

In practice, we must loosen this requirement. First, due to the relationships between the cameras, a projector P, and the touch-sensitive surface, it may be the case that not all C cameras have a correspondence at pixel (u, v)P. Instead, we consider the number of cameras that are known to have correspondences at (u, v)P, denoted N (u, v)P, where N (u, v)P ≤ C. This value can be computed directly from the lookup table, which encodes missing correspondences as null entries. Now, we state that E contains a touch if RP_u,v 6= 0 =⇒ RP

u,v = N (u, v)P. Informally, this

Furthermore, we cannot assume that the lookup tables contain perfect correspondences: in practice, the camera-to-projector contours of an actual touch may exhibit a small amount of inconsistency, being composed of several close overlapping regions instead of a single consistent one, due to inac- curacies in calibration, lookup table construction, or infrared imagery segmentation. However, we can still expect camera-to-projector contours from hovers to show a higher degree of dissimilarity. Instead of asserting strict equality among the contours, we employ a weighted scoring scheme that considers how they overlap. If a large number of cameras—but not all—contributed a large number of projector response mask pixels—but not all—this is still evidence of a touch event. However, if only one or two cameras out of a possible four contributed each projector pixel, this suggests a hover. Thus, we consider the ratio of projector response mask pixels contributed by i cameras to the number of pixels contributed by at least one camera (i.e. the area of the projector response mask contour in pixels).

More formally, we create a weight vector w = {w1, w2, . . . , wC}, where weight wi applies to

regions of the projector response mask RP for which i cameras contained a contour pixel with a correspondence in projector P. Let A(RP_{) refer to the number of nonzero pixels in R}P_{, and let}

A(RP_{, i) be the number of pixels in the response mask equal to i, meaning i cameras contributed.}

Finally, let N (RP) ⊆ {1, 2, . . . , C} be the set of nonzero values in the response mask—that is, i ∈ N (RP) =⇒ ∃(u, v)P _{| R}P

u,v = i. Thus, the maximum number of cameras that could

contribute to the projector response mask is given by max N (RP). The multi-camera agreement scoreS(RP_{) is defined as} S(RP) =                        0 N (RP_{) = ∅} N (RP₎ X i wi A(RP_{, i)} A(RP₎ N (RP₎ X i wi N (RP_{) 6= ∅} (3.4)

Informally, this represents:

• the weighted sum of the ratio between the number of pixels contributed by i cameras (repre- sented by A(RP, i)) to the number of pixels contributed by at least one camera (A(RP)),

• divided by the sum of the weights.

This provides a confidence score in the range [0, 1] for a projector response mask. In the event that all C camera-to-projector contours are identical, then N (RP) = {C}, A(RP, C) = A(RP), and the multi-camera agreement is equal to 1. If no cameras contributed any pixels to the projector response mask, the score is set to 0. As the size of regions for which the projector response mask is less than max N (RP) increases, the multi-camera agreement decreases as dictated by the weight values. For example, if w1 < w2 < · · · < wC, then pixels for which the projector response mask is

equal to 2 contribute less to the score than those for which the projector response mask is equal to 3. This places a penalty on regions for which only a small number of cameras contributed projector response mask pixels and gives higher weight to regions with a larger number of contributing cameras. Thus, higher scores provide higher confidence of an observation of a touch, while lower scores suggest a hover.

In document Multi-touch Detection and Semantic Response on Non-parametric Rear-projection Surfaces (Page 100-104)