4.5 Supervised Perception
4.5.2 Object Detection
The setup comprises one YouBot with a laptop, one iPad, one USB webcam, and one wireless network. Figure 4.13 shows the user interface for specifying parameters of the optimization routines that detect the wires. The human user swipes along the line they wish to cut, and, at 30 Hz, the robot autonomously tracks and approaches the identified wire.
Figure 4.13: The graphical user interface for eliciting user visual preferences can be accessed through a tablet (a.) or desktop web browser (b.)
Color Saliency
The line detection algorithm needs edge information as input, and edges are typically defined on the luminance channel [2]. While luminance provides a high degree of variance in natural scenes, it ignores many color cues that help find striking edges between preferred color boundaries. A webcam captures images in YUV space; the Y channel estimates luminance, similar to the RGB2Gray conversion in OpenCV [14]. Edges can disappear in this channel under unfavorable conditions, and learning a suitable colorspace can provide more identifiable boundaries. The saliency of the color space [144] is increased using a very simple approach: finding an optimal linear mapping from YUV to a grayscale colorspace.
With a human in the loop, the user's swipes provide a region of interest that indicates the proper color channels. Within this region, statistics can be calculated to form a more intelligent method for converting YUV images into a grayscale image with identifiable boundaries. Each channel must be equalized to ensure the correct scale between channels. The region is specified with a box from coordinates $(i_{MIN}, j_{MIN})$ to $(i_{MAX}, j_{MAX})$ and is shown in red, with added orientation, in Figure 4.13(a.). The overall optimization formulation for finding the linear colorspace transform, $u$, is shown in (33), where $I(i,j)$ denotes the YUV pixel vector at $(i,j)$.

\[
\begin{aligned}
\underset{u}{\text{maximize}} \quad & \sum_{i,j} \| u^T I(i,j) \|_2^2 \\
\text{subject to} \quad & u^T u = 1 \\
& i_{MIN} \le i \le i_{MAX} \\
& j_{MIN} \le j \le j_{MAX}
\end{aligned}
\tag{33}
\]

A typical resulting single-channel image is shown in Figure 4.14(a.).
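Because the objective in (33) is a quadratic form in $u$ under a unit-norm constraint, its maximizer is the dominant eigenvector of the second-moment matrix of the pixels inside the region of interest. The following is a minimal sketch of that computation, not the implementation used in our system. The function names (`saliency_direction`, `roi_pixels`, `to_gray`), the nested-list image layout, and the use of power iteration are assumptions made for illustration, and the channels are assumed to have been equalized beforehand.

```python
import math

def saliency_direction(pixels, iters=200):
    """Return a unit vector u that maximizes sum((u . p)**2) over 3-channel pixels.

    The maximizer of (33) is the dominant eigenvector of M = sum(p p^T),
    found here by power iteration (an illustrative choice).
    """
    # 3x3 second-moment matrix of the region-of-interest pixels.
    M = [[sum(p[a] * p[b] for p in pixels) for b in range(3)] for a in range(3)]
    u = [1.0 / math.sqrt(3.0)] * 3  # start with equal weight on Y, U, V
    for _ in range(iters):
        v = [sum(M[a][b] * u[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # degenerate region (all-zero pixels)
            break
        u = [x / norm for x in v]
    return u

def roi_pixels(image, i_min, i_max, j_min, j_max):
    """Collect pixels inside the user's box; image[j][i] is a (Y, U, V) tuple."""
    return [image[j][i]
            for j in range(j_min, j_max + 1)
            for i in range(i_min, i_max + 1)]

def to_gray(image, u):
    """Project every pixel onto u to obtain a single-channel image."""
    return [[sum(uc * c for uc, c in zip(u, px)) for px in row] for row in image]
```

In use, `u` would be estimated from `roi_pixels(...)` for the swiped box and then applied to the whole frame with `to_gray`. Subtracting the mean pixel of the region before forming `M` would instead give the direction of highest variance (the first principal component); (33) as written contains no such centering, so the sketch omits it.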
Figure 4.14: The image on the left shows the lines detected in the color channel of highest variance. The right image highlights the edges in the image.
Line Detection
The Radon transform [116] performs line detection and requires an edge image as input. The edge image is calculated from the zero crossings of the Laplacian of Gaussian [125], which performs the smoothing and the gradient mask in one step. The user's input shapes an anisotropic Gaussian whose principal direction is perpendicular to the direction of the wire's edges; the underlying Gaussian is described by the covariance $\Sigma$ in Equation (34), where $\rho_0$ specifies the radial edge sensitivity. The direction of the user's swipes gives an approximate direction that is used to find pertinent edges with the Laplacian of Gaussian filter, $\nabla^2 N(\mu, \Sigma)$. One disadvantage of the anisotropic filter is that the discretized kernel must be larger than a commensurate isotropic filter in order to capture the anisotropic characteristics. The result of the anisotropic filter is shown in Figure 4.14(b.).
\[
\Sigma =
\begin{bmatrix}
\rho_0 \cos\theta & \rho_0 \sin\theta \\
-\rho_0 \sin\theta & \rho_0 \cos\theta
\end{bmatrix}
\tag{34}
\]
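To illustrate how an oriented Laplacian of Gaussian kernel can be discretized, the sketch below samples $\nabla^2 N(0, \Sigma)$ on a pixel grid using the identity $\nabla^2 N(x) = N(x)\,(x^T \Sigma^{-2} x - \mathrm{tr}\,\Sigma^{-1})$, which holds for a symmetric $\Sigma$. It is a sketch under stated assumptions rather than our implementation: the covariance is parameterized here as a rotated diagonal matrix with hypothetical standard deviations `sigma_along` and `sigma_across`, which is not the parameterization of (34), and the function name and kernel size are illustrative.

```python
import math

def anisotropic_log_kernel(theta, sigma_along, sigma_across, half_size):
    """Sample an oriented Laplacian-of-Gaussian kernel (unnormalized).

    Assumed covariance: Sigma = R(theta) diag(sigma_along^2, sigma_across^2) R(theta)^T,
    so theta is the direction of the Gaussian's long axis.
    """
    c, s = math.cos(theta), math.sin(theta)
    pa, pc = 1.0 / sigma_along ** 2, 1.0 / sigma_across ** 2
    # Precision matrix A = Sigma^{-1} (symmetric 2x2).
    a11 = c * c * pa + s * s * pc
    a12 = c * s * (pa - pc)
    a22 = s * s * pa + c * c * pc
    trace = a11 + a22
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            ax = a11 * x + a12 * y          # first component of A x
            ay = a12 * x + a22 * y          # second component of A x
            gauss = math.exp(-0.5 * (x * ax + y * ay))
            row.append((ax * ax + ay * ay - trace) * gauss)
        kernel.append(row)
    # Remove the residual mean so that flat image regions give zero response.
    n = (2 * half_size + 1) ** 2
    mean = sum(map(sum, kernel)) / n
    return [[v - mean for v in row] for row in kernel]
```

With `sigma_along` three times `sigma_across`, a half-size of about three times `sigma_along` is needed to cover the long axis, which shows why the anisotropic kernel is larger than an isotropic kernel of comparable edge sensitivity.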
Figure 4.15: The Radon space image, shown on the left, shows the votes for lines in an image. Nearby parallel lines signify a wire, shown on the right, from a zoom-in of the left Radon image.
The continuous form of the Radon transform over the image space is shown in Equation (35), with its discretized version for our application in Equation (36), where $R(\rho,\theta)$ is the probability of having a line with the specified coordinates [12]. The result of the discrete transform on an image is shown in Figure 4.15.

\[
R(\rho,\theta) = \int_0^w \int_0^h I(x,y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy
\tag{35}
\]

\[
R(\rho,\theta) = \sum_{i=0}^{w} \sum_{j=0}^{h} I(i,j)\,\delta(\rho - i\cos\theta - j\sin\theta)
\tag{36}
\]

In Radon space, parallel lines are trivial to find by fixing $\theta$ and searching $\rho$ for the two largest peaks. Parallel lines share a common $\theta$, but the two lines of a wire should also be close together, which is expressed by the constraint $|\rho_1 - \rho_2| < \Delta\rho_{MAX}$. A single line may split its peak across neighboring values of $\rho$, so a second constraint, $|\rho_1 - \rho_2| > \Delta\rho_{MIN}$, is needed. Additionally, the human provides swipe information that aids in detecting the angle of the parallel lines, expressed as an additional cost $\gamma\|\theta - \theta_H\|_2^2$. Written as an optimization:
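\[
\begin{aligned}
\max_{\theta,\rho_1,\rho_2} \quad & \sum_{i=0}^{w} \sum_{j=0}^{h} I(i,j)\,\bigl(\delta(\rho_1 - i\cos\theta - j\sin\theta) + \delta(\rho_2 - i\cos\theta - j\sin\theta)\bigr) + \gamma\,\|\theta - \theta_H\|_2^2 \\
\text{subject to} \quad & |\rho_1 - \rho_2| < \Delta\rho_{MAX} \\
& |\rho_1 - \rho_2| > \Delta\rho_{MIN}
\end{aligned}
\tag{37}
\]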
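The discrete transform (36) and the constrained search (37) can be sketched as a vote accumulator followed by an exhaustive search over pairs of $\rho$ bins at each $\theta$. This is an illustrative brute-force version, not our optimized routine. The function names, the one-degree angular resolution, and the rounding of $\rho$ to integer bins (standing in for the $\delta$ function) are assumptions. The sketch also subtracts the angular penalty so that a positive `gamma` favors angles near the swipe direction `theta_h`; this sign convention is an assumption and corresponds to a non-positive $\gamma$ in (37) as written.

```python
import math

def radon_accumulator(edges, n_theta=180):
    """Accumulate votes[t][r] over edge pixels with rho = i*cos(theta) + j*sin(theta).

    edges[j][i] is the edge strength at column i, row j.
    """
    h, w = len(edges), len(edges[0])
    rho_max = int(math.ceil(math.hypot(w, h)))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    votes = [[0.0] * (2 * rho_max + 1) for _ in thetas]
    for j, row in enumerate(edges):
        for i, val in enumerate(row):
            if val == 0:
                continue  # only edge pixels vote
            for t, th in enumerate(thetas):
                rho = int(round(i * math.cos(th) + j * math.sin(th)))
                votes[t][rho + rho_max] += val  # offset so indices are non-negative
    return votes, thetas, rho_max

def best_parallel_pair(votes, thetas, rho_max, theta_h, gamma, d_min, d_max):
    """Search for (theta, rho1, rho2) with d_min < |rho1 - rho2| < d_max.

    Returns (score, theta, rho1, rho2), or None if no pair is found.
    """
    best = None
    for t, th in enumerate(thetas):
        # Lines are undirected, so compare angles modulo pi.
        d = abs(th - theta_h) % math.pi
        d = min(d, math.pi - d)
        penalty = gamma * d * d
        row = votes[t]
        for r1 in range(len(row)):
            if row[r1] == 0:
                continue
            for r2 in range(r1 + d_min + 1, min(r1 + d_max, len(row))):
                score = row[r1] + row[r2] - penalty
                if best is None or score > best[0]:
                    best = (score, th, r1 - rho_max, r2 - rho_max)
    return best
```

For example, an edge image containing two vertical lines four pixels apart yields a pair at $\theta = 0$ when `theta_h = 0`, `d_min = 1`, and `d_max = 6`. Without the penalty term, neighboring angles that place the same pixels in the same $\rho$ bins would tie with the true angle.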
Some computational tricks can be applied to speed up the optimization. If two edge pixels are found adjacent to each other vertically, then only the vertical $\theta$ is updated; the same concept applies horizontally. This may also be more robust to noisy pixels. When searching for lines, we then need only populate the vertical or horizontal half of the Radon space, which reduces the search space.
At 60 Hz, we recorded the $(x,y)$ positions of the human operator's touches. We ran these measurements through a linear Kalman filter [155] to smooth the data and acquire reliable velocity estimates. With the information from the touch input, we can impose a prior on the Hough transform space so that lines in a certain region are upweighted. For future work, this can be cast as a tracking problem, where the human provides new measurement updates. In general, this tracking is performed in Radon space [9], [96], where the object is modeled with a Gaussian distribution. Using a filter, the human guess can be tracked in Radon space as the arm moves. Assuming that the object of interest is not moving, a control input can be added to our model based on how the arm moves.
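The touch smoothing step can be illustrated with a constant-velocity linear Kalman filter applied to one touch coordinate. This is a minimal sketch and not necessarily the filter configuration we used: the class name `ConstantVelocityKF`, the white-acceleration process model, and the noise parameters `q` and `r` are hypothetical choices for illustration. Only the 60 Hz sample rate comes from the text above.

```python
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter; the state is [position, velocity]."""

    def __init__(self, dt=1.0 / 60.0, q=50.0, r=4.0):
        self.dt = dt      # sample period (touches arrive at 60 Hz)
        self.q = q        # assumed acceleration noise variance
        self.r = r        # assumed touch measurement noise variance
        self.x = None     # state estimate, set on the first measurement
        self.P = [[r, 0.0], [0.0, 1e4]]  # initial covariance, velocity unknown

    def update(self, z):
        """Fuse one position measurement z; return (position, velocity)."""
        if self.x is None:
            self.x = [z, 0.0]
            return tuple(self.x)
        dt, q, P = self.dt, self.q, self.P
        # Predict: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]].
        p, v = self.x
        p_pred = p + dt * v
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q * dt ** 4 / 4
        p01 = P[0][1] + dt * P[1][1] + q * dt ** 3 / 2
        p10 = P[1][0] + dt * P[1][1] + q * dt ** 3 / 2
        p11 = P[1][1] + q * dt ** 2
        # Update with measurement matrix H = [1, 0].
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        resid = z - p_pred
        self.x = [p_pred + k0 * resid, v + k1 * resid]
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return tuple(self.x)
```

One filter instance per axis (one for $x$, one for $y$) gives a smoothed touch position and a velocity estimate. The velocity estimate can then supply the swipe direction used as the angular prior in the line search.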