

2.1.1 Range Sensing

Various approaches for contactless distance measurement exist. In this work, only optical range sensors are considered: non-optical methods such as sonar and radar provide depth information inexpensively, but lack accuracy in the measurement direction.

Furthermore, optical range sensors differ in their physical measurement technique. The most common systems are based on structured light, Time-of-Flight (ToF), and active or passive triangulation (Blais, 2004).

For structured light systems, a predefined light pattern is projected onto a scene and simultaneously observed by a camera (Zhang et al., 2002; Scharstein and Szeliski, 2003; Geng, 2011). The affordable and thus widely used active RGB-D (Red, Green, Blue plus Depth) cameras, such as the Microsoft Kinect¹ or Asus Xtion², use an infrared sensor to infer depth from the deformation of a projected speckle pattern and an RGB (Red, Green, Blue) camera to match color information to the range image. For details on the working principle of RGB-D cameras see (Han et al., 2013) and for a review of their various applications see (Berger et al., 2013).

ToF sensors measure the time (or phase delay) between emitting a light pulse and receiving its reflection. They require precise calibration and noise reduction for accurate depth measurements (Fuchs, 2012), since the measurement of the returned light pulse is inexact due to light scattering and multipath interference. For a detailed review of ToF sensors see (Foix et al., 2011).
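The relation between the measured time (or phase delay) and the distance can be sketched as follows. This is a minimal illustration and not the model of any particular sensor: the function names and the 20 MHz modulation frequency in the usage note are assumptions, and the calibration and noise effects mentioned above are ignored.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, in m/s


def distance_from_round_trip(t_seconds: float) -> float:
    """Pulsed ToF: the light travels to the surface and back, so halve the path."""
    return C * t_seconds / 2.0


def distance_from_phase(phase_rad: float, mod_freq_hz: float) -> float:
    """Continuous-wave ToF: distance from the phase shift of a modulated signal.

    mod_freq_hz is the (assumed) modulation frequency of the emitted light.
    """
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)


def unambiguous_range(mod_freq_hz: float) -> float:
    """Largest distance before the phase wraps around (phase shift of 2*pi)."""
    return C / (2.0 * mod_freq_hz)
```

For example, `distance_from_round_trip(10e-9)` gives roughly 1.5 m, which shows why timing errors of a few nanoseconds already matter. With an assumed modulation frequency of 20 MHz, `unambiguous_range(20e6)` is about 7.5 m.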

1 Microsoft Kinect, http://www.microsoft.com, 2014

2

2.1. 3D DATA ACQUISITION 13

Table 2.1: Comparison of different range sensors (stereo camera, RGB-D camera, ToF sensor and laser stripe profiler) in the context of 3D modeling: accuracy refers to the precision of the depth measurement and robustness refers to how well the sensor performs under changing conditions such as illumination and on untextured surfaces. The sensors are rated by best (++), good (+), bad (-) and worst (- -) suitability for each category.

                     Stereo   RGB-D   ToF    Laser
Resolution           ++       ++      -      - -
Data Rate            ++       +       +      - -
High Illumination    ++       - -     +      +
Low Illumination     - -      +       +      ++
Untextured Surfaces  - -      ++      ++     ++
Accuracy             -        -       - -    ++

Triangulation-based systems, which can be divided into active and passive, measure the distance by determining the size of a triangle formed by two non-parallel rays viewing the same point (Hartley and Zisserman, 2003). For passive triangulation systems, the triangle consists of the two rays of a camera pair, a so-called stereo camera. The method to match the reflected light of the global illumination in both images is referred to as stereo matching. The distance between the two cameras is referred to as the base distance.
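For the passive case, the triangle can be made concrete with the standard relation for a rectified stereo pair, where depth is inversely proportional to disparity. The sketch below assumes an ideal rectified pinhole camera pair with the focal length given in pixels. The function and parameter names are illustrative and do not come from the cited works.

```python
def depth_from_disparity(disparity_px: float, focal_px: float,
                         base_distance_m: float) -> float:
    """Depth of a point seen by a rectified stereo pair: z = f * b / d."""
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * base_distance_m / disparity_px


def depth_error(depth_m: float, focal_px: float, base_distance_m: float,
                disparity_error_px: float = 1.0) -> float:
    """First-order depth uncertainty caused by a disparity (matching) error.

    Differentiating z = f * b / d gives |dz| = z^2 / (f * b) * |dd|.
    """
    return depth_m ** 2 * disparity_error_px / (focal_px * base_distance_m)
```

With an assumed focal length of 500 px and a base distance of 0.1 m, a disparity of 50 px corresponds to a depth of 1 m. The error term grows quadratically with depth and shrinks with a larger base distance, which is one reason the base distance determines the usable working range.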

For active triangulation systems, which contain a light source, the triangle is formed by the ray from the light source to the intersected surface point and the reflected ray captured by an optical camera. Structured-light sensors are also based on the active triangulation principle, and laser triangulation systems can in turn be categorized as structured-light systems, since a laser stripe or point also represents a pattern. Nevertheless, these are treated separately here. In the area of laser triangulation, we only consider laser stripe profilers such as those presented by Winkelbach et al. (2006) and Suppa et al. (2007). Here, a laser projects a stripe onto the object surface and a camera records the reflection. Laser stripers obtain high-quality depth measurements, but only acquire a 1D range image. More recently, laser range scanners have been developed which allow for a variable range due to an autofocus camera (Kielhöfer et al., 2011). However, their application to 3D modeling still needs to be evaluated.

In Tab. 2.1 the different range sensor types (stereo camera, RGB-D camera, ToF sensor and laser stripe profiler) are compared in the context of 3D acquisition with respect to range image resolution, data rate, measurement accuracy and system robustness. The resolution is defined by the size of the range image, which is a matrix or stripe of depth values. The data rate is the number of depth values acquired per unit of time. High illumination refers to very bright scenarios, such as outdoors or a room with sunlight coming in, whereas low illumination refers to dark areas. Untextured surfaces are object surfaces with no texture, as found on many industrial objects. Accuracy refers to the absolute measurement error of the sensor. The sensors are rated by best (++), good (+), bad (-) and worst (- -) suitability for each category. However, as several models exist for each range sensor type, this categorization is only an estimate and might not hold for every model.

A stereo system is not very robust against low illumination, as it provides no active illumination. Furthermore, stereo cannot measure untextured surfaces, as the measurement principle depends on finding corresponding features in both images. However, this disadvantage can be compensated by using a pattern projector in addition to the passive stereo system. Stereo cameras are very flexible, as the base distance and the camera type, and therefore the working range and resolution, can be adjusted depending on the application. Nevertheless, at higher resolutions stereo does not allow for real-time range image acquisition, unlike the other sensors. In order to achieve near real-time stereo reconstruction, the algorithms need to be ported to GPU (Graphics Processing Unit) or FPGA (Field Programmable Gate Array) boards. In (Gehrig et al., 2009) an FPGA implementation of the Semi-Global Matching (SGM) algorithm (Hirschmüller, 2008) is presented. The suggested system allows for real-time range image acquisition with VGA (Video Graphics Array) resolution. RGB-D cameras also generate range images with VGA resolution, ToF cameras only below QVGA (Quarter Video Graphics Array), and laser stripers deliver only a 1D range image. A major drawback of the laser striper is that acquiring a complete view of an object takes time, since the laser stripe needs to be moved over the object.
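To give a feeling for the resolutions mentioned above, the following sketch counts the depth values per range image and per second. VGA (640 x 480) and QVGA (320 x 240) are the standard definitions. The frame rate of 30 Hz and the stripe length of 640 values are assumptions chosen purely for illustration and are not specifications of any sensor discussed here.

```python
# Standard image sizes as (width, height) in pixels.
RESOLUTIONS = {"VGA": (640, 480), "QVGA": (320, 240)}


def depth_values_per_image(width: int, height: int = 1) -> int:
    """Number of depth values in one range image (height=1 models a 1D stripe)."""
    return width * height


def depth_values_per_second(width: int, height: int, rate_hz: float) -> float:
    """Data rate as the number of depth values acquired per second."""
    return depth_values_per_image(width, height) * rate_hz


vga = depth_values_per_image(*RESOLUTIONS["VGA"])    # 307200 values
qvga = depth_values_per_image(*RESOLUTIONS["QVGA"])  # 76800 values
stripe = depth_values_per_image(640)                 # assumed stripe length
```

At the assumed 30 Hz, a VGA range image yields `depth_values_per_second(640, 480, 30)` = 9,216,000 depth values per second, whereas a 640-value stripe at the same rate yields 19,200. This illustrates why a stripe profiler needs considerably more time to cover a complete view.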

RGB-D sensors catalyzed a multitude of efforts in 3D modeling and recognition due to their low cost. Despite their indisputable usefulness, the work of Meister et al. (2012) shows that for 3D reconstruction of objects, curved and concave details at a scale of around 10 mm are lost and simply smoothed out. This indicates that accurate depth measurements can only be generated with laser sensors. In (Smisek et al., 2011), the accuracy of stereo, ToF and Kinect systems is compared. Stereo and Kinect perform similarly, and the ToF sensor generates the most inaccurate range images. However, the ToF sensor does not seem to be calibrated correctly and noise reduction is not considered. Stoyanov et al. (2011) also compare the Kinect and two ToF sensors with a laser sensor, which is used as ground truth. In their evaluation, the Kinect is slightly better than the two ToF sensors, but the difference in accuracy is minor.

In some robotic applications, such as flying robots, the weight and power consumption of the sensor are an issue. Stereo cameras can be very light, and RGB-D cameras are also light. ToF and laser sensors are both rather heavy. Although RGB-D sensors are light, their shape is not well suited for attaching them to the hand of a robot. The working range is also of interest, as it is the depth interval in which measurements can be obtained. The working range of the Kinect is more limited than that of ToF cameras. For stereo systems it is adjustable, as mentioned before. The high accuracy of laser striper systems comes at the cost of a very narrow working range.

Other robotic applications, for example mobile robots, may need to work outdoors or both outdoors and indoors. In general, stereo systems work better in outdoor scenarios (Schmid et al., 2012), as they depend on the global illumination. ToF sensors also perform well outdoors (Langmann et al., 2012), whereas RGB-D cameras have difficulties with direct sunlight and heated surfaces (Mura et al., 2012).

In this thesis, we will refer to ToF, RGB-D and stereo sensors as areal 3D sensors. In contrast to laser stripers, these obtain a matrix of distances which represents a larger area of the environment and not simply a line.