3.5 Experimental Results
3.5.2 Visual odometry
We conducted two sets of experiments to evaluate the performance of the proposed visual odometry method using six error metrics (three translational and three rotational): z-axis translation (the positive axis extends in the direction in which the Kinect points), x-axis translation (the positive axis extends to the left), y-axis translation (the positive axis extends upwards), roll angle (rotation about the x-axis), pitch angle (rotation about the y-axis) and yaw angle (rotation about the z-axis). The results of these experiments are as follows.
Comparative results for a lit room
To evaluate the performance of our proposed VO method and compare it with the state-of-the-art method (FOVIS [72]), we moved the Kinect sensor at a slow pace around a rectangular path of dimensions 1.13 m × 0.9 m and back to the initial position. Because the sensor returns to its starting pose, the error in each of the six aforementioned components should ideally be zero. Table 3.5 shows the results of these experiments, which indicate that in a lit environment our method is more accurate than FOVIS in most error metrics.
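As an illustration of how the six closed-loop error components could be computed, the following minimal sketch takes the estimated final pose of the sensor (relative to its starting pose) as a 4×4 homogeneous matrix and reports its deviation from the identity. The function names and the Euler-angle convention R = Rz(yaw) · Ry(pitch) · Rx(roll) are assumptions made for this example; the text above only fixes which axis each angle rotates about.

```python
import numpy as np

def euler_to_rotation(roll, pitch, yaw):
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians (assumed convention)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx

def loop_closure_error(final_pose):
    """Absolute closed-loop errors of a 4x4 pose that should equal the identity.

    Returns (translation errors in metres as [x, y, z],
             rotation errors in degrees as [roll, pitch, yaw]).
    """
    rotation = final_pose[:3, :3]
    translation = final_pose[:3, 3]
    # Invert R = Rz(yaw) @ Ry(pitch) @ Rx(roll), valid away from pitch = +/-90 degrees.
    pitch = np.arcsin(np.clip(-rotation[2, 0], -1.0, 1.0))
    roll = np.arctan2(rotation[2, 1], rotation[2, 2])
    yaw = np.arctan2(rotation[1, 0], rotation[0, 0])
    return np.abs(translation), np.abs(np.degrees([roll, pitch, yaw]))

if __name__ == "__main__":
    # Hypothetical drifted pose, used only to exercise the function.
    pose = np.eye(4)
    pose[:3, :3] = euler_to_rotation(*np.radians([2.0, -3.0, 5.0]))
    pose[:3, 3] = [0.05, -0.02, 0.01]
    t_err, r_err = loop_closure_error(pose)
    print("translation error (m):", t_err)
    print("rotation error (deg):", r_err)
```

Reporting absolute values per axis matches the per-component layout of Table 3.5; whether the tabulated means were obtained exactly this way is not stated in the text.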
Comparative results for a dark room
In this experiment, we switched off the lab lights to evaluate the performance of the proposed VO method in a dark room (using the pre-processed IR image for visual feature extraction and matching). Table 3.5 shows that the accuracy of our method in a dark room is similar to the accuracy of FOVIS in a lit room, whereas FOVIS fails completely in a dark environment.
Table 3.2: Comparative effect of using different feature extraction and matching methods on the accuracy of the 3D registration in a dark room.
Method       Pitch error (°)    Process time (s)
SIFT         0.838 ± 0.716      1.78 ± 0.045
SURF         0.873 ± 0.656      1.8 ± 0.058
ORB-BRIEF    0.389 ± 0.270      0.364 ± 0.044
Table 3.3: The performance of 3D alignment with the room lights off using the IR image compared to the 3D alignment with the room lights on using the RGB image.
Method                  Pitch error (°)    Process time (s)
IR image - light off    0.389 ± 0.270      0.364 ± 0.044
RGB image - light on    0.383 ± 0.278      0.3024 ± 0.005
Table 3.4: Comparison between the performance of 3D alignment in a dark room using our method vs. depth only ICP.
Method                          Pitch error (°)    Process time (s)
MSSE + ICP (using IR image)     0.389 ± 0.270      0.364 ± 0.044
Depth only ICP                  3.802 ± 2.87       0.2615 ± 0.057
Table 3.5: Comparison between the accuracy of the proposed VO method and FOVIS.
Method                    X error (m)       Y error (m)       Z error (m)
Our method (dark room)    0.181 ± 0.093     0.129 ± 0.056     0.09 ± 0.022
Our method (lit room)     0.068 ± 0.02      0.034 ± 0.036     0.034 ± 0.023
FOVIS (lit room)          0.137 ± 0.012     0.156 ± 0.012     0.019 ± 0.0142

Method                    Roll error (°)    Pitch error (°)   Yaw error (°)
Our method (dark room)    8.08 ± 2.24       7.639 ± 6.04      4.068 ± 3.524
Our method (lit room)     2.827 ± 0.0952    7.067 ± 3.135     4.3545 ± 1.776
FOVIS (lit room)          2.406 ± 0.796     14.075 ± 1.138    7.1429 ± 1.05
3.6 Conclusion
In this chapter, we outlined the problem of aligning RGB-D images in an environment with lighting variations (in particular, a dark environment). We presented a novel 3D registration approach that is able to align 3D points in a dark room using a Microsoft Kinect sensor. Our method automatically switches between the RGB and IR images for feature extraction, based on the brightness level of the images. The extracted visual features are matched using their feature descriptors, the matches are refined, and an initial transformation estimate is obtained using a robust ranked order statistics estimation technique called MSSE. This transformation is then refined using an ICP method and finally applied to the 3D points in the source frame. We also proposed a visual odometry method that concatenates the MSSE-estimated transformations to obtain a global pose of the sensor.
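The brightness-based switching step can be sketched as follows. This is only an illustration under stated assumptions: the threshold value, the luminance weights, the function name `select_feature_image` and the min-max contrast stretch used as a stand-in for the IR pre-processing are all hypothetical and are not taken from the method described in this chapter.

```python
import numpy as np

# Hypothetical threshold on mean brightness (8-bit scale); would need tuning on real data.
BRIGHTNESS_THRESHOLD = 40.0
# Common luminance weights for converting RGB to grey (an assumption for this sketch).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def select_feature_image(rgb, ir, threshold=BRIGHTNESS_THRESHOLD):
    """Pick the image to use for feature extraction.

    rgb: (H, W, 3) 8-bit colour image; ir: (H, W) raw infrared image.
    Returns (source_name, 8-bit single-channel image).
    """
    gray = rgb[..., :3].astype(np.float64) @ LUMA_WEIGHTS
    if gray.mean() >= threshold:
        # Scene is bright enough: use the grey-scale RGB image.
        return "rgb", np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    # Scene is dark: fall back to the IR image, stretched to the 8-bit range
    # (a simple stand-in for the pre-processing step, not the actual one).
    ir = ir.astype(np.float64)
    span = ir.max() - ir.min()
    if span == 0:
        return "ir", np.zeros(ir.shape, dtype=np.uint8)
    stretched = (ir - ir.min()) / span * 255.0
    return "ir", np.rint(stretched).astype(np.uint8)

if __name__ == "__main__":
    dark_rgb = np.zeros((4, 4, 3), dtype=np.uint8)
    raw_ir = (np.arange(16).reshape(4, 4) * 100).astype(np.uint16)
    source, image = select_feature_image(dark_rgb, raw_ir)
    print(source, image.dtype, image.min(), image.max())
```

The selected image would then be passed to the feature extractor, so the rest of the pipeline does not need to know which sensor stream was used.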
We showed that our system is able to align 3D points with high accuracy in a dark environment by appropriately processing the IR image. We evaluated the performance of different alignment methods in terms of accuracy and computational efficiency under different parameters and settings. We also showed that the proposed VO method effectively estimates the robot's global pose in a dark room using the pre-processed IR image. We found that the proposed VO method is less tolerant of distant objects in a dark environment than in a lit environment, since these objects are less visible in the IR image than in the RGB image.
Chapter 4
3D SLAM in Texture-less Environments Using Rank Order Statistics
4.1 Introduction
As mentioned in Chapter 3, the availability of affordable RGB-D sensors has generated intense interest in creating dense 3D models of the environment, which are very useful for autonomous navigation applications. However, the use of RGB-D sensors for 3D SLAM poses a number of challenges. In particular, mobile robots are commonly required to navigate in texture-less areas such as offices, warehouses and residential buildings. The registration of frames in such texture-less environments is difficult because there are no readily available visual cues with which to align these frames. This issue is illustrated in figure 4.2, which shows a typical corridor in a university building. Furthermore, the size of the data generated by RGB-D sensors makes it difficult to capture, process and visualize this data in real time.
In the previous chapter, we proposed a method that uses the IR image, which is provided by the RGB-D sensor, for registering frames in dark environments. However, since this method relies on using visual information for matching frames, it would simply fail when registering texture-less images, whether we use RGB or IR images for registration.
This chapter addresses the problem of 3D SLAM in texture-less environments (in addition to mapping environments with lighting variations that may limit the amount of visual information). Our aim is to develop a fast and accurate method that does not rely on the information provided by the RGB images. This enables us to study the limits of using structural information for solving 3D SLAM. To fulfill this goal, we developed a sampling strategy to extract salient geometric 3D keypoints from sequential frames. We then assign a descriptor to each keypoint, match the keypoints using their descriptors, refine the matches and calculate a rigid-body transformation between the two frames using a robust estimator [86]. The relative transformations are then concatenated up to the current time, resulting in a global pose. We then employ a loop closure and pose graph optimization technique [148] to reduce the drift and obtain a globally consistent trajectory. Finally, a map is constructed by projecting and transforming the points according to the optimized trajectory. An example of a map obtained using the proposed method is shown in figure 4.6. Extraction and matching of 3D keypoints for a typical point cloud captured by an RGB-D sensor is very time consuming (in this work, we utilize the Microsoft Kinect sensor, which is based on the structured light approach for obtaining the depth information). The emphasis here is therefore on selecting a small subset of points that can register two point clouds with an accuracy similar to that obtained using the entire sets.
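The transformation and concatenation steps can be illustrated with a minimal sketch. It estimates a rigid-body transformation from already matched 3D keypoints using the standard SVD-based least-squares solution, and then chains relative transformations into global poses. Note that this plain least-squares fit is a swapped-in stand-in, not the robust estimator [86] used in this work, and it assumes the matches contain no outliers. The function names and the convention that each relative transformation maps points from frame k to frame k-1 are assumptions made for this example.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform T (4x4) such that dst ~ R @ src + t.

    src, dst: (N, 3) arrays of matched 3D points, N >= 3, assumed outlier-free.
    """
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (src - src_centroid).T @ (dst - dst_centroid)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = dst_centroid - rotation @ src_centroid
    return transform

def concatenate_poses(relative_transforms):
    """Chain relative transforms into global poses, starting from the identity.

    Assumed convention: relative_transforms[k] maps frame k+1 into frame k.
    """
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for transform in relative_transforms:
        pose = pose @ transform
        trajectory.append(pose.copy())
    return trajectory

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(50, 3))
    angle = 0.3
    true_r = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                       [np.sin(angle), np.cos(angle), 0.0],
                       [0.0, 0.0, 1.0]])
    true_t = np.array([0.1, -0.2, 0.05])
    moved = points @ true_r.T + true_t
    estimate = estimate_rigid_transform(points, moved)
    print(np.round(estimate, 3))
    print(len(concatenate_poses([estimate, estimate])))
```

In a full system, the global poses produced this way would accumulate drift, which is why the loop closure and pose graph optimization stage follows.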
The main contribution is the development of an informative sampling based 3D feature extraction technique. The method exploits the geometric information of the points and their neighbors to identify the points that carry the most useful information. We call the points resulting from our informative sampling scheme Ranked Order Statistics (ROS) keypoints. We show that the proposed keypoint extraction method is highly repeatable and can obtain a subset of the original point cloud that yields a very accurate registration compared to using a point cloud containing many more points (≈ 15 times more points). The main advantage of this sampling technique is that it significantly reduces computational time. In fact, we will show that our method outperforms several state-of-the-art registration methods in both accuracy and computational efficiency. Figure 4.1 shows a system overview of the proposed SLAM method. The above steps are outlined in the following sections.
The rest of this chapter is organized as follows. In section 4.2 the related work in this area is reviewed. We will present our informative sampling based feature extraction method in section 4.3 and our 3D registration and mapping method in section 4.4. Results are presented in section 4.5 followed by a conclusion in section 4.6.
[Figure 4.1 (block diagram). Recoverable block labels: point cloud pre-processing, ROS keypoint extraction, feature descriptor assignment, initial matching, transformation estimation, key-frame constructor, loop closure detection, pose graph construction, graph optimization, map building.]
Figure 4.1: A system overview of the proposed SLAM method.