Monocular Visual Inertial Odometry using Learning-based Methods

(1)

Dissertations and Theses

7-2020

Monocular Visual Inertial Odometry using Learning-based

Methods

Yuan Tian

Follow this and additional works at: https://commons.erau.edu/edt Part of the Mechanical Engineering Commons

Scholarly Commons Citation Scholarly Commons Citation

Tian, Yuan, "Monocular Visual Inertial Odometry using Learning-based Methods" (2020). Dissertations and Theses. 527.

https://commons.erau.edu/edt/527

This Dissertation - Open Access is brought to you for free and open access by Scholarly Commons. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of Scholarly Commons. For more information, please contact [email protected].

(2)

MONOCULAR VISUAL INERTIAL ODOMETRY USING LEARNING-BASED METHODS

by

Yuan Tian

A Dissertation Submitted to the College of Engineering Department of Mechanical Engineering in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy in Mechanical Engineering

Embry-Riddle Aeronautical University Daytona Beach, Florida

(3)

MONOCULAR VISUAL INERTIAL ODOMETRY USING LEARNING-BASED METHODS

by

Yuan Tian

This thesis was prepared under the direction of the candidate's Dissertation Committee Chair, Dr. Marc Compere, Associate Professor, Daytona Beach Campus, and Dissertation

Committee Members Dr. Eric Coyle, Associate Professor, Daytona Beach Campus, Dr. Patrick Currier, Associate Professor, Daytona Beach Campus, Dr. Jianhua Liu, Associate Professor, Daytona Beach Campus, and Dr. Hongyun Chen, Assistant Professor, Daytona Beach Campus, and has been approved by the Dissertation Committee. It was submitted

to the Department of Mechanical Engineering in partial fulfillment of the requirements for the degree of Master of Science in Mechanical Engineering

Dissertation Review Committee:

Marc Compere

DigitallysignedbyMarcCompere Date: 2020.07.09 17 :34:22 -04'00'

Marc Compere, Ph.D. Committee Chair

Eric Coyle

Digitally signed by Eric Coyle

Date: 2020.07.14 14:17 :18 -04'00'

Eric Coyle, Ph.D. Committee Member

� 7/14/2020

----C,/

J .ll,,U . .Ll..1.\..1.U. .J...J.lU' ..I. J.J..LJ •

(�----vvt-Committee Member

E

r

I•

C

Oy

I

e

Digitally signed by Eric Coyle Date: 2020.07.1414:17 :31 -04'00' Eric Coyle, Ph.D. Ph.D. Program Coordinator, /\ J /Me�hanical �ngineering

l1P-•-"-""""Q,>,,,,

Maj Mirmirani, Ph.D.

Dean, College of Engineering

I

, • Digitally signed by Patrick Currier

Patnck Gurner

Date: 2020.07.1412 :51 :32

-04'00' Patrick CmTier, Ph.D.

Committee Member

Hongyun Chen

DlgilallysignedbyHongyunChen Date: 2020.07.14 15:59:58-04'00' Hongyun Chen, Ph.D. Committee Member Eduardo Divo, Ph.D. Department Chair, Mechanical Engineering

�r

Senior Vice President for Academics Affairs and Provost

7 -

'::r-,...

"20 Date

(4)

(5)

(6)

Acknowledgements

First, I would like to take this opportunity to thank my advisor, Dr. Marc Compere, for his valuable advice and selfless help for both my coursework, research and even in the preparation of my dissertation during my stay at Emery-Riddle Aeronautical University. I still remember that I sat in his office, saying I would determine the research topic next week. After four and a half years, I have grown and become more cautious and mature towards research. I benefit from his high standard not only for academic research, but also for normal life.

Moreover, I would also like to thank other dissertation committee members, Dr. Eric. Coyle, Dr. Patrick Currier, Dr. Jianhua Liu and Dr. Hongyun Chen, for their valuable time and comments. Also, I want to thank Dr. Divo, the Mechanical Engineering department chair. He has always been so nice as to give any help he can. Thanks are also given to our department office assistant, Mrs. J. Gibbs, for her help either related to the study or normal office life.

I would also like to extend my thanks to my friends, Dancheng Yang, Mengyuan Cheng, Xinyu Zhao and Y aoyao Huang for your company and encouragement whenever I needed you.

Last, but most importantly, my parents are worthy of the greatest thanks. They have given me the freedom and selfless love to make me strong and brave. From the bottom of my heart, I hope you can live in happy forever.

(7)

Researcher: Yuan Tian

Abstract

Title: Monocular Visual-Inertial Odometry using Learning-based Methods

Institution: Emb_ry-Riddle Aeronautical University

Degree: Doctor of Philosophy in Mechanical Engineering

Year: 2020

Precise pose information 1s a fundamental prerequisite for numerous applications in robotics, Artificial Intelligent and mobile computing. Many well-developed algorithms

have been established using a single sensor or multiple sensors. Visual Inertial Odometry

(VIO) uses images and inertial measurements to estimate the motion and is considered a key technology for GPS-denied localization in the real world and also virtual reality and augmented reality.

This study develops three novel learning-based approaches to odometry estimation using a monocular camera and inertial measurement unit. The networks are well-trained on standard datasets, KITTI and EuROC, and a custom dataset using supervised, unsupervised and semi-supervised training methods. Compared to traditional methods, the deep-learning methods presented here do not require precise manual synchronization of the camera and IMU or explicit camera calibration.

To the best of our knowledge, the proposed supervised method is a novel end-to-end

trainable Visual-Inertial Odometry method with an IMU pre-integration module, that

simplifies the network architecture and reduces the computation cost. Meanwhile, the unsupervised Visual-Inertial Odometry method shows its novelty in achieving outstanding

accuracy in odometry estimation while training with monocular images and inertial

(8)

measurements only. Last but not least, the semi-supervised method is the first Visual Inertial Odometry approach that uses a semi-supervised training technique in the literature, allowing the network to learn from both labeled and unlabeled datasets.

Through our qualitative and quantitative experimentation on a wide range of datasets, we conclude that the proposed methods can be used to obtain accurate visual localization information to a wide variety of consumer devices and robotic platforms.

(9)

(10)

(11)

(12)

(13)

(14)

(15)

(16)

(17)

(18)

(19)

(20)

(21)

(22)

(23)

(24)

(25)

(26)

(27)

(28)

(29)

(30)

(31)

(32)

(33)

(34)

(35)

(36)

(37)

(38)

(39)

(40)

(41)

(42)

(43)

(44)

(45)

(46)

(47)

(48)

(49)

(50)

(51)

(52)

(53)

(54)

(55)

(56)

(57)

(58)

(59)

(60)

(61)

(62)

(63)

(64)

(65)

(66)

(67)

(68)

(69)

(70)

(71)

(72)

(73)

(74)

(75)

(76)

(77)

(78)

4.2.2 Depth Estimation

a. Methodology

As it is known to all, depth estimation from images is an important tool in a variety of applications, especially in robotics. One big problem for monocular visual-inertial odometry is the lack of one-dimension information, scale, that makes depth estimation an important part. For example, although VIOLearner[56] claims that it is an unsupervised VIO approach, it requires the ground-truth depth map for training. Although several dedicated ranging sensors, such as LIDAR and structured light sensors, provide superior depth detection compared, they are not suited for all applications, such as hand-held devices [36]. Moreover, LID AR units are usually high-cost, while structured light sensors suffer from poor detection range [36]. Cameras remain a cost-friendly and competitive sensor choice for small robotic platforms. Traditionally, stereo or multiple cameras are used, and depth is calculated as the inverse of image disparity.

Encoder Decoder

Figure 4.2. Scheme of Depth Estimation Network

In this research, a DepthNet that is recreated from [73] is proposed to generate depth

information from consecutive monocular images, as shown in Figure 4.3. Depth estimation

(79)

(80)

(81)

(82)

(83)

(84)

(85)

(86)

(87)

(88)

(89)

(90)

(91)

(92)

(93)

(94)

(95)

(96)

(97)

(98)

(99)

(100)

(101)

(102)

(103)

(104)

(105)

The first interesting finding is that learning-based methods outperform traditional methods. The reasonable explanation is that traditional methods cannot handle the raw IMU data without synchronization. Moreover, proposed SemiVIO beats other VIO learningbased methods, VIOLearner and SfMLearner in translation. However, our results could not successfully beat Deep VIO on KITTI sequence 10 [32], which requires stereo cameras for training.

Secondly, among three methods, our supervisedVIO shows the best performances with the lowest average translational and rotational error on most of training datasets ( sequence

00-08), as shown in Figure 6.1. The possible reason for that is the training processes are

done on the same sequences. A conclusion can be drawn that within the same or highly similar environments, supervised method, training with ground truth, can provide the most accurate estimates. However, the test on sequence 09 and 10, semiVIO outperforms both supervised and unsupervised VIO. It is possible that the supervised method suffers from overfitting, that makes it perfect for training set but less accurate for testing. The improvement from semiVIO to unsupervisedVIO demonstrates that finetuning with ground truth is an effective way to boost the performance of the network.

One more thing that worth noting is that, on sequence 10, suprevisedVIO beats unsupervisedVIO while on sequence 09, the performance is opposite. With a deeper study of the KITTI dataset, we find that on sequence 10, there are several edge cases such as the forward vehicle moving backward and large moving objects. The unsupervised method partially relies on depth estimation and theses special cases degrade the performance of

(106)

(107)

(108)

(109)

(110)

(111)

(112)

Table 6.4. Absolute Trajectory Error (ATE) on KITTI odometry dataset. The results of other baselines are taken from[26],[56]

Method Sequence 09 Sequence IO

ORB-SLAM 0.014 ± 0.008 0.012 ± 0.01 I SfM-Learner 0.016 ± 0.009 0.013 ± 0.009 VIOLearner 0.012 0.012 Supervised VIO 0.01 I ± 0.005 0.010 ± 0.009 Unsupervised VIO 0.010 ± 0.006 0.01 I ± 0.007 Semi-supervised VIO 0.010 ± 0.003 0.009 ± 0.008

6.3 Results on EuROC dataset

Additionally, we successfully run three proposed methods on EuROC dataset [53]. KITTI dataset is famous for the outdoor environment and EuROC is famous for the indoor environment. The network configuration and training parameters and strategies are the same as Chapter 2-4. EuROC dataset is divided into a training set (80%) and a validation

set (20%).

Table 6.5 reports the results on MH04 and MH05. Results of other methods are reproduced from [59]. It is obvious that all the VIO methods outperform Deep VO because of combining IMU data with a camera. Moreover, semiVIO outperforms all other methods in terms of rotational accuracy. However, compared with the results on KITTI dataset, learning-based methods' strengths are limited in terms of translational accuracy. Traditional methods, ORB-SLAM and VINS, outperforms not only proposed methods, but also other learning-based approaches on translational accuracy.

(113)

Table 6.5. Comparison of trajectories estimation performance with existing approaches on EuROC dataset.

SupervisedVIO SemiVIO UndeepVIO DeepVIO ORB-SLAM DeepVO VINS

Seq

1rel frel 1rel frel trel frel trel frel 1rel fr_el _1rel _frel _1rel _frel

MH04 0.43 0.60 0.33 0.58 0.52 0.62 0.69 0.76 0.06 1.76 :::::5 ::::: 30 0.34 0.87

MH05 0.57 0.80 0.55 0.70 0.69 0.95 0.52 0.81 0.05 1.56 ::::: 6 ::::: 35 0.29 0.69

Average 0.50 0.70 0.44 0.64 0.56 0.79 0.61 0.79 0.06 1.66 ::::: 5.5 ::::: 32.5 0.32 0.78

(114)

(115)

(116)

(117)

(118)

(119)

(120)

(121)

(122)

(123)

(124)

(125)

(126)

(127)

(128)

(129)

(130)

(131)

(132)

(133)

(134)

(135)

(136)

(137)

(138)

(139)

(140)

(141)

(142)

(143)

(144)

(145)

(146)

(147)

(148)

(149)

(150)

(151)

(152)

(153)

(154)

(155)

(156)

(157)

(158)

(159)

(160)

(161)

(162)

(163)

(164)

(165)

(166)

(167)