Optical Flow and Deep Learning Based Approach to Visual Odometry

Figure 3.2: Convolutional neural network architecture based on the contractive part of FlowNetS. 3.2 Training Data For this system, ego-motion is estimated from two consecutive video frames. Instead of inputting the two images directly into the network, a flow image is computed using the FlowNetS architecture, and that result is used as the network input. Each of these flow images represents the change between two video frames, so the corresponding motion differentials can be computed from the ground truth provided by the KITTI odometry dataset [8]. The KITTI dataset gives 11 video sequences with ground truth data for each frame for training, and 11 more sequences without ground truth data to be used in the online evaluation of a visual odometry system. In this work, as in other works of similar function, the 11 sequences without ground truth are ignored; instead, sequences 08, 09, and 10 are used for evaluation, and sequences 00 through 07 are used for training and fine-tuning the visual odometry system. The number of frames in each of the training and testing sequences is given in Table 3.1. Three examples of images in the dataset are shown in Figs. 3.3, 3.4, and 3.5. Examples of flow images, colored for visual representation, are shown in Figs. 3.6, 3.7, and 3.8. The flow images show how pixels should move for straight vehicle movement, left turns, and right turns, respectively. Although these are color images, they are only shown for human visualization. Raw, two-channel optical flow images are used as input to the system.
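The motion differentials described above can be sketched in plain NumPy. The 3x4 [R | t] pose convention follows the KITTI odometry format, but the helper name `relative_motion` and the example values are illustrative assumptions, not the thesis's own code:

```python
import numpy as np

def relative_motion(pose_prev, pose_curr):
    """Motion differential between two consecutive ground-truth poses.

    KITTI supplies each pose as a 3x4 [R | t] matrix taking points from the
    current camera frame to the first frame of the sequence; the camera
    motion between the two frames is then inv(T_prev) @ T_curr.
    """
    def homogeneous(pose):
        T = np.eye(4)
        T[:3, :] = pose
        return T

    return np.linalg.inv(homogeneous(pose_prev)) @ homogeneous(pose_curr)

# Example: a pure forward translation of 1 m along the camera z-axis.
pose0 = np.hstack([np.eye(3), np.zeros((3, 1))])
pose1 = np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])
delta = relative_motion(pose0, pose1)
```

Each such differential pairs with one two-channel flow image to form a training sample.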

Learning Kalman Network: A deep monocular visual odometry for on-road driving

The pipeline of the proposed LKN-VO with 3D dense mapping is shown in Fig. 1. To be more specific, the dense optical flow and depth are first obtained using FlowNet2 [44] and DepthNet [38], respectively. Subsequently, the LKN simultaneously estimates the ego-motion from the current measurement and filters the states from a sequence of measurements. Consequently, a sequence of filtered states, i.e. 6 DOF relative poses, can be transformed into the global pose trajectory by the SE(3) composition layer [10]. Simultaneously, the point cloud is consistently generated from the estimated depth and incrementally mapped with the learned global pose. Furthermore, an Octree depth fusion [43, 45] is employed for robust depth refinement, in which multi-view measurements are used to eliminate inaccurate predictions. Finally, a dense 3D map can be obtained. As shown in Figs. 2 and 3, LKN is a computation graph made up of a Kalman filter architecture with learned observation and transition models, which can be trained as a complete graph from end to end. Please note that only monocular RGB images are employed for localization and mapping.
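The composition of relative poses into a global trajectory can be illustrated with a short NumPy sketch. `compose_trajectory` is an assumed name, and this plain homogeneous-matrix chain is a non-differentiable stand-in for the SE(3) composition layer of [10]:

```python
import numpy as np

def compose_trajectory(relative_poses):
    """Chain 4x4 SE(3) relative poses into global poses by successive
    right-multiplication, starting from the identity."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for T_rel in relative_poses:
        pose = pose @ T_rel
        trajectory.append(pose.copy())
    return trajectory

# Example: two identical forward steps of 1 m along the camera z-axis.
step = np.eye(4)
step[2, 3] = 1.0
trajectory = compose_trajectory([step, step])
```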

Dynamic Attention-based Visual Odometry

Abstract— This paper proposes a dynamic attention-based visual odometry framework (DAVO), a learning-based VO method, for estimating the ego-motion of a monocular camera. DAVO dynamically adjusts the attention weights on different semantic categories for different motion scenarios based on optical flow maps. These weighted semantic categories can then be used to generate attention maps that highlight the relative importance of different semantic regions in input frames for pose estimation. In order to examine the proposed DAVO, we perform a number of experiments on the KITTI Visual Odometry and SLAM benchmark suite to quantitatively and qualitatively inspect the impacts of the dynamically adjusted weights on the accuracy of the evaluated trajectories. Moreover, we design a set of ablation analyses to justify each of our design choices and validate the effectiveness as well as the advantages of DAVO. Our experiments on the KITTI dataset show that the proposed DAVO framework provides satisfactory performance in ego-motion estimation and is able to deliver competitive performance when compared to contemporary VO methods.
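The weighted combination of semantic categories into an attention map can be sketched as follows. In DAVO the weights are predicted by a network from optical flow; here they are hand-set, and `attention_map` is an illustrative helper, not the paper's implementation:

```python
import numpy as np

def attention_map(masks, weights):
    """Combine per-category semantic masks into one attention map using
    per-category weights, rescaled so the strongest region equals 1."""
    combined = np.zeros_like(next(iter(masks.values())), dtype=float)
    for category, mask in masks.items():
        combined += weights.get(category, 0.0) * mask
    peak = combined.max()
    return combined / peak if peak > 0 else combined

# Toy 2x2 frame with two semantic categories and hand-set weights.
masks = {
    "road":    np.array([[1, 1], [0, 0]]),
    "vehicle": np.array([[0, 0], [1, 0]]),
}
att = attention_map(masks, {"road": 1.0, "vehicle": 0.5})
```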

Learning and Searching Methods for Robust, Real-Time Visual Odometry.

Feature detectors have long been used in computer vision to focus processing on portions of the image with strong signal, or to represent the image abstractly with highly invariant image features. These detectors are typically categorized as corner, edge, or region detectors, and much work has been done in this domain. A few of the best-known detectors include the Harris [19] and FAST [20] corner detectors, the Canny edge detector [25], and region detectors such as SIFT [26] and SURF [27]. Dense methods have also been a focus of much research, such as dense disparity estimation (dense stereo) and optical flow. Scharstein and Szeliski published the well-known Middlebury dense stereo datasets for evaluating dense stereo methods against ground truth [14, 15], including a review of approaches. Recent work by Newcombe et al. demonstrated monocular camera tracking with a dense representation on a GPU [47] and showed great resilience to motion blur, but heavy GPU usage may require too much power for use on small robotic platforms.
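As a concrete illustration of a corner detector, here is a minimal, unoptimized NumPy version of the Harris response R = det(M) - k·trace(M)², where M is the structure tensor of the image gradients. The 3x3 box sum stands in for the Gaussian window used in practice:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response: positive at corners, negative along edges,
    zero in flat regions."""
    Iy, Ix = np.gradient(img.astype(float))

    def box_sum(a):
        p = np.pad(a, 1)
        h, w = a.shape
        return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

    Sxx = box_sum(Ix * Ix)
    Syy = box_sum(Iy * Iy)
    Sxy = box_sum(Ix * Iy)
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2

# A white square on black: strong response at its corners only.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```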

A Global Occlusion-Aware Approach to Self-Supervised Monocular Visual Odometry

SSM-VO is often cast as a view synthesis problem based on solving two closely intertwined problems: monocular depth estimation and relative camera pose regression. The key challenge faced by an SSM-VO method is the presence of occlusions and moving objects in the scene (Figure 1). Existing SSM-VO approaches (Bian et al. 2019; Luo et al. 2018) typically leverage multiple frames and additional models (e.g., optical flow models) to estimate an occlusion mask. With the mask, partial view synthesis is performed by excluding the masked regions. However, occlusion and moving object detection is itself an unsolved problem. Inaccurate detection inevitably results in incorrect depth as well as pose estimation. These methods thus choose to use more
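The masked partial view synthesis described above amounts to scoring the photometric error only over unmasked pixels. A minimal sketch, with an L1 error as an assumed stand-in for whatever photometric loss a given SSM-VO method uses:

```python
import numpy as np

def masked_photometric_loss(target, synthesized, valid_mask):
    """Mean absolute photometric error over pixels the mask marks as
    valid; occluded or moving-object pixels (mask == 0) are excluded."""
    error = np.abs(target - synthesized) * valid_mask
    n_valid = valid_mask.sum()
    return float(error.sum() / n_valid) if n_valid > 0 else 0.0

# Toy frames: the bottom-right pixel is flagged occluded and ignored.
target = np.ones((2, 2))
synthesized = np.zeros((2, 2))
mask = np.array([[1.0, 1.0], [1.0, 0.0]])
loss = masked_photometric_loss(target, synthesized, mask)
```

The point the excerpt makes is that this loss is only as good as the mask: pixels wrongly kept or wrongly excluded bias both depth and pose.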

Visual odometry and vision system measurements based algorithms for rover navigation

Sensors dedicated to navigation were limited in size and weight, and the computational capabilities were limited as well: the clock speed of the CPU was 10 MHz. The attitude and position of the rover were to be calculated in two steps. The solar direction is measured after the robot awakes and before the robot goes into sleep mode, and it is used to calculate the rover attitude with reference to the small body. The attitude is then reconstructed by gyroscope signal integration. Then surface images of the asteroid, captured during rover hopping, are used to estimate the hop velocity and gravity; these values were used to estimate absolute and discrete localization. The velocity was estimated using the optical flow, and the distance from the comet by considering the rover's own shadow in the acquired image.
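The excerpt does not give the rover's actual flow-to-velocity model, so the following is only a generic sketch under a simple assumption: a downward-looking pinhole camera at known height, where a pixel shift p relates to ground motion x through p = f·x/h:

```python
def velocity_from_flow(mean_flow_px, height_m, focal_px, dt_s):
    """Ground speed from mean image flow for a downward-looking camera at
    known height h: v = p * h / (f * dt) under the pinhole model."""
    return mean_flow_px * height_m / (focal_px * dt_s)

# 10 px of mean flow at 2 m height, f = 100 px, over 0.5 s.
speed = velocity_from_flow(10.0, 2.0, 100.0, 0.5)
```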

Visual Odometry Estimation Using Selective Features

stereo camera approach. Since the camera was mounted on a level slider and the camera's pose was fixed, the setup had epipolar geometry. The cameras' baseline distance was the length of the slider bar, and this information made calculations easier. The main assumption is that neither the robot nor the surroundings move during the image capturing stage. Once the images were captured, corners in one image were detected using Moravec's corner detector [9], and these corners are matched to the right image using NCC (normalized cross-correlation). These corners are tracked to the next consecutive frame, capturing the incremental motion of the robot using optical flow. Variance in the overall flow and discrepancies in the neighboring pixel depth information of the features can be used for outlier rejection. With the set of 3D points tracked between subsequent frames, a rigid body transformation is used to align the triangulated 3D points. A weighted least squares solution over the triangulation vectors of the features, based on their weights, was used to reduce the mean error in solving the equation obtained from the two sets of 3D points. Once the camera captured the nine images and analyzed them for motion estimation, the robot would move. The motion between image capturing stages was very minimal, and hence the speed at which the robot could travel was restricted. This was a major drawback. Moravec realized the stereo camera by setting up a camera free to slide on an axis perpendicular to the scene being captured. As the sliding is done at known distances and the images are captured by a single camera, they form stereo image pairs. This approach proved to be more accurate in terms of depth computation, as the stereo computation could be done over multiple images captured at discrete known distances.
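The NCC matching step can be stated compactly: center both patches, then normalize their dot product by the product of their norms. A minimal sketch (the helper name `ncc` is ours):

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equal-size patches: +1 for a
    perfect match, -1 for a perfectly inverted patch, near 0 when unrelated."""
    a = patch_a.astype(float) - patch_a.mean()
    b = patch_b.astype(float) - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# A patch matches itself perfectly and anti-matches its negative.
patch = np.arange(9, dtype=float).reshape(3, 3)
score_same = ncc(patch, patch)
score_inv = ncc(patch, 8.0 - patch)
```

Because of the mean subtraction and normalization, NCC is invariant to affine brightness changes between the left and right images, which is why it suits stereo matching.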

Using Unsupervised Deep Learning Technique for Monocular Visual Odometry

Corresponding author: Qiang Liu (qliui@essex.ac.uk) ABSTRACT Deep learning-based visual odometry systems have recently shown promising results compared to feature matching-based methods. However, deep learning-based systems still require ground truth poses for training and additional knowledge to obtain absolute scale from monocular images for reconstruction. To address these issues, this paper presents a novel visual odometry system based on a recurrent convolutional neural network. The system employs an unsupervised end-to-end training approach. The depth information of scenes is used alongside monocular images to train the network in order to inject scale. Poses are inferred only from monocular images, thus making the proposed visual odometry system a monocular one. Experiments are conducted, and the results show that the proposed method performs better than other monocular visual odometry systems. This paper has made two main contributions:

Fast Uncertainty Estimation for Deep Learning Based Optical Flow

Without uncertainty-aware reasoning, the optical flow model, especially when it is used in mission-critical fields such as robotics and aerospace, can cause catastrophic failures. Although several approaches, such as those based on Bayesian neural networks, have been proposed to handle this issue, they are computationally expensive. Thus, to speed up the processing time, our approach applies a generative model, which is trained on input images and an uncertainty map derived through a Bayesian approach. By using synthetically generated images of spacecraft, we demonstrate that the trained generative model can produce the uncertainty map 100∼700 times faster than the conventional uncertainty estimation method used for training the generative model itself. We also show that the quality of the uncertainty map derived by the generative model is close to that of the original uncertainty map. By applying the proposed approach, a deep learning model operated in real time can avoid disastrous failures by taking the uncertainty into account, as well as achieve better performance by removing uncertain portions of the prediction result.
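The excerpt does not specify the exact Bayesian method, but a common "conventional" estimate of the kind a fast generative model would be trained to approximate is sample variance over repeated stochastic forward passes (e.g. Monte Carlo dropout). A sketch of that slow baseline, with assumed shapes:

```python
import numpy as np

def sampled_uncertainty(flow_samples):
    """Per-pixel uncertainty from repeated stochastic forward passes of a
    flow model: the variance across samples, summed over the two flow
    channels."""
    stack = np.stack(flow_samples)          # (S, H, W, 2)
    return stack.var(axis=0).sum(axis=-1)   # (H, W)

# Four "forward passes" that agree everywhere except one pixel.
samples = [np.zeros((2, 2, 2)) for _ in range(4)]
samples[0][0, 0, 0] = 1.0
uncertainty = sampled_uncertainty(samples)
```

The cost of this baseline scales with the number of passes S, which is exactly what the paper's single-pass generative model avoids.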

Monocular Visual Inertial Odometry using Learning-based Methods

This study develops three novel learning-based approaches to odometry estimation using a monocular camera and inertial measurement unit. The networks are trained on the standard KITTI and EuROC datasets and on a custom dataset, using supervised, unsupervised, and semi-supervised training methods. Compared to traditional methods, the deep learning methods presented here do not require precise manual synchronization of the camera and IMU or explicit camera calibration.

Non-Parametric Learning for Monocular Visual Odometry

non-parametric model during the learning stage, and extrapolated to new data using inference based on the GP framework regression methodology. The GP framework, however, struggles with angular motion estimation, resulting in a residual drift that also compromises long-term localization results. We attribute this angular drift to the presence of smaller overlapping areas between frames, which compromises the optical flow distribution throughout the entire image, and also to the presence of fewer vehicle turning samples in the training dataset. Since the vehicle moves mostly in a straight, forward motion during navigation, the various smooth and sharp turns encountered during tests were under-represented and there was not enough information for a robust recovery. Furthermore, the MOGP framework is not capable of correctly modelling the cross-dependencies between outputs, generated by vehicle constraints that limit linear and angular motion to only certain specific combinations, and thus linear velocity information does not translate into a better angular velocity estimation, and vice-versa. This information exchange between different outputs would be valuable as a way to decrease the amount of training information

Deep Learning Based Visual Tracking: A Review

Da Zhang, Hamid Maei, Xin Wang, and Yuan-Fang Wang presented the first neural-network tracker that combines convolutional and recurrent networks with an RL algorithm in [12]. The tracker is capable of effectively leveraging temporal and contextual information among consecutive frames. It comprises three components: a CNN extracting the best tracking features in each video frame, an RNN constructing the video memory state, and a reinforcement learning (RL) agent making target location decisions. The tracking problem is formulated as a decision-making process, and the model can be trained with RL algorithms to learn good tracking policies that pay attention to continuous, inter-frame correlation and maximize tracking performance in the long run. The proposed tracking approach works well in various scenarios on artificial video sequences with ground truth.

Automatic Visual Features for Writer Identification: A Deep Learning Approach

This work was supported in part by the Higher Education Department, Khyber Pakhtunkhwa, under Grant ADP 483/170009. ABSTRACT Identification of a person from their handwriting is a challenging problem; however, it is not new. No one can repudiate its applications in a number of domains, such as forensic analysis, historical documents, and ancient manuscripts. Deep learning-based approaches have proved to be the best feature extractors from massive amounts of heterogeneous data and provide promising and surprising predictions of patterns as compared with traditional approaches. We apply a deep transfer convolutional neural network (CNN) to identify a writer using handwriting text-line images in the English and Arabic languages. We evaluate how different frozen layers of the CNN (Conv3, Conv4, Conv5, Fc6, Fc7, and the fusion of Fc6 and Fc7) affect the identification rate of the writer. In this paper, transfer learning is applied as a pioneer study using ImageNet (base dataset) and the QUWI dataset (target dataset). To decrease the chance of over-fitting, data augmentation techniques such as contours, negatives, and sharpness are applied to the text-line images of the target dataset. A sliding window approach is used to make patches as the input unit to the CNN model. The AlexNet architecture is employed to extract discriminating visual features from multiple representations of image patches generated by enhanced pre-processing techniques. The extracted features from the patches are then fed to a support vector machine classifier. We realized the highest accuracy using the frozen Conv5 layer: up to 92.78% on English, 92.20% on Arabic, and 88.11% on the combination of Arabic and English.

Addressing the Data Scarcity of Learning-based Optical Flow Approaches

The proposed techniques for the generation of optical flow ground truth have several advantages and disadvantages. Middlebury's approach based on fluorescent textures provides very accurate ground truth but is restricted to a lab environment and needs time-consuming preparation. In contrast, KITTI generates ground truth outside of the lab. However, the re-projection of laser measurements into the images only allows for sparse ground truth, and the setup is not applicable in arbitrary environments. In addition, cars are the only class of dynamic objects for which approximate ground truth is provided. Finally, the technique used in the HCI Benchmark can further improve on the precision of the optical flow ground truth in comparison to KITTI but is restricted to a certain area that was scanned in advance. In conclusion, all real datasets so far are restricted to a certain environment or setting and are missing complex scenes with non-rigid objects. We tackle this problem in Chapter 4 with a novel approach to obtain accurate reference data from high-speed video cameras by tracking pixels through a densely sampled space-time volume. In contrast to previous methods, our approach allows the acquisition of optical flow ground truth in challenging everyday scenes and, in addition, allows us to augment the data with realistic effects such as motion blur to compare methods in varying conditions. Using this approach, we generate 160 diverse real-world sequences of dynamic scenes with a significantly larger resolution (1280 × 1024 pixels) than previous optical flow datasets and compare several state-of-the-art optical flow techniques on this data under varying conditions.

Recognizing Textual Entailment based on Deep Learning Approach

The decision component is used to decide whether the text entails the hypothesis or not, depending on the output of the comparison component. Deep Neural Networks (DNNs) are extremely powerful machine learning models that achieve excellent performance on difficult problems such as speech recognition and visual object recognition. DNNs are powerful because they can perform arbitrary parallel computation for text recognition.

Application of vision-based particle filter and visual odometry for UAV localization

Future work in the field of this paper would include additional experiments comparing the proposed algorithm with geo-referencing based localization to benchmark localization accuracy. An alternative image similarity coefficient based on deep learning could be proposed as a replacement for the correlation coefficient to improve the localization accuracy of the algorithm.

DeshadowGAN: a deep learning approach to remove shadows from optical coherence tomography images

Thus, we may be able to provide a robust deep learning framework to consistently remove retinal blood vessel shadows of varying sizes and intensities. In addition, DeshadowGAN was able to successfully eliminate the deleterious effects of light attenuation affecting the visibility of retinal layers and deeper tissues such as the LC. DeshadowGAN helped substantially recover the visibility of the anterior lamina cribrosa boundary, where sensitive pathophysiologic deformation could signal the onset of early glaucoma [32–34]. Deep collagenous tissues such as the LC and the adjacent peripapillary sclera are the main load-bearing tissues of the eye in the ONH region [35], and it has been reported that biomechanical and morphological changes in these tissues may serve as risk factors for glaucoma [36–38]. The robustness of the OCT-based measurements performed on these tissues could be substantially improved after application of our proposed algorithm.

Deep-learning-based motion-correction algorithm in optical resolution photoacoustic microscopy

the envelopes of each depth-resolved photoacoustic signal using the Hilbert transform and projected the maximum amplitude along the axial direction to form a maximum amplitude projection (MAP) image. We implemented our algorithm for motion correction using the TensorFlow package and trained this neural network using Python on a personal computer.
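The envelope-then-project step can be sketched with an FFT-based Hilbert transform in plain NumPy; `map_image` and the (frames, lines, depth) layout are illustrative assumptions, not the paper's code:

```python
import numpy as np

def map_image(rf_volume):
    """Maximum amplitude projection: take the envelope of each
    depth-resolved A-line via the analytic signal (FFT-based Hilbert
    transform), then the maximum along the axial (last) axis."""
    n = rf_volume.shape[-1]
    spectrum = np.fft.fft(rf_volume, axis=-1)
    gain = np.zeros(n)                 # zero negative frequencies,
    gain[0] = 1.0                      # double positive ones
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(spectrum * gain, axis=-1))
    return envelope.max(axis=-1)

# Two A-lines carrying the same tone at amplitudes 2 and 1.
t = np.arange(64)
carrier = np.sin(2 * np.pi * 8 * t / 64)
rf = np.stack([2.0 * carrier, 1.0 * carrier])[None, :, :]   # (1, 2, 64)
projection = map_image(rf)
```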

Tutorial on Visual Odometry

Davide Scaramuzza – University of Zurich – Robotics and Perception Group – rpg.ifi.uzh.ch. Loop constraints are very valuable for pose graph optimization. Loop constraints can be found by evaluating visual similarity between the current camera image and past camera images.
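The similarity search for loop candidates can be sketched as a cosine-similarity scan over global image descriptors (e.g. bag-of-words or learned embeddings). The helper name, threshold, and gap are illustrative assumptions:

```python
import numpy as np

def loop_candidates(descriptors, current, threshold=0.9, min_gap=50):
    """Propose loop-closure constraints: past frames whose global image
    descriptor has cosine similarity above `threshold` with the current
    frame's.  `min_gap` skips recent frames, which are trivially similar."""
    d = descriptors[current] / np.linalg.norm(descriptors[current])
    hits = []
    for i in range(max(0, current - min_gap)):
        past = descriptors[i] / np.linalg.norm(descriptors[i])
        if float(d @ past) > threshold:
            hits.append(i)
    return hits

# Frame 55 revisits the place first seen at frame 2.
place_a = np.array([1.0, 0.0])
place_b = np.array([0.0, 1.0])
descs = [place_b.copy() for _ in range(60)]
descs[2] = place_a.copy()
descs[55] = place_a.copy()
hits = loop_candidates(descs, current=55)
```

Each accepted candidate adds an edge to the pose graph, which the optimizer then uses to correct accumulated drift.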
