Date of Award


Document Type

Dissertation - Open Access

Degree Name

Doctor of Philosophy in Mechanical Engineering


Mechanical Engineering

Committee Chair

Marc Compere, Ph.D.

First Committee Member

Eric Coyle, Ph.D.

Second Committee Member

Patrick Currier, Ph.D.

Third Committee Member

Hongyun Chen, Ph.D.


Precise pose information is a fundamental prerequisite for numerous applications in robotics, artificial intelligence, and mobile computing. Many well-developed pose estimation algorithms have been established using a single sensor or multiple sensors. Visual-Inertial Odometry (VIO) uses images and inertial measurements to estimate motion and is considered a key technology for GPS-denied localization in real-world environments as well as in virtual reality and augmented reality.

This study develops three novel learning-based approaches to odometry estimation using a monocular camera and an inertial measurement unit (IMU). The networks are trained on two standard datasets, KITTI and EuRoC, and on a custom dataset, using supervised, unsupervised, and semi-supervised training methods. Compared to traditional methods, the deep-learning methods presented here require neither precise manual synchronization of the camera and IMU nor explicit camera calibration.

To the best of our knowledge, the proposed supervised method is a novel end-to-end trainable Visual-Inertial Odometry method with an IMU pre-integration module, which simplifies the network architecture and reduces the computational cost. The unsupervised Visual-Inertial Odometry method is novel in achieving outstanding accuracy in odometry estimation while training with monocular images and inertial measurements only. Finally, the semi-supervised method is the first Visual-Inertial Odometry approach in the literature to use a semi-supervised training technique, allowing the network to learn from both labeled and unlabeled datasets.

Through qualitative and quantitative experiments on a wide range of datasets, we conclude that the proposed methods can provide accurate visual localization information to a wide variety of consumer devices and robotic platforms.