Propagation-based Tracking (PbT)

One of the major steps in visual environment perception for automotive applications is to track keypoints and to subsequently estimate egomotion and environment structure from the trajectories of these keypoints. We present a propagation-based tracking (PbT) method that obtains the 2D trajectories of keypoints from a sequence of images in a monocular camera setup.

Instead of relying on the classical RANSAC to obtain accurate keypoint correspondences, we steer the search for keypoint matches by means of propagating the estimated 3D position of the keypoint into the next frame and verifying the photometric consistency. In this process, we continuously predict, estimate and refine the frame-to-frame relative pose which induces the epipolar relation.
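The prediction step described above can be sketched as follows. This is a minimal illustration, not the actual PbT implementation: the function names, the patch-difference test, and the threshold are assumptions.

```python
import numpy as np

def propagate_keypoint(X_cam, R, t, K):
    """Predict the 2D position of a keypoint in the next frame by
    transforming its estimated 3D position with the relative pose
    (R, t) and projecting it with the intrinsic matrix K."""
    X_next = R @ X_cam + t          # 3D point in the next camera frame
    x = K @ X_next                  # homogeneous pixel coordinates
    return x[:2] / x[2]

def photometric_consistency(patch_a, patch_b, threshold=10.0):
    """Accept a predicted match if the mean absolute intensity
    difference between the patches is small (threshold assumed)."""
    return float(np.mean(np.abs(patch_a - patch_b))) < threshold
```

The predicted position serves as the starting point of the search; the photometric test then verifies or rejects the candidate match.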

New keypoint correspondences are obtained using prior-based matching, as explained here (link to prior motion page). The triangulation uses scale estimates as proposed in our multimodal scale estimation (link to scale estimation page).

Modules of PbT

CNN-based IMO detection

Cars are labeled using CNN-based instance segmentation. Static cars are identified by the stable 3D positions of their tracked keypoints, whereas a vehicle is labeled as an independently moving object (IMO) when most of its keypoints fail the epipolar matching check.
Keypoints on IMOs are subsequently flagged as outliers and rejected, while tracked keypoints on static cars can still be used for pose estimation.
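A minimal sketch of such a majority-vote epipolar check; the pixel threshold and the majority ratio are assumptions for illustration:

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance of homogeneous pixel x2 to the epipolar line F @ x1."""
    l = F @ x1
    return abs(l @ x2) / np.hypot(l[0], l[1])

def is_imo(F, pts1, pts2, px_thresh=1.0, ratio=0.5):
    """Flag a segmented vehicle as an IMO when the majority of its
    keypoint correspondences violate the epipolar constraint."""
    fails = sum(epipolar_distance(F, p, q) > px_thresh
                for p, q in zip(pts1, pts2))
    return fails > ratio * len(pts1)
```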


Experiments on the KITTI dataset as well as on the synthetic COnGRATS dataset show promising results: accurate estimated courses and keypoint trajectories. Our result (PbT-M2) currently ranks first among monocular approaches on the KITTI odometry benchmark (as of June 09, 2017).


  • N. Fanani, M. Ochs, H. Bradler and R. Mester: Keypoint trajectory estimation using propagation based tracking. Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, June 2016

Multimodal scale estimation for monocular visual odometry

Visual odometry using only a monocular camera offers simplicity, but faces more algorithmic challenges than stereo odometry, such as the necessity to perform scale estimation.
Our framework offers robust monocular scale estimation for automotive applications. By combining cues from sparse and dense ground plane estimation with a prediction mechanism based on typical vehicle dynamics, it provides high reliability.
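The core idea of ground-plane-based scale recovery can be sketched as follows; the robust median fit and the camera height value are simplifying assumptions for illustration, not the framework's exact procedure:

```python
import numpy as np

def scale_from_ground_plane(points_3d, camera_height=1.65):
    """Recover the metric scale of an unscaled monocular reconstruction
    from triangulated ground points: take the (robust) median height of
    assumed-ground points below the camera and choose the scale so that
    the ground plane lies at the known camera mounting height.
    camera_height in metres is an assumption (a typical setup)."""
    heights = np.asarray([p[1] for p in points_3d])  # y axis points down
    unscaled_height = np.median(heights)
    return camera_height / unscaled_height
```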

CNN-based street segmentation

The method is further enhanced by street classification using a Convolutional Neural Network (CNN) to determine the street area. This makes it possible to increase the area which can be used while also preventing the injection of non-street points into the estimation. Thus, the method yields highly accurate scale estimates.
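Using the CNN street mask to gate which points enter the scale estimation can be sketched as follows (the function name and the point format are assumptions):

```python
import numpy as np

def filter_with_street_mask(points_2d, mask):
    """Keep only candidate ground points whose pixel (x, y) falls
    inside the CNN street segmentation mask (boolean H x W array)."""
    return [p for p in points_2d
            if mask[int(round(p[1])), int(round(p[0]))]]
```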


  • N. Fanani, A. Stürck, M. Barnada and R. Mester: Multimodal scale estimation for monocular visual odometry. Intelligent Vehicles Symposium (IV), Los Angeles, USA, June 2017

Joint Epipolar Tracking

Traditionally, pose estimation is considered a two-step problem. First, feature correspondences are determined by direct comparison of image patches, or by associating feature descriptors. In a second step, the relative pose and the coordinates of corresponding points are estimated, most often by minimizing the reprojection error (RPE). RPE optimization is based on a loss function that is only aware of the feature pixel positions, not of the underlying image intensities.

In this paper, we propose a sparse direct method whose loss function simultaneously optimizes the unscaled relative pose and the set of feature correspondences, directly taking the image intensity values into account. Furthermore, we show how to integrate statistical prior information on the motion into the optimization process. This constructive inclusion of a Bayesian bias term is particularly effective in application cases with strongly predictable (short-term) dynamics, e.g. in a driving scenario.
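The shape of such a joint objective can be sketched as a photometric data term plus a Mahalanobis-style motion prior; the names, the quadratic prior form, and the weighting are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def jet_objective(photometric_residuals, motion,
                  motion_prior_mean, motion_prior_cov, weight=1.0):
    """Sketch of a joint loss: a photometric data term over all feature
    correspondences plus a Bayesian bias term that pulls the relative
    motion toward its statistically predicted value."""
    data_term = float(np.sum(np.square(photometric_residuals)))
    d = np.asarray(motion) - np.asarray(motion_prior_mean)
    prior_term = float(d @ np.linalg.inv(motion_prior_cov) @ d)
    return data_term + weight * prior_term
```

Minimizing this loss jointly over pose and correspondences, rather than in two separate steps, is the core idea of the direct formulation.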


  • H. Bradler, M. Ochs, N. Fanani and R. Mester: Joint Epipolar Tracking (JET): Simultaneous optimization of epipolar geometry and feature correspondences. IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, California, USA, March 2017

Motion priors estimation for robust matching initialization in automotive applications

Tracking keypoints through a video sequence is a crucial first step in the processing chain of many visual SLAM approaches. We present a robust initialization method that provides the initial match for a keypoint tracker, from the first frame in which a keypoint is detected to the second frame, that is, when no depth information is available yet.

We deal explicitly with the case of long displacements. The starting position is obtained through an optimization that employs a distribution of motion priors based on pyramidal phase correlation, together with epipolar geometry constraints. Experiments on the KITTI dataset demonstrate the significant impact of applying a motion prior to the matching.
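A single-level phase correlation, the building block of the pyramidal variant used above, can be sketched as follows (the method here estimates one global integer shift; the pyramidal, prior-distribution machinery of the paper is not reproduced):

```python
import numpy as np

def phase_correlation(img_a, img_b):
    """Estimate the integer translation between two images via the
    normalized cross-power spectrum; the correlation peak location
    gives the (row, col) shift of img_a relative to img_b."""
    Fa, Fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-12      # keep only the phase
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks beyond half the image size correspond to negative shifts
    shifts = [p if p <= s // 2 else p - s
              for p, s in zip(peak, corr.shape)]
    return tuple(int(s) for s in shifts)
```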


  • N. Fanani, M. Barnada and R. Mester: Motion priors estimation for robust matching initialization in automotive applications. International Symposium on Visual Computing (ISVC), Las Vegas, Nevada, USA, December 2015