Computer Vision for Autonomous Driving

Driver assistance has been a major application field for the research performed by Prof. Rudolf Mester and his group since the early 1990s. At that time, as one of the initiators of computer vision research at the Bosch Research Center Hildesheim, he made significant contributions to traffic, security, and vehicle-related image and video interpretation.

For many years, Prof. Mester has cooperated with major automotive manufacturers and has supervised Master's and Ph.D. students, for instance in driver assistance.

Our work deals with:

  • precision estimation of egomotion and 3D environment structure
  • predictive structures for automotive visual sensing
  • fast pre-estimation of pitch, yaw and roll
  • fast monocular and multi-monocular surround sensing
  • the stixel representation [Badino 2007, Erbs 2010-2014]
  • estimation of illumination changes
  • far-field object detection using stereo disparity and motion [Pinggera et al., 2013-present]

Recent keynote talks

  • Rudolf Mester: Predictive Visual Perception for Automotive Applications. Irish Machine Vision and Image Processing conference (IMVIP), Galway, Ireland, August 2016. (Abstract)
  • Rudolf Mester: Predictive Video Processing for ADAS. Intelligent Vehicles (IV) Symposium - Workshop on holistic interfaces for environmental fusion models, Gothenburg, Sweden, June 2016.  (Abstract)

Recent projects

Retrieving dense information from sparse measurements with PCA

In computer vision, most iterative optimization algorithms, both sparse and dense, rely on a coarse but reliable dense initialization to bootstrap their optimization procedure. For example, dense optical flow algorithms gain massively in speed and robustness if they are initialized well within the basin of convergence of the loss function used. The same holds for methods such as sparse feature tracking, where initial flow or depth information is needed for new features at arbitrary positions. Our method determines a dense reconstruction from sparse measurements using a PCA basis. When only very sparse measurements are available, the number of principal components is typically reduced further, which reduces the expressiveness of the basis. We overcome this problem by injecting prior knowledge in a maximum a posteriori (MAP) approach.
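The idea of regularizing a PCA fit with a prior can be illustrated with a minimal sketch. It assumes a zero-mean Gaussian prior on the PCA coefficients (with the training variances) and Gaussian measurement noise; this is a generic MAP formulation chosen for illustration, not necessarily the one used in the paper. The function names `fit_pca` and `map_reconstruct` and the parameter `noise_var` are hypothetical.

```python
import numpy as np

def fit_pca(samples, k):
    """Learn mean, k principal directions and their variances.

    samples: (N, D) array, one dense field (e.g. flattened flow) per row.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T                               # (D, k)
    variances = (s[:k] ** 2) / (len(samples) - 1)  # coefficient variances
    return mean, basis, variances

def map_reconstruct(mean, basis, variances, idx, values, noise_var=1e-2):
    """Dense reconstruction from sparse measurements `values` at positions `idx`.

    Assumed model: values = mean[idx] + basis[idx] @ c + noise,
    with prior c ~ N(0, diag(variances)). The MAP estimate of c is the
    solution of a small regularized least-squares problem.
    """
    A = basis[idx]                                 # rows of the basis at measured positions
    r = values - mean[idx]                         # residual w.r.t. the mean field
    lhs = A.T @ A / noise_var + np.diag(1.0 / variances)
    rhs = A.T @ r / noise_var
    coeffs = np.linalg.solve(lhs, rhs)
    return mean + basis @ coeffs
```

Because the prior term keeps the linear system well conditioned, this sketch still returns a solution when there are fewer measurements than principal components, which is the situation in which a plain least-squares fit would otherwise force a smaller basis.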

M. Ochs, H. Bradler and R. Mester: Learning Rank Reduced Interpolation with Principal Component Analysis. Intelligent Vehicles (IV) Symposium, Los Angeles, USA, June 2017.

Instance-level Segmentation of Vehicles

The recognition of individual object instances in single monocular images is still an incompletely solved task. In this work, we propose a new approach for detecting and separating vehicles in the context of autonomous driving. Our method uses a fully convolutional network (FCN) for semantic labeling and for estimating the boundary of each vehicle. A contour is in general a one-pixel-wide structure that cannot be learned directly by a CNN; our network addresses this by predicting areas around the contours. Based on these areas, we separate the individual vehicle instances.
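One plausible way to turn such contour areas into instances is sketched below. It assumes two binary maps are already available (a vehicle mask from the semantic labeling and a band around the predicted contours), removes the band from the mask, and labels the remaining 4-connected regions. The separation step of the paper may differ; the function name `separate_instances` and both inputs are hypothetical.

```python
from collections import deque
import numpy as np

def separate_instances(vehicle_mask, contour_band):
    """Label connected vehicle regions after cutting out the contour band.

    vehicle_mask, contour_band: boolean arrays of equal shape.
    Returns (labels, count); label 0 means background or contour band.
    """
    interior = vehicle_mask & ~contour_band
    labels = np.zeros(interior.shape, dtype=int)
    h, w = interior.shape
    count = 0
    for sy, sx in zip(*np.nonzero(interior)):
        if labels[sy, sx]:
            continue                      # pixel already belongs to an instance
        count += 1
        labels[sy, sx] = count
        queue = deque([(sy, sx)])
        while queue:                      # breadth-first flood fill, 4-neighbourhood
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and interior[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count
```

In this sketch the pixels inside the contour band stay unlabeled; assigning them to the nearest instance would be a natural follow-up step.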

J. van den Brand, M. Ochs and R. Mester: Instance-level Segmentation of Vehicles by Deep Contours. ACCV 2016 - Workshop on Computer Vision Technologies for Smart Vehicle, Taipei, Taiwan, November 2016.

COnGRATS: Synthetic Datasets

COnGRATS is a framework for the generation of synthetic data sets that support the development and evaluation of vision algorithms in the context of driver assistance applications and traffic surveillance. 

Due to constraints with regard to safety and general feasibility, it is often not possible to acquire the necessary test data for many of the interesting and safety-relevant conditions that can occur. The demand for test or training datasets can be met with synthetic data, where different scenarios can be created and the associated ground truth data is absolutely accurate.

The COnGRATS team continuously creates highly realistic image sequences featuring traffic scenarios. The sequences are generated using a realistic, state-of-the-art vehicle physics model, and different kinds of environments are featured, thus providing a wide range of test scenarios. Thanks to the physics-based rendering technique and the variable camera models employed in the image rendering process, we can simulate different sensor setups and provide appropriate and fully accurate ground truth data.

D. Biedermann, M. Ochs and R. Mester: Evaluating visual ADAS components on the COnGRATS dataset.  Intelligent Vehicles (IV) Symposium, Gothenburg, Sweden, June 2016.
D. Biedermann, M. Ochs and R. Mester: COnGRATS: Realistic Simulation of Traffic Sequences for Autonomous Driving. Image and Vision Computing New Zealand (IVCNZ), Auckland, New Zealand, November 2015. (Best Student Paper Award)

More Information

Monocular Visual Odometry

Statistical Models for Vehicle Egomotion

Vehicle/Camera Rotation Estimation

The apparent motion in far-field image windows is dominated by camera rotation rather than translation. In this paper, we determine the 2D displacement of several far-field windows in the image and use these displacements to robustly estimate the global 2D displacement and the rotation angles. In our current research, the estimated rotation matrix is used to determine motion priors for keypoints and to track keypoints and edge pixels.
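The window displacement step can be sketched with basic phase correlation. The example below is a simplified illustration: it uses plain integer-pixel phase correlation rather than the enhanced variant of the paper, combines windows with a median as one possible robust estimator, and converts the shift to angles with a pinhole model. The function names, the focal length parameter `focal_px`, and the sign conventions are assumptions.

```python
import numpy as np

def phase_correlation(a, b):
    """Integer shift (dy, dx) such that b is approximately np.roll(a, (dy, dx), axis=(0, 1))."""
    Fa = np.fft.fft2(a)
    Fb = np.fft.fft2(b)
    cross = Fb * np.conj(Fa)
    cross /= np.abs(cross) + 1e-12            # keep only the phase
    corr = np.fft.ifft2(cross).real           # ideally a single peak at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak positions beyond half the window size to negative shifts.
    dy, dx = (int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))
    return dy, dx

def robust_global_shift(prev_windows, cur_windows):
    """Median of the per-window shifts as a simple outlier-tolerant global estimate."""
    shifts = [phase_correlation(a, b) for a, b in zip(prev_windows, cur_windows)]
    return tuple(np.median(np.array(shifts), axis=0))

def shift_to_angles(dy, dx, focal_px):
    """Pitch and yaw (radians) for a pure rotation under a pinhole model.

    The signs depend on the chosen camera coordinate frame.
    """
    return np.arctan2(dy, focal_px), np.arctan2(dx, focal_px)
```

Roll is not covered by this sketch; it could in principle be derived from how the displacements differ between windows at different image positions.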

M. Barnada, C. Conrad, H. Bradler, M. Ochs and R. Mester: Estimation of Automotive Pitch, Yaw, and Roll using Enhanced Phase Correlation on Multiple Far-field Windows. Intelligent Vehicles (IV) Symposium, Seoul, South Korea, June 2015.

More information  Detailed presentation

Selection of good edge pixels (edgels) to track

A common problem in multiple view geometry and structure from motion scenarios is the sparsity of keypoints to track. We present an approach for selecting trackable edge pixels, namely those whose edge direction is not aligned with the epipolar lines. Given the camera motion, these edgels can be matched by solving a 1D optimization problem along the epipolar line.
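The selection criterion can be sketched as follows. The example assumes the epipole is known in pixel coordinates, so that the epipolar line through a pixel points from the epipole to that pixel, and keeps edge pixels whose image gradient has a large component along that direction (that is, whose edge is not parallel to the epipolar line). The function name `select_edgels` and both thresholds are hypothetical and not taken from the paper.

```python
import numpy as np

def select_edgels(image, epipole, grad_thresh=0.1, align_thresh=0.5):
    """Boolean mask of edge pixels that can be localized along their epipolar line.

    image:   2D grayscale array.
    epipole: (x, y) position of the epipole in pixel coordinates.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)

    # Unit direction of the epipolar line at every pixel.
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    ex = xs - epipole[0]
    ey = ys - epipole[1]
    norm = np.hypot(ex, ey) + 1e-12
    ex, ey = ex / norm, ey / norm

    # |cos| of the angle between gradient and epipolar direction:
    # close to 1 when the edge crosses the epipolar line, close to 0 when it runs along it.
    score = np.abs(gx * ex + gy * ey) / (mag + 1e-12)
    return (mag > grad_thresh) & (score > align_thresh)
```

Edges that run along the epipolar line are rejected because a 1D search along that line cannot localize them, which is the aperture problem in this setting.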

T. Piccini, M. Persson, K. Nordberg, M. Felsberg and R. Mester: Good Edgels to Track: Beating the Aperture Problem with Epipolar Geometry. European Conference on Computer Vision (ECCV) - 2nd Workshop for Road Scene Understanding and Autonomous Driving, Zürich, Switzerland, September 2014.

Multi-sensor Multi-camera Traffic Sequences

In cooperation with Linköping University, Sweden, where R. Mester holds a guest professorship, we are conducting a research project on multi-camera/multi-sensor environment sensing for cars in urban and highway traffic. The challenge is to generalize state-of-the-art computer vision procedures for typical vehicle-related tasks such as driver assistance to multi-view/omnidirectional processing. The technical and scientific approach places a strong emphasis on recursive processing, incremental exploration using confidence information, dynamic models, and precise measurement of motion and structure using modern statistical signal processing methods.

P. Koschorrek, T. Piccini, P. Öberg, M. Felsberg, L. Nielsen and R. Mester: A multi-sensor traffic scene dataset with omnidirectional video. Conference on Computer Vision and Pattern Recognition (CVPR) - Workshops, Portland, USA, June 2013.

Highly Accurate Depth Estimation for Objects at Large Distances

Joint computation of segmentation, flow, and disparity for long range stereo.

P. Pinggera, U. Franke and R. Mester: Highly Accurate Depth Estimation for Objects at Large Distances. German Conference on Pattern Recognition (GCPR), Saarbrücken, Germany, September 2013.

Illumination Invariance

Illumination changes are among the most serious degradations that appear in real-life video data streams, and they make tasks like optical flow or stereo particularly difficult. We have developed a method that estimates the changes of illumination and camera characteristics between two subsequent frames and thus yields a stabilized input for motion processing (Dederscheck et al. 2012), with a particular emphasis on traffic applications. The method determines areas where linear or affine changes of the sensor transfer characteristic can be observed and estimates the "relative intensity transfer function" (RITF) from that data. Compensating the input images accordingly can significantly improve the performance of subsequent vision tasks, as demonstrated here by results from optical flow. Our method identifies corresponding intensity values from areas in the images where no apparent motion is present. The RITF is then estimated from that data and regularized based on its curvature. Finally, built-in tests reliably flag image pairs with 'adverse conditions' where no compensation could be performed. The method is an interesting alternative to census-based motion estimation methods (Müller et al. 2011).
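The basic principle can be illustrated with a strongly simplified sketch. It assumes that a mask of motion-free pixels is already given and fits a single global affine intensity change (gain and offset) between two frames by least squares. The published method estimates a more general, curvature-regularized transfer function and includes tests for adverse conditions, neither of which is reproduced here; the names `fit_affine_transfer`, `compensate`, and `static_mask` are hypothetical.

```python
import numpy as np

def fit_affine_transfer(frame_a, frame_b, static_mask):
    """Least-squares fit of frame_b ~ gain * frame_a + offset on static pixels."""
    x = frame_a[static_mask].astype(float)
    y = frame_b[static_mask].astype(float)
    A = np.stack([x, np.ones_like(x)], axis=1)     # design matrix [intensity, 1]
    (gain, offset), *_ = np.linalg.lstsq(A, y, rcond=None)
    return gain, offset

def compensate(frame_b, gain, offset):
    """Map frame_b back to the intensity scale of frame_a."""
    return (frame_b.astype(float) - offset) / gain
```

After compensation, both frames share the same intensity scale under the assumed affine model, so a subsequent optical flow or stereo step can rely more safely on brightness constancy.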