Computer Vision for Autonomous Driving
Driver assistance has been a major application field for the research performed by Prof. Rudolf Mester and his group since the early 90s, when he made significant contributions to traffic, security, and vehicle-related image and video interpretation, as one of the initiators of computer vision research at Bosch Research Center Hildesheim.
For many years, Prof. Mester has cooperated with major automotive manufacturers and has supervised Master's and Ph.D. students, for instance in driver assistance.
Our work deals with:
- precision estimation of egomotion and 3D environment structure
- predictive structures for automotive visual sensing
- fast pre-estimation of pitch, yaw and roll
- fast monocular and multi-monocular surround sensing
- the stixel representation
- estimation of illumination changes
- far-field object detection using stereo disparity and motion
Conference: Irish Machine Vision and Image Processing conference (IMVIP), Galway, Ireland, August 2016
Abstract: Understanding the world around us while we are moving means continuously maintaining a dynamically changing representation of the environment, making predictions about what to see next, and correctly processing those perceptions which were surprising, relative to our predictions. This principle is valid for animate beings as well as for technical systems that successfully participate in traffic. At the VSI Lab, we put special emphasis on this recursive / predictive approach to visual perception in ongoing projects for driver assistance and autonomous driving. These processing structures are complemented by statistical modeling of egomotion, environment, and the measurement process. In our opinion, this approach leads to particularly efficient systems, since computational resources may be focused on ‘surprising’ (thus rare) observations, and since this allows for a large reduction of search spaces in typical visual matching and tracking tasks. The talk will present examples of such predictive / recursive processing structures. Furthermore, recent results in the field of monocular, stereo, and multi-monocular (surround vision) applications will be shown.
Conference: Intelligent Vehicles (IV) Symposium – Workshop on holistic interfaces for environmental fusion models, Gothenburg, Sweden, June 2016
Abstract: Understanding the world around us while we are moving means continuously maintaining a dynamically changing representation of the environment, making predictions about what to see next, and correctly processing those perceptions which were surprising, relative to our predictions.
This principle is valid for animate beings as well as for technical systems that successfully participate in traffic. The VSI Lab at Frankfurt University puts special emphasis on this recursive / predictive approach to visual perception in ongoing projects for ADAS and autonomous driving. In our opinion, this approach leads to particularly efficient systems, since computational resources may be focused on ‘surprising’ (thus rare) observations, and since this allows for a large reduction of search spaces in typical visual matching and tracking tasks.
Furthermore, since the environment representation is closely coupled to the measuring process, and is not a distant result at the end of a long processing pipeline, it allows for a simplified fusion of information from different sensors. This of course implies a tighter coupling between sensor data processing and interpretation. The talk will present examples of such predictive / recursive processing structures and put the pros and cons up for discussion.
Conference: International Conference on Computer Vision (ICCV) – Computer Vision for Road Scene Understanding and Autonomous Driving, Santiago, Chile, December 2015
Abstract: The talk presents work of the VSI group (Frankfurt) and the CVL group (Linköping) in the area of Visual (Surround) Sensing for cars, emphasizing methods for measuring visual motion reliably, and extracting 3D information from sets of trajectories over multiple frames. The algorithms presented here are characterized by a strong predictive / recursive character of the processing pipeline, and they involve stochastic models of vehicle dynamics, such as those presented in the companion paper [Bradler et al., CVRSUAD 2015]. We show examples of diverse new test data sets, both in a multi-monocular surround view mode (AMUSE data set) and as very realistic synthetic sequences (COnGRATS) that include precise pixelwise ground truth for 3D depth, optical flow, surface orientation, and semantic labeling. We conclude with examples of how the diverse variants of the investigated environment perception methods (monocular, stereo, and multi-monocular) perform on real driving scene data.
Keypoint Trajectory Estimation using Propagation-based Tracking (PbT)
One of the major steps in visual environment perception for automotive applications is to track keypoints and subsequently to estimate egomotion and environment structure from the trajectories of these keypoints. We present a propagation-based tracking (PbT) method to obtain the 2D trajectories of keypoints from a sequence of images in a monocular camera setup.
Instead of relying on the classical RANSAC to obtain accurate keypoint correspondences, we steer the search for keypoint matches by means of propagating the estimated 3D position of the keypoint into the next frame and verifying the photometric consistency. In this process, we continuously predict, estimate and refine the frame-to-frame relative pose which induces the epipolar relation.
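The propagation step described above can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsic matrix K and a predicted relative pose (R, t) that maps points from the current camera frame into the next one. The function names, the patch-based error measure, and the example values are illustrative assumptions, not the implementation from the paper.

```python
import numpy as np

def propagate_keypoint(X_cam, R, t, K):
    """Project a 3D point (current camera frame) into the next image.

    R, t: predicted relative pose, assumed to map current-frame
    coordinates to next-frame coordinates (X_next = R @ X_cam + t).
    Returns pixel coordinates (u, v), or None if the point would lie
    behind the next camera.
    """
    X_next = R @ X_cam + t
    if X_next[2] <= 0:
        return None
    uvw = K @ X_next
    return uvw[:2] / uvw[2]

def photometric_error(patch_a, patch_b):
    """Mean squared intensity difference between two equally sized patches."""
    diff = np.asarray(patch_a, dtype=float) - np.asarray(patch_b, dtype=float)
    return float(np.mean(diff ** 2))

# Example: the camera moves 1 m forward, so the point ends up 1 m closer.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, -1.0])
X = np.array([1.0, 0.5, 10.0])
uv_pred = propagate_keypoint(X, R, t, K)
```

In a tracker of this kind, the patch around `uv_pred` in the next frame would be compared with the keypoint's reference patch, and the match accepted only if the photometric error is small. The search is thereby confined to a small neighborhood of the prediction instead of the whole image.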
Experiments on the KITTI dataset as well as on the synthetic COnGRATS dataset show promising results, both for the estimated vehicle courses and for the accuracy of the keypoint trajectories.
- N. Fanani, M. Ochs, H. Bradler and R. Mester: Keypoint trajectory estimation using propagation based tracking. Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, June 2016.
Retrieving dense information from sparse measurements with PCA
In computer vision, most iterative optimization algorithms, both sparse and dense, rely on a coarse but reliable dense initialization to bootstrap their optimization procedure. For example, dense optical flow algorithms profit massively in speed and robustness if they are initialized well inside the basin of convergence of the loss function used.
The same holds true for methods such as sparse feature tracking, when initial flow or depth information is needed for new features at arbitrary positions. Our PCA-based method is able to determine a dense reconstruction from sparse measurements.
In situations with only very sparse measurements, the number of principal components is typically reduced further, which results in a loss of expressiveness of the basis. We overcome this problem by injecting prior knowledge in a maximum a posteriori (MAP) approach.
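The idea of a MAP estimate on a PCA basis can be sketched as follows. The sketch assumes a zero-mean Gaussian prior on the PCA coefficients, with variances taken from the training data, and Gaussian measurement noise. These modeling choices, the function names, and the toy data are assumptions for illustration; the prior used in the paper may differ.

```python
import numpy as np

def learn_pca(training, k):
    """Learn mean, basis (n, k) and coefficient variances (k,) from
    dense training fields of shape (num_samples, n)."""
    mean = training.mean(axis=0)
    _, s, vt = np.linalg.svd(training - mean, full_matrices=False)
    basis = vt[:k].T
    variances = s[:k] ** 2 / (training.shape[0] - 1)
    return mean, basis, variances

def map_reconstruct(y, idx, mean, basis, variances, noise_var=1e-2):
    """Dense reconstruction from sparse measurements y at positions idx.

    Solves the regularized normal equations
        (B_s^T B_s + noise_var * diag(1 / variances)) c = B_s^T (y - mean_s),
    where B_s holds the basis rows at the measured positions.
    """
    b_s = basis[idx]
    residual = y - mean[idx]
    prior = noise_var * np.diag(1.0 / np.maximum(variances, 1e-12))
    coeffs = np.linalg.solve(b_s.T @ b_s + prior, b_s.T @ residual)
    return mean + basis @ coeffs

# Toy data: every field is a random offset plus a random linear ramp.
rng = np.random.default_rng(0)
n = 50
ramp = np.linspace(0.0, 1.0, n)
a = rng.normal(size=(200, 1))
b = rng.normal(size=(200, 1))
training = a * ramp + b * np.ones(n)
mean, basis, variances = learn_pca(training, k=2)

truth = 2.0 * ramp + 1.0
idx = np.array([3, 20, 45])
dense = map_reconstruct(truth[idx], idx, mean, basis, variances, noise_var=1e-6)
```

With a Gaussian prior the system stays solvable even when there are fewer measurements than principal components, because the prior term regularizes the coefficients that the data cannot determine. This is one way to avoid truncating the basis.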
- M. Ochs, H. Bradler and R. Mester: Learning Rank Reduced Interpolation with Principal Component Analysis. Intelligent Vehicles (IV) Symposium, Los Angeles, USA, June 2017
Instance-level segmentation of vehicles
The recognition of individual object instances in single monocular images is still an incompletely solved task. In this work, we propose a new approach for detecting and separating vehicles in the context of autonomous driving.
Our method uses a fully convolutional network (FCN) for semantic labeling and for estimating the boundary of each vehicle. Even though a contour is in general a one-pixel-wide structure which cannot be directly learned by a CNN, our network addresses this by providing areas around the contours. Based on these areas, we separate the individual vehicle instances.
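One simple way to turn such contour areas into instances is to remove them from the vehicle mask and label the remaining connected regions. The following sketch assumes that the network outputs have already been thresholded into two binary masks. The function name and the use of 4-connectivity are illustrative assumptions, and the paper's actual separation step may work differently.

```python
from collections import deque

def separate_instances(vehicle_mask, contour_mask):
    """Label 4-connected regions of vehicle pixels outside contour areas.

    Both masks are lists of rows with 0/1 entries. Returns a label image
    (0 = background or contour area) and the number of instances found.
    """
    h, w = len(vehicle_mask), len(vehicle_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not vehicle_mask[r][c] or contour_mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one instance
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and vehicle_mask[ny][nx]
                            and not contour_mask[ny][nx]
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Two vehicles that touch in the semantic mask, split by a contour area.
vehicle = [[0, 1, 1, 1, 1, 1, 0],
           [0, 1, 1, 1, 1, 1, 0]]
contour = [[0, 0, 0, 1, 0, 0, 0],
           [0, 0, 0, 1, 0, 0, 0]]
labels, count = separate_instances(vehicle, contour)
```

Here `count` is 2 and the first label row is `[0, 1, 1, 0, 2, 2, 0]`. In this sketch the contour-area pixels themselves remain unlabeled; a full pipeline would assign them to the nearest instance afterwards.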
- J. van den Brand, M. Ochs and R. Mester: Instance-level Segmentation of Vehicles by Deep Contours. ACCV 2016 – Workshop on Computer Vision Technologies for Smart Vehicle, Taipei, Taiwan, November 2016.
COnGRATS: Synthetic Datasets
COnGRATS is a framework for the generation of synthetic data sets that support the development and evaluation of vision algorithms in the context of driver assistance applications and traffic surveillance.
Due to constraints with regard to safety and general feasibility, it is often not possible to acquire the necessary test data for many of the interesting and safety-relevant conditions which can occur. The demand for test or training datasets can be satisfied with synthetic data, where different scenarios can be created and the associated ground truth data is fully accurate.
The COnGRATS team continuously creates highly realistic image sequences featuring traffic scenarios. The sequences are generated using a realistic state-of-the-art vehicle physics model; different kinds of environments are featured, thus providing a wide range of testing scenarios. Due to the physics-based rendering technique and the variable camera models employed in the image rendering process, we can simulate different sensor setups and provide appropriate and fully accurate ground truth data.
- D. Biedermann, M. Ochs and R. Mester: Evaluating visual ADAS components on the COnGRATS dataset. Intelligent Vehicles (IV) Symposium, Gothenburg, Sweden, June 2016
- D. Biedermann, M. Ochs and R. Mester: COnGRATS: Realistic Simulation of Traffic Sequences for Autonomous Driving. Image and Vision Computing New Zealand (IVCNZ), Auckland, New Zealand, November 2015 (Best Student Paper Award)
Selection of good edge pixels (edgels) to track
A common problem in multiple view geometry and structure from motion scenarios is the sparsity of keypoints to track. We present an approach that selects edge pixels for tracking which are not aligned with the epipolar lines. When the motion of the camera is known, these pixels can be matched by solving a 1D optimization problem.
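The selection criterion can be sketched as follows. An edge pixel is only localized along its gradient direction (the aperture problem), so it is useful for a 1D search along the epipolar line only if the gradient has a substantial component along that line. The sketch assumes pure camera translation, so that the epipolar line at a pixel points away from the epipole (focus of expansion). The function names, score definition, and thresholds are illustrative assumptions, not those of the paper.

```python
import math

def epipolar_direction(pixel, epipole):
    """Direction of the epipolar line at a pixel, assuming pure translation."""
    return (pixel[0] - epipole[0], pixel[1] - epipole[1])

def edgel_score(gradient, epipolar_dir):
    """|cos| of the angle between image gradient and epipolar line.

    1.0: edge perpendicular to the epipolar line (well constrained).
    0.0: edge aligned with the epipolar line (not trackable along it).
    """
    gx, gy = gradient
    ex, ey = epipolar_dir
    g_norm = math.hypot(gx, gy)
    e_norm = math.hypot(ex, ey)
    if g_norm == 0.0 or e_norm == 0.0:
        return 0.0
    return abs(gx * ex + gy * ey) / (g_norm * e_norm)

def select_edgels(edgels, epipole, min_score=0.5, min_gradient=10.0):
    """Keep pixels with a strong gradient that is not orthogonal to the
    epipolar line. edgels: list of (pixel, gradient) pairs."""
    selected = []
    for pixel, gradient in edgels:
        if math.hypot(*gradient) < min_gradient:
            continue
        direction = epipolar_direction(pixel, epipole)
        if edgel_score(gradient, direction) >= min_score:
            selected.append(pixel)
    return selected

epipole = (320.0, 240.0)
edgels = [
    ((420.0, 240.0), (50.0, 0.0)),   # vertical edge, crosses the epipolar line
    ((420.0, 200.0), (20.0, 50.0)),  # edge roughly parallel to the epipolar line
    ((320.0, 340.0), (30.0, 30.0)),  # diagonal edge, still usable
    ((100.0, 100.0), (1.0, 1.0)),    # gradient too weak
]
good = select_edgels(edgels, epipole)
```

For these inputs, the first and third edgels are kept and the other two are rejected.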
- T. Piccini, M. Persson, K. Nordberg, M. Felsberg and R. Mester: Good Edgels to Track: Beating the Aperture Problem with Epipolar Geometry. European Conference on Computer Vision (ECCV) – 2nd Workshop for Road Scene Understanding and Autonomous Driving, Zürich, Switzerland, September 2014