In this branch of the VSI Frankfurt activities, we develop learning approaches for multi-camera systems. In contrast to vision systems that are engineered in the conventional way, using prior knowledge about cameras, image physics, multi-image geometry, etc., we are particularly interested in principles that allow visual perception to emerge autonomously from the inherent structure of natural image signals.
The guiding principle is that the statistics of natural image signals provide enough information to self-organize a processing system that extracts useful information from a video stream. This research has resulted in several unsupervised, biologically inspired algorithms which lie far from the mainstream of computer vision research, but which offer answers to the question of how nature endows living beings with visual perception.
PhD Dissertation by Christian Conrad, Spring 2017: “Unsupervised Learning of Correspondence Relations in Image Streams.” Download Thesis
Learning in Multi-Camera Systems
Temporal Coincidence Analysis (TCA) is a learning approach developed by the VSI group which determines the geometric and photometric relationship between multiple cameras with at least partially overlapping fields of view. The essential difference from standard matching techniques is that the search for similar spatial patterns is replaced by an analysis of temporal coincidences of certain characteristic or rare events at single pixels. In TCA, a correspondence is represented by a correspondence distribution, which is estimated from the repeated detection and matching of strong temporal grey-value changes (events) across the observed views. Correspondences are never computed explicitly; only the evidence for a correspondence relation, in the form of matched events, is collected over time.
The images below show stereo correspondences that have been learnt via TCA. It can be seen that the approach is able to learn correspondences where the involved camera views show large differences in scale and are rotated w.r.t. each other.
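The event-detection and coincidence-counting steps described above can be sketched as follows. This is a minimal illustration of the principle, not the thesis implementation; the threshold value, array shapes, and function names are our own assumptions:

```python
import numpy as np

def detect_events(frames, threshold=200.0):
    """Mark pixels whose frame-to-frame grey-value change exceeds a
    threshold.  frames is a (T, H, W) grey-value stream; the result is a
    (T-1, H, W) boolean event volume."""
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))
    return diffs > threshold

def accumulate_coincidences(events_a, events_b, pixel_a):
    """For one pixel in view A, count how often each pixel in view B fires
    an event in the same frame: the (unnormalised) correspondence
    distribution of pixel_a.  No correspondence is ever computed
    explicitly; only evidence is accumulated over time."""
    y, x = pixel_a
    counts = np.zeros(events_b.shape[1:], dtype=np.int64)
    for t in range(events_a.shape[0]):
        if events_a[t, y, x]:
            counts += events_b[t]
    return counts
```

With enough frames, the counts peak at the pixel in view B that truly corresponds to `pixel_a`, while unrelated pixels accumulate only chance coincidences.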
Learning the geometric and photometric relationships between multiple cameras is an intertwined process: learning correspondences depends on a model of the photometric relationship, and learning the photometric relationship depends on the estimated correspondences. We model the photometric differences between camera views by a Grey Value Transfer Function (GVTF) and learn its parameters from a comparagram of grey-value pairs observed at learnt correspondences.
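A minimal sketch of comparagram-based GVTF estimation. For illustration we assume an affine GVTF model fitted by weighted least squares over the comparagram; the model class and fitting procedure in our publications may differ:

```python
import numpy as np

def comparagram(grey_a, grey_b, bins=256):
    """2-D joint histogram of grey values observed at corresponding
    pixels in views A and B."""
    H, _, _ = np.histogram2d(grey_a, grey_b, bins=bins,
                             range=[[0, 255], [0, 255]])
    return H

def fit_affine_gvtf(grey_a, grey_b):
    """Fit g_b ~ alpha * g_a + beta by least squares over the comparagram,
    weighting each occupied cell by its count.  The affine model is an
    illustrative assumption for this sketch."""
    H = comparagram(grey_a, grey_b)
    ga, gb = np.nonzero(H)              # occupied cells (bin indices)
    w = np.sqrt(H[ga, gb])              # sqrt-count weights for lstsq
    A = np.stack([ga, np.ones_like(ga)], axis=1).astype(np.float64)
    coef, *_ = np.linalg.lstsq(A * w[:, None], gb * w, rcond=None)
    return coef                          # (alpha, beta)
```

Once fitted, the GVTF lets grey values from one view be transferred into the photometric frame of the other before further matching.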
Besides learning correspondence distributions in a stereo setup, TCA can also be used to learn the average optical flow in a monocular video stream without explicitly estimating optical flow vectors.
In addition to learning average optical flow, we may also estimate parameters of instantaneous (per-frame) image motion via TCA. To this end, we aggregate TCA's event detection results over a specific subset of pixels, from which we may infer, e.g., the (scaled) yaw rate. The figures below show examples of the event detection (red/blue dots) and aggregation (masks in the bottom right corner of each image).
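As an illustration of the aggregation idea, the sketch below votes over candidate horizontal shifts by counting coincidences between events in consecutive frames of an event volume (as produced by a TCA-style detector). Under a pure-yaw assumption, i.e. approximately horizontal image translation, the winning shift is proportional to the yaw rate. The shift model and function name are simplifying assumptions of this sketch:

```python
import numpy as np

def dominant_shift(events, max_shift=5):
    """Estimate the dominant horizontal image motion (a proxy for a
    scaled yaw rate) from a (T, H, W) boolean event volume: for each
    candidate shift d, count how many events at time t coincide with
    events at time t+1 displaced by d.  No flow vectors are computed."""
    shifts = np.arange(-max_shift, max_shift + 1)
    scores = [np.logical_and(events[:-1],
                             np.roll(events[1:], -d, axis=2)).sum()
              for d in shifts]
    return int(shifts[int(np.argmax(scores))])
```

The coincidence counts play the same role as the correspondence evidence in the stereo case: individual events are noisy, but their aggregation over many frames singles out the true motion parameter.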
- C. Conrad: Unsupervised Learning of Correspondence Relations in Image Streams. Dissertation, Goethe University Frankfurt, Frankfurt am Main, Germany, 2017.
- C. Conrad and R. Mester: Learning Motion from Temporal Coincidences. Workshop on Biologically Inspired Computer Vision, Sicily, Italy, September 2017.
- C. Conrad and R. Mester: Learning Relative Photometric Differences of Pairs of Cameras. International Conference on Advanced Video and Signal Based Surveillance (AVSS), Karlsruhe, Germany, 2015.
- J. Eisenbach, C. Conrad, and R. Mester: A Temporal Scheme for Fast Learning of Image-Patch Correspondences in Realistic Multi-Camera Setups. Workshops of the International Conference on Computer Vision and Pattern Recognition (WCVPR), Portland, USA, June 2013.
- C. Conrad, A. Guevara and R. Mester: Learning Multi-View Correspondences from Temporal Coincidences. Computer Vision and Pattern Recognition Workshops, Colorado Springs, CO, USA, June 2011.
Learning geometrical transformations in closed form
In a BMVC paper, we propose an unsupervised and sampling-free approach to learn the correspondence relations between pairs of cameras in closed form, employing a linear statistical model known as Canonical Correlation Analysis (CCA).
The term correspondence relation comprises the geometric and photometric transform that transfers one view of a scene or object into another view. This transform can be simple, such as an arbitrary rotation or reflection, or more complex, like the perspective transformation of a planar or curved surface. The only assumption we make is that the transform remains fixed during processing. The approach learns sparse basis patterns that relate each local area in image A (or camera view A) to the corresponding area in image B (or camera view B).
Applying a learnt transformation to unseen data:
- Top two images: A learnt transformation of 90 degrees is applied to previously unseen data, here to Olivetti faces and MNIST digits. Each pair of images shows the input image (left) and the result image after applying the learnt transformation (right).
- Bottom two images: As before but for a rotation of 45 degrees. Note how unrelated areas are filled with noise.
- C. Conrad and R. Mester: Learning Rank Reduced Mappings using Canonical Correlation Analysis. Statistical Signal Processing Workshop (SSP), Palma de Mallorca, Spain, June 2016.
- C. Conrad and R. Mester: Learning Multi-View Correspondences via Subspace-Based Temporal Coincidences. British Machine Vision Conference (BMVC), Guildford, United Kingdom, September 2012.