Foundations of Computer Vision
For a number of important computer vision tasks, we have investigated the use of optimization methods, in particular variational approaches and their corresponding partial differential equations (PDEs). Optimization approaches offer conceptual advantages such as transparent modeling, the possibility of relying on well-understood mathematical theories and sound algorithms, and a straightforward incorporation of invariances into continuous models.
To denoise image data, we have introduced a general optimization formulation, the so-called GNDS framework. It incorporates many known methods such as bilateral filtering, median filters, M-smoothers, diffusion approaches, regularization methods and nonlocal means, but also enables the design of novel filters with better performance.
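As a purely illustrative sketch of one of the classical filters named above, the following Python code applies bilateral filtering to a 1D signal. It is not the GNDS framework itself; the function name and the parameters `sigma_space`, `sigma_tonal` and `radius` are assumptions chosen for this example.

```python
import math


def bilateral_filter_1d(signal, sigma_space=2.0, sigma_tonal=0.2, radius=5):
    """Bilateral filtering of a 1D signal (illustrative sketch only).

    Each output sample is a weighted average of its neighbours. The weight
    decays with spatial distance (sigma_space) and with grey value
    difference (sigma_tonal), so averaging does not cross strong edges.
    """
    n = len(signal)
    out = []
    for i in range(n):
        numerator = 0.0
        denominator = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((j - i) ** 2) / (2.0 * sigma_space ** 2))
            w_tonal = math.exp(
                -((signal[j] - signal[i]) ** 2) / (2.0 * sigma_tonal ** 2)
            )
            numerator += w_space * w_tonal * signal[j]
            denominator += w_space * w_tonal
        # The centre sample always contributes weight 1, so denominator >= 1.
        out.append(numerator / denominator)
    return out


# Usage: a step edge disturbed by a small alternating perturbation.
clean = [0.0] * 20 + [1.0] * 20
noisy = [c + (0.05 if i % 2 == 0 else -0.05) for i, c in enumerate(clean)]
filtered = bilateral_filter_1d(noisy)
```

In this toy setting the perturbation is damped on both sides of the step while the step itself stays sharp, because samples across the edge receive a negligible tonal weight.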
The diversity of display technologies and the introduction of high dynamic range (HDR) imagery make it necessary to compare images with radically different dynamic ranges. Current quality assessment metrics are not suitable for this task. We have developed a novel image quality metric capable of operating on an image pair where both images have arbitrary dynamic ranges. Our metric utilizes a model of the human visual system, and its central idea is a new definition of visible distortion based on the detection and classification of visible changes in the image structure. In joint work with M. Hein, we also proposed a system for the enhancement of bright video features for HDR displays.
Tensor-valued images constitute a class of advanced three-dimensional data types that are becoming increasingly important in many applications, e.g. diffusion tensor magnetic resonance imaging (DT-MRI). In order to preserve intrinsic properties of the data, such as positive semidefiniteness, sophisticated methods must be developed. In this context, Bernhard Burgeth has introduced a general operator algebraic framework that allows one to extend scalar-valued continuous approaches and discrete algorithms to the tensor-valued setting. Three Dagstuhl workshops have been organized by Hans Hagen (TU Kaiserslautern), Joachim Weickert, David Laidlaw (Brown University) and Bernhard Burgeth, bringing together experts from image processing and scientific visualization of tensor fields. This has resulted in three Springer post-proceedings volumes that are among the first books in this emerging field. Thomas Schultz used tensor decomposition for estimating crossing fibers in DT-MRI (best paper award at IEEE Visualization 2008) and investigated the robust extraction of crease surfaces in the presence of degeneracies of the Hessian.
Motion analysis in image sequences is a key problem in many computer vision applications. Here we were able to continue the tradition of Saarland University of contributing some of the world's most accurate algorithms by developing a variational approach that exploits the complementarity of the data and the smoothness term. To cope with real-world scenarios, it incorporates photometric invariants within the HSV color model and uses coarse-to-fine warping strategies for handling large displacements. It ranked number 1 in the widely used Middlebury testbed.
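For orientation, the classical Horn and Schunck functional is the prototype of variational optic flow models with a data and a smoothness term. It is shown here only to illustrate the roles of the two terms and is not the approach described above. The notation is an assumption of this example: I denotes the image sequence, subscripts denote partial derivatives, (u, v) is the sought flow field on the image domain \Omega, and \alpha > 0 is a regularization weight.

```latex
E(u, v) = \int_{\Omega} \Big(
    \underbrace{(I_x u + I_y v + I_t)^2}_{\text{data term}}
    + \alpha \, \underbrace{\big( |\nabla u|^2 + |\nabla v|^2 \big)}_{\text{smoothness term}}
\Big) \, dx \, dy
```

In this classical model, the data term constrains only the flow component along the image gradient and carries no information where the gradient vanishes; there the smoothness term fills in the flow from neighbouring locations.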
In the area of stereo reconstruction, we have developed variational approaches for reconstructing a 3D scene from multiple views. This has emerged from a collaboration of the vision group led by Joachim Weickert and the graphics group led by Hans-Peter Seidel. In another collaboration, between the groups led by Joachim Weickert and Christian Theobalt, we have introduced a method that simultaneously computes the scene flow, the stereo geometry (fundamental matrix) and the depth map from stereo image sequences. Experiments show that it outperforms previous approaches in this field.
Shape from shading constitutes another classical 3D reconstruction problem. Since it aims to reconstruct the 3D scene from a single image under specified illumination conditions, it is even more difficult than the stereo reconstruction problem. We have contributed variational methods with novel, adaptive higher-order regularisers, and we have introduced models that combine more realistic geometric assumptions (perspective instead of orthographic projection models) with more advanced illumination assumptions (Phong model instead of the Lambertian assumption). In combination with level set based segmentation approaches, this has led to a processing pipeline that is better suited for handling more challenging real-world data sets.
Real-Time Algorithms
For prototypical methods in optimization- and PDE-based image analysis, we have developed a number of reliable and highly efficient algorithms that allow applications in time-critical scenarios.
Multigrid methods are among the fastest strategies for solving the linear and nonlinear systems of equations that arise in this context. We have several years of experience in optimizing multigrid algorithms for variational optic flow problems. More recently, we have also extended this framework to other computer vision problems such as stereo reconstruction. Moreover, multigrid methods are also used in our PDE approaches for image compression. While multigrid methods are often optimal for single-core architectures, it is challenging to exploit the potential of modern multi-core architectures and GPUs. In this context, Robert Strzodka has performed extensive research, addressing e.g. the optimal interplay between CPU and GPU and minimally invasive integration of hardware acceleration into existing software packages.
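To make the multigrid idea concrete, the following Python sketch implements a textbook V-cycle for the 1D Poisson problem -u'' = f with zero boundary values. It is a minimal example under simplifying assumptions (1D model problem, Gauss-Seidel smoothing, full-weighting restriction, linear interpolation) and not the optic flow or stereo solvers mentioned above; all function names are hypothetical.

```python
import math


def gauss_seidel(u, f, h, sweeps):
    """In-place Gauss-Seidel sweeps for -u'' = f (u[0], u[-1] are boundary values)."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    """Residual r = f - A u of the standard three-point discretization."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle; len(u) - 1 must be a power of two (assumption)."""
    n = len(u) - 1  # number of grid intervals
    if n == 2:
        # Coarsest grid: a single unknown, solved exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    gauss_seidel(u, f, h, pre)  # pre-smoothing damps high-frequency error
    r = residual(u, f, h)
    nc = n // 2
    # Full-weighting restriction of the residual to the coarse grid.
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    # Solve the coarse error equation recursively, starting from zero.
    ec = v_cycle([0.0] * (nc + 1), rc, 2.0 * h, pre, post)
    # Linear interpolation of the coarse correction back to the fine grid.
    for i in range(1, nc):
        u[2 * i] += ec[i]
    for i in range(nc):
        u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1])
    gauss_seidel(u, f, h, post)  # post-smoothing
    return u


def solve_poisson_1d(n=64, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on [0, 1]; the exact solution is sin(pi x)."""
    h = 1.0 / n
    x = [i * h for i in range(n + 1)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * (n + 1)
    for _ in range(cycles):
        v_cycle(u, f, h)
    return x, u
```

The smoother only has to remove high-frequency error components, while the smooth remainder is handled cheaply on coarser grids. This division of labour is what gives multigrid methods their efficiency.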
In the context of real-time rendering, many highly efficient algorithms for different hardware platforms have been developed by Philipp Slusallek's group. They are described in the summary of RA7 (Large-Scale Virtual Environments).
Robust and Compact Data Representation and Transmission
Since image and surface data sets can be very large, it is desirable to replace them with compact, feature-based representations. The idea is to keep only a few semantically important structures such as edges or shape skeletons, and to reconstruct the missing data by suitable interpolation or inpainting strategies. To this end, we have shown that PDE-based approaches can outperform the JPEG standard for high compression rates and non-textured images. More recently, even JPEG 2000 could be outperformed, and first methods have been developed for PDE-based lossy compression of surface data and videos.
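The reconstruction step of such a scheme can be sketched with the simplest inpainting PDE, homogeneous diffusion: stored pixels are kept fixed, and all other pixels are filled in by solving the Laplace equation. The Python code below is a minimal illustration under assumptions (plain Gauss-Seidel iterations, reflecting image boundaries, a hypothetical function name); it does not cover the selection of the stored pixels, more advanced diffusion operators or entropy coding.

```python
def diffusion_inpaint(image, mask, iterations=500):
    """Fill in pixels with mask == False by homogeneous diffusion.

    image: list of rows with grey values; only entries where mask is True are used.
    mask:  list of rows with booleans; True marks a stored (known) pixel.
    Known pixels stay fixed. Every unknown pixel is repeatedly replaced by the
    average of its existing 4-neighbours, which corresponds to Gauss-Seidel
    iterations for the discrete Laplace equation with reflecting boundaries.
    """
    rows, cols = len(image), len(image[0])
    u = [
        [image[r][c] if mask[r][c] else 0.0 for c in range(cols)]
        for r in range(rows)
    ]
    for _ in range(iterations):
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    continue
                neighbours = [
                    u[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]
                if neighbours:
                    u[r][c] = sum(neighbours) / len(neighbours)
    return u


# Usage: store only the first and last column of a horizontal grey value ramp.
ramp = [[float(c) for c in range(8)] for _ in range(5)]
mask = [[c in (0, 7) for c in range(8)] for _ in range(5)]
reconstruction = diffusion_inpaint(ramp, mask)
```

In this toy example, 10 of the 40 pixels are stored and the ramp is recovered up to the iteration error, since a linear ramp satisfies the discrete Laplace equation.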
However, not only compact data representation, but also its reliable and timely transmission under real-world conditions is a challenge for visual computing applications. Guaranteeing a high quality of service (QoS) in wireless home networks is one of the research topics of the Telecommunications Lab headed by Thorsten Herfet. Moreover, this group has also developed methods for streaming live video over UMTS from a moving car.
Confluence of Image Analysis and Image Synthesis
In recent years it has become clear that computer vision and computer graphics can benefit greatly from each other. Central research topics of the groups led by Bodo Rosenhahn, Christian Theobalt and Thorsten Thormählen exploit the synergy that arises from the confluence of the two fields.
Bodo Rosenhahn's research focuses on pose estimation and tracking. One example of his research is markerless tracking of athletes interacting with sports gear. In collaboration with the groups of Thomas Brox (Dresden), Daniel Cremers (Bonn), Joachim Weickert and Hans-Peter Seidel, a markerless motion capture system has been developed that takes into account the motion restrictions that arise from interactions with sports gear (e.g. bicycle, snowboard) as soft constraints during pose estimation.
For the reconstruction of a static 3D scene, Thorsten Thormählen and Hans-Peter Seidel have developed a semi-automatic approach that enables the generation of a high-quality 3D model of a static object from an image sequence that was taken by a moving, uncalibrated consumer camera [TS08]. The approach is capable of handling not only diffuse surfaces, but even translucent or specular surfaces, and is therefore still applicable where today's laser scanners or fully automatic image-based approaches would generate inaccurate results. First, the camera parameters for each input image are estimated by automatic camera tracking. Afterwards, approaches from image-based rendering are used to generate an orthographic projection on a bounding box that is placed around the object. These ortho-images can be imported as background maps in the orthographic views of any modeling package (e.g., the top, side, and front view). Now, modelers can use the ortho-images to guide their modeling with the familiar tools of their modeling package. They can thereby use all the advanced features that the modeling package has to offer, such as spline modeling or subdivision methods.
In our research, we also developed some of the first algorithms for high quality scene reconstruction with new types of 3D sensors, so-called Time-of-Flight (ToF) cameras. ToF cameras can measure depth at video rate, but have very challenging noise characteristics. These noise characteristics make it impossible to use their data out of the box for 3D reconstruction. We therefore developed new 3D superresolution approaches that greatly enhance the detail of the captured data and, at the same time, strongly reduce the noise. With a new probabilistic superresolution and alignment method, it is even feasible to reconstruct rather detailed 3D models of real world scenes by manually moving the camera around an object [4]. By these means, we took the first steps towards making 3D reconstruction technology available to consumers, and eventually making detailed 3D models as widely used as image and video data are today.
Last but not least, it should be mentioned that five MMCI researchers from image analysis and image synthesis have jointly organised the 2007 Conference on Visualization, Modeling and Vision (VMV). The VMV symposia have become an important annual platform for bringing together the German experts on computer vision and computer graphics.
Confluence of Visual Computing and Text and Speech Processing
See the SIF project "Multimodal Far Field Speech Processing," in which computer vision techniques are used to improve speech recognition.