Program of the Software Integration Platform

Virtual environments (VEs) provide an ideal basis for a Software Integration Platform (SIP) for “Multi-Modal Computing and Interaction”. On the one hand, VEs are multimodal by design, and many of the research results from the cluster integrate naturally into the platform; building such a VE is therefore intrinsically a large-scale integration effort. On the other hand, the VE platform provides the basis for new research in multimodal technology, e.g. for virtual experiments with repeatable and controllable multimodal input. While we have already integrated other modalities, the main strength of the VE demonstrator still lies in graphics. We are expanding its support for further modalities, including speech, text, and gesture, to enable broader and deeper cross-modality research. We will specifically address multimodal user input as well as the distributed VEs needed for multimodal communication and collaboration.
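
One ingredient of such repeatable virtual experiments is the ability to record a multimodal input stream once and replay it under controlled conditions. The following Python sketch illustrates this idea; the event format, function names, and file layout are illustrative assumptions, not part of the SIP.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class InputEvent:
    t: float        # seconds since the start of the recording
    modality: str   # e.g. "speech", "gesture", "gaze", "pointer"
    payload: dict   # modality-specific data (recognized text, 3D position, ...)

def record(events, path):
    """Store a captured multimodal input stream for later, repeatable replay."""
    with open(path, "w") as f:
        json.dump([asdict(e) for e in events], f)

def replay(path, dispatch, speed=1.0):
    """Feed a recorded stream back into the VE at a controllable speed."""
    with open(path) as f:
        events = [InputEvent(**e) for e in json.load(f)]
    start = time.monotonic()
    for e in sorted(events, key=lambda ev: ev.t):
        delay = e.t / speed - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)   # wait until the (scaled) timestamp is reached
        dispatch(e)

# Example: record two events and replay them twice as fast into a print handler.
record([InputEvent(0.0, "speech", {"text": "select the red car"}),
        InputEvent(0.8, "gesture", {"type": "point", "target": "car_42"})],
       "session.json")
replay("session.json", dispatch=print, speed=2.0)
```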

A key feature is a tighter integration of semantic technology (RA 5, Knowledge Management) into our software platform. Making (probabilistic) semantic information and reasoning a core capability of the platform will support new research activities and allow us to link the different modalities and their processing chains more tightly. For example, common sense knowledge (addressed in RA 5) can provide prior information about objects (e.g. their size or typical appearance) to improve 3D scene analysis and understanding. Semantic annotations and reasoning about a partially recognized 3D scene can provide cues about the probability of finding certain other types of objects.
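
As a concrete illustration of how such priors might enter a processing chain, the following Python sketch re-ranks object hypotheses from a hypothetical 3D scene analysis step using scene-type and object-size priors. All names, numbers, and the knowledge representation are illustrative assumptions, not the actual RA 5 knowledge base or SIP API.

```python
# Prior probability of encountering an object class given the scene type,
# e.g. derived from common sense knowledge (values are made up for illustration).
SCENE_PRIORS = {
    "kitchen": {"mug": 0.30, "monitor": 0.05, "traffic_light": 0.001},
    "street":  {"mug": 0.01, "monitor": 0.01, "traffic_light": 0.200},
}

# Typical object heights in metres (mean, tolerance), also from prior knowledge.
TYPICAL_HEIGHT = {"mug": (0.10, 0.05), "monitor": (0.45, 0.15), "traffic_light": (3.0, 1.0)}

def rerank(hypotheses, scene_type):
    """Combine raw detector confidence with semantic priors on class and size."""
    scored = []
    for label, confidence, height in hypotheses:
        prior = SCENE_PRIORS[scene_type].get(label, 0.01)
        mean, tol = TYPICAL_HEIGHT[label]
        size_fit = 1.0 if abs(height - mean) <= tol else 0.1  # crude size check
        scored.append((confidence * prior * size_fit, label))
    return sorted(scored, reverse=True)

# A 0.4 m tall detection in a kitchen: "monitor" wins over "traffic_light"
# even though the raw detector scores are similar.
print(rerank([("monitor", 0.55, 0.4), ("traffic_light", 0.50, 0.4)], "kitchen"))
```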

Virtual characters can use semantic annotations to navigate intelligently and plan their actions in 3D scenes. We can use semantic information about types of 3D scenes (e.g. cars in cities versus cars on highways) to automatically generate prototypical 3D scenarios, which in turn can be rendered into realistic images that provide labeled training data for image analysis algorithms. Semantic information thus provides the means to close the loop from image analysis to higher-level models, to simulation within and about those models, to image synthesis, and back to analysis. The same applies to other modalities, whether we want to generate speech for a virtual character that relates to its 3D environment, analyze human gestures in relation to the accompanying speech, monitor the user’s eye gaze as important meta-information about his or her actions and perception of a 3D environment, or adapt the user interface intelligently based on the semantics of the objects being operated on.

The SIP manages the software engineering effort required to integrate code from the research areas into a consistent, stable, and portable software basis. This includes creating suitable building blocks for multimodal computing as well as a large toolbox of multimodal interaction tools. The SIP focuses on supporting our demonstrators, but should also support the research work in the RAs. Moreover, the SIP and the demonstrators built upon it are instrumental for technology transfer to industry and for collaborations with other research groups.
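
The following Python sketch indicates what such a common building-block interface could look like, so that components contributed by different RAs can be combined into multimodal processing chains. The interface, type names, and the dummy component are illustrative assumptions, not the actual SIP API.

```python
from abc import ABC, abstractmethod

class MultimodalComponent(ABC):
    """A building block that consumes and produces typed multimodal data."""

    inputs: set = set()    # data types the component consumes, e.g. {"audio/pcm"}
    outputs: set = set()   # data types it produces, e.g. {"text/transcript"}

    @abstractmethod
    def process(self, data: dict) -> dict:
        """Transform one unit of input data (keyed by type) into output data."""

class DummySpeechRecognizer(MultimodalComponent):
    inputs = {"audio/pcm"}
    outputs = {"text/transcript"}

    def process(self, data):
        # A real component contributed by an RA would run recognition here.
        return {"text/transcript": "<recognized speech>"}

def run_pipeline(stages, data):
    """Run a simple linear chain of building blocks over one data item."""
    for stage in stages:
        missing = stage.inputs - set(data)
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        data = stage.process(data)
    return data

print(run_pipeline([DummySpeechRecognizer()], {"audio/pcm": b"\x00\x01"}))
```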

The SIP is built in close collaboration with the DFKI group on “Agents and Simulated Reality” (Prof. Ph. Slusallek), which has significant experience with similar software platforms and was responsible for the VE demonstrator in the first phase. The SIP is based on Web technology and thus facilitates direct dissemination of our research results through multimodal, interactive presentations across the Internet. In addition, stand-alone versions of the tools will be created. Most of the SIP software will be made available under open-source licenses.