RA9: Multimodal Dialog Systems

Vision and Research Strategy

The distributed nature of cyber-physical environments (CPEs) shifts the focus from classical
human-machine interaction to human-environment interaction. Users no longer sit or stand
in front of a particular computer or interaction device; instead, they interact inside the
computer itself. Because of the number, diversity, and degree of distribution of the systems
involved, the boundaries between individual systems will blur from the user's point of view.
In the future, rather than addressing individual devices, users are more likely to describe
the changes they want to bring about in the environment. This new form of
human-environment interaction poses several challenges, which are addressed in this
research area.
Most current research dialogue systems treat multimodality as a combination of just two
modalities. A dialogue management system for a cyber-physical environment, by contrast,
must handle massively multimodal interaction that tries to address all human senses
concurrently in heavily instrumented environments. On the one hand, this includes free
choice of modality: wherever possible, every interaction should be realizable through any
available modality, according to the user's preferences. On the other hand, considerably
more than two modalities should be integrated into one system, and it should be possible
to use them in combination. Massive multimodality also means that many homogeneous
devices of the same modality are used together in one application, for example several
microphones that capture spoken commands from multiple users.
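The free choice of modality and the use of many devices of the same modality can be illustrated with a minimal Python sketch. It is not SiAM-dp's design or API: every name in it (InputEvent, Intent, the interpreter functions, the device IDs and payload formats) is hypothetical, and simple keyword matching stands in for real speech and gesture recognizers. The sketch shows events from several devices, including two microphones used by different users, being mapped to the same modality-independent intent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class InputEvent:
    device_id: str  # hypothetical device name, e.g. "mic-kitchen"
    modality: str   # "speech", "gesture", "touch", ...
    user_id: str
    payload: str    # raw recognizer output (toy string format, assumed)


@dataclass(frozen=True)
class Intent:
    # Modality-independent description of the desired change to the environment.
    action: str
    target: str
    user_id: str


Interpreter = Callable[[InputEvent], Optional[Intent]]


def speech_interpreter(ev: InputEvent) -> Optional[Intent]:
    # Toy keyword spotting in place of real natural-language understanding.
    words = ev.payload.lower().split()
    if "on" in words and "lamp" in words:
        return Intent("switch_on", "lamp", ev.user_id)
    return None


def gesture_interpreter(ev: InputEvent) -> Optional[Intent]:
    # Assumed gesture label produced by a hypothetical gesture recognizer.
    if ev.payload == "point:lamp+raise_hand":
        return Intent("switch_on", "lamp", ev.user_id)
    return None


def touch_interpreter(ev: InputEvent) -> Optional[Intent]:
    # Assumed payload format "button:<action>/<target>" from a touch panel.
    if ev.payload.startswith("button:"):
        action, target = ev.payload[len("button:"):].split("/")
        return Intent(action, target, ev.user_id)
    return None


# One interpreter per modality; adding a modality means registering one entry.
INTERPRETERS: Dict[str, Interpreter] = {
    "speech": speech_interpreter,
    "gesture": gesture_interpreter,
    "touch": touch_interpreter,
}


def interpret(events: List[InputEvent]) -> List[Intent]:
    intents: List[Intent] = []
    for ev in events:
        interp = INTERPRETERS.get(ev.modality)
        if interp is None:
            continue  # no interpreter registered for this modality: skip the event
        intent = interp(ev)
        if intent is not None:
            intents.append(intent)
    return intents


# Two microphones, a camera, and a wall panel, used by four different users.
events = [
    InputEvent("mic-kitchen", "speech", "alice", "Turn on the lamp"),
    InputEvent("mic-hall", "speech", "bob", "Lamp on please"),
    InputEvent("camera-1", "gesture", "carol", "point:lamp+raise_hand"),
    InputEvent("wall-panel", "touch", "dave", "button:switch_on/lamp"),
    InputEvent("glove-1", "haptic", "eve", "squeeze"),  # unregistered modality
]
intents = interpret(events)
for intent in intents:
    print(intent)
```

Because downstream dialogue logic sees only Intent objects, the same request can arrive through any modality, and a new device type could be supported by registering one more interpreter. A real platform would additionally need to fuse events that belong together across modalities and time, which this sketch deliberately leaves out.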
With these capabilities in mind, we created SiAM-dp, a prototype of a massively multimodal
dialogue platform. Numerous demonstrators were built for the following scenarios:
(1) smart homes, (2) retail environments, (3) smart factories, (4) cars, and (5) production
and car repair garages.

