Vision and Research Strategy
In many situations, the most convenient way for humans to interact with intelligent systems is via language. To support natural forms of interaction, computer systems need the means to process language efficiently and extract meaning from it. However, computing the meaning of utterances is a complex task that involves several linguistic levels: the lexical meaning of individual words, the semantic argument structure of clauses and sentences (who did what to whom?), the discourse context (how does a sentence relate to its neighbouring sentences?), and finally the situational context (who is speaking or listening? what does the speaker want to achieve by making the utterance?). Context plays a crucial role at each level: sentential context influences the meaning of words, discourse context influences the meaning of sentences, and situational context influences the meaning of a discourse. In our research group, we aim to develop models of language meaning that take such context information into account. We expect these context-aware models to perform better than approaches that treat different linguistic phenomena in isolation. More specifically, we work on combining lexical semantics and discourse processing, for instance in word sense disambiguation and semantic parsing.
A second guiding principle of our work is a focus on unsupervised and semi-supervised models, i.e., models that require little or no manually labelled training data. Manual annotation is extremely time-consuming and therefore costly, particularly for semantic and discourse phenomena. Consequently, there is a severe shortage of annotated data for tasks dealing with (deep) language meaning. Moreover, where annotated data do exist, they typically cover only a few domains and a small minority of languages. Models trained on such data tend to be highly specialised, and their performance drops when they are applied to data from other domains. In our group, we develop technology that combines information from various (contextual) sources in order to reach acceptable performance levels even without large amounts of manually labelled data. We also experiment with semi-automatic annotation schemes, e.g., automatic pre-annotation or active learning.