Open-Science-Web Demonstrator

Clustered Results of Harvesting Entity Photos

This demonstrator is driven by the long-term objective of automatically building and maintaining comprehensive knowledge bases about entities and their semantic classes, relationships between entities, associated multimodal information such as photos, and informative cross-linkage with Web sites and Web 2.0 sources.

The demonstrator builds on earlier work in G. Weikum's group, which has developed methodologies for automatically constructing a large knowledge base from Wikipedia and other Web sources. The resulting knowledge base, YAGO, has been downloaded several thousand times and is used in many research projects worldwide. A new edition of the knowledge base, YAGO2, currently contains more than 200 million facts, including meta-facts about time, location, and provenance. The facts in knowledge bases like YAGO2 also serve as seeds for pattern-based information extraction from arbitrary natural-language texts and Web sources. This has been pursued in both RA1 (H. Uszkoreit) and RA5 (G. Weikum). The knowledge base can be leveraged to semantically annotate entities and facts in natural-language texts such as news or blogs. Entity recognition and disambiguation are also leveraged in the work of Sebastian Michel's IRG on Web 2.0 streams, as well as in joint work by Pinkal's and Weikum's groups, spanning RAs 1 and 5.
The search engine NAGA ranks search results based on a novel form of statistical language model, computed on the 500-million-page corpus ClueWeb09 using MapReduce methods developed in Ralf Schenkel's IRG. To compute only the top-k results of a query, we have built on the algorithms jointly developed by Weikum's group and the IRGs led by Hannah Bast and Ralf Schenkel. When querying image search engines for entity names, our approach harnesses the knowledge base facts about the entities of interest, including salient keyphrases automatically mined from seed pages such as Wikipedia, so that the queries yield large candidate lists with high precision and satisfactory recall.
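The idea of combining an entity name with mined keyphrases when querying an image search engine can be illustrated with a minimal sketch. It is not the demonstrator's actual implementation: the function name `expanded_image_queries`, the `max_queries` parameter, and the sample keyphrases are all assumptions made for illustration, and the step of sending the queries to a search engine is left out.

```python
def expanded_image_queries(entity_name, keyphrases, max_queries=5):
    """Build query strings for an image search engine (hypothetical helper).

    The first query is the quoted entity name alone; each further query
    appends one keyphrase to narrow the results to the intended entity.
    """
    queries = [f'"{entity_name}"']
    seen = set()
    for phrase in keyphrases:
        cleaned = phrase.strip()
        key = cleaned.lower()
        # Skip empty phrases and case-insensitive duplicates.
        if not key or key in seen:
            continue
        seen.add(key)
        queries.append(f'"{entity_name}" {cleaned}')
        if len(queries) >= max_queries:
            break
    return queries


# Illustrative input only; these keyphrases are not taken from YAGO.
sample_queries = expanded_image_queries(
    "Max Planck",
    ["physicist", "Physicist", " quantum theory "],
    max_queries=3,
)
for query in sample_queries:
    print(query)
```

Issuing several narrow queries instead of one ambiguous name query is one plausible way to trade a larger candidate list against precision; how the real system selects and weights keyphrases is not specified here.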

URDF is a powerful rule-based engine for querying and reasoning. Drawing on the rich knowledge imported from YAGO, it has been used as a backend engine for a dialog system developed at DFKI. Work in RA5 at the DFKI Language Technology Lab aims at novel applications that assist scientists in searching and authoring scientific publications.

Knowledge Bases

Search and Exploration

Interactive Reasoning

Discovery and Visualization

Multimodal Knowledge