Inaugural lecture: Prof. Dr. Ansgar Scherp
Audimax, Lecture Hall D, Christian-Albrechts-Platz 2
Title: "How to Juggle with more than a Billion Triples?"
The talk first introduces the principles of Linked Open Data (LOD). The goal of LOD is to publish and interlink data on the web, similar to how HTML pages are published and connected through hyperlinks today. The LOD movement started in 2007 and has since gained widespread popularity among non-commercial organizations such as the Library of Congress in the US and the German National Library, as well as commercial organizations such as the BBC, the New York Times, Google, Microsoft, and Facebook. Since its advent, the amount of published LOD has increased dramatically: by 2011 it had reached more than 30 billion triples, the triple being the smallest unit of information in Linked Data.
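To make the notion of a triple concrete, the following minimal Python sketch models a triple as a subject, a predicate, and an object, each identified by a URI. The `example.org` identifiers, the `Triple` type, and the `outgoing_links` helper are hypothetical names chosen for illustration only; they are not taken from the talk.

```python
from typing import List, NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data: the example.org URIs are made up.
triples = [
    Triple("http://example.org/person/alice", RDF_TYPE, FOAF + "Person"),
    Triple("http://example.org/person/alice", FOAF + "knows",
           "http://example.org/person/bob"),
]


def outgoing_links(data: List[Triple], subject: str) -> List[str]:
    """Return the objects a subject points to, much like hyperlinks on a page."""
    return [t.obj for t in data if t.subject == subject]


print(outgoing_links(triples, "http://example.org/person/alice"))
```

Because subjects and objects are URIs, an object published by one source can reappear as a subject in another, which is what interlinks the data across the web.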
The second part of the talk presents three examples of how to deal with Linked Data at large scale. The first example is SemaPlorer, an interactive application for iterative, faceted search and navigation, in real time, over a very large amount of open social media data of differing quality that originates from different sources. SemaPlorer won the Billion Triple Challenge in 2008, which aims at "demonstrating something useful" with more than a billion triples. The second example is SchemEX, a tool for the efficient extraction of implicit and explicit schema information from LOD. SchemEX follows a stream-based approach, i.e., only the triples that fall within a window of a specific size are considered. The SchemEX approach won the Billion Triple Challenge in 2011. Finally, we present LODatio, a Google-inspired search engine designed for data engineers to find relevant sources of LOD. LODatio makes use of the schema-level SchemEX index. It supports the data engineer in finding relevant sources of data by providing example snippets and query suggestions such as "Did you mean" to broaden the current query and "Related Queries" to further specialize the information need.
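The idea of a stream-based, windowed schema extraction can be sketched roughly as follows. This is a simplified illustration and not the actual SchemEX algorithm: it assumes the input arrives as (subject, predicate, object, source) tuples, keeps at most `window_size` subjects in memory, and records for each observed combination of types and properties the sources in which it occurred. The function name `extract_schema` and the data layout are assumptions made for this example.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_schema(quads, window_size):
    """Map each (types, properties) pattern to the sources where it was seen.

    Only `window_size` subjects are held in memory at once. When the window
    is full, the oldest subject is evicted and its pattern is written to the
    index, so a subject whose triples are spread far apart in the stream may
    be recorded as several partial patterns.
    """
    window = OrderedDict()      # subject -> (types, properties, sources)
    index = defaultdict(set)    # (types, properties) -> sources

    def flush(entry):
        types, props, sources = entry
        index[(frozenset(types), frozenset(props))].update(sources)

    for s, p, o, source in quads:
        if s not in window:
            if len(window) >= window_size:
                _, oldest = window.popitem(last=False)  # evict first-in subject
                flush(oldest)
            window[s] = (set(), set(), set())
        types, props, sources = window[s]
        if p == RDF_TYPE:
            types.add(o)        # explicit schema information
        else:
            props.add(p)        # implicit schema information
        sources.add(source)

    for entry in window.values():   # flush what is left at end of stream
        flush(entry)
    return dict(index)
```

With a window large enough to hold all subjects, the result is exact; a smaller window trades completeness of the extracted patterns for bounded memory, which is the point of processing the data as a stream. A schema-level index of this shape (pattern to sources) is also the kind of structure a search engine for data sources could query.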