Thesis archive
Scope
Bachelor or Master
Abstract
The goal of this thesis is to automatically extract dates from web sites and to generate date-related queries against the Europeana API [3]. For example, the following sentence contains the year 1970, which should be extracted and used as a search query: "The University of Passau, founded in the late 1970s, is the". Extracting dates requires methods from the field of information extraction. Places should be extracted in the same way. From both dates and places, a query should be constructed against the Europeana API and evaluated. The implementation must be done in JavaScript. An evaluation of the extraction quality has to be conducted in order to validate your approach.
In a bachelor thesis you can choose to extract either places or dates.
In a master thesis you should do both, dates and places.
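The date extraction step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the thesis itself requires JavaScript), it only recognises four-digit years and decades such as "1970s", and the function names and the `query` parameter name are assumptions rather than part of the task description or of the Europeana API documentation.

```python
import re
from urllib.parse import urlencode

# Four-digit years from 1000 to 2099, optionally followed by "s" (a decade).
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})(s?)\b")


def extract_years(text):
    """Return (year, is_decade) tuples in order of appearance."""
    return [(int(m.group(1)), m.group(2) == "s")
            for m in YEAR_PATTERN.finditer(text)]


def build_query_string(years, places=()):
    """Join extracted years and places into one URL-encoded parameter.

    The parameter name "query" is a placeholder chosen for this sketch.
    """
    terms = [str(year) for year, _ in years] + list(places)
    return urlencode({"query": " ".join(terms)})


sentence = "The University of Passau, founded in the late 1970s, is the"
years = extract_years(sentence)
print(years)                                   # [(1970, True)]
print(build_query_string(years, ["Passau"]))   # query=1970+Passau
```

A real extractor would also have to handle written-out dates, date ranges, and centuries, which is where methods from information extraction come in.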
Topics covered
•Information Extraction
•Practical implementation/Web Development
•Evaluation techniques
Scope
Bachelor
The Web constitutes a rich source of cultural heritage objects contained in collections maintained by online digital libraries, small museums, and the like. Such content is called long-tail content of the Web, because it is distributed across a multitude of web sites, which are rarely visited compared to major web sites like Google, Amazon, and Facebook. Most Web users, however, while potentially interested in the cultural content, do not access these resources because they either do not know of their existence or do not know how to search. Within the EU project EEXCESS [1] strategies are being researched to bring this long-tail content to the users' habitual whereabouts. RESTful Services for accessing long-tail content were already developed within the project. MediaWiki [2] is a commonly used Wiki software with Wikipedia being the most prominent example.
In this thesis, an open source plugin for MediaWiki should be developed that accesses the EEXCESS REST services and injects the retrieved results into MediaWiki pages.
Helpful knowledge/skills
• REST services
• PHP
• XHTML, CSS, JavaScript
[1] http://eexcess.eu
[2] https://www.mediawiki.org/wiki/MediaWiki
Contact: Christin.Seifert@uni-passau.de
Scope
Bachelor's dissertation or Master's thesis
Description
The objective of this thesis is to implement an index structure and a corresponding search interface that allow regular-expression-based searches in large text corpora. The thesis uses this procedure as a starting point.
The finished software will be made available as Open Source software and/or integrated into existing search libraries such as Lucene.
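To give an impression of what an index for regular-expression search can look like, the following sketch uses a trigram index as a prefilter. This is one commonly used idea and not necessarily the procedure referred to above. The function names are hypothetical, and the sketch assumes the caller supplies a literal string that every match of the pattern must contain.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(documents):
    """Map each trigram to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def regex_search(documents, index, pattern, required_literal):
    """Run the regex only on documents that contain every trigram of
    required_literal (assumed to occur in every match of pattern)."""
    candidates = set(range(len(documents)))
    for gram in trigrams(required_literal):
        candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    return sorted(d for d in candidates if regex.search(documents[d]))


docs = ["founded in 1978 in Passau", "a bank by the river", "founded in Graz"]
index = build_trigram_index(docs)
print(regex_search(docs, index, r"founded in \d{4}", "founded in"))  # [0]
```

The prefilter never removes a true match as long as the literal really is required by the pattern; deriving such literals automatically from an arbitrary regular expression is the harder part and is not covered here.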
Scope
Bachelor Thesis in Semantic Web/Information Retrieval
Abstract
Entity disambiguation refers to the process of resolving which particular entity, if any, a word represents. For example, the word “bank” could mean a “river bank” or a “financial institution”. The most crucial resource in disambiguation is a corresponding index or database that holds all resolvable entities. Linked Data provides such a disambiguation resource. However, due to its structure, it is not directly usable and requires preprocessing and efficient indexing.
The aim of this thesis is to investigate efficient methods to create disambiguation indices from distributed RDF (Linked Data) data repositories. The bachelor thesis will be conducted for the EU Project CODE. Depending on its success, further research opportunities could be opened in this project.
Tasks
- State of the Art analysis: Semantic Web, Information Retrieval
- Development of an indexing concept using declarative rules and RDF-based heuristics
- Implementation of the indexing strategy in Java using the Lucene search engine
- Evaluation of efficiency and accuracy
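The idea of a disambiguation index can be sketched as an inverted index from label tokens to candidate entities. The sketch below is in Python with an in-memory dictionary standing in for Lucene, purely for illustration. The example URIs are made up, and using only `rdfs:label` as the source of surface forms is an assumption of this sketch, not a rule from the task list.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_label_index(triples):
    """Index entities by the lower-cased tokens of their rdfs:label values."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == RDFS_LABEL:
            for token in obj.lower().split():
                index[token].add(subject)
    return index


def candidates(index, word):
    """Return all entities whose label contains the given word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical triples; example.org URIs are placeholders.
triples = [
    ("http://example.org/River_bank", RDFS_LABEL, "river bank"),
    ("http://example.org/Bank_finance", RDFS_LABEL, "Bank"),
    ("http://example.org/Bank_finance", "http://example.org/type", "Company"),
]
index = build_label_index(triples)
print(candidates(index, "bank"))
```

Both entities come back as candidates for "bank"; choosing between them from the surrounding context is the actual disambiguation step and lies outside this sketch.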
The Wiki Game is a game in which two players compete to find the shortest path between two topics. But what is a good strategy for winning? Which links should be followed, and which background knowledge is used for making decisions?
Recent research conducted at Stanford and Graz has investigated the topic by analysing the link structure of Wikipedia and has shown some interesting navigational patterns. In this thesis, a mix of link structure and textual background knowledge should be investigated in order to provide hypotheses on how users play such wiki games. Genetic algorithms should be used to conduct experiments on a data set of wiki games that will be provided by us.
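As a point of reference for the paths that players actually take, the shortest path between two topics can be computed with a breadth-first search over the link graph. The following Python sketch assumes the graph is given as a dictionary from page title to linked titles; the toy graph and the function name are illustrative only, and the sketch does not cover the genetic-algorithm experiments.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search; returns the list of pages from start to goal,
    or None if the goal cannot be reached."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parents:
                continue  # already visited
            parents[nxt] = page
            if nxt == goal:
                # Walk back through the parent pointers to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Toy link graph, not real Wikipedia data.
links = {
    "Passau": ["Bavaria", "Danube"],
    "Bavaria": ["Germany"],
    "Danube": ["Black Sea"],
    "Germany": ["Europe"],
}
print(shortest_path(links, "Passau", "Europe"))
```

Comparing a recorded game against this optimum shows how far a player's navigation deviates from the shortest route.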
Challenges
•The Wikipedia category tree is large and generally does not fit into the display area, so the tree has to be pruned.
•Suitable interaction mechanisms have to be defined to allow the correction of the automatically inferred interests.
•A user may not wish to disclose all inferred interests to the system (privacy concerns), and thus interactions need to be defined for specifying a privacy policy within the visualization.
References
[1] Help:Category
[2] Visualization papers
Scope
Bachelor or Master
Abstract
Finding an optimal query is not an easy task. Within this thesis, automatic query construction algorithms should be implemented and evaluated. Given a paragraph of a web site, you should implement an algorithm that automatically constructs a set of distinctive queries from that paragraph, while the search engine is treated as a black box. The queries should be maximally specific. Some state-of-the-art algorithms for this task already exist [1], [2]. Ideally the implementation is done in JavaScript, but Java and Python are also possible.
In a bachelor thesis you should implement the algorithm in [1] and analyse its performance against the Europeana API [3].
In a master thesis you should extend the algorithm in [1] by taking into account the content of results returned and analyse its performance against the Europeana API and two other search engines of your choice.
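To make the task concrete, here is a deliberately naive baseline for constructing queries from a paragraph. It is not the algorithm from [1] or [2]: it simply ranks non-stopword terms by frequency and combines the top terms into queries. The stopword list, function names, and parameters are assumptions made for this sketch.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def candidate_terms(paragraph, top_k=3):
    """Return the top_k most frequent terms longer than two letters."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    # Sort by descending frequency, then alphabetically for determinism.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:top_k]


def build_queries(terms, size=2):
    """Combine terms into all queries consisting of `size` terms."""
    return [" ".join(combo) for combo in combinations(terms, size)]


paragraph = "The museum in Passau shows glass. The glass museum is old."
terms = candidate_terms(paragraph)
print(terms)                 # ['glass', 'museum', 'old']
print(build_queries(terms))  # ['glass museum', 'glass old', 'museum old']
```

Because the search engine is treated as a black box, how specific each query is can only be judged by submitting it and inspecting the number and content of the results, which this baseline does not do.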
Topics covered
•Information Retrieval and Text Mining Techniques
•Practical implementation
•Evaluation techniques
[1] http://www.uni-weimar.de/medien/webis/publications/papers/stein_2010l.pdf
[2] http://www.uni-weimar.de/medien/webis/publications/papers/stein_2013d.pdf
[3] http://www.europeana.eu/portal/api-introduction.html
Scope: Bachelor/Master
QueryCrumbs [1] visualize the (recent) query history in a compact manner. Every query is indicated by a query mark, and similar queries share the same color. For a more detailed comparison, the fill levels of the query marks indicate the percentage similarity w.r.t. a particular query [2].
The thesis should implement the QueryCrumbs approach for use with the Google search engine (most likely as a browser extension), add further extensions (for example, advanced history management), and evaluate its usage.
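The fill levels can be pictured with a small sketch. The description above does not say how the similarity between two queries is computed, so the sketch assumes it is the overlap of their result lists, measured as Jaccard similarity in percent. The function names and the document ids are hypothetical.

```python
def result_overlap(results_a, results_b):
    """Jaccard similarity of two result lists, as a percentage."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def fill_levels(history, selected):
    """Similarity of every query in the history to the selected query.

    history is a list of result lists, one per query, oldest first.
    """
    reference = history[selected]
    return [result_overlap(reference, results) for results in history]


# Hypothetical result ids for three consecutive queries.
history = [
    ["d1", "d2", "d3", "d4"],
    ["d3", "d4", "d5", "d6"],
    ["d7"],
]
print([round(level, 1) for level in fill_levels(history, 0)])
```

With the first query selected, its own mark is completely filled, the second query shares two of six distinct results, and the third shares none.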
[1] Christin Seifert, Jörg Schlötterer, Michael Granitzer: QueryCrumbs: A Compact Visualization for Navigating the Search Query History, Information Visualisation (IV), 2017. (paper available from the chair upon request)
[2] demo at https://www.dropbox.com/s/5snromtc4ww3fl8/querycrumbs.mp4?dl=0
contact: joerg.schlotterer@uni-Passau.de