University of Twente Student Theses

Federated Aggregated Search

Marenco Zúñiga, Andrés (2014) Federated Aggregated Search.

Abstract: This thesis investigates the problem of building an 'aggregated search engine' with federated verticals. The core component for good system performance is a vertical representation created by sampling each vertical. However, because the documents are heterogeneous, this sample is often not descriptive enough. With the help of Wikipedia, we experiment with three techniques from the literature that aim to enrich the vertical representation: a) using only Wikipedia articles as the representation; b) combining Wikipedia articles with the vertical sample; and c) expanding the contents of each sampled document. We found that applying LDA to model the hidden topics of each vertical makes it possible to identify Wikipedia articles with the same thematic coverage. Using only those articles to represent some particular verticals then improves the selection task. Finally, we experimented with the modeled topics together with Wikipedia categories and query categorization to boost the score of verticals that could be associated with the query string. Although our results in this case are inconclusive, the numbers suggest that the approach could lead to better vertical selection.
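The score-boosting idea in the last part of the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the Jaccard overlap measure, the multiplicative boost, and the `boost` weight are all assumptions chosen for illustration. The sketch assumes each vertical already has a base selection score and a set of associated Wikipedia categories, and that the query has been mapped to a set of categories.

```python
from typing import Dict, List, Set


def boost_vertical_scores(
    base_scores: Dict[str, float],
    vertical_categories: Dict[str, Set[str]],
    query_categories: Set[str],
    boost: float = 0.5,  # hypothetical weight, not taken from the thesis
) -> Dict[str, float]:
    """Raise the score of verticals whose categories overlap the query's.

    Overlap is measured with Jaccard similarity (an assumption), and the
    base score is multiplied by (1 + boost * overlap).
    """
    boosted: Dict[str, float] = {}
    for vertical, score in base_scores.items():
        categories = vertical_categories.get(vertical, set())
        union = categories | query_categories
        overlap = len(categories & query_categories) / len(union) if union else 0.0
        boosted[vertical] = score * (1.0 + boost * overlap)
    return boosted


def rank_verticals(scores: Dict[str, float]) -> List[str]:
    """Return vertical names ordered from highest to lowest score."""
    return sorted(scores, key=lambda v: scores[v], reverse=True)


# Illustrative, made-up inputs.
base_scores = {"news": 0.40, "sports": 0.35}
vertical_categories = {
    "news": {"Politics", "Economy"},
    "sports": {"Football", "Tennis"},
}
query_categories = {"Football"}

boosted = boost_vertical_scores(base_scores, vertical_categories, query_categories)
ranking = rank_verticals(boosted)
```

In this made-up example the "sports" vertical shares a category with the query, so its boosted score overtakes "news" even though its base score was lower. A multiplicative boost leaves verticals with no category overlap unchanged, which is one possible design choice among several.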
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/66435