University of Twente Student Theses


Ontology-driven information integration of food industry related RSS news feeds

Brink, P.H.B. (Pieter) van den (2007) Ontology-driven information integration of food industry related RSS news feeds.

Abstract: This master thesis describes an information system that integrates news articles from various online sources, focusing on RSS news feeds, in the food industry domain. The research was initially triggered by the consultancy company Infortellence, which is active in this domain. The business goal was stated as: to aggregate food-industry related news articles and provide a selection of this news tailored to a customer’s interest, resulting in more interest in other Infortellence services. The system uses an ontology containing terms from the food domain to automatically expand user queries, with the aim of improving the relevancy of the results to the user. The system has therefore been labelled FORCA (Food Ontology-driven RSS Content Aggregator). In addition, the system enables users to create profiles of their interests, which provide direct access to the latest news articles matching the profile.
To measure relevancy, several experiments measured the information retrieval (IR) metrics of recall (percentage of all relevant articles that were retrieved) and precision (percentage of relevant articles among the ones that were retrieved). Relevancy was established through a gold standard: a domain expert evaluated all articles in the corpus (around 1600) for their relevancy to each of the 14 test queries. The first round of experiments did not use any other IR techniques such as stemming. In the second round of experiments, stemming was implemented, which was found to grant a 6% increase in recall, with precision remaining stable. However, automatic expansion of queries with the ontology was found not to be beneficial overall. Narrower terms and synonyms were found to have little effect. Related terms and broader terms resulted in a noticeable increase in recall, but the accompanying loss in precision resulted in lower overall performance.
Further research could focus on using a larger document corpus with a larger set of queries, or using a pre-classified corpus from a large commercial database. This would take away some of the subjectivity of relying on a single domain expert to do the gold standard classification. Another avenue is to focus on different, more user-focused metrics. The system was found to have value in saving its users time and effort in obtaining information. This could be further confirmed with research in which representative domain users trial the system and evaluate it with a standardized questionnaire. Finally, the FORCA system and methodology can be viewed as a business model that can be applied to other companies as well. The idea of generating interest through news aggregation ties in most closely with information service-related companies, such as consultancy companies. However, it can provide value to any company interested in attracting more visitors to its corporate website and learning more about its existing or potential customers.
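The recall and precision definitions given in the abstract, and the effect of expanding a query with related ontology terms, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the ontology entry, the four articles, the gold-standard judgements, and the function names `expand_query`, `retrieve`, and `precision_recall` are hypothetical and are not taken from the FORCA system or its corpus.

```python
# Toy illustration only: the ontology, articles, and relevance judgements
# below are invented and do not reproduce the thesis data or results.

# Hypothetical ontology fragment: term -> relation -> linked terms.
ONTOLOGY = {
    "dairy": {"related": ["cattle", "farm"]},
}

# Hypothetical corpus: article id -> text.
ARTICLES = {
    1: "dairy prices rise in europe",
    2: "cattle feed shortage hits milk producers",
    3: "farm machinery show opens",
    4: "sugar tax debate continues",
}

# Hypothetical gold standard: ids judged relevant to the query "dairy".
RELEVANT = {1, 2}


def expand_query(terms, relations):
    """Add ontology terms linked to the query terms by the given relations."""
    expanded = set(terms)
    for term in terms:
        for relation in relations:
            expanded.update(ONTOLOGY.get(term, {}).get(relation, []))
    return expanded


def retrieve(articles, terms):
    """Return ids of articles containing at least one query term."""
    return {
        article_id
        for article_id, text in articles.items()
        if terms & set(text.lower().split())
    }


def precision_recall(retrieved, relevant):
    """Precision: relevant share of retrieved. Recall: retrieved share of relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


baseline = retrieve(ARTICLES, {"dairy"})
expanded = retrieve(ARTICLES, expand_query({"dairy"}, ["related"]))

print("baseline:", precision_recall(baseline, RELEVANT))
print("expanded:", precision_recall(expanded, RELEVANT))
```

In this toy corpus the expanded query retrieves one more relevant article and one irrelevant article, so recall rises while precision falls. This is the kind of trade-off the abstract reports for related and broader terms, although the numbers here carry no evidential weight.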
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 85 business administration, organizational science
Programme: Business Information Technology MSc (60025)