University of Twente Student Theses
Towards distributed information retrieval based on economic models
Eerenberg, E. (2011) Towards distributed information retrieval based on economic models.
Abstract: | Introduction
With the ever-increasing amount of data on the Internet, there is a growing need to search this information in new and more efficient ways. Part of the data on the Internet is not accessible to traditional search engines, because it can only be reached through, for example, a web form. Distributed information retrieval systems, however, can access these types of data. In such a system a central broker works with multiple servers: the broker forwards queries to the servers, and each server fetches results from its own database and returns them to the broker. We investigate whether this architecture can be built using an economic model in which servers have to pay for the right to return results. Previous research suggests that an economic model may yield good results, as a successful spam filter based on an economic model has already been built. The aim of this research is to build a successful distributed information retrieval system based on an economic model, allowing servers to open up their part of the deep web.

Methodology
This research consists of three parts: 1) selecting suitable economic models, 2) simulating these models, and 3) performing a real-world test. We began the selection of economic models with a review of the current literature on economic models. Using the information obtained, we performed a multi-criteria analysis, a model-checking phase, and a test of economic properties to select suitable models. The remaining models were simulated in custom-built simulation software, in which multiple variables were varied across runs in order to observe their effects. The most suitable economic model was implemented in a real-world test, in which users rated the results of both the system based on an economic model and a traditional search engine.

Results
We found the Vickrey auction and bond redistribution models to be the most suitable. Both models behaved well in our simulation and outperformed a naive comparison model. The Vickrey auction model performed best in the scenario that most closely resembles the Internet. On average, 69% of all models with a strong correlation between the economic outcomes and the information retrieval performance (Kendall's τ > 0.6) are Vickrey auction models. The real-world test shows that users appreciate both using and administering an information retrieval system based on an economic model. Furthermore, if we apply a perfect categorization of the queries, the economic model outperforms the comparison engine, with a 66% increase in performance.

Discussion
We conclude that it is possible to build a distributed information retrieval system based on an economic model. It performs better than a naive system, and in a real-world test it also outperforms a traditional search engine. However, non-human categorization of the queries negatively influenced the performance of the models, which shows the need for better categorization algorithms. Exposing the deep web with an economic model is feasible and might even enable new business models in which servers and brokers earn money from search results. |
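The abstract does not describe how the Vickrey auction was applied to query routing, so the following is only a minimal sketch under stated assumptions: for each query the broker runs one sealed-bid second-price (Vickrey) auction in which servers bid for the right to return results, and the highest bidder wins but pays the second-highest bid. It also computes Kendall's τ (the tau-a variant, which does not adjust for ties), the statistic the abstract uses to relate economic outcomes to retrieval performance. The names `Bid`, `vickrey_auction`, and `kendall_tau`, the zero price for a lone bidder, and all example bids, balances, and precision scores are hypothetical illustrations, not code or data from the thesis.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Bid:
    """A sealed bid from one server for the right to answer a query (hypothetical)."""
    server: str
    amount: float


def vickrey_auction(bids: Sequence[Bid]) -> Tuple[str, float]:
    """Second-price sealed-bid auction: the highest bidder wins and pays the second-highest bid."""
    if not bids:
        raise ValueError("at least one bid is required")
    # Sort by amount, highest first; sorting is stable, so on a tie the earlier bid wins.
    ranked = sorted(bids, key=lambda b: b.amount, reverse=True)
    winner = ranked[0]
    # Assumption: with a single bidder there is no second price, so the winner pays 0.
    price = ranked[1].amount if len(ranked) > 1 else 0.0
    return winner.server, price


def kendall_tau(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a between two equally long score lists (no adjustment for ties)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is concordant if both lists order it the same way, discordant if opposite.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Hypothetical bids from three servers for one query.
    bids = [Bid("server_a", 0.80), Bid("server_b", 0.50), Bid("server_c", 0.65)]
    print(vickrey_auction(bids))  # ('server_a', 0.65)

    # Hypothetical per-server balances after a simulation run vs. their retrieval precision.
    balances = [120.0, 80.0, 95.0, 40.0]
    precision = [0.9, 0.7, 0.6, 0.3]
    print(round(kendall_tau(balances, precision), 3))  # 0.667
```

In a second-price auction, bidding one's true value is a dominant strategy, which is one reason such a mechanism is attractive for a broker; whether the thesis relied on that property is not stated in the abstract. The example τ of about 0.667 would count as a strong correlation under the abstract's threshold of τ > 0.6.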
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/60020 |