University of Twente Student Theses
A Study on the Evolution of the Dutch Web
Kooij, D. (2022) A Study on the Evolution of the Dutch Web.
Abstract: | Search engines need an up-to-date view of the Web, but Web crawling resources are limited, so not all pages can be crawled continuously. This research studies the evolution of the Dutch Web, with the ultimate goal of generating insights that can be used to optimise the Web crawlers of search engines. As part of this research, daily crawls are performed on a large number of pages from the Dutch Web, using a custom-built Web crawler that circumvents cookie walls. This results in a novel, high-quality dataset of the Dutch Web, which is used to carry out a large-scale study of its evolution. The study includes an investigation of change types, the discovery of temporal change patterns, and the construction of a Machine Learning model that predicts whether the text on a page will change. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/90539 |