University of Twente Student Theses

A Study on the Evolution of the Dutch Web

Kooij, D. (2022) A Study on the Evolution of the Dutch Web.

Abstract: Search engines need an up-to-date view of the Web, but Web crawling resources are limited, so not all pages can be crawled continuously. This research studies the evolution of the Dutch Web, with the ultimate goal of generating insights that can be used to optimise the Web crawlers of search engines. As part of this research, daily crawls are performed on a large number of pages from the Dutch Web, using a custom-built Web crawler that circumvents cookie walls. This results in a novel, high-quality dataset of the Dutch Web, which is used to carry out a large-scale study of its evolution. The study includes an investigation of change types, the discovery of temporal change patterns, and the construction of a Machine Learning model that predicts whether the text on pages will change.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/90539

 
