A Study on the Evolution of the Dutch Web
Kooij, D. (2022)
Search engines need an up-to-date view of the Web, but Web crawling resources are limited, so not all pages can be crawled continuously. This research studies the evolution of the Dutch Web, with the ultimate goal of generating insights that can be used to optimise the Web crawlers of search engines. As part of this research, daily crawls are performed on a large number of pages from the Dutch Web, using a custom-built Web crawler that circumvents cookie walls. The result is a novel, high-quality dataset of the Dutch Web, which is used to carry out a large-scale study of its evolution. This study includes an investigation of change types, the discovery of temporal change patterns, and the construction of a Machine Learning model that predicts whether the text on pages will change.