University of Twente Student Theses
Tracking Internet Host Usage Types : Classification and Trends via Reverse DNS Data
Khavrona, Roman (2024) Tracking Internet Host Usage Types : Classification and Trends via Reverse DNS Data.
Abstract: | The rapid growth of the Internet has led to increased complexity in IP address usage, making it challenging to effectively track and understand the functions these IP addresses serve, such as whether they belong to datacenters, educational institutions, internet service providers or other categories. This topic has not been widely researched publicly and is crucial for network management, cybersecurity and content regulation. Given that IP addresses are merely numerical labels, we utilized their associated hostnames, obtained via reverse DNS, for our classification tasks. We analyzed 1,285,834,541 IPv4 addresses (approximately 30% of the total IPv4 address space) and classified each IP address by its usage type using IP2Location's commercial dataset as the ground truth. Datasets covering these types of data are expensive to obtain and have unknown methodologies; therefore, we aimed to create a model with an open methodology that would output the usage type given only a hostname, which anyone can obtain by performing a reverse DNS query with the IP address of interest. In our research, we analyzed both manually crafted features and automatic features generated from hostnames through the Word2Vec technique, and supplied these features to machine learning models, achieving a prediction accuracy close to 70%. Additionally, we performed a longitudinal analysis of usage types throughout 2023, utilizing four different data snapshots to identify trends in shifts of IP usage types, and observed notable changes, such as a consistent decline in the organization (ORG) category, an overall decrease in mobile ISP services, a steady increase in datacenter IP addresses and an expansion in educational IP address usage. Lastly, we investigated the potential of applying our developed techniques to predict the country attribute within the IP2Location dataset.
This attribute was selected due to IP2Location's claim of high accuracy, a claim that has been substantiated by other researchers who have reported accuracy levels approaching 100%. Our objective was to assess the performance of our techniques on an attribute with established accuracy, and the results demonstrated an accuracy exceeding 90%, potentially indicating that our methodology for inferring usage types is practical for real-world scenarios. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/103649 |
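The abstract describes obtaining a hostname through a reverse DNS query and deriving manually crafted features from it. The following is a minimal sketch of that general idea, not the thesis's actual feature set: the keyword list, the feature names, and the helper functions `reverse_dns` and `hostname_features` are all hypothetical and chosen only for illustration.

```python
import re
import socket
from typing import Optional

# Hypothetical keyword list; the features used in the thesis are not given here.
KEYWORDS = ("dsl", "dyn", "static", "mobile", "edu", "cloud", "host", "server")


def reverse_dns(ip: str) -> Optional[str]:
    """Return the reverse DNS hostname for an IP address, or None on failure."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # Covers socket.herror and socket.gaierror (no PTR record, bad input).
        return None


def hostname_features(hostname: str) -> dict:
    """Derive simple illustrative features from a hostname string."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    # Alphabetic tokens, e.g. "dyn-84-12.dsl.example.net" -> dyn, dsl, example, net
    tokens = [t for t in re.split(r"[^a-z]+", name) if t]
    features = {
        "length": len(name),
        "num_labels": len(labels),
        "num_digits": sum(c.isdigit() for c in name),
        "num_hyphens": name.count("-"),
        "tld": labels[-1],
    }
    for keyword in KEYWORDS:
        features[f"has_{keyword}"] = any(keyword in t for t in tokens)
    return features


if __name__ == "__main__":
    # example.net is a reserved documentation domain; no network access needed.
    print(hostname_features("dyn-84-12-3-7.dsl.example.net"))
```

In a pipeline like the one the abstract outlines, a feature dictionary of this kind (or the alphabetic tokens, if fed to an embedding method such as Word2Vec) would be passed to a classifier trained against labelled usage types. The `reverse_dns` helper requires network access and returns `None` when no hostname is found.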