University of Twente Student Theses


Detecting spam machines, a netflow-data based approach

Vliek, G. (2009) Detecting spam machines, a netflow-data based approach.

Abstract: Spam is a problem that practically every email user encounters. More than 75% of all email messages are likely to be spam, and this share is still rising. This makes spam prevention a very relevant topic. Most research on spam prevention has focused on the receiving side: spam filters in email clients and on receiving servers. This is changing as spam attracts more interest from the research community, and in recent years network-level behavior has emerged as another research direction. This thesis focuses on detecting spam machines via NetFlow data. Because NetFlow only provides information about the communication between hosts and not about its contents, this is not an easy task. The aim of this work is to assess the feasibility of detecting spam machines via NetFlow. To reach this goal, a large repository of NetFlow data has been studied to find behavior that differentiates spam machines from normal email servers. With a few simple assumptions, a high number of IPs with suspicious behavior could be found. The behavior displayed by those suspicious IPs has been used to propose a number of criteria for detecting spamming machines. These criteria have been combined into an algorithm that detects spam machines via NetFlow data. To validate this algorithm, DNS blacklists and SpamAssassin log files were used together with the NetFlow data of the University of Twente. The first tests with DNS blacklist validation show that even when IPs are picked at random, around 95% of them are listed in a blacklist. Closer inspection shows that this is caused by the high number of IPs with only a single or a few SMTP connections directed at the monitored network. These are probably bots that send only a small number of spam messages per domain to avoid detection. Because of the lack of data (only NetFlow data within the monitored network is available), those IPs cannot be analyzed with NetFlow.
This is why the focus of this work lies on machines with a higher number (at least 100) of outgoing SMTP connections. When validating the algorithm itself, a few surprising observations were made. Among them, a high percentage of idle time for an IP appears to be by far the most effective criterion. With the help of the validation results, the algorithm has been optimized. The end result is a validation rate of 99% positively validated machines. This result was obtained with NetFlow data captured over multiple time spans. Based on these results, we conclude that it is possible to detect spammers using NetFlow data alone. This has been a feasibility study and some issues remain open; those have been left for future work.
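The abstract names two of the criteria: at least 100 outgoing SMTP connections, and a high percentage of idle time. It does not define how idle time is measured. The sketch below shows one way the two filters could be combined over flow records. It is a minimal illustration under stated assumptions, not the thesis algorithm. The `Flow` record, the 60-second time bins, the 0.9 idle cut-off, and the definition of idle time (the share of time bins in which the host sends no outgoing flow of any kind) are all hypothetical. The flows are assumed to be already restricted to traffic leaving the monitored network.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

SMTP_PORT = 25
MIN_SMTP_FLOWS = 100   # from the abstract: at least 100 outgoing SMTP connections
IDLE_THRESHOLD = 0.9   # hypothetical cut-off, not taken from the thesis
BIN_SECONDS = 60.0     # hypothetical time-bin width for measuring idle time

@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified outgoing flow record (real NetFlow has more fields)."""
    src_ip: str
    dst_port: int
    start: float  # seconds since the start of the capture window

def idle_fraction(starts, window_start, window_end, bin_seconds=BIN_SECONDS):
    """Assumed definition: share of time bins in the window with no flow from the host."""
    n_bins = max(1, math.ceil((window_end - window_start) / bin_seconds))
    active = {int((t - window_start) // bin_seconds)
              for t in starts if window_start <= t < window_end}
    return 1.0 - len(active) / n_bins

def suspicious_hosts(flows, window_start, window_end):
    """Return {ip: idle_fraction} for hosts that pass both hypothetical filters."""
    smtp_count = defaultdict(int)
    starts = defaultdict(list)
    for f in flows:
        starts[f.src_ip].append(f.start)       # all outgoing flows count as activity
        if f.dst_port == SMTP_PORT:
            smtp_count[f.src_ip] += 1
    result = {}
    for ip, count in smtp_count.items():
        if count < MIN_SMTP_FLOWS:             # too few SMTP flows to analyze
            continue
        idle = idle_fraction(starts[ip], window_start, window_end)
        if idle >= IDLE_THRESHOLD:
            result[ip] = idle
    return result

# Usage over a one-hour window with three synthetic hosts.
flows = (
    [Flow("10.0.0.1", 25, i * 0.5) for i in range(100)]     # burst in the first minute
    + [Flow("10.0.0.2", 25, i * 30.0) for i in range(120)]  # steady sending all hour
    + [Flow("10.0.0.3", 25, i * 60.0) for i in range(5)]    # too few SMTP flows
)
print(suspicious_hosts(flows, 0.0, 3600.0))
```

In this toy data only `10.0.0.1` is flagged. It sends 100 SMTP flows in one burst and is idle in 59 of 60 one-minute bins. `10.0.0.2` sends steadily and is never idle, and `10.0.0.3` falls below the 100-connection threshold.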
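The validation step checks whether flagged IPs appear on DNS blacklists. The abstract does not say which blacklists or tools were used. The sketch below follows the common DNSBL lookup convention (RFC 5782): reverse the IPv4 octets, append the blacklist's zone, and treat a `127.0.0.x` answer as "listed". The zone `bl.example.org` and the helper names are placeholders. The offline resolver stands in for a real DNS lookup so that the example runs without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for non-IPv4 input
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Listed if the name resolves to 127.x.x.x; a failed lookup means not listed."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.")

def listed_fraction(ips: Iterable[str], zones: Iterable[str],
                    resolve: Callable[[str], str] = socket.gethostbyname) -> float:
    """Share of IPs that appear on at least one of the given blacklists."""
    ips, zones = list(ips), list(zones)
    if not ips:
        return 0.0
    hits = sum(1 for ip in ips if any(is_listed(ip, z, resolve) for z in zones))
    return hits / len(ips)

# Offline usage with a stand-in resolver (no network access needed).
_fake_zone = {"2.0.0.127.bl.example.org": "127.0.0.2"}  # conventional test entry

def fake_resolve(name: str) -> str:
    if name in _fake_zone:
        return _fake_zone[name]
    raise socket.gaierror("NXDOMAIN")

print(dnsbl_query_name("192.0.2.10", "bl.example.org"))
print(listed_fraction(["127.0.0.2", "192.0.2.10"], ["bl.example.org"], fake_resolve))
```

Against a real blacklist you would pass the list's actual zone and leave `resolve` at its default. Note that the abstract reports that randomly chosen low-volume senders are mostly listed anyway. A high listed fraction on its own therefore says little unless it is computed on the hosts the detector actually selected.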
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)

