University of Twente Student Theses


Dynamic detection of mobile malware using real-life data and machine learning

Panman de Wit, J.S. (2018) Dynamic detection of mobile malware using real-life data and machine learning.

This is the latest version of this item.

Abstract: Mobile malware, i.e. malicious programs that target mobile devices, is an increasing problem, as reflected by the yearly rise in detected mobile malware samples. Additionally, the number of active smartphone users is expected to grow, stressing the importance of research on the detection of mobile malware. Detection methods for mobile malware exist, but they are still limited and not comprehensive. In this paper, we propose detection methods that use device information such as CPU usage, battery usage, and memory usage to detect 10 subtypes of Mobile Trojans. The focus of this paper is the Android Operating System (OS), as it dominates the mobile device industry with an 80 per cent market share. This research uses a dataset containing device and malware data of 47 users for an entire year (2016) to create multiple mobile malware detection methods. By using real-life data, this research provides a realistic assessment of its detection methods. Additionally, using this dataset we examine which features, i.e. aspects of a device, are most important in detecting (subtypes of) Mobile Trojans. The performance of the following machine learning classifiers is assessed: Random Forest, K-Nearest Neighbour, Naïve Bayes, Multilayer Perceptron, and AdaBoost. All classifiers are assessed using 4-fold cross-validation with the holdout method. Additionally, the hyperparameters of all classifiers are tuned using a grid search (GridSearch). Furthermore, we assess the performance of the classifiers both when one model is trained for all subtypes of Mobile Trojans and when a separate model is trained for each subtype. Our results show that the Random Forest classifier is the most suitable for the detection of Mobile Trojans. The Random Forest classifier achieves an F1 score of 0.73 with a False Positive Rate (FPR) of 0.009 and a False Negative Rate (FNR) of 0.380 when one model is created to detect all 10 subtypes of Mobile Trojans.
Furthermore, our research shows that the Random Forest, K-Nearest Neighbour, and AdaBoost classifiers achieve, on average, an F1 score > 0.72, an FPR < 0.02, and an FNR < 0.33 when models are created separately for each subtype of Mobile Trojans. Moreover, we examine the usability of the different detection methods. By assessing multiple metrics, such as model size and training time, we analyse whether the methods can be deployed locally on devices. Lastly, we examine the costs and benefits, for businesses, of deploying self-made detection methods.
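The three metrics reported in the abstract (F1 score, FPR, and FNR) all follow from the counts of a binary confusion matrix. The block below is a minimal sketch of those standard definitions, not the thesis's own evaluation code. The names `ConfusionCounts`, `confusion_counts`, `f1_score`, `false_positive_rate`, and `false_negative_rate` are hypothetical, and the label convention (1 = malware, 0 = benign) is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # malware correctly flagged
    fp: int  # benign wrongly flagged
    tn: int  # benign correctly passed
    fn: int  # malware missed


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Tally the confusion matrix, assuming 1 = malware and 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of benign samples that were wrongly flagged: FP / (FP + TN)."""
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


def false_negative_rate(c: ConfusionCounts) -> float:
    """Share of malware samples that were missed: FN / (FN + TP)."""
    denom = c.fn + c.tp
    return c.fn / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the thesis.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print(f1_score(counts), false_positive_rate(counts), false_negative_rate(counts))
```

Because FNR equals one minus recall, the reported FNR of 0.380 corresponds to a recall of 0.620, meaning that roughly 62 per cent of Trojan samples are detected by the single-model Random Forest.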
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Business Information Technology MSc (60025)

