University of Twente Student Theses


Automated Vulnerability Detection in Java Source Code using J-CPG and Graph Neural Network

Patil, Samarjeet Singh (2021) Automated Vulnerability Detection in Java Source Code using J-CPG and Graph Neural Network.

Abstract: In this digital era, detecting software vulnerabilities is a crucial yet daunting task in protecting systems from adversarial cybersecurity attacks. Although there has been research in this direction, vulnerability detection remains an open problem, as evidenced by the numerous vulnerabilities reported daily. Several tools are available to mitigate the consequences of software vulnerabilities and improve system security. Traditional tools such as static analyzers can detect only generic errors, either by matching a list of pre-defined rules and vulnerability patterns or by flagging code that contradicts expected software behavior. Hence, these tools cannot easily be extended to more specific vulnerability patterns without a thorough study of the vulnerability and its causes. A newer set of tools, inspired by machine learning models used in text/speech processing, image processing, and computer vision, is also available. However, these tools treat source code as a flat sequence, which does not alleviate the long-term dependency problem. Moreover, a vulnerability must be identified at a fine granularity in order to localize it and facilitate the fix. To address these limitations, and inspired by recent developments in Graph Neural Networks (GNNs) and their practical application in various fields, we explore the applicability of GNNs to learning the properties of source code from a security standpoint. We propose an automatic and intelligent vulnerability detection method that combines a tool operating at the source code level, which produces an intermediate graphical representation of the source code, with a GNN-based model for vulnerability prediction at method-level granularity. To this end, we developed a tool called JCPG that operates at the source code level, captures data-flow and control-flow information, and generates an intermediate graphical representation of the source code at both the file level and the method level.
Our approach uses the JCPG tool to represent source code as graphs, which are fed to a pre-trained GNN model for representation learning; a multilayer perceptron then performs the classification task. We report the results of our experiments and show that our model outperforms static analyzers and previously used GNN models on the Juliet Java dataset. Thus, we confirm that a tool operating at the source code level to generate an intermediate graphical representation, combined with a highly expressive GNN model, can serve as a vulnerability prediction tool that works even for source code that is not compilable.
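The pipeline described in the abstract (graph representation, GNN-based representation learning, multilayer perceptron classification) can be illustrated with a toy sketch. This is not the JCPG tool or the thesis's model: the three-node graph, the feature vectors, the mean-aggregation step, the mean-pool readout, and every weight below are hypothetical values chosen only to show how data might flow through such a pipeline.

```python
import math

# Hypothetical method-level graph: nodes stand for statements, edges for
# control/data-flow links. Features and weights are made up for illustration.
node_features = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [1.0, 1.0],
}
edges = [(0, 1), (1, 2), (0, 2)]  # directed: source -> target


def message_pass(features, edges):
    """One round of mean aggregation over incoming neighbours plus self."""
    incoming = {n: [n] for n in features}  # each node also keeps itself
    for src, dst in edges:
        incoming[dst].append(src)
    updated = {}
    for node, sources in incoming.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[s][i] for s in sources) / len(sources)
            for i in range(dim)
        ]
    return updated


def readout(features):
    """Mean-pool node embeddings into one graph-level vector."""
    nodes = list(features.values())
    dim = len(nodes[0])
    return [sum(v[i] for v in nodes) / len(nodes) for i in range(dim)]


def mlp_score(vec, hidden_w, hidden_b, out_w, out_b):
    """Tiny MLP: one ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


embedded = message_pass(node_features, edges)
graph_vec = readout(embedded)
score = mlp_score(
    graph_vec,
    hidden_w=[[0.5, -0.2], [0.1, 0.3]],  # hypothetical, untrained weights
    hidden_b=[0.0, 0.1],
    out_w=[1.0, -1.0],
    out_b=0.0,
)
print(f"vulnerability score: {score:.3f}")
```

In a real system the weights would be learned from labelled data such as the Juliet Java dataset, and the score would be thresholded to label a method as vulnerable or not; the fixed weights here carry no meaning.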
Item Type: Essay (Master)
Securify, Amsterdam, Netherlands
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)