University of Twente Student Theses


Extract offender information from text

Rens, Eduard (2018) Extract offender information from text.

Abstract:Fraudehelpdesk requested research into the possibility of automatically extracting information from the data it has gathered about fraud incidents. To meet this request, an approach was defined based on machine learning techniques for Natural Language Processing. Most of the data is stored as text from email exchanges between informants and Fraudehelpdesk. Since Fraudehelpdesk is based in the Netherlands, most of the text is written in Dutch and a small part in English. This research describes the chosen methods and the role each plays in an application called the offender information extractor. Named Entity Recognition (NER) and part-of-speech tagging were used to understand the structure of the text and to detect named entities as candidate offender information. A self-made Clause Information Extractor structured the data into clauses, from which relations and facts are created. Furthermore, manual annotation of actual data yielded information that helps to distinguish offender information and, with the help of bootstrapping, raised NER performance to an acceptable detection rate on the actual data. Finally, rules were used to extract offender information, and the extracted information was compared with existing data provided by Fraudehelpdesk.
Item Type:Essay (Master)
SafeCin, Apeldoorn, Netherlands
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:54 computer science
Programme:Interaction Technology MSc (60030)

