University of Twente Student Theses

Domain independence of Machine Learning and lexicon based methods in sentiment analysis.

Xhymshiti, Meriton (2020) Domain independence of Machine Learning and lexicon based methods in sentiment analysis.

Abstract: Sentiment analysis is a sub-area of Natural Language Processing (NLP) that aims to automatically detect the polarity of an opinion expressed in a text. There are two main approaches to analyzing a sentiment and determining its polarity: lexicon-based approaches and Machine Learning (ML) approaches. A lexicon-based approach uses a dictionary of words, each with a polarity label, to determine the sentiment polarity of a document (e.g., positive, negative or neutral). A machine learning approach trains a classifier on a labelled dataset and predicts sentiments using the resulting model. This paper compares the domain independence of an ML system and a lexicon-based system for Dutch sentiment analysis. The main contribution of this paper is to show that, in the absence of “good-quality” labelled training data for a specific domain, a lexicon-based system can be as good as an ML system. The data are in Dutch and consist of large datasets of product and clothing reviews crawled from bol.com and a small dataset of "life memories" of people collected by researchers at the University of Tilburg. Pattern will be used as the lexicon-based method and Support Vector Machines as the machine learning method.
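The lexicon-based idea described in the abstract can be illustrated with a minimal sketch. This is not Pattern's implementation and not the method evaluated in the thesis: the word list `TOY_LEXICON`, its scores, the function name `lexicon_polarity` and the `threshold` parameter are all hypothetical, chosen only to show how per-word polarity labels can be combined into a document-level label.

```python
# Toy polarity lexicon: hypothetical Dutch words and scores (assumed for
# illustration only; these values are not taken from Pattern or the thesis).
TOY_LEXICON = {
    "goed": 1.0,     # good
    "mooi": 0.8,     # beautiful
    "slecht": -1.0,  # bad
    "lelijk": -0.8,  # ugly
}


def lexicon_polarity(text, lexicon=TOY_LEXICON, threshold=0.1):
    """Label a text positive, negative or neutral from averaged word scores."""
    # Naive tokenisation: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    # Keep only the scores of words that appear in the lexicon.
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return "neutral"  # no known opinion words in the text
    mean = sum(scores) / len(scores)
    if mean > threshold:
        return "positive"
    if mean < -threshold:
        return "negative"
    return "neutral"


print(lexicon_polarity("Goed en mooi product!"))
print(lexicon_polarity("Slecht, lelijk."))
```

Because such a scorer needs no labelled training data, it can be applied unchanged to reviews or to other kinds of text, which is the property the comparison with a trained classifier is concerned with.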
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: https://purl.utwente.nl/essays/81995