Domain independence of Machine Learning and lexicon based methods in sentiment analysis.

Xhymshiti, Meriton

Sentiment analysis is a sub-area of Natural Language Processing (NLP) that aims to automatically detect the polarity of an opinion expressed in a text. There are two main approaches to determining sentiment polarity: lexicon-based approaches and Machine Learning (ML) approaches. A lexicon-based approach uses a dictionary of words, each carrying a polarity label, to determine the sentiment polarity of a document (e.g. positive, negative or neutral). A machine learning approach trains a classifier on a labelled dataset and predicts sentiments using the resulting model. This paper compares the domain independence of an ML system and a lexicon-based system for Dutch sentiment analysis. The main contribution of this paper is to show that, in the absence of “good-quality” labelled training data for a specific domain, a lexicon-based system can be as good as an ML system. The data are in Dutch and consist of large datasets of product and clothing reviews crawled from bol.com and a small dataset of "life memories" of people collected by researchers at the University of Tilburg. Pattern is used as the lexicon-based method and Support Vector Machines as the machine learning method.
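
To make the lexicon-based idea concrete, the following is a minimal sketch of how a polarity dictionary can be turned into a document-level label. It is an illustration only: the four-word Dutch lexicon, the function name `lexicon_polarity`, the averaging rule and the threshold are all assumptions made for this example, and they do not reproduce Pattern's actual lexicon or scoring.

```python
import string

# Hypothetical toy lexicon: Dutch word -> polarity score in [-1, 1].
# These entries are invented for illustration and are not taken from Pattern.
TOY_LEXICON = {
    "goed": 1.0,     # good
    "mooi": 0.8,     # beautiful
    "slecht": -1.0,  # bad
    "lelijk": -0.8,  # ugly
}


def lexicon_polarity(text, lexicon=TOY_LEXICON, threshold=0.1):
    """Label a document by averaging the scores of the lexicon words it contains.

    The averaging rule and the threshold are assumptions of this sketch.
    """
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = [tok.strip(string.punctuation) for tok in text.lower().split()]
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    if not scores:
        # No known opinion words: fall back to neutral.
        return "neutral"
    mean_score = sum(scores) / len(scores)
    if mean_score > threshold:
        return "positive"
    if mean_score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["Mooi en goed product!", "Slecht en lelijk.", "Het pakket kwam op dinsdag."]:
        print(review, "->", lexicon_polarity(review))
```

Because such a scorer needs no labelled training data, it can be applied unchanged to reviews from any domain, which is the property the comparison with a trained classifier examines.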

Domain independence of Machine Learning and lexicon based methods in sentiment analysis.

Xhymshiti, Meriton (2020)