University of Twente Student Theses


Automatic Product Name Recognition from Short Product Descriptions

Pazhouhi, E. (2018) Automatic Product Name Recognition from Short Product Descriptions.

Abstract: This thesis studies the problem of product name recognition from short product descriptions. To approach the problem, we define it as a classification problem. Next, we investigate and compare the performance of a set of hybrid solutions that combine machine learning and gazetteer-based approaches. We study a solution space that uses four learning models: linear and non-linear SVC, Random Forest, and AdaBoost. We divide the features into four groups: token-level features, document-level features, gazetteer-based features, and frequency-based features. To evaluate the solutions, we develop a machine learning framework that automatically selects the optimal number of the most relevant features, optimizes the hyper-parameters of the learning models, and trains and evaluates the models. Based on the results of a set of experiments, we answer the research questions of this thesis. Specifically, we determine (1) which learning models are more effective, (2) which feature groups contain the most relevant features, (3) how much the different feature groups contribute to the overall performance, (4) how gazetteer-based features are incorporated into the machine learning solutions, (5) how effective these features are, (6) what role hyper-parameter optimization plays, and (7) which models are more sensitive to hyper-parameter optimization.
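The abstract names four feature groups but does not list the individual features. As a minimal sketch, assuming each token of a description is classified as part of the product name or not, the snippet below computes a few hypothetical features from each group. The gazetteer contents, feature names, and the helpers `build_doc_freq` and `token_features` are illustrative assumptions, not the feature set used in the thesis.

```python
from collections import Counter

# Hypothetical gazetteer of known product names (illustrative only).
GAZETTEER = {"iphone", "galaxy", "thinkpad"}


def build_doc_freq(descriptions):
    """Count in how many descriptions each lowercased token appears."""
    doc_freq = Counter()
    for desc in descriptions:
        doc_freq.update(set(desc.lower().split()))
    return doc_freq


def token_features(description, doc_freq, n_docs, gazetteer=GAZETTEER):
    """Return one feature dict per token, grouped as in the abstract."""
    tokens = description.split()
    features = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        features.append({
            # Token-level features: properties of the token itself.
            "is_capitalized": tok[:1].isupper(),
            "has_digit": any(c.isdigit() for c in tok),
            "length": len(tok),
            # Document-level features: the token's place in its description.
            "relative_position": i / max(len(tokens) - 1, 1),
            "doc_length": len(tokens),
            # Gazetteer-based feature: lookup in a list of known names.
            "in_gazetteer": low in gazetteer,
            # Frequency-based feature: share of descriptions containing the token.
            "doc_freq": doc_freq[low] / n_docs,
        })
    return features


corpus = [
    "Apple iPhone 8 64GB Silver",
    "Samsung Galaxy S9 case black",
    "Lenovo ThinkPad X1 laptop",
]
df = build_doc_freq(corpus)
feats = token_features(corpus[0], df, len(corpus))
print(feats[1]["in_gazetteer"])  # True ("iphone" is in the gazetteer)
```

Per-token feature dicts like these could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to any of the four models; the automatic selection of the number of features and the hyper-parameter search described in the abstract could, under the same assumptions, be approximated with `SelectKBest` inside a `GridSearchCV`.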
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Interaction Technology MSc (60030)

