University of Twente Student Theses

Information retrieval by semantically grouping search query data

Florijn, W.J. (2019) Information retrieval by semantically grouping search query data.

Abstract:	Query data analysis is a time-consuming task. An existing method labels words and word combinations in queries using an information collection consisting of regexes. Because this collection contains no regexes for never-before-seen domains, the method relies heavily on manual work, which limits its scalability. Therefore, a machine-learning-based method is proposed to automate the annotation of words and word combinations in queries. This research searches for the optimal configuration of pre-processing method, word embedding model, additional data set and classifier variant. All configurations were examined on multiple data sets, and appropriate performance metrics were calculated. The results show that the optimal configuration omits pre-processing, trains a fastText model, and enriches word features with additional data, in combination with a recurrent classifier. We found that a machine learning approach can obtain excellent performance on the task of labelling words and word combinations in search queries.
Item Type:	Essay (Master)
Clients:	New Media, Hengelo, The Netherlands
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	54 computer science
Programme:	Computer Science MSc (60300)
Link to this item:	https://purl.utwente.nl/essays/77413