University of Twente Student Theses
Information retrieval by semantically grouping search query data
Florijn, W.J. (2019) Information retrieval by semantically grouping search query data.
PDF
989kB |
Abstract: | Query data analysis is a time-consuming task. Currently, a method exists where word (combinations) in queries are labelled by using an information collection consisting of regexes. Because the information collection does not contain regexes from never-before seen domains, the method heavily relies on manual work, resulting in decreased scalability. Therefore, a machine-learning based method is proposed in order to automate the annotation of word (combinations) in queries. This research searches for the optimal configuration of a pre-processing method, word embedding model, additional data set and classifier variant. All configurations have been examined on multiple data sets, and appropriate performance metrics have been calculated. The results show that the optimal configuration consists of omitting pre-processing, training a fastText model and enriching word features using additional data in combination with a recurrent classifier. We found that an approach using machine learning is able to obtain excellent performance on the task of labelling word (combinations) in search queries. |
Item Type: | Essay (Master) |
Clients: | New Media, Hengelo, The Netherlands |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/77413 |
Export this item as: | BibTeX EndNote HTML Citation Reference Manager |
Repository Staff Only: item control page