Information retrieval by semantically grouping search query data

Florijn, W.J.

Analysing query data is time-consuming. An existing method labels words and word combinations in queries using an information collection of regular expressions (regexes). Because this collection contains no regexes for never-before-seen domains, the method relies heavily on manual work, which limits its scalability. This research therefore proposes a machine-learning-based method to automate the annotation of words and word combinations in queries. It searches for the optimal configuration of pre-processing method, word embedding model, additional data set and classifier variant. All configurations were evaluated on multiple data sets using appropriate performance metrics. The results show that the optimal configuration omits pre-processing, trains a fastText model, and enriches word features with additional data, in combination with a recurrent classifier. We found that a machine-learning approach is able to obtain excellent performance on the task of labelling words and word combinations in search queries.

Author(s): Florijn, W.J. (2019)
