University of Twente Student Theses


Customer segmentation and enrichment using expectation maximization algorithm

Kumar, Niveditha (2020) Customer segmentation and enrichment using expectation maximization algorithm.

Abstract: Targeting customers with personalised advertisements based on their interests helps companies improve their revenue. In the retail setting, customer segmentation is used to decide which personalised advertisements to show. At Albert Heijn (AH), the existing customer segmentation is based on business rules and uses only a few features. Manually adjusting the threshold on a feature can add customers to the segment who are not necessarily interested in a product. The goal of the project is to add customers to existing customer segments using the Expectation Maximization (EM) algorithm. The segment is enriched by adding additional features. It is assumed that customers who are interested in organic products and all other customers are drawn from two different Gaussian distributions. The customers are represented mathematically as realizations of random variables in a sample space consisting of all the features and the underlying class labels. Based on this mathematical formulation and these assumptions, the algorithm classifies whether or not a customer is interested in organic products. To show that adding features helps to enrich the segment, numerical experiments are carried out on a synthetic data set. The EM algorithm can converge to locally optimal solutions, and the final parameter estimates depend on the starting parameters. The numerical experiments are therefore set up with different initializations of the distribution parameters. A number of observations follow from the numerical experiments. The first is that adding features does help to enrich the segment. The second is that the EM algorithm fails to differentiate between completely overlapping clusters. The last is that the EM algorithm works best when initialized based on business rules. The EM algorithm is then applied to the AH data set.
Visualising the data shows that the data points are not linearly separable. As in the numerical experiments, when features that are highly correlated with the organic segment are added, the algorithm decreases the type 1 and type 2 errors and captures most of the relevant customers, which enriches the segment. When features that are less correlated with the organic segment are added, the algorithm gives higher type 1 and type 2 errors. The problem is formulated mathematically and a framework is provided to segment and enrich existing customer segments. It can be concluded that adding features that are correlated with the organic customers based on business rules helps to enrich the segment. The EM algorithm is a soft clustering algorithm and its output is easy to interpret. The initialization of the parameters can be improved further based on results obtained via A/B testing. A clustering-based recommender system is also recommended as future research.
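The setup described in the abstract (two Gaussian classes, EM fitted from a business-rule initialization, evaluation by type 1 and type 2 error) can be illustrated with a minimal sketch. It is not the thesis implementation: the synthetic data, the threshold rule on the first feature, and all function names (`make_synthetic`, `gaussian_logpdf`, `em_two_gaussians`) are assumptions made for illustration only.

```python
import numpy as np


def make_synthetic(n_per_class=200, seed=0):
    """Hypothetical synthetic data: 'other' customers around (0, 0),
    'organic' customers around (3, 3), both with identity covariance."""
    rng = np.random.default_rng(seed)
    other = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 2))
    organic = rng.normal(loc=3.0, scale=1.0, size=(n_per_class, 2))
    X = np.vstack([other, organic])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y


def gaussian_logpdf(X, mean, cov):
    """Log density of a multivariate Gaussian, evaluated row-wise."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z**2, axis=0))


def em_two_gaussians(X, init_labels, n_iter=50, reg=1e-6):
    """EM for a two-component Gaussian mixture.

    init_labels: 0/1 array from a rule-based segmentation (assumed to
    contain both classes); it seeds the first M-step.
    Returns responsibilities, weights, means, covariances and the
    log-likelihood after each iteration.
    """
    n, d = X.shape
    resp = np.column_stack([1.0 - init_labels, init_labels]).astype(float)
    log_liks = []
    for _ in range(n_iter):
        # M-step: weighted estimates of mixing weights, means, covariances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = []
        for k in range(2):
            diff = X - means[k]
            cov = (resp[:, k, None] * diff).T @ diff / nk[k]
            covs.append(cov + reg * np.eye(d))  # small ridge for stability
        # E-step: posterior probability of each class for every customer.
        log_p = np.column_stack([
            np.log(weights[k]) + gaussian_logpdf(X, means[k], covs[k])
            for k in range(2)
        ])
        log_norm = np.logaddexp(log_p[:, 0], log_p[:, 1])
        resp = np.exp(log_p - log_norm[:, None])
        log_liks.append(log_norm.sum())
    return resp, weights, means, covs, log_liks


if __name__ == "__main__":
    X, y = make_synthetic()
    # Hypothetical business rule: threshold on the first feature only.
    rule_labels = (X[:, 0] > 1.5).astype(float)
    resp, weights, means, covs, log_liks = em_two_gaussians(X, rule_labels)
    pred = resp[:, 1] > 0.5
    truth = y.astype(bool)
    type1 = np.mean(pred[~truth])   # false positive rate
    type2 = np.mean(~pred[truth])   # false negative rate
    print(f"type 1 error: {type1:.3f}, type 2 error: {type2:.3f}")
```

The second column of `resp` is the soft membership in the organic segment, so a customer can be added to the segment by thresholding that probability rather than a single feature.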
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 31 mathematics
Programme: Applied Mathematics MSc (60348)