Author(s): Huang, H. (2025)
Abstract:
With the growing number and variety of products in supermarkets, managing retail shelves manually is time-consuming and error-prone, and it is difficult for staff to recognize and organize products efficiently. This work focuses on detecting and grouping similar or identical products in shelf images. We propose a novel unsupervised three-stage framework for grouped product recognition in supermarket environments, consisting of (a) grocery product detection, (b) product characterization, and (c) grouped product recognition. For grocery product detection, we employ YOLOv5 to detect and localize each grocery item. For product characterization, we extract multiple types of features: CNN-based deep features, color histograms, shape and texture information, text from the packaging, and each product's position on the shelf. Finally, in the grouped product recognition stage, we apply unsupervised clustering algorithms, including OPTICS and Agglomerative Clustering, to group similar products. We also evaluate recent Vision Language Models (VLMs) for product detection and localization and compare their performance with that of the proposed framework. Experimental results on public datasets and real Dutch supermarket datasets show that combining CNN, color, and spatial features yields the best clustering performance, with an Adjusted Rand Index (ARI) of 0.7894, a Normalized Mutual Information (NMI) of 0.8020, and a Silhouette Score of 0.0358 on the Grocery Products dataset.
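ARI and NMI, as reported above, score how well the predicted product groups agree with the ground-truth product identities. Both can be computed from the contingency table of true labels versus cluster labels. The snippet below is a minimal, hedged sketch in plain Python, not the thesis's own evaluation code. The function names `adjusted_rand_index` and `normalized_mutual_info` and the toy shelf labels are hypothetical. The NMI variant is also an assumption: it normalizes by the arithmetic mean of the two entropies, which is scikit-learn's default, and the abstract does not say which variant was used. The Silhouette Score is omitted because it requires the fused feature vectors rather than labels alone.

```python
from collections import Counter
from math import comb, log


def _contingency(labels_true, labels_pred):
    """Count items in each (true product, predicted cluster) pair."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two items")
    return Counter(zip(labels_true, labels_pred))


def adjusted_rand_index(labels_true, labels_pred):
    """Hypothetical helper: ARI in the Hubert-Arabie form.
    1.0 means identical partitions; about 0.0 means chance-level agreement."""
    n = len(labels_true)
    cells = _contingency(labels_true, labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # pair agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate: both all-one-group or both all-singletons
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def _entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """Hypothetical helper: NMI = MI / ((H(true) + H(pred)) / 2).
    The arithmetic-mean normalization is an assumption."""
    n = len(labels_true)
    cells = _contingency(labels_true, labels_pred)
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in cells.items()
    )
    denom = (_entropy(labels_true) + _entropy(labels_pred)) / 2
    if denom == 0:  # both labelings put every item in a single group
        return 1.0
    return mi / denom


if __name__ == "__main__":
    # Toy example (hypothetical data): six detected items, three true products.
    truth = ["cola", "cola", "chips", "chips", "soap", "soap"]
    clusters = [0, 0, 1, 1, 1, 2]  # one soap item merged with chips, one split off
    print("ARI:", round(adjusted_rand_index(truth, clusters), 4))
    print("NMI:", round(normalized_mutual_info(truth, clusters), 4))
```

Both metrics ignore the actual cluster IDs. Only the grouping matters, so an unsupervised method such as OPTICS or Agglomerative Clustering can be scored without matching its arbitrary cluster numbers to product names. In practice a library implementation (for example scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score`) would normally be used instead of hand-written helpers.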
Document(s):
Huang_MA_EEMCS.pdf