University of Twente Student Theses

Wagon Number Localization Using a Vision-Language Model

Mitev, Ivan (2025) Wagon Number Localization Using a Vision-Language Model.

Abstract: This research explores the feasibility of using the Contrastive Language-Image Pretraining (CLIP) Vision-Language Model (VLM) to read Unique Identification Codes (UICs) from train wagons. The paper examines state-of-the-art solutions to the problem of localizing wagon numbers on trains and tests whether CLIP can be fine-tuned for text localization. More specifically, it explains how four fine-tuned versions of CLIP were trained and presents the results of their evaluation on UIC localization. Finally, it assesses whether CLIP shows enough promise on this task to be pursued further or whether other methods are better suited.
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: https://purl.utwente.nl/essays/105067