University of Twente Student Theses
Wagon Number Localization Using a Vision-Language Model
Mitev, Ivan (2025) Wagon Number Localization Using a Vision-Language Model.
Abstract: | This research explores the feasibility of using the Contrastive Language-Image Pretraining (CLIP) Vision-Language Model (VLM) to read Unique Identification Codes (UICs) from train wagons. The paper reviews state-of-the-art solutions to the problem of localizing wagon numbers on trains and tests whether CLIP can be fine-tuned for text localization. Specifically, it explains how four fine-tuned versions of CLIP were trained and presents the results of evaluating them on UIC localization. Finally, it assesses whether CLIP shows enough promise on this task to be pursued further, or whether other methods are better suited. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/105067 |