University of Twente Student Theses

Enhancing Sequential Visual Place Recognition With Foundational Vision Model and Spatio-Temporal Feature mixing

Chirca, Lucian (2024) Enhancing Sequential Visual Place Recognition With Foundational Vision Model and Spatio-Temporal Feature mixing.

PDF (22MB)
Abstract: Visual Place Recognition (VPR) is useful in fields such as robotics, where a robot navigating an environment must assess whether it has seen a place before. To do this, a sequence of image frames can be used to better assess the location where they were taken. However, research in this direction suffers from a lack of training data, poor performance when models are trained on small datasets, poor cross-dataset performance, and limited research into the use of temporal information. Recently, research into image-to-image VPR has shown convincingly that foundation vision models achieve high same-dataset and cross-dataset performance, even when fine-tuned with limited data. This work brings the benefits of foundation vision models from image-to-image VPR to sequence-to-sequence VPR and expands on existing methods for generating spatio-temporal image sequence descriptors. Our methods outperform all previous state-of-the-art methods in sequence-to-sequence VPR on three major datasets: MSLS, Nordland and Oxford RobotCar. We extend the MLP Mixer architecture to generate spatio-temporal descriptors, and we perform novel research into the use of the DINOv2 foundation model as a backbone for matching image sequences in visual place recognition. Consequently, we show that our model performs well with limited training data and achieves the highest same-dataset and cross-dataset performance in all our experiments compared to the previous state of the art in sequence-to-sequence VPR, thereby addressing the problems of the current state of the art listed above. As a result of this research, systems that rely on simultaneous localization and mapping (SLAM) \cite{slam}, such as robot navigation, will be able to navigate their environment better, since our models output the correct result more often than previous methods.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/103313