University of Twente Student Theses


Image Representation Learning with Masked Image Modeling Pre-training in Vision Mamba State Space Models

Duyum, Arda (2024) Image Representation Learning with Masked Image Modeling Pre-training in Vision Mamba State Space Models.

Full text: PDF (16MB)
Abstract: Vision Mamba, recognized for its computational and memory efficiency, addresses the need for environmentally sustainable machine learning models. However, it faces challenges in scalability and stability, particularly on large-scale visual tasks such as ImageNet-1k. This paper improves Vision Mamba by integrating Masked Autoencoders (MAE) to enhance image representation learning. Specifically, three masking strategies — random, block, and center masking — were implemented, and their impact on the model's performance was evaluated. Experiments demonstrate that block masking achieves the highest Structural Similarity Index Measure (SSIM) values, indicating superior image reconstruction quality, while center masking delivers the highest classification accuracy, reaching approximately 0.26 by epoch 20. Random masking performed the worst on both metrics.
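The thesis code is not reproduced on this page; as an illustration only, assuming an MAE-style setup where the image is split into a square grid of patches (function names and the `grid`/`block`/`ratio` parameters below are hypothetical, not taken from the thesis), the three masking strategies could be sketched as:

```python
import random

def random_mask(grid, ratio, seed=0):
    # Mask a random subset of patches; ratio = fraction of patches masked.
    rng = random.Random(seed)
    n = grid * grid
    masked = set(rng.sample(range(n), int(n * ratio)))
    return [[(r * grid + c) in masked for c in range(grid)]
            for r in range(grid)]

def block_mask(grid, block, seed=0):
    # Mask one contiguous block x block square at a random position.
    rng = random.Random(seed)
    r0 = rng.randrange(grid - block + 1)
    c0 = rng.randrange(grid - block + 1)
    return [[r0 <= r < r0 + block and c0 <= c < c0 + block
             for c in range(grid)] for r in range(grid)]

def center_mask(grid, block):
    # Mask a block x block square centered in the patch grid.
    s = (grid - block) // 2
    return [[s <= r < s + block and s <= c < s + block
             for c in range(grid)] for r in range(grid)]
```

Each function returns a boolean grid marking which patches are hidden from the encoder; the decoder is then trained to reconstruct the masked patches, which is how reconstruction quality (SSIM) can differ across strategies.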
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: https://purl.utwente.nl/essays/100979