University of Twente Student Theses


Image Representation Learning with Masked Image Modeling Pre-training in Vision Mamba State Space Models

Duyum, Arda (2024) Image Representation Learning with Masked Image Modeling Pre-training in Vision Mamba State Space Models.

Full text: PDF (16MB)
Abstract: Vision Mamba, recognized for its computational and memory efficiency, addresses the need for environmentally sustainable machine learning models. However, it faces challenges in scalability and stability, particularly on large-scale visual tasks such as ImageNet-1k. This paper improves Vision Mamba by integrating Masked Autoencoders (MAE) to enhance image representation learning. Specifically, three masking strategies — random, block, and center masking — were implemented, and their impact on the model's performance was evaluated. Experiments demonstrate that block masking achieves the highest Structural Similarity Index Measure (SSIM) values, indicating superior image reconstruction quality, while center masking delivers the highest classification accuracy, reaching approximately 0.26 by epoch 20. Random masking performed the worst on both metrics.
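The thesis code is not reproduced on this page; as an illustration only, assuming an MAE-style setup where the image is split into a square grid of patches (function names and the `grid`/`block`/`ratio` parameters below are hypothetical, not taken from the thesis), the three masking strategies could be sketched as:

```python
import random

def random_mask(grid, ratio, seed=0):
    # Mask a random subset of patches; ratio = fraction of patches masked.
    rng = random.Random(seed)
    n = grid * grid
    masked = set(rng.sample(range(n), int(n * ratio)))
    return [[(r * grid + c) in masked for c in range(grid)]
            for r in range(grid)]

def block_mask(grid, block, seed=0):
    # Mask one contiguous block x block square at a random position.
    rng = random.Random(seed)
    r0 = rng.randrange(grid - block + 1)
    c0 = rng.randrange(grid - block + 1)
    return [[r0 <= r < r0 + block and c0 <= c < c0 + block
             for c in range(grid)] for r in range(grid)]

def center_mask(grid, block):
    # Mask a block x block square centered in the patch grid.
    s = (grid - block) // 2
    return [[s <= r < s + block and s <= c < s + block
             for c in range(grid)] for r in range(grid)]
```

Each function returns a boolean grid marking which patches are hidden from the encoder; the decoder is then trained to reconstruct the masked patches, which is how reconstruction quality (SSIM) can differ across strategies.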
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: https://purl.utwente.nl/essays/100979