University of Twente Student Theses
Image Representation Learning with Masked Image Modeling Pre-training in Vision Mamba State Space Models
Duyum, Arda (2024) Image Representation Learning with Masked Image Modeling Pre-training in Vision Mamba State Space Models.
PDF (16 MB)
Abstract: Vision Mamba, recognized for its computational and memory efficiency, addresses the need for environmentally sustainable machine learning models. However, it faces challenges in scalability and stability, particularly on large-scale visual tasks such as ImageNet-1k. This paper improves Vision Mamba by integrating Masked Autoencoders (MAEs) to enhance image representation learning. Specifically, three masking strategies (random, block, and center masking) were implemented, and their impact on the model's performance was evaluated. Experiments demonstrate that block masking achieves the highest Structural Similarity Index Measure (SSIM) values, indicating superior image reconstruction quality, while center masking delivers the highest classification accuracy, reaching approximately 0.26 by epoch 20. Random masking performed worst on both metrics.
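The abstract compares three patch-masking strategies for MAE pre-training. The thesis's actual implementation is not reproduced here; the following is a minimal sketch of what such strategies could look like, assuming a ViT-style 14×14 patch grid (224-pixel images with 16-pixel patches) and MAE's common 75% mask ratio. The function names, grid size, and ratio are illustrative assumptions.

```python
import math
import random

def random_mask(grid=14, ratio=0.75, seed=0):
    """Mask a random subset of patches (True = masked), as in standard MAE."""
    n = grid * grid
    k = int(n * ratio)
    masked = set(random.Random(seed).sample(range(n), k))
    return [[(r * grid + c) in masked for c in range(grid)] for r in range(grid)]

def block_mask(grid=14, ratio=0.75, seed=0):
    """Mask one randomly placed square block covering roughly `ratio` of the grid."""
    side = round(grid * math.sqrt(ratio))  # block side so area ~= ratio * grid^2
    rng = random.Random(seed)
    r0 = rng.randrange(grid - side + 1)
    c0 = rng.randrange(grid - side + 1)
    return [[r0 <= r < r0 + side and c0 <= c < c0 + side
             for c in range(grid)] for r in range(grid)]

def center_mask(grid=14, ratio=0.75):
    """Mask a centred square covering roughly `ratio` of the grid (deterministic)."""
    side = round(grid * math.sqrt(ratio))
    s = (grid - side) // 2
    return [[s <= r < s + side and s <= c < s + side
             for c in range(grid)] for r in range(grid)]
```

With these definitions, random masking scatters masked patches uniformly, whereas block and center masking hide one contiguous region, forcing the encoder to reconstruct large structures from surrounding context rather than from neighbouring visible pixels.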
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: https://purl.utwente.nl/essays/100979