University of Twente Student Theses
Video Text Matching : A Deep Learning Model for Video to Descriptive Text Matching
Anazo, M.A.C. (2024) Video Text Matching : A Deep Learning Model for Video to Descriptive Text Matching.
Abstract: | The goal of this thesis is to develop a multi-modal model that retrieves short videos from a text description provided by a user and, conversely, returns a textual description for a user-provided video. Textual descriptions and video clips are processed and mapped into a feature space shared by both modalities, which allows text and video to be matched using contrastive learning. The model is trained and tested on the Microsoft Research Video Description Corpus (MSVD). |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/100861 |
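The abstract describes matching text and video in a shared feature space using contrastive learning, but it does not state which loss or encoders the thesis uses. The following is a minimal sketch of one common choice, a symmetric InfoNCE-style loss over cosine similarities, written in plain Python. The function names, the temperature value of 0.07, and the two-dimensional toy embeddings are all assumptions made for illustration, not details taken from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def similarity_matrix(video_embs, text_embs):
    """Cosine similarity between every video (rows) and every text (columns)."""
    videos = [l2_normalize(v) for v in video_embs]
    texts = [l2_normalize(t) for t in text_embs]
    return [[sum(a * b for a, b in zip(v, t)) for t in texts] for v in videos]


def _diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[i], treating item i of each row as the true match."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Average of the video-to-text and text-to-video losses.

    Assumes video_embs[i] and text_embs[i] form a matching pair and that all
    other combinations in the batch act as negatives. The temperature is a
    hypothetical default.
    """
    sims = similarity_matrix(video_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed))


def retrieve(query_emb, candidate_embs):
    """Return the index of the candidate closest to the query in the shared space."""
    query = l2_normalize(query_emb)
    scores = [
        sum(a * b for a, b in zip(query, l2_normalize(c))) for c in candidate_embs
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy embeddings standing in for encoder outputs (illustrative values only).
toy_videos = [[1.0, 0.0], [0.0, 1.0]]
toy_texts_aligned = [[0.9, 0.1], [0.1, 0.9]]
toy_texts_swapped = [[0.1, 0.9], [0.9, 0.1]]
```

Under these assumptions, the loss is small when each text embedding lies near its paired video embedding and large when the pairs are swapped. The same `retrieve` function serves both directions described in the abstract: a text embedding can query video embeddings, or a video embedding can query text embeddings.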