University of Twente Student Theses


Automatic image caption generation for digital cultural images collections

Waveren, S. van (2018) Automatic image caption generation for digital cultural images collections.

Abstract: Recent years have witnessed considerable growth in the volume of digital collections, which has led to an increasing demand for automated techniques that support the management, navigation and search of these collections. As machine learning techniques advance, it becomes feasible to generate image captions automatically. However, one of the main challenges to be addressed is creating captions that include higher-level information, such as the event or location shown in an image. Recently, automatic image caption generation has been formulated as a translation problem. However, the captions produced by state-of-the-art image captioning models are limited to low-level descriptions of the image itself. In this work, we assume that images and text naturally co-occur and explore the feasibility of including title information in a pretrained state-of-the-art image captioning model, using OCLC’s CONTENTdm data. Using a combined objective function based on both title and image, we give a proof of concept for compressing image and title features with an autoencoder so that they can be used as input to a pretrained image caption generation model. Although the results are mixed, this thesis provides initial insights into the automatic generation of higher-level image captions.
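To make the idea of a combined objective more concrete, here is a minimal, hypothetical sketch of an autoencoder that compresses concatenated image and title features into a shorter code. The class name `ImageTitleAutoencoder`, the purely linear encoder and decoder, the feature dimensions, the weighting `alpha`, and the use of plain gradient descent are all assumptions made for illustration. This is not the architecture or training setup used in the thesis. The loss is a weighted sum of the image reconstruction error and the title reconstruction error, and the code size stands in for whatever input size the pretrained captioning model expects (also an assumption).

```python
import numpy as np


class ImageTitleAutoencoder:
    """Hypothetical linear autoencoder over concatenated image + title features.

    Assumptions (not from the thesis): linear layers, MSE reconstruction,
    and a combined loss alpha * image_error + (1 - alpha) * title_error.
    """

    def __init__(self, img_dim, title_dim, code_dim, alpha=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.img_dim = img_dim
        self.title_dim = title_dim
        self.alpha = alpha
        in_dim = img_dim + title_dim
        # Small random initial weights for the encoder and decoder.
        self.W_enc = rng.normal(0.0, 0.1, size=(in_dim, code_dim))
        self.W_dec = rng.normal(0.0, 0.1, size=(code_dim, in_dim))

    def encode(self, img, title):
        # Concatenate per-sample features and project to the compressed code.
        x = np.concatenate([img, title], axis=1)
        return x @ self.W_enc

    def decode(self, z):
        # Reconstruct, then split back into image and title parts.
        x_hat = z @ self.W_dec
        return x_hat[:, :self.img_dim], x_hat[:, self.img_dim:]

    def loss(self, img, title):
        # Combined objective: weighted mean squared reconstruction errors.
        img_hat, title_hat = self.decode(self.encode(img, title))
        img_err = np.mean((img_hat - img) ** 2)
        title_err = np.mean((title_hat - title) ** 2)
        return self.alpha * img_err + (1 - self.alpha) * title_err

    def train_step(self, img, title, lr=0.5):
        # One step of gradient descent on the combined objective.
        n = img.shape[0]
        x = np.concatenate([img, title], axis=1)
        z = x @ self.W_enc
        x_hat = z @ self.W_dec
        # Gradient of the weighted MSE with respect to the reconstruction.
        grad = np.empty_like(x_hat)
        grad[:, :self.img_dim] = (
            2 * self.alpha * (x_hat[:, :self.img_dim] - img) / (n * self.img_dim)
        )
        grad[:, self.img_dim:] = (
            2 * (1 - self.alpha) * (x_hat[:, self.img_dim:] - title) / (n * self.title_dim)
        )
        # Backpropagate through both linear layers before updating either one.
        grad_W_dec = z.T @ grad
        grad_W_enc = x.T @ (grad @ self.W_dec.T)
        self.W_dec -= lr * grad_W_dec
        self.W_enc -= lr * grad_W_enc


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.normal(size=(32, 16))   # stand-in for CNN image features
    title = rng.normal(size=(32, 8))  # stand-in for title embeddings
    ae = ImageTitleAutoencoder(img_dim=16, title_dim=8, code_dim=6)
    print("loss before:", ae.loss(img, title))
    for _ in range(300):
        ae.train_step(img, title)
    print("loss after:", ae.loss(img, title))
    print("code shape:", ae.encode(img, title).shape)
```

Weighting the two reconstruction terms separately keeps the code from being dominated by whichever modality has more feature dimensions. Varying `alpha` is one way to trade image fidelity against title information in the compressed representation.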
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Interaction Technology MSc (60030)

