University of Twente Student Theses


Normal map prediction from light field data through deep learning

Veen, M.W. van (2020) Normal map prediction from light field data through deep learning.

Abstract: The increasing use of augmented reality has created demand for a more realistic integration of synthetic and real-world images. One way to improve this integration is to obtain a normal map of the real world, which in theory allows relighting and physics interaction with computer-generated objects. This research explores whether deep learning combined with light field data can solve this problem by predicting normal maps directly from RGB inputs. As a baseline, two methods for constructing normal maps from RGB inputs have been researched. Since these methods compute normal maps from depth maps, this is done in two steps. First, two methods to compute normal maps from depth data have been implemented and evaluated. Second, two methods to compute depth maps from RGB light field data have been implemented and evaluated. To see whether machine learning can be used to predict normal maps, a dataset has been created. For this purpose 8 synthetic scenes have been built in a 3D program, and from these scenes 288 light fields have been rendered. Image enhancement methods (gamma, saturation and value adjustments, as well as flipping and rotating) are used to scale this dataset up to 24624 light fields for training (6 scenes), 4104 for validation (1 scene) and 1 for testing (from a novel scene). To determine the deep learning architecture, nine experiments have been conducted covering a large number of network architectures. In these experiments various parameters have been tested, as well as different activation and loss functions. Based on these experiments a final architecture has been chosen. To evaluate the quality of the network, its predictions are compared with the two baseline pipelines described above. Here two depth maps are made using stereoscopy and EPInet, and the Hinterstoisser method is applied to each depth map to create two normal maps. These maps are compared with the prediction result both visually and using a metric called mean of difference in angles (MDA). From the comparison it can be concluded that, although the network architecture used in this research produces better normal maps than the combined methods above, its output is still far from the ground truth. This would make the predicted normal maps difficult to use in any real-world application. To improve upon this result, it is suggested to look at a higher-quality synthetic dataset, possible precomputations, and parameter tweaking of both the neural network and the light field camera setup.
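To make the depth-to-normal step and the angle-based comparison concrete, the following is a minimal sketch in plain Python. It is not the thesis implementation: the normal estimation uses simple finite differences rather than the Hinterstoisser method, and the metric assumes that MDA is the per-pixel angle between predicted and ground-truth normals, averaged over the image and reported in degrees. The function names, the unit pixel spacing and the orthographic view are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(c / length for c in v)


def normals_from_depth(depth):
    """Estimate per-pixel normals from a depth map given as a list of rows.

    Uses central differences in the interior and one-sided differences at
    the borders. Assumes unit pixel spacing and an orthographic view
    (illustrative assumptions, not taken from the thesis).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            # The surface z = f(x, y) has the (unnormalized) normal
            # (-df/dx, -df/dy, 1).
            row.append(normalize((-dzdx, -dzdy, 1.0)))
        normals.append(row)
    return normals


def mean_difference_in_angles(predicted, truth):
    """Mean angle in degrees between corresponding normals of two maps.

    Assumed definition of MDA; the thesis may define it differently.
    """
    total, count = 0.0, 0
    for row_p, row_t in zip(predicted, truth):
        for n_p, n_t in zip(row_p, row_t):
            a, b = normalize(n_p), normalize(n_t)
            dot = sum(p * q for p, q in zip(a, b))
            dot = max(-1.0, min(1.0, dot))  # guard acos against rounding
            total += math.degrees(math.acos(dot))
            count += 1
    if count == 0:
        raise ValueError("normal maps are empty")
    return total / count


# Usage: a flat surface versus a plane whose depth rises by 1 per pixel in x.
flat = [[1.0] * 3 for _ in range(3)]
slope = [[float(x) for x in range(3)] for _ in range(3)]
mda = mean_difference_in_angles(normals_from_depth(flat),
                                normals_from_depth(slope))
```

Under these assumptions a lower value means the predicted normals point closer to the ground-truth directions, and identical maps give zero.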
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 02 science and culture in general
Programme: Interaction Technology MSc (60030)