University of Twente Student Theses

A regression-based Convolutional Neural Network for yield estimation of soybean

Venugopal, Arun (2023) A regression-based Convolutional Neural Network for yield estimation of soybean.

Abstract:	Crop yield estimation is essential for decision-making and for ensuring food security. This MSc thesis explores the explainability of a regression-based Convolutional Neural Network (CNN) model built on Earth observation data. This area is relatively underdeveloped because most Earth observation research focuses on classification tasks. Understanding the model's behaviour and analysing its performance and results with saliency maps is also essential for determining whether the results are biased. The research therefore develops an explainable regression-based CNN model for crop yield estimation, with soybean yield as the target to be estimated. The input data is Sentinel-2 imagery from 2017 to 2021, downloaded from Google Earth Engine. The study region consists of the top soybean-producing states in the United States from 2017 to 2021. The target yield values at the county level are taken from the United States Department of Agriculture. The areas covered by soybean are identified from the Cropland Data Layer and added to the input as a mask layer. After the dataset is prepared, linear regression and CNN models are trained. One CNN model is trained with the mask layer and another without it, and their results are compared. The model with the mask layer reaches an accuracy of 98%, while the model without the mask layer reaches 72%. Several saliency maps, including Grad-CAM, gradient, SmoothGrad, Guided Backpropagation, Layer-wise Relevance Propagation, Deep Taylor, and Integrated Gradients, are generated from the test dataset. These saliency maps are then evaluated with a perturbation analysis: the input image is divided into grid cells, and the cells are perturbed with Gaussian noise in order of their importance. The difference between the result for the perturbed input and the true value is compared across all the explainability methods. The area under the curve of each perturbation plot is used to quantify the perturbation analysis for all the patches. The results are also critically analysed spatially, using vegetation indices and land use maps, to explain why the model focuses on specific regions. This analysis shows that, without the mask layer as input, the model focuses on regions with higher vegetation indices. The findings also show that the mask layer is significant for estimating yield without bias.
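The perturbation analysis described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the grid size, the noise level `sigma`, the use of mean absolute saliency as the per-cell importance score, and the `predict` callable standing in for the trained CNN are all assumptions made here for illustration.

```python
import numpy as np


def grid_importance(saliency, grid):
    """Mean absolute saliency per grid cell, returned as a (rows, cols) array."""
    h, w = saliency.shape
    rows, cols = h // grid, w // grid
    # Crop to a whole number of cells, then split into (rows, grid, cols, grid).
    cells = np.abs(saliency[:rows * grid, :cols * grid])
    return cells.reshape(rows, grid, cols, grid).mean(axis=(1, 3))


def perturbation_curve(image, saliency, predict, target, grid=8, sigma=0.1, seed=0):
    """Add Gaussian noise to grid cells, most important first.

    Returns the absolute error |predict(perturbed) - target| after each step,
    starting with the unperturbed image. `predict` is a hypothetical callable
    that maps an image array to a single yield value.
    """
    rng = np.random.default_rng(seed)
    importance = grid_importance(saliency, grid)
    n_cols = importance.shape[1]
    # Flattened cell indices sorted from highest to lowest importance.
    order = np.argsort(importance, axis=None)[::-1]

    perturbed = image.astype(float).copy()
    errors = [abs(predict(perturbed) - target)]
    for flat in order:
        r, c = divmod(int(flat), n_cols)
        block = (slice(r * grid, (r + 1) * grid), slice(c * grid, (c + 1) * grid))
        # Noise is cumulative: earlier cells stay perturbed.
        perturbed[block] += rng.normal(0.0, sigma, size=perturbed[block].shape)
        errors.append(abs(predict(perturbed) - target))
    return np.array(errors)


def area_under_curve(errors):
    """Trapezoidal area under the error curve, with the x-axis scaled to [0, 1]."""
    x = np.linspace(0.0, 1.0, len(errors))
    return float(np.sum((errors[1:] + errors[:-1]) / 2.0 * np.diff(x)))
```

Under these assumptions, a saliency method that ranks the truly influential cells first should make the error rise early, giving a larger area under the curve than a method with a less informative ranking. Comparing the areas across methods on the same patches is one way to read the quantification the abstract describes.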
Item Type:	Essay (Master)
Faculty:	ITC: Faculty of Geo-information Science and Earth Observation
Subject:	38 earth sciences, 48 agricultural science, 54 computer science
Programme:	Geoinformation Science and Earth Observation MSc (75014)
Link to this item:	https://purl.utwente.nl/essays/96269