University of Twente Student Theses


A big data approach to model bird occurrence from crowd-sourced data

Kondi, Vijayudu (2021) A big data approach to model bird occurrence from crowd-sourced data.

Full text: only accessible for UT students and staff
Abstract: Many endangered bird species need to be conserved. For conservation planning, it is important to understand where species occur and which habitats they prefer through space and time. Range maps are one of the important tools for understanding species occurrence, and they are generally obtained from statistical models. Input data for such models usually come from traditional bird surveys, in which the survey rules are predefined and strictly followed. In this study we explore the possibilities of generating seasonal and annual range maps for 213 species using presence-only crowd-sourced data collected within the Netherlands (WNL) from 2010 to 2019. Crowd-sourcing has immense potential for collecting bird observations over huge spatio-temporal extents. Unlike in traditional bird surveys, observers in crowd-sourced programs are free to choose where to visit, when to visit, what to find, and what to report. This freedom of choice produces voluminous data but introduces different types of variability into the collected data. Types of variability that are common in crowd-sourced data are: variability in observer effort through space and time, variability in observer skills, variability in the detectability of the species, and variability in the report likelihood of the species. The aim of this study is to account for the variability in the selected dataset and to generate four seasonal range maps and one annual range map for any of the selected 213 species. A few metrics are designed to account for the variability in the data: weighted observer days, weighted encounter days, and weighted observed hitrate. To generate a seasonal range map for a selected (species, season) pair, the designed model automatically selects two sets of spatial units, or blocks. The first set represents the blocks where the species is supposed to be present, and the second set represents the blocks where the species is supposed to be absent during the selected season. With these two sets and 306 explanatory variables, a Random Forest (RF) classifier is trained and the range map for the selected (species, season) pair is generated. By repeating the same procedure, range maps for the other three seasons are obtained. A set of conditions is determined and used to combine the seasonal range maps into the annual range map. The winter and summer range maps generated by the model are compared against the range maps obtained from the Sovon Atlas. Performance of the model is assessed based on classification accuracy, precision, recall, and F1 score. Overall, accounting for variability in detectability and report likelihood in the observation data was a big challenge in this study. The model performed better for species that occur inland than for species that occur along the coastal waters. Predictions for species that have limited occurrence in space were more optimistic than the occurrence shown in the validation maps.
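The assessment step named in the abstract (classification accuracy, precision, recall, and F1 score) can be illustrated with a minimal sketch. It is not the thesis code. It assumes that a predicted range map and a reference map are each reduced to one presence/absence value per block, in the same block order. The function name `classification_scores` and the example values are hypothetical.

```python
def classification_scores(predicted, reference):
    """Score predicted presence/absence per block against a reference map.

    Both arguments are equal-length sequences of booleans
    (True = species present in the block). Hypothetical helper,
    not taken from the thesis.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must cover the same blocks")

    pairs = [(bool(p), bool(r)) for p, r in zip(predicted, reference)]
    tp = sum(1 for p, r in pairs if p and r)          # predicted present, truly present
    fp = sum(1 for p, r in pairs if p and not r)      # predicted present, truly absent
    fn = sum(1 for p, r in pairs if not p and r)      # predicted absent, truly present
    tn = sum(1 for p, r in pairs if not p and not r)  # predicted absent, truly absent

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up values for six blocks, for illustration only.
predicted = [True, True, True, False, False, True]
reference = [True, False, True, False, True, False]
scores = classification_scores(predicted, reference)
print(scores)
```

In this toy example the prediction marks more blocks as occupied than the reference does, so precision is lower than recall. That is the pattern that an overly optimistic range map, as described in the abstract, would produce.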
Item Type: Essay (Master)
Faculty: ITC: Faculty of Geo-information Science and Earth Observation
Programme: Geoinformation Science and Earth Observation MSc (75014)