The design and prototyping of an ontology for integrating citizen science datasets

Abstract: Citizen Science is an approach to science that involves the general public in conducting scientific studies of a phenomenon or an occurrence in nature. It enables members of the public to measure, map and record occurrences of events on the earth's surface, and these activities generate a variety of data and information. Most importantly, natural and environmental datasets resulting from Citizen Science projects have several qualities that can be used to increase scientific knowledge and to aid scientific knowledge discovery, and different efforts have therefore been made to use such information. It is evident that further knowledge and information can be obtained by integrating datasets from different Citizen Science programs. Integrating these datasets is, however, often a challenge because of non-interoperability and incompatibility among them; these problems most often stem from semi-structured, heterogeneous data sources. An essential requirement for Citizen Science communities therefore appears to be a standard medium for managing the generated data and integrating it with other datasets for sharing and reuse. This research proposes a solution to the non-interoperability and incompatibility among Citizen Science datasets by building an ontology for data integration in Citizen Science. The ontology was designed by combining the IEEE standard for software development with the Generic Ontology Development Framework. It was built using both spatial and non-spatial relations in Citizen Science for mapping concepts and knowledge, and was finally implemented in OWL format. The Citizen Science ontology serves as a surrogate for structuring and modelling different datasets so that they share a common structure, making them compatible and interoperable.
The designed ontology was used to model different Citizen Science datasets with the Karma Data Integration tool. The modelled and combined dataset was then queried with SPARQL to retrieve the information contained in the different source datasets. The results showed that the ontology is a potential tool for modelling and transforming different datasets to make them compatible with each other in the Citizen Science domain.
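
The modelling-and-querying workflow described above can be pictured with a minimal sketch. It is not the ontology, the Karma models or the SPARQL queries used in this work: the namespace `CS`, the two sample records, the field mappings and the helpers `to_triples` and `match` are all hypothetical, and a plain-Python triple-pattern match stands in for a SPARQL engine. The sketch only illustrates the idea that two datasets with different field names become jointly queryable once both are mapped onto a shared vocabulary.

```python
# Hypothetical namespace standing in for the ontology's terms (an assumption).
CS = "http://example.org/citsci#"

# Two hypothetical source datasets describing observations with different field names.
bird_survey = [{"id": "b1", "species": "Turdus merula", "lat": 52.1, "lon": 5.2}]
water_watch = [{"obs_id": "w7", "taxon": "Rana temporaria", "y": 52.3, "x": 5.4}]

# Per-dataset mappings from source fields to shared (hypothetical) ontology properties.
BIRD_MAP = {"species": CS + "observedTaxon", "lat": CS + "latitude", "lon": CS + "longitude"}
WATER_MAP = {"taxon": CS + "observedTaxon", "y": CS + "latitude", "x": CS + "longitude"}


def to_triples(records, id_field, mapping, prefix):
    """Turn source records into (subject, predicate, object) triples."""
    triples = []
    for rec in records:
        subject = f"{CS}{prefix}/{rec[id_field]}"
        # Every record becomes an instance of the shared Observation class.
        triples.append((subject, "rdf:type", CS + "Observation"))
        for field, predicate in mapping.items():
            triples.append((subject, predicate, rec[field]))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Combine both datasets into one graph expressed in the shared vocabulary.
graph = (
    to_triples(bird_survey, "id", BIRD_MAP, "bird")
    + to_triples(water_watch, "obs_id", WATER_MAP, "water")
)

# One query now retrieves taxa from both sources, whatever the original field name was.
taxa = sorted(o for _, _, o in match(graph, p=CS + "observedTaxon"))
```

Because both mappings point `species` and `taxon` at the same property, a single pattern over `observedTaxon` returns values from both sources; in an RDF setting the same role would be played by a SPARQL query over the combined graph.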
