University of Twente Student Theses


Managing continuous uncertain data by a probabilistic XML database management system

Scholte, Theodoor (2008) Managing continuous uncertain data by a probabilistic XML database management system.

Abstract: Database systems are widely used in today’s world. Almost every information system contains one or more databases. From a traditional perspective, databases are used to store precise values about objects in the ’real world’. However, much information is uncertain or imprecise. Consider, for example, sensor applications: sensor readings are inherently imprecise and uncertain. Current database management systems cannot store, manipulate or query continuous uncertain data except through user-defined attributes, an approach that delegates the responsibility of managing the uncertainty associated with the data to the end-user. In many situations, the uncertainty associated with the data is distributed continuously, so the data can be represented in terms of a continuous probability distribution. In this thesis, we present an extension to an existing probabilistic data model, resulting in a data model which is capable of storing continuous uncertain data in XML documents. We give a sound semantical foundation to this data model. The probabilistic XML data model is based on the probabilistic tree. In the probabilistic tree, elements and subtrees can be associated with probabilities. Our extension allows probability density functions to be associated with elements as well. Instead of explicitly enumerating the data values with their associated probabilities, a probability density function represents a continuous probability distribution in terms of integrals: it gives the probability that an element attains a value in a specific interval. In order to query this data, we present a query language containing query operations that are based on probability theory. We show how querying of continuous uncertain data works using a sound semantical foundation.
Next, we introduce some new query operators supporting the aggregation of continuous probability distributions using the same semantical foundation. An aggregation operator accepts a number of histograms representing continuous probability distributions, aggregates them and returns one histogram representing a continuous probability distribution. A proof of concept demonstrates the outcomes of our study towards the management of continuous uncertain data. This proof of concept allows the end-user to query XML documents containing continuous uncertain data.
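As an illustration of the two ideas in the abstract (interval probabilities and histogram aggregation), here is a minimal Python sketch. It is not taken from the thesis: the `Histogram` class, its fields, `prob_between`, and `aggregate_sum` are hypothetical names, and the choice of a sum over independent variables with equal bin widths is an assumption made only to keep the example small. The thesis's own operators and data layout may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Histogram:
    """Equal-width histogram; probs[i] is the mass of bin i (assumed layout)."""
    start: float
    width: float
    probs: List[float]

    def prob_between(self, lo: float, hi: float) -> float:
        """P(lo <= X <= hi), assuming uniform density inside each bin."""
        total = 0.0
        for i, p in enumerate(self.probs):
            left = self.start + i * self.width
            right = left + self.width
            overlap = min(hi, right) - max(lo, left)
            if overlap > 0:
                total += p * overlap / self.width
        return total


def aggregate_sum(a: Histogram, b: Histogram) -> Histogram:
    """Histogram of X + Y for independent X ~ a and Y ~ b (equal bin widths)."""
    if a.width != b.width:
        raise ValueError("this sketch requires equal bin widths")
    out = [0.0] * (len(a.probs) + len(b.probs))
    for i, p in enumerate(a.probs):
        for j, q in enumerate(b.probs):
            # The sum of two uniform bins is triangular over two result bins,
            # so half of the joint mass p*q lands in each of them.
            out[i + j] += 0.5 * p * q
            out[i + j + 1] += 0.5 * p * q
    return Histogram(a.start + b.start, a.width, out)


# Two hypothetical sensor readings, each stored as a small histogram.
s1 = Histogram(start=20.0, width=1.0, probs=[0.2, 0.5, 0.3])
s2 = Histogram(start=18.0, width=1.0, probs=[0.5, 0.5])
total = aggregate_sum(s1, s2)
print(s1.prob_between(20.5, 22.0))
print(total.start, total.probs)
```

The result of `aggregate_sum` is again a single histogram, matching the abstract's description of an operator that takes several histograms and returns one. Within each result bin the true density is no longer uniform, so repeated aggregation in this sketch is an approximation.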
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)