University of Twente Student Theses

Login

B-epsilon-tree and cache-oblivious lookahead array: a comparative study of two write-optimised data structures

Khavrona, Oleh-Yevhen (2021) B-epsilon-tree and cache-oblivious lookahead array: a comparative study of two write-optimised data structures.

[img] PDF
668kB
Abstract:The ever-growing amounts of data stored in the world require efficient and fast data structures to store and process it. Due to the large size of such massive data sets, the data structures that operate on them grow so large that they can no longer fit in main memory. Thus, the number of I/O operations between fast main memory and slow disk becomes the performance bottleneck of these data structures. To properly assess their performance, these data structures are analysed in the external memory model that puts emphasis on the number of blocks transferred between main memory and disk. Multiple data structures and their variations were developed in the external memory model to optimise the number of block transfers, among which the B-tree is the most well-known one. One of the research areas related to designing data structures in the external memory model has been focused on making data structures that keep the same search performance as the B-tree but asymptotically improve the speed of writes. Despite extensive theoretical results in the area, little experimental data about performance of such write-optimised data structures is available. In this research study, we analysed two write-optimised data structures - the B-epsilon-tree and cache-oblivious lookahead array (COLA) - and performed experiments to determine which data structure performs better under which conditions. As our results show, the COLA has much better write speeds than the B-epsilon-tree when inserted elements are not sorted, but achieves worse results when the data is sorted either in the descending or ascending order. Point queries are faster in the B-epsilon-tree, which makes it a better choice for workloads that require more querying than updating data. Lastly, the support of an efficient read-and-update operation and more streamlined experience of implementing the B-epsilon-tree compared to the COLA make it an even more favourable data structure to consider for a use in data storage systems.
Item Type:Essay (Bachelor)
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:54 computer science
Programme:Computer Science BSc (56964)
Link to this item:https://purl.utwente.nl/essays/87368
Export this item as:BibTeX
EndNote
HTML Citation
Reference Manager

 

Repository Staff Only: item control page