University of Twente Student Theses


Hardware acceleration of sweep detection using Clash : Computer Architecture for Embedded Systems

Bollen, L.M. (2021) Hardware acceleration of sweep detection using Clash : Computer Architecture for Embedded Systems.

[img] PDF
Abstract:With technologies for whole genome sequencing becoming more affordable and common place in bio-informatics, there has been an increasing interest in processing genomes to extract valuable information. The hitchhiking effect is the tagging along of neighbouring mutations when a beneficial mutation is selected due to natural selection. This leads to a local reduction in genetic variation known as a selective sweep. Multiple tools have been developed that can process whole genome data to detect selective sweeps, each with their pros and cons. Sweep Detector [1] finds the location of the most likely recent selective sweep based on changes in the Site Frequency Spectrum (SFS). As shown during the SARS-CoV-2 pandemic, detecting and localizing selective sweeps can help us track the evolution of viruses and answer important questions regarding its evolution. One of the main problems with selective sweep detection is that depending on data set size and required precision, processing data sets is very time consuming. This thesis presents a flexible device independent hardware design for FPGAs that can aid in significantly accelerating selective sweep detection. A combination of profiling and manual analysis was used to determine which part of Sweep Detector should be accelerated. Sweep Detector makes use of double precision floating point operators which require lot of resources on FPGA, we found that up to 64 bit fixed point values could not be used as replacement. However, Using single precision floating point operators did not significantly affect the precision of Sweep Detector. The accelerator heavily relies on data access to arrays that can not be stored on chip, for this thesis we assumed ideal memory to first focus on the design of the accelerator. The likelihood calculations performed by Sweep Detector contain four selected loops which contain loop carried dependencies and data dependent repetition. Loop carried dependencies make it impossible to concurrently process multiple iterations of the same loop and data dependent repetition makes batch processing of iterations inefficient. This has been solved by unrolling the outermost loop which contains neither of these properties and by combining custom variations of the merge and branch circuit presented by Styles et al. [2] with a decentralised control method that aims for maximum throughput of this accelerator regardless of configuration. By utilizing the high level abstractions offered by Clash we created a parametrized design that allows us to easily control the level of parallelism in the design based on the resources available on the target device to maximize the performance. Synchronisation of data is ensured on a type level such that the user does not need to worry about synchronisation issues. Furthermore a decentralised control method is applied that aims to maximize the throughput of this accelerator regardless of configuration. The design was configured for a Cyclone V FPGA that allowed for 14 times parallelism, compared to the C implementation we found a maximum theoretical speedup of up to 93 times for the same functionality on a single core Xeon Gold 6126 processor. At this rate the accelerator would be processing 195 gigabits of data per second from these large arrays, further showing dire need for a high performance tailored memory management system.
Item Type:Essay (Master)
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:42 biology, 50 technical science in general, 53 electrotechnology, 54 computer science
Programme:Embedded Systems MSc (60331)
Link to this item:
Export this item as:BibTeX
HTML Citation
Reference Manager


Repository Staff Only: item control page