University of Twente Student Theses


Locating Selective Sweeps with Accelerated Convolutional Neural Networks

Souilljee, M.L. (2021) Locating Selective Sweeps with Accelerated Convolutional Neural Networks.

Abstract: Discovering how a species adapted to a specific environment over a long period, and how this affected its evolution, is of great importance to researchers. A major force that shapes the evolution of a species is positive selection, which therefore provides information on how the species evolved and adapted to its environment. Positive selection leaves a selective sweep in the genetic material of a species, and the detection and localization of selective sweeps, and therefore of traces of positive selection, is the goal of various methods and tools. Various signature-based methods and tools have been developed for sweep detection; the use of convolutional neural networks (CNNs) for whole-genome sweep detection, however, has not yet been explored. This work presents ASDEC (Accurate Sweep Detection Enabled by a CNN), a CNN-based method for whole-genome sweep detection. ASDEC was developed in a user-configurable way and performs well against current signature-based methods and tools. To the best of my knowledge, ASDEC is the first whole-genome CNN-based sweep detection method. ASDEC was developed using a hand-designed neural architecture search (NAS), which led to a final CNN architecture dubbed SweepNet. ASDEC was compared with signature-based methods and tools such as RAiSD, OmegaPlus, SweeD, and SweepFinder2, and showed equal or better performance than the top-performing signature-based method on almost all datasets. The performance evaluation considered three confounding factors: bottlenecks, migration, and recombination. Besides simulated datasets, ASDEC can be deployed on real genomic datasets. A scan of the first chromosome of the human genome (Yoruba population, 1000 Genomes dataset) revealed nine candidate genes.
The nine candidate genes discovered by ASDEC have already been identified by previous research as targets of positive selection. ASDEC supports conventional hardware such as multi-core CPUs and GPUs. To extend its usability further, a CNN inference accelerator was implemented and compared with a multi-core CPU in terms of performance: execution on a state-of-the-art FPGA is 10.7x faster than on a general-purpose six-core CPU.
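ASDEC is described above as a CNN-based whole-genome sweep scanner. As a minimal illustration of that general pattern only, the Python sketch below slides a fixed-size window of SNPs along a genomic region, scores each window with a classifier, and reports the window centre with the highest score as the sweep candidate. Everything in it is an assumption for illustration: the names `scan`, `toy_window_score`, and `KERNEL`, the window and step values, and the untrained averaging filter are hypothetical. The scorer is not SweepNet and this is not ASDEC's code; the thesis describes the real architecture and data preparation.

```python
from typing import Callable, List, Sequence, Tuple

# Rows are haplotypes, columns are SNPs (0 = ancestral allele, 1 = derived allele).
Matrix = List[List[int]]

# Hypothetical fixed filter; a trained CNN such as SweepNet learns its own weights.
KERNEL = (1.0, 1.0, 1.0)


def toy_window_score(window: Matrix) -> float:
    """Hypothetical stand-in for a trained CNN classifier (NOT SweepNet).

    Computes the derived-allele frequency of every SNP column, applies one
    1-D convolution with KERNEL plus a ReLU, and average-pools the result.
    Windows dominated by high-frequency derived alleles score close to 1.
    Assumes the window has at least len(KERNEL) SNP columns.
    """
    n_haplotypes = len(window)
    freqs = [sum(column) / n_haplotypes for column in zip(*window)]
    k = len(KERNEL)
    feature_map = [
        max(0.0, sum(w * f for w, f in zip(KERNEL, freqs[i:i + k])) / k)  # conv + ReLU
        for i in range(len(freqs) - k + 1)
    ]
    return sum(feature_map) / len(feature_map)  # global average pooling


def scan(
    snps: Matrix,
    positions: Sequence[int],
    window: int,
    step: int,
    score: Callable[[Matrix], float] = toy_window_score,
) -> List[Tuple[int, float]]:
    """Slide a window of `window` SNPs with stride `step` along the region and
    return (window-centre position, score) pairs."""
    results = []
    for start in range(0, len(positions) - window + 1, step):
        sub_matrix = [row[start:start + window] for row in snps]
        centre = (positions[start] + positions[start + window - 1]) // 2
        results.append((centre, score(sub_matrix)))
    return results


if __name__ == "__main__":
    # Toy data: SNPs 5-8 carry the derived allele in every haplotype.
    snps = [
        [0, 1, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 1, 1, 1, 1, 1],
        [1, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 1, 0, 1, 1, 1, 1],
    ]
    positions = list(range(100, 900, 100))
    results = scan(snps, positions, window=4, step=2)
    for centre, s in results:
        print(centre, round(s, 3))
    best_position, best_score = max(results, key=lambda r: r[1])
    print("strongest candidate near position", best_position)
```

The scan loop is independent of the scoring backend: in a real pipeline, `toy_window_score` would be replaced by inference with a trained network, which could run on a multi-core CPU, a GPU, or an FPGA accelerator.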
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 42 biology, 54 computer science
Programme: Embedded Systems MSc (60331)