University of Twente Student Theses
Locating Selective Sweeps with Accelerated Convolutional Neural Networks
Souilljee, M.L. (2021) Locating Selective Sweeps with Accelerated Convolutional Neural Networks.
Abstract: Discovering how a species adapted to a specific environment over a long period, and how this affected its evolution, is of great importance to researchers. A major force shaping the evolution of a species is positive selection. Positive selection reveals how a species evolved and therefore how it adapted to its environment. Positive selection leaves a selective sweep in the genetic material of a species. Detecting and localizing selective sweeps, and thereby traces of positive selection, is the goal of various methods and tools. Many signature-based methods and tools have been developed for sweep detection, but the use of convolutional neural networks (CNNs) for whole-genome sweep detection has not yet been explored. This work presents ASDEC (Accurate Sweep Detection Enabled by a CNN), a CNN-based method for whole-genome sweep detection. ASDEC is user-configurable and compares favorably with current signature-based methods and tools. To the best of my knowledge, ASDEC is the first whole-genome CNN-based sweep detection method. A hand-designed neural architecture search (NAS) was used during its development and led to the final CNN architecture, dubbed SweepNet. ASDEC was compared with signature-based methods and tools such as RAiSD, OmegaPlus, SweeD, and SweepFinder2. For almost all datasets, ASDEC performed equal to or better than the top-performing signature-based method. The performance evaluation covered three confounding factors: bottleneck, migration, and recombination. Besides simulated datasets, ASDEC can also be applied to real genomic data. A scan of chromosome 1 of the human genome (Yoruba population, 1000 Genomes dataset) revealed nine candidate genes.
All nine candidate genes found by ASDEC have previously been identified as targets of positive selection. ASDEC supports conventional hardware such as multi-core CPUs and GPUs. To extend its usability further, a CNN inference accelerator was implemented and its performance was compared with that of a multi-core CPU. Execution on a state-of-the-art FPGA is 10.7x faster than on a general-purpose six-core CPU.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 42 biology, 54 computer science
Programme: Embedded Systems MSc (60331)
Link to this item: https://purl.utwente.nl/essays/88618