University of Twente Student Theses

Improving Performance of Multiple Sequence Alignment through Maximal Exact Match Identification

Wehning, T. (2023) Improving Performance of Multiple Sequence Alignment through Maximal Exact Match Identification.

Abstract: Multiple sequence alignment is an integral part of DNA analysis and genomics, and is necessary to identify evolutionary patterns and functional motifs. However, one of its biggest drawbacks is scalability: execution times increase rapidly with the number of sequences to be aligned. This paper presents a new approach that takes the concept of seed-and-extend algorithms from pairwise sequence alignment and applies it to multiple sequences. The result is an alignment tool called MEMSA (MEM Extracting Multiple Sequence Aligner), which applies multiple pre-processing steps to reduce the search space of the alignment. It shows promising results for data sets with high homology but struggles with genomic sequences that are too divergent. For a data set of 500 MERS genomes, the tool reduced the execution time for alignment by a factor of 27 while slightly improving alignment quality.
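To illustrate the idea of a maximal exact match (MEM) that the abstract refers to, the following is a minimal sketch for the pairwise case only. It is not MEMSA's implementation: the function name `find_mems`, the `min_len` parameter, and the brute-force search are assumptions made for illustration, and a practical tool would likely use an index structure instead of comparing every pair of positions.

```python
def find_mems(a: str, b: str, min_len: int = 4) -> list[tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for each maximal exact match.

    A match is maximal when it cannot be extended to the left or to the
    right without introducing a mismatch. Brute force, for illustration only.
    """
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left: they belong
            # to a longer match that begins earlier on the same diagonal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            length = 0
            while (
                i + length < len(a)
                and j + length < len(b)
                and a[i + length] == b[j + length]
            ):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    # Two toy sequences sharing the exact substring "ACGT".
    print(find_mems("GGACGTTT", "CCACGTAA", min_len=4))
```

In a seed-and-extend setting, matches like these would serve as anchors, so that only the regions between them still need to be aligned.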
Item Type: Essay (Bachelor)
Faculty: TNW: Science and Technology
Subject: 42 biology, 54 computer science
Programme: Advanced Technology BSc (50002)
Link to this item: https://purl.utwente.nl/essays/96273