University of Twente Student Theses


An API for intelligent deployment of numerical calculations

Perera, Navoda (2020) An API for intelligent deployment of numerical calculations.

Abstract: Linear algebra is widely used in scientific and engineering applications, where the Basic Linear Algebra Subprograms (BLAS) and Linear Algebra PACKage (LAPACK) libraries are generally used in the back end for efficient calculations. Vendors provide hardware-optimized versions of BLAS and LAPACK, such as the Intel Math Kernel Library (MKL) for Intel processors and Nvidia cuBLAS/cuSOLVER for Nvidia Graphics Processing Units (GPUs). As most of these routines can benefit from parallel processing, they have been shown to perform better on platforms that exploit parallelism, such as GPUs and Field Programmable Gate Arrays (FPGAs), than on general-purpose processors. However, exploiting these gains is not a trivial task in terms of software implementation. This project explores the concept of an Application Programming Interface (API) for BLAS and LAPACK routines that abstracts the hardware-specific details from the user and provides a convenient interface for deployment. The study considers a compute node with a CPU and a GPU connected through a PCI Express link. In addition to convenient deployment, the API employs a model-based approach to predict execution times and data transfer overheads from input/output data sizes. These predictions are used in the optional dynamic deployment mode, where the API decides at run time where to deploy a given routine. In addition, a data flow analysis method, also based on these prediction models, is presented to further improve the execution time of a given application code. The API is shown to perform as expected in the user-specified deployment, dynamic deployment, and data flow analysis modes. The latter two modes use performance models derived through an empirical approach to predict execution and data transfer times. Although considerable variations in execution time are observed for arbitrarily sized inputs/outputs, all models are shown to predict with mean absolute percentage errors below 20%.
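The dynamic deployment idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual API or model form, so everything here is an assumption made for illustration: the names `LinearModel`, `gemm_sizes` and `choose_device`, the linear shape of the time models, and all coefficients are hypothetical. The sketch compares a predicted CPU execution time against a predicted GPU execution time plus PCI Express transfer times for the inputs and outputs, and picks the cheaper target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Hypothetical time model: seconds = intercept + slope * size."""
    intercept: float
    slope: float

    def predict(self, size: float) -> float:
        return self.intercept + self.slope * size


def gemm_sizes(n: int) -> tuple[int, int, int]:
    """Work and data volume for an n x n double-precision matrix product."""
    flops = 2 * n ** 3          # multiply-add count for C = A * B
    bytes_in = 2 * 8 * n * n    # A and B are sent to the GPU
    bytes_out = 8 * n * n       # C is copied back to the host
    return flops, bytes_in, bytes_out


def choose_device(flops, bytes_in, bytes_out, cpu, gpu, pcie):
    """Return (device, predicted_seconds) for the cheaper deployment."""
    cpu_time = cpu.predict(flops)
    # GPU deployment pays for compute plus both transfer directions.
    gpu_time = gpu.predict(flops) + pcie.predict(bytes_in) + pcie.predict(bytes_out)
    return ("gpu", gpu_time) if gpu_time < cpu_time else ("cpu", cpu_time)


# Made-up coefficients, NOT measurements from the thesis.
CPU_MODEL = LinearModel(intercept=0.0, slope=1e-9)
GPU_MODEL = LinearModel(intercept=1e-4, slope=1e-10)
PCIE_MODEL = LinearModel(intercept=1e-5, slope=1e-9)

if __name__ == "__main__":
    for n in (16, 1024):
        device, seconds = choose_device(*gemm_sizes(n), CPU_MODEL, GPU_MODEL, PCIE_MODEL)
        print(f"n={n}: deploy on {device} (predicted {seconds:.6f} s)")
```

With these illustrative coefficients, small problems stay on the CPU because the fixed GPU launch and transfer overheads dominate, while large problems move to the GPU. Empirically fitted models, as the abstract describes, would replace the made-up constants.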
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 53 electrotechnology, 54 computer science
Programme: Embedded Systems MSc (60331)
Link to this item: http://purl.utwente.nl/essays/80982

 
