University of Twente Student Theses

Optimizing the Computational Efficiency of Fine-tuning and Inference for Large Language Models

Nguyen, L.G.K (2025) Optimizing the Computational Efficiency of Fine-tuning and Inference for Large Language Models.

PDF (3MB)
Abstract: Large language models (LLMs) have achieved remarkable performance across a wide range of natural language processing tasks. However, their increasing scale poses significant challenges for fine-tuning, particularly when optimizing for long-context scenarios under constrained computational resources. While various parallelization strategies have been explored, existing approaches often rely on external libraries and are designed for large-scale multi-GPU clusters, making them impractical for resource-limited environments. This research introduces a novel 2D parallelism design that combines Fully Sharded Data Parallelism (FSDP) and Tensor Parallelism (TP) for fine-tuning Llama 3.x models with Low-Rank Adaptation (LoRA) in agentic applications where extended context length is critical. Unlike existing methods, our approach is implemented purely in PyTorch, avoiding dependencies on external parallelization frameworks such as FairScale or DeepSpeed. Additionally, we optimize parallelism specifically for fine-tuning rather than pre-training, prioritizing sequence length over batch size, an underexplored area in the literature. Another key innovation is the application of the 2D parallelism paradigm to the LoRA adapters' weights, which, to our knowledge, has not been systematically studied. Finally, we develop an efficient, zero-redundancy model-loading mechanism that is both GPU- and CPU-efficient for distributed FSDP-TP setups. By addressing these gaps, our work aims to make large-scale fine-tuning more computationally efficient and accessible in constrained environments.
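
For orientation, a minimal sketch of how such an FSDP + TP 2D composition can be expressed in plain PyTorch is given below. This is not the thesis implementation: the toy LoRALinear module, the 2x2 device mesh, the tensor-parallel plan, and all sizes are assumptions made purely for illustration (assuming PyTorch 2.2 or later, launched with torchrun on 4 GPUs).

# Minimal sketch of a 2D (FSDP x TP) composition in plain PyTorch, in the
# spirit of the approach the abstract describes. Illustrative only: the toy
# LoRALinear module, the 2x2 mesh, and all sizes are assumptions, not the
# thesis implementation. Launch with: torchrun --nproc_per_node=4 sketch.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class LoRALinear(nn.Module):
    """Frozen base projection plus a trainable low-rank (LoRA) update."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)          # pretrained weight stays frozen
        self.lora_A = nn.Linear(dim, rank, bias=False)  # trainable adapter (down-projection)
        self.lora_B = nn.Linear(rank, dim, bias=False)  # trainable adapter (up-projection)

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x))


def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    torch.manual_seed(0)  # keep weights and inputs identical across ranks

    # 2D mesh: the outer "dp" dimension is used by FSDP for parameter sharding,
    # the inner "tp" dimension by tensor parallelism. 2 x 2 assumes 4 GPUs.
    mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))
    dp_mesh, tp_mesh = mesh["dp"], mesh["tp"]

    model = nn.Sequential(LoRALinear(1024), nn.ReLU(), LoRALinear(1024)).cuda()

    # Tensor-parallel plan: split the adapter matmuls over the "tp" ranks
    # (column-wise for the down-projection, row-wise for the up-projection).
    for block in (model[0], model[2]):
        parallelize_module(
            block,
            tp_mesh,
            {"lora_A": ColwiseParallel(), "lora_B": RowwiseParallel()},
        )

    # Shard all parameters over the "dp" mesh dimension with FSDP.
    model = FSDP(model, device_mesh=dp_mesh, use_orig_params=True)

    x = torch.randn(2, 256, 1024, device="cuda")  # (batch, sequence, hidden)
    loss = model(x).float().mean()
    loss.backward()
    dist.destroy_process_group()


if __name__ == "__main__":
    main()

In this sketch, parallelize_module shards the adapter matrices across the inner "tp" mesh dimension while FSDP shards parameters across the outer "dp" dimension, mirroring the FSDP-TP layering the abstract describes; the zero-redundancy loading of pretrained weights into such a mesh, which the thesis also addresses, is omitted here.
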
Item Type: Essay (Master)
Clients:
Aalto University, Espoo, Finland
System 2 AI, Helsinki, Finland
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/106461