Efficient Finetuning of LLMs for Domain-Specific Code Generation

dc.contributor.advisor Kumar, Sudhir
dc.contributor.author SARKHEL, BARISH
dc.date.accessioned 2025-05-19T03:53:20Z
dc.date.available 2025-05-19T03:53:20Z
dc.date.issued 2025-05
dc.identifier.citation 109 en_US
dc.identifier.uri http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/9950
dc.description.abstract This work investigates fine-tuning Large Language Models (LLMs) for domain-specific code generation, with the aim of improving their performance on test code generation. It uses both a small and a large code dataset and explores fine-tuning techniques for several open-source LLMs (e.g., Llama, Phi). The primary focus was twofold: meticulously preparing the datasets for the task, and identifying efficient fine-tuning techniques that allow a large dataset to be adapted within the constraints of limited computational resources. This involved curating and structuring the data to ensure its relevance and effectiveness, and applying optimization strategies that balance model performance against resource efficiency. The main objective was to find the best model and the best set of hyperparameters for each dataset, to study how the size and complexity of a dataset affect the fine-tuning task, and to identify effective fine-tuning methods for both cases. We also experiment with Retrieval-Augmented Generation (RAG) and Reinforcement Learning to enhance model performance by improving response accuracy. By integrating these techniques and methodologies, the project aims to contribute insights into scalable and efficient model training, deepening our understanding of fine-tuning LLMs and enhancing their robustness and practicality for real-world applications and specific domains. en_US
dc.language.iso en en_US
dc.subject Finetuning en_US
dc.subject RAG en_US
dc.subject Code Generation en_US
dc.subject Large Language Models (LLMs) en_US
dc.subject Reinforcement Learning en_US
dc.title Efficient Finetuning of LLMs for Domain-Specific Code Generation en_US
dc.type Thesis en_US
dc.description.embargo No Embargo en_US
dc.type.degree BS-MS en_US
dc.contributor.department Dept. of Data Science en_US
dc.contributor.registration 20201242 en_US
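
Illustrative fine-tuning sketch

The abstract describes efficient fine-tuning of open-source LLMs under limited computational resources, but the record itself contains no method details. Below is a minimal sketch of one common parameter-efficient approach, LoRA via the Hugging Face peft library, applied to a causal LM for code generation. The model name, the dataset file test_gen_pairs.jsonl, the prompt/completion field names, and all hyperparameters are illustrative assumptions, not values taken from the thesis.

```python
# Minimal LoRA fine-tuning sketch (assumes transformers, peft, datasets).
# All names and hyperparameters below are illustrative, not the thesis's.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # any open-source causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapter matrices instead of all weights,
# which is what keeps memory use within a limited-GPU budget.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights

# Hypothetical JSONL dataset of (prompt, completion) pairs, where the
# completion is the target test code.
dataset = load_dataset("json", data_files="test_gen_pairs.jsonl")["train"]

def tokenize(example):
    text = example["prompt"] + example["completion"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-test-gen",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch size of 16
        learning_rate=2e-4,
        num_train_epochs=2,
        bf16=True,
        logging_steps=20),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
model.save_pretrained("lora-test-gen/adapter")  # saves adapter weights only
```

At inference time the saved adapter can be loaded on top of the frozen base model; the RAG and Reinforcement Learning experiments the abstract mentions would layer on top of a model tuned in roughly this way.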


This item appears in the following Collection(s)

  • MS THESES [1969]
    Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
