Efficient Finetuning of LLMs for Domain-Specific Code Generation

dc.contributor.advisor Kumar, Sudhir
dc.contributor.author SARKHEL, BARISH
dc.date.accessioned 2025-05-19T03:53:20Z
dc.date.available 2025-05-19T03:53:20Z
dc.date.issued 2025-05
dc.identifier.citation 109 en_US
dc.identifier.uri http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/9950
dc.description.abstract This work investigates fine-tuning Large Language Models (LLMs) for domain-specific code generation, with the aim of improving their performance on test code generation. It uses both a small and a large code dataset and explores fine-tuning techniques for several open-source LLMs (e.g., Llama, Phi). The primary focus was twofold: meticulously preparing the datasets for the task, and identifying efficient fine-tuning techniques that allow a large dataset to be adapted within the constraints of limited computational resources. This involved curating and structuring the data to ensure its relevance and effectiveness, and applying optimization strategies that balance model performance against resource efficiency. The main objective was to find the best model and the best set of hyperparameters for each dataset, to study how the size and complexity of a dataset affect the fine-tuning task, and to identify effective fine-tuning methods for both cases. We also experiment with Retrieval-Augmented Generation (RAG) and Reinforcement Learning to enhance model performance by improving response accuracy. By integrating these techniques and methodologies, the project aims to contribute insights into scalable and efficient model training, deepening our understanding of fine-tuning LLMs and enhancing their robustness and practicality for real-world applications and specific domains. en_US
dc.language.iso en en_US
dc.subject Finetuning en_US
dc.subject RAG en_US
dc.subject Code Generation en_US
dc.subject Large Language Models (LLMs) en_US
dc.subject Reinforcement Learning en_US
dc.title Efficient Finetuning of LLMs for Domain-Specific Code Generation en_US
dc.type Thesis en_US
dc.description.embargo No Embargo en_US
dc.type.degree BS-MS en_US
dc.contributor.department Dept. of Data Science en_US
dc.contributor.registration 20201242 en_US
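
Illustrative fine-tuning sketch

The abstract describes efficient fine-tuning of open-source LLMs under limited computational resources, but the record itself contains no method details. Below is a minimal sketch of one common parameter-efficient approach, LoRA via the Hugging Face peft library, applied to a causal LM for code generation. The model name, the dataset file test_gen_pairs.jsonl, the prompt/completion field names, and all hyperparameters are illustrative assumptions, not values taken from the thesis.

```python
# Minimal LoRA fine-tuning sketch (assumes transformers, peft, datasets).
# All names and hyperparameters below are illustrative, not the thesis's.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # any open-source causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16)

# LoRA trains small low-rank adapter matrices instead of all weights,
# which is what keeps memory use within a limited-GPU budget.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of weights

# Hypothetical JSONL dataset of (prompt, completion) pairs, where the
# completion is the target test code.
dataset = load_dataset("json", data_files="test_gen_pairs.jsonl")["train"]

def tokenize(example):
    text = example["prompt"] + example["completion"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-test-gen",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch size of 16
        learning_rate=2e-4,
        num_train_epochs=2,
        bf16=True,
        logging_steps=20),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
model.save_pretrained("lora-test-gen/adapter")  # saves adapter weights only
```

At inference time the saved adapter can be loaded on top of the frozen base model; the RAG and Reinforcement Learning experiments the abstract mentions would layer on top of a model tuned in roughly this way.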


This item appears in the following Collection(s)

  • MS THESES [1969]
    Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
