Digital Repository

Workload-Driven Adaptation of Language Models for Enterprise Text-to-SQL

dc.contributor.advisor MONTEIRO, JOY
dc.contributor.author TALREJA, ASHISH
dc.date.accessioned 2026-05-13T12:02:02Z
dc.date.available 2026-05-13T12:02:02Z
dc.date.issued 2026-05
dc.identifier.citation 99 en_US
dc.identifier.uri http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10965
dc.description.abstract Recent advances in Large Language Models (LLMs) have significantly improved the translation of natural language questions (NLQs) into executable SQL queries, a task often referred to as Text-to-SQL or NL-to-SQL. While many studies focus on general-purpose benchmarks, enterprise environments present a distinct setting in which databases are accompanied by historical SQL query logs (collections of SQL queries that were executed against the database), commonly referred to as SQL workloads. These workloads capture domain-specific query patterns and represent a valuable yet often underutilized source of supervision for improving Text-to-SQL systems. However, in practical enterprise deployments, labeled workloads are typically scarce, often consisting of only a few dozen NLQ–SQL pairs. Existing approaches predominantly rely on in-context learning (ICL) with proprietary LLMs to exploit such limited data. Although effective, these approaches incur high operational cost due to repeated API calls and, frequently, high inference latency as well. In this thesis, we investigate whether open-source, Text-to-SQL task-aware Small Language Models (SLMs) can be adapted more efficiently using the available workload data. We propose a workload-adaptive strategy that systematically selects the most suitable combination of model class (LLM vs. SLM) and learning paradigm among three alternatives: ICL (through few-shot prompting) with an LLM, supervised fine-tuning (SFT) of an SLM on the workload, and SFT augmented with synthetic data generated by an LLM. The strategy relies on workload-based diagnostic evaluation to guide this selection. Experiments conducted on fourteen databases under simulated scarce-workload conditions demonstrate that the proposed strategy consistently identifies effective adaptation pathways.
In the majority of cases, fine-tuned task-aware SLMs outperform ICL-based LLM approaches while achieving significantly lower inference latency. When the available workload is insufficient for effective fine-tuning, augmenting the workload with carefully generated synthetic data further improves performance. These results highlight the potential of workload-driven adaptation as a practical and efficient approach for deploying Text-to-SQL systems in enterprise environments. en_US
dc.language.iso en en_US
dc.subject Text-to-SQL en_US
dc.subject Large Language Models en_US
dc.subject Structured Query Language en_US
dc.subject Databases en_US
dc.subject Natural Language Processing en_US
dc.subject SQL Workload en_US
dc.title Workload-Driven Adaptation of Language Models for Enterprise Text-to-SQL en_US
dc.type Thesis en_US
dc.description.embargo No Embargo en_US
dc.type.degree BS-MS en_US
dc.contributor.department Dept. of Data Science en_US
dc.contributor.registration 20191222 en_US


This item appears in the following Collection(s)

  • MS THESES [2013]
    Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
