Workload-Driven Adaptation of Language Models for Enterprise Text-to-SQL

TALREJA, ASHISH

dc.contributor.advisor	MONTEIRO, JOY
dc.contributor.author	TALREJA, ASHISH
dc.date.accessioned	2026-05-13T12:02:02Z
dc.date.available	2026-05-13T12:02:02Z
dc.date.issued	2026-05
dc.identifier.citation	99	en_US
dc.identifier.uri	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10965
dc.description.abstract	Recent advances in Large Language Models (LLMs) have significantly improved the translation of natural language questions (NLQs) into executable SQL queries, a task often referred to as Text-to-SQL or NL-to-SQL. While many studies focus on general-purpose benchmarks, enterprise environments present a distinct setting in which databases are accompanied by historical SQL query logs (collections of SQL queries that were executed against the database), commonly referred to as SQL workloads. These workloads capture domain-specific query patterns and represent a valuable yet often underutilized source of supervision for improving Text-to-SQL systems. However, in practical enterprise deployments, labeled workloads are typically scarce, often consisting of only a few dozen NLQ–SQL pairs. Existing approaches predominantly rely on in-context learning (ICL) with proprietary LLMs to exploit such limited data. Although effective, these approaches incur high operational cost due to repeated API calls and often high inference latency as well. In this thesis, we investigate whether open-source, Text-to-SQL task-aware Small Language Models (SLMs) can be adapted more efficiently using the available workload data. We propose a workload-adaptive strategy that systematically selects the most suitable combination of model class (LLM vs. SLM) and learning paradigm among three alternatives: ICL (through few-shot prompting) with an LLM, supervised fine-tuning (SFT) of an SLM on the workload, and SFT augmented with synthetic data generated by an LLM. The strategy relies on workload-based diagnostic evaluation to guide this selection. Experiments conducted on fourteen databases under simulated scarce-workload conditions demonstrate that the proposed strategy consistently identifies effective adaptation pathways. In the majority of cases, fine-tuned task-aware SLMs outperform ICL-based LLM approaches while achieving significantly lower inference latency. When the available workload is insufficient for effective fine-tuning, augmenting the workload with carefully generated synthetic data further improves performance. These results highlight the potential of workload-driven adaptation as a practical and efficient approach for deploying Text-to-SQL systems in enterprise environments.	en_US
dc.language.iso	en	en_US
dc.subject	Text-to-SQL	en_US
dc.subject	Large Language Models	en_US
dc.subject	Structured Query Language	en_US
dc.subject	Databases	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	SQL Workload	en_US
dc.title	Workload-Driven Adaptation of Language Models for Enterprise Text-to-SQL	en_US
dc.type	Thesis	en_US
dc.description.embargo	No Embargo	en_US
dc.type.degree	BS-MS	en_US
dc.contributor.department	Dept. of Data Science	en_US
dc.contributor.registration	20191222	en_US

Files in this item

Name: 20191222_Talreja_ ...

Size: 8.543Mb

Format: PDF

Description: MS Thesis

This item appears in the following Collection(s)

MS THESES [2219]
Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
