Please use this identifier to cite or link to this item: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10965
Title: Workload-Driven Adaptation of Language Models for Enterprise Text-to-SQL
Authors: MONTEIRO, JOY
TALREJA, ASHISH
Dept. of Data Science
20191222
Keywords: Text-to-SQL
Large Language Models
Structured Query Language
Databases
Natural Language Processing
SQL Workload
Issue Date: May-2026
Citation: 99
Abstract: Recent advances in Large Language Models (LLMs) have significantly improved the ability to translate natural language questions (NLQs) into executable SQL queries, a task often referred to as Text-to-SQL or NL-to-SQL. While many studies focus on general-purpose benchmarks, enterprise environments present a distinct setting in which databases are accompanied by historical SQL query logs (collections of SQL queries previously executed against the database), commonly referred to as SQL workloads. These workloads capture domain-specific query patterns and represent a valuable yet often underutilized source of supervision for improving Text-to-SQL systems. However, in practical enterprise deployments, labeled workloads are typically scarce, often consisting of only a few dozen NLQ–SQL pairs. Existing approaches predominantly rely on in-context learning (ICL) with proprietary LLMs to exploit such limited data. Although effective, these approaches incur high operational cost due to repeated API calls and often high inference latency as well. In this thesis, we investigate whether open-source, Text-to-SQL task-aware Small Language Models (SLMs) can be adapted more efficiently using the available workload data. We propose a workload-adaptive strategy that systematically selects the most suitable combination of model class (LLM vs. SLM) and learning paradigm among three alternatives: ICL (through few-shot prompting) with an LLM, supervised fine-tuning (SFT) of an SLM on the workload, and SFT augmented with synthetic data generated by an LLM. The strategy relies on workload-based diagnostic evaluation to guide this selection. Experiments conducted on fourteen databases under simulated scarce-workload conditions demonstrate that the proposed strategy consistently identifies effective adaptation pathways. In the majority of cases, fine-tuned task-aware SLMs outperform ICL-based LLM approaches while achieving significantly lower inference latency. When the available workload is insufficient for effective fine-tuning, augmenting the workload with carefully generated synthetic data further improves performance. These results highlight the potential of workload-driven adaptation as a practical and efficient approach for deploying Text-to-SQL systems in enterprise environments.
URI: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10965
Appears in Collections: MS THESES

Files in This Item:
File: 20191222_Talreja_Ashish_Dwarkadas_MS_Thesis.pdf
Description: MS Thesis
Size: 8.75 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.