Workload-Driven Adaptation of Language Models for Enterprise Text-to-SQL

TALREJA, ASHISH

URI: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/10965
Date: 2026-05

Abstract:

Recent advances in Large Language Models (LLMs) have significantly improved the translation of natural language questions (NLQs) into executable SQL queries, a task often referred to as Text-to-SQL or NL-to-SQL. While many studies focus on general-purpose benchmarks, enterprise environments present a distinct setting in which databases are accompanied by historical SQL query logs (collections of SQL queries that were executed on the database), commonly referred to as SQL workloads. These workloads capture domain-specific query patterns and represent a valuable yet often underutilized source of supervision for improving Text-to-SQL systems. However, in practical enterprise deployments, labeled workloads are typically scarce, often consisting of only a few dozen NLQ–SQL pairs. Existing approaches predominantly rely on in-context learning (ICL) with proprietary LLMs to exploit such limited data. Although effective, these approaches incur high operational cost due to repeated API calls and, often, high inference latency as well. In this thesis, we investigate whether open-source, Text-to-SQL task-aware Small Language Models (SLMs) can be adapted more efficiently using the available workload data. We propose a workload-adaptive strategy that systematically selects the most suitable combination of model class (LLM vs. SLM) and learning paradigm from three alternatives: ICL (through few-shot prompting) with an LLM, supervised fine-tuning (SFT) of an SLM on the workload, and SFT augmented with synthetic data generated by an LLM. The strategy relies on workload-based diagnostic evaluation to guide this selection. Experiments on fourteen databases under simulated scarce-workload conditions demonstrate that the proposed strategy consistently identifies effective adaptation pathways. In the majority of cases, fine-tuned task-aware SLMs outperform ICL-based LLM approaches while achieving significantly lower inference latency. When the available workload is insufficient for effective fine-tuning, augmenting the workload with carefully generated synthetic data further improves performance. These results highlight the potential of workload-driven adaptation as a practical and efficient approach for deploying Text-to-SQL systems in enterprise environments.
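
The abstract does not spell out how the workload-based diagnostic evaluation is carried out. As a rough illustration only, the sketch below assumes one plausible form: each of the three candidate pathways is scored by execution accuracy on a held-out slice of the labeled workload, and the highest-scoring one is chosen. The function names (`run_query`, `execution_accuracy`, `select_pathway`), the toy `orders` table, and the stub predictors are all hypothetical and stand in for real models; none of them come from the thesis.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]            # (natural language question, gold SQL)
Predictor = Callable[[str], str]  # maps a question to a predicted SQL string

def run_query(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    """Return the result rows in a canonical order, or None if the query fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None

def execution_accuracy(conn: sqlite3.Connection, predict: Predictor,
                       held_out: List[Pair]) -> float:
    """Fraction of held-out questions whose predicted SQL returns the gold rows."""
    if not held_out:
        return 0.0
    correct = 0
    for question, gold_sql in held_out:
        gold_rows = run_query(conn, gold_sql)
        pred_rows = run_query(conn, predict(question))
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(held_out)

def select_pathway(conn: sqlite3.Connection, candidates: Dict[str, Predictor],
                   held_out: List[Pair]) -> Tuple[str, Dict[str, float]]:
    """Assumed selection rule: pick the candidate with the best held-out score."""
    scores = {name: execution_accuracy(conn, fn, held_out)
              for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores

# Toy database and held-out workload slice (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "north", 10.0), (2, "south", 20.0), (3, "north", 5.0)])
held_out = [
    ("How many orders are there?", "SELECT COUNT(*) FROM orders"),
    ("What is the total amount for the north region?",
     "SELECT SUM(amount) FROM orders WHERE region = 'north'"),
]

# Stub predictors standing in for the three pathways named in the abstract.
gold_lookup = dict(held_out)
candidates = {
    "icl_llm": lambda q: "SELECT COUNT(*) FROM orders",
    "sft_slm": lambda q: gold_lookup[q],
    "sft_slm_synthetic": lambda q: "SELECT * FROM missing_table",
}

best, scores = select_pathway(conn, candidates, held_out)
print(best, scores)
```

In a real setting the stubs would be replaced by calls to a few-shot-prompted LLM and to fine-tuned SLM checkpoints, and the selection rule could also weigh inference latency or cost, which this sketch leaves out.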

Files in this item

Name: 20191222_Talreja_ ...

Size: 8.543Mb

Format: PDF

Description: MS Thesis

This item appears in the following Collection(s)

MS THESES [2219]
Thesis submitted to IISER Pune in partial fulfilment of the requirements for the BS-MS Dual Degree Programme/MSc. Programme/MS-Exit Programme
