| dc.contributor.advisor | Patwardhan, Manasi | |
| dc.contributor.author | MULE, SRUJAN PRAKASH | |
| dc.date.accessioned | 2026-05-21T07:19:11Z | |
| dc.date.available | 2026-05-21T07:19:11Z | |
| dc.date.issued | 2026-05 | |
| dc.identifier.citation | 126 | en_US |
| dc.identifier.uri | http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/11108 | |
| dc.description.abstract | As generative language models (LMs) accelerate scientific research by automating hypothesis generation, a new bottleneck emerges: evaluating and filtering hundreds of LM-generated ideas without exhaustive experimentation. This work asks whether LMs can learn to judge the empirical success of research ideas before any experiments are run. This thesis studies comparative empirical forecasting: given a benchmark-specific research goal and two candidate ideas, predict which will achieve better leaderboard performance. A dataset of 11,488 idea pairs grounded in objective outcomes from PapersWithCode is created for this task. While untrained 8B-parameter models struggle (≈30% accuracy), Supervised Fine-Tuning boosts performance to 77.1%, outperforming frontier models like GPT-5 (61.1%). By framing evaluation as a reasoning task via Reinforcement Learning (RL) with Verifiable Rewards, models are trained to discover latent reasoning paths, achieving 71.35% accuracy with interpretable justifications. Crucially, these RL-trained variants demonstrate superior cross-domain generalization, achieving 67.49% on an independent test set and surpassing a zero-shot retrieval-augmented GPT-4.1 system by 16 percentage points. These results indicate that compute-efficient small language models have potential as effective, objective verifiers, offering a scalable path for autonomous scientific discovery. | en_US |
| dc.description.sponsorship | TCS Research | en_US |
| dc.language.iso | en | en_US |
| dc.subject | AI for Scientific Discovery | en_US |
| dc.subject | Automated Scientific Discovery | en_US |
| dc.subject | Comparative Empirical Forecasting | en_US |
| dc.subject | Research Idea Evaluation | en_US |
| dc.subject | Large Language Models (LLMs) | en_US |
| dc.subject | Small Language Models (SLMs) | en_US |
| dc.subject | Reinforcement Learning | en_US |
| dc.subject | Interpretable Reasoning | en_US |
| dc.subject | Scientific Benchmarking | en_US |
| dc.title | Forecasting Research Success through Learned Comparison of Scientific Ideas | en_US |
| dc.type | Thesis | en_US |
| dc.description.embargo | No Embargo | en_US |
| dc.type.degree | BS-MS | en_US |
| dc.contributor.department | Dept. of Data Science | en_US |
| dc.contributor.registration | 20211245 | en_US |