Benchmarking Machine Learning Force Fields via Energy Landscape Exploration

SHARMA, ANAND

Please use this identifier to cite or link to this item: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/11097

Title:	Benchmarking Machine Learning Force Fields via Energy Landscape Exploration
Authors:	Poltavsky, Igor; Tkatchenko, Alexandre; SHARMA, ANAND; Dept. of Physics; 20211055
Keywords:	Machine Learning Force Fields; Atomistic Simulations; Computational Chemistry; Interatomic Potential; Benchmarking
Issue Date:	May-2026
Citation:	81
Abstract:	General-purpose machine learning force fields (GP-MLFFs) have emerged as a transformative approach in computational chemistry and materials science, combining near-quantum-mechanical accuracy with the computational efficiency of classical force fields for molecular dynamics simulation. However, ensuring the reliability of GP-MLFF predictions beyond their training regime remains a central and largely unresolved challenge. Traditional benchmarking approaches evaluate models on fixed test datasets. Such datasets are fundamentally limited in their ability to probe model behavior under genuine extrapolation, because no fixed dataset can adequately sample the vast configurational space a model may encounter during molecular dynamics simulations or when applied to novel chemical systems. In this work, we introduce a general, system- and model-agnostic benchmarking framework that directly addresses this limitation. Rather than relying on predefined test sets, the framework evaluates a GP-MLFF's ability to represent the chemical space of local bonding motifs by using the model itself to generate molecular structures through the relaxation of randomly initialized atomic configurations. The resulting structures are evaluated through comparison with reference ab initio calculations, the model's training data, and cross-model validation, providing both quantitative accuracy metrics and a model-agnostic measure of chemical plausibility. The framework is demonstrated on two state-of-the-art SO(3)-equivariant GP-MLFFs applied to the chemical space of H, C, N, and O atoms. The results reveal pronounced differences in generative behavior, chemical diversity, and force prediction accuracy between the two models, potentially traceable to their distinct training data compositions. The framework successfully probes extrapolative regimes, identifying model biases and failure modes that traditional fixed-dataset benchmarks cannot detect.
The presented framework offers a practical and extensible approach for evaluating GP-MLFF reliability beyond interpolative accuracy, with direct applications to active learning, training data augmentation, chemical space exploration, and systematic identification of model failure modes across chemical space.
URI:	http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/11097
Appears in Collections:	MS THESES

Files in This Item:

File	Description	Size	Format
20211055_ANAND_SHARMA_MS_Thesis.pdf	MS Thesis	2.51 MB	Adobe PDF
