Please use this identifier to cite or link to this item: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/11097
Title: Benchmarking Machine Learning Force Fields via Energy Landscape Exploration
Authors: Poltavsky, Igor
Tkatchenko, Alexandre
SHARMA, ANAND
Dept. of Physics
20211055
Keywords: Machine Learning Force Fields
Atomistic Simulations
Computational Chemistry
Interatomic Potential
Benchmarking
Issue Date: May-2026
Citation: 81
Abstract: General-purpose machine learning force fields (GP-MLFFs) have emerged as a transformative approach in computational chemistry and materials science, combining near-quantum-mechanical accuracy with the computational efficiency of classical force fields for molecular dynamics simulations. However, ensuring the reliability of GP-MLFF predictions beyond their training regime remains a central and largely unresolved challenge. Traditional benchmarking approaches evaluate models on fixed test datasets, which are fundamentally limited in their ability to probe model behavior under genuine extrapolation: no fixed dataset can adequately sample the vast configurational space a model may encounter during molecular dynamics simulations or when applied to novel chemical systems. In this work, we introduce a general, system- and model-agnostic benchmarking framework that directly addresses this limitation. Rather than relying on predefined test sets, the framework evaluates a GP-MLFF's ability to represent the chemical space of local bonding motifs by using the model itself to generate molecular structures through the relaxation of randomly initialized atomic configurations. The resulting structures are evaluated by comparison with reference ab initio calculations, with the model's training data, and with other models (cross-model validation), providing both quantitative accuracy metrics and a model-agnostic measure of chemical plausibility. The framework is demonstrated on two state-of-the-art SO(3)-equivariant GP-MLFFs applied to the chemical space of H, C, N, and O atoms. The results reveal pronounced differences in generative behavior, chemical diversity, and force prediction accuracy between the two models, potentially traceable to their distinct training data compositions. The framework successfully probes extrapolative regimes, identifying model biases and failure modes that traditional fixed-dataset benchmarks cannot detect.
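The generation step described above, in which randomly initialized atomic configurations are relaxed using only the model's own forces, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. A pairwise Morse potential with made-up parameters (`morse_energy_forces`) stands in for a GP-MLFF. Atoms are untyped, with no H/C/N/O distinction. Plain steepest descent with a per-atom step cap stands in for whatever optimizer the thesis actually uses. All function names and parameters are hypothetical.

```python
import numpy as np


def morse_energy_forces(pos, d_e=1.0, a=1.5, r0=1.2):
    """Hypothetical stand-in for a GP-MLFF: pairwise Morse energy and forces.

    A real benchmark would call the trained force field here instead.
    """
    diff = pos[:, None, :] - pos[None, :, :]              # r_i - r_j, shape (n, n, 3)
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)                            # remove self-interaction terms
    x = np.exp(-a * (r - r0))
    energy = 0.5 * np.sum(d_e * ((1.0 - x) ** 2 - 1.0))    # each pair counted twice
    de_dr = 2.0 * d_e * a * x * (1.0 - x)                  # dE/dr for each pair
    forces = -np.sum((de_dr / r)[:, :, None] * diff, axis=1)
    return energy, forces


def random_configuration(n_atoms, box, min_dist, rng):
    """Place atoms uniformly in a cubic box, rejecting near-overlapping positions."""
    pos = []
    while len(pos) < n_atoms:
        trial = rng.uniform(0.0, box, size=3)
        if all(np.linalg.norm(trial - p) >= min_dist for p in pos):
            pos.append(trial)
    return np.array(pos)


def relax(pos, energy_forces, fmax=1e-3, step=0.01, max_disp=0.1, max_steps=20000):
    """Steepest-descent relaxation driven only by the model's predicted forces."""
    pos = pos.copy()
    for _ in range(max_steps):
        energy, forces = energy_forces(pos)
        if np.linalg.norm(forces, axis=1).max() < fmax:
            return pos, energy, forces                     # converged
        disp = step * forces
        norms = np.linalg.norm(disp, axis=1, keepdims=True)
        # Cap each atom's displacement so strongly repulsive starts stay stable.
        scale = np.minimum(1.0, max_disp / np.maximum(norms, 1e-12))
        pos += disp * scale
    energy, forces = energy_forces(pos)                    # not converged: report final state
    return pos, energy, forces


rng = np.random.default_rng(0)
start = random_configuration(n_atoms=5, box=3.0, min_dist=0.9, rng=rng)
e0, _ = morse_energy_forces(start)
relaxed, e_final, f_final = relax(start, morse_energy_forces)
print(f"E: {e0:.3f} -> {e_final:.3f}, max |F| = {np.linalg.norm(f_final, axis=1).max():.1e}")
```

In an actual benchmark, `morse_energy_forces` would be replaced by a call to the trained model, and many such relaxations from different random seeds would yield the pool of generated structures to be evaluated.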
The presented framework offers a practical and extensible approach for evaluating GP-MLFF reliability beyond interpolative accuracy, with direct applications to active learning, training data augmentation, chemical space exploration, and systematic identification of model failure modes across chemical space.
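The evaluation stage described in the abstract (comparison with reference ab initio forces and cross-model validation) could be sketched with simple force metrics such as the ones below. These choices are assumptions for illustration, not the thesis's definitions. They include MAE and RMSE over Cartesian components, a per-atom disagreement norm, and the hypothetical `threshold` used for flagging. The force arrays are placeholders, not results from the thesis.

```python
import numpy as np


def force_mae(f_pred, f_ref):
    """Mean absolute error over all Cartesian force components of one structure."""
    return float(np.mean(np.abs(np.asarray(f_pred) - np.asarray(f_ref))))


def force_rmse(f_pred, f_ref):
    """Root-mean-square error over all Cartesian force components."""
    return float(np.sqrt(np.mean((np.asarray(f_pred) - np.asarray(f_ref)) ** 2)))


def cross_model_disagreement(f_a, f_b):
    """Per-atom norm of the force difference between two models (no reference needed)."""
    return np.linalg.norm(np.asarray(f_a) - np.asarray(f_b), axis=1)


def flag_atoms(f_a, f_b, threshold):
    """Indices of atoms where two models disagree by more than `threshold`.

    `threshold` is a hypothetical tuning parameter, not a value from the thesis.
    """
    return np.flatnonzero(cross_model_disagreement(f_a, f_b) > threshold)


# Placeholder force arrays (n_atoms x 3) for one relaxed structure; illustrative only.
f_ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, -0.5, 0.0]])
f_model_a = np.array([[0.1, 0.0, 0.0], [1.0, 0.2, 0.0], [0.0, -0.5, 0.0]])
f_model_b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.4, 0.0]])

print(force_mae(f_model_a, f_ref), force_rmse(f_model_b, f_ref))
print(cross_model_disagreement(f_model_a, f_model_b))
print(flag_atoms(f_model_a, f_model_b, threshold=0.5))  # → [2]
```

The cross-model measure needs no reference calculation. For this reason, committee-style disagreement is a common way to select candidate structures for active learning. The exact criterion used in the thesis is not specified here.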
URI: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/11097
Appears in Collections: MS THESES

Files in This Item:
File | Description | Size | Format
20211055_ANAND_SHARMA_MS_Thesis.pdf | MS Thesis | 2.51 MB | Adobe PDF

