Abstract:
The movement toward large-scale screening studies aimed at understanding and predicting the behavior of organocatalysts poses significant challenges for computational chemistry. The primary bottleneck in studying these systems with traditional techniques rooted in density functional theory (DFT) is the effort required to locate transition states (TSs), which is computationally expensive. It would therefore be ideal to establish a theoretical model capable of predicting these critically important data quickly and accurately at minimal computational cost. Historically, concepts based on linear scaling relationships (LSRs), such as the Bell-Evans-Polanyi (BEP) principle, which relates the activation barrier to the reaction enthalpy across a family of analogous reactions, have provided practical, simple-to-use guidelines for estimating activation barriers. Here, we seek to establish a quantitatively more accurate relationship that goes beyond simple linear regression and leverages machine learning (ML) to estimate activation barriers. To accomplish this, we optimize the geometries and compute the energies of key intermediates using a variety of inexpensive levels of theory, such as semiempirical methods. These energies are then used to train ML models based on non-linear regression, which approximate the TS energies at the target DFT level directly from the energies of the intermediates computed with the inexpensive methods. In essence, this procedure is an analog of the BEP principle: rather than relying on LSRs, it uses non-linear regression and machine learning to connect the structures and energies of intermediates to the associated activation barriers. The energetic data obtained from this ML framework also extend beyond simple BEP-type relationships and could be used to accurately predict targeted chemical properties (e.g., stereoselectivity) at minimal computational cost.
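
To make the contrast between a linear BEP fit and a non-linear regression concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the model or data used in this work: the barriers are synthetic, generated from a toy Marcus-type quadratic, a single reaction energy stands in for the intermediate energies, and kernel ridge regression with a Gaussian kernel is swapped in as one possible non-linear regressor. All function names and parameter values (`marcus_barrier`, `fit_bep`, `fit_krr`, `reorg`, `gamma`, `ridge`) are hypothetical.

```python
import math

def marcus_barrier(dh, reorg=40.0):
    """Toy Marcus-type barrier (kcal/mol); used ONLY to generate synthetic data."""
    return 0.25 * reorg * (1.0 + dh / reorg) ** 2

def fit_bep(dh, ea):
    """Least-squares fit of the linear BEP form Ea = e0 + alpha * dH."""
    n = len(dh)
    mx, my = sum(dh) / n, sum(ea) / n
    sxx = sum((x - mx) ** 2 for x in dh)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dh, ea))
    alpha = sxy / sxx
    return my - alpha * mx, alpha

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_krr(xs, ys, gamma=0.01, ridge=1e-6):
    """Kernel ridge regression with a Gaussian kernel on mean-centered targets."""
    mean_y = sum(ys) / len(ys)
    kernel = lambda p, q: math.exp(-gamma * (p - q) ** 2)
    k = [[kernel(p, q) + (ridge if i == j else 0.0) for j, q in enumerate(xs)]
         for i, p in enumerate(xs)]
    w = solve(k, [y - mean_y for y in ys])
    return lambda x: mean_y + sum(wi * kernel(x, xi) for wi, xi in zip(w, xs))

def rmse(pred, ref):
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

# Synthetic reaction energies (kcal/mol): even indices train, odd indices test.
grid = [-40.0 + 2.5 * i for i in range(33)]
train_x, test_x = grid[0::2], grid[1::2]
train_y = [marcus_barrier(x) for x in train_x]
test_y = [marcus_barrier(x) for x in test_x]

e0, alpha = fit_bep(train_x, train_y)      # linear BEP baseline
krr = fit_krr(train_x, train_y)            # non-linear alternative

bep_rmse = rmse([e0 + alpha * x for x in test_x], test_y)
krr_rmse = rmse([krr(x) for x in test_x], test_y)
print(f"BEP: Ea = {e0:.2f} + {alpha:.2f} * dH, test RMSE = {bep_rmse:.3f} kcal/mol")
print(f"KRR: test RMSE = {krr_rmse:.3f} kcal/mol")
```

Because the synthetic barriers are curved by construction, the linear fit cannot capture them and the kernel model can, so the comparison shows only the mechanics of replacing an LSR with a non-linear regressor. In a realistic setting the inputs would be multi-dimensional descriptors built from the intermediates, and the targets would be DFT barriers rather than a closed-form function.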