Please use this identifier to cite or link to this item: http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/6711
Title: MolGPT: Molecular Generation Using a Transformer-Decoder Model
Authors: BAGAL, VIRAJ
Aggarwal, Rishal
Vinod, P. K.
Priyakumar, U. Deva
Department: Dept. of Chemistry
Keywords: Molecular properties
Partition coefficient
Molecular modeling
Scaffolds
Molecules
2021
Issue Date: Oct-2021
Publisher: American Chemical Society
Citation: Journal of Chemical Information and Modeling
Abstract: The application of deep learning techniques to the de novo generation of molecules, termed inverse molecular design, has been gaining enormous traction in drug design. Representing molecules in SMILES notation as strings of characters enables the use of state-of-the-art natural language processing models, such as Transformers, for molecular design in general. Inspired by generative pre-training (GPT) models, which have been shown to be successful in generating meaningful text, we train a transformer-decoder on the next-token prediction task using masked self-attention for the generation of druglike molecules in this study. We show that our model, MolGPT, performs on par with other previously proposed modern machine learning frameworks for molecular generation in terms of generating valid, unique, and novel molecules. Furthermore, we demonstrate that the model can be trained conditionally to control multiple properties of the generated molecules. We also show that the model can be used to generate molecules with desired scaffolds as well as desired molecular properties by conditioning the generation on the SMILES strings of the desired scaffolds and on property values. Using saliency maps, we highlight the interpretability of the generative process of the model.
URI: https://doi.org/10.1021/acs.jcim.1c00600
http://dr.iiserpune.ac.in:8080/xmlui/handle/123456789/6711
ISSN: 1549-9596
1549-960X
Appears in Collections: JOURNAL ARTICLES
