Spectra to structure: contrastive learning framework for library ranking and generating molecular structures for infrared spectra

Authors
Kanakala, Ganesh Chandan [1]
Sridharan, Bhuvanesh [1]
Priyakumar, U. Deva [1]
Affiliations
[1] Int Inst Informat Technol, Ctr Computat Nat Sci & Bioinformat, Hyderabad 500 032, India
Source
Digital Discovery | 2024 / Volume 3 / Issue 12
Keywords
INFORMATION; SMILES;
DOI
10.1039/d4dd00135d
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Inferring the complete molecular structure from infrared (IR) spectra is a challenging task. In this work, we propose SMEN (Spectra and Molecule Encoder Network), a framework for scoring molecules against given IR spectra. The proposed framework uses contrastive optimization to obtain similar embeddings for a molecule and its spectra. For this study, we consider the QM9 dataset with molecules consisting of less than 9 heavy atoms and obtain simulated spectra. Using the proposed method, we can rank the molecules by embedding similarity and obtain a Top 1 accuracy of ~81%, a Top 3 accuracy of ~96%, and a Top 10 accuracy of ~99% on the evaluation set. We extend SMEN to build a generative transformer for direct molecule prediction from IR spectra. The proposed method can significantly help molecule library ranking tasks and aid the problem of inferring molecular structures from spectra.
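
The ranking and contrastive steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss or similarity measure, so this example assumes cosine similarity and a symmetric InfoNCE-style loss; it is not the authors' implementation. All function names, the temperature value, and the toy two-dimensional embeddings are hypothetical and stand in for the outputs of trained spectrum and molecule encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def rank_library(spectrum_emb, library_embs):
    """Return library indices sorted from most to least similar to the spectrum."""
    scores = [cosine_similarity(spectrum_emb, m) for m in library_embs]
    return sorted(range(len(library_embs)), key=lambda i: scores[i], reverse=True)

def top_k_accuracy(spectrum_embs, library_embs, true_indices, k):
    """Fraction of spectra whose true molecule appears in the top k ranked candidates."""
    hits = 0
    for emb, true_idx in zip(spectrum_embs, true_indices):
        if true_idx in rank_library(emb, library_embs)[:k]:
            hits += 1
    return hits / len(spectrum_embs)

def contrastive_loss(spectrum_embs, molecule_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed form); row i of each list is a matched pair."""
    n = len(spectrum_embs)
    logits = [[cosine_similarity(s, m) / temperature for m in molecule_embs]
              for s in spectrum_embs]

    def cross_entropy_rows(mat):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(mat):
            mx = max(row)  # subtract the max for numerical stability
            log_z = mx + math.log(sum(math.exp(x - mx) for x in row))
            total += log_z - row[i]
        return total / n

    transposed = [list(col) for col in zip(*logits)]
    # Average the spectrum-to-molecule and molecule-to-spectrum directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))

# Toy data (hypothetical): three library molecules and three query spectra.
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spectra = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
true_indices = [0, 1, 2]

top1 = top_k_accuracy(spectra, library, true_indices, k=1)
top2 = top_k_accuracy(spectra, library, true_indices, k=2)
```

In this sketch, Top k accuracy counts a query as correct when its true molecule is among the k library entries with the highest similarity, which is the sense in which the Top 1, Top 3, and Top 10 figures above are usually read.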
Pages: 2417-2423
Page count: 7