MolData, a molecular benchmark for disease and target based machine learning

Authors
Arash Keshavarzi Arshadi
Milad Salem
Arash Firouzbakht
Jiann Shiun Yuan
Affiliations
[1] University of Central Florida, Burnett School of Biomedical Sciences
[2] University of Central Florida, Department of Electrical and Computer Engineering
[3] University of Illinois at Urbana, Department of Chemistry
Keywords
Artificial intelligence; Benchmark; Biological assays; Big data; Database; Drug discovery; Machine learning; PubChem
DOI
Not available
Abstract
Deep learning’s automatic feature extraction has been a revolutionary addition to computational drug discovery, providing the ability to learn abstract features and to discover complex molecular patterns directly from molecular data. Because biological and chemical knowledge is necessary for overcoming the challenges of data curation, balancing, training, and evaluation, it is important for databases to contain information regarding the exact target and disease of each bioassay. Existing repositories such as PubChem and ChEMBL offer screening data for millions of molecules against a variety of cells and targets; however, their bioassays contain complex biological descriptions that can hinder their use by the machine learning community. In this work, a comprehensive disease- and target-based dataset is collected from PubChem to facilitate and accelerate molecular machine learning for better drug discovery. MolData is one of the largest efforts to date to democratize molecular machine learning, with roughly 170 million drug screening results from 1.4 million unique molecules assigned to specific diseases and targets. It also provides 30 unique categories of targets and diseases. Correlation analysis of the MolData bioassays unveils valuable information for drug repurposing for multiple diseases, including cancer, metabolic disorders, and infectious diseases. Finally, we provide a benchmark of more than 30 models trained on each category using multitask learning. MolData aims to pave the way for computational drug discovery and to accelerate the advancement of molecular artificial intelligence in a practical manner. The MolData benchmark data is available at https://GitHub.com/Transilico/MolData as well as within the additional files.
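The abstract does not say how the correlation between bioassays is computed. As a minimal sketch of one plausible approach, the Python below computes a Pearson correlation between two assays using only the molecules screened in both. The function name `assay_correlation`, the `min_shared` threshold, and the toy labels are all illustrative assumptions; none of them come from MolData.

```python
from math import sqrt


def assay_correlation(a, b, min_shared=3):
    """Pearson correlation of two assays over the molecules tested in both.

    a, b: dicts mapping a molecule ID to a 0/1 activity label (hypothetical layout).
    Returns None if too few molecules are shared or either label vector is constant.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < min_shared:
        return None
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    n = len(shared)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


# Toy, made-up labels (1 = active, 0 = inactive); not real screening data.
assay_1 = {"mol_a": 1, "mol_b": 0, "mol_c": 1, "mol_d": 0}
assay_2 = {"mol_a": 1, "mol_b": 0, "mol_c": 1, "mol_e": 1}
assay_3 = {"mol_a": 0, "mol_b": 1, "mol_c": 0, "mol_d": 1}

print(assay_correlation(assay_1, assay_2))
print(assay_correlation(assay_1, assay_3))
```

Restricting the calculation to co-tested molecules matters because screening matrices of this kind are sparse: most molecules are tested in only a few assays, so treating untested pairs as inactive would distort the correlation.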