GeneralizedDTA: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery

Cited by: 3
Authors
Lin, Shaofu [1]
Shi, Chengyu [1]
Chen, Jianhui [1, 2, 3]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, 100 Pingleyuan, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Int Collaborat Base Brain Informat & Wisd, 100 Pingleyuan, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Key Lab MRI & Brain Informat, 100 Pingleyuan, Beijing 100124, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords
DTA prediction; Pre-training task; Multi-task learning; Dual adaptation mechanism; SEQUENCE;
DOI
10.1186/s12859-022-04905-6
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Accurately predicting drug-target binding affinity (DTA) in silico plays an important role in drug discovery. Most computational methods for DTA prediction use machine learning models, especially deep neural networks, and depend on large-scale labelled data. However, it is difficult to learn adequate feature representations for tens of millions of compounds and hundreds of thousands of proteins from relatively limited labelled drug-target data alone. Many drugs are unknown, in the sense that they never appear in the labelled drug-target data, which makes this an out-of-distribution problem in biomedicine. Some recent studies have adopted self-supervised pre-training tasks to learn structural information from amino acid sequences and thereby enhance the feature representation of proteins. However, the task gap between pre-training and DTA prediction causes catastrophic forgetting, which hinders the full use of the learned feature representations in DTA prediction and seriously degrades the generalization capability of models for unknown drug discovery.

Results: To address these problems, we propose GeneralizedDTA, a new DTA prediction model oriented to unknown drug discovery that combines pre-training and multi-task learning. We introduce self-supervised protein and drug pre-training tasks to learn richer structural information from the amino acid sequences of proteins and the molecular graphs of drug compounds. These tasks are intended to alleviate the high variance caused by encoding with deep neural networks and to accelerate the convergence of the prediction model on small-scale labelled data. We also develop a multi-task learning framework with a dual adaptation mechanism that narrows the task gap between pre-training and prediction, in order to prevent overfitting and improve the generalization capability of the DTA prediction model for unknown drug discovery. To validate the effectiveness of our model, we construct an unknown drug data set that simulates the scenario of unknown drug discovery. The experimental results show that our model has higher generalization capability than existing DTA prediction models when predicting the DTA of unknown drugs.

Conclusions: The advantages of our model are mainly attributed to the two kinds of pre-training tasks and the multi-task learning framework. Together they learn richer structural information about proteins and drugs from large-scale unlabelled data and integrate it effectively into the downstream prediction task, yielding high-quality DTA prediction in unknown drug discovery.
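The abstract does not say how the unknown drug data set is built. As a rough illustration only, one common way to simulate unknown drugs is a drug-level hold-out split, where every drug in the test set is absent from the training set. The sketch below assumes the labelled data is a list of (drug_id, target_id, affinity) triples; the function name `unknown_drug_split` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

# Assumed record layout: (drug_id, target_id, affinity). Not from the paper.
Pair = Tuple[str, str, float]


def unknown_drug_split(
    pairs: List[Pair], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Hold out whole drugs so that no test drug appears in training."""
    # Collect unique drug IDs in a deterministic order before shuffling.
    drugs = sorted({drug for drug, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)

    # Hold out at least one drug, even for very small data sets.
    n_test = max(1, round(len(drugs) * test_fraction))
    held_out = set(drugs[:n_test])

    # Assign every pair by its drug, never by the individual pair.
    train = [p for p in pairs if p[0] not in held_out]
    test = [p for p in pairs if p[0] in held_out]
    return train, test


if __name__ == "__main__":
    toy = [(f"D{i}", f"T{j}", float(i + j)) for i in range(5) for j in range(3)]
    train, test = unknown_drug_split(toy, test_fraction=0.2, seed=0)
    print(len(train), "training pairs;", len(test), "test pairs")
```

Splitting by drug rather than by pair is what makes the test set out-of-distribution with respect to drugs: a random pair-level split would let the same compound appear on both sides.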
Pages: 17
Related papers
  • [1] GeneralizedDTA: combining pre-training and multi-task learning to predict drug-target binding affinity for unknown drug discovery
    Shaofu Lin
    Chengyu Shi
    Jianhui Chen
    [J]. BMC Bioinformatics, 23
  • [2] OdinDTA: Combining Mutual Attention and Pre-training for Drug-target Affinity Prediction
    Xu, Shuting
    Wang, Ruochen
    [J]. 2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 680 - 687
  • [3] Predicting Drug-Target Interactions Binding Affinity by Using Dual Updating Multi-task Learning
    Shi, Chengyu
    Lin, Shaofu
    Chen, Jianhui
    Wang, Mengzhen
    Gao, Qingcai
    [J]. COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, CHINESECSCW 2021, PT II, 2022, 1492 : 66 - 76
  • [4] Graph neural pre-training based drug-target affinity prediction
    Ye, Qing
    Sun, Yaxin
    [J]. FRONTIERS IN GENETICS, 2024, 15
  • [5] Prediction of drug-target interactions through multi-task learning
    Moon, Chaeyoung
    Kim, Dongsup
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [6] Multi-task bioassay pre-training for protein-ligand binding affinity prediction
    Yan, Jiaxian
    Ye, Zhaofeng
    Yang, Ziyi
    Lu, Chengqiang
    Zhang, Shengyu
    Liu, Qi
    Qiu, Jiezhong
    [J]. BRIEFINGS IN BIOINFORMATICS, 2024, 25 (01)
  • [7] GanDTI: A multi-task neural network for drug-target interaction prediction
    Wang, Shuyu
    Shan, Peng
    Zhao, Yuliang
    Zuo, Lei
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2021, 92
  • [8] Halogen Bond: Its Role beyond Drug-Target Binding Affinity for Drug Discovery and Development
    Xu, Zhijian
    Yang, Zhuo
    Liu, Yingtao
    Lu, Yunxiang
    Chen, Kaixian
    Zhu, Weiliang
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2014, 54 (01) : 69 - 78
  • [9] Drug knowledge discovery via multi-task learning and pre-trained models
    Li, Dongfang
    Xiong, Ying
    Hu, Baotian
    Tang, Buzhou
    Peng, Weihua
    Chen, Qingcai
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (SUPPL 9)