Comprehensive ensemble in QSAR prediction for drug discovery

Times cited: 104
Authors
Kwon, Sunyoung [1, 3]
Bae, Ho [2]
Jo, Jeonghee [2]
Yoon, Sungroh [1, 2, 4, 5, 6, 7]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Bioinformat, Seoul 08826, South Korea
[3] NAVER Corp, Clova AI Res, Seongnam 13561, South Korea
[4] Seoul Natl Univ, Biol Sci, Seoul 08826, South Korea
[5] Seoul Natl Univ, ASRI, Seoul 08826, South Korea
[6] Seoul Natl Univ, INMC, Seoul 08826, South Korea
[7] Seoul Natl Univ, Inst Engn Res, Seoul 08826, South Korea
Funding
National Research Foundation of Singapore
Keywords
Ensemble learning; Meta-learning; Drug prediction; Regression
DOI
10.1186/s12859-019-3135-4
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Quantitative structure-activity relationship (QSAR) modeling is a computational method for revealing relationships between the structural properties of chemical compounds and their biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome these constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, random forest, the most prevalent approach, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject.
Results: The proposed ensemble method consistently outperformed thirteen individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at .
Conclusions: We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that automatically extracts sequential features from the simplified molecular-input line-entry system (SMILES). The proposed individual classifier did not show impressive results on its own, but, according to the interpretation of the meta-learning, it was considered the most important predictor when combined with the other models.
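Combining diversified first-level models through a second-level meta-learner is commonly called stacking. The following is a minimal pure-Python sketch of that general idea under stated assumptions; it is not the authors' implementation. The base learners (a nearest-centroid scorer and two k-nearest-neighbor voters), the column subsets used as stand-ins for different molecular representations, the 3-fold split, the logistic-regression meta-learner, and every function name here (`stack`, `on_view`, `fit_knn`, and so on) are hypothetical choices for illustration. The paper's thirteen individual models and its SMILES-based neural network are not reproduced.

```python
# Illustrative stacking sketch: all learners, names, and parameters are hypothetical, not the paper's.
import math
import random


def sigmoid(a):
    """Numerically safe logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def fit_nearest_centroid(X, y):
    """Base learner: scores x by its closeness to the active centroid relative to the inactive one."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def predict(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)
    return predict


def fit_knn(k):
    """Base-learner factory: fraction of actives among the k nearest training samples."""
    def fit(X, y):
        data = list(zip(X, y))
        def predict(x):
            nearest = sorted(data, key=lambda pair: math.dist(x, pair[0]))[:k]
            return sum(t for _, t in nearest) / len(nearest)
        return predict
    return fit


def on_view(cols, fit):
    """Restricts a learner to some feature columns (a stand-in for another molecular representation)."""
    def fit_view(X, y):
        model = fit([[x[c] for c in cols] for x in X], y)
        return lambda x: model([x[c] for c in cols])
    return fit_view


def stratified_folds(y, n_folds):
    """Deals each class's indices round-robin so every fold contains both classes."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        for j, i in enumerate(members):
            folds[j % n_folds].append(i)
    return folds


def out_of_fold_predictions(X, y, learners, n_folds):
    """Level-1 features: each sample is scored only by base models that never saw it."""
    Z = [[0.0] * len(learners) for _ in X]
    for fold in stratified_folds(y, n_folds):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        for m, fit in enumerate(learners):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                Z[i][m] = model(X[i])
    return Z


def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Second-level meta-learner: logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wj - lr * g / len(Z) for wj, g in zip(w, gw)]
        b -= lr * gb / len(Z)
    return w, b


def stack(X, y, learners, n_folds=3):
    """Fits the meta-learner on out-of-fold scores, then refits every base model on all data."""
    w, b = fit_logistic(out_of_fold_predictions(X, y, learners, n_folds), y)
    final = [fit(X, y) for fit in learners]
    def predict(x):
        z = [model(x) for model in final]
        return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)
    return predict, w


# Toy data: two well-separated clusters standing in for active (1) and inactive (0) compounds.
rng = random.Random(0)
X = [[c + rng.uniform(-0.2, 0.2) for _ in range(4)] for c in [1.0] * 20 + [0.0] * 20]
y = [1] * 20 + [0] * 20
learners = [on_view([0, 1], fit_nearest_centroid), on_view([2, 3], fit_knn(3)), fit_knn(5)]
predict, weights = stack(X, y, learners)
print("meta-learner weights:", [round(v, 2) for v in weights])
```

Scoring each training compound only with base models that never saw it keeps the meta-learner from simply trusting whichever base model overfits most. Inspecting the fitted meta-learner weights is one simple way to ask which base model the combination leans on; whether this matches the interpretation procedure used in the paper is an assumption, not something the abstract states.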
Pages: 12