Exploiting ensemble learning to improve prediction of phospholipidosis inducing potential

Cited by: 10
Authors
Nath, Abhigyan [1 ]
Sahu, Gopal Krishna [1 ]
Affiliations
[1] Pt Jawahar Lal Nehru Mem Med Coll, Dept Biochem, Raipur 492001, Madhya Pradesh, India
Keywords
Phospholipidosis; Stacking; Hierarchical clustering; Cationic amphiphilic drugs; Deep learning; Ensemble learning; DRUG-INDUCED PHOSPHOLIPIDOSIS; IN-SILICO; PULMONARY; ACCURACY; AUC;
DOI
10.1016/j.jtbi.2019.07.009
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Phospholipidosis is characterized by excessive accumulation of phospholipids in different tissue types (lungs, liver, eyes, kidneys, etc.) caused by cationic amphiphilic drugs. Electron microscopy has revealed lamellar inclusion bodies as the hallmark of phospholipidosis. Some phospholipidosis-causing compounds can cause tissue-specific inflammatory/retrogressive changes. Reliable and accurate in silico methods could facilitate early screening of phospholipidosis-inducing compounds, which could in turn speed up pharmaceutical drug discovery pipelines. In the present work, stacking ensembles are used to combine a number of different base learners into predictive models for phospholipidosis-inducing compounds (a total of 256 trained machine learning models were tested), using a wide range of molecular descriptors (ChemMine, JOELib, Open Babel and RDK descriptors) and structural alerts as input features. The best model, a stacked ensemble of machine learning algorithms with random forest as the second-level learner, outperformed the other base and ensemble learners. JOELib descriptors combined with structural alerts performed better than the other descriptor sets. The best ensemble model achieved an overall accuracy of 88.23%, sensitivity of 86.27%, specificity of 90.20%, MCC of 0.765 and AUC of 0.896, with a g-means of 88.21. To assess the robustness and stability of the best ensemble model, it was further evaluated using stratified 10x10-fold cross-validation and holdout testing sets (repeated 10 times), achieving 84.83% mean accuracy with 0.708 mean MCC and 88.46% mean accuracy with 0.771 mean MCC, respectively. A comparison of different meta-classifiers (generalized linear regression, gradient boosting machines, random forest and deep learning neural networks) in the stacking ensemble revealed that random forest is the better choice for combining multiple classification models. (C) 2019 Elsevier Ltd. All rights reserved.
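The evaluation measures named in the abstract (accuracy, sensitivity, specificity, MCC and g-means) can all be derived from a single binary confusion matrix. The following is a minimal sketch of those standard formulas in Python, not the authors' code. The function name `binary_metrics` and the counts used below are hypothetical: the abstract does not report a confusion matrix, and the counts were chosen only because they are consistent with the reported percentages.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # g-means: geometric mean of sensitivity and specificity
    g_mean = math.sqrt(sensitivity * specificity)
    # Matthews correlation coefficient (MCC); defined as 0 when a margin is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "g_mean": g_mean,
    }


# ASSUMPTION: hypothetical counts (51 positives, 51 negatives), not taken
# from the paper; they merely reproduce the percentages in the abstract.
metrics = binary_metrics(tp=44, fn=7, tn=46, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, sensitivity is 44/51 (about 86.27%), specificity is 46/51 (about 90.20%), accuracy is 90/102 (about 88.2%), MCC is about 0.765 and g-means is about 88.21%, which is in line with the figures quoted above. AUC is omitted because it requires ranked prediction scores rather than counts.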
Pages: 37-47
Number of pages: 11