GTB-PPI: Predict Protein-protein Interactions Based on L1-regularized Logistic Regression and Gradient Tree Boosting

Times cited: 29
Authors
Yu, Bin [1 ,2 ,3 ]
Chen, Cheng [2 ,3 ]
Zhou, Hongyan [2 ,3 ]
Liu, Bingqiang [4 ]
Ma, Qin [5 ]
Affiliations
[1] Univ Sci & Technol China, Sch Life Sci, Hefei 230027, Peoples R China
[2] Qingdao Univ Sci & Technol, Coll Math & Phys, Qingdao 266061, Peoples R China
[3] Qingdao Univ Sci & Technol, Artificial Intelligence & Biomed Big Data Res Ctr, Qingdao 266061, Peoples R China
[4] Shandong Univ, Sch Math, Jinan 250100, Peoples R China
[5] Ohio State Univ, Coll Med, Dept Biomed Informat, Columbus, OH 43210 USA
Funding
National Natural Science Foundation of China
Keywords
Protein-protein interaction; Feature fusion; L1-regularized logistic regression; Gradient tree boosting; Machine learning; Feature selection; Information; Ensemble; Network; Cell
DOI
10.1016/j.gpb.2021.01.001
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Protein-protein interactions (PPIs) are of great importance for understanding genetic mechanisms, delineating disease pathogenesis, and guiding drug design. With the growth of PPI data and the development of machine learning technologies, the prediction and identification of PPIs have become a research hotspot in proteomics. In this study, we propose a new prediction pipeline for PPIs based on gradient tree boosting (GTB). First, the initial feature vector is extracted by fusing pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), reduced sequence and index-vectors (RSIV), and autocorrelation descriptor (AD). Second, to remove redundancy and noise, we employ L1-regularized logistic regression (L1-RLR) to select an optimal feature subset. Finally, the GTB-PPI model is constructed on the selected features. Five-fold cross-validation showed that GTB-PPI achieved accuracies of 95.15% and 90.47% on the Saccharomyces cerevisiae and Helicobacter pylori datasets, respectively. In addition, GTB-PPI could be applied to predict PPIs in the independent test datasets for Caenorhabditis elegans, Escherichia coli, Homo sapiens, and Mus musculus, in the one-core PPI network for CD9, and in the crossover PPI network for the Wnt-related signaling pathways. The results show that GTB-PPI can significantly improve the accuracy of PPI prediction. The code and datasets of GTB-PPI can be downloaded from https://github.com/QUST-AIBBDRC/GTB-PPI/.
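To make the feature-fusion step more concrete, the sketch below computes a simplified type-1 PseAAC vector (20 composition terms plus lam sequence-order correlation terms) for each protein and concatenates the two vectors into one pair-level vector. This is a minimal illustration, not the GTB-PPI implementation: it covers only one of the four descriptors, and the property scale (Kyte-Doolittle hydrophobicity, standardized over the 20 residues), the helper names `pseaac` and `pair_vector`, and the parameter values `lam=5` and `w=0.05` are all assumptions chosen for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity, used as a stand-in property scale
# (assumption: the paper's PseAAC may use other physicochemical properties).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Standardize the scale over the 20 residues (zero mean, unit variance).
_vals = np.array([KD[a] for a in AMINO_ACIDS])
H = {a: (KD[a] - _vals.mean()) / _vals.std() for a in AMINO_ACIDS}


def pseaac(seq: str, lam: int = 5, w: float = 0.05) -> np.ndarray:
    """Hypothetical helper: single-property type-1 PseAAC of length 20 + lam."""
    seq = "".join(c for c in seq.upper() if c in H)  # drop non-standard residues
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lam")
    freq = np.array([seq.count(a) / L for a in AMINO_ACIDS])
    # theta_j: mean squared property difference between residues j positions apart
    theta = np.array([
        np.mean([(H[seq[i + j]] - H[seq[i]]) ** 2 for i in range(L - j)])
        for j in range(1, lam + 1)
    ])
    denom = freq.sum() + w * theta.sum()
    return np.concatenate([freq / denom, w * theta / denom])


def pair_vector(seq_a: str, seq_b: str, lam: int = 5, w: float = 0.05) -> np.ndarray:
    """Hypothetical helper: concatenate both proteins' descriptors into one pair vector."""
    return np.concatenate([pseaac(seq_a, lam, w), pseaac(seq_b, lam, w)])


# Usage with two toy sequences (not taken from the paper's datasets).
v = pair_vector("MKTAYIAKQRQISFVKSHFSRQ", "MSDNELRQLLEAGKLLSAL")
print(v.shape)  # (50,): 2 proteins x (20 composition + 5 correlation terms)
```

Because the composition and correlation terms share one denominator, each protein's 25 components sum to 1. In the full pipeline, PsePSSM, RSIV, and AD vectors would be appended in the same way before feature selection.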
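The selection-then-classification stages and the five-fold evaluation described in the abstract can be sketched with scikit-learn as below. This is a hedged illustration, not the authors' code: synthetic data from `make_classification` stands in for fused protein-pair features, and the `StandardScaler` step, the regularization strength `C=0.1`, `n_estimators=100`, and the helper name `build_pipeline` are assumptions. L1-regularized logistic regression is wrapped in `SelectFromModel`, which keeps the features whose coefficients are not driven to zero, and `GradientBoostingClassifier` plays the role of the GTB classifier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(c: float = 0.1, n_estimators: int = 100, seed: int = 0) -> Pipeline:
    """Hypothetical helper: scale -> L1-RLR feature selection -> gradient tree boosting."""
    selector = SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=c, random_state=seed)
    )
    return Pipeline([
        ("scale", StandardScaler()),
        ("l1_select", selector),
        ("gtb", GradientBoostingClassifier(n_estimators=n_estimators, random_state=seed)),
    ])


# Synthetic stand-in for fused protein-pair features (NOT the paper's data).
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           random_state=0)

pipe = build_pipeline()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all data to see how many features the L1 penalty keeps.
pipe.fit(X, y)
n_kept = int(pipe.named_steps["l1_select"].get_support().sum())
print(f"features kept: {n_kept} of {X.shape[1]}")
```

Putting the selector inside the `Pipeline` means feature selection is refit on the training folds only, so the held-out fold never influences which features are kept. Smaller values of `C` strengthen the L1 penalty and keep fewer features.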
Pages: 582-592
Page count: 11
Journal: Genomics, Proteomics & Bioinformatics, 2020, 18(5)