A Deep Learning Framework for Identifying Essential Proteins by Integrating Multiple Types of Biological Information

Cited by: 67
Authors
Zeng, Min [1]
Li, Min [1]
Fei, Zhihui [1]
Wu, Fang-Xiang [2, 3]
Li, Yaohang [4]
Pan, Yi [5]
Wang, Jianxin [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha, Peoples R China
[2] Univ Saskatchewan, Div Biomed Engn, Saskatoon, SK S7N5A9, Canada
[3] Univ Saskatchewan, Dept Mech Engn, Saskatoon, SK S7N5A9, Canada
[4] Old Dominion Univ, Dept Comp Sci, Norfolk, VA 23529 USA
[5] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30302 USA
Funding
National Natural Science Foundation of China
Keywords
Deep learning; essential proteins; protein-protein interaction network; gene expression; subcellular localization; essential genes; centrality; annotation
DOI
10.1109/TCBB.2019.2897679
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Computational methods, including centrality-based and machine learning-based methods, have been proposed to identify essential proteins in order to understand the minimum requirements for the survival and evolution of a cell. Centrality methods require researchers to design a score function based on prior knowledge, which is usually not sufficient to capture the complexity of biological information. In machine learning-based methods, the selected biological features cannot represent the complete properties of biological information, because these methods lack a computational framework for selecting features automatically. To tackle these problems, we propose a deep learning framework that learns biological features automatically, without prior knowledge. We use the node2vec technique to learn a richer representation of protein-protein interaction (PPI) network topology than a score function provides. Bidirectional long short-term memory cells are applied to capture non-local relationships in gene expression data. For subcellular localization information, we use a high-dimensional indicator vector to characterize its features. To evaluate our method, we tested it on the PPI network of S. cerevisiae. Our experimental results demonstrate that our method outperforms both traditional centrality methods and existing machine learning-based methods. To explore which of the three types of biological information matters most, we conduct an ablation study by removing each component in turn. Our results show that the PPI network embedding contributes most to the improvement. In addition, gene expression profiles and subcellular localization information also help improve the identification of essential proteins.
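Two of the feature encodings named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the compartment vocabulary, the toy PPI network, the function names, and the values of p and q are all hypothetical choices made for illustration. The first function builds an indicator vector for subcellular localization; the second generates a single node2vec-style biased random walk on an unweighted graph. In node2vec proper, many such walks are passed to a skip-gram model to learn the node embeddings, a step omitted here.

```python
import random

# Hypothetical compartment vocabulary; the abstract does not list the real one.
COMPARTMENTS = ["nucleus", "cytosol", "mitochondrion", "plasma membrane"]


def localization_vector(locations, vocabulary=COMPARTMENTS):
    """Indicator vector: 1 where the protein is annotated to a compartment."""
    present = set(locations)
    return [1 if compartment in present else 0 for compartment in vocabulary]


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One node2vec-style second-order random walk on an unweighted graph.

    adj maps each node to the set of its neighbours. p is the return
    parameter and q the in-out parameter, as in node2vec.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])  # sorted for reproducibility
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move out to distance 2 from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy PPI network (hypothetical proteins A-D), stored as adjacency sets.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

walk = biased_walk(ppi, "A", length=6, p=1.0, q=0.5, rng=random.Random(0))
vec = localization_vector(["nucleus", "mitochondrion"])
```

With q below 1 the walk favours moving away from the previous node, which explores the network more broadly; with q above 1 it stays closer to the starting neighbourhood.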
Pages: 296-305
Page count: 10