DeepPPF: A deep learning framework for predicting protein family

Cited by: 15
Authors
Yusuf, Shehu Mohammed [1]
Zhang, Fuhao [1]
Zeng, Min [1]
Li, Min [1]
Affiliations
[1] Central South University, School of Computer Science and Engineering, Changsha 410083, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Multi-scale convolutional neural network; Protein functional family; Protein sequence; Deep learning; Multiple sequence alignment
DOI
10.1016/j.neucom.2020.11.062
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning pipelines for protein functional family prediction are urgently needed, especially now that only 1% of raw protein sequences have been manually annotated. Although existing machine learning algorithms have achieved decent performance in modeling and predicting the functional families of protein sequences, they still have two drawbacks. First, the biological dependencies among nucleotides that these methods capture are not rich enough to describe motifs. Second, existing algorithms are not accurate enough to predict the functional families of newly discovered proteins. To address both limitations simultaneously, we propose a novel deep learning framework for predicting protein family, DeepPPF, which employs the word2vec technique to capture distributional dependencies among nucleotides and discovers rich features from diverse motif lengths to characterize proteins. The novelty of DeepPPF lies in utilizing distributional dependencies among nucleotides. Experimental results on G protein-coupled receptor hierarchical datasets show that DeepPPF achieves state-of-the-art performance, with Matthews correlation coefficients (MCC) of 97.62%, 88.45%, and 83.09% at the family, subfamily, and sub-subfamily hierarchical levels, respectively. DeepPPF also outperformed existing methods in terms of prediction accuracy and MCC on the clusters of orthologous groups (COG) and phage orthologous groups (POG) datasets. Furthermore, we analyzed the ability of the DeepPPF framework to discover rich motifs for the functional classes with the fewest protein sequences. The experimental results show that rich motif discovery is key to improving the modeling performance of protein families through deep learning techniques.
Finally, we investigated the effect of transferring knowledge from a low-level functional domain to a high-level functional domain; the results show that prediction on the target domain can be improved with transfer learning. Therefore, our proposed deep learning framework can be useful in characterizing protein functional families. The code and datasets are available at https://github.com/CSUBioGroup/DeepPPF. © 2020 Published by Elsevier B.V.
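The abstract reports results as Matthews correlation coefficients. As a reminder of what that metric measures, the following is a minimal Python sketch of the standard binary MCC formula computed from confusion-matrix counts. The abstract does not say how MCC was aggregated across the many family classes, so the one-vs-rest helper `confusion_counts`, the function name `mcc_binary`, and the toy labels "A" and "B" are assumptions made for illustration only and are not taken from the DeepPPF code.

```python
import math


def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN for one class treated as positive (one-vs-rest).

    Hypothetical helper for illustration; not part of DeepPPF.
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc_binary(tp, tn, fp, fn):
    """Standard binary MCC: (TP*TN - FP*FN) / sqrt of the four marginal products.

    Returns 0.0 when any marginal is empty, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels only; these are not results from the paper.
    y_true = ["A", "A", "B", "B"]
    y_pred = ["A", "B", "B", "B"]
    counts = confusion_counts(y_true, y_pred, positive="A")
    print(counts, mcc_binary(*counts))
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often preferred over plain accuracy when class sizes are very uneven, as they are for protein families with few member sequences.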
Pages: 19-29
Number of pages: 11